genome size difference: Topics by Science.gov

Sample records for genome size difference

In Depth Characterization of Repetitive DNA in 23 Plant Genomes Reveals Sources of Genome Size Variation in the Legume Tribe Fabeae.

PubMed

Macas, Jiří; Novák, Petr; Pellicer, Jaume; Čížková, Jana; Koblížková, Andrea; Neumann, Pavel; Fuková, Iva; Doležel, Jaroslav; Kelly, Laura J; Leitch, Ilia J

2015-01-01

The differential accumulation and elimination of repetitive DNA are key drivers of genome size variation in flowering plants, yet there have been few studies which have analysed how different types of repeats in related species contribute to genome size evolution within a phylogenetic context. This question is addressed here by conducting a large-scale comparative analysis of repeats in 23 species from four genera of the monophyletic legume tribe Fabeae, representing a 7.6-fold variation in genome size. Phylogenetic analysis and genome size reconstruction revealed that this diversity arose from genome size expansions and contractions in different lineages during the evolution of Fabeae. Employing a combination of low-pass genome sequencing with novel bioinformatic approaches resulted in the identification and quantification of repeats making up 55-83% of the investigated genomes. In turn, this enabled an analysis of how each major repeat type contributed to the genome size variation encountered. Differential accumulation of repetitive DNA was found to account for 85% of the genome size differences between the species, and most (57%) of this variation was found to be driven by a single lineage of Ty3/gypsy LTR-retrotransposons, the Ogre elements. Although the amounts of several other lineages of LTR-retrotransposons and the total amount of satellite DNA were also positively correlated with genome size, their contributions to genome size variation were much smaller (up to 6%). Repeat analysis within a phylogenetic framework also revealed profound differences in the extent of sequence conservation between different repeat types across Fabeae. In addition to these findings, the study has provided a proof of concept for the approach combining recent developments in sequencing and bioinformatics to perform comparative analyses of repetitive DNAs in a large number of non-model species without the need to assemble their genomes.
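
One way to read "repetitive DNA accounts for 85% of the genome size differences" is as a regression slope: if repeat content (Mbp) is regressed on total genome size (Mbp) across species, a slope of 0.85 means that, on average, 85% of each additional Mbp of genome is repetitive. The sketch below assumes that interpretation (the abstract does not state the exact calculation), ignores phylogenetic non-independence, and uses hypothetical values rather than data from the study.

```python
def ols_slope(x: list[float], y: list[float]) -> float:
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical 1C genome sizes and repeat contents (Mbp) for four species.
genome_mbp = [4000.0, 6000.0, 9000.0, 13000.0]
repeat_mbp = [2600.0, 4300.0, 6850.0, 10250.0]

slope = ols_slope(genome_mbp, repeat_mbp)
fold_range = max(genome_mbp) / min(genome_mbp)
print(f"{slope:.0%} of genome size differences attributable to repeats")
print(f"{fold_range:.2f}-fold variation in genome size")
```

The made-up values were generated from repeats = 0.85 × genome size − 800, so the slope is exactly 0.85; real data would scatter around the fitted line.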
The Effect of Different Oceanic Abiotic Factors on Prokaryotic Body Sizes

NASA Astrophysics Data System (ADS)

Pidathala, S.; Bellon, M.; Heim, N.; Payne, J.

2016-12-01

We are studying the impact of abiotic factors in the Pacific and Atlantic Oceans on prokaryotic body sizes and genome sizes because we are interested in how abiotic factors influence genome size independently of their influence on body size. Some previous research has addressed marine bacterial evolution, including data collection on marine ecology in relation to bacterial body size (Straza 2009). Using R, we compared the biovolumes and genome sizes of different phyla against three abiotic factors: temperature, salinity, and pH. We made 9 scatter plots to model these relationships. Regardless of phylum or ocean, we found no relationship between pH or temperature and body size, with a few exceptions: Deinococcus-Thermus shows an inverse relationship between size and temperature, and size correlates with temperature only for thermophilic phyla. We also found that taxa such as Deinococcus-Thermus and Thermotogae occur only at higher temperatures. Additionally, the genome sizes of almost all phyla are restricted to certain pH ranges; for example, Proteobacteria genomes are recorded only at pH values greater than 6. With respect to salinity, certain bacteria are found only within a narrow range, and the genomes of others, such as Proteobacteria, are recorded only at low salinity levels. Finally, Proteobacteria have large genome sizes between 30 and 40 °C, and Crenarchaeota have constant genome sizes at higher temperatures. In conclusion, we found that these abiotic factors generally do not affect body size, with the exception of the inverse relationship between Deinococcus-Thermus size and temperature, reflecting its small biovolume at high temperatures. However, we determined that these abiotic factors have a strong influence on genome size, which we attribute to genome size being independent of body size. Genome size may also have served as an adaptive feature for bacteria in marine environments, which could explain why different phyla diverged to suit their lifestyles.
Random Distribution Pattern and Non-adaptivity of Genome Size in a Highly Variable Population of Festuca pallens

PubMed Central

Šmarda, Petr; Bureš, Petr; Horová, Lucie

2007-01-01

Background and Aims The spatial and statistical distribution of genome sizes and the adaptivity of genome size to some types of habitat, vegetation or microclimatic conditions were investigated in a tetraploid population of Festuca pallens. The population was previously documented to vary highly in genome size and is considered a model for studying the initial stages of genome size differentiation. Methods Using DAPI flow cytometry, samples were measured repeatedly with diploid Festuca pallens as the internal standard. Altogether 172 plants from 57 plots (2·25 m²), distributed in contrasting habitats over the whole locality in South Moravia, Czech Republic, were sampled. The differences in DNA content were confirmed by the double peaks of simultaneously measured samples. Key Results At maximum, a 1·115-fold difference in genome size was observed. The statistical distribution of genome sizes was found to be continuous and best fits the extreme-value (Gumbel) distribution, with rare occurrences of extremely large genomes (positive skew), similar to the log-normal distribution found across angiosperms as a whole. Even plants from the same plot frequently varied considerably in genome size, and the spatial distribution of genome sizes was generally random and not spatially autocorrelated (P > 0·05). The observed spatial pattern and the overall lack of correlation of genome size with recognized vegetation types or microclimatic conditions indicate the absence of ecological adaptivity of genome size in the studied population. Conclusions These experimental data on intraspecific genome size variability in Festuca pallens argue for the absence of natural selection and the selective non-significance of genome size in the initial stages of genome size differentiation, and corroborate the current hypothetical model of genome size evolution in Angiosperms (Bennetzen et al., 2005, Annals of Botany 95: 127–132). PMID:17565968
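
The Gumbel (maximum) distribution mentioned above is right-skewed, which is what allows rare, very large genomes. It has mean μ + γβ (γ is the Euler–Mascheroni constant) and variance π²β²/6, so rough parameters can be obtained by the method of moments. The sketch below does this for hypothetical relative genome sizes chosen to span the 1·115-fold maximum reported above; the study may have used a different fitting procedure (e.g. maximum likelihood), so treat this as an illustration only.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def sample_skewness(xs: list[float]) -> float:
    """Moment-based sample skewness; positive values indicate a long right tail."""
    n = len(xs)
    m = statistics.fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def gumbel_moments(xs: list[float]) -> tuple[float, float]:
    """Method-of-moments (location mu, scale beta) for a Gumbel (maximum) distribution."""
    beta = statistics.stdev(xs) * math.sqrt(6) / math.pi
    mu = statistics.fmean(xs) - EULER_GAMMA * beta
    return mu, beta


# Hypothetical relative genome sizes (ratio to the smallest plant) in one population.
sizes = [1.000, 1.004, 1.006, 1.009, 1.011, 1.013, 1.018, 1.026, 1.041, 1.115]

mu, beta = gumbel_moments(sizes)
print(f"max/min = {max(sizes) / min(sizes):.3f}")
print(f"skewness = {sample_skewness(sizes):.2f}")
print(f"Gumbel mu = {mu:.4f}, beta = {beta:.4f}")
```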
Evolution of genome size and chromosome number in the carnivorous plant genus Genlisea (Lentibulariaceae), with a new estimate of the minimum genome size in angiosperms

PubMed Central

Fleischmann, Andreas; Michael, Todd P.; Rivadavia, Fernando; Sousa, Aretuza; Wang, Wenqin; Temsch, Eva M.; Greilhuber, Johann; Müller, Kai F.; Heubl, Günther

2014-01-01

Background and Aims Some species of Genlisea possess ultrasmall nuclear genomes, the smallest known among angiosperms, and some have been found to have chromosomes of diminutive size, which may explain why chromosome numbers and karyotypes are not known for the majority of species of the genus. However, other members of the genus do not possess ultrasmall genomes, nor do most taxa studied in related genera of the family or order. This study therefore examined the evolution of genome sizes and chromosome numbers in Genlisea in a phylogenetic context. The correlations of genome size with chromosome number and size, with the phylogeny of the group and with growth forms and habitats were also examined. Methods Nuclear genome sizes were measured from cultivated plant material for a comprehensive sampling of taxa, including nearly half of all species of Genlisea and representing all major lineages. Flow cytometric measurements were conducted in parallel in two laboratories in order to compare the consistency of different methods and controls. Chromosome counts were performed for the majority of taxa, comparing different staining techniques for the ultrasmall chromosomes. Key Results Genome sizes of 15 taxa of Genlisea are presented and interpreted in a phylogenetic context. A high degree of congruence was found between genome size distribution and the major phylogenetic lineages. Ultrasmall genomes with 1C values of <100 Mbp were almost exclusively found in a derived lineage of South American species. The ancestral haploid chromosome number was inferred to be n = 8. Chromosome numbers in Genlisea ranged from 2n = 2x = 16 to 2n = 4x = 32. Ascendant dysploid series (2n = 36, 38) are documented for three derived taxa. The different ploidy levels corresponded to the two subgenera, but were not directly correlated to differences in genome size; the three different karyotype ranges mirrored the different sections of the genus. 
The smallest known plant genomes were not found in G. margaretae, as previously reported, but in G. tuberosa (1C ≈ 61 Mbp) and some strains of G. aurea (1C ≈ 64 Mbp). Conclusions Genlisea is an ideal candidate model organism for the understanding of genome reduction as the genus includes species with both relatively large (∼1700 Mbp) and ultrasmall (∼61 Mbp) genomes. This comparative, phylogeny-based analysis of genome sizes and karyotypes in Genlisea provides essential data for selection of suitable species for comparative whole-genome analyses, as well as for further studies on both the molecular and cytogenetic basis of genome reduction in plants. PMID:25274549
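
Genome sizes are quoted either as DNA mass (pg) or as base pairs (Mbp). A commonly used conversion is 1 pg ≈ 978 Mbp (Doležel et al., 2003). The sketch below applies it to the smallest Genlisea value from the abstract and computes the approximate within-genus fold range; the helper names are illustrative only.

```python
MBP_PER_PG = 978.0  # widely used conversion factor: 1 pg of DNA is about 978 Mbp


def pg_to_mbp(pg: float) -> float:
    """Convert DNA mass (pg) to base pairs (Mbp)."""
    return pg * MBP_PER_PG


def mbp_to_pg(mbp: float) -> float:
    """Convert base pairs (Mbp) to DNA mass (pg)."""
    return mbp / MBP_PER_PG


smallest_mbp = 61.0    # G. tuberosa, 1C (from the abstract)
largest_mbp = 1700.0   # approximate largest 1C value quoted for the genus
print(f"G. tuberosa 1C = {mbp_to_pg(smallest_mbp):.4f} pg")
print(f"within-genus range = {largest_mbp / smallest_mbp:.1f}-fold")
```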
Parallel altitudinal clines reveal trends in adaptive evolution of genome size in Zea mays

PubMed Central

Berg, Jeremy J.; Birchler, James A.; Grote, Mark N.; Lorant, Anne; Quezada, Juvenal

2018-01-01

While the vast majority of genome size variation in plants is due to differences in repetitive sequence, we know little about how selection acts on repeat content in natural populations. Here we investigate parallel changes in intraspecific genome size and repeat content of domesticated maize (Zea mays) landraces and their wild relative teosinte across altitudinal gradients in Mesoamerica and South America. We combine genotyping, low coverage whole-genome sequence data, and flow cytometry to test for evidence of selection on genome size and individual repeat abundance. We find that population structure alone cannot explain the observed variation, implying that clinal patterns of genome size are maintained by natural selection. Our modeling additionally provides evidence of selection on individual heterochromatic knob repeats, likely due to their large individual contribution to genome size. To better understand the phenotypes driving selection on genome size, we conducted a growth chamber experiment using a population of highland teosinte exhibiting extensive variation in genome size. We find weak support for a positive correlation between genome size and cell size, but stronger support for a negative correlation between genome size and the rate of cell production. Reanalyzing published data of cell counts in maize shoot apical meristems, we then identify a negative correlation between cell production rate and flowering time. Together, our data suggest a model in which variation in genome size is driven by natural selection on flowering time across altitudinal clines, connecting intraspecific variation in repetitive sequence to important differences in adaptive phenotypes. PMID:29746459
Genome size diversity in orchids: consequences and evolution

PubMed Central

Leitch, I. J.; Kahandawala, I.; Suda, J.; Hanson, L.; Ingrouille, M. J.; Chase, M. W.; Fay, M. F.

2009-01-01

Background The amount of DNA comprising the genome of an organism (its genome size) varies a remarkable 40 000-fold across eukaryotes, yet most groups are characterized by much narrower ranges (e.g. 14-fold in gymnosperms, 3- to 4-fold in mammals). Angiosperms stand out as one of the most variable groups, with genome sizes varying nearly 2000-fold. Nevertheless, within angiosperms the majority of families are characterized by genomes which are small and vary little. Species with large genomes are mostly restricted to a few monocot families, including Orchidaceae. Scope A survey of the literature revealed that genome size data for Orchidaceae are comparatively scarce, representing just 327 species. Nevertheless, they reveal that Orchidaceae are currently the most variable angiosperm family, with genome sizes ranging 168-fold (1C = 0·33–55·4 pg). Analysing the data provided insights into the distribution, evolution and possible consequences to the plant of this genome size diversity. Conclusions Superimposing the data onto the increasingly robust phylogenetic tree of Orchidaceae revealed how different subfamilies were characterized by distinct genome size profiles. Epidendroideae possessed the greatest range of genome sizes, although the majority of species had small genomes. In contrast, the largest genomes were found in subfamilies Cypripedioideae and Vanilloideae. Genome size evolution within Cypripedioideae was analysed, as this is the only subfamily with reasonable representation of data. This approach highlighted striking differences in genome size and karyotype evolution between the closely related Cypripedium, Paphiopedilum and Phragmipedium. As to the consequences of genome size diversity, various studies have revealed that it has both practical (e.g. application of genetic fingerprinting techniques) and biological consequences (e.g. affecting where and when an orchid may grow), which emphasizes the importance of obtaining further genome size data given the considerable phylogenetic gaps highlighted by the current study. PMID:19168860
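
The 168-fold figure follows directly from the extreme 1C values: 55·4 pg / 0·33 pg ≈ 167·9. A minimal check, using a hypothetical helper name:

```python
def fold_range(smallest: float, largest: float) -> float:
    """How many times larger the largest genome is than the smallest (same units)."""
    if smallest <= 0:
        raise ValueError("genome sizes must be positive")
    return largest / smallest


orchid_fold = fold_range(0.33, 55.4)  # 1C values in pg, from the abstract
print(f"Orchidaceae: {orchid_fold:.0f}-fold")
```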
Drosophila Females Undergo Genome Expansion after Interspecific Hybridization

PubMed Central

Romero-Soriano, Valèria; Burlet, Nelly; Vela, Doris; Fontdevila, Antonio; Vieira, Cristina; García Guerreiro, María Pilar

2016-01-01

Genome size (or C-value) can present a wide range of values among eukaryotes. This variation has been attributed to differences in the amplification and deletion of different noncoding repetitive sequences, particularly transposable elements (TEs). TEs can be activated under different stress conditions such as interspecific hybridization events, as described for several species of animals and plants. These massive transposition episodes can lead to considerable genome expansions that could ultimately be involved in hybrid speciation processes. Here, we describe the effects of hybridization and introgression on genome size of Drosophila hybrids. We measured the genome size of two closely related Drosophila species, Drosophila buzzatii and Drosophila koepferae, their F1 offspring and the offspring from three generations of backcrossed hybrids, in which mobilization of up to 28 different TEs was previously detected. We show that hybrid females indeed present a genome expansion, especially in the first backcross, which could likely be explained by transposition events. Hybrid males, which exhibit more variable C-values among individuals of the same generation, do not present an increased genome size. Thus, we demonstrate that the impact of hybridization on genome size can be detected through flow cytometry and is sex-dependent. PMID:26872773
Genome size differentiates co-occurring populations of the planktonic diatom Ditylum brightwellii (Bacillariophyta)

PubMed Central

2010-01-01

Background Diatoms are one of the most species-rich groups of eukaryotic microbes known. Diatoms are also the only group of eukaryotic micro-algae with a diplontic life history, suggesting that the ancestral diatom switched to a life history dominated by a duplicated genome. A key mechanism of speciation among diatoms could be a propensity for additional stable genome duplications. Across eukaryotic taxa, genome size is positively correlated with cell size and negatively correlated with physiological rates. Differences in relative genome size, cell size, and acclimated growth rates were analyzed in isolates of the diatom Ditylum brightwellii. Ditylum brightwellii consists of two main populations with identical 18S rDNA sequences; one population is distributed globally at temperate latitudes and the second appears to be localized to the Pacific Northwest coast of the USA. These two populations co-occur within the Puget Sound estuary of WA, USA, although their peak abundances differ depending on local conditions. Results All isolates from the more regionally localized population (population 2) possessed 1.94 ± 0.74 times the amount of DNA, grew more slowly, and were generally larger than isolates from the more globally distributed population (population 1). The ITS1 sequences, cell sizes, and genome sizes of isolates from New Zealand were the same as those of population 1 isolates from Puget Sound, but their growth rates were within the range of the slower-growing population 2 isolates. Importantly, the observed genome size difference between isolates from the two populations was stable regardless of time in culture or the changes in cell size that accompany the diatom life history. Conclusions The observed two-fold difference in genome size between the D. brightwellii populations suggests that whole genome duplication occurred within cells of population 1, ultimately giving rise to population 2 cells. The apparent regional localization of population 2 is consistent with a recent divergence between the populations, which are likely cryptic species. Genome size variation is known to occur in other diatom genera; we hypothesize that genome duplication may be an active and important mechanism of genetic and physiological diversification and speciation in diatoms. PMID:20044934
Intrapopulation Genome Size Variation in D. melanogaster Reflects Life History Variation and Plasticity

PubMed Central

Ellis, Lisa L.; Huang, Wen; Quinn, Andrew M.; Ahuja, Astha; Alfrejd, Ben; Gomez, Francisco E.; Hjelmen, Carl E.; Moore, Kristi L.; Mackay, Trudy F. C.; Johnston, J. Spencer; Tarone, Aaron M.

2014-01-01

We determined female genome sizes using flow cytometry for 211 Drosophila melanogaster sequenced inbred strains from the Drosophila Genetic Reference Panel, and found significant conspecific and intrapopulation variation in genome size. We also compared several life history traits for 25 lines with large and 25 lines with small genomes in three thermal environments, and found that genome size as well as genome size by temperature interactions significantly correlated with survival to pupation and adulthood, time to pupation, female pupal mass, and female eclosion rates. Genome size accounted for up to 23% of the variation in developmental phenotypes, but the contribution of genome size to variation in life history traits was plastic and varied according to the thermal environment. Expression data implicate differences in metabolism that correspond to genome size variation. These results indicate that significant genome size variation exists within D. melanogaster and this variation may impact the evolutionary ecology of the species. Genome size variation accounts for a significant portion of life history variation in an environmentally dependent manner, suggesting that potential fitness effects associated with genome size variation also depend on environmental conditions. PMID:25057905
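
Statements such as "genome size accounted for up to 23% of the variation" usually refer to a coefficient of determination (R²) or a comparable variance-partitioning statistic. The study's full model, which includes temperature and genome size by temperature interactions, is not given here; the sketch below shows only the simplest case, R² from a one-predictor linear regression, using hypothetical line means.

```python
def r_squared(x: list[float], y: list[float]) -> float:
    """Proportion of variance in y explained by a simple linear regression on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical female genome sizes (Mbp) and mean days to pupation for six lines.
genome_mbp = [170.0, 172.0, 175.0, 177.0, 180.0, 183.0]
days_to_pupation = [5.1, 5.3, 5.0, 5.6, 5.4, 5.8]

r2 = r_squared(genome_mbp, days_to_pupation)
print(f"genome size explains {r2:.0%} of the variation in time to pupation")
```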
Rapid Increase in Genome Size as a Consequence of Transposable Element Hyperactivity in Wood-White (Leptidea) Butterflies

PubMed Central

Talla, Venkat; Suh, Alexander; Kalsoom, Faheema; Dincă, Vlad; Vila, Roger; Friberg, Magne; Wiklund, Christer

2017-01-01

Characterizing and quantifying genome size variation among organisms and understanding if genome size evolves as a consequence of adaptive or stochastic processes have been long-standing goals in evolutionary biology. Here, we investigate genome size variation and association with transposable elements (TEs) across lepidopteran lineages using a novel genome assembly of the common wood-white (Leptidea sinapis) and population re-sequencing data from both L. sinapis and the closely related L. reali and L. juvernica together with 12 previously available lepidopteran genome assemblies. A phylogenetic analysis confirms established relationships among species, but identifies previously unknown intraspecific structure within Leptidea lineages. The genome assembly of L. sinapis is one of the largest of any lepidopteran taxon so far (643 Mb) and genome size is correlated with abundance of TEs, both in Lepidoptera in general and within Leptidea where L. juvernica from Kazakhstan has considerably larger genome size than any other Leptidea population. Specific TE subclasses have been active in different Lepidoptera lineages with a pronounced expansion of predominantly LINEs, DNA elements, and unclassified TEs in the Leptidea lineage after the split from other Pieridae. The rate of genome expansion in Leptidea in general has been about 4 Mb per million years (Mb/My), increasing to 72 Mb/My in a particular L. juvernica population. The considerable differences in accumulation rates of specific TE classes in different lineages indicate that TE activity plays a major role in genome size evolution in butterflies and moths. PMID:28981642
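
A genome expansion rate in Mb/My is the size gained divided by the time over which it accumulated. The sketch below makes that explicit with a hypothetical example, then uses the two rates quoted above to show how long each would take to add 100 Mb; the function name is illustrative only.

```python
def expansion_rate(size_before_mb: float, size_after_mb: float, elapsed_my: float) -> float:
    """Average genome expansion rate in Mb per million years (Mb/My)."""
    if elapsed_my <= 0:
        raise ValueError("elapsed time must be positive")
    return (size_after_mb - size_before_mb) / elapsed_my


# Hypothetical: a genome growing from 500 Mb to 540 Mb over 10 My.
print(expansion_rate(500.0, 540.0, 10.0))  # 4.0

# Time needed to add 100 Mb at the two rates quoted in the abstract.
for label, rate in [("Leptidea overall", 4.0), ("L. juvernica population", 72.0)]:
    print(f"{label}: {100.0 / rate:.1f} My per 100 Mb")
```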
Genome size variation in the genus Avena.

PubMed

Yan, Honghai; Martin, Sara L; Bekele, Wubishet A; Latta, Robert G; Diederichsen, Axel; Peng, Yuanying; Tinker, Nicholas A

2016-03-01

Genome size is an indicator of evolutionary distance and a metric for genome characterization. Here, we report accurate estimates of genome size in 99 accessions from 26 species of Avena. We demonstrate that the average genome size of C genome diploid species (2C = 10.26 pg) is 15% larger than that of A genome species (2C = 8.95 pg), and that this difference likely accounts for a progression of size among tetraploid species, where AB < AC < CC (average 2C = 16.76, 18.60, and 21.78 pg, respectively). All accessions from three hexaploid species with the ACD genome configuration had similar genome sizes (average 2C = 25.74 pg). Genome size was mostly consistent within species and in general agreement with current information about evolutionary distance among species. Results also suggest that most of the polyploid species in Avena have experienced genome downsizing in relation to their diploid progenitors. Genome size measurements could provide additional quality control for species identification in germplasm collections, especially in cases where diploid and polyploid species have similar morphology.
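
The percentages above can be checked from the reported averages. The sketch below also compares two tetraploid groups with a naive additive expectation (tetraploid 2C equal to the sum of the two diploid group averages); this is only a rough check, since the actual progenitors of each polyploid may differ from the group averages, and AB is omitted because no B-genome diploid average is given. A negative percentage indicates a smaller genome than the additive expectation.

```python
A_2C = 8.95   # average 2C (pg), A-genome diploids (from the abstract)
C_2C = 10.26  # average 2C (pg), C-genome diploids (from the abstract)

percent_larger = (C_2C / A_2C - 1) * 100
print(f"C vs A diploids: {percent_larger:.1f}% larger")

observed = {"AC": 18.60, "CC": 21.78}          # tetraploid averages (pg)
expected = {"AC": A_2C + C_2C, "CC": 2 * C_2C}  # naive additive expectation
for genome, obs in observed.items():
    change = (obs / expected[genome] - 1) * 100
    print(f"{genome}: expected {expected[genome]:.2f} pg, observed {obs:.2f} pg ({change:+.1f}%)")
```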
Effects of Caffeine and Chlorogenic Acid on Propidium Iodide Accessibility to DNA: Consequences on Genome Size Evaluation in Coffee Tree

PubMed Central

Noirot, M.; Barre, P.; Duperray, C.; Louarn, J.; Hamon, S.

2003-01-01

Estimates of genome size using flow cytometry can be biased by the presence of cytosolic compounds, leading to pseudo‐intraspecific variation in genome size. Two important compounds present in coffee trees—caffeine and chlorogenic acid—modify accessibility of the dye propidium iodide to Petunia DNA, a species used as internal standard in our genome size evaluation. These compounds could be responsible for intraspecific variation in genome size since their contents vary between trees. They could also be implicated in environmental variations in genome size, such as those revealed when comparing the results of evaluations carried out on different dates on several genotypes. PMID:12876189
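
With an internal standard, flow cytometry estimates nuclear DNA content from the ratio of fluorescence peaks: sample 2C = (sample G1 peak / standard G1 peak) × standard 2C. Because the estimate is a ratio, anything that alters dye binding in one set of nuclei but not the other shifts the result. The sketch below illustrates this with hypothetical peak positions, a hypothetical standard 2C value, and a hypothetical 5% loss of fluorescence in the standard only; it is not a model of the specific caffeine or chlorogenic acid effects measured in the study.

```python
def genome_size_from_ratio(sample_peak: float, standard_peak: float, standard_2c_pg: float) -> float:
    """Estimate sample 2C DNA content (pg) from G1 peak positions and the standard's 2C value."""
    return sample_peak / standard_peak * standard_2c_pg


STANDARD_2C_PG = 2.85                     # hypothetical 2C value of the internal standard (pg)
sample_peak, standard_peak = 120.0, 200.0  # hypothetical G1 peak channels

unbiased = genome_size_from_ratio(sample_peak, standard_peak, STANDARD_2C_PG)

# Hypothetical bias: a cytosolic compound lowers dye fluorescence of the standard nuclei by 5%.
biased = genome_size_from_ratio(sample_peak, standard_peak * 0.95, STANDARD_2C_PG)
print(f"unbiased: {unbiased:.3f} pg, biased: {biased:.3f} pg ({(biased / unbiased - 1) * 100:+.1f}%)")
```

Dimming the standard's peak by 5% inflates the sample estimate by about 5.3% (1/0.95), which would appear as apparent intraspecific variation if the compound's concentration differs between trees.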
Rapid Increase in Genome Size as a Consequence of Transposable Element Hyperactivity in Wood-White (Leptidea) Butterflies.

PubMed

Talla, Venkat; Suh, Alexander; Kalsoom, Faheema; Dinca, Vlad; Vila, Roger; Friberg, Magne; Wiklund, Christer; Backström, Niclas

2017-10-01

Characterizing and quantifying genome size variation among organisms and understanding if genome size evolves as a consequence of adaptive or stochastic processes have been long-standing goals in evolutionary biology. Here, we investigate genome size variation and association with transposable elements (TEs) across lepidopteran lineages using a novel genome assembly of the common wood-white (Leptidea sinapis) and population re-sequencing data from both L. sinapis and the closely related L. reali and L. juvernica together with 12 previously available lepidopteran genome assemblies. A phylogenetic analysis confirms established relationships among species, but identifies previously unknown intraspecific structure within Leptidea lineages. The genome assembly of L. sinapis is one of the largest of any lepidopteran taxon so far (643 Mb) and genome size is correlated with abundance of TEs, both in Lepidoptera in general and within Leptidea where L. juvernica from Kazakhstan has considerably larger genome size than any other Leptidea population. Specific TE subclasses have been active in different Lepidoptera lineages with a pronounced expansion of predominantly LINEs, DNA elements, and unclassified TEs in the Leptidea lineage after the split from other Pieridae. The rate of genome expansion in Leptidea in general has been about 4 Mb per million years (Mb/My), increasing to 72 Mb/My in a particular L. juvernica population. The considerable differences in accumulation rates of specific TE classes in different lineages indicate that TE activity plays a major role in genome size evolution in butterflies and moths. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Sauropod dinosaurs evolved moderately sized genomes unrelated to body size.

PubMed

Organ, Chris L; Brusatte, Stephen L; Stein, Koen

2009-12-22

Sauropodomorph dinosaurs include the largest land animals to have ever lived, some reaching up to 10 times the mass of an African elephant. Despite their status defining the upper range for body size in land animals, it remains unknown whether sauropodomorphs evolved larger-sized genomes than non-avian theropods, their sister taxon, or whether a relationship exists between genome size and body size in dinosaurs, two questions critical for understanding broad patterns of genome evolution in dinosaurs. Here we report inferences of genome size for 10 sauropodomorph taxa. The estimates are derived from a Bayesian phylogenetic generalized least squares approach that generates posterior distributions of regression models relating genome size to osteocyte lacunae volume in extant tetrapods. We estimate that the average genome size of sauropodomorphs was 2.02 pg (range of species means: 1.77-2.21 pg), a value in the upper range of extant birds (mean = 1.42 pg, range: 0.97-2.16 pg) and near the average for extant non-avian reptiles (mean = 2.24 pg, range: 1.05-5.44 pg). The results suggest that the variation in size and architecture of genomes in extinct dinosaurs was lower than the variation found in mammals. A substantial difference in genome size separates the two major clades within dinosaurs, Ornithischia (large genomes) and Saurischia (moderate to small genomes). We find no relationship between body size and estimated genome size in extinct dinosaurs, which suggests that neutral forces did not dominate the evolution of genome size in this group.
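
The study used a Bayesian phylogenetic generalized least squares (PGLS) model. Without the Bayesian layer, the core of PGLS is generalized least squares in which residuals are correlated according to shared branch length on the phylogeny: β = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y, where V is the phylogenetic covariance matrix under Brownian motion. The sketch below computes only this point estimate with NumPy for a hypothetical four-species tree and hypothetical log-scale values; it does not reproduce the posterior distributions reported in the study.

```python
import numpy as np


def pgls(x: np.ndarray, y: np.ndarray, V: np.ndarray) -> np.ndarray:
    """GLS estimate of (intercept, slope) with residual covariance matrix V."""
    X = np.column_stack([np.ones(len(x)), x])
    Vinv_X = np.linalg.solve(V, X)  # V^-1 X without forming the inverse explicitly
    Vinv_y = np.linalg.solve(V, y)  # V^-1 y
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)


# Brownian-motion covariance for a hypothetical tree ((A:1,B:1):1,(C:1,D:1):1):
# diagonal = root-to-tip path length, off-diagonal = path length shared by two tips.
V = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])
log_lacuna_volume = np.array([2.1, 2.3, 2.6, 2.9])    # hypothetical predictor
log_genome_size = np.array([0.20, 0.26, 0.33, 0.41])  # hypothetical response

intercept, slope = pgls(log_lacuna_volume, log_genome_size, V)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

With V set to the identity matrix the estimate reduces to ordinary least squares; the off-diagonal terms down-weight the information contributed by closely related species.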
Influence of sequence and size of DNA on packaging efficiency of parvovirus MVM-based vectors.

PubMed

Brandenburger, A; Coessens, E; El Bakkouri, K; Velu, T

1999-05-01

We have derived a vector from the autonomous parvovirus MVM(p), which expresses human IL-2 specifically in transformed cells (Russell et al., J. Virol. 1992;66:2821-2828). Testing the therapeutic potential of these vectors in vivo requires high-titer stocks. Stocks with a titer of 10⁹ can be obtained after concentration and purification (Avalosse et al., J. Virol. Methods 1996;62:179-183), but this method requires large culture volumes and cannot easily be scaled up. We therefore aimed to increase the production of recombinant virus at the initial transfection step. Poor vector titers could be due to inadequate genome amplification or to inefficient packaging. Here we show that intracellular amplification of MVM vector genomes is not the limiting factor for vector production. Several vector genomes of different size and/or structure were amplified to an equal extent. Their amplification was also equivalent to that of a cotransfected wild-type genome. We did not observe any interference between vector and wild-type genomes at the level of DNA amplification. Despite equivalent genome amplification, vector titers varied greatly between the different genomes, presumably owing to differences in packaging efficiency. Genomes close to 100% of wild-type size were packaged most efficiently, with efficiency declining at smaller and larger sizes. However, certain genomes of identical size showed different packaging efficiencies, illustrating the importance of the DNA sequence, and probably its structure.
Genome size variation in deep-sea amphipods

PubMed Central

Jamieson, A. J.; Piertney, S. B.

2017-01-01

Genome size varies considerably across taxa, and extensive research effort has gone into understanding whether variation can be explained by differences in key ecological and life-history traits among species. The extreme environmental conditions that characterize the deep sea have been hypothesized to promote large genome sizes in eukaryotes. Here we test this supposition by examining genome sizes among 13 species of deep-sea amphipods from the Mariana, Kermadec and New Hebrides trenches. Genome sizes were estimated using flow cytometry and found to vary nine-fold, ranging from 4.06 pg (4.04 Gb) in Paralicella caperesca to 34.79 pg (34.02 Gb) in Alicella gigantea. Phylogenetic independent contrast analysis identified a relationship between genome size and maximum body size, though this was largely driven by those species that display size gigantism. There was a distinct shift in the genome size trait diversification rate in the supergiant amphipod A. gigantea relative to the rest of the group. The variation in genome size observed is striking and argues against genome size being driven by a common evolutionary history, ecological niche and life-history strategy in deep-sea amphipods. PMID:28989783
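
Phylogenetic independent contrasts (Felsenstein, 1985) remove the non-independence of related species by converting trait values into differences (contrasts) at each internal node, standardized by branch length; the contrasts of one trait are then regressed on those of another through the origin. The sketch below implements the standard algorithm for a strictly bifurcating tree with two traits. The tree, branch lengths and trait values are hypothetical (loosely echoing the 4 to 35 pg range above), not the amphipod data, and all names are illustrative.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    branch_length: float                    # length of the branch above this node
    traits: tuple[float, ...] = ()          # observed trait values (tips only)
    children: list[Node] = field(default_factory=list)


def contrasts(node: Node, out: list[tuple[float, ...]]) -> tuple[tuple[float, ...], float]:
    """Append one standardized contrast per internal node to `out`.

    Returns the node's estimated trait values and its branch length extended
    to account for the uncertainty of that estimate.
    """
    if not node.children:
        return node.traits, node.branch_length
    if len(node.children) != 2:
        raise ValueError("independent contrasts require a strictly bifurcating tree")
    (x1, v1), (x2, v2) = (contrasts(child, out) for child in node.children)
    scale = math.sqrt(v1 + v2)
    out.append(tuple((a - b) / scale for a, b in zip(x1, x2)))
    w1, w2 = 1.0 / v1, 1.0 / v2
    estimate = tuple((a * w1 + b * w2) / (w1 + w2) for a, b in zip(x1, x2))
    return estimate, node.branch_length + v1 * v2 / (v1 + v2)


def slope_through_origin(pairs: list[tuple[float, ...]]) -> float:
    """Regression of trait-1 contrasts on trait-0 contrasts, forced through the origin."""
    return sum(c[0] * c[1] for c in pairs) / sum(c[0] ** 2 for c in pairs)


def tip(length: float, genome_pg: float, body_mm: float) -> Node:
    return Node(length, (genome_pg, body_mm))


# Hypothetical tree ((A:1,B:1):1,(C:1,D:1):1); traits = (genome size in pg, max body size in mm).
root = Node(0.0, children=[
    Node(1.0, children=[tip(1.0, 4.1, 20.0), tip(1.0, 6.5, 28.0)]),
    Node(1.0, children=[tip(1.0, 9.8, 45.0), tip(1.0, 34.8, 300.0)]),
])

pics: list[tuple[float, ...]] = []
contrasts(root, pics)
print(f"{len(pics)} contrasts, slope = {slope_through_origin(pics):.2f} mm per pg")
```

A bifurcating tree with n tips yields n − 1 contrasts, so small trees give few degrees of freedom; a single extreme lineage (such as a giant species) can dominate the regression, which matches the caution in the abstract.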
Sauropod dinosaurs evolved moderately sized genomes unrelated to body size

PubMed Central

Organ, Chris L.; Brusatte, Stephen L.; Stein, Koen

2009-01-01

Sauropodomorph dinosaurs include the largest land animals to have ever lived, some reaching up to 10 times the mass of an African elephant. Despite their status defining the upper range for body size in land animals, it remains unknown whether sauropodomorphs evolved larger-sized genomes than non-avian theropods, their sister taxon, or whether a relationship exists between genome size and body size in dinosaurs, two questions critical for understanding broad patterns of genome evolution in dinosaurs. Here we report inferences of genome size for 10 sauropodomorph taxa. The estimates are derived from a Bayesian phylogenetic generalized least squares approach that generates posterior distributions of regression models relating genome size to osteocyte lacunae volume in extant tetrapods. We estimate that the average genome size of sauropodomorphs was 2.02 pg (range of species means: 1.77–2.21 pg), a value in the upper range of extant birds (mean = 1.42 pg, range: 0.97–2.16 pg) and near the average for extant non-avian reptiles (mean = 2.24 pg, range: 1.05–5.44 pg). The results suggest that the variation in size and architecture of genomes in extinct dinosaurs was lower than the variation found in mammals. A substantial difference in genome size separates the two major clades within dinosaurs, Ornithischia (large genomes) and Saurischia (moderate to small genomes). We find no relationship between body size and estimated genome size in extinct dinosaurs, which suggests that neutral forces did not dominate the evolution of genome size in this group. PMID:19793755
Meta-analysis of genome-wide association from genomic prediction models

USDA-ARS's Scientific Manuscript database

A limitation of many genome-wide association studies (GWA) in animal breeding is that many loci have small effect sizes; thus, larger sample sizes (N) are required to achieve adequate detection power. To increase sample size, results from different GWA can be combined in a meta-analys...
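The abstract is cut off before it describes its method, so the following is not the manuscript's approach. It is a minimal Python sketch of one standard way to combine GWA results: a fixed-effect, inverse-variance-weighted meta-analysis. It assumes each study reports an effect estimate and standard error for the same SNP. The function name `ivw_meta` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def ivw_meta(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Fixed-effect inverse-variance-weighted meta-analysis of one SNP effect.

    Returns (pooled effect, pooled standard error, two-sided normal p-value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need matching, non-empty effect and SE lists")
    weights = [1.0 / se ** 2 for se in ses]           # more precise studies weigh more
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))            # two-sided p from the normal tail
    return pooled, pooled_se, p
```

Pooling two equally precise studies shrinks the standard error by a factor of √2. This is the sense in which a meta-analysis raises the effective sample size. If effects differ between studies, a random-effects model would be more appropriate.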
Evolution of Genome Size and Complexity in Pinus

PubMed Central

Morse, Alison M.; Peterson, Daniel G.; Islam-Faridi, M. Nurul; Smith, Katherine E.; Magbanua, Zenaida; Garcia, Saul A.; Kubisiak, Thomas L.; Amerson, Henry V.; Carlson, John E.; Nelson, C. Dana; Davis, John M.

2009-01-01

Background Genome evolution in the gymnosperm lineage of seed plants has given rise to many of the most complex and largest plant genomes; however, the elements involved are poorly understood. Methodology/Principal Findings Gymny is a previously undescribed retrotransposon family in Pinus that is related to Athila elements in Arabidopsis. Gymny elements are dispersed throughout the modern Pinus genome and occupy a physical space at least the size of the Arabidopsis thaliana genome. In contrast to previously described retroelements in Pinus, the Gymny family was amplified or introduced after the divergence of pine and spruce (Picea). If retrotransposon expansions are responsible for genome size differences within the Pinaceae, as they are in angiosperms, then they have yet to be identified. In contrast, molecular divergence of Gymny retrotransposons, together with other families of retrotransposons and protein-coding genic DNA, can account for the large genome complexity of pines, as revealed by massively parallel DNA sequence analysis of Cot-fractionated genomic DNA. Conclusions/Significance Most of the enormous genome complexity of pines can be explained by divergence of retrotransposons; however, the elements responsible for genome size variation are yet to be identified. Genomic resources for Pinus, including those reported here, should assist in further defining whether and how the roles of retrotransposons differ in the evolution of angiosperm and gymnosperm genomes. PMID:19194510
Performances of Different Fragment Sizes for Reduced Representation Bisulfite Sequencing in Pigs.

PubMed

Yuan, Xiao-Long; Zhang, Zhe; Pan, Rong-Yang; Gao, Ning; Deng, Xi; Li, Bin; Zhang, Hao; Sangild, Per Torp; Li, Jia-Qi

2017-01-01

Reduced representation bisulfite sequencing (RRBS) has been widely used to profile genome-scale DNA methylation in mammalian genomes. However, the applications and technical performance of RRBS with different fragment sizes have not been systematically reported in pigs, which serve as an important biomedical model for humans. The aim of this study was to evaluate the capacity of RRBS libraries with different fragment sizes to characterize the porcine genome. We found that MspI-digested segments between 40 and 220 bp showed a pronounced distribution peak at 74 bp; these fragments overlapped heavily with repetitive elements and might reduce unique mapping. The RRBS library with the 110-220 bp fragment size had the highest unique alignment rate and the lowest multiple-alignment rate. The cost-effectiveness of the 40-110 bp, 110-220 bp and 40-220 bp fragment sizes might decrease when the dataset exceeds 70, 50 and 110 million reads, respectively. Given a dataset of 50 million reads, the average sequencing depth of the detected CpG sites was greater for the 110-220 bp fragment size than for the 40-110 bp and 40-220 bp fragment sizes, and the detected CpG sites were distributed differently across gene- and CpG island-related regions. Our results demonstrate that the choice of fragment size affects the number and sequencing depth of detected CpG sites as well as cost-efficiency. No single RRBS configuration is optimal in all circumstances for investigating genome-scale DNA methylation. This work provides useful guidance for designing and executing RRBS to investigate genome-wide DNA methylation in pig tissues.
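RRBS libraries are built by digesting DNA with MspI, which cuts C^CGG, and then size-selecting the fragments. An in-silico digest is therefore a quick way to compare windows such as 40-110, 110-220 and 40-220 bp before sequencing. The following is a minimal Python sketch under simplifying assumptions: it works on one strand and ignores end repair, adapters, bisulfite conversion and mappability. The function names are hypothetical and not from the study.

```python
import re
from typing import List, Tuple

Fragment = Tuple[int, int]  # half-open (start, end) on the top strand

MSPI_SITE = "CCGG"  # MspI cuts C^CGG, i.e. one base into the site


def mspi_fragments(seq: str) -> List[Fragment]:
    """In-silico MspI digest of one strand of `seq`."""
    seq = seq.upper()
    # CCGG cannot overlap itself, so a plain non-overlapping search finds every site.
    cuts = [m.start() + 1 for m in re.finditer(MSPI_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(seq: str, lo: int = 40, hi: int = 220) -> List[Fragment]:
    """Keep fragments whose length falls in the inclusive window [lo, hi] bp."""
    return [(a, b) for a, b in mspi_fragments(seq) if lo <= b - a <= hi]


def count_cpgs(seq: str, fragments: List[Fragment]) -> int:
    """Number of CpG dinucleotides lying entirely inside the given fragments."""
    seq = seq.upper()
    return sum(seq.count("CG", a, b) for a, b in fragments)
```

To compare fragment-size windows, call `size_select(genome, lo, hi)` for each window and pass the result to `count_cpgs`. This gives the number of CpGs each library design could reach in principle.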

Evolution of genome size and genomic GC content in carnivorous holokinetics (Droseraceae).

PubMed

Veleba, Adam; Šmarda, Petr; Zedek, František; Horová, Lucie; Šmerda, Jakub; Bureš, Petr

2017-02-01

Studies of the carnivorous family Lentibulariaceae in recent years have led to the discovery of the smallest plant genomes and an unusual pattern of genomic GC content evolution. However, the scarcity of genomic data in other carnivorous clades still prevents a generalization of the observed patterns. Here the aim was to fill this gap by mapping genome evolution in the second largest carnivorous family, Droseraceae, where this evolution may be affected by chromosomal holokinetism in Drosera. METHODS: The genome size and genomic GC content of 71 Droseraceae species were measured by flow cytometry. A dated phylogeny was constructed, and the evolution of both genomic parameters and their relationship to species' climatic niches were tested using phylogeny-based statistics. The 2C genome size of Droseraceae varied between 488 and 10,927 Mbp, and the GC content ranged between 37.1% and 44.7%. The genome sizes and genomic GC content of carnivorous and holocentric species did not differ from those of their non-carnivorous and monocentric relatives. The genomic GC content was positively correlated with genome size and annual temperature fluctuations. Genome size and chromosome number were inversely correlated in the Australian clade of Drosera. CONCLUSIONS: Our results indicate that neither carnivory (nutrient scarcity) nor holokinetism has a prominent effect on the size and DNA base composition of Droseraceae genomes. However, holokinetic drive seems to affect karyotype evolution in one of the major clades of Drosera. Our survey confirmed that the evolution of GC content is tightly connected with the evolution of genome size and also with environmental conditions. © The Author 2016. Published by Oxford University Press on behalf of the Annals of Botany Company. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Nuclear DNA content and base composition in 28 taxa of Musa.

PubMed

Kamaté, K; Brown, S; Durand, P; Bureau, J M; De Nay, D; Trinh, T H

2001-08-01

The nuclear DNA content of 28 taxa of Musa was assessed by flow cytometry, using line PxPC6 of Petunia hybrida as an internal standard. The 2C DNA value of Musa balbisiana (BB genome) was 1.16 pg, whereas Musa acuminata (AA genome) had an average 2C DNA value of 1.27 pg, with a difference of 11% between its subspecies. The two haploid (1C) genomes, A and B, which make up most edible bananas, are therefore of similar size: 0.63 pg (610 million bp) and 0.58 pg (560 million bp), respectively. The genome of diploid Musa is thus threefold that of Arabidopsis thaliana. Genome sizes in a set of triploid Musa cultivars or clones varied considerably, with 2C DNA values ranging from 1.61 to 2.23 pg. Likewise, the genome sizes of tetraploid cultivars ranged from 1.94 to 2.37 pg (2C). Thus a tetraploid can have a genome size within the triploid range (for instance, accession I.C.2), and vice versa (as in the case of accession Simili Radjah). The 2C values estimated for organs such as leaf, leaf sheath, rhizome, and flower were consistent, whereas root material gave atypical results owing to browning. The genomic base composition of these Musa taxa had a median value of 40.8% GC (SD = 0.43%).
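The C-value arithmetic above (2C to 1C, per-monoploid content across ploidy levels, picograms to base pairs) can be made explicit with a few helper functions. This is a minimal Python sketch, and the function names are hypothetical. The pg-to-Mb factor is a parameter because studies have used different values. The default of 978 Mb per pg follows Doležel et al. (2003). The abstract's rounded 610 and 560 million bp appear consistent with an older factor of roughly 965 Mb per pg, but the factor the authors actually used is not stated.

```python
MB_PER_PG_DOLEZEL_2003 = 978.0  # 1 pg of DNA is about 978 Mb (Doležel et al. 2003)


def one_c(two_c_pg: float) -> float:
    """1C value (one unreplicated holoploid genome) from a 2C value."""
    return two_c_pg / 2.0


def monoploid_cx(two_c_pg: float, ploidy: int) -> float:
    """1Cx value (one monoploid chromosome set) for an organism with `ploidy` sets."""
    return two_c_pg / ploidy


def pg_to_mb(pg: float, mb_per_pg: float = MB_PER_PG_DOLEZEL_2003) -> float:
    """Convert a DNA mass in picograms to megabase pairs."""
    return pg * mb_per_pg
```

For example, `one_c(1.27)` gives 0.635 pg for the A genome, which is about 621 Mb at 978 Mb/pg. `monoploid_cx` lets triploid and tetraploid cultivars be compared per chromosome set rather than per nucleus.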
Regions flanking ori sequences affect the replication efficiency of the mitochondrial genome of ori+ petite mutants from yeast.

PubMed

Rayko, E; Goursot, R; Cherif-Zahar, B; Melis, R; Bernardi, G

1988-03-31

The mitochondrial genomes of progenies from 26 crosses between 17 cytoplasmic, spontaneous, suppressive, ori+ petite mutants of Saccharomyces cerevisiae have been studied by electrophoresis of restriction fragments. Only parental genomes (or, occasionally, genomes derived from them by secondary excisions) were found in the progenies of the almost 500 diploids investigated; no evidence for illegitimate, site-specific mitochondrial recombination was detected. One parental genome always predominated over the other, although to different extents in different crosses. This predominance appears to be due to a higher replication efficiency, which is correlated with a greater density of ori sequences on the mitochondrial genome (and with a shorter repeat unit size of the latter). Exceptions to this 'repeat-unit-size rule' were found, however, even when the parental mitochondrial genomes carried the same ori sequence. This indicates that noncoding, intergenic sequences outside ori sequences also play a role in modulating replication efficiency. Since such sequences differ among petites in primary structure, size, and position relative to ori sequences, this modulation is likely to take place through an indirect effect on DNA and nucleoid structure.
Variation, Evolution, and Correlation Analysis of C+G Content and Genome or Chromosome Size in Different Kingdoms and Phyla

PubMed Central

Li, Xiu-Qing; Du, Donglei

2014-01-01

C+G content (GC content or G+C content) is known to be correlated with genome/chromosome size in bacteria but the relationship for other kingdoms remains unclear. This study analyzed genome size, chromosome size, and base composition in most of the available sequenced genomes in various kingdoms. Genome size tends to increase during evolution in plants and animals, and the same is likely true for bacteria. The genomic C+G contents were found to vary greatly in microorganisms but were quite similar within each animal or plant subkingdom. In animals and plants, the C+G contents are ranked as follows: monocot plants>mammals>non-mammalian animals>dicot plants. The variation in C+G content between chromosomes within species is greater in animals than in plants. The correlation between average chromosome C+G content and chromosome length was found to be positive in Proteobacteria, Actinobacteria (but not in other analyzed bacterial phyla), Ascomycota fungi, and likely also in some plants; negative in some animals, insignificant in two protist phyla, and likely very weak in Archaea. Clearly, correlations between C+G content and chromosome size can be positive, negative, or not significant depending on the kingdoms/groups or species. Different phyla or species exhibit different patterns of correlation between chromosome-size and C+G content. Most chromosomes within a species have a similar pattern of variation in C+G content but outliers are common. The data presented in this study suggest that the C+G content is under genetic control by both trans- and cis- factors and that the correlation between C+G content and chromosome length can be positive, negative, or not significant in different phyla. PMID:24551092
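The per-chromosome analysis described above comes down to two steps: compute C+G content for each chromosome, then correlate it with chromosome length. The abstract does not say which correlation statistic was used. The minimal Python sketch below assumes Pearson's r, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); N and others are ignored."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no unambiguous bases")
    return gc / (gc + at)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gc_length_correlation(chromosomes: Dict[str, str]) -> float:
    """Correlation between chromosome length (all characters) and C+G content."""
    seqs = list(chromosomes.values())
    return pearson_r([len(s) for s in seqs], [gc_content(s) for s in seqs])
```

The sign of the result distinguishes the positive, negative and near-zero patterns reported for different groups. With only a handful of chromosomes per species, a significance test on r would also be needed before calling a correlation real.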
Sizing up arthropod genomes: an evaluation of the impact of environmental variation on genome size estimates by flow cytometry and the use of qPCR as a method of estimation.

PubMed

Gregory, T Ryan; Nathwani, Paula; Bonnett, Tiffany R; Huber, Dezene P W

2013-09-01

A study was undertaken to evaluate both a pre-existing method and a newly proposed approach for estimating nuclear genome sizes in arthropods. First, concerns that rearing conditions affect genome size estimates obtained by the well-established method of flow cytometry were examined. Contrary to previous reports, a more carefully controlled test found negligible environmental effects on genome size estimates in the fly Drosophila melanogaster. Second, a more recently touted method based on quantitative real-time PCR (qPCR) was examined in terms of ease of use, efficiency, and (most importantly) accuracy using four test species: the flies Drosophila melanogaster and Musca domestica and the beetles Tribolium castaneum and Dendroctonus ponderosae. The results of this analysis demonstrated that qPCR tends to produce genome size estimates substantially different from those of other established techniques while also being far less efficient than existing methods.
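The abstract does not describe the qPCR protocol itself. The general logic of qPCR-based genome sizing is to count copies of a single-copy gene in a known mass of DNA using a standard curve; DNA mass divided by copy number then gives the mass of one haploid genome. The minimal Python sketch below illustrates that logic under those assumptions, and its function names are hypothetical. It also shows one reason such estimates can be fragile: copy number depends exponentially on Ct.

```python
import math


def amplification_efficiency(slope: float) -> float:
    """Efficiency E from the slope of Ct vs log10(copies); 1.0 means 100%."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve Ct = slope * log10(copies) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def genome_size_pg(dna_mass_pg: float, target_copies: float) -> float:
    """1C genome mass, assuming the target gene occurs once per haploid genome,
    so the number of target copies equals the number of haploid genomes."""
    return dna_mass_pg / target_copies
```

At 100% efficiency (slope of about -3.32), a Ct error of only 0.5 cycles changes the inferred copy number, and therefore the genome size, by a factor of √2 (about 1.41).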
Determination of the melon chloroplast and mitochondrial genome sequences reveals that the largest reported mitochondrial genome in plants contains a significant amount of DNA having a nuclear origin

PubMed Central

2011-01-01

Background The melon belongs to the Cucurbitaceae family, whose economic importance among vegetable crops is second only to Solanaceae. The melon has a small genome size (454 Mb), which makes it suitable for molecular and genetic studies. Despite similar nuclear and chloroplast genome sizes, cucurbits show great variation when their mitochondrial genomes are compared. The melon possesses the largest plant mitochondrial genome, as much as eight times larger than that of other cucurbits. Results The nucleotide sequences of the melon chloroplast and mitochondrial genomes were determined. The chloroplast genome (156,017 bp) included 132 genes, with 98 single-copy genes dispersed between the small (SSC) and large (LSC) single-copy regions and 17 duplicated genes in the inverted repeat regions (IRa and IRb). A comparison of the cucumber and melon chloroplast genomes showed differences in only approximately 5% of nucleotides, mainly due to short indels and SNPs. Additionally, 2.74 Mb of mitochondrial sequence, accounting for 95% of the estimated mitochondrial genome size, was assembled into five scaffolds and four additional unscaffolded contigs. About 84% of the mitochondrial genome is contained in a single scaffold. The gene-coding region accounted for 1.7% (45,926 bp) of the total sequence, including 51 protein-coding genes, 4 conserved ORFs, 3 rRNA genes and 24 tRNA genes. Despite the differences observed in the mitochondrial genome sizes of cucurbit species, Citrullus lanatus (379 kb), Cucurbita pepo (983 kb) and Cucumis melo (2,740 kb) share 120 kb of sequence, including the predicted protein-coding regions. Nevertheless, melon contained a high number of repetitive sequences and a high content of DNA of nuclear origin, which represented 42% and 47% of the total sequence, respectively. 
Conclusions Whereas the size and gene organisation of chloroplast genomes are similar among the cucurbit species, mitochondrial genomes show a wide variety of sizes, with a non-conserved structure both in gene number and organisation, as well as in the features of the noncoding DNA. The transfer of nuclear DNA to the melon mitochondrial genome and the high proportion of repetitive DNA appear to explain the size of the largest mitochondrial genome reported so far. PMID:21854637
Total centromere size and genome size are strongly correlated in ten grass species.

PubMed

Zhang, Han; Dawe, R Kelly

2012-05-01

It has been known for decades that centromere size varies across species, but the factors involved in setting centromere boundaries are unknown. To address this question, we estimated centromere sizes in ten species of the grass family, including rice, maize, and wheat, which diverged 60 to 80 million years ago and vary 40-fold in genome size. Measurements were made using a broadly reactive antibody to rice centromeric histone H3 (CENH3). In species-wide comparisons, we found a clear linear relationship between total centromere size and genome size. Species with large genomes and few chromosomes tend to have the largest centromeres (e.g., rye), while species with small genomes and many chromosomes have the smallest centromeres (e.g., rice). However, within a species, centromere size is surprisingly uniform. We present evidence from three oat-maize addition lines that supports this claim: the three maize centromeres propagated in oat do not differ measurably from each other. In the context of previously published data, our results suggest that the apparent correlation between chromosome and centromere size is incidental to a larger trend that reflects genome size. Centromere size may be determined by a limiting-component mechanism similar to that described for Caenorhabditis elegans centrosomes.
Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo (Cucurbitaceae)

PubMed Central

Alverson, Andrew J.; Wei, XiaoXin; Rice, Danny W.; Stern, David B.; Barry, Kerrie; Palmer, Jeffrey D.

2010-01-01

The mitochondrial genomes of seed plants are unusually large and vary in size by at least an order of magnitude. Much of this variation occurs within a single family, the Cucurbitaceae, whose genomes range from an estimated 390 to 2,900 kb in size. We sequenced the mitochondrial genomes of Citrullus lanatus (watermelon: 379,236 nt) and Cucurbita pepo (zucchini: 982,833 nt)—the two smallest characterized cucurbit mitochondrial genomes—and determined their RNA editing content. The relatively compact Citrullus mitochondrial genome actually contains more and longer genes and introns, longer segmental duplications, and more discernibly nuclear-derived DNA. The large size of the Cucurbita mitochondrial genome reflects the accumulation of unprecedented amounts of both chloroplast sequences (>113 kb) and short repeated sequences (>370 kb). A low mutation rate has been hypothesized to underlie increases in both genome size and RNA editing frequency in plant mitochondria. However, despite its much larger genome, Cucurbita has a significantly higher synonymous substitution rate (and presumably mutation rate) than Citrullus but comparable levels of RNA editing. The evolution of mutation rate, genome size, and RNA editing are apparently decoupled in Cucurbitaceae, reflecting either simple stochastic variation or governance by different factors. PMID:20118192
Indel Group in Genomes (IGG) Molecular Genetic Markers

PubMed Central

Burkart-Waco, Diana; Kuppu, Sundaram; Britt, Anne; Chetelat, Roger

2016-01-01

Genetic markers are essential when developing or working with genetically variable populations. Indel Group in Genomes (IGG) markers are primer pairs that amplify single-locus sequences that differ in size for two or more alleles. They are attractive for their ease of use for rapid genotyping and their codominant nature. Here, we describe a heuristic algorithm that uses a k-mer-based approach to search two or more genome sequences to locate polymorphic regions suitable for designing candidate IGG marker primers. As input to the IGG pipeline software, the user provides genome sequences and the desired amplicon sizes and size differences. Primer sequences flanking polymorphic insertions/deletions are produced as output. IGG marker files for three sets of genomes, Solanum lycopersicum/Solanum pennellii, Arabidopsis (Arabidopsis thaliana) Columbia-0/Landsberg erecta-0 accessions, and S. lycopersicum/S. pennellii/Solanum tuberosum (three-way polymorphic) are included. PMID:27436831
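The k-mer idea behind IGG markers can be illustrated with a simplified heuristic. It uses k-mers that occur exactly once in both genomes as candidate primer sites and keeps anchor pairs whose spacing differs between the genomes by at least a minimum indel size. This is not the IGG pipeline's code or algorithm. The function names and parameters (`k`, `min_amp`, `max_amp`, `min_diff`) are hypothetical, and real primer design would also check melting temperature, GC content and specificity.

```python
from bisect import bisect_left
from collections import Counter
from typing import Dict, List, Tuple


def unique_kmers(seq: str, k: int) -> Dict[str, int]:
    """Map each k-mer that occurs exactly once in `seq` to its start position."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}


def candidate_indels(a: str, b: str, k: int = 20, min_amp: int = 80,
                     max_amp: int = 400, min_diff: int = 5
                     ) -> List[Tuple[str, str, int, int]]:
    """Find pairs of shared unique k-mers whose spacing differs between genomes.

    Returns (left anchor, right anchor, amplicon length in a, amplicon length in b).
    """
    ua, ub = unique_kmers(a, k), unique_kmers(b, k)
    # Anchors unique in both genomes, ordered by their position in genome a.
    shared = sorted((pos, kmer) for kmer, pos in ua.items() if kmer in ub)
    positions = [pos for pos, _ in shared]
    hits = []
    for pa1, left in shared:
        # First shared anchor far enough downstream to give a min_amp amplicon in a.
        j = bisect_left(positions, pa1 + min_amp - k)
        if j == len(shared):
            break
        pa2, right = shared[j]
        pb1, pb2 = ub[left], ub[right]
        if pb2 <= pb1:
            continue  # anchors are not colinear in genome b
        len_a, len_b = pa2 + k - pa1, pb2 + k - pb1
        if (len_a <= max_amp and min_amp <= len_b <= max_amp
                and abs(len_a - len_b) >= min_diff):
            hits.append((left, right, len_a, len_b))
    return hits
```

Neighbouring left anchors often yield overlapping, near-duplicate candidates, so a practical tool would cluster or thin the hits before designing primers.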
Larger Daphnia at lower temperature: a role for cell size and genome configuration?

PubMed

Jalal, Marwa; Wojewodzic, Marcin W; Laane, Carl Morten M; Hessen, Dag O

2013-09-01

Experiments with Daphnia magna and Daphnia pulex raised at 10 and 20 °C yielded larger adult size at the lower temperature. This must reflect increased cell size, increased cell number, or a combination of both. Because good estimates of cell size are difficult to obtain in crustaceans, we instead measured nucleus and genome size using flow cytometry at 10 and 20 °C. DNA was stained with propidium iodide, ethidium bromide, and DAPI. Both nucleus and genome size estimates were higher at 10 °C than at 20 °C, suggesting that larger body size at low temperature could be partly attributed to an enlarged nucleus and thus larger cell size. Confocal microscopy observations confirmed the staining properties of the fluorochromes. Because changes in nucleotide number in response to growth temperature within a life span are unlikely, these results are probably attributable to altered DNA-fluorochrome binding, presumably reflecting increased DNA condensation at low temperature. This implies that genome size comparisons may be affected by ambient temperature in ectotherms. It also suggests that temperature-induced structural changes in the genome could affect cell size and, in some species, even body size.
Analysis of the giant genomes of Fritillaria (Liliaceae) indicates that a lack of DNA removal characterizes extreme expansions in genome size.

PubMed

Kelly, Laura J; Renny-Byfield, Simon; Pellicer, Jaume; Macas, Jiří; Novák, Petr; Neumann, Pavel; Lysak, Martin A; Day, Peter D; Berger, Madeleine; Fay, Michael F; Nichols, Richard A; Leitch, Andrew R; Leitch, Ilia J

2015-10-01

Plants exhibit an extraordinary range of genome sizes, varying by > 2000-fold between the smallest and largest recorded values. In the absence of polyploidy, changes in the amount of repetitive DNA (transposable elements and tandem repeats) are primarily responsible for genome size differences between species. However, there is ongoing debate regarding the relative importance of amplification of repetitive DNA versus its deletion in governing genome size. Using data from 454 sequencing, we analysed the most repetitive fraction of some of the largest known genomes for diploid plant species, from members of Fritillaria. We revealed that genomic expansion has not resulted from the recent massive amplification of just a handful of repeat families, as shown in species with smaller genomes. Instead, the bulk of these immense genomes is composed of highly heterogeneous, relatively low-abundance repeat-derived DNA, supporting a scenario where amplified repeats continually accumulate due to infrequent DNA removal. Our results indicate that a lack of deletion and low turnover of repetitive DNA are major contributors to the evolution of extremely large genomes and show that their size cannot simply be accounted for by the activity of a small number of high-abundance repeat families. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.
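In low-pass repeat analyses of this kind, a common way to quantify repeats assumes that reads are a shallow, unbiased random sample of the genome. Under that assumption, the fraction of reads falling into a repeat cluster approximates that repeat's fraction of the genome. The abstract does not detail the quantification used, so the minimal Python sketch below illustrates the general idea only. The function names and cluster labels are hypothetical.

```python
from typing import Dict, Tuple


def repeat_genome_share(cluster_reads: Dict[str, int], total_reads: int,
                        genome_size_mb: float) -> Dict[str, Tuple[float, float]]:
    """Per repeat cluster: (fraction of genome, estimated Mb).

    Assumes the reads are a shallow, unbiased random sample of the genome,
    so a cluster's share of reads approximates its share of the genome.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {name: (n / total_reads, n / total_reads * genome_size_mb)
            for name, n in cluster_reads.items()}


def top_n_share(cluster_reads: Dict[str, int], total_reads: int, n: int) -> float:
    """Fraction of all reads captured by the n most abundant clusters."""
    return sum(sorted(cluster_reads.values(), reverse=True)[:n]) / total_reads
```

`top_n_share` expresses the question the abstract raises: whether a few high-abundance families can explain genome size. Highly heterogeneous, low-abundance repeat-derived DNA would instead be spread across many small clusters, and each of those would contribute little.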
Genome size of Alexandrium catenella and Gracilariopsis lemaneiformis estimated by flow cytometry

NASA Astrophysics Data System (ADS)

Du, Qingwei; Sui, Zhenghong; Chang, Lianpeng; Wei, Huihui; Liu, Yuan; Mi, Ping; Shang, Erlei; Zeeshan, Niaz; Que, Zhou

2016-08-01

Flow cytometry (FCM) has been widely applied to estimate the genome size of various higher plants. However, there are few reports of its application in algae. In this study, an optimized FCM procedure was used to estimate the genome size of two eukaryotic algae. For Alexandrium catenella, an important red tide species, whole cells rather than isolated nuclei were analyzed, and chicken erythrocytes were used as an internal reference. The genome size of A. catenella was estimated to be 56.48 ± 4.14 Gb (1C), approximately nineteen times larger than the human genome. For Gracilariopsis lemaneiformis, an economically important red alga, purified nuclei were used, with Arabidopsis thaliana and Chondrus crispus as internal references. The genome size of Gp. lemaneiformis was estimated at 97.35 ± 2.58 Mb (1C) and 112.73 ± 14.00 Mb (1C) with these two references, respectively. The results of this research will promote related studies on the genomics and evolution of these two species.
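With an internal reference, the sample and the standard are stained and run together. Assuming fluorescence is proportional to DNA content, the sample's DNA amount is then the standard's assigned value scaled by the ratio of their G0/G1 peak positions. The minimal Python sketch below expresses that ratio. The function name is hypothetical, and the study's exact gating and calibration steps are not described in the abstract.

```python
def genome_size_from_standard(sample_peak: float, standard_peak: float,
                              standard_size: float) -> float:
    """Sample DNA content from co-stained sample and internal-standard nuclei.

    Assumes fluorescence is proportional to DNA content, so
    sample size = standard size * (sample G0/G1 peak / standard G0/G1 peak).
    The result is in the same units and C-level as `standard_size`.
    """
    if sample_peak <= 0 or standard_peak <= 0:
        raise ValueError("peak positions must be positive")
    return standard_size * sample_peak / standard_peak
```

Because the estimate scales directly with the value assigned to the standard, any error in that value carries straight into the result. This is one possible reason why two different internal references, as used for Gp. lemaneiformis, can give different estimates.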
LTR Retrotransposons Contribute to Genomic Gigantism in Plethodontid Salamanders

PubMed Central

Sun, Cheng; Shepard, Donald B.; Chong, Rebecca A.; López Arriaza, José; Hall, Kathryn; Castoe, Todd A.; Feschotte, Cédric; Pollock, David D.; Mueller, Rachel Lockridge

2012-01-01

Among vertebrates, most of the largest genomes are found within the salamanders, a clade of amphibians that includes 613 species. Salamander genome sizes range from ∼14 to ∼120 Gb. Because genome size is correlated with nucleus and cell sizes, as well as other traits, morphological evolution in salamanders has been profoundly affected by genomic gigantism. However, the molecular mechanisms driving genomic expansion in this clade remain largely unknown. Here, we present the first comparative analysis of transposable element (TE) content in salamanders. Using high-throughput sequencing, we generated genomic shotgun data for six species from the Plethodontidae, the largest family of salamanders. We then developed a pipeline to mine TE sequences from shotgun data in taxa with limited genomic resources, such as salamanders. Our summaries of overall TE abundance and diversity for each species demonstrate that TEs make up a substantial portion of salamander genomes, and that all of the major known types of TEs are represented in salamanders. The most abundant TE superfamilies found in the genomes of our six focal species are similar, despite substantial variation in genome size. However, our results demonstrate a major difference between salamanders and other vertebrates: salamander genomes contain much larger amounts of long terminal repeat (LTR) retrotransposons, primarily Ty3/gypsy elements. Thus, the extreme increase in genome size that occurred in salamanders was likely accompanied by a shift in TE landscape. These results suggest that increased proliferation of LTR retrotransposons was a major molecular mechanism contributing to genomic expansion in salamanders. PMID:22200636
Similar Ratios of Introns to Intergenic Sequence across Animal Genomes

PubMed Central

Wörheide, Gert

2017-01-01

One central goal of genome biology is to understand how the usage of the genome differs between organisms. Our knowledge of genome composition, needed for downstream inferences, depends critically on gene annotations, yet problems associated with gene annotation and assembly errors are usually ignored in comparative genomics. Here, we analyze the genomes of 68 species across 12 animal phyla and some single-cell eukaryotes for general trends in genome composition and transcription, taking problems of gene annotation into account. We show that, regardless of genome size, the ratio of introns to intergenic sequence is comparable across essentially all animals, with nearly all deviations dominated by increased intergenic sequence. Genomes of model organisms have ratios much closer to 1:1, suggesting that the majority of published genomes of nonmodel organisms are underannotated and consequently omit substantial numbers of genes, with likely negative impact on evolutionary interpretations. Finally, our results also indicate that most animals transcribe half or more of their genomes, arguing against differences in genome usage between animal groups and suggesting that the transcribed portion is more dependent on genome size than previously thought. PMID:28633296
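The intron-to-intergenic ratio can be computed from gene models with simple interval arithmetic. Genic span minus exonic bases gives intronic bases, and chromosome length minus genic span gives intergenic bases. The minimal Python sketch below makes simplifying assumptions: one chromosome, genes given as lists of exon intervals, and no special handling of UTRs or strands. The authors' exact definitions may differ, and the function names are hypothetical. The second call shows the underannotation argument: a missing gene's introns are silently counted as intergenic, so the ratio drops.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered(intervals: List[Interval]) -> int:
    """Number of bases covered by the union of the intervals."""
    return sum(end - start for start, end in merge(intervals))


def intron_intergenic_ratio(chrom_length: int, genes: List[List[Interval]]) -> float:
    """Intronic bp / intergenic bp, given one list of exon intervals per gene."""
    spans = [(min(s for s, _ in exons), max(e for _, e in exons)) for exons in genes]
    genic = covered(spans)
    intronic = genic - covered([iv for exons in genes for iv in exons])
    intergenic = chrom_length - genic
    return intronic / intergenic


gene1 = [(100, 200), (300, 400)]
gene2 = [(600, 650), (750, 800)]
full = intron_intergenic_ratio(1000, [gene1, gene2])   # 200 / 500 = 0.4
missing = intron_intergenic_ratio(1000, [gene1])       # gene2 unannotated: 100 / 700
```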
Patterns of genome size diversity in bats (order Chiroptera).

PubMed

Smith, Jillian D L; Bickham, John W; Gregory, T Ryan

2013-08-01

Despite being a group of particular interest in considering relationships between genome size and metabolic parameters, bats have not been well studied from this perspective. This study presents new estimates for 121 "microbat" species from 12 families and complements a previous study on members of the family Pteropodidae ("megabats"). The results confirm that diversity in genome size in bats is very limited even compared with other mammals, varying approximately 2-fold from 1.63 pg in Lophostoma carrikeri to 3.17 pg in Rhinopoma hardwickii and averaging only 2.35 pg ± 0.02 SE (versus 3.5 pg overall for mammals). However, contrary to some other vertebrate groups, and perhaps owing to the narrow range observed, genome size correlations were not apparent with any chromosomal, physiological, flight-related, developmental, or ecological characteristics within the order Chiroptera. Genome size is positively correlated with measures of body size in bats, though the strength of the relationships differs between pteropodids ("megabats") and nonpteropodids ("microbats").
The Peculiar Landscape of Repetitive Sequences in the Olive (Olea europaea L.) Genome

PubMed Central

Barghini, Elena; Natali, Lucia; Cossu, Rosa Maria; Giordani, Tommaso; Pindo, Massimo; Cattonaro, Federica; Scalabrin, Simone; Velasco, Riccardo; Morgante, Michele; Cavallini, Andrea

2014-01-01

Analyzing genome structure in different species provides insight into the evolution of plant genome size. Olive (Olea europaea L.) has a medium-sized haploid genome of 1.4 Gb whose structure is largely uncharacterized, despite the growing importance of this tree as an oil crop. Next-generation sequencing technologies and different computational procedures were used to study the composition of the olive genome and its repetitive fraction. A total of 2.03 and 2.3 genome equivalents of Illumina and 454 reads from genomic DNA, respectively, were assembled following different procedures, producing more than 200,000 contigs of differing redundancy with a mean length greater than 1,000 nt. Mapping Illumina reads onto the assembled sequences was used to estimate their redundancy. The genome data set was subdivided into highly redundant, medium redundant and nonredundant contigs. By combining identification and mapping of repeated sequences, it was established that tandem repeats represent a very large portion of the olive genome (∼31% of the whole genome), consisting of six main families of different length, two of which were first discovered in these experiments. The other large redundant class in the olive genome is represented by transposable elements (especially long terminal repeat retrotransposons). Overall, our analyses reveal the peculiar landscape of the olive genome, marked by a massive amplification of tandem repeats greater than that reported for any other sequenced plant genome. PMID:24671744
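Estimating redundancy by read mapping rests on a simple proportionality. A single-copy sequence is expected to be covered about as many times as there are genome equivalents of reads. A contig's mean depth divided by that coverage therefore approximates its copy number. The minimal Python sketch below assumes reads are spread evenly and that multi-mapping reads have already been handled. The function names and the class thresholds are hypothetical, not the study's.

```python
def estimated_copy_number(mapped_bases: int, contig_length: int,
                          genome_equivalents: float) -> float:
    """Copy number of a contig's sequence from read depth.

    A single-copy sequence is expected to be covered about `genome_equivalents`
    times, so observed mean depth divided by that coverage estimates copies.
    """
    depth = mapped_bases / contig_length
    return depth / genome_equivalents


def redundancy_class(copies: float, high: float = 100.0, medium: float = 10.0) -> str:
    """Bin a copy-number estimate; the default thresholds are hypothetical."""
    if copies >= high:
        return "highly redundant"
    if copies >= medium:
        return "medium redundant"
    return "nonredundant"
```

With 2.03 genome equivalents of Illumina reads, as in the study, a 1,000-nt contig receiving 203,000 mapped bases (a mean depth of 203) would be estimated at roughly 100 copies.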
Transposable element distribution, abundance and role in genome size variation in the genus Oryza.

PubMed

Zuccolo, Andrea; Sebastian, Aswathy; Talag, Jayson; Yu, Yeisoo; Kim, HyeRan; Collura, Kristi; Kudrna, Dave; Wing, Rod A

2007-08-29

The genus Oryza is composed of 10 distinct genome types, 6 diploid and 4 polyploid, and includes the world's most important food crop, rice (Oryza sativa [AA]). Genome size variation in Oryza is more than 3-fold, ranging from 357 Mbp in Oryza glaberrima [AA] to 1283 Mbp in the polyploid Oryza ridleyi [HHJJ]. Because repetitive elements are known to play a significant role in genome size variation, we constructed randomly sheared small-insert genomic libraries from 12 representative Oryza species and conducted a comprehensive study of the composition, distribution and phylogeny of repetitive elements in this genus. Particular attention was paid to the role played by the most important classes of transposable elements (Long Terminal Repeat retrotransposons, Long Interspersed Nuclear Elements, helitrons and DNA transposable elements) in shaping these genomes and in contributing to genome size variation. We identified the elements primarily responsible for the most striking genome size variation in Oryza. We demonstrated that Long Terminal Repeat retrotransposons belonging to the same families have proliferated to very different extents in various species. We also showed that the pool of Long Terminal Repeat retrotransposons is substantially conserved and ubiquitous throughout Oryza, so its origin is ancient and predates the speciation events that gave rise to the genus. Finally, we described the peculiar behavior of repeats in Oryza coarctata [HHKK], whose placement in the genus Oryza is controversial. Long Terminal Repeat retrotransposons are the major component of the Oryza genomes analyzed and, along with polyploidization, are the most important contributors to genome size variation across the genus. Two families of Ty3-gypsy elements (RIRE2 and Atlantys) account for a significant portion of the genome size variation in Oryza.
Transposable Element Genomic Fissuring in Pyrenophora teres Is Associated With Genome Expansion and Dynamics of Host–Pathogen Genetic Interactions

PubMed Central

Syme, Robert A.; Martin, Anke; Wyatt, Nathan A.; Lawrence, Julie A.; Muria-Gonzalez, Mariano J.; Friesen, Timothy L.; Ellwood, Simon R.

2018-01-01

The two forms of Pyrenophora teres, P. teres f. teres (PTT) and P. teres f. maculata (PTM), cause significant diseases in barley, but little is known about the large-scale genomic differences that may distinguish them. Comprehensive genome assemblies were constructed from long DNA reads, optical maps and genetic maps. Because repeat masking in fungal genomes influences the final gene annotations, an accurate and reproducible pipeline was developed to ensure comparability between isolates. The genomes of the two forms are highly collinear, each composed of 12 chromosomes. Genome evolution in P. teres is characterized by genome fissuring through the insertion and expansion of transposable elements (TEs), a process that isolates blocks of genic sequence. The phenomenon is particularly pronounced in PTT, which has a larger, more repetitive genome than PTM and more recent transposon activity, as measured by the frequency and size of genome fissures. PTT has a longer cultivated host association and, notably, a greater range of host–pathogen genetic interactions than other Pyrenophora spp., a property that is more closely associated with genome size than with pathogen lifestyle. The two forms possess similar complements of TE families, with Tc1/Mariner and LINE-like Tad-1 elements more abundant in PTT. Tad-1 was detectable in PTM only as vestigial fragments, and within each form, differences in genome size and the presence or absence of several TE families indicated recent lineage invasions. Gene differences between the P. teres forms are mainly associated with gene-sparse regions near or within TE-rich regions, and many of these genes have characteristics of fungal effectors. Instances of gene interruption by transposons resulting in pseudogenization were detected in PTT. In addition, both forms have a large complement of secondary metabolite gene clusters, indicating a significant capacity to produce an array of different molecules. 
This study provides genomic resources for functional genetics to help dissect the factors underlying host–pathogen interactions. PMID:29720997
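Reporting how much of a genome is repetitive depends on counting masked bases consistently: repeat annotations often overlap, and summing their lengths naively counts some bases twice. The sketch below shows one common way to avoid that, assuming repeat annotations are given as half-open `(start, end)` intervals on a single sequence. It is an illustration of interval merging, not the authors' masking pipeline, and the function names are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

def repeat_fraction(intervals, sequence_length):
    """Fraction of a sequence covered by at least one repeat annotation."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / sequence_length
```

For a multi-chromosome assembly, the covered bases would be computed per sequence and summed before dividing by the total assembly length.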
Genome size estimates for crustaceans using Feulgen image analysis densitometry of ethanol-preserved tissues.

PubMed

Jeffery, Nicholas W; Gregory, T Ryan

2014-10-01

Crustaceans are enormously diverse both phylogenetically and ecologically, but they remain substantially underrepresented in the existing genome size database. Expanding this dataset would be easier if genome size estimates could be obtained from ethanol-preserved specimens. In this study, two tests were performed to assess the reliability of genome size data generated from preserved material. First, estimates based on flash-frozen and ethanol-preserved material were compared across 37 species of crustaceans that differ widely in genome size. Second, specimens of a single species that had been stored in ethanol for 1-14 years were compared. In both cases, Feulgen image analysis densitometry of gill tissue proved to be a viable approach. This finding is directly relevant both to new studies of field-collected crustaceans and to potential studies based on existing collections. © 2014 International Society for Advancement of Cytometry.

Genome Size Variation in the Genus Carthamus (Asteraceae, Cardueae): Systematic Implications and Additive Changes During Allopolyploidization

PubMed Central

Garnatje, Teresa; Garcia, Sònia; Vilatersana, Roser; Vallès, Joan

2006-01-01

• Background and Aims Plant genome size is an important biological characteristic, with relationships to systematics, ecology and distribution. Currently, there is no information regarding nuclear DNA content for any Carthamus species. In addition to improving the knowledge base, this research focuses on interspecific variation and its implications for the infrageneric classification of this genus. Genome size variation in the process of allopolyploid formation is also addressed. • Methods Nuclear DNA content of 34 populations of 16 species of the genus Carthamus was assessed by flow cytometry using propidium iodide. • Key Results The 2C values ranged from 2.26 pg for C. leucocaulos to 7.46 pg for C. turkestanicus, and monoploid genome size (1Cx value) ranged from 1.13 pg in C. leucocaulos to 1.53 pg in C. alexandrinus. Mean genome sizes differed significantly between sections of the infrageneric classification. Both allopolyploid species (C. creticus and C. turkestanicus) had nuclear DNA contents consistent with the sum of the putative parental C-values (in one case with a slight reduction, as is frequent in polyploids), supporting their hybrid origin. • Conclusions Genome size is a useful tool for elucidating systematic relationships between closely related species. A considerable reduction in monoploid genome size, possibly associated with hybrid formation, is also reported within these taxa. PMID:16390843
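The two C-value measures above are linked through ploidy: the 2C value is the DNA content of an unreplicated somatic nucleus, and the monoploid genome size (1Cx) is that amount divided by the number of chromosome sets. For an allopolyploid formed by hybridization followed by chromosome doubling, additivity predicts a 2C value equal to the sum of the parental 2C values. The sketch below expresses both relationships; the ploidy level must be supplied by the user, and the only worked example uses C. leucocaulos, whose 2C of 2.26 pg and 1Cx of 1.13 pg imply a diploid. Function names are hypothetical.

```python
def monoploid_size(two_c_pg: float, ploidy: int) -> float:
    """1Cx value: DNA content of one chromosome set (2C divided by ploidy level)."""
    if ploidy < 1:
        raise ValueError("ploidy must be a positive integer")
    return two_c_pg / ploidy

def additivity_deviation(observed_two_c: float, parental_two_c: list[float]) -> float:
    """Percent deviation of an allopolyploid's 2C value from the sum of its
    putative parents' 2C values; negative values indicate genome downsizing."""
    expected = sum(parental_two_c)
    return 100.0 * (observed_two_c - expected) / expected

print(round(monoploid_size(2.26, 2), 2))  # 1.13
```

A small negative value from `additivity_deviation` corresponds to the "slight reduction" the abstract describes for one of the allopolyploids.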
Similar Ratios of Introns to Intergenic Sequence across Animal Genomes.

PubMed

Francis, Warren R; Wörheide, Gert

2017-06-01

One central goal of genome biology is to understand how the usage of the genome differs between organisms. Our knowledge of genome composition, needed for downstream inferences, is critically dependent on gene annotations, yet problems associated with gene annotation and assembly errors are usually ignored in comparative genomics. Here, we analyze the genomes of 68 species across 12 animal phyla and some single-cell eukaryotes for general trends in genome composition and transcription, taking into account problems of gene annotation. We show that, regardless of genome size, the ratio of introns to intergenic sequence is comparable across essentially all animals, with nearly all deviations dominated by increased intergenic sequence. Genomes of model organisms have ratios much closer to 1:1, suggesting that the majority of published genomes of nonmodel organisms are underannotated and consequently omit substantial numbers of genes, with likely negative impact on evolutionary interpretations. Finally, our results also indicate that most animals transcribe half or more of their genomes, arguing against differences in genome usage between animal groups and suggesting that the transcribed portion is more dependent on genome size than previously thought. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
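The intron-to-intergenic ratio can be computed directly from a gene annotation: within a gene, the span not covered by exons is intronic, and everything outside gene spans is intergenic. The sketch below assumes a simplified annotation in which each gene is a list of half-open `(start, end)` exon intervals on one sequence, with neither exons nor genes overlapping; real annotations with alternative transcripts or nested genes would need interval merging first. The function name is hypothetical and this is not the authors' code.

```python
def intron_intergenic_ratio(genes, sequence_length):
    """Return (intron_bp, intergenic_bp, ratio) for one sequence.

    genes: list of genes, each a list of half-open (start, end) exon
    intervals. Assumes exons within a gene, and genes themselves, do not
    overlap.
    """
    intron_bp = 0
    genic_bp = 0
    for exons in genes:
        exons = sorted(exons)
        span = exons[-1][1] - exons[0][0]   # first exon start to last exon end
        exonic = sum(end - start for start, end in exons)
        intron_bp += span - exonic          # gaps between exons are introns
        genic_bp += span
    intergenic_bp = sequence_length - genic_bp
    ratio = intron_bp / intergenic_bp if intergenic_bp else float("inf")
    return intron_bp, intergenic_bp, ratio
```

Under the abstract's argument, missing gene models shift bases from the intronic to the intergenic total, which pushes this ratio below 1:1.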
Complete chloroplast genome sequence of a major allogamous forage species, perennial ryegrass (Lolium perenne L.).

PubMed

Diekmann, Kerstin; Hodkinson, Trevor R; Wolfe, Kenneth H; van den Bekerom, Rob; Dix, Philip J; Barth, Susanne

2009-06-01

Lolium perenne L. (perennial ryegrass) is globally one of the most important forage and grassland crops. We sequenced the chloroplast (cp) genome of Lolium perenne cultivar Cashel. The L. perenne cp genome is 135 282 bp with a typical quadripartite structure. It contains genes for 76 unique proteins, 30 tRNAs and four rRNAs. As in other grasses, the genes accD, ycf1 and ycf2 are absent. The genome is of average size within its subfamily Pooideae and of medium size within the Poaceae. Genome size differences are mainly due to length variations in non-coding regions. However, considerable length differences in coding sequences were also detected: 1-27 codons between L. perenne and other Poaceae, and 1-68 codons among all Poaceae. Within the cp genome of this outcrossing cultivar, 10 insertion/deletion polymorphisms and 40 single nucleotide polymorphisms were detected. Two of the polymorphisms involve small inversions within hairpin structures. By comparing the genome sequence with RT-PCR products of transcripts for 33 genes, 31 mRNA editing sites were identified, five of them unique to Lolium. The cp genome sequence of L. perenne is available under accession number AM777385 at the European Molecular Biology Laboratory, National Center for Biotechnology Information and DNA DataBank of Japan.
Coconut genome size determined by flow cytometry: Tall versus Dwarf types.

PubMed

Freitas Neto, M; Pereira, T N S; Geronimo, I G C; Azevedo, A O N; Ramos, S R R; Pereira, M G

2016-02-11

Coconuts (Cocos nucifera L.) are tropical palm trees that are classified into Tall and Dwarf types based on height, and both types are diploid (2n = 2x = 32 chromosomes). The reproduction mode is autogamous for Dwarf types and allogamous for Tall types. One hypothesis for the origin of the Dwarf coconut suggests that it is a Tall variant that resulted from either mutation or inbreeding, and differences in genome size between the two types would support this hypothesis. In this study, we estimated the genome sizes of 14 coconut accessions (eight Tall and six Dwarf types) using flow cytometry. Nuclei were extracted from leaf discs and stained with propidium iodide, and Pisum sativum (2C = 9.07 pg DNA) was used as an internal standard. Histograms with good resolution and low coefficients of variation (2.5 to 3.2%) were obtained. The 2C DNA content ranged from 5.48 to 5.72 pg for Tall accessions and from 5.52 to 5.58 pg for Dwarf accessions. The mean genome sizes for Tall and Dwarf specimens were 5.59 and 5.55 pg, respectively. Among all accessions, Rennel Island Tall had the highest mean DNA content (5.72 pg), whereas West African Tall had the lowest (5.48 pg). The mean coconut genome size (2C = 5.57 pg, corresponding to 2723.73 Mbp per haploid set) was classified as small. Only small differences in genome size existed among the coconut accessions, suggesting that the Dwarf type did not evolve from the Tall type.
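Two calculations are implicit in the abstract: the sample's 2C value is obtained from the ratio of its G1 peak position to that of the internal standard, and the result is converted from picograms to base pairs. The sketch below assumes the commonly used conversion factor of 1 pg ≈ 978 Mbp; with that factor, the mean 2C of 5.57 pg reproduces the 2723.73 Mbp per haploid set quoted above, which suggests the same factor was used. Function names are hypothetical, and no peak positions from the study are reproduced.

```python
PG_TO_MBP = 978.0  # assumed conversion factor: 1 pg of DNA ~ 978 Mbp

def sample_2c(sample_peak: float, standard_peak: float, standard_2c_pg: float) -> float:
    """2C DNA content of a sample from the G1 peak positions (channel means)
    of the sample and of an internal standard measured in the same run,
    e.g. Pisum sativum with 2C = 9.07 pg."""
    return sample_peak / standard_peak * standard_2c_pg

def mbp_per_haploid_set(two_c_pg: float) -> float:
    """Convert a 2C value in pg to Mbp per haploid (1C) set."""
    return two_c_pg / 2 * PG_TO_MBP

print(round(mbp_per_haploid_set(5.57), 2))  # 2723.73
```

Because the standard is run in the same tube, instrument drift affects both peaks equally, which is why the ratio rather than the absolute channel position is used.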
Whole genome duplication and transposable element proliferation drive genome expansion in Corydoradinae catfishes.

PubMed

Marburger, Sarah; Alexandrou, Markos A; Taggart, John B; Creer, Simon; Carvalho, Gary; Oliveira, Claudio; Taylor, Martin I

2018-02-14

Genome size varies significantly across eukaryotic taxa, and the largest changes are typically driven by macro-mutations such as whole genome duplications (WGDs) and proliferation of repetitive elements. These two processes may affect the evolutionary potential of lineages by increasing genetic variation and changing gene expression. Here, we elucidate the evolutionary history and mechanisms underpinning genome size variation in a species-rich group of Neotropical catfishes (Corydoradinae) with extreme variation in genome size (0.6 to 4.4 pg per haploid cell). First, genome size was quantified in 65 species and mapped onto a novel fossil-calibrated phylogeny. Two evolutionary shifts in genome size were identified across the tree: the first between 43 and 49 Ma (95% highest posterior density (HPD) 36.2-68.1 Ma) and the second at approximately 19 Ma (95% HPD 15.3-30.14 Ma). Second, restriction-site-associated DNA (RAD) sequencing was used to identify potential WGD events and quantify transposable element (TE) abundance in different lineages. Evidence of two lineage-scale WGDs was identified across the phylogeny, the first event occurring between 54 and 66 Ma (95% HPD 42.56-99.5 Ma) and the second at 20-30 Ma (95% HPD 15.3-45 Ma) based on haplotype numbers per contig, and between 35 and 44 Ma (95% HPD 30.29-64.51 Ma) and 20-30 Ma (95% HPD 15.3-45 Ma) based on SNP read ratios. TE abundance increased considerably in parallel with genome size, with a single TE family (TC1-IS630-Pogo) showing several increases across the Corydoradinae, the most recent at 20-30 Ma (95% HPD 15.3-45 Ma) and an older one at 35-44 Ma (95% HPD 30.29-64.51 Ma). We identified signals congruent with two WGD events, as well as an increase in TE abundance across different lineages, making the Corydoradinae an excellent model system to study the effects of WGD and TEs on genome and organismal evolution. © 2018 The Authors.
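The "haplotype numbers per contig" signal rests on a simple expectation: a single-copy locus in a diploid individual carries at most two distinct haplotypes, so RAD loci at which an individual shows more than two point to duplicated (paralogous) copies, as expected after a WGD. The sketch below counts that signal, assuming haplotypes have already been called per individual and locus. The input format and function names are hypothetical, and it is not the authors' method; in practice, sequencing errors and high-copy repeats also inflate haplotype counts, so real analyses filter on depth and quality first.

```python
from collections import defaultdict

def max_haplotypes_per_locus(calls):
    """calls: iterable of (individual, locus, haplotype_sequence) tuples.

    Returns {locus: largest number of distinct haplotypes observed in any
    single individual at that locus}.
    """
    seen = defaultdict(set)  # (individual, locus) -> distinct haplotypes
    for individual, locus, haplotype in calls:
        seen[(individual, locus)].add(haplotype)
    result = {}
    for (_, locus), haplotypes in seen.items():
        result[locus] = max(result.get(locus, 0), len(haplotypes))
    return result

def fraction_excess_haplotype_loci(calls, ploidy=2):
    """Share of loci at which some individual carries more distinct
    haplotypes than a single-copy locus of the given ploidy allows."""
    counts = max_haplotypes_per_locus(calls)
    if not counts:
        return 0.0
    excess = sum(1 for n in counts.values() if n > ploidy)
    return excess / len(counts)
```

Comparing this fraction between lineages, rather than reading it for one species in isolation, is what would make such a signal informative about lineage-specific duplications.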
The number of genes encoding repeat domain-containing proteins positively correlates with genome size in amoebal giant viruses

PubMed Central

Shukla, Avi; Chatterjee, Anirvan

2018-01-01

Curiously, in viruses, the virion volume appears to be predominantly driven by genome length rather than by the number of proteins it encodes or by geometric constraints. With their large genomes and giant particle size, amoebal viruses (AVs) are ideally suited to study the relationship between genome and virion size and to explore the role of genome plasticity in their evolutionary success. Different genomic regions of AVs exhibit distinct genealogies. Although the vertically transferred core genes and their functions are universally conserved across the nucleocytoplasmic large DNA virus (NCLDV) families and are essential for their replication, the horizontally acquired genes are variable across families and are lineage-specific. When compared with other giant virus families, we observed a near-linear increase in the number of genes encoding repeat domain-containing proteins (RDCPs) with the increase in the genome size of AVs. Based on what is known about the functions of RDCPs in bacteria and eukaryotes and their prevalence in AV genomes, we envisage important roles for RDCPs in the life cycle of AVs, their genome expansion, and plasticity. This observation also supports the evolution of AVs from a smaller viral ancestor by the acquisition of diverse gene families from the environment, including RDCPs that might have helped in host adaptation. PMID:29308275
Challenges of flow-cytometric estimation of nuclear genome size in orchids, a plant group with both whole-genome and progressively partial endoreplication.

PubMed

Trávníček, Pavel; Ponert, Jan; Urfus, Tomáš; Jersáková, Jana; Vrána, Jan; Hřibová, Eva; Doležel, Jaroslav; Suda, Jan

2015-10-01

Nuclear genome size is an inherited quantitative trait of eukaryotic organisms with both practical and biological consequences. A detailed analysis of major families is a promising approach to fully understand the biological meaning of the extensive variation in genome size in plants. Although Orchidaceae accounts for ∼10% of angiosperm diversity, knowledge of the patterns and dynamics of orchid genome size is limited, in part due to difficulties in flow cytometric analyses. Cells in various somatic tissues of orchids undergo extensive endoreplication, either whole-genome or partial, and G1-phase nuclei with 2C DNA amounts may be lacking, resulting in overestimated genome size values. Interpretation of DNA content histograms is particularly challenging in species with progressively partial endoreplication, in which the ratio between the positions of two neighboring DNA peaks is lower than two. To assess the distributions of nuclear DNA amounts and identify tissues suitable for reliable estimation of nuclear DNA content, we analyzed six different tissue types in 48 orchid species belonging to all recognized subfamilies. Although leaves, which are traditionally used, may provide incorrect C-values, particularly in species with progressively partial endoreplication, young ovaries and pollinaria consistently yield 2C and 1C peaks of their G1-phase nuclei, respectively, and are therefore the most suitable tissues for genome size studies in orchids. We also provide new DNA C-values for 22 orchid genera and 42 species. Adhering to the proposed methodology would allow reliable genome size estimates in this largest plant family. Although our research was limited to orchids, the need to find a suitable tissue with a dominant 2C peak of G1-phase nuclei applies to all endopolyploid species. © 2015 International Society for Advancement of Cytometry.
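The diagnostic described above can be written as a check on consecutive peak positions: with whole-genome endoreplication, each peak sits at about twice the position of the previous one, whereas progressively partial endoreplication gives ratios below two. The sketch below assumes peak positions have already been extracted from a histogram; the function names and the noise `tolerance` of 0.1 are hypothetical choices, not values from the study.

```python
def peak_ratios(peaks):
    """Ratios between neighbouring DNA-content peak positions, in ascending order."""
    ordered = sorted(peaks)
    return [b / a for a, b in zip(ordered, ordered[1:])]

def looks_partial(peaks, tolerance=0.1):
    """True if any neighbouring peak ratio falls clearly below 2.

    tolerance is a hypothetical allowance for measurement noise.
    """
    return any(r < 2.0 - tolerance for r in peak_ratios(peaks))
```

A flagged histogram only indicates that leaf-derived peaks may be unreliable; it does not identify which peak is the true 2C value, which is why the abstract recommends switching to young ovaries or pollinaria.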
Patterns of genome size variation in snapping shrimp.

PubMed

Jeffery, Nicholas W; Hultgren, Kristin; Chak, Solomon Tin Chi; Gregory, T Ryan; Rubenstein, Dustin R

2016-06-01

Although crustaceans vary extensively in genome size, little is known about how genome size may affect the ecology and evolution of species in this diverse group, in part due to the lack of large genome size datasets. Here we investigate interspecific, intraspecific, and intracolony variation in genome size in 39 species of Synalpheus shrimps, representing one of the largest genome size datasets for a single genus within crustaceans. We find that genome size ranges approximately 4-fold across Synalpheus with little phylogenetic signal, and is not related to body size. In a subset of these species, genome size is related to chromosome size, but not to chromosome number, suggesting that despite large genomes, these species are not polyploid. Interestingly, there appears to be 35% intraspecific genome size variation in Synalpheus idios among geographic regions, and up to 30% variation in Synalpheus duffyi genome size within the same colony.
Chromosome Numbers and Genome Size Variation in Indian Species of Curcuma (Zingiberaceae)

PubMed Central

Leong-Škorničková, Jana; Šída, Otakar; Jarolímová, Vlasta; Sabu, Mamyil; Fér, Tomáš; Trávníček, Pavel; Suda, Jan

2007-01-01

Background and Aims Genome size and chromosome numbers are important cytological characters that significantly influence various organismal traits. However, geographical representation of these data is seriously unbalanced, with tropical and subtropical regions being largely neglected. In the present study, chromosomal and genome size variation was investigated in the majority of Curcuma species from the Indian subcontinent, and the value of these data for taxonomic purposes was assessed. Methods Genome size of 161 homogeneously cultivated plant samples classified into 51 taxonomic entities was determined by propidium iodide flow cytometry. Chromosome numbers were counted in actively growing root tips using conventional rapid squash techniques. Key Results Six different chromosome counts (2n = 22, 42, 63, >70, 77 and 105) were found, the last two representing new records for the genus. The 2C values varied from 1.66 pg in C. vamana to 4.76 pg in C. oligantha, representing a 2.87-fold range. Three groups of taxa with significantly different homoploid genome sizes (Cx values) and distinct geographical distributions were identified. Five species exhibited intraspecific variation in nuclear DNA content, reaching up to 15.1% in cultivated C. longa. Chromosome counts and genome sizes of three Curcuma-like species (Hitchenia caulina, Kaempferia scaposa and Paracautleya bhatii) corresponded well with those of typical hexaploid (2n = 6x = 42) Curcuma spp. Conclusions The basic chromosome number in the majority of Indian taxa (belonging to subgenus Curcuma) is x = 7; published counts correspond to 6x, 9x, 11x, 12x and 15x ploidy levels. Only a few species-specific C-values were found, but karyological and/or flow cytometric data may support taxonomic decisions in some species alliances with morphological similarities. Close evolutionary relationships among some cytotypes are suggested based on the similarity in homoploid genome sizes and geographical grouping. 
A new combination, Curcuma scaposa (Nimmo) Škorničk. & M. Sabu, comb. nov., is proposed. PMID:17686760
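The ploidy levels given in the Conclusions follow from dividing a somatic chromosome count by the basic number x = 7, and the fold range follows from dividing the largest 2C value by the smallest. The sketch below reproduces both from values quoted in the abstract. The imprecise count ">70" is omitted, and 2n = 22 is returned as `None` because it is not a multiple of 7, consistent with the abstract's statement that x = 7 applies to the majority (not all) of the taxa. Function names are hypothetical.

```python
def ploidy_level(two_n: int, x: int):
    """Number of basic chromosome sets, or None if 2n is not a multiple of x."""
    return two_n // x if two_n % x == 0 else None

def fold_range(values):
    """Ratio of the largest to the smallest value."""
    return max(values) / min(values)

counts = [22, 42, 63, 77, 105]
print({n: ploidy_level(n, 7) for n in counts})
# {22: None, 42: 6, 63: 9, 77: 11, 105: 15}
print(round(fold_range([1.66, 4.76]), 2))  # 2.87
```

The 12x level cited among published counts would correspond to 2n = 84, which is not among the counts found in this study.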
Genome fluctuations in cyanobacteria reflect evolutionary, developmental and adaptive traits.

PubMed

Larsson, John; Nylander, Johan Aa; Bergman, Birgitta

2011-06-30

Cyanobacteria belong to an ancient group of photosynthetic prokaryotes with pronounced variations in their cellular differentiation strategies, physiological capacities and choice of habitat. Sequencing efforts have shown that genomes within this phylum are equally diverse in terms of size and protein-coding capacity. To increase our understanding of genomic changes in the lineage, the genomes of 58 contemporary cyanobacteria were analysed for shared and unique orthologs. A total of 404 protein families, present in all cyanobacterial genomes, were identified. Two of these are unique to the phylum, corresponding to an AbrB family transcriptional regulator and a gene that escapes functional annotation although its genomic neighbourhood is conserved among the organisms examined. The evolution of cyanobacterial genome sizes involves a mix of gains and losses in the clade encompassing complex cyanobacteria, while a single event of reduction is evident in a clade dominated by unicellular cyanobacteria. Genome sizes and gene family copy numbers evolve at a higher rate in the former clade, and multi-copy genes were predominant in large genomes. Orthologs unique to cyanobacteria exhibiting specific characteristics, such as filament formation, heterocyst differentiation, diazotrophy and symbiotic competence, were also identified. An ancestral character reconstruction suggests that the most recent common ancestor of cyanobacteria had a genome size of approx. 4.5 Mbp and 1678 to 3291 protein-coding genes, 4%-6% of which are unique to cyanobacteria today. The different rates of genome-size evolution and multi-copy gene abundance suggest two routes of genome development in the history of cyanobacteria. The expansion strategy is driven by gene-family enlargement and generates a broad adaptive potential, while the genome streamlining strategy imposes adaptations to highly specific niches, also reflected in their different functional capacities. 
A few genomes display extreme proliferation of non-coding nucleotides which is likely to be the result of initial expansion of genomes/gene copy number to gain adaptive potential, followed by a shift to a life-style in a highly specific niche (e.g. symbiosis). This transition results in redundancy of genes and gene families, leading to an increase in junk DNA and eventually to gene loss. A few orthologs can be correlated with specific phenotypes in cyanobacteria, such as filament formation and symbiotic competence; these constitute exciting exploratory targets.
Genome fluctuations in cyanobacteria reflect evolutionary, developmental and adaptive traits

PubMed Central

2011-01-01

Background Cyanobacteria belong to an ancient group of photosynthetic prokaryotes with pronounced variations in their cellular differentiation strategies, physiological capacities and choice of habitat. Sequencing efforts have shown that genomes within this phylum are equally diverse in terms of size and protein-coding capacity. To increase our understanding of genomic changes in the lineage, the genomes of 58 contemporary cyanobacteria were analysed for shared and unique orthologs. Results A total of 404 protein families, present in all cyanobacterial genomes, were identified. Two of these are unique to the phylum, corresponding to an AbrB family transcriptional regulator and a gene that escapes functional annotation although its genomic neighbourhood is conserved among the organisms examined. The evolution of cyanobacterial genome sizes involves a mix of gains and losses in the clade encompassing complex cyanobacteria, while a single event of reduction is evident in a clade dominated by unicellular cyanobacteria. Genome sizes and gene family copy numbers evolve at a higher rate in the former clade, and multi-copy genes were predominant in large genomes. Orthologs unique to cyanobacteria exhibiting specific characteristics, such as filament formation, heterocyst differentiation, diazotrophy and symbiotic competence, were also identified. An ancestral character reconstruction suggests that the most recent common ancestor of cyanobacteria had a genome size of approx. 4.5 Mbp and 1678 to 3291 protein-coding genes, 4%-6% of which are unique to cyanobacteria today. Conclusions The different rates of genome-size evolution and multi-copy gene abundance suggest two routes of genome development in the history of cyanobacteria. The expansion strategy is driven by gene-family enlargement and generates a broad adaptive potential, while the genome streamlining strategy imposes adaptations to highly specific niches, also reflected in their different functional capacities. 
A few genomes display extreme proliferation of non-coding nucleotides which is likely to be the result of initial expansion of genomes/gene copy number to gain adaptive potential, followed by a shift to a life-style in a highly specific niche (e.g. symbiosis). This transition results in redundancy of genes and gene families, leading to an increase in junk DNA and eventually to gene loss. A few orthologs can be correlated with specific phenotypes in cyanobacteria, such as filament formation and symbiotic competence; these constitute exciting exploratory targets. PMID:21718514
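Identifying families "present in all cyanobacterial genomes" and the subset "unique to the phylum" amounts to two set operations on ortholog family presence: an intersection across the in-group genomes, then removal of any family found in an out-group. The sketch below assumes genomes have already been assigned to ortholog families (for example, by sequence clustering); the input format and function names are hypothetical, and this is not the study's pipeline.

```python
def core_families(genomes):
    """Families present in every genome.

    genomes: dict mapping genome name -> set of ortholog family IDs.
    """
    family_sets = list(genomes.values())
    return set.intersection(*family_sets) if family_sets else set()

def core_unique_to_group(ingroup, outgroup):
    """Core families of the in-group that are absent from every out-group genome."""
    outside = set().union(*outgroup.values())
    return core_families(ingroup) - outside
```

Because a single missing annotation removes a family from the intersection, core sets computed this way are sensitive to genome completeness, which matters when draft genomes are included.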
Global DNA cytosine methylation as an evolving trait: phylogenetic signal and correlated evolution with genome size in angiosperms

PubMed Central

Alonso, Conchita; Pérez, Ricardo; Bazaga, Pilar; Herrera, Carlos M.

2015-01-01

DNA cytosine methylation is a widespread epigenetic mechanism in eukaryotes, and plant genomes are commonly densely methylated. Genomic methylation can be associated with functional consequences such as mutational events, genomic instability or altered gene expression, but little is known about interspecific variation in global cytosine methylation in plants. In this paper, we compare global cytosine methylation estimates obtained by HPLC and use a phylogenetically informed analytical approach to test for significant evolutionary signatures of this trait across 54 angiosperm species in 25 families. We evaluate whether interspecific variation in global cytosine methylation is statistically related to phylogenetic distance and whether it is evolutionarily correlated with genome size (C-value). Global cytosine methylation varied widely between species, ranging from 5.3% (Arabidopsis) to 39.2% (Narcissus). Differences between species were related to their evolutionary trajectories, as denoted by the strong phylogenetic signal underlying interspecific variation. Global cytosine methylation and genome size were evolutionarily correlated, as revealed by the significant relationship between the corresponding phylogenetically independent contrasts. On average, a ten-fold increase in genome size entailed an increase of about 10% in global cytosine methylation. Results show that global cytosine methylation is an evolving trait in angiosperms whose evolutionary trajectory is significantly linked to changes in genome size, and suggest that the evolutionary implications of epigenetic mechanisms are likely to vary between plant lineages. PMID:25688257
Comparative genomics of the marine bacterial genus Glaciecola reveals the high degree of genomic diversity and genomic characteristic for cold adaptation.

PubMed

Qin, Qi-Long; Xie, Bin-Bin; Yu, Yong; Shu, Yan-Li; Rong, Jin-Cheng; Zhang, Yan-Jiao; Zhao, Dian-Li; Chen, Xiu-Lan; Zhang, Xi-Ying; Chen, Bo; Zhou, Bai-Cheng; Zhang, Yu-Zhong

2014-06-01

To what extent the genomes of different species belonging to one genus can be diverse, and how genomic differentiation relates to environmental factors, remain unclear for oceanic bacteria. With many new bacterial genera and species being isolated from marine environments, this question warrants attention. In this study, we sequenced all the type strains of the published species of Glaciecola, a recently defined cold-adapted genus with species from diverse marine locations, to study the genomic diversity and cold-adaptation strategy in this genus. The genome size diverged widely, from 3.08 to 5.96 Mb, which can be explained by massive gene gain and loss events. Horizontal gene transfer and new gene emergence contributed substantially to the genome size expansion. The genus Glaciecola had an open pan-genome. Comparative genomic research indicated that species of the genus Glaciecola had high diversity in genome size, gene content and genetic relatedness. Such diversity may be prevalent in marine bacterial genera, considering the dynamic and complex environments of the ocean. Species of Glaciecola had some common genomic features related to cold adaptation, which enable them to thrive and play a role in biogeochemical cycles in cold marine environments.
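An "open" pan-genome is one whose total number of gene families keeps rising as more genomes are added instead of levelling off, while the core shared by all genomes shrinks. The sketch below computes pan- and core-genome accumulation curves for one order of genomes, assuming gene families have already been assigned. Real analyses typically average over many random orderings and fit a power law (Heaps' law) to decide whether the pan-genome is open; that fitting step is not shown, and the function name is hypothetical.

```python
def accumulation_curves(genomes):
    """Pan- and core-genome sizes after adding each genome in turn.

    genomes: list of sets of gene family IDs, in the order they are added.
    Returns (pan_sizes, core_sizes).
    """
    pan, core = set(), None
    pan_sizes, core_sizes = [], []
    for families in genomes:
        pan |= families                          # union: every family seen so far
        core = set(families) if core is None else core & families
        pan_sizes.append(len(pan))
        core_sizes.append(len(core))
    return pan_sizes, core_sizes
```

A pan-genome curve that keeps climbing by a roughly constant number of new families per added genome is the pattern consistent with massive gene gain, as reported for Glaciecola.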
Impact of QTL minor allele frequency on genomic evaluation using real genotype data and simulated phenotypes in Japanese Black cattle.

PubMed

Uemoto, Yoshinobu; Sasaki, Shinji; Kojima, Takatoshi; Sugimoto, Yoshikazu; Watanabe, Toshio

2015-11-19

Genetic variance that is not captured by single nucleotide polymorphisms (SNPs) arises from imperfect linkage disequilibrium (LD) between SNPs and quantitative trait loci (QTLs), and the extent of LD between SNPs and QTLs depends on the difference in minor allele frequency (MAF) between them. To evaluate the impact of the MAF of QTLs on genomic evaluation, we performed a simulation study using real cattle genotype data. In total, 1368 Japanese Black cattle and 592,034 SNPs (Illumina BovineHD BeadChip) were used. We simulated phenotypes from real genotypes under different scenarios, varying the MAF category, QTL heritability, number of QTLs, and distribution of QTL effects. After generating true breeding values and phenotypes, we estimated QTL heritability and assessed the prediction accuracy of genomic estimated breeding values (GEBV) under different SNP densities, prediction models, and population sizes using a reference-test validation design. In this population, LD between SNPs and QTLs was higher for QTLs with high MAF than for those with low MAF. The effect of QTL MAF depended on the genetic architecture, the evaluation strategy, and the population size. Regarding genetic architecture, genomic evaluation was affected by QTL MAF in combination with QTL heritability and the distribution of QTL effects; the number of QTLs did not affect genomic evaluation once it exceeded 50. Regarding evaluation strategy, we showed that SNP density and prediction model affect heritability estimation and genomic prediction, and that this effect depends on QTL MAF. Accurate QTL heritability and GEBV were obtained using denser SNP information and a prediction model that accounted for SNPs with low and high MAFs. Regarding population size, a large sample is needed to increase the accuracy of GEBV. Overall, QTL MAF affected both heritability estimation and prediction accuracy.
Most genetic variance can be captured using denser SNPs and a prediction model that accounts for MAF, but a large sample size is needed to increase the accuracy of GEBV in all QTL MAF categories.
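The two quantities this study varies, the MAF of a locus and its LD with nearby SNPs, can be computed directly from genotype codes. A minimal sketch, assuming genotypes coded as 0/1/2 copies of one allele and using the squared Pearson correlation of genotype codes as an r² proxy (a common approximation when phase is unknown; not necessarily the authors' procedure). The function names and data are hypothetical.

```python
from statistics import mean


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0, 1 or 2 copies of the counted allele."""
    p = sum(genotypes) / (2 * len(genotypes))  # frequency of the counted allele
    return min(p, 1 - p)


def genotype_r2(a, b):
    """Squared Pearson correlation of two genotype vectors (an LD r^2 proxy)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # monomorphic locus: LD is undefined, reported as 0 here
    return cov * cov / (var_a * var_b)


# Hypothetical genotypes for one SNP and one simulated QTL in eight animals.
snp = [0, 1, 2, 1, 0, 2, 1, 0]
qtl = [0, 1, 2, 1, 0, 1, 1, 0]
print(minor_allele_frequency(snp), minor_allele_frequency(qtl), genotype_r2(snp, qtl))
```

Because r² between loci with very different allele frequencies is bounded below 1, low-MAF QTLs tend to be poorly tagged by common SNPs, which is consistent with the lower SNP-QTL LD reported above for low-MAF QTLs.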
The dynamic evolutionary history of genome size in North American woodland salamanders.

PubMed

Newman, Catherine E; Gregory, T Ryan; Austin, Christopher C

2017-04-01

The genus Plethodon is the most species-rich salamander genus in North America, and nearly half of its species face an uncertain future. It is also one of the most diverse genera in terms of genome size, with 1C values ranging from 18.2 to 69.3 pg, or 5-20 times the size of the human genome. Large genome size in salamanders results in part from the accumulation of transposable elements and is associated with various developmental and physiological traits. However, genome sizes have been reported for only 25% of Plethodon species (14 of 55). We collected genome size data for Plethodon serratus to supplement an ongoing phylogeographic study, reconstructed the evolutionary history of genome size in Plethodontidae, and inferred probable genome sizes for the 41 species lacking empirical data. The results revealed multiple genome size changes in Plethodon: genomes of western Plethodon increased, whereas those of eastern Plethodon decreased, followed by additional decreases or subsequent increases. The estimated genome size of P. serratus was 21 pg. A better understanding of variation in genome size evolution, together with genome size inferences for previously unstudied taxa, provides a foundation for future studies of the biology of plethodontid salamanders.
Draft Genomes of Anopheles cracens and Anopheles maculatus: Comparison of Simian Malaria and Human Malaria Vectors in Peninsular Malaysia

PubMed Central

Chen, Junhui; Zhong, Zhen; Jian, Jianbo; Amir, Amirah; Cheong, Fei-Wen; Sum, Jia-Siang; Fong, Mun-Yik

2016-01-01

Anopheles cracens has been incriminated as the vector of human knowlesi malaria in peninsular Malaysia. It is also a good laboratory vector of Plasmodium falciparum and P. vivax. The distribution of An. cracens overlaps with that of An. maculatus, the human malaria vector in peninsular Malaysia, which appears to be refractory to P. knowlesi infection in natural settings. Whole genome sequencing was performed on An. cracens and An. maculatus specimens collected in peninsular Malaysia. The draft genome of An. cracens was 395 Mb in size, whereas the An. maculatus draft genome was 499 Mb. Comparison with the published Malaysian An. maculatus genome suggested that the An. maculatus specimen used in this study belongs to a different geographical race. Comparative analyses highlighted the similarities and differences between An. cracens and An. maculatus, providing new insights into their biological behavior and characteristics. PMID:27347683
Genome Size Evolution in Theobroma cacao: Recent Sequencing of Two Cacao Genomes of Different Size

USDA-ARS's Scientific Manuscript database

Theobroma cacao, the source of cocoa beans for chocolate, is an important tropical agriculture commodity that is affected by a number of fungal pathogens and insect pests, as well as concerns about yield and quality. We are trying to find molecular genetic markers that are linked to disease resista...
Complete genome sequence of the English isolate of rat cytomegalovirus (Murid herpesvirus 8).

PubMed

Ettinger, Jakob; Geyer, Henriette; Nitsche, Andreas; Zimmermann, Albert; Brune, Wolfram; Sandford, Gordon R; Hayward, Gary S; Voigt, Sebastian

2012-12-01

The complete genome of the English isolate of rat cytomegalovirus (RCMV-E) was determined. RCMV-E has a 202,946-bp genome with noninverting repeats but without terminal repeats. Thus, it differs significantly in size and genomic arrangement from closely related rodent cytomegaloviruses (CMVs). To account for the differences between the rat CMV isolates of Maastricht and England, RCMV-E was classified as Murid herpesvirus 8 by the International Committee on Taxonomy of Viruses.
Genome size variation in Corchorus olitorius (Malvaceae s.l.) and its correlation with elevation and phenotypic traits.

PubMed

Benor, Solomon; Fuchs, Jörg; Blattner, Frank R

2011-07-01

In this study, we report genome size variation in Corchorus olitorius L. (Malvaceae s.l.), a crop species known for its morphological plasticity and broad geographical distribution, and in Corchorus capsularis L., the second most widely cultivated species in the genus. Flow cytometric analyses were conducted with several tissues and nuclei isolation buffers using 69 accessions of C. olitorius and 4 accessions of C. capsularis, representing different habitats and geographical origins. The mean 2C nuclear DNA content (± SD) of C. olitorius was estimated to be 0.918 ± 0.011 pg, with a minimum of 0.882 ± 0.004 pg and a maximum of 0.942 ± 0.004 pg. All studied plant materials were found to be diploid with 2n = 14. Genome size was negatively correlated with days to flowering (r = -0.29, p < 0.05) and positively correlated with seed surface area (r = 0.38, p < 0.05). Moreover, a statistically significant positive correlation was detected between genome size and growing elevation (r = 0.59, p < 0.001) in wild populations. The mean 2C nuclear DNA content of C. capsularis was estimated to be 0.802 ± 0.008 pg. Compared with other economically important crop species, the genome sizes of C. olitorius and C. capsularis are much smaller and thus closer to that of rice. These relatively small genome sizes should be an advantage for genomic and sequencing efforts in these species.
Ultrafast Comparison of Personal Genomes via Precomputed Genome Fingerprints.

PubMed

Glusman, Gustavo; Mauldin, Denise E; Hood, Leroy E; Robinson, Max

2017-01-01

We present an ultrafast method for comparing personal genomes. We transform the standard genome representation (lists of variants relative to a reference) into "genome fingerprints" via locality-sensitive hashing. The resulting genome fingerprints can be meaningfully compared even when the input data were obtained using different sequencing technologies, processed using different pipelines, represented in different data formats and relative to different reference versions. Furthermore, genome fingerprints are robust to up to 30% missing data. Because of their reduced size, computation on genome fingerprints is fast and requires little memory. For example, we could compute all-against-all pairwise comparisons among the 2504 genomes in the 1000 Genomes data set in 67 s at high quality (21 μs per comparison, on a single processor), and achieved a lower-quality approximation in just 11 s. Efficient computation enables scaling up a variety of important genome analyses, including quantifying relatedness, recognizing genomes sequenced more than once within a set, population reconstruction, and many others. The original genome representation cannot be reconstructed from its fingerprint, effectively decoupling genome comparison from genome interpretation; the method thus has significant implications for privacy-preserving genome analytics.
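The reported timings are internally consistent: 2504 genomes give 2504 × 2503 / 2 = 3,133,756 pairs, and 3,133,756 × 21 μs ≈ 66 s, in line with the 67 s quoted. To illustrate why fixed-length fingerprints make each comparison cheap, here is a minimal sketch of one generic locality-sensitive hashing scheme, MinHash over sets of variant keys. This is a swapped-in technique for illustration only, not the fingerprint construction described by the authors; the "chrom:pos:ref>alt" key format and all function names are hypothetical.

```python
import hashlib


def _seeded_hash(item: str, seed: int) -> int:
    """Deterministic 64-bit hash of `item` for hash function number `seed`."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_fingerprint(variants, num_hashes=128):
    """Fixed-length fingerprint: for each hash function, the minimum hash over all variants."""
    return [min(_seeded_hash(v, seed) for v in variants) for seed in range(num_hashes)]


def estimated_jaccard(fp_a, fp_b):
    """Fraction of agreeing slots, an estimate of |A & B| / |A | B|."""
    return sum(a == b for a, b in zip(fp_a, fp_b)) / len(fp_a)


# Hypothetical variant keys for two genomes.
genome_a = {f"1:{pos}:A>G" for pos in range(0, 1000, 10)}
genome_b = {f"1:{pos}:A>G" for pos in range(500, 1500, 10)}
fp_a, fp_b = minhash_fingerprint(genome_a), minhash_fingerprint(genome_b)
print(estimated_jaccard(fp_a, fp_b))  # the true Jaccard index here is 50 / 150
```

Once fingerprints are precomputed, each comparison touches only `num_hashes` integers regardless of how many variants a genome carries, and the variant list cannot be read back from the minima alone.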

Cell size, genome size and the dominance of Angiosperms

NASA Astrophysics Data System (ADS)

Simonin, K. A.; Roddy, A. B.

2016-12-01

Angiosperms are capable of maintaining the highest rates of photosynthetic gas exchange of all land plants. High rates of photosynthesis depend mechanistically both on efficiently transporting water to the sites of evaporation in the leaf and on regulating the loss of that water to the atmosphere as CO2 diffuses into the leaf. Angiosperm leaves are unique in their ability to sustain high fluxes of liquid- and vapor-phase water transport owing to high vein densities and numerous small stomata. Despite the many studies characterizing the anatomical and physiological adaptations that enable angiosperms to maintain high rates of photosynthesis, the mechanism explaining why they have been able to develop such high leaf vein densities and such small and abundant stomata remains incompletely understood. Here we ask whether the scaling of genome size and cell size places a fundamental constraint on the photosynthetic metabolism of land plants, and whether genome downsizing among the angiosperms directly contributed to their greater potential and realized primary productivity relative to the other major groups of terrestrial plants. Using previously published data, we show that a single relationship can predict guard cell size from genome size across the major groups of terrestrial land plants (e.g. angiosperms, conifers, cycads and ferns). Similarly, a strong positive correlation exists between genome size and both stomatal density and vein density, which together ultimately constrain maximum potential (gs,max) and operational (gs,op) stomatal conductance. Furthermore, the difference between the slopes describing the covariation of genome size with gs,max and with gs,op suggests that genome downsizing brings gs,op closer to gs,max.
Taken together, the data presented here suggest that the smaller genomes of angiosperms allow their final cell sizes to vary more widely and respond more directly to environmental conditions, and in doing so bring operational photosynthetic metabolism closer to maximum potential photosynthesis.
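"A single relationship" between genome size and guard cell size, and the "slopes" compared between gs,max and gs,op, are typically scaling (power-law) fits estimated as straight lines on log-log axes. A minimal sketch using ordinary least squares on log10-transformed values; the data are hypothetical, and the authors may have used a different regression (for example standardized major axis or a phylogenetic regression).

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**b by ordinary least squares on log10-transformed data."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((u - mx) * (w - my) for u, w in zip(lx, ly))
    sxx = sum((u - mx) ** 2 for u in lx)
    b = sxy / sxx              # scaling exponent: slope on log-log axes
    a = 10 ** (my - b * mx)    # prefactor: back-transformed intercept
    return a, b


# Hypothetical data: 1C genome size (pg) and guard cell length (micrometres).
genome_pg = [0.3, 1.2, 5.0, 20.0, 35.0]
guard_cell_um = [14.0, 22.0, 35.0, 60.0, 72.0]
a, b = fit_power_law(genome_pg, guard_cell_um)
print(f"guard cell length ~ {a:.1f} * genome_size ** {b:.2f}")
```

Comparing the exponent `b` fitted against gs,max with the one fitted against gs,op is the kind of slope comparison the abstract refers to.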
[Annotation of the mobilomes of nine teleost species].

PubMed

Gao, Bo; Shen, Dan; Chen, Cai; Wang, Saisai; Yang, Kunlun; Chen, Wei; Wang, Wei; Zhang, Li; Song, Chengyi

2018-01-25

In this study, the mobilomes of nine teleost species were annotated using bioinformatic methods. Both mobilome size and composition differed significantly among the nine species. Ranked from highest to lowest mobilome content, the species were zebrafish, medaka, tilapia, coelacanth, platyfish, cod, stickleback, tetraodon and fugu. Mobilome content and genome size were positively correlated. DNA transposons displayed high diversity and large variation among these genomes (0.50% to 38.37%) and were a major determinant of differences among teleost mobilomes; the hAT and Tc/Mariner superfamilies were the major DNA transposons. RNA transposons (retrotransposons) also exhibited high diversity: LINEs accounted for 0.53% to 5.75% of teleost genomic sequence, and 14 superfamilies were detected. L1, L2, RTE and Rex retrotransposons showed significant amplification. LTR retrotransposons showed low amplification in most species, covering less than 2% of the genome, except in zebrafish and stickleback, where they reached 5.58% and 2.51% of genome coverage, respectively. Six LTR superfamilies (Copia, DIRS, ERV, Gypsy, Ngaro and Pao) were detected, of which Gypsy showed obvious amplification. SINEs were the least amplified type: they represented 3.28% and 5.64% of genome coverage in zebrafish and coelacanth, respectively, but less than 1% in the other seven species, although the tRNA, 5S and MIR SINE families showed some amplification in certain species. This study shows that teleost mobilomes are highly diverse and variable, that mobilome content is strongly correlated with genome size variation, and that the mobilome is an important factor in determining teleost genome size.
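Figures such as "LTR elements reach 5.58% of genome coverage" are coverage fractions derived from repeat annotations, where overlapping hits must not be counted twice. A minimal sketch, assuming annotations given as hypothetical half-open (start, end) coordinates on a single sequence; real pipelines would repeat this per chromosome and per repeat class.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of a sequence covered by half-open [start, end) annotations,
    counting bases hit by overlapping annotations only once."""
    covered = 0
    run_start = run_end = None
    for start, end in sorted(intervals):
        if run_end is None or start > run_end:
            if run_end is not None:
                covered += run_end - run_start  # close the previous merged run
            run_start, run_end = start, end
        else:
            run_end = max(run_end, end)  # overlap or adjacency: extend the run
    if run_end is not None:
        covered += run_end - run_start
    return covered / genome_length


# Hypothetical LTR hits on a 1,000-bp toy sequence; the first two overlap.
ltr_hits = [(100, 180), (150, 260), (700, 720)]
print(f"{100 * covered_fraction(ltr_hits, 1000):.2f}% of the sequence")
```

Merging after sorting keeps the calculation linear in the number of hits once they are sorted, and it avoids the inflated percentages that summing raw hit lengths would give.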
Whole genome duplication and transposable element proliferation drive genome expansion in Corydoradinae catfishes

PubMed Central

Marburger, Sarah; Alexandrou, Markos A.; Creer, Simon

2018-01-01

Genome size varies significantly across eukaryotic taxa, and the largest changes are typically driven by macro-mutations such as whole genome duplications (WGDs) and the proliferation of repetitive elements. These two processes may affect the evolutionary potential of lineages by increasing genetic variation and changing gene expression. Here, we elucidate the evolutionary history and mechanisms underpinning genome size variation in a species-rich group of Neotropical catfishes (Corydoradinae) with extreme variation in genome size: 0.6 to 4.4 pg per haploid cell. First, genome size was quantified in 65 species and mapped onto a novel fossil-calibrated phylogeny. Two evolutionary shifts in genome size were identified across the tree: the first between 43 and 49 Ma (95% highest posterior density (HPD) 36.2–68.1 Ma) and the second at approximately 19 Ma (95% HPD 15.3–30.14 Ma). Second, restriction-site-associated DNA (RAD) sequencing was used to identify potential WGD events and quantify transposable element (TE) abundance in different lineages. Evidence of two lineage-scale WGDs was identified across the phylogeny: based on haplotype numbers per contig, the first occurred between 54 and 66 Ma (95% HPD 42.56–99.5 Ma) and the second at 20–30 Ma (95% HPD 15.3–45 Ma); based on SNP read ratios, they occurred between 35 and 44 Ma (95% HPD 30.29–64.51 Ma) and at 20–30 Ma (95% HPD 15.3–45 Ma). TE abundance increased considerably in parallel with genome size, with a single TE family (TC1-IS630-Pogo) showing several increases across the Corydoradinae, the most recent at 20–30 Ma (95% HPD 15.3–45 Ma) and an older one at 35–44 Ma (95% HPD 30.29–64.51 Ma). We identified signals congruent with two WGD events, as well as an increase in TE abundance across different lineages, making the Corydoradinae an excellent model system for studying the effects of WGDs and TEs on genome and organismal evolution. PMID:29445022
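The SNP read-ratio signal rests on a simple expectation: at a heterozygous site in a diploid, the alternative allele should appear in about half of the reads, whereas a tetraploid can also produce ratios near 1/4 and 3/4. A minimal sketch, assuming per-site read counts at heterozygous biallelic SNPs and a hypothetical tolerance, tallies which expected ratio each site lies closest to; it illustrates the idea only and is not the authors' pipeline.

```python
from collections import Counter

EXPECTED = {"1/4": 0.25, "1/2": 0.5, "3/4": 0.75}


def classify_allele_ratios(sites, tolerance=0.08):
    """Count sites whose alt-allele read fraction lies near 1/4, 1/2 or 3/4.

    sites: iterable of (ref_reads, alt_reads) at heterozygous biallelic SNPs.
    Sites farther than `tolerance` from every expected ratio count as 'other'.
    """
    counts = Counter()
    for ref, alt in sites:
        depth = ref + alt
        if depth == 0:
            continue  # no reads: nothing to classify
        frac = alt / depth
        label, target = min(EXPECTED.items(), key=lambda kv: abs(frac - kv[1]))
        counts[label if abs(frac - target) <= tolerance else "other"] += 1
    return counts


# Hypothetical read counts for a handful of SNPs.
print(classify_allele_ratios([(10, 10), (15, 5), (5, 15), (30, 30), (18, 2)]))
```

An excess of sites near 1/4 and 3/4, relative to what sequencing noise around 1/2 would produce in a diploid, is the kind of pattern that points to a duplicated genome.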
Reference-free comparative genomics of 174 chloroplasts.

PubMed

Kua, Chai-Shian; Ruan, Jue; Harting, John; Ye, Cheng-Xi; Helmus, Matthew R; Yu, Jun; Cannon, Charles H

2012-01-01

Direct analysis of unassembled genomic data could greatly increase the power of short read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxonomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs can be separated into two major classes: tip = unique to a single genome and group = shared by a subset of genomes. Prior to assembly, we found that ~18% of the chloroplast was duplicated in the inverted repeat (IR) region across a four-fold difference in genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single-copy and duplicated sequence was basal among green plants, independent of photosynthesis and mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants demonstrated an expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained a great deal more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly and in different ways than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including among several species without the IR regions, indicating a crucial functional role of this duplication. 
Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied genomes and rapid discovery of informative candidate regions.
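The tip/group distinction can be made directly on k-mer sets before any assembly. A minimal sketch, assuming genomes supplied as plain strings and ignoring reverse complements (a real analysis would likely use canonical k-mers); k-mers present in every genome are dropped here as uninformative, which is an assumption of this sketch. All names and sequences are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_kmers(genomes, k):
    """Split k-mers into 'tip' (found in exactly one genome) and
    'group' (found in more than one, but not all, genomes)."""
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)
    tip, group = {}, {}
    for kmer, names in owners.items():
        if len(names) == 1:
            tip[kmer] = next(iter(names))
        elif len(names) < len(genomes):
            group[kmer] = frozenset(names)
    return tip, group


# Hypothetical toy "chloroplasts".
tip, group = classify_kmers({"a": "AAAC", "b": "AAAG", "c": "GAAC"}, 3)
print(tip)
print(group)
```

Only the informative k-mers returned here would seed the localized de novo contigs, which is what keeps the comparison small.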
Reference-Free Comparative Genomics of 174 Chloroplasts

PubMed Central

Kua, Chai-Shian; Ruan, Jue; Harting, John; Ye, Cheng-Xi; Helmus, Matthew R.; Yu, Jun; Cannon, Charles H.

2012-01-01

Direct analysis of unassembled genomic data could greatly increase the power of short read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxonomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs can be separated into two major classes: tip = unique to a single genome and group = shared by a subset of genomes. Prior to assembly, we found that ∼18% of the chloroplast was duplicated in the inverted repeat (IR) region across a four-fold difference in genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single-copy and duplicated sequence was basal among green plants, independent of photosynthesis and mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants demonstrated an expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained a great deal more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly and in different ways than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including among several species without the IR regions, indicating a crucial functional role of this duplication. 
Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied genomes and rapid discovery of informative candidate regions. PMID:23185288
The Small Nuclear Genomes of Selaginella Are Associated with a Low Rate of Genome Size Evolution.

PubMed

Baniaga, Anthony E; Arrigo, Nils; Barker, Michael S

2016-06-03

The haploid nuclear genome size (1C DNA) of vascular land plants varies over several orders of magnitude. Much of this observed diversity in genome size is due to the proliferation and deletion of transposable elements. To date, all vascular land plant lineages with extremely small nuclear genomes represent recently derived states, having ancestors with much larger genome sizes. The Selaginellaceae represent an ancient lineage with extremely small genomes. It is unclear how small nuclear genomes evolved in Selaginella. We compared the rates of nuclear genome size evolution in Selaginella and major vascular plant clades in a comparative phylogenetic framework. For the analyses, we collected 29 new flow cytometry estimates of haploid genome size in Selaginella to augment publicly available data. Selaginella possesses some of the smallest known haploid nuclear genome sizes, as well as the lowest rate of genome size evolution observed across all vascular land plants included in our analyses. Additionally, our analyses provide strong support for a history of haploid nuclear genome size stasis in Selaginella. Our results indicate that Selaginella, similar to other early diverging lineages of vascular land plants, has relatively low rates of genome size evolution. Further, our analyses highlight that a rapid transition to a small genome size is only one route to an extremely small genome. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
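A "rate of genome size evolution" is commonly expressed as a Brownian-motion rate parameter (σ²) fitted to log-transformed genome sizes on a phylogeny. A minimal sketch using Felsenstein's phylogenetically independent contrasts, with σ² estimated as the mean squared standardized contrast; this simple estimator is swapped in for illustration, the abstract does not say which method the authors used, and the nested-dict tree encoding is hypothetical.

```python
import math


def contrasts(node):
    """Felsenstein's independent contrasts on a bifurcating tree.

    A leaf is {"value": x, "length": v}; an internal node is
    {"children": [left, right], "length": v}. Branch lengths must be > 0
    (except the root's). Returns (node value estimate, effective branch
    length, list of standardized contrasts).
    """
    if "value" in node:
        return node["value"], node["length"], []
    (x1, v1, c1), (x2, v2, c2) = (contrasts(child) for child in node["children"])
    contrast = (x1 - x2) / math.sqrt(v1 + v2)
    x = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)  # variance-weighted mean of children
    v = node["length"] + v1 * v2 / (v1 + v2)     # branch lengthened for estimation error
    return x, v, c1 + c2 + [contrast]


def brownian_rate(tree):
    """Estimate sigma^2 as the mean squared standardized contrast."""
    _, _, cs = contrasts(tree)
    return sum(c * c for c in cs) / len(cs)


# Hypothetical clade: log10(1C pg) values at three tips.
clade = {"length": 0.0, "children": [
    {"length": 1.0, "children": [{"value": -0.92, "length": 1.0},
                                 {"value": -0.88, "length": 1.0}]},
    {"value": -0.85, "length": 2.0},
]}
print(brownian_rate(clade))
```

Comparing such σ² estimates between clades is one way to express a lower rate of genome size evolution in one lineage than in others.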
Joint scaling laws in functional and evolutionary categories in prokaryotic genomes

PubMed Central

Grilli, J.; Bassetti, B.; Maslov, S.; Cosentino Lagomarsino, M.

2012-01-01

We propose and study a class-expansion/innovation/loss model of genome evolution that takes into account the biological roles of genes and their constituent domains. In our model, the numbers of genes in different functional categories are coupled to each other. For example, an increase in the number of metabolic enzymes in a genome is usually accompanied by the addition of new transcription factors regulating these enzymes. Such coupling can be thought of as a proportional ‘recipe’ for genome composition of the type ‘a spoonful of sugar for each egg yolk’. The model jointly reproduces two known empirical laws: the distribution of family sizes and the non-linear scaling of the number of genes in certain functional categories (e.g. transcription factors) with genome size. In addition, it allows us to derive a novel relation between the exponents characterizing these two scaling laws, establishing a direct quantitative connection between evolutionary and functional categories. It predicts that functional categories that grow faster than linearly with genome size are characterized by flatter-than-average family size distributions. This relation is confirmed by our bioinformatic analysis of prokaryotic genomes, which shows that the joint quantitative trends of functional and evolutionary classes can be understood in terms of evolutionary growth with proportional recipes. PMID:21937509
Intra-specific variation in genome size in maize: cytological and phenotypic correlates

PubMed Central

Realini, María Florencia; Poggio, Lidia; Cámara-Hernández, Julián; González, Graciela Esther

2016-01-01

Genome size variation accompanies the diversification and evolution of many plant species. Relationships between DNA amount and phenotypic and cytological characteristics form the basis of most hypotheses that ascribe a biological role to genome size. The goal of the present research was to investigate intra-specific variation in DNA content in maize populations from Northeastern Argentina and to further explore the relationship between genome size and the phenotypic traits seed weight and length of the vegetative cycle. Moreover, cytological parameters such as the percentage of heterochromatin and the number, position and sequence composition of knobs were analysed, and their relationships with 2C DNA values were explored. The populations analysed presented significant differences in 2C DNA amount, from 4.62 to 6.29 pg, an inter-populational variation of 36.15 %. Intra-populational genome size variation was also found, ranging from 1.08- to 1.63-fold. Variation in the percentage of knob heterochromatin and in the number, chromosomal position and sequence composition of knobs was detected among and within populations. Although a positive relationship between genome size and the percentage of heterochromatin was observed, the correlation was not significant. This indicates that other non-coding repetitive DNA sequences contribute to genome size variation. Although a positive relationship between DNA amount and seed weight has been reported in a large number of species, this relationship was not found in the populations studied here. The length of the vegetative cycle showed a positive correlation with the percentage of heterochromatin. This result suggests an adaptive role for heterochromatin, since the length of this cycle could be optimized via selection for an appropriate percentage of heterochromatin. PMID:26644343
Reassessment of the Genome Size in Elaeis guineensis and Elaeis oleifera, and Its Interspecific Hybrid

PubMed Central

Camillo, Julceia; Leão, André P; Alves, Alexandre A; Formighieri, Eduardo F; Azevedo, Ana LS; Nunes, Juliana D; de Capdeville, Guy; de A Mattos, Jean K; Souza, Manoel T

2014-01-01

Aiming to generate a comprehensive genomic database on Elaeis spp., our group is leading several R&D initiatives with Elaeis guineensis (African oil palm) and Elaeis oleifera (American oil palm), including whole-genome sequencing of the latter. Genome size estimates currently available for this genus are controversial, as they indicate that the American oil palm genome is about half the size of the African oil palm genome and that the genome of the interspecific hybrid is bigger than both parental genomes. We estimated the genome size of three E. guineensis genotypes, five E. oleifera genotypes, and two interspecific hybrid genotypes. On average, the genome size of E. guineensis is 4.32 ± 0.173 pg, while that of E. oleifera is 4.43 ± 0.018 pg. This indicates that the two genomes are similar in size, although that of E. oleifera is slightly larger. As expected, the hybrid genome size, 4.40 ± 0.016 pg, is close to the average of the two parental genomes. Additionally, we show that both species have a GC content of around 38%. As our results contradict the currently available data on Elaeis spp. genome sizes, we propose that the actual genome size of the Elaeis species is around 4 pg and that the American oil palm possesses a larger genome than the African oil palm. PMID:26203259
Ultrafast Comparison of Personal Genomes via Precomputed Genome Fingerprints

PubMed Central

Glusman, Gustavo; Mauldin, Denise E.; Hood, Leroy E.; Robinson, Max

2017-01-01

We present an ultrafast method for comparing personal genomes. We transform the standard genome representation (lists of variants relative to a reference) into “genome fingerprints” via locality-sensitive hashing. The resulting genome fingerprints can be meaningfully compared even when the input data were obtained using different sequencing technologies, processed using different pipelines, represented in different data formats and relative to different reference versions. Furthermore, genome fingerprints are robust to up to 30% missing data. Because of their reduced size, computation on genome fingerprints is fast and requires little memory. For example, we could compute all-against-all pairwise comparisons among the 2504 genomes in the 1000 Genomes data set in 67 s at high quality (21 μs per comparison, on a single processor), and achieved a lower-quality approximation in just 11 s. Efficient computation enables scaling up a variety of important genome analyses, including quantifying relatedness, recognizing genomes sequenced more than once within a set, population reconstruction, and many others. The original genome representation cannot be reconstructed from its fingerprint, effectively decoupling genome comparison from genome interpretation; the method thus has significant implications for privacy-preserving genome analytics. PMID:29018478
Integration of multi-omics data of a genome-reduced bacterium: Prevalence of post-transcriptional regulation and its correlation with protein abundances

PubMed Central

Chen, Wei-Hua; van Noort, Vera; Lluch-Senar, Maria; Hennrich, Marco L.; H. Wodke, Judith A.; Yus, Eva; Alibés, Andreu; Roma, Guglielmo; Mende, Daniel R.; Pesavento, Christina; Typas, Athanasios; Gavin, Anne-Claude; Serrano, Luis; Bork, Peer

2016-01-01

We developed a comprehensive resource for the genome-reduced bacterium Mycoplasma pneumoniae comprising 1748 consistently generated ‘-omics’ data sets, and used it to quantify the power of antisense non-coding RNAs (ncRNAs), lysine acetylation, and protein phosphorylation in predicting protein abundance (11%, 24% and 8%, respectively). These factors taken together are four times more predictive of the proteome abundance than of mRNA abundance. In bacteria, post-translational modifications (PTMs) and ncRNA transcription were both found to increase with decreasing genomic GC-content and genome size. Thus, the evolutionary forces constraining genome size and GC-content modify the relative contributions of the different regulatory layers to proteome homeostasis, and impact more genomic and genetic features than previously appreciated. Indeed, these scaling principles will enable us to develop more informed approaches when engineering minimal synthetic genomes. PMID:26773059
Genome size evolution in Ontario ferns (Polypodiidae): evolutionary correlations with cell size, spore size, and habitat type and an absence of genome downsizing.

PubMed

Henry, Thomas A; Bainard, Jillian D; Newmaster, Steven G

2014-10-01

Genome size is known to correlate with a number of traits in angiosperms, but less is known about the phenotypic correlates of genome size in ferns. We explored genome size variation in relation to a suite of morphological and ecological traits in ferns. Thirty-six fern taxa were collected from wild populations in Ontario, Canada. 2C DNA content was measured using flow cytometry. We tested for genome downsizing following polyploidy using a phylogenetic comparative analysis to explore the correlation between 1Cx DNA content and ploidy. There was no compelling evidence for the occurrence of widespread genome downsizing during the evolution of Ontario ferns. The relationship between genome size and 11 morphological and ecological traits was explored using a phylogenetic principal component regression analysis. Genome size was found to be significantly associated with cell size, spore size, spore type, and habitat type. These results are timely as past and recent studies have found conflicting support for the association between ploidy/genome size and spore size in fern polyploid complexes; this study represents the first comparative analysis of the trend across a broad taxonomic group of ferns.
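The downsizing test relies on the monoploid DNA content, 1Cx = 2C / ploidy level (for example, a tetraploid 2C value is divided by 4). If polyploids simply added whole genomes, 1Cx would stay constant across ploidy levels; a decline indicates downsizing. A minimal sketch with hypothetical values, ignoring the phylogenetic correction the authors applied:

```python
def monoploid_size(two_c_pg, ploidy):
    """1Cx DNA content (pg) from a 2C value and the ploidy level (e.g. 4 for 4x)."""
    return two_c_pg / ploidy


def downsizing_ratio(diploid_2c, polyploid_2c, polyploid_ploidy):
    """Polyploid 1Cx relative to diploid 1Cx; values below 1 suggest downsizing."""
    return monoploid_size(polyploid_2c, polyploid_ploidy) / monoploid_size(diploid_2c, 2)


# Hypothetical diploid (2C = 5.0 pg) and tetraploid (2C = 9.6 pg) relatives.
print(downsizing_ratio(5.0, 9.6, 4))
```

A ratio close to 1 across many diploid/polyploid pairs would match the finding that there was no compelling evidence of widespread downsizing in these ferns.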
Genome size of 14 species of fireflies (Insecta, Coleoptera, Lampyridae)

PubMed Central

Liu, Gui-Chun; Dong, Zhi-Wei; He, Jin-Wu; Zhao, Ruo-Ping; Wang, Wen; Li, Xue-Yan

2017-01-01

Eukaryotic genome size data are important both as the basis for comparative research into genome evolution and as estimators of the cost and difficulty of genome sequencing programs for non-model organisms. In this study, the genome sizes of 14 species of fireflies (Lampyridae) (two genera in Lampyrinae, three genera in Luciolinae, and one genus in subfamily incertae sedis) were estimated by propidium iodide (PI)-based flow cytometry. The haploid genome sizes of Lampyridae ranged from 0.42 to 1.31 pg, a 3.1-fold span. Genome sizes varied within the tested subfamilies and genera. Lamprigera and Pyrocoelia species had large and small genome sizes, respectively. No correlation was found between genome size and morphological traits such as body length, body width, eye width, and antennal length. Our data provide additional information on genome size estimation in the firefly family Lampyridae. Furthermore, this study will help clarify the cost and difficulty of genome sequencing programs for non-model organisms and will help promote studies on firefly genome evolution. PMID:29280364
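Flow cytometry reports DNA content in picograms, whereas sequencing projects are planned in base pairs. A minimal sketch using the commonly used conversion factor 1 pg ≈ 978 Mbp, applied to the haploid range reported here; it also reproduces the 3.1-fold span.

```python
MBP_PER_PG = 978.0  # commonly used conversion factor: 1 pg of DNA ~ 978 Mbp


def pg_to_mbp(pg):
    """Convert a DNA content in picograms to megabase pairs."""
    return pg * MBP_PER_PG


low, high = 0.42, 1.31  # haploid genome size range reported for Lampyridae (pg)
print(f"{pg_to_mbp(low):.0f}-{pg_to_mbp(high):.0f} Mbp, {high / low:.1f}-fold span")
# prints: 411-1281 Mbp, 3.1-fold span
```

Multiplying the Mbp figure by a target sequencing depth gives a first estimate of the raw sequence needed for each species.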
Exploration of the Drosophila buzzatii transposable element content suggests underestimation of repeats in Drosophila genomes.

PubMed

Rius, Nuria; Guillén, Yolanda; Delprat, Alejandra; Kapusta, Aurélie; Feschotte, Cédric; Ruiz, Alfredo

2016-05-10

Many new Drosophila genomes have been sequenced in recent years using new-generation sequencing platforms and assembly methods. Transposable elements (TEs), being repetitive sequences, are often misassembled, especially in genomes sequenced with short reads. Consequently, the mobile fraction of many of the new genomes has not been analyzed in detail or compared with that of other genomes sequenced with different methods, which could shed light on genome and TE evolution. Here we compare the TE content of three genomes: D. buzzatii st-1, D. buzzatii j-19, and D. mojavensis. We have sequenced a new D. buzzatii genome (j-19) that complements the previously published D. buzzatii reference genome (st-1), and compared their TE contents with that of D. mojavensis. We found an underestimation of TE sequences in Drosophila NGS genomes compared with Sanger genomes. To be able to compare genomes sequenced with different technologies, we developed a coverage-based method and applied it to the D. buzzatii st-1 and j-19 genomes. Between 10.85 and 11.16 % of the D. buzzatii st-1 genome is made up of TEs and between 7 and 7.5 % of the D. buzzatii j-19 genome, while TEs represent 15.35 % of the D. mojavensis genome. Helitrons are the most abundant order in the three genomes. TEs are less abundant in D. buzzatii than in D. mojavensis, as expected from the positive correlation between genome size and TE content. However, TEs alone do not explain the genome size difference. TEs accumulate in the dot chromosomes and in proximal regions of D. buzzatii and D. mojavensis chromosomes. We also report a significantly higher TE density on the D. buzzatii and D. mojavensis X chromosomes, which is not expected under current models. Our easy-to-use correction method allowed us to identify recently active families in D. buzzatii st-1 belonging to the LTR-retrotransposon superfamily Gypsy.
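Coverage-based approaches estimate repeat content from read depth rather than from assembled copies, because a repeat collapsed into one copy in a short-read assembly still attracts reads from every genomic copy. A minimal sketch of that general idea, assuming hypothetical inputs (read bases aligned to each TE family and the mean read depth over single-copy regions); it is not the authors' correction method, whose details are not given in the abstract.

```python
def te_content_from_coverage(te_aligned_bases, single_copy_depth, genome_size):
    """Estimate genomic bp and genome fraction contributed by each TE family.

    te_aligned_bases: {family: total read bases aligned to that family}
    single_copy_depth: mean read depth over single-copy regions
    genome_size: genome size in bp
    """
    estimates = {}
    for family, bases in te_aligned_bases.items():
        genomic_bp = bases / single_copy_depth  # total length of all copies in the genome
        estimates[family] = (genomic_bp, genomic_bp / genome_size)
    return estimates


# Hypothetical numbers for a 150-Mbp genome sequenced at 30x single-copy depth.
aligned = {"Helitron": 180_000_000, "Gypsy": 90_000_000}
for family, (bp, frac) in te_content_from_coverage(aligned, 30.0, 150_000_000).items():
    print(f"{family}: {bp / 1e6:.1f} Mbp ({100 * frac:.1f}% of genome)")
```

Dividing by single-copy depth converts read support into an estimated number of genomic bases, so collapsed copies are counted again rather than once.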
Genome size analyses of Pucciniales reveal the largest fungal genomes.

PubMed

Tavares, Sílvia; Ramos, Ana Paula; Pires, Ana Sofia; Azinheira, Helena G; Caldeirinha, Patrícia; Link, Tobias; Abranches, Rita; Silva, Maria do Céu; Voegele, Ralf T; Loureiro, João; Talhinhas, Pedro

2014-01-01

Rust fungi (Basidiomycota, Pucciniales) are biotrophic plant pathogens that exhibit diverse life-cycle complexities and host ranges. The completion of genome sequencing of a few rust fungi has revealed the occurrence of large genomes. Sequencing efforts for other rust fungi have been hampered by uncertainty concerning their genome sizes. Flow cytometry was recently applied to estimate the genome sizes of a few rust fungi and confirmed the occurrence of large genomes in this order (averaging 225.3 Mbp, compared with 49.9 Mbp for Basidiomycota and 37.7 Mbp for all fungi). In this work, we used an innovative and simple approach that simultaneously isolates nuclei from the rust and its host plant to estimate the genome sizes of 30 rust species by flow cytometry. Genome sizes varied over 10-fold, from 70 to 893 Mbp, with an average of 380.2 Mbp. Compared with the genome sizes of over 1800 fungi, Gymnosporangium confusum possesses the largest fungal genome ever reported (893.2 Mbp). Moreover, even the smallest rust genome determined in this study is larger than the vast majority (94%) of fungal genomes. The average genome size of the Pucciniales is now 305.5 Mbp, while the average Basidiomycota genome size has shifted to 70.4 Mbp and the average for all fungi has reached 44.2 Mbp. No correlation was found between genome size and either the phylogenomic position or the life cycle of rust fungi; nevertheless, rusts with Fabaceae hosts have clearly larger genomes than those with Poaceae hosts. Although this study covers only a small fraction of the more than 7000 described rust species, it already seems evident that genome size expansion could be a common characteristic of the Pucciniales. This contrasts sharply with sister taxa and places this order in a relevant position in fungal genomics research.
Genome size analyses of Pucciniales reveal the largest fungal genomes

PubMed Central

Tavares, Sílvia; Ramos, Ana Paula; Pires, Ana Sofia; Azinheira, Helena G.; Caldeirinha, Patrícia; Link, Tobias; Abranches, Rita; Silva, Maria do Céu; Voegele, Ralf T.; Loureiro, João; Talhinhas, Pedro

2014-01-01

Rust fungi (Basidiomycota, Pucciniales) are biotrophic plant pathogens that exhibit diverse life-cycle complexities and host ranges. The completion of genome sequencing of a few rust fungi has revealed the occurrence of large genomes. Sequencing efforts for other rust fungi have been hampered by uncertainty concerning their genome sizes. Flow cytometry was recently applied to estimate the genome sizes of a few rust fungi and confirmed the occurrence of large genomes in this order (averaging 225.3 Mbp, compared with 49.9 Mbp for Basidiomycota and 37.7 Mbp for all fungi). In this work, we used an innovative and simple approach that simultaneously isolates nuclei from the rust and its host plant to estimate the genome sizes of 30 rust species by flow cytometry. Genome sizes varied over 10-fold, from 70 to 893 Mbp, with an average of 380.2 Mbp. Compared with the genome sizes of over 1800 fungi, Gymnosporangium confusum possesses the largest fungal genome ever reported (893.2 Mbp). Moreover, even the smallest rust genome determined in this study is larger than the vast majority (94%) of fungal genomes. The average genome size of the Pucciniales is now 305.5 Mbp, while the average Basidiomycota genome size has shifted to 70.4 Mbp and the average for all fungi has reached 44.2 Mbp. No correlation was found between genome size and either the phylogenomic position or the life cycle of rust fungi; nevertheless, rusts with Fabaceae hosts have clearly larger genomes than those with Poaceae hosts. Although this study covers only a small fraction of the more than 7000 described rust species, it already seems evident that genome size expansion could be a common characteristic of the Pucciniales. This contrasts sharply with sister taxa and places this order in a relevant position in fungal genomics research. PMID:25206357
Genome size expansion and the relationship between nuclear DNA content and spore size in the Asplenium monanthes fern complex (Aspleniaceae)

PubMed Central

2013-01-01

Background: Homosporous ferns are distinctive amongst the land plant lineages for their high chromosome numbers and enigmatic genomes. Genome size measurements are an underexploited tool in homosporous ferns and show great potential to provide an overview of the mechanisms that define genome evolution in these ferns. The aim of this study is to investigate the evolution of genome size, and the relationship between genome size and spore size, within the apomictic Asplenium monanthes fern complex and related lineages. Results: Comparative analyses testing for a relationship between spore size and genome size show that the two are not correlated. The data do, however, provide evidence for marked genome size variation between species in this group. These results indicate that Asplenium monanthes has undergone a two-fold expansion in genome size. Conclusions: Our findings challenge the widely held assumption that spore size can be used to infer ploidy levels within apomictic fern complexes. We argue that the observed genome size variation has likely arisen via increases both in chromosome number, due to polyploidy, and in chromosome size, due to amplification of repetitive DNA (e.g. transposable elements, especially retrotransposons). However, to date the latter has not been considered an important process of genome evolution within homosporous ferns. We infer that genome evolution, at least in some homosporous fern lineages, is a more dynamic process than existing studies would suggest. PMID:24354467
No evidence that sex and transposable elements drive genome size variation in evening primroses.

PubMed

Ågren, J Arvid; Greiner, Stephan; Johnson, Marc T J; Wright, Stephen I

2015-04-01

Genome size varies dramatically across species, but despite an abundance of attention there is little agreement on the relative contributions of selective and neutral processes in governing this variation. The rate of sex can potentially play an important role in genome size evolution because of its effect on the efficacy of selection and on the transmission of transposable elements (TEs). Here, we used a phylogenetic comparative approach and whole genome sequencing to investigate the contribution of sex and TE content to genome size variation in the evening primrose (Oenothera) genus. We determined genome size using flow cytometry for 30 species that vary in genetic system and found that variation in sexual/asexual reproduction cannot explain the almost twofold variation in genome size. Moreover, using whole genome sequences of three species of varying genome sizes and reproductive systems, we found that genome size was not associated with TE abundance; instead, the larger genomes had a higher abundance of simple sequence repeats. Although it has long been clear that sexual reproduction may affect various aspects of genome evolution in general and TE evolution in particular, it does not appear to have played a major role in genome size evolution in the evening primroses. © 2015 The Author(s).
Anthocyanin inhibits propidium iodide DNA fluorescence in Euphorbia pulcherrima: implications for genome size variation and flow cytometry.

PubMed

Bennett, Michael D; Price, H James; Johnston, J Spencer

2008-04-01

Measuring genome size by flow cytometry assumes direct proportionality between nuclear DNA staining and DNA amount. By 1997 it was recognized that secondary metabolites may affect DNA staining, thereby causing inaccuracy. Here, experiments are reported with poinsettia (Euphorbia pulcherrima), which has green leaves and red bracts rich in phenolics. DNA content was estimated as the fluorescence of propidium iodide (PI)-stained nuclei of poinsettia and/or pea (Pisum sativum) using flow cytometry. Tissue was chopped, or two tissues were co-chopped, in Galbraith buffer alone or with six concentrations of cyanidin-3-rutinoside (a cyanidin-3-rhamnoglucoside contributing to red coloration in poinsettia). There were large differences in PI staining (35-70%) between 2C nuclei from green leaf and red bract tissue in poinsettia. These largely disappeared when pea leaflets were co-chopped with poinsettia tissue as an internal standard. However, smaller differences (2.8-6.9%) remained, and red bracts gave significantly lower 1C genome size estimates (1.69-1.76 pg) than green leaves (1.81 pg). Chopping pea or poinsettia tissue in buffer with 0-200 µM cyanidin-3-rutinoside showed that the effects of the natural inhibitors in red poinsettia bracts on PI staining were largely reproduced, in a dose-dependent way, by this anthocyanin. Given their near-ubiquitous distribution, many suspected roles and known effects on DNA staining, anthocyanins are a potent, potential cause of significant error variation in genome size estimates for many plant tissues and taxa, with wide practical and theoretical implications. When choosing genome size calibration standards, it seems prudent to select materials that produce little or no anthocyanin.
A review of the literature identifies clear examples in which claims of intraspecific variation in genome size are probably artefacts, caused by natural variation in anthocyanin levels or correlated with environmental factors known to induce variation in pigmentation.
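The proportionality assumption above is what makes the usual internal-standard calculation work: the sample's DNA content is the ratio of its G0/G1 peak position to the standard's peak position, multiplied by the standard's known DNA content. A minimal sketch follows; the peak channels and the standard's 2C value are hypothetical placeholders, and 978 Mbp per pg is the widely used mass-to-base-pair conversion factor. It also shows why a staining inhibitor that dims only the sample biases the estimate by the same fraction, whereas a standard co-chopped in the same buffer is exposed to the inhibitor too, which is consistent with the abstract's finding that co-chopping removed most, but not all, of the difference.

```python
MBP_PER_PG = 978.0  # widely used conversion: 1 pg of DNA is about 978 Mbp


def sample_2c_pg(sample_peak, standard_peak, standard_2c_pg):
    """2C DNA content of the sample from G0/G1 peak positions.

    Valid only if fluorescence is directly proportional to DNA amount
    in both the sample and the standard nuclei.
    """
    return sample_peak / standard_peak * standard_2c_pg


def one_c_mbp(two_c_pg):
    """Convert a 2C value in pg to a 1C genome size in Mbp."""
    return two_c_pg / 2 * MBP_PER_PG


# Hypothetical peak channels and standard value, for illustration only.
clean = sample_2c_pg(sample_peak=80.0, standard_peak=200.0, standard_2c_pg=9.0)
# If an inhibitor dims only the sample's staining by 5%, the estimate drops by 5%.
inhibited = sample_2c_pg(sample_peak=80.0 * 0.95, standard_peak=200.0, standard_2c_pg=9.0)
print(round(clean, 3), round(inhibited, 3), round(one_c_mbp(clean), 1))  # 3.6 3.42 1760.4
```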
The complete mitochondrial genomes of two band-winged grasshoppers, Gastrimargus marmoratus and Oedaleus asiaticus

PubMed Central

Ma, Chuan; Liu, Chunxiang; Yang, Pengcheng; Kang, Le

2009-01-01

Background: The two closely related species of band-winged grasshoppers, Gastrimargus marmoratus and Oedaleus asiaticus, display significant differences in distribution, biological characteristics and habitat preferences. They are so similar to their respective congeneric species that it is difficult to differentiate them from other species within each genus. Hoppers of the two species have morphologies quite similar to that of Locusta migratoria, which causes confusion in species identification. We therefore determined and compared the mitochondrial genomes of G. marmoratus and O. asiaticus to address these questions. Results: The complete mitochondrial genomes of G. marmoratus and O. asiaticus are 15,924 bp and 16,259 bp in size, respectively, the latter being the largest of all known mitochondrial genomes in Orthoptera. Both mitochondrial genomes contain a standard set of 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and an A+T-rich region in the same order as those of the other analysed caeliferan species, but different from those of the ensiferan species by the rearrangement of trnD and trnK. The putative initiation codon for the cox1 gene in the two species is ATC. The presence of different-sized tandem repeats in the A+T-rich region leads to size variation between their mitochondrial genomes. Except for nad2, nad4L, and nad6, most of the caeliferan mtDNA genes exhibit low levels of divergence. In phylogenetic analyses, the species from the suborder Caelifera form a monophyletic group, as is the case for the Ensifera. Furthermore, the two suborders cluster as sister groups, supporting the monophyly of Orthoptera. Conclusion: The mitochondrial genomes of both G. marmoratus and O. asiaticus harbor the typical 37 genes and an A+T-rich region, exhibiting characters similar to those of other grasshopper species. Characterization of the two mitochondrial genomes has enriched our knowledge of the mitochondrial genomes of Orthoptera.
PMID:19361334

Analysis of copy number variants by three detection algorithms and their association with body size in horses.

PubMed

Metzger, Julia; Philipp, Ute; Lopes, Maria Susana; da Camara Machado, Artur; Felicetti, Michela; Silvestrelli, Maurizio; Distl, Ottmar

2013-07-18

Copy number variants (CNVs) have been shown to play an important role in genetic diversity of mammals and in the development of many complex phenotypic traits. The aim of this study was to perform a standard comparative evaluation of CNVs in horses using three different CNV detection programs and to identify genomic regions associated with body size in horses. Analysis was performed using the Illumina Equine SNP50 genotyping beadchip for 854 horses. CNVs were detected by three different algorithms, CNVPartition, PennCNV and QuantiSNP. Comparative analysis revealed 50 CNVs that affected 153 different genes mainly involved in sensory perception, signal transduction and cellular components. Genome-wide association analysis for body size showed highly significant deleted regions on ECA1, ECA8 and ECA9. Homologous regions to the detected CNVs on ECA1 and ECA9 have also been shown to be correlated with human height. Comparative analysis of CNV detection algorithms was useful to increase the specificity of CNV detection but had certain limitations dependent on the detection tool. GWAS revealed genome-wide associated CNVs for body size in horses.
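The abstract says that comparing CNVPartition, PennCNV and QuantiSNP increased specificity, but it does not state how the calls were reconciled. One common approach, sketched here purely as an assumption, keeps a call only if at least two programs report overlapping calls on the same chromosome with a reciprocal overlap of at least 50%. The `CNVCall` record, both thresholds and the coordinates are hypothetical.

```python
from typing import NamedTuple


class CNVCall(NamedTuple):
    """Hypothetical CNV call; coordinates are half-open [start, end)."""
    caller: str
    chrom: str
    start: int
    end: int


def reciprocal_overlap(a, b):
    """Overlap as a fraction of the longer call.

    A value >= f means each call overlaps the other by at least fraction f.
    """
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a.end - a.start, b.end - b.start)


def supported_calls(calls, min_callers=2, min_overlap=0.5):
    """Keep calls confirmed by at least `min_callers` distinct programs."""
    kept = []
    for a in calls:
        callers = {a.caller}
        for b in calls:
            if b.caller != a.caller and reciprocal_overlap(a, b) >= min_overlap:
                callers.add(b.caller)
        if len(callers) >= min_callers:
            kept.append(a)
    return kept


calls = [
    CNVCall("PennCNV", "ECA1", 1_000, 5_000),
    CNVCall("QuantiSNP", "ECA1", 1_500, 5_200),
    CNVCall("CNVPartition", "ECA9", 100, 900),
]
print(supported_calls(calls))  # the two overlapping ECA1 calls
```

Raising `min_callers` to 3 trades sensitivity for specificity, which mirrors the tool-dependent limitations the abstract mentions.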
Estimation of (co)variances for genomic regions of flexible sizes: application to complex infectious udder diseases in dairy cattle

PubMed Central

2012-01-01

Background: Multi-trait genomic models in a Bayesian context can be used to estimate genomic (co)variances, either for a complete genome or for genomic regions (e.g. per chromosome), for the purpose of multi-trait genomic selection or to gain further insight into the genomic architecture of related traits such as mammary disease traits in dairy cattle. Methods: Data on progeny means of six traits related to mastitis resistance in dairy cattle (general mastitis resistance and five pathogen-specific mastitis resistance traits) were analyzed using a bivariate Bayesian SNP-based genomic model with a common prior distribution for the marker allele substitution effects and estimation of the hyperparameters in this prior distribution from the progeny means data. From the Markov chain Monte Carlo samples of the allele substitution effects, genomic (co)variances were calculated on a whole-genome level, per chromosome, and in regions of 100 SNP on a chromosome. Results: Genomic proportions of the total variance differed between traits. Genomic correlations were lower than pedigree-based genetic correlations, and they were highest between general mastitis and pathogen-specific traits because of the part-whole relationship between these traits. The chromosome-wise genomic proportions of the total variance differed between traits, with some chromosomes explaining more or less variance than expected from their size. Few chromosomes showed pleiotropic effects, and only chromosome 19 had a clear effect on all traits, indicating the presence of QTL with a general effect on mastitis resistance. The region-wise patterns of genomic variances differed between traits. Peaks indicating QTL were identified but were not very distinctive because a common prior for the marker effects was used. There was a clear difference in the region-wise patterns of genomic correlation among combinations of traits, with distinctive peaks indicating the presence of pleiotropic QTL.
Conclusions: The results show that it is possible to estimate genome-wide and region-wise genomic (co)variances of mastitis resistance traits in dairy cattle using multivariate genomic models. PMID:22640006
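The abstract states that genomic (co)variances were calculated from MCMC samples of allele substitution effects for the whole genome, per chromosome, and per 100-SNP region, without giving the formula. A minimal sketch of one plausible way to do this, assumed here rather than taken from the paper: for each MCMC sample, compute every animal's regional genomic value as genotype codes times sampled effects summed over the region's SNPs, take the covariance of those values between two traits across animals, and average over samples. The function names and toy inputs are hypothetical; a chromosome or the whole genome is simply a larger index set.

```python
def covariance(x, y):
    """Sample covariance of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def regional_genomic_covariance(genotypes, effects_t1, effects_t2, region):
    """Posterior-mean genomic covariance between two traits for one region.

    genotypes: one list of SNP codes per animal (e.g. 0/1/2 allele counts)
    effects_t1, effects_t2: one list of SNP effects per MCMC sample, per trait
    region: indices of the SNPs in the region (e.g. a window of 100 SNPs)
    Passing the same trait twice gives the regional genomic variance.
    """
    estimates = []
    for a1, a2 in zip(effects_t1, effects_t2):
        g1 = [sum(animal[j] * a1[j] for j in region) for animal in genotypes]
        g2 = [sum(animal[j] * a2[j] for j in region) for animal in genotypes]
        estimates.append(covariance(g1, g2))
    return sum(estimates) / len(estimates)


# Toy example: three animals, two SNPs, one MCMC sample per trait.
genotypes = [[0, 1], [1, 1], [2, 1]]
trait1 = [[1.0, 5.0]]
trait2 = [[2.0, 0.0]]
print(regional_genomic_covariance(genotypes, trait1, trait2, region=[0, 1]))  # 2.0
print(regional_genomic_covariance(genotypes, trait1, trait1, region=[0, 1]))  # 1.0
```

The second SNP is monomorphic in this toy data, so its large sampled effect shifts every animal equally and contributes nothing to the (co)variance.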
Swine and Poultry Pathogens: the Complete Genome Sequences of Two Strains of Mycoplasma hyopneumoniae and a Strain of Mycoplasma synoviae

PubMed Central

Vasconcelos, Ana Tereza R.; Ferreira, Henrique B.; Bizarro, Cristiano V.; Bonatto, Sandro L.; Carvalho, Marcos O.; Pinto, Paulo M.; Almeida, Darcy F.; Almeida, Luiz G. P.; Almeida, Rosana; Alves-Filho, Leonardo; Assunção, Enedina N.; Azevedo, Vasco A. C.; Bogo, Maurício R.; Brigido, Marcelo M.; Brocchi, Marcelo; Burity, Helio A.; Camargo, Anamaria A.; Camargo, Sandro S.; Carepo, Marta S.; Carraro, Dirce M.; de Mattos Cascardo, Júlio C.; Castro, Luiza A.; Cavalcanti, Gisele; Chemale, Gustavo; Collevatti, Rosane G.; Cunha, Cristina W.; Dallagiovanna, Bruno; Dambrós, Bibiana P.; Dellagostin, Odir A.; Falcão, Clarissa; Fantinatti-Garboggini, Fabiana; Felipe, Maria S. S.; Fiorentin, Laurimar; Franco, Gloria R.; Freitas, Nara S. A.; Frías, Diego; Grangeiro, Thalles B.; Grisard, Edmundo C.; Guimarães, Claudia T.; Hungria, Mariangela; Jardim, Sílvia N.; Krieger, Marco A.; Laurino, Jomar P.; Lima, Lucymara F. A.; Lopes, Maryellen I.; Loreto, Élgion L. S.; Madeira, Humberto M. F.; Manfio, Gilson P.; Maranhão, Andrea Q.; Martinkovics, Christyanne T.; Medeiros, Sílvia R. B.; Moreira, Miguel A. M.; Neiva, Márcia; Ramalho-Neto, Cicero E.; Nicolás, Marisa F.; Oliveira, Sergio C.; Paixão, Roger F. C.; Pedrosa, Fábio O.; Pena, Sérgio D. J.; Pereira, Maristela; Pereira-Ferrari, Lilian; Piffer, Itamar; Pinto, Luciano S.; Potrich, Deise P.; Salim, Anna C. M.; Santos, Fabrício R.; Schmitt, Renata; Schneider, Maria P. C.; Schrank, Augusto; Schrank, Irene S.; Schuck, Adriana F.; Seuanez, Hector N.; Silva, Denise W.; Silva, Rosane; Silva, Sérgio C.; Soares, Célia M. A.; Souza, Kelly R. L.; Souza, Rangel C.; Staats, Charley C.; Steffens, Maria B. R.; Teixeira, Santuza M. R.; Urmenyi, Turan P.; Vainstein, Marilene H.; Zuccherato, Luciana W.; Simpson, Andrew J. G.; Zaha, Arnaldo

2005-01-01

This work reports the results of analyses of three complete mycoplasma genomes, a pathogenic (7448) and a nonpathogenic (J) strain of the swine pathogen Mycoplasma hyopneumoniae and a strain of the avian pathogen Mycoplasma synoviae; the genome sizes of the three strains were 920,079 bp, 897,405 bp, and 799,476 bp, respectively. These genomes were compared with other sequenced mycoplasma genomes reported in the literature to examine several aspects of mycoplasma evolution. Strain-specific regions, including integrative and conjugal elements, and genome rearrangements and alterations in adhesin sequences were observed in the M. hyopneumoniae strains, and all of these were potentially related to pathogenicity. Genomic comparisons revealed that reduction in genome size implied loss of redundant metabolic pathways, with maintenance of alternative routes in different species. Horizontal gene transfer was consistently observed between M. synoviae and Mycoplasma gallisepticum. Our analyses indicated a likely transfer event of hemagglutinin-coding DNA sequences from M. gallisepticum to M. synoviae. PMID:16077101
Whole-Genome Resequencing of Experimental Populations Reveals Polygenic Basis of Egg-Size Variation in Drosophila melanogaster

PubMed Central

Jha, Aashish R.; Miles, Cecelia M.; Lippert, Nodia R.; Brown, Christopher D.; White, Kevin P.; Kreitman, Martin

2015-01-01

Complete genome resequencing of populations holds great promise in deconstructing complex polygenic traits to elucidate molecular and developmental mechanisms of adaptation. Egg size is a classic adaptive trait in insects, birds, and other taxa, but its highly polygenic architecture has prevented high-resolution genetic analysis. We used replicated experimental evolution in Drosophila melanogaster and whole-genome sequencing to identify consistent signatures of polygenic egg-size adaptation. A generalized linear-mixed model revealed reproducible allele frequency differences between replicated experimental populations selected for large and small egg volumes at approximately 4,000 single nucleotide polymorphisms (SNPs). Several hundred distinct genomic regions contain clusters of these SNPs and have lower heterozygosity than the genomic background, consistent with selection acting on polymorphisms in these regions. These SNPs are also enriched among genes expressed in Drosophila ovaries and many of these genes have well-defined functions in Drosophila oogenesis. Additional genes regulating egg development, growth, and cell size show evidence of directional selection as genes regulating these biological processes are enriched for highly differentiated SNPs. Genetic crosses performed with a subset of candidate genes demonstrated that these genes influence egg size, at least in the large genetic background. These findings confirm the highly polygenic architecture of this adaptive trait, and suggest the involvement of many novel candidate genes in regulating egg size. PMID:26044351
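The abstract reports that SNP-cluster regions have lower heterozygosity than the genomic background but does not name the statistic. As an assumed illustration only, the sketch below uses expected heterozygosity 2p(1 - p) from population allele frequencies, averages it in fixed, non-overlapping windows, and flags windows whose mean falls below a chosen fraction of the genome-wide SNP mean. The window size, the 0.5 fraction and the toy data are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) for a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def window_heterozygosity(positions, freqs, window_bp):
    """Mean expected heterozygosity per non-overlapping window, keyed by window start."""
    sums, counts = {}, {}
    for pos, p in zip(positions, freqs):
        w = (pos // window_bp) * window_bp
        sums[w] = sums.get(w, 0.0) + expected_heterozygosity(p)
        counts[w] = counts.get(w, 0) + 1
    return {w: sums[w] / counts[w] for w in sums}


def low_heterozygosity_windows(positions, freqs, window_bp, fraction=0.5):
    """Windows whose mean heterozygosity is below `fraction` of the genome-wide SNP mean."""
    background = sum(map(expected_heterozygosity, freqs)) / len(freqs)
    windows = window_heterozygosity(positions, freqs, window_bp)
    return sorted(w for w, h in windows.items() if h < fraction * background)


positions = [100, 200, 1_100, 1_200]
freqs = [0.5, 0.5, 0.05, 0.95]
print(low_heterozygosity_windows(positions, freqs, window_bp=1_000))  # [1000]
```

Near-fixed alleles (p close to 0 or 1) drive 2p(1 - p) toward zero, which is why directional selection on a region is expected to show up as a heterozygosity dip.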
Genome size diversity in angiosperms and its influence on gene space.

PubMed

Dodsworth, Steven; Leitch, Andrew R; Leitch, Ilia J

2015-12-01

Genome size varies c. 2400-fold in angiosperms (flowering plants), although the range of genome size is skewed towards small genomes, with a mean genome size of 1C = 5.7 Gb. One of the most crucial factors governing genome size in angiosperms is the relative amount and activity of repetitive elements. Recently, there have been new insights into how these repeats, previously discarded as 'junk' DNA, can have a significant impact on gene space (i.e. the part of the genome comprising all the genes and gene-related DNA). Here we review these new findings and explore in what ways genome size itself plays a role in influencing how repeats impact genome dynamics and gene space, including gene expression. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
Mitochondrial genome diversity in dagger and needle nematodes (Nematoda: Longidoridae).

PubMed

Palomares-Rius, J E; Cantalapiedra-Navarrete, C; Archidona-Yuste, A; Blok, V C; Castillo, P

2017-02-02

Dagger and needle nematodes of the family Longidoridae (viz. Longidorus, Paralongidorus, and Xiphinema) are highly polyphagous plant-parasitic nematodes of wild and cultivated plants, and some of them are plant-virus vectors (nepoviruses). The mitochondrial (mt) genomes of the dagger and needle nematodes Xiphinema rivesi, Xiphinema pachtaicum, Longidorus vineacola and Paralongidorus litoralis were sequenced in this study. The four circular mt genomes have estimated sizes of 12.6, 12.5, 13.5 and 12.7 kb, respectively. To date, the mt genome of X. pachtaicum is the smallest mt genome found in Nematoda. The four mt genomes contain 12 protein-coding genes (viz. cox1-3, nad1-6, nad4L, atp6 and cob) and two ribosomal RNA genes (rrnL and rrnS), but the atp8 gene was not detected. Gene arrangement differed markedly among the sequenced Longidoridae mt genomes, except between very closely related species (X. americanum and X. rivesi). Non-coding regions in the Longidoridae mt genomes were very small and occurred at only a few sites. Phylogenetic analysis of all coding genes showed a closer relationship between Longidorus and Paralongidorus and different phylogenetic possibilities for the three Xiphinema species.
Mitochondrial genome diversity in dagger and needle nematodes (Nematoda: Longidoridae)

PubMed Central

Palomares-Rius, J. E.; Cantalapiedra-Navarrete, C.; Archidona-Yuste, A.; Blok, V. C.; Castillo, P.

2017-01-01

Dagger and needle nematodes of the family Longidoridae (viz. Longidorus, Paralongidorus, and Xiphinema) are highly polyphagous plant-parasitic nematodes of wild and cultivated plants, and some of them are plant-virus vectors (nepoviruses). The mitochondrial (mt) genomes of the dagger and needle nematodes Xiphinema rivesi, Xiphinema pachtaicum, Longidorus vineacola and Paralongidorus litoralis were sequenced in this study. The four circular mt genomes have estimated sizes of 12.6, 12.5, 13.5 and 12.7 kb, respectively. To date, the mt genome of X. pachtaicum is the smallest mt genome found in Nematoda. The four mt genomes contain 12 protein-coding genes (viz. cox1-3, nad1-6, nad4L, atp6 and cob) and two ribosomal RNA genes (rrnL and rrnS), but the atp8 gene was not detected. Gene arrangement differed markedly among the sequenced Longidoridae mt genomes, except between very closely related species (X. americanum and X. rivesi). Non-coding regions in the Longidoridae mt genomes were very small and occurred at only a few sites. Phylogenetic analysis of all coding genes showed a closer relationship between Longidorus and Paralongidorus and different phylogenetic possibilities for the three Xiphinema species. PMID:28150734
The genome sequence and transcriptome of Potentilla micrantha and their comparison to Fragaria vesca (the woodland strawberry)

PubMed Central

Moretto, Marco; Barghini, Elena; Mascagni, Flavia; Natali, Lucia; Brilli, Matteo; Lomsadze, Alexandre; Sonego, Paolo; Giongo, Lara; Alonge, Michael; Velasco, Riccardo; Varotto, Claudio; Šurbanovski, Nada; Borodovsky, Mark; Ward, Judson A; Engelen, Kristof; Cavallini, Andrea; Cestaro, Alessandro

2018-01-01

Background: The genus Potentilla is closely related to Fragaria, the economically important strawberry genus. Potentilla micrantha is a species that does not develop berries but shares numerous morphological and ecological characteristics with Fragaria vesca. These similarities make P. micrantha an attractive choice for comparative genomics studies with F. vesca. Findings: In this study, the P. micrantha genome was sequenced and annotated, and RNA-Seq data from the different developmental stages of flowering and fruiting were used to develop a set of gene predictions. A 327 Mbp sequence and annotation of the P. micrantha genome was developed, spanning 2674 sequence contigs with an N50 of 335,712 bp and estimated to cover 80% of the species' total genome size. The genus Potentilla has a characteristically larger genome size than Fragaria, but the recovered sequence scaffolds were remarkably collinear at the micro-syntenic level with the genome of F. vesca, its closest sequenced relative. A total of 33,602 genes were predicted, and 95.1% of benchmarking universal single-copy orthologous genes were complete within the presented sequence. Thus, we argue that most of the gene-rich regions of the genome have been sequenced. Conclusions: Comparisons of RNA-Seq data from the stages of floral and fruit development revealed genes differentially expressed between P. micrantha and F. vesca. The data presented are a valuable resource for future studies of berry development in Fragaria and the Rosaceae, and they also shed light on the evolution of genome size and organization in this family. PMID:29659812
The genome sequence and transcriptome of Potentilla micrantha and their comparison to Fragaria vesca (the woodland strawberry).

PubMed

Buti, Matteo; Moretto, Marco; Barghini, Elena; Mascagni, Flavia; Natali, Lucia; Brilli, Matteo; Lomsadze, Alexandre; Sonego, Paolo; Giongo, Lara; Alonge, Michael; Velasco, Riccardo; Varotto, Claudio; Šurbanovski, Nada; Borodovsky, Mark; Ward, Judson A; Engelen, Kristof; Cavallini, Andrea; Cestaro, Alessandro; Sargent, Daniel James

2018-04-01

The genus Potentilla is closely related to Fragaria, the economically important strawberry genus. Potentilla micrantha is a species that does not develop berries but shares numerous morphological and ecological characteristics with Fragaria vesca. These similarities make P. micrantha an attractive choice for comparative genomics studies with F. vesca. In this study, the P. micrantha genome was sequenced and annotated, and RNA-Seq data from the different developmental stages of flowering and fruiting were used to develop a set of gene predictions. A 327 Mbp sequence and annotation of the P. micrantha genome was developed, spanning 2674 sequence contigs with an N50 of 335,712 bp and estimated to cover 80% of the species' total genome size. The genus Potentilla has a characteristically larger genome size than Fragaria, but the recovered sequence scaffolds were remarkably collinear at the micro-syntenic level with the genome of F. vesca, its closest sequenced relative. A total of 33,602 genes were predicted, and 95.1% of benchmarking universal single-copy orthologous genes were complete within the presented sequence. Thus, we argue that most of the gene-rich regions of the genome have been sequenced. Comparisons of RNA-Seq data from the stages of floral and fruit development revealed genes differentially expressed between P. micrantha and F. vesca. The data presented are a valuable resource for future studies of berry development in Fragaria and the Rosaceae, and they also shed light on the evolution of genome size and organization in this family.
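N50 summarizes assembly contiguity: it is the length L such that sequences of length L or longer together contain at least half of all assembled bases. A minimal sketch of that standard definition follows, with toy lengths rather than the P. micrantha contigs.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all assembled bases."""
    if not lengths:
        raise ValueError("n50() needs at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding at the halfway point
            return length


print(n50([2, 3, 4, 5, 6]))  # 5
```

Here the two longest sequences (6 + 5 = 11) already hold more than half of the 20 assembled bases, so the N50 is 5.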
Statistical considerations in evaluating pharmacogenomics-based clinical effect for confirmatory trials.

PubMed

Wang, Sue-Jane; O'Neill, Robert T; Hung, Hm James

2010-10-01

Current practice seeks genomically favorable patients in randomized controlled clinical trials by using genomic convenience samples. We discuss the extent of imbalance, confounding, bias, design efficiency loss, type I error, and type II error that can occur when such convenience samples are evaluated, particularly when they are small. We articulate statistical considerations for choosing a reasonable sample size that minimizes the chance of imbalance, and we highlight the importance of replicating subgroup findings in independent studies. Four case examples reflecting recent regulatory experience are used to underscore the problems with convenience samples. The probability of imbalance for a pre-specified subgroup is provided to show the sample size needed to minimize the chance of imbalance. We use an example from drug development to highlight the level of scientific rigor needed, with evidence replicated for a pre-specified subgroup claim. The convenience samples evaluated ranged from 18% to 38% of the intent-to-treat samples, with sample sizes ranging from 100 to 5000 patients per arm. Baseline imbalance can occur with a probability higher than 25%. Mild to moderate multiple confounders yielding the same directional bias in favor of the treated group can make treatment groups incomparable at baseline and result in a false positive conclusion that there is a treatment difference. Conversely, if the same directional bias favors the placebo group or there is a loss in design efficiency, the type II error can increase substantially. Pre-specification of a genomic subgroup hypothesis is useful only for some degree of type I error control. Complete ascertainment of genomic samples in a randomized controlled trial should be the first step in exploring whether a favorable genomic patient subgroup suggests a treatment effect when there is no clear prior knowledge and understanding of how the mechanism of a drug target affects the clinical outcome of interest.
When stratified randomization based on genomic biomarker status cannot be implemented in designing a pharmacogenomics confirmatory clinical trial, and there is one genomic biomarker prognostic for clinical response, then as a general rule of thumb a sample size of at least 100 patients may need to be considered for the lower-prevalence genomic subgroup to minimize the chance of an imbalance of 20% or more in the prevalence of the genomic marker. The sample size may need to be at least 150, 350, and 1350 if imbalances of 15%, 10%, and 5%, respectively, are of concern.
Accurate computation of survival statistics in genome-wide studies.

PubMed

Vandin, Fabio; Papoutsaki, Alexandra; Raphael, Benjamin J; Upfal, Eli

2015-05-01

A key challenge in genomics is to identify genetic variants that distinguish patients with different survival times following diagnosis or treatment. While the log-rank test is widely used for this purpose, nearly all implementations of the log-rank test rely on an asymptotic approximation that is not appropriate in many genomics applications, for two reasons: the two populations determined by a genetic variant may have very different sizes, and the evaluation of many possible variants demands highly accurate computation of very small p-values. We demonstrate this problem for cancer genomics data, where the standard log-rank test leads to many false positive associations between somatic mutations and survival time. We develop and analyze a novel algorithm, Exact Log-rank Test (ExaLT), that accurately computes the p-value of the log-rank statistic under an exact distribution that is appropriate for populations of any size. We demonstrate the advantages of ExaLT on data from published cancer genomics studies, finding significant differences from the reported p-values. We analyze somatic mutations in six cancer types from The Cancer Genome Atlas (TCGA), finding mutations with known association to survival as well as several novel associations. In contrast, standard implementations of the log-rank test report dozens to hundreds of likely false positive associations as more significant than these known associations.
Accurate Computation of Survival Statistics in Genome-Wide Studies

PubMed Central

Vandin, Fabio; Papoutsaki, Alexandra; Raphael, Benjamin J.; Upfal, Eli

2015-01-01

A key challenge in genomics is to identify genetic variants that distinguish patients with different survival times following diagnosis or treatment. While the log-rank test is widely used for this purpose, nearly all implementations of the log-rank test rely on an asymptotic approximation that is not appropriate in many genomics applications, for two reasons: the two populations determined by a genetic variant may have very different sizes, and the evaluation of many possible variants demands highly accurate computation of very small p-values. We demonstrate this problem for cancer genomics data, where the standard log-rank test leads to many false-positive associations between somatic mutations and survival time. We develop and analyze a novel algorithm, Exact Log-rank Test (ExaLT), that accurately computes the p-value of the log-rank statistic under an exact distribution that is appropriate for populations of any size. We demonstrate the advantages of ExaLT on data from published cancer genomics studies, finding significant differences from the reported p-values. We analyze somatic mutations in six cancer types from The Cancer Genome Atlas (TCGA), finding mutations with known association to survival as well as several novel associations. In contrast, standard implementations of the log-rank test report dozens to hundreds of likely false-positive associations as more significant than these known associations. PMID:25950620
Low levels of LTR retrotransposon deletion by ectopic recombination in the gigantic genomes of salamanders.

PubMed

Frahry, Matthew Blake; Sun, Cheng; Chong, Rebecca A; Mueller, Rachel Lockridge

2015-02-01

Across the tree of life, species vary dramatically in nuclear genome size. Mutations that add or remove sequences from genomes (insertions or deletions, or indels) are the ultimate source of this variation. Differences in the tempo and mode of insertion and deletion across taxa have been proposed to contribute to evolutionary diversity in genome size. Among vertebrates, most of the largest genomes are found within the salamanders, an amphibian clade with genome sizes ranging from ~14 to ~120 Gb. Salamander genomes have been shown to experience slower rates of DNA loss through small (i.e., <30 bp) deletions than do other vertebrate genomes. However, no studies have addressed DNA loss from salamander genomes resulting from larger deletions. Here, we focus on one type of large deletion: ectopic-recombination-mediated removal of LTR retrotransposon sequences. In ectopic recombination, double-strand breaks are repaired using a "wrong" (i.e., ectopic, or non-allelic) template sequence, typically another locus of similar sequence. When breaks occur within the LTR portions of LTR retrotransposons, ectopic-recombination-mediated repair can produce deletions that remove the internal transposon sequence and the equivalent of one of the two LTR sequences. These deletions leave a signature in the genome: a solo LTR sequence. We compared levels of solo LTRs in the genomes of four salamander species with levels present in five vertebrates with smaller genomes. Our results demonstrate that salamanders have low levels of solo LTRs, suggesting that ectopic-recombination-mediated deletion of LTR retrotransposons occurs more slowly than in other vertebrates with smaller genomes.
Microbial ecology in the age of genomics and metagenomics: concepts, tools, and recent advances.

PubMed

Xu, Jianping

2006-06-01

Microbial ecology examines the diversity and activity of micro-organisms in Earth's biosphere. In the last 20 years, the application of genomics tools has revolutionized microbial ecological studies and drastically expanded our view of the previously underappreciated microbial world. This review first introduces the basic concepts in microbial ecology and the main genomics methods that have been used to examine natural microbial populations and communities. In the ensuing three specific sections, applications of genomics in microbial ecological research are highlighted. The first describes the widespread application of multilocus sequence typing and representational difference analysis in studying genetic variation within microbial species. Such investigations have shown that migration, horizontal gene transfer and recombination are common in natural microbial populations and that microbial strains can be highly variable in genome size and gene content. The second section highlights and summarizes the use of four specific genomics methods (phylogenetic analysis of ribosomal RNA, DNA-DNA re-association kinetics, metagenomics, and micro-arrays) in analysing the diversity and potential activity of microbial populations and communities from a variety of terrestrial and aquatic environments. Such analyses have identified many unexpected phylogenetic lineages in viruses, bacteria, archaea, and microbial eukaryotes. Functional analyses of environmental DNA have also revealed highly prevalent, but previously unknown, metabolic processes in natural microbial communities. In the third section, the ecological implications of sequenced microbial genomes are briefly discussed. Comparative analyses of prokaryotic genomic sequences suggest the importance of ecology in determining microbial genome size and gene content. The significant variability in genome size and gene content among strains and species of prokaryotes indicates the highly fluid nature of prokaryotic genomes, a result consistent with those from multilocus sequence typing and representational difference analyses. The integration of various levels of ecological analysis, coupled with the application and further development of high-throughput technologies, is accelerating the pace of discovery in microbial ecology.
The Genome Sizes of Ostracod Crustaceans Correlate with Body Size and Evolutionary History, but not Environment.

PubMed

Jeffery, Nicholas W; Ellis, Emily A; Oakley, Todd H; Gregory, T Ryan

2017-09-01

Within animals, a positive correlation between genome size and body size has been detected in several taxa but not in others, such that it remains unknown how pervasive this pattern may be. Here, we provide another example of a positive relationship in a group of crustaceans whose genome sizes have not previously been investigated. We analyze genome size estimates for 46 species across the 2 most diverse orders of Class Ostracoda, commonly known as seed shrimps, including 29 new estimates made using Feulgen image analysis densitometry and flow cytometry. Genome sizes in this group range ~80-fold, a level of variability that is otherwise not seen in crustaceans with the exception of some malacostracan orders. We find a strong positive correlation between genome size and body size across all species, including after phylogenetic correction. We additionally detect evidence of XX/XO sex determination in 3 species of marine ostracods where male and female genome sizes were estimated. On average, genome sizes are larger but less variable in Order Myodocopida than in Order Podocopida, and marine ostracods have larger genomes than freshwater species, but this appears to be explained by phylogenetic inertia. The relationship between phylogeny, genome size, body size, and habitat is complex in this system and provides a baseline for future studies examining the interactions of these biological traits. © The American Genetic Association 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Reconstructing relative genome size of vascular plants through geological time.

PubMed

Lomax, Barry H; Hilton, Jason; Bateman, Richard M; Upchurch, Garland R; Lake, Janice A; Leitch, Ilia J; Cromwell, Avery; Knight, Charles A

2014-01-01

The strong positive relationship evident between cell and genome size in both animals and plants forms the basis of using the size of stomatal guard cells as a proxy to track changes in plant genome size through geological time. We report for the first time a taxonomic fine-scale investigation into changes in stomatal guard-cell length and use these data to infer changes in genome size through the evolutionary history of land plants. Our data suggest that many of the earliest land plants had exceptionally large genome sizes and that a predicted overall trend of increasing genome size within individual lineages through geological time is not supported. However, maximum genome size steadily increases from the Mississippian (c. 360 million yr ago (Ma)) to the present. We hypothesise that the functional relationship between stomatal size, genome size and atmospheric CO2 may contribute to the dichotomy reported between preferential extinction of neopolyploids and the prevalence of palaeopolyploidy observed in DNA sequence data of extant vascular plants. © 2013 The Authors. New Phytologist © 2013 New Phytologist Trust.
Metabolic 'engines' of flight drive genome size reduction in birds.

PubMed

Wright, Natalie A; Gregory, T Ryan; Witt, Christopher C

2014-03-22

The tendency for flying organisms to possess small genomes has been interpreted as evidence of natural selection acting on the physical size of the genome. Nonetheless, the flight-genome link and its mechanistic basis have yet to be well established by comparative studies within a volant clade. Is there a particular functional aspect of flight such as brisk metabolism, lift production or maneuverability that impinges on the physical genome? We measured genome sizes, wing dimensions and heart, flight muscle and body masses from a phylogenetically diverse set of bird species. In phylogenetically controlled analyses, we found that genome size was negatively correlated with relative flight muscle size and heart index (i.e. ratio of heart to body mass), but positively correlated with body mass and wing loading. The proportional masses of the flight muscles and heart were the most important parameters explaining variation in genome size in multivariate models. Hence, the metabolic intensity of powered flight appears to have driven genome size reduction in birds.
Microgeographic genome size differentiation of the carob tree, Ceratonia siliqua, at 'Evolution Canyon', Israel.

PubMed

Bures, Petr; Pavlícek, Tomás; Horová, Lucie; Nevo, Eviatar

2004-05-01

We tested whether the local differences in genome size recorded earlier in the wild barley, Hordeum spontaneum, at 'Evolution Canyon', Mount Carmel, Israel, can also be found in other organisms. As a model species for our test we chose the evergreen carob tree, Ceratonia siliqua. Genome size was measured by means of DAPI flow cytometry. In adults, significantly more DNA was recorded in trees growing on the more illuminated, warmer, drier, microclimatically more fluctuating 'African' south-facing slope than in trees on the opposite, less illuminated, cooler and more humid, 'European' north-facing slope, despite an interslope distance of only 100 m at the canyon bottom and 400 m at the top. The amount of DNA was significantly negatively correlated with leaf length and tree circumference. In seedlings, interslope differences in the amount of genomic DNA were not found. In addition, the first cases of triploidy and tetraploidy were found in C. siliqua. The data on C. siliqua at 'Evolution Canyon' showed that local variability in the C-value exists in this species and that ecological stress might be a strong evolutionary driving force in shaping the amount of DNA.
Genome size variation affects song attractiveness in grasshoppers: evidence for sexual selection against large genomes.

PubMed

Schielzeth, Holger; Streitner, Corinna; Lampe, Ulrike; Franzke, Alexandra; Reinhold, Klaus

2014-12-01

Genome size is largely uncorrelated with organismal complexity and adaptive scenarios. Genetic drift as well as intragenomic conflict have been put forward to explain this observation. We here study the impact of genome size on sexual attractiveness in the bow-winged grasshopper Chorthippus biguttulus. Grasshoppers show particularly large variation in genome size due to the high prevalence of supernumerary chromosomes that are considered (mildly) selfish, as evidenced by non-Mendelian inheritance and fitness costs if present in high numbers. We ranked male grasshoppers by song characteristics that are known to affect female preferences in this species and scored genome sizes of attractive and unattractive individuals from the extremes of this distribution. We find that attractive singers have significantly smaller genomes, demonstrating that genome size is reflected in male courtship songs and that females prefer songs of males with small genomes. Such a genome-size-dependent mate preference effectively selects against selfish genetic elements that tend to increase genome size. The data therefore provide a novel example of how sexual selection can reinforce natural selection and can act as an agent in an intragenomic arms race. Furthermore, our findings indicate an underappreciated route by which choosy females could gain indirect benefits. © 2014 The Author(s). Evolution © 2014 The Society for the Study of Evolution.
A first exploration of genome size diversity in sponges.

PubMed

Jeffery, Nicholas W; Jardine, Catherine B; Gregory, T Ryan

2013-08-01

The phyla known as early-branching lineages of animals have become the subject of increasing interest from the perspectives of genomics and evolutionary biology. Unfortunately, data on even the most fundamental properties of their genomes, such as genome size, remain very scarce. In this study, genome size estimates are reported for 75 species of sponges (phylum Porifera) representing 33 families and 12 orders, marking the first large survey of genome size diversity for an early-branching phylum. Sponge genome sizes averaged around 0.2 pg but exhibited a 17-fold range overall (0.04-0.63 pg). In addition, the results of comparisons of two methods of genome size quantification (flow cytometry and Feulgen image analysis densitometry) are presented, thereby facilitating future work on these animals. Some particularly promising avenues for future investigation are highlighted.

Whole-Genome Resequencing of Experimental Populations Reveals Polygenic Basis of Egg-Size Variation in Drosophila melanogaster.

PubMed

Jha, Aashish R; Miles, Cecelia M; Lippert, Nodia R; Brown, Christopher D; White, Kevin P; Kreitman, Martin

2015-10-01

Complete genome resequencing of populations holds great promise in deconstructing complex polygenic traits to elucidate molecular and developmental mechanisms of adaptation. Egg size is a classic adaptive trait in insects, birds, and other taxa, but its highly polygenic architecture has prevented high-resolution genetic analysis. We used replicated experimental evolution in Drosophila melanogaster and whole-genome sequencing to identify consistent signatures of polygenic egg-size adaptation. A generalized linear-mixed model revealed reproducible allele frequency differences between replicated experimental populations selected for large and small egg volumes at approximately 4,000 single nucleotide polymorphisms (SNPs). Several hundred distinct genomic regions contain clusters of these SNPs and have lower heterozygosity than the genomic background, consistent with selection acting on polymorphisms in these regions. These SNPs are also enriched among genes expressed in Drosophila ovaries and many of these genes have well-defined functions in Drosophila oogenesis. Additional genes regulating egg development, growth, and cell size show evidence of directional selection as genes regulating these biological processes are enriched for highly differentiated SNPs. Genetic crosses performed with a subset of candidate genes demonstrated that these genes influence egg size, at least in the large genetic background. These findings confirm the highly polygenic architecture of this adaptive trait, and suggest the involvement of many novel candidate genes in regulating egg size. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Genome size evolution at the speciation level: the cryptic species complex Brachionus plicatilis (Rotifera).

PubMed

Stelzer, Claus-Peter; Riss, Simone; Stadler, Peter

2011-04-07

Studies on genome size variation in animals are rarely done at lower taxonomic levels, e.g., slightly above or below the species level. Yet such variation might provide important clues on the tempo and mode of genome size evolution. In this study we used flow cytometry to study the evolution of genome size in the rotifer Brachionus plicatilis, a cryptic species complex consisting of at least 14 closely related species. We found unexpectedly high variation in this species complex, with genome sizes ranging approximately seven-fold (haploid '1C' genome sizes: 0.056-0.416 pg). Most of this variation (67%) could be ascribed to the major clades of the species complex, i.e. clades that are well separated according to most species definitions. However, we also found substantial variation (32%) at lower taxonomic levels (within and among genealogical species) and, interestingly, among species pairs that are not completely reproductively isolated. In one genealogical species, called B. 'Austria', we found greatly enlarged genome sizes that could roughly be approximated as multiples of the genomes of its closest relatives, which suggests that whole-genome duplications occurred early during the separation of this lineage. Overall, genome size was significantly correlated with egg size and body size, although the latter correlation became non-significant after controlling for phylogenetic non-independence. Our study suggests that substantial genome size variation can build up early during speciation, potentially even among isolated populations. An alternative, but not mutually exclusive, interpretation might be that reproductive isolation tends to build up unusually slowly in this species complex.
Genome size evolution at the speciation level: The cryptic species complex Brachionus plicatilis (Rotifera)

PubMed Central

2011-01-01

Background: Studies on genome size variation in animals are rarely done at lower taxonomic levels, e.g., slightly above or below the species level. Yet such variation might provide important clues on the tempo and mode of genome size evolution. In this study we used flow cytometry to study the evolution of genome size in the rotifer Brachionus plicatilis, a cryptic species complex consisting of at least 14 closely related species. Results: We found unexpectedly high variation in this species complex, with genome sizes ranging approximately seven-fold (haploid '1C' genome sizes: 0.056-0.416 pg). Most of this variation (67%) could be ascribed to the major clades of the species complex, i.e. clades that are well separated according to most species definitions. However, we also found substantial variation (32%) at lower taxonomic levels (within and among genealogical species) and, interestingly, among species pairs that are not completely reproductively isolated. In one genealogical species, called B. 'Austria', we found greatly enlarged genome sizes that could roughly be approximated as multiples of the genomes of its closest relatives, which suggests that whole-genome duplications occurred early during the separation of this lineage. Overall, genome size was significantly correlated with egg size and body size, although the latter correlation became non-significant after controlling for phylogenetic non-independence. Conclusions: Our study suggests that substantial genome size variation can build up early during speciation, potentially even among isolated populations. An alternative, but not mutually exclusive, interpretation might be that reproductive isolation tends to build up unusually slowly in this species complex. PMID:21473744
Guided Exploration of Genomic Risk for Gray Matter Abnormalities in Schizophrenia Using Parallel Independent Component Analysis with Reference

PubMed Central

Chen, Jiayu; Calhoun, Vince D.; Pearlson, Godfrey D.; Perrone-Bizzozero, Nora; Sui, Jing; Turner, Jessica A.; Bustillo, Juan R; Ehrlich, Stefan; Sponheim, Scott R.; Cañive, José M.; Ho, Beng-Choon; Liu, Jingyu

2013-01-01

One application of imaging genomics is to explore genetic variants associated with brain structure and function, presenting a new means of mapping genetic influences on mental disorders. While there is growing interest in performing genome-wide searches for determinants, it remains challenging to identify genetic factors of small effect size, especially in limited sample sizes. In an attempt to address this issue, we propose to take advantage of a priori knowledge, specifically by extending parallel independent component analysis (pICA) to incorporate a reference (pICA-R), aiming to better reveal relationships between hidden factors of a particular attribute. The new approach was first evaluated on simulated data for its performance under different configurations of effect size and dimensionality. Then pICA-R was applied to a 300-participant (140 schizophrenia (SZ) patients versus 160 healthy controls) dataset consisting of structural magnetic resonance imaging (sMRI) and single nucleotide polymorphism (SNP) data. Guided by a reference SNP set derived from ANK3, a gene implicated by the Psychiatric Genomic Consortium SZ study, pICA-R identified one pair of SNP and sMRI components with a significant loading correlation of 0.27 (p = 1.64×10−6). The sMRI component showed a significant group difference in loading parameters between patients and controls (p = 1.33×10−15), indicating SZ-related reduction in gray matter concentration in prefrontal and temporal regions. The linked SNP component also showed a group difference (p = 0.04) and was predominantly contributed to by 1,030 SNPs. The effect of these top contributing SNPs was verified using association test results of the Psychiatric Genomic Consortium SZ study, where the 1,030 SNPs exhibited significant SZ enrichment compared to the whole genome. In addition, pathway analyses indicated that the genetic component was mainly related to neurotransmitter and nervous system signaling pathways.
Given the simulation and experiment results, pICA-R may prove a promising multivariate approach for use in imaging genomics to discover reliable genetic risk factors under a scenario of relatively high dimensionality and small effect size. PMID:23727316
Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach

PubMed Central

Boitard, Simon; Rodríguez, Willy; Jay, Flora; Mona, Stefano; Austerlitz, Frédéric

2016-01-01

Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles. PMID:26943927
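The folded allele frequency spectrum used above as a summary statistic needs neither phasing nor knowledge of which allele is ancestral, because each SNP is binned by its minor-allele count. A minimal sketch of how it could be tabulated; the input format (one allele count per SNP among n sampled haploid genomes) and the function name are assumptions, and this is not PopSizeABC's own code.

```python
from typing import Iterable, List


def folded_sfs(alt_counts: Iterable[int], n: int) -> List[int]:
    """Folded site frequency spectrum from unpolarized SNP data.

    alt_counts: copies of one arbitrary allele per SNP among n sampled
    haploid genomes (n = 2 x number of diploid individuals).
    Returns counts for minor-allele count i = 1..n // 2 (index 0 is i = 1).
    """
    sfs = [0] * (n // 2)
    for k in alt_counts:
        if not 0 <= k <= n:
            raise ValueError(f"allele count {k} outside 0..{n}")
        minor = min(k, n - k)  # folding: the ancestral state is not needed
        if minor > 0:  # skip sites that are monomorphic in the sample
            sfs[minor - 1] += 1
    return sfs
```

For example, with n = 4 genomes and allele counts [1, 3, 2, 4, 0], the counts 1 and 3 fold into the same bin, and the two monomorphic sites are dropped, giving [2, 1].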
Intron size and genome size in plants.

Treesearch

J. Wendel; R. Cronn; I. Alvarez; B. Liu; R. Small; D. Senchina

2002-01-01

It has long been known that genomes vary over a remarkable range of sizes in both plants (Bennett, Cox, and Leitch 1997) and animals (Gregory 2001). It also has become evident that across the broad phylogenetic sweep, genome size may be correlated with intron size (Deutsch and Long 1999; Vinogradov 1999; McLysaght et al. 2000), suggesting that some component of genome...
Distribution and diversity of cytotypes in Dianthus broteri as evidenced by genome size variations.

PubMed

Balao, Francisco; Casimiro-Soriguer, Ramón; Talavera, María; Herrera, Javier; Talavera, Salvador

2009-10-01

Studying the spatial distribution of cytotypes and genome size in plants can provide valuable information about the evolution of polyploid complexes. Here, the spatial distribution of cytological races and the amount of DNA in Dianthus broteri, an Iberian carnation with several ploidy levels, is investigated. Sample chromosome counts and flow cytometry (using propidium iodide) were used to determine overall genome size (2C value) and ploidy level in 244 individuals of 25 populations. Both fresh and dried samples were investigated. Differences in 2C and 1Cx values among ploidy levels within biogeographical provinces were tested using ANOVA. Geographical correlations of genome size were also explored. Extensive variation in chromosome numbers (2n = 2x = 30, 2n = 4x = 60, 2n = 6x = 90 and 2n = 12x = 180) was detected, and the dodecaploid cytotype is reported for the first time in this genus. As regards cytotype distribution, six populations were diploid, 11 were tetraploid, three were hexaploid and five were dodecaploid. Except for one diploid population containing some triploid plants (2n = 45), the remaining populations showed a single cytotype. Diploids appeared in two disjunct areas (south-east and south-west), and so did tetraploids (although with a considerably wider geographic range). Dehydrated leaf samples provided reliable measurements of DNA content. Genome size varied significantly among some cytotypes, and also extensively within diploid (up to 1.17-fold) and tetraploid (1.22-fold) populations. Nevertheless, variations were not straightforwardly congruent with ecology and geographical distribution. Dianthus broteri shows the highest diversity of cytotypes known to date in the genus Dianthus. Moreover, some cytotypes present remarkable internal genome size variation. The evolution of the complex is discussed in terms of autopolyploidy, with primary and secondary contact zones.
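The 1Cx value compared across ploidy levels above is the monoploid genome size: the 2C value divided by the number of basic chromosome sets (x) in the somatic nucleus. For D. broteri the base number x = 15 follows from 2n = 2x = 30, so the ploidy level can be read off a chromosome count. A minimal sketch; the function names are hypothetical and any numbers used with them are illustrative, not measurements from the study.

```python
def ploidy_from_chromosome_number(two_n: int, base_x: int = 15) -> int:
    """Number of basic chromosome sets, e.g. 2n = 60 with x = 15 -> 4 (tetraploid)."""
    if two_n % base_x:
        raise ValueError(f"2n = {two_n} is not a multiple of x = {base_x}")
    return two_n // base_x


def monoploid_genome_size(two_c_pg: float, ploidy: int) -> float:
    """1Cx value (DNA content of one basic chromosome set) from a 2C value in pg."""
    if ploidy < 1:
        raise ValueError("ploidy must be a positive integer")
    return two_c_pg / ploidy
```

Dividing by ploidy is what makes genome sizes of diploid, tetraploid, hexaploid and dodecaploid plants comparable on a per-chromosome-set basis; under strict autopolyploidy with no DNA gain or loss, 1Cx would stay constant across cytotypes.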
Differential genomic effects on signaling pathways by two different CeO2 nanoparticles in HepG2 cells

EPA Science Inventory

To investigate genomic effects, human liver hepatocellular carcinoma (HepG2) cells were exposed for three days to two different forms of nanoparticles, both composed of CeO2 (0.3, 3 and 30 µg/mL). The two CeO2 nanoparticles had dry primary particle sizes of 8 nanometers (M) made b...
Genome size of termites (Insecta, Dictyoptera, Isoptera) and wood roaches (Insecta, Dictyoptera, Cryptocercidae)

NASA Astrophysics Data System (ADS)

Koshikawa, Shigeyuki; Miyazaki, Satoshi; Cornette, Richard; Matsumoto, Tadao; Miura, Toru

2008-09-01

The evolution of genome size has been discussed in relation to the evolution of various biological traits. In the present study, the genome sizes of 22 dictyopteran species were estimated by Feulgen image analysis densitometry and 4′,6-diamidino-2-phenylindole (DAPI)-based flow cytometry. The haploid genome sizes (C-values) of termites (Isoptera) ranged from 0.58 to 1.90 pg, and those of Cryptocercus wood roaches (Cryptocercidae) were 1.16 to 1.32 pg. Compared to known values of other cockroaches (Blattaria) and mantids (Mantodea), these values are low. A relatively small genome size appears to be a (syn)apomorphy of Isoptera + Cryptocercus, together with their sociality. In some phylogenetic groups, genome size evolution is thought to be influenced by selective pressure on a particular trait, such as cell size or rate of development. The present results raise the possibility that genome size is influenced by selective pressures on traits associated with the evolution of sociality.
A universe of dwarfs and giants: genome size and chromosome evolution in the monocot family Melanthiaceae.

PubMed

Pellicer, Jaume; Kelly, Laura J; Leitch, Ilia J; Zomlefer, Wendy B; Fay, Michael F

2014-03-01

• Since the occurrence of giant genomes in angiosperms is restricted to just a few lineages, identifying where shifts towards genome obesity have occurred is essential for understanding the evolutionary mechanisms triggering this process. • Genome sizes were assessed using flow cytometry in 79 species and new chromosome numbers were obtained. Phylogenetically based statistical methods were applied to infer ancestral character reconstructions of chromosome numbers and nuclear DNA contents. • Melanthiaceae are the most diverse family in terms of genome size, with C-values ranging more than 230-fold. Our data confirmed that giant genomes are restricted to tribe Parideae, with most extant species in the family characterized by small genomes. Ancestral genome size reconstruction revealed that the most recent common ancestor (MRCA) for the family had a relatively small genome (1C = 5.37 pg). Chromosome losses and polyploidy are recovered as the main evolutionary mechanisms generating chromosome number change. • Genome evolution in Melanthiaceae has been characterized by a trend towards genome size reduction, with just one episode of dramatic DNA accumulation in Parideae. Such extreme contrasting profiles of genome size evolution illustrate the key role of transposable elements and chromosome rearrangements in driving the evolution of plant genomes. © 2013 The Authors. New Phytologist © 2013 New Phytologist Trust.
Minimal-assumption inference from population-genomic data

NASA Astrophysics Data System (ADS)

Weissman, Daniel; Hallatschek, Oskar

Samples of multiple complete genome sequences contain vast amounts of information about the evolutionary history of populations, much of it in the associations among polymorphisms at different loci. Current methods that take advantage of this linkage information rely on models of recombination and coalescence, limiting the sample sizes and populations that they can analyze. We introduce a method, Minimal-Assumption Genomic Inference of Coalescence (MAGIC), that reconstructs key features of the evolutionary history, including the distribution of coalescence times, by integrating information across genomic length scales without using an explicit model of recombination, demography or selection. Using simulated data, we show that MAGIC's performance is comparable to PSMC' on single diploid samples generated with standard coalescent and recombination models. More importantly, MAGIC can also analyze arbitrarily large samples and is robust to changes in the coalescent and recombination processes. Using MAGIC, we show that the inferred coalescence time histories of samples of multiple human genomes exhibit inconsistencies with a description in terms of an effective population size based on single-genome data.
The arbuscular mycorrhizal fungus Glomus intraradices is haploid and has a small genome size in the lower limit of eukaryotes.

PubMed

Hijri, Mohamed; Sanders, Ian R

2004-02-01

The genome size, complexity, and ploidy of the arbuscular mycorrhizal fungus (AMF) Glomus intraradices were determined using flow cytometry, reassociation kinetics, and genomic reconstruction. Nuclei of G. intraradices from in vitro culture were analyzed by flow cytometry. The estimated average length of DNA per nucleus was 14.07 ± 3.52 Mb. Reassociation kinetics on G. intraradices DNA indicated a haploid genome size of approximately 16.54 Mb, comprising 88.36% single copy DNA, 1.59% repetitive DNA, and 10.05% fold-back DNA. To determine ploidy, the DNA content per nucleus measured by flow cytometry was compared with the genome estimate from reassociation kinetics. G. intraradices was found to have a DNA index (DNA per nucleus per haploid genome size) of approximately 0.9, indicating that it is haploid. Genomic DNA of G. intraradices was also analyzed by genomic reconstruction using four genes (Malate synthase, RecA, Rad32, and Hsp88). Because flow cytometry and reassociation kinetics had already been used to estimate the genome size of G. intraradices and to show that it is haploid, genomic reconstruction should yield a similar genome size provided that the genes studied are single copy. The average genome size estimate was 15.74 ± 1.69 Mb, indicating that these four genes are single copy per haploid genome and per nucleus of G. intraradices. Our results show that the genome size of G. intraradices is much smaller than estimates for other AMF and that the unusually high within-spore genetic variation seen in this fungus cannot be due to high ploidy.
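The ploidy argument above rests on a simple ratio. The sketch below divides DNA per nucleus (from flow cytometry) by the haploid genome size (from reassociation kinetics) and rounds the result to the nearest whole number of genome copies. The helper names `dna_index` and `inferred_ploidy` are hypothetical. The rounding rule is also an assumption made for illustration; the abstract does not state it as a criterion.

```python
def dna_index(dna_per_nucleus_mb: float, haploid_genome_mb: float) -> float:
    """DNA per nucleus divided by haploid genome size (both in Mb)."""
    if haploid_genome_mb <= 0:
        raise ValueError("haploid genome size must be positive")
    return dna_per_nucleus_mb / haploid_genome_mb


def inferred_ploidy(index: float) -> int:
    """Nearest whole number of haploid genome copies per nucleus (assumed rule)."""
    return max(1, round(index))


# Flow cytometry (14.07 Mb per nucleus) vs. reassociation kinetics (16.54 Mb).
idx = dna_index(14.07, 16.54)
print(f"DNA index ~ {idx:.1f}; inferred ploidy = {inferred_ploidy(idx)}")
```

An index near 1 points to one genome copy per nucleus. A diploid nucleus would give a value near 2.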
Next-generation sequencing detects repetitive elements expansion in giant genomes of annual killifish genus Austrolebias (Cyprinodontiformes, Rivulidae).

PubMed

García, G; Ríos, N; Gutiérrez, V

2015-06-01

Among Neotropical fish fauna, the South American killifish genus Austrolebias (Cyprinodontiformes: Rivulidae) constitutes an excellent model to study the genomic evolutionary processes underlying speciation events. Recently, unusually large genome sizes have been described in 16 species of this genus, with an average DNA content of about 5.95 ± 0.45 pg per diploid cell (mean C-value of about 2.98 pg). In the present paper we explore the possible origin of this unparalleled genomic increase by comparing the repetitive components of the smallest and largest Rivulidae genomes using NGS (454-Roche) technology. Here, we provide the first annotated composition of Rivulidae repeated sequences and their relative repetitive fraction in both genomes. Remarkably, moderately repetitive DNA makes up approximately 45% of the Austrolebias charrua genome, nearly twice the proportion (25%) found in the closely related rivulinae taxon Cynopoecilus melanotaenia. The present work provides evidence for the impact of repeat families that may have proliferated differentially among sublineages within the Rivulidae, which could explain the large genome size differences accompanying differentiation and speciation events in this family.
Genome-wide analysis of macrosatellite repeat copy number variation in worldwide populations: evidence for differences and commonalities in size distributions and size restrictions

PubMed Central

2013-01-01

Background Macrosatellite repeats (MSRs), usually spanning hundreds of kilobases of genomic DNA, comprise a significant proportion of the human genome. Because of their highly polymorphic nature, MSRs represent an extreme example of copy number variation, but their structure and function are largely understudied. Here, we describe a detailed study of six autosomal and two X chromosomal MSRs among 270 HapMap individuals from Central Europe, Asia and Africa. Copy number variation, stability and genetic heterogeneity of the autosomal macrosatellite repeats RS447 (chromosome 4p), MSR5p (5p), FLJ40296 (13q), RNU2 (17q) and D4Z4 (4q and 10q) and X chromosomal DXZ4 and CT47 were investigated. Results Repeat array size distribution analysis shows that all of these MSRs are highly polymorphic, with the most genetic variation among Africans and the least among Asians. A mitotic mutation rate of 0.4-2.2% was observed, exceeding meiotic mutation rates and possibly explaining the large size variability found for these MSRs. By means of a novel Bayesian approach, statistical support for a distinct multimodal rather than a uniform allele size distribution was detected in seven out of eight MSRs, with evidence for equidistant intervals between the modes. Conclusions The multimodal distributions with evidence for equidistant intervals, in combination with the observation of MSR-specific constraints on minimum array size, suggest that MSRs are limited in their configurations and that deviations from them may cause disease, as is the case for facioscapulohumeral muscular dystrophy. However, at present we cannot exclude that there are mechanistic constraints for MSRs that are not directly disease-related. This study represents the first comprehensive study of MSRs in different human populations by applying novel statistical methods and identifies commonalities and differences in their organization and function in the human genome. PMID:23496858
Quantifying the Variation in the Effective Population Size Within a Genome

PubMed Central

Gossmann, Toni I.; Woolfit, Megan; Eyre-Walker, Adam

2011-01-01

The effective population size (Ne) is one of the most fundamental parameters in population genetics. It is thought to vary across the genome as a consequence of differences in the rate of recombination and the density of selected sites due to the processes of genetic hitchhiking and background selection. Although it is known that there is intragenomic variation in the effective population size in some species, it is not known whether this is widespread or how much variation in the effective population size there is. Here, we test whether the effective population size varies across the genome, between protein-coding genes, in 10 eukaryotic species by considering whether there is significant variation in neutral diversity, taking into account differences in the mutation rate between loci by using the divergence between species. In most species we find significant evidence of variation. We investigate whether the variation in Ne is correlated to recombination rate and the density of selected sites in four species, for which these data are available. We find that Ne is positively correlated to recombination rate in one species, Drosophila melanogaster, and negatively correlated to a measure of the density of selected sites in two others, humans and Arabidopsis thaliana. However, much of the variation remains unexplained. We use a hierarchical Bayesian analysis to quantify the amount of variation in the effective population size and show that it is quite modest in all species—most genes have an Ne that is within a few fold of all other genes. Nonetheless we show that this modest variation in Ne is sufficient to cause significant differences in the efficiency of natural selection across the genome, by demonstrating that the ratio of the number of nonsynonymous to synonymous polymorphisms is significantly correlated to synonymous diversity and estimates of Ne, even taking into account the obvious nonindependence between these measures. PMID:21954163
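The abstract corrects neutral diversity for differences in mutation rate by using divergence between species. Under the standard neutral approximations π ≈ 4Neμ and d ≈ 2μT, where T is the split time shared by all genes, the ratio π/d is proportional to Ne and μ cancels out. The sketch below converts per-gene ratios into Ne relative to the genome-wide mean.

This is a point-estimate illustration only. It uses hypothetical numbers and a hypothetical function name, and it is not the hierarchical Bayesian analysis the authors used. It also ignores sampling noise and ancestral polymorphism.

```python
from statistics import mean


def relative_ne(pi_syn: list[float], div_syn: list[float]) -> list[float]:
    """Per-gene Ne relative to the mean across genes.

    Assumes synonymous diversity pi ~ 4*Ne*mu and synonymous divergence
    d ~ 2*mu*T (T shared by all genes), so pi/d is proportional to Ne.
    """
    if len(pi_syn) != len(div_syn):
        raise ValueError("need one divergence value per gene")
    ratios = [p / d for p, d in zip(pi_syn, div_syn)]
    avg = mean(ratios)
    return [r / avg for r in ratios]


# Hypothetical per-gene synonymous diversity and divergence values.
pi = [0.010, 0.006, 0.012, 0.004]
d = [0.10, 0.08, 0.10, 0.05]
print([round(x, 2) for x in relative_ne(pi, d)])
```

Values close to 1 for most genes would match the "within a few fold" pattern the authors describe.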
Intraspecies Genomic Diversity and Natural Population Structure of the Meat-Borne Lactic Acid Bacterium Lactobacillus sakei

PubMed Central

Chaillou, Stéphane; Daty, Marie; Baraige, Fabienne; Dudez, Anne-Marie; Anglade, Patricia; Jones, Rhys; Alpert, Carl-Alfred; Champomier-Vergès, Marie-Christine; Zagorec, Monique

2009-01-01

Lactobacillus sakei is a food-borne bacterium naturally found in meat and fish products. A study was performed to examine the intraspecies diversity among 73 isolates sourced from laboratory collections in several different countries. Pulsed-field gel electrophoresis analysis demonstrated a 25% variation in genome size between isolates, ranging from 1,815 kb to 2,310 kb. The relatedness between isolates was then determined using a PCR-based method that detects the possession of 60 chromosomal genes belonging to the flexible gene pool. Ten different strain clusters were identified that had noticeable differences in their average genome size reflecting the natural population structure. The results show that many different genotypes may be isolated from similar types of meat products, suggesting a complex ecological habitat in which intraspecies diversity may be required for successful adaptation. Finally, proteomic analysis revealed a slight difference between the migration patterns of highly abundant GapA isoforms of the two prevailing L. sakei subspecies (sakei and carnosus). This analysis was used to affiliate the genotypic clusters with the corresponding subspecies. These findings reveal for the first time the extent of intraspecies genomic diversity in L. sakei. Consequently, identification of molecular subtypes may in the future prove valuable for a better understanding of microbial ecosystems in food products. PMID:19114527
Intraspecies genomic diversity and natural population structure of the meat-borne lactic acid bacterium Lactobacillus sakei.

PubMed

Chaillou, Stéphane; Daty, Marie; Baraige, Fabienne; Dudez, Anne-Marie; Anglade, Patricia; Jones, Rhys; Alpert, Carl-Alfred; Champomier-Vergès, Marie-Christine; Zagorec, Monique

2009-02-01

Lactobacillus sakei is a food-borne bacterium naturally found in meat and fish products. A study was performed to examine the intraspecies diversity among 73 isolates sourced from laboratory collections in several different countries. Pulsed-field gel electrophoresis analysis demonstrated a 25% variation in genome size between isolates, ranging from 1,815 kb to 2,310 kb. The relatedness between isolates was then determined using a PCR-based method that detects the possession of 60 chromosomal genes belonging to the flexible gene pool. Ten different strain clusters were identified that had noticeable differences in their average genome size reflecting the natural population structure. The results show that many different genotypes may be isolated from similar types of meat products, suggesting a complex ecological habitat in which intraspecies diversity may be required for successful adaptation. Finally, proteomic analysis revealed a slight difference between the migration patterns of highly abundant GapA isoforms of the two prevailing L. sakei subspecies (sakei and carnosus). This analysis was used to affiliate the genotypic clusters with the corresponding subspecies. These findings reveal for the first time the extent of intraspecies genomic diversity in L. sakei. Consequently, identification of molecular subtypes may in the future prove valuable for a better understanding of microbial ecosystems in food products.
Chironomid midges (Diptera, chironomidae) show extremely small genome sizes.

PubMed

Cornette, Richard; Gusev, Oleg; Nakahara, Yuichi; Shimura, Sachiko; Kikawada, Takahiro; Okuda, Takashi

2015-06-01

Chironomid midges (Diptera; Chironomidae) are found in various environments from the high Arctic to the Antarctic, including temperate and tropical regions. In many freshwater habitats, members of this family are among the most abundant invertebrates. In the present study, the genome sizes of 25 chironomid species were determined by flow cytometry, and the resulting C-values ranged from 0.07 to 0.20 pg DNA (i.e. from about 68 to 195 Mbp). These genome sizes were uniformly very small and included, to our knowledge, the smallest genome sizes recorded to date among insects. A small proportion of transposable elements and short introns were suggested to contribute to the reduction of genome sizes in chironomids. We discuss the possible developmental and physiological advantages of having a small genome and its putative implications for the ecological success of the family Chironomidae.
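The abstract quotes each C-value both in picograms and in megabase pairs. The two units are linked by a fixed conversion factor. The sketch below assumes the widely used value of 1 pg ≈ 978 Mbp (Doležel et al., 2003), which the abstract itself does not state.

```python
PG_TO_MBP = 978.0  # assumed conversion factor: 1 pg DNA ~ 978 Mbp (Dolezel et al. 2003)


def pg_to_mbp(c_value_pg: float) -> float:
    """Convert a C-value in picograms to megabase pairs."""
    return c_value_pg * PG_TO_MBP


def mbp_to_pg(size_mbp: float) -> float:
    """Convert a genome size in megabase pairs to picograms."""
    return size_mbp / PG_TO_MBP


for c in (0.07, 0.20):  # ends of the range reported for chironomids
    print(f"{c:.2f} pg ~ {pg_to_mbp(c):.1f} Mbp")
```

With this factor, 0.07 pg gives about 68 Mbp, which matches the abstract. At the other end, 0.20 pg gives about 196 Mbp, so the quoted 195 Mbp presumably reflects rounding of the underlying measurement.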
Genome size and metabolic intensity in tetrapods: a tale of two lines

PubMed Central

Vinogradov, Alexander E; Anatskaya, Olga V

2005-01-01

We show the negative link between genome size and metabolic intensity in tetrapods, using the heart index (relative heart mass) as a unified indicator of metabolic intensity in poikilothermal and homeothermal animals. We found two separate regression lines of heart index on genome size for reptiles–birds and amphibians–mammals (the slope of regression is steeper in reptiles–birds). We also show a negative correlation between GC content and nucleosome formation potential in vertebrate DNA, and, consistent with this relationship, a positive correlation between genome GC content and nuclear size (independent of genome size). It is known that there are two separate regression lines of genome GC content on genome size for reptiles–birds and amphibians–mammals: reptiles–birds have the relatively higher GC content (for their genome sizes) compared to amphibians–mammals. Our results suggest uniting all these data into one concept. The slope of negative regression between GC content and nucleosome formation potential is steeper in exons than in non-coding DNA (where nucleosome formation potential is generally higher), which indicates a special role of non-coding DNA for orderly chromatin organization. The chromatin condensation and nuclear size are supposed to be key parameters that accommodate the effects of both genome size and GC content and connect them with metabolic intensity. Our data suggest that the reptilian–birds clade evolved special relationships among these parameters, whereas mammals preserved the amphibian-like relationships. Surprisingly, mammals, although acquiring a more complex general organization, seem to retain certain genome-related properties that are similar to amphibians. At the same time, the slope of regression between nucleosome formation potential and GC content is steeper in poikilothermal than in homeothermal genomes, which suggests that mammals and birds acquired certain common features of genomic organization. PMID:16519230

Reassessment of genome size in turtle and crocodile based on chromosome measurement by flow karyotyping: close similarity to chicken

PubMed Central

Kasai, Fumio; O'Brien, Patricia C. M.; Ferguson-Smith, Malcolm A.

2012-01-01

The genome size in turtles and crocodiles is thought to be much larger than the 1.2 Gb of the chicken (Gallus gallus domesticus, GGA), according to the animal genome size database. However, GGA macrochromosomes show extensive homology in the karyotypes of the red eared slider (Trachemys scripta elegans, TSC) and the Nile crocodile (Crocodylus niloticus, CNI), and bird and reptile genomes have been highly conserved during evolution. In this study, size and GC content of all chromosomes are measured from the flow karyotypes of GGA, TSC and CNI. Genome sizes estimated from the total chromosome size demonstrate that TSC and CNI are 1.21 Gb and 1.29 Gb, respectively. This refines previous overestimations and reveals similar genome sizes in chicken, turtle and crocodile. Analysis of chromosome GC content in each of these three species shows a higher GC content in smaller chromosomes than in larger chromosomes. This contrasts with mammals and squamates in which GC content does not correlate with chromosome size. These data suggest that a common ancestor of birds, turtles and crocodiles had a small genome size and a chromosomal size-dependent GC bias, distinct from the squamate lineage. PMID:22491763
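The reported pattern of higher GC content in smaller chromosomes amounts to a negative association between chromosome size and GC content. Spearman's rank correlation is one common way to quantify such a monotonic trend, although the abstract does not say which statistic the authors used.

The sketch below implements Spearman's correlation from scratch, with tied values receiving their average rank. The chromosome sizes and GC fractions are hypothetical, not the measured flow-karyotype data.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 0-based positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


# Hypothetical chromosome sizes (Mb) and GC fractions, for illustration only.
sizes = [200.0, 150.0, 110.0, 90.0, 60.0, 25.0, 12.0]
gc = [0.39, 0.40, 0.40, 0.41, 0.42, 0.45, 0.48]
print(f"Spearman rho = {spearman_rho(sizes, gc):.2f}")
```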
Plant Genome Size Research: A Field In Focus

PubMed Central

BENNETT, M. D.; LEITCH, I. J.

2005-01-01

This Special Issue contains 18 papers arising from presentations at the Second Plant Genome Size Workshop and Discussion Meeting (hosted by the Royal Botanic Gardens, Kew, 8–12 September, 2003). This preface provides an overview of these papers, setting their key contents in the broad framework of this highly active field. It also highlights a few overarching issues with wide biological impact or interest, including (1) the need to unify terminology relating to C-value and genome size, (2) the ongoing quest for reliable gold standards for plant genome size estimation, (3) how knowledge of species' DNA amounts has increased in recent years, (4) the existence, causes and significance of intraspecific variation, (5) recent progress in understanding the mechanisms and evolutionary patterns of genome size change, and (6) the impact of genome size knowledge on related biological activities such as genetic fingerprinting and quantitative genetics. The paper offers a vision of how increased knowledge and understanding of genome size will contribute to holistic genomic studies in both plants and animals in the next decade. PMID:15596455
Size-resolved emission rates of airborne bacteria and fungi in an occupied classroom

PubMed Central

Qian, J; Hospodsky, D; Yamamoto, N; Nazaroff, W W; Peccia, J

2012-01-01

The role of human occupancy as a source of indoor biological aerosols is poorly understood. Size-resolved concentrations of total and biological particles in indoor air were quantified in a classroom under occupied and vacant conditions. Per-occupant emission rates were estimated through a mass-balance modeling approach, and the microbial diversity of indoor and outdoor air during occupancy was determined via rDNA gene sequence analysis. Significant increases of total particle mass and bacterial genome concentrations were observed during the occupied period compared to the vacant case. These increases varied in magnitude with particle size and ranged from 3 to 68 times for total mass, 12–2700 times for bacterial genomes, and 1.5–5.2 times for fungal genomes. Emission rates per person-hour attributable to occupancy were 31 mg, 37 × 10⁶ genome copies, and 7.3 × 10⁶ genome copies for total particle mass, bacteria, and fungi, respectively. Of the bacterial emissions, ∼18% are from taxa that are closely associated with the human skin microbiome. This analysis provides size-resolved, per person-hour emission rates for these biological particles and illustrates the extent to which being in an occupied room results in exposure to bacteria that are associated with previous or current human occupants. Practical Implications Presented here are the first size-resolved, per person emission rate estimates of bacterial and fungal genomes for a common occupied indoor space. The marked differences observed between total particle and bacterial size distributions suggest that size-dependent aerosol models that use total particles as a surrogate for microbial particles incorrectly assess the fate of and human exposure to airborne bacteria. The strong signal of human microbiota in airborne particulate matter in an occupied setting demonstrates that the aerosol route can be a source of exposure to microorganisms emitted from the skin, hair, nostrils, and mouths of other occupants.
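The per-occupant emission rates come from a mass-balance model. The sketch below shows the simplest version of that idea. It assumes a single well-mixed room at steady state, where the concentration increment caused by occupancy is removed by ventilation and by a first-order loss term such as deposition. Under those assumptions, E = ΔC · V · (λ + k) / N.

The function name, parameter names and input values are hypothetical. The authors' model may include additional terms that this sketch omits.

```python
def per_person_emission_rate(c_occupied: float, c_vacant: float,
                             volume_m3: float, air_exchange_per_h: float,
                             loss_rate_per_h: float, occupants: int) -> float:
    """Steady-state, well-mixed estimate of emissions per person-hour.

    Concentrations are per m^3 (e.g. genome copies/m^3); the result is in
    the same units per person per hour.
    """
    if occupants <= 0:
        raise ValueError("need at least one occupant")
    delta_c = c_occupied - c_vacant  # increment attributed to occupancy
    removal_per_h = air_exchange_per_h + loss_rate_per_h  # ventilation + deposition
    return delta_c * volume_m3 * removal_per_h / occupants


# Hypothetical inputs for a single particle-size bin.
rate = per_person_emission_rate(c_occupied=5.0e5, c_vacant=1.0e4,
                                volume_m3=285.0, air_exchange_per_h=4.0,
                                loss_rate_per_h=1.0, occupants=30)
print(f"{rate:.2e} copies per person-hour")
```

Deposition rates depend strongly on particle size. A size-resolved analysis would therefore apply this calculation separately to each size bin, each with its own loss rate.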
PMID:22257156
Detecting microsatellites within genomes: significant variation among algorithms.

PubMed

Leclercq, Sébastien; Rivals, Eric; Jarne, Philippe

2007-04-18

Microsatellites are short, tandemly-repeated DNA sequences which are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker). Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared by fixing parameter settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster) spanning a wide range of size and structure. Significant differences for all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp), regardless of motif. Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should therefore be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions.
Detecting microsatellites within genomes: significant variation among algorithms

PubMed Central

Leclercq, Sébastien; Rivals, Eric; Jarne, Philippe

2007-01-01

Background Microsatellites are short, tandemly-repeated DNA sequences which are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker). Results Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared by fixing parameter settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster) spanning a wide range of size and structure. Significant differences for all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp), regardless of motif. Conclusion Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should therefore be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions. PMID:17442102
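The finding that detection results depend heavily on parameter choices can be reproduced even with a toy detector. The sketch below is a hypothetical greedy scanner for perfect tandem repeats. It is not a reimplementation of TRF, Mreps, Sputnik, STAR or RepeatMasker, and its `max_motif`, `min_copies` and `min_length` thresholds are illustrative assumptions. Lowering only the minimum length changes which repeats are reported in the same sequence.

```python
def find_perfect_microsatellites(seq, max_motif=6, min_copies=3, min_length=12):
    """Greedy left-to-right scan for perfect tandem repeats.

    Returns (start, motif, copies) tuples with 0-based start positions.
    At each position the longest qualifying run is kept; ties go to the
    shorter motif, so (AC)n is reported rather than (ACAC)n.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best, best_span = None, 0
        for k in range(1, max_motif + 1):
            motif = seq[i:i + k]
            if len(motif) < k:  # too close to the end for this motif length
                break
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            span = copies * k
            if copies >= min_copies and span >= min_length and span > best_span:
                best, best_span = (i, motif, copies), span
        if best is not None:
            hits.append(best)
            i += best_span  # skip past the reported repeat
        else:
            i += 1
    return hits


dna = "GGT" + "CA" * 5 + "TTG" + "AGC" * 3 + "T"
print(find_perfect_microsatellites(dna))                # default 12 bp minimum
print(find_perfect_microsatellites(dna, min_length=8))  # relaxed minimum length
```

With the default 12 bp minimum, neither short repeat in `dna` is reported. At 8 bp, both the (CA)5 and (AGC)3 runs appear. This illustrates why short microsatellites are especially sensitive to detection settings.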
Conserved Gene Order and Expanded Inverted Repeats Characterize Plastid Genomes of Thalassiosirales

PubMed Central

Ashworth, Matt P.; Baeshen, Nabih A.; Baeshen, Mohammad N.; Bahieldin, Ahmed; Theriot, Edward C.; Jansen, Robert K.

2014-01-01

Diatoms are mostly photosynthetic eukaryotes within the heterokont lineage. Variable plastid genome sizes and extensive genome rearrangements have been observed across the diatom phylogeny, but little is known about plastid genome evolution within order- or family-level clades. The Thalassiosirales is one of the more comprehensively studied orders in terms of both genetics and morphology. Seven complete diatom plastid genomes are reported here including four Thalassiosirales: Thalassiosira weissflogii, Roundia cardiophora, Cyclotella sp. WC03_2, Cyclotella sp. L04_2, and three additional non-Thalassiosirales species Chaetoceros simplex, Cerataulina daemon, and Rhizosolenia imbricata. The sizes of the seven genomes vary from 116,459 to 129,498 bp, and their genomes are compact and lack introns. The larger size of the plastid genomes of Thalassiosirales compared to other diatoms is due primarily to expansion of the inverted repeat. Gene content within Thalassiosirales is more conserved compared to other diatom lineages. Gene order within Thalassiosirales is highly conserved except for the extensive genome rearrangement in Thalassiosira oceanica. Cyclotella nana, Thalassiosira weissflogii and Roundia cardiophora share an identical gene order, which is inferred to be the ancestral order for the Thalassiosirales, differing from that of the other two Cyclotella species by a single inversion. The genes ilvB and ilvH are missing in all six diatom plastid genomes except for Cerataulina daemon, suggesting an independent gain of these genes in this species. The acpP1 gene is missing in all Thalassiosirales, suggesting that its loss may be a synapomorphy for the order and this gene may have been functionally transferred to the nucleus. Three genes involved in photosynthesis, psaE, psaI, psaM, are missing in Rhizosolenia imbricata, which represents the first documented instance of the loss of photosynthetic genes in diatom plastid genomes. PMID:25233465
Three tiers of genome evolution in reptiles

PubMed Central

Organ, Chris L.; Moreno, Ricardo Godínez; Edwards, Scott V.

2008-01-01

Characterization of reptilian genomes is essential for understanding the overall diversity and evolution of amniote genomes, because reptiles, which include birds, constitute a major fraction of the amniote evolutionary tree. To better understand the evolution and diversity of genomic characteristics in Reptilia, we conducted comparative analyses of online sequence data from Alligator mississippiensis (alligator) and Sphenodon punctatus (tuatara) as well as genome size and karyological data from a wide range of reptilian species. At the whole-genome and chromosomal tiers of organization, we find that reptilian genome size distribution is consistent with a model of continuous gradual evolution while genomic compartmentalization, as manifested in the number of microchromosomes and macrochromosomes, appears to have undergone early rapid change. At the sequence level, the third genomic tier, we find that exon size in Alligator is distributed in a pattern matching that of exons in Gallus (chicken), especially in the 101–200 bp size class. A small spike in the fraction of exons in the 301 bp–1 kb size class is also observed for Alligator, but more so for Sphenodon. For introns, we find that members of Reptilia have a larger fraction of introns within the 101 bp–2 kb size class and a lower fraction of introns within the 5–30 kb size class than do mammals. These findings suggest that the mode of reptilian genome evolution varies across three hierarchical levels of the genome, a pattern consistent with a mosaic model of genomic evolution. PMID:21669810
Body lice and head lice (Anoplura: Pediculidae) have the smallest genomes of any hemimetabolous insect reported to date.

PubMed

Johnston, J Spencer; Yoon, Kyong Sup; Strycharz, Joseph P; Pittendrigh, Barry R; Clark, J Marshall

2007-11-01

The human body louse, Pediculus humanus humanus L. (Anoplura: Pediculidae), is a vector of several diseases, including louse-borne epidemic typhus, relapsing fever, and trench fever, whereas the head louse, Pediculus humanus capitis De Geer (Anoplura: Pediculidae), is more of a pest of social concern. Sequencing of the body louse genome has recently been proposed and undertaken by the National Human Genome Research Institute. One of the first steps in understanding an organism's genome is to determine its genome size. Here, using flow cytometry, we present evidence that the body louse genome size is 104.7 ± 1.4 Mb for females and 108.3 ± 1.1 Mb for males. Our results suggest that head lice also have a small genome, similar in size to that of the body louse. Thus, Pediculus lice have one of the smallest genome sizes known in insects, suggesting that the body louse may be a suitable choice as a minimal hemimetabolous genome.
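Flow-cytometric genome size estimates are typically obtained by co-staining sample nuclei with nuclei of a standard of known genome size. The sample size is then the standard's size scaled by the ratio of the two G0/G1 peak positions. The sketch below illustrates that ratio calculation.

The standard size (300 Mb), the peak positions and the function name are hypothetical; the abstract does not state which standard was used. The abstract also does not say whether its ± values are standard deviations or standard errors, so the sketch simply reports a sample standard deviation.

```python
from statistics import mean, stdev


def genome_size_from_peaks(sample_peak: float, standard_peak: float,
                           standard_size_mb: float) -> float:
    """Scale a standard's known size by the sample/standard G0/G1 peak ratio."""
    if standard_peak <= 0:
        raise ValueError("standard peak position must be positive")
    return sample_peak / standard_peak * standard_size_mb


# Hypothetical (sample peak, standard peak) channel means for three runs.
runs = [(52.1, 150.0), (51.8, 149.2), (52.6, 151.1)]
estimates = [genome_size_from_peaks(s, r, standard_size_mb=300.0) for s, r in runs]
print(f"{mean(estimates):.1f} +/- {stdev(estimates):.1f} Mb (mean +/- SD)")
```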
Differentially Methylated Region-Representational Difference Analysis (DMR-RDA): A Powerful Method to Identify DMRs in Uncharacterized Genomes.

PubMed

Sasheva, Pavlina; Grossniklaus, Ueli

2017-01-01

In recent years, it has become increasingly clear that environmental influences can affect the epigenomic landscape and that some epigenetic variants can have heritable, phenotypic effects. While there are a variety of methods to perform genome-wide analyses of DNA methylation in model organisms, this is still a challenging task for non-model organisms without a reference genome. Differentially methylated region-representational difference analysis (DMR-RDA) is a sensitive and powerful PCR-based technique that isolates DNA fragments that are differentially methylated between two otherwise identical genomes. The technique does not require special equipment and is independent of prior knowledge about the genome. It is applicable even to large, highly complex genomes, making it the method of choice for analyzing non-model plant systems.
Stomatal vs. genome size in angiosperms: the somatic tail wagging the genomic dog?

PubMed Central

Hodgson, J. G.; Sharafi, M.; Jalili, A.; Díaz, S.; Montserrat-Martí, G.; Palmer, C.; Cerabolini, B.; Pierce, S.; Hamzehee, B.; Asri, Y.; Jamzad, Z.; Wilson, P.; Raven, J. A.; Band, S. R.; Basconcelo, S.; Bogard, A.; Carter, G.; Charles, M.; Castro-Díez, P.; Cornelissen, J. H. C.; Funes, G.; Jones, G.; Khoshnevis, M.; Pérez-Harguindeguy, N.; Pérez-Rontomé, M. C.; Shirvany, F. A.; Vendramini, F.; Yazdani, S.; Abbas-Azimi, R.; Boustani, S.; Dehghan, M.; Guerrero-Campo, J.; Hynd, A.; Kowsary, E.; Kazemi-Saeed, F.; Siavash, B.; Villar-Salvador, P.; Craigie, R.; Naqinezhad, A.; Romo-Díez, A.; de Torres Espuny, L.; Simmons, E.

2010-01-01

Background and Aims Genome size is a function, and the product, of cell volume. As such it is contingent on ecological circumstance. The nature of ‘this ecological circumstance’ is, however, hotly debated. Here, we investigate for angiosperms whether stomatal size may be this ‘missing link’: the primary determinant of genome size. Stomata are crucial for photosynthesis and their size affects functional efficiency. Methods Stomatal and leaf characteristics were measured for 1442 species from Argentina, Iran, Spain and the UK and, using PCA, some emergent ecological and taxonomic patterns were identified. Subsequently, an assessment of the relationship between genome-size values obtained from the Plant DNA C-values database and measurements of stomatal size was carried out. Key Results Stomatal size is an ecologically important attribute. It varies with life-history (woody species < herbaceous species < vernal geophytes) and contributes to ecologically and physiologically important axes of leaf specialization. Moreover, it is positively correlated with genome size across a wide range of major taxa. Conclusions Stomatal size predicts genome size within angiosperms. Correlation is not, however, proof of causality and here our interpretation is hampered by unexpected deficiencies in the scientific literature. Firstly, there are discrepancies between our own observations and established ideas about the ecological significance of stomatal size; very large stomata, theoretically facilitating photosynthesis in deep shade, were, in this study (and in other studies), primarily associated with vernal geophytes of unshaded habitats. Secondly, the lower size limit at which stomata can function efficiently, and the ecological circumstances under which these minute stomata might occur, have not been satisfactorily resolved. Thus, our hypothesis, that the optimization of stomatal size for functional efficiency is a major ecological determinant of genome size, remains unproven.
PMID:20375204
Performance of genotype imputation for low frequency and rare variants from the 1000 genomes.

PubMed

Zheng, Hou-Feng; Rong, Jing-Jing; Liu, Ming; Han, Fang; Zhang, Xing-Wei; Richards, J Brent; Wang, Li

2015-01-01

Genotype imputation is now routinely applied in genome-wide association studies (GWAS) and meta-analyses. However, because most imputations have been run using HapMap samples as the reference, imputation of low-frequency and rare variants (minor allele frequency (MAF) < 5%) has not been systematically assessed. With the emergence of next-generation sequencing, large reference panels (such as the 1000 Genomes panel) are available to facilitate imputation of these variants. Therefore, to estimate the performance of low-frequency and rare variant imputation, we imputed 153 individuals, each with genotype data from three arrays (317K, 610K and 1 million SNPs), against three reference panels using IMPUTE version 2: the 1000 Genomes pilot March 2010 release (1KGpilot), the 1000 Genomes interim August 2010 release (1KGinterim), and the 1000 Genomes phase1 November 2010 and May 2011 release (1KGphase1). These three releases of the 1000 Genomes data differ in sample size, ancestry diversity, number of variants and their frequency spectrum. We found that both the reference panel and GWAS chip density affect the imputation of low-frequency and rare variants. 1KGphase1 outperformed the other two panels, with a higher concordance rate, a higher proportion of well-imputed variants (info > 0.4) and a higher mean info score in each MAF bin. Similarly, the 1M chip outperformed the 610K and 317K chips. However, for very rare variants (MAF ≤ 0.3%), only 0–1% of the variants were well imputed. We conclude that the imputation of low-frequency and rare variants improves with larger reference panels and higher-density genome-wide genotyping arrays. Yet, despite a large reference panel and dense genotyping, very rare variants remain difficult to impute.
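The comparison above summarizes imputation quality per minor-allele-frequency bin and treats a variant as well imputed when its info score exceeds 0.4. The sketch below shows one way to compute those per-bin summaries from (MAF, info) pairs that an imputation tool such as IMPUTE2 has already produced.

The 0.3% and 5% bin edges come from the abstract. The 1% edge, the function name and the example values are hypothetical.

```python
from collections import defaultdict


def summarize_by_maf(variants, edges=(0.003, 0.01, 0.05), info_threshold=0.4):
    """Group (maf, info) pairs into MAF bins (lo, hi] and summarize each bin.

    Reports the variant count, the share with info > info_threshold, and the
    mean info score. The last bin runs from the top edge to MAF = 0.5.
    """
    edges = list(edges)
    grouped = defaultdict(list)
    for maf, info in variants:
        # first edge the MAF does not exceed; len(edges) means above the top edge
        idx = next((j for j, edge in enumerate(edges) if maf <= edge), len(edges))
        grouped[idx].append(info)
    summary = {}
    for idx in sorted(grouped):
        lo = 0.0 if idx == 0 else edges[idx - 1]
        hi = edges[idx] if idx < len(edges) else 0.5
        infos = grouped[idx]
        summary[(lo, hi)] = {
            "n": len(infos),
            "well_imputed": sum(x > info_threshold for x in infos) / len(infos),
            "mean_info": sum(infos) / len(infos),
        }
    return summary


# Hypothetical (MAF, info) pairs, for illustration only.
variants = [(0.002, 0.20), (0.001, 0.35), (0.008, 0.55), (0.006, 0.30),
            (0.03, 0.85), (0.02, 0.72), (0.2, 0.97)]
for bin_range, stats in summarize_by_maf(variants).items():
    print(bin_range, stats)
```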
Highly rearranged and size-variable chloroplast genomes in conifers II clade (cupressophytes): evolution towards shorter intergenic spacers.

PubMed

Wu, Chung-Shien; Chaw, Shu-Miaw

2014-04-01

Although conifers are of immense ecological and economic value, bioengineering of their chloroplasts remains undeveloped. Understanding the chloroplast genomic organization of conifers can facilitate their bioengineering. Members of the conifer II clade (or cupressophytes) are highly diverse in both morphological features and chloroplast genomic organization. We compared six cupressophyte chloroplast genomes (cpDNAs) that represent four of the five cupressophyte families, including three genomes reported here for the first time (Agathis dammara, Calocedrus formosana and Nageia nagi). The six cupressophyte cpDNAs have lost a pair of large inverted repeats (IRs) and vary greatly in size, organization and tRNA copy number. We demonstrate that cupressophyte cpDNAs have evolved towards reduced size, largely owing to shrunken intergenic spacers. In cupressophytes, cpDNA rearrangements are capable of extending intergenic spacers, and synonymous mutations are negatively associated with the size and frequency of rearrangements. The variable cpDNA sizes of cupressophytes may have been shaped by mutational burden and genomic rearrangements. On the basis of cpDNA organization, our analyses revealed that cpDNA rearrangements in gymnosperms are phylogenetically informative and support the 'gnepines' clade. In addition, which IR copy is removed influences the minimal number of rearrangements required for the gnepines and cupressophyte clades: Pinaceae favour removal of IRB, whereas cupressophytes favour exclusion of IRA. This result strongly suggests that different IR copies have been lost from conifers I and II. Our data help clarify the complexity and evolution of cupressophyte cpDNAs. © 2013 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology, The Association of Applied Biologists and John Wiley & Sons Ltd.
Contrasting growth phenology of native and invasive forest shrubs mediated by genome size.

PubMed

Fridley, Jason D; Craddock, Alaä

2015-08-01

Examination of the significance of genome size to plant invasions has been largely restricted to its association with growth rate. We investigated the novel hypothesis that genome size is related to forest invasions through its association with growth phenology, as a result of the ability of large-genome species to grow more effectively through cell expansion at cool temperatures. We monitored the spring leaf phenology of 54 species of eastern USA deciduous forests, including native and invasive shrubs of six common genera. We used new measurements of genome size to evaluate its association with spring budbreak, cell size, summer leaf production rate, and photosynthetic capacity. In a phylogenetic hierarchical model that differentiated native and invasive species as a function of summer growth rate and spring budbreak timing, species with smaller genomes exhibited both faster growth and delayed budbreak compared with those with larger nuclear DNA content. Growth rate, but not budbreak timing, was associated with whether a species was native or invasive. Our results support genome size as a broad indicator of the growth behavior of woody species. Surprisingly, invaders of deciduous forests show the same small-genome tendencies as invaders of more open habitats, supporting genome size as a robust indicator of invasiveness. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.
Recent updates and developments to plant genome size databases

PubMed Central

Garcia, Sònia; Leitch, Ilia J.; Anadon-Rosell, Alba; Canela, Miguel Á.; Gálvez, Francisco; Garnatje, Teresa; Gras, Airy; Hidalgo, Oriane; Johnston, Emmeline; Mas de Xaxars, Gemma; Pellicer, Jaume; Siljak-Yakovlev, Sonja; Vallès, Joan; Vitales, Daniel; Bennett, Michael D.

2014-01-01

Two plant genome size databases have recently been updated and/or extended: the Plant DNA C-values database (http://data.kew.org/cvalues), and GSAD, the Genome Size in Asteraceae database (http://www.asteraceaegenomesize.com). While the first provides information on nuclear DNA contents across land plants and some algal groups, the second focuses on one of the largest and most economically important angiosperm families, Asteraceae. Genome size data have numerous applications: they can be used in comparative studies of genome evolution, or as a tool to estimate the cost of whole-genome sequencing programs. The growing interest in genome size and the increasing rate of data accumulation have necessitated continued updates of these databases. Currently, the Plant DNA C-values database (Release 6.0, Dec. 2012) contains data for 8510 species, while GSAD has 1219 species (Release 2.0, June 2013), representing increases of 17 and 51%, respectively, in the number of species with genome size data compared with previous releases. Here we provide overviews of the most recent releases of each database, and outline new features of GSAD. The latter include (i) a tool to visually compare genome size data between species, (ii) the option to export data and (iii) a webpage containing information about flow cytometry protocols. PMID:24288377
The cellular and Genomic response of rat dopaminergic neurons (N27) to coated nanosilver

EPA Science Inventory

This study examined if nanosilver (nanoAg) of different sizes and coatings were differentially toxic to oxidative stress-sensitive neurons. N27 rat dopaminergic neurons were exposed (0.5-5ppm) to a set of nanoAg of different sizes (10nm, 75nm) and coatings (PVP, citrate) and thei...
Using flow cytometry to estimate pollen DNA content: improved methodology and applications

PubMed Central

Kron, Paul; Husband, Brian C.

2012-01-01

Background and Aims Flow cytometry has been used to measure nuclear DNA content in pollen, mostly to understand pollen development and detect unreduced gametes. Published data have not always met the high-quality standards required for some applications, in part due to difficulties inherent in the extraction of nuclei. Here we describe a simple and relatively novel method for extracting pollen nuclei, involving the bursting of pollen through a nylon mesh, compare it with other methods and demonstrate its broad applicability and utility. Methods The method was tested across 80 species, 64 genera and 33 families, and the data were evaluated using established criteria for estimating genome size and analysing cell cycle. Filter bursting was directly compared with chopping in five species, yields were compared with published values for sonicated samples, and the method was applied by comparing genome size estimates for leaf and pollen nuclei in six species. Key Results Data quality met generally applied standards for estimating genome size in 81 % of species and the higher best practice standards for cell cycle analysis in 51 %. In 41 % of species we met the most stringent criterion of screening 10 000 pollen grains per sample. In direct comparison with two chopping techniques, our method produced better quality histograms with consistently higher nuclei yields, and yields were higher than previously published results for sonication. In three binucleate and three trinucleate species we found that pollen-based genome size estimates differed from leaf tissue estimates by 1·5 % or less when 1C pollen nuclei were used, while estimates from 2C generative nuclei differed from leaf estimates by up to 2·5 %. Conclusions The high success rate, ease of use and wide applicability of the filter bursting method show that this method can facilitate the use of pollen for estimating genome size and dramatically improve unreduced pollen production estimation with flow cytometry. 
PMID:22875815
Genome size and invasiveness traits in the hybrid meadow knapweed complex (Centaurea x moncktonii) in eastern North America

USDA-ARS's Scientific Manuscript database

Hybridization and genomic admixture between divergent populations or species may be an important driver of plant invasiveness. Recent studies have emphasized the critical role that reductions in genome size may play in facilitating the rapid evolution of invasiveness, and small genome size has been ...
Comparison of Two Capillary Gel Electrophoresis Systems for Clostridium difficile Ribotyping, Using a Panel of Ribotype 027 Isolates and Whole-Genome Sequences as a Reference Standard

PubMed Central

Xiao, Meng; Kong, Fanrong; Jin, Ping; Wang, Qinning; Xiao, Kelin; Jeoffreys, Neisha; James, Gregory

2012-01-01

PCR ribotyping is the most commonly used Clostridium difficile genotyping method, but its utility is limited by lack of standardization. In this study, we analyzed four published whole genomes and tested an international collection of 21 well-characterized C. difficile ribotype 027 isolates as the basis for comparison of two capillary gel electrophoresis (CGE)-based ribotyping methods. There were unexpected differences between the 16S-23S rRNA intergenic spacer region (ISR) allelic profiles of the four ribotype 027 genomes, but six bands were identified in all four and a seventh in three genomes. All seven bands and another, not identified in any of the whole genomes, were found in all 21 isolates. We compared sequencer-based CGE (SCGE) with three different primer pairs to the Qiagen QIAxcel CGE (QCGE) platform. Deviations from individual reference/consensus band sizes were smaller for SCGE (0 to 0.2 bp) than for QCGE (4.2 to 9.5 bp). Compared with QCGE, SCGE more readily distinguished bands of similar length (more discriminatory), detected bands of larger size and lower intensity (more sensitive), and assigned band sizes more accurately and reproducibly, making it more suitable for standardization. Specifically, QCGE failed to identify the largest ISR amplicon. Based on several criteria, we recommend the primer set 16S-USA/23S-USA for use in a proposed standard SCGE method. Similar differences between SCGE and QCGE were found on testing of 14 isolates of four other C. difficile ribotypes. Our results indicate that ISR profiles derived from accurate sequencer-based band lengths would be preferable to agarose gel-based banding patterns for the assignment of ribotypes. PMID:22692737
Microeconomic principles explain an optimal genome size in bacteria.

PubMed

Ranea, Juan A G; Grant, Alastair; Thornton, Janet M; Orengo, Christine A

2005-01-01

Bacteria can clearly enhance their survival by expanding their genetic repertoire. However, the tight packing of the bacterial genome and the fact that the most evolved species do not necessarily have the biggest genomes suggest that other evolutionary factors limit their genome expansion. To clarify these restrictions on size, we studied the protein families contributing most significantly to bacterial-genome complexity. We found that all bacteria apply the same basic and ancestral 'molecular technology' to optimize their reproductive efficiency. The same microeconomic principles that define the optimum size of a factory can also explain the existence of a statistical optimum in bacterial genome size. This optimum is reached when the bacterial genome obtains the maximum metabolic complexity (revenue) for minimal regulatory genes (logistic cost).
Rapid discrimination of sequences flanking and within T-DNA insertions in the Arabidopsis genome.

PubMed

Ponce, M R; Quesada, V; Micol, J L

1998-05-01

An improvement to previous methods for recovering Arabidopsis thaliana genomic DNA flanking T-DNA insertions is presented that avoids some of the cloning difficulties caused by the concatemeric nature of T-DNA inserts. The principle of the procedure is to categorize, by size, restriction fragments of mutant DNA produced in separate digestions with NdeI and Bst1107I. Given that the sites for these two enzymes are contiguous within the pGV3850:1003 T-DNA construct, the restriction fragments obtained fall into two categories: those showing identical size in both digestions, which correspond to sequences internal to T-DNA concatemers; and those of different sizes, which contain the junctions between plant DNA and the T-DNA insert. This criterion makes it easy to distinguish the digestion products corresponding to internal T-DNA parts, which need no further attention, from those which presumably include a segment of the locus of interest. Discrimination between restriction fragments of genomic mutant DNA can be made on rescued plasmids, inverse PCR amplification products or bands in a genomic blot.
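The size criterion described above amounts to a multiset comparison between the two digests. Below is a minimal sketch, assuming fragment sizes from the NdeI and Bst1107I digests have already been estimated as integer base-pair values and that exact size equality is meaningful; real gel or blot measurements would need a matching tolerance, which this sketch omits. The function name and example sizes are hypothetical.

```python
from collections import Counter


def classify_fragments(ndei_sizes, bst1107i_sizes):
    """Compare fragment sizes (bp) from two separate digests of the same DNA.

    Sizes present in both digests are taken as internal T-DNA fragments;
    sizes present in only one digest are candidate plant/T-DNA junctions.
    Counter arithmetic treats each list as a multiset, so repeated sizes
    (common in concatemers) are matched one-for-one.
    """
    ndei, bst = Counter(ndei_sizes), Counter(bst1107i_sizes)
    internal = sorted((ndei & bst).elements())        # identical size in both digests
    junctions_ndei = sorted((ndei - bst).elements())  # only in the NdeI digest
    junctions_bst = sorted((bst - ndei).elements())   # only in the Bst1107I digest
    return internal, junctions_ndei, junctions_bst


if __name__ == "__main__":
    # Hypothetical sizes for illustration only.
    internal, j1, j2 = classify_fragments([1200, 3400, 3400, 5100], [1200, 3400, 3400, 6900])
    print("internal:", internal)
    print("junction candidates (NdeI):", j1)
    print("junction candidates (Bst1107I):", j2)
```

Because the two enzyme sites are contiguous within the T-DNA, only fragments that end in flanking plant DNA (where the nearest NdeI and Bst1107I sites differ) change size between digests, which is what the set difference picks out.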

Transethnic differences in GWAS signals: A simulation study.

PubMed

Zanetti, Daniela; Weale, Michael E

2018-05-07

Genome-wide association studies (GWASs) have allowed researchers to identify thousands of single nucleotide polymorphisms (SNPs) and other variants associated with particular complex traits. Previous studies have reported differences in the strength and even the direction of GWAS signals across different populations. These differences could be due to a combination of (1) lack of power, (2) allele frequency differences, (3) linkage disequilibrium (LD) differences, and (4) true differences in causal variant effect sizes. To determine whether properties (1)-(3) on their own might be sufficient to explain the patterns previously noted in strong GWAS signals, we simulated case-control data of European, Asian and African ancestry, applying realistic allele frequencies and LD from 1000 Genomes data but enforcing equal causal effect sizes across populations. Many of the observed differences in strong GWAS signals could indeed be accounted for by allele frequency and LD differences, enhanced by the Euro-centric SNP bias and lower SNP coverage found in older GWAS panels. While we cannot rule out a role for true transethnic effect size differences, our results suggest that strong causal effects may be largely shared among human populations, motivating the use of transethnic data for fine-mapping. © 2018 John Wiley & Sons Ltd/University College London.
Selective significance of genome size in a plant community with heavy metal pollution.

PubMed

Vidic, T; Greilhuber, J; Vilhar, B; Dermastia, M

2009-09-01

In eukaryotes, nuclear genome sizes vary by more than five orders of magnitude. This variation is not related to organismal complexity, and its origin and biological significance are still disputed. One of the open questions is whether genome size has an adaptive role. We tested the hypothesis that genome size has selective significance, using as a model five grassland communities occurring on a gradient of soil metal pollution. We detected a negative correlation between the concentration of contaminating metals in the soil and the number of vascular plant species. Analysis of genome sizes of 70 herbaceous dicot perennial species occurring on the investigated plots revealed a negative correlation between the concentration of contaminating metals in the soil and the proportion of species with large genomes in plant communities. Consistent with the hypothesis, these results show that species with large genomes are at a selective disadvantage in extreme environmental conditions.
On the need for widespread horizontal gene transfers under genome size constraint.

PubMed

Isambert, Hervé; Stein, Richard R

2009-08-25

While eukaryotes primarily evolve by duplication-divergence expansion (and reduction) of their own gene repertoire with only rare horizontal gene transfers, prokaryotes appear to evolve under both gene duplications and widespread horizontal gene transfers over long evolutionary time scales. However, the evolutionary origin of this striking difference in the importance of horizontal gene transfers remains by and large a mystery. We propose that the abundance of horizontal gene transfers in free-living prokaryotes is a simple but necessary consequence of two opposite effects: i) their apparent genome size constraint compared to typical eukaryote genomes and ii) their underlying genome expansion dynamics through gene duplication-divergence evolution, as demonstrated by the presence of many tandem and block repeated genes. In principle, this combination of genome size constraint and underlying duplication expansion should lead to a coalescent-like process with extensive turnover of functional genes. This would, however, imply the unlikely, systematic reinvention of functions from discarded genes within independent phylogenetic lineages. Instead, we propose that the long-term evolutionary adaptation of free-living prokaryotes must have resulted in the emergence of efficient non-phylogenetic pathways to circumvent gene loss. This need for widespread horizontal gene transfers due to genome size constraint implies, in particular, that prokaryotes must remain under strong selection pressure in order to maintain the long-term evolutionary adaptation of their "mutualized" gene pool, beyond the inevitable turnover of individual prokaryote species. By contrast, the absence of genome size constraint for typical eukaryotes has presumably relaxed their need for widespread horizontal gene transfers and strong selection pressure. 
Yet, the resulting loss of genetic functions, due to weak selection pressure and inefficient gene recovery mechanisms, must have ultimately favored the emergence of more complex life styles and ecological integration of many eukaryotes. This article was reviewed by Pierre Pontarotti, Eugene V Koonin and Sergei Maslov.
Genome evolution in Reptilia, the sister group of mammals.

PubMed

Janes, Daniel E; Organ, Christopher L; Fujita, Matthew K; Shedlock, Andrew M; Edwards, Scott V

2010-01-01

The genomes of birds and nonavian reptiles (Reptilia) are critical for understanding genome evolution in mammals and amniotes generally. Despite decades of study at the chromosomal and single-gene levels, and evidence of great diversity in genome size, karyotype, and sex chromosomes, reptile genomes are virtually unknown in the comparative genomics era. The recent sequencing of the chicken and zebra finch genomes, in conjunction with genome scans and the online publication of the Anolis lizard genome, has begun to clarify the events leading from an ancestral amniote genome (predicted to be large and to possess a diverse repeat landscape on par with mammals and a birdlike sex chromosome system) to the small and highly streamlined genomes of birds. Reptilia exhibit a wide range of evolutionary rates across different subgenomes and, from isochores to mitochondrial DNA, provide a critical contrast to the genomic paradigms established in mammals.
Genomic gigantism: DNA loss is slow in mountain grasshoppers.

PubMed

Bensasson, D; Petrov, D A; Zhang, D X; Hartl, D L; Hewitt, G M

2001-02-01

Several studies have shown DNA loss to be inversely correlated with genome size in animals. These studies include a comparison between Drosophila and the cricket, Laupala, but there has been no assessment of DNA loss in insects with very large genomes. Podisma pedestris, the brown mountain grasshopper, has a genome over 100 times as large as that of Drosophila and 10 times as large as that of Laupala. We used 58 paralogous nuclear pseudogenes of mitochondrial origin to study the characteristics of insertion, deletion, and point substitution in P. pedestris and Italopodisma. In animals, these pseudogenes are "dead on arrival"; they are abundant in many different eukaryotes, and their mitochondrial origin simplifies the identification of point substitutions accumulated in nuclear pseudogene lineages. There appears to be a mononucleotide repeat within the 643-bp pseudogene sequence studied that acts as a strong hot spot for insertions or deletions (indels). Because the data for other insect species did not contain such an unusual region, hot spots were excluded from species comparisons. The rate of DNA loss relative to point substitution appears to be considerably and significantly lower in the grasshoppers studied than in Drosophila or Laupala. This suggests that the inverse correlation between genome size and the rate of DNA loss can be extended to comparisons between insects with large or gigantic genomes (i.e., Laupala and Podisma). The low rate of DNA loss implies that in grasshoppers, the accumulation of point mutations is a more potent force for obscuring ancient pseudogenes than their loss through indel accumulation, whereas the reverse is true for Drosophila. The main factor contributing to the difference in the rates of DNA loss estimated for grasshoppers, crickets, and Drosophila appears to be deletion size. Large deletions are relatively rare in Podisma and Italopodisma.
Genomic Mapping of Human DNA provides Evidence of Difference in Stretch between AT and GC rich regions

NASA Astrophysics Data System (ADS)

Reifenberger, Jeffrey; Dorfman, Kevin; Cao, Han

Human DNA is not a polymer with a uniform distribution of all four nucleotides; rather, it contains regions of high AT and high GC content. When confined, these regions could stretch differently because of the extra hydrogen bond in the GC base pair. To measure this potential difference, human genomic DNA was nicked with NtBspQI, labeled at the nick sites with a Cy3-like fluorophore, stained with YOYO, loaded into a device containing an array of nanochannels, and imaged. Over 473,000 individual DNA molecules, corresponding to roughly 30x coverage of a human genome, were collected and aligned to the human reference. Using the known AT/GC content between aligned pairs of labels, stretch was measured for regions of similar size but different AT/GC content. Regions of high GC content were consistently more stretched than regions of high AT content, for label pairs spaced between 2.5 kbp and 500 kbp apart. For every 1% increase in GC content, stretch increased by roughly 0.06%. Although this effect is small, it is important to take differences in stretch between AT- and GC-rich regions into account to improve the sensitivity with which structural variations are detected. NIH Grant: R01-HG006851.
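The reported rate can be turned into a simple correction factor, which shows why a small effect matters for structural-variant calling. This is a minimal sketch that assumes the effect is linear in GC content over the whole range of interest and that each interval's GC content is known from the reference; the reference GC level and the function names are hypothetical, not the authors' method. Under these assumptions a 20-percentage-point GC difference corresponds to about 1.2% extra apparent length, roughly 1.2 kbp over a 100 kbp interval, which could otherwise be mistaken for an insertion.

```python
# 0.06% extra stretch per 1 percentage point of GC content, as reported above.
STRETCH_PER_GC_POINT = 0.0006


def stretch_ratio(gc_percent, reference_gc_percent):
    """Assumed linear model: how much more a region is stretched than a
    reference region, given their GC contents in percent."""
    return 1.0 + STRETCH_PER_GC_POINT * (gc_percent - reference_gc_percent)


def corrected_length(measured_bp, gc_percent, reference_gc_percent):
    """Rescale a measured inter-label distance to the reference GC level."""
    return measured_bp / stretch_ratio(gc_percent, reference_gc_percent)


if __name__ == "__main__":
    # A 60% GC interval compared with a hypothetical 40% GC reference.
    print(stretch_ratio(60, 40))
    print(round(corrected_length(101_200, 60, 40)))
```

Dividing by the ratio, rather than subtracting a fixed offset, keeps the correction proportional to interval length, matching the observation that the effect held across label spacings from 2.5 kbp to 500 kbp.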
Distribution and diversity of cytotypes in Dianthus broteri as evidenced by genome size variations

PubMed Central

Balao, Francisco; Casimiro-Soriguer, Ramón; Talavera, María; Herrera, Javier; Talavera, Salvador

2009-01-01

Background and Aims Studying the spatial distribution of cytotypes and genome size in plants can provide valuable information about the evolution of polyploid complexes. Here, the spatial distribution of cytological races and the amount of DNA in Dianthus broteri, an Iberian carnation with several ploidy levels, is investigated. Methods Sample chromosome counts and flow cytometry (using propidium iodide) were used to determine overall genome size (2C value) and ploidy level in 244 individuals of 25 populations. Both fresh and dried samples were investigated. Differences in 2C and 1Cx values among ploidy levels within biogeographical provinces were tested using ANOVA. Geographical correlations of genome size were also explored. Key Results Extensive variation in chromosome numbers (2n = 2x = 30, 2n = 4x = 60, 2n = 6x = 90 and 2n = 12x = 180) was detected, and the dodecaploid cytotype is reported for the first time in this genus. As regards cytotype distribution, six populations were diploid, 11 were tetraploid, three were hexaploid and five were dodecaploid. Except for one diploid population containing some triploid plants (2n = 45), the remaining populations showed a single cytotype. Diploids appeared in two disjunct areas (south-east and south-west), as did tetraploids (although with a considerably wider geographic range). Dehydrated leaf samples provided reliable measurements of DNA content. Genome size varied significantly among some cytotypes, and also extensively within diploid (up to 1·17-fold) and tetraploid (1·22-fold) populations. Nevertheless, this variation was not straightforwardly congruent with ecology or geographical distribution. Conclusions Dianthus broteri shows the highest diversity of cytotypes known to date in the genus Dianthus. Moreover, some cytotypes show remarkable internal genome size variation. The evolution of the complex is discussed in terms of autopolyploidy, with primary and secondary contact zones. PMID:19633312
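The two derived quantities used above, the monoploid genome size (1Cx) and the fold variation within a cytotype, are simple ratios. A minimal sketch follows; the function names and example values are hypothetical. Comparing 1Cx rather than 2C values is what allows genome sizes to be compared across ploidy levels: a tetraploid whose monoploid genome is unchanged has twice the 2C value of its diploid relative but the same 1Cx.

```python
def monoploid_size(two_c_pg, ploidy):
    """1Cx value (pg): the 2C DNA content divided by the number of monoploid
    chromosome sets x in the somatic nucleus (2 for 2n = 2x, 4 for 2n = 4x, ...)."""
    if ploidy <= 0:
        raise ValueError("ploidy must be a positive integer")
    return two_c_pg / ploidy


def fold_variation(values):
    """Largest-to-smallest ratio of a set of genome size estimates."""
    return max(values) / min(values)


if __name__ == "__main__":
    # Hypothetical 2C values (pg) for illustration, not data from the study.
    print(monoploid_size(2.0, 2), monoploid_size(4.2, 4))
    print(round(fold_variation([1.00, 1.10, 1.17]), 2))
```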
Comprehensive performance comparison of high-resolution array platforms for genome-wide Copy Number Variation (CNV) analysis in humans.

PubMed

Haraksingh, Rajini R; Abyzov, Alexej; Urban, Alexander Eckehart

2017-04-24

High-resolution microarray technology is routinely used in basic research and clinical practice to efficiently detect copy number variants (CNVs) across the entire human genome. A new generation of arrays combining high probe densities with optimized designs will be essential tools for genome analysis in the coming years. We systematically compared the genome-wide CNV detection power of all 17 available array designs from the Affymetrix, Agilent, and Illumina platforms by hybridizing the well-characterized genome of 1000 Genomes Project subject NA12878 to all arrays, and performing data analysis using both manufacturer-recommended and platform-independent software. We benchmarked the resulting CNV call sets from each array using a gold standard set of CNVs for this genome derived from 1000 Genomes Project whole genome sequencing data. The arrays tested comprise both SNP and aCGH platforms with varying designs and contain between ~0.5 and ~4.6 million probes. Across the arrays, CNV detection varied widely in number of CNV calls (4-489), CNV size range (~40 bp to ~8 Mbp), and percentage of non-validated CNVs (0-86%). We discovered strikingly strong effects of specific array design principles on performance. For example, some SNP array designs with the largest numbers of probes and extensive exonic coverage produced a considerable number of CNV calls that could not be validated, compared to designs with probe numbers that are sometimes an order of magnitude smaller. This effect was only partially ameliorated using different analysis software and optimizing data analysis parameters. High-resolution microarrays will continue to be used as reliable, cost- and time-efficient tools for CNV analysis. However, different applications tolerate different limitations in CNV detection. Our study quantified how these arrays differ in total number and size range of detected CNVs as well as sensitivity, and determined how each array balances these attributes. 
This analysis will inform appropriate array selection for future CNV studies, and allow better assessment of the CNV-analytical power of both published and ongoing array-based genomics studies. Furthermore, our findings emphasize the importance of concurrent use of multiple analysis algorithms and independent experimental validation in array-based CNV detection studies.
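Benchmarking a call set against a gold standard needs a rule for when a call counts as validated. The abstract does not state the rule used; a common convention, assumed here, is at least 50% reciprocal overlap with a gold-standard CNV on the same chromosome. The following is a minimal sketch under that assumption, with hypothetical names and half-open (chromosome, start, end) intervals.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Overlap length divided by the longer interval's length, i.e. the
    smaller of overlap/len(a) and overlap/len(b); 0.0 if none."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return overlap / max(a[2] - a[1], b[2] - b[1])


def percent_non_validated(calls: List[Interval], gold: List[Interval],
                          min_overlap: float = 0.5) -> float:
    """Percentage of calls with no gold-standard CNV at >= min_overlap."""
    if not calls:
        return 0.0
    unvalidated = sum(
        1 for call in calls
        if not any(reciprocal_overlap(call, g) >= min_overlap for g in gold)
    )
    return 100.0 * unvalidated / len(calls)


if __name__ == "__main__":
    # Hypothetical intervals for illustration only.
    gold = [("chr1", 1000, 2000), ("chr2", 5000, 9000)]
    calls = [("chr1", 1100, 2100), ("chr2", 5000, 5500), ("chr3", 0, 100)]
    print(f"{percent_non_validated(calls, gold):.1f}% non-validated")
```

The `min_overlap` threshold trades stringency against tolerance for imprecise breakpoints; in the example, the chr2 call lies entirely inside a gold-standard CNV yet fails validation because it covers only an eighth of it.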
Karyotype and genome size of Iberochondrostoma almacai (Teleostei, Cyprinidae) and comparison with the sister-species I. lusitanicum

PubMed Central

2009-01-01

This study aimed to define the karyotype of the recently described Iberian endemic Iberochondrostoma almacai, to revisit the previously documented chromosome polymorphisms of its sister species I. lusitanicum using C-, Ag-/CMA3 and RE-banding, and to compare the genome sizes of the two species. A 2n = 50 karyotype (with the exception of a triploid I. lusitanicum specimen) and a corresponding haploid chromosome formula of 7M:15SM:3A (FN = 94) were found. Multiple NORs were observed in both species (in two submetacentric chromosome pairs, one of them clearly homologous), and higher intra- and interpopulational variability was evident in I. lusitanicum. Flow cytometry measurements of nuclear DNA content showed some significant differences in genome size both between and within species: the genome of I. almacai was smaller than that of I. lusitanicum (mean values 2.61 and 2.93 pg, respectively), and the latter showed clear interpopulational variability (mean values ranging from 2.72 to 3.00 pg). These data allowed the two taxa to be distinguished and confirmed the existence of two well-differentiated groups within I. lusitanicum: one that includes the populations from the right bank of the Tejo and Samarra drainages, and another that groups the southern populations. The distinctive differences between the two species, both currently listed as “Critically Endangered”, underline the importance of this study for future conservation plans. PMID:21637679
Effect of phosphorus availability on the selection of species with different ploidy levels and genome sizes in a long-term grassland fertilization experiment.

PubMed

Šmarda, Petr; Hejcman, Michal; Březinová, Alexandra; Horová, Lucie; Steigerová, Helena; Zedek, František; Bureš, Petr; Hejcmanová, Pavla; Schellberg, Jürgen

2013-11-01

Polyploidy and increased genome size are hypothesized to increase organismal nutrient demands, namely of phosphorus (P), which is an essential and abundant component of nucleic acids. Therefore, polyploids and plants with larger genomes are expected to be selectively disadvantaged in P-limited environments. However, this hypothesis has yet to be experimentally tested. We measured the somatic DNA content and ploidy level in 74 vascular plant species in a long-term fertilization experiment. The differences between the fertilizer treatments regarding the DNA content and ploidy level of the established species were tested using phylogeny-based statistics. The percentage and biomass of polyploid species clearly increased with soil P in particular fertilizer treatments, and a similar but weaker trend was observed for the DNA content. These increases were associated with the dominance of competitive life strategy (particularly advantageous in the P-treated plots) in polyploids and the enhanced competitive ability of dominant polyploid grasses at high soil P concentrations, indicating their increased P limitation. Our results verify the hypothesized effect of P availability on the selection of polyploids and plants with increased genome sizes, although the relative contribution of increased P demands vs increased competitiveness as causes of the observed pattern requires further evaluation. © 2013 The Authors. New Phytologist © 2013 New Phytologist Trust.
Genome Sequences of Marine Shrimp Exopalaemon carinicauda Holthuis Provide Insights into Genome Size Evolution of Caridea.

PubMed

Yuan, Jianbo; Gao, Yi; Zhang, Xiaojun; Wei, Jiankai; Liu, Chengzhang; Li, Fuhua; Xiang, Jianhai

2017-07-05

Crustacea, particularly Decapoda, contains many economically important species, such as shrimps and crabs. Crustaceans exhibit enormous (nearly 500-fold) variability in genome size. However, limited genome resources are available for investigating these species. Exopalaemon carinicauda Holthuis, an economically important caridean shrimp, is a potentially ideal experimental animal for research on crustaceans. In this study, we performed low-coverage sequencing and de novo assembly of the E. carinicauda genome. The assembly covers more than 95% of coding regions. E. carinicauda possesses a large, complex genome (5.73 Gb), about twice the size of those of many decapod shrimps. Comparative genomic analyses were therefore used to investigate factors affecting genome size evolution in decapods. However, no signatures of genome duplication were identified, and few horizontally transferred sequences were detected. Ultimately, the burst of transposable elements, especially retrotransposons, was determined to be the major factor influencing genome expansion. A total of 2 Gb of repeats were identified, and RTE-BovB, Jockey, Gypsy, and DIRS were the four major retrotransposons that had expanded significantly. Both recently originated (Jockey and Gypsy) and ancestral (DIRS) retrotransposons contributed to genome evolution. The E. carinicauda genome also shows potential as a resource for genomic and experimental research on shrimps.
Insights into the genetic architecture of morphological traits in two passerine bird species.

PubMed

Silva, C N S; McFarlane, S E; Hagen, I J; Rönnegård, L; Billing, A M; Kvalnes, T; Kemppainen, P; Rønning, B; Ringsby, T H; Sæther, B-E; Qvarnström, A; Ellegren, H; Jensen, H; Husby, A

2017-09-01

Knowledge about the underlying genetic architecture of phenotypic traits is needed to understand and predict evolutionary dynamics. The number of causal loci, magnitude of the effects and location in the genome are, however, still largely unknown. Here, we use genome-wide single-nucleotide polymorphism (SNP) data from two large-scale data sets on house sparrows and collared flycatchers to examine the genetic architecture of different morphological traits (tarsus length, wing length, body mass, bill depth, bill length, total and visible badge size and white wing patches). Genomic heritabilities were estimated using relatedness calculated from SNPs. The proportion of variance captured by the SNPs (SNP-based heritability) was lower in house sparrows than in collared flycatchers, as expected given marker density (6,348 SNPs in house sparrows versus 38,689 SNPs in collared flycatchers). Indeed, after downsampling to similar SNP density and sample size, this estimate was no longer markedly different between species. Chromosome-partitioning analyses demonstrated that the proportion of variance explained by each chromosome was significantly positively related to chromosome size for some traits and, generally, that larger chromosomes tended to explain proportionally more variation than smaller chromosomes. Finally, we found two genome-wide significant associations with very small effect sizes. One SNP on chromosome 20 was associated with bill length in house sparrows and explained 1.2% of phenotypic variation (V_P), and one SNP on chromosome 4 was associated with tarsus length in collared flycatchers (3% of V_P). Although we cannot exclude the possibility of undetected large-effect loci, our results indicate a polygenic basis for morphological traits.
Complete mitochondrial genome of Xingguo red carp (Cyprinus carpio var. singuonensis) and purse red carp (Cyprinus carpio var. wuyuanensis).

PubMed

Hu, Guang-Fu; Liu, Xiang-Jiang; Li, Zhong; Liang, Hong-Wei; Hu, Shao-Na; Zou, Gui-Wei

2016-01-01

The complete mitochondrial genomes of Xingguo red carp (Cyprinus carpio var. singuonensis) and purse red carp (Cyprinus carpio var. wuyuanensis) were sequenced. Comparison revealed that the mtDNAs of these two common carp varieties are remarkably similar in genome length, gene order and content, and AT content. Nevertheless, the two mitochondrial genomes differ at 39 sites across their full length: 2 in rRNAs, 3 in tRNAs, 3 in the control region, and 31 in protein-coding genes. The 31 variable bases in the protein-coding regions result in three amino acid differences, located mainly in the ND5 and ND4 proteins.
First complete genome sequence of infectious laryngotracheitis virus

PubMed Central

2011-01-01

Background Infectious laryngotracheitis virus (ILTV) is an alphaherpesvirus that causes acute respiratory disease in chickens worldwide. To date, only one complete genomic sequence of ILTV has been reported. This sequence was generated by concatenating partial sequences from six different ILTV strains. Thus, the full genomic sequence of a single (individual) strain of ILTV has not been determined previously. This study aimed to use high throughput sequencing technology to determine the complete genomic sequence of a live attenuated vaccine strain of ILTV. Results The complete genomic sequence of the Serva vaccine strain of ILTV was determined, annotated and compared to the concatenated ILTV reference sequence. The genome size of the Serva strain was 152,628 bp, with a G + C content of 48%. A total of 80 predicted open reading frames were identified. The Serva strain had 96.5% DNA sequence identity with the concatenated ILTV sequence. Notably, the concatenated ILTV sequence was found to lack four large regions of sequence, including 528 bp and 594 bp of sequence in the UL29 and UL36 genes, respectively, and two copies of a 1,563 bp sequence in the repeat regions. Considerable differences in the size of the predicted translation products of 4 other genes (UL54, UL30, UL37 and UL38) were also identified. More than 530 single-nucleotide polymorphisms (SNPs) were identified. Most SNPs were located within three genomic regions, corresponding to sequence from the SA-2 ILTV vaccine strain in the concatenated ILTV sequence. Conclusions This is the first complete genomic sequence of an individual ILTV strain. This sequence will facilitate future comparative genomic studies of ILTV by providing an appropriate reference sequence for the sequence analysis of other ILTV strains. PMID:21501528
Large differences in the genome organization of different plant Trypanosomatid parasites (Phytomonas spp.) reveal wide evolutionary divergences between taxa.

PubMed

Marín, C; Dollet, M; Pagès, M; Bastien, P

2009-03-01

All currently known plant trypanosomes have been grouped in the genus Phytomonas, although they can differ greatly in both their biological properties and their effects on the host. Those parasitizing the phloem sap are specifically associated with lethal syndromes in Latin America, such as phloem necrosis of coffee, 'Hartrot' of coconut and 'Marchitez sorpresiva' of oil palm, which inflict considerable economic losses in endemic countries. The genomic organization of one group of Phytomonas (D), considered representative of the genus, has been published previously. The present work describes the genomic structure of two representative isolates from the pathogenic, phloem-restricted group (H) of Phytomonas, analyzed by pulsed-field gel electrophoresis followed by hybridization with chromosome-specific DNA markers. Surprisingly, the genomic organization of this group is extremely different from that of group D. Most notably, group H has 7 chromosomes (genome size 10 Mb), versus 21 in group D (totalling 25 Mb). These data reveal an unsuspected genomic diversity within plant trypanosomatids that may justify further debate about their division into different genera.
Energetics and genetics across the prokaryote-eukaryote divide

PubMed Central

2011-01-01

Background All complex life on Earth is eukaryotic. All eukaryotic cells share a common ancestor that arose just once in four billion years of evolution. Prokaryotes show no tendency to evolve greater morphological complexity, despite their metabolic virtuosity. Here I argue that the eukaryotic cell originated in a unique prokaryotic endosymbiosis, a singular event that transformed the selection pressures acting on both host and endosymbiont. Results The reductive evolution and specialisation of endosymbionts to mitochondria resulted in an extreme genomic asymmetry, in which the residual mitochondrial genomes enabled the expansion of bioenergetic membranes over several orders of magnitude, overcoming the energetic constraints on prokaryotic genome size, and permitting the host cell genome to expand (in principle) over 200,000-fold. This energetic transformation was permissive, not prescriptive; I suggest that the actual increase in early eukaryotic genome size was driven by a heavy early bombardment of genes and introns from the endosymbiont to the host cell, producing a high mutation rate. Unlike prokaryotes, with lower mutation rates and heavy selection pressure to lose genes, early eukaryotes without genome-size limitations could mask mutations by cell fusion and genome duplication, as in allopolyploidy, giving rise to a proto-sexual cell cycle. The side effect was that a large number of shared eukaryotic basal traits accumulated in the same population, a sexual eukaryotic common ancestor, radically different to any known prokaryote. Conclusions The combination of massive bioenergetic expansion, release from genome-size constraints, and high mutation rate favoured a protosexual cell cycle and the accumulation of eukaryotic traits. These factors explain the unique origin of eukaryotes, the absence of true evolutionary intermediates, and the evolution of sex in eukaryotes but not prokaryotes. 
Reviewers This article was reviewed by: Eugene Koonin, William Martin, Ford Doolittle and Mark van der Giezen. For complete reports see the Reviewers' Comments section. PMID:21714941
Photoperiod-H1 (Ppd-H1) Controls Leaf Size

PubMed Central

Digel, Benedikt; Tavakol, Elahe; Verderio, Gabriele; Xu, Xin

2016-01-01

Leaf size is a major determinant of plant photosynthetic activity and biomass; however, it is poorly understood how leaf size is genetically controlled in cereal crop plants like barley (Hordeum vulgare). We conducted a genome-wide association scan for flowering time, leaf width, and leaf length in a diverse panel of European winter cultivars grown in the field and genotyped with a single-nucleotide polymorphism array. The genome-wide association scan identified PHOTOPERIOD-H1 (Ppd-H1) as a candidate gene underlying the major quantitative trait loci for flowering time and leaf size in the barley population. Microscopic phenotyping of three independent introgression lines confirmed the effect of Ppd-H1 on leaf size. Differences in the duration of leaf growth and consequent variation in leaf cell number were responsible for the leaf size differences between the Ppd-H1 variants. The Ppd-H1-dependent induction of the BARLEY MADS BOX genes BM3 and BM8 in the leaf correlated with reductions in leaf size and leaf number. Our results indicate that leaf size is controlled by the Ppd-H1- and photoperiod-dependent progression of plant development. The coordination of leaf growth with flowering may be part of a reproductive strategy to optimize resource allocation to the developing inflorescences and seeds. PMID:27457126
Not All Particles Are Equal: The Selective Enrichment of Particle-Associated Bacteria from the Mediterranean Sea.

PubMed

López-Pérez, Mario; Kimes, Nikole E; Haro-Moreno, Jose M; Rodriguez-Valera, Francisco

2016-01-01

We used two metagenomic approaches, direct sequencing of natural samples and sequencing after enrichment, to characterize communities of prokaryotes associated with particles. In the first approach, different size filters (0.22 and 5 μm) were used to identify free-living and particle-attached prokaryotic communities in the Mediterranean water column. A subtractive metagenomic approach was used to characterize the dominant microbial groups in the large size fraction that were absent from the free-living one. They belonged mainly to Actinobacteria, Planctomycetes, Flavobacteria and Proteobacteria. In the second approach, marine microbial communities enriched by incubation with different kinds of particulate material were studied by metagenomic assembly. Different particle types (diatomaceous earth, sand, chitin and cellulose) were colonized by very different communities of bacteria belonging to Roseobacter, Vibrio, Bacteriovorax, and Lacinutrix that were distant relatives of genomes already described from marine habitats. Moreover, using assemblies from deep metagenomic sequencing of the particle-specific enrichments, we were able to identify a total of 20 groups of contigs (eight of them with >50% completeness) and reconstruct de novo five new genomes of novel species within marine clades (>79% completeness and <1.8% contamination). We also describe for the first time the genome of a marine Rhizobiales phage that appears to infect a broad range of Alphaproteobacteria and to live in habitats as diverse as soil, marine sediment and the water column. Metagenomic recruitment showed almost no overlap between the communities found by direct sequencing of the large size filter and those found by enrichment. These results indicate that the reconstructed genomes are part of the rare biosphere, which exists at very low levels under natural conditions.
Genome size and chromosome number in velvet worms (Onychophora).

PubMed

Jeffery, Nicholas W; Oliveira, Ivo S; Gregory, T Ryan; Rowell, David M; Mayer, Georg

2012-12-01

The Onychophora (velvet worms) represents a small group of invertebrates (~180 valid species), which is commonly united with Tardigrada and Arthropoda in a clade called Panarthropoda. As with the majority of invertebrate taxa, genome size data are very limited for the Onychophora, with only one previously published estimate. Here we use both flow cytometry and Feulgen image analysis densitometry to provide genome size estimates for seven species of velvet worms from both major subgroups, Peripatidae and Peripatopsidae, along with karyotype data for each species. Genome sizes in these species range from roughly 5-19 pg, with densitometric estimates being slightly larger than those obtained by flow cytometry for all species. Chromosome numbers range from 2n = 8 to 2n = 54. No relationship is evident between genome size, chromosome number, or reproductive mode. Various avenues for future genomic research are presented based on these results.
Phytophthora megakarya and Phytophthora palmivora, Closely Related Causal Agents of Cacao Black Pod Rot, Underwent Increases in Genome Sizes and Gene Numbers by Different Mechanisms

PubMed Central

Ali, Shahin S.; Shao, Jonathan; Lary, David J.; Kronmiller, Brent A.; Shen, Danyu; Strem, Mary D.; Amoako-Attah, Ishmael; Akrofi, Andrew Yaw; Begoude, B.A. Didier; ten Hoopen, G. Martijn; Coulibaly, Klotioloma; Kebe, Boubacar Ismaël; Melnick, Rachel L.; Guiltinan, Mark J.; Tyler, Brett M.; Meinhardt, Lyndel W.

2017-01-01

Phytophthora megakarya (Pmeg) and Phytophthora palmivora (Ppal) are closely related species causing cacao black pod rot. Although Ppal is a cosmopolitan pathogen, cacao is the only known host of economic importance for Pmeg. Pmeg is more virulent on cacao than Ppal. We sequenced and compared the Pmeg and Ppal genomes and identified virulence-related putative gene models (PGeneM) that may be responsible for their differences in host specificity and virulence. Pmeg and Ppal have estimated genome sizes of 126.88 and 151.23 Mb and PGeneM numbers of 42,036 and 44,327, respectively. The evolutionary histories of Pmeg and Ppal appear quite different. Postspeciation, Ppal underwent whole-genome duplication, whereas Pmeg has undergone selective increases in PGeneM numbers, likely through accelerated transposable element-driven duplications. Many PGeneMs in both species failed to match transcripts and may represent pseudogenes or cryptic genetic reservoirs. Pmeg appears to have amplified specific gene families, some of which are virulence-related. Analysis of mycelium, zoospore, and in planta transcriptome expression profiles using neural network self-organizing map analysis generated 24 multivariate and nonlinear self-organizing map classes. Many members of the RxLR, necrosis-inducing phytophthora protein, and pectinase gene families were specifically induced in planta. Pmeg displays a diverse virulence-related gene complement similar in size to, and potentially of greater diversity than, that of Ppal, but it remains likely that the specific functions of the genes determine each species’ unique characteristics as pathogens. PMID:28186564

Whole genome SNP discovery and analysis of genetic diversity in Turkey (Meleagris gallopavo)

PubMed Central

2012-01-01

Background The turkey (Meleagris gallopavo) is an important agricultural species and the second largest contributor to the world’s poultry meat production. Genetic improvement is attributed largely to selective breeding programs that rely on highly heritable phenotypic traits, such as body size and breast muscle development. Commercial breeding with small effective population sizes and epistasis can result in loss of genetic diversity, which in turn can lead to reduced individual fitness and reduced response to selection. Genomic diversity in domestic livestock species is therefore of great importance and a prerequisite for rapid and accurate genetic improvement of selected breeds in various environments, as well as for rapid adaptation to potential changes in breeding goals. Genomic selection requires a large number of genetic markers, such as single nucleotide polymorphisms (SNPs), the most abundant source of genetic variation within the genome. Results Alignment of next-generation sequencing data from 32 individual turkeys from different populations was used to discover 5.49 million SNPs, which were subsequently used to analyze genetic diversity among the populations. All of the commercial lines branched from a single node relative to the heritage varieties and the South Mexican turkey population. Heterozygosity of individuals from the different turkey populations ranged from 0.17 to 2.73 SNPs/Kb, while heterozygosity of populations ranged from 0.73 to 1.64 SNPs/Kb. The average frequency of heterozygous SNPs in individual turkeys was 1.07 SNPs/Kb. Five genomic regions with very low nucleotide variation were identified in domestic turkeys that showed fixation of alleles different from the wild alleles. Conclusion The turkey genome is much less diverse, with a relatively low frequency of heterozygous SNPs compared with other livestock species such as chicken and pig. 
The whole genome SNP discovery study in turkey resulted in the detection of 5.49 million putative SNPs compared to the reference genome. All commercial lines appear to share a common origin. Presence of different alleles/haplotypes in the SM population highlights that specific haplotypes have been selected in the modern domesticated turkey. PMID:22891612
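The heterozygosity figures in this record are expressed as heterozygous SNPs per kilobase. A minimal sketch of that normalisation, assuming the denominator is the number of callable (adequately covered) bases in an individual's alignment; the abstract does not state the exact denominator used, and the counts in the example are hypothetical.

```python
def het_snps_per_kb(n_het_snps: int, callable_bases: int) -> float:
    """Heterozygous SNPs per kilobase of callable sequence (assumed denominator)."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_het_snps / callable_bases * 1_000


# Hypothetical individual: 1,070,000 heterozygous SNPs over 1 Gb of callable sequence.
rate = het_snps_per_kb(1_070_000, 1_000_000_000)
print(round(rate, 2))  # 1.07
```

Because the rate depends on the denominator, values are only comparable across individuals or studies that define "callable" sequence the same way.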
DNA methylation patterns and gene expression associated with litter size in Berkshire pig placenta

PubMed Central

Kwon, Seulgi; Park, Da Hye; Kim, Tae Wan; Kang, Deok Gyeong; Yu, Go Eun; Kim, Il-Suk; Park, Hwa Chun; Ha, Jeongim; Kim, Chul Wook

2017-01-01

Increasing litter size is of great interest to the pig industry. DNA methylation is an important epigenetic modification that regulates gene expression and thereby influences livestock phenotypes such as disease resistance, milk production, and reproduction. We classified Berkshire pigs into two groups according to litter size and estimated breeding value: smaller (SLG) and larger (LLG) litter size groups. Genome-wide DNA methylation and gene expression were analyzed using placental genomic DNA and RNA to identify differentially methylated regions (DMRs) and differentially expressed genes (DEGs) associated with litter size. Although the global methylation pattern was similar between the groups, methylation levels of CpG dinucleotides in particular genomic regions differed noticeably; excluding intergenic regions, these differences were found most frequently in gene body regions. Next, we analyzed RNA-Seq data to identify DEGs between the SLG and LLG groups. A total of 1591 DEGs were identified: 567 were downregulated and 1024 were upregulated in LLG compared with SLG. To identify genes that simultaneously exhibited changes in DNA methylation and mRNA expression, we integrated the bisulfite-Seq and RNA-Seq data. Nine DEGs positioned in DMRs were found. The expression of only three of these genes (PRKG2, CLCA4, and PCK1) was verified by RT-qPCR. Furthermore, PCR-based methylation analysis showed the same methylation patterns in blood samples as in placental tissues. Together, these results provide useful data regarding potential epigenetic markers for selecting hyperprolific sows. PMID:28880934
Dynamics of genome size evolution in birds and mammals.

PubMed

Kapusta, Aurélie; Suh, Alexander; Feschotte, Cédric

2017-02-21

Genome size in mammals and birds shows remarkably little interspecific variation compared with other taxa. However, genome sequencing has revealed that many mammal and bird lineages have experienced differential rates of transposable element (TE) accumulation, which would be predicted to cause substantial variation in genome size between species. Thus, we hypothesize that there has been covariation between the amount of DNA gained by transposition and lost by deletion during mammal and avian evolution, resulting in genome size equilibrium. To test this model, we develop computational methods to quantify the amount of DNA gained by TE expansion and lost by deletion over the last 100 My in the lineages of 10 species of eutherian mammals and 24 species of birds. The results reveal extensive variation in the amount of DNA gained via lineage-specific transposition, but that DNA loss counteracted this expansion to various extents across lineages. Our analysis of the rate and size spectrum of deletion events implies that DNA removal in both mammals and birds has proceeded mostly through large segmental deletions (>10 kb). These findings support a unified "accordion" model of genome size evolution in eukaryotes whereby DNA loss counteracting TE expansion is a major determinant of genome size. Furthermore, we propose that extensive DNA loss, and not necessarily a dearth of TE activity, has been the primary force maintaining the greater genomic compaction of flying birds and bats relative to their flightless relatives.
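The "accordion" model described above treats genome size change along a lineage as the balance between DNA gained by transposable element expansion and DNA removed by deletion. A minimal sketch of that bookkeeping follows; the lineage names and megabase values are hypothetical placeholders, not estimates from the study.

```python
from dataclasses import dataclass


@dataclass
class Lineage:
    name: str
    gained_mb: float  # DNA gained via lineage-specific transposition (Mb)
    lost_mb: float    # DNA removed by deletion over the same interval (Mb)

    @property
    def net_change_mb(self) -> float:
        # Net change in genome size: gain minus loss.
        return self.gained_mb - self.lost_mb

    @property
    def loss_offset(self) -> float:
        # Fraction of the TE-driven gain counteracted by deletion.
        return self.lost_mb / self.gained_mb if self.gained_mb else float("nan")


# Hypothetical lineages, for illustration only.
lineages = [
    Lineage("hypothetical_A", gained_mb=300.0, lost_mb=290.0),
    Lineage("hypothetical_B", gained_mb=150.0, lost_mb=60.0),
]
for lin in lineages:
    print(f"{lin.name}: net {lin.net_change_mb:+.0f} Mb, "
          f"{lin.loss_offset:.0%} of gain offset by deletion")
```

In this framing, a lineage can experience heavy TE activity (lineage A) yet change little in size if deletion keeps pace, which is the equilibrium the abstract hypothesizes.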
The scaling features of the 3D organization of chromosomes are highlighted by a transformation à la Kadanoff of Hi-C data

NASA Astrophysics Data System (ADS)

Chiariello, Andrea M.; Bianco, Simona; Annunziatella, Carlo; Esposito, Andrea; Nicodemi, Mario

2017-11-01

Technologies such as Hi-C and GAM have revealed that chromosomes are not randomly folded in the cell nucleus but are composed of a sequence of contact domains (TADs), each typically 0.5 Mb long. However, the larger-scale organization of the genome remains poorly understood. To investigate the scaling behaviour of chromosome folding, we apply an approach à la Kadanoff, inspired by Renormalization Group theory, to Hi-C interaction data across different cell types and chromosomes. We find that the genome is characterized by complex scaling features, with the average size of contact domains following a power law in the rescaling level. This is compatible with the existence of contact domains extending across length scales up to chromosomal sizes. The scaling exponent is statistically indistinguishable among the different murine cell types analysed. These results point toward a universal higher-order spatial architecture of the genome, which could reflect fundamental organizational principles.
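A power-law relation of the form s(n) = A * n^alpha becomes a straight line on log-log axes, so a scaling exponent like the one mentioned above can be estimated by ordinary least squares on log s versus log n. The sketch below assumes that functional form and uses synthetic data generated with a known exponent. It is a generic estimator, not the authors' renormalization procedure, and the values of A and alpha are hypothetical.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = A * x**alpha on log-log axes; returns (A, alpha)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    alpha = sxy / sxx            # slope in log-log space is the exponent
    log_a = my - alpha * mx      # intercept gives log of the prefactor
    return math.exp(log_a), alpha


# Synthetic domain sizes generated with A = 0.5 and alpha = 1.3 (hypothetical values).
levels = [1, 2, 4, 8, 16]
sizes = [0.5 * n ** 1.3 for n in levels]
A, alpha = fit_power_law(levels, sizes)
print(f"A = {A:.2f}, alpha = {alpha:.2f}")  # A = 0.50, alpha = 1.30
```

With real, noisy data the fitted exponent carries uncertainty, which is why comparing exponents across cell types, as in the abstract, calls for a statistical test rather than a point comparison.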
Comparative genomics reveals insights into avian genome evolution and adaptation

PubMed Central

Zhang, Guojie; Li, Cai; Li, Qiye; Li, Bo; Larkin, Denis M.; Lee, Chul; Storz, Jay F.; Antunes, Agostinho; Greenwold, Matthew J.; Meredith, Robert W.; Ödeen, Anders; Cui, Jie; Zhou, Qi; Xu, Luohao; Pan, Hailin; Wang, Zongji; Jin, Lijun; Zhang, Pei; Hu, Haofu; Yang, Wei; Hu, Jiang; Xiao, Jin; Yang, Zhikai; Liu, Yang; Xie, Qiaolin; Yu, Hao; Lian, Jinmin; Wen, Ping; Zhang, Fang; Li, Hui; Zeng, Yongli; Xiong, Zijun; Liu, Shiping; Zhou, Long; Huang, Zhiyong; An, Na; Wang, Jie; Zheng, Qiumei; Xiong, Yingqi; Wang, Guangbiao; Wang, Bo; Wang, Jingjing; Fan, Yu; da Fonseca, Rute R.; Alfaro-Núñez, Alonzo; Schubert, Mikkel; Orlando, Ludovic; Mourier, Tobias; Howard, Jason T.; Ganapathy, Ganeshkumar; Pfenning, Andreas; Whitney, Osceola; Rivas, Miriam V.; Hara, Erina; Smith, Julia; Farré, Marta; Narayan, Jitendra; Slavov, Gancho; Romanov, Michael N; Borges, Rui; Machado, João Paulo; Khan, Imran; Springer, Mark S.; Gatesy, John; Hoffmann, Federico G.; Opazo, Juan C.; Håstad, Olle; Sawyer, Roger H.; Kim, Heebal; Kim, Kyu-Won; Kim, Hyeon Jeong; Cho, Seoae; Li, Ning; Huang, Yinhua; Bruford, Michael W.; Zhan, Xiangjiang; Dixon, Andrew; Bertelsen, Mads F.; Derryberry, Elizabeth; Warren, Wesley; Wilson, Richard K; Li, Shengbin; Ray, David A.; Green, Richard E.; O’Brien, Stephen J.; Griffin, Darren; Johnson, Warren E.; Haussler, David; Ryder, Oliver A.; Willerslev, Eske; Graves, Gary R.; Alström, Per; Fjeldså, Jon; Mindell, David P.; Edwards, Scott V.; Braun, Edward L.; Rahbek, Carsten; Burt, David W.; Houde, Peter; Zhang, Yong; Yang, Huanming; Wang, Jian; Jarvis, Erich D.; Gilbert, M. Thomas P.; Wang, Jun

2015-01-01

Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits. PMID:25504712
Comparative analysis of mitochondrial genomes between the hau cytoplasmic male sterility (CMS) line and its iso-nuclear maintainer line in Brassica juncea to reveal the origin of the CMS-associated gene orf288.

PubMed

Heng, Shuangping; Wei, Chao; Jing, Bing; Wan, Zhengjie; Wen, Jing; Yi, Bin; Ma, Chaozhi; Tu, Jinxing; Fu, Tingdong; Shen, Jinxiong

2014-04-30

Cytoplasmic male sterility (CMS) is important not only for exploiting heterosis in crop plants but also as a model for investigating nuclear-cytoplasmic interaction. CMS may be caused by mutations, rearrangement or recombination in the mitochondrial genome. Understanding the mitochondrial genome is often the first and key step in unraveling the molecular and genetic basis of CMS in plants. Comparative analysis of the mitochondrial genomes of the hau CMS line and its maintainer line in B. juncea may help reveal the origin of the CMS-associated gene orf288. Using next-generation sequencing, the B. juncea hau CMS mitochondrial genome was assembled into a single, circular-mapping molecule that is 247,903 bp in size with a GC content of 45.08%. In addition to the CMS-associated gene orf288, the genome contains 35 protein-encoding genes, 3 rRNAs, 25 tRNA genes and 29 ORFs of unknown function. The mitochondrial genomes of the maintainer line and another normal-type line, "J163-4", are both 219,863 bp in size, with a GC content of 45.23%. The maintainer line has 36 genes with protein products, 3 rRNAs, 22 tRNA genes and 31 unidentified ORFs. Comparative analysis of the mitochondrial genomes of the hau CMS line and its maintainer line allowed us to develop specific markers to distinguish the two lines at the seedling stage. We also confirmed that different mitotypes coexist substoichiometrically in the hau CMS lines and their maintainer lines in B. juncea. The number of repeats larger than 100 bp in the hau CMS line (16 repeats) is nearly twice that in the maintainer line (9 repeats). Phylogenetic analysis of the CMS-associated gene orf288 and four other homologous sequences in Brassicaceae showed that orf288 is clearly different from orf263 in Brassica tournefortii despite their strong similarity. The hau CMS mitochondrial genome is highly rearranged compared with the mitochondrial genome of its iso-nuclear maintainer line. 
This study may be useful for studying the mechanism of natural CMS in B. juncea, performing comparative analysis on sequenced mitochondrial genomes in Brassicas, and uncovering the origin of the hau CMS mitotype and structural and evolutionary differences between different mitotypes.
Megacycles of atmospheric carbon dioxide concentration correlate with fossil plant genome size.

PubMed

Franks, Peter J; Freckleton, Rob P; Beaulieu, Jeremy M; Leitch, Ilia J; Beerling, David J

2012-02-19

Tectonic processes drive megacycles of atmospheric carbon dioxide (CO2) concentration, c_a, that force large fluctuations in global climate. With a period of several hundred million years, these megacycles have been linked to the evolution of vascular plants, but adaptation at the subcellular scale has been difficult to determine because fossils typically do not preserve this information. Here we show, after accounting for evolutionary relatedness using phylogenetic comparative methods, that plant nuclear genome size (measured as the haploid DNA amount) and the size of stomatal guard cells are correlated across a broad taxonomic range of extant species. This phylogenetic regression was used to estimate the mean genome size of fossil plants from the size of fossil stomata. For the last 400 Myr, spanning almost the full evolutionary history of vascular plants, we found a significant correlation between fossil plant genome size and c_a, modelled independently using geochemical data. The correlation is consistent with selection for stomatal size and genome size by c_a as plants adapted towards optimal leaf gas exchange under a changing CO2 regime. Our findings point to the possibility that major episodes of change in c_a throughout Earth history might have selected for changes in genome size, influencing plant diversification.
The Ecological Genomics of Fungi: Repeated Elements in Filamentous Fungi with a Focus on Wood-Decay Fungi

DOE Office of Scientific and Technical Information (OSTI.GOV)

Murat, Claude; Payen, Thibaut; Petitpierre, Denis

2013-01-01

In the last decade, the genomes of several dozen filamentous fungi have been sequenced. Interestingly, vast diversity in genome size was observed (Fig. 2.1), with a 14-fold difference between the 9 Mb of the human pathogenic dandruff fungus (Malassezia globosa; Xu, Saunders, et al., 2007) and the 125 Mb of the ectomycorrhizal black truffle of Périgord (Tuber melanosporum; Martin, Kohler, et al., 2010). Recently, Raffaele and Kamoun (2012) highlighted that the genomes of several lineages of filamentous plant pathogens have been shaped by repeat-driven expansion. Indeed, repeated elements are ubiquitous in all prokaryote and eukaryote genomes; however, their frequencies can vary from just a minor percentage of the genome to more than 60 percent of the genome. Repeated elements can be classified into two major types: satellite DNA and transposable elements. In this chapter, the different types of repeated elements and how these elements can impact the genome and gene repertoire will be described. An intriguing link between transposable element richness and diversity and the ecological niche will also be highlighted.
Evolutionary Dynamics of Microsatellite Distribution in Plants: Insight from the Comparison of Sequenced Brassica, Arabidopsis and Other Angiosperm Species

PubMed Central

Shi, Jiaqin; Huang, Shunmou; Fu, Donghui; Yu, Jinyin; Wang, Xinfa; Hua, Wei; Liu, Shengyi; Liu, Guihua; Wang, Hanzhong

2013-01-01

Despite their ubiquity and functional importance, microsatellites have been largely ignored in comparative genomics, mostly due to the lack of genomic information. In the current study, microsatellite distribution was characterized and compared in the whole genomes and in both the coding and non-coding DNA sequences of the sequenced Brassica, Arabidopsis and other angiosperm species to investigate their evolutionary dynamics in plants. The variation in the microsatellite frequencies of these angiosperm species was much smaller than the variation in their microsatellite numbers and genome sizes, suggesting that microsatellite frequency may be relatively stable in plants. The microsatellite frequencies of these angiosperm species were significantly negatively correlated with both their genome sizes and their transposable element contents. The pattern of microsatellite distribution may differ between genomic regions (such as coding and non-coding sequences). The observed differences in many important microsatellite characteristics (especially the distribution with respect to motif length, type and repeat number) among these angiosperm species were generally consistent with their phylogenetic distances, which suggests that the evolutionary dynamics of microsatellite distribution may be generally consistent with plant divergence/evolution. Importantly, when these microsatellite characteristics (especially the distribution with respect to motif type) were compared, the angiosperm species (aside from a few) clustered into two clearly different groups that were largely represented by monocots and dicots, suggesting a complex and generally dichotomous evolutionary pattern of microsatellite distribution in angiosperms.
Polyploidy may lead to a slight increase in microsatellite frequency in the coding sequences and a significant decrease in microsatellite frequency in the whole genome/non-coding sequences, but may have little effect on the microsatellite distribution with respect to motif length, type and repeat number. Interestingly, several microsatellite characteristics seemed to be constant in plant evolution, which can be well explained by general biological rules. PMID:23555856
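Microsatellite "frequency" in comparisons like this is usually expressed as the number of microsatellites per Mb of sequence, which is what makes it comparable across genomes of very different sizes. The sketch below illustrates that calculation for perfect tandem repeats of 1-6 bp motifs. The minimum repeat counts in `MIN_REPEATS` are hypothetical thresholds chosen for illustration (published studies use various cutoffs), and the function names are not from the study.

```python
import re

# Hypothetical minimum repeat counts per motif length (1-6 bp); studies differ.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def is_compound(motif):
    """True if the motif is itself a repeat of a shorter unit, e.g. 'AA' or 'ATAT'."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))

def find_microsatellites(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_compound(motif):  # already reported under its shorter unit
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

def frequency_per_mb(hits, seq_len):
    """Microsatellites per megabase of analysed sequence."""
    return len(hits) / (seq_len / 1e6)
```

For example, `find_microsatellites("CG" + "AT" * 8 + "C" + "A" * 12 + "G")` reports one (AT)8 and one (A)12 repeat. Skipping compound motifs keeps an (A)n run from also being counted as (AA)n.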
Resources and costs for microbial sequence analysis evaluated using virtual machines and cloud computing.

PubMed

Angiuoli, Samuel V; White, James R; Matalka, Malcolm; White, Owen; Fricke, W Florian

2011-01-01

The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. 
Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers.
Resources and Costs for Microbial Sequence Analysis Evaluated Using Virtual Machines and Cloud Computing

PubMed Central

Angiuoli, Samuel V.; White, James R.; Matalka, Malcolm; White, Owen; Fricke, W. Florian

2011-01-01

Background: The widespread popularity of genomic applications is threatened by the “bioinformatics bottleneck” resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. Results: We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers.
Conclusions: Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers. PMID:22028928
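Attaching dollar costs to a cloud analysis comes down to instances × billed hours × hourly rate. The sketch below shows that arithmetic under stated assumptions: the hourly rate is a hypothetical placeholder, not an actual EC2 price, and rounding each started hour up reflects the per-instance-hour billing EC2 used at the time (a parameter here, since billing granularity has since changed).

```python
import math

def ec2_cost(n_instances, runtime_hours, hourly_rate_usd, round_up_hours=True):
    """Cost of running n identical instances for a given wall-clock runtime.

    hourly_rate_usd is a hypothetical per-instance price; round_up_hours models
    billing for every started instance-hour.
    """
    billed_hours = math.ceil(runtime_hours) if round_up_hours else runtime_hours
    return n_instances * billed_hours * hourly_rate_usd
```

For instance, 15 hypothetical 8-CPU instances (120 CPUs in total) running for 5.5 hours at an assumed $0.50 per instance-hour give `ec2_cost(15, 5.5, 0.5)`, i.e. 6 billed hours each and $45.00 overall.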
Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies

PubMed Central

Sundquist, Andreas; Ronaghi, Mostafa; Tang, Haixu; Pevzner, Pavel; Batzoglou, Serafim

2007-01-01

While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes are feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology. PMID:17534434
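Why low read coverage per clone can still suffice is easiest to see with Lander-Waterman arithmetic: coverage is c = N·L/G, and the expected fraction of bases hit by no read is e^(-c). If every base lies in many overlapping clones and reads from all of them feed the local assembly, the effective coverage is roughly the product of clone coverage and per-clone read coverage. This is a back-of-the-envelope sketch under Poisson assumptions; the example coverages are hypothetical and not taken from the SHRAP paper.

```python
import math

def coverage(n_reads, read_len, target_len):
    """Lander-Waterman redundancy c = N * L / G."""
    return n_reads * read_len / target_len

def uncovered_fraction(c):
    """Expected fraction of bases covered by no read (Poisson read starts)."""
    return math.exp(-c)

def pooled_coverage(clone_coverage, read_coverage_per_clone):
    """Approximate read coverage of a base lying in ~clone_coverage clones,
    when reads from all overlapping clones can be combined locally."""
    return clone_coverage * read_coverage_per_clone
```

With hypothetical values of 10x clone coverage and 1.5x read coverage per clone, a single clone leaves about 22% of its bases uncovered (`uncovered_fraction(1.5)`), while the pooled 15x coverage leaves only about 3 × 10^-7.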
The GAAS metagenomic tool and its estimations of viral and microbial average genome size in four major biomes.

PubMed

Angly, Florent E; Willner, Dana; Prieto-Davó, Alejandra; Edwards, Robert A; Schmieder, Robert; Vega-Thurber, Rebecca; Antonopoulos, Dionysios A; Barott, Katie; Cottrell, Matthew T; Desnues, Christelle; Dinsdale, Elizabeth A; Furlan, Mike; Haynes, Matthew; Henn, Matthew R; Hu, Yongfei; Kirchman, David L; McDole, Tracey; McPherson, John D; Meyer, Folker; Miller, R Michael; Mundt, Egbert; Naviaux, Robert K; Rodriguez-Mueller, Beltran; Stevens, Rick; Wegley, Linda; Zhang, Lixin; Zhu, Baoli; Rohwer, Forest

2009-12-01

Metagenomic studies characterize both the composition and diversity of uncultured viral and microbial communities. BLAST-based comparisons have typically been used for such analyses; however, sampling biases, high percentages of unknown sequences, and the use of arbitrary thresholds to find significant similarities can decrease the accuracy and validity of estimates. Here, we present Genome relative Abundance and Average Size (GAAS), a complete software package that provides improved estimates of community composition and average genome length for metagenomes in both textual and graphical formats. GAAS implements a novel methodology to control for sampling bias via length normalization, to adjust for multiple BLAST similarities by similarity weighting, and to select significant similarities using relative alignment lengths. In benchmark tests, the GAAS method was robust to both high percentages of unknown sequences and to variations in metagenomic sequence read lengths. Re-analysis of the Sargasso Sea virome using GAAS indicated that standard methodologies for metagenomic analysis may dramatically underestimate the abundance and importance of organisms with small genomes in environmental systems. Using GAAS, we conducted a meta-analysis of microbial and viral average genome lengths in over 150 metagenomes from four biomes to determine whether genome lengths vary consistently between and within biomes, and between microbial and viral communities from the same environment. Significant differences between biomes and within aquatic sub-biomes (oceans, hypersaline systems, freshwater, and microbialites) suggested that average genome length is a fundamental property of environments driven by factors at the sub-biome level. The behavior of paired viral and microbial metagenomes from the same environment indicated that microbial and viral average genome sizes are independent of each other, but indicative of community responses to stressors and environmental conditions.
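Length normalization follows from the fact that a genome twice as long contributes twice as many reads per copy, so raw hit counts must be divided by genome length before they reflect genome abundance. The average genome length is then the abundance-weighted mean of the genome lengths. The sketch below illustrates these two steps plus a simple filter and weighting. It is not the GAAS code: the `min_rel_align` cutoff, defining relative alignment length as alignment length divided by read length, and splitting each read's weight in proportion to a given per-hit score are all assumptions standing in for GAAS's own similarity weighting.

```python
from collections import defaultdict

def gaas_like_estimates(hits, genome_lengths, min_rel_align=0.5):
    """hits: iterable of (read_id, read_len, genome_id, align_len, score).
    Returns (relative abundance per genome, average genome length)."""
    kept = defaultdict(list)
    for read_id, read_len, genome_id, align_len, score in hits:
        # Keep a similarity only if it spans enough of the read.
        if align_len / read_len >= min_rel_align:
            kept[read_id].append((genome_id, score))
    # Similarity weighting: each read contributes total weight 1, split across
    # its kept hits in proportion to their (positive) scores.
    weight = defaultdict(float)
    for read_hits in kept.values():
        total = sum(score for _, score in read_hits)
        for genome_id, score in read_hits:
            weight[genome_id] += score / total
    # Length normalization: convert read weight into genome copies.
    per_copy = {g: w / genome_lengths[g] for g, w in weight.items()}
    norm = sum(per_copy.values())
    if norm == 0:
        return {}, 0.0
    abundance = {g: v / norm for g, v in per_copy.items()}
    avg_len = sum(p * genome_lengths[g] for g, p in abundance.items())
    return abundance, avg_len
```

If genome B (2 Mb) attracts twice as many reads as genome A (1 Mb), the two are estimated to be equally abundant and the average genome length comes out at 1.5 Mb.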
Assessing pooled BAC and whole genome shotgun strategies for assembly of complex genomes.

PubMed

Haiminen, Niina; Feltus, F Alex; Parida, Laxmi

2011-04-15

We investigate whether pooling BAC clones and sequencing the pools can provide more accurate assembly of genome sequences than the "whole genome shotgun" (WGS) approach. Furthermore, we quantify this accuracy increase. We compare the pooled BAC and WGS approaches using in silico simulations. Standard measures of assembly quality focus on assembly size and fragmentation, which are desirable for large whole genome assemblies. We propose additional measures enabling easy and visual comparison of assembly quality, such as rearrangements and redundant sequence content, relative to the known target sequence. The best assembly quality scores were obtained using 454 coverage of 15× linear and 5× paired (3kb insert size) reads (15L-5P) on Arabidopsis. This regime gave similarly good results on four additional plant genomes of very different GC and repeat contents. BAC pooling improved assembly scores over WGS assembly, with coverage and redundancy scores improving the most. BAC pooling works better than WGS; however, both require a physical map to order the scaffolds. Pool sizes up to 12Mbp work well, suggesting this pooling density to be effective in medium-scale re-sequencing applications such as targeted sequencing of QTL intervals for candidate gene discovery. Assuming the current Roche/454 Titanium sequencing limitations, a 12 Mbp region could be re-sequenced with a full plate of linear reads and a half plate of paired-end reads, yielding 15L-5P coverage after read pre-processing. Our simulation suggests that massive over-sequencing may not improve accuracy. Our scoring measures can be used generally to evaluate and compare results of simulated genome assemblies.
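The 15L-5P regime translates into a read budget by simple arithmetic: reads = region length × coverage / mean read length, computed separately for the linear and paired components. The sketch below works that out for a 12 Mbp pool. The 400 bp mean read length and 100 kb BAC insert size are hypothetical round numbers, not values from the study.

```python
import math

def plan_pool(region_bp, linear_cov, paired_cov, mean_read_len, bac_len):
    """Read and clone counts needed to reach a given linear/paired coverage.

    mean_read_len and bac_len are user-supplied assumptions.
    """
    return {
        "bacs_in_pool": math.ceil(region_bp / bac_len),
        "linear_reads": math.ceil(region_bp * linear_cov / mean_read_len),
        "paired_reads": math.ceil(region_bp * paired_cov / mean_read_len),
    }
```

Under these assumptions, `plan_pool(12_000_000, 15, 5, 400, 100_000)` calls for about 120 BACs, 450,000 linear reads and 150,000 paired reads.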
Different genome maintenance strategies in human and tobacco cells.

PubMed

Pelczar, Pawel; Kalck, Véronique; Kovalchuk, Igor

2003-08-22

In this work, genome maintenance strategies of organisms belonging to different kingdoms (animals versus plants) but of similar genome size were investigated using a novel, universal double-strand break (DSB) repair assay. Different plasmids linearised with KpnI, Acc65I or EcoRV, yielding 3' protruding, 5' protruding or blunt DNA termini, respectively, were transfected into HeLa cells and Nicotiana plumbaginifolia protoplasts and assayed for the efficiency and fidelity of DSB repair. We show that the mechanism of break sealing is similar but that drastic differences are seen in the fidelity of repair: in HeLa cells, 50-55% of DSBs were repaired precisely, compared with as little as 15-30% in tobacco cells. Moreover, DSB repair in plants resulted in 30-40% longer deletions and significantly shorter insertions. Combined, these led to more than twofold larger net DNA loss in tobacco cells. Our observations point to possible differences in the strategies of DSB repair and genome maintenance in plants and animals.
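The fidelity measures above combine into a few per-junction statistics: the share of precisely repaired breaks, mean deletion and insertion lengths, and net DNA loss (deleted minus inserted bases) per repair event. The sketch below computes them from a list of repair products. The input format (one deleted/inserted length pair per sequenced junction) and the example values are hypothetical, and treating "precise" as no deletion and no insertion is a simplification.

```python
def summarize_repair(junctions):
    """junctions: list of (deleted_bp, inserted_bp), one per sequenced repair product."""
    n = len(junctions)
    precise = sum(1 for d, i in junctions if d == 0 and i == 0)
    deletions = [d for d, _ in junctions if d > 0]
    insertions = [i for _, i in junctions if i > 0]
    return {
        "precise_fraction": precise / n,
        "mean_deletion_bp": sum(deletions) / len(deletions) if deletions else 0.0,
        "mean_insertion_bp": sum(insertions) / len(insertions) if insertions else 0.0,
        # Positive values mean the repair removed more DNA than it added.
        "net_loss_per_event_bp": sum(d - i for d, i in junctions) / n,
    }
```

Because net loss combines deletion and insertion lengths, longer deletions together with shorter insertions compound, which is how the abstract arrives at more than twofold larger net loss.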
Empirical comparison between different methods for genomic prediction of number of piglets born alive in moderate sized breeding populations.

PubMed

Fangmann, A; Sharifi, R A; Heinkel, J; Danowski, K; Schrade, H; Erbe, M; Simianer, H

2017-04-01

Currently used multi-step methods to incorporate genomic information in the prediction of breeding values (BV) implicitly involve many assumptions which, if violated, may result in loss of information, inaccuracies and bias. To overcome this, single-step genomic best linear unbiased prediction (ssGBLUP) was proposed combining pedigree, phenotype and genotype of all individuals for genetic evaluation. Our objective was to implement ssGBLUP for genomic predictions in pigs and to compare the accuracy of ssGBLUP with that of multi-step methods with empirical data of moderately sized pig breeding populations. Different predictions were performed: conventional parent average (PA), direct genomic value (DGV) calculated with genomic BLUP (GBLUP), a GEBV obtained by blending the DGV with PA, and ssGBLUP. Data comprised individuals from a German Landrace (LR) and Large White (LW) population. The trait 'number of piglets born alive' (NBA) was available for 182,054 litters of 41,090 LR sows and 15,750 litters from 4534 LW sows. The pedigree contained 174,021 animals, of which 147,461 (26,560) animals were LR (LW) animals. In total, 526 LR and 455 LW animals were genotyped with the Illumina PorcineSNP60 BeadChip. After quality control and imputation, 495 LR (424 LW) animals with 44,368 (43,678) SNP on 18 autosomes remained for the analysis. Predictive abilities, i.e., correlations between de-regressed proofs and genomic BV, were calculated with a five-fold cross validation and with a forward prediction for young genotyped validation animals born after 2011. Generally, predictive abilities for LR were rather small (0.08 for GBLUP, 0.19 for GEBV and 0.18 for ssGBLUP). For LW, ssGBLUP had the greatest predictive ability (0.45). For both breeds, assessment of reliabilities for young genotyped animals indicated that genomic prediction outperforms PA with ssGBLUP providing greater reliabilities (0.40 for LR and 0.32 for LW) than GEBV (0.35 for LR and 0.29 for LW). 
Grouping of animals according to information sources revealed that genomic prediction had the highest potential benefit for genotyped animals without their own phenotype. Although ssGBLUP did not generally outperform GBLUP or GEBV, the results suggest that ssGBLUP can be a useful and conceptually convincing approach for practical genomic prediction of NBA in moderately sized LR and LW populations.
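Predictive ability here is a correlation between predicted genomic values and observed records in animals left out of training. The sketch below shows five-fold cross-validation of GBLUP written in its equivalent SNP-BLUP (ridge regression) form, which is equivalent under standard assumptions when the shrinkage parameter equals the residual-to-SNP variance ratio. It does not implement ssGBLUP or blending. Using raw phenotypes instead of de-regressed proofs, the value of `lam`, and the random fold assignment are all assumptions for illustration.

```python
import numpy as np

def snp_blup_fit(X, y, lam):
    """Ridge regression of y on a centered genotype matrix X (SNP-BLUP).
    Returns the intercept and the SNP effect estimates."""
    mu = y.mean()
    p = X.shape[1]
    effects = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ (y - mu))
    return mu, effects

def cv_predictive_ability(X, y, lam, k=5, seed=0):
    """k-fold cross-validated correlation between predicted and observed values."""
    rng = np.random.default_rng(seed)
    fold = rng.permutation(len(y)) % k  # roughly equal-sized random folds
    pred = np.empty(len(y))
    for f in range(k):
        test = fold == f
        center = X[~test].mean(axis=0)  # center using training animals only
        mu, effects = snp_blup_fit(X[~test] - center, y[~test], lam)
        pred[test] = mu + (X[test] - center) @ effects
    return np.corrcoef(pred, y)[0, 1]
```

On simulated 0/1/2 genotypes with a highly heritable trait, this returns a correlation close to 1. With real litter-size records, values would be far lower, in line with the 0.08-0.45 range reported above.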
Animal Mitochondrial DNA as We Do Not Know It: mt-Genome Organization and Evolution in Nonbilaterian Lineages

PubMed Central

Pett, Walker

2016-01-01

Animal mitochondrial DNA (mtDNA) is commonly described as a small, circular molecule that is conserved in size, gene content, and organization. Data collected in the last decade have challenged this view by revealing considerable diversity in animal mitochondrial genome organization. Much of this diversity has been found in nonbilaterian animals (phyla Cnidaria, Ctenophora, Placozoa, and Porifera), which, from a phylogenetic perspective, form the main branches of the animal tree along with Bilateria. Within these groups, mt-genomes are characterized by varying numbers of both linear and circular chromosomes, extra genes (e.g. atp9, polB, tatC), large variation in the number of encoded mitochondrial transfer RNAs (tRNAs) (0–25), at least seven different genetic codes, presence/absence of introns, tRNA and mRNA editing, fragmented ribosomal RNA genes, translational frameshifting, highly variable substitution rates, and a large range of genome sizes. This newly discovered diversity allows a better understanding of the evolutionary plasticity and conservation of animal mtDNA and provides insights into the molecular and evolutionary mechanisms shaping mitochondrial genomes. PMID:27557826
Novel nuclei isolation buffer for flow cytometric genome size estimation of Zingiberaceae: a comparison with common isolation buffers

PubMed Central

Sadhu, Abhishek; Bhadra, Sreetama; Bandyopadhyay, Maumita

2016-01-01

Background and Aims: Cytological parameters such as chromosome numbers and genome sizes of plants are used routinely for studying evolutionary aspects of polyploid plants. Members of Zingiberaceae show a wide range of inter- and intrageneric variation in their reproductive habits and ploidy levels. Conventional cytological study in this group of plants is severely hampered by the presence of diverse secondary metabolites, which also affect their genome size estimation using flow cytometry. None of the several nuclei isolation buffers used in flow cytometry could be used very successfully for members of Zingiberaceae to isolate good quality nuclei from both shoot and root tissues. Methods: The competency of eight nuclei isolation buffers was compared with a newly formulated buffer, MB01, in six different genera of Zingiberaceae based on the fluorescence intensity of propidium iodide-stained nuclei using flow cytometric parameters, namely coefficient of variation of the G0/G1 peak, debris factor and nuclei yield factor. Isolated nuclei were studied using fluorescence microscopy and bio-scanning electron microscopy to analyse stain–nuclei interaction and nuclei topology, respectively. Genome contents of 21 species belonging to these six genera were determined using MB01. Key Results: Flow cytometric parameters showed significant differences among the analysed buffers. MB01 exhibited the best combination of analysed parameters; photomicrographs obtained from fluorescence and electron microscopy supported the superiority of MB01 buffer over other buffers. Among the 21 species studied, nuclear DNA contents of 14 species are reported for the first time. Conclusions: Results of the present study substantiate the enhanced efficacy of MB01, compared to other buffers tested, in the generation of acceptable cytograms from all species of Zingiberaceae studied.
Our study facilitates new ways of sample preparation for further flow cytometric analysis of genome size of other members belonging to this highly complex polyploid family. PMID:27594649
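Two of the flow cytometric quantities mentioned above are simple to state. The peak CV is 100 × SD / mean of the fluorescence of nuclei gated in the G0/G1 peak. Nuclear DNA content is usually obtained by comparison with a co-processed reference standard: 2C (pg) = (sample peak mean / standard peak mean) × standard 2C value. The sketch below shows both, plus the commonly used conversion 1 pg ≈ 978 Mbp (Doležel et al. 2003). Gating, debris factor and nuclei yield factor are not modelled, and the numbers in the usage note are hypothetical.

```python
from statistics import mean, stdev

PG_TO_MBP = 978  # 1 pg of DNA corresponds to about 978 Mbp

def peak_stats(intensities):
    """Mean and coefficient of variation (%) of a gated G0/G1 peak."""
    m = mean(intensities)
    return m, 100 * stdev(intensities) / m

def genome_size(sample_peak, standard_peak, standard_2c_pg):
    """2C nuclear DNA content (pg) and 1C genome size (Mbp) relative to a standard."""
    two_c_pg = sample_peak / standard_peak * standard_2c_pg
    return two_c_pg, two_c_pg / 2 * PG_TO_MBP
```

For example, a sample peak at half the position of a hypothetical 9.0 pg standard gives `genome_size(200.0, 400.0, 9.0)`, which is 4.5 pg (2C) or about 2200 Mbp (1C). A low peak CV is what makes such ratios trustworthy, which is why it is used to rank the buffers.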
Novel nuclei isolation buffer for flow cytometric genome size estimation of Zingiberaceae: a comparison with common isolation buffers.

PubMed

Sadhu, Abhishek; Bhadra, Sreetama; Bandyopadhyay, Maumita

2016-11-01

Cytological parameters such as chromosome numbers and genome sizes of plants are used routinely for studying evolutionary aspects of polyploid plants. Members of Zingiberaceae show a wide range of inter- and intrageneric variation in their reproductive habits and ploidy levels. Conventional cytological study in this group of plants is severely hampered by the presence of diverse secondary metabolites, which also affect their genome size estimation using flow cytometry. None of the several nuclei isolation buffers used in flow cytometry could be used very successfully for members of Zingiberaceae to isolate good quality nuclei from both shoot and root tissues. The competency of eight nuclei isolation buffers was compared with a newly formulated buffer, MB01, in six different genera of Zingiberaceae based on the fluorescence intensity of propidium iodide-stained nuclei using flow cytometric parameters, namely coefficient of variation of the G0/G1 peak, debris factor and nuclei yield factor. Isolated nuclei were studied using fluorescence microscopy and bio-scanning electron microscopy to analyse stain-nuclei interaction and nuclei topology, respectively. Genome contents of 21 species belonging to these six genera were determined using MB01. Flow cytometric parameters showed significant differences among the analysed buffers. MB01 exhibited the best combination of analysed parameters; photomicrographs obtained from fluorescence and electron microscopy supported the superiority of MB01 buffer over other buffers. Among the 21 species studied, nuclear DNA contents of 14 species are reported for the first time. Results of the present study substantiate the enhanced efficacy of MB01, compared to other buffers tested, in the generation of acceptable cytograms from all species of Zingiberaceae studied. Our study facilitates new ways of sample preparation for further flow cytometric analysis of genome size of other members belonging to this highly complex polyploid family.
© The Author 2016. Published by Oxford University Press on behalf of the Annals of Botany Company. All rights reserved.
Repetitive Elements May Comprise Over Two-Thirds of the Human Genome

PubMed Central

de Koning, A. P. Jason; Gu, Wanjun; Castoe, Todd A.; Batzer, Mark A.; Pollock, David D.

2011-01-01

Transposable elements (TEs) are conventionally identified in eukaryotic genomes by alignment to consensus element sequences. Using this approach, about half of the human genome has been previously identified as TEs and low-complexity repeats. We recently developed a highly sensitive alternative de novo strategy, P-clouds, that instead searches for clusters of high-abundance oligonucleotides that are related in sequence space (oligo “clouds”). We show here that P-clouds predicts >840 Mbp of additional repetitive sequences in the human genome, thus suggesting that 66%–69% of the human genome is repetitive or repeat-derived. To investigate this remarkable difference, we conducted detailed analyses of the ability of both P-clouds and a commonly used conventional approach, RepeatMasker (RM), to detect fragments of different sizes of the highly abundant human Alu and MIR SINEs. RM can have surprisingly low sensitivity for even moderately long fragments, in contrast to P-clouds, which has good sensitivity down to small fragment sizes (∼25 bp). Because short fragments have a high intrinsic probability of being false positives, we performed a probabilistic annotation that reflects this fact. We further developed “element-specific” P-clouds (ESPs) to identify novel Alu and MIR SINE elements, and with them we identified ∼100 Mb of previously unannotated human elements. ESP estimates of new MIR sequences are in good agreement with RM-based predictions of the amount that RM missed. These results highlight the need for combined, probabilistic genome annotation approaches and suggest that the human genome consists of substantially more repetitive sequence than previously believed. PMID:22144907
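The core P-clouds idea is to count oligonucleotides, take the highly abundant ones as cores, grow "clouds" by adding sequence-similar oligos that are also frequent, and annotate any position covered by a cloud oligo as repeat-derived. The sketch below is a deliberately simplified stand-in: it uses a single core threshold and a single Hamming-distance-1 expansion step, whereas the published method uses several abundance thresholds and a larger neighbourhood. The value of `k`, both thresholds, and all function names are hypothetical.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Occurrences of every k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def neighbors(kmer, alphabet="ACGT"):
    """All k-mers at Hamming distance 1 from kmer."""
    for i, base in enumerate(kmer):
        for b in alphabet:
            if b != base:
                yield kmer[:i] + b + kmer[i + 1:]

def build_cloud(counts, core_min, member_min):
    """Cores: k-mers seen >= core_min times. Members: 1-mismatch neighbours of a
    core seen >= member_min times (a one-step simplification of P-clouds)."""
    core = {km for km, c in counts.items() if c >= core_min}
    cloud = set(core)
    for km in core:
        for nb in neighbors(km):
            if counts.get(nb, 0) >= member_min:
                cloud.add(nb)
    return cloud

def repeat_mask(seq, k, cloud):
    """Boolean per position: True if covered by at least one cloud k-mer."""
    covered = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in cloud:
            for j in range(i, i + k):
                covered[j] = True
    return covered
```

On a toy sequence made of three copies of GATTACA followed by a unique spacer, with k = 4, every base of the repeated part is masked and none of the spacer is.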

Genome size evolution in relation to leaf strategy and metabolic rates revisited.

PubMed

Beaulieu, Jeremy M; Leitch, Ilia J; Knight, Charles A

2007-03-01

It has been proposed that having too much DNA may carry physiological consequences for plants. The strong correlation between DNA content, cell size and cell division rate could lead to predictable morphological variation in plants, including a negative relationship with leaf mass per unit area (LMA). In addition, the possible increased demand for resources in species with high DNA content may have downstream effects on maximal metabolic efficiency, including decreased metabolic rates. Tests were made for genome size-dependent variation in LMA and metabolic rates (mass-based photosynthetic rate and dark respiration rate) using our own measurements and data from a plant functional trait database (Glopnet). These associations were tested using two metrics of genome size: bulk DNA amount (2C DNA) and monoploid genome size (1Cx DNA). The data were analysed using an evolutionary framework that included a regression analysis and independent contrasts using a phylogenetic tree with estimates of molecular diversification times. A contribution index for the LMA data set was also calculated to determine which divergences have the greatest influence on the relationship between genome size and LMA. A significant negative association was found between bulk DNA amount and LMA in angiosperms. This was primarily a result of influential divergences that may represent early shifts in growth form. However, divergences in bulk DNA amount were positively associated with divergences in LMA, suggesting that the relationship may be indirect and mediated through other traits directly related to genome size. There was a significant negative association between genome size and metabolic rates that was driven by a basal divergence between angiosperms and gymnosperms; no significant independent contrast results were found. Therefore, it is concluded that genome size-dependent constraints acting on metabolic efficiency may not exist within seed plants.
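Independent contrasts (Felsenstein's method) turn species values that are not statistically independent into independent differences between sister lineages. At each internal node, the standardized contrast is (x1 - x2) / sqrt(v1 + v2), the node value is the branch-length-weighted mean of its children, and the parent branch is lengthened by v1·v2 / (v1 + v2). A relationship such as genome size versus LMA is then tested by regressing one trait's contrasts on the other's through the origin. The sketch below is a minimal version of that procedure. The nested-tuple tree format and the example tree and trait values are hypothetical, and zero-length sister branches are not handled.

```python
import math

def independent_contrasts(node, traits, contrasts):
    """Post-order traversal. node is (leaf_name, branch_len) or ((left, right), branch_len).
    Appends one standardized contrast per internal node to `contrasts` and returns
    (estimated node value, branch length adjusted for estimation error)."""
    label, branch_len = node
    if isinstance(label, str):  # tip
        return traits[label], branch_len
    left, right = label
    x1, v1 = independent_contrasts(left, traits, contrasts)
    x2, v2 = independent_contrasts(right, traits, contrasts)
    contrasts.append((x1 - x2) / math.sqrt(v1 + v2))
    x = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)  # weighted ancestral estimate
    return x, branch_len + v1 * v2 / (v1 + v2)

def contrasts_for(tree, traits):
    out = []
    independent_contrasts(tree, traits, out)
    return out

def slope_through_origin(cx, cy):
    """Least-squares slope of contrasts cy on cx, regression forced through zero."""
    return sum(a * b for a, b in zip(cx, cy)) / sum(a * a for a in cx)
```

Both traits must be traversed on the same tree with the same child order so that the signs of paired contrasts agree. A four-tip tree yields three contrasts per trait.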
Dynamics of genome size evolution in birds and mammals

PubMed Central

Feschotte, Cédric

2017-01-01

Genome size in mammals and birds shows remarkably little interspecific variation compared with other taxa. However, genome sequencing has revealed that many mammal and bird lineages have experienced differential rates of transposable element (TE) accumulation, which would be predicted to cause substantial variation in genome size between species. Thus, we hypothesize that there has been covariation between the amount of DNA gained by transposition and lost by deletion during mammal and avian evolution, resulting in genome size equilibrium. To test this model, we develop computational methods to quantify the amount of DNA gained by TE expansion and lost by deletion over the last 100 My in the lineages of 10 species of eutherian mammals and 24 species of birds. The results reveal extensive variation in the amount of DNA gained via lineage-specific transposition, but that DNA loss counteracted this expansion to various extents across lineages. Our analysis of the rate and size spectrum of deletion events implies that DNA removal in both mammals and birds has proceeded mostly through large segmental deletions (>10 kb). These findings support a unified “accordion” model of genome size evolution in eukaryotes whereby DNA loss counteracting TE expansion is a major determinant of genome size. Furthermore, we propose that extensive DNA loss, and not necessarily a dearth of TE activity, has been the primary force maintaining the greater genomic compaction of flying birds and bats relative to their flightless relatives. PMID:28179571
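The "accordion" argument rests on a simple DNA budget: present size = ancestral size + DNA gained by transposition - DNA lost by deletion. Given an estimate of the ancestral size and of the lineage-specific TE gain, the amount lost can therefore be inferred by difference. The sketch below expresses that budget. The lineage values in the usage note are hypothetical, not figures from the study.

```python
def dna_loss(ancestral_mb, gained_mb, present_mb):
    """DNA removed along a lineage, inferred from the size budget (Mb)."""
    return ancestral_mb + gained_mb - present_mb

def loss_to_gain_ratio(ancestral_mb, gained_mb, present_mb):
    """Fraction of the TE-derived gain that was offset by deletion."""
    return dna_loss(ancestral_mb, gained_mb, present_mb) / gained_mb
```

Two hypothetical lineages that both grow from 3000 Mb to 3100 Mb can differ greatly in turnover. One that gained 400 Mb of TEs must have lost 300 Mb (a loss-to-gain ratio of 0.75), while one that gained only 100 Mb lost none. Similar genome sizes can therefore hide very different gain and loss dynamics.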
Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data.

PubMed

Jayakumar, Vasanthan; Sakakibara, Yasubumi

2017-11-03

Long reads obtained from third-generation sequencing platforms can help overcome the long-standing challenge of the de novo assembly of sequences for the genomic analysis of non-model eukaryotic organisms. Numerous long-read-aided de novo assemblies have been published recently, which exhibited superior quality of the assembled genomes in comparison with those achieved using earlier second-generation sequencing technologies. Evaluating assemblies is important in guiding the appropriate choice for specific research needs. In this study, we evaluated 10 long-read assemblers using a variety of metrics on Pacific Biosciences (PacBio) data sets from different taxonomic categories with considerable differences in genome size. The results allowed us to narrow down the list to a few assemblers that can be effectively applied to eukaryotic assembly projects. Moreover, we highlight how best to use limited genomic resources for effectively evaluating the genome assemblies of non-model organisms. © The Author 2017. Published by Oxford University Press.
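Contiguity statistics such as N50 are among the standard measures used to compare assemblies, although the abstract does not list the exact metrics used. N50 is the length L such that contigs of length ≥ L together contain at least half of the assembled bases. NG50 uses half of an estimated genome size instead, which matters when comparing assemblies of different total size. A minimal sketch of both, with hypothetical contig lengths:

```python
def nx(lengths, x=50, genome_size=None):
    """Nx (or NGx when genome_size is given): the contig length L such that
    contigs of length >= L cover at least x% of the assembly (or genome)."""
    target = (genome_size if genome_size is not None else sum(lengths)) * x / 100
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return 0  # the assembly never reaches x% of genome_size
```

For contigs of lengths 2 through 10, N50 is 8, while NG50 against an assumed genome size of 100 is only 3. This illustrates how an assembly that captures only part of the genome looks better by N50 than by NG50.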
Genome-wide association studies identified multiple genetic loci for body size at four growth stages in Chinese Holstein cattle.

PubMed

Zhang, Xu; Chu, Qin; Guo, Gang; Dong, Ganghui; Li, Xizhi; Zhang, Qin; Zhang, Shengli; Zhang, Zhiwu; Wang, Yachun

2017-01-01

The growth and maturity of cattle body size affect not only feed efficiency, but also productivity and longevity. Dissecting the genetic architecture of body size is critical for cattle breeding to improve both efficiency and productivity. Body volume and weight are reflected by several measurements, of which Heart Girth (HG) and Hip Height (HH) are the most important. They are widely used as predictors of body weight (BW). Few association studies have been conducted for HG and HH in cattle, and those have focused on a single growth stage. In this study, we extended genome-wide association studies (GWAS) to a full spectrum of four growth stages (6, 12, 18, and 24 months after birth) in Chinese Holstein heifers. Genome-wide single nucleotide polymorphisms (SNPs) were obtained by genotyping 3,325 individuals with the Illumina BovineSNP50 v2 BeadChip. Estimated breeding values (EBVs) were derived for both HG and HH at the four ages and analyzed separately by GWAS using the Fixed and random model Circuitous Probability Unification (FarmCPU) method. In total, 27 SNPs were identified as significantly associated with HG and HH at different growth stages. We found 66 candidate genes located near the associated SNPs, including nine genes known to be highly related to development and to skeletal and muscular growth. In addition, biological function analysis was performed with Ingenuity Pathway Analysis, and an interaction network related to development was obtained that contained 16 of the 66 candidate genes. The set of putative genes provides a valuable resource and can help elucidate the genomic architecture and mechanisms underlying growth traits in dairy cattle.
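FarmCPU is a multi-locus method that alternates between a fixed-effect model testing each marker and a random-effect model selecting pseudo-QTNs as covariates. The simplest baseline it improves on is a single-marker scan: regress the trait (here EBVs) on each SNP's 0/1/2 genotype code and flag SNPs whose p-value passes a Bonferroni threshold. The sketch below implements only that baseline, not FarmCPU. It uses a normal approximation to the t test to stay dependency-free, which is an assumption that is reasonable only for large samples.

```python
import math

def single_marker_scan(genotypes, phenotype):
    """genotypes: list of SNPs, each a list of 0/1/2 codes (one per animal).
    Returns (effect, approximate two-sided p-value) per SNP."""
    n = len(phenotype)
    ybar = sum(phenotype) / n
    results = []
    for snp in genotypes:
        xbar = sum(snp) / n
        sxx = sum((x - xbar) ** 2 for x in snp)
        if sxx == 0:  # monomorphic SNP: nothing to test
            results.append((0.0, 1.0))
            continue
        sxy = sum((x - xbar) * (y - ybar) for x, y in zip(snp, phenotype))
        beta = sxy / sxx
        rss = sum((y - ybar - beta * (x - xbar)) ** 2 for x, y in zip(snp, phenotype))
        se = math.sqrt(rss / (n - 2) / sxx)
        if se == 0:
            results.append((beta, 0.0))
            continue
        # Normal approximation to the t distribution (assumption: large n).
        p = math.erfc(abs(beta / se) / math.sqrt(2))
        results.append((beta, p))
    return results

def bonferroni_hits(results, alpha=0.05):
    """Indices of SNPs significant at alpha / number_of_tests."""
    threshold = alpha / len(results)
    return [i for i, (_, p) in enumerate(results) if p < threshold]
```

Single-marker scans like this ignore population structure and relatedness, both of which are strong in dairy cattle. Controlling for these is the motivation for mixed-model and multi-locus methods such as FarmCPU.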
Estimation of the genome sizes of the chigger mites Leptotrombidium pallidum and Leptotrombidium scutellare based on quantitative PCR and k-mer analysis

PubMed Central

2014-01-01

Background: Leptotrombidium pallidum and Leptotrombidium scutellare are the major vector mites for Orientia tsutsugamushi, the causative agent of scrub typhus. Before these organisms can be subjected to whole-genome sequencing, it is necessary to estimate their genome sizes to obtain basic information for establishing the strategies that should be used for genome sequencing and assembly. Method: The genome sizes of L. pallidum and L. scutellare were estimated by a method based on quantitative real-time PCR. In addition, a k-mer analysis of the whole-genome sequences obtained through Illumina sequencing was conducted to verify the mutual compatibility and reliability of the results. Results: The genome sizes estimated using qPCR were 191 ± 7 Mb for L. pallidum and 262 ± 13 Mb for L. scutellare. The k-mer analysis-based genome lengths were estimated to be 175 Mb for L. pallidum and 286 Mb for L. scutellare. The estimates from these two independent methods were mutually complementary and within a similar range to those of other Acariform mites. Conclusions: The estimation method based on qPCR appears to be a useful alternative when the standard methods, such as flow cytometry, are impractical. The relatively small estimated genome sizes should facilitate whole-genome analysis, which could contribute to our understanding of Arachnida genome evolution and provide key information for scrub typhus prevention and mite vector competence. PMID:24947244
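A common rule of thumb behind k-mer-based genome size estimates is: genome size ≈ total number of k-mers / depth of the main k-mer coverage peak. Before applying it, low-depth k-mers caused by sequencing errors are discarded. The sketch below applies that rule to a k-mer depth histogram. It is not necessarily the exact procedure used for the mites. The error cutoff is a user-chosen assumption (normally set at the trough after the error peak), and heterozygosity and repeats are ignored.

```python
def kmer_genome_size(histogram, error_cutoff):
    """histogram: {depth: number of distinct k-mers observed at that depth}.
    Returns (estimated genome size in bp, peak depth)."""
    kept = {d: c for d, c in histogram.items() if d >= error_cutoff}
    peak_depth = max(kept, key=kept.get)  # main (homozygous) coverage peak
    total_kmers = sum(d * c for d, c in kept.items())
    return total_kmers / peak_depth, peak_depth
```

For a toy histogram with an error peak at depth 1 and a main peak at depth 11, the k-mers kept above the cutoff total 5990, giving an estimate of about 545 bp. Real histograms from Illumina data give Mb-scale values such as those reported above.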
Comparative genomics of Cp8viruses with special reference to Campylobacter phage vB_CjeM_los1, isolated from a slaughterhouse in Ireland.

PubMed

O'Sullivan, Lisa; Lucid, Alan; Neve, Horst; Franz, Charles M A P; Bolton, Declan; McAuliffe, Olivia; Paul Ross, R; Coffey, Aidan

2018-04-23

Campylobacter phage vB_CjeM_Los1 was recently isolated from a slaughterhouse in the Republic of Ireland using the host Campylobacter jejuni subsp. jejuni PT14, and full-genome sequencing and annotation were performed. The genome was found to be 134,073 bp in length and to contain 169 predicted open reading frames. Transmission electron microscopy images of vB_CjeM_Los1 revealed that it belongs to the family Myoviridae, with tail fibres observed in both extended and folded conformations, as seen in T4. The genome size and morphology of vB_CjeM_Los1 suggest that it belongs to the genus Cp8virus, and seven other Campylobacter phages with similar size characteristics have also been fully sequenced. In this work, comparative studies of genomic rearrangements and conservation were performed across the eight genomes. None of the eight genomes were found to have undergone internal rearrangements, and their sequences retained more than 98% identity with one another despite the widespread geographical distribution of the phages. Whole-genome phylogenetic analysis was also performed, and the resulting clades were shown to correspond to the differing numbers of tRNAs present in each phage. This may indicate lineages within the genus, despite the phages' striking homology.
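Pairwise percent identity, as in the "more than 98% identity" reported above, can be computed under several conventions. One simple convention, assumed in the sketch below, counts identical columns in a pairwise alignment and ignores gapped columns; other tools also count gaps or normalise by sequence length, so values are not always comparable across studies. The function name and the toy alignment are illustrative.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are excluded from the
    denominator (one of several conventions in use).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment (invented): 9 ungapped columns, 8 identical
print(round(percent_identity("ACGT-ACGTA", "ACGTTACGAA"), 2))
```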
Reproductive Mode and the Evolution of Genome Size and Structure in Caenorhabditis Nematodes

PubMed Central

Fierst, Janna L.; Willis, John H.; Thomas, Cristel G.; Wang, Wei; Reynolds, Rose M.; Ahearne, Timothy E.; Cutter, Asher D.; Phillips, Patrick C.

2015-01-01

The self-fertile nematode worms Caenorhabditis elegans, C. briggsae, and C. tropicalis evolved independently from outcrossing male-female ancestors and have genomes 20-40% smaller than closely related outcrossing relatives. This pattern of smaller genomes for selfing species and larger genomes for closely related outcrossing species is also seen in plants. We use comparative genomics, including the first high quality genome assembly for an outcrossing member of the genus (C. remanei), to test several hypotheses for the evolution of genome reduction under a change in mating system. Unlike in plants, reductions in the number of repetitive elements, such as transposable elements, do not appear to be an important contributor to the change in genome size. Instead, all functional genomic categories are lost in approximately equal proportions. Theory predicts that self-fertilization should equalize the effective population size, as well as the resulting effects of genetic drift, between the X chromosome and autosomes. Contrary to this, we find that the self-fertile C. briggsae and C. elegans have larger intergenic spaces and larger protein-coding genes on the X chromosome when compared to autosomes, while C. remanei actually has smaller introns on the X chromosome than either self-reproducing species. Rather than being driven by mutational biases and/or genetic drift caused by a reduction in effective population size under self-reproduction, changes in genome size in this group of nematodes appear to be caused by genome-wide patterns of gene loss, most likely generated by genomic adaptation to self-reproduction per se. PMID:26114425
Demographic Divergence History of Pied Flycatcher and Collared Flycatcher Inferred from Whole-Genome Re-sequencing Data

PubMed Central

Nadachowska-Brzyska, Krystyna; Burri, Reto; Olason, Pall I.; Kawakami, Takeshi; Smeds, Linnéa; Ellegren, Hans

2013-01-01

Profound knowledge of demographic history is a prerequisite for the understanding and inference of processes involved in the evolution of population differentiation and speciation. Together with new coalescent-based methods, the recent availability of genome-wide data enables investigation of differentiation and divergence processes at unprecedented depth. We combined two powerful approaches, full Approximate Bayesian Computation analysis (ABC) and pairwise sequentially Markovian coalescent modeling (PSMC), to reconstruct the demographic history of the split between two avian speciation model species, the pied flycatcher and collared flycatcher. Using whole-genome re-sequencing data from 20 individuals, we investigated 15 demographic models including different levels and patterns of gene flow, and changes in effective population size over time. ABC provided high support for recent (mode 0.3 my, range <0.7 my) species divergence, declines in effective population size of both species since their initial divergence, and unidirectional recent gene flow from pied flycatcher into collared flycatcher. The estimated divergence time and population size changes, supported by PSMC results, suggest that the ancestral species persisted through one of the glacial periods of middle Pleistocene and then split into two large populations that first increased in size before going through severe bottlenecks and expanding into their current ranges. Secondary contact appears to have been established after the last glacial maximum. The severity of the bottlenecks at the last glacial maximum is indicated by the discrepancy between current effective population sizes (20,000–80,000) and census sizes (5–50 million birds) of the two species. The recent divergence time challenges the supposition that avian speciation is a relatively slow process with extended times for intrinsic postzygotic reproductive barriers to evolve. 
Our study emphasizes the importance of using genome-wide data to unravel tangled demographic histories. Moreover, it constitutes one of the first examples of the inference of divergence history from genome-wide data in non-model species. PMID:24244198
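The rejection step at the heart of ABC can be sketched as follows: draw a parameter from the prior, simulate data under it, and keep the draw if a summary statistic of the simulated data is close to the observed one. The one-parameter normal model, the tolerance `epsilon`, and the function names are assumptions for illustration; the study compared 15 coalescent demographic models, which this toy does not attempt to reproduce.

```python
import random
from statistics import mean


def abc_rejection(observed_stat, simulate, prior_sample, n_draws=10_000,
                  epsilon=0.05, seed=0):
    """Basic ABC rejection sampler.

    simulate(theta, rng) -> summary statistic of data simulated under theta
    prior_sample(rng)    -> one parameter draw from the prior
    Returns the accepted parameter draws (an approximate posterior sample).
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        # Accept theta if the simulated summary lies within epsilon of the observed one
        if abs(simulate(theta, rng) - observed_stat) <= epsilon:
            accepted.append(theta)
    return accepted


def simulate_mean(theta, rng, n=50):
    """Toy model (hypothetical): n draws from Normal(theta, 1), summarised by their mean."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n


posterior = abc_rejection(
    observed_stat=0.3,
    simulate=simulate_mean,
    prior_sample=lambda rng: rng.uniform(0.0, 1.0),
)
print(len(posterior), round(mean(posterior), 3))
```

Shrinking `epsilon` makes the approximation closer to the true posterior at the cost of fewer accepted draws; model choice in ABC extends the same idea by also sampling the model index.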
Optimizing the design of small-sized nucleus breeding programs for dairy cattle with minimal performance recording.

PubMed

Kariuki, C M; Komen, H; Kahi, A K; van Arendonk, J A M

2014-12-01

Dairy cattle breeding programs in developing countries are constrained by minimal and erratic pedigree and performance recording on cows on commercial farms. Small-sized nucleus breeding programs offer a viable alternative. Deterministic simulations using selection index theory were performed to determine the optimum design for small-sized nucleus schemes for dairy cattle. The nucleus was made up of 197 bulls and 243 cows distributed in 8 non-overlapping age classes. Each year 10 sires and 100 dams were selected to produce the next generation of male and female selection candidates. Conception rates and sex ratio were fixed at 0.90 and 0.50, respectively, translating to 45 male and 45 female candidates joining the nucleus per year. Recorded dams on commercial farms provided information for genetic evaluation of selection candidates (bulls) in the nucleus. Five strategies were defined: nucleus records only [within-nucleus dam performance (DP)], progeny records in addition to nucleus records [progeny testing (PT)], genomic information only [genomic selection (GS)], dam performance records in addition to genomic information (GS+DP), and progeny records in addition to genomic information (GS+PT). Alternative PT, GS, GS+DP, and GS+PT schemes differed in the number of progeny per sire and size of reference population. The maximum number of progeny records per sire was 30, and the maximum size of the reference population was 5,000. Results show that GS schemes had higher responses and lower accuracies compared with other strategies, with the higher response being due to shorter generation intervals. Compared with similarly sized progeny-testing schemes, genomic-selection schemes would have lower accuracies, but these are offset by higher responses per year, which might provide additional incentive for farmers to participate in recording. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
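The candidate numbers in the abstract follow directly from the stated rates (100 dams × 0.90 conception × 0.50 sex ratio = 45 candidates of each sex per year). The trade-off between accuracy and generation interval can be illustrated with the breeder's equation per unit time, ΔG = i·r·σ_A / L. The sketch below assumes a simplified single-path version of that equation; the intensity, accuracies, genetic standard deviation, and generation intervals in the example are hypothetical values, not results from the study.

```python
def candidates_per_year(n_dams, conception_rate, sex_ratio):
    """Expected number of selection candidates of one sex produced per year."""
    return n_dams * conception_rate * sex_ratio


def annual_genetic_gain(intensity, accuracy, sigma_a, generation_interval):
    """Breeder's equation per year for a single selection path: i * r * sigma_A / L."""
    return intensity * accuracy * sigma_a / generation_interval


print(candidates_per_year(100, 0.90, 0.50))

# Hypothetical comparison: progeny testing (higher r, long L) vs genomic selection
pt = annual_genetic_gain(intensity=2.0, accuracy=0.80, sigma_a=1.0, generation_interval=6.0)
gs = annual_genetic_gain(intensity=2.0, accuracy=0.60, sigma_a=1.0, generation_interval=2.5)
print(round(pt, 3), round(gs, 3))
```

Under these made-up inputs the lower accuracy of genomic selection is more than compensated by the shorter generation interval, which is the mechanism the abstract describes. Full scheme comparisons usually combine four selection paths (sires and dams of bulls and of cows), which this sketch omits.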
Comparative genomics reveals insights into avian genome evolution and adaptation.

PubMed

Zhang, Guojie; Li, Cai; Li, Qiye; Li, Bo; Larkin, Denis M; Lee, Chul; Storz, Jay F; Antunes, Agostinho; Greenwold, Matthew J; Meredith, Robert W; Ödeen, Anders; Cui, Jie; Zhou, Qi; Xu, Luohao; Pan, Hailin; Wang, Zongji; Jin, Lijun; Zhang, Pei; Hu, Haofu; Yang, Wei; Hu, Jiang; Xiao, Jin; Yang, Zhikai; Liu, Yang; Xie, Qiaolin; Yu, Hao; Lian, Jinmin; Wen, Ping; Zhang, Fang; Li, Hui; Zeng, Yongli; Xiong, Zijun; Liu, Shiping; Zhou, Long; Huang, Zhiyong; An, Na; Wang, Jie; Zheng, Qiumei; Xiong, Yingqi; Wang, Guangbiao; Wang, Bo; Wang, Jingjing; Fan, Yu; da Fonseca, Rute R; Alfaro-Núñez, Alonzo; Schubert, Mikkel; Orlando, Ludovic; Mourier, Tobias; Howard, Jason T; Ganapathy, Ganeshkumar; Pfenning, Andreas; Whitney, Osceola; Rivas, Miriam V; Hara, Erina; Smith, Julia; Farré, Marta; Narayan, Jitendra; Slavov, Gancho; Romanov, Michael N; Borges, Rui; Machado, João Paulo; Khan, Imran; Springer, Mark S; Gatesy, John; Hoffmann, Federico G; Opazo, Juan C; Håstad, Olle; Sawyer, Roger H; Kim, Heebal; Kim, Kyu-Won; Kim, Hyeon Jeong; Cho, Seoae; Li, Ning; Huang, Yinhua; Bruford, Michael W; Zhan, Xiangjiang; Dixon, Andrew; Bertelsen, Mads F; Derryberry, Elizabeth; Warren, Wesley; Wilson, Richard K; Li, Shengbin; Ray, David A; Green, Richard E; O'Brien, Stephen J; Griffin, Darren; Johnson, Warren E; Haussler, David; Ryder, Oliver A; Willerslev, Eske; Graves, Gary R; Alström, Per; Fjeldså, Jon; Mindell, David P; Edwards, Scott V; Braun, Edward L; Rahbek, Carsten; Burt, David W; Houde, Peter; Zhang, Yong; Yang, Huanming; Wang, Jian; Jarvis, Erich D; Gilbert, M Thomas P; Wang, Jun

2014-12-12

Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits. Copyright © 2014, American Association for the Advancement of Science.
A genomic view of 500 million years of cnidarian evolution

PubMed Central

Steele, Robert E.; David, Charles N.; Technau, Ulrich

2010-01-01

Cnidarians (corals, anemones, jellyfish, and hydras) are a diverse group of animals of interest to evolutionary biologists, ecologists, and developmental biologists. With the publication of the genome sequences of Hydra and Nematostella, whose last common ancestor was the stem cnidarian, we are beginning to see the genomic underpinnings of cnidarian biology. Cnidarians are known for the remarkable plasticity of their morphology and life cycles. This plasticity is reflected in the Hydra and Nematostella genomes, which differ to an exceptional degree in size, base composition, transposable element content, and gene conservation. We now know what cnidarian genomes are capable of doing given 500 million years; the next challenge is to understand how this genomic history has led to the striking diversity we see in cnidarians. PMID:21047698
Photoperiod-H1 (Ppd-H1) Controls Leaf Size.

PubMed

Digel, Benedikt; Tavakol, Elahe; Verderio, Gabriele; Tondelli, Alessandro; Xu, Xin; Cattivelli, Luigi; Rossini, Laura; von Korff, Maria

2016-09-01

Leaf size is a major determinant of plant photosynthetic activity and biomass; however, it is poorly understood how leaf size is genetically controlled in cereal crop plants like barley (Hordeum vulgare). We conducted a genome-wide association scan for flowering time, leaf width, and leaf length in a diverse panel of European winter cultivars grown in the field and genotyped with a single-nucleotide polymorphism array. The genome-wide association scan identified PHOTOPERIOD-H1 (Ppd-H1) as a candidate gene underlying the major quantitative trait loci for flowering time and leaf size in the barley population. Microscopic phenotyping of three independent introgression lines confirmed the effect of Ppd-H1 on leaf size. Differences in the duration of leaf growth and consequent variation in leaf cell number were responsible for the leaf size differences between the Ppd-H1 variants. The Ppd-H1-dependent induction of the BARLEY MADS BOX genes BM3 and BM8 in the leaf correlated with reductions in leaf size and leaf number. Our results indicate that leaf size is controlled by the Ppd-H1- and photoperiod-dependent progression of plant development. The coordination of leaf growth with flowering may be part of a reproductive strategy to optimize resource allocation to the developing inflorescences and seeds. © 2016 American Society of Plant Biologists. All rights reserved.
A score-statistic approach for determining threshold values in QTL mapping.

PubMed

Kao, Chen-Hung; Ho, Hsiang-An

2012-06-01

So far, issues in determining threshold values for QTL mapping have mostly been investigated in backcross and F2 populations, whose genome structures are relatively simple. Investigations of these issues in progeny populations beyond F2 (advanced populations), whose genomes are more complicated, are generally inadequate. Because advanced populations are widely used in QTL mapping, it is important to address these issues for them in more detail. Owing to the increasing number of meiotic cycles, the genomes of advanced populations can be very different from backcross and F2 genomes. Therefore, methods that account for the specific genome structures of advanced populations are required to resolve these issues. By considering the differences in genome structure between populations, we formulate more general score test statistics and Gaussian processes to evaluate their threshold values. In general, we found that, for a given significance level and genome size, threshold values for QTL detection are higher with denser marker maps and in more advanced populations. Simulations were performed to validate our approach.
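For the simplest case the abstract contrasts with advanced populations, a backcross analysed under a Haldane map, the score statistics along a chromosome are commonly approximated by an Ornstein-Uhlenbeck process whose correlation between loci d Morgans apart is exp(-2d). The sketch below simulates that process at equally spaced markers and reads off the 95th percentile of the maximum statistic. It is not the authors' formulation for advanced populations, and the chromosome length, marker spacings, and simulation settings are assumptions; it only illustrates why denser maps raise the threshold.

```python
import math
import random


def simulated_threshold(chrom_length_m, spacing_m, alpha=0.05, n_sim=2000, seed=42):
    """Null threshold for max Z^2 along one backcross chromosome (Haldane map).

    Score statistics at loci d Morgans apart are assumed to have correlation
    exp(-2d) (Ornstein-Uhlenbeck). With equally spaced markers this is an
    AR(1) sequence with rho = exp(-2 * spacing).
    Returns the (1 - alpha) quantile of the maximum chi-square (Z^2) statistic.
    """
    rng = random.Random(seed)
    n_markers = int(round(chrom_length_m / spacing_m)) + 1
    rho = math.exp(-2.0 * spacing_m)
    innovation_sd = math.sqrt(1.0 - rho ** 2)  # keeps each Z marginally N(0, 1)
    maxima = []
    for _ in range(n_sim):
        z = rng.gauss(0.0, 1.0)
        best = z * z
        for _ in range(n_markers - 1):
            z = rho * z + innovation_sd * rng.gauss(0.0, 1.0)
            best = max(best, z * z)
        maxima.append(best)
    maxima.sort()
    return maxima[int(math.ceil((1 - alpha) * n_sim)) - 1]


sparse = simulated_threshold(chrom_length_m=1.0, spacing_m=0.2)   # 6 markers
dense = simulated_threshold(chrom_length_m=1.0, spacing_m=0.01)   # 101 markers
print(round(sparse, 2), round(dense, 2))
```

More markers means more nearly independent chances for the null process to exceed any fixed cutoff, so the genome-wide threshold must rise to hold the error rate at `alpha`.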
From Conventional to Next Generation Sequencing of Epstein-Barr Virus Genomes.

PubMed

Kwok, Hin; Chiang, Alan Kwok Shing

2016-02-24

Genomic sequences of Epstein-Barr virus (EBV) have been of interest because the virus is associated with cancers, such as nasopharyngeal carcinoma, and with conditions such as infectious mononucleosis. The progress of whole-genome EBV sequencing has been limited by the inefficiency and cost of first-generation sequencing technology. With the advancement of next-generation sequencing (NGS) and target enrichment strategies, an increasing number of EBV genomes have been published. These genomes were sequenced using different approaches, either with or without EBV DNA enrichment. This review provides an overview of the EBV genomes published to date and describes the sequencing technology and bioinformatic analyses employed in generating these sequences. We further explore ways to improve the quality of sequencing data, such as using DNA oligos for capture hybridization and using longer insert sizes and read lengths in sequencing runs. These advances will enable large-scale genomic sequencing of EBV, which will facilitate a better understanding of the genetic variation of EBV in different geographic regions and the discovery of potentially pathogenic variants in specific diseases.
Draft genome of the gayal, Bos frontalis

PubMed Central

Wang, Ming-Shan; Zeng, Yan; Wang, Xiao; Nie, Wen-Hui; Wang, Jin-Huan; Su, Wei-Ting; Xiong, Zi-Jun; Wang, Sheng; Qu, Kai-Xing; Yan, Shou-Qing; Yang, Min-Min; Wang, Wen; Dong, Yang; Zhang, Ya-Ping

2017-01-01

Gayal (Bos frontalis), also known as mithan or mithun, is a large endangered semi-domesticated bovine that has a limited geographical distribution in the hill-forests of China, Northeast India, Bangladesh, Myanmar, and Bhutan. Many questions about the gayal such as its origin, population history, and genetic basis of local adaptation remain largely unresolved. De novo sequencing and assembly of the whole gayal genome provides an opportunity to address these issues. We report a high-depth sequencing, de novo assembly, and annotation of a female Chinese gayal genome. Based on the Illumina genomic sequencing platform, we have generated 350.38 Gb of raw data from 16 different insert-size libraries. A total of 276.86 Gb of clean data is retained after quality control. The assembled genome is about 2.85 Gb with scaffold and contig N50 sizes of 2.74 Mb and 14.41 kb, respectively. Repetitive elements account for 48.13% of the genome. Gene annotation has yielded 26 667 protein-coding genes, of which 97.18% have been functionally annotated. BUSCO assessment shows that our assembly captures 93% (3183 of 4104) of the core eukaryotic genes and 83.1% of vertebrate universal single-copy orthologs. We provide the first comprehensive de novo genome of the gayal. This genetic resource is integral for investigating the origin of the gayal and performing comparative genomic studies to improve understanding of the speciation and divergence of bovine species. The assembled genome could be used as reference in future population genetic studies of gayal. PMID:29048483
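The scaffold and contig N50 values reported above follow the standard definition: sort sequences from longest to shortest and report the length at which the running total first reaches at least half of all assembled bases. A minimal sketch with invented lengths (the function name is illustrative):

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("empty assembly")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # reached half of the total assembly length
            return length


# Toy contig lengths (invented): total 54, and 10 + 9 + 8 = 27 reaches half
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```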
Between Two Fern Genomes

PubMed Central

2014-01-01

Ferns are the only major lineage of vascular plants not represented by a sequenced nuclear genome. This lack of genome sequence information significantly impedes our ability to understand and reconstruct genome evolution not only in ferns, but across all land plants. Azolla and Ceratopteris are ideal and complementary candidates to be the first ferns to have their nuclear genomes sequenced. They differ dramatically in genome size, life history, and habit, and thus represent the immense diversity of extant ferns. Together, this pair of genomes will facilitate myriad large-scale comparative analyses across ferns and all land plants. Here we review the unique biological characteristics of ferns and describe a number of outstanding questions in plant biology that will benefit from the addition of ferns to the set of taxa with sequenced nuclear genomes. We explain why the fern clade is pivotal for understanding genome evolution across land plants, and we provide a rationale for how knowledge of fern genomes will enable progress in research beyond the ferns themselves. PMID:25324969
Statistical correction of the Winner’s Curse explains replication variability in quantitative trait genome-wide association studies

PubMed Central

Pe’er, Itsik

2017-01-01

Genome-wide association studies (GWAS) have identified hundreds of SNPs responsible for variation in human quantitative traits. However, genome-wide-significant associations often fail to replicate across independent cohorts, despite their apparently strong effects in discovery cohorts. This limited success of replication raises pervasive questions about the utility of the GWAS field. We identify all 332 studies of quantitative traits from the NHGRI-EBI GWAS Database with attempted replication. We find that the majority of studies provide insufficient data to evaluate replication rates. The remaining papers replicate significantly worse than expected (p < 10^-14), even when adjusting for regression-to-the-mean of effect size between discovery and replication cohorts, termed the Winner’s Curse (p < 10^-16). We show this is due in part to misreporting replication cohort size as a maximum number rather than a per-locus one. In 39 studies accurately reporting per-locus cohort size for attempted replication of 707 loci in samples with similar ancestry, the replication rate matched expectation (predicted 458, observed 457, p = 0.94). In contrast, ancestry differences between replication and discovery (13 studies, 385 loci) cause the most highly powered decile of loci to replicate worse than expected, due to differences in linkage disequilibrium. PMID:28715421
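A "predicted" replication count of this kind can be thought of as the sum, over loci, of each locus's replication power after correcting the discovery effect for the Winner's Curse. The sketch below assumes one standard correction, a conditional maximum-likelihood estimate of the z-score given that the locus passed the discovery threshold, and assumes the expected replication z scales with the square root of the replication-to-discovery sample-size ratio. It is not necessarily the paper's exact procedure, and the function names and the three loci in the example are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def corrected_z(z_obs, z_threshold, step=0.001):
    """Winner's-Curse-corrected z by conditional maximum likelihood.

    Maximises log phi(z_obs - mu) - log P(|Z| > z_threshold | mu) over a grid of mu,
    i.e. the likelihood of z_obs given that the locus passed the discovery threshold.
    """
    n_steps = int(abs(z_obs) / step)
    best_mu, best_ll = 0.0, -math.inf
    for k in range(-n_steps, n_steps + 1):
        mu = k * step
        selection_prob = STD.cdf(mu - z_threshold) + STD.cdf(-mu - z_threshold)
        ll = -0.5 * (z_obs - mu) ** 2 - math.log(selection_prob)
        if ll > best_ll:
            best_mu, best_ll = mu, ll
    return best_mu


def expected_replications(loci, z_threshold, alpha_rep=0.05):
    """Expected number of replicating loci = sum of per-locus replication power.

    loci: iterable of (z_discovery, n_discovery, n_replication)
    Replication is a one-sided test at alpha_rep in the discovery direction.
    """
    z_crit = STD.inv_cdf(1 - alpha_rep)
    total = 0.0
    for z_disc, n_disc, n_rep in loci:
        mu_rep = abs(corrected_z(z_disc, z_threshold)) * math.sqrt(n_rep / n_disc)
        total += 1 - STD.cdf(z_crit - mu_rep)
    return total


z_gw = STD.inv_cdf(1 - 5e-8 / 2)  # two-sided genome-wide threshold, about 5.45
loci = [(6.0, 10_000, 5_000), (8.5, 20_000, 20_000), (-5.6, 8_000, 8_000)]
print(round(corrected_z(6.0, z_gw), 2), round(expected_replications(loci, z_gw), 2))
```

The example also shows why per-locus replication sample sizes matter: plugging in a single maximum cohort size for every locus inflates `mu_rep` and therefore the predicted count.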
Genome-Wide Analysis Reveals Novel Regulators of Growth in Drosophila melanogaster

PubMed Central

Vonesch, Sibylle Chantal; Lamparter, David; Mackay, Trudy F. C.; Bergmann, Sven; Hafen, Ernst

2016-01-01

Organismal size depends on the interplay between genetic and environmental factors. Genome-wide association (GWA) analyses in humans have implicated many genes in the control of height but suffer from the inability to control the environment. Genetic analyses in Drosophila have identified conserved signaling pathways controlling size; however, how these pathways control phenotypic diversity is unclear. We performed GWA of size traits using the Drosophila Genetic Reference Panel of inbred, sequenced lines. We find that the top associated variants differ between traits and sexes; do not map to canonical growth pathway genes, but can be linked to these by epistasis analysis; and are enriched for genes and putative enhancers. Performing GWA on well-studied developmental traits under controlled conditions expands our understanding of developmental processes underlying phenotypic diversity. PMID:26751788
Assessing pooled BAC and whole genome shotgun strategies for assembly of complex genomes

PubMed Central

2011-01-01

Background We investigate whether pooling BAC clones and sequencing the pools can provide more accurate assembly of genome sequences than the "whole genome shotgun" (WGS) approach. Furthermore, we quantify this accuracy increase. We compare the pooled BAC and WGS approaches using in silico simulations. Standard measures of assembly quality focus on assembly size and fragmentation, properties that matter most for large whole genome assemblies. We propose additional measures enabling easy and visual comparison of assembly quality, such as rearrangements and redundant sequence content, relative to the known target sequence. Results The best assembly quality scores were obtained using 454 coverage of 15× linear and 5× paired (3 kb insert size) reads (15L-5P) on Arabidopsis. This regime gave similarly good results on four additional plant genomes of very different GC and repeat contents. BAC pooling improved assembly scores over WGS assembly, with coverage and redundancy scores improving the most. Conclusions BAC pooling works better than WGS; however, both require a physical map to order the scaffolds. Pool sizes up to 12 Mbp work well, suggesting that this pooling density is effective in medium-scale re-sequencing applications such as targeted sequencing of QTL intervals for candidate gene discovery. Assuming the current Roche/454 Titanium sequencing limitations, a 12 Mbp region could be re-sequenced with a full plate of linear reads and a half plate of paired-end reads, yielding 15L-5P coverage after read pre-processing. Our simulation suggests that massive over-sequencing may not improve accuracy. Our scoring measures can be used generally to evaluate and compare results of simulated genome assemblies. PMID:21496274
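The 15L-5P regime translates directly into sequence volume: mean coverage times region length. Under the idealised Lander-Waterman model (reads placed uniformly at random), the expected fraction of bases covered at least once is 1 - e^(-c), which is already essentially 1 at a combined 20x; that is one way to see why further over-sequencing adds little, although real coverage is uneven because of GC and repeat content. The function names are illustrative; only the 12 Mbp region and the 15x/5x coverages come from the abstract.

```python
import math


def required_bases(region_bp, coverage):
    """Bases (after read pre-processing) needed to reach a given mean coverage."""
    return region_bp * coverage


def lander_waterman_fraction_covered(coverage):
    """Expected fraction of bases covered at least once: 1 - exp(-c)."""
    return 1.0 - math.exp(-coverage)


region = 12_000_000                   # 12 Mbp pool, as in the abstract
linear = required_bases(region, 15)   # 15x linear reads
paired = required_bases(region, 5)    # 5x paired-end reads
frac = lander_waterman_fraction_covered(15 + 5)
print(linear / 1e6, paired / 1e6, frac)
```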

Centromeric retrotransposon lineages predate the maize/rice divergence and differ in abundance and activity.

PubMed

Sharma, Anupma; Presting, Gernot G

2008-02-01

Centromeric retrotransposons (CR) are located almost exclusively at the centromeres of plant chromosomes. Analysis of the emerging Zea mays inbred B73 genome sequence revealed two novel subfamilies of CR elements of maize (CRM), bringing the total number of known CRM subfamilies to four. Orthologous subfamilies of each of these CRM subfamilies were discovered in the rice lineage, and the orthologous relationships were demonstrated with extensive phylogenetic analyses. The much higher number of CRs in maize versus Oryza sativa is due primarily to the recent expansion of the CRM1 subfamily in maize. At least one incomplete copy of a CRM1 homolog was found in O. sativa ssp. indica and O. officinalis, but no member of this subfamily could be detected in the finished O. sativa ssp. japonica genome, implying loss of this prolific subfamily in that subspecies. CRM2 and CRM3, as well as the corresponding rice subfamilies, have been recently active but are present in low numbers. CRM3 is a full-length element related to the non-autonomous CentA, which is the first described CRM. The oldest subfamily (CRM4), as well as its rice counterpart, appears to contain only inactive members that are not located in currently active centromeres. The abundance of active CR elements is correlated with chromosome size in the three plant genomes for which high quality genomic sequence is available, and the emerging picture of CR elements is one in which different subfamilies are active at different evolutionary times. We propose a model by which CR elements might influence chromosome and genome size.
Genome-Wide Locations of Potential Epimutations Associated with Environmentally Induced Epigenetic Transgenerational Inheritance of Disease Using a Sequential Machine Learning Prediction Approach.

PubMed

Haque, M Muksitul; Holder, Lawrence B; Skinner, Michael K

2015-01-01

Environmentally induced epigenetic transgenerational inheritance of disease and phenotypic variation involves germline-transmitted epimutations. The primary epimutations identified involve altered differential DNA methylation regions (DMRs). Different environmental toxicants have been shown to promote exposure-specific (i.e., toxicant-specific) signatures of germline epimutations. Analysis of genomic features associated with these epimutations identified low-density CpG regions (<3 CpG / 100 bp), termed CpG deserts, and a number of unique DNA sequence motifs. The rat genome was annotated for these and additional relevant features. The objective of the current study was to use a machine learning computational approach to predict all potential epimutations in the genome. A number of previously identified sperm epimutations were used as training sets. A novel machine learning approach using a sequential combination of Active Learning and Imbalance Class Learner analysis was developed. The transgenerational sperm epimutation analysis identified approximately 50K individual sites with a 1 kb mean size and 3,233 regions that had a minimum of three adjacent sites with a mean size of 3.5 kb. A small set of the most relevant genomic features was identified, with low-density CpG deserts being a critical feature among those selected. A similar independent analysis with transgenerational somatic cell epimutation training sets identified a smaller set of 1,503 genome-wide predicted regions and differences in genomic feature contributions. The predicted genome-wide germline (sperm) epimutations were found to be distinct from the predicted somatic cell epimutations. Validation of the genome-wide germline predicted sites used two recently identified transgenerational sperm epimutation signature sets from the F3 generation of the dichlorodiphenyltrichloroethane (DDT) and methoxychlor (MXC) pesticide exposure lineages. Analysis of this positive validation data set showed a 100% prediction accuracy for all the DDT-MXC sperm epimutations. These observations further elucidate the genomic features associated with transgenerational germline epimutations and identify a genome-wide set of potential epimutations that can be used to facilitate identification of epigenetic diagnostics for ancestral environmental exposures and disease susceptibility.
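The CpG-desert criterion quoted above (fewer than 3 CpG dinucleotides per 100 bp) can be checked with a simple window scan. The sketch below assumes fixed, non-overlapping 100 bp windows and ignores CpGs that straddle a window boundary; the study's exact windowing and region-merging rules are not given in the abstract, and the function name and toy sequence are hypothetical.

```python
def find_cpg_deserts(sequence, window=100, max_cpg=2):
    """Return (start, end, cpg_count) for full, non-overlapping windows whose
    CpG dinucleotide count is <= max_cpg (i.e. fewer than 3 per 100 bp)."""
    seq = sequence.upper()
    deserts = []
    for start in range(0, len(seq) - window + 1, window):
        # 'CG' cannot overlap itself, so str.count gives the exact CpG count
        count = seq[start:start + window].count("CG")
        if count <= max_cpg:
            deserts.append((start, start + window, count))
    return deserts


# Toy sequence (invented): a CpG-rich first window and a CpG-free second window
toy_seq = "CG" * 10 + "A" * 80 + "AT" * 50
print(find_cpg_deserts(toy_seq))
```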
Small genomes and large seeds: chromosome numbers, genome size and seed mass in diploid Aesculus species (Sapindaceae)

PubMed Central

Krahulcová, Anna; Trávníček, Pavel; Rejmánek, Marcel

2017-01-01

Background and Aims Aesculus L. (horse chestnut, buckeye) is a genus of 12–19 extant woody species native to the temperate Northern Hemisphere. This genus is known for unusually large seeds among angiosperms. While chromosome counts are available for many Aesculus species, only one has had its genome size measured. The aim of this study is to provide more genome size data and analyse the relationship between genome size and seed mass in this genus. Methods Chromosome numbers in root tip cuttings were confirmed for four species and reported for the first time for three additional species. Flow cytometric measurements of 2C nuclear DNA values were conducted on eight species, and mean seed mass values were estimated for the same taxa. Key Results The same chromosome number, 2n = 40, was determined in all investigated taxa. Original measurements of 2C values for seven Aesculus species (eight taxa), added to just one reliable datum for A. hippocastanum, confirmed the notion that the genome size in this genus with relatively large seeds is surprisingly low, ranging from 0.955 pg/2C in A. parviflora to 1.275 pg/2C in A. glabra var. glabra. Conclusions The chromosome number of 2n = 40 now seems conclusively established as the universal 2n number for non-hybrid species in this genus. Aesculus genome sizes are relatively small, not only within its own family, Sapindaceae, but also among woody angiosperms. The genome sizes seem to be distinct and non-overlapping among the four major Aesculus clades. These results provide additional support for the most recent reconstruction of Aesculus phylogeny. The correlation between 2C values and seed masses in the examined Aesculus species is slightly negative and not significant. However, when the four major clades are treated separately, there is a consistent positive association between larger genome size and larger seed mass within individual lineages. PMID:28065925
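To express the 2C values in base pairs, a widely used conversion factor is 1 pg of DNA ≈ 978 Mbp (Doležel et al., 2003). Assuming that factor, and halving 2C to obtain the 1C value of these diploids, the reported range corresponds to roughly 467 to 623 Mbp per haploid genome. The function name below is illustrative.

```python
PG_TO_MBP = 978.0  # 1 pg DNA ~ 978 Mbp (Dolezel et al. 2003)


def one_c_mbp_from_two_c_pg(two_c_pg):
    """Convert a diploid's 2C nuclear DNA content (pg) to 1C genome size (Mbp)."""
    return two_c_pg / 2 * PG_TO_MBP


for taxon, two_c in [("A. parviflora", 0.955), ("A. glabra var. glabra", 1.275)]:
    print(taxon, round(one_c_mbp_from_two_c_pg(two_c), 1))
```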
Genome survey sequencing of red swamp crayfish Procambarus clarkii.

PubMed

Shi, Linlin; Yi, Shaokui; Li, Yanhe

2018-06-21

Red swamp crayfish, Procambarus clarkii, is presently an important commercial aquatic species in China. The crayfish is a focus of active research, and its genetic improvement is urgently needed for crayfish aquaculture in China. However, knowledge of its genomic landscape is limited. In this study, the P. clarkii genome was surveyed using Illumina's Solexa sequencing platform, and its genome size was also estimated by flow cytometry. Interestingly, the estimated genome size was about 8.50 Gb by flow cytometry but only 1.86 Gb by genome survey sequencing. Based on the assembled genome sequences, a total of 136,962 genes and 152,268 exons were predicted, with predicted genes ranging from 150 to 12,807 bp in length. The survey sequences could help accelerate gene discovery for studies of genetic diversity and evolution, even though they could not be successfully applied to estimating the P. clarkii genome size.
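The abstract does not state how the survey data were used to estimate genome size. A common approach for survey sequencing, assumed here, is k-mer frequency analysis, in which genome size ≈ (total number of k-mers) / (depth of the main k-mer peak). The sketch below illustrates only that formula on toy reads; the helper names and the `min_depth` error filter are hypothetical, and it ignores reverse complements, heterozygosity and repeats, which dedicated k-mer counters and profilers (for example Jellyfish and GenomeScope) handle.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer (forward strand only) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_genome_size(reads, k, min_depth=2):
    """Genome size ~ total k-mers / depth of the main k-mer peak.

    K-mers seen fewer than `min_depth` times are discarded, since in real
    data they are dominated by sequencing errors (hypothetical threshold).
    """
    kept = [c for c in kmer_counts(reads, k).values() if c >= min_depth]
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    depth_histogram = Counter(kept)  # depth -> number of distinct k-mers
    peak_depth = max(depth_histogram, key=depth_histogram.get)
    return sum(kept) / peak_depth


toy_genome = "AACCGGTTACGTAGCTAGGC"  # 20 bp; all 16 of its 5-mers are distinct
reads = [toy_genome] * 3  # idealized 3x coverage with no sequencing errors
print(estimate_genome_size(reads, k=5))
```

On this toy input the estimate is 16.0, the number of k-mer positions (L − k + 1), which is essentially L for real genomes where L is far larger than k.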
Transmission of human mtDNA heteroplasmy in the Genome of the Netherlands families: support for a variable-size bottleneck

PubMed Central

Li, Mingkun; Rothwell, Rebecca; Vermaat, Martijn; Wachsmuth, Manja; Schröder, Roland; Laros, Jeroen F.J.; van Oven, Mannis; de Bakker, Paul I.W.; Bovenberg, Jasper A.; van Duijn, Cornelia M.; van Ommen, Gert-Jan B.; Slagboom, P. Eline; Swertz, Morris A.; Wijmenga, Cisca; Kayser, Manfred; Boomsma, Dorret I.; Zöllner, Sebastian; de Knijff, Peter; Stoneking, Mark

2016-01-01

Although previous studies have documented a bottleneck in the transmission of mtDNA genomes from mothers to offspring, several aspects remain unclear, including the size and nature of the bottleneck. Here, we analyze the dynamics of mtDNA heteroplasmy transmission in the Genome of the Netherlands (GoNL) data, which consist of complete mtDNA genome sequences from 228 trios, eight dizygotic (DZ) twin quartets, and 10 monozygotic (MZ) twin quartets. Using a minor allele frequency (MAF) threshold of 2%, we identified 189 heteroplasmies in the trio mothers, of which 59% were transmitted to offspring, and 159 heteroplasmies in the trio offspring, of which 70% were inherited from the mothers. MZ twin pairs exhibited greater similarity in MAF at heteroplasmic sites than DZ twin pairs, suggesting that the heteroplasmy MAF in the oocyte is the major determinant of the heteroplasmy MAF in the offspring. We used a likelihood method to estimate the effective number of mtDNA genomes transmitted to offspring under different bottleneck models; a variable bottleneck size model provided the best fit to the data, with an estimated mean of nine individual mtDNA genomes transmitted. We also found evidence for negative selection against novel heteroplasmies (in which the minor allele has never been observed in polymorphism data) during transmission. These novel heteroplasmies are enriched in tRNA and rRNA genes, and mutations associated with mtDNA diseases frequently occur in these genes. Our results thus suggest that the female germ line is able to recognize and select against deleterious heteroplasmies. PMID:26916109
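To illustrate why the number of transmitted genomes matters, the sketch below uses a deliberately simplified fixed-size bottleneck in which n mtDNA genomes are drawn binomially from a mother with minor allele frequency p, so the offspring MAF has variance p(1 − p)/n. This is an assumption for illustration, not the variable-size likelihood model the authors fitted; n = 9 simply echoes their estimated mean, and the function names are hypothetical.

```python
import random
import statistics


def offspring_maf_variance(p, n):
    """Expected variance of offspring MAF under binomial sampling of n genomes."""
    return p * (1.0 - p) / n


def simulate_offspring_maf(p, n, trials, seed=0):
    """Draw n mtDNA genomes per offspring; each carries the minor allele with prob. p."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]


mother_maf = 0.10
n_transmitted = 9  # echoes the reported mean; a fixed size is an assumption here
sims = simulate_offspring_maf(mother_maf, n_transmitted, trials=20000)
print("theory:", offspring_maf_variance(mother_maf, n_transmitted))
print("simulated:", statistics.pvariance(sims))
```

In this simplified model, halving n doubles the expected variance, so narrower bottlenecks produce larger MAF shifts between mother and offspring.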
Different Evolutionary Paths to Complexity for Small and Large Populations of Digital Organisms

PubMed Central

2016-01-01

A major aim of evolutionary biology is to explain the respective roles of adaptive versus non-adaptive changes in the evolution of complexity. While selection is certainly responsible for the spread and maintenance of complex phenotypes, this does not automatically imply that strong selection enhances the chance for the emergence of novel traits, that is, the origination of complexity. Population size is one parameter that alters the relative importance of adaptive and non-adaptive processes: as population size decreases, selection weakens and genetic drift grows in importance. Because of this relationship, many theories invoke a role for population size in the evolution of complexity. Such theories are difficult to test empirically because of the time required for the evolution of complexity in biological populations. Here, we used digital experimental evolution to test whether large or small asexual populations tend to evolve greater complexity. We find that both small and large—but not intermediate-sized—populations are favored to evolve larger genomes, which provides the opportunity for subsequent increases in phenotypic complexity. However, small and large populations followed different evolutionary paths towards these novel traits. Small populations evolved larger genomes by fixing slightly deleterious insertions, while large populations fixed rare beneficial insertions that increased genome size. These results demonstrate that genetic drift can lead to the evolution of complexity in small populations and that purifying selection is not powerful enough to prevent the evolution of complexity in large populations. PMID:27923053
Chloroplast genome expansion by intron multiplication in the basal psychrophilic euglenoid Eutreptiella pomquetensis

PubMed Central

Bennett, Matthew S.; Triemer, Richard E.; Preisfeld, Angelika

2017-01-01

Background Over the last few years, multiple studies have revealed great diversity in the size of chloroplast genomes (cpGenomes), and in the arrangement of gene clusters, in the Euglenales. However, while these genomes provided important insights into the evolution of cpGenomes across the Euglenales and within their genera, only two genomes were analyzed with regard to genomic variability between and within Euglenales and Eutreptiales. To better understand the dynamics of chloroplast genome evolution in early evolving Eutreptiales, this study focused on the cpGenome of Eutreptiella pomquetensis and on the spread and peculiarities of its introns. Methods The Etl. pomquetensis cpGenome was sequenced, annotated and then examined for structure, size, gene order and intron content. These features were compared with other euglenoid cpGenomes as well as those of prasinophyte green algae, including Pyramimonas parkeae. Results and Discussion At 130,561 bp, the chloroplast genome of Etl. pomquetensis, a basal taxon in the phototrophic euglenoids, was considerably larger than the two other Eutreptiales cpGenomes sequenced so far. Although the detected quadripartite structure resembled most green algae and plant chloroplast genomes, the gene content of the single copy regions in Etl. pomquetensis was completely different from those observed in green algae and plants. The gene composition of Etl. pomquetensis was extensively changed and turned out to be almost identical to those of other Eutreptiales and Euglenales, and not to that of P. parkeae. Furthermore, the cpGenome of Etl. pomquetensis was unexpectedly permeated by a high number of introns, which led to a substantially larger genome. The 51 identified introns of Etl. pomquetensis showed two major unique features: (i) more than half of the introns displayed a high level of pairwise identities; (ii) no group III introns could be identified in the protein coding genes. These findings support the hypothesis that group III introns are degenerated group II introns and evolved later. PMID:28852596
Concerted evolution of body mass and cell size: similar patterns among species of birds (Galliformes) and mammals (Rodentia)

PubMed Central

Dragosz-Kluska, Dominika; Pis, Tomasz; Pawlik, Katarzyna; Kapustka, Filip; Kilarski, Wincenty M.; Kozłowski, Jan

2018-01-01

Cell size plays a role in body size evolution and environmental adaptations. Addressing these roles, we studied body mass and cell size in Galliformes birds and Rodentia mammals, and collected published data on their genome sizes. In birds, we measured erythrocyte nuclei and basal metabolic rates (BMRs). In birds and mammals, larger species consistently evolved larger cells for five cell types (erythrocytes, enterocytes, chondrocytes, skin epithelial cells, and kidney proximal tubule cells) and evolved smaller hepatocytes. We found no evidence that cell size differences originated through genome size changes. We conclude that the organism-wide coordination of cell size changes might be an evolutionarily conservative characteristic, and the convergent evolutionary body size and cell size changes in Galliformes and Rodentia suggest the adaptive significance of cell size. Recent theory predicts that species evolving larger cells waste less energy on tissue maintenance but have reduced capacities to deliver oxygen to mitochondria and metabolize resources. Indeed, birds with larger cells of the abovementioned types and smaller hepatocytes have evolved lower mass-specific BMRs. We propose that the inconsistent pattern in hepatocytes derives from the efficient delivery system to hepatocytes, combined with their intense involvement in supracellular function and anabolic activity. PMID:29540429
The Neandertal genome and ancient DNA authenticity

PubMed Central

Green, Richard E; Briggs, Adrian W; Krause, Johannes; Prüfer, Kay; Burbano, Hernán A; Siebauer, Michael; Lachmann, Michael; Pääbo, Svante

2009-01-01

Recent advances in high-throughput DNA sequencing have made genome-scale analyses of extinct organisms possible. With these new opportunities come new difficulties in assessing the authenticity of the DNA sequences retrieved. We discuss how these difficulties can be addressed, particularly with regard to analyses of the Neandertal genome. We argue that only direct assays of DNA sequence positions in which Neandertals differ from all contemporary humans can serve as a reliable means to estimate human contamination. Indirect measures, such as the extent of DNA fragmentation, nucleotide misincorporations, or comparison of derived allele frequencies in different fragment size classes, are unreliable. Fortunately, interim approaches based on mtDNA differences between Neandertals and current humans, detection of male contamination through Y chromosomal sequences, and repeated sequencing from the same fossil to detect autosomal contamination allow initial large-scale sequencing of Neandertal genomes. This will result in the discovery of fixed differences in the nuclear genome between Neandertals and current humans that can serve as future direct assays for contamination. For analyses of other fossil hominins, which may become possible in the future, we suggest a similar ‘boot-strap’ approach in which interim approaches are applied until sufficient data for more definitive direct assays are acquired. PMID:19661919
Rates of spontaneous mutation.

PubMed Central

Drake, J W; Charlesworth, B; Charlesworth, D; Crow, J F

1998-01-01

Rates of spontaneous mutation per genome as measured in the laboratory are remarkably similar within broad groups of organisms but differ strikingly among groups. Mutation rates in RNA viruses, whose genomes contain ca. 10⁴ bases, are roughly 1 per genome per replication for lytic viruses and roughly 0.1 per genome per replication for retroviruses and a retrotransposon. Mutation rates in microbes with DNA-based chromosomes are close to 1/300 per genome per replication; in this group, therefore, rates per base pair vary inversely and hugely as genome sizes vary from 6 × 10³ to 4 × 10⁷ bases or base pairs. Mutation rates in higher eukaryotes are roughly 0.1-100 per genome per sexual generation but are currently indistinguishable from 1/300 per cell division per effective genome (which excludes the fraction of the genome in which most mutations are neutral). It is now possible to specify some of the evolutionary forces that shape these diverse mutation rates. PMID:9560386
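The inverse relationship follows from simple division: if the per-genome rate μ_g stays near 1/300 per replication, the per-base-pair rate for a genome of G base pairs is μ_b = μ_g / G. A minimal sketch, using only the figures quoted in the abstract (the constant and function names are illustrative):

```python
PER_GENOME_RATE = 1 / 300  # approximate mutations per genome per replication (DNA microbes)


def per_base_rate(genome_size_bp):
    """Per-base-pair rate implied by a constant per-genome rate."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return PER_GENOME_RATE / genome_size_bp


for size in (6e3, 4e7):  # range of genome sizes quoted in the abstract
    print(f"G = {size:.0e} bp -> {per_base_rate(size):.2e} per bp per replication")

fold = per_base_rate(6e3) / per_base_rate(4e7)
print(f"fold difference: {fold:,.0f}")
```

The fold difference in per-base-pair rates is simply the ratio of the genome sizes, (4 × 10⁷)/(6 × 10³) ≈ 6,700, which is what "vary inversely and hugely" amounts to here.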
De novo comparative transcriptome analysis of genes involved in fruit morphology of pumpkin cultivars with extreme size difference and development of EST-SSR markers.

PubMed

Xanthopoulou, Aliki; Ganopoulos, Ioannis; Psomopoulos, Fotis; Manioudaki, Maria; Moysiadis, Theodoros; Kapazoglou, Aliki; Osathanunkul, Maslin; Michailidou, Sofia; Kalivas, Apostolos; Tsaftaris, Athanasios; Nianiou-Obeidat, Irini; Madesis, Panagiotis

2017-07-30

The genetic basis of fruit size and shape was investigated for the first time in Cucurbita species, and genetic loci associated with fruit morphology were identified. Although extensive genomic resources are available at present for tomato (Solanum lycopersicum), cucumber (Cucumis sativus), melon (Cucumis melo) and watermelon (Citrullus lanatus), genomic databases for Cucurbita species are limited. Recently, our group reported the generation of pumpkin (Cucurbita pepo) transcriptome databases from two contrasting cultivars with extreme fruit sizes. In the current study we used these databases to perform comparative transcriptome analysis in order to identify genes with potential roles in fruit morphology and fruit size. Differential Gene Expression (DGE) analysis between cv. 'Munchkin' (small-fruit) and cv. 'Big Moose' (large-fruit) revealed a variety of candidate genes associated with fruit morphology with significant differences in gene expression between the two cultivars. In addition, we have set the framework for generating EST-SSR markers, which discriminate different C. pepo cultivars and show transferability to related Cucurbitaceae species. The results of the present study will contribute both to a better understanding of the molecular mechanisms regulating fruit morphology and to identifying the factors that determine fruit size. Moreover, they may lead to the development of molecular marker tools for selecting genotypes with desired morphological traits. Copyright © 2017. Published by Elsevier B.V.
A genomic view of 500 million years of cnidarian evolution.

PubMed

Steele, Robert E; David, Charles N; Technau, Ulrich

2011-01-01

Cnidarians (corals, anemones, jellyfish and hydras) are a diverse group of animals of interest to evolutionary biologists, ecologists and developmental biologists. With the publication of the genome sequences of Hydra and Nematostella, whose last common ancestor was the stem cnidarian, researchers are beginning to see the genomic underpinnings of cnidarian biology. Cnidarians are known for the remarkable plasticity of their morphology and life cycles. This plasticity is reflected in the Hydra and Nematostella genomes, which differ to an exceptional degree in size, base composition, transposable element content and gene conservation. It is now known what cnidarian genomes, given 500 million years, are capable of; as we discuss here, the next challenge is to understand how this genomic history has led to the striking diversity seen in this group. Copyright © 2010 Elsevier Ltd. All rights reserved.
Genome-Wide Association Analyses Highlight the Potential for Different Genetic Mechanisms for Litter Size Among Sheep Breeds

PubMed Central

Xu, Song-Song; Gao, Lei; Xie, Xing-Long; Ren, Yan-Ling; Shen, Zhi-Qiang; Wang, Feng; Shen, Min; Eyþórsdóttir, Emma; Hallsson, Jón H.; Kiseleva, Tatyana; Kantanen, Juha; Li, Meng-Hua

2018-01-01

Reproduction is an important trait in sheep breeding as well as in other livestock. However, despite its importance, the genetic mechanisms of litter size in domestic sheep (Ovis aries) are still poorly understood. To explore the genetic mechanisms underlying variation in litter size, we conducted independent genome-wide association studies in each of six breeds, five of high prolificacy (Wadi, Hu, Icelandic, Finnsheep, and Romanov) and one of low prolificacy (Texel), using the Ovine Infinium HD BeadChip. We identified different sets of candidate genes associated with litter size in different breeds: BMPR1B, FBN1, and MMP2 in Wadi; GRIA2, SMAD1, and CTNNB1 in Hu; NCOA1 in Icelandic; INHBB, NF1, FLT1, PTGS2, and PLCB3 in Finnsheep; ESR2 in Romanov; and ESR1, GHR, ETS1, MMP15, FLI1, and SPP1 in Texel. Further annotation of genes and bioinformatics analyses revealed that different biological pathways could be involved in the variation in litter size of females: hormone secretion (FSH and LH) in Wadi and Hu, placenta and embryonic lethality in Icelandic, folliculogenesis and LH signaling in Finnsheep, ovulation and preovulatory follicle maturation in Romanov, and estrogen and follicular growth in Texel. Taken together, our results provide new insights into the genetic mechanisms underlying the prolificacy trait in sheep and other mammals, suggesting targets for selection where the aim is to increase prolificacy in breeding projects. PMID:29692799
Genome size variation in the pine fusiform rust pathogen Cronartium quercuum f.sp. fusiforme as determined by flow cytometry

Treesearch

Claire L Anderson; Thomas L Kubisiak; C Dana Nelson; Jason A Smith; John M Davis

2010-01-01

The genome size of the pine fusiform rust pathogen Cronartium quercuum f.sp. fusiforme (Cqf) was determined by flow cytometric analysis of propidium iodide-stained, intact haploid pycniospores with haploid spores of two genetically well characterized fungal species, Sclerotinia sclerotiorum and Puccinia graminis f.sp. tritici, as size standards. The Cqf haploid genome...
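Flow cytometric genome size estimates against a reference standard conventionally rest on the assumption that fluorescence is linearly proportional to DNA content, so that sample size = standard size × (sample peak position / standard peak position). A minimal sketch of that ratio follows; the peak positions and standard size in the usage line are hypothetical placeholders, not values from this study. With two standards, as here, one could compute the estimate against each and compare them.

```python
def genome_size_from_ratio(sample_peak, standard_peak, standard_size):
    """Estimate a sample's genome size from its peak position relative to a standard.

    Assumes fluorescence is linearly proportional to DNA content; the result is
    in the same units as `standard_size` (e.g. Mbp).
    """
    if sample_peak <= 0 or standard_peak <= 0:
        raise ValueError("peak positions must be positive")
    return standard_size * sample_peak / standard_peak


# Hypothetical peak channels and standard size, for illustration only.
print(genome_size_from_ratio(sample_peak=150.0, standard_peak=100.0, standard_size=40.0))
```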
Partial Shotgun Sequencing of the Boechera stricta Genome Reveals Extensive Microsynteny and Promoter Conservation with Arabidopsis

PubMed Central

Windsor, Aaron J.; Schranz, M. Eric; Formanová, Nataša; Gebauer-Jung, Steffi; Bishop, John G.; Schnabelrauch, Domenica; Kroymann, Juergen; Mitchell-Olds, Thomas

2006-01-01

Comparative genomics provides insight into the evolutionary dynamics that shape discrete sequences as well as whole genomes. To advance comparative genomics within the Brassicaceae, we have end sequenced 23,136 medium-sized insert clones from Boechera stricta, a wild relative of Arabidopsis (Arabidopsis thaliana). A significant proportion of these sequences, 18,797, are nonredundant and display highly significant similarity (BLASTn e-value ≤ 10⁻³⁰) to low copy number Arabidopsis genomic regions, including more than 9,000 annotated coding sequences. We have used this dataset to identify orthologous gene pairs in the two species and to perform a global comparison of DNA regions 5′ to annotated coding regions. On average, the 500 nucleotides upstream of coding sequences display 71.4% identity between the two species. In a similar analysis, 61.4% identity was observed between 5′ noncoding sequences of Brassica oleracea and Arabidopsis, indicating that regulatory regions are not as diverged among these lineages as previously anticipated. By mapping the B. stricta end sequences onto the Arabidopsis genome, we have identified nearly 2,000 conserved blocks of microsynteny (bracketing 26% of the Arabidopsis genome). A comparison of fully sequenced B. stricta inserts to their homologous Arabidopsis genomic regions indicates that indel polymorphisms >5 kb contribute substantially to the genome size difference observed between the two species. Further, we demonstrate that microsynteny inferred from end-sequence data can be applied to the rapid identification and cloning of genomic regions of interest from nonmodel species. These results suggest that among diploid relatives of Arabidopsis, small- to medium-scale shotgun sequencing approaches can provide rapid and cost-effective benefits to evolutionary and/or functional comparative genomic frameworks. PMID:16607030
Comparative Genomics of the Cucurbitaceae

USDA-ARS's Scientific Manuscript database

The genome sizes of watermelon, melon, cucumber, and pumpkin are 425, 454, 367, and 502 Mbp, respectively, which is considered medium-sized compared with most other crops. Whole-genome duplication is common in angiosperms. Research has revealed a paleohexaploidy (γ) event in the common ancestor of...
Repliscan: a tool for classifying replication timing regions.

PubMed

Zynda, Gregory J; Song, Jawon; Concia, Lorenzo; Wear, Emily E; Hanley-Bowdoin, Linda; Thompson, William F; Vaughn, Matthew W

2017-08-07

Replication timing experiments that use label incorporation and high throughput sequencing produce peaked data similar to ChIP-Seq experiments. However, the differences in experimental design, coverage density, and possible results make traditional ChIP-Seq analysis methods inappropriate for use with replication timing. To accurately detect and classify regions of replication across the genome, we present Repliscan. Repliscan robustly normalizes, automatically removes outlying and uninformative data points, and classifies Repli-seq signals into discrete combinations of replication signatures. The quality control steps and self-fitting methods make Repliscan generally applicable and more robust than previous methods that classify regions based on thresholds. Repliscan is simple and effective to use on organisms with different genome sizes. Even with analysis window sizes as small as 1 kilobase, reliable profiles can be generated with as little as 2.4x coverage.
High-throughput sequencing of complete human mtDNA genomes from the Caucasus and West Asia: high diversity and demographic inferences.

PubMed

Schönberg, Anna; Theunert, Christoph; Li, Mingkun; Stoneking, Mark; Nasidze, Ivan

2011-09-01

To investigate the demographic history of human populations from the Caucasus and surrounding regions, we used high-throughput sequencing to generate 147 complete mtDNA genome sequences from random samples of individuals from three groups from the Caucasus (Armenians, Azeri and Georgians), and one group each from Iran and Turkey. Overall diversity is very high, with 144 different sequences that fall into 97 different haplogroups found among the 147 individuals. Bayesian skyline plots (BSPs) of population size change through time show a population expansion around 40-50 kya, followed by a constant population size, and then another expansion around 15-18 kya for the groups from the Caucasus and Iran. The BSP for Turkey differs the most from the others, with an increase from 35 to 50 kya followed by a prolonged period of constant population size, and no indication of a second period of growth. An approximate Bayesian computation approach was used to estimate divergence times between each pair of populations; the oldest divergence times were between Turkey and the other four groups from the South Caucasus and Iran (~400-600 generations), while the divergence time of the three Caucasus groups from each other was comparable to their divergence time from Iran (average of ~360 generations). These results illustrate the value of random sampling of complete mtDNA genome sequences that can be obtained with high-throughput sequencing platforms.
Genome Sequence of a Bombyx mori Nucleopolyhedrovirus Strain with Cubic Occlusion Bodies

PubMed Central

Cheng, Ruo-Lin; Xu, Yi-Peng

2012-01-01

Bombyx mori nucleopolyhedrovirus (BmNPV) is a typical species of Baculoviridae. The complete genome sequence of a BmNPV strain with cubic occlusion bodies is reported here. The genome of this strain consists of 127,465 nucleotides with a G+C content of 40.36% and is 97.3% and 97.5% identical to those of BmNPV strain T3 and Bombyx mandarina NPV S1, respectively. Despite the abnormal polyhedra it forms, the polyhedrin gene of the BmNPV cubic strain is 100% identical to those of the other two strains. Baculovirus repeated ORFs and homologous repeat regions cause the major differences in genome size of these BmNPV isolates. PMID:22923803
Genome expansion and gene loss in powdery mildew fungi reveal functional tradeoffs in extreme parasitism

USDA-ARS's Scientific Manuscript database

Eukaryotic genomes vary in size over five orders of magnitude ranging from microsporidia (~2.9Mb) to the lung-fish (~1.2Tb). This extraordinary variation is largely a result of the proliferation of mobile DNA elements also referred to as “genomic parasites.” The constraints on genome size may be imp...

The mitochondrial genome of the legume Vigna radiata and the analysis of recombination across short mitochondrial repeats.

PubMed

Alverson, Andrew J; Zhuo, Shi; Rice, Danny W; Sloan, Daniel B; Palmer, Jeffrey D

2011-01-20

The mitochondrial genomes of seed plants are exceptionally fluid in size, structure, and sequence content, with the accumulation and activity of repetitive sequences underlying much of this variation. We report the first fully sequenced mitochondrial genome of a legume, Vigna radiata (mung bean), and show that despite its unexceptional size (401,262 nt), the genome is unusually depauperate in repetitive DNA and "promiscuous" sequences from the chloroplast and nuclear genomes. Although Vigna lacks the large, recombinationally active repeats typical of most other seed plants, a PCR survey of its modest repertoire of short (38-297 nt) repeats nevertheless revealed evidence for recombination across all of them. A set of novel control assays showed, however, that these results could instead reflect, in part or entirely, artifacts of PCR-mediated recombination. Consequently, we recommend that other methods, especially high-depth genome sequencing, be used instead of PCR to infer patterns of plant mitochondrial recombination. The average-sized but repeat- and feature-poor mitochondrial genome of Vigna makes it ever more difficult to generalize about the factors shaping the size and sequence content of plant mitochondrial genomes.
Comparative genomic data of the Avian Phylogenomics Project.

PubMed

Zhang, Guojie; Li, Bo; Li, Cai; Gilbert, M Thomas P; Jarvis, Erich D; Wang, Jun

2014-01-01

The evolutionary relationships of modern birds are among the most challenging to understand in systematic biology and have been debated for centuries. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders, and used the genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomics analyses (Jarvis et al. in press; Zhang et al. in press). Here we release assemblies and datasets associated with the comparative genome analyses, which include 38 newly sequenced avian genomes plus previously released or simultaneously released genomes of Chicken, Zebra finch, Turkey, Pigeon, Peregrine falcon, Duck, Budgerigar, Adelie penguin, Emperor penguin and the Medium Ground Finch. We hope that this resource will serve future efforts in phylogenomics and comparative genomics. The 38 bird genomes were sequenced using the Illumina HiSeq 2000 platform and assembled using a whole genome shotgun strategy. The 48 genomes were categorized into two groups according to the N50 scaffold size of the assemblies: a high depth group comprising 23 species sequenced at high coverage (>50X) with multiple insert size libraries resulting in N50 scaffold sizes greater than 1 Mb (except the White-throated Tinamou and Bald Eagle); and a low depth group comprising 25 species sequenced at a low coverage (~30X) with two insert size libraries resulting in an average N50 scaffold size of about 50 kb. Repetitive elements comprised 4%-22% of the bird genomes. The assembled scaffolds allowed the homology-based annotation of 13,000 to 17,000 protein coding genes in each avian genome relative to chicken, zebra finch and human, as well as comparative and sequence conservation analyses.
Here we release full genome assemblies of 38 newly sequenced avian species, link genome assembly downloads for 7 of the remaining 10 species, and provide a guideline of genomic data that has been generated and used in our Avian Phylogenomics Project. To the best of our knowledge, the Avian Phylogenomics Project is the biggest vertebrate comparative genomics project to date. The genomic data presented here are expected to accelerate further analyses in many fields, including phylogenetics, comparative genomics, evolution, neurobiology, developmental biology, and other related areas.
The Metamorphosis of Amphibian Toxicogenomics

PubMed Central

Helbing, Caren C.

2012-01-01

Amphibians are important vertebrates in toxicology often representing both aquatic and terrestrial forms within the life history of the same species. Of the thousands of species, only two have substantial genomics resources: the recently published genome of the Pipid, Xenopus (Silurana) tropicalis, and transcript information (and ongoing genome sequencing project) of Xenopus laevis. However, many more species representative of regional ecological niches and life strategies are used in toxicology worldwide. Since Xenopus species diverged from the most populous frog family, the Ranidae, ~200 million years ago, there are notable differences between them and the even more distant Caudates (salamanders) and Caecilians. These differences include genome size, gene composition, and extent of polyploidization. Application of toxicogenomics to amphibians requires the mobilization of resources and expertise to develop de novo sequence assemblies and analysis strategies for a broader range of amphibian species. The present mini-review will present the advances in toxicogenomics as pertains to amphibians with particular emphasis upon the development and use of genomic techniques (inclusive of transcriptomics, proteomics, and metabolomics) and the challenges inherent therein. PMID:22435070
Genome sequencing of ovine isolates of Mycobacterium avium subspecies paratuberculosis offers insights into host association

PubMed Central

2012-01-01

Background The genome of Mycobacterium avium subspecies paratuberculosis (MAP) is remarkably homogeneous among the genomes of bovine, human and wildlife isolates. However, previous work in our laboratories with the bovine K-10 strain has revealed substantial differences compared to sheep isolates. To systematically characterize all genomic differences that may be associated with the specific hosts, we sequenced the genomes of three U.S. sheep isolates and also obtained an optical map. Results Our analysis of one of the isolates, MAP S397, revealed a genome 4.8 Mb in size with 4,700 open reading frames (ORFs). Comparative analysis of the MAP S397 isolate showed it acquired approximately 10 large sequence regions that are shared with the human M. avium subsp. hominissuis strain 104 and lost 2 large regions that are present in the bovine strain. In addition, optical mapping defined the presence of 7 large inversions between the bovine and ovine genomes (~ 2.36 Mb). Whole-genome sequencing of 2 additional sheep strains of MAP (JTC1074 and JTC7565) further confirmed genomic homogeneity of the sheep isolates despite the presence of polymorphisms on the nucleotide level. Conclusions Comparative sequence analysis employed here provided a better understanding of the host association, evolution of members of the M. avium complex and could help in deciphering the phenotypic differences observed among sheep and cattle strains of MAP. A similar approach based on whole-genome sequencing combined with optical mapping could be employed to examine closely related pathogens. We propose an evolutionary scenario for M. avium complex strains based on these genome sequences. PMID:22409516
Deciphering Genome Content and Evolutionary Relationships of Isolates from the Fungus Magnaporthe oryzae Attacking Different Host Plants

PubMed Central

Chiapello, Hélène; Mallet, Ludovic; Guérin, Cyprien; Aguileta, Gabriela; Amselem, Joëlle; Kroj, Thomas; Ortega-Abboud, Enrique; Lebrun, Marc-Henri; Henrissat, Bernard; Gendrault, Annie; Rodolphe, François; Tharreau, Didier; Fournier, Elisabeth

2015-01-01

Deciphering the genetic bases of pathogen adaptation to its host is a key question in ecology and evolution. To understand how the fungus Magnaporthe oryzae adapts to different plants, we sequenced eight M. oryzae isolates differing in host specificity (rice, foxtail millet, wheat, and goosegrass), and one Magnaporthe grisea isolate specific to crabgrass. Analysis of Magnaporthe genomes revealed small variation in genome sizes (39–43 Mb) and gene content (12,283–14,781 genes) between isolates. The whole set of Magnaporthe genes comprised 14,966 shared families, 63% of which included genes present in all the nine M. oryzae genomes. The evolutionary relationships among Magnaporthe isolates were inferred using 6,878 single-copy orthologs. The resulting genealogy was mostly bifurcating among the different host-specific lineages, but was reticulate inside the rice lineage. We detected traces of introgression from a nonrice genome in the rice reference 70-15 genome. Among M. oryzae isolates and host-specific lineages, the genome composition in terms of frequencies of genes putatively involved in pathogenicity (effectors, secondary metabolism, cazome) was conserved. However, 529 shared families were found only in nonrice lineages, whereas the rice lineage possessed 86 specific families absent from the nonrice genomes. Our results confirmed that the host specificity of M. oryzae isolates was associated with a divergence between lineages without major gene flow and that, despite the strong conservation of gene families between lineages, adaptation to different hosts, especially to rice, was associated with the presence of a small number of specific gene families. All information was gathered in a public database (http://genome.jouy.inra.fr/gemo). PMID:26454013
Genome re-assignment of Arachis trinitensis (Sect. Arachis, Leguminosae) and its implications for the genetic origin of cultivated peanut

PubMed Central

2010-01-01

The karyotype structure of Arachis trinitensis was studied by conventional Feulgen staining, CMA/DAPI banding and rDNA loci detection by fluorescence in situ hybridization (FISH) in order to establish its genome status and test the hypothesis that this species is a genome donor of cultivated peanut. Conventional staining revealed that the karyotype lacked the small “A chromosomes” characteristic of the A genome. In agreement with this, chromosomal banding showed that none of the chromosomes had the large centromeric bands expected for A chromosomes. FISH revealed one pair each of 5S and 45S rDNA loci, located on different medium-sized metacentric chromosomes. Collectively, these results suggest that A. trinitensis should be removed from the A genome and be considered a B or non-A genome species. The pattern of heterochromatic bands and rDNA loci of A. trinitensis differs markedly from both complements of A. hypogaea, suggesting that the former species is unlikely to be one of the wild diploid progenitors of the latter. PMID:21637581
Comparative evolution history of SINEs in Arabidopsis thaliana and Brassica oleracea: evidence for a high rate of SINE loss.

PubMed

Lenoir, A; Pélissier, T; Bousquet-Antonelli, C; Deragon, J M

2005-01-01

Brassica oleracea and Arabidopsis thaliana belong to the Brassicaceae (Cruciferae) family and diverged 16 to 19 million years ago. Although the genome size of B. oleracea (approximately 600 million base pairs) is more than four times that of A. thaliana (approximately 130 million base pairs), their gene content is believed to be very similar, with more than 85% sequence identity in coding regions. This large difference in genome size is therefore likely to reflect different rates of non-coding DNA accumulation. Transposable elements (TEs) constitute a major fraction of non-coding DNA in plant species, and a difference in the rate of TE accumulation between two closely related species can produce significant genome size variation within a short evolutionary period. Short interspersed elements (SINEs) are non-autonomous retroposons that have invaded the genomes of most eukaryotic species. Several SINE families are present in B. oleracea and A. thaliana, and we found that two of them (called RathE1 and RathE2) are present in both species. In this study, the tempo of evolution of the RathE1 and RathE2 SINE families was compared between the two species. We observed that most B. oleracea RathE2 SINEs are "young" (close to the consensus sequence) and abundant, whereas elements of this family are more degenerated and much less abundant in A. thaliana. However, the situation is different for the RathE1 SINE family, for which the youngest elements are found in A. thaliana. Surprisingly, no SINE was found to occupy the same (orthologous) genomic locus in both species, suggesting either that these SINE families were not amplified at a significant rate in the common ancestor of the two species or that older elements were lost and only the recent (lineage-specific) insertions remain. To test the latter hypothesis, loci containing a recently inserted SINE in the A. thaliana Col-0 ecotype were selected and characterized in several other A. thaliana ecotypes.
In addition to the expected SINE-containing allele and the pre-integrative allele (i.e. the "empty" allele), we observed in the different ecotypes alleles with truncated portions of the SINE (up to the complete loss of the element) and of the immediately flanking genomic sequences. The absence of SINEs at orthologous positions in B. oleracea and A. thaliana, and the presence in recently diverged A. thaliana ecotypes of alleles containing severely truncated SINEs, suggest a very high rate of SINE loss in these species.
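Calling a SINE copy "young" rests on its divergence from the family consensus. A minimal Python sketch, assuming each element has already been aligned to the consensus as equal-length gapped strings; the simple p-distance, the 10% cutoff and the function names are illustrative assumptions, not values or methods taken from the study.

```python
def divergence(element: str, consensus: str) -> float:
    """Proportion of mismatched sites between an aligned element and its consensus.

    Alignment columns where either sequence has a gap ('-') are skipped,
    giving an uncorrected p-distance.
    """
    if len(element) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for a, b in zip(element.upper(), consensus.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        mismatches += a != b          # True counts as 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return mismatches / compared


def is_young(element: str, consensus: str, cutoff: float = 0.10) -> bool:
    """Hypothetical rule: an element is 'young' if it diverges less than cutoff."""
    return divergence(element, consensus) < cutoff
```

A real analysis would typically correct for multiple substitutions and treat CpG sites separately; the sketch only shows the basic comparison.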
Low diversity, activity, and density of transposable elements in five avian genomes.

PubMed

Gao, Bo; Wang, Saisai; Wang, Yali; Shen, Dan; Xue, Songlei; Chen, Cai; Cui, Hengmi; Song, Chengyi

2017-07-01

In this study, we analysed the activity, diversity, and density of transposable elements (TEs) across five avian genomes (budgerigar, chicken, turkey, medium ground finch, and zebra finch) to explore potential reasons for the small genome sizes of birds. We found that these avian genomes exhibited a low density of TEs (about 10% genome coverage) and low TE diversity, with TE landscapes dominated by CR1 and ERV elements; contrasting proliferation dynamics were observed both between TE types and between species across the five avian genomes. Phylogenetic analysis revealed that the CR1 clade was more diverse in family structure than the R2 clade in birds; avian ERVs were classified into four clades (alpha, beta, gamma, and ERV-L) and belonged to three classes of ERV, unevenly distributed among these lineages. The activities of DNA and SINE TEs were very low throughout the evolutionary history of avian genomes; most LINEs and LTRs were ancient copies whose activity has decreased substantially in recent times, with only LTRs and LINEs in chicken and zebra finch exhibiting weak activity very recently, and very few TEs were intact; however, recent activity may be underestimated in some species because of the sequencing/assembly technologies used. Overall, this study demonstrates low diversity, activity, and density of TEs in the five avian species; highlights the differences in TEs among these lineages; and suggests that the current and recent activity of TEs in avian genomes is very limited, which may be one of the reasons for the small genome sizes of birds.
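TE density expressed as genome coverage is the number of bases covered by at least one TE annotation divided by genome length, so overlapping or nested annotations must be merged before summing. A minimal Python sketch, assuming half-open [start, end) coordinates on a single sequence; the function name and example intervals are hypothetical, and a multi-chromosome assembly would need the same merge applied per sequence.

```python
from typing import Iterable, Tuple


def te_coverage(intervals: Iterable[Tuple[int, int]], genome_length: int) -> float:
    """Fraction of a sequence covered by TE annotations.

    Intervals are half-open [start, end) on one sequence. Overlapping,
    nested or adjacent intervals are merged so no base is counted twice.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)            # extend the current merged block
        else:
            if cur_end is not None:
                covered += cur_end - cur_start     # close the previous block
            cur_start, cur_end = start, end
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length
```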
Whole-proteome phylogeny of large dsDNA viruses and parvoviruses through a composition vector method related to dynamical language model

PubMed Central

2010-01-01

Background The vast sequence divergence among different virus groups has presented a great challenge to alignment-based analysis of virus phylogeny. Because of the problems caused by uncertainty in alignment, existing tools for phylogenetic analysis based on multiple alignment could not be directly applied to whole-genome comparison and phylogenomic studies of viruses. There has been growing interest in alignment-free methods for phylogenetic analysis using complete genome data. Among the alignment-free methods, a dynamical language (DL) method proposed by our group has been successfully applied to the phylogenetic analysis of bacteria and chloroplast genomes. Results In this paper, the DL method is used to analyze the whole-proteome phylogeny of 124 large dsDNA viruses and 30 parvoviruses, two data sets with a large difference in genome size. The trees from our analyses are in good agreement with the latest classification of large dsDNA viruses and parvoviruses by the International Committee on Taxonomy of Viruses (ICTV). Conclusions The present method provides a new way of recovering the phylogeny of large dsDNA viruses and parvoviruses, and also offers some insights into the affiliation of a number of unclassified viruses. In comparison, some alignment-free methods such as the CV Tree method can be used to recover the phylogeny of large dsDNA viruses, but they are not suitable for resolving the phylogeny of parvoviruses, which have much smaller genomes. PMID:20565983
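The abstract does not specify the DL method itself, so the sketch below instead illustrates the simpler composition-vector (CV Tree style) approach it is compared with. Under that approach, each proteome's k-mer frequencies are corrected by a (k-2)-order Markov expectation, and proteomes are compared by the cosine correlation of the resulting vectors. This is a minimal Python sketch of that general technique, not the authors' DL model; the value of k, the function names and the toy sequences are assumptions.

```python
from collections import Counter, defaultdict
from math import sqrt
from typing import Dict, List


def kmer_freqs(seqs: List[str], k: int) -> Dict[str, float]:
    """Relative frequencies of overlapping k-mers, pooled over all proteins."""
    counts: Counter = Counter()
    total = 0
    for s in seqs:
        n = max(len(s) - k + 1, 0)          # number of k-mer windows in this protein
        for i in range(n):
            counts[s[i:i + k]] += 1
        total += n
    return {w: c / total for w, c in counts.items()} if total else {}


def composition_vector(seqs: List[str], k: int = 5) -> Dict[str, float]:
    """(observed - expected) / expected for every k-mer whose expectation is > 0.

    Expected frequency from a (k-2)-order Markov model:
    p0(a1..ak) = p(a1..ak-1) * p(a2..ak) / p(a2..ak-1).
    K-mers with p0 = 0 are omitted, i.e. treated as 0.
    """
    if k < 3:
        raise ValueError("k must be at least 3")
    p_k, p_k1, p_k2 = (kmer_freqs(seqs, j) for j in (k, k - 1, k - 2))
    by_prefix = defaultdict(list)           # (k-2)-mer -> observed (k-1)-mers starting with it
    for v in p_k1:
        by_prefix[v[:-1]].append(v)
    vec = {}
    for u, p_u in p_k1.items():             # u: first k-1 residues of a candidate k-mer
        for v in by_prefix.get(u[1:], []):  # v: last k-1 residues, overlapping u by k-2
            p0 = p_u * p_k1[v] / p_k2[u[1:]]
            w = u + v[-1]
            vec[w] = (p_k.get(w, 0.0) - p0) / p0
    return vec


def cv_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """D = (1 - C) / 2, where C is the cosine correlation of the two vectors."""
    dot = sum(x * b.get(w, 0.0) for w, x in a.items())
    norm = sqrt(sum(x * x for x in a.values())) * sqrt(sum(x * x for x in b.values()))
    if norm == 0:
        raise ValueError("correlation is undefined for an all-zero vector")
    return (1.0 - dot / norm) / 2.0
```

A pairwise distance matrix built with `cv_distance` would then be passed to a distance-based tree method such as neighbour joining.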
A high-quality human reference panel reveals the complexity and distribution of genomic structural variants.

PubMed

Hehir-Kwa, Jayne Y; Marschall, Tobias; Kloosterman, Wigard P; Francioli, Laurent C; Baaijens, Jasmijn A; Dijkstra, Louis J; Abdellaoui, Abdel; Koval, Vyacheslav; Thung, Djie Tjwan; Wardenaar, René; Renkens, Ivo; Coe, Bradley P; Deelen, Patrick; de Ligt, Joep; Lameijer, Eric-Wubbo; van Dijk, Freerk; Hormozdiari, Fereydoun; Uitterlinden, André G; van Duijn, Cornelia M; Eichler, Evan E; de Bakker, Paul I W; Swertz, Morris A; Wijmenga, Cisca; van Ommen, Gert-Jan B; Slagboom, P Eline; Boomsma, Dorret I; Schönhuth, Alexander; Ye, Kai; Guryev, Victor

2016-10-06

Structural variation (SV) represents a major source of differences between individual human genomes and has been linked to disease phenotypes. However, most studies neither provide a global view of the full spectrum of these variants nor integrate them into reference panels of genetic variation. Here, we analyse whole-genome sequencing data of 769 individuals from 250 Dutch families, and provide a haplotype-resolved map of 1.9 million genome variants across 9 different variant classes, including novel forms of complex indels and retrotransposition-mediated insertions of mobile elements and processed RNAs. A large proportion are previously underreported variants sized between 21 and 100 bp. We detect 4 megabases of novel sequence, encoding 11 new transcripts. Finally, we show 191 known, trait-associated SNPs to be in strong linkage disequilibrium with SVs and demonstrate that our panel facilitates accurate imputation of SVs in unrelated individuals.
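Linkage disequilibrium between a SNP and an SV is commonly summarised as r², computed from allele and haplotype frequencies at two biallelic sites. A minimal Python sketch, assuming both variants have been genotyped and phased so that each haplotype carries 0 (reference) or 1 (alternate) at each site; the threshold the study used for "strong" LD is not given in the abstract, and the function name and example haplotypes are hypothetical.

```python
from typing import Sequence


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """r^2 between two biallelic sites from phased haplotypes (0 = ref, 1 = alt).

    D = p_AB - p_A * p_B and r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)).
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("need two non-empty haplotype vectors of equal length")
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d * d / denom
```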
The complete chloroplast genome sequence of the chlorophycean green alga Scenedesmus obliquus reveals a compact gene organization and a biased distribution of genes on the two DNA strands

PubMed Central

de Cambiaire, Jean-Charles; Otis, Christian; Lemieux, Claude; Turmel, Monique

2006-01-01

Background The phylum Chlorophyta contains the majority of the green algae and is divided into four classes. While the basal position of the Prasinophyceae is well established, the divergence order of the Ulvophyceae, Trebouxiophyceae and Chlorophyceae (UTC) remains uncertain. The five complete chloroplast DNA (cpDNA) sequences currently available for representatives of these classes display considerable variability in overall structure, gene content, gene density, intron content and gene order. Among these genomes, that of the chlorophycean green alga Chlamydomonas reinhardtii has retained the fewest ancestral features. Its two single-copy regions, which are separated from one another by the large inverted repeat (IR), have similar rather than unequal sizes, and differ radically in both gene content and gene organization relative to the single-copy regions of prasinophyte and ulvophyte cpDNAs. To gain insights into the various changes that the chloroplast genome underwent during the evolution of chlorophycean green algae, we have sequenced the cpDNA of Scenedesmus obliquus, a member of a distinct chlorophycean lineage. Results The 161,452 bp IR-containing genome of Scenedesmus features single-copy regions of similar sizes, encodes 96 genes, i.e. only two additional genes (infA and rpl12) relative to its Chlamydomonas homologue, and contains seven group I and two group II introns. It is clearly more compact than the four UTC algal cpDNAs that have been examined so far, displays the lowest proportion of short repeats among these algae and shows a stronger bias towards clustering of genes on the same DNA strand than Chlamydomonas cpDNA. Like the latter genome, Scenedesmus cpDNA displays only a few ancestral gene clusters.
The two chlorophycean genomes share 11 gene clusters that are not found in previously sequenced trebouxiophyte and ulvophyte cpDNAs as well as a few genes that have an unusual structure; however, their single-copy regions differ considerably in gene content. Conclusion Our results underscore the remarkable plasticity of the chlorophycean chloroplast genome. Owing to this plasticity, only a sketchy portrait could be drawn for the chloroplast genome of the last common ancestor of Scenedesmus and Chlamydomonas. PMID:16638149
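One simple way to quantify the strand bias in gene clustering mentioned above is the fraction of neighbouring gene pairs encoded on the same strand. The Python sketch below is an illustrative metric under that assumption, not necessarily the statistic the authors used; the function name and strand list are hypothetical, and genes are assumed to be listed in genome order.

```python
from typing import Sequence


def same_strand_fraction(strands: Sequence[str], circular: bool = False) -> float:
    """Fraction of adjacent gene pairs encoded on the same strand ('+' or '-').

    Genes must be listed in genome order. With circular=True, the last and
    first genes are also treated as neighbours, as on a circular cpDNA.
    """
    pairs = list(zip(strands, strands[1:]))
    if circular and len(strands) > 2:
        pairs.append((strands[-1], strands[0]))
    if not pairs:
        raise ValueError("need at least two genes")
    return sum(a == b for a, b in pairs) / len(pairs)
```

Values near 1 indicate long same-strand gene runs; about 0.5 is expected if strands were assigned at random with equal probability.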
Comparative Genomic Analysis of Phylogenetically Closely Related Hydrogenobaculum sp. Isolates from Yellowstone National Park

PubMed Central

Romano, Christine; D'Imperio, Seth; Woyke, Tanja; Mavromatis, Konstantinos; Lasken, Roger; Shock, Everett L.

2013-01-01

We describe the complete genome sequences of four closely related Hydrogenobaculum sp. isolates (≥99.7% 16S rRNA gene identity) that were isolated from the outflow channel of Dragon Spring (DS), Norris Geyser Basin, in Yellowstone National Park (YNP), WY. The genomes range in size from 1,552,607 to 1,552,931 bp, contain 1,667 to 1,676 predicted genes, and are highly syntenic. There are subtle differences among the DS isolates, which as a group differ from Hydrogenobaculum sp. strain Y04AAS1, previously isolated from a geographically distinct YNP geothermal feature. Genes unique to the DS genomes encode arsenite [As(III)] oxidation, NADH-ubiquinone-plastoquinone (complex I), NADH-ubiquinone oxidoreductase chain, a DNA photolyase, and elements of a type II secretion system. Functions unique to strain Y04AAS1 include thiosulfate metabolism, nitrate respiration, and mercury resistance determinants. The DS genomes contain seven CRISPR loci that are almost identical to one another but differ from the single CRISPR locus in strain Y04AAS1. Other differences between the DS and Y04AAS1 genomes are reflected in average nucleotide identity (94.764%) and percentage of conserved DNA (80.552%). Approximately half of the genes unique to Y04AAS1 are predicted to have been acquired via horizontal gene transfer. Fragment recruitment analysis and marker gene searches demonstrated that the DS metagenome was more similar to the DS genomes than to the Y04AAS1 genome, but that the DS community is likely composed of a continuum of Hydrogenobaculum genotypes spanning from the DS genomes described here to a Y04AAS1-like organism, which appears to represent a distinct ecotype relative to the DS genomes characterized. PMID:23435891
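Average nucleotide identity (ANI) is usually computed by cutting one genome into fragments, aligning each fragment to the other genome, and averaging the percent identity of fragments whose best alignment passes identity and coverage filters. The Python sketch below shows only that averaging step; the default thresholds are modelled on commonly used BLAST-based ANI settings and are assumptions here, as is the function name, and the alignment step itself is not shown.

```python
from typing import Iterable, Tuple


def average_nucleotide_identity(hits: Iterable[Tuple[float, int, int]],
                                min_identity: float = 30.0,
                                min_coverage: float = 0.70) -> float:
    """Mean percent identity over query fragments with an acceptable best hit.

    Each hit is (percent_identity, aligned_length, fragment_length) for the best
    alignment of one query fragment against the other genome. Fragments below
    either threshold are excluded from the average.
    """
    kept = [ident for ident, aln, frag in hits
            if ident >= min_identity and aln / frag >= min_coverage]
    if not kept:
        raise ValueError("no fragments passed the filters")
    return sum(kept) / len(kept)
```

ANI is asymmetric in practice (A against B can differ slightly from B against A), so reciprocal values are often reported or averaged.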
Global Organization of a Positive-strand RNA Virus Genome

PubMed Central

Wu, Baodong; Grigull, Jörg; Ore, Moriam O.; Morin, Sylvie; White, K. Andrew

2013-01-01

The genomes of plus-strand RNA viruses contain many regulatory sequences and structures that direct different viral processes. The traditional view of these RNA elements is that they are local structures present in non-coding regions. However, this view is changing due to the discovery of regulatory elements in coding regions and of functional long-range intra-genomic base-pairing interactions. The ∼4.8 kb RNA genome of the tombusvirus tomato bushy stunt virus (TBSV) contains these types of structural features, including six different functional long-distance interactions. We hypothesized that to achieve these multiple interactions this viral genome must use a large-scale organizational strategy and, accordingly, we sought to assess the global conformation of the entire TBSV genome. Atomic force micrographs of the genome indicated a mostly condensed structure composed of interconnected protrusions extending from a central hub. This configuration was consistent with the genomic secondary structure model generated using high-throughput selective 2′-hydroxyl acylation analysed by primer extension (i.e. SHAPE), which predicted differently sized RNA domains originating from a central region. Known RNA elements were identified in both domain and inter-domain regions, and novel structural features were predicted and functionally confirmed. Interestingly, only two of the six long-range interactions known to form were present in the structural model. However, for those interactions that did not form, the complementary partner sequences were positioned relatively close to each other in the structure, suggesting that the secondary structure of the viral genome could provide a basic scaffold for the formation of different long-range interactions. The higher-order structural model for the TBSV RNA genome provides a snapshot of the complex framework that allows multiple functional components to operate in concert within a confined context. PMID:23717202
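Whether two distant partner sequences could form a long-range interaction starts with a complementarity check: one segment read 5'→3' must pair with the other read 3'→5', allowing Watson-Crick and G-U wobble pairs. A minimal Python sketch of that check only; it ignores thermodynamic stability, bulges, and the SHAPE-directed folding used in the study, and the function name and segments are hypothetical.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def can_pair(seg5: str, seg3: str) -> bool:
    """True if seg5 can form a contiguous antiparallel duplex with seg3.

    seg5 is read 5'->3' and paired position by position against seg3 read
    3'->5'. DNA-style 'T' is treated as 'U'. Segments must be equal length.
    """
    seg5 = seg5.upper().replace("T", "U")
    seg3 = seg3.upper().replace("T", "U")
    if len(seg5) != len(seg3):
        return False
    return all((a, b) in PAIRS for a, b in zip(seg5, reversed(seg3)))
```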
The rapidly expanding universe of giant viruses: Mimivirus, Pandoravirus, Pithovirus and Mollivirus.

PubMed

Abergel, Chantal; Legendre, Matthieu; Claverie, Jean-Michel

2015-11-01

More than a century ago, the term 'virus' was introduced to describe infectious agents that are invisible by light microscopy and capable of passing through sterilizing filters. In addition to their extremely small size, most viruses have minimal genomes and gene contents, and rely almost entirely on host cell-encoded functions to multiply. Unexpectedly, four different families of eukaryotic 'giant viruses' have been discovered over the past 10 years, with genome sizes, gene contents and particle dimensions overlapping with those of cellular microbes. Their ongoing analyses are challenging accepted ideas about the diversity, evolution and origin of DNA viruses. © FEMS 2015. All rights reserved.
Evolutionary and Taxonomic Implications of Variation in Nuclear Genome Size: Lesson from the Grass Genus Anthoxanthum (Poaceae)

PubMed Central

Chumová, Zuzana; Krejčíková, Jana; Mandáková, Terezie; Suda, Jan; Trávníček, Pavel

2015-01-01

The genus Anthoxanthum (sweet vernal grass, Poaceae) represents a taxonomically intricate polyploid complex with large phenotypic variation and poorly resolved evolutionary relationships. To gain insight into the geographic distribution of ploidy levels and to assess the taxonomic value of genome size data, we determined C- and Cx-values in 628 plants representing all currently recognized European species, collected from 197 populations in 29 European countries. The flow cytometric estimates were supplemented by conventional chromosome counts. In addition to diploids, we found two low (rare 3x and common 4x) and one high (~16x–18x) polyploid level. Mean holoploid genome sizes ranged from 5.52 pg in diploid A. alpinum to 44.75 pg in highly polyploid A. amarum, while the size of monoploid genomes ranged from 2.75 pg in tetraploid A. alpinum to 9.19 pg in diploid A. gracile. In contrast to Central and Northern Europe, which harboured only limited cytological variation, a much more complex pattern of genome sizes was revealed in the Mediterranean, particularly in Corsica. Eight taxonomic groups that partly corresponded to traditionally recognized species were delimited based on genome size values and phenotypic variation. Whereas our data supported the merger of A. aristatum and A. ovatum, eastern Mediterranean populations traditionally referred to as diploid A. odoratum were shown to be cytologically distinct, and may represent a new taxon. An autopolyploid origin was suggested for 4x A. alpinum. In contrast, 4x A. odoratum seems to be an allopolyploid, based on the amounts of nuclear DNA. Intraspecific variation in genome size was observed in all recognized species, the most striking example being the A. aristatum/ovatum complex. Altogether, our study showed that genome size can be a useful taxonomic marker in Anthoxanthum, not only to guide taxonomic decisions but also to help resolve evolutionary relationships in this challenging grass genus. PMID:26207824
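The distinction between holoploid (C) and monoploid (Cx) genome size is simple arithmetic: the monoploid size is the holoploid 2C value divided by the ploidy level x. The Python sketch assumes, as is common in flow-cytometric work, that the holoploid values quoted above are 2C values; the abstract does not state this explicitly, and the function name is hypothetical.

```python
def monoploid_size(holoploid_2c: float, ploidy: int) -> float:
    """Monoploid genome size (1Cx) from a holoploid 2C value and ploidy level x.

    Units follow the input (here picograms of nuclear DNA).
    """
    if ploidy < 1:
        raise ValueError("ploidy level must be a positive integer")
    return holoploid_2c / ploidy
```

Under that assumption, the diploid A. alpinum value of 5.52 pg gives a 1Cx of 2.76 pg, close to the 2.75 pg reported for tetraploid A. alpinum. This similarity is consistent with, though not proof of, the autopolyploid origin suggested for that cytotype.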
The Douglas-Fir Genome Sequence Reveals Specialization of the Photosynthetic Apparatus in Pinaceae

PubMed Central

Neale, David B.; McGuire, Patrick E.; Wheeler, Nicholas C.; Stevens, Kristian A.; Crepeau, Marc W.; Cardeno, Charis; Zimin, Aleksey V.; Puiu, Daniela; Pertea, Geo M.; Sezen, U. Uzay; Casola, Claudio; Koralewski, Tomasz E.; Paul, Robin; Gonzalez-Ibeas, Daniel; Zaman, Sumaira; Cronn, Richard; Yandell, Mark; Holt, Carson; Langley, Charles H.; Yorke, James A.; Salzberg, Steven L.; Wegrzyn, Jill L.

2017-01-01

A reference genome sequence for Pseudotsuga menziesii var. menziesii (Mirb.) Franco (Coastal Douglas-fir) is reported, providing a reference sequence for a third genus of the family Pinaceae. The contiguity and quality of the genome assembly far exceed those of other conifer reference genome sequences (contig N50 = 44,136 bp and scaffold N50 = 340,704 bp). Incremental improvements in sequencing and assembly technologies are partly responsible for the higher-quality reference genome, but it may also be due to a slightly lower exact repeat content in Douglas-fir than in pine and spruce. Comparative genome annotation with angiosperm species reveals gene-family expansion and contraction in Douglas-fir and other conifers, which may account for some of the major morphological and physiological differences between the two major plant groups. Notable differences were observed in the size of the NDH-complex gene family and in genes underlying the functional basis of shade tolerance/intolerance. This reference genome sequence not only provides an important resource for Douglas-fir breeders and geneticists but also sheds additional light on the evolutionary processes that have led to the divergence of modern angiosperms from the more ancient gymnosperms. PMID:28751502
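Contig and scaffold N50, used above to compare assembly contiguity, follow a standard definition: sort sequences from longest to shortest and report the length at which the running total first reaches at least half of the total assembly length. The same calculation gives contig N50 on contig lengths and scaffold N50 on scaffold lengths. A minimal Python sketch; the lengths in the check are hypothetical.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Smallest length L such that sequences of length >= L hold >= half the assembly."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("empty assembly")
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:   # integer comparison avoids rounding of total / 2
            return length
    raise AssertionError("unreachable: running total always reaches total")
```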
Centromere and telomere sequence alterations reflect the rapid genome evolution within the carnivorous plant genus Genlisea.

PubMed

Tran, Trung D; Cao, Hieu X; Jovtchev, Gabriele; Neumann, Pavel; Novák, Petr; Fojtová, Miloslava; Vu, Giang T H; Macas, Jiří; Fajkus, Jiří; Schubert, Ingo; Fuchs, Joerg

2015-12-01

Linear chromosomes of eukaryotic organisms invariably possess centromeres and telomeres to ensure proper chromosome segregation during nuclear divisions and to protect the chromosome ends from deterioration and fusion, respectively. While centromeric sequences may differ between species, with arrays of tandemly repeated sequences and retrotransposons being the most abundant sequence types in plant centromeres, telomeric sequences are usually highly conserved among plants and other organisms. The genome size of the carnivorous genus Genlisea (Lentibulariaceae) is highly variable. Here we study the evolutionary sequence plasticity of these chromosomal domains at an intrageneric level. We show that Genlisea nigrocaulis (1C = 86 Mbp; 2n = 40) and G. hispidula (1C = 1550 Mbp; 2n = 40) differ in the DNA composition of their centromeres and telomeres. G. nigrocaulis and its close relative G. pygmaea revealed mainly 161 bp tandem repeats, while G. hispidula and its close relative G. subglabra displayed a combination of four retroelements at centromeric positions. G. nigrocaulis and G. pygmaea chromosome ends are characterized by the Arabidopsis-type telomeric repeats (TTTAGGG); G. hispidula and G. subglabra instead revealed two intermingled sequence variants (TTCAGG and TTTCAGG). These differences in centromeric and, surprisingly, also in telomeric DNA sequences, uncovered between groups with an average genome size difference of more than 9-fold, emphasize the fast genome evolution within this genus. Such intrageneric evolutionary alteration of telomeric repeats, with cytosine in the guanine-rich strand, not yet known for plants, might affect the epigenetic modification of telomeric chromatin. © 2015 The Authors. The Plant Journal © 2015 John Wiley & Sons Ltd.
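Distinguishing the telomeric repeat types named above in sequence data comes down to counting motif copies. Because TTCAGG is contained within TTTCAGG, a naive count would score each TTTCAGG copy twice. A minimal Python sketch, assuming reads or contig ends on the G-rich strand only (the complementary C-rich strand is not scanned); the motif list comes from the abstract, while the function name and example sequences are hypothetical.

```python
import re
from collections import Counter

# Telomeric repeat variants named in the abstract. Python's regex alternation
# tries alternatives left to right at each position, so TTTCAGG is listed
# before TTCAGG to avoid scoring it as the shorter variant.
MOTIFS = ("TTTAGGG", "TTTCAGG", "TTCAGG")
PATTERN = re.compile("|".join(MOTIFS))


def count_telomeric_motifs(seq: str) -> Counter:
    """Count non-overlapping copies of each telomeric motif on the given strand."""
    return Counter(PATTERN.findall(seq.upper()))
```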
Phylogenic study of Lemnoideae (duckweeds) through complete chloroplast genomes for eight accessions.

PubMed

Ding, Yanqiang; Fang, Yang; Guo, Ling; Li, Zhidan; He, Kaize; Zhao, Yun; Zhao, Hai

2017-01-01

Phylogenetic relationships among the genera of Lemnoideae, a group of small aquatic monocotyledonous plants, were not well resolved using either morphological characters or traditional markers. Because the rich genetic information in chloroplast genomes makes them particularly useful for phylogenetic studies, we used chloroplast genomes to clarify the phylogeny within Lemnoideae. DNA was sequenced using next-generation sequencing. The duckweed chloroplast genomes were either filtered indirectly from the total DNA data or obtained directly from chloroplast DNA data. To test the reliability of assembling the chloroplast genome by filtering the total DNA, two methods were used to assemble the chloroplast genome of Landoltia punctata strain ZH0202. A phylogenetic tree was built from the whole chloroplast genome sequences using MrBayes v.3.2.6 and PhyML 3.0. Eight complete duckweed chloroplast genomes were assembled, with lengths ranging from 165,775 bp to 171,152 bp, each containing 80 protein-coding sequences, four rRNAs, 30 tRNAs and two pseudogenes. The L. punctata strain ZH0202 chloroplast genomes assembled by the two methods were 100% identical in both sequence and length. The chloroplast genome comparison demonstrated that the differences in chloroplast genome size among the Lemnoideae resulted primarily from variation in non-coding regions, especially repeat sequence variation. The phylogenetic analysis demonstrated that the genera of Lemnoideae are derived from each other in the following order: Spirodela, Landoltia, Lemna, Wolffiella, and Wolffia. This study demonstrates the potential of whole chloroplast genome DNA as an effective option for phylogenetic studies of Lemnoideae. It also shows the possibility of using chloroplast DNA data to elucidate phylogenies that have not been well resolved by traditional methods, even in plants other than duckweeds. PMID:29302399
Whole genomic DNA sequencing and comparative genomic analysis of Arthrospira platensis: high genome plasticity and genetic diversity

PubMed Central

Xu, Teng; Qin, Song; Hu, Yongwu; Song, Zhijian; Ying, Jianchao; Li, Peizhen; Dong, Wei; Zhao, Fangqing; Yang, Huanming; Bao, Qiyu

2016-01-01

Arthrospira platensis is a multicellular, filamentous, non-N2-fixing cyanobacterium that is capable of oxygenic photosynthesis. In this study, we determined the nearly complete genome sequence of A. platensis YZ. The A. platensis YZ genome is a single circular chromosome 6.62 Mb in size. Phylogenetic and comparative genomic analyses revealed that A. platensis YZ is more closely related to A. platensis NIES-39 than to Arthrospira sp. PCC 8005 and A. platensis C1. Broad gene gains were identified between A. platensis YZ and three other Arthrospira species, some of which, such as restriction-modification system genes, have previously been shown to be laterally transferred among different species. Moreover, unprecedentedly extensive chromosomal rearrangements among different strains were observed. The chromosomal rearrangements, particularly the chromosomal inversions, were analysed and inferred to be closely related to palindromes involving long inverted repeat sequences and to the extensively distributed type IIR restriction enzyme in the Arthrospira genome. In addition, species of the genus Arthrospira consistently contained the highest proportion of repetitive sequence compared with other species of the order Oscillatoriales, suggesting that sequence duplication contributed significantly to Arthrospira genome phylogeny. These results provide in-depth views into the genomic phylogeny and structural variation of A. platensis, as well as a valuable resource for functional genomics studies. PMID:27330141
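A DNA palindrome in the sense used above is a stretch that reads the same as its own reverse complement, as in many restriction sites (e.g. GAATTC). The Python sketch below scans fixed-length windows for such sites. It is a minimal illustration under that assumption: it does not detect inverted repeats separated by a spacer, which a study of long inverted repeats would also need, and the function names, window length and sequence are hypothetical.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_palindromes(seq: str, length: int) -> List[int]:
    """Start positions of windows that equal their own reverse complement.

    Only even lengths can be reverse-complement palindromes, because the
    central base of an odd-length window would have to pair with itself.
    """
    if length <= 0 or length % 2:
        raise ValueError("length must be a positive even number")
    seq = seq.upper()
    return [i for i in range(len(seq) - length + 1)
            if seq[i:i + length] == reverse_complement(seq[i:i + length])]
```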
Small genomes and large seeds: chromosome numbers, genome size and seed mass in diploid Aesculus species (Sapindaceae).

PubMed

Krahulcová, Anna; Trávnícek, Pavel; Krahulec, František; Rejmánek, Marcel

2017-04-01

Aesculus L. (horse chestnut, buckeye) is a genus of 12–19 extant woody species native to the temperate Northern Hemisphere. This genus is known for unusually large seeds among angiosperms. While chromosome counts are available for many Aesculus species, only one has had its genome size measured. The aim of this study is to provide more genome size data and analyse the relationship between genome size and seed mass in this genus. Chromosome numbers in root tip cuttings were confirmed for four species and reported for the first time for three additional species. Flow cytometric measurements of 2C nuclear DNA values were conducted on eight species, and mean seed mass values were estimated for the same taxa. The same chromosome number, 2n = 40, was determined in all investigated taxa. Original measurements of 2C values for seven Aesculus species (eight taxa), added to just one reliable datum for A. hippocastanum, confirmed the notion that the genome size in this genus with relatively large seeds is surprisingly low, ranging from 0.955 pg/2C in A. parviflora to 1.275 pg/2C in A. glabra var. glabra. The chromosome number 2n = 40 appears conclusively to be the universal 2n number for non-hybrid species in this genus. Aesculus genome sizes are relatively small, not only within its own family, Sapindaceae, but also among woody angiosperms. The genome sizes seem to be distinct and non-overlapping among the four major Aesculus clades. These results provide additional support for the most recent reconstruction of Aesculus phylogeny. The correlation between the 2C values and seed masses in the examined Aesculus species is slightly negative and not significant. However, when the four major clades are treated separately, there is a consistent positive association between larger genome size and larger seed mass within individual lineages. © The Author 2017. Published by Oxford University Press on behalf of the Annals of Botany Company. All rights reserved.
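To put these 2C values in sequence-length terms, the short sketch below converts them to 1C genome sizes in megabase pairs. It assumes the widely used conversion factor of about 978 Mbp per picogram of DNA (an outside assumption, not a figure from this study) and that the 1C value is half of the 2C value for these diploid taxa.

```python
# Widely used conversion factor (assumption, not from the abstract): 1 pg of DNA ~ 978 Mbp.
MBP_PER_PG = 978.0

def one_c_mbp(two_c_pg: float) -> float:
    """Convert a 2C nuclear DNA amount (pg) to a 1C genome size in Mbp.

    Assumes a diploid nucleus, so the 1C value is half of the 2C value.
    """
    return (two_c_pg / 2.0) * MBP_PER_PG

# Extremes of the range reported in the abstract.
for taxon, two_c in [("A. parviflora", 0.955), ("A. glabra var. glabra", 1.275)]:
    print(f"{taxon}: 2C = {two_c} pg -> 1C ~ {one_c_mbp(two_c):.0f} Mbp")
```

Under these assumptions, the reported range corresponds to roughly 467 to 623 Mbp per 1C genome.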
Can multi-subpopulation reference sets improve the genomic predictive ability for pigs?

PubMed

Fangmann, A; Bergfelder-Drüing, S; Tholen, E; Simianer, H; Erbe, M

2015-12-01

In most countries and for most livestock species, genomic evaluations are obtained from within-breed analyses. To achieve reliable breeding values, however, a sufficient reference sample size is essential. To increase this size, the use of multibreed reference populations for small populations is considered a suitable option in other species. Over decades, the separate breeding work of different pig breeding organizations in Germany has led to stratified subpopulations in the breed German Large White. Because of this and the limited number of Large White animals available in each organization, there was a pressing need to ascertain whether multi-subpopulation genomic prediction is superior to within-subpopulation prediction in pigs. Direct genomic breeding values were estimated with genomic BLUP for the trait "number of piglets born alive" using genotype data (Illumina Porcine 60K SNP BeadChip) from 2,053 German Large White animals from five different commercial pig breeding companies. To assess the prediction accuracy of within- and multi-subpopulation reference sets, a random 5-fold cross-validation with 20 replications was performed. The five subpopulations considered were only slightly differentiated from each other. However, the prediction accuracy of the multi-subpopulation approach was not better than that of the within-subpopulation evaluation, for which the predictive ability was already high. Reference sets composed of closely related subpopulations performed better than sets of distantly related subpopulations, but not better than the within-subpopulation approach. Despite the low differentiation of the five subpopulations, the genetic connectedness between them seems to be too small for multi-subpopulation reference sets to improve prediction accuracy. Consequently, resources should be used to enlarge the reference population within each subpopulation, for example, by adding genotyped females.
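The validation design described above (random 5-fold cross-validation repeated 20 times) can be sketched as follows. The genomic model is left as a hypothetical `fit_predict` hook where a GBLUP implementation would go; the function names, the use of the Pearson correlation between predicted and observed values as the measure of predictive ability, and the fixed random seed are assumptions for illustration, not details taken from the study.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def repeated_kfold_ability(n, y, fit_predict, k=5, reps=20, seed=1):
    """Mean correlation between predicted and observed phenotypes over
    `reps` random partitions of n animals into k folds."""
    rng = random.Random(seed)
    scores = []
    for _ in range(reps):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]  # each animal lands in exactly one fold
        for test in folds:
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            preds = fit_predict(train, test)  # hypothetical hook: fit on train, predict test
            scores.append(pearson(preds, [y[i] for i in test]))
    return mean(scores)

# Toy usage with an "oracle" predictor that simply returns the observed values.
y = [float(i % 7) for i in range(100)]
oracle = lambda train, test: [y[i] for i in test]
print(repeated_kfold_ability(len(y), y, oracle))
```

Keeping the model behind a callable makes it easy to compare within- and multi-subpopulation reference sets: only the set of training indices passed to `fit_predict` changes.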
Genome-wide SNP analysis reveals a genetic basis for sea-age variation in a wild population of Atlantic salmon (Salmo salar).

PubMed

Johnston, Susan E; Orell, Panu; Pritchard, Victoria L; Kent, Matthew P; Lien, Sigbjørn; Niemelä, Eero; Erkinaro, Jaakko; Primmer, Craig R

2014-07-01

Delaying sexual maturation can lead to larger body size and higher reproductive success, but carries an increased risk of death before reproducing. Classical life history theory predicts that trade-offs between reproductive success and survival should lead to the evolution of an optimal strategy in a given population. However, variation in mating strategies generally persists, and the genetic and physiological mechanisms underlying this variation remain poorly understood. An extreme case is the Atlantic salmon (Salmo salar), in which individuals vary in the age at which they return from their marine migration to spawn (i.e. their 'sea age'). This results in large size differences between strategies, with direct implications for individual fitness. Here, we used an Illumina Infinium SNP array to identify regions of the genome associated with variation in sea age in a large population of Atlantic salmon in Northern Europe, implementing individual-based genome-wide association studies (GWAS) and population-based FST outlier analyses. We identified several genomic regions that vary in association with phenotype and/or selection between sea ages, with nearby genes having functions related to muscle development, metabolism, immune response and mate choice. In addition, we found that individuals of different sea ages belong to different, yet sympatric, populations in this system, indicating that reproductive isolation may be driven by divergence between stable strategies. Overall, this study demonstrates how genome-wide methodologies can be integrated with samples collected from wild, structured populations to understand their ecology and evolution in a natural context. © 2014 John Wiley & Sons Ltd.
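The abstract does not state which FST estimator or outlier criterion was used. As an illustrative assumption, the sketch below computes Hudson's per-SNP FST estimator (in the form given by Bhatia et al. 2013) between two groups, for example fish of two sea ages, and flags SNPs above an empirical upper quantile. The quantile cutoff is a hypothetical choice; published outlier tests usually compare against a model-based null distribution instead.

```python
def hudson_fst_components(p1, p2, n1, n2):
    """Numerator and denominator of Hudson's FST for one biallelic SNP.

    p1, p2: sample allele frequencies in the two groups.
    n1, n2: numbers of sampled alleles (2 x individuals for diploids).
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den

def per_snp_fst(freqs1, freqs2, n1, n2):
    """Per-SNP FST; SNPs monomorphic for the same allele in both groups get 0."""
    out = []
    for p1, p2 in zip(freqs1, freqs2):
        num, den = hudson_fst_components(p1, p2, n1, n2)
        out.append(num / den if den > 0 else 0.0)
    return out

def outliers(fst_values, quantile=0.99):
    """Indices of SNPs whose FST reaches the empirical upper-quantile cutoff."""
    ranked = sorted(fst_values)
    cutoff = ranked[min(len(ranked) - 1, int(quantile * len(ranked)))]
    return [i for i, f in enumerate(fst_values) if f >= cutoff]
```

For genome-wide averages, Bhatia et al. recommend a ratio of averages (summing numerators and denominators across SNPs) rather than averaging the per-SNP ratios.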
DESCARTES' RULE OF SIGNS AND THE IDENTIFIABILITY OF POPULATION DEMOGRAPHIC MODELS FROM GENOMIC VARIATION DATA.

PubMed

Bhaskar, Anand; Song, Yun S

2014-01-01

The sample frequency spectrum (SFS) is a widely used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data, and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has recently been shown that very different population demographies can generate the same SFS for arbitrarily large sample sizes. Although in principle this nonidentifiability issue poses a thorny challenge to statistical inference, the population size functions involved in the counterexamples are arguably not very biologically realistic. Here, we revisit this problem and examine the identifiability of demographic models under the restriction that the population sizes are piecewise-defined, where each piece belongs to some family of biologically motivated functions. Under this assumption, we prove that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large. We obtain a general bound on the sample size sufficient for identifiability; the bound depends on the number of pieces in the demographic model and also on the type of population size function in each piece. In the cases of piecewise-constant, piecewise-exponential and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit formulas for the bounds as simple functions of the number of pieces. Lastly, we obtain analogous results for the "folded" SFS, which is often used when there is ambiguity as to which allelic type is ancestral. Our results are proved using a generalization of Descartes' rule of signs for polynomials to the Laplace transform of piecewise continuous functions.
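To make the summary statistic concrete: for a sample of n sequences, the unfolded SFS counts segregating sites by the number i of derived alleles (i = 1, ..., n-1), and the folded SFS merges classes i and n-i when the ancestral allele is unknown. The sketch below computes both from per-site derived-allele counts and also gives the textbook neutral expectation E[ξ_i] = θ/i for a constant-size population. The function names and input format are assumptions for illustration; this is not the authors' code.

```python
def unfolded_sfs(derived_counts, n):
    """Entry i-1 counts segregating sites with i derived alleles (i = 1..n-1)
    in a sample of n sequences; monomorphic sites (0 or n) are skipped."""
    sfs = [0] * (n - 1)
    for c in derived_counts:
        if 0 < c < n:
            sfs[c - 1] += 1
    return sfs

def folded_sfs(sfs, n):
    """Fold an unfolded SFS by minor-allele count j = 1..floor(n/2)."""
    folded = []
    for j in range(1, n // 2 + 1):
        if j == n - j:
            folded.append(sfs[j - 1])  # middle class is not double-counted
        else:
            folded.append(sfs[j - 1] + sfs[n - j - 1])
    return folded

def expected_sfs_constant(theta, n):
    """Standard neutral expectation under constant population size: E[xi_i] = theta / i."""
    return [theta / i for i in range(1, n)]
```

For example, with n = 4 and derived-allele counts [1, 1, 2, 3, 0, 4], the unfolded SFS is [2, 1, 1] and the folded SFS is [3, 1].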
DESCARTES’ RULE OF SIGNS AND THE IDENTIFIABILITY OF POPULATION DEMOGRAPHIC MODELS FROM GENOMIC VARIATION DATA

PubMed Central

Bhaskar, Anand; Song, Yun S.

2016-01-01

The sample frequency spectrum (SFS) is a widely used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data, and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has recently been shown that very different population demographies can generate the same SFS for arbitrarily large sample sizes. Although in principle this nonidentifiability issue poses a thorny challenge to statistical inference, the population size functions involved in the counterexamples are arguably not very biologically realistic. Here, we revisit this problem and examine the identifiability of demographic models under the restriction that the population sizes are piecewise-defined, where each piece belongs to some family of biologically motivated functions. Under this assumption, we prove that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large. We obtain a general bound on the sample size sufficient for identifiability; the bound depends on the number of pieces in the demographic model and also on the type of population size function in each piece. In the cases of piecewise-constant, piecewise-exponential and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit formulas for the bounds as simple functions of the number of pieces. Lastly, we obtain analogous results for the “folded” SFS, which is often used when there is ambiguity as to which allelic type is ancestral. Our results are proved using a generalization of Descartes’ rule of signs for polynomials to the Laplace transform of piecewise continuous functions. PMID:28018011
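The classical rule that the paper generalizes states that the number of positive real roots of a real polynomial, counted with multiplicity, is at most the number of sign changes in its coefficient sequence (zeros skipped) and differs from it by an even number; applying the same count to p(-x) bounds the negative roots. The sketch below covers only this classical polynomial case, not the paper's extension to Laplace transforms of piecewise continuous functions. Coefficients are listed from the highest power down (an input convention chosen for this sketch).

```python
def sign_changes(coeffs):
    """Number of sign changes in a coefficient sequence, ignoring zeros."""
    signs = [c > 0 for c in coeffs if c != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def reflect(coeffs):
    """Coefficients of p(-x), given p's coefficients from the highest power down."""
    d = len(coeffs) - 1
    return [c * (-1) ** (d - k) for k, c in enumerate(coeffs)]

# p(x) = x^3 - 3x^2 + 4 = (x - 2)^2 (x + 1)
p = [1, -3, 0, 4]
print(sign_changes(p))           # bound on positive roots
print(sign_changes(reflect(p)))  # bound on negative roots
```

Here p has two sign changes, matching the double root at x = 2, and p(-x) has one, matching the single root at x = -1.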
The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population.

PubMed

Lack, Justin B; Cardeno, Charis M; Crepeau, Marc W; Taylor, William; Corbett-Detig, Russell B; Stevens, Kristian A; Langley, Charles H; Pool, John E

2015-04-01

Hundreds of wild-derived Drosophila melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach and settled on an assembly strategy that utilizes two alignment programs and incorporates both substitutions and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 623 consistently aligned genomes and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets. Copyright © 2015 by the Genetics Society of America.
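One step of the two-round strategy is rebuilding the reference with first-round substitutions and short indels before remapping. A minimal sketch of that step is shown below, assuming VCF-style (position, REF, ALT) records with 0-based positions and non-overlapping variants; the function is hypothetical and does not reproduce the alignment programs actually used for the Genome Nexus.

```python
def apply_variants(reference, variants):
    """Build an updated reference from non-overlapping variants.

    Each variant is (pos, ref_allele, alt_allele) with a 0-based pos,
    covering substitutions (e.g. C -> T) and short indels (e.g. A -> AGG).
    """
    seq = reference
    # Apply right-to-left so that indels never shift coordinates still to be applied.
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        if seq[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        seq = seq[:pos] + alt + seq[pos + len(ref):]
    return seq

updated = apply_variants("ACGTACGT", [(1, "C", "T"), (4, "A", "AGG"), (6, "GT", "G")])
print(updated)
```

In a real pipeline the coordinate shifts introduced by indels also have to be recorded, so that variants called in the second round can be lifted back to the original reference coordinates.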
RNAslider: a faster engine for consecutive windows folding and its application to the analysis of genomic folding asymmetry.

PubMed

Horesh, Yair; Wexler, Ydo; Lebenthal, Ilana; Ziv-Ukelson, Michal; Unger, Ron

2009-03-04

Scanning large genomes with a sliding window in search of locally stable RNA structures is a well-motivated problem in bioinformatics. Given a predefined window size L and an RNA sequence S of length N (L < N), the consecutive windows folding problem is to compute the minimal free energy (MFE) for the folding of each of the L-sized substrings of S. The problem can be solved naively in O(NL³) time by applying any of the classical cubic-time RNA folding algorithms to each of the N−L+1 windows of size L. An O(NL²) solution for this problem has recently been described. Here, we describe and implement an O(NLψ(L)) engine for the consecutive windows folding problem, where ψ(L) is shown to converge to O(1) under the assumption of a standard probabilistic polymer folding model, yielding an O(L) speedup that is confirmed experimentally. Using this tool, we note an intriguing directional (5'-3' vs. 3'-5') folding bias: the MFE of folding is higher in the native direction of the DNA than in the reverse direction in various genomic regions in several organisms, including regions of the genomes that do not encode proteins or ncRNA. This bias largely emerges from the genomic dinucleotide bias, which affects the MFE; however, we see some variation in the folding bias among genomic regions when normalized to the dinucleotide bias. We also present results from calculating the MFE landscape of mouse chromosome 1, characterizing the MFE of the long ncRNA molecules that reside on this chromosome. The efficient consecutive windows folding engine described in this paper allows genome-wide scans for ncRNA molecules as well as large-scale statistics. It is implemented as a software tool, called RNAslider, and applied to the scanning of long chromosomes, leading to the observation of features that are visible only on a large scale.
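The naive baseline mentioned above folds every window independently with a cubic-time algorithm. The sketch below shows that baseline, but swaps in Nussinov base-pair maximization as a simplified stand-in for a thermodynamic MFE model (an assumption: real MFE folding, including RNAslider's, uses nearest-neighbour energy parameters). It does not implement RNAslider's incremental O(NLψ(L)) engine; the minimum hairpin loop of 3 nt is also an assumed parameter.

```python
# Watson-Crick pairs plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Nussinov), hairpin loops >= min_loop nt."""
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: j left unpaired
            for k in range(i, j - min_loop):  # case: j pairs with some k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]

def consecutive_windows(seq, L, min_loop=3):
    """Naive consecutive windows folding: fold each of the N - L + 1 windows separately."""
    return [nussinov_pairs(seq[i:i + L], min_loop) for i in range(len(seq) - L + 1)]

print(consecutive_windows("GGGAAACCCA", 9))
```

Each window costs O(L³) here, giving the O(NL³) total that the faster engines improve on by reusing work shared between overlapping windows.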
Genetic variability of mutans streptococci revealed by wide whole-genome sequencing

PubMed Central

2013-01-01

Background Mutans streptococci are a group of bacteria that contribute significantly to tooth decay. However, their genetic variability is still not well understood. Results Genomes of 6 clinical S. mutans isolates of different origins, one isolate of S. sobrinus (DSM 20742) and one isolate of S. ratti (DSM 20564) were sequenced and comparatively analyzed. Genome alignment revealed a mosaic-like structure of genome arrangement. Genes related to pathogenicity vary greatly among the strains, whereas genes for oxidative stress resistance are well conserved, indicating the importance of this trait in the dental biofilm community. Analysis of genome-scale metabolic networks revealed significant differences in 42 pathways. A striking dissimilarity is the unique presence of two lactate oxidases in S. sobrinus DSM 20742, probably indicating an unusual capability of this strain to produce H2O2 and expand its ecological niche. In addition, lactate oxidases may form, together with other enzymes, a novel energetic pathway in S. sobrinus DSM 20742 that can compensate for its deficient citrate utilization pathway. Using the 67 S. mutans genomes currently available, including the strains sequenced in this study, we estimated the theoretical core genome size of S. mutans and modeled the S. mutans pan-genome by applying different fitting models. An “open” pan-genome was inferred. Conclusions The comparative genome analyses revealed diversity in the mutans streptococci group, especially with respect to virulence-related genes and metabolic pathways. The results are helpful for better understanding the evolution and adaptive mechanisms of these oral pathogens and for combating them. PMID:23805886
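The abstract does not name its fitting models. One common choice, used here only as an assumption, is a power law (Heaps' law) for pan-genome size, P(N) = κN^γ, where N is the number of genomes; under this model a positive exponent γ is usually read as an open pan-genome. The sketch fits the model by ordinary least squares on log-transformed data, which is simpler than, and not identical to, a nonlinear least-squares fit on the original scale.

```python
import math

def fit_power_law(genome_counts, pan_sizes):
    """Fit P(N) = kappa * N**gamma by ordinary least squares on log P vs log N."""
    xs = [math.log(n) for n in genome_counts]
    ys = [math.log(p) for p in pan_sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    gamma = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    kappa = math.exp(my - gamma * mx)
    return kappa, gamma

def looks_open(gamma):
    """Under this power-law model, gamma > 0 (pan-genome still growing) is read as 'open'."""
    return gamma > 0
```

In practice the inputs would be mean pan-genome sizes over many random orderings of the genomes, one value for each N.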
An overview on genome organization of marine organisms.

PubMed

Costantini, Maria

2015-12-01

In this review we concentrate on some general genome features of marine organisms and their evolution, ranging from vertebrates to invertebrates to unicellular organisms. Before genome sequencing, CsCl density-gradient ultracentrifugation achieved high-resolution fractionation of mammalian DNA without access to its sequence. The analytical profile of human DNA showed that the vertebrate genome is a mosaic of isochores, typically megabase-size DNA segments that belong to a small number of families characterized by different GC levels. The recent availability of a number of fully sequenced genomes has made it possible to map isochores very precisely on the basis of DNA sequence. Since isochores are tightly linked to biological properties such as gene density, replication timing and recombination, the new level of detail provided by the isochore map has improved the understanding of genome structure, function and evolution. This has led to the current level of knowledge and to further insights. Copyright © 2015. Published by Elsevier B.V.
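Isochore mapping from sequence essentially amounts to computing GC content in windows and assigning each window to a GC-defined family. The sketch below does this with non-overlapping windows. The family boundaries (L1 < 37%, L2 37-41%, H1 41-46%, H2 46-53%, H3 ≥ 53% GC) follow commonly cited approximate values for the human genome and are an assumption here, as is the window size (genome-scale maps often use windows of around 100 kb, far larger than the toy example).

```python
def gc_windows(seq, window):
    """GC fraction of consecutive non-overlapping windows; an incomplete tail is dropped.
    Non-ACGT characters (e.g. N) are excluded from the denominator."""
    out = []
    for start in range(0, len(seq) - window + 1, window):
        w = seq[start:start + window].upper()
        acgt = sum(w.count(b) for b in "ACGT")
        out.append((w.count("G") + w.count("C")) / acgt if acgt else float("nan"))
    return out

# Approximate upper GC bounds of isochore families (assumption; see lead-in).
FAMILIES = [(0.37, "L1"), (0.41, "L2"), (0.46, "H1"), (0.53, "H2")]

def isochore_family(gc):
    """Assign a GC fraction to an isochore family; anything >= 0.53 is H3."""
    for upper, name in FAMILIES:
        if gc < upper:
            return name
    return "H3"

print([isochore_family(gc) for gc in gc_windows("GGCCAATTGCAT", 4)])
```

Real isochore maps also merge adjacent windows of the same family into longer segments; that step is omitted here.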
Analysis of phylogenetic relationships and genome size evolution of the Amaranthus genus using GBS indicates the ancestors of an ancient crop.

PubMed

Stetter, Markus G; Schmid, Karl J

2017-04-01

The genus Amaranthus consists of 50-70 species and harbors several cultivated and weedy species of great economic importance. The small number of taxonomically informative traits, phenotypic plasticity, gene flow and hybridization have made it difficult to establish the taxonomy and phylogeny of the whole genus despite various studies using molecular markers. We inferred the phylogeny of the Amaranthus genus using genotyping by sequencing (GBS) of 94 genebank accessions representing 35 Amaranthus species and measured their genome sizes. SNPs were called by de novo and reference-based methods, for the latter using the distantly related sugar beet (Beta vulgaris) and the closely related Amaranthus hypochondriacus as references. SNP counts and proportions of missing data differed between methods, but the resulting phylogenetic trees were highly similar. A distance-based neighbor-joining tree of individual accessions and a species tree calculated with the multispecies coalescent supported a previous taxonomic classification into three subgenera, although the subgenus A. Acnida consists of two highly differentiated clades. The analysis of the Hybridus complex within the subgenus A. Amaranthus provided insights into the history of cultivated grain amaranths. The complex includes the three cultivated grain amaranths and their wild relatives and was well separated from other species in the subgenus. Wild and cultivated amaranth accessions did not differentiate according to species assignment but clustered by their geographic origin in South and Central America. Geographically separated populations of Amaranthus hybridus appear to be the common ancestors of the three cultivated grain species, and A. quitensis might additionally be involved in the evolution of South American grain amaranth (A. caudatus). We also measured genome sizes of the species and observed little variation with the exception of two lineages that showed evidence for a recent polyploidization. 
With the exception of two lineages, genome sizes are quite similar and indicate that polyploidization did not play a major role in the history of the genus. Copyright © 2016 Elsevier Inc. All rights reserved.
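A neighbor-joining tree is built by repeatedly joining the pair of nodes that minimizes the Q-criterion Q(i, j) = (n - 2)d(i, j) - r_i - r_j, where r_i is the sum of distances from node i. The sketch below performs one agglomeration step of standard neighbor joining (Saitou and Nei, with the Studier and Keppler Q-criterion). How the study derived distances from GBS SNPs is not described here, so the input is a toy distance matrix and the function name and input format are hypothetical. Repeating the step, with a special case for the last three nodes, yields the full unrooted tree.

```python
def nj_step(labels, d):
    """One neighbor-joining agglomeration.

    labels: list of node names; d: dict mapping frozenset({a, b}) -> distance.
    Returns the joined pair, their branch lengths, the new labels, and new distances.
    """
    n = len(labels)
    dist = lambda a, b: 0.0 if a == b else d[frozenset((a, b))]
    r = {a: sum(dist(a, b) for b in labels) for a in labels}
    # Q-criterion: choose the pair minimising (n - 2) d(i, j) - r_i - r_j.
    f, g = min(
        ((a, b) for i, a in enumerate(labels) for b in labels[i + 1:]),
        key=lambda p: (n - 2) * dist(*p) - r[p[0]] - r[p[1]],
    )
    len_f = 0.5 * dist(f, g) + (r[f] - r[g]) / (2 * (n - 2))
    len_g = dist(f, g) - len_f
    u = f"({f},{g})"
    rest = [x for x in labels if x not in (f, g)]
    new_d = {k: v for k, v in d.items() if not (k & {f, g})}  # drop rows for f and g
    for x in rest:
        new_d[frozenset((u, x))] = 0.5 * (dist(f, x) + dist(g, x) - dist(f, g))
    return (f, g), (len_f, len_g), rest + [u], new_d

names = list("abcde")
m = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7], [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
d = {frozenset((names[i], names[j])): m[i][j] for i in range(5) for j in range(i + 1, 5)}
pair, lengths, new_labels, new_d = nj_step(names, d)
print(pair, lengths, new_labels)
```

For this toy matrix the first join is (a, b), with branch lengths 2 and 3 to the new internal node.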
Comparative genomics of the bacterial genus Streptococcus illuminates evolutionary implications of species groups.

PubMed

Gao, Xiao-Yang; Zhi, Xiao-Yang; Li, Hong-Wei; Klenk, Hans-Peter; Li, Wen-Jun

2014-01-01

Members of the genus Streptococcus within the phylum Firmicutes are among the most diverse and significant zoonotic pathogens. This genus has undergone considerable taxonomic revision owing to successive improvements in chemotaxonomic approaches, DNA hybridization and 16S rRNA gene sequencing. It has been proposed to place the majority of streptococci into "species groups". However, the evolutionary implications of species groups are presently unclear. We use comparative genomic approaches to gain a better understanding of the evolution of Streptococcus through genome dynamics, population structure, phylogenies and virulence factor distribution of species groups. Genome dynamics analyses indicate that the pan-genome size increases with the addition of newly sequenced strains, while the core genome size decreases with sequential addition at both the genus level and the species group level. Population structure analysis reveals two distinct lineages, one including the Pyogenic, Bovis, Mutans and Salivarius groups, and the other including the Mitis, Anginosus and Unknown groups. Phylogenetic dendrograms show that species within the same species group cluster together, and reveal two main clades in accordance with the population structure analysis. The distribution of streptococcal virulence factors shows no obvious pattern among the species groups; however, according to phylogenetic inference, the evolution of some common virulence factors is congruent with the evolution of species groups. We suggest that the proposed streptococcal species groups are reasonable from the viewpoint of comparative genomics; the evolution of the genus is congruent with the individual evolutionary trajectories of different species groups.
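The pan- and core-genome trends described above come from accumulation curves: genomes are added one at a time, the pan-genome is the union of the gene families seen so far, and the core genome is their intersection. The sketch below computes these curves for a single ordering, assuming genes have already been clustered into families represented as sets of identifiers (a hypothetical input format); in practice the curves are usually averaged over many random orderings of the genomes.

```python
def accumulation_curves(genomes):
    """Pan- and core-genome sizes as genomes (sets of gene families) are added in order."""
    pan, core = set(), None
    pan_sizes, core_sizes = [], []
    for genes in genomes:
        pan |= genes                                        # union: every family seen so far
        core = set(genes) if core is None else core & genes  # intersection: families in all
        pan_sizes.append(len(pan))
        core_sizes.append(len(core))
    return pan_sizes, core_sizes

genomes = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "e"}]
print(accumulation_curves(genomes))
```

For these three toy genomes the pan-genome grows as 3, 4, 5 while the core genome shrinks as 3, 2, 1, the same qualitative pattern reported for Streptococcus.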
Comparative Genomics of the Bacterial Genus Streptococcus Illuminates Evolutionary Implications of Species Groups

PubMed Central

Gao, Xiao-Yang; Zhi, Xiao-Yang; Li, Hong-Wei; Klenk, Hans-Peter; Li, Wen-Jun

2014-01-01

Members of the genus Streptococcus within the phylum Firmicutes are among the most diverse and significant zoonotic pathogens. This genus has undergone considerable taxonomic revision owing to successive improvements in chemotaxonomic approaches, DNA hybridization and 16S rRNA gene sequencing. It has been proposed to place the majority of streptococci into “species groups”. However, the evolutionary implications of species groups are presently unclear. We use comparative genomic approaches to gain a better understanding of the evolution of Streptococcus through genome dynamics, population structure, phylogenies and virulence factor distribution of species groups. Genome dynamics analyses indicate that the pan-genome size increases with the addition of newly sequenced strains, while the core genome size decreases with sequential addition at both the genus level and the species group level. Population structure analysis reveals two distinct lineages, one including the Pyogenic, Bovis, Mutans and Salivarius groups, and the other including the Mitis, Anginosus and Unknown groups. Phylogenetic dendrograms show that species within the same species group cluster together, and reveal two main clades in accordance with the population structure analysis. The distribution of streptococcal virulence factors shows no obvious pattern among the species groups; however, according to phylogenetic inference, the evolution of some common virulence factors is congruent with the evolution of species groups. We suggest that the proposed streptococcal species groups are reasonable from the viewpoint of comparative genomics; the evolution of the genus is congruent with the individual evolutionary trajectories of different species groups. PMID:24977706
Multiple origins of interdependent endosymbiotic complexes in a genus of cicadas.

PubMed

Łukasik, Piotr; Nazario, Katherine; Van Leuven, James T; Campbell, Matthew A; Meyer, Mariah; Michalik, Anna; Pessacq, Pablo; Simon, Chris; Veloso, Claudio; McCutcheon, John P

2018-01-09

Bacterial endosymbionts that provide nutrients to hosts often have genomes that are extremely stable in structure and gene content. In contrast, the genome of the endosymbiont Hodgkinia cicadicola has fractured into multiple distinct lineages in some species of the cicada genus Tettigades. To better understand the frequency, timing, and outcomes of Hodgkinia lineage splitting throughout this cicada genus, we sampled cicadas over three field seasons in Chile and performed genomics and microscopy on representative samples. We found that a single ancestral Hodgkinia lineage has split at least six independent times in Tettigades over the last 4 million years, resulting in complexes of between two and six distinct Hodgkinia lineages per host. Individual genomes in these symbiotic complexes differ dramatically in relative abundance, genome size, organization, and gene content. Each Hodgkinia lineage retains a small set of core genes involved in genetic information processing, but the high level of gene loss experienced by all genomes suggests that extensive sharing of gene products among symbiont cells must occur. In total, Hodgkinia complexes that consist of multiple lineages encode nearly complete sets of genes present on the ancestral single lineage and presumably perform the same functions as symbionts that have not undergone splitting. However, differences in the timing of the splits, along with dissimilar gene loss patterns on the resulting genomes, have led to very different outcomes of lineage splitting in extant cicadas.
Two Antarctic penguin genomes reveal insights into their evolutionary history and molecular changes related to the Antarctic environment.

PubMed

Li, Cai; Zhang, Yong; Li, Jianwen; Kong, Lesheng; Hu, Haofu; Pan, Hailin; Xu, Luohao; Deng, Yuan; Li, Qiye; Jin, Lijun; Yu, Hao; Chen, Yan; Liu, Binghang; Yang, Linfeng; Liu, Shiping; Zhang, Yan; Lang, Yongshan; Xia, Jinquan; He, Weiming; Shi, Qiong; Subramanian, Sankar; Millar, Craig D; Meader, Stephen; Rands, Chris M; Fujita, Matthew K; Greenwold, Matthew J; Castoe, Todd A; Pollock, David D; Gu, Wanjun; Nam, Kiwoong; Ellegren, Hans; Ho, Simon Yw; Burt, David W; Ponting, Chris P; Jarvis, Erich D; Gilbert, M Thomas P; Yang, Huanming; Wang, Jian; Lambert, David M; Wang, Jun; Zhang, Guojie

2014-01-01

Penguins are flightless aquatic birds widely distributed in the Southern Hemisphere. The distinctive morphological and physiological features of penguins allow them to live an aquatic life, and some of them have successfully adapted to the hostile environments in Antarctica. To study the phylogenetic and population history of penguins and the molecular basis of their adaptations to Antarctica, we sequenced the genomes of the two Antarctic-dwelling penguin species, the Adélie penguin [Pygoscelis adeliae] and emperor penguin [Aptenodytes forsteri]. Phylogenetic dating suggests that early penguins arose ~60 million years ago, coinciding with a period of global warming. Analysis of effective population sizes reveals that the two penguin species experienced population expansions from ~1 million years ago to ~100 thousand years ago, but responded differently to the climatic cooling of the last glacial period. Comparative genomic analyses with other available avian genomes identified molecular changes in genes related to epidermal structure, phototransduction, lipid metabolism, and forelimb morphology. Our sequencing and initial analyses of the first two penguin genomes provide insights into the timing of penguin origin, fluctuations in effective population sizes of the two penguin species over the past 10 million years, and the potential associations between these biological patterns and global climate change. The molecular changes compared with other avian genomes reflect both shared and diverse adaptations of the two penguin species to the Antarctic environment.
Complete mitochondrial genome sequence of black mustard (Brassica nigra; BB) and comparison with Brassica oleracea (CC) and Brassica carinata (BBCC).

PubMed

Yamagishi, Hiroshi; Tanaka, Yoshiyuki; Terachi, Toru

2014-11-01

Crop species of Brassica (Brassicaceae) consist of three monogenomic species and three amphidiploid species resulting from interspecific hybridizations among them. Until now, mitochondrial genome sequences were available for only five of these species. We sequenced the mitochondrial genome of the sixth species, Brassica nigra (nuclear genome constitution BB), and compared it with those of Brassica oleracea (CC) and Brassica carinata (BBCC). The genome was assembled into a 232 145 bp circular sequence that is slightly larger than that of B. oleracea (219 952 bp). The genome of B. nigra contained 33 protein-coding genes, 3 rRNA genes, and 17 tRNA genes. The cox2-2 gene present in B. oleracea was absent in B. nigra. Although the nucleotide sequences of 52 genes were identical between B. nigra and B. carinata, the second exon of rps3 showed differences including an insertion/deletion (indel) and nucleotide substitutions. A PCR test to detect the indel revealed intraspecific variation in rps3, and in one line of B. nigra it amplified a DNA fragment of the size expected for B. carinata. In addition, the B. carinata lines tested here produced DNA fragments of the size expected for B. nigra. The results indicate that at least two mitotypes of B. nigra were present in the maternal parents of B. carinata.
Effect of storage and processing on plasmid, yeast and plant genomic DNA stability in juice from genetically modified oranges.

PubMed

Weiss, Julia; Ros-Chumillas, Maria; Peña, Leandro; Egea-Cortines, Marcos

2007-01-30

Recombinant DNA technology is an important tool in the development of plant varieties with new favourable features. There is strong opposition towards this technology due to the potential risk of horizontal gene transfer between genetically modified plant material and food-associated bacteria, especially if genes for antibiotic resistance are involved. Since horizontal transfer efficiency depends on the size of the DNA fragments and the length of homologous sequences, we investigated the effect of conditions required for orange juice processing on the stability of DNA from three different origins: plasmid DNA, yeast genomic DNA and endogenous genomic DNA from transgenic sweet orange (C. sinensis L. Osb.). The acidic orange juice matrix had a strong degrading effect on plasmid DNA, apparent as a conformational change from the supercoiled to the nicked and linear forms within 5 h of storage at 4 °C. Yeast genomic DNA was degraded within 4 days of exposure to the acidic orange juice matrix, and the genomic DNA of C. sinensis was also degraded within 2 days of storage, as indicated by amplification of transgene markers. Standard pasteurization procedures affected DNA integrity depending on the method and time used. Our data show that the current standard industrial procedures for pasteurizing orange juice, together with its acidic nature, strongly degrade both yeast and endogenous genomic DNA to below the sizes reported to be suitable for horizontal gene transfer.
The distribution of runs of homozygosity and selection signatures in six commercial meat sheep breeds

PubMed Central

Purfield, Deirdre C.; McParland, Sinead; Wall, Eamon; Berry, Donagh P.

2017-01-01

Domestication and the subsequent selection of animals for either economic or morphological features can leave a variety of imprints on the genome of a population. Genomic regions subjected to high selective pressures often show reduced genetic diversity and frequent runs of homozygosity (ROH). Therefore, the objective of the present study was to use 42,182 autosomal SNPs to identify genomic regions subjected to selection pressure in 3,191 sheep from six commercial breeds and to quantify the genetic diversity within each breed using ROH. In addition, the historical effective population size of each breed was estimated and, in conjunction with ROH, was used to elucidate the demographic history of the six breeds. ROH were common in the autosomes of animals in the present study, but the observed breed differences in patterns of ROH length and burden suggested differences in breed effective population size and recent management. ROH provided a sufficient predictor of the pedigree inbreeding coefficient, with an estimated correlation of 0.62 between the two measures. Genomic regions under putative selection were identified using two complementary algorithms: the fixation index and hapFLK. The identified regions under putative selection included candidate genes associated with skin pigmentation, body size and muscle formation; such characteristics are often sought after in modern-day breeding programs. These regions of selection frequently overlapped with high-ROH regions both within and across breeds. Multiple as-yet uncharacterised genes also resided within putative regions of selection. This further substantiates the need for more comprehensive annotation of the sheep genome, as these uncharacterised genes may contribute to traits of interest in the animal sciences. Despite this, the regions identified as under putative selection in the current study provide insight into the mechanisms leading to breed differentiation and genetic variation in meat production. 
PMID:28463982
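ROH detection and the genomic inbreeding measure F_ROH (total ROH length divided by the length of the autosomal genome covered) can be sketched as below. The abstract does not report the ROH-calling parameters, so the 0/1/2 genotype coding, the minimum run of three SNPs, and the absence of any allowance for heterozygous or missing calls are simplifying assumptions; practical pipelines also impose minimum-length and SNP-density criteria.

```python
def find_roh(positions, genotypes, min_snps=3):
    """Return (start_bp, end_bp) for runs of >= min_snps consecutive homozygous SNPs.

    genotypes are coded 0/1/2 (copies of one allele), so 0 and 2 are homozygous;
    any other value (heterozygote or missing) breaks a run.
    positions are sorted base-pair coordinates of the SNPs on one chromosome.
    """
    runs = []
    start = None
    # A trailing heterozygous sentinel closes any run still open at the end.
    for i, g in enumerate(list(genotypes) + [1]):
        if g in (0, 2):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((positions[start], positions[i - 1]))
            start = None
    return runs

def f_roh(runs, autosomal_length_bp):
    """Genomic inbreeding: total ROH length / length of the autosomal genome covered."""
    return sum(end - start for start, end in runs) / autosomal_length_bp

pos = [0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]
gt = [0, 0, 2, 1, 2, 2, 2, 0, 1]
runs = find_roh(pos, gt)
print(runs, f_roh(runs, 10000))
```

Correlating per-animal F_ROH with pedigree inbreeding coefficients is how a figure such as the 0.62 reported above is typically obtained.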
GFFview: A Web Server for Parsing and Visualizing Annotation Information of Eukaryotic Genome.

PubMed

Deng, Feilong; Chen, Shi-Yi; Wu, Zhou-Lin; Hu, Yongsong; Jia, Xianbo; Lai, Song-Jia

2017-10-01

Owing to the wide application of RNA sequencing (RNA-seq) technology, more and more eukaryotic genomes have been extensively annotated with features such as gene structure, alternative splicing, and noncoding loci. Genome annotation is commonly stored as plain text in the General Feature Format (GFF), and such files can be hundreds or thousands of megabytes in size. Manipulating GFF files is therefore a challenge for biologists without bioinformatics skills. In this study, we provide a web server (GFFview) for parsing the annotation information of eukaryotic genomes and generating statistical descriptions of six indices for visualization. GFFview is very useful for investigating the quality of, and differences between, de novo assembled transcriptomes in RNA-seq studies.
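GFF lines are tab-delimited with nine columns (seqid, source, type, start, end, score, strand, phase, attributes), which is what makes them straightforward to summarize programmatically. As an illustration only (GFFview's six indices are not listed in the abstract, and this is not its code), the sketch below counts features by type and reports the mean gene length, using the GFF convention of 1-based, inclusive coordinates.

```python
from collections import Counter

def summarize_gff(lines):
    """Count features by type (column 3) and compute the mean gene length."""
    feature_counts = Counter()
    gene_lengths = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and ## directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines (and e.g. embedded FASTA sequence lines)
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        feature_counts[ftype] += 1
        if ftype == "gene":
            # GFF coordinates are 1-based and inclusive at both ends.
            gene_lengths.append(end - start + 1)
    mean_gene_length = sum(gene_lengths) / len(gene_lengths) if gene_lengths else 0.0
    return feature_counts, mean_gene_length

# Usage with a hypothetical file name:
# with open("annotation.gff3") as fh:
#     counts, mean_len = summarize_gff(fh)
```

Because the function consumes lines one at a time, it can stream very large annotation files without loading them into memory.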
Origin of amphibian and avian chromosomes by fission, fusion, and retention of ancestral chromosomes

PubMed Central

Voss, Stephen R.; Kump, D. Kevin; Putta, Srikrishna; Pauly, Nathan; Reynolds, Anna; Henry, Rema J.; Basa, Saritha; Walker, John A.; Smith, Jeramiah J.

2011-01-01

Amphibian genomes differ greatly in DNA content and chromosome size, morphology, and number. Investigations of this diversity are needed to identify mechanisms that have shaped the evolution of vertebrate genomes. We used comparative mapping to investigate the organization of genes in the Mexican axolotl (Ambystoma mexicanum), a species with relatively few chromosomes (n = 14) and a gigantic genome (>20 pg/N). We show extensive conservation of synteny between Ambystoma, chicken, and human, and a positive correlation between the length of conserved segments and genome size. Ambystoma segments are estimated to be four to 51 times longer than homologous human and chicken segments. Strikingly, genes demarcating the structures of 28 chicken chromosomes are ordered among linkage groups defining the Ambystoma genome, and we show that these same chromosomal segments are also conserved in a distantly related anuran amphibian (Xenopus tropicalis). Using linkage relationships from the amphibian maps, we predict that three chicken chromosomes originated by fusion, nine to 14 originated by fission, and 12–17 evolved directly from ancestral tetrapod chromosomes. We further show that some ancestral segments were fused prior to the divergence of salamanders and anurans, while others fused independently and randomly as chromosome numbers were reduced in lineages leading to Ambystoma and Xenopus. The maintenance of gene order relationships between chromosomal segments that have greatly expanded and contracted in salamander and chicken genomes, respectively, suggests selection to maintain synteny relationships and/or extremely low rates of chromosomal rearrangement. Overall, the results demonstrate the value of data from diverse amphibian genomes in studies of vertebrate genome evolution. PMID:21482624

Genome sequencing of the winged midge, Parochlus steinenii, from the Antarctic Peninsula

PubMed Central

Kim, Sanghee; Oh, Mijin; Jung, Woongsic; Park, Joonho; Choi, Han-Gu

2017-01-01

Background: In the Antarctic, only two species of Chironomidae occur naturally: the wingless midge, Belgica antarctica, and the winged midge, Parochlus steinenii. B. antarctica is an extremophile with unusual adaptations: its larvae are desiccation- and freeze-tolerant and its adults are wingless. B. antarctica, whose compact genome was recently reported, was the first Antarctic eukaryote to be sequenced. Although P. steinenii occurs naturally in the Antarctic alongside B. antarctica, its larvae are cold-tolerant but not freeze-tolerant and its adults are winged. These differences in adaptation between the Antarctic midges are of interest for understanding evolutionary processes in an extreme environment. Herein, we provide the genome of a second Antarctic midge to help elucidate the evolution of these species. Results: The draft genome of P. steinenii had a total size of 138 Mbp, comprising 9513 contigs with an N50 contig size of 34,110 bp, and a GC content of 32.2%. Overall, 13,468 genes were predicted using the MAKER annotation pipeline, and gene ontology assigned a function to 10,801 (80.2%) of the predicted genes. Compared with the assembled genome of B. antarctica, that of P. steinenii was approximately 50 Mbp longer with 6.2-fold more repeat sequence, whereas its gene regions were similarly compact. Conclusions: We present an annotated draft genome of the Antarctic midge P. steinenii. The genomes of P. steinenii and B. antarctica will aid in elucidating evolution in harsh environments and provide new resources for functional genomic analyses of the order Diptera. PMID:28327954
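N50, quoted here for P. steinenii contigs and further down this page for the S. frugiperda scaffolds and T. aestivum contigs, is conventionally the length L such that contigs (or scaffolds) of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that definition follows; the function name and the example lengths are made up for illustration.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the bases."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    if not ordered:
        raise ValueError("no contig lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:  # first contig at which half the bases are covered
            return length
    return ordered[-1]  # only reached if lengths are non-positive


# Made-up assembly: 54 bases in total, so half is 27; 10 + 9 + 8 = 27.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Because N50 depends only on the length distribution, it measures contiguity, not correctness; a misjoined assembly can still have a large N50.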
Genome size variation among sex types in dioecious and triecious Caricaceae species

USDA-ARS's Scientific Manuscript database

Caricaceae is a small family of 35 species with varying sexual systems, including the economically important fruit crop Carica papaya and other species of “highland papayas”. Flow cytometry was used to obtain genome sizes for 11 species in three genera of Caricaceae to determine if genome s...
Insights on genome size evolution from a miniature inverted repeat transposon driving a satellite DNA.

PubMed

Scalvenzi, Thibault; Pollet, Nicolas

2014-12-01

Genome size in eukaryotes does not correlate well with the number of genes they contain, a phenomenon known as the C-value paradox that is readily observed in amphibians. By analyzing an amphibian genome, we asked how repetitive DNA can affect genome size and architecture. We describe here our discovery of a Tc1/mariner miniature inverted-repeat transposon family present in Xenopus frogs. These transposons, named miDNA4, are unique in that they contain a satellite DNA motif. We found that miDNA4 measures 331 bp and contains 25-bp inverted terminal repeats and a 119-bp sequence motif present either as a single copy or as an array of 2-47 copies. We characterized the structure, dynamics, impact and evolution of the miDNA4 family and its satellite DNA in Xenopus genomes. This led us to propose a model for the evolution of these two repeated sequences and for how they can act synergistically to increase genome size. Copyright © 2014 Elsevier Inc. All rights reserved.
On the Sequence-Directed Nature of Human Gene Mutation: The Role of Genomic Architecture and the Local DNA Sequence Environment in Mediating Gene Mutations Underlying Human Inherited Disease

PubMed Central

Cooper, David N.; Bacolla, Albino; Férec, Claude; Vasquez, Karen M.; Kehrer-Sawatzki, Hildegard; Chen, Jian-Min

2011-01-01

Different types of human gene mutation may vary in size, from structural variants (SVs) to single base-pair substitutions, but what they all have in common is that their nature, size and location are often determined either by specific characteristics of the local DNA sequence environment or by higher-order features of the genomic architecture. The human genome is now recognized to contain ‘pervasive architectural flaws’ in that certain DNA sequences are inherently mutation-prone by virtue of their base composition, sequence repetitivity and/or epigenetic modification. Here we explore how the nature, location and frequency of different types of mutation causing inherited disease are shaped in large part, and often in remarkably predictable ways, by the local DNA sequence environment. The mutability of a given gene or genomic region may also be influenced indirectly by a variety of non-canonical (non-B) secondary structures whose formation is facilitated by the underlying DNA sequence. Since these non-B DNA structures can interfere with subsequent DNA replication and repair, and may serve to increase mutation frequencies in a generalized fashion (i.e. both in the context of subtle mutations and SVs), they have the potential to serve as a unifying concept in studies of mutational mechanisms underlying human inherited disease. PMID:21853507
First complete chromosomal organization of a protozoan plant parasite (Phytomonas spp.).

PubMed

Marín, Clotilde; Alberge, Blandine; Dollet, Michel; Pagès, Michel; Bastien, Patrick

2008-01-01

Phytomonas spp. are members of the family Trypanosomatidae that parasitize plants and may cause lethal diseases in crops, such as phloem necrosis in coffee, hartrot in coconut, and marchitez sorpresiva in oil palm. In this study, the molecular karyotypes of 6 isolates from latex plants were fully elucidated by pulsed-field gel electrophoresis and DNA hybridization. Twenty-one chromosomal linkage groups, constituting heterologous chromosomes ranging in size from 0.3 to 3 Mb, were physically defined using 75 DNA markers (sequence-tagged sites and genes). From these data, the genome size can be estimated at 25.5 (±2) Mb. The physical linkage groups were consistently conserved in all strains examined. Moreover, the finding of several pairs of different-sized homologous chromosomes strongly suggests that this organism is diploid. Defining the complete molecular karyotype of Phytomonas is an essential first step toward sequencing the genome of this economically important parasite.
Identification of novel deletions of 15q11q13 in Angelman syndrome by array-CGH: molecular characterization and genotype-phenotype correlations.

PubMed

Sahoo, Trilochan; Bacino, Carlos A; German, Jennifer R; Shaw, Chad A; Bird, Lynne M; Kimonis, Virginia; Anselm, Irinia; Waisbren, Susan; Beaudet, Arthur L; Peters, Sarika U

2007-09-01

Angelman syndrome (AS) is a neurodevelopmental disorder characterized by mental retardation, absent speech, ataxia, and a happy disposition. Deletions of the 15q11q13 region are found in approximately 70% of AS patients. The deletions are sub-classified by size into class I and class II, of approximately 6.8 and 6.0 Mb respectively, with two different proximal breakpoints and a common distal breakpoint. Using a chromosome 15-specific comparative genomic hybridization genomic microarray (array-CGH), we identified deletions, determined their sizes, and mapped their breakpoints in a cohort of 44 cases, in order to relate the breakpoints to the genomic architecture and derive more precise genotype-phenotype correlations. Interestingly, four of the 44 patients studied (9.1%) had novel and unusually large deletions, which are reported here. This is the first report of very large deletions of 15q11q13 resulting in AS, the largest being >10.6 Mb. These novel deletions involve three different distal breakpoints, two of which have previously been shown to be involved in the generation of isodicentric 15q chromosomes (idic15). Additionally, precise determination of the deletion breakpoints reveals directly oriented low-copy repeats (LCRs) flanking the recurrent and novel breakpoints. The LCRs are adequate in size, orientation, and homology to enable the abnormal recombination events that lead to deletions and duplications. This genomic organization provides evidence for a common mechanism generating both common and rare deletion types. Larger deletions result in the loss of several genes outside the common Angelman syndrome-Prader-Willi syndrome (AS-PWS) critical interval, and in a more severe phenotype.
A draft genome assembly of the army worm, Spodoptera frugiperda.

PubMed

Kakumani, Pavan Kumar; Malhotra, Pawan; Mukherjee, Sunil K; Bhatnagar, Raj K

2014-08-01

Spodoptera is an agriculturally important pest insect, and studies of its biology have been limited by the unavailability of its genome. In the present study, the genomic DNA of S. frugiperda (Sf) was sequenced and assembled into 37,243 scaffolds totaling 358 Mb, with an N50 of 53.7 kb. Based on sequence identity, 305 Mb of the genome could be anchored onto all 28 chromosomes of Bombyx mori. Repeat elements were identified that account for 20.28% of the genome. Further, we predicted 11,595 genes, with an average intron length of 726 bp. The genes were annotated, and domain analysis revealed that Sf genes share significant homology and expression patterns with B. mori, despite differences in KOG gene categories and in the representation of certain protein families. The present study of the Sf genome should help in characterizing cellular pathways to understand the insect's biology, and in comparative evolutionary studies among lepidopterans that help annotate their genomes. Copyright © 2014 Elsevier Inc. All rights reserved.
The Genetic Architecture Underlying the Evolution of a Rare Piscivorous Life History Form in Brown Trout after Secondary Contact and Strong Introgression.

PubMed

Jacobs, Arne; Hughes, Martin R; Robinson, Paige C; Adams, Colin E; Elmer, Kathryn R

2018-05-31

Identifying the genetic basis underlying phenotypic divergence and reproductive isolation is a longstanding problem in evolutionary biology. Genetic signals of adaptation and reproductive isolation are often confounded by a wide range of factors, such as variation in demographic history or genomic features. Brown trout (Salmo trutta) in the Loch Maree catchment, Scotland, exhibit reproductively isolated divergent life history morphs, including a rare piscivorous (ferox) life history form displaying larger body size, greater longevity and delayed maturation compared to sympatric benthivorous brown trout. Using a dataset of 16,066 SNPs, we analyzed the evolutionary history and genetic architecture underlying this divergence. We found that ferox trout and benthivorous brown trout most likely evolved after recent secondary contact of two distinct glacial lineages, and identified 33 genomic outlier windows across the genome, of which several have most likely formed through selection. We further identified twelve candidate genes and biological pathways related to growth, development and immune response potentially underpinning the observed phenotypic differences. The identification of clear genomic signals divergent between life history phenotypes and potentially linked to reproductive isolation, through size assortative mating, as well as the identification of the underlying demographic history, highlights the power of genomic studies of young species pairs for understanding the factors shaping genetic differentiation.
The mitochondrial genome of the ethanol-metabolizing, wine cellar mold Zasmidium cellare is the smallest for a filamentous ascomycete.

PubMed

Goodwin, Stephen B; McCorison, Cassandra B; Cavaletto, Jessica R; Culley, David E; LaButti, Kurt; Baker, Scott E; Grigoriev, Igor V

2016-08-01

Fungi in the class Dothideomycetes often live in extreme environments or have unusual physiology. One of these, the wine cellar mold Zasmidium cellare, produces thick curtains of mycelia in cellars with high humidity, and its ability to metabolize volatile organic compounds is thought to improve air quality. Whether these abilities have affected its mitochondrial genome is not known. To fill this gap, the circular-mapping mitochondrial genome of Z. cellare was sequenced and, at only 23 743 bp, is the smallest reported for a filamentous fungus. Genes were encoded on both strands with a single change of direction, different from most other fungi but consistent with the Dothideomycetes. Other than its small size, the only unusual feature of the Z. cellare mitochondrial genome was two copies of a 110-bp sequence that were duplicated, inverted and separated by approximately 1 kb. This inverted-repeat sequence confused the assembly program but appears to have no functional significance. The small size of the Z. cellare mitochondrial genome was due to slightly smaller genes, lack of introns and non-essential genes, reduced intergenic spacers and very few ORFs relative to other fungi rather than a loss of essential genes. Whether this reduction facilitates its unusual biology remains unknown. Published by Elsevier Ltd.
Comprehensive Survey of Genetic Diversity in Chloroplast Genomes and 45S nrDNAs within Panax ginseng Species

PubMed Central

Kim, Kyunghee; Lee, Sang-Choon; Lee, Junki; Lee, Hyun Oh; Joh, Ho Jun; Kim, Nam-Hoon; Park, Hyun-Seung; Yang, Tae-Jin

2015-01-01

We report complete chloroplast (cp) genome and 45S nuclear ribosomal DNA (45S nrDNA) sequences for 11 Panax ginseng cultivars. These sequences, the representative barcoding targets for the cytoplasmic and nuclear genomes respectively, were obtained from low-coverage NGS data for each cultivar. The cp genome sizes ranged from 156,241 to 156,425 bp, and the major size variation was derived from differences in the copy number of tandem repeats in the ycf1 gene and in the intergenic regions of rps16-trnUUG and rpl32-trnUAG. The complete 45S nrDNA unit sequences were 11,091 bp, representing a consensus single transcriptional unit with an intergenic spacer region. Comparative analysis of these sequences, together with those previously reported for three Chinese accessions, identified very rare but unique polymorphisms in the cp genome within P. ginseng cultivars: 12 intra-species polymorphisms (six SNPs and six InDels) among 14 cultivars. We also identified five SNPs in the 45S nrDNA of the 11 Korean ginseng cultivars. From these 17 unique informative polymorphic sites, we developed six reliable markers for analysis of ginseng diversity and cultivar authentication. PMID:26061692
The Multipartite Mitochondrial Genome of Liposcelis bostrychophila: Insights into the Evolution of Mitochondrial Genomes in Bilateral Animals

PubMed Central

Yuan, Ming-Long; Dou, Wei; Barker, Stephen C.; Wang, Jin-Jun

2012-01-01

Booklice (order Psocoptera) in the genus Liposcelis are major pests to stored grains worldwide and are closely related to parasitic lice (order Phthiraptera). We sequenced the mitochondrial (mt) genome of Liposcelis bostrychophila and found that the typical single mt chromosome of bilateral animals has fragmented into and been replaced by two medium-sized chromosomes in this booklouse; each of these chromosomes has about half of the genes of the typical mt chromosome of bilateral animals. These mt chromosomes are 8,530 bp (mt chromosome I) and 7,933 bp (mt chromosome II) in size. Intriguingly, mt chromosome I is twice as abundant as chromosome II. It appears that the selection pressure for compact mt genomes in bilateral animals favors small mt chromosomes when small mt chromosomes co-exist with the typical large mt chromosomes. Thus, small mt chromosomes may have selective advantages over large mt chromosomes in bilateral animals. Phylogenetic analyses of mt genome sequences of Psocodea (i.e. Psocoptera plus Phthiraptera) indicate that: 1) the order Psocoptera (booklice and barklice) is paraphyletic; and 2) the order Phthiraptera (the parasitic lice) is monophyletic. Within parasitic lice, however, the suborder Ischnocera is paraphyletic; this differs from the traditional view that each suborder of parasitic lice is monophyletic. PMID:22479490
Novel European free-living, non-diazotrophic Bradyrhizobium isolates from contrasting soils that lack nodulation and nitrogen fixation genes - a genome comparison

NASA Astrophysics Data System (ADS)

Jones, Frances Patricia; Clark, Ian M.; King, Robert; Shaw, Liz J.; Woodward, Martin J.; Hirsch, Penny R.

2016-05-01

The slow-growing genus Bradyrhizobium is biologically important in soils, with different representatives found to perform a range of biochemical functions, including photosynthesis, induction of root nodules, symbiotic nitrogen fixation and denitrification. Consequently, the role of the genus in soil ecology and biogeochemical transformations is of agricultural and environmental significance. Some isolates of Bradyrhizobium have been shown to be non-symbiotic and unable to form nodules. Here we present the genomes and gene annotations of two such free-living Bradyrhizobium isolates, named G22 and BF49, from soils with differing long-term management regimes (grassland and bare fallow, respectively), together with an analysis of their carbon metabolism. These are the first Bradyrhizobium isolates to be obtained and sequenced from European soil, and the first free-living Bradyrhizobium isolates lacking both nodulation and nitrogen fixation genes to have their genomes sequenced and assembled from cultured samples. The G22 and BF49 genomes differ distinctly in size and number of genes, and the grassland isolate also contains a plasmid. There are also a number of functional differences between these isolates and other published genomes, suggesting that this ubiquitous genus is extremely heterogeneous and plays roles in the soil community other than symbiotic nitrogen fixation.
Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data.

PubMed

Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay

2013-01-01

Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing have encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly from these short reads is computationally very intensive. Owing to the lower cost and higher throughput, NGS systems now make it possible to sequence genomes at high depth. However, no report is currently available on the impact of high sequencing depth on genome assembly using real data sets and multiple assembly algorithms. Some recent studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations used simulated datasets. One limitation of simulated datasets is that variables known to affect genome assembly, such as error rate, read length and coverage, are carefully controlled. Hence, this study was undertaken to identify the minimum sequencing depth required for de novo assembly of genomes of different sizes using graph-based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes with all assemblers except Meraculous, which requires 100X. Moreover, de novo assembly from 50X read data requires only 6-40 GB of RAM, depending on the genome size and assembly algorithm used. We believe this information can be extremely valuable to researchers designing experiments and multiplexing strategies, enabling optimal use of both sequencing and analysis resources.
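Translating a target depth such as 50X into a sequencing yield uses the standard relation C = N × L / G, where N is the number of reads, L the read length and G the genome size (the Lander-Waterman coverage definition). The Python sketch below applies it to the three genome sizes in the abstract; the 100 bp read length is an assumption, since the abstract does not state one, and the function names are hypothetical.

```python
import math


def mean_depth(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Expected mean depth C = N * L / G."""
    return num_reads * read_length_bp / genome_size_bp


def reads_for_depth(target_depth: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Smallest number of reads N with N * L / G >= target depth."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)


# Genome sizes from the abstract; the 100 bp read length is an assumption.
genomes = [("E. coli", 4_600_000), ("S. kudriavzevii", 11_180_000), ("C. elegans", 100_000_000)]
for name, size_bp in genomes:
    print(name, reads_for_depth(50, 100, size_bp))
# E. coli 2300000
# S. kudriavzevii 5590000
# C. elegans 50000000
```

Here N counts individual reads; for paired-end data each mate is assumed to count separately toward depth, so the number of read pairs is half of N. Doubling the target to 100X (the Meraculous requirement) doubles N.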
Genome-wide meta-analyses of stratified depression in Generation Scotland and UK Biobank.

PubMed

Hall, Lynsey S; Adams, Mark J; Arnau-Soler, Aleix; Clarke, Toni-Kim; Howard, David M; Zeng, Yanni; Davies, Gail; Hagenaars, Saskia P; Fernandez-Pujals, Ana Maria; Gibson, Jude; Wigmore, Eleanor M; Boutin, Thibaud S; Hayward, Caroline; Generation Scotland; Porteous, David J; Deary, Ian J; Thomson, Pippa A; Haley, Chris S; McIntosh, Andrew M

2018-01-10

Few replicable genetic associations for Major Depressive Disorder (MDD) have been identified. Recent studies of MDD have identified common risk variants either by using a broader phenotype definition in very large samples or by reducing phenotypic and ancestral heterogeneity. We sought to ascertain whether it is more informative to maximize the sample size using data from all available cases and controls, or to use a subset of affected individuals stratified by sex or recurrence. To test this, we compared heritability estimates, genetic correlation with other traits, variance explained by MDD polygenic score, and variants identified by genome-wide meta-analysis for broad and narrow MDD classifications in two large British cohorts, Generation Scotland and UK Biobank. Genome-wide meta-analysis of MDD in males yielded one genome-wide significant locus on 3p22.3, with three genes in this region (CRTAP, GLB1, and TMPPE) demonstrating a significant association in gene-based tests. Meta-analyzed MDD, recurrent MDD and female MDD yielded equivalent heritability estimates, showed no detectable difference in association with polygenic scores, and were each genetically correlated with six health-correlated traits (neuroticism, depressive symptoms, subjective well-being, MDD, a cross-disorder phenotype and Bipolar Disorder). Whilst stratified GWAS analysis revealed a genome-wide significant locus for male MDD, the lack of independent replication and the consistent pattern of results across other MDD classifications suggest that phenotypic stratification by recurrence or sex is only weakly justified at currently available sample sizes. Based upon existing studies and our findings, the strategy of maximizing sample size is likely to provide the greater gain.
The effects of sample size on population genomic analyses--implications for the tests of neutrality.

PubMed

Subramanian, Sankar

2016-02-20

One of the fundamental measures of molecular genetic variation is Watterson's estimator (θ), which is based on the number of segregating sites. θ is an unbiased estimator only under neutrality and constant population size, and it is well known to be biased when these assumptions are violated. However, the role of sample size in modulating this bias has not been well appreciated. We examined this issue in detail using large-scale exome data and robust simulations. Our investigation revealed that sample size appreciably influences θ estimation and that this effect is much stronger in constrained genomic regions than in neutral regions. For instance, θ estimated for synonymous sites using 512 human exomes was 1.9 times higher than that obtained using 16 exomes, whereas for nonsynonymous sites in the same data the ratio was 2.5. We observed a positive correlation between the rate of increase in θ estimates (with respect to sample size) and the magnitude of selection pressure. For example, θ estimated for the nonsynonymous sites of highly constrained genes (dN/dS < 0.1) using 512 exomes was 3.6 times higher than that estimated using 16 exomes, compared with only 2 times for less constrained genes (dN/dS > 0.9). These results reveal the extent of underestimation owing to small sample sizes and emphasize the importance of sample size in estimating a number of population genomic parameters. They also have serious implications for neutrality tests such as Tajima's D and Fu and Li's D, and for statistics based on the McDonald-Kreitman test, namely the Neutrality Index and the fraction of adaptive substitutions. For instance, using 16 exomes produced a 2.4 times higher proportion of adaptive substitutions than using 512 exomes (24% vs 10%).
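Watterson's estimator divides the number of segregating sites S by a_n = Σ_{i=1}^{n-1} 1/i, where n is the number of sampled sequences. Under the standard neutral model E[S] = θ·a_n, so the estimate should not drift with n. The abstract's observation that θ rises with sample size, most strongly at constrained sites, is consistent with an excess of rare variants (such as those kept rare by purifying selection) that add segregating sites faster than a_n grows. A minimal Python sketch follows; the function names and the example numbers are illustrative and not from the study. For diploid exomes, n would typically be the number of chromosomes sampled (twice the number of individuals), an assumption the abstract does not spell out.

```python
from typing import Optional


def harmonic_a(n: int) -> float:
    """a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    if n < 2:
        raise ValueError("need at least two sequences")
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(segregating_sites: int, n: int, length_bp: Optional[int] = None) -> float:
    """Watterson's estimator theta_W = S / a_n, optionally scaled per site."""
    theta = segregating_sites / harmonic_a(n)
    return theta / length_bp if length_bp is not None else theta


# a_n grows only logarithmically with n, e.g. 16 versus 512 sequences.
print(round(harmonic_a(16), 3), round(harmonic_a(512), 3))
print(round(watterson_theta(10, 4), 4))  # 5.4545
```

Because a_n grows so slowly, any process that keeps adding rare segregating sites as more sequences are sampled pushes S / a_n upward, which is the sample-size effect the study quantifies.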
Comparisons with Caenorhabditis (approximately 100 Mb) and Drosophila (approximately 175 Mb) using flow cytometry show genome size in Arabidopsis to be approximately 157 Mb and thus approximately 25% larger than the Arabidopsis genome initiative estimate of approximately 125 Mb.

PubMed

Bennett, Michael D; Leitch, Ilia J; Price, H James; Johnston, J Spencer

2003-04-01

Recent genome sequencing papers have given genome sizes of 180 Mb for Drosophila melanogaster Iso-1 and 125 Mb for Arabidopsis thaliana Columbia. The former agrees with early cytochemical estimates, but numerous cytometric estimates of around 170 Mb imply that a genome size of 125 Mb for arabidopsis is an underestimate. In this study, nuclei of species pairs were compared directly using flow cytometry. Co-run Columbia and Iso-1 female gave a 2C peak for arabidopsis only approx. 15 % below that for drosophila, and 16C endopolyploid Columbia nuclei had approx. 15 % more DNA than 2C chicken nuclei (with >2280 Mb). Caenorhabditis elegans Bristol N2 (genome size approx. 100 Mb) co-run with Columbia or Iso-1 gave a 2C peak for drosophila approx. 75 % above that for 2C C. elegans, and a 2C peak for arabidopsis approx. 57 % above that for C. elegans. This confirms that 1C in drosophila is approx. 175 Mb and, combined with other evidence, leads us to conclude that the genome size of arabidopsis is not approx. 125 Mb, but probably approx. 157 Mb. It is likely that the discrepancy represents extra repeated sequences in unsequenced gaps in heterochromatic regions. Complete sequencing of the arabidopsis genome until no gaps remain at telomeres, nucleolar organizing regions or centromeres is still needed to provide the first precise angiosperm C-value as a benchmark calibration standard for plant genomes, and to ensure that no genes have been missed in arabidopsis, especially in centromeric regions, which are clearly larger than once imagined.
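The co-run comparisons reduce to a ratio: assuming fluorescence peak position scales linearly with DNA content and both peaks come from nuclei at the same ploidy level (2C versus 2C), a sample's genome size is the standard's size multiplied by the ratio of their peak positions. The Python sketch below reproduces the abstract's arithmetic with C. elegans (about 100 Mb) as the standard; the function name and the relative peak values (100, 175 and 157, which simply encode the reported 75% and 57% differences) are illustrative.

```python
def genome_size_from_peaks(standard_mb: float, standard_peak: float, sample_peak: float) -> float:
    """Estimate a sample's 1C genome size from co-run 2C peak positions.

    Assumes fluorescence is proportional to DNA content and that both
    peaks come from nuclei at the same ploidy level (e.g. 2C vs 2C).
    """
    if standard_peak <= 0 or sample_peak <= 0:
        raise ValueError("peak positions must be positive")
    return standard_mb * sample_peak / standard_peak


# C. elegans standard (about 100 Mb); peaks expressed relative to C. elegans = 100.
celegans_mb = 100.0
print(genome_size_from_peaks(celegans_mb, 100.0, 175.0))  # Drosophila, 75% above: 175.0
print(genome_size_from_peaks(celegans_mb, 100.0, 157.0))  # Arabidopsis, 57% above: 157.0
```

The estimate is only as good as the standard's assigned size and the linearity assumption, which is why the authors cross-check several species pairs rather than relying on a single co-run.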
FISHIS: Fluorescence In Situ Hybridization in Suspension and Chromosome Flow Sorting Made Easy

PubMed Central

Giorgi, Debora; Farina, Anna; Grosso, Valentina; Gennaro, Andrea; Ceoloni, Carla; Lucretti, Sergio

2013-01-01

The large size and complex polyploid nature of many genomes have often hampered genomics development, as is the case for several plants of high agronomic value. Isolating single chromosomes or chromosome arms by flow sorting offers a way to resolve such complexity by focusing sequencing on a discrete and self-consistent part of the whole genome. Flow sorting succeeds only when individual chromosomes differ sufficiently in size and/or base-pair composition, which is uncommon in plants. We overcame this limitation by developing a robust method for labeling isolated chromosomes, named Fluorescent In situ Hybridization In suspension (FISHIS). FISHIS employs fluorescently labeled synthetic repetitive DNA probes, which are hybridized in a wash-less procedure to chromosomes in suspension following alkaline DNA denaturation. All typical A, B and D genomes of wheat, as well as individual chromosomes from pasta (T. durum L.) and bread (T. aestivum L.) wheat, were flow-sorted at high purity after FISHIS. For the first time in eukaryotes, each individual chromosome of a diploid organism, Dasypyrum villosum (L.) Candargy, was flow-sorted regardless of its size or base-pair content. FISHIS-based chromosome sorting is a powerful and innovative flow cytogenetic tool that can develop new genomic resources for any plant species for which microsatellite DNA probes are available and high-quality chromosome suspensions can be produced. Combining FISHIS labeling and flow sorting with next-generation sequencing will extend genomics to more species, and this more powerful chromosome-based approach will make it possible to increase our knowledge of the structure, evolution and function of plant genomes for crop improvement.
It is also anticipated that this technique could help to analyze and sort animal chromosomes with peculiar cytogenetic abnormalities, such as copy number variations or cytogenetic aberrations. PMID:23469124
Normalization of Complete Genome Characteristics: Application to Evolution from Primitive Organisms to Homo sapiens.

PubMed

Sorimachi, Kenji; Okayasu, Teiji; Ohhira, Shuji

2015-04-01

Normalized nucleotide and amino acid contents of complete genome sequences can be visualized as radar charts. The shapes of these charts depict the characteristics of an organism's genome. The normalized values calculated from the genome sequence theoretically exclude experimental errors. Further, because normalization is independent of both target size and kind, this procedure is applicable not only to single genes but also to whole genomes, which consist of a huge number of different genes. In this review, we discuss the applications of the normalization of the nucleotide and predicted amino acid contents of complete genomes to the investigation of genome structure and to evolutionary research from primitive organisms to Homo sapiens. Some of the results could never have been obtained from the analysis of individual nucleotide or amino acid sequences but were revealed only after the normalization of nucleotide and amino acid contents was applied to genome research. The discovery that genome structure is homogeneous was made only after normalization methods were applied to the nucleotide or predicted amino acid contents of genome sequences. Normalization procedures are also applicable to evolutionary research. Thus, normalization of the contents of whole genomes is a useful procedure that can help to characterize organisms.
The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum.

PubMed

Zimin, Aleksey V; Puiu, Daniela; Hall, Richard; Kingan, Sarah; Clavijo, Bernardo J; Salzberg, Steven L

2017-11-01

Common bread wheat, Triticum aestivum, has one of the most complex genomes known to science, with 6 copies of each chromosome, enormous numbers of near-identical sequences scattered throughout, and an overall haploid size of more than 15 billion bases. Multiple past attempts to assemble the genome have produced assemblies that were well short of the estimated genome size. Here we report the first near-complete assembly of T. aestivum, using deep sequencing coverage from a combination of short Illumina reads and very long Pacific Biosciences reads. The final assembly contains 15 344 693 583 bases and has an N50 contig size (a length-weighted median) of 232 659 bases. This represents by far the most complete and contiguous assembly of the wheat genome to date, providing a strong foundation for future genetic studies of this important food crop. We also report how we used the recently published genome of Aegilops tauschii, the diploid ancestor of the wheat D genome, to identify 4 179 762 575 bp of T. aestivum that correspond to its D genome components. © The Author 2017. Published by Oxford University Press.
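The N50 contig size is often described loosely as an average, but it is a length-weighted median: sort contigs from longest to shortest and report the length at which the running total first reaches half of all assembled bases. A minimal Python sketch, using made-up contig lengths rather than data from the wheat assembly:

```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must not be empty")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length  # cumulative bases in contigs this long or longer
        if running >= half:
            return length


# Hypothetical contig lengths (bp), not taken from the wheat assembly
print(n50([100, 80, 60, 40, 20]))
```

Here the two longest contigs (180 bp) already exceed half of the 300 bp total, so the N50 is 80.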

Genome-wide heterogeneity of nucleotide substitution model fit.

PubMed

Arbiza, Leonardo; Patricio, Mateus; Dopazo, Hernán; Posada, David

2011-01-01

At a genomic scale, the patterns that have shaped molecular evolution are believed to be largely heterogeneous. Consequently, comparative analyses should use appropriate probabilistic substitution models that capture the main features under which different genomic regions have evolved. While efforts have concentrated on the development and understanding of model selection techniques, no descriptions of overall relative substitution model fit at the genome level have been reported. Here, we provide a characterization of best-fit substitution models across three genomic data sets including coding regions from mammals, vertebrates, and Drosophila (24,000 alignments). According to the Akaike Information Criterion (AIC), 82 of the 88 models considered were selected as the best-fit model on at least one occasion, although with very different frequencies. Most parameter estimates also varied broadly among genes. Patterns found for vertebrates and Drosophila were quite similar and often more complex than those found in mammals. Phylogenetic trees derived from models in the 95% confidence set showed much less variance and were significantly closer to the tree estimated under the best-fit model than trees derived from models outside this set. Although alternative criteria selected simpler models than the AIC, they suggested similar patterns. Altogether, our results show that at a genomic scale, different gene alignments for the same set of taxa are best explained by a large variety of substitution models, and that model choice affects different parameter estimates, including the inferred phylogenetic trees. After taking into account the differences related to sample size, our results suggest a noticeable diversity in the underlying evolutionary process. We conclude that the use of model selection techniques is important for obtaining consistent phylogenetic estimates from real data at a genomic scale.
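AIC-based model selection can be sketched in a few lines. The example assumes hypothetical log-likelihoods and free-parameter counts for three common models fitted to one alignment; branch lengths are left out of k because they are shared by all models and cancel in comparisons. The 95% confidence set is built here from cumulative Akaike weights, one standard construction; the abstract does not say exactly how the authors built theirs.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aics):
    """Convert AIC scores to Akaike weights (relative support summing to 1)."""
    best = min(aics.values())
    rel = {m: math.exp(-0.5 * (a - best)) for m, a in aics.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


def confidence_set(weights, level=0.95):
    """Smallest set of models, taken in order of decreasing weight,
    whose cumulative weight reaches `level`."""
    chosen, cumulative = [], 0.0
    for model, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(model)
        cumulative += w
        if cumulative >= level:
            break
    return chosen


# Hypothetical (ln L, free substitution parameters) for one alignment
fits = {"JC": (-5230.4, 0), "HKY": (-5105.2, 4), "GTR+G": (-5090.1, 9)}
scores = {m: aic(lnL, k) for m, (lnL, k) in fits.items()}
weights = akaike_weights(scores)
print(min(scores, key=scores.get), confidence_set(weights))
```

With these made-up numbers GTR+G wins by about 20 AIC units, so it alone carries more than 95% of the weight; closer scores would put several models in the set.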
Genome expansion via lineage splitting and genome reduction in the cicada endosymbiont Hodgkinia.

PubMed

Campbell, Matthew A; Van Leuven, James T; Meister, Russell C; Carey, Kaitlin M; Simon, Chris; McCutcheon, John P

2015-08-18

Comparative genomics from mitochondria, plastids, and mutualistic endosymbiotic bacteria has shown that the stable establishment of a bacterium in a host cell results in genome reduction. Although many highly reduced genomes from endosymbiotic bacteria are stable in gene content and genome structure, organelle genomes are sometimes characterized by dramatic structural diversity. Previous results from Candidatus Hodgkinia cicadicola, an endosymbiont of cicadas, revealed that some lineages of this bacterium had split into two new cytologically distinct yet genetically interdependent species. It was hypothesized that the long life cycle of cicadas in part enabled this unusual lineage-splitting event. Here we test this hypothesis by investigating the structure of the Ca. Hodgkinia genome in one of the longest-lived cicadas, Magicicada tredecim. We show that the Ca. Hodgkinia genome from M. tredecim has fragmented into multiple new chromosomes or genomes, with at least some remaining partitioned into discrete cells. We also show that this lineage-splitting process has resulted in a complex of Ca. Hodgkinia genomes that together total 1.1 Mb, an almost 10-fold increase in size from the hypothetical single-genome ancestor. These results parallel some examples of genome fragmentation and expansion in organelles, although the mechanisms that give rise to these extreme genome instabilities are likely different.
An Exploration into Fern Genome Space.

PubMed

Wolf, Paul G; Sessa, Emily B; Marchant, Daniel Blaine; Li, Fay-Wei; Rothfels, Carl J; Sigel, Erin M; Gitzendanner, Matthew A; Visger, Clayton J; Banks, Jo Ann; Soltis, Douglas E; Soltis, Pamela S; Pryer, Kathleen M; Der, Joshua P

2015-08-26

Ferns are one of the few remaining major clades of land plants for which a complete genome sequence is lacking. Knowledge of genome space in ferns will enable broad-scale comparative analyses of land plant genes and genomes, provide insights into genome evolution across green plants, and shed light on genetic and genomic features that characterize ferns, such as their high chromosome numbers and large genome sizes. As part of an initial exploration into fern genome space, we used a whole genome shotgun sequencing approach to obtain low-density coverage (∼0.4X to 2X) for six fern species from the Polypodiales (Ceratopteris, Pteridium, Polypodium, Cystopteris), Cyatheales (Plagiogyria), and Gleicheniales (Dipteris). We explore these data to characterize the proportion of the nuclear genome represented by repetitive sequences (including DNA transposons, retrotransposons, ribosomal DNA, and simple repeats) and protein-coding genes, and to extract chloroplast and mitochondrial genome sequences. Such initial sweeps of fern genomes can provide information useful for selecting a promising candidate fern species for whole genome sequencing. We also describe variation of genomic traits across our sample and highlight some differences and similarities in repeat structure between ferns and seed plants. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
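Low-density shotgun coverage samples only part of a genome. Under the idealized Lander-Waterman (Poisson) assumption, a genome sequenced to mean depth c has an expected fraction 1 - e^(-c) of its bases hit by at least one read. The sketch below applies that formula to the 0.4X and 2X endpoints quoted above; real data will deviate because of repeats and sequencing bias.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Average sequencing depth c = N * L / G (all lengths in bp)."""
    return n_reads * read_length / genome_size


def expected_fraction_covered(coverage):
    """Lander-Waterman expectation that a given base is hit by >= 1 read,
    assuming read start positions are uniformly (Poisson) distributed."""
    return 1 - math.exp(-coverage)


for c in (0.4, 2.0):
    print(f"{c}X depth: ~{expected_fraction_covered(c):.0%} of bases covered")
```

At 0.4X this gives about 33% of bases, and at 2X about 86%, so even the deepest sample would be expected to leave roughly one base in seven unsequenced.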
Segregation and transmission of mitochondrial markers in fusion products of the asporogenous yeast Torulopsis glabrata.

PubMed

Sriprakash, K S; Batum, C

1981-09-01

Using a protoplast fusion technique we have been able to locate to the mitochondrial genome of the asporogenous yeast Torulopsis glabrata mutations conferring resistance to oligomycin, antimycin and diuron. When two strains differing in the size of their mtDNAs were fused the mitochondrial markers from the parent with the larger mtDNA (71-91) were transmitted predominantly among the fusion products. Both genetic and physical evidence support the occurrence of recombination in the T. glabrata mitochondrial genome. Segregation of the mitochondrial genome appears to take place before the separation of the first bud from the fusion product.
The dynamics of LTR retrotransposon accumulation across 25 million years of panicoid grass evolution

PubMed Central

Estep, M C; DeBarry, J D; Bennetzen, J L

2013-01-01

Sample sequence analysis was employed to investigate the repetitive DNAs that were most responsible for the evolved variation in genome content across seven panicoid grasses with >5-fold variation in genome size and different histories of polyploidy. In all cases, the most abundant repeats were LTR retrotransposons, but the particular families that had become dominant were found to be different in the Pennisetum, Saccharum, Sorghum and Zea lineages. One element family, Huck, has been very active in all of the studied species over the last few million years. This suggests the transmission of an active or quiescent autonomous set of Huck elements to this lineage at the founding of the panicoids. Similarly, independent recent activity of Ji and Opie elements in Zea and of Leviathan elements in Sorghum and Saccharum species suggests that members of these families with exceptional activation potential were present in the genome(s) of the founders of these lineages. In a detailed analysis of the Zea lineage, the combined action of several families of LTR retrotransposons was observed to have approximately doubled the genome size of Zea luxurians relative to Zea mays and Zea diploperennis in just the last few million years. One of the LTR retrotransposon amplification bursts in Zea may have been initiated by polyploidy, but the great majority of transposable element activations were not. Instead, the results suggest random activation of a few or many LTR retrotransposon families in particular lineages over evolutionary time, with some families especially prone to future activation and hyper-amplification. PMID:23321774
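Sample sequence analysis rests on simple proportionality: if reads are a uniform random sample of the genome, the share of reads assigned to a repeat family approximates that family's share of the genome. The sketch below converts such read counts into megabases; the family names come from the abstract, but the read counts, total and genome size are hypothetical.

```python
def repeat_content_mb(read_hits, total_reads, genome_size_mb):
    """Estimate the megabases contributed by each repeat family.

    Assumes the sampled reads are a uniform random draw from the genome, so the
    fraction of reads assigned to a family approximates its genomic fraction.
    """
    return {
        family: hits / total_reads * genome_size_mb
        for family, hits in read_hits.items()
    }


# Hypothetical read counts and genome size, for illustration only
hits = {"Huck": 1200, "Ji": 3000, "Opie": 2500}
content = repeat_content_mb(hits, total_reads=20000, genome_size_mb=2000)
for family, mb in sorted(content.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{family}: {mb:.0f} Mb")
```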
Ecological Genomics of Marine Picocyanobacteria

PubMed Central

Scanlan, D. J.; Ostrowski, M.; Mazard, S.; Dufresne, A.; Garczarek, L.; Hess, W. R.; Post, A. F.; Hagemann, M.; Paulsen, I.; Partensky, F.

2009-01-01

Summary: Marine picocyanobacteria of the genera Prochlorococcus and Synechococcus numerically dominate the picophytoplankton of the world ocean, making a key contribution to global primary production. Prochlorococcus was isolated around 20 years ago and is probably the most abundant photosynthetic organism on Earth. The genus comprises specific ecotypes which are phylogenetically distinct and differ markedly in their photophysiology, allowing growth over a broad range of light and nutrient conditions within the 45°N to 40°S latitudinal belt that they occupy. Synechococcus and Prochlorococcus are closely related, together forming a discrete picophytoplankton clade, but are distinguishable by their possession of dissimilar light-harvesting apparatuses and differences in cell size and elemental composition. Synechococcus strains have a ubiquitous oceanic distribution compared to that of Prochlorococcus strains and are characterized by phylogenetically discrete lineages with a wide range of pigmentation. In this review, we put our current knowledge of marine picocyanobacterial genomics into an environmental context and present previously unpublished genomic information arising from extensive genomic comparisons in order to provide insights into the adaptations of these marine microbes to their environment and how they are reflected at the genomic level. PMID:19487728
Evolution and dynamics of megaplasmids with genome sizes larger than 100 kb in the Bacillus cereus group.

PubMed

Zheng, Jinshui; Peng, Donghai; Ruan, Lifang; Sun, Ming

2013-12-02

Plasmids play a crucial role in the evolution of bacterial genomes by mediating horizontal gene transfer. However, the origin and evolution of most plasmids remain unclear, especially for megaplasmids. Strains of the Bacillus cereus group contain up to 13 plasmids with genome sizes ranging from 2 kb to 600 kb, and thus can be used to study plasmid dynamics and evolution. This work studied the origin and evolution of 31 B. cereus group megaplasmids (>100 kb), focusing on the most conserved regions on plasmids, the minireplicons. Sixty-five putative minireplicons were identified and classified into six types on the basis of proteins that are essential for replication. Twenty-nine of the 31 megaplasmids contained two or more minireplicons. Phylogenetic analysis of the protein sequences showed that different minireplicons on the same megaplasmid have different evolutionary histories. Therefore, we speculated that these megaplasmids are the result of fusion of smaller plasmids. All plasmids of a bacterial strain must be compatible. In megaplasmids of the B. cereus group, individual minireplicons of different megaplasmids in the same strain belong to different types or subtypes. Thus, the subtypes of the minireplicons they contain may determine the incompatibilities of megaplasmids. A broader analysis of all 1285 bacterial plasmids with putative known minireplicons whose complete genome sequences were available from GenBank revealed that 34% (443 plasmids) of the plasmids have two or more minireplicons. This indicates that plasmid fusion events are common among bacterial plasmids. Megaplasmids of the B. cereus group are fusions of smaller plasmids, and the fusion of plasmids likely occurs frequently in the B. cereus group and in other bacterial taxa. Plasmid fusion may be one of the major mechanisms for the formation of novel megaplasmids in the evolution of bacteria.
Position-based scanning for comparative genomics and identification of genetic islands in Haemophilus influenzae type b.

PubMed

Bergman, Nicholas H; Akerley, Brian J

2003-03-01

Bacteria exhibit extensive genetic heterogeneity within species. In many cases, these differences account for virulence properties unique to specific strains. Several such loci have been discovered in the genome of the type b serotype of Haemophilus influenzae, a human pathogen able to cause meningitis, pneumonia, and septicemia. Here we report application of a PCR-based scanning procedure to compare the genome of a virulent type b (Hib) strain with that of the laboratory-passaged Rd KW20 strain for which a complete genome sequence is available. We have identified seven DNA segments or H. influenzae genetic islands (HiGIs) present in the type b genome and absent from the Rd genome. These segments vary in size and content and show signs of horizontal gene transfer in that their percent G+C content differs from that of the rest of the H. influenzae genome, they contain genes similar to those found on phages or other mobile elements, or they are flanked by DNA repeats. Several of these loci represent potential pathogenicity islands, because they contain genes likely to mediate interactions with the host. These newly identified genetic islands provide areas of investigation into both the evolution and pathogenesis of H. influenzae. In addition, the genome scanning approach developed to identify these islands provides a rapid means to compare the genomes of phenotypically diverse bacterial strains once the genome sequence of one representative strain has been determined.
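Compositional screening of the kind mentioned above (G+C content that differs from the rest of the genome) can be sketched as a sliding-window scan. This is an illustrative compositional signal, not the authors' PCR-based scanning procedure; the window size, step and threshold are arbitrary assumptions, and the toy genome is synthetic.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous A/C/G/T bases (0.0 if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc_outlier_windows(genome, window, step, threshold):
    """Return (start, gc) for windows whose G+C fraction differs from the
    genome-wide value by more than `threshold` (absolute difference)."""
    background = gc_fraction(genome)
    outliers = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - background) > threshold:
            outliers.append((start, gc))
    return outliers


# Synthetic genome: balanced backbone with an AT-rich 100-bp insert at 400-500
genome = "ACGT" * 100 + "AAAT" * 25 + "ACGT" * 100
print(gc_outlier_windows(genome, window=100, step=100, threshold=0.1))
```

Only the AT-rich insert is flagged, as `[(400, 0.0)]`; in practice a compositional outlier is a hint of horizontal transfer, to be weighed alongside other evidence such as mobile-element genes or flanking repeats.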
Insights from the Genome Sequence of Acidovorax citrulli M6, a Group I Strain of the Causal Agent of Bacterial Fruit Blotch of Cucurbits

PubMed Central

Eckshtain-Levi, Noam; Shkedy, Dafna; Gershovits, Michael; Da Silva, Gustavo M.; Tamir-Ariel, Dafna; Walcott, Ron; Pupko, Tal; Burdman, Saul

2016-01-01

Acidovorax citrulli is a seedborne bacterium that causes bacterial fruit blotch of cucurbit plants including watermelon and melon. A. citrulli strains can be divided into two major groups based on DNA fingerprint analyses and biochemical properties. Group I strains have been generally isolated from non-watermelon cucurbits, while group II strains are closely associated with watermelon. In the present study, we report the genome sequence of M6, a group I model A. citrulli strain, isolated from melon. We used comparative genome analysis to investigate differences between the genome of strain M6 and the genome of the group II model strain AAC00-1. The draft genome sequence of A. citrulli M6 comprises 139 contigs, with an overall approximate size of 4.85 Mb. The genome of M6 is ∼500 kb shorter than that of strain AAC00-1. Comparative analysis revealed that this size difference is mainly explained by eight fragments, ranging from ∼35–120 kb and distributed throughout the AAC00-1 genome, which are absent in the M6 genome. In agreement with this finding, while AAC00-1 was found to possess 532 open reading frames (ORFs) that are absent in strain M6, only 123 ORFs in M6 were absent in AAC00-1. Most of these M6 ORFs encode hypothetical proteins, and most of them were also detected in two recently sequenced group I strains, tw6 and pslb65. Further analyses by PCR assays and coverage analyses with other A. citrulli strains support the notion that some of these fragments, or significant portions of them, discriminate between group I and group II strains of A. citrulli. Moreover, GC content, effective number of codons values and cluster of orthologs’ analyses indicate that these fragments were introduced into group II strains by horizontal gene transfer events. Our study reports the genome sequence of a model group I strain of A. citrulli, one of the most important pathogens of cucurbits. It also provides the first comprehensive comparison at the genomic level between the two major groups of strains of this pathogen. PMID:27092114
A first AFLP-Based Genetic Linkage Map for Brine Shrimp Artemia franciscana and Its Application in Mapping the Sex Locus

PubMed Central

De Vos, Stephanie; Bossier, Peter; Van Stappen, Gilbert; Vercauteren, Ilse; Sorgeloos, Patrick; Vuylsteke, Marnik

2013-01-01

We report on the construction of sex-specific linkage maps, the identification of sex-linked markers and the genome size estimation for the brine shrimp Artemia franciscana. Overall, from the analysis of 433 AFLP markers segregating in a full-sib family of 112 individuals, we identified 21 male and 22 female linkage groups (2n = 42), covering 1,041 and 1,313 cM respectively. Fifteen putatively homologous linkage groups, including the sex linkage groups, were identified between the female and male linkage maps. Eight sex-linked AFLP marker alleles were inherited from the female parent, supporting the hypothesis of a WZ–ZZ sex-determining system. The haploid Artemia genome size was estimated at 0.93 Gb by flow cytometry. The resulting Artemia linkage maps provide the basis for further fine mapping and exploration of the sex-determining region and are a possible marker resource for mapping genomic loci underlying phenotypic differences among Artemia species. PMID:23469207
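Flow-cytometric genome size estimates usually scale the sample's G1 (2C) fluorescence peak against that of an internal reference standard of known DNA content, then convert picograms to base pairs with the widely used factor 1 pg ≈ 978 Mbp (Doležel et al. 2003). The peak positions and standard below are hypothetical; the abstract does not state which standard was used for Artemia.

```python
PG_TO_MBP = 978  # widely used conversion: 1 pg of DNA ~ 978 Mbp


def sample_2c_pg(sample_peak, standard_peak, standard_2c_pg):
    """2C DNA content of the sample (pg), assuming fluorescence is proportional
    to DNA content and both G1 peaks were measured in the same run."""
    return sample_peak / standard_peak * standard_2c_pg


def haploid_size_gb(two_c_pg):
    """Convert a 2C value in pg to a haploid (1C) genome size in Gb."""
    return two_c_pg / 2 * PG_TO_MBP / 1000


# Hypothetical peak channels and standard, chosen only to show the arithmetic
two_c = sample_2c_pg(sample_peak=120.0, standard_peak=100.0, standard_2c_pg=1.5)
print(round(two_c, 3), round(haploid_size_gb(two_c), 4))
```

Read in reverse with the same factor, a 0.93 Gb haploid genome corresponds to roughly 0.95 pg of DNA (930 / 978).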
Chromosomal Inversions between Human and Chimpanzee Lineages Caused by Retrotransposons

PubMed Central

Lee, Jungnam; Han, Kyudong; Meyer, Thomas J.; Kim, Heui-Soo; Batzer, Mark A.

2008-01-01

The long interspersed element-1 (LINE-1 or L1) and Alu elements are the most abundant mobile elements, comprising 21% and 11% of the human genome, respectively. Since the divergence of the human and chimpanzee lineages, these elements have vigorously created chromosomal rearrangements, causing genomic differences between humans and chimpanzees by either increasing or decreasing the size of the genome. Here, we report an exotic mechanism, retrotransposon recombination-mediated inversion (RRMI), that usually does not alter the amount of genomic material present. Through the comparison of the human and chimpanzee draft genome sequences, we identified 252 inversions whose respective inversion junctions can clearly be characterized. Our results suggest that L1 and Alu elements cause chromosomal inversions by either forming a secondary structure or providing a fragile site for double-strand breaks. The detailed analysis of the inversion breakpoints showed that L1 and Alu elements are responsible for at least 44% of the 252 inversion loci between the human and chimpanzee lineages, including 49 RRMI loci. Among them, three RRMI loci inverted exonic regions in known genes, which implicates this mechanism in generating the genomic and phenotypic differences between the human and chimpanzee lineages. This study is the first comprehensive analysis of mobile element-based inversion breakpoints between the human and chimpanzee lineages, and highlights their role in primate genome evolution. PMID:19112500
Chompy: an infestation of MITE-like repetitive elements in the crocodilian genome.

PubMed

Ray, David A; Hedges, Dale J; Herke, Scott W; Fowlkes, Justin D; Barnes, Erin W; LaVie, Daniel K; Goodwin, Lindsey M; Densmore, Llewellyn D; Batzer, Mark A

2005-12-05

Interspersed repeats are a major component of most eukaryotic genomes and have an impact on genome size and stability, but the repetitive element landscape of crocodilian genomes has not yet been fully investigated. In this report, we provide the first detailed characterization of an interspersed repeat element in any crocodilian genome. Chompy is a putative miniature inverted-repeat transposable element (MITE) family initially recovered from the genome of Alligator mississippiensis (American alligator) but also present in the genomes of Crocodylus moreletii (Morelet's crocodile) and Gavialis gangeticus (Indian gharial). The element has all of the hallmarks of MITEs including terminal inverted repeats, possible target site duplications, and a tendency to form secondary structures. We estimate the copy number in the alligator genome to be approximately 46,000 copies. As a result of their size and unique properties, Chompy elements may provide a useful source of genomic variation for crocodilian comparative genomics.
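Terminal inverted repeats, one of the MITE hallmarks listed above, mean that the first bases of an element match the reverse complement of its last bases. A minimal check, assuming a fixed TIR length and an optional mismatch allowance (both hypothetical parameters), run on a synthetic toy sequence rather than an actual Chompy element:

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def has_terminal_inverted_repeats(element, tir_len, max_mismatches=0):
    """True if the first `tir_len` bases match the reverse complement of the
    last `tir_len` bases, allowing up to `max_mismatches` differences."""
    if len(element) < 2 * tir_len:
        return False
    left = element[:tir_len].upper()
    right_rc = reverse_complement(element[-tir_len:]).upper()
    mismatches = sum(a != b for a, b in zip(left, right_rc))
    return mismatches <= max_mismatches


# Synthetic toy element: 10-bp TIR, internal spacer, reverse-complemented TIR
tir = "GGCCAGTTTA"
toy = tir + "ACGTACGTACGTACGT" + reverse_complement(tir)
print(has_terminal_inverted_repeats(toy, 10))
```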
Host Genes and Resistance/Sensitivity to Military Priority Pathogens

DTIC Science & Technology

2011-06-01

tularensis (FT Schu S4) that yields a significantly different outcome to infection in B6 and D2 mice. Both strains succumb to infection at essentially the...Figure 2). Some of the group sizes are too small to yield statistically relevant findings, and additional studies will be performed with these strains as...generated approximately 100-fold coverage of the DBA/2J genome (Table 2) and sequenced 99.96% of the DBA/2J genome (excluding gaps in the reference
Genomes in turmoil: quantification of genome dynamics in prokaryote supergenomes.

PubMed

Puigbò, Pere; Lobkovsky, Alexander E; Kristensen, David M; Wolf, Yuri I; Koonin, Eugene V

2014-08-21

Genomes of bacteria and archaea (collectively, prokaryotes) appear to exist in incessant flux, expanding via horizontal gene transfer and gene duplication, and contracting via gene loss. However, the actual rates of genome dynamics and relative contributions of different types of event across the diversity of prokaryotes are largely unknown, as are the sizes of microbial supergenomes, i.e. pools of genes that are accessible to the given microbial species. We performed a comprehensive analysis of the genome dynamics in 35 groups (34 bacterial and one archaeal) of closely related microbial genomes using a phylogenetic birth-and-death maximum likelihood model to quantify the rates of gene family gain and loss, as well as expansion and reduction. The results show that loss of gene families dominates the evolution of prokaryotes, occurring at approximately three times the rate of gain. The rates of gene family expansion and reduction are typically seven and twenty times less than the gain and loss rates, respectively. Thus, the prevailing mode of evolution in bacteria and archaea is genome contraction, which is partially compensated by the gain of new gene families via horizontal gene transfer. However, the rates of gene family gain, loss, expansion and reduction vary within wide ranges, with the most stable genomes showing rates about 25 times lower than the most dynamic genomes. For many groups, the supergenome estimated from the fraction of repetitive gene family gains includes about tenfold more gene families than the typical genome in the group although some groups appear to have vast, 'open' supergenomes. Reconstruction of evolution for groups of closely related bacteria and archaea reveals an extremely rapid and highly variable flux of genes in evolving microbial genomes, demonstrates that extensive gene loss and horizontal gene transfer leading to innovation are the two dominant evolutionary processes, and yields robust estimates of the supergenome size.
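The gain and loss rates above can be illustrated with the simplest presence/absence model: a two-state continuous-time Markov chain for a single gene family. This is a deliberate simplification; the authors' birth-and-death model also tracks expansion and reduction in family size, which this sketch ignores. Rates are in arbitrary units per unit branch length, and the 3:1 loss-to-gain ratio only mirrors the approximate figure in the abstract.

```python
import math


def presence_probability(gain, loss, t, present_at_start):
    """Probability that a gene family is present after branch length t under a
    two-state (absent <-> present) continuous-time Markov chain.

    gain: rate of absent -> present; loss: rate of present -> absent.
    """
    total = gain + loss
    stationary = gain / total          # long-run probability of presence
    decay = math.exp(-total * t)       # memory of the starting state fades
    if present_at_start:
        return stationary + (1 - stationary) * decay
    return stationary * (1 - decay)


# Loss about three times gain (arbitrary units)
for t in (0.1, 1.0, 10.0):
    print(t, round(presence_probability(1.0, 3.0, t, True), 3))
```

As t grows the probability approaches gain / (gain + loss) = 0.25, so under these rates a family present at the root is, in the long run, more likely to be lost than retained.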
Comparison of the genomic sequence of the microminipig, a novel breed of swine, with the genomic database for conventional pig.

PubMed

Miura, Naoki; Kucho, Ken-Ichi; Noguchi, Michiko; Miyoshi, Noriaki; Uchiumi, Toshiki; Kawaguchi, Hiroaki; Tanimoto, Akihide

2014-01-01

The microminipig, which weighs less than 10 kg at an early stage of maturity, has been reported as a potential experimental model animal. Its extremely small size and other distinct characteristics suggest the possibility of a number of differences between the genome of the microminipig and that of conventional pigs. In this study, we analyzed the genomes of two healthy microminipigs using a SOLiD™ next-generation sequencing system. We then compared the obtained genomic sequences with a genomic database for the domestic pig (Sus scrofa). The mapping coverage of sequenced tags from the microminipig to conventional pig genomic sequences was greater than 96%, and we detected no clear, substantial genomic variance from these data. The results may indicate that the distinct characteristics of the microminipig derive from small-scale alterations in the genome, such as Single Nucleotide Polymorphisms or translational modifications, rather than large-scale deletion or insertion polymorphisms. Further investigation of the entire genomic sequence of the microminipig with methods enabling deeper coverage is required to elucidate the genetic basis of its distinct phenotypic traits. Copyright © 2014 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved.
Double-strand breaks in genome-sized DNA caused by mechanical stress under mixing: Quantitative evaluation through single-molecule observation

NASA Astrophysics Data System (ADS)

Kikuchi, Hayato; Nose, Keiji; Yoshikawa, Yuko; Yoshikawa, Kenichi

2018-06-01

It is becoming increasingly apparent that changes in the higher-order structure of genome-sized DNA molecules longer than several tens of kbp play important roles in the self-control of genome activity in living cells. Unfortunately, it has been rather difficult to prepare genome-sized DNA molecules without damage or fragmentation. Here, we evaluated the degree of double-strand breaks (DSBs) caused by mechanical mixing through single-molecule observation with fluorescence microscopy. The results show that DNA breaks are most significant during the first second after the initiation of mechanical agitation. Based on this observation, we propose a novel mixing procedure that significantly decreases DSBs.
A dictionary based informational genome analysis

PubMed Central

2012-01-01

Background: In the post-genomic era, several methods of computational genomics are emerging to understand how the whole information is structured within genomes. The literature of the last five years describes several alignment-free methods that have arisen as alternative metrics for the dissimilarity of biological sequences. Among these, recent approaches are based on the empirical frequencies of DNA k-mers in whole genomes. Results: Any set of words (factors) occurring in a genome provides a genomic dictionary. About sixty genomes were analyzed by means of informational indexes based on genomic dictionaries, where a systemic view replaces local sequence analysis. A software prototype applying the methodology outlined here carried out computations on genomic data. We computed informational indexes and built genomic dictionaries of different sizes, along with their frequency distributions. The software performed three main tasks: computation of informational indexes, storage of these in a database, and index analysis and visualization. Validation was done by investigating the genomes of various organisms. A systematic analysis of genomic repeats of several lengths, which is of keen interest in biology (for example, to identify over-represented functional sequences such as promoters), was discussed, and a method to define synthetic genetic networks was suggested. Conclusions: We introduced a methodology based on dictionaries, and an efficient motif-finding software application for comparative genomics. This approach could be extended along many lines of investigation, for example by exporting it to other contexts of computational genomics, as a basis for discriminating genomic pathologies. PMID:22985068
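A genomic dictionary in this sense can be represented as a table of k-mer counts, from which informational indexes are computed. The sketch below uses the Shannon entropy of the empirical k-mer distribution as one example index; the abstract does not specify which indexes the authors' software computes, so this choice is an assumption.

```python
import math
from collections import Counter


def kmer_dictionary(genome, k):
    """Map every k-mer (word of length k) occurring in `genome` to its count."""
    genome = genome.upper()
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def empirical_entropy(dictionary):
    """Shannon entropy, in bits, of the empirical k-mer frequency distribution."""
    total = sum(dictionary.values())
    return -sum((c / total) * math.log2(c / total) for c in dictionary.values())


# Synthetic sequence; build dictionaries of increasing word length
genome = "ACGTACGTTTGACCA"
for k in (1, 2, 3):
    d = kmer_dictionary(genome, k)
    print(k, len(d), round(empirical_entropy(d), 3))
```

Comparing how the number of distinct words and the entropy grow with k is one simple way to contrast the repetitive structure of different genomes without aligning them.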
Mapping copy number variation by population-scale genome sequencing.

PubMed

Mills, Ryan E; Walter, Klaudia; Stewart, Chip; Handsaker, Robert E; Chen, Ken; Alkan, Can; Abyzov, Alexej; Yoon, Seungtai Chris; Ye, Kai; Cheetham, R Keira; Chinwalla, Asif; Conrad, Donald F; Fu, Yutao; Grubert, Fabian; Hajirasouliha, Iman; Hormozdiari, Fereydoun; Iakoucheva, Lilia M; Iqbal, Zamin; Kang, Shuli; Kidd, Jeffrey M; Konkel, Miriam K; Korn, Joshua; Khurana, Ekta; Kural, Deniz; Lam, Hugo Y K; Leng, Jing; Li, Ruiqiang; Li, Yingrui; Lin, Chang-Yun; Luo, Ruibang; Mu, Xinmeng Jasmine; Nemesh, James; Peckham, Heather E; Rausch, Tobias; Scally, Aylwyn; Shi, Xinghua; Stromberg, Michael P; Stütz, Adrian M; Urban, Alexander Eckehart; Walker, Jerilyn A; Wu, Jiantao; Zhang, Yujun; Zhang, Zhengdong D; Batzer, Mark A; Ding, Li; Marth, Gabor T; McVean, Gil; Sebat, Jonathan; Snyder, Michael; Wang, Jun; Ye, Kenny; Eichler, Evan E; Gerstein, Mark B; Hurles, Matthew E; Lee, Charles; McCarroll, Steven A; Korbel, Jan O

2011-02-03

Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.
De Novo Assembly of the Donkey White Blood Cell Transcriptome and a Comparative Analysis of Phenotype-Associated Genes between Donkeys and Horses

PubMed Central

Xie, Feng-Yun; Feng, Yu-Long; Wang, Hong-Hui; Ma, Yun-Feng; Yang, Yang; Wang, Yin-Chao; Shen, Wei; Pan, Qing-Jie; Yin, Shen; Sun, Yu-Jiang; Ma, Jun-Yu

2015-01-01

Prior to the mechanization of agriculture and labor-intensive tasks, humans used donkeys (Equus africanus asinus) for farm work and packing. However, with increasing mechanization, donkeys have increasingly been raised for meat, milk, and fur in China. To maintain the development of the donkey industry, breeding programs should focus on traits related to these new uses. Compared to conventional marker-assisted breeding plans, genome- and transcriptome-based selection methods are more efficient and effective. To analyze the coding genes of the donkey genome, we assembled the transcriptome of donkey white blood cells de novo. Using transcriptomic deep-sequencing data, we identified 264,714 distinct donkey unigenes and predicted 38,949 protein fragments. We annotated the donkey unigenes by BLAST searches against the non-redundant (NR) protein database. We also compared the donkey protein sequences with those of the horse (E. caballus) and wild horse (E. przewalskii), and linked the donkey protein fragments with mammalian phenotypes. Because the outer ear sizes of donkeys and horses clearly differ, we compared the outer ear size-associated proteins in donkeys and horses. We identified three ear size-associated proteins, HIC1, PRKRA, and KMT2A, with sequence differences among the donkey, horse, and wild horse loci. Since the donkey genome sequence has not been released, the de novo assembled donkey transcriptome is helpful for preliminary investigations of donkey cultivars and for genetic improvement. PMID:26208029

De Novo Assembly of the Donkey White Blood Cell Transcriptome and a Comparative Analysis of Phenotype-Associated Genes between Donkeys and Horses.

PubMed

Xie, Feng-Yun; Feng, Yu-Long; Wang, Hong-Hui; Ma, Yun-Feng; Yang, Yang; Wang, Yin-Chao; Shen, Wei; Pan, Qing-Jie; Yin, Shen; Sun, Yu-Jiang; Ma, Jun-Yu

2015-01-01

Prior to the mechanization of agriculture and labor-intensive tasks, humans used donkeys (Equus africanus asinus) for farm work and packing. However, with increasing mechanization, donkeys have increasingly been raised for meat, milk, and fur in China. To maintain the development of the donkey industry, breeding programs should focus on traits related to these new uses. Compared to conventional marker-assisted breeding plans, genome- and transcriptome-based selection methods are more efficient and effective. To analyze the coding genes of the donkey genome, we assembled the transcriptome of donkey white blood cells de novo. Using transcriptomic deep-sequencing data, we identified 264,714 distinct donkey unigenes and predicted 38,949 protein fragments. We annotated the donkey unigenes by BLAST searches against the non-redundant (NR) protein database. We also compared the donkey protein sequences with those of the horse (E. caballus) and wild horse (E. przewalskii), and linked the donkey protein fragments with mammalian phenotypes. Because the outer ear sizes of donkeys and horses clearly differ, we compared the outer ear size-associated proteins in donkeys and horses. We identified three ear size-associated proteins, HIC1, PRKRA, and KMT2A, with sequence differences among the donkey, horse, and wild horse loci. Since the donkey genome sequence has not been released, the de novo assembled donkey transcriptome is helpful for preliminary investigations of donkey cultivars and for genetic improvement.
Pan-genome analysis of Aeromonas hydrophila, Aeromonas veronii and Aeromonas caviae indicates phylogenomic diversity and greater pathogenic potential for Aeromonas hydrophila.

PubMed

Ghatak, Sandeep; Blom, Jochen; Das, Samir; Sanjukta, Rajkumari; Puro, Kekungu; Mawlong, Michael; Shakuntala, Ingudam; Sen, Arnab; Goesmann, Alexander; Kumar, Ashok; Ngachan, S V

2016-07-01

Aeromonas species are important pathogens of fishes and aquatic animals and are capable of infecting humans and other animals via food. Due to the paucity of pan-genomic studies on aeromonads, the present study was undertaken to analyse the pan-genome of three clinically important Aeromonas species (A. hydrophila, A. veronii, A. caviae). Pan-genome analysis revealed an open pan-genome for all three species, with pan-genome sizes of 9181, 7214 and 6884 genes for A. hydrophila, A. veronii and A. caviae, respectively. The core-genome:pan-genome ratio (RCP) indicated greater genomic diversity for A. hydrophila; interestingly, RCP emerged as an effective indicator of genomic diversity that could possibly be extended to other organisms. Phylogenomic network analysis highlighted the influence of homologous recombination and lateral gene transfer in the evolution of Aeromonas spp. Prediction of virulence factors indicated no significant difference among the three species, although analysis of pathogenic potential and acquired antimicrobial resistance genes revealed greater hazards from A. hydrophila. In conclusion, the present study highlighted the usefulness of whole-genome analyses for inferring evolutionary cues for Aeromonas species, which indicated considerable phylogenomic diversity for A. hydrophila and hitherto unknown genomic evidence for the greater pathogenic potential of A. hydrophila compared with A. veronii and A. caviae.
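The core-genome:pan-genome ratio can be sketched in Python as below, assuming genes have already been clustered into gene families with shared identifiers (the clustering step and all family names here are hypothetical). The core genome is taken as the families present in every genome and the pan-genome as those present in at least one, so a lower RCP means a larger share of accessory genes and hence greater gene-content diversity.

```python
from typing import Iterable, Set, Tuple


def core_and_pan(genomes: Iterable[Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Core genome = families in every genome; pan-genome = families in any genome."""
    sets = [set(g) for g in genomes]
    if not sets:
        raise ValueError("need at least one genome")
    return set.intersection(*sets), set.union(*sets)


def rcp(genomes: Iterable[Set[str]]) -> float:
    """Core-genome:pan-genome ratio; lower values indicate greater diversity."""
    core, pan = core_and_pan(genomes)
    return len(core) / len(pan)


if __name__ == "__main__":
    # Hypothetical gene-family IDs for three strains of one species.
    strains = [{"fam1", "fam2", "fam3"}, {"fam1", "fam2", "fam4"}, {"fam1", "fam2", "fam5"}]
    print(rcp(strains))  # 2 core / 5 pan families = 0.4
```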
Estimation of genomic prediction accuracy from reference populations with varying degrees of relationship.

PubMed

Lee, S Hong; Clark, Sam; van der Werf, Julius H J

2017-01-01

Genomic prediction is emerging in a wide range of fields, including animal and plant breeding, risk prediction in human precision medicine, and forensics. It is desirable to establish a theoretical framework for genomic prediction accuracy when the reference data consist of information sources with varying degrees of relationship to the target individuals. In genomic prediction, a reference set can contain close and distant relatives as well as 'unrelated' individuals from the wider population. The various sources of information were modeled as different populations with different effective population sizes (Ne). Both the effective number of chromosome segments (Me) and Ne are considered to be a function of the data used for prediction. We validate our theory with analyses of simulated as well as real data, and illustrate that the variation in genomic relationships with the target is a predictor of the information content of the reference set. With a similar amount of data available for each source, we show that close relatives can have a substantially larger effect on genomic prediction accuracy than less closely related individuals. We also illustrate that when prediction relies on closer relatives, there is less improvement in prediction accuracy with an increase in training data or marker panel density. We release software that can estimate the expected prediction accuracy and power when combining different reference sources with various degrees of relationship to the target, which is useful when planning genomic prediction (before or after collecting data) in animal, plant and human genetics.
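The link between relatedness, Me and accuracy can be illustrated with the widely used single-population approximation of Daetwyler et al. (2008), r = sqrt(N h² / (N h² + Me)), combined with Goddard's (2009) approximation Me = 2 Ne L / ln(4 Ne L), with genome length L in Morgans. This is a sketch of those classic formulas, not the multi-source framework or the software described in the abstract; treating a smaller Ne as a stand-in for a reference set of closer relatives is an assumption made only for illustration, and the parameter values are arbitrary.

```python
import math


def effective_segments(ne: float, genome_length_morgans: float) -> float:
    """Goddard (2009) approximation: Me = 2*Ne*L / ln(4*Ne*L)."""
    x = ne * genome_length_morgans
    return 2 * x / math.log(4 * x)


def expected_accuracy(n_records: float, h2: float, me: float) -> float:
    """Daetwyler et al. (2008): r = sqrt(N*h2 / (N*h2 + Me))."""
    nh2 = n_records * h2
    return math.sqrt(nh2 / (nh2 + me))


if __name__ == "__main__":
    # Arbitrary illustrative values: 1000 reference records, h2 = 0.5, 30 Morgans.
    for ne in (50, 200, 1000):
        me = effective_segments(ne, 30.0)
        print(ne, round(me), round(expected_accuracy(1000, 0.5, me), 3))
```

A smaller Ne yields fewer effective segments to estimate, so the same amount of data gives a higher expected accuracy, in line with the abstract's point about close relatives.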
Genome sequencing of the winged midge, Parochlus steinenii, from the Antarctic Peninsula.

PubMed

Kim, Sanghee; Oh, Mijin; Jung, Woongsic; Park, Joonho; Choi, Han-Gu; Shin, Seung Chul

2017-03-01

In the Antarctic, only two species of Chironomidae occur naturally: the wingless midge, Belgica antarctica, and the winged midge, Parochlus steinenii. B. antarctica is an extremophile with unusual adaptations. The larvae of B. antarctica are desiccation- and freeze-tolerant and the adults are wingless. Recently, the compact genome of B. antarctica was reported; it was the first Antarctic eukaryote to be sequenced. Although P. steinenii occurs naturally in the Antarctic alongside B. antarctica, the larvae of P. steinenii are cold-tolerant but not freeze-tolerant and the adults are winged. Differences in adaptations in the Antarctic midges are interesting in terms of evolutionary processes within an extreme environment. Herein, we provide the genome of another Antarctic midge to help elucidate the evolution of these species. The draft genome of P. steinenii had a total size of 138 Mbp, comprising 9513 contigs with an N50 contig size of 34,110 bp and a GC content of 32.2%. Overall, 13,468 genes were predicted using the MAKER annotation pipeline, and gene ontology assigned a function to 10,801 (80.2%) of the predicted genes. Compared with the assembled genome of B. antarctica, that of P. steinenii was approximately 50 Mbp longer with 6.2-fold more repeat sequences, whereas its gene regions were similarly compact. We present an annotated draft genome of the Antarctic midge, P. steinenii. The genomes of P. steinenii and B. antarctica will aid in the elucidation of evolution in harsh environments and provide new resources for functional genomic analyses of the order Diptera. © The Authors 2017. Published by Oxford University Press.
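The assembly statistics quoted above (N50 contig size and GC content) can be computed as in this minimal Python sketch. It uses the common definition of N50 as the contig length L such that contigs of length L or longer contain at least half of the assembled bases; the toy lengths are made up, and individual assemblers may differ in tie handling and in how ambiguous bases are treated.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the assembly."""
    lengths = sorted(lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    raise ValueError("empty assembly")


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__":
    contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy contig lengths
    print(n50(contigs))  # 8: 10 + 9 + 8 = 27 = half of 54
```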
Gene family evolution: an in-depth theoretical and simulation analysis of non-linear birth-death-innovation models.

PubMed

Karev, Georgy P; Wolf, Yuri I; Berezovskaya, Faina S; Koonin, Eugene V

2004-09-09

The size distribution of gene families in a broad range of genomes is well approximated by a generalized Pareto function. Evolution of ensembles of gene families can be described with Birth, Death, and Innovation Models (BDIMs). Analysis of the properties of different versions of BDIMs has the potential to reveal important features of genome evolution. In this work, we extend our previous analysis of stochastic BDIMs. In addition to the previously examined rational BDIMs, we introduce potentially more realistic logistic BDIMs, in which birth/death rates are limited for the largest families, and show that their properties are similar to those of models that include no such limitation. We show that the mean time required for the formation of the largest gene families detected in eukaryotic genomes is limited by the mean number of duplications per gene and does not increase indefinitely with the model degree. Instead, this time reaches a minimum value, which corresponds to a non-linear rational BDIM with a degree of approximately 2.7. Even for this BDIM, the mean time of the largest family formation is orders of magnitude greater than any realistic estimates based on the timescale of life's evolution. We employed the embedding chains technique to estimate the expected number of elementary evolutionary events (gene duplications and deletions) preceding the formation of gene families of the observed size and found that the mean number of events exceeds the family size by orders of magnitude, suggesting a highly dynamic process of genome evolution. The variance of the time required for the formation of the largest families was found to be extremely large, with a coefficient of variation > 1. This indicates that some gene families might grow much faster than the mean rate, such that the minimal time required for family formation is more relevant for a realistic representation of genome evolution than the mean time. We determined this minimal time using Monte Carlo simulations of family growth from an ensemble of simultaneously evolving singletons. In these simulations, the time elapsed before the formation of the largest family was much shorter than the estimated mean time and was compatible with the timescale of evolution of eukaryotes. The analysis of stochastic BDIMs presented here shows that non-linear versions of such models can well approximate not only the size distribution of gene families but also the dynamics of their formation during genome evolution. The fact that only higher-degree BDIMs are compatible with the observed characteristics of genome evolution suggests that the growth of gene families is self-accelerating, which might reflect differential selective pressure acting on different genes.
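A minimal Monte Carlo sketch of a birth-death-innovation model, written as a Gillespie-style simulation in Python, is shown below. By default it uses the simplest linear rates (birth and death proportional to family size, plus a constant innovation rate that creates new singleton families); the parameter values are arbitrary assumptions. The rational and logistic non-linear BDIMs analysed in the paper would correspond to passing non-linear functions as birth_rate and death_rate, but their exact forms are not reproduced here.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple


def simulate_bdim(
    n_events: int,
    birth_rate: Callable[[int], float] = lambda i: 1.0 * i,
    death_rate: Callable[[int], float] = lambda i: 0.9 * i,
    innovation: float = 0.5,
    seed: int = 0,
) -> Tuple[float, List[int]]:
    """Gillespie-style simulation of an ensemble of gene families.

    A family of size i gains a gene at rate birth_rate(i) and loses one at
    rate death_rate(i); new singleton families arise at rate `innovation`.
    Returns (elapsed time, list of surviving family sizes).
    """
    rng = random.Random(seed)
    families = [1]  # start from a single singleton family
    t = 0.0
    for _ in range(n_events):
        births = [birth_rate(i) for i in families]
        deaths = [death_rate(i) for i in families]
        total = innovation + sum(births) + sum(deaths)
        if total <= 0:
            break  # no event can occur any more
        t += rng.expovariate(total)  # waiting time to the next event
        r = rng.random() * total     # choose which event occurs
        if r < innovation:
            families.append(1)  # innovation: a new singleton family
            continue
        r -= innovation
        # Walk the per-family rate buckets (rounding may very rarely skip a step).
        for idx, (b, d) in enumerate(zip(births, deaths)):
            if r < b:
                families[idx] += 1  # gene duplication within family idx
                break
            r -= b
            if r < d:
                families[idx] -= 1  # gene loss within family idx
                if families[idx] == 0:
                    families.pop(idx)  # family goes extinct
                break
            r -= d
    return t, families


if __name__ == "__main__":
    t, sizes = simulate_bdim(5000, seed=1)
    print(round(t, 2), sorted(Counter(sizes).items())[:5])  # family-size distribution
```

Tallying the returned sizes with Counter gives the family-size distribution that such models are fitted to, and repeating runs with different seeds gives the spread of formation times.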
Microsatellites in the Genome of the Edible Mushroom, Volvariella volvacea

PubMed Central

Chen, Mingjie; Wang, Hong; Bao, Dapeng

2014-01-01

Using bioinformatics software and databases, we characterized the microsatellite pattern in the V. volvacea genome and compared it with the microsatellite patterns found in the genomes of four other edible fungi: Coprinopsis cinerea, Schizophyllum commune, Agaricus bisporus, and Pleurotus ostreatus. A total of 1346 microsatellites were identified, with mononucleotide repeats being the most frequent motif. The relative abundance of microsatellites was lower in coding regions, at 21 per Mb. However, the microsatellites in the V. volvacea gene models showed a greater tendency to be located in the CDS regions. There was also a preponderance of trinucleotide repeats, especially in the kinase genes, which suggests a possible role in phenotypic variation. Among the five fungal genomes, microsatellite abundance appeared to be unrelated to genome size. Furthermore, the short motifs (mono- to trinucleotides) outnumbered other categories, although their proportions differed. Data analysis indicated a possible relationship between the most frequent microsatellite types and the genetic distance between the five fungal genomes. PMID:24575404
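Microsatellite detection of this kind can be sketched in Python with regular expressions. The minimum repeat numbers below resemble defaults commonly used with the MISA tool and are an assumption, since the abstract does not state the study's criteria; the sketch also does not merge compound or interrupted repeats. Relative abundance is expressed as microsatellites per Mb of sequence, the unit used in the abstract.

```python
import re
from typing import Dict, List, Tuple

# Assumed minimum repeat numbers per motif length (1-6 bp).
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if motif is not itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS) -> List[Tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for perfect tandem repeats, sorted by start."""
    seq = seq.upper()
    hits = []
    for n, reps in min_repeats.items():
        # A motif of length n followed by at least reps - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (n, reps - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # e.g. skip 'AA', already counted as a mono-repeat
                hits.append((m.start(), motif, len(m.group(0)) // n))
    return sorted(hits)


def abundance_per_mb(n_ssrs: int, seq_len_bp: int) -> float:
    """Relative abundance: number of microsatellites per Mb of sequence."""
    return n_ssrs / (seq_len_bp / 1e6)


if __name__ == "__main__":
    toy = "GGG" + "A" * 12 + "GC" + "CAG" * 5 + "T"  # made-up sequence
    ssrs = find_ssrs(toy)
    print(ssrs)  # [(3, 'A', 12), (17, 'CAG', 5)]
    print(abundance_per_mb(len(ssrs), len(toy)))
```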
Microsatellites in the genome of the edible mushroom, Volvariella volvacea.

PubMed

Wang, Ying; Chen, Mingjie; Wang, Hong; Wang, Jing-Fang; Bao, Dapeng

2014-01-01

Using bioinformatics software and databases, we characterized the microsatellite pattern in the V. volvacea genome and compared it with the microsatellite patterns found in the genomes of four other edible fungi: Coprinopsis cinerea, Schizophyllum commune, Agaricus bisporus, and Pleurotus ostreatus. A total of 1346 microsatellites were identified, with mononucleotide repeats being the most frequent motif. The relative abundance of microsatellites was lower in coding regions, at 21 per Mb. However, the microsatellites in the V. volvacea gene models showed a greater tendency to be located in the CDS regions. There was also a preponderance of trinucleotide repeats, especially in the kinase genes, which suggests a possible role in phenotypic variation. Among the five fungal genomes, microsatellite abundance appeared to be unrelated to genome size. Furthermore, the short motifs (mono- to trinucleotides) outnumbered other categories, although their proportions differed. Data analysis indicated a possible relationship between the most frequent microsatellite types and the genetic distance between the five fungal genomes.
Two fundamentally different classes of microbial genes.

PubMed

Wolf, Yuri I; Makarova, Kira S; Lobkovsky, Alexander E; Koonin, Eugene V

2016-11-07

The evolution of bacterial and archaeal genomes is highly dynamic and involves extensive horizontal gene transfer and gene loss [1-4]. Furthermore, many microbial species appear to have open pangenomes, where each newly sequenced genome contains more than 10% ORFans, that is, genes without detectable homologues in other species [5,6]. Here, we report a quantitative analysis of microbial genome evolution by fitting the parameters of a simple, steady-state evolutionary model to the comparative genomic data on the gene content and gene order similarity between archaeal genomes. The results reveal two sharply distinct classes of microbial genes, one of which is characterized by effectively instantaneous gene replacement, and the other consists of genes with finite, distributed replacement rates. These findings imply a conservative estimate of the size of the prokaryotic genomic universe, which appears to consist of at least a billion distinct genes. Furthermore, the same distribution of constraints is shown to govern the evolution of gene complement and gene order, without the need to invoke long-range conservation or the selfish operon concept [7].
An Annotated Draft Genome for Radix auricularia (Gastropoda, Mollusca)

PubMed Central

Feldmeyer, Barbara; Schmidt, Hanno; Greshake, Bastian; Tills, Oliver; Truebano, Manuela; Rundle, Simon D.; Paule, Juraj; Ebersberger, Ingo; Pfenninger, Markus

2017-01-01

Molluscs are the second most species-rich phylum in the animal kingdom, yet only 11 genomes of this group have been published so far. Here, we present the draft genome sequence of the pulmonate freshwater snail Radix auricularia. Six whole genome shotgun libraries with different layouts were sequenced. The resulting assembly comprises 4,823 scaffolds with a cumulative length of 910 Mb and an overall read coverage of 72×. The assembly contains 94.6% of a metazoan core gene collection, indicating an almost complete coverage of the coding fraction. The discrepancy of ∼690 Mb compared with the estimated genome size of R. auricularia (1.6 Gb) results from a high repeat content of 70% mainly comprising DNA transposons. The annotation of 17,338 protein coding genes was supported by the use of publicly available transcriptome data. This draft will serve as starting point for further genomic and population genetic research in this scientifically important phylum. PMID:28204581
Molecular variability analysis of five new complete cacao swollen shoot virus genomic sequences.

PubMed

Muller, E; Sackey, S

2005-01-01

Cacao swollen shoot virus (CSSV), a member of the family Caulimoviridae, genus Badnavirus, occurs in all the main cacao-growing areas of West Africa. We amplified, cloned and sequenced the complete genomes of five new isolates, two originating from Togo and three from Ghana. The genomes of these five newly sequenced isolates all contain the five putative open reading frames I, II, III, X and Y described for the first sequenced CSSV isolate, Agou1, originating from Togo. Their genomes were aligned with the genome of Agou1. The nucleotide and amino acid sequence identities between isolates were calculated, and a phylogenetic analysis including other pararetroviruses was performed. Maximum nucleotide sequence variability between complete genomes of CSSV isolates was 29.4%. Geographical differentiation between isolates appears more important than differentiation between mild and severe isolates. ORF X differs greatly in size and sequence between the Togolese isolates Nyongbo2 and Agou1 on the one hand and the four other isolates on the other; its functional role is therefore clearly questionable.
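Maximum pairwise variability between aligned complete genomes can be sketched as the largest uncorrected p-distance over all pairs of isolates. This Python sketch assumes the genomes are already aligned to equal length and simply skips gap columns; the distance measure and gap handling used in the study are not stated in the abstract, and the isolate names and toy alignment in the usage block are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable (ungapped) sites")
    return sum(x != y for x, y in sites) / len(sites)


def max_pairwise_variability(alignment: Dict[str, str]) -> Tuple[Tuple[str, str], float]:
    """Most divergent pair in a {name: aligned_sequence} dict and its distance in percent."""
    pairs = list(combinations(alignment, 2))
    best = max(pairs, key=lambda p: p_distance(alignment[p[0]], alignment[p[1]]))
    return best, 100 * p_distance(alignment[best[0]], alignment[best[1]])


if __name__ == "__main__":
    # Hypothetical isolate names and a toy alignment, not CSSV data.
    aln = {"isolate_1": "ACGT-ACGTA", "isolate_2": "ACGTTACGAA", "isolate_3": "TCGTTACCAA"}
    pair, pct = max_pairwise_variability(aln)
    print(pair, round(pct, 1))
```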
Genomic islands of divergence are not affected by geography of speciation in sunflowers.

PubMed

Renaut, S; Grassa, C J; Yeaman, S; Moyers, B T; Lai, Z; Kane, N C; Bowers, J E; Burke, J M; Rieseberg, L H

2013-01-01

Genomic studies of speciation often report the presence of highly differentiated genomic regions interspersed within a milieu of weakly diverged loci. The formation of these speciation islands is generally attributed to reduced inter-population gene flow near loci under divergent selection, but few studies have critically evaluated this hypothesis. Here, we report on transcriptome scans among four recently diverged pairs of sunflower (Helianthus) species that vary in the geographical context of speciation. We find that genetic divergence is lower in sympatric and parapatric comparisons, consistent with a role for gene flow in eroding neutral differences. However, genomic islands of divergence are numerous and small in all comparisons, and contrary to expectations, island number and size are not significantly affected by levels of interspecific gene flow. Rather, island formation is strongly associated with reduced recombination rates. Overall, our results indicate that the functional architecture of genomes plays a larger role in shaping genomic divergence than does the geography of speciation.
Diversity of the Arabidopsis mitochondrial genome occurs via nuclear-controlled recombination activity.

PubMed

Arrieta-Montiel, Maria P; Shedge, Vikas; Davila, Jaime; Christensen, Alan C; Mackenzie, Sally A

2009-12-01

The plant mitochondrial genome is recombinogenic, with DNA exchange activity controlled to a large extent by nuclear gene products. One nuclear gene, MSH1, appears to participate in suppressing recombination in Arabidopsis at every repeated sequence ranging in size from 108 to 556 bp. Present in a wide range of plant species, these mitochondrial repeats display evidence of successful asymmetric DNA exchange in Arabidopsis when MSH1 is disrupted. Recombination frequency appears to be influenced by repeat sequence homology and size, with larger size repeats corresponding to increased DNA exchange activity. The extensive mitochondrial genomic reorganization of the msh1 mutant produced altered mitochondrial transcription patterns. Comparison of mitochondrial genomes from the Arabidopsis ecotypes C24, Col-0, and Ler suggests that MSH1 activity accounts for most or all of the polymorphisms distinguishing these genomes, producing ecotype-specific stoichiometric changes in each line. Our observations suggest that MSH1 participates in mitochondrial genome evolution by influencing the lineage-specific pattern of mitochondrial genetic variation in higher plants.
A genome-wide perspective about the diversity and demographic history of seven Spanish goat breeds.

PubMed

Manunza, Arianna; Noce, Antonia; Serradilla, Juan Manuel; Goyache, Félix; Martínez, Amparo; Capote, Juan; Delgado, Juan Vicente; Jordana, Jordi; Muñoz, Eva; Molina, Antonio; Landi, Vincenzo; Pons, Agueda; Balteanu, Valentin; Traoré, Amadou; Vidilla, Montse; Sánchez-Rodríguez, Manuel; Sànchez, Armand; Cardoso, Tainã Figueiredo; Amills, Marcel

2016-07-25

The main goal of the current work was to infer the demographic history of seven Spanish goat breeds (Malagueña, Murciano-Granadina, Florida, Palmera, Mallorquina, Bermeya and Blanca de Rasquera) based on genome-wide diversity data generated with the Illumina Goat SNP50 BeadChip (sample size, N = 176). Five additional populations from Europe (Saanen and Carpathian) and Africa (Tunisian, Djallonké and Sahel) were also included in this analysis (N = 80) for comparative purposes. Our results show that the genetic background of Spanish goats traces back mainly to European breeds, although signs of North African admixture were detected in two Andalusian breeds (Malagueña and Murciano-Granadina). In general, observed and expected heterozygosities were quite similar across the seven Spanish goat breeds under analysis, irrespective of their population size and conservation status. For the Mallorquina and Blanca de Rasquera breeds, which have suffered strong population declines during the past decades, we observed increased frequencies of large runs of homozygosity (ROH), a finding that is consistent with recent inbreeding. In contrast, a substantial part of the genome of the Palmera goat breed comprised short ROH, which suggests a strong and ancient founder effect. Admixture with African goats, genetic drift and inbreeding have had different effects across the seven Spanish goat breeds analysed in the current work. This has generated distinct patterns of genome-wide diversity that provide new clues about the demographic history of these populations.
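Runs of homozygosity can be illustrated with a simplified Python scan over SNP genotypes coded as 0/1/2 allele counts. The sketch assumes a single chromosome, no missing calls and no heterozygous calls inside a run, and the min_snps and long_bp thresholds are hypothetical; dedicated tools such as PLINK's --homozyg scan use sliding windows and tolerate a few heterozygous or missing calls. F_ROH, the fraction of the genome covered by ROH, is a common genomic inbreeding estimate, and splitting runs by length mirrors the contrast between long ROH (recent inbreeding) and short ROH (an older founder effect).

```python
from typing import List, Sequence, Tuple

Run = Tuple[int, int]  # (start_bp, end_bp) of a run of homozygosity


def find_roh(positions: Sequence[int], genotypes: Sequence[int], min_snps: int = 3) -> List[Run]:
    """Maximal runs of consecutive homozygous SNPs (genotype 0 or 2; 1 = heterozygous).

    Simplified: no heterozygous or missing calls are tolerated inside a run.
    """
    runs: List[Run] = []
    start = None
    for i, g in enumerate(genotypes):
        if g != 1:  # homozygous call extends (or opens) a run
            if start is None:
                start = i
            continue
        if start is not None and i - start >= min_snps:
            runs.append((positions[start], positions[i - 1]))
        start = None  # heterozygous call breaks the run
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((positions[start], positions[-1]))
    return runs


def f_roh(runs: List[Run], genome_length_bp: int) -> float:
    """Fraction of the genome covered by ROH."""
    return sum(end - start for start, end in runs) / genome_length_bp


def split_by_length(runs: List[Run], long_bp: int) -> Tuple[List[Run], List[Run]]:
    """Separate long runs (recent inbreeding) from short ones (older shared ancestry)."""
    long_runs = [r for r in runs if r[1] - r[0] >= long_bp]
    short_runs = [r for r in runs if r[1] - r[0] < long_bp]
    return long_runs, short_runs


if __name__ == "__main__":
    positions = [0, 10, 20, 30, 40, 50, 60, 70]  # toy SNP coordinates (bp)
    genotypes = [0, 2, 2, 2, 1, 0, 2, 2]          # 1 = heterozygous
    runs = find_roh(positions, genotypes, min_snps=3)
    print(runs, f_roh(runs, 100))  # [(0, 30), (50, 70)] 0.5
```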
Excess of genomic defects in a woolly mammoth on Wrangel island

PubMed Central

Slatkin, Montgomery

2017-01-01

Woolly mammoths (Mammuthus primigenius) populated Siberia, Beringia, and North America during the Pleistocene and early Holocene. Recent breakthroughs in ancient DNA sequencing have allowed complete genome sequencing of two woolly mammoth specimens (Palkopoulou et al. 2015). One mammoth specimen is from a mainland population 45,000 years ago, when mammoths were plentiful. The second, a 4300-year-old specimen, is derived from an isolated population on Wrangel island, where mammoths subsisted with an effective population size more than 43-fold lower than that of previous populations. These extreme differences in effective population size offer a rare opportunity to test nearly neutral models of genome architecture evolution within a single species. Using these previously published mammoth sequences, we identify deletions, retrogenes, and non-functionalizing point mutations. In the Wrangel island mammoth, we identify a greater number of deletions, a larger proportion of deletions affecting gene sequences, a greater number of candidate retrogenes, and an increased number of premature stop codons. This accumulation of detrimental mutations is consistent with genomic meltdown in response to low effective population sizes in the dwindling mammoth population on Wrangel island. In addition, we observe high rates of loss of olfactory receptors and urinary proteins, either because these loci are non-essential or because they were favored by divergent selective pressures in island environments. Finally, at the locus of FOXQ1 we observe two independent loss-of-function mutations, which would confer a satin coat phenotype in this island woolly mammoth. PMID:28253255
Evolution of the Australian lungfish (Neoceratodus forsteri) genome: a major role for CR1 and L2 LINE elements.

PubMed

Metcalfe, Cushla J; Filée, Jonathan; Germon, Isabelle; Joss, Jean; Casane, Didier

2012-11-01

Haploid genomes greater than 25,000 Mb are rare; within the animals, only the lungfish and some salamanders and crustaceans are known to have genomes this large. There are very few data on the structure of genomes of this size. It is known, however, that for animal genomes up to 3,000 Mb, there is in general a good correlation between genome size and the percentage of the genome composed of repetitive sequence, and that this repetitive component is highly dynamic. In this study, we sampled the Australian lungfish genome using three mini-genomic libraries and found that, with very little sequence, the results converged on an estimate of 40% of the genome being composed of recognizable transposable elements (TEs), chiefly from the CR1 and L2 long interspersed nuclear element clades. We further characterized the CR1 and L2 elements in the lungfish genome and show that although most CR1 elements probably represent recent amplifications, the L2 elements are more diverse and are more likely the result of a series of amplifications. We suggest that our sampling method has probably underestimated the recognizable TE content. However, on the basis of the most likely sources of error, we suggest that this very large genome is not largely composed of recently amplified, undetected TEs but may instead include a large component of older degenerate TEs. Based on these estimates, and on Thomson's (Thomson K. 1972. An attempt to reconstruct evolutionary changes in the cellular DNA content of lungfish. J Exp Zool. 180:363-372) inference that in the lineage leading to the extant Australian lungfish there was a massive increase in genome size between 350 and 200 mya, after which the size of the genome changed little, we speculate that the very large Australian lungfish genome may be the result of a massive amplification of TEs followed by a long period with a very low rate of sequence removal and some ongoing TE activity.
InterB multigenic family, a gene repertoire associated with subterminal chromosome regions of Encephalitozoon cuniculi and conserved in several human-infecting microsporidian species.

PubMed

Dia, Ndongo; Lavie, Laurence; Méténier, Guy; Toguebaye, Bhen S; Vivarès, Christian P; Cornillot, Emmanuel

2007-03-01

Microsporidia are fungi-related obligate intracellular parasites that infect numerous animals, including man. Encephalitozoon cuniculi harbours a very small genome (2.9 Mbp) with about 2,000 coding sequences (CDSs). Most repeated CDSs are of unknown function and are distributed in subterminal regions that mark the transitions between subtelomeric rDNA units and chromosome cores. A potential multigenic family (interB) encoding proteins within a size range of 579-641 aa was investigated by PCR and RT-PCR. Thirty members were finally assigned to the E. cuniculi interB family, and a predominant interB transcript was found to originate from a newly identified gene on chromosome III. Microsporidian species from eight different genera infecting insects, fishes or mammals were tested for a possible intra-phylum conservation of interB genes. Only representatives of the Encephalitozoon, Vittaforma and Brachiola genera, differing in host range but all able to invade humans, were positive. Molecular karyotyping of Brachiola algerae showed a complex set of chromosome bands, providing a haploid genome size estimate of 15-20 Mbp. In spite of this large difference in genome complexity, B. algerae and E. cuniculi shared some similar interB gene copies and a common location of interB genes in near-rDNA subterminal regions.
Genome scaffolding and annotation for the pathogen vector Ixodes ricinus by ultra-long single molecule sequencing.

PubMed

Cramaro, Wibke J; Hunewald, Oliver E; Bell-Sakyi, Lesley; Muller, Claude P

2017-02-08

Global warming and other ecological changes have facilitated the expansion of Ixodes ricinus tick populations. Ixodes ricinus is the most important carrier of vector-borne pathogens in Europe, transmitting viruses, protozoa and bacteria, in particular Borrelia burgdorferi (sensu lato), the causative agent of Lyme borreliosis, the most prevalent vector-borne disease in humans in the Northern Hemisphere. To control this disease vector more rapidly, a better understanding of the I. ricinus tick is necessary. To facilitate such studies, we recently published the first reference genome of this highly prevalent pathogen vector. Here, we extend these studies by scaffolding and annotating that reference genome using ultra-long reads from third-generation single-molecule sequencing. In addition, we present the first genome size estimates for I. ricinus ticks and for the embryo-derived cell line IRE/CTVM19. A total of 235,953 contigs were integrated into 204,904 scaffolds, extending the known genome length by more than 30%, from 393 to 516 Mb, and raising the N50 by 87%, from a contig N50 of 1643 bp to a scaffold N50 of 3067 bp. In addition, 25,263 sequences were annotated by comparison with the tick's North American relative Ixodes scapularis. After (conserved) hypothetical proteins, zinc finger proteins, secreted proteins and P450-coding proteins were the most prevalent annotated protein categories. Interestingly, more than 50% of the amino acid sequences meeting the homology threshold had 95-100% identity to the corresponding I. scapularis gene models. Flow cytometry-based genome size analysis revealed a haploid genome size of 2.65 Gb for I. ricinus ticks and 3.80 Gb for the cell line. We present a first draft sequence map of the I. ricinus genome based on a PacBio-Illumina assembly. The I. ricinus genome was shown to be 26% (500 Mb) larger than the genome of its American relative I. scapularis. Based on the genome size of 2.65 Gb, we estimate that about 67% of the non-repetitive sequences were covered. Genome annotation will facilitate screening for specific molecular pathways in I. ricinus cells and provides an overview of their characteristics and functions.
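The assembly statistics quoted above (contig versus scaffold N50, and the relative gains in total length and N50) follow from simple arithmetic. The Python sketch below uses the standard N50 definition; the function names and the toy contig lengths are hypothetical and are not taken from the I. ricinus assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return 100.0 * (new - old) / old


# Toy assembly with hypothetical lengths (bp)
print(n50([100, 200, 300, 400, 500]))  # 400

# Figures from the abstract: genome length 393 -> 516 Mb, N50 1643 -> 3067 bp
print(round(percent_increase(393, 516), 1))    # 31.3
print(round(percent_increase(1643, 3067), 1))  # 86.7
```

Because the abstract compares a contig N50 with a scaffold N50, part of the 87% gain reflects scaffolding rather than longer contigs.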
Defective interfering particles of poliovirus: mapping of the deletion and evidence that the deletions in the genomes of DI(1), (2), and (3) are located in the same region

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nomoto, A.; Jacobson, A.; Lee, Y.F.

1979-01-01

The deletions in the RNAs of three defective interfering (DI) particles of poliovirus type 1 have been located, and their approximate extent determined, by three methods. (1) Digestion of DI RNAs with RNase III yields the same 3'-terminal fragments as digestion of standard virus RNA with RNase III. The longest 3'-terminal [...] located in the 5'-terminal half of the polio genome. (2) Fingerprints of RNase T1-resistant oligonucleotides of all three DI RNAs are identical and lack four large oligonucleotides present in the fingerprints of standard virus, suggesting that the deletions in all three DI RNAs are located in the same region of the viral genome. The deletion-specific oligonucleotides were also shown to lie within the 5'-terminal half of the viral genome by alkali fragmentation of the RNA and fingerprinting of poly(A)-linked (3'-terminal) fragments of decreasing size. (3) Virion RNA of DI(2) particles was annealed with denatured double-stranded RNA (RF) of standard virus, and the resulting heteroduplex molecules were examined in the electron microscope. A single loop, approximately 900 nucleotides long and located 20% from one end of the molecules, was observed. Both the size and the extent of individual deletions are somewhat variable among heteroduplex molecules, suggesting heterogeneity in the size of the deletion in the RNA of the DI(2) population. Our data show that poliovirus DI RNAs contain an internal deletion in the region of the viral genome known to specify the capsid polypeptides. This result explains why poliovirus DI particles are unable to synthesize viral coat proteins.
The genomes and comparative genomics of Lactobacillus delbrueckii phages.

PubMed

Riipinen, Katja-Anneli; Forsman, Päivi; Alatossava, Tapani

2011-07-01

Lactobacillus delbrueckii phages are a great source of genetic diversity. Here, the genome sequences of Lb. delbrueckii phages LL-Ku, c5 and JCL1032 were analyzed in detail, and the genetic diversity of Lb. delbrueckii phages belonging to different taxonomic groups was explored. The lytic isometric group b phages LL-Ku (31,080 bp) and c5 (31,841 bp) showed a minimum nucleotide sequence identity of 90% over about three-fourths of their genomes. The genomic locations of their lysis modules were unique, and the genomes featured several putative overlapping transcription units of genes. LL-Ku and c5 virions displayed peptidoglycan hydrolytic activity associated with a ~36-kDa protein similar in size to the endolysin. Unexpectedly, the 49,433-bp genome of the prolate phage JCL1032 (temperate, group c) revealed a conserved gene order within its structural genes. Lb. delbrueckii phages representing groups a (a phage LL-H), b and c possessed only limited protein sequence homology. Genomic comparison of LL-Ku and c5 suggested that diversification of Lb. delbrueckii phages is mainly due to insertions, deletions and recombination. For the first time, the complete genome sequences of group b and c Lb. delbrueckii phages are reported.
Oncogenomic portals for the visualization and analysis of genome-wide cancer data

PubMed Central

Klonowska, Katarzyna; Czubak, Karol; Wojciechowska, Marzena; Handschuh, Luiza; Zmienko, Agnieszka; Figlerowicz, Marek; Dams-Kozlowska, Hanna; Kozlowski, Piotr

2016-01-01

Somatically acquired genomic alterations that drive oncogenic cellular processes are of great scientific and clinical interest. Since the initiation of large-scale cancer genomic projects (e.g., the Cancer Genome Project, The Cancer Genome Atlas, and the International Cancer Genome Consortium cancer genome projects), a number of web-based portals have been created to facilitate access to multidimensional oncogenomic data and to assist with its interpretation. The portals provide visualization of small-size mutations, copy number variations, methylation, and gene/protein expression data that can be correlated with the available clinical, epidemiological, and molecular features. Additionally, the portals allow users to analyze the gathered data with various user-friendly statistical tools. Herein, we present a highly illustrated review of seven portals: Tumorscape, UCSC Cancer Genomics Browser, ICGC Data Portal, COSMIC, cBioPortal, IntOGen, and BioProfiling.de. All of the selected portals are user-friendly and can be used by scientists from different cancer-associated fields, including those without a bioinformatics background. The use of these portals is expected to contribute to a better understanding of the molecular etiology of cancer and ultimately to accelerate the translation of genomic knowledge into clinical practice. PMID:26484415

Oncogenomic portals for the visualization and analysis of genome-wide cancer data.

PubMed

Klonowska, Katarzyna; Czubak, Karol; Wojciechowska, Marzena; Handschuh, Luiza; Zmienko, Agnieszka; Figlerowicz, Marek; Dams-Kozlowska, Hanna; Kozlowski, Piotr

2016-01-05

Comparative Analysis of Repetitive DNA between the Main Vectors of Chagas Disease: Triatoma infestans and Rhodnius prolixus.

PubMed

Pita, Sebastián; Mora, Pablo; Vela, Jesús; Palomeque, Teresa; Sánchez, Antonio; Panzera, Francisco; Lorite, Pedro

2018-04-24

Chagas disease, or American trypanosomiasis, affects six to seven million people worldwide, mostly in Latin America. The disease is transmitted by hematophagous insects known as "kissing bugs" (Hemiptera, Triatominae), with Triatoma infestans and Rhodnius prolixus being the two most important vector species. Although both species have the same diploid chromosome number (2n = 22), they differ remarkably in total DNA content, chromosome structure and genome organization. These differences in genome size are expected to reflect differences in the amount of repetitive DNA. A genome-wide analysis of T. infestans revealed 42 satellite DNA families. BLAST searches of these sequences against the R. prolixus genome assembly showed that only four of these satellite DNA families are shared between the two species, suggesting strong differentiation between the Triatoma and Rhodnius genomes. Fluorescence in situ hybridization (FISH) showed that these repetitive DNAs are dispersed in the euchromatic regions of all autosomes and the X chromosome in both species. On the Y chromosome, however, these shared satellite DNAs are absent in T. infestans but present in R. prolixus. These results support a different origin and/or evolution of the Y chromosome in the two species.
Sequencing of mitochondrial genomes of nine Aspergillus and Penicillium species identifies mobile introns and accessory genes as main sources of genome size variability.

PubMed

Joardar, Vinita; Abrams, Natalie F; Hostetler, Jessica; Paukstelis, Paul J; Pakala, Suchitra; Pakala, Suman B; Zafar, Nikhat; Abolude, Olukemi O; Payne, Gary; Andrianopoulos, Alex; Denning, David W; Nierman, William C

2012-12-12

The genera Aspergillus and Penicillium include some of the most beneficial as well as the most harmful fungal species, such as the penicillin producer Penicillium chrysogenum and the human pathogen Aspergillus fumigatus, respectively. Their mitochondrial genome sequences may hold vital clues to the mechanisms of their evolution, population genetics, and biology, yet only a handful of these genomes have been fully sequenced and annotated. Here we report the complete sequence and annotation of the mitochondrial genomes of six Aspergillus and three Penicillium species: A. fumigatus, A. clavatus, A. oryzae, A. flavus, Neosartorya fischeri (A. fischerianus), A. terreus, P. chrysogenum, P. marneffei, and Talaromyces stipitatus (P. stipitatum). The accompanying comparative analysis of these and related publicly available mitochondrial genomes reveals wide variation in size (25-36 kb) among these closely related fungi. The sources of genome expansion include group I introns and accessory genes encoding putative homing endonucleases, DNA and RNA polymerases (presumed to be of plasmid origin), and hypothetical proteins. The two smallest sequenced genomes (A. terreus and P. chrysogenum) contain no introns in protein-coding genes, whereas the largest genome (T. stipitatus) contains a total of eleven introns. All of the sequenced genomes have a group I intron in the large ribosomal subunit RNA gene, suggesting that this intron is fixed in these species. Subsequent analysis of several A. fumigatus strains showed low intraspecies variation. This study also includes a phylogenetic analysis based on 14 concatenated core mitochondrial proteins. The phylogenetic tree has a different topology from published multilocus trees, highlighting the challenges still facing Aspergillus systematics. The study expands the genomic resources available to fungal biologists by providing mitochondrial genomes with consistent annotations for future genetic, evolutionary and population studies. Despite the conservation of the core genes, the mitochondrial genomes of the Aspergillus and Penicillium species examined here exhibit a significant amount of interspecies variation. Most of this variation can be attributed to accessory genes and mobile introns, presumably acquired by horizontal gene transfer of mitochondrial plasmids and by intron homing.
Genome sequence of the Japanese oak silk moth, Antheraea yamamai: the first draft genome in the family Saturniidae

PubMed Central

Kim, Seong-Ryul; Kwak, Woori; Kim, Hyaekang; Kim, Kee-Young; Kim, Su-Bae; Choi, Kwang-Ho; Kim, Seong-Wan; Hwang, Jae-Sam; Kim, Minjee; Kim, Iksoo; Goo, Tae-Won

2018-01-01

Background Antheraea yamamai, also known as the Japanese oak silk moth, is a wild species of silk moth. Silk produced by A. yamamai, referred to as tensan silk, differs from the common silk produced by the domesticated silkworm, Bombyx mori, in characteristics such as thickness, compressive elasticity, and chemical resistance. These unique characteristics have led to its use in many research fields, including biotechnology and medical science, and the scientific and economic importance of the wild silk moth continues to grow. However, no genomic information for wild silk moths, including A. yamamai, is currently available. Findings To construct the A. yamamai genome, a total of 147 Gb of sequence was generated using the Illumina and PacBio sequencing platforms, providing 210-fold coverage based on the estimated genome size of 700 Mb. The assembled A. yamamai genome (scaffolds >2 kb) was 656 Mb in 3675 scaffolds, with an N50 length of 739 kb and a GC content of 34.07%. Identified repeat elements covered 37.33% of the genome, and the completeness of the assembly was estimated at 96.7% by Benchmarking Universal Single-Copy Orthologs (BUSCO) v2 analysis. A total of 15,481 genes were identified using Evidence Modeler, based on gene predictions from three methods (ab initio, RNA-seq-based, and known-gene-based) and manual curation. Conclusions Here we present the genome sequence of A. yamamai, the first genome sequence of a wild silk moth. These results provide valuable genomic information that will help improve our understanding of the molecular mechanisms underlying not only specific phenotypes, such as wild silk itself, but also the genomic evolution of Saturniidae. PMID:29186418
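The 210-fold coverage figure is the total sequence yield divided by the estimated genome size. A minimal Python sketch of that arithmetic, which also expresses the 656 Mb assembly as a fraction of the 700 Mb estimate; the function names are illustrative only.

```python
def fold_coverage(total_bases, genome_size):
    """Average sequencing depth: total bases generated / genome size (same units)."""
    return total_bases / genome_size


def assembled_fraction(assembly_size, estimated_size):
    """Assembly length as a percentage of the estimated genome size."""
    return 100.0 * assembly_size / estimated_size


print(fold_coverage(147e9, 700e6))                 # 147 Gb over a 700 Mb genome: 210.0
print(round(assembled_fraction(656e6, 700e6), 1))  # 656 Mb of an estimated 700 Mb: 93.7
```

Average depth is an idealization; actual per-base coverage varies along the genome.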
How to kill the honey bee larva: genomic potential and virulence mechanisms of Paenibacillus larvae.

PubMed

Djukic, Marvin; Brzuszkiewicz, Elzbieta; Fünfhaus, Anne; Voss, Jörn; Gollnow, Kathleen; Poppinga, Lena; Liesegang, Heiko; Garcia-Gonzalez, Eva; Genersch, Elke; Daniel, Rolf

2014-01-01

Paenibacillus larvae, a Gram-positive bacterial pathogen, causes American foulbrood (AFB), the most serious infectious disease of honey bees. To investigate the genomic potential of P. larvae, two strains belonging to two different genotypes were sequenced and used for comparative genome analysis. The complete genome sequence of P. larvae strain DSM 25430 (genotype ERIC II) consisted of 4,056,006 bp and harbored 3,928 predicted protein-encoding genes. The draft genome sequence of P. larvae strain DSM 25719 (genotype ERIC I) comprised 4,579,589 bp and contained 4,868 protein-encoding genes. Both strains harbored a 9.7 kb plasmid and encoded a large number of virulence-associated proteins such as toxins and collagenases. In addition, genes encoding large multimodular enzymes that produce nonribosomal peptides or polyketides were identified. Seven toxin-associated loci were identified and analyzed in the genome of strain DSM 25719, five of which encoded putatively functional toxins. The genome of strain DSM 25430 harbored several toxin loci similar to the corresponding loci in strain DSM 25719, but these were non-functional due to point mutations or disruption by transposases. Although both strains cause AFB, significant differences between the genomes were observed, including genome size, number and composition of transposases, insertion elements, predicted phage regions, and strain-specific island-like regions. Transposases, integrases and recombinases are important drivers of genome plasticity. A total of 390 and 273 mobile elements were found in strains DSM 25430 and DSM 25719, respectively. Comparative genomics of both strains revealed acquisition of virulence factors by horizontal gene transfer and provided insights into the evolution and pathogenicity of P. larvae.
Efficient isolation method for high-quality genomic DNA from cicada exuviae.

PubMed

Nguyen, Hoa Quynh; Kim, Ye Inn; Borzée, Amaël; Jang, Yikweon

2017-10-01

In recent years, animal ethics concerns have led researchers to explore nondestructive methods of obtaining material for genetic studies. Cicada exuviae are one such material: they are the skins that individuals cast off when molting, and they are easy to collect. In this study, we aimed to identify the most efficient method for extracting DNA of high quantity and quality from cicada exuviae. We compared the relative DNA yield and purity of six extraction protocols, including both manual protocols and commercial kits, applied to four different exoskeleton parts. In addition, amplification and sequencing of the genomic DNA were evaluated in terms of whether sequences of the expected size could be obtained. Both the choice of protocol and the exuvia part significantly affected DNA yield and purity. Only samples extracted with the PowerSoil DNA Isolation kit produced gel bands of the expected size as well as successful sequencing results. The failure of the other protocols could be explained partly by low DNA yield from cicada exuviae and partly by contamination with humic acids from the soil in which cicada nymphs live before emergence, as indicated by spectroscopic measurements. Genomic DNA extracted from cicada exuviae could provide valuable information for species identification, enabling investigation of genetic diversity across consecutive broods or of spatiotemporal variation among populations. We therefore hope to provide a simple method for obtaining pure genomic DNA applicable to multiple research purposes.
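The abstract compares DNA yield and purity using spectroscopic measurements but does not state which readings were used. As an illustration only, the Python sketch below applies widely used rules of thumb: about 50 ng/µL of double-stranded DNA per A260 unit, and A260/A280 and A260/A230 ratios as purity indicators, with a low A260/A230 often taken to suggest organic contaminants such as humic substances. The cutoff values and function names are assumptions, not thresholds from this study.

```python
DSDNA_NG_PER_UL_PER_A260 = 50.0  # common rule of thumb: 1 A260 unit ~ 50 ng/uL dsDNA (1 cm path)


def dsdna_concentration(a260, dilution=1.0):
    """Approximate dsDNA concentration (ng/uL) from absorbance at 260 nm."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution


def purity_flags(a260, a280, a230, min_260_280=1.7, min_260_230=1.8):
    """Return warnings based on absorbance ratios (cutoffs are illustrative assumptions)."""
    flags = []
    if a260 / a280 < min_260_280:
        flags.append("low A260/A280: possible protein or phenol carry-over")
    if a260 / a230 < min_260_230:
        flags.append("low A260/A230: possible organic contaminants such as humic substances")
    return flags


# Hypothetical readings
print(dsdna_concentration(0.12, dilution=10))  # sample diluted 1:10 before measuring
print(purity_flags(1.0, 0.54, 0.48))           # ratios ~1.85 and ~2.08: no flags
print(purity_flags(1.0, 0.54, 0.90))           # A260/A230 ~1.11: one flag
```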
Priors in Whole-Genome Regression: The Bayesian Alphabet Returns

PubMed Central

Gianola, Daniel

2013-01-01

Whole-genome enabled prediction of complex traits has received enormous attention in animal and plant breeding and is making inroads into human and even Drosophila genetics. The term “Bayesian alphabet” refers to the growing number of letters used to name various Bayesian linear regressions that differ in the priors adopted while sharing the same sampling model. We explore the role of the prior distribution in whole-genome regression models for dissecting complex traits in what is now a standard situation with genomic data, in which the number of unknown parameters (p) typically exceeds the sample size (n). Members of the alphabet confront this overparameterization in various ways, but it is shown here that the prior is always influential unless n ≫ p. This happens because the parameters are not likelihood-identified, so Bayesian learning is imperfect. Because inferences are not free of the influence of the prior, claims about genetic architecture derived from these methods should be treated with caution. Nevertheless, all such procedures may deliver reasonable predictions of complex traits, provided that some parameters (“tuning knobs”) are assessed via properly conducted cross-validation. It is concluded that members of the alphabet have a place in whole-genome prediction of phenotypes but are of somewhat doubtful inferential value, at least when the sample size is such that n ≪ p. PMID:23636739
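The identifiability argument can be illustrated with the simplest member of the family, a Gaussian prior on marker effects (equivalent to ridge regression). The Python sketch below is a toy example of our own, not code from the paper: with n = 2 individuals and p = 3 markers, the direction (1, 1, -1) lies in the null space of X, so the data carry no information about it, and the posterior mean along that direction is fixed by the prior (at zero). Changing the hypothetical variance ratio `lam` = s2e/s2b changes the estimated effects even though the data are identical.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def posterior_mean(X, y, lam):
    """Posterior mean of b under y = X b + e, b ~ N(0, s2b I), e ~ N(0, s2e I).

    lam = s2e / s2b. Uses the n x n form b = X^T (X X^T + lam I)^(-1) y,
    which is convenient when p > n.
    """
    n, p = len(X), len(X[0])
    K = [[sum(X[i][k] * X[j][k] for k in range(p)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y)
    return [sum(X[i][k] * alpha[i] for i in range(n)) for k in range(p)]


X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]  # toy data: n = 2 individuals, p = 3 markers
y = [1.0, 2.0]
for lam in (0.1, 10.0):
    print(lam, [round(b, 3) for b in posterior_mean(X, y, lam)])
```

Other letters of the alphabet replace the Gaussian prior with heavier-tailed or mixture priors, but the null-space directions remain governed by the prior whenever p > n.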
The complete mitochondrial genome of the citrus red mite Panonychus citri (Acari: Tetranychidae): high genome rearrangement and extremely truncated tRNAs

PubMed Central

2010-01-01

Background The family Tetranychidae (Chelicerata: Acari) includes ~1200 species, many of which are of agronomic importance. To date, the mitochondrial genomes of only two Tetranychidae species have been sequenced, and both are characterized by many unusual features of genome organization and structure, such as gene order and nucleotide frequency. The scarcity of available sequence data has greatly impeded evolutionary studies in Acari (mites and ticks). Information on Tetranychidae mitochondrial genomes is important for phylogenetic evaluation and population genetics, as well as for the molecular evolution of functional genes such as acaricide-resistance genes. In this study, we sequenced the complete mitochondrial genome of Panonychus citri (family Tetranychidae), a worldwide citrus pest, and compared it with those of other Acari. Results The mitochondrial genome of P. citri is a typical circular molecule of 13,077 bp and contains the complete set of 37 genes usually found in metazoans. It is the smallest mitochondrial genome among all sequenced Acari and other Chelicerata, primarily because of the marked size reduction of the protein-coding genes (PCGs), the large rRNA gene, and the A + T-rich region. The mitochondrial gene order of P. citri is the same as those of P. ulmi and Tetranychus urticae, but differs distinctly from that of other Acari through a series of gene translocations and/or inversions. The P. citri mitochondrial genome has a high overall A + T content (85.28%), which is also reflected in the more frequent use of AT-rich codons, yet it exhibits a positive GC skew (0.03). The Acari mitochondrial nad1 gene exhibits a faster amino acid substitution rate than the other genes, and variation in the nucleotide substitution patterns of PCGs is significantly correlated with G + C content. Most tRNA genes of P. citri are extremely truncated and atypical (44-65 bp; 54.1 ± 4.1 bp), lacking either the T- or D-arm, as found in P. ulmi, T. urticae, and other Acariform mites. Conclusions The P. citri mitochondrial gene order is markedly different from those of other chelicerates but is conserved within the family Tetranychidae, indicating that extensive rearrangements occurred after Tetranychidae diverged from other Acari. Comparative analyses suggest that genome size, gene order, gene content, codon usage, and base composition are highly variable among Acari mitochondrial genomes. Although extremely small and unusual tRNA genes seem to be common in Acariform mites, further experimental evidence is needed. PMID:20969792
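The A + T content and GC skew reported above are simple base-composition statistics. A minimal Python sketch, assuming the usual definition GC skew = (G - C)/(G + C) computed on one strand; the function name and the toy sequence are hypothetical.

```python
def base_composition(seq):
    """Return (A+T percentage, GC skew) for a DNA sequence.

    Only A, C, G and T are counted; ambiguous bases such as N are ignored.
    GC skew is (G - C) / (G + C); a positive value means G exceeds C on this strand.
    """
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0.0
    at_percent = 100.0 * (counts["A"] + counts["T"]) / total
    g, c = counts["G"], counts["C"]
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_percent, gc_skew


at, skew = base_composition("AAATTTGGCN")
print(round(at, 2), round(skew, 3))  # 66.67 0.333
```

A high A + T content and a small positive GC skew are compatible: the skew concerns only the balance between G and C, not their combined share.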
A function accounting for training set size and marker density to model the average accuracy of genomic prediction.

PubMed

Erbe, Malena; Gredler, Birgit; Seefried, Franz Reinhold; Bapst, Beat; Simianer, Henner

2013-01-01

Prediction of genomic breeding values is of major practical relevance in dairy cattle breeding. Deterministic equations based on training set size, reliability of phenotypes, and the number of independent chromosome segments ([Formula: see text]) have been suggested to predict the accuracy of genomic breeding values in a given design. The aim of our study was to find a general deterministic equation for the average accuracy of genomic breeding values that also accounts for marker density and can be fitted empirically. Two data sets were available: 5,698 Holstein Friesian bulls genotyped with 50K SNPs and 1,332 Brown Swiss bulls genotyped with 50K SNPs and imputed to ∼600K SNPs. Different k-fold (k = 2-10, 15, 20) cross-validation scenarios (50 replicates, random assignment) were performed using a genomic BLUP approach. A maximum likelihood approach was used to estimate the parameters of different prediction equations. The highest likelihood was obtained with a modified form of the deterministic equation of Daetwyler et al. (2010), augmented by a weighting factor (w) based on the assumption that the maximum achievable accuracy is [Formula: see text]. The proportion of genetic variance captured by the complete SNP sets ([Formula: see text]) was 0.76 to 0.82 for Holstein Friesian and 0.72 to 0.75 for Brown Swiss. When the number of SNPs was varied, w was found to be proportional to the log of the marker density up to a limit that is population- and trait-specific; in the Brown Swiss population studied, this limit was reached at ∼20,000 SNPs.
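For reference, the unmodified deterministic equation attributed to Daetwyler et al. (2010) predicts accuracy as r = sqrt(N h² / (N h² + M_e)), where N is the training set size, h² the heritability (or reliability of the phenotypes) and M_e the number of independent chromosome segments. The Python sketch below implements only this baseline form; the weighting factor w and the exact modification fitted in this study are not reproduced, and the function name and input values are hypothetical.

```python
import math


def expected_accuracy(n_train, h2, m_e):
    """Baseline deterministic accuracy of genomic breeding values.

    r = sqrt(N * h2 / (N * h2 + M_e))
    n_train : number of animals in the training set (N)
    h2      : heritability or reliability of the phenotypes
    m_e     : number of independent chromosome segments (M_e)
    """
    nh2 = n_train * h2
    return math.sqrt(nh2 / (nh2 + m_e))


# Hypothetical inputs: accuracy rises with N, but with diminishing returns
for n in (500, 1000, 2000, 5000):
    print(n, round(expected_accuracy(n, h2=0.5, m_e=1000), 3))
```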
Whole-genome sequence of the Tibetan frog Nanorana parkeri and the comparative evolution of tetrapod genomes.

PubMed

Sun, Yan-Bo; Xiong, Zi-Jun; Xiang, Xue-Yan; Liu, Shi-Ping; Zhou, Wei-Wei; Tu, Xiao-Long; Zhong, Li; Wang, Lu; Wu, Dong-Dong; Zhang, Bao-Lin; Zhu, Chun-Ling; Yang, Min-Min; Chen, Hong-Man; Li, Fang; Zhou, Long; Feng, Shao-Hong; Huang, Chao; Zhang, Guo-Jie; Irwin, David; Hillis, David M; Murphy, Robert W; Yang, Huan-Ming; Che, Jing; Wang, Jun; Zhang, Ya-Ping

2015-03-17

The development of efficient sequencing techniques has made large numbers of genomes available for evolutionary studies. However, only one genome is available for all amphibians, that of Xenopus tropicalis, which is distantly related to the majority of frogs. More than 96% of frogs belong to the Neobatrachia, and no genome exists for this group. This dearth of amphibian genomes greatly restricts genomic studies of amphibians and, more generally, our understanding of tetrapod genome evolution. To fill this gap, we provide the de novo genome of a Tibetan Plateau frog, Nanorana parkeri, and compare it with that of X. tropicalis and other vertebrates. This genome encodes more than 20,000 protein-coding genes, a number similar to that of Xenopus. Although the genome of Nanorana is considerably larger than that of Xenopus (2.3 vs. 1.5 Gb), most of the difference is due to the different numbers of transposable elements in the two genomes. The two frogs exhibit considerable conserved whole-genome synteny despite having diverged approximately 266 million years ago, indicating a slow rate of DNA structural evolution in anurans. Multigenome synteny blocks further show that amphibians have fewer interchromosomal rearrangements than mammals but a comparable rate of intrachromosomal rearrangements. Our analysis also identifies 11 Mb of anuran-specific highly conserved elements that will be useful for comparative genomic analyses of frogs. The Nanorana genome improves our understanding of the evolution of tetrapod genomes and also provides a genomic reference for other evolutionary studies.
DNA Extraction Protocols for Whole-Genome Sequencing in Marine Organisms.

PubMed

Panova, Marina; Aronsson, Henrik; Cameron, R Andrew; Dahl, Peter; Godhe, Anna; Lind, Ulrika; Ortega-Martinez, Olga; Pereyra, Ricardo; Tesson, Sylvie V M; Wrange, Anna-Lisa; Blomberg, Anders; Johannesson, Kerstin

2016-01-01

The marine environment harbors a large proportion of the total biodiversity on this planet, including the majority of Earth's phyla and classes. Studying the genomes of marine organisms can bring interesting insights into genome evolution. Today, almost all marine organismal groups are understudied with respect to their genomes. One potential reason is that extracting high-quality DNA in sufficient amounts is challenging for many marine species because of high contents of polysaccharides, polyphenols and other secondary metabolites that inhibit downstream DNA library preparation. Consequently, protocols developed for vertebrates and plants do not always perform well for invertebrates and algae. In addition, many marine species have large population sizes and, as a consequence, highly variable genomes. Thus, to facilitate read assembly during genome sequencing, it is desirable to obtain enough DNA from a single individual, which is a challenge in many species of invertebrates and algae. Here, we present DNA extraction protocols for seven marine species (four invertebrates, two algae, and a marine yeast), optimized to provide sufficient DNA quality and yield for de novo genome sequencing projects.
DNA methylation Landscape of body size variation in sheep.

PubMed

Cao, Jiaxue; Wei, Caihong; Liu, Dongming; Wang, Huihua; Wu, Mingming; Xie, Zhiyuan; Capellini, Terence D; Zhang, Li; Zhao, Fuping; Li, Li; Zhong, Tao; Wang, Linjie; Lu, Jian; Liu, Ruizao; Zhang, Shifang; Du, Yongfei; Zhang, Hongping; Du, Lixin

2015-10-16

Sub-populations of Chinese Mongolian sheep exhibit significant variation in body mass. In the present study, we profiled genome-wide DNA methylation in these breeds by methylated DNA immunoprecipitation sequencing (MeDIP-seq) to determine whether DNA methylation plays a role in determining body mass in sheep. A high-quality methylation map of Chinese Mongolian sheep was obtained. We identified 399 differentially methylated regions located in 93 human orthologs that were previously reported as body size-related genes in human genome-wide association studies. We tested three regions in LTBP1, and DNA methylation at two CpG sites was significantly correlated with LTBP1 RNA expression. Additionally, a particular set of differentially methylated windows enriched in "developmental process" (GO:0032502) was identified as potential candidates for association with body mass variation. Next, we validated a small subset of these windows in five genes; DNA methylation of SMAD1, TSC1 and AKT1 differed significantly across breeds, and six CpG sites were significantly correlated with RNA expression. Interestingly, two CpG sites showed significant correlation with TSC1 protein expression. This study provides a thorough understanding of body size variation in sheep from an epigenetic perspective.
Empirical Bayes Estimation of Semi-parametric Hierarchical Mixture Models for Unbiased Characterization of Polygenic Disease Architectures

PubMed Central

Nishino, Jo; Kochi, Yuta; Shigemizu, Daichi; Kato, Mamoru; Ikari, Katsunori; Ochi, Hidenori; Noma, Hisashi; Matsui, Kota; Morizono, Takashi; Boroevich, Keith A.; Tsunoda, Tatsuhiko; Matsui, Shigeyuki

2018-01-01

Genome-wide association studies (GWAS) suggest that the genetic architecture of complex diseases consists of unexpectedly numerous variants with small effect sizes. However, the polygenic architectures of many diseases have not been well characterized because of a lack of simple and fast methods for unbiased estimation of the underlying proportion of disease-associated variants and their effect-size distribution. Applying empirical Bayes estimation of semi-parametric hierarchical mixture models to GWAS summary statistics, we confirmed that schizophrenia is extremely polygenic [~40% of independent genome-wide SNPs are risk variants, most with odds ratios (OR) ≤ 1.03], whereas rheumatoid arthritis is less polygenic (~4 to 8% risk variants, a significant portion reaching OR = 1.05 to 1.1). For rheumatoid arthritis, stratified estimations revealed that expression quantitative trait loci in blood explained a large portion of the genetic variance, and that low- and high-frequency derived alleles tended to be risk and protective alleles, respectively, suggesting a predominance of deleterious-risk and advantageous-protective mutations. Despite their genetic correlation, the effect-size distributions for schizophrenia and bipolar disorder differed across allele frequencies. These analyses distinguished disease polygenic architectures and provided clues to etiological differences among complex diseases. PMID:29740473
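The idea of estimating the proportion of associated variants from summary statistics can be illustrated with a much simpler, fully parametric model than the semi-parametric one used in the study. The Python sketch below assumes each z-score is drawn from a two-group mixture, (1 - pi) N(0, 1) + pi N(0, 1 + tau2), and fits pi and tau2 by expectation-maximization on simulated data. The model, the names `pi` and `tau2`, and the simulated values are our assumptions; the sketch also treats SNPs as independent.

```python
import math
import random


def normal_pdf(z, var):
    """Density of N(0, var) at z."""
    return math.exp(-z * z / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_group(z, pi=0.5, tau2=1.0, iters=200):
    """EM for z_j ~ (1 - pi) N(0, 1) + pi N(0, 1 + tau2).

    pi   : proportion of non-null (associated) variants
    tau2 : variance of true standardized effects among non-null variants
    """
    n = len(z)
    for _ in range(iters):
        # E-step: posterior probability that each variant is non-null
        resp = []
        for zj in z:
            a = pi * normal_pdf(zj, 1.0 + tau2)
            b = (1.0 - pi) * normal_pdf(zj, 1.0)
            resp.append(a / (a + b))
        # M-step: update mixing proportion and non-null variance
        s = sum(resp)
        pi = s / n
        var1 = sum(r * zj * zj for r, zj in zip(resp, z)) / s
        tau2 = max(var1 - 1.0, 1e-6)  # keep non-null variance above the null variance
    return pi, tau2


# Simulated z-scores: 30% non-null with tau2 = 9 (illustrative values only)
random.seed(1)
z = [random.gauss(0.0, math.sqrt(10.0)) if random.random() < 0.3 else random.gauss(0.0, 1.0)
     for _ in range(5000)]
pi_hat, tau2_hat = fit_two_group(z)
print(round(pi_hat, 3), round(tau2_hat, 2))
```

A single normal component for the non-null effects is the main simplification; the semi-parametric approach in the study avoids fixing the shape of the effect-size distribution.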
4C-ker: A Method to Reproducibly Identify Genome-Wide Interactions Captured by 4C-Seq Experiments.

PubMed

Raviram, Ramya; Rocha, Pedro P; Müller, Christian L; Miraldi, Emily R; Badri, Sana; Fu, Yi; Swanzey, Emily; Proudhon, Charlotte; Snetkova, Valentina; Bonneau, Richard; Skok, Jane A

2016-03-01

4C-Seq has proven to be a powerful technique for identifying genome-wide interactions with a single locus of interest (or "bait") that can be important for gene regulation. However, analysis of 4C-Seq data is complicated by the many biases inherent to the technique. An important consideration when dealing with 4C-Seq data is that signal resolution varies across the genome with 3D distance from the bait. As a result, the signal is highest in the region immediately surrounding the bait and increasingly lower in far-cis and trans. Another important aspect of 4C-Seq experiments is resolution, which is greatly influenced by the choice of restriction enzyme and the frequency with which it cuts the genome. Thus, a 4C-Seq analysis method must be flexible enough to analyze data generated with different enzymes and to identify interactions across the entire genome. Current methods for 4C-Seq analysis identify interactions either in regions near the bait or in regions located in far-cis and trans, but no method comprehensively analyzes 4C signals at different length scales. In addition, some methods fail in experiments where chromatin fragments are generated using frequent-cutter restriction enzymes. Here, we describe 4C-ker, a hidden Markov model (HMM)-based pipeline that identifies regions throughout the genome that interact with the 4C bait locus. In addition, we incorporate methods for identifying differential interactions in multiple 4C-Seq datasets collected from different genotypes or experimental conditions. Adaptive window sizes are used to correct for differences in signal coverage among near-bait regions, far-cis and trans chromosomes. Using several datasets, we demonstrate that 4C-ker outperforms all existing 4C-Seq pipelines in its ability to reproducibly identify interaction domains at all genomic ranges with enzymes of different resolution.
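The details of 4C-ker's model are not given in this abstract. As a generic illustration of how an HMM can segment a 4C signal into interacting and non-interacting stretches, the Python sketch below runs Viterbi decoding on windows already discretized into low (0) and high (1) counts. The two-state model, its probabilities and the discretization are assumptions chosen for the example.

```python
import math


def viterbi(obs, start, trans, emit):
    """Most probable hidden-state path for a discrete HMM (log space).

    obs   : list of observation symbols (ints)
    start : start[s]    = P(first state is s)
    trans : trans[r][s] = P(next state s | current state r)
    emit  : emit[s][o]  = P(observation o | state s)
    """
    n_states = len(start)
    # score[s]: best log-probability of any path that ends in state s
    score = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in range(n_states)]
    backpointers = []
    for o in obs[1:]:
        ptr, new_score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: score[r] + math.log(trans[r][s]))
            ptr.append(best)
            new_score.append(score[best] + math.log(trans[best][s]) + math.log(emit[s][o]))
        backpointers.append(ptr)
        score = new_score
    # Trace back from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# State 0 = background, state 1 = interacting; symbol 0 = low count, 1 = high count
START = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.1, 0.9]]
EMIT = [[0.9, 0.1], [0.2, 0.8]]

print(viterbi([0, 0, 0, 1, 1, 1, 1, 0, 0, 0], START, TRANS, EMIT))  # run of highs kept
print(viterbi([0, 0, 0, 1, 0, 0, 0], START, TRANS, EMIT))           # lone high smoothed out
```

The "sticky" transition matrix is what suppresses the isolated high window in the second call. A real pipeline must also handle the distance-dependent signal decay described above, which the abstract says 4C-ker addresses with adaptive window sizes.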
4C-ker: A Method to Reproducibly Identify Genome-Wide Interactions Captured by 4C-Seq Experiments

PubMed Central

Raviram, Ramya; Rocha, Pedro P.; Müller, Christian L.; Miraldi, Emily R.; Badri, Sana; Fu, Yi; Swanzey, Emily; Proudhon, Charlotte; Snetkova, Valentina

2016-01-01

4C-Seq has proven to be a powerful technique to identify genome-wide interactions with a single locus of interest (or “bait”) that can be important for gene regulation. However, analysis of 4C-Seq data is complicated by the many biases inherent to the technique. An important consideration when dealing with 4C-Seq data is the variation in signal resolution across the genome that results from differences in 3D distance from the bait. This leads to the highest signal in the region immediately surrounding the bait and progressively lower signal in far-cis and trans. Another important aspect of 4C-Seq experiments is resolution, which is strongly influenced by the choice of restriction enzyme and the frequency with which it cuts the genome. Thus, a 4C-Seq analysis method must be flexible enough to analyze data generated with different enzymes and to identify interactions across the entire genome. Current methods for 4C-Seq analysis identify interactions either in regions near the bait or in regions located in far-cis and trans, but no method comprehensively analyzes 4C signals at different length scales. In addition, some methods fail in experiments where chromatin fragments are generated using frequent-cutter restriction enzymes. Here, we describe 4C-ker, a hidden Markov model (HMM)-based pipeline that identifies regions throughout the genome that interact with the 4C bait locus. We also incorporate methods for identifying differential interactions in multiple 4C-Seq datasets collected from different genotypes or experimental conditions. Adaptive window sizes are used to correct for differences in signal coverage in near-bait regions, far-cis and trans chromosomes. Using several datasets, we demonstrate that 4C-ker outperforms all existing 4C-Seq pipelines in its ability to reproducibly identify interaction domains at all genomic ranges with restriction enzymes of different resolution. PMID:26938081
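The 4C-ker abstract combines two ideas: pooling read counts over windows of restriction fragments, and decoding interacting versus non-interacting regions with an HMM. The sketch below illustrates only the generic technique, a two-state Poisson-emission HMM decoded with the Viterbi algorithm; it is not the 4C-ker model. The window size `k`, the Poisson rates, and the transition probabilities are hypothetical values chosen for illustration, and a real pipeline would estimate them from data and use different window sizes near the bait, in far-cis and in trans.

```python
import math
from typing import List, Sequence


def window_counts(fragment_counts: Sequence[int], k: int) -> List[int]:
    """Sum read counts over consecutive, non-overlapping windows of k fragments."""
    return [sum(fragment_counts[i:i + k]) for i in range(0, len(fragment_counts), k)]


def poisson_logpmf(k: int, lam: float) -> float:
    """Log probability of observing k reads under a Poisson(lam) model."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_poisson(counts: Sequence[int], rates: Sequence[float],
                    trans: Sequence[Sequence[float]], init: Sequence[float]) -> List[int]:
    """Most likely hidden-state path (log space) for windowed counts."""
    n = len(rates)
    # Best log score of any path ending in each state at the first window.
    score = [math.log(init[s]) + poisson_logpmf(counts[0], rates[s]) for s in range(n)]
    backptrs: List[List[int]] = []
    for c in counts[1:]:
        new_score, ptr = [], []
        for s in range(n):
            # Best predecessor state for entering state s at this window.
            prev = max(range(n), key=lambda r: score[r] + math.log(trans[r][s]))
            ptr.append(prev)
            new_score.append(score[prev] + math.log(trans[prev][s])
                             + poisson_logpmf(c, rates[s]))
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical per-fragment counts with one enriched stretch in the middle.
fragments = [0, 1, 0, 0, 1, 1, 0, 1, 7, 8, 9, 9, 10, 10, 0, 1, 0, 0, 1, 0]
windows = window_counts(fragments, 2)
states = viterbi_poisson(windows, rates=[1.0, 15.0],
                         trans=[[0.9, 0.1], [0.1, 0.9]], init=[0.5, 0.5])
print(windows, states)  # state 1 marks the putative interacting windows
```

Working in log space avoids numerical underflow on long chromosomes; the strong self-transition probabilities favour contiguous interaction domains over isolated high-count windows.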
Complete genome sequence of an attenuated Sparfloxacin-resistant Streptococcus agalactiae strain 138spar

USDA-ARS's Scientific Manuscript database

The complete genome of the sparfloxacin-resistant Streptococcus agalactiae vaccine strain 138spar is 1,838,126 bp in size. The genome contains 1892 coding sequences and 82 RNAs. The genome was annotated with the NCBI Prokaryotic Genome Annotation Pipeline. The publishing of this genome will allo...
Modeling heterogeneous (co)variances from adjacent-SNP groups improves genomic prediction for milk protein composition traits.

PubMed

Gebreyesus, Grum; Lund, Mogens S; Buitenhuis, Bart; Bovenhuis, Henk; Poulsen, Nina A; Janss, Luc G

2017-12-05

Accurate genomic prediction requires a large reference population, which is problematic for traits that are expensive to measure. Traits related to milk protein composition are not routinely recorded due to costly procedures and are considered to be controlled by a few quantitative trait loci of large effect. The amount of variation explained may vary between regions, leading to heterogeneous (co)variance patterns across the genome. Genomic prediction models that can efficiently take such heterogeneity of (co)variances into account can result in improved prediction reliability. In this study, we developed and implemented novel univariate and bivariate Bayesian prediction models, based on estimates of heterogeneous (co)variances for genome segments (BayesAS). Available data consisted of milk protein composition traits measured on cows and de-regressed proofs of total protein yield derived for bulls. Single-nucleotide polymorphisms (SNPs) from 50K SNP arrays were grouped into non-overlapping genome segments. A segment was defined as one SNP, a group of 50, 100, or 200 adjacent SNPs, one chromosome, or the whole genome. Traditional univariate and bivariate genomic best linear unbiased prediction (GBLUP) models were also run for comparison. Reliabilities were calculated through a resampling strategy and using a deterministic formula. BayesAS models improved prediction reliability for most of the traits compared to GBLUP models, and this gain depended on segment size and the genetic architecture of the traits. The gain in prediction reliability was especially marked for the protein composition traits β-CN, κ-CN and β-LG, for which prediction reliabilities were improved by 49 percentage points on average using the MT-BayesAS model with a 100-SNP segment size compared to the bivariate GBLUP. Prediction reliabilities were highest with the BayesAS model that uses a 100-SNP segment size.
The bivariate versions of our BayesAS models resulted in extra gains of up to 6% in prediction reliability compared to the univariate versions. Substantial improvement in prediction reliability was possible for most of the traits related to milk protein composition using our novel BayesAS models. Grouping adjacent SNPs into segments provided more information for estimating parameters, and allowing the segments to have different (co)variances helped disentangle heterogeneous (co)variances across the genome.
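The segment definitions in the BayesAS abstract (one SNP, or blocks of 50, 100 or 200 adjacent SNPs) amount to assigning each SNP a segment ID, so that all SNPs sharing an ID can share one (co)variance parameter. The sketch below shows one way to make that assignment. It assumes, as a hypothetical design choice not stated in the abstract, that SNPs are sorted by chromosome and position and that segments never span a chromosome boundary; the function name `segment_ids` is illustrative only.

```python
from typing import List, Optional, Sequence


def segment_ids(chromosomes: Sequence[str], segment_size: int) -> List[int]:
    """Assign each SNP to a non-overlapping block of adjacent SNPs.

    `chromosomes` gives the chromosome of each SNP, already sorted by
    chromosome and position. A new segment starts whenever the current one
    is full or the chromosome changes (assumption: no cross-chromosome blocks).
    """
    if segment_size < 1:
        raise ValueError("segment_size must be >= 1")
    ids: List[int] = []
    current = -1
    filled = 0
    prev_chrom: Optional[str] = None
    for chrom in chromosomes:
        if chrom != prev_chrom or filled == segment_size:
            current += 1
            filled = 0
            prev_chrom = chrom
        ids.append(current)
        filled += 1
    return ids


# Five SNPs on chromosome 1 and three on chromosome 2, in blocks of two SNPs.
print(segment_ids(["1"] * 5 + ["2"] * 3, 2))
```

With `segment_size=1` every SNP gets its own variance; a `segment_size` at least as large as the biggest chromosome yields one segment per chromosome. The whole-genome case (a single segment) corresponds to GBLUP-like shared variance and is not produced by this per-chromosome rule.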
Mobile elements reveal small population size in the ancient ancestors of Homo sapiens.

PubMed

Huff, Chad D; Xing, Jinchuan; Rogers, Alan R; Witherspoon, David; Jorde, Lynn B

2010-02-02

The genealogies of different genetic loci vary in depth. The deeper the genealogy, the greater the chance that it will include a rare event, such as the insertion of a mobile element. Therefore, the genealogy of a region that contains a mobile element is on average older than that of the rest of the genome. In a simple demographic model, the expected time to most recent common ancestor (TMRCA) is doubled if a rare insertion is present. We test this expectation by examining single nucleotide polymorphisms around polymorphic Alu insertions from two completely sequenced human genomes. The estimated TMRCA for regions containing a polymorphic insertion is two times larger than the genomic average (P ≪ 10^-30), as predicted. Because genealogies that contain polymorphic mobile elements are old, they are shaped largely by the forces of ancient population history and are insensitive to recent demographic events, such as bottlenecks and expansions. Remarkably, just two human DNA sequences provide substantial information about ancient human population size. By comparing the likelihood of various demographic models, we estimate that the effective population size of human ancestors living before 1.2 million years ago was 18,500, and we can reject all models where the ancient effective population size was larger than 26,000. This result implies an unusually small population for a species spread across the entire Old World, particularly in light of the effective population sizes of chimpanzees (21,000) and gorillas (25,000), which each inhabit only one part of a single continent.
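The "TMRCA is doubled" expectation can be checked numerically. In a constant-size coalescent model, the TMRCA T of two sequences is exponentially distributed, and a rare insertion lands on the genealogy with probability roughly proportional to its total branch length 2T. Conditioning on the insertion therefore weights each genealogy by T, giving E[T²]/E[T] = 2/1 in units where E[T] = 1. The Monte Carlo sketch below is a simplified illustration under those assumptions (constant population size, insertion probability proportional to branch length), not the authors' likelihood method.

```python
import random


def tmrca_ratio_with_insertion(n: int = 200_000, seed: int = 1) -> float:
    """Ratio of mean TMRCA given a rare insertion to the unconditional mean TMRCA."""
    rng = random.Random(seed)
    # TMRCA of two sequences under a constant-size coalescent: Exp(mean 1).
    times = [rng.expovariate(1.0) for _ in range(n)]
    unconditional_mean = sum(times) / n
    # A rare insertion occurs with probability proportional to branch length 2T,
    # so weight each genealogy by T to condition on carrying an insertion.
    conditional_mean = sum(t * t for t in times) / sum(times)
    return conditional_mean / unconditional_mean


print(tmrca_ratio_with_insertion())  # close to 2
```

Because the weighting favours long genealogies, loci carrying polymorphic insertions preferentially sample deep, ancient coalescent times, which is why they are informative about ancient rather than recent population size.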
Evolution of genome size and complexity in the rhabdoviridae.

PubMed

Walker, Peter J; Firth, Cadhla; Widen, Steven G; Blasdell, Kim R; Guzman, Hilda; Wood, Thomas G; Paradkar, Prasad N; Holmes, Edward C; Tesh, Robert B; Vasilakis, Nikos

2015-02-01

RNA viruses exhibit substantial structural, ecological and genomic diversity. However, genome size in RNA viruses is likely limited by a high mutation rate, resulting in the evolution of various mechanisms to increase complexity while minimising genome expansion. Here we conduct a large-scale analysis of the genome sequences of 99 animal rhabdoviruses, including 45 genomes which we determined de novo, to identify patterns of genome expansion and the evolution of genome complexity. All but seven of the rhabdoviruses clustered into 17 well-supported monophyletic groups, of which eight corresponded to established genera, seven were assigned as new genera, and two were taxonomically ambiguous. We show that the acquisition and loss of new genes appears to have been a central theme of rhabdovirus evolution, and has been associated with the appearance of alternative, overlapping and consecutive ORFs within the major structural protein genes, and the insertion and loss of additional ORFs in each gene junction in a clade-specific manner. Changes in the lengths of gene junctions accounted for as much as 48.5% of the variation in genome size from the smallest to the largest genome, and the frequency with which new ORFs were observed increased in the 3' to 5' direction along the genome. We also identify several new families of accessory genes encoded in these regions, and show that non-canonical expression strategies involving TURBS-like termination-reinitiation, ribosomal frame-shifts and leaky ribosomal scanning appear to be common. We conclude that rhabdoviruses have an unusual capacity for genomic plasticity that may be linked to their discontinuous transcription strategy from the negative-sense single-stranded RNA genome, and propose a model that accounts for the regular occurrence of genome expansion and contraction throughout the evolution of the Rhabdoviridae.
Evolution of Genome Size and Complexity in the Rhabdoviridae

PubMed Central

Walker, Peter J.; Firth, Cadhla; Widen, Steven G.; Blasdell, Kim R.; Guzman, Hilda; Wood, Thomas G.; Paradkar, Prasad N.; Holmes, Edward C.; Tesh, Robert B.; Vasilakis, Nikos

2015-01-01

RNA viruses exhibit substantial structural, ecological and genomic diversity. However, genome size in RNA viruses is likely limited by a high mutation rate, resulting in the evolution of various mechanisms to increase complexity while minimising genome expansion. Here we conduct a large-scale analysis of the genome sequences of 99 animal rhabdoviruses, including 45 genomes which we determined de novo, to identify patterns of genome expansion and the evolution of genome complexity. All but seven of the rhabdoviruses clustered into 17 well-supported monophyletic groups, of which eight corresponded to established genera, seven were assigned as new genera, and two were taxonomically ambiguous. We show that the acquisition and loss of new genes appears to have been a central theme of rhabdovirus evolution, and has been associated with the appearance of alternative, overlapping and consecutive ORFs within the major structural protein genes, and the insertion and loss of additional ORFs in each gene junction in a clade-specific manner. Changes in the lengths of gene junctions accounted for as much as 48.5% of the variation in genome size from the smallest to the largest genome, and the frequency with which new ORFs were observed increased in the 3’ to 5’ direction along the genome. We also identify several new families of accessory genes encoded in these regions, and show that non-canonical expression strategies involving TURBS-like termination-reinitiation, ribosomal frame-shifts and leaky ribosomal scanning appear to be common. We conclude that rhabdoviruses have an unusual capacity for genomic plasticity that may be linked to their discontinuous transcription strategy from the negative-sense single-stranded RNA genome, and propose a model that accounts for the regular occurrence of genome expansion and contraction throughout the evolution of the Rhabdoviridae. PMID:25679389
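Identifying "alternative, overlapping and consecutive ORFs" starts with scanning a sequence in all reading frames. The sketch below is a minimal three-frame ORF scanner plus a check for ORFs in different frames that overlap. It assumes the input is the positive-sense (antigenome or mRNA-sense) sequence, since rhabdovirus genomes are negative-sense; the `min_codons` threshold and the function names are hypothetical, and it ignores non-canonical mechanisms such as frame-shifts, termination-reinitiation or leaky scanning.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
Orf = Tuple[int, int, int]  # (frame, start, end); end is exclusive, includes stop


def find_orfs(seq: str, min_codons: int = 30) -> List[Orf]:
    """ATG...stop ORFs in the three forward frames of a positive-sense sequence."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    orfs: List[Orf] = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


def overlapping_pairs(orfs: List[Orf]) -> List[Tuple[Orf, Orf]]:
    """Pairs of ORFs in different frames whose coordinates overlap."""
    return [(a, b) for idx, a in enumerate(orfs) for b in orfs[idx + 1:]
            if a[0] != b[0] and a[1] < b[2] and b[1] < a[2]]


orfs = find_orfs("ATGCATGCCTAAGTGA", min_codons=1)
print(orfs, overlapping_pairs(orfs))
```

Scanning all three frames, rather than only the frame of the known structural gene, is what exposes alternative ORFs embedded within protein genes and additional ORFs in gene junctions.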

High-speed and high-ratio referential genome compression.

PubMed

Liu, Yuansheng; Peng, Hui; Wong, Limsoon; Li, Jinyan

2017-11-01

The rapidly increasing number of genomes generated by high-throughput sequencing platforms and assembly algorithms is accompanied by problems in data storage, compression and communication. Traditional compression algorithms are unable to meet the demand for a high compression ratio due to the intrinsically challenging features of DNA sequences, such as small alphabet size, frequent repeats and palindromes. Reference-based lossless compression, by which only the differences between two similar genomes are stored, is a promising approach with a high compression ratio. We present a high-performance referential genome compression algorithm named HiRGC. It is based on a 2-bit encoding scheme and an advanced greedy-matching search on a hash table. We compare the performance of HiRGC with four state-of-the-art compression methods on a benchmark dataset of eight human genomes. HiRGC takes <30 min to compress each set of seven target genomes (about 21 gigabytes) into 96-260 megabytes, achieving compression ratios of 217 to 82 times. This performance is at least 1.9 times better than the best competing algorithm on its best case. Our compression speed is also at least 2.9 times faster. HiRGC is stable and robust across different reference genomes. In contrast, the competing methods' performance varies widely on different reference genomes. More experiments on 100 human genomes from the 1000 Genomes Project and on genomes of several other species again demonstrate that HiRGC's performance is consistently excellent. The C++ and Java source code of our algorithm is freely available for academic and non-commercial use and can be downloaded from https://github.com/yuansliu/HiRGC. Contact: jinyan.li@uts.edu.au. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press.
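The two ingredients named in the HiRGC abstract, 2-bit encoding and greedy matching against a hash table of the reference, can be illustrated with a toy referential compressor. The sketch below is a simplified illustration under stated assumptions, not HiRGC itself: it handles only A/C/G/T, uses a hypothetical k-mer length `k=4`, and represents the target as a list of `(reference_position, length)` matches and literal bases. Decompression reconstructs the target exactly, which is the lossless property described above.

```python
from typing import Dict, List, Tuple, Union

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"
Token = Union[Tuple[int, int], str]  # (ref_pos, length) match, or one literal base


def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string at 2 bits per base (4 bases per byte)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(data: bytes, n: int) -> str:
    """Inverse of pack_2bit for the first n bases."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 3] for i in range(n))


def greedy_compress(target: str, ref: str, k: int = 4) -> List[Token]:
    """Greedily replace target stretches with the longest reference match."""
    table: Dict[str, List[int]] = {}  # k-mer -> start positions in the reference
    for i in range(len(ref) - k + 1):
        table.setdefault(ref[i:i + k], []).append(i)
    tokens: List[Token] = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        for p in table.get(target[i:i + k], []):
            length = 0
            while (i + length < len(target) and p + length < len(ref)
                   and target[i + length] == ref[p + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = p, length
        if best_len >= k:
            tokens.append((best_pos, best_len))
            i += best_len
        else:
            tokens.append(target[i])  # no seed hit: store the base literally
            i += 1
    return tokens


def decompress(tokens: List[Token], ref: str) -> str:
    return "".join(ref[t[0]:t[0] + t[1]] if isinstance(t, tuple) else t for t in tokens)


ref = "ACGTACGTTTGACCA"
target = "ACGTACGTTAGACCA"  # one substitution relative to the reference
tokens = greedy_compress(target, ref)
print(tokens)
```

For similar genomes, long matches dominate, so the token list is far smaller than the target; in a real tool the literals and match records would themselves be stored compactly, for example with the 2-bit packing shown.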
Patterns of genomic and phenomic diversity in wine and table grapes

PubMed Central

Migicovsky, Zoë; Sawler, Jason; Gardner, Kyle M; Aradhya, Mallikarjuna K; Prins, Bernard H; Schwaninger, Heidi R; Bustamante, Carlos D; Buckler, Edward S; Zhong, Gan-Yuan; Brown, Patrick J; Myles, Sean

2017-01-01

Grapes are one of the most economically and culturally important crops worldwide, and they have been bred for both winemaking and fresh consumption. Here we evaluate patterns of diversity across 33 phenotypes collected over a 17-year period from 580 table and wine grape accessions that belong to one of the world’s largest grape gene banks, the grape germplasm collection of the United States Department of Agriculture. We find that phenological events throughout the growing season are correlated, and quantify the marked difference in size between table and wine grapes. By pairing publicly available historical phenotype data with genome-wide polymorphism data, we identify large effect loci controlling traits that have been targeted during domestication and breeding, including hermaphroditism, lighter skin pigmentation and muscat aroma. Breeding for larger berries in table grapes was traditionally concentrated in geographic regions where Islam predominates and alcohol was prohibited, whereas wine grapes retained the ancestral smaller size that is more desirable for winemaking in predominantly Christian regions. We uncover a novel locus with a suggestive association with berry size that harbors a signature of positive selection for larger berries. Our results suggest that religious rules concerning alcohol consumption have had a marked impact on patterns of phenomic and genomic diversity in grapes. PMID:28791127
Patterns of genomic and phenomic diversity in wine and table grapes.

PubMed

Migicovsky, Zoë; Sawler, Jason; Gardner, Kyle M; Aradhya, Mallikarjuna K; Prins, Bernard H; Schwaninger, Heidi R; Bustamante, Carlos D; Buckler, Edward S; Zhong, Gan-Yuan; Brown, Patrick J; Myles, Sean

2017-01-01

Grapes are one of the most economically and culturally important crops worldwide, and they have been bred for both winemaking and fresh consumption. Here we evaluate patterns of diversity across 33 phenotypes collected over a 17-year period from 580 table and wine grape accessions that belong to one of the world's largest grape gene banks, the grape germplasm collection of the United States Department of Agriculture. We find that phenological events throughout the growing season are correlated, and quantify the marked difference in size between table and wine grapes. By pairing publicly available historical phenotype data with genome-wide polymorphism data, we identify large effect loci controlling traits that have been targeted during domestication and breeding, including hermaphroditism, lighter skin pigmentation and muscat aroma. Breeding for larger berries in table grapes was traditionally concentrated in geographic regions where Islam predominates and alcohol was prohibited, whereas wine grapes retained the ancestral smaller size that is more desirable for winemaking in predominantly Christian regions. We uncover a novel locus with a suggestive association with berry size that harbors a signature of positive selection for larger berries. Our results suggest that religious rules concerning alcohol consumption have had a marked impact on patterns of phenomic and genomic diversity in grapes.
Similar Efficacies of Selection Shape Mitochondrial and Nuclear Genes in Both Drosophila melanogaster and Homo sapiens.

PubMed

Cooper, Brandon S; Burrus, Chad R; Ji, Chao; Hahn, Matthew W; Montooth, Kristi L

2015-08-21

Deleterious mutations contribute to polymorphism even when selection effectively prevents their fixation. The efficacy of selection in removing deleterious mitochondrial mutations from populations depends on the effective population size (Ne) of the mitochondrial DNA and the degree to which a lack of recombination magnifies the effects of linked selection. Using complete mitochondrial genomes from Drosophila melanogaster and nuclear data available from the same samples, we reexamine the hypothesis that nonrecombining animal mitochondrial DNA harbors an excess of deleterious polymorphisms relative to the nuclear genome. We find no evidence of recombination in the mitochondrial genome, and the much-reduced level of mitochondrial synonymous polymorphism relative to nuclear genes is consistent with a reduction in Ne. Nevertheless, we find that the neutrality index, a measure of the excess of nonsynonymous polymorphism relative to the neutral expectation, is only weakly significantly different between mitochondrial and nuclear loci. This difference is likely the result of the larger proportion of beneficial mutations in X-linked relative to autosomal loci, and we find little to no difference between mitochondrial and autosomal neutrality indices. Reanalysis of published data from Homo sapiens reveals a similar lack of a difference between the two genomes, although previous studies have suggested a strong difference in both species. Thus, despite a smaller Ne, mitochondrial loci of both flies and humans appear to experience similar efficacies of purifying selection as do loci in the recombining nuclear genome. Copyright © 2015 Cooper et al.
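The neutrality index mentioned above is conventionally computed from McDonald-Kreitman counts as NI = (Pn/Ps)/(Dn/Ds), where Pn and Ps are nonsynonymous and synonymous polymorphisms and Dn and Ds are nonsynonymous and synonymous fixed differences. Values above 1 indicate an excess of nonsynonymous polymorphism, as expected when slightly deleterious mutations segregate. The counts in the sketch below are hypothetical, and the study's own estimator and significance tests may differ.

```python
def neutrality_index(pn: int, ps: int, dn: int, ds: int) -> float:
    """NI = (Pn/Ps) / (Dn/Ds) from McDonald-Kreitman counts."""
    if ps == 0 or dn == 0 or ds == 0:
        raise ValueError("Ps, Dn and Ds must be non-zero")
    return (pn / ps) / (dn / ds)


# Hypothetical counts: nonsynonymous variants are twice as common, relative to
# synonymous ones, among polymorphisms as among fixed differences.
print(neutrality_index(pn=10, ps=20, dn=5, ds=20))  # 2.0
```

Comparing NI between mitochondrial, X-linked and autosomal loci, as the study does, asks whether the reduced Ne of the nonrecombining mitochondrial genome leaves more deleterious polymorphism behind.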
Hybridization and genome size evolution: timing and magnitude of nuclear DNA content increases in Helianthus homoploid hybrid species

PubMed Central

Baack, Eric J.; Whitney, Kenneth D.; Rieseberg, Loren H.

2008-01-01

Hybridization and polyploidy can induce rapid genomic changes, including the gain or loss of DNA, but the magnitude and timing of such changes are not well understood. The homoploid hybrid system in Helianthus (three hybrid-derived species and their two parents) provides an opportunity to examine the link between hybridization and genome size changes in a replicated fashion. Flow cytometry was used to estimate the nuclear DNA content in multiple populations of three homoploid hybrid Helianthus species (Helianthus anomalus, Helianthus deserticola, and Helianthus paradoxus), the parental species (Helianthus annuus and Helianthus petiolaris), synthetic hybrids, and natural hybrid-zone populations. Results confirm that hybrid-derived species have 50% more nuclear DNA than the parental species. Despite multiple origins, hybrid species were largely consistent in their DNA content across populations, although H. deserticola showed significant interpopulation differences. First- and sixth-generation synthetic hybrids and hybrid-zone plants did not show an increase from parental DNA content. First-generation hybrids differed in DNA content according to the maternal parent. In summary, hybridization by itself does not lead to increased nuclear DNA content in Helianthus, and the evolutionary forces responsible for the repeated increases in DNA content seen in the hybrid-derived species remain mysterious. PMID:15998412
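Flow cytometric genome size estimation usually relies on a reference standard of known DNA content: the sample's 2C value is the ratio of the sample and standard G1 peak positions multiplied by the standard's 2C value. The sketch below applies that standard ratio and then expresses the result as a percentage increase over a parental value, the form of the "50% more nuclear DNA" comparison above. The peak positions and the standard's 2C value are hypothetical; the abstract does not specify which standard was used.

```python
def estimate_2c(sample_peak: float, standard_peak: float, standard_2c_pg: float) -> float:
    """Sample 2C DNA content (pg) from G1 peak positions relative to a standard."""
    return sample_peak / standard_peak * standard_2c_pg


def percent_increase(value: float, reference: float) -> float:
    """Percentage by which value exceeds reference."""
    return 100.0 * (value - reference) / reference


# Hypothetical fluorescence peaks (arbitrary units) and a 2.0 pg standard.
hybrid_2c = estimate_2c(sample_peak=150.0, standard_peak=100.0, standard_2c_pg=2.0)
parent_2c = estimate_2c(sample_peak=100.0, standard_peak=100.0, standard_2c_pg=2.0)
print(hybrid_2c, percent_increase(hybrid_2c, parent_2c))  # 3.0 50.0
```

Measuring the sample and standard in the same run, as an internal standard, reduces the effect of instrument drift on the peak ratio.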
De Novo Assembly of the Pneumocystis jirovecii Genome from a Single Bronchoalveolar Lavage Fluid Specimen from a Patient

PubMed Central

Cissé, Ousmane H.; Pagni, Marco; Hauser, Philippe M.

2012-01-01

Pneumocystis jirovecii is a fungus that causes severe pneumonia in immunocompromised patients. However, its study is hindered by the lack of an in vitro culture method. We report here the genome of P. jirovecii that was obtained from a single bronchoalveolar lavage fluid specimen from a patient. The major challenge was the in silico sorting of the reads from a mixture representing the different organisms of the lung microbiome. This genome lacks virulence factors and most amino acid biosynthesis enzymes and presents reduced GC content and size. Together with epidemiological observations, these features suggest that P. jirovecii is an obligate parasite specialized in the colonization of human lungs, which causes disease only in immune-deficient individuals. This genome sequence will boost research on this deadly pathogen. PMID:23269827
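GC content, one of the genome features reported above, is simply the fraction of G and C among unambiguous bases. Because organisms in a metagenomic mixture often differ in GC content, a GC window is sometimes used as one crude signal when separating reads. The sketch below shows only that crude filter; it is a hypothetical illustration, not the read-sorting procedure used for the P. jirovecii assembly, and the thresholds are arbitrary.

```python
from typing import Iterable, List


def gc_content(seq: str) -> float:
    """Fraction of G+C among A/C/G/T bases (ambiguous bases such as N are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def filter_reads_by_gc(reads: Iterable[str], low: float, high: float) -> List[str]:
    """Keep reads whose GC content falls within [low, high] (hypothetical binning rule)."""
    return [r for r in reads if low <= gc_content(r) <= high]


print(filter_reads_by_gc(["GGGG", "ATAT", "GCAT", "GCNNAT"], 0.4, 0.6))
```

In practice GC content alone separates organisms poorly when their distributions overlap, so it is usually combined with similarity searches or k-mer composition.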
GEnomes Management Application (GEM.app): a new software tool for large-scale collaborative genome analysis.

PubMed

Gonzalez, Michael A; Lebrigio, Rafael F Acosta; Van Booven, Derek; Ulloa, Rick H; Powell, Eric; Speziani, Fiorella; Tekin, Mustafa; Schüle, Rebecca; Züchner, Stephan

2013-06-01

Novel genes are now identified at a rapid pace for many Mendelian disorders, and increasingly, for genetically complex phenotypes. However, new challenges have also become evident: (1) effectively managing larger exome and/or genome datasets, especially for smaller labs; (2) direct hands-on analysis and contextual interpretation of variant data in large genomic datasets; and (3) many small and medium-sized clinical and research-based investigative teams around the world are generating data that, if combined and shared, will significantly increase the opportunities for the entire community to identify new genes. To address these challenges, we have developed GEnomes Management Application (GEM.app), a software tool to annotate, manage, visualize, and analyze large genomic datasets (https://genomics.med.miami.edu/). GEM.app currently contains ∼1,600 whole exomes from 50 different phenotypes studied by 40 principal investigators from 15 different countries. The focus of GEM.app is on user-friendly analysis for nonbioinformaticians to make next-generation sequencing data directly accessible. Yet, GEM.app provides powerful and flexible filter options, including single family filtering, across family/phenotype queries, nested filtering, and evaluation of segregation in families. In addition, the system is fast, obtaining results within 4 sec across ∼1,200 exomes. We believe that this system will further enhance identification of genetic causes of human disease. © 2013 Wiley Periodicals, Inc.
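Single-family filtering with evaluation of segregation, one of the GEM.app filter options listed above, can be sketched as a nested filter: first keep rare variants, then keep those whose carrier status matches affection status in the family. The sketch below assumes a dominant inheritance model, a hypothetical record layout (`id`, `pop_af`, `genotypes`) and an arbitrary frequency cutoff; it does not reproduce GEM.app's actual data model or interface.

```python
from typing import Dict, List, TypedDict


class Variant(TypedDict):
    id: str
    pop_af: float              # population allele frequency (hypothetical field)
    genotypes: Dict[str, int]  # sample -> number of alternate alleles (0, 1, 2)


def segregates_dominant(genotypes: Dict[str, int], affected: Dict[str, bool]) -> bool:
    """True if genotyped affected members carry the variant and unaffected ones do not."""
    for sample, is_affected in affected.items():
        if sample not in genotypes:
            continue  # missing genotype: no evidence either way
        if (genotypes[sample] > 0) != is_affected:
            return False
    return True


def filter_family(variants: List[Variant], affected: Dict[str, bool],
                  max_pop_af: float = 0.01) -> List[str]:
    """Nested filter: rare variants first, then segregation within one family."""
    rare = [v for v in variants if v["pop_af"] <= max_pop_af]
    return [v["id"] for v in rare if segregates_dominant(v["genotypes"], affected)]


family = {"father": True, "mother": False, "child": True}
variants: List[Variant] = [
    {"id": "v1", "pop_af": 0.001, "genotypes": {"father": 1, "mother": 0, "child": 1}},
    {"id": "v2", "pop_af": 0.001, "genotypes": {"father": 1, "mother": 1, "child": 1}},
    {"id": "v3", "pop_af": 0.2, "genotypes": {"father": 1, "mother": 0, "child": 1}},
]
print(filter_family(variants, family))  # ['v1']
```

Applying the cheap frequency filter before the per-sample segregation check keeps the second stage small, which matters when queries run across many exomes.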
Characterization of the repetitive DNA elements in the genome of fish lymphocystis disease viruses.

PubMed

Schnitzler, P; Darai, G

1989-09-01

The complete DNA nucleotide sequence of the repetitive DNA elements in the genome of fish lymphocystis disease virus (FLDV) isolated from two different species (flounder and dab) was determined. The size of these repetitive DNA elements was found to be 1413 bp, which corresponds to the DNA sequences of the 5' terminus of the EcoRI DNA fragment B (0.034 to 0.052 m.u.) and to the EcoRI DNA fragment M (0.718 to 0.736 m.u.) of the FLDV genome causing lymphocystis disease in flounder and plaice. The degree of DNA nucleotide homology between both regions was found to be 99%. The repetitive DNA element in the genome of FLDV isolated from another fish species (dab) was identified and is located within the EcoRI DNA fragments B and J of the viral genome. The DNA nucleotide sequence of one duplicate of this repetition (EcoRI DNA fragment J) was determined (1410 bp) and compared to the DNA nucleotide sequences of the repetitive DNA elements of the genome of FLDV isolated from flounder. It was found that the repetitive DNA elements of the genome of FLDV derived from two different fish species are highly conserved and possess a degree of DNA sequence homology of 94%. The DNA sequences of each strand of the individual repetitive element possess one open reading frame.
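Figures such as 99% and 94% sequence homology are, in modern terms, percent identity computed over an alignment. The sketch below computes percent identity for two sequences that have already been aligned to equal length, skipping columns where either sequence has a gap. This gap convention is an assumption: identity definitions vary between tools, and the 1989 study's exact calculation is not given in the abstract.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical bases over alignment columns without gaps ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
```

Counting gap columns as mismatches instead would lower the value for sequences of different lengths, such as the 1413 bp and 1410 bp repeats compared here.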
Salix transect of Europe: variation in ploidy and genome size in willow-associated common nettle, Urtica dioica L. sens. lat., from Greece to arctic Norway

PubMed Central

Hidalgo, Oriane; Pellicer, Jaume; Percy, Diana; Leitch, Ilia J.

2016-01-01

Background: The common stinging nettle, Urtica dioica L. sensu lato, is an invertebrate "superhost", its clonal patches maintaining large populations of insects and molluscs. It is extremely widespread in Europe and highly variable, and two ploidy levels (diploid and tetraploid) are known. However, geographical patterns in cytotype variation require further study. New information: We assembled a collection of nettles in conjunction with a transect of Europe from the Aegean to Arctic Norway (primarily conducted to examine the diversity of Salix and Salix-associated insects). Using flow cytometry to measure genome size, our sample of 29 plants reveals 5 diploids and 24 tetraploids. Two diploids were found in SE Europe (Bulgaria and Romania) and three diploids in S. Finland. More detailed cytotype surveys in these regions are suggested. The tetraploid genome size (2C value) varied between accessions from 2.36 to 2.59 pg. The diploids varied from 1.31 to 1.35 pg per 2C nucleus, equivalent to a haploid genome size of c. 650 Mbp. Within the tetraploids, we find that the most northerly samples (from N. Finland and arctic Norway) have a generally higher genome size. This is possibly indicative of a distinct population in this region. PMID:27932918
Salix transect of Europe: variation in ploidy and genome size in willow-associated common nettle, Urtica dioica L. sens. lat., from Greece to arctic Norway.

PubMed

Cronk, Quentin; Hidalgo, Oriane; Pellicer, Jaume; Percy, Diana; Leitch, Ilia J

2016-01-01

The common stinging nettle, Urtica dioica L. sensu lato, is an invertebrate "superhost", its clonal patches maintaining large populations of insects and molluscs. It is extremely widespread in Europe and highly variable, and two ploidy levels (diploid and tetraploid) are known. However, geographical patterns in cytotype variation require further study. We assembled a collection of nettles in conjunction with a transect of Europe from the Aegean to Arctic Norway (primarily conducted to examine the diversity of Salix and Salix-associated insects). Using flow cytometry to measure genome size, our sample of 29 plants reveals 5 diploids and 24 tetraploids. Two diploids were found in SE Europe (Bulgaria and Romania) and three diploids in S. Finland. More detailed cytotype surveys in these regions are suggested. The tetraploid genome size (2C value) varied between accessions from 2.36 to 2.59 pg. The diploids varied from 1.31 to 1.35 pg per 2C nucleus, equivalent to a haploid genome size of c. 650 Mbp. Within the tetraploids, we find that the most northerly samples (from N. Finland and arctic Norway) have a generally higher genome size. This is possibly indicative of a distinct population in this region.
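The step from "1.31 to 1.35 pg per 2C nucleus" to "c. 650 Mbp" uses two conversions: halving the 2C value to obtain the haploid (1C) content, and converting picograms to base pairs with the widely used factor 1 pg ≈ 978 Mbp (Doležel et al., 2003). The sketch below reproduces that arithmetic; the function names are illustrative only.

```python
PG_TO_MBP = 978.0  # 1 pg of DNA is approximately 978 Mbp (Dolezel et al., 2003)


def pg_to_mbp(pg: float) -> float:
    """Convert a DNA mass in picograms to megabase pairs."""
    return pg * PG_TO_MBP


def haploid_size_mbp(two_c_pg: float) -> float:
    """1C (haploid) genome size in Mbp from a 2C nuclear DNA content in pg."""
    return pg_to_mbp(two_c_pg / 2)


low, high = haploid_size_mbp(1.31), haploid_size_mbp(1.35)
print(round(low, 2), round(high, 2), round((low + high) / 2))  # 640.59 660.15 650
```

The diploid range therefore spans roughly 641 to 660 Mbp per haploid genome, consistent with the reported c. 650 Mbp.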
Small Deletion Variants Have Stable Breakpoints Commonly Associated with Alu Elements

PubMed Central

Coin, Lachlan J. M.; Steinfeld, Israel; Yakhini, Zohar; Sladek, Rob; Froguel, Philippe; Blakemore, Alexandra I. F.

2008-01-01

Copy number variants (CNVs) contribute significantly to human genomic variation, with over 5000 loci reported, covering more than 18% of the euchromatic human genome. Little is known, however, about the origin and stability of variants of different size and complexity. We investigated the breakpoints of 20 small, common deletions, representing a subset of those originally identified by array CGH, using Agilent microarrays, in 50 healthy French Caucasian subjects. By sequencing PCR products amplified using primers designed to span the deleted regions, we determined the exact size and genomic position of the deletions in all affected samples. For each deletion studied, all individuals carrying the deletion share identical upstream and downstream breakpoints at the sequence level, suggesting that the deletion event occurred just once and later became common in the population. This is supported by linkage disequilibrium (LD) analysis, which has revealed that most of the deletions studied are in moderate to strong LD with surrounding SNPs, and have conserved long-range haplotypes. Analysis of the sequences flanking the deletion breakpoints revealed an enrichment of microhomology at the breakpoint junctions. More significantly, we found an enrichment of Alu repeat elements, the overwhelming majority of which intersected deletion breakpoints at their poly-A tails. We found no enrichment of LINE elements or segmental duplications, in contrast to other reports. Sequence analysis revealed enrichment of a conserved motif in the sequences surrounding the deletion breakpoints, although whether this motif has any mechanistic role in the formation of some deletions has yet to be determined. Considered together with existing information on more complex inherited variant regions, and reports of de novo variants associated with autism, these data support the presence of different subgroups of CNV in the genome which may have originated through different mechanisms. PMID:18769679
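Microhomology at a deletion junction shows up as positional ambiguity: if the bases flanking the two breakpoints are identical, the same deleted allele can be explained by shifting the deletion left or right. The sketch below measures that ambiguity for a deletion of `ref[start:end]` (0-based, end exclusive) by counting how far the deletion can slide in each direction while producing the same sequence. It is a generic illustration, not the authors' analysis pipeline; within tandem repeats the value reflects repeat-length ambiguity rather than microhomology in the strict sense.

```python
def microhomology_length(ref: str, start: int, end: int) -> int:
    """Length of sequence identity shared by the two breakpoints of ref[start:end]."""
    if not 0 <= start < end <= len(ref):
        raise ValueError("invalid deletion coordinates")
    left = 0  # slide the deletion leftwards while the flanking bases match
    while start - 1 - left >= 0 and ref[start - 1 - left] == ref[end - 1 - left]:
        left += 1
    right = 0  # slide the deletion rightwards while the flanking bases match
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    return left + right


# "CA" occurs at both breakpoints, so either placement gives the same allele.
ref = "GGG" + "CA" + "TTTT" + "CA" + "GGG"
print(microhomology_length(ref, 3, 9), microhomology_length(ref, 5, 11))  # 2 2
```

Reporting the same value for every equivalent placement makes breakpoint comparisons across individuals robust to how each deletion was left- or right-aligned during sequencing analysis.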
Retrotransposons Are the Major Contributors to the Expansion of the Drosophila ananassae Muller F Element

PubMed Central

Shaffer, Christopher D.; Chen, Elizabeth J.; Quisenberry, Thomas J.; Ko, Kevin; Braverman, John M.; Giarla, Thomas C.; Mortimer, Nathan T.; Reed, Laura K.; Smith, Sheryl T.; Robic, Srebrenka; McCartha, Shannon R.; Perry, Danielle R.; Prescod, Lindsay M.; Sheppard, Zenyth A.; Saville, Ken J.; McClish, Allison; Morlock, Emily A.; Sochor, Victoria R.; Stanton, Brittney; Veysey-White, Isaac C.; Revie, Dennis; Jimenez, Luis A.; Palomino, Jennifer J.; Patao, Melissa D.; Patao, Shane M.; Himelblau, Edward T.; Campbell, Jaclyn D.; Hertz, Alexandra L.; McEvilly, Maddison F.; Wagner, Allison R.; Youngblom, James; Bedi, Baljit; Bettincourt, Jeffery; Duso, Erin; Her, Maiye; Hilton, William; House, Samantha; Karimi, Masud; Kumimoto, Kevin; Lee, Rebekah; Lopez, Darryl; Odisho, George; Prasad, Ricky; Robbins, Holly Lyn; Sandhu, Tanveer; Selfridge, Tracy; Tsukashima, Kara; Yosif, Hani; Kokan, Nighat P.; Britt, Latia; Zoellner, Alycia; Spana, Eric P.; Chlebina, Ben T.; Chong, Insun; Friedman, Harrison; Mammo, Danny A.; Ng, Chun L.; Nikam, Vinayak S.; Schwartz, Nicholas U.; Xu, Thomas Q.; Burg, Martin G.; Batten, Spencer M.; Corbeill, Lindsay M.; Enoch, Erica; Ensign, Jesse J.; Franks, Mary E.; Haiker, Breanna; Ingles, Judith A.; Kirkland, Lyndsay D.; Lorenz-Guertin, Joshua M.; Matthews, Jordan; Mittig, Cody M.; Monsma, Nicholaus; Olson, Katherine J.; Perez-Aragon, Guillermo; Ramic, Alen; Ramirez, Jordan R.; Scheiber, Christopher; Schneider, Patrick A.; Schultz, Devon E.; Simon, Matthew; Spencer, Eric; Wernette, Adam C.; Wykle, Maxine E.; Zavala-Arellano, Elizabeth; McDonald, Mitchell J.; Ostby, Kristine; Wendland, Peter; DiAngelo, Justin R.; Ceasrine, Alexis M.; Cox, Amanda H.; Docherty, James E.B.; Gingras, Robert M.; Grieb, Stephanie M.; Pavia, Michael J.; Personius, Casey L.; Polak, Grzegorz L.; Beach, Dale L.; Cerritos, Heaven L.; Horansky, Edward A.; Sharif, Karim A.; Moran, Ryan; Parrish, Susan; Bickford, Kirsten; Bland, Jennifer; Broussard, Juliana; Campbell, Kerry; Deibel, Katelynn E.; Forka, Richard; Lemke, Monika C.; Nelson, Marlee B.; O'Keeffe, Catherine; Ramey, S. Mariel; Schmidt, Luke; Villegas, Paola; Jones, Christopher J.; Christ, Stephanie L.; Mamari, Sami; Rinaldi, Adam S.; Stity, Ghazal; Hark, Amy T.; Scheuerman, Mark; Silver Key, S. Catherine; McRae, Briana D.; Haberman, Adam S.; Asinof, Sam; Carrington, Harriette; Drumm, Kelly; Embry, Terrance; McGuire, Richard; Miller-Foreman, Drew; Rosen, Stella; Safa, Nadia; Schultz, Darrin; Segal, Matt; Shevin, Yakov; Svoronos, Petros; Vuong, Tam; Skuse, Gary; Paetkau, Don W.; Bridgman, Rachael K.; Brown, Charlotte M.; Carroll, Alicia R.; Gifford, Francesca M.; Gillespie, Julie Beth; Herman, Susan E.; Holtcamp, Krystal L.; Host, Misha A.; Hussey, Gabrielle; Kramer, Danielle M.; Lawrence, Joan Q.; Martin, Madeline M.; Niemiec, Ellen N.; O'Reilly, Ashleigh P.; Pahl, Olivia A.; Quintana, Guadalupe; Rettie, Elizabeth A.S.; Richardson, Torie L.; Rodriguez, Arianne E.; Rodriguez, Mona O.; Schiraldi, Laura; Smith, Joanna J.; Sugrue, Kelsey F.; Suriano, Lindsey J.; Takach, Kaitlyn E.; Vasquez, Arielle M.; Velez, Ximena; Villafuerte, Elizabeth J.; Vives, Laura T.; Zellmer, Victoria R.; Hauke, Jeanette; Hauser, Charles R.; Barker, Karolyn; Cannon, Laurie; Parsamian, Perouza; Parsons, Samantha; Wichman, Zachariah; Bazinet, Christopher W.; Johnson, Diana E.; Bangura, Abubakarr; Black, Jordan A.; Chevee, Victoria; Einsteen, Sarah A.; Hilton, Sarah K.; Kollmer, Max; Nadendla, Rahul; Stamm, Joyce; Fafara-Thompson, Antoinette E.; Gygi, Amber M.; Ogawa, Emmy E.; Van Camp, Matt; Kocsisova, Zuzana; Leatherman, Judith L.; Modahl, Cassie M.; Rubin, Michael R.; Apiz-Saab, Susana S.; Arias-Mejias, Suzette M.; Carrion-Ortiz, Carlos F.; Claudio-Vazquez, Patricia N.; Espada-Green, Debbie M.; Feliciano-Camacho, Marium; Gonzalez-Bonilla, Karina M.; Taboas-Arroyo, Mariela; Vargas-Franco, Dorianmarie; Montañez-Gonzalez, Raquel; Perez-Otero, Joseph; Rivera-Burgos, Myrielis; Rivera-Rosario, Francisco J.; Eisler, Heather L.; Alexander, Jackie; Begley, Samatha K.; Gabbard, Deana; Allen, Robert J.; Aung, Wint Yan; Barshop, William D.; Boozalis, Amanda; Chu, Vanessa P.; Davis, Jeremy S.; Duggal, Ryan N.; Franklin, Robert; Gavinski, Katherine; Gebreyesus, Heran; Gong, Henry Z.; Greenstein, Rachel A.; Guo, Averill D.; Hanson, Casey; Homa, Kaitlin E.; Hsu, Simon C.; Huang, Yi; Huo, Lucy; Jacobs, Sarah; Jia, Sasha; Jung, Kyle L.; Wai-Chee Kong, Sarah; Kroll, Matthew R.; Lee, Brandon M.; Lee, Paul F.; Levine, Kevin M.; Li, Amy S.; Liu, Chengyu; Liu, Max Mian; Lousararian, Adam P.; Lowery, Peter B.; Mallya, Allyson P.; Marcus, Joseph E.; Ng, Patrick C.; Nguyen, Hien P.; Patel, Ruchik; Precht, Hashini; Rastogi, Suchita; Sarezky, Jonathan M.; Schefkind, Adam; Schultz, Michael B.; Shen, Delia; Skorupa, Tara; Spies, Nicholas C.; Stancu, Gabriel; Vivian Tsang, Hiu Man; Turski, Alice L.; Venkat, Rohit; Waldman, Leah E.; Wang, Kaidi; Wang, Tracy; Wei, Jeffrey W.; Wu, Dennis Y.; Xiong, David D.; Yu, Jack; Zhou, Karen; McNeil, Gerard P.; Fernandez, Robert W.; Menzies, Patrick Gomez; Gu, Tingting; Buhler, Jeremy; Mardis, Elaine R.; Elgin, Sarah C.R.

2017-01-01

The discordance between genome size and the complexity of eukaryotes can partly be attributed to differences in repeat density. The Muller F element (∼5.2 Mb) is the smallest chromosome in Drosophila melanogaster, but it is substantially larger (>18.7 Mb) in D. ananassae. To identify the major contributors to the expansion of the F element and to assess their impact, we improved the genome sequence and annotated the genes in a 1.4-Mb region of the D. ananassae F element, and in a 1.7-Mb region from the D element for comparison. We find that transposons (particularly LTR and LINE retrotransposons) are major contributors to this expansion (78.6%), while Wolbachia sequences integrated into the D. ananassae genome are minor contributors (0.02%). Both D. melanogaster and D. ananassae F-element genes exhibit distinct characteristics compared to D-element genes (e.g., larger coding spans, larger introns, more coding exons, and lower codon bias), but these differences are exaggerated in D. ananassae. Compared to D. melanogaster, the codon bias observed in D. ananassae F-element genes can primarily be attributed to mutational biases rather than to selection. The 5′ ends of F-element genes in both species are enriched in dimethylation of lysine 4 on histone 3 (H3K4me2), while the coding spans are enriched in H3K9me2. Despite differences in repeat density and gene characteristics, D. ananassae F-element genes show a range of expression levels similar to that of genes in euchromatic domains. This study improves our understanding of how transposons can affect genome size and how genes can function within highly repetitive domains. PMID:28667019
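Repeat density, the quantity this abstract compares between the F and D elements, is commonly expressed as the fraction of a region's bases covered by annotated repeats. The Python sketch below is a minimal illustration, assuming repeat annotations are supplied as half-open (start, end) intervals lying inside the region (a hypothetical input format; real pipelines would typically parse repeat-annotation output such as RepeatMasker tables). Overlapping hits are merged first so that no base is counted twice.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def repeat_density(region_length, repeats):
    """Fraction of a region covered by repeats, counting overlaps once.

    Assumes every interval already lies within [0, region_length).
    """
    covered = sum(end - start for start, end in merge_intervals(repeats))
    return covered / region_length


if __name__ == "__main__":
    # Toy 1,000-bp region with three repeat hits; the first two overlap.
    hits = [(100, 300), (250, 400), (700, 800)]
    print(repeat_density(1000, hits))
```

Merging before summing matters because nested or overlapping annotations (common in transposon-rich regions) would otherwise inflate the density above the true covered fraction.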
Identification of Differentially Methylated Sites with Weak Methylation Effects

PubMed Central

Tran, Hong; Zhu, Hongxiao; Wu, Xiaowei; Kim, Gunjune; Clarke, Christopher R.; Larose, Hailey; Haak, David C.; Westwood, James H.; Zhang, Liqing

2018-01-01

Deoxyribonucleic acid (DNA) methylation is an epigenetic modification crucial for regulating stress responses. Whole-genome bisulfite sequencing makes it possible to profile DNA methylation genome-wide at single-nucleotide resolution. An essential task following the generation of bisulfite sequencing data is to detect differentially methylated cytosines (DMCs) among treatments. Most statistical methods for DMC detection do not consider the dependency of methylation patterns across the genome and may therefore inflate the type I error rate. Furthermore, small sample sizes and weak methylation effects among different phenotype categories make it difficult for these methods to detect DMCs accurately. To address these issues, the wavelet-based functional mixed model (WFMM) was introduced to detect DMCs. To further examine the performance of WFMM in detecting weak differential methylation events, we used both simulated and empirical data and compared WFMM with methylKit, a popular DMC detection tool. Analyses of simulated data that replicated the effects of the herbicide glyphosate on DNA methylation in Arabidopsis thaliana show that WFMM achieves higher sensitivity and specificity in detecting DMCs than methylKit, especially when the methylation differences among phenotype groups are small. Moreover, the performance of WFMM is robust to small sample sizes, making it particularly attractive given the current high cost of bisulfite sequencing. Analyses of empirical Arabidopsis thaliana data under varying glyphosate dosages and of monozygotic (MZ) twins with different pain sensitivities (both datasets have weak methylation effects of <1%) show that WFMM can identify more relevant DMCs related to the phenotype of interest than methylKit. Differentially methylated regions (DMRs) are genomic regions with different DNA methylation status across biological samples. 
DMRs and DMCs are essentially the same concepts, with the only difference being how methylation information across the genome is summarized. If methylation levels are determined by grouping neighboring cytosine sites, then they are DMRs; if methylation levels are calculated based on single cytosines, they are DMCs. PMID:29419727
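The DMC/DMR distinction above comes down to how read counts are summarized. The Python sketch below illustrates it under stated assumptions: each cytosine call is a hypothetical (position, methylated_reads, total_reads) tuple, and "neighboring" cytosines are grouped into fixed 100-bp windows. Both the tuple layout and the window size are illustrative choices, not the grouping used by WFMM or methylKit.

```python
from collections import defaultdict


def site_levels(calls):
    """Per-cytosine methylation level (the DMC view): methylated / total reads."""
    return {pos: meth / total for pos, meth, total in calls if total > 0}


def region_levels(calls, window=100):
    """Region-level methylation (the DMR view): pool the read counts of
    neighboring cytosines that fall into the same fixed-size window."""
    meth_sum = defaultdict(int)
    total_sum = defaultdict(int)
    for pos, meth, total in calls:
        start = (pos // window) * window  # left edge of the window
        meth_sum[start] += meth
        total_sum[start] += total
    return {
        start: meth_sum[start] / total_sum[start]
        for start in sorted(total_sum)
        if total_sum[start] > 0
    }


if __name__ == "__main__":
    # Hypothetical calls: two cytosines in window 0-99, one in window 100-199.
    calls = [(10, 8, 10), (40, 2, 10), (150, 5, 5)]
    print(site_levels(calls))
    print(region_levels(calls))
```

With these toy calls the two cytosines in the first window (0.8 and 0.2 methylated) collapse into a single region-level value of 0.5, which shows why a strong site-level difference can be diluted, or a weak one reinforced, once neighboring sites are pooled.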
Measurement of replication structures at the nanometer scale using super-resolution light microscopy

PubMed Central

Baddeley, D.; Chagin, V. O.; Schermelleh, L.; Martin, S.; Pombo, A.; Carlton, P. M.; Gahl, A.; Domaing, P.; Birk, U.; Leonhardt, H.; Cremer, C.; Cardoso, M. C.

2010-01-01

DNA replication, similar to other cellular processes, occurs within dynamic macromolecular structures. Any comprehensive understanding ultimately requires quantitative data to establish and test models of genome duplication. We used two different super-resolution light microscopy techniques to directly measure and compare the size and numbers of replication foci in mammalian cells. This analysis showed that replication foci vary in size from 210 nm down to 40 nm. Remarkably, spatially modulated illumination (SMI) and 3D-structured illumination microscopy (3D-SIM) both showed an average size of 125 nm that was conserved throughout S-phase and independent of the labeling method, suggesting a basic unit of genome duplication. Interestingly, the improved optical 3D resolution identified 3- to 5-fold more distinct replication foci than previously reported. These results show that optical nanoscopy techniques enable accurate measurements of cellular structures at a level previously achieved only by electron microscopy and highlight the possibility of high-throughput, multispectral 3D analyses. PMID:19864256
[Variability of nuclear 18S-25S rDNA of Gentiana lutea L. in nature and in tissue culture in vitro].

PubMed

Mel'nyk, V M; Spiridonova, K V; Andrieiev, I O; Strashniuk, N M; Kunakh, V A

2004-01-01

The 18S-25S rDNA sequences in the genomes of G. lutea plants from different natural populations and from tissue culture were studied by blot hybridization. Ribosomal repeats were found to be represented by variants that differ in size and in the presence of an additional HindIII restriction site. The genome of an individual plant usually contains several repeat variants. Interpopulation variability was observed both in the relative proportions of these variants and in the presence or absence of particular variants. Changes in the spectrum of rDNA repeats, not exceeding the range of intraspecific variability, were observed in callus tissues compared with plants of the source population. Because genome modifications during cell adaptation to in vitro conditions are non-random, these modifications can to some extent be predicted in tissue culture.
Maternal alcohol consumption and offspring DNA methylation: findings from six general population-based birth cohorts

PubMed Central

Sharp, Gemma C; Arathimos, Ryan; Reese, Sarah E; Page, Christian M; Felix, Janine; Küpers, Leanne K; Rifas-Shiman, Sheryl L; Liu, Chunyu; Burrows, Kimberley; Zhao, Shanshan; Magnus, Maria C; Duijts, Liesbeth; Corpeleijn, Eva; DeMeo, Dawn L; Litonjua, Augusto; Baccarelli, Andrea; Hivert, Marie-France; Oken, Emily; Snieder, Harold; Jaddoe, Vincent; Nystad, Wenche; London, Stephanie J; Relton, Caroline L; Zuccolo, Luisa

2018-01-01

Aim: Alcohol consumption during pregnancy is sometimes associated with adverse outcomes in offspring, potentially mediated by epigenetic modifications. We aimed to investigate genome-wide DNA methylation in cord blood of newborns exposed to alcohol in utero. Materials & methods: We meta-analyzed information from six population-based birth cohorts within the Pregnancy and Childhood Epigenetics consortium. Results: We found no strong evidence of association at either individual CpGs or across larger regions of the genome. Conclusion: Our findings suggest no association between maternal alcohol consumption and offspring cord blood DNA methylation. This is in stark contrast to the multiple strong associations previous studies have found for maternal smoking, which is similarly socially patterned. However, it is possible that a combination of a larger sample size, higher doses, different timings of exposure, exploration of a different tissue and a more global assessment of genomic DNA methylation might show evidence of association. PMID:29172695
Evolution and genome architecture in fungal plant pathogens.

PubMed

Möller, Mareike; Stukenbrock, Eva H

2017-12-01

The fungal kingdom comprises some of the most devastating plant pathogens. Sequencing the genomes of fungal pathogens has revealed remarkable variability in genome size and architecture. Population genomic data enable us to understand the mechanisms and the history of changes in genome size and adaptive evolution in plant pathogens. Although transposable elements predominantly have negative effects on their host, fungal pathogens provide prominent examples of advantageous associations between rapidly evolving transposable elements and virulence genes that cause variation in virulence phenotypes. By providing homogeneous environments at large regional scales, managed ecosystems, such as modern agriculture, can be conducive to the rapid evolution and dispersal of pathogens. In this Review, we summarize key examples from fungal plant pathogen genomics and discuss evolutionary processes in pathogenic fungi in the context of molecular evolution, population genomics and agriculture.
Coordinated Changes in Mutation and Growth Rates Induced by Genome Reduction.

PubMed

Nishimura, Issei; Kurokawa, Masaomi; Liu, Liu; Ying, Bei-Wen

2017-07-05

Genome size is determined during evolution, but it can also be altered by genetic engineering in laboratories. The systematic characterization of reduced genomes provides valuable insights into the cellular properties that are quantitatively described by the global parameters related to the dynamics of growth and mutation. In the present study, we analyzed a small collection of W3110 Escherichia coli derivatives containing either the wild-type genome or reduced genomes of various lengths to examine whether the mutation rate, a global parameter representing genomic plasticity, was affected by genome reduction. We found that the mutation rates of these cells increased with genome reduction. The correlation between genome length and mutation rate, which has been reported for the evolution of bacteria, was also identified, intriguingly, for genome reduction. Gene function enrichment analysis indicated that the deletion of many of the genes encoding membrane and transport proteins plays a role in the mutation rate changes mediated by genome reduction. Furthermore, the increase in the mutation rate with genome reduction was highly associated with a decrease in the growth rate in a nutrition-dependent manner; thus, poorer media showed a larger and more significant change. This negative correlation was strongly supported by experimental evidence that serial transfer of the reduced-genome strain improved the growth rate and reduced the mutation rate to a large extent. Taken together, the global parameters corresponding to the genome, growth, and mutation showed a coordinated relationship, which might be an essential working principle for balancing the cellular dynamics appropriate to the environment. IMPORTANCE Genome reduction is a powerful approach for investigating the fundamental rules for living systems. 
Whether genetically disturbed genomes have any specific properties that are different from or similar to those of natively evolved genomes has been under investigation. In the present study, we found that Escherichia coli cells with reduced genomes showed accelerated nucleotide substitution errors (mutation rates), although these cells retained the normal DNA mismatch repair systems. Intriguingly, this finding of correlation between reduced genome size and a higher mutation rate was consistent with the reported evolution of mutation rates. Furthermore, the increased mutation rate was quantitatively associated with a decreased growth rate, indicating that the global parameters related to the genome, growth, and mutation, which represent the amount of genetic information, the efficiency of propagation, and the fidelity of replication, respectively, are dynamically coordinated. Copyright © 2017 Nishimura et al.
Moderating the Covariance Between Family Member’s Substance Use Behavior

PubMed Central

Eaves, Lindon J.; Neale, Michael C.

2014-01-01

Twin and family studies implicitly assume that the covariation between family members remains constant across differences in age between the members of the family. However, age-specificity in gene expression for shared environmental factors could generate higher correlations between family members who are more similar in age. Cohort effects (cohort × genotype or cohort × common environment) could have the same effects, and both potentially reduce effect sizes estimated in genome-wide association studies where the subjects are heterogeneous in age. In this paper we describe a model in which the covariance between twins and non-twin siblings is moderated as a function of age difference. We describe the details of the model and simulate data using a variety of different parameter values to demonstrate that model fitting returns unbiased parameter estimates. Power analyses are then conducted to estimate the sample sizes required to detect the effects of moderation in a design of twins and siblings. Finally, the model is applied to data on cigarette smoking. We find that (1) the model effectively recovers the simulated parameters, (2) the power is relatively low and therefore requires large sample sizes before small to moderate effect sizes can be found reliably, and (3) the genetic covariance between siblings for smoking behavior decays very rapidly. Result 3 implies that, e.g., genome-wide studies of smoking behavior that use individuals assessed at different ages, or belonging to different birth-year cohorts may have had substantially reduced power to detect effects of genotype on cigarette use. It also implies that significant special twin environmental effects can be explained by age-moderation in some cases. This effect likely contributes to the missing heritability paradox. PMID:24647834
Entire nucleotide sequences of Gossypium raimondii and G. arboreum mitochondrial genomes revealed A-genome species as cytoplasmic donor of the allotetraploid species.

PubMed

Chen, Z; Nie, H; Grover, C E; Wang, Y; Li, P; Wang, M; Pei, H; Zhao, Y; Li, S; Wendel, J F; Hua, J

2017-05-01

Cotton (Gossypium spp.) is commonly grouped into eight diploid genomic groups, designated A-G and K, and an allotetraploid genomic group, AD. Gossypium raimondii (D5) and G. arboreum (A2) are the putative contributors to the progenitor of G. hirsutum (AD1), the economically important fibre-producing cotton species. Mitochondrial DNA was extracted from organelles isolated from week-old etiolated seedlings using a discontinuous sucrose density gradient method. The mitochondrial genomes were then sequenced, assembled, annotated and analysed, in that order. This study reports the mitochondrial genomes of G. raimondii (D5) and G. arboreum (A2). The mitochondrial genomes of the two diploid species are circular, measuring 643,914 bp (D5) and 687,482 bp (A2), respectively. They differ in size and in the number of repeat sequences, and both contain notable triplicate sequences (7317 bp and 10,246 bp, respectively), demonstrating dynamic differences and rearranged genome organisations. Comparing the D5 and A2 mitogenomes with those of the tetraploid species G. hirsutum (AD1) and G. barbadense (AD2) revealed a shared loss of an 11-kbp fragment in the allotetraploid species, three regions shared by G. arboreum (A2), G. hirsutum (AD1) and G. barbadense (AD2), and eight regions specific to G. raimondii (D5). The presence/absence variations and gene-based phylogeny support the A-genome species as the cytoplasmic donor to the progenitor of the allotetraploid species G. hirsutum and G. barbadense. The results describe structural variation and phylogeny in the evolution of Gossypium mitochondrial genomes. © 2017 The Authors. Plant Biology published by John Wiley & Sons Ltd on behalf of German Botanical Society, Royal Dutch Botanical Society.

Improved genome recovery and integrated cell-size analyses of individual uncultured microbial cells and viral particles.

PubMed

Stepanauskas, Ramunas; Fergusson, Elizabeth A; Brown, Joseph; Poulton, Nicole J; Tupper, Ben; Labonté, Jessica M; Becraft, Eric D; Brown, Julia M; Pachiadaki, Maria G; Povilaitis, Tadas; Thompson, Brian P; Mascena, Corianna J; Bellows, Wendy K; Lubys, Arvydas

2017-07-20

Microbial single-cell genomics can be used to provide insights into the metabolic potential, interactions, and evolution of uncultured microorganisms. Here we present WGA-X, a method based on multiple displacement amplification of DNA that utilizes a thermostable mutant of the phi29 polymerase. WGA-X enhances genome recovery from individual microbial cells and viral particles while maintaining ease of use and scalability. The greatest improvements are observed when amplifying high G+C content templates, such as those belonging to the predominant bacteria in agricultural soils. By integrating WGA-X with calibrated index-cell sorting and high-throughput genomic sequencing, we are able to analyze genomic sequences and cell sizes of hundreds of individual, uncultured bacteria, archaea, protists, and viral particles, obtained directly from marine and soil samples, in a single experiment. This approach may find diverse applications in microbiology and in biomedical and forensic studies of humans and other multicellular organisms. Single-cell genomics can be used to study uncultured microorganisms. Here, Stepanauskas et al. present a method combining improved multiple displacement amplification and FACS, to obtain genomic sequences and cell size information from uncultivated microbial cells and viral particles in environmental samples.
Draft genome of the Peruvian scallop Argopecten purpuratus.

PubMed

Li, Chao; Liu, Xiao; Liu, Bo; Ma, Bin; Liu, Fengqiao; Liu, Guilong; Shi, Qiong; Wang, Chunde

2018-04-01

The Peruvian scallop, Argopecten purpuratus, is mainly cultured in southern Chile and Peru, and was introduced into China in the last century. Unlike other Argopecten scallops, the Peruvian scallop normally has a long life span of up to 7 to 10 years. For this reason, researchers have been using it in hybridization to exploit hybrid vigor. Here, we performed whole genome sequencing, assembly, and gene annotation of the Peruvian scallop, with the important aim of developing genomic resources for genetic breeding in scallops. A total of 463.19 Gb of raw DNA reads were sequenced. A draft genome assembly of 724.78 Mb was generated (accounting for 81.87% of the estimated genome size of 885.29 Mb), with a contig N50 size of 80.11 kb and a scaffold N50 size of 1.02 Mb. Repeat sequences were estimated to make up 33.74% of the whole genome, and 26,256 protein-coding genes and 3,057 noncoding RNAs were predicted from the assembly. We generated a high-quality draft genome assembly of the Peruvian scallop, which will provide a solid resource for further genetic breeding and for the analysis of the evolutionary history of this economically important scallop.
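The contig and scaffold N50 values quoted above follow the standard definition: sort sequences from longest to shortest, accumulate their lengths, and report the length at which the running total first reaches half of the assembly. The Python sketch below applies that definition to made-up lengths (not the scallop data) and also checks the quoted assembly-to-genome ratio.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least
    half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("empty assembly")


if __name__ == "__main__":
    # Made-up contig lengths (total 290, so half is 145).
    print(n50([80, 70, 50, 40, 30, 20]))
    # Assembly size as a share of the estimated genome size (values from the abstract).
    print(f"{724.78 / 885.29:.2%}")
```

For the toy lengths, the running total passes 145 at the second contig, so the N50 is 70; the second line reproduces the 81.87% figure in the abstract. N50 describes contiguity only, so it says nothing about whether the remaining ~18% of the estimated genome is missing or collapsed repeats.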
Transposable element junctions in marker development and genomic characterization of barley

USDA-ARS's Scientific Manuscript database

Barley is a model plant in genomic studies of Triticeae species. A complete barley genome sequence will facilitate not only barley breeding programs, but also those for related species. However, the large genome size and high repetitive sequence content complicate the barley genome assembly. The ma...
Draft genome sequence of an aflatoxigenic Aspergillus species, A. bombycis

USDA-ARS's Scientific Manuscript database

The genome of the A. bombycis type strain was sequenced using a Personal Genome Machine, followed by annotation of its predicted genes. The genome size for A. bombycis was found to be approximately 37 Mb and contained 12,266 genes. This announcement introduces a sequenced genome for an aflatoxigenic...
Identification of a unique library of complex, but ordered, arrays of repetitive elements in the human genome and implication of their potential involvement in pathobiology.

PubMed

Lee, Kang-Hoon; Lee, Young-Kwan; Kwon, Deug-Nam; Chiu, Sophia; Chew, Victoria; Rah, Hyungchul; Kujawski, Gregory; Melhem, Ramzi; Hsu, Karen; Chung, Cecilia; Greenhalgh, David G; Cho, Kiho

2011-06-01

Approximately 2% of the human genome is reported to be occupied by genes. Various forms of repetitive elements (REs), both characterized and uncharacterized, are presumed to make up the vast majority of the rest of the genomes of humans and other species. In conjunction with a comprehensive annotation of genes, information regarding components of genome biology, such as gene polymorphisms, non-coding RNAs, and certain REs, is found in human genome databases. However, the genome-wide profile of unique RE arrangements formed by different groups of REs has not yet been fully characterized. In this study, the entire human genome was subjected to an unbiased RE survey to establish a whole-genome profile of REs and their arrangements. Because of the query-size limitation of the bl2seq alignment program (National Center for Biotechnology Information [NCBI]) used for the RE survey, the entire NCBI reference human genome was fragmented into 6206 units of 0.5 million nucleotides. A number of RE arrangements with varying complexities and patterns were identified throughout the genome. Each chromosome had unique profiles of RE arrangements and density, and high levels of RE density were measured near the centromere regions. Subsequently, 175 complex RE arrangements, selected from throughout the genome, were subjected to a comparison analysis using five different human genome sequences. Interestingly, three of the five human genome databases shared exactly the same arrangement patterns and sequences for all 175 RE arrangement regions (a total of 12,765,625 nucleotides). The findings from this study demonstrate that a substantial fraction of REs in the human genome are clustered into various forms of ordered structures. Further investigations are needed to examine whether some of these ordered RE arrangements contribute to human pathobiology as functional genome units. Copyright © 2011 Elsevier Inc. All rights reserved.
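The workaround described above, cutting a genome into fixed-size pieces so that each fits an aligner's query limit, can be sketched in a few lines of Python. This is an illustration only: the abstract does not say whether the 0.5-Mb units overlapped, so the sketch assumes consecutive, non-overlapping units, and the helper names `fragment` and `unit_count` are hypothetical.

```python
import math


def fragment(sequence, unit=500_000):
    """Split a sequence into consecutive, non-overlapping units of `unit`
    bases; the final unit holds whatever remains."""
    return [sequence[i:i + unit] for i in range(0, len(sequence), unit)]


def unit_count(genome_length, unit=500_000):
    """Number of units produced when one sequence of this length is fragmented."""
    return math.ceil(genome_length / unit)


if __name__ == "__main__":
    toy = "ACGT" * 10  # 40-bp toy sequence
    print([len(piece) for piece in fragment(toy, unit=15)])
    # A ~3.1-Gb genome treated as one sequence gives about 6,200 units.
    print(unit_count(3_100_000_000))
```

The exact count depends on the assembly version and on whether each chromosome is fragmented separately (every chromosome then contributes its own partial final unit). With non-overlapping units, any RE arrangement that straddles a unit boundary is split in two, which is one reason overlapping windows are often preferred for this kind of survey.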
Chromosome banding in amphibia. XXIII. Giant W sex chromosomes and extremely small genomes in Eleutherodactylus euphronides and Eleutherodactylus shrevei (Anura, Leptodactylidae).

PubMed

Schmid, M; Feichtinger, W; Steinlein, C; Rupprecht, A; Haaf, T; Kaiser, H

2002-01-01

Highly differentiated, heteromorphic ZZ♂/ZW♀ sex chromosomes were found in the karyotypes of the neotropical leptodactylid frogs Eleutherodactylus euphronides and E. shrevei. The W chromosomes are the largest heterochromatic, female-specific chromosomes discovered so far in the class Amphibia. Analyses of the banding patterns with AT- and GC-base-pair-specific fluorochromes show that the constitutive heterochromatin in the giant W chromosomes consists of various categories of repetitive DNA sequences. The W chromosomes of both species are similar in size, morphology and banding pattern, whereas their Z chromosomes exhibit conspicuous differences. In the cell nuclei of female animals, the W chromosomes form very prominent chromatin bodies (W chromatin). DNA flow cytometric measurements demonstrate clear differences in the DNA content of male and female erythrocytes caused by the giant W chromosome, and also show that these Eleutherodactylus genomes are among the smallest of all amphibian genomes. The importance of the heteromorphic ZW sex chromosomes for the study of Z-linked genes, the similarities and differences of the two karyotypes, and the significance of the exceptionally small genomes are discussed. Copyright 2002 S. Karger AG, Basel
The cost of large numbers of hypothesis tests on power, effect size and sample size.

PubMed

Lazzeroni, L C; Ray, A

2012-01-01

Advances in high-throughput biology and computer science are driving an exponential increase in the number of hypothesis tests in genomics and other scientific disciplines. Studies using current genotyping platforms frequently include a million or more tests. In addition to the monetary cost, this increase imposes a statistical cost owing to the multiple testing corrections needed to avoid large numbers of false-positive results. To safeguard against the resulting loss of power, some have suggested sample sizes on the order of tens of thousands, which can be impractical for many diseases or may lower the quality of phenotypic measurements. This study examines the relationship between the number of tests on the one hand and power, detectable effect size or required sample size on the other. We show that once the number of tests is large, power can be maintained at a constant level with comparatively small increases in the effect size or sample size. For example, at the 0.05 significance level, a 13% increase in sample size is needed to maintain 80% power for ten million tests compared with one million tests, whereas a 70% increase in sample size is needed for 10 tests compared with a single test. Relative costs are smaller when measured by increases in the detectable effect size. We provide an interactive Excel calculator to compute power, effect size or sample size when comparing study designs or genome platforms involving different numbers of hypothesis tests. The results are reassuring in an era of extreme multiple testing.
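The sample-size figures in this abstract can be roughly reproduced with a textbook approximation. The Python sketch below assumes a two-sided z-test at a Bonferroni-corrected per-test level of α/m, and uses the normal-approximation rule that the required sample size is proportional to (z for α/(2m) + z for the target power)². These are assumptions for illustration, not necessarily the calculations behind the authors' Excel calculator.

```python
from statistics import NormalDist


def relative_sample_size(m, alpha=0.05, power=0.80):
    """Required sample size, up to a constant factor, for a two-sided z-test
    at Bonferroni-corrected level alpha/m to reach the given power."""
    std = NormalDist()
    z_alpha = -std.inv_cdf(alpha / (2 * m))  # upper critical value
    z_power = std.inv_cdf(power)
    return (z_alpha + z_power) ** 2


def sample_size_ratio(m_new, m_old, alpha=0.05, power=0.80):
    """Factor by which the sample size must grow going from m_old to m_new tests."""
    return relative_sample_size(m_new, alpha, power) / relative_sample_size(m_old, alpha, power)


if __name__ == "__main__":
    for new, old in [(10, 1), (10_000_000, 1_000_000)]:
        ratio = sample_size_ratio(new, old)
        print(f"{new:,} vs {old:,} tests: {100 * (ratio - 1):.0f}% larger sample")
```

Under these assumptions the script reports increases of about 70% (10 tests vs 1) and about 13% (ten million vs one million), in line with the figures quoted above. The flattening occurs because the Bonferroni critical value grows only slowly with m, roughly like the square root of log m.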
The Arabidopsis lyrata genome sequence and the basis of rapid genome size change

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hu, Tina T.; Pattyn, Pedro; Bakker, Erica G.

2011-04-29

In our manuscript, we present a high-quality genome sequence of the Arabidopsis thaliana relative, Arabidopsis lyrata, produced by dideoxy sequencing. We have performed the usual types of genome analysis (gene annotation, dN/dS studies, etc.), but these are relegated to the Supporting Information. Instead, we focus on what was a major motivation for sequencing this genome, namely to understand how A. thaliana lost half its genome in a few million years and lived to tell the tale. The rather surprising conclusion is that no single genomic feature accounts for the reduced genome; rather, every aspect (centromeres, intergenic regions, transposable elements, gene family number) is affected through hundreds of thousands of cuts. This strongly suggests that overall genome size in itself is what has been under selection, a suggestion that is strongly supported by our demonstration (using population genetics data from A. thaliana) that new deletions seem to be driven to fixation.
Collective Dynamics of Specific Gene Ensembles Crucial for Neutrophil Differentiation: The Existence of Genome Vehicles Revealed

PubMed Central

Giuliani, Alessandro; Tomita, Masaru

2010-01-01

Cell fate decisions select a specific differentiation path from among the multiple possibilities that can arise through the complex interplay of high-dimensional genome activities. The coordinated action of thousands of genes in switching cell fate has indicated the existence of stable attractors guiding the process. However, the origins of the intracellular mechanisms that create a “cellular attractor” remain unknown. Here, we examined the collective behavior of genome-wide expression during neutrophil differentiation induced by two different stimuli, dimethyl sulfoxide (DMSO) and all-trans-retinoic acid (atRA). To overcome the difficulty of dealing with noise in single-gene expression, we grouped genes into ensembles and analyzed their expression dynamics in a correlation space defined by Pearson correlation and mutual information. The standard deviation of the correlation distributions of gene ensembles decreases as the ensemble size increases, following an inverse square root law, both for ensembles chosen randomly from the whole genome and for ensembles ranked according to expression variance across time. Choosing an ensemble size of 200 genes, we show that the two probability distributions of correlations of randomly selected genes for the atRA and DMSO responses overlap after 48 hours, defining the neutrophil attractor. Next, tracking the trajectories of the ranked ensembles, we noticed that only certain ensembles, not all, fall into the attractor in a fractal-like manner. Removing these genome elements from the whole genome, for both the atRA and DMSO responses, destroys the attractor, providing evidence for the existence of specific genome elements (named “genome vehicles”) responsible for the neutrophil attractor. Notably, within the genome vehicles, genes with low or moderate expression changes, which are often considered noisy and insignificant, are essential components for the creation of the neutrophil attractor. 
Further investigations along with our findings might provide a comprehensive mechanistic view of cell fate decision. PMID:20725638
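The inverse square-root law mentioned in this abstract is the familiar scaling of averaged noise: for n independent values with common standard deviation σ, their mean has standard deviation σ/√n. As a minimal, hedged illustration (a generic simulation of independent Gaussian noise, not the authors' correlation-space procedure; the function `ensemble_mean_sd` and its parameters are hypothetical), the sketch below estimates that standard deviation empirically and compares it with 1/√n:

```python
import random
import statistics


def ensemble_mean_sd(ensemble_size, n_trials=2000, seed=0):
    """Empirical SD of the mean of `ensemble_size` i.i.d. standard-normal values.

    Hypothetical helper: each trial draws one "ensemble" of independent noise
    values and records its mean; the spread of those means is returned.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(ensemble_size))
        for _ in range(n_trials)
    ]
    return statistics.pstdev(means)


if __name__ == "__main__":
    # Compare the empirical spread with the theoretical 1/sqrt(n) prediction.
    for n in (1, 4, 25, 100, 200):
        sd = ensemble_mean_sd(n)
        print(f"n={n:4d}  empirical SD={sd:.3f}  1/sqrt(n)={n ** -0.5:.3f}")
```

The simulation only assumes independence; correlated gene expression would generally make the decrease slower than 1/√n, so this sketch shows the idealized case rather than the measured data.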
Comparative Genomics of Listeria Sensu Lato: Genus-Wide Differences in Evolutionary Dynamics and the Progressive Gain of Complex, Potentially Pathogenicity-Related Traits through Lateral Gene Transfer

PubMed Central

Chiara, Matteo; Caruso, Marta; D’Erchia, Anna Maria; Manzari, Caterina; Fraccalvieri, Rosa; Goffredo, Elisa; Latorre, Laura; Miccolupo, Angela; Padalino, Iolanda; Santagada, Gianfranco; Chiocco, Doriano; Pesole, Graziano; Horner, David S.; Parisi, Antonio

2015-01-01

Historically, genome-wide and molecular characterization of the genus Listeria has concentrated on the important human pathogen Listeria monocytogenes and a small number of closely related species, together termed Listeria sensu stricto. More recently, genome sequences for a number of more basal, nonpathogenic members of the genus have become available, facilitating a wider perspective on the evolution of pathogenicity and on genome-level evolutionary dynamics within the entire genus (termed Listeria sensu lato). Here, we have sequenced the genomes of additional Listeria fleischmannii and Listeria newyorkensis isolates and explored the dynamics of genome evolution in Listeria sensu lato. Our analyses suggest that acquisition of genetic material through gene duplication and divergence, as well as through lateral gene transfer (mostly from outside Listeria), is widespread throughout the genus. Novel genetic material is apparently subject to rapid turnover. Multiple lines of evidence point to significant differences in evolutionary dynamics between the most basal Listeria subclade and all other congeners, including both sensu stricto and other sensu lato isolates. Strikingly, these differences are likely attributable to stochastic, population-level processes and contribute to the observed variation in genome size across the genus. Notably, our analyses indicate that the common ancestor of Listeria sensu lato lacked flagella, which were acquired by lateral gene transfer in a common ancestor of Listeria grayi and Listeria sensu stricto, whereas a recently functionally characterized pathogenicity island, responsible for the capacity to produce cobalamin and utilize ethanolamine/propane-1,2-diol, was acquired in an ancestor of Listeria sensu stricto. PMID:26185097
Bonus Organisms in High-Throughput Eukaryotic Whole-Genome Shotgun Assembly

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pangilinan, Jasmyn; Shapiro, Harris; Tu, Hank

2006-02-06

The DOE Joint Genome Institute (JGI) has sequenced over 50 eukaryotic genomes, ranging in size from 15 Mb to 1.6 Gb, across a wide range of organism types. In the course of doing so, it has become clear that a substantial fraction of these data sets contain bonus organisms, usually prokaryotes, in addition to the desired genome. While some of these additional organisms are extraneous contamination, they are sometimes symbionts and so can be of biological interest. It is therefore desirable to assemble the bonus organisms along with the main genome. This transforms the problem into one of metagenomic assembly, which is considerably more challenging than traditional whole-genome shotgun (WGS) assembly. The different organisms will usually be present at different sequence depths, which most WGS assemblers handle poorly. In addition, with multiple distinct genomes present, chimerism can produce cross-organism combinations. Finally, there is no guarantee that only a single bonus organism will be present. For example, one JGI project contained at least two different prokaryotic contaminants, plus a 145-kb plasmid of unknown origin. We have developed techniques to routinely identify and handle such bonus organisms in a high-throughput sequencing environment. Approaches include screening and partitioning the unassembled data, and iterative subassemblies. These methods are applicable not only to bonus organisms but also to desired components such as organelles. These procedures have the additional benefit of identifying, and allowing the removal of, cloning artifacts such as E. coli and spurious vector inclusions.
Identification of cyanobacterial non-coding RNAs by comparative genome analysis.

PubMed

Axmann, Ilka M; Kensche, Philip; Vogel, Jörg; Kohl, Stefan; Herzel, Hanspeter; Hess, Wolfgang R

2005-01-01

Whole-genome sequencing of marine cyanobacteria has revealed an unprecedented degree of genomic variation and streamlining. With a size of 1.66 megabase pairs, Prochlorococcus sp. MED4 has the most compact of these genomes, and it is enigmatic how the few identified regulatory proteins efficiently sustain the lifestyle of an ecologically successful marine microorganism. Small non-coding RNAs (ncRNAs) control a plethora of processes in eukaryotes as well as in bacteria; however, systematic searches for ncRNAs are still lacking for most eubacterial phyla outside the enterobacteria. Based on a computational prediction, we show the presence of several ncRNAs (cyanobacterial functional RNA, or Yfr) in several different cyanobacteria of the Prochlorococcus-Synechococcus lineage. Some ncRNA genes are present in only two or three of the four strains investigated, whereas the RNAs Yfr2 through Yfr5 are structurally highly related and are encoded by a rapidly evolving gene family, as their genes exist in different copy numbers and at different sites in the four investigated genomes. One ncRNA, Yfr7, is present in at least seven other cyanobacteria. In addition, control elements for several ribosomal operons were predicted, as well as riboswitches for thiamine pyrophosphate and cobalamin. This is the first genome-wide and systematic screen for ncRNAs in cyanobacteria. Several ncRNAs were computationally predicted, and their presence was biochemically verified. These RNAs may have regulatory functions, and each shows a distinct phylogenetic distribution. Our approach can be applied to any group of microorganisms for which more than one complete genome sequence is available for comparative analysis.
The Search for Therapeutic Bacteriophages Uncovers One New Subfamily and Two New Genera of Pseudomonas-Infecting Myoviridae

PubMed Central

Henry, Marine; Bobay, Louis-Marie; Chevallereau, Anne; Saussereau, Emilie; Ceyssens, Pieter-Jan; Debarbieux, Laurent

2015-01-01

In a previous study, six virulent bacteriophages, PAK_P1, PAK_P2, PAK_P3, PAK_P4, PAK_P5 and CHA_P1, were evaluated for their in vivo efficacy in treating Pseudomonas aeruginosa infections using a mouse model of lung infection. Here, we show that their genomes are closely related to five other Pseudomonas phages and allow a subdivision into two clades, PAK_P1-like and KPP10-like viruses, based on differences in genome size, %GC and genomic content, as well as number of tRNAs. These two clades are well delineated, with a mean of 86% and 92% of proteins considered homologous within the individual clades, and 25% of proteins considered homologous between the two clades. By ESI-MS/MS analysis we determined that their virions are composed of at least 25 different proteins, and electron microscopy revealed a morphology identical to that of the hallmark Salmonella phage Felix O1. A search for additional bacteriophage homologs, using profiles of protein families defined from the analysis of the 11 genomes, identified 10 additional candidates infecting hosts from different species. By carrying out a phylogenetic analysis using these 21 genomes, we were able to define a new subfamily of viruses, the Felixounavirinae, within the Myoviridae family. The new Felixounavirinae subfamily includes three genera: Felixounalikevirus, PAK_P1likevirus and KPP10likevirus. Sequencing the genomes of bacteriophages with therapeutic potential increases the quantity of genomic data on closely related bacteriophages, leading to the establishment of new taxonomic clades and the development of strategies for analyzing viral genomes, as presented in this article. PMID:25629728
The folding landscape of the epigenome

NASA Astrophysics Data System (ADS)

Olarte-Plata, Juan D.; Haddad, Noelle; Vaillant, Cédric; Jost, Daniel

2016-04-01

The role of the spatial organization of chromatin in gene regulation is a long-standing but still open question. Experimentally, it has been shown that the genome is segmented into epigenomic chromatin domains that are organized into hierarchical sub-nuclear spatial compartments. However, whether, and how, this non-random spatial organization merely reflects or actually contributes to the regulation of genome function remains to be elucidated. To address this question, we recently proposed a quantitative description of the folding properties of the fly genome as a function of its epigenomic landscape, using a polymer model with epigenomic-driven attractions. In this article, we characterize the physical properties of 3D epigenome folding in more depth. Using an efficient lattice version of the original block copolymer model, we study the structural and dynamical properties of chromatin and show that the size of epigenomic domains and asymmetries in sizes and in interaction strengths play a critical role in chromatin organization. Finally, we discuss the biological implications of our findings. In particular, our predictions are quantitatively compatible with experimental data and suggest a different means of self-interaction in euchromatin versus heterochromatin domains.
The first complete organellar genomes of an Antarctic red alga, Pyropia endiviifolia: insights into its genome architecture and phylogenetic position within genus Pyropia (Bangiales, Rhodophyta)

NASA Astrophysics Data System (ADS)

Xu, Kuipeng; Tang, Xianghai; Bi, Guiqi; Cao, Min; Wang, Lu; Mao, Yunxiang

2017-08-01

Pyropia species grow in the intertidal zone and are adapted to cold water. To date, most of the information about the complete plastid and mitochondrial genomes (ptDNA and mtDNA) of this genus is limited to Northern Hemisphere species. Here, we report the sequencing of the ptDNA and mtDNA of the Antarctic red alga Pyropia endiviifolia using the Illumina platform. The plastid genome (195 784 bp, 33.28% GC content) contains 210 protein-coding genes, 37 tRNA genes and 6 rRNA genes. The mitochondrial genome (34 603 bp, 30.5% GC content) contains 26 protein-coding genes, 25 tRNA genes and 2 rRNA genes. Our results suggest that the organellar genomes of Py. endiviifolia have a compact organization. Although the collinearity of these genomes is conserved relative to other Pyropia species, the genome sizes differ significantly, mainly because of different copy numbers of rDNA operons in the ptDNA and of group II introns in the mtDNA. The other Pyropia species have 2–3 distinct intronic ORFs in their cox1 genes, but Py. endiviifolia has no introns in its cox1 gene, which results in a smaller mtDNA than in other Pyropia species. The phylogenetic relationships within Pyropia were examined using concatenated gene sets from most of the available organellar genomes, with both maximum likelihood and Bayesian methods. The analysis revealed a sister-taxon relationship between the Antarctic species Py. endiviifolia and the North American species Py. kanakaensis.
Pyrosequencing-based comparative genome analysis of the nosocomial pathogen Enterococcus faecium and identification of a large transferable pathogenicity island

PubMed Central

2010-01-01

Background The Gram-positive bacterium Enterococcus faecium is an important cause of nosocomial infections in immunocompromised patients. Results We present a pyrosequencing-based comparative genome analysis of seven E. faecium strains that were isolated from various sources. Several antibiotic resistance genes were identified in the genomes of clinical isolates, including, in two strains, the vanA transposon that confers resistance to vancomycin. A functional comparison between E. faecium and the related opportunistic pathogen E. faecalis, based on differences in the presence of protein families, revealed divergence in plant carbohydrate metabolic pathways and oxidative stress defense mechanisms. The E. faecium pan-genome was estimated to be essentially unlimited in size, indicating that E. faecium can efficiently acquire and incorporate exogenous DNA into its gene pool. One of the most prominent sources of genomic diversity consists of bacteriophages that have integrated into the genome. The CRISPR-Cas system, which contributes to immunity against bacteriophage infection in prokaryotes, is not present in the sequenced strains. Three sequenced isolates carry the esp gene, which is involved in urinary tract infections and biofilm formation. The esp gene is located on a large pathogenicity island (PAI) between 64 and 104 kb in size. Conjugation experiments showed that the entire esp PAI can be transferred horizontally and inserts in a site-specific manner. Conclusions Genes involved in environmental persistence, colonization and virulence can easily be acquired by E. faecium. This will make the development of successful treatment strategies targeted against this organism a challenge for years to come. PMID:20398277
Quantifying Temporal Genomic Erosion in Endangered Species.

PubMed

Díez-Del-Molino, David; Sánchez-Barreiro, Fatima; Barnes, Ian; Gilbert, M Thomas P; Dalén, Love

2018-03-01

Many species have undergone dramatic population size declines over the past centuries. Although stochastic genetic processes during and after such declines are thought to elevate the risk of extinction, comparative analyses of genomic data from several endangered species suggest little concordance between genome-wide diversity and current population sizes. This is likely because species-specific life-history traits and ancient bottlenecks overshadow the genetic effect of recent demographic declines. Therefore, we advocate that temporal sampling of genomic data provides a more accurate approach to quantify genetic threats in endangered species. Specifically, genomic data from predecline museum specimens will provide valuable baseline data that enable accurate estimation of recent decreases in genome-wide diversity, increases in inbreeding levels, and accumulation of deleterious genetic variation. Copyright © 2017 Elsevier Ltd. All rights reserved.
Plant Proteins Are Smaller Because They Are Encoded by Fewer Exons than Animal Proteins.

PubMed

Ramírez-Sánchez, Obed; Pérez-Rodríguez, Paulino; Delaye, Luis; Tiessen, Axel

2016-12-01

Protein size is an important biochemical feature, since longer proteins can harbor more domains and can therefore display more biological functionalities than shorter proteins. We found remarkable differences in protein length, exon structure, and domain count among different phylogenetic lineages. While eukaryotic proteins have an average size of 472 amino acid residues (aa), average protein sizes in plant genomes are smaller than those of animals and fungi. Proteins unique to plants are ∼81 aa shorter than plant proteins conserved among other eukaryotic lineages. The smaller average size of plant proteins could be explained neither by endosymbiosis, nor by subcellular compartmentation, nor by exon size, but rather by exon number. Metazoan proteins are encoded on average by ∼10 exons of small size [∼176 nucleotides (nt)]. Streptophyta have on average only ∼5.7 exons of medium size (∼230 nt). Multicellular species code for large proteins by increasing exon number, whereas most unicellular organisms instead employ larger exons (>400 nt). Among subcellular compartments, membrane proteins are the largest (∼520 aa), whereas the smallest proteins correspond to the gene ontology group of ribosome (∼240 aa). Plant genes are encoded by half the number of exons and also contain fewer domains than animal proteins on average. Interestingly, endosymbiotic proteins that migrated to the plant nucleus became larger than their cyanobacterial orthologs. We thus conclude that plants have proteins larger than bacteria but smaller than animals or fungi. Compared to the average of eukaryotic species, plants have ∼34% more but ∼20% smaller proteins. This suggests that photosynthetic organisms are unique and therefore deserve special attention with regard to the evolutionary forces acting on their genomes and proteomes. Copyright © 2016 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.
Shotgun mitogenomics across body size classes in a local assemblage of tropical Diptera: Phylogeny, species diversity and mitochondrial abundance spectrum.

PubMed

Choo, Le Qin; Crampton-Platt, Alex; Vogler, Alfried P

2017-10-01

Mitochondrial genomes can be readily assembled from shotgun-sequenced DNA mixtures of mass-trapped arthropods ("mitochondrial metagenomics"), speeding up taxonomic characterization. Bulk sequencing was conducted on some 800 individuals of Diptera obtained by canopy fogging of a single tree in Borneo; the sample was dominated by small (<1.5 mm) individuals. Specimens were split into five body size classes for DNA extraction, to equalize read numbers across specimens and to study how body size, a key ecological trait, interacts with species and phylogenetic diversity. Genome assembly produced 304 orthologous mitochondrial contigs, each presumed to represent a different species. The small-bodied fraction was by far the most species-rich (187 contigs). Contigs were identified through phylogenetic analysis together with 56 reference mitogenomes, which placed most of the Bornean community into seven clades of small-bodied species, indicating phylogenetic conservation of body size. Mapping of shotgun reads against the mitogenomes showed wide ranges of read abundance within each size class. Ranked read-abundance plots were largely log-linear, indicating a uniformly filled abundance spectrum, especially for small-bodied species. Small-bodied species differed greatly from other size classes in neutral metacommunity parameters, exhibiting greater levels of immigration as well as greater total community size. We suggest that the established uses of mitochondrial metagenomics for analysis of species and phylogenetic diversity can be extended to parameterize recent theories of community ecology and biodiversity, and that by focusing on the number of mitochondria, rather than individuals, a new theoretical framework for analysis of mitochondrial abundance spectra can be developed that incorporates metabolic activity approximated by the count of mitochondria. © 2017 John Wiley & Sons Ltd.
Resistance of Permafrost and Modern Acinetobacter lwoffii Strains to Heavy Metals and Arsenic Revealed by Genome Analysis.

PubMed

Mindlin, Sofia; Petrenko, Anatolii; Kurakov, Anton; Beletsky, Alexey; Mardanov, Andrey; Petrova, Mayya

2016-01-01

We performed whole-genome sequencing of five permafrost strains of Acinetobacter lwoffii (frozen for 15-3000 thousand years) and analyzed their resistance genes found in plasmids and chromosomes. Four strains contained multiple plasmids (8-12), which varied significantly in size (from 4,135 to 287,630 bp) and genetic structure; the fifth strain contained only two plasmids. All large plasmids and some medium-size and small plasmids contained genes encoding resistance to various heavy metals, including mercury, cobalt, zinc, cadmium, copper, chromium, and arsenic compounds. Most resistance genes found in the ancient strains of A. lwoffii had their closely related counterparts in modern clinical A. lwoffii strains that were also located on plasmids. The vast majority of the chromosomal resistance determinants did not possess complete sets of the resistance genes or contained truncated genes. Comparative analysis of various A. lwoffii and of A. baumannii strains discovered a number of differences between them: (i) chromosome sizes in A. baumannii exceeded those in A. lwoffii by about 20%; (ii) on the contrary, the number of plasmids in A. lwoffii and their total size were much higher than those in A. baumannii; (iii) heavy metal resistance genes in the environmental A. lwoffii strains surpassed those in A. baumannii strains in the number and diversity and were predominantly located on plasmids. Possible reasons for these differences are discussed.

Resistance of Permafrost and Modern Acinetobacter lwoffii Strains to Heavy Metals and Arsenic Revealed by Genome Analysis

PubMed Central

Kurakov, Anton; Beletsky, Alexey; Mardanov, Andrey

2016-01-01

We performed whole-genome sequencing of five permafrost strains of Acinetobacter lwoffii (frozen for 15–3000 thousand years) and analyzed their resistance genes found in plasmids and chromosomes. Four strains contained multiple plasmids (8–12), which varied significantly in size (from 4,135 to 287,630 bp) and genetic structure; the fifth strain contained only two plasmids. All large plasmids and some medium-size and small plasmids contained genes encoding resistance to various heavy metals, including mercury, cobalt, zinc, cadmium, copper, chromium, and arsenic compounds. Most resistance genes found in the ancient strains of A. lwoffii had their closely related counterparts in modern clinical A. lwoffii strains that were also located on plasmids. The vast majority of the chromosomal resistance determinants did not possess complete sets of the resistance genes or contained truncated genes. Comparative analysis of various A. lwoffii and of A. baumannii strains discovered a number of differences between them: (i) chromosome sizes in A. baumannii exceeded those in A. lwoffii by about 20%; (ii) on the contrary, the number of plasmids in A. lwoffii and their total size were much higher than those in A. baumannii; (iii) heavy metal resistance genes in the environmental A. lwoffii strains surpassed those in A. baumannii strains in the number and diversity and were predominantly located on plasmids. Possible reasons for these differences are discussed. PMID:27795957
Seeking Optimal Region-Of-Interest (ROI) Single-Value Summary Measures for fMRI Studies in Imaging Genetics

PubMed Central

Tong, Yunxia; Chen, Qiang; Nichols, Thomas E.; Rasetti, Roberta; Callicott, Joseph H.; Berman, Karen F.; Weinberger, Daniel R.; Mattay, Venkata S.

2016-01-01

A data-driven, hypothesis-free genome-wide association (GWA) approach in imaging genetics studies allows screening of the entire genome to discover novel genes that modulate brain structure, chemistry, and function. However, a whole-brain voxel-wise analysis in such genome-wide imaging genetics studies can be computationally intensive and is also likely to have low statistical power, since a stringent multiple-comparisons correction is needed when searching over the entire genome and brain. In imaging genetics with functional magnetic resonance imaging (fMRI) phenotypes, many experimental paradigms activate focal regions that can be pre-specified based on a priori knowledge, so reducing the voxel-wise search to single-value summary measures within a priori ROIs could prove efficient and promising. The goal of this investigation is to evaluate the sensitivity and reliability of different single-value ROI summary measures and to provide guidance for future work. Four different fMRI databases were tested, and comparisons across different groups (patients with schizophrenia and their siblings versus normal control subjects; across genotype groups) were conducted. Our results show that four of these measures, particularly those that represent values from the most-activated voxels within an ROI, are more powerful at reliably detecting group differences and generate greater effect sizes than the others. PMID:26974435
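To make the contrast between summary measures concrete, here is a minimal sketch of two kinds of single-value ROI summary: the mean over all voxels in the ROI, and the mean over only the top fraction of most-activated voxels. The function names, the 10% default fraction and the toy voxel values are hypothetical; the study compared its own set of measures on real fMRI databases, which this sketch does not reproduce.

```python
from statistics import fmean


def roi_mean(voxels):
    """Summary measure A (hypothetical): mean activation over every voxel in the ROI."""
    return fmean(voxels)


def roi_top_fraction_mean(voxels, fraction=0.1):
    """Summary measure B (hypothetical): mean of the top `fraction` most-activated voxels."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(voxels) * fraction))  # keep at least one voxel
    return fmean(sorted(voxels, reverse=True)[:k])


if __name__ == "__main__":
    # Toy ROI: a few strongly activated voxels among many near-zero ones.
    roi = [0.1, -0.2, 0.05, 2.3, 1.9, 0.0, -0.1, 2.1, 0.3, 0.2]
    print("mean over ROI:      ", roi_mean(roi))
    print("mean of top 20 pct: ", roi_top_fraction_mean(roi, 0.2))
```

In this toy case the whole-ROI mean is diluted by the inactive voxels, whereas the top-voxel measure tracks the focal activation; that dilution is one plausible reason why top-voxel measures could yield larger effect sizes, though the sketch does not test it.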
On the molecular mechanism of GC content variation among eubacterial genomes.

PubMed

Wu, Hao; Zhang, Zhang; Hu, Songnian; Yu, Jun

2012-01-10

As a key parameter of genome sequence variation, the GC content of bacterial genomes has been investigated for over half a century, and many hypotheses have been put forward to explain this GC content variation and its relationship to other fundamental processes. Previously, we classified eubacteria into dnaE-based groups (the dimeric combination of DNA polymerase III alpha subunits), according to a hypothesis where GC content variation is essentially governed by genome replication and DNA repair mechanisms. Further investigation led to the discovery that two major mutator genes, polC and dnaE2, may be responsible for genomic GC content variation. Consequently, an in-depth analysis was conducted to evaluate various potential intrinsic and extrinsic factors in association with GC content variation among eubacterial genomes. Mutator genes, especially those with dominant effects on the mutation spectra, are biased towards either GC or AT richness, and they alter genomic GC content in the two opposite directions. Increased bacterial genome size (or gene number) appears to rely on increased genomic GC content; however, it is unclear whether the changes are directly related to certain environmental pressures. Certain environmental and bacteriological features are related to GC content variation, but their trends are more obvious when analyzed under the dnaE-based grouping scheme. Most terrestrial, plant-associated, and nitrogen-fixing bacteria are members of the dnaE1|dnaE2 group, whereas most pathogenic or symbiotic bacteria in insects, and those dwelling in aquatic environments, are largely members of the dnaE1|polV group. Our studies provide several lines of evidence indicating that DNA polymerase III α subunit and its isoforms participating in either replication (such as polC) or SOS mutagenesis/translesion synthesis (such as dnaE2), play dominant roles in determining GC variability. 
Other environmental or bacteriological factors, such as genome size, temperature, oxygen requirement, and habitat, either play subsidiary roles or rely indirectly on different mutator genes to fine-tune the GC content. These results provide comprehensive insight into the mechanisms of GC content variation and the robustness of eubacterial genomes in adapting to their ever-changing environments over billions of years.
Genomes of the T4-related bacteriophages as windows on microbial genome evolution.

PubMed

Petrov, Vasiliy M; Ratnayaka, Swarnamala; Nolan, James M; Miller, Eric S; Karam, Jim D

2010-10-28

The T4-related bacteriophages are a group of bacterial viruses that share morphological similarities and genetic homologies with the well-studied Escherichia coli phage T4, but that diverge from T4 and each other by a number of genetically determined characteristics including the bacterial hosts they infect, the sizes of their linear double-stranded (ds) DNA genomes and the predicted compositions of their proteomes. The genomes of about 40 of these phages have been sequenced and annotated over the last several years and are compared here in the context of the factors that have determined their diversity and the diversity of other microbial genomes in evolution. The genomes of the T4 relatives analyzed so far range in size between ~160,000 and ~250,000 base pairs (bp) and are mosaics of one another, consisting of clusters of homology between them that are interspersed with segments that vary considerably in genetic composition between the different phage lineages. Based on the known biological and biochemical properties of phage T4 and the proteins encoded by the T4 genome, the T4 relatives reviewed here are predicted to share a genetic core, or "Core Genome" that determines the structural design of their dsDNA chromosomes, their distinctive morphology and the process of their assembly into infectious agents (phage morphogenesis). The Core Genome appears to be the most ancient genetic component of this phage group and constitutes a mere 12-15% of the total protein encoding potential of the typical T4-related phage genome. The high degree of genetic heterogeneity that exists outside of this shared core suggests that horizontal DNA transfer involving many genetic sources has played a major role in diversification of the T4-related phages and their spread to a wide spectrum of bacterial species domains in evolution. 
We discuss some of the factors and pathways that might have shaped the evolution of these phages and point out several parallels between their diversity and the diversity generally observed within all groups of interrelated dsDNA microbial genomes in nature.
Genomes of the T4-related bacteriophages as windows on microbial genome evolution

PubMed Central

2010-01-01

The T4-related bacteriophages are a group of bacterial viruses that share morphological similarities and genetic homologies with the well-studied Escherichia coli phage T4, but that diverge from T4 and each other by a number of genetically determined characteristics including the bacterial hosts they infect, the sizes of their linear double-stranded (ds) DNA genomes and the predicted compositions of their proteomes. The genomes of about 40 of these phages have been sequenced and annotated over the last several years and are compared here in the context of the factors that have determined their diversity and the diversity of other microbial genomes in evolution. The genomes of the T4 relatives analyzed so far range in size between ~160,000 and ~250,000 base pairs (bp) and are mosaics of one another, consisting of clusters of homology between them that are interspersed with segments that vary considerably in genetic composition between the different phage lineages. Based on the known biological and biochemical properties of phage T4 and the proteins encoded by the T4 genome, the T4 relatives reviewed here are predicted to share a genetic core, or "Core Genome" that determines the structural design of their dsDNA chromosomes, their distinctive morphology and the process of their assembly into infectious agents (phage morphogenesis). The Core Genome appears to be the most ancient genetic component of this phage group and constitutes a mere 12-15% of the total protein encoding potential of the typical T4-related phage genome. The high degree of genetic heterogeneity that exists outside of this shared core suggests that horizontal DNA transfer involving many genetic sources has played a major role in diversification of the T4-related phages and their spread to a wide spectrum of bacterial species domains in evolution. 
We discuss some of the factors and pathways that might have shaped the evolution of these phages and point out several parallels between their diversity and the diversity generally observed within all groups of interrelated dsDNA microbial genomes in nature. PMID:21029436
Orthogonal control of expression mean and variance by epigenetic features at different genomic loci

DOE PAGES

Dey, Siddharth S.; Foley, Jonathan E.; Limsirichai, Prajit; ...

2015-05-05

While gene expression noise has been shown to drive dramatic phenotypic variation, the molecular basis for this variability in mammalian systems is not well understood. Gene expression has been shown to be regulated by promoter architecture and the associated chromatin environment. However, the exact contribution of these two factors to regulating expression noise has not been explored. Using a dual-reporter lentiviral model system, we deconvolved the influence of the promoter sequence in order to systematically study the contribution of the chromatin environment at different genomic locations to regulating expression noise. By integrating a large-scale analysis quantifying mRNA levels by smFISH and protein levels by flow cytometry in single cells, we found that mean expression and noise are uncorrelated across genomic locations. Furthermore, we showed that this independence could be explained by the orthogonal control of mean expression by transcript burst size and of noise by burst frequency. Finally, we showed that genomic locations displaying higher expression noise are associated with more repressed chromatin, indicating that the chromatin environment contributes to regulating expression noise.
Bacteriophage prevalence in the genus Azospirillum and analysis of the first genome sequence of an Azospirillum brasilense integrative phage.

PubMed

Boyer, Mickaël; Haurat, Jacqueline; Samain, Sylvie; Segurens, Béatrice; Gavory, Frédérick; González, Víctor; Mavingui, Patrick; Rohr, René; Bally, René; Wisniewski-Dyé, Florence

2008-02-01

The prevalence of bacteriophages was investigated in 24 strains of four species of plant growth-promoting rhizobacteria belonging to the genus Azospirillum. Upon induction by mitomycin C, the release of phage particles was observed in 11 strains from three species. Transmission electron microscopy revealed particles typical of the Siphoviridae family, of two distinct sizes depending on the Azospirillum species. Pulsed-field gel electrophoresis and hybridization experiments carried out on phage-encapsidated DNAs revealed that all phages isolated from A. lipoferum and A. doebereinerae strains had a genome size of about 10 kb, whereas all phages isolated from A. brasilense strains displayed genome sizes ranging from 62 to 65 kb. Strong DNA hybridization signals were shown for most phages hosted by the same species, whereas no homology was found between phages harbored by different species. Moreover, the complete sequence of the A. brasilense Cd bacteriophage (phiAb-Cd) genome was determined as a double-stranded DNA circular molecule of 62,337 bp that encodes 95 predicted proteins. Only 14 of the predicted proteins could be assigned functions, some of which were involved in DNA processing, phage morphogenesis, and bacterial lysis. In addition, the phiAb-Cd complete genome was mapped as a prophage on a 570-kb replicon of strain A. brasilense Cd, and a region of 27.3 kb of phiAb-Cd was found to be duplicated on the 130-kb pRhico plasmid previously sequenced from A. brasilense Sp7, the parental strain of A. brasilense Cd.
Bacteriophage Prevalence in the Genus Azospirillum and Analysis of the First Genome Sequence of an Azospirillum brasilense Integrative Phage

PubMed Central

Boyer, Mickaël; Haurat, Jacqueline; Samain, Sylvie; Segurens, Béatrice; Gavory, Frédérick; González, Víctor; Mavingui, Patrick; Rohr, René; Bally, René; Wisniewski-Dyé, Florence

2008-01-01

The prevalence of bacteriophages was investigated in 24 strains of four species of plant growth-promoting rhizobacteria belonging to the genus Azospirillum. Upon induction by mitomycin C, the release of phage particles was observed in 11 strains from three species. Transmission electron microscopy revealed particles typical of the Siphoviridae family, of two distinct sizes depending on the Azospirillum species. Pulsed-field gel electrophoresis and hybridization experiments carried out on phage-encapsidated DNAs revealed that all phages isolated from A. lipoferum and A. doebereinerae strains had a genome size of about 10 kb, whereas all phages isolated from A. brasilense strains displayed genome sizes ranging from 62 to 65 kb. Strong DNA hybridization signals were shown for most phages hosted by the same species, whereas no homology was found between phages harbored by different species. Moreover, the complete sequence of the A. brasilense Cd bacteriophage (ΦAb-Cd) genome was determined as a double-stranded DNA circular molecule of 62,337 bp that encodes 95 predicted proteins. Only 14 of the predicted proteins could be assigned functions, some of which were involved in DNA processing, phage morphogenesis, and bacterial lysis. In addition, the ΦAb-Cd complete genome was mapped as a prophage on a 570-kb replicon of strain A. brasilense Cd, and a region of 27.3 kb of ΦAb-Cd was found to be duplicated on the 130-kb pRhico plasmid previously sequenced from A. brasilense Sp7, the parental strain of A. brasilense Cd. PMID:18065619
Genomic and phenotypic evidence for an incomplete domestication of South American grain amaranth (Amaranthus caudatus).

PubMed

Stetter, Markus G; Müller, Thomas; Schmid, Karl J

2017-02-01

The domestication syndrome comprises phenotypic changes that differentiate crops from their wild ancestors. We compared the genomic variation and the phenotypic differentiation in two putative domestication traits, seed size and seed colour, between the grain amaranth Amaranthus caudatus, an ancient crop of South America, and its two close wild relatives and putative ancestors, A. hybridus and A. quitensis. Genotyping 119 accessions of the three species from the Andean region using genotyping by sequencing (GBS) resulted in 9485 SNPs that revealed a strong genetic differentiation of cultivated A. caudatus from its two relatives. A. quitensis and A. hybridus accessions did not cluster by their species assignment but formed mixed groups according to their geographic origin in Ecuador and Peru, respectively. A. caudatus had a higher genetic diversity than its close relatives and shared a high proportion of polymorphisms with its wild relatives, consistent with the absence of a strong bottleneck or a high level of recent gene flow. Genome sizes and seed sizes were not significantly different between A. caudatus and its relatives, although a genetically distinct group of A. caudatus from Bolivia had significantly larger seeds. We conclude that despite a long history of human cultivation and selection for white grain colour, A. caudatus shows a weak genomic and phenotypic domestication syndrome, and we propose that it is an incompletely domesticated crop species, either because of weak selection or because high levels of gene flow from its sympatric close undomesticated relatives counteracted the fixation of key domestication traits. © 2016 John Wiley & Sons Ltd.
Rhipicephalus (Boophilus) microplus strain Deutsch, whole genome shotgun sequencing project first submission of genome sequence

USDA-ARS's Scientific Manuscript database

The size and repetitive nature of the Rhipicephalus microplus genome make obtaining a full genome sequence difficult. Cot filtration/selection techniques were used to reduce the repetitive fraction of the tick genome and enrich for the fraction of DNA with gene-containing regions. The Cot-selected ...
Substantial variation in the extent of mitochondrial genome fragmentation among blood-sucking lice of mammals.

PubMed

Jiang, Haowei; Barker, Stephen C; Shao, Renfu

2013-01-01

Blood-sucking lice of humans have extensively fragmented mitochondrial (mt) genomes. Human head louse and body louse have their 37 mt genes on 20 minichromosomes. In human pubic louse, the 34 mt genes known are on 14 minichromosomes. To understand the process of mt genome fragmentation in the blood-sucking lice of mammals, we sequenced the mt genomes of the domestic pig louse, Haematopinus suis, and the wild pig louse, H. apri, which diverged from human lice approximately 65 Ma. The 37 mt genes of the pig lice are on nine circular minichromosomes; each minichromosome is 3-4 kb in size. The pig lice have four genes per minichromosome on average, in contrast to two genes per minichromosome in the human lice. One minichromosome of the pig lice has eight genes and is the most gene-rich minichromosome found in the sucking lice. Our results indicate substantial variation in the rate and extent of mt genome fragmentation among different lineages of the sucking lice.
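The contrast between "four genes per minichromosome" in the pig lice and "two" in the human lice is simple arithmetic on the counts given in the abstract. The sketch below reproduces it; the dictionary labels are only illustrative names, and the pubic louse figure uses the 34 genes known so far.

```python
# Gene and minichromosome counts as reported in the abstract above.
MT_GENOMES = {
    "human head/body louse": (37, 20),
    "human pubic louse (34 known genes)": (34, 14),
    "pig lice (H. suis, H. apri)": (37, 9),
}


def genes_per_minichromosome(n_genes: int, n_minichromosomes: int) -> float:
    """Average number of mt genes carried by one minichromosome."""
    return n_genes / n_minichromosomes


for name, (genes, chromosomes) in MT_GENOMES.items():
    ratio = genes_per_minichromosome(genes, chromosomes)
    print(f"{name}: {ratio:.2f} genes per minichromosome")
```

This prints 1.85, 2.43 and 4.11 genes per minichromosome, respectively; the abstract's "two" and "four" are these averages rounded.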
Substantial Variation in the Extent of Mitochondrial Genome Fragmentation among Blood-Sucking Lice of Mammals

PubMed Central

Jiang, Haowei; Barker, Stephen C.; Shao, Renfu

2013-01-01

Blood-sucking lice of humans have extensively fragmented mitochondrial (mt) genomes. Human head louse and body louse have their 37 mt genes on 20 minichromosomes. In human pubic louse, the 34 mt genes known are on 14 minichromosomes. To understand the process of mt genome fragmentation in the blood-sucking lice of mammals, we sequenced the mt genomes of the domestic pig louse, Haematopinus suis, and the wild pig louse, H. apri, which diverged from human lice approximately 65 Ma. The 37 mt genes of the pig lice are on nine circular minichromosomes; each minichromosome is 3–4 kb in size. The pig lice have four genes per minichromosome on average, in contrast to two genes per minichromosome in the human lice. One minichromosome of the pig lice has eight genes and is the most gene-rich minichromosome found in the sucking lice. Our results indicate substantial variation in the rate and extent of mt genome fragmentation among different lineages of the sucking lice. PMID:23781098
The Adenovirus Genome Contributes to the Structural Stability of the Virion

PubMed Central

Saha, Bratati; Wong, Carmen M.; Parks, Robin J.

2014-01-01

Adenovirus (Ad) vectors are currently the most commonly used platform for therapeutic gene delivery in human gene therapy clinical trials. Although these vectors are effective, many researchers seek to further improve the safety and efficacy of Ad-based vectors through detailed characterization of basic Ad biology relevant to its function as a vector system. Most Ad vectors lack key, or all, viral protein-coding sequences, which not only prevents virus replication but also increases the cloning capacity of the vector for foreign DNA. However, radical modifications to the genome size significantly decrease virion stability, suggesting that the virus genome plays a role in maintaining the physical stability of the Ad virion. Indeed, a similar relationship between genome size and virion stability has been noted for many viruses. This review discusses the impact of the genome size on Ad virion stability and emphasizes the need to consider this aspect of virus biology in Ad-based vector design. PMID:25254384
ChIP-seq.

PubMed

Kim, Tae Hoon; Dekker, Job

2018-05-01

Owing to its digital nature, ChIP-seq has become the standard method for genome-wide ChIP analysis. Using next-generation sequencing platforms (notably the Illumina Genome Analyzer), millions of short sequence reads can be obtained. The densities of recovered ChIP sequence reads along the genome are used to determine the binding sites of the protein. Although a relatively small amount of ChIP DNA is required for ChIP-seq, the current sequencing platforms still require amplification of the ChIP DNA by ligation-mediated PCR (LM-PCR). This protocol, which involves linker ligation followed by size selection, is the standard ChIP-seq protocol using an Illumina Genome Analyzer. The size-selected ChIP DNA is amplified by LM-PCR and size-selected for the second time. The purified ChIP DNA is then loaded into the Genome Analyzer. The ChIP DNA can also be processed in parallel for ChIP-chip analysis. © 2018 Cold Spring Harbor Laboratory Press.
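The idea that read densities along the genome mark binding sites can be illustrated with a toy sketch: bin read start positions into fixed windows and flag windows with many more reads than the genome-wide average. The window size (200 bp), the 5-fold threshold and the function names are assumptions for illustration only; this is not the protocol's analysis pipeline, and dedicated peak callers model background and fragment shifts far more carefully.

```python
from collections import Counter


def read_density(positions, window=200):
    """Count reads per fixed-width window (window index -> read count)."""
    return Counter(pos // window for pos in positions)


def candidate_sites(positions, genome_length, window=200, fold=5.0):
    """Start coordinates of windows whose read count is at least `fold`
    times the mean count per window across the whole genome."""
    counts = read_density(positions, window)
    n_windows = -(-genome_length // window)  # ceiling division
    mean = len(positions) / n_windows
    return sorted(w * window for w, c in counts.items() if c >= fold * mean)


# Toy data: a sparse uniform background plus a pile-up of reads near 5,000.
background = list(range(0, 10_000, 250))       # 40 scattered reads
pileup = [5_000 + i for i in range(0, 60, 2)]  # 30 reads in one window
print(candidate_sites(background + pileup, genome_length=10_000))  # → [5000]
```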
Biased distributions and decay of long interspersed nuclear elements in the chicken genome.

PubMed

Abrusán, György; Krambeck, Hans-Jürgen; Junier, Thomas; Giordano, Joti; Warburton, Peter E

2008-01-01

The genomes of birds are much smaller than mammalian genomes, and transposable elements (TEs) make up only 10% of the chicken genome, compared with 45% of the human genome. To study the mechanisms that constrain the copy numbers of TEs, and consequently the genome size of birds, we analyzed the distributions of LINEs (CR1s) and SINEs (MIRs) on the chicken autosomes and Z chromosome. We show that (1) CR1 repeats are longest on the Z chromosome and their length is negatively correlated with the local GC content; (2) the decay of CR1 elements is highly biased, and the 5'-ends of the insertions are lost much faster than their 3'-ends; (3) the GC distribution of CR1 repeats shows a bimodal pattern, with repeats enriched in both AT-rich and GC-rich regions of the genome, but the CR1 families show large differences in their GC distribution; and (4) the few MIRs in the chicken are most abundant in regions with intermediate GC content. Our results indicate that the primary mechanism that removes repeats from the chicken genome is ectopic exchange and that the low abundance of repeats in avian genomes is likely to be the consequence of their high recombination rates.
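Relating repeat length to "local GC content" first requires a GC value for the region around each insertion. A minimal sketch, assuming the local region is the repeat plus a fixed flank on each side (the 1 kb default is an assumption, not the study's definition); the resulting (length, GC) pairs could then be tested for correlation.

```python
def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def local_gc(seq: str, start: int, end: int, flank: int = 1000) -> float:
    """GC content of the 0-based, half-open interval [start, end) of `seq`,
    extended by `flank` bp on each side and clipped to the sequence ends."""
    lo = max(0, start - flank)
    hi = min(len(seq), end + flank)
    return gc_content(seq[lo:hi])


# Toy sequence: a GC-rich core flanked by AT-rich stretches.
toy = "AAAAGGGGCCCCTTTT"
print(round(local_gc(toy, 4, 12, flank=2), 3))  # window is toy[2:14] → 0.667
```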
Annotated Draft Genome Assemblies for the Northern Bobwhite (Colinus virginianus) and the Scaled Quail (Callipepla squamata) Reveal Disparate Estimates of Modern Genome Diversity and Historic Effective Population Size.

PubMed

Oldeschulte, David L; Halley, Yvette A; Wilson, Miranda L; Bhattarai, Eric K; Brashear, Wesley; Hill, Joshua; Metz, Richard P; Johnson, Charles D; Rollins, Dale; Peterson, Markus J; Bickhart, Derek M; Decker, Jared E; Sewell, John F; Seabury, Christopher M

2017-09-07

Northern bobwhite (Colinus virginianus; hereafter bobwhite) and scaled quail (Callipepla squamata) populations have suffered precipitous declines across most of their US ranges. Illumina-based first- (v1.0) and second- (v2.0) generation draft genome assemblies for the scaled quail and the bobwhite produced N50 scaffold sizes of 1.035 and 2.042 Mb, respectively, thereby yielding a 45-fold improvement in contiguity over the existing bobwhite assembly, and ≥90% of the assembled genomes were captured within 1313 and 8990 scaffolds, respectively. The scaled quail assembly (v1.0 = 1.045 Gb) was ∼20% smaller than the bobwhite assembly (v2.0 = 1.254 Gb), a difference supported by k-mer-based estimates of genome size. Nevertheless, estimates of GC content (41.72%; 42.66%), genome-wide repetitive content (10.40%; 10.43%), and MAKER-predicted protein-coding genes (17,131; 17,165) were similar for the scaled quail (v1.0) and bobwhite (v2.0) assemblies, respectively. BUSCO analyses utilizing 3023 single-copy orthologs revealed a high level of assembly completeness for the scaled quail (v1.0; 84.8%) and the bobwhite (v2.0; 82.5%), as verified by comparison with well-established avian genomes. We also detected 273 putative segmental duplications in the scaled quail genome (v1.0) and 711 in the bobwhite genome (v2.0), including some that were shared by both species. Autosomal variant prediction revealed ∼2.48 and 4.17 heterozygous variants per kilobase within the scaled quail (v1.0) and bobwhite (v2.0) genomes, respectively, and estimates of historic effective population size were uniformly higher for the bobwhite across all time points in a coalescent model. However, large-scale declines were predicted for both species beginning ∼15-20 KYA. Copyright © 2017 Oldeschulte et al.
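The contiguity figures above (N50 scaffold size, and the number of scaffolds capturing ≥90% of the assembly) both come from sorting scaffolds by length and accumulating bases. A minimal sketch using the common definition computed against total assembly length (not an estimated genome size, which would give NG50); the scaffold lengths in the usage example are hypothetical.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for an assembly.

    Nx is the length of the shortest scaffold in the smallest set of longest
    scaffolds that together cover `fraction` of all assembled bases; Lx is
    the number of scaffolds in that set. fraction=0.5 gives N50/L50.
    """
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= fraction * total:
            return length, count
    return 0, 0


scaffolds = [10, 8, 5, 3, 2]   # hypothetical scaffold lengths (Mb)
print(nx_lx(scaffolds))        # N50, L50 → (8, 2)
print(nx_lx(scaffolds, 0.9))   # N90, L90 → (3, 4)
```

Read this way, "≥90% of the assembled genome captured within 1313 scaffolds" is an L90-style count.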
Medium-sized tandem repeats represent an abundant component of the Drosophila virilis genome.

PubMed

Abdurashitov, Murat A; Gonchar, Danila A; Chernukhin, Valery A; Tomilov, Victor N; Tomilova, Julia E; Schostak, Natalia G; Zatsepina, Olga G; Zelentsova, Elena S; Evgen'ev, Michael B; Degtyarev, Sergey K H

2013-11-09

Previously, we developed a simple method for carrying out a restriction enzyme analysis of eukaryotic DNA in silico, based on the known DNA sequences of the genomes. This method allows the user to calculate the lengths of all DNA fragments that are formed after a whole genome is digested at the theoretical recognition sites of a given restriction enzyme. Comparisons of the observed peaks in distribution diagrams with the results of in vitro DNA cleavage using several restriction enzymes have shown good correspondence between the theoretical and experimental data in several cases. Here, we applied this combined approach to the restriction analysis of the annotated genome of Drosophila virilis, which is extremely rich in various repeats. This approach enabled us to reveal three abundant medium-sized tandem repeats within the D. virilis genome. While the 225 bp repeats had previously been found in the intergenic non-transcribed spacers between ribosomal genes of D. virilis, the two other families, comprising 154 bp and 172 bp repeats, had not been described. A Tandem Repeats Finder search demonstrated that the 154 bp and 172 bp units are organized in multiple clusters in the D. virilis genome. Notably, only the 154 bp repeats, derived from a Helitron transposon, are transcribed. Using in silico digestion in combination with conventional restriction analysis and sequencing of repeated DNA fragments enabled us to isolate and characterize three highly abundant families of medium-sized repeats present in the D. virilis genome. These repeats comprise a significant portion of the genome and may have important roles in genome function and structural integrity. We have therefore demonstrated an approach that makes it possible to investigate in detail the gross arrangement and expression of medium-sized repeats based on sequencing data, even for incompletely assembled and/or annotated genomes.
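The in silico digestion described above amounts to locating every recognition site in a sequence and measuring the distances between cut points. The sketch below is a generic illustration, not the authors' software: it scans only the top strand (adequate for palindromic sites such as EcoRI, which cuts G^AATTC), and the toy genome is hypothetical. A tandem array containing one site per unit yields a peak of fragments at the unit length, which is how such repeats stand out in a fragment-length distribution.

```python
import re
from collections import Counter


def digest(seq: str, site: str, cut_offset: int) -> list:
    """Fragment lengths from cutting a linear sequence at every occurrence of
    `site`, `cut_offset` bases after the start of the site (top strand only)."""
    seq = seq.upper()
    # A zero-width lookahead also finds overlapping occurrences of the site.
    pattern = f"(?={re.escape(site.upper())})"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Toy genome: a tandem array of five 154 bp units, each with one EcoRI site.
unit = "GAATTC" + "A" * 148
genome = "T" * 50 + unit * 5 + "T" * 30
fragments = digest(genome, "GAATTC", cut_offset=1)  # EcoRI cuts G^AATTC
print(sorted(fragments))                  # → [51, 154, 154, 154, 154, 183]
print(Counter(fragments).most_common(1))  # → [(154, 4)]
```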
Genome evolution in Reptilia: in silico chicken mapping of 12,000 BAC-end sequences from two reptiles and a basal bird

PubMed Central

2009-01-01

Background: With the publication of the draft chicken genome and the recent production of several BAC clone libraries from non-avian reptiles and birds, it is now possible to undertake more detailed comparative genomic studies in Reptilia. Of interest in particular are the genomic events that transformed the large, repeat-rich genomes of mammals and non-avian reptiles into the minimalist chicken genome. We have used paired BAC end sequences (BESs) from the American alligator (Alligator mississippiensis), painted turtle (Chrysemys picta) and emu (Dromaius novaehollandiae) to investigate patterns of sequence divergence, gene and retroelement content, and microsynteny between these species and chicken.
Results: From a total of 11,967 curated BESs, we successfully mapped 725, 773 and 2597 sequences in alligator, turtle, and emu, respectively, to sites in the draft chicken genome using a stringent BLAST protocol. Most commonly, sequences mapped to a single site in the chicken genome. Of 1675, 1828 and 2936 paired BESs obtained for alligator, turtle, and emu, respectively, a total of 34 (alligator, 2%), 24 (turtle, 1.3%) and 479 (emu, 16.3%) pairs were found to map with high confidence and in the correct orientation and with BAC-sized intermarker distances to single chicken chromosomes, including 25 such paired hits in emu mapping to the chicken Z chromosome. By determining the insert sizes of a subset of BAC clones from these three species, we also found a significant correlation between the intermarker distance in alligator and turtle and in chicken, with slopes as expected on the basis of the ratio of the genome sizes.
Conclusion: Our results suggest that a large number of small-scale chromosomal rearrangements and deletions in the lineage leading to chicken have drastically reduced the number of detected syntenies observed between the chicken and alligator, turtle, and emu genomes and imply that small deletions occurring widely throughout the genomes of reptilian and avian ancestors led to the ~50% reduction in genome size observed in birds compared to reptiles. We have also mapped and identified likely gene regions in hundreds of new BAC clones from these species. PMID:19607659
Genome evolution in Reptilia: in silico chicken mapping of 12,000 BAC-end sequences from two reptiles and a basal bird.

PubMed

Chapus, Charles; Edwards, Scott V

2009-07-14

With the publication of the draft chicken genome and the recent production of several BAC clone libraries from non-avian reptiles and birds, it is now possible to undertake more detailed comparative genomic studies in Reptilia. Of interest in particular are the genomic events that transformed the large, repeat-rich genomes of mammals and non-avian reptiles into the minimalist chicken genome. We have used paired BAC end sequences (BESs) from the American alligator (Alligator mississippiensis), painted turtle (Chrysemys picta) and emu (Dromaius novaehollandiae) to investigate patterns of sequence divergence, gene and retroelement content, and microsynteny between these species and chicken. From a total of 11,967 curated BESs, we successfully mapped 725, 773 and 2597 sequences in alligator, turtle, and emu, respectively, to sites in the draft chicken genome using a stringent BLAST protocol. Most commonly, sequences mapped to a single site in the chicken genome. Of 1675, 1828 and 2936 paired BESs obtained for alligator, turtle, and emu, respectively, a total of 34 (alligator, 2%), 24 (turtle, 1.3%) and 479 (emu, 16.3%) pairs were found to map with high confidence and in the correct orientation and with BAC-sized intermarker distances to single chicken chromosomes, including 25 such paired hits in emu mapping to the chicken Z chromosome. By determining the insert sizes of a subset of BAC clones from these three species, we also found a significant correlation between the intermarker distance in alligator and turtle and in chicken, with slopes as expected on the basis of the ratio of the genome sizes. 
Our results suggest that a large number of small-scale chromosomal rearrangements and deletions in the lineage leading to chicken have drastically reduced the number of detected syntenies observed between the chicken and alligator, turtle, and emu genomes and imply that small deletions occurring widely throughout the genomes of reptilian and avian ancestors led to the ~50% reduction in genome size observed in birds compared to reptiles. We have also mapped and identified likely gene regions in hundreds of new BAC clones from these species.
Global assessment of genomic variation in cattle by genome resequencing and high-throughput genotyping

PubMed Central

2011-01-01

Background: Integration of genomic variation with phenotypic information is an effective approach for uncovering genotype-phenotype associations. This requires accurate identification of the different types of variation in individual genomes.
Results: We report the integration of the whole genome sequence of a single Holstein Friesian bull with data from single nucleotide polymorphism (SNP) and comparative genomic hybridization (CGH) array technologies to determine a comprehensive spectrum of genomic variation. The performance of resequencing SNP detection was assessed by combining SNPs that were identified as lying either in identity-by-descent (IBD) regions or in copy number variation (CNV) regions with results from SNP array genotyping. Coding insertions and deletions (indels) were found to be enriched for lengths that are multiples of 3 and were located near the N- and C-termini of proteins. For larger indels, split-read and read-pair approaches proved complementary, detecting different signatures. CNVs were identified on the basis of the depth of sequenced reads, and by using SNP and CGH arrays.
Conclusions: Our results provide high-resolution mapping of diverse classes of genomic variation in an individual bovine genome and demonstrate that structural variation surpasses sequence variation as the main component of genomic variability. Better accuracy of SNP detection was achieved with little loss of sensitivity when algorithms that implemented mapping quality were used. IBD regions were found to be instrumental for calculating resequencing SNP accuracy, while SNP detection within CNVs tended to be less reliable. CNV discovery was affected dramatically by platform resolution and coverage biases. The combined data for this study showed that at a moderate level of sequencing coverage, an ensemble of platforms and tools can be applied together to maximize the accurate detection of sequence and structural variants. PMID:22082336
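Identifying CNVs "on the basis of the depth of sequenced reads" rests on a simple expectation: in a diploid genome, a window's read depth scales with its copy number. The sketch below compares each window's depth with the genome-wide median; the gain/loss thresholds (1.5 and 0.5) and the diploid baseline are assumptions for illustration, and it deliberately omits the GC and mappability corrections that real callers apply to counter the coverage biases mentioned above.

```python
from statistics import median


def call_cnv_windows(depths, gain=1.5, loss=0.5):
    """Flag windows whose depth deviates from the median depth.

    Returns (window_index, 'gain' | 'loss', estimated_copy_number), assuming
    the median window is diploid (copy number 2).
    """
    baseline = median(depths)
    calls = []
    for i, depth in enumerate(depths):
        ratio = depth / baseline if baseline else 0.0
        if ratio >= gain:
            calls.append((i, "gain", round(2 * ratio)))
        elif ratio <= loss:
            calls.append((i, "loss", round(2 * ratio)))
    return calls


# Hypothetical per-window read depths (e.g. mean depth per 10 kb window).
depths = [30, 31, 29, 30, 62, 60, 30, 14, 30]
print(call_cnv_windows(depths))
# → [(4, 'gain', 4), (5, 'gain', 4), (7, 'loss', 1)]
```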

Comparative Analysis of Transposable Elements Highlights Mobilome Diversity and Evolution in Vertebrates

PubMed Central

Chalopin, Domitille; Naville, Magali; Plard, Floriane; Galiana, Delphine; Volff, Jean-Nicolas

2015-01-01

Transposable elements (TEs) are major components of vertebrate genomes, with major roles in genome architecture and evolution. In order to characterize both common patterns and lineage-specific differences in TE content and TE evolution, we have compared the mobilomes of 23 vertebrate genomes, including 10 actinopterygian fish, 11 sarcopterygians, and 2 nonbony vertebrates. We found large variations in TE content (from 6% in the pufferfish tetraodon to 55% in zebrafish), with a larger relative contribution of TEs to genome size in fish than in mammals. Some TE superfamilies were found to be widespread in vertebrates, but most elements showed a more patchy distribution, indicative of multiple events of loss or gain. Interestingly, loss of major TE families was observed during the evolution of the sarcopterygian lineage, with a particularly strong reduction in TE diversity in birds and mammals. Phylogenetic trends in TE composition and activity were detected: Teleost fish genomes are dominated by DNA transposons and contain few ancient TE copies, while mammalian genomes have been predominantly shaped by non-long terminal repeat retrotransposons, along with the persistence of older sequences. Differences were also found within lineages: The medaka fish genome underwent more recent TE amplification than the related platyfish, as observed for LINE retrotransposons in the mouse compared with the human genome. This study also allowed us to identify putative cases of horizontal transfer of TEs and to tentatively infer the composition of the ancestral vertebrate mobilome. Taken together, the results highlight the importance of TEs in the structure and evolution of vertebrate genomes and demonstrate their major impact on genome diversity both between and within lineages. PMID:25577199
Comparative analysis of transposable elements highlights mobilome diversity and evolution in vertebrates.

PubMed

Chalopin, Domitille; Naville, Magali; Plard, Floriane; Galiana, Delphine; Volff, Jean-Nicolas

2015-01-09

Transposable elements (TEs) are major components of vertebrate genomes, with major roles in genome architecture and evolution. In order to characterize both common patterns and lineage-specific differences in TE content and TE evolution, we have compared the mobilomes of 23 vertebrate genomes, including 10 actinopterygian fish, 11 sarcopterygians, and 2 nonbony vertebrates. We found large variations in TE content (from 6% in the pufferfish tetraodon to 55% in zebrafish), with a larger relative contribution of TEs to genome size in fish than in mammals. Some TE superfamilies were found to be widespread in vertebrates, but most elements showed a more patchy distribution, indicative of multiple events of loss or gain. Interestingly, loss of major TE families was observed during the evolution of the sarcopterygian lineage, with a particularly strong reduction in TE diversity in birds and mammals. Phylogenetic trends in TE composition and activity were detected: Teleost fish genomes are dominated by DNA transposons and contain few ancient TE copies, while mammalian genomes have been predominantly shaped by non-long terminal repeat retrotransposons, along with the persistence of older sequences. Differences were also found within lineages: The medaka fish genome underwent more recent TE amplification than the related platyfish, as observed for LINE retrotransposons in the mouse compared with the human genome. This study also allowed us to identify putative cases of horizontal transfer of TEs and to tentatively infer the composition of the ancestral vertebrate mobilome. Taken together, the results highlight the importance of TEs in the structure and evolution of vertebrate genomes and demonstrate their major impact on genome diversity both between and within lineages. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
The genome of the Erwinia amylovora phage PhiEaH1 reveals greater diversity and broadens the applicability of phages for the treatment of fire blight.

PubMed

Meczker, Katalin; Dömötör, Dóra; Vass, János; Rákhely, Gábor; Schneider, György; Kovács, Tamás

2014-01-01

The enterobacterium Erwinia amylovora is the causal agent of fire blight. This study presents the analysis of the complete genome of phage PhiEaH1, isolated from the soil surrounding an E. amylovora-infected apple tree in Hungary. Its genome is 218 kb in size and contains 244 ORFs. PhiEaH1 is the second E. amylovora-infecting phage from the Siphoviridae family whose complete genome sequence has been determined. Together with PhiEaH2, PhiEaH1 is an active component of Erwiphage, the first bacteriophage-based pesticide on the market against E. amylovora. Comparative genome analysis in this study revealed that PhiEaH1 differs not only from the 10 previously sequenced E. amylovora bacteriophages belonging to other phage families, but also from PhiEaH2. Sequencing more Siphoviridae phage genomes might reveal further diversity, providing opportunities for the development of even more effective biological control agents, such as phage cocktails, against Erwinia fire blight disease of commercial fruit crops.
The draft genome sequence of cork oak

PubMed Central

Ramos, António Marcos; Usié, Ana; Barbosa, Pedro; Barros, Pedro M.; Capote, Tiago; Chaves, Inês; Simões, Fernanda; Abreu, Isabl; Carrasquinho, Isabel; Faro, Carlos; Guimarães, Joana B.; Mendonça, Diogo; Nóbrega, Filomena; Rodrigues, Leandra; Saibo, Nelson J. M.; Varela, Maria Carolina; Egas, Conceição; Matos, José; Miguel, Célia M.; Oliveira, M. Margarida; Ricardo, Cândido P.; Gonçalves, Sónia

2018-01-01

Cork oak (Quercus suber) is native to southwest Europe and northwest Africa, where it plays a crucial environmental and economic role. To tackle the challenges facing cork oak production and industry, advanced research is imperative but depends on the availability of a sequenced genome. To address this, we produced the first draft version of the cork oak genome. We followed a de novo assembly strategy based on high-throughput sequence data, which generated a 953.3 Mb draft genome comprising 23,347 scaffolds. A total of 79,752 genes and 83,814 transcripts were predicted, including 33,658 high-confidence genes. An InterPro signature assignment was detected for 69,218 transcripts, which represented 82.6% of the total. Validation studies demonstrated the completeness of the genome assembly and annotation and highlighted the usefulness of the draft genome for read mapping of high-throughput sequence data generated using different protocols. All generated data have been deposited in public databases and are therefore ready for use by the academic and industry communities working on cork oak and/or related species. PMID:29786699
The draft genome sequence of cork oak.

PubMed

Ramos, António Marcos; Usié, Ana; Barbosa, Pedro; Barros, Pedro M; Capote, Tiago; Chaves, Inês; Simões, Fernanda; Abreu, Isabl; Carrasquinho, Isabel; Faro, Carlos; Guimarães, Joana B; Mendonça, Diogo; Nóbrega, Filomena; Rodrigues, Leandra; Saibo, Nelson J M; Varela, Maria Carolina; Egas, Conceição; Matos, José; Miguel, Célia M; Oliveira, M Margarida; Ricardo, Cândido P; Gonçalves, Sónia

2018-05-22

Cork oak (Quercus suber) is native to southwest Europe and northwest Africa, where it plays a crucial environmental and economic role. To tackle the challenges facing cork oak production and industry, advanced research is imperative but depends on the availability of a sequenced genome. To address this, we produced the first draft version of the cork oak genome. We followed a de novo assembly strategy based on high-throughput sequence data, which generated a 953.3 Mb draft genome comprising 23,347 scaffolds. A total of 79,752 genes and 83,814 transcripts were predicted, including 33,658 high-confidence genes. An InterPro signature assignment was detected for 69,218 transcripts, which represented 82.6% of the total. Validation studies demonstrated the completeness of the genome assembly and annotation and highlighted the usefulness of the draft genome for read mapping of high-throughput sequence data generated using different protocols. All generated data have been deposited in public databases and are therefore ready for use by the academic and industry communities working on cork oak and/or related species.
Detection of genomic rearrangements in cucumber using genomecmp software

NASA Astrophysics Data System (ADS)

Kulawik, Maciej; Pawełkowicz, Magdalena Ewa; Wojcieszek, Michał; Pląder, Wojciech; Nowak, Robert M.

2017-08-01

Comparative genomics is a rapidly evolving science, driven by the growing amount of genome sequence information available in databases. A simple comparison of general genome features, such as genome size, number of genes, and chromosome number, provides an entry point into comparative genomic analysis. Here we present the utility of the new tool genomecmp for finding rearrangements across compared sequences, and its applications in plant comparative genomics.
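genomecmp's own algorithm is not described in this abstract. As a generic illustration of what finding rearrangements can mean, the sketch below assumes each genome has already been reduced to a signed order of shared syntenic blocks numbered 1..n in the reference (a hypothetical input format, not genomecmp's) and reports inverted blocks and breakpoints, i.e. adjacencies the reference does not contain.

```python
from typing import List, Tuple


def find_rearrangements(query: List[int]) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Compare a signed block order against the reference order 1..n.

    Hypothetical input: +k means block k in reference orientation, -k means
    block k is inverted; each block appears exactly once.
    Returns the inverted blocks and the breakpoints (neighbouring pairs in the
    query that are not neighbours in the reference).
    """
    inverted = [abs(b) for b in query if b < 0]
    # Sentinels 0 and n+1 let the two chromosome ends be checked as well.
    framed = [0] + list(query) + [len(query) + 1]
    breakpoints = []
    for left, right in zip(framed, framed[1:]):
        # Reference adjacency k -> k+1 gives right - left == 1; read through an
        # inverted segment it appears as -(k+1) -> -k, which also differs by 1.
        if right - left != 1:
            breakpoints.append((left, right))
    return inverted, breakpoints


print(find_rearrangements([1, -3, -2, 4, 5]))  # ([3, 2], [(1, -3), (-2, 4)])
```

An inversion of blocks 2 and 3 yields exactly two breakpoints, one at each end of the inverted segment; transpositions and translocations show up as additional breakpoints in the same way.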
Influence of adaptive mutations, from thermal adaptation experiments, on the infection cycle of RNA bacteriophage Qβ.

PubMed

Kashiwagi, Akiko; Kadoya, Tamami; Kumasaka, Naoya; Kumagai, Tomofumi; Tsushima, Fumie Sano; Yomo, Tetsuya

2018-06-04

A population's growth rate is determined by multiple 'life history traits'. Determining quantitatively which life history traits should improve to allow a living organism to adapt to an inhibitory environment is an important issue. Previously, we conducted thermal adaptation experiments on the RNA bacteriophage Qβ using three independent replicates and reported that all three end-point populations could grow at a temperature (43.6°C) that inhibited the growth of the ancestral strain. Even though the fitness values of the end-point populations were almost the same, their genome sequences were not, indicating that the three thermally adapted populations may have different life history traits. In this study, we introduced the mutations observed in each of the three end-point populations into the cDNA of the Qβ genome and prepared three different mutants. Quantitative analysis showed that they tended to increase their fitness by increasing the adsorption rate to their host, shortening their latent period (i.e., the duration between phage infection and progeny release), and increasing the burst size (i.e., the number of progeny phages per infected cell), but all three mutants showed decreased thermal stability. However, the degree to which these traits changed differed. The mutant with the fewest mutations showed the smallest decrease in thermal stability, the largest adsorption rate to the host, and the shortest latent period. These results indicate that several different adaptive routes exist by which Qβ can adapt to higher temperatures, even though Qβ is a simple RNA bacteriophage with a small genome encoding only four genes.
Analysis of Genome Plasticity in Pathogenic and Commensal Escherichia coli Isolates by Use of DNA Arrays

PubMed Central

Dobrindt, Ulrich; Agerer, Franziska; Michaelis, Kai; Janka, Andreas; Buchrieser, Carmen; Samuelson, Martin; Svanborg, Catharina; Gottschalk, Gerhard; Karch, Helge; Hacker, Jörg

2003-01-01

Genomes of prokaryotes differ significantly in size and DNA composition. Escherichia coli is considered a model organism to analyze the processes involved in bacterial genome evolution, as the species comprises numerous pathogenic and commensal variants. Pathogenic and nonpathogenic E. coli strains differ in the presence and absence of additional DNA elements contributing to specific virulence traits and also in the presence and absence of additional genetic information. To analyze the genetic diversity of pathogenic and commensal E. coli isolates, a whole-genome approach was applied. Using DNA arrays, the presence of all translatable open reading frames (ORFs) of nonpathogenic E. coli K-12 strain MG1655 was investigated in 26 E. coli isolates, including various extraintestinal and intestinal pathogenic E. coli isolates, 3 pathogenicity island deletion mutants, and commensal and laboratory strains. Additionally, the presence of virulence-associated genes of E. coli was determined using a DNA “pathoarray” developed in our laboratory. The frequency and distributional pattern of genomic variations vary widely in different E. coli strains. Up to 10% of the E. coli K-12-specific ORFs were not detectable in the genomes of the different strains. DNA sequences described for extraintestinal or intestinal pathogenic E. coli are more frequently detectable in isolates of the same origin than in other pathotypes. Several genes coding for virulence or fitness factors are also present in commensal E. coli isolates. Based on these results, the conserved E. coli core genome is estimated to consist of at least 3,100 translatable ORFs. The absence of K-12-specific ORFs was detectable in all chromosomal regions. These data demonstrate the great genome heterogeneity and genetic diversity among E. coli strains and underline the fact that both the acquisition and deletion of DNA elements are important processes involved in the evolution of prokaryotes. PMID:12618447
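The core-genome estimate rests on a presence/absence matrix: an ORF counts as core only if it is detected in every strain examined. A minimal sketch of that bookkeeping, assuming the array hybridization signals have already been converted to detected/not-detected calls; the strain and ORF names are hypothetical placeholders, not data from the study.

```python
from typing import Dict, Optional, Set


def core_genome(presence: Dict[str, Dict[str, bool]]) -> Set[str]:
    """ORFs detected in every strain; `presence` maps strain -> {ORF id: detected}."""
    core: Optional[Set[str]] = None
    for calls in presence.values():
        detected = {orf for orf, hit in calls.items() if hit}
        core = detected if core is None else core & detected
    return core if core is not None else set()


def fraction_absent(calls: Dict[str, bool]) -> float:
    """Share of reference ORFs that were not detected in one strain."""
    return sum(not hit for hit in calls.values()) / len(calls)


# Hypothetical calls for three strains and three reference ORFs.
presence = {
    "strain_A": {"orf1": True, "orf2": True, "orf3": False},
    "strain_B": {"orf1": True, "orf2": False, "orf3": True},
    "strain_C": {"orf1": True, "orf2": True, "orf3": True},
}
print(sorted(core_genome(presence)))                    # ['orf1']
print(round(fraction_absent(presence["strain_A"]), 3))  # 0.333
```

Adding strains can only shrink the intersection, which is why such core-genome figures are stated as lower bounds ("at least") for a given sample.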
Comparative Genomics Reveals the Core Gene Toolbox for the Fungus-Insect Symbiosis.

PubMed

Wang, Yan; Stata, Matt; Wang, Wei; Stajich, Jason E; White, Merlin M; Moncalvo, Jean-Marc

2018-05-15

Modern genomics has shed light on many entomopathogenic fungi and greatly expanded our knowledge of them; however, little is known about the genomic features of insect-commensal fungi. Harpellales are obligate commensals living in the digestive tracts of disease-bearing insects (black flies, midges, and mosquitoes). In this study, we produced and annotated whole-genome sequences of nine Harpellales taxa and conducted the first comparative analyses to infer the genomic diversity within the Harpellales. The genomes of these insect gut fungi feature low GC content (26% to 37%) and large variation in genome size (25 to 102 Mb). Further comparisons with insect-pathogenic fungi (from both Ascomycota and Zoopagomycota), as well as with free-living relatives (as negative controls), helped to identify a gene toolbox that is essential to the fungus-insect symbiosis. The results not only narrow the genomic scope of fungus-insect interactions from several thousand to eight core players but also distinguish the host invasion strategies employed by insect pathogens and commensals. The genomic content suggests that insect-commensal fungi rely mostly on adhesion protein anchors that target the host digestive system, whereas entomopathogenic fungi have higher numbers of transmembrane helices, signal peptides, and pathogen-host interaction (PHI) genes across the whole genome and are enriched in genes and functional domains that inactivate the host inflammation system and suppress host defenses. Phylogenomic analyses revealed that genome sizes of Harpellales fungi vary among lineages in an integer-multiple pattern, which implies that ancient genome duplications may have occurred within the gut of insects. IMPORTANCE Insect guts harbor various microbes that are important for host digestion, immune response, and, in certain cases, disease dispersal. Bacteria, which are among the primary endosymbionts, have been studied extensively.
However, fungi, which are also frequently encountered, are poorly understood with respect to their biology within insect guts. To understand their genomic features and related biology, we produced whole-genome sequences of nine gut-commensal fungi from disease-bearing insects (black flies, midges, and mosquitoes). The results show that insect gut fungi tend to have low GC content across their genomes. By comparing these commensals with entomopathogenic and free-living fungi that have available genome sequences, we found a universal core gene toolbox that is unique to, and thus potentially important for, the insect-fungus symbiosis. This comparative work also uncovered different host invasion strategies employed by insect pathogens and commensals, as well as a model system to study ancient fungal genome duplication within the gut of insects. © Crown copyright 2018.
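Stripped of the orthology inference that precedes it, the toolbox idea reduces to set logic: gene families shared by every insect-associated genome (commensal and pathogenic) and missing from every free-living control. The sketch below shows only that step; the family and genome labels are hypothetical, and the study's actual clustering and filtering criteria are not reproduced.

```python
from typing import Dict, Set


def symbiosis_toolbox(insect_associated: Dict[str, Set[str]],
                      free_living: Dict[str, Set[str]]) -> Set[str]:
    """Gene families found in every insect-associated genome and in no control genome."""
    if not insect_associated:
        return set()
    # Families present in all insect-associated genomes.
    shared = set.intersection(*insect_associated.values())
    # Families seen in any free-living (negative control) genome.
    in_controls = set().union(*free_living.values())
    return shared - in_controls


# Hypothetical gene-family content per genome.
insect = {
    "harpellales_1": {"famA", "famB", "famC"},
    "harpellales_2": {"famA", "famB", "famD"},
    "pathogen_1": {"famA", "famB", "famE"},
}
controls = {"free_living_1": {"famB", "famF"}}
print(sorted(symbiosis_toolbox(insect, controls)))  # ['famA']
```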
Genome features of Pseudomonas putida LS46, a novel polyhydroxyalkanoate producer and its comparison with other P. putida strains

PubMed Central

2014-01-01

A novel strain, Pseudomonas putida LS46, was isolated from wastewater on the basis of its ability to synthesize medium-chain-length polyhydroxyalkanoates (mcl-PHAs). P. putida LS46 was differentiated from other P. putida strains on the basis of cpn60 (UT). The complete genome of P. putida LS46 was sequenced and annotated. Its chromosome is 5,862,556 bp in size, with a GC content of 61.69%, and encodes 5,316 genes, including 7 rRNA genes and 76 tRNA genes. The complete P. putida LS46 genome sequence was compared with those of nine other P. putida strains (KT2440, F1, BIRD-1, S16, ND6, DOT-T1E, UW4, W619 and GB-1) identified as either biocontrol or bioremediation agents and isolated from different geographical regions and environments. BLASTn analysis of the whole genome sequences of the ten P. putida strains revealed nucleotide sequence identities of 86.54 to 97.52%. The genome arrangement of P. putida LS46 was highly similar to those of P. putida BIRD-1 and P. putida ND6 but markedly different from those of P. putida DOT-T1E, P. putida UW4 and P. putida W619. Fatty acid biosynthesis (fab), fatty acid degradation (fad) and PHA synthesis genes were highly conserved among biocontrol and bioremediation P. putida strains. Six genes in the pha operon of P. putida LS46 showed >98% homology at the gene and protein levels. It appears that polyhydroxyalkanoate (PHA) synthesis is an intrinsic property of P. putida and is not affected by geographic origin. However, all strains, including P. putida LS46, differed from one another in their housekeeping genes and in the presence of plasmids, prophages, insertion sequence elements and genomic islands. Although P. putida LS46 was not selected for plant growth promotion or bioremediation capacity, its genome also encoded genes for root colonization, pyoverdine synthesis, oxidative stress (present in other soil isolates), degradation of aromatic compounds, heavy metal resistance, nicotinic acid degradation and manganese (Mn II) oxidation.
Genes for toluene or naphthalene degradation found in the genomes of P. putida F1, DOT-T1E, and ND6 were absent from the P. putida LS46 genome. The heavy metal resistance genes encoded by the P. putida W619 genome were also not present in the P. putida LS46 genome. Despite the overall similarity among the genomes of P. putida strains isolated for different applications and from different geographical locations, a number of differences were observed in genome arrangement and in the occurrence of transposons, genomic islands and prophages. It appears that P. putida strains share a common ancestor and that each strain diverged from related strains by acquiring specific genes through horizontal gene transfer. PMID:25401060
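Identity values of the kind reported from BLASTn are, in essence, the share of identical positions in an alignment. The sketch below assumes BLAST's usual convention (identical positions divided by alignment length, with gap columns counted in the length) and takes two already-aligned, equal-length strings using '-' for gaps; it does not perform the alignment itself.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent of alignment columns where both sequences carry the same base.

    Gap columns ('-') count toward the alignment length but never as matches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        raise ValueError("empty alignment")
    matches = sum(1 for a, b in zip(aln_a.upper(), aln_b.upper()) if a == b and a != "-")
    return 100.0 * matches / len(aln_a)


print(percent_identity("ACGT-ACGTA", "ACGTTACCTA"))  # 80.0
```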
Illumina Synthetic Long Read Sequencing Allows Recovery of Missing Sequences even in the “Finished” C. elegans Genome

PubMed Central

Li, Runsheng; Hsieh, Chia-Ling; Young, Amanda; Zhang, Zhihong; Ren, Xiaoliang; Zhao, Zhongying

2015-01-01

Most next-generation sequencing platforms permit acquisition of high-throughput DNA sequences, but their relatively short read lengths limit their use in genome assembly or finishing. Illumina has recently released a technology called Synthetic Long-Read Sequencing that can produce reads of unusual length, predominantly around 10 kb. However, a systematic assessment of their use in genome finishing and assembly is still lacking. We evaluate the promise and deficiencies of these long reads for both purposes using the gap-free genome of isogenic C. elegans. First, the reads are highly accurate and capable of recovering most types of repetitive sequences. However, the presence of tandem repetitive sequences prevents pre-assembly of long reads in the relevant genomic regions. Second, the reads reliably detect missing but not extra sequences in the C. elegans genome. Third, shorter reads are more capable of recovering repetitive sequences than longer ones. Fourth, at least 40 kbp of missing genomic sequence is recovered in the C. elegans genome using the long reads. Finally, an N50 contig size of at least 86 kbp can be achieved with 24× read coverage, but with substantial mis-assembly errors, highlighting a need for novel assembly algorithms for long reads. PMID:26039588
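N50, the contiguity measure quoted above, is the length L such that contigs of length L or longer together cover at least half of the total assembly. A minimal sketch over a list of contig lengths:

```python
from typing import List


def n50(contig_lengths: List[int]) -> int:
    """Return the N50 of an assembly given its contig lengths."""
    if not contig_lengths or min(contig_lengths) <= 0:
        raise ValueError("need at least one contig, all with positive length")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise RuntimeError("unreachable: the running total always reaches half")


print(n50([100, 80, 60, 40, 20]))  # 80
```

Note that N50 rewards contiguity only; as the abstract points out, a high N50 can coexist with substantial mis-assembly.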
Mitochondrial Mutation Rate, Spectrum and Heteroplasmy in Caenorhabditis elegans Spontaneous Mutation Accumulation Lines of Differing Population Size.

PubMed

Konrad, Anke; Thompson, Owen; Waterston, Robert H; Moerman, Donald G; Keightley, Peter D; Bergthorsson, Ulfar; Katju, Vaishali

2017-06-01

Mitochondrial genomes of metazoans, given their elevated rates of evolution, have served as pivotal markers for phylogeographic studies and recent phylogenetic events. In order to determine the dynamics of spontaneous mitochondrial mutations in small populations in the absence and presence of selection, we evolved mutation accumulation (MA) lines of Caenorhabditis elegans in parallel over 409 consecutive generations at three varying population sizes of N = 1, 10, and 100 hermaphrodites. The N = 1 populations should have a minimal influence of natural selection to provide the spontaneous mutation rate and the expected rate of neutral evolution, whereas larger population sizes should experience increasing intensity of selection. New mutations were identified by Illumina paired-end sequencing of 86 mtDNA genomes across 35 experimental lines and compared with published genomes of natural isolates. The spontaneous mitochondrial mutation rate was estimated at 1.05 × 10⁻⁷/site/generation. A strong G/C→A/T mutational bias was observed in both the MA lines and the natural isolates. This suggests that the low G + C content at synonymous sites is the product of mutation bias rather than selection as previously proposed. The mitochondrial effective population size per worm generation was estimated to be 62. Although it was previously concluded that heteroplasmy was rare in C. elegans, the vast majority of mutations in this study were heteroplasmic despite an experimental regime exceeding 400 generations. The frequencies of frameshift and nonsynonymous mutations were negatively correlated with population size, which suggests their deleterious effects on fitness and a potent role for selection in their eradication. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
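At its simplest, a per-site, per-generation rate such as the one above is the number of new mutations divided by the product of sites surveyed, generations elapsed and lines sequenced. The sketch below applies that basic formula to made-up inputs; the study's own estimator (for example, how callable sites were defined or how heteroplasmic calls were counted) is not reproduced here.

```python
def mutation_rate(n_mutations: int, callable_sites: int, generations: int, n_lines: int) -> float:
    """Mutations per site per generation.

    Simplifying assumption: every line was surveyed at the same number of
    callable sites for the same number of generations.
    """
    return n_mutations / (callable_sites * generations * n_lines)


# Hypothetical inputs, not the study's data.
rate = mutation_rate(n_mutations=10, callable_sites=10_000, generations=400, n_lines=10)
print(f"{rate:.2e}")  # 2.50e-07
```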
Measurement, variation, and scaling of osteocyte lacunae: a case study in birds.

PubMed

D'Emic, Michael D; Benson, Roger B J

2013-11-01

Basic issues surrounding osteocyte biology are still poorly understood, including the variability of osteocyte morphology within and among bones, individuals, and species. Several studies have suggested that the volume or shape of osteocytes (or their lacunae) is related to bone and/or organismal growth rate or metabolism, but the nature of this relationship, if any, is unclear. Furthermore, several studies have linked osteocyte lacuna volume with genome size or growth rate and suggested that osteocyte lacuna volume is unrelated to body size. Herein the scaling of osteocyte lacuna volume with body mass, growth and basal metabolic rates, genome size, and red blood cell size is examined using a broad sample of extant birds within a phylogenetic framework. Over 12,000 osteocyte lacuna axes were measured in a variety of bones from 34 avian and four non-avian dinosaur species. Osteocyte lacunae in parallel-fibered bone are scalene ellipsoids; their morphology and volume cannot be reliably estimated from any single thin section, and using a prolate ellipsoid model to estimate osteocyte lacuna volume results in a substantial (ca. 2-7 times) underestimate relative to true lacunar volume. Orthogonal thin sections reveal that in birds, even when only observing parallel-fibered, primary, cortical bone, intra-skeletal variation in osteocyte lacuna volume and shape is very high (volumes vary by a factor of 5.4 among different bones), whereas variation among homologous bones of the same species is low (1.2-44%; mean=12%). Ordinary and phylogenetically informed bivariate and multiple regressions demonstrate that in birds, osteocyte volume scales significantly but weakly with body mass and mass-specific basal metabolic rate and moderately with genome size, but not with erythrocyte size. Avian whole-body growth rate and osteocyte lacuna volume are weakly and inversely related. 
Finally, we present the first three-dimensionally calculated osteocyte volumes for several non-avian dinosaurs, which are much larger than previously reported values and smaller than those of large extant avians. Osteocyte volumes estimated from a single transverse section and assuming prolate morphology, as done in previous studies, are relative underestimates in theropod dinosaurs compared to sauropod dinosaurs, raising the possibility that no major change in osteocyte volumes (and genome size) occurred within Theropoda on the lineage leading to birds. Osteocyte volume is intertwined with several organismal attributes whose relative importance varies at a number of hierarchical levels. © 2013.
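The gap between the two volume models follows from the ellipsoid formula. With full axis lengths L, W and D, an ellipsoid has volume πLWD/6; a prolate model assumes the two short axes are equal, so when a single section shows only the long axis and one short axis, the unseen axis is copied from the visible one. The sketch below uses hypothetical axis lengths; when the thinner short axis is the one observed, the prolate estimate falls short by the ratio of the two short axes.

```python
import math


def ellipsoid_volume(length: float, width: float, depth: float) -> float:
    """Volume of a (scalene) ellipsoid from its three full axis lengths."""
    # (4/3) * pi * (L/2) * (W/2) * (D/2) simplifies to pi * L * W * D / 6.
    return math.pi * length * width * depth / 6.0


def prolate_volume(length: float, short_axis: float) -> float:
    """Prolate model: assumes both short axes equal the one that was measured."""
    return ellipsoid_volume(length, short_axis, short_axis)


# Hypothetical lacuna axes in micrometres (not measurements from the study).
L, W, D = 20.0, 8.0, 3.0
true_v = ellipsoid_volume(L, W, D)
prolate_v = prolate_volume(L, D)  # the section happened to show only the thin axis
print(round(true_v, 1), round(prolate_v, 1), round(true_v / prolate_v, 2))  # 251.3 94.2 2.67
```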
Variability among Cucurbitaceae species (melon, cucumber and watermelon) in a genomic region containing a cluster of NBS-LRR genes.

PubMed

Morata, Jordi; Puigdomènech, Pere

2017-02-08

Cucurbitaceae species contain significantly fewer genes coding for proteins similar to plant resistance genes of the NBS-LRR family than other plant species of similar genome size. A large proportion of these genes are organized in clusters that appear to be hotspots of variability. The genomes of the Cucurbitaceae species measured so far are intermediate in size (between 350 and 450 Mb), and they apparently have not undergone any genome duplications besides those at the origin of the eudicots. The cluster containing the largest number of NBS-LRR genes has previously been analyzed in melon and related species and showed a high degree of interspecific and intraspecific variability. It was of interest to study whether similar behavior occurs in other clusters of the same gene family. The cluster of NBS-LRR genes located on melon chromosome 9 was analyzed and compared with the syntenic regions in other cucurbit genomes. This is the second-largest cluster in this species, and it contains nine sequences with an NBS-LRR annotation, including two genes, Fom1 and Prv, that provide resistance against Fusarium and Papaya ring-spot virus (PRSV), respectively. The variability within the melon species appears to consist essentially of single nucleotide polymorphisms. Clusters of similar genes are present in the syntenic regions of the two other sequenced Cucurbitaceae species, cucumber and watermelon. Most of the genes in the syntenic clusters can be aligned between species, and a hypothesis for the generation of the cluster is proposed. The number of genes in the watermelon cluster is similar to that in melon, while a higher number of genes (12) is present in cucumber, a species with a smaller genome than melon. Comparing genome resequencing data from 115 cucumber varieties revealed the deletion of a group of genes in a group of varieties of Indian origin.
Clusters of genes coding for NBS-LRR proteins in cucurbits appear to have specific variability in different regions of the genome and between different species. This observation supports the view that the adaptation of plant species to changing environments is based on variability that may occur at any location in the genome and that is produced by specific mechanisms of sequence variation acting on plant genomes. This information could be useful both for understanding the evolution of species and for plant breeding.
The complete mitochondrial genomes for three Toxocara species of human and animal health significance.

PubMed

Li, Ming-Wei; Lin, Rui-Qing; Song, Hui-Qun; Wu, Xiang-Yun; Zhu, Xing-Quan

2008-05-16

Studying mitochondrial (mt) genomics has important implications for various fundamental areas, including mt biochemistry, physiology and molecular biology. In addition, mt genome sequences have provided useful markers for investigating the population genetic structure, systematics and phylogenetics of organisms. Toxocara canis, Toxocara cati and Toxocara malaysiensis cause significant health problems in animals and humans. Although these species are important to human and animal health, no information on the mt genome of any Toxocara species was previously available. The complete mt genomes are 14,322 bp for T. canis, 14,029 bp for T. cati and 14,266 bp for T. malaysiensis. These circular genomes are amongst the largest reported to date for secernentean nematodes. Their relatively large sizes relate mainly to an increased length of the AT-rich region. The mt genomes of the three Toxocara species all encode 12 proteins, two ribosomal RNAs and 22 transfer RNA genes, but lack the ATP synthetase subunit 8 gene, which is consistent with all other nematode species studied to date, with the exception of Trichinella spiralis. All genes are transcribed in the same direction and have a nucleotide composition high in A and T but low in G and C. The A+T contents of the complete genomes are 68.57% for T. canis, 69.95% for T. cati and 68.86% for T. malaysiensis; the value for T. canis is the lowest among all nematodes studied to date. The AT bias had a significant effect on both the codon usage pattern and the amino acid composition of proteins. The mt genome structures of the three Toxocara species, including genes and non-coding regions, are in the same order as for Ascaris suum and Anisakis simplex, but differ from those of Ancylostoma duodenale, Necator americanus and Caenorhabditis elegans only in the location of the AT-rich region, whereas there are substantial differences when compared with Onchocerca volvulus, Dirofilaria immitis and Strongyloides stercoralis.
Phylogenetic analyses based on concatenated amino acid sequences of 12 protein-coding genes revealed that the newly described species T. malaysiensis was more closely related to T. cati than to T. canis, consistent with results of a previous study using sequences of nuclear internal transcribed spacers as genetic markers. The present study determined the complete mt genome sequences for three roundworms of human and animal health significance, which provides mtDNA evidence for the validity of T. malaysiensis and also provides a foundation for studying the systematics, population genetics and ecology of these and other nematodes of socio-economic importance.
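The A+T percentages quoted for the three Toxocara genomes are base-composition ratios over each complete mt sequence. A minimal sketch; ambiguous bases such as N are excluded from the denominator here, which is one convention among several and an assumption rather than the study's stated method.

```python
def at_content(seq: str) -> float:
    """Percent A+T among unambiguous bases (A, C, G, T) in a DNA sequence."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())  # N and other ambiguity codes are ignored
    if total == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (counts["A"] + counts["T"]) / total


print(at_content("ATTAGCATNNAT"))  # 80.0
```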
LoRTE: Detecting transposon-induced genomic variants using low coverage PacBio long read sequences.

PubMed

Disdero, Eric; Filée, Jonathan

2017-01-01

Population genomic analysis of transposable elements has greatly benefited from recent advances in sequencing technologies. However, the short size of the reads and the propensity of transposable elements to nest in highly repeated regions of genomes limit the efficiency of bioinformatic tools when Illumina or 454 technologies are used. Fortunately, long-read sequencing technologies generating reads that may span the entire length of full transposons are now available. However, existing TE population genomic software was not designed to handle long reads, and new dedicated tools are needed. LoRTE is the first tool able to use PacBio long-read sequences to identify transposon deletions and insertions between a reference genome and the genomes of different strains or populations. Tested against simulated and genuine Drosophila melanogaster PacBio datasets, LoRTE appears to be a reliable and broadly applicable tool to study the dynamics and evolutionary impact of transposable elements using low-coverage, long-read sequences. LoRTE is an efficient and accurate tool to identify structural genomic variants caused by TE insertion or deletion. LoRTE is available for download at http://www.egce.cnrs-gif.fr/?p=6422.
Lightweight genome viewer: portable software for browsing genomics data in its chromosomal context

PubMed Central

Faith, Jeremiah J; Olson, Andrew J; Gardner, Timothy S; Sachidanandam, Ravi

2007-01-01

Background: Lightweight genome viewer (lwgv) is a web-based tool for visualization of sequence annotations in their chromosomal context. It performs most of the functions of larger genome browsers, while relying on standard flat-file formats and bypassing the database needs of most visualization tools. Visualization as an aid to discovery requires display of novel data in conjunction with static annotations in their chromosomal context. With database-based systems, displaying dynamic results requires temporary tables that need to be tracked for removal. Results: lwgv simplifies the visualization of user-generated results on a local computer. The dynamic results of these analyses are written to transient files, which can import static content from a more permanent file. lwgv is currently used in many different applications, from whole genome browsers to single-gene RNAi design visualization, demonstrating its applicability in a large variety of contexts and scales. Conclusion: lwgv provides a lightweight alternative to large genome browsers for visualizing biological annotations and dynamic analyses in their chromosomal context. It is particularly suited for applications ranging from short sequences to medium-sized genomes when the creation and maintenance of a large software and database infrastructure is not necessary or desired. PMID:17877794
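lwgv works from flat files rather than a database. Its actual file formats are not described here, so the sketch below assumes a hypothetical four-column, tab-separated layout (chromosome, start, end, name) and shows the core query any such viewer needs: selecting the features that overlap the window on screen.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Feature:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def read_features(lines: Iterable[str]) -> List[Feature]:
    """Parse hypothetical tab-separated rows: chrom, start, end, name (blank rows skipped)."""
    return [Feature(r[0], int(r[1]), int(r[2]), r[3])
            for r in csv.reader(lines, delimiter="\t") if r]


def in_window(features: List[Feature], chrom: str, start: int, end: int) -> List[Feature]:
    """Features on `chrom` that overlap the half-open window [start, end)."""
    return [f for f in features if f.chrom == chrom and f.start < end and f.end > start]


flat_file = io.StringIO("chr1\t100\t500\tgeneA\nchr1\t900\t1200\tgeneB\nchr2\t0\t300\tgeneC\n")
hits = in_window(read_features(flat_file), "chr1", 400, 1000)
print([f.name for f in hits])  # ['geneA', 'geneB']
```

A linear scan is enough for short sequences; for whole genomes, sorting features by start or using an interval index would keep window queries fast.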
Natural Selection and Recombination Rate Variation Shape Nucleotide Polymorphism Across the Genomes of Three Related Populus Species

PubMed Central

Wang, Jing; Street, Nathaniel R.; Scofield, Douglas G.; Ingvarsson, Pär K.

2016-01-01

A central aim of evolutionary genomics is to identify the relative roles that various evolutionary forces have played in generating and shaping genetic variation within and among species. Here we use whole-genome resequencing data to characterize and compare genome-wide patterns of nucleotide polymorphism, site frequency spectrum, and population-scaled recombination rates in three species of Populus: Populus tremula, P. tremuloides, and P. trichocarpa. We find that P. tremuloides has the highest level of genome-wide variation, skewed allele frequencies, and population-scaled recombination rates, whereas P. trichocarpa harbors the lowest. Our findings highlight multiple lines of evidence suggesting that natural selection, due to both purifying and positive selection, has widely shaped patterns of nucleotide polymorphism at linked neutral sites in all three species. Differences in effective population sizes and rates of recombination largely explain the disparate magnitudes and signatures of linked selection that we observe among species. The present work provides the first phylogenetic comparative study on a genome-wide scale in forest trees. This information will also improve our ability to understand how various evolutionary forces have interacted to influence genome evolution among related species. PMID:26721855
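Nucleotide diversity (π), one standard summary of the genome-wide polymorphism compared above, is the mean proportion of differing sites over all pairs of sampled sequences. A minimal sketch, assuming aligned, equal-length, gap-free haplotypes; real resequencing pipelines also handle missing data and diploid genotypes, which this ignores.

```python
from itertools import combinations
from typing import List


def nucleotide_diversity(haplotypes: List[str]) -> float:
    """Mean pairwise differences per site across all pairs of aligned sequences."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two sequences")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(haplotypes, 2))
    total_diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total_diffs / (len(pairs) * length)


print(round(nucleotide_diversity(["ACGTACGTAC", "ACGTTCGTAC", "ACGTACGAAC"]), 4))  # 0.1333
```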
Natural Selection and Recombination Rate Variation Shape Nucleotide Polymorphism Across the Genomes of Three Related Populus Species.

PubMed

Wang, Jing; Street, Nathaniel R; Scofield, Douglas G; Ingvarsson, Pär K

2016-03-01

A central aim of evolutionary genomics is to identify the relative roles that various evolutionary forces have played in generating and shaping genetic variation within and among species. Here we use whole-genome resequencing data to characterize and compare genome-wide patterns of nucleotide polymorphism, site frequency spectrum, and population-scaled recombination rates in three species of Populus: Populus tremula, P. tremuloides, and P. trichocarpa. We find that P. tremuloides has the highest level of genome-wide variation, skewed allele frequencies, and population-scaled recombination rates, whereas P. trichocarpa harbors the lowest. Our findings highlight multiple lines of evidence suggesting that natural selection, due to both purifying and positive selection, has widely shaped patterns of nucleotide polymorphism at linked neutral sites in all three species. Differences in effective population sizes and rates of recombination largely explain the disparate magnitudes and signatures of linked selection that we observe among species. The present work provides the first phylogenetic comparative study on a genome-wide scale in forest trees. This information will also improve our ability to understand how various evolutionary forces have interacted to influence genome evolution among related species. Copyright © 2016 by the Genetics Society of America.
The complete chloroplast genome of Capsicum annuum var. glabriusculum using Illumina sequencing.

PubMed

Raveendar, Sebastin; Na, Young-Wang; Lee, Jung-Ro; Shim, Donghwan; Ma, Kyung-Ho; Lee, Sok-Young; Chung, Jong-Wook

2015-07-20

Chloroplast (cp) genome sequences provide a valuable source for DNA barcoding. Molecular phylogenetic studies have concentrated on DNA sequencing of conserved gene loci. However, this approach is time-consuming and more difficult to implement when gene organization differs among species. Here we report the complete re-sequencing of the cp genome of Capsicum pepper (Capsicum annuum var. glabriusculum) using the Illumina platform. The total length of the cp genome is 156,817 bp, with an overall GC content of 37.7%. A pair of inverted repeats (IRs) of 50,284 bp was separated by a small single-copy region (SSC; 18,948 bp) and a large single-copy region (LSC; 87,446 bp). The number of cp genes in C. annuum var. glabriusculum is the same as in other Capsicum species. Variations in the lengths of the LSC, SSC and IR regions were the main contributors to size variation in the cp genome of this species. A total of 125 simple sequence repeats (SSRs) and 48 insertion or deletion (indel) variants were found by sequence alignment of Capsicum cp genomes. These findings provide a foundation for further investigation of cp genome evolution in Capsicum and other higher plants.

Lightweight genome viewer: portable software for browsing genomics data in its chromosomal context.

PubMed

Faith, Jeremiah J; Olson, Andrew J; Gardner, Timothy S; Sachidanandam, Ravi

2007-09-18

Lightweight genome viewer (lwgv) is a web-based tool for visualization of sequence annotations in their chromosomal context. It performs most of the functions of larger genome browsers, while relying on standard flat-file formats and bypassing the database needs of most visualization tools. Visualization as an aid to discovery requires display of novel data in conjunction with static annotations in their chromosomal context. With database-based systems, displaying dynamic results requires temporary tables that need to be tracked for removal. lwgv simplifies the visualization of user-generated results on a local computer. The dynamic results of these analyses are written to transient files, which can import static content from a more permanent file. lwgv is currently used in many different applications, from whole genome browsers to single-gene RNAi design visualization, demonstrating its applicability in a large variety of contexts and scales. lwgv provides a lightweight alternative to large genome browsers for visualizing biological annotations and dynamic analyses in their chromosomal context. It is particularly suited for applications ranging from short sequences to medium-sized genomes when the creation and maintenance of a large software and database infrastructure is not necessary or desired.
Assembly of the Lactuca sativa, L. cv. Tizian draft genome sequence reveals differences within major resistance complex 1 as compared to the cv. Salinas reference genome.

PubMed

Verwaaijen, Bart; Wibberg, Daniel; Nelkner, Johanna; Gordin, Miriam; Rupp, Oliver; Winkler, Anika; Bremges, Andreas; Blom, Jochen; Grosch, Rita; Pühler, Alfred; Schlüter, Andreas

2018-02-10

Lettuce (Lactuca sativa, L.) is an important annual plant of the family Asteraceae (Compositae). The commercial lettuce cultivar Tizian has been used in various scientific studies investigating the interaction of the plant with phytopathogens or biological control agents. Here, we present a de novo draft genome sequence of this cultivar, together with gene prediction based on transcriptome sequence data. The assembled scaffolds total 2.22 Gb. Based on RNAseq data, 31,112 transcript isoforms were identified. Functional predictions for these transcripts were made using the GenDBE annotation platform. Comparison with the cv. Salinas reference genome revealed a high degree of sequence similarity at the genome and transcriptome levels, with an average amino acid identity of 99%. Furthermore, two large regions are either missing from or highly divergent in the cv. Tizian genome compared with cv. Salinas. One of these regions covers the major resistance complex 1 region of cv. Salinas. The cv. Tizian draft genome sequence provides a valuable resource for future functional and transcriptome analyses focused on this lettuce cultivar. Copyright © 2017 Elsevier B.V. All rights reserved.
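How an "average amino acid identity" is computed varies between pipelines, and the abstract does not specify it. The sketch below assumes pre-aligned protein pairs, counts identity only over columns where neither sequence has a gap, and averages over pairs.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


def average_identity(alignments):
    """Mean percent identity across a list of (aligned_a, aligned_b) pairs."""
    return sum(percent_identity(a, b) for a, b in alignments) / len(alignments)


# Toy alignments, not data from the study
print(average_identity([("MKTLV", "MKTLV"), ("MKT-LV", "MKTALI")]))
```

Other definitions (dividing by alignment length including gaps, or by the shorter sequence) give lower values for gapped alignments, so the denominator should be reported alongside any identity figure.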
Long Terminal Repeat Retrotransposon Content in Eight Diploid Sunflower Species Inferred from Next-Generation Sequence Data

PubMed Central

Tetreault, Hannah M.; Ungerer, Mark C.

2016-01-01

The most abundant transposable elements (TEs) in plant genomes are Class I long terminal repeat (LTR) retrotransposons represented by superfamilies gypsy and copia. Amplification of these superfamilies directly impacts genome structure and contributes to differential patterns of genome size evolution among plant lineages. Utilizing short-read Illumina data and sequence information from a panel of Helianthus annuus (sunflower) full-length gypsy and copia elements, we explore the contribution of these sequences to genome size variation among eight diploid Helianthus species and an outgroup taxon, Phoebanthus tenuifolius. We also explore transcriptional dynamics of these elements in both leaf and bud tissue via RT-PCR. We demonstrate that most LTR retrotransposon sublineages (i.e., families) display patterns of similar genomic abundance across species. A small number of LTR retrotransposon sublineages exhibit lineage-specific amplification, particularly in the genomes of species with larger estimated nuclear DNA content. RT-PCR assays reveal that some LTR retrotransposon sublineages are transcriptionally active across all species and tissue types, whereas others display species-specific and tissue-specific expression. The species with the largest estimated genome size, H. agrestis, has experienced amplification of LTR retrotransposon sublineages, some of which have proliferated independently in other lineages in the Helianthus phylogeny. PMID:27233667
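A common way to turn short-read hits against reference elements into genomic abundance is to assume that the share of reads matching a family equals that family's share of the genome. The sketch below makes that assumption explicit; the family names, read counts, and genome sizes are hypothetical, and the study's exact pipeline is not described in the abstract.

```python
def abundance_mb(hit_counts, total_reads, genome_size_mb):
    """Estimate Mb of the genome occupied by each family from read hits.

    Assumes reads sample the genome uniformly, so hits / total_reads approximates
    the fraction of the genome made up of that family.
    """
    return {fam: hits * genome_size_mb / total_reads for fam, hits in hit_counts.items()}


def fold_change(species_mb, reference_mb):
    """Ratio of each family's abundance in one species to a reference taxon."""
    return {
        fam: species_mb[fam] / reference_mb[fam]
        for fam in species_mb
        if reference_mb.get(fam)  # skip families absent from the reference
    }


# Hypothetical counts for a large-genome species and an outgroup
large = abundance_mb({"gypsy_X": 2000, "copia_Y": 500}, 100_000, 3500)
outgroup = abundance_mb({"gypsy_X": 500, "copia_Y": 500}, 100_000, 3500)
print(fold_change(large, outgroup))
```

A family whose fold change stands well above the rest would be a candidate for the lineage-specific amplification the abstract reports in species with larger genomes.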
Computing prokaryotic gene ubiquity: rescuing the core from extinction.

PubMed

Charlebois, Robert L; Doolittle, W Ford

2004-12-01

The genomic core concept has found several uses in comparative and evolutionary genomics. The core, defined as the set of all genes common to (ubiquitous among) all genomes in a phylogenetically coherent group, shrinks as the number and phylogenetic diversity of genomes in the group increase. Here, we focus on methods for defining the size and composition of the core of all genes shared by sequenced genomes of prokaryotes (Bacteria and Archaea). Few genes (almost certainly fewer than 50) are shared by all 147 genomes compared, surely too few to carry out all essential functions. Sequencing and annotation errors are responsible for the apparent absence of some genes, while very limited but genuine disappearances (from just one or a few genomes) can account for several others. Core size will continue to decrease as more genome sequences appear, unless the requirement for ubiquity is relaxed. Such relaxation seems consistent with any reasonable biological purpose for seeking a core, but it makes the core harder to define. We propose an alternative approach (the phylogenetically balanced core), which preserves some of the biological utility of the core concept. Cores, however delimited, preferentially contain informational rather than operational genes; we present a new hypothesis for why this might be so.
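The shrinking of a strict core, and the effect of relaxing ubiquity, can be shown with set operations. The "quorum core" below (families present in at least a given fraction of genomes) is a simpler relaxation chosen for illustration; it is not the authors' phylogenetically balanced core, which additionally accounts for phylogenetic sampling. Genome and gene names are toy data.

```python
from collections import Counter


def strict_core(genomes):
    """Gene families present in every genome (intersection of family sets)."""
    sets = [set(g) for g in genomes.values()]
    return set.intersection(*sets) if sets else set()


def quorum_core(genomes, min_fraction):
    """Relaxed core: families found in at least min_fraction of the genomes."""
    counts = Counter(fam for g in genomes.values() for fam in set(g))
    n = len(genomes)
    return {fam for fam, c in counts.items() if c / n >= min_fraction}


genomes = {
    "g1": {"rpoB", "rpsL", "gyrA", "ftsZ"},
    "g2": {"rpoB", "rpsL", "gyrA"},
    "g3": {"rpoB", "rpsL", "ftsZ"},
    "g4": {"rpoB", "gyrA", "ftsZ"},
}
# Strict core size as genomes are added one at a time
names = list(genomes)
for i in range(2, len(names) + 1):
    print(i, sorted(strict_core({k: genomes[k] for k in names[:i]})))
print(sorted(quorum_core(genomes, 0.75)))
```

Each genome that lacks one gene (through true loss or a missed annotation) removes that gene from the strict core, while the 75% quorum core keeps all four.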
Draft genome sequences of two closely related aflatoxigenic Aspergillus species obtained from the Ivory Coast

USDA-ARS's Scientific Manuscript database

The genomes of the A. ochraceoroseus and A. rambellii type strains were sequenced using a personal genome machine, followed by annotation of their genes. The genome size for A. ochraceoroseus was found to be approximately 23 Mb and contained 7,837 genes, while the A. rambellii genome was found to be...
Comparative phenotypic analysis of Gossypium raimondii with Upland cotton

USDA-ARS's Scientific Manuscript database

Gossypium raimondii Ulbr., a wild species with a diploid genome, has been sequenced because of its small genome size and its sequence similarity to the polyploid cultivated Gossypium species. Accessibility of the G. raimondii genome has made the species a reference used extensively in cotton genomic and...
The importance of information on relatives for the prediction of genomic breeding values and the implications for the makeup of reference data sets in livestock breeding schemes.

PubMed

Clark, Samuel A; Hickey, John M; Daetwyler, Hans D; van der Werf, Julius H J

2012-02-09

The theory of genomic selection is based on predicting the effects of genetic markers in linkage disequilibrium with quantitative trait loci. However, genomic selection also relies on relationships between individuals to predict genetic value accurately. This study examined the importance of information on relatives, compared with information on unrelated or more distantly related individuals, for the estimation of genomic breeding values. Simulated and real data were used to examine the effects of various degrees of relationship on the accuracy of genomic selection. Genomic Best Linear Unbiased Prediction (gBLUP) was compared with two pedigree-based BLUP methods, one using a shallow one-generation pedigree and the other a deep ten-generation pedigree. The accuracy of estimated breeding values was investigated for groups of selection candidates with varying degrees of relationship to a reference data set of 1750 animals. The gBLUP method predicted breeding values more accurately than BLUP. The most accurate breeding values were estimated using gBLUP for closely related animals. The pedigree-based BLUP methods were also accurate for closely related animals; however, when they were used to predict unrelated animals, accuracy was close to zero. In contrast, gBLUP achieved substantial accuracy for animals with no pedigree relationship to animals in the reference data set. An animal's relationship to the reference data set is an important factor in the accuracy of genomic predictions. Animals closely related to the reference data set had the highest accuracy from genomic predictions. However, a baseline accuracy, driven by the size of the reference data set and the effective population size, enables gBLUP to estimate breeding values for unrelated animals within a population (breed), using information previously ignored by pedigree-based BLUP methods.
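gBLUP typically replaces the pedigree relationship matrix with a genomic relationship matrix G built from marker genotypes, which is how it can relate animals that share no recorded pedigree. The abstract does not say how G was constructed; the sketch below assumes VanRaden's (2008) first method, with genotypes coded as 0/1/2 copies of the counted allele and allele frequencies estimated from the same animals.

```python
def genomic_relationship_matrix(genotypes):
    """VanRaden (2008) method-1 G matrix from 0/1/2 allele counts.

    genotypes: list of rows, one per animal, each a list of allele counts per SNP.
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    # Frequency of the counted allele at each SNP across all animals
    p = [sum(row[j] for row in genotypes) / (2 * n_animals) for j in range(n_snps)]
    # Centre each genotype on its expected value 2p
    z = [[row[j] - 2 * p[j] for j in range(n_snps)] for row in genotypes]
    # Scale so that G is on a scale comparable to the pedigree matrix A
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n_animals)]
        for i in range(n_animals)
    ]


# Three toy animals genotyped at three SNPs
for row in genomic_relationship_matrix([[0, 1, 2], [2, 1, 0], [1, 1, 1]]):
    print([round(x, 3) for x in row])
```

Off-diagonal entries can be non-zero even for animals with no pedigree link, which is the information the abstract says pedigree-based BLUP ignores.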
Genome-wide identification and evolution of the PIN-FORMED (PIN) gene family in Glycine max.

PubMed

Liu, Yuan; Wei, Haichao

2017-07-01

Soybean (Glycine max) is one of the most important crop plants. Wild and cultivated soybean varieties differ significantly in traits such as plant morphology, seed size, and seed coat development; these differences merit further investigation and may be related to auxin biology. The PIN gene family encodes transport proteins essential for cell-to-cell auxin transport, but soybean PIN genes (GmPIN genes) have received little research attention, especially regarding their evolution and their differences between wild and cultivated soybean. In this study, we retrieved 23 GmPIN genes from the latest updated G. max genome database; six GmPIN protein sequences had changed relative to the previous database. According to the Plant Genome Duplication Database, 18 GmPIN genes have been involved in segmental duplication. Three pairs of GmPIN genes arose after the second soybean genome duplication, and six after the first. The duplicated GmPIN genes retained similar expression patterns. All duplicated GmPIN genes experienced purifying selection (Ka/Ks < 1), which limited the accumulation of non-synonymous mutations and kept them more similar. We also examined artificial selection on soybean PIN genes: five artificially selected GmPIN genes were identified by comparing the genome sequences of 17 wild and 14 cultivated soybean varieties. Our research provides useful and comprehensive basic information for understanding GmPIN genes.
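The Ka/Ks criterion can be read as follows: a ratio below 1 means non-synonymous changes are fixed more slowly than synonymous ones (purifying selection), about 1 suggests neutrality, and above 1 suggests positive selection. The sketch below applies that rule to hypothetical pair names and rate values; a real analysis would also test whether the ratio differs significantly from 1.

```python
def classify_pairs(pairs):
    """Label duplicated gene pairs by their Ka/Ks ratio.

    pairs: dict mapping a pair name to (Ka, Ks).
    """
    labels = {}
    for name, (ka, ks) in pairs.items():
        if ks == 0:
            labels[name] = "undefined (Ks = 0)"  # ratio cannot be computed
            continue
        omega = ka / ks
        if omega < 1:
            labels[name] = "purifying"
        elif omega > 1:
            labels[name] = "positive"
        else:
            labels[name] = "neutral"
    return labels


# Hypothetical pair names and rates, not values from the study
print(classify_pairs({"pair1": (0.05, 0.40), "pair2": (0.30, 0.30), "pair3": (0.2, 0.0)}))
```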
Molecular phylogeny and genome size evolution of the genus Betula (Betulaceae)

PubMed Central

Wang, Nian; McAllister, Hugh A.; Bartlett, Paul R.; Buggs, Richard J. A.

2016-01-01

Background and Aims: Betula L. (birch) is a genus of approx. 60 species, subspecies or varieties with a wide distribution in the northern hemisphere, of ecological and economic importance. A new classification of Betula has recently been proposed based on morphological characters. This classification differs somewhat from previously published molecular phylogenies, which may be due to factors such as convergent evolution, hybridization, incomplete taxon sampling or misidentification of samples. While chromosome counts have been made for many species, few have had their genome size measured. The aim of this study is to produce a new phylogenetic and genome size analysis of the genus. Methods: Internal transcribed spacer (ITS) regions of nuclear ribosomal DNA were sequenced for 76 Betula samples verified by taxonomic experts, representing approx. 60 taxa, of which approx. 24 taxa have not been included in previous phylogenetic analyses. A further 49 samples from other collections were also sequenced, and 108 ITS sequences were downloaded from GenBank. Phylogenetic trees were built for these sequences. The genome sizes of 103 accessions representing nearly all described species were estimated using flow cytometry. Key Results: As expected for a gene tree of a genus where hybridization and allopolyploidy occur, the ITS tree shows clustering, but not resolved monophyly, for the morphological subgenera recently proposed. Most sections show some clustering, but species of the dwarf section Apterocaryon are unusually scattered. Betula corylifolia (subgenus Nipponobetula) unexpectedly clusters with species of subgenus Aspera. Unexpected placements are also found for B. maximowicziana, B. bomiensis, B. nigra and B. grossa. Biogeographical disjunctions were found within Betula between Europe and North America, and also disjunctions between North-east and South-west Asia.
The 2C-values for Betula ranged from 0·88 to 5·33 pg, and polyploids are scattered widely throughout the ITS phylogeny. Species with large genomes tend to have narrow ranges. Conclusions: Betula grossa may have formed via allopolyploidization between parents in subgenus Betula and subgenus Aspera. Betula bomiensis may also be a wide allopolyploid. Betula corylifolia may be a parental species of allopolyploids in the subsection Chinenses. Placements of B. maximowicziana, B. michauxii and B. nigra need further investigation. This analysis, in line with previous studies, suggests that section Apterocaryon is not monophyletic and thus dwarfism has evolved repeatedly in different lineages of Betula. Polyploidization has occurred many times independently in the evolution of Betula. PMID:27072644
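Flow-cytometry 2C values in picograms are often converted to base pairs using the widely cited factor 1 pg ≈ 978 Mbp (Doležel et al., 2003). The sketch below applies it to the reported Betula range. Note that for polyploids the resulting 1C value is the whole holoploid genome, not the monoploid (1Cx) size, so the roughly six-fold range partly reflects ploidy.

```python
MBP_PER_PG = 978  # 1 pg of DNA is about 978 Mbp (Doležel et al., 2003)


def two_c_pg_to_one_c_mbp(two_c_pg):
    """Halve a 2C nuclear DNA content and convert picograms to megabase pairs."""
    return two_c_pg / 2 * MBP_PER_PG


low, high = 0.88, 5.33  # 2C range reported for Betula (pg)
print(round(two_c_pg_to_one_c_mbp(low), 2), round(two_c_pg_to_one_c_mbp(high), 2))
print(round(high / low, 2))  # fold difference across the genus
```

Dividing the 1C value by ploidy level (when known from chromosome counts) would give the monoploid genome size needed to compare diploids with polyploids.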
Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes.

PubMed

Gao, Lei; Yi, Xuan; Yang, Yong-Xia; Su, Ying-Juan; Wang, Ting

2009-06-11

Ferns have generally been neglected in studies of chloroplast genomics. Before this study, only one polypod and two basal ferns had their complete chloroplast (cp) genomes reported. Tree ferns represent an ancient fern lineage that first occurred in the Late Triassic. In recent phylogenetic analyses, tree ferns were shown to be the sister group of polypods, the most diverse group of living ferns. The availability of a cp genome sequence from a tree fern will facilitate interpretation of the evolutionary changes in fern cp genomes. Here we have sequenced the complete cp genome of a scaly tree fern, Alsophila spinulosa (Cyatheaceae). The Alsophila cp genome is 156,661 base pairs (bp) in size and has a typical quadripartite structure, with the large (LSC, 86,308 bp) and small single-copy (SSC, 21,623 bp) regions separated by two copies of an inverted repeat (IRs, 24,365 bp each). This genome contains 117 different genes encoding 85 proteins, 4 rRNAs and 28 tRNAs. Pseudogenes of ycf66 and trnT-UGU were also detected in this genome. A unique trnR-UCG gene (derived from trnR-CCG) was found between rbcL and accD. The Alsophila cp genome shares some unusual characteristics with the previously sequenced cp genome of the polypod fern Adiantum capillus-veneris, including the absence of 5 tRNA genes that exist in most other cp genomes. The genome shows a high degree of synteny with that of Adiantum, but differs considerably from two basal ferns (Angiopteris evecta and Psilotum nudum). At one endpoint of an ancient inversion we detected a highly repeated 565-bp region that is absent from the Adiantum cp genome. An additional minor inversion of trnD-GUC, which is possibly shared by all ferns, was identified by comparing fern and other land plant cp genomes. A comparison of four fern cp genome sequences confirmed that two major rearrangements distinguish higher leptosporangiate ferns from basal fern lineages.
The Alsophila cp genome is very similar to that of the polypod fern Adiantum in terms of gene content, gene order and GC content. However, there exist some striking differences between them: the trnR-UCG gene represents a putative molecular apomorphy of tree ferns; and the repeats observed at one inversion endpoint may be a vestige of some unknown rearrangement(s). This work provided fresh insights into the fern cp genome evolution as well as useful data for future phylogenetic studies.
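The reported region lengths can be checked against the total, since a quadripartite cp genome is LSC + SSC + two IR copies: 86,308 + 21,623 + 2 × 24,365 = 156,661 bp. The sketch below does that arithmetic and lays the regions out as 1-based coordinates, assuming the conventional LSC-IRb-SSC-IRa order (the abstract gives lengths only, not the order or start position used in the annotation).

```python
def quadripartite_total(lsc, ssc, ir):
    """Total cp genome length: LSC + SSC + two copies of the inverted repeat."""
    return lsc + ssc + 2 * ir


def region_coordinates(lsc, ssc, ir):
    """1-based inclusive coordinates in the assumed LSC-IRb-SSC-IRa order."""
    regions = []
    start = 1
    for name, length in (("LSC", lsc), ("IRb", ir), ("SSC", ssc), ("IRa", ir)):
        regions.append((name, start, start + length - 1))
        start += length
    return regions


print(quadripartite_total(86_308, 21_623, 24_365))  # matches the reported size
for region in region_coordinates(86_308, 21_623, 24_365):
    print(region)
print(85 + 4 + 28)  # protein-coding + rRNA + tRNA genes
```

The same check is a quick way to catch transcription errors when region lengths are quoted from cp genome papers.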
Ecological genomics of adaptation and speciation in fungi.

PubMed

Leducq, Jean-Baptiste

2014-01-01

Fungi play a central role in both ecosystems and human societies. This is in part because they have adopted a large diversity of life history traits to conquer a wide variety of ecological niches. Here, I review recent fungal genomics studies that explored the molecular origins and the adaptive significance of this diversity. First, macro-ecological genomics studies revealed that fungal genomes were highly remodelled during their evolution. This remodelling, in terms of genome organization and size, occurred through the proliferation of non-coding elements, gene compaction, gene loss and the expansion of large families of adaptive genes. These features vary greatly among fungal clades and are correlated with different life history traits such as multicellularity, pathogenicity, symbiosis, and sexual reproduction. Second, micro-ecological genomics studies, based on population genomics, experimental evolution and quantitative trait loci approaches, have allowed a deeper exploration of the early evolutionary steps of these adaptations. Fungi, and especially budding yeasts, have been used intensively to characterize early mutations and chromosomal rearrangements that underlie the acquisition of new adaptive traits, allowing them to conquer new ecological niches and potentially leading to speciation. By uncovering the ecological factors and genomic modifications that underlie adaptation, these studies showed that Fungi are powerful models for ecological genomics (eco-genomics), and that this approach, so far mainly developed in a few model species, should be expanded to the whole kingdom.
Idiosyncratic Genome Degradation in a Bacterial Endosymbiont of Periodical Cicadas.

PubMed

Campbell, Matthew A; Łukasik, Piotr; Simon, Chris; McCutcheon, John P

2017-11-20

When a free-living bacterium transitions to a host-beneficial endosymbiotic lifestyle, it almost invariably loses a large fraction of its genome [1, 2]. The resulting small genomes often become stable in size, structure, and coding capacity [3-5], as exemplified by Sulcia muelleri, a nutritional endosymbiont of cicadas. Sulcia's partner endosymbiont, Hodgkinia cicadicola, similarly remains co-linear in some cicadas that diverged millions of years ago [6, 7]. But in the long-lived periodical cicada Magicicada tredecim, the Hodgkinia genome has split into dozens of tiny, gene-sparse circles that sometimes reside in distinct Hodgkinia cells [8]. Previous data suggested that all other Magicicada species harbor complex Hodgkinia populations, but the timing, number of origins, and outcomes of the splitting process were unknown. Here, by sequencing Hodgkinia metagenomes from the remaining six Magicicada species and two sister species, we show that each Magicicada species harbors Hodgkinia populations of at least 20 genomic circles. We find little synteny among the 256 Hodgkinia circles analyzed, except between the most closely related cicada species. Gene phylogenies show that multiple Hodgkinia lineages were present in the common ancestor of Magicicada and its closest known relatives, but that most splitting has occurred within Magicicada and has given rise to highly variable Hodgkinia gene dosages among species. These data show that Hodgkinia genome degradation has proceeded down different paths in different Magicicada species and support a model of genomic degradation that is stochastic in outcome and nonadaptive for the host. These patterns mirror the genomic instability seen in some mitochondria. Copyright © 2017 Elsevier Ltd. All rights reserved.
Efficient replication, and evolution of Sindbis virus genomes with non-canonical 3'A/U-rich elements (NC3ARE) in neonatal mice.

PubMed

James, Frederick D; Hietala, Katie A; Eldar, Dganit; Guess, Tiffany E; Cone, Cecil; Mundell, Nathan A; Mundall, Nathan; Barnett, Joey V; Raju, Ramaswamy

2007-12-01

Sindbis virus (SIN) is a mosquito-transmitted animal RNA virus. We previously reported that SIN genomes lacking the canonical 19-nt 3' conserved sequence element (3'CSE) undergo novel repair processes in BHK cells to generate a library of stable atypical SIN genomes with non-canonical 3'A/U-rich elements (NC3AREs) adjacent to the 3' poly(A) tail [1]. To determine the stability of SIN genomes with NC3AREs and the evolutionary pressure on them to regain a 3'CSE, five representative SIN isolates and a wild-type SIN were tested in newborn mice. The key findings of this study are: (a) all six SIN isolates, including those that have extensive NC3AREs in the 3'NTRs, replicate well and produce high-titer viremia in newborn mice; (b) 7-9 successive passages of these isolates in newborn mice produced comparable levels of viremia; (c) while all isolates produced only small-sized plaques during primary infection in animals, both small- and large-sized plaques were generated in all other passages; (d) polymerase stuttering occurs on select 3' oligo(U) motifs to add more U residues within the NC3AREs; (e) the S3-8 isolate with an internal UAUUU motif in the 3' poly(A) tail maintains this element even after 9 passages in animals; (f) despite differences in 3'NTRs and variable tissue distribution, all SIN isolates appear to produce similar tissue pathology in infected animals. Competition experiments with wt SIN and atypical SIN isolates in BHK cells show dominance of wt SIN. As shown for BHK cells in culture, the 3'CSE of the SIN genome is not required for virus replication and genome stability in live animals. Since the NC3AREs of atypical SIN genomes are not specific to SIN replicases, other RNA motifs in the alphavirus genome must confer specificity in template selection. These studies confirm the long-term viability of atypical SIN genomes in newborn mice and offer a basis for exploring the use of atypical SIN genomes in biotechnology.
Codon usage bias: causative factors, quantification methods and genome-wide patterns: with emphasis on insect genomes.

PubMed

Behura, Susanta K; Severson, David W

2013-02-01

Codon usage bias refers to the phenomenon where specific codons are used more often than other synonymous codons during translation of genes, the extent of which varies within and among species. Molecular evolutionary investigations suggest that codon bias is manifested as a result of balance between mutational and translational selection of such genes and that this phenomenon is widespread across species and may contribute to genome evolution in a significant manner. With the advent of whole-genome sequencing of numerous species, both prokaryotes and eukaryotes, genome-wide patterns of codon bias are emerging in different organisms. Various factors such as expression level, GC content, recombination rates, RNA stability, codon position, gene length and others (including environmental stress and population size) can influence codon usage bias within and among species. Moreover, there has been a continuous quest towards developing new concepts and tools to measure the extent of codon usage bias of genes. In this review, we outline the fundamental concepts of evolution of the genetic code, discuss various factors that may influence biased usage of synonymous codons and then outline different principles and methods of measurement of codon usage bias. Finally, we discuss selected studies performed using whole-genome sequences of different insect species to show how codon bias patterns vary within and among genomes. We conclude with generalized remarks on specific emerging aspects of codon bias studies and highlight the recent explosion of genome-sequencing efforts on arthropods (such as twelve Drosophila species, species of ants, honeybee, Nasonia and Anopheles mosquitoes as well as the recent launch of a genome-sequencing project involving 5000 insects and other arthropods) that may help us to understand better the evolution of codon bias and its biological significance. © 2012 The Authors. Biological Reviews © 2012 Cambridge Philosophical Society.
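One widely used measure from the family of methods the review surveys is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. RSCU = 1 means no bias. The sketch below uses a deliberately partial codon table (three amino acids from the standard genetic code) to stay short; a real analysis would cover all degenerate amino acids.

```python
from collections import Counter

# Partial synonymous-codon table (standard genetic code), for illustration only
SYNONYMOUS = {
    "Phe": ["TTT", "TTC"],
    "Lys": ["AAA", "AAG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}


def rscu(cds, table=SYNONYMOUS):
    """Relative synonymous codon usage for codons of amino acids in `table`."""
    cds = cds.upper()
    # Count in-frame codons, ignoring any trailing partial codon
    codons = Counter(cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))
    values = {}
    for syn in table.values():
        total = sum(codons[c] for c in syn)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(syn)  # count expected under uniform usage
        for c in syn:
            values[c] = codons[c] / expected
    return values


cds = "TTT" + "TTT" + "TTC" + "AAA" + "GGT" * 3 + "GGC"
print({c: round(v, 3) for c, v in rscu(cds).items()})
```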
Simple stochastic birth and death models of genome evolution: was there enough time for us to evolve?

PubMed

Karev, Georgy P; Wolf, Yuri I; Koonin, Eugene V

2003-10-12

The distributions of many genome-associated quantities, including the membership of paralogous gene families, can be approximated by power laws. We are interested in developing mathematical models of genome evolution that adequately account for the shape of these distributions and describe the evolutionary dynamics of their formation. We show that simple stochastic models of genome evolution lead to power-law asymptotics of the protein domain family size distribution. These models, called Birth, Death and Innovation Models (BDIM), represent a special class of balanced birth-and-death processes, in which domain duplication and deletion rates are asymptotically equal up to the second order. The simplest, linear BDIM shows an excellent fit to the observed distributions of domain family size in diverse prokaryotic and eukaryotic genomes. However, the stochastic version of the linear BDIM explored here predicts that the actual size of large paralogous families is reached on an unrealistically long timescale. We show that introducing non-linearity, which might be interpreted as interaction of a particular order between individual family members, allows the model to achieve genome evolution rates that are much more compatible with current estimates of the rates of individual duplication/loss events.
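The linear BDIM can be illustrated with a small stochastic simulation: each family of size i gains a member at rate λ(i + a), loses one at rate δ(i + b), and new size-1 families appear at rate ν. The sketch below simulates only the sequence of events (the embedded jump chain) and omits waiting times, so it shows the shape of the family-size distribution but not the timescale question the abstract raises. Parameter names follow the usual linear-BDIM notation; default values are arbitrary, and rates must stay non-negative.

```python
import random
from collections import Counter


def simulate_linear_bdim(n_events, lam=1.0, delta=1.0, a=0.0, b=0.0, nu=1.0, seed=0):
    """Jump-chain simulation of a linear birth, death and innovation model.

    Returns a Counter mapping family size -> number of extant families.
    """
    rng = random.Random(seed)
    sizes = []  # current size of every extant family
    for _ in range(n_events):
        events = [("innovation", -1)]
        weights = [nu]
        for idx, size in enumerate(sizes):
            events.append(("duplication", idx))
            weights.append(lam * (size + a))
            events.append(("deletion", idx))
            weights.append(delta * (size + b))
        # Pick the next event with probability proportional to its rate
        kind, idx = rng.choices(events, weights=weights)[0]
        if kind == "innovation":
            sizes.append(1)
        elif kind == "duplication":
            sizes[idx] += 1
        else:
            sizes[idx] -= 1
            if sizes[idx] == 0:
                sizes.pop(idx)  # family lost its last member
    return Counter(sizes)


dist = simulate_linear_bdim(2000, seed=42)
for size in sorted(dist):
    print(size, dist[size])
```

With λ = δ (a balanced model), families drift in size, so most stay small while a few grow large, giving the long-tailed size distribution the abstract describes.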
Evidence that viral RNAs have evolved for efficient, two-stage packaging.

PubMed

Borodavka, Alexander; Tuma, Roman; Stockley, Peter G

2012-09-25

Genome packaging is an essential step in virus replication and a potential drug target. Single-stranded RNA viruses have been thought to encapsidate their genomes by gradual co-assembly with capsid subunits. In contrast, using a single molecule fluorescence assay to monitor RNA conformation and virus assembly in real time, with two viruses from differing structural families, we have discovered that packaging is a two-stage process. Initially, the genomic RNAs undergo rapid and dramatic (approximately 20-30%) collapse of their solution conformations upon addition of cognate coat proteins. The collapse occurs with a substoichiometric ratio of coat protein subunits and is followed by a gradual increase in particle size, consistent with the recruitment of additional subunits to complete a growing capsid. Equivalently sized nonviral RNAs, including high copy potential in vivo competitor mRNAs, do not collapse. They do support particle assembly, however, but yield many aberrant structures in contrast to viral RNAs that make only capsids of the correct size. The collapse is specific to viral RNA fragments, implying that it depends on a series of specific RNA-protein interactions. For bacteriophage MS2, we have shown that collapse is driven by subsequent protein-protein interactions, consistent with the RNA-protein contacts occurring in defined spatial locations. Conformational collapse appears to be a distinct feature of viral RNA that has evolved to facilitate assembly. Aspects of this process mimic those seen in ribosome assembly.
Comparative Genomics of Listeria Sensu Lato: Genus-Wide Differences in Evolutionary Dynamics and the Progressive Gain of Complex, Potentially Pathogenicity-Related Traits through Lateral Gene Transfer.

PubMed

Chiara, Matteo; Caruso, Marta; D'Erchia, Anna Maria; Manzari, Caterina; Fraccalvieri, Rosa; Goffredo, Elisa; Latorre, Laura; Miccolupo, Angela; Padalino, Iolanda; Santagada, Gianfranco; Chiocco, Doriano; Pesole, Graziano; Horner, David S; Parisi, Antonio

2015-07-15

Historically, genome-wide and molecular characterization of the genus Listeria has concentrated on the important human pathogen Listeria monocytogenes and a small number of closely related species, together termed Listeria sensu stricto. More recently, a number of genome sequences for more basal, and nonpathogenic, members of the Listeria genus have become available, facilitating a wider perspective on the evolution of pathogenicity and genome-level evolutionary dynamics within the entire genus (termed Listeria sensu lato). Here, we have sequenced the genomes of additional Listeria fleischmannii and Listeria newyorkensis isolates and explored the dynamics of genome evolution in Listeria sensu lato. Our analyses suggest that acquisition of genetic material through gene duplication and divergence, as well as through lateral gene transfer (mostly from outside Listeria), is widespread throughout the genus. Novel genetic material is apparently subject to rapid turnover. Multiple lines of evidence point to significant differences in evolutionary dynamics between the most basal Listeria subclade and all other congeners, including both sensu stricto and other sensu lato isolates. Strikingly, these differences are likely attributable to stochastic, population-level processes and contribute to observed variation in genome size across the genus. Notably, our analyses indicate that the common ancestor of Listeria sensu lato lacked flagella, which were acquired by lateral gene transfer by a common ancestor of Listeria grayi and Listeria sensu stricto, whereas a recently functionally characterized pathogenicity island, responsible for the capacity to produce cobalamin and utilize ethanolamine/propane-1,2-diol, was acquired in an ancestor of Listeria sensu stricto. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Genome-wide analysis of LTR-retrotransposon diversity and its impact on the evolution of the genus Helianthus (L.).

PubMed

Mascagni, Flavia; Giordani, Tommaso; Ceccarelli, Marilena; Cavallini, Andrea; Natali, Lucia

2017-08-18

Genome divergence through mobile element activity and recombination is a continuous process that plays a key role in the evolution of species. Nevertheless, knowledge of retrotransposon-related variability among species belonging to the same genus is still limited. Considering the importance of the genus Helianthus, a model system for studying the ecological genetics of speciation and adaptation, we performed a comparative analysis of the repetitive genome fraction across ten species and one subspecies of sunflower, focusing on long terminal repeat retrotransposons at the superfamily, lineage and sublineage levels. After the relative genome size of each species was determined, genomic DNA was isolated and subjected to Illumina sequencing. Different assembly and clustering approaches were then used to explore the repetitive component of all genomes. On average, repetitive DNA represented more than 75% of the genome in Helianthus species and was composed mostly of long terminal repeat retrotransposons. The Gypsy superfamily was more prevalent than Copia and, among lineages, Chromovirus was by far the most represented. Although nearly all the same sublineages are present in all species, we found considerable variability in the abundance of diverse retrotransposon lineages and sublineages, especially between annual and perennial species. This large variability suggests that different amplification or loss events involving these elements occurred after species separation and may have been involved in species differentiation. Our data allowed us to infer the extent of interspecific repetitive DNA variation related to LTR-RE abundance, to investigate the relationship between changes in LTR-RE abundance and the evolution of the genus, and to determine the degree of coevolution of different LTR-RE lineages or sublineages between and within species.
Moreover, the data suggested that LTR-RE abundance in a species was affected by the annual or perennial habit of that species.
Genome-wide meta-analyses identify novel loci associated with n-3 and n-6 polyunsaturated fatty acid levels in Chinese and European-ancestry populations.

PubMed

Hu, Yao; Li, Huaixing; Lu, Ling; Manichaikul, Ani; Zhu, Jingwen; Chen, Yii-Der I; Sun, Liang; Liang, Shuang; Siscovick, David S; Steffen, Lyn M; Tsai, Michael Y; Rich, Stephen S; Lemaitre, Rozenn N; Lin, Xu

2016-03-15

Epidemiological studies suggest that levels of n-3 and n-6 long-chain polyunsaturated fatty acids are associated with risk of cardio-metabolic outcomes across different ethnic groups. Recent genome-wide association studies in populations of European ancestry have identified several loci associated with plasma and/or erythrocyte polyunsaturated fatty acids. To identify additional novel loci, we carried out a genome-wide association study in two population-based cohorts consisting of 3521 Chinese participants, followed by a trans-ethnic meta-analysis with meta-analysis results from 8962 participants of European ancestry. Four novel loci (MYB, AGPAT4, DGAT2 and PPT2) reached genome-wide significance in the trans-ethnic meta-analysis (log10(Bayes Factor) ≥ 6). Of them, associations of MYB and AGPAT4 with docosatetraenoic acid (log10(Bayes Factor) = 11.5 and 8.69, respectively) also reached genome-wide significance in the Chinese-specific genome-wide association analyses (P = 4.15 × 10⁻¹⁴ and 4.30 × 10⁻¹², respectively), while associations of DGAT2 with gamma-linolenic acid (log10(Bayes Factor) = 6.16) and of PPT2 with docosapentaenoic acid (log10(Bayes Factor) = 6.24) were nominally significant in both Chinese- and European-specific genome-wide association analyses (P ≤ 0.003). We also confirmed previously reported loci including FADS1, NTAN1, NRBF2, ELOVL2 and GCKR. Different effect sizes in FADS1 and independent association signals in ELOVL2 were observed. These results provide novel insight into the genetic background of polyunsaturated fatty acids and their differences between Chinese and European populations. © The Author 2016. Published by Oxford University Press. All rights reserved.
Genome, transcriptome and methylome sequencing of a primitively eusocial wasp reveal a greatly reduced DNA methylation system in a social insect.

PubMed

Standage, Daniel S; Berens, Ali J; Glastad, Karl M; Severin, Andrew J; Brendel, Volker P; Toth, Amy L

2016-04-01

Comparative genomics of social insects has been intensely pursued in recent years with the goal of providing insights into the evolution of social behaviour and its underlying genomic and epigenomic basis. However, the comparative approach has been hampered by a paucity of data on some of the most informative social forms (e.g. incipiently and primitively social) and taxa (especially members of the wasp family Vespidae) for studying social evolution. Here, we provide a draft genome of the primitively eusocial model insect Polistes dominula, accompanied by analysis of caste-related transcriptome and methylome sequence data for adult queens and workers. Polistes dominula possesses a fairly typical hymenopteran genome, but shows very low genomewide GC content and some evidence of reduced genome size. We found numerous caste-related differences in gene expression, with evidence that both conserved and novel genes are related to caste differences. Most strikingly, these -omics data reveal a major reduction in one of the key epigenetic mechanisms previously suggested to be important for caste differences in social insects: DNA methylation. Along with a conspicuous loss of a key gene associated with environmentally responsive DNA methylation (the de novo DNA methyltransferase Dnmt3), these wasps have genomewide methylation reduced to almost zero. In addition to providing a valuable resource for comparative analysis of social insect evolution, our integrative -omics data for this important behavioural and evolutionary model system call into question the general importance of DNA methylation in caste differences and evolution in social insects. © 2016 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.
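Genomewide methylation from bisulfite-style data is commonly summarised as methylated reads divided by all reads over sufficiently covered cytosines. The sketch below shows that weighted summary; the input layout, the site names and the minimum depth of 5 are assumptions, not the pipeline used in the study.

```python
def methylation_levels(site_counts, min_depth=5):
    """Per-site and genome-wide weighted methylation levels.

    site_counts: dict mapping a (hypothetical) site id ->
                 (methylated_reads, unmethylated_reads).
    min_depth:   assumed coverage cut-off; shallower sites are skipped.
    """
    per_site = {}
    total_m = total_reads = 0
    for site, (m, u) in site_counts.items():
        depth = m + u
        if depth < min_depth:
            continue  # too few reads to estimate a level reliably
        per_site[site] = m / depth
        total_m += m
        total_reads += depth
    genome_wide = total_m / total_reads if total_reads else 0.0
    return per_site, genome_wide


per_site, genome_wide = methylation_levels({"c1": (0, 20), "c2": (1, 9), "c3": (0, 3)})
```

Weighting by read depth keeps a handful of shallow, noisy sites from dominating a genome-wide figure that, as here, may be close to zero.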

Prohibition of antibiotic growth promoters has affected the genomic profiles of Lactobacillus salivarius inhabiting the swine intestine.

PubMed

Lee, Jun-Yeong; Han, Geon Goo; Lee, Ho-Bin; Lee, Sang-Mok; Kang, Sang-Kee; Jin, Gwi-Deuk; Park, Jongbin; Chae, Byung Jo; Choi, Yo Han; Kim, Eun Bae; Choi, Yun-Jaie

2017-01-01

After the introduction of a ban on the use of antibiotic growth promoters (AGPs) for livestock, the feeding environment, including the composition of animal intestinal microbiota, has changed rapidly. We hypothesized that the microbial genomes have also been affected by this legal prohibition, and investigated an important member of the swine gut microbiota, Lactobacillus salivarius, with a pan-genomic approach. Here, we isolated 21 L. salivarius strains, comprising 6 strains isolated before the AGP prohibition (SBPs) and 15 strains isolated after it (SAPs), a decade apart, and generated their draft genomes de novo. Several genomic differences between SBPs and SAPs were identified, although the number and function of antibiotic resistance genes did not differ. SBPs showed a larger genome size and a higher number of orthologs, as well as lower genetic diversity, than SAPs. SBPs had genes associated with the utilization of L-rhamnose and D-tagatose for energy production. Because these sugars are also used in exopolysaccharide (EPS) synthesis, we looked for differences in biofilm formation-associated genes. The genes for the production of EPSs and extracellular proteins differed in their amino acid sequences. Indeed, SAPs formed dense biofilms and survived better than SBPs in the swine intestinal environment. These results suggest that SAPs have evolved and adapted to protect themselves from the new selection pressure of the swine intestinal microenvironment by forming dense biofilms, adopting a distinct antibiotic resistance strategy. This finding is important for understanding the evolutionary changes in host-microbe interaction and provides detailed insight for the development of effective probiotics for livestock.
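A pan-genomic comparison usually starts from presence/absence of ortholog groups per strain, split into the core (present in every strain), the accessory set and strain-unique groups. The sketch below assumes ortholog clustering has already been done upstream; the strain names and group IDs are hypothetical.

```python
def partition_pangenome(strain_orthologs):
    """Split ortholog groups into pan, core, accessory and strain-unique sets.

    strain_orthologs: dict mapping strain name -> set of ortholog-group IDs
                      (assumed output of an upstream clustering step).
    """
    all_sets = list(strain_orthologs.values())
    pan = set().union(*all_sets)
    core = set.intersection(*all_sets) if all_sets else set()
    unique = {}
    for strain, groups in strain_orthologs.items():
        # Groups found in no other strain.
        others = set().union(*(g for s, g in strain_orthologs.items() if s != strain))
        unique[strain] = groups - others
    return {"pan": pan, "core": core, "accessory": pan - core, "unique": unique}


result = partition_pangenome({
    "SBP1": {"g1", "g2", "g3"},
    "SAP1": {"g1", "g2", "g4"},
    "SAP2": {"g1", "g4"},
})
```

Counting orthologs per strain and comparing group-level gene content is one simple way to express statements such as "SBPs had a higher number of orthologs".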
Heterogeneic dynamics of the structures of multiple gene clusters in two pathogenetically different lines originating from the same phytoplasma.

PubMed

Arashida, Ryo; Kakizawa, Shigeyuki; Hoshi, Ayaka; Ishii, Yoshiko; Jung, Hee-Young; Kagiwada, Satoshi; Yamaji, Yasuyuki; Oshima, Kenro; Namba, Shigetou

2008-04-01

Phytoplasmas are phloem-limited plant pathogens that are transmitted by insect vectors and are associated with diseases in hundreds of plant species. Despite their small sizes, phytoplasma genomes have repeat-rich sequences, which are due to several genes that are encoded as multiple copies. These multiple genes exist in a gene cluster, the potential mobile unit (PMU). PMUs are present at several distinct regions in the phytoplasma genome. The multicopy genes encoded by PMUs (herein named mobile unit genes [MUGs]) and similar genes elsewhere in the genome (herein named fundamental genes [FUGs]) are likely to have the same function based on their annotations. In this manuscript we show evidence that MUGs and FUGs do not cluster together within the same clade. Each MUG is in a cluster with a short branch length, suggesting that MUGs are recently diverged paralogs, whereas the origin of FUGs is different from that of MUGs. We also compared the genome structures around the lplA gene in two derivative lines of the 'Candidatus Phytoplasma asteris' OY strain, the severe-symptom line W (OY-W) and the mild-symptom line M (OY-M). The gene organizations of the nucleotide sequences upstream of the lplA genes of OY-W and OY-M were dramatically different. The tra5 insertion sequence, an element of PMUs, was found only in this region in OY-W. These results suggest that transposition of entire PMUs and PMU sections has occurred frequently in the OY phytoplasma genome. The difference in the pathogenicities of OY-W and OY-M might be caused by the duplication and transposition of PMUs, followed by genome rearrangement.
Unraveling the Sex Chromosome Heteromorphism of the Paradoxical Frog Pseudis tocantins

PubMed Central

Gatto, Kaleb Pretto; Busin, Carmen Silvia; Lourenço, Luciana Bolsoni

2016-01-01

The paradoxical frog Pseudis tocantins is the only species in the Hylidae family with known heteromorphic Z and W sex chromosomes. The Z chromosome is metacentric and presents an interstitial nucleolar organizer region (NOR) on the long arm that is adjacent to a pericentromeric heterochromatic band. In contrast, the submetacentric W chromosome carries a pericentromeric NOR on the long arm, adjacent to a clearly evident heterochromatic band that is larger than the band found on the Z chromosome and accounts for the size difference observed between these chromosomes. Here, we provide evidence that the non-centromeric heterochromatic bands in Zq and Wq differ not only in size and location but also in composition, based on comparative genomic hybridization (CGH) and an analysis of the anuran PcP190 satellite DNA. The finding of PcP190 sequences in P. tocantins extends the known distribution of this satellite DNA, previously detected in Leptodactylidae and Hylodidae, suggesting that this family of repetitive DNA is even older than formerly thought. Seven groups of PcP190 sequences were recognized in the genome of P. tocantins. PcP190 probes mapped to the heterochromatic band in Wq, and a Southern blot analysis indicated the accumulation of PcP190 in the female genome of P. tocantins, which suggests the involvement of this satellite DNA in the evolution of the sex chromosomes of this species. PMID:27214234
Mediterranean species of Caulerpa are polyploid with smaller genomes in the invasive ones.

PubMed

Varela-Álvarez, Elena; Gómez Garreta, Amelia; Rull Lluch, Jordi; Salvador Soler, Noemi; Serrao, Ester A; Siguán, María Antonia Ribera

2012-01-01

Caulerpa species are marine green algae, which often act as invasive species with rapid clonal proliferation when growing outside their native biogeographical borders. Despite many publications on the genetics and ecology of Caulerpa species, their life history and ploidy levels are still to be resolved and are the subject of considerable controversy. While some authors claimed that the thallus found in nature has a haplodiplobiontic life cycle with heteromorphic alternation of generations, other authors claimed a diploid or haploid life cycle with only one generation involved. DAPI-staining with image analysis and microspectrophotometry were used to estimate relative nuclear DNA contents in three species of Caulerpa from the Mediterranean, at individual, population and species levels. Results show that ploidy levels and genome size vary in these three Caulerpa species, with a reduction in genome size for the invasive ones. Caulerpa species in the Mediterranean are polyploids in different life history phases; all sampled C. taxifolia and C. racemosa var. cylindracea were in haplophasic phase, but in C. prolifera, the native species, individuals were found in both diplophasic and haplophasic phases. Different levels of endopolyploidy were found in both C. prolifera and C. racemosa var. cylindracea. Life history is elucidated for the Mediterranean C. prolifera and it is hypothesized that haplophasic dominance in C. racemosa var. cylindracea and C. taxifolia is a beneficial trait for their invasive strategies.
Genomic Analysis of Natural Variation for Seed and Plant Size in Maize (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

ScienceCinema

Kaeppler, Shawn

2018-02-01

Shawn Kaeppler from the University of Wisconsin-Madison on "Genomic Analysis of Biofuel Traits in Maize and Switchgrass" at the 7th Annual Genomics of Energy & Environment Meeting on March 21, 2012 in Walnut Creek, CA.
Detecting Recombination Hotspots from Patterns of Linkage Disequilibrium.

PubMed

Wall, Jeffrey D; Stevison, Laurie S

2016-08-09

With recent advances in DNA sequencing technologies, it has become increasingly easy to use whole-genome sequencing of unrelated individuals to assay patterns of linkage disequilibrium (LD) across the genome. One type of analysis that is commonly performed is to estimate local recombination rates and identify recombination hotspots from patterns of LD. One method for detecting recombination hotspots, LDhot, has been used in a handful of species to further our understanding of the basic biology of recombination. For the most part, the effectiveness of this method (e.g., power and false positive rate) is unknown. In this study, we run extensive simulations to compare the effectiveness of three different implementations of LDhot. We find large differences in the power and false positive rates of these different approaches, as well as a strong sensitivity to the window size used (with smaller window sizes leading to more accurate estimation of hotspot locations). We also compared our LDhot simulation results with comparable simulation results obtained from a Bayesian maximum-likelihood approach for identifying hotspots. Surprisingly, we found that the latter computationally intensive approach had substantially lower power over the parameter values considered in our simulations. Copyright © 2016 Wall and Stevison.
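Hotspot callers such as LDhot work from patterns of LD across many SNP pairs. The basic pairwise statistic underneath is r² between two biallelic loci, sketched below under the assumption of phased haplotypes coded 0/1. This is not LDhot itself, only the LD measure it builds on; the function name is hypothetical.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes.

    haplotypes: list of (allele_at_locus1, allele_at_locus2), alleles coded 0/1
                (an assumed encoding).
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n          # frequency of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n          # frequency of allele 1 at locus 2
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b                             # coefficient of linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # a monomorphic locus leaves r^2 undefined
    return d * d / denom


perfect = ld_r2([(0, 0), (1, 1), (0, 0), (1, 1)])
independent = ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)])
```

Recombination between two loci erodes D over generations, so regions where r² decays sharply with distance are the raw signal that hotspot methods model.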
Mitochondrial genome sequences and comparative genomics of Phytophthora ramorum and P. sojae

DOE Office of Scientific and Technical Information (OSTI.GOV)

Martin, Frank N.; Douda, Bensasson; Tyler, Brett M.

The complete sequences of the mitochondrial genomes of the oomycetes Phytophthora ramorum and P. sojae were determined during the course of their complete nuclear genome sequencing (Tyler, et al. 2006). Both are circular, with sizes of 39,314 bp for P. ramorum and 42,975 bp for P. sojae. Each contains a total of 37 identifiable protein-encoding genes, 25 or 26 tRNAs (P. sojae and P. ramorum, respectively) specifying 19 amino acids, and a variable number of ORFs (7 for P. ramorum and 12 for P. sojae) which are potentially additional functional genes. Non-coding regions comprise approximately 11.5 percent and 18.4 percent of the genomes of P. ramorum and P. sojae, respectively. Relative to P. sojae, there is an inverted repeat of 1,150 bp in P. ramorum that includes an unassigned unique ORF, a tRNA gene, and adjacent non-coding sequences, but otherwise the gene order in both species is identical. Comparisons of these genomes with published sequences of the P. infestans mitochondrial genome reveal a number of similarities, but the gene order in P. infestans differs in two adjacent locations due to inversions. Sequence alignments of the three genomes indicated sequence conservation ranging from 75 to 85 percent and that specific regions were more variable than others.
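Conservation figures like "75 to 85 percent" are typically percent identity over an alignment. The abstract does not say how gaps were treated, so the sketch below assumes one common convention: count only columns where neither sequence has a gap. The sequences are toy examples.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity over aligned columns where neither sequence has a gap.

    Gap handling is an assumed convention; other definitions divide by the
    full alignment length or by the shorter ungapped sequence.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


pid = percent_identity("ACGT-A", "ACGAAA")  # 4 matches over 5 ungapped columns
```

Because the denominator changes with the gap convention, identity values from different studies are only comparable when the same definition is used.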
Genomic imprinting in Drosophila has properties of both mammalian and insect imprinting.

PubMed

Anaka, Matthew; Lynn, Audra; McGinn, Patrick; Lloyd, Vett K

2009-02-01

Genomic imprinting is a process that marks DNA, causing a change in gene or chromosome behavior, depending on the sex of the transmitting parent. In mammals, most examples of genomic imprinting affect the transcription of individual or small clusters of genes whereas in insects, genomic imprinting tends to silence entire chromosomes. This has been interpreted as evidence of independent evolutionary origins for imprinting. To investigate how these types of imprinting are related, we performed a phenotypic, molecular, and cytological analysis of an imprinted chromosome in Drosophila melanogaster. Analysis of this chromosome reveals that the imprint results in transcriptional silencing. Yet, the domain of transcriptional silencing is very large, extending at least 1.2 Mb and encompassing over 100 genes, and is associated with decreased somatic polytenization of the entire chromosome. We propose that repression of somatic replication in polytenized cells, as a secondary response to the imprint, acts to extend the size of the imprinted domain to an entire chromosome. Thus, imprinting in Drosophila has properties of both typical mammalian and insect imprinting which suggests that genomic imprinting in Drosophila and mammals is not fundamentally different; imprinting is manifest as transcriptional silencing of a few genes or silencing of an entire chromosome depending on secondary processes such as differences in gene density and polytenization.
Using Partial Genomic Fosmid Libraries for Sequencing Complete Organellar Genomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.

2005-08-26

Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.
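Choosing a minimum tiling path is an interval-cover problem once each clone has been placed on the genome. The sketch below uses the standard greedy interval cover. It assumes clone coordinates are already known and that the circular plastid genome has been linearised at an arbitrary origin; both are simplifications, and this is not necessarily how the authors selected their clones. Clone names are hypothetical.

```python
def minimum_tiling_path(clones, genome_length):
    """Greedy minimum set of clones covering [0, genome_length).

    clones: list of (name, start, end), 0-based half-open coordinates on an
            assumed linearised genome. Returns clone names in order, or None
            if some region is not covered by any clone.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    covered = 0
    i = 0
    while covered < genome_length:
        best = None
        # Among clones starting at or before the covered edge, keep the one
        # that reaches furthest right; the others can never help later.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            return None  # gap in clone coverage
        path.append(best[0])
        covered = best[2]
    return path


clones = [("c1", 0, 40), ("c2", 30, 70), ("c3", 35, 60), ("c4", 65, 100), ("c5", 10, 20)]
path = minimum_tiling_path(clones, 100)
```

Picking the furthest-reaching clone at each step is provably optimal for interval cover, which keeps the number of clones that must be shotgun-sequenced as small as the available clones allow.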
Amino acid changes in disease-associated variants differ radically from variants observed in the 1000 genomes project dataset.

PubMed

de Beer, Tjaart A P; Laskowski, Roman A; Parks, Sarah L; Sipos, Botond; Goldman, Nick; Thornton, Janet M

2013-01-01

The 1000 Genomes Project data provides a natural background dataset for amino acid germline mutations in humans. Since the direction of mutation is known, the amino acid exchange matrix generated from the observed nucleotide variants is asymmetric and the mutabilities of the different amino acids are very different. These differences predominantly reflect preferences for nucleotide mutations in the DNA (especially the high mutation rate of the CpG dinucleotide, which makes arginine mutability very much higher than other amino acids) rather than selection imposed by protein structure constraints, although there is evidence for the latter as well. The variants occur predominantly on the surface of proteins (82%), with a slight preference for sites which are more exposed and less well conserved than random. Mutations to functional residues occur about half as often as expected by chance. The disease-associated amino acid variant distributions in OMIM are radically different from those expected on the basis of the 1000 Genomes dataset. The disease-associated variants preferentially occur in more conserved sites, compared to 1000 Genomes mutations. Many of the amino acid exchange profiles appear to exhibit an anti-correlation, with common exchanges in one dataset being rare in the other. Disease-associated variants exhibit more extreme differences in amino acid size and hydrophobicity. More modelling of the mutational processes at the nucleotide level is needed, but these observations should contribute to an improved prediction of the effects of specific variants in humans.
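Because the direction of each germline change is known, an exchange matrix counts (reference, alternate) pairs separately from their reverse, which is what makes it asymmetric. The sketch below builds such counts and a simple relative mutability (changes out of an amino acid divided by its occurrences). The normalisation, function names and toy data are assumptions, not the paper's exact procedure.

```python
from collections import Counter


def exchange_counts(variants):
    """Directed (ref, alt) amino acid exchange counts; (R, W) differs from (W, R)."""
    return Counter((ref, alt) for ref, alt in variants if ref != alt)


def relative_mutability(variants, background_counts):
    """Observed changes out of each amino acid divided by its occurrences.

    background_counts: mapping amino acid -> number of occurrences in the
    protein set considered (an assumed normalisation).
    """
    out_changes = Counter(ref for ref, alt in variants if ref != alt)
    return {aa: out_changes[aa] / n for aa, n in background_counts.items() if n > 0}


toy_variants = [("R", "W"), ("R", "Q"), ("R", "W"), ("L", "P")]
counts = exchange_counts(toy_variants)
mut = relative_mutability(toy_variants, {"R": 10, "L": 20, "W": 5})
```

Normalising by occurrence matters: an amino acid can dominate raw variant counts simply by being common, whereas mutability asks how often each residue actually changes.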
Exon–intron organization of genes in the slime mold Physarum polycephalum

PubMed Central

Trzcinska-Danielewicz, Joanna; Fronk, Jan

2000-01-01

The slime mold Physarum polycephalum is a morphologically simple organism with a large and complex genome. The exon–intron organization of its genes exhibits features typical for protists and fungi as well as those characteristic for the evolutionarily more advanced species. This indicates that both the taxonomic position as well as the size of the genome shape the exon–intron organization of an organism. The average gene has 3.7 introns which are on average 138 bp, with a rather narrow size distribution. Introns are enriched in AT base pairs by 13% relative to exons. The consensus sequences at exon–intron boundaries resemble those found for other species, with minor differences between short and long introns. A unique feature of P. polycephalum introns is the strong preference for pyrimidines in the coding strand throughout their length, without a particular enrichment at the 3′-ends. PMID:10982858
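The two compositional features described, AT enrichment of introns relative to exons and a pyrimidine bias on the coding strand, reduce to base counting. The sketch below computes both fractions for one strand; the toy sequences are made up, and ambiguous bases are simply ignored as an assumed convention.

```python
def base_composition(seq):
    """Return (AT fraction, pyrimidine fraction) of a DNA string on the given strand.

    Only A, C, G, T are counted; ambiguity codes such as N are ignored
    (an assumed convention).
    """
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        return 0.0, 0.0
    n = len(counted)
    at = sum(b in "AT" for b in counted) / n
    pyrimidines = sum(b in "CT" for b in counted) / n  # C and T are pyrimidines
    return at, pyrimidines


exon_at, exon_pyr = base_composition("ATCG")
intron_at, intron_pyr = base_composition("TTCCN")
```

Applied separately to concatenated exon and intron sequences, the first value gives the AT comparison and the second shows whether introns carry the coding-strand pyrimidine excess.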
SURVEY AND SUMMARY: exon-intron organization of genes in the slime mold Physarum polycephalum.

PubMed

Trzcinska-Danielewicz, J; Fronk, J

2000-09-15

The slime mold Physarum polycephalum is a morphologically simple organism with a large and complex genome. The exon-intron organization of its genes exhibits features typical for protists and fungi as well as those characteristic for the evolutionarily more advanced species. This indicates that both the taxonomic position as well as the size of the genome shape the exon-intron organization of an organism. The average gene has 3.7 introns which are on average 138 bp, with a rather narrow size distribution. Introns are enriched in AT base pairs by 13% relative to exons. The consensus sequences at exon-intron boundaries resemble those found for other species, with minor differences between short and long introns. A unique feature of P. polycephalum introns is the strong preference for pyrimidines in the coding strand throughout their length, without a particular enrichment at the 3'-ends.
The battle of the sexes over seed size: support for both kinship genomic imprinting and interlocus contest evolution.

PubMed

Willi, Yvonne

2013-06-01

Outcrossing creates a venue for parental conflict. When one sex provides parental care to offspring fertilized by several partners, the nonproviding sex is under selection to maximally exploit the caring sex. The caring sex may counteradapt, and a coevolutionary arms race ensues. Genetic models of this conflict include the kinship theory of genomic imprinting (parent-of-origin-specific expression of maternal-care effectors) and interlocus conflict evolution (interaction between male selfish signals and female abatement). Predictions were tested by measuring the sizes of seeds produced by within-population crosses (diallel design) and between-population crosses in outcrossing and selfing populations of Arabidopsis lyrata. Within-population diallel crosses revealed substantial maternal variance in seed size in most populations. The comparison of between- and within-population crosses showed that seeds were larger when pollen came from another outcrossing population than when pollen came from a selfing or the same population, supporting interlocus contest evolution between male selfish genes and female recognition genes. Evidence for kinship genomic imprinting came from complementary trait means of seed size in reciprocal between-population crosses independent of whether populations were predominantly selfing or outcrossing. Hence, both kinship genomic imprinting and interlocus contest are supported in outcrossing Arabidopsis, whereas only kinship genomic imprinting is important in selfing populations.
Global Genomic Diversity of Oryza sativa Varieties Revealed by Comparative Physical Mapping

PubMed Central

Wang, Xiaoming; Kudrna, David A.; Pan, Yonglong; Wang, Hao; Liu, Lin; Lin, Haiyan; Zhang, Jianwei; Song, Xiang; Goicoechea, Jose Luis; Wing, Rod A.; Zhang, Qifa; Luo, Meizhong

2014-01-01

Bacterial artificial chromosome (BAC) physical maps embedding a large number of BAC end sequences (BESs) were generated for Oryza sativa ssp. indica varieties Minghui 63 (MH63) and Zhenshan 97 (ZS97) and were compared with the genome sequences of O. sativa ssp. japonica cv. Nipponbare and O. sativa ssp. indica cv. 93-11. The comparisons exhibited substantial diversities in terms of large structural variations and small substitutions and indels. Genome-wide BAC-sized and contig-sized structural variations were detected, and the shared variations were analyzed. In the expansion regions of the Nipponbare reference sequence, in comparison to the MH63 and ZS97 physical maps, as well as to the previously constructed 93-11 physical map, the amounts and types of the repeat contents, and the outputs of gene ontology analysis, were significantly different from those of the whole genome. Using the physical maps of four wild Oryza species from OMAP (http://www.omap.org) as a control, we detected many conserved and divergent regions related to the evolution process of O. sativa. Between the BESs of MH63 and ZS97 and the two reference sequences, a total of 1532 polymorphic simple sequence repeats (SSRs), 71,383 SNPs, 1767 multiple nucleotide polymorphisms, 6340 insertions, and 9137 deletions were identified. This study provides independent whole-genome resources for intra- and intersubspecies comparisons and functional genomics studies in O. sativa. Both the comparative physical maps and the GBrowse, which integrated the QTL and molecular markers from GRAMENE (http://www.gramene.org) with our physical maps and analysis results, are open to the public through our Web site (http://gresource.hzau.edu.cn/resource/resource.html). PMID:24424778
A Single Transcriptome of a Green Toad (Bufo viridis) Yields Candidate Genes for Sex Determination and -Differentiation and Non-Anonymous Population Genetic Markers

PubMed Central

Gerchen, Jörn F.; Reichert, Samuel J.; Röhr, Johannes T.; Dieterich, Christoph; Kloas, Werner

2016-01-01

Large genome sizes, including immense repetitive and non-coding fractions, still present challenges for sequencing capacity, bioinformatics and thus the affordability of whole genome sequencing in most amphibians. Here, we test the performance of a single transcriptome to understand whether it can provide a cost-efficient resource for species with large unknown genomes. Using RNA from six different tissues of a single Palearctic green toad (Bufo viridis) specimen, sequenced on a HiSeq 2000, we obtained 22.5 million reads and publish >100,000 unigene sequences. To evaluate efficacy and quality, we first use these data to identify green toad specific candidate genes, known from other vertebrates for their role in sex determination and differentiation. Of a list of 37 genes, the transcriptome yielded 32 (87%), many of which provide the first such data for this non-model anuran species. However, for many of these genes, only fragments could be retrieved. To also enable population-genetic applications, we further used the transcriptome for the targeted development of 21 non-anonymous microsatellites and tested them in genetic families and backcrosses. Eleven markers were specifically developed to be located on the B. viridis sex chromosomes; for eight markers we can indeed demonstrate sex-specific transmission in genetic families. Depending on phylogenetic distance, several markers that are sex-linked in green toads show high cross-amplification success across the anuran phylogeny, spanning nine anuran families. Our data support the view that single transcriptome sequencing (based on multiple tissues) provides a reliable genomic resource and cost-efficient method for non-model amphibian species with large genome size and, despite limitations, should be considered as long as genome sequencing remains unaffordable for most species. PMID:27232626
Distinguishing noise from signal in patterns of genomic divergence in a highly polymorphic avian radiation.

PubMed

Campagna, Leonardo; Gronau, Ilan; Silveira, Luís Fábio; Siepel, Adam; Lovette, Irby J

2015-08-01

Recently diverged taxa provide the opportunity to search for the genetic basis of the phenotypes that distinguish them. Genomic scans aim to identify loci that are diverged with respect to an otherwise weakly differentiated genetic background. These loci are candidates for being past targets of selection because they behave differently from the rest of the genome that has either not yet differentiated or that may cross species barriers through introgressive hybridization. Here we use a reduced-representation genomic approach to explore divergence among six species of southern capuchino seedeaters, a group of recently radiated sympatric passerine birds in the genus Sporophila. For the first time in these taxa, we discovered a small proportion of markers that appeared differentiated among species. However, when assessing the significance of these signatures of divergence, we found that similar patterns can also be recovered from random grouping of individuals representing different species. A detailed demographic inference indicates that genetic differences among Sporophila species could be the consequence of neutral processes, which include a very large ancestral effective population size that accentuates the effects of incomplete lineage sorting. As these neutral phenomena can generate genomic scan patterns that mimic those of markers involved in speciation and phenotypic differentiation, they highlight the need for caution when ascertaining and interpreting differentiated markers between species, especially when large numbers of markers are surveyed. Our study provides new insights into the demography of the southern capuchino radiation and proposes controls to distinguish signal from noise in similar genomic scans. © 2015 John Wiley & Sons Ltd.
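The control described, checking whether "differentiated" markers also appear when individuals are grouped at random, is in essence a permutation test. The sketch below uses a crude allele-frequency difference rather than F_ST and an arbitrary threshold; the function names, data layout and toy genotypes are all assumptions.

```python
import random


def count_outliers(genotypes, labels, group_a, group_b, threshold):
    """Number of markers whose allele-frequency difference between two groups exceeds threshold.

    genotypes: per-individual lists of allele dosages (0, 1 or 2), one per marker.
    labels:    group label for each individual.
    """
    idx_a = [i for i, lab in enumerate(labels) if lab == group_a]
    idx_b = [i for i, lab in enumerate(labels) if lab == group_b]
    count = 0
    for m in range(len(genotypes[0])):
        freq_a = sum(genotypes[i][m] for i in idx_a) / (2 * len(idx_a))
        freq_b = sum(genotypes[i][m] for i in idx_b) / (2 * len(idx_b))
        if abs(freq_a - freq_b) > threshold:
            count += 1
    return count


def permutation_null(genotypes, labels, group_a, group_b, threshold, n_perm=200, seed=1):
    """Outlier counts after shuffling labels (group sizes are preserved)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(count_outliers(genotypes, shuffled, group_a, group_b, threshold))
    return null


labels = ["A"] * 4 + ["B"] * 4
genotypes = [[0, 1]] * 4 + [[2, 1]] * 4
observed = count_outliers(genotypes, labels, "A", "B", 0.5)
null = permutation_null(genotypes, labels, "A", "B", 0.5)
```

If the observed count of outliers is not clearly larger than typical values in the null distribution, the apparent divergence may be noise, for example from incomplete lineage sorting.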
Construction of a Llama Bacterial Artificial Chromosome Library with Approximately 9-Fold Genome Equivalent Coverage

PubMed Central

Airmet, K. W.; Hinckley, J. D.; Tree, L. T.; Moss, M.; Blumell, S.; Ulicny, K.; Gustafson, A. K.; Weed, M.; Theodosis, R.; Lehnardt, M.; Genho, J.; Stevens, M. R.; Kooyman, D. L.

2012-01-01

The llama is an important agricultural livestock in much of South America. The llama is increasing in popularity in the United States as a companion animal. Little work has been done to improve llama production using modern technology. A paucity of information is available regarding the llama genome. We report the construction of a llama bacterial artificial chromosome (BAC) library of about 196,224 clones in the vector pECBAC1. Using flow cytometry and bovine, human, mouse, and chicken as controls, we determined the llama genome size to be 2.4 × 10^9 bp. The average insert size of the library is 137.8 kb corresponding to approximately 9-fold genome coverage. Further studies are needed to characterize the library and llama genome. We anticipate that this new library will help facilitate future genomic studies in the llama. PMID:22811594
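Library depth is usually interpreted with the Clarke and Carbon relations: the number of clones N needed so that a given locus is present with probability P is N = ln(1 − P) / ln(1 − f), where f is insert size divided by genome size, and at c-fold coverage the probability is approximately 1 − e^(−c). The sketch applies these to the insert and genome sizes reported above. The formulas assume randomly and uniformly distributed inserts, an idealisation real libraries only approximate.

```python
import math


def clones_needed(prob, insert_bp, genome_bp):
    """Clarke-Carbon: clones required for a locus to be present with probability prob."""
    f = insert_bp / genome_bp  # fraction of the genome in one insert
    return math.log(1.0 - prob) / math.log(1.0 - f)


def prob_locus_present(fold_coverage):
    """Approximate probability a locus is in at least one clone: 1 - e^(-c)."""
    return 1.0 - math.exp(-fold_coverage)


n_99 = clones_needed(0.99, 137_800, 2.4e9)  # roughly 80,000 clones
p_9x = prob_locus_present(9)                 # about 0.9999
```

Under these assumptions a 9-fold library should contain any given single-copy locus with very high probability, which is why fold coverage is the usual headline figure for BAC libraries.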
GenomeFingerprinter: the genome fingerprint and the universal genome fingerprint analysis for systematic comparative genomics.

PubMed

Ai, Yuncan; Ai, Hannan; Meng, Fanmei; Zhao, Lei

2013-01-01

No attention has been paid to comparing a set of genome sequences across genetic components and biological categories with far divergence over a large size range. We define this as systematic comparative genomics and aim to develop its methodology. First, we create a method, GenomeFingerprinter, to unambiguously produce a set of three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections, to illustrate the genome fingerprint of a given genome sequence. Second, we develop a set of concepts and tools, and thereby establish a method called the universal genome fingerprint analysis (UGFA). In particular, we define the total genetic component configuration (TGCC) (including chromosome, plasmid, and phage) for describing a strain as a systematic unit, the universal genome fingerprint map (UGFM) of TGCC for differentiating strains as a universal system, and the systematic comparative genomics (SCG) for comparing a set of genomes across genetic components and biological categories. Third, we construct a method of quantitative analysis to compare two genomes by using the outcome dataset of genome fingerprint analysis. Specifically, we define the geometric center and its geometric mean for a given genome fingerprint map, followed by the Euclidean distance, the differentiate rate, and the weighted differentiate rate to quantitatively describe the difference between the two genomes being compared. Moreover, we demonstrate the applications through case studies on various genome sequences, giving tremendous insights into the critical issues in microbial genomics and taxonomy. We have created a method, GenomeFingerprinter, for rapidly computing, geometrically visualizing, and intuitively comparing a set of genomes at the genome fingerprint level, and hence established a method called the universal genome fingerprint analysis, as well as developed a method of quantitative analysis of the outcome dataset. Together, these set up the methodology of systematic comparative genomics based on the genome fingerprint analysis.
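The abstract does not say how GenomeFingerprinter maps a sequence to three-dimensional coordinates. As a stand-in, the sketch below uses the Z-curve, a known cumulative 3-D representation of DNA, and then computes the geometric center of each curve and the Euclidean distance between centers, as the abstract describes for its own maps. The Z-curve substitution is an explicit assumption, and the "differentiate rate" measures are not reproduced.

```python
import math


def z_curve(seq):
    """Cumulative Z-curve coordinates (stand-in mapping, not GenomeFingerprinter's)."""
    a = c = g = t = 0
    points = []
    for base in seq.upper():
        if base == "A":
            a += 1
        elif base == "C":
            c += 1
        elif base == "G":
            g += 1
        elif base == "T":
            t += 1
        else:
            continue  # ignore ambiguous bases
        # x: purine vs pyrimidine, y: amino vs keto, z: weak vs strong H-bonding
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)))
    return points


def geometric_center(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def center_distance(seq1, seq2):
    """Euclidean distance between the geometric centers of two sequences' curves."""
    return math.dist(geometric_center(z_curve(seq1)), geometric_center(z_curve(seq2)))


d = center_distance("AAAA", "TTTT")
```

Collapsing a whole curve to its center is a very coarse summary; it makes genomes of very different sizes comparable at the cost of discarding local structure.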
D-GENIES: dot plot large genomes in an interactive, efficient and simple way.

PubMed

Cabanettes, Floréal; Klopp, Christophe

2018-01-01

Dot plots are widely used to quickly compare sequence sets. They provide a synthetic similarity overview, highlighting repetitions, breaks and inversions. Different tools have been developed to generate genomic alignment dot plots easily, but they are often limited in the size of the input sequences they can handle. D-GENIES is a standalone and web application performing large genome alignments using the minimap2 software package and generating interactive dot plots. It enables users to sort query sequences along the reference, zoom in the plot and download several image, alignment or sequence files. D-GENIES is an easy-to-install, open-source software package (GPL) developed in Python and JavaScript. The source code is available at https://github.com/genotoul-bioinfo/dgenies and it can be tested at http://dgenies.toulouse.inra.fr/.
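To show what a dot plot encodes, the sketch below marks every exact shared k-mer between a query and a reference on both strands; reverse-strand dots are what reveal inversions. This is a naive illustration, not what D-GENIES does: D-GENIES relies on minimap2 alignments precisely because exhaustive k-mer matching scales poorly to large genomes. The word size of 11 and the function names are assumptions.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string (non-ACGT characters kept as-is)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmer_dots(query, reference, k=11):
    """(query_pos, ref_pos, strand) for every exact k-mer match on either strand."""
    query, reference = query.upper(), reference.upper()
    index = {}
    for j in range(len(reference) - k + 1):
        index.setdefault(reference[j:j + k], []).append(j)
    dots = []
    for i in range(len(query) - k + 1):
        word = query[i:i + k]
        for j in index.get(word, []):
            dots.append((i, j, "+"))  # same orientation: diagonal runs
        for j in index.get(revcomp(word), []):
            dots.append((i, j, "-"))  # opposite orientation: anti-diagonal runs
    return dots


forward = kmer_dots("ACGTTT", "ACGTTT", k=3)
inverted = kmer_dots(revcomp("GATTACA"), "GATTACA", k=4)
```

Plotting the first two coordinates of each tuple, coloured by strand, gives the familiar diagonal and anti-diagonal segments.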
Background Adjusted Alignment-Free Dissimilarity Measures Improve the Detection of Horizontal Gene Transfer.

PubMed

Tang, Kujin; Lu, Yang Young; Sun, Fengzhu

2018-01-01

Horizontal gene transfer (HGT) plays an important role in the evolution of microbial organisms, including bacteria. Alignment-free methods based on the compositional information of a single genome have been used to detect HGT. Currently, Manhattan and Euclidean distances based on tetranucleotide frequencies are the most commonly used alignment-free dissimilarity measures for detecting HGT. By testing on simulated bacterial sequences and real data sets with known horizontally transferred genomic regions, we found that more advanced alignment-free dissimilarity measures, such as CVTree and [Formula: see text], which take into account the background Markov sequences, can solve HGT detection problems with significantly improved performance. We also studied how factors such as the evolutionary distance between host and donor sequences, the size of the sliding window, and host genome composition influence the performance of alignment-free methods for detecting HGT. Our study showed that alignment-free methods can predict HGT accurately when host and donor genomes belong to different taxonomic orders. Among all methods, CVTree with word length 3, [Formula: see text] with word length 3 and Markov order 1, and [Formula: see text] with word length 4 and Markov order 1 outperform the others in terms of their highest F1-score and their robustness to these factors.
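The compositional measures named above can be sketched in Python as follows, assuming uppercase DNA and overlapping word counts. The [Formula: see text] measures cannot be recovered from this abstract and are not reproduced; CVTree is shown in its commonly described form, which subtracts a (k-2)-order Markov expectation from each k-word frequency and converts a cosine correlation C into the distance (1 - C)/2. In an HGT scan these vectors would typically be computed per sliding window and compared against the whole host genome; the windowing is left out here.

```python
import math
from itertools import product
from typing import Dict

def kmer_freqs(seq: str, k: int) -> Dict[str, float]:
    """Relative frequencies of all 4**k DNA words, counted in overlapping windows."""
    counts = {"".join(w): 0 for w in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows containing N or other symbols are skipped
            counts[word] += 1
            total += 1
    return {w: (c / total if total else 0.0) for w, c in counts.items()}

def manhattan(p: Dict[str, float], q: Dict[str, float]) -> float:
    return sum(abs(p[w] - q[w]) for w in p)

def euclidean(p: Dict[str, float], q: Dict[str, float]) -> float:
    return math.sqrt(sum((p[w] - q[w]) ** 2 for w in p))

def cvtree_vector(seq: str, k: int = 3) -> Dict[str, float]:
    """CVTree-style vector: (f - f0) / f0, with f0 a (k-2)-order Markov expectation."""
    if k < 3:
        raise ValueError("CVTree needs k >= 3")
    fk, fk1, fk2 = kmer_freqs(seq, k), kmer_freqs(seq, k - 1), kmer_freqs(seq, k - 2)
    vec = {}
    for w, f in fk.items():
        middle = fk2[w[1:-1]]
        f0 = fk1[w[:-1]] * fk1[w[1:]] / middle if middle else 0.0
        vec[w] = (f - f0) / f0 if f0 else 0.0
    return vec

def cvtree_distance(a: str, b: str, k: int = 3) -> float:
    """(1 - C) / 2, where C is the cosine correlation of the two CVTree vectors."""
    va, vb = cvtree_vector(a, k), cvtree_vector(b, k)
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(x * x for x in va.values()))
    nb = math.sqrt(sum(x * x for x in vb.values()))
    c = dot / (na * nb) if na and nb else 0.0
    return (1.0 - c) / 2.0
```

For tetranucleotide-based Manhattan or Euclidean distances, call `kmer_freqs(seq, 4)` on each sequence and pass the two dictionaries to `manhattan` or `euclidean`.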

Are there laws of genome evolution?

PubMed

Koonin, Eugene V

2011-08-01

Research in quantitative evolutionary genomics and systems biology led to the discovery of several universal regularities connecting genomic and molecular phenomic variables. These universals include the log-normal distribution of the evolutionary rates of orthologous genes; the power law-like distributions of paralogous family size and node degree in various biological networks; the negative correlation between a gene's sequence evolution rate and expression level; and differential scaling of functional classes of genes with genome size. The universals of genome evolution can be accounted for by simple mathematical models similar to those used in statistical physics, such as the birth-death-innovation model. These models do not explicitly incorporate selection; therefore, the observed universal regularities do not appear to be shaped by selection but rather are emergent properties of gene ensembles. Although a complete physical theory of evolutionary biology is inconceivable, the universals of genome evolution might qualify as "laws of evolutionary genomics" in the same sense "law" is understood in modern physics.
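The birth-death-innovation model mentioned above can be illustrated with a toy simulation. This Python sketch follows only the jump chain of the process (which event happens next, not when), and the rates `dup`, `loss`, and `innov` are arbitrary assumptions rather than values from the article; as in the models described, nothing in it represents selection.

```python
import random
from collections import Counter
from typing import List

def simulate_bdim(steps: int, dup: float = 1.0, loss: float = 1.05,
                  innov: float = 0.5, seed: int = 0) -> List[int]:
    """Toy birth-death-innovation simulation; returns surviving family sizes.

    Per event: a new one-gene family appears with weight `innov`; otherwise a
    gene chosen uniformly at random is duplicated (weight `dup`) or lost
    (weight `loss`). No selection term is modelled.
    """
    rng = random.Random(seed)
    families: List[int] = []
    for _ in range(steps):
        n_genes = sum(families)
        total = innov + n_genes * (dup + loss)
        if rng.random() * total < innov:
            families.append(1)  # innovation: a new single-gene family
            continue
        # Choosing a family with probability proportional to its size is the
        # same as choosing one gene uniformly at random.
        i = rng.choices(range(len(families)), weights=families)[0]
        if rng.random() * (dup + loss) < dup:
            families[i] += 1  # duplication (birth)
        else:
            families[i] -= 1  # deletion (death)
            if families[i] == 0:
                families.pop(i)
    return families

size_distribution = Counter(simulate_bdim(5000, seed=1))
print(sorted(size_distribution.items()))  # (family size, number of families)
```

The tallied family-size distribution can be inspected for a heavy tail; how closely it resembles a power law depends on the chosen rates.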
Complete genome sequence of Rhizobium leguminosarum bv. trifolii strain WSM1325, an effective microsymbiont of annual Mediterranean clovers.

PubMed Central

Reeve, Wayne; O’Hara, Graham; Chain, Patrick; Ardley, Julie; Bräu, Lambert; Nandesena, Kemanthi; Tiwari, Ravi; Copeland, Alex; Nolan, Matt; Han, Cliff; Brettin, Thomas; Land, Miriam; Ovchinikova, Galina; Ivanova, Natalia; Mavromatis, Konstantinos; Markowitz, Victor; Kyrpides, Nikos; Melino, Vanessa; Denton, Matthew; Yates, Ron; Howieson, John

2010-01-01

Rhizobium leguminosarum bv. trifolii is a soil-inhabiting bacterium that can act as an effective nitrogen-fixing microsymbiont of a diverse range of annual Trifolium (clover) species. Strain WSM1325 is an aerobic, motile, non-spore-forming, Gram-negative rod isolated from root nodules collected in 1993 on the Greek island of Serifos. WSM1325 is produced commercially in Australia as an inoculant for a broad range of annual clovers of Mediterranean origin because of its superior saprophytic competence, nitrogen fixation and acid tolerance. Here we describe the basic features of this organism, together with its complete genome sequence and annotation. This is the first completed genome sequence for a microsymbiont of annual clovers. The genome is 7,418,122 bp in size and encodes 7,232 protein-coding genes and 61 RNA-only genes. This multipartite genome contains 6 distinct replicons: a chromosome of 4,767,043 bp and 5 plasmids of 828,924 bp, 660,973 bp, 516,088 bp, 350,312 bp and 294,782 bp. PMID:21304718
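As a quick arithmetic check, the six replicon sizes reported above sum exactly to the stated genome size (a short Python sketch; the variable names are illustrative):

```python
# Replicon sizes (bp) as reported for R. leguminosarum bv. trifolii WSM1325
chromosome = 4_767_043
plasmids = [828_924, 660_973, 516_088, 350_312, 294_782]

total = chromosome + sum(plasmids)
print(total)  # 7418122
print(f"{chromosome / total:.1%} of the genome is chromosomal")  # 64.3%
```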
Comparative genomics of Lactobacillus

PubMed Central

Kant, Ravi; Blom, Jochen; Palva, Airi; Siezen, Roland J.; de Vos, Willem M.

2011-01-01

Summary The genus Lactobacillus includes a diverse group of bacteria consisting of many species that are associated with fermentations of plants, meat or milk. In addition, various lactobacilli are natural inhabitants of the intestinal tract of humans and other animals. Finally, several Lactobacillus strains are marketed as probiotics because their consumption can confer a health benefit on the host. Presently, 154 Lactobacillus species are known, and a growing fraction of these are subject to draft genome sequencing. However, complete genome sequences are needed to provide a platform for detailed genomic comparisons. Therefore, we selected a total of 20 genomes of various Lactobacillus strains for which complete genomic sequences have been reported. These genomes ranged in size from 1.8 to 3.3 Mb and differed in other characteristic features, such as G+C content, which ranged from 33% to 51%. The Lactobacillus pan genome was found to consist of approximately 14 000 protein‐encoding genes, while all 20 genomes shared a total of 383 sets of orthologous genes that defined the Lactobacillus core genome (LCG). Based on an advanced phylogenetic analysis of the proteins encoded by this LCG, we grouped the 20 strains into three main groups and defined core group genes present in all genomes of a single group, signature group genes shared by all genomes of one group but absent from all other Lactobacillus genomes, and group‐specific ORFans present in the core group genes of one group and absent from all other complete genomes. The latter are of specific value in defining the different groups of genomes. The study provides a platform for present individual comparisons as well as future analyses of new Lactobacillus genomes. PMID:21375712
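The core, pan, and signature-gene definitions above are set operations over orthologous groups. A minimal Python sketch, assuming each genome has already been reduced to a set of orthologous-group identifiers (the identifiers, genomes, and grouping below are hypothetical; the abstract does not say how orthologs were called):

```python
from typing import Dict, List, Set, Tuple

def core_and_pan(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Core: groups present in every genome. Pan: groups present in any genome."""
    sets = list(genomes.values())
    if not sets:
        return set(), set()
    return set.intersection(*sets), set.union(*sets)

def signature_genes(genomes: Dict[str, Set[str]],
                    groups: Dict[str, List[str]], target: str) -> Set[str]:
    """Groups present in every genome of `target` and absent from all other genomes."""
    inside = [genomes[g] for g in groups[target]]
    outside = [genomes[g] for name, members in groups.items()
               if name != target for g in members]
    return set.intersection(*inside) - set().union(*outside)

# Hypothetical orthologous-group content of three genomes in two groups
genomes = {"A": {"og1", "og2", "og3"}, "B": {"og1", "og2", "og4"}, "C": {"og1", "og5"}}
groups = {"G1": ["A", "B"], "G2": ["C"]}
core, pan = core_and_pan(genomes)
print(sorted(core), sorted(pan))
print(signature_genes(genomes, groups, "G1"))  # {'og2'}
```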
Horizontal gene acquisitions contributed to genome expansion in insect-symbiotic Spiroplasma clarkii.

PubMed

Tsai, Yi-Ming; Chang, An; Kuo, Chih-Horng

2018-06-01

Genome reduction is a recurring theme of symbiont evolution. The genus Spiroplasma contains species that are mostly facultative insect symbionts. The typical genome sizes of species within the Apis clade were estimated to be ∼1.0-1.4 Mb. Intriguingly, Spiroplasma clarkii was found to have a genome size > 30% larger than the median of other species within the same clade. To investigate the molecular evolutionary events that led to the genome expansion of this bacterium, we determined its complete genome sequence and inferred the evolutionary origin of each protein-coding gene based on the phylogenetic distribution of homologs. Among the 1,346 annotated protein-coding genes, 641 originated from within the Apis clade, while 233 were putatively acquired from outside the clade (including 91 high-confidence candidates). Additionally, 472 were specific to S. clarkii, with no homologs in the current database (i.e., their origins remained unknown). The acquisition of protein-coding genes, rather than mobile genetic elements, appeared to be a major contributing factor to genome expansion. Notably, >50% of the high-confidence acquired genes are related to carbohydrate transport and metabolism, suggesting that these acquired genes contributed to the expansion of both genome size and metabolic capability. The findings of this work provide an interesting exception to the general evolutionary trend observed among symbiotic bacteria and further demonstrate the flexibility of Spiroplasma genomes. Future studies investigating the functional integration of these acquired genes, and inferring their contribution to fitness, could improve our knowledge of symbiont evolution.
[Intraspecific chromosomal variability in human pathogenic fungi, especially in Histoplasma capsulatum].

PubMed

Romero-Martínez, Rafael; Canteros, Cristina; Taylor, Maria Lucia

2004-12-01

The ploidy, karyotype, and chromosome length polymorphism (CLP) of human pathogenic fungi were reviewed, with emphasis on Histoplasma capsulatum, the causative agent of the systemic mycosis histoplasmosis. Currently, different gel electrophoresis systems are used to determine fungal electrokaryotypes (EK). Using renaturation kinetics and genomic reconstruction in H. capsulatum strains G-186AS and Downs, genome sizes of 23 and 32 Mb, respectively, were estimated. A haploid state was proposed for both strains, although aneuploidy was suggested for the Downs strain. Contour-clamped homogeneous electric field (CHEF), field inversion gel electrophoresis (FIGE), and Southern blot using different probes showed the presence of six to seven chromosomes in the Downs strain (low virulence), whereas four chromosomes were identified in the G-186B strain (high virulence). The use of these methods on the three major H. capsulatum reference strains (G-217B and Downs from the United States of America, G-186B from Panama) revealed distinct chromosome sizes, from 0.5 to 5.7 Mb, with CLP associated with chromosome size and mobility. Recently, CHEF analysis of 19 H. capsulatum isolates from Latin America and the G-186B strain revealed five to seven chromosomes with molecular sizes of 1.1 to 11.2 Mb, again suggesting CLP in H. capsulatum. However, more studies are needed to elucidate EK polymorphism in H. capsulatum and its relationship with isolate phenotype, and to understand the mechanisms controlling ploidy variability.
Genome-Wide Mapping of Loci Explaining Variance in Scrotal Circumference in Nellore Cattle

PubMed Central

Utsunomiya, Yuri T.; Carmo, Adriana S.; Neves, Haroldo H. R.; Carvalheiro, Roberto; Matos, Márcia C.; Zavarez, Ludmilla B.; Ito, Pier K. R. K.; Pérez O'Brien, Ana M.; Sölkner, Johann; Porto-Neto, Laercio R.; Schenkel, Flávio S.; McEwan, John; Cole, John B.; da Silva, Marcos V. G. B.; Van Tassell, Curtis P.; Sonstegard, Tad S.; Garcia, José Fernando

2014-01-01

The reproductive performance of bulls has a high impact on the beef cattle industry. Scrotal circumference (SC) is the most recorded reproductive trait in beef herds, and is used as a major selection criterion to improve precocity and fertility. The characterization of genomic regions affecting SC can contribute to the identification of diagnostic markers for reproductive performance and uncover molecular mechanisms underlying complex aspects of bovine reproductive biology. In this paper, we report a genome-wide scan for chromosome segments explaining differences in SC, using data from 861 Nellore bulls (Bos indicus) genotyped for over 777,000 single nucleotide polymorphisms. Loci that stand out from the genomic background were identified on chromosomes 4, 6, 7, 10, 14, 18 and 21. The majority of these regions were previously found to be associated with reproductive and body size traits in cattle. The signal on chromosome 14 replicates the pleiotropic quantitative trait locus encompassing PLAG1 that affects male fertility in cattle and stature in several species. Based on intensive literature mining, SP4, MAGEL2, SH3RF2, PDE5A and SNAI2 are proposed as novel candidate genes for SC, as they affect growth and testicular size in other animal models. These findings contribute to linking reproductive phenotypes to gene functions, and may offer new insights into the molecular biology of male fertility. PMID:24558400
Chromosome reshuffling in birds of prey: the karyotype of the world's largest eagle (Harpy eagle, Harpia harpyja) compared to that of the chicken (Gallus gallus).

PubMed

de Oliveira, Edivaldo H C; Habermann, Felix A; Lacerda, Oneida; Sbalqueiro, Ives J; Wienberg, Johannes; Müller, Stefan

2005-11-01

Like various other diurnal birds of prey, the world's largest eagle, the Harpy eagle (Harpia harpyja), has an atypical bird karyotype with 2n=58 chromosomes. Little is known about the dramatic genomic reorganization in these species compared with other birds. Recently, the chicken has provided a "default map" for various birds, including the first genomic DNA sequence of a bird species. Evidently, the gross division of the chicken genome into relatively gene-poor macrochromosomes and predominantly gene-rich microchromosomes has been conserved for more than 150 million years in most bird species. Here, we present classical features of the Harpy eagle karyotype as well as chromosomal homologies between H. harpyja and the chicken, established by chromosome painting and comparison with the chicken genome map. We used two different sets of painting probes: (1) chicken chromosomes divided into three size categories: (a) macrochromosomes 1-5 and Z, (b) medium-sized chromosomes 6-10, and (c) 19 microchromosomes; and (2) combinatorially labeled chicken chromosome paints 1-6 and Z. Both probe sets were visualized on H. harpyja chromosomes by multicolor fluorescence in situ hybridization (FISH). Our data show how the organization into micro- and macrochromosomes has been lost in the Harpy eagle, seemingly without any preference or constraints.
Extreme variability among mammalian V1R gene families.

PubMed

Young, Janet M; Massa, Hillary F; Hsu, Li; Trask, Barbara J

2010-01-01

We report an evolutionary analysis of the V1R gene family across 37 mammalian genomes. V1Rs comprise one of three chemosensory receptor families expressed in the vomeronasal organ, and contribute to pheromone detection. We first demonstrate that Trace Archive data can be used effectively to determine V1R family sizes and to obtain sequences of most V1R family members. Analyses of V1R sequences from trace data and genome assemblies show that species-specific expansions previously observed in only eight species were prevalent throughout mammalian evolution, resulting in "semi-private" V1R repertoires for most mammals. The largest families are found in mouse and platypus, whose V1R repertoires have been published previously, followed by mouse lemur and rabbit (approximately 215 and approximately 160 intact V1Rs, respectively). In contrast, two bat species and dolphin possess no functional V1Rs, only pseudogenes, and suffered inactivating mutations in the vomeronasal signal transduction gene Trpc2. We show that primate V1R decline happened prior to acquisition of trichromatic vision, earlier during evolution than was previously thought. We also show that it is extremely unlikely that decline of the dog V1R repertoire occurred in response to selective pressures imposed by humans during domestication. Functional repertoire sizes in each species correlate roughly with anatomical observations of vomeronasal organ size and quality; however, no single ecological correlate explains the very diverse fates of this gene family in different mammalian genomes. V1Rs provide one of the most extreme examples observed to date of massive gene duplication in some genomes, with loss of all functional genes in other species.
The mitochondrial genome of the arbuscular mycorrhizal fungus Gigaspora margarita reveals two unsuspected trans-splicing events of group I introns.

PubMed

Pelin, Adrian; Pombert, Jean-François; Salvioli, Alessandra; Bonen, Linda; Bonfante, Paola; Corradi, Nicolas

2012-05-01

• Arbuscular mycorrhizal fungi (AMF) are ubiquitous organisms that benefit ecosystems through the establishment of an association with the roots of most plants: the mycorrhizal symbiosis. Despite their ecological importance, however, these fungi have been poorly studied at the genome level.
• In this study, total DNA from the AMF Gigaspora margarita was subjected to a combination of 454 and Illumina sequencing, and the resulting reads were used to assemble its mitochondrial genome de novo. This genome was annotated and compared with those of other relatives to better comprehend the evolution of the AMF lineage.
• The mitochondrial genome of G. margarita is unique in many ways, exhibiting a large size (97 kbp) and elevated GC content (45%). This genome also harbors molecular events that were previously unknown to occur in fungal mitochondrial genomes, including trans-splicing of group I introns from two different genes coding for the first subunit of the cytochrome oxidase and for the small subunit of the rRNA.
• This study reports the second published genome from an AMF organelle, resulting in relevant DNA sequence information from this poorly studied fungal group, and providing new insights into the frequency, origin and evolution of trans-spliced group I introns found across the mitochondrial genomes of distantly related organisms.
© 2012 The Authors. New Phytologist © 2012 New Phytologist Trust.
Genome-Wide Mapping of Copy Number Variation in Humans: Comparative Analysis of High Resolution Array Platforms

PubMed Central

Haraksingh, Rajini R.; Abyzov, Alexej; Gerstein, Mark; Urban, Alexander E.; Snyder, Michael

2011-01-01

Accurate and efficient genome-wide detection of copy number variants (CNVs) is essential for understanding human genomic variation, genome-wide CNV association studies, cytogenetics research and diagnostics, and independent validation of CNVs identified from sequencing-based technologies. Numerous array-based platforms for CNV detection exist, using array Comparative Genome Hybridization (aCGH), Single Nucleotide Polymorphism (SNP) genotyping, or both. We have quantitatively assessed the abilities of twelve leading genome-wide CNV detection platforms to accurately detect Gold Standard sets of CNVs in the genome of HapMap CEU sample NA12878, and found significant differences in performance. The technologies analyzed were the NimbleGen 4.2 M, 2.1 M and 3×720 K Whole Genome and CNV focused arrays, the Agilent 1×1 M CGH and High Resolution and 2×400 K CNV and SNP+CGH arrays, the Illumina Human Omni1Quad array and the Affymetrix SNP 6.0 array. The Gold Standards used were a 1000 Genomes Project sequencing-based set of 3997 validated CNVs and an ultra high-resolution aCGH-based set of 756 validated CNVs. We found that sensitivity, total number, size range and breakpoint resolution of CNV calls were highest for CNV focused arrays. Our results are important for cost-effective CNV detection and validation for both basic and clinical applications. PMID:22140474
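Benchmarking against a Gold Standard requires a rule for when a call matches a known CNV. The abstract does not state the rule used, so this Python sketch assumes a common convention, 50% reciprocal overlap, and computes sensitivity as the fraction of gold-standard CNVs matched by at least one call. Coordinates are half-open and the example intervals are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open

def reciprocal_overlap(a: Interval, b: Interval, min_frac: float = 0.5) -> bool:
    """True if the overlap covers at least `min_frac` of both intervals."""
    if a[0] != b[0]:
        return False
    ov = min(a[2], b[2]) - max(a[1], b[1])
    if ov <= 0:
        return False
    return ov >= min_frac * (a[2] - a[1]) and ov >= min_frac * (b[2] - b[1])

def sensitivity(calls: List[Interval], gold: List[Interval], min_frac: float = 0.5) -> float:
    """Fraction of gold-standard CNVs matched by at least one call."""
    if not gold:
        return 0.0
    found = sum(any(reciprocal_overlap(g, c, min_frac) for c in calls) for g in gold)
    return found / len(gold)

gold = [("chr1", 100, 200), ("chr1", 1000, 1100), ("chr2", 50, 150)]
calls = [("chr1", 120, 210), ("chr2", 500, 600)]
print(sensitivity(calls, gold))  # one of three gold CNVs is matched
```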
The American cranberry mitochondrial genome reveals the presence of selenocysteine (tRNA-Sec and SECIS) insertion machinery in land plants

USDA-ARS's Scientific Manuscript database

The American cranberry (Vaccinium macrocarpon Ait.) mitochondrial genome was assembled and reconstructed from whole genome 454 Roche GS-FLX and Illumina shotgun sequences. Compared with other Asterids, the reconstruction of the genome revealed an average-sized mitochondrial genome (459,678 nt) with comparat...
Rhipicephalus microplus strain Deutsch, whole genome shotgun sequencing project Version 2

USDA-ARS's Scientific Manuscript database

The cattle tick, Rhipicephalus (Boophilus) microplus, has a genome over 2.4 times the size of the human genome, and with over 70% of repetitive DNA, this genome would prove very costly to sequence at today's prices and difficult to assemble and analyze. Cot filtration/selection techniques were used ...
Draft Genome Sequence of a Rare Smut Relative, Tilletiaria anomala UBC 951

DOE PAGES

Toome, Merje; Kuo, Alan; Henrissat, Bernard; ...

2014-06-12

We present the draft genome sequence of the smut fungus Tilletiaria anomala UBC 951 (Basidiomycota, Ustilaginomycotina). The sequenced genome size is 18.7 Mb, consisting of 289 scaffolds and a total of 6,810 predicted genes. This is the first genome sequence published for a fungus in the order Georgefisheriales (Exobasidiomycetes).
Heritabilities and genetic correlations in the same traits across different strata of herds created according to continuous genomic, genetic, and phenotypic descriptors.

PubMed

Yin, Tong; König, Sven

2018-03-01

The most common approach in dairy cattle to detect genotype-by-environment interactions is to apply a multiple-trait model that treats the same trait in different environments as different traits. We extended this concept by defining continuous phenotypic, genetic, and genomic herd descriptors and applying random regression sire models. Traits of interest were test-day milk yield, fat percentage, protein percentage, and somatic cell score, based on 267,393 records from 32,707 first-lactation Holstein cows. Cows were born from 2010 to 2013 and kept in 52 large-scale herds in 2 federal states of north-east Germany. The average number of genotyped cows per herd (45,613 single nucleotide polymorphism markers per cow) was 133.5 (range: 45 to 415 genotyped cows). Genomic herd descriptors were (1) the level of linkage disequilibrium (r²) within specific chromosome segments, and (2) the average allele frequency of single nucleotide polymorphisms close to a functional mutation. Genetic herd descriptors were (1) the intra-herd inbreeding coefficient, and (2) the percentage of daughters from foreign sires. Phenotypic herd descriptors were (1) herd size, and (2) the herd mean for nonreturn rate. Most correlations among herd descriptors were close to 0, indicating independence of genomic, genetic, and phenotypic characteristics. Heritabilities for milk yield increased with increasing intra-herd linkage disequilibrium, inbreeding, and herd size. Genetic correlations for the same trait between adjacent levels of herd descriptors were close to 1 but declined for descriptor levels farther apart. Declines in genetic correlation were more pronounced for somatic cell score than for test-day traits with larger heritabilities (fat percentage and protein percentage). For milk yield, too, changes in herd descriptor levels had a clear effect on heritabilities and genetic correlations.
Broadly, multiple-trait model results (based on discrete herd classes created from the descriptors) confirmed the random regression estimates. The observed changes in breeding values depending on herd descriptors suggest using specific sires for specific herd structures, offering new possibilities to improve sire selection strategies. Regarding genomic selection designs and the transfer of genetic gain into commercial herds, herds used to build cow training sets should reflect the genomic, genetic, and phenotypic pattern of the broad population. Copyright © 2018 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
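The genomic herd descriptor based on linkage disequilibrium uses r². A minimal Python sketch of the standard pairwise r² between two biallelic SNPs, assuming phased haplotypes coded 0/1 (unphased genotypes would first need phasing or an EM estimate), plus a hypothetical segment-level average; how the authors aggregated r² within segments and herds is not specified here.

```python
from itertools import combinations
from typing import List, Tuple

def ld_r2(haplotypes: List[Tuple[int, int]]) -> float:
    """r^2 = D^2 / (pA(1-pA) pB(1-pB)) for two biallelic SNPs on phased haplotypes."""
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom else 0.0  # monomorphic SNP: r^2 undefined, report 0

def mean_pairwise_r2(haps: List[List[int]]) -> float:
    """Hypothetical segment summary: average r^2 over all SNP pairs.

    haps[i][j] is the allele (0/1) of SNP j on haplotype i.
    """
    m = len(haps[0])
    if m < 2:
        raise ValueError("need at least two SNPs")
    vals = [ld_r2([(h[i], h[j]) for h in haps]) for i, j in combinations(range(m), 2)]
    return sum(vals) / len(vals)
```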
Analyses of mitochondrial amino acid sequence datasets support the proposal that specimens of Hypodontus macropi from three species of macropodid hosts represent distinct species

PubMed Central

2013-01-01

Background Hypodontus macropi is a common intestinal nematode of a range of kangaroos and wallabies (macropodid marsupials). Based on previous multilocus enzyme electrophoresis (MEE) and nuclear ribosomal DNA sequence data sets, H. macropi has been proposed to be a complex of species. To test this proposal using independent molecular data, we sequenced the whole mitochondrial (mt) genomes of individuals of H. macropi from three different species of hosts (Macropus robustus robustus, Thylogale billardierii and Macropus [Wallabia] bicolor) as well as that of Macropicola ocydromi (a related nematode), and undertook a comparative analysis of the amino acid sequence datasets derived from these genomes. Results The mt genomes sequenced by next-generation (454) technology from H. macropi from the three host species varied from 13,634 bp to 13,699 bp in size. Pairwise comparisons of the amino acid sequences predicted from these three mt genomes revealed differences of 5.8% to 18%. Phylogenetic analysis of the amino acid sequence data sets using Bayesian Inference (BI) showed that H. macropi from the three different host species formed distinct, well-supported clades. In addition, sliding window analysis of the mt genomes defined variable regions for future population genetic studies of H. macropi in different macropodid hosts and geographical regions around Australia. Conclusions The present analyses of inferred mt protein sequence datasets clearly supported the hypothesis that H. macropi from M. robustus robustus, M. bicolor and T. billardierii represent distinct species. PMID:24261823
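The pairwise percentage differences and the sliding-window analysis above can be illustrated with uncorrected p-distances. This Python sketch assumes the sequences are already aligned; the window and step sizes are arbitrary, and the abstract does not say which statistic the authors computed per window.

```python
from typing import List, Tuple

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gapped columns skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 0.0
    return sum(x != y for x, y in sites) / len(sites)

def sliding_p_distance(a: str, b: str, window: int, step: int) -> List[Tuple[int, float]]:
    """(window start, p-distance) for windows of `window` columns moved by `step`."""
    return [(start, p_distance(a[start:start + window], b[start:start + window]))
            for start in range(0, len(a) - window + 1, step)]

print(p_distance("MKLV", "MKIV"))  # 0.25, i.e. a 25% amino acid difference
print(sliding_p_distance("AAAAAAAA", "AAAATTTT", window=4, step=4))  # [(0, 0.0), (4, 1.0)]
```

Windows with elevated p-distance mark the more variable regions that would be candidates for population genetic markers.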
Chromosome number reduction in the sister clade of Carica papaya with concomitant genome size doubling.

PubMed

Rockinger, Alexander; Sousa, Aretuza; Carvalho, Fernanda A; Renner, Susanne S

2016-06-01

Caricaceae include six genera and 34 species, among them papaya, a model species in plant sex chromosome research. The family was held to have a conserved karyotype with 2n = 18 chromosomes, an assumption based on few counts. We examined the karyotypes and genome sizes of species from all genera to test for possible cytogenetic variation. We used fluorescence in situ hybridization with standard telomere, 5S, and 45S rDNA probes. New and published data were combined with a phylogeny, molecular clock dating, and C values (available for ∼50% of the species) to reconstruct genome evolution. The African genus Cylicomorpha, which is sister to the remaining Caricaceae (all neotropical), has 2n = 18, as do the species in two other genera. A Mexican clade of five species that includes papaya, however, has 2n = 18 (papaya), 2n = 16 (Horovitzia cnidoscoloides), and 2n = 14 (Jarilla caudata and J. heterophylla; third Jarilla not counted), with the phylogeny indicating that the dysploidy events occurred ∼16.6 and ∼5.5 million years ago and that Jarilla underwent genome size doubling (∼450 to 830-920 Mbp/haploid genome). Pericentromeric interstitial telomere repeats occur in both Jarilla adjacent to 5S rDNA sites, and the variability of 5S rDNA sites across all genera is high. On the basis of outgroup comparison, 2n = 18 is the ancestral number, and repeated chromosomal fusions with simultaneous genome size increase, as a result of repetitive elements accumulating near centromeres, characterize the papaya clade. These results have implications for ongoing genome assemblies in Caricaceae. © 2016 Botanical Society of America.
Genomic prediction in contrast to a genome-wide association study in explaining heritable variation of complex growth traits in breeding populations of Eucalyptus.

PubMed

Müller, Bárbara S F; Neves, Leandro G; de Almeida Filho, Janeo E; Resende, Márcio F R; Muñoz, Patricio R; Dos Santos, Paulo E T; Filho, Estefano Paludzyszyn; Kirst, Matias; Grattapaglia, Dario

2017-07-11

The advent of high-throughput genotyping technologies coupled to genomic prediction methods established a new paradigm to integrate genomics and breeding. We carried out whole-genome prediction and contrasted it to a genome-wide association study (GWAS) for growth traits in breeding populations of Eucalyptus benthamii (n =505) and Eucalyptus pellita (n =732). Both species are of increasing commercial interest for the development of germplasm adapted to environmental stresses. Predictive ability reached 0.16 in E. benthamii and 0.44 in E. pellita for diameter growth. Predictive abilities using either Genomic BLUP or different Bayesian methods were similar, suggesting that growth adequately fits the infinitesimal model. Genomic prediction models using ~5000-10,000 SNPs provided predictive abilities equivalent to using all 13,787 and 19,506 SNPs genotyped in the E. benthamii and E. pellita populations, respectively. No difference was detected in predictive ability when different sets of SNPs were utilized, based on position (equidistantly genome-wide, inside genes, linkage disequilibrium pruned or on single chromosomes), as long as the total number of SNPs used was above ~5000. Predictive abilities obtained by removing relatedness between training and validation sets fell near zero for E. benthamii and were halved for E. pellita. These results corroborate the current view that relatedness is the main driver of genomic prediction, although some short-range historical linkage disequilibrium (LD) was likely captured for E. pellita. A GWAS identified only one significant association for volume growth in E. pellita, illustrating the fact that while genome-wide regression is able to account for large proportions of the heritability, very little or none of it is captured into significant associations using GWAS in breeding populations of the size evaluated in this study. 
This study provides further experimental data supporting positive prospects of using genome-wide data to capture large proportions of trait heritability and predict growth traits in trees with accuracies equal to or better than those attainable by phenotypic selection. Additionally, our results document the superiority of the whole-genome regression approach in accounting for large proportions of the heritability of complex traits such as growth, in contrast to the limited value of the local GWAS approach for breeding applications in forest trees.
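Predictive ability in genomic prediction studies of this kind is typically reported as the Pearson correlation between predicted genomic values and observed phenotypes in a validation set. The Python sketch below computes only that correlation; fitting the GBLUP or Bayesian models themselves is out of scope, and the input values are hypothetical.

```python
import math
from typing import Sequence

def predictive_ability(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson correlation between predicted genomic values and observed phenotypes."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mp = sum(predicted) / n
    mo = sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    return cov / (sp * so)

# Hypothetical validation-set values (e.g. diameter growth)
print(predictive_ability([0.1, -0.3, 0.4, 0.0], [12.5, 10.9, 13.8, 12.0]))
```

Because relatedness between training and validation sets strongly affects this correlation, how the validation set is drawn matters as much as the model choice.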
The Mitochondrial Genome of Chara vulgaris: Insights into the Mitochondrial DNA Architecture of the Last Common Ancestor of Green Algae and Land Plants

PubMed Central

Turmel, Monique; Otis, Christian; Lemieux, Claude

2003-01-01

Mitochondrial DNA (mtDNA) has undergone radical changes during the evolution of green plants, yet little is known about the dynamics of mtDNA evolution in this phylum. Land plant mtDNAs differ from the few green algal mtDNAs that have been analyzed to date by their expanded size, long spacers, and diversity of introns. We have determined the mtDNA sequence of Chara vulgaris (Charophyceae), a green alga belonging to the charophycean order (Charales) that is thought to be the most closely related alga to land plants. This 67,737-bp mtDNA sequence, displaying 68 conserved genes and 27 introns, was compared with those of three angiosperms, the bryophyte Marchantia polymorpha, the charophycean alga Chaetosphaeridium globosum (Coleochaetales), and the green alga Mesostigma viride. Despite important differences in size and intron composition, Chara mtDNA strikingly resembles Marchantia mtDNA; for instance, all except 9 of 68 conserved genes lie within blocks of colinear sequences. Overall, our genome comparisons and phylogenetic analyses provide unequivocal support for a sister-group relationship between the Charales and the land plants. Only four introns in land plant mtDNAs appear to have been inherited vertically from a charalean algal ancestor. We infer that the common ancestor of green algae and land plants harbored a tightly packed, gene-rich, and relatively intron-poor mitochondrial genome. The group II introns in this ancestral genome appear to have spread to new mtDNA sites during the evolution of bryophytes and charalean green algae, accounting for part of the intron diversity found in Chara and land plant mitochondria. PMID:12897260
Single nucleotide polymorphism and haplotype effects associated with somatic cell score in German Holstein cattle

PubMed Central

2014-01-01

Background To better understand the genetic determination of udder health, we performed a genome-wide association study (GWAS) on a population of 2354 German Holstein bulls for which daughter yield deviations (DYD) for somatic cell score (SCS) were available. For this study, we used genetic information of 44 576 informative single nucleotide polymorphisms (SNPs) and 11 725 inferred haplotype blocks. Results When accounting for the sub-structure of the analyzed population, 16 SNPs and 10 haplotypes in six genomic regions were significant at the Bonferroni threshold of P ≤ 1.14 × 10⁻⁶. The size of the identified regions ranged from 0.05 to 5.62 Mb. Genomic regions on chromosomes 5, 6, 18 and 19 coincided with known QTL affecting SCS, while additional genomic regions were found on chromosomes 13 and X. Of particular interest is the region on chromosome 6 between 85 and 88 Mb, where QTL for mastitis traits and significant SNPs for SCS in different Holstein populations coincide with our results. In all identified regions, except for the region on chromosome X, significant SNPs were present in significant haplotypes. The minor alleles of identified SNPs on chromosomes 18 and 19, and the major alleles of SNPs on chromosomes 6 and X were favorable for a lower SCS. Differences in somatic cell count (SCC) between alternative SNP alleles reached 14 000 cells/mL. Conclusions The results support the polygenic nature of the genetic determination of SCS, confirm the importance of previously reported QTL, and provide evidence for the segregation of additional QTL for SCS in Holstein cattle. The small size of the regions identified here will facilitate the search for causal genetic variations that affect gene functions. PMID:24898131
PSSRdb: a relational database of polymorphic simple sequence repeats extracted from prokaryotic genomes.

PubMed

Kumar, Pankaj; Chaitanya, Pasumarthy S; Nagarajaram, Hampapathalu A

2011-01-01

PSSRdb (Polymorphic Simple Sequence Repeats database) (http://www.cdfd.org.in/PSSRdb/) is a relational database of polymorphic simple sequence repeats (PSSRs) extracted from 85 different species of prokaryotes. Simple sequence repeats (SSRs) are tandem repeats of nucleotide motifs 1-6 bp in size and are highly polymorphic. SSR mutations in and around coding regions affect the transcription and translation of genes. Such changes underpin the phase variation and antigenic variation seen in some bacteria. Although SSR-mediated phase variation and antigenic variation have been well studied in some bacteria, many other prokaryotic species have yet to be investigated for SSR-mediated adaptive and other evolutionary advantages. As part of our ongoing studies on SSR polymorphism in prokaryotes, we compared the genome sequences of various strains and isolates available for 85 different species of prokaryotes, extracted a number of SSRs showing length variation, and created a relational database called PSSRdb. This database provides information such as the location of PSSRs in genomes, their length variation across genomes, and the regions harboring them. The information provided in this database should be useful for further research on and analysis of SSRs in prokaryotes.
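As a rough illustration of what "tandem repeats of 1-6 bp motifs" means computationally, the Python sketch below scans a single sequence for perfect SSRs using regular-expression backreferences. The function names and thresholds (for example `min_copies=3`) are illustrative assumptions, not PSSRdb's actual extraction criteria; finding polymorphic SSRs would additionally require comparing the copy numbers reported at the same locus across strains.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit ('AC' yes, 'ACAC' no)."""
    return motif not in (motif + motif)[1:-1]


def find_perfect_ssrs(seq, min_unit=1, max_unit=6, min_copies=3):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats in seq.

    Assumed criteria (illustrative only): motif length min_unit..max_unit,
    at least min_copies adjacent exact copies, primitive motifs only.
    """
    seq = seq.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # Group 2 captures one motif copy; \2{n,} demands further identical copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(2)
            if not is_primitive(motif):
                continue  # e.g. an 'AA' run is already reported as mononucleotide 'A'
            copies = len(match.group(1)) // unit
            hits.append((match.start(), motif, copies))
    return sorted(hits)


# Usage (toy sequence): an (AC)4 dinucleotide repeat starting at position 2
print(find_perfect_ssrs("GGACACACACTT"))  # [(2, 'AC', 4)]
```

Filtering non-primitive motifs keeps a single homopolymer run from being reported again as a di-, tri-, or longer "repeat".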

Effector profiles distinguish formae speciales of Fusarium oxysporum.

PubMed

van Dam, Peter; Fokkens, Like; Schmidt, Sarah M; Linmans, Jasper H J; Kistler, H Corby; Ma, Li-Jun; Rep, Martijn

2016-11-01

Formae speciales (ff.spp.) of the fungus Fusarium oxysporum are often polyphyletic within the species complex, making it impossible to identify them on the basis of conserved genes. However, sequences that determine host-specific pathogenicity may be expected to be similar between strains within the same forma specialis. Whole genome sequencing was performed on strains from five different ff.spp. (cucumerinum, niveum, melonis, radicis-cucumerinum and lycopersici). In each genome, genes for putative effectors were identified based on small size, secretion signal, and vicinity to a "miniature impala" transposable element. The candidate effector genes of all genomes were collected and the presence/absence patterns in each individual genome were clustered. Members of the same forma specialis turned out to group together, with cucurbit-infecting strains forming a supercluster separate from other ff.spp. Moreover, strains from different clonal lineages within the same forma specialis harbour identical effector gene sequences, supporting horizontal transfer of genetic material. These data offer new insight into the genetic basis of host specificity in the F. oxysporum species complex and show that (putative) effectors can be used to predict host specificity in F. oxysporum. © 2016 Society for Applied Microbiology and John Wiley & Sons Ltd.
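Clustering presence/absence patterns, as described above, starts from a similarity measure between the effector repertoires of different genomes. The abstract does not say which measure or clustering algorithm was used; the sketch below assumes Jaccard similarity over sets of candidate effector IDs, and the strain and effector names are hypothetical. The resulting pairwise values (or 1 minus them, as distances) could then be passed to any hierarchical clustering routine.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets: |A ∩ B| / |A ∪ B| (defined as 1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_similarity(profiles):
    """profiles maps genome name -> set of candidate effector IDs present in that genome."""
    return {
        (g1, g2): jaccard(profiles[g1], profiles[g2])
        for g1, g2 in combinations(sorted(profiles), 2)
    }


# Hypothetical presence/absence profiles for three strains
profiles = {
    "strain_A": {"eff1", "eff2", "eff3"},
    "strain_B": {"eff1", "eff2"},
    "strain_C": {"eff4"},
}
for pair, sim in pairwise_similarity(profiles).items():
    print(pair, round(sim, 2))
```

Strains sharing most of their effector set (here strain_A and strain_B) score close to 1 and would group together, which is the kind of pattern the study reports for members of the same forma specialis.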
Breaking Lander-Waterman’s Coverage Bound

PubMed Central

Nashta-ali, Damoun; Motahari, Seyed Abolfazl; Hosseinkhalaj, Babak

2016-01-01

Lander-Waterman’s coverage bound establishes the total number of reads required to cover the whole genome of size G bases. In fact, their bound is a direct consequence of the well-known solution to the coupon collector’s problem, which proves that for such a genome the total number of bases to be sequenced should be O(G ln G). Although the result leads to a tight bound, it rests on the tacit assumption that the set of reads is first collected through a sequencing process and then processed through a computation process, i.e., that there are two different machines: one for sequencing and one for processing. In this paper, we present a significant improvement over Lander-Waterman’s result and prove that by combining the sequencing and computing processes, one can re-sequence the whole genome with as few as O(G) sequenced bases in total. Our approach also dramatically reduces the required computational power for the combined process. Simulations were performed on real genomes with different sequencing error rates. The results support our theory, which predicts a log G improvement in the coverage bound and a corresponding reduction in the total number of bases required to be sequenced. PMID:27806058
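The coupon-collector argument behind the O(G ln G) bound can be checked numerically. In the simplified model assumed here (not the authors' exact formulation), each sampled base lands on one of G positions uniformly at random, and we ask how many draws are expected before every position has been seen at least once. The classical answer is G·H_G, where H_G = 1 + 1/2 + ... + 1/G is the G-th harmonic number, so the expectation grows like G(ln G + γ), with γ the Euler-Mascheroni constant.

```python
import math

EULER_GAMMA = 0.5772156649015329


def expected_draws_exact(n):
    """Expected uniform draws (with replacement) until all n positions are seen: n * H_n."""
    return n * sum(1.0 / k for k in range(1, n + 1))


def expected_draws_asymptotic(n):
    """Leading-order approximation n * (ln n + gamma), i.e. Theta(n log n)."""
    return n * (math.log(n) + EULER_GAMMA)


for g in (10, 1_000, 100_000):
    exact = expected_draws_exact(g)
    # exact / g is the average number of times each position must be sampled
    print(g, round(exact), round(exact / g, 2), round(expected_draws_asymptotic(g)))
```

The per-position oversampling factor (third column) keeps growing like ln G, which is the logarithmic factor that the paper claims can be removed by interleaving sequencing with computation.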
Genomic basis of the differences between cider and dessert apple varieties

PubMed Central

Leforestier, Diane; Ravon, Elisa; Muranty, Hélène; Cornille, Amandine; Lemaire, Christophe; Giraud, Tatiana; Durel, Charles-Eric; Branca, Antoine

2015-01-01

Unraveling the genomic processes at play during variety diversification is of fundamental interest for understanding evolution, but also of applied interest in crop science. It can indeed provide knowledge on the genetic bases of traits for crop improvement and germplasm diversity management. Apple is one of the most important fruit crops in temperate regions, having both great economic and cultural values. Sweet dessert apples are used for direct consumption, while bitter cider apples are used to produce cider. Several important traits are known to differentiate the two variety types, in particular fruit size, biennial versus annual fruit bearing, and bitterness, caused by a higher polyphenol content. Here, we used an Illumina 8k SNP chip on two core collections, of 48 dessert and 48 cider apples, respectively, to identify genomic regions responsible for the differences between cider and dessert apples. The genome-wide level of genetic differentiation between cider and dessert apples was low, although 17 candidate regions showed signatures of divergent selection, displaying either outlier FST values or significant association with phenotypic traits (bitter versus sweet fruits). These candidate regions encompassed 420 genes involved in a variety of functions and metabolic pathways, including several colocalizations with QTLs for polyphenol compounds. PMID:26240603
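Outlier FST values are one of the two signals used above to flag candidate regions. The abstract does not specify the estimator; as a hedged illustration, the sketch below computes a Nei-style FST = (HT − HS) / HT for a single biallelic SNP from the frequency of the same allele in two groups. It assumes equal group weights and applies no sample-size correction, unlike estimators such as Weir and Cockerham's.

```python
def fst_biallelic(p1, p2):
    """FST = (HT - HS) / HT for one biallelic SNP in two equally weighted groups.

    p1, p2: frequency of the same allele in group 1 (e.g. dessert) and group 2 (e.g. cider).
    No sample-size correction is applied.
    """
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-group heterozygosity
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled groups
    if h_t == 0:
        return 0.0  # monomorphic across both groups: no differentiation
    return (h_t - h_s) / h_t


print(fst_biallelic(0.5, 0.5))  # 0.0: identical frequencies
print(fst_biallelic(1.0, 0.0))  # 1.0: fixed for different alleles
```

Computed for every SNP on the chip, values in the extreme upper tail of the genome-wide distribution would be the "outliers" of interest.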
Genome Wide Search for Biomarkers to Diagnose Yersinia Infections.

PubMed

Kalia, Vipin Chandra; Kumar, Prasun

2015-12-01

Bacterial identification on the basis of the highly conserved 16S rRNA (rrs) gene is limited by its presence in multiple copies and the very high level of similarity among them. Other genes with unique characteristics are therefore needed as biomarkers. Fifty-one sequenced genomes belonging to 10 different Yersinia species were searched for genes common to all the genomes. Out of 304 common genes, 34 genes ranging in size from 0.11 to 4.42 kb were selected and subjected to in silico digestion with 10 different restriction endonucleases (RE) (4-6 base cutters). Yersinia species have 6-7 copies of rrs per genome, which are difficult to distinguish by multiple sequence alignment or by their RE digestion patterns. However, certain unique combinations of other common gene sequences (carB, fadJ, gluM, gltX, ileS, malE, nusA, ribD, and rlmL) and their RE digestion patterns can be used as markers for identifying 21 strains belonging to 10 Yersinia species: Y. aldovae, Y. enterocolitica, Y. frederiksenii, Y. intermedia, Y. kristensenii, Y. pestis, Y. pseudotuberculosis, Y. rohdei, Y. ruckeri, and Y. similis. This approach can be applied in rapid diagnostics.
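The in silico digestion step amounts to locating each enzyme's recognition site and converting the cut positions into fragment lengths. The sketch below assumes a linear DNA sequence, a palindromic recognition site (so scanning one strand suffices), and a single top-strand cut position. EcoRI (G^AATTC) is used only as a familiar example, and the input is a toy string rather than one of the Yersinia genes.

```python
def digest(seq, site, cut_offset):
    """Return fragment lengths from cutting a linear sequence at every occurrence of site.

    cut_offset: top-strand cut position within the site, e.g. 1 for EcoRI (G^AATTC).
    Assumes a palindromic site, so only the given strand is scanned.
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)  # also catches overlapping occurrences
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


ECORI = ("GAATTC", 1)
print(digest("AAGAATTCAAGAATTCAA", *ECORI))  # [3, 8, 7]
```

Running such a function for each selected gene and each enzyme yields fragment-length patterns that can then be compared across strains to look for distinguishing combinations.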
Fragmented mitochondrial genomes in two suborders of parasitic lice of eutherian mammals (Anoplura and Rhynchophthirina, Insecta)

PubMed Central

Shao, Renfu; Barker, Stephen C; Li, Hu; Song, Simon; Poudel, Shreekanta; Su, Yuan

2015-01-01

Parasitic lice (order Phthiraptera) infest birds and mammals. The typical animal mitochondrial (mt) genome organization, which consists of a single chromosome with 37 genes, was found in chewing lice in the suborders Amblycera and Ischnocera. The sucking lice (suborder Anoplura) studied to date, however, have fragmented mt genomes with 9–20 minichromosomes. We sequenced the mt genome of the elephant louse, Haematomyzus elephantis, the first species of chewing lice investigated from the suborder Rhynchophthirina. We identified 33 mt genes in the elephant louse, which were on 10 minichromosomes. Each minichromosome is 3.5–4.2 kb in size and has 2–6 genes. Phylogenetic analyses of mt genome sequences confirm that the elephant louse is more closely related to sucking lice than to the chewing lice in the Amblycera and Ischnocera. Our results indicate that mt genome fragmentation is shared by the suborders Anoplura and Rhynchophthirina. Nine of the 10 mt minichromosomes of the elephant louse differ in gene content and gene arrangement from those of the sucking lice (Anoplura) studied to date, indicating that distinct mt karyotypes have evolved in Anoplura and Rhynchophthirina since they diverged ~92 million years ago. PMID:26617060
Contributions of Zea mays subspecies mexicana haplotypes to modern maize.

PubMed

Yang, Ning; Xu, Xi-Wen; Wang, Rui-Ru; Peng, Wen-Lei; Cai, Lichun; Song, Jia-Ming; Li, Wenqiang; Luo, Xin; Niu, Luyao; Wang, Yuebin; Jin, Min; Chen, Lu; Luo, Jingyun; Deng, Min; Wang, Long; Pan, Qingchun; Liu, Feng; Jackson, David; Yang, Xiaohong; Chen, Ling-Ling; Yan, Jianbing

2017-11-30

Maize was domesticated from lowland teosinte (Zea mays ssp. parviglumis), but the contribution of highland teosinte (Zea mays ssp. mexicana, hereafter mexicana) to modern maize is not clear. Here, two genomes for Mo17 (a modern maize inbred) and mexicana are assembled using a meta-assembly strategy after sequencing of 10 lines derived from a maize-teosinte cross. Comparative analyses reveal a high level of diversity between Mo17, B73, and mexicana, including three Mb-size structural rearrangements. The maize spontaneous mutation rate is estimated to be 2.17 × 10⁻⁸ to 3.87 × 10⁻⁸ per site per generation with a nonrandom distribution across the genome. A higher deleterious mutation rate is observed in the pericentromeric regions, and might be caused by differences in recombination frequency. Over 10% of the maize genome shows evidence of introgression from the mexicana genome, suggesting that mexicana contributed to maize adaptation and improvement. Our data offer a rich resource for constructing the pan-genome of Zea mays and genetic improvement of modern maize varieties.
Comparison of simple sequence repeats in 19 Archaea.

PubMed

Trivedi, S

2006-12-05

All organisms studied to date have been found to have a differential distribution of simple sequence repeats (SSRs), with more SSRs in intergenic than in coding sequences. SSR distribution was investigated in Archaea by analyzing the complete chromosome sequences of 19 Archaea with the program SPUTNIK to find di- to penta-nucleotide repeats. The number of repeats was determined for the complete chromosome sequences and for the coding and non-coding sequences. Unlike what has been found for other groups of organisms, SSRs are abundant in the coding regions of the genomes of some Archaea. Dinucleotide repeats were rare, and CG repeats were found in only two Archaea. In general, trinucleotide repeats are the most abundant SSR motifs; however, pentanucleotide repeats are abundant in some Archaea. Some of the tetranucleotide and pentanucleotide repeat motifs are organism-specific. In general, repeats are short, and CG-rich repeats are present in Archaea with a CG-rich genome. Among the 19 Archaea, SSR density was not correlated with genome size or with optimum growth temperature. Pentanucleotide density was inversely correlated with the CG content of the genome.
Hunting for genes for hypertension: the Millennium Genome Project for Hypertension.

PubMed

Tabara, Yasuharu; Kohara, Katsuhiko; Miki, Tetsuro

2012-06-01

The Millennium Genome Project for Hypertension was started in 2000 to identify genetic variants conferring susceptibility to hypertension, with the aim of furthering the understanding of the pathogenesis of this condition and realizing genome-based personalized medical care. Two different approaches were launched: genome-wide association analysis using single-nucleotide polymorphisms (SNPs) and microsatellite markers, and systematic candidate gene analysis, under the hypothesis that common variants have an important role in the etiology of common diseases. These multilateral approaches identified ATP2B1 as a gene responsible for hypertension not only in Japanese but also in Caucasians. The high blood pressure susceptibility conferred by certain alleles of ATP2B1 has been widely replicated in various populations. Ex vivo mRNA expression analysis in umbilical artery smooth muscle cells indicated that reduced expression of this gene associated with the risk allele may be an underlying mechanism relating the ATP2B1 variant to hypertension. However, the effect size of individual SNPs was too small to clarify the entire picture of the genetic basis of hypertension. Denser genome analysis with accurate phenotype data may therefore be required.
ProGeRF: Proteome and Genome Repeat Finder Utilizing a Fast Parallel Hash Function

PubMed Central

Moraes, Walas Jhony Lopes; Rodrigues, Thiago de Souza; Bartholomeu, Daniella Castanheira

2015-01-01

Repetitive element sequences are adjacent, repeating patterns, also called motifs, which can be of different lengths; repetitions can involve exact or approximate copies. They have been widely used as molecular markers in population biology. Given the sizes of sequenced genomes, various bioinformatics tools have been developed for extracting repetitive elements from DNA sequences. However, currently available tools do not combine the ability to identify repetitive elements in both genomes and proteomes with a user-friendly web interface and exhaustive searches. ProGeRF is a website for extracting repetitive regions from genome and proteome sequences. It was designed to be efficient, fast, and accurate and, above all, a user-friendly web tool that offers many ways to view and analyse the results. ProGeRF (Proteome and Genome Repeat Finder) is freely available both as a stand-alone program, whose source code users can download, and as a web tool. It was developed using a hash table approach to extract perfect and imperfect repetitive regions in a (multi)FASTA file with linear time complexity. PMID:25811026
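The hash-table idea can be illustrated with a single-threaded Python sketch that indexes every k-mer in one pass and keeps those occurring more than once. This is not ProGeRF's code: it finds only perfect (exact) repeats of a fixed length k, is not parallel, and the function name is hypothetical. Because it treats the input as a plain string, the same function works for DNA and protein sequences.

```python
from collections import defaultdict


def repeated_kmers(seq, k, min_count=2):
    """Map each k-mer occurring at least min_count times to its 0-based start positions.

    One pass over the sequence with a dict (hash table): O(n) k-mer insertions
    for fixed k (each slice itself costs O(k)).
    """
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return {kmer: starts for kmer, starts in index.items() if len(starts) >= min_count}


print(repeated_kmers("ACGTACGT", 4))  # DNA: {'ACGT': [0, 4]}
print(repeated_kmers("MKVLMKVL", 3))  # protein: {'MKV': [0, 4], 'KVL': [1, 5]}
```

Detecting imperfect repeats would require extra machinery (for example, tolerating mismatches when comparing candidate copies), which this sketch deliberately omits.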
Genome analysis of smooth tubercle bacilli provides insights into ancestry and pathoadaptation of the etiologic agent of tuberculosis

PubMed Central

Supply, Philip; Marceau, Michael; Mangenot, Sophie; Roche, David; Rouanet, Carine; Khanna, Varun; Majlessi, Laleh; Criscuolo, Alexis; Tap, Julien; Pawlik, Alexandre; Fiette, Laurence; Orgeur, Mickael; Fabre, Michel; Parmentier, Cécile; Frigui, Wafa; Simeone, Roxane; Boritsch, Eva C.; Debrie, Anne-Sophie; Willery, Eve; Walker, Danielle; Quail, Michael A.; Ma, Laurence; Bouchier, Christiane; Salvignol, Grégory; Sayes, Fadel; Cascioferro, Alessandro; Seemann, Torsten; Barbe, Valérie; Locht, Camille; Gutierrez, Maria-Cristina; Leclerc, Claude; Bentley, Stephen; Stinear, Timothy P.; Brisse, Sylvain; Médigue, Claudine; Parkhill, Julian; Cruveiller, Stéphane; Brosch, Roland

2013-01-01

Global spread and genetic monomorphism are hallmarks of Mycobacterium tuberculosis, the agent of human tuberculosis. In contrast, Mycobacterium canettii and related tubercle bacilli that also cause human tuberculosis and exhibit unusual smooth colony morphology are restricted to East Africa. Here, we sequenced and analyzed the genomes of five representative strains of smooth tubercle bacilli (STB) using Sanger (4-5x coverage), 454/Roche (13-18x coverage) and/or Illumina DNA sequencing (45-105x coverage). We show that STB are highly recombinogenic and evolutionarily early-branching, with larger genome sizes, 25-fold more SNPs, fewer molecular scars and distinct CRISPR-Cas systems relative to M. tuberculosis. Despite these differences, all tuberculosis-causing mycobacteria share a highly conserved core genome. Mouse-infection experiments revealed that STB are less persistent and virulent than M. tuberculosis. We conclude that M. tuberculosis emerged from an ancestral, STB-like pool of mycobacteria by gain of persistence and virulence mechanisms, and we provide genome-wide insights into the molecular events involved. PMID:23291586
Characterization of complete genome sequence of the spring viremia of carp virus isolated from common carp (Cyprinus carpio) in China.

PubMed

Teng, Y; Liu, H; Lv, J Q; Fan, W H; Zhang, Q Y; Qin, Q W

2007-01-01

The complete genome of spring viraemia of carp virus (SVCV) strain A-1, isolated from cultured common carp (Cyprinus carpio) in China, was sequenced and characterized. Clones derived by reverse transcription-polymerase chain reaction (RT-PCR) were constructed and sequenced. The entire genome of SVCV A-1 consists of 11,100 nucleotides, consistent with the predicted size of rhabdovirus genomic RNA. However, SVCV A-1 differs from the two other published complete SVCV genomes by additional insertions at bp 4633-4676 and bp 4684-4724. Five open reading frames (ORFs) of SVCV A-1 were identified and further confirmed by RT-PCR and DNA sequencing of their respective RT-PCR products. The five structural proteins encoded by the viral RNA are ordered 3'-N-P-M-G-L-5'. This is the first report of a complete genome sequence of SVCV isolated from cultured carp in China. Phylogenetic analysis indicates that SVCV A-1 is closely related to members of the genus Vesiculovirus, family Rhabdoviridae.
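Identifying ORFs in a newly sequenced genome usually starts with a simple scan for start and stop codons. SVCV has a negative-sense RNA genome, so the sketch below assumes the sequence is supplied as positive-sense cDNA (5' to 3', T or U accepted) and scans only the three forward frames. The `min_codons` cutoff is an illustrative assumption; in the study, predicted ORFs were additionally confirmed by RT-PCR and sequencing.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Return (start, end) of forward-strand ORFs, 0-based and half-open.

    An ORF runs from an ATG to the first in-frame stop codon (stop included);
    ATGs nested inside an open ORF are not reported separately, and an ORF
    with no stop codon before the end of the sequence is ignored.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC", min_codons=1))  # [(2, 14)]
```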
Viral to metazoan marine plankton nucleotide sequences from the Tara Oceans expedition

PubMed Central

Alberti, Adriana; Poulain, Julie; Engelen, Stefan; Labadie, Karine; Romac, Sarah; Ferrera, Isabel; Albini, Guillaume; Aury, Jean-Marc; Belser, Caroline; Bertrand, Alexis; Cruaud, Corinne; Da Silva, Corinne; Dossat, Carole; Gavory, Frédérick; Gas, Shahinaz; Guy, Julie; Haquelle, Maud; Jacoby, E'krame; Jaillon, Olivier; Lemainque, Arnaud; Pelletier, Eric; Samson, Gaëlle; Wessner, Mark; Bazire, Pascal; Beluche, Odette; Bertrand, Laurie; Besnard-Gonnet, Marielle; Bordelais, Isabelle; Boutard, Magali; Dubois, Maria; Dumont, Corinne; Ettedgui, Evelyne; Fernandez, Patricia; Garcia, Espérance; Aiach, Nathalie Giordanenco; Guerin, Thomas; Hamon, Chadia; Brun, Elodie; Lebled, Sandrine; Lenoble, Patricia; Louesse, Claudine; Mahieu, Eric; Mairey, Barbara; Martins, Nathalie; Megret, Catherine; Milani, Claire; Muanga, Jacqueline; Orvain, Céline; Payen, Emilie; Perroud, Peggy; Petit, Emmanuelle; Robert, Dominique; Ronsin, Murielle; Vacherie, Benoit; Acinas, Silvia G.; Royo-Llonch, Marta; Cornejo-Castillo, Francisco M.; Logares, Ramiro; Fernández-Gómez, Beatriz; Bowler, Chris; Cochrane, Guy; Amid, Clara; Hoopen, Petra Ten; De Vargas, Colomban; Grimsley, Nigel; Desgranges, Elodie; Kandels-Lewis, Stefanie; Ogata, Hiroyuki; Poulton, Nicole; Sieracki, Michael E.; Stepanauskas, Ramunas; Sullivan, Matthew B.; Brum, Jennifer R.; Duhaime, Melissa B.; Poulos, Bonnie T.; Hurwitz, Bonnie L.; Acinas, Silvia G.; Bork, Peer; Boss, Emmanuel; Bowler, Chris; De Vargas, Colomban; Follows, Michael; Gorsky, Gabriel; Grimsley, Nigel; Hingamp, Pascal; Iudicone, Daniele; Jaillon, Olivier; Kandels-Lewis, Stefanie; Karp-Boss, Lee; Karsenti, Eric; Not, Fabrice; Ogata, Hiroyuki; Pesant, Stéphane; Raes, Jeroen; Sardet, Christian; Sieracki, Michael E.; Speich, Sabrina; Stemmann, Lars; Sullivan, Matthew B.; Sunagawa, Shinichi; Wincker, Patrick; Pesant, Stéphane; Karsenti, Eric; Wincker, Patrick

2017-01-01

A unique collection of oceanic samples was gathered by the Tara Oceans expeditions (2009–2013), targeting plankton organisms ranging from viruses to metazoans, and providing rich environmental context measurements. Thanks to recent advances in the field of genomics, extensive sequencing has been performed for a deep genomic analysis of this huge collection of samples. A strategy based on different approaches, such as metabarcoding, metagenomics, single-cell genomics and metatranscriptomics, has been chosen for analysis of size-fractionated plankton communities. Here, we provide detailed procedures applied for genomic data generation, from nucleic acids extraction to sequence production, and we describe registries of genomics datasets available at the European Nucleotide Archive (ENA, www.ebi.ac.uk/ena). The association of these metadata to the experimental procedures applied for their generation will help the scientific community to access these data and facilitate their analysis. This paper complements other efforts to provide a full description of experiments and open science resources generated from the Tara Oceans project, further extending their value for the study of the world’s planktonic ecosystems. PMID:28763055
Viral to metazoan marine plankton nucleotide sequences from the Tara Oceans expedition.

PubMed

Alberti, Adriana; Poulain, Julie; Engelen, Stefan; Labadie, Karine; Romac, Sarah; Ferrera, Isabel; Albini, Guillaume; Aury, Jean-Marc; Belser, Caroline; Bertrand, Alexis; Cruaud, Corinne; Da Silva, Corinne; Dossat, Carole; Gavory, Frédérick; Gas, Shahinaz; Guy, Julie; Haquelle, Maud; Jacoby, E'krame; Jaillon, Olivier; Lemainque, Arnaud; Pelletier, Eric; Samson, Gaëlle; Wessner, Mark; Acinas, Silvia G; Royo-Llonch, Marta; Cornejo-Castillo, Francisco M; Logares, Ramiro; Fernández-Gómez, Beatriz; Bowler, Chris; Cochrane, Guy; Amid, Clara; Hoopen, Petra Ten; De Vargas, Colomban; Grimsley, Nigel; Desgranges, Elodie; Kandels-Lewis, Stefanie; Ogata, Hiroyuki; Poulton, Nicole; Sieracki, Michael E; Stepanauskas, Ramunas; Sullivan, Matthew B; Brum, Jennifer R; Duhaime, Melissa B; Poulos, Bonnie T; Hurwitz, Bonnie L; Pesant, Stéphane; Karsenti, Eric; Wincker, Patrick

2017-08-01

A unique collection of oceanic samples was gathered by the Tara Oceans expeditions (2009-2013), targeting plankton organisms ranging from viruses to metazoans, and providing rich environmental context measurements. Thanks to recent advances in the field of genomics, extensive sequencing has been performed for a deep genomic analysis of this huge collection of samples. A strategy based on different approaches, such as metabarcoding, metagenomics, single-cell genomics and metatranscriptomics, has been chosen for analysis of size-fractionated plankton communities. Here, we provide detailed procedures applied for genomic data generation, from nucleic acids extraction to sequence production, and we describe registries of genomics datasets available at the European Nucleotide Archive (ENA, www.ebi.ac.uk/ena). The association of these metadata to the experimental procedures applied for their generation will help the scientific community to access these data and facilitate their analysis. This paper complements other efforts to provide a full description of experiments and open science resources generated from the Tara Oceans project, further extending their value for the study of the world's planktonic ecosystems.
On the molecular mechanism of GC content variation among eubacterial genomes

PubMed Central

2012-01-01

Background As a key parameter of genome sequence variation, the GC content of bacterial genomes has been investigated for over half a century, and many hypotheses have been put forward to explain this GC content variation and its relationship to other fundamental processes. Previously, we classified eubacteria into dnaE-based groups (the dimeric combination of DNA polymerase III alpha subunits), according to a hypothesis where GC content variation is essentially governed by genome replication and DNA repair mechanisms. Further investigation led to the discovery that two major mutator genes, polC and dnaE2, may be responsible for genomic GC content variation. Consequently, an in-depth analysis was conducted to evaluate various potential intrinsic and extrinsic factors in association with GC content variation among eubacterial genomes. Results Mutator genes, especially those with dominant effects on the mutation spectra, are biased towards either GC or AT richness, and they alter genomic GC content in the two opposite directions. Increased bacterial genome size (or gene number) appears to rely on increased genomic GC content; however, it is unclear whether the changes are directly related to certain environmental pressures. Certain environmental and bacteriological features are related to GC content variation, but their trends are more obvious when analyzed under the dnaE-based grouping scheme. Most terrestrial, plant-associated, and nitrogen-fixing bacteria are members of the dnaE1|dnaE2 group, whereas most pathogenic or symbiotic bacteria in insects, and those dwelling in aquatic environments, are largely members of the dnaE1|polV group. Conclusion Our studies provide several lines of evidence indicating that DNA polymerase III α subunit and its isoforms participating in either replication (such as polC) or SOS mutagenesis/translesion synthesis (such as dnaE2), play dominant roles in determining GC variability. 
Other environmental or bacteriological factors, such as genome size, temperature, oxygen requirement, and habitat, either play subsidiary roles or rely indirectly on different mutator genes to fine-tune the GC content. These results provide comprehensive insight into the mechanisms of GC content variation and the robustness of eubacterial genomes in adapting to their ever-changing environments over billions of years. Reviewers This paper was reviewed by Nicolas Galtier, Adam Eyre-Walker, and Eugene Koonin. PMID:22230424
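Genomic GC content itself is straightforward to compute; the sketch below also averages it within groups, mirroring the idea of comparing GC content across dnaE-based groups. The group labels and toy sequences are hypothetical, and ambiguous bases such as N are simply excluded from the denominator (an assumption; other conventions exist).

```python
from collections import defaultdict


def gc_content(seq):
    """Fraction of G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / (gc + at)


def mean_gc_by_group(genomes, groups):
    """genomes: name -> sequence; groups: name -> group label (e.g. a dnaE-based class)."""
    values = defaultdict(list)
    for name, seq in genomes.items():
        values[groups[name]].append(gc_content(seq))
    return {label: sum(v) / len(v) for label, v in values.items()}


# Toy sequences and hypothetical group assignments
genomes = {"g1": "GGGGCCCC", "g2": "GCATGCAT", "g3": "ATATATGC"}
groups = {"g1": "dnaE1|dnaE2", "g2": "dnaE1|dnaE2", "g3": "dnaE1|polV"}
print(mean_gc_by_group(genomes, groups))
```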
Complete genome sequence of Jiangella gansuensis strain YIM 002T (DSM 44835T), the type species of the genus Jiangella and source of new antibiotic compounds

DOE PAGES

Jiao, Jian-Yu; Carro, Lorena; Liu, Lan; ...

2017-02-03

Jiangella gansuensis strain YIM 002T is the type strain of the type species of the genus Jiangella, which currently comprises five species, and was isolated from a desert soil sample in Gansu Province (China). The five strains of this genus cluster in a monophyletic group when closely related actinobacterial genera are used to infer a 16S rRNA gene sequence phylogeny. The study of this genome is part of the Genomic Encyclopedia of Bacteria and Archaea project, and here we describe the complete genome sequence and annotation of this taxon. The genome of J. gansuensis strain YIM 002T consists of a single scaffold of 5,585,780 bp, containing 149 pseudogenes, 4905 protein-coding genes and 50 RNA genes, including 2520 hypothetical proteins and 4 rRNA genes. Among Jiangella species, J. gansuensis has a smaller genome, which suggests that this strain may have discarded a considerable amount of genetic information while adapting to the desert environment. Seven new compounds from this bacterium have recently been described; however, its biosynthetic potential is likely greater, as secondary metabolite gene cluster analysis predicted 60 gene clusters, including the potential to produce pristinamycin.
Complete genome sequence of Jiangella gansuensis strain YIM 002T (DSM 44835T), the type species of the genus Jiangella and source of new antibiotic compounds

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jiao, Jian-Yu; Carro, Lorena; Liu, Lan

Jiangella gansuensis strain YIM 002T is the type strain of the type species of the genus Jiangella, which currently comprises five species, and was isolated from a desert soil sample in Gansu Province (China). The five strains of this genus cluster in a monophyletic group when closely related actinobacterial genera are used to infer a 16S rRNA gene sequence phylogeny. The study of this genome is part of the Genomic Encyclopedia of Bacteria and Archaea project, and here we describe the complete genome sequence and annotation of this taxon. The genome of J. gansuensis strain YIM 002T consists of a single scaffold of 5,585,780 bp, containing 149 pseudogenes, 4905 protein-coding genes and 50 RNA genes, including 2520 hypothetical proteins and 4 rRNA genes. Among Jiangella species, J. gansuensis has a smaller genome, which suggests that this strain may have discarded a considerable amount of genetic information while adapting to the desert environment. Seven new compounds from this bacterium have recently been described; however, its biosynthetic potential is likely greater, as secondary metabolite gene cluster analysis predicted 60 gene clusters, including the potential to produce pristinamycin.
Burkholderia xenovorans LB400 harbors a multi-replicon, 9.73-Mbp genome shaped for versatility

PubMed Central

Chain, Patrick S. G.; Denef, Vincent J.; Konstantinidis, Konstantinos T.; Vergez, Lisa M.; Agulló, Loreine; Reyes, Valeria Latorre; Hauser, Loren; Córdova, Macarena; Gómez, Luis; González, Myriam; Land, Miriam; Lao, Victoria; Larimer, Frank; LiPuma, John J.; Mahenthiralingam, Eshwar; Malfatti, Stephanie A.; Marx, Christopher J.; Parnell, J. Jacob; Ramette, Alban; Richardson, Paul; Seeger, Michael; Smith, Daryl; Spilker, Theodore; Sul, Woo Jun; Tsoi, Tamara V.; Ulrich, Luke E.; Zhulin, Igor B.; Tiedje, James M.

2006-01-01

Burkholderia xenovorans LB400 (LB400), a well-studied, effective polychlorinated biphenyl degrader, has one of the two largest known bacterial genomes and is the first nonpathogenic Burkholderia isolate to be sequenced. From an evolutionary perspective, we find significant differences in functional specialization between the three replicons of LB400, as well as a more relaxed selective pressure for genes located on the two smaller replicons than for those on the largest replicon. High genomic plasticity, diversity, and specialization within the Burkholderia genus are exemplified by the conservation of only 44% of the genes between LB400 and Burkholderia cepacia complex strain 383. Even among four B. xenovorans strains, genome size varies from 7.4 to 9.73 Mbp. This variation is largely explained by our finding that >20% of the LB400 sequence was recently acquired by lateral gene transfer. Although a range of genetic factors associated with in vivo survival and intercellular interactions are present, these genetic factors are likely related to niche breadth rather than being determinants of pathogenicity. The presence of at least eleven “central aromatic” and twenty “peripheral aromatic” pathways in LB400, among the highest numbers in any sequenced bacterial genome, supports this hypothesis. Finally, in addition to the experimentally observed redundancy in benzoate degradation and formaldehyde oxidation pathways, the fact that 17.6% of proteins have a better LB400 paralog than an ortholog in a different genome highlights the importance of gene duplication and repeated acquisition, which, coupled with subsequent divergence, raises questions regarding the role of paralogs and potential functional redundancies in large-genome microbes. PMID:17030797
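The statistic that 17.6% of proteins have a better LB400 paralog than an ortholog elsewhere can be framed as a comparison of best-hit scores. The sketch below assumes precomputed scores (for example, BLAST bit scores) for each protein's best non-self hit within the same genome and its best hit in any other genome; the input format and protein IDs are hypothetical, and the study's exact criteria may differ.

```python
def fraction_with_better_paralog(best_paralog, best_ortholog):
    """Fraction of proteins whose best within-genome (non-self) hit outscores their best hit elsewhere.

    best_paralog, best_ortholog: protein ID -> best score; a missing ID means no hit.
    Only proteins present in at least one mapping are counted.
    """
    proteins = set(best_paralog) | set(best_ortholog)
    if not proteins:
        return 0.0
    missing = float("-inf")
    better = sum(
        1
        for p in proteins
        if best_paralog.get(p, missing) > best_ortholog.get(p, missing)
    )
    return better / len(proteins)


# Hypothetical bit scores for three proteins
paralog_hits = {"p1": 210.0, "p2": 55.0}
ortholog_hits = {"p1": 150.0, "p2": 80.0, "p3": 95.0}
print(fraction_with_better_paralog(paralog_hits, ortholog_hits))
```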
Exploring the Limits for Reduction of Plastid Genomes: A Case Study of the Mycoheterotrophic Orchids Epipogium aphyllum and Epipogium roseum

PubMed Central

Schelkunov, Mikhail I.; Shtratnikova, Viktoria Yu; Nuraliev, Maxim S.; Selosse, Marc-Andre; Penin, Aleksey A.; Logacheva, Maria D.

2015-01-01

The patterns and limits of plastid genome reduction in nonphotosynthetic plants, and the reasons for the retention of these genomes, are among the intriguing topics in plant genome evolution. Here, we report the sequencing and analysis of the plastid genomes of the nonphotosynthetic orchids Epipogium aphyllum and Epipogium roseum, which, at 31 and 19 kbp, respectively, are the smallest plastid genomes characterized to date. Besides the expected drastic reduction, we found several unusual features of these “minimal” plastomes: multiple rearrangements, highly biased nucleotide composition, and an unprecedentedly high substitution rate. Only 27 and 29 genes remained intact in the plastomes of E. aphyllum and E. roseum, respectively: those encoding ribosomal components, transfer RNAs, and three additional housekeeping genes (infA, clpP, and accD). We found no signs of relaxed selection acting on these genes. We hypothesize that the main reason for the retention of plastid genomes in Epipogium is the necessity of translating the messenger RNAs (mRNAs) of accD and/or clpP, whose proteins are essential for cell metabolism. However, these genes are absent from the plastomes of several plant species; their absence is compensated for by a functional copy that arose by gene transfer from the plastid to the nuclear genome. This suggests that there is no single set of essential plastid-encoded genes, but rather different sets for different species, and that the retention of a gene in the plastome depends on the interaction between the nucleus and the plastids. PMID:25635040
Genome sequencing and comparative genomics of enterohemorrhagic Escherichia coli O145:H25 and O145:H28 reveal distinct evolutionary paths and marked variations in traits associated with virulence & colonization.

PubMed

Lorenz, Sandra C; Gonzalez-Escalona, Narjol; Kotewicz, Michael L; Fischer, Markus; Kase, Julie A

2017-08-22

Enterohemorrhagic Escherichia coli (EHEC) O145 are among the top non-O157 serogroups associated with severe human disease worldwide. Two serotypes, O145:H25 and O145:H28, have been isolated from human patients, but little information is available regarding the virulence repertoire, origin, and evolutionary relatedness of O145:H25. Hence, we sequenced the complete genomes of two O145:H25 strains associated with hemolytic uremic syndrome (HUS) and compared them with those of previously sequenced O145:H28 and other EHEC strains. The genomes of the two O145:H25 strains were 5.3 Mbp in size, slightly smaller than those of O145:H28 and other EHEC strains. Both strains contained three nearly identical plasmids and several prophages and integrative elements, many of which differed significantly in size, gene content, and organization from those present in O145:H28 and other EHECs. Furthermore, notable variations were observed in several fimbrial gene clusters and in the intimin types possessed by O145:H25 and O145:H28, indicating potential adaptation to distinct areas of host colonization. Comparative genomics further revealed that O145:H25 strains are genetically more similar to other non-O157 EHEC strains than to O145:H28. Phylogenetic analysis combined with comparative genomics revealed that O145:H25 and O145:H28 evolved from two separate clonal lineages and that horizontal gene transfer and gene loss played a major role in the divergence of these EHEC serotypes. The data provide further evidence that ruminants might be a reservoir for O145:H25, but that these strains might be impaired in their ability to establish persistent colonization compared with other EHEC strains.
Genotyping by sequencing for genomic prediction in a soybean breeding population.

PubMed

Jarquín, Diego; Kocak, Kyle; Posadas, Luis; Hyma, Katie; Jedlicka, Joseph; Graef, George; Lorenz, Aaron

2014-08-29

Advances in genotyping technology, such as genotyping by sequencing (GBS), are making genomic prediction more attractive as a way to reduce breeding cycle times and the costs associated with phenotyping. Genomic prediction and selection have been studied in several crop species, but no reports exist for soybean. The objectives of this study were (i) to evaluate prospects for genomic selection using GBS in a typical soybean breeding program and (ii) to evaluate the effect of GBS marker selection and imputation on genomic prediction accuracy. To achieve these objectives, a set of soybean lines sampled from the University of Nebraska Soybean Breeding Program was genotyped using GBS and evaluated for yield and other agronomic traits at multiple Nebraska locations. GBS scored 16,502 single nucleotide polymorphisms (SNPs) with minor-allele frequency (MAF) > 0.05 and ≤ 5% missing values on 301 elite soybean breeding lines. When SNPs with up to 80% missing values were included, 52,349 SNPs were scored. Prediction accuracy for grain yield, assessed by cross-validation, was estimated at 0.64, indicating good potential for genomic selection for grain yield in soybean. Filtering SNPs by percentage of missing data had little to no effect on prediction accuracy, especially when random forest imputation was used to impute missing values. The highest accuracies were observed when random forest imputation was used on all SNPs, but the differences were not significant. A standard additive G-BLUP model was robust; modeling additive-by-additive epistasis did not improve prediction accuracy. The gain in accuracy with increasing training population size began to plateau at around 100 lines, but accuracy climbed steadily up to the largest size possible in this analysis. Including only SNPs with MAF > 0.30 gave higher accuracies when training populations were smaller.
Using GBS for genomic prediction in soybean holds good potential to expedite genetic gain. Our results suggest that standard additive G-BLUP models can be used on unfiltered, imputed GBS data without loss in accuracy.
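The SNP filtering criteria described in this abstract (keep SNPs with MAF > 0.05 and at most 5% missing calls, or relax the missing-data cap to 80%) can be illustrated with a minimal Python sketch. It assumes a hypothetical 0/1/2 genotype coding (number of alternate alleles per line) with `None` for a missing call, uses made-up toy data, and the function names are illustrative only; it is not the study's actual GBS pipeline, which the abstract does not describe in detail.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical encoding: alternate-allele count per line (0, 1 or 2); None = missing call.
Genotype = Optional[int]


def missing_fraction(calls: Sequence[Genotype]) -> float:
    """Fraction of lines with no genotype call at this SNP."""
    return sum(1 for g in calls if g is None) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """MAF estimated from the non-missing calls only."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0  # no data: treat as monomorphic so it is filtered out
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def filter_snps(
    snps: Dict[str, Sequence[Genotype]],
    min_maf: float = 0.05,
    max_missing: float = 0.05,
) -> List[str]:
    """Keep SNP IDs with MAF > min_maf and missing fraction <= max_missing."""
    return [
        snp_id
        for snp_id, calls in snps.items()
        if missing_fraction(calls) <= max_missing
        and minor_allele_frequency(calls) > min_maf
    ]


# Toy panel of 20 inbred lines (homozygous calls coded 0 or 2); made-up data.
example_snps: Dict[str, List[Genotype]] = {
    "snp_a": [0] * 10 + [2] * 10,              # MAF 0.50, no missing -> kept
    "snp_b": [0] * 19 + [2],                   # MAF exactly 0.05 -> fails MAF > 0.05
    "snp_c": [0] * 10 + [2] * 8 + [None] * 2,  # 10% missing -> fails the 5% cap
    "snp_d": [0] * 15 + [2] * 4 + [None],      # 5% missing, MAF ~0.21 -> kept
}

print(filter_snps(example_snps))                    # ['snp_a', 'snp_d']
print(filter_snps(example_snps, max_missing=0.80))  # ['snp_a', 'snp_c', 'snp_d']
```

Estimating MAF from the non-missing calls only is one common convention, not necessarily the one used in the study. Passing `min_maf=0.30` would mimic the stricter marker subset that the abstract reports as helpful for smaller training populations.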
