A Solution to the C-Value Paradox and the Function of Junk DNA: The Genome Balance Hypothesis.
Freeling, Michael; Xu, Jie; Woodhouse, Margaret; Lisch, Damon
2015-06-01
The Genome Balance Hypothesis originated from a recent study that provided a mechanism for the phenomenon of genome dominance in ancient polyploids: unique 24nt RNA coverage near genes is greater in genes on the recessive subgenome irrespective of differences in gene expression. 24nt RNAs target transposons. Transposon position effects are now hypothesized to balance the expression of networked genes and provide spring-like tension between pericentromeric heterochromatin and microtubules. The balance (coordination) of gene expression and centromere movement is under selection. Our hypothesis states that this balance can be maintained by many or few transposons about equally well. We explain known balanced distributions of junk DNA within genomes and between subgenomes in allopolyploids (and our hypothesis passes "the onion test" for any so-called solution to the C-value paradox). Importantly, when the allotetraploid maize chromosomes delete redundant genes, their nearby transposons are also lost; this result is explained if transposons near genes function. The Genome Balance Hypothesis is hypothetical because the position effect mechanisms implicated are not proved to apply to all junk DNA, and the continuous nature of the centromeric and gene position effects have not yet been studied as a single phenomenon. Copyright © 2015 The Author. Published by Elsevier Inc. All rights reserved.
Yi, Xuan; Gao, Lei; Wang, Bo; Su, Ying-Juan; Wang, Ting
2013-01-01
We have determined the complete chloroplast (cp) genome sequence of Cephalotaxus oliveri. The genome is 134,337 bp in length, encodes 113 genes, and lacks inverted repeat (IR) regions. Genome-wide mutational dynamics have been investigated through comparative analysis of the cp genomes of C. oliveri and C. wilsoniana. Gene order transformation analyses indicate that when distinct isomers are considered as alternative structures for the ancestral cp genome of cupressophyte and Pinaceae lineages, it is not possible to distinguish between hypotheses favoring retention of the same IR region in cupressophyte and Pinaceae cp genomes from a hypothesis proposing independent loss of IRA and IRB. Furthermore, in cupressophyte cp genomes, the highly reduced IRs are replaced by short repeats that have the potential to mediate homologous recombination, analogous to the situation in Pinaceae. The importance of repeats in the mutational dynamics of cupressophyte cp genomes is also illustrated by the accD reading frame, which has undergone extreme length expansion in cupressophytes. This has been caused by a large insertion comprising multiple repeat sequences. Overall, we find that the distribution of repeats, indels, and substitutions is significantly correlated in Cephalotaxus cp genomes, consistent with a hypothesis that repeats play a role in inducing substitutions and indels in conifer cp genomes.
Eggs, embryos and the evolution of imprinting: insights from the platypus genome.
Renfree, Marilyn B; Papenfuss, Anthony T; Shaw, Geoff; Pask, Andrew J
2009-01-01
Genomic imprinting is widespread in eutherian and marsupial mammals. Although there have been many hypotheses to explain why genomic imprinting evolved in mammals, few have examined how it arose. The host defence hypothesis suggests that imprinting evolved from existing mechanisms within the cell that act to silence foreign DNA elements that insert into the genome. However, the changes to the mammalian genome that accompanied the evolution of imprinting have been hard to define due to the absence of large-scale genomic resources from all extant classes. The recent release of the platypus genome sequence has provided the first opportunity to make comparisons between prototherian (monotreme, which show no signs of imprinting) and therian (marsupial and eutherian, which have imprinting) mammals. We compared the distribution of repeat elements known to attract epigenetic silencing across the genome from monotremes and therian mammals, particularly focusing on the orthologous imprinted regions. Our analyses show that the platypus has significantly fewer repeats of certain classes in the regions of the genome that have become imprinted in therian mammals. The accumulation of repeats, especially long-terminal repeats and DNA elements, in therian imprinted genes and gene clusters therefore appears to be coincident with, and may have been a potential driving force in, the development of mammalian genomic imprinting. Comparative platypus genome analyses of orthologous imprinted regions have provided strong support for the host defence hypothesis to explain the origin of imprinting.
Spain, S L; Pedroso, I; Kadeva, N; Miller, M B; Iacono, W G; McGue, M; Stergiakouli, E; Smith, G D; Putallaz, M; Lubinski, D; Meaburn, E L; Plomin, R; Simpson, M A
2016-01-01
Although individual differences in intelligence (general cognitive ability) are highly heritable, molecular genetic analyses to date have had limited success in identifying specific loci responsible for its heritability. This study is the first to investigate exome variation in individuals of extremely high intelligence. Under the quantitative genetic model, sampling from the high extreme of the distribution should provide increased power to detect associations. We therefore performed a case–control association analysis with 1409 individuals drawn from the top 0.0003 (IQ >170) of the population distribution of intelligence and 3253 unselected population-based controls. Our analysis focused on putative functional exonic variants assayed on the Illumina HumanExome BeadChip. We did not observe any individual protein-altering variants that are reproducibly associated with extremely high intelligence and within the entire distribution of intelligence. Moreover, no significant associations were found for multiple rare alleles within individual genes. However, analyses using genome-wide similarity between unrelated individuals (genome-wide complex trait analysis) indicate that the genotyped functional protein-altering variation yields a heritability estimate of 17.4% (s.e. 1.7%) based on a liability model. In addition, investigation of nominally significant associations revealed fewer rare alleles associated with extremely high intelligence than would be expected under the null hypothesis. This observation is consistent with the hypothesis that rare functional alleles are more frequently detrimental than beneficial to intelligence. PMID:26239293
Spain, S L; Pedroso, I; Kadeva, N; Miller, M B; Iacono, W G; McGue, M; Stergiakouli, E; Davey Smith, G; Putallaz, M; Lubinski, D; Meaburn, E L; Plomin, R; Simpson, M A
2016-08-01
Although individual differences in intelligence (general cognitive ability) are highly heritable, molecular genetic analyses to date have had limited success in identifying specific loci responsible for its heritability. This study is the first to investigate exome variation in individuals of extremely high intelligence. Under the quantitative genetic model, sampling from the high extreme of the distribution should provide increased power to detect associations. We therefore performed a case-control association analysis with 1409 individuals drawn from the top 0.0003 (IQ >170) of the population distribution of intelligence and 3253 unselected population-based controls. Our analysis focused on putative functional exonic variants assayed on the Illumina HumanExome BeadChip. We did not observe any individual protein-altering variants that are reproducibly associated with extremely high intelligence and within the entire distribution of intelligence. Moreover, no significant associations were found for multiple rare alleles within individual genes. However, analyses using genome-wide similarity between unrelated individuals (genome-wide complex trait analysis) indicate that the genotyped functional protein-altering variation yields a heritability estimate of 17.4% (s.e. 1.7%) based on a liability model. In addition, investigation of nominally significant associations revealed fewer rare alleles associated with extremely high intelligence than would be expected under the null hypothesis. This observation is consistent with the hypothesis that rare functional alleles are more frequently detrimental than beneficial to intelligence.
Analysis of the platypus genome suggests a transposon origin for mammalian imprinting.
Pask, Andrew J; Papenfuss, Anthony T; Ager, Eleanor I; McColl, Kaighin A; Speed, Terence P; Renfree, Marilyn B
2009-01-01
Genomic imprinting is an epigenetic phenomenon that results in monoallelic gene expression. Many hypotheses have been advanced to explain why genomic imprinting evolved in mammals, but few have examined how it arose. The host defence hypothesis suggests that imprinting evolved from existing mechanisms within the cell that act to silence foreign DNA elements that insert into the genome. However, the changes to the mammalian genome that accompanied the evolution of imprinting have been hard to define due to the absence of large scale genomic resources between all extant classes. The recent release of the platypus genome has provided the first opportunity to perform comparisons between prototherian (monotreme; which appear to lack imprinting) and therian (marsupial and eutherian; which have imprinting) mammals. We compared the distribution of repeat elements known to attract epigenetic silencing across the entire genome from monotremes and therian mammals, particularly focusing on the orthologous imprinted regions. There is a significant accumulation of certain repeat elements within imprinted regions of therian mammals compared to the platypus. Our analyses show that the platypus has significantly fewer repeats of certain classes in the regions of the genome that have become imprinted in therian mammals. The accumulation of repeats, especially long terminal repeats and DNA elements, in therian imprinted genes and gene clusters is coincident with, and may have been a potential driving force in, the development of mammalian genomic imprinting. These data provide strong support for the host defence hypothesis.
Analysis of the platypus genome suggests a transposon origin for mammalian imprinting
Pask, Andrew J; Papenfuss, Anthony T; Ager, Eleanor I; McColl, Kaighin A; Speed, Terence P; Renfree, Marilyn B
2009-01-01
Background Genomic imprinting is an epigenetic phenomenon that results in monoallelic gene expression. Many hypotheses have been advanced to explain why genomic imprinting evolved in mammals, but few have examined how it arose. The host defence hypothesis suggests that imprinting evolved from existing mechanisms within the cell that act to silence foreign DNA elements that insert into the genome. However, the changes to the mammalian genome that accompanied the evolution of imprinting have been hard to define due to the absence of large scale genomic resources between all extant classes. The recent release of the platypus genome has provided the first opportunity to perform comparisons between prototherian (monotreme; which appear to lack imprinting) and therian (marsupial and eutherian; which have imprinting) mammals. Results We compared the distribution of repeat elements known to attract epigenetic silencing across the entire genome from monotremes and therian mammals, particularly focusing on the orthologous imprinted regions. There is a significant accumulation of certain repeat elements within imprinted regions of therian mammals compared to the platypus. Conclusions Our analyses show that the platypus has significantly fewer repeats of certain classes in the regions of the genome that have become imprinted in therian mammals. The accumulation of repeats, especially long terminal repeats and DNA elements, in therian imprinted genes and gene clusters is coincident with, and may have been a potential driving force in, the development of mammalian genomic imprinting. These data provide strong support for the host defence hypothesis. PMID:19121219
DOE Office of Scientific and Technical Information (OSTI.GOV)
Probst, Alexander J.; Ladd, Bethany; Jarett, Jessica K.
An enormous diversity of previously unknown bacteria and archaea has been discovered recently, yet their functional capacities and distributions in the terrestrial subsurface remain uncertain. Here, we continually sampled a CO2-driven geyser (Colorado Plateau, Utah, USA) over its 5-day eruption cycle to test the hypothesis that stratified, sandstone-hosted aquifers sampled over three phases of the eruption cycle have microbial communities that differ both in membership and function. Genome-resolved metagenomics, single-cell genomics and geochemical analyses confirmed this hypothesis and linked microorganisms to groundwater compositions from different depths. Autotrophic Candidatus “Altiarchaeum sp.” and phylogenetically deep-branching nanoarchaea dominate the deepest groundwater. A nanoarchaeon with limited metabolic capacity is inferred to be a potential symbiont of the Ca. “Altiarchaeum”. Candidate Phyla Radiation bacteria are also present in the deepest groundwater and they are relatively abundant in water from intermediate depths. During the recovery phase of the geyser, microaerophilic Fe- and S-oxidizers have high in situ genome replication rates. Autotrophic Sulfurimonas sustained by aerobic sulfide oxidation and with the capacity for N2 fixation dominate the shallow aquifer. Overall, 104 different phylum-level lineages are present in water from these subsurface environments, with uncultivated archaea and bacteria partitioned to the deeper subsurface.
Probst, Alexander J.; Ladd, Bethany; Jarett, Jessica K.; ...
2018-01-29
An enormous diversity of previously unknown bacteria and archaea has been discovered recently, yet their functional capacities and distributions in the terrestrial subsurface remain uncertain. Here, we continually sampled a CO2-driven geyser (Colorado Plateau, Utah, USA) over its 5-day eruption cycle to test the hypothesis that stratified, sandstone-hosted aquifers sampled over three phases of the eruption cycle have microbial communities that differ both in membership and function. Genome-resolved metagenomics, single-cell genomics and geochemical analyses confirmed this hypothesis and linked microorganisms to groundwater compositions from different depths. Autotrophic Candidatus “Altiarchaeum sp.” and phylogenetically deep-branching nanoarchaea dominate the deepest groundwater. A nanoarchaeon with limited metabolic capacity is inferred to be a potential symbiont of the Ca. “Altiarchaeum”. Candidate Phyla Radiation bacteria are also present in the deepest groundwater and they are relatively abundant in water from intermediate depths. During the recovery phase of the geyser, microaerophilic Fe- and S-oxidizers have high in situ genome replication rates. Autotrophic Sulfurimonas sustained by aerobic sulfide oxidation and with the capacity for N2 fixation dominate the shallow aquifer. Overall, 104 different phylum-level lineages are present in water from these subsurface environments, with uncultivated archaea and bacteria partitioned to the deeper subsurface.
The Global Phylogeography of Lyssaviruses - Challenging the 'Out of Africa' Hypothesis
Fooks, Anthony R.; Marston, Denise A.; Garcia-R, Juan C.
2016-01-01
Rabies virus kills tens of thousands of people globally each year, especially in resource-limited countries. Yet, there are genetically- and antigenically-related lyssaviruses, all capable of causing the disease rabies, circulating globally among bats without causing conspicuous disease outbreaks. The species richness and greater genetic diversity of African lyssaviruses, along with the lack of antibody cross-reactivity among them, has led to the hypothesis that Africa is the origin of lyssaviruses. This hypothesis was tested using a probabilistic phylogeographical approach. The nucleoprotein gene sequences from 153 representatives of 16 lyssavirus species, collected between 1956 and 2015, were used to develop a phylogenetic tree which incorporated relevant geographic and temporal data relating to the viruses. In addition, complete genome sequences from all 16 (putative) species were analysed. The most probable ancestral distribution for the internal nodes was inferred using three different approaches and was confirmed by analysis of complete genomes. These results support a Palearctic origin for lyssaviruses (posterior probability = 0.85), challenging the ‘out of Africa’ hypothesis, and suggest three independent transmission events to the Afrotropical region, representing the three phylogroups that form the three major lyssavirus clades. PMID:28036390
Zheng, Chunfang; Santos Muñoz, Daniella; Albert, Victor A; Sankoff, David
2015-01-01
Following whole genome duplication (WGD), there is a compact distribution of gene similarities within the genome reflecting duplicate pairs of all the genes in the genome. With time, the distribution broadens and loses volume due to variable decay of duplicate gene similarity and to the process of duplicate gene loss. If there are two WGD, the older one becomes so reduced and broad that it merges with the tail of the distributions resulting from more recent events, and it becomes difficult to distinguish them. The goal of this paper is to advance statistical methods of identifying, or at least counting, the WGD events in the lineage of a given genome. For a set of 15 angiosperm genomes, we analyze all 15 × 14 = 210 ordered pairs of target genome versus reference genome, using SynMap to find syntenic blocks. We consider all sets of B ≥ 2 syntenic blocks in the target genome that overlap in the reference genome as evidence of WGD activity in the target, whether it be one event or several. We hypothesize that in fitting an exponential function to the tail of the empirical distribution f (B) of block multiplicities, the size of the exponent will reflect the amount of WGD in the history of the target genome. By amalgamating the results from all reference genomes, a range of values of SynMap parameters, and alternative cutoff points for the tail, we find a clear pattern whereby multiple-WGD core eudicots have the smallest (negative) exponents, followed by core eudicots with only the single "γ" triplication in their history, followed by a non-core eudicot with a single WGD, followed by the monocots, with a basal angiosperm, the WGD-free Amborella having the largest exponent. The hypothesis that the exponent of the fit to the tail of the multiplicity distribution is a signature of the amount of WGD is verified, but there is also a clear complicating factor in the monocot clade, where a history of multiple WGD is not reflected in a small exponent.
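The tail-fitting idea in this abstract can be illustrated with a minimal sketch. It assumes that f(B) is available as a table of counts per block multiplicity B and that the exponential is fitted by ordinary least squares on log counts; the abstract does not say how the authors performed the fit, and the function name, the cutoff argument, and the toy counts are all hypothetical.

```python
import math

def fit_tail_exponent(multiplicity_counts, cutoff=2):
    """Fit log f(B) = a + b*B by least squares for B >= cutoff.

    Returns (a, b); b is the exponent of the fitted exponential tail.
    This is an illustrative log-linear fit, not the authors' procedure.
    """
    points = [(b, math.log(n)) for b, n in sorted(multiplicity_counts.items())
              if b >= cutoff and n > 0]
    if len(points) < 2:
        raise ValueError("need at least two non-empty bins in the tail")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up multiplicity counts that decay exactly exponentially (not real data).
toy_counts = {b: math.exp(3.0 - 0.5 * b) for b in range(2, 10)}
intercept, exponent = fit_tail_exponent(toy_counts)
print(f"fitted exponent: {exponent:.3f}")
```

Under the abstract's hypothesis, a more negative fitted exponent would indicate a target genome with less WGD in its history, and moving the cutoff corresponds to the "alternative cutoff points for the tail" mentioned above.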
de Sotero-Caio, Cibele Gomes; Cabral-de-Mello, Diogo Cavalcanti; Calixto, Merilane da Silva; Valente, Guilherme Targino; Martins, Cesar; Loreto, Vilma; de Souza, Maria José; Santos, Neide
2017-10-01
Despite their ubiquitous incidence, little is known about the chromosomal distribution of long interspersed elements (LINEs) in mammalian genomes. Phyllostomid bats, characterized by lineages with distinct trends of chromosomal evolution coupled with remarkable ecological and taxonomic diversity, represent good models to understand how these repetitive sequences contribute to the evolution of genome architecture and its link to lineage diversification. To test the hypothesis that LINE-1 sequences were important modifiers of bat genome architecture, we characterized the distribution of LINE-1-derived sequences on genomes of 13 phyllostomid species within a phylogenetic framework. We found massive accumulation of LINE-1 elements in the centromeres of most species: a rare phenomenon on mammalian genomes. We hypothesize that expansion of these elements has occurred early in the radiation of phyllostomids and recurred episodically. LINE-1 expansions on centromeric heterochromatin probably spurred chromosomal change before the radiation of phyllostomids into the extant 11 subfamilies and contributed to the high degree of karyotypic variation observed among different lineages. Understanding centromere architecture in a variety of taxa promises to explain how lineage-specific changes on centromere structure can contribute to karyotypic diversity while not disrupting functional constraints for proper cell division.
Independent test assessment using the extreme value distribution theory.
Almeida, Marcio; Blondell, Lucy; Peralta, Juan M; Kent, Jack W; Jun, Goo; Teslovich, Tanya M; Fuchsberger, Christian; Wood, Andrew R; Manning, Alisa K; Frayling, Timothy M; Cingolani, Pablo E; Sladek, Robert; Dyer, Thomas D; Abecasis, Goncalo; Duggirala, Ravindranath; Blangero, John
2016-01-01
The new generation of whole genome sequencing platforms offers great possibilities and challenges for dissecting the genetic basis of complex traits. With a very high number of sequence variants, a naïve multiple hypothesis threshold correction hinders the identification of reliable associations by excessively reducing statistical power. In this report, we examine 2 alternative approaches to improve the statistical power of a whole genome association study to detect reliable genetic associations. The approaches were tested using the Genetic Analysis Workshop 19 (GAW19) whole genome sequencing data. The first tested method estimates the real number of effective independent tests actually being performed in a whole genome association project by the use of an extreme value distribution and a set of phenotype simulations. Given the familial nature of the GAW19 data and the finite number of pedigree founders in the sample, the number of correlations between genotypes is greater than in a set of unrelated samples. Using our procedure, we estimate that the effective number represents only 15% of the total number of independent tests performed. However, even using this corrected significance threshold, no genome-wide significant association could be detected for systolic and diastolic blood pressure traits. The second approach implements a biological relevance-driven hypothesis tested by exploiting prior computational predictions on the effect of nonsynonymous genetic variants detected in a whole genome sequencing association study. This guided testing approach was able to identify 2 promising single-nucleotide polymorphisms (SNPs), 1 for each trait, targeting biologically relevant genes that could help shed light on the genesis of human hypertension. The first gene, PFH14, associated with systolic blood pressure, interacts directly with genes involved in calcium-channel formation, and the second gene, MAP4, encodes a microtubule-associated protein and had already been detected by previous genome-wide association study experiments conducted in an Asian population. Our results highlight the necessity of developing alternative approaches to improve the efficiency of detecting reasonable candidate associations in whole genome sequencing studies.
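One way to picture the first approach is the sketch below. It rests on a standard result rather than on the authors' exact procedure: if each null simulation yields the minimum p-value over M independent tests, that minimum follows a Beta(1, M) distribution, so M can be estimated by maximum likelihood and used in a Bonferroni-style threshold. The function names, the seed, and the simulated minima are hypothetical stand-ins for the phenotype simulations described in the abstract.

```python
import math
import random

def estimate_effective_tests(min_p_values):
    """Maximum-likelihood estimate of M, assuming each value is the minimum
    of M independent uniform p-values (i.e. Beta(1, M) distributed)."""
    log_sum = sum(math.log1p(-p) for p in min_p_values)
    if log_sum == 0:
        raise ValueError("all minimum p-values are zero")
    return -len(min_p_values) / log_sum

def corrected_threshold(alpha, n_effective):
    """Bonferroni-style threshold using the effective number of tests."""
    return alpha / n_effective

# Stand-in for null phenotype simulations: each replicate records the
# smallest of TRUE_M independent uniform p-values (toy data, not GAW19).
rng = random.Random(19)
TRUE_M = 50
simulated_minima = [min(rng.random() for _ in range(TRUE_M)) for _ in range(2000)]

m_eff = estimate_effective_tests(simulated_minima)
threshold = corrected_threshold(0.05, m_eff)
print(f"estimated effective tests: {m_eff:.1f}, threshold: {threshold:.2e}")
```

If the effective number is 15% of the nominal number of tests, as reported above, the per-test threshold is relaxed by a factor of 1/0.15, roughly 6.7, relative to a naïve correction.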
Abundance and distribution of the highly iterated palindrome 1 (HIP1) among prokaryotes
Moya, Andrés
2011-01-01
We have studied the abundance and phylogenetic distribution of the Highly Iterated Palindrome 1 (HIP1) among sequenced prokaryotic genomes. We show that an overrepresentation of HIP1 is exclusive to some lineages of cyanobacteria, and that this abundance was gained only once during evolution and was subsequently lost in the lineage leading to marine pico-cyanobacteria. We show that among cyanobacterial protein sequences with annotated Pfam domains, only OpcA (glucose 6-phosphate dehydrogenase assembly protein) has a phylogenetic distribution fully matching HIP1 abundance, suggesting a functional relationship; we also show that DAM methylase (an enzyme that has the four central nucleotides of HIP1 as its site of action) is present in all cyanobacterial genomes (independently of their HIP1 content) with the exception of marine pico-cyanobacteria, which might have lost this enzyme during the process of genome reduction. Our analyses also show that in some prokaryotic lineages (particularly in those species with large genomes), HIP1 is unevenly distributed between coding and non-coding DNA (being more common in coding regions; with the exception of Cyanobacteria Yellowstone B' and Synechococcus elongatus where the reverse pattern is true). Finally, we explore the hypothesis that HIP1 can be used as a molecular “water-mark” to identify horizontally transferred genes from cyanobacteria to other species. PMID:22312590
Abundance and distribution of the highly iterated palindrome 1 (HIP1) among prokaryotes.
Delaye, Luis; Moya, Andrés
2011-09-01
We have studied the abundance and phylogenetic distribution of the Highly Iterated Palindrome 1 (HIP1) among sequenced prokaryotic genomes. We show that an overrepresentation of HIP1 is exclusive to some lineages of cyanobacteria, and that this abundance was gained only once during evolution and was subsequently lost in the lineage leading to marine pico-cyanobacteria. We show that among cyanobacterial protein sequences with annotated Pfam domains, only OpcA (glucose 6-phosphate dehydrogenase assembly protein) has a phylogenetic distribution fully matching HIP1 abundance, suggesting a functional relationship; we also show that DAM methylase (an enzyme that has the four central nucleotides of HIP1 as its site of action) is present in all cyanobacterial genomes (independently of their HIP1 content) with the exception of marine pico-cyanobacteria, which might have lost this enzyme during the process of genome reduction. Our analyses also show that in some prokaryotic lineages (particularly in those species with large genomes), HIP1 is unevenly distributed between coding and non-coding DNA (being more common in coding regions; with the exception of Cyanobacteria Yellowstone B' and Synechococcus elongatus where the reverse pattern is true). Finally, we explore the hypothesis that HIP1 can be used as a molecular "water-mark" to identify horizontally transferred genes from cyanobacteria to other species.
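A simple way to quantify HIP1 overrepresentation is an observed-to-expected ratio, sketched below. The motif sequence GCGATCGC is taken from the wider HIP1 literature rather than from this abstract (its central GATC matches the DAM site mentioned above), and the expectation under independent base frequencies is an assumed null model, not necessarily the one the authors used. The function names and the toy sequence are hypothetical.

```python
from collections import Counter

HIP1 = "GCGATCGC"  # octamer reported for HIP1 in the literature (assumption here)

def count_motif(sequence, motif=HIP1):
    """Count occurrences of motif in sequence, allowing overlaps."""
    sequence = sequence.upper()
    k = len(motif)
    return sum(1 for i in range(len(sequence) - k + 1)
               if sequence[i:i + k] == motif)

def observed_over_expected(sequence, motif=HIP1):
    """Ratio of the observed motif count to the count expected if bases were
    drawn independently with the sequence's own frequencies."""
    sequence = sequence.upper()
    positions = len(sequence) - len(motif) + 1
    if positions <= 0:
        raise ValueError("sequence is shorter than the motif")
    freqs = Counter(sequence)
    expected = float(positions)
    for base in motif:
        expected *= freqs[base] / len(sequence)
    if expected == 0:
        raise ValueError("motif contains a base absent from the sequence")
    return count_motif(sequence, motif) / expected

# Toy sequence (not a real genome): two overlapping HIP1 copies plus filler.
toy = "ATATAT" + "GCGATCGCGATCGC" + "TTAACCGGTA"
print(count_motif(toy), round(observed_over_expected(toy), 1))
```

Because the octamer is its own reverse complement, counting on one strand covers both strands; a ratio well above 1 would indicate overrepresentation under this simple null.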
LINE-1 retrotransposons: from 'parasite' sequences to functional elements.
Paço, Ana; Adega, Filomena; Chaves, Raquel
2015-02-01
Long interspersed nuclear elements-1 (LINE-1) are the most abundant and active retrotransposons in mammalian genomes. Traditionally, the occurrence of LINE-1 sequences in the genome of mammals has been explained by the selfish DNA hypothesis. Nevertheless, recently, it has also been argued that these sequences could play important roles in these genomes, such as in the regulation of gene expression, genome modelling and X-chromosome inactivation. The non-random chromosomal distribution is a striking feature of these retroelements that somehow reflects their functionality. In the present study, we have isolated and analysed a fraction of the open reading frame 2 (ORF2) LINE-1 sequence from three rodent species, Cricetus cricetus, Peromyscus eremicus and Praomys tullbergi. Physical mapping of the isolated sequences revealed an interspersed longitudinal AT pattern of distribution along all the chromosomes of the complement in the three genomes. A detailed analysis shows that these sequences are preferentially located in the euchromatic regions, although some signals could be detected in the heterochromatin. In addition, a coincidence between the location of imprinted gene regions (such as the Xist and Tsix gene regions) and the LINE-1 retroelements was also observed. According to these results, we propose an involvement of LINE-1 sequences in different genomic events such as gene imprinting, X-chromosome inactivation and evolution of repetitive sequences located at the heterochromatic regions (e.g. satellite DNA sequences) of the rodents' genomes analysed.
Savary, Romain; Masclaux, Frédéric G; Wyss, Tania; Droh, Germain; Cruz Corella, Joaquim; Machado, Ana Paula; Morton, Joseph B; Sanders, Ian R
2018-01-01
Arbuscular mycorrhizal fungi (AMF; phylum Glomeromycota) associate with plants forming one of the most successful microbe-plant associations. The fungi promote plant diversity and have a potentially important role in global agriculture. Plant growth depends on both inter- and intra-specific variation in AMF. It was recently reported that an unusually large number of AMF taxa have an intercontinental distribution, suggesting long-distance gene flow for many AMF species, facilitated by either long-distance natural dispersal mechanisms or human-assisted dispersal. However, the intercontinental distribution of AMF species has been questioned because the use of very low-resolution markers may be unsuitable to detect genetic differences among geographically separated AMF, as seen with some other fungi. This has been untestable because of the lack of population genomic data, with high resolution, for any AMF taxa. Here we use phylogenetics and population genomics to test for intra-specific variation in Rhizophagus irregularis, an AMF species for which genome sequence information already exists. We used ddRAD sequencing to obtain thousands of markers distributed across the genomes of 81 R. irregularis isolates and related species. Based on 6 888 variable positions, we observed significant genetic divergence into four main genetic groups within R. irregularis, highlighting that previous studies have not captured underlying genetic variation. Despite considerable genetic divergence, surprisingly, the variation could not be explained by geographical origin, thus also supporting the hypothesis for at least one AMF species of widely dispersed AMF genotypes at an intercontinental scale. Such information is crucial for understanding AMF ecology, and how these fungi can be used in an environmentally safe way in distant locations.
Reconstructing the complex evolutionary history of mobile plasmids in red algal genomes
Lee, JunMo; Kim, Kyeong Mi; Yang, Eun Chan; Miller, Kathy Ann; Boo, Sung Min; Bhattacharya, Debashish; Yoon, Hwan Su
2016-01-01
The integration of foreign DNA into algal and plant plastid genomes is a rare event, with only a few known examples of horizontal gene transfer (HGT). Plasmids, which are well-studied drivers of HGT in prokaryotes, have been reported previously in red algae (Rhodophyta). However, the distribution of these mobile DNA elements and their sites of integration into the plastid (ptDNA), mitochondrial (mtDNA), and nuclear genomes of Rhodophyta remain unknown. Here we reconstructed the complex evolutionary history of plasmid-derived DNAs in red algae. Comparative analysis of 21 rhodophyte ptDNAs, including new genome data for 5 species, turned up 22 plasmid-derived open reading frames (ORFs) that showed syntenic and copy number variation among species, but were conserved within different individuals in three lineages. Several plasmid-derived homologs were found not only in ptDNA but also in mtDNA and in the nuclear genome of green plants, stramenopiles, and rhizarians. Phylogenetic and plasmid-derived ORF analyses showed that the majority of plasmid DNAs originated within red algae, whereas others were derived from cyanobacteria, other bacteria, and viruses. Our results elucidate the evolution of plasmid DNAs in red algae and suggest that they spread as parasitic genetic elements. This hypothesis is consistent with their sporadic distribution within Rhodophyta. PMID:27030297
Koh, Xuan-Han; Liu, Xuanyao; Teo, Yik-Ying
2014-01-01
Body fat deposition and distribution differ between East Asians and Europeans, and for the same level of obesity, East Asians are at higher risks of Type 2 diabetes (T2D) and other metabolic disorders. This observation has prompted the reclassifications of body mass index thresholds for the definitions of “overweight” and “obese” in East Asians. However, the question remains over what evolutionary mechanisms have driven the differences in adiposity morphology between two population groups that shared a common ancestor less than 80,000 years ago. The Thrifty Gene hypothesis has been suggested as a possible explanation, where genetic factors that allowed for efficient food-energy conversion and storage are evolutionarily favoured by conferring increased chances of survival and fertility. Here, we leveraged on the existing findings from genome-wide association studies and large-scale surveys of positive natural selection to evaluate whether there is currently any evidence to support the Thrifty Gene hypothesis. We first assess whether the existing genetic associations with obesity and T2D are located in genomic regions that are reported to be under positive selection, and if so, whether the risk alleles sit on the extended haplotype forms. In addition, we interrogate whether these risk alleles are the derived forms that differ from the ancestral alleles, and whether there is significant evidence of population differentiation at these SNPs between East Asian and European populations. Our systematic survey did not yield conclusive evidence to support the Thrifty Gene hypothesis as a possible explanation for the differences observed between East Asians and Europeans. PMID:25337808
Comparative analysis and supragenome modeling of twelve Moraxella catarrhalis clinical isolates
2011-01-01
Background M. catarrhalis is a gram-negative gamma-proteobacterium and an opportunistic human pathogen associated with otitis media (OM) and exacerbations of chronic obstructive pulmonary disease (COPD). With direct and indirect costs for treating these conditions annually exceeding $33 billion in the United States alone, and nearly ubiquitous resistance to beta-lactam antibiotics among M. catarrhalis clinical isolates, a greater understanding of this pathogen's genome and its variability among isolates is needed. Results The genomic sequences of ten geographically and phenotypically diverse clinical isolates of M. catarrhalis were determined and analyzed together with two publicly available genomes. These twelve genomes were subjected to detailed comparative and predictive analyses aimed at characterizing the supragenome and understanding the metabolic and pathogenic potential of this species. A total of 2383 gene clusters were identified, of which 1755 are core, with the remaining 628 clusters unevenly distributed among the twelve isolates. These findings are consistent with the distributed genome hypothesis (DGH), which posits that the species genome possesses a far greater number of genes than any single isolate. Multiple and pair-wise whole genome alignments highlight limited chromosomal re-arrangement. Conclusions M. catarrhalis gene content and chromosomal organization data, although supportive of the DGH, show modest overall genic diversity. These findings are in stark contrast with the reported heterogeneity of the species as a whole, as well as with other bacterial pathogens mediating OM and COPD, providing important insight into M. catarrhalis pathogenesis that will aid in the development of novel therapeutic regimens. PMID:21269504
Ji, Yanzhu; DeWoody, J Andrew
2016-06-01
Transposable elements (TEs) are nearly ubiquitous among eukaryotic genomes, but TE contents vary dramatically among phylogenetic lineages. Several mechanisms have been proposed as drivers of TE dynamics in genomes, including the fixation/loss of a particular TE insertion by selection or drift as well as structural changes in the genome due to mutation (e.g., recombination). In particular, recombination can have a significant and directional effect on the genomic TE landscape. For example, ectopic recombination removes internal regions of long terminal repeat retrotransposons (LTR-RTs) as well as one long terminal repeat (LTR), resulting in a solo LTR. In this study, we focus on the intra-species dynamics of LTR-RTs and solo LTRs in bird genomes. The distribution of LTR-RTs and solo LTRs in birds is intriguing because avian recombination rates vary widely within a given genome. We used published linkage maps and whole genome assemblies to study the relationship between recombination rates and LTR-removal events in the chicken and zebra finch. We hypothesized that regions with low recombination rates would harbor more full-length LTR-RTs (and fewer solo LTRs) than regions with high recombination rates. We tested this hypothesis by comparing the ratio of full-length LTR-RTs and solo LTRs across chromosomes, across non-overlapping megabase windows, and across physical features (i.e., centromeres and telomeres). The chicken data statistically supported the hypothesis that recombination rates are inversely correlated with the ratio of full-length to solo LTRs at both the chromosome level and in 1-Mb non-overlapping windows. We also found that the ratio of full-length to solo LTRs near chicken telomeres was significantly lower than those ratios near centromeres. Our results suggest a potential role of ectopic recombination in shaping the chicken LTR-RT genomic landscape.
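The window-level test described in this abstract can be illustrated with a minimal sketch: compute the ratio of full-length LTR-RTs to solo LTRs in each window and rank-correlate it with the recombination rate. The window tuples, the function names, and the choice of Spearman's rho are assumptions made for illustration; the abstract does not state which correlation statistic the authors used, and the numbers below are toy values, not data from the study.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def ratio_vs_recombination(windows):
    """windows: (recombination_rate, n_full_length, n_solo) per window.
    Windows without solo LTRs are skipped (an assumption of this sketch)."""
    usable = [(r, full / solo) for r, full, solo in windows if solo > 0]
    rates = [r for r, _ in usable]
    ratios = [q for _, q in usable]
    return spearman(rates, ratios)

# Hypothetical 1-Mb windows: (cM/Mb, full-length LTR-RTs, solo LTRs)
toy_windows = [(0.5, 8, 2), (1.0, 6, 3), (2.5, 4, 4), (4.0, 2, 5), (6.0, 1, 6)]
rho = ratio_vs_recombination(toy_windows)
```

In the toy data the ratio falls monotonically as the recombination rate rises, so `rho` is -1.0, the direction of association the abstract reports for chicken.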
Genetic drift and mutational hazard in the evolution of salamander genomic gigantism.
Mohlhenrich, Erik Roger; Mueller, Rachel Lockridge
2016-12-01
Salamanders have the largest nuclear genomes among tetrapods and, excepting lungfishes, among vertebrates as a whole. Lynch and Conery (2003) have proposed the mutational-hazard hypothesis to explain variation in genome size and complexity. Under this hypothesis, noncoding DNA imposes a selective cost by increasing the target for degenerative mutations (i.e., the mutational hazard). Expansion of noncoding DNA, and thus genome size, is driven by increased levels of genetic drift and/or decreased mutation rates; the former determines the efficiency with which purifying selection can remove excess DNA, whereas the latter determines the level of mutational hazard. Here, we test the hypothesis that salamanders have experienced stronger long-term, persistent genetic drift than frogs, a related clade with more typically sized vertebrate genomes. To test this hypothesis, we compared dN/dS and Kr/Kc values of protein-coding genes between these clades. Our results do not support this hypothesis; we find that salamanders have not experienced stronger genetic drift than frogs. Additionally, we find evidence consistent with a lower nucleotide substitution rate in salamanders. This result, along with previous work showing lower rates of small deletion and ectopic recombination in salamanders, suggests that a lower mutational hazard may contribute to genomic gigantism in this clade. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.
Second, Gérard; Rouhan, Germinal
2008-01-01
Background The genus Oryza is being used as a model in plant genomic studies although there are several issues still to be resolved regarding the spatio-temporal evolution of this ancient genus. Particularly contentious is whether undated transoceanic natural dispersal or recent human interference has been the principal agent determining its present distribution and differentiation. In this context, we studied the origin and distribution history of the allotetraploid CD rice genome. It is endemic to the Neotropics but the genus is thought to have originated in the Paleotropics, and there is relatively little genetic divergence between some orthologous sequences of the C genome component and their Old World counterparts. Methodology/Principal Findings Because of its allotetraploidy, there are several potential pitfalls in trying to date the formation of the CD genome using molecular data and this could lead to erroneous estimates. Therefore, we rather chose to rely on historical evidence to determine whether or not the CD genome was present in the Neotropics before the arrival of Columbus. We searched early collections of herbarium specimens and studied the reports of explorers of the tropical Americas for references to rice. In spite of numerous collectors traveling inland and collecting Oryza, plants determined as CD genome species were not observed away from cultivated rice fields until 1869. Various arguments suggest that they only consisted of weedy forms until that time. Conclusions/Significance The spatio-temporal distribution of herbarium collections fits a simple biogeographical scenario for the emergence in cultivated rice fields followed by radiation in the wild of the CD genome in the Neotropics during the last four centuries. This probably occurred from species introduced to the Americas by humans and we found no evidence that the CD genome pre-existed in the Old World. 
We therefore propose a new evolutionary hypothesis for such a recent origin of the CD genome. Moreover, we exemplify how an historical approach can provide potentially important information and help to disentangle the timing of evolutionary events in the history of the Oryza genomes. PMID:18596981
Yajima, Misako; Ikuta, Kazufumi; Kanda, Teru
2018-04-03
Herpesviruses have relatively large DNA genomes of more than 150 kb that are difficult to clone and sequence. Bacterial artificial chromosome (BAC) cloning of herpesvirus genomes is a powerful technique that greatly facilitates whole viral genome sequencing as well as functional characterization of reconstituted viruses. We describe recently invented technologies for rapid BAC cloning of herpesvirus genomes using CRISPR/Cas9-mediated homology-directed repair. We focus on recent BAC cloning techniques of Epstein-Barr virus (EBV) genomes and discuss the possible advantages of a CRISPR/Cas9-mediated strategy compared with previous EBV-BAC cloning strategies. We also describe the design decisions of this technology as well as possible pitfalls and points to be improved in the future. The obtained EBV-BAC clones are subjected to long-read sequencing analysis to determine the complete EBV genome sequence, including repetitive regions. Rapid cloning and sequence determination of various EBV strains will greatly contribute to the understanding of their global geographical distribution. This technology can also be used to clone disease-associated EBV strains and test the hypothesis that they have special features that distinguish them from strains that infect asymptomatically. PMID:29614006
NASA Astrophysics Data System (ADS)
Acquisti, Claudia; Allegrini, Paolo; Bogani, Patrizia; Buiatti, Marcello; Catanese, Elena; Fronzoni, Leone; Grigolini, Paolo; Mersi, Giuseppe; Palatella, Luigi
2004-04-01
We investigate a possible way to connect the presence of Low-Complexity Sequences (LCS) in DNA genomes with the nonstationary properties of base correlations. Under the hypothesis that these variations signal a change in the DNA function, we use a new technique, called the Non-Stationarity Entropic Index (NSEI) method, and we prove that this technique is an efficient way to detect functional changes with respect to a random baseline. The remarkable aspect is that NSEI does not require any training data or fitting parameter, the only arbitrariness being the choice of a marker in the sequence. We make this choice on the basis of biological information about LCS distributions in genomes. We show that there exists a correlation between changes in the amount of LCS and the ratio of long- to short-range correlation.
de Melo, Warita Alves; Lima-Ribeiro, Matheus S; Terribile, Levi Carina; Collevatti, Rosane G
2016-01-01
Studies based on contemporary plant occurrences and pollen fossil records have proposed that the current disjunct distribution of seasonally dry tropical forests (SDTFs) across South America is the result of fragmentation of a formerly widespread and continuously distributed dry forest during the arid climatic conditions associated with the Last Glacial Maximum (LGM), which is known as the modern-day dry forest refugia hypothesis. We studied the demographic history of Tabebuia roseoalba (Bignoniaceae) to understand the disjunct geographic distribution of South American SDTFs based on statistical phylogeography and ecological niche modeling (ENM). We specifically tested the dry forest refugia hypothesis, i.e., whether the multiple and isolated patches of SDTFs are current climatic relicts of a widespread and continuously distributed dry forest during the LGM. We sampled 235 individuals across 18 populations in Central Brazil and analyzed the polymorphisms at chloroplast (trnS-trnG, psbA-trnH and ycf6-trnC intergenic spacers) and nuclear (ITS nrDNA) genomes. We performed coalescence simulations of alternative hypotheses under demographic expectations from two a priori biogeographic hypotheses (1. the Pleistocene Arc hypothesis and 2. a range shift to the Amazon Basin) and two other demographic expectations predicted by ENMs (3. expansion throughout Neotropical South America, including the Amazon Basin, and 4. retraction during the LGM). Phylogenetic analyses based on a median-joining network showed haplotype sharing among populations with evidence of incomplete lineage sorting. Coalescent analyses showed smaller effective population sizes for T. roseoalba during the LGM compared to the present day. Simulations and ENM also showed that its current spatial pattern of genetic diversity is most likely due to a scenario of range retraction during the LGM instead of the fragmentation of a once extensive and largely contiguous SDTF across South America, not supporting the South American dry forest refugia hypothesis. PMID:27458982
Genomic signatures of positive selection in humans and the limits of outlier approaches.
Kelley, Joanna L; Madeoy, Jennifer; Calhoun, John C; Swanson, Willie; Akey, Joshua M
2006-08-01
Identifying regions of the human genome that have been targets of positive selection will provide important insights into recent human evolutionary history and may facilitate the search for complex disease genes. However, the confounding effects of population demographic history and selection on patterns of genetic variation complicate inferences of selection when a small number of loci are studied. To this end, identifying outlier loci from empirical genome-wide distributions of genetic variation is a promising strategy to detect targets of selection. Here, we evaluate the power and efficiency of a simple outlier approach and describe a genome-wide scan for positive selection using a dense catalog of 1.58 million SNPs that were genotyped in three human populations. In total, we analyzed 14,589 genes, 385 of which possess patterns of genetic variation consistent with the hypothesis of positive selection. Furthermore, several extended genomic regions were found, spanning >500 kb, that contained multiple contiguous candidate selection genes. More generally, these data provide important practical insights into the limits of outlier approaches in genome-wide scans for selection, provide strong candidate selection genes to study in greater detail, and may have important implications for disease related research.
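As a rough illustration of the outlier strategy evaluated in this abstract, the sketch below ranks genes by a per-gene summary statistic and flags those in the extreme tail of the empirical genome-wide distribution. The statistic, the gene names, the 1% tail, and the function name `empirical_outliers` are all hypothetical; the abstract does not specify the statistic or cutoff used, and, as it cautions, genes flagged this way are candidates rather than confirmed targets of selection.

```python
def empirical_outliers(stat_by_gene, tail=0.01):
    """Return the genes whose statistic lies in the upper `tail` fraction
    of the empirical distribution, highest values first."""
    ranked = sorted(stat_by_gene.items(), key=lambda kv: kv[1], reverse=True)
    n_keep = max(1, round(len(ranked) * tail))
    return [gene for gene, _ in ranked[:n_keep]]

# Hypothetical per-gene values of some differentiation statistic.
toy_stats = {f"gene{i}": i / 200 for i in range(200)}
candidates = empirical_outliers(toy_stats, tail=0.01)
```

Because the cutoff is a fixed fraction of the distribution, this procedure always returns candidates, whether or not any gene has actually been under selection. That is one reason the abstract stresses the limits of outlier approaches.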
Esnault, Caroline; Graça, Paula; Higuet, Dominique; Bonnivard, Eric
2013-01-01
Transposable elements are major constituents of eukaryote genomes and have a great impact on genome structure and stability. They can contribute to the genetic diversity and evolution of organisms. Knowledge of their distribution among several genomes is an essential condition to study their dynamics and to better understand their role in species evolution. LTR-retrotransposons have been reported in many diverse eukaryote species, indicating a ubiquitous distribution. Given their abundance, diversity and their extended ranges in C-values, environment and life styles, crustaceans are a great taxon to investigate the genomic component of adaptation and its possible relationships with TEs. However, crustaceans have been greatly underrepresented in transposable element studies. Using both degenerate PCR and in silico approaches, we have identified 35 Copia and 46 Gypsy families in 15 and 18 crustacean species, respectively. In particular, we characterized several full-length elements from the shrimp Rimicaris exoculata, which is listed as a model organism from hydrothermal vents. Phylogenetic analyses show that Copia and Gypsy retrotransposons likely present two opposite dynamics within crustaceans. The Gypsy elements appear relatively frequent and diverse whereas Copia are much more homogeneous, as 29 of them belong to the single GalEa clade, and species- or lineage-dependent. Our results also support the hypothesis of the Copia retrotransposon scarcity in metazoans compared to Gypsy elements. In such a context, the GalEa-like elements present an outstandingly wide distribution among eukaryotes, from fishes to red algae, and can even be highly predominant within a large taxon, such as Malacostraca. Their distribution among crustaceans suggests dynamics that follow a “domino days spreading” branching process in which successive amplifications may interact positively. PMID:23469217
Goffinet, Bernard; Wickett, Norman J; Werner, Olaf; Ros, Rosa Maria; Shaw, A Jonathan; Cox, Cymon J
2007-04-01
The recent assembly of the complete sequence of the plastid genome of the model taxon Physcomitrella patens (Funariaceae, Bryophyta) revealed that a 71-kb fragment, encompassing much of the large single copy region, is inverted. This inversion of 57% of the genome is the largest rearrangement detected in the plastid genomes of plants to date. Although initially considered diagnostic of Physcomitrella patens, the inversion was recently shown to characterize the plastid genome of two species from related genera within Funariaceae, but was lacking in another member of Funariidae. The phylogenetic significance of the inversion has remained ambiguous. Exemplars of all families included in Funariidae were surveyed. DNA sequences spanning the inversion break ends were amplified, using primers that anneal to genes on either side of the putative end points of the inversion. Primer combinations were designed to yield a product for either the inverted or the non-inverted architecture. The survey reveals that exemplars of eight genera of Funariaceae, the sole species of Disceliaceae and three generic representatives of Encalyptales all share the 71-kb inversion in the large single copy of the plastid genome. By contrast, the plastid genome of Gigaspermaceae (Funariales) is characterized by a gene order congruent with that described for other mosses, liverworts and hornworts, and hence it does not possess this inversion. The phylogenetic distribution of the inversion in the gene order supports a hypothesis only weakly supported by inferences from sequence data whereby Funariales are paraphyletic, with Funariaceae and Disceliaceae sharing a common ancestor with Encalyptales, and Gigaspermaceae sister to this combined clade. To reflect these relationships, Gigaspermaceae are excluded from Funariales and accommodated in their own order, Gigaspermales order nov., within Funariidae.
Han, Buhm; Kang, Hyun Min; Eskin, Eleazar
2009-01-01
With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for these large datasets. Recently, several studies proposed efficient alternative approaches to the permutation test based on the multivariate normal distribution (MVN). However, they cannot accurately correct for multiple testing in genome-wide association studies for two reasons. First, these methods require partitioning of the genome into many disjoint blocks and ignore all correlations between markers from different blocks. Second, the true null distribution of the test statistic often fails to follow the asymptotic distribution at the tails of the distribution. We propose an accurate and efficient method for multiple testing correction in genome-wide association studies—SLIDE. Our method accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than the error rate of the previous MVN-based methods' corrected p-values, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose an efficient and accurate power estimation method SLIP. SLIP and SLIDE are available at http://slide.cs.ucla.edu. PMID:19381255
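For context, the permutation test that this abstract calls the gold standard can be sketched as a max-statistic procedure: shuffle the case/control labels, record the largest statistic across all markers, and compare each observed statistic against that null distribution. This is not SLIDE and not the MVN approximation. The association statistic (absolute difference in mean allele count), the function names, and the toy genotypes are assumptions chosen to keep the example short.

```python
import random

def assoc_stat(genotypes, labels):
    """Absolute difference in mean allele count between cases (1) and
    controls (0); a stand-in for a real association statistic."""
    cases = [g for g, lab in zip(genotypes, labels) if lab == 1]
    controls = [g for g, lab in zip(genotypes, labels) if lab == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def max_stat_adjusted_pvalues(markers, labels, n_perm=200, seed=0):
    """Permutation-based adjusted p-values: for each marker, the fraction of
    label permutations whose maximum statistic over all markers reaches the
    observed value, with the usual +1 correction."""
    rng = random.Random(seed)
    observed = [assoc_stat(m, labels) for m in markers]
    exceed = [0] * len(markers)
    permuted = list(labels)
    for _ in range(n_perm):
        rng.shuffle(permuted)
        max_stat = max(assoc_stat(m, permuted) for m in markers)
        for i, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[i] += 1
    return [(count + 1) / (n_perm + 1) for count in exceed]

# Hypothetical data: 10 cases then 10 controls, two markers.
labels = [1] * 10 + [0] * 10
markers = [
    [2] * 10 + [0] * 10,  # allele count perfectly tracks case status
    [1] * 20,             # no variation, so no association
]
adjusted = max_stat_adjusted_pvalues(markers, labels)
```

Each permutation recomputes every marker's statistic, so the cost grows with the number of markers multiplied by the number of permutations. This is the computational burden that motivates the faster approaches discussed in the abstract.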
Tumor Hypoxia and Genetic Alterations in Sporadic Cancers
Koi, Minoru; Boland, C.R.
2011-01-01
The cancer genome contains many gene alterations. How cancer cells acquire these alterations is a matter of debate. One hypothesis is that cancer cells obtain mutations in genome stability genes at an early stage of tumor development, which results in genetic instability and generates a gene pool that enhances cellular proliferation and survival. Another hypothesis emphasizes the natural selection of gene mutations for fitness. Recent data from systematic cancer genome sequencing show that mutations in stability genes are rare in human sporadic cancers. Instead, many “passenger” mutations that do not drive the carcinogenesis process have been found in the cancer genome. Both hypotheses mentioned above fall short of explaining these data. Recently, many studies have demonstrated the role of the tumor microenvironment, especially hypoxia and reoxygenation, in genetic instability. In this review, we present literature that supports a third hypothesis, i.e., that hypoxia/reoxygenation induces genetic instability. PMID:21272156
Efficient identification of context dependent subgroups of risk from genome wide association studies
Dyson, Greg; Sing, Charles F.
2014-01-01
We have developed a modified Patient Rule-Induction Method (PRIM) as an alternative strategy for analyzing representative samples of non-experimental human data to estimate and test the role of genomic variations as predictors of disease risk in etiologically heterogeneous sub-samples. A computational limit of the proposed strategy is encountered when the number of genomic variations (predictor variables) under study is large (> 500) because permutations are used to generate a null distribution to test the significance of a term (defined by values of particular variables) that characterizes a sub-sample of individuals through the peeling and pasting processes. As an alternative, in this paper we introduce a theoretical strategy that facilitates the quick calculation of Type I and Type II errors in the evaluation of terms in the peeling and pasting processes carried out in the execution of a PRIM analysis that are underestimated and non-existent, respectively, when a permutation-based hypothesis test is employed. The resultant savings in computational time makes possible the consideration of larger numbers of genomic variations (an example genome wide association study is given) in the selection of statistically significant terms in the formulation of PRIM prediction models. PMID:24570412
Dor, Roi; Carling, Matthew D; Lovette, Irby J; Sheldon, Frederick H; Winkler, David W
2012-10-01
The New World swallow genus Tachycineta comprises nine species that collectively have a wide geographic distribution and remarkable variation both within- and among-species in ecologically important traits. Existing phylogenetic hypotheses for Tachycineta are based on mitochondrial DNA sequences, thus they provide estimates of a single gene tree. In this study we sequenced multiple individuals from each species at 16 nuclear intron loci. We used gene concatenated approaches (Bayesian and maximum likelihood) as well as coalescent-based species tree inference to reconstruct phylogenetic relationships of the genus. We examined the concordance and conflict between the nuclear and mitochondrial trees and between concatenated and coalescent-based inferences. Our results provide an alternative phylogenetic hypothesis to the existing mitochondrial DNA estimate of phylogeny. This new hypothesis provides a more accurate framework in which to explore trait evolution and examine the evolution of the mitochondrial genome in this group. Copyright © 2012 Elsevier Inc. All rights reserved.
Selective significance of genome size in a plant community with heavy metal pollution.
Vidic, T; Greilhuber, J; Vilhar, B; Dermastia, M
2009-09-01
In eukaryotes, nuclear genome sizes vary by more than five orders of magnitude. This variation is not related to organismal complexity, and its origin and biological significance are still disputed. One of the open questions is whether genome size has an adaptive role. We tested the hypothesis that genome size has selective significance, using five grassland communities occurring on a gradient of metal pollution of the soil as a model. We detected a negative correlation between the concentration of contaminating metals in the soil and the number of vascular plant species. Analysis of genome sizes of 70 herbaceous dicot perennial species occurring on the investigated plots revealed a negative correlation between the concentration of contaminating metals in the soil and the proportion of species with large genomes in plant communities. Consistent with the hypothesis, these results show that species with large genomes are at a selective disadvantage in extreme environmental conditions.
The Public Goods Hypothesis for the evolution of life on Earth.
McInerney, James O; Pisani, Davide; Bapteste, Eric; O'Connell, Mary J
2011-08-23
It is becoming increasingly difficult to reconcile the observed extent of horizontal gene transfers with the central metaphor of a great tree uniting all evolving entities on the planet. In this manuscript we describe the Public Goods Hypothesis and show that it is appropriate in order to describe biological evolution on the planet. According to this hypothesis, nucleotide sequences (genes, promoters, exons, etc.) are simply seen as goods, passed from organism to organism through both vertical and horizontal transfer. Public goods sequences are defined by having the properties of being largely non-excludable (no organism can be effectively prevented from accessing these sequences) and non-rival (while such a sequence is being used by one organism it is also available for use by another organism). The universal nature of genetic systems ensures that such non-excludable sequences exist and non-excludability explains why we see a myriad of genes in different combinations in sequenced genomes. There are three features of the public goods hypothesis. Firstly, segments of DNA are seen as public goods, available for all organisms to integrate into their genomes. Secondly, we expect the evolution of mechanisms for DNA sharing and of defense mechanisms against DNA intrusion in genomes. Thirdly, we expect that we do not see a global tree-like pattern. Instead, we expect local tree-like patterns to emerge from the combination of a commonage of genes and vertical inheritance of genomes by cell division. Indeed, while genes are theoretically public goods, in reality, some genes are excludable, particularly, though not only, when they have variant genetic codes or behave as coalition or club goods, available for all organisms of a coalition to integrate into their genomes, and non-rival within the club. We view the Tree of Life hypothesis as a regionalized instance of the Public Goods hypothesis, just like classical mechanics and Euclidean geometry are seen as regionalized instances of quantum mechanics and Riemannian geometry respectively. We argue for this change using an axiomatic approach that shows that the Public Goods hypothesis is a better accommodation of the observed data than the Tree of Life hypothesis. PMID:21861918
Valenzuela, Carlos Y
2017-02-13
Direct tests of the random or non-random distribution of nucleotides on genomes have been devised to test the hypothesis of neutral, nearly-neutral or selective evolution. These tests are based on the direct base distribution and are independent of the functional (coding or non-coding) or structural (repeated or unique sequences) properties of the DNA. The first approach described the longitudinal distribution of bases in tandem repeats under Bose-Einstein statistics. A huge deviation from randomness was found. A second approach was the study of the base distribution within dinucleotides whose bases were separated by 0, 1, 2… K nucleotides. Again an enormous difference from the random distribution was found, with significance levels beyond the range of tables and programs. These test values were periodic and included the 16 dinucleotides. For example, a high "positive" value (more dinucleotides observed than expected), found in dinucleotides whose bases were separated by (3K + 2) sites, was preceded by two smaller "negative" values (fewer dinucleotides observed than expected), whose bases were separated by (3K) or (3K + 1) sites. We examined mtDNAs, prokaryote genomes and some eukaryote chromosomes and found that the significant non-random interactions and periodicities were present up to 1000 or more sites of base separation and, in human chromosome 21, up to separations of more than 10 million sites. Each nucleotide has its own significant value of its distance to neutrality; this yields 16 hierarchical significances. A three-dimensional table with the number of sites of separation between the bases and the 16 significances (the third dimension is the dinucleotide, individual or taxon involved) gives directly an evolutionary state of the analyzed genome that can be used to obtain phylogenies. An example is provided.
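The spaced-dinucleotide comparison described in this abstract can be illustrated with a minimal sketch. It is not the author's test: the function names are hypothetical, expected counts are assumed to come from the base frequencies at the two positions under independence, and the summary is an ordinary chi-square-style sum over the 16 dinucleotides.

```python
from collections import Counter

BASES = "ACGT"


def spaced_dinucleotide_table(seq, k):
    """Observed and expected counts for base pairs with k sites between them.

    k = 0 means adjacent bases. Expected counts assume the two positions
    are independent (an assumption of this sketch).
    """
    seq = seq.upper()
    gap = k + 1
    pairs = [
        (seq[i], seq[i + gap])
        for i in range(len(seq) - gap)
        if seq[i] in BASES and seq[i + gap] in BASES
    ]
    n = len(pairs)
    observed = Counter(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    table = {}
    for a in BASES:
        for b in BASES:
            expected = first[a] * second[b] / n if n else 0.0
            table[(a, b)] = (observed[(a, b)], expected)
    return table


def chi_square(table):
    """Sum of (observed - expected)^2 / expected over cells with expected > 0."""
    return sum((o - e) ** 2 / e for o, e in table.values() if e > 0)


if __name__ == "__main__":
    toy = "ACGT" * 50  # strongly periodic toy sequence, not real data
    for k in range(4):
        print(k, round(chi_square(spaced_dinucleotide_table(toy, k)), 2))
```

Scanning k over a range of separations and recording the sign of observed minus expected for each dinucleotide is one way to look for the period-3 pattern the abstract reports.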
Castillo, Andreina I; Nelson, Andrew D L; Haug-Baltzell, Asher K; Lyons, Eric
2018-01-01
Integrated platforms for storage, management, analysis and sharing of large quantities of omics data have become fundamental to comparative genomics. CoGe (https://genomevolution.org/coge/) is an online platform designed to manage and study genomic data, enabling both data- and hypothesis-driven comparative genomics. CoGe’s tools and resources can be used to organize and analyse both publicly available and private genomic data from any species. Here, we demonstrate the capabilities of CoGe through three example workflows using 17 Plasmodium genomes as a model. Plasmodium genomes present unique challenges for comparative genomics due to their rapidly evolving and highly variable genomic AT/GC content. These example workflows are intended to serve as templates to help guide researchers who would like to use CoGe to examine diverse aspects of genome evolution. In the first workflow, trends in genome composition and amino acid usage are explored. In the second, changes in genome structure and the distribution of synonymous (Ks) and non-synonymous (Kn) substitution values are evaluated across species with different levels of evolutionary relatedness. In the third workflow, microsyntenic analyses of multigene families’ genomic organization are conducted using two Plasmodium-specific gene families (serine repeat antigen and cytoadherence-linked asexual gene) as models. In general, these example workflows show how to achieve quick, reproducible and shareable results using the CoGe platform. We were able to replicate previously published results, as well as leverage CoGe’s tools and resources to gain additional insight into various aspects of Plasmodium genome evolution. Our results highlight the usefulness of the CoGe platform, particularly in understanding complex features of genome evolution. Database URL: https://genomevolution.org/coge/
Genomic analysis reveals hidden biodiversity within colugos, the sister group to primates
Mason, Victor C.; Li, Gang; Minx, Patrick; Schmitz, Jürgen; Churakov, Gennady; Doronina, Liliya; Melin, Amanda D.; Dominy, Nathaniel J.; Lim, Norman T-L.; Springer, Mark S.; Wilson, Richard K.; Warren, Wesley C.; Helgen, Kristofer M.; Murphy, William J.
2016-01-01
Colugos are among the most poorly studied mammals despite their centrality to resolving supraordinal primate relationships. Two described species of these gliding mammals are the sole living members of the order Dermoptera, distributed throughout Southeast Asia. We generated a draft genome sequence for a Sunda colugo and a Philippine colugo reference alignment, and used these to identify colugo-specific genetic changes that were enriched in sensory and musculoskeletal-related genes that likely underlie their nocturnal and gliding adaptations. Phylogenomic analysis and catalogs of rare genomic changes overwhelmingly support the contested hypothesis that colugos are the sister group to primates (Primatomorpha), to the exclusion of treeshrews. We captured ~140 kb of orthologous sequence data from colugo museum specimens sampled across their range and identified large genetic differences between many geographically isolated populations that may result in a >300% increase in the number of recognized colugo species. Our results identify conservation units to mitigate future losses of this enigmatic mammalian order. PMID:27532052
Buti, Matteo; Sargent, Daniel J; Mhelembe, Khethani G; Delfino, Pietro; Tobutt, Kenneth R; Velasco, Riccardo
2016-05-11
The Rosaceae family encompasses numerous genera exhibiting morphological diversification in fruit types and plant habit as well as a wide variety of chromosome numbers. Comparative genomics between various Rosaceous genera has led to the hypothesis that the ancestral genome of the family contained nine chromosomes; however, the synteny studies performed in the Rosaceae to date encompass species with base chromosome numbers x = 7 (Fragaria), x = 8 (Prunus), and x = 17 (Malus), and no study has included species from one of the many Rosaceous genera with a base chromosome number of x = 9. A genetic linkage map of the species Physocarpus opulifolius (x = 9) was populated with sequence characterised SNP markers using genotyping by sequencing. This allowed, for the first time, the extent of genome diversification in a Rosaceous genus with a base chromosome number of x = 9 to be assessed. Orthologous loci distributed throughout the nine chromosomes of Physocarpus and the eight chromosomes of Prunus were identified, which permitted a meaningful comparison of the genomes of these two genera. The study revealed a high level of macro-synteny between the two genomes and relatively few chromosomal rearrangements, as has been observed in studies of other Rosaceous genomes, lending further support to a relatively simple model of genomic evolution in Rosaceae.
Legault, Michel
2015-01-01
The North-east American Rainbow smelt (Osmerus mordax) is composed of two glacial races first identified through the spatial distribution of two distinct mtDNA lineages. Contemporary breeding populations of smelt in the St. Lawrence estuary comprise contrasting mixtures of both lineages, suggesting that the two races came into secondary contact in this estuary. The overall objective of this study was to assess the role of intraspecific genetic admixture in the morphological diversification of the estuarine rainbow smelt population complex. The morphology of mixed-ancestry populations varied as a function of the relative contribution of the two races to estuarine populations, supporting the hypothesis of genetic admixture. Populations comprising both ancestral mtDNA races did not exhibit intermediate morphologies relative to pure populations but rather exhibited many traits that exceeded the parental trait values, consistent with the hypothesis of transgressive segregation. Evidence for genetic admixture at the level of the nuclear gene pool, however, provided only partial support for this hypothesis. Variation at nuclear AFLP markers revealed clear evidence of the two corresponding mtDNA glacial races. The admixture of the two races at the nuclear level is only pronounced in mixed-ancestry populations dominated by one of the mtDNA lineages, the same populations showing the greatest degree of morphological diversification and population structure. In contrast, mixed-ancestry populations dominated by the alternate mtDNA lineage showed little evidence of introgression of the nuclear genome, little morphological diversification and little contemporary population genetic structure. These results only partially support the hypothesis of transgressive segregation and may be the result of the differential effects of natural selection acting on admixed genomes from different sources. PMID:25856193
Genomic instability--an evolving hallmark of cancer.
Negrini, Simona; Gorgoulis, Vassilis G; Halazonetis, Thanos D
2010-03-01
Genomic instability is a characteristic of most cancers. In hereditary cancers, genomic instability results from mutations in DNA repair genes and drives cancer development, as predicted by the mutator hypothesis. In sporadic (non-hereditary) cancers the molecular basis of genomic instability remains unclear, but recent high-throughput sequencing studies suggest that mutations in DNA repair genes are infrequent before therapy, arguing against the mutator hypothesis for these cancers. Instead, the mutation patterns of the tumour suppressor TP53 (which encodes p53), ataxia telangiectasia mutated (ATM) and cyclin-dependent kinase inhibitor 2A (CDKN2A; which encodes p16INK4A and p14ARF) support the oncogene-induced DNA replication stress model, which attributes genomic instability and TP53 and ATM mutations to oncogene-induced DNA damage.
Role of the horizontal gene exchange in evolution of pathogenic Mycobacteria.
Reva, Oleg; Korotetskiy, Ilya; Ilin, Aleksandr
2015-01-01
Mycobacterium tuberculosis, the causative agent of tuberculosis, is one of the most dangerous human pathogens. Although this pathogen is considered extremely clonal and resistant to horizontal gene exchange, many facts support the hypothesis that, in the early stages of evolution, the development of pathogenicity in ancestral Mtb started with a horizontal acquisition of virulence factors. Episodes of infection caused by non-tuberculosis Mycobacteria reported worldwide may suggest a potential for new pathogens to appear. If so, what is the role of horizontal gene transfer in this process? Taking advantage of the availability of complete genome sequences of multiple pathogenic, conditionally pathogenic and saprophytic Mycobacteria, a comparative genomic study was performed to investigate the distribution of genomic islands among bacteria and identify ontological links between these mobile elements. It was shown that the ancient genomic islands from M. tuberculosis may still be rooted in the pool of mobile genetic vectors distributed among Mycobacteria. A frequent exchange of genes was observed between M. marinum and several saprophytic and conditionally pathogenic species. Among them, M. avium was the most promiscuous species, acquiring genetic material from diverse origins. Recent activation of genetic vectors circulating among Mycobacteria may lead to the emergence of new pathogens from environmental and conditionally pathogenic Mycobacteria. The species that require monitoring are M. marinum and M. avium, as they readily acquire genes from different sources and may become donors of virulence gene cassettes to other micro-organisms.
Choudoir, Mallory J; Buckley, Daniel H
2018-06-07
The latitudinal diversity gradient is a pattern of biogeography observed broadly in plants and animals but largely undocumented in terrestrial microbial systems. Although patterns of microbial biogeography across broad taxonomic scales have been described in a range of contexts, the mechanisms that generate biogeographic patterns between closely related taxa remain incompletely characterized. Adaptive processes are a major driver of microbial biogeography, but there is less understanding of how microbial biogeography and diversification are shaped by dispersal limitation and drift. We recently described a latitudinal diversity gradient of species richness and intraspecific genetic diversity in Streptomyces by using a geographically explicit culture collection. Within this geographically explicit culture collection, we have identified Streptomyces sister-taxa whose geographic distribution is delimited by latitude. These sister-taxa differ in geographic distribution, genomic diversity, and ecological traits despite having nearly identical SSU rRNA gene sequences. Comparative genomic analysis reveals genomic differentiation of these sister-taxa consistent with restricted gene flow across latitude. Furthermore, we show phylogenetic conservatism of thermal traits between the sister-taxa suggesting that thermal trait adaptation limits dispersal and gene flow across climate regimes as defined by latitude. Such phylogenetic conservatism of thermal traits is commonly associated with latitudinal diversity gradients for plants and animals. These data provide further support for the hypothesis that the Streptomyces latitudinal diversity gradient was formed as a result of historical demographic processes defined by dispersal limitation and driven by paleoclimate dynamics.
A global perspective on Campanulaceae: Biogeographic, genomic, and floral evolution.
Crowl, Andrew A; Miles, Nicholas W; Visger, Clayton J; Hansen, Kimberly; Ayers, Tina; Haberle, Rosemarie; Cellinese, Nico
2016-02-01
The Campanulaceae are a diverse clade of flowering plants encompassing more than 2300 species in myriad habitats from tropical rainforests to arctic tundra. A robust, multigene phylogeny, including all major lineages, is presented to provide a broad, evolutionary perspective of this cosmopolitan clade. We used a phylogenetic framework, in combination with divergence dating, ancestral range estimation, chromosome modeling, and morphological character reconstruction analyses to infer phylogenetic placement and timing of major biogeographic, genomic, and morphological changes in the history of the group and provide insights into the diversification of this clade across six continents. Ancestral range estimation supports an out-of-Africa diversification following the Cretaceous-Tertiary extinction event. Chromosomal modeling, with corroboration from the distribution of synonymous substitutions among gene duplicates, provides evidence for as many as 20 genome-wide duplication events before large radiations. Morphological reconstructions support the hypothesis that switches in floral symmetry and anther dehiscence were important in the evolution of secondary pollen presentation mechanisms. This study provides a broad, phylogenetic perspective on the evolution of the Campanulaceae clade. The remarkable habitat diversity and cosmopolitan distribution of this lineage appears to be the result of a complex history of genome duplications and numerous long-distance dispersal events. We failed to find evidence for an ancestral polyploidy event for this clade, and our analyses indicate an ancestral base number of nine for the group. This study will serve as a framework for future studies in diverse areas of research in Campanulaceae. © 2016 Botanical Society of America.
Radhakrishnan, Srihari; Literman, Robert; Mizoguchi, Beatriz; Valenzuela, Nicole
2017-01-01
DNA methylation alters gene expression but not DNA sequence and mediates some cases of phenotypic plasticity. Temperature-dependent sex determination (TSD) epitomizes phenotypic plasticity where environmental temperature drives embryonic sexual fate, as occurs commonly in turtles. Importantly, the temperature-specific transcription of two genes underlying gonadal differentiation is known to be induced by differential methylation in TSD fish, turtle and alligator. Yet how extensive the link between DNA methylation and TSD is remains unclear. Here we test for broad differences in genome-wide DNA methylation between male and female hatchling gonads of the TSD painted turtle Chrysemys picta using methyl DNA immunoprecipitation sequencing, to identify differentially methylated candidates for future study. We also examine the genome-wide nCpG distribution (which affects DNA methylation) in painted turtles and test for historic methylation in genes regulating vertebrate gonadogenesis. Turtle global methylation was consistent with other vertebrates (57% of the genome, 78% of all CpG dinucleotides). Numerous genes predicted to regulate turtle gonadogenesis exhibited sex-specific methylation and were proximal to methylated repeats. nCpG distribution predicted actual turtle DNA methylation and was bimodal in gene promoters (as in other vertebrates) and introns (unlike in other vertebrates). Differentially methylated genes, including regulators of sexual development, had lower nCpG content indicative of higher historic methylation. Ours is the first evidence suggesting that sexually dimorphic DNA methylation is pervasive in turtle gonads (perhaps mediated by repeat methylation) and that it targets numerous regulators of gonadal development, consistent with the hypothesis that it may regulate thermosensitive transcription in TSD vertebrates. 
However, further research during embryogenesis will help test this hypothesis and the alternative that instead, most differential methylation observed in hatchlings is the by-product of sexual differentiation and not its cause.
Genomic individuality and its biological implications.
Zhao, J
1996-06-01
It is a widely accepted fundamental concept that all somatic genomes of a human individual are identical to each other. The theoretical basis of this concept is that all of these somatic genomes are the descendants of the genome of a single fertilized cell as well as the simple replicated products of asexual reproduction, thus not forming any new recombined genomes. The question here is whether such a concept might only represent one side of somatic genome biology and, even worse, whether it has perhaps already led to a very prevalent misconception that within the organism body, there exists no variability among individual somatic genomes. A hypothesis, called genomic individuality, is proposed, simply saying that every individual somatic genome, perhaps with rare exceptions, has its own unique or individual 'genetic identity' or 'fingerprint', which is characterized by its distinctive sequences or patterns of deoxyribonucleic acid molecules, or both. Thus, no two somatic genomes can be identical to each other in every or all aspects, and consequently, there must be a great deal of genomic variation present within the body of any multicellular organism. The concept or hypothesis of genomic individuality would not only provide a more complete understanding of genome biology, but also suggest a new insight into the studies of the biology of cells and organisms.
Gu, Xun; Wang, Yufeng; Gu, Jianying
2002-06-01
The classical (two-round) hypothesis of vertebrate genome duplication proposes two successive whole-genome duplication(s) (polyploidizations) predating the origin of fishes, a view now being seriously challenged. As the debate largely concerns the relative merits of the 'big-bang mode' theory (large-scale duplication) and the 'continuous mode' theory (constant creation by small-scale duplications), we tested whether a significant proportion of paralogous genes in the contemporary human genome was indeed generated in the early stage of vertebrate evolution. After an extensive search of major databases, we dated 1,739 gene duplication events from the phylogenetic analysis of 749 vertebrate gene families. We found a pattern characterized by two waves (I, II) and an ancient component. Wave I represents a recent gene family expansion by tandem or segmental duplications, whereas wave II, a rapid paralogous gene increase in the early stage of vertebrate evolution, supports the idea of genome duplication(s) (the big-bang mode). Further analysis indicated that large- and small-scale gene duplications both make a significant contribution during the early stage of vertebrate evolution to build the current hierarchy of the human proteome.
Distance from sub-Saharan Africa predicts mutational load in diverse human genomes.
Henn, Brenna M; Botigué, Laura R; Peischl, Stephan; Dupanloup, Isabelle; Lipatov, Mikhail; Maples, Brian K; Martin, Alicia R; Musharoff, Shaila; Cann, Howard; Snyder, Michael P; Excoffier, Laurent; Kidd, Jeffrey M; Bustamante, Carlos D
2016-01-26
The Out-of-Africa (OOA) dispersal ∼ 50,000 y ago is characterized by a series of founder events as modern humans expanded into multiple continents. Population genetics theory predicts an increase of mutational load in populations undergoing serial founder effects during range expansions. To test this hypothesis, we have sequenced full genomes and high-coverage exomes from seven geographically divergent human populations from Namibia, Congo, Algeria, Pakistan, Cambodia, Siberia, and Mexico. We find that individual genomes vary modestly in the overall number of predicted deleterious alleles. We show via spatially explicit simulations that the observed distribution of deleterious allele frequencies is consistent with the OOA dispersal, particularly under a model where deleterious mutations are recessive. We conclude that there is a strong signal of purifying selection at conserved genomic positions within Africa, but that many predicted deleterious mutations have evolved as if they were neutral during the expansion out of Africa. Under a model where selection is inversely related to dominance, we show that OOA populations are likely to have a higher mutation load due to increased allele frequencies of nearly neutral variants that are recessive or partially recessive.
Negi, Pooja; Rai, Archana N.; Suprasanna, Penna
2016-01-01
The recognition of a positive correlation between an organism's genome size and its transposable element (TE) content represents a key discovery in the field of genome biology. Considerable evidence accumulated since then suggests the involvement of TEs in genome structure, evolution and function. The global genome reorganization brought about by transposon activity might play an adaptive/regulatory role in the host response to environmental challenges, reminiscent of McClintock's original ‘Controlling Element’ hypothesis. This regulatory aspect of TEs is also garnering support in light of recent evidence that projects TEs as “distributed genomic control modules.” According to this view, TEs are capable of actively reprogramming host gene circuits and ultimately fine-tuning the host response to specific environmental stimuli. Moreover, the stress-induced changes in epigenetic status of TE activity may allow TEs to propagate their stress-responsive elements to host genes; the resulting genome fluidity can permit phenotypic plasticity and adaptation to stress. Given their predominant presence in plant genomes, nested organization in the genic regions and potential regulatory role in stress response, TEs hold unexplored potential for crop improvement programs. This review intends to present the current information about the roles played by TEs in plant genome organization, evolution, and function and to highlight the regulatory mechanisms in plant stress responses. We will also briefly discuss the connection between TE activity, host epigenetic response and phenotypic plasticity as a critical link for traversing the translational bridge from a purely basic study of TEs to the applied field of stress adaptation and crop improvement. PMID:27777577
Romano, Stefano; Fernàndez-Guerra, Antonio; Reen, F. Jerry; Glöckner, Frank O.; Crowley, Susan P.; O'Sullivan, Orla; Cotter, Paul D.; Adams, Claire; Dobson, Alan D. W.; O'Gara, Fergal
2016-01-01
Strains of the Pseudovibrio genus have been detected worldwide, mainly as part of bacterial communities associated with marine invertebrates, particularly sponges. This recurrent association has been considered as an indication of a symbiotic relationship between these microbes and their host. Until recently, the availability of only two genomes, belonging to closely related strains, has limited the knowledge on the genomic and physiological features of the genus to a single phylogenetic lineage. Here we present 10 newly sequenced genomes of Pseudovibrio strains isolated from marine sponges from the west coast of Ireland, and including the other two publicly available genomes we performed an extensive comparative genomic analysis. Homogeneity was apparent in terms of both the orthologous genes and the metabolic features shared amongst the 12 strains. At the genomic level, a key physiological difference observed amongst the isolates was the presence only in strain P. axinellae AD2 of genes encoding proteins involved in assimilatory nitrate reduction, which was then proved experimentally. We then focused on studying those systems known to be involved in the interactions with eukaryotic and prokaryotic cells. This analysis revealed that the genus harbors a large diversity of toxin-like proteins, secretion systems and their potential effectors. Their distribution in the genus was not always consistent with the phylogenetic relationship of the strains. Finally, our analyses identified new genomic islands encoding potential toxin-immunity systems, previously unknown in the genus. Our analyses shed new light on the Pseudovibrio genus, indicating a large diversity of both metabolic features and systems for interacting with the host. 
The diversity in both distribution and abundance of these systems amongst the strains underlines how metabolically and phylogenetically similar bacteria may use different strategies to interact with the host and find a niche within its microbiota. Our data suggest the presence of a sponge-specific lineage of Pseudovibrio. The reduction in genome size and the loss of some systems potentially used to successfully enter the host, leads to the hypothesis that P. axinellae strain AD2 may be a lineage that presents an ancient association with the host and that may be vertically transmitted to the progeny. PMID:27065959
Origin of noncoding DNA sequences: molecular fossils of genome evolution
DOE Office of Scientific and Technical Information (OSTI.GOV)
Naora, H.; Miyahara, K.; Curnow, R.N.
The total amount of noncoding sequences on chromosomes of contemporary organisms varies significantly from species to species. The authors propose a hypothesis for the origin of these noncoding sequences that assumes that (i) an approx. 0.55-kilobase (kb)-long reading frame composed the primordial gene and (ii) a 20-kb-long single-stranded polynucleotide is the longest molecule (as a genome) that was polymerized at random and without a specific template in the primordial soup/cell. The statistical distribution of stop codons allows examination of the probability of generating reading frames of approx. 0.55 kb in this primordial polynucleotide. This analysis reveals that with three stop codons, a run of at least 0.55-kb equivalent length of nonstop codons would occur in 4.6% of 20-kb-long polynucleotide molecules. They attempt to estimate the total amount of noncoding sequences that would be present on the chromosomes of contemporary species assuming that present-day chromosomes retain the prototype primordial genome structure. Theoretical estimates thus obtained for most eukaryotes do not differ significantly from those reported for these specific organisms, with only a few exceptions. Furthermore, analysis of possible stop-codon distributions suggests that life on earth would not exist, at least in its present form, had two or four stop codons been selected early in evolution.
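The kind of run-length probability discussed in this abstract can be sketched as follows. The sketch is not the authors' calculation and rests on several assumptions: a single reading frame, all 64 codons equally likely and independent, 0.55 kb taken as 183 codons, and 20 kb taken as 6666 codons. The function name is hypothetical.

```python
def prob_run_at_least(n_codons, run_len, p_nonstop):
    """Probability that n_codons independent codons contain at least one
    run of run_len consecutive non-stop codons.

    Uses the standard recurrence for f(i), the probability of no such run
    in the first i codons:
        f(i) = 1                                   for i < run_len
        f(run_len) = 1 - p^run_len
        f(i) = f(i-1) - q * p^run_len * f(i-run_len-1)   for i > run_len
    where p = p_nonstop and q = 1 - p.
    """
    if run_len > n_codons:
        return 0.0
    q = 1.0 - p_nonstop
    p_run = p_nonstop ** run_len
    f = [1.0] * (n_codons + 1)
    f[run_len] = 1.0 - p_run
    for i in range(run_len + 1, n_codons + 1):
        f[i] = f[i - 1] - q * p_run * f[i - run_len - 1]
    return 1.0 - f[n_codons]


if __name__ == "__main__":
    # Assumed discretisation: 20 kb -> 6666 codons, 0.55 kb -> 183 codons.
    for n_stops in (2, 3, 4):
        p = (64 - n_stops) / 64
        print(n_stops, prob_run_at_least(6666, 183, p))
```

Changing the number of stop codons in the loop shows how sensitive the open-frame probability is to that choice, which is the comparison the abstract draws for two, three and four stop codons.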
Peltola, Tomi; Marttinen, Pekka; Vehtari, Aki
2012-01-01
High-dimensional datasets with large amounts of redundant information are nowadays available for hypothesis-free exploration of scientific questions. A particular case is genome-wide association analysis, where variations in the genome are searched for effects on disease or other traits. Bayesian variable selection has been demonstrated as a possible analysis approach, which can account for the multifactorial nature of the genetic effects in a linear regression model. Yet, the computation presents a challenge and application to large-scale data is not routine. Here, we study aspects of the computation using the Metropolis-Hastings algorithm for the variable selection: finite adaptation of the proposal distributions, multistep moves for changing the inclusion state of multiple variables in a single proposal and multistep move size adaptation. We also experiment with a delayed rejection step for the multistep moves. Results on simulated and real data show increase in the sampling efficiency. We also demonstrate that with application specific proposals, the approach can overcome a specific mixing problem in real data with 3822 individuals and 1,051,811 single nucleotide polymorphisms and uncover a variant pair with synergistic effect on the studied trait. Moreover, we illustrate multimodality in the real dataset related to a restrictive prior distribution on the genetic effect sizes and advocate a more flexible alternative. PMID:23166669
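As a point of reference for the sampler discussed in this abstract, here is a minimal sketch of plain Metropolis-Hastings over inclusion indicators. It uses only symmetric single-flip proposals and omits the adaptive proposals, multistep moves and delayed rejection that the authors study. The function names and the toy log-posterior are hypothetical stand-ins for a real model's marginal posterior.

```python
import math
import random


def mh_inclusion_frequencies(log_post, n_vars, n_iter, seed=0):
    """Single-flip Metropolis-Hastings over binary inclusion vectors.

    log_post maps a list of 0/1 indicators to an unnormalised log posterior.
    Returns the fraction of iterations in which each variable was included.
    """
    rng = random.Random(seed)
    gamma = [0] * n_vars
    current = log_post(gamma)
    counts = [0] * n_vars
    for _ in range(n_iter):
        j = rng.randrange(n_vars)          # pick one variable uniformly
        proposal = gamma.copy()
        proposal[j] = 1 - proposal[j]      # flip its inclusion state
        candidate = log_post(proposal)
        # Symmetric proposal, so the acceptance ratio is the posterior ratio.
        if candidate >= current or rng.random() < math.exp(candidate - current):
            gamma, current = proposal, candidate
        for i, g in enumerate(gamma):
            counts[i] += g
    return [c / n_iter for c in counts]


def toy_log_post(gamma, target=(1, 0, 1)):
    """Toy log posterior (illustrative only): penalise mismatches to a target."""
    return -10.0 * sum(g != t for g, t in zip(gamma, target))


if __name__ == "__main__":
    print(mh_inclusion_frequencies(toy_log_post, n_vars=3, n_iter=5000, seed=1))
```

With correlated predictors a single-flip chain can mix slowly, because swapping one variable for a correlated one requires passing through a low-probability state; proposals that change several indicators at once are the usual remedy.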
Overcoming Barriers to Progress in Exercise Genomics
Bouchard, Claude
2011-01-01
This commentary focuses on the issues of statistical power, the usefulness of hypothesis-free approaches such as in genome-wide association explorations, the necessity of expanding the research beyond common DNA variants, the advantage of combining transcriptomics with genomics, and the complexities inherent to the search for links between genotype and phenotype in exercise genomics research. PMID:21697717
Buu, Anne; Williams, L Keoki; Yang, James J
2018-03-01
We propose a new genome-wide association test for mixed binary and continuous phenotypes that uses an efficient numerical method to estimate the empirical distribution of Fisher's combination statistic under the null hypothesis. Our simulation study shows that the proposed method controls the type I error rate and also maintains its power at the level of the permutation method. More importantly, the computational efficiency of the proposed method is much higher than that of the permutation method. The simulation results also indicate that the power of the test increases when the genetic effect increases, the minor allele frequency increases, and the correlation between responses decreases. The statistical analysis of the database of the Study of Addiction: Genetics and Environment demonstrates that the proposed method combining multiple phenotypes can increase the power of identifying markers that might not otherwise be chosen using marginal tests.
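For orientation, Fisher's combination statistic itself is easy to compute. The sketch below shows only the textbook reference case in which the combined p-values are independent, so that the statistic follows a chi-square distribution with 2k degrees of freedom. The abstract's method exists precisely because mixed phenotypes are correlated and this reference distribution does not hold; the numerical estimation of the empirical null is not reproduced here, and the function name is hypothetical.

```python
import math

def fisher_combination(p_values):
    """Return Fisher's statistic X = -2 * sum(ln p_i) and its p-value,
    assuming the k input p-values are independent."""
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # Chi-square survival function for an even number of degrees of
    # freedom (2k): P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    tail = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
    return statistic, tail

# Two marginal p-values for one marker (illustrative numbers).
stat, p_combined = fisher_combination([0.05, 0.05])
```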
Abad-Grau, Mara M.; Medina-Medina, Nuria; Montes-Soldado, Rosana; Matesanz, Fuencisla; Bafna, Vineet
2012-01-01
Multimarker Transmission/Disequilibrium Tests (TDTs) are association tests that are very robust to population admixture and structure and may be used to identify susceptibility loci in genome-wide association studies. Multimarker TDTs using several markers may increase power by capturing high-degree associations. However, there is also a risk of spurious associations and power reduction due to the increase in degrees of freedom. In this study we show that associations found by tests built on simple null hypotheses are highly reproducible in a second independent data set regardless of the number of markers. As a test exhibiting this feature to its maximum, we introduce the multimarker 2-Groups TDT (mTDT(2G)), a test which, under the hypothesis of no linkage, asymptotically follows a χ² distribution with 1 degree of freedom regardless of the number of markers. The statistic requires the division of parental haplotypes into two groups: disease susceptibility and disease protective haplotype groups. We assessed the test behavior by performing an extensive simulation study as well as a real-data study using several data sets of two complex diseases. We show that the mTDT(2G) test is highly efficient and achieves the highest power among all the tests used, even when the null hypothesis is tested in a second independent data set. Therefore, mTDT(2G) turns out to be a very promising multimarker TDT to perform genome-wide searches for disease susceptibility loci that may be used as a preprocessing step in the construction of more accurate genetic models to predict individual susceptibility to complex diseases. PMID:22363405
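A test statistic with a chi-square null distribution on 1 degree of freedom can be illustrated with the classic single-marker TDT, which compares how often heterozygous parents transmit versus do not transmit an allele. The sketch below is that classic McNemar-type statistic, not the authors' multimarker two-group test, whose haplotype grouping procedure is not given in the abstract; the function name and the counts are hypothetical.

```python
import math

def tdt_statistic(transmitted, untransmitted):
    """Classic TDT: (b - c)^2 / (b + c), where b and c count transmissions
    and non-transmissions from heterozygous parents."""
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# 60 transmissions versus 40 non-transmissions (illustrative counts).
chi2, p_value = tdt_statistic(60, 40)
```

Because the two-group design collapses all haplotypes into a susceptibility group and a protective group, the same one-degree-of-freedom form applies however many markers define the haplotypes, which is the property the abstract emphasizes.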
Stem-Loop RNA Hairpins in Giant Viruses: Invading rRNA-Like Repeats and a Template Free RNA
Seligmann, Hervé; Raoult, Didier
2018-01-01
We examine the hypothesis that de novo template-free RNAs still form spontaneously, as they did at the origins of life, invade modern genomes, and contribute new genetic material. Previously, analyses of RNA secondary structures suggested that some RNAs resembling ancestral (t)RNAs formed recently de novo, while other parasitic sequences cluster with rRNAs. Here, positive control analyses of additional RNA secondary structures confirm the ancestral and de novo statuses of RNAs grouped according to secondary structure. Viroids with branched stems resemble de novo RNAs, whereas rod-shaped viroids resemble rRNA secondary structures, independently of GC content. 5′ UTR leading regions of West Nile and Dengue flavivirid viruses resemble de novo and rRNA structures, respectively. An RNA homologous with Megavirus, Dengue and West Nile genomes, copperhead snake microsatellites and levant cotton repeats, not templated by Mimivirus' genome, persists throughout Mimivirus' infection. Its secondary structure clusters with candidate de novo RNAs. The saltatory phyletic distribution and secondary structure of Mimivirus' peculiar RNA suggest occasional template-free polymerization of this sequence, rather than noncanonical transcriptions (swinger polymerization, posttranscriptional editing). PMID:29449833
3D genomics imposes evolution of the domain model of eukaryotic genome organization.
Razin, Sergey V; Vassetzky, Yegor S
2017-02-01
The hypothesis that the genome is composed of a patchwork of structural and functional domains (units) that may be either active or repressed was proposed almost 30 years ago. Here, we examine the evolution of the domain model of eukaryotic genome organization in view of the expansion of genome-scale techniques in the twenty-first century that have provided us with a wealth of information on genome organization, folding, and functioning.
Families of transposable elements, population structure and the origin of species.
Jurka, Jerzy; Bao, Weidong; Kojima, Kenji K
2011-09-19
Eukaryotic genomes harbor diverse families of repetitive DNA derived from transposable elements (TEs) that are able to replicate and insert into genomic DNA. The biological role of TEs remains unclear, although they have profound mutagenic impact on eukaryotic genomes and the origin of repetitive families often correlates with speciation events. We present a new hypothesis to explain the observed correlations based on classical concepts of population genetics. The main thesis presented in this paper is that the TE-derived repetitive families originate primarily by genetic drift in small populations derived mostly by subdivisions of large populations into subpopulations. We outline the potential impact of the emerging repetitive families on genetic diversification of different subpopulations, and discuss implications of such diversification for the origin of new species. Several testable predictions of the hypothesis are examined. First, we focus on the prediction that the number of diverse families of TEs fixed in a representative genome of a particular species positively correlates with the cumulative number of subpopulations (demes) in the historical metapopulation from which the species has emerged. Furthermore, we present evidence indicating that human AluYa5 and AluYb8 families might have originated in separate proto-human subpopulations. We also revisit prior evidence linking the origin of repetitive families to mammalian phylogeny and present additional evidence linking repetitive families to speciation based on mammalian taxonomy. Finally, we discuss evidence that mammalian orders represented by the largest numbers of species may be subject to relatively recent population subdivisions and speciation events. The hypothesis implies that subdivision of a population into small subpopulations is the major step in the origin of new families of TEs as well as of new species. 
The origin of new subpopulations is likely to be driven by the availability of new biological niches, consistent with the hypothesis of punctuated equilibria. The hypothesis also has implications for the ongoing debate on the role of genetic drift in genome evolution.
Gao, Yuan; Zhang, Yan; Yang, Xin; Qiu, Jian-Hua; Duan, Hong; Xu, Wen-Wen; Chang, Qiao-Cheng; Wang, Chun-Ren
2017-01-01
Equine strongyles, the significant nematode pathogens of horses, are characterized by high quantities and species abundance, but classification of this group of parasitic nematodes is debated. Mitochondrial (mt) genome DNA data are often used to address classification controversies. Thus, the objectives of this study were to determine the complete mt genomes of three Cyathostominae nematode species (Cyathostomum catinatum, Cylicostephanus minutus, and Poteriostomum imparidentatum) of horses and reconstruct the phylogenetic relationship of Strongylidae with other nematodes in Strongyloidea to test the hypothesis that Triodontophorus spp. belong to Cyathostominae using the mt genomes. The mt genomes of Cy. catinatum, Cs. minutus, and P. imparidentatum were 13,838, 13,826, and 13,817 bp in length, respectively. Complete mt nucleotide sequence comparison of all Strongylidae nematodes revealed that sequence identity ranged from 77.8 to 91.6%. The mt genome sequences of Triodontophorus species had relatively high identity with Cyathostominae nematodes, rather than Strongylus species of the same subfamily (Strongylinae). Comparative analyses of mt genome organization for Strongyloidea nematodes sequenced to date revealed that members of this superfamily possess identical gene arrangements. Phylogenetic analyses using mtDNA data indicated that the Triodontophorus species clustered with Cyathostominae species instead of Strongylus species. The present study first determined the complete mt genome sequences of Cy. catinatum, Cs. minutus, and P. imparidentatum, which will provide novel genetic markers for further studies of Strongylidae taxonomy, population genetics, and systematics. Importantly, sequence comparison and phylogenetic analyses based on mtDNA sequences supported the hypothesis that Triodontophorus belongs to Cyathostominae. PMID:28824575
Math, Renukaradhya K.; Jin, Hyun Mi; Kim, Jeong Myeong; Hahn, Yoonsoo; Park, Woojun; Madsen, Eugene L.; Jeon, Che Ok
2012-01-01
Alteromonas species are globally distributed copiotrophic bacteria in marine habitats. Among these, sea-tidal flats are distinctive: undergoing seasonal temperature and oxygen-tension changes, plus periodic exposure to petroleum hydrocarbons. Strain SN2 of the genus Alteromonas was isolated from hydrocarbon-contaminated sea-tidal flat sediment and has been shown to metabolize aromatic hydrocarbons there. Strain SN2's genomic features were analyzed bioinformatically and compared to those of Alteromonas macleodii ecotypes: AltDE and ATCC 27126. Strain SN2's genome differs from that of the other two strains in: size, average nucleotide identity value, tRNA genes, noncoding RNAs, dioxygenase gene content, signal transduction genes, and the degree to which genes collected during the Global Ocean Sampling project are represented. Patterns in genetic characteristics (e.g., GC content, GC skew, Karlin signature, CRISPR gene homology) indicate that strain SN2's genome architecture has been altered via horizontal gene transfer (HGT). Experiments proved that strain SN2 was far more cold tolerant, especially at 5°C, than the other two strains. Consistent with the HGT hypothesis, a total of 15 genomic islands in strain SN2 likely confer ecological fitness traits (especially membrane transport, aromatic hydrocarbon metabolism, and fatty acid biosynthesis) specific to the adaptation of strain SN2 to its seasonally cold sea-tidal flat habitat. PMID:22563400
Pingault, Lise; Choulet, Frédéric; Alberti, Adriana; Glover, Natasha; Wincker, Patrick; Feuillet, Catherine; Paux, Etienne
2015-02-10
Because of its size, allohexaploid nature, and high repeat content, the bread wheat genome is a good model to study the impact of the genome structure on gene organization, function, and regulation. However, because of the lack of a reference genome sequence, such studies have long been hampered and our knowledge of the wheat gene space is still limited. The access to the reference sequence of the wheat chromosome 3B provided us with an opportunity to study the wheat transcriptome and its relationships to genome and gene structure at a level that has never been reached before. By combining this sequence with RNA-seq data, we construct a fine transcriptome map of the chromosome 3B. More than 8,800 transcription sites are identified, that are distributed throughout the entire chromosome. Expression level, expression breadth, alternative splicing as well as several structural features of genes, including transcript length, number of exons, and cumulative intron length are investigated. Our analysis reveals a non-monotonic relationship between gene expression and structure and leads to the hypothesis that gene structure is determined by its function, whereas gene expression is subject to energetic cost. Moreover, we observe a recombination-based partitioning at the gene structure and function level. Our analysis provides new insights into the relationships between gene and genome structure and function. It reveals mechanisms conserved with other plant species as well as superimposed evolutionary forces that shaped the wheat gene space, likely participating in wheat adaptation.
Hjortø, L; Ettema, J F; Kargo, M; Sørensen, A C
2015-01-01
Until now, genomic information has mainly been used to improve the accuracy of genomic breeding values for breeding animals at a population level. However, we hypothesize that the use of information from genotyped females also opens up the possibility of reducing genetic lag in a dairy herd, especially if genomic tests are used in combination with sexed semen or a high management level for reproductive performance, because both factors provide the opportunity for generating a reproductive surplus in the herd. In this study, sexed semen is used in combination with beef semen to produce high-value crossbred beef calves. Thus, on average there is no surplus of and selection among replacement heifers whether to go into the herd or to be sold. In this situation, the selection opportunities arise when deciding which cows to inseminate with sexed semen, conventional semen, or beef semen. We tested the hypothesis by combining the results of 2 stochastic simulation programs, SimHerd and ADAM. SimHerd estimates the economic effect of different strategies for use of sexed semen and beef semen at 3 levels of reproductive performance in a dairy herd. Besides simulating the operational return, SimHerd also simulates the parity distribution of the dams of heifer calves. The ADAM program estimates genetic merit per year in a herd under different strategies for use of sexed semen and genomic tests. The annual net return per slot was calculated as the sum of operational return and value of genetic lag minus costs of genomic tests divided by the total number of slots. Our results showed that the use of genomic tests for decision making decreases genetic lag by as much as 0.14 genetic standard deviation units of the breeding goal and that genetic lag decreases even more (up to 0.30 genetic standard deviation units) when genomic tests are used in combination with strategies for increasing and using a reproductive surplus. Thus, our hypothesis was supported. 
We also observed that genomic tests are used most efficiently to decrease genetic lag when the genomic information is used more than once in the lifetime of an animal and when as many selection decisions as possible are based on genomic information. However, all breakeven prices were lower than or equal to €50, which is the current price of low-density chip genotyping in Denmark, Finland, and Sweden, so in the vast majority of cases, it is not profitable to genotype routinely for management purposes under the present price assumptions. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Schmidt, Thomas L; Rašić, Gordana; Zhang, Dongjing; Zheng, Xiaoying; Xi, Zhiyong; Hoffmann, Ary A
2017-10-01
Aedes albopictus is a highly invasive disease vector with an expanding worldwide distribution. Genetic assays using low to medium resolution markers have found little evidence of spatial genetic structure even at broad geographic scales, suggesting frequent passive movement along human transportation networks. Here we analysed genetic structure of Aedes albopictus collected from 12 sample sites in Guangzhou, China, using thousands of genome-wide single nucleotide polymorphisms (SNPs). We found evidence for passive gene flow, with distance from shipping terminals being the strongest predictor of genetic distance among mosquitoes. As further evidence of passive dispersal, we found multiple pairs of full-siblings distributed between two sample sites 3.7 km apart. After accounting for geographical variability, we also found evidence for isolation by distance, previously undetectable in Ae. albopictus. These findings demonstrate how large SNP datasets and spatially-explicit hypothesis testing can be used to decipher processes at finer geographic scales than formerly possible. Our approach can be used to help predict new invasion pathways of Ae. albopictus and to refine strategies for vector control that involve the transformation or suppression of mosquito populations.
Grams, Vanessa; Wellmann, Robin; Preuß, Siegfried; Grashorn, Michael A; Kjaer, Jörgen B; Bessei, Werner; Bennewitz, Jörn
2015-09-30
Feather pecking (FP) in laying hens is a well-known and multi-factorial behaviour with a genetic background. In a selection experiment, two lines were developed for 11 generations for high (HFP) and low (LFP) feather pecking, respectively. Starting with the second generation of selection, there was a constant difference in mean number of FP bouts between both lines. We used the data from this experiment to perform a quantitative genetic analysis and to map selection signatures. Pedigree and phenotypic data were available for the last six generations of both lines. Univariate quantitative genetic analyses were conducted using mixed linear and generalized mixed linear models assuming a Poisson distribution. Selection signatures were mapped using 33,228 single nucleotide polymorphisms (SNPs) genotyped on 41 HFP and 34 LFP individuals of generation 11. For each SNP, we estimated Wright's fixation index (FST). We tested the null hypothesis that FST is driven purely by genetic drift against the alternative hypothesis that it is driven by genetic drift and selection. The mixed linear model failed to analyze the LFP data because of the large number of 0s in the observation vector. The Poisson model fitted the data well and revealed a small but continuous genetic trend in both lines. Most of the 17 genome-wide significant SNPs were located on chromosomes 3 and 4. Thirteen clusters with at least two significant SNPs within an interval of 3 Mb maximum were identified. Two clusters were mapped on chromosomes 3, 4, 8 and 19. Of the 17 genome-wide significant SNPs, 12 were located within the identified clusters. This indicates a non-random distribution of significant SNPs and points to the presence of selection sweeps. Data on FP should be analysed using generalised linear mixed models assuming a Poisson distribution, especially if the number of FP bouts is small and the distribution is heavily peaked at 0. 
The FST-based approach was suitable to map selection signatures that need to be confirmed by linkage or association mapping.
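The per-SNP fixation index used to map selection signatures can be sketched from allele frequencies in the two lines. The version below uses the textbook heterozygosity-based definition FST = (HT - HS) / HT for a biallelic SNP with equally weighted populations; the abstract does not state which estimator or sample-size correction the authors applied, so this is an assumption, and the function name and frequencies are hypothetical.

```python
def fst_biallelic(p1, p2):
    """Wright's FST for one biallelic SNP from the allele frequencies
    p1 and p2 in two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)            # expected heterozygosity, pooled
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0                                    # SNP is monomorphic overall
    return (h_total - h_within) / h_total

# Allele frequency 0.8 in one line and 0.2 in the other (illustrative).
fst = fst_biallelic(0.8, 0.2)
```

Testing whether an observed value exceeds what drift alone would produce, as the study does, additionally requires a drift model for the selection lines, which is not shown here.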
Vitorino, Luciana Cristina; Lima-Ribeiro, Matheus S; Terribile, Levi Carina; Collevatti, Rosane G
2016-10-13
We studied the phylogeography and demographical history of Tabebuia serratifolia (Bignoniaceae) to understand the disjunct geographical distribution of South American seasonally dry tropical forests (SDTFs). We specifically tested if the multiple and isolated patches of SDTFs are current climatic relicts of a widespread and continuously distributed dry forest during the last glacial maximum (LGM), the so called South American dry forest refugia hypothesis, using ecological niche modelling (ENM) and statistical phylogeography. We sampled 235 individuals of T. serratifolia in 17 populations in Brazil and analysed the polymorphisms at three intergenic chloroplast regions and ITS nuclear ribosomal DNA. Coalescent analyses showed a demographical expansion at the last c. 130 ka (thousand years before present). Simulations and ENM also showed that the current spatial pattern of genetic diversity is most likely due to a scenario of range expansion and range shift towards the Amazon Basin during the colder and arid climatic conditions associated with the LGM, matching the expected for the South American dry forest refugia hypothesis, although contrasting to the Pleistocene Arc hypothesis. Populations in more stable areas or with higher suitability through time showed higher genetic diversity. Postglacial range shift towards the Southeast and Atlantic coast may have led to spatial genome assortment due to leading edge colonization as the species tracks suitable environments, leading to lower genetic diversity in populations at higher distance from the distribution centroid at 21 ka. Haplotype sharing or common ancestry among populations from Caatinga in Northeast Brazil, Atlantic Forest in Southeast and Cerrado biome and ENM evince the past connection among these biomes.
Phase distribution of spliceosomal introns: implications for intron origin
Nguyen, Hung D; Yoshihama, Maki; Kenmochi, Naoya
2006-01-01
Background: The origin of spliceosomal introns is the central subject of the introns-early versus introns-late debate. The distribution of intron phases is non-uniform, with an excess of phase-0 introns. Introns-early explains this by speculating that a fraction of present-day introns were present between minigenes in the progenote and therefore must lie in phase-0. In contrast, introns-late predicts that the nonuniformity of intron phase distribution reflects the nonrandomness of intron insertions. Results: In this paper, we tested the two theories using analyses of intron phase distribution. We inferred the evolution of intron phase distribution from a dataset of 684 gene orthologs from seven eukaryotes using a maximum likelihood method. We also tested whether the observed intron phase distributions from 10 eukaryotes can be explained by intron insertions on a genome-wide scale. In contrast to the prediction of introns-early, the inferred evolution of intron phase distribution showed that the proportion of phase-0 introns increased over evolution. Consistent with introns-late, the observed intron phase distributions matched those predicted by an intron insertion model quite well. Conclusion: Our results strongly support the introns-late hypothesis of the origin of spliceosomal introns. PMID:16959043
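The starting observation of this abstract, that intron phases are not uniformly distributed, can be checked with a chi-square goodness-of-fit test against equal frequencies of phases 0, 1 and 2. The sketch below covers only that uniformity check; it is not the maximum likelihood inference or the insertion model the authors used, and the function name and counts are hypothetical.

```python
import math

def phase_uniformity_test(counts):
    """Chi-square goodness-of-fit of intron counts in phases 0, 1 and 2
    against a uniform distribution (2 degrees of freedom)."""
    if len(counts) != 3:
        raise ValueError("expected counts for phases 0, 1 and 2")
    expected = sum(counts) / 3.0
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    # For 2 degrees of freedom the chi-square survival function is exp(-x/2).
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value

# An excess of phase-0 introns (illustrative counts).
chi2, p_value = phase_uniformity_test([150, 75, 75])
```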
Mapping cis-Regulatory Domains in the Human Genome Using Multi-Species Conservation of Synteny
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahituv, Nadav; Prabhakar, Shyam; Poulin, Francis
2005-06-13
Our inability to associate distant regulatory elements with the genes that they regulate has largely precluded their examination for sequence alterations contributing to human disease. One major obstacle is the large genomic space surrounding targeted genes in which such elements could potentially reside. In order to delineate gene regulatory boundaries we used whole-genome human-mouse-chicken (HMC) and human-mouse-frog (HMF) multiple alignments to compile conserved blocks of synteny (CBS), under the hypothesis that these blocks have been kept intact throughout evolution at least in part by the requirement of regulatory elements to stay linked to the genes that they regulate. A total of 2,116 and 1,942 CBS >200 kb were assembled for HMC and HMF, respectively, encompassing 1.53 and 0.86 Gb of human sequence. To support the existence of complex long-range regulatory domains within these CBS we analyzed the prevalence and distribution of chromosomal aberrations leading to position effects (disruption of a gene's regulatory environment), observing a clear bias not only for mapping onto CBS but also for longer CBS size. Our results provide a genome-wide data set characterizing the regulatory domains of genes and the conserved regulatory elements within them.
Comparative and evolutionary studies of vertebrate ALDH1A-like genes and proteins.
Holmes, Roger S
2015-06-05
Vertebrate ALDH1A-like genes encode cytosolic enzymes capable of metabolizing all-trans-retinaldehyde to retinoic acid which is a molecular 'signal' guiding vertebrate development and adipogenesis. Bioinformatic analyses of vertebrate and invertebrate genomes were undertaken using known ALDH1A1, ALDH1A2 and ALDH1A3 amino acid sequences. Comparative analyses of the corresponding human genes provided evidence for distinct modes of gene regulation and expression with putative transcription factor binding sites (TFBS), CpG islands and micro-RNA binding sites identified for the human genes. ALDH1A-like sequences were identified for all mammalian, bird, lizard and frog genomes examined, whereas fish genomes displayed a more restricted distribution pattern for ALDH1A1 and ALDH1A3 genes. The ALDH1A1 gene was absent in many bony fish genomes examined, with the ALDH1A3 gene also absent in the medaka and tilapia genomes. Multiple ALDH1A1-like genes were identified in mouse, rat and marsupial genomes. Vertebrate ALDH1A1, ALDH1A2 and ALDH1A3 subunit sequences were highly conserved throughout vertebrate evolution. Comparative amino acid substitution rates showed that mammalian ALDH1A2 sequences were more highly conserved than for the ALDH1A1 and ALDH1A3 sequences. Phylogenetic studies supported an hypothesis for ALDH1A2 as a likely primordial gene originating in invertebrate genomes and undergoing sequential gene duplication to generate two additional genes, ALDH1A1 and ALDH1A3, in most vertebrate genomes. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Ginkgo biloba's footprint of dynamic Pleistocene history dates back only 390,000 years ago.
Hohmann, Nora; Wolf, Eva M; Rigault, Philippe; Zhou, Wenbin; Kiefer, Markus; Zhao, Yunpeng; Fu, Cheng-Xin; Koch, Marcus A
2018-04-27
At the end of the Pliocene and the beginning of Pleistocene glaciation and deglaciation cycles Ginkgo biloba went extinct all over the world, and only a few populations remained in China in relict areas serving as sanctuary for Tertiary relict trees. Yet the status of these regions as refuge areas with naturally existing populations was established only about a decade ago. Herein we elaborated the hypothesis that during the Pleistocene cooling periods G. biloba expanded its distribution range in China repeatedly. Whole plastid genomes were sequenced, assembled and annotated, and sequence data was analyzed in a phylogenetic framework of the entire gymnosperms to establish a robust spatio-temporal framework for gymnosperms and in particular for G. biloba Pleistocene evolutionary history. Using a phylogenetic approach, we identified that Ginkgoatae stem group age is about 325 million years, whereas crown group radiation of extant Ginkgo started not earlier than 390,000 years ago. During repeated warming phases, Ginkgo populations were separated and isolated by contraction of distribution range and retreated into mountainous regions serving as refuge for warm-temperate deciduous forests. Diversification and phylogenetic splits correlate with the onset of cooling phases when Ginkgo expanded its distribution range and gene pools merged. Analysis of whole plastid genome sequence data representing the entire spatio-temporal genetic variation of wild extant Ginkgo populations revealed the deepest temporal footprint dating back to approximately 390,000 years ago. Present-day directional West-East admixture of genetic diversity is shown to be the result of pronounced effects of the last cooling period. Our evolutionary framework will serve as a conceptual roadmap for forthcoming genomic sequence data, which can then provide deep insights into the demographic history of Ginkgo.
Are algal genes in nonphotosynthetic protists evidence of historical plastid endosymbioses?
Stiller, John W; Huang, Jinling; Ding, Qin; Tian, Jing; Goodwillie, Carol
2009-10-20
How photosynthetic organelles, or plastids, were acquired by diverse eukaryotes is among the most hotly debated topics in broad scale eukaryotic evolution. The history of plastid endosymbioses commonly is interpreted under the "chromalveolate" hypothesis, which requires numerous plastid losses from certain heterotrophic groups that now are entirely aplastidic. In this context, discoveries of putatively algal genes in plastid-lacking protists have been cited as evidence of gene transfer from a photosynthetic endosymbiont that subsequently was lost completely. Here we examine this evidence, as it pertains to the chromalveolate hypothesis, through genome-level statistical analyses of similarity scores from queries with two diatoms, Phaeodactylum tricornutum and Thalassiosira pseudonana, and two aplastidic sister taxa, Phytophthora ramorum and P. sojae. Contingency tests of specific predictions of the chromalveolate model find no evidence for an unusual red algal contribution to Phytophthora genomes, nor that putative cyanobacterial sequences that are present entered these genomes through a red algal endosymbiosis. Examination of genes unrelated to plastid function provide extraordinarily significant support for both of these predictions in diatoms, the control group where a red endosymbiosis is known to have occurred, but none of that support is present in genes specifically conserved between diatoms and oomycetes. In addition, we uncovered a strong association between overall sequence similarities among taxa and relative sizes of genomic data sets in numbers of genes. Signal from "algal" genes in oomycete genomes is inconsistent with the chromalveolate hypothesis, and better explained by alternative models of sequence and genome evolution. 
Combined with the numerous sources of intragenomic phylogenetic conflict characterized previously, our results underscore the potential to be misled by a posteriori interpretations of variable phylogenetic signals contained in complex genome-level data. They argue strongly for explicit testing of the different a priori assumptions inherent in competing evolutionary hypotheses.
A phylogenetic blueprint for a modern whale.
Gatesy, John; Geisler, Jonathan H; Chang, Joseph; Buell, Carl; Berta, Annalisa; Meredith, Robert W; Springer, Mark S; McGowen, Michael R
2013-02-01
The emergence of Cetacea in the Paleogene represents one of the most profound macroevolutionary transitions within Mammalia. The move from a terrestrial habitat to a committed aquatic lifestyle engendered wholesale changes in anatomy, physiology, and behavior. The results of this remarkable transformation are extant whales that include the largest, biggest brained, fastest swimming, loudest, deepest diving mammals, some of which can detect prey with a sophisticated echolocation system (Odontoceti - toothed whales), and others that batch feed using racks of baleen (Mysticeti - baleen whales). A broad-scale reconstruction of the evolutionary remodeling that culminated in extant cetaceans has not yet been based on integration of genomic and paleontological information. Here, we first place Cetacea relative to extant mammalian diversity, and assess the distribution of support among molecular datasets for relationships within Artiodactyla (even-toed ungulates, including Cetacea). We then merge trees derived from three large concatenations of molecular and fossil data to yield a composite hypothesis that encompasses many critical events in the evolutionary history of Cetacea. By combining diverse evidence, we infer a phylogenetic blueprint that outlines the stepwise evolutionary development of modern whales. This hypothesis represents a starting point for more detailed, comprehensive phylogenetic reconstructions in the future, and also highlights the synergistic interaction between modern (genomic) and traditional (morphological+paleontological) approaches that ultimately must be exploited to provide a rich understanding of evolutionary history across the entire tree of Life. Copyright © 2012 Elsevier Inc. All rights reserved.
Transcontinental Phylogeography of the Daphnia pulex Species Complex
Costanzo, Katie S.; Taylor, Derek J.
2012-01-01
Daphnia pulex is quickly becoming an attractive model species in the field of ecological genomics due to the recent release of its complete genome sequence, a wide variety of new genomic resources, and a rich history of ecological data. Sequences of the mitochondrial NADH dehydrogenase subunit 5 and cytochrome c oxidase subunit 1 genes were used to assess the global phylogeography of this species, and to further elucidate its phylogenetic relationship to other members of the Daphnia pulex species complex. Using both newly acquired and previously published data, we analyzed 398 individuals from collections spanning five continents. Eleven strongly supported lineages were found within the D. pulex complex, and one lineage in particular, panarctic D. pulex, has very little phylogeographical structure and a near worldwide distribution. Mismatch distribution, haplotype network, and population genetic analyses are compatible with a North American origin for this lineage and subsequent spatial expansion in the Late Pleistocene. In addition, our analyses suggest that dispersal between North and South America of this and other species in the D. pulex complex has occurred multiple times, and is predominantly from north to south. Our results provide additional support for the evolutionary relationships of the eleven main mitochondrial lineages of the D. pulex complex. We found that the well-studied panarctic D. pulex is present on every continent except Australia and Antarctica. Despite being geographically very widespread, there is a lack of strong regionalism in the mitochondrial genomes of panarctic D. pulex – a pattern that differs from that of most studied cladocerans. Moreover, our analyses suggest recent expansion of the panarctic D. pulex lineage, with some continents sharing haplotypes. The hypothesis that hybrid asexuality has contributed to the recent and unusual geographic success of the panarctic D. pulex lineage warrants further study. PMID:23056371
Transcontinental phylogeography of the Daphnia pulex species complex.
Crease, Teresa J; Omilian, Angela R; Costanzo, Katie S; Taylor, Derek J
2012-01-01
Daphnia pulex is quickly becoming an attractive model species in the field of ecological genomics due to the recent release of its complete genome sequence, a wide variety of new genomic resources, and a rich history of ecological data. Sequences of the mitochondrial NADH dehydrogenase subunit 5 and cytochrome c oxidase subunit 1 genes were used to assess the global phylogeography of this species, and to further elucidate its phylogenetic relationship to other members of the Daphnia pulex species complex. Using both newly acquired and previously published data, we analyzed 398 individuals from collections spanning five continents. Eleven strongly supported lineages were found within the D. pulex complex, and one lineage in particular, panarctic D. pulex, has very little phylogeographical structure and a near worldwide distribution. Mismatch distribution, haplotype network, and population genetic analyses are compatible with a North American origin for this lineage and subsequent spatial expansion in the Late Pleistocene. In addition, our analyses suggest that dispersal between North and South America of this and other species in the D. pulex complex has occurred multiple times, and is predominantly from north to south. Our results provide additional support for the evolutionary relationships of the eleven main mitochondrial lineages of the D. pulex complex. We found that the well-studied panarctic D. pulex is present on every continent except Australia and Antarctica. Despite being geographically very widespread, there is a lack of strong regionalism in the mitochondrial genomes of panarctic D. pulex--a pattern that differs from that of most studied cladocerans. Moreover, our analyses suggest recent expansion of the panarctic D. pulex lineage, with some continents sharing haplotypes. The hypothesis that hybrid asexuality has contributed to the recent and unusual geographic success of the panarctic D. pulex lineage warrants further study.
Armijos Jaramillo, Vinicio Danilo; Vargas, Walter Alberto; Sukno, Serenella Ana; Thon, Michael R.
2013-01-01
The genus Colletotrichum contains a large number of phytopathogenic fungi that cause enormous economic losses around the world. The effect of horizontal gene transfer (HGT) has not yet been studied in these organisms. Inter-Kingdom HGT into fungal genomes has been reported in the past but knowledge about the HGT between plants and fungi is particularly limited. We describe a gene in the genome of several species of the genus Colletotrichum with a strong resemblance to subtilisins typically found in plant genomes. Subtilisins are an important group of serine proteases, widely distributed in all of the kingdoms of life. Our hypothesis is that the gene was acquired by Colletotrichum spp. through HGT from plants to a Colletotrichum ancestor. We provide evidence to support this hypothesis in the form of phylogenetic analyses as well as a characterization of the similarity of the subtilisin at the primary, secondary and tertiary structural levels. The remarkable level of structural conservation of Colletotrichum plant-like subtilisin (CPLS) with plant subtilisins and the differences with the rest of Colletotrichum subtilisins suggest the possibility of molecular mimicry. Our phylogenetic analysis indicates that the HGT event would have occurred approximately 150–155 million years ago, after the divergence of the Colletotrichum lineage from other fungi. Gene expression analysis shows that the gene is modulated during the infection of maize by C. graminicola, suggesting that it has a role in plant disease. Furthermore, the upregulation of the CPLS coincides with the downregulation of several plant genes encoding subtilisins. Based on the known roles of subtilisins in plant pathogenic fungi and the gene expression pattern that we observed, we postulate that the CPLSs have an important role in plant infection. PMID:23554975
Armijos Jaramillo, Vinicio Danilo; Vargas, Walter Alberto; Sukno, Serenella Ana; Thon, Michael R
2013-01-01
The genus Colletotrichum contains a large number of phytopathogenic fungi that cause enormous economic losses around the world. The effect of horizontal gene transfer (HGT) has not yet been studied in these organisms. Inter-Kingdom HGT into fungal genomes has been reported in the past but knowledge about the HGT between plants and fungi is particularly limited. We describe a gene in the genome of several species of the genus Colletotrichum with a strong resemblance to subtilisins typically found in plant genomes. Subtilisins are an important group of serine proteases, widely distributed in all of the kingdoms of life. Our hypothesis is that the gene was acquired by Colletotrichum spp. through HGT from plants to a Colletotrichum ancestor. We provide evidence to support this hypothesis in the form of phylogenetic analyses as well as a characterization of the similarity of the subtilisin at the primary, secondary and tertiary structural levels. The remarkable level of structural conservation of Colletotrichum plant-like subtilisin (CPLS) with plant subtilisins and the differences with the rest of Colletotrichum subtilisins suggest the possibility of molecular mimicry. Our phylogenetic analysis indicates that the HGT event would have occurred approximately 150-155 million years ago, after the divergence of the Colletotrichum lineage from other fungi. Gene expression analysis shows that the gene is modulated during the infection of maize by C. graminicola, suggesting that it has a role in plant disease. Furthermore, the upregulation of the CPLS coincides with the downregulation of several plant genes encoding subtilisins. Based on the known roles of subtilisins in plant pathogenic fungi and the gene expression pattern that we observed, we postulate that the CPLSs have an important role in plant infection.
Genomic Imprinting and the Expression of Affect in Angelman Syndrome: What's in the Smile?
ERIC Educational Resources Information Center
Oliver, Chris; Horsler, Kate; Berg, Katy; Bellamy, Gail; Dick, Katie; Griffiths, Emily
2007-01-01
Background: Kinship theory (or the genomic conflict hypothesis) proposes that the phenotypic effects of genomic imprinting arise from conflict between paternally and maternally inherited alleles. A prediction arising for social behaviour from this theory is that imbalance in this conflict resulting from a deletion of a maternally imprinted gene,…
GIGGLE: a search engine for large-scale integrated genome analysis.
Layer, Ryan M; Pedersen, Brent S; DiSera, Tonya; Marth, Gabor T; Gertz, Jason; Quinlan, Aaron R
2018-02-01
GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.
GIGGLE: a search engine for large-scale integrated genome analysis
Layer, Ryan M; Pedersen, Brent S; DiSera, Tonya; Marth, Gabor T; Gertz, Jason; Quinlan, Aaron R
2018-01-01
GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation. PMID:29309061
Ajmal, Wajya; Khan, Hiba; Abbasi, Amir Ali
2014-12-01
Understanding the genetic mechanisms underlying the organismal complexity and origin of novelties during vertebrate history is one of the central goals of evolutionary biology. Ohno (1970) was the first to postulate that whole genome duplications (WGD) have played a vital role in the evolution of new gene functions: permitting an increase in morphological, physiological and anatomical complexity during early vertebrate history. Here, we analyze the evolutionary history of human FGFR-bearing paralogon (human autosome 4/5/8/10) by the phylogenetic analysis of multigene families with triplicate and quadruplicate distribution on these chromosomes. Our results categorized the histories of 21 families into discrete co-duplicated groups. Genes of a particular co-duplicated group exhibit identical evolutionary history and have duplicated in concert with each other, whereas genes belonging to different groups have dissimilar histories and have not duplicated concurrently. Taken together with our previously published data, we submit that there is sufficient empirical evidence to disprove the 1R/2R hypothesis and to support the general prediction that vertebrate genome evolved by relatively small-scale, regional duplication events that spread across the history of life. Copyright © 2014 Elsevier Inc. All rights reserved.
Evolution and the complexity of bacteriophages.
Serwer, Philip
2007-03-13
The genomes of both long-genome (> 200 Kb) bacteriophages and long-genome eukaryotic viruses have cellular gene homologs whose selective advantage is not explained. These homologs add genomic and possibly biochemical complexity. Understanding their significance requires a definition of complexity that is more biochemically oriented than past empirically based definitions. Initially, I propose two biochemistry-oriented definitions of complexity: either decreased randomness or increased encoded information that does not serve immediate needs. Then, I make the assumption that these two definitions are equivalent. This assumption and recent data lead to the following four-part hypothesis that explains the presence of cellular gene homologs in long bacteriophage genomes and also provides a pathway for complexity increases in prokaryotic cells: (1) Prokaryotes underwent evolutionary increases in biochemical complexity after the eukaryote/prokaryote splits. (2) Some of the complexity increases occurred via multi-step, weak selection that was both protected from strong selection and accelerated by embedding evolving cellular genes in the genomes of bacteriophages and, presumably, also archaeal viruses (first tier selection). (3) The mechanisms for retaining cellular genes in viral genomes evolved under additional, longer-term selection that was stronger (second tier selection). (4) The second tier selection was based on increased access by prokaryotic cells to improved biochemical systems. This access was achieved when DNA transfer moved to prokaryotic cells both the more evolved genes and their more competitive and complex biochemical systems. 
I propose testing this hypothesis by controlled evolution in microbial communities to (1) determine the effects of deleting individual cellular gene homologs on the growth and evolution of long genome bacteriophages and hosts, (2) find the environmental conditions that select for the presence of cellular gene homologs, (3) determine which, if any, bacteriophage genes were selected for maintaining the homologs and (4) determine the dynamics of homolog evolution. This hypothesis is an explanation of evolutionary leaps in general. If accurate, it will assist both understanding and influencing the evolution of microbes and their communities. Analysis of evolutionary complexity increase for at least prokaryotes should include analysis of genomes of long-genome bacteriophages.
Gompert, Zachariah; Lucas, Lauren K; Nice, Chris C; Fordyce, James A; Forister, Matthew L; Buerkle, C Alex
2012-07-01
Speciation is the process by which reproductively isolated lineages arise, and is one of the fundamental means by which the diversity of life increases. Whereas numerous studies have documented an association between ecological divergence and reproductive isolation, relatively little is known about the role of natural selection in genome divergence during the process of speciation. Here, we use genome-wide DNA sequences and Bayesian models to test the hypothesis that loci under divergent selection between two butterfly species (Lycaeides idas and L. melissa) also affect fitness in an admixed population. Locus-specific measures of genetic differentiation between L. idas and L. melissa and genomic introgression in hybrids varied across the genome. The most differentiated genetic regions were characterized by elevated L. idas ancestry in the admixed population, which occurs in L. idas-like habitat, consistent with the hypothesis that local adaptation contributes to speciation. Moreover, locus-specific measures of genetic differentiation (a metric of divergent selection) were positively associated with extreme genomic introgression (a metric of hybrid fitness). Interestingly, concordance of differentiation and introgression was only partial. We discuss multiple, complementary explanations for this partial concordance. © 2012 The Author(s).
Karro, J E; Peifer, M; Hardison, R C; Kollmann, M; von Grünberg, H H
2008-02-01
The distribution of guanine and cytosine nucleotides throughout a genome, or the GC content, is associated with numerous features in mammals; understanding the pattern and evolutionary history of GC content is crucial to our efforts to annotate the genome. The local GC content is decaying toward an equilibrium point, but the causes and rates of this decay, as well as the value of the equilibrium point, remain topics of debate. By comparing the results of 2 methods for estimating local substitution rates, we identify 620 Mb of the human genome in which the rates of the various types of nucleotide substitutions are the same on both strands. These strand-symmetric regions show an exponential decay of local GC content at a pace determined by local substitution rates. DNA segments subjected to higher rates experience disproportionately accelerated decay and are AT rich, whereas segments subjected to lower rates decay more slowly and are GC rich. Although we are unable to draw any conclusions about causal factors, the results support the hypothesis proposed by Khelifi A, Meunier J, Duret L, and Mouchiroud D (2006. GC content evolution of the human and mouse genomes: insights from the study of processed pseudogenes in regions of different recombination rates. J Mol Evol. 62:745-752.) that the isochore structure has been reshaped over time. If rate variation were a determining factor, then the current isochore structure of mammalian genomes could result from the local differences in substitution rates. We predict that under current conditions strand-symmetric portions of the human genome will stabilize at an average GC content of 30% (considerably less than the current 42%), thus confirming that the human genome has not yet reached equilibrium.
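The exponential decay toward an equilibrium GC content described in the abstract above can be illustrated with a deliberately simplified two-state model. This is only a sketch under stated assumptions, not the authors' estimation method: the rates u (GC to AT) and v (AT to GC) and both function names are hypothetical, and the real analysis distinguishes several substitution types. Under the model dG/dt = v(1 - G) - uG, the equilibrium is v/(u + v) and the deviation from it shrinks at rate u + v.

```python
import math


def gc_equilibrium(u, v):
    """Equilibrium GC fraction of a two-state model.

    u: hypothetical rate of GC -> AT substitutions (per site, per unit time)
    v: hypothetical rate of AT -> GC substitutions (per site, per unit time)
    """
    return v / (u + v)


def gc_at_time(gc0, u, v, t):
    """GC fraction after time t, starting from gc0.

    Solves dG/dt = v * (1 - G) - u * G, whose solution decays
    exponentially toward the equilibrium at rate (u + v).
    """
    eq = gc_equilibrium(u, v)
    return eq + (gc0 - eq) * math.exp(-(u + v) * t)


if __name__ == "__main__":
    # Illustrative rates chosen so that the equilibrium is 30%,
    # with a starting GC content of 42% as quoted in the abstract.
    for t in (0, 1, 5, 20):
        print(t, round(gc_at_time(0.42, 0.7, 0.3, t), 4))
```

In this toy model a larger u + v means faster decay, which matches the qualitative statement that segments with higher substitution rates approach the AT-rich equilibrium sooner.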
de Souza, Flávio S.J.; Franchini, Lucía F.; Rubinstein, Marcelo
2013-01-01
Transposable elements (TEs) are mobile genetic sequences that can jump around the genome from one location to another, behaving as genomic parasites. TEs have been particularly effective in colonizing mammalian genomes, and such heavy TE load is expected to have conditioned genome evolution. Indeed, studies conducted both at the gene and genome levels have uncovered TE insertions that seem to have been co-opted, or exapted, by providing transcription factor binding sites (TFBSs) that serve as promoters and enhancers, leading to the hypothesis that TE exaptation is a major factor in the evolution of gene regulation. Here, we critically review the evidence for exaptation of TE-derived sequences as TFBSs, promoters, enhancers, and silencers/insulators both at the gene and genome levels. We classify the functional impact attributed to TE insertions into four categories of increasing complexity and argue that so far very few studies have conclusively demonstrated exaptation of TEs as transcriptional regulatory regions. We also contend that many genome-wide studies dealing with TE exaptation in recent lineages of mammals are still inconclusive and that the hypothesis of rapid transcriptional regulatory rewiring mediated by TE mobilization must be taken with caution. Finally, we suggest experimental approaches that may help attribute higher-order functions to candidate exapted TEs. PMID:23486611
PISMA: A Visual Representation of Motif Distribution in DNA Sequences.
Alcántara-Silva, Rogelio; Alvarado-Hermida, Moisés; Díaz-Contreras, Gibrán; Sánchez-Barrios, Martha; Carrera, Samantha; Galván, Silvia Carolina
2017-01-01
Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypotheses, PISMA aims at identifying motifs on DNA sequences, counting and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code-like scheme, a gene-map-like scheme, and a transcript scheme. We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another analysis of motifs associated with exosome RNA secretion. PISMA is developed in Java; it is executable in any type of hardware and in diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf.
PISMA: A Visual Representation of Motif Distribution in DNA Sequences
Alcántara-Silva, Rogelio; Alvarado-Hermida, Moisés; Díaz-Contreras, Gibrán; Sánchez-Barrios, Martha; Carrera, Samantha; Galván, Silvia Carolina
2017-01-01
Background: Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypotheses, PISMA aims at identifying motifs on DNA sequences, counting and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code–like scheme, a gene-map–like scheme, and a transcript scheme. Results: We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another analysis of motifs associated with exosome RNA secretion. Availability and Implementation: PISMA is developed in Java; it is executable in any type of hardware and in diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf. PMID:28469418
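The kind of motif counting and bar-code-like display described in this abstract can be sketched in a few lines. The following is an illustration only and is not PISMA's code (PISMA itself is a Java program): the function names find_motif and barcode are hypothetical, and only the limits taken from the abstract (motifs of 2 to 10 bases, sequences up to 10 kb) are enforced.

```python
def find_motif(seq, motif):
    """Return 0-based start positions of every (possibly overlapping)
    occurrence of `motif` in `seq`.

    The size limits follow the abstract: motif of 2-10 bases,
    sequence of at most 10 kb. Everything else is an assumption.
    """
    seq, motif = seq.upper(), motif.upper()
    if not 2 <= len(motif) <= 10:
        raise ValueError("motif length must be between 2 and 10 bases")
    if len(seq) > 10_000:
        raise ValueError("sequence must be at most 10 kb")
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        # Advance by one base so overlapping hits are also reported.
        start = seq.find(motif, start + 1)
    return positions


def barcode(seq, motif):
    """Text 'bar code': '|' where a motif occurrence starts, '.' elsewhere."""
    hits = set(find_motif(seq, motif))
    return "".join("|" if i in hits else "." for i in range(len(seq)))


if __name__ == "__main__":
    sequence = "ACGCGTCG"
    print(find_motif(sequence, "CG"))  # start positions of CpG sites
    print(barcode(sequence, "CG"))
```

Reporting overlapping hits is a design choice made here for simplicity; a tool that counts non-overlapping motifs would advance by the motif length instead.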
Genome Surfing As Driver of Microbial Genomic Diversity.
Choudoir, Mallory J; Panke-Buisse, Kevin; Andam, Cheryl P; Buckley, Daniel H
2017-08-01
Historical changes in population size, such as those caused by demographic range expansions, can produce nonadaptive changes in genomic diversity through mechanisms such as gene surfing. We propose that demographic range expansion of a microbial population capable of horizontal gene exchange can result in genome surfing, a mechanism that can cause widespread increase in the pan-genome frequency of genes acquired by horizontal gene exchange. We explain that patterns of genetic diversity within Streptomyces are consistent with genome surfing, and we describe several predictions for testing this hypothesis both in Streptomyces and in other microorganisms. Copyright © 2017 Elsevier Ltd. All rights reserved.
Extensive Local Gene Duplication and Functional Divergence among Paralogs in Atlantic Salmon
Warren, Ian A.; Ciborowski, Kate L.; Casadei, Elisa; Hazlerigg, David G.; Martin, Sam; Jordan, William C.; Sumner, Seirian
2014-01-01
Many organisms can generate alternative phenotypes from the same genome, enabling individuals to exploit diverse and variable environments. A prevailing hypothesis is that such adaptation has been favored by gene duplication events, which generate redundant genomic material that may evolve divergent functions. Vertebrate examples of recent whole-genome duplications are sparse although one example is the salmonids, which have undergone a whole-genome duplication event within the last 100 Myr. The life-cycle of the Atlantic salmon, Salmo salar, depends on the ability to produce alternating phenotypes from the same genome, to facilitate migration and maintain its anadromous life history. Here, we investigate the hypothesis that genome-wide and local gene duplication events have contributed to the salmonid adaptation. We used high-throughput sequencing to characterize the transcriptomes of three key organs involved in regulating migration in S. salar: Brain, pituitary, and olfactory epithelium. We identified over 10,000 undescribed S. salar sequences and designed an analytic workflow to distinguish between paralogs originating from local gene duplication events or from whole-genome duplication events. These data reveal that substantial local gene duplications took place shortly after the whole-genome duplication event. Many of the identified paralog pairs have either diverged in function or become noncoding. Future functional genomics studies will reveal to what extent this rich source of divergence in genetic sequence is likely to have facilitated the evolution of extreme phenotypic plasticity required for an anadromous life-cycle. PMID:24951567
Neutral aggregation in finite-length genotype space
NASA Astrophysics Data System (ADS)
Houchmandzadeh, Bahram
2017-01-01
The advent of modern genome sequencing techniques allows for a more stringent test of the neutrality hypothesis of Darwinian evolution, where all individuals have the same fitness. Using the individual-based model of Wright and Fisher, we compute the amplitude of neutral aggregation in the genome space, i.e., the probability of finding two individuals at genetic (Hamming) distance k as a function of the genome size L, population size N, and mutation probability per base ν. In well-mixed populations, we show that for Nν < 1/L, neutral aggregation is the dominant force and most individuals are found at short genetic distances from each other. For Nν > 1, on the contrary, individuals are randomly dispersed in genome space. The results are extended to a geographically dispersed population, where the controlling parameter is shown to be a combination of mutation and migration probability. The theory we develop can be used to test the neutrality hypothesis in various ecological and evolutionary systems.
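The quantity studied in this abstract, the distribution of pairwise Hamming distances in a neutral Wright-Fisher population, can be explored numerically with a small simulation. This is a minimal sketch under simplifying assumptions and not the paper's analytical calculation: genomes are binary strings (a hypothetical simplification), the population is well mixed, and the function names are illustrative.

```python
import random
from itertools import combinations


def wright_fisher(n, length, nu, generations, seed=0):
    """Simulate a neutral Wright-Fisher population.

    n: population size (N in the abstract)
    length: genome size (L), here a binary string by assumption
    nu: mutation probability per base per generation
    Every offspring copies a uniformly chosen parent (equal fitness)
    and then flips each base independently with probability nu.
    """
    rng = random.Random(seed)
    pop = [[0] * length for _ in range(n)]
    for _ in range(generations):
        new_pop = []
        for _ in range(n):
            parent = rng.choice(pop)
            child = [b ^ 1 if rng.random() < nu else b for b in parent]
            new_pop.append(child)
        pop = new_pop
    return pop


def hamming_histogram(pop):
    """counts[k] = number of pairs of individuals at Hamming distance k."""
    counts = [0] * (len(pop[0]) + 1)
    for a, b in combinations(pop, 2):
        k = sum(x != y for x, y in zip(a, b))
        counts[k] += 1
    return counts


if __name__ == "__main__":
    # Compare a low-mutation and a high-mutation regime for N = 50, L = 20.
    for nu in (0.0005, 0.05):
        pop = wright_fisher(n=50, length=20, nu=nu, generations=500)
        print(nu, hamming_histogram(pop))
```

Running both regimes side by side gives a rough feel for the contrast the abstract draws between clustered and dispersed populations, although a single finite run is noisy and should not be read as a test of the stated thresholds.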
Telles, Guilherme P; Araújo, Graziela S; Walter, Maria E M T; Brigido, Marcelo M; Almeida, Nalvo F
2018-05-16
In phylogenetic reconstruction the result is a tree where all taxa are leaves and internal nodes are hypothetical ancestors. In a live phylogeny, both ancestral and living taxa may coexist, leading to a tree where internal nodes may be living taxa. The well-known Neighbor-Joining heuristic is widely used for phylogenetic reconstruction. We present Live Neighbor-Joining, a heuristic for building a live phylogeny. We have investigated Live Neighbor-Joining on datasets of viral genomes, a plausible scenario for its application, which allowed the construction of alternative hypotheses for the relationships among viruses that embrace both ancestral and descendant taxa. We also applied Live Neighbor-Joining to a set of bacterial genomes and to sets of images and texts. Non-biological data may be better explored visually when their relationship in terms of content similarity is represented by means of a phylogeny. Our experiments have shown interesting alternative phylogenetic hypotheses for RNA virus genomes, bacterial genomes and alternative relationships among images and texts, illustrating a wide range of scenarios where Live Neighbor-Joining may be used.
Ancient genomic changes associated with domestication of the horse.
Librado, Pablo; Gamba, Cristina; Gaunitz, Charleen; Der Sarkissian, Clio; Pruvost, Mélanie; Albrechtsen, Anders; Fages, Antoine; Khan, Naveed; Schubert, Mikkel; Jagannathan, Vidhya; Serres-Armero, Aitor; Kuderna, Lukas F K; Povolotskaya, Inna S; Seguin-Orlando, Andaine; Lepetz, Sébastien; Neuditschko, Markus; Thèves, Catherine; Alquraishi, Saleh; Alfarhan, Ahmed H; Al-Rasheid, Khaled; Rieder, Stefan; Samashev, Zainolla; Francfort, Henri-Paul; Benecke, Norbert; Hofreiter, Michael; Ludwig, Arne; Keyser, Christine; Marques-Bonet, Tomas; Ludes, Bertrand; Crubézy, Eric; Leeb, Tosso; Willerslev, Eske; Orlando, Ludovic
2017-04-28
The genomic changes underlying both early and late stages of horse domestication remain largely unknown. We examined the genomes of 14 early domestic horses from the Bronze and Iron Ages, dating to between ~4.1 and 2.3 thousand years before present. We find early domestication selection patterns supporting the neural crest hypothesis, which provides a unified developmental origin for common domestic traits. Within the past 2.3 thousand years, horses lost genetic diversity and archaic DNA tracts introgressed from a now-extinct lineage. They accumulated deleterious mutations later than expected under the cost-of-domestication hypothesis, probably because of breeding from limited numbers of stallions. We also reveal that Iron Age Scythian steppe nomads implemented breeding strategies involving no detectable inbreeding and selection for coat-color variation and robust forelimbs. Copyright © 2017, American Association for the Advancement of Science.
Improved Statistical Methods Enable Greater Sensitivity in Rhythm Detection for Genome-Wide Data
Hutchison, Alan L.; Maienschein-Cline, Mark; Chiang, Andrew H.; Tabei, S. M. Ali; Gudjonson, Herman; Bahroos, Neil; Allada, Ravi; Dinner, Aaron R.
2015-01-01
Robust methods for identifying patterns of expression in genome-wide data are important for generating hypotheses regarding gene function. To this end, several analytic methods have been developed for detecting periodic patterns. We improve one such method, JTK_CYCLE, by explicitly calculating the null distribution such that it accounts for multiple hypothesis testing and by including non-sinusoidal reference waveforms. We term this method empirical JTK_CYCLE with asymmetry search, and we compare its performance to JTK_CYCLE with Bonferroni and Benjamini-Hochberg multiple hypothesis testing correction, as well as to five other methods: cyclohedron test, address reduction, stable persistence, ANOVA, and F24. We find that ANOVA, F24, and JTK_CYCLE consistently outperform the other three methods when data are limited and noisy; empirical JTK_CYCLE with asymmetry search gives the greatest sensitivity while controlling for the false discovery rate. Our analysis also provides insight into experimental design and we find that, for a fixed number of samples, better sensitivity and specificity are achieved with higher numbers of replicates than with higher sampling density. Application of the methods to detecting circadian rhythms in a metadataset of microarrays that quantify time-dependent gene expression in whole heads of Drosophila melanogaster reveals annotations that are enriched among genes with highly asymmetric waveforms. These include a wide range of oxidation reduction and metabolic genes, as well as genes with transcripts that have multiple splice forms. PMID:25793520
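The two standard multiple-testing corrections that the abstract uses as baselines, Bonferroni and Benjamini-Hochberg, can be written compactly. The sketch below shows only those textbook corrections and is not the empirical JTK_CYCLE procedure itself, which computes its null distribution explicitly; the function names are illustrative.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    For the i-th smallest p-value (rank i of m), the adjusted value is the
    minimum of p_j * m / j over all ranks j >= i, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum
    # enforces monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.005]
    print(bonferroni(pvals))
    print(benjamini_hochberg(pvals))
```

Bonferroni controls the family-wise error rate and is the more conservative of the two, whereas Benjamini-Hochberg controls the false discovery rate, which is the criterion the abstract refers to when comparing sensitivity.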
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhengqiu, C.; Penaflor, C.; Kuehl, J.V.
2006-06-01
The magnoliids represent the largest basal angiosperm clade with four orders, 19 families and 8,500 species. Although several recent angiosperm molecular phylogenies have supported the monophyly of magnoliids and suggested relationships among the orders, the limited number of genes examined resulted in only weak support, and these issues remain controversial. Furthermore, considerable incongruence has resulted in phylogenies supporting three different sets of relationships among magnoliids and the two large angiosperm clades, monocots and eudicots. This is one of the most important remaining issues concerning relationships among basal angiosperms. We sequenced the chloroplast genomes of three magnoliids, Drimys (Canellales), Liriodendron (Magnoliales), and Piper (Piperales), and used these data in combination with 32 other completed angiosperm chloroplast genomes to assess phylogenetic relationships among magnoliids. The Drimys and Piper chloroplast genomes are nearly identical in size at 160,606 and 160,624 bp, respectively. The genomes include a pair of inverted repeats of 26,649 bp (Drimys) and 27,039 bp (Piper), separated by a small single copy region of 18,621 bp (Drimys) and 18,878 bp (Piper) and a large single copy region of 88,685 bp (Drimys) and 87,666 bp (Piper). The gene order of both taxa is nearly identical to many other unrearranged angiosperm chloroplast genomes, including Calycanthus, the other published magnoliid genome. Comparisons of angiosperm chloroplast genomes indicate that GC content is not uniformly distributed across the genome. Overall GC content ranges from 34-39%, and coding regions have a substantially higher GC content than non-coding regions (both intergenic spacers and introns). Among protein-coding genes, GC content varies by codon position (1st > 2nd > 3rd position), and it varies by functional group, with photosynthetic genes having the highest percentage and NADH genes the lowest. Across the genome, GC content is highest in the inverted repeat due to the presence of rRNA genes and lowest in the small single copy region where most NADH genes are located. Phylogenetic analyses using maximum parsimony and maximum likelihood methods were performed on DNA sequences of 61 protein-coding genes. Trees from both analyses provided strong support for the monophyly of magnoliids and two strongly supported groups were identified, the Canellales/Piperales and the Laurales/Magnoliales. The phylogenies also provided moderate to strong support for the basal position of Amborella, and a sister relationship of magnoliids to a clade that includes monocots and eudicots. The complete sequences of three magnoliid chloroplast genomes provide new data from the largest basal angiosperm clade. Evolutionary comparisons of these new genome sequences, combined with other published angiosperm genomes, confirm that GC content is unevenly distributed across the genome by location, codon position, and functional group. Furthermore, phylogenetic analyses provide the strongest support so far for the hypothesis that the magnoliids are sister to a large clade that includes both monocots and eudicots.
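The per-codon-position GC comparison described in this abstract can be illustrated with a short sketch. It assumes an in-frame coding sequence given as a plain string; the function name and the toy sequence in the usage note are hypothetical and are not data from the study.

```python
def gc_by_codon_position(cds):
    """Return the GC fraction at codon positions 1, 2 and 3 of an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence shorter than one codon")
    counts = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            counts[i % 3] += 1  # i % 3 is the 0-based codon position
    return tuple(c / n_codons for c in counts)
```

For the toy sequence "GCAGCTGAT" (codons GCA, GCT, GAT) the function returns fractions of 1, 2/3 and 0 for positions 1, 2 and 3, the same ordering of positions as the trend reported above.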
Šmarda, Petr; Bureš, Petr; Horová, Lucie
2007-01-01
Background and Aims The spatial and statistical distribution of genome sizes and the adaptivity of genome size to some types of habitat, vegetation or microclimatic conditions were investigated in a tetraploid population of Festuca pallens. The population was previously documented to vary highly in genome size and is taken as a model for the study of the initial stages of genome size differentiation. Methods Using DAPI flow cytometry, samples were measured repeatedly with diploid Festuca pallens as the internal standard. Altogether 172 plants from 57 plots (2·25 m²), distributed in contrasting habitats over the whole locality in South Moravia, Czech Republic, were sampled. The differences in DNA content were confirmed by the double peaks of simultaneously measured samples. Key Results At maximum, a 1·115-fold difference in genome size was observed. The statistical distribution of genome sizes was found to be continuous and to best fit the extreme value (Gumbel) distribution with rare occurrences of extremely large genomes (positively skewed), similar to the log-normal distribution of genome sizes across the Angiosperms as a whole. Even plants from the same plot frequently varied considerably in genome size, and the spatial distribution of genome sizes was generally random and unautocorrelated (P > 0·05). The observed spatial pattern and the overall lack of correlations of genome size with recognized vegetation types or microclimatic conditions indicate the absence of ecological adaptivity of genome size in the studied population. Conclusions These experimental data on intraspecific genome size variability in Festuca pallens argue for the absence of natural selection and the selective non-significance of genome size in the initial stages of genome size differentiation, and corroborate the current hypothetical model of genome size evolution in Angiosperms (Bennetzen et al., 2005, Annals of Botany 95: 127–132). PMID:17565968
Díaz-Jaimes, Píndaro; Bayona-Vásquez, Natalia J.; Adams, Douglas H.; Uribe-Alcocer, Manuel
2015-01-01
Elasmobranchs are one of the most diverse groups in the marine realm, represented by 18 orders, 55 families and about 1200 reported species, but also one of the most vulnerable to exploitation and to climate change. Phylogenetic relationships among the main orders have been controversial since the emergence of the Hypnosqualean hypothesis by Shirai (1992), which considered batoids as a sister group of sharks. The use of the complete mitochondrial DNA (mtDNA) may help to further validate this hypothesis by increasing the number of informative characters. We report the mtDNA genome of the bonnethead shark Sphyrna tiburo, and compare it with the mitogenomes of 48 other species to assess phylogenetic relationships. The mtDNA genome of S. tiburo is quite similar in size to that of congeneric species and also to the reported mtDNA genomes of other Carcharhinidae species. Like most vertebrate mitochondrial genomes, it contains 13 protein-coding genes, two rRNA genes, 22 tRNA genes and a control region (D-loop) of 1086 bp. The Bayesian analysis of the 49 mitogenomes supported the view that sharks and batoids are separate groups. PMID:27014583
Jabaily, Rachel S; Shepherd, Kelly A; Michener, Pryce S; Bush, Caroline J; Rivero, Rodrigo; Gardner, Andrew G; Sessa, Emily B
2018-05-15
Goodeniaceae is a primarily Australian flowering plant family with a complex taxonomy and evolutionary history. Previous phylogenetic analyses have successfully resolved the backbone topology of the largest clade in the family, Goodenia s.l., but have failed to clarify relationships within the species-rich and enigmatic Goodenia clade C, a prerequisite for taxonomic revision of the group. We used genome skimming to retrieve sequences for chloroplast, mitochondrial, and nuclear markers for 24 taxa representing Goodenia s.l., with a particular focus on Goodenia clade C. We performed extensive hypothesis tests to explore incongruence in clade C and evaluate statistical support for clades within this group, using datasets from all three genomic compartments. The mitochondrial dataset is comparable to the chloroplast dataset in providing resolution within Goodenia clade C, though backbone support values within this clade remain low. The hypothesis tests provided an additional, complementary means of evaluating support for clades. We propose that the major subclades of Goodenia clade C (C1-C3 + Verreauxia) are the result of a rapid radiation, and each represents a distinct lineage. Copyright © 2018. Published by Elsevier Inc.
Direct-to-Consumer Racial Admixture Tests and Beliefs About Essential Racial Differences
Phelan, Jo C.; Link, Bruce G.; Zelner, Sarah; Yang, Lawrence H.
2015-01-01
Although at first relatively disinterested in race, modern genomic research has increasingly turned attention to racial variations. We examine a prominent example of this focus—direct-to-consumer racial admixture tests—and ask how information about the methods and results of these tests in news media may affect beliefs in racial differences. The reification hypothesis proposes that by emphasizing a genetic basis for race, thereby reifying race as a biological reality, the tests increase beliefs that whites and blacks are essentially different. The challenge hypothesis suggests that by describing differences between racial groups as continua rather than sharp demarcations, the results produced by admixture tests break down racial categories and reduce beliefs in racial differences. A nationally representative survey experiment (N = 526) provided clear support for the reification hypothesis. The results suggest that an unintended consequence of the genomic revolution may be to reinvigorate age-old beliefs in essential racial differences. PMID:25870464
Hologenomics: Systems-Level Host Biology.
Theis, Kevin R
2018-01-01
The hologenome concept of evolution is a hypothesis explaining host evolution in the context of the host microbiomes. As a hypothesis, it needs to be evaluated, especially with respect to the extent of fidelity of transgenerational coassociation of host and microbial lineages and the relative fitness consequences of repeated associations within natural holobiont populations. Behavioral ecologists are in a prime position to test these predictions because they typically focus on animal phenotypes that are quantifiable, conduct studies over multiple generations within natural animal populations, and collect metadata on genetic relatedness and relative reproductive success within these populations. Regardless of the conclusion on the hologenome concept as an evolutionary hypothesis, a hologenomic perspective has applied value as a systems-level framework for host biology, including in medicine. Specifically, it emphasizes investigating the multivarious and dynamic interactions between patient genomes and the genomes of their diverse microbiota when attempting to elucidate etiologies of complex, noninfectious diseases.
The evolution of WRKY transcription factors.
Rinerson, Charles I; Rabara, Roel C; Tripathi, Prateek; Shen, Qingxi J; Rushton, Paul J
2015-02-27
The availability of increasing numbers of sequenced genomes has necessitated a re-evaluation of the evolution of the WRKY transcription factor family. Modern day plants descended from a charophyte green alga that colonized the land between 430 and 470 million years ago. The first charophyte genome sequence from Klebsormidium flaccidum filled a gap in the available genome sequences in the plant kingdom between unicellular green algae that typically have 1-3 WRKY genes and mosses that contain 30-40. WRKY genes have been previously found in non-plant species but their occurrence has been difficult to explain. Only two WRKY genes are present in the Klebsormidium flaccidum genome and the presence of a Group IIb gene was unexpected because it had previously been thought that Group IIb WRKY genes first appeared in mosses. We found WRKY transcription factor genes outside of the plant lineage in some diplomonads, social amoebae, fungi incertae sedis, and amoebozoa. This patchy distribution suggests that lateral gene transfer is responsible. These lateral gene transfer events appear to pre-date the formation of the WRKY groups in flowering plants. Flowering plants contain proteins with domains typical for both resistance (R) proteins and WRKY transcription factors. R protein-WRKY genes have evolved numerous times in flowering plants, each type being restricted to specific flowering plant lineages. These chimeric proteins contain not only novel combinations of protein domains but also novel combinations and numbers of WRKY domains. Once formed, R protein WRKY genes may combine different components of signalling pathways that may either create new diversity in signalling or accelerate signalling by short circuiting signalling pathways. We propose that the evolution of WRKY transcription factors includes early lateral gene transfers to non-plant organisms and the occurrence of algal WRKY genes that have no counterparts in flowering plants. 
We propose two alternative hypotheses of WRKY gene evolution: The "Group I Hypothesis" sees all WRKY genes evolving from Group I C-terminal WRKY domains. The alternative "IIa + b Separate Hypothesis" sees Groups IIa and IIb evolving directly from a single domain algal gene separate from the Group I-derived lineage.
Worldwide patterns of genomic variation and admixture in gray wolves
Fan, Zhenxin; Silva, Pedro; Gronau, Ilan; Wang, Shuoguo; Armero, Aitor Serres; Schweizer, Rena M.; Ramirez, Oscar; Pollinger, John; Galaverni, Marco; Ortega Del-Vecchyo, Diego; Du, Lianming; Zhang, Wenping; Zhang, Zhihe; Xing, Jinchuan; Vilà, Carles; Marques-Bonet, Tomas; Godinho, Raquel; Yue, Bisong; Wayne, Robert K.
2016-01-01
The gray wolf (Canis lupus) is a widely distributed top predator and ancestor of the domestic dog. To address questions about wolf relationships to each other and dogs, we assembled and analyzed a data set of 34 canine genomes. The divergence between New and Old World wolves is the earliest branching event and is followed by the divergence of Old World wolves and dogs, confirming that the dog was domesticated in the Old World. However, no single wolf population is more closely related to dogs, supporting the hypothesis that dogs were derived from an extinct wolf population. All extant wolves have a surprisingly recent common ancestry and experienced a dramatic population decline beginning at least ∼30 thousand years ago (kya). We suggest this crisis was related to the colonization of Eurasia by modern human hunter–gatherers, who competed with wolves for limited prey but also domesticated them, leading to a compensatory population expansion of dogs. We found extensive admixture between dogs and wolves, with up to 25% of Eurasian wolf genomes showing signs of dog ancestry. Dogs have influenced the recent history of wolves through admixture and vice versa, potentially enhancing adaptation. Simple scenarios of dog domestication are confounded by admixture, and studies that do not take admixture into account with specific demographic models are problematic. PMID:26680994
Nisa-Martínez, Rafael; Laporte, Philippe; Jiménez-Zurdo, José Ignacio; Frugier, Florian; Crespi, Martin; Toro, Nicolás
2013-01-01
Some bacterial group II introns are widely used for genetic engineering in bacteria, because they can be reprogrammed to insert into the desired DNA target sites. There is considerable interest in developing this group II intron gene targeting technology for use in eukaryotes, but nuclear genomes present several obstacles to the use of this approach. The nuclear genomes of eukaryotes do not contain group II introns, but these introns are thought to have been the progenitors of nuclear spliceosomal introns. We investigated the expression and subcellular localization of the bacterial RmInt1 group II intron-encoded protein (IEP) in Arabidopsis thaliana protoplasts. Following the expression of translational fusions of the wild-type protein and several mutant variants with EGFP, the full-length IEP was found exclusively in the nucleolus, whereas the maturase domain alone targeted EGFP to nuclear speckles. The distribution of the bacterial RmInt1 IEP in plant cell protoplasts suggests that the compartmentalization of eukaryotic cells into nucleus and cytoplasm does not prevent group II introns from invading the host genome. Furthermore, the trafficking of the IEP between the nucleolus and the speckles upon maturase inactivation is consistent with the hypothesis that the spliceosomal machinery evolved from group II introns. PMID:24391881
Joost, Stéphane; Vuilleumier, Séverine; Jensen, Jeffrey D; Schoville, Sean; Leempoel, Kevin; Stucki, Sylvie; Widmer, Ivo; Melodelima, Christelle; Rolland, Jonathan; Manel, Stéphanie
2013-07-01
A workshop recently held at the École Polytechnique Fédérale de Lausanne (EPFL, Switzerland) was dedicated to understanding the genetic basis of adaptive change, taking stock of the different approaches developed in theoretical population genetics and landscape genomics and bringing together knowledge accumulated in both research fields. Indeed, an important challenge in theoretical population genetics is to incorporate effects of demographic history and population structure. But important design problems (e.g. focus on populations as units, focus on hard selective sweeps, no hypothesis-based framework in the design of the statistical tests) reduce their capability of detecting adaptive genetic variation. In parallel, landscape genomics offers a solution to several of these problems and provides a number of advantages (e.g. fast computation, landscape heterogeneity integration). But the approach makes several implicit assumptions that should be carefully considered (e.g. selection has had enough time to create a functional relationship between the allele distribution and the environmental variable, or this functional relationship is assumed to be constant). To address the respective strengths and weaknesses mentioned above, the workshop brought together a panel of experts from both disciplines to present their work and discuss the relevance of combining these approaches, possibly resulting in a joint software solution in the future.
Williams, L. Keoki; Buu, Anne
2017-01-01
We propose a multivariate genome-wide association test for mixed continuous, binary, and ordinal phenotypes. A latent response model is used to estimate the correlation between phenotypes with different measurement scales so that the empirical distribution of the Fisher’s combination statistic under the null hypothesis is estimated efficiently. The simulation study shows that our proposed correlation estimation methods have high levels of accuracy. More importantly, our approach conservatively estimates the variance of the test statistic so that the type I error rate is controlled. The simulation also shows that the proposed test maintains the power at the level very close to that of the ideal analysis based on known latent phenotypes while controlling the type I error. In contrast, conventional approaches–dichotomizing all observed phenotypes or treating them as continuous variables–could either reduce the power or employ a linear regression model unfit for the data. Furthermore, the statistical analysis on the database of the Study of Addiction: Genetics and Environment (SAGE) demonstrates that conducting a multivariate test on multiple phenotypes can increase the power of identifying markers that may not be, otherwise, chosen using marginal tests. The proposed method also offers a new approach to analyzing the Fagerström Test for Nicotine Dependence as multivariate phenotypes in genome-wide association studies. PMID:28081206
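For orientation, Fisher's combination statistic mentioned in this abstract is X = -2 Σ ln p_i over the k marginal p-values. The sketch below computes it together with its reference p-value under the textbook assumption of independent tests, where X follows a chi-square distribution with 2k degrees of freedom. That independence assumption is exactly what correlated phenotypes violate, so this is only the baseline case and not the authors' empirical null estimation; the function names are hypothetical.

```python
import math


def fisher_statistic(pvalues):
    """Fisher's combination statistic X = -2 * sum(ln p_i)."""
    return -2.0 * sum(math.log(p) for p in pvalues)


def fisher_pvalue_independent(pvalues):
    """Upper-tail p-value of X assuming independent tests (chi-square, 2k df).

    For even degrees of freedom 2k the chi-square survival function has the
    closed form exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    k = len(pvalues)
    half = fisher_statistic(pvalues) / 2.0
    term, total = 1.0, 1.0  # j = 0 term of the series
    for j in range(1, k):
        term *= half / j    # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total
```

With a single p-value the combined p-value equals the input, which gives a quick sanity check of the closed form.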
AID to overcome the limitations of genomic information by introducing somatic DNA alterations.
Honjo, Tasuku; Muramatsu, Masamichi; Nagaoka, Hitoshi; Kinoshita, Kazuo; Shinkura, Reiko
2006-05-01
The immune system has adopted somatic DNA alterations to overcome the limitations of genomic information. Activation-induced cytidine deaminase (AID) is an essential enzyme that regulates class switch recombination (CSR), somatic hypermutation (SHM) and gene conversion (GC) of the immunoglobulin gene. AID is known to be required for DNA cleavage of S regions in CSR and V regions in SHM. However, its molecular mechanism is a focus of extensive debate. The RNA editing hypothesis postulates that AID edits an as yet unknown mRNA to generate specific endonucleases for CSR and SHM. By contrast, the DNA deamination hypothesis assumes that AID deaminates cytosine in DNA, followed by DNA cleavage by base excision repair enzymes. We summarize the basic knowledge of the molecular mechanisms of CSR and SHM and then discuss the importance of AID not only in immune regulation but also in genome instability.
Panzera, Francisco; Ferreiro, María J; Pita, Sebastián; Calleros, Lucía; Pérez, Ruben; Basmadjián, Yester; Guevara, Yenny; Brenière, Simone Frédérique; Panzera, Yanina
2014-10-01
Chagas disease, one of the most important vector-borne diseases in the Americas, is caused by Trypanosoma cruzi and transmitted to humans by insects of the subfamily Triatominae. Effective control of this disease depends on elimination of vectors through spraying with insecticides. Genetic research can help insect control programs by identifying and characterizing vector populations. In southern Latin America, Triatoma infestans is the main vector and presents two distinct lineages, known as Andean and non-Andean chromosomal groups, that are highly differentiated by the amount of heterochromatin and genome size. Analyses with nuclear and mitochondrial sequences are not conclusive about the origin and spread of T. infestans. The present paper includes analyses of karyotypes, heterochromatin distribution and chromosomal mapping of the major ribosomal cluster (45S rDNA) in specimens from throughout the distribution range of this species, including pyrethroid-resistant populations. A total of 417 specimens from seven different countries were analyzed. We show an unusually wide rDNA variability related to the number and chromosomal position of the ribosomal genes, never before reported in species with holocentric chromosomes. Considering the chromosomal groups previously described, the ribosomal patterns are associated with a particular geographic distribution. Our results reveal that the differentiation process between both T. infestans chromosomal groups has involved significant genomic reorganization of essential coding sequences, besides the changes in heterochromatin and genome size previously reported. The chromosomal markers also allowed us to detect the existence of a hybrid zone occupied by individuals derived from crosses between both chromosomal groups. Our genetic studies support the hypothesis of an Andean origin for T. infestans, and suggest that pyrethroid-resistant populations from the Argentinean-Bolivian border are most likely the result of recent secondary contact between both lineages. We suggest that vector control programs should make a greater effort in the entomological surveillance of those regions with both chromosomal groups to avoid rapid emergence of resistant individuals. Copyright © 2014 Elsevier B.V. All rights reserved.
Burby, Joshua W.; Lacker, Daniel
2016-01-01
Systems as diverse as the interacting species in a community, alleles at a genetic locus, and companies in a market are characterized by competition (over resources, space, capital, etc.) and adaptation. Neutral theory, built around the hypothesis that individual performance is independent of group membership, has found utility across the disciplines of ecology, population genetics, and economics, both because of the success of the neutral hypothesis in predicting system properties and because deviations from these predictions provide information about the underlying dynamics. However, most tests of neutrality are weak, based on static system properties such as species-abundance distributions or the number of singletons in a sample. Time-series data provide a window onto a system’s dynamics, and should furnish tests of the neutral hypothesis that are more powerful in detecting deviations from neutrality and more informative about the type of competitive asymmetry that drives the deviation. Here, we present a neutrality test for time-series data. We apply this test to several microbial time-series and financial time-series and find that most of these systems are not neutral. Our test isolates the covariance structure of neutral competition, thus facilitating further exploration of the nature of asymmetry in the covariance structure of competitive systems. Much as neutrality tests from population genetics that use relative abundance distributions have enabled researchers to scan entire genomes for genes under selection, we anticipate our time-series test will be useful for quick significance tests of neutrality across a range of ecological, economic, and sociological systems for which time-series data are available. Future work can use our test to categorize and compare the dynamic fingerprints of particular competitive asymmetries (frequency dependence, volatility smiles, etc.) to improve forecasting and management of complex adaptive systems. PMID:27689714
Farré, Marta; Robinson, Terence J; Ruiz-Herrera, Aurora
2015-05-01
Our understanding of genomic reorganization, the mechanics of genomic transmission to offspring during germ line formation, and how these structural changes contribute to the speciation process and genetic disease is far from complete. Earlier attempts to understand the mechanism(s) and constraints that govern genome remodeling suffered from being too narrowly focused, and failed to provide a unified and encompassing view of how genomes are organized and regulated inside cells. Here, we propose a new multidisciplinary Integrative Breakage Model for the study of genome evolution. The analysis of the high-level structural organization of genomes (nucleome), together with the functional constraints that accompany genome reshuffling, provides insights into the origin and plasticity of genome organization that may assist with the detection and isolation of therapeutic targets for the treatment of complex human disorders. © 2015 WILEY Periodicals, Inc.
USDA-ARS's Scientific Manuscript database
Progress in studying the biology of Trichinella spp. was greatly advanced with the publication and analysis of the draft genome sequence of T. spiralis. Those data provide a basis for constructing testable hypotheses concerning parasite physiology, immunology, and genetics. They also provide tools...
Molecular Cytogenetic Analysis of Deschampsia antarctica Desv. (Poaceae), Maritime Antarctic
Amosova, Alexandra V.; Bolsheva, Nadezhda L.; Samatadze, Tatiana E.; Twardovska, Maryana O.; Zoshchuk, Svyatoslav A.; Andreev, Igor O.; Badaeva, Ekaterina D.; Kunakh, Viktor A.; Muravenko, Olga V.
2015-01-01
Deschampsia antarctica Desv. (Poaceae) (2n = 26) is one of the two vascular plants adapted to the harshest environment of the Antarctic. Although the species is a valuable model for the study of environmental stress tolerance in plants, its karyotype is still poorly investigated. We conducted, for the first time, a comprehensive molecular cytogenetic analysis of D. antarctica collected on four islands of the Maritime Antarctic. D. antarctica karyotypes were studied by Giemsa C- and DAPI/C-banding, Ag-NOR staining, multicolour fluorescence in situ hybridization with repeated DNA probes (pTa71, pTa794, telomere repeats, pSc119.2, pAs1) and the GAA simple sequence repeat probe. We also performed sequential rapid in situ hybridization with genomic DNA of D. caespitosa. Two chromosome pairs bearing transcriptionally active 45S rDNA loci and five pairs with 5S rDNA sites were detected. A weak intercalary site of telomere repeats was revealed on the largest chromosome in addition to telomere hybridization signals at terminal positions. This fact indirectly confirms the hypothesis that chromosome fusion might have been the cause of the chromosome number in this species, which is unusual for cereals. Based on patterns of distribution of the examined molecular cytogenetic markers, all chromosomes in karyotypes were identified, and chromosome idiograms of D. antarctica were constructed. B chromosomes were found in most karyotypes of plants from Darboux Island. A mixoploid plant with mainly triploid cells bearing a Robertsonian rearrangement was detected among typical diploid specimens from Great Jalour Island. The karyotype variability found in D. antarctica is probably an expression of genome instability induced by environmental stress factors. The differences in C-banding patterns and in chromosome distribution of rDNA loci as well as homologous highly repeated DNA sequences detected between genomes of D. antarctica and its related species D. caespitosa indicate that genome reorganization involving coding and noncoding repeated DNA sequences had occurred during the divergence of these species. PMID:26394331
Acar, Elif F; Sun, Lei
2013-06-01
Motivated by genetic association studies of SNPs with genotype uncertainty, we propose a generalization of the Kruskal-Wallis test that incorporates group uncertainty when comparing k samples. The extended test statistic is based on probability-weighted rank-sums and follows an asymptotic chi-square distribution with k - 1 degrees of freedom under the null hypothesis. Simulation studies confirm the validity and robustness of the proposed test in finite samples. Application to a genome-wide association study of type 1 diabetic complications further demonstrates the utilities of this generalized Kruskal-Wallis test for studies with group uncertainty. The method has been implemented as an open-resource R program, GKW. © 2013, The International Biometric Society.
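The idea of probability-weighted rank-sums can be illustrated with a minimal sketch. This is not the authors' GKW program: the function names and the normalization are assumptions, chosen so that the statistic reduces to the classical Kruskal-Wallis H when every membership probability is 0 or 1. No tie correction is applied, and the null distribution under genuine group uncertainty is not derived here.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def weighted_kw_statistic(values, probs):
    """Kruskal-Wallis-style statistic built from probability-weighted rank sums.

    values: one observation per subject.
    probs:  probs[i][g] is the probability that subject i belongs to group g
            (each row is assumed to sum to 1).
    Hypothetical helper for illustration only.
    """
    n = len(values)
    k = len(probs[0])
    ranks = average_ranks(values)
    total = 0.0
    for g in range(k):
        weight = sum(p[g] for p in probs)  # expected size of group g
        rank_sum = sum(r * p[g] for r, p in zip(ranks, probs))
        total += rank_sum ** 2 / weight
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
```

With hard assignments (rows such as `[1, 0]`), the function returns the ordinary Kruskal-Wallis H; with completely uninformative probabilities, every group receives the same weighted rank sum and the statistic is zero.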
Ramasamy, Sukanya; Ometto, Lino; Crava, Cristina M.; Revadi, Santosh; Kaur, Rupinder; Horner, David S.; Pisani, Davide; Dekker, Teun; Anfora, Gianfranco; Rota-Stabelli, Omar
2016-01-01
How the evolution of olfactory genes correlates with adaption to new ecological niches is still a debated topic. We explored this issue in Drosophila suzukii, an emerging model that reproduces on fresh fruit rather than in fermenting substrates like most other Drosophila. We first annotated the repertoire of odorant receptors (ORs), odorant binding proteins (OBPs), and antennal ionotropic receptors (aIRs) in the genomes of two strains of D. suzukii and of its close relative Drosophila biarmipes. We then analyzed these genes on the phylogeny of 14 Drosophila species: whereas ORs and OBPs are characterized by higher turnover rates in some lineages including D. suzukii, aIRs are conserved throughout the genus. Drosophila suzukii is further characterized by a non-random distribution of OR turnover on the gene phylogeny, consistent with a change in selective pressures. In D. suzukii, we found duplications and signs of positive selection in ORs with affinity for short-chain esters, and loss of function of ORs with affinity for volatiles produced during fermentation. These receptors—Or85a and Or22a—are characterized by divergent alleles in the European and American genomes, and we hypothesize that they may have been replaced by some of the duplicated ORs in corresponding neurons, a hypothesis reciprocally confirmed by electrophysiological recordings. Our study quantifies the evolution of olfactory genes in Drosophila and reveals an array of genomic events that can be associated with the ecological adaptations of D. suzukii. PMID:27435796
Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar
2014-12-01
Lysophosphatidyl acyltransferase (LPAT) is one of the major triacylglycerol synthesis enzymes, controlling the metabolic flow of lysophosphatidic acid to phosphatidic acid. Experimental studies in Arabidopsis have shown that LPAT activity is exhibited primarily by three distinct isoforms, namely the plastid-located LPAT1, the endoplasmic reticulum-located LPAT2, and the soluble isoform of LPAT (solLPAT). In this study, 24 putative genes representing all LPAT isoforms were identified from the analysis of 11 complete genomes including green algae, red algae, diatoms and higher plants. We observed LPAT1 and solLPAT genes to be ubiquitously present in nearly all genomes examined, whereas LPAT2 genes to have evolved more recently in the plant lineage. Phylogenetic analysis indicated that LPAT1, LPAT2 and solLPAT have convergently evolved through separate evolutionary paths and belong to three different gene families, which was further evidenced by their wide divergence at gene structure and sequence level. The genome distribution supports the hypothesis that each gene encoding a LPAT is not duplicated. Mapping of exon-intron structure of LPAT genes to the domain structure of proteins across different algal and plant species indicates that exon shuffling plays no role in the evolution of LPAT genes. Besides the previously defined motifs, several conserved consensus sequences were discovered which could be useful to distinguish different LPAT isoforms. Taken together, this study will enable the generation of experimental approximations to better understand the functional role of algal LPAT in lipid accumulation.
Biased Gene Conversion and GC-Content Evolution in the Coding Sequences of Reptiles and Vertebrates
Figuet, Emeric; Ballenghien, Marion; Romiguier, Jonathan; Galtier, Nicolas
2015-01-01
Mammalian and avian genomes are characterized by a substantial spatial heterogeneity of GC-content, which is often interpreted as reflecting the effect of local GC-biased gene conversion (gBGC), a meiotic repair bias that favors G and C over A and T alleles in high-recombining genomic regions. Surprisingly, the first fully sequenced nonavian sauropsid (i.e., reptile), the green anole Anolis carolinensis, revealed a highly homogeneous genomic GC-content landscape, suggesting the possibility that gBGC might not be at work in this lineage. Here, we analyze GC-content evolution at third-codon positions (GC3) in 44 vertebrates species, including eight newly sequenced transcriptomes, with a specific focus on nonavian sauropsids. We report that reptiles, including the green anole, have a genome-wide distribution of GC3 similar to that of mammals and birds, and we infer a strong GC3-heterogeneity to be already present in the tetrapod ancestor. We further show that the dynamic of coding sequence GC-content is largely governed by karyotypic features in vertebrates, notably in the green anole, in agreement with the gBGC hypothesis. The discrepancy between third-codon positions and noncoding DNA regarding GC-content dynamics in the green anole could not be explained by the activity of transposable elements or selection on codon usage. This analysis highlights the unique value of third-codon positions as an insertion/deletion-free marker of nucleotide substitution biases that ultimately affect the evolution of proteins. PMID:25527834
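GC3, the quantity analyzed above, is the proportion of G or C bases at third-codon positions. A minimal sketch follows, assuming the input is an in-frame coding sequence that starts at the first codon position; the function name `gc3` is illustrative, and ambiguous bases are simply skipped.

```python
def gc3(cds):
    """Fraction of G or C at third-codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing incomplete codon
    thirds = [cds[i] for i in range(2, usable, 3)]
    counted = [base for base in thirds if base in "ACGT"]  # skip N and other codes
    if not counted:
        raise ValueError("no unambiguous third-codon positions")
    return sum(base in "GC" for base in counted) / len(counted)
```

For example, `gc3("ATGGCCAAATAA")` examines the third positions G, C, A and A and returns 0.5.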
Testing whether Metazoan Tyrosine Loss Was Driven by Selection against Promiscuous Phosphorylation
Pandya, Siddharth; Struck, Travis J.; Mannakee, Brian K.; Paniscus, Mary; Gutenkunst, Ryan N.
2015-01-01
Protein tyrosine phosphorylation is a key regulatory modification in metazoans, and the corresponding kinase enzymes have diversified dramatically. This diversification is correlated with a genome-wide reduction in protein tyrosine content, and it was recently suggested that this reduction was driven by selection to avoid promiscuous phosphorylation that might be deleterious. We tested three predictions of this intriguing hypothesis. 1) Selection should be stronger on residues that are more likely to be phosphorylated due to local solvent accessibility or structural disorder. 2) Selection should be stronger on proteins that are more likely to be promiscuously phosphorylated because they are abundant. We tested these predictions by comparing distributions of tyrosine within and among human and yeast orthologous proteins. 3) Selection should be stronger against mutations that create tyrosine versus remove tyrosine. We tested this prediction using human population genomic variation data. We found that all three predicted effects are modest for tyrosine when compared with the other amino acids, suggesting that selection against deleterious phosphorylation was not dominant in driving metazoan tyrosine loss. PMID:25312910
Hider, Jessica L; Gittelman, Rachel M; Shah, Tapan; Edwards, Melissa; Rosenbloom, Arnold; Akey, Joshua M; Parra, Esteban J
2013-07-12
Currently, there is very limited knowledge about the genes involved in normal pigmentation variation in East Asian populations. We carried out a genome-wide scan of signatures of positive selection using the 1000 Genomes Phase I dataset, in order to identify pigmentation genes showing putative signatures of selective sweeps in East Asia. We applied a broad range of methods to detect signatures of selection including: 1) Tests designed to identify deviations of the Site Frequency Spectrum (SFS) from neutral expectations (Tajima's D, Fay and Wu's H and Fu and Li's D* and F*), 2) Tests focused on the identification of high-frequency haplotypes with extended linkage disequilibrium (iHS and Rsb) and 3) Tests based on genetic differentiation between populations (LSBL). Based on the results obtained from a genome-wide analysis of 25 kb windows, we constructed an empirical distribution for each statistic across all windows, and identified pigmentation genes that are outliers in the distribution. Our tests identified twenty genes that are relevant for pigmentation biology. Of these, eight genes (ATRN, EDAR, KLHL7, MITF, OCA2, TH, TMEM33 and TRPM1) were extreme outliers (top 0.1% of the empirical distribution) for at least one statistic, and twelve genes (ADAM17, BNC2, CTSD, DCT, EGFR, LYST, MC1R, MLPH, OPRM1, PDIA6, PMEL (SILV) and TYRP1) were in the top 1% of the empirical distribution for at least one statistic. Additionally, eight of these genes (BNC2, EGFR, LYST, MC1R, OCA2, OPRM1, PMEL (SILV) and TYRP1) have been associated with pigmentary traits in association studies. We identified a number of putative pigmentation genes showing extremely unusual patterns of genetic variation in East Asia. Most of these genes are outliers for different tests and/or different populations, and have already been described in previous scans for positive selection, providing strong support to the hypothesis that recent selective sweeps left a signature in these regions. However, it will be necessary to carry out association and functional studies to demonstrate the implication of these genes in normal pigmentation variation. PMID:23848512
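The outlier approach described in this abstract, ranking each 25 kb window against the empirical distribution of a statistic, can be sketched as follows. The function names and the window identifiers are hypothetical, and the sketch assumes that larger values are more extreme; for a statistic where the lower tail is of interest (for instance strongly negative Tajima's D), the values would be negated first. The quadratic loop is kept for readability and would be replaced by a sort for genome-scale data.

```python
def empirical_tail_fraction(all_values, value):
    """Fraction of windows whose statistic is at least `value`."""
    return sum(v >= value for v in all_values) / len(all_values)


def flag_outlier_windows(window_stats, fraction):
    """Return ids of windows whose statistic lies in the top `fraction` of all windows.

    window_stats: dict mapping a window id to the value of one statistic.
    fraction:     tail size, e.g. 0.001 for the top 0.1% or 0.01 for the top 1%.
    """
    values = list(window_stats.values())
    return sorted(
        window
        for window, value in window_stats.items()
        if empirical_tail_fraction(values, value) <= fraction
    )
```

A gene would then be reported as an outlier for that statistic if it overlaps any flagged window; the mapping from windows to genes is not shown.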
Onozawa, Masahiro; Zhang, Zhenhua; Kim, Yoo Jung; Goldberg, Liat; Varga, Tamas; Bergsagel, P Leif; Kuehl, W Michael; Aplan, Peter D
2014-05-27
We used the I-SceI endonuclease to produce DNA double-strand breaks (DSBs) and observed that a fraction of these DSBs were repaired by insertion of sequences, which we termed "templated sequence insertions" (TSIs), derived from distant regions of the genome. These TSIs were derived from genic, retrotransposon, or telomere sequences and were not deleted from the donor site in the genome, leading to the hypothesis that they were derived from reverse-transcribed RNA. Cotransfection of RNA and an I-SceI expression vector demonstrated insertion of RNA-derived sequences at the DNA-DSB site, and TSIs were suppressed by reverse-transcriptase inhibitors. Both observations support the hypothesis that TSIs were derived from RNA templates. In addition, similar insertions were detected at sites of DNA DSBs induced by transcription activator-like effector nuclease proteins. Whole-genome sequencing of myeloma cell lines revealed additional TSIs, demonstrating that repair of DNA DSBs via insertion was not restricted to experimentally produced DNA DSBs. Analysis of publicly available databases revealed that many of these TSIs are polymorphic in the human genome. Taken together, these results indicate that insertional events should be considered as alternatives to gross chromosomal rearrangements in the interpretation of whole-genome sequence data and that this mutagenic form of DNA repair may play a role in genetic disease, exon shuffling, and mammalian evolution.
Docherty, S J; Davis, O S P; Kovas, Y; Meaburn, E L; Dale, P S; Petrill, S A; Schalkwyk, L C; Plomin, R
2010-01-01
Numeracy is as important as literacy and exhibits a similar frequency of disability. Although its etiology is relatively poorly understood, quantitative genetic research has demonstrated mathematical ability to be moderately heritable. In this first genome-wide association study (GWAS) of mathematical ability and disability, 10 out of 43 single nucleotide polymorphism (SNP) associations nominated from two high- vs. low-ability (n = 600 10-year-olds each) scans of pooled DNA were validated (P < 0.05) in an individually genotyped sample of *2356 individuals spanning the entire distribution of mathematical ability, as assessed by teacher reports and online tests. Although the effects are of the modest sizes now expected for complex traits and require further replication, interesting candidate genes are implicated such as NRCAM which encodes a neuronal cell adhesion molecule. When combined into a set, the 10 SNPs account for 2.9% (F = 56.85; df = 1 and 1881; P = 7.277e–14) of the phenotypic variance. The association is linear across the distribution consistent with a quantitative trait locus (QTL) hypothesis; the third of children in our sample who harbour 10 or more of the 20 risk alleles identified are nearly twice as likely (OR = 1.96; df = 1; P = 3.696e–07) to be in the lowest performing 15% of the distribution. Our results correspond with those of quantitative genetic research in indicating that mathematical ability and disability are influenced by many genes generating small effects across the entire spectrum of ability, implying that more highly powered studies will be needed to detect and replicate these QTL associations. PMID:20039944
Beyond hormones: a novel hypothesis for the biological basis of male sexual orientation.
Bocklandt, S; Hamer, D H
2003-01-01
For the past several decades, research on the development of human sexual orientation has focused on the role of pre- or peri-natal androgen levels on brain development. However, there is no evidence that physiologically occurring variations in androgen exposure influence differences in sexual orientation. In this review, we discuss an alternative hypothesis involving genomic imprinting in the regulation of sex specific expression of genes regulating sexually dimorphic traits, including sexual orientation. A possible experiment to test this hypothesis is discussed.
Newly discovered young CORE-SINEs in marsupial genomes.
Munemasa, Maruo; Nikaido, Masato; Nishihara, Hidenori; Donnellan, Stephen; Austin, Christopher C; Okada, Norihiro
2008-01-15
Although recent mammalian genome projects have uncovered a large part of genomic component of various groups, several repetitive sequences still remain to be characterized and classified for particular groups. The short interspersed repetitive elements (SINEs) distributed among marsupial genomes are one example. We have identified and characterized two new SINEs from marsupial genomes that belong to the CORE-SINE family, characterized by a highly conserved "CORE" domain. PCR and genomic dot blot analyses revealed that the distribution of each SINE shows distinct patterns among the marsupial genomes, implying different timing of their retroposition during the evolution of marsupials. The members of Mar3 (Marsupialia 3) SINE are distributed throughout the genomes of all marsupials, whereas the Mac1 (Macropodoidea 1) SINE is distributed specifically in the genomes of kangaroos. Sequence alignment of the Mar3 SINEs revealed that they can be further divided into four subgroups, each of which has diagnostic nucleotides. The insertion patterns of each SINE at particular genomic loci, together with the distribution patterns of each SINE, suggest that the Mar3 SINEs have intensively amplified after the radiation of diprotodontians, whereas the Mac1 SINE has amplified only slightly after the divergence of hypsiprimnodons from other macropods. By compiling the information of CORE-SINEs characterized to date, we propose a comprehensive picture of how SINE evolution occurred in the genomes of marsupials.
A probabilistic method for testing and estimating selection differences between populations
He, Yungang; Wang, Minxian; Huang, Xin; Li, Ran; Xu, Hongyang; Xu, Shuhua; Jin, Li
2015-01-01
Human populations around the world encounter various environmental challenges and, consequently, develop genetic adaptations to different selection forces. Identifying the differences in natural selection between populations is critical for understanding the roles of specific genetic variants in evolutionary adaptation. Although numerous methods have been developed to detect genetic loci under recent directional selection, a probabilistic solution for testing and quantifying selection differences between populations is lacking. Here we report the development of a probabilistic method for testing and estimating selection differences between populations. By use of a probabilistic model of genetic drift and selection, we showed that logarithm odds ratios of allele frequencies provide estimates of the differences in selection coefficients between populations. The estimates approximate a normal distribution, and variance can be estimated using genome-wide variants. This allows us to quantify differences in selection coefficients and to determine the confidence intervals of the estimate. Our work also revealed the link between genetic association testing and hypothesis testing of selection differences. It therefore supplies a solution for hypothesis testing of selection differences. This method was applied to a genome-wide data analysis of Han and Tibetan populations. The results confirmed that both the EPAS1 and EGLN1 genes are under statistically different selection in Han and Tibetan populations. We further estimated differences in the selection coefficients for genetic variants involved in melanin formation and determined their confidence intervals between continental population groups. Application of the method to empirical data demonstrated the outstanding capability of this novel approach for testing and quantifying differences in natural selection. PMID:26463656
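The central quantity in this abstract, the log odds ratio of allele frequencies between two populations, can be illustrated with a short sketch. It is not the authors' implementation: the function names are hypothetical, the genome-wide background is used only to estimate a mean and standard deviation under an assumed normal distribution, and the scaling that converts the log odds ratio into a per-generation difference in selection coefficients is not reproduced. Allele frequencies are assumed to lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist, mean, stdev


def log_odds(p):
    """Natural-log odds of an allele frequency p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def log_odds_ratio(p1, p2):
    """Log odds ratio of one allele's frequency in population 1 versus population 2."""
    return log_odds(p1) - log_odds(p2)


def selection_difference_test(focal, genome_wide, level=0.95):
    """Compare a focal variant's log odds ratio with a genome-wide background.

    focal:       (p1, p2) allele frequencies of the variant of interest.
    genome_wide: list of (p1, p2) pairs for background variants.
    Returns (estimate, z, two-sided p-value, confidence interval).
    """
    background = [log_odds_ratio(a, b) for a, b in genome_wide]
    center = mean(background)
    sd = stdev(background)  # spread attributed to drift, estimated genome-wide
    estimate = log_odds_ratio(*focal)
    z = (estimate - center) / sd
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    half_width = NormalDist().inv_cdf(0.5 + level / 2.0) * sd
    return estimate, z, p_value, (estimate - half_width, estimate + half_width)
```

Using the genome-wide spread as the null variance reflects the abstract's statement that the variance can be estimated from genome-wide variants; how the background variants are chosen in practice is left open here.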
USDA-ARS's Scientific Manuscript database
Mutations often accompany DNA replication. Since there may be fewer cell cycles per year in the germlines of long-lived than short-lived angiosperms, the genomes of long-lived angiosperms may be diverging more slowly than those of short-lived angiosperms. Here we test this hypothesis. We first const...
Wang, Zhang
2017-01-01
Amoebae have been considered as a genetic “melting pot” for their symbionts, facilitating genetic exchanges of the bacteria that co-inhabit the same host. To test the “melting pot” hypothesis, we analyzed six genomes of amoeba endosymbionts within Rickettsiales, four of which belong to the Holosporaceae family and two to Candidatus Midichloriaceae. For the first time, we identified plasmids in obligate amoeba endosymbionts, which suggests conjugation as a potential mechanism for lateral gene transfers (LGTs) that underpin the “melting pot” hypothesis. We found strong evidence of recent LGTs between the Rickettsiales amoeba endosymbionts, suggesting that the LGTs are continuous and ongoing. In addition, comparative genomic and phylogenomic analyses revealed pervasive and recurrent LGTs between Rickettsiales and distantly related amoeba-associated bacteria throughout Rickettsiales evolution. Many of these exchanged genes are important for amoeba–symbiont interactions, including genes involved in transport systems, antibiotic resistance, stress response, and bacterial virulence, suggesting that LGTs have played important roles in the adaptation of endosymbionts to their intracellular habitats. Surprisingly, we found little evidence of LGTs between amoebae and their bacterial endosymbionts. Our study strongly supports the “melting pot” hypothesis and highlights the role of amoebae in shaping Rickettsiales evolution. PMID:29177480
Erren, T C; Erren, M
2004-04-01
When David Horrobin suggested that phospholipid and fatty acid metabolism played a major role in human evolution, his 'fat utilization hypothesis' unified intriguing work from paleoanthropology, evolutionary biology, genetic and nervous system research in a novel and coherent lipid-related context. Interestingly, unlike most other evolutionary concepts, the hypothesis allows specific predictions which can be empirically tested in the near future. This paper summarizes some of Horrobin's intriguing propositions and suggests as to how approaches of comparative genomics published in Cell, Nature, Science and elsewhere since 1997 may be used to examine his evolutionary hypothesis. Indeed, systematic investigations of the genomic clock in the species' mitochondrial DNA, the Y and autosomal chromosomes as evidence of evolutionary relationships and distinctions can help to scrutinize associated predictions for their validity, namely that key mutations which differentiate us from Neanderthals and from great apes are in the genes coding for proteins which regulate fat metabolism, and particularly the phospholipid metabolism of the synapses of the brain. It is concluded that beyond clues to humans' relationships with living primates and to the Neanderthals' cognitive performance and their disappearance, the suggested molecular clock analyses may provide crucial insights into the biochemical evolution-and means of possible manipulation-of our brain.
Lee, Kang-Hoon; Shin, Kyung-Seop; Lim, Debora; Kim, Woo-Chan; Chung, Byung Chang; Han, Gyu-Bum; Roh, Jeongkyu; Cho, Dong-Ho; Cho, Kiho
2015-07-01
The genomes of living organisms are populated with pleomorphic repetitive elements (REs) of varying densities. Our hypothesis that genomic RE landscapes are species/strain/individual-specific was implemented into the Genome Signature Imaging system to visualize and compute the RE-based signatures of any genome. Following the occurrence profiling of 5-nucleotide REs/words, the information from top-50 frequency words was transformed into a genome-specific signature and visualized as Genome Signature Images (GSIs), using a CMYK scheme. An algorithm for computing distances among GSIs was formulated using the GSIs' variables (word identity, frequency, and frequency order). The utility of the GSI-distance computation system was demonstrated with control genomes. GSI-based computation of genome-relatedness among 1766 microbes (117 archaea and 1649 bacteria) identified their clustering patterns; although the majority paralleled the established classification, some did not. The Genome Signature Imaging system, with its visualization and distance computation functions, enables genome-scale evolutionary studies involving numerous genomes with varying sizes. Copyright © 2015 Elsevier Inc. All rights reserved.
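The first step described in this abstract, profiling the occurrence of 5-nucleotide words and keeping the 50 most frequent, can be sketched as below. The function name, the use of overlapping windows on a single strand, and the alphabetical tie-breaking are assumptions; the conversion of the profile into a CMYK image and the GSI distance algorithm are not reproduced.

```python
from collections import Counter


def top_words(sequence, k=5, n_top=50):
    """Return the n_top most frequent k-nucleotide words as (word, count) pairs.

    Overlapping windows are counted on the given strand only, and windows
    containing anything other than A, C, G or T are skipped.
    """
    seq = sequence.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    # Sort by descending count, then alphabetically, so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n_top]
```

The returned list carries the three variables the abstract names for distance computation: word identity, frequency, and frequency order.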
Sex drives intracellular conflict in yeast.
Harrison, E; MacLean, R C; Koufopanou, V; Burt, A
2014-08-01
Theory predicts that sex can drive the evolution of conflict within the cell. During asexual reproduction, genetic material within the cell is inherited as a single unit, selecting for cooperation both within the genome as well as between the extra-genomic elements within the cell (e.g. plasmids and endosymbionts). Under sexual reproduction, this unity is broken down as parental genomes are distributed between meiotic progeny. Genetic elements able to transmit to more than 50% of meiotic progeny have a transmission advantage over the rest of the genome and are able to spread, even where they reduce the fitness of the individual as a whole. Sexual reproduction is therefore expected to drive the evolution of selfish genetic elements (SGEs). Here, we directly test this hypothesis by studying the evolution of two independent SGEs, the 2-μm plasmid and selfish mitochondria, in populations of Saccharomyces cerevisiae. Following 22 rounds of sexual reproduction, 2-μm copy number increased by approximately 13.2 (±5.6) copies per cell, whereas in asexual populations copy number decreased by approximately 5.1 (±1.5) copies per cell. Given that the burden imposed by this parasite increases with copy number, these results support the idea that sex drives the evolution of increased SGE virulence. Moreover, we found that mitochondria that are respiratory-deficient rapidly invaded sexual but not asexual populations, demonstrating that frequent outcrossed sex can drive the de novo evolution of genetic parasites. Our study highlights the genomic perils of sex and suggests that SGEs may play a key role in driving major evolutionary transitions, such as uniparental inheritance. © 2014 The Authors. Journal of Evolutionary Biology © 2014 European Society For Evolutionary Biology.
NASA Astrophysics Data System (ADS)
Zhou, K.; Sylvan, J. B.; Hallam, S. J.
2017-12-01
The Bacteroidetes are a ubiquitous phylum of bacteria found in a wide variety of habitats. Marine Bacteroidetes are known to utilize complex carbohydrates and have a potentially important role in the global carbon cycle through processing these compounds, which are not digestible by many other microbes. Some members of the phylum are known to perform denitrification and are facultative anaerobes, but Bacteroidetes are not known to participate in sulfur redox cycling. Recently, it was shown that a clade of uncultured Bacteroidetes, including the VC2.1_Bac22 group, appears to be endemic to sulfidic environments, including hydrothermal vent sulfide chimneys, sediments and marine water column oxygen minimum zones (OMZs). This clade, dubbed the Sulfiphilic Bacteroidetes, is not detected in 16S rRNA amplicon studies from non-sulfidic environments. To test the hypothesis that the Sulfiphilic Bacteroidetes are involved in sulfur redox chemistry, we updated our meta-analysis of the clade using 16S rRNA sequences from public databases and employed single-cell genomics to survey their genomic potential using 19 single amplified genomes (SAGs) isolated from the seasonally anoxic Saanich Inlet, a seasonally hypoxic basin in British Columbia. Initial analysis of these SAGs indicates the Sulfiphilic Bacteroidetes may perform sulfur redox reactions using a three-gene psrABC operon encoding the polysulfide reductase enzyme complex with a thiosulfate sulfurtransferase (rhodanese), which putatively uses cyanide to convert thiosulfate to sulfite, just upstream. Interestingly, this is the same configuration as discovered recently in some Marine Group A bacteria. Further aspects of the Sulfiphilic Bacteroidetes' genomic potential will be presented in light of their presence in sulfidic environments.
Allen, John F.
2015-01-01
Chloroplasts and mitochondria are subcellular bioenergetic organelles with their own genomes and genetic systems. DNA replication and transmission to daughter organelles produces cytoplasmic inheritance of characters associated with primary events in photosynthesis and respiration. The prokaryotic ancestors of chloroplasts and mitochondria were endosymbionts whose genes became copied to the genomes of their cellular hosts. These copies gave rise to nuclear chromosomal genes that encode cytosolic proteins and precursor proteins that are synthesized in the cytosol for import into the organelle into which the endosymbiont evolved. What accounts for the retention of genes for the complete synthesis within chloroplasts and mitochondria of a tiny minority of their protein subunits? One hypothesis is that expression of genes for protein subunits of energy-transducing enzymes must respond to physical environmental change by means of a direct and unconditional regulatory control—control exerted by change in the redox state of the corresponding gene product. This hypothesis proposes that, to preserve function, an entire redox regulatory system has to be retained within its original membrane-bound compartment. Colocation of gene and gene product for redox regulation of gene expression (CoRR) is a hypothesis in agreement with the results of a variety of experiments designed to test it and which seem to have no other satisfactory explanation. Here, I review evidence relating to CoRR and discuss its development, conclusions, and implications. This overview also identifies predictions concerning the results of experiments that may yet prove the hypothesis to be incorrect. PMID:26286985
Chatterjee, Aniruddha; Lagisz, Malgorzata; Rodger, Euan J; Zhen, Li; Stockwell, Peter A; Duncan, Elizabeth J; Horsfield, Julia A; Jeyakani, Justin; Mathavan, Sinnakaruppan; Ozaki, Yuichi; Nakagawa, Shinichi
2016-09-30
The sex drive hypothesis predicts that stronger selection on male traits has resulted in masculinization of the genome. Here we test whether such masculinizing effects can be detected at the level of the transcriptome and methylome in the adult zebrafish brain. Although methylation is globally similar, we identified 914 specific differentially methylated CpGs (DMCs) between males and females (435 were hypermethylated and 479 were hypomethylated in males compared to females). These DMCs were prevalent in gene bodies, intergenic regions and CpG island shores. We also discovered 15 distinct CpG clusters with striking sex-specific DNA methylation differences. In contrast, at the transcriptome level, more female-biased genes than male-biased genes were expressed, giving little support for the male sex drive hypothesis. Our study provides genome-wide methylome and transcriptome assessment and sheds light on sex-specific epigenetic patterns in zebrafish for the first time. Copyright © 2016 Elsevier B.V. All rights reserved.
We report the draft genome of two Sphingopyxis spp. strains isolated from a chloraminated drinking water distribution system simulator. Both strains are ubiquitous residents and early colonizers of water distribution systems. Genomic annotation identified a class 1 integron (in...
Symonová, Radka; Majtánová, Zuzana; Arias-Rodriguez, Lenin; Mořkovský, Libor; Kořínková, Tereza; Cavin, Lionel; Pokorná, Martina Johnson; Doležálková, Marie; Flajšhans, Martin; Normandeau, Eric; Ráb, Petr; Meyer, Axel; Bernatchez, Louis
2017-11-01
Genomic GC content can vary locally, and GC-rich regions are usually associated with increased DNA thermostability in thermophilic prokaryotes and warm-blooded eukaryotes. Among vertebrates, fish and amphibians appeared to possess a distinctly less heterogeneous AT/GC organization in their genomes, whereas cytogenetically detectable GC heterogeneity has so far only been documented in mammals and birds. The subject of our study is the gar, an ancient "living fossil" of a basal ray-finned fish lineage, known from the Cretaceous period. We carried out cytogenomic analysis in two gar genera (Atractosteus and Lepisosteus) uncovering a GC chromosomal pattern uncharacteristic for fish. Bioinformatic analysis of the spotted gar (Lepisosteus oculatus) confirmed a GC compartmentalization on GC profiles of linkage groups. This indicates a rather mammalian mode of compositional organization on gar chromosomes. Gars are thus the only analyzed extant ray-finned fishes with a GC compartmentalized genome. Since gars are cold-blooded anamniotes, our results contradict the generally accepted hypothesis that the phylogenomic onset of GC compartmentalization occurred near the origin of amniotes. Ecophysiological findings of other authors indicate a metabolic similarity of gars with mammals. We hypothesize that gars might have undergone convergent evolution with the tetrapod lineages leading to mammals on both metabolic and genomic levels. Their metabolic adaptations might have left footprints in their compositional genome evolution, as proposed by the metabolic rate hypothesis. The genome organization described here in gars sheds new light on the compositional genome evolution in vertebrates generally and contributes to better understanding of the complexities of the mechanisms involved in this process. © 2016 Wiley Periodicals, Inc.
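The GC profiles mentioned in the abstract above can be illustrated with a sliding-window calculation. The following is a minimal sketch, not the authors' pipeline: the function names, the window size, and the choice to ignore ambiguous bases are all assumptions made for illustration.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        # Window made only of ambiguous bases (e.g. N): report 0.0 by convention.
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def gc_profile(seq, window, step=None):
    """GC fraction of consecutive windows along seq.

    With the default step (equal to window) the windows do not overlap.
    """
    step = step or window
    return [
        gc_fraction(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]
```

A compartmentalized sequence would show long runs of windows with clearly different GC fractions, whereas a homogeneous one would give a nearly flat profile.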
A Slowed Cell Cycle Stabilizes the Budding Yeast Genome.
Vinton, Peter J; Weinert, Ted
2017-06-01
During cell division, aberrant DNA structures are detected by regulators called checkpoints that slow division to allow error correction. In addition to checkpoint-induced delay, it is widely assumed, though rarely shown, that merely slowing the cell cycle might allow more time for error detection and correction, thus resulting in a more stable genome. Fidelity by a slowed cell cycle might be independent of checkpoints. Here we tested the hypothesis that a slowed cell cycle stabilizes the genome, independent of checkpoints, in the budding yeast Saccharomyces cerevisiae. We were led to this hypothesis when we identified a gene (ERV14, an ER cargo membrane protein) that, when mutated, unexpectedly stabilized the genome, as measured by three different chromosome assays. After extensive studies of pathways rendered dysfunctional in erv14 mutant cells, we are led to the inference that no particular pathway is involved in stabilization, but rather the slowed cell cycle induced by erv14 stabilized the genome. We then demonstrated that, in genetic mutations and chemical treatments unrelated to ERV14, a slowed cell cycle indeed correlates with a more stable genome, even in checkpoint-proficient cells. Data suggest a delay in G2/M may commonly stabilize the genome. We conclude that chromosome errors are more rarely made or are more readily corrected when the cell cycle is slowed (even ∼15 min longer in an ∼100-min cell cycle). Moreover, some chromosome errors may not signal checkpoint-mediated responses, or do not sufficiently signal to allow correction, and their correction benefits from this "time checkpoint." Copyright © 2017 by the Genetics Society of America.
Hypothesis: Gene-rich plastid genomes in red algae may be an outcome of nuclear genome reduction.
Qiu, Huan; Lee, Jun Mo; Yoon, Hwan Su; Bhattacharya, Debashish
2017-06-01
Red algae (Rhodophyta) putatively diverged from the eukaryote tree of life >1.2 billion years ago and are the source of plastids in the ecologically important diatoms, haptophytes, and dinoflagellates. In general, red algae contain the largest plastid gene inventory among all such organelles derived from primary, secondary, or additional rounds of endosymbiosis. In contrast, their nuclear gene inventory is reduced when compared to their putative sister lineage, the Viridiplantae, and other photosynthetic lineages. The latter is thought to have resulted from a phase of genome reduction that occurred in the stem lineage of Rhodophyta. A recent comparative analysis of a taxonomically broad collection of red algal and Viridiplantae plastid genomes demonstrates that the red algal ancestor encoded ~1.5× more plastid genes than Viridiplantae. This difference is primarily explained by more extensive endosymbiotic gene transfer (EGT) in the stem lineage of Viridiplantae, when compared to red algae. We postulate that limited EGT in Rhodophytes resulted from the countervailing force of ancient, and likely recurrent, nuclear genome reduction. In other words, the propensity for nuclear gene loss led to the retention of red algal plastid genes that would otherwise have undergone intracellular gene transfer to the nucleus. This hypothesis recognizes the primacy of nuclear genome evolution over that of plastids, which have no inherent control of their gene inventory and can change dramatically (e.g., secondarily non-photosynthetic eukaryotes, dinoflagellates) in response to selection acting on the host lineage. © 2017 Phycological Society of America.
2010-01-01
Background The Galliformes is a well-known and widely distributed Order in Aves. The phylogenetic relationships of galliform birds, especially the turkeys, grouse, chickens, quails, and pheasants, have been studied intensively, likely because of their close association with humans. Despite extensive studies, convergent morphological evolution and rapid radiation have resulted in conflicting hypotheses of phylogenetic relationships. Many internal nodes have remained ambiguous. Results We analyzed the complete mitochondrial (mt) genomes from 34 galliform species, including 14 new mt genomes and 20 published mt genomes, and obtained a single, robust tree. Most of the internal branches were relatively short and the terminal branches long suggesting an ancient, rapid radiation. The Megapodiidae formed the sister group to all other galliforms, followed in sequence by the Cracidae, Odontophoridae and Numididae. The remaining clade included the Phasianidae, Tetraonidae and Meleagrididae. The genus Arborophila was the sister group of the remaining taxa followed by Polyplectron. This was followed by two major clades: ((((Gallus, Bambusicola) Francolinus) (Coturnix, Alectoris)) Pavo) and (((((((Chrysolophus, Phasianus) Lophura) Syrmaticus) Perdix) Pucrasia) (Meleagris, Bonasa)) ((Lophophorus, Tetraophasis) Tragopan))). Conclusions The traditional hypothesis of monophyletic lineages of pheasants, partridges, peafowls and tragopans was not supported in this study. Mitogenomic analyses recovered robust phylogenetic relationships and suggested that the Galliformes formed a model group for the study of morphological and behavioral evolution. PMID:20444289
Worldwide patterns of genomic variation and admixture in gray wolves.
Fan, Zhenxin; Silva, Pedro; Gronau, Ilan; Wang, Shuoguo; Armero, Aitor Serres; Schweizer, Rena M; Ramirez, Oscar; Pollinger, John; Galaverni, Marco; Ortega Del-Vecchyo, Diego; Du, Lianming; Zhang, Wenping; Zhang, Zhihe; Xing, Jinchuan; Vilà, Carles; Marques-Bonet, Tomas; Godinho, Raquel; Yue, Bisong; Wayne, Robert K
2016-02-01
The gray wolf (Canis lupus) is a widely distributed top predator and ancestor of the domestic dog. To address questions about wolf relationships to each other and dogs, we assembled and analyzed a data set of 34 canine genomes. The divergence between New and Old World wolves is the earliest branching event and is followed by the divergence of Old World wolves and dogs, confirming that the dog was domesticated in the Old World. However, no single wolf population is more closely related to dogs, supporting the hypothesis that dogs were derived from an extinct wolf population. All extant wolves have a surprisingly recent common ancestry and experienced a dramatic population decline beginning at least ∼30 thousand years ago (kya). We suggest this crisis was related to the colonization of Eurasia by modern human hunter-gatherers, who competed with wolves for limited prey but also domesticated them, leading to a compensatory population expansion of dogs. We found extensive admixture between dogs and wolves, with up to 25% of Eurasian wolf genomes showing signs of dog ancestry. Dogs have influenced the recent history of wolves through admixture and vice versa, potentially enhancing adaptation. Simple scenarios of dog domestication are confounded by admixture, and studies that do not take admixture into account with specific demographic models are problematic. © 2016 Fan et al.; Published by Cold Spring Harbor Laboratory Press.
Merging Marine Ecosystem Models and Genomics
NASA Astrophysics Data System (ADS)
Coles, V.; Hood, R. R.; Stukel, M. R.; Moran, M. A.; Paul, J. H.; Satinsky, B.; Zielinski, B.; Yager, P. L.
2015-12-01
oceanography. One of the grand challenges of oceanography is to develop model techniques to more effectively incorporate genomic information. As one approach, we developed an ecosystem model whose community is determined by randomly assigning functional genes to build each organism's "DNA". Microbes are assigned a size that sets their baseline environmental responses using allometric response curves. These responses are modified by the costs and benefits conferred by each gene in an organism's genome. The microbes are embedded in a general circulation model where environmental conditions shape the emergent population. This model is used to explore whether organisms constructed from randomized combinations of metabolic capability alone can self-organize to create realistic oceanic biogeochemical gradients. Realistic community size spectra and chlorophyll-a concentrations emerge in the model. The model is run repeatedly with randomly generated microbial communities, and each time realistic gradients in community size spectra, chlorophyll-a, and forms of nitrogen develop. This supports the hypothesis that the metabolic potential of a community rather than the realized species composition is the primary factor setting vertical and horizontal environmental gradients. Vertical distributions of nitrogen and transcripts for genes involved in nitrification are broadly consistent with observations. Modeled gene and transcript abundance for nitrogen cycling and processing of land-derived organic material match observations along the extreme gradients in the Amazon River plume, and they help to explain the factors controlling observed variability.
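The construction described in the abstract above (random gene assignment, a size-dependent baseline, and per-gene costs and benefits) can be sketched in a few lines. This is only an illustration under stated assumptions: the gene names, cost and benefit values, size range, allometric exponent, and the way the modifiers are combined are all hypothetical and are not taken from the model itself.

```python
import random

# Hypothetical gene catalogue; every cost and benefit value is illustrative only.
GENE_POOL = {
    "nitrification": {"cost": 0.05, "benefit": 0.12},
    "n_fixation": {"cost": 0.10, "benefit": 0.20},
    "doc_uptake": {"cost": 0.02, "benefit": 0.06},
    "photosynthesis": {"cost": 0.08, "benefit": 0.25},
}


def baseline_rate(size_um, coefficient=1.0, exponent=-0.25):
    """Allometric baseline response: coefficient * size ** exponent (assumed form)."""
    return coefficient * size_um ** exponent


def random_organism(rng, n_genes=2):
    """Draw a size and a random set of functional genes for one organism."""
    size_um = rng.uniform(0.5, 50.0)
    genes = rng.sample(sorted(GENE_POOL), n_genes)
    return {"size_um": size_um, "genes": genes}


def net_rate(organism):
    """Baseline rate scaled by the summed (benefit - cost) of the organism's genes."""
    modifier = sum(
        GENE_POOL[gene]["benefit"] - GENE_POOL[gene]["cost"]
        for gene in organism["genes"]
    )
    return baseline_rate(organism["size_um"]) * (1.0 + modifier)


def random_community(seed, n_organisms=10):
    """Build a reproducible random community from a seed."""
    rng = random.Random(seed)
    return [random_organism(rng) for _ in range(n_organisms)]
```

Calling `random_community` with different seeds gives different randomized communities, which mirrors the repeated runs with randomly generated communities that the abstract describes; the coupling to a circulation model is outside the scope of this sketch.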
Adaptive Evolution of the Insulin Two-Gene System in Mouse
Shiao, Meng-Shin; Liao, Ben-Yang; Long, Manyuan; Yu, Hon-Tsen
2008-01-01
Insulin genes in mouse and rat compose a two-gene system in which Ins1 was retroposed from the partially processed mRNA of Ins2. When Ins1 originated and how it was retained in genomes still remain interesting problems. In this study, we used genomic approaches to detect insulin gene copy number variation in rodent species and investigated evolutionary forces acting on both Ins1 and Ins2. We characterized the phylogenetic distribution of the new insulin gene (Ins1) by Southern analyses and confirmed by sequencing insulin genes in the rodent genomes. The results demonstrate that Ins1 originated right before the mouse–rat split (∼20 MYA), and both Ins1 and Ins2 are under strong functional constraints in these murine species. Interestingly, by examining a range of nucleotide polymorphisms, we detected positive selection acting on both Ins2 and Ins1 gene regions in the Mus musculus domesticus populations. Furthermore, three amino acid sites were also identified as having evolved under positive selection in two insulin peptides: two are in the signal peptide and one is in the C-peptide. Our data suggest an adaptive divergence in the mouse insulin two-gene system, which may result from the response to environmental change caused by the rise of agricultural civilization, as proposed by the thrifty-genotype hypothesis. PMID:18245324
Positive selection on the killer whale mitogenome.
Foote, Andrew D; Morin, Phillip A; Durban, John W; Pitman, Robert L; Wade, Paul; Willerslev, Eske; Gilbert, M Thomas P; da Fonseca, Rute R
2011-02-23
Mitochondria produce up to 95 per cent of the eukaryotic cell's energy. The coding genes of the mitochondrial DNA may therefore evolve under selection owing to metabolic requirements. The killer whale, Orcinus orca, is polymorphic, has a global distribution and occupies a range of ecological niches. It is therefore a suitable organism for testing this hypothesis. We compared a global dataset of the complete mitochondrial genomes of 139 individuals for amino acid changes that were associated with radical physico-chemical property changes and were influenced by positive selection. Two such selected non-synonymous amino acid changes were found; one in each of two ecotypes that inhabit the Antarctic pack ice. Both substitutions were associated with changes in local polarity, increased steric constraints and α-helical tendencies that could influence overall metabolic performance, suggesting a functional change.
The sumLINK statistic for genetic linkage analysis in the presence of heterogeneity.
Christensen, G B; Knight, S; Camp, N J
2009-11-01
We present the "sumLINK" statistic--the sum of multipoint LOD scores for the subset of pedigrees with nominally significant linkage evidence at a given locus--as an alternative to common methods to identify susceptibility loci in the presence of heterogeneity. We also suggest the "sumLOD" statistic (the sum of positive multipoint LOD scores) as a companion to the sumLINK. sumLINK analysis identifies genetic regions of extreme consistency across pedigrees without regard to negative evidence from unlinked or uninformative pedigrees. Significance is determined by an innovative permutation procedure based on genome shuffling that randomizes linkage information across pedigrees. This procedure for generating the empirical null distribution may be useful for other linkage-based statistics as well. Using 500 genome-wide analyses of simulated null data, we show that the genome shuffling procedure results in the correct type 1 error rates for both the sumLINK and sumLOD. The power of the statistics was tested using 100 sets of simulated genome-wide data from the alternative hypothesis from GAW13. Finally, we illustrate the statistics in an analysis of 190 aggressive prostate cancer pedigrees from the International Consortium for Prostate Cancer Genetics, where we identified a new susceptibility locus. We propose that the sumLINK and sumLOD are ideal for collaborative projects and meta-analyses, as they do not require any sharing of identifiable data between contributing institutions. Further, loci identified with the sumLINK have good potential for gene localization via statistical recombinant mapping, as, by definition, several linked pedigrees contribute to each peak.
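The two statistics defined in the abstract above are simple sums over per-pedigree LOD scores, and a short sketch may make the definitions concrete. The nominal-significance threshold used here (LOD of 0.588, roughly a one-sided p of 0.05) is an assumption, and the permutation step is a simplified stand-in (an independent circular rotation of each pedigree's LOD profile) rather than the published genome shuffling procedure. All function names are hypothetical.

```python
import random

# Assumed cut-off for "nominally significant" linkage evidence in one pedigree.
NOMINAL_LOD = 0.588


def sum_link(lods, threshold=NOMINAL_LOD):
    """Sum of LOD scores over pedigrees that reach the nominal threshold."""
    return sum(lod for lod in lods if lod >= threshold)


def sum_lod(lods):
    """Sum of all positive LOD scores."""
    return sum(lod for lod in lods if lod > 0)


def shuffle_genomes(lod_matrix, rng):
    """Rotate each pedigree's genome-wide LOD profile by a random offset.

    Simplified illustration: it breaks the alignment of loci across
    pedigrees while keeping each pedigree's own profile intact.
    """
    shuffled = []
    for profile in lod_matrix:
        offset = rng.randrange(len(profile))
        shuffled.append(profile[offset:] + profile[:offset])
    return shuffled


def empirical_p(lod_matrix, locus, n_perm=1000, seed=0):
    """Empirical p-value of the sumLINK at one locus under the shuffled null."""
    rng = random.Random(seed)
    observed = sum_link([profile[locus] for profile in lod_matrix])
    hits = 0
    for _ in range(n_perm):
        permuted = shuffle_genomes(lod_matrix, rng)
        if sum_link([profile[locus] for profile in permuted]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

Here `lod_matrix` is a list with one list of LOD scores per pedigree, all evaluated on the same grid of loci. Because pedigrees with negative or weak evidence contribute nothing to `sum_link`, unlinked families do not cancel the signal from linked ones, which is the motivation given in the abstract.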
Lee, Kevin C; Stott, Matthew B; Dunfield, Peter F; Huttenhower, Curtis; McDonald, Ian R; Morgan, Xochitl C
2016-06-15
Chthonomonas calidirosea T49(T) is a low-abundance, carbohydrate-scavenging, and thermophilic soil bacterium with a seemingly disorganized genome. We hypothesized that the C. calidirosea genome would be highly responsive to local selection pressure, resulting in the divergence of its genomic content, genome organization, and carbohydrate utilization phenotype across environments. We tested this hypothesis by sequencing the genomes of four C. calidirosea isolates obtained from four separate geothermal fields in the Taupō Volcanic Zone, New Zealand. For each isolation site, we measured physicochemical attributes and defined the associated microbial community by 16S rRNA gene sequencing. Despite their ecological and geographical isolation, the genome sequences showed low divergence (maximum, 1.17%). Isolate-specific variations included single-nucleotide polymorphisms (SNPs), restriction-modification systems, and mobile elements but few major deletions and no major rearrangements. The 50-fold variation in C. calidirosea relative abundance among the four sites correlated with site environmental characteristics but not with differences in genomic content. Conversely, the carbohydrate utilization profiles of the C. calidirosea isolates corresponded to the inferred isolate phylogenies, which only partially paralleled the geographical relationships among the sample sites. Genomic sequence conservation does not entirely parallel geographic distance, suggesting that stochastic dispersal and localized extinction, which allow for rapid population homogenization with little restriction by geographical barriers, are possible mechanisms of C. calidirosea distribution. This dispersal and extinction mechanism is likely not limited to C. calidirosea but may shape the populations and genomes of many other low-abundance free-living taxa. 
This study compares the genomic sequence variations and metabolisms of four strains of Chthonomonas calidirosea, a rare thermophilic bacterium from the phylum Armatimonadetes. It additionally compares the microbial communities and chemistry of each of the geographically distinct sites from which the four C. calidirosea strains were isolated. C. calidirosea was previously reported to possess a highly disorganized genome, but it was unclear whether this reflected rapid evolution. Here, we show that each isolation site has a distinct chemistry and microbial community, but despite this, the C. calidirosea genome is highly conserved across all isolation sites. Furthermore, genomic sequence differences only partially paralleled geographic distance, suggesting that C. calidirosea genotypes are not primarily determined by adaptive evolution. Instead, the presence of C. calidirosea may be driven by stochastic dispersal and localized extinction. This ecological mechanism may apply to many other low-abundance taxa. Copyright © 2016 Lee et al.
Lee, Kevin C.; Stott, Matthew B.; Dunfield, Peter F.; Huttenhower, Curtis; McDonald, Ian R.
2016-01-01
ABSTRACT Chthonomonas calidirosea T49T is a low-abundance, carbohydrate-scavenging, and thermophilic soil bacterium with a seemingly disorganized genome. We hypothesized that the C. calidirosea genome would be highly responsive to local selection pressure, resulting in the divergence of its genomic content, genome organization, and carbohydrate utilization phenotype across environments. We tested this hypothesis by sequencing the genomes of four C. calidirosea isolates obtained from four separate geothermal fields in the Taupō Volcanic Zone, New Zealand. For each isolation site, we measured physicochemical attributes and defined the associated microbial community by 16S rRNA gene sequencing. Despite their ecological and geographical isolation, the genome sequences showed low divergence (maximum, 1.17%). Isolate-specific variations included single-nucleotide polymorphisms (SNPs), restriction-modification systems, and mobile elements but few major deletions and no major rearrangements. The 50-fold variation in C. calidirosea relative abundance among the four sites correlated with site environmental characteristics but not with differences in genomic content. Conversely, the carbohydrate utilization profiles of the C. calidirosea isolates corresponded to the inferred isolate phylogenies, which only partially paralleled the geographical relationships among the sample sites. Genomic sequence conservation does not entirely parallel geographic distance, suggesting that stochastic dispersal and localized extinction, which allow for rapid population homogenization with little restriction by geographical barriers, are possible mechanisms of C. calidirosea distribution. This dispersal and extinction mechanism is likely not limited to C. calidirosea but may shape the populations and genomes of many other low-abundance free-living taxa. 
IMPORTANCE This study compares the genomic sequence variations and metabolisms of four strains of Chthonomonas calidirosea, a rare thermophilic bacterium from the phylum Armatimonadetes. It additionally compares the microbial communities and chemistry of each of the geographically distinct sites from which the four C. calidirosea strains were isolated. C. calidirosea was previously reported to possess a highly disorganized genome, but it was unclear whether this reflected rapid evolution. Here, we show that each isolation site has a distinct chemistry and microbial community, but despite this, the C. calidirosea genome is highly conserved across all isolation sites. Furthermore, genomic sequence differences only partially paralleled geographic distance, suggesting that C. calidirosea genotypes are not primarily determined by adaptive evolution. Instead, the presence of C. calidirosea may be driven by stochastic dispersal and localized extinction. This ecological mechanism may apply to many other low-abundance taxa. PMID:27060125
Silva, Saura R.; Diaz, Yani C. A.; Penha, Helen Alves; Pinheiro, Daniel G.; Fernandes, Camila C.; Miranda, Vitor F. O.; Michael, Todd P.
2016-01-01
Lentibulariaceae is the richest family of carnivorous plants spanning three genera including Pinguicula, Genlisea, and Utricularia. Utricularia is globally distributed, and, unlike Pinguicula and Genlisea, has both aquatic and terrestrial forms. In this study we present the analysis of the chloroplast (cp) genome of the terrestrial Utricularia reniformis. U. reniformis has a standard cp genome of 139,725 bp, encoding a gene repertoire similar to essentially all photosynthetic organisms. However, an exclusive combination of losses and pseudogenization of the plastid NAD(P)H-dehydrogenase (ndh) gene complex was observed. Comparisons among aquatic and terrestrial forms of Pinguicula, Genlisea, and Utricularia indicate that, whereas the aquatic forms retained functional copies of the eleven ndh genes, these have been lost or truncated in terrestrial forms, suggesting that the ndh function may be dispensable in terrestrial Lentibulariaceae. Phylogenetic scenarios of the ndh gene loss and recovery among Pinguicula, Genlisea, and Utricularia to the ancestral Lentibulariaceae clade are proposed. Interestingly, RNAseq analysis evidenced that U. reniformis cp genes are transcribed, including the truncated ndh genes, suggesting that these are not completely inactivated. In addition, potential novel RNA-editing sites were identified in at least six U. reniformis cp genes, while none were identified in the truncated ndh genes. Moreover, phylogenomic analyses support that Lentibulariaceae is monophyletic, belonging to the higher core Lamiales clade, corroborating the hypothesis that the first Utricularia lineage emerged in terrestrial habitats and then evolved to epiphytic and aquatic forms. Furthermore, several truncated cp genes were found interspersed with U. reniformis mitochondrial and nuclear genome scaffolds, indicating that, as observed in other smaller plant genomes, such as Arabidopsis thaliana, and the related and carnivorous Genlisea nigrocaulis and G. hispidula, the endosymbiotic gene transfer may also shape the U. reniformis genome in a similar fashion. Overall, the comparative analysis of the U. reniformis cp genome provides new insight into the ndh genes and cp genome evolution of carnivorous plants from the Lentibulariaceae family. PMID:27764252
Montague, Michael J; Li, Gang; Gandolfi, Barbara; Khan, Razib; Aken, Bronwen L; Searle, Steven M J; Minx, Patrick; Hillier, LaDeana W; Koboldt, Daniel C; Davis, Brian W; Driscoll, Carlos A; Barr, Christina S; Blackistone, Kevin; Quilez, Javier; Lorente-Galdos, Belen; Marques-Bonet, Tomas; Alkan, Can; Thomas, Gregg W C; Hahn, Matthew W; Menotti-Raymond, Marilyn; O'Brien, Stephen J; Wilson, Richard K; Lyons, Leslie A; Murphy, William J; Warren, Wesley C
2014-12-02
Little is known about the genetic changes that distinguish domestic cat populations from their wild progenitors. Here we describe a high-quality domestic cat reference genome assembly and comparative inferences made with other cat breeds, wildcats, and other mammals. Based upon these comparisons, we identified positively selected genes enriched for genes involved in lipid metabolism that underpin adaptations to a hypercarnivorous diet. We also found positive selection signals within genes underlying sensory processes, especially those affecting vision and hearing in the carnivore lineage. We observed an evolutionary tradeoff between functional olfactory and vomeronasal receptor gene repertoires in the cat and dog genomes, with an expansion of the feline chemosensory system for detecting pheromones at the expense of odorant detection. Genomic regions harboring signatures of natural selection that distinguish domestic cats from their wild congeners are enriched in neural crest-related genes associated with behavior and reward in mouse models, as predicted by the domestication syndrome hypothesis. Our description of a previously unidentified allele for the gloving pigmentation pattern found in the Birman breed supports the hypothesis that cat breeds experienced strong selection on specific mutations drawn from random bred populations. Collectively, these findings provide insight into how the process of domestication altered the ancestral wildcat genome and build a resource for future disease mapping and phylogenomic studies across all members of the Felidae.
Li, Gang; Gandolfi, Barbara; Khan, Razib; Aken, Bronwen L.; Searle, Steven M. J.; Minx, Patrick; Hillier, LaDeana W.; Koboldt, Daniel C.; Davis, Brian W.; Driscoll, Carlos A.; Barr, Christina S.; Blackistone, Kevin; Quilez, Javier; Lorente-Galdos, Belen; Marques-Bonet, Tomas; Alkan, Can; Thomas, Gregg W. C.; Hahn, Matthew W.; Menotti-Raymond, Marilyn; O’Brien, Stephen J.; Wilson, Richard K.; Lyons, Leslie A.; Murphy, William J.; Warren, Wesley C.
2014-01-01
Little is known about the genetic changes that distinguish domestic cat populations from their wild progenitors. Here we describe a high-quality domestic cat reference genome assembly and comparative inferences made with other cat breeds, wildcats, and other mammals. Based upon these comparisons, we identified positively selected genes enriched for genes involved in lipid metabolism that underpin adaptations to a hypercarnivorous diet. We also found positive selection signals within genes underlying sensory processes, especially those affecting vision and hearing in the carnivore lineage. We observed an evolutionary tradeoff between functional olfactory and vomeronasal receptor gene repertoires in the cat and dog genomes, with an expansion of the feline chemosensory system for detecting pheromones at the expense of odorant detection. Genomic regions harboring signatures of natural selection that distinguish domestic cats from their wild congeners are enriched in neural crest-related genes associated with behavior and reward in mouse models, as predicted by the domestication syndrome hypothesis. Our description of a previously unidentified allele for the gloving pigmentation pattern found in the Birman breed supports the hypothesis that cat breeds experienced strong selection on specific mutations drawn from random bred populations. Collectively, these findings provide insight into how the process of domestication altered the ancestral wildcat genome and build a resource for future disease mapping and phylogenomic studies across all members of the Felidae. PMID:25385592
Inda, Márcia A; van Batenburg, Marinus F; Roos, Marco; Belloum, Adam S Z; Vasunin, Dmitry; Wibisono, Adianto; van Kampen, Antoine H C; Breit, Timo M
2008-08-08
Chromosome location is often used as a scaffold to organize genomic information in both the living cell and molecular biological research. Thus, ever-increasing amounts of data about genomic features are stored in public databases and can be readily visualized by genome browsers. To perform in silico experimentation conveniently with this genomics data, biologists need tools to process and compare datasets routinely and explore the obtained results interactively. The complexity of such experimentation requires these tools to be based on an e-Science approach, hence generic, modular, and reusable. A virtual laboratory environment with workflows, workflow management systems, and Grid computation are therefore essential. Here we apply an e-Science approach to develop SigWin-detector, a workflow-based tool that can detect significantly enriched windows of (genomic) features in a (DNA) sequence in a fast and reproducible way. For proof-of-principle, we utilize a biological use case to detect regions of increased and decreased gene expression (RIDGEs and anti-RIDGEs) in human transcriptome maps. We improved the original method for RIDGE detection by replacing the costly step of estimation by random sampling with a faster analytical formula for computing the distribution of the null hypothesis being tested and by developing a new algorithm for computing moving medians. SigWin-detector was developed using the WS-VLAM workflow management system and consists of several reusable modules that are linked together in a basic workflow. The configuration of this basic workflow can be adapted to satisfy the requirements of the specific in silico experiment. As we show with the results from analyses in the biological use case on RIDGEs, SigWin-detector is an efficient and reusable Grid-based tool for discovering windows enriched for features of a particular type in any sequence of values. Thus, SigWin-detector provides the proof-of-principle for the modular e-Science based concept of integrative bioinformatics experimentation.
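The moving-median step mentioned in the abstract above can be illustrated with a small sketch. This is not the algorithm developed for SigWin-detector, which the abstract does not describe: it is a generic approach that keeps each window in a sorted list, and the fixed `threshold` is a hypothetical stand-in for the significance cut-off that the tool derives from the null distribution.

```python
from bisect import bisect_left, insort


def moving_median(values, window):
    """Medians of every consecutive window of odd length over values."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if window > len(values):
        return []
    ordered = sorted(values[:window])  # current window, kept sorted
    mid = window // 2
    medians = [ordered[mid]]
    for i in range(window, len(values)):
        # Remove the value leaving the window, then insert the new one in order.
        del ordered[bisect_left(ordered, values[i - window])]
        insort(ordered, values[i])
        medians.append(ordered[mid])
    return medians


def enriched_windows(values, window, threshold):
    """Start indices of windows whose median exceeds a caller-supplied threshold."""
    return [
        start
        for start, median in enumerate(moving_median(values, window))
        if median > threshold
    ]
```

Updating the sorted window incrementally avoids re-sorting every window from scratch, which is the kind of saving that matters when many window sizes are scanned along a whole chromosome.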
Parallel Loss of Plastid Introns and Their Maturase in the Genus Cuscuta
McNeal, Joel R.; Kuehl, Jennifer V.; Boore, Jeffrey L.; Leebens-Mack, Jim; dePamphilis, Claude W.
2009-01-01
Plastid genome content and arrangement are highly conserved across most land plants and their closest relatives, streptophyte algae, with nearly all plastid introns having invaded the genome in their common ancestor at least 450 million years ago. One such intron, within the transfer RNA trnK-UUU, contains a large open reading frame that encodes a presumed intron maturase, matK. This gene is missing from the plastid genomes of two species in the parasitic plant genus Cuscuta but is found in all other published land plant and streptophyte algal plastid genomes, including that of the nonphotosynthetic angiosperm Epifagus virginiana and two other species of Cuscuta. By examining matK and plastid intron distribution in Cuscuta, we add support to the hypothesis that its normal role is in splicing seven of the eight group IIA introns in the genome. We also analyze matK nucleotide sequences from Cuscuta species and relatives that retain matK to test whether changes in selective pressure in the maturase are associated with intron deletion. Stepwise loss of most group IIA introns from the plastid genome results in substantial change in selective pressure within the hypothetical RNA-binding domain of matK in both Cuscuta and Epifagus, either through evolution from a generalist to a specialist intron splicer or due to loss of a particular intron responsible for most of the constraint on the binding region. The possibility of intron-specific specialization in the X-domain is implicated by evidence of positive selection on the lineage leading to C. nitida in association with the loss of six of seven introns putatively spliced by matK. Moreover, transfer RNA gene deletion facilitated by parasitism combined with an unusually high rate of intron loss from remaining functional plastid genes created a unique circumstance on the lineage leading to Cuscuta subgenus Grammica that allowed elimination of matK in the most species-rich lineage of Cuscuta. PMID:19543388
2015-09-01
glioblastoma. We have successfully established several patient-derived cell lines from glioblastoma tumors and further established a number of...and single-cell technologies. Although the focus of this research is glioblastoma, the proposed tools are generally applicable to all cancer-based...studies. SUBJECT TERMS: Human cohorts, Glioblastoma, Genomic, Proteomic, Single-cell technologies, Hypothesis-driven, integrative systems approach
Origins of DNA Replication and Amplification in the Breast Cancer Genome
2011-09-01
Award Number: W81XWH-10-1-0463 TITLE: Origins of DNA Replication and...hypothesis we need to map origins of DNA replication in the genome and ask which of these coincide with sites of DNA amplification and with ER...Spring Harbor DNA Replication meetings this summer/early fall. Figures from the posters and also the abstracts are attached. The samples have been
Developing a Hypothetical Learning Trajectory for the Sampling Distribution of the Sample Means
NASA Astrophysics Data System (ADS)
Syafriandi
2018-04-01
Sampling distributions are special types of probability distributions that are important in hypothesis testing. The concept of a sampling distribution may well be the key concept in understanding how inferential procedures work. In this paper, we design a hypothetical learning trajectory (HLT) for the sampling distribution of the sample mean and discuss how the sampling distribution is used in hypothesis testing.
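The idea of a sampling distribution of the sample mean can be illustrated with a small simulation. The following is a minimal sketch, not part of the learning trajectory described in the paper; the die-roll population, the sample size of 30, and the function name `sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics

def sample_means(population, n, draws, seed=0):
    """Draw `draws` samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)]

# Toy population (assumption): the six faces of a fair die.
population = list(range(1, 7))
means = sample_means(population, n=30, draws=5000)

mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

# The mean of the sample means should sit close to mu, and their spread
# close to sigma / sqrt(n), the standard error used in hypothesis tests.
print(round(statistics.fmean(means), 2), round(mu, 2))
print(round(statistics.stdev(means), 2), round(sigma / 30 ** 0.5, 2))
```

Comparing an observed sample mean against this simulated spread is the step that connects the sampling distribution to a hypothesis test.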
NASA Astrophysics Data System (ADS)
Tanji, Hajime; Kiri, Hirohide; Kobayashi, Shintaro
When total supply is smaller than total demand, it is difficult to apply the paddy irrigation water distribution rule. The gap must be narrowed by decreasing demand. Historically, the upstream served rule, a rotation schedule, or a central schedule weighted by irrigated area was adopted. This paper proposes the hypothesis that these rules are dependent on social justice, a hypothesis called the "Society-Justice-Water Distribution Rule Hypothesis". Justice, which means a balance of efficiency and equity of distribution, is discussed under the political philosophies of utilitarianism, liberalism (Rawls), libertarianism, and communitarianism. The upstream served rule can be derived from libertarianism. The rotation schedule and central schedule can be derived from communitarianism. Liberalism can provide an arranged schedule to adjust supply and demand based on "the Difference Principle". The authors conclude that to achieve efficiency and equity, liberalism may provide the best solution after modernization.
Elhaik, Eran
2013-01-01
The question of Jewish ancestry has been the subject of controversy for over two centuries and has yet to be resolved. The "Rhineland hypothesis" depicts Eastern European Jews as a "population isolate" that emerged from a small group of German Jews who migrated eastward and expanded rapidly. Alternatively, the "Khazarian hypothesis" suggests that Eastern European Jews descended from the Khazars, an amalgam of Turkic clans that settled the Caucasus in the early centuries CE and converted to Judaism in the 8th century. Mesopotamian and Greco-Roman Jews continuously reinforced the Judaized empire until the 13th century. Following the collapse of their empire, the Judeo-Khazars fled to Eastern Europe. The rise of European Jewry is therefore explained by the contribution of the Judeo-Khazars. Thus far, however, the Khazars' contribution has been estimated only empirically, as the absence of genome-wide data from Caucasus populations precluded testing the Khazarian hypothesis. Recent sequencing of modern Caucasus populations prompted us to revisit the Khazarian hypothesis and compare it with the Rhineland hypothesis. We applied a wide range of population genetic analyses to compare these two hypotheses. Our findings support the Khazarian hypothesis and portray the European Jewish genome as a mosaic of Near Eastern-Caucasus, European, and Semitic ancestries, thereby consolidating previous contradictory reports of Jewish ancestry. We further describe a major difference among Caucasus populations explained by the early presence of Judeans in the Southern and Central Caucasus. Our results have important implications for the demographic forces that shaped the genetic diversity in the Caucasus and for medical studies.
Rarity and genetic diversity in Indo–Pacific Acropora corals
Richards, Zoe T; Oppen, Madeleine J H
2012-01-01
Among various potential consequences of rarity is genetic erosion. Neutral genetic theory predicts that rare species will have lower genetic diversity than common species. To examine the association between genetic diversity and rarity, variation at eight DNA microsatellite markers was documented for 14 Acropora species that display different patterns of distribution and abundance in the Indo–Pacific Ocean. Our results show that the relationship between rarity and genetic diversity is not a positive linear association because, contrary to expectations, some rare species are genetically diverse and some populations of common species are genetically depleted. Our data suggest that inbreeding is the most likely mechanism of genetic depletion in both rare and common corals, and that hybridization is the most likely explanation for higher than expected levels of genetic diversity in rare species. A significant hypothesis generated from our study with direct conservation implications is that as a group, Acropora corals have lower genetic diversity at neutral microsatellite loci than may be expected from their taxonomic diversity, and this may suggest a heightened susceptibility to environmental change. This hypothesis requires validation based on genetic diversity estimates derived from a large portion of the genome. PMID:22957189
The protective function of noncoding DNA in genome defense of eukaryotic male germ cells.
Qiu, Guo-Hua; Huang, Cuiqin; Zheng, Xintian; Yang, Xiaoyan
2018-04-01
Peripheral and abundant noncoding DNA has been hypothesized to protect the genome and the central protein-coding sequences against DNA damage in the somatic genome. In the cytosol, invading exogenous nucleic acids may first be deactivated by small RNAs encoded by noncoding DNA via mechanisms similar to the prokaryotic CRISPR-Cas system. In the nucleus, the radicals generated by radiation in the cytosol, radiation energy and invading exogenous nucleic acids are absorbed, blocked and/or reduced by peripheral heterochromatin, and damaged DNA in heterochromatin is removed and excluded from the nucleus to the cytoplasm through nuclear pore complexes. To further strengthen the hypothesis, this review summarizes the experimental evidence supporting the protective function of noncoding DNA in the genome of male germ cells. Based on these data, this review provides evidence supporting the protective role of noncoding DNA in the genome defense of the sperm genome through mechanisms similar to those of the somatic genome.
Genomics, "Discovery Science," Systems Biology, and Causal Explanation: What Really Works?
Davidson, Eric H
2015-01-01
Diverse and non-coherent sets of epistemological principles currently inform research in the general area of functional genomics. Here, from the personal point of view of a scientist with over half a century of immersion in hypothesis driven scientific discovery, I compare and deconstruct the ideological bases of prominent recent alternatives, such as "discovery science," some productions of the ENCODE project, and aspects of large data set systems biology. The outputs of these types of scientific enterprise qualitatively reflect their radical definitions of scientific knowledge, and of its logical requirements. Their properties emerge in high relief when contrasted (as an example) to a recent, system-wide, predictive analysis of a developmental regulatory apparatus that was instead based directly on hypothesis-driven experimental tests of mechanism.
A probabilistic method for testing and estimating selection differences between populations.
He, Yungang; Wang, Minxian; Huang, Xin; Li, Ran; Xu, Hongyang; Xu, Shuhua; Jin, Li
2015-12-01
Human populations around the world encounter various environmental challenges and, consequently, develop genetic adaptations to different selection forces. Identifying the differences in natural selection between populations is critical for understanding the roles of specific genetic variants in evolutionary adaptation. Although numerous methods have been developed to detect genetic loci under recent directional selection, a probabilistic solution for testing and quantifying selection differences between populations is lacking. Here we report the development of a probabilistic method for testing and estimating selection differences between populations. By use of a probabilistic model of genetic drift and selection, we showed that logarithm odds ratios of allele frequencies provide estimates of the differences in selection coefficients between populations. The estimates approximate a normal distribution, and variance can be estimated using genome-wide variants. This allows us to quantify differences in selection coefficients and to determine the confidence intervals of the estimate. Our work also revealed the link between genetic association testing and hypothesis testing of selection differences. It therefore supplies a solution for hypothesis testing of selection differences. This method was applied to a genome-wide data analysis of Han and Tibetan populations. The results confirmed that both the EPAS1 and EGLN1 genes are under statistically different selection in Han and Tibetan populations. We further estimated differences in the selection coefficients for genetic variants involved in melanin formation and determined their confidence intervals between continental population groups. Application of the method to empirical data demonstrated the outstanding capability of this novel approach for testing and quantifying differences in natural selection. © 2015 He et al.; Published by Cold Spring Harbor Laboratory Press.
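The core quantity in the abstract above, the log odds ratio of allele frequencies between two populations, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' estimator: the allele frequencies are made up, the function names are hypothetical, the spread of genome-wide log odds ratios is used directly as an empirical null, and any scaling or sampling-error correction that the full model may require is omitted.

```python
import math
import statistics

def log_odds_ratio(p1, p2):
    """Log odds ratio of an allele's frequency in population 1 versus population 2."""
    return math.log(p1 / (1 - p1)) - math.log(p2 / (1 - p2))

def selection_difference_test(candidate, background, z_crit=1.96):
    """Return (delta, z, interval) for one variant.

    The standard deviation of genome-wide log odds ratios stands in for the
    drift variance (an assumption of this sketch)."""
    null = [log_odds_ratio(p1, p2) for p1, p2 in background]
    centre = statistics.fmean(null)
    sd = statistics.stdev(null)
    delta = log_odds_ratio(*candidate)
    z = (delta - centre) / sd
    return delta, z, (delta - z_crit * sd, delta + z_crit * sd)

# Hypothetical (population 1, population 2) allele frequencies.
background = [(0.30, 0.32), (0.55, 0.50), (0.12, 0.15),
              (0.70, 0.68), (0.45, 0.47), (0.25, 0.22)]
candidate = (0.85, 0.10)  # a strongly differentiated variant, invented for the example

delta, z, interval = selection_difference_test(candidate, background)
print(round(delta, 3), round(z, 1), tuple(round(x, 3) for x in interval))
```

A large absolute z value flags a variant whose frequency difference is hard to explain by the genome-wide background alone; a real analysis would use many thousands of background variants rather than six.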
Pereira, Ricardo J; Martínez-Solano, Iñigo; Buckley, David
2016-04-01
Ecological models predict that, in the face of climate change, taxa occupying steep altitudinal gradients will shift their distributions, leading to the contraction or extinction of the high-elevation (cold-adapted) taxa. However, hybridization between ecomorphologically divergent taxa commonly occurs in nature and may lead to alternative evolutionary outcomes, such as genetic merger or gene flow at specific genes. We evaluate this hypothesis by studying patterns of divergence and gene flow across three replicate contact zones between high- and low-elevation ecomorphs of the fire salamander (Salamandra salamandra) that have experienced altitudinal range shifts over the current postglacial period. Strong population structure with high genetic divergence in mitochondrial DNA suggests that vicariant evolution has occurred over several glacial-interglacial cycles and that it has led to cryptic differentiation within ecomorphs. In current parapatric boundaries, we do not find evidence for local extinction and replacement upon postglacial expansion. Instead, parapatric taxa recurrently show discordance between mitochondrial and nuclear markers, suggesting nuclear-mediated gene flow across contact zones. Isolation with migration models support this hypothesis by showing significant gene flow across all five parapatric boundaries. Together, our results suggest that, while some genomic regions, such as the mitochondria, may follow morphologic species traits and retreat to isolated mountain tops, other genomic regions, such as nuclear markers, may flow across parapatric boundaries, sometimes leading to a complete genetic merger. We show that despite high ecologic and morphologic divergence over prolonged periods of time, hybridization allows for evolutionary outcomes alternative to extinction and replacement of taxa in response to climate change. © 2016 John Wiley & Sons Ltd.
Biased distributions and decay of long interspersed nuclear elements in the chicken genome.
Abrusán, György; Krambeck, Hans-Jürgen; Junier, Thomas; Giordano, Joti; Warburton, Peter E
2008-01-01
The genomes of birds are much smaller than mammalian genomes, and transposable elements (TEs) make up only 10% of the chicken genome, compared with the 45% of the human genome. To study the mechanisms that constrain the copy numbers of TEs, and as a consequence the genome size of birds, we analyzed the distributions of LINEs (CR1's) and SINEs (MIRs) on the chicken autosomes and Z chromosome. We show that (1) CR1 repeats are longest on the Z chromosome and their length is negatively correlated with the local GC content; (2) the decay of CR1 elements is highly biased, and the 5'-ends of the insertions are lost much faster than their 3'-ends; (3) the GC distribution of CR1 repeats shows a bimodal pattern with repeats enriched in both AT-rich and GC-rich regions of the genome, but the CR1 families show large differences in their GC distribution; and (4) the few MIRs in the chicken are most abundant in regions with intermediate GC content. Our results indicate that the primary mechanism that removes repeats from the chicken genome is ectopic exchange and that the low abundance of repeats in avian genomes is likely to be the consequence of their high recombination rates.
Romine, Margaret F; Rodionov, Dmitry A; Maezato, Yukari; Osterman, Andrei L; Nelson, William C
2017-01-01
Many microorganisms are unable to synthesize essential B vitamin-related enzyme cofactors de novo. The underlying mechanisms by which such microbes survive in multi-species communities are largely unknown. We previously reported the near-complete genome sequence of two ~18-member unicyanobacterial microbial consortia that maintain stable membership on defined medium lacking vitamins. Here we have used genome analysis and growth studies on isolates derived from the consortia to reconstruct pathways for biogenesis of eight essential cofactors and predict cofactor usage and precursor exchange in these communities. Our analyses revealed that all but the two Halomonas and cyanobacterial community members were auxotrophic for at least one cofactor. We also observed a mosaic distribution of salvage routes for a variety of cofactor precursors, including those produced by photolysis. Potentially bidirectional transporters were observed to be preferentially in prototrophs, suggesting a mechanism for controlled precursor release. Furthermore, we found that Halomonas sp. do not require cobalamin nor control its synthesis, supporting the hypothesis that they overproduce and export vitamins. Collectively, these observations suggest that the consortia rely on syntrophic metabolism of cofactors as a survival strategy for optimization of metabolic exchange within a shared pool of micronutrients. PMID:28186498
Mazzoleni, Sofia; Rovatsos, Michail; Schillaci, Odessa; Dumas, Francesca
2018-01-01
We explored the topology of 18S and 28S rDNA units by fluorescence in situ hybridization (FISH) in the karyotypes of thirteen species representatives from major groups of Primates and Tupaia minor (Günther, 1876) (Scandentia), in order to expand our knowledge of Primate genome reshuffling and to identify the possible dispersion mechanisms of rDNA sequences. We documented that rDNA probe signals were identified on one to six pairs of chromosomes, both acrocentric and metacentric ones. In addition, we examined the potential homology of chromosomes bearing rDNA genes across different species and in a wide phylogenetic perspective, based on the DAPI-inverted pattern and their synteny to human. Our analysis revealed an extensive variability in the topology of the rDNA signals across studied species. In some cases, closely related species show signals on homologous chromosomes, thus representing synapomorphies, while in other cases, signal was detected on distinct chromosomes, leading to species specific patterns. These results led us to support the hypothesis that different mechanisms are responsible for the distribution of the ribosomal DNA cluster in Primates. PMID:29416829
Rapid Detection of Positive Selection in Genes and Genomes Through Variation Clusters
Wagner, Andreas
2007-01-01
Positive selection in genes and genomes can point to the evolutionary basis for differences among species and among races within a species. The detection of positive selection can also help identify functionally important protein regions and thus guide protein engineering. Many existing tests for positive selection are excessively conservative, vulnerable to artifacts caused by demographic population history, or computationally very intensive. I here propose a simple and rapid test that is complementary to existing tests and that overcomes some of these problems. It relies on the null hypothesis that neutrally evolving DNA regions should show a Poisson distribution of nucleotide substitutions. The test detects significant deviations from this expectation in the form of variation clusters, highly localized groups of amino acid changes in a coding region. In applying this test to several thousand human–chimpanzee gene orthologs, I show that such variation clusters are not generally caused by relaxed selection. They occur in well-defined domains of a protein's tertiary structure and show a large excess of amino acid replacement over silent substitutions. I also identify multiple new human–chimpanzee orthologs subject to positive selection, among them genes that are involved in reproductive functions, immune defense, and the nervous system. PMID:17603100
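The null hypothesis named above, a Poisson distribution of substitutions along a neutrally evolving region, suggests a simple sliding-window check. The following is a rough sketch only, not the published test: the substitution positions are invented, the function names are hypothetical, the window size is arbitrary, and a Bonferroni threshold over overlapping windows is used as a deliberately conservative assumption.

```python
import math

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def variation_clusters(positions, length, window, alpha):
    """Return (start, count, p) for windows whose substitution count is
    improbably high under a uniform-rate Poisson null."""
    lam = len(positions) / length * window  # expected substitutions per window
    hits = []
    for start in range(0, length - window + 1):
        count = sum(start <= pos < start + window for pos in positions)
        p = poisson_upper_tail(count, lam)
        if p < alpha:
            hits.append((start, count, p))
    return hits

# Hypothetical substitution positions in a 400-site region, with a dense run near site 150.
positions = [12, 87, 150, 151, 153, 155, 156, 158, 240, 301, 377]
length, window = 400, 10
n_windows = length - window + 1
hits = variation_clusters(positions, length, window, alpha=0.05 / n_windows)
for start, count, p in hits:
    print(start, count, f"{p:.2e}")
```

Only windows covering the dense run are reported; isolated substitutions elsewhere are compatible with the Poisson expectation.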
Dating Antarctic ice sheet collapse: Proposing a molecular genetic approach
NASA Astrophysics Data System (ADS)
Strugnell, Jan M.; Pedro, Joel B.; Wilson, Nerida G.
2018-01-01
Sea levels at the end of this century are projected to be 0.26-0.98 m higher than today. The upper end of this range, and even higher estimates, cannot be ruled out because of major uncertainties in the dynamic response of polar ice sheets to a warming climate. Here, we propose an ecological genetics approach that can provide insight into the past stability and configuration of the West Antarctic Ice Sheet (WAIS). We propose independent testing of the hypothesis that a trans-Antarctic seaway occurred at the last interglacial. Examination of the genomic signatures of bottom-dwelling marine species using the latest methods can provide an independent window into the integrity of the WAIS more than 100,000 years ago. Periods of connectivity facilitated by trans-Antarctic seaways could be revealed by dating coalescent events recorded in DNA. These methods allow alternative scenarios to be tested against a fit to genomic data. Ideal candidate taxa for this work would need to possess a circumpolar distribution, a benthic habitat, and some level of genetic structure indicated by phylogeographical investigation. The purpose of this perspective piece is to set out an ecological genetics method to help resolve when the West Antarctic Ice Sheet last collapsed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Allen, J; Velsko, S
This report explores the question of whether meaningful conclusions can be drawn regarding the transmission relationship between two microbial samples on the basis of differences observed between the two samples' respective genomes. Unlike similar forensic applications using human DNA, the rapid rate of microbial genome evolution combined with the dynamics of infectious disease require a shift in thinking on what it means for two samples to 'match' in support of a forensic hypothesis. Previous outbreaks of SARS-CoV, FMDV and HIV were examined to investigate the question of how microbial sequence data can be used to draw inferences that link two infected individuals by direct transmission. The results are counterintuitive with respect to human DNA forensic applications in that some genetic change, rather than exact matching, improves confidence in inferring direct transmission links; however, too much genetic change poses challenges that can weaken confidence in inferred links. High rates of infection coupled with relatively weak selective pressure observed in the SARS-CoV and FMDV data lead to fairly low confidence for direct transmission links. Confidence values for forensic hypotheses increased when testing for the possibility that samples are separated by at most a few intermediate hosts. Moreover, the observed outbreak conditions support the potential to provide high confidence values for hypotheses that exclude direct transmission links. Transmission inferences are based on the total number of observed or inferred genetic changes separating two sequences rather than uniquely weighing the importance of any one genetic mismatch. Thus, inferences are surprisingly robust in the presence of sequencing errors provided the error rates are randomly distributed across all samples in the reference outbreak database and the novel sequence samples in question.
When the number of observed nucleotide mutations is limited due to characteristics of the outbreak or the availability of only partial rather than whole genome sequencing, indel information was shown to have the potential to improve performance, but only for select outbreak conditions. In examined HIV transmission cases, extended evolution proved to be the limiting factor in assigning high confidence to transmission links; however, the potential to correct for extended evolution not associated with transmission events is demonstrated. Outbreak-specific conditions such as selective pressure (in the form of varying mutation rate) are shown to impact the strength of the inferences made, and a Monte Carlo simulation tool is introduced to provide upper and lower bounds on the confidence values associated with a forensic hypothesis.
Indigenous Arabs are descendants of the earliest split from ancient Eurasian populations
Rodriguez-Flores, Juan L.; Fakhro, Khalid; Agosto-Perez, Francisco; Ramstetter, Monica D.; Arbiza, Leonardo; Vincent, Thomas L.; Robay, Amal; Malek, Joel A.; Suhre, Karsten; Chouchane, Lotfi; Badii, Ramin; Al-Nabet Al-Marri, Ajayeb; Abi Khalil, Charbel; Zirie, Mahmoud; Jayyousi, Amin; Salit, Jacqueline; Keinan, Alon; Clark, Andrew G.; Crystal, Ronald G.; Mezey, Jason G.
2016-01-01
An open question in the history of human migration is the identity of the earliest Eurasian populations that have left contemporary descendants. The Arabian Peninsula was the initial site of the out-of-Africa migrations that occurred between 125,000 and 60,000 yr ago, leading to the hypothesis that the first Eurasian populations were established on the Peninsula and that contemporary indigenous Arabs are direct descendants of these ancient peoples. To assess this hypothesis, we sequenced the entire genomes of 104 unrelated natives of the Arabian Peninsula at high coverage, including 56 of indigenous Arab ancestry. The indigenous Arab genomes defined a cluster distinct from other ancestral groups, and these genomes showed clear hallmarks of an ancient out-of-Africa bottleneck. Similar to other Middle Eastern populations, the indigenous Arabs had higher levels of Neanderthal admixture compared to Africans but had lower levels than Europeans and Asians. These levels of Neanderthal admixture are consistent with an early divergence of Arab ancestors after the out-of-Africa bottleneck but before the major Neanderthal admixture events in Europe and other regions of Eurasia. When compared to worldwide populations sampled in the 1000 Genomes Project, although the indigenous Arabs had a signal of admixture with Europeans, they clustered in a basal, outgroup position to all 1000 Genomes non-Africans when considering pairwise similarity across the entire genome. These results place indigenous Arabs as the most distant relatives of all other contemporary non-Africans and identify these people as direct descendants of the first Eurasian populations established by the out-of-Africa migrations. PMID:26728717
On the roles of repetitive DNA elements in the context of a unified genomic-epigenetic system.
von Sternberg, Richard
2002-12-01
Repetitive DNA sequences comprise a substantial portion of most eukaryotic and some prokaryotic chromosomes. Despite nearly forty years of research, the functions of various sequence families as a whole and their monomer units remain largely unknown. The inability to map specific functional roles onto many repetitive DNA elements (REs), coupled with the taxon-specificity of sequence families, has led many to speculate that these genomic components are "selfish" replicators generating genomic "junk." The purpose of this paper is to critically examine the selfishness, evolutionary effects, and functionality of REs. First, a brief overview of the range of ideas pertaining to RE function is presented. Second, the argument is presented that the selfish DNA "hypothesis" is actually a narrative scheme, that it serves to protect neo-Darwinian assumptions from criticism, and that this story is untestable and therefore not a hypothesis. Third, attempts to synthesize the selfish DNA concept with complex systems models of the genome and RE functionality are critiqued. Fourth, the supposed connection between RE-induced mutations and macroevolutionary events is stated to be at variance with empirical evidence and theoretical considerations. Hypotheses that base phylogenetic transitions in repetitive sequence changes thus remain speculative. Fifth and finally, the case is made for viewing REs as integrally functional components of chromosomes, genomes, and cells. It is argued throughout that a new conceptual framework is needed for understanding the roles of repetitive DNA in genomic/epigenetic systems, and that neo-Darwinian "narratives" have been the primary obstacle to elucidating the effects of these enigmatic components of chromosomes.
Coconut genome size determined by flow cytometry: Tall versus Dwarf types.
Freitas Neto, M; Pereira, T N S; Geronimo, I G C; Azevedo, A O N; Ramos, S R R; Pereira, M G
2016-02-11
Coconuts (Cocos nucifera L.) are tropical palm trees that are classified into Tall and Dwarf types based on height, and both types are diploid (2n = 2x = 32 chromosomes). The reproduction mode is autogamous for Dwarf types and allogamous for Tall types. One hypothesis for the origin of the Dwarf coconut suggests that it is a Tall variant that resulted from either mutation or inbreeding, and differences in genome size between the two types would support this hypothesis. In this study, we estimated the genome sizes of 14 coconut accessions (eight Tall and six Dwarf types) using flow cytometry. Nuclei were extracted from leaf discs and stained with propidium iodide, and Pisum sativum (2C = 9.07 pg DNA) was used as an internal standard. Histograms with good resolution and low coefficients of variation (2.5 to 3.2%) were obtained. The 2C DNA content ranged from 5.72 to 5.48 pg for Tall accessions and from 5.58 to 5.52 pg for Dwarf accessions. The mean genome sizes for Tall and Dwarf specimens were 5.59 and 5.55 pg, respectively. Among all accessions, Rennel Island Tall had the highest mean DNA content (5.72 pg), whereas West African Tall had the lowest (5.48 pg). The mean coconut genome size (2C = 5.57 pg, corresponding to 2723.73 Mbp/haploid set) was classified as small. Only small differences in genome size existed among the coconut accessions, suggesting that the Dwarf type did not evolve from the Tall type.
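The conversion from picograms to base pairs reported above can be checked with a short calculation. This is a small sketch, assuming the widely used factor of 1 pg DNA = 978 Mbp; the abstract does not state which factor was used, although this one reproduces its figure. The peak fluorescence values in the usage line are hypothetical.

```python
PG_TO_MBP = 978.0  # assumed conversion factor: 1 pg of DNA is about 978 Mbp

def sample_2c_pg(sample_peak, standard_peak, standard_2c_pg):
    """2C DNA content from the ratio of G1 peak positions against an internal standard."""
    return sample_peak / standard_peak * standard_2c_pg

def haploid_mbp(two_c_pg):
    """Haploid (1C) genome size in Mbp from a 2C value in pg."""
    return two_c_pg / 2 * PG_TO_MBP

# Mean coconut 2C value from the abstract: 5.57 pg.
print(round(haploid_mbp(5.57), 2))  # 2723.73, matching the reported Mbp/haploid set

# Hypothetical peak positions, with Pisum sativum (2C = 9.07 pg) as the standard.
print(round(sample_2c_pg(61.4, 100.0, 9.07), 2))
```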
Genome-wide detection of intervals of genetic heterogeneity associated with complex traits
Llinares-López, Felipe; Grimm, Dominik G.; Bodenham, Dean A.; Gieraths, Udo; Sugiyama, Mahito; Rowan, Beth; Borgwardt, Karsten
2015-01-01
Motivation: Genetic heterogeneity, the fact that several sequence variants give rise to the same phenotype, is a phenomenon that is of the utmost interest in the analysis of complex phenotypes. Current approaches for finding regions in the genome that exhibit genetic heterogeneity suffer from at least one of two shortcomings: (i) they require the definition of an exact interval in the genome that is to be tested for genetic heterogeneity, potentially missing intervals of high relevance, or (ii) they suffer from an enormous multiple hypothesis testing problem due to the large number of potential candidate intervals being tested, which results in either many false positives or a lack of power to detect true intervals. Results: Here, we present an approach that overcomes both problems: it allows one to automatically find all contiguous sequences of single nucleotide polymorphisms in the genome that are jointly associated with the phenotype. It also solves both the inherent computational efficiency problem and the statistical problem of multiple hypothesis testing, which are both caused by the huge number of candidate intervals. We demonstrate on Arabidopsis thaliana genome-wide association study data that our approach can discover regions that exhibit genetic heterogeneity and would be missed by single-locus mapping. Conclusions: Our novel approach can contribute to the genome-wide discovery of intervals that are involved in the genetic heterogeneity underlying complex phenotypes. Availability and implementation: The code can be obtained at: http://www.bsse.ethz.ch/mlcb/research/bioinformatics-and-computational-biology/sis.html. Contact: felipe.llinares@bsse.ethz.ch Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26072488
Evidence of codon usage in the nearest neighbor spacing distribution of bases in bacterial genomes
NASA Astrophysics Data System (ADS)
Higareda, M. F.; Geiger, O.; Mendoza, L.; Méndez-Sánchez, R. A.
2012-02-01
Statistical analysis of whole genomic sequences usually assumes a homogeneous nucleotide density throughout the genome, an assumption that has been proved incorrect for several organisms since the nucleotide density is only locally homogeneous. To avoid giving a single numerical value to this variable property, we propose the use of spectral statistics, which characterizes the density of nucleotides as a function of position in the genome. We show that the cumulative density of bases in bacterial genomes can be separated into an average (or secular) part plus a fluctuating part. Bacterial genomes can be divided into two groups according to the qualitative description of their secular part: linear and piecewise linear. These two groups of genomes show different properties when their nucleotide spacing distribution is studied. To analyze genomes with a variable nucleotide density statistically, unfolding is necessary, i.e., separating the secular part from the fluctuations. The unfolding allows an adequate comparison with the statistical properties of other genomes. With this methodology, four genomes were analyzed: Burkholderia, Bacillus, Clostridium and Corynebacterium. Interestingly, the nearest neighbor spacing distributions or detrended distance distributions are very similar for species within the same genus but they are very different for species from different genera. This difference can be attributed to the difference in codon usage.
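The unfolding step can be sketched for the simplest case mentioned in the abstract, a linear secular part. The code below fits a straight line to the cumulative count of one base, maps each position through that line, and returns the nearest neighbor spacings of the unfolded positions. The least-squares fit and all function names are assumptions made for illustration; the abstract does not specify the fitting procedure, and a piecewise linear secular part would need a separate fit per segment.

```python
from typing import List, Tuple


def base_positions(seq: str, base: str) -> List[int]:
    """Positions (0-based) at which `base` occurs in `seq`."""
    return [i for i, b in enumerate(seq) if b == base]


def linear_fit(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def unfolded_spacings(positions: List[int]) -> List[float]:
    """Nearest neighbor spacings after removing a linear secular part."""
    counts = list(range(1, len(positions) + 1))  # cumulative count N(x_i)
    slope, intercept = linear_fit(positions, counts)
    unfolded = [slope * x + intercept for x in positions]
    return [b - a for a, b in zip(unfolded, unfolded[1:])]


if __name__ == "__main__":
    toy = "ACGT" * 50  # perfectly regular toy sequence, not real data
    print(unfolded_spacings(base_positions(toy, "A"))[:5])
```

On the regular toy sequence every unfolded spacing equals 1, so any spread seen on a real genome reflects the fluctuating part rather than the overall density.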
Landscape genomics: natural selection drives the evolution of mitogenome in penguins.
Ramos, Barbara; González-Acuña, Daniel; Loyola, David E; Johnson, Warren E; Parker, Patricia G; Massaro, Melanie; Dantas, Gisele P M; Miranda, Marcelo D; Vianna, Juliana A
2018-01-16
Mitochondria play a key role in the balance of energy and heat production, and therefore the mitochondrial genome is under natural selection by environmental temperature and food availability, since starvation can generate more efficient coupling of energy production. However, selection over mitochondrial DNA (mtDNA) genes has usually been evaluated at the population level. We sequenced 12 mitogenomes by NGS and, together with four published genomes, assessed genetic variation in ten penguin species distributed from the equator to Antarctica. Signatures of selection in 13 mitochondrial protein-coding genes were evaluated by comparing among species within and among genera (Spheniscus, Pygoscelis, Eudyptula, Eudyptes and Aptenodytes). The genetic data were correlated with environmental data obtained through remote sensing (sea surface temperature [SST], chlorophyll levels [Chl] and a combination of SST and Chl [COM]) across the distribution of these species. We identified the complete mtDNA genomes of several penguin species, including ND6 and 8 tRNAs on the light strand and 12 protein-coding genes, 14 tRNAs and two rRNAs positioned on the heavy strand. The highest diversity was found in NADH dehydrogenase genes and the lowest in COX genes. The lowest evolutionary divergence among species was between Humboldt (Spheniscus humboldti) and Galapagos (S. mendiculus) penguins (0.004), while the highest was observed between little penguin (Eudyptula minor) and Adélie penguin (Pygoscelis adeliae) (0.097). We identified a signature of purifying selection (Ka/Ks < 1) across the mitochondrial genome, which is consistent with the hypothesis that purifying selection is constraining mitogenome evolution to maintain oxidative phosphorylation (OXPHOS) proteins and functionality.
Pairwise species maximum-likelihood analyses of selection at codon sites suggest positive selection has occurred on ATP8 (Fixed-Effects Likelihood, FEL) and ND4 (Single Likelihood Ancestral Counting, SLAC) in all penguins. In contrast, COX1 had a signature of strong negative selection. ND4 Ka/Ks ratios were highly correlated with SST (Mantel, p-value: 0.0001; GLM, p-value: 0.00001) and thus may be related to climate adaptation throughout penguin speciation. These results identify mtDNA candidate genes under selection which could be involved in broad-scale adaptations of penguins to their environment. Such knowledge may be particularly useful for developing predictive models of how these species may respond to severe climatic changes in the future.
Zhang, Ya; Kitajima, Masaaki; Whittle, Andrew J.; Liu, Wen-Tso
2017-01-01
The occurrence of pathogenic bacteria in drinking water distribution systems (DWDSs) is a major health concern, and our current understanding is mostly related to pathogenic species such as Legionella pneumophila and Mycobacterium avium but not to bacterial species closely related to them. In this study, genomic-based approaches were used to characterize pathogen-related species in relation to their abundance, diversity, potential pathogenicity, genetic exchange, and distribution across an urban drinking water system. Nine draft genomes recovered from 10 metagenomes were identified as Legionella (4 draft genomes), Mycobacterium (3 draft genomes), Parachlamydia (1 draft genome), and Leptospira (1 draft genome). The pathogenicity potential of these genomes was examined by the presence/absence of virulence machinery, including genes belonging to Type III, IV, and VII secretion systems and their effectors. Several virulence factors known in pathogenic species were detected in these retrieved draft genomes, except in the Leptospira-related genome. Identical clustered regularly interspaced short palindromic repeats-CRISPR-associated proteins (CRISPR-Cas) genetic signatures were observed in two draft genomes recovered at different stages of the studied system, suggesting that the spacers in CRISPR-Cas could potentially be used as a biomarker in the monitoring of Legionella-related strains at an evolutionary scale of several years across different drinking water production and distribution systems. Overall, the metagenomics approach was an effective tool, complementary to culturing techniques, for gaining insights into the pathogenic characteristics and the CRISPR-Cas signatures of pathogen-related species in DWDSs. PMID:29097994
Malhotra, Sony; Sowdhamini, Ramanathan
2013-08-01
The interaction of proteins with their respective DNA targets is known to control many high-fidelity cellular processes. Performing a comprehensive survey of the sequenced genomes for DNA-binding proteins (DBPs) will help in understanding their distribution and the associated functions in a particular genome. The availability of the fully sequenced genome of Arabidopsis thaliana enables a review of the distribution of DBPs in this model plant genome. We used profiles of both structure- and sequence-based DNA-binding families, derived from the PDB and PFam databases, to perform the survey. This resulted in 4471 proteins, identified as DNA-binding in the Arabidopsis genome, which are distributed across 300 different PFam families. Apart from several plant-specific DNA-binding families, certain RING fingers and leucine zippers also had high representation. Our search protocol helped to assign DNA-binding property to several proteins that were previously marked as unknown, putative or hypothetical in function. The distribution of Arabidopsis genes having a role in plant DNA repair was particularly studied and noted for their functional mapping. The functions observed to be overrepresented in the plant genome harbour DNA-3-methyladenine glycosylase activity, alkylbase DNA N-glycosylase activity and DNA-(apurinic or apyrimidinic site) lyase activity, suggesting their role in specialized functions such as gene regulation and DNA repair.
Brengdahl, Martin; Kimber, Christopher M; Maguire-Baxter, Jack; Friberg, Urban
2018-03-01
Life span differs between the sexes in many species. Three hypotheses to explain this interesting pattern have been proposed, involving different drivers: sexual selection, asymmetrical inheritance of cytoplasmic genomes, and hemizygosity of the X(Z) chromosome (the unguarded X hypothesis). Of these, the unguarded X has received the least experimental attention. This hypothesis suggests that the heterogametic sex suffers a shortened life span because recessive deleterious alleles on its single X(Z) chromosome are expressed unconditionally. In Drosophila melanogaster, the X chromosome is unusually large (∼20% of the genome), providing a powerful model for evaluating theories involving the X. Here, we test the unguarded X hypothesis by forcing D. melanogaster females from a laboratory population to express recessive X-linked alleles to the same degree as males, using females exclusively made homozygous for the X chromosome. We find no evidence for reduced life span or egg-to-adult viability due to X homozygosity. In contrast, males and females homozygous for an autosome both suffer similar, significant reductions in those traits. The logic of the unguarded X hypothesis is indisputable, but our results suggest that the degree to which recessive deleterious X-linked alleles depress performance in the heterogametic sex is too small to explain general sex differences in life span. © 2018 The Author(s). Evolution © 2018 The Society for the Study of Evolution.
Experimental evidence supports a sex-specific selective sieve in mitochondrial genome evolution.
Innocenti, Paolo; Morrow, Edward H; Dowling, Damian K
2011-05-13
Mitochondria are maternally transmitted; hence, their genome can only make a direct and adaptive response to selection through females, whereas males represent an evolutionary dead end. In theory, this creates a sex-specific selective sieve, enabling deleterious mutations to accumulate in mitochondrial genomes if they exert male-specific effects. We tested this hypothesis, expressing five mitochondrial variants alongside a standard nuclear genome in Drosophila melanogaster, and found striking sexual asymmetry in patterns of nuclear gene expression. Mitochondrial polymorphism had few effects on nuclear gene expression in females but major effects in males, modifying nearly 10% of transcripts. These were mostly male-biased in expression, with enrichment hotspots in the testes and accessory glands. Our results suggest an evolutionary mechanism that results in mitochondrial genomes harboring male-specific mutation loads.
Detecting microsatellites within genomes: significant variation among algorithms.
Leclercq, Sébastien; Rivals, Eric; Jarne, Philippe
2007-04-18
Microsatellites are short, tandemly-repeated DNA sequences which are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker). Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared by fixing parameters settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster) spanning a wide range of size and structure. Significant differences for all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp), regardless of motif. Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should therefore be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions.
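The dependence of the detected microsatellites on user-defined parameters can be illustrated with a toy detector for perfect repeats. The sketch below is a plain regular-expression scan, not a reimplementation of TRF, Mreps, Sputnik, STAR or RepeatMasker; the function name, the parameters `max_motif`, `min_copies` and `min_length`, and the test sequence are all assumptions made for illustration.

```python
import re
from typing import List, Tuple


def find_perfect_microsatellites(
    seq: str, max_motif: int = 6, min_copies: int = 3, min_length: int = 8
) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect tandem repeats in `seq`.

    The lazy quantifier makes the regex try the shortest motif first, and
    the back-reference requires at least `min_copies` copies in a row.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_copies - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        if len(m.group(0)) >= min_length:
            motif = m.group(1)
            hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


if __name__ == "__main__":
    toy = "GGTT" + "AC" * 5 + "TTGCG" + "AAT" * 3 + "G"  # made-up sequence
    print(find_perfect_microsatellites(toy, min_length=8))
    print(find_perfect_microsatellites(toy, min_length=10))
```

With `min_length=8` the scan reports both the (AC)5 and the (AAT)3 repeat, while `min_length=10` keeps only (AC)5, a small-scale version of the parameter sensitivity the abstract describes for short microsatellites.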
Detecting microsatellites within genomes: significant variation among algorithms
Leclercq, Sébastien; Rivals, Eric; Jarne, Philippe
2007-01-01
Background Microsatellites are short, tandemly-repeated DNA sequences which are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker). Results Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared by fixing parameters settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster) spanning a wide range of size and structure. Significant differences for all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp), regardless of motif. Conclusion Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should therefore be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions. PMID:17442102
Genomic comparison of closely related Giant Viruses supports an accordion-like model of evolution.
Filée, Jonathan
2015-01-01
Genome gigantism has so far been found in Phycodnaviridae and Mimiviridae (order Megavirales). The origin and evolution of these Giant Viruses (GVs) remain open questions. Interestingly, the availability of a collection of closely related GV genomes enabling genomic comparisons offers the opportunity to better understand the different evolutionary forces acting on these genomes. Whole genome alignment for five groups of viruses belonging to the Mimiviridae and Phycodnaviridae families shows that there is no trend of genome expansion or general tendency of genome contraction. Instead, GV genomes accumulated genomic mutations over time, with gene gains compensating for the different losses. In addition, each lineage displays specific patterns of genome evolution. Mimiviridae (megaviruses and mimiviruses) and Chlorella Phycodnaviruses evolved mainly by duplications and losses of genes belonging to large paralogous families (including movements of diverse mobile genetic elements), whereas Micromonas and Ostreococcus Phycodnaviruses derive most of their genetic novelties through lateral gene transfers. Taken together, these data support an accordion-like model of evolution in which GV genomes have undergone successive steps of gene gain and gene loss, supporting the hypothesis that genome gigantism appeared early, before the diversification of the different GV lineages.
Benchmarking distributed data warehouse solutions for storing genomic variant information
Wiewiórka, Marek S.; Wysakowicz, Dawid P.; Okoniewski, Michał J.
2017-01-01
Abstract Genomic-based personalized medicine encompasses storing, analysing and interpreting genomic variants as its central issues. At a time when thousands of patients' sequenced exomes and genomes are becoming available, there is a growing need for efficient database storage and querying. The answer could be the application of modern distributed storage systems and query engines. However, the application of large genomic variant databases to this problem has not been sufficiently explored so far in the literature. To investigate the effectiveness of modern columnar storage [column-oriented Database Management System (DBMS)] and query engines, we have developed a prototypic genomic variant data warehouse, populated with large generated content of genomic variants and phenotypic data. Next, we have benchmarked the performance of a number of combinations of distributed storages and query engines on a set of SQL queries that address biological questions essential for both research and medical applications. In addition, a non-distributed, analytical database (MonetDB) has been used as a baseline. Comparison of query execution times confirms that distributed data warehousing solutions outperform classic relational DBMSs. Moreover, pre-aggregation and further denormalization of data, which reduce the number of distributed join operations, improve query performance by several orders of magnitude. Most of the distributed back-ends offer good performance for complex analytical queries, while the Optimized Row Columnar (ORC) format paired with Presto and Parquet with Spark 2 query engines provide, on average, the lowest execution times. Apache Kudu, on the other hand, is the only solution that guarantees sub-second performance for simple genome range queries returning a small subset of data, where low-latency response is expected, while still offering decent performance for running analytical queries. 
In summary, research and clinical applications that require the storage and analysis of variants from thousands of samples can benefit from the scalability and performance of distributed data warehouse solutions. Database URL: https://github.com/ZSI-Bio/variantsdwh PMID:29220442
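A "simple genome range query" of the kind mentioned above can be sketched with Python's built-in sqlite3 module. This is only an illustration of the query shape on a single-node, in-memory database; it is not one of the distributed back-ends benchmarked in the paper, and the table name, column names and sample rows are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory variants table with a (chrom, pos) index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variants "
        "(chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, sample_id TEXT)"
    )
    conn.execute("CREATE INDEX idx_chrom_pos ON variants (chrom, pos)")
    rows = [  # made-up records, for illustration only
        ("1", 100, "A", "G", "s1"),
        ("1", 250, "C", "T", "s2"),
        ("1", 900, "G", "A", "s1"),
        ("2", 150, "T", "C", "s3"),
    ]
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def range_query(
    conn: sqlite3.Connection, chrom: str, start: int, end: int
) -> List[Tuple]:
    """Return all variants on `chrom` with start <= pos <= end."""
    cur = conn.execute(
        "SELECT chrom, pos, ref, alt, sample_id FROM variants "
        "WHERE chrom = ? AND pos BETWEEN ? AND ? ORDER BY pos, sample_id",
        (chrom, start, end),
    )
    return cur.fetchall()


if __name__ == "__main__":
    print(range_query(build_demo_db(), "1", 200, 1000))
```

The query touches only a narrow slice of one chromosome, which is why such lookups are latency-sensitive rather than throughput-bound.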
Washburne, Alex D.; Burby, Joshua W.; Lacker, Daniel; ...
2016-09-30
Systems as diverse as the interacting species in a community, alleles at a genetic locus, and companies in a market are characterized by competition (over resources, space, capital, etc) and adaptation. Neutral theory, built around the hypothesis that individual performance is independent of group membership, has found utility across the disciplines of ecology, population genetics, and economics, both because of the success of the neutral hypothesis in predicting system properties and because deviations from these predictions provide information about the underlying dynamics. However, most tests of neutrality are weak, based on static system properties such as species-abundance distributions or the number of singletons in a sample. Time-series data provide a window onto a system’s dynamics, and should furnish tests of the neutral hypothesis that are more powerful to detect deviations from neutrality and more informative about the type of competitive asymmetry that drives the deviation. Here, we present a neutrality test for time-series data. We apply this test to several microbial time-series and financial time-series and find that most of these systems are not neutral. Our test isolates the covariance structure of neutral competition, thus facilitating further exploration of the nature of asymmetry in the covariance structure of competitive systems. Much like neutrality tests from population genetics that use relative abundance distributions have enabled researchers to scan entire genomes for genes under selection, we anticipate our time-series test will be useful for quick significance tests of neutrality across a range of ecological, economic, and sociological systems for which time-series data are available. Here, future work can use our test to categorize and compare the dynamic fingerprints of particular competitive asymmetries (frequency dependence, volatility smiles, etc) to improve forecasting and management of complex adaptive systems.
Base composition and expression level of human genes.
Arhondakis, Stilianos; Auletta, Fabio; Torelli, Giuseppe; D'Onofrio, Giuseppe
2004-01-21
It is well known that the gene distribution is non-uniform in the human genome, reaching the highest concentration in the GC-rich isochores. Also the amino acid frequencies, and the hydrophobicity, of the corresponding encoded proteins are affected by the high GC level of the genes localized in the GC-rich isochores. It was hypothesized that the gene expression level as well is higher in GC-rich compared to GC-poor isochores [Mol. Biol. Evol. 10 (1993) 186]. Several features of human genes and proteins, namely expression level, coding and non-coding lengths, and hydrophobicity were investigated in the present paper. The results support the hypothesis reported above, since all the parameters so far studied converge to the same conclusion, that the average expression level of the GC-rich genes is significantly higher than that of the GC-poor genes.
Positive selection on the killer whale mitogenome
Foote, Andrew D.; Morin, Phillip A.; Durban, John W.; Pitman, Robert L.; Wade, Paul; Willerslev, Eske; Gilbert, M. Thomas P.; da Fonseca, Rute R.
2011-01-01
Mitochondria produce up to 95 per cent of the eukaryotic cell's energy. The coding genes of the mitochondrial DNA may therefore evolve under selection owing to metabolic requirements. The killer whale, Orcinus orca, is polymorphic, has a global distribution and occupies a range of ecological niches. It is therefore a suitable organism for testing this hypothesis. We compared a global dataset of the complete mitochondrial genomes of 139 individuals for amino acid changes that were associated with radical physico-chemical property changes and were influenced by positive selection. Two such selected non-synonymous amino acid changes were found; one in each of two ecotypes that inhabit the Antarctic pack ice. Both substitutions were associated with changes in local polarity, increased steric constraints and α-helical tendencies that could influence overall metabolic performance, suggesting a functional change. PMID:20810427
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parham, James F.; Feldman, Chris R.; Boore, Jeffrey L.
2005-12-28
The big-headed turtle (Platysternon megacephalum) from east Asia is the sole living representative of a poorly-studied turtle lineage (Platysternidae). It has no close living relatives, and its phylogenetic position within turtles is one of the outstanding controversies in turtle systematics. Platysternon was traditionally considered to be close to snapping turtles (Chelydridae) based on some studies of its morphology and mitochondrial (mt) DNA; however, other studies of morphology and nuclear (nu) DNA do not support that hypothesis. We sequenced the complete mt genome of Platysternon and the nearly complete mt genomes of two other relevant turtles and compared them to turtle mt genomes from the literature to form the largest molecular dataset used to date to address this issue. The resulting phylogeny robustly rejects the placement of Platysternon with Chelydridae, and instead shows that it is a member of the Testudinoidea, a diverse, nearly globally-distributed group that includes pond turtles and tortoises. We also discovered that Platysternon mtDNA has large-scale gene rearrangements and possesses two, nearly identical, control regions, features that distinguish it from all other studied turtles. Our study robustly determines the phylogenetic placement of Platysternon and provides a well-resolved outline of major turtle lineages, while demonstrating the significantly greater resolving power of comparing large amounts of mt sequence over that of short fragments. Earlier phylogenies placing Platysternon with chelydrids required a temporal gap in the fossil record that is now unnecessary. The duplicated control regions and gene rearrangements of the Platysternon mt DNA probably resulted from the duplication of part of the genome and then the subsequent loss of redundant genes. 
Although it is possible that having two control regions may provide some advantage, explaining why the control regions would be maintained while some of the duplicated genes were eroded, examples of this are rare. So far, duplicated control regions have been reported for mt genomes from just 12 clades of metazoans, including Platysternon.
Poisson Approximation-Based Score Test for Detecting Association of Rare Variants.
Fang, Hongyan; Zhang, Hong; Yang, Yaning
2016-07-01
Genome-wide association studies (GWAS) have achieved great success in identifying genetic variants, but the nature of GWAS determines their inherent limitations. Under the common disease rare variants (CDRV) hypothesis, the traditional association analysis methods commonly used in GWAS for common variants do not have enough power for detecting rare variants with a limited sample size. As a solution to this problem, pooling rare variants by their functions provides an efficient way of identifying susceptible genes. Rare variants typically have low frequencies of minor alleles, and the distribution of the total number of minor alleles of the rare variants can be approximated by a Poisson distribution. Based on this fact, we propose a new test method, the Poisson Approximation-based Score Test (PAST), for association analysis of rare variants. Two testing methods, namely, ePAST and mPAST, are proposed based on different strategies of pooling rare variants. Simulation results and application to the CRESCENDO cohort data show that our methods are more powerful than the existing methods. © 2016 John Wiley & Sons Ltd/University College London.
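The Poisson approximation that the abstract relies on can be checked numerically. The sketch below is not the PAST statistic itself; it only compares the exact distribution of the total minor-allele count with a Poisson distribution of the same mean, under the assumption (made here for illustration) that every allele copy carries the minor allele independently with probability equal to the variant's minor allele frequency. The frequencies and sample size are made up. Le Cam's inequality bounds the total variation distance by the sum of squared probabilities.

```python
import math
from typing import List


def poisson_binomial_pmf(probs: List[float]) -> List[float]:
    """Exact pmf of a sum of independent Bernoulli(p_i) variables."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this copy is the major allele
            nxt[k + 1] += mass * p      # this copy is the minor allele
        pmf = nxt
    return pmf


def poisson_pmf(lam: float, k: int) -> float:
    """Poisson pmf evaluated in log space to avoid overflow for large k."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def total_variation(probs: List[float]) -> float:
    """Total variation distance between the exact law and Poisson(sum p)."""
    exact = poisson_binomial_pmf(probs)
    lam = sum(probs)
    head = sum(abs(e - poisson_pmf(lam, k)) for k, e in enumerate(exact))
    tail = max(0.0, 1.0 - sum(poisson_pmf(lam, k) for k in range(len(exact))))
    return 0.5 * (head + tail)


if __name__ == "__main__":
    mafs = [0.001, 0.002, 0.005, 0.003, 0.004]  # hypothetical rare variants
    n_individuals = 100                          # two allele copies each
    probs = [maf for maf in mafs for _ in range(2 * n_individuals)]
    print(total_variation(probs), sum(p * p for p in probs))
```

The smaller the minor allele frequencies, the tighter the bound, which is why the approximation suits rare variants in particular.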
Ramasamy, Sukanya; Ometto, Lino; Crava, Cristina M; Revadi, Santosh; Kaur, Rupinder; Horner, David S; Pisani, Davide; Dekker, Teun; Anfora, Gianfranco; Rota-Stabelli, Omar
2016-08-16
How the evolution of olfactory genes correlates with adaptation to new ecological niches is still a debated topic. We explored this issue in Drosophila suzukii, an emerging model that reproduces on fresh fruit rather than in fermenting substrates like most other Drosophila. We first annotated the repertoire of odorant receptors (ORs), odorant binding proteins (OBPs), and antennal ionotropic receptors (aIRs) in the genomes of two strains of D. suzukii and of its close relative Drosophila biarmipes. We then analyzed these genes on the phylogeny of 14 Drosophila species: whereas ORs and OBPs are characterized by higher turnover rates in some lineages including D. suzukii, aIRs are conserved throughout the genus. Drosophila suzukii is further characterized by a non-random distribution of OR turnover on the gene phylogeny, consistent with a change in selective pressures. In D. suzukii, we found duplications and signs of positive selection in ORs with affinity for short-chain esters, and loss of function of ORs with affinity for volatiles produced during fermentation. These receptors (Or85a and Or22a) are characterized by divergent alleles in the European and American genomes, and we hypothesize that they may have been replaced by some of the duplicated ORs in corresponding neurons, a hypothesis reciprocally confirmed by electrophysiological recordings. Our study quantifies the evolution of olfactory genes in Drosophila and reveals an array of genomic events that can be associated with the ecological adaptations of D. suzukii. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Zou, Jiabin; Sun, Yongshuai; Li, Long; Wang, Gaini; Yue, Wei; Lu, Zhiqiang; Wang, Qian; Liu, Jianquan
2013-01-01
Background and Aims Genetic drift due to geographical isolation, gene flow and mutation rates together make it difficult to determine the evolutionary relationships of present-day species. In this study, population genetic data were used to model and decipher interspecific relationships, speciation patterns and gene flow between three species of spruce with similar morphology, Picea wilsonii, P. neoveitchii and P. morrisonicola. Picea wilsonii and P. neoveitchii occur from central to north-west China, where they have overlapping distributions. Picea morrisonicola, however, is restricted solely to the island of Taiwan and is isolated from the other two species by a long distance. Methods Sequence variations were examined in 18 DNA fragments for 22 populations, including three fragments from the chloroplast (cp) genome, two from the mitochondrial (mt) genome and 13 from the nuclear genome. Key Results In both the cpDNA and the mtDNA, P. morrisonicola accumulated more species-specific mutations than the other two species. However, most nuclear haplotypes of P. morrisonicola were shared by P. wilsonii, or derived from the dominant haplotypes found in that species. Modelling of population genetic data supported the hypothesis that P. morrisonicola derived from P. wilsonii within the more recent past, most probably indicating progenitor–derivative speciation with a distinct bottleneck, although further gene flow from the progenitor to the derivative continued. In addition, an obvious mtDNA introgression from P. neoveitchii to P. wilsonii was detected despite their early divergence. Conclusions The extent of mutation, introgression and lineage sorting taking place during interspecific divergence and demographic changes in the three species varied greatly between the three genomes. The findings highlight the complex evolutionary histories of these three Asian spruce species. PMID:24220103
2010-01-01
Background Maternally inherited endosymbionts like Wolbachia pipientis are in linkage disequilibrium with the mtDNA of their hosts. Therefore, they can induce selective sweeps, decreasing genetic diversity over many generations. This sex ratio distorter, which is involved in the origin of parthenogenesis and other reproductive alterations, infects the parthenogenetic weevil Naupactus cervinus, a serious pest of ornamental and fruit plants. Results Molecular evolution analyses of mitochondrial (COI) and nuclear (ITS1) sequences from 309 individuals of Naupactus cervinus sampled over a broad range of its geographical distribution were carried out. Our results demonstrate lack of recombination in the nuclear fragment, non-random association between nuclear and mitochondrial genomes and the consequent coevolution of both genomes, which is indirect evidence of apomixis. This weevil is infected by a single Wolbachia strain, which could have caused a moderate bottleneck in the invaded population that survived the initial infection. Conclusions Clonal reproduction and Wolbachia infection induce the coevolution of bacterial, mitochondrial and nuclear genomes. The time elapsed since the Wolbachia invasion would have erased the traces of the demographic crash in the mtDNA, leaving the nuclear genome as the only one that retained the signal of the bottleneck. The amount of genetic change accumulated in the mtDNA and the high prevalence of Wolbachia in all populations of N. cervinus agree with the hypothesis of an ancient infection. Wolbachia probably had great influence in shaping the genetic diversity of N. cervinus. However, it would not have caused the extinction of males, since sexual and asexual infected lineages coexisted until recent times. PMID:21050430
Competence in Streptococcus pneumoniae Is a Response to an Increasing Mutational Burden
Gagne, Alyssa L.; Stevens, Kathleen E.; Cassone, Marco; Pujari, Amit; Abiola, Olufunke E.; Chang, Diana J.; Sebert, Michael E.
2013-01-01
Competence for genetic transformation in Streptococcus pneumoniae has previously been described as a quorum-sensing trait regulated by a secreted peptide pheromone. Recently we demonstrated that competence is also activated by reduction in the accuracy of protein biosynthesis. We have now investigated whether errors upstream of translation in the form of random genomic mutations can provide a similar stimulus. Here we show that generation of a mutator phenotype in S. pneumoniae through deletions of mutX, hexA or hexB enhanced the expression of competence. Similarly, chemical mutagenesis with the nucleotide analog dPTP promoted development of competence. To investigate the relationship between mutational load and the activation of competence, replicate lineages of the mutX strain were serially passaged under conditions of relaxed selection allowing random accumulation of secondary mutations. Competence increased with propagation in these lineages but not in control lineages having wild-type mutX. Resequencing of these derived strains revealed between 1 and 9 single nucleotide polymorphisms (SNPs) per lineage, which were broadly distributed across the genome and did not involve known regulators of competence. Notably, the frequency of competence development among the sequenced strains correlated significantly with the number of nonsynonymous mutations that had been acquired. Together, these observations provide support for the hypothesis that competence in S. pneumoniae is regulated in response to the accumulated burden of coding mutations in the bacterial genome. In contrast to previously described DNA damage response systems that are activated by physical lesions in the chromosome, this pneumococcal pathway may represent a unique stress response system that monitors the coding integrity of the genome. PMID:23967325
Qanbari, Saber; Strom, Tim M.; Haberer, Georg; Weigend, Steffen; Gheyas, Almas A.; Turner, Frances; Burt, David W.; Preisinger, Rudolf; Gianola, Daniel; Simianer, Henner
2012-01-01
In most studies aimed at localizing footprints of past selection, outliers at tails of the empirical distribution of a given test statistic are assumed to reflect locus-specific selective forces. Significance cutoffs are subjectively determined, rather than being related to a clear set of hypotheses. Here, we define an empirical p-value for the summary statistic by means of a permutation method that uses the observed SNP structure in the real data. To illustrate the methodology, we applied our approach to a panel of 2.9 million autosomal SNPs identified from re-sequencing a pool of 15 individuals from a brown egg layer line. We scanned the genome for local reductions in heterozygosity, suggestive of selective sweeps. We also employed a modified sliding window approach that accounts for gaps in the sequence and increases scanning resolution by moving the overlapping windows by steps of one SNP only, and suggest to call this a “creeping window” strategy. The approach confirmed selective sweeps in the region of previously described candidate genes, i.e. TSHR, PRL, PRLHR, INSR, LEPR, IGF1, and NRAMP1 when used as positive controls. The genome scan revealed 82 distinct regions with strong evidence of selection (genome-wide p-value<0.001), including genes known to be associated with eggshell structure and immune system such as CALB1 and GAL cluster, respectively. A substantial proportion of signals was found in poor gene content regions including the most extreme signal on chromosome 1. The observation of multiple signals in a highly selected layer line of chicken is consistent with the hypothesis that egg production is a complex trait controlled by many genes. PMID:23209582
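The "creeping window" scan and the permutation-based empirical p-value described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the per-SNP heterozygosity input, and the choice to permute heterozygosity values across SNP positions are all assumptions made for illustration, and the paper's permutation scheme (which uses the observed SNP structure) may differ.

```python
import random

def window_heterozygosity(het, size):
    """Mean heterozygosity of every window of `size` consecutive SNPs.

    Windows overlap and advance by one SNP at a time ("creeping window").
    `het` is a list of per-SNP heterozygosity values (hypothetical input).
    """
    total = sum(het[:size])
    means = [total / size]
    for i in range(size, len(het)):
        total += het[i] - het[i - size]  # slide the window by one SNP
        means.append(total / size)
    return means

def empirical_p_values(het, size, n_perm=200, seed=0):
    """Genome-wide empirical p-value for each observed window.

    Assumption: the null is built by shuffling per-SNP values and recording
    the lowest window mean seen in each permuted genome.
    """
    rng = random.Random(seed)
    observed = window_heterozygosity(het, size)
    shuffled = list(het)
    null_minima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_minima.append(min(window_heterozygosity(shuffled, size)))
    # Fraction of permuted genomes whose most extreme window is at least
    # as low as the observed window (with the usual +1 correction).
    return [(1 + sum(m <= obs for m in null_minima)) / (n_perm + 1)
            for obs in observed]
```

Windows with a small p-value are candidate regions of locally reduced heterozygosity; comparing each window against the per-permutation minimum is one simple way to obtain a genome-wide rather than per-window significance level.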
Caricati, Luca
2017-01-01
The status-legitimacy hypothesis was tested by analyzing cross-national data about social inequality. Several indicators were used as indexes of social advantage: social class, personal income, and self-position in the social hierarchy. Moreover, inequality and freedom in nations, as indexed by Gini and by the human freedom index, were considered. Results from 36 nations worldwide showed no support for the status-legitimacy hypothesis. The perception that income distribution was fair tended to increase as social advantage increased. Moreover, national context increased the difference between advantaged and disadvantaged people in the perception of social fairness: Contrary to the status-legitimacy hypothesis, disadvantaged people were more likely than advantaged people to perceive income distribution as too large, and this difference increased in nations with greater freedom and equality. The implications for the status-legitimacy hypothesis are discussed.
Gil-Serna, Jessica; García-Díaz, Marta; González-Jaén, María Teresa; Vázquez, Covadonga; Patiño, Belén
2018-03-02
Ochratoxin A (OTA), which is produced by several Aspergillus and Penicillium species, is one of the most important mycotoxins due to its toxic properties and worldwide distribution. Knowledge of the OTA biosynthetic genes and an understanding of the mechanisms involved in their regulation are essential. In this work, we obtained a clear picture of the organization of the biosynthetic genes in the main OTA-producing Aspergillus and Penicillium species (A. steynii, A. westerdijkiae, A. niger, A. carbonarius and P. nordicum) using complete genome sequences obtained in this work or previously available in databases. The results revealed a region containing five ORFs predicted to encode five proteins (halogenase, bZIP transcription factor, cytochrome P450 monooxygenase, non-ribosomal peptide synthetase and polyketide synthase) in all five species. Genetic synteny was conserved in both Penicillium and Aspergillus species, although genomic location seemed to differ, since the clusters had different flanking regions (except for A. steynii and A. westerdijkiae); these observations support the hypothesis that this genomic region is orthologous and that it might have been acquired by horizontal transfer. New real-time RT-PCR assays for quantification of the expression of these OTA biosynthetic genes were developed. In all species, the five genes were consistently expressed in OTA-producing strains in permissive conditions. These protocols might favour future studies on the regulation of biosynthetic genes in order to develop new efficient control methods to avoid OTA entering the food chain. Copyright © 2018 Elsevier B.V. All rights reserved.
Purdue ionomics information management system. An integrated functional genomics platform.
Baxter, Ivan; Ouzzani, Mourad; Orcun, Seza; Kennedy, Brad; Jandhyala, Shrinivas S; Salt, David E
2007-02-01
The advent of high-throughput phenotyping technologies has created a deluge of information that is difficult to deal with without the appropriate data management tools. These data management tools should integrate defined workflow controls for genomic-scale data acquisition and validation, data storage and retrieval, and data analysis, indexed around the genomic information of the organism of interest. To maximize the impact of these large datasets, it is critical that they are rapidly disseminated to the broader research community, allowing open access for data mining and discovery. We describe here a system that incorporates such functionalities developed around the Purdue University high-throughput ionomics phenotyping platform. The Purdue Ionomics Information Management System (PiiMS) provides integrated workflow control, data storage, and analysis to facilitate high-throughput data acquisition, along with integrated tools for data search, retrieval, and visualization for hypothesis development. PiiMS is deployed as a World Wide Web-enabled system, allowing for integration of distributed workflow processes and open access to raw data for analysis by numerous laboratories. PiiMS currently contains data on shoot concentrations of P, Ca, K, Mg, Cu, Fe, Zn, Mn, Co, Ni, B, Se, Mo, Na, As, and Cd in over 60,000 shoot tissue samples of Arabidopsis (Arabidopsis thaliana), including ethyl methanesulfonate, fast-neutron and defined T-DNA mutants, and natural accession and populations of recombinant inbred lines from over 800 separate experiments, representing over 1,000,000 fully quantitative elemental concentrations. PiiMS is accessible at www.purdue.edu/dp/ionomics.
Origins of DNA Replication and Amplification in the Breast Cancer Genome
2012-09-01
W81XWH-10-1-0463. Report type: Final. Dates covered: 1 Sep 2010 – 31 Aug 2012. ...described in the DOD funded parent grant, to test our hypothesis we need to map origins of DNA replication in the genome and ask which of these
França, Giovanny Vinícius Araújo de; De Lucia Rolfe, Emanuella; Horta, Bernardo Lessa; Gigante, Denise Petrucci; Yudkin, John S; Ong, Ken K; Victora, Cesar Gomes
2017-01-01
We aimed to identify the independent associations of genomic ancestry and education level with abdominal fat distributions in the 1982 Pelotas birth cohort study, Brazil. In 2,890 participants (1,409 men and 1,481 women), genomic ancestry was assessed using genotype data on 370,539 genome-wide variants to quantify ancestral proportions in each individual. Years of completed education was used to indicate socio-economic position. Visceral fat depth and subcutaneous abdominal fat thickness were measured by ultrasound at age 29–31y; these measures were adjusted for BMI to indicate abdominal fat distributions. Linear regression models were performed, separately by sex. Admixture was observed between European (median proportion 85.3), African (6.6), and Native American (6.3) ancestries, with a strong inverse correlation between the African and European ancestry scores (ρ = -0.93; p<0.001). Independent of education level, African ancestry was inversely associated with both visceral and subcutaneous abdominal fat distributions in men (both P = 0.001), and inversely associated with subcutaneous abdominal fat distribution in women (p = 0.009). Independent of genomic ancestry, higher education level was associated with lower visceral fat, but higher subcutaneous fat, in both men and women (all p<0.001). Our findings, from an admixed population, indicate that both genomic ancestry and education level were independently associated with abdominal fat distribution in adults. African ancestry appeared to lower abdominal fat distributions, particularly in men. PMID:28582437
Genome expansion via lineage splitting and genome reduction in the cicada endosymbiont Hodgkinia.
Campbell, Matthew A; Van Leuven, James T; Meister, Russell C; Carey, Kaitlin M; Simon, Chris; McCutcheon, John P
2015-08-18
Comparative genomics from mitochondria, plastids, and mutualistic endosymbiotic bacteria has shown that the stable establishment of a bacterium in a host cell results in genome reduction. Although many highly reduced genomes from endosymbiotic bacteria are stable in gene content and genome structure, organelle genomes are sometimes characterized by dramatic structural diversity. Previous results from Candidatus Hodgkinia cicadicola, an endosymbiont of cicadas, revealed that some lineages of this bacterium had split into two new cytologically distinct yet genetically interdependent species. It was hypothesized that the long life cycle of cicadas in part enabled this unusual lineage-splitting event. Here we test this hypothesis by investigating the structure of the Ca. Hodgkinia genome in one of the longest-lived cicadas, Magicicada tredecim. We show that the Ca. Hodgkinia genome from M. tredecim has fragmented into multiple new chromosomes or genomes, with at least some remaining partitioned into discrete cells. We also show that this lineage-splitting process has resulted in a complex of Ca. Hodgkinia genomes that are 1.1-Mb pairs in length when considered together, an almost 10-fold increase in size from the hypothetical single-genome ancestor. These results parallel some examples of genome fragmentation and expansion in organelles, although the mechanisms that give rise to these extreme genome instabilities are likely different.
A Genome-Wide Survey of Genetic Instability by Transposition in Drosophila Hybrids
Vela, Doris; Fontdevila, Antonio; Vieira, Cristina; García Guerreiro, María Pilar
2014-01-01
Hybridization between species is a genomic instability factor involved in increasing mutation rate and new chromosomal rearrangements. Evidence of a relationship between interspecific hybridization and transposable element mobilization has been reported in different organisms, but most studies are usually performed with particular TEs and do not discuss the real effect of hybridization on the whole genome. We have therefore studied whole genome instability of Drosophila interspecific hybrids, looking for the presence of new AFLP markers in hybrids. A high percentage (27–90%) of the instability markers detected corresponds to TEs belonging to classes I and II. Moreover, three transposable elements (Osvaldo, Helena and Galileo) representative of different families, showed an overall increase of transposition rate in hybrids compared to parental species. This research confirms the hypothesis that hybridization induces genomic instability by transposition bursts and suggests that genomic stress by transposition could contribute to a relaxation of mechanisms controlling TEs in the Drosophila genome. PMID:24586475
Two Rounds of Whole Genome Duplication in the Ancestral Vertebrate
Dehal, Paramvir; Boore, Jeffrey L
2005-01-01
The hypothesis that the relatively large and complex vertebrate genome was created by two ancient, whole genome duplications has been hotly debated, but remains unresolved. We reconstructed the evolutionary relationships of all gene families from the complete gene sets of a tunicate, fish, mouse, and human, and then determined when each gene duplicated relative to the evolutionary tree of the organisms. We confirmed the results of earlier studies that there remains little signal of these events in numbers of duplicated genes, gene tree topology, or the number of genes per multigene family. However, when we plotted the genomic map positions of only the subset of paralogous genes that were duplicated prior to the fish–tetrapod split, their global physical organization provides unmistakable evidence of two distinct genome duplication events early in vertebrate evolution indicated by clear patterns of four-way paralogous regions covering a large part of the human genome. Our results highlight the potential for these large-scale genomic events to have driven the evolutionary success of the vertebrate lineage. PMID:16128622
Recently evolved human-specific methylated regions are enriched in schizophrenia signals.
Banerjee, Niladri; Polushina, Tatiana; Bettella, Francesco; Giddaluru, Sudheer; Steen, Vidar M; Andreassen, Ole A; Le Hellard, Stephanie
2018-05-11
One explanation for the persistence of schizophrenia despite the reduced fertility of patients is that it is a by-product of recent human evolution. This hypothesis is supported by evidence suggesting that recently-evolved genomic regions in humans are involved in the genetic risk for schizophrenia. Using summary statistics from genome-wide association studies (GWAS) of schizophrenia and 11 other phenotypes, we tested for enrichment of association with GWAS traits in regions that have undergone methylation changes in the human lineage compared to Neanderthals and Denisovans, i.e. human-specific differentially methylated regions (DMRs). We used analytical tools that evaluate polygenic enrichment of a subset of genomic variants against all variants. Schizophrenia was the only trait in which DMR SNPs showed clear enrichment of association that passed the genome-wide significance threshold. The enrichment was not observed for Neanderthal or Denisovan DMRs. The enrichment seen in human DMRs is comparable to that for genomic regions tagged by Neanderthal Selective Sweep markers, and stronger than that for Human Accelerated Regions. The enrichment survives multiple testing performed through permutation (n = 10,000) and bootstrapping (n = 5000) in INRICH (p < 0.01). Some enrichment of association with height was observed at the gene level. Regions where DNA methylation modifications have changed during recent human evolution show enrichment of association with schizophrenia and possibly with height. Our study further supports the hypothesis that genetic variants conferring risk of schizophrenia co-occur in genomic regions that have changed as the human species evolved. Since methylation is an epigenetic mark, potentially mediated by environmental changes, our results also suggest that interaction with the environment might have contributed to that association.
Covell, David G
2015-01-01
Developing reliable biomarkers of tumor cell drug sensitivity and resistance can guide hypothesis-driven basic science research and influence pre-therapy clinical decisions. A popular strategy for developing biomarkers uses characterizations of human tumor samples against a range of cancer drug responses that correlate with genomic change; developed largely from the efforts of the Cancer Cell Line Encyclopedia (CCLE) and Sanger Cancer Genome Project (CGP). The purpose of this study is to provide an independent analysis of this data that aims to vet existing and add novel perspectives to biomarker discoveries and applications. Existing and alternative data mining and statistical methods will be used to a) evaluate drug responses of compounds with similar mechanism of action (MOA), b) examine measures of gene expression (GE), copy number (CN) and mutation status (MUT) biomarkers, combined with gene set enrichment analysis (GSEA), for hypothesizing biological processes important for drug response, c) conduct global comparisons of GE, CN and MUT as biomarkers across all drugs screened in the CGP dataset, and d) assess the positive predictive power of CGP-derived GE biomarkers as predictors of drug response in CCLE tumor cells. The perspectives derived from individual and global examinations of GEs, MUTs and CNs confirm existing and reveal unique and shared roles for these biomarkers in tumor cell drug sensitivity and resistance. Applications of CGP-derived genomic biomarkers to predict the drug response of CCLE tumor cells finds a highly significant ROC, with a positive predictive power of 0.78. The results of this study expand the available data mining and analysis methods for genomic biomarker development and provide additional support for using biomarkers to guide hypothesis-driven basic science research and pre-therapy clinical decisions.
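The abstract above reports a "positive predictive power" of 0.78 for biomarker-based predictions. Assuming this corresponds to the standard positive predictive value (true positives divided by all positive predictions), the quantity can be sketched in Python as follows. The function name and the boolean encoding of "sensitive" calls are illustrative assumptions, not part of the study.

```python
def positive_predictive_value(predicted, actual):
    """Fraction of positive predictions that are correct: TP / (TP + FP).

    `predicted` and `actual` are equal-length sequences of booleans, e.g.
    "biomarker predicts drug sensitivity" vs "cell line is sensitive"
    (hypothetical encoding chosen for this sketch).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    false_pos = sum(1 for p, a in zip(predicted, actual) if p and not a)
    if true_pos + false_pos == 0:
        raise ValueError("no positive predictions; PPV is undefined")
    return true_pos / (true_pos + false_pos)
```

Unlike sensitivity or specificity, this value depends on how common true responders are in the tested panel, so it is best compared only between datasets with similar response rates.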
Indigenous Arabs are descendants of the earliest split from ancient Eurasian populations.
Rodriguez-Flores, Juan L; Fakhro, Khalid; Agosto-Perez, Francisco; Ramstetter, Monica D; Arbiza, Leonardo; Vincent, Thomas L; Robay, Amal; Malek, Joel A; Suhre, Karsten; Chouchane, Lotfi; Badii, Ramin; Al-Nabet Al-Marri, Ajayeb; Abi Khalil, Charbel; Zirie, Mahmoud; Jayyousi, Amin; Salit, Jacqueline; Keinan, Alon; Clark, Andrew G; Crystal, Ronald G; Mezey, Jason G
2016-02-01
An open question in the history of human migration is the identity of the earliest Eurasian populations that have left contemporary descendants. The Arabian Peninsula was the initial site of the out-of-Africa migrations that occurred between 125,000 and 60,000 yr ago, leading to the hypothesis that the first Eurasian populations were established on the Peninsula and that contemporary indigenous Arabs are direct descendants of these ancient peoples. To assess this hypothesis, we sequenced the entire genomes of 104 unrelated natives of the Arabian Peninsula at high coverage, including 56 of indigenous Arab ancestry. The indigenous Arab genomes defined a cluster distinct from other ancestral groups, and these genomes showed clear hallmarks of an ancient out-of-Africa bottleneck. Similar to other Middle Eastern populations, the indigenous Arabs had higher levels of Neanderthal admixture compared to Africans but had lower levels than Europeans and Asians. These levels of Neanderthal admixture are consistent with an early divergence of Arab ancestors after the out-of-Africa bottleneck but before the major Neanderthal admixture events in Europe and other regions of Eurasia. When compared to worldwide populations sampled in the 1000 Genomes Project, although the indigenous Arabs had a signal of admixture with Europeans, they clustered in a basal, outgroup position to all 1000 Genomes non-Africans when considering pairwise similarity across the entire genome. These results place indigenous Arabs as the most distant relatives of all other contemporary non-Africans and identify these people as direct descendants of the first Eurasian populations established by the out-of-Africa migrations. © 2016 Rodriguez-Flores et al.; Published by Cold Spring Harbor Laboratory Press.
Genome scale engineering techniques for metabolic engineering.
Liu, Rongming; Bassalo, Marcelo C; Zeitoun, Ramsey I; Gill, Ryan T
2015-11-01
Metabolic engineering has expanded from a focus on designs requiring a small number of genetic modifications to increasingly complex designs driven by advances in genome-scale engineering technologies. Metabolic engineering has been generally defined by the use of iterative cycles of rational genome modifications, strain analysis and characterization, and a synthesis step that fuels additional hypothesis generation. This cycle mirrors the Design-Build-Test-Learn cycle followed throughout various engineering fields that has recently become a defining aspect of synthetic biology. This review will attempt to summarize recent genome-scale design, build, test, and learn technologies and relate their use to a range of metabolic engineering applications. Copyright © 2015 International Metabolic Engineering Society. Published by Elsevier Inc. All rights reserved.
Citrus and Prunus copia-like retrotransposons.
Asíns, M J; Monforte, A J; Mestre, P F; Carbonell, E A
1999-08-01
Many of the world's most important citrus cultivars ("Washington Navel", satsumas, clementines) have arisen through somatic mutation. This phenomenon occurs fairly often in the various species and varieties of the genus. The presence of copia-like retrotransposons has been investigated in fruit trees, especially citrus, by using a PCR assay designed to detect copia-like reverse transcriptase (RT) sequences. Amplification products from a genotype of each of the following species, Citrus sinensis, Citrus grandis, Citrus clementina, Prunus armeniaca and Prunus amygdalus, were cloned and some of them sequenced. Southern-blot hybridization using RT clones as probes showed that multiple copies are integrated throughout the citrus genome, while only 1-3 copies are detected in the P. armeniaca genome, which is in accordance with the Citrus and Prunus genome sizes. Sequence analysis of RT clones allowed a search for homologous sequences within three gene banks. The most similar ones correspond to RT domains of copia-like retrotransposons from unrelated plant species. Cluster analysis of these sequences has shown a great heterogeneity among RT domains cloned from the same genotype. This finding supports the hypothesis that horizontal transmission of retrotransposons has occurred in the past. The species presenting an RT sequence most similar to citrus RT clones is Gnetum montanum, a gymnosperm whose distribution area coincides with two of the main centers of origin of Citrus spp. A new C-methylated restriction DNA fragment containing an RT sequence is present in navel sweet oranges, but not in the Valencia oranges from which the former originated, suggesting that retrotransposon activity might be, at least in part, involved in the genetic variability among sweet orange cultivars.
Given that retrotransposons are quite abundant throughout the citrus genome, their activity should be investigated thoroughly before commercializing any transgenic citrus plant where the transgene(s) is part of a viral genome in order to avoid its possible recombination with an active retroelement. Focusing on other strategies to control virus diseases is recommended in citrus.
Perez, Manolo F; Carstens, Bryan C; Rodrigues, Gustavo L; Moraes, Evandro M
2016-02-01
The Pilosocereus aurisetus complex consists of eight cactus species with a fragmented distribution associated to xeric enclaves within the Cerrado biome in eastern South America. The phylogeny of these species is incompletely resolved, and this instability complicates evolutionary analyses. Previous analyses based on both plastid and microsatellite markers suggested that this complex contained species with inherent phylogeographic structure, which was attributed to recent diversification and recurring range shifts. However, limitations of the molecular markers used in these analyses prevented some questions from being properly addressed. In order to better understand the relationship among these species and make a preliminary assessment of the genetic structure within them, we developed anonymous nuclear loci from pyrosequencing data of 40 individuals from four species in the P. aurisetus complex. The data obtained from these loci were used to identify genetic clusters within species, and to investigate the phylogenetic relationship among these inferred clusters using a species tree methodology. Coupled with a palaeodistributional modelling, our results reveal a deep phylogenetic and climatic disjunction between two geographic lineages. Our results highlight the importance of sampling more regions from the genome to gain better insights on the evolution of species with an intricate evolutionary history. The methodology used here provides a feasible approach to develop numerous genealogical molecular markers throughout the genome for non-model species. These data provide a more robust hypothesis for the relationship among the lineages of the P. aurisetus complex. Copyright © 2015 Elsevier Inc. All rights reserved.
Impact of a nonuniform charge distribution on virus assembly
NASA Astrophysics Data System (ADS)
Li, Siyu; Erdemci-Tandogan, Gonca; Wagner, Jef; van der Schoot, Paul; Zandi, Roya
2017-08-01
Many spherical viruses encapsulate their genomes in protein shells with icosahedral symmetry. This process is spontaneous and driven by electrostatic interactions between positive domains on the virus coat proteins and the negative genomes. We model the effect of the nonuniform icosahedral charge distribution from the protein shell instead using a mean-field theory. We find that this nonuniform charge distribution strongly affects the optimal genome length and that it can explain the experimentally observed phenomenon of overcharging of virus and viruslike particles.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leebens-Mack, Jim; Raubeson, Linda A.; Cui, Liying
2005-05-27
While there has been strong support for Amborella and Nymphaeales (water lilies) as branching from basal-most nodes in the angiosperm phylogeny, this hypothesis has recently been challenged by phylogenetic analyses of 61 protein-coding genes extracted from the chloroplast genome sequences of Amborella, Nymphaea and 12 other available land plant chloroplast genomes. These character-rich analyses placed the monocots, represented by three grasses (Poaceae), as sister to all other extant angiosperm lineages. We have extracted protein-coding regions from draft sequences for six additional chloroplast genomes to test whether this surprising result could be an artifact of long-branch attraction due to limited taxon sampling. The added taxa include three monocots (Acorus, Yucca and Typha), a water lily (Nuphar), a ranunculid (Ranunculus), and a gymnosperm (Ginkgo). Phylogenetic analyses of the expanded DNA and protein datasets together with microstructural characters (indels) provided unambiguous support for Amborella and the Nymphaeales as branching from the basal-most nodes in the angiosperm phylogeny. However, their relative positions proved to be dependent on method of analysis, with parsimony favoring Amborella as sister to all other angiosperms, and maximum likelihood and neighbor-joining methods favoring an Amborella + Nymphaeales clade as sister. The maximum likelihood phylogeny supported the latter hypothesis, but the likelihood for the former hypothesis was not significantly different. Parametric bootstrap analysis, single gene phylogenies, estimated divergence dates and conflicting indel characters all help to illuminate the nature of the conflict in resolution of the most basal nodes in the angiosperm phylogeny. Molecular dating analyses provided median age estimates of 161 mya for the most recent common ancestor of all extant angiosperms and 145 mya for the most recent common ancestor of monocots, magnoliids and eudicots.
Whereas long sequences reduce variance in branch lengths and molecular dating estimates, the impact of improved taxon sampling on the rooting of the angiosperm phylogeny together with the results of parametric bootstrap analyses demonstrate how long-branch attraction can mislead genome-scale phylogenetic analyses.
Shi, Jiaqin; Huang, Shunmou; Fu, Donghui; Yu, Jinyin; Wang, Xinfa; Hua, Wei; Liu, Shengyi; Liu, Guihua; Wang, Hanzhong
2013-01-01
Despite their ubiquity and functional importance, microsatellites have been largely ignored in comparative genomics, mostly due to the lack of genomic information. In the current study, microsatellite distribution was characterized and compared in the whole genomes and both the coding and non-coding DNA sequences of the sequenced Brassica, Arabidopsis and other angiosperm species to investigate their evolutionary dynamics in plants. The variation in the microsatellite frequencies of these angiosperm species was much smaller than those for their microsatellite numbers and genome sizes, suggesting that microsatellite frequency may be relatively stable in plants. The microsatellite frequencies of these angiosperm species were significantly negatively correlated with both their genome sizes and transposable elements contents. The pattern of microsatellite distribution may differ according to the different genomic regions (such as coding and non-coding sequences). The observed differences in many important microsatellite characteristics (especially the distribution with respect to motif length, type and repeat number) of these angiosperm species were generally accordant with their phylogenetic distance, which suggested that the evolutionary dynamics of microsatellite distribution may be generally consistent with plant divergence/evolution. Importantly, by comparing these microsatellite characteristics (especially the distribution with respect to motif type) the angiosperm species (aside from a few species) all clustered into two obviously different groups that were largely represented by monocots and dicots, suggesting a complex and generally dichotomous evolutionary pattern of microsatellite distribution in angiosperms. 
Polyploidy may lead to a slight increase in microsatellite frequency in the coding sequences and a significant decrease in microsatellite frequency in the whole genome/non-coding sequences, but has little effect on the microsatellite distribution with respect to motif length, type and repeat number. Interestingly, several microsatellite characteristics seemed to be constant in plant evolution, which can be well explained by general biological rules. PMID:23555856
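The microsatellite frequency compared in the abstract above (number of microsatellites per unit of sequence) can be illustrated with a minimal sketch. The abstract does not give the detection criteria, so the minimum repeat thresholds, the function names, and the per-megabase normalisation used here are assumptions for illustration only, not the authors' pipeline.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA')."""
    k = len(motif)
    for d in range(1, k):
        if k % d == 0 and motif == motif[:d] * (k // d):
            return False
    return True


def find_microsatellites(seq, min_repeats):
    """Return sorted (start, motif, repeat_count) tuples.

    min_repeats maps motif length to the minimum number of tandem copies,
    e.g. {1: 10, 2: 6}. These thresholds are hypothetical.
    """
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # One captured motif of length k followed by at least n - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def frequency_per_mb(seq, min_repeats):
    """Microsatellite count normalised to one megabase of sequence."""
    return len(find_microsatellites(seq, min_repeats)) / len(seq) * 1e6
```

Calling `find_microsatellites(sequence, {1: 10, 2: 6})` on a coding and a non-coding sequence separately, then comparing `frequency_per_mb` for each, mirrors the kind of comparison the abstract describes. The scan is non-overlapping within each motif length, so it is a rough counter rather than a substitute for a dedicated SSR tool.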
2011-01-01
Background The carnivorous plant Utricularia gibba (bladderwort) is remarkable in having a minute genome, which at ca. 80 megabases is approximately half that of Arabidopsis. Bladderworts show an incredible diversity of forms surrounding a defined theme: tiny, bladder-like suction traps on terrestrial, epiphytic, or aquatic plants with a diversity of unusual vegetative forms. Utricularia plants, which are rootless, are also anomalous in physiological features (respiration and carbon distribution), and highly enhanced molecular evolutionary rates in chloroplast, mitochondrial and nuclear ribosomal sequences. Despite great interest in the genus, no genomic resources exist for Utricularia, and the substitution rate increase has received limited study. Results Here we describe the sequencing and analysis of the Utricularia gibba transcriptome. Three different organs were surveyed, the traps, the vegetative shoot bodies, and the inflorescence stems. We also examined the bladderwort transcriptome under diverse stress conditions. We detail aspects of functional classification, tissue similarity, nitrogen and phosphorus metabolism, respiration, DNA repair, and detoxification of reactive oxygen species (ROS). Long contigs of plastid and mitochondrial genomes, as well as sequences for 100 individual nuclear genes, were compared with those of other plants to better establish information on molecular evolutionary rates. Conclusion The Utricularia transcriptome provides a detailed genomic window into processes occurring in a carnivorous plant. It contains a deep representation of the complex metabolic pathways that characterize a putative minimal plant genome, permitting its use as a source of genomic information to explore the structural, functional, and evolutionary diversity of the genus. 
Vegetative shoots and traps are the most similar organs by functional classification of their transcriptome, the traps expressing hydrolytic enzymes for prey digestion that were previously thought to be encoded by bacteria. Supporting physiological data, global gene expression analysis shows that traps significantly over-express genes involved in respiration and that phosphate uptake might occur mainly in traps, whereas nitrogen uptake could in part take place in vegetative parts. Expression of DNA repair and ROS detoxification enzymes may be indicative of a response to increased respiration. Finally, evidence from the bladderwort transcriptome, direct measurement of ROS in situ, and cross-species comparisons of organellar genomes and multiple nuclear genes supports the hypothesis that increased nucleotide substitution rates throughout the plant may be due to the mutagenic action of amplified ROS production. PMID:21639913
Podsiadlowski, Lars; Braband, Anke; Mayer, Georg
2008-01-01
Onychophora (velvet worms) play a crucial role in current discussions on the position of arthropods. The ongoing Articulata/Ecdysozoa debate is in need of additional ground pattern characters for Panarthropoda (Arthropoda, Tardigrada, and Onychophora). Hence, Onychophora is an important outgroup taxon in resolving the relationships among arthropods, irrespective of whether morphological or molecular data are used. To date, there has been a noticeable lack of mitochondrial genome data from onychophorans. Here, we present the first complete mitochondrial genome sequence of an onychophoran, Epiperipatus biolleyi (Peripatidae), which shows several characteristic features. Specifically, the gene order is considerably different from that in other arthropods and other bilaterians. In addition, there is a lack of 9 tRNA genes usually present in bilaterian mitochondrial genomes. All these missing tRNAs have anticodon sequences corresponding to 4-fold degenerate codons, whereas the persisting 13 tRNAs all have anticodons pairing with 2-fold degenerate codons. Sequence-based phylogenetic analysis of the mitochondrial protein-coding genes provides robust support for a clade consisting of Onychophora, Priapulida, and Arthropoda, which confirms the Ecdysozoa hypothesis. However, resolution of the internal ecdysozoan relationships suffers from a cluster of long-branching taxa (including Nematoda and Platyhelminthes) and a lack of data from Tardigrada and further nemathelminth taxa in addition to nematodes and priapulids.
Linking Genomics and Ecology to Investigate the Complex Evolution of an Invasive Drosophila Pest
Ometto, Lino; Cestaro, Alessandro; Ramasamy, Sukanya; Grassi, Alberto; Revadi, Santosh; Siozios, Stefanos; Moretto, Marco; Fontana, Paolo; Varotto, Claudio; Pisani, Davide; Dekker, Teun; Wrobel, Nicola; Viola, Roberto; Pertot, Ilaria; Cavalieri, Duccio; Blaxter, Mark; Anfora, Gianfranco; Rota-Stabelli, Omar
2013-01-01
Drosophilid fruit flies have provided science with striking cases of behavioral adaptation and genetic innovation. A recent example is the invasive pest Drosophila suzukii, which, unlike most other Drosophila, lays eggs and feeds on undamaged, ripening fruits. This not only poses a serious threat for fruit cultivation but also offers an interesting model to study evolution of behavioral innovation. We developed genome and transcriptome resources for D. suzukii. Coupling analyses of these data with field observations, we propose a hypothesis of the origin of its peculiar ecology. Using nuclear and mitochondrial phylogenetic analyses, we confirm its Asian origin and reveal a surprising sister relationship between the eugracilis and the melanogaster subgroups. Although the D. suzukii genome is comparable in size and repeat content to other Drosophila species, it has the lowest nucleotide substitution rate among the species analyzed in this study. This finding is compatible with the overwintering diapause of D. suzukii, which results in a reduced number of generations per year compared with its sister species. Genome-scale relaxed clock analyses support a late Miocene origin of D. suzukii, concomitant with paleogeological and climatic conditions that suggest an adaptation to temperate montane forests, a hypothesis confirmed by field trapping. We propose a causal link between the ecological adaptations of D. suzukii in its native habitat and its invasive success in Europe and North America. PMID:23501831
RNA Synthesis by in Vitro Selected Ribozymes for Recreating an RNA World
Martin, Lyssa L.; Unrau, Peter J.; Müller, Ulrich F.
2015-01-01
The RNA world hypothesis states that during an early stage of life, RNA molecules functioned as genome and as the only genome-encoded catalyst. This hypothesis is supported by several lines of evidence, one of which is the in vitro selection of catalytic RNAs (ribozymes) in the laboratory for a wide range of reactions that might have been used by RNA world organisms. This review focuses on three types of ribozymes that could have been involved in the synthesis of RNA, the core activity in the self-replication of RNA world organisms. These ribozyme classes catalyze nucleoside synthesis, triphosphorylation, and the polymerization of nucleoside triphosphates. The strengths and weaknesses regarding each ribozyme’s possible function in a self-replicating RNA network are described, together with the obstacles that need to be overcome before an RNA world organism can be generated in the laboratory. PMID:25610978
Pita, Sebastián; Panzera, Francisco; Mora, Pablo; Vela, Jesús; Palomeque, Teresa; Lorite, Pedro
2016-01-01
Next-generation sequencing data analysis on Triatoma infestans Klug, 1834 (Heteroptera, Cimicomorpha, Reduviidae) revealed the presence of the ancestral insect (TTAGG)n telomeric motif in its genome. Fluorescence in situ hybridization confirms that chromosomes bear this telomeric sequence at their ends. Furthermore, the motif was estimated to make up about 0.03% of the total genome, so that the average telomere length at each chromosomal end is almost 18 kb. We also detected the (TTAGG)n telomeric repeat in mitotic and meiotic chromosomes of three other species of Triatominae: Triatoma dimidiata Latreille, 1811, Dipetalogaster maxima Uhler, 1894, and Rhodnius prolixus Ståhl, 1859. This is the first report of the (TTAGG)n telomeric repeat in the infraorder Cimicomorpha, contradicting the currently accepted hypothesis that evolutionarily recent heteropterans lack this ancestral insect telomeric sequence. PMID:27830050
Jose de Carli, Gabriel; Campos Pereira, Tiago
2017-09-01
Spontaneous parthenogenetic and androgenetic events occur in humans, but they result in tumours: the ovarian teratoma and the hydatidiform mole, respectively. However, the observation of fetiform (ovarian) teratomas, the serendipitous identification of several chimeric human parthenotes and androgenotes in the last two decades, along with the creation of viable bi-maternal mice in the laboratory based on minor genetic interferences, raises the question of whether natural cases of clinically healthy human parthenotes have gone unnoticed by science. Here we present a hypothesis based on three elements to support the existence of such elusive individuals: mutations affecting (i) genomic imprinting, (ii) meiosis and (iii) oocyte activation. Additionally, we suggest that the routine practice of whole genome sequencing on every single newborn worldwide will be the ultimate test of this controversial, yet astonishing hypothesis. Finally, several medical implications of such an intriguing event are presented. Copyright © 2017 Elsevier Ltd. All rights reserved.
Host-Parasite Interactions and Purifying Selection in a Microsporidian Parasite of Honey Bees
Huang, Qiang; Chen, Yan Ping; Wang, Rui Wu; Cheng, Shang; Evans, Jay D.
2016-01-01
To clarify the mechanisms of Nosema ceranae parasitism, we deep-sequenced both honey bee host and parasite mRNAs throughout a complete 6-day infection cycle. By time-series analysis, 1122 parasite genes were significantly differentially expressed during the reproduction cycle, clustering into 4 expression patterns. We found reactive mitochondrial oxygen species modulator 1 of the host to be significantly downregulated during the entire infection period. Our data support the hypothesis that apoptosis of honey bee cells was suppressed during infection. We further analyzed genome-wide genetic diversity of this parasite by comparing samples collected from the same site in 2007 and 2013. The number of SNP positions per gene and the proportion of non-synonymous substitutions per gene were significantly reduced over this time period, suggesting purifying selection on the parasite genome and supporting the hypothesis that a subset of N. ceranae strains might be dominating infection. PMID:26840596
Epigenetics: the language of the cell?
Huang, Biao; Jiang, Cizhong; Zhang, Rongxin
2014-02-01
Epigenetics is one of the most rapidly developing fields of biological research. Breakthroughs in several technologies have made genome-wide epigenetic research possible, for example the mapping of human genome-wide DNA methylation. In addition, with the development of various high-throughput and high-resolution sequencing technologies, a large number of functional noncoding RNAs have been identified. Numerous studies have indicated that these functional ncRNAs also play an important role in epigenetics. In this review, we draw inspiration from the recent proposal of the ceRNA hypothesis. This hypothesis proposes that miRNAs act as a language of communication. Accordingly, we further deduce that all of epigenetics may functionally acquire such a unique language characteristic. In summary, various epigenetic markers may not only participate in regulating cellular processes, but may also act as the intracellular 'language' of communication and be involved in extensive information exchange within the cell.
Adaptive evolution of complex innovations through stepwise metabolic niche expansion.
Szappanos, Balázs; Fritzemeier, Jonathan; Csörgő, Bálint; Lázár, Viktória; Lu, Xiaowen; Fekete, Gergely; Bálint, Balázs; Herczeg, Róbert; Nagy, István; Notebaart, Richard A; Lercher, Martin J; Pál, Csaba; Papp, Balázs
2016-05-20
A central challenge in evolutionary biology concerns the mechanisms by which complex metabolic innovations requiring multiple mutations arise. Here, we propose that metabolic innovations accessible through the addition of a single reaction serve as stepping stones towards the later establishment of complex metabolic features in another environment. We demonstrate the feasibility of this hypothesis through three complementary analyses. First, using genome-scale metabolic modelling, we show that complex metabolic innovations in Escherichia coli can arise via changing nutrient conditions. Second, using phylogenetic approaches, we demonstrate that the acquisition patterns of complex metabolic pathways during the evolutionary history of bacterial genomes support the hypothesis. Third, we show how adaptation of laboratory populations of E. coli to one carbon source facilitates the later adaptation to another carbon source. Our work demonstrates how complex innovations can evolve through series of adaptive steps without the need to invoke non-adaptive processes.
Adaptive evolution of complex innovations through stepwise metabolic niche expansion
Szappanos, Balázs; Fritzemeier, Jonathan; Csörgő, Bálint; Lázár, Viktória; Lu, Xiaowen; Fekete, Gergely; Bálint, Balázs; Herczeg, Róbert; Nagy, István; Notebaart, Richard A.; Lercher, Martin J.; Pál, Csaba; Papp, Balázs
2016-01-01
A central challenge in evolutionary biology concerns the mechanisms by which complex metabolic innovations requiring multiple mutations arise. Here, we propose that metabolic innovations accessible through the addition of a single reaction serve as stepping stones towards the later establishment of complex metabolic features in another environment. We demonstrate the feasibility of this hypothesis through three complementary analyses. First, using genome-scale metabolic modelling, we show that complex metabolic innovations in Escherichia coli can arise via changing nutrient conditions. Second, using phylogenetic approaches, we demonstrate that the acquisition patterns of complex metabolic pathways during the evolutionary history of bacterial genomes support the hypothesis. Third, we show how adaptation of laboratory populations of E. coli to one carbon source facilitates the later adaptation to another carbon source. Our work demonstrates how complex innovations can evolve through series of adaptive steps without the need to invoke non-adaptive processes. PMID:27197754
Ontology-based meta-analysis of global collections of high-throughput public data.
Kupershmidt, Ilya; Su, Qiaojuan Jane; Grewal, Anoop; Sundaresh, Suman; Halperin, Inbal; Flynn, James; Shekar, Mamatha; Wang, Helen; Park, Jenny; Cui, Wenwu; Wall, Gregory D; Wisotzkey, Robert; Alag, Satnam; Akhtari, Saeid; Ronaghi, Mostafa
2010-09-29
The investigation of the interconnections between the molecular and genetic events that govern biological systems is essential if we are to understand the development of disease and design effective novel treatments. Microarray and next-generation sequencing technologies have the potential to provide this information. However, taking full advantage of these approaches requires that biological connections be made across large quantities of highly heterogeneous genomic datasets. Leveraging the increasingly huge quantities of genomic data in the public domain is fast becoming one of the key challenges in the research community today. We have developed a novel data mining framework that enables researchers to use this growing collection of public high-throughput data to investigate any set of genes or proteins. The connectivity between molecular states across thousands of heterogeneous datasets from microarrays and other genomic platforms is determined through a combination of rank-based enrichment statistics, meta-analyses, and biomedical ontologies. We address data quality concerns through dataset replication and meta-analysis and ensure that the majority of the findings are derived using multiple lines of evidence. As an example of our strategy and the utility of this framework, we apply our data mining approach to explore the biology of brown fat within the context of the thousands of publicly available gene expression datasets. Our work presents a practical strategy for organizing, mining, and correlating global collections of large-scale genomic data to explore normal and disease biology. Using a hypothesis-free approach, we demonstrate how a data-driven analysis across very large collections of genomic data can reveal novel discoveries and evidence to support existing hypotheses.
Elevated mitochondrial genome variation after 50 generations of radiation exposure in a wild rodent.
Baker, Robert J; Dickins, Benjamin; Wickliffe, Jeffrey K; Khan, Faisal A A; Gaschak, Sergey; Makova, Kateryna D; Phillips, Caleb D
2017-09-01
Currently, the effects of chronic, continuous low dose environmental irradiation on the mitochondrial genome of resident small mammals are unknown. Using the bank vole (Myodes glareolus) as a model system, we tested the hypothesis that approximately 50 generations of exposure to the Chernobyl environment has significantly altered genetic diversity of the mitochondrial genome. Using deep sequencing, we compared mitochondrial genomes from 131 individuals from reference sites with radioactive contamination comparable to that present in northern Ukraine before the 26 April 1986 meltdown, to populations where substantial fallout was deposited following the nuclear accident. Population genetic variables revealed significant differences among populations from contaminated and uncontaminated localities. Therefore, we rejected the null hypothesis of no significant genetic effect from 50 generations of exposure to the environment created by the Chernobyl meltdown. Samples from contaminated localities exhibited significantly higher numbers of haplotypes and polymorphic loci, elevated genetic diversity, and a significantly higher average number of substitutions per site across mitochondrial gene regions. Observed genetic variation was dominated by synonymous mutations, which may indicate a history of purifying selection against nonsynonymous or insertion/deletion mutations. These significant differences were not attributable to sample size artifacts. The observed increase in mitochondrial genomic diversity in voles from radioactive sites is consistent with the possibility that chronic, continuous irradiation resulting from the Chernobyl disaster has produced an accelerated mutation rate in this species over the last 25 years. Our results, being the first to demonstrate this phenomenon in a wild mammalian species, are important for understanding genetic consequences of exposure to low-dose radiation sources.
Amino Acid Usage Is Asymmetrically Biased in AT- and GC-Rich Microbial Genomes
Bohlin, Jon; Brynildsrud, Ola; Vesth, Tammi; Skjerve, Eystein; Ussery, David W.
2013-01-01
Introduction Genomic base composition ranges from less than 25% AT to more than 85% AT in prokaryotes. Since only a small fraction of prokaryotic genomes is not protein coding, even a minor change in genomic base composition will induce profound protein changes. We examined how amino acid and codon frequencies were distributed in over 2000 microbial genomes and how these distributions were affected by base compositional changes. In addition, we wanted to know how genome-wide amino acid usage was biased in the different genomes and how changes to base composition and mutations affected this bias. To carry this out, we used a Generalized Additive Mixed-effects Model (GAMM) to explore non-linear associations and strong data dependences in closely related microbes; principal component analysis (PCA) was used to examine genomic amino acid- and codon frequencies, while the concept of relative entropy was used to analyze genomic mutation rates. Results We found that genomic amino acid frequencies carried a stronger phylogenetic signal than codon frequencies, but that this signal was weak compared to that of genomic %AT. Further, in contrast to codon usage bias (CUB), amino acid usage bias (AAUB) was differently distributed in AT- and GC-rich genomes in the sense that AT-rich genomes did not prefer specific amino acids over others to the same extent as GC-rich genomes. AAUB was also associated with relative entropy; genomes with low AAUB contained more random mutations as a consequence of relaxed purifying selection than genomes with higher AAUB. Conclusion Genomic base composition has a substantial effect on both amino acid- and codon frequencies in bacterial genomes. While phylogeny influenced amino acid usage more in GC-rich genomes, AT-content was driving amino acid usage in AT-rich genomes. We found the GAMM model to be an excellent tool to analyze the genomic data used in this study. PMID:23922837
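The abstract above refers to relative entropy without stating how it was computed. As a rough illustration only, the standard Kullback-Leibler form D(P||Q) = sum over symbols of p log2(p/q) can be applied to amino acid frequencies as sketched below. The choice of a uniform reference distribution and the helper names are assumptions made for this example and are not taken from the study.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def frequencies(protein, alphabet=AMINO_ACIDS):
    """Relative frequency of each residue in a protein string."""
    counts = Counter(ch for ch in protein.upper() if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no recognised residues in sequence")
    return {a: counts[a] / total for a in alphabet}


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Terms with p == 0 contribute nothing; if q == 0 where p > 0 the
    divergence is infinite.
    """
    total = 0.0
    for symbol, ps in p.items():
        if ps == 0:
            continue
        qs = q.get(symbol, 0.0)
        if qs == 0:
            return math.inf
        total += ps * math.log2(ps / qs)
    return total


# Hypothetical reference: every residue equally likely.
UNIFORM = {a: 1 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
```

Under this sketch, `relative_entropy(frequencies(proteome), UNIFORM)` is zero when all residues are used equally and grows as usage becomes more skewed, which is one simple way to put a number on amino acid usage bias.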
2017-01-01
The consequences of selection at linked sites are multiple and widespread across the genomes of most species. Here, I first review the main concepts behind models of selection and linkage in recombining genomes, present the difficulty in parametrizing these models simply as a reduction in effective population size (Ne) and discuss the predicted impact of recombination rates on levels of diversity across genomes. Arguments are then put forward in favour of using a model of selection and linkage with neutral and deleterious mutations (i.e. the background selection model, BGS) as a sensible null hypothesis for investigating the presence of other forms of selection, such as balancing or positive. I also describe and compare two studies that have generated high-resolution landscapes of the predicted consequences of selection at linked sites in Drosophila melanogaster. Both studies show that BGS can explain a very large fraction of the observed variation in diversity across the whole genome, thus supporting its use as null model. Finally, I identify and discuss a number of caveats and challenges in studies of genetic hitchhiking that have been often overlooked, with several of them sharing a potential bias towards overestimating the evidence supporting recent selective sweeps to the detriment of a BGS explanation. One potential source of bias is the analysis of non-equilibrium populations: it is precisely because models of selection and linkage predict variation in Ne across chromosomes that demographic dynamics are not expected to be equivalent chromosome- or genome-wide. Other challenges include the use of incomplete genome annotations, the assumption of temporally stable recombination landscapes, the presence of genes under balancing selection and the consequences of ignoring non-crossover (gene conversion) recombination events. This article is part of the themed issue ‘Evolutionary causes and consequences of recombination rate variation in sexual organisms’. 
PMID:29109230
The Wigner distribution and 2D classical maps
NASA Astrophysics Data System (ADS)
Sakhr, Jamal
2017-07-01
The Wigner spacing distribution has a long and illustrious history in nuclear physics and in the quantum mechanics of classically chaotic systems. In this paper, a novel connection between the Wigner distribution and 2D classical mechanics is introduced. Based on a well-known correspondence between the Wigner distribution and the 2D Poisson point process, the hypothesis that typical pseudo-trajectories of a 2D ergodic map have a Wignerian nearest-neighbor spacing distribution (NNSD) is put forward and numerically tested. The standard Euclidean metric is used to compute the interpoint spacings. In all test cases, the hypothesis is upheld, and the range of validity of the hypothesis appears to be robust in the sense that it is not affected by the presence or absence of: (i) mixing; (ii) time-reversal symmetry; and/or (iii) dissipation.
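The correspondence mentioned above between the Wigner distribution and the 2D Poisson point process can be checked numerically with a small sketch. It draws uniform random points on the unit square (a stand-in for a Poisson process, not a pseudo-trajectory of an ergodic map), computes Euclidean nearest-neighbor spacings, rescales them to unit mean, and compares the empirical distribution with the Wigner cumulative distribution 1 - exp(-pi s^2 / 4). The periodic (torus) boundary, the sample size, and all function names are assumptions of this illustration.

```python
import math
import random


def sample_points(n, seed=0):
    """n uniform random points in the unit square (reproducible via seed)."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def nearest_neighbor_spacings(points):
    """Euclidean nearest-neighbor distance of each point, with periodic wrap."""
    spacings = []
    for i, (xi, yi) in enumerate(points):
        best = math.inf
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            dx = abs(xi - xj)
            dy = abs(yi - yj)
            dx = min(dx, 1.0 - dx)  # wrap around to reduce edge effects
            dy = min(dy, 1.0 - dy)
            d2 = dx * dx + dy * dy
            if d2 < best:
                best = d2
        spacings.append(math.sqrt(best))
    return spacings


def wigner_cdf(s):
    """Cumulative Wigner distribution for unit mean spacing."""
    return 1.0 - math.exp(-math.pi * s * s / 4.0)


def ks_distance_to_wigner(spacings):
    """Largest gap between the empirical CDF of rescaled spacings and Wigner."""
    mean = sum(spacings) / len(spacings)
    scaled = sorted(s / mean for s in spacings)
    n = len(scaled)
    worst = 0.0
    for k, s in enumerate(scaled):
        f = wigner_cdf(s)
        worst = max(worst, abs(f - k / n), abs(f - (k + 1) / n))
    return worst
```

Replacing `sample_points` with the iterates of a 2D map would give a test in the spirit of the one the abstract describes; a small value of `ks_distance_to_wigner` indicates spacings close to Wignerian.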
Klee, Kathrin; Ernst, Rebecca; Spannagl, Manuel; Mayer, Klaus F X
2007-08-30
Apollo, a genome annotation viewer and editor, has become a widely used genome annotation and visualization tool for distributed genome annotation projects. When using Apollo for annotation, database updates are carried out by uploading intermediate annotation files into the respective database. This non-direct database upload is laborious and evokes problems of data synchronicity. To overcome these limitations we extended the Apollo data adapter with a generic, configurable web service client that is able to retrieve annotation data in a GAME-XML-formatted string and pass it on to Apollo's internal input routine. This Apollo web service adapter, Apollo2Go, simplifies the data exchange in distributed projects and aims to render the annotation process more comfortable. The Apollo2Go software is freely available from ftp://ftpmips.gsf.de/plants/apollo_webservice.
Klee, Kathrin; Ernst, Rebecca; Spannagl, Manuel; Mayer, Klaus FX
2007-01-01
Background Apollo, a genome annotation viewer and editor, has become a widely used genome annotation and visualization tool for distributed genome annotation projects. When using Apollo for annotation, database updates are carried out by uploading intermediate annotation files into the respective database. This non-direct database upload is laborious and evokes problems of data synchronicity. Results To overcome these limitations we extended the Apollo data adapter with a generic, configurable web service client that is able to retrieve annotation data in a GAME-XML-formatted string and pass it on to Apollo's internal input routine. Conclusion This Apollo web service adapter, Apollo2Go, simplifies the data exchange in distributed projects and aims to render the annotation process more comfortable. The Apollo2Go software is freely available from . PMID:17760972
Question 7: Comparative Genomics and Early Cell Evolution: A Cautionary Methodological Note
NASA Astrophysics Data System (ADS)
Islas, Sara; Hernández-Morales, Ricardo; Lazcano, Antonio
2007-10-01
Inventories of the gene content of the last common ancestor (LCA), i.e., the cenancestor, include sequences that may have undergone horizontal transfer events, as well as sequences that have originated in different pre-cenancestral epochs. However, the universal distribution of highly conserved genes involved in RNA metabolism provides insights into early stages of cell evolution during which RNA played a much more conspicuous biological role, and is consistent with the hypothesis that extant living systems were preceded by an RNA/protein world. Insights into the traits of primitive entities from which the LCA evolved may be derived from the analysis of paralogous gene families, including those formed by sequences that resulted from internal elongation events. Three major types of paralogous gene families can be recognized. The importance of this grouping for understanding the traits of early cells is discussed.
Hughes, Austin L
2013-02-15
The hypothesis that domestication leads to a relaxation of purifying selection on mitochondrial (mt) genomes was tested by comparative analysis of mt genes from dog, pig, chicken, and silkworm. The three vertebrate species showed mt genome phylogenies in which domestic and wild isolates were intermingled, whereas the domestic silkworm (Bombyx mori) formed a distinct cluster nested within its closest wild relative (Bombyx mandarina). In spite of these differences in phylogenetic pattern, significantly greater proportions of nonsynonymous SNPs than of synonymous SNPs were unique to the domestic populations of all four species. Likewise, in all four species, significantly greater proportions of RNA-encoding SNPs than of synonymous SNPs were unique to the domestic populations. Thus, domestic populations were characterized by an excess of unique polymorphisms in two categories generally subject to purifying selection: nonsynonymous sites and RNA-encoding sites. Many of these unique polymorphisms thus seem likely to be slightly deleterious; the latter hypothesis was supported by the generally lower gene diversities of polymorphisms unique to domestic populations in comparison to those of polymorphisms shared by domestic and wild populations. Copyright © 2012 Elsevier B.V. All rights reserved.
Mitochondrial genomes of two Australian fishflies with an evolutionary timescale of Chauliodinae.
Yang, Fan; Jiang, Yunlan; Yang, Ding; Liu, Xingyue
2017-06-30
Fishflies (Corydalidae: Chauliodinae) with a total of ca. 130 extant species are one of the major groups of the holometabolous insect order Megaloptera. As a group which originated during the Mesozoic, the phylogeny and historical biogeography of fishflies are of high interest. The previous hypothesis on the evolutionary history of fishflies was based primarily on morphological data. To further test the existing phylogenetic relationships and to understand the divergence pattern of fishflies, we conducted a molecule-based study. We determined the complete mitochondrial (mt) genomes of two Australian fishfly species, Archichauliodes deceptor Kimmins, 1954 and Protochauliodes biconicus Kimmins, 1954, both members of a major subgroup of Chauliodinae with high phylogenetic significance. A phylogenomic analysis was carried out based on 13 mt protein coding genes (PCGs) and two rRNA genes from the megalopteran species with determined mt genomes. Both maximum likelihood and Bayesian inference analyses recovered the Dysmicohermes clade as the sister group of the Archichauliodes clade + the Protochauliodes clade, which is consistent with the previous morphology-based hypothesis. The divergence time estimation suggested that the divergence among the three major subgroups of fishflies occurred during the Late Jurassic and Early Cretaceous when the supercontinent Pangaea was undergoing sequential breakup.
Omics strategies for revealing Yersinia pestis virulence
Yang, Ruifu; Du, Zongmin; Han, Yanping; Zhou, Lei; Song, Yajun; Zhou, Dongsheng; Cui, Yujun
2012-01-01
Omics has remarkably changed the way we investigate and understand life. Omics differs from traditional hypothesis-driven research because it is a discovery-driven approach. Mass datasets produced from omics-based studies require experts from different fields to reveal the salient features behind these data. In this review, we summarize omics-driven studies to reveal the virulence features of Yersinia pestis through genomics, transcriptomics, proteomics, interactomics, etc. These studies serve as foundations for further hypothesis-driven research and help us gain insight into Y. pestis pathogenesis. PMID:23248778
Applications of Bayesian Statistics to Problems in Gamma-Ray Bursts
NASA Technical Reports Server (NTRS)
Meegan, Charles A.
1997-01-01
This presentation will describe two applications of Bayesian statistics to Gamma Ray Bursts (GRBs). The first attempts to quantify the evidence for a cosmological versus galactic origin of GRBs using only the observations of the dipole and quadrupole moments of the angular distribution of bursts. The cosmological hypothesis predicts isotropy, while the galactic hypothesis is assumed to produce a uniform probability distribution over positive values for these moments. The observed isotropic distribution indicates that the Bayes factor for the cosmological hypothesis over the galactic hypothesis is about 300. Another application of Bayesian statistics is in the estimation of chance associations of optical counterparts with galaxies. The Bayesian approach is preferred to frequentist techniques here because the Bayesian approach easily accounts for galaxy mass distributions and because one can incorporate three disjoint hypotheses: (1) bursts come from galactic centers, (2) bursts come from galaxies in proportion to luminosity, and (3) bursts do not come from external galaxies. This technique was used in the analysis of the optical counterpart to GRB970228.
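The Bayes-factor comparison described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the presentation: a single measured moment with Gaussian error `sigma`, an isotropic hypothesis that fixes the true moment at zero, and an alternative that spreads the true moment uniformly over `[0, upper]`. The function names are hypothetical.

```python
import math


def normal_pdf(x, mean, sigma):
    """Density of a normal distribution with the given mean and sigma."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bayes_factor_zero_vs_uniform(observed, sigma, upper):
    """Bayes factor for 'true moment = 0' over 'true moment ~ Uniform(0, upper)'.

    Assumed model (illustrative only): observed = true moment + Gaussian noise.
    """
    # Evidence under the point hypothesis: likelihood at a true moment of zero.
    evidence_zero = normal_pdf(observed, 0.0, sigma)
    # Evidence under the alternative: the likelihood averaged over the uniform
    # prior, (1/upper) * integral_0^upper N(observed; m, sigma) dm, in closed form.
    evidence_uniform = (
        normal_cdf(observed / sigma) - normal_cdf((observed - upper) / sigma)
    ) / upper
    return evidence_zero / evidence_uniform


if __name__ == "__main__":
    # Hypothetical numbers: a moment measured close to zero favours isotropy.
    print(bayes_factor_zero_vs_uniform(observed=0.0, sigma=0.1, upper=1.0))
```

A value above 1 favours the point (isotropic) hypothesis and a value below 1 favours the uniform alternative. The figure of about 300 quoted in the abstract comes from the actual burst data and is not reproduced by these made-up inputs.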
Karev, Georgy P; Wolf, Yuri I; Koonin, Eugene V
2003-10-12
The distributions of many genome-associated quantities, including the membership of paralogous gene families, can be approximated with power laws. We are interested in developing mathematical models of genome evolution that adequately account for the shape of these distributions and describe the evolutionary dynamics of their formation. We show that simple stochastic models of genome evolution lead to power-law asymptotics of protein domain family size distribution. These models, called Birth, Death and Innovation Models (BDIM), represent a special class of balanced birth-and-death processes, in which domain duplication and deletion rates are asymptotically equal up to the second order. The simplest, linear BDIM shows an excellent fit to the observed distributions of domain family size in diverse prokaryotic and eukaryotic genomes. However, the stochastic version of the linear BDIM explored here predicts that the actual size of large paralogous families is reached on an unrealistically long timescale. We show that introduction of non-linearity, which might be interpreted as interaction of a particular order between individual family members, allows the model to achieve genome evolution rates that are much more compatible with the current estimates of the rates of individual duplication/loss events.
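The power-law tail of a balanced linear birth-and-death model can be checked numerically with a short sketch. The parametrization below is an assumption, since the abstract does not spell it out: the birth rate in size class i is taken as rate*(i + a) and the death rate as rate*(i + b), with equal rate constants (the balanced case). The equilibrium then follows from detailed balance, f_{i+1} = f_i * birth_i / death_{i+1}, and the tail should decay roughly as i^(a - b - 1). Function names are hypothetical.

```python
import math


def linear_bdim_equilibrium(max_size, a, b, rate=1.0):
    """Normalized equilibrium frequencies f_1..f_max_size of a balanced linear model.

    Assumed rates (illustrative): birth_i = rate * (i + a), death_i = rate * (i + b).
    """
    freqs = [1.0]  # unnormalized f_1
    for i in range(1, max_size):
        birth = rate * (i + a)      # duplication rate in class i
        death = rate * (i + 1 + b)  # deletion rate in class i + 1
        freqs.append(freqs[-1] * birth / death)
    total = sum(freqs)
    return [f / total for f in freqs]


def loglog_slope(freqs, lo, hi):
    """Slope of log f_i against log i between family sizes lo and hi (1-based)."""
    rise = math.log(freqs[hi - 1]) - math.log(freqs[lo - 1])
    run = math.log(hi) - math.log(lo)
    return rise / run


if __name__ == "__main__":
    f = linear_bdim_equilibrium(5000, a=1.0, b=2.0)
    # With a = 1 and b = 2 the tail exponent a - b - 1 is -2.
    print(loglog_slope(f, 500, 5000))
```

This only addresses the shape of the stationary distribution. It says nothing about the timescale on which large families form, which is the problem the abstract raises for the linear model.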
Phylogenetic Distribution of CRISPR-Cas Systems in Antibiotic-Resistant Pseudomonas aeruginosa.
van Belkum, Alex; Soriaga, Leah B; LaFave, Matthew C; Akella, Srividya; Veyrieras, Jean-Baptiste; Barbu, E Magda; Shortridge, Dee; Blanc, Bernadette; Hannum, Gregory; Zambardi, Gilles; Miller, Kristofer; Enright, Mark C; Mugnier, Nathalie; Brami, Daniel; Schicklin, Stéphane; Felderman, Martina; Schwartz, Ariel S; Richardson, Toby H; Peterson, Todd C; Hubby, Bolyn; Cady, Kyle C
2015-11-24
Pseudomonas aeruginosa is an antibiotic-refractory pathogen with a large genome and extensive genotypic diversity. Historically, P. aeruginosa has been a major model system for understanding the molecular mechanisms underlying type I clustered regularly interspaced short palindromic repeat (CRISPR) and CRISPR-associated protein (CRISPR-Cas)-based bacterial immune system function. However, little information on the phylogenetic distribution and potential role of these CRISPR-Cas systems in molding the P. aeruginosa accessory genome and antibiotic resistance elements is known. Computational approaches were used to identify and characterize CRISPR-Cas systems within 672 genomes, and in the process, we identified a previously unreported and putatively mobile type I-C P. aeruginosa CRISPR-Cas system. Furthermore, genomes harboring noninhibited type I-F and I-E CRISPR-Cas systems were on average ~300 kb smaller than those without a CRISPR-Cas system. In silico analysis demonstrated that the accessory genome (n = 22,036 genes) harbored the majority of identified CRISPR-Cas targets. We also assembled a global spacer library that aided the identification of difficult-to-characterize mobile genetic elements within next-generation sequencing (NGS) data and allowed CRISPR typing of a majority of P. aeruginosa strains. In summary, our analysis demonstrated that CRISPR-Cas systems play an important role in shaping the accessory genomes of globally distributed P. aeruginosa isolates. P. aeruginosa is both an antibiotic-refractory pathogen and an important model system for type I CRISPR-Cas bacterial immune systems. By combining the genome sequences of 672 newly and previously sequenced genomes, we were able to provide a global view of the phylogenetic distribution, conservation, and potential targets of these systems. This analysis identified a new and putatively mobile P. aeruginosa CRISPR-Cas subtype, characterized the diverse distribution of known CRISPR-inhibiting genes, and provided a potential new use for CRISPR spacer libraries in accessory genome analysis. Our data demonstrated the importance of CRISPR-Cas systems in modulating the accessory genomes of globally distributed strains while also providing substantial data for subsequent genomic and experimental studies in multiple fields. Understanding why certain genotypes of P. aeruginosa are clinically prevalent and adept at horizontally acquiring virulence and antibiotic resistance elements is of major clinical and economic importance. Copyright © 2015 van Belkum et al.
Quanbeck, Stephanie M.; Brachova, Libuse; Campbell, Alexis A.; Guan, Xin; Perera, Ann; He, Kun; Rhee, Seung Y.; Bais, Preeti; Dickerson, Julie A.; Dixon, Philip; Wohlgemuth, Gert; Fiehn, Oliver; Barkan, Lenore; Lange, Iris; Lange, B. Markus; Lee, Insuk; Cortes, Diego; Salazar, Carolina; Shuman, Joel; Shulaev, Vladimir; Huhman, David V.; Sumner, Lloyd W.; Roth, Mary R.; Welti, Ruth; Ilarslan, Hilal; Wurtele, Eve S.; Nikolau, Basil J.
2012-01-01
Metabolomics is the methodology that identifies and measures global pools of small molecules (of less than about 1,000 Da) of a biological sample, which are collectively called the metabolome. Metabolomics can therefore reveal the metabolic outcome of a genetic or environmental perturbation of a metabolic regulatory network, and thus provide insights into the structure and regulation of that network. Because of the chemical complexity of the metabolome and limitations associated with individual analytical platforms for determining the metabolome, it is currently difficult to capture the complete metabolome of an organism or tissue, which is in contrast to genomics and transcriptomics. This paper describes the analysis of Arabidopsis metabolomics data sets acquired by a consortium that includes five analytical laboratories, bioinformaticists, and biostatisticians, which aims to develop and validate metabolomics as a hypothesis-generating functional genomics tool. The consortium is determining the metabolomes of Arabidopsis T-DNA mutant stocks, grown in a standardized controlled environment optimized to minimize environmental impacts on the metabolomes. Metabolomics data were generated with seven analytical platforms, and the combined data are being provided to the research community to formulate initial hypotheses about genes of unknown function (GUFs). A public database (www.PlantMetabolomics.org) has been developed to provide the scientific community with access to the data along with tools to allow for its interactive analysis. Exemplary datasets are discussed to validate the approach, which illustrate how initial hypotheses can be generated from the consortium-produced metabolomics data, integrated with prior knowledge to provide a testable hypothesis concerning the functionality of GUFs. PMID:22645570
Dombrovsky, Aviv; Luria, Neta
2013-04-01
In a survey that was conducted during the year 2011, a local strain of Aphid lethal paralysis virus (ALPV) was identified and isolated from a wild population of Aphis nerii aphids living on Nerium oleander plants located in northern Israel. The new strain was tentatively named ALPV-An. RNA extracted from the viral particles allowed the amplification and determination of the complete genome sequence. The virus genome is comprised of 9835 nucleotides. In a BLAST search analysis, the ALPV-An sequence showed 89 % nucleotide sequence identity with the whole genome of a South African ALPV and 96 and 94 % amino acid sequence identity with the ORF1 and ORF2 of that strain, respectively. In preliminary experiments, spray-applied, purified ALPV virions were highly pathogenic to the green peach aphid Myzus persicae; 95 % mortality was recorded 4 days post-infection. These preliminary results demonstrate the potential of ALPV for use as a biological control agent against some aphids. Surprisingly, no visible ALPV pathogenic effects, such as morphological changes or paralysis, were observed in the A. nerii aphids infected with ALPV-An. The absence of clear ALPV symptoms in A. nerii led to the formulation of two hypotheses, which were partially examined in this study. The first hypothesis suggests that A. nerii is resistant to or tolerant of ALPV, while the second hypothesis proposes that ALPV-An may be a mild strain of ALPV. Currently, our results favor the first hypothesis, since ALPV-An is cryptic in A. nerii aphids and can be lethal for M. persicae aphids.
NASA Astrophysics Data System (ADS)
Pereira, Sara B.; Mota, Rita; Vieira, Cristina P.; Vieira, Jorge; Tamagnini, Paula
2015-10-01
Many cyanobacteria produce extracellular polymeric substances (EPS) with particular characteristics (e.g. anionic nature and presence of sulfate) that make them suitable for industrial processes such as bioremediation of heavy metals or for use as thickening, suspending or emulsifying agents. Nevertheless, their biosynthetic pathway(s) are still largely unknown, limiting their utilization. In this work, a phylum-wide analysis of genes/proteins putatively involved in the assembly and export of EPS in cyanobacteria was performed. Our results demonstrated that most strains harbor genes encoding proteins related to the three main pathways: Wzy-, ABC transporter-, and Synthase-dependent, but often not the complete set defining one pathway. Multiple gene copies are mainly correlated to larger genomes, and the strains with reduced genomes (e.g. the clade of marine unicellular Synechococcus and Prochlorococcus) seem to have lost most of the EPS-related genes. Overall, the distribution of the different genes/proteins within the cyanobacteria phylum raises the hypothesis that cyanobacterial EPS production may not strictly follow one of the pathways previously characterized. Moreover, for the proteins involved in EPS polymerization, amino acid patterns were defined and validated, constituting a novel and robust tool to identify proteins with similar functions and giving a first insight into the polymer biosynthesis to which they are related.
Genome structure and primitive sex chromosome revealed in Populus
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tuskan, Gerald A; Yin, Tongming; Gunter, Lee E
We constructed a comprehensive genetic map for Populus and ordered 332 Mb of sequence scaffolds along the 19 haploid chromosomes in order to compare chromosomal regions among diverse members of the genus. These efforts led us to conclude that chromosome XIX in Populus is evolving into a sex chromosome. Consistent segregation distortion in favor of the sub-genera Tacamahaca alleles provided evidence of divergent selection among species, particularly at the proximal end of chromosome XIX. A large microsatellite marker (SSR) cluster was detected in the distorted region even though the genome-wide distribution of SSR sites was uniform across the physical map. The differences between the genetic map and physical sequence data suggested recombination suppression was occurring in the distorted region. A gender-determination locus and an overabundance of NBS-LRR genes were also co-located to the distorted region and were put forth as the cause for divergent selection and recombination suppression. This hypothesis was verified by using fine-scale mapping of an integrated scaffold in the vicinity of the gender-determination locus. As such it appears that chromosome XIX in Populus is in the process of evolving from an autosome into a sex chromosome and that NBS-LRR genes may play an important role in the chromosomal diversification process in Populus.
Comparison and correlation of Simple Sequence Repeats distribution in genomes of Brucella species
Kiran, Jangampalli Adi Pradeep; Chakravarthi, Veeraraghavulu Praveen; Kumar, Yellapu Nanda; Rekha, Somesula Swapna; Kruti, Srinivasan Shanthi; Bhaskar, Matcha
2011-01-01
Computational genomics is one of the important tools to understand the distribution of closely related genomes including simple sequence repeats (SSRs) in an organism, which gives valuable information regarding genetic variations. The central objective of the present study was to screen the SSRs distributed in coding and non-coding regions among different human Brucella species which are involved in a range of pathological disorders. Computational analysis of the SSRs in the Brucella indicates few deviations from expected random models. Statistical analysis also reveals that tri-nucleotide SSRs are overrepresented and tetranucleotide SSRs underrepresented in Brucella genomes. From the data, it can be suggested that overrepresented tri-nucleotide SSRs in genomic and coding regions might be responsible for the generation of functional variation of proteins expressed, which in turn may lead to different pathogenicity, virulence determinants, stress response genes, transcription regulators and host adaptation proteins of Brucella genomes. Abbreviations SSRs - Simple Sequence Repeats, ORFs - Open Reading Frames. PMID:21738309
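Screening a sequence for SSRs, as this abstract describes, can be sketched with a regular expression that looks for perfect tandem repeats. This is a minimal illustration and not the pipeline used in the study: the thresholds (`min_repeats`, `max_motif_len`), the function names, and the toy sequence are all assumptions.

```python
import re
from collections import Counter


def find_ssrs(sequence, min_repeats=3, max_motif_len=6):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Thresholds are illustrative assumptions, not values from the study.
    """
    sequence = sequence.upper()
    hits = []
    for motif_len in range(1, max_motif_len + 1):
        # A motif of motif_len bases followed by at least min_repeats - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ACAC"),
            # because they are already reported at the shorter motif length.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits


def count_by_motif_length(hits):
    """Tally SSRs by motif length (1 = mono-, 2 = di-, 3 = trinucleotide, ...)."""
    return Counter(len(motif) for _, motif, _ in hits)


if __name__ == "__main__":
    toy = "TT" + "ATG" * 4 + "CT" + "AC" * 5 + "GG"  # hypothetical sequence
    for start, motif, repeats in find_ssrs(toy):
        print(start, motif, repeats)
```

Comparing such counts with the counts expected under a random-sequence model is what would identify motif lengths as over- or underrepresented. That comparison is not shown here.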
NASA Astrophysics Data System (ADS)
Yu, Z. P.; Yue, Z. F.; Liu, W.
2018-05-01
With the development of artificial intelligence, more and more reliability experts have noticed the role of subjective information in the reliability design of complex systems. Therefore, based on a given amount of experimental data and expert judgments, we have divided reliability estimation based on a distribution hypothesis into a cognition process and a reliability calculation. To illustrate this modification, we have taken information fusion based on intuitional fuzzy belief functions as the diagnosis model of the cognition process, and completed the reliability estimation for the opening function of a cabin door affected by the imprecise judgment corresponding to the distribution hypothesis.
Speed congenics: accelerated genome recovery using genetic markers.
Visscher, P M
1999-08-01
Genetic markers throughout the genome can be used to speed up 'recovery' of the recipient genome in the backcrossing phase of the construction of a congenic strain. The prediction of the genomic proportion during backcrossing depends on the assumptions regarding the distribution of chromosome segments, the population structure, the marker spacing and the selection strategy. In this study simulation was used to investigate the rate of recovery of the recipient genome for a mouse, Drosophila and Arabidopsis genome. It was shown that an incorrect assumption of a binomial distribution of chromosome segments, and failing to take account of a reduction in variance in genomic proportion due to selection, can lead to a downward bias of up to two generations in the estimation of the number of generations required for the formation of a congenic strain.
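As a baseline for the marker-assisted case studied here, the expected recipient-genome fraction under plain backcrossing without any selection is 1 - (1/2)^(t+1) in backcross generation t, counting the F1 as t = 0. The sketch below only encodes that textbook expectation. It does not model chromosome segments, marker spacing, or selection, which are the factors the abstract says matter, and the function names are hypothetical.

```python
def expected_recipient_fraction(backcross_generation):
    """Expected recipient-genome fraction in backcross generation t (t = 0 is the F1).

    Assumes no marker selection: each backcross halves the expected donor share.
    """
    if backcross_generation < 0:
        raise ValueError("generation must be non-negative")
    return 1.0 - 0.5 ** (backcross_generation + 1)


def generations_to_reach(target_fraction):
    """Smallest backcross generation whose expected recipient fraction meets the target."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must lie strictly between 0 and 1")
    generation = 0
    while expected_recipient_fraction(generation) < target_fraction:
        generation += 1
    return generation


if __name__ == "__main__":
    for t in range(0, 8):
        print(t, expected_recipient_fraction(t))
    print(generations_to_reach(0.99))
```

Marker-assisted selection aims to reach a given fraction in fewer generations than this baseline. How many fewer depends on the assumptions the study examines by simulation.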
Advances in Significance Testing for Cluster Detection
NASA Astrophysics Data System (ADS)
Coleman, Deidra Andrea
Over the past two decades, much attention has been given to data driven project goals such as the Human Genome Project and the development of syndromic surveillance systems. A major component of these types of projects is analyzing the abundance of data. Detecting clusters within the data can be beneficial as it can lead to the identification of specified sequences of DNA nucleotides that are related to important biological functions or the locations of epidemics such as disease outbreaks or bioterrorism attacks. Cluster detection techniques require efficient and accurate hypothesis testing procedures. In this dissertation, we improve upon the hypothesis testing procedures for cluster detection by enhancing distributional theory and providing an alternative method for spatial cluster detection using syndromic surveillance data. In Chapter 2, we provide an efficient method to compute the exact distribution of the number and coverage of h-clumps of a collection of words. This method involves defining a Markov chain using a minimal deterministic automaton to reduce the number of states needed for computation. We allow words of the collection to contain other words of the collection making the method more general. We use our method to compute the distributions of the number and coverage of h-clumps in the Chi motif of H. influenzae. In Chapter 3, we provide an efficient algorithm to compute the exact distribution of multiple window discrete scan statistics for higher-order, multi-state Markovian sequences. This algorithm involves defining a Markov chain to efficiently keep track of probabilities needed to compute p-values of the statistic. We use our algorithm to identify cases where the available approximation does not perform well. We also use our algorithm to detect unusual clusters of made free throw shots by National Basketball Association players during the 2009-2010 regular season.
In Chapter 4, we give a procedure to detect outbreaks using syndromic surveillance data while controlling the Bayesian False Discovery Rate (BFDR). The procedure entails choosing an appropriate Bayesian model that captures the spatial dependency inherent in epidemiological data and considers all days of interest, selecting a test statistic based on a chosen measure that provides the magnitude of the maximal spatial cluster for each day, and identifying a cutoff value that controls the BFDR for rejecting the collective null hypothesis of no outbreak over a collection of days for a specified region. We use our procedure to analyze botulism-like syndrome data collected by the North Carolina Disease Event Tracking and Epidemiologic Collection Tool (NC DETECT).
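The idea behind the exact scan-statistic computation of Chapter 3 can be shown in its simplest setting. The sketch below is not the dissertation's algorithm: it assumes independent Bernoulli trials, a single window, and a plain bitmask state that remembers the last w - 1 outcomes, whereas the dissertation treats higher-order multi-state Markov sequences and multiple windows. Function and parameter names are hypothetical.

```python
def scan_statistic_tail(n, w, k, p):
    """Exact P(some window of w consecutive trials contains at least k successes).

    Assumes n independent Bernoulli(p) trials (an illustrative simplification).
    """
    if not (1 <= k <= w <= n):
        raise ValueError("need 1 <= k <= w <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    mask = (1 << (w - 1)) - 1  # keeps only the most recent w - 1 outcomes
    # alive[state] = probability of being in this state with no window at k yet.
    alive = {0: 1.0}
    for _ in range(n):
        updated = {}
        for state, prob in alive.items():
            for outcome, weight in ((1, p), (0, 1.0 - p)):
                window = (state << 1) | outcome  # last w outcomes, zero-padded early on
                if bin(window).count("1") >= k:
                    continue  # threshold reached: this path leaves the 'alive' set
                new_state = window & mask
                updated[new_state] = updated.get(new_state, 0.0) + prob * weight
        alive = updated
    return 1.0 - sum(alive.values())


if __name__ == "__main__":
    # Hypothetical shooter: 75% success rate, 40 attempts, any 10 in a row made.
    print(scan_statistic_tail(n=40, w=10, k=10, p=0.75))
```

The state space grows as 2^(w - 1), so this direct version is only practical for short windows. Reducing the number of states is the kind of efficiency gain the dissertation's Markov chain constructions are aimed at.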
Genomic islands of divergence are not affected by geography of speciation in sunflowers.
Renaut, S; Grassa, C J; Yeaman, S; Moyers, B T; Lai, Z; Kane, N C; Bowers, J E; Burke, J M; Rieseberg, L H
2013-01-01
Genomic studies of speciation often report the presence of highly differentiated genomic regions interspersed within a milieu of weakly diverged loci. The formation of these speciation islands is generally attributed to reduced inter-population gene flow near loci under divergent selection, but few studies have critically evaluated this hypothesis. Here, we report on transcriptome scans among four recently diverged pairs of sunflower (Helianthus) species that vary in the geographical context of speciation. We find that genetic divergence is lower in sympatric and parapatric comparisons, consistent with a role for gene flow in eroding neutral differences. However, genomic islands of divergence are numerous and small in all comparisons, and contrary to expectations, island number and size are not significantly affected by levels of interspecific gene flow. Rather, island formation is strongly associated with reduced recombination rates. Overall, our results indicate that the functional architecture of genomes plays a larger role in shaping genomic divergence than does the geography of speciation.
2013-09-01
DATES COVERED (From - To) 22 August 2012 – 21 August 2013 4. TITLE AND SUBTITLE Identification of Novel, Inherited Genetic Markers for Aggressive... Inherited markers of aggressive PCa could be used for screening and diagnosis of aggressive PCa at an early stage while reducing over-diagnosis and...treatment for others. The overall hypothesis is that inherited sequence variants in the genome are associated with a lethal (aggressive) form of PCa but not
Leslie, James F.; Vrijenhoek, Robert C.
1978-01-01
Theoretical considerations suggest that a high load of deleterious mutations should accumulate in asexual genomes. An ideal system for testing this hypothesis occurs in the hybrid all-female fish Poeciliopsis monacha-lucida. The hybrid genotype is retained between generations by an oogenetic process that transmits only a nonrecombinant haploid monacha genome to their ova. The hybrid genotype is re-established in nature by fertilization of these monacha eggs with sperm from a sexual species, P. lucida. The unique reproductive mechanism of these hybrids allows the genetic dissection of the clonal monacha genome by forced matings with males of P. monacha. The resultant F1 hybrids and their backcross progeny were examined to determine the amount and kinds of genetic changes that might have occurred in two clonal monacha genomes.—Using six allozyme markers, four similar linkage groups were identified in each clonal genome. Segregation and assortment at these loci revealed no apparent differences between monacha genomes from sexually and clonally reproducing species. Mortality of F1 and backcross progeny revealed differences between the two clonal genomes, suggesting that deleterious genes may accumulate in genomes sheltered from recombination. PMID:17248875
Collevatti, Rosane Garcia; de Castro, Thaís Guimarães; de Souza Lima, Jacqueline; de Campos Telles, Mariana Pires
2012-01-01
Many endemic species present disjunct geographical distribution; therefore, they are suitable models to test hypotheses about the ecological and evolutionary mechanisms involved in the origin of disjunct distributions in these habitats. We studied the genetic structure and phylogeography of Tibouchina papyrus (Melastomataceae), endemic to rocky savannas in Central Brazil, to test hypotheses of vicariance and dispersal in the origin of the disjunct geographical distribution. We sampled 474 individuals from the three localities where the species is reported: Serra dos Pirineus, Serra Dourada, and Serra de Natividade. Analyses were based on the polymorphisms at cpDNA and on nuclear microsatellite loci. To test for vicariance and dispersal we constructed a median-joining network and performed an analysis of molecular variance (AMOVA). We also tested population bottleneck and estimated demographic parameters and time to most recent common ancestor (TMRCA) using coalescent analyses. A remarkable differentiation among populations was found. No significant effect of population expansion was detected and coalescent analyses showed a negligible gene flow among populations and an ancient coalescence time for chloroplast genome. Our results support that the disjunct distribution of T. papyrus may represent a climatic relict. With an estimated TMRCA dated from ∼836.491 ± 107.515 kyr BP (before present), we hypothesized that the disjunct distribution may be the outcome of bidirectional expansion of the geographical distribution favored by the drier and colder conditions that prevailed in much of Brazil during the Pre-Illinoian glaciation, followed by the retraction as the climate became warmer and moister. PMID:22837846
Assis, J; Serrão, E A; Claro, B; Perrin, C; Pearson, G A
2014-06-01
The climate-driven dynamics of species ranges is a critical research question in evolutionary ecology. We ask whether present intraspecific diversity is determined by the imprint of past climate. This is an ongoing debate requiring interdisciplinary examination of population genetic pools and persistence patterns across global ranges. Previously, contrasting inferences and predictions have resulted from distinct genomic coverage and/or geographical information. We aim to describe and explain the causes of geographical contrasts in genetic diversity and their consequences for the future baseline of the global genetic pool, by comparing present geographical distribution of genetic diversity and differentiation with predictive species distribution modelling (SDM) during past extremes, present time and future climate scenarios for a brown alga, Fucus vesiculosus. SDM showed that both atmospheric and oceanic variables shape the global distribution of intertidal species, revealing regions of persistence, extinction and expansion during glacial and postglacial periods. These explained the distribution and structure of present genetic diversity, consisting of differentiated genetic pools with maximal diversity in areas of long-term persistence. Most of the present species range comprises postglacial expansion zones and, in contrast to highly dispersive marine organisms, expansions involved only local fronts, leaving distinct genetic pools at rear edges. Besides unravelling a complex phylogeographical history and showing congruence between genetic diversity and persistent distribution zones, supporting the hypothesis of niche conservatism, range shifts and loss of unique genetic diversity at the rear edge were predicted for future climate scenarios, impoverishing the global gene pool. © 2014 John Wiley & Sons Ltd.
Carnivore-specific SINEs (Can-SINEs): distribution, evolution, and genomic impact.
Walters-Conte, Kathryn B; Johnson, Diana L E; Allard, Marc W; Pecon-Slattery, Jill
2011-01-01
Short interspersed nuclear elements (SINEs) are a type of class 1 transposable element (retrotransposon) with features that allow investigators to resolve evolutionary relationships between populations and species while providing insight into genome composition and function. Characterization of a Carnivora-specific SINE family, Can-SINEs, has aided comparative genomic studies by providing rare genomic changes and neutral sequence variants often needed to resolve difficult evolutionary questions. In addition, Can-SINEs constitute a significant source of functional diversity within Carnivora. Publication of the whole-genome sequence of domestic dog, domestic cat, and giant panda serves as a valuable resource in comparative genomic inferences gleaned from Can-SINEs. In anticipation of forthcoming studies bolstered by new genomic data, this review describes the discovery and characterization of Can-SINE motifs as well as their composition, distribution, and effect on genome function. As the contribution of noncoding sequences to genomic diversity becomes more apparent, SINEs and other transposable elements will play an increasingly large role in mammalian comparative genomics.
Carnivore-Specific SINEs (Can-SINEs): Distribution, Evolution, and Genomic Impact
Johnson, Diana L.E.; Allard, Marc W.; Pecon-Slattery, Jill
2011-01-01
Short interspersed nuclear elements (SINEs) are a type of class 1 transposable element (retrotransposon) with features that allow investigators to resolve evolutionary relationships between populations and species while providing insight into genome composition and function. Characterization of a Carnivora-specific SINE family, Can-SINEs, has aided comparative genomic studies by providing rare genomic changes and neutral sequence variants often needed to resolve difficult evolutionary questions. In addition, Can-SINEs constitute a significant source of functional diversity within Carnivora. Publication of the whole-genome sequence of domestic dog, domestic cat, and giant panda serves as a valuable resource in comparative genomic inferences gleaned from Can-SINEs. In anticipation of forthcoming studies bolstered by new genomic data, this review describes the discovery and characterization of Can-SINE motifs as well as their composition, distribution, and effect on genome function. As the contribution of noncoding sequences to genomic diversity becomes more apparent, SINEs and other transposable elements will play an increasingly large role in mammalian comparative genomics. PMID:21846743
Marko, Nicholas F.; Weil, Robert J.
2012-01-01
Introduction Gene expression data is often assumed to be normally-distributed, but this assumption has not been tested rigorously. We investigate the distribution of expression data in human cancer genomes and study the implications of deviations from the normal distribution for translational molecular oncology research. Methods We conducted a central moments analysis of five cancer genomes and performed empiric distribution fitting to examine the true distribution of expression data both on the complete-experiment and on the individual-gene levels. We used a variety of parametric and nonparametric methods to test the effects of deviations from normality on gene calling, functional annotation, and prospective molecular classification using a sixth cancer genome. Results Central moments analyses reveal statistically-significant deviations from normality in all of the analyzed cancer genomes. We observe as much as 37% variability in gene calling, 39% variability in functional annotation, and 30% variability in prospective, molecular tumor subclassification associated with this effect. Conclusions Cancer gene expression profiles are not normally-distributed, either on the complete-experiment or on the individual-gene level. Instead, they exhibit complex, heavy-tailed distributions characterized by statistically-significant skewness and kurtosis. The non-Gaussian distribution of this data affects identification of differentially-expressed genes, functional annotation, and prospective molecular classification. These effects may be reduced in some circumstances, although not completely eliminated, by using nonparametric analytics. This analysis highlights two unreliable assumptions of translational cancer gene expression analysis: that “small” departures from normality in the expression data distributions are analytically-insignificant and that “robust” gene-calling algorithms can fully compensate for these effects. PMID:23118863
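The central moments analysis mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and the simulated "expression" values are assumptions made for illustration, and only the standard moment-based definitions of skewness and excess kurtosis are used.

```python
import math
import random


def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardized moment; 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3; 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Simulated data (hypothetical, not from the study): one roughly normal
# sample and one heavy-tailed (log-normal) sample.
rng = random.Random(0)
normal_like = [rng.gauss(8.0, 1.0) for _ in range(5000)]
heavy_tailed = [math.exp(rng.gauss(2.0, 0.75)) for _ in range(5000)]

for name, sample in (("normal-like", normal_like), ("heavy-tailed", heavy_tailed)):
    print(name, round(skewness(sample), 2), round(excess_kurtosis(sample), 2))
```

Skewness and excess kurtosis far from zero, as in the heavy-tailed sample, are the kind of deviation from normality the abstract reports; a formal significance test would be needed on top of these descriptive values.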
Next generation tools for genomic data generation, distribution, and visualization
2010-01-01
Background With the rapidly falling cost and availability of high throughput sequencing and microarray technologies, the bottleneck for effectively using genomic analysis in the laboratory and clinic is shifting to one of effectively managing, analyzing, and sharing genomic data. Results Here we present three open-source, platform independent, software tools for generating, analyzing, distributing, and visualizing genomic data. These include a next generation sequencing/microarray LIMS and analysis project center (GNomEx); an application for annotating and programmatically distributing genomic data using the community vetted DAS/2 data exchange protocol (GenoPub); and a standalone Java Swing application (GWrap) that makes cutting edge command line analysis tools available to those who prefer graphical user interfaces. Both GNomEx and GenoPub use the rich client Flex/Flash web browser interface to interact with Java classes and a relational database on a remote server. Both employ a public-private user-group security model enabling controlled distribution of patient and unpublished data alongside public resources. As such, they function as genomic data repositories that can be accessed manually or programmatically through DAS/2-enabled client applications such as the Integrated Genome Browser. Conclusions These tools have gained wide use in our core facilities, research laboratories and clinics and are freely available for non-profit use. See http://sourceforge.net/projects/gnomex/, http://sourceforge.net/projects/genoviz/, and http://sourceforge.net/projects/useq. PMID:20828407
RELATIONSHIP BETWEEN PHYLOGENETIC DISTRIBUTION AND GENOMIC FEATURES IN NEUROSPORA CRASSA
USDA-ARS's Scientific Manuscript database
In the post-genome era, insufficient functional annotation of predicted genes greatly restricts the potential of mining genome data. We demonstrate that an evolutionary approach, which is independent of functional annotation, has great potential as a tool for genome analysis. We chose the genome o...
Delta: a new web-based 3D genome visualization and analysis platform.
Tang, Bixia; Li, Feifei; Li, Jing; Zhao, Wenming; Zhang, Zhihua
2018-04-15
Delta is an integrative visualization and analysis platform to facilitate visually annotating and exploring the 3D physical architecture of genomes. Delta takes Hi-C or ChIA-PET contact matrix as input and predicts the topologically associating domains and chromatin loops in the genome. It then generates a physical 3D model which represents the plausible consensus 3D structure of the genome. Delta features a highly interactive visualization tool which enhances the integration of genome topology/physical structure with extensive genome annotation by juxtaposing the 3D model with diverse genomic assay outputs. Finally, by visually comparing the 3D model of the β-globin gene locus and its annotation, we speculated a plausible transitory interaction pattern in the locus. Experimental evidence was found to support this speculation by literature survey. This served as an example of intuitive hypothesis testing with the help of Delta. Delta is freely accessible from http://delta.big.ac.cn, and the source code is available at https://github.com/zhangzhwlab/delta. Contact: zhangzhihua@big.ac.cn. Supplementary data are available at Bioinformatics online.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tai, Vera; Carpenter, Kevin J.; Weber, Peter K.
Tai, Vera; Carpenter, Kevin J.; Weber, Peter K.; ...
2016-05-27
By combining genomics and isotope imaging analysis using high-resolution secondary ion mass spectrometry (NanoSIMS), we examined the function and evolution of Bacteroidales ectosymbionts of the protist Barbulanympha from the hindguts of the wood-eating cockroach Cryptocercus punctulatus. In particular, we investigated the structure of ectosymbiont genomes, which, in contrast to those of endosymbionts, has been little studied to date, and tested the hypothesis that these ectosymbionts fix nitrogen. Unlike with most obligate endosymbionts, genome reduction has not played a major role in the evolution of the Barbulanympha ectosymbionts. Instead, interaction with the external environment has remained important for this symbiont as genes for synthesis of transporters, outer membrane proteins, lipopolysaccharides, and lipoproteins have been retained. The ectosymbiont genome carried two complete operons for nitrogen fixation, a urea transporter, and a urease, indicating the availability of nitrogen as a driving force behind the symbiosis. NanoSIMS analysis of C. punctulatus hindgut symbionts exposed in vivo to 15N2 supports the hypothesis that Barbulanympha ectosymbionts are capable of nitrogen fixation. This genomic and in vivo functional investigation of protist ectosymbionts highlights the diversity of evolutionary forces and trajectories that shape symbiotic interactions. The ecological and evolutionary importance of symbioses is increasingly clear, but the overall diversity of symbiotic interactions remains poorly explored. In this study, we investigated the evolution and nitrogen fixation capabilities of ectosymbionts attached to the protist Barbulanympha from the hindgut of the wood-eating cockroach Cryptocercus punctulatus. In addressing genome evolution of protist ectosymbionts, our data suggest that the ecological pressures influencing the evolution of extracellular symbionts clearly differ from intracellular symbionts and organelles.
Using NanoSIMS analysis, we also obtained direct imaging evidence of a specific hindgut microbe playing a role in nitrogen fixation. These results demonstrate the power of combining NanoSIMS and genomics tools for investigating the biology of uncultivable microbes. This investigation paves the way for a more precise understanding of microbial interactions in the hindguts of wood-eating insects and further exploration of the diversity and ecological significance of symbiosis between microbes.
Flux balance analysis predicts Warburg-like effects of mouse hepatocyte deficient in miR-122a
Wu, Hsuan-Hui; Chen, Meng-Chun; Liu, Wen-Huan; Wu, Wu-Hsiung; Chang, Peter Mu-Hsin; Huang, Chi-Ying F.; Tsou, Ann-Ping; Shiao, Ming-Shi
2017-01-01
The liver is a vital organ involved in various major metabolic functions in the human body. MicroRNA-122 (miR-122) plays an important role in the regulation of liver metabolism, but its intrinsic physiological functions require further clarification. This study integrated the genome-scale metabolic model of hepatocytes and mouse experimental data with germline deletion of Mir122a (Mir122a–/–) to infer Warburg-like effects. Elevated expression of MiR-122a target genes in Mir122a–/– mice, especially those encoding metabolic enzymes, was applied to analyze the flux distributions of the genome-scale metabolic model in normal and deficient states. Using a defined similarity ratio, we compared the flux fold changes from the genome-scale metabolic model computations with metabolomic profiling data measured by liquid chromatography with mass spectrometry for hepatocytes of 2-month-old mice in normal and deficient states. The Ddc gene demonstrated the highest similarity ratio: 95% to the biological hypothesis of the Warburg effect and 75% to the experimental observation. We also used liver cells from mir-122 knockout mice at 2, 6, and 11 months to examine the expression pattern of DDC, which showed upregulated DDC profiles in the knockout livers. Furthermore, a bioinformatics (LINCS program) prediction indicated that BTK inhibitors and withaferin A could downregulate DDC expression, suggesting that such drugs could potentially alter the early metabolomic events of liver cancer cells. PMID:28686599
Rankinen, Tuomo; Sarzynski, Mark A.; Ghosh, Sujoy; Bouchard, Claude
2015-01-01
Clustering of obesity, coronary artery disease, and cardiovascular disease risk factors is observed in epidemiological studies and clinical settings. Twin and family studies have provided some supporting evidence for the clustering hypothesis. Loci nearest a lead single nucleotide polymorphism (SNP) showing genome-wide significant associations with coronary artery disease, body mass index, C-reactive protein, blood pressure, lipids, and type 2 diabetes mellitus were selected for pathway and network analyses. Eighty-seven autosomal regions (181 SNPs), mapping to 56 genes, were found to be pleiotropic. Most pleiotropic regions contained genes associated with coronary artery disease and plasma lipids, whereas some exhibited coaggregation between obesity and cardiovascular disease risk factors. We observed enrichment for liver X receptor (LXR)/retinoid X receptor (RXR) and farnesoid X receptor/RXR nuclear receptor signaling among pleiotropic genes and for signatures of coronary artery disease and hepatic steatosis. In the search for functionally interacting networks, we found that 43 pleiotropic genes were interacting in a network with an additional 24 linker genes. ENCODE (Encyclopedia of DNA Elements) data were queried for distribution of pleiotropic SNPs among regulatory elements and coding sequence variations. Of the 181 SNPs, 136 were annotated to ≥1 regulatory feature. An enrichment analysis found over-representation of enhancers and DNAse hypersensitive regions when compared against all SNPs of the 1000 Genomes pilot project. In summary, there are genomic regions exerting pleiotropic effects on cardiovascular disease risk factors, although only a few included obesity. Further studies are needed to resolve the clustering in terms of DNA variants, genes, pathways, and actionable targets. PMID:25722444
Purdue Ionomics Information Management System. An Integrated Functional Genomics Platform
Baxter, Ivan; Ouzzani, Mourad; Orcun, Seza; Kennedy, Brad; Jandhyala, Shrinivas S.; Salt, David E.
2007-01-01
The advent of high-throughput phenotyping technologies has created a deluge of information that is difficult to deal with without the appropriate data management tools. These data management tools should integrate defined workflow controls for genomic-scale data acquisition and validation, data storage and retrieval, and data analysis, indexed around the genomic information of the organism of interest. To maximize the impact of these large datasets, it is critical that they are rapidly disseminated to the broader research community, allowing open access for data mining and discovery. We describe here a system that incorporates such functionalities developed around the Purdue University high-throughput ionomics phenotyping platform. The Purdue Ionomics Information Management System (PiiMS) provides integrated workflow control, data storage, and analysis to facilitate high-throughput data acquisition, along with integrated tools for data search, retrieval, and visualization for hypothesis development. PiiMS is deployed as a World Wide Web-enabled system, allowing for integration of distributed workflow processes and open access to raw data for analysis by numerous laboratories. PiiMS currently contains data on shoot concentrations of P, Ca, K, Mg, Cu, Fe, Zn, Mn, Co, Ni, B, Se, Mo, Na, As, and Cd in over 60,000 shoot tissue samples of Arabidopsis (Arabidopsis thaliana), including ethyl methanesulfonate, fast-neutron and defined T-DNA mutants, and natural accession and populations of recombinant inbred lines from over 800 separate experiments, representing over 1,000,000 fully quantitative elemental concentrations. PiiMS is accessible at www.purdue.edu/dp/ionomics. PMID:17189337
Mackiewicz, Dorota; de Oliveira, Paulo Murilo Castro; Moss de Oliveira, Suzana; Cebrat, Stanisław
2013-01-01
Recombination is the main cause of genetic diversity. Thus, errors in this process can lead to chromosomal abnormalities. Recombination events are confined to narrow chromosome regions called hotspots in which characteristic DNA motifs are found. Genomic analyses have shown that both recombination hotspots and DNA motifs are distributed unevenly along human chromosomes and are much more frequent in the subtelomeric regions of chromosomes than in their central parts. Clusters of motifs roughly follow the distribution of recombination hotspots whereas single motifs show a negative correlation with the hotspot distribution. To model the phenomena related to recombination, we carried out computer Monte Carlo simulations of genome evolution. Computer simulations generated uneven distribution of hotspots with their domination in the subtelomeric regions of chromosomes. They also revealed that purifying selection eliminating defective alleles is strong enough to cause such hotspot distribution. After sufficiently long time of simulations, the structure of chromosomes reached a dynamic equilibrium, in which number and global distribution of both hotspots and defective alleles remained statistically unchanged, while their precise positions were shifted. This resembles the dynamic structure of human and chimpanzee genomes, where hotspots change their exact locations but the global distributions of recombination events are very similar. PMID:23776462
Use of Pearson's Chi-Square for Testing Equality of Percentile Profiles across Multiple Populations.
Johnson, William D; Beyl, Robbie A; Burton, Jeffrey H; Johnson, Callie M; Romer, Jacob E; Zhang, Lei
2015-08-01
In large sample studies where distributions may be skewed and not readily transformed to symmetry, it may be of greater interest to compare different distributions in terms of percentiles rather than means. For example, it may be more informative to compare two or more populations with respect to their within population distributions by testing the hypothesis that their corresponding respective 10th, 50th, and 90th percentiles are equal. As a generalization of the median test, the proposed test statistic is asymptotically distributed as Chi-square with degrees of freedom dependent upon the number of percentiles tested and constraints of the null hypothesis. Results from simulation studies are used to validate the nominal 0.05 significance level under the null hypothesis, and asymptotic power properties that are suitable for testing equality of percentile profiles against selected profile discrepancies for a variety of underlying distributions. A pragmatic example is provided to illustrate the comparison of the percentile profiles for four body mass index distributions.
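The idea of generalizing the median test to several percentiles can be sketched in Python as follows. This is a plain contingency-table version written under stated assumptions, not the authors' exact statistic: pooled nearest-rank percentiles serve as cut points, each population's observations are counted in the resulting bins, and the usual Pearson Chi-square is computed with (populations - 1) x (non-empty bins - 1) degrees of freedom. All function names are hypothetical.

```python
import bisect
import math


def nearest_rank_percentile(sorted_values, p):
    """Nearest-rank percentile (p in 0..100) of an already sorted list."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def percentile_profile_chi2(samples, percentiles=(10, 50, 90)):
    """Pearson Chi-square on counts binned by pooled percentile cut points.

    Returns (statistic, degrees_of_freedom).
    """
    pooled = sorted(v for sample in samples for v in sample)
    cuts = [nearest_rank_percentile(pooled, p) for p in percentiles]

    # Rows are populations; columns are the bins delimited by the cut points.
    table = []
    for sample in samples:
        row = [0] * (len(cuts) + 1)
        for v in sample:
            row[bisect.bisect_left(cuts, v)] += 1  # bin 0 holds v <= first cut
        table.append(row)

    total = len(pooled)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # empty bins (e.g. from heavy ties) are skipped
                statistic += (observed - expected) ** 2 / expected

    non_empty_cols = sum(1 for c in col_totals if c > 0)
    dof = (len(samples) - 1) * (non_empty_cols - 1)
    return statistic, dof


# Two made-up samples: the second is the first shifted upward by 50 units.
stat, dof = percentile_profile_chi2([list(range(100)), [v + 50 for v in range(100)]])
print(stat, dof)
```

The statistic would then be compared against a Chi-square distribution with `dof` degrees of freedom; the paper's constraints on the null hypothesis may change that count.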
J.S. (Pat) Heslop-Harrison; Andrea Brandes; Shin Taketa; Thomas Schmidt; Alexander V. Vershinin; Elena G. Alkhimova; Anette Kamm; Robert L. Doudrick; [and others]
1997-01-01
Retrotransposons make up a major fraction - sometimes more than 40% - of all plant genomes investigated so far. We have isolated the reverse transcriptase domains of the Ty1-copia group elements from several species, ranging in genome size from some 100 Mbp to 23,000 Mbp, and determined the distribution patterns of these retrotransposons on metaphase chromosomes and...
Sun, Gao-Fei; He, Shou-Pu; Du, Xiong-Ming
2013-10-01
Cotton genomic studies have boomed since the release of the Gossypium raimondii draft genome. In this study, cis-regulatory elements (CREs) in the 1 kb sequence upstream of the 5' UTR of annotated genes were selected and scanned in the Arabidopsis thaliana (At) and Gossypium raimondii (Gr) genomes, based on the PLACE database (Plant cis-acting Regulatory DNA Elements). According to the definition used in this study, 44 (12.3%) and 57 (15.5%) CREs presented a "peak-like" distribution in the selected 1 kb sequences of the two genomes, respectively. Thirty-four of them were peak-like distributed in both genomes and could be further categorized into 4 types based on their core sequences. The coincidence of the TATABOX peak position with its actual position (approximately -30 bp) indicated that the position of a common CRE was conserved across different genes, which suggested that the peak positions of these CREs might be their actual positions for transcription factor binding. The position of a common CRE also differed between the two genomes owing to stronger length variation of the 5' UTR in Gr than in At. Furthermore, most of the peak-like CREs were located in the region from -110 bp to 0 bp, which suggested that a concentrated distribution might be conducive to the interaction of transcription factors and thereby regulate downstream gene expression.
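The positional scan described in this abstract can be illustrated with a short Python sketch. It only shows the general idea of locating a motif in upstream windows and binning the positions; the motif string `TATAAA`, the synthetic sequences, the bin size, and all function names are assumptions for illustration, and the study's own definition of a "peak-like" distribution is not reproduced.

```python
from collections import Counter


def motif_positions(upstream, motif):
    """Start positions of every motif occurrence, counted from the 3' end
    of the upstream window (the last base is position -1)."""
    positions = []
    start = upstream.find(motif)
    while start != -1:
        positions.append(start - len(upstream))
        start = upstream.find(motif, start + 1)  # allow overlapping hits
    return positions


def position_histogram(sequences, motif, bin_size=10):
    """Count motif start positions per bin; keys are the lower bin edges."""
    counts = Counter()
    for seq in sequences:
        for pos in motif_positions(seq, motif):
            counts[(pos // bin_size) * bin_size] += 1
    return counts


# Synthetic 100 bp upstream windows (hypothetical data): three carry the
# motif starting 30 bp before the window's 3' end, one carries it elsewhere.
with_peak = "G" * 70 + "TATAAA" + "G" * 24
elsewhere = "G" * 10 + "TATAAA" + "G" * 84
histogram = position_histogram([with_peak, with_peak, with_peak, elsewhere], "TATAAA")
peak_bin = max(histogram, key=histogram.get)
print(peak_bin, histogram[peak_bin])
```

A real analysis would use the annotated upstream sequences of each genome and the PLACE motif set, and would need a statistical criterion for deciding that a bin constitutes a peak.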
Reddy, Umesh K.; Nimmakayala, Padma; Abburi, Venkata Lakshmi; Reddy, C. V. C. M.; Saminathan, Thangasamy; Percy, Richard G.; Yu, John Z.; Frelichowski, James; Udall, Joshua A.; Page, Justin T.; Zhang, Dong; Shehzad, Tariq; Paterson, Andrew H.
2017-01-01
Use of 10,129 singleton SNPs of known genomic location in tetraploid cotton provided unique opportunities to characterize genome-wide diversity among 440 Gossypium hirsutum and 219 G. barbadense cultivars and landrace accessions of widespread origin. Using the SNPs distributed genome-wide, we examined genetic diversity, haplotype distribution and linkage disequilibrium patterns in the G. hirsutum and G. barbadense genomes to clarify population demographic history. Diversity and identity-by-state analyses have revealed little sharing of alleles between the two cultivated allotetraploid genomes, with a few exceptions that indicated sporadic gene flow. We found a high number of new alleles, representing increased nucleotide diversity, on chromosomes 1 and 2 in cultivated G. hirsutum as compared with low nucleotide diversity on these chromosomes in landrace G. hirsutum. In contrast, G. barbadense showed negative Tajima’s D on several chromosomes for both cultivated and landrace types, which indicates that speciation of G. barbadense itself might have occurred with relatively narrow genetic diversity. The presence of conserved linkage disequilibrium (LD) blocks and haplotypes between G. hirsutum and G. barbadense provides strong evidence for comparable patterns of evolution in their domestication processes. Our study illustrates the potential use of population genetic techniques to identify genomic regions for domestication. PMID:28128280
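Because the abstract interprets negative Tajima's D values, a small Python sketch of the standard Tajima (1989) statistic may help. It is an illustration on toy haplotypes, not the software used in the study; the function name and the example sequences are assumptions.

```python
import math
from itertools import combinations


def tajimas_d(haplotypes):
    """Tajima's D for equal-length aligned haplotype strings.

    Returns None when there are no segregating sites (D is undefined).
    """
    n = len(haplotypes)
    length = len(haplotypes[0])

    # S: number of segregating (polymorphic) sites.
    s = sum(1 for i in range(length) if len({h[i] for h in haplotypes}) > 1)
    if s == 0:
        return None

    # pi: mean number of pairwise differences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Toy data: an excess of rare (singleton) variants gives a negative D,
# while two equally frequent divergent haplotypes give a positive D.
rare_variants = ["TAAAA", "ATAAA", "AATAA", "AAATA", "AAAAT", "AAAAA"]
balanced = ["AAAAA", "AAAAA", "AAAAA", "TTTTT", "TTTTT", "TTTTT"]
print(tajimas_d(rare_variants), tajimas_d(balanced))
```

An excess of low-frequency variants, which drives D below zero, is commonly read as a signature of population expansion or purifying selection, so interpretation depends on the demographic context.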
Emerling, Christopher A
2017-10-01
Regressive evolution of anatomical traits often corresponds with the regression of genomic loci underlying such characters. As such, studying patterns of gene loss can be instrumental in addressing questions of gene function, resolving conflicting results from anatomical studies, and understanding the evolutionary history of clades. The evolutionary origins of snakes involved the regression of a number of anatomical traits, including limbs, taste buds and the visual system, and by analyzing serpent genomes, I was able to test three hypotheses associated with the regression of these features. The first concerns two keratins that are putatively specific to claws. Both genes that encode these keratins are pseudogenized/deleted in snake genomes, providing additional evidence of claw-specificity. The second hypothesis is that snakes lack taste buds, an issue complicated by conflicting results in the literature. I found evidence that different snakes have lost one or more taste receptors, but all snakes examined retained at least one gustatory channel. The final hypothesis addressed is that the earliest snakes were adapted to a dim light niche. I found evidence of deleted and pseudogenized genes with light-associated functions in snakes, demonstrating a pattern of gene loss similar to other dim light-adapted clades. Molecular dating estimates suggest that dim light adaptation preceded the loss of limbs, providing some bearing on interpretations of the ecological origins of snakes. Copyright © 2017 Elsevier Inc. All rights reserved.
Genomic linkage of male song and female acoustic preference QTL underlying a rapid species radiation
Shaw, Kerry L.; Lesnick, Sky C.
2009-01-01
The genetic coupling hypothesis of signal-preference evolution, whereby the same genes control male signal and female preference for that signal, was first inspired by the evolution of cricket acoustic communication nearly 50 years ago. To examine this hypothesis, we compared the genomic location of quantitative trait loci (QTL) underlying male song and female acoustic preference variation in the Hawaiian cricket genus Laupala. We document a QTL underlying female acoustic preference variation between 2 closely related species (Laupala kohalensis and Laupala paranigra). This preference QTL colocalizes with a song QTL identified previously, providing compelling evidence for a genomic linkage of the genes underlying these traits. We show that both song and preference QTL make small to moderate contributions to the behavioral difference between species, suggesting that divergence in mating behavior among Laupala species is due to the fixation of many genes of minor effect. The diversity of acoustic signaling systems in crickets exemplifies the evolution of elaborate male displays by sexual selection through female choice. Our data reveal genetic conditions that would enable functional coordination between song and acoustic preference divergence during speciation, resulting in a behaviorally coupled mode of signal-preference evolution. Interestingly, Laupala exhibits one of the fastest rates of speciation in animals, concomitant with equally rapid evolution in sexual signaling behaviors. Genomic linkage may facilitate rapid speciation by contributing to genetic correlations between sexual signaling behaviors that eventually cause sexual isolation between diverging populations. PMID:19487670
Sexton, Brittany S.; Druliner, Brooke R.; Vera, Daniel L.; Avey, Denis; Zhu, Fanxiu; Dennis, Jonathan H.
2016-01-01
Nucleosome occupancy is critically important in regulating access to the eukaryotic genome. Few studies in human cells have measured genome-wide nucleosome distributions at high temporal resolution during a response to a common stimulus. We measured nucleosome distributions at high temporal resolution following Kaposi's-sarcoma-associated herpesvirus (KSHV) reactivation using our newly developed mTSS-seq technology, which maps nucleosome distribution at the transcription start sites (TSS) of all human genes. Nucleosomes underwent widespread changes in organization 24 hours after KSHV reactivation and returned to their basal nucleosomal architecture 48 hours after KSHV reactivation. The widespread changes consisted of an indiscriminate remodeling event resulting in the loss of nucleosome rotational phasing signals. Additionally, one in six TSSs in the human genome possessed nucleosomes that are translationally remodeled. 72% of the loci with translationally remodeled nucleosomes have nucleosomes that moved to positions encoded by the underlying DNA sequence. Finally we demonstrated that these widespread alterations in nucleosomal architecture potentiated regulatory factor binding. These descriptions of nucleosomal architecture changes provide a new framework for understanding the role of chromatin in the genomic response, and have allowed us to propose a hierarchical model for chromatin-based regulation of genome response. PMID:26771136
Goswami, Sathi; Sanyal, Sulagna; Chakraborty, Payal; Das, Chandrima; Sarkar, Munna
2017-08-01
NSAIDs are the most common class of painkillers and anti-inflammatory agents. They also show other functions like chemoprevention and chemosuppression, for which they act at the protein but not at the genome level since they are mostly anions at physiological pH, which prohibits their approach to the poly-anionic DNA. Complexing the drugs with a bioactive metal obliterates their negative charge and allows them to bind to the DNA, thereby opening the possibility of genome level interaction. To test this hypothesis, we present the interaction of a traditional NSAID, Piroxicam, and its copper complex with core histone and chromatin. Spectroscopy, DLS, and SEM studies were applied to see the effect of the interaction on the structure of histone/chromatin. This was coupled with MTT assay, immunoblot analysis, confocal microscopy, micro array analysis and qRT-PCR. The interaction of Piroxicam and its copper complex with histone/chromatin results in structural alterations. Such structural alterations can have different biological manifestations, but to test our hypothesis, we have focused only on the accompanying modulations at the epigenomic/genomic level. The complex showed alteration of key epigenetic signatures implicated in transcription in the global context, although Piroxicam caused no significant changes. We have correlated such alterations caused by the complex with the changes in global gene expression and validated the candidate gene expression alterations. Our results provide the proof of concept that the DNA binding ability of the copper complexes of a traditional NSAID opens up the possibility of modulations at the epigenomic/genomic level. Copyright © 2017 Elsevier B.V. All rights reserved.
Microdiversification in genome-streamlined ubiquitous freshwater Actinobacteria.
Neuenschwander, Stefan M; Ghai, Rohit; Pernthaler, Jakob; Salcher, Michaela M
2018-01-01
Actinobacteria of the acI lineage are the most abundant microbes in freshwater systems, but there are so far no pure living cultures of these organisms, possibly because of metabolic dependencies on other microbes. This, in turn, has hampered an in-depth assessment of the genomic basis for their success in the environment. Here we present genomes from 16 axenic cultures of acI Actinobacteria. The isolates were not only of minute cell size, but also among the most streamlined free-living microbes, with extremely small genome sizes (1.2-1.4 Mbp) and low genomic GC content. Genome reduction in these bacteria might have led to auxotrophy for various vitamins, amino acids and reduced sulphur sources, thus creating dependencies to co-occurring organisms (the 'Black Queen' hypothesis). Genome analyses, moreover, revealed a surprising degree of inter- and intraspecific diversity in metabolic pathways, especially of carbohydrate transport and metabolism, and mainly encoded in genomic islands. The striking genotype microdiversification of acI Actinobacteria might explain their global success in highly dynamic freshwater environments with complex seasonal patterns of allochthonous and autochthonous carbon sources. We propose a new order within Actinobacteria ('Candidatus Nanopelagicales') with two new genera ('Candidatus Nanopelagicus' and 'Candidatus Planktophila') and nine new species.
Swaggart, Kayleigh A.; Pavlicev, Mihaela; Muglia, Louis J.
2015-01-01
The molecular mechanisms controlling human birth timing at term, or resulting in preterm birth, have been the focus of considerable investigation, but limited insights have been gained over the past 50 years. In part, these processes have remained elusive because of divergence in reproductive strategies and physiology shown by model organisms, making extrapolation to humans uncertain. Here, we summarize the evolution of progesterone signaling and variation in pregnancy maintenance and termination. We use this comparative physiology to support the hypothesis that selective pressure on genomic loci involved in the timing of parturition has shaped human birth timing, and that these loci can be identified with comparative genomic strategies. Previous limitations imposed by divergence of mechanisms provide an important new opportunity to elucidate fundamental pathways of parturition control through increasing availability of sequenced genomes and associated reproductive physiology characteristics across diverse organisms. PMID:25646385
Lampe, David J; Witherspoon, David J; Soto-Adames, Felipe N; Robertson, Hugh M
2003-04-01
We report the isolation and sequencing of genomic copies of mariner transposons involved in recent horizontal transfers into the genomes of the European earwig, Forficula auricularia; the European honey bee, Apis mellifera; the Mediterranean fruit fly, Ceratitis capitata; and a blister beetle, Epicauta funebris, insects from four different orders. These elements are in the mellifera subfamily and are the second documented example of full-length mariner elements involved in this kind of phenomenon. We applied maximum likelihood methods to the coding sequences and determined that the copies in each genome were evolving neutrally, whereas reconstructed ancestral coding sequences appeared to be under selection, which strengthens our previous hypothesis that the primary selective constraint on mariner sequence evolution is the act of horizontal transfer between genomes.
Valdes Franco, José A; Wang, Yi; Huo, Naxin; Ponciano, Grisel; Colvin, Howard A; McMahan, Colleen M; Gu, Yong Q; Belknap, William R
2018-04-19
Guayule (Parthenium argentatum A. Gray) is a rubber-producing desert shrub native to Mexico and the United States. Guayule represents an alternative to Hevea brasiliensis as a source for commercial natural rubber. The efficient application of modern molecular/genetic tools to guayule improvement requires characterization of its genome. The 1.6 Gb guayule genome was sequenced, assembled and annotated. The final 1.5 Gb assembly, while fragmented (N50 = 22 kb), maps >95% of the shotgun reads and is essentially complete. Approximately 40,000 transcribed, protein encoding genes were annotated on the assembly. Further characterization of this genome revealed 15 families of small, microsatellite-associated, transposable elements (TEs) with unexpected chromosomal distribution profiles. These SaTar (Satellite Targeted) elements, which are non-autonomous Mu-like elements (MULEs), were frequently observed in multimeric linear arrays of unrelated individual elements within which no individual element is interrupted by another. This uniformly non-nested TE multimer architecture has not been previously described in either eukaryotic or prokaryotic genomes. Five families of similarly distributed non-autonomous MULEs (microsatellite associated, modularly assembled) were characterized in the rice genome. Families of TEs with similar structures and distribution profiles were identified in sorghum and citrus. The sequencing and assembly of the guayule genome provides a foundation for application of current crop improvement technologies to this plant. In addition, characterization of this genome revealed SaTar elements with distribution profiles unique among TEs. SaTar targeting appears based on an alternative MULE recombination mechanism with the potential to impact gene evolution.
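The N50 statistic quoted for the guayule assembly summarizes contiguity: it is the contig length at which contigs of that length or longer cover at least half of the assembled bases. The following is a minimal sketch of that calculation, assuming contig lengths are supplied as a plain list; the toy lengths are invented for illustration and are not taken from the guayule assembly.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical contig lengths in kb (illustrative only).
toy_contigs_kb = [50, 22, 22, 10, 6]
print(n50(toy_contigs_kb))  # prints 22
```

A low N50 relative to genome size, as reported above, indicates many short contigs even when nearly all reads map to the assembly.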
Liu, Gangbiao; Zou, Yangyun; Cheng, Qiqun; Zeng, Yanwu; Gu, Xun; Su, Zhixi
2014-04-01
The age distribution of gene duplication events within the human genome exhibits two waves of duplications along with an ancient component. However, because of functional constraint differences, genes in different functional categories might show dissimilar retention patterns after duplication. It is known that genes in some functional categories are highly duplicated in the early stage of vertebrate evolution. However, the correlations of the age distribution pattern of gene duplication between the different functional categories are still unknown. To investigate this issue, we developed a robust pipeline to date the gene duplication events in the human genome. We successfully estimated about three-quarters of the duplication events within the human genome, along with the age distribution pattern in each Gene Ontology (GO) slim category. We found that some GO slim categories show different distribution patterns when compared to the whole genome. Further hierarchical clustering of the GO slim functional categories enabled grouping into two main clusters. We found that human genes located in the duplicated copy number variant regions, whose duplicate genes have not been fixed in the human population, were mainly enriched in the groups with a high proportion of recently duplicated genes. Moreover, we used a phylogenetic tree-based method to date the age of duplications in three signaling-related gene superfamilies: transcription factors, protein kinases and G-protein coupled receptors. These superfamilies were expressed in different subcellular localizations. They showed a similar age distribution as the signaling-related GO slim categories. We also compared the differences between the age distributions of gene duplications in multiple subcellular localizations. We found that the distribution patterns of the major subcellular localizations were similar to that of the whole genome. 
This study revealed the whole picture of the evolution patterns of gene functional categories in the human genome.
Evolution of the Structure and Chromosomal Distribution of Histidine Biosynthetic Genes
NASA Astrophysics Data System (ADS)
Fani, Renato; Mori, Elena; Tamburini, Elena; Lazcano, Antonio
1998-10-01
A database of more than 100 histidine biosynthetic genes from different organisms belonging to the three primary domains has been analyzed, including those found in the now completely sequenced genomes of Haemophilus influenzae, Mycoplasma genitalium, Synechocystis sp., Methanococcus jannaschii, and Saccharomyces cerevisiae. The ubiquity of his genes suggests that it is a highly conserved pathway that was probably already present in the last common ancestor of all extant life. The chromosomal distribution of the his genes shows that the enterobacterial histidine operon structure is not the only possible organization, and that there is a diversity of gene arrays for the his pathway. Analysis of the available sequences shows that gene fusions (like those involved in the origin of the Escherichia coli and Salmonella typhimurium hisIE and hisB gene structures) are not universal. In contrast, the elongation event that led to the extant hisA gene from two homologous ancestral modules, as well as the subsequent paralogous duplication that originated hisF, appear to be irreversible and are conserved in all known organisms. The available evidence supports the hypothesis that histidine biosynthesis was assembled by a gene recruitment process.
Facio, Flavia M; Sapp, Julie C; Linn, Amy; Biesecker, Leslie G
2012-10-10
Massively-parallel sequencing (MPS) technologies create challenges for informed consent of research participants given the enormous scale of the data and the wide range of potential results. We propose that the consent process in these studies be based on whether they use MPS to test a hypothesis or to generate hypotheses. To demonstrate the differences in these approaches to informed consent, we describe the consent processes for two MPS studies. The purpose of our hypothesis-testing study is to elucidate the etiology of rare phenotypes using MPS. The purpose of our hypothesis-generating study is to test the feasibility of using MPS to generate clinical hypotheses, and to approach the return of results as an experimental manipulation. Issues to consider in both designs include: volume and nature of the potential results, primary versus secondary results, return of individual results, duty to warn, length of interaction, target population, and privacy and confidentiality. The categorization of MPS studies as hypothesis-testing versus hypothesis-generating can help to clarify the issue of so-called incidental or secondary results for the consent process, and aid the communication of the research goals to study participants.
Pujolar, J M; Jacobsen, M W; Bekkevold, D; Lobón-Cervià, J; Jónsson, B; Bernatchez, L; Hansen, M M
2015-08-13
Species showing complex life cycles provide excellent opportunities to study the genetic associations between life cycle stages, as selective pressures may differ before and after metamorphosis. The European eel presents a complex life cycle with two metamorphoses, a first metamorphosis from larvae into glass eels (juvenile stage) and a second metamorphosis into silver eels (adult stage). We tested the hypothesis that different genes and gene pathways will be under selection at different life stages when comparing the genetic associations between glass eels and silver eels. We used two sets of markers to test for selection: first, we genotyped individuals using a panel of 80 coding-gene single nucleotide polymorphisms (SNPs) developed in American eel; second, we investigated selection at the genome level using a total of 153,423 RAD-sequencing generated SNPs widely distributed across the genome. Using the RAD approach, outlier tests identified a total of 2413 (1.57%) potentially selected SNPs. Functional annotation analysis identified signal transduction pathways as the most over-represented group of genes, including MAPK/Erk signalling, calcium signalling and GnRH (gonadotropin-releasing hormone) signalling. Many of the over-represented pathways were related to growth, while others could result from the different conditions that eels inhabit during their life cycle. The observation of different genes and gene pathways under selection when comparing glass eels vs. silver eels supports the adaptive decoupling hypothesis for the benefits of metamorphosis. Partitioning the life cycle into discrete morphological phases may be overall beneficial since it allows the different life stages to respond independently to their unique selection pressures. This might translate into a more effective use of food and niche resources and/or performance of phase-specific tasks (e.g. feeding in the case of glass eels, migrating and reproducing in the case of silver eels).
NASA Technical Reports Server (NTRS)
Hussain, A. K. M. F.
1980-01-01
Comparisons of the distributions of large scale structures in turbulent flow with distributions based on time dependent signals from stationary probes and the Taylor hypothesis are presented. The study investigated an area in the near field of a 7.62 cm circular air jet at a Re of 32,000, specifically having coherent structures through small-amplitude controlled excitation and stable vortex pairing in the jet column mode. Hot-wire and X-wire anemometry were employed to establish phase averaged spatial distributions of longitudinal and lateral velocities, coherent Reynolds stress and vorticity, background turbulent intensities, streamlines and pseudo-stream functions. The Taylor hypothesis was used to calculate spatial distributions of the phase-averaged properties, with results indicating that the usage of the local time-average velocity or streamwise velocity produces large distortions.
Sun, Genlou; Komatsuda, Takao
2010-08-01
It is well known that Elymus arose through hybridization between representatives of different genera. Cytogenetic analyses show that all its members include the St genome in combination with one or more of four other genomes, the H, Y, P, and W genomes. The origins of the H, P, and W genomes are known, but the origin of the Y genome is not. We analyzed the single copy nuclear gene coding for elongation factor G (EF-G) from 28 accessions of polyploid Elymus species and 45 accessions of diploid Triticeae species in order to investigate the origin of the Y genome and its relationship to other genomes in the tribe Triticeae. Sequence comparisons among the St, H, Y, P, W, and E genomes detected genome-specific polymorphisms at 66 nucleotide positions. The St and Y genomes are relatively dissimilar. The phylogeny of the Y genome sequences was investigated for the first time. They were most similar to the W genome sequences. The Y genome sequences were placed in two different groups. These two groups were included in an unresolved clade that included the W and E sequences as well as sequences from many annual species. The H genome sequences were in a clade with the F, P, and Ns genome sequences as sister groups. These two clades were more closely related to each other and to the L and Xp genomes than they were to the St genome sequences. These data support the hypothesis that the Y genome evolved in a diploid species and has a different origin from the St genome. Copyright 2010 Elsevier Inc. All rights reserved.
GlobAl Distribution of GEnetic Traits (GADGET) web server: polygenic trait scores worldwide.
Chande, Aroon T; Wang, Lu; Rishishwar, Lavanya; Conley, Andrew B; Norris, Emily T; Valderrama-Aguirre, Augusto; Jordan, I King
2018-05-18
Human populations from around the world show striking phenotypic variation across a wide variety of traits. Genome-wide association studies (GWAS) are used to uncover genetic variants that influence the expression of heritable human traits; accordingly, population-specific distributions of GWAS-implicated variants may shed light on the genetic basis of human phenotypic diversity. With this in mind, we developed the GlobAl Distribution of GEnetic Traits web server (GADGET http://gadget.biosci.gatech.edu). The GADGET web server provides users with a dynamic visual platform for exploring the relationship between worldwide genetic diversity and the genetic architecture underlying numerous human phenotypes. GADGET integrates trait-implicated single nucleotide polymorphisms (SNPs) from GWAS, with population genetic data from the 1000 Genomes Project, to calculate genome-wide polygenic trait scores (PTS) for 818 phenotypes in 2504 individual genomes. Population-specific distributions of PTS are shown for 26 human populations across 5 continental population groups, with traits ordered based on the extent of variation observed among populations. Users of GADGET can also upload custom trait SNP sets to visualize global PTS distributions for their own traits of interest.
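The abstract does not spell out how a polygenic trait score is computed, so the sketch below is an assumption rather than the GADGET server's actual method: it scores one individual as the unweighted fraction of effect alleles carried across the trait-implicated SNPs. The function name, SNP identifiers, and genotypes are all hypothetical.

```python
def polygenic_trait_score(genotypes, effect_alleles):
    """Unweighted score: fraction of effect alleles carried by one individual.

    genotypes      -- dict mapping SNP id to a tuple of the two observed alleles
    effect_alleles -- dict mapping SNP id to the trait-associated allele
    SNPs missing from `genotypes` are skipped (an assumption of this sketch).
    """
    carried = 0
    possible = 0
    for snp, effect in effect_alleles.items():
        if snp not in genotypes:
            continue
        carried += sum(1 for allele in genotypes[snp] if allele == effect)
        possible += 2  # diploid: two alleles per SNP
    if possible == 0:
        raise ValueError("no overlapping SNPs between genotype and trait set")
    return carried / possible


# Hypothetical individual and trait SNP set (illustrative only).
individual = {"snp1": ("A", "G"), "snp2": ("T", "T"), "snp3": ("C", "C")}
trait_snps = {"snp1": "A", "snp2": "T", "snp3": "G"}
print(polygenic_trait_score(individual, trait_snps))  # prints 0.5
```

Averaging such scores within each population would give the kind of population-specific distribution the abstract describes; an effect-size-weighted sum is a common alternative that this sketch does not implement.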
Comparative Sequence Analysis of the X-Inactivation Center Region in Mouse, Human, and Bovine
Chureau, Corinne; Prissette, Marine; Bourdet, Agnès; Barbe, Valérie; Cattolico, Laurence; Jones, Louis; Eggen, André; Avner, Philip; Duret, Laurent
2002-01-01
We have sequenced to high levels of accuracy 714-kb and 233-kb regions of the mouse and bovine X-inactivation centers (Xic), respectively, centered on the Xist gene. This has provided the basis for a fully annotated comparative analysis of the mouse Xic with the 2.3-Mb orthologous region in human and has allowed a three-way species comparison of the core central region, including the Xist gene. These comparisons have revealed conserved genes, both coding and noncoding, conserved CpG islands and, more surprisingly, conserved pseudogenes. The distribution of repeated elements, especially LINE repeats, in the mouse Xic region when compared to the rest of the genome does not support the hypothesis of a role for these repeat elements in the spreading of X inactivation. Interestingly, an asymmetric distribution of LINE elements on the two DNA strands was observed in the three species, not only within introns but also in intergenic regions. This feature is suggestive of important transcriptional activity within these intergenic regions. In silico prediction followed by experimental analysis has allowed four new genes, Cnbp2, Ftx, Jpx, and Ppnx, to be identified and novel, widespread, complex, and apparently noncoding transcriptional activity to be characterized in a region 5′ of Xist that was recently shown to attract histone modification early after the onset of X inactivation. [The sequence data described in this paper have been submitted to the EMBL data library under accession nos. AJ421478, AJ421479, AJ421480, and AJ421481. Online supplemental data are available at http://pbil.univ-lyon1.fr/datasets/Xic2002/data.html and www.genome.org.] PMID:12045143
Vidigal, Pedro M P; Mafra, Claudio L; Silva, Fernanda M F; Fietto, Juliana L R; Silva Júnior, Abelardo; Almeida, Márcia R
2012-01-01
Porcine circovirus-2 (PCV-2) is an emerging virus associated with a number of different syndromes in pigs known as Porcine Circovirus Associated Diseases (PCVAD). Since its identification and characterization in the early 1990s, PCV-2 has achieved a worldwide distribution, becoming endemic in most pig-producing countries, and is currently considered as the main cause of losses on pig farms. In this study, we analyzed the main routes of the spread of PCV-2 between pig-producing countries using phylogenetic and phylogeographical approaches. A search for PCV-2 genome sequences in GenBank was performed, and the 420 PCV-2 sequences obtained were grouped into haplotypes (group of sequences that showed 100% identity), based on the infinite sites model of genome evolution. A phylogenetic hypothesis was inferred by Bayesian Inference for the classification of viral strains and a haplotype network was constructed by Median Joining to predict the geographical distribution of and genealogical relationships between haplotypes. In order to establish an epidemiological and economic context in these analyses, we considered all information about PCV-2 sequences available in GenBank, including papers published on viral isolation, and live pig trading statistics available on the UN Comtrade database (http://comtrade.un.org/). In these analyses, we identified a strong correlation between the means of PCV-2 dispersal predicted by the haplotype network and the statistics on the international trading of live pigs. This correlation provides a new perspective on the epidemiology of PCV-2, highlighting the importance of the movement of animals around the world in the emergence of new pathogens, and showing the need for effective sanitary barriers when trading live animals. Copyright © 2011 Elsevier B.V. All rights reserved.
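The haplotype definition used above (sequences showing 100% identity form one haplotype) can be sketched as a simple grouping step. This is an illustrative sketch only: it assumes the sequences are already aligned and comparable as plain strings, and the accession labels and sequences are invented.

```python
from collections import defaultdict


def group_into_haplotypes(sequences):
    """Group sequence ids whose sequences are 100% identical.

    sequences -- dict mapping a sequence id to its (aligned) sequence string
    Returns a list of id lists, largest haplotype first.
    """
    groups = defaultdict(list)
    for seq_id, seq in sequences.items():
        groups[seq.upper()].append(seq_id)  # case-insensitive exact match
    return sorted(
        (sorted(ids) for ids in groups.values()),
        key=lambda ids: (-len(ids), ids),
    )


# Hypothetical accessions and short toy sequences (illustrative only).
toy = {
    "acc1": "ACGTAC",
    "acc2": "ACGTAC",
    "acc3": "ACGTTC",
    "acc4": "acgtac",
}
print(group_into_haplotypes(toy))
# prints [['acc1', 'acc2', 'acc4'], ['acc3']]
```

Each resulting group would become one node in a haplotype network, with node size proportional to the number of sequences it contains.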
Baco, Amy R.; Cairns, Stephen D.
2012-01-01
Recent studies have countered the paradigm of seamount isolation, confounding conservation efforts at a critical time. Efforts to study deep-sea corals, one of the dominant taxa on seamounts, to understand seamount connectivity, are hampered by a lack of taxonomic keys. A prerequisite for connectivity is species overlap. Attempts to better understand species overlap using DNA barcoding methods suggest coral species are widely distributed on seamounts and nearby features. However, no baseline has been established for variation in these genetic markers relative to morphological species designations for deep-sea octocoral families. Here we assess levels of genetic variation in potential octocoral mitochondrial barcode markers relative to thoroughly examined morphological species in the genus Narella. The combination of six markers used here, approximately 3350 bp of the mitochondrial genome, resolved 83% of the morphological species. Our results show that two of the markers, ND2 and NCR1, are not sufficient to resolve genera within Primnoidae, let alone species. Re-evaluation of previous studies of seamount octocorals based on these results suggests that those studies were looking at distributions at a level higher than species, possibly even genus or subfamily. Results for Narella show that using more markers provides haplotypes with relatively narrow depth ranges on the seamounts studied. Given the lack of 100% resolution of species with such a large portion of the mitochondrial genome, we argue that previous genetic studies have not resolved the degree of species overlap on seamounts and that we may not have the power to even test the hypothesis of seamount isolation using mitochondrial markers, let alone refute it. Thus a precautionary approach is advocated in seamount conservation and management, and the potential for depth structuring should be considered. PMID:23029093
HYPOTHESIS SETTING AND ORDER STATISTIC FOR ROBUST GENOMIC META-ANALYSIS.
Song, Chi; Tseng, George C
2014-01-01
Meta-analysis techniques have been widely developed and applied in genomic applications, especially for combining multiple transcriptomic studies. In this paper, we propose an order statistic of p-values (rth ordered p-value, rOP) across combined studies as the test statistic. We illustrate different hypothesis settings that detect gene markers differentially expressed (DE) "in all studies", "in the majority of studies", or "in one or more studies", and specify rOP as a suitable method for detecting DE genes "in the majority of studies". We develop methods to estimate the parameter r in rOP for real applications. Statistical properties such as its asymptotic behavior and a one-sided testing correction for detecting markers of concordant expression changes are explored. Power calculation and simulation show better performance of rOP compared to classical Fisher's method, Stouffer's method, minimum p-value method and maximum p-value method under the focused hypothesis setting. Theoretically, rOP is found connected to the naïve vote counting method and can be viewed as a generalized form of vote counting with better statistical properties. The method is applied to three microarray meta-analysis examples including major depressive disorder, brain cancer and diabetes. The results demonstrate rOP as a more generalizable, robust and sensitive statistical framework to detect disease-related markers.
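To make the rOP statistic concrete, the sketch below computes the rth smallest p-value across K studies and its null p-value. It assumes the K study p-values are independent and uniform on (0, 1) under the null, in which case the rth order statistic follows a Beta(r, K - r + 1) distribution whose CDF equals a binomial tail sum. The function name and example values are illustrative, and the paper's one-sided correction and its procedure for choosing r are not implemented.

```python
from math import comb


def rop_p_value(p_values, r):
    """Return (rOP statistic, null p-value) for the rth ordered p-value.

    Under independent Uniform(0, 1) p-values, P(U_(r) <= x) equals the
    probability that at least r of the K p-values fall at or below x.
    """
    k = len(p_values)
    if not 1 <= r <= k:
        raise ValueError("r must satisfy 1 <= r <= number of studies")
    x = sorted(p_values)[r - 1]  # rth smallest p-value
    tail = sum(comb(k, j) * x**j * (1 - x) ** (k - j) for j in range(r, k + 1))
    return x, tail


# Hypothetical p-values for one gene across five studies (illustrative only).
study_p = [0.001, 0.004, 0.02, 0.30, 0.85]
statistic, p_combined = rop_p_value(study_p, r=3)  # "majority of studies"
print(statistic, p_combined)
```

Setting r = 1 recovers the minimum p-value method and r = K the maximum p-value method, which is why an intermediate r targets genes differentially expressed in the majority of studies.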
Xia, Han; Beck, Andrew S.; Gargili, Aysen; Forrester, Naomi; Barrett, Alan D. T.; Bente, Dennis A.
2016-01-01
The trade-off hypothesis, the current paradigm of arbovirus evolution, proposes that cycling between vertebrate and invertebrate hosts presents significant constraints on genetic change of arboviruses. Studying these constraints in mosquito-borne viruses has led to a new understanding of epizootics. The trade-off hypothesis is assumed to be applicable to tick-borne viruses too, although studies are lacking. Tick-borne Crimean-Congo hemorrhagic fever virus (CCHFV), a member of the family Bunyaviridae, is a major cause of severe human disease worldwide and shows an extraordinary amount of genetic diversity compared to other arboviruses, which has been linked to increased virulence and emergence in new environments. Using a transmission model for CCHFV, utilizing the main vector tick species and mice plus next generation sequencing, we detected a substantial number of consensus-level mutations in CCHFV recovered from ticks after only a single transstadial transmission, whereas none were detected in CCHFV obtained from the mammalian host. Furthermore, greater viral intra-host diversity was detected in the tick compared to the vertebrate host. Long-term association of CCHFV with its tick host for 1 year demonstrated mutations in the viral genome become fixed over time. These findings suggest that the trade-off hypothesis may not be accurate for all arboviruses. PMID:27775001
Xia, Han; Beck, Andrew S; Gargili, Aysen; Forrester, Naomi; Barrett, Alan D T; Bente, Dennis A
2016-10-24
The trade-off hypothesis, the current paradigm of arbovirus evolution, proposes that cycling between vertebrate and invertebrate hosts presents significant constraints on genetic change of arboviruses. Studying these constraints in mosquito-borne viruses has led to a new understanding of epizootics. The trade-off hypothesis is assumed to be applicable to tick-borne viruses too, although studies are lacking. Tick-borne Crimean-Congo hemorrhagic fever virus (CCHFV), a member of the family Bunyaviridae, is a major cause of severe human disease worldwide and shows an extraordinary amount of genetic diversity compared to other arboviruses, which has been linked to increased virulence and emergence in new environments. Using a transmission model for CCHFV, utilizing the main vector tick species and mice plus next generation sequencing, we detected a substantial number of consensus-level mutations in CCHFV recovered from ticks after only a single transstadial transmission, whereas none were detected in CCHFV obtained from the mammalian host. Furthermore, greater viral intra-host diversity was detected in the tick compared to the vertebrate host. Long-term association of CCHFV with its tick host for 1 year demonstrated mutations in the viral genome become fixed over time. These findings suggest that the trade-off hypothesis may not be accurate for all arboviruses.
Dennis, Jessica; Medina-Rivera, Alejandra; Truong, Vinh; Antounians, Lina; Zwingerman, Nora; Carrasco, Giovana; Strug, Lisa; Wells, Phil; Trégouët, David-Alexandre; Morange, Pierre-Emmanuel; Wilson, Michael D; Gagnon, France
2017-07-01
Tissue factor pathway inhibitor (TFPI) regulates the formation of intravascular blood clots, which manifest clinically as ischemic heart disease, ischemic stroke, and venous thromboembolism (VTE). TFPI plasma levels are heritable, but the genetics underlying TFPI plasma level variability are poorly understood. Herein we report the first genome-wide association scan (GWAS) of TFPI plasma levels, conducted in 251 individuals from five extended French-Canadian families ascertained on VTE. To improve discovery, we also applied a hypothesis-driven (HD) GWAS approach that prioritized single nucleotide polymorphisms (SNPs) in (1) hemostasis pathway genes, and (2) vascular endothelial cell (EC) regulatory regions, which are among the highest expressers of TFPI. Our GWAS identified 131 SNPs with suggestive evidence of association (P-value < 5 × 10⁻⁸), but no SNPs reached the genome-wide threshold for statistical significance. Hemostasis pathway genes were not enriched for TFPI plasma level associated SNPs (global hypothesis test P-value = 0.147), but EC regulatory regions contained more TFPI plasma level associated SNPs than expected by chance (global hypothesis test P-value = 0.046). We therefore stratified our genome-wide SNPs, prioritizing those in EC regulatory regions via stratified false discovery rate (sFDR) control, and reranked the SNPs by q-value. The minimum q-value was 0.27, and the top-ranked SNPs did not show association evidence in the MARTHA replication sample of 1,033 unrelated VTE cases. Although this study did not result in new loci for TFPI, our work lays out a strategy to utilize epigenomic data in prioritization schemes for future GWAS studies. © 2017 WILEY PERIODICALS, INC.
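The stratify-then-rerank step described above can be sketched as follows. This is not the authors' sFDR implementation: as a stand-in, it uses Benjamini-Hochberg adjusted p-values as q-values, computed separately within each stratum, and then ranks all SNPs by q-value. The stratum labels, SNP names, and p-values are hypothetical.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def stratified_ranking(p_by_snp, stratum_by_snp):
    """Compute q-values within each stratum, then rank all SNPs by q-value."""
    strata = {}
    for snp in p_by_snp:
        strata.setdefault(stratum_by_snp[snp], []).append(snp)
    q_by_snp = {}
    for snps in strata.values():
        q_by_snp.update(zip(snps, bh_q_values([p_by_snp[s] for s in snps])))
    return sorted(q_by_snp.items(), key=lambda item: (item[1], item[0]))


# Hypothetical SNPs: one in a prioritized stratum, three elsewhere.
p_vals = {"snpA": 0.01, "snpB": 0.01, "snpC": 0.5, "snpD": 0.9}
strata = {"snpA": "EC_regulatory", "snpB": "other", "snpC": "other", "snpD": "other"}
for snp, q in stratified_ranking(p_vals, strata):
    print(snp, round(q, 3))
```

In this toy input, snpA and snpB share the same p-value, but snpA ranks first because its small stratum carries a lighter multiple-testing burden, which is the intended effect of prioritizing an enriched stratum.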
Keers, Robert; Coleman, Jonathan R.I.; Lester, Kathryn J.; Roberts, Susanna; Breen, Gerome; Thastum, Mikael; Bögels, Susan; Schneider, Silvia; Heiervang, Einar; Meiser-Stedman, Richard; Nauta, Maaike; Creswell, Cathy; Thirlwall, Kerstin; Rapee, Ronald M.; Hudson, Jennifer L.; Lewis, Cathryn; Plomin, Robert; Eley, Thalia C.
2016-01-01
Background: The differential susceptibility hypothesis suggests that certain genetic variants moderate the effects of both negative and positive environments on mental health and may therefore be important predictors of response to psychological treatments. Nevertheless, the identification of such variants has so far been limited to preselected candidate genes. In this study we extended the differential susceptibility hypothesis from a candidate gene to a genome-wide approach to test whether a polygenic score of environmental sensitivity predicted response to cognitive behavioural therapy (CBT) in children with anxiety disorders. Methods: We identified variants associated with environmental sensitivity using a novel method in which within-pair variability in emotional problems in 1,026 monozygotic twin pairs was examined as a function of the pairs' genotype. We created a polygenic score of environmental sensitivity based on the whole-genome findings and tested the score as a moderator of parenting on emotional problems in 1,406 children and response to individual, group and brief parent-led CBT in 973 children with anxiety disorders. Results: The polygenic score significantly moderated the effects of parenting on emotional problems and the effects of treatment. Individuals with a high score responded significantly better to individual CBT than group CBT or brief parent-led CBT (remission rates: 70.9, 55.5 and 41.6%, respectively). Conclusions: Pending successful replication, our results should be considered exploratory. Nevertheless, if replicated, they suggest that individuals with the greatest environmental sensitivity may be more likely to develop emotional problems in adverse environments but also benefit more from the most intensive types of treatment. PMID:27043157
The topography of mutational processes in breast cancer genomes.
Morganella, Sandro; Alexandrov, Ludmil B; Glodzik, Dominik; Zou, Xueqing; Davies, Helen; Staaf, Johan; Sieuwerts, Anieta M; Brinkman, Arie B; Martin, Sancha; Ramakrishna, Manasa; Butler, Adam; Kim, Hyung-Yong; Borg, Åke; Sotiriou, Christos; Futreal, P Andrew; Campbell, Peter J; Span, Paul N; Van Laere, Steven; Lakhani, Sunil R; Eyfjord, Jorunn E; Thompson, Alastair M; Stunnenberg, Hendrik G; van de Vijver, Marc J; Martens, John W M; Børresen-Dale, Anne-Lise; Richardson, Andrea L; Kong, Gu; Thomas, Gilles; Sale, Julian; Rada, Cristina; Stratton, Michael R; Birney, Ewan; Nik-Zainal, Serena
2016-05-02
Somatic mutations in human cancers show unevenness in genomic distribution that correlates with aspects of genome structure and function. These mutations are, however, generated by multiple mutational processes operating through the cellular lineage between the fertilized egg and the cancer cell, each composed of specific DNA damage and repair components and leaving its own characteristic mutational signature on the genome. Using somatic mutation catalogues from 560 breast cancer whole-genome sequences, here we show that each of 12 base substitution, 2 insertion/deletion (indel) and 6 rearrangement mutational signatures present in breast tissue exhibits distinct relationships with genomic features relating to transcription, DNA replication and chromatin organization. This signature-based approach permits visualization of the genomic distribution of mutational processes associated with APOBEC enzymes, mismatch repair deficiency and homologous recombinational repair deficiency, as well as mutational processes of unknown aetiology. Furthermore, it highlights mechanistic insights including a putative replication-dependent mechanism of APOBEC-related mutagenesis.
Keaton, Jacob M; Gao, Chuan; Guan, Meijian; Hellwege, Jacklyn N; Palmer, Nicholette D; Pankow, James S; Fornage, Myriam; Wilson, James G; Correa, Adolfo; Rasmussen-Torvik, Laura J; Rotter, Jerome I; Chen, Yii-Der I; Taylor, Kent D; Rich, Stephen S; Wagenknecht, Lynne E; Freedman, Barry I; Ng, Maggie C Y; Bowden, Donald W
2018-04-24
Although type 2 diabetes (T2D) results from metabolic defects in insulin secretion and insulin sensitivity, most of the genetic risk loci identified to date relate to insulin secretion. We reported that T2D loci influencing insulin sensitivity may be identified through interactions with insulin secretion loci, thereby leading to T2D. Here, we hypothesize that joint testing of variant main effects and interaction effects with an insulin secretion locus increases power to identify genetic interactions leading to T2D. We tested this hypothesis with an intronic MTNR1B SNP, rs10830963, which is associated with acute insulin response to glucose, a dynamic measure of insulin secretion. rs10830963 was tested for interaction and joint (main + interaction) effects with genome-wide data in African Americans (2,452 cases and 3,772 controls) from five cohorts. Genome-wide genotype data (Affymetrix Human Genome 6.0 array) were imputed to a 1000 Genomes Project reference panel. T2D risk was modeled using logistic regression with rs10830963 dosage, age, sex, and principal component as predictors. Joint effects were captured using the Kraft two degrees of freedom test. Genome-wide significant (P < 5 × 10⁻⁸) interaction with MTNR1B and joint effects were detected for CMIP intronic SNP rs17197883 (P_interaction = 1.43 × 10⁻⁸; P_joint = 4.70 × 10⁻⁸). CMIP variants have been nominally associated with T2D, fasting glucose, and adiponectin in individuals of East Asian ancestry, with high-density lipoprotein, and with waist-to-hip ratio adjusted for body mass index in Europeans. These data support the hypothesis that additional genetic factors contributing to T2D risk, including insulin sensitivity loci, can be identified through interactions with insulin secretion loci. © 2018 WILEY PERIODICALS, INC.
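A two degrees of freedom joint test of a variant's main effect and its interaction effect can be sketched as a likelihood ratio test: the model containing both terms is compared with the model containing neither, and the statistic is referred to a chi-square distribution with 2 degrees of freedom, whose survival function is exactly exp(-x/2). The sketch assumes the two log-likelihoods come from logistic regressions fitted elsewhere; the function name and numbers are illustrative and are not the study's values.

```python
from math import exp


def joint_2df_p_value(loglik_null, loglik_full):
    """Likelihood ratio p-value for main + interaction effects (2 df).

    loglik_null -- log-likelihood of the model without the tested SNP terms
    loglik_full -- log-likelihood with the SNP main effect and the
                   SNP x rs10830963 interaction term added
    """
    statistic = max(0.0, 2.0 * (loglik_full - loglik_null))
    # Chi-square with 2 df has survival function exp(-x / 2).
    return statistic, exp(-statistic / 2.0)


# Hypothetical log-likelihoods (illustrative only).
stat, p_joint = joint_2df_p_value(loglik_null=-4100.0, loglik_full=-4083.0)
print(stat, p_joint)
```

Because both parameters are tested together, a variant with a modest main effect and a modest interaction can reach significance jointly even when neither term does alone.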
Genetic diversity and population structure of Musa accessions in ex situ conservation
2013-01-01
Background: Banana cultivars are mostly derived from hybridization between wild diploid subspecies of Musa acuminata (A genome) and M. balbisiana (B genome), and they exhibit various levels of ploidy and genomic constitution. The Embrapa ex situ Musa collection contains over 220 accessions, of which only a few have been genetically characterized. Knowledge regarding the genetic relationships and diversity between modern cultivars and wild relatives would assist in conservation and breeding strategies. Our objectives were to determine the genomic constitution based on Internal Transcribed Spacer (ITS) region polymorphism and the ploidy of all accessions by flow cytometry, and to investigate the population structure of the collection using Simple Sequence Repeat (SSR) loci as co-dominant markers based on Structure software, not previously performed in Musa. Results: From the 221 accessions analyzed by flow cytometry, the correct ploidy was confirmed or established for 212 (95.9%), whereas digestion of the ITS region confirmed the genomic constitution of 209 (94.6%). Neighbor-joining clustering analysis derived from SSR binary data allowed the detection of two major groups, essentially distinguished by the presence or absence of the B genome, while subgroups were formed according to the genomic composition and commercial classification. The co-dominant nature of SSR was explored to analyze the structure of the population based on a Bayesian approach, detecting 21 subpopulations. Most of the subpopulations were in agreement with the clustering analysis. Conclusions: The data generated by flow cytometry, ITS and SSR supported the hypothesis about the occurrence of homeologue recombination between A and B genomes, leading to discrepancies in the number of sets or portions from each parental genome. These phenomena have been largely disregarded in the evolution of banana, as the “single-step domestication” hypothesis had long predominated. These findings will have an impact on future breeding approaches. Structure analysis enabled the efficient detection of ancestry of recently developed tetraploid hybrids by breeding programs, and for some triploids. However, for the main commercial subgroups, Structure appeared to be less efficient at detecting the ancestry in diploid groups, possibly due to sampling restrictions. The possibility of inferring the membership among accessions to correct the effects of genetic structure opens possibilities for its use in marker-assisted selection by association mapping. PMID:23497122
Zeng, Tian Chen; Aw, Alan J; Feldman, Marcus W
2018-05-25
In human populations, changes in genetic variation are driven not only by genetic processes, but can also arise from cultural or social changes. An abrupt population bottleneck specific to human males has been inferred across several Old World (Africa, Europe, Asia) populations 5000-7000 BP. Here, bringing together anthropological theory, recent population genomic studies and mathematical models, we propose a sociocultural hypothesis, involving the formation of patrilineal kin groups and intergroup competition among these groups. Our analysis shows that this sociocultural hypothesis can explain the inference of a population bottleneck. We also show that our hypothesis is consistent with current findings from the archaeogenetics of Old World Eurasia, and is important for conceptions of cultural and social evolution in prehistory.
Tsiagkas, Giannis; Nikolaou, Christoforos; Almirantis, Yannis
2014-12-01
CpG Islands (CGIs) are compositionally defined short genomic stretches, which have been studied in the human, mouse, chicken and later in several other genomes. Initially, they were assigned the role of transcriptional regulation of protein-coding genes, especially the house-keeping ones, while more recently evidence has been found that they are involved in several other functions as well, which might include regulation of the expression of RNA genes, DNA replication etc. Here, an investigation of their distributional characteristics in a variety of genomes is undertaken for both whole CGI populations as well as for CGI subsets that lie away from known genes (gene-unrelated or "orphan" CGIs). In both cases power-law-like linearity in double logarithmic scale is found. An evolutionary model, initially put forward for the explanation of a similar pattern found in gene populations, is implemented. It includes segmental duplication events and eliminations of most of the duplicated CGIs, while a moderate rate of non-duplicated CGI eliminations is also applied in some cases. Simulations reproduce all the main features of the observed inter-CGI chromosomal size distributions. Our results on power-law-like linearity found in orphan CGI populations suggest that the observed distributional pattern is independent of the analogous pattern that protein coding segments were reported to follow. The power-law-like patterns in the genomic distributions of CGIs described herein are found to be compatible with several other features of the composition, abundance or functional role of CGIs reported in the current literature across several genomes, on the basis of the proposed evolutionary model. Copyright © 2014 Elsevier Ltd. All rights reserved.
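"Linearity in double logarithmic scale" means that the logarithm of a count falls on a straight line against the logarithm of a size. A minimal sketch of how such a slope could be estimated is given below, assuming an ordinary least-squares fit on log10-transformed values. The function name and the synthetic data are hypothetical, and the abstract does not state which fitting procedure or which form of the distribution (for example, cumulative) the authors used.

```python
import math


def loglog_fit(sizes, counts):
    """Least-squares line through (log10 size, log10 count) points.

    Returns (slope, intercept); a power law count = C * size**a appears as
    a straight line with slope a and intercept log10(C).
    """
    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic inter-element distances following an exact power law (illustration only).
sizes = [10, 100, 1000, 10000]
counts = [1e6 / s ** 2 for s in sizes]
slope, intercept = loglog_fit(sizes, counts)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

On real data the fit would normally be restricted to the size range where the points are visibly linear, which is why the abstract speaks of "power-law-like" behavior.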
Bacteria-Human Somatic Cell Lateral Gene Transfer Is Enriched in Cancer Samples
Robinson, Kelly M.; White, James Robert; Ganesan, Ashwinkumar; Nourbakhsh, Syrus; Dunning Hotopp, Julie C.
2013-01-01
There are 10× more bacterial cells in our bodies from the microbiome than human cells. Viral DNA is known to integrate in the human genome, but the integration of bacterial DNA has not been described. Using publicly available sequence data from the human genome project, the 1000 Genomes Project, and The Cancer Genome Atlas (TCGA), we examined bacterial DNA integration into the human somatic genome. Here we present evidence that bacterial DNA integrates into the human somatic genome through an RNA intermediate, and that such integrations are detected more frequently in (a) tumors than normal samples, (b) RNA than DNA samples, and (c) the mitochondrial genome than the nuclear genome. Hundreds of thousands of paired reads support random integration of Acinetobacter-like DNA in the human mitochondrial genome in acute myeloid leukemia samples. Numerous read pairs across multiple stomach adenocarcinoma samples support specific integration of Pseudomonas-like DNA in the 5′-UTR and 3′-UTR of four proto-oncogenes that are up-regulated in their transcription, consistent with conversion to an oncogene. These data support our hypothesis that bacterial integrations occur in the human somatic genome and may play a role in carcinogenesis. We anticipate that the application of our approach to additional cancer genome projects will lead to the more frequent detection of bacterial DNA integrations in tumors that are in close proximity to the human microbiome. PMID:23840181
Multiple sclerosis: a geographical hypothesis.
Carlyle, I P
1997-12-01
Multiple sclerosis remains a rare neurological disease of unknown aetiology, with a unique distribution, both geographically and historically. Rare in equatorial regions, it becomes increasingly common in higher latitudes; historically, it was first clinically recognized in the early nineteenth century. A hypothesis, based on geographical reasoning, is here proposed: that the disease is the result of a specific vitamin deficiency. Different individuals suffer the deficiency in separate and often unique ways. Evidence to support the hypothesis exists in cultural considerations, in the global distribution of the disease, and in its historical prevalence.
Mieth, Bettina; Kloft, Marius; Rodríguez, Juan Antonio; Sonnenburg, Sören; Vobruba, Robin; Morcillo-Suárez, Carlos; Farré, Xavier; Marigorta, Urko M.; Fehr, Ernst; Dickhaus, Thorsten; Blanchard, Gilles; Schunk, Daniel; Navarro, Arcadi; Müller, Klaus-Robert
2016-01-01
The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008–2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0. PMID:27892471
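The two-step idea described above (screen SNPs with a linear classifier, then test only the retained candidates against a corrected threshold) can be sketched in a few lines. This is an illustration under stated assumptions, not the COMBI implementation: a small Pegasos-style subgradient trainer stands in for the support vector machine, the second step uses an allelic chi-square test, and a Bonferroni correction over the candidate set stands in for the threshold correction, none of which the abstract specifies. All function names are hypothetical.

```python
import math


def train_linear_svm(X, y, lam=0.1, epochs=10):
    """Pegasos-style subgradient descent for a linear SVM (fixed sample order)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]  # center each SNP
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(Xc, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]      # regularization shrinkage
            if margin < 1.0:                              # hinge loss is active
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def allelic_chi2_pvalue(genotypes, y):
    """Chi-square (1 df) test on the 2x2 table of allele counts by case status."""
    a = sum(g for g, yi in zip(genotypes, y) if yi == 1)       # alt alleles, cases
    b = sum(2 - g for g, yi in zip(genotypes, y) if yi == 1)   # ref alleles, cases
    c = sum(g for g, yi in zip(genotypes, y) if yi == -1)      # alt alleles, controls
    d = sum(2 - g for g, yi in zip(genotypes, y) if yi == -1)  # ref alleles, controls
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # monomorphic SNP or empty group: no evidence either way
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function, 1 df


def screen_then_test(X, y, top_k, alpha=0.05):
    """Step 1: keep the top_k SNPs by |SVM weight|. Step 2: test only those."""
    w = train_linear_svm(X, y)
    candidates = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)[:top_k]
    results = []
    for j in candidates:
        p = allelic_chi2_pvalue([row[j] for row in X], y)
        results.append((j, p, p < alpha / top_k))  # Bonferroni over candidates only
    return results


# Toy data: 10 cases (+1) and 10 controls (-1); only SNP 0 tracks case status.
y = [1] * 10 + [-1] * 10
X = [[2 if i < 10 else 0, i % 2, (i // 5) % 2, 1] for i in range(20)]
for snp, p, significant in screen_then_test(X, y, top_k=2):
    print(snp, f"{p:.3g}", significant)
```

Testing only the screened candidates is what allows a less severe multiplicity correction than testing every position, which is the source of the power gain the abstract reports.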
Contrasting growth phenology of native and invasive forest shrubs mediated by genome size.
Fridley, Jason D; Craddock, Alaä
2015-08-01
Examination of the significance of genome size to plant invasions has been largely restricted to its association with growth rate. We investigated the novel hypothesis that genome size is related to forest invasions through its association with growth phenology, as a result of the ability of large-genome species to grow more effectively through cell expansion at cool temperatures. We monitored the spring leaf phenology of 54 species of eastern USA deciduous forests, including native and invasive shrubs of six common genera. We used new measurements of genome size to evaluate its association with spring budbreak, cell size, summer leaf production rate, and photosynthetic capacity. In a phylogenetic hierarchical model that differentiated native and invasive species as a function of summer growth rate and spring budbreak timing, species with smaller genomes exhibited both faster growth and delayed budbreak compared with those with larger nuclear DNA content. Growth rate, but not budbreak timing, was associated with whether a species was native or invasive. Our results support genome size as a broad indicator of the growth behavior of woody species. Surprisingly, invaders of deciduous forests show the same small-genome tendencies as invaders of more open habitats, supporting genome size as a robust indicator of invasiveness. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.
Garazha, Andrew; Ivanova, Alena; Suntsova, Maria; Malakhova, Galina; Roumiantsev, Sergey; Zhavoronkov, Alex; Buzdin, Anton
2015-01-01
Endogenous retroviruses (ERVs) and LTR retrotransposons (LRs) occupy ∼8% of the human genome. Deep sequencing technologies provide clues to understanding the functional relevance of individual ERVs/LRs by enabling direct identification of transcription factor binding sites (TFBS) and other landmarks of functional genomic elements. Here, we performed the genome-wide identification of human ERVs/LRs containing TFBS according to the ENCODE project. We created the first interactive ERV/LR database that groups the individual inserts according to their familial nomenclature, number of mapped TFBS and divergence from their consensus sequence. Information on any particular element can be easily extracted by the user. We also created a genome browser tool, which enables quick mapping of any ERV/LR insert according to genomic coordinates, known human genes and TFBS. These tools can be used to easily explore functionally relevant individual ERVs/LRs, and for studying their impact on the regulation of human genes. Overall, we identified ∼110,000 ERV/LR genomic elements having TFBS. We propose a hypothesis of "domestication" of ERV/LR TFBS by the genome milieu including subsequent stages of initial epigenetic repression, partial functional release, and further mutation-driven reshaping of TFBS in tight coevolution with the enclosing genomic loci.
Parasitism drives host genome evolution: Insights from the Pasteuria ramosa-Daphnia magna system.
Bourgeois, Yann; Roulin, Anne C; Müller, Kristina; Ebert, Dieter
2017-04-01
Because parasitism is thought to play a major role in shaping host genomes, it has been predicted that genomic regions associated with resistance to parasites should stand out in genome scans, revealing signals of selection above the genomic background. To test whether parasitism is indeed such a major factor in host evolution and to better understand host-parasite interaction at the molecular level, we studied genome-wide polymorphisms in 97 genotypes of the planktonic crustacean Daphnia magna originating from three localities across Europe. Daphnia magna is known to coevolve with the bacterial pathogen Pasteuria ramosa for which host genotypes (clonal lines) are either resistant or susceptible. Using association mapping, we identified two genomic regions involved in resistance to P. ramosa, one of which was already known from a previous QTL analysis. We then performed a naïve genome scan to test for signatures of positive selection and found that the two regions identified with the association mapping further stood out as outliers. Several other regions with evidence for selection were also found, but no link between these regions and phenotypic variation could be established. Our results are consistent with the hypothesis that parasitism is driving host genome evolution. © 2017 The Author(s). Evolution © 2017 The Society for the Study of Evolution.
Core-satellite species hypothesis and native versus exotic species in secondary succession
Martinez, Kelsey A.; Gibson, David J.; Middleton, Beth A.
2015-01-01
A number of hypotheses exist to explain species’ distributions in a landscape, but these hypotheses are not frequently utilized to explain the differences in native and exotic species distributions. The core-satellite species (CSS) hypothesis predicts species occupancy will be bimodally distributed, i.e., many species will be common and many species will be rare, but does not explicitly consider exotic species distributions. The parallel dynamics (PD) hypothesis predicts that regional occurrence patterns of exotic species will be similar to native species. Together, the CSS and PD hypotheses may increase our understanding of exotic species’ distribution relative to natives. We selected an old field undergoing secondary succession to study the CSS and PD hypotheses in conjunction with each other. The ratio of exotic to native species (richness and abundance) was observed through 17 years of secondary succession. We predicted species would be bimodally distributed and that exotic:native species ratios would remain steady or decrease through time under frequent disturbance. In contrast to the CSS and PD hypotheses, native species occupancies were not bimodally distributed at the site, but exotic species were. The exotic:native species ratios for both richness (E:Nrichness) and abundance (E:Ncover) generally decreased or remained constant throughout supporting the PD hypothesis. Our results suggest exotic species exhibit metapopulation structure in old field landscapes, but that metapopulation structures of native species are disrupted, perhaps because these species are dispersal limited in the fragmented landscape.
Kotakis, Christos
2015-01-01
"Ars longa, vita brevis" (Hippocrates). Chloroplasts and mitochondria are genetically semi-autonomous organelles inside the plant cell. These constructions formed after endosymbiosis and keep evolving throughout the history of life. Experimental evidence is provided for active non-coding RNAs (ncRNAs) in these prokaryote-like structures, and a possible functional imprinting on cellular electrophysiology by those RNA entities is described. Furthermore, updated knowledge on RNA metabolism of organellar genomes uncovers novel inter-communication bridges with the nucleus. This class of RNA molecules is considered as a unique ontogeny which transforms their biological role as a genetic rheostat into a synchronous biochemical one that can affect the energetic charge and redox homeostasis inside cells. A hypothesis is proposed where such modulation by non-coding RNAs is integrated with genetic signals regulating gene transfer. The implications of this working hypothesis are discussed, with particular reference to the involvement of ncRNAs in the evolution of the organellar and nuclear genomes, since their integrity is functionally coupled with redox signals in photosynthetic organisms.
Wertz, J.; Caspi, A.; Belsky, D. W.; Beckley, A. L.; Arseneault, L.; Barnes, J. C.; Corcoran, D. L.; Hogan, S.; Houts, R. M.; Morgan, N.; Odgers, C. L.; Prinz, J. A.; Sugden, K.; Williams, B. S.; Poulton, R.; Moffitt, T. E.
2018-01-01
Drawing on psychological and sociological theories of crime causation, we tested the hypothesis that genetic risk for low educational attainment (assessed via a genome-wide polygenic score) is associated with offending. We further tested hypotheses of how polygenic risk relates to the development of antisocial behavior from childhood through adulthood. Across the Dunedin and E-Risk birth cohorts of individuals growing up 20 years and 20,000 kilometres apart, education polygenic scores predicted risk of a criminal record, with modest effects. Polygenic risk manifested during primary schooling, in lower cognitive abilities, lower self-control, academic difficulties, and truancy, and predicted a life-course persistent pattern of antisocial behavior that onsets in childhood and persists into adulthood. Crime is central in the nature/nurture debate, and findings reported here demonstrate how molecular-genetic discoveries can be incorporated into established theories of antisocial behavior. They also suggest the hypothesis that improving school experiences might prevent genetic influences on crime from unfolding. PMID:29513605
Parasite Prevalence and the Distribution of Intelligence among the States of the USA
ERIC Educational Resources Information Center
Eppig, Christopher; Fincher, Corey L.; Thornhill, Randy
2011-01-01
In this study, we tested the parasite-stress hypothesis for the distribution of intelligence among the USA states: the hypothesis proposes that intelligence emerges from a developmental trade-off between maximizing brain vs. immune function. From this we predicted that among the USA states where infectious disease stress was high, average…
Barcodes for genomes and applications
Zhou, Fengfeng; Olman, Victor; Xu, Ying
2008-01-01
Background Each genome has a stable distribution of the combined frequency for each k-mer and its reverse complement measured in sequence fragments as short as 1000 bps across the whole genome, for 1
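The quantity this record describes, the combined frequency of each k-mer and its reverse complement within a sequence fragment, can be computed as sketched below. The function names are hypothetical, the choice to key each pair by its lexicographically smaller member is a convenience assumed here, and k is left as a parameter because the range of k in the abstract is cut off.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def combined_kmer_frequencies(fragment, k):
    """Frequency of each k-mer pooled with its reverse complement.

    Each k-mer and its reverse complement share one entry, keyed by the
    lexicographically smaller of the two strings.
    """
    counts = Counter()
    for i in range(len(fragment) - k + 1):
        kmer = fragment[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other symbols
            counts[min(kmer, reverse_complement(kmer))] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


# Tiny illustration; real fragments would be on the order of 1000 bp.
print(combined_kmer_frequencies("ACGT", 2))
```

Computing this vector for successive fragments along a genome gives one row per fragment, which is the kind of table whose stability across the genome the abstract refers to.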
Small homologous blocks in Phytophthora genomes do not point to an ancient whole-genome duplication.
van Hooff, Jolien J E; Snel, Berend; Seidl, Michael F
2014-05-01
Genomes of the plant-pathogenic genus Phytophthora are characterized by small duplicated blocks consisting of two consecutive genes (2HOM blocks) and by an elevated abundance of similarly aged gene duplicates. Both properties, in particular the presence of 2HOM blocks, have been attributed to a whole-genome duplication (WGD) at the last common ancestor of Phytophthora. However, large intraspecies synteny (compelling evidence for a WGD) has not been detected. Here, we revisited the WGD hypothesis by deducing the age of 2HOM blocks. Two independent timing methods reveal that the majority of 2HOM blocks arose after divergence of the Phytophthora lineages. In addition, a large proportion of the 2HOM block copies colocalize on the same scaffold. Therefore, the presence of 2HOM blocks does not support a WGD at the last common ancestor of Phytophthora. Thus, genome evolution of Phytophthora is likely driven by alternative mechanisms, such as bursts of transposon activity.
Plastid genome structure and loss of photosynthetic ability in the parasitic genus Cuscuta.
Revill, Meredith J W; Stanley, Susan; Hibberd, Julian M
2005-09-01
The genus Cuscuta (dodder) is composed of parasitic plants, some species of which appear to be losing the ability to photosynthesize. A molecular phylogeny was constructed using 15 species of Cuscuta in order to assess whether changes in photosynthetic ability and alterations in structure of the plastid genome relate to phylogenetic position within the genus. The molecular phylogeny provides evidence for four major clades within Cuscuta. Although DNA blot analysis showed that Cuscuta species have smaller plastid genomes than tobacco, and that plastome size varied significantly even within one Cuscuta clade, dot blot analysis indicated that the dodders possess homologous sequence to 101 genes from the tobacco plastome. Evidence is provided for significant rates of DNA transfer from plastid to nucleus in Cuscuta. Size and structure of Cuscuta plastid genomes, as well as photosynthetic ability, appear to vary independently of position within the phylogeny, thus supporting the hypothesis that within Cuscuta photosynthetic ability and organization of the plastid genome are changing in an unco-ordinated manner.
[Genome similarity of Baikal omul and sig].
Bychenko, O S; Sukhanova, L V; Ukolova, S S; Skvortsov, T A; Potapov, V K; Azhikina, T L; Sverdlov, E D
2009-01-01
Two members of the Baikal sig family, a lake sig (Coregonus lavaretus baicalensis Dybovsky) and omul (C. autumnalis migratorius Georgi), are close relatives that diverged from the same ancestor 10-20 thousand years ago. In this work, we studied genomic polymorphism of these two fish species. The method of subtraction hybridization (SH) did not reveal any extended sequences that are present in the sig genome but absent from the omul genome. All the fragments found by SH corresponded to polymorphic noncoding genome regions varying in mononucleotide substitutions and short deletions. Many of them map close to genes of the immune system and have regions identical to the Tc-1-like transposons abundant among fish, whose transcription activity may affect the expression of adjacent genes. Thus, we showed for the first time that genetic differences between Baikal sig family members are extremely small and cannot be revealed by the SH method. This is further support for the hypothesis of a close relationship between Baikal sig and omul and their evolutionarily recent divergence from a common ancestor.
2010-01-01
The karyotype structure of Arachis trinitensis was studied by conventional Feulgen staining, CMA/DAPI banding and rDNA loci detection by fluorescence in situ hybridization (FISH) in order to establish its genome status and test the hypothesis that this species is a genome donor of cultivated peanut. Conventional staining revealed that the karyotype lacked the small “A chromosomes” characteristic of the A genome. In agreement with this, chromosomal banding showed that none of the chromosomes had the large centromeric bands expected for A chromosomes. FISH revealed one pair each of 5S and 45S rDNA loci, located in different medium-sized metacentric chromosomes. Collectively, these results suggest that A. trinitensis should be removed from the A genome and be considered as a B or non-A genome species. The pattern of heterochromatic bands and rDNA loci of A. trinitensis differ markedly from any of the complements of A. hypogaea, suggesting that the former species is unlikely to be one of the wild diploid progenitors of the latter. PMID:21637581
Finkler, Fabrine; de Lima, Diane Alves; Cerva, Cristine; Cibulski, Samuel Paulo; Teixeira, Thais Fumaco; Dos Santos, Helton Fernandes; de Almeida, Laura Lopes; Roehe, Paulo Michel; Franco, Ana Cláudia
2016-12-01
Chicken parvovirus (ChPV) has been associated with malabsorption syndrome (MAS) in broilers. However, the participation of this virus in this syndrome is unclear, since it may be detected in both diseased and healthy chickens. In the course of these studies, the question arose whether ChPV genome loads might be correlated with the occurrence of MAS. To test this hypothesis, a SYBR green-based quantitative polymerase chain reaction was developed to detect and quantify ChPV genomes. Cloacal swabs from 68 broilers with MAS and 59 from healthy animals were collected from different poultry farms. Genomes of ChPV were detected in all samples, regardless of health status. However, viral genome loads in MAS-affected broilers were significantly higher (1 × 10⁵ genome copies per 100 ng DNA) than in healthy animals (1.3 × 10³ genome copies per 100 ng DNA). These findings indicate that there is an association between high ChPV genome loads and the occurrence of MAS in broilers.
The sea lamprey meiotic map improves resolution of ancient vertebrate genome duplications.
Smith, Jeramiah J; Keinath, Melissa C
2015-08-01
It is generally accepted that many genes present in vertebrate genomes owe their origin to two whole-genome duplications that occurred deep in the ancestry of the vertebrate lineage. However, details regarding the timing and outcome of these duplications are not well resolved. We present high-density meiotic and comparative genomic maps for the sea lamprey (Petromyzon marinus), a representative of an ancient lineage that diverged from all other vertebrates ∼550 million years ago. Linkage analyses yielded a total of 95 linkage groups, similar to the estimated number of germline chromosomes (1n ∼ 99), spanning a total of 5570.25 cM. Comparative mapping data yield strong support for the hypothesis that a single whole-genome duplication occurred in the basal vertebrate lineage, but do not strongly support a hypothetical second event. Rather, these comparative maps reveal several evolutionarily independent segmental duplications occurring over the last 600+ million years of chordate evolution. This refined history of vertebrate genome duplication should permit more precise investigations of vertebrate evolution. © 2015 Smith and Keinath; Published by Cold Spring Harbor Laboratory Press.
Internet Versus Virtual Reality Settings for Genomics Information Provision.
Persky, Susan; Kistler, William D; Klein, William M P; Ferrer, Rebecca A
2018-06-22
Current models of genomic information provision will be unable to handle large-scale clinical integration of genomic information, as may occur in primary care settings. Therefore, adoption of digital tools for genetic and genomic information provision is anticipated, primarily using Internet-based, distributed approaches. The emerging consumer communication platform of virtual reality (VR) is another potential intermediate approach between face-to-face and distributed Internet platforms to engage in genomics education and information provision. This exploratory study assessed whether provision of genomics information about body weight in a simulated, VR-based consultation (relative to a distributed, Internet platform) would be associated with differences in health behavior-related attitudes and beliefs, and interpersonal reactions to the avatar-physician. We also assessed whether outcomes differed depending upon whether genomic versus lifestyle-oriented information was conveyed. There were significant differences between communication platforms for all health behavior-oriented outcomes. Following communication in the VR setting, participants reported greater self-efficacy, dietary behavioral intentions, and exercise behavioral intentions than in the Internet-based setting. There were no differences in trust of the physician by setting, and no interaction between setting effects and the content of the information. This study was a first attempt to examine the potential capabilities of a VR-based communication setting for conveying genomic content in the context of weight management. There may be benefits to use of VR settings for communication about genomics, as well as more traditional health information, when it comes to influencing the attitudes and beliefs that underlie healthy lifestyle behaviors.
Longitudinal Dimensionality of Adolescent Psychopathology: Testing the Differentiation Hypothesis
ERIC Educational Resources Information Center
Sterba, Sonya K.; Copeland, William; Egger, Helen L.; Costello, E. Jane; Erkanli, Alaattin; Angold, Adrian
2010-01-01
Background: The differentiation hypothesis posits that the underlying liability distribution for psychopathology is of low dimensionality in young children, inflating diagnostic comorbidity rates, but increases in dimensionality with age as latent syndromes become less correlated. This hypothesis has not been adequately tested with longitudinal…
Within-genome evolution of REPINs: a new family of miniature mobile DNA in bacteria.
Bertels, Frederic; Rainey, Paul B
2011-06-01
Repetitive sequences are a conserved feature of many bacterial genomes. While first reported almost thirty years ago, and frequently exploited for genotyping purposes, little is known about their origin, maintenance, or processes affecting the dynamics of within-genome evolution. Here, beginning with analysis of the diversity and abundance of short oligonucleotide sequences in the genome of Pseudomonas fluorescens SBW25, we show that over-represented short sequences define three distinct groups (GI, GII, and GIII) of repetitive extragenic palindromic (REP) sequences. Patterns of REP distribution suggest that closely linked REP sequences form a functional replicative unit: REP doublets are over-represented, randomly distributed in extragenic space, and more highly conserved than singlets. In addition, doublets are organized as inverted repeats, which together with intervening spacer sequences are predicted to form hairpin structures in ssDNA or mRNA. We refer to these newly defined entities as REPINs (REP doublets forming hairpins) and identify short reads from population sequencing that reveal putative transposition intermediates. The proximal relationship between GI, GII, and GIII REPINs and specific REP-associated tyrosine transposases (RAYTs), combined with features of the putative transposition intermediate, suggests a mechanism for within-genome dissemination. Analysis of the distribution of REPs in a range of RAYT-containing bacterial genomes, including Escherichia coli K-12 and Nostoc punctiforme, shows that REPINs are a widely distributed, but hitherto unrecognized, family of miniature non-autonomous mobile DNA.
Chang, Suhua; Zhang, Jiajie; Liao, Xiaoyun; Zhu, Xinxing; Wang, Dahai; Zhu, Jiang; Feng, Tao; Zhu, Baoli; Gao, George F; Wang, Jian; Yang, Huanming; Yu, Jun; Wang, Jing
2007-01-01
Frequent outbreaks of highly pathogenic avian influenza and the increasing data available for comparative analysis require a central database specialized in influenza viruses (IVs). We have established the Influenza Virus Database (IVDB) to integrate information and create an analysis platform for genetic, genomic, and phylogenetic studies of the virus. IVDB hosts complete genome sequences of influenza A virus generated by Beijing Institute of Genomics (BIG) and curates all other published IV sequences after expert annotation. Our Q-Filter system classifies and ranks all nucleotide sequences into seven categories according to sequence content and integrity. IVDB provides a series of tools and viewers for comparative analysis of the viral genomes, genes, genetic polymorphisms and phylogenetic relationships. A search system has been developed for users to retrieve a combination of different data types by setting search options. To facilitate analysis of global viral transmission and evolution, the IV Sequence Distribution Tool (IVDT) has been developed to display the worldwide geographic distribution of chosen viral genotypes and to couple genomic data with epidemiological data. The BLAST, multiple sequence alignment and phylogenetic analysis tools were integrated for online data analysis. Furthermore, IVDB offers instant access to pre-computed alignments and polymorphisms of IV genes and proteins, and presents the results as SNP distribution plots and minor allele distributions. IVDB is publicly available at http://influenza.genomics.org.cn.
Pattin, Kristine A.; White, Bill C.; Barney, Nate; Gui, Jiang; Nelson, Heather H.; Kelsey, Karl R.; Andrew, Angeline S.; Karagas, Margaret R.; Moore, Jason H.
2008-01-01
Multifactor dimensionality reduction (MDR) was developed as a nonparametric and model-free data mining method for detecting, characterizing, and interpreting epistasis in the absence of significant main effects in genetic and epidemiologic studies of complex traits such as disease susceptibility. The goal of MDR is to change the representation of the data using a constructive induction algorithm to make nonadditive interactions easier to detect using any classification method such as naïve Bayes or logistic regression. Traditionally, MDR-constructed variables have been evaluated with a naïve Bayes classifier that is combined with 10-fold cross validation to obtain an estimate of predictive accuracy or generalizability of epistasis models. Traditionally, we have used permutation testing to statistically evaluate the significance of models obtained through MDR. The advantage of permutation testing is that it controls for false-positives due to multiple testing. The disadvantage is that permutation testing is computationally expensive. This is an important issue that arises in the context of detecting epistasis on a genome-wide scale. The goal of the present study was to develop and evaluate several alternatives to large-scale permutation testing for assessing the statistical significance of MDR models. Using data simulated from 70 different epistasis models, we compared the power and type I error rate of MDR using a 1000-fold permutation test with hypothesis testing using an extreme value distribution (EVD). We find that this new hypothesis testing method provides a reasonable alternative to the computationally expensive 1000-fold permutation test and is 50 times faster. We then demonstrate this new method by applying it to a genetic epidemiology study of bladder cancer susceptibility that was previously analyzed using MDR and assessed using a 1000-fold permutation test. PMID:18671250
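The contrast between a full permutation test and an extreme-value approximation can be illustrated with a minimal sketch. It is not the authors' MDR implementation: the test statistic (`max_abs_mean_diff`), the method-of-moments Gumbel fit, and all function names are assumptions chosen for illustration. The idea shown is that a statistic defined as a maximum over many candidate models can have its null distribution approximated by a Gumbel curve fitted to a small number of permutations, instead of counting exceedances over a large number of them.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def max_abs_mean_diff(features, labels):
    """Toy statistic (hypothetical): largest |mean(cases) - mean(controls)| over features."""
    best = 0.0
    for column in features:
        cases = [x for x, y in zip(column, labels) if y == 1]
        controls = [x for x, y in zip(column, labels) if y == 0]
        diff = abs(statistics.fmean(cases) - statistics.fmean(controls))
        best = max(best, diff)
    return best


def permutation_null(features, labels, n_perm, rng):
    """Null distribution of the statistic obtained by shuffling the labels."""
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(max_abs_mean_diff(features, shuffled))
    return null


def empirical_p(observed, null):
    """Classical permutation p-value with the usual +1 correction."""
    return (1 + sum(s >= observed for s in null)) / (1 + len(null))


def gumbel_p(observed, null):
    """Upper-tail p-value from a Gumbel fitted to the null by method of moments."""
    beta = statistics.stdev(null) * math.sqrt(6) / math.pi
    mu = statistics.fmean(null) - EULER_GAMMA * beta
    return 1.0 - math.exp(-math.exp(-(observed - mu) / beta))
```

In this sketch `empirical_p` needs many permutations to resolve small p-values, whereas `gumbel_p` can be evaluated from a short permutation run; whether the Gumbel shape is adequate for a given statistic has to be checked separately.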
Kuleshov, Konstantin V; Kostikova, Anna; Pisarenko, Sergey V; Kovalev, Dmitry A; Tikhonov, Sergey N; Savelieva, Irina V; Saveliev, Vilory N; Vasilieva, Oksana V; Zinich, Liliia S; Pidchenko, Nadiia N; Kulichenko, Alexander N; Shipulin, German A
2016-10-01
Cholera is a water-borne, severe enteric infection essentially caused by toxigenic strains of Vibrio cholerae O1 and O139 serogroups. An outbreak of cholera was registered during May-July 2011 in Mariupol, Ukraine, with 33 cholera cases and 25 carriers of cholera. Following this outbreak, the toxigenic strain of V. cholerae 2011EL-301 was isolated from seawater in the recreation area of Taganrog city on the territory of Russia. The aim of our study was to understand genomic features of Mariupol isolates as well as to evaluate the hypothesis of a possible connection between the outbreak of cholera in Mariupol and the single case of isolation of V. cholerae from the Sea of Azov in Russia. Mariupol isolates were phenotypically characterized and subsequently subjected to whole genome sequencing. Phylogenetic analysis based on high-quality SNPs (hqSNPs) of V. cholerae O1 El Tor isolates of the 7th pandemic clade from different regions showed that clinical and environmental isolates from the Mariupol outbreak were attributable to a unique phylogenetic clade within wave 3 of V. cholerae O1 El Tor isolates and characterized by six clade-specific SNPs. The Taganrog isolate, in contrast, belonged to a distantly related clade, which allows us to reject the hypothesis of transmission of the outbreak strain of V. cholerae O1 from Ukraine to Russia in 2011. Mariupol isolates shared a common ancestor with the Haiti\Nepal-4\India clade, indicating that the outbreak progenitor strain most likely originated in the South Asia region and was later introduced to Ukraine. Moreover, genomic data based on both hqSNPs and similarity of virulence-associated mobile genomic elements of Mariupol isolates suggest that environmental and clinical isolates are part of a joint outbreak, which confirms the role of contaminated domestic sewage as an element of the complex chain of infection spread during a cholera outbreak. In general, the genome-wide comparative analysis of both genes and genomic regions of epidemiological importance indicates that these isolates belong to the 'new' clone of the toxigenic, multidrug-resistant atypical variant of V. cholerae O1 El Tor. Copyright © 2016 Elsevier B.V. All rights reserved.
We report the draft genome sequences of six Mycobacterium immunogenum isolated from a chloraminated drinking water distribution system simulator subjected to changes in operational parameters. M. immunogenum, a rapidly growing mycobacteria previously reported as the cause of hyp...
Maintenance of genetic diversity through plant-herbivore interactions
Gloss, Andrew D.; Dittrich, Anna C. Nelson; Goldman-Huertas, Benjamin; Whiteman, Noah K.
2013-01-01
Identifying the factors governing the maintenance of genetic variation is a central challenge in evolutionary biology. New genomic data, methods and conceptual advances provide increasing evidence that balancing selection, mediated by antagonistic species interactions, maintains functionally-important genetic variation within species and natural populations. Because diverse interactions between plants and herbivorous insects dominate terrestrial communities, they provide excellent systems to address this hypothesis. Population genomic studies of Arabidopsis thaliana and its relatives suggest spatial variation in herbivory maintains adaptive genetic variation controlling defense phenotypes, both within and among populations. Conversely, inter-species variation in plant defenses promotes adaptive genetic variation in herbivores. Emerging genomic model herbivores of Arabidopsis could illuminate how genetic variation in herbivores and plants interact simultaneously. PMID:23834766
The Brain as a Distributed Intelligent Processing System: An EEG Study
da Rocha, Armando Freitas; Rocha, Fábio Theoto; Massad, Eduardo
2011-01-01
Background Various neuroimaging studies, both structural and functional, have provided support for the proposal that a distributed brain network is likely to be the neural basis of intelligence. The theory of Distributed Intelligent Processing Systems (DIPS), first developed in the field of Artificial Intelligence, was proposed to adequately model distributed neural intelligent processing. In addition, the neural efficiency hypothesis suggests that individuals with higher intelligence display more focused cortical activation during cognitive performance, resulting in lower total brain activation when compared with individuals who have lower intelligence. This may be understood as a property of the DIPS. Methodology and Principal Findings In our study, a new EEG brain mapping technique, based on the neural efficiency hypothesis and the notion of the brain as a Distributed Intelligence Processing System, was used to investigate the correlations between IQ evaluated with WAIS (Wechsler Adult Intelligence Scale) and WISC (Wechsler Intelligence Scale for Children), and the brain activity associated with visual and verbal processing, in order to test the validity of a distributed neural basis for intelligence. Conclusion The present results support these claims and the neural efficiency hypothesis. PMID:21423657
Nullomers and High Order Nullomers in Genomic Sequences
Vergni, Davide; Santoni, Daniele
2016-01-01
A nullomer is an oligomer that does not occur as a subsequence in a given DNA sequence, i.e. it is an absent word of that sequence. The importance of nullomers in several applications, from drug discovery to forensic practice, is now debated in the literature. Here, we investigated the nature of nullomers, whether their absence in genomes has just a statistical explanation or it is a peculiar feature of genomic sequences. We introduced an extension of the notion of nullomer, namely high order nullomers, which are nullomers whose mutated sequences are still nullomers. We studied different aspects of them: comparison with nullomers of random sequences, CpG distribution and mean helical rise. In agreement with previous results we found that the number of nullomers in the human genome is much larger than expected by chance. Nevertheless antithetical results were found when considering a random DNA sequence preserving dinucleotide frequencies. The analysis of CpG frequencies in nullomers and high order nullomers revealed, as expected, a high CpG content but it also highlighted a strong dependence of CpG frequencies on the dinucleotide position, suggesting that nullomers have their own peculiar structure and are not simply sequences whose CpG frequency is biased. Furthermore, phylogenetic trees were built on eleven species based on both the similarities between the dinucleotide frequencies and the number of nullomers two species share, showing that nullomers are fairly conserved among close species. Finally the study of mean helical rise of nullomers sequences revealed significantly high mean rise values, reinforcing the hypothesis that those sequences have some peculiar structural features. The obtained results show that nullomers are the consequence of the peculiar structure of DNA (also including biased CpG frequency and CpGs islands), so that the hypermutability model, also taking into account CpG islands, seems to be not sufficient to explain nullomer phenomenon. 
Finally, high order nullomers could emphasize those features that already make simple nullomers useful in several applications. PMID:27906971
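The two definitions in this abstract can be made concrete with a small sketch. It assumes a plain A/C/G/T alphabet, a single strand (reverse complements are ignored), and reads "mutated sequences" as all single-base substitutions of a word; that last reading, along with the function names, is an assumption for illustration and may differ from the order definition used by the authors.

```python
from itertools import product

ALPHABET = "ACGT"


def observed_kmers(sequence, k):
    """Set of all length-k words that occur in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def nullomers(sequence, k):
    """Length-k words over ACGT that never occur in the sequence."""
    every_word = {"".join(letters) for letters in product(ALPHABET, repeat=k)}
    return every_word - observed_kmers(sequence.upper(), k)


def single_substitution_neighbors(word):
    """All words that differ from `word` at exactly one position."""
    for i, base in enumerate(word):
        for alt in ALPHABET:
            if alt != base:
                yield word[:i] + alt + word[i + 1:]


def high_order_nullomers(sequence, k):
    """Nullomers whose single-substitution neighbors are all nullomers too (assumed definition)."""
    absent = nullomers(sequence, k)
    return {
        word for word in absent
        if all(neighbor in absent for neighbor in single_substitution_neighbors(word))
    }
```

Enumerating all 4^k words is only practical for short k; for genome-scale work a counting structure over observed k-mers would be needed instead.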
NASA Astrophysics Data System (ADS)
Thompson, E. M.; Hewlett, J. B.; Baise, L. G.; Vogel, R. M.
2011-01-01
Annual maximum (AM) time series are incomplete (i.e., censored) when no events are included above the assumed censoring threshold (i.e., magnitude of completeness). We introduce a distributional hypothesis test for left-censored Gumbel observations based on the probability plot correlation coefficient (PPCC). Critical values of the PPCC hypothesis test statistic are computed from Monte-Carlo simulations and are a function of sample size, censoring level, and significance level. When applied to a global catalog of earthquake observations, the left-censored Gumbel PPCC tests are unable to reject the Gumbel hypothesis for 45 of 46 seismic regions. We apply four different field significance tests for combining individual tests into a collective hypothesis test. None of the field significance tests are able to reject the global hypothesis that AM earthquake magnitudes arise from a Gumbel distribution. Because the field significance levels are not conclusive, we also compute the likelihood that these field significance tests are unable to reject the Gumbel model when the samples arise from a more complex distributional alternative. A power study documents that the censored Gumbel PPCC test is unable to reject some important and viable Generalized Extreme Value (GEV) alternatives. Thus, we cannot rule out the possibility that the global AM earthquake time series could arise from a GEV distribution with a finite upper bound, also known as a reverse Weibull distribution. Our power study also indicates that the binomial and uniform field significance tests are substantially more powerful than the more commonly used Bonferroni and false discovery rate multiple comparison procedures.
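A rough sketch of a PPCC statistic for left-censored Gumbel data, with Monte-Carlo critical values, is given here for orientation. The Gringorten plotting position, the way censored ranks are skipped, and the function names are assumptions made for this illustration; the abstract does not specify the authors' exact choices.

```python
import math
import random


def gumbel_quantile(p):
    """Standard Gumbel quantile for a non-exceedance probability p in (0, 1)."""
    return -math.log(-math.log(p))


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def censored_gumbel_ppcc(observed, n_censored=0):
    """PPCC between the observed (uncensored) order statistics and Gumbel quantiles.

    `observed` holds values above the censoring threshold; `n_censored` is the
    number of values known only to lie below it. Ranks count the censored values.
    """
    n = len(observed) + n_censored
    ordered = sorted(observed)
    # Gringorten plotting positions (an assumption of this sketch).
    quantiles = [gumbel_quantile((rank - 0.44) / (n + 0.12))
                 for rank in range(n_censored + 1, n + 1)]
    return pearson(ordered, quantiles)


def critical_value(n, n_censored, alpha, n_sim, rng):
    """Monte-Carlo lower-tail critical value; reject Gumbel if PPCC falls below it."""
    simulated = []
    for _ in range(n_sim):
        # max(...) guards against the rare draw of exactly 0.0.
        full = sorted(gumbel_quantile(max(rng.random(), 1e-300)) for _ in range(n))
        simulated.append(censored_gumbel_ppcc(full[n_censored:], n_censored))
    simulated.sort()
    return simulated[int(alpha * n_sim)]
```

As in the abstract, the critical value depends on sample size, censoring level, and significance level, so it has to be re-simulated for each combination.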
Phylogenomic reconstruction supports supercontinent origins for Leishmania.
Harkins, Kelly M; Schwartz, Rachel S; Cartwright, Reed A; Stone, Anne C
2016-03-01
Leishmania, a genus of parasites transmitted to human hosts and mammalian/reptilian reservoirs by an insect vector, is the causative agent of the human disease complex leishmaniasis. The evolutionary relationships within the genus Leishmania and its origins are the source of ongoing debate, reflected in conflicting phylogenetic and biogeographic reconstructions. This study employs a recently described bioinformatics method, SISRS, to identify over 200,000 informative sites across the genome from newly sequenced and publicly available Leishmania data. This dataset is used to reconstruct the evolutionary relationships of this genus. Additionally, we constructed a large multi-gene dataset, using it to reconstruct the phylogeny and estimate divergence dates for species. We conclude that the genus Leishmania evolved at least 90-100 million years ago, supporting a modified version of the Multiple Origins hypothesis that we call the Supercontinent hypothesis. According to this scenario, separate Leishmania clades emerged prior to, and during, the breakup of Gondwana. Additionally, we confirm that reptile-infecting Leishmania are derived from mammalian forms and that the species that infect porcupines and sloths form a clade long separated from other species. Finally, we firmly place the guinea-pig infecting species, Leishmania enriettii, the globally dispersed Leishmania siamensis, and the newly identified Australian species from a kangaroo, as sibling species whose distribution arises from the ancient connection between Australia, Antarctica, and South America. Copyright © 2015 Elsevier B.V. All rights reserved.
Effects of spaceflight on the immunoglobulin repertoire of unimmunized C57BL/6 mice
NASA Astrophysics Data System (ADS)
Ward, Claire; Rettig, Trisha A.; Hlavacek, Savannah; Bye, Bailey A.; Pecaut, Michael J.; Chapes, Stephen K.
2018-02-01
Spaceflight has been shown to suppress the adaptive immune response, altering the distribution and function of lymphocyte populations. B lymphocytes express highly specific and highly diversified receptors, known as immunoglobulins (Ig), that directly bind and neutralize pathogens. Ig diversity is achieved through the enzymatic splicing of gene segments within the genomic DNA of each B cell in a host. The collection of Ig specificities within a host, or Ig repertoire, has been increasingly characterized in both basic research and clinical settings using high-throughput sequencing technology (HTS). We utilized HTS to test the hypothesis that spaceflight affects the B-cell repertoire. To test this hypothesis, we characterized the impact of spaceflight on the unimmunized Ig repertoire of C57BL/6 mice that were flown aboard the International Space Station (ISS) during the Rodent Research One validation flight in comparison to ground controls. Individual gene segment usage was similar between ground control and flight animals; however, gene segment combinations and the junctions in which gene segments combine varied among animals within and between treatment groups. We also found that spontaneous somatic mutations in the IgH and Igκ gene loci were not increased. These data suggest that spaceflight did not affect the B-cell repertoire of mice flown and housed on the ISS over a short period of time.
Pacheco-Arjona, Jose Ramon; Ramirez-Prado, Jorge Humberto
2014-01-01
The cell wall is a protective and versatile structure distributed in all fungi. The component responsible for its rigidity is chitin, a product of chitin synthase (Chsp) enzymes. There are seven classes of chitin synthase genes (CHS) and the amount and type encoded in fungal genomes varies considerably from one species to another. Previous Chsp sequence analyses focused on their study as individual units, regardless of genomic context. The identification of blocks of conserved genes between genomes can provide important clues about the interactions and localization of chitin synthases. In the present study, we carried out an in silico search of all putative Chsp encoded in 54 full fungal genomes, encompassing 21 orders from five phyla. Phylogenetic studies of these Chsp were able to confidently classify 347 out of the 369 Chsp identified (94%). Patterns in the distribution of Chsp related to taxonomy were identified, the most prominent being related to the type of fungal growth. More importantly, a synteny analysis for genomic blocks centered on class IV Chsp (the most abundant and widely distributed Chsp class) identified a putative cell wall metabolism gene cluster in members of the genus Aspergillus, the first such association reported for any fungal genome. PMID:25148134
The Missing Link of Jewish European Ancestry: Contrasting the Rhineland and the Khazarian Hypotheses
Elhaik, Eran
2013-01-01
The question of Jewish ancestry has been the subject of controversy for over two centuries and has yet to be resolved. The “Rhineland hypothesis” depicts Eastern European Jews as a “population isolate” that emerged from a small group of German Jews who migrated eastward and expanded rapidly. Alternatively, the “Khazarian hypothesis” suggests that Eastern European Jews descended from the Khazars, an amalgam of Turkic clans that settled the Caucasus in the early centuries CE and converted to Judaism in the 8th century. Mesopotamian and Greco–Roman Jews continuously reinforced the Judaized empire until the 13th century. Following the collapse of their empire, the Judeo–Khazars fled to Eastern Europe. The rise of European Jewry is therefore explained by the contribution of the Judeo–Khazars. Thus far, however, the Khazars’ contribution has been estimated only empirically, as the absence of genome-wide data from Caucasus populations precluded testing the Khazarian hypothesis. Recent sequencing of modern Caucasus populations prompted us to revisit the Khazarian hypothesis and compare it with the Rhineland hypothesis. We applied a wide range of population genetic analyses to compare these two hypotheses. Our findings support the Khazarian hypothesis and portray the European Jewish genome as a mosaic of Near Eastern-Caucasus, European, and Semitic ancestries, thereby consolidating previous contradictory reports of Jewish ancestry. We further describe a major difference among Caucasus populations explained by the early presence of Judeans in the Southern and Central Caucasus. Our results have important implications for the demographic forces that shaped the genetic diversity in the Caucasus and for medical studies. PMID:23241444
Grosche, Christopher; Funk, Helena T.; Maier, Uwe G.; Zauner, Stefan
2012-01-01
RNA editing is a post-transcriptional process that can act upon transcripts from mitochondrial, nuclear, and chloroplast genomes. In chloroplasts, single-nucleotide conversions in mRNAs via RNA editing occur at different frequencies across the plant kingdom. These range from several hundred edited sites in some mosses and ferns to lower frequencies in seed plants and the complete lack of RNA editing in the liverwort Marchantia polymorpha. Here, we report the sequence and edited sites of the chloroplast genome from the liverwort Pellia endiviifolia. The type and frequency of chloroplast RNA editing display a pattern highly similar to that in seed plants. Analyses of the C to U conversions and the genomic context in which the editing sites are embedded provide evidence in favor of the hypothesis that chloroplast RNA editing evolved to compensate mutations in the first land plants. PMID:23221608
Incorporation of a horizontally transferred gene into an operon during cnidarian evolution.
Dana, Catherine E; Glauber, Kristine M; Chan, Titus A; Bridge, Diane M; Steele, Robert E
2012-01-01
Genome sequencing has revealed examples of horizontally transferred genes, but we still know little about how such genes are incorporated into their host genomes. We have previously reported the identification of a gene (flp) that appears to have entered the Hydra genome through horizontal transfer. Here we provide additional evidence in support of our original hypothesis that the transfer was from a unicellular organism, and we show that the transfer occurred in an ancestor of two medusozoan cnidarian species. In addition we show that the gene is part of a bicistronic operon in the Hydra genome. These findings identify a new animal phylum in which trans-spliced leader addition has led to the formation of operons, and define the requirements for evolution of an operon in Hydra. The identification of operons in Hydra also provides a tool that can be exploited in the construction of transgenic Hydra strains.
Zhang, Yunxia; Cheng, Chunyan; Li, Ji; Yang, Shuqiong; Wang, Yunzhu; Li, Ziang; Chen, Jinfeng; Lou, Qunfeng
2015-09-25
Differentiation and copy number of repetitive sequences directly affect chromosome structure, which contributes to reproductive isolation and speciation. Comparative cytogenetic mapping has been verified as an efficient tool to elucidate the differentiation and distribution of repetitive sequences in genomes. In the present study, the distinct chromosomal structures of five Cucumis species were revealed through the genomic in situ hybridization (GISH) technique and comparative cytogenetic mapping of major satellite repeats. Chromosome structures of five Cucumis species were investigated using GISH and comparative mapping of specific satellites. Southern hybridization was employed to study the proliferation of satellites, whose structural characteristics were helpful for analyzing chromosome evolution. Preferential distribution of repetitive DNAs at the subtelomeric regions was found in C. sativus, C. hystrix and C. metuliferus, while the majority was positioned at the pericentromeric heterochromatin regions in C. melo and C. anguria. Further, comparative GISH (cGISH) using genomic DNA of other species as probes revealed high homology of repeats between C. sativus and C. hystrix. Specific satellites including 45S rDNA, Type I/II, Type III, Type IV, CentM and telomeric repeat were then comparatively mapped in these species. Type I/II and Type IV produced bright signals at the subtelomeric regions of C. sativus and C. hystrix simultaneously, which might explain the significance of their amplification in the divergence of Cucumis subgenus from the ancient ancestor. Unique positioning of Type III and CentM only at the centromeric domains of C. sativus and C. melo, respectively, combined with unique Southern bands, revealed rapid evolutionary patterns of centromeric DNA in Cucumis. Obvious interstitial telomeric repeats were observed in chromosomes 1 and 2 of C. sativus, which might provide evidence of the fusion hypothesis of chromosome evolution from x = 12 to x = 7 in Cucumis species. In addition, a significant correlation was found between gene density along the chromosome and GISH band intensity in C. sativus and C. melo. In summary, comparative cytogenetic mapping of major satellites and GISH revealed the distinct differentiation of chromosome structure during species formation. The evolution of repetitive sequences was the main force for the divergence of Cucumis species from a common ancestor.
The genomic structure: proof of the role of non-coding DNA.
Bouaynaya, Nidhal; Schonfeld, Dan
2006-01-01
We prove that the introns play the role of a decoy in absorbing mutations in the same way hollow uninhabited structures are used by the military to protect important installations. Our approach is based on a probability of error analysis, where errors are mutations which occur in the exon sequences. We derive the optimal exon length distribution, which minimizes the probability of error in the genome. Furthermore, to understand how Nature can generate the optimal distribution, we propose a diffusive random walk model for exon generation throughout evolution. This model results in an alpha stable exon length distribution, which is asymptotically equivalent to the optimal distribution. Experimental results show that both distributions accurately fit the real data. Given that introns also drive biological evolution by increasing the rate of unequal crossover between genes, we conclude that the role of introns is to maintain a genius balance between stability and adaptability in eukaryotic genomes.
Cancer: a reproductive strategy of "ultra-selfish" genes?
Schuiling, G A
2004-01-01
A hypothesis is presented in which the process of "malignant transformation" which ultimately results in the rapidly dividing tumor(s)(cells) causing "cancer", is regarded as an evolved reproductive strategy of "ultra-selfish" (proto-)(onco-) genes, already present in the genome, or introduced by a virus.
Data compression and genomes: a two-dimensional life domain map.
Menconi, Giulia; Benci, Vieri; Buiatti, Marcello
2008-07-21
We define the complexity of DNA sequences as the information content per nucleotide, calculated by means of some Lempel-Ziv data compression algorithm. It is possible to use the statistics of the complexity values of the functional regions of different complete genomes to distinguish among genomes of different domains of life (Archaea, Bacteria and Eukarya). We shall focus on the distribution function of the complexity of non-coding regions. We show that the three domains may be plotted in separate regions within the two-dimensional space where the axes are the skewness coefficient and the kurtosis coefficient of the aforementioned distribution. Preliminary results on 15 genomes are introduced.
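The two quantities named in this abstract, a compression-based complexity per nucleotide and the skewness and kurtosis of its distribution, can be sketched as follows. The abstract does not say which Lempel-Ziv variant was used, so `zlib` (DEFLATE, which combines LZ77 with Huffman coding) stands in here purely as an assumption; the function names are likewise illustrative, and the kurtosis is the plain (non-excess) moment ratio.

```python
import zlib


def complexity_per_nucleotide(sequence):
    """Compressed size in bits divided by sequence length (zlib as a stand-in compressor)."""
    data = sequence.upper().encode("ascii")
    return 8 * len(zlib.compress(data, 9)) / len(data)


def skewness_and_kurtosis(values):
    """Moment-based skewness and (non-excess) kurtosis of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def genome_coordinates(regions):
    """Map a genome's regions (list of sequences) to a (skewness, kurtosis) point."""
    complexities = [complexity_per_nucleotide(region) for region in regions]
    return skewness_and_kurtosis(complexities)
```

Short regions are dominated by the compressor's fixed header overhead, so a comparison of this kind is only meaningful for regions of comparable and reasonably large length.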
Parental genomes mix in mule and human cell nuclei.
Hepperger, Claudia; Mayer, Andreas; Merz, Julia; Vanderwall, Dirk K; Dietzel, Steffen
2009-06-01
Whether chromosome sets inherited from father and mother occupy separate spaces in the cell nucleus is a question first asked over 110 years ago. Recently, the nuclear organization of the genome has come increasingly into focus as an important level of epigenetic regulation. In this context, it is indispensable to know whether or not parental genomes are spatially separated. Genome separation had been demonstrated for plant hybrids and for the early mammalian embryo. Conclusive studies for somatic mammalian cell nuclei are lacking because homologous chromosomes from the two parents cannot be distinguished within a species. We circumvented this problem by investigating the three-dimensional distribution of chromosomes in mule lymphocytes and fibroblasts. Genomic DNA of horse and donkey was used as probes in fluorescence in situ hybridization under conditions where only tandem repetitive sequences were detected. We thus could determine the distribution of maternal and paternal chromosome sets in structurally preserved interphase nuclei for the first time. In addition, we investigated the distribution of several pairs of chromosomes in human bilobed granulocytes. Qualitative and quantitative image evaluation did not reveal any evidence for the separation of parental genomes. On the contrary, we observed mixing of maternal and paternal chromosome sets.
J. P. Copeland; K. S. McKelvey; K. B. Aubry; A. Landa; J. Persson; R. M. Inman; J. Krebs; E. Lofroth; H. Golden; J. R. Squires; A. Magoun; M. K. Schwartz; J. Wilmot; C. L. Copeland; R. E. Yates; I. Kojola; R. May
2010-01-01
We propose a fundamental geographic distribution for the wolverine (Gulo gulo (L., 1758)) based on the hypothesis that the occurrence of wolverines is constrained by their obligate association with persistent spring snow cover for successful reproductive denning and by an upper limit of thermoneutrality. To investigate this hypothesis, we developed a composite of MODIS...
Air method measurements of apple vessel length distributions with improved apparatus and theory
Shabtal Cohen; John Bennink; Mel Tyree
2003-01-01
Studies showing that rootstock dwarfing potential is related to plant hydraulic conductance led to the hypothesis that xylem properties are also related. Vessel length distribution and other properties of apple wood from a series of varieties were measured using the 'air method' in order to test this hypothesis. Apparatus was built to measure and monitor...
ePIANNO: ePIgenomics ANNOtation tool.
Liu, Chia-Hsin; Ho, Bing-Ching; Chen, Chun-Ling; Chang, Ya-Hsuan; Hsu, Yi-Chiung; Li, Yu-Cheng; Yuan, Shin-Sheng; Huang, Yi-Huan; Chang, Chi-Sheng; Li, Ker-Chau; Chen, Hsuan-Yu
2016-01-01
Recently, with the development of next generation sequencing (NGS), the combination of chromatin immunoprecipitation (ChIP) and NGS, namely ChIP-seq, has become a powerful technique to capture potential genomic binding sites of regulatory factors, histone modifications and chromatin accessible regions. For most researchers, additional information including genomic variations on the TF binding site, allele frequency of variation between different populations, variation associated disease, and other neighbour TF binding sites are essential to generate a proper hypothesis or a meaningful conclusion. Many ChIP-seq datasets have been deposited in the public domain to help researchers make new discoveries. However, researchers are often intimidated by the complexity of the data structure and the sheer volume of data. Such information would be more useful if it could be combined or downloaded with ChIP-seq data. To meet such demands, we built a webtool: ePIgenomic ANNOtation tool (ePIANNO, http://epianno.stat.sinica.edu.tw/index.html). ePIANNO is a web server that combines SNP information of populations (1000 Genomes Project) and gene-disease association information of GWAS (NHGRI) with ChIP-seq (hmChIP, ENCODE, and ROADMAP epigenomics) data. ePIANNO has a user-friendly website interface allowing researchers to explore, navigate, and extract data quickly. We use two examples to demonstrate how users could use functions of the ePIANNO webserver to explore useful information about TF related genomic variants. Users could use our query functions to search target regions, transcription factors, or annotations. ePIANNO may help users to generate hypotheses or explore potential biological functions for their studies.
Genomics and proteomics in liver fibrosis and cirrhosis
2012-01-01
Genomics and proteomics have become increasingly important in biomedical science in the past decade, as they provide an opportunity for hypothesis-free experiments that can yield major insights not previously foreseen when scientific and clinical questions are based only on hypothesis-driven approaches. Use of these tools, therefore, opens new avenues for uncovering physiological and pathological pathways. Liver fibrosis is a complex disease provoked by a range of chronic injuries to the liver, among which are viral hepatitis, (non-) alcoholic steatohepatitis and autoimmune disorders. Some chronic liver patients will never develop fibrosis or cirrhosis, whereas others rapidly progress towards cirrhosis in a few years. This variety can be caused by disease-related factors (for example, viral genotype) or host-factors (genetic/epigenetic). It is vital to establish accurate tools to identify those patients at highest risk for disease severity or progression in order to determine who are in need of immediate therapies. Moreover, there is an urgent imperative to identify non-invasive markers that can accurately distinguish mild and intermediate stages of fibrosis. Ideally, biomarkers can be used to predict disease progression and treatment response, but these studies will take many years due to the requirement for lengthy follow-up periods to assess outcomes. Current genomic and proteomic research provides many candidate biomarkers, but independent validation of these biomarkers is lacking, and reproducibility is still a key concern. Thus, great opportunities and challenges lie ahead in the field of genomics and proteomics, which, if successful, could transform the diagnosis and treatment of chronic fibrosing liver diseases. PMID:22214245
Evidence-based management of nutrigenomics expectations and ELSIs.
Ozdemir, Vural; Godard, Béatrice
2007-08-01
Nutrigenomics is a new application context for genomics technologies that focuses on the bidirectional study of genetic factors influencing host (individuals' or populations') response to diet and the effects of bioactive constituents in food on host genome and gene expression. Nutrigenomics is considered the next wave after pharmacogenomics for individualization of health interventions. However, relatively little attention has been given to the specific ethical-legal-social issues (ELSIs) and sociotechnical expectations raised by nutrigenomics research. Some of the ELSIs, such as ensuring privacy of genetic information and implications of genetic testing for health insurance and employment, may be shared across the continuum of genomic technology applications in human disease genetics, pharmacogenomics and nutrigenomics. However, there are certain aspects of nutrigenomics research that may result in unique or unprecedented ELSIs. For example, nutrigenomics has a strong focus on public health and the prevention/modification of 'predisease phenotypes' in apparently healthy individuals. Thus, in contrast to previous applications of genomics technologies, where the goal is to distinguish existing disease from absence of disease, the aim of nutrigenomics is the discernment of nuanced differences in predisease states. Moreover, there is evidence to suggest that ELSIs may be different in biomarker discovery, translational research and clinical testing stages of nutrigenomics. Ideally, ELSI research and nutrigenomics bioscience should progress in parallel and in a commensurate manner. We suggest that qualitative research methods, using a hypothesis-free approach, can be employed to gain deeper insights on complex bioethics issues that do not ordinarily lend themselves to formal hypothesis testing with the quantitative methods used in biomedical sciences.
Mitochondrial genetic codes evolve to match amino acid requirements of proteins.
Swire, Jonathan; Judson, Olivia P; Burt, Austin
2005-01-01
Mitochondria often use genetic codes different from the standard genetic code. Now that many mitochondrial genomes have been sequenced, these variant codes provide the first opportunity to examine empirically the processes that produce new genetic codes. The key question is: Are codon reassignments the sole result of mutation and genetic drift? Or are they the result of natural selection? Here we present an analysis of 24 phylogenetically independent codon reassignments in mitochondria. Although the mutation-drift hypothesis can explain reassignments from stop to an amino acid, we found that it cannot explain reassignments from one amino acid to another. In particular--and contrary to the predictions of the mutation-drift hypothesis--the codon involved in such a reassignment was not rare in the ancestral genome. Instead, such reassignments appear to take place while the codon is in use at an appreciable frequency. Moreover, the comparison of inferred amino acid usage in the ancestral genome with the neutral expectation shows that the amino acid gaining the codon was selectively favored over the amino acid losing the codon. These results are consistent with a simple model of weak selection on the amino acid composition of proteins in which codon reassignments are selected because they compensate for multiple slightly deleterious mutations throughout the mitochondrial genome. We propose that the selection pressure is for reduced protein synthesis cost: most reassignments give amino acids that are less expensive to synthesize. Taken together, our results strongly suggest that mitochondrial genetic codes evolve to match the amino acid requirements of proteins.
Evolution Models with Conditional Mutation Rates: Strange Plateaus in Population Distribution
NASA Astrophysics Data System (ADS)
Saakian, David B.
2017-08-01
Cancer is related to clonal evolution with a strongly nonlinear, collective behavior. Here we investigate a slightly advanced version of the popular Crow-Kimura evolution model, suggested recently, by simply assuming a conditional mutation rate. We investigated the steady-state solution and found a highly intriguing plateau in the distribution. There are selective and nonselective phases, with a rather narrow plateau in the distribution at the peak in the first phase, and a wide plateau for many Hamming classes (a collection of genomes with the same number of mutations from the reference genome) in the second phase. We analytically solved the steady state distribution in the selective and nonselective phases, calculating the widths of the plateaus. Numerically, we also found an intermediate phase with several plateaus in the steady-state distribution, related to large finite-genome-length corrections. We assume that the newly observed phenomena should exist in other versions of evolution dynamics when the parameters of the model are conditioned to the population distribution.
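The notion of a Hamming class used in this abstract (all genomes carrying the same number of mutations relative to the reference genome) can be illustrated with a minimal sketch. The function names and the toy binary genomes below are hypothetical and are not taken from the paper; the sketch only groups a population by Hamming class and does not implement the Crow-Kimura dynamics themselves.

```python
from collections import Counter


def hamming_class(genome: str, reference: str) -> int:
    """Number of positions at which a genome differs from the reference."""
    if len(genome) != len(reference):
        raise ValueError("genome and reference must have the same length")
    return sum(a != b for a, b in zip(genome, reference))


def class_distribution(population: list[str], reference: str) -> dict[int, float]:
    """Fraction of the population that falls in each Hamming class."""
    counts = Counter(hamming_class(g, reference) for g in population)
    total = len(population)
    return {k: counts[k] / total for k in sorted(counts)}


# Toy population of 4-site binary genomes (illustrative data only).
reference = "0000"
population = ["0000", "0001", "0010", "0011"]
distribution = class_distribution(population, reference)
```

A plateau in the sense of the abstract would correspond to several neighbouring Hamming classes holding nearly equal fractions of the population.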
Mastrorilli, Eleonora; Pietrucci, Daniele; Barco, Lisa; Ammendola, Serena; Petrin, Sara; Longo, Alessandra; Mantovani, Claudio; Battistoni, Andrea; Ricci, Antonia; Desideri, Alessandro; Losasso, Carmen
2018-01-01
Over the past decades, Salmonella 4,[5],12:i:- has rapidly emerged and it is isolated with high frequency in the swine food chain. Although many studies have documented the epidemiological success of this serovar, few investigations have tried to explain this phenomenon from a genetic perspective. Here a comparative whole-genome analysis of 50 epidemiologically unrelated S. 4,[5],12:i:-, isolated in Italy from 2010 to 2016 was performed, characterizing them in terms of genetic elements potentially conferring resistance, tolerance and persistence characteristics. Phylogenetic analyses indicated interesting distinctions among the investigated isolates. The most striking genetic trait characterizing the analyzed isolates is the widespread presence of heavy metals tolerance gene cassettes: most of the strains possess genes expected to confer resistance to copper and silver, whereas about half of the isolates also contain the mercury tolerance gene merA. A functional assay showed that these genes might be useful for preventing the toxic effects of metals, thus supporting the hypothesis that they can contribute to the success of S. 4,[5],12:i:- in farming environments. In addition, the analysis of the distribution of type II toxin-antitoxin families indicated that these elements are abundant in this serovar, suggesting that this is another factor that might favor its successful spread. PMID:29719530
Warris, Sven; Boymans, Sander; Muiser, Iwe; Noback, Michiel; Krijnen, Wim; Nap, Jan-Peter
2014-01-13
Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. This allows on-the-fly calculation of the normal distribution for any candidate sequence composition. The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this property alone cannot distinguish miRNAs from other sequences with sufficient discriminative power, the MFE-based P-value should be added to the parameters used in selecting potential miRNA candidates for experimental verification.
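The MFE-based P-value described above can be sketched as follows, assuming the mean and standard deviation of the random-sequence MFE distribution for the candidate's composition have already been obtained (for example by interpolation from pre-calculated values). The function name and the numbers are hypothetical and are not taken from the authors' software; the sketch only evaluates the lower tail of a normal distribution.

```python
import math


def mfe_p_value(candidate_mfe: float, random_mean: float, random_sd: float) -> float:
    """One-sided P-value: probability that a randomized sequence with the
    same composition has an MFE at least as low as the candidate's,
    under a normal model of the random-sequence MFE distribution."""
    if random_sd <= 0:
        raise ValueError("random_sd must be positive")
    z = (candidate_mfe - random_mean) / random_sd
    # Normal cumulative distribution function evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Illustrative values only (kcal/mol): a candidate three standard
# deviations below the random-sequence mean.
p = mfe_p_value(candidate_mfe=-45.0, random_mean=-30.0, random_sd=5.0)
```

Because only a normal tail probability is evaluated per candidate, the expensive folding of many shuffled sequences is avoided, which is the speedup the abstract refers to.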
Applying ecological models to communities of genetic elements: the case of neutral theory.
Linquist, Stefan; Cottenie, Karl; Elliott, Tyler A; Saylor, Brent; Kremer, Stefan C; Gregory, T Ryan
2015-07-01
A promising recent development in molecular biology involves viewing the genome as a mini-ecosystem, where genetic elements are compared to organisms and the surrounding cellular and genomic structures are regarded as the local environment. Here, we critically evaluate the prospects of ecological neutral theory (ENT), a popular model in ecology, as it applies at the genomic level. This assessment requires an overview of the controversy surrounding neutral models in community ecology. In particular, we discuss the limitations of using ENT both as an explanation of community dynamics and as a null hypothesis. We then analyse a case study in which ENT has been applied to genomic data. Our central finding is that genetic elements do not conform to the requirements of ENT once its assumptions and limitations are made explicit. We further compare this genome-level application of ENT to two other, more familiar approaches in genomics that rely on neutral mechanisms: Kimura's molecular neutral theory and Lynch's mutational-hazard model. Interestingly, this comparison reveals that there are two distinct concepts of neutrality associated with these models, which we dub 'fitness neutrality' and 'competitive neutrality'. This distinction helps to clarify the various roles for neutral models in genomics, for example in explaining the evolution of genome size. © 2015 John Wiley & Sons Ltd.
NCI is establishing the Genomic Data Commons to store, analyze and distribute cancer genomics data generated by NCI and other research organizations. The GDC will provide an interactive system for researchers to access data, with the goal of advancing the
Dos Santos, Sandra; Bardet, Claire; Bertrand, Stephanie; Escriva, Hector; Habert, Damien; Querat, Bruno
2009-08-01
The vertebrate glycoprotein hormones (GpHs), gonadotropins and thyrotropin, are heterodimers composed of a common alpha- and specific beta-subunit. The recombinant heterodimer of two additional, structurally related proteins identified in vertebrate and protostome genomes, the glycoproteins-alpha2 (GPA2) and -beta5 (GPB5), was shown to activate the thyrotropin receptor and was therefore named thyrostimulin. However, differences in tissue distribution and expression levels of these proteins suggested that they might act as nonassociated factors, prompting further investigation of these proteins. In this study we show that GPA2 and GPB5 appeared with the emergence of bilateria and were maintained in most groups. These genes are tightly associated at the genomic level, an association, however, lost in tetrapods. Our structural and genomic environment comparison reinforces the hypothesis of their phylogenetic relationships with GpH-alpha and -beta. In contrast, the glycosylation status of GPA2 and GPB5 is highly variable, further questioning heterodimer secretory efficiency and activity. As a first step toward understanding their function, we investigated the spatiotemporal expression of GPA2 and GPB5 genes at different developmental stages in a basal chordate, the amphioxus. Expression of GPB5 was essentially ubiquitous with an anteroposterior gradient in embryos. GPA2 embryonic and larval expression was restricted to specific areas and, interestingly, partially overlapped that of a GpH receptor-related gene. In conclusion, we speculate that GPA2 and GPB5 have nondispensable and coordinated functions related to a novelty that appeared with bilateria. These proteins would be active during embryonic development in a manner that does not require their heterodimerization.
Lada, Artem G.; Stepchenkova, Elena I.; Waisertreiger, Irina S. R.; Noskov, Vladimir N.; Dhar, Alok; Eudy, James D.; Boissy, Robert J.; Hirano, Masayuki; Rogozin, Igor B.; Pavlov, Youri I.
2013-01-01
Genetic information should be accurately transmitted from cell to cell; conversely, the adaptation in evolution and disease is fueled by mutations. In the case of cancer development, multiple genetic changes happen in somatic diploid cells. Most classic studies of the molecular mechanisms of mutagenesis have been performed in haploids. We demonstrate that the parameters of the mutation process are different in diploid cell populations. The genomes of drug-resistant mutants induced in yeast diploids by base analog 6-hydroxylaminopurine (HAP) or AID/APOBEC cytosine deaminase PmCDA1 from lamprey carried a stunning load of thousands of unselected mutations. Haploid mutants contained almost an order of magnitude fewer mutations. To explain this, we propose that the distribution of induced mutation rates in the cell population is uneven. The mutants in diploids with coincidental mutations in the two copies of the reporter gene arise from a fraction of cells that are transiently hypersensitive to the mutagenic action of a given mutagen. The progeny of such cells were never recovered in haploids due to the lethality caused by the inactivation of single-copy essential genes in cells with too many induced mutations. In diploid cells, the progeny of hypersensitive cells survived, but their genomes were saturated by heterozygous mutations. The reason for the hypermutability of cells could be transient faults of the mutation prevention pathways, like sanitization of nucleotide pools for HAP or an elevated expression of the PmCDA1 gene or the temporary inability of the destruction of the deaminase. The hypothesis on spikes of mutability may explain the sudden acquisition of multiple mutational changes during evolution and carcinogenesis. PMID:24039593
Ma, Hongying; Wu, Yajiang; Xiang, Hai; Yang, Yunzhou; Wang, Min; Zhao, Chunjiang; Wu, Changxin
2018-01-01
There are large populations of indigenous horses (Equus caballus) in China and some other parts of East Asia. However, their matrilineal genetic diversity and origin remain poorly understood. Using a combination of mitochondrial DNA (mtDNA) and hypervariable region (HVR-1) sequences, we aim to investigate the origin of matrilineal inheritance in these domestic horses. To investigate patterns of matrilineal inheritance in domestic horses, we conducted a phylogenetic study using 31 de novo mtDNA genomes together with 317 others from GenBank. In terms of the updated phylogeny, a total of 5,180 horse mitochondrial HVR-1 sequences were analyzed. Eighteen haplogroups (Aw-Rw) were uncovered from the analysis of the whole mitochondrial genomes. Most of these have a divergence time before the earliest domestication of wild horses (about 5,800 years ago) and during the Upper Paleolithic (35-10 KYA). The distribution of some haplogroups shows geographic patterns. The Lw haplogroup contained a significantly higher proportion of European horses than of horses from other regions, while haplogroups Jw, Rw, and some maternal lineages of Cw have a higher frequency in the horses from East Asia. The 5,180 sequences of horse mitochondrial HVR-1 form nine major haplogroups (A-I). We revealed a corresponding relationship between the haplotypes of HVR-1 and those of whole mitochondrial DNA sequences. The HVR-1 sequence data also suggest that Jw, Rw, and some haplotypes of Cw may have originated in East Asia while Lw probably formed in Europe. Our study supports the hypothesis of multiple origins of the maternal lineages of domestic horses, and suggests that some of these maternal lineages may have originated in East Asia.
Krak, Karol; Vít, Petr; Belyayev, Alexander; Douda, Jan; Hreusová, Lucia; Mandák, Bohumil
2016-01-01
Reticulate evolution is characterized by occasional hybridization between two species, creating a network of closely related taxa below and at the species level. In the present research, we aimed to verify the hypothesis of the allopolyploid origin of hexaploid C. album s. str., identify its putative parents and estimate the frequency of allopolyploidization events. We sampled 122 individuals of the C. album aggregate, covering most of its distribution range in Eurasia. Our samples included putative progenitors of C. album s. str. of both ploidy levels, i.e. diploids (C. ficifolium, C. suecicum) and tetraploids (C. striatiforme, C. strictum). To fulfil these objectives, we analysed sequence variation in the nrDNA ITS region and the rpl32-trnL intergenic spacer of cpDNA and performed genomic in-situ hybridization (GISH). Our study confirms the allohexaploid origin of C. album s. str. Analysis of cpDNA revealed tetraploids as the maternal species. In most accessions of hexaploid C. album s. str., ITS sequences were completely or nearly completely homogenized towards the tetraploid maternal ribotype; a tetraploid species therefore served as one genome donor. GISH revealed a strong hybridization signal on the same eighteen chromosomes of C. album s. str. with both diploid species C. ficifolium and C. suecicum. The second genome donor was therefore a diploid species. Moreover, some individuals with completely unhomogenized ITS sequences were found. Thus, hexaploid individuals of C. album s. str. with ITS sequences homogenized to different degrees may represent hybrids of different ages. This proves the existence of at least two different allopolyploid lineages, indicating a polyphyletic origin of C. album s. str. PMID:27513342
Phylogenomic Analysis and Dynamic Evolution of Chloroplast Genomes in Salicaceae
Huang, Yuan; Wang, Jun; Yang, Yongping; Fan, Chuanzhu; Chen, Jiahui
2017-01-01
Chloroplast genomes of plants are highly conserved in both gene order and gene content. Analysis of the whole chloroplast genome is known to provide many more informative DNA sites and thus generates high resolution for plant phylogenies. Here, we report the complete chloroplast genomes of three Salix species in family Salicaceae. Phylogeny of Salicaceae inferred from complete chloroplast genomes is generally consistent with previous studies but resolved with higher statistical support. Incongruences of phylogeny, however, are observed in genus Populus, which most likely result from homoplasy. By comparing three Salix chloroplast genomes with the published chloroplast genomes of other Salicaceae species, we demonstrate that the synteny and length of chloroplast genomes in Salicaceae are highly conserved but experienced dynamic evolution among species. We identify seven positively selected chloroplast genes in Salicaceae, which might be related to the adaptive evolution of Salicaceae species. Comparative chloroplast genome analysis within the family also indicates that some chloroplast genes have been lost or have become pseudogenes, suggesting that these chloroplast genes were horizontally transferred to the nuclear genome. Based on the complete nuclear genome sequences from two Salicaceae species, we find, remarkably, that the entire chloroplast genome was indeed transferred and integrated into the nuclear genome at least once in the individual used for the reference genome of P. trichocarpa. This observation, along with the presence of large nuclear plastid DNA (NUPTs) and of NUPTs containing multiple chloroplast genes in their original chloroplast-genome order, favors the DNA-mediated hypothesis of organelle to nucleus DNA transfer. Overall, the phylogenomic analysis using complete chloroplast genomes clearly elucidates the phylogeny of Salicaceae.
The identification of positively selected chloroplast genes and dynamic chloroplast-to-nucleus gene transfers in Salicaceae provide resources to better understand the successful adaptation of Salicaceae species. PMID:28676809
Functional and topological characteristics of mammalian regulatory domains
Symmons, Orsolya; Uslu, Veli Vural; Tsujimura, Taro; Ruf, Sandra; Nassari, Sonya; Schwarzer, Wibke; Ettwiller, Laurence; Spitz, François
2014-01-01
Long-range regulatory interactions play an important role in shaping gene-expression programs. However, the genomic features that organize these activities are still poorly characterized. We conducted a large operational analysis to chart the distribution of gene regulatory activities along the mouse genome, using hundreds of insertions of a regulatory sensor. We found that enhancers distribute their activities along broad regions and not in a gene-centric manner, defining large regulatory domains. Remarkably, these domains correlate strongly with the recently described TADs, which partition the genome into distinct self-interacting blocks. Different features, including specific repeats and CTCF-binding sites, correlate with the transition zones separating regulatory domains, and may help to further organize promiscuously distributed regulatory influences within large domains. These findings support a model of genomic organization where TADs confine regulatory activities to specific but large regulatory domains, contributing to the establishment of specific gene expression profiles. PMID:24398455
The topography of mutational processes in breast cancer genomes
Morganella, Sandro; Alexandrov, Ludmil B.; Glodzik, Dominik; ...
2016-01-01
Somatic mutations in human cancers show unevenness in genomic distribution that correlates with aspects of genome structure and function. These mutations are, however, generated by multiple mutational processes operating through the cellular lineage between the fertilized egg and the cancer cell, each composed of specific DNA damage and repair components and leaving its own characteristic mutational signature on the genome. Using somatic mutation catalogues from 560 breast cancer whole-genome sequences, here we show that each of 12 base substitution, 2 insertion/deletion (indel) and 6 rearrangement mutational signatures present in breast tissue exhibits distinct relationships with genomic features relating to transcription, DNA replication and chromatin organization. This signature-based approach permits visualization of the genomic distribution of mutational processes associated with APOBEC enzymes, mismatch repair deficiency and homologous recombinational repair deficiency, as well as mutational processes of unknown aetiology. Lastly, it highlights mechanistic insights including a putative replication-dependent mechanism of APOBEC-related mutagenesis.
Xu, Zhanyou; Yu, Jing; Kohel, Russell J; Percy, Richard G; Beavis, William D; Main, Dorrie; Yu, John Z
2015-07-01
Cotton fibers are the largest single cells in plants and serve as models to study cell development. This study investigated the distribution and evolution of fiber Unigenes anchored to recombination hotspots between tetraploid cotton (Gossypium hirsutum) At and Dt subgenomes, and within a parental diploid cotton (Gossypium raimondii) D genome. Comparative analysis of At vs D and Dt vs D showed that 1) the D genome provides many fiber genes after its merger with another parental diploid cotton (Gossypium arboreum) A genome, although the D genome itself does not produce any spinnable fiber; 2) similarity of fiber genes is higher between At vs D than between Dt vs D genomic hotspots. This is the first report that fiber genes have higher similarity between At and D than between Dt and D. The finding provides new insights into cotton genomic regions that would facilitate genetic improvement of natural fiber properties. Published by Elsevier Inc.
Are there laws of genome evolution?
Koonin, Eugene V
2011-08-01
Research in quantitative evolutionary genomics and systems biology led to the discovery of several universal regularities connecting genomic and molecular phenomic variables. These universals include the log-normal distribution of the evolutionary rates of orthologous genes; the power law-like distributions of paralogous family size and node degree in various biological networks; the negative correlation between a gene's sequence evolution rate and expression level; and differential scaling of functional classes of genes with genome size. The universals of genome evolution can be accounted for by simple mathematical models similar to those used in statistical physics, such as the birth-death-innovation model. These models do not explicitly incorporate selection; therefore, the observed universal regularities do not appear to be shaped by selection but rather are emergent properties of gene ensembles. Although a complete physical theory of evolutionary biology is inconceivable, the universals of genome evolution might qualify as "laws of evolutionary genomics" in the same sense "law" is understood in modern physics.
2011-01-01
Background: Although many biological databases are applying semantic web technologies, meaningful biological hypothesis testing cannot be easily achieved. Database-driven high throughput genomic hypothesis testing requires both the capability of obtaining semantically relevant experimental data and that of performing relevant statistical testing on the retrieved data. Tissue Microarray (TMA) data are semantically rich and contain many biologically important hypotheses waiting for high throughput conclusions. Methods: An application-specific ontology was developed for managing TMA and DNA microarray databases by semantic web technologies. Data were represented as Resource Description Framework (RDF) according to the framework of the ontology. Applications for hypothesis testing (Xperanto-RDF) for TMA data were designed and implemented by (1) formulating the syntactic and semantic structures of the hypotheses derived from TMA experiments, (2) formulating SPARQLs to reflect the semantic structures of the hypotheses, and (3) performing statistical tests with the result sets returned by the SPARQLs. Results: When a user designs a hypothesis in Xperanto-RDF and submits it, the hypothesis can be tested against TMA experimental data stored in Xperanto-RDF. When we evaluated four previously validated hypotheses as an illustration, all the hypotheses were supported by Xperanto-RDF. Conclusions: We demonstrated the utility of high throughput biological hypothesis testing. We believe that this approach can benefit preliminary investigation before highly controlled experiments are performed. PMID:21342584
Neurogenomics and the role of a large mutational target on rapid behavioral change.
Stanley, Craig E; Kulathinal, Rob J
2016-11-08
Behavior, while complex and dynamic, is among the most diverse, derived, and rapidly evolving traits in animals. The highly labile nature of heritable behavioral change is observed in such evolutionary phenomena as the emergence of converged behaviors in domesticated animals, the rapid evolution of preferences, and the routine development of ethological isolation between diverging populations and species. In fact, it is believed that nervous system development and its potential to evolve a seemingly infinite array of behavioral innovations played a major role in the successful diversification of metazoans, including our own human lineage. However, unlike other rapidly evolving functional systems such as sperm-egg interactions and immune defense, the genetic basis of rapid behavioral change remains elusive. Here we propose that the rapid divergence and widespread novelty of innate and adaptive behavior is primarily a function of its genomic architecture. Specifically, we hypothesize that the broad diversity of behavioral phenotypes present at micro- and macroevolutionary scales is promoted by a disproportionately large mutational target of neurogenic genes. We present evidence that these large neuro-behavioral targets are significant and ubiquitous in animal genomes and suggest that behavior's novelty and rapid emergence are driven by a number of factors including more selection on a larger pool of variants, a greater role of phenotypic plasticity, and/or unique molecular features present in large genes. We briefly discuss the origins of these large neurogenic genes, as they relate to the remarkable diversity of metazoan behaviors, and highlight key consequences on both behavioral traits and neurogenic disease across, respectively, evolutionary and ontogenetic time scales. Current approaches to studying the genetic mechanisms underlying rapid phenotypic change primarily focus on identifying signatures of Darwinian selection in protein-coding regions. 
In contrast, the large mutational target hypothesis places genomic architecture and a larger allelic pool at the forefront of rapid evolutionary change, particularly in genetic systems that are polygenic and regulatory in nature. Genomic data from brain and neural tissues in mammals as well as a preliminary survey of neurogenic genes from comparative genomic data support this hypothesis while rejecting both positive and relaxed selection on proteins or higher mutation rates. In mammals and invertebrates, neurogenic genes harbor larger protein-coding regions and possess a richer regulatory repertoire of miRNA targets and transcription factor binding sites. Overall, neurogenic genes cover a disproportionately large genomic fraction, providing a sizeable substrate for evolutionary, genetic, and molecular mechanisms to act upon. Readily available comparative and functional genomic data provide unexplored opportunities to test whether a distinct neurogenomic architecture can promote rapid behavioral change via several mechanisms unique to large genes, and which components of this large footprint are uniquely metazoan. The large mutational target hypothesis highlights the eminent roles of mutation and functional genomic architecture in generating rapid developmental and evolutionary change. It has broad implications on our understanding of the genetics of complex adaptive traits such as behavior by focusing on the importance of mutational input, from SNPs to alternative transcripts to transposable elements, on driving evolutionary rates of functional systems. Such functional divergence has important implications in promoting behavioral isolation across short- and long-term timescales. Due to genome-scaled polygenic adaptation, the large target effect also contributes to our inability to identify adapted behavioral candidate genes. 
The presence of large neurogenic genes, particularly in the mammalian brain and other neural tissues, further offers emerging insight into the etiology of neurodevelopmental and neurodegenerative diseases. The well-known correlation between neurological spectrum disorders in children and paternal age may simply be a direct result of aging fathers accumulating mutations across these large neurodevelopmental genes. The large mutational target hypothesis can also explain the rapid evolution of other functional systems covering a large genomic fraction such as male fertility and its preferential association with hybrid male sterility among closely related taxa. Overall, a focus on mutational potential may increase our power in understanding the genetic basis of complex phenotypes such as behavior while filling a general gap in understanding their evolution.
Guo, Hongyu; Pennings, Steven C
2012-01-01
Understanding of how plant communities are organized and will respond to global changes requires an understanding of how plant species respond to multiple environmental gradients. We examined the mechanisms mediating the distribution patterns of tidal marsh plants along an estuarine gradient in Georgia (USA) using a combination of field transplant experiments and monitoring. Our results could not be fully explained by the "competition-to-stress hypothesis" (the current paradigm explaining plant distributions across estuarine landscapes). This hypothesis states that the upstream limits of plant distributions are determined by competition, and the downstream limits by abiotic stress. We found that competition was generally strong in freshwater and brackish marshes, and that conditions in brackish and salt marshes were stressful to freshwater marsh plants, results consistent with the competition-to-stress hypothesis. Four other aspects of our results, however, were not explained by the competition-to-stress hypothesis. First, several halophytes found the freshwater habitat stressful and performed best (in the absence of competition) in brackish or salt marshes. Second, the upstream distribution of one species was determined by the combination of both abiotic and biotic (competition) factors. Third, marsh productivity (estimated by standing biomass) was a better predictor of relative biotic interaction intensity (RII) than was salinity or flooding, suggesting that productivity is a better indicator of plant stress than salinity or flooding gradients. Fourth, facilitation played a role in mediating the distribution patterns of some plants. Our results illustrate that even apparently simple abiotic gradients can encompass surprisingly complex processes mediating plant distributions.
Liu, Ruijuan; Wang, Richard R-C; Yu, Feng; Lu, Xingwang; Dou, Quanwen
2017-08-01
Genomes of ten species of Elymus, either presumed or known to be tetraploid StY, were characterized using fluorescence in situ hybridization (FISH) and genomic in situ hybridization (GISH). These tetraploid species could be grouped into three categories. Type I included species reported to carry the StY genome (Roegneria pendulina, R. nutans, R. glaberrima, R. ciliaris, and Elymus nevskii) and species presumed to carry it (R. sinica, R. breviglumis, and R. dura), whose genomes could be separated into two sets based on different GISH intensities. The Type I genome constitution was deemed putative StY. The St genome was mainly characterized by intense hybridization with pAs1, fewer AAG sites, and linked distribution of 5S rDNA and 18S-26S rDNA, while the Y genome showed less intense hybridization with pAs1, more varied AAG sites, and isolated distribution of 5S rDNA and 18S-26S rDNA. Nevertheless, further genomic variations were detected among the different StY species. Type II included E. alashanicus, whose genome could be easily separated based on GISH pattern. FISH and GISH patterns suggested that E. alashanicus comprised a modified St genome and an unknown genome. Type III included E. longearistatus, whose genome could not be separated by GISH and was designated as St l Y l . Notably, a close relationship between S l and Y l genomes was observed.
Universal scaling for polymer chain scission in turbulence
Vanapalli, Siva A.; Ceccio, Steven L.; Solomon, Michael J.
2006-01-01
We report that previous polymer chain scission experiments in strong flows, long analyzed according to accepted laminar flow scission theories, were in fact affected by turbulence. We reconcile existing anomalies between theory and experiment with the hypothesis that the local stress at the Kolmogorov scale generates the molecular tension leading to polymer covalent bond breakage. The hypothesis yields a universal scaling for polymer scission in turbulent flows. This surprising reassessment of over 40 years of experimental data simplifies the theoretical picture of polymer dynamics leading to scission and allows control of scission in commercial polymers and genomic DNA. PMID:17075043
Fullerton, Heather; Hager, Kevin W; McAllister, Sean M; Moyer, Craig L
2017-08-01
The Zetaproteobacteria are ubiquitous in marine environments, yet this class of Proteobacteria is only represented by a few closely-related cultured isolates. In high-iron environments, such as diffuse hydrothermal vents, the Zetaproteobacteria are important members of the community driving its structure. Biogeography of Zetaproteobacteria has shown two ubiquitous operational taxonomic units (OTUs), yet much is unknown about their genomic diversity. Genome-resolved metagenomics allows for the specific binning of microbial genomes based on genomic signatures present in composite metagenome assemblies. This resulted in the recovery of 93 genome bins, of which 34 were classified as Zetaproteobacteria. Form II ribulose 1,5-bisphosphate carboxylase genes were recovered from nearly all the Zetaproteobacteria genome bins. In addition, the Zetaproteobacteria genome bins contain genes for uptake and utilization of bioavailable nitrogen, detoxification of arsenic, and a terminal electron acceptor adapted for low oxygen concentration. Our results also support the hypothesis of a Cyc2-like protein as the site for iron oxidation, now detected across a majority of the Zetaproteobacteria genome bins. Whole genome comparisons showed a high genomic diversity across the Zetaproteobacteria OTUs and genome bins that were previously unidentified by SSU rRNA gene analysis. A single lineage of cosmopolitan Zetaproteobacteria (zOTU 2) was found to be monophyletic, based on cluster analysis of average nucleotide identity and average amino acid identity comparisons. From these data, we can begin to pinpoint genomic adaptations of the more ecologically ubiquitous Zetaproteobacteria, and further understand their environmental constraints and metabolic potential.
Secure distributed genome analysis for GWAS and sequence comparison computation.
Zhang, Yihua; Blanton, Marina; Almashaqbeh, Ghada
2015-01-01
The rapid increase in the availability and volume of genomic data makes significant advances in biomedical research possible, but sharing of genomic data poses challenges due to the highly sensitive nature of such data. To address the challenges, a competition for secure distributed processing of genomic data was organized by the iDASH research center. In this work we propose techniques for securing computation with real-life genomic data for minor allele frequency and chi-squared statistics computation, as well as distance computation between two genomic sequences, as specified by the iDASH competition tasks. We put forward novel optimizations, including a generalization of a version of mergesort, which might be of independent interest. We provide implementation results of our techniques based on secret sharing that demonstrate practicality of the suggested protocols and also report on performance improvements due to our optimization techniques. This work describes our techniques, findings, and experimental results developed and obtained as part of iDASH 2015 research competition to secure real-life genomic computations and shows feasibility of securely computing with genomic data in practice.
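The two statistics named in the abstract above can be illustrated in the clear. The following is a minimal plaintext sketch, not the authors' secret-sharing protocol: it assumes genotypes are coded as the number of alternate alleles per individual (0, 1 or 2) and uses the standard allelic 2×2 chi-squared statistic. The function names are hypothetical.

```python
def allele_counts(genotypes):
    """Return (alternate, reference) allele counts for diploid genotypes coded 0/1/2."""
    alt = sum(genotypes)
    ref = 2 * len(genotypes) - alt
    return alt, ref


def minor_allele_frequency(genotypes):
    """Frequency of the less common allele in the sample."""
    alt, ref = allele_counts(genotypes)
    return min(alt, ref) / (alt + ref)


def allelic_chi_squared(case_genotypes, control_genotypes):
    """Pearson chi-squared statistic for the 2x2 table of allele counts
    (rows: cases/controls, columns: alternate/reference)."""
    a, b = allele_counts(case_genotypes)
    c, d = allele_counts(control_genotypes)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: one row or column is empty
    return n * (a * d - b * c) ** 2 / denom


# Toy data (assumed for illustration only).
cases = [2, 1, 1, 0]
controls = [0, 0, 1, 0]
maf_cases = minor_allele_frequency(cases)
chi2 = allelic_chi_squared(cases, controls)
```

A secure version would evaluate the same sums and products on secret-shared inputs; that protocol is described in the paper and is not reproduced here.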
Secure distributed genome analysis for GWAS and sequence comparison computation
2015-01-01
Background: The rapid increase in the availability and volume of genomic data makes significant advances in biomedical research possible, but sharing of genomic data poses challenges due to the highly sensitive nature of such data. To address the challenges, a competition for secure distributed processing of genomic data was organized by the iDASH research center. Methods: In this work we propose techniques for securing computation with real-life genomic data for minor allele frequency and chi-squared statistics computation, as well as distance computation between two genomic sequences, as specified by the iDASH competition tasks. We put forward novel optimizations, including a generalization of a version of mergesort, which might be of independent interest. Results: We provide implementation results of our techniques based on secret sharing that demonstrate practicality of the suggested protocols and also report on performance improvements due to our optimization techniques. Conclusions: This work describes our techniques, findings, and experimental results developed and obtained as part of iDASH 2015 research competition to secure real-life genomic computations and shows feasibility of securely computing with genomic data in practice. PMID:26733307
Effect of Artificial Selection on Runs of Homozygosity in U.S. Holstein Cattle
Kim, Eui-Soo; Cole, John B.; Huson, Heather; Wiggans, George R.; Van Tassell, Curtis P.; Crooker, Brian A.; Liu, George; Da, Yang; Sonstegard, Tad S.
2013-01-01
The intensive selection programs for milk made possible by mass artificial insemination increased the similarity among the genomes of North American (NA) Holsteins tremendously since the 1960s. This migration of elite alleles has caused certain regions of the genome to have runs of homozygosity (ROH) occasionally spanning millions of continuous base pairs at a specific locus. In this study, genome signatures of artificial selection in NA Holsteins born between 1953 and 2008 were identified by comparing changes in ROH between three distinct groups under different selective pressure for milk production. The ROH regions were also used to estimate the inbreeding coefficients. The comparisons of genomic autozygosity between groups selected or unselected since 1964 for milk production revealed significant differences with respect to overall ROH frequency and distribution. These results indicate selection has increased overall autozygosity across the genome, whereas the autozygosity in an unselected line has not changed significantly across most of the chromosomes. In addition, ROH distribution was more variable across the genomes of selected animals in comparison to a more even ROH distribution for unselected animals. Further analysis of genome-wide autozygosity changes and the association between traits and haplotypes identified more than 40 genomic regions under selection on several chromosomes (Chr) including Chr 2, 7, 16 and 20. Many of these selection signatures corresponded to quantitative trait loci for milk, fat, and protein yield previously found in contemporary Holsteins. PMID:24348915
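As an illustration of how runs of homozygosity can be turned into an autozygosity estimate, here is a minimal sketch. It assumes genotypes coded 0/1/2 (1 = heterozygous), a hypothetical minimum run length in SNPs, and the commonly used definition of an ROH-based inbreeding coefficient as the fraction of the genome covered by ROH. The thresholds and exact estimator used in the study are not given in the abstract.

```python
def find_roh(genotypes, positions, min_snps=3):
    """Return (start_bp, end_bp) for every run of at least `min_snps`
    consecutive homozygous SNPs (genotype 0 or 2)."""
    runs = []
    start = None  # index of the first SNP in the current homozygous run
    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((positions[start], positions[i - 1]))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((positions[start], positions[-1]))
    return runs


def f_roh(runs, genome_length_bp):
    """Fraction of the genome covered by the given ROH segments."""
    return sum(end - start for start, end in runs) / genome_length_bp


# Toy data (assumed): 11 SNPs spaced 100 bp apart.
genotypes = [0, 2, 2, 1, 0, 0, 1, 2, 2, 2, 2]
positions = [100 * (i + 1) for i in range(len(genotypes))]
runs = find_roh(genotypes, positions)
coefficient = f_roh(runs, genome_length_bp=2000)
```

Real analyses typically also allow a small number of heterozygous or missing calls within a run and impose a minimum physical length; those refinements are omitted here.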
Universal features in the genome-level evolution of protein domains.
Cosentino Lagomarsino, Marco; Sellerio, Alessandro L; Heijning, Philip D; Bassetti, Bruno
2009-01-01
Protein domains can be used to study proteome evolution at a coarse scale. In particular, they are found on genomes with notable statistical distributions. It is known that the distribution of domains with a given topology follows a power law. We focus on a further aspect: these distributions, and the number of distinct topologies, follow collective trends, or scaling laws, depending on the total number of domains only, and not on genome-specific features. We present a stochastic duplication/innovation model, in the class of the so-called 'Chinese restaurant processes', that explains this observation with two universal parameters, representing a minimal number of domains and the relative weight of innovation to duplication. Furthermore, we study a model variant where new topologies are related to occurrence in genomic data, accounting for fold specificity. Both models have general quantitative agreement with data from hundreds of genomes, which indicates that the domains of a genome are built with a combination of specificity and robust self-organizing phenomena. The latter are related to the basic evolutionary 'moves' of duplication and innovation, and give rise to the observed scaling laws, a priori of the specific evolutionary history of a genome. We interpret this as the concurrent effect of neutral and selective drives, which increase duplication and decrease innovation in larger and more complex genomes. The validity of our model would imply that the empirical observation of a small number of folds in nature may be a consequence of their evolution.
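The duplication/innovation idea can be sketched with a generic two-parameter Chinese restaurant process. This is the textbook formulation, with parameters named `alpha` and `theta` here; it is not claimed to match the authors' exact parametrization or their fold-specific variant. At each step a new domain either founds a new topology class (innovation) or joins an existing class with probability proportional to its size minus `alpha` (duplication).

```python
import random


def simulate_crp(n_domains, alpha, theta, seed=0):
    """Simulate class sizes under a two-parameter Chinese restaurant process.

    Requires 0 <= alpha < 1 and theta > 0. Returns a list whose i-th entry
    is the number of domains in the i-th topology class.
    """
    rng = random.Random(seed)
    class_sizes = []
    for n in range(n_domains):
        k = len(class_sizes)
        # Innovation: probability of opening a new class.
        p_new = (theta + alpha * k) / (n + theta)
        if rng.random() < p_new:
            class_sizes.append(1)
        else:
            # Duplication: pick an existing class with weight (size - alpha).
            weights = [size - alpha for size in class_sizes]
            idx = rng.choices(range(k), weights=weights)[0]
            class_sizes[idx] += 1
    return class_sizes


sizes = simulate_crp(n_domains=500, alpha=0.3, theta=5.0, seed=42)
n_classes = len(sizes)
```

Running the simulation for increasing `n_domains` and plotting `n_classes` against the total gives the kind of scaling curve the abstract refers to, under the assumptions stated above.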
Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.
Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias
2011-01-01
The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E. coli, Shigella and S. pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.
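A sorted k-mer list, one of the data structures mentioned above, can be sketched in a few lines. This is a simplified, single-process illustration using contiguous k-mers. The seed pattern, memory layout and distribution across compute nodes used in the actual implementation are not described in the abstract, so everything below is an assumption.

```python
def sorted_kmer_list(sequence, k):
    """Return all (k-mer, start position) pairs of `sequence`, sorted by k-mer."""
    kmers = [(sequence[i:i + k], i) for i in range(len(sequence) - k + 1)]
    kmers.sort()
    return kmers


def shared_kmers(seq_a, seq_b, k):
    """Return the sorted k-mers that occur in both sequences (candidate anchors)."""
    kmers_a = {kmer for kmer, _ in sorted_kmer_list(seq_a, k)}
    kmers_b = {kmer for kmer, _ in sorted_kmer_list(seq_b, k)}
    return sorted(kmers_a & kmers_b)


# Toy sequences (assumed for illustration only).
kmer_list = sorted_kmer_list("GATTACA", 3)
anchors = shared_kmers("GATTACA", "TTACAG", 3)
```

Keeping the lists sorted means matching k-mers between genomes can be found by a linear merge rather than repeated searches, which is one reason such lists suit memory-constrained nodes.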
Distribution and diversity of cytotypes in Dianthus broteri as evidenced by genome size variations.
Balao, Francisco; Casimiro-Soriguer, Ramón; Talavera, María; Herrera, Javier; Talavera, Salvador
2009-10-01
Studying the spatial distribution of cytotypes and genome size in plants can provide valuable information about the evolution of polyploid complexes. Here, the spatial distribution of cytological races and the amount of DNA in Dianthus broteri, an Iberian carnation with several ploidy levels, is investigated. Sample chromosome counts and flow cytometry (using propidium iodide) were used to determine overall genome size (2C value) and ploidy level in 244 individuals of 25 populations. Both fresh and dried samples were investigated. Differences in 2C and 1Cx values among ploidy levels within biogeographical provinces were tested using ANOVA. Geographical correlations of genome size were also explored. Extensive variation in chromosome numbers (2n = 2x = 30, 2n = 4x = 60, 2n = 6x = 90 and 2n = 12x = 180) was detected, and the dodecaploid cytotype is reported for the first time in this genus. As regards cytotype distribution, six populations were diploid, 11 were tetraploid, three were hexaploid and five were dodecaploid. Except for one diploid population containing some triploid plants (2n = 45), the remaining populations showed a single cytotype. Diploids appeared in two disjunct areas (south-east and south-west), and so did tetraploids (although with a considerably wider geographic range). Dehydrated leaf samples provided reliable measurements of DNA content. Genome size varied significantly among some cytotypes, and also extensively within diploid (up to 1.17-fold) and tetraploid (1.22-fold) populations. Nevertheless, variations were not straightforwardly congruent with ecology and geographical distribution. Dianthus broteri shows the highest diversity of cytotypes known to date in the genus Dianthus. Moreover, some cytotypes present remarkable internal genome size variation. The evolution of the complex is discussed in terms of autopolyploidy, with primary and secondary contact zones.
Within-Genome Evolution of REPINs: a New Family of Miniature Mobile DNA in Bacteria
Bertels, Frederic; Rainey, Paul B.
2011-01-01
Repetitive sequences are a conserved feature of many bacterial genomes. While first reported almost thirty years ago, and frequently exploited for genotyping purposes, little is known about their origin, maintenance, or processes affecting the dynamics of within-genome evolution. Here, beginning with analysis of the diversity and abundance of short oligonucleotide sequences in the genome of Pseudomonas fluorescens SBW25, we show that over-represented short sequences define three distinct groups (GI, GII, and GIII) of repetitive extragenic palindromic (REP) sequences. Patterns of REP distribution suggest that closely linked REP sequences form a functional replicative unit: REP doublets are over-represented, randomly distributed in extragenic space, and more highly conserved than singlets. In addition, doublets are organized as inverted repeats, which together with intervening spacer sequences are predicted to form hairpin structures in ssDNA or mRNA. We refer to these newly defined entities as REPINs (REP doublets forming hairpins) and identify short reads from population sequencing that reveal putative transposition intermediates. The proximal relationship between GI, GII, and GIII REPINs and specific REP-associated tyrosine transposases (RAYTs), combined with features of the putative transposition intermediate, suggests a mechanism for within-genome dissemination. Analysis of the distribution of REPs in a range of RAYT-containing bacterial genomes, including Escherichia coli K-12 and Nostoc punctiforme, shows that REPINs are a widely distributed, but hitherto unrecognized, family of miniature non-autonomous mobile DNA. PMID:21698139
Causes of genome instability: the effect of low dose chemical exposures in modern society
Langie, Sabine A.S.; Koppen, Gudrun; Desaulniers, Daniel; Al-Mulla, Fahd; Al-Temaimi, Rabeah; Amedei, Amedeo; Azqueta, Amaya; Bisson, William H.; Brown, Dustin; Brunborg, Gunnar; Charles, Amelia K.; Chen, Tao; Colacci, Annamaria; Darroudi, Firouz; Forte, Stefano; Gonzalez, Laetitia; Hamid, Roslida A.; Knudsen, Lisbeth E.; Leyns, Luc; Lopez de Cerain Salsamendi, Adela; Memeo, Lorenzo; Mondello, Chiara; Mothersill, Carmel; Olsen, Ann-Karin; Pavanello, Sofia; Raju, Jayadev; Rojas, Emilio; Roy, Rabindra; Ryan, Elizabeth; Ostrosky-Wegman, Patricia; Salem, Hosni K.; Scovassi, Ivana; Singh, Neetu; Vaccari, Monica; Van Schooten, Frederik J.; Valverde, Mahara; Woodrick, Jordan; Zhang, Luoping; van Larebeke, Nik; Kirsch-Volders, Micheline; Collins, Andrew R.
2015-01-01
Genome instability is a prerequisite for the development of cancer. It occurs when genome maintenance systems fail to safeguard the genome’s integrity, whether as a consequence of inherited defects or induced via exposure to environmental agents (chemicals, biological agents and radiation). Thus, genome instability can be defined as an enhanced tendency for the genome to acquire mutations, ranging from changes to the nucleotide sequence to chromosomal gain, rearrangements or loss. This review raises the hypothesis that in addition to known human carcinogens, exposure to low doses of other chemicals present in our modern society could contribute to carcinogenesis by indirectly affecting genome stability. The selected chemicals with their mechanisms of action proposed to indirectly contribute to genome instability are: heavy metals (DNA repair, epigenetic modification, DNA damage signaling, telomere length), acrylamide (DNA repair, chromosome segregation), bisphenol A (epigenetic modification, DNA damage signaling, mitochondrial function, chromosome segregation), benomyl (chromosome segregation), quinones (epigenetic modification) and nano-sized particles (epigenetic pathways, mitochondrial function, chromosome segregation, telomere length). The purpose of this review is to describe the crucial aspects of genome instability, to outline the ways in which environmental chemicals can affect this cancer hallmark and to identify candidate chemicals for further study. The overall aim is to make scientists aware of the increasing need to unravel the underlying mechanisms via which chemicals at low doses can induce genome instability and thus promote carcinogenesis. PMID:26106144
van de Guchte, M; Penaud, S; Grimaldi, C; Barbe, V; Bryson, K; Nicolas, P; Robert, C; Oztas, S; Mangenot, S; Couloux, A; Loux, V; Dervyn, R; Bossy, R; Bolotin, A; Batto, J-M; Walunas, T; Gibrat, J-F; Bessières, P; Weissenbach, J; Ehrlich, S D; Maguin, E
2006-06-13
Lactobacillus delbrueckii ssp. bulgaricus (L. bulgaricus) is a representative of the group of lactic acid-producing bacteria, mainly known for its worldwide application in yogurt production. The genome sequence of this bacterium has been determined and shows the signs of ongoing specialization, with a substantial number of pseudogenes and incomplete metabolic pathways and relatively few regulatory functions. Several unique features of the L. bulgaricus genome support the hypothesis that the genome is in a phase of rapid evolution. (i) Exceptionally high numbers of rRNA and tRNA genes with regard to genome size may indicate that the L. bulgaricus genome has known a recent phase of important size reduction, in agreement with the observed high frequency of gene inactivation and elimination; (ii) a much higher GC content at codon position 3 than expected on the basis of the overall GC content suggests that the composition of the genome is evolving toward a higher GC content; and (iii) the presence of a 47.5-kbp inverted repeat in the replication termination region, an extremely rare feature in bacterial genomes, may be interpreted as a transient stage in genome evolution. The results indicate the adaptation of L. bulgaricus from a plant-associated habitat to the stable protein and lactose-rich milk environment through the loss of superfluous functions and protocooperation with Streptococcus thermophilus.
Fritsche, Lars G.; Igl, Wilmar; Cooke Bailey, Jessica N.; Grassmann, Felix; Sengupta, Sebanti; Bragg-Gresham, Jennifer L.; Burdon, Kathryn P.; Hebbring, Scott J.; Wen, Cindy; Gorski, Mathias; Kim, Ivana K.; Cho, David; Zack, Donald; Souied, Eric; Scholl, Hendrik P. N.; Bala, Elisa; Lee, Kristine E.; Hunter, David J.; Sardell, Rebecca J.; Mitchell, Paul; Merriam, Joanna E.; Cipriani, Valentina; Hoffman, Joshua D.; Schick, Tina; Lechanteur, Yara T. E.; Guymer, Robyn H.; Johnson, Matthew P.; Jiang, Yingda; Stanton, Chloe M.; Buitendijk, Gabriëlle H. S.; Zhan, Xiaowei; Kwong, Alan M.; Boleda, Alexis; Brooks, Matthew; Gieser, Linn; Ratnapriya, Rinki; Branham, Kari E.; Foerster, Johanna R.; Heckenlively, John R.; Othman, Mohammad I.; Vote, Brendan J.; Liang, Helena Hai; Souzeau, Emmanuelle; McAllister, Ian L.; Isaacs, Timothy; Hall, Janette; Lake, Stewart; Mackey, David A.; Constable, Ian J.; Craig, Jamie E.; Kitchner, Terrie E.; Yang, Zhenglin; Su, Zhiguang; Luo, Hongrong; Chen, Daniel; Ouyang, Hong; Flagg, Ken; Lin, Danni; Mao, Guanping; Ferreyra, Henry; Stark, Klaus; von Strachwitz, Claudia N.; Wolf, Armin; Brandl, Caroline; Rudolph, Guenther; Olden, Matthias; Morrison, Margaux A.; Morgan, Denise J.; Schu, Matthew; Ahn, Jeeyun; Silvestri, Giuliana; Tsironi, Evangelia E.; Park, Kyu Hyung; Farrer, Lindsay A.; Orlin, Anton; Brucker, Alexander; Li, Mingyao; Curcio, Christine; Mohand-Saïd, Saddek; Sahel, José-Alain; Audo, Isabelle; Benchaboune, Mustapha; Cree, Angela J.; Rennie, Christina A.; Goverdhan, Srinivas V.; Grunin, Michelle; Hagbi-Levi, Shira; Campochiaro, Peter; Katsanis, Nicholas; Holz, Frank G.; Blond, Frédéric; Blanché, Hélène; Deleuze, Jean-François; Igo, Robert P.; Truitt, Barbara; Peachey, Neal S.; Meuer, Stacy M.; Myers, Chelsea E.; Moore, Emily L.; Klein, Ronald; Hauser, Michael A.; Postel, Eric A.; Courtenay, Monique D.; Schwartz, Stephen G.; Kovach, Jaclyn L.; Scott, William K.; Liew, Gerald; Tan, Ava G.; Gopinath, Bamini; Merriam, John C.; Smith, R. Theodore; Khan, Jane C.; Shahid, Humma; Moore, Anthony T.; McGrath, J. Allie; Laux, Reneé; Brantley, Milam A.; Agarwal, Anita; Ersoy, Lebriz; Caramoy, Albert; Langmann, Thomas; Saksens, Nicole T. M.; de Jong, Eiko K.; Hoyng, Carel B.; Cain, Melinda S.; Richardson, Andrea J.; Martin, Tammy M.; Blangero, John; Weeks, Daniel E.; Dhillon, Bal; van Duijn, Cornelia M.; Doheny, Kimberly F.; Romm, Jane; Klaver, Caroline C. W.; Hayward, Caroline; Gorin, Michael B.; Klein, Michael L.; Baird, Paul N.; den Hollander, Anneke I.; Fauser, Sascha; Yates, John R. W.; Allikmets, Rando; Wang, Jie Jin; Schaumberg, Debra A.; Klein, Barbara E. K.; Hagstrom, Stephanie A.; Chowers, Itay; Lotery, Andrew J.; Léveillard, Thierry; Zhang, Kang; Brilliant, Murray H.; Hewitt, Alex W.; Swaroop, Anand; Chew, Emily Y.; Pericak-Vance, Margaret A.; DeAngelis, Margaret; Stambolian, Dwight; Haines, Jonathan L.; Iyengar, Sudha K.; Weber, Bernhard H. F.; Abecasis, Gonçalo R.; Heid, Iris M.
2016-01-01
Advanced age-related macular degeneration (AMD) is the leading cause of blindness in the elderly with limited therapeutic options. Here, we report on a study of >12 million variants including 163,714 directly genotyped, mostly rare, protein-altering variants. Analyzing 16,144 patients and 17,832 controls, we identify 52 independently associated common and rare variants (P < 5×10^-8) distributed across 34 loci. While wet and dry AMD subtypes exhibit predominantly shared genetics, we identify the first signal specific to wet AMD, near MMP9 (difference-P = 4.1×10^-10). Very rare coding variants (frequency < 0.1%) in CFH, CFI, and TIMP3 suggest causal roles for these genes, as does a splice variant in SLC16A8. Our results support the hypothesis that rare coding variants can pinpoint causal genes within known genetic loci and illustrate that applying the approach systematically to detect new loci requires extremely large sample sizes. PMID:26691988
Schwabe, Christian
2002-11-01
The new hypothesis of evolution establishes a contiguity of life sciences with cosmology, physics, and chemistry, and provides a basis for the search for life on other planets. Chemistry is the sole driving force of the assembly of life, under the subtle guidance exerted by bonding orbital geometry. That phenomenon leads to multiple origins that function on the same principles but are different to the extent that their nucleic acid core varies. Thus, thoughts about the origins of life and the development of complexity have been transferred from the chance orientation of the past to the realm of atomic structures, which are subject to the laws of thermodynamics and kinetics. Evolution is a legitimate subject of basic science, and the complexity of life will submit to the laws of chemistry and physics as the problem is viewed from a new perspective. The paradigm connects life to the big events that formed every sphere of our living space and that keeps conditions fine-tuned for life to persist, perhaps a billion years or more. The "genomic potential" hypothesis leads to the prediction that life like ours is likely to exist in galaxies that are as distant from the origin of the universe as the Milky Way, and that the habitable zone of our galaxy harbors other living planets as well. Copyright 2002 Wiley-Liss, Inc.
Shi, Jiaqin; Huang, Shunmou; Zhan, Jiepeng; Yu, Jingyin; Wang, Xinfa; Hua, Wei; Liu, Shengyi; Liu, Guihua; Wang, Hanzhong
2014-01-01
Although much research has been conducted, the pattern of microsatellite distribution has remained ambiguous, and the development/utilization of microsatellite markers has still been limited/inefficient in Brassica, due to the lack of genome sequences. In view of this, we conducted genome-wide microsatellite characterization and marker development in three recently sequenced Brassica crops: Brassica rapa, Brassica oleracea and Brassica napus. The analysed microsatellite characteristics of these Brassica species were highly similar or almost identical, which suggests that the pattern of microsatellite distribution is likely conservative in Brassica. The genomic distribution of microsatellites was highly non-uniform and positively or negatively correlated with genes or transposable elements, respectively. Of the total of 115 869, 185 662 and 356 522 simple sequence repeat (SSR) markers developed with high frequencies (408.2, 343.8 and 356.2 per Mb or one every 2.45, 2.91 and 2.81 kb, respectively), most represented new SSR markers, the majority had determined physical positions, and a large number were genic or putative single-locus SSR markers. We also constructed a comprehensive database for the newly developed SSR markers, which was integrated with public Brassica SSR markers and annotated genome components. The genome-wide SSR markers developed in this study provide a useful tool to extend the annotated genome resources of sequenced Brassica species to genetic study/breeding in different Brassica species. PMID:24130371
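The relationship between the reported marker densities and spacings is a simple reciprocal, which can be checked directly. A small sketch follows; the dictionary keys follow the species order given in the abstract, and the function name is hypothetical.

```python
def mean_spacing_kb(markers_per_mb):
    """Average distance between markers in kb, given a density per Mb (1 Mb = 1000 kb)."""
    return 1000.0 / markers_per_mb


# Densities per Mb as reported in the abstract, in the order the species are listed.
densities = {
    "Brassica rapa": 408.2,
    "Brassica oleracea": 343.8,
    "Brassica napus": 356.2,
}
spacings = {species: round(mean_spacing_kb(d), 2) for species, d in densities.items()}
```

The rounded results reproduce the "one every 2.45, 2.91 and 2.81 kb" figures quoted above.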
Induction of infectious petunia vein clearing (pararetro) virus from endogenous provirus in petunia
Richert-Pöggeler, Katja R.; Noreen, Faiza; Schwarzacher, Trude; Harper, Glyn; Hohn, Thomas
2003-01-01
Infection by an endogenous pararetrovirus using forms of both episomal and chromosomal origin has been demonstrated and characterized, together with evidence that petunia vein clearing virus (PVCV) is a constituent of the Petunia hybrida genome. Our findings allow comparative and direct analysis of horizontally and vertically transmitted virus forms and demonstrate their infectivity using biolistic transformation of a provirus-free petunia species. Some integrants within the genome of P. hybrida are arranged in tandem, allowing direct release of virus by transcription. In addition to known inducers of endogenous pararetroviruses, such as genome hybridization, tissue culture and abiotic stresses, we observed activation of PVCV after wounding. Our data also support the hypothesis that the host plant uses DNA methylation to control the endogenous pararetrovirus. PMID:12970195
A Glimpse into the Satellite DNA Library in Characidae Fish (Teleostei, Characiformes)
Utsunomia, Ricardo; Ruiz-Ruano, Francisco J.; Silva, Duílio M. Z. A.; Serrano, Érica A.; Rosa, Ivana F.; Scudeler, Patrícia E. S.; Hashimoto, Diogo T.; Oliveira, Claudio; Camacho, Juan Pedro M.; Foresti, Fausto
2017-01-01
Satellite DNA (satDNA) is an abundant fraction of repetitive DNA in eukaryotic genomes and plays an important role in genome organization and evolution. In general, satDNA sequences follow a concerted evolutionary pattern through the intragenomic homogenization of different repeat units. In addition, the satDNA library hypothesis predicts that related species share a series of satDNA variants descended from a common ancestor species, with differential amplification of different satDNA variants. The finding of the same satDNA family in species belonging to different genera within Characidae fish provided the opportunity to test both the concerted evolution and library hypotheses. For this purpose, we analyzed sequence variation and abundance of this satDNA family in ten species, by a combination of next generation sequencing (NGS), PCR and Sanger sequencing, and fluorescence in situ hybridization (FISH). We found extensive between-species variation in the number and size of pericentromeric FISH signals. At the genomic level, the analysis of thousands of DNA sequences obtained by Illumina sequencing and PCR amplification allowed us to define 150 haplotypes, which were linked in a common minimum spanning tree where different patterns of concerted evolution were apparent. This also provided a glimpse into the satDNA library of this group of species. Consistent with the library hypothesis, different variants of this satDNA showed large differences in abundance between species, from highly abundant to simply relictual variants. PMID:28855916
Egas-Bejar, Daniela; Anderson, Pete M; Agarwal, Rishi; Corrales-Medina, Fernando; Devarajan, Eswaran; Huh, Winston W; Brown, Robert E; Subbiah, Vivek
2014-03-12
The survival of patients with advanced osteosarcoma is poor with limited therapeutic options. There is an urgent need for new targeted therapies based on biomarkers. Recently, theranostic molecular profiling services for cancer patients by CLIA-certified commercial companies as well as in-house profiling in academic medical centers have expanded exponentially. We evaluated molecular profiles of patients with advanced osteosarcoma whose tumor tissue had been analyzed by one of the following methods: (1) 182-gene next-generation exome sequencing (Foundation Medicine, Boston, MA); (2) immunohistochemistry (IHC)/PCR-based panel (CARIS Target Now, Irving, TX); (3) comparative genome hybridization (Oncopath, San Antonio, TX); (4) single-gene PCR assays and PTEN IHC (MDACC CLIA); (5) UT Houston morphoproteomics (Houston, TX). The most common actionable aberrations occur in the PI3K/PTEN/mTOR pathway. No patterns in genomic alterations beyond the above are readily identifiable, which suggests both high molecular diversity in osteosarcoma and the need for more analyses to define distinct subgroups of osteosarcoma defined by genomic alterations. Based on our preliminary observations we hypothesize that the biology of aggressive and metastatic osteosarcoma at the molecular level is similar to human fingerprints, in that no two tumors are identical. Further large scale analyses of osteosarcoma samples are warranted to test this hypothesis.
USDA-ARS's Scientific Manuscript database
The genome of the cattle tick R. microplus, an ectoparasite with global distribution, is estimated to be 7.1 Gbp and consists of ~70% repetitive DNA. We report the first assembly of a tick genome that utilized a hybrid sequencing and assembly approach to capture the repetitive fractions of the genom...
Genome-wide analysis of tandem repeats in plants and green algae
Zhixin Zhao; Cheng Guo; Sreeskandarajan Sutharzan; Pei Li; Craig Echt; Jie Zhang; Chun Liang
2014-01-01
Tandem repeats (TRs) extensively exist in the genomes of prokaryotes and eukaryotes. Based on the sequenced genomes and gene annotations of 31 plant and algal species in Phytozome version 8.0 (http://www.phytozome.net/), we examined TRs in a genome-wide scale, characterized their distributions and motif features, and explored their putative biological functions. Among...
Prostate Cancer Diagnostics and Prognostics Based on Interphase Spatial Genome Positioning
2016-03-01
the Drosophila melanogaster genome at the ...and van Steensel, B. (2006). Characterization of the Drosophila melanogaster genome at the nuclear lamina. Nat Genet 38, 1005-1014. doi...region according to the gene distribution pattern in primary genomic sequence. J Cell Biol 174:27–38 Therizols P, Illingworth RS, Courilleau C,
USDA-ARS's Scientific Manuscript database
Single nucleotide polymorphisms (SNPs) are capable of providing the highest level of genome coverage for genomic and genetic analysis because of their abundance and relatively even distribution in the genome. Such a capacity, however, cannot be achieved without an efficient genotyping platform such ...
Universal laws of human society's income distribution
NASA Astrophysics Data System (ADS)
Tao, Yong
2015-10-01
General equilibrium equations in economics play the same role as many-body Newtonian equations in physics. Accordingly, each solution of the general equilibrium equations can be regarded as a possible microstate of the economic system. Since Arrow's Impossibility Theorem and Rawls' principle of social fairness provide powerful support for the hypothesis of equal probability, the principle of maximum entropy is available in a just and equilibrium economy, so that an income distribution will occur spontaneously (with the largest probability). Remarkably, some scholars have observed such an income distribution in some democratic countries, e.g. USA. This result implies that the hypothesis of equal probability may be only suitable for some "fair" systems (economic or physical systems). In this sense, non-equilibrium systems may be "unfair", so that the hypothesis of equal probability is unavailable.
Dores, Robert M
2016-01-01
The evolution of the melanocortin receptors (MCRs) is closely associated with the evolution of the melanocortin-2 receptor accessory proteins (MRAPs). Recent annotation of the elephant shark genome project revealed the sequence of a putative MRAP1 ortholog. The presence of this sequence in the genome of a cartilaginous fish raises the possibility that the mrap1 and mrap2 genes in the genomes of gnathostome vertebrates were the result of the chordate 2R genome duplication event. The presence of a putative MRAP1 ortholog in a cartilaginous fish genome is perplexing. Recent studies on melanocortin-2 receptor (MC2R) in the genomes of the elephant shark and the Japanese stingray indicate that these MC2R orthologs can be functionally expressed in CHO cells without co-expression of an exogenous mrap1 cDNA. The novel ligand selectivity of these cartilaginous fish MC2R orthologs is discussed. Finally, the origin of the mc2r and mc5r genes is reevaluated. The distinctive primary sequence conservation of MC2R and MC5R is discussed in light of the physiological roles of these two MCR paralogs.
Gene discovery by chemical mutagenesis and whole-genome sequencing in Dictyostelium.
Li, Cheng-Lin Frank; Santhanam, Balaji; Webb, Amanda Nicole; Zupan, Blaž; Shaulsky, Gad
2016-09-01
Whole-genome sequencing is a useful approach for identification of chemical-induced lesions, but previous applications involved tedious genetic mapping to pinpoint the causative mutations. We propose that saturation mutagenesis under low mutagenic loads, followed by whole-genome sequencing, should allow direct implication of genes by identifying multiple independent alleles of each relevant gene. We tested the hypothesis by performing three genetic screens with chemical mutagenesis in the social soil amoeba Dictyostelium discoideum. Through genome sequencing, we successfully identified mutant genes with multiple alleles in near-saturation screens, including resistance to intense illumination and strong suppressors of defects in an allorecognition pathway. We tested the causality of the mutations by comparison to published data and by direct complementation tests, finding both dominant and recessive causative mutations. Therefore, our strategy provides a cost- and time-efficient approach to gene discovery by integrating chemical mutagenesis and whole-genome sequencing. The method should be applicable to many microbial systems, and it is expected to revolutionize the field of functional genomics in Dictyostelium by greatly expanding the mutation spectrum relative to other common mutagenesis methods. © 2016 Li et al.; Published by Cold Spring Harbor Laboratory Press.
Ecophysiology of Freshwater Verrucomicrobia Inferred from Metagenome-Assembled Genomes
He, Shaomei; Stevens, Sarah L. R.; Chan, Leong-Keat; Bertilsson, Stefan; Glavina del Rio, Tijana; Tringe, Susannah G.; Malmstrom, Rex R.
2017-01-01
ABSTRACT Microbes are critical in carbon and nutrient cycling in freshwater ecosystems. Members of the Verrucomicrobia are ubiquitous in such systems, and yet their roles and ecophysiology are not well understood. In this study, we recovered 19 Verrucomicrobia draft genomes by sequencing 184 time-series metagenomes from a eutrophic lake and a humic bog that differ in carbon source and nutrient availabilities. These genomes span four of the seven previously defined Verrucomicrobia subdivisions and greatly expand knowledge of the genomic diversity of freshwater Verrucomicrobia. Genome analysis revealed their potential role as (poly)saccharide degraders in freshwater, uncovered interesting genomic features for this lifestyle, and suggested their adaptation to nutrient availabilities in their environments. Verrucomicrobia populations differ significantly between the two lakes in glycoside hydrolase gene abundance and functional profiles, reflecting the autochthonous and terrestrially derived allochthonous carbon sources of the two ecosystems, respectively. Interestingly, a number of genomes recovered from the bog contained gene clusters that potentially encode a novel porin-multiheme cytochrome c complex and might be involved in extracellular electron transfer in the anoxic humus-rich environment. Notably, most epilimnion genomes have large numbers of so-called “Planctomycete-specific” cytochrome c-encoding genes, which exhibited distribution patterns nearly opposite to those seen with glycoside hydrolase genes, probably associated with the different levels of environmental oxygen availability and carbohydrate complexity between lakes/layers. Overall, the recovered genomes represent a major step toward understanding the role, ecophysiology, and distribution of Verrucomicrobia in freshwater. IMPORTANCE Freshwater Verrucomicrobia spp. are cosmopolitan in lakes and rivers, and yet their roles and ecophysiology are not well understood, as cultured freshwater Verrucomicrobia spp. are restricted to one subdivision of this phylum. Here, we greatly expanded the known genomic diversity of this freshwater lineage by recovering 19 Verrucomicrobia draft genomes from 184 metagenomes collected from a eutrophic lake and a humic bog across multiple years. Most of these genomes represent the first freshwater representatives of several Verrucomicrobia subdivisions. Genomic analysis revealed Verrucomicrobia to be potential (poly)saccharide degraders and suggested their adaptation to carbon sources of different origins in the two contrasting ecosystems. We identified putative extracellular electron transfer genes and so-called “Planctomycete-specific” cytochrome c-encoding genes and identified their distinct distribution patterns between the lakes/layers. Overall, our analysis greatly advances the understanding of the function, ecophysiology, and distribution of freshwater Verrucomicrobia, while highlighting their potential role in freshwater carbon cycling. PMID:28959738
How Can Genomics Inform Education?
ERIC Educational Resources Information Center
Grigorenko, Elena L.
2007-01-01
This article offers some thoughts on possible connections between genomics and education. Genomics is already revolutionizing the way medical care is delivered and distributed; it will inevitably affect children's developmental trajectories by introducing more pharmacological and behavioral therapies. Educators should be prepared to understand the…
We report the draft genome sequences of four Mycobacterium chelonae group strains from biofilms obtained after a ‘chlorine burn’ in a chloraminated drinking water distribution system simulator. These opportunistic pathogens have been detected in drinking and hospital water distr...
Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D.
Matsuzaki, Motomichi; Misumi, Osami; Shin-I, Tadasu; Maruyama, Shinichiro; Takahara, Manabu; Miyagishima, Shin-Ya; Mori, Toshiyuki; Nishida, Keiji; Yagisawa, Fumi; Nishida, Keishin; Yoshida, Yamato; Nishimura, Yoshiki; Nakao, Shunsuke; Kobayashi, Tamaki; Momoyama, Yu; Higashiyama, Tetsuya; Minoda, Ayumi; Sano, Masako; Nomoto, Hisayo; Oishi, Kazuko; Hayashi, Hiroko; Ohta, Fumiko; Nishizaka, Satoko; Haga, Shinobu; Miura, Sachiko; Morishita, Tomomi; Kabeya, Yukihiro; Terasawa, Kimihiro; Suzuki, Yutaka; Ishii, Yasuyuki; Asakawa, Shuichi; Takano, Hiroyoshi; Ohta, Niji; Kuroiwa, Haruko; Tanaka, Kan; Shimizu, Nobuyoshi; Sugano, Sumio; Sato, Naoki; Nozaki, Hisayoshi; Ogasawara, Naotake; Kohara, Yuji; Kuroiwa, Tsuneyoshi
2004-04-08
Small, compact genomes of ultrasmall unicellular algae provide information on the basic and essential genes that support the lives of photosynthetic eukaryotes, including higher plants. Here we report the 16,520,305-base-pair sequence of the 20 chromosomes of the unicellular red alga Cyanidioschyzon merolae 10D as the first complete algal genome. We identified 5,331 genes in total, of which at least 86.3% were expressed. Unique characteristics of this genomic structure include: a lack of introns in all but 26 genes; only three copies of ribosomal DNA units that maintain the nucleolus; and two dynamin genes that are involved only in the division of mitochondria and plastids. The conserved mosaic origin of Calvin cycle enzymes in this red alga and in green plants supports the hypothesis of the existence of single primary plastid endosymbiosis. The lack of a myosin gene, in addition to the unexpressed actin gene, suggests a simpler system of cytokinesis. These results indicate that the C. merolae genome provides a model system with a simple gene composition for studying the origin, evolution and fundamental mechanisms of eukaryotic cells.
Evolutionary insights from Erwinia amylovora genomics.
Smits, Theo H M; Rezzonico, Fabio; Duffy, Brion
2011-08-20
Evolutionary genomics is coming into focus with the recent availability of complete sequences for many bacterial species. A hypothesis on the evolution of virulence factors in the plant pathogen Erwinia amylovora, the causative agent of fire blight, was generated using comparative genomics with the genomes of E. amylovora, Erwinia pyrifoliae and Erwinia tasmaniensis. Putative virulence factors were mapped to the proposed genealogy of the genus Erwinia that is based on phylogenetic and genomic data. Ancestral origin of several virulence factors was identified, including levan biosynthesis, sorbitol metabolism, three T3SS and two T6SS. Other factors appeared to have been acquired after divergence of pathogenic species, including a second flagellar gene and two glycosyltransferases involved in amylovoran biosynthesis. E. amylovora singletons include 3 unique T3SS effectors that may explain differential virulence/host ranges. E. amylovora also has a unique T1SS export system, and a unique third T6SS gene cluster. Genetic analysis revealed signatures of foreign DNA suggesting that horizontal gene transfer is responsible for some of these differential features between the three species. Copyright © 2010 Elsevier B.V. All rights reserved.
A little bit of sex matters for genome evolution in asexual plants.
Hojsgaard, Diego; Hörandl, Elvira
2015-01-01
Genome evolution in asexual organisms is theoretically expected to be shaped by various factors: first, hybrid origin and polyploidy confer a genomic constitution of highly heterozygous genotypes with multiple copies of genes; second, asexuality confers a lack of recombination and variation in populations, which reduces the efficiency of selection against deleterious mutations; hence, the accumulation of mutations and a gradual increase in mutational load (Muller's ratchet) would lead to rapid extinction of asexual lineages; third, allelic sequence divergence is expected to result in rapid divergence of lineages (Meselson effect). Recent transcriptome studies on the asexual polyploid complex Ranunculus auricomus using single-nucleotide polymorphisms confirmed neutral allelic sequence divergence within a short time frame, but rejected a hypothesis of a genome-wide accumulation of mutations in asexuals compared to sexuals, except for a few genes related to reproductive development. We discuss a general model in which the observed incidence of facultative sexuality in plants may unmask deleterious mutations with partial dominance and expose them efficiently to purging selection. A little bit of sex may help to avoid genomic decay and extinction.
Hughes, Lily C; Ortí, Guillermo; Huang, Yu; Sun, Ying; Baldwin, Carole C; Thompson, Andrew W; Arcila, Dahiana; Betancur-R, Ricardo; Li, Chenhong; Becker, Leandro; Bellora, Nicolás; Zhao, Xiaomeng; Li, Xiaofeng; Wang, Min; Fang, Chao; Xie, Bing; Zhou, Zhuocheng; Huang, Hai; Chen, Songlin; Venkatesh, Byrappa; Shi, Qiong
2018-05-14
Our understanding of phylogenetic relationships among bony fishes has been transformed by analysis of a small number of genes, but uncertainty remains around critical nodes. Genome-scale inferences so far have sampled a limited number of taxa and genes. Here we leveraged 144 genomes and 159 transcriptomes to investigate fish evolution with an unparalleled scale of data: >0.5 Mb from 1,105 orthologous exon sequences from 303 species, representing 66 out of 72 ray-finned fish orders. We apply phylogenetic tests designed to trace the effect of whole-genome duplication events on gene trees and find paralogy-free loci using a bioinformatics approach. Genome-wide data support the structure of the fish phylogeny, and hypothesis-testing procedures appropriate for phylogenomic datasets using explicit gene genealogy interrogation settle some long-standing uncertainties, such as the branching order at the base of the teleosts and among early euteleosts, and the sister lineage to the acanthomorph and percomorph radiations. Comprehensive fossil calibrations date the origin of all major fish lineages before the end of the Cretaceous.
Hunting for genes for hypertension: the Millennium Genome Project for Hypertension.
Tabara, Yasuharu; Kohara, Katsuhiko; Miki, Tetsuro
2012-06-01
The Millennium Genome Project for Hypertension was started in 2000 to identify genetic variants conferring susceptibility to hypertension, with the aim of furthering the understanding of the pathogenesis of this condition and realizing genome-based personalized medical care. Two different approaches were launched, genome-wide association analysis using single-nucleotide polymorphisms (SNPs) and microsatellite markers, and systematic candidate gene analysis, under the hypothesis that common variants have an important role in the etiology of common diseases. These multilateral approaches identified ATP2B1 as a gene responsible for hypertension in not only Japanese but also Caucasians. The high blood pressure susceptibility conferred by certain alleles of ATP2B1 has been widely replicated in various populations. Ex vivo mRNA expression analysis in umbilical artery smooth muscle cells indicated that reduced expression of this gene associated with the risk allele may be an underlying mechanism relating the ATP2B1 variant to hypertension. However, the effect size of a SNP was too small to clarify the entire picture of the genetic basis of hypertension. Further, dense genome analysis with accurate phenotype data may be required.
Whole genome sequences of two octogenarians with sustained cognitive abilities
Nickles, Dorothee; Madireddy, Lohith; Patel, Nihar; Isobe, Noriko; Miller, Bruce L.; Baranzini, Sergio E.; Kramer, Joel H.; Oksenberg, Jorge R.
2014-01-01
Although numerous genetic variants affecting aging and mortality have been identified, e.g. APOE ε4, the genetic component influencing cognitive aging has not been fully defined yet. A better knowledge of the genetics of aging will prove helpful in understanding the underlying biological processes. Here, we describe the whole genome sequences of two female octogenarians. We provide the repertoire of genomic variants that the two octogenarians have in common. We also describe the overlap with the previously reported genomes of two supercentenarians - individuals aged ≥ 110 years. We assessed the genetic disease propensities of the octogenarians and non-aged control genomes and could not find support for the hypothesis that long-lived healthy individuals might exhibit greater genetic fitness than the general population. Furthermore, there is no evidence for an accumulation of previously described variants promoting longevity in the two octogenarians. These findings suggest that genetic fitness, as currently defined, is not the sole factor enabling an increased lifespan. We identified a number of healthy-cognitive-aging candidate genetic loci awaiting confirmation in larger studies. PMID:25618617
Whole genome sequences of 2 octogenarians with sustained cognitive abilities.
Nickles, Dorothee; Madireddy, Lohith; Patel, Nihar; Isobe, Noriko; Miller, Bruce L; Baranzini, Sergio E; Kramer, Joel H; Oksenberg, Jorge R
2015-03-01
Although numerous genetic variants affecting aging and mortality have been identified, for example, apolipoprotein E ε4, the genetic component influencing cognitive aging has not been fully defined yet. A better knowledge of the genetics of aging will prove helpful in understanding the underlying biological processes. Here, we describe the whole genome sequences of 2 female octogenarians. We provide the repertoire of genomic variants that the 2 octogenarians have in common. We also describe the overlap with the previously reported genomes of 2 supercentenarians—individuals aged ≥110 years. We assessed the genetic disease propensities of the octogenarians and non-aged control genomes and could not find support for the hypothesis that long-lived healthy individuals might exhibit greater genetic fitness than the general population. Furthermore, there is no evidence for an accumulation of previously described variants promoting longevity in the 2 octogenarians. These findings suggest that genetic fitness, as currently defined, is not the sole factor enabling an increased life span. We identified a number of healthy-cognitive-aging candidate genetic loci awaiting confirmation in larger studies. Copyright © 2015 Elsevier Inc. All rights reserved.
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies.
Utturkar, Sagar M; Klingeman, Dawn M; Hurt, Richard A; Brown, Steven D
2017-01-01
This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches, and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not supported. PacBio assemblies had few limitations overall, and gaps were explained as the cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly, which included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
Miklós, István
2003-10-01
As more and more genomes have been sequenced, genomic data are rapidly accumulating. Genome-wide mutations are believed to be more neutral than local mutations such as substitutions, insertions and deletions; therefore, phylogenetic investigations based on inversions, transpositions and inverted transpositions are less biased by the hypothesis of neutral evolution. Although efficient algorithms exist for obtaining the inversion distance of two signed permutations, there is no reliable algorithm when both inversions and transpositions are considered. Moreover, different types of mutations happen at different rates, and it is not clear how to weight them in a distance-based approach. We introduce a Markov Chain Monte Carlo method for genome rearrangement based on a stochastic model of evolution, which can estimate the number of different evolutionary events needed to sort a signed permutation. The performance of the method was tested on simulated data, and the estimated numbers of different types of mutations were reliable. Human and Drosophila mitochondrial data were also analysed with the new method. The mixing time of the Markov Chain is short both in terms of CPU time and number of proposals. The source code in C is available on request from the author.
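To make the "signed permutation" vocabulary concrete, the sketch below shows how a genome can be encoded as a list of signed gene indices, how a single inversion acts on it, and how breakpoints are counted. It is not the Markov Chain Monte Carlo method of the abstract (that source code is in C and is not reproduced here); the function names and the toy genome are assumptions made for illustration only.

```python
def invert(perm, i, j):
    """Return a copy of perm with the segment perm[i..j] reversed and sign-flipped."""
    segment = [-x for x in reversed(perm[i:j + 1])]
    return perm[:i] + segment + perm[j + 1:]

def breakpoints(perm):
    """Count adjacent pairs (a, b) in the framed permutation 0, perm, n+1 with b - a != 1."""
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)

# Hypothetical 4-gene genome: genes 2 and 3 lie inverted relative to the reference order.
genome = [1, -3, -2, 4]
print(breakpoints(genome))       # breakpoints before sorting
restored = invert(genome, 1, 2)  # one inversion over positions 1..2
print(restored, breakpoints(restored))
```

In this toy case one inversion restores the identity permutation. Transpositions and inverted transpositions, which the abstract also considers, would need additional operations not shown here.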
Nature and function of insulator protein binding sites in the Drosophila genome
Schwartz, Yuri B.; Linder-Basso, Daniela; Kharchenko, Peter V.; Tolstorukov, Michael Y.; Kim, Maria; Li, Hua-Bing; Gorchakov, Andrey A.; Minoda, Aki; Shanower, Gregory; Alekseyenko, Artyom A.; Riddle, Nicole C.; Jung, Youngsook L.; Gu, Tingting; Plachetka, Annette; Elgin, Sarah C.R.; Kuroda, Mitzi I.; Park, Peter J.; Savitsky, Mikhail; Karpen, Gary H.; Pirrotta, Vincenzo
2012-01-01
Chromatin insulator elements and associated proteins have been proposed to partition eukaryotic genomes into sets of independently regulated domains. Here we test this hypothesis by quantitative genome-wide analysis of insulator protein binding to Drosophila chromatin. We find distinct combinatorial binding of insulator proteins to different classes of sites and uncover a novel type of insulator element that binds CP190 but not any other known insulator proteins. Functional characterization of different classes of binding sites indicates that only a small fraction act as robust insulators in standard enhancer-blocking assays. We show that insulators restrict the spreading of the H3K27me3 mark but only at a small number of Polycomb target regions and only to prevent repressive histone methylation within adjacent genes that are already transcriptionally inactive. RNAi knockdown of insulator proteins in cultured cells does not lead to major alterations in genome expression. Taken together, these observations argue against the concept of a genome partitioned by specialized boundary elements and suggest that insulators are reserved for specific regulation of selected genes. PMID:22767387
Relating hybrid advantage and genome replacement in unisexual salamanders.
Charney, Noah D
2012-05-01
Unisexual vertebrates are model systems for understanding the evolution of sex. Many predominantly clonal lineages allow occasional genetic recombination, which may be sufficient to avoid the accumulation of deleterious mutations and parasites. Introgression of paternal DNA into an all-female lineage represents a one-way flow of genetic material. Over many generations, this could result in complete replacement of the unisexual genomes by those of the donor species. The process of genome replacement may be counteracted by contemporary dispersal or by positive selection on hybrid nuclear genomes in ecotones. I present a conceptual model that relates nuclear genome replacement, positive selection on hybrids and biogeography in unisexual systems. I execute an individual-based simulation of the fate of hybrid genotypes in contact with a single host species. I parameterize these models for unisexual salamanders in the Ambystoma genus, for which the frequency of genome replacement has been a source of ongoing debate. I find that, if genome replacement occurs at a rate greater than 1/10,000 in Ambystoma, then there must be compensating positive selection in order to maintain observed levels of hybrid nuclei. Future researchers studying unisexual systems may use this framework as a guide to evaluating the hybrid superiority hypothesis. © 2011 The Author. Evolution© 2011 The Society for the Study of Evolution.
Desiderato, Joana G; Alvarenga, Danillo O; Constancio, Milena T L; Alves, Lucia M C; Varani, Alessandro M
2018-05-14
Cellulose and its associated polymers are structural components of the plant cell wall, constituting one of the major sources of carbon and energy in nature. The carbon cycle is dependent on cellulose- and lignin-decomposing microbial communities and their enzymatic systems acting as consortia. These microbial consortia are under constant exploration for their potential biotechnological use. Herein, we describe the characterization of the genome of Dyella jiangningensis FCAV SCS01, recovered from the metagenome of a lignocellulose-degrading microbial consortium, which was isolated from a sugarcane crop soil under mechanical harvesting and covered by decomposing straw. The 4.7 Mbp genome encodes 4,194 proteins, including 36 glycoside hydrolases (GH), supporting the hypothesis that this bacterium may contribute to lignocellulose decomposition. Comparative analysis among fully sequenced Dyella species indicate that the genome synteny is not conserved, and that D. jiangningensis FCAV SCS01 carries 372 unique genes, including an alpha-glucosidase and maltodextrin glucosidase coding genes, and other potential biomass degradation related genes. Additional genomic features, such as prophage-like, genomic islands and putative new biosynthetic clusters were also uncovered. Overall, D. jiangningensis FCAV SCS01 represents the first South American Dyella genome sequenced and shows an exclusive feature among its genus, related to biomass degradation.
Computing prokaryotic gene ubiquity: rescuing the core from extinction.
Charlebois, Robert L; Doolittle, W Ford
2004-12-01
The genomic core concept has found several uses in comparative and evolutionary genomics. Defined as the set of all genes common to (ubiquitous among) all genomes in a phylogenetically coherent group, core size decreases as the number and phylogenetic diversity of the relevant group increases. Here, we focus on methods for defining the size and composition of the core of all genes shared by sequenced genomes of prokaryotes (Bacteria and Archaea). There are few (almost certainly less than 50) genes shared by all of the 147 genomes compared, surely insufficient to conduct all essential functions. Sequencing and annotation errors are responsible for the apparent absence of some genes, while very limited but genuine disappearances (from just one or a few genomes) can account for several others. Core size will continue to decrease as more genome sequences appear, unless the requirement for ubiquity is relaxed. Such relaxation seems consistent with any reasonable biological purpose for seeking a core, but it renders the problem of definition more problematic. We propose an alternative approach (the phylogenetically balanced core), which preserves some of the biological utility of the core concept. Cores, however delimited, preferentially contain informational rather than operational genes; we present a new hypothesis for why this might be so.
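The effect of relaxing the ubiquity requirement can be illustrated with a small sketch: a gene counts as "core" if it occurs in at least a chosen fraction of genomes. This is a plain presence-threshold rule, not the phylogenetically balanced core proposed in the abstract; the function name, the `min_fraction` parameter, and the toy gene sets are all hypothetical.

```python
from collections import Counter

def core_genes(genomes, min_fraction=1.0):
    """Return the genes present in at least min_fraction of the given genomes.

    genomes maps a genome name to an iterable of gene identifiers.
    min_fraction=1.0 gives the strict (fully ubiquitous) core.
    """
    counts = Counter(gene for genes in genomes.values() for gene in set(genes))
    needed = min_fraction * len(genomes)
    return {gene for gene, n in counts.items() if n >= needed}

# Toy data (invented): one gene is missing from one genome, another from a different one.
toy = {
    "genome_A": {"rpoB", "recA", "gyrA"},
    "genome_B": {"rpoB", "recA"},
    "genome_C": {"rpoB", "recA", "gyrA"},
    "genome_D": {"rpoB", "gyrA"},
}
print(sorted(core_genes(toy)))                     # strict core
print(sorted(core_genes(toy, min_fraction=0.75)))  # relaxed core
```

With the strict rule a single missing annotation removes a gene from the core, which is the shrinkage the abstract describes; the relaxed threshold tolerates such absences at the price of a less sharply defined set.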
Langberg, Kurt; Phillips, Matthew; Rueppell, Olav
2018-04-01
The rate of genomic recombination displays evolutionary plasticity and can even vary in response to environmental factors. The western honey bee (Apis mellifera L.) has an extremely high genomic recombination rate but the mechanistic basis for this genome-wide upregulation is not understood. Based on the hypothesis that meiotic recombination and DNA damage repair share common mechanisms in honey bees as in other organisms, we predicted that oxidative stress leads to an increase in recombination rate in honey bees. To test this prediction, we subjected honey bee queens to oxidative stress by paraquat injection and measured the rates of genomic recombination in select genome intervals of offspring produced before and after injection. The evaluation of 26 genome intervals in a total of over 1750 offspring of 11 queens by microsatellite genotyping revealed several significant effects but no overall evidence for a mechanistic link between oxidative stress and increased recombination was found. The results weaken the notion that DNA repair enzymes have a regulatory function in the high rate of meiotic recombination of honey bees, but they do not provide evidence against functional overlap between meiotic recombination and DNA damage repair in honey bees and more mechanistic studies are needed.
2014-01-01
Background Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Results Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. This allows on-the-fly calculation of the normal distribution for any candidate sequence composition. Conclusion The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this property alone is not sufficiently discriminative to distinguish miRNAs from other sequences, the MFE-based P-value should be added to the parameters considered when selecting potential miRNA candidates for experimental verification. PMID:24418292
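The interpolation step described above can be sketched as follows. This is a simplified illustration, not the published method: it assumes the pre-calculated table is indexed by GC fraction alone (the abstract refers to full nucleotide composition), that linear interpolation of mean and standard deviation is adequate, and the table values themselves are invented placeholders.

```python
import math
from bisect import bisect_right

# Hypothetical lookup table: (GC fraction, mean MFE, sd MFE) of shuffled
# sequences of one fixed length. The numbers are made up for illustration.
TABLE = [(0.3, -15.0, 4.0), (0.5, -25.0, 5.0), (0.7, -35.0, 6.0)]


def interpolate_params(gc):
    """Linearly interpolate (mean, sd) between the neighbouring grid points."""
    xs = [row[0] for row in TABLE]
    gc = min(max(gc, xs[0]), xs[-1])              # clamp to the tabulated range
    i = max(1, min(bisect_right(xs, gc), len(xs) - 1))
    x0, m0, s0 = TABLE[i - 1]
    x1, m1, s1 = TABLE[i]
    t = (gc - x0) / (x1 - x0)
    return m0 + t * (m1 - m0), s0 + t * (s1 - s0)


def mfe_p_value(observed_mfe, gc):
    """P(random MFE <= observed MFE) under the interpolated normal distribution."""
    mean, sd = interpolate_params(gc)
    z = (observed_mfe - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p_typical = mfe_p_value(-25.0, 0.5)   # candidate no more stable than random
p_stable = mfe_p_value(-40.0, 0.5)    # candidate three sd below the random mean
```

Because only a table lookup and one error-function call are needed per candidate, no randomized sequences have to be folded at screening time, which is where the speedup reported in the abstract would come from.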
Distribution and Evolution of Peroxisomes in Alveolates (Apicomplexa, Dinoflagellates, Ciliates)
Ludewig-Klingner, Ann-Kathrin; Michael, Victoria; Jarek, Michael; Brinkmann, Henner
2018-01-01
The peroxisome was the last organelle to be discovered and five decades later it is still the Cinderella of eukaryotic compartments. Peroxisomes have a crucial role in the detoxification of reactive oxygen species, the beta-oxidation of fatty acids, and the biosynthesis of etherphospholipids, and they are assumed to be present in virtually all aerobic eukaryotes. Apicomplexan parasites including the malaria and toxoplasmosis agents were described as the first group of mitochondriate protists devoid of peroxisomes. This study was initiated to reassess the distribution and evolution of peroxisomes in the superensemble Alveolata (apicomplexans, dinoflagellates, ciliates). We established transcriptome data from two chromerid algae (Chromera velia, Vitrella brassicaformis), and two dinoflagellates (Prorocentrum minimum, Perkinsus olseni) and identified the complete set of essential peroxins in all four reference species. Our comparative genome analysis provides unequivocal evidence for the presence of peroxisomes in Toxoplasma gondii and related genera. Our working hypothesis of a common peroxisomal origin of all alveolates is supported by phylogenetic analyses of essential markers such as the import receptor Pex5. Vitrella harbors the most comprehensive set of peroxisomal proteins including the catalase and the glyoxylate cycle and it is thus a promising model organism to investigate the functional role of this organelle in Apicomplexa. PMID:29202176
Immunocytological analysis of meiotic recombination in two anole lizards (Squamata, Dactyloidae).
Lisachov, Artem P; Trifonov, Vladimir A; Giovannotti, Massimo; Ferguson-Smith, Malcolm A; Borodin, Pavel M
2017-01-01
Although the evolutionary importance of meiotic recombination is not disputed, the significance of interspecies differences in the recombination rates and recombination landscapes remains under-appreciated. Recombination rates and distribution of chiasmata have been examined cytologically in many mammalian species, whereas data on other vertebrates are scarce. Immunolocalization of the synaptonemal complex protein SYCP3, centromere proteins, and the mismatch-repair protein MLH1, which is associated with the most common type of recombination nodules, was used to analyze the pattern of meiotic recombination in males of two species of iguanian lizards, Anolis carolinensis Voigt, 1832 and Deiroptyx coelestinus (Cope, 1862). These species are separated by a relatively long evolutionary history although they retain the ancestral iguanian karyotype. In both species similar and extremely uneven distributions of MLH1 foci along the macrochromosome bivalents were detected: approximately 90% of crossovers were located at the distal 20% of the chromosome arm length. Almost total suppression of recombination in the intermediate and proximal regions of the chromosome arms contradicts the hypothesis that "homogenous recombination" is responsible for the low variation in GC content across the anole genome. It also leads to strong linkage disequilibrium between the genes located in these regions, which may benefit conservation of co-adaptive gene arrays responsible for the ecological adaptations of the anoles.
Ioannou, Dimitrios; Millan, Nicole M; Jordan, Elizabeth; Tempest, Helen G
2017-01-31
The organization of chromosomes in sperm nuclei has been proposed to possess a unique "hairpin-loop" arrangement, which is hypothesized to aid in the ordered exodus of the paternal genome following fertilization. This study simultaneously assessed the 3D and 2D radial and longitudinal organization of telomeres, centromeres, and investigated whether chromosomes formed the same centromere clusters in sperm cells. Reproducible radial and longitudinal non-random organization was observed for all investigated loci using both 3D and 2D approaches in multiple subjects. We report novel findings, with telomeres and centromeres being localized throughout the nucleus but demonstrating roughly a 1:1 distribution in the nuclear periphery and the intermediate regions with <15% occupying the nuclear interior. Telomeres and centromeres were observed to aggregate in sperm nuclei, forming an average of 20 and 7 clusters, respectively. Reproducible longitudinal organization demonstrated preferential localization of telomeres and centromeres in the mid region of the sperm cell. Preliminary evidence is also provided to support the hypothesis that specific chromosomes preferentially form the same centromere clusters. The more segmental distribution of telomeres and centromeres as described in this study could more readily accommodate and facilitate the sequential exodus of paternal chromosomes following fertilization. PMID:28139771
2010-01-01
Background An important focus of genomic science is the discovery and characterization of all functional elements within genomes. In silico methods are used in genome studies to discover putative regulatory genomic elements (called words or motifs). Although a number of methods have been developed for motif discovery, most of them lack the scalability needed to analyze large genomic data sets. Methods This manuscript presents WordSeeker, an enumerative motif discovery toolkit that utilizes multi-core and distributed computational platforms to enable scalable analysis of genomic data. A controller task coordinates activities of worker nodes, each of which (1) enumerates a subset of the DNA word space and (2) scores words with a distributed Markov chain model. Results A comprehensive suite of performance tests was conducted to demonstrate the performance, speedup and efficiency of WordSeeker. The scalability of the toolkit enabled the analysis of the entire genome of Arabidopsis thaliana; the results of the analysis were integrated into The Arabidopsis Gene Regulatory Information Server (AGRIS). A public version of WordSeeker was deployed on the Glenn cluster at the Ohio Supercomputer Center. Conclusion WordSeeker effectively utilizes concurrent computing platforms to enable the identification of putative functional elements in genomic data sets. This capability facilitates the analysis of the large quantity of sequenced genomic data. PMID:21210985
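The enumerate-then-score idea behind the worker nodes can be illustrated on a single process. The sketch below is not WordSeeker's implementation: it assumes an order-0 background (independent bases) in place of the distributed Markov chain model, scores each word as an observed/expected count ratio, and uses hypothetical function names.

```python
from collections import Counter
from itertools import product


def count_words(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def score_words(seq, k):
    """Score all 4**k DNA words by observed / expected count.

    The expected count assumes an order-0 background (independent bases),
    a simplifying assumption made for this sketch.
    """
    n = len(seq)
    base_freq = {b: seq.count(b) / n for b in "ACGT"}
    observed = count_words(seq, k)
    positions = n - k + 1
    scores = {}
    for word in map("".join, product("ACGT", repeat=k)):
        prob = 1.0
        for base in word:
            prob *= base_freq[base]
        expected = prob * positions
        if expected > 0:
            scores[word] = observed.get(word, 0) / expected
    return scores


toy_scores = score_words("ACGTACGTACGT", 2)
top_word = max(toy_scores, key=toy_scores.get)
```

In a distributed setting the loop over `product("ACGT", repeat=k)` is the natural place to partition work, for example by assigning each worker the words that share a given prefix.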
Interactive Exploration on Large Genomic Datasets.
Tu, Eric
2016-01-01
The prevalence of large genomics datasets has made the need to explore this data more important. Large sequencing projects like the 1000 Genomes Project [1], which reconstructed the genomes of 2,504 individuals sampled from 26 populations, have produced over 200TB of publicly available data. Meanwhile, existing genomic visualization tools have been unable to scale with the growing amount of larger, more complex data. This difficulty is acute when viewing large regions (over 1 megabase, or 1,000,000 bases of DNA), or when concurrently viewing multiple samples of data. While genomic processing pipelines have shifted towards using distributed computing techniques, such as with ADAM [4], genomic visualization tools have not. In this work we present Mango, a scalable genome browser built on top of ADAM that can run both locally and on a cluster. Mango presents a combination of different optimizations that can be combined in a single application to drive novel genomic visualization techniques over terabytes of genomic data. By building visualization on top of a distributed processing pipeline, we can perform visualization queries over large regions that are not possible with current tools, and decrease the time for viewing large data sets. Mango is part of the Big Data Genomics project at University of California-Berkeley [25] and is published under the Apache 2 license. Mango is available at https://github.com/bigdatagenomics/mango.
USDA-ARS's Scientific Manuscript database
The mechanisms arthropods use to induce plant gall formation are poorly understood. However, there is growing evidence that effector proteins are involved. To examine this hypothesis, we sequenced the genome of the Hessian fly (Mayetiola destructor, M. des), an obligate plant parasitic gall midge an...
Genetic diversity of Danthonia spicata (L.) Beauv. Based on genomic simple sequence repeat markers
USDA-ARS's Scientific Manuscript database
Danthonia spicata, commonly known as poverty oatgrass, is a perennial bunch-type grass native to North America. D. spicata has dimorphic seed heads; the hypothesis is that terminal seed heads allow some level of outcrossing and axial seed heads are only self-fertilized. However, there is no genetic ...
Havird, Justin C; Hall, Matthew D; Dowling, Damian K
2015-09-01
The evolution of sex in eukaryotes represents a paradox, given the "twofold" fitness cost it incurs. We hypothesize that the mutational dynamics of the mitochondrial genome would have favored the evolution of sexual reproduction. Mitochondrial DNA (mtDNA) exhibits a high-mutation rate across most eukaryote taxa, and several lines of evidence suggest that this high rate is an ancestral character. This seems inexplicable given that mtDNA-encoded genes underlie the expression of life's most salient functions, including energy conversion. We propose that negative metabolic effects linked to mitochondrial mutation accumulation would have invoked selection for sexual recombination between divergent host nuclear genomes in early eukaryote lineages. This would provide a mechanism by which recombinant host genotypes could be rapidly shuffled and screened for the presence of compensatory modifiers that offset mtDNA-induced harm. Under this hypothesis, recombination provides the genetic variation necessary for compensatory nuclear coadaptation to keep pace with mitochondrial mutation accumulation. © 2015 WILEY Periodicals, Inc.
Mating system shifts and transposable element evolution in the plant genus Capsella.
Ågren, J Arvid; Wang, Wei; Koenig, Daniel; Neuffer, Barbara; Weigel, Detlef; Wright, Stephen I
2014-07-16
Despite having predominately deleterious fitness effects, transposable elements (TEs) are major constituents of eukaryote genomes in general and of plant genomes in particular. Although the proportion of the genome made up of TEs varies at least four-fold across plants, the relative importance of the evolutionary forces shaping variation in TE abundance and distributions across taxa remains unclear. Under several theoretical models, mating system plays an important role in governing the evolutionary dynamics of TEs. Here, we use the recently sequenced Capsella rubella reference genome and short-read whole genome sequencing of multiple individuals to quantify abundance, genome distributions, and population frequencies of TEs in three recently diverged species of differing mating system, two self-compatible species (C. rubella and C. orientalis) and their self-incompatible outcrossing relative, C. grandiflora. We detect different dynamics of TE evolution in our two self-compatible species; C. rubella shows a small increase in transposon copy number, while C. orientalis shows a substantial decrease relative to C. grandiflora. The direction of this change in copy number is genome wide and consistent across transposon classes. For insertions near genes, however, we detect the highest abundances in C. grandiflora. Finally, we also find differences in the population frequency distributions across the three species. Overall, our results suggest that the evolution of selfing may have different effects on TE evolution on a short and on a long timescale. Moreover, cross-species comparisons of transposon abundance are sensitive to reference genome bias, and efforts to control for this bias are key when making comparisons across species.
Identification and Characterization of Domesticated Bacterial Transposases
Gallie, Jenna; Rainey, Paul B.
2017-01-01
Selfish genetic elements, such as insertion sequences and transposons, are found in most genomes. Transposons are usually identifiable by their high copy number within genomes. In contrast, REP-associated tyrosine transposases (RAYTs), a recently described class of bacterial transposase, are typically present at just one copy per genome. This suggests that RAYTs no longer copy themselves and thus no longer function as typical transposases. Motivated by this possibility, we interrogated thousands of fully sequenced bacterial genomes in order to determine patterns of RAYT diversity, their distribution across chromosomes and accessory elements, and rate of duplication. RAYTs encompass exceptional diversity and are divisible into at least five distinct groups. They possess features more similar to housekeeping genes than insertion sequences, are predominantly vertically transmitted and have persisted through evolutionary time to the point where they are now found in 24% of all species for which at least one fully sequenced genome is available. Overall, the genomic distribution of RAYTs suggests that they have been coopted by host genomes to perform a function that benefits the host cell. PMID:28910967
Local Genetic Correlation Gives Insights into the Shared Genetic Architecture of Complex Traits.
Shi, Huwenbo; Mancuso, Nicholas; Spendlove, Sarah; Pasaniuc, Bogdan
2017-11-02
Although genetic correlations between complex traits provide valuable insights into epidemiological and etiological studies, a precise quantification of which genomic regions disproportionately contribute to the genome-wide correlation is currently lacking. Here, we introduce ρ-HESS, a technique to quantify the correlation between pairs of traits due to genetic variation at a small region in the genome. Our approach requires GWAS summary data only and makes no distributional assumption on the causal variant effect sizes while accounting for linkage disequilibrium (LD) and overlapping GWAS samples. We analyzed large-scale GWAS summary data across 36 quantitative traits, and identified 25 genomic regions that contribute significantly to the genetic correlation among these traits. Notably, we find 6 genomic regions that contribute to the genetic correlation of 10 pairs of traits that show negligible genome-wide correlation, further showcasing the power of local genetic correlation analyses. Finally, we report the distribution of local genetic correlations across the genome for 55 pairs of traits that show putative causal relationships. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
The genome of the Gulf pipefish enables understanding of evolutionary innovations.
Small, C M; Bassham, S; Catchen, J; Amores, A; Fuiten, A M; Brown, R S; Jones, A G; Cresko, W A
2016-12-20
Evolutionary origins of derived morphologies ultimately stem from changes in protein structure, gene regulation, and gene content. A well-assembled, annotated reference genome is a central resource for pursuing these molecular phenomena underlying phenotypic evolution. We explored the genome of the Gulf pipefish (Syngnathus scovelli), which belongs to family Syngnathidae (pipefishes, seahorses, and seadragons). These fishes have dramatically derived bodies and a remarkable novelty among vertebrates, the male brood pouch. We produce a reference genome, condensed into chromosomes, for the Gulf pipefish. Gene losses and other changes have occurred in pipefish hox and dlx clusters and in the tbx and pitx gene families, candidate mechanisms for the evolution of syngnathid traits, including an elongated axis and the loss of ribs, pelvic fins, and teeth. We measure gene expression changes in pregnant versus non-pregnant brood pouch tissue and characterize the genomic organization of duplicated metalloprotease genes (patristacins) recruited into the function of this novel structure. Phylogenetic inference using ultraconserved sequences provides an alternative hypothesis for the relationship between orders Syngnathiformes and Scombriformes. Comparisons of chromosome structure among percomorphs show that chromosome number in a pipefish ancestor became reduced via chromosomal fusions. The collected findings from this first syngnathid reference genome open a window into the genomic underpinnings of highly derived morphologies, demonstrating that de novo production of high quality and useful reference genomes is within reach of even small research groups.
Aziz, Ramy K.; Dwivedi, Bhakti; Akhter, Sajia; Breitbart, Mya; Edwards, Robert A.
2015-01-01
Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set of publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool, Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. We propose adding this set of metrics to current metaviromic analysis pipelines, where they can provide insight regarding phage mosaicism, habitat specificity, and evolution. PMID:26005436
Aziz, Ramy K.; Dwivedi, Bhakti; Akhter, Sajia; ...
2015-05-08
Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set of publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool, Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. We propose adding this set of metrics to current metaviromic analysis pipelines, where they can provide insight regarding phage mosaicism, habitat specificity, and evolution.
Narla, S T; Lee, Y-W; Benson, C A; Sarder, P; Brennand, K J; Stachowiak, E K; Stachowiak, M K
2017-07-01
The watershed-hypothesis of schizophrenia asserts that over 200 different mutations dysregulate distinct pathways that converge on an unspecified common mechanism(s) that controls disease ontogeny. Consistent with this hypothesis, our RNA-sequencing of neuron committed cells (NCCs) differentiated from established iPSCs of 4 schizophrenia patients and 4 control subjects uncovered a dysregulated transcriptome of 1349 mRNAs common to all patients. The data reveal a global dysregulation of the developmental genome, deconstruction of coordinated mRNA networks, and the formation of aberrant, new coordinated mRNA networks, indicating a concerted action of the responsible factor(s). Sequencing of miRNA transcriptomes demonstrated an overexpression of 16 miRNAs and deconstruction of interactive miRNA-mRNA networks in schizophrenia NCCs. ChIP-seq revealed that the nuclear (n) form of FGFR1, a pan-ontogenic regulator, is overexpressed in schizophrenia NCCs and overtargets dysregulated mRNA and miRNA genes. The nFGFR1 targeted 54% of all human gene promoters and 84.4% of schizophrenia dysregulated genes. The upregulated genes reside within major developmental pathways that control neurogenesis and neuron formation, whereas downregulated genes are involved in oligodendrogenesis. Our results indicate that (i) schizophrenia has an early (preneuronal) genomic etiology, (ii) dysregulated genes and new coordinated gene networks are common to unrelated cases of schizophrenia, (iii) gene dysregulations are accompanied by increased nFGFR1-genome interactions, and (iv) modeling of increased nFGFR1 by overexpression of nFGFR1 leads to up- or downregulation of selected genes as observed in schizophrenia NCCs. Together our results designate nFGFR1 signaling as a potential common dysregulated mechanism in the investigated patients and a potential therapeutic target in schizophrenia. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Rietschel, Marcella; Mattheisen, Manuel; Breuer, René; Schulze, Thomas G.; Nöthen, Markus M.; Levinson, Douglas; Shi, Jianxin; Gejman, Pablo V.; Cichon, Sven; Ophoff, Roel A.
2012-01-01
Recent studies suggest that variation in complex disorders (e.g., schizophrenia) is explained by a large number of genetic variants with small effect size (Odds Ratio∼1.05–1.1). The statistical power to detect these genetic variants in Genome Wide Association (GWA) studies with large numbers of cases and controls (∼15,000) is still low. As it will be difficult to further increase sample size, we decided to explore an alternative method for analyzing GWA data in a study of schizophrenia, dramatically reducing the number of statistical tests. The underlying hypothesis was that at least some of the genetic variants related to a common outcome are collocated in segments of chromosomes at a wider scale than single genes. Our approach was therefore to study the association between relatively large segments of DNA and disease status. An association test was performed for each SNP and the number of nominally significant tests in a segment was counted. We then performed a permutation-based binomial test to determine whether this region contained significantly more nominally significant SNPs than expected under the null hypothesis of no association, taking linkage into account. Genome Wide Association data of three independent schizophrenia case/control cohorts with European ancestry (Dutch, German, and US) using segments of DNA with variable length (2 to 32 Mbp) was analyzed. Using this approach we identified a region at chromosome 5q23.3-q31.3 (128–160 Mbp) that was significantly enriched with nominally associated SNPs in three independent case-control samples. We conclude that considering relatively wide segments of chromosomes may reveal reliable relationships between the genome and schizophrenia, suggesting novel methodological possibilities as well as raising theoretical questions. PMID:22723893
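The segment-level counting step can be sketched with a plain binomial tail probability. This is a deliberate simplification: the study used a permutation-based test that takes linkage into account, whereas the sketch below treats SNPs as independent, so its P-values would be anti-conservative on real data. The function names and the toy P-values are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def segment_enrichment(snp_p_values, alpha=0.05):
    """Count nominally significant SNPs in a segment and test for excess.

    Under the null hypothesis of no association (and, as assumed here,
    independent SNPs), each test is nominally significant with probability alpha.
    """
    n = len(snp_p_values)
    k = sum(1 for pv in snp_p_values if pv < alpha)
    return k, binomial_tail(k, n, alpha)


# Toy per-SNP association P-values for one segment (invented numbers).
toy_segment = [0.01, 0.03, 0.2, 0.04, 0.6, 0.02, 0.8, 0.5, 0.04, 0.9]
n_significant, enrichment_p = segment_enrichment(toy_segment)
```

Replacing the analytic tail with an empirical null obtained by permuting case/control labels is what would bring this closer to the approach described in the abstract, since permutation preserves the correlation between neighbouring SNPs.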
Calcisponges have a ParaHox gene and dynamic expression of dispersed NK homeobox genes.
Fortunato, Sofia A V; Adamski, Marcin; Ramos, Olivia Mendivil; Leininger, Sven; Liu, Jing; Ferrier, David E K; Adamska, Maja
2014-10-30
Sponges are simple animals with few cell types, but their genomes paradoxically contain a wide variety of developmental transcription factors, including homeobox genes belonging to the Antennapedia (ANTP) class, which in bilaterians encompass Hox, ParaHox and NK genes. In the genome of the demosponge Amphimedon queenslandica, no Hox or ParaHox genes are present, but NK genes are linked in a tight cluster similar to the NK clusters of bilaterians. It has been proposed that Hox and ParaHox genes originated from NK cluster genes after divergence of sponges from the lineage leading to cnidarians and bilaterians. On the other hand, synteny analysis lends support to the notion that the absence of Hox and ParaHox genes in Amphimedon is a result of secondary loss (the ghost locus hypothesis). Here we analysed complete suites of ANTP-class homeoboxes in two calcareous sponges, Sycon ciliatum and Leucosolenia complicata. Our phylogenetic analyses demonstrate that these calcisponges possess orthologues of bilaterian NK genes (Hex, Hmx and Msx), a varying number of additional NK genes and one ParaHox gene, Cdx. Despite the generation of scaffolds spanning multiple genes, we find no evidence of clustering of Sycon NK genes. All Sycon ANTP-class genes are developmentally expressed, with patterns suggesting their involvement in cell type specification in embryos and adults, metamorphosis and body plan patterning. These results demonstrate that ParaHox genes predate the origin of sponges, thus confirming the ghost locus hypothesis, and highlight the need to analyse the genomes of multiple sponge lineages to obtain a complete picture of the ancestral composition of the first animal genome.
López-Pérez, Mario; Gonzaga, Aitor; Martin-Cuadrado, Ana-Belen; Onyshchenko, Olga; Ghavidel, Akbar; Ghai, Rohit; Rodriguez-Valera, Francisco
2012-01-01
Alteromonas macleodii is a marine gammaproteobacterium with widespread distribution in temperate or tropical waters. We describe three genomes of isolates from surface waters around Europe (Atlantic, Mediterranean and Black Sea) and compare them with a previously described deep Mediterranean isolate (AltDE) that belongs to a widely divergent clade. The surface isolates are quite similar, the most divergent being the Black Sea (BS11) isolate. The genomes contain several genomic islands with different gene content. The recruitment of very similar genomic fragments from metagenomes in different locations indicates that the surface clade is globally abundant with little effect of geography, even the AltDE and the BS11 genomes recruiting from surface samples in open ocean locations. The finding of CRISPR protospacers of AltDE in a lysogenic phage in the Atlantic (English Channel) isolate illustrates a flow of genetic material among these clades and a remarkably wide distribution of this phage. PMID:23019517
Two fundamentally different classes of microbial genes.
Wolf, Yuri I; Makarova, Kira S; Lobkovsky, Alexander E; Koonin, Eugene V
2016-11-07
The evolution of bacterial and archaeal genomes is highly dynamic and involves extensive horizontal gene transfer and gene loss [1-4]. Furthermore, many microbial species appear to have open pangenomes, where each newly sequenced genome contains more than 10% ORFans, that is, genes without detectable homologues in other species [5,6]. Here, we report a quantitative analysis of microbial genome evolution by fitting the parameters of a simple, steady-state evolutionary model to the comparative genomic data on the gene content and gene order similarity between archaeal genomes. The results reveal two sharply distinct classes of microbial genes, one of which is characterized by effectively instantaneous gene replacement, and the other consists of genes with finite, distributed replacement rates. These findings imply a conservative estimate of the size of the prokaryotic genomic universe, which appears to consist of at least a billion distinct genes. Furthermore, the same distribution of constraints is shown to govern the evolution of gene complement and gene order, without the need to invoke long-range conservation or the selfish operon concept [7].
Lassalle, Florent; Campillo, Tony; Vial, Ludovic; Baude, Jessica; Costechareyre, Denis; Chapulliot, David; Shams, Malek; Abrouk, Danis; Lavire, Céline; Oger-Desfeux, Christine; Hommais, Florence; Guéguen, Laurent; Daubin, Vincent; Muller, Daniel; Nesme, Xavier
2011-01-01
The definition of bacterial species is based on genomic similarities, giving rise to the operational concept of genomic species, but the reasons for the occurrence of differentiated genomic species remain largely unknown. We used the Agrobacterium tumefaciens species complex and particularly the genomic species presently called genomovar G8, which includes the sequenced strain C58, to test the hypothesis that genomic species have specific ecological adaptations possibly involved in the speciation process. We analyzed the gene repertoire specific to G8 to identify potential adaptive genes. By hybridizing 25 strains of A. tumefaciens on DNA microarrays spanning the C58 genome, we highlighted the presence and absence of genes homologous to C58 in the taxon. We found 196 genes specific to genomovar G8 that were mostly clustered into seven genomic islands on the C58 genome (one on the circular chromosome and six on the linear chromosome), suggesting higher plasticity and a major adaptive role of the latter. Clusters encoded putative functional units, four of which had been verified experimentally. The combination of G8-specific functions defines a hypothetical species primary niche for G8 related to commensal interaction with a host plant. This supports the view that the G8 ancestor was able to exploit a new ecological niche, possibly initiating ecological isolation and thus speciation. Searching genomic data for synapomorphic traits is a powerful way to describe bacterial species. This procedure allowed us to find such phenotypic traits specific to genomovar G8 and thus propose a Latin binomial, Agrobacterium fabrum, for this bona fide genomic species. PMID:21795751
Is mammalian chromosomal evolution driven by regions of genome fragility?
Ruiz-Herrera, Aurora; Castresana, Jose; Robinson, Terence J
2006-01-01
Background A fundamental question in comparative genomics concerns the identification of mechanisms that underpin chromosomal change. In an attempt to shed light on the dynamics of mammalian genome evolution, we analyzed the distribution of syntenic blocks, evolutionary breakpoint regions, and evolutionary breakpoints taken from public databases available for seven eutherian species (mouse, rat, cattle, dog, pig, cat, and horse) and the chicken, and examined these for correspondence with human fragile sites and tandem repeats. Results Our results confirm previous investigations that showed the presence of chromosomal regions in the human genome that have been repeatedly used as illustrated by a high breakpoint accumulation in certain chromosomes and chromosomal bands. We show, however, that there is a striking correspondence between fragile site location, the positions of evolutionary breakpoints, and the distribution of tandem repeats throughout the human genome, which similarly reflect a non-uniform pattern of occurrence. Conclusion These observations provide further evidence that certain chromosomal regions in the human genome have been repeatedly used in the evolutionary process. As a consequence, the genome is a composite of fragile regions prone to reorganization that have been conserved in different lineages, and genomic tracts that do not exhibit the same levels of evolutionary plasticity. PMID:17156441
A note on generalized Genome Scan Meta-Analysis statistics
Koziol, James A; Feng, Anne C
2005-01-01
Background Wise et al. introduced a rank-based statistical technique for meta-analysis of genome scans, the Genome Scan Meta-Analysis (GSMA) method. Levinson et al. recently described two generalizations of the GSMA statistic: (i) a weighted version of the GSMA statistic, so that different studies could be ascribed different weights for analysis; and (ii) an order statistic approach, reflecting the fact that a GSMA statistic can be computed for each chromosomal region or bin width across the various genome scan studies. Results We provide an Edgeworth approximation to the null distribution of the weighted GSMA statistic. We also examine the limiting distribution of the GSMA statistics under the order statistic formulation and quantify the relevance of the pairwise correlations of the GSMA statistics across different bins to this limiting distribution. Finally, we remark on aggregate criteria and multiple testing for determining the significance of GSMA results. Conclusion Theoretical considerations detailed herein can lead to clarification and simplification of testing criteria for generalizations of the GSMA statistic. PMID:15717930
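The weighted summed-rank idea described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: each study contributes one linkage score per bin, bins are ranked within a study so that the strongest bin receives the highest rank, and the null distribution is approximated by simulation rather than by the Edgeworth approximation the authors derive. The function names (`rank_bins`, `weighted_gsma`, `simulated_pvalue`) are hypothetical.

```python
import random

def rank_bins(scores):
    """Rank bins within one study: weakest score -> rank 1, strongest -> rank n.
    Ties are broken by bin order (an assumption of this sketch)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def weighted_gsma(score_table, weights=None):
    """score_table[s][b] is the score of bin b in study s.
    Returns the weighted sum of within-study ranks for every bin."""
    if weights is None:
        weights = [1.0] * len(score_table)  # unweighted GSMA
    totals = [0.0] * len(score_table[0])
    for w, scores in zip(weights, score_table):
        for b, r in enumerate(rank_bins(scores)):
            totals[b] += w * r
    return totals

def simulated_pvalue(observed, n_bins, weights, n_sim=10000, seed=0):
    """Monte Carlo p-value for one bin, assuming that under the null each
    study's rank for the bin is uniform on 1..n_bins and independent."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(n_sim)
        if sum(w * rng.randint(1, n_bins) for w in weights) >= observed
    )
    return (hits + 1) / (n_sim + 1)
```

Setting all weights to 1 recovers the unweighted summed rank; unequal weights let larger or more informative studies count for more.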
Three tiers of genome evolution in reptiles
Organ, Chris L.; Moreno, Ricardo Godínez; Edwards, Scott V.
2008-01-01
Characterization of reptilian genomes is essential for understanding the overall diversity and evolution of amniote genomes, because reptiles, which include birds, constitute a major fraction of the amniote evolutionary tree. To better understand the evolution and diversity of genomic characteristics in Reptilia, we conducted comparative analyses of online sequence data from Alligator mississippiensis (alligator) and Sphenodon punctatus (tuatara) as well as genome size and karyological data from a wide range of reptilian species. At the whole-genome and chromosomal tiers of organization, we find that reptilian genome size distribution is consistent with a model of continuous gradual evolution while genomic compartmentalization, as manifested in the number of microchromosomes and macrochromosomes, appears to have undergone early rapid change. At the sequence level, the third genomic tier, we find that exon size in Alligator is distributed in a pattern matching that of exons in Gallus (chicken), especially in the 101–200 bp size class. A small spike in the fraction of exons in the 301 bp–1 kb size class is also observed for Alligator, but more so for Sphenodon. For introns, we find that members of Reptilia have a larger fraction of introns within the 101 bp–2 kb size class and a lower fraction of introns within the 5–30 kb size class than do mammals. These findings suggest that the mode of reptilian genome evolution varies across three hierarchical levels of the genome, a pattern consistent with a mosaic model of genomic evolution. PMID:21669810
Mei, Ting; Fu, Wen-Bo; Li, Bo; He, Zheng-Bo; Chen, Bin
2018-01-01
Chemosensory proteins (CSP) are soluble carrier proteins that may function in odorant reception in insects. CSPs have not been thoroughly studied at whole-genome level, despite the availability of insect genomes. Here, we identified/reidentified 283 CSP genes in the genomes of 22 mosquitoes. All 283 CSP genes possess a highly conserved OS-D domain. We comprehensively analyzed these CSP genes and determined their conserved domains, structure, genomic distribution, phylogeny, and evolutionary patterns. We found an average of seven CSP genes in each of 19 Anopheles genomes, 27 CSP genes in Cx. quinquefasciatus, 43 in Ae. aegypti, and 83 in Ae. albopictus. The Anopheles CSP genes had a simple genomic organization with a relatively consistent gene distribution, while most of the Culicinae CSP genes were distributed in clusters on the scaffolds. Our phylogenetic analysis clustered the CSPs into two major groups: CSP1-8 and CSE1-3. The CSP1-8 groups were all monophyletic with good bootstrap support. The CSE1-3 groups were an expansion of the CSP family of genes specific to the three Culicinae species. The Ka/Ks ratios indicated that the CSP genes had been subject to purifying selection with relatively slow evolution. Our results provide a comprehensive framework for the study of the CSP gene family in these 22 mosquito species, laying a foundation for future work on CSP function in the detection of chemical cues in the surrounding environment. PMID:29304168
Analysis of horizontal genetic transfer in red algae in the post-genomics age
Chan, Cheong Xin; Bhattacharya, Debashish
2013-01-01
The recently published genome of the unicellular red alga Porphyridium purpureum revealed a gene-rich, intron-poor species, which is surprising for a free-living mesophile. Of the 8,355 predicted protein-coding regions, up to 773 (9.3%) were implicated in horizontal genetic transfer (HGT) events involving other prokaryote and eukaryote lineages. A much smaller number, up to 174 (2.1%) showed unambiguous evidence of vertical inheritance. Together with other red algal genomes, nearly all published in 2013, these data provide an excellent platform for studying diverse aspects of algal biology and evolution. This novel information will help investigators test existing hypotheses about the impact of endosymbiosis and HGT on algal evolution and enable comparative analysis within a more-refined, hypothesis-driven framework that extends beyond HGT. Here we explore the impacts of this infusion of red algal genome data on addressing questions regarding the complex nature of algal evolution and highlight the need for scalable phylogenomic approaches to handle the forthcoming deluge of sequence information. PMID:24475368
Sigma: Strain-level inference of genomes from metagenomic analysis for biosurveillance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahn, Tae-Hyuk; Chai, Juanjuan; Pan, Chongle
2014-09-29
Motivation: Metagenomic sequencing of clinical samples provides a promising technique for direct pathogen detection and characterization in biosurveillance. Taxonomic analysis at the strain level can be used to resolve serotypes of a pathogen in biosurveillance. Sigma was developed for strain-level identification and quantification of pathogens using their reference genomes based on metagenomic analysis. Results: Sigma provides not only accurate strain-level inferences, but also three unique capabilities: (i) Sigma quantifies the statistical uncertainty of its inferences, which includes hypothesis testing of identified genomes and confidence interval estimation of their relative abundances; (ii) Sigma enables strain variant calling by assigning metagenomic reads to their most likely reference genomes; and (iii) Sigma supports parallel computing for fast analysis of large datasets. In conclusion, the algorithm performance was evaluated using simulated mock communities and fecal samples with spike-in pathogen strains. Availability and Implementation: Sigma was implemented in C++ with source codes and binaries freely available at http://sigma.omicsbio.org.
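To make the idea of estimating relative abundances and assigning reads to their most likely reference genome concrete, here is a generic expectation-maximization sketch for a mixture of genomes. It is not Sigma's implementation (which is in C++ and is not described in detail in this abstract); the function names and the input format, a per-read list of likelihoods P(read | genome), are assumptions made for illustration.

```python
def estimate_abundances(read_likelihoods, n_iter=200, tol=1e-10):
    """read_likelihoods[r][g] = assumed likelihood of read r given genome g.
    Every read must have a positive likelihood under at least one genome.
    Returns (abundances, posteriors) from a plain EM fit of mixture weights."""
    n_genomes = len(read_likelihoods[0])
    theta = [1.0 / n_genomes] * n_genomes  # start from uniform abundances
    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior probability that each read came from each genome.
        posteriors = []
        for lik in read_likelihoods:
            weighted = [t * l for t, l in zip(theta, lik)]
            total = sum(weighted)
            posteriors.append([w / total for w in weighted])
        # M-step: abundance = average posterior responsibility over reads.
        new_theta = [
            sum(p[g] for p in posteriors) / len(posteriors)
            for g in range(n_genomes)
        ]
        converged = max(abs(a - b) for a, b in zip(theta, new_theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta, posteriors

def assign_reads(posteriors):
    """Assign each read to the genome with the highest posterior probability."""
    return [max(range(len(p)), key=lambda g: p[g]) for p in posteriors]
```

The read assignments returned by `assign_reads` are the kind of input a downstream strain variant caller would need; confidence intervals and hypothesis tests, which the abstract lists as Sigma features, are outside this sketch.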
Boldogköi, Zsolt
2012-01-01
The regulation of gene expression is essential for normal functioning of biological systems in every form of life. Gene expression is primarily controlled at the level of transcription, especially at the phase of initiation. Non-coding RNAs are one of the major players at every level of genetic regulation, including the control of chromatin organization, transcription, various post-transcriptional processes, and translation. In this study, the Transcriptional Interference Network (TIN) hypothesis was put forward in an attempt to explain the global expression of antisense RNAs and the overall occurrence of tandem gene clusters in the genomes of various biological systems ranging from viruses to mammalian cells. The TIN hypothesis suggests the existence of a novel layer of genetic regulation, based on the interactions between the transcriptional machineries of neighboring genes at their overlapping regions, which are assumed to play a fundamental role in coordinating gene expression within a cluster of functionally linked genes. It is claimed that the transcriptional overlaps between adjacent genes are much more widespread in genomes than is thought today. The Waterfall model of the TIN hypothesis postulates a unidirectional effect of upstream genes on the transcription of downstream genes within a cluster of tandemly arrayed genes, while the Seesaw model proposes a mutual interdependence of gene expression between the oppositely oriented genes. The TIN represents an auto-regulatory system with an exquisitely timed and highly synchronized cascade of gene expression in functionally linked genes located in close physical proximity to each other. In this study, we focused on herpesviruses. The reason for this lies in the compressed nature of viral genes, which allows a tight regulation and an easier investigation of the transcriptional interactions between genes. However, I believe that the same or similar principles can be applied to cellular organisms too. 
PMID:22783276
Gene-culture coevolution in whales and dolphins.
Whitehead, Hal
2017-07-24
Whales and dolphins (Cetacea) have excellent social learning skills as well as a long and strong mother–calf bond. These features produce stable cultures, and, in some species, sympatric groups with different cultures. There is evidence and speculation that this cultural transmission of behavior has affected gene distributions. Culture seems to have driven killer whales into distinct ecotypes, which may be incipient species or subspecies. There are ecotype-specific signals of selection in functional genes that correspond to cultural foraging behavior and habitat use by the different ecotypes. The five species of whale with matrilineal social systems have remarkably low diversity of mtDNA. Cultural hitchhiking, the transmission of functionally neutral genes in parallel with selective cultural traits, is a plausible hypothesis for this low diversity, especially in sperm whales. In killer whales the ecotype divisions, together with founding bottlenecks, selection, and cultural hitchhiking, likely explain the low mtDNA diversity. Several cetacean species show habitat-specific distributions of mtDNA haplotypes, probably the result of mother–offspring cultural transmission of migration routes or destinations. In bottlenose dolphins, remarkable small-scale differences in haplotype distribution result from maternal cultural transmission of foraging methods, and large-scale redistributions of sperm whale cultural clans in the Pacific have likely changed mitochondrial genetic geography. With the acceleration of genomics new results should come fast, but understanding gene–culture coevolution will be hampered by the measured pace of research on the socio-cultural side of cetacean biology. PMID:28739936
ISRNA: an integrative online toolkit for short reads from high-throughput sequencing data.
Luo, Guan-Zheng; Yang, Wei; Ma, Ying-Ke; Wang, Xiu-Jie
2014-02-01
Integrative Short Reads NAvigator (ISRNA) is an online toolkit for analyzing high-throughput small RNA sequencing data. Besides the high-speed genome mapping function, ISRNA provides statistics for genomic location, length distribution and nucleotide composition bias analysis of sequence reads. Number of reads mapped to known microRNAs and other classes of short non-coding RNAs, coverage of short reads on genes, expression abundance of sequence reads as well as some other analysis functions are also supported. The versatile search functions enable users to select sequence reads according to their sub-sequences, expression abundance, genomic location, relationship to genes, etc. A specialized genome browser is integrated to visualize the genomic distribution of short reads. ISRNA also supports management and comparison among multiple datasets. ISRNA is implemented in Java/C++/Perl/MySQL and can be freely accessed at http://omicslab.genetics.ac.cn/ISRNA/.
Seeleuthner, Yoann; Mondy, Samuel; Lombard, Vincent; Carradec, Quentin; Pelletier, Eric; Wessner, Marc; Leconte, Jade; Mangot, Jean-François; Poulain, Julie; Labadie, Karine; Logares, Ramiro; Sunagawa, Shinichi; de Berardinis, Véronique; Salanoubat, Marcel; Dimier, Céline; Kandels-Lewis, Stefanie; Picheral, Marc; Searson, Sarah; Pesant, Stephane; Poulton, Nicole; Stepanauskas, Ramunas; Bork, Peer; Bowler, Chris; Hingamp, Pascal; Sullivan, Matthew B; Iudicone, Daniele; Massana, Ramon; Aury, Jean-Marc; Henrissat, Bernard; Karsenti, Eric; Jaillon, Olivier; Sieracki, Mike; de Vargas, Colomban; Wincker, Patrick
2018-01-22
Single-celled eukaryotes (protists) are critical players in global biogeochemical cycling of nutrients and energy in the oceans. While their roles as primary producers and grazers are well appreciated, other aspects of their life histories remain obscure due to challenges in culturing and sequencing their natural diversity. Here, we exploit single-cell genomics and metagenomics data from the circumglobal Tara Oceans expedition to analyze the genome content and apparent oceanic distribution of seven prevalent lineages of uncultured heterotrophic stramenopiles. Based on the available data, each sequenced genome or genotype appears to have a specific oceanic distribution, principally correlated with water temperature and depth. The genome content provides hypotheses for specialization in terms of cell motility, food spectra, and trophic stages, including the potential impact on their lifestyles of horizontal gene transfer from prokaryotes. Our results support the idea that prominent heterotrophic marine protists perform diverse functions in ocean ecology.
USDA-ARS's Scientific Manuscript database
Expressed sequence tag (EST) simple sequence repeats (SSRs) in Prunus were mined, and flanking primers designed and used for genome-wide characterization and selection of primers to optimize marker distribution and reliability. A total of 12,618 contigs were assembled from 84,727 ESTs, along with 34...
Birth and death of protein domains: A simple model of evolution explains power law behavior
Karev, Georgy P; Wolf, Yuri I; Rzhetsky, Andrey Y; Berezovskaya, Faina S; Koonin, Eugene V
2002-10-14
Background Power distributions appear in numerous biological, physical and other contexts, which appear to be fundamentally different. In biology, power laws have been claimed to describe the distributions of the connections of enzymes and metabolites in metabolic networks, the number of interaction partners of a given protein, the number of members in paralogous families, and other quantities. In network analysis, power laws imply evolution of the network with preferential attachment, i.e. a greater likelihood of nodes being added to pre-existing hubs. Exploration of different types of evolutionary models in an attempt to determine which of them lead to power law distributions has the potential of revealing non-trivial aspects of genome evolution. Results A simple model of evolution of the domain composition of proteomes was developed, with the following elementary processes: i) domain birth (duplication with divergence), ii) death (inactivation and/or deletion), and iii) innovation (emergence from non-coding or non-globular sequences or acquisition via horizontal gene transfer). This formalism can be described as a birth, death and innovation model (BDIM). The formulas for equilibrium frequencies of domain families of different size and the total number of families at equilibrium are derived for a general BDIM. All asymptotics of equilibrium frequencies of domain families possible for the given type of models are found and their appearance depending on model parameters is investigated. It is proved that the power law asymptotics appears if, and only if, the model is balanced, i.e. domain duplication and deletion rates are asymptotically equal up to the second order. It is further proved that any power asymptotic with the degree not equal to -1 can appear only if the hypothesis of independence of the duplication/deletion rates on the size of a domain family is rejected.
Specific cases of BDIMs, namely simple, linear, polynomial and rational models, are considered in detail and the distributions of the equilibrium frequencies of domain families of different size are determined for each case. We apply the BDIM formalism to the analysis of the domain family size distributions in prokaryotic and eukaryotic proteomes and show an excellent fit between these empirical data and a particular form of the model, the second-order balanced linear BDIM. Calculation of the parameters of these models suggests surprisingly high innovation rates, comparable to the total domain birth (duplication) and elimination rates, particularly for prokaryotic genomes. Conclusions We show that a straightforward model of genome evolution, which does not explicitly include selection, is sufficient to explain the observed distributions of domain family sizes, in which power laws appear as asymptotic. However, for the model to be compatible with the data, there has to be a precise balance between domain birth, death and innovation rates, and this is likely to be maintained by selection. The developed approach is oriented at a mathematical description of evolution of domain composition of proteomes, but a simple reformulation could be applied to models of other evolving networks with preferential attachment. PMID:12379152
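The link between balanced birth and death rates and power-law family size distributions can be checked numerically. The sketch below is an illustration, not the authors' derivation: it assumes a finite maximum family size and uses the standard birth-death equilibrium condition that the flow from size i to i+1 equals the flow back, birth(i) * f_i = death(i+1) * f_(i+1). The linear rate form rate * (i + offset) and all function names are assumptions chosen to mirror the "simple" and "linear" cases named in the abstract.

```python
import math

def equilibrium_frequencies(birth, death, max_size):
    """Equilibrium frequencies f_1..f_max_size of families of each size,
    normalized to sum to 1, from birth(i) * f_i = death(i + 1) * f_(i+1)."""
    f = [1.0]
    for i in range(1, max_size):
        f.append(f[-1] * birth(i) / death(i + 1))
    total = sum(f)
    return [x / total for x in f]

def linear_rate(rate, offset):
    """Per-family rate for a family of size i: rate * (i + offset)."""
    return lambda i: rate * (i + offset)

def loglog_slope(freqs, lo, hi):
    """Slope of log f_i against log i between family sizes lo and hi."""
    return (math.log(freqs[hi - 1]) - math.log(freqs[lo - 1])) / (
        math.log(hi) - math.log(lo)
    )

# Simple model: rates proportional to family size and exactly balanced.
simple = equilibrium_frequencies(linear_rate(1.0, 0.0), linear_rate(1.0, 0.0), 1000)

# Balanced linear model with different offsets for birth and death.
linear = equilibrium_frequencies(linear_rate(1.0, 0.5), linear_rate(1.0, 1.5), 1000)

# Unbalanced model: death outpaces birth, so the tail decays geometrically.
unbalanced = equilibrium_frequencies(linear_rate(0.8, 0.0), linear_rate(1.0, 0.0), 1000)

print(loglog_slope(simple, 500, 1000))      # close to -1
print(loglog_slope(linear, 500, 1000))      # close to -2
print(loglog_slope(unbalanced, 500, 1000))  # far steeper than any fixed power
```

In this sketch the size-independent (simple) balanced case gives a log-log slope of -1, while shifting the offsets changes the exponent, which is consistent with the abstract's statement that exponents other than -1 require rates that depend on family size.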
Evidence against the selfish operon theory.
Pál, Csaba; Hurst, Laurence D
2004-06-01
According to the selfish operon hypothesis, the clustering of genes and their subsequent organization into operons is beneficial for the constituent genes because it enables the horizontal gene transfer of weakly selected, functionally coupled genes. The majority of these are expected to be non-essential genes. From our analysis of the Escherichia coli genome, we conclude that the selfish operon hypothesis is unlikely to provide a general explanation for clustering nor can it account for the gene composition of operons. Contrary to expectations, essential genes with related functions have an especially strong tendency to cluster, even if they are not in operons. Moreover, essential genes are particularly abundant in operons.
Vanin, A F
2004-01-01
The hypothesis was advanced that dinitrosyl iron complexes generated in animal and human cells and tissues producing nitric oxide can function as endogenous universal regulators of biochemical and physiological processes. This function is realized by the ability of dinitrosyl iron complexes to act as donors of free nitric oxide molecules interacting with the heme groups of proteins, nitrosonium ions, or Fe+(NO+)2 interacting with the thiol groups of proteins. The effect of dinitrosyl iron complexes on the activity of some enzymes and the expression of the genome at the translation and transcription levels was considered.
Bilateral Trade Flows and Income Distribution Similarity.
Martínez-Zarzoso, Inmaculada; Vollmer, Sebastian
2016-01-01
Current models of bilateral trade neglect the effects of income distribution. This paper addresses the issue by accounting for non-homothetic consumer preferences and hence investigating the role of income distribution in the context of the gravity model of trade. A theoretically justified gravity model is estimated for disaggregated trade data (Dollar volume is used as dependent variable) using a sample of 104 exporters and 108 importers for 1980–2003 to achieve two main goals. We define and calculate new measures of income distribution similarity and empirically confirm that greater similarity of income distribution between countries implies more trade. Using distribution-based measures as a proxy for demand similarities in gravity models, we find consistent and robust support for the hypothesis that countries with more similar income-distributions trade more with each other. The hypothesis is also confirmed at disaggregated level for differentiated product categories. PMID:27137462
Juliano, Celina E; Voronina, Ekaterina; Stack, Christie; Aldrich, Maryanna; Cameron, Andrew R; Wessel, Gary M
2006-12-01
Two distinct modes of germ line determination are used throughout the animal kingdom: conditional (an inductive mechanism) and autonomous (an inheritance of maternal factors in early development). This study identifies homologs of germ line determinants in the sea urchin Strongylocentrotus purpuratus to examine its mechanism of germ line determination. A list of conserved germ-line associated genes from diverse organisms was assembled to search the S. purpuratus genome for homologs, and the expression patterns of these genes were examined during embryogenesis by whole mount in situ RNA hybridization and QPCR. Of the 14 genes tested, all transcripts accumulate uniformly during oogenesis, and Sp-pumilio, Sp-tudor, Sp-MSY, and Sp-CPEB1 transcripts are also uniformly distributed during embryonic development. Sp-nanos2, Sp-seawi, and Sp-ovo transcripts, however, are enriched in the vegetal plate of the mesenchyme blastula stage, and Sp-vasa, Sp-nanos2, Sp-seawi, and Sp-SoxE transcripts are localized in small micromere descendants at the tip of the archenteron during gastrulation and are then enriched in the left coelomic pouch of larvae. The results of this screen suggest that sea urchins conditionally specify their germ line, and support the hypothesis that this mechanism is the basal mode of germ line determination amongst deuterostomes. Furthermore, accumulation of germ line determinants selectively in small micromere descendants supports the hypothesis that these cells contribute to the germ line.
Effects of spaceflight on the immunoglobulin repertoire of unimmunized C57BL/6 mice.
Ward, Claire; Rettig, Trisha A; Hlavacek, Savannah; Bye, Bailey A; Pecaut, Michael J; Chapes, Stephen K
2018-02-01
Spaceflight has been shown to suppress the adaptive immune response, altering the distribution and function of lymphocyte populations. B lymphocytes express highly specific and highly diversified receptors, known as immunoglobulins (Ig), that directly bind and neutralize pathogens. Ig diversity is achieved through the enzymatic splicing of gene segments within the genomic DNA of each B cell in a host. The collection of Ig specificities within a host, or Ig repertoire, has been increasingly characterized in both basic research and clinical settings using high-throughput sequencing technology (HTS). We utilized HTS to test the hypothesis that spaceflight affects the B-cell repertoire. To test this hypothesis, we characterized the impact of spaceflight on the unimmunized Ig repertoire of C57BL/6 mice that were flown aboard the International Space Station (ISS) during the Rodent Research One validation flight in comparison to ground controls. Individual gene segment usage was similar between ground control and flight animals; however, gene segment combinations and the junctions in which gene segments combine varied among animals within and between treatment groups. We also found that spontaneous somatic mutations in the IgH and Igκ gene loci were not increased. These data suggest that spaceflight did not affect the B-cell repertoire of mice flown and housed on the ISS over a short period of time. Copyright © 2017 The Committee on Space Research (COSPAR). Published by Elsevier Ltd. All rights reserved.
Is chloroplastic class IIA aldolase a marine enzyme?
Miyasaka, Hitoshi; Ogata, Takeru; Tanaka, Satoshi; Ohama, Takeshi; Kano, Sanae; Kazuhiro, Fujiwara; Hayashi, Shuhei; Yamamoto, Shinjiro; Takahashi, Hiro; Matsuura, Hideyuki; Hirata, Kazumasa
2016-01-01
Expressed sequence tag analyses revealed that two marine Chlorophyceae green algae, Chlamydomonas sp. W80 and Chlamydomonas sp. HS5, contain genes coding for chloroplastic class IIA aldolase (fructose-1,6-bisphosphate aldolase: FBA). These genes show robust monophyly with those of the marine Prasinophyceae algae genera Micromonas, Ostreococcus and Bathycoccus, indicating that the acquisition of this gene through horizontal gene transfer by an ancestor of the green algal lineage occurred prior to the divergence of the core chlorophytes (Chlorophyceae and Trebouxiophyceae) and the prasinophytes. The absence of this gene in some freshwater chlorophytes, such as Chlamydomonas reinhardtii, Volvox carteri, Chlorella vulgaris, Chlorella variabilis and Coccomyxa subellipsoidea, can therefore be explained by the loss of this gene somewhere in the evolutionary process. Our survey on the distribution of this gene in genomic and transcriptome databases suggests that this gene occurs almost exclusively in marine algae, with a few exceptions, and as such, we propose that chloroplastic class IIA FBA is a marine environment-adapted enzyme. This hypothesis was also experimentally tested using Chlamydomonas W80, for which we found the transcript levels of this gene to be significantly lower under low-salt (that is, simulated terrestrial) conditions. Expression analyses of transcriptome data for two algae, Prymnesium parvum and Emiliania huxleyi, taken from the Sequence Read Archive database also indicated that the expression of this gene under terrestrial conditions (low NaCl and low sulfate) is significantly downregulated. Thus, these experimental and transcriptome data provide support for our hypothesis. PMID:27058504
Characterization of Transposable Elements in Laccaria bicolor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Labbe, Jessy L; Murat, Claude; Morin, Emmanuelle
2012-01-01
Background: The publicly available Laccaria bicolor genome sequence has provided a considerable genomic resource allowing systematic identification of transposable elements (TEs) in this symbiotic ectomycorrhizal fungus. Using a TE-specific annotation pipeline we have characterized and analyzed TEs in the L. bicolor S238N-H82 genome. Methodology/Principal Findings: TEs occupy 24% of the 60 Mb L. bicolor genome and represent 25,787 full-length and partial element copies distributed within 172 families. The most abundant elements were the Copia-like. TEs are not randomly distributed across the genome, but are tightly nested or clustered. The majority of TEs are ancient except some terminal inverted repeats (TIRs), long terminal repeats (LTRs) and a large retrotransposon derivative (LARD) element. There were three main periods of TE expansion in L. bicolor: the first from 57 to 10 Mya, the second from 5 to 1 Mya and the most recent from 500,000 years ago until now. LTR retrotransposons are closely related to retrotransposons found in another basidiomycete, Coprinopsis cinerea. Conclusions: This analysis represents an initial characterization of TEs in the L. bicolor genome, contributes to genome assembly and to a greater understanding of the role TEs played in genome organization and evolution, and provides a valuable resource for the ongoing Laccaria Pan-Genome project supported by the U.S.-DOE Joint Genome Institute.
Prasinoviruses reveal a complex evolutionary history and a patchy environmental distribution
NASA Astrophysics Data System (ADS)
Finke, J. F.; Suttle, C.
2016-02-01
Prasinophytes constitute a group of eukaryotic phytoplankton that has a global distribution and is a major component of coastal and oceanic communities. Members of this group are infected by large double-stranded DNA viruses that can be significant agents of mortality, and which show evidence of substantial horizontal transfer of genes from their hosts and other organisms. However, information on the genetic diversity of these viruses and their environmental distribution is limited. This study examines the genetic repertoire, phylogeny and environmental distribution of large double-stranded DNA viruses infecting Micromonas pusilla and other prasinophytes. The genomes of viruses infecting M. pusilla were sequenced and compared to those of viruses infecting other prasinophytes, revealing a relatively small set of core genes and a larger flexible pan genome. Comparing genomes among prasinoviruses highlights their variable genetic content and complex evolutionary history. While some of the pan genome is clearly host derived, many open reading frames are most similar to those found in other eukaryotes and bacteria. Gene content of the viruses is congruent with phylogenetic analysis of viral DNA polymerase sequences and indicates that two clades of M. pusilla viruses are less related to each other than to other prasinoviruses. Moreover, the environmental distribution of prasinovirus DNA polymerase sequences indicates a complex pattern of virus-host interactions in nature. Ultimately, these patterns are influenced by the genetic repertoire encoded by prasinoviruses, and the distribution of the hosts they infect.
GeneTools--application for functional annotation and statistical hypothesis testing.
Beisvag, Vidar; Jünge, Frode K R; Bergum, Hallgeir; Jølsum, Lars; Lydersen, Stian; Günther, Clara-Cecilie; Ramampiaro, Heri; Langaas, Mette; Sandvik, Arne K; Laegreid, Astrid
2006-10-24
Modern biology has shifted from "one gene" approaches to methods for genomic-scale analysis like microarray technology, which allow simultaneous measurement of thousands of genes. This has created a need for tools facilitating interpretation of biological data in "batch" mode. However, such tools often leave the investigator with large volumes of apparently unorganized information. To meet this interpretation challenge, gene-set, or cluster testing has become a popular analytical tool. Many gene-set testing methods and software packages are now available, most of which use a variety of statistical tests to assess the genes in a set for biological information. However, the field is still evolving, and there is a great need for "integrated" solutions. GeneTools is a web-service providing access to a database that brings together information from a broad range of resources. The annotation data are updated weekly, guaranteeing that users get the most recently available data. Data submitted by the user are stored in the database, where they can easily be updated, shared between users and exported in various formats. GeneTools provides three different tools: i) NMC Annotation Tool, which offers annotations from several databases like UniGene, Entrez Gene, SwissProt and GeneOntology, in both single- and batch search mode. ii) GO Annotator Tool, where users can add new gene ontology (GO) annotations to genes of interest. These user defined GO annotations can be used in further analysis or exported for public distribution. iii) eGOn, a tool for visualization and statistical hypothesis testing of GO category representation. As the first GO tool, eGOn supports hypothesis testing for three different situations (master-target situation, mutually exclusive target-target situation and intersecting target-target situation). An important additional function is an evidence-code filter that allows users to select the GO annotations for the analysis.
GeneTools is the first "all in one" annotation tool, providing users with a rapid extraction of highly relevant gene annotation data for e.g. thousands of genes or clones at once. It allows a user to define and archive new GO annotations and it supports hypothesis testing related to GO category representations. GeneTools is freely available through www.genetools.no
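The abstract does not say which statistical test eGOn applies in each situation. As a hedged sketch of what testing GO category representation can look like in a master-target setting, one common choice is a one-sided hypergeometric test, assuming the target gene list is a subset of the master list. The function name, argument names, and the toy counts below are hypothetical and are not taken from GeneTools.

```python
from math import comb

def hypergeom_upper_tail(master_size, master_in_category,
                         target_size, target_in_category):
    """P(X >= target_in_category) for a hypergeometric draw.

    Assumption: the target list is a subset of the master list, so the
    target can be treated as target_size genes drawn without replacement
    from master_size genes, master_in_category of which carry the GO term.
    """
    denominator = comb(master_size, target_size)
    upper = min(target_size, master_in_category)
    numerator = 0
    for k in range(target_in_category, upper + 1):
        # comb() returns 0 when the lower index exceeds the upper one,
        # so impossible configurations contribute nothing.
        numerator += (comb(master_in_category, k)
                      * comb(master_size - master_in_category, target_size - k))
    return numerator / denominator

# Toy counts for illustration only: 20 master genes, 5 annotated with the
# category, 4 target genes, 2 of which are annotated.
p_value = hypergeom_upper_tail(20, 5, 4, 2)
print(round(p_value, 4))
```

A small p-value would suggest the category is over-represented in the target list relative to the master list; the two target-target situations named in the abstract would need different tests, which this sketch does not cover.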
Bondre, Vijay P.; Sankararaman, Vasudha; Andhare, Vijaysinh; Tupekar, Manisha; Sapkal, Gajanan N.
2016-01-01
Background & objectives: Human herpes simplex virus 1 (HSV-1) is the most common cause of sporadic encephalitis in humans and contributes to >10 per cent of the encephalitis cases occurring worldwide. The limited availability of full genome sequences, from a small number of isolates, has resulted in poor understanding of the host and viral factors responsible for variable clinical outcome. In this study, the genetic relationship and the extent and source of recombination were studied using the full-length genome sequence derived from a newly isolated HSV-1 isolate, in comparison with those sampled from patients with varied clinical outcome. Methods: Full genome sequence of HSV-1 isolated from cerebrospinal fluid (CSF) of a patient with acute encephalitis syndrome (AES) by inoculation in baby hamster kidney-21 (BHK-21) cells was determined using next-generation sequencing (NGS) technology. Phylogenetic analysis of the newly generated sequence in comparison with 33 additional full-length genomes defined genetic relationship with worldwide distributed strains. The bootscan and similarity plot analysis defined recombination crossovers and similarities between newly isolated Indian HSV-1 with six Asian and a total of 34 worldwide isolated strains. Results: Mapping of 376,332 reads amplified from HSV-1 DNA by NGS generated full-length genome of 151,024 bp from newly isolated Indian HSV-1. Phylogenetic analysis classified worldwide distributed strains into three major evolutionary lineages correlating to their geographic distribution. Lineage 1 contained strains isolated from America and Europe; lineage 2 contained all the strains from Asian countries along with the North American KOS and RE strains, whereas the South African isolates were distributed into two groups under lineage 3. Recombination analysis confirmed events of recombination in Indian HSV-1 genome resulting from mixing of different strains evolved in Asian countries.
Interpretation & conclusions: Our results showed that the full-length genome sequence generated from an Indian HSV-1 isolate shared close genetic relationship with the American KOS and Chinese CR38 strains which belonged to the Asian genetic lineage. Recombination analysis of Indian isolate demonstrated multiple recombination crossover points throughout the genome. This full-length genome sequence amplified from the Indian isolate would be helpful to study HSV evolution, genetic basis of differential pathogenesis, host-virus interactions and viral factors contributing towards differential clinical outcome in human infections. PMID:28361829
USDA-ARS's Scientific Manuscript database
Transcription initiation, essential to gene expression regulation, involves recruitment of basal transcription factors to the core promoter elements (CPEs). The distribution of currently known CPEs across plant genomes is largely unknown. This is the first large scale genome-wide report on the compu...
Transition of genomic evaluation from a research project to a production system
USDA-ARS's Scientific Manuscript database
Genomic data began to be included in official USDA genetic evaluations of dairy cattle in January 2009. Numerous changes to the evaluation system were made to enable efficient management of genomic information, to incorporate it in official evaluations, and to distribute evaluations. Artificial-inse...
USDA-ARS's Scientific Manuscript database
Principal component analysis (PCA) with 36,621 polymorphic genome-anchored single nucleotide polymorphisms (SNPs) identified collectively for Capsicum annuum and Capsicum baccatum was used to show the distribution of these 2 important incompatible cultivated pepper species. Estimated mean nucleotide...
Bennett, Matthew S.; Triemer, Richard E.; Preisfeld, Angelika
2017-01-01
Background Over the last few years multiple studies have been published showing a great diversity in size of chloroplast genomes (cpGenomes), and in the arrangement of gene clusters, in the Euglenales. However, while these genomes provided important insights into the evolution of cpGenomes across the Euglenales and within their genera, only two genomes were analyzed in regard to genomic variability between and within Euglenales and Eutreptiales. To better understand the dynamics of chloroplast genome evolution in early evolving Eutreptiales, this study focused on the cpGenome of Eutreptiella pomquetensis, and the spread and peculiarities of introns. Methods The Etl. pomquetensis cpGenome was sequenced, annotated and afterwards examined in structure, size, gene order and intron content. These features were compared with other euglenoid cpGenomes as well as those of prasinophyte green algae, including Pyramimonas parkeae. Results and Discussion With about 130,561 bp the chloroplast genome of Etl. pomquetensis, a basal taxon in the phototrophic euglenoids, was considerably larger than the two other Eutreptiales cpGenomes sequenced so far. Although the detected quadripartite structure resembled most green algae and plant chloroplast genomes, the gene content of the single copy regions in Etl. pomquetensis was completely different from those observed in green algae and plants. The gene composition of Etl. pomquetensis was extensively changed and turned out to be almost identical to other Eutreptiales and Euglenales, and not to P. parkeae. Furthermore, the cpGenome of Etl. pomquetensis was unexpectedly permeated by a high number of introns, which led to a substantially larger genome. The 51 identified introns of Etl. pomquetensis showed two major unique features: (i) more than half of the introns displayed a high level of pairwise identities; (ii) no group III introns could be identified in the protein coding genes. 
These findings support the hypothesis that group III introns are degenerated group II introns and evolved later. PMID:28852596
Phylogenetic Distribution of CRISPR-Cas Systems in Antibiotic-Resistant Pseudomonas aeruginosa
van Belkum, Alex; Soriaga, Leah B.; LaFave, Matthew C.; Akella, Srividya; Veyrieras, Jean-Baptiste; Barbu, E. Magda; Shortridge, Dee; Blanc, Bernadette; Hannum, Gregory; Zambardi, Gilles; Miller, Kristofer; Enright, Mark C.; Mugnier, Nathalie; Brami, Daniel; Schicklin, Stéphane; Felderman, Martina; Schwartz, Ariel S.; Richardson, Toby H.; Peterson, Todd C.; Hubby, Bolyn
2015-01-01
ABSTRACT Pseudomonas aeruginosa is an antibiotic-refractory pathogen with a large genome and extensive genotypic diversity. Historically, P. aeruginosa has been a major model system for understanding the molecular mechanisms underlying type I clustered regularly interspaced short palindromic repeat (CRISPR) and CRISPR-associated protein (CRISPR-Cas)-based bacterial immune system function. However, little information on the phylogenetic distribution and potential role of these CRISPR-Cas systems in molding the P. aeruginosa accessory genome and antibiotic resistance elements is known. Computational approaches were used to identify and characterize CRISPR-Cas systems within 672 genomes, and in the process, we identified a previously unreported and putatively mobile type I-C P. aeruginosa CRISPR-Cas system. Furthermore, genomes harboring noninhibited type I-F and I-E CRISPR-Cas systems were on average ~300 kb smaller than those without a CRISPR-Cas system. In silico analysis demonstrated that the accessory genome (n = 22,036 genes) harbored the majority of identified CRISPR-Cas targets. We also assembled a global spacer library that aided the identification of difficult-to-characterize mobile genetic elements within next-generation sequencing (NGS) data and allowed CRISPR typing of a majority of P. aeruginosa strains. In summary, our analysis demonstrated that CRISPR-Cas systems play an important role in shaping the accessory genomes of globally distributed P. aeruginosa isolates. PMID:26604259
Alu distribution and mutation types of cancer genes
2011-01-01
Background Alu elements are the most abundant retrotransposable elements comprising ~11% of the human genome. Many studies have highlighted the role that Alu elements have in genetic instability and how their contribution to the assortment of mutagenic events can lead to cancer. As of yet, little has been done to quantitatively assess the association between Alu distribution and genes that are causally implicated in oncogenesis. Results We have investigated the effect of various Alu densities on the mutation type based classifications of cancer genes. In order to establish the direct relationship between Alus and the cancer genes of interest, genome wide Alu-related densities were measured using genes rather than the sliding windows of fixed length as the units. Several novel genomic features, such as the density of the adjacent Alu pairs and the number of Alu-Exon-Alu triplets, were developed in order to extend the investigation via the multivariate statistical analysis toward more advanced biological insight. In addition, we characterized the genome-wide intron Alu distribution with a mixture model that distinguished genes containing Alu elements from those with no Alus, and evaluated the gene-level effect of the 5'-TTAAAA motif associated with Alu insertion sites using a two-step regression analysis method. Conclusions The study resulted in several novel findings worthy of further investigation. 
They include: (1) Recessive cancer genes (tumor suppressor genes) are enriched with Alu elements (p < 0.01) compared to dominant cancer genes (oncogenes) and the entire set of genes in the human genome; (2) Alu-related genomic features can be used to cluster cancer genes into biologically meaningful groups; (3) The retention of exon Alus has been restricted in the human genome development, and an upper limit to the chromosome-level exon Alu densities is suggested by the distribution profile; (4) For the genes with at least one intron Alu repeat in individual chromosomes, the intron Alu densities can be well fitted by a Gamma distribution; (5) The effect of the 5'-TTAAAA motif on Alu densities varies across different chromosomes. PMID:21429208
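Finding (4) above refers to fitting intron Alu densities with a Gamma distribution, but the abstract does not state the fitting procedure. The following minimal sketch shows one simple option, a method-of-moments fit, applied to synthetic values drawn from a known Gamma distribution. The function name and the synthetic data are illustrative assumptions and do not reproduce the study's data or method.

```python
import random
from statistics import fmean, pvariance

def fit_gamma_moments(values):
    """Method-of-moments Gamma fit.

    For a Gamma distribution, mean = shape * scale and
    variance = shape * scale**2, so
    shape = mean**2 / variance and scale = variance / mean.
    """
    mean = fmean(values)
    variance = pvariance(values, mu=mean)
    return mean * mean / variance, variance / mean

# Synthetic stand-in for per-gene densities (NOT real Alu data):
# 20,000 draws from Gamma(shape=2.0, scale=0.5).
random.seed(0)
synthetic = [random.gammavariate(2.0, 0.5) for _ in range(20000)]
shape, scale = fit_gamma_moments(synthetic)
print(round(shape, 2), round(scale, 2))
```

Genes with no intron Alus would have to be handled separately, for example by the mixture model the abstract mentions, because a Gamma distribution places no probability mass at exactly zero.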
2010-01-01
Background Intragenic tandem repeats occur throughout all domains of life and impart functional and structural variability to diverse translation products. Repeat proteins confer distinctive surface phenotypes to many unicellular organisms, including those with minimal genomes such as the wall-less bacterial monoderms, Mollicutes. One such repeat pattern in this clade is distributed in a manner suggesting its exchange by horizontal gene transfer (HGT). Expanding genome sequence databases reveal the pattern in a widening range of bacteria, and recently among eucaryotic microbes. We examined the genomic flux and consequences of the motif by determining its distribution, predicted structural features and association with membrane-targeted proteins. Results Using a refined hidden Markov model, we document a 25-residue protein sequence motif tandemly arrayed in variable-number repeats in ORFs lacking assigned functions. It appears sporadically in unicellular microbes from disparate bacterial and eucaryotic clades, representing diverse lifestyles and ecological niches that include host parasitic, marine and extreme environments. Tracts of the repeats predict a malleable configuration of recurring domains, with conserved hydrophobic residues forming an amphipathic secondary structure in which hydrophilic residues endow extensive sequence variation. Many ORFs with these domains also have membrane-targeting sequences that predict assorted topologies; others may comprise reservoirs of sequence variants. We demonstrate expressed variants among surface lipoproteins that distinguish closely related animal pathogens belonging to a subgroup of the Mollicutes. DNA sequences encoding the tandem domains display dyad symmetry. Moreover, in some taxa the domains occur in ORFs selectively associated with mobile elements. 
These features, a punctate phylogenetic distribution, and different patterns of dispersal in genomes of related taxa, suggest that the repeat may be disseminated by HGT and intra-genomic shuffling. Conclusions We describe novel features of PARCELs (Palindromic Amphipathic Repeat Coding ELements), a set of widely distributed repeat protein domains and coding sequences that were likely acquired through HGT by diverse unicellular microbes, further mobilized and diversified within genomes, and co-opted for expression in the membrane proteome of some taxa. Disseminated by multiple gene-centric vehicles, ORFs harboring these elements enhance accessory gene pools as part of the "mobilome" connecting genomes of various clades, in taxa sharing common niches. PMID:20626840
Genomic Encyclopedia of Type Strains, Phase I: The one thousand microbial genomes (KMG-I) project
Kyrpides, Nikos C.; Woyke, Tanja; Eisen, Jonathan A.; ...
2014-06-15
The Genomic Encyclopedia of Bacteria and Archaea (GEBA) project was launched by the JGI in 2007 as a pilot project with the objective of sequencing 250 bacterial and archaeal genomes. The two major goals of that project were (a) to test the hypothesis that there are many benefits to the use of the phylogenetic diversity of organisms in the tree of life as a primary criterion for generating their genome sequence and (b) to develop the necessary framework, technology and organization for large-scale sequencing of microbial isolate genomes. While the GEBA pilot project has not yet been entirely completed, both of the original goals have already been successfully accomplished, leading the way for the next phase of the project. Here we propose taking the GEBA project to the next level, by generating high quality draft genomes for 1,000 bacterial and archaeal strains. This represents a combined 16-fold increase in both scale and speed as compared to the GEBA pilot project (250 isolate genomes in 4+ years). We will follow a similar approach for organism selection and sequencing prioritization as was done for the GEBA pilot project (i.e. phylogenetic novelty, availability and growth of cultures of type strains and DNA extraction capability), focusing on type strains as this ensures reproducibility of our results and provides the strongest linkage between genome sequences and other knowledge about each strain. In turn, this project will constitute a pilot phase of a larger effort that will target the genome sequences of all available type strains of the Bacteria and Archaea.
Causes of genome instability: the effect of low dose chemical exposures in modern society.
Langie, Sabine A S; Koppen, Gudrun; Desaulniers, Daniel; Al-Mulla, Fahd; Al-Temaimi, Rabeah; Amedei, Amedeo; Azqueta, Amaya; Bisson, William H; Brown, Dustin G; Brunborg, Gunnar; Charles, Amelia K; Chen, Tao; Colacci, Annamaria; Darroudi, Firouz; Forte, Stefano; Gonzalez, Laetitia; Hamid, Roslida A; Knudsen, Lisbeth E; Leyns, Luc; Lopez de Cerain Salsamendi, Adela; Memeo, Lorenzo; Mondello, Chiara; Mothersill, Carmel; Olsen, Ann-Karin; Pavanello, Sofia; Raju, Jayadev; Rojas, Emilio; Roy, Rabindra; Ryan, Elizabeth P; Ostrosky-Wegman, Patricia; Salem, Hosni K; Scovassi, A Ivana; Singh, Neetu; Vaccari, Monica; Van Schooten, Frederik J; Valverde, Mahara; Woodrick, Jordan; Zhang, Luoping; van Larebeke, Nik; Kirsch-Volders, Micheline; Collins, Andrew R
2015-06-01
Genome instability is a prerequisite for the development of cancer. It occurs when genome maintenance systems fail to safeguard the genome's integrity, whether as a consequence of inherited defects or induced via exposure to environmental agents (chemicals, biological agents and radiation). Thus, genome instability can be defined as an enhanced tendency for the genome to acquire mutations; ranging from changes to the nucleotide sequence to chromosomal gain, rearrangements or loss. This review raises the hypothesis that in addition to known human carcinogens, exposure to low dose of other chemicals present in our modern society could contribute to carcinogenesis by indirectly affecting genome stability. The selected chemicals with their mechanisms of action proposed to indirectly contribute to genome instability are: heavy metals (DNA repair, epigenetic modification, DNA damage signaling, telomere length), acrylamide (DNA repair, chromosome segregation), bisphenol A (epigenetic modification, DNA damage signaling, mitochondrial function, chromosome segregation), benomyl (chromosome segregation), quinones (epigenetic modification) and nano-sized particles (epigenetic pathways, mitochondrial function, chromosome segregation, telomere length). The purpose of this review is to describe the crucial aspects of genome instability, to outline the ways in which environmental chemicals can affect this cancer hallmark and to identify candidate chemicals for further study. The overall aim is to make scientists aware of the increasing need to unravel the underlying mechanisms via which chemicals at low doses can induce genome instability and thus promote carcinogenesis. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Distribution and diversity of cytotypes in Dianthus broteri as evidenced by genome size variations
Balao, Francisco; Casimiro-Soriguer, Ramón; Talavera, María; Herrera, Javier; Talavera, Salvador
2009-01-01
Background and Aims Studying the spatial distribution of cytotypes and genome size in plants can provide valuable information about the evolution of polyploid complexes. Here, the spatial distribution of cytological races and the amount of DNA in Dianthus broteri, an Iberian carnation with several ploidy levels, is investigated. Methods Sample chromosome counts and flow cytometry (using propidium iodide) were used to determine overall genome size (2C value) and ploidy level in 244 individuals of 25 populations. Both fresh and dried samples were investigated. Differences in 2C and 1Cx values among ploidy levels within biogeographical provinces were tested using ANOVA. Geographical correlations of genome size were also explored. Key Results Extensive variation in chromosome numbers (2n = 2x = 30, 2n = 4x = 60, 2n = 6x = 90 and 2n = 12x = 180) was detected, and the dodecaploid cytotype is reported for the first time in this genus. As regards cytotype distribution, six populations were diploid, 11 were tetraploid, three were hexaploid and five were dodecaploid. Except for one diploid population containing some triploid plants (2n = 45), the remaining populations showed a single cytotype. Diploids appeared in two disjunct areas (south-east and south-west), and so did tetraploids (although with a considerably wider geographic range). Dehydrated leaf samples provided reliable measurements of DNA content. Genome size varied significantly among some cytotypes, and also extensively within diploid (up to 1·17-fold) and tetraploid (1·22-fold) populations. Nevertheless, variations were not straightforwardly congruent with ecology and geographical distribution. Conclusions Dianthus broteri shows the highest diversity of cytotypes known to date in the genus Dianthus. Moreover, some cytotypes present remarkable internal genome size variation. The evolution of the complex is discussed in terms of autopolyploidy, with primary and secondary contact zones. PMID:19633312
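The relation between the 2C and 1Cx values mentioned in this abstract can be illustrated with a short sketch. It assumes the standard definition of monoploid genome size (1Cx = 2C divided by ploidy level) and expresses "fold" variation as the ratio of the largest to the smallest measurement. The function names and the numbers are hypothetical and are not data from the study.

```python
def monoploid_genome_size(two_c, ploidy):
    """Return 1Cx: the 2C DNA amount divided by the ploidy level."""
    if ploidy <= 0:
        raise ValueError("ploidy must be positive")
    return two_c / ploidy


def fold_variation(values):
    """Ratio of the largest to the smallest value in a sample."""
    return max(values) / min(values)


# Hypothetical 2C measurements (pg) for three diploid plants; not study data.
diploid_2c = [1.00, 1.10, 1.17]
one_cx = [monoploid_genome_size(v, 2) for v in diploid_2c]
spread = fold_variation(diploid_2c)
```

Because every plant in a single-cytotype population shares the same ploidy, the fold variation is the same whether it is computed on 2C or on 1Cx values.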
Subramoni, Sujatha; Florez Salcedo, Diana Vanessa; Suarez-Moreno, Zulma R.
2015-01-01
LuxR solo transcriptional regulators contain both an autoinducer binding domain (ABD; N-terminal) and a DNA binding Helix-Turn-Helix domain (HTH; C-terminal), but are not associated with a cognate N-acyl homoserine lactone (AHL) synthase coding gene in the same genome. Although a few LuxR solos have been characterized, their distributions as well as their role in bacterial signal perception and other processes are poorly understood. In this study we have carried out a systematic survey of distribution of all ABD containing LuxR transcriptional regulators (QS domain LuxRs) available in the InterPro database (IPR005143), and identified those lacking a cognate AHL synthase. These LuxR solos were then analyzed regarding their taxonomical distribution, predicted functions of neighboring genes and the presence of complete AHL-QS systems in the genomes that carry them. Our analyses reveal the presence of one or multiple predicted LuxR solos in many proteobacterial genomes carrying QS domain LuxRs, some of them harboring genes for one or more AHL-QS circuits. The presence of LuxR solos in bacteria occupying diverse environments suggests potential ecological functions for these proteins beyond AHL and interkingdom signaling. Based on gene context and the conservation levels of invariant amino acids of ABD, we have classified LuxR solos into functionally meaningful groups or putative orthologs. Surprisingly, putative LuxR solos were also found in a few non-proteobacterial genomes which are not known to carry AHL-QS systems. Multiple predicted LuxR solos in the same genome appeared to have different levels of conservation of invariant amino acid residues of ABD questioning their binding to AHLs. In summary, this study provides a detailed overview of distribution of LuxR solos and their probable roles in bacteria with genome sequence information. PMID:25759807
More than genes: the advanced fetal programming hypothesis.
Hocher, Berthold
2014-10-01
Many lines of data, initial epidemiologic studies as well as subsequent extensive experimental studies, indicate that early-life events play a powerful role in influencing later susceptibility to certain chronic diseases. Such events might be over- or undernutrition, exposure to environmental toxins, but also changes in hormones, in particular stress hormones. Typically, those events are triggered by the environmental challenges of the mother. However, recent studies have shown that paternal environmental or nutritional factors affect the phenotype of the offspring as well. The maternal and paternal environmental factors act on the phenotype of the offspring via epigenetic modification of its genome. The advanced fetal programming hypothesis proposes an additional non-environmentally driven mechanism: maternal and also paternal genes may influence the maturing sperm, the oocyte, and later the embryo/fetus, leading to their epigenetic alteration. Thus, the observed phenotype of the offspring may be altered by maternal/paternal genes independent of the fetal genome. Meanwhile, several independent association studies in humans dealing with metabolic and neurological traits also suggest that maternal genes might affect the offspring phenotype independent of the transmission of that particular gene to the offspring. Considering the implications of this hypothesis, some conclusions drawn from transgenic or knockout animal models and based on the causality between a genetic alteration and a phenotype need to be challenged. Possible implications for the development, diagnosis and therapy of human genetic diseases have to be investigated. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
The landscape of inherited and de novo copy number variants in a Plasmodium falciparum genetic cross
2011-01-01
Background Copy number is a major source of genome variation with important evolutionary implications. Consequently, it is essential to determine copy number variant (CNV) behavior, distributions and frequencies across genomes to understand their origins in both evolutionary and generational time frames. We use comparative genomic hybridization (CGH) microarray and the resolution provided by a segregating population of cloned progeny lines of the malaria parasite, Plasmodium falciparum, to identify and analyze the inheritance of 170 genome-wide CNVs. Results We describe CNVs in progeny clones derived from both Mendelian (i.e. inherited) and non-Mendelian mechanisms. Forty-five CNVs were present in the parent lines and segregated in the progeny population. Furthermore, extensive variation that did not conform to strict Mendelian inheritance patterns was observed. 124 CNVs were called in one or more progeny but in neither parent: we observed CNVs in more than one progeny clone that were not identified in either parent, located more frequently in the telomeric-subtelomeric regions of chromosomes and singleton de novo CNVs distributed evenly throughout the genome. Linkage analysis of CNVs revealed dynamic copy number fluctuations and suggested mechanisms that could have generated them. Five of 12 previously identified expression quantitative trait loci (eQTL) hotspots coincide with CNVs, demonstrating the potential for broad influence of CNV on the transcriptional program and phenotypic variation. Conclusions CNVs are a significant source of segregating and de novo genome variation involving hundreds of genes. Examination of progeny genome segments provides a framework to assess the extent and possible origins of CNVs. This segregating genetic system reveals the breadth, distribution and dynamics of CNVs in a surprisingly plastic parasite genome, providing a new perspective on the sources of diversity in parasite populations. PMID:21936954
A novel statistical method for quantitative comparison of multiple ChIP-seq datasets.
Chen, Li; Wang, Chi; Qin, Zhaohui S; Wu, Hao
2015-06-15
ChIP-seq is a powerful technology to measure protein binding or histone modification strength on a whole-genome scale. Although a number of methods are available for single ChIP-seq data analysis (e.g. 'peak detection'), rigorous statistical methods for quantitative comparison of multiple ChIP-seq datasets that take into account data from control experiments, signal-to-noise ratios, biological variation and multiple-factor experimental designs are under-developed. In this work, we develop a statistical method to perform quantitative comparison of multiple ChIP-seq datasets and detect genomic regions showing differential protein binding or histone modification. We first detect peaks from all datasets and then take their union to form a single set of candidate regions. The read counts from the IP experiment at the candidate regions are assumed to follow a Poisson distribution. The underlying Poisson rates are modeled as an experiment-specific function of artifacts and biological signals. We then obtain the estimated biological signals and compare them through a hypothesis testing procedure in a linear model framework. Simulations and real data analyses demonstrate that the proposed method provides more accurate and robust results compared with existing ones. An R software package ChIPComp is freely available at http://web1.sph.emory.edu/users/hwu30/software/ChIPComp.html. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
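The first step described in this abstract, taking the union of peaks from all datasets to obtain one set of candidate regions, can be sketched as an interval merge. The code below is only an illustration and is not the ChIPComp implementation: the function names, the tuple layout `(chrom, start, end)`, and the pseudocount-based log ratio used as a crude stand-in for a background-adjusted signal are all assumptions made for this example.

```python
import math


def union_peaks(peak_sets):
    """Merge peaks from several datasets into sorted, non-overlapping regions.

    Each peak is a (chrom, start, end) tuple; overlapping or touching peaks
    on the same chromosome are fused into a single candidate region.
    """
    intervals = sorted(p for peaks in peak_sets for p in peaks)
    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def log_signal(ip_count, control_count, pseudocount=1.0):
    """Illustrative signal estimate: log ratio of IP to control read counts."""
    return math.log((ip_count + pseudocount) / (control_count + pseudocount))


# Hypothetical peak calls from two datasets.
peaks_a = [("chr1", 100, 200), ("chr1", 500, 600)]
peaks_b = [("chr1", 150, 250), ("chr2", 10, 50)]
candidates = union_peaks([peaks_a, peaks_b])
```

Counting reads over a shared set of regions gives every dataset the same rows, which is what makes a region-by-region comparison across experiments possible.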
Modeling the nitrogen cycle one gene at a time
NASA Astrophysics Data System (ADS)
Coles, V.; Stukel, M. R.; Hood, R. R.; Moran, M. A.; Paul, J. H.; Satinsky, B.; Zielinski, B.; Yager, P. L.
2016-02-01
Marine ecosystem models are lagging the revolution in microbial oceanography. As a result, modeling of the nitrogen cycle has largely failed to leverage new genomic information on nitrogen cycling pathways and the organisms that mediate them. We developed a nitrogen based ecosystem model whose community is determined by randomly assigning functional genes to build each organism's "DNA". Microbes are assigned a size that sets their baseline environmental responses using allometric response curves. These responses are modified by the costs and benefits conferred by each gene in an organism's genome. The microbes are embedded in a general circulation model where environmental conditions shape the emergent population. This model is used to explore whether organisms constructed from randomized combinations of metabolic capability alone can self-organize to create realistic oceanic biogeochemical gradients. Community size spectra and chlorophyll-a concentrations emerge in the model with reasonable fidelity to observations. The model is run repeatedly with randomly-generated microbial communities and each time realistic gradients in community size spectra, chlorophyll-a, and forms of nitrogen develop. This supports the hypothesis that the metabolic potential of a community rather than the realized species composition is the primary factor setting vertical and horizontal environmental gradients. Vertical distributions of nitrogen and transcripts for genes involved in nitrification are broadly consistent with observations. Modeled gene and transcript abundance for nitrogen cycling and processing of land-derived organic material match observations along the extreme gradients in the Amazon River plume, and they help to explain the factors controlling observed variability.
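The community-construction idea in this abstract (random gene assignment, a size-based allometric baseline, and per-gene costs) can be sketched roughly as follows. Everything concrete here is an assumption for illustration: the gene names, the cost values, the size range, and the power-law exponent are hypothetical and are not taken from the model described above.

```python
import random

# Hypothetical functional genes with illustrative fractional growth costs.
GENES = {
    "nitrification": 0.05,
    "nitrogen_fixation": 0.10,
    "nitrate_uptake": 0.02,
}


def baseline_growth(size_um, coeff=1.0, exponent=-0.25):
    """Allometric baseline: growth rate as a power law of cell size (assumed form)."""
    return coeff * size_um ** exponent


def build_organism(rng, n_genes=2):
    """Draw a random size and a random gene set, then apply the gene costs."""
    size = rng.uniform(0.5, 50.0)
    genome = rng.sample(sorted(GENES), n_genes)
    cost = sum(GENES[g] for g in genome)
    return {
        "size_um": size,
        "genome": genome,
        "max_growth": baseline_growth(size) * (1.0 - cost),
    }


rng = random.Random(0)  # fixed seed so the random community is reproducible
community = [build_organism(rng) for _ in range(5)]
```

Re-running with different seeds mimics the repeated runs with randomly generated communities that the abstract describes. The sketch omits the circulation model and the benefits conferred by each gene.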
Mark J. Statham; James Murdoch; Jan Janecka; Keith B. Aubry; Ceiridwen J. Edwards; Carl D. Soulsbury; Oliver Berry; Zhenghuan Wang; David Harrison; Malcolm Pearch; Louise Tomsett; Judith Chupasko; Benjamin N. Sacks
2014-01-01
Widely distributed taxa provide an opportunity to compare biogeographic responses to climatic fluctuations on multiple continents and to investigate speciation. We conducted the most geographically and genomically comprehensive study to date of the red fox (Vulpes vulpes), the world's most widely distributed wild terrestrial carnivore. Analyses of 697 bp of...
Comparative genomic analysis of three Leishmania species that cause diverse human disease
Peacock, Christopher S; Seeger, Kathy; Harris, David; Murphy, Lee; Ruiz, Jeronimo C; Quail, Michael A; Peters, Nick; Adlem, Ellen; Tivey, Adrian; Aslett, Martin; Kerhornou, Arnaud; Ivens, Alasdair; Fraser, Audrey; Rajandream, Marie-Adele; Carver, Tim; Norbertczak, Halina; Chillingworth, Tracey; Hance, Zahra; Jagels, Kay; Moule, Sharon; Ormond, Doug; Rutter, Simon; Squares, Rob; Whitehead, Sally; Rabbinowitsch, Ester; Arrowsmith, Claire; White, Brian; Thurston, Scott; Bringaud, Frédéric; Baldauf, Sandra L; Faulconbridge, Adam; Jeffares, Daniel; Depledge, Daniel P; Oyola, Samuel O; Hilley, James D; Brito, Loislene O; Tosi, Luiz R O; Barrell, Barclay; Cruz, Angela K; Mottram, Jeremy C; Smith, Deborah F; Berriman, Matthew
2008-01-01
Leishmania parasites cause a broad spectrum of clinical disease. Here we report the sequencing of the genomes of two species of Leishmania: Leishmania infantum and Leishmania braziliensis. The comparison of these sequences with the published genome of Leishmania major reveals marked conservation of synteny and identifies only ∼200 genes with a differential distribution between the three species. L. braziliensis, contrary to Leishmania species examined so far, possesses components of a putative RNA-mediated interference pathway, telomere-associated transposable elements and spliced leader–associated SLACS retrotransposons. We show that pseudogene formation and gene loss are the principal forces shaping the different genomes. Genes that are differentially distributed between the species encode proteins implicated in host-pathogen interactions and parasite survival in the macrophage. PMID:17572675
Genomics Community Resources | Informatics Technology for Cancer Research (ITCR)
To facilitate genomic research and the dissemination of its products, National Human Genome Research Institute (NHGRI) supports genomic resources that are crucial for basic research, disease studies, model organism studies, and other biomedical research. Awards under this FOA will support the development and distribution of genomic resources that will be valuable for the broad research community, using cost-effective approaches. Such resources include (but are not limited to) databases and informatics resources (such as human and model organism databases, ontologies, and analysi
Árnason, Úlfur
2017-09-05
The substantiality of the Out of Africa hypothesis was addressed in the light of recent genomic analysis of extant humans (Homo sapiens sapiens, Hss) and progress in Neanderthal palaeontology. The examination lent no support to the commonly assumed Out of Africa scenario but favoured instead a Eurasian divergence between Neanderthals and Hss (the Askur/Embla hypothesis) and an Out of Asia/Eurasia hypothesis according to which all other parts of the world were colonized by Hss migrations from Asia. The examination suggested furthermore that the ancestors of extant KhoeSan and Mbuti composed the first Hss dispersal(s) into Africa and that the ancestors of Yoruba made up a later wave into the same continent. The conclusions constitute a change in paradigm for the study of human evolution. Copyright © 2017. Published by Elsevier B.V.
Population genomics of the endangered giant Galápagos tortoise
2013-01-01
Background The giant Galápagos tortoise, Chelonoidis nigra, is a large-sized terrestrial chelonian of high patrimonial interest. The species recently colonized a small continental archipelago, the Galápagos Islands, where it has been facing novel environmental conditions and limited resource availability. To explore the genomic consequences of this ecological shift, we analyze the transcriptomic variability of five individuals of C. nigra, and compare it to similar data obtained from several continental species of turtles. Results Having clarified the timing of divergence in the Chelonoidis genus, we report in C. nigra a very low level of genetic polymorphism, signatures of a weakened efficacy of purifying selection, and an elevated mutation load in coding and regulatory sequences. These results are consistent with the hypothesis of an extremely low long-term effective population size in this insular species. Functional evolutionary analyses reveal a reduced diversity of immunity genes in C. nigra, in line with the hypothesis of attenuated pathogen diversity in islands, and an increased selective pressure on genes involved in response to stress, potentially related to the climatic instability of its environment and its elongated lifespan. Finally, we detect no population structure or homozygosity excess in our five-individual sample. Conclusions These results enlighten the molecular evolution of an endangered taxon in a stressful environment and point to island endemic species as a promising model for the study of the deleterious effects on genome evolution of a reduced long-term population size. PMID:24342523
Population genomics of the endangered giant Galápagos tortoise.
Loire, Etienne; Chiari, Ylenia; Bernard, Aurélien; Cahais, Vincent; Romiguier, Jonathan; Nabholz, Benoît; Lourenço, Joao Miguel; Galtier, Nicolas
2013-12-16
The giant Galápagos tortoise, Chelonoidis nigra, is a large-sized terrestrial chelonian of high patrimonial interest. The species recently colonized a small continental archipelago, the Galápagos Islands, where it has been facing novel environmental conditions and limited resource availability. To explore the genomic consequences of this ecological shift, we analyze the transcriptomic variability of five individuals of C. nigra, and compare it to similar data obtained from several continental species of turtles. Having clarified the timing of divergence in the Chelonoidis genus, we report in C. nigra a very low level of genetic polymorphism, signatures of a weakened efficacy of purifying selection, and an elevated mutation load in coding and regulatory sequences. These results are consistent with the hypothesis of an extremely low long-term effective population size in this insular species. Functional evolutionary analyses reveal a reduced diversity of immunity genes in C. nigra, in line with the hypothesis of attenuated pathogen diversity in islands, and an increased selective pressure on genes involved in response to stress, potentially related to the climatic instability of its environment and its elongated lifespan. Finally, we detect no population structure or homozygosity excess in our five-individual sample. These results enlighten the molecular evolution of an endangered taxon in a stressful environment and point to island endemic species as a promising model for the study of the deleterious effects on genome evolution of a reduced long-term population size.
Scholz, Paul; Mohrhardt, Julia; Gisselmann, Günter; Hatt, Hanns
2016-01-01
The influence of the sex steroid hormones progesterone and estradiol on physiology and behavior during menstrual cycles and pregnancy is well known. Several studies indicate that olfactory performance changes with cyclically fluctuating steroid hormone levels in females. Knowledge of the exact mechanisms behind how female sex steroids modulate olfactory signaling is limited. A number of different known genomic and non-genomic actions that are mediated by progesterone and estradiol via interactions with different receptors may be responsible for this modulation. Next generation sequencing-based RNA-Seq transcriptome data from the murine olfactory epithelium (OE) and olfactory receptor neurons (ORNs) revealed the expression of several membrane progestin receptors and the estradiol receptor Gpr30. These receptors are known to mediate rapid non-genomic effects through interactions with G proteins. RT-PCR and immunohistochemical staining results provide evidence for progestin and estradiol receptors in the ORNs. These data support the hypothesis that steroid hormones are capable of modulating the odorant-evoked activity of ORNs. Here, we validated this hypothesis through the investigation of steroid hormone effects by submerged electro-olfactogram and whole cell patch-clamp recordings of ORNs. For the first time, we demonstrate that the sex steroid hormones progesterone and estradiol decrease odorant-evoked signals in the OE and ORNs of mice at low nanomolar concentrations. Thus, both of these sex steroids can rapidly modulate the odor responsiveness of ORNs through membrane progestin receptors and the estradiol receptor Gpr30. PMID:27494699
Chapman, Carol; Henry, Matthew; Bishop-Lilly, Kimberly A; Awosika, Joy; Briska, Adam; Ptashkin, Ryan N; Wagner, Trevor; Rajanna, Chythanya; Tsang, Hsinyi; Johnson, Shannon L; Mokashi, Vishwesh P; Chain, Patrick S G; Sozhamannan, Shanmuga
2015-01-01
Historically, cholera outbreaks have been linked to V. cholerae O1 serogroup strains or its derivatives of the O37 and O139 serogroups. A genomic study on the 2010 Haiti cholera outbreak strains highlighted the putative role of non O1/non-O139 V. cholerae in causing cholera and the lack of genomic sequences of such strains from around the world. Here we address these gaps by scanning a global collection of V. cholerae strains as a first step towards understanding the population genetic diversity and epidemic potential of non O1/non-O139 strains. Whole Genome Mapping (Optical Mapping) based bar coding produces a high resolution, ordered restriction map, depicting a complete view of the unique chromosomal architecture of an organism. To assess the genomic diversity of non-O1/non-O139 V. cholerae, we applied a Whole Genome Mapping strategy on a well-defined and geographically and temporally diverse strain collection, the Sakazaki serogroup type strains. Whole Genome Map data on 91 of the 206 serogroup type strains support the hypothesis that V. cholerae has an unprecedented genetic and genomic structural diversity. Interestingly, we discovered chromosomal fusions in two unusual strains that possess a single chromosome instead of the two chromosomes usually found in V. cholerae. We also found pervasive chromosomal rearrangements such as duplications and indels in many strains. The majority of Vibrio genome sequences currently in public databases are unfinished draft sequences. The Whole Genome Mapping approach presented here enables rapid screening of large strain collections to capture genomic complexities that would not have been otherwise revealed by unfinished draft genome sequencing and thus aids in assembling and finishing draft sequences of complex genomes. Furthermore, Whole Genome Mapping allows for prediction of novel V. cholerae non-O1/non-O139 strains that may have the potential to cause future cholera outbreaks.
The first complete chloroplast genome sequence of a lycophyte, Huperzia lucidula (Lycopodiaceae)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wolf, Paul G.; Karol, Kenneth G.; Mandoli, Dina F.
2005-02-01
We used a unique combination of techniques to sequence the first complete chloroplast genome of a lycophyte, Huperzia lucidula. This plant belongs to a significant clade hypothesized to represent the sister group to all other vascular plants. We used fluorescence-activated cell sorting (FACS) to isolate the organelles, rolling circle amplification (RCA) to amplify the genome, and shotgun sequencing to 8x depth coverage to obtain the complete chloroplast genome sequence. The genome is 154,373 bp, containing inverted repeats of 15,314 bp each, a large single-copy region of 104,088 bp, and a small single-copy region of 19,671 bp. Gene order is more similar to those of mosses, liverworts, and hornworts than to gene order for other vascular plants. For example, the Huperzia chloroplast genome possesses the bryophyte gene order for a previously characterized 30 kb inversion, thus supporting the hypothesis that lycophytes are sister to all other extant vascular plants. The lycophyte chloroplast genome data also enable a better reconstruction of the basal tracheophyte genome, which is useful for inferring relationships among bryophyte lineages. Several unique characters are observed in Huperzia, such as movement of the gene ndhF from the small single copy region into the inverted repeat. We present several analyses of evolutionary relationships among land plants by using nucleotide data, amino acid sequences, and by comparing gene arrangements from chloroplast genomes. The results, while still tentative pending the large number of chloroplast genomes from other key lineages that are soon to be sequenced, are intriguing in themselves, and contribute to a growing comparative database of genomic and morphological data across the green plants.
Yin, Hao; Du, Jianchang; Wu, Jun; Wei, Shuwei; Xu, Yingxiu; Tao, Shutian; Wu, Juyou; Zhang, Shaoling
2015-01-01
Recent sequencing of the Oriental pear (P. bretschneideri Rehd.) genome and the availability of the draft genome sequence of Occidental pear (P. communis L.) have provided a good opportunity to characterize the abundance, distribution, timing, and evolution of long terminal repeat retrotransposons (LTR-RTs) in these two important fruit plants. Here, a total of 7247 LTR-RTs, which can be classified into 148 families, have been identified in the assembled Oriental pear genome. Unlike in other plant genomes, approximately 90% of these elements were found to be randomly distributed along the pear chromosomes. Further analysis revealed that the amplification timeframe of elements varies dramatically among families, super-families and lineages, and that the Copia-like elements have had the highest activity in the most recent 0.5 million years (Mys). The data also showed that the two genomes evolved at similar evolutionary rates after their split from the common ancestor ~0.77–1.66 million years ago (Mya). Overall, the data provided here will be a valuable resource for further investigating the impact of transposable elements on gene structure, expression, and epigenetic modification in the pear genomes. PMID:26631625
Luchetti, Andrea; Mantovani, Barbara
2009-12-01
Studies on transposable elements in termites are of interest because their genome is in a permanent condition of inbreeding. In this situation, an increase in transposon copy number should be mainly due to a Muller's ratchet effect, with selection against deleterious insertions playing a major role. Short INterspersed Elements (SINEs) are non-autonomous retrotransposons, known to be stable components of eukaryotic genomes. The SINE Talua, first isolated from Reticulitermes lucifugus (Rhinotermitidae), is the only mobile element described so far in termites. In the present survey, Talua has been found to be widespread in the order Isoptera. In comparison with other non-termite SINEs, Talua diversity and distribution in the Reticulitermes genome demonstrate that Talua is an ancient component of the termite genome and that it is significantly associated with other repeats. In particular, the element is found to be involved with microsatellite motifs, either as their generator or because it is inserted in their vicinity. Further, two new SINEs and a putative retrotranscriptase-like sequence were found linked to Talua. Talua's genomic distribution is discussed in the light of the available models on transposable element dynamics within inbred genomes, also taking into account the role of SINEs as drivers of genetic diversity in counteracting inbreeding depression.
Liu, Yun-Hua; Zhang, Meiping; Wu, Chengcang; Huang, James J; Zhang, Hong-Bin
2014-01-01
Knowledge of how a genome is structured and organized from its constituent elements is crucial to understanding its biology and evolution. Here, we report the genome structuring and organization pattern as revealed by systems analysis of the sequences of three model species, Arabidopsis, rice and yeast, at the whole-genome and chromosome levels. We found that all fundamental function elements (FFE) constituting the genomes, including genes (GEN), DNA transposable elements (DTE), retrotransposable elements (RTE), simple sequence repeats (SSR), and (or) low complexity repeats (LCR), are structured in a nonrandom and correlative manner, thus leading to a hypothesis that the DNA of the species is structured as a linear "jigsaw puzzle". Furthermore, we showed that different FFE differ in their importance in the formation and evolution of the DNA jigsaw puzzle structure between species. DTE and RTE play more important roles than GEN, LCR, and SSR in Arabidopsis, whereas GEN and RTE play more important roles than LCR, SSR, and DTE in rice. The genes having multiple recognized functions play more important roles than those having single functions. These results provide useful knowledge necessary for better understanding genome biology and evolution of the species and for effective molecular breeding of rice.
Dubois, Emeline; Bischerour, Julien; Marmignon, Antoine; Mathy, Nathalie; Régnier, Vinciane; Bétermier, Mireille
2012-01-01
Sequences related to transposons constitute a large fraction of extant genomes, but insertions within coding sequences have generally not been tolerated during evolution. Thanks to their unique nuclear dimorphism and to their original mechanism of programmed DNA elimination from their somatic nucleus (macronucleus), ciliates are emerging model organisms for the study of the impact of transposable elements on genomes. The germline genome of the ciliate Paramecium, located in its micronucleus, contains thousands of short intervening sequences, the IESs, which interrupt 47% of genes. Recent data provided support to the hypothesis that an evolutionary link exists between Paramecium IESs and Tc1/mariner transposons. During development of the macronucleus, IESs are excised precisely thanks to the coordinated action of PiggyMac, a domesticated piggyBac transposase, and of the NHEJ double-strand break repair pathway. A PiggyMac homolog is also required for developmentally programmed DNA elimination in another ciliate, Tetrahymena. Here, we present an overview of the life cycle of these unicellular eukaryotes and of the developmentally programmed genome rearrangements that take place at each sexual cycle. We discuss how ancient domestication of a piggyBac transposase might have allowed Tc1/mariner elements to spread throughout the germline genome of Paramecium, without strong counterselection against insertion within genes. PMID:22888464
Environmental Adaptation Contributes to Gene Polymorphism across the Arabidopsis thaliana Genome
Lee, Cheng-Ruei
2012-01-01
The level of within-species polymorphism differs greatly among genes in a genome. Many genomic studies have investigated the relationship between gene polymorphism and factors such as recombination rate or expression pattern. However, the polymorphism of a gene is affected not only by its physical properties or functional constraints but also by natural selection on organisms in their environments. Specifically, if functionally divergent alleles enable adaptation to different environments, locus-specific polymorphism may be maintained by spatially heterogeneous natural selection. To test this hypothesis and estimate the extent to which environmental selection shapes the pattern of genome-wide polymorphism, we define the "environmental relevance" of a gene as the proportion of genetic variation explained by environmental factors, after controlling for population structure. We found substantial effects of environmental relevance on patterns of polymorphism among genes. In addition, the correlation between environmental relevance and gene polymorphism is positive, consistent with the expectation that balancing selection among heterogeneous environments maintains genetic variation at ecologically important genes. Comparison of the gene ontology annotations shows that genes with high environmental relevance are enriched in unknown function categories. These results suggest an important role for environmental factors in shaping genome-wide patterns of polymorphism and indicate another direction of genomic study. PMID:22798389
Dores, Robert M.
2016-01-01
The evolution of the melanocortin receptors (MCRs) is closely associated with the evolution of the melanocortin-2 receptor accessory proteins (MRAPs). Recent annotation of the elephant shark genome project revealed the sequence of a putative MRAP1 ortholog. The presence of this sequence in the genome of a cartilaginous fish raises the possibility that the mrap1 and mrap2 genes in the genomes of gnathostome vertebrates were the result of the chordate 2R genome duplication event. The presence of a putative MRAP1 ortholog in a cartilaginous fish genome is perplexing. Recent studies on melanocortin-2 receptor (MC2R) in the genomes of the elephant shark and the Japanese stingray indicate that these MC2R orthologs can be functionally expressed in CHO cells without co-expression of an exogenous mrap1 cDNA. The novel ligand selectivity of these cartilaginous fish MC2R orthologs is discussed. Finally, the origin of the mc2r and mc5r genes is reevaluated. The distinctive primary sequence conservation of MC2R and MC5R is discussed in light of the physiological roles of these two MCR paralogs. PMID:27445982
SINEs as driving forces in genome evolution.
Schmitz, J
2012-01-01
SINEs are short interspersed elements derived from cellular RNAs that repetitively retropose via RNA intermediates and integrate more or less randomly back into the genome. SINEs propagate almost entirely vertically within their host cells and, once established in the germline, are passed on from generation to generation. As non-autonomous elements, their reverse transcription (from RNA to cDNA) and genomic integration depend on the activity of the enzymatic machinery of autonomous retrotransposons, such as long interspersed elements (LINEs). SINEs are widely distributed in eukaryotes, but are especially effectively propagated in mammalian species. For example, more than a million Alu-SINE copies populate the human genome (approximately 13% of genomic space), and few master copies of them are still active. In the organisms where they occur, SINEs are a challenge to genomic integrity, but in the long term also can serve as beneficial building blocks for evolution, contributing to phenotypic heterogeneity and modifying gene regulatory networks. They substantially expand the genomic space and introduce structural variation to the genome. SINEs have the potential to mutate genes, to alter gene expression, and to generate new parts of genes. A balanced distribution and controlled activity of such properties is crucial to maintaining the organism's dynamic and thriving evolution. Copyright © 2012 S. Karger AG, Basel.
Analytical model of brittle destruction based on hypothesis of scale similarity
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arakcheev, A. S., E-mail: asarakcheev@gmail.com; Lotov, K. V.
2012-08-15
The size distribution of dust particles in thermonuclear (fusion) devices is closely described by a power law, which may be related to the brittle destruction of materials. The hypothesis of scale similarity leads to the conclusion that the size distribution of particles formed as a result of a brittle destruction is described by a power law with the exponent $-\alpha$ that can range from -4 to -1. The model of brittle destruction is described in terms of the fractal geometry, and the distribution exponent is expressed via the fractal dimension of packing. Under additional assumptions, it is possible to refine the $\alpha$ value and, vice versa, to determine the type of destruction using the measured size distribution of particles.
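As an illustration of how a power-law exponent of this kind might be recovered from measured particle sizes, the following is a minimal sketch and not the authors' model. It assumes a continuous density proportional to x^(-alpha) above a lower cutoff x_min with alpha > 1, and uses the standard maximum-likelihood estimator for that case. The function names, the cutoff, and the synthetic data are all hypothetical.

```python
import math
import random

def sample_power_law(alpha, x_min, n, rng):
    """Draw n sizes from a density proportional to x**(-alpha), x >= x_min (alpha > 1)."""
    # Inverse-transform sampling: x = x_min * (1 - u) ** (-1 / (alpha - 1))
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def estimate_alpha(sizes, x_min):
    """Maximum-likelihood estimate of alpha for a continuous power law above x_min."""
    tail = [x for x in sizes if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum

# Synthetic "particle sizes" with a known exponent (assumed value, for illustration only)
rng = random.Random(0)
sizes = sample_power_law(alpha=2.5, x_min=1.0, n=20000, rng=rng)
alpha_hat = estimate_alpha(sizes, x_min=1.0)
print(f"estimated alpha = {alpha_hat:.2f}")
```

The estimator is undefined at the boundary alpha = 1, so this sketch covers only the interior of the range quoted in the abstract.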
NASA Astrophysics Data System (ADS)
Cianciara, Aleksander
2016-09-01
The paper presents the results of research aimed at verifying the hypothesis that the Weibull distribution is an appropriate statistical distribution model of microseismicity emission characteristics, namely the energy of phenomena and the inter-event time. It is understood that the emission under consideration is induced by natural rock mass fracturing. Because the recorded emission contains noise, it is subjected to appropriate filtering. The study has been conducted using statistical verification of the null hypothesis that the Weibull distribution fits the empirical cumulative distribution function. As the model describing the cumulative distribution function is given in an analytical form, its verification may be performed using the Kolmogorov-Smirnov goodness-of-fit test. Interpretations by means of probabilistic methods require specifying the correct model describing the statistical distribution of data, because these methods do not use the measurement data directly but rather their statistical distributions, e.g., the method based on hazard analysis or the one that uses maximum value statistics.
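The goodness-of-fit step described above can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's procedure: the scale and shape values, the sample size, and the function names are assumptions, and the Weibull parameters are taken as known in advance. The Kolmogorov-Smirnov statistic is the largest gap between the empirical cumulative distribution function and the analytical Weibull one.

```python
import math
import random

def weibull_cdf(x, scale, shape):
    """Analytical Weibull cumulative distribution function."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0

def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF of data and a model CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

# Synthetic "inter-event times" drawn from a Weibull law (assumed parameters)
rng = random.Random(1)
scale, shape = 2.0, 1.5
sample = [rng.weibullvariate(scale, shape) for _ in range(2000)]

d_weibull = ks_statistic(sample, lambda x: weibull_cdf(x, scale, shape))
d_wrong = ks_statistic(sample, lambda x: weibull_cdf(x, scale, 0.5))

# Asymptotic 5% critical value of the one-sample KS test
critical = 1.36 / math.sqrt(len(sample))
print(f"D (correct model) = {d_weibull:.4f}")
print(f"D (wrong shape)   = {d_wrong:.4f}")
print(f"5% critical value = {critical:.4f}")
```

When the Weibull parameters are estimated from the same data being tested, the standard critical values no longer apply exactly, so a real analysis would need an adjusted or simulated reference distribution.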
Monaghan, Padraic; Christiansen, Morten H; Chater, Nick
2007-12-01
Several phonological and prosodic properties of words have been shown to relate to differences between grammatical categories. Distributional information about grammatical categories is also a rich source in the child's language environment. In this paper we hypothesise that such cues operate in tandem for developing the child's knowledge about grammatical categories. We term this the Phonological-Distributional Coherence Hypothesis (PDCH). We tested the PDCH by analysing phonological and distributional information in distinguishing open from closed class words and nouns from verbs in four languages: English, Dutch, French, and Japanese. We found an interaction between phonological and distributional cues for all four languages indicating that when distributional cues were less reliable, phonological cues were stronger. This provides converging evidence that language is structured such that language learning benefits from the integration of information about category from contextual and sound-based sources, and that the child's language environment is less impoverished than we might suspect.
Fumoto, Masaki; Miyazaki, Satoru; Sugawara, Hideaki
2002-01-01
Genome Information Broker (GIB) is a powerful tool for the study of comparative genomics. GIB allows users to retrieve and display partial and/or whole genome sequences together with the relevant biological annotation. GIB has accumulated all the completed microbial genomes and has recently been expanded to include Arabidopsis thaliana genome data from DDBJ/EMBL/GenBank. In the near future, hundreds of genome sequences will be determined. In order to handle such a large volume of data, we have enhanced the GIB architecture by using XML, CORBA and distributed RDBs. We introduce the new GIB here. GIB is freely accessible at http://gib.genes.nig.ac.jp/. PMID:11752256
Characterization of transposable elements in the ectomycorrhizal fungus Laccaria bicolor.
Labbé, Jessy; Murat, Claude; Morin, Emmanuelle; Tuskan, Gerald A; Le Tacon, François; Martin, Francis
2012-01-01
The publicly available Laccaria bicolor genome sequence has provided a considerable genomic resource allowing systematic identification of transposable elements (TEs) in this symbiotic ectomycorrhizal fungus. Using a TE-specific annotation pipeline we have characterized and analyzed TEs in the L. bicolor S238N-H82 genome. TEs occupy 24% of the 60 Mb L. bicolor genome and represent 25,787 full-length and partial copy elements distributed within 171 families. The most abundant elements were the Copia-like. TEs are not randomly distributed across the genome, but are tightly nested or clustered. The majority of TEs exhibit signs of ancient transposition, except some intact copies of terminal inverted repeats (TIRs), long terminal repeats (LTRs) and a large retrotransposon derivative (LARD) element. There were three main periods of TE expansion in L. bicolor: the first from 57 to 10 Mya, the second from 5 to 1 Mya and the most recent from 0.5 Mya until now. LTR retrotransposons are closely related to retrotransposons found in another basidiomycete, Coprinopsis cinerea. This analysis 1) represents an initial characterization of TEs in the L. bicolor genome, 2) contributes to improved genome annotation and a greater understanding of the role TEs played in genome organization and evolution and 3) provides a valuable resource for future research on genome evolution within the Laccaria genus.