Science.gov

Sample records for large ancestral genomes

  1. Paleogenomic data suggest mammal-like genome size in the ancestral amniote and derived large genome size in amphibians.

    PubMed

    Organ, C L; Canoville, A; Reisz, R R; Laurin, M

    2011-02-01

    An unsolved question in evolutionary genomics is whether amniote genomes have been expanding or contracting since the common ancestor of this diverse group. Here, we report on the polarity of amniote genome size evolution using genome size estimates for 14 extinct tetrapod genera from the Paleozoic and early Mesozoic Eras using osteocyte lacunae size as a correlate. We find substantial support for a phylogenetically controlled regression model relating genome size to osteocyte lacunae size (P of slopes <0.01, r²=0.65, phylogenetic signal λ=0.83). Genome size appears to have been homogeneous across Paleozoic crown-tetrapod lineages (average haploid genome size 2.9-3.7 pg) with values similar to those of extant mammals. The differentiation in genome size and underlying architecture among extant tetrapod lineages likely evolved in the Mesozoic and Cenozoic Eras, with expansion in amphibians, contractions along the diapsid lineage, and no directional change within the synapsid lineage leading to mammals.

  2. Reconstruction of ancestral genomic sequences using likelihood.

    PubMed

    Elias, Isaac; Tuller, Tamir

    2007-03-01

    A challenging task in computational biology is the reconstruction of genomic sequences of extinct ancestors, given the phylogenetic tree and the sequences at the leaves. This task is best solved by calculating the most likely estimate of the ancestral sequences, along with the most likely edge lengths. We deal with this problem and also the variant in which the phylogenetic tree, in addition to the ancestral sequences, needs to be estimated. The latter problem is known to be NP-hard, while the computational complexity of the former is unknown. Currently, all algorithms for solving these problems are heuristics without performance guarantees. The biological importance of these problems calls for developing better algorithms with guarantees of finding either optimal or approximate solutions. We develop approximation, fixed-parameter tractable (FPT), and fast heuristic algorithms for two variants of the problem: when the phylogenetic tree is known and when it is unknown. The approximation algorithm guarantees a solution with a log-likelihood ratio of 2 relative to the optimal solution. The FPT algorithm has a running time that is polynomial in the length of the sequences and exponential in the number of taxa. This makes it useful for calculating the optimal solution for small trees. Moreover, we combine the approximation algorithm and the FPT algorithm into an algorithm with an arbitrarily good approximation guarantee (PTAS). We tested our algorithms on both synthetic and biological data. In particular, we used the FPT algorithm for computing the most likely ancestral mitochondrial genomes of Hominidae (the great apes), thereby answering an interesting biological question. Moreover, we show how the approximation algorithms find good solutions for reconstructing the ancestral genomes for a set of lentiviruses (relatives of HIV). Supplementary material of this work is available at www.nada.kth.se/~isaac/publications/aml/aml.html.
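As a rough illustration of likelihood-based ancestral reconstruction (not the approximation, FPT, or PTAS algorithms of the abstract above), the sketch below computes the posterior distribution of the root nucleotide at a single site with Felsenstein's pruning recursion. It assumes a Jukes-Cantor (JC69) substitution model, a uniform prior at the root, and fixed branch lengths, whereas the paper also optimises edge lengths. The tree encoding and the names `jc69`, `partial_likelihoods` and `root_posterior` are hypothetical choices made for this example.

```python
import math

BASES = "ACGT"

def jc69(t):
    """JC69 transition probabilities for a branch of length t
    (expected substitutions per site): (P[same base], P[one specific other base])."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e

def partial_likelihoods(node):
    """Felsenstein pruning at one site.

    A leaf is a single-character string such as "A"; an internal node is a
    tuple of (child, branch_length) pairs. Returns, for each base x, the
    probability of the observed leaves below this node given state x here."""
    if isinstance(node, str):
        return [1.0 if b == node else 0.0 for b in BASES]
    out = [1.0] * 4
    for child, t in node:
        same, diff = jc69(t)
        below = partial_likelihoods(child)
        for i in range(4):
            out[i] *= sum((same if i == j else diff) * below[j] for j in range(4))
    return out

def root_posterior(tree):
    """Posterior over the root base, assuming a uniform prior (an assumption)."""
    like = partial_likelihoods(tree)
    total = sum(0.25 * x for x in like)
    return {b: 0.25 * x / total for b, x in zip(BASES, like)}

# Toy tree: ((A:0.1, A:0.1):0.2, C:0.3)
toy = (((("A", 0.1), ("A", 0.1)), 0.2), ("C", 0.3))
posterior = root_posterior(toy)
print(max(posterior, key=posterior.get), posterior)
```

Choosing the base with the highest posterior at every site gives a marginal reconstruction of the root sequence; a joint reconstruction over all internal nodes, or one that also estimates edge lengths, needs more machinery than this sketch shows.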

  3. Ancestral genome inference using a genetic algorithm approach.

    PubMed

    Gao, Nan; Yang, Ning; Tang, Jijun

    2013-01-01

    Recent advances in technology have made it routine to obtain and compare gene orders within genomes. Rearrangements of gene orders by operations such as reversal and transposition are rare events that enable researchers to reconstruct deep evolutionary histories. An important application of genome rearrangement analysis is to infer gene orders of ancestral genomes, which is valuable for identifying patterns of evolution and for modeling the evolutionary processes. Among various available methods, parsimony-based methods (including GRAPPA and MGR) are the most widely used. Since the core algorithms of these methods are solvers for the so-called median problem, providing efficient and accurate median solvers has attracted much attention in this field. The "double-cut-and-join" (DCJ) model uses the single DCJ operation to account for all genome rearrangement events. Because it is mathematically much simpler than handling events directly, parsimony methods using DCJ median solvers have better speed and accuracy. However, the DCJ median problem is NP-hard, and although several exact algorithms are available, they all have great difficulties when the given genomes are distant. In this paper, we present a new algorithm that combines a genetic algorithm (GA) with genomic sorting to produce a new method that can solve the DCJ median problem in limited time and space, especially in large and distant datasets. Our experimental results show that this new GA-based method can find optimal or near-optimal results for problems ranging from easy to very difficult. Compared to existing parsimony methods, which may severely underestimate the true number of evolutionary events, the sorting-based approach can infer ancestral genomes that are much closer to their true ancestors. The code is available at http://phylo.cse.sc.edu. PMID:23658708
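To make the DCJ model in the abstract above more concrete, here is a minimal sketch of the DCJ distance between two genomes, the quantity a median solver tries to minimise in sum over three input genomes. It is not the GA-based median solver of the paper. It assumes both genomes consist only of circular chromosomes and contain the same genes, each exactly once, in which case the distance is the number of genes minus the number of cycles in the adjacency graph. Genes are signed integers; the function names are hypothetical.

```python
def adjacencies(genome):
    """Map each gene extremity to the extremity it is adjacent to.

    genome: list of circular chromosomes, each a list of signed integers.
    A gene g has a tail (g, "t") and a head (g, "h"); reading +g we pass
    tail then head, reading -g we pass head then tail."""
    partner = {}
    for chrom in genome:
        for i, gene in enumerate(chrom):
            nxt = chrom[(i + 1) % len(chrom)]  # wrap around: circular chromosome
            right_end = (abs(gene), "h" if gene > 0 else "t")
            left_end = (abs(nxt), "t" if nxt > 0 else "h")
            partner[right_end] = left_end
            partner[left_end] = right_end
    return partner

def dcj_distance(a, b):
    """DCJ distance for circular genomes with identical gene content:
    (number of genes) - (number of cycles in the adjacency graph)."""
    adj_a, adj_b = adjacencies(a), adjacencies(b)
    if adj_a.keys() != adj_b.keys():
        raise ValueError("genomes must contain the same genes")
    seen = set()
    cycles = 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:
            # Alternate one adjacency of genome a with one adjacency of genome b.
            seen.add(x)
            y = adj_a[x]
            seen.add(y)
            x = adj_b[y]
    return len(adj_a) // 2 - cycles

print(dcj_distance([[1, 2, 3]], [[1, -2, 3]]))       # one inversion apart: 1
print(dcj_distance([[1, 2, 3, 4]], [[1, 2], [3, 4]]))  # one fission apart: 1
```

Linear chromosomes add paths to the adjacency graph and change the formula, so this sketch should not be applied to them as written.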

  4. Deciphering the diploid ancestral genome of the Mesohexaploid Brassica rapa.

    PubMed

    Cheng, Feng; Mandáková, Terezie; Wu, Jian; Xie, Qi; Lysak, Martin A; Wang, Xiaowu

    2013-05-01

    The genus Brassica includes several important agricultural and horticultural crops. Their current genome structures were shaped by whole-genome triplication followed by extensive diploidization. The availability of several crucifer genome sequences, especially that of Chinese cabbage (Brassica rapa), enables study of the evolution of the mesohexaploid Brassica genomes from their diploid progenitors. We reconstructed three ancestral subgenomes of B. rapa (n = 10) by comparing its whole-genome sequence to ancestral and extant Brassicaceae genomes. All three B. rapa paleogenomes apparently consisted of seven chromosomes, similar to the ancestral translocation Proto-Calepineae Karyotype (tPCK; n = 7), which is the evolutionarily younger variant of the Proto-Calepineae Karyotype (n = 7). Based on comparative analysis of genome sequences or linkage maps of Brassica oleracea, Brassica nigra, radish (Raphanus sativus), and other closely related species, we propose a two-step merging of three tPCK-like genomes to form the hexaploid ancestor of the tribe Brassiceae with 42 chromosomes. Subsequent diversification of the Brassiceae was marked by extensive genome reshuffling and chromosome number reduction mediated by translocation events and followed by loss and/or inactivation of centromeres. Furthermore, via interspecies genome comparison, we refined intervals for seven of the genomic blocks of the Ancestral Crucifer Karyotype (n = 8), thus revising the key reference genome for evolutionary genomics of crucifers. PMID:23653472

  6. Synteny conservation between the Prunus genome and both the present and ancestral Arabidopsis genomes

    PubMed Central

    Jung, Sook; Main, Dorrie; Staton, Margaret; Cho, Ilhyung; Zhebentyayeva, Tatyana; Arús, Pere; Abbott, Albert

    2006-01-01

    Background: Due to the lack of availability of large genomic sequences for peach or other Prunus species, the degree of synteny conservation between the Prunus species and Arabidopsis has not been systematically assessed. Using the recently available peach EST sequences that are anchored to Prunus genetic maps and to the peach physical map, we analyzed the extent of conserved synteny between the Prunus and the Arabidopsis genomes. The reconstructed pseudo-ancestral Arabidopsis genome, which existed prior to the proposed recent polyploidy event, was also utilized in our analysis to further elucidate the evolutionary relationship. Results: We analyzed the synteny conservation between the Prunus and the Arabidopsis genomes by comparing 475 peach ESTs that are anchored to Prunus genetic maps and their Arabidopsis homologs detected by sequence similarity. Microsyntenic regions were detected between all five Arabidopsis chromosomes and seven of the eight linkage groups of the Prunus reference map. An additional 1097 peach ESTs that are anchored to 431 BAC contigs of the peach physical map and their Arabidopsis homologs were also analyzed. Microsyntenic regions were detected in 77 BAC contigs. The syntenic regions from both data sets were short and contained only a couple of conserved gene pairs. The synteny between peach and Arabidopsis was fragmentary; all the Prunus linkage groups containing syntenic regions matched to more than two different Arabidopsis chromosomes, and most BAC contigs with multiple conserved syntenic regions corresponded to multiple Arabidopsis chromosomes. Using the same peach EST datasets and their Arabidopsis homologs, we also detected conserved syntenic regions in the pseudo-ancestral Arabidopsis genome. In many cases, the gene order and content of peach regions was more conserved in the ancestral genome than in the present Arabidopsis region. Statistical significance of each syntenic group was calculated using a simulated Arabidopsis genome. Conclusion: We

  7. Reconstruction of the ancestral plastid genome in Geraniaceae reveals a correlation between genome rearrangements, repeats, and nucleotide substitution rates.

    PubMed

    Weng, Mao-Lun; Blazier, John C; Govindu, Madhumita; Jansen, Robert K

    2014-03-01

    Geraniaceae plastid genomes are highly rearranged, and each of the four genera already sequenced in the family has a distinct genome organization. This study reports plastid genome sequences of six additional species, Francoa sonchifolia, Melianthus villosus, and Viviania marifolia from Geraniales, and Pelargonium alternans, California macrophylla, and Hypseocharis bilobata from Geraniaceae. These genome sequences, combined with previously published species, provide sufficient taxon sampling to reconstruct the ancestral plastid genome organization of Geraniaceae and the rearrangements unique to each genus. The ancestral plastid genome of Geraniaceae has a 4 kb inversion and a reduced, Pelargonium-like small single copy region. Our ancestral genome reconstruction suggests that a few minor rearrangements occurred in the stem branch of Geraniaceae followed by independent rearrangements in each genus. The genomic comparison demonstrates that a series of inverted repeat boundary shifts and inversions played a major role in shaping genome organization in the family. The distribution of repeats is strongly associated with breakpoints in the rearranged genomes, and the proportion and the number of large repeats (>20 bp and >60 bp) are significantly correlated with the degree of genome rearrangements. Increases in the degree of plastid genome rearrangements are correlated with the acceleration in nonsynonymous substitution rates (dN) but not with synonymous substitution rates (dS). Possible mechanisms that might contribute to this correlation, including DNA repair system and selection, are discussed. PMID:24336877

  8. Ancestral components of admixed genomes in a Mexican cohort.

    PubMed

    Johnson, Nicholas A; Coram, Marc A; Shriver, Mark D; Romieu, Isabelle; Barsh, Gregory S; London, Stephanie J; Tang, Hua

    2011-12-01

    For most of the world, human genome structure at a population level is shaped by interplay between ancient geographic isolation and more recent demographic shifts, factors that are captured by the concepts of biogeographic ancestry and admixture, respectively. The ancestry of non-admixed individuals can often be traced to a specific population in a precise region, but current approaches for studying admixed individuals generally yield coarse information in which genome ancestry proportions are identified according to continent of origin. Here we introduce a new analytic strategy for this problem that allows fine-grained characterization of admixed individuals with respect to both geographic and genomic coordinates. Ancestry segments from different continents, identified with a probabilistic model, are used to construct and study "virtual genomes" of admixed individuals. We apply this approach to a cohort of 492 parent-offspring trios from Mexico City. The relative contributions from the three continental-level ancestral populations (Africa, Europe, and America) vary substantially between individuals, and the distribution of haplotype block length suggests an admixing time of 10-15 generations. The European and Indigenous American virtual genomes of each Mexican individual can be traced to precise regions within each continent, and they reveal a gradient of Amerindian ancestry between indigenous people of southwestern Mexico and Mayans of the Yucatan Peninsula. This contrasts sharply with the African roots of African Americans, which have been characterized by a uniform mixing of multiple West African populations. We also use the virtual European and Indigenous American genomes to search for the signatures of selection in the ancestral populations, and we identify previously known targets of selection in other populations, as well as new candidate loci. The ability to infer precise ancestral components of admixed genomes will facilitate studies of disease

  9. Ancestral Components of Admixed Genomes in a Mexican Cohort

    PubMed Central

    Johnson, Nicholas A.; Coram, Marc A.; Shriver, Mark D.; Romieu, Isabelle; Barsh, Gregory S.; London, Stephanie J.; Tang, Hua

    2011-01-01

    For most of the world, human genome structure at a population level is shaped by interplay between ancient geographic isolation and more recent demographic shifts, factors that are captured by the concepts of biogeographic ancestry and admixture, respectively. The ancestry of non-admixed individuals can often be traced to a specific population in a precise region, but current approaches for studying admixed individuals generally yield coarse information in which genome ancestry proportions are identified according to continent of origin. Here we introduce a new analytic strategy for this problem that allows fine-grained characterization of admixed individuals with respect to both geographic and genomic coordinates. Ancestry segments from different continents, identified with a probabilistic model, are used to construct and study “virtual genomes” of admixed individuals. We apply this approach to a cohort of 492 parent–offspring trios from Mexico City. The relative contributions from the three continental-level ancestral populations—Africa, Europe, and America—vary substantially between individuals, and the distribution of haplotype block length suggests an admixing time of 10–15 generations. The European and Indigenous American virtual genomes of each Mexican individual can be traced to precise regions within each continent, and they reveal a gradient of Amerindian ancestry between indigenous people of southwestern Mexico and Mayans of the Yucatan Peninsula. This contrasts sharply with the African roots of African Americans, which have been characterized by a uniform mixing of multiple West African populations. We also use the virtual European and Indigenous American genomes to search for the signatures of selection in the ancestral populations, and we identify previously known targets of selection in other populations, as well as new candidate loci. The ability to infer precise ancestral components of admixed genomes will facilitate studies of disease

  10. Mining the semantics of genome super-blocks to infer ancestral architectures.

    PubMed

    Jean, Géraldine; Sherman, David James; Nikolski, Macha

    2009-09-01

    The study of evolutionary mechanisms is made more and more accurate by the increase in the number of fully sequenced genomes. One of the main problems is to reconstruct plausible ancestral genome architectures based on the comparison of contemporary genomes. Current methods have largely focused on finding complete architectures for ancestral genomes, and, due to the computational difficulty of the problem, stop after a small number of equivalent minimal solutions have been found. Recent results suggest, however, that the set of minimum complete architectures is very large and heterogeneous. In fact these solutions are collections of conserved blocks, freely rearranged. In this paper, we identify these conserved super-blocks, using a new method of analysis of ancestral architectures that reconciles both breakpoint and rearrangement analyses, as well as respects biological constraints. The resulting algorithms permit the first reliable reconstruction of plausible ancestral architectures for several non-WGD yeasts simultaneously, a problem hitherto intractable due to the extensive map reshuffling of these species. See online Supplementary Material at www.liebertonline.com. PMID:19772437

  11. Reflections on ancestral haplotypes: medical genomics, evolution, and human individuality.

    PubMed

    Steele, Edward J

    2014-01-01

    The major histocompatibility complex (MHC), once labelled the "sphinx of immunology" by Jan Klein, provides powerful challenges to evolutionary thinking. This essay highlights the main discoveries that established the block ancestral haplotype structure of the MHC and the wider genome, focusing on the work by the Perth (Australia) group, led by Roger Dawkins, and the Boston group, led by Chester Alper and Edmond Yunis. Their achievements have been overlooked in the rush to sequence the first and subsequent drafts of the human genome. In Caucasoids, where most of the detailed work has been done, about 70% of all known allelic MHC diversity can be accounted for by 30 or so ancestral haplotypes (AHs), or conserved sequences of many mega-bases, and their recombinants. The block haplotype structure of the genome, as shown for the MHC (and other genetic regions), is a story that needs to be understood in its own right, particularly given the promotion of the "HapMap" project and single nucleotide polymorphism (SNP) linkage disequilibrium (LD) analysis, which has been wrongly touted as the only way to pinpoint those genes that are important in genetic disorders or other desired (qualitative) characteristics. PMID:25544323

  12. A Cooperative Co-Evolutionary Genetic Algorithm for Tree Scoring and Ancestral Genome Inference.

    PubMed

    Gao, Nan; Zhang, Yan; Feng, Bing; Tang, Jijun

    2015-01-01

    Recent advances in technology have made it easy to obtain and compare whole genomes. Rearrangements of genomes through operations such as reversals and transpositions are rare events that enable researchers to reconstruct deep evolutionary history among species. Some of the popular methods need to search a large tree space for the best-scoring tree, so it is desirable to have a fast and accurate method that can score a given tree efficiently. During the tree scoring procedure, the genomic structures of internal tree nodes are also provided, which offer important information for inferring ancestral genomes and for modeling the evolutionary processes. However, computing tree scores and ancestral genomes is very difficult, and many researchers have to rely on heuristic methods that have various disadvantages. In this paper, we describe the first genetic algorithm for tree scoring and ancestor inference, which uses a fitness function considering co-evolution, adopts different initial seeding methods to initialize the first population pool, and utilizes a sorting-based approach to realize evolution. Our extensive experiments show that, compared with other existing algorithms, this new method is more accurate and can infer ancestral genomes that are much closer to the true ancestors. PMID:26671797

  13. Genome-wide inference of ancestral recombination graphs.

    PubMed

    Rasmussen, Matthew D; Hubisz, Melissa J; Gronau, Ilan; Siepel, Adam

    2014-01-01

    The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the "ancestral recombination graph" (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n-1 chromosomes, an operation we call "threading." Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps. PMID:24831947

  14. Genome-Wide Inference of Ancestral Recombination Graphs

    PubMed Central

    Rasmussen, Matthew D.; Hubisz, Melissa J.; Gronau, Ilan; Siepel, Adam

    2014-01-01

    The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the “ancestral recombination graph” (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n-1 chromosomes, an operation we call “threading.” Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps. PMID:24831947

  15. A general modeling framework for genome ancestral origins in multiparental populations.

    PubMed

    Zheng, Chaozhi; Boer, Martin P; van Eeuwijk, Fred A

    2014-09-01

    The next generation of QTL (quantitative trait loci) mapping populations have been designed with multiple founders, where one to a number of generations of intercrossing are introduced prior to the inbreeding phase to increase accumulated recombinations and thus mapping resolution. Examples of such populations are Collaborative Cross (CC) in mice and Multiparent Advanced Generation Inter-Cross (MAGIC) lines in Arabidopsis. The genomes of the produced inbred lines are fine-grained random mosaics of the founder genomes. In this article, we present a novel framework for modeling ancestral origin processes along two homologous autosomal chromosomes from mapping populations, which is a major component in the reconstruction of the ancestral origins of each line for QTL mapping. We construct a general continuous time Markov model for ancestral origin processes, where the rate matrix is deduced from the expected densities of various types of junctions (recombination breakpoints). The model can be applied to monoecious populations with or without self-fertilizations and to dioecious populations with two separate sexes. The analytic expressions for map expansions and expected junction densities are obtained for mapping populations that have stage-wise constant mating schemes, such as CC and MAGIC. Our studies on the breeding design of MAGIC populations show that the intercross mating schemes do not matter much for large population size and that the overall expected junction density, and thus map resolution, are approximately proportional to the inverse of the number of founders.

  16. Reconstruction of an ancestral Yersinia pestis genome and comparison with an ancient sequence

    PubMed Central

    2015-01-01

    Background: We propose the computational reconstruction of a whole bacterial ancestral genome at the nucleotide scale, and its validation by a sequence of ancient DNA. This rare possibility is offered by an ancient sequence of the plague agent from the late Middle Ages. It has been hypothesized to be ancestral to extant Yersinia pestis strains based on the pattern of nucleotide substitutions. But the dynamics of indels, duplications, insertion sequences and rearrangements have impacted all genomes much more than the substitution process, which makes the ancestral reconstruction task challenging. Results: We use a set of gene families from 13 Yersinia species, construct reconciled phylogenies for all of them, and determine gene orders in ancestral species. Gene trees integrate information from the sequence, the species tree and gene order. We reconstruct ancestral sequences for ancestral genic and intergenic regions, providing a nearly complete genome sequence for the ancestor, containing a chromosome and three plasmids. Conclusion: The comparison of the ancestral and ancient sequences provides a unique opportunity to assess the quality of ancestral genome reconstruction methods. But the quality of the sequencing and assembly of the ancient sequence can also be questioned by this comparison. PMID:26450112

  17. Comparative analysis of rosaceous genomes and the reconstruction of a putative ancestral genome for the family

    PubMed Central

    2011-01-01

    Background: Comparative genome mapping studies in Rosaceae have been conducted until now by aligning genetic maps within the same genus or closely related genera, using a limited number of common markers. The growing body of genomics resources and sequence data for both Prunus and Fragaria permits detailed comparisons between these genera and the recently released Malus × domestica genome sequence. Results: We generated a comparative analysis using 806 molecular markers that are anchored genetically to the Prunus and/or Fragaria reference maps, and physically to the Malus genome sequence. The numbers of markers in common between Malus and Prunus, and between Malus and Fragaria, were 784 and 148, respectively. The correspondence between marker positions was high, and conserved syntenic blocks were identified among the three genera in the Rosaceae. We reconstructed a proposed ancestral genome for the Rosaceae. Conclusions: A genome containing nine chromosomes is the most likely candidate for the ancestral Rosaceae progenitor. The number of chromosomal translocations observed between the three genera investigated was low. However, the number of inversions identified among Malus and Prunus was much higher than in any reported genome comparisons in plants, suggesting that small inversions have played an important role in the evolution of these two genera or of the Rosaceae. PMID:21226921

  18. Reconstruction of Ancestral Genomes in Presence of Gene Gain and Loss.

    PubMed

    Avdeyev, Pavel; Jiang, Shuai; Aganezov, Sergey; Hu, Fei; Alekseyev, Max A

    2016-03-01

    Since most dramatic genomic changes are caused by genome rearrangements as well as gene duplications and gain/loss events, it becomes crucial to understand their mechanisms and reconstruct ancestral genomes of the given genomes. This problem was shown to be NP-complete even in the "simplest" case of three genomes, thus calling for heuristic rather than exact algorithmic solutions. At the same time, a larger number of input genomes may actually simplify the problem in practice, as was earlier illustrated with MGRA, a state-of-the-art software tool for reconstruction of ancestral genomes of multiple genomes. One of the key obstacles for MGRA and other similar tools is the presence of breakpoint reuse, in which the same breakpoint region is broken by several different genome rearrangements in the course of evolution. Furthermore, such tools are often limited to genomes composed of the same genes, with each gene present in a single copy in every genome. This limitation makes these tools inapplicable to many biological datasets and degrades the resolution of ancestral reconstructions in diverse datasets. We address these deficiencies by extending the MGRA algorithm to genomes with unequal gene contents. The developed next-generation tool MGRA2 can handle gene gain/loss events and shares the ability of MGRA to reconstruct ancestral genomes uniquely in the case of limited breakpoint reuse. Furthermore, MGRA2 employs a number of novel heuristics to cope with higher breakpoint reuse and process datasets inaccessible to MGRA. In practical experiments, MGRA2 shows superior performance for simulated and real genomes as compared to other ancestral genome reconstruction tools. PMID:26885568

  19. Mapping ancestral genomes with massive gene loss: A matrix sandwich problem

    PubMed Central

    Gavranović, Haris; Chauve, Cedric; Salse, Jérôme; Tannier, Eric

    2011-01-01

    Motivation: Ancestral genomes provide a better way to understand the structural evolution of genomes than the simple comparison of extant genomes. Most ancestral genome reconstruction methods rely on universal markers, that is, homologous families of DNA segments present in exactly one exemplar in every considered species. Complex histories of genes or other markers, undergoing duplications and losses, are rarely taken into account. It follows that some ancestors are inaccessible by these methods, such as the proto–monocotyledon whose evolution involved massive gene loss following a whole genome duplication. Results: We propose a mapping approach based on the combinatorial notion of ‘sandwich consecutive ones matrix’, which explicitly takes gene losses into account. We introduce combinatorial optimization problems related to this concept, and propose a heuristic solver and a lower bound on the optimal solution. We use these results to propose a configuration for the proto-chromosomes of the monocot ancestor, and study the accuracy of this configuration. We also use our method to reconstruct the ancestral boreoeutherian genomes, which illustrates that the framework we propose is not specific to plant paleogenomics but is adapted to reconstruct any ancestral genome from extant genomes with heterogeneous marker content. Availability: Upon request to the authors. Contact: haris.gavranovic@gmail.com; eric.tannier@inria.fr PMID:21685079

  20. Reconstructing ancestral genomic sequences by co-evolution: formal definitions, computational issues, and biological examples.

    PubMed

    Tuller, Tamir; Birin, Hadas; Kupiec, Martin; Ruppin, Eytan

    2010-09-01

    The inference of ancestral genomes is a fundamental problem in molecular evolution. Due to the statistical nature of this problem, the most likely or the most parsimonious ancestral genomes usually include considerable error rates. In general, these errors cannot be abolished by utilizing more exhaustive computational approaches, by using longer genomic sequences, or by analyzing more taxa. In recent studies, we showed that co-evolution is an important force that can be used for significantly improving the inference of ancestral genome content. In this work we formally define a computational problem for the inference of ancestral genome content by co-evolution. We show that this problem is NP-hard and hard to approximate and present both a Fixed Parameter Tractable (FPT) algorithm, and heuristic approximation algorithms for solving it. The running time of these algorithms on simulated inputs with hundreds of protein families and hundreds of co-evolutionary relations was fast (up to four minutes) and it achieved an approximation ratio of <1.3. We use our approach to study the ancestral genome content of the Fungi. To this end, we implement our approach on a dataset of 33,931 protein families and 20,317 co-evolutionary relations. Our algorithm added and removed hundreds of proteins from the ancestral genomes inferred by maximum likelihood (ML) or maximum parsimony (MP) while slightly affecting the likelihood/parsimony score of the results. A biological analysis revealed various pieces of evidence that support the biological plausibility of the new solutions. In addition, we showed that our approach reconstructs missing values at the leaves of the Fungi evolutionary tree better than ML or MP.

  1. Whole genome profiling physical map and ancestral annotation of tobacco Hicks Broadleaf.

    PubMed

    Sierro, Nicolas; van Oeveren, Jan; van Eijk, Michiel J T; Martin, Florian; Stormo, Keith E; Peitsch, Manuel C; Ivanov, Nikolai V

    2013-09-01

    Genomics-based breeding of economically important crops such as banana, coffee, cotton, potato, tobacco and wheat is often hampered by genome size, polyploidy and high repeat content. We adapted sequence-based whole-genome profiling (WGP™) technology to obtain insight into the polyploidy of the model plant Nicotiana tabacum (tobacco). N. tabacum is assumed to originate from a hybridization event between ancestors of Nicotiana sylvestris and Nicotiana tomentosiformis approximately 200,000 years ago. This resulted in tobacco having a haploid genome size of 4500 million base pairs, approximately four times larger than the related tomato (Solanum lycopersicum) and potato (Solanum tuberosum) genomes. In this study, a physical map containing 9750 contigs of bacterial artificial chromosomes (BACs) was constructed. The mean contig size was 462 kbp, and the calculated genome coverage equaled the estimated tobacco genome size. We used a method for determination of the ancestral origin of the genome by annotation of WGP sequence tags. This assignment agreed with the ancestral annotation available from the tobacco genetic map, and may be used to investigate the evolution of homoeologous genome segments after polyploidization. The map generated is an essential scaffold for the tobacco genome. We propose the combination of WGP physical mapping technology and tag profiling of ancestral lines as a generally applicable method to elucidate the ancestral origin of genome segments of polyploid species. The physical mapping of genes and their origins will enable application of biotechnology to polyploid plants aimed at accelerating and increasing the precision of breeding for abiotic and biotic stress resistance.

  2. Whole genome profiling physical map and ancestral annotation of tobacco Hicks Broadleaf

    PubMed Central

    Sierro, Nicolas; van Oeveren, Jan; van Eijk, Michiel J T; Martin, Florian; Stormo, Keith E; Peitsch, Manuel C; Ivanov, Nikolai V

    2013-01-01

    Genomics-based breeding of economically important crops such as banana, coffee, cotton, potato, tobacco and wheat is often hampered by genome size, polyploidy and high repeat content. We adapted sequence-based whole-genome profiling (WGP™) technology to obtain insight into the polyploidy of the model plant Nicotiana tabacum (tobacco). N. tabacum is assumed to originate from a hybridization event between ancestors of Nicotiana sylvestris and Nicotiana tomentosiformis approximately 200 000 years ago. This resulted in tobacco having a haploid genome size of 4500 million base pairs, approximately four times larger than the related tomato (Solanum lycopersicum) and potato (Solanum tuberosum) genomes. In this study, a physical map containing 9750 contigs of bacterial artificial chromosomes (BACs) was constructed. The mean contig size was 462 kbp, and the calculated genome coverage equaled the estimated tobacco genome size. We used a method for determination of the ancestral origin of the genome by annotation of WGP sequence tags. This assignment agreed with the ancestral annotation available from the tobacco genetic map, and may be used to investigate the evolution of homoeologous genome segments after polyploidization. The map generated is an essential scaffold for the tobacco genome. We propose the combination of WGP physical mapping technology and tag profiling of ancestral lines as a generally applicable method to elucidate the ancestral origin of genome segments of polyploid species. The physical mapping of genes and their origins will enable application of biotechnology to polyploid plants aimed at accelerating and increasing the precision of breeding for abiotic and biotic stress resistance. PMID:23672264
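
    The coverage statement in this abstract can be checked with simple arithmetic. The sketch below uses only the three figures quoted above (9750 contigs, 462 kbp mean contig size, 4500 Mbp estimated haploid genome size); the variable names are illustrative, and treating total contig length as genome coverage is an assumption about how the authors calculated it.

```python
# Figures quoted in the abstract.
num_contigs = 9750
mean_contig_kbp = 462
estimated_genome_mbp = 4500

# Total length of the physical map, converted from kbp to Mbp.
map_length_mbp = num_contigs * mean_contig_kbp / 1000

# Ratio of map length to the estimated genome size.
coverage_ratio = map_length_mbp / estimated_genome_mbp

print(f"physical map length: {map_length_mbp} Mbp")
print(f"coverage ratio: {coverage_ratio:.3f}")
```

    The map length works out to 4504.5 Mbp, about 1.001 times the estimated genome size, which is consistent with the abstract's statement that coverage equaled the estimated genome size.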

  3. The mosaic of ancestral karyotype blocks in the Sinapis alba L. genome.

    PubMed

    Nelson, Matthew N; Parkin, Isobel A P; Lydiate, Derek J

    2011-01-01

    The organisation of the Sinapis alba genome, comprising 12 linkage groups (n = 12), was compared with the Brassicaceae ancestral karyotype (AK) genomic blocks previously described in other crucifer species. Most of the S. alba genome falls into conserved triplicated genomic blocks that closely match the AK-defined genomic blocks found in other crucifer species including the A, B, and C genomes of closely related Brassica species. In one instance, an S. alba linkage group (S05) was completely collinear with one AK chromosome (AK1), the first time this has been observed in a member of the Brassiceae tribe. However, as observed for other members of the Brassiceae tribe, ancestral genomic blocks were fragmented in the S. alba genome, supporting previously reported comparative chromosome painting describing rearrangements of the AK karyotype prior to the divergence of the Brassiceae from other crucifers. The presented data also refute previous phylogenetic reports that suggest S. alba was more closely related to Brassica nigra (B genome) than to B. rapa (A genome) and B. oleracea (C genome). A comparison of the S. alba and Arabidopsis thaliana genomes revealed many regions of conserved gene order, which will facilitate access to the rich genomic resources available in the model species A. thaliana for genetic research in the less well-resourced crop species S. alba.

  4. Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs

    PubMed Central

    Green, Richard E; Braun, Edward L; Armstrong, Joel; Earl, Dent; Nguyen, Ngan; Hickey, Glenn; Vandewege, Michael W; St John, John A; Capella-Gutiérrez, Salvador; Castoe, Todd A; Kern, Colin; Fujita, Matthew K; Opazo, Juan C; Jurka, Jerzy; Kojima, Kenji K; Caballero, Juan; Hubley, Robert M; Smit, Arian F; Platt, Roy N; Lavoie, Christine A; Ramakodi, Meganathan P; Finger, John W; Suh, Alexander; Isberg, Sally R; Miles, Lee; Chong, Amanda Y; Jaratlerdsiri, Weerachai; Gongora, Jaime; Moran, Christopher; Iriarte, Andrés; McCormack, John; Burgess, Shane C; Edwards, Scott V; Lyons, Eric; Williams, Christina; Breen, Matthew; Howard, Jason T; Gresham, Cathy R; Peterson, Daniel G; Schmitz, Jürgen; Pollock, David D; Haussler, David; Triplett, Eric W; Zhang, Guojie; Irie, Naoki; Jarvis, Erich D; Brochu, Christopher A; Schmidt, Carl J; McCarthy, Fiona M; Faircloth, Brant C; Hoffmann, Federico G; Glenn, Travis C; Gabaldón, Toni; Paten, Benedict; Ray, David A

    2015-01-01

    To provide context for the diversifications of archosaurs, the group that includes crocodilians, dinosaurs and birds, we generated draft genomes of three crocodilians, Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We observed an exceptionally slow rate of genome evolution within crocodilians at all levels, including nucleotide substitutions, indels, transposable element content and movement, gene family evolution, and chromosomal synteny. When placed within the context of related taxa including birds and turtles, this suggests that the common ancestor of all of these taxa also exhibited slow genome evolution and that the relatively rapid evolution of bird genomes represents an autapomorphy within that clade. The data also provided the opportunity to analyze heterozygosity in crocodilians, which indicates a likely reduction in population size for all three taxa through the Pleistocene. Finally, these new data combined with newly published bird genomes allowed us to reconstruct the partial genome of the common ancestor of archosaurs providing a tool to investigate the genetic starting material of crocodilians, birds, and dinosaurs. PMID:25504731

  5. Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs.

    PubMed

    Green, Richard E; Braun, Edward L; Armstrong, Joel; Earl, Dent; Nguyen, Ngan; Hickey, Glenn; Vandewege, Michael W; St John, John A; Capella-Gutiérrez, Salvador; Castoe, Todd A; Kern, Colin; Fujita, Matthew K; Opazo, Juan C; Jurka, Jerzy; Kojima, Kenji K; Caballero, Juan; Hubley, Robert M; Smit, Arian F; Platt, Roy N; Lavoie, Christine A; Ramakodi, Meganathan P; Finger, John W; Suh, Alexander; Isberg, Sally R; Miles, Lee; Chong, Amanda Y; Jaratlerdsiri, Weerachai; Gongora, Jaime; Moran, Christopher; Iriarte, Andrés; McCormack, John; Burgess, Shane C; Edwards, Scott V; Lyons, Eric; Williams, Christina; Breen, Matthew; Howard, Jason T; Gresham, Cathy R; Peterson, Daniel G; Schmitz, Jürgen; Pollock, David D; Haussler, David; Triplett, Eric W; Zhang, Guojie; Irie, Naoki; Jarvis, Erich D; Brochu, Christopher A; Schmidt, Carl J; McCarthy, Fiona M; Faircloth, Brant C; Hoffmann, Federico G; Glenn, Travis C; Gabaldón, Toni; Paten, Benedict; Ray, David A

    2014-12-12

    To provide context for the diversification of archosaurs--the group that includes crocodilians, dinosaurs, and birds--we generated draft genomes of three crocodilians: Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We observed an exceptionally slow rate of genome evolution within crocodilians at all levels, including nucleotide substitutions, indels, transposable element content and movement, gene family evolution, and chromosomal synteny. When placed within the context of related taxa including birds and turtles, this suggests that the common ancestor of all of these taxa also exhibited slow genome evolution and that the comparatively rapid evolution is derived in birds. The data also provided the opportunity to analyze heterozygosity in crocodilians, which indicates a likely reduction in population size for all three taxa through the Pleistocene. Finally, these data combined with newly published bird genomes allowed us to reconstruct the partial genome of the common ancestor of archosaurs, thereby providing a tool to investigate the genetic starting material of crocodilians, birds, and dinosaurs. PMID:25504731

  7. Monotreme IGF2 expression and ancestral origin of genomic imprinting.

    PubMed

    Killian, J K; Nolan, C M; Stewart, N; Munday, B L; Andersen, N A; Nicol, S; Jirtle, R L

    2001-08-15

    IGF2 (insulin-like growth factor 2) and M6P/IGF2R (mannose 6-phosphate/insulin-like growth factor 2 receptor) are imprinted in marsupials and eutherians but not in birds. These results along with the absence of M6P/IGF2R imprinting in the egg-laying monotremes indicate that the parental imprinting of fetal growth-regulatory genes may be unique to viviparous mammals. In this investigation, we have cloned IGF2 from two monotreme mammals, the platypus and echidna, to further investigate the origin of imprinting. We report herein that like M6P/IGF2R, IGF2 is not imprinted in monotremes. Thus, although IGF2 encodes for a highly conserved growth factor in chordates, it is only imprinted in therian mammals. These findings support a concurrent origin of IGF2 and M6P/IGF2R imprinting in the late Jurassic/early Cretaceous period. The absence of imprinting in monotremes, despite apparent interparental conflicts over maternal-offspring exchange, argues that a fortuitous congruency of genetic and epigenetic events may have limited the phylogenetic breadth of genomic imprinting to therian mammals. J. Exp. Zool. (Mol. Dev. Evol.) 291:205-212, 2001. PMID:11479919

  8. Analyses of Charophyte Chloroplast Genomes Help Characterize the Ancestral Chloroplast Genome of Land Plants

    PubMed Central

    Civáň, Peter; Foster, Peter G.; Embley, Martin T.; Séneca, Ana; Cox, Cymon J.

    2014-01-01

    Despite the significance of the relationships between embryophytes and their charophyte algal ancestors in deciphering the origin and evolutionary success of land plants, few chloroplast genomes of the charophyte algae have been reconstructed to date. Here, we present new data for three chloroplast genomes of the freshwater charophytes Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae). The chloroplast genome of Klebsormidium has a quadripartite organization with exceptionally large inverted repeat (IR) regions and, uniquely among streptophytes, has lost the rrn5 and rrn4.5 genes from the ribosomal RNA (rRNA) gene cluster operon. The chloroplast genome of Roya differs from other zygnematophycean chloroplasts, including the newly sequenced Mesotaenium, by having a quadripartite structure that is typical of other streptophytes. On the basis of the improbability of the novel gain of IR regions, we infer that the quadripartite structure has likely been lost independently in at least three zygnematophycean lineages, although the absence of the usual rRNA operonic synteny in the IR regions of Roya may indicate their de novo origin. Significantly, all zygnematophycean chloroplast genomes have undergone substantial genomic rearrangement, which may be the result of ancient retroelement activity evidenced by the presence of integrase-like and reverse transcriptase-like elements in the Roya chloroplast genome. Our results corroborate the close phylogenetic relationship between Zygnematophyceae and land plants and identify 89 protein-coding genes and 22 introns present in the chloroplast genome at the time of the evolutionary transition of plants to land, all of which can be found in the chloroplast genomes of extant charophytes. PMID:24682153

  9. Analyses of charophyte chloroplast genomes help characterize the ancestral chloroplast genome of land plants.

    PubMed

    Civáň, Peter; Foster, Peter G; Embley, Martin T; Séneca, Ana; Cox, Cymon J

    2014-04-01

    Despite the significance of the relationships between embryophytes and their charophyte algal ancestors in deciphering the origin and evolutionary success of land plants, few chloroplast genomes of the charophyte algae have been reconstructed to date. Here, we present new data for three chloroplast genomes of the freshwater charophytes Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae). The chloroplast genome of Klebsormidium has a quadripartite organization with exceptionally large inverted repeat (IR) regions and, uniquely among streptophytes, has lost the rrn5 and rrn4.5 genes from the ribosomal RNA (rRNA) gene cluster operon. The chloroplast genome of Roya differs from other zygnematophycean chloroplasts, including the newly sequenced Mesotaenium, by having a quadripartite structure that is typical of other streptophytes. On the basis of the improbability of the novel gain of IR regions, we infer that the quadripartite structure has likely been lost independently in at least three zygnematophycean lineages, although the absence of the usual rRNA operonic synteny in the IR regions of Roya may indicate their de novo origin. Significantly, all zygnematophycean chloroplast genomes have undergone substantial genomic rearrangement, which may be the result of ancient retroelement activity evidenced by the presence of integrase-like and reverse transcriptase-like elements in the Roya chloroplast genome. Our results corroborate the close phylogenetic relationship between Zygnematophyceae and land plants and identify 89 protein-coding genes and 22 introns present in the chloroplast genome at the time of the evolutionary transition of plants to land, all of which can be found in the chloroplast genomes of extant charophytes. PMID:24682153

  11. Exploring the diploid wheat ancestral A genome through sequence comparison at the high-molecular-weight glutenin locus region.

    PubMed

    Dong, Lingli; Huo, Naxin; Wang, Yi; Deal, Karin; Luo, Ming-Cheng; Wang, Daowen; Anderson, Olin D; Gu, Yong Qiang

    2012-12-01

    The polyploid nature of hexaploid wheat (T. aestivum, AABBDD) often represents a great challenge in various aspects of research including genetic mapping, map-based cloning of important genes, and sequencing and accurate assembly of its genome. To explore the utility of ancestral diploid species of polyploid wheat, sequence variation of T. urartu (A(u)A(u)) was analyzed by comparing its 277-kb large genomic region carrying the important Glu-1 locus with the homologous regions from the A genomes of the diploid T. monococcum (A(m)A(m)), tetraploid T. turgidum (AABB), and hexaploid T. aestivum (AABBDD). Our results revealed that in addition to a high degree of the gene collinearity, nested retroelement structures were also considerably conserved among the A(u) genome and the A genomes in polyploid wheats, suggesting that the majority of the repetitive sequences in the A genomes of polyploid wheats originated from the diploid A(u) genome. The difference in the compared region between A(u) and A is mainly caused by four differential TE insertion and two deletion events between these genomes. The estimated divergence time of A genomes calculated on nucleotide substitution rate in both shared TEs and collinear genes further supports the closer evolutionary relationship of A to A(u) than to A(m). The structure conservation in the repetitive regions prompted us to develop repeat junction markers based on the A(u) sequence for mapping the A genome in hexaploid wheat. Eighty percent of these repeat junction markers were successfully mapped to the corresponding region in hexaploid wheat, suggesting that T. urartu could serve as a useful resource for developing molecular markers for genetic and breeding studies in hexaploid wheat.

  12. Defining the ancestral eutherian karyotype: a cladistic interpretation of chromosome painting and genome sequence assembly data.

    PubMed

    Robinson, Terence J; Ruiz-Herrera, Aurora

    2008-01-01

    A cladistic analysis of genome assemblies (syntenic associations) for eutherian mammals against two distant outgroup species--opossum and chicken--permitted a refinement of the 46-chromosome karyotype formerly inferred in the ancestral eutherian. We show that two intact chromosome pairs (corresponding to human chromosomes 13 and 18) and three conserved chromosome segments (10q, 19p and 8q in the human karyotype) are probably symplesiomorphic for Eutheria because they are also present as unaltered orthologues in one or both outgroups. Seven additional syntenies (4q/8p/4pq, 3p/21, 14/15, 10p/12pq/22qt, 19q/16q, 16p/7a and 12qt/22q), each involving human chromosomal segments that in various combinations correspond to complete chromosomes in the ancestral eutherian karyotype, are also present in one or both outgroup taxa and thus are probable symplesiomorphies for Eutheria. Interestingly, several of the symplesiomorphic characters identified in chicken and/or opossum are present in more distant outgroups such as pufferfish and zebrafish (for example 3p/21, 14/15, 19q/16q and 16p/7a), suggesting their retention since vertebrate common ancestry approximately 450 million years ago. However, eight intact pairs (corresponding to human chromosomes 1, 5, 6, 9, 11, 17, 20 and the X) and three chromosome segments (7b, 2p-q13 and 2q13-qter) are derived characters potentially consistent with eutherian monophyly. Our analyses clarify the distinction between shared-ancestral and shared-derived homology in the eutherian ancestral karyotype.

  13. Ancestral chromosomal blocks are triplicated in Brassiceae species with varying chromosome number and genome size.

    PubMed

    Lysak, Martin A; Cheung, Kwok; Kitschke, Michaela; Bures, Petr

    2007-10-01

    The paleopolyploid character of genomes of the economically important genus Brassica and closely related species (tribe Brassiceae) is still fairly controversial. Here, we report on the comparative painting analysis of block F of the crucifer Ancestral Karyotype (AK; n = 8), consisting of 24 conserved genomic blocks, in 10 species traditionally treated as members of the tribe Brassiceae. Three homeologous copies of block F were identified per haploid chromosome complement in Brassiceae species with 2n = 14, 18, 20, 32, and 36. In high-polyploid (n ≥ 30) species Crambe maritima (2n = 60), Crambe cordifolia (2n = 120), and Vella pseudocytisus (2n = 68), six, 12, and six copies of the analyzed block have been revealed, respectively. Homeologous regions resembled the ancestral structure of block F within the AK or were altered by inversions and/or translocations. In two species of the subtribe Zillineae, two of the three homeologous regions were combined via a reciprocal translocation onto one chromosome. Altogether, these findings provide compelling evidence of an ancient hexaploidization event and corresponding whole-genome triplication shared by the tribe Brassiceae. No direct relationship between chromosome number and genome size variation (1.2-2.5 pg/2C) has been found in Brassiceae species with 2n = 14 to 36. Only two homeologous copies of block F suggest a whole-genome duplication but not the triplication event in Orychophragmus violaceus (2n = 24), and confirm a phylogenetic position of this species outside the tribe Brassiceae. Chromosome duplication detected in Orychophragmus as well as chromosome rearrangements shared by Zillineae species demonstrate the usefulness of comparative cytogenetics for elucidation of phylogenetic relationships.

  14. Major Chromosomal Rearrangements Distinguish Willow and Poplar After the Ancestral "Salicoid" Genome Duplication.

    PubMed

    Hou, Jing; Ye, Ning; Dong, Zhongyuan; Lu, Mengzhu; Li, Laigeng; Yin, Tongming

    2016-01-01

    Populus (poplar) and Salix (willow) are sister genera in the Salicaceae family. In both lineages extant species are predominantly diploid. Genome analysis previously revealed that the two lineages originated from a common tetraploid ancestor. In this study, we conducted a syntenic comparison of the corresponding 19 chromosome members of the poplar and willow genomes. Our observations revealed that almost every chromosomal segment had a parallel paralogous segment elsewhere in the genomes, and the two lineages shared a similar syntenic pinwheel pattern for most of the chromosomes, which indicated that the two lineages diverged after the genome reorganization in the common progenitor. The pinwheel patterns showed distinct differences for two chromosome pairs in each lineage. Further analysis detected two major interchromosomal rearrangements that distinguished the karyotypes of willow and poplar. Chromosome I of willow was a conjunction of poplar chromosome XVI and the lower portion of poplar chromosome I, whereas willow chromosome XVI corresponded to the upper portion of poplar chromosome I. Scientists have suggested that Populus is evolutionarily more primitive than Salix. Therefore, we propose that, after the "salicoid" duplication event, fission and fusion of the ancestral chromosomes first gave rise to the diploid progenitor of extant Populus species. During the evolutionary process, fission and fusion of poplar chromosomes I and XVI subsequently gave rise to the progenitor of extant Salix species. This study contributes to an improved understanding of genome divergence after ancient genome duplication in closely related lineages of higher plants. PMID:27352946

  15. Major Chromosomal Rearrangements Distinguish Willow and Poplar After the Ancestral “Salicoid” Genome Duplication

    PubMed Central

    Hou, Jing; Ye, Ning; Dong, Zhongyuan; Lu, Mengzhu; Li, Laigeng; Yin, Tongming

    2016-01-01

    Populus (poplar) and Salix (willow) are sister genera in the Salicaceae family. In both lineages extant species are predominantly diploid. Genome analysis previously revealed that the two lineages originated from a common tetraploid ancestor. In this study, we conducted a syntenic comparison of the corresponding 19 chromosome members of the poplar and willow genomes. Our observations revealed that almost every chromosomal segment had a parallel paralogous segment elsewhere in the genomes, and the two lineages shared a similar syntenic pinwheel pattern for most of the chromosomes, which indicated that the two lineages diverged after the genome reorganization in the common progenitor. The pinwheel patterns showed distinct differences for two chromosome pairs in each lineage. Further analysis detected two major interchromosomal rearrangements that distinguished the karyotypes of willow and poplar. Chromosome I of willow was a conjunction of poplar chromosome XVI and the lower portion of poplar chromosome I, whereas willow chromosome XVI corresponded to the upper portion of poplar chromosome I. Scientists have suggested that Populus is evolutionarily more primitive than Salix. Therefore, we propose that, after the “salicoid” duplication event, fission and fusion of the ancestral chromosomes first gave rise to the diploid progenitor of extant Populus species. During the evolutionary process, fission and fusion of poplar chromosomes I and XVI subsequently gave rise to the progenitor of extant Salix species. This study contributes to an improved understanding of genome divergence after ancient genome duplication in closely related lineages of higher plants. PMID:27352946

  16. Exploiting ancestral mammalian genomes for the prediction of human transcription factor binding sites

    PubMed Central

    2012-01-01

    Background: The computational prediction of Transcription Factor Binding Sites (TFBS) remains a challenge due to their short length and low information content. Comparative genomics approaches that simultaneously consider several related species and favor sites that have been conserved throughout evolution improve the accuracy (specificity) of the predictions but are limited due to a phenomenon called binding site turnover, where sequence evolution causes one TFBS to replace another in the same region. In parallel to this development, an increasing number of mammalian genomes are now sequenced and it is becoming possible to infer, to a surprisingly high degree of accuracy, ancestral mammalian sequences. Results: We propose a TFBS prediction approach that makes use of the availability of inferred ancestral mammalian genomes to improve its accuracy. This method aims to identify binding loci, which are regions of a few hundred base pairs that have preserved their potential to bind a given transcription factor over evolutionary time. After proposing a neutral evolutionary model of predicted TFBS counts in a DNA region of a given length, we use it to identify regions that have preserved the number of predicted TFBS they contain to an unexpected degree given their divergence. The approach is applied to human chromosome 1 and shows significant gains in accuracy as compared to both existing single-species and multi-species TFBS prediction approaches, in particular for transcription factors that are subject to high turnover rates. Availability: The source code and predictions made by the program are available at http://www.cs.mcgill.ca/~blanchem/bindingLoci. PMID:23281809

  17. Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure

    PubMed Central

    Basu, Analabha; Sarkar-Roy, Neeta; Majumder, Partha P.

    2016-01-01

    India, occupying the center stage of Paleolithic and Neolithic migrations, has been underrepresented in genome-wide studies of variation. Systematic analysis of genome-wide data, using multiple robust statistical methods, on (i) 367 unrelated individuals drawn from 18 mainland and 2 island (Andaman and Nicobar Islands) populations selected to represent geographic, linguistic, and ethnic diversities, and (ii) individuals from populations represented in the Human Genome Diversity Panel (HGDP), reveals four major ancestries in mainland India. This contrasts with an earlier inference of two ancestries based on limited population sampling. A distinct ancestry of the populations of Andaman archipelago was identified and found to be coancestral to Oceanic populations. Analysis of ancestral haplotype blocks revealed that extant mainland populations (i) admixed widely irrespective of ancestry, although admixtures between populations were not always symmetric, and (ii) this practice was rapidly replaced by endogamy about 70 generations ago, among upper castes and Indo-European speakers predominantly. This estimated time coincides with the historical period of formulation and adoption of sociocultural norms restricting intermarriage in large social strata. A similar replacement observed among tribal populations was temporally less uniform. PMID:26811443

  18. Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure.

    PubMed

    Basu, Analabha; Sarkar-Roy, Neeta; Majumder, Partha P

    2016-02-01

    India, occupying the center stage of Paleolithic and Neolithic migrations, has been underrepresented in genome-wide studies of variation. Systematic analysis of genome-wide data, using multiple robust statistical methods, on (i) 367 unrelated individuals drawn from 18 mainland and 2 island (Andaman and Nicobar Islands) populations selected to represent geographic, linguistic, and ethnic diversities, and (ii) individuals from populations represented in the Human Genome Diversity Panel (HGDP), reveals four major ancestries in mainland India. This contrasts with an earlier inference of two ancestries based on limited population sampling. A distinct ancestry of the populations of Andaman archipelago was identified and found to be coancestral to Oceanic populations. Analysis of ancestral haplotype blocks revealed that extant mainland populations (i) admixed widely irrespective of ancestry, although admixtures between populations were not always symmetric, and (ii) this practice was rapidly replaced by endogamy about 70 generations ago, among upper castes and Indo-European speakers predominantly. This estimated time coincides with the historical period of formulation and adoption of sociocultural norms restricting intermarriage in large social strata. A similar replacement observed among tribal populations was temporally less uniform. PMID:26811443

  19. Ancestral grass karyotype reconstruction unravels new mechanisms of genome shuffling as a source of plant evolution

    PubMed Central

    Murat, Florent; Xu, Jian-Hong; Tannier, Eric; Abrouk, Michael; Guilhot, Nicolas; Pont, Caroline; Messing, Joachim; Salse, Jérôme

    2010-01-01

    The comparison of the chromosome numbers of today's species with common reconstructed paleo-ancestors has led to intense speculation of how chromosomes have been rearranged over time in mammals. However, similar studies in plants with respect to genome evolution as well as molecular mechanisms leading to mosaic synteny blocks have been lacking due to relevant examples of evolutionary zooms from genomic sequences. Such studies require genomes of species that belong to the same family but are diverged to fall into different subfamilies. Our most important crops belong to the family of the grasses, where a number of genomes have now been sequenced. Based on detailed paleogenomics, using inference from n = 5–12 grass ancestral karyotypes (AGKs) in terms of gene content and order, we delineated sequence intervals comprising a complete set of junction break points of orthologous regions from rice, maize, sorghum, and Brachypodium genomes, representing three different subfamilies and different polyploidization events. By focusing on these sequence intervals, we could show that the chromosome number variation/reduction from the n = 12 common paleo-ancestor was driven by nonrandom centric double-strand break repair events. It appeared that the centromeric/telomeric illegitimate recombination between nonhomologous chromosomes led to nested chromosome fusions (NCFs) and synteny break points (SBPs). When intervals comprising NCFs were compared in their structure, we concluded that SBPs (1) were meiotic recombination hotspots, (2) corresponded to high sequence turnover loci through repeat invasion, and (3) might be considered as hotspots of evolutionary novelty that could act as a reservoir for producing adaptive phenotypes. PMID:20876790

  20. Ancestral grass karyotype reconstruction unravels new mechanisms of genome shuffling as a source of plant evolution.

    PubMed

    Murat, Florent; Xu, Jian-Hong; Tannier, Eric; Abrouk, Michael; Guilhot, Nicolas; Pont, Caroline; Messing, Joachim; Salse, Jérôme

    2010-11-01

    The comparison of the chromosome numbers of today's species with common reconstructed paleo-ancestors has led to intense speculation of how chromosomes have been rearranged over time in mammals. However, similar studies in plants with respect to genome evolution as well as molecular mechanisms leading to mosaic synteny blocks have been lacking due to relevant examples of evolutionary zooms from genomic sequences. Such studies require genomes of species that belong to the same family but are diverged to fall into different subfamilies. Our most important crops belong to the family of the grasses, where a number of genomes have now been sequenced. Based on detailed paleogenomics, using inference from n = 5-12 grass ancestral karyotypes (AGKs) in terms of gene content and order, we delineated sequence intervals comprising a complete set of junction break points of orthologous regions from rice, maize, sorghum, and Brachypodium genomes, representing three different subfamilies and different polyploidization events. By focusing on these sequence intervals, we could show that the chromosome number variation/reduction from the n = 12 common paleo-ancestor was driven by nonrandom centric double-strand break repair events. It appeared that the centromeric/telomeric illegitimate recombination between nonhomologous chromosomes led to nested chromosome fusions (NCFs) and synteny break points (SBPs). When intervals comprising NCFs were compared in their structure, we concluded that SBPs (1) were meiotic recombination hotspots, (2) corresponded to high sequence turnover loci through repeat invasion, and (3) might be considered as hotspots of evolutionary novelty that could act as a reservoir for producing adaptive phenotypes.

  1. Exploring the diploid wheat ancestral A genome through sequence comparison at the High-Molecular-Weight glutenin locus region

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The polyploid nature of hexaploid wheat (T. aestivum, AABBDD) often represents a great challenge in various aspects of research including genetic mapping, map-based cloning of important genes, and sequencing and accurate assembly of its genome. To explore the utility of ancestral diploid species o...

  2. MADS goes genomic in conifers: towards determining the ancestral set of MADS-box genes in seed plants

    PubMed Central

    Gramzow, Lydia; Weilandt, Lisa; Theißen, Günter

    2014-01-01

    Background and Aims: MADS-box genes comprise a gene family coding for transcription factors. This gene family expanded greatly during land plant evolution such that the number of MADS-box genes ranges from one or two in green algae to around 100 in angiosperms. Given the crucial functions of MADS-box genes for nearly all aspects of plant development, the expansion of this gene family probably contributed to the increasing complexity of plants. However, the expansion of MADS-box genes during one important step of land plant evolution, namely the origin of seed plants, remains poorly understood due to the previous lack of whole-genome data for gymnosperms. Methods: The newly available genome sequences of Picea abies, Picea glauca and Pinus taeda were used to identify the complete set of MADS-box genes in these conifers. In addition, MADS-box genes were identified in the growing number of transcriptomes available for gymnosperms. With these datasets, phylogenies were constructed to determine the ancestral set of MADS-box genes of seed plants and to infer the ancestral functions of these genes. Key Results: Type I MADS-box genes are under-represented in gymnosperms and only a minimum of two Type I MADS-box genes have been present in the most recent common ancestor (MRCA) of seed plants. In contrast, a large number of Type II MADS-box genes were found in gymnosperms. The MRCA of extant seed plants probably possessed at least 11–14 Type II MADS-box genes. In gymnosperms two duplications of Type II MADS-box genes were found, such that the MRCA of extant gymnosperms had at least 14–16 Type II MADS-box genes. Conclusions: The implied ancestral set of MADS-box genes for seed plants shows simplicity for Type I MADS-box genes and remarkable complexity for Type II MADS-box genes in terms of phylogeny and putative functions. The analysis of transcriptome data reveals that gymnosperm MADS-box genes are expressed in a great variety of tissues, indicating diverse roles of MADS

  3. Genomic organization of the crested ibis MHC provides new insight into ancestral avian MHC structure

    PubMed Central

    Chen, Li-Cheng; Lan, Hong; Sun, Li; Deng, Yan-Li; Tang, Ke-Yi; Wan, Qiu-Hong

    2015-01-01

    The major histocompatibility complex (MHC) plays an important role in immune response. Avian MHCs are not well characterized, only reporting highly compact Galliformes MHCs and extensively fragmented zebra finch MHC. We report the first genomic structure of an endangered Pelecaniformes (crested ibis) MHC containing 54 genes in three regions spanning ~500 kb. In contrast to the loose BG (26 loci within 265 kb) and Class I (11 within 150) genomic structures, the Core Region is condensed (17 within 85). Furthermore, this Region exhibits a COL11A2 gene, followed by four tandem MHC class II αβ dyads retaining two suites of anciently duplicated “αβ” lineages. Thus, the crested ibis MHC structure is entirely different from the known avian MHC architectures but similar to that of mammalian MHCs, suggesting that the fundamental structure of ancestral avian class II MHCs should be “COL11A2-IIαβ1-IIαβ2.” The gene structures, residue characteristics, and expression levels of the five class I genes reveal inter-locus functional divergence. However, phylogenetic analysis indicates that these five genes generate a well-supported intra-species clade, showing evidence for recent duplications. Our analyses suggest dramatic structural variation among avian MHC lineages, help elucidate avian MHC evolution, and provide a foundation for future conservation studies. PMID:25608659

  4. Reconstruction of ancestral chromosome architecture and gene repertoire reveals principles of genome evolution in a model yeast genus.

    PubMed

    Vakirlis, Nikolaos; Sarilar, Véronique; Drillon, Guénola; Fleiss, Aubin; Agier, Nicolas; Meyniel, Jean-Philippe; Blanpain, Lou; Carbone, Alessandra; Devillers, Hugo; Dubois, Kenny; Gillet-Markowska, Alexandre; Graziani, Stéphane; Huu-Vang, Nguyen; Poirel, Marion; Reisser, Cyrielle; Schott, Jonathan; Schacherer, Joseph; Lafontaine, Ingrid; Llorente, Bertrand; Neuvéglise, Cécile; Fischer, Gilles

    2016-07-01

    Reconstructing genome history is complex but necessary to reveal quantitative principles governing genome evolution. Such reconstruction requires recapitulating into a single evolutionary framework the evolution of genome architecture and gene repertoire. Here, we reconstructed the genome history of the genus Lachancea that appeared to cover a continuous evolutionary range from closely related to more diverged yeast species. Our approach integrated the generation of a high-quality genome data set; the development of AnChro, a new algorithm for reconstructing ancestral genome architecture; and a comprehensive analysis of gene repertoire evolution. We found that the ancestral genome of the genus Lachancea contained eight chromosomes and about 5173 protein-coding genes. Moreover, we characterized 24 horizontal gene transfers and 159 putative gene creation events that punctuated species diversification. We retraced all chromosomal rearrangements, including gene losses, gene duplications, chromosomal inversions and translocations at single gene resolution. Gene duplications outnumbered losses and balanced rearrangements with 1503, 929, and 423 events, respectively. Gene content variations between extant species are mainly driven by differential gene losses, while gene duplications remained globally constant in all lineages. Remarkably, we discovered that balanced chromosomal rearrangements could be responsible for up to 14% of all gene losses by disrupting genes at their breakpoints. Finally, we found that nonsynonymous substitutions reached fixation at a coordinated pace with chromosomal inversions, translocations, and duplications, but not deletions. Overall, we provide a granular view of genome evolution within an entire eukaryotic genus, linking gene content, chromosome rearrangements, and protein divergence into a single evolutionary framework.

  5. Two Rounds of Whole Genome Duplication in the Ancestral Vertebrate

    SciTech Connect

    Dehal, Paramvir; Boore, Jeffrey L.

    2005-04-12

    The hypothesis that the relatively large and complex vertebrate genome was created by two ancient, whole genome duplications has been hotly debated, but remains unresolved. We reconstructed the evolutionary relationships of all gene families from the complete gene sets of a tunicate, fish, mouse, and human, then determined when each gene duplicated relative to the evolutionary tree of the organisms. We confirmed the results of earlier studies that there remains little signal of these events in numbers of duplicated genes, gene tree topology, or the number of genes per multigene family. However, when we plotted the genomic map positions of only the subset of paralogous genes that were duplicated prior to the fish-tetrapod split, their global physical organization provides unmistakable evidence of two distinct genome duplication events early in vertebrate evolution indicated by clear patterns of 4-way paralogous regions covering a large part of the human genome. Our results highlight the potential for these large-scale genomic events to have driven the evolutionary success of the vertebrate lineage.

  6. The mitochondrial genome of the onychophoran Opisthopatus cinctipes (Peripatopsidae) reflects the ancestral mitochondrial gene arrangement of Panarthropoda and Ecdysozoa.

    PubMed

    Braband, Anke; Cameron, Stephen L; Podsiadlowski, Lars; Daniels, Savel R; Mayer, Georg

    2010-10-01

    The ancestral genome composition in Onychophora (velvet worms) is unknown since only a single species of Peripatidae has been studied thus far, which shows a highly derived gene order with numerous translocated genes. Due to this lack of information from Onychophora, it is difficult to infer the ancestral mitochondrial gene arrangement patterns for Panarthropoda and Ecdysozoa. Hence, we analyzed the complete mitochondrial genome of the onychophoran Opisthopatus cinctipes, a representative of Peripatopsidae. Our data show that O. cinctipes possesses a highly conserved gene order, similar to that found in various arthropods. By comparing our results to those from different outgroups, we reconstruct the ancestral gene arrangement in Panarthropoda and Ecdysozoa. Our phylogenetic analysis of protein-coding gene sequences from 60 protostome species (including outgroups) provides some support for the sister group relationship of Onychophora and Arthropoda, which was not recovered by using a single species of Peripatidae, Epiperipatus biolleyi, in a previous study. A comparison of the strand-specific bias between onychophorans, arthropods, and a priapulid suggests that the peripatid E. biolleyi is less suitable for phylogenetic analyses of Ecdysozoa using mitochondrial genomic data than the peripatopsid O. cinctipes. PMID:20493270

  8. Clusters of ancestrally related genes that show paralogy in whole or in part are a major feature of the genomes of humans and other species.

    PubMed

    Walker, Michael B; King, Benjamin L; Paigen, Kenneth

    2012-01-01

    Arrangements of genes along chromosomes are a product of evolutionary processes, and we can expect that preferable arrangements will prevail over the span of evolutionary time, often being reflected in the non-random clustering of structurally and/or functionally related genes. Such non-random arrangements can arise by two distinct evolutionary processes: duplications of DNA sequences that give rise to clusters of genes sharing both sequence similarity and common sequence features and the migration together of genes related by function, but not by common descent. To provide a background for distinguishing between the two, which is important for future efforts to unravel the evolutionary processes involved, we here provide a description of the extent to which ancestrally related genes are found in proximity. Towards this purpose, we combined information from five genomic datasets, InterPro, SCOP, PANTHER, Ensembl protein families, and Ensembl gene paralogs. The results are provided in publicly available datasets (http://cgd.jax.org/datasets/clustering/paraclustering.shtml) describing the extent to which ancestrally related genes are in proximity beyond what is expected by chance (i.e. form paraclusters) in the human and nine other vertebrate genomes, as well as the D. melanogaster, C. elegans, A. thaliana, and S. cerevisiae genomes. With the exception of Saccharomyces, paraclusters are a common feature of the genomes we examined. In the human genome they are estimated to include at least 22% of all protein coding genes. Paraclusters are far more prevalent among some gene families than others, are highly species or clade specific and can evolve rapidly, sometimes in response to environmental cues. Altogether, they account for a large portion of the functional clustering previously reported in several genomes.

  9. Vertebrate codon bias indicates a highly GC-rich ancestral genome.

    PubMed

    Nabiyouni, Maryam; Prakash, Ashwin; Fedorov, Alexei

    2013-04-25

    Two factors are thought to have contributed to the origin of codon usage bias in eukaryotes: 1) genome-wide mutational forces that shape overall GC-content and create context-dependent nucleotide bias, and 2) positive selection for codons that maximize efficient and accurate translation. Particularly in vertebrates, these two explanations contradict each other and cloud the origin of codon bias in the taxon. On the one hand, mutational forces fail to explain GC-richness (~60%) of third codon positions, given the GC-poor overall genomic composition among vertebrates (~40%). On the other hand, positive selection cannot easily explain strict regularities in codon preferences. Large-scale bioinformatic assessment, of nucleotide composition of coding and non-coding sequences in vertebrates and other taxa, suggests a simple possible resolution for this contradiction. Specifically, we propose that the last common vertebrate ancestor had a GC-rich genome (~65% GC). The data suggest that whole-genome mutational bias is the major driving force for generating codon bias. As the bias becomes prominent, it begins to affect translation and can result in positive selection for optimal codons. The positive selection can, in turn, significantly modulate codon preferences. PMID:23376453
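
    As an illustration of the two quantities contrasted in this abstract (GC content at third codon positions versus overall GC content), the following is a minimal sketch. It is not the authors' pipeline; the function names and the toy coding sequence are assumptions made for the example.

```python
def gc_fraction(seq):
    """Fraction of G or C among the unambiguous bases (A, C, G, T) in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def gc3_fraction(cds):
    """GC fraction at third codon positions of an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3   # ignore a trailing partial codon
    third_positions = cds[2:usable:3]  # indices 2, 5, 8, ...
    return gc_fraction(third_positions)


# Hypothetical toy sequence (codons ATG GCC AAG TAA), not real data.
toy_cds = "ATGGCCAAGTAA"
overall = gc_fraction(toy_cds)
third = gc3_fraction(toy_cds)
print(f"overall GC = {overall:.3f}, GC3 = {third:.3f}")
```

    In this toy sequence GC3 exceeds the overall GC fraction, which is the direction of the discrepancy the abstract describes for vertebrate coding sequences.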

  10. A linear mitochondrial genome of Cyclospora cayetanensis (Eimeriidae, Eucoccidiorida, Coccidiasina, Apicomplexa) suggests the ancestral start position within mitochondrial genomes of eimeriid coccidia

    PubMed Central

    Ogedengbe, Mosun E.; Qvarnstrom, Yvonne; da Silva, Alexandre J.; Arrowood, Michael J.; Barta, John R.

    2015-01-01

    The near complete mitochondrial (mt) genome for Cyclospora cayetanensis is 6184 bp in length with three protein-coding genes (Cox1, Cox3, CytB) and numerous lsrDNA and ssrDNA fragments. Gene arrangements were conserved with other coccidia in the Eimeriidae, but the C. cayetanensis mt genome is not circular-mapping. Terminal transferase tailing and nested PCR completed the 5’-terminus of the genome starting with a 21 bp A/T-only region that forms a potential stem-loop. Regions homologous to the C. cayetanensis mt genome 5’-terminus are found in all eimeriid mt genomes available and suggest this may be the ancestral start of eimeriid mt genomes. PMID:25812835

  11. Comparative Genomics of Candidate Phylum TM6 Suggests That Parasitism Is Widespread and Ancestral in This Lineage

    PubMed Central

    Yeoh, Yun Kit; Sekiguchi, Yuji; Parks, Donovan H.; Hugenholtz, Philip

    2016-01-01

    Candidate phylum TM6 is a major bacterial lineage recognized through culture-independent rRNA surveys to be low abundance members in a wide range of habitats; however, they are poorly characterized due to a lack of pure culture representatives. Two recent genomic studies of TM6 bacteria revealed small genomes and limited gene repertoire, consistent with known or inferred dependence on eukaryotic hosts for their metabolic needs. Here, we obtained additional near-complete genomes of TM6 populations from agricultural soil and upflow anaerobic sludge blanket reactor metagenomes which, together with the two publicly available TM6 genomes, represent seven distinct family level lineages in the TM6 phylum. Genome-based phylogenetic analysis confirms that TM6 is an independent phylum level lineage in the bacterial domain, possibly affiliated with the Patescibacteria superphylum. All seven genomes are small (1.0–1.5 Mb) and lack complete biosynthetic pathways for various essential cellular building blocks including amino acids, lipids, and nucleotides. These and other features identified in the TM6 genomes such as a degenerated cell envelope, ATP/ADP translocases for parasitizing host ATP pools, and protein motifs to facilitate eukaryotic host interactions indicate that parasitism is widespread in this phylum. Phylogenetic analysis of ATP/ADP translocase genes suggests that the ancestral TM6 lineage was also parasitic. We propose the name Dependentiae (phyl. nov.) to reflect dependence of TM6 bacteria on host organisms. PMID:26615204

  12. Comparative Genomics of Candidate Phylum TM6 Suggests That Parasitism Is Widespread and Ancestral in This Lineage.

    PubMed

    Yeoh, Yun Kit; Sekiguchi, Yuji; Parks, Donovan H; Hugenholtz, Philip

    2016-04-01

    Candidate phylum TM6 is a major bacterial lineage recognized through culture-independent rRNA surveys to be low abundance members in a wide range of habitats; however, they are poorly characterized due to a lack of pure culture representatives. Two recent genomic studies of TM6 bacteria revealed small genomes and limited gene repertoire, consistent with known or inferred dependence on eukaryotic hosts for their metabolic needs. Here, we obtained additional near-complete genomes of TM6 populations from agricultural soil and upflow anaerobic sludge blanket reactor metagenomes which, together with the two publicly available TM6 genomes, represent seven distinct family level lineages in the TM6 phylum. Genome-based phylogenetic analysis confirms that TM6 is an independent phylum level lineage in the bacterial domain, possibly affiliated with the Patescibacteria superphylum. All seven genomes are small (1.0-1.5 Mb) and lack complete biosynthetic pathways for various essential cellular building blocks including amino acids, lipids, and nucleotides. These and other features identified in the TM6 genomes such as a degenerated cell envelope, ATP/ADP translocases for parasitizing host ATP pools, and protein motifs to facilitate eukaryotic host interactions indicate that parasitism is widespread in this phylum. Phylogenetic analysis of ATP/ADP translocase genes suggests that the ancestral TM6 lineage was also parasitic. We propose the name Dependentiae (phyl. nov.) to reflect dependence of TM6 bacteria on host organisms.

  13. Evaluation of the TREX1 gene in a large multi-ancestral lupus cohort

    PubMed Central

    Namjou, Bahram; Kothari, Parul H.; Kelly, Jennifer A.; Glenn, Stuart B.; Ojwang, Joshua O.; Adler, Adam; Alarcón-Riquelme, Marta E.; Gallant, Caroline J.; Boackle, Susan A.; Criswell, Lindsey A.; Kimberly, Robert P.; Brown, Elizabeth; Edberg, Jeffrey; Stevens, Anne M.; Jacob, Chaim O.; Tsao, Betty P.; Gilkeson, Gary S.; Kamen, Diane L.; Merrill, Joan T.; Petri, Michelle; Goldman, Rosalind Ramsey; Vila, Luis M.; Anaya, Juan-Manuel; Niewold, Timothy B.; Martin, Javier; Pons-Estel, Bernardo A.; Sabio, Jose M.; Callejas, Jose L.; Vyse, Timothy J.; Bae, Sang-Cheol; Perrino, Fred W.; Freedman, Barry I.; Scofield, R. Hal; Moser, Kathy L.; Gaffney, Patrick M.; James, Judith A.; Langefeld, Carl D.; Kaufman, Kenneth M.; Harley, John B.; Atkinson, John P.

    2011-01-01

    Systemic Lupus Erythematosus (SLE) is a prototypic autoimmune disorder with a complex pathogenesis in which genetic, hormonal and environmental factors play a role. Rare mutations in the TREX1 gene, the major mammalian 3′-5′ exonuclease, have been reported in sporadic SLE cases. Some of these mutations have also been identified in a rare pediatric neurologic condition featuring an inflammatory encephalopathy known as Aicardi-Goutières syndrome (AGS). We sought to investigate the frequency of these mutations in a large multi-ancestral cohort of SLE cases and controls. Methods: Forty single-nucleotide polymorphisms (SNPs), including both common and rare variants, across the TREX1 gene were evaluated in ∼8370 patients with SLE and ∼7490 control subjects. Stringent quality control procedures were applied and principal components and admixture proportions were calculated to identify outliers for removal from analysis. Population-based case-control association analyses were performed. P values, false discovery rate q values, and odds ratios with 95% confidence intervals were calculated. Results: The estimated frequency of TREX1 mutations in our lupus cohort was 0.5%. Five heterozygous mutations were detected at the Y305C polymorphism in European lupus cases but none were observed in European controls. Five African cases incurred heterozygous mutations at the E266G polymorphism and, again, none were observed in the African controls. A rare homozygous R114H mutation was identified in one Asian SLE patient whereas all genotypes at this mutation in previous reports for SLE were heterozygous. Analysis of common TREX1 SNPs (MAF >10%) revealed a relatively common risk haplotype in European SLE patients with neurologic manifestations, especially seizures, with a frequency of 58% in lupus cases compared to 45% in normal controls (p=0.0008, OR=1.73, 95% CI=1.25-2.39). Finally, the presence or absence of specific autoantibodies in certain populations produced significant
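
    For readers unfamiliar with the statistic, the sketch below shows one common way to compute an odds ratio and a 95% confidence interval from a 2x2 case-control table. The abstract does not say which interval method the authors used; the Wald (log-scale) interval here is an assumption, and the counts are hypothetical values chosen only to mirror the quoted 58% and 45% frequencies, not cohort data.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: cases carrying the haplotype      b: cases not carrying it
    c: controls carrying the haplotype   d: controls not carrying it
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 58 of 100 cases and 45 of 100 controls carry the haplotype.
or_value, ci_low, ci_high = odds_ratio_wald_ci(58, 42, 45, 55)
print(f"OR = {or_value:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

    Because the counts are invented and the reported percentages are rounded, the output is not expected to reproduce the published OR or interval.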

  14. Phylogenomics of primates and their ancestral populations

    PubMed Central

    Siepel, Adam

    2009-01-01

    Genome assemblies are now available for nine primate species, and large-scale sequencing projects are underway or approved for six others. An explicitly evolutionary and phylogenetic approach to comparative genomics, called phylogenomics, will be essential in unlocking the valuable information about evolutionary history and genomic function that is contained within these genomes. However, most phylogenomic analyses so far have ignored the effects of variation in ancestral populations on patterns of sequence divergence. These effects can be pronounced in the primates, owing to large ancestral effective population sizes relative to the intervals between speciation events. In particular, local genealogies can vary considerably across loci, which can produce biases and diminished power in many phylogenomic analyses of interest, including phylogeny reconstruction, the identification of functional elements, and the detection of natural selection. At the same time, this variation in genealogies can be exploited to gain insight into the nature of ancestral populations. In this Perspective, I explore this area of intersection between phylogenetics and population genetics, and its implications for primate phylogenomics. I begin by “lifting the hood” on the conventional tree-like representation of the phylogenetic relationships between species, to expose the population-genetic processes that operate along its branches. Next, I briefly review an emerging literature that makes use of the complex relationships among coalescence, recombination, and speciation to produce inferences about evolutionary histories, ancestral populations, and natural selection. Finally, I discuss remaining challenges and future prospects at this nexus of phylogenetics, population genetics, and genomics. PMID:19801602

  15. Ancestral whole-genome duplication in the marine chelicerate horseshoe crabs.

    PubMed

    Kenny, N J; Chan, K W; Nong, W; Qu, Z; Maeso, I; Yip, H Y; Chan, T F; Kwan, H S; Holland, P W H; Chu, K H; Hui, J H L

    2016-02-01

    Whole-genome duplication (WGD) results in new genomic resources that can be exploited by evolution for rewiring genetic regulatory networks in organisms. In metazoans, WGD occurred before the last common ancestor of vertebrates, and has been postulated as a major evolutionary force that contributed to their speciation and diversification of morphological structures. Here, we have sequenced genomes from three of the four extant species of horseshoe crabs: Carcinoscorpius rotundicauda, Limulus polyphemus and Tachypleus tridentatus. Phylogenetic and sequence analyses of their Hox and other homeobox genes, which encode crucial transcription factors and have been used as indicators of WGD in animals, strongly suggest that WGD happened before the last common ancestor of these marine chelicerates >135 million years ago. Signatures of subfunctionalisation of paralogues of Hox genes are revealed in the appendages of two species of horseshoe crabs. Further, residual homeobox pseudogenes are observed in the three lineages. The existence of WGD in the horseshoe crabs, noted for relative morphological stasis over geological time, suggests that genomic diversity need not always be reflected phenotypically, in contrast to the suggested situation in vertebrates. This study provides evidence of ancient WGD in the ecdysozoan lineage, and reveals new opportunities for studying genomic and regulatory evolution after WGD in the Metazoa. PMID:26419336

  17. The complete mitochondrial genomes of two ghost moths, Thitarodes renzhiensis and Thitarodes yunnanensis: the ancestral gene arrangement in Lepidoptera

    PubMed Central

    2012-01-01

    Background: Lepidoptera encompasses more than 160,000 described species that have been classified into 45–48 superfamilies. The previously determined Lepidoptera mitochondrial genomes (mitogenomes) are limited to six superfamilies of the lineage Ditrysia. Compared with the ancestral insect gene order, these mitogenomes all contain a tRNA rearrangement. To gain new insights into Lepidoptera mitogenome evolution, we sequenced the mitogenomes of two ghost moths that belong to the non-ditrysian lineage Hepialoidea and conducted a comparative mitogenomic analysis across Lepidoptera. Results: The mitogenomes of Thitarodes renzhiensis and T. yunnanensis are 16,173 bp and 15,816 bp long with an A + T content of 81.28% and 82.34%, respectively. Both mitogenomes include 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and the A + T-rich region. Different tandem repeats in the A + T-rich region mainly account for the size difference between the two mitogenomes. All the protein-coding genes start with typical mitochondrial initiation codons, except for cox1 (CGA) and nad1 (TTG) in both mitogenomes. The anticodon of trnS(AGN) in T. renzhiensis and T. yunnanensis is UCU instead of the mostly used GCU in other sequenced Lepidoptera mitogenomes. The 1,584-bp sequence from rrnS to nad2 was also determined for an unspecified ghost moth (Thitarodes sp.), which has no repetitive sequence in the A + T-rich region. All three Thitarodes species possess the ancestral gene order with trnI-trnQ-trnM located between the A + T-rich region and nad2, which is different from the gene order trnM-trnI-trnQ in all previously sequenced Lepidoptera species. The formerly identified conserved elements of Lepidoptera mitogenomes (i.e. the motif ‘ATAGA’ and poly-T stretch in the A + T-rich region and the long intergenic spacer upstream of nad2) are absent in the Thitarodes mitogenomes. Conclusion: The mitogenomes of T. renzhiensis and T
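
    The A + T content quoted in this record is the percentage of adenine and thymine bases in a sequence. A minimal Python sketch of that calculation is given here for orientation; the function name `at_content` and the toy sequence are illustrative assumptions and do not come from the paper, and ambiguous symbols such as N are simply left out of the denominator.

```python
def at_content(sequence: str) -> float:
    """Return the A+T percentage of a DNA sequence.

    Symbols other than A, C, G and T (for example N) are ignored,
    which is an assumption of this sketch, not a rule from the paper.
    """
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * at / total


# Hypothetical toy sequence, not taken from either mitogenome.
toy = "ATTAATGCAT"
print(f"A+T content: {at_content(toy):.2f}%")
```

    Applied to a full mitogenome read from a FASTA file, the same function would give a figure comparable to the percentages reported in the abstract, provided the same strand and the same handling of ambiguous bases are used.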

  18. Ancient human genomes suggest three ancestral populations for present-day Europeans

    PubMed Central

    Lazaridis, Iosif; Patterson, Nick; Mittnik, Alissa; Renaud, Gabriel; Mallick, Swapan; Kirsanow, Karola; Sudmant, Peter H.; Schraiber, Joshua G.; Castellano, Sergi; Lipson, Mark; Berger, Bonnie; Economou, Christos; Bollongino, Ruth; Fu, Qiaomei; Bos, Kirsten I.; Nordenfelt, Susanne; Li, Heng; de Filippo, Cesare; Prüfer, Kay; Sawyer, Susanna; Posth, Cosimo; Haak, Wolfgang; Hallgren, Fredrik; Fornander, Elin; Rohland, Nadin; Delsate, Dominique; Francken, Michael; Guinet, Jean-Michel; Wahl, Joachim; Ayodo, George; Babiker, Hamza A.; Bailliet, Graciela; Balanovska, Elena; Balanovsky, Oleg; Barrantes, Ramiro; Bedoya, Gabriel; Ben-Ami, Haim; Bene, Judit; Berrada, Fouad; Bravi, Claudio M.; Brisighelli, Francesca; Busby, George B. J.; Cali, Francesco; Churnosov, Mikhail; Cole, David E. C.; Corach, Daniel; Damba, Larissa; van Driem, George; Dryomov, Stanislav; Dugoujon, Jean-Michel; Fedorova, Sardana A.; Romero, Irene Gallego; Gubina, Marina; Hammer, Michael; Henn, Brenna M.; Hervig, Tor; Hodoglugil, Ugur; Jha, Aashish R.; Karachanak-Yankova, Sena; Khusainova, Rita; Khusnutdinova, Elza; Kittles, Rick; Kivisild, Toomas; Klitz, William; Kučinskas, Vaidutis; Kushniarevich, Alena; Laredj, Leila; Litvinov, Sergey; Loukidis, Theologos; Mahley, Robert W.; Melegh, Béla; Metspalu, Ene; Molina, Julio; Mountain, Joanna; Näkkäläjärvi, Klemetti; Nesheva, Desislava; Nyambo, Thomas; Osipova, Ludmila; Parik, Jüri; Platonov, Fedor; Posukh, Olga; Romano, Valentino; Rothhammer, Francisco; Rudan, Igor; Ruizbakiev, Ruslan; Sahakyan, Hovhannes; Sajantila, Antti; Salas, Antonio; Starikovskaya, Elena B.; Tarekegn, Ayele; Toncheva, Draga; Turdikulova, Shahlo; Uktveryte, Ingrida; Utevska, Olga; Vasquez, René; Villena, Mercedes; Voevoda, Mikhail; Winkler, Cheryl; Yepiskoposyan, Levon; Zalloua, Pierre; Zemunik, Tatijana; Cooper, Alan; Capelli, Cristian; Thomas, Mark G.; Ruiz-Linares, Andres; Tishkoff, Sarah A.; Singh, Lalji; Thangaraj, Kumarasamy; Villems, Richard; Comas, David; Sukernik, Rem; Metspalu, Mait; Meyer, Matthias; Eichler, Evan E.; Burger, Joachim; Slatkin, Montgomery; Pääbo, Svante; Kelso, Janet; Reich, David; Krause, Johannes

    2014-01-01

    We sequenced the genomes of a ~7,000-year-old farmer from Germany and eight ~8,000-year-old hunter-gatherers from Luxembourg and Sweden. We analyzed these and other ancient genomes [1–4] with 2,345 contemporary humans to show that most present-day Europeans derive from at least three highly differentiated populations: West European Hunter-Gatherers (WHG), who contributed ancestry to all Europeans but not to Near Easterners; Ancient North Eurasians (ANE) related to Upper Paleolithic Siberians [3], who contributed to both Europeans and Near Easterners; and Early European Farmers (EEF), who were mainly of Near Eastern origin but also harbored WHG-related ancestry. We model these populations’ deep relationships and show that EEF had ~44% ancestry from a “Basal Eurasian” population that split prior to the diversification of other non-African lineages. PMID:25230663

  20. Ancient human genomes suggest three ancestral populations for present-day Europeans.

    PubMed

    Lazaridis, Iosif; Patterson, Nick; Mittnik, Alissa; Renaud, Gabriel; Mallick, Swapan; Kirsanow, Karola; Sudmant, Peter H; Schraiber, Joshua G; Castellano, Sergi; Lipson, Mark; Berger, Bonnie; Economou, Christos; Bollongino, Ruth; Fu, Qiaomei; Bos, Kirsten I; Nordenfelt, Susanne; Li, Heng; de Filippo, Cesare; Prüfer, Kay; Sawyer, Susanna; Posth, Cosimo; Haak, Wolfgang; Hallgren, Fredrik; Fornander, Elin; Rohland, Nadin; Delsate, Dominique; Francken, Michael; Guinet, Jean-Michel; Wahl, Joachim; Ayodo, George; Babiker, Hamza A; Bailliet, Graciela; Balanovska, Elena; Balanovsky, Oleg; Barrantes, Ramiro; Bedoya, Gabriel; Ben-Ami, Haim; Bene, Judit; Berrada, Fouad; Bravi, Claudio M; Brisighelli, Francesca; Busby, George B J; Cali, Francesco; Churnosov, Mikhail; Cole, David E C; Corach, Daniel; Damba, Larissa; van Driem, George; Dryomov, Stanislav; Dugoujon, Jean-Michel; Fedorova, Sardana A; Gallego Romero, Irene; Gubina, Marina; Hammer, Michael; Henn, Brenna M; Hervig, Tor; Hodoglugil, Ugur; Jha, Aashish R; Karachanak-Yankova, Sena; Khusainova, Rita; Khusnutdinova, Elza; Kittles, Rick; Kivisild, Toomas; Klitz, William; Kučinskas, Vaidutis; Kushniarevich, Alena; Laredj, Leila; Litvinov, Sergey; Loukidis, Theologos; Mahley, Robert W; Melegh, Béla; Metspalu, Ene; Molina, Julio; Mountain, Joanna; Näkkäläjärvi, Klemetti; Nesheva, Desislava; Nyambo, Thomas; Osipova, Ludmila; Parik, Jüri; Platonov, Fedor; Posukh, Olga; Romano, Valentino; Rothhammer, Francisco; Rudan, Igor; Ruizbakiev, Ruslan; Sahakyan, Hovhannes; Sajantila, Antti; Salas, Antonio; Starikovskaya, Elena B; Tarekegn, Ayele; Toncheva, Draga; Turdikulova, Shahlo; Uktveryte, Ingrida; Utevska, Olga; Vasquez, René; Villena, Mercedes; Voevoda, Mikhail; Winkler, Cheryl A; Yepiskoposyan, Levon; Zalloua, Pierre; Zemunik, Tatijana; Cooper, Alan; Capelli, Cristian; Thomas, Mark G; Ruiz-Linares, Andres; Tishkoff, Sarah A; Singh, Lalji; Thangaraj, Kumarasamy; Villems, Richard; Comas, David; Sukernik, Rem; Metspalu, Mait; Meyer, Matthias; Eichler, Evan E; Burger, Joachim; Slatkin, Montgomery; Pääbo, Svante; Kelso, Janet; Reich, David; Krause, Johannes

    2014-09-18

    We sequenced the genomes of a ∼7,000-year-old farmer from Germany and eight ∼8,000-year-old hunter-gatherers from Luxembourg and Sweden. We analysed these and other ancient genomes with 2,345 contemporary humans to show that most present-day Europeans derive from at least three highly differentiated populations: west European hunter-gatherers, who contributed ancestry to all Europeans but not to Near Easterners; ancient north Eurasians related to Upper Palaeolithic Siberians, who contributed to both Europeans and Near Easterners; and early European farmers, who were mainly of Near Eastern origin but also harboured west European hunter-gatherer related ancestry. We model these populations' deep relationships and show that early European farmers had ∼44% ancestry from a 'basal Eurasian' population that split before the diversification of other non-African lineages. PMID:25230663

  1. Genesis of the vertebrate FoxP subfamily member genes occurred during two ancestral whole genome duplication events.

    PubMed

    Song, Xiaowei; Tang, Yezhong; Wang, Yajun

    2016-08-22

    The vertebrate FoxP subfamily genes play important roles in the construction of essential functional modules involved in physiological and developmental processes. To explore the adaptive evolution of functional modules associated with the FoxP subfamily member genes, it is necessary to study the gene duplication process. We detected four member genes of the FoxP subfamily in sea lampreys (a representative species of jawless vertebrates) through genome screenings and phylogenetic analyses. Reliable paralogons (i.e. paralogous chromosome segments) have rarely been detected in scaffolds of FoxP subfamily member genes in sea lampreys due to the considerable existence of HTH_Tnp_Tc3_2 transposases. However, these transposases did not alter gene numbers of the FoxP subfamily in sea lampreys. The coincidence between the "1-4" gene duplication pattern of FoxP subfamily genes from invertebrates to vertebrates and two rounds of ancestral whole genome duplication (1R- and 2R-WGD) events reveals that the FoxP subfamily of vertebrates was quadruplicated in the 1R- and 2R-WGD events. Furthermore, we deduced that a synchronous gene duplication process occurred for the FoxP subfamily and for three linked gene families/subfamilies (i.e. MIT family, mGluR group III and PLXNA subfamily) in the 1R- and 2R-WGD events using phylogenetic analyses and mirror-dendrogram methods (i.e. algorithms to test protein-protein interactions). Specifically, the ancestor of FoxP1 and FoxP3 and the ancestor of FoxP2 and FoxP4 were generated in the 1R-WGD event. In the subsequent 2R-WGD event, these two ancestral genes gave rise to FoxP1, FoxP2, FoxP3 and FoxP4. The elucidation of these gene duplication processes sheds light on the phylogenetic relationships between functional modules of the FoxP subfamily member genes.

  2. Comparative genome sequence analysis underscores mycoparasitism as the ancestral life style of Trichoderma

    PubMed Central

    2011-01-01

    Background: Mycoparasitism, a lifestyle where one fungus is parasitic on another fungus, has special relevance when the prey is a plant pathogen, providing a strategy for biological control of pests for plant protection. Probably, the most studied biocontrol agents are species of the genus Hypocrea/Trichoderma. Results: Here we report an analysis of the genome sequences of the two biocontrol species Trichoderma atroviride (teleomorph Hypocrea atroviridis) and Trichoderma virens (formerly Gliocladium virens, teleomorph Hypocrea virens), and a comparison with Trichoderma reesei (teleomorph Hypocrea jecorina). These three Trichoderma species display a remarkable conservation of gene order (78 to 96%), and a lack of active mobile elements probably due to repeat-induced point mutation. Several gene families are expanded in the two mycoparasitic species relative to T. reesei or other ascomycetes, and are overrepresented in non-syntenic genome regions. A phylogenetic analysis shows that T. reesei and T. virens are derived relative to T. atroviride. The mycoparasitism-specific genes thus arose in a common Trichoderma ancestor but were subsequently lost in T. reesei. Conclusions: The data offer a better understanding of mycoparasitism, and thus enforce the development of improved biocontrol strains for efficient and environmentally friendly protection of plants. PMID:21501500

  3. Ancestral population genomics using coalescence hidden Markov models and heuristic optimisation algorithms.

    PubMed

    Cheng, Jade Yu; Mailund, Thomas

    2015-08-01

    With full genome data from several closely related species now readily available, we have the ultimate data for demographic inference. Exploiting these full genomes, however, requires models that can explicitly model recombination along alignments of full chromosomal length. Over the last decade a class of models, based on the sequential Markov coalescence model combined with hidden Markov models, has been developed and used to make inference in simple demographic scenarios. To move forward to more complex demographic modelling we need better and more automated ways of specifying these models and efficient optimisation algorithms for inferring the parameters in complex and often high-dimensional models. In this paper we present a framework for building such coalescence hidden Markov models for pairwise alignments and present results for using heuristic optimisation algorithms for parameter estimation. We show that we can build more complex demographic models than our previous frameworks and that we obtain more accurate parameter estimates using heuristic optimisation algorithms than when using our previous gradient-based approaches. Our new framework provides a flexible way of constructing coalescence hidden Markov models almost automatically. While estimating parameters in more complex models is still challenging, we show that using heuristic optimisation algorithms we still obtain fairly good accuracy.

  4. Calibrating the Human Mutation Rate via Ancestral Recombination Density in Diploid Genomes

    PubMed Central

    Lipson, Mark; Loh, Po-Ru; Sankararaman, Sriram; Patterson, Nick; Berger, Bonnie; Reich, David

    2015-01-01

    The human mutation rate is an essential parameter for studying the evolution of our species, interpreting present-day genetic variation, and understanding the incidence of genetic disease. Nevertheless, our current estimates of the rate are uncertain. Most notably, recent approaches based on counting de novo mutations in family pedigrees have yielded significantly smaller values than classical methods based on sequence divergence. Here, we propose a new method that uses the fine-scale human recombination map to calibrate the rate of accumulation of mutations. By comparing local heterozygosity levels in diploid genomes to the genetic distance scale over which these levels change, we are able to estimate a long-term mutation rate averaged over hundreds or thousands of generations. We infer a rate of (1.61 ± 0.13) × 10⁻⁸ mutations per base per generation, which falls in between phylogenetic and pedigree-based estimates, and we suggest possible mechanisms to reconcile our estimate with previous studies. Our results support intermediate-age divergences among human populations and between humans and other great apes. PMID:26562831
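
    To illustrate how a per-generation rate such as the one in this record feeds into divergence dating, the Python sketch below converts it to a per-year rate and applies the textbook strict-clock relation T = d / (2μ). Only the rate and its uncertainty come from the abstract; the generation time, the pairwise divergence value, and the function names are assumptions chosen for illustration, and the calculation ignores ancestral polymorphism and rate variation.

```python
MU_PER_GENERATION = 1.61e-8   # mutations per base per generation (from the abstract)
MU_UNCERTAINTY = 0.13e-8      # reported uncertainty on that rate (from the abstract)


def per_year_rate(mu_per_generation: float, generation_years: float) -> float:
    """Convert a per-generation mutation rate to a per-year rate."""
    if generation_years <= 0:
        raise ValueError("generation time must be positive")
    return mu_per_generation / generation_years


def split_time_years(pairwise_divergence: float, mu_per_year: float) -> float:
    """Strict-clock split time: both lineages accumulate mutations, so T = d / (2 * mu)."""
    if mu_per_year <= 0:
        raise ValueError("mutation rate must be positive")
    return pairwise_divergence / (2.0 * mu_per_year)


# Assumed values, not taken from the paper.
GENERATION_YEARS = 29.0       # hypothetical average generation time
DIVERGENCE = 0.001            # hypothetical per-site divergence between two lineages

mu_year = per_year_rate(MU_PER_GENERATION, GENERATION_YEARS)
mu_year_low = per_year_rate(MU_PER_GENERATION - MU_UNCERTAINTY, GENERATION_YEARS)
mu_year_high = per_year_rate(MU_PER_GENERATION + MU_UNCERTAINTY, GENERATION_YEARS)

print(f"per-year rate: {mu_year:.3e} (range {mu_year_low:.3e} to {mu_year_high:.3e})")
print(f"split time under the assumptions: {split_time_years(DIVERGENCE, mu_year):,.0f} years")
```

    Because the split time scales inversely with the rate, a lower pedigree-based rate would push the same divergence further back in time and a higher phylogenetic rate would pull it forward, which is why an intermediate rate implies intermediate-age divergences.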

  5. Exploring Population Admixture Dynamics via Empirical and Simulated Genome-wide Distribution of Ancestral Chromosomal Segments

    PubMed Central

    Jin, Wenfei; Wang, Sijia; Wang, Haifeng; Jin, Li; Xu, Shuhua

    2012-01-01

    The processes of genetic admixture determine the haplotype structure and linkage disequilibrium patterns of the admixed population, which is important for medical and evolutionary studies. However, most previous studies do not consider the inherent complexity of admixture processes. Here we proposed two approaches to explore population admixture dynamics, and we demonstrated, by analyzing genome-wide empirical and simulated data, that the approach based on the distribution of chromosomal segments of distinct ancestry (CSDAs) was more powerful than that based on the distribution of individual ancestry proportions. Analysis of 1,890 African Americans showed that a continuous gene flow model, in which the African American population continuously received gene flow from European populations over about 14 generations, best explained the admixture dynamics of African Americans among several putative models. Interestingly, we observed that some African Americans had much more European ancestry than the simulated samples, indicating substructures of local ancestries in African Americans that could have been caused by individuals from some particular lineages having repeatedly admixed with people of European ancestry. In contrast, the admixture dynamics of Mexicans could be explained by a gradual admixture model in which the Mexican population continuously received gene flow from both European and Amerindian populations over about 24 generations. Our results also indicated that recent gene flows from Sub-Saharan Africans have contributed to the gene pool of Middle Eastern populations such as Mozabite, Bedouin, and Palestinian. In summary, this study not only provides approaches to explore population admixture dynamics, but also advances our understanding of the population history of African Americans, Mexicans, and Middle Eastern populations. PMID:23103229
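
    The continuous gene flow model named in this record can be illustrated with a deliberately simplified Python sketch. It assumes a recipient population that starts with no source ancestry and receives a constant fraction m of migrants every generation, so the expected source ancestry after g generations is 1 − (1 − m)^g. The function names and the 20% target are hypothetical; only the figure of about 14 generations comes from the abstract. The sketch tracks mean ancestry only and does not model the chromosomal segment distributions the study analyzes.

```python
def source_ancestry_cgf(m: float, generations: int) -> float:
    """Expected source-population ancestry after constant gene flow.

    Each generation a fraction m of the recipient gene pool is replaced
    by source-population alleles, starting from zero source ancestry.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie between 0 and 1")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return 1.0 - (1.0 - m) ** generations


def per_generation_rate(target_ancestry: float, generations: int) -> float:
    """Constant per-generation rate m that reaches target_ancestry after `generations`."""
    if not 0.0 <= target_ancestry < 1.0:
        raise ValueError("target_ancestry must lie in [0, 1)")
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 1.0 - (1.0 - target_ancestry) ** (1.0 / generations)


GENERATIONS = 14          # about 14 generations, as stated in the abstract
TARGET = 0.20             # hypothetical final source ancestry, for illustration only

m = per_generation_rate(TARGET, GENERATIONS)
print(f"per-generation gene flow needed: {m:.4f}")
for g in (1, 7, 14):
    print(f"generation {g:2d}: source ancestry {source_ancestry_cgf(m, g):.4f}")
```

    A single-pulse model would instead place all of the source ancestry in one generation; the two models can give the same mean ancestry but different segment-length distributions, which is the kind of contrast the segment-based approach is designed to detect.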

  6. Genome-wide association study and ancestral origins of the slick-hair coat in tropically adapted cattle

    PubMed Central

    Huson, Heather J.; Kim, Eui-Soo; Godfrey, Robert W.; Olson, Timothy A.; McClure, Matthew C.; Chase, Chad C.; Rizzi, Rita; O'Brien, Ana M. P.; Van Tassell, Curt P.; Garcia, José F.; Sonstegard, Tad S.

    2014-01-01

    The slick hair coat (SLICK) is a dominantly inherited trait typically associated with tropically adapted cattle that are of Criollo descent through Spanish colonization of cattle into the New World. The trait is of interest relative to climate change, due to its association with improved thermo-tolerance and subsequent increased productivity. Previous studies localized the SLICK locus to a 4 cM region on chromosome (BTA) 20 and identified signatures of selection in this region derived from Senepol cattle. The current study compares three slick-haired Criollo-derived breeds including Senepol, Carora, and Romosinuano and three additional slick-haired cross-bred lineages to non-slick ancestral breeds. Genome-wide association (GWA), haplotype analysis, signatures of selection, runs of homozygosity (ROH), and identity by state (IBS) calculations were used to identify a 0.8 Mb (37.7–38.5 Mb) consensus region for the SLICK locus on BTA20, which contains SKP2 and SPEF2 as possible candidate genes. Three specific haplotype patterns are identified in slick individuals, all with zero frequency in non-slick individuals. Admixture analysis identified common genetic patterns between the three slick breeds at the SLICK locus. Principal component analysis (PCA) and admixture results show Senepol and Romosinuano sharing a higher degree of genetic similarity to one another with a much lesser degree of similarity to Carora. Variation in GWA, haplotype analysis, and IBS calculations with accompanying population structure information supports potentially two mutations, one common to Senepol and Romosinuano and another in Carora, affecting genes contained within our refined location for the SLICK locus. PMID:24808908

  7. Genome-wide association study and ancestral origins of the slick-hair coat in tropically adapted cattle.

    PubMed

    Huson, Heather J; Kim, Eui-Soo; Godfrey, Robert W; Olson, Timothy A; McClure, Matthew C; Chase, Chad C; Rizzi, Rita; O'Brien, Ana M P; Van Tassell, Curt P; Garcia, José F; Sonstegard, Tad S

    2014-01-01

    The slick hair coat (SLICK) is a dominantly inherited trait typically associated with tropically adapted cattle that are of Criollo descent through Spanish colonization of cattle into the New World. The trait is of interest relative to climate change, due to its association with improved thermo-tolerance and subsequent increased productivity. Previous studies localized the SLICK locus to a 4 cM region on chromosome (BTA) 20 and identified signatures of selection in this region derived from Senepol cattle. The current study compares three slick-haired Criollo-derived breeds including Senepol, Carora, and Romosinuano and three additional slick-haired cross-bred lineages to non-slick ancestral breeds. Genome-wide association (GWA), haplotype analysis, signatures of selection, runs of homozygosity (ROH), and identity by state (IBS) calculations were used to identify a 0.8 Mb (37.7-38.5 Mb) consensus region for the SLICK locus on BTA20, which contains SKP2 and SPEF2 as possible candidate genes. Three specific haplotype patterns are identified in slick individuals, all with zero frequency in non-slick individuals. Admixture analysis identified common genetic patterns between the three slick breeds at the SLICK locus. Principal component analysis (PCA) and admixture results show Senepol and Romosinuano sharing a higher degree of genetic similarity to one another with a much lesser degree of similarity to Carora. Variation in GWA, haplotype analysis, and IBS calculations with accompanying population structure information supports potentially two mutations, one common to Senepol and Romosinuano and another in Carora, affecting genes contained within our refined location for the SLICK locus. PMID:24808908

  9. Genome Content and Phylogenomics Reveal both Ancestral and Lateral Evolutionary Pathways in Plant-Pathogenic Streptomyces Species

    PubMed Central

    Huguet-Tapia, Jose C.; Lefebure, Tristan; Badger, Jonathan H.; Guan, Dongli; Stanhope, Michael J.

    2016-01-01

    Streptomyces spp. are highly differentiated actinomycetes with large, linear chromosomes that encode an arsenal of biologically active molecules and catabolic enzymes. Members of this genus are well equipped for life in nutrient-limited environments and are common soil saprophytes. Out of the hundreds of species in the genus Streptomyces, a small group has evolved the ability to infect plants. The recent availability of Streptomyces genome sequences, including four genomes of pathogenic species, provided an opportunity to characterize the gene content specific to these pathogens and to study phylogenetic relationships among them. Genome sequencing, comparative genomics, and phylogenetic analysis enabled us to discriminate pathogenic from saprophytic Streptomyces strains; moreover, we calculated that the pathogen-specific genome contains 4,662 orthologs. Phylogenetic reconstruction suggested that Streptomyces scabies and S. ipomoeae share an ancestor but that their biosynthetic clusters encoding the required virulence factor thaxtomin have diverged. In contrast, S. turgidiscabies and S. acidiscabies, two relatively unrelated pathogens, possess highly similar thaxtomin biosynthesis clusters, which suggests that the acquisition of these genes was through lateral gene transfer. PMID:26826232

  10. Genome Content and Phylogenomics Reveal both Ancestral and Lateral Evolutionary Pathways in Plant-Pathogenic Streptomyces Species.

    PubMed

    Huguet-Tapia, Jose C; Lefebure, Tristan; Badger, Jonathan H; Guan, Dongli; Pettis, Gregg S; Stanhope, Michael J; Loria, Rosemary

    2016-04-01

    Streptomyces spp. are highly differentiated actinomycetes with large, linear chromosomes that encode an arsenal of biologically active molecules and catabolic enzymes. Members of this genus are well equipped for life in nutrient-limited environments and are common soil saprophytes. Out of the hundreds of species in the genus Streptomyces, a small group has evolved the ability to infect plants. The recent availability of Streptomyces genome sequences, including four genomes of pathogenic species, provided an opportunity to characterize the gene content specific to these pathogens and to study phylogenetic relationships among them. Genome sequencing, comparative genomics, and phylogenetic analysis enabled us to discriminate pathogenic from saprophytic Streptomyces strains; moreover, we calculated that the pathogen-specific genome contains 4,662 orthologs. Phylogenetic reconstruction suggested that Streptomyces scabies and S. ipomoeae share an ancestor but that their biosynthetic clusters encoding the required virulence factor thaxtomin have diverged. In contrast, S. turgidiscabies and S. acidiscabies, two relatively unrelated pathogens, possess highly similar thaxtomin biosynthesis clusters, which suggests that the acquisition of these genes was through lateral gene transfer. PMID:26826232

  11. The complete chloroplast DNA sequence of the green alga Nephroselmis olivacea: insights into the architecture of ancestral chloroplast genomes.

    PubMed

    Turmel, M; Otis, C; Lemieux, C

    1999-08-31

    Green plants seem to form two sister lineages: Chlorophyta, comprising the green algal classes Prasinophyceae, Ulvophyceae, Trebouxiophyceae, and Chlorophyceae, and Streptophyta, comprising the Charophyceae and land plants. We have determined the complete chloroplast DNA (cpDNA) sequence (200,799 bp) of Nephroselmis olivacea, a member of the class (Prasinophyceae) thought to include descendants of the earliest-diverging green algae. The 127 genes identified in this genome represent the largest gene repertoire among the green algal and land plant cpDNAs completely sequenced to date. Of the Nephroselmis genes, 2 (ycf81 and ftsI, a gene involved in peptidoglycan synthesis) have not been identified in any previously investigated cpDNA; 5 genes [ftsW, rnE, ycf62, rnpB, and trnS(cga)] have been found only in cpDNAs of nongreen algae; and 10 others (ndh genes) have been described only in land plant cpDNAs. Nephroselmis and land plant cpDNAs share the same quadripartite structure (which is characterized by the presence of a large rRNA-encoding inverted repeat and two unequal single-copy regions) and very similar sets of genes in corresponding genomic regions. Given that our phylogenetic analyses place Nephroselmis within the Chlorophyta, these structural characteristics were most likely present in the cpDNA of the common ancestor of chlorophytes and streptophytes. Comparative analyses of chloroplast genomes indicate that the typical quadripartite architecture and gene-partitioning pattern of land plant cpDNAs are ancient features that may have been derived from the genome of the cyanobacterial progenitor of chloroplasts. Our phylogenetic data also offer insight into the chlorophyte ancestor of euglenophyte chloroplasts.

  12. Structure of the bc1 complex from Seculamonas ecuadoriensis, a jakobid flagellate with an ancestral mitochondrial genome.

    PubMed

    Marx, Stefanie; Baumgärtner, Maja; Kannan, Sivakumar; Braun, Hans-Peter; Lang, B Franz; Burger, Gertraud; Kunnan, Sivakumar

    2003-01-01

    In eubacteria, the respiratory bc(1) complex (complex III) consists of three or four different subunits, whereas that of mitochondria, which have descended from an alpha-proteobacterial endosymbiont, contains about seven additional subunits. To understand better how mitochondrial protein complexes evolved from their simpler bacterial predecessors, we purified complex III of Seculamonas ecuadoriensis, a member of the jakobid protists, which possess the most bacteria-like mitochondrial genomes known. The S. ecuadoriensis complex III has an apparent molecular mass of 460 kDa and exhibits antimycin-sensitive quinol:cytochrome c oxidoreductase activity. It is composed of at least eight subunits between 6 and 46 kDa in size, including two large "core" subunits and the three "respiratory" subunits. The molecular mass of the S. ecuadoriensis bc(1) complex is slightly lower than that reported for other eukaryotes, but about 2x as large as complex III in bacteria. This indicates that the departure from the small bacteria-like complex III took place at an early stage in mitochondrial evolution, prior to the divergence of jakobids. We posit that the recruitment of additional subunits in mitochondrial respiratory complexes is a consequence of the migration of originally alpha-proteobacterial genes to the nucleus.

  13. Ancestral genomic duplication of the insulin gene in tilapia: An analysis of possible implications for clinical islet xenotransplantation using donor islets from transgenic tilapia expressing a humanized insulin gene.

    PubMed

    Hrytsenko, Olga; Pohajdak, Bill; Wright, James R

    2016-07-01

    Tilapia, a teleost fish, have multiple large, anatomically discrete islets which are easy to harvest and which, when transplanted into diabetic murine recipients, provide normoglycemia and mammalian-like glucose tolerance profiles. Tilapia insulin differs structurally from human insulin, which could preclude their use as islet donors for xenotransplantation. Therefore, we produced transgenic tilapia with islets expressing a humanized insulin gene. It is now known that fish genomes may possess an ancestral duplication and so tilapia may have a second insulin gene. Therefore, we cloned, sequenced, and characterized the tilapia insulin 2 transcript and found that its expression is negligible in islets, is not islet-specific, and would not likely need to be silenced in our transgenic fish. PMID:27222321

  14. Atypical regions in large genomic DNA sequences

    SciTech Connect

    Scherer, S.; McPeek, M.S.; Speed, T.P.

    1994-07-19

    Large genomic DNA sequences contain regions with distinctive patterns of sequence organization. The authors describe a method using logarithms of probabilities based on seventh-order Markov chains to rapidly identify genomic sequences that do not resemble models of genome organization built from compilations of octanucleotide usage. Data bases have been constructed from Escherichia coli and Saccharomyces cerevisiae DNA sequences of >1000 nt and human sequences of >10,000 nt. Atypical genes and clusters of genes have been located in bacteriophage, yeast, and primate DNA sequences. The authors consider criteria for statistical significance of the results, offer possible explanations for the observed variation in genome organization, and give additional applications of these methods in DNA sequence analysis.
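
    The scoring idea in this abstract (log-probabilities under a Markov chain, with windows that score poorly flagged as atypical) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the add-one pseudocount, the window size, and the toy sequences are all assumptions made for illustration. The abstract uses a seventh-order chain built from octanucleotide counts; the sketch takes the order as a parameter and uses a low order so the toy data are sufficient.

```python
import math
from collections import Counter


def train_markov(seq, order, pseudocount=1.0):
    """Count (order+1)-mers in seq and return a log P(base | context) function."""
    kmer_counts = Counter(seq[i:i + order + 1] for i in range(len(seq) - order))
    context_counts = Counter()
    for kmer, n in kmer_counts.items():
        context_counts[kmer[:-1]] += n

    def log_prob(context, base):
        # Pseudocount smoothing (an assumption) keeps unseen k-mers finite.
        numerator = kmer_counts[context + base] + pseudocount
        denominator = context_counts[context] + 4 * pseudocount
        return math.log(numerator / denominator)

    return log_prob


def window_scores(seq, log_prob, order, window, step):
    """Return (start, mean per-base log-probability) for each window of seq."""
    scores = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        total = sum(log_prob(chunk[i - order:i], chunk[i])
                    for i in range(order, len(chunk)))
        scores.append((start, total / (len(chunk) - order)))
    return scores


# Toy data: the model is trained on a repetitive "typical" sequence, then a
# query containing a poly-A stretch is scanned; lower scores are more atypical.
model = train_markov("ACGT" * 50, order=2)
for start, score in window_scores("ACGT" * 5 + "A" * 20, model, 2, 20, 20):
    print(start, round(score, 3))
```

    In a real analysis the model would be trained on a large compilation of sequence from the organism, and a threshold on the window score would need a significance criterion, which the abstract mentions but does not specify.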

  15. Global Alignment System for Large Genomic Sequencing

    2002-03-01

    AVID is a global alignment system tailored for the alignment of large genomic sequences up to megabases in length. Features include the possibility of one sequence being in draft form, fast alignment, robustness and accuracy. The method is an anchor-based alignment using maximal matches derived from suffix trees.
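
    To make the notion of anchor-based alignment concrete, the following is a small sketch of its first two steps: finding maximal exact matches between two sequences and selecting a colinear, non-overlapping chain of them to serve as anchors. It is not AVID's code. A k-mer index is swapped in for the suffix tree named in the record, the chaining is a simple quadratic dynamic program, and the function names, `min_len` parameter, and toy sequences are assumptions for illustration only.

```python
def maximal_matches(a, b, min_len):
    """Exact matches (i, j, length) of at least min_len that cannot be extended.

    A k-mer index stands in for a suffix tree here (an assumption made to
    keep the sketch short); it finds the same matches on small inputs.
    """
    index = {}
    for j in range(len(b) - min_len + 1):
        index.setdefault(b[j:j + min_len], []).append(j)
    matches = []
    for i in range(len(a) - min_len + 1):
        for j in index.get(a[i:i + min_len], ()):
            # Skip seeds that merely continue a match starting further left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            length = min_len
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            matches.append((i, j, length))
    return sorted(matches)


def chain_anchors(matches):
    """Choose the colinear, non-overlapping chain with the most matched bases."""
    matches = sorted(matches)
    if not matches:
        return []
    best = [m[2] for m in matches]
    prev = [-1] * len(matches)
    for k, (i, j, length) in enumerate(matches):
        for p in range(k):
            pi, pj, plen = matches[p]
            # A predecessor must end before this match starts in both sequences.
            if pi + plen <= i and pj + plen <= j and best[p] + length > best[k]:
                best[k] = best[p] + length
                prev[k] = p
    k = max(range(len(matches)), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(matches[k])
        k = prev[k]
    return chain[::-1]


seq_a = "GATTACATTTCCGGTTAA"
seq_b = "GATTACAGGGCCGGTTAA"
print(chain_anchors(maximal_matches(seq_a, seq_b, 5)))
```

    The regions between consecutive anchors would then be aligned separately, for example recursively with shorter matches or with a standard dynamic-programming aligner; that step is omitted here.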

  16. Gene map of large yellow croaker (Larimichthys crocea) provides insights into teleost genome evolution and conserved regions associated with growth.

    PubMed

    Xiao, Shijun; Wang, Panpan; Zhang, Yan; Fang, Lujing; Liu, Yang; Li, Jiong-Tang; Wang, Zhi-Yong

    2015-12-22

    The genetic map of a species is essential for its whole genome assembly and can be applied to the mapping of important traits. In this study, we performed RNA-seq for a family of large yellow croakers (Larimichthys crocea) and constructed a high-density genetic map. In this map, 24 linkage groups comprised 3,448 polymorphic SNP markers. Approximately 72.4% (2,495) of the markers were located in protein-coding regions. Comparison of the croaker genome with those of five model fish species revealed that the croaker genome structure was closer to that of the medaka than to the remaining four genomes. Because the medaka genome preserves the teleost ancestral karyotype, this result indicated that the croaker genome might also maintain the teleost ancestral genome structure. The analysis also revealed different genome rearrangements across teleosts. QTL mapping and association analysis consistently identified growth-related QTL regions and associated genes. Orthologs of the associated genes in other species were demonstrated to regulate development, indicating that these genes might regulate development and growth in croaker. This gene map will enable us to construct the croaker genome for comparative studies and to provide an important resource for selective breeding of croaker.

  17. Gene map of large yellow croaker (Larimichthys crocea) provides insights into teleost genome evolution and conserved regions associated with growth

    PubMed Central

    Xiao, Shijun; Wang, Panpan; Zhang, Yan; Fang, Lujing; Liu, Yang; Li, Jiong-Tang; Wang, Zhi-Yong

    2015-01-01

    The genetic map of a species is essential for its whole genome assembly and can be applied to the mapping of important traits. In this study, we performed RNA-seq for a family of large yellow croakers (Larimichthys crocea) and constructed a high-density genetic map. In this map, 24 linkage groups comprised 3,448 polymorphic SNP markers. Approximately 72.4% (2,495) of the markers were located in protein-coding regions. Comparison of the croaker genome with those of five model fish species revealed that the croaker genome structure was closer to that of the medaka than to the remaining four genomes. Because the medaka genome preserves the teleost ancestral karyotype, this result indicated that the croaker genome might also maintain the teleost ancestral genome structure. The analysis also revealed different genome rearrangements across teleosts. QTL mapping and association analysis consistently identified growth-related QTL regions and associated genes. Orthologs of the associated genes in other species were demonstrated to regulate development, indicating that these genes might regulate development and growth in croaker. This gene map will enable us to construct the croaker genome for comparative studies and to provide an important resource for selective breeding of croaker. PMID:26689832

  18. A dense linkage map for Chinook salmon (Oncorhynchus tshawytscha) reveals variable chromosomal divergence after an ancestral whole genome duplication event.

    PubMed

    Brieuc, Marine S O; Waters, Charles D; Seeb, James E; Naish, Kerry A

    2014-03-20

    Comparisons between the genomes of salmon species reveal that they underwent extensive chromosomal rearrangements following whole genome duplication that occurred in their lineage 58-63 million years ago. Extant salmonids are diploid, but occasional pairing between homeologous chromosomes exists in males. The consequences of re-diploidization can be characterized by mapping the position of duplicated loci in such species. Linkage maps are also a valuable tool for genome-wide applications such as genome-wide association studies, quantitative trait loci mapping or genome scans. Here, we investigated chromosomal evolution in Chinook salmon (Oncorhynchus tshawytscha) after genome duplication by mapping 7146 restriction-site associated DNA loci in gynogenetic haploid, gynogenetic diploid, and diploid crosses. In the process, we developed a reference database of restriction-site associated DNA loci for Chinook salmon comprising 48528 non-duplicated loci and 6409 known duplicated loci, which will facilitate locus identification and data sharing. We created a very dense linkage map anchored to all 34 chromosomes for the species, and all arms were identified through centromere mapping. The map positions of 799 duplicated loci revealed that homeologous pairs have diverged at different rates following whole genome duplication, and that degree of differentiation along arms was variable. Many of the homeologous pairs with high numbers of duplicated markers appear conserved with other salmon species, suggesting that retention of conserved homeologous pairing in some arms preceded species divergence. As chromosome arms are highly conserved across species, the major resources developed for Chinook salmon in this study are also relevant for other related species.

  19. Evolutionary genomics of nucleo-cytoplasmic large DNA viruses.

    PubMed

    Iyer, Lakshminarayan M; Balaji, S; Koonin, Eugene V; Aravind, L

    2006-04-01

    A previous comparative-genomic study of large nuclear and cytoplasmic DNA viruses (NCLDVs) of eukaryotes revealed the monophyletic origin of four viral families: poxviruses, asfarviruses, iridoviruses, and phycodnaviruses [Iyer, L.M., Aravind, L., Koonin, E.V., 2001. Common origin of four diverse families of large eukaryotic DNA viruses. J. Virol. 75 (23), 11720-11734]. Here we update this analysis by including the recently sequenced giant genome of the mimiviruses and several additional genomes of iridoviruses, phycodnaviruses, and poxviruses. The parsimonious reconstruction of the gene complement of the ancestral NCLDV shows that it was a complex virus with at least 41 genes that encoded the replication machinery, up to four RNA polymerase subunits, at least three transcription factors, capping and polyadenylation enzymes, the DNA packaging apparatus, and structural components of an icosahedral capsid and the viral membrane. The phylogeny of the NCLDVs is reconstructed by cladistic analysis of the viral gene complements, and it is shown that the two principal lineages of NCLDVs are comprised of poxviruses grouped with asfarviruses and iridoviruses grouped with phycodnaviruses-mimiviruses. The phycodna-mimivirus grouping was strongly supported by several derived shared characters, which seemed to rule out the previously suggested basal position of the mimivirus [Raoult, D., Audic, S., Robert, C., Abergel, C., Renesto, P., Ogata, H., La Scola, B., Suzan, M., Claverie, J.M. 2004. The 1.2-megabase genome sequence of Mimivirus. Science 306 (5700), 1344-1350]. These results indicate that the divergence of the major NCLDV families occurred at an early stage of evolution, prior to the divergence of the major eukaryotic lineages. It is shown that subsequent evolution of the NCLDV genomes involved lineage-specific expansion of paralogous gene families and acquisition of numerous genes via horizontal gene transfer from the eukaryotic hosts, other viruses, and bacteria.

  20. Rapid genome-wide evolution in Brassica rapa populations following drought revealed by sequencing of ancestral and descendant gene pools.

    PubMed

    Franks, Steven J; Kane, Nolan C; O'Hara, Niamh B; Tittes, Silas; Rest, Joshua S

    2016-08-01

    There is increasing evidence that evolution can occur rapidly in response to selection. Recent advances in sequencing suggest the possibility of documenting genetic changes as they occur in populations, thus uncovering the genetic basis of evolution, particularly if samples are available from both before and after selection. Here, we had a unique opportunity to directly assess genetic changes in natural populations following an evolutionary response to a fluctuation in climate. We analysed genome-wide differences between ancestors and descendants of natural populations of Brassica rapa plants from two locations that rapidly evolved changes in multiple phenotypic traits, including flowering time, following a multiyear late-season drought in California. These ancestor-descendant comparisons revealed evolutionary shifts in allele frequencies in many genes. Some genes showing evolutionary shifts have functions related to drought stress and flowering time, consistent with an adaptive response to selection. Loci differentiated between ancestors and descendants (FST outliers) were generally different from those showing signatures of selection based on site frequency spectrum analysis (Tajima's D), indicating that the loci that evolved in response to the recent drought and those under historical selection were generally distinct. Very few genes showed similar evolutionary responses between two geographically distinct populations, suggesting independent genetic trajectories of evolution yielding parallel phenotypic changes. The results show that selection can result in rapid genome-wide evolutionary shifts in allele frequencies in natural populations, and highlight the usefulness of combining resurrection experiments in natural populations with genomics for studying the genetic basis of adaptive evolution. PMID:27072809

  2. Genome-wide Association Study Identifies HLA 8.1 Ancestral Haplotype Alleles as Major Genetic Risk Factors for Myositis Phenotypes

    PubMed Central

    Miller, Frederick W.; Chen, Wei; O’Hanlon, Terrance P.; Cooper, Robert G.; Vencovsky, Jiri; Rider, Lisa G.; Danko, Katalin; Wedderburn, Lucy R.; Lundberg, Ingrid E.; Pachman, Lauren M.; Reed, Ann M.; Ytterberg, Steven R.; Padyukov, Leonid; Selva-O’Callaghan, Albert; Radstake, Timothy R.; Isenberg, David A.; Chinoy, Hector; Ollier, William E.R.; Scheet, Paul; Peng, Bo; Lee, Annette; Byun, Jinyoung; Lamb, Janine A.; Gregersen, Peter K.; Amos, Christopher I.

    2016-01-01

    Autoimmune muscle diseases (myositis) comprise a group of complex phenotypes influenced by genetic and environmental factors. To identify genetic risk factors in patients of European ancestry, we conducted a genome-wide association study (GWAS) of the major myositis phenotypes in a total of 1710 cases, which included 705 adult dermatomyositis; 473 juvenile dermatomyositis; 532 polymyositis; and 202 adult dermatomyositis, juvenile dermatomyositis or polymyositis patients with anti-histidyl tRNA synthetase (anti-Jo-1) autoantibodies, and compared them with 4724 controls. Single-nucleotide polymorphisms showing strong associations (P < 5 × 10⁻⁸) in GWAS were identified in the major histocompatibility complex (MHC) region for all myositis phenotypes together, as well as for the four clinical and autoantibody phenotypes studied separately. Imputation and regression analyses found that alleles comprising the human leukocyte antigen (HLA) 8.1 ancestral haplotype (AH8.1) defined essentially all the genetic risk in the phenotypes studied. Although the HLA DRB1*03:01 allele showed slightly stronger associations with adult and juvenile dermatomyositis, and HLA B*08:01 with polymyositis and anti-Jo-1 autoantibody-positive myositis, multiple alleles of AH8.1 were required for the full risk effects. Our findings establish that alleles of the AH8.1 haplotype comprise the primary genetic risk factors associated with the major myositis phenotypes in geographically diverse Caucasian populations. PMID:26291516

  3. The Psychiatric Genomics Consortium Posttraumatic Stress Disorder Workgroup: Posttraumatic Stress Disorder Enters the Age of Large-Scale Genomic Collaboration.

    PubMed

    Logue, Mark W; Amstadter, Ananda B; Baker, Dewleen G; Duncan, Laramie; Koenen, Karestan C; Liberzon, Israel; Miller, Mark W; Morey, Rajendra A; Nievergelt, Caroline M; Ressler, Kerry J; Smith, Alicia K; Smoller, Jordan W; Stein, Murray B; Sumner, Jennifer A; Uddin, Monica

    2015-09-01

    The development of posttraumatic stress disorder (PTSD) is influenced by genetic factors. Although there have been some replicated candidates, the identification of risk variants for PTSD has lagged behind genetic research of other psychiatric disorders such as schizophrenia, autism, and bipolar disorder. Psychiatric genetics has moved beyond examination of specific candidate genes in favor of the genome-wide association study (GWAS) strategy of very large numbers of samples, which allows for the discovery of previously unsuspected genes and molecular pathways. The successes of genetic studies of schizophrenia and bipolar disorder have been aided by the formation of a large-scale GWAS consortium: the Psychiatric Genomics Consortium (PGC). In contrast, only a handful of GWAS of PTSD have appeared in the literature to date. Here we describe the formation of a group dedicated to large-scale study of PTSD genetics: the PGC-PTSD. The PGC-PTSD faces challenges related to the contingency on trauma exposure and the large degree of ancestral genetic diversity within and across participating studies. Using the PGC analysis pipeline supplemented by analyses tailored to address these challenges, we anticipate that our first large-scale GWAS of PTSD will comprise over 10 000 cases and 30 000 trauma-exposed controls. Following in the footsteps of our PGC forerunners, this collaboration, of a scope that is unprecedented in the field of traumatic stress, will lead the search for replicable genetic associations and new insights into the biological underpinnings of PTSD.

  4. The Psychiatric Genomics Consortium Posttraumatic Stress Disorder Workgroup: Posttraumatic Stress Disorder Enters the Age of Large-Scale Genomic Collaboration

    PubMed Central

    Logue, Mark W; Amstadter, Ananda B; Baker, Dewleen G; Duncan, Laramie; Koenen, Karestan C; Liberzon, Israel; Miller, Mark W; Morey, Rajendra A; Nievergelt, Caroline M; Ressler, Kerry J; Smith, Alicia K; Smoller, Jordan W; Stein, Murray B; Sumner, Jennifer A; Uddin, Monica

    2015-01-01

    The development of posttraumatic stress disorder (PTSD) is influenced by genetic factors. Although there have been some replicated candidates, the identification of risk variants for PTSD has lagged behind genetic research of other psychiatric disorders such as schizophrenia, autism, and bipolar disorder. Psychiatric genetics has moved beyond examination of specific candidate genes in favor of the genome-wide association study (GWAS) strategy of very large numbers of samples, which allows for the discovery of previously unsuspected genes and molecular pathways. The successes of genetic studies of schizophrenia and bipolar disorder have been aided by the formation of a large-scale GWAS consortium: the Psychiatric Genomics Consortium (PGC). In contrast, only a handful of GWAS of PTSD have appeared in the literature to date. Here we describe the formation of a group dedicated to large-scale study of PTSD genetics: the PGC-PTSD. The PGC-PTSD faces challenges related to the contingency on trauma exposure and the large degree of ancestral genetic diversity within and across participating studies. Using the PGC analysis pipeline supplemented by analyses tailored to address these challenges, we anticipate that our first large-scale GWAS of PTSD will comprise over 10 000 cases and 30 000 trauma-exposed controls. Following in the footsteps of our PGC forerunners, this collaboration—of a scope that is unprecedented in the field of traumatic stress—will lead the search for replicable genetic associations and new insights into the biological underpinnings of PTSD. PMID:25904361

  5. Comparative genome maps of the pangolin, hedgehog, sloth, anteater and human revealed by cross-species chromosome painting: further insight into the ancestral karyotype and genome evolution of eutherian mammals.

    PubMed

    Yang, Fengtang; Graphodatsky, Alexander S; Li, Tangliang; Fu, Beiyuan; Dobigny, Gauthier; Wang, Jinghuan; Perelman, Polina L; Serdukova, Natalya A; Su, Weiting; O'Brien, Patricia C M; Wang, Yingxiang; Ferguson-Smith, Malcolm A; Volobouev, Vitaly; Nie, Wenhui

    2006-01-01

    To better understand the evolution of genome organization of eutherian mammals, comparative maps based on chromosome painting have been constructed between human and representative species of three eutherian orders: Xenarthra, Pholidota, and Eulipotyphla, as well as between representative species of the Carnivora and Pholidota. These maps demonstrate the conservation of such syntenic segment associations as HSA3/21, 4/8, 7/16, 12/22, 14/15 and 16/19 in Eulipotyphla, Pholidota and Xenarthra and thus further consolidate the notion that they form part of the ancestral karyotype of the eutherian mammals. Our study has revealed many potential ancestral syntenic associations of human chromosomal segments that serve to link the families as well as orders within the major superordinal eutherian clades defined by molecular markers. The HSA2/8 and 7/10 associations could be the cytogenetic signatures that unite the Xenarthrans, while the HSA1/19p could be a putative signature that links the Afrotheria and Xenarthra. But caution is required in the interpretation of apparently shared syntenic associations as detailed analyses also show examples of apparent convergent evolution that differ in breakpoints and extent of the involved segments. PMID:16628499

  6. Comparative genome maps of the pangolin, hedgehog, sloth, anteater and human revealed by cross-species chromosome painting: further insight into the ancestral karyotype and genome evolution of eutherian mammals.

    PubMed

    Yang, Fengtang; Graphodatsky, Alexander S; Li, Tangliang; Fu, Beiyuan; Dobigny, Gauthier; Wang, Jinghuan; Perelman, Polina L; Serdukova, Natalya A; Su, Weiting; O'Brien, Patricia C M; Wang, Yingxiang; Ferguson-Smith, Malcolm A; Volobouev, Vitaly; Nie, Wenhui

    2006-01-01

    To better understand the evolution of genome organization of eutherian mammals, comparative maps based on chromosome painting have been constructed between human and representative species of three eutherian orders: Xenarthra, Pholidota, and Eulipotyphla, as well as between representative species of the Carnivora and Pholidota. These maps demonstrate the conservation of such syntenic segment associations as HSA3/21, 4/8, 7/16, 12/22, 14/15 and 16/19 in Eulipotyphla, Pholidota and Xenarthra and thus further consolidate the notion that they form part of the ancestral karyotype of the eutherian mammals. Our study has revealed many potential ancestral syntenic associations of human chromosomal segments that serve to link the families as well as orders within the major superordinal eutherian clades defined by molecular markers. The HSA2/8 and 7/10 associations could be the cytogenetic signatures that unite the Xenarthrans, while the HSA1/19p could be a putative signature that links the Afrotheria and Xenarthra. But caution is required in the interpretation of apparently shared syntenic associations as detailed analyses also show examples of apparent convergent evolution that differ in breakpoints and extent of the involved segments.

  7. Core-SINE blocks comprise a large fraction of monotreme genomes; implications for vertebrate chromosome evolution.

    PubMed

    Kirby, Patrick J; Greaves, Ian K; Koina, Edda; Waters, Paul D; Marshall Graves, Jennifer A

    2007-01-01

    The genomes of the egg-laying platypus and echidna are of particular interest because monotremes are the most basal mammal group. The chromosomal distribution of an ancient family of short interspersed repeats (SINEs), the core-SINEs, was investigated to better understand monotreme genome organization and evolution. Previous studies have identified the core-SINE as the predominant SINE in the platypus genome, and in this study we quantified, characterized and localized subfamilies. Dot blot analysis suggested that a very large fraction (32% of the platypus and 16% of the echidna genome) is composed of Mon core-SINEs. Core-SINE-specific primers were used to amplify PCR products from platypus and echidna genomic DNA. Sequence analysis suggests a common consensus sequence Mon 1-B, shared by platypus and echidna, as well as platypus-specific Mon 1-C and echidna-specific Mon 1-D consensus sequences. FISH mapping of the Mon core-SINE products to platypus metaphase spreads demonstrates that the Mon 1-C subfamily is responsible for the striking Mon core-SINE accumulation in the distal regions of the six large autosomal pairs and the largest X chromosome. This unusual distribution highlights the dichotomy between the seven large chromosome pairs and the 19 smaller pairs in the monotreme karyotype, which has some similarity to the macro- and micro-chromosomes of birds and reptiles, and suggests that accumulation of repetitive sequences may have enlarged small chromosomes in an ancestral vertebrate. In the forthcoming sequence of the platypus genome there are still large gaps, and the extensive Mon core-SINE accumulation on the distal regions of the six large autosomal pairs may provide one explanation for this missing sequence. PMID:18185983

  8. A consensus map in cultivated hexaploid oat reveals conserved grass synteny with substantial sub-genome rearrangement

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Hexaploid oat (Avena sativa, 2n = 6x = 42) is a member of the Poaceae family with a very large genome (~13 Gb) containing 21 chromosome pairs: seven from each of two similar ancestral diploids (A and D) and seven from a more diverged ancestral diploid (C). Physical rearrangements among ancestral oat...

  9. Distinctive features of large complex virus genomes and proteomes

    PubMed Central

    Mrázek, Jan; Karlin, Samuel

    2007-01-01

    More than a dozen large DNA viruses exceeding 240-kb genome size were recently discovered, including the “giant” mimivirus with a 1.2-Mb genome size. The detection of mimivirus and other large viruses has stimulated new analysis and discussion concerning the early evolution of life and the complexity and mechanisms of evolutionary transitions. This paper presents analysis in three contexts. (i) Genome signatures of large viruses tend to deviate from the genome signatures of their hosts, perhaps indicating that the large viruses are lytic in the hosts. (ii) Proteome composition within these viral genomes contrasts with that of cellular organisms; for example, most eukaryotic genomes, with respect to acidic residue usage, select Glu over Asp, but the opposite generally prevails for the large viral genomes, which prefer Asp over Glu. In comparing Phe vs. Tyr usage, the viral genomes select mostly Tyr over Phe, whereas in almost all bacterial and eukaryotic genomes, Phe is used more than Tyr. Interpretations of these contrasts are proffered with respect to protein structure and function. (iii) Frequent oligonucleotides and peptides are characterized in the large viral genomes. The frequent words may provide structural flexibility to interact with host proteins. PMID:17360339
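
    The residue-usage contrasts in point (ii) come down to comparing counts of two amino acids across a proteome. A minimal sketch of that comparison follows; the function name and the two made-up protein sequences are assumptions for illustration and are not taken from the paper, which analyzes complete viral and cellular proteomes.

```python
from collections import Counter


def residue_preference(proteins, first, second):
    """Return (count of first, count of second, preferred residue or None)."""
    counts = Counter()
    for protein in proteins:
        counts.update(protein.upper())
    n_first, n_second = counts[first], counts[second]
    if n_first == n_second:
        preferred = None
    else:
        preferred = first if n_first > n_second else second
    return n_first, n_second, preferred


# Hypothetical toy "proteome" in one-letter amino acid code.
toy_proteome = ["MDDEYKDF", "MYDLEYAD"]
print("Glu vs Asp:", residue_preference(toy_proteome, "E", "D"))
print("Phe vs Tyr:", residue_preference(toy_proteome, "F", "Y"))
```

    Raw counts are enough to decide which residue of a pair is favored; comparing proteomes of different sizes would call for frequencies or ratios instead.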

  10. A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome.

    PubMed

    Blanc, Guillaume; Hokamp, Karsten; Wolfe, Kenneth H

    2003-02-01

    The Arabidopsis genome contains numerous large duplicated chromosomal segments, but the different approaches used in previous analyses led to different interpretations regarding the number and timing of ancestral large-scale duplication events. Here, using more appropriate methodology and a more recent version of the genome sequence annotation, we investigate the scale and timing of segmental duplications in Arabidopsis. We used protein sequence similarity searches to detect duplicated blocks in the genome, used the level of synonymous substitution between duplicated genes to estimate the relative ages of the blocks containing them, and analyzed the degree of overlap between adjacent duplicated blocks. We conclude that the Arabidopsis lineage underwent at least two distinct episodes of duplication. One was a polyploidy that occurred much more recently than estimated previously, before the Arabidopsis/Brassica rapa split and probably during the early emergence of the crucifer family (24-40 Mya). An older set of duplicated blocks was formed after the monocot/dicot divergence, and the relatively low level of overlap among these blocks indicates that at least some of them are remnants of a larger duplication such as a polyploidy or aneuploidy.

  11. Precision Editing of Large Animal Genomes

    PubMed Central

    Tan, Wenfang (Spring); Carlson, Daniel F.; Walton, Mark W.; Fahrenkrug, Scott C.; Hackett, Perry B.

    2013-01-01

    Transgenic animals are an important source of protein and nutrition for most humans and will play key roles in satisfying the increasing demand for food in an ever-increasing world population. The past decade has experienced a revolution in the development of methods that permit the introduction of specific alterations to complex genomes. This precision will enhance genome-based improvement of farm animals for food production. Precision genetics also will enhance the development of therapeutic biomaterials and models of human disease as resources for the development of advanced patient therapies. PMID:23084873

  12. GDC 2: Compression of large collections of genomes

    PubMed Central

    Deorowicz, Sebastian; Danek, Agnieszka; Niemiec, Marcin

    2015-01-01

    Falling prices of high-throughput genome sequencing are changing the landscape of modern genomics. A number of large-scale projects aimed at sequencing many human genomes are in progress. Genome sequencing is also becoming an important aid in personalized medicine. One of the significant side effects of this change is the need to store and transfer huge amounts of genomic data. In this paper we deal with the problem of compressing large collections of complete genomic sequences. We propose an algorithm that is able to compress the collection of 1092 human diploid genomes about 9,500 times. This result is about 4 times better than what is offered by the other existing compressors. Moreover, our algorithm is very fast, as it processes the data at a speed of 200 MB/s on a modern workstation. As a consequence, the proposed algorithm allows storing complete genomic collections at low cost; e.g., the examined collection of 1092 human genomes needs only about 700 MB when compressed, compared with about 6.7 TB of uncompressed FASTA files. The source code is available at http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&project=gdc&subpage=about. PMID:26108279
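
    As a quick consistency check on the figures quoted above, the stated sizes can be divided directly. The sketch assumes decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes), which the abstract does not specify.

```python
# Figures quoted in the abstract; decimal units are an assumption.
uncompressed_bytes = 6.7e12   # about 6.7 TB of FASTA files
compressed_bytes = 700e6      # about 700 MB after compression
genomes = 1092

ratio = uncompressed_bytes / compressed_bytes
per_genome_mb = compressed_bytes / genomes / 1e6

print(f"compression ratio ~ {ratio:,.0f}x")
print(f"compressed size per genome ~ {per_genome_mb:.2f} MB")
```

    Under these assumptions the ratio comes out a little above 9,500, in line with the "about 9,500 times" reported.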

  14. Exon capture optimization in amphibians with large genomes.

    PubMed

    McCartney-Melstad, Evan; Mount, Genevieve G; Shaffer, H Bradley

    2016-09-01

    Gathering genomic-scale data efficiently is challenging for nonmodel species with large, complex genomes. Transcriptome sequencing is accessible for organisms with large genomes, and sequence capture probes can be designed from such mRNA sequences to enrich and sequence exonic regions. Maximizing enrichment efficiency is important to reduce sequencing costs, but relatively few data exist for exon capture experiments in nonmodel organisms with large genomes. Here, we conducted a replicated factorial experiment to explore the effects of several modifications to standard protocols that might increase sequence capture efficiency for amphibians and other taxa with large, complex genomes. Increasing the amounts of c0t-1 repetitive sequence blocker and individual input DNA used in target enrichment reactions reduced the rates of PCR duplication. This reduction led to an increase in the percentage of unique reads mapping to target sequences, essentially doubling overall efficiency of the target capture from 10.4% to nearly 19.9% and rendering target capture experiments more efficient and affordable. Our results indicate that target capture protocols can be modified to efficiently screen vertebrates with large genomes, including amphibians. PMID:27223337

  16. Ancestral Origins and Genetic History of Tibetan Highlanders.

    PubMed

    Lu, Dongsheng; Lou, Haiyi; Yuan, Kai; Wang, Xiaoji; Wang, Yuchen; Zhang, Chao; Lu, Yan; Yang, Xiong; Deng, Lian; Zhou, Ying; Feng, Qidi; Hu, Ya; Ding, Qiliang; Yang, Yajun; Li, Shilin; Jin, Li; Guan, Yaqun; Su, Bing; Kang, Longli; Xu, Shuhua

    2016-09-01

    The origin of Tibetans remains one of the most contentious puzzles in history, anthropology, and genetics. Analyses of deeply sequenced (30×-60×) genomes of 38 Tibetan highlanders and 39 Han Chinese lowlanders, together with available data on archaic and modern humans, allow us to comprehensively characterize the ancestral makeup of Tibetans and uncover their origins. Non-modern human sequences compose ∼6% of the Tibetan gene pool and form unique haplotypes in some genomic regions, where Denisovan-like, Neanderthal-like, ancient-Siberian-like, and unknown ancestries are entangled and elevated. The shared ancestry of Tibetan-enriched sequences dates back to ∼62,000-38,000 years ago, predating the Last Glacial Maximum (LGM) and representing early colonization of the plateau. Nonetheless, most of the Tibetan gene pool is of modern human origin and diverged from that of Han Chinese ∼15,000 to ∼9,000 years ago, which can be largely attributed to post-LGM arrivals. Analysis of ∼200 contemporary populations showed that Tibetans share ancestry with populations from East Asia (∼82%), Central Asia and Siberia (∼11%), South Asia (∼6%), and western Eurasia and Oceania (∼1%). Our results support that Tibetans arose from a mixture of multiple ancestral gene pools but that their origins are much more complicated and ancient than previously suspected. We provide compelling evidence of the co-existence of Paleolithic and Neolithic ancestries in the Tibetan gene pool, indicating a genetic continuity between pre-historical highland-foragers and present-day Tibetans. In particular, highly differentiated sequences harbored in highlanders' genomes were most likely inherited from pre-LGM settlers of multiple ancestral origins (SUNDer) and maintained in high frequency by natural selection. PMID:27569548

  17. BACFinder: genomic localisation of large insert genomic clones based on restriction fingerprinting

    PubMed Central

    Crowe, Mark L.; Rana, Debashis; Fraser, Fiona; Bancroft, Ian; Trick, Martin

    2002-01-01

    We have developed software that allows the prediction of the genomic location of a bacterial artificial chromosome (BAC) clone, or other large genomic clone, based on a simple restriction digest of the BAC. The mapping is performed by comparing the experimentally derived restriction digest of the BAC DNA with a virtual restriction digest of the whole genome sequence. Our trials indicate that this program identified the genomic regions represented by BAC clones with a degree of accuracy comparable to that of end-sequencing, but at considerably less cost. Although the program has been developed principally for use with Arabidopsis BACs, it should align large insert genomic clones to any fully sequenced genome. PMID:12409477
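
    The comparison described above (an observed digest against a virtual digest of the genome) might be sketched as follows. This is not BACFinder's code: the function names, the sliding-window search, the 5% size tolerance, and the simplification that the enzyme cuts at the start of its recognition site are all assumptions made for illustration.

```python
def virtual_digest(sequence, site):
    """Sorted fragment lengths from cutting at the start of each `site` occurrence."""
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = sequence.find(site, pos + 1)
    bounds = [0] + cuts + [len(sequence)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]) if b > a)

def fingerprint_score(observed, predicted, tolerance=0.05):
    """Fraction of observed fragments matched by an unused predicted fragment."""
    remaining = list(predicted)
    matched = 0
    for size in observed:
        for cand in remaining:
            if abs(cand - size) <= tolerance * size:
                remaining.remove(cand)  # each predicted fragment is used once
                matched += 1
                break
    return matched / len(observed) if observed else 0.0

def best_window(genome, site, observed, window, step):
    """Slide a window over the genome; return (start, score) of the best match."""
    best = (0, -1.0)
    for start in range(0, max(1, len(genome) - window + 1), step):
        frags = virtual_digest(genome[start:start + window], site)
        score = fingerprint_score(observed, frags)
        if score > best[1]:
            best = (start, score)
    return best

# Toy genome with two EcoRI-like sites (GAATTC) embedded in filler sequence.
toy_genome = "T" * 20 + "AAGAATTCAAAAGAATTCA" + "T" * 20
print(best_window(toy_genome, "GAATTC", [2, 7, 10], window=19, step=1))
```

    A real fingerprint comes from gel band sizes with measurement error, so the tolerance and scoring rule would need tuning against known clones.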

  18. Territorial Polymers and Large Scale Genome Organization

    NASA Astrophysics Data System (ADS)

    Grosberg, Alexander

    2012-02-01

    Chromatin fiber in the interphase nucleus is effectively a very long polymer packed in a restricted volume. Although polymer models of chromatin organization have been considered, most of them disregard the fact that DNA must stay not too entangled in order to function properly. One polymer model with no entanglements is the melt of unknotted, unconcatenated rings. Extensive simulations indicate that rings in the melt at large length (monomer number N) approach the compact state, with gyration radius scaling as N^(1/3), suggesting that every ring is compact and segregated from the surrounding rings. The segregation is consistent with the known phenomenon of chromosome territories. The surface exponent β (describing the number of contacts between neighboring rings, which scales as N^β) appears only slightly below unity, β ≈ 0.95. This suggests that the loop factor (the probability that two monomers a linear distance s apart meet) should decay as s^(-γ), where γ = 2 - β is slightly above one. The latter result is consistent with HiC data on real human interphase chromosomes and does not contradict the older FISH data. The dynamics of rings in the melt indicate that the motion of one ring remains subdiffusive on time scales well above the stress relaxation time.

  19. Asymptotic Distributions of Coalescence Times and Ancestral Lineage Numbers for Populations with Temporally Varying Size

    PubMed Central

    Chen, Hua; Chen, Kun

    2013-01-01

    The distributions of coalescence times and ancestral lineage numbers play an essential role in coalescent modeling and ancestral inference. Both exact distributions of coalescence times and ancestral lineage numbers are expressed as the sum of alternating series, and the terms in the series become numerically intractable for large samples. More computationally attractive are their asymptotic distributions, which were derived in Griffiths (1984) for populations with constant size. In this article, we derive the asymptotic distributions of coalescence times and ancestral lineage numbers for populations with temporally varying size. For a sample of size n, denote by Tm the mth coalescent time, when m + 1 lineages coalesce into m lineages, and An(t) the number of ancestral lineages at time t back from the current generation. Similar to the results in Griffiths (1984), the number of ancestral lineages, An(t), and the coalescence times, Tm, are asymptotically normal, with the mean and variance of these distributions depending on the population size function, N(t). At the very early stage of the coalescent, when t → 0, the number of coalesced lineages n − An(t) follows a Poisson distribution, and as m → n, n(n−1)Tm/2N(0) follows a gamma distribution. We demonstrate the accuracy of the asymptotic approximations by comparing to both exact distributions and coalescent simulations. Several applications of the theoretical results are also shown: deriving statistics related to the properties of gene genealogies, such as the time to the most recent common ancestor (TMRCA) and the total branch length (TBL) of the genealogy, and deriving the allele frequency spectrum for large genealogies. With the advent of genomic-level sequencing data for large samples, the asymptotic distributions are expected to have wide applications in theoretical and methodological development for population genetic inference. PMID:23666939
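
    For orientation, the constant-size baseline that these asymptotic results generalize can be written down in a few lines. The sketch below is the standard Kingman coalescent, not the varying-size derivation of the paper; time is measured in coalescent units in which any pair of lineages coalesces at rate 1, and converting to generations depends on a population-size convention that is left as an assumption.

```python
import random

def expected_tmrca(n):
    """E[TMRCA] for n lineages: sum of 2 / (k (k - 1)) for k = 2..n, i.e. 2 (1 - 1/n)."""
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))

def expected_tbl(n):
    """E[total branch length]: k lineages persist for 2 / (k (k - 1)) on average."""
    return sum(2.0 / (k - 1) for k in range(2, n + 1))

def simulate_tmrca(n, rng):
    """Draw one TMRCA: with k lineages the waiting time is Exp(rate = k (k - 1) / 2)."""
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2.0)
    return t

rng = random.Random(0)
draws = [simulate_tmrca(10, rng) for _ in range(20000)]
print(expected_tmrca(10), sum(draws) / len(draws))
print(expected_tbl(10))
```

    For a time-varying population size N(t), the waiting times are no longer simple exponentials, which is the case the abstract's asymptotic distributions address.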

  20. Ancestral gene synteny reconstruction improves extant species scaffolding

    PubMed Central

    2015-01-01

    We exploit the methodological similarity between ancestral genome reconstruction and extant genome scaffolding. We present a method, called ARt-DeCo that constructs neighborhood relationships between genes or contigs, in both ancestral and extant genomes, in a phylogenetic context. It is able to handle dozens of complete genomes, including genes with complex histories, by using gene phylogenies reconciled with a species tree, that is, annotated with speciation, duplication and loss events. Reconstructed ancestral or extant synteny comes with a support computed from an exhaustive exploration of the solution space. We compare our method with a previously published one that follows the same goal on a small number of genomes with universal unicopy genes. Then we test it on the whole Ensembl database, by proposing partial ancestral genome structures, as well as a more complete scaffolding for many partially assembled genomes on 69 eukaryote species. We carefully analyze a couple of extant adjacencies proposed by our method, and show that they are indeed real links in the extant genomes, that were missing in the current assembly. On a reduced data set of 39 eutherian mammals, we estimate the precision and sensitivity of ARt-DeCo by simulating a fragmentation in some well assembled genomes, and measure how many adjacencies are recovered. We find a very high precision, while the sensitivity depends on the quality of the data and on the proximity of closely related genomes. PMID:26450761
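
    The evaluation step described above (break known adjacencies, then count how many are recovered) reduces to a set comparison. The sketch below is not ARt-DeCo itself; the function name and gene identifiers are hypothetical, and adjacencies are treated as unordered pairs, ignoring gene orientation.

```python
def adjacency_metrics(true_adjacencies, predicted_adjacencies):
    """Return (precision, sensitivity) for predicted vs. held-out adjacencies."""
    truth = {frozenset(a) for a in true_adjacencies}
    pred = {frozenset(a) for a in predicted_adjacencies}
    tp = len(truth & pred)  # adjacencies that were both removed and recovered
    precision = tp / len(pred) if pred else 0.0
    sensitivity = tp / len(truth) if truth else 0.0
    return precision, sensitivity

# Adjacencies removed by a simulated fragmentation (hypothetical gene names).
removed = [("g1", "g2"), ("g3", "g4"), ("g5", "g6"), ("g7", "g8")]
# Adjacencies proposed by a scaffolding method on the fragmented assembly.
proposed = [("g2", "g1"), ("g3", "g4"), ("g5", "g9")]
print(adjacency_metrics(removed, proposed))
```

    Here two of three proposals are correct (precision 2/3) and two of four removed links are recovered (sensitivity 1/2).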

  1. Large-scale data mining pilot project in human genome

    SciTech Connect

    Musick, R.; Fidelis, R.; Slezak, T.

    1997-05-01

    This whitepaper briefly describes a new, aggressive effort in large-scale data mining at Livermore National Labs. The implications of 'large-scale' will be clarified Section. In the short term, this effort will focus on several mission-critical questions of Genome project. We will adapt current data mining techniques to the Genome domain, to quantify the accuracy of inference results, and lay the groundwork for a more extensive effort in large-scale data mining. A major aspect of the approach is that we will be fully-staffed data warehousing effort in the human Genome area. The long term goal is strong applications-oriented research program in large-scale data mining. The tools, skill set gained will be directly applicable to a wide spectrum of tasks involving a for large spatial and multidimensional data. This includes applications in ensuring non-proliferation, stockpile stewardship, enabling Global Ecology (Materials Database Industrial Ecology), advancing the Biosciences (Human Genome Project), and supporting data for others (Battlefield Management, Health Care).

  2. Kernel methods for large-scale genomic data analysis

    PubMed Central

    Xing, Eric P.; Schaid, Daniel J.

    2015-01-01

    Machine learning, particularly kernel methods, has been demonstrated as a promising new tool to tackle the challenges imposed by today’s explosive data growth in genomics. They provide a practical and principled approach to learning how a large number of genetic variants are associated with complex phenotypes, to help reveal the complexity in the relationship between the genetic markers and the outcome of interest. In this review, we highlight the potential key role it will have in modern genomic data processing, especially with regard to integration with classical methods for gene prioritizing, prediction and data fusion. PMID:25053743

  3. Unraveling recombination rate evolution using ancestral recombination maps

    PubMed Central

    Munch, Kasper; Schierup, Mikkel H; Mailund, Thomas

    2014-01-01

    Recombination maps of ancestral species can be constructed from comparative analyses of genomes from closely related species, exemplified by a recently published map of the human-chimpanzee ancestor. Such maps resolve differences in recombination rate between species into changes along individual branches in the speciation tree, and allow identification of associated changes in the genomic sequences. We describe how coalescent hidden Markov models are able to call individual recombination events in ancestral species through inference of incomplete lineage sorting along a genomic alignment. In the great apes, speciation events are sufficiently close in time that a map can be inferred for the ancestral species at each internal branch - allowing evolution of recombination rate to be tracked over evolutionary time scales from speciation event to speciation event. We see this approach as a way of characterizing the evolution of recombination rate and the genomic properties that influence it. PMID:25043668

  4. Evolved Populations of Shigella flexneri Phage Sf6 Acquire Large Deletions, Altered Genomic Architecture, and Faster Life Cycles.

    PubMed

    Dover, John A; Burmeister, Alita R; Molineux, Ian J; Parent, Kristin N

    2016-01-01

    Genomic architecture is the framework within which genes and regulatory elements evolve and where specific constructs may constrain or potentiate particular adaptations. One such construct is evident in phages that use a headful packaging strategy that results in progeny phage heads packaged with DNA until full rather than encapsidating a simple unit-length genome. Here, we investigate the evolution of the headful packaging phage Sf6 in response to barriers that impede efficient phage adsorption to the host cell. Ten replicate populations evolved faster Sf6 life cycles by parallel mutations found in a phage lysis gene and/or by large, 1.2- to 4.0-kb deletions that remove a mobile genetic IS911 element present in the ancestral phage genome. The fastest life cycles were found in phages that acquired both mutations. No mutations were found in genes encoding phage structural proteins, which were a priori expected from the experimental design that imposed a challenge for phage adsorption by using a Shigella flexneri host lacking receptors preferred by Sf6. We used DNA sequencing, molecular approaches, and physiological experiments on 82 clonal isolates taken from all 10 populations to reveal the genetic basis of the faster Sf6 life cycle. The majority of our isolates acquired deletions in the phage genome. Our results suggest that deletions are adaptive and can influence the duration of the phage life cycle while acting in conjunction with other lysis time-determining point mutations. PMID:27497318

  5. Genome resequencing in Populus: Revealing large-scale genome variation and implications on specialized-trait genomics

    SciTech Connect

    Muchero, Wellington; Labbe, Jessy L; Priya, Ranjan; DiFazio, Steven P; Tuskan, Gerald A

    2014-01-01

    To date, Populus ranks among a few plant species with a complete genome sequence and other highly developed genomic resources. With the first genome sequence among all tree species, Populus has been adopted as a suitable model organism for genomic studies in trees. However, far from being just a model species, Populus is a key renewable economic resource that plays a significant role in providing raw materials for the biofuel and pulp and paper industries. Therefore, aside from leading frontiers of basic tree molecular biology and ecological research, Populus leads frontiers in addressing global economic challenges related to fuel and fiber production. The latter fact suggests that research aimed at improving quality and quantity of Populus as a raw material will likely drive the pursuit of more targeted and deeper research in order to unlock the economic potential tied in molecular biology processes that drive this tree species. Advances in genome sequence-driven technologies, such as resequencing of individual genotypes, which in turn facilitates large-scale SNP discovery and identification of large-scale polymorphisms, are key determinants of future success in these initiatives. In this treatise we discuss implications of genome sequence-enabled technologies for Populus genomic and genetic studies of complex and specialized traits.

  6. Indexes of large genome collections on a PC.

    PubMed

    Danek, Agnieszka; Deorowicz, Sebastian; Grabowski, Szymon

    2014-01-01

    The availability of thousands of individual genomes of one species should boost rapid progress in personalized medicine or understanding of the interaction between genotype and phenotype, to name a few applications. A key operation useful in such analyses is aligning sequencing reads against a collection of genomes, which is costly with the use of existing algorithms due to their large memory requirements. We present MuGI, Multiple Genome Index, which reports all occurrences of a given pattern, under exact and approximate matching models, against a collection of thousand(s) of genomes. Its unique feature is the small index size, which is customisable. It fits in a standard computer with 16-32 GB, or even 8 GB, of RAM, for the 1000GP collection of 1092 diploid human genomes. The solution is also fast. For example, the exact matching queries (of average length 150 bp) are handled in average time of 39 µs and with up to 3 mismatches in 373 µs on the test PC with the index size of 13.4 GB. For a smaller index, occupying 7.4 GB in memory, the respective times grow to 76 µs and 917 µs. Software is available at http://sun.aei.polsl.pl/mugi under a free license. Data S1 is available at PLOS One online. PMID:25289699
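
    The query that MuGI answers (all occurrences of a pattern, exactly or with a bounded number of mismatches, across many genomes) can be stated as a naive reference scan. This sketch is deliberately not MuGI's index: it runs in time proportional to the total collection size per query, which is exactly the cost a compact index avoids. The function name and toy sequences are hypothetical.

```python
def find_occurrences(pattern, genomes, max_mismatches=0):
    """Return (genome_name, position, mismatches) for every match within the budget."""
    hits = []
    m = len(pattern)
    for name, seq in genomes.items():
        for i in range(len(seq) - m + 1):
            mismatches = 0
            for a, b in zip(pattern, seq[i:i + m]):
                if a != b:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break  # budget exceeded, abandon this position
            if mismatches <= max_mismatches:
                hits.append((name, i, mismatches))
    return hits

# Two toy "individuals" differing by a single substitution.
collection = {"ind1": "ACGTACGT", "ind2": "ACGAACGT"}
print(find_occurrences("ACGT", collection))                    # exact matching
print(find_occurrences("ACGT", collection, max_mismatches=1))  # Hamming distance <= 1
```

    A scan like this is useful mainly as a correctness oracle when testing an indexed implementation on small inputs.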

  8. Whole genome analysis of Vietnamese G2P[4] rotavirus strains possessing the NSP2 gene sharing an ancestral sequence with Chinese sheep and goat rotavirus strains.

    PubMed

    Do, Loan Phuong; Doan, Yen Hai; Nakagomi, Toyoko; Gauchan, Punita; Kaneko, Miho; Agbemabiese, Chantal; Dang, Anh Duc; Nakagomi, Osamu

    2015-10-01

    Because imminent introduction into Vietnam of a vaccine against Rotavirus A is anticipated, baseline information on the whole genome of representative strains is needed to understand changes in circulating strains that may occur after vaccine introduction. In this study, the whole genomes of two G2P[4] strains detected in Nha Trang, Vietnam in 2008 were sequenced, this being the last period during which virtually no rotavirus vaccine was used in this country. The two strains were found to be >99.9% identical in sequence and had a typical DS-1 like G2-P[4]-I2-R2-C2-M2-A2-N2-T2-E2-H2 genotype constellation. Analysis of the Vietnamese strains with >184 G2P[4] strains retrieved from GenBank/EMBL/DDBJ DNA databases placed the Vietnamese strains in one of the lineages commonly found among contemporary strains, with the exception of the NSP2 and NSP4 genes. The NSP2 genes were found to belong to a previously undescribed lineage that diverged from Chinese sheep and goat rotavirus strains, including a Chinese rotavirus vaccine strain LLR with 95% nucleotide identity; the time of their most recent common ancestor was 1975. The NSP4 genes were found to belong, together with Thai and USA strains, to an emergent lineage (VIII), adding further diversity to ever diversifying NSP4 lineages. Thus, there is a need to enhance surveillance of locally-circulating strains from both children and animals at the whole genome level to address the effect of rotavirus vaccines on changing strain distribution. PMID:26382233

  9. Are palaeoscolecids ancestral ecdysozoans?

    PubMed

    Harvey, Thomas H P; Dong, Xiping; Donoghue, Philip C J

    2010-01-01

    The reconstruction of ancestors is a central aim of comparative anatomy and evolutionary developmental biology, not least in attempts to understand the relationship between developmental and organismal evolution. Inferences based on living taxa can and should be tested against the fossil record, which provides an independent and direct view onto historical character combinations. Here, we consider the nature of the last common ancestor of living ecdysozoans through a detailed analysis of palaeoscolecids, an early and extinct group of introvert-bearing worms that have been proposed to be ancestral ecdysozoans. In a review of palaeoscolecid anatomy, including newly resolved details of the internal and external cuticle structure, we identify specific characters shared with various living nematoid and scalidophoran worms, but not with panarthropods. Considered within a formal cladistic context, these characters provide most overall support for a stem-priapulid affinity, meaning that palaeoscolecids are far-removed from the ecdysozoan ancestor. We conclude that previous interpretations in which palaeoscolecids occupy a deeper position in the ecdysozoan tree lack particular morphological support and rely instead on a paucity of preserved characters. This bears out a more general point that fossil taxa may appear plesiomorphic merely because they preserve only plesiomorphies, rather than the mélange of primitive and derived characters anticipated of organisms properly allocated to a position deep within animal phylogeny. PMID:20433458

  10. The Mitochondrial Genome of the Leaf-Cutter Ant Atta laevigata: A Mitogenome with a Large Number of Intergenic Spacers

    PubMed Central

    Rodovalho, Cynara de Melo; Lyra, Mariana Lúcio; Ferro, Milene; Bacci, Maurício

    2014-01-01

    In this paper we describe the nearly complete mitochondrial genome of the leaf-cutter ant Atta laevigata, assembled using transcriptomic libraries from Sanger and Illumina next generation sequencing (NGS), and PCR products. This mitogenome was found to be very large (18,729 bp), given the presence of 30 non-coding intergenic spacers (IGS) spanning 3,808 bp. A portion of the putative control region remained unsequenced. The gene content and organization correspond to that inferred for the ancestral pancrustacea, except for two tRNA gene rearrangements that have been described previously in other ants. The IGS were highly variable in length and dispersed through the mitogenome. This pattern was also found for the other hymenopterans in particular for the monophyletic Apocrita. These spacers with unknown function may be valuable for characterizing genome evolution and distinguishing closely related species and individuals. NGS provided better coverage than Sanger sequencing, especially for tRNA and ribosomal subunit genes, thus facilitating efforts to fill in sequence gaps. The results obtained showed that data from transcriptomic libraries contain valuable information for assembling mitogenomes. The present data also provide a source of molecular markers that will be very important for improving our understanding of genomic evolutionary processes and phylogenetic relationships among hymenopterans. PMID:24828084

  11. Large-Scale Sequencing: The Future of Genomic Sciences Colloquium

    SciTech Connect

    Margaret Riley; Merry Buckley

    2009-01-01

    Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes - including bacteria, archaea, viruses, and microbial eukaryotes - provides researchers insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing. Until recently, sequencing entire microbial genomes has been laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencing is within reach for even the smallest facilities, and the ability to sequence the genomes of a significant fraction of microbial life may be possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact. The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed. The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. 
    A coordinated sequencing effort of cultured organisms is an appropriate place to begin

  12. The vertebrate ancestral repertoire of visual opsins, transducin alpha subunits and oxytocin/vasopressin receptors was established by duplication of their shared genomic region in the two rounds of early vertebrate genome duplications

    PubMed Central

    2013-01-01

    Background Vertebrate color vision is dependent on four major color opsin subtypes: RH2 (green opsin), SWS1 (ultraviolet opsin), SWS2 (blue opsin), and LWS (red opsin). Together with the dim-light receptor rhodopsin (RH1), these form the family of vertebrate visual opsins. Vertebrate genomes contain many multi-membered gene families that can largely be explained by the two rounds of whole genome duplication (WGD) in the vertebrate ancestor (2R) followed by a third round in the teleost ancestor (3R). Related chromosome regions resulting from WGD or block duplications are said to form a paralogon. We describe here a paralogon containing the genes for visual opsins, the G-protein alpha subunit families for transducin (GNAT) and adenylyl cyclase inhibition (GNAI), the oxytocin and vasopressin receptors (OT/VP-R), and the L-type voltage-gated calcium channels (CACNA1-L). Results Sequence-based phylogenies and analyses of conserved synteny show that the above-mentioned gene families, and many neighboring gene families, expanded in the early vertebrate WGDs. This allows us to deduce the following evolutionary scenario: The vertebrate ancestor had a chromosome containing the genes for two visual opsins, one GNAT, one GNAI, two OT/VP-Rs and one CACNA1-L gene. This chromosome was quadrupled in 2R. Subsequent gene losses resulted in a set of five visual opsin genes, three GNAT and GNAI genes, six OT/VP-R genes and four CACNA1-L genes. These regions were duplicated again in 3R resulting in additional teleost genes for some of the families. Major chromosomal rearrangements have taken place in the teleost genomes. By comparison with the corresponding chromosomal regions in the spotted gar, which diverged prior to 3R, we could time these rearrangements to post-3R. Conclusions We present an extensive analysis of the paralogon housing the visual opsin, GNAT and GNAI, OT/VP-R, and CACNA1-L gene families. 
    The combined data imply that the early vertebrate WGD events contributed to the

  13. The Ancestral Gene for Transcribed, Low-Copy Repeats in the Prader-Willi/Angelman Region Encodes a Large Protein Implicated in Protein Trafficking that is Deficient in Mice with Neuromuscular and

    SciTech Connect

    Ji, Y.

    1999-01-01

    Transcribed, low-copy repeat elements are associated with the breakpoint regions of common deletions in Prader-Willi and Angelman syndromes. We report here the identification of the ancestral gene (HERC2) and a family of duplicated, truncated copies that comprise these low-copy repeats. This gene encodes a highly conserved giant protein, HERC2, that is distantly related to p532 (HERC1), a guanine nucleotide exchange factor (GEF) implicated in vesicular trafficking. The mouse genome contains a single Herc2 locus, located in the jdf2 (juvenile development and fertility-2) interval of chromosome 7C. We have identified single nucleotide splice junction mutations in Herc2 in three independent N-ethyl-N-nitrosourea-induced jdf2 mutant alleles, each leading to exon skipping with premature termination of translation and/or deletion of conserved amino acids. Therefore, mutations in Herc2 lead to the neuromuscular secretory vesicle and sperm acrosome defects, other developmental abnormalities and juvenile lethality of jdf2 mice. Combined, these findings suggest that HERC2 is an important gene encoding a GEF involved in protein trafficking and degradation pathways in the cell.

  14. Optimizing restriction fragment fingerprinting methods for ordering large genomic libraries

    SciTech Connect

    Branscomb, E.; Slezak, T.; Pae, R.; Carrano, A.V.; Galas, D.; Waterman, M.

    1990-01-01

    The authors present a statistical analysis of the problem of ordering large genomic cloned libraries through overlap detection based on restriction fingerprinting. Such ordering projects involve a large investment of effort involving many repetitious experiments. Their primary purpose here is to provide methods of maximizing the efficiency of such efforts. To this end, they adopt a statistical approach that uses the likelihood ratio as a statistic to detect overlap. The main advantages of this approach are that (1) it allows the relatively straightforward incorporation of the observed statistical properties of the data; (2) it permits the efficiency of a particular experimental method for detecting overlap to be quantitatively defined so that alternative experimental designs may be compared and optimized; and (3) it yields a direct estimate of the probability that any two library members overlap. This estimate is a critical tool for the accurate, automatic assembly of overlapping sets of fragments into islands called "contigs." These contigs must subsequently be connected by other methods to provide an ordered set of overlapping fragments covering the entire genome.

  15. Genomic Characterization of Large Heterochromatic Gaps in the Human Genome Assembly

    PubMed Central

    Altemose, Nicolas; Miga, Karen H.; Maggioni, Mauro; Willard, Huntington F.

    2014-01-01

    The largest gaps in the human genome assembly correspond to multi-megabase heterochromatic regions composed primarily of two related families of tandem repeats, Human Satellites 2 and 3 (HSat2,3). The abundance of repetitive DNA in these regions challenges standard mapping and assembly algorithms, and as a result, the sequence composition and potential biological functions of these regions remain largely unexplored. Furthermore, existing genomic tools designed to predict consensus-based descriptions of repeat families cannot be readily applied to complex satellite repeats such as HSat2,3, which lack a consistent repeat unit reference sequence. Here we present an alignment-free method to characterize complex satellites using whole-genome shotgun read datasets. Utilizing this approach, we classify HSat2,3 sequences into fourteen subfamilies and predict their chromosomal distributions, resulting in a comprehensive satellite reference database to further enable genomic studies of heterochromatic regions. We also identify 1.3 Mb of non-repetitive sequence interspersed with HSat2,3 across 17 unmapped assembly scaffolds, including eight annotated gene predictions. Finally, we apply our satellite reference database to high-throughput sequence data from 396 males to estimate array size variation of the predominant HSat3 array on the Y chromosome, confirming that satellite array sizes can vary between individuals over an order of magnitude (7 to 98 Mb) and further demonstrating that array sizes are distributed differently within distinct Y haplogroups. In summary, we present a novel framework for generating initial reference databases for unassembled genomic regions enriched with complex satellite DNA, and we further demonstrate the utility of these reference databases for studying patterns of sequence variation within human populations. PMID:24831296

  16. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach

    PubMed Central

    Boitard, Simon; Rodríguez, Willy; Jay, Flora; Mona, Stefano; Austerlitz, Frédéric

    2016-01-01

    Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles. PMID:26943927
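
    The folded allele frequency spectrum named in this abstract can be computed directly from unphased, unpolarized genotypes. The following is a minimal sketch, assuming diploid biallelic SNPs coded as 0, 1 or 2 copies of an arbitrarily chosen allele; the function name `folded_afs` and the input layout are illustrative assumptions and are not taken from PopSizeABC.

```python
from collections import Counter


def folded_afs(genotypes):
    """Folded allele frequency spectrum from unphased diploid genotypes.

    genotypes: list of SNPs; each SNP is a list with one entry per
    individual giving the number of copies (0, 1 or 2) of an arbitrary
    allele. This layout is an assumption made for the example.
    Returns a list whose entry k-1 is the number of SNPs with a
    minor-allele count of k, for k = 1 .. n_chromosomes // 2.
    """
    if not genotypes:
        return []
    n_chrom = 2 * len(genotypes[0])
    spectrum = Counter()
    for snp in genotypes:
        count = sum(snp)
        # Folding: without knowing the ancestral allele, count the rarer one.
        minor = min(count, n_chrom - count)
        if minor > 0:  # skip monomorphic sites
            spectrum[minor] += 1
    return [spectrum[k] for k in range(1, n_chrom // 2 + 1)]


# Three individuals (six chromosomes), four sites, one of them monomorphic.
example = [[0, 1, 0], [2, 2, 1], [1, 1, 1], [0, 0, 0]]
print(folded_afs(example))
```

    Folding is what makes the statistic usable on unpolarized data: allele counts of 1 and 5 out of 6 chromosomes fall into the same class.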

  17. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach.

    PubMed

    Boitard, Simon; Rodríguez, Willy; Jay, Flora; Mona, Stefano; Austerlitz, Frédéric

    2016-03-01

    Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles.

  18. Comparative genomic hybridizations reveal absence of large Streptomyces coelicolor genomic islands in Streptomyces lividans

    PubMed Central

    Jayapal, Karthik P; Lian, Wei; Glod, Frank; Sherman, David H; Hu, Wei-Shou

    2007-01-01

    Background The genomes of Streptomyces coelicolor and Streptomyces lividans bear a considerable degree of synteny. While S. coelicolor is the model streptomycete for studying antibiotic synthesis and differentiation, S. lividans is almost exclusively considered as the preferred host, among actinomycetes, for cloning and expression of exogenous DNA. We used whole genome microarrays as a comparative genomics tool for identifying the subtle differences between these two chromosomes. Results We identified five large S. coelicolor genomic islands (larger than 25 kb) and 18 smaller islets absent in S. lividans chromosome. Many of these regions show anomalous GC bias and codon usage patterns. Six of them are in close vicinity of tRNA genes while nine are flanked with near perfect repeat sequences indicating that these are probable recent evolutionary acquisitions into S. coelicolor. Embedded within these segments are at least four DNA methylases and two probable methyl-sensing restriction endonucleases. Comparison with S. coelicolor transcriptome and proteome data revealed that some of the missing genes are active during the course of growth and differentiation in S. coelicolor. In particular, a pair of methylmalonyl CoA mutase (mcm) genes involved in polyketide precursor biosynthesis, an acyl-CoA dehydrogenase implicated in timing of actinorhodin synthesis and bldB, a developmentally significant regulator whose mutation causes complete abrogation of antibiotic synthesis belong to this category. Conclusion Our findings provide tangible hints for elucidating the genetic basis of important phenotypic differences between these two streptomycetes. Importantly, absence of certain genes in S. lividans identified here could potentially explain the relative ease of DNA transformations and the conditional lack of actinorhodin synthesis in S. lividans. PMID:17623098

  19. Volume visualization of multiple alignment of large genomic DNA

    SciTech Connect

    Shah, Nameeta; Dillard, Scott E.; Weber, Gunther H.; Hamann, Bernd

    2005-07-25

    Genomes of hundreds of species have been sequenced to date, and many more are being sequenced. As more and more sequence data sets become available, and as the challenge of comparing these massive "billion basepair DNA sequences" becomes substantial, so does the need for more powerful tools supporting the exploration of these data sets. Similarity score data used to compare aligned DNA sequences is inherently one-dimensional. One-dimensional (1D) representations of these data sets do not effectively utilize screen real estate. As a result, tools using 1D representations are incapable of providing an informative overview for extremely large data sets. We present a technique to arrange 1D data in 3D space to allow us to apply state-of-the-art interactive volume visualization techniques for data exploration. We demonstrate our technique using multi-millions-basepair-long aligned DNA sequence data and compare it with traditional 1D line plots. The results show that our technique is superior in providing an overview of entire data sets. Our technique, coupled with 1D line plots, results in effective multi-resolution visualization of very large aligned sequence data sets.

  20. CGCI Investigators Reveal Comprehensive Landscape of Diffuse Large B-Cell Lymphoma (DLBCL) Genomes | Office of Cancer Genomics

    Cancer.gov

    Researchers from British Columbia Cancer Agency used whole genome sequencing to analyze 40 DLBCL cases and 13 cell lines in order to fill in the gaps of the complex landscape of DLBCL genomes. Their analysis, “Mutational and structural analysis of diffuse large B-cell lymphoma using whole genome sequencing,” was published online in Blood on May 22. The authors are Ryan Morin, Marco Marra, and colleagues.  

  1. Intron-genome size relationship on a large evolutionary scale.

    PubMed

    Vinogradov, A E

    1999-09-01

    The intron-genome size relationship was studied across a wide evolutionary range (from slime mold and yeast to human and maize), as well as the relationship between genome size and the ratio of intervening/coding sequence size. The average intron size is scaled to genome size with a slope of about one-fourth for the log-transformed values; i.e., on the global scale its increase in evolution is lower than the increase in genome size by four orders of magnitude. There are exceptions to the general trend. In baker's yeast introns are extraordinarily long for its genome size. Tetrapods also have longer introns than expected for their genome sizes. In teleost fish the mean intron size does not differ significantly, notwithstanding the differences in genome size. In contrast to previous reports, avian introns were not found to be significantly shorter than introns of mammals, although avian genomes are smaller than genomes of mammals on average by about a factor of 2.5. The extra-/intragenic ratio of noncoding DNA can be higher in fungi than in animals, notwithstanding the smaller fungal genomes. In vertebrates and invertebrates taken separately, this ratio increases with genome size. Two hypotheses are proposed to explain the variation in the extra-/intragenic ratio of noncoding DNA in organisms with similar numbers of genes: transition (dynamic) and equilibrium (static). According to the transition model, this variation arises with the rapid shift of genome size because the bulk of extragenic DNA can be changed more rapidly than the finely interspersed intron sequences. The equilibrium model assumes that this variation is a result of selective adjustment of genome size with constraints imposed on the intron size due to its putative link to chromatin structure (and constraints of the splicing machinery). PMID:10473779
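
    One way to read the scaling stated in this abstract is as a power law. The worked relation below is a sketch under that assumption; the symbols I (mean intron size), G (genome size), a (intercept) and b (slope) are introduced here for illustration and do not appear in the record.

```latex
\log_{10} I = a + b \,\log_{10} G, \qquad b \approx \tfrac{1}{4}
\quad\Longrightarrow\quad I \propto G^{1/4}

\frac{G_2}{G_1} = 10^{4}
\quad\Longrightarrow\quad
\frac{I_2}{I_1} = \left(10^{4}\right)^{1/4} = 10
```

    Under this reading, a difference of four orders of magnitude in genome size would correspond to roughly one order of magnitude in mean intron size.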

  2. Combining p-values in large-scale genomics experiments.

    PubMed

    Zaykin, Dmitri V; Zhivotovsky, Lev A; Czika, Wendy; Shao, Susan; Wolfinger, Russell D

    2007-01-01

    In large-scale genomics experiments involving thousands of statistical tests, such as association scans and microarray expression experiments, a key question is: Which of the L tests represent true associations (TAs)? The traditional way to control false findings is via individual adjustments. In the presence of multiple TAs, p-value combination methods offer certain advantages. Both Fisher's and Lancaster's combination methods use an inverse gamma transformation. We identify the relation of the shape parameter of that distribution to the implicit threshold value; p-values below that threshold are favored by the inverse gamma method (GM). We explore this feature to improve power over Fisher's method when L is large and the number of TAs is moderate. However, the improvement in power provided by combination methods is at the expense of a weaker claim made upon rejection of the null hypothesis - that there are some TAs among the L tests. Thus, GM remains a global test. To allow a stronger claim about a subset of p-values that is smaller than L, we investigate two methods with an explicit truncation: the rank truncated product method (RTP) that combines the first K-ordered p-values, and the truncated product method (TPM) that combines p-values that are smaller than a specified threshold. We conclude that TPM allows claims to be made about subsets of p-values, while the claim of the RTP is, like GM, more appropriately about all L tests. GM gives somewhat higher power than TPM, RTP, Fisher, and Simes methods across a range of simulations. PMID:17879330
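
    Two of the combination methods named in this abstract can be sketched in a few lines. The block below is a minimal illustration, assuming independent p-values that are uniform under the null hypothesis: Fisher's method uses the closed-form chi-square tail for even degrees of freedom, and the truncated product method (TPM) is evaluated here by Monte Carlo simulation, which is a stand-in chosen for brevity and not the procedure used by the authors. The function names, the default threshold `tau=0.05` and the simulation size are assumptions.

```python
import math
import random


def fisher_combined_p(pvalues):
    """Fisher's method for L independent p-values (each must be > 0).

    Under the global null, -2 * sum(ln p) is chi-square with 2L degrees
    of freedom; for even degrees of freedom the upper tail has the
    closed form exp(-h) * sum_{i<L} h**i / i!, with h = -sum(ln p).
    """
    n_tests = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)
    term, total = 1.0, 1.0
    for i in range(1, n_tests):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def tpm_statistic(pvalues, tau=0.05):
    """Product of the p-values that are <= tau (1.0 if none qualify)."""
    return math.prod(p for p in pvalues if p <= tau)


def tpm_p_value(pvalues, tau=0.05, n_sim=10000, seed=0):
    """Monte Carlo p-value for the TPM statistic under independent
    uniform nulls (a simulation shortcut assumed for this sketch)."""
    rng = random.Random(seed)
    observed = tpm_statistic(pvalues, tau)
    n_tests = len(pvalues)
    hits = sum(
        tpm_statistic([rng.random() for _ in range(n_tests)], tau) <= observed
        for _ in range(n_sim)
    )
    return hits / n_sim


pvals = [0.01, 0.04, 0.30, 0.75, 0.90]
print(fisher_combined_p(pvals), tpm_p_value(pvals))
```

    Both functions return a global p-value, so a rejection only supports the claim that some true associations exist among the tests combined.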

  3. Combining p-values in large scale genomics experiments

    PubMed Central

    Zaykin, Dmitri V.; Zhivotovsky, Lev A.; Czika, Wendy; Shao, Susan; Wolfinger, Russell D.

    2008-01-01

    Summary In large-scale genomics experiments involving thousands of statistical tests, such as association scans and microarray expression experiments, a key question is: Which of the L tests represent true associations (TAs)? The traditional way to control false findings is via individual adjustments. In the presence of multiple TAs, p-value combination methods offer certain advantages. Both Fisher’s and Lancaster’s combination methods use an inverse gamma transformation. We identify the relation of the shape parameter of that distribution to the implicit threshold value; p-values below that threshold are favored by the inverse gamma method (GM). We explore this feature to improve power over Fisher’s method when L is large and the number of TAs is moderate. However, the improvement in power provided by combination methods is at the expense of a weaker claim made upon rejection of the null hypothesis – that there are some TAs among the L tests. Thus, GM remains a global test. To allow a stronger claim about a subset of p-values that is smaller than L, we investigate two methods with an explicit truncation: the rank truncated product method (RTP) that combines the first K ordered p-values, and the truncated product method (TPM) that combines p-values that are smaller than a specified threshold. We conclude that TPM allows claims to be made about subsets of p-values, while the claim of the RTP is, like GM, more appropriately about all L tests. GM gives somewhat higher power than TPM, RTP, Fisher, and Simes methods across a range of simulations. PMID:17879330

  4. Ancestral reconstruction of tick lineages.

    PubMed

    Mans, Ben J; de Castro, Minique H; Pienaar, Ronel; de Klerk, Daniel; Gaven, Philasande; Genu, Siyamcela; Latif, Abdalla A

    2016-06-01

    Ancestral reconstruction in its fullest sense aims to describe the complete evolutionary history of a lineage. This depends on accurate phylogenies and an understanding of the key characters of each parental lineage. An attempt is made to delineate our current knowledge with regard to the ancestral reconstruction of the tick (Ixodida) lineage. Tick characters may be assigned to Core of Life, Lineages of Life or Edges of Life phenomena depending on how far back these characters may be assigned in the evolutionary Tree of Life. These include housekeeping genes, sub-cellular systems, heme processing (Core of Life), development, moulting, appendages, nervous and organ systems, homeostasis, respiration (Lineages of Life), specific adaptations to a blood-feeding lifestyle, including the complexities of salivary gland secretions and tick-host interactions (Edges of Life). The phylogenetic relationships of lineages, their origins and importance in ancestral reconstruction are discussed. Uncertainties with respect to systematic relationships, ancestral reconstruction and the challenges faced in comparative transcriptomics (next-generation sequencing approaches) are highlighted. While almost 150 years of information regarding tick biology have been assembled, progress in recent years indicates that we are in the infancy of understanding tick evolution. Even so, broad reconstructions can be made with relation to biological features associated with various lineages. Conservation of characters shared with sister and parent lineages are evident, but appreciable differences are present in the tick lineage indicating modification with descent, as expected for Darwinian evolutionary theory. Many of these differences can be related to the hematophagous lifestyle of ticks. PMID:26868413

  5. Antarctic krill population genomics: apparent panmixia, but genome complexity and large population size muddy the water.

    PubMed

    Deagle, Bruce E; Faux, Cassandra; Kawaguchi, So; Meyer, Bettina; Jarman, Simon N

    2015-10-01

    Antarctic krill (Euphausia superba; hereafter krill) are an incredibly abundant pelagic crustacean which has a wide, but patchy, distribution in the Southern Ocean. Several studies have examined the potential for population genetic structuring in krill, but DNA-based analyses have focused on a limited number of markers and have covered only part of their circum-Antarctic range. We used mitochondrial DNA and restriction site-associated DNA sequencing (RAD-seq) to investigate genetic differences between krill from five sites, including two from East Antarctica. Our mtDNA results show no discernible genetic structuring between sites separated by thousands of kilometres, which is consistent with previous studies. Using standard RAD-seq methodology, we obtained over a billion sequences from >140 krill, and thousands of variable nucleotides were identified at hundreds of loci. However, downstream analysis found that markers with sufficient coverage were primarily from multicopy genomic regions. Careful examination of these data highlights the complexity of the RAD-seq approach in organisms with very large genomes. To characterize the multicopy markers, we recorded sequence counts from variable nucleotide sites rather than the derived genotypes; we also examined a small number of manually curated genotypes. Although these analyses effectively fingerprinted individuals, and uncovered a minor laboratory batch effect, no population structuring was observed. Overall, our results are consistent with panmixia of krill throughout their distribution. This result may indicate ongoing gene flow. However, krill's enormous population size creates substantial panmictic inertia, so genetic differentiation may not occur on an ecologically relevant timescale even if demographically separate populations exist.

  6. Accommodating the load: The transposable element content of very large genomes.

    PubMed

    Metcalfe, Cushla J; Casane, Didier

    2013-03-01

    Very large genomes, that is, those above 20 Gb, are rare but widely distributed throughout the eukaryotes. They are found within the diatoms, dinoflagellates, metazoans and green plants, but so far have not been found in the excavates. There is a known positive correlation between genome size and the proportion of the genome composed of transposable elements (TEs). Very large genomes may therefore be expected to be almost entirely composed of TEs. Of the large genomes examined, in the angiosperms, gymnosperms and the dinoflagellates only a small portion of the genome was identified as TEs, most of these genomes were unidentified and may be novel or diverse TEs. In the salamanders and lungfish, 25 to 47% of the genome were identifiable retrotransposons, that is, TEs that copy themselves before insertion. However, the predominant class of TEs found in the lungfish was not the same as that found in the salamanders. The little data we have at the moment suggests therefore that the diversity and abundance of TEs is variable between taxa with large genomes, similar to patterns found in taxa with smaller genomes. Based on results from the human genome, we suggest that the 'missing' portion of the lungfish and salamander genomes are old, highly divergent, and therefore inactive copies of TEs. The data available indicate that, unlike plants with large genomes, neither the lungfish nor the salamanders show an increased risk of extinction. Based on a slow rate of DNA loss in salamanders it has been suggested that the large salamander genome is the result of run-away genome expansion involving genome size increases via TE proliferation associated with reduced recombination rate. We know of no studies on DNA loss or recombination rates in lungfish genomes, however a similar scenario could describe the process of genome expansion in the lungfish. A series of waves of TE transposition and sequence decay would describe the pattern of TE content seen in both the lungfish and the

  7. Genomic evidence for large, long-lived ancestors to placental mammals.

    PubMed

    Romiguier, J; Ranwez, V; Douzery, E J P; Galtier, N

    2013-01-01

    It is widely assumed that our mammalian ancestors, which lived in the Cretaceous era, were tiny animals that survived massive asteroid impacts in shelters and evolved into modern forms after dinosaurs went extinct, 65 Ma. The small size of most Mesozoic mammalian fossils essentially supports this view. Paleontology, however, is not conclusive regarding the ancestry of extant mammals, because Cretaceous and Paleocene fossils are not easily linked to modern lineages. Here, we use full-genome data to estimate the longevity and body mass of early placental mammals. Analyzing 36 fully sequenced mammalian genomes, we reconstruct two aspects of the ancestral genome dynamics, namely GC-content evolution and nonsynonymous over synonymous rate ratio. Linking these molecular evolutionary processes to life-history traits in modern species, we estimate that early placental mammals had a life span above 25 years and a body mass above 1 kg. This is similar to current primates, cetartiodactyls, or carnivores, but markedly different from mice or shrews, challenging the dominant view about mammalian origin and evolution. Our results imply that long-lived mammals existed in the Cretaceous era and were the most successful in evolution, opening new perspectives about the conditions for survival to the Cretaceous-Tertiary crisis.

  8. Identifying Recent Adaptations in Large-scale Genomic Data

    PubMed Central

    Grossman, Sharon R.; Andersen, Kristian G.; Shlyakhter, Ilya; Tabrizi, Shervin; Winnicki, Sarah; Yen, Angela; Park, Daniel J.; Griesemer, Dustin; Karlsson, Elinor K.; Wong, Sunny H.; Cabili, Moran; Adegbola, Richard A.; Bamezai, Rameshwar N. K.; Hill, Adrian V. S.; Vannberg, Fredrik O.; Rinn, John L.; Lander, Eric S.; Schaffner, Stephen F.; Sabeti, Pardis C.

    2013-01-01

    SUMMARY While several hundred regions of the human genome harbor signals of positive natural selection, few of the relevant adaptive traits and variants have been elucidated. Using full-genome sequence variation from the 1000 Genomes Project (1000G) and the Composite of Multiple Signals (CMS) test, we investigated 412 candidate signals and leveraged functional annotation, protein structure modeling, epigenetics, and association studies to identify and extensively annotate candidate causal variants. The resulting catalog provides a tractable list for experimental follow-up; it includes thirty-five high-scoring non-synonymous variants, fifty-nine variants associated with expression levels of a nearby coding gene or lincRNA, and numerous variants associated with susceptibility to infectious disease and other phenotypes. We experimentally characterized one candidate non-synonymous variant in TLR5, and show that it leads to altered NF-κB signaling in response to bacterial flagellin. PMID:23415221

  9. Galaxy: a platform for interactive large-scale genome analysis.

    PubMed

    Giardine, Belinda; Riemer, Cathy; Hardison, Ross C; Burhans, Richard; Elnitski, Laura; Shah, Prachi; Zhang, Yi; Blankenberg, Daniel; Albert, Istvan; Taylor, James; Miller, Webb; Kent, W James; Nekrutenko, Anton

    2005-10-01

    Accessing and analyzing the exponentially expanding genomic sequence and functional data pose a challenge for biomedical researchers. Here we describe an interactive system, Galaxy, that combines the power of existing genome annotation databases with a simple Web portal to enable users to search remote resources, combine data from independent queries, and visualize the results. The heart of Galaxy is a flexible history system that stores the queries from each user; performs operations such as intersections, unions, and subtractions; and links to other computational tools. Galaxy can be accessed at http://g2.bx.psu.edu.

  10. The common ancestral core of vertebrate and fungal telomerase RNAs.

    PubMed

    Qi, Xiaodong; Li, Yang; Honda, Shinji; Hoffmann, Steve; Marz, Manja; Mosig, Axel; Podlevsky, Joshua D; Stadler, Peter F; Selker, Eric U; Chen, Julian J-L

    2013-01-01

    Telomerase is a ribonucleoprotein with an intrinsic telomerase RNA (TER) component. Within yeasts, TER is remarkably large and presents little similarity in secondary structure to vertebrate or ciliate TERs. To better understand the evolution of fungal telomerase, we identified 74 TERs from Pezizomycotina and Taphrinomycotina subphyla, sister clades to budding yeasts. We initially identified TER from Neurospora crassa using a novel deep-sequencing-based approach, and homologous TER sequences from available fungal genome databases by computational searches. Remarkably, TERs from these non-yeast fungi have many attributes in common with vertebrate TERs. Comparative phylogenetic analysis of highly conserved regions within Pezizomycotina TERs revealed two core domains nearly identical in secondary structure to the pseudoknot and CR4/5 within vertebrate TERs. We then analyzed N. crassa and Schizosaccharomyces pombe telomerase reconstituted in vitro, and showed that the two RNA core domains in both systems can reconstitute activity in trans as two separate RNA fragments. Furthermore, the primer-extension pulse-chase analysis affirmed that the reconstituted N. crassa telomerase synthesizes TTAGGG repeats with high processivity, a common attribute of vertebrate telomerase. Overall, this study reveals the common ancestral cores of vertebrate and fungal TERs, and provides insights into the molecular evolution of fungal TER structure and function.

  11. Modeling X-Linked Ancestral Origins in Multiparental Populations

    PubMed Central

    Zheng, Chaozhi

    2015-01-01

    The models for the mosaic structure of an individual’s genome from multiparental populations have been developed primarily for autosomes, whereas X chromosomes receive very little attention. In this paper, we extend our previous approach to model ancestral origin processes along two X chromosomes in a mapping population, which is necessary for developing hidden Markov models in the reconstruction of ancestry blocks for X-linked quantitative trait locus mapping. The model accounts for the joint recombination pattern, the asymmetry between maternally and paternally derived X chromosomes, and the finiteness of population size. The model can be applied to various mapping populations such as the advanced intercross lines (AIL), the Collaborative Cross (CC), the heterogeneous stock (HS), the Diversity Outcross (DO), and the Drosophila synthetic population resource (DSPR). We further derive the map expansion, density (per Morgan) of recombination breakpoints, in advanced intercross populations with L inbred founders under the limit of an infinitely large population size. The analytic results show that for X chromosomes the genetic map expands linearly at a rate (per generation) of two-thirds times 1 – 10/(9L) for the AIL, and at a rate of two-thirds times 1 – 1/L for the DO and the HS, whereas for autosomes the map expands at a rate of 1 – 1/L for the AIL, the DO, and the HS. PMID:25740936
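
    The per-generation map expansion rates quoted in this abstract can be evaluated directly for a given number of founders L. The sketch below only encodes the three formulas stated above (infinite population size limit); the function name and its arguments are hypothetical and chosen for illustration.

```python
from fractions import Fraction


def map_expansion_rate(num_founders, chromosome="X", design="AIL"):
    """Per-generation map expansion rate, using the formulas quoted in the abstract.

    chromosome: "X" or "autosome"; design: "AIL", "DO", or "HS".
    Exact fractions are returned so the values can be compared without rounding.
    """
    if design not in ("AIL", "DO", "HS"):
        raise ValueError("design must be 'AIL', 'DO', or 'HS'")
    L = Fraction(num_founders)
    if chromosome == "autosome":
        # Same rate for the AIL, the DO, and the HS: 1 - 1/L.
        return 1 - 1 / L
    if chromosome != "X":
        raise ValueError("chromosome must be 'X' or 'autosome'")
    if design == "AIL":
        # X chromosome, AIL: (2/3) * (1 - 10/(9L)).
        return Fraction(2, 3) * (1 - Fraction(10, 9) / L)
    # X chromosome, DO or HS: (2/3) * (1 - 1/L).
    return Fraction(2, 3) * (1 - 1 / L)


# Example with L = 8 founders.
for chrom, design in [("X", "AIL"), ("X", "DO"), ("autosome", "AIL")]:
    print(chrom, design, map_expansion_rate(8, chrom, design))
```

    For L = 8 the X-linked rates are 31/54 (AIL) and 7/12 (DO, HS), compared with 7/8 for autosomes.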

  12. Targeted Large-Scale Deletion of Bacterial Genomes Using CRISPR-Nickases.

    PubMed

    Standage-Beier, Kylie; Zhang, Qi; Wang, Xiao

    2015-11-20

    Programmable CRISPR-Cas systems have augmented our ability to produce precise genome manipulations. Here we demonstrate and characterize the ability of CRISPR-Cas derived nickases to direct targeted recombination of both small and large genomic regions flanked by repetitive elements in Escherichia coli. While CRISPR directed double-stranded DNA breaks are highly lethal in many bacteria, we show that CRISPR-guided nickase systems can be programmed to make precise, nonlethal, single-stranded incisions in targeted genomic regions. This induces recombination events and leads to targeted deletion. We demonstrate that dual-targeted nicking enables deletion of 36 and 97 Kb of the genome. Furthermore, multiplex targeting enables deletion of 133 Kb, accounting for approximately 3% of the entire E. coli genome. This technology provides a framework for methods to manipulate bacterial genomes using CRISPR-nickase systems. We envision this system working synergistically with preexisting bacterial genome engineering methods.

  13. Inferring ancient metabolism using ancestral core metabolic models of enterobacteria

    PubMed Central

    2013-01-01

    Background: Enterobacteriaceae diversified from an ancestral lineage ~300-500 million years ago (mya) into a wide variety of free-living and host-associated lifestyles. Nutrient availability varies across niches, and evolution of metabolic networks likely played a key role in adaptation. Results: Here we use a paleo systems biology approach to reconstruct and model metabolic networks of ancestral nodes of the enterobacteria phylogeny to investigate metabolism of ancient microorganisms and evolution of the networks. Specifically, we identified orthologous genes across genomes of 72 free-living enterobacteria (16 genera), and constructed core metabolic networks capturing conserved components for ancestral lineages leading to E. coli/Shigella (~10 mya), E. coli/Shigella/Salmonella (~100 mya), and all enterobacteria (~300-500 mya). Using these models we analyzed the capacity for carbon, nitrogen, phosphorus, sulfur, and iron utilization in aerobic and anaerobic conditions, identified conserved and differentiating catabolic phenotypes, and validated predictions by comparison to experimental data from extant organisms. Conclusions: This is a novel approach using quantitative ancestral models to study metabolic network evolution and may be useful for identification of new targets to control infectious diseases caused by enterobacteria. PMID:23758866

  14. Assessing Genome-Wide Statistical Significance for Large p Small n Problems

    PubMed Central

    Diao, Guoqing; Vidyashankar, Anand N.

    2013-01-01

    Assessing genome-wide statistical significance is an important issue in genetic studies. We describe a new resampling approach for determining the appropriate thresholds for statistical significance. Our simulation results demonstrate that the proposed approach accurately controls the genome-wide type I error rate even in large p, small n situations. PMID:23666935
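
    The abstract does not spell out the proposed resampling procedure, so the sketch below is not the authors' method. It shows a generic, widely used stand-in: a permutation test on the maximum statistic, in which the phenotype is shuffled many times and the genome-wide threshold is taken as the (1 - alpha) quantile of the largest per-marker statistic. The function names, the choice of absolute correlation as the statistic, and the simulated data are all assumptions.

```python
import math
import random


def abs_correlation(x, y):
    """Absolute Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant marker or phenotype carries no signal
    return abs(sxy) / math.sqrt(sxx * syy)


def permutation_threshold(markers, phenotype, alpha=0.05, n_perm=200, seed=0):
    """Genome-wide threshold from the permutation distribution of the max statistic."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any marker-phenotype association
        max_stats.append(max(abs_correlation(m, shuffled) for m in markers))
    max_stats.sort()
    # Empirical (1 - alpha) quantile of the maximum statistic.
    index = min(n_perm - 1, max(0, math.ceil((1 - alpha) * n_perm) - 1))
    return max_stats[index]


# Simulated "large p, small n" data: 50 markers, 10 individuals (illustrative only).
data_rng = random.Random(1)
markers = [[data_rng.choice([0, 1, 2]) for _ in range(10)] for _ in range(50)]
phenotype = [data_rng.gauss(0, 1) for _ in range(10)]
print(permutation_threshold(markers, phenotype, alpha=0.05, n_perm=100))
```

    Taking the maximum over all markers within each permutation is what makes the threshold genome-wide rather than per-marker.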

  15. Assessing genome-wide statistical significance for large p small n problems.

    PubMed

    Diao, Guoqing; Vidyashankar, Anand N

    2013-07-01

    Assessing genome-wide statistical significance is an important issue in genetic studies. We describe a new resampling approach for determining the appropriate thresholds for statistical significance. Our simulation results demonstrate that the proposed approach accurately controls the genome-wide type I error rate even in large p, small n situations.

  16. FastML: a web server for probabilistic reconstruction of ancestral sequences.

    PubMed

    Ashkenazy, Haim; Penn, Osnat; Doron-Faigenboim, Adi; Cohen, Ofir; Cannarozzi, Gina; Zomer, Oren; Pupko, Tal

    2012-07-01

    Ancestral sequence reconstruction is essential to a variety of evolutionary studies. Here, we present the FastML web server, a user-friendly tool for the reconstruction of ancestral sequences. FastML implements various novel features that differentiate it from existing tools: (i) FastML uses an indel-coding method, in which each gap, possibly spanning multiple sites, is coded as binary data. FastML then reconstructs ancestral indel states assuming a continuous time Markov process. FastML provides the most likely ancestral sequences, integrating both indels and characters; (ii) FastML accounts for uncertainty in ancestral states: it provides not only the posterior probabilities for each character and indel at each sequence position, but also a sample of ancestral sequences from this posterior distribution, and a list of the k-most likely ancestral sequences; (iii) FastML implements a large array of evolutionary models, which makes it generic and applicable for nucleotide, protein and codon sequences; and (iv) a graphical representation of the results is provided, including, for example, a graphical logo of the inferred ancestral sequences. The utility of FastML is demonstrated by reconstructing ancestral sequences of the Env protein from various HIV-1 subtypes. FastML is freely available for all academic users and is available online at http://fastml.tau.ac.il/.
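
    The indel-coding idea in point (i), where each gap is treated as one binary character regardless of how many sites it spans, can be sketched as follows. This is a simplified presence/absence coding written for illustration; it is not FastML's code, it ignores refinements such as marking nested gaps as missing data, and the function names and the toy alignment are assumptions.

```python
import re


def gap_runs(sequence):
    """Return every maximal run of '-' as a half-open (start, end) pair."""
    return [(m.start(), m.end()) for m in re.finditer(r"-+", sequence)]


def code_indels(alignment):
    """Code each distinct gap in the alignment as one binary character.

    alignment: dict mapping sequence name to an aligned string.
    Returns the sorted list of distinct gaps and, per sequence, a 0/1 vector
    saying whether that sequence carries exactly that gap.
    """
    gaps_per_seq = {name: set(gap_runs(seq)) for name, seq in alignment.items()}
    all_gaps = sorted(set().union(*gaps_per_seq.values()))
    matrix = {
        name: [1 if gap in gaps else 0 for gap in all_gaps]
        for name, gaps in gaps_per_seq.items()
    }
    return all_gaps, matrix


# Toy alignment (hypothetical data).
alignment = {"A": "AC--GT", "B": "ACTTGT", "C": "A---GT"}
gaps, matrix = code_indels(alignment)
print(gaps)
print(matrix)
```

    A binary matrix of this kind is the sort of input on which a two-state continuous-time Markov model of gain and loss can then be fitted along the tree.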

  17. Mutational and structural analysis of diffuse large B-cell lymphoma using whole genome sequencing | Office of Cancer Genomics

    Cancer.gov

    Abstract: Diffuse large B-cell lymphoma (DLBCL) is a genetically heterogeneous cancer comprising at least two molecular subtypes that differ in gene expression and distribution of mutations. Recently, application of genome/exome sequencing and RNA-seq to DLBCL has revealed numerous genes that are recurrent targets of somatic point mutation in this disease.

  18. BactoGeNIE: A large-scale comparative genome visualization for big displays

    SciTech Connect

    Aurisano, Jillian; Reda, Khairi; Johnson, Andrew; Marai, Elisabeta G.; Leigh, Jason

    2015-08-13

    The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. In conclusion, BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics.

  19. BactoGeNIE: a large-scale comparative genome visualization for big displays

    PubMed Central

    2015-01-01

    Background: The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. Results: In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. Conclusions: BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics. PMID:26329021

  20. BactoGeNIE: A large-scale comparative genome visualization for big displays

    DOE PAGES

    Aurisano, Jillian; Reda, Khairi; Johnson, Andrew; Marai, Elisabeta G.; Leigh, Jason

    2015-08-13

    The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. In conclusion, BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics.

  1. Ancestral Chromatin Configuration Constrains Chromatin Evolution on Differentiating Sex Chromosomes in Drosophila

    PubMed Central

    Zhou, Qi; Bachtrog, Doris

    2015-01-01

    Sex chromosomes evolve distinctive types of chromatin from a pair of ancestral autosomes that are usually euchromatic. In Drosophila, the dosage-compensated X becomes enriched for hyperactive chromatin in males (mediated by H4K16ac), while the Y chromosome acquires silencing heterochromatin (enriched for H3K9me2/3). Drosophila autosomes are typically mostly euchromatic but the small dot chromosome has evolved a heterochromatin-like milieu (enriched for H3K9me2/3) that permits the normal expression of dot-linked genes, but which is different from typical pericentric heterochromatin. In Drosophila busckii, the dot chromosomes have fused to the ancestral sex chromosomes, creating a pair of ‘neo-sex’ chromosomes. Here we collect genomic, transcriptomic and epigenomic data from D. busckii, to investigate the evolutionary trajectory of sex chromosomes from a largely heterochromatic ancestor. We show that the neo-sex chromosomes formed <1 million years ago, but nearly 60% of neo-Y linked genes have already become non-functional. Expression levels are generally lower for the neo-Y alleles relative to their neo-X homologs, and the silencing heterochromatin mark H3K9me2, but not H3K9me3, is significantly enriched on silenced neo-Y genes. Despite rampant neo-Y degeneration, we find that the neo-X is deficient for the canonical histone modification mark of dosage compensation (H4K16ac), relative to autosomes or the compensated ancestral X chromosome, possibly reflecting constraints imposed on evolving hyperactive chromatin in an originally heterochromatic environment. Yet, neo-X genes are transcriptionally more active in males, relative to females, suggesting the evolution of incipient dosage compensation on the neo-X. Our data show that Y degeneration proceeds quickly after sex chromosomes become established through genomic and epigenetic changes, and are consistent with the idea that the evolution of sex-linked chromatin is influenced by its ancestral configuration. PMID

  2. Ancestral Chromatin Configuration Constrains Chromatin Evolution on Differentiating Sex Chromosomes in Drosophila.

    PubMed

    Zhou, Qi; Bachtrog, Doris

    2015-06-01

    Sex chromosomes evolve distinctive types of chromatin from a pair of ancestral autosomes that are usually euchromatic. In Drosophila, the dosage-compensated X becomes enriched for hyperactive chromatin in males (mediated by H4K16ac), while the Y chromosome acquires silencing heterochromatin (enriched for H3K9me2/3). Drosophila autosomes are typically mostly euchromatic but the small dot chromosome has evolved a heterochromatin-like milieu (enriched for H3K9me2/3) that permits the normal expression of dot-linked genes, but which is different from typical pericentric heterochromatin. In Drosophila busckii, the dot chromosomes have fused to the ancestral sex chromosomes, creating a pair of 'neo-sex' chromosomes. Here we collect genomic, transcriptomic and epigenomic data from D. busckii, to investigate the evolutionary trajectory of sex chromosomes from a largely heterochromatic ancestor. We show that the neo-sex chromosomes formed <1 million years ago, but nearly 60% of neo-Y linked genes have already become non-functional. Expression levels are generally lower for the neo-Y alleles relative to their neo-X homologs, and the silencing heterochromatin mark H3K9me2, but not H3K9me3, is significantly enriched on silenced neo-Y genes. Despite rampant neo-Y degeneration, we find that the neo-X is deficient for the canonical histone modification mark of dosage compensation (H4K16ac), relative to autosomes or the compensated ancestral X chromosome, possibly reflecting constraints imposed on evolving hyperactive chromatin in an originally heterochromatic environment. Yet, neo-X genes are transcriptionally more active in males, relative to females, suggesting the evolution of incipient dosage compensation on the neo-X. Our data show that Y degeneration proceeds quickly after sex chromosomes become established through genomic and epigenetic changes, and are consistent with the idea that the evolution of sex-linked chromatin is influenced by its ancestral configuration.

  3. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity.

    PubMed

    Pope, Welkin H; Bowman, Charles A; Russell, Daniel A; Jacobs-Sera, Deborah; Asai, David J; Cresawn, Steven G; Jacobs, William R; Hendrix, Roger W; Lawrence, Jeffrey G; Hatfull, Graham F

    2015-01-01

    The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis was performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery. PMID:25919952

  4. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity.

    PubMed

    Pope, Welkin H; Bowman, Charles A; Russell, Daniel A; Jacobs-Sera, Deborah; Asai, David J; Cresawn, Steven G; Jacobs, William R; Hendrix, Roger W; Lawrence, Jeffrey G; Hatfull, Graham F

    2015-04-28

    The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis was performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery.

  5. Engineering large viral DNA genomes using the CRISPR-Cas9 system.

    PubMed

    Suenaga, Tadahiro; Kohyama, Masako; Hirayasu, Kouyuki; Arase, Hisashi

    2014-09-01

    Manipulation of viral genomes is essential for studying viral gene function and utilizing viruses for therapy. Several techniques for viral genome engineering have been developed. Homologous recombination in virus-infected cells has traditionally been used to edit viral genomes; however, the frequency of the expected recombination is quite low. Alternatively, large viral genomes have been edited using a bacterial artificial chromosome (BAC) plasmid system. However, cloning of large viral genomes into BAC plasmids is both laborious and time-consuming. In addition, because insertion of drug selection markers or parts of BAC plasmids into the viral genome can affect viral function, artificial genes sometimes need to be removed from edited viruses. Herpes simplex virus (HSV), a common DNA virus with a genome length of 152 kbp, causes herpes labialis, genital herpes and encephalitis. Mutant HSV is a candidate for oncotherapy, in which HSV is used to kill tumor cells. In this study, the clustered regularly interspaced short palindromic repeat-Cas9 system was used to very efficiently engineer HSV without inserting artificial genes into viral genomes. Not only gene-ablated HSV but also gene knock-in HSV were generated using this method. Furthermore, selection based on the phenotypes of edited genes improves the efficiency of isolating viral clones carrying the expected mutations. Because our method can be applied to other DNA viruses such as Epstein-Barr virus, cytomegaloviruses, vaccinia virus and baculovirus, our system will be useful for studying various types of viruses, including clinical isolates.

  6. Identification of repeat structure in large genomes using repeat probability clouds.

    PubMed

    Gu, Wanjun; Castoe, Todd A; Hedges, Dale J; Batzer, Mark A; Pollock, David D

    2008-09-01

    The identification of repeat structure in eukaryotic genomes can be time-consuming and difficult because of the large amount of information (approximately 3 × 10^9 bp) that needs to be processed and compared. We introduce a new approach based on exact word counts to evaluate, de novo, the repeat structure present within large eukaryotic genomes. This approach avoids sequence alignment and similarity search, two of the most time-consuming components of traditional methods for repeat identification. Algorithms were implemented to efficiently calculate exact counts for any length oligonucleotide in large genomes. Based on these oligonucleotide counts, oligonucleotide excess probability clouds, or "P-clouds," were constructed. P-clouds are composed of clusters of related oligonucleotides that occur, as a group, more often than expected by chance. After construction, P-clouds were mapped back onto the genome, and regions of high P-cloud density were identified as repetitive regions based on a sliding window approach. This efficient method is capable of analyzing the repeat content of the entire human genome on a single desktop computer in less than half a day, at least 10-fold faster than current approaches. The predicted repetitive regions strongly overlap with known repeat elements as well as other repetitive regions such as gene families, pseudogenes, and segmental duplicons. This method should be extremely useful as a tool for use in de novo identification of repeat structure in large newly sequenced genomes.
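
    Two of the steps described in this abstract, exact word counting and a sliding-window density scan, can be sketched in a few lines. This is a deliberately reduced illustration and not the P-clouds algorithm: it skips the clustering of related oligonucleotides into clouds and the comparison against chance expectation, and simply flags words whose exact count reaches a fixed cutoff. The function names, parameter names, and default values are assumptions.

```python
from collections import Counter


def count_words(sequence, k):
    """Exact counts of every length-k word (oligonucleotide) in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def repetitive_windows(sequence, k=4, min_count=3, window=8, min_density=0.5):
    """Start positions of windows dense in high-count words.

    A position is flagged when the word starting there occurs at least
    min_count times in the whole sequence; a window is reported when the
    fraction of flagged positions in it is at least min_density.
    """
    counts = count_words(sequence, k)
    flags = [
        counts[sequence[i:i + k]] >= min_count
        for i in range(len(sequence) - k + 1)
    ]
    hits = []
    for start in range(len(flags) - window + 1):
        density = sum(flags[start:start + window]) / window
        if density >= min_density:
            hits.append(start)
    return hits


# Toy sequences (hypothetical): a tandem repeat and a sequence with no repeated 4-mer.
print(repetitive_windows("ACGT" * 5))
print(repetitive_windows("ACGTTGCAAGCTTAGC"))
```

    Because only exact dictionary lookups are involved, no alignment or similarity search is needed, which is the property the abstract credits for the method's speed.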

  7. RNA–DNA differences in human mitochondria restore ancestral form of 16S ribosomal RNA

    PubMed Central

    Bar-Yaacov, Dan; Avital, Gal; Levin, Liron; Richards, Allison L.; Hachen, Naomi; Rebolledo Jaramillo, Boris; Nekrutenko, Anton; Zarivach, Raz; Mishmar, Dan

    2013-01-01

    RNA transcripts are generally identical to the underlying DNA sequences. Nevertheless, RNA–DNA differences (RDDs) were found in the nuclear human genome and in plants and animals but not in human mitochondria. Here, by deep sequencing of human mitochondrial DNA (mtDNA) and RNA, we identified three RDD sites at mtDNA positions 295 (C-to-U), 13710 (A-to-U, A-to-G), and 2617 (A-to-U, A-to-G). Position 2617, within the 16S rRNA, harbored the most prevalent RDDs (>30% A-to-U and ∼15% A-to-G of the reads in all tested samples). The 2617 RDDs appeared already at the precursor polycistronic mitochondrial transcript. By using traditional Sanger sequencing, we identified the A-to-U RDD in six different cell lines and representative primates (Gorilla gorilla, Pongo pygmaeus, and Macaca mulatta), suggesting conservation of the mechanism generating such RDD. Phylogenetic analysis of more than 1700 vertebrate mtDNA sequences supported a thymine as the primate ancestral allele at position 2617, suggesting that the 2617 RDD recapitulates the ancestral 16S rRNA. Modeling U or G (the RDDs) at position 2617 stabilized the large ribosomal subunit structure in contrast to destabilization by an A (the pre-RDDs). Hence, these mitochondrial RDDs are likely functional. PMID:23913925

  8. Radiation hybrid maps of D-genome of Aegilops tauschii and their application in sequence assembly of large and complex plant genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The large and complex genome of bread wheat (Triticum aestivum L., ~17 Gb) requires high-resolution genome maps saturated with ordered markers to assist in anchoring and orienting BAC contigs/ sequence scaffolds for whole genome sequence assembly. Radiation hybrid (RH) mapping has proven to be an e...

  9. The draft genome of the large yellow croaker reveals well-developed innate immunity.

    PubMed

    Wu, Changwen; Zhang, Di; Kan, Mengyuan; Lv, Zhengmin; Zhu, Aiyi; Su, Yongquan; Zhou, Daizhan; Zhang, Jianshe; Zhang, Zhou; Xu, Meiying; Jiang, Lihua; Guo, Baoying; Wang, Ting; Chi, Changfeng; Mao, Yong; Zhou, Jiajian; Yu, Xinxiu; Wang, Hailing; Weng, Xiaoling; Jin, Jason Gang; Ye, Junyi; He, Lin; Liu, Yun

    2014-01-01

    The large yellow croaker, Larimichthys crocea, is one of the most economically important marine fish species endemic to China. Its wild stocks have severely suffered from overfishing, and the aquacultured species are vulnerable to various marine pathogens. Here we report the creation of a draft genome of a wild large yellow croaker using a whole-genome sequencing strategy. We estimate the genome size to be 728 Mb with 19,362 protein-coding genes. Phylogenetic analysis shows that the stickleback is most closely related to the large yellow croaker. Rapidly evolving genes under positive selection are significantly enriched in pathways related to innate immunity. We also confirm the existence of several genes and identify the expansion of gene families that are important for innate immunity. Our results may reflect a well-developed innate immune system in the large yellow croaker, which could aid in the development of wild resource preservation and mariculture strategies. PMID:25407894

  10. The draft genome of the large yellow croaker reveals well-developed innate immunity.

    PubMed

    Wu, Changwen; Zhang, Di; Kan, Mengyuan; Lv, Zhengmin; Zhu, Aiyi; Su, Yongquan; Zhou, Daizhan; Zhang, Jianshe; Zhang, Zhou; Xu, Meiying; Jiang, Lihua; Guo, Baoying; Wang, Ting; Chi, Changfeng; Mao, Yong; Zhou, Jiajian; Yu, Xinxiu; Wang, Hailing; Weng, Xiaoling; Jin, Jason Gang; Ye, Junyi; He, Lin; Liu, Yun

    2014-01-01

    The large yellow croaker, Larimichthys crocea, is one of the most economically important marine fish species endemic to China. Its wild stocks have severely suffered from overfishing, and the aquacultured species are vulnerable to various marine pathogens. Here we report the creation of a draft genome of a wild large yellow croaker using a whole-genome sequencing strategy. We estimate the genome size to be 728 Mb with 19,362 protein-coding genes. Phylogenetic analysis shows that the stickleback is most closely related to the large yellow croaker. Rapidly evolving genes under positive selection are significantly enriched in pathways related to innate immunity. We also confirm the existence of several genes and identify the expansion of gene families that are important for innate immunity. Our results may reflect a well-developed innate immune system in the large yellow croaker, which could aid in the development of wild resource preservation and mariculture strategies.

  11. The draft genome of the large yellow croaker reveals well-developed innate immunity

    PubMed Central

    Wu, Changwen; Zhang, Di; Kan, Mengyuan; Lv, Zhengmin; Zhu, Aiyi; Su, Yongquan; Zhou, Daizhan; Zhang, Jianshe; Zhang, Zhou; Xu, Meiying; Jiang, Lihua; Guo, Baoying; Wang, Ting; Chi, Changfeng; Mao, Yong; Zhou, Jiajian; Yu, Xinxiu; Wang, Hailing; Weng, Xiaoling; Jin, Jason Gang; Ye, Junyi; He, Lin; Liu, Yun

    2014-01-01

    The large yellow croaker, Larimichthys crocea, is one of the most economically important marine fish species endemic to China. Its wild stocks have severely suffered from overfishing, and the aquacultured species are vulnerable to various marine pathogens. Here we report the creation of a draft genome of a wild large yellow croaker using a whole-genome sequencing strategy. We estimate the genome size to be 728 Mb with 19,362 protein-coding genes. Phylogenetic analysis shows that the stickleback is most closely related to the large yellow croaker. Rapidly evolving genes under positive selection are significantly enriched in pathways related to innate immunity. We also confirm the existence of several genes and identify the expansion of gene families that are important for innate immunity. Our results may reflect a well-developed innate immune system in the large yellow croaker, which could aid in the development of wild resource preservation and mariculture strategies. PMID:25407894

  12. Large-scale profiling of microRNAs for The Cancer Genome Atlas

    PubMed Central

    Chu, Andy; Robertson, Gordon; Brooks, Denise; Mungall, Andrew J.; Birol, Inanc; Coope, Robin; Ma, Yussanne; Jones, Steven; Marra, Marco A.

    2016-01-01

    The comprehensive multiplatform genomics data generated by The Cancer Genome Atlas (TCGA) Research Network is an enabling resource for cancer research. It includes an unprecedented amount of microRNA sequence data: ∼11 000 libraries across 33 cancer types. Combined with initiatives like the National Cancer Institute Genomics Cloud Pilots, such data resources will make intensive analysis of large-scale cancer genomics data widely accessible. To support such initiatives, and to enable comparison of TCGA microRNA data to data from other projects, we describe the process that we developed and used to generate the microRNA sequence data, from library construction through to submission of data to repositories. In the context of this process, we describe the computational pipeline that we used to characterize microRNA expression across large patient cohorts. PMID:26271990

  13. Compression of Large genomic datasets using COMRAD on Parallel Computing Platform

    PubMed Central

    Biji, Christopher Leela; Madhu, Manu K; Vishnu, Vineetha; K, Satheesh Kumar; Vijayakumar; Nair, Achuthsankar S

    2015-01-01

    Big data storage is a challenge in the post-genome era. Hence, there is a need for high-performance computing solutions for managing large genomic data. It is therefore of interest to describe a parallel-computing approach that uses a message-passing library to distribute the different compression stages across a cluster. Genomic compression helps to reduce the on-disk “footprint” of large volumes of sequence data, which supports the computational infrastructure needed for more efficient archiving. In this report, the approach was applied to 21 eukaryotic genomes selected by stratified sampling. The method achieves an average 6-fold disk space reduction with three times better compression time than COMRAD. Availability: The source code is written in C using message-passing libraries and is available at https://sourceforge.net/projects/comradmpi/files/COMRADMPI/ PMID:26124572

  14. Large-scale profiling of microRNAs for The Cancer Genome Atlas.

    PubMed

    Chu, Andy; Robertson, Gordon; Brooks, Denise; Mungall, Andrew J; Birol, Inanc; Coope, Robin; Ma, Yussanne; Jones, Steven; Marra, Marco A

    2016-01-01

    The comprehensive multiplatform genomics data generated by The Cancer Genome Atlas (TCGA) Research Network is an enabling resource for cancer research. It includes an unprecedented amount of microRNA sequence data: ~11 000 libraries across 33 cancer types. Combined with initiatives like the National Cancer Institute Genomics Cloud Pilots, such data resources will make intensive analysis of large-scale cancer genomics data widely accessible. To support such initiatives, and to enable comparison of TCGA microRNA data to data from other projects, we describe the process that we developed and used to generate the microRNA sequence data, from library construction through to submission of data to repositories. In the context of this process, we describe the computational pipeline that we used to characterize microRNA expression across large patient cohorts.

  15. Novel Porcine Epidemic Diarrhea Virus Variant with Large Genomic Deletion, South Korea

    PubMed Central

    Park, Seongjun; Kim, Sanghyun; Song, Daesub

    2014-01-01

    Since 1992, porcine epidemic diarrhea virus (PEDV) has been one of the most common porcine diarrhea–associated viruses in South Korea. We conducted a large-scale investigation of the incidence of PEDV in pigs with diarrhea in South Korea and consequently identified and characterized a novel PEDV variant with a large genomic deletion. PMID:25424875

  16. Recreating a functional ancestral archosaur visual pigment.

    PubMed

    Chang, Belinda S W; Jönsson, Karolina; Kazmi, Manija A; Donoghue, Michael J; Sakmar, Thomas P

    2002-09-01

    The ancestors of the archosaurs, a major branch of the diapsid reptiles, originated more than 240 MYA near the dawn of the Triassic Period. We used maximum likelihood phylogenetic ancestral reconstruction methods and explored different models of evolution for inferring the amino acid sequence of a putative ancestral archosaur visual pigment. Three different types of maximum likelihood models were used: nucleotide-based, amino acid-based, and codon-based models. Where possible, within each type of model, likelihood ratio tests were used to determine which model best fit the data. Ancestral reconstructions of the ancestral archosaur node using the best-fitting models of each type were found to be in agreement, except for three amino acid residues at which one reconstruction differed from the other two. To determine if these ancestral pigments would be functionally active, the corresponding genes were chemically synthesized and then expressed in a mammalian cell line in tissue culture. The expressed artificial genes were all found to bind to 11-cis-retinal to yield stable photoactive pigments with lambda(max) values of about 508 nm, which is slightly redshifted relative to that of extant vertebrate pigments. The ancestral archosaur pigments also activated the retinal G protein transducin, as measured in a fluorescence assay. Our results show that ancestral genes from ancient organisms can be reconstructed de novo and tested for function using a combination of phylogenetic and biochemical methods. PMID:12200476

  18. Feasibility of Large-Scale Genomic Testing to Facilitate Enrollment Onto Genomically Matched Clinical Trials

    PubMed Central

    Meric-Bernstam, Funda; Brusco, Lauren; Shaw, Kenna; Horombe, Chacha; Kopetz, Scott; Davies, Michael A.; Routbort, Mark; Piha-Paul, Sarina A.; Janku, Filip; Ueno, Naoto; Hong, David; De Groot, John; Ravi, Vinod; Li, Yisheng; Luthra, Raja; Patel, Keyur; Broaddus, Russell; Mendelsohn, John; Mills, Gordon B.

    2015-01-01

    Purpose: We report the experience with 2,000 consecutive patients with advanced cancer who underwent testing on a genomic testing protocol, including the frequency of actionable alterations across tumor types, subsequent enrollment onto clinical trials, and the challenges for trial enrollment. Patients and Methods: Standardized hotspot mutation analysis was performed in 2,000 patients, using either an 11-gene (251 patients) or a 46- or 50-gene (1,749 patients) multiplex platform. Thirty-five genes were considered potentially actionable based on their potential to be targeted with approved or investigational therapies. Results: Seven hundred eighty-nine patients (39%) had at least one mutation in potentially actionable genes. Eighty-three patients (11%) with potentially actionable mutations went on genotype-matched trials targeting these alterations. Of 230 patients with PIK3CA/AKT1/PTEN/BRAF mutations that returned for therapy, 116 (50%) received a genotype-matched drug. Forty patients (17%) were treated on a genotype-selected trial requiring a mutation for eligibility, 16 (7%) were treated on a genotype-relevant trial targeting a genomic alteration without biomarker selection, and 40 (17%) received a genotype-relevant drug off trial. Challenges to trial accrual included patient preference of noninvestigational treatment or local treatment, poor performance status or other reasons for trial ineligibility, lack of trials/slots, and insurance denial. Conclusion: Broad implementation of multiplex hotspot testing is feasible; however, only a small portion of patients with actionable alterations were actually enrolled onto genotype-matched trials. Increased awareness of therapeutic implications and access to novel therapeutics are needed to optimally leverage results from broad-based genomic testing. PMID:26014291

  19. Multiway admixture deconvolution using phased or unphased ancestral panels.

    PubMed

    Churchhouse, Claire; Marchini, Jonathan

    2013-01-01

    We describe a novel method for inferring the local ancestry of admixed individuals from dense genome-wide single nucleotide polymorphism data. The method, called MULTIMIX, allows multiple source populations, models population linkage disequilibrium between markers and is applicable to datasets in which the sample and source populations are either phased or unphased. The model is based upon a hidden Markov model of switches in ancestry between consecutive windows of loci. We model the observed haplotypes within each window using a multivariate normal distribution with parameters estimated from the ancestral panels. We present three methods to fit the model: Markov chain Monte Carlo sampling, the Expectation Maximization algorithm, and a Classification Expectation Maximization algorithm. The performance of our method on individuals simulated to be admixed with European and West African ancestry shows it to be comparable to HAPMIX, with the ancestry calls of the two methods agreeing at 99.26% of loci across the three parameter groups. In addition to being faster than HAPMIX, it is also found to perform well over a range of extents of admixture in a simulation involving three ancestral populations. In an analysis of real data, we estimate the contribution of European, West African and Native American ancestry to each locus in the Mexican samples of HapMap, giving estimates of ancestral proportions that are consistent with those previously reported. PMID:23136122

  20. Inference of Ancestral Recombination Graphs through Topological Data Analysis

    PubMed Central

    Cámara, Pablo G.; Levine, Arnold J.; Rabadán, Raúl

    2016-01-01

    The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relationships within and across species. Recombination, reassortment and horizontal gene transfer constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, they are computationally costly to reconstruct, usually being infeasible for more than a few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build upon previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations, human recombination, and horizontal evolution in finches inhabiting the Galápagos Islands. PMID:27532298

  1. Inference of Ancestral Recombination Graphs through Topological Data Analysis.

    PubMed

    Cámara, Pablo G; Levine, Arnold J; Rabadán, Raúl

    2016-08-01

    The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relationships within and across species. Recombination, reassortment and horizontal gene transfer constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, they are computationally costly to reconstruct, usually being infeasible for more than a few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build upon previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations, human recombination, and horizontal evolution in finches inhabiting the Galápagos Islands. PMID:27532298

  2. Software engineering the mixed model for genome-wide association studies on large samples

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample siz...

  3. Efficient de novo assembly of large genomes using compressed data structures

    PubMed Central

    Simpson, Jared T.; Durbin, Richard

    2012-01-01

    De novo genome sequence assembly is important both to generate new sequence assemblies for previously uncharacterized genomes and to identify the genome sequence of individuals in a reference-unbiased way. We present memory efficient data structures and algorithms for assembly using the FM-index derived from the compressed Burrows-Wheeler transform, and a new assembler based on these called SGA (String Graph Assembler). We describe algorithms to error-correct, assemble, and scaffold large sets of sequence data. SGA uses the overlap-based string graph model of assembly, unlike most de novo assemblers that rely on de Bruijn graphs, and is simply parallelizable. We demonstrate the error correction and assembly performance of SGA on 1.2 billion sequence reads from a human genome, which we are able to assemble using 54 GB of memory. The resulting contigs are highly accurate and contiguous, while covering 95% of the reference genome (excluding contigs <200 bp in length). Because of the low memory requirements and parallelization without requiring inter-process communication, SGA provides the first practical assembler to our knowledge for a mammalian-sized genome on a low-end computing cluster. PMID:22156294
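
    The FM-index mentioned in this abstract can be illustrated with a toy example. The sketch below is not SGA's implementation: it builds the Burrows-Wheeler transform by sorting rotations (quadratic, suitable only for short strings) and stores uncompressed occurrence tables, and the names `bwt`, `FMIndex` and `count` are illustrative assumptions. It shows only the backward-search step that lets such an index count occurrences of a pattern without scanning the text.

```python
def bwt(text):
    """Burrows-Wheeler transform of text, using '$' as a unique end marker.

    Toy construction by sorting all rotations; real tools use suffix-array
    algorithms and compressed storage instead.
    """
    assert "$" not in text
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


class FMIndex:
    """Minimal, uncompressed FM-index supporting exact-match counting."""

    def __init__(self, text):
        self.bwt = bwt(text)
        counts = {}
        for ch in self.bwt:
            counts[ch] = counts.get(ch, 0) + 1
        # first[c]: number of characters in the text that sort before c
        self.first = {}
        total = 0
        for ch in sorted(counts):
            self.first[ch] = total
            total += counts[ch]
        # occ[c][i]: number of occurrences of c in bwt[:i]
        self.occ = {ch: [0] * (len(self.bwt) + 1) for ch in counts}
        for i, ch in enumerate(self.bwt):
            for c, table in self.occ.items():
                table[i + 1] = table[i] + (1 if c == ch else 0)

    def count(self, pattern):
        """Count occurrences of pattern by backward search over the BWT."""
        lo, hi = 0, len(self.bwt)
        for ch in reversed(pattern):
            if ch not in self.first:
                return 0
            lo = self.first[ch] + self.occ[ch][lo]
            hi = self.first[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0
        return hi - lo


if __name__ == "__main__":
    index = FMIndex("GATTACAGATTACA")
    print(index.count("GATTACA"), index.count("GGG"))
```

    Each pattern character costs two table lookups, so a query takes time proportional to the pattern length rather than to the indexed text, which is the property that makes this family of indexes attractive for large read sets.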

  4. Vertebrate Protein CTCF and its Multiple Roles in a Large-Scale Regulation of Genome Activity

    PubMed Central

    Nikolaev, L.G; Akopov, S.B; Didych, D.A; Sverdlov, E.D

    2009-01-01

    The CTCF transcription factor is a multifunctional protein with 11 zinc fingers that uses different zinc finger combinations to recognize and bind different sites within DNA. CTCF is thought to participate in various gene regulatory networks including transcription activation and repression, formation of independently functioning chromatin domains and regulation of imprinting. Sequencing of human and other genomes opened up the possibility to ascertain the genomic distribution of CTCF binding sites and to identify CTCF-dependent cis-regulatory elements, including insulators. In this review, we summarize recent data on the genomic distribution of CTCF binding sites in the human and other genomes within the framework of the loop domain hypothesis of large-scale regulation of genome activity. We also try to formulate possible lines of study on a variety of CTCF functions which probably depend on its ability to specifically bind DNA, interact with other proteins and form di- and multimers. These three fundamental properties allow CTCF to serve as a transcription factor, an insulator and a constitutive dispersed genome-wide demarcation tool able to recruit various factors that emerge in response to diverse external and internal signals, and thus to exert its signal-specific function(s). PMID:20119526

  5. Vertebrate Protein CTCF and its Multiple Roles in a Large-Scale Regulation of Genome Activity.

    PubMed

    Nikolaev, L G; Akopov, S B; Didych, D A; Sverdlov, E D

    2009-08-01

    The CTCF transcription factor is a multifunctional protein with 11 zinc fingers that uses different zinc finger combinations to recognize and bind different sites within DNA. CTCF is thought to participate in various gene regulatory networks including transcription activation and repression, formation of independently functioning chromatin domains and regulation of imprinting. Sequencing of human and other genomes opened up the possibility to ascertain the genomic distribution of CTCF binding sites and to identify CTCF-dependent cis-regulatory elements, including insulators. In this review, we summarize recent data on the genomic distribution of CTCF binding sites in the human and other genomes within the framework of the loop domain hypothesis of large-scale regulation of genome activity. We also try to formulate possible lines of study on a variety of CTCF functions which probably depend on its ability to specifically bind DNA, interact with other proteins and form di- and multimers. These three fundamental properties allow CTCF to serve as a transcription factor, an insulator and a constitutive dispersed genome-wide demarcation tool able to recruit various factors that emerge in response to diverse external and internal signals, and thus to exert its signal-specific function(s). PMID:20119526

  6. Digital genotyping of sorghum – a diverse plant species with a large repeat-rich genome

    PubMed Central

    2013-01-01

    Background: Rapid acquisition of accurate genotyping information is essential for all genetic marker-based studies. For species with relatively small genomes, complete genome resequencing is a feasible approach for genotyping; however, for species with large and highly repetitive genomes, the acquisition of whole genome sequences for the purpose of genotyping is still relatively inefficient and too expensive to be carried out on a high-throughput basis. Sorghum bicolor is a C4 grass with a sequenced genome size of ~730 Mb, of which ~80% is highly repetitive. We have developed a restriction enzyme-targeted genome resequencing method for genetic analysis, termed Digital Genotyping (DG), to be applied to sorghum and other grass species with large repeat-rich genomes. Results: DG templates are generated using one of three methylation-sensitive restriction enzymes that recognize a nested set of 4, 6 or 8 bp GC-rich sequences, enabling varying depth of analysis and integration of results among assays. Variation in sequencing efficiency among DG markers was correlated with template GC-content and length. The expected DG allele sequence was obtained 97.3% of the time with a ratio of expected to alternative allele sequence acquisition of >20:1. A genetic map aligned to the sorghum genome sequence with an average resolution of 1.47 cM was constructed using 1,772 DG markers from 137 recombinant inbred lines. The DG map enhanced the detection of QTL for variation in plant height and precisely aligned QTL such as Dw3 to underlying genes/alleles. Higher-resolution NgoMIV-based DG haplotypes were used to trace the origin of DNA on SBI-06, spanning Ma1 and Dw2 from progenitors to BTx623 and IS3620C. DG marker analysis identified the correct location of two misassembled regions and located seven super contigs in the sorghum reference genome sequence. Conclusion: DG technology provides a cost-effective approach to rapidly generate accurate genotyping data in sorghum. Currently

  7. FVGWAS: Fast Voxelwise Genome Wide Association Analysis of Large-scale Imaging Genetic Data

    PubMed Central

    Huang, Meiyan; Nichols, Thomas; Huang, Chao; Yang, Yu; Lu, Zhaohua; Feng, Qianjing; Knickmeyer, Rebecca C; Zhu, Hongtu

    2015-01-01

    Large-scale imaging genetic studies are increasingly being conducted to collect a rich set of imaging, genetic, and clinical data to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. Several major big-data challenges arise from testing genome-wide ($N_C$ > 12 million known variants) associations with signals at millions of locations ($N_V \sim 10^6$) in the brain from thousands of subjects ($n \sim 10^3$). The aim of this paper is to develop a Fast Voxelwise Genome Wide Association analysiS (FVGWAS) framework to efficiently carry out whole-genome analyses of whole-brain data. FVGWAS consists of three components including a heteroscedastic linear model, a global sure independence screening (G-SIS) procedure, and a detection procedure based on wild bootstrap methods. Specifically, for standard linear association, the computational complexity is $O(n N_V N_C)$ for the voxelwise genome wide association analysis (VGWAS) method compared with $O((N_C + N_V) n^2)$ for FVGWAS. Simulation studies show that FVGWAS is an efficient method of searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. Finally, we have successfully applied FVGWAS to a large-scale imaging genetic data analysis of ADNI data with 708 subjects, 193,275 voxels in RAVENS maps, and 501,584 SNPs, and the total processing time was 203,645 seconds for a single CPU. Our FVGWAS may be a valuable statistical toolbox for large-scale imaging genetic analysis as the field is rapidly advancing with ultra-high-resolution imaging and whole-genome sequencing. PMID:26025292
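
    To give a feel for the two complexity expressions quoted in this abstract, the sketch below plugs the reported ADNI dimensions (708 subjects, 193,275 voxels, 501,584 SNPs) into n·N_V·N_C and (N_C + N_V)·n². The function names are illustrative assumptions, and because big-O notation hides constant factors, the resulting ratio indicates relative growth only, not measured run time.

```python
def vgwas_ops(n, n_voxels, n_snps):
    """Leading-order term for voxelwise GWAS: n * N_V * N_C."""
    return n * n_voxels * n_snps


def fvgwas_ops(n, n_voxels, n_snps):
    """Leading-order term for FVGWAS: (N_C + N_V) * n^2."""
    return (n_snps + n_voxels) * n ** 2


# Dimensions of the ADNI analysis as reported in the abstract.
N_SUBJECTS = 708
N_VOXELS = 193_275
N_SNPS = 501_584

ratio = vgwas_ops(N_SUBJECTS, N_VOXELS, N_SNPS) / fvgwas_ops(
    N_SUBJECTS, N_VOXELS, N_SNPS
)

if __name__ == "__main__":
    print(f"VGWAS term:  {vgwas_ops(N_SUBJECTS, N_VOXELS, N_SNPS):.3e}")
    print(f"FVGWAS term: {fvgwas_ops(N_SUBJECTS, N_VOXELS, N_SNPS):.3e}")
    print(f"ratio:       {ratio:.1f}")
```

    The ratio simplifies to N_V·N_C / ((N_C + N_V)·n), so the advantage grows with the number of voxels and variants and shrinks as the number of subjects increases.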

  8. Insertion sequence-caused large-scale rearrangements in the genome of Escherichia coli

    PubMed Central

    Lee, Heewook; Doak, Thomas G.; Popodi, Ellen; Foster, Patricia L.; Tang, Haixu

    2016-01-01

    A majority of large-scale bacterial genome rearrangements involve mobile genetic elements such as insertion sequence (IS) elements. Here we report novel insertions and excisions of IS elements and recombination between homologous IS elements identified in a large collection of Escherichia coli mutation accumulation lines by analysis of whole genome shotgun sequencing data. Based on 857 identified events (758 IS insertions, 98 recombinations and 1 excision), we estimate that the rate of IS insertion is 3.5 × 10⁻⁴ insertions per genome per generation and the rate of IS homologous recombination is 4.5 × 10⁻⁵ recombinations per genome per generation. These events are mostly contributed by the IS elements IS1, IS2, IS5 and IS186. Spatial analysis of new insertions suggests that transposition is biased to proximal insertions, and the length spectrum of IS-caused deletions is largely explained by local hopping. For any of the ISs studied there is no region of the circular genome that is favored or disfavored for new insertions but there are notable hotspots for deletions. Some elements have preferences for non-coding sequence or for the beginning and end of coding regions, largely explained by target site motifs. Interestingly, transposition and deletion rates remain constant across the wild-type and 12 mutant E. coli lines, each deficient in a distinct DNA repair pathway. Finally, we characterized the target sites of four IS families, confirming previous results and characterizing a highly specific pattern at IS186 target-sites, 5′-GGGG(N6/N7)CCCC-3′. We also detected 48 long deletions not involving IS elements. PMID:27431326
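
    The IS186 target-site pattern quoted above, 5′-GGGG(N6/N7)CCCC-3′, translates directly into a regular expression. The sketch below is an illustration only, not the authors' pipeline: the function name `find_is186_like_sites` is hypothetical, N is assumed to mean any of A, C, G or T, and the example sequences are invented.

```python
import re

# GGGG, then a spacer of 6 or 7 arbitrary bases, then CCCC.
# The lookahead lets overlapping candidate sites be reported as well.
IS186_LIKE = re.compile(r"(?=(GGGG[ACGT]{6,7}CCCC))")


def find_is186_like_sites(sequence):
    """Return (0-based start, matched text) for each motif occurrence."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1)) for m in IS186_LIKE.finditer(sequence)]


if __name__ == "__main__":
    # Invented test sequences with a 6-base and a 7-base spacer.
    print(find_is186_like_sites("TTGGGGACGTATCCCCAA"))
    print(find_is186_like_sites("AAGGGGACGTATACCCCTT"))
```

    Because the reverse complement of GGGG(N)CCCC is again GGGG(N)CCCC, a site on one strand also matches when the opposite strand is read, so scanning a single strand is enough for this particular pattern.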

  9. Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing

    PubMed Central

    2013-01-01

    Background: Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Results: Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by the Amazon Web Service. The time includes the import and export of the data using Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log the running metrics in data processing and monitoring multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Conclusions: Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of

  10. Bringing large-scale multiple genome analysis one step closer: ScalaBLAST and beyond

    SciTech Connect

    Oehmen, Christopher S.; Sofia, Heidi J.; Baxter, Douglas; Szeto, Ernest; Hugenholtz, Philip; Kyrpides, Nikos; Markowitz, Victor; Straatsma, Tjerk P.

    2007-06-01

    Genome sequence comparisons of exponentially growing data sets form the foundation for the comparative analysis tools provided by community biological data resources such as the Integrated Microbial Genome (IMG) system at the Joint Genome Institute (JGI). We present an example of how ScalaBLAST, a high-throughput sequence analysis program, harnesses increasingly critical high-performance computing to perform sequence analysis, which is a critical component of maintaining a state-of-the-art sequence data repository. The Integrated Microbial Genomes (IMG) system is a data management and analysis platform for microbial genomes hosted at the JGI. IMG contains both draft and complete JGI genomes integrated with other publicly available microbial genomes of all three domains of life. IMG provides tools and viewers for interactive analysis of genomes, genes and functions, individually or in a comparative context. Most of these tools are based on pre-computed pairwise sequence similarities involving millions of genes. These computations are becoming prohibitively time consuming with the rapid increase in the number of newly sequenced genomes incorporated into IMG and the need to refresh regularly the content of IMG in order to reflect changes in the annotations of existing genomes. Thus, building IMG 2.0 (released on December 1st, 2006) entailed reloading from NCBI's RefSeq all the genomes in the previous version of IMG (IMG 1.6, as of September 1st, 2006) together with 1,541 new public microbial, viral and eukaryal genomes, bringing the total of IMG genomes to 2,301. A critical part of building IMG 2.0 involved using PNNL ScalaBLAST software for computing pairwise similarities for over 2.2 million genes in under 26 hours on 1,000 processors, thus illustrating the impact that new generation bioinformatics tools are poised to make in biology. The BLAST algorithm is a familiar bioinformatics application for computing sequence similarity, and has become a workhorse in large

  11. Final report. Human artificial episomal chromosome (HAEC) for building large genomic libraries

    SciTech Connect

    Jean-Michael H. Vos

    1999-12-09

    Collections of human DNA fragments are maintained for research purposes as clones in bacterial host cells. However, for unknown reasons, some regions of the human genome appear to be unclonable or unstable in bacteria. The team has developed a system using episomes (extrachromosomal, autonomously replicating DNA) that maintains large DNA fragments in human cells. This human artificial episomal chromosome (HAEC) system may prove useful for coverage of these especially difficult regions. In the broader biomedical community, the HAEC system also shows promise for use in functional genomics and gene therapy. Recent improvements to the HAEC system and its application to mapping, sequencing, and functionally studying human and mouse DNA are summarized. Mapping and sequencing the human genome and model organisms are only the first steps in determining the function of various genetic units critical for gene regulation, DNA replication, chromatin packaging, chromosomal stability, and chromatid segregation. Such studies will require the ability to transfer and manipulate entire functional units into mammalian cells.

  12. Biological Consequences of Ancient Gene Acquisition and Duplication in the Large Genome of Candidatus Solibacter usitatus Ellin6076

    SciTech Connect

    Challacombe, Jean F; Eichorst, Stephanie A; Hauser, Loren John; Land, Miriam L; Xie, Gary; Kuske, Cheryl R

    2011-01-01

    Members of the bacterial phylum Acidobacteria are widespread in soils and sediments worldwide, and are abundant in many soils. Acidobacteria are challenging to culture in vitro, and many basic features of their biology and functional roles in the soil have not been determined. Candidatus Solibacter usitatus strain Ellin6076 has a 9.9 Mb genome that is approximately 2-5 times as large as the other sequenced Acidobacteria genomes. Bacterial genome sizes typically range from 0.5 to 10 Mb and are influenced by gene duplication, horizontal gene transfer, gene loss and other evolutionary processes. Our comparative genome analyses indicate that the Ellin6076 large genome has arisen by horizontal gene transfer via ancient bacteriophage and/or plasmid-mediated transduction, and widespread small-scale gene duplications, resulting in an increased number of paralogs. Low amino acid sequence identities among functional group members, and lack of conserved gene order and orientation in regions containing similar groups of paralogs, suggest that most of the paralogs are not the result of recent duplication events. The genome sizes of additional cultured Acidobacteria strains were estimated using pulsed-field gel electrophoresis to determine the prevalence of the large genome trait within the phylum. Members of subdivision 3 had larger genomes than those of subdivision 1, but none were as large as the Ellin6076 genome. The large genome of Ellin6076 may not be typical of the phylum, and encodes traits that could provide a selective metabolic, defensive and regulatory advantage in the soil environment.

  13. The Dunaliella salina organelle genomes: large sequences, inflated with intronic and intergenic DNA

    SciTech Connect

    Smith, David R.; Lee, Robert W.; Cushman, John C.; Magnuson, Jon K.; Tran, Duc; Polle, Juergen E.

    2010-05-07

    Background: Dunaliella salina Teodoresco, a unicellular, halophilic green alga belonging to the Chlorophyceae, is among the most industrially important microalgae. This is because D. salina can produce massive amounts of β-carotene, which can be collected for commercial purposes, and because of its potential as a feedstock for biofuels production. Although the biochemistry and physiology of D. salina have been studied in great detail, virtually nothing is known about the genomes it carries, especially those within its mitochondrion and plastid. This study presents the complete mitochondrial and plastid genome sequences of D. salina and compares them with those of the model green algae Chlamydomonas reinhardtii and Volvox carteri. Results: The D. salina organelle genomes are large, circular-mapping molecules with ~60% noncoding DNA, placing them among the most inflated organelle DNAs sampled from the Chlorophyta. In fact, the D. salina plastid genome, at 269 kb, is the largest complete plastid DNA (ptDNA) sequence currently deposited in GenBank, and both the mitochondrial and plastid genomes have unprecedentedly high intron densities for organelle DNA: ~1.5 and ~0.4 introns per gene, respectively. Moreover, what appear to be the relics of genes, introns, and intronic open reading frames are found scattered throughout the intergenic ptDNA regions -- a trait without parallel in other characterized organelle genomes and one that gives insight into the mechanisms and modes of expansion of the D. salina ptDNA. Conclusions: These findings confirm the notion that chlamydomonadalean algae have some of the most extreme organelle genomes of all eukaryotes. They also suggest that the events giving rise to the expanded ptDNA architecture of D. salina and other Chlamydomonadales may have occurred early in the evolution of this lineage. Although interesting from a genome evolution standpoint, the D. salina organelle DNA sequences will aid in the development of a viable

  14. Hyper-expansion of large DNA segments in the genome of kuruma shrimp, Marsupenaeus japonicus

    PubMed Central

    2010-01-01

    Background Higher crustaceans (class Malacostraca) represent the most species-rich and morphologically diverse group of non-insect arthropods and many of its members are commercially important. Although the crustacean DNA sequence information is growing exponentially, little is known about the genome organization of Malacostraca. Here, we constructed a bacterial artificial chromosome (BAC) library and performed BAC-end sequencing to provide genomic information for kuruma shrimp (Marsupenaeus japonicus), one of the most widely cultured species among crustaceans, and found the presence of a redundant sequence in the BAC library. We examined the BAC clone that includes the redundant sequence to further analyze its length, copy number and location in the kuruma shrimp genome. Results The Mj024A04 BAC clone, which includes one redundant sequence, contained 27 putative genes and seemed to display a normal genomic DNA structure. Notably, of the putative genes, 3 genes encode proteins homologous to the inhibitor of apoptosis protein and 7 genes encode proteins homologous to proteins of white spot syndrome virus, a virulent pathogen known to affect crustaceans. Colony hybridization and PCR analysis of 381 BAC clones showed that almost half of the BAC clones maintain DNA segments whose sequences are homologous to the representative BAC clone Mj024A04. The Mj024A04 partial sequence was detected multiple times in the kuruma shrimp nuclear genome with a calculated copy number of at least 100. Microsatellite-based BAC genotyping clearly showed that Mj024A04 homologous sequences were cloned from at least 48 different chromosomal loci. The absence of micro-syntenic relationships with the available genomic sequences of Daphnia and Drosophila suggests the uniqueness of these fragments in kuruma shrimp from current arthropod genome sequences. Conclusions Our results demonstrate that hyper-expansion of large DNA segments took place in the kuruma shrimp genome. Although we analyzed only a part of the

  15. The large (134.9 kb) mitochondrial genome of the glomeromycete Funneliformis mosseae.

    PubMed

    Nadimi, Maryam; Stefani, Franck O P; Hijri, Mohamed

    2016-10-01

    Funneliformis mosseae is among the most ecologically and economically important glomeromycete species and occurs both in natural and disturbed areas in a wide range of habitats and climates. In this study, we report the sequencing of the complete mitochondrial (mt) genome of F. mosseae isolate FL299 using 454 pyrosequencing and Illumina HiSeq technologies. This mt genome is a full-length circular chromosome of 134,925 bp, placing it among the largest mitochondrial DNAs (mtDNAs) in the fungal kingdom. A comparative analysis with publicly available arbuscular mycorrhizal fungal mtDNAs revealed that the mtDNA of F. mosseae FL299 contained a very large number of insertions contributing to its expansion. The gene synteny was completely reshuffled compared to previously published glomeromycotan mtDNAs and several genes were oriented in an anti-sense direction. Furthermore, the presence of different types of introns and insertions in rnl (14 introns) made this gene very distinctive in Glomeromycota. The presence of alternative genetic codes in both initiation (GUG) and termination (UGA) codons was another new feature in this mtDNA compared to previously published glomeromycotan mt genomes. The phylogenetic analysis inferred from 14 mt protein-coding genes confirmed the position of the Glomeromycota clade as a sister group of Mortierellomycotina. This mt genome is the largest observed so far in Glomeromycota and the first mt genome within the Funneliformis clade, providing new opportunities to better understand their evolution and to develop molecular markers.

  16. CyanoGEBA: A Better Understanding of Cyanobacterial Diversity through Large-scale Genomics (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

    SciTech Connect

    Shih, Patrick

    2012-03-22

    Patrick Shih, representing both the University of California, Berkeley and JGI, gives a talk titled "CyanoGEBA: A Better Understanding of Cyanobacterial Diversity through Large-scale Genomics" at the JGI 7th Annual Users Meeting: Genomics of Energy & Environment Meeting on March 22, 2012 in Walnut Creek, California.

  17. CyanoGEBA: A Better Understanding of Cyanobacterial Diversity through Large-scale Genomics (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

    ScienceCinema

    Shih, Patrick [Kerfeld Lab, UC Berkeley and JGI]

    2016-07-12

    Patrick Shih, representing both the University of California, Berkeley and JGI, gives a talk titled "CyanoGEBA: A Better Understanding of Cyanobacterial Diversity through Large-scale Genomics" at the JGI 7th Annual Users Meeting: Genomics of Energy & Environment Meeting on March 22, 2012 in Walnut Creek, California.

  18. Gene order in rosid phylogeny, inferred from pairwise syntenies among extant genomes

    PubMed Central

    2012-01-01

    Background Ancestral gene order reconstruction for flowering plants has lagged behind developments in yeasts, insects and higher animals, because of the recency of widespread plant genome sequencing, sequencers' embargoes on public data use, paralogies due to whole genome duplication (WGD) and fractionation of undeleted duplicates, extensive paralogy from other sources, and the computational cost of existing methods. Results We address these problems, using the gene order of four core eudicot genomes (cacao, castor bean, papaya and grapevine) that have escaped any recent WGD events, and two others (poplar and cucumber) that descend from independent WGDs, in inferring the ancestral gene order of the rosid clade and those of its main subgroups, the fabids and malvids. We improve and adapt techniques including the OMG method for extracting large, paralogy-free, multiple orthologies from conflated pairwise synteny data among the six genomes and the PATHGROUPS approach for ancestral gene order reconstruction in a given phylogeny, where some genomes may be descendants of WGD events. We use the gene order evidence to evaluate the hypothesis that the order Malpighiales belongs to the malvids rather than as traditionally assigned to the fabids. Conclusions Gene orders of ancestral eudicot species, involving 10,000 or more genes can be reconstructed in an efficient, parsimonious and consistent way, despite paralogies due to WGD and other processes. Pairwise genomic syntenies provide appropriate input to a parameter-free procedure of multiple ortholog identification followed by gene-order reconstruction in solving instances of the "small phylogeny" problem. PMID:22759433

  19. The Use of Weighted Graphs for Large-Scale Genome Analysis

    PubMed Central

    Zhou, Fang; Toivonen, Hannu; King, Ross D.

    2014-01-01

    There is an acute need for better tools to extract knowledge from the growing flood of sequence data. For example, thousands of complete genomes have been sequenced, and their metabolic networks inferred. Such data should enable a better understanding of evolution. However, most existing network analysis methods are based on pair-wise comparisons, and these do not scale to thousands of genomes. Here we propose the use of weighted graphs as a data structure to enable large-scale phylogenetic analysis of networks. We have developed three types of weighted graph for enzymes: taxonomic (these summarize phylogenetic importance), isoenzymatic (these summarize enzymatic variety/redundancy), and sequence-similarity (these summarize sequence conservation); and we applied these types of weighted graph to survey prokaryotic metabolism. To demonstrate the utility of this approach we have compared and contrasted the large-scale evolution of metabolism in Archaea and Eubacteria. Our results provide evidence for limits to the contingency of evolution. PMID:24619061

  1. Whole-genome mapping reveals a large chromosomal inversion on Iberian Brucella suis biovar 2 strains.

    PubMed

    Ferreira, Ana Cristina; Dias, Ricardo; de Sá, Maria Inácia Corrêa; Tenreiro, Rogério

    2016-08-30

    Optical mapping is a technology that can quickly generate high-resolution ordered whole-genome restriction maps of bacteria, and it is a proven approach for detecting diversity among bacterial isolates. In this work, optical whole-genome maps were used to compare closely related Brucella suis biovar 2 strains. This biovar is the only one isolated from domestic pigs and wild boars in Portugal and Spain, and most of the strains share specific molecular characteristics establishing an Iberian clonal lineage that can be differentiated from another lineage mainly isolated in several Central European countries. We generated BamHI whole-genome optical maps of five B. suis biovar 2 field strains, isolated from wild boars in Portugal and Spain (three from the Iberian lineage and two from the Central European one), as well as of the reference strain B. suis biovar 2 ATCC 23445 (Central European lineage, Denmark). Each strain showed a distinct, highly individual configuration of 228-231 BamHI fragments. Nevertheless, lower divergence was observed overall in chromosome II (1.6%) relative to chromosome I (2.4%). Optical mapping also disclosed genomic events associated with B. suis strains in chromosome I, namely one indel (3.5kb) and one large inversion (944kb). By using targeted PCR in a set of 176 B. suis strains, including all biovars and haplotypes, the indel was found to be specific to the reference strain ATCC 23445 and the large inversion was shown to be an exclusive genomic marker of the Iberian clonal lineage of biovar 2. PMID:27527786

  2. Fusion of Large-Scale Genomic Knowledge and Frequency Data Computationally Prioritizes Variants in Epilepsy

    PubMed Central

    Campbell, Ian M.; Rao, Mitchell; Arredondo, Sean D.; Lalani, Seema R.; Xia, Zhilian; Kang, Sung-Hae L.; Bi, Weimin; Breman, Amy M.; Smith, Janice L.; Bacino, Carlos A.; Beaudet, Arthur L.; Patel, Ankita; Cheung, Sau Wai; Lupski, James R.; Stankiewicz, Paweł; Ramocki, Melissa B.; Shaw, Chad A.

    2013-01-01

    Curation and interpretation of copy number variants identified by genome-wide testing is challenged by the large number of events harbored in each personal genome. Conventional determination of phenotypic relevance relies on patterns of higher frequency in affected individuals versus controls; however, an increasing amount of ascertained variation is rare or private to clans. Consequently, frequency data have less utility to resolve pathogenic from benign. One solution is disease-specific algorithms that leverage gene knowledge together with variant frequency to aid prioritization. We used large-scale resources including Gene Ontology, protein-protein interactions and other annotation systems together with a broad set of 83 genes with known associations to epilepsy to construct a pathogenicity score for the phenotype. We evaluated the score for all annotated human genes and applied Bayesian methods to combine the derived pathogenicity score with frequency information from our diagnostic laboratory. Analysis determined Bayes factors and posterior distributions for each gene. We applied our method to subjects with abnormal chromosomal microarray results and confirmed epilepsy diagnoses gathered by electronic medical record review. Genes deleted in our subjects with epilepsy had significantly higher pathogenicity scores and Bayes factors compared to subjects referred for non-neurologic indications. We also applied our scores to identify a recently validated epilepsy gene in a complex genomic region and to reveal candidate genes for epilepsy. We propose a potential use in clinical decision support for our results in the context of genome-wide screening. Our approach demonstrates the utility of integrative data in medical genomics. PMID:24086149

  3. Fusion of large-scale genomic knowledge and frequency data computationally prioritizes variants in epilepsy.

    PubMed

    Campbell, Ian M; Rao, Mitchell; Arredondo, Sean D; Lalani, Seema R; Xia, Zhilian; Kang, Sung-Hae L; Bi, Weimin; Breman, Amy M; Smith, Janice L; Bacino, Carlos A; Beaudet, Arthur L; Patel, Ankita; Cheung, Sau Wai; Lupski, James R; Stankiewicz, Paweł; Ramocki, Melissa B; Shaw, Chad A

    2013-01-01

    Curation and interpretation of copy number variants identified by genome-wide testing is challenged by the large number of events harbored in each personal genome. Conventional determination of phenotypic relevance relies on patterns of higher frequency in affected individuals versus controls; however, an increasing amount of ascertained variation is rare or private to clans. Consequently, frequency data have less utility to resolve pathogenic from benign. One solution is disease-specific algorithms that leverage gene knowledge together with variant frequency to aid prioritization. We used large-scale resources including Gene Ontology, protein-protein interactions and other annotation systems together with a broad set of 83 genes with known associations to epilepsy to construct a pathogenicity score for the phenotype. We evaluated the score for all annotated human genes and applied Bayesian methods to combine the derived pathogenicity score with frequency information from our diagnostic laboratory. Analysis determined Bayes factors and posterior distributions for each gene. We applied our method to subjects with abnormal chromosomal microarray results and confirmed epilepsy diagnoses gathered by electronic medical record review. Genes deleted in our subjects with epilepsy had significantly higher pathogenicity scores and Bayes factors compared to subjects referred for non-neurologic indications. We also applied our scores to identify a recently validated epilepsy gene in a complex genomic region and to reveal candidate genes for epilepsy. We propose a potential use in clinical decision support for our results in the context of genome-wide screening. Our approach demonstrates the utility of integrative data in medical genomics.

  4. Ancient eudicot hexaploidy meets ancestral eurosid gene order

    PubMed Central

    2013-01-01

    Background A hexaploidization event over 125 Mya underlies the evolutionary lineage of the majority of flowering plants, including very many species of agricultural importance. Half of these belong to the rosid subgrouping, containing several whose genome sequences have been published. Although most duplicate and triplicate genes have been lost in all descendants, clear traces of the original chromosome triples can be discerned, their internal contiguity highly conserved in some genomes and very fragmented in others. To understand the particular evolutionary patterns of plant genomes, there is a need to systematically survey the fate of the subgenomes of polyploids, including the retention of a small proportion of the duplicate and triplicate genes and the reconstruction of putative ancestral intermediates between the original hexaploid and modern species, in this case the ancestor of the eurosid clade. Results We quantitatively trace the fate of gene triples originating in the hexaploidy across seven core eudicot flowering plants, and fit this to a two-stage model, pre- and post-radiation. We also measure the simultaneous dynamics of duplicate orthologous gene loss in three rosids, as influenced by biological functional class. We propose a new protocol for reconstructing ancestral gene order using only gene adjacency data from pairwise genomic analyses, based on repeating MAXIMUM WEIGHT MATCHING at two levels of resolution, an approach designed to transcend limitations on reconstructed contig size, while still avoiding the ambiguities of a multiplicity of solutions. Applied to three high-quality rosid genomes without subsequent polyploidy events, our automated procedure reconstructs the ancestor of the eurosid clade. Conclusions The gene loss analysis and the ancestor reconstruction present complementary assessments of post-hexaploidization evolution, the first at the level of individual gene families within and across sister genomes and the second at the

  5. Improved white spruce (Picea glauca) genome assemblies and annotation of large gene families of conifer terpenoid and phenolic defense metabolism.

    PubMed

    Warren, René L; Keeling, Christopher I; Yuen, Macaire Man Saint; Raymond, Anthony; Taylor, Greg A; Vandervalk, Benjamin P; Mohamadi, Hamid; Paulino, Daniel; Chiu, Readman; Jackman, Shaun D; Robertson, Gordon; Yang, Chen; Boyle, Brian; Hoffmann, Margarete; Weigel, Detlef; Nelson, David R; Ritland, Carol; Isabel, Nathalie; Jaquish, Barry; Yanchuk, Alvin; Bousquet, Jean; Jones, Steven J M; MacKay, John; Birol, Inanc; Bohlmann, Joerg

    2015-07-01

    White spruce (Picea glauca), a gymnosperm tree, has been established as one of the models for conifer genomics. We describe the draft genome assemblies of two white spruce genotypes, PG29 and WS77111, innovative tools for the assembly of very large genomes, and the conifer genomics resources developed in this process. The two white spruce genotypes originate from distant geographic regions of western (PG29) and eastern (WS77111) North America, and represent elite trees in two Canadian tree-breeding programs. We present an update (V3 and V4) for a previously reported PG29 V2 draft genome assembly and introduce a second white spruce genome assembly for genotype WS77111. Assemblies of the PG29 and WS77111 genomes confirm the reconstructed white spruce genome size in the 20 Gbp range, and show broad synteny. Using the PG29 V3 assembly and additional white spruce genomics and transcriptomics resources, we performed MAKER-P annotation and meticulous expert annotation of very large gene families of conifer defense metabolism, the terpene synthases and cytochrome P450s. We also comprehensively annotated the white spruce mevalonate, methylerythritol phosphate and phenylpropanoid pathways. These analyses highlighted the large extent of gene and pseudogene duplications in a conifer genome, in particular for genes of secondary (i.e. specialized) metabolism, and the potential for gain and loss of function for defense and adaptation. PMID:26017574

  6. Improved white spruce (Picea glauca) genome assemblies and annotation of large gene families of conifer terpenoid and phenolic defense metabolism.

    PubMed

    Warren, René L; Keeling, Christopher I; Yuen, Macaire Man Saint; Raymond, Anthony; Taylor, Greg A; Vandervalk, Benjamin P; Mohamadi, Hamid; Paulino, Daniel; Chiu, Readman; Jackman, Shaun D; Robertson, Gordon; Yang, Chen; Boyle, Brian; Hoffmann, Margarete; Weigel, Detlef; Nelson, David R; Ritland, Carol; Isabel, Nathalie; Jaquish, Barry; Yanchuk, Alvin; Bousquet, Jean; Jones, Steven J M; MacKay, John; Birol, Inanc; Bohlmann, Joerg

    2015-07-01

    White spruce (Picea glauca), a gymnosperm tree, has been established as one of the models for conifer genomics. We describe the draft genome assemblies of two white spruce genotypes, PG29 and WS77111, innovative tools for the assembly of very large genomes, and the conifer genomics resources developed in this process. The two white spruce genotypes originate from distant geographic regions of western (PG29) and eastern (WS77111) North America, and represent elite trees in two Canadian tree-breeding programs. We present an update (V3 and V4) for a previously reported PG29 V2 draft genome assembly and introduce a second white spruce genome assembly for genotype WS77111. Assemblies of the PG29 and WS77111 genomes confirm the reconstructed white spruce genome size in the 20 Gbp range, and show broad synteny. Using the PG29 V3 assembly and additional white spruce genomics and transcriptomics resources, we performed MAKER-P annotation and meticulous expert annotation of very large gene families of conifer defense metabolism, the terpene synthases and cytochrome P450s. We also comprehensively annotated the white spruce mevalonate, methylerythritol phosphate and phenylpropanoid pathways. These analyses highlighted the large extent of gene and pseudogene duplications in a conifer genome, in particular for genes of secondary (i.e. specialized) metabolism, and the potential for gain and loss of function for defense and adaptation.

  7. DNA from Dust: Comparative Genomics of Large DNA Viruses in Field Surveillance Samples

    PubMed Central

    Pandey, Utsav; Bell, Andrew S.; Renner, Daniel W.; Kennedy, David A.; Shreve, Jacob T.; Cairns, Chris L.; Jones, Matthew J.; Dunn, Patricia A.; Read, Andrew F.

    2016-01-01

    ABSTRACT The intensification of the poultry industry over the last 60 years facilitated the evolution of increased virulence and vaccine breaks in Marek’s disease virus (MDV-1). Full-genome sequences are essential for understanding why and how this evolution occurred, but what is known about genome-wide variation in MDV comes from laboratory culture. To rectify this, we developed methods for obtaining high-quality genome sequences directly from field samples without the need for sequence-based enrichment strategies prior to sequencing. We applied this to the first characterization of MDV-1 genomes from the field, without prior culture. These viruses were collected from vaccinated hosts that acquired naturally circulating field strains of MDV-1, in the absence of a disease outbreak. This reflects the current issue afflicting the poultry industry, where virulent field strains continue to circulate despite vaccination and can remain undetected due to the lack of overt disease symptoms. We found that viral genomes from adjacent field sites had high levels of overall DNA identity, and despite strong evidence of purifying selection, had coding variations in proteins associated with virulence and manipulation of host immunity. Our methods empower ecological field surveillance, make it possible to determine the basis of viral virulence and vaccine breaks, and can be used to obtain full genomes from clinical samples of other large DNA viruses, known and unknown. IMPORTANCE Despite both clinical and laboratory data that show increased virulence in field isolates of MDV-1 over the last half century, we do not yet understand the genetic basis of its pathogenicity. Our knowledge of genome-wide variation between strains of this virus comes exclusively from isolates that have been cultured in the laboratory. MDV-1 isolates tend to lose virulence during repeated cycles of replication in the laboratory, raising concerns about the ability of cultured isolates to accurately

  8. Comparative Genomics of Amphibian-like Ranaviruses, Nucleocytoplasmic Large DNA Viruses of Poikilotherms

    PubMed Central

    Price, Stephen J.

    2015-01-01

    Recent research on genome evolution of large DNA viruses has highlighted a number of incredibly dynamic processes that can facilitate rapid adaptation. The genomes of amphibian-like ranaviruses – double-stranded DNA viruses infecting amphibians, reptiles, and fish (family Iridoviridae) – were examined to assess variation in genome content and evolutionary processes. The viruses studied were closely related, but their genome content varied considerably, with 29 genes identified that were not present in all of the major clades. Twenty-one genes had evidence of recombination, while a virus isolated from a captive reptile appeared to be a mosaic of two divergent parents. Positive selection was also found to be acting on more than a quarter of Ranavirus genes and was found most frequently in the Spanish common midwife toad virus, which has had a severe impact on amphibian host communities. Efforts to resolve the root of this group by inclusion of an outgroup were inconclusive, but a set of core genes was identified that recovered a well-supported species tree. PMID:27812275

  9. Global patterns of large copy number variations in the human genome reveal complexity in chromosome organization.

    PubMed

    Veerappa, Avinash M; Suresh, Raviraj V; Vishweswaraiah, Sangeetha; Lingaiah, Kusuma; Murthy, Megha; Manjegowda, Dinesh S; Padakannaya, Prakash; Ramachandra, Nallur B

    2015-01-01

    Global patterns of copy number variations (CNVs) in chromosomes are required to understand the dynamics of genome organization and complexity. For this study, analysis was performed using the Affymetrix Genome-Wide Human SNP Array 6.0 chip and CytoScan High-Density arrays. We identified a total of 44 109 CNVs from 1715 genomes with a mean of 25 CNVs in an individual, which established the first drafts of population-specific CNV maps providing a rationale for prioritizing chromosomal regions. About 19 905 ancient CNVs were identified across all chromosomes and populations at varying frequencies. CNV count, and sometimes CNV size, contributed to the bulk CNV size of the chromosome. Population specific lengthening and shortening of chromosomal length was observed. Sex bias for CNV presence was largely dependent on ethnicity. Lower CNV inheritance rate was observed for India, compared to YRI and CEU. A total of 33 candidate CNV hotspots from 5382 copy number (CN) variable region (CNVR) clusters were identified. Population specific CNV distribution patterns in p and q arms disturbed the assumption that CNV counts in the p arm are less common compared to long arms, and the CNV occurrence and distribution in chromosomes is length independent. This study unraveled the force of independent evolutionary dynamics on genome organization and complexity across chromosomes and populations. PMID:26390810

  10. OPERA-LG: efficient and exact scaffolding of large, repeat-rich eukaryotic genomes with performance guarantees.

    PubMed

    Gao, Song; Bertrand, Denis; Chia, Burton K H; Nagarajan, Niranjan

    2016-05-11

    The assembly of large, repeat-rich eukaryotic genomes represents a significant challenge in genomics. While long-read technologies have made the high-quality assembly of small, microbial genomes increasingly feasible, data generation can be expensive for larger genomes. OPERA-LG is a scalable, exact algorithm for the scaffold assembly of large, repeat-rich genomes, out-performing state-of-the-art programs for scaffold correctness and contiguity. It provides a rigorous framework for scaffolding of repetitive sequences and a systematic approach for combining data from different second-generation and third-generation sequencing technologies. OPERA-LG provides an avenue for systematic augmentation and improvement of thousands of existing draft eukaryotic genome assemblies.

  11. Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories.

    PubMed

    Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas

    2016-09-19

    Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp.

  12. Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories

    PubMed Central

    Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas

    2016-01-01

    Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp. PMID:27657141

  13. Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories.

    PubMed

    Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas

    2016-01-01

    Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp. PMID:27657141

  14. The ancestral gene repertoire of animal stem cells

    PubMed Central

    Alié, Alexandre; Hayashi, Tetsutaro; Sugimura, Itsuro; Manuel, Michaël; Sugano, Wakana; Mano, Akira; Satoh, Nori; Agata, Kiyokazu; Funayama, Noriko

    2015-01-01

    Stem cells are pivotal for development and tissue homeostasis of multicellular animals, and the quest for a gene toolkit associated with the emergence of stem cells in a common ancestor of all metazoans remains a major challenge for evolutionary biology. We reconstructed the conserved gene repertoire of animal stem cells by transcriptomic profiling of totipotent archeocytes in the demosponge Ephydatia fluviatilis and by tracing shared molecular signatures with flatworm and Hydra stem cells. Phylostratigraphy analyses indicated that most of these stem-cell genes predate animal origin, with only few metazoan innovations, notably including several partners of the Piwi machinery known to promote genome stability. The ancestral stem-cell transcriptome is strikingly poor in transcription factors. Instead, it is rich in RNA regulatory actors, including components of the “germ-line multipotency program” and many RNA-binding proteins known as critical regulators of mammalian embryonic stem cells. PMID:26644562

  16. The ancestral gene repertoire of animal stem cells.

    PubMed

    Alié, Alexandre; Hayashi, Tetsutaro; Sugimura, Itsuro; Manuel, Michaël; Sugano, Wakana; Mano, Akira; Satoh, Nori; Agata, Kiyokazu; Funayama, Noriko

    2015-12-22

    Stem cells are pivotal for development and tissue homeostasis of multicellular animals, and the quest for a gene toolkit associated with the emergence of stem cells in a common ancestor of all metazoans remains a major challenge for evolutionary biology. We reconstructed the conserved gene repertoire of animal stem cells by transcriptomic profiling of totipotent archeocytes in the demosponge Ephydatia fluviatilis and by tracing shared molecular signatures with flatworm and Hydra stem cells. Phylostratigraphy analyses indicated that most of these stem-cell genes predate animal origin, with only few metazoan innovations, notably including several partners of the Piwi machinery known to promote genome stability. The ancestral stem-cell transcriptome is strikingly poor in transcription factors. Instead, it is rich in RNA regulatory actors, including components of the "germ-line multipotency program" and many RNA-binding proteins known as critical regulators of mammalian embryonic stem cells. PMID:26644562

  17. The large (134.9 kb) mitochondrial genome of the glomeromycete Funneliformis mosseae.

    PubMed

    Nadimi, Maryam; Stefani, Franck O P; Hijri, Mohamed

    2016-10-01

    Funneliformis mosseae is among the most ecologically and economically important glomeromycete species and occurs both in natural and disturbed areas in a wide range of habitats and climates. In this study, we report the sequencing of the complete mitochondrial (mt) genome of F. mosseae isolate FL299 using 454 pyrosequencing and Illumina HiSeq technologies. This mt genome is a full-length circular chromosome of 134,925 bp, placing it among the largest mitochondrial DNAs (mtDNAs) in the fungal kingdom. A comparative analysis with publicly available arbuscular mycorrhizal fungal mtDNAs revealed that the mtDNA of F. mosseae FL299 contained a very large number of insertions contributing to its expansion. The gene synteny was completely reshuffled compared to previously published glomeromycotan mtDNAs and several genes were oriented in an anti-sense direction. Furthermore, the presence of different types of introns and insertions in rnl (14 introns) made this gene very distinctive in Glomeromycota. The presence of alternative genetic codes in both initiation (GUG) and termination (UGA) codons was another new feature in this mtDNA compared to previously published glomeromycotan mt genomes. The phylogenetic analysis inferred from the analysis of 14 protein mt genes confirmed the position of the Glomeromycota clade as a sister group of Mortierellomycotina. This mt genome is the largest observed so far in Glomeromycota and the first mt genome within the Funneliformis clade, providing new opportunities to better understand their evolution and to develop molecular markers. PMID:27246226

  18. A Roadmap for Natural Product Discovery Based on Large-Scale Genomics and Metabolomics

    PubMed Central

    Doroghazi, James R.; Albright, Jessica C.; Goering, Anthony W.; Ju, Kou-San; Haines, Robert R.; Tchalukov, Konstantin A.; Labeda, David P.; Kelleher, Neil L.; Metcalf, William W.

    2014-01-01

    Actinobacteria encode a wealth of natural product biosynthetic gene clusters (NPGCs), whose systematic study is complicated by numerous repetitive motifs. By combining several metrics we developed a method for global classification of these gene clusters into families (GCFs) and analyzed the biosynthetic capacity of Actinobacteria in 830 genome sequences, including 344 obtained for this project. The GCF network, comprised of 11,422 gene clusters grouped into 4,122 GCFs, was validated in hundreds of strains by correlating confident mass spectrometric detection of known small molecules with the presence/absence of their established biosynthetic gene clusters. The method also linked previously unassigned GCFs to known natural products, an approach that will enable de novo, bioassay-free discovery of novel natural products using large data sets. Extrapolation from the 830-genome dataset reveals that Actinobacteria encode hundreds of thousands of future drug leads, while the strong correlation between phylogeny and GCFs frames a roadmap to efficiently access them. PMID:25262415
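
    The counts quoted in this abstract imply some simple averages. The following is a minimal sketch that only divides the reported totals; the variable names are illustrative, and the averages are not figures stated by the authors.

```python
# Totals quoted in the abstract above.
n_genomes = 830        # Actinobacteria genome sequences analyzed
n_clusters = 11_422    # gene clusters in the GCF network
n_families = 4_122     # gene cluster families (GCFs)

# Derived averages (not reported in the abstract itself).
clusters_per_genome = n_clusters / n_genomes
clusters_per_family = n_clusters / n_families

print(f"{clusters_per_genome:.1f} gene clusters per genome on average")
print(f"{clusters_per_family:.2f} gene clusters per family on average")
```

    Averages of this kind say nothing about the distribution, which may be highly skewed across strains and families.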

  19. Ultra Large Gene Families: A Matter of Adaptation or Genomic Parasites?

    PubMed Central

    Schiffer, Philipp H.; Gravemeyer, Jan; Rauscher, Martina; Wiehe, Thomas

    2016-01-01

    Gene duplication is an important mechanism of molecular evolution. It offers a fast track to modification, diversification, redundancy or rescue of gene function. However, duplication may also be neutral or (slightly) deleterious, and often ends in pseudogenisation. Here, we investigate the phylogenetic distribution of ultra large gene families on long and short evolutionary time scales. In particular, we focus on a family of NACHT-domain and leucine-rich-repeat-containing (NLR)-genes, which we previously found in large numbers to occupy one chromosome arm of the zebrafish genome. We were interested to see whether such a tight clustering is characteristic of ultra large gene families. Our data reconfirm that most gene family inflations are lineage-specific, but we can only identify very few gene clusters. Based on our observations we hypothesise that, beyond a certain size threshold, ultra large gene families continue to proliferate through a mechanism we term “run-away evolution”. This process might ultimately lead to the failure of genomic integrity and drive species to extinction. PMID:27509525

  20. Biological consequences of ancient gene acquisition and duplication in the large genome soil bacterium, "Solibacter usitatus" strain Ellin6076

    SciTech Connect

    Challacombe, Jean F; Eichorst, Stephanie A; Xie, Gary; Kuske, Cheryl R; Hauser, Loren; Land, Miriam

    2009-01-01

    Bacterial genome sizes range from ca. 0.5 to 10 Mb and are influenced by gene duplication, horizontal gene transfer, gene loss and other evolutionary processes. Sequenced genomes of strains in the phylum Acidobacteria revealed that 'Solibacter usitatus' strain Ellin6076 harbors a 9.9 Mb genome. This large genome appears to have arisen by horizontal gene transfer via ancient bacteriophage and plasmid-mediated transduction, as well as widespread small-scale gene duplications. This has resulted in an increased number of paralogs that are potentially ecologically important (ecoparalogs). Low amino acid sequence identities among functional group members and lack of conserved gene order and orientation in the regions containing similar groups of paralogs suggest that most of the paralogs were not the result of recent duplication events. The genome sizes of cultured subdivision 1 and 3 strains in the phylum Acidobacteria were estimated using pulsed-field gel electrophoresis to determine the prevalence of the large genome trait within the phylum. Members of subdivision 1 were estimated to have smaller genome sizes ranging from ca. 2.0 to 4.8 Mb, whereas members of subdivision 3 had slightly larger genomes, from ca. 5.8 to 9.9 Mb. It is hypothesized that the large genome of strain Ellin6076 encodes traits that provide a selective metabolic, defensive and regulatory advantage in the variable soil environment.

  1. From the double-helix to novel approaches to the sequencing of large genomes.

    PubMed

    Szybalski, W

    1993-12-15

    Elucidation of the structure of DNA by Watson and Crick [Nature 171 (1953) 737-738] has led to many crucial molecular experiments, including studies on DNA replication, transcription, physical mapping, and most recently to serious attempts directed toward the sequencing of large genomes [Watson, Science 248 (1990) 44-49]. I am totally convinced of the great importance of the Human Genome Project, and toward achieving this goal I strongly favor 'top-down' approaches consisting of the physical mapping and preparation of contiguous 50-100-kb fragments directly from the genome, followed by their automated sequencing based on the rapid assembly of primers by hexamer ligation together with primer walking. Our 'top-down' procedures totally avoid conventional cloning, subcloning and random sequencing, which are the elements of the present 'bottom-up' procedures. Fragments of 50-100 kb are prepared in sufficient quantities either by in vitro excision with rare-cutting restriction systems (including Achilles' heel cleavage [AC] or the RecA-AC procedures of Koob et al. [Nucleic Acids Res. 20 (1992) 5831-5836]) or by in vivo excision and amplification using the yeast FRT/Flp system or the phage lambda att/Int system. Such fragments, when derived directly from the Escherichia coli genome, are arranged in consecutive order, so that 50 specially constructed strains of E. coli would supply 50 end-to-end arranged approx. 100-kb fragments, which will cover the entire approx. 5-Mb E. coli genome. For the 150-Mb Drosophila melanogaster genome, 1500 of such consecutive 100-kb fragments (supplied by 1500 strains) are required to cover the entire genome. The fragments will be sequenced by the SPEL-6 method involving hexamer ligation [Szybalski, Gene 90 (1990) 177-178; Fresenius J. Anal. Chem. 4 (1992) 343] and primer walking. The 18-mer primers are synthesized in only a few minutes from three contiguous hexamers annealed to the DNA strand to be sequenced when using an over 100-fold

  2. Initial characterization of the large genome of the salamander Ambystoma mexicanum using shotgun and laser capture chromosome sequencing

    PubMed Central

    Keinath, Melissa C.; Timoshevskiy, Vladimir A.; Timoshevskaya, Nataliya Y.; Tsonis, Panagiotis A.; Voss, S. Randal; Smith, Jeramiah J.

    2015-01-01

    Vertebrates exhibit substantial diversity in genome size, and some of the largest genomes exist in species that uniquely inform diverse areas of basic and biomedical research. For example, the salamander Ambystoma mexicanum (the Mexican axolotl) is a model organism for studies of regeneration, development and genome evolution, yet its genome is ~10× larger than the human genome. As part of a hierarchical approach toward improving genome resources for the species, we generated 600 Gb of shotgun sequence data and developed methods for sequencing individual laser-captured chromosomes. Based on these data, we estimate that the A. mexicanum genome is ~32 Gb. Notably, as much as 19 Gb of the A. mexicanum genome can potentially be considered single copy, which presumably reflects the evolutionary diversification of mobile elements that accumulated during an ancient episode of genome expansion. Chromosome-targeted sequencing permitted the development of assemblies within the constraints of modern computational platforms, allowed us to place 2062 genes on the two smallest A. mexicanum chromosomes and resolves key events in the history of vertebrate genome evolution. Our analyses show that the capture and sequencing of individual chromosomes is likely to provide valuable information for the systematic sequencing, assembly and scaffolding of large genomes. PMID:26553646
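
    The figures in this abstract can be related to one another with a little arithmetic. The sketch below derives the approximate shotgun sequencing depth and the single-copy fraction from the quoted numbers; the human genome size of about 3.1 Gb is an assumption not given in the abstract, and the variable names are illustrative.

```python
# Values quoted in the abstract above (gigabases).
shotgun_data_gb = 600.0    # shotgun sequence generated
genome_size_gb = 32.0      # estimated A. mexicanum genome size
single_copy_gb = 19.0      # upper estimate of single-copy sequence

# Assumption: approximate haploid human genome size, not from the abstract.
human_genome_gb = 3.1

# Average depth of coverage = total bases sequenced / genome size.
coverage = shotgun_data_gb / genome_size_gb

# Share of the genome that may be single copy.
single_copy_fraction = single_copy_gb / genome_size_gb

# Size relative to human, for comparison with the "~10x" statement.
size_ratio = genome_size_gb / human_genome_gb

print(f"coverage: {coverage:.2f}x")
print(f"single-copy fraction: {single_copy_fraction:.1%}")
print(f"size relative to human: {size_ratio:.1f}x")
```

    Under these assumptions the shotgun data correspond to a depth of just under 19×, and up to roughly three fifths of the genome could be single copy.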

  3. Initial characterization of the large genome of the salamander Ambystoma mexicanum using shotgun and laser capture chromosome sequencing.

    PubMed

    Keinath, Melissa C; Timoshevskiy, Vladimir A; Timoshevskaya, Nataliya Y; Tsonis, Panagiotis A; Voss, S Randal; Smith, Jeramiah J

    2015-11-10

    Vertebrates exhibit substantial diversity in genome size, and some of the largest genomes exist in species that uniquely inform diverse areas of basic and biomedical research. For example, the salamander Ambystoma mexicanum (the Mexican axolotl) is a model organism for studies of regeneration, development and genome evolution, yet its genome is ~10× larger than the human genome. As part of a hierarchical approach toward improving genome resources for the species, we generated 600 Gb of shotgun sequence data and developed methods for sequencing individual laser-captured chromosomes. Based on these data, we estimate that the A. mexicanum genome is ~32 Gb. Notably, as much as 19 Gb of the A. mexicanum genome can potentially be considered single copy, which presumably reflects the evolutionary diversification of mobile elements that accumulated during an ancient episode of genome expansion. Chromosome-targeted sequencing permitted the development of assemblies within the constraints of modern computational platforms, allowed us to place 2062 genes on the two smallest A. mexicanum chromosomes and resolves key events in the history of vertebrate genome evolution. Our analyses show that the capture and sequencing of individual chromosomes is likely to provide valuable information for the systematic sequencing, assembly and scaffolding of large genomes.

  4. Initial characterization of the large genome of the salamander Ambystoma mexicanum using shotgun and laser capture chromosome sequencing.

    PubMed

    Keinath, Melissa C; Timoshevskiy, Vladimir A; Timoshevskaya, Nataliya Y; Tsonis, Panagiotis A; Voss, S Randal; Smith, Jeramiah J

    2015-01-01

    Vertebrates exhibit substantial diversity in genome size, and some of the largest genomes exist in species that uniquely inform diverse areas of basic and biomedical research. For example, the salamander Ambystoma mexicanum (the Mexican axolotl) is a model organism for studies of regeneration, development and genome evolution, yet its genome is ~10× larger than the human genome. As part of a hierarchical approach toward improving genome resources for the species, we generated 600 Gb of shotgun sequence data and developed methods for sequencing individual laser-captured chromosomes. Based on these data, we estimate that the A. mexicanum genome is ~32 Gb. Notably, as much as 19 Gb of the A. mexicanum genome can potentially be considered single copy, which presumably reflects the evolutionary diversification of mobile elements that accumulated during an ancient episode of genome expansion. Chromosome-targeted sequencing permitted the development of assemblies within the constraints of modern computational platforms, allowed us to place 2062 genes on the two smallest A. mexicanum chromosomes and resolves key events in the history of vertebrate genome evolution. Our analyses show that the capture and sequencing of individual chromosomes is likely to provide valuable information for the systematic sequencing, assembly and scaffolding of large genomes. PMID:26553646

  5. Combining comparative sequence and genomic data to ascertain phylogenetic relationships and explore the evolution of the large GDSL-lipase family in land plants.

    PubMed

    Volokita, Micha; Rosilio-Brami, Tamar; Rivkin, Natalia; Zik, Moriyah

    2011-01-01

    The GDSL-lipase gene family is a very large subfamily within the supergene family of SGNH esterases, defined by the distinct GDSL amino acid motif and several highly conserved domains. Plants retain a large number of GDSL-lipases indicating that they have acquired important functions. Yet, in planta functions have been demonstrated for only a few GDSL-lipases from diverse species. Considering that orthologs often retain equivalent functions, we determined the phylogenetic relationships between GDSL-lipases from genome-sequenced species representing bryophytes, gymnosperms, monocots, and eudicots. An unrooted phylogenetic tree was constructed from the amino acid sequences of 604 GDSL-lipases from seven species. The topology of the tree depicts two major and one minor subfamily. This division is also supported by the unique gene structure of each subfamily. Because GDSL-lipase genes of all species are present in each of the three subfamilies, we conclude that the last common ancestor of the land plants already possessed at least one ancestral GDSL-lipase gene of each subfamily. Combined gene structure and synteny analyses revealed events of segmental duplications, gene transposition, and gene degeneration in the evolution of the GDSL-lipase gene family. Furthermore, these analyses showed that independent events of intron gain and loss also contributed to the extant repertoire of the GDSL-lipase gene family. Our findings suggest that underlying many of the intron losses was a spliceosomal-mediated mechanism followed by gene conversion. Sorting the phylogenetic relationships among the members of the GDSL-lipase gene family, as depicted by the tree and supported by synteny analyses, provides a framework for extrapolation of demonstrated functional data to GDSL-lipases, whose function is yet unknown. Furthermore, function(s) associated with specific lineage(s)-enriched branches may reveal correlations between acquired and/or lost functions and speciation.

  6. Breeding signatures of rice improvement revealed by a genomic variation map from a large germplasm collection.

    PubMed

    Xie, Weibo; Wang, Gongwei; Yuan, Meng; Yao, Wen; Lyu, Kai; Zhao, Hu; Yang, Meng; Li, Pingbo; Zhang, Xing; Yuan, Jing; Wang, Quanxiu; Liu, Fang; Dong, Huaxia; Zhang, Lejing; Li, Xinglei; Meng, Xiangzhou; Zhang, Wan; Xiong, Lizhong; He, Yuqing; Wang, Shiping; Yu, Sibin; Xu, Caiguo; Luo, Jie; Li, Xianghua; Xiao, Jinghua; Lian, Xingming; Zhang, Qifa

    2015-09-29

    Intensive rice breeding over the past 50 y has dramatically increased productivity especially in the indica subspecies, but our knowledge of the genomic changes associated with such improvement has been limited. In this study, we analyzed low-coverage sequencing data of 1,479 rice accessions from 73 countries, including landraces and modern cultivars. We identified two major subpopulations, indica I (IndI) and indica II (IndII), in the indica subspecies, which corresponded to the two putative heterotic groups resulting from independent breeding efforts. We detected 200 regions spanning 7.8% of the rice genome that had been differentially selected between IndI and IndII, and thus referred to as breeding signatures. These regions included large numbers of known functional genes and loci associated with important agronomic traits revealed by genome-wide association studies. Grain yield was positively correlated with the number of breeding signatures in a variety, suggesting that the number of breeding signatures in a line may be useful for predicting agronomic potential and the selected loci may provide targets for rice improvement.

  7. Breeding signatures of rice improvement revealed by a genomic variation map from a large germplasm collection

    PubMed Central

    Xie, Weibo; Wang, Gongwei; Yuan, Meng; Yao, Wen; Lyu, Kai; Zhao, Hu; Yang, Meng; Li, Pingbo; Zhang, Xing; Yuan, Jing; Wang, Quanxiu; Liu, Fang; Dong, Huaxia; Zhang, Lejing; Li, Xinglei; Meng, Xiangzhou; Zhang, Wan; Xiong, Lizhong; He, Yuqing; Wang, Shiping; Yu, Sibin; Xu, Caiguo; Luo, Jie; Li, Xianghua; Xiao, Jinghua; Lian, Xingming; Zhang, Qifa

    2015-01-01

    Intensive rice breeding over the past 50 y has dramatically increased productivity especially in the indica subspecies, but our knowledge of the genomic changes associated with such improvement has been limited. In this study, we analyzed low-coverage sequencing data of 1,479 rice accessions from 73 countries, including landraces and modern cultivars. We identified two major subpopulations, indica I (IndI) and indica II (IndII), in the indica subspecies, which corresponded to the two putative heterotic groups resulting from independent breeding efforts. We detected 200 regions spanning 7.8% of the rice genome that had been differentially selected between IndI and IndII, and thus referred to as breeding signatures. These regions included large numbers of known functional genes and loci associated with important agronomic traits revealed by genome-wide association studies. Grain yield was positively correlated with the number of breeding signatures in a variety, suggesting that the number of breeding signatures in a line may be useful for predicting agronomic potential and the selected loci may provide targets for rice improvement. PMID:26358652

  8. Repair of base damage and genome maintenance in the nucleo-cytoplasmic large DNA viruses.

    PubMed

    Redrejo-Rodríguez, Modesto; Salas, María L

    2014-01-22

    Among the DNA viruses, the so-called nucleo-cytoplasmic large DNA viruses (NCLDV) constitute a monophyletic group that currently consists of seven families of viruses infecting a very broad variety of eukaryotes, from unicellular marine protists to humans. Many recent papers have analyzed the sequence and structure of NCLDV genomes and their phylogeny, providing detailed analysis about their genomic structure and evolutionary history and proposing their inclusion in a new viral order named Megavirales that, according to some authors, should be considered as a fourth domain of life, aside from Bacteria, Archaea and Eukarya. The maintenance of genetic information protected from environmental attacks and mutations is essential not only for the survival of cellular organisms but also for that of viruses. In cellular organisms, damaged DNA bases are removed in two major repair pathways: base excision repair (BER) and nucleotide incision repair (NIR) that constitute the major pathways responsible for repairing most endogenous base lesions and abnormal bases in the genome by precise repair procedures. Like cells, many NCLDV encode proteins that might constitute viral DNA repair pathways that would remove damage through BER/NIR pathways. However, the molecular mechanisms and, especially, the biological roles of those viral repair pathways have not been deeply addressed in the literature so far. In this paper, we review viral-encoded BER proteins and the genetic and biochemical data available about them. We propose and discuss probable viral-encoded DNA repair mechanisms and pathways, as compared with the functional and molecular features of known homologous proteins.

  10. Large-insert genome analysis technology detects structural variation in Pseudomonas aeruginosa clinical strains from cystic fibrosis patients.

    PubMed

    Hayden, Hillary S; Gillett, Will; Saenphimmachak, Channakhone; Lim, Regina; Zhou, Yang; Jacobs, Michael A; Chang, Jean; Rohmer, Laurence; D'Argenio, David A; Palmieri, Anthony; Levy, Ruth; Haugen, Eric; Wong, Gane K S; Brittnacher, Mitch J; Burns, Jane L; Miller, Samuel I; Olson, Maynard V; Kaul, Rajinder

    2008-06-01

    Large-insert genome analysis (LIGAN) is a broadly applicable, high-throughput technology designed to characterize genome-scale structural variation. Fosmid paired-end sequences and DNA fingerprints from a query genome are compared to a reference sequence using the Genomic Variation Analysis (GenVal) suite of software tools to pinpoint locations of insertions, deletions, and rearrangements. Fosmids spanning regions that contain new structural variants can then be sequenced. Clonal pairs of Pseudomonas aeruginosa isolates from four cystic fibrosis patients were used to validate the LIGAN technology. Approximately 1.5 Mb of inserted sequences were identified, including 743 kb containing 615 ORFs that are absent from published P. aeruginosa genomes. Six rearrangement breakpoints and 220 kb of deleted sequences were also identified. Our study expands the "genome universe" of P. aeruginosa and validates a technology that complements emerging, short-read sequencing methods that are better suited to characterizing single-nucleotide polymorphisms than structural variation. PMID:18445516
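
    The paired-end comparison described in this abstract can be illustrated with a generic sketch. This is not the GenVal software and does not use DNA fingerprints; it only shows the common idea of comparing the distance between the two mapped end reads of a clone with the expected insert size. The expected fosmid insert size (40 kb), the tolerance (10 kb), and all names below are assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Assumed values for illustration only (not taken from the abstract).
EXPECTED_INSERT_BP = 40_000   # typical fosmid insert size
TOLERANCE_BP = 10_000         # deviation still treated as normal


@dataclass
class PairedEnd:
    clone_id: str
    left_pos: int              # reference coordinate of one end read
    right_pos: int             # reference coordinate of the other end read
    same_strand: bool = False  # True if both ends map to the same strand


def classify(pair: PairedEnd) -> str:
    """Label a clone by how its end reads map onto the reference."""
    # Ends of an intact clone map to opposite strands; anything else
    # hints at an inversion or other rearrangement in the query genome.
    if pair.same_strand:
        return "rearrangement"
    span = abs(pair.right_pos - pair.left_pos)
    # Ends too far apart: the reference holds sequence the query lacks.
    if span > EXPECTED_INSERT_BP + TOLERANCE_BP:
        return "deletion"
    # Ends too close: the query clone carries sequence absent from the reference.
    if span < EXPECTED_INSERT_BP - TOLERANCE_BP:
        return "insertion"
    return "concordant"


pairs = [
    PairedEnd("f001", 100_000, 140_500),
    PairedEnd("f002", 200_000, 275_000),
    PairedEnd("f003", 300_000, 312_000),
    PairedEnd("f004", 400_000, 441_000, same_strand=True),
]
labels = {p.clone_id: classify(p) for p in pairs}
for clone_id, label in labels.items():
    print(clone_id, label)
```

    Clones flagged as insertions would be the natural candidates for full sequencing, since they are the ones expected to contain sequence missing from the reference.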

  11. The ancestral complement system in sea urchins.

    PubMed

    Smith, L C; Clow, L A; Terwilliger, D P

    2001-04-01

    The origin of adaptive immunity in the vertebrates can be traced to the appearance of the ancestral RAG genes in the ancestral jawed vertebrate; however, the innate immune system is more ancient. A central subsystem within innate immunity is the complement system, which has been identified throughout and seems to be restricted to the deuterostomes. The evolutionary history of complement can be traced from the sea urchins (members of the echinoderm phylum), which have a simplified system homologous to the alternative pathway, through the agnathans (hagfish and lamprey) and the elasmobranchs (sharks and rays) to the teleosts (bony fish) and tetrapods, with increases in the numbers of complement components and duplications in complement pathways. Increasing complexity in the complement system parallels increasing complexity in the deuterostome animals. This review focuses on the simplest of the complement systems that is present in the sea urchin. Two components have been identified that show significant homology to vertebrate C3 and factor B (Bf), called SpC3 and SpBf, respectively. Sequence analysis from both molecules reveals their ancestral characteristics. Immune challenge of sea urchins indicates that SpC3 is inducible and is present in coelomic fluid (the body fluids) in relatively high concentrations, while SpBf expression is constitutive and is present in much lower concentrations. Opsonization of foreign cells and particles followed by augmented uptake by phagocytic coelomocytes appears to be a central function for this simpler complement system and important for host defense in the sea urchin. These activities are similar to some of the functions of the homologous proteins in the vertebrate complement system. The selective advantage for the ancestral deuterostome may have been the amplification feedback loop that is still of central importance in the alternative pathway of complement in higher vertebrates. Feedback loop functions would quickly coat

  12. Managing Large-Scale Genomic Datasets and Translation into Clinical Practice

    PubMed Central

    2014-01-01

    Summary. Objective: To summarize excellent current research in the field of Bioinformatics and Translational Informatics with application in the health domain. Method: We provide a synopsis of the articles selected for the IMIA Yearbook 2014, from which we attempt to derive a synthetic overview of current and future activities in the field. A first step of selection was performed by querying MEDLINE with a list of MeSH descriptors completed by a list of terms adapted to the section. Each section editor evaluated independently the set of 1,851 articles and 15 articles were retained for peer-review. Results: The selection and evaluation process of this Yearbook’s section on Bioinformatics and Translational Informatics yielded three excellent articles regarding data management and genome medicine. In the first article, the authors present VEST (Variant Effect Scoring Tool) which is a supervised machine learning tool for prioritizing variants found in exome sequencing projects that are more likely involved in human Mendelian diseases. In the second article, the authors show how to infer surnames of male individuals by crossing anonymous publicly available genomic data from the Y chromosome and public genealogy data banks. The third article presents a statistical framework called iCluster+ that can perform pattern discovery in integrated cancer genomic data. This framework was able to determine different tumor subtypes in colon cancer. Conclusions: The current research activities still attest the continuous convergence of Bioinformatics and Medical Informatics, with a focus this year on large-scale biological, genomic, and Electronic Health Records data. Indeed, there is a need for powerful tools for managing and interpreting complex data, but also a need for user-friendly tools developed for the clinicians in their daily practice. All the recent research and development efforts are contributing to the challenge of impacting clinically the results and even going towards a

  13. Reticulate evolution of the rye genome.

    PubMed

    Martis, Mihaela M; Zhou, Ruonan; Haseneyer, Grit; Schmutzer, Thomas; Vrána, Jan; Kubaláková, Marie; König, Susanne; Kugler, Karl G; Scholz, Uwe; Hackauf, Bernd; Korzun, Viktor; Schön, Chris-Carolin; Dolezel, Jaroslav; Bauer, Eva; Mayer, Klaus F X; Stein, Nils

    2013-10-01

    Rye (Secale cereale) is closely related to wheat (Triticum aestivum) and barley (Hordeum vulgare). Due to its large genome (~8 Gb) and its regional importance, genome analysis of rye has lagged behind other cereals. Here, we established a virtual linear gene order model (genome zipper) comprising 22,426 or 72% of the detected set of 31,008 rye genes. This was achieved by high-throughput transcript mapping, chromosome survey sequencing, and integration of conserved synteny information of three sequenced model grass genomes (Brachypodium distachyon, rice [Oryza sativa], and sorghum [Sorghum bicolor]). This enabled a genome-wide high-density comparative analysis of rye/barley/model grass genome synteny. Seventeen conserved syntenic linkage blocks making up the rye and barley genomes were defined in comparison to model grass genomes. Six major translocations shaped the modern rye genome in comparison to a putative Triticeae ancestral genome. Strikingly dissimilar conserved syntenic gene content, gene sequence diversity signatures, and phylogenetic networks were found for individual rye syntenic blocks. This indicates that introgressive hybridizations (diploid or polyploidy hybrid speciation) and/or a series of whole-genome or chromosome duplications played a role in rye speciation and genome evolution.

  14. A new tool called DISSECT for analysing large genomic data sets using a Big Data approach

    PubMed Central

    Canela-Xandri, Oriol; Law, Andy; Gray, Alan; Woolliams, John A.; Tenesa, Albert

    2015-01-01

    Large-scale genetic and genomic data are increasingly available and the major bottleneck in their analysis is a lack of sufficiently scalable computational tools. To address this problem in the context of complex traits analysis, we present DISSECT. DISSECT is a new and freely available software that is able to exploit the distributed-memory parallel computational architectures of compute clusters, to perform a wide range of genomic and epidemiologic analyses, which currently can only be carried out on reduced sample sizes or under restricted conditions. We demonstrate the usefulness of our new tool by addressing the challenge of predicting phenotypes from genotype data in human populations using mixed-linear model analysis. We analyse simulated traits from 470,000 individuals genotyped for 590,004 SNPs in ∼4 h using the combined computational power of 8,400 processor cores. We find that prediction accuracies in excess of 80% of the theoretical maximum could be achieved with large sample sizes. PMID:26657010

  15. Large BRCA1 and BRCA2 genomic rearrangements in Malaysian high risk breast-ovarian cancer families.

    PubMed

    Kang, Peter; Mariapun, Shivaani; Phuah, Sze Yee; Lim, Linda Shushan; Liu, Jianjun; Yoon, Sook-Yee; Thong, Meow Keong; Mohd Taib, Nur Aishah; Yip, Cheng Har; Teo, Soo-Hwang

    2010-11-01

    Early studies of genetic predisposition due to the BRCA1 and BRCA2 genes have focused largely on sequence alterations, but it has now emerged that 4-28% of inherited mutations in the BRCA genes may be due to large genomic rearrangements of these genes. However, to date, there have been relatively few studies of large genomic rearrangements in Asian populations. We have conducted a full sequencing and large genomic rearrangement analysis (using Multiplex Ligation-dependent Probe Amplification, MLPA) of 324 breast cancer patients who were selected from a multi-ethnic hospital-based cohort on the basis of age of onset of breast cancer and/or family history. Three unrelated individuals were found to have large genomic rearrangements: 2 in BRCA1 and 1 in BRCA2, which accounts for 2/24 (8%) of the total mutations detected in BRCA1 and 1/23 (4%) of the mutations in BRCA2 detected in this cohort. Notably, the family history of the individuals with these mutations is largely unremarkable suggesting that family history alone is a poor predictor of mutation status in Asian families. In conclusion, this study in a multi-ethnic (Malay, Chinese, Indian) cohort suggests that large genomic rearrangements are present at a low frequency but should nonetheless be included in the routine testing for BRCA1 and BRCA2. PMID:20617377
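
    The proportions quoted in the abstract above (2/24 and 1/23) can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `percent` is hypothetical, and the counts are taken directly from the abstract.

```python
def percent(numerator, denominator):
    """Return numerator/denominator as a percentage, rounded to one decimal."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * numerator / denominator, 1)

# Counts as reported in the abstract: large rearrangements / total mutations.
brca1_share = percent(2, 24)  # BRCA1: 2 of 24 mutations
brca2_share = percent(1, 23)  # BRCA2: 1 of 23 mutations
print(brca1_share, brca2_share)
```

    Rounded to whole percentages, these values agree with the 8% and 4% given in the abstract.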

  17. Large genomic rearrangement of BRCA1 and BRCA2 genes in familial breast cancer patients in Korea.

    PubMed

    Cho, Ja Young; Cho, Dae-Yeon; Ahn, Sei Hyun; Choi, Su-Youn; Shin, Inkyung; Park, Hyun Gyu; Lee, Jong Won; Kim, Hee Jeong; Yu, Jong Han; Ko, Beom Seok; Ku, Bo Kyung; Son, Byung Ho

    2014-06-01

    We screened for large genomic rearrangements of the BRCA1 and BRCA2 genes in Korean familial breast cancer patients. A multiplex ligation-dependent probe amplification assay was used to identify BRCA1 and BRCA2 genomic rearrangements in 226 Korean familial breast cancer patients with risk factors for BRCA1 and BRCA2 mutations, who previously tested negative for point mutations in the two genes. We identified only one large deletion (c.4186-1593_4676-1465del) in BRCA1. No large rearrangements were found in BRCA2. Our result indicates that large genomic rearrangement in the BRCA1 and BRCA2 genes does not seem to be a major determinant of breast cancer susceptibility in the Korean population. A large-scale study is needed to validate our result in Korea.

  18. In search of ancestral Kilauea volcano

    USGS Publications Warehouse

    Lipman, P.W.; Sisson, T.W.; Ui, T.; Naka, J.

    2000-01-01

    Submersible observations and samples show that the lower south flank of Hawaii, offshore from Kilauea volcano and the active Hilina slump system, consists entirely of compositionally diverse volcaniclastic rocks; pillow lavas are confined to shallow slopes. Submarine-erupted basalt clasts have strongly variable alkalic and transitional basalt compositions (to 41% SiO2, 10.8% alkalies), contrasting with present-day Kilauea tholeiites. The volcaniclastic rocks provide a unique record of ancestral alkalic growth of an archetypal hotspot volcano, including transition to its tholeiitic shield stage, and associated slope-failure events.

  19. Rapid pair-wise synteny analysis of large bacterial genomes using web-based GeneOrder4.0

    PubMed Central

    2010-01-01

    Background: The growing whole genome sequence databases necessitate the development of user-friendly software tools to mine these data. Web-based tools are particularly useful to wet-bench biologists as they enable platform-independent analysis of sequence data, without having to perform complex programming tasks and software compiling. Findings: GeneOrder4.0 is a web-based "on-the-fly" synteny and gene order analysis tool for comparative bacterial genomics (ca. 8 Mb). It enables the visualization of synteny by plotting protein similarity scores between two genomes and it also provides visual annotation of "hypothetical" proteins from older archived genomes based on more recent annotations. Conclusions: The web-based software tool GeneOrder4.0 is a user-friendly application that has been updated to allow the rapid analysis of synteny and gene order in large bacterial genomes. It is developed with the wet-bench researcher in mind. PMID:20178631

  20. Large-scale analysis of tandem repeat variability in the human genome.

    PubMed

    Duitama, Jorge; Zablotskaya, Alena; Gemayel, Rita; Jansen, An; Belet, Stefanie; Vermeesch, Joris R; Verstrepen, Kevin J; Froyen, Guy

    2014-05-01

    Tandem repeats are short DNA sequences that are repeated head-to-tail with a propensity to be variable. They constitute a significant proportion of the human genome, also occurring within coding and regulatory regions. Variation in these repeats can alter the function and/or expression of genes allowing organisms to swiftly adapt to novel environments. Importantly, some repeat expansions have also been linked to certain neurodegenerative diseases. Therefore, accurate sequencing of tandem repeats could contribute to our understanding of common phenotypic variability and might uncover missing genetic factors in idiopathic clinical conditions. However, despite long-standing evidence for the functional role of repeats, they are largely ignored because of technical limitations in sequencing, mapping and typing. Here, we report on a novel capture technique and data filtering protocol that allowed simultaneous sequencing of thousands of tandem repeats in the human genomes of a three generation family using GS-FLX-plus Titanium technology. Our results demonstrated that up to 7.6% of tandem repeats in this family (4% in coding sequences) differ from the reference sequence, and identified a de novo variation in the family tree. The method opens new routes to look at this underappreciated type of genetic variability, including the identification of novel disease-related repeats.
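
    To make the definition in the abstract above concrete (a short unit repeated head-to-tail), here is a minimal Python sketch that scans a sequence for exact tandem repeats of a fixed unit length. It is not the capture or typing method used in the study; the function name `find_tandem_repeats` and its parameters are hypothetical, and real repeat callers also handle imperfect copies and variable unit lengths.

```python
def find_tandem_repeats(seq, unit_len, min_copies=2):
    """Return (start, unit, copies) for exact head-to-tail repeats of a unit.

    Only perfect copies of a fixed-length unit are detected (an assumption
    made to keep the sketch short).
    """
    if unit_len < 1:
        raise ValueError("unit_len must be at least 1")
    hits = []
    i = 0
    while i + unit_len <= len(seq):
        unit = seq[i:i + unit_len]
        copies = 1
        j = i + unit_len
        # Extend the run while the next window is an identical copy.
        while seq[j:j + unit_len] == unit:
            copies += 1
            j += unit_len
        if copies >= min_copies:
            hits.append((i, unit, copies))
            i = j  # continue after the whole run
        else:
            i += 1
    return hits

# Toy sequence with three head-to-tail copies of "GCA" starting at index 1.
print(find_tandem_repeats("GGCAGCAGCAGTT", 3))
```

    The scan reports the leftmost phase of a run, so the same repeat could equally be described as copies of "CAG" starting one base later.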

  1. Large-scale analysis of tandem repeat variability in the human genome

    PubMed Central

    Duitama, Jorge; Zablotskaya, Alena; Gemayel, Rita; Jansen, An; Belet, Stefanie; Vermeesch, Joris R.; Verstrepen, Kevin J.; Froyen, Guy

    2014-01-01

    Tandem repeats are short DNA sequences that are repeated head-to-tail with a propensity to be variable. They constitute a significant proportion of the human genome, also occurring within coding and regulatory regions. Variation in these repeats can alter the function and/or expression of genes allowing organisms to swiftly adapt to novel environments. Importantly, some repeat expansions have also been linked to certain neurodegenerative diseases. Therefore, accurate sequencing of tandem repeats could contribute to our understanding of common phenotypic variability and might uncover missing genetic factors in idiopathic clinical conditions. However, despite long-standing evidence for the functional role of repeats, they are largely ignored because of technical limitations in sequencing, mapping and typing. Here, we report on a novel capture technique and data filtering protocol that allowed simultaneous sequencing of thousands of tandem repeats in the human genomes of a three generation family using GS-FLX-plus Titanium technology. Our results demonstrated that up to 7.6% of tandem repeats in this family (4% in coding sequences) differ from the reference sequence, and identified a de novo variation in the family tree. The method opens new routes to look at this underappreciated type of genetic variability, including the identification of novel disease-related repeats. PMID:24682812

  2. Frequent occurrence of large duplications at reciprocal genomic rearrangement breakpoints in multiple myeloma and other tumors

    PubMed Central

    Demchenko, Yulia; Roschke, Anna; Chen, Wei-Dong; Asmann, Yan; Bergsagel, Peter Leif; Kuehl, Walter Michael

    2016-01-01

    Using a combination of array comparative genomic hybridization, mate pair and cloned sequences, and FISH analyses, we have identified in multiple myeloma cell lines and tumors a novel and recurrent type of genomic rearrangement, i.e. interchromosomal rearrangements (translocations or insertions) and intrachromosomal inversions that contain long (1–4000 kb; median ∼100 kb) identical sequences adjacent to both reciprocal breakpoint junctions. These duplicated sequences were generated from sequences immediately adjacent to the breakpoint from at least one—but sometimes both—chromosomal donor site(s). Tandem duplications had a similar size distribution suggesting the possibility of a shared mechanism for generating duplicated sequences at breakpoints. Although about 25% of apparent secondary rearrangements contained these duplications, primary IGH translocations rarely, if ever, had large duplications at breakpoint junctions. Significantly, these duplications often contain super-enhancers and/or oncogenes (e.g. MYC) that are dysregulated by rearrangements during tumor progression. We also found that long identical sequences often were identified at both reciprocal breakpoint junctions in six of eight other tumor types. Finally, we have been unable to find reports of similar kinds of rearrangements in wild-type or mutant prokaryotes or lower eukaryotes such as yeast. PMID:27353332

  3. Large-scale metabolome analysis and quantitative integration with genomics and proteomics data in Mycoplasma pneumoniae.

    PubMed

    Maier, Tobias; Marcos, Josep; Wodke, Judith A H; Paetzold, Bernhard; Liebeke, Manuel; Gutiérrez-Gallego, Ricardo; Serrano, Luis

    2013-07-01

    Systems metabolomics, the identification and quantification of cellular metabolites and their integration with genomics and proteomics data, promises valuable functional insights into cellular biology. However, technical constraints, sample complexity issues and the lack of suitable complementary quantitative data sets prevented accomplishing such studies in the past. Here, we present an integrative metabolomics study of the genome-reduced bacterium Mycoplasma pneumoniae. We experimentally analysed its metabolome using a cross-platform approach. We explain intracellular metabolite homeostasis by quantitatively integrating our results with the cellular inventory of proteins, DNA and other macromolecules, as well as with available building blocks from the growth medium. We calculated in vivo catalytic parameters of glycolytic enzymes, making use of measured reaction velocities, as well as enzyme and metabolite pool sizes. A quantitative, inter-species comparison of absolute and relative metabolite abundances indicated that metabolic pathways are regulated as functional units, thereby simplifying adaptive responses. Our analysis demonstrates the potential for new scientific insight by integrating different types of large-scale experimental data from a single biological source.

  4. LCGserver: A Webserver for Exploring Evolutionary Trajectory of Gene Orders in a Large Number of Genomes.

    PubMed

    Wang, Dapeng; Yu, Jun

    2015-09-01

    Genes and chromosomes are highly organized; together with protein-coding sequence, gene structure at the per-gene level and gene order at the cluster level are both variable in the context of lineages and under natural selection. How gene order and chromosome organization are related and selected remains to be illuminated. The number of newly-sequenced genomes from various taxa has been increasing rapidly, but there have not been easy-to-use web tools that allow better visualization of gene order in a large genome collection. Here, we describe a webserver, LCGserver (http://lcgbase.big.ac.cn/LCGserver/), for exploring evolutionary dynamics of gene orders over diverse lineages. This server provides gene order information at three levels: single gene, paired gene (a minimal cluster), and clustered gene (more than two genes). The most distinctive feature of LCGserver is alignment and visualization of neighboring genes based on orthology, allowing users to inspect all conserved and dynamic events of gene order along chromosomes in a lineage-specific manner. In addition, it categorizes paired genes into six patterns and identifies fully-conserved gene clusters within and among lineages.

  5. A Common Ancestral Mutation in CRYBB3 Identified in Multiple Consanguineous Families with Congenital Cataracts

    PubMed Central

    Irum, Bushra; Khan, Arif O.; Wang, Qiwei; Li, David; Khan, Asma A.; Husnain, Tayyab; Akram, Javed; Riazuddin, Sheikh

    2016-01-01

    Purpose: This study was performed to investigate the genetic determinants of autosomal recessive congenital cataracts in large consanguineous families. Methods: Affected individuals underwent a detailed ophthalmological examination and slit-lamp photographs of the cataractous lenses were obtained. An aliquot of blood was collected from all participating family members and genomic DNA was extracted from white blood cells. Initially, a genome-wide scan was performed with genomic DNAs of family PKCC025 followed by exclusion analysis of our familial cohort of congenital cataracts. Protein-coding exons of CRYBB1, CRYBB2, CRYBB3, and CRYBA4 were sequenced bidirectionally. A haplotype was constructed with SNPs flanking the causal mutation for affected individuals in all four families, while the probability that the four familial cases have a common founder was estimated using EM and CHM-based algorithms. The expression of Crybb3 in the developing murine lens was investigated using TaqMan assays. Results: The clinical and ophthalmological examinations suggested that all affected individuals had nuclear cataracts. Genome-wide linkage analysis localized the causal phenotype in family PKCC025 to chromosome 22q with statistically significant two-point logarithm of odds (LOD) scores. Subsequently, we localized three additional families, PKCC063, PKCC131, and PKCC168 to chromosome 22q. Bidirectional Sanger sequencing identified a missense variation: c.493G>C (p.Gly165Arg) in CRYBB3 that segregated with the disease phenotype in all four familial cases. This variation was not found in ethnically matched control chromosomes, the NHLBI exome variant server, or the 1000 Genomes or dbSNP databases. Interestingly, all four families harbor a unique disease haplotype that strongly suggests a common founder of the causal mutation (p<1.64E-10). We observed expression of Crybb3 in the mouse lens as early as embryonic day 15 (E15), and expression remained relatively steady throughout

  6. Genomic analysis of 38 Legionella species identifies large and diverse effector repertoires.

    PubMed

    Burstein, David; Amaro, Francisco; Zusman, Tal; Lifshitz, Ziv; Cohen, Ofir; Gilbert, Jack A; Pupko, Tal; Shuman, Howard A; Segal, Gil

    2016-02-01

    Infection by the human pathogen Legionella pneumophila relies on the translocation of ∼300 virulence proteins, termed effectors, which manipulate host cell processes. However, almost no information exists regarding effectors in other Legionella pathogens. Here we sequenced, assembled and characterized the genomes of 38 Legionella species and predicted their effector repertoires using a previously validated machine learning approach. This analysis identified 5,885 predicted effectors. The effector repertoires of different Legionella species were found to be largely non-overlapping, and only seven core effectors were shared by all species studied. Species-specific effectors had atypically low GC content, suggesting exogenous acquisition, possibly from the natural protozoan hosts of these species. Furthermore, we detected numerous new conserved effector domains and discovered new domain combinations, which allowed the inference of as yet undescribed effector functions. The effector collection and network of domain architectures described here can serve as a roadmap for future studies of effector function and evolution.

  7. The search for ancestral nervous systems: an integrative and comparative approach.

    PubMed

    Satterlie, Richard A

    2015-02-15

    Even the most basal multicellular nervous systems are capable of producing complex behavioral acts that involve the integration and combination of simple responses, and decision-making when presented with conflicting stimuli. This requires an understanding beyond that available from genomic investigations, and calls for an integrative and comparative approach, where the power of genomic/transcriptomic techniques is coupled with morphological, physiological and developmental experimentation to identify common and species-specific nervous system properties for the development and elaboration of phylogenomic reconstructions. With careful selection of genes and gene products, we can continue to make significant progress in our search for ancestral nervous system organizations. PMID:25696824

  9. Software engineering the mixed model for genome-wide association studies on large samples.

    PubMed

    Zhang, Zhiwu; Buckler, Edward S; Casstevens, Terry M; Bradbury, Peter J

    2009-11-01

    Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample size and number of markers used for GWAS is increasing dramatically, resulting in greater statistical power to detect those associations. The use of mixed models with increasingly large data sets depends on the availability of software for analyzing those models. While multiple software packages implement the mixed model method, no single package provides the best combination of fast computation, ability to handle large samples, flexible modeling and ease of use. Key elements of association analysis with mixed models are reviewed, including modeling phenotype-genotype associations using mixed models, population stratification, kinship and its estimation, variance component estimation, use of best linear unbiased predictors or residuals in place of raw phenotype, improving efficiency and software-user interaction. The available software packages are evaluated, and suggestions made for future software development.
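
    For readers unfamiliar with the model class reviewed in the abstract above, a commonly used textbook form of the mixed linear model for association analysis is sketched below. The notation is an assumption for illustration; the abstract itself gives no equations, and individual software packages may parameterize the model differently.

```latex
% y: vector of phenotypes; X: design matrix of fixed effects (e.g. the tested
% marker and population-structure covariates); beta: fixed-effect coefficients;
% Z: incidence matrix linking observations to individuals; u: random polygenic
% effects; K: kinship matrix; e: residuals.
\[
  y = X\beta + Zu + e, \qquad
  u \sim \mathcal{N}\!\left(0,\ \sigma_g^{2} K\right), \qquad
  e \sim \mathcal{N}\!\left(0,\ \sigma_e^{2} I\right)
\]
```

    In this form, population stratification enters through the fixed effects in X, relatedness enters through K, and variance component estimation refers to estimating the two variances \sigma_g^2 and \sigma_e^2.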

  10. A detailed RFLP map of Sorghum bicolor x S. propinquum, suitable for high-density mapping, suggests ancestral duplication of Sorghum chromosomes or chromosomal segments.

    PubMed

    Chittenden, L M; Schertz, K F; Lin, Y R; Wing, R A; Paterson, A H

    1994-03-01

    The first "complete" genetic linkage map of Sorghum section Sorghum is described, comprised of ten linkage groups putatively corresponding to the ten gametic chromosomes of S. bicolor and S. propinquum. The map includes 276 RFLP loci, predominately detected by PstI-digested S. bicolor genomic probes, segregating in 56 F2 progeny of a cross between S. bicolor and S. propinquum. Although prior cytological evidence suggests that the genomes of these species are largely homosequential, a high level of molecular divergence is evidenced by the abundant RFLP and RAPD polymorphisms, the marked deviations from Mendelian segregation in many regions of the genome, and several species-specific DNA probes. The remarkable level of DNA polymorphism between these species will facilitate development of a high-density genetic map. Further, the high level of DNA polymorphism permitted mapping of multiple loci for 21 (8.2%) DNA probes. Linkage relationships among eight (38%) of these probes suggest ancestral duplication of three genomic regions. Mapping of 13 maize genomic clones in this cross was consistent with prior results. Mapping of heterologous cDNAs from rice and oat suggests that it may be feasible to extend comparative mapping to these distantly-related species, and to ultimately generate a detailed description of chromosome rearrangements among cultivated Gramineae. Limited investigation of a small number of RFLPs showed several alleles common to S. bicolor and S. halepense ("johnson-grass"), but few alleles common to S. propinquum and S. halepense, raising questions about the origin of S. halepense.

  11. The genome of woodland strawberry (Fragaria vesca).

    PubMed

    Shulaev, Vladimir; Sargent, Daniel J; Crowhurst, Ross N; Mockler, Todd C; Folkerts, Otto; Delcher, Arthur L; Jaiswal, Pankaj; Mockaitis, Keithanne; Liston, Aaron; Mane, Shrinivasrao P; Burns, Paul; Davis, Thomas M; Slovin, Janet P; Bassil, Nahla; Hellens, Roger P; Evans, Clive; Harkins, Tim; Kodira, Chinnappa; Desany, Brian; Crasta, Oswald R; Jensen, Roderick V; Allan, Andrew C; Michael, Todd P; Setubal, Joao Carlos; Celton, Jean-Marc; Rees, D Jasper G; Williams, Kelly P; Holt, Sarah H; Ruiz Rojas, Juan Jairo; Chatterjee, Mithu; Liu, Bo; Silva, Herman; Meisel, Lee; Adato, Avital; Filichkin, Sergei A; Troggio, Michela; Viola, Roberto; Ashman, Tia-Lynn; Wang, Hao; Dharmawardhana, Palitha; Elser, Justin; Raja, Rajani; Priest, Henry D; Bryant, Douglas W; Fox, Samuel E; Givan, Scott A; Wilhelm, Larry J; Naithani, Sushma; Christoffels, Alan; Salama, David Y; Carter, Jade; Lopez Girona, Elena; Zdepski, Anna; Wang, Wenqin; Kerstetter, Randall A; Schwab, Wilfried; Korban, Schuyler S; Davik, Jahn; Monfort, Amparo; Denoyes-Rothan, Beatrice; Arus, Pere; Mittler, Ron; Flinn, Barry; Aharoni, Asaph; Bennetzen, Jeffrey L; Salzberg, Steven L; Dickerman, Allan W; Velasco, Riccardo; Borodovsky, Mark; Veilleux, Richard E; Folta, Kevin M

    2011-02-01

    The woodland strawberry, Fragaria vesca (2n = 2x = 14), is a versatile experimental plant system. This diminutive herbaceous perennial has a small genome (240 Mb), is amenable to genetic transformation and shares substantial sequence identity with the cultivated strawberry (Fragaria × ananassa) and other economically important rosaceous plants. Here we report the draft F. vesca genome, which was sequenced to ×39 coverage using second-generation technology, assembled de novo and then anchored to the genetic linkage map into seven pseudochromosomes. This diploid strawberry sequence lacks the large genome duplications seen in other rosids. Gene prediction modeling identified 34,809 genes, with most being supported by transcriptome mapping. Genes critical to valuable horticultural traits including flavor, nutritional value and flowering time were identified. Macrosyntenic relationships between Fragaria and Prunus predict a hypothetical ancestral Rosaceae genome that had nine chromosomes. New phylogenetic analysis of 154 protein-coding genes suggests that assignment of Populus to Malvidae, rather than Fabidae, is warranted.

  12. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans.

    PubMed

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-08-01

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures.
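
    The AT-richness figure quoted in the abstract above (∼64.3%) is a base-composition statistic. As a hedged illustration of how such a value is typically computed, the sketch below counts A and T among the unambiguous bases of a sequence. The function name `at_content` and the toy input are hypothetical; the authors' actual pipeline is not described in the abstract.

```python
def at_content(seq):
    """Fraction of A/T bases among the unambiguous A, C, G, T bases of seq."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    cg = seq.count("C") + seq.count("G")
    if at + cg == 0:
        raise ValueError("sequence has no A, C, G, or T bases")
    # Ambiguity codes such as N are ignored in both numerator and denominator.
    return at / (at + cg)

# Toy example only; this is not the S. minutum sequence.
print(f"{100 * at_content('ATATGCNN'):.1f}% AT")
```

    Excluding ambiguous bases from the denominator is one common convention; counting them instead would lower the reported fraction for assemblies with many gaps.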

  13. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans

    PubMed Central

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-01-01

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. PMID:26199191

  14. The evolution of chloroplast genes and genomes in ferns.

    PubMed

    Wolf, Paul G; Der, Joshua P; Duffy, Aaron M; Davidson, Jacob B; Grusz, Amanda L; Pryer, Kathleen M

    2011-07-01

    Most of the publicly available data on chloroplast (plastid) genes and genomes come from seed plants, with relatively little information from their sister group, the ferns. Here we describe several broad evolutionary patterns and processes in fern plastid genomes (plastomes), and we include some new plastome sequence data. We review what we know about the evolutionary history of plastome structure across the fern phylogeny and we compare plastome organization and patterns of evolution in ferns to those in seed plants. A large clade of ferns is characterized by a plastome that has been reorganized with respect to the ancestral gene order (a similar order that is ancestral in seed plants). We review the sequence of inversions that gave rise to this organization. We also explore global nucleotide substitution patterns in ferns versus those found in seed plants across plastid genes, and we review the high levels of RNA editing observed in fern plastomes.

  15. Genomic structure of the large RNA segment of infectious bursal disease virus.

    PubMed Central

    Hudson, P J; McKern, N M; Power, B E; Azad, A A

    1986-01-01

    The larger RNA segment of infectious bursal disease virus (IBDV: Australian strain 002-73) has been characterized by cDNA cloning and nucleotide sequence analysis. We believe IBDV is the first birnavirus to be sequenced and so have confirmed the coding region by N-terminal amino acid sequence analysis of intact viral proteins and several tryptic peptide fragments. The large RNA segment encodes in order the 37-kDa, 28-kDa and 32-kDa proteins within a continuous open reading frame and the primary translation product appears to be subsequently processed into the mature viral proteins. The large protein precursor is still processed into the 32-kDa host protective immunogen when expressed as a fusion protein in E. coli. These results are in marked contrast to the predictions from in vitro translation data that birnavirus genomes are expressed as polycistronic templates. We can now propose that birnaviruses, in particular IBDV, possess monocistronic segments and that the precursor is proteolytically processed in vivo. The sequence data presented for the 32-kDa host protective immunogen may provide the basic information needed for the production of an effective subunit vaccine against this commercially important virus. PMID:3014441

  16. Needles: Toward Large-Scale Genomic Prediction with Marker-by-Environment Interaction.

    PubMed

    De Coninck, Arne; De Baets, Bernard; Kourounis, Drosos; Verbosio, Fabio; Schenk, Olaf; Maenhout, Steven; Fostier, Jan

    2016-05-01

    Genomic prediction relies on genotypic marker information to predict the agronomic performance of future hybrid breeds based on trial records. Because the effect of markers may vary substantially under the influence of different environmental conditions, marker-by-environment interaction effects have to be taken into account. However, this may lead to a dramatic increase in the computational resources needed for analyzing large-scale trial data. A high-performance computing solution, called Needles, is presented for handling such data sets. Needles is tailored to the particular properties of the underlying algebraic framework by exploiting a sparse matrix formalism where suited and by utilizing distributed computing techniques to enable the use of a dedicated computing cluster. It is demonstrated that large-scale analyses can be performed within reasonable time frames with this framework. Moreover, by analyzing simulated trial data, it is shown that the effects of markers with a high environmental interaction can be predicted more accurately when more records per environment are available in the training data. The availability of such data and their analysis with Needles also may lead to the discovery of highly contributing QTL in specific environmental conditions. Such a framework thus opens the path for plant breeders to select crops based on these QTL, resulting in hybrid lines with optimized agronomic performance in specific environmental conditions.

  17. The genomic and physical organization of Ty1-copia-like sequences as a component of large genomes in Pinus elliottii var. elliottii and other gymnosperms.

    PubMed Central

    Kamm, A; Doudrick, R L; Heslop-Harrison, J S; Schmidt, T

    1996-01-01

    A DNA sequence, TPE1, representing the internal domain of a Ty1-copia retroelement, was isolated from genomic DNA of Pinus elliottii Engelm. var. elliottii (slash pine). Genomic Southern analysis showed that this sequence, carrying partial reverse transcriptase and integrase gene sequences, is highly amplified within the genome of slash pine and part of a dispersed element >4.8 kbp. Fluorescent in situ hybridization to metaphase chromosomes shows that the element is relatively uniformly dispersed over all 12 chromosome pairs and is highly abundant in the genome. It is largely excluded from centromeric regions and intercalary chromosomal sites representing the 18S-5.8S-25S rRNA genes. Southern hybridization with specific DNA probes for the reverse transcriptase gene shows that TPE1 represents a large subgroup of heterogeneous Ty1-copia retrotransposons in Pinus species. Because no TPE1 transcription could be detected, it is most likely an inactive element--at least in needle tissue. Further evidence for inactivity was found in recombinant reverse transcriptase and integrase sequences. The distribution of TPE1 within different gymnosperms that contain Ty1-copia group retrotransposons, as shown by a PCR assay, was investigated by Southern hybridization. The TPE1 family is highly amplified and conserved in all Pinus species analyzed, showing a similar genomic organization in the three- and five-needle pine species investigated. It is also present in spruce, bald cypress (swamp cypress), and in gingko but in fewer copies and a different genomic organization. PMID:8610105

  18. PyPop update--a software pipeline for large-scale multilocus population genomics.

    PubMed

    Lancaster, A K; Single, R M; Solberg, O D; Nelson, M P; Thomson, G

    2007-04-01

    Population genetic statistics from multilocus genotype data inform our understanding of the patterns of genetic variation and their implications for evolutionary studies, generally, and human disease studies in particular. In any given population one can estimate haplotype frequencies, identify deviation from Hardy-Weinberg equilibrium, test for balancing or directional selection, and investigate patterns of linkage disequilibrium. Existing software packages are oriented primarily toward the computation of such statistics on a population-by-population basis, not on comparisons among populations and across different statistics. We developed PyPop (Python for Population Genomics) to facilitate the analyses of population genetic statistics across populations and the relationships among different statistics within and across populations. PyPop is an open-source framework for performing large-scale population genetic analyses on multilocus genotype data. It computes the statistics described above, among others. PyPop deploys a standard Extensible Markup Language (XML) output format and can integrate the results of multiple analyses on various populations that were performed at different times into a common output format that can be read into a spreadsheet. The XML output format allows PyPop to be embedded as part of a larger analysis pipeline. Originally developed to analyze the highly polymorphic genetic data of the human leukocyte antigen region of the human genome, PyPop has applicability to any kind of multilocus genetic data. It is the primary analysis platform for analyzing data collected for the Anthropological component of the 13th and 14th International Histocompatibility Workshops. PyPop has also been successfully used in studies by our group, with collaborators, and in publications by several independent research teams.
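
    One of the statistics named in the abstract above, deviation from Hardy-Weinberg equilibrium, can be illustrated with a minimal sketch for a single biallelic locus. This is not PyPop's API or its method (PyPop works on multilocus, highly polymorphic data); the function name `hwe_chi_square` and its inputs are hypothetical and chosen only for illustration.

```python
def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Chi-square statistic for deviation from Hardy-Weinberg proportions
    at one biallelic locus (illustrative helper, not part of PyPop)."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p

    # Expected genotype counts under Hardy-Weinberg equilibrium.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)

    # Sum (O - E)^2 / E, skipping classes with zero expectation
    # (this happens when the locus is monomorphic).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
```

    For a biallelic locus the statistic is conventionally compared with a chi-square distribution with one degree of freedom; exact tests are generally preferred when expected counts are small.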

  19. Diversity and relationships of cocirculating modern human rotaviruses revealed using large-scale comparative genomics.

    PubMed

    McDonald, Sarah M; McKell, Allison O; Rippinger, Christine M; McAllen, John K; Akopov, Asmik; Kirkness, Ewen F; Payne, Daniel C; Edwards, Kathryn M; Chappell, James D; Patton, John T

    2012-09-01

    Group A rotaviruses (RVs) are 11-segmented, double-stranded RNA viruses and are primary causes of gastroenteritis in young children. Despite their medical relevance, the genetic diversity of modern human RVs is poorly understood, and the impact of vaccine use on circulating strains remains unknown. In this study, we report the complete genome sequence analysis of 58 RVs isolated from children with severe diarrhea and/or vomiting at Vanderbilt University Medical Center (VUMC) in Nashville, TN, during the years spanning community vaccine implementation (2005 to 2009). The RVs analyzed include 36 G1P[8], 18 G3P[8], and 4 G12P[8] Wa-like genogroup 1 strains with VP6-VP1-VP2-VP3-NSP1-NSP2-NSP3-NSP4-NSP5/6 genotype constellations of I1-R1-C1-M1-A1-N1-T1-E1-H1. By constructing phylogenetic trees, we identified 2 to 5 subgenotype alleles for each gene. The results show evidence of intragenogroup gene reassortment among the cocirculating strains. However, several isolates from different seasons maintained identical allele constellations, consistent with the notion that certain RV clades persisted in the community. By comparing the genes of VUMC RVs to those of other archival and contemporary RV strains for which sequences are available, we defined phylogenetic lineages and verified that the diversity of the strains analyzed in this study reflects that seen in other regions of the world. Importantly, the VP4 and VP7 proteins encoded by VUMC RVs and other contemporary strains show amino acid changes in or near neutralization domains, which might reflect antigenic drift of the virus. Thus, this large-scale, comparative genomic study of modern human RVs provides significant insight into how this pathogen evolves during its spread in the community. PMID:22696651

  1. Origin of amphibian and avian chromosomes by fission, fusion, and retention of ancestral chromosomes

    PubMed Central

    Voss, Stephen R.; Kump, D. Kevin; Putta, Srikrishna; Pauly, Nathan; Reynolds, Anna; Henry, Rema J.; Basa, Saritha; Walker, John A.; Smith, Jeramiah J.

    2011-01-01

    Amphibian genomes differ greatly in DNA content and chromosome size, morphology, and number. Investigations of this diversity are needed to identify mechanisms that have shaped the evolution of vertebrate genomes. We used comparative mapping to investigate the organization of genes in the Mexican axolotl (Ambystoma mexicanum), a species that presents relatively few chromosomes (n = 14) and a gigantic genome (>20 pg/N). We show extensive conservation of synteny between Ambystoma, chicken, and human, and a positive correlation between the length of conserved segments and genome size. Ambystoma segments are estimated to be four to 51 times longer than homologous human and chicken segments. Strikingly, genes demarking the structures of 28 chicken chromosomes are ordered among linkage groups defining the Ambystoma genome, and we show that these same chromosomal segments are also conserved in a distantly related anuran amphibian (Xenopus tropicalis). Using linkage relationships from the amphibian maps, we predict that three chicken chromosomes originated by fusion, nine to 14 originated by fission, and 12–17 evolved directly from ancestral tetrapod chromosomes. We further show that some ancestral segments were fused prior to the divergence of salamanders and anurans, while others fused independently and randomly as chromosome numbers were reduced in lineages leading to Ambystoma and Xenopus. The maintenance of gene order relationships between chromosomal segments that have greatly expanded and contracted in salamander and chicken genomes, respectively, suggests selection to maintain synteny relationships and/or extremely low rates of chromosomal rearrangement. Overall, the results demonstrate the value of data from diverse, amphibian genomes in studies of vertebrate genome evolution. PMID:21482624

  2. Rapidly Registering Identity-by-Descent Across Ancestral Recombination Graphs.

    PubMed

    Yang, Shuo; Carmi, Shai; Pe'er, Itsik

    2016-06-01

    The genomes of remotely related individuals occasionally contain long segments that are identical by descent (IBD). Sharing of IBD segments has many applications in population and medical genetics, and it is thus desirable to study their properties in simulations. However, no current method provides a direct, efficient means to extract IBD segments from simulated genealogies. Here, we introduce computationally efficient approaches to extract ground-truth IBD segments from a sequence of genealogies, or equivalently, an ancestral recombination graph. Specifically, we use a two-step scheme, where we first identify putative shared segments by comparing the common ancestors of all pairs of individuals at some distance apart. This reduces the search space considerably, and we then proceed by determining the true IBD status of the candidate segments. Under some assumptions and when allowing a limited resolution of segment lengths, our run-time complexity is reduced from O(n³ log n) for the naïve algorithm to O(n log n), where n is the number of individuals in the sample.
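
    The idea of reading IBD segments off a sequence of genealogies can be sketched for one pair of individuals. The sketch below is the naive per-pair pass, not the authors' two-step scheme. It assumes that the most recent common ancestor (MRCA) of the pair has already been looked up in each local tree, and that a segment counts as IBD while that MRCA stays the same across adjacent trees. The function name, the input layout, and the `min_length` parameter are hypothetical.

```python
from typing import Hashable, List, Tuple

# One entry per local tree: (start, end, mrca_id) for a single pair of samples.
Interval = Tuple[float, float, Hashable]


def merge_ibd_segments(intervals: List[Interval],
                       min_length: float = 0.0) -> List[Interval]:
    """Merge adjacent intervals that share the same MRCA into IBD segments.

    `intervals` must be sorted by position. Segments shorter than
    `min_length` are dropped.
    """
    segments: List[Interval] = []
    current = None
    for start, end, mrca in intervals:
        if current is not None and current[2] == mrca and current[1] == start:
            # Same ancestor and contiguous: extend the running segment.
            current = (current[0], end, mrca)
        else:
            # Ancestor changed (or there is a gap): close the running segment.
            if current is not None and current[1] - current[0] >= min_length:
                segments.append(current)
            current = (start, end, mrca)
    if current is not None and current[1] - current[0] >= min_length:
        segments.append(current)
    return segments


example = [(0, 10, "a"), (10, 25, "a"), (25, 30, "b"), (30, 50, "a")]
long_segments = merge_ibd_segments(example, min_length=10)
```

    Running this pass for every pair of individuals is what makes the naive approach expensive as the sample grows, which is the cost the abstract's two-step scheme is designed to reduce.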

  3. Using large-scale genome variation cohorts to decipher the molecular mechanism of cancer.

    PubMed

    Habermann, Nina; Mardin, Balca R; Yakneen, Sergei; Korbel, Jan O

    2016-01-01

    Characterizing genomic structural variations (SVs) in the human genome remains challenging, and there is a growing interest to understand somatic SVs occurring in cancer, a disease of the genome. A havoc-causing SV process known as chromothripsis scars the genome when localized chromosome shattering and repair occur in a one-off catastrophe. Recent efforts led to the development of a set of conceptual criteria for the inference of chromothripsis events in cancer genomes and to the development of experimental model systems for studying this striking DNA alteration process in vitro. We discuss these approaches, and additionally touch upon current "Big Data" efforts that employ hybrid cloud computing to enable studies of numerous cancer genomes in an effort to search for commonalities and differences in molecular DNA alteration processes in cancer. PMID:27342254

  5. Multiple occurrences of giant virus core genes acquired by eukaryotic genomes: the visible part of the iceberg?

    PubMed

    Filée, Jonathan

    2014-10-01

    Giant Viruses are a widespread group of viruses, characterized by huge genomes composed of a small subset of ancestral, vertically inherited core genes along with a large body of highly variable genes. In this study, I report the acquisition of 23 core ancestral Giant Virus genes by diverse eukaryotic species including various protists, a moss and a cnidarian. The viral genes are inserted in large scaffolds or chromosomes with intron-rich, eukaryotic-like genomic contexts, refuting the possibility of DNA contaminations. Some of these genes are expressed and in the cryptophyte alga Guillardia theta, a possible non-homologous displacement of the eukaryotic DNA primase by a viral D5 helicase/primase is documented. As core Giant Virus genes represent only a tiny fraction of the total genomic repertoire of these viruses, these results suggest that Giant Viruses represent an underestimated source of new genes and functions for their hosts.

  6. Exploring the feasibility of using copy number variants as genetic markers through large-scale whole genome sequencing experiments

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Copy number variants (CNV) are large scale duplications or deletions of genomic sequence that are caused by a diverse set of molecular phenomena that are distinct from single nucleotide polymorphism (SNP) formation. Due to their different mechanisms of formation, CNVs are often difficult to track us...

  7. Draft Genome Sequence of Rheinheimera sp. F8, a Biofilm-Forming Strain Which Produces Large Amounts of Extracellular DNA

    PubMed Central

    Szewzyk, Ulrich

    2016-01-01

    Rheinheimera sp. strain F8 is a biofilm-forming gammaproteobacterium that has been found to produce large amounts of filamentous extracellular DNA. Here, we announce the de novo assembly of its genome. It is estimated to be 4,464,511 bp in length, with 3,970 protein-coding sequences and 92 RNA-coding sequences. PMID:26966195

  8. Evo-Devo: Variations on Ancestral Themes

    PubMed Central

    De Robertis, E.M.

    2008-01-01

    Most animals evolved from a common ancestor, Urbilateria, which already had in place the developmental genetic networks for shaping body plans. Comparative genomics has revealed rather unexpectedly that many of the genes present in bilaterian animal ancestors were lost by individual phyla during evolution. Reconstruction of the archetypal developmental genomic tool-kit present in Urbilateria will help to elucidate the contribution of gene loss and developmental constraints to the evolution of animal body plans. PMID:18243095

  9. The exceptionally large genome of Hendra virus: support for creation of a new genus within the family Paramyxoviridae.

    PubMed

    Wang, L F; Yu, M; Hansson, E; Pritchard, L I; Shiell, B; Michalski, W P; Eaton, B T

    2000-11-01

    An outbreak of acute respiratory disease in Hendra, a suburb of Brisbane, Australia, in September 1994 resulted in the deaths of 14 racing horses and a horse trainer. The causative agent was a new member of the family Paramyxoviridae. The virus was originally called Equine morbillivirus but was renamed Hendra virus (HeV) when molecular characterization highlighted differences between it and members of the genus Morbillivirus. Less than 5 years later, the closely related Nipah virus (NiV) emerged in Malaysia, spread rapidly through the pig population, and caused the deaths of over 100 people. We report the characterization of the HeV L gene and protein, the genome termini, and gene boundary sequences, thus completing the HeV genome sequence. In the highly conserved region of the L protein, the HeV sequence GDNE differs from the GDNQ found in almost all other nonsegmented negative-strand (NNS) RNA viruses. HeV has an absolutely conserved intergenic trinucleotide sequence, 3'-GAA-5', and highly conserved transcription initiation and termination sequences similar to those of respiroviruses and morbilliviruses. The large genome size (18,234 nucleotides), the unique complementary genome terminal sequences of HeV, and the limited homology with other members of the Paramyxoviridae suggest that HeV, together with NiV, should be classified in a new genus in this family. The large genome of HeV also fills a gap in the spectrum of genome sizes observed with NNS RNA virus genomes. As such, it provides a further piece in the puzzle of NNS RNA virus evolution.

  10. Complete mitochondrial DNA sequence of the ark shell Scapharca broughtonii: an ultra-large metazoan mitochondrial genome.

    PubMed

    Liu, Yun-Guo; Kurokawa, Tadahide; Sekino, Masashi; Tanabe, Toru; Watanabe, Kazuhito

    2013-03-01

    The complete mitochondrial (mt) genome of the ark shell Scapharca broughtonii was determined using long PCR and a genome walking sequencing strategy with genus-specific primers. The S. broughtonii mt genome (GenBank accession number AB729113) contained 12 protein-coding genes (the atp8 gene is missing, as in most bivalves), 2 ribosomal RNA genes, and 42 transfer tRNA genes, in a length of 46,985 nucleotides for the size of mtDNA with only one copy of the heteroplasmic tandem repeat (HTR) unit. Moreover the S. broughtonii mt genome shows size variation; these genomes ranged in size from about 47 kb to about 50 kb because of variation in the number of repeat sequences in the non-coding region. The mt-genome of S. broughtonii is, to date, the longest reported metazoan mtDNA sequence. Sequence duplication in non-coding region and the formation of HTR arrays were two of the factors responsible for the ultra-large size of this mt genome. All the tRNA genes were found within the S. broughtonii mt genome, unlike the other bivalves usually lacking one or more tRNA genes. Twelve additional specimens were used to analyze the patterns of tandem repeat arrays by PCR amplification and agarose electrophoresis. Each of the 12 specimens displayed extensive heteroplasmy and had 8-10 length variants. The motifs of the HTR arrays are about 353-362 bp and the number of repeats ranges from 1 to 11. PMID:23291309

  12. Matrilocal residence is ancestral in Austronesian societies.

    PubMed

    Jordan, Fiona M; Gray, Russell D; Greenhill, Simon J; Mace, Ruth

    2009-06-01

    The nature of social life in human prehistory is elusive, yet knowing how kinship systems evolve is critical for understanding population history and cultural diversity. Post-marital residence rules specify sex-specific dispersal and kin association, influencing the pattern of genetic markers across populations. Cultural phylogenetics allows us to practise 'virtual archaeology' on these aspects of social life that leave no trace in the archaeological record. Here we show that early Austronesian societies practised matrilocal post-marital residence. Using a Markov-chain Monte Carlo comparative method implemented in a Bayesian phylogenetic framework, we estimated the type of residence at each ancestral node in a sample of Austronesian language trees spanning 135 Pacific societies. Matrilocal residence has been hypothesized for proto-Oceanic society (ca 3500 BP), but we find strong evidence that matrilocality was predominant in earlier Austronesian societies ca 5000-4500 BP, at the root of the language family and its early branches. Our results illuminate the divergent patterns of mtDNA and Y-chromosome markers seen in the Pacific. The analysis of present-day cross-cultural data in this way allows us to directly address cultural evolutionary and life-history processes in prehistory.

  13. Chromosome evolution in kangaroos (Marsupialia: Macropodidae): cross species chromosome painting between the tammar wallaby and rock wallaby spp. with the 2n = 22 ancestral macropodid karyotype.

    PubMed

    O'Neill, R J; Eldridge, M D; Toder, R; Ferguson-Smith, M A; O'Brien, P C; Graves, J A

    1999-06-01

    Marsupial mammals show extraordinary karyotype stability, with 2n = 14 considered ancestral. However, macropodid marsupials (kangaroos and wallabies) exhibit a considerable variety of karyotypes, with a hypothesised ancestral karyotype of 2n = 22. Speciation and karyotypic diversity in rock wallabies (Petrogale) is exceptional. We used cross species chromosome painting to examine the chromosome evolution between the tammar wallaby (2n = 16) and three 2n = 22 rock wallaby species groups with the putative ancestral karyotype. Hybridization of chromosome paints prepared from flow sorted chromosomes of the tammar wallaby to Petrogale spp., showed that this ancestral karyotype is largely conserved among 2n = 22 rock wallaby species, and confirmed the identity of ancestral chromosomes which fused to produce the bi-armed chromosomes of the 2n = 16 tammar wallaby. These results illustrate the fission-fusion process of karyotype evolution characteristic of the kangaroo group.

  14. Leveraging Large-Scale Cancer Genomics Datasets for Germline Discovery - TCGA

    Cancer.gov

    The session will review how data types have changed over time, focusing on how next-generation sequencing is being employed to yield more precise information about the underlying genomic variation that influences tumor etiology and biology.

  15. Obligate Insect Endosymbionts Exhibit Increased Ortholog Length Variation and Loss of Large Accessory Proteins Concurrent with Genome Shrinkage

    PubMed Central

    Kenyon, Laura J.; Sabree, Zakee L.

    2014-01-01

    Extreme genome reduction has been observed in obligate intracellular insect mutualists and is an assumed consequence of fixed, long-term host isolation. Rapid accumulation of mutations and pseudogenization of genes no longer vital for an intracellular lifestyle, followed by deletion of many genes, are factors that lead to genome reduction. Size reductions in individual genes due to small-scale deletions have also been implicated in contributing to overall genome shrinkage. Conserved protein functional domains are expected to exhibit low tolerance for mutations and therefore remain relatively unchanged throughout protein length reduction while nondomain regions, presumably under less selective pressures, would shorten. This hypothesis was tested using orthologous protein sets from the Flavobacteriaceae (phylum: Bacteroidetes) and Enterobacteriaceae (subphylum: Gammaproteobacteria) families, each of which includes some of the smallest known genomes. Upon examination of protein, functional domain, and nondomain region lengths, we found that proteins were not uniformly shrinking with genome reduction, but instead increased length variability and variability was observed in both the functional domain and nondomain regions. Additionally, as complete gene loss also contributes to overall genome shrinkage, we found that the largest proteins in the proteomes of nonhost-restricted bacteroidetial and gammaproteobacterial species often were inferred to be involved in secondary metabolic processes, extracellular sensing, or of unknown function. These proteins were absent in the proteomes of obligate insect endosymbionts. Therefore, loss of genes encoding large proteins not required for host-restricted lifestyles in obligate endosymbiont proteomes likely contributes to extreme genome reduction to a greater degree than gene shrinkage. PMID:24671745

  16. Selection for Unequal Densities of Sigma70 Promoter-like Signals in Different Regions of Large Bacterial Genomes

    SciTech Connect

    Huerta, Araceli M.; Francino, M. Pilar; Morett, Enrique; Collado-Vides, Julio

    2006-03-01

    distribution of promoter-like signals between regulatory and nonregulatory regions detected in large bacterial genomes confers a significant, although small, fitness advantage. This study paves the way for further identification of the specific types of selective constraints that affect the organization of regulatory regions and the overall distribution of promoter-like signals through more detailed comparative analyses among closely-related bacterial genomes.

  17. Comparative analysis of the primate X-inactivation center region and reconstruction of the ancestral primate XIST locus

    PubMed Central

    Horvath, Julie E.; Sheedy, Christina B.; Merrett, Stephanie L.; Diallo, Abdoulaye Banire; Swofford, David L.; NISC Comparative Sequencing Program; Green, Eric D.; Willard, Huntington F.

    2011-01-01

    Here we provide a detailed comparative analysis across the candidate X-Inactivation Center (XIC) region and the XIST locus in the genomes of six primates and three mammalian outgroup species. Since lemurs and other strepsirrhine primates represent the sister lineage to all other primates, this analysis focuses on lemurs to reconstruct the ancestral primate sequences and to gain insight into the evolution of this region and the genes within it. This comparative evolutionary genomics approach reveals significant expansion in genomic size across the XIC region in higher primates, with minimal size alterations across the XIST locus itself. Reconstructed primate ancestral XIC sequences show that the most dramatic changes during the past 80 million years occurred between the ancestral primate and the lineage leading to Old World monkeys. In contrast, the XIST locus compared between human and the primate ancestor does not indicate any dramatic changes to exons or XIST-specific repeats; rather, evolution of this locus reflects small incremental changes in overall sequence identity and short repeat insertions. While this comparative analysis reinforces that the region around XIST has been subject to significant genomic change, even among primates, our data suggest that evolution of the XIST sequences themselves represents only small lineage-specific changes across the past 80 million years. PMID:21518738

  18. Complete genome sequence analysis of human echovirus 30 isolated during a large outbreak in Guangdong Province of China, in 2012.

    PubMed

    Xiao, Hong; Huang, Keyong; Li, Ling; Wu, Xianbo; Zheng, Li; Wan, Chengsong; Zhao, Wei; Ke, Changwen; Zhang, Bao

    2014-02-01

    In May and June 2012, an outbreak of aseptic meningitis caused by Echovirus 30 (E30) occurred on a large scale in Luoding, Guangdong Province, China. Our team successfully isolated one subtype, strain 2012EM161, and its complete genome was sequenced. The phylogenetic tree of viral protein (VP) 1 gene sequences showed that the viral isolate was similar to the E30 strain prevalent in Fujian (2011), with identity of 98.05-99.32 % and 98.63-99.32 % for nucleotides and amino acids respectively. Whole genome-based phylogenetic analysis indicated that 2012EM161 contained the most proximate consensus to DQ246620 (Zhejiang, 2003) and FDJS03 (AY948442, Jiangsu, 2005), with nucleotide homogeneity of 87.09 % and 86.98 % respectively. The RDP4.16 and Simplot analysis showed that the newly discovered 2012EM161 was probably a recombinant, which was closely related to the strain of E30 (DQ246620) in the first half of the genome and the strain of E6 (JX976771) in genomic P3 region. The whole genome sequence of 2012EM161 will allow further study of the origin, evolution, and the molecular epidemiology of E30 strains.

  19. Mitochondrial genome analysis of the predatory mite Phytoseiulus persimilis and a revisit of the Metaseiulus occidentalis mitochondrial genome.

    PubMed

    Dermauw, Wannes; Vanholme, Bartel; Tirry, Luc; Van Leeuwen, Thomas

    2010-04-01

    In this study we sequenced and analysed the complete mitochondrial (mt) genome of the Chilean predatory mite Phytoseiulus persimilis Athias-Henriot (Chelicerata: Acari: Mesostigmata: Phytoseiidae: Amblyseiinae). The 16 199 bp genome (79.8% AT) contains the standard set of 13 protein-coding and 24 RNA genes. Compared with the ancestral arthropod mtDNA pattern, the gene order is extremely reshuffled (35 genes changed position) and represents a novel arrangement within the arthropods. This is probably related to the presence of several large noncoding regions in the genome. In contrast with the mt genome of the closely related species Metaseiulus occidentalis (Phytoseiidae: Typhlodrominae) - which was reported to be unusually large (24 961 bp), to lack nad6 and nad3 protein-coding genes, and to contain 22 tRNAs without T-arms - the genome of P. persimilis has all the features of a standard metazoan mt genome. Consequently, we performed additional experiments on the M. occidentalis mt genome. Our preliminary restriction digests and Southern hybridization data revealed that this genome is smaller than previously reported. In addition, we cloned nad3 in M. occidentalis and positioned this gene between nad4L and 12S-rRNA on the mt genome. Finally, we report that at least 15 of the 22 tRNAs in the M. occidentalis mt genome can be folded into canonical cloverleaf structures similar to their counterparts in P. persimilis.

  20. Genomic identification of founding haplotypes reveals the history of the selfing species Capsella rubella.

    PubMed

    Brandvain, Yaniv; Slotte, Tanja; Hazzouri, Khaled M; Wright, Stephen I; Coop, Graham

    2013-01-01

    The shift from outcrossing to self-fertilization is among the most common evolutionary transitions in flowering plants. Until recently, however, a genome-wide view of this transition has been obscured by both a dearth of appropriate data and the lack of appropriate population genomic methods to interpret such data. Here, we present a novel population genomic analysis detailing the origin of the selfing species, Capsella rubella, which recently split from its outcrossing sister, Capsella grandiflora. Due to the recency of the split, much of the variation within C. rubella is also found within C. grandiflora. We can therefore identify genomic regions where two C. rubella individuals have inherited the same or different segments of ancestral diversity (i.e. founding haplotypes) present in C. rubella's founder(s). Based on this analysis, we show that C. rubella was founded by multiple individuals drawn from a diverse ancestral population closely related to extant C. grandiflora, that drift and selection have rapidly homogenized most of this ancestral variation since C. rubella's founding, and that little novel variation has accumulated within this time. Despite the extensive loss of ancestral variation, the approximately 25% of the genome for which two C. rubella individuals have inherited different founding haplotypes makes up roughly 90% of the genetic variation between them. To extend these findings, we develop a coalescent model that utilizes the inferred frequency of founding haplotypes and variation within founding haplotypes to estimate that C. rubella was founded by a potentially large number of individuals between 50 and 100 kya, and has subsequently experienced a twenty-fold reduction in its effective population size. As population genomic data from an increasing number of outcrossing/selfing pairs are generated, analyses like the one developed here will facilitate a fine-scaled view of the evolutionary and demographic impact of the transition to self

  1. Chromatin organization and cytological features of carnivorous Genlisea species with large genome size differences

    PubMed Central

    Tran, Trung D.; Cao, Hieu X.; Jovtchev, Gabriele; Novák, Petr; Vu, Giang T. H.; Macas, Jiří; Schubert, Ingo; Fuchs, Joerg

    2015-01-01

    The monophyletic carnivorous genus Genlisea (Lentibulariaceae) is characterized by a bi-directional genome size evolution resulting in a 25-fold difference in nuclear DNA content. This is one of the largest ranges found within a genus so far and makes Genlisea an interesting subject to study mechanisms of genome and karyotype evolution. Genlisea nigrocaulis, with 86 Mbp one of the smallest plant genomes, and the 18-fold larger genome of G. hispidula (1,550 Mbp) possess identical chromosome numbers (2n = 40) but differ considerably in chromatin organization, nuclear and cell size. Interphase nuclei of G. nigrocaulis and of related species with small genomes, G. aurea (133 Mbp, 2n ≈ 104) and G. pygmaea (179 Mbp, 2n = 80), are hallmarked by intensely DAPI-stained chromocenters, carrying typical heterochromatin-associated methylation marks (5-methylcytosine, H3K9me2), while in G. hispidula and surprisingly also in the small genome of G. margaretae (184 Mbp, 2n = 38) the heterochromatin marks are more evenly distributed. Probes of tandem repetitive sequences together with rDNA allow the unequivocal discrimination of 13 out of 20 chromosome pairs of G. hispidula. One of the repetitive sequences labeled half of the chromosome set almost homogenously supporting an allopolyploid status of G. hispidula and its close relative G. subglabra (1,622 Mbp, 2n = 40). In G. nigrocaulis 11 chromosome pairs could be individualized using a combination of rDNA and unique genomic probes. The presented data provide a basis for future studies of karyotype evolution within the genus Genlisea. PMID:26347752

  3. Chromatin organization and cytological features of carnivorous Genlisea species with large genome size differences.

    PubMed

    Tran, Trung D; Cao, Hieu X; Jovtchev, Gabriele; Novák, Petr; Vu, Giang T H; Macas, Jiří; Schubert, Ingo; Fuchs, Joerg

    2015-01-01

    The monophyletic carnivorous genus Genlisea (Lentibulariaceae) is characterized by a bi-directional genome size evolution resulting in a 25-fold difference in nuclear DNA content. This is one of the largest ranges found within a genus so far and makes Genlisea an interesting subject to study mechanisms of genome and karyotype evolution. Genlisea nigrocaulis, with 86 Mbp one of the smallest plant genomes, and the 18-fold larger genome of G. hispidula (1,550 Mbp) possess identical chromosome numbers (2n = 40) but differ considerably in chromatin organization, nuclear and cell size. Interphase nuclei of G. nigrocaulis and of related species with small genomes, G. aurea (133 Mbp, 2n ≈ 104) and G. pygmaea (179 Mbp, 2n = 80), are hallmarked by intensely DAPI-stained chromocenters, carrying typical heterochromatin-associated methylation marks (5-methylcytosine, H3K9me2), while in G. hispidula and surprisingly also in the small genome of G. margaretae (184 Mbp, 2n = 38) the heterochromatin marks are more evenly distributed. Probes of tandem repetitive sequences together with rDNA allow the unequivocal discrimination of 13 out of 20 chromosome pairs of G. hispidula. One of the repetitive sequences labeled half of the chromosome set almost homogenously supporting an allopolyploid status of G. hispidula and its close relative G. subglabra (1,622 Mbp, 2n = 40). In G. nigrocaulis 11 chromosome pairs could be individualized using a combination of rDNA and unique genomic probes. The presented data provide a basis for future studies of karyotype evolution within the genus Genlisea. PMID:26347752

  4. Evolution of Prdm Genes in Animals: Insights from Comparative Genomics.

    PubMed

    Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre

    2016-03-01

    Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes.

  5. Evolution of Prdm Genes in Animals: Insights from Comparative Genomics

    PubMed Central

    Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre

    2016-01-01

    Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes. PMID:26560352

  6. The Korarchaeota: Archaeal orphans representing an ancestral lineage of life

    SciTech Connect

    Elkins, James G.; Kunin, Victor; Anderson, Iain; Barry, Kerrie; Goltsman, Eugene; Lapidus, Alla; Hedlund, Brian; Hugenholtz, Phil; Kyrpides, Nikos; Graham, David; Keller, Martin; Wanner, Gerhard; Richardson, Paul; Stetter, Karl O.

    2007-05-01

    Based on conserved cellular properties, all life on Earth can be grouped into different phyla which belong to the primary domains Bacteria, Archaea, and Eukarya. However, tracing back their evolutionary relationships has been impeded by horizontal gene transfer and gene loss. Within the Archaea, the kingdoms Crenarchaeota and Euryarchaeota exhibit a profound divergence. In order to elucidate the evolution of these two major kingdoms, representatives of more deeply diverged lineages would be required. Based on their environmental small subunit ribosomal (ss RNA) sequences, the Korarchaeota had been originally suggested to have an ancestral relationship to all known Archaea although this assessment has been refuted. Here we describe the cultivation and initial characterization of the first member of the Korarchaeota, highly unusual, ultrathin filamentous cells about 0.16 µm in diameter. A complete genome sequence obtained from enrichment cultures revealed an unprecedented combination of signature genes which were thought to be characteristic of either the Crenarchaeota, Euryarchaeota, or Eukarya. Cell division appears to be mediated through a FtsZ-dependent mechanism which is highly conserved throughout the Bacteria and Euryarchaeota. An rpb8 subunit of the DNA-dependent RNA polymerase was identified which is absent from other Archaea and has been described as a eukaryotic signature gene. In addition, the representative organism possesses a ribosome structure typical for members of the Crenarchaeota. Based on its gene complement, this lineage likely diverged near the separation of the two major kingdoms of Archaea. Further investigations of these unique organisms may shed additional light onto the evolution of extant life.

  7. Participants' recall and understanding of genomic research and large-scale data sharing.

    PubMed

    Robinson, Jill Oliver; Slashinski, Melody J; Wang, Tao; Hilsenbeck, Susan G; McGuire, Amy L

    2013-10-01

    As genomic researchers are urged to openly share generated sequence data with other researchers, it is important to examine the utility of informed consent documents and processes, particularly as these relate to participants' engagement with and recall of the information presented to them, their objective or subjective understanding of the key elements of genomic research (e.g., data sharing), as well as how these factors influence or mediate the decisions they make. We conducted a randomized trial of three experimental informed consent documents (ICDs) with participants (n = 229) being recruited to genomic research studies; each document afforded varying control over breadth of release of genetic information. Recall and understanding, their impact on data sharing decisions, and comfort in decision making were assessed in a follow-up structured interview. Over 25% did not remember signing an ICD to participate in a genomic study, and the majority (54%) could not correctly identify with whom they had agreed to share their genomic data. However, participants felt that they understood enough to make an informed decision, and lack of recall did not impact final data sharing decisions or satisfaction with participation. These findings raise questions about the types of information participants need in order to provide valid informed consent, and whether subjective understanding and comfort with decision making are sufficient to satisfy the ethical principle of respect for persons.

  8. Evidence for an Ancestral Association of Human Coronavirus 229E with Bats

    PubMed Central

    Corman, Victor Max; Baldwin, Heather J.; Tateno, Adriana Fumie; Zerbinati, Rodrigo Melim; Annan, Augustina; Owusu, Michael; Nkrumah, Evans Ewald; Maganga, Gael Darren; Oppong, Samuel; Adu-Sarkodie, Yaw; Vallo, Peter; da Silva Filho, Luiz Vicente Ribeiro Ferreira; Leroy, Eric M.; Thiel, Volker; van der Hoek, Lia; Poon, Leo L. M.; Tschapka, Marco

    2015-01-01

    ABSTRACT We previously showed that close relatives of human coronavirus 229E (HCoV-229E) exist in African bats. The small sample and limited genomic characterizations have prevented further analyses so far. Here, we tested 2,087 fecal specimens from 11 bat species sampled in Ghana for HCoV-229E-related viruses by reverse transcription-PCR (RT-PCR). Only hipposiderid bats tested positive. To compare the genetic diversity of bat viruses and HCoV-229E, we tested historical isolates and diagnostic specimens sampled globally over 10 years. Bat viruses were 5- and 6-fold more diversified than HCoV-229E in the RNA-dependent RNA polymerase (RdRp) and spike genes. In phylogenetic analyses, HCoV-229E strains were monophyletic and not intermixed with animal viruses. Bat viruses formed three large clades in close and more distant sister relationships. A recently described 229E-related alpaca virus occupied an intermediate phylogenetic position between bat and human viruses. According to taxonomic criteria, human, alpaca, and bat viruses form a single CoV species showing evidence for multiple recombination events. HCoV-229E and the alpaca virus showed a major deletion in the spike S1 region compared to all bat viruses. Analyses of four full genomes from 229E-related bat CoVs revealed an eighth open reading frame (ORF8) located at the genomic 3′ end. ORF8 also existed in the 229E-related alpaca virus. Reanalysis of HCoV-229E sequences showed a conserved transcription regulatory sequence preceding remnants of this ORF, suggesting its loss after acquisition of a 229E-related CoV by humans. These data suggested an evolutionary origin of 229E-related CoVs in hipposiderid bats, hypothetically with camelids as intermediate hosts preceding the establishment of HCoV-229E. IMPORTANCE The ancestral origins of major human coronaviruses (HCoVs) likely involve bat hosts. Here, we provide conclusive genetic evidence for an evolutionary origin of the common cold virus HCoV-229E in

  9. Maintenance of Large Numbers of Virus Genomes in Human Cytomegalovirus-Infected T98G Glioblastoma Cells

    PubMed Central

    Duan, Ying-Liang; Ye, Han-Qing; Zavala, Anamaria G.; Yang, Cui-Qing; Miao, Ling-Feng; Fu, Bi-Shi; Seo, Keun Seok; Davrinche, Christian

    2014-01-01

    ABSTRACT After infection, human cytomegalovirus (HCMV) persists for life. Primary infections and reactivation of latent virus can both result in congenital infection, a leading cause of central nervous system birth defects. We previously reported long-term HCMV infection in the T98G glioblastoma cell line (1). HCMV infection has been further characterized in T98Gs, emphasizing the presence of HCMV DNA over an extended time frame. T98Gs were infected with either HCMV Towne or AD169-IE2-enhanced green fluorescent protein (eGFP) strains. Towne infections yielded mixed IE1 antigen-positive and -negative (Ag+/Ag−) populations. AD169-IE2-eGFP infections also yielded mixed populations, which were sorted to obtain an IE2− (Ag−) population. Viral gene expression over the course of infection was determined by immunofluorescent analysis (IFA) and reverse transcription-PCR (RT-PCR). The presence of HCMV genomes was determined by PCR, nested PCR (n-PCR), and fluorescence in situ hybridization (FISH). Compared to the HCMV latency model, THP-1, Towne-infected T98Gs expressed IE1 and latency-associated transcripts for longer periods, contained many more HCMV genomes during early passages, and carried genomes for a greatly extended period of passaging. Large numbers of HCMV genomes were also found in purified Ag− AD169-infected cells for the first several passages. Interestingly, latency transcripts were observed from very early times in the Towne-infected cells, even when IE1 was expressed at low levels. Although AD169-infected Ag− cells expressed no detectable levels of either IE1 or latency transcripts, they also maintained large numbers of genomes within the cell nuclei for several passages. These results identify HCMV-infected T98Gs as an attractive new model in the study of the long-term maintenance of virus genomes in the context of neural cell types. IMPORTANCE Our previous work showed that T98G glioblastoma cells were semipermissive to HCMV infection; virus

  10. Merlin: Computer-Aided Oligonucleotide Design for Large Scale Genome Engineering with MAGE.

    PubMed

    Quintin, Michael; Ma, Natalie J; Ahmed, Samir; Bhatia, Swapnil; Lewis, Aaron; Isaacs, Farren J; Densmore, Douglas

    2016-06-17

    Genome engineering technologies now enable precise manipulation of organism genotype, but can be limited in scalability by their design requirements. Here we describe Merlin ( http://merlincad.org ), an open-source web-based tool to assist biologists in designing experiments using multiplex automated genome engineering (MAGE). Merlin provides methods to generate pools of single-stranded DNA oligonucleotides (oligos) for MAGE experiments by performing free energy calculation and BLAST scoring on a sliding window spanning the targeted site. These oligos are designed not only to improve recombination efficiency, but also to minimize off-target interactions. The application further assists experiment planning by reporting predicted allelic replacement rates after multiple MAGE cycles, and enables rapid result validation by generating primer sequences for multiplexed allele-specific colony PCR. Here we describe the Merlin oligo and primer design procedures and validate their functionality compared to OptMAGE by eliminating seven AvrII restriction sites from the Escherichia coli genome.
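
    The sliding-window step described in this abstract can be illustrated with a minimal Python sketch. It is not Merlin's implementation: the function names, the 90-nt default oligo length, and the GC-content score standing in for the free energy and BLAST terms are all assumptions made for illustration.

```python
def candidate_oligos(genome, target, oligo_len):
    """Yield (start, oligo) for every window of oligo_len that covers index `target`."""
    if oligo_len > len(genome):
        return
    first = max(0, target - oligo_len + 1)
    last = min(target, len(genome) - oligo_len)
    for start in range(first, last + 1):
        yield start, genome[start:start + oligo_len]


def gc_fraction(seq):
    """Fraction of G/C bases; a crude placeholder for a real thermodynamic score."""
    return sum(base in "GC" for base in seq) / len(seq)


def best_oligo(genome, target, oligo_len=90, score=None):
    """Return the highest-scoring (start, oligo) window around the target position.

    `score` is hypothetical: by default it prefers windows whose GC content
    is closest to 50%. A real tool would combine free energy and BLAST hits.
    """
    score = score or (lambda s: -abs(gc_fraction(s) - 0.5))
    return max(candidate_oligos(genome, target, oligo_len), key=lambda c: score(c[1]))
```

    Passing a different `score` callable is the intended extension point; the enumeration of windows stays the same whatever scoring scheme is plugged in.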

  11. A large genome centre’s improvements to the Illumina sequencing system

    PubMed Central

    Quail, Michael A.; Kozarewa, Iwanka; Smith, Frances; Scally, Aylwyn; Stephens, Philip J.; Durbin, Richard; Swerdlow, Harold; Turner, Daniel J.

    2008-01-01

    The Wellcome Trust Sanger Institute is one of the world’s largest genome centres, and a substantial amount of our sequencing is performed on ‘next generation’ massively parallel sequencing technologies: in June 2008 the quantity of purity filtered sequence data generated by our Genome Analyzer (Illumina) platforms reached 1 terabase, and our average weekly Illumina production output is currently 64 gigabases (Gb). Here we describe a set of improvements we have made to the standard Illumina protocols to make the library preparation more reliable in a high throughput environment, to reduce bias, tighten insert size distribution, and reliably obtain high yields of data. PMID:19034268

  12. Large Scale Sequencing of Dothideomycetes Provides Insights into Genome Evolution and Adaptation

    SciTech Connect

    Haridas, Sajeet; Crous, Pedro; Binder, Manfred; Spatafora, Joseph; Grigoriev, Igor

    2015-03-16

    Dothideomycetes is the largest and most diverse class of ascomycete fungi, with 23 orders, 110 families, 1,300 genera and over 19,000 known species. We present comparative analysis of 70 Dothideomycete genomes including over 50 that we sequenced and are as yet unpublished. This extensive sampling has almost quadrupled the previous study of 18 species and uncovered a 10-fold range of genome sizes. We were able to clarify the phylogenetic positions of several species whose origins were unclear in previous morphological and sequence comparison studies. We analyzed selected gene families including proteases, transporters and small secreted proteins and show that major differences in gene content are influenced by speciation.

  13. Twenty years of artificial directional selection have shaped the genome of the Italian Large White pig breed.

    PubMed

    Schiavo, G; Galimberti, G; Calò, D G; Samorè, A B; Bertolini, F; Russo, V; Gallo, M; Buttazzoni, L; Fontanesi, L

    2016-04-01

    In this study, we investigated at the genome-wide level if 20 years of artificial directional selection based on boar genetic evaluation obtained with a classical BLUP animal model shaped the genome of the Italian Large White pig breed. The most influential boars of this breed (n = 192), born from 1992 (the beginning of the selection program of this breed) to 2012, with an estimated breeding value reliability of >0.85, were genotyped with the Illumina Porcine SNP60 BeadChip. After grouping the boars in eight classes according to their year of birth, filtered single nucleotide polymorphisms (SNPs) were used to evaluate the effects of time on genotype frequency changes using multinomial logistic regression models. Of these markers, 493 had a P_Bonferroni < 0.10. However, there was an increasing number of SNPs with a decreasing level of allele frequency changes over time, representing a continuous profile across the genome. The largest proportion of the 493 SNPs was on porcine chromosome (SSC) 7, SSC2, SSC8 and SSC18 for a total of 204 haploblocks. Functional annotations of genomic regions, including the 493 shifted SNPs, reported a few Gene Ontology terms that might underlie the biological processes that contributed to increase performances of the pigs over the 20 years of the selection program. The obtained results indicated that the genome of the Italian Large White pigs was shaped by a directional selection program derived by the application of methodologies assuming the infinitesimal model that captured a continuous trend of allele frequency changes in the boar population.
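
    The Bonferroni threshold mentioned in this abstract can be made concrete with a short Python sketch. Only the multiple-testing correction is shown; the regression models that produced the per-SNP P values are not reproduced, and the function names and the example P values in the usage note are invented for illustration.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted P values: each P times the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significant(p_values, alpha=0.10):
    """Indices of tests whose adjusted P value falls below alpha (0.10 as in the abstract)."""
    return [i for i, p in enumerate(bonferroni(p_values)) if p < alpha]
```

    For three hypothetical tests with raw P values 0.01, 0.04 and 0.5, the adjusted values are 0.03, 0.12 and 1.0, so only the first passes the 0.10 threshold. With tens of thousands of SNPs on a genotyping chip the multiplier is correspondingly large, which is why the correction is conservative.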

  15. Characterisation of monotreme caseins reveals lineage-specific expansion of an ancestral casein locus in mammals.

    PubMed

    Lefèvre, Christophe M; Sharp, Julie A; Nicholas, Kevin R

    2009-01-01

    Using a milk-cell cDNA sequencing approach we characterised milk-protein sequences from two monotreme species, platypus (Ornithorhynchus anatinus) and echidna (Tachyglossus aculeatus) and found a full set of caseins and casein variants. The genomic organisation of the platypus casein locus is compared with other mammalian genomes, including the marsupial opossum and several eutherians. Physical linkage of casein genes has been seen in the casein loci of all mammalian genomes examined and we confirm that this is also observed in platypus. However, we show that a recent duplication of beta-casein occurred in the monotreme lineage, as opposed to more ancient duplications of alpha-casein in the eutherian lineage, while marsupials possess only single copies of alpha- and beta-caseins. Despite this variability, the close proximity of the main alpha- and beta-casein genes in an inverted tail-tail orientation and the relative orientation of the more distant kappa-casein genes are similar in all mammalian genome sequences so far available. Overall, the conservation of the genomic organisation of the caseins indicates the early, pre-monotreme development of the fundamental role of caseins during lactation. In contrast, the lineage-specific gene duplications that have occurred within the casein locus of monotremes and eutherians but not marsupials, which may have lost part of the ancestral casein locus, emphasises the independent selection on milk provision strategies to the young, most likely linked to different developmental strategies. The monotremes therefore provide insight into the ancestral drivers for lactation and how these have adapted in different lineages. PMID:19874726

  16. Kmasker--a tool for in silico prediction of single-copy FISH probes for the large-genome species Hordeum vulgare.

    PubMed

    Schmutzer, T; Ma, L; Pousarebani, N; Bull, F; Stein, N; Houben, A; Scholz, U

    2014-01-01

    Specific localization of large genomic fragments by fluorescence in situ hybridization (FISH) is challenging in large-genome plant species due to the high content of repetitive sequences. We report an automated workflow (Kmasker) for in silico extraction of unique genomic sequences of large genomic fragments suitable for FISH in barley. This method can be widely used for the integration of genetic and cytogenetic maps in plants and other species with large and complex genomes if the probe sequence (e.g. BACs, sequence contigs) and a low coverage (8-fold) of unassembled sequences of the species of interest are available. Kmasker has been made publicly available as a web tool at http://webblast.ipk-gatersleben.de/kmasker. PMID:24335088
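
    The abstract does not describe the algorithm itself. As a rough illustration only, the sketch below assumes a k-mer counting approach: k-mers are counted in unassembled reads, and probe positions covered by a frequently seen k-mer are masked, leaving candidate single-copy stretches. It is not Kmasker's implementation; the function names, the k-mer size, and the count threshold are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of read sequences."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def mask_repeats(probe, kmer_counts, k, max_count):
    """Return the probe with repetitive positions replaced by 'N'.

    A position is masked if any k-mer overlapping it was seen more than
    max_count times in the reads (max_count is a hypothetical threshold).
    """
    probe = probe.upper()
    masked = [False] * len(probe)
    for i in range(len(probe) - k + 1):
        if kmer_counts[probe[i:i + k]] > max_count:
            for j in range(i, i + k):
                masked[j] = True
    return "".join("N" if m else base for base, m in zip(probe, masked))


def unmasked_segments(masked_probe, min_len):
    """List (start, end) half-open intervals of unmasked runs of at least min_len bases."""
    segments, start = [], None
    for i, base in enumerate(masked_probe + "N"):  # trailing sentinel closes the last run
        if base != "N" and start is None:
            start = i
        elif base == "N" and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments


# Toy data, invented for illustration (not from the paper).
reads = ["ACGTACGTACGT", "ACGTACGT", "TTTTGGGGCCCC"]
counts = count_kmers(reads, k=4)
masked = mask_repeats("GGGGCCCCACGTACGT", counts, k=4, max_count=2)
segments = unmasked_segments(masked, min_len=5)
```

    In practice the threshold would need to scale with read coverage (8-fold in the abstract), since a single-copy k-mer is then expected to appear several times in the reads.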

  17. A Protocol for mtGenome Analysis on Large Sample Numbers

    PubMed Central

    Hamoy, Igor G; Ribeiro-dos-Santos, André M; Alvarez, Luiz; Barbosa, Silvanira; Silva, Artur; Santos, Sidney; Gusmão, Leonor; Ribeiro-dos-Santos, Ândrea

    2014-01-01

    The mitochondrial genome is widely studied in a variety of fields, such as population, forensic, and human and medical genetics. Most studies have been limited to a small portion of the sequence that, although highly diverse, does not describe the total variability. The arrival of modern high-throughput sequencing technologies has made it possible to investigate larger sequences in a shorter amount of time as well as in a more affordable fashion. This work aims to describe a protocol for sequencing and analyzing the complete mitochondrial genome with the Ion PGM™ platform. To evaluate the protocol, the mitochondrial genome was sequenced to approximately 210 Mbp, with high-quality sequences distributed between 12 samples that had an average coverage of 1023× per sample. Several variant callers were compared to improve the protocol outcome. The results suggest that it is possible to run up to 120 samples per run without any significant loss of quality. Therefore, this protocol is an efficient and accurate tool for full mitochondrial genome analysis. PMID:25002812
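
    The coverage figures above can be sanity-checked with simple arithmetic. The sketch below assumes the human mitochondrial reference length of 16,569 bp (the abstract does not state the genome length) and an even split of bases across samples; the function name is hypothetical.

```python
def mean_coverage(total_bases, n_samples, genome_length):
    """Expected mean depth per sample if bases are split evenly across samples."""
    return total_bases / (n_samples * genome_length)


MT_GENOME_BP = 16_569   # assumption: human mitochondrial reference (rCRS) length
RUN_YIELD_BP = 210e6    # approximate run yield quoted in the abstract

cov_12 = mean_coverage(RUN_YIELD_BP, 12, MT_GENOME_BP)
cov_120 = mean_coverage(RUN_YIELD_BP, 120, MT_GENOME_BP)
print(round(cov_12), round(cov_120))
```

    Under these assumptions, 12 samples give a depth of the same order as the reported 1023×, and 120 samples on the same yield would each receive one tenth of that depth.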

  18. Discovery of novel phosphonate natural products and their biosynthetic pathways by large-scale genome mining

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genome mining has revolutionized the field of natural products, providing hope that new antibiotics can be discovered in time before all remainders are rendered useless against multidrug resistant pathogens. While this approach has been successful in academic settings focused on small collections or...

  19. From RNA-seq to large-scale genotyping - genomics resources for rye (Secale cereale L.)

    PubMed Central

    2011-01-01

    Background: The improvement of agricultural crops with regard to yield, resistance and environmental adaptation is a perpetual challenge for both breeding and research. Exploration of the genetic potential and implementation of genome-based breeding strategies for efficient rye (Secale cereale L.) cultivar improvement have been hampered by the lack of genome sequence information. To overcome this limitation we sequenced the transcriptomes of five winter rye inbred lines using Roche/454 GS FLX technology. Results: More than 2.5 million reads were assembled into 115,400 contigs representing a comprehensive rye expressed sequence tag (EST) resource. From sequence comparisons 5,234 single nucleotide polymorphisms (SNPs) were identified to develop the Rye5K high-throughput SNP genotyping array. Performance of the Rye5K SNP array was investigated by genotyping 59 rye inbred lines including the five lines used for sequencing, and five barley, three wheat, and two triticale accessions. A balanced distribution of allele frequencies ranging from 0.1 to 0.9 was observed. Residual heterozygosity of the rye inbred lines varied from 4.0 to 20.4% with higher average heterozygosity in the pollen compared to the seed parent pool. Conclusions: The established sequence and molecular marker resources will improve and promote genetic and genomic research as well as genome-based breeding in rye. PMID:21951788

  1. Evolutionary analysis of a large mtDNA translocation (numt) into the nuclear genome of the Panthera genus species.

    PubMed

    Kim, Jae-Heup; Antunes, Agostinho; Luo, Shu-Jin; Menninger, Joan; Nash, William G; O'Brien, Stephen J; Johnson, Warren E

    2006-02-01

    Translocation of cymtDNA into the nuclear genome, also referred to as numt, has been reported in many species, including several closely related to the domestic cat (Felis catus). We describe the recent transposition of 12,536 bp of the 17 kb mitochondrial genome into the nucleus of the common ancestor of the five Panthera genus species: tiger, P. tigris; snow leopard, P. uncia; jaguar, P. onca; leopard, P. pardus; and lion, P. leo. This nuclear integration, representing 74% of the mitochondrial genome, is one of the largest to be reported in eukaryotes. The Panthera genus numt differs from the numt previously described in the Felis genus in: (1) chromosomal location (F2-telomeric region vs. D2-centromeric region), (2) gene make up (from the ND5 to the ATP8 vs. from the CR to the COII), (3) size (12.5 vs. 7.9 kb), and (4) structure (single monomer vs. tandemly repeated in Felis). These distinctions indicate that the origin of this large numt fragment in the nuclear genome of the Panthera species is an independent insertion from that of the domestic cat lineage, which has been further supported by phylogenetic analyses. The tiger cymtDNA shared around 90% sequence identity with the homologous numt sequence, suggesting an origin for the Panthera numt at around 3.5 million years ago, prior to the radiation of the five extant Panthera species. PMID:16380222

  2. Comparative genomics and evolution of eukaryotic phospholipid biosynthesis

    SciTech Connect

    Lykidis, Athanasios

    2006-12-01

    Phospholipid biosynthetic enzymes produce diverse molecular structures and are often present in multiple forms encoded by different genes. This work utilizes comparative genomics and phylogenetics for exploring the distribution, structure and evolution of phospholipid biosynthetic genes and pathways in 26 eukaryotic genomes. Although the basic structure of the pathways was formed early in eukaryotic evolution, the emerging picture indicates that individual enzyme families followed unique evolutionary courses. For example, choline and ethanolamine kinases and cytidylyltransferases emerged in ancestral eukaryotes, whereas, multiple forms of the corresponding phosphatidyltransferases evolved mainly in a lineage specific manner. Furthermore, several unicellular eukaryotes maintain bacterial-type enzymes and reactions for the synthesis of phosphatidylglycerol and cardiolipin. Also, base-exchange phosphatidylserine synthases are widespread and ancestral enzymes. The multiplicity of phospholipid biosynthetic enzymes has been largely generated by gene expansion in a lineage specific manner. Thus, these observations suggest that phospholipid biosynthesis has been an actively evolving system. Finally, comparative genomic analysis indicates the existence of novel phosphatidyltransferases and provides a candidate for the uncharacterized eukaryotic phosphatidylglycerol phosphate phosphatase.

  3. Neanderthal and Denisova genetic affinities with contemporary humans: introgression versus common ancestral polymorphisms.

    PubMed

    Lowery, Robert K; Uribe, Gabriel; Jimenez, Eric B; Weiss, Mark A; Herrera, Kristian J; Regueiro, Maria; Herrera, Rene J

    2013-11-01

    Analyses of the genetic relationships among modern humans, Neanderthals and Denisovans have suggested that 1-4% of the non-Sub-Saharan African gene pool may be Neanderthal derived, while 6-8% of the Melanesian gene pool may be the product of admixture between the Denisovans and the direct ancestors of Melanesians. In the present study, we analyzed single nucleotide polymorphism (SNP) diversity among a worldwide collection of contemporary human populations with respect to the genetic constitution of these two archaic hominins and Pan troglodytes (chimpanzee). We partitioned SNPs into subsets, including those that are derived in both archaic lineages, those that are ancestral in both archaic lineages and those that are only derived in one archaic lineage. By doing this, we have conducted separate examinations of subsets of mutations with higher probabilities of divergent phylogenetic origins. While previous investigations have excluded SNPs from common ancestors in principal component analyses, we included common ancestral SNPs in our analyses to visualize the relative placement of the Neanderthal and Denisova among human populations. To assess the genetic similarities among the various hominin lineages, we performed genetic structure analyses to provide a comparison of genetic patterns found within contemporary human genomes that may have archaic or common ancestral roots. Our results indicate that 3.6% of the Neanderthal genome is shared with roughly 65.4% of the average European gene pool, which clinally diminishes with distance from Europe. Our results suggest that Neanderthal genetic associations with contemporary non-Sub-Saharan African populations, as well as the genetic affinities observed between Denisovans and Melanesians most likely result from the retention of ancient mutations in these populations. PMID:23872234
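
    The SNP partitioning described above can be sketched as a simple classification. This is an illustration, not the authors' pipeline: it assumes the chimpanzee allele is taken as the ancestral state, and the function names, subset labels, and toy genotypes are hypothetical.

```python
def classify_snp(chimp, neanderthal, denisovan):
    """Assign a SNP to a subset by comparing each archaic allele
    to the assumed ancestral (chimpanzee) allele."""
    nea_derived = neanderthal != chimp
    den_derived = denisovan != chimp
    if nea_derived and den_derived:
        return "derived_in_both"
    if not nea_derived and not den_derived:
        return "ancestral_in_both"
    return "derived_in_neanderthal_only" if nea_derived else "derived_in_denisovan_only"


def partition_snps(snps):
    """Group SNP identifiers by subset; snps maps id -> (chimp, neanderthal, denisovan)."""
    subsets = {}
    for snp_id, (chimp, nea, den) in snps.items():
        subsets.setdefault(classify_snp(chimp, nea, den), []).append(snp_id)
    return subsets


# Toy alleles, invented for illustration only.
toy_snps = {
    "snp1": ("A", "G", "G"),
    "snp2": ("C", "C", "C"),
    "snp3": ("T", "C", "T"),
    "snp4": ("G", "G", "A"),
}
subsets = partition_snps(toy_snps)
```

    Each subset could then be analysed separately, which is the step the abstract describes as examining mutations with different probable phylogenetic origins.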

  4. Large-scale appearance of ultraconserved elements in tetrapod genomes and slowdown of the molecular clock.

    PubMed

    Stephen, Stuart; Pheasant, Michael; Makunin, Igor V; Mattick, John S

    2008-02-01

    Mammalian genomes contain millions of highly conserved noncoding sequences, many of which are regulatory. The most extreme examples are the 481 ultraconserved elements (UCEs) that are identical over at least 200 bp in human, mouse, and rat and show 96% identity with chicken, which diverged approximately 310 MYA. If the substitution rate in UCEs remained constant, these elements should also be present with a high level of identity in fish (approximately 450 Myr), but this is not the case, suggesting that many appeared in the amniotes or tetrapods or that the molecular clock has slowed down in these lineages, or both. Taking advantage of the availability of multiple genomes, we identified 13,736 UCEs in the human genome that are identical over at least 100 bp in at least 3 of 5 placental mammals, including 2,189 sequences over at least 200 bp, thereby greatly expanding the repertoire of known UCEs, and investigated the evolution of these sequences in opossum, chicken, frog, and fish. We conclude that there was a massive genome-wide acquisition and expansion of UCEs during tetrapod and then amniote evolution, accompanied by a slowdown of the molecular clock, particularly in the amniotes, a process consistent with their functional exaptation in these lineages. The majority of tetrapod-specific UCEs are noncoding and associated with genes involved in regulation of transcription and development. In contrast, fish genomes contain relatively few UCEs, the majority of which are common to all bony vertebrates. These elements are different from other conserved noncoding elements and appear to be important regulatory innovations that became fixed following the emergence of vertebrates from the sea to the land.

  5. Ancestral Rocky Mountain Tectonics: A Sedimentary Record of Ancestral Front Range and Uncompahgre Exhumation

    NASA Astrophysics Data System (ADS)

    Smith, T. M.; Saylor, J. E.; Lapen, T. J.

    2015-12-01

    The Ancestral Rocky Mountains (ARM) encompass multiple crustal provinces with characteristic crystallization ages across the central and western US. Two driving mechanisms have been proposed to explain ARM deformation. (1) Ouachita-Marathon collision SE of the ARM uplifts has been linked to an E-to-W sequence of uplift and is consistent with proposed disruption of a larger Paradox-Central Colorado Trough Basin by exhumation of the Uncompahgre Uplift. Initial exhumation of the Amarillo-Wichita Uplift to the east would provide a unique ~530 Ma signal absent from source areas to the SW, and result in initial exhumation of the Ancestral Front Range. (2) Alternatively, deformation due to flat slab subduction along a hypothesized plate boundary to the SW suggests a SW-to-NE younging of exhumation. This hypothesis suggests a SW-derived Grenville signature, and would trigger uplift of the Uncompahgre first. We analyzed depositional environments, sediment dispersal patterns, and sediment and basement zircon U-Pb and (U-Th)/He ages in 3 locations in the Paradox Basin and Central Colorado Trough (CCT). The Paradox Basin exhibits an up-section transition in fluvial style that suggests a decrease in overbank stability and increased lateral migration. Similarly, the CCT records a long-term progradation of depositional environments from marginal marine to fluvial, indicating that sediment supply in both basins outpaced accommodation. Preliminary provenance results indicate little to no input from the Amarillo-Wichita uplift in either basin despite uniformly westward sediment dispersal systems in both basins. Results also show that the Uncompahgre Uplift was the source for sediment throughout Paradox Basin deposition. These observations are inconsistent with the predictions of scenario 1 above. Rather, they suggest either a synchronous response to tectonic stress across the ARM provinces or an SW-to-NE pattern of deformation.

  6. Minimal genome: Worthwhile or worthless efforts toward being smaller?

    PubMed

    Choe, Donghui; Cho, Suhyung; Kim, Sun Chang; Cho, Byung-Kwan

    2016-02-01

    Microbial cells are versatile hosts for the production of value-added products due to the well-established background knowledge, various genetic tools, and ease of manipulation. Despite those advantages, efficiency of newly incorporated synthetic pathways in microbial cells is frequently limited by innate metabolism, product toxicity, and growth-mediated genetic instability. To overcome those obstacles, a minimal genome harboring only the essential set of genes was proposed, which is a fascinating concept with potential for use as a platform strain. Here, we review the currently available artificial reduced genomes and discuss the prospects for extending use of the genome-reduced strains as programmable chassis. The genome-reduced strains generally showed comparable growth to and higher productivity than their ancestral strains. In Escherichia coli, about 300 genes are estimated as the minimal number of genes under laboratory conditions. However, recent advances revealed that there are non-essential components in essential genes, suggesting that the design principle of minimal genomes should be reconstructed. Current technology is not efficient enough to reduce large amounts of interspaced genomic regions or to synthesize the genome. Furthermore, construction of minimal genomes has frequently failed due to lack of genomic information. Technological breakthroughs and intense systematic studies on genomes remain tasks.

  7. Reconstructed Ancestral Myo-Inositol-3-Phosphate Synthases Indicate That Ancestors of the Thermococcales and Thermotoga Species Were More Thermophilic than Their Descendants

    PubMed Central

    Butzin, Nicholas C.; Lapierre, Pascal; Green, Anna G.; Swithers, Kristen S.; Gogarten, J. Peter; Noll, Kenneth M.

    2013-01-01

    The bacterial genomes of Thermotoga species show evidence of significant interdomain horizontal gene transfer from the Archaea. Members of this genus acquired many genes from the Thermococcales, which grow at higher temperatures than Thermotoga species. In order to study the functional history of an interdomain horizontally acquired gene we used ancestral sequence reconstruction to examine the thermal characteristics of reconstructed ancestral proteins of the Thermotoga lineage and its archaeal donors. Several ancestral sequence reconstruction methods were used to determine the possible sequences of the ancestral Thermotoga and Archaea myo-inositol-3-phosphate synthase (MIPS). These sequences were predicted to be more thermostable than the extant proteins using an established sequence composition method. We verified these computational predictions by measuring the activities and thermostabilities of purified proteins from the Thermotoga and the Thermococcales species, and eight ancestral reconstructed proteins. We found that the ancestral proteins from both the archaeal donor and the Thermotoga most recent common ancestor recipient were more thermostable than their descendants. We show that there is a correlation between the thermostability of MIPS protein and the optimal growth temperature (OGT) of its host, which suggests that the ancestors of these species of Archaea and the Thermotoga grew at higher OGTs than their descendants. PMID:24391933

  8. Large-Scale Gene Relocations following an Ancient Genome Triplication Associated with the Diversification of Core Eudicots

    PubMed Central

    Wang, Yupeng; Ficklin, Stephen P.; Wang, Xiyin; Feltus, F. Alex; Paterson, Andrew H.

    2016-01-01

    Different modes of gene duplication including whole-genome duplication (WGD), and tandem, proximal and dispersed duplications are widespread in angiosperm genomes. Small-scale, stochastic gene relocations and transposed gene duplications are widely accepted to be the primary mechanisms for the creation of dispersed duplicates. However, here we show that most surviving ancient dispersed duplicates in core eudicots originated from large-scale gene relocations within a narrow window of time following a genome triplication (γ) event that occurred in the stem lineage of core eudicots. We name these surviving ancient dispersed duplicates as relocated γ duplicates. In Arabidopsis thaliana, relocated γ, WGD and single-gene duplicates have distinct features with regard to gene functions, essentiality, and protein interactions. Relative to γ duplicates, relocated γ duplicates have higher non-synonymous substitution rates, but comparable levels of expression and regulation divergence. Thus, relocated γ duplicates should be distinguished from WGD and single-gene duplicates for evolutionary investigations. Our results suggest large-scale gene relocations following the γ event were associated with the diversification of core eudicots. PMID:27195960

  9. Large, Male Germ Cell-Specific Hypomethylated DNA Domains With Unique Genomic and Epigenomic Features on the Mouse X Chromosome

    PubMed Central

    Ikeda, Rieko; Shiura, Hirosuke; Numata, Koji; Sugimoto, Michihiko; Kondo, Masayo; Mise, Nathan; Suzuki, Masako; Greally, John M.; Abe, Kuniya

    2013-01-01

    To understand the epigenetic regulation required for germ cell-specific gene expression in the mouse, we analysed DNA methylation profiles of developing germ cells using a microarray-based assay adapted for a small number of cells. The analysis revealed differentially methylated sites between cell types tested. Here, we focused on a group of genomic sequences hypomethylated specifically in germline cells as candidate regions involved in the epigenetic regulation of germline gene expression. These hypomethylated sequences tend to be clustered, forming large (10 kb to ∼9 Mb) genomic domains, particularly on the X chromosome of male germ cells. Most of these regions, designated here as large hypomethylated domains (LoDs), correspond to segmentally duplicated regions that contain gene families showing germ cell- or testis-specific expression, including cancer testis antigen genes. We found an inverse correlation between DNA methylation level and expression of genes in these domains. Most LoDs appear to be enriched with H3 lysine 9 dimethylation, usually regarded as a repressive histone modification, although some LoD genes can be expressed in male germ cells. It thus appears that such a unique epigenomic state associated with the LoDs may constitute a basis for the specific expression of genes contained in these genomic domains. PMID:23861320

  10. Retroviral envelope syncytin capture in an ancestrally diverged mammalian clade for placentation in the primitive Afrotherian tenrecs.

    PubMed

    Cornelis, Guillaume; Vernochet, Cécile; Malicorne, Sébastien; Souquere, Sylvie; Tzika, Athanasia C; Goodman, Steven M; Catzeflis, François; Robinson, Terence J; Milinkovitch, Michel C; Pierron, Gérard; Heidmann, Odile; Dupressoir, Anne; Heidmann, Thierry

    2014-10-14

    Syncytins are fusogenic envelope (env) genes of retroviral origin that have been captured for a function in placentation. Syncytins have been identified in Euarchontoglires (primates, rodents, Leporidae) and Laurasiatheria (Carnivora, ruminants) placental mammals. Here, we searched for similar genes in species that retained characteristic features of primitive mammals, namely the Malagasy and mainland African Tenrecidae. They belong to the superorder Afrotheria, an early lineage that diverged from Euarchontoglires and Laurasiatheria 100 Mya, during the Cretaceous terrestrial revolution. An in silico search for env genes with full coding capacity within a Tenrecidae genome identified several candidates, with one displaying placenta-specific expression as revealed by RT-PCR analysis of a large panel of Setifer setosus tissues. Cloning of this endogenous retroviral env gene demonstrated fusogenicity in an ex vivo cell-cell fusion assay on a panel of mammalian cells. Refined analysis of placental architecture and ultrastructure combined with in situ hybridization demonstrated specific expression of the gene in multinucleate cellular masses and layers at the materno-fetal interface, consistent with a role in syncytium formation. This gene, which we named "syncytin-Ten1," is conserved among Tenrecidae, with evidence of purifying selection and conservation of fusogenic activity. To our knowledge, it is the first syncytin identified to date within the ancestrally diverged Afrotheria superorder.

  11. Retroviral envelope syncytin capture in an ancestrally diverged mammalian clade for placentation in the primitive Afrotherian tenrecs

    PubMed Central

    Cornelis, Guillaume; Vernochet, Cécile; Malicorne, Sébastien; Souquere, Sylvie; Tzika, Athanasia C.; Goodman, Steven M.; Catzeflis, François; Robinson, Terence J.; Milinkovitch, Michel C.; Pierron, Gérard; Heidmann, Odile; Dupressoir, Anne; Heidmann, Thierry

    2014-01-01

    Syncytins are fusogenic envelope (env) genes of retroviral origin that have been captured for a function in placentation. Syncytins have been identified in Euarchontoglires (primates, rodents, Leporidae) and Laurasiatheria (Carnivora, ruminants) placental mammals. Here, we searched for similar genes in species that retained characteristic features of primitive mammals, namely the Malagasy and mainland African Tenrecidae. They belong to the superorder Afrotheria, an early lineage that diverged from Euarchontoglires and Laurasiatheria 100 Mya, during the Cretaceous terrestrial revolution. An in silico search for env genes with full coding capacity within a Tenrecidae genome identified several candidates, with one displaying placenta-specific expression as revealed by RT-PCR analysis of a large panel of Setifer setosus tissues. Cloning of this endogenous retroviral env gene demonstrated fusogenicity in an ex vivo cell–cell fusion assay on a panel of mammalian cells. Refined analysis of placental architecture and ultrastructure combined with in situ hybridization demonstrated specific expression of the gene in multinucleate cellular masses and layers at the materno–fetal interface, consistent with a role in syncytium formation. This gene, which we named “syncytin-Ten1,” is conserved among Tenrecidae, with evidence of purifying selection and conservation of fusogenic activity. To our knowledge, it is the first syncytin identified to date within the ancestrally diverged Afrotheria superorder. PMID:25267646

  12. Linkage-disequilibrium mapping of disease genes by reconstruction of ancestral haplotypes in founder populations.

    PubMed Central

    Service, S K; Lang, D W; Freimer, N B; Sandkuijl, L A

    1999-01-01

    Linkage disequilibrium (LD) mapping may be a powerful means for genome screening to identify susceptibility loci for common diseases. A new statistical approach for detection of LD around a disease gene is presented here. This method compares the distribution of haplotypes in affected individuals versus that expected for individuals descended from a common ancestor who carried a mutation of the disease gene. Simulations demonstrate that this method, which we term "ancestral haplotype reconstruction" (AHR), should be powerful for genome screening of phenotypes characterized by a high degree of etiologic heterogeneity, even with currently available marker maps. AHR is best suited to application in isolated populations where affected individuals are relatively recently descended (< approximately 25 generations) from a common disease mutation-bearing founder. PMID:10330361

  13. First complete genome sequence of a capsicum chlorosis tospovirus isolate from Australia with an unusually large S RNA intergenic region.

    PubMed

    Widana Gamage, Shirani; Persley, Denis M; Higgins, Colleen M; Dietzgen, Ralf G

    2015-03-01

    The first complete genome sequence of capsicum chlorosis virus (CaCV) from Australia was determined using a combination of Illumina HiSeq RNA and Sanger sequencing technologies. Australian CaCV had a tripartite genome structure like other CaCV isolates. The large (L) RNA was 8913 nucleotides (nt) in length and contained a single open reading frame (ORF) of 8634 nt encoding a predicted RNA-dependent RNA polymerase (RdRp) in the viral-complementary (vc) sense. The medium (M) and small (S) RNA segments were 4846 and 3944 nt in length, respectively, each containing two non-overlapping ORFs in ambisense orientation, separated by intergenic regions (IGR). The M segment contained ORFs encoding the predicted non-structural movement protein (NSm; 927 nt) and precursor of glycoproteins (GP; 3366 nt) in the viral sense (v) and vc strand, respectively, separated by a 449-nt IGR. The S segment coded for the predicted nucleocapsid (N) protein (828 nt) and non-structural suppressor of silencing protein (NSs; 1320 nt) in the vc and v strand, respectively. The S RNA contained an IGR of 1663 nt, being the largest IGR of all CaCV isolates sequenced so far. Comparison of the Australian CaCV genome with complete CaCV genome sequences from other geographic regions showed highest sequence identity with a Taiwanese isolate. Genome sequence comparisons and phylogeny of all available CaCV isolates provided evidence for at least two highly diverged groups of CaCV isolates that may warrant re-classification of AIT-Thailand and CP-China isolates as unique tospoviruses, separate from CaCV.

  14. Strain Dependent Genetic Networks for Antibiotic-Sensitivity in a Bacterial Pathogen with a Large Pan-Genome

    PubMed Central

    van Opijnen, Tim; Bento, José

    2016-01-01

    The interaction between an antibiotic and bacterium is not merely restricted to the drug and its direct target, rather antibiotic induced stress seems to resonate through the bacterium, creating selective pressures that drive the emergence of adaptive mutations not only in the direct target, but in genes involved in many different fundamental processes as well. Surprisingly, it has been shown that adaptive mutations do not necessarily have the same effect in all species, indicating that the genetic background influences how phenotypes are manifested. However, to what extent the genetic background affects the manner in which a bacterium experiences antibiotic stress, and how this stress is processed is unclear. Here we employ the genome-wide tool Tn-Seq to construct daptomycin-sensitivity profiles for two strains of the bacterial pathogen Streptococcus pneumoniae. Remarkably, over half of the genes that are important for dealing with antibiotic-induced stress in one strain are dispensable in another. By confirming over 100 genotype-phenotype relationships, probing potassium-loss, employing genetic interaction mapping as well as temporal gene-expression experiments we reveal genome-wide conditionally important/essential genes, we discover roles for genes with unknown function, and uncover parts of the antibiotic’s mode-of-action. Moreover, by mapping the underlying genomic network for two query genes we encounter little conservation in network connectivity between strains as well as profound differences in regulatory relationships. Our approach uniquely enables genome-wide fitness comparisons across strains, facilitating the discovery that antibiotic responses are complex events that can vary widely between strains, which suggests that in some cases the emergence of resistance could be strain specific and at least for species with a large pan-genome less predictable. PMID:27607357

  15. Strain Dependent Genetic Networks for Antibiotic-Sensitivity in a Bacterial Pathogen with a Large Pan-Genome.

    PubMed

    van Opijnen, Tim; Dedrick, Sandra; Bento, José

    2016-09-01

    The interaction between an antibiotic and bacterium is not merely restricted to the drug and its direct target, rather antibiotic induced stress seems to resonate through the bacterium, creating selective pressures that drive the emergence of adaptive mutations not only in the direct target, but in genes involved in many different fundamental processes as well. Surprisingly, it has been shown that adaptive mutations do not necessarily have the same effect in all species, indicating that the genetic background influences how phenotypes are manifested. However, to what extent the genetic background affects the manner in which a bacterium experiences antibiotic stress, and how this stress is processed is unclear. Here we employ the genome-wide tool Tn-Seq to construct daptomycin-sensitivity profiles for two strains of the bacterial pathogen Streptococcus pneumoniae. Remarkably, over half of the genes that are important for dealing with antibiotic-induced stress in one strain are dispensable in another. By confirming over 100 genotype-phenotype relationships, probing potassium-loss, employing genetic interaction mapping as well as temporal gene-expression experiments we reveal genome-wide conditionally important/essential genes, we discover roles for genes with unknown function, and uncover parts of the antibiotic's mode-of-action. Moreover, by mapping the underlying genomic network for two query genes we encounter little conservation in network connectivity between strains as well as profound differences in regulatory relationships. Our approach uniquely enables genome-wide fitness comparisons across strains, facilitating the discovery that antibiotic responses are complex events that can vary widely between strains, which suggests that in some cases the emergence of resistance could be strain specific and at least for species with a large pan-genome less predictable. PMID:27607357

  16. Large-scale recoding of an arbovirus genome to rebalance its insect versus mammalian preference

    PubMed Central

    Shen, Sam H.; Stauft, Charles B.; Gorbatsevych, Oleksandr; Song, Yutong; Ward, Charles B.; Yurovsky, Alisa; Mueller, Steffen; Futcher, Bruce; Wimmer, Eckard

    2015-01-01

    The protein synthesis machineries of two distinct phyla of the Animal kingdom, insects of Arthropoda and mammals of Chordata, have different preferences for how to best encode proteins. Nevertheless, arboviruses (arthropod-borne viruses) are capable of infecting both mammals and insects just like arboviruses that use insect vectors to infect plants. These organisms have evolved carefully balanced genomes that can efficiently use the translational machineries of different phyla, even if the phyla belong to different kingdoms. Using dengue virus as an example, we have undone the genome encoding balance and specifically shifted the encoding preference away from mammals. These mammalian-attenuated viruses grow to high titers in insect cells but low titers in mammalian cells, have dramatically increased LD50s in newborn mice, and induce high levels of protective antibodies. Recoded arboviruses with a bias toward phylum-specific expression could form the basis of a new generation of live attenuated vaccine candidates. PMID:25825721

  17. Large-scale recoding of an arbovirus genome to rebalance its insect versus mammalian preference.

    PubMed

    Shen, Sam H; Stauft, Charles B; Gorbatsevych, Oleksandr; Song, Yutong; Ward, Charles B; Yurovsky, Alisa; Mueller, Steffen; Futcher, Bruce; Wimmer, Eckard

    2015-04-14

    The protein synthesis machineries of two distinct phyla of the Animal kingdom, insects of Arthropoda and mammals of Chordata, have different preferences for how to best encode proteins. Nevertheless, arboviruses (arthropod-borne viruses) are capable of infecting both mammals and insects just like arboviruses that use insect vectors to infect plants. These organisms have evolved carefully balanced genomes that can efficiently use the translational machineries of different phyla, even if the phyla belong to different kingdoms. Using dengue virus as an example, we have undone the genome encoding balance and specifically shifted the encoding preference away from mammals. These mammalian-attenuated viruses grow to high titers in insect cells but low titers in mammalian cells, have dramatically increased LD50s in newborn mice, and induce high levels of protective antibodies. Recoded arboviruses with a bias toward phylum-specific expression could form the basis of a new generation of live attenuated vaccine candidates.

  18. Multiple recent horizontal transfers of a large genomic region in cheese making fungi

    PubMed Central

    Cheeseman, Kevin; Ropars, Jeanne; Renault, Pierre; Dupont, Joëlle; Gouzy, Jérôme; Branca, Antoine; Abraham, Anne-Laure; Ceppi, Maurizio; Conseiller, Emmanuel; Debuchy, Robert; Malagnac, Fabienne; Goarin, Anne; Silar, Philippe; Lacoste, Sandrine; Sallet, Erika; Bensimon, Aaron; Giraud, Tatiana; Brygoo, Yves

    2014-01-01

    While the extent and impact of horizontal transfers in prokaryotes are widely acknowledged, their importance to the eukaryotic kingdom is unclear and thought by many to be anecdotal. Here we report multiple recent transfers of a huge genomic island between Penicillium spp. found in the food environment. Sequencing of the two leading filamentous fungi used in cheese making, P. roqueforti and P. camemberti, and comparison with the penicillin producer P. rubens reveals a 575 kb long genomic island in P. roqueforti—called Wallaby—present as identical fragments at non-homologous loci in P. camemberti and P. rubens. Wallaby is detected in Penicillium collections exclusively in strains from food environments. Wallaby encompasses about 250 predicted genes, some of which are probably involved in competition with microorganisms. The occurrence of multiple recent eukaryotic transfers in the food environment provides strong evidence for the importance of this understudied and probably underestimated phenomenon in eukaryotes. PMID:24407037

  19. Multiple recent horizontal transfers of a large genomic region in cheese making fungi.

    PubMed

    Cheeseman, Kevin; Ropars, Jeanne; Renault, Pierre; Dupont, Joëlle; Gouzy, Jérôme; Branca, Antoine; Abraham, Anne-Laure; Ceppi, Maurizio; Conseiller, Emmanuel; Debuchy, Robert; Malagnac, Fabienne; Goarin, Anne; Silar, Philippe; Lacoste, Sandrine; Sallet, Erika; Bensimon, Aaron; Giraud, Tatiana; Brygoo, Yves

    2014-01-01

    While the extent and impact of horizontal transfers in prokaryotes are widely acknowledged, their importance to the eukaryotic kingdom is unclear and thought by many to be anecdotal. Here we report multiple recent transfers of a huge genomic island between Penicillium spp. found in the food environment. Sequencing of the two leading filamentous fungi used in cheese making, P. roqueforti and P. camemberti, and comparison with the penicillin producer P. rubens reveals a 575 kb long genomic island in P. roqueforti--called Wallaby--present as identical fragments at non-homologous loci in P. camemberti and P. rubens. Wallaby is detected in Penicillium collections exclusively in strains from food environments. Wallaby encompasses about 250 predicted genes, some of which are probably involved in competition with microorganisms. The occurrence of multiple recent eukaryotic transfers in the food environment provides strong evidence for the importance of this understudied and probably underestimated phenomenon in eukaryotes.

  20. Completion of the swine genome will simplify the production of swine as a large animal biomedical model

    PubMed Central

    2012-01-01

    Background: Anatomic and physiological similarities to the human make swine an excellent large animal model for human health and disease. Methods: Cloning from a modified somatic cell, which can be determined in cells prior to making the animal, is the only method available for the production of targeted modifications in swine. Results: Since some strains of swine are similar in size to humans, technologies that have been developed for swine can be readily adapted to humans and vice versa. Here the importance of swine as a biomedical model, current technologies to produce genetically enhanced swine, current biomedical models, and how the completion of the swine genome will promote swine as a biomedical model are discussed. Conclusions: The completion of the swine genome will enhance the continued use and development of swine as models of human health, syndromes and conditions. PMID:23151353

  1. Genomic diversity of large-plaque-forming podoviruses infecting the phytopathogen Ralstonia solanacearum.

    PubMed

    Kawasaki, Takeru; Narulita, Erlia; Matsunami, Minaho; Ishikawa, Hiroki; Shimizu, Mio; Fujie, Makoto; Bhunchoth, Anjana; Phironrit, Namthip; Chatchawankanphanich, Orawan; Yamada, Takashi

    2016-05-01

    The genome organization, gene structure, and host range of five podoviruses that infect Ralstonia solanacearum, the causative agent of bacterial wilt disease, were characterized. The phages fell into two distinctive groups based on the genome position of the RNA polymerase gene (i.e., T7-type and ϕKMV-type). One-step growth experiments revealed that ϕRSB2 (a T7-like phage) lysed host cells more efficiently with a shorter infection cycle (ca. 60 min corresponding to half the doubling time of the host) than ϕKMV-like phages such as ϕRSB1 (with an infection cycle of ca. 180 min). Co-infection experiments with ϕRSB1 and ϕRSB2 showed that ϕRSB2 always predominated in the phage progeny independent of host strains. Most phages had wide host-ranges and the phage particles usually did not attach to the resistant strains; when occasionally some did, the phage genome was injected into the resistant strain's cytoplasm, as revealed by fluorescence microscopy with SYBR Gold-labeled phage particles. PMID:26901487

  3. Overview of PSB track on gene structure identification in large-scale genomic sequence

    SciTech Connect

    Uberbacher, E.C.; Xu, Y.

    1998-12-31

    The recent funding of more than a dozen major genome centers to begin community-wide high-throughput sequencing of the human genome has created a significant new challenge for the computational analysis of DNA sequence and the prediction of gene structure and function. It has been estimated that on average from 1996 to 2003, approximately 2 million bases of newly finished DNA sequence will be produced every day and be made available on the Internet and in central databases. The finished (fully assembled) sequence generated each day will represent approximately 75 new genes (and their respective proteins), and many times this number will be represented in partially completed sequences. The information contained in these is of immeasurable value to medical research, biotechnology, the pharmaceutical industry and researchers in a host of fields ranging from microorganism metabolism, to structural biology, to bioremediation. Sequencing of microorganisms and other model organisms is also ramping up at a very rapid rate. The genomes for yeast and several microorganisms such as H. influenzae have recently been fully sequenced, although the significance of many genes remains to be determined.

  4. The 'inner circle' of the cereal genomes.

    PubMed

    Bolot, Stéphanie; Abrouk, Michael; Masood-Quraishi, Umar; Stein, Nils; Messing, Joachim; Feuillet, Catherine; Salse, Jérôme

    2009-04-01

    Early marker-based macrocolinearity studies between the grass genomes led to arranging their chromosomes into concentric 'crop circles' of synteny blocks that initially consisted of 30 rice-independent linkage groups representing the ancestral cereal genome structure. Recently, increased marker density and genome sequencing of several cereal genomes allowed the characterization of intragenomic duplications and their integration with intergenomic colinearity data to identify paleo-duplications and propose a model for the evolution of the grass genomes from a common ancestor. On the basis of these data an 'inner circle' comprising five ancestral chromosomes was defined providing a new reference for the grass chromosomes and new insights into their ancestral relationships and origin, as well as an efficient tool to design cross-genome markers for genetic studies.

  5. Mechanisms for the Evolution of a Derived Function in the Ancestral Glucocorticoid Receptor

    PubMed Central

    Carroll, Sean Michael; Ortlund, Eric A.; Thornton, Joseph W.

    2011-01-01

    Understanding the genetic, structural, and biophysical mechanisms that caused protein functions to evolve is a central goal of molecular evolutionary studies. Ancestral sequence reconstruction (ASR) offers an experimental approach to these questions. Here we use ASR to shed light on the earliest functions and evolution of the glucocorticoid receptor (GR), a steroid-activated transcription factor that plays a key role in the regulation of vertebrate physiology. Prior work showed that GR and its paralog, the mineralocorticoid receptor (MR), duplicated from a common ancestor roughly 450 million years ago; the ancestral functions were largely conserved in the MR lineage, but the functions of GRs—reduced sensitivity to all hormones and increased selectivity for glucocorticoids—are derived. Although the mechanisms for the evolution of glucocorticoid specificity have been identified, how reduced sensitivity evolved has not yet been studied. Here we report on the reconstruction of the deepest ancestor in the GR lineage (AncGR1) and demonstrate that GR's reduced sensitivity evolved before the acquisition of restricted hormone specificity, shortly after the GR–MR split. Using site-directed mutagenesis, X-ray crystallography, and computational analyses of protein stability to recapitulate and determine the effects of historical mutations, we show that AncGR1's reduced ligand sensitivity evolved primarily due to three key substitutions. Two large-effect mutations weakened hydrogen bonds and van der Waals interactions within the ancestral protein, reducing its stability. The degenerative effect of these two mutations is extremely strong, but a third permissive substitution, which has no apparent effect on function in the ancestral background and is likely to have occurred first, buffered the effects of the destabilizing mutations. Taken together, our results highlight the potentially creative role of substitutions that partially degrade protein structure and function and

  6. Mechanisms for the Evolution of a Derived Function in the Ancestral Glucocorticoid Receptor

    SciTech Connect

    Carroll, Sean Michael; Ortlund, Eric A; Thornton, Joseph W.

    2012-03-16

    Understanding the genetic, structural, and biophysical mechanisms that caused protein functions to evolve is a central goal of molecular evolutionary studies. Ancestral sequence reconstruction (ASR) offers an experimental approach to these questions. Here we use ASR to shed light on the earliest functions and evolution of the glucocorticoid receptor (GR), a steroid-activated transcription factor that plays a key role in the regulation of vertebrate physiology. Prior work showed that GR and its paralog, the mineralocorticoid receptor (MR), duplicated from a common ancestor roughly 450 million years ago; the ancestral functions were largely conserved in the MR lineage, but the functions of GRs - reduced sensitivity to all hormones and increased selectivity for glucocorticoids - are derived. Although the mechanisms for the evolution of glucocorticoid specificity have been identified, how reduced sensitivity evolved has not yet been studied. Here we report on the reconstruction of the deepest ancestor in the GR lineage (AncGR1) and demonstrate that GR's reduced sensitivity evolved before the acquisition of restricted hormone specificity, shortly after the GR-MR split. Using site-directed mutagenesis, X-ray crystallography, and computational analyses of protein stability to recapitulate and determine the effects of historical mutations, we show that AncGR1's reduced ligand sensitivity evolved primarily due to three key substitutions. Two large-effect mutations weakened hydrogen bonds and van der Waals interactions within the ancestral protein, reducing its stability. The degenerative effect of these two mutations is extremely strong, but a third permissive substitution, which has no apparent effect on function in the ancestral background and is likely to have occurred first, buffered the effects of the destabilizing mutations. Taken together, our results highlight the potentially creative role of substitutions that partially degrade protein structure and function and

  7. Characterization of genomic imbalances in diffuse large B-cell lymphoma by detailed SNP-chip analysis.

    PubMed

    Scholtysik, René; Kreuz, Markus; Hummel, Michael; Rosolowski, Maciej; Szczepanowski, Monika; Klapper, Wolfram; Loeffler, Markus; Trümper, Lorenz; Siebert, Reiner; Küppers, Ralf

    2015-03-01

    The pathogenesis of diffuse large B-cell lymphomas (DLBCL) is only partly understood. We analyzed 148 DLBCL by single nucleotide polymorphism (SNP)-chips to characterize genomic imbalances. Seventy-nine cases were of the germinal center B-cell like (GCB) type of DLBCL, 49 of the activated B-cell like (ABC) subtype and 20 were unclassified DLBCL. Twenty-four regions of recurrent genomic gains and 38 regions of recurrent genomic losses were identified over the whole cohort, with a median of 25 imbalances per case for ABC-DLBCL and 19 per case for GCB-DLBCL. Several recurrent copy number changes showed differential frequencies in the GCB- and ABC-DLBCL subgroups, including gains of HDAC7A predominantly in GCB-DLBCL (38% of cases) and losses of BACH2 and CASP8AP2 predominantly in ABC-DLBCL (35%), hinting at disparate pathogenetic mechanisms in these entities. Correlating gene expression and copy number revealed a strong gene dosage effect in all tumors, with 34% of probesets showing a concordant expression change in affected regions. Two new potential tumor suppressor genes emerging from the analysis, CASP3 and IL5RA, were sequenced in ten and 16 candidate cases, respectively. However, no mutations were found, pointing to a potential haploinsufficiency effect of these genes, considering their reduced expression in cases with deletions. Our study thus describes differences and similarities in the landscape of genomic aberrations in the DLBCL subgroups in a large collection of cases, confirming already known targets, but also discovering novel copy number changes with possible pathogenetic relevance.

  8. Draft Genome Sequence of Agreia bicolorata Strain AC-1804, a Producer of Large Amounts of Carotenoid Pigments, Isolated from Narrow Reed Grass Infected by the Phytoparasitic Nematode

    PubMed Central

    Siniagina, Maria; Malanin, Sergey; Boulygina, Eugenia; Grygoryeva, Tatiana; Yarullina, Dina; Ilinskaya, Olga

    2015-01-01

    Here, we report the draft genome sequence of Agreia bicolorata strain AC-1804, which is able to produce large amounts of carotenoid pigments and was isolated from narrow reed grass galls induced by a plant-parasitic nematode. The draft genome sequence of 3,919,485 bp provides a resource for carotenoid pathway research. PMID:26634752

  9. An experimental phylogeny to benchmark ancestral sequence reconstruction

    PubMed Central

    Randall, Ryan N.; Radford, Caelan E.; Roof, Kelsey A.; Natarajan, Divya K.; Gaucher, Eric A.

    2016-01-01

    Ancestral sequence reconstruction (ASR) is a still-burgeoning method that has revealed many key mechanisms of molecular evolution. One criticism of the approach is an inability to validate its algorithms within a biological context as opposed to a computer simulation. Here we build an experimental phylogeny using the gene of a single red fluorescent protein to address this criticism. The evolved phylogeny consists of 19 operational taxonomic units (leaves) and 17 ancestral bifurcations (nodes) that display a wide variety of fluorescent phenotypes. The 19 leaves then serve as 'modern' sequences that we subject to ASR analyses using various algorithms and to benchmark against the known ancestral genotypes and ancestral phenotypes. We confirm computer simulations that show all algorithms infer ancient sequences with high accuracy, yet we also reveal wide variation in the phenotypes encoded by incorrectly inferred sequences. Specifically, Bayesian methods incorporating rate variation significantly outperform the maximum parsimony criterion in phenotypic accuracy. Subsampling of extant sequences had minor effect on the inference of ancestral sequences. PMID:27628687
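
    The maximum parsimony criterion mentioned in this abstract can be illustrated with a minimal sketch. The code below assumes Fitch's bottom-up pass for a single alignment site on a rooted binary tree; the abstract does not say which parsimony implementation the authors used, and the tree encoding (nested 2-tuples with one-letter leaf states) and the function name `fitch` are hypothetical choices made for illustration only.

```python
def fitch(node):
    """Fitch bottom-up pass for one site on a rooted binary tree.

    `node` is either a leaf (a one-character string such as "A") or a
    2-tuple (left_subtree, right_subtree). This encoding is an assumption
    made for this sketch, not taken from the study.

    Returns (candidate_states, min_changes): the set of states that are
    most parsimonious at this node and the minimum number of
    substitutions required in the subtree below it.
    """
    if isinstance(node, str):
        # A leaf carries its observed state and needs no substitutions.
        return {node}, 0

    left, right = node
    left_states, left_cost = fitch(left)
    right_states, right_cost = fitch(right)

    shared = left_states & right_states
    if shared:
        # The children agree on at least one state: no extra change.
        return shared, left_cost + right_cost
    # The children disagree: keep the union and count one substitution.
    return left_states | right_states, left_cost + right_cost + 1


# Hypothetical five-leaf tree for a single nucleotide site.
tree = ((("A", "A"), "G"), ("G", "G"))
root_states, changes = fitch(tree)
print(root_states, changes)
```

    This sketch scores one site and reports the candidate states at the root only; resolving the states of internal nodes needs an additional top-down pass, and the Bayesian methods that the abstract reports as more accurate are not represented here.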

  10. An experimental phylogeny to benchmark ancestral sequence reconstruction.

    PubMed

    Randall, Ryan N; Radford, Caelan E; Roof, Kelsey A; Natarajan, Divya K; Gaucher, Eric A

    2016-01-01

    Ancestral sequence reconstruction (ASR) is a still-burgeoning method that has revealed many key mechanisms of molecular evolution. One criticism of the approach is an inability to validate its algorithms within a biological context as opposed to a computer simulation. Here we build an experimental phylogeny using the gene of a single red fluorescent protein to address this criticism. The evolved phylogeny consists of 19 operational taxonomic units (leaves) and 17 ancestral bifurcations (nodes) that display a wide variety of fluorescent phenotypes. The 19 leaves then serve as 'modern' sequences that we subject to ASR analyses using various algorithms and to benchmark against the known ancestral genotypes and ancestral phenotypes. We confirm computer simulations that show all algorithms infer ancient sequences with high accuracy, yet we also reveal wide variation in the phenotypes encoded by incorrectly inferred sequences. Specifically, Bayesian methods incorporating rate variation significantly outperform the maximum parsimony criterion in phenotypic accuracy. Subsampling of extant sequences had minor effect on the inference of ancestral sequences. PMID:27628687

  11. Phylogeographic ancestral inference using the coalescent model on haplotype trees.

    PubMed

    Manolopoulou, Ioanna; Emerson, Brent C

    2012-06-01

    Phylogeographic ancestral inference is an issue frequently arising in population ecology that aims to understand the geographical roots and structure of species. Here, we specifically address relatively small scale mtDNA datasets (typically less than 500 sequences with fewer than 1000 nucleotides), focusing on ancestral location inference. Our approach uses a coalescent modelling framework projected onto haplotype trees in order to reduce computational complexity, at the same time adhering to complex evolutionary processes. Statistical innovations of the last few years have allowed for computationally feasible yet accurate inferences in phylogenetic frameworks. We implement our methods on a set of synthetic datasets and show how, despite high uncertainty in terms of identifying the root haplotype, estimation of the ancestral location naturally encompasses lower uncertainty, allowing us to pinpoint the Maximum A Posteriori estimates for ancestral locations. We exemplify our methods on a set of synthetic datasets and then combine our inference methods with the phylogeographic clustering approach presented in Manolopoulou et al. (2011) on a real dataset from weevils in the Iberian peninsula in order to infer ancestral locations as well as population substructure.

  12. Male androphilia in the ancestral environment. An ethnological analysis.

    PubMed

    VanderLaan, Doug P; Ren, Zhiyuan; Vasey, Paul L

    2013-12-01

    The kin selection hypothesis posits that male androphilia (male sexual attraction to adult males) evolved because androphilic males invest more in kin, thereby enhancing inclusive fitness. Increased kin-directed altruism has been repeatedly documented among a population of transgendered androphilic males, but never among androphilic males in other cultures who adopt gender identities as men. Thus, the kin selection hypothesis may be viable if male androphilia was expressed in the transgendered form in the ancestral past. Using the Standard Cross-Cultural Sample (SCCS), we examined 46 societies in which male androphilia was expressed in the transgendered form (transgendered societies) and 146 comparison societies (non-transgendered societies). We analyzed SCCS variables pertaining to ancestral sociocultural conditions, access to kin, and societal reactions to homosexuality. Our results show that ancestral sociocultural conditions and bilateral and double descent systems were more common in transgendered than in non-transgendered societies. Across the entire sample, descent systems and residence patterns that would presumably facilitate increased access to kin were associated with the presence of ancestral sociocultural conditions. Among transgendered societies, negative societal attitudes toward homosexuality were unlikely. We conclude that the ancestral human sociocultural environment was likely conducive to the expression of the transgendered form of male androphilia. Descent systems, residence patterns, and societal reactions to homosexuality likely facilitated investments in kin by transgendered males. Given that contemporary transgendered male androphiles appear to exhibit elevated kin-directed altruism, these findings further indicate the viability of the kin selection hypothesis.

  13. The integration of recombination and physical maps in a large-genome monocot using haploid genome analysis in a trihybrid allium population.

    PubMed

    Khrustaleva, L I; de Melo, P E; van Heusden, A W; Kik, C

    2005-03-01

    Integrated mapping in large-genome monocots has been carried out on a limited number of species. Furthermore, integrated maps are difficult to construct for these species due to, among other reasons, the specific plant populations needed. To fill these gaps, Alliums were chosen as target species and a new strategy for constructing suitable populations was developed. This strategy involves the use of trihybrid genotypes in which only one homeolog of a chromosome pair is recombinant due to interspecific recombination. We used genotypes from a trihybrid Allium cepa x (A. roylei x A. fistulosum) population. Recombinant chromosomes 5 and 8 from the interspecific parent were analyzed using genomic in situ hybridization visualization of recombination points and the physical positions of recombination were integrated into AFLP linkage maps of both chromosomes. The integrated maps showed that in Alliums recombination predominantly occurs in the proximal half of chromosome arms and that 57.9% of PstI/MseI markers are located in close proximity to the centromeric region, suggesting the presence of genes in this region. These findings are different from data obtained on cereals, where recombination rate and gene density tend to be higher in distal regions. PMID:15654085

  14. Derived Immune and Ancestral Pigmentation Alleles in a 7,000-Year-old Mesolithic European

    PubMed Central

    Olalde, Iñigo; Allentoft, Morten E.; Sánchez-Quinto, Federico; Santpere, Gabriel; Chiang, Charleston W. K.; DeGiorgio, Michael; Prado-Martínez, Javier; Rodríguez, Juan Antonio; Rasmussen, Simon; Quilez, Javier; Ramírez, Oscar; Marigorta, Urko M.; Fernández-Callejo, Marcos; Prada, María Encina; Encinas, Julio Manuel Vidal; Nielsen, Rasmus; Netea, Mihai G.; Novembre, John; Sturm, Richard A.; Sabeti, Pardis; Marquès-Bonet, Tomàs; Navarro, Arcadi; Willerslev, Eske; Lalueza-Fox, Carles

    2014-01-01

    Ancient genomic sequences have started revealing the origin and the demographic impact of Neolithic farmers spreading into Europe [1–3]. The adoption of farming, stock breeding and sedentary societies during the Neolithic may have resulted in adaptive changes in genes associated with immunity and diet [4]. However, the limited data available from earlier hunter-gatherers precludes an understanding of the selective processes associated with this crucial transition to agriculture in recent human evolution. By sequencing a ~7,000-year-old Mesolithic skeleton discovered at the La Braña-Arintero site in León (Spain), we retrieved the first complete pre-agricultural European human genome. Analysis of this genome in the context of other ancient samples suggests the existence of a common ancient genomic signature across Western and Central Eurasia from the Upper Paleolithic to the Mesolithic. The La Braña individual carries ancestral alleles in several skin pigmentation genes, suggesting that the light skin of modern Europeans was not yet ubiquitous in Mesolithic times. Moreover, we provide evidence that a significant number of derived, putatively adaptive variants associated with pathogen resistance in modern Europeans were already present in this hunter-gatherer. Hence, these genomic variants cannot represent novel mutations that occurred during the adaptation to the farming lifestyle. PMID:24463515

  15. Magmatism and Epithermal Gold-Silver Deposits of the Southern Ancestral Cascade Arc, Western Nevada and Eastern California

    USGS Publications Warehouse

    John, David A.; du Bray, Edward A.; Henry, Christopher D.; Vikre, Peter

    2015-01-01

    Many epithermal gold-silver deposits are temporally and spatially associated with late Oligocene to Pliocene magmatism of the southern ancestral Cascade arc in western Nevada and eastern California. These deposits, which include both quartz-adularia (low- and intermediate-sulfidation; Comstock Lode, Tonopah, Bodie) and quartz-alunite (high-sulfidation; Goldfield, Paradise Peak) types, were major producers of gold and silver. Ancestral Cascade arc magmatism preceded that of the modern High Cascades arc and reflects subduction of the Farallon plate beneath North America. Ancestral arc magmatism began about 45 Ma, continued until about 3 Ma, and extended from near the Canada-United States border in Washington southward to about 250 km southeast of Reno, Nevada. The ancestral arc was split into northern and southern segments across an inferred tear in the subducting slab between Mount Shasta and Lassen Peak in northern California. The southern segment extends between 42°N in northern California and 37°N in western Nevada and was active from about 30 to 3 Ma. It is bounded on the east by the northeast edge of the Walker Lane. Ancestral arc volcanism represents an abrupt change in composition and style of magmatism relative to that in central Nevada. Large volume, caldera-forming, silicic ignimbrites associated with the 37 to 19 Ma ignimbrite flareup are dominant in central Nevada, whereas volcanic centers of the ancestral arc in western Nevada consist of andesitic stratovolcanoes and dacitic to rhyolitic lava domes that mostly formed between 25 and 4 Ma. Both ancestral arc and ignimbrite flareup magmatism resulted from rollback of the shallowly dipping slab that began about 45 Ma in northeast Nevada and migrated south-southwest with time. Most southern segment ancestral arc rocks have oxidized, high potassium, calc-alkaline compositions with silica contents ranging continuously from about 55 to 77 wt%. Most lavas are porphyritic and contain coarse plagioclase

  16. Comparison of histological grading and large-scale genomic status (DNA ploidy) as prognostic tools in oral dysplasia.

    PubMed

    Sudbø, J; Bryne, M; Johannessen, A C; Kildal, W; Danielsen, H E; Reith, A

    2001-07-01

    Approximately one in ten oral white patches (leukoplakia) are histologically classified as dysplasia, with a well-documented potential for developing into oral squamous cell carcinoma (OSCC). Histological grading in oral dysplasia has limited prognostic value, whereas large-scale genomic status (DNA ploidy, nuclear DNA content) is an early marker of malignant transformation in several tissues. Biopsies from 196 patients with oral leukoplakia histologically typed as dysplasia were investigated. Inter-observer agreement among four experienced pathologists performing a simplified grading was assessed by Cohen's kappa values. For 150 of the 196 cases, it was also possible to assess large-scale genomic status and compare its prognostic impact with that of histological grading. Disease-free survival was estimated by life-table methods, with a mean follow-up time of 103 months (range 4-165 months). The primary considered end-point was the subsequent occurrence of OSCC. For grading of the total of 196 cases, kappa values ranged from 0.17 to 0.33 when three grading groups (mild, moderate, and severe dysplasia) were considered, and from 0.21 to 0.32 when two groups (low grade and high grade) were considered (p=0.41). For the 150 cases in which large-scale genomic status was also assessed, kappa values for the histological grading ranged from 0.21 to 0.33 for three grading groups and from 0.27 to 0.34 for two grading groups (p=0.47). In survival analysis, histological grading was without significant prognostic value for any of the four observers (p 0.14-0.44), in contrast to DNA ploidy (p=0.001). It is concluded that DNA ploidy in oral dysplasia has a practical prognostic value, unlike histological grading of the same lesions. PMID:11439362
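
    The inter-observer agreement statistic used in this abstract, Cohen's kappa, compares the observed agreement between two raters with the agreement expected by chance. The sketch below is a minimal illustration only: the function name `cohens_kappa` and the example grades are hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same cases.

    kappa = (p_observed - p_expected) / (1 - p_expected), where
    p_expected is the chance agreement implied by each rater's own
    label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")

    n = len(rater_a)
    # Fraction of cases on which the two raters gave the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if p_expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical grades for six biopsies from two pathologists.
pathologist_1 = ["mild", "mild", "moderate", "severe", "moderate", "mild"]
pathologist_2 = ["mild", "moderate", "moderate", "severe", "mild", "mild"]
print(round(cohens_kappa(pathologist_1, pathologist_2), 3))  # 0.455
```

    In this made-up example the raters agree on four of six cases (observed agreement 0.667) against a chance agreement of 0.389, giving kappa = 5/11, or about 0.455. The values of 0.17 to 0.34 reported in the abstract indicate agreement only modestly above chance.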

  17. Chætognath transcriptome reveals ancestral and unique features among bilaterians

    PubMed Central

    Marlétaz, Ferdinand; Gilles, André; Caubit, Xavier; Perez, Yvan; Dossat, Carole; Samain, Sylvie; Gyapay, Gabor; Wincker, Patrick; Le Parco, Yannick

    2008-01-01

    Background The chætognaths (arrow worms) have puzzled zoologists for years because of their astonishing morphological and developmental characteristics. Despite their deuterostome-like development, phylogenomic studies recently positioned the chætognath phylum in protostomes, most likely in an early branching. This key phylogenetic position and the peculiar characteristics of chætognaths prompted further investigation of their genomic features. Results Transcriptomic and genomic data were collected from the chætognath Spadella cephaloptera through the sequencing of expressed sequence tags and genomic bacterial artificial chromosome clones. Transcript comparisons at various taxonomic scales emphasized the conservation of a core gene set and phylogenomic analysis confirmed the basal position of chætognaths among protostomes. A detailed survey of transcript diversity and individual genotyping revealed a past genome duplication event in the chætognath lineage, which was, surprisingly, followed by a high retention rate of duplicated genes. Moreover, striking genetic heterogeneity was detected within the sampled population at the nuclear and mitochondrial levels but cannot be explained by cryptic speciation. Finally, we found evidence for trans-splicing maturation of transcripts through splice-leader addition in the chætognath phylum and we further report that this processing is associated with operonic transcription. Conclusion These findings reveal both shared ancestral and unique derived characteristics of the chætognath genome, which suggests that this genome is likely the product of a very original evolutionary history. These features promote chætognaths as a pivotal model for comparative genomics, which could provide new clues for the investigation of the evolution of animal genomes. PMID:18533022

  18. A genome-wide linkage analysis for reproductive traits in F2 Large White × Meishan cross gilts

    PubMed Central

    Hernandez, S C; Finlayson, H A; Ashworth, C J; Haley, C S; Archibald, A L

    2014-01-01

    Female reproductive performance traits in pigs have low heritabilities thus limiting improvement through traditional selective breeding programmes. However, there is substantial genetic variation found between pig breeds with the Chinese Meishan being one of the most prolific pig breeds known. In this study, three cohorts of Large White × Meishan F2 cross-bred pigs were analysed to identify quantitative trait loci (QTL) with effects on reproductive traits, including ovulation rate, teat number, litter size, total born alive and prenatal survival. A total of 307 individuals were genotyped for 174 genetic markers across the genome. The genome-wide analysis of the trait-recorded F2 gilts in their first parity/litter revealed one QTL for teat number significant at the genome level and a total of 12 QTL, which are significant at the chromosome-wide level, for: litter size (three QTL), total born alive (two QTL), ovulation rate (four QTL), prenatal survival (one QTL) and teat number (two QTL). Further support for eight of these QTL is provided by results from other studies. Four of these 12 QTL were mapped for the first time in this study: on SSC15 for ovulation rate and on SSC18 for teat number, ovulation rate and litter size. PMID:24456574

  19. Ancestral state reconstruction, rate heterogeneity, and the evolution of reptile viviparity.

    PubMed

    King, Benedict; Lee, Michael S Y

    2015-05-01

    Virtually all models for reconstructing ancestral states for discrete characters make the crucial assumption that the trait of interest evolves at a uniform rate across the entire tree. However, this assumption is unlikely to hold in many situations, particularly as ancestral state reconstructions are being performed on increasingly large phylogenies. Here, we show how failure to account for such variable evolutionary rates can cause highly anomalous (and likely incorrect) results, while three methods that accommodate rate variability yield the opposite, more plausible, and more robust reconstructions. The random local clock method, implemented in BEAST, estimates the position and magnitude of rate changes on the tree; split BiSSE estimates separate rate parameters for pre-specified clades; and the hidden rates model partitions each character state into a number of rate categories. Simulations show the inadequacy of traditional models when characters evolve with both asymmetry (different rates of change between states within a character) and heterotachy (different rates of character evolution across different clades). The importance of accounting for rate heterogeneity in ancestral state reconstruction is highlighted empirically with a new analysis of the evolution of viviparity in squamate reptiles, which reveals a predominance of forward (oviparous-viviparous) transitions and very few reversals.

  20. Sequence variants from whole genome sequencing a large group of Icelanders.

    PubMed

    Gudbjartsson, Daniel F; Sulem, Patrick; Helgason, Hannes; Gylfason, Arnaldur; Gudjonsson, Sigurjon A; Zink, Florian; Oddson, Asmundur; Magnusson, Gisli; Halldorsson, Bjarni V; Hjartarson, Eirikur; Sigurdsson, Gunnar Th; Kong, Augustine; Helgason, Agnar; Masson, Gisli; Magnusson, Olafur Th; Thorsteinsdottir, Unnur; Stefansson, Kari

    2015-01-01

    We have accumulated considerable data on the genetic makeup of the Icelandic population by sequencing the whole genomes of 2,636 Icelanders to depth of at least 10X and by chip genotyping 101,584 more. The sequencing was done with Illumina technology. The median sequencing depth was 20X and 909 individuals were sequenced to a depth of at least 30X. We found 20 million single nucleotide polymorphisms (SNPs) and 1.5 million insertions/deletions (indels) that passed stringent quality control. Almost all the common SNPs (derived allele frequency (DAF) over 2%) that we identified in Iceland have been observed by either dbSNP (build 137) or the Exome Sequencing Project (ESP) while only 60 and 20% of rare (DAF<0.5%) SNPs and indels in coding regions, the most heavily studied parts of the genome, have been observed in the public databases. Features of our variant data, such as the transition/transversion ratio and the length distribution of indels, are similar to published reports. PMID:25977816
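
    Two of the summary quantities mentioned in this abstract, the derived allele frequency (DAF) and the transition/transversion ratio, are simple to compute from a list of SNPs. The sketch below is illustrative only: the function names and the input layout (pairs of reference and alternate bases) are assumptions, not the authors' pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions over (ref, alt) pairs."""
    snps = list(snps)
    transitions = sum(is_transition(ref, alt) for ref, alt in snps)
    transversions = len(snps) - transitions
    if transversions == 0:
        raise ValueError("ratio undefined without any transversions")
    return transitions / transversions


def derived_allele_frequency(derived_count, chromosomes_sampled):
    """Fraction of sampled chromosomes that carry the derived allele."""
    if chromosomes_sampled <= 0 or not 0 <= derived_count <= chromosomes_sampled:
        raise ValueError("counts out of range")
    return derived_count / chromosomes_sampled


# Hypothetical toy input: two transitions and one transversion.
example_snps = [("A", "G"), ("C", "T"), ("A", "C")]
example_ratio = ts_tv_ratio(example_snps)
```

    Under the thresholds quoted in the abstract, a variant with a DAF above 2% would count as common and one below 0.5% as rare.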

  1. Perspectives on clinical informatics: integrating large-scale clinical, genomic, and health information for clinical care.

    PubMed

    Choi, In Young; Kim, Tae-Min; Kim, Myung Shin; Mun, Seong K; Chung, Yeun-Jun

    2013-12-01

    The advances in electronic medical records (EMRs) and bioinformatics (BI) represent two significant trends in healthcare. The widespread adoption of EMR systems and the completion of the Human Genome Project developed the technologies for data acquisition, analysis, and visualization in two different domains. The massive amount of data from both clinical and biology domains is expected to provide personalized, preventive, and predictive healthcare services in the near future. The integrated use of EMR and BI data needs to consider four key informatics areas: data modeling, analytics, standardization, and privacy. Bioclinical data warehouses integrating heterogeneous patient-related clinical or omics data should be considered. The representative standardization effort by the Clinical Bioinformatics Ontology (CBO) aims to provide uniquely identified concepts to include molecular pathology terminologies. Since individual genome data are easily used to predict current and future health status, different safeguards to ensure confidentiality should be considered. In this paper, we focused on the informatics aspects of integrating the EMR community and BI community by identifying opportunities, challenges, and approaches to provide the best possible care service for our patients and the population.

  2. Perspectives on Clinical Informatics: Integrating Large-Scale Clinical, Genomic, and Health Information for Clinical Care

    PubMed Central

    Choi, In Young; Kim, Tae-Min; Kim, Myung Shin; Mun, Seong K.

    2013-01-01

    The advances in electronic medical records (EMRs) and bioinformatics (BI) represent two significant trends in healthcare. The widespread adoption of EMR systems and the completion of the Human Genome Project developed the technologies for data acquisition, analysis, and visualization in two different domains. The massive amount of data from both clinical and biology domains is expected to provide personalized, preventive, and predictive healthcare services in the near future. The integrated use of EMR and BI data needs to consider four key informatics areas: data modeling, analytics, standardization, and privacy. Bioclinical data warehouses integrating heterogeneous patient-related clinical or omics data should be considered. The representative standardization effort by the Clinical Bioinformatics Ontology (CBO) aims to provide uniquely identified concepts to include molecular pathology terminologies. Since individual genome data are easily used to predict current and future health status, different safeguards to ensure confidentiality should be considered. In this paper, we focused on the informatics aspects of integrating the EMR community and BI community by identifying opportunities, challenges, and approaches to provide the best possible care service for our patients and the population. PMID:24465229

  3. Genome-wide association study of porcine hematological parameters in a Large White × Minzhu F2 resource population.

    PubMed

    Luo, Weizhen; Chen, Shaokang; Cheng, Duxue; Wang, Ligang; Li, Yong; Ma, Xiaojun; Song, Xin; Liu, Xin; Li, Wen; Liang, Jing; Yan, Hua; Zhao, Kebin; Wang, Chuduan; Wang, Lixian; Zhang, Longchao

    2012-01-01

    Hematological traits, which are important indicators of immune function in animals, have been commonly examined as biomarkers of disease and disease severity in humans and animals. Genome-wide significant quantitative trait loci (QTLs) provide important information for use in breeding programs of animals such as pigs. QTLs for hematological parameters (hematological traits) have been detected in pig chromosomes, although these are often mapped by linkage analysis to large intervals making identification of the underlying mutation problematic. Single nucleotide polymorphisms (SNPs) are the common form of genetic variation among individuals and are thought to account for the majority of inherited traits. In this study, a genome-wide association study (GWAS) was performed to detect regions of association with hematological traits in a three-generation resource population produced by intercrossing Large White boars and Minzhu sows during the period from 2007 to 2011. Illumina PorcineSNP60 BeadChip technology was used to genotype each animal and seven hematological parameters were measured (hematocrit (HCT), hemoglobin (HGB), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), mean corpuscular volume (MCV), red blood cell count (RBC) and red blood cell volume distribution width (RDW)). Data were analyzed in a three step Genome-wide Rapid Association using the Mixed Model and Regression-Genomic Control (GRAMMAR-GC) method. A total of 62 genome-wide significant and three chromosome-wide significant SNPs associated with hematological parameters were detected in this GWAS. Seven and five SNPs were associated with HCT and HGB, respectively. These SNPs were all located within the region of 34.6-36.5 Mb on SSC7. Four SNPs within the region of 43.7-47.0 Mb and fifty-five SNPs within the region of 42.2-73.8 Mb on SSC8 showed significant association with MCH and MCV, respectively. At chromosome-wide significant level, one SNP at 29.2 Mb on SSC1

  4. i-ADHoRe 3.0—fast and sensitive detection of genomic homology in extremely large data sets

    PubMed Central

    Proost, Sebastian; Fostier, Jan; De Witte, Dieter; Dhoedt, Bart; Demeester, Piet; Van de Peer, Yves; Vandepoele, Klaas

    2012-01-01

    Comparative genomics is a powerful means to gain insight into the evolutionary processes that shape the genomes of related species. As the number of sequenced genomes increases, the development of software to perform accurate cross-species analyses becomes indispensable. However, many implementations that have the ability to compare multiple genomes exhibit unfavorable computational and memory requirements, limiting the number of genomes that can be analyzed in one run. Here, we present a software package to unveil genomic homology based on the identification of conservation of gene content and gene order (collinearity), i-ADHoRe 3.0, and its application to eukaryotic genomes. The use of efficient algorithms and support for parallel computing enable the analysis of large-scale data sets. Unlike other tools, i-ADHoRe can process the Ensembl data set, containing 49 species, in 1 h. Furthermore, the profile search is more sensitive to detect degenerate genomic homology than chaining pairwise collinearity information based on transitive homology. From ultra-conserved collinear regions between mammals and birds, by integrating coexpression information and protein–protein interactions, we identified more than 400 regions in the human genome showing significant functional coherence. The different algorithmical improvements ensure that i-ADHoRe 3.0 will remain a powerful tool to study genome evolution. PMID:22102584

  5. Emergence, Retention and Selection: A Trilogy of Origination for Functional De Novo Proteins from Ancestral LncRNAs in Primates

    PubMed Central

    Peng, Jiguang; He, Bin Z.; Li, Yumei; Liu, Chu-Jun; Luan, Xuke; Ding, Wanqiu; Li, Shuxian; Chen, Chunyan; Tan, Bertrand Chin-Ming; Zhang, Yong E.; He, Aibin; Li, Chuan-Yun

    2015-01-01

    While some human-specific protein-coding genes have been proposed to originate from ancestral lncRNAs, the transition process remains poorly understood. Here we identified 64 hominoid-specific de novo genes and report a mechanism for the origination of functional de novo proteins from ancestral lncRNAs with precise splicing structures and specific tissue expression profiles. Whole-genome sequencing of dozens of rhesus macaque animals revealed that these lncRNAs are generally not more selectively constrained than other lncRNA loci. The existence of these newly originated de novo proteins is also not beyond anticipation under neutral expectation, as they generally have a longer theoretical lifespan than their current age, because their GC-rich sequences enable stable ORFs with a lower chance of nonsense mutations. Interestingly, although the emergence and retention of these de novo genes are likely driven by neutral forces, a population genetics study in 67 human individuals and 82 macaque animals revealed signatures of purifying selection on these genes specifically in the human population, indicating that a proportion of these newly originated proteins are already functional in humans. We thus propose a mechanism for the creation of functional de novo proteins from ancestral lncRNAs during primate evolution, which may contribute to human-specific genetic novelties by taking advantage of existing genomic contexts. PMID:26177073

  6. A draft physical map of a D-genome cotton species (Gossypium raimondii)

    PubMed Central

    2010-01-01

    Background Genetically anchored physical maps of large eukaryotic genomes have proven useful both for their intrinsic merit and as an adjunct to genome sequencing. Cultivated tetraploid cottons, Gossypium hirsutum and G. barbadense, share a common ancestor formed by a merger of the A and D genomes about 1-2 million years ago. Toward the long-term goal of characterizing the spectrum of diversity among cotton genomes, the worldwide cotton community has prioritized the D genome progenitor Gossypium raimondii for complete sequencing. Results A whole genome physical map of G. raimondii, the putative D genome ancestral species of tetraploid cottons was assembled, integrating genetically-anchored overgo hybridization probes, agarose based fingerprints and 'high information content fingerprinting' (HICF). A total of 13,662 BAC-end sequences and 2,828 DNA probes were used in genetically anchoring 1585 contigs to a cotton consensus genetic map, and 370 and 438 contigs, respectively to Arabidopsis thaliana (AT) and Vitis vinifera (VV) whole genome sequences. Conclusion Several lines of evidence suggest that the G. raimondii genome is comprised of two qualitatively different components. Much of the gene rich component is aligned to the Arabidopsis and Vitis vinifera genomes and shows promise for utilizing translational genomic approaches in understanding this important genome and its resident genes. The integrated genetic-physical map is of value both in assembling and validating a planned reference sequence. PMID:20569427

  7. Distinct Origin of the Y and St Genome in Elymus Species: Evidence from the Analysis of a Large Sample of St Genome Species Using Two Nuclear Genes

    PubMed Central

    Yan, Chi; Sun, Genlou; Sun, Dongfa

    2011-01-01

    Background Previous cytological and single-copy nuclear gene data suggested that the St and Y genomes in the StY-genomic Elymus species originated from different donors: the St from a diploid species in Pseudoroegneria and the Y from an unknown diploid species, which is now extinct or undiscovered. However, ITS data suggested that the Y and St genomes shared the same progenitor, although rather few St genome species were studied. A recent analysis of many samples of the St genome species Pseudoroegneria spicata (Pursh) Á. Löve suggested that one accession of P. spicata was the most likely donor of the Y genome. The present study tested whether intraspecific variation during sampling could affect the outcome of analyses to determine the origin of the Y genome in allotetraploid StY species. We also explored the evolutionary dynamics of these species. Methodology/Principal Findings Two single-copy nuclear genes, the second largest subunit of RNA polymerase II (RPB2) and the translation elongation factor G (EF-G) sequences from 58 accessions of Pseudoroegneria and Elymus species, together with those from Hordeum (H), Agropyron (P), Australopyrum (W), Lophopyrum (Ee), Thinopyrum (Ea), Thinopyrum (Eb), and Dasypyrum (V) were analyzed using maximum parsimony, maximum likelihood and Bayesian methods. Sequence comparisons among all these genomes revealed that the St and Y genomes are relatively dissimilar. Extensive sequence variations have been detected not only between the sequences from the St and Y genomes, but also among the sequences from diploid St genome species. Phylogenetic analyses separated the Y sequences from the St sequences. Conclusions/Significance Our results confirmed that the St and Y genomes in Elymus species originated from different donors, and demonstrated that intraspecific variation does not affect the identification of genome origin in polyploids. Moreover, sequence data showed evidence to support the suggestion of the genome convergent evolution in

  8. Genomic Characterization of a Large Outbreak of Legionella pneumophila Serogroup 1 Strains in Quebec City, 2012

    PubMed Central

    Mendis, Nilmini; Cantin, Philippe; Marchand, Geneviève; Charest, Hugues; Raymond, Frédéric; Huot, Caroline; Goupil-Sormany, Isabelle; Desbiens, François; Faucher, Sébastien P.; Corbeil, Jacques; Tremblay, Cécile

    2014-01-01

    During the summer of 2012, a major Legionella pneumophila serogroup 1 outbreak occurred in Quebec City, Canada, which caused 182 declared cases of Legionnaires' disease and included 13 fatalities. Legionella pneumophila serogroup 1 isolates from 23 patients as well as from 32 cooling towers located in the vicinity of the outbreak were recovered for analysis. In addition, 6 isolates from the 1996 Quebec City outbreak and 4 isolates from patients unrelated to both outbreaks were added to allow comparison. We characterized the isolates using pulsed-field gel electrophoresis, sequence-based typing, and whole genome sequencing. The comparison of patient-isolated strains to cooling tower isolates allowed the identification of the tower that was the source of the outbreak. Legionella pneumophila strain Quebec 2012 was identified as ST-62 by sequence-based typing methodology. Two new Legionellaceae plasmids were found only in the epidemic strain. The LVH type IV secretion system was found in the 2012 outbreak isolates but not in the ones from the 1996 outbreak and only in half of the contemporary human isolates. The epidemic strains replicated more efficiently and were more cytotoxic to human macrophages than the environmental strains tested. At least four Icm/Dot effectors in the epidemic strains were absent in the environmental strains, suggesting that some effectors could impact the intracellular replication in human macrophages. Sequence-based typing and pulsed-field gel electrophoresis combined with whole genome sequencing allowed the identification and the analysis of the causative strain including its likely environmental source. PMID:25105285

  9. Genomic characterization of a large outbreak of Legionella pneumophila serogroup 1 strains in Quebec City, 2012.

    PubMed

    Lévesque, Simon; Plante, Pier-Luc; Mendis, Nilmini; Cantin, Philippe; Marchand, Geneviève; Charest, Hugues; Raymond, Frédéric; Huot, Caroline; Goupil-Sormany, Isabelle; Desbiens, François; Faucher, Sébastien P; Corbeil, Jacques; Tremblay, Cécile

    2014-01-01

    During the summer of 2012, a major Legionella pneumophila serogroup 1 outbreak occurred in Quebec City, Canada, which caused 182 declared cases of Legionnaires' disease and included 13 fatalities. Legionella pneumophila serogroup 1 isolates from 23 patients as well as from 32 cooling towers located in the vicinity of the outbreak were recovered for analysis. In addition, 6 isolates from the 1996 Quebec City outbreak and 4 isolates from patients unrelated to both outbreaks were added to allow comparison. We characterized the isolates using pulsed-field gel electrophoresis, sequence-based typing, and whole genome sequencing. The comparison of patient-isolated strains to cooling tower isolates allowed the identification of the tower that was the source of the outbreak. Legionella pneumophila strain Quebec 2012 was identified as ST-62 by sequence-based typing methodology. Two new Legionellaceae plasmids were found only in the epidemic strain. The LVH type IV secretion system was found in the 2012 outbreak isolates but not in the ones from the 1996 outbreak and only in half of the contemporary human isolates. The epidemic strains replicated more efficiently and were more cytotoxic to human macrophages than the environmental strains tested. At least four Icm/Dot effectors in the epidemic strains were absent in the environmental strains, suggesting that some effectors could impact the intracellular replication in human macrophages. Sequence-based typing and pulsed-field gel electrophoresis combined with whole genome sequencing allowed the identification and the analysis of the causative strain including its likely environmental source. PMID:25105285

  10. Candidate genes for obesity-susceptibility show enriched association within a large genome-wide association study for BMI

    PubMed Central

    Vimaleswaran, Karani S.; Tachmazidou, Ioanna; Zhao, Jing Hua; Hirschhorn, Joel N.; Dudbridge, Frank; Loos, Ruth J.F.

    2012-01-01

    Before the advent of genome-wide association studies (GWASs), hundreds of candidate genes for obesity-susceptibility had been identified through a variety of approaches. We examined whether those obesity candidate genes are enriched for associations with body mass index (BMI) compared with non-candidate genes by using data from a large-scale GWAS. A thorough literature search identified 547 candidate genes for obesity-susceptibility based on evidence from animal studies, Mendelian syndromes, linkage studies, genetic association studies and expression studies. Genomic regions were defined to include the genes ±10 kb of flanking sequence around candidate and non-candidate genes. We used summary statistics publicly available from the discovery stage of the genome-wide meta-analysis for BMI performed by the genetic investigation of anthropometric traits consortium in 123 564 individuals. Hypergeometric, rank tail-strength and gene-set enrichment analysis tests were used to test for the enrichment of association in candidate compared with non-candidate genes. The hypergeometric test of enrichment was not significant at the 5% P-value quantile (P = 0.35), but was nominally significant at the 25% quantile (P = 0.015). The rank tail-strength and gene-set enrichment tests were nominally significant for the full set of genes and borderline significant for the subset without SNPs at P < 10⁻⁷. Taken together, the observed evidence for enrichment suggests that the candidate gene approach retains some value. However, the degree of enrichment is small despite the extensive number of candidate genes and the large sample size. Studies that focus on candidate genes have only slightly increased chances of detecting associations, and are likely to miss many true effects in non-candidate genes, at least for obesity-related traits. PMID:22791748
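
    A hypergeometric enrichment test of the kind named in this abstract asks how likely it is to see at least the observed number of candidate genes among the top-ranked genes if candidates were spread at random. The sketch below shows the one-sided tail probability using only the Python standard library; the function name and the gene counts in the example are hypothetical and do not reproduce the study's data or exact procedure.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, candidate_genes, top_genes, candidates_in_top):
    """P(X >= candidates_in_top) when top_genes are drawn without replacement
    from total_genes, of which candidate_genes are candidates."""
    if not 0 <= candidate_genes <= total_genes or not 0 <= top_genes <= total_genes:
        raise ValueError("gene counts are inconsistent")
    upper = min(candidate_genes, top_genes)
    if not 0 <= candidates_in_top <= upper:
        raise ValueError("observed overlap is out of range")
    non_candidates = total_genes - candidate_genes
    # Sum the hypergeometric probability mass over the upper tail.
    tail = sum(
        comb(candidate_genes, k) * comb(non_candidates, top_genes - k)
        for k in range(candidates_in_top, upper + 1)
    )
    return tail / comb(total_genes, top_genes)


# Hypothetical toy numbers: 10 genes, 4 candidates, top 3 genes by P-value,
# 2 of which are candidates.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
```

    In the setting described above, top_genes would correspond to the genes falling in a chosen P-value quantile (for example the best 5% or 25%), and candidates_in_top to the candidate genes among them.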

  11. Candidate genes for obesity-susceptibility show enriched association within a large genome-wide association study for BMI.

    PubMed

    Vimaleswaran, Karani S; Tachmazidou, Ioanna; Zhao, Jing Hua; Hirschhorn, Joel N; Dudbridge, Frank; Loos, Ruth J F

    2012-10-15

    Before the advent of genome-wide association studies (GWASs), hundreds of candidate genes for obesity-susceptibility had been identified through a variety of approaches. We examined whether those obesity candidate genes are enriched for associations with body mass index (BMI) compared with non-candidate genes by using data from a large-scale GWAS. A thorough literature search identified 547 candidate genes for obesity-susceptibility based on evidence from animal studies, Mendelian syndromes, linkage studies, genetic association studies and expression studies. Genomic regions were defined to include the genes ±10 kb of flanking sequence around candidate and non-candidate genes. We used summary statistics publicly available from the discovery stage of the genome-wide meta-analysis for BMI performed by the genetic investigation of anthropometric traits consortium in 123 564 individuals. Hypergeometric, rank tail-strength and gene-set enrichment analysis tests were used to test for the enrichment of association in candidate compared with non-candidate genes. The hypergeometric test of enrichment was not significant at the 5% P-value quantile (P = 0.35), but was nominally significant at the 25% quantile (P = 0.015). The rank tail-strength and gene-set enrichment tests were nominally significant for the full set of genes and borderline significant for the subset without SNPs at P < 10⁻⁷. Taken together, the observed evidence for enrichment suggests that the candidate gene approach retains some value. However, the degree of enrichment is small despite the extensive number of candidate genes and the large sample size. Studies that focus on candidate genes have only slightly increased chances of detecting associations, and are likely to miss many true effects in non-candidate genes, at least for obesity-related traits.

  12. Comparative genomic de-convolution of the cotton genome revealed a decaploid ancestor and widespread chromosomal fractionation.

    PubMed

    Wang, Xiyin; Guo, Hui; Wang, Jinpeng; Lei, Tianyu; Liu, Tao; Wang, Zhenyi; Li, Yuxian; Lee, Tae-Ho; Li, Jingping; Tang, Haibao; Jin, Dianchuan; Paterson, Andrew H

    2016-02-01

    The 'apparently' simple genomes of many angiosperms mask complex evolutionary histories. The reference genome sequence for cotton (Gossypium spp.) revealed a ploidy change of a complexity unprecedented to date, indeed that could not be distinguished as to its exact dosage. Herein, by developing several comparative, computational and statistical approaches, we revealed a 5× multiplication in the cotton lineage of an ancestral genome common to cotton and cacao, and proposed evolutionary models to show how such a decaploid ancestor formed. The c. 70% gene loss necessary to bring the ancestral decaploid to its current gene count appears to fit an approximate geometrical model; that is, although many genes may be lost by single-gene deletion events, some may be lost in groups of consecutive genes. Gene loss following cotton decaploidy has largely just reduced gene copy numbers of some homologous groups. We designed a novel approach to deconvolute layers of chromosome homology, providing definitive information on gene orthology and paralogy across broad evolutionary distances, both of fundamental value and serving as an important platform to support further studies in and beyond cotton and genomics communities. PMID:26756535

  13. Comparison of eleven methods for genomic DNA extraction suitable for large-scale whole-genome genotyping and long-term DNA banking using blood samples.

    PubMed

    Psifidi, Androniki; Dovas, Chrysostomos I; Bramis, Georgios; Lazou, Thomai; Russel, Claire L; Arsenos, Georgios; Banos, Georgios

    2015-01-01

    In recent years, next generation sequencing and microarray technologies have revolutionized scientific research with their applications to high-throughput analysis of biological systems. Isolation of high quantities of pure, intact, double-stranded, highly concentrated, uncontaminated genomic DNA is a prerequisite for successful and reliable large-scale genotyping analysis. High quantities of pure DNA are also required for the creation of DNA banks. In the present study, eleven different DNA extraction procedures, including phenol-chloroform, silica and magnetic beads based extractions, were examined to ascertain their relative effectiveness for extracting DNA from ovine blood samples. The quality and quantity of the differentially extracted DNA were subsequently assessed by spectrophotometric measurements, Qubit measurements, real-time PCR amplifications and gel electrophoresis. Processing time, intensity of labor and cost for each method were also evaluated. Results revealed significant differences among the eleven procedures and only four of the methods yielded satisfactory outputs. These four methods, comprising three modified silica based commercial kits (Modified Blood, Modified Tissue, Modified Dx kits) and an in-house developed magnetic beads based protocol, were most appropriate for extracting high quality and quantity DNA suitable for large-scale microarray genotyping and also for long-term DNA storage, as demonstrated by their successful application to 600 individuals.

  14. Detection of Weakly Conserved Ancestral Mammalian Regulatory Sequences by Primate Comparisons

    SciTech Connect

    Wang, Qian-fei; Prabhakar, Shyam; Chanan, Sumita; Cheng, Jan-Fang; Rubin, Edward M.; Boffelli, Dario

    2006-06-01

    Genomic comparisons between human and distant, non-primate mammals are commonly used to identify cis-regulatory elements based on constrained sequence evolution. However, these methods fail to detect cryptic functional elements, which are too weakly conserved among mammals to distinguish from nonfunctional DNA. To address this problem, we explored the potential of deep intra-primate sequence comparisons. We sequenced the orthologs of 558 kb of human genomic sequence, covering multiple loci involved in cholesterol homeostasis, in 6 nonhuman primates. Our analysis identified 6 noncoding DNA elements displaying significant conservation among primates, but undetectable in more distant comparisons. In vitro and in vivo tests revealed that at least three of these 6 elements have regulatory function. Notably, the mouse orthologs of these three functional human sequences had regulatory activity despite their lack of significant sequence conservation, indicating that they are cryptic ancestral cis-regulatory elements. These regulatory elements could still be detected in a smaller set of three primate species including human, rhesus and marmoset. Since the human and rhesus genome sequences are already available, and the marmoset genome is actively being sequenced, the primate-specific conservation analysis described here can be applied in the near future on a whole-genome scale, to complement the annotation provided by more distant species comparisons.

  15. Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European.

    PubMed

    Olalde, Iñigo; Allentoft, Morten E; Sánchez-Quinto, Federico; Santpere, Gabriel; Chiang, Charleston W K; DeGiorgio, Michael; Prado-Martinez, Javier; Rodríguez, Juan Antonio; Rasmussen, Simon; Quilez, Javier; Ramírez, Oscar; Marigorta, Urko M; Fernández-Callejo, Marcos; Prada, María Encina; Encinas, Julio Manuel Vidal; Nielsen, Rasmus; Netea, Mihai G; Novembre, John; Sturm, Richard A; Sabeti, Pardis; Marquès-Bonet, Tomàs; Navarro, Arcadi; Willerslev, Eske; Lalueza-Fox, Carles

    2014-03-13

    Ancient genomic sequences have started to reveal the origin and the demographic impact of farmers from the Neolithic period spreading into Europe. The adoption of farming, stock breeding and sedentary societies during the Neolithic may have resulted in adaptive changes in genes associated with immunity and diet. However, the limited data available from earlier hunter-gatherers preclude an understanding of the selective processes associated with this crucial transition to agriculture in recent human evolution. Here we sequence an approximately 7,000-year-old Mesolithic skeleton discovered at the La Braña-Arintero site in León, Spain, to retrieve a complete pre-agricultural European human genome. Analysis of this genome in the context of other ancient samples suggests the existence of a common ancient genomic signature across western and central Eurasia from the Upper Paleolithic to the Mesolithic. The La Braña individual carries ancestral alleles in several skin pigmentation genes, suggesting that the light skin of modern Europeans was not yet ubiquitous in Mesolithic times. Moreover, we provide evidence that a significant number of derived, putatively adaptive variants associated with pathogen resistance in modern Europeans were already present in this hunter-gatherer.

  16. Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European.

    PubMed

    Olalde, Iñigo; Allentoft, Morten E; Sánchez-Quinto, Federico; Santpere, Gabriel; Chiang, Charleston W K; DeGiorgio, Michael; Prado-Martinez, Javier; Rodríguez, Juan Antonio; Rasmussen, Simon; Quilez, Javier; Ramírez, Oscar; Marigorta, Urko M; Fernández-Callejo, Marcos; Prada, María Encina; Encinas, Julio Manuel Vidal; Nielsen, Rasmus; Netea, Mihai G; Novembre, John; Sturm, Richard A; Sabeti, Pardis; Marquès-Bonet, Tomàs; Navarro, Arcadi; Willerslev, Eske; Lalueza-Fox, Carles

    2014-03-13

    Ancient genomic sequences have started to reveal the origin and the demographic impact of farmers from the Neolithic period spreading into Europe. The adoption of farming, stock breeding and sedentary societies during the Neolithic may have resulted in adaptive changes in genes associated with immunity and diet. However, the limited data available from earlier hunter-gatherers preclude an understanding of the selective processes associated with this crucial transition to agriculture in recent human evolution. Here we sequence an approximately 7,000-year-old Mesolithic skeleton discovered at the La Braña-Arintero site in León, Spain, to retrieve a complete pre-agricultural European human genome. Analysis of this genome in the context of other ancient samples suggests the existence of a common ancient genomic signature across western and central Eurasia from the Upper Paleolithic to the Mesolithic. The La Braña individual carries ancestral alleles in several skin pigmentation genes, suggesting that the light skin of modern Europeans was not yet ubiquitous in Mesolithic times. Moreover, we provide evidence that a significant number of derived, putatively adaptive variants associated with pathogen resistance in modern Europeans were already present in this hunter-gatherer. PMID:24463515

  17. Are survival processing memory advantages based on ancestral priorities?

    PubMed

    Soderstrom, Nicholas C; McCabe, David P

    2011-06-01

    Recent research has suggested that our memory systems are especially tuned to process information according to its survival relevance, and that inducing problems of "ancestral priorities" faced by our ancestors should lead to optimal recall performance (Nairne & Pandeirada, Cognitive Psychology, 2010). The present study investigated the specificity of this idea by comparing an ancestor-consistent scenario and a modern survival scenario that involved threats that were encountered by human ancestors (e.g., predators) or threats from fictitious creatures (i.e., zombies). Participants read one of four survival scenarios in which the environment and the explicit threat were either consistent or inconsistent with ancestrally based problems (i.e., grasslands-predators, grasslands-zombies, city-attackers, city-zombies), or they rated words for pleasantness. After rating words based on their survival relevance (or pleasantness), the participants performed a free recall task. All survival scenarios led to better recall than did pleasantness ratings, but recall was greater when zombies were the threat, as compared to predators or attackers. Recall did not differ for the modern (i.e., city) and ancestral (i.e., grasslands) scenarios. These recall differences persisted when valence and arousal ratings for the scenarios were statistically controlled as well. These data challenge the specificity of ancestral priorities in survival-processing advantages in memory.

  18. Musculature in sipunculan worms: ontogeny and ancestral states.

    PubMed

    Schulze, Anja; Rice, Mary E

    2009-01-01

    Molecular phylogenetics suggests that the Sipuncula fall into the Annelida, although they are morphologically very distinct and lack segmentation. To understand the evolutionary transformations from the annelid to the sipunculan body plan, it is important to reconstruct the ancestral states within the respective clades at all life history stages. Here we reconstruct the ancestral states for the head/introvert retractor muscles and the body wall musculature in the Sipuncula using Bayesian statistics. In addition, we describe the ontogenetic transformations of the two muscle systems in four sipunculan species with different developmental modes, using F-actin staining with fluorescent-labeled phalloidin in conjunction with confocal laser scanning microscopy. All four species, which have smooth body wall musculature and less than the full set of four introvert retractor muscles as adults, go through developmental stages with four retractor muscles that are eventually reduced to a lower number in the adult. The circular and sometimes the longitudinal body wall musculature are split into bands that later transform into a smooth sheath. Our ancestral state reconstructions suggest with nearly 100% probability that the ancestral sipunculan had four introvert retractor muscles, longitudinal body wall musculature in bands and circular body wall musculature arranged as a smooth sheath. Species with crawling larvae have more strongly developed body wall musculature than those with swimming larvae. To interpret our findings in the context of annelid evolution, a more solid phylogenetic framework is needed for the entire group and more data on ontogenetic transformations of annelid musculature are desirable. PMID:19196337

  19. Advanced Intestinal Cancers often Maintain a Multi-Ancestral Architecture

    PubMed Central

    Zahm, Christopher D.; Szulczewski, Joseph M.; Leystra, Alyssa A.; Paul Olson, Terrah J.; Clipson, Linda; Albrecht, Dawn M.; Middlebrooks, Malisa; Thliveris, Andrew T.; Matkowskyj, Kristina A.; Washington, Mary Kay; Newton, Michael A.; Eliceiri, Kevin W.; Halberg, Richard B.

    2016-01-01

    A widely accepted paradigm in the field of cancer biology is that solid tumors are uni-ancestral, being derived from a single founder and its descendants. However, data have been steadily accruing that indicate early tumors in mice and humans can have a multi-ancestral origin in which an initiated primogenitor facilitates the transformation of neighboring co-genitors. We developed a new mouse model that permits the determination of clonal architecture of intestinal tumors in vivo and ex vivo, have validated this model, and then used it to assess the clonal architecture of adenomas, intramucosal carcinomas, and invasive adenocarcinomas of the intestine. The percentage of multi-ancestral tumors did not significantly change as tumors progressed from adenomas with low-grade dysplasia [40/65 (62%)], to adenomas with high-grade dysplasia [21/37 (57%)], to intramucosal carcinomas [10/23 (43%)], to invasive adenocarcinomas [13/19 (68%)], indicating that the clone arising from the primogenitor continues to coexist with clones arising from co-genitors. Moreover, neoplastic cells from distinct clones within a multi-ancestral adenocarcinoma have even been observed to simultaneously invade into the underlying musculature [2/15 (13%)]. Thus, intratumoral heterogeneity arising early in tumor formation persists throughout tumorigenesis. PMID:26919712

  20. Are survival processing memory advantages based on ancestral priorities?

    PubMed

    Soderstrom, Nicholas C; McCabe, David P

    2011-06-01

    Recent research has suggested that our memory systems are especially tuned to process information according to its survival relevance, and that inducing problems of "ancestral priorities" faced by our ancestors should lead to optimal recall performance (Nairne & Pandeirada, Cognitive Psychology, 2010). The present study investigated the specificity of this idea by comparing an ancestor-consistent scenario and a modern survival scenario that involved threats that were encountered by human ancestors (e.g., predators) or threats from fictitious creatures (i.e., zombies). Participants read one of four survival scenarios in which the environment and the explicit threat were either consistent or inconsistent with ancestrally based problems (i.e., grasslands-predators, grasslands-zombies, city-attackers, city-zombies), or they rated words for pleasantness. After rating words based on their survival relevance (or pleasantness), the participants performed a free recall task. All survival scenarios led to better recall than did pleasantness ratings, but recall was greater when zombies were the threat, as compared to predators or attackers. Recall did not differ for the modern (i.e., city) and ancestral (i.e., grasslands) scenarios. These recall differences persisted when valence and arousal ratings for the scenarios were statistically controlled as well. These data challenge the specificity of ancestral priorities in survival-processing advantages in memory. PMID:21327372

  1. Reaching Children through Their Ancestral Language and Authentic Literature

    ERIC Educational Resources Information Center

    Bannon, Kay Thorpe

    2004-01-01

    In this article, the author describes a program of Eastern Cherokee ancestral language restoration in Cherokee, North Carolina. One of the primary goals of the program is to enhance the self-concept of the children and motivate the students to experience academic excitement and success. The use of authentic legends and stories is one method…

  2. Isolation of ancestral sylvatic dengue virus type 1, Malaysia.

    PubMed

    Teoh, Boon-Teong; Sam, Sing-Sin; Abd-Jamil, Juraina; AbuBakar, Sazaly

    2010-11-01

    Ancestral sylvatic dengue virus type 1, which was isolated from a monkey in 1972, was isolated from a patient with dengue fever in Malaysia. The virus is neutralized by serum of patients with endemic DENV-1 infection. Rare isolation of this virus suggests a limited spillover infection from an otherwise restricted sylvatic cycle.

  3. The large-scale blast score ratio (LS-BSR) pipeline: a method to rapidly compare genetic content between bacterial genomes.

    PubMed

    Sahl, Jason W; Caporaso, J Gregory; Rasko, David A; Keim, Paul

    2014-01-01

    Background. As whole genome sequence data from bacterial isolates becomes cheaper to generate, computational methods are needed to correlate sequence data with biological observations. Here we present the large-scale BLAST score ratio (LS-BSR) pipeline, which rapidly compares the genetic content of hundreds to thousands of bacterial genomes, and returns a matrix that describes the relatedness of all coding sequences (CDSs) in all genomes surveyed. This matrix can be easily parsed in order to identify genetic relationships between bacterial genomes. Although pipelines have been published that group peptides by sequence similarity, no other software performs the rapid, large-scale, full-genome comparative analyses carried out by LS-BSR. Results. To demonstrate the utility of the method, the LS-BSR pipeline was tested on 96 Escherichia coli and Shigella genomes; the pipeline ran in 163 min using 16 processors, which is a greater than 7-fold speedup compared to using a single processor. The BSR values for each CDS, which indicate a relative level of relatedness, were then mapped to each genome on an independent core genome single nucleotide polymorphism (SNP) based phylogeny. Comparisons were then used to identify clade specific CDS markers and validate the LS-BSR pipeline based on molecular markers that delineate between classical E. coli pathogenic variant (pathovar) designations. Scalability tests demonstrated that the LS-BSR pipeline can process 1,000 E. coli genomes in 27-57 h, depending upon the alignment method, using 16 processors. Conclusions. LS-BSR is an open-source, parallel implementation of the BSR algorithm, enabling rapid comparison of the genetic content of large numbers of genomes. The results of the pipeline can be used to identify specific markers between user-defined phylogenetic groups, and to identify the loss and/or acquisition of genetic information between bacterial isolates. 
Taxa-specific genetic markers can then be translated into clinical
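
    The matrix described in this abstract can be illustrated with a minimal sketch. It assumes the BLAST score ratio as it is commonly defined: the bit score of a CDS aligned against a genome divided by the bit score of the same CDS aligned against itself. The function names, the input dictionaries and the scores below are hypothetical and are not taken from the LS-BSR code base.

```python
def blast_score_ratio(query_score: float, self_score: float) -> float:
    """Return query_score / self_score (assumed BSR definition)."""
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return query_score / self_score


def bsr_matrix(self_scores, genome_scores):
    """Build {cds: {genome: BSR}} from hypothetical bit-score tables.

    self_scores:   {cds: bit score of the CDS against itself}
    genome_scores: {genome: {cds: best bit score of the CDS in that genome}}
    A CDS with no hit in a genome is given a score of 0.
    """
    matrix = {}
    for cds, self_score in self_scores.items():
        matrix[cds] = {
            genome: round(blast_score_ratio(scores.get(cds, 0.0), self_score), 3)
            for genome, scores in genome_scores.items()
        }
    return matrix


# Made-up scores, for illustration only.
self_scores = {"cdsA": 500.0, "cdsB": 320.0}
genome_scores = {
    "genome1": {"cdsA": 500.0, "cdsB": 160.0},
    "genome2": {"cdsA": 450.0},  # cdsB has no hit here
}
matrix = bsr_matrix(self_scores, genome_scores)
for cds, row in matrix.items():
    print(cds, row)
```

    Under this definition a value near 1 indicates a CDS that is conserved in a genome and a value near 0 indicates one that is absent or highly divergent, which is what makes the matrix easy to parse for group-specific markers.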

  4. Creation of Functional Viruses from Non-Functional cDNA Clones Obtained from an RNA Virus Population by the Use of Ancestral Reconstruction.

    PubMed

    Fahnøe, Ulrik; Pedersen, Anders Gorm; Dräger, Carolin; Orton, Richard J; Blome, Sandra; Höper, Dirk; Beer, Martin; Rasmussen, Thomas Bruun

    2015-01-01

    RNA viruses have the highest known mutation rates. Consequently it is likely that a high proportion of individual RNA virus genomes, isolated from an infected host, will contain lethal mutations and be non-functional. This is problematic if the aim is to clone and investigate high-fitness, functional cDNAs and may also pose problems for sequence-based analysis of viral evolution. To address these challenges we have performed a study of the evolution of classical swine fever virus (CSFV) using deep sequencing and analysis of 84 full-length cDNA clones, each representing individual genomes from a moderately virulent isolate. In addition to here being used as a model for RNA viruses generally, CSFV has high socioeconomic importance and remains a threat to animal welfare and pig production. We find that the majority of the investigated genomes are non-functional and only 12% produced infectious RNA transcripts. Full length sequencing of cDNA clones and deep sequencing of the parental population identified substitutions important for the observed phenotypes. The investigated cDNA clones were furthermore used as the basis for inferring the sequence of functional viruses. Since each unique clone must necessarily be the descendant of a functional ancestor, we hypothesized that it should be possible to produce functional clones by reconstructing ancestral sequences. To test this we used phylogenetic methods to infer two ancestral sequences, which were then reconstructed as cDNA clones. Viruses rescued from the reconstructed cDNAs were tested in cell culture and pigs. Both reconstructed ancestral genomes proved functional, and displayed distinct phenotypes in vitro and in vivo. We suggest that reconstruction of ancestral viruses is a useful tool for experimental and computational investigations of virulence and viral evolution. Importantly, ancestral reconstruction can be done even on the basis of a set of sequences that all correspond to non-functional variants. PMID

  5. Sexually dimorphic effects of ancestral exposure to vinclozolin on stress reactivity in rats.

    PubMed

    Gillette, Ross; Miller-Crews, Isaac; Nilsson, Eric E; Skinner, Michael K; Gore, Andrea C; Crews, David

    2014-10-01

    How an individual responds to the environment depends upon both personal life history as well as inherited genetic and epigenetic factors from ancestors. Using a 2-hit, 3 generations apart model, we tested how F3 descendants of rats given in utero exposure to the environmental endocrine-disrupting chemical (EDC) vinclozolin reacted to stress during adolescence in their own lives, focusing on sexually dimorphic phenotypic outcomes. In adulthood, male and female F3 vinclozolin- or vehicle-lineage rats, stressed or nonstressed, were behaviorally characterized on a battery of tests and then euthanized. Serum was used for hormone assays, and brains were used for quantitative PCR and transcriptome analyses. Results showed that the effects of ancestral exposure to vinclozolin converged with stress experienced during adolescence in a sexually dimorphic manner. Debilitating effects were seen at all levels of the phenotype, including physiology, behavior, brain metabolism, gene expression, and genome-wide transcriptome modifications in specific brain nuclei. Additionally, females were significantly more vulnerable than males to transgenerational effects of vinclozolin on anxiety but not sociality tests. This fundamental transformation occurs in a manner not predicted by the ancestral exposure or the proximate effects of stress during adolescence, an interaction we refer to as synchronicity. PMID:25051444

  6. Sexually Dimorphic Effects of Ancestral Exposure to Vinclozolin on Stress Reactivity in Rats

    PubMed Central

    Gillette, Ross; Miller-Crews, Isaac; Nilsson, Eric E.; Skinner, Michael K.; Gore, Andrea C.

    2014-01-01

    How an individual responds to the environment depends upon both personal life history as well as inherited genetic and epigenetic factors from ancestors. Using a 2-hit, 3 generations apart model, we tested how F3 descendants of rats given in utero exposure to the environmental endocrine-disrupting chemical (EDC) vinclozolin reacted to stress during adolescence in their own lives, focusing on sexually dimorphic phenotypic outcomes. In adulthood, male and female F3 vinclozolin- or vehicle-lineage rats, stressed or nonstressed, were behaviorally characterized on a battery of tests and then euthanized. Serum was used for hormone assays, and brains were used for quantitative PCR and transcriptome analyses. Results showed that the effects of ancestral exposure to vinclozolin converged with stress experienced during adolescence in a sexually dimorphic manner. Debilitating effects were seen at all levels of the phenotype, including physiology, behavior, brain metabolism, gene expression, and genome-wide transcriptome modifications in specific brain nuclei. Additionally, females were significantly more vulnerable than males to transgenerational effects of vinclozolin on anxiety but not sociality tests. This fundamental transformation occurs in a manner not predicted by the ancestral exposure or the proximate effects of stress during adolescence, an interaction we refer to as synchronicity. PMID:25051444

  7. Biogeographic Patterns in Genomic Diversity among a Large Collection of Vibrio cholerae Isolates

    PubMed Central

    Keymer, Daniel P.; Lam, Lilian H.; Boehm, Alexandria B.

    2009-01-01

    Vibrio cholerae strains are capable of inhabiting multiple niches in the aquatic environment and in some cases cause disease in humans. However, the ecology and biodiversity of these bacteria in environmental settings remains poorly understood. We used the genomic fingerprinting technique enterobacterial repetitive intergenic consensus sequence PCR (ERIC-PCR) to profile 835 environmental isolates from waters and sediments obtained at nine sites along the central California coast. We identified 115 ERIC-PCR genotypes from 998 fingerprints, with a reproducibility of 98.5% and a discriminatory power of 0.971. When the temporal dynamics at a subset of sampling sites were explored, several genotypes provided evidence for cosmopolitan or geographically restricted distributions, and other genotypes displayed nonrandom patterns of cooccurrence. Partial Mantel tests confirmed that genotypic similarity of isolates across all sampling events was correlated with environmental similarity (0.04 ≤ r ≤ 0.05), temporal proximity (r = 0.09), and geographic distance (r = 0.09). A neutral community model for all sampling events explained 61% of the variation in genotype abundance. Cooccurrence indices (C-score, C-board, and Combo) were significantly different than expected by chance, suggesting that the V. cholerae population may have a competitive structure, especially at the regional scale. Even though stochastic processes are undoubtedly important in generating biogeographic patterns in diversity, deterministic factors appear to play a significant, albeit small, role in shaping the V. cholerae population structure in this system. PMID:19139224
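
    The abstract reports a discriminatory power of 0.971 without naming the index. A common choice for typing methods is Simpson's index of diversity in the Hunter-Gaston form, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where N is the number of isolates and n_j the number of isolates with genotype j. The sketch below assumes that index; the function name and the toy genotype labels are hypothetical, and the study's data are not reproduced.

```python
from collections import Counter


def discriminatory_index(genotypes):
    """Hunter-Gaston discriminatory index for a list of genotype labels.

    Returns the probability that two isolates drawn without replacement
    carry different genotypes.
    """
    n = len(genotypes)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(genotypes)
    same_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_pairs / (n * (n - 1))


# Toy example: six isolates falling into three genotypes.
example = ["g1"] * 3 + ["g2"] * 2 + ["g3"]
d = discriminatory_index(example)
print(round(d, 3))
```

    A value of 1 means every isolate has its own genotype, and a value of 0 means all isolates share one.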

  8. Even modest prediction accuracy of genomic models can have large clinical utility.

    PubMed

    Dhurandhar, Emily J; Vazquez, Ana I; Argyropoulos, George A; Allison, David B

    2014-01-01

    Whole Genome Prediction (WGP) jointly fits thousands of SNPs into a regression model to yield estimates for the contribution of markers to the overall variance of a particular trait, and for their associations with that trait. To date, WGP has offered only modest prediction accuracy, but in some cases even modest prediction accuracy may be useful. We provide an illustration of this using a theoretical simulation that used WGP to predict weight loss after bariatric surgery with moderate accuracy (R² = 0.07) to assess the clinical utility of WGP despite these limitations. Prevention of Type 2 Diabetes (T2DM) post-surgery was considered the major outcome. Treating only patients above a predefined threshold of predicted weight loss in our simulation, in the realistic context of finite resources for the surgery, significantly reduced lifetime risk of T2DM in the treatable population by selecting those most likely to succeed. Thus, our example illustrates how WGP may be clinically useful in some situations, and even with moderate accuracy, may provide a clear path for turning personalized medicine from theory to reality.

  9. Even modest prediction accuracy of genomic models can have large clinical utility

    PubMed Central

    Dhurandhar, Emily J.; Vazquez, Ana I.; Argyropoulos, George A.; Allison, David B.

    2014-01-01

    Whole Genome Prediction (WGP) jointly fits thousands of SNPs into a regression model to yield estimates for the contribution of markers to the overall variance of a particular trait, and for their associations with that trait. To date, WGP has offered only modest prediction accuracy, but in some cases even modest prediction accuracy may be useful. We provide an illustration of this using a theoretical simulation that used WGP to predict weight loss after bariatric surgery with moderate accuracy (R² = 0.07) to assess the clinical utility of WGP despite these limitations. Prevention of Type 2 Diabetes (T2DM) post-surgery was considered the major outcome. Treating only patients above a predefined threshold of predicted weight loss in our simulation, in the realistic context of finite resources for the surgery, significantly reduced lifetime risk of T2DM in the treatable population by selecting those most likely to succeed. Thus, our example illustrates how WGP may be clinically useful in some situations, and even with moderate accuracy, may provide a clear path for turning personalized medicine from theory to reality. PMID:25506355

  10. Ancestral relationships of the major eukaryotic lineages.

    PubMed

    Sogin, M L; Morrison, H G; Hinkle, G; Silberman, J D

    1996-03-01

    Molecular systematics has revolutionized our understanding of microbial evolution. Phylogenetic frameworks relating all organisms in this biosphere can be inferred from comparisons of slowly evolving molecules such as the small and large subunit ribosomal RNAs. Unlike today's textbook standard, the "Five Kingdoms" (plants, animals, fungi, protists and bacteria), molecular studies define three primary lines of descent (Eukaryotes, Eubacteria, and Archaebacteria). Within the Eukaryotes, the "higher" kingdoms (Fungi, Plantae, and Animalia) are joined by at least two novel complex evolutionary assemblages, the "Alveolates" (ciliates, dinoflagellates and apicomplexans) and the "Stramenopiles" (diatoms, oomycetes, labyrinthulids, brown algae and chrysophytes). The separation of these eukaryotic groups (described as the eukaryotic "crown") occurred approximately 10⁹ years ago and was preceded by a succession of earlier diverging protist lineages, some as ancient as the separation of the prokaryotic domains. The molecular phylogenies suggest that multiple endosymbiotic events introduced plastids into discrete eukaryotic lineages.

  11. High proportion of large genomic deletions and a genotype–phenotype update in 80 unrelated families with juvenile polyposis syndrome

    PubMed Central

    Aretz, S; Stienen, D; Uhlhaas, S; Stolte, M; Entius, M M; Loff, S; Back, W; Kaufmann, A; Keller, K‐M; Blaas, S H; Siebert, R; Vogt, S; Spranger, S; Holinski‐Feder, E; Sunde, L; Propping, P; Friedl, W

    2007-01-01

    Background. In patients with juvenile polyposis syndrome (JPS) the frequency of large genomic deletions in the SMAD4 and BMPR1A genes was unknown. Methods. Mutation and phenotype analysis was used in 80 unrelated patients of whom 65 met the clinical criteria for JPS (typical JPS) and 15 were suspected to have JPS. Results. By direct sequencing of the two genes, point mutations were identified in 30 patients (46% of typical JPS). Using MLPA, large genomic deletions were found in 14% of all patients with typical JPS (six deletions in SMAD4 and three deletions in BMPR1A). Mutation analysis of the PTEN gene in the remaining 41 mutation negative cases uncovered a point mutation in two patients (5%). SMAD4 mutation carriers had a significantly higher frequency of gastric polyposis (73%) than did patients with BMPR1A mutations (8%) (p<0.001); all seven cases of gastric cancer occurred in families with SMAD4 mutations. SMAD4 mutation carriers with gastric polyps were significantly older at gastroscopy than those without (p<0.001). In 22% of the 23 unrelated SMAD4 mutation carriers, hereditary hemorrhagic telangiectasia (HHT) was also diagnosed clinically. The documented histologic findings encompassed a wide distribution of different polyp types, comparable with that described in hereditary mixed polyposis syndromes (HMPS). Conclusions. Screening for large deletions raised the mutation detection rate to 60% in the 65 patients with typical JPS. A strong genotype‐phenotype correlation for gastric polyposis, gastric cancer, and HHT was identified, which should have implications for counselling and surveillance. Histopathological results in hamartomatous polyposis syndromes must be critically interpreted. PMID:17873119
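
    The detection rates quoted in this abstract can be cross-checked with a short calculation. The sketch assumes that all 30 point mutations and all 9 large deletions were found among the 65 patients with typical JPS, which is how the percentages read; the variable names are illustrative.

```python
# Counts taken from the abstract.
total_patients = 80
typical_jps = 65
point_mutations = 30          # SMAD4 or BMPR1A, by direct sequencing
large_deletions = 6 + 3       # SMAD4 + BMPR1A, by MLPA
pten_point_mutations = 2


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


point_rate = percent(point_mutations, typical_jps)
deletion_rate = percent(large_deletions, typical_jps)
combined_rate = percent(point_mutations + large_deletions, typical_jps)

# Patients without a SMAD4/BMPR1A mutation, who went on to PTEN analysis.
mutation_negative = total_patients - point_mutations - large_deletions
pten_rate = percent(pten_point_mutations, mutation_negative)

print(point_rate, deletion_rate, combined_rate, mutation_negative, pten_rate)
```

    Under that assumption the figures are mutually consistent: 30/65 and 9/65 round to 46% and 14%, together 39/65 gives the 60% detection rate, and 80 - 39 leaves the 41 mutation-negative cases of which 2 (5%) carried a PTEN mutation.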

  12. Analysis of FOXO1 mutations in diffuse large B-cell lymphoma | Office of Cancer Genomics

    Cancer.gov

    Abstract: Diffuse large B-cell lymphoma (DLBCL) accounts for 30% to 40% of newly diagnosed lymphomas and has an overall cure rate of approximately 60%. Previously, we observed FOXO1 mutations in non-Hodgkin lymphoma patient samples. To explore the effects of FOXO1 mutations, we assessed FOXO1 status in 279 DLBCL patient samples and 22 DLBCL-derived cell lines.

  13. Genome-Wide Association Studies Identify the Loci for 5 Exterior Traits in a Large White × Minzhu Pig Population

    PubMed Central

    Yan, Hua; Liu, Xin; Li, Na; Liang, Jing; Pu, Lei; Zhang, Yuebo; Shi, Huibi; Zhao, Kebin; Wang, Lixian

    2014-01-01

    As one of the main breeding selection criteria, external appearance has special economic importance in the hog industry. In this study, an Illumina Porcine SNP60 BeadChip was used to conduct a genome-wide association study (GWAS) in 605 pigs of the F2 generation derived from a Large White × Minzhu intercross. Traits under study were abdominal circumference (AC), body height (BH), body length (BL), cannon bone circumference (CBC), chest depth (CD), chest width (CW), rump circumference (RC), rump width (RW), scapula width (SW), and waist width (WW). A total of 138 SNPs (the most significant being MARC0033464) on chromosome 7 were found to be associated with BH, BL, CBC, and RC (P-value = 4.15E-6). One SNP on chromosome 1 was found to be associated with CD at genome-wide significance levels. The percentage phenotypic variance of these significant SNPs ranged from 0.1–25.48%. Moreover, a conditional analysis revealed that the significant SNPs were derived from a single quantitative trait locus (QTL) and indicated additional chromosome-wide significant association for 25 SNPs on SSC4 (BL, CBC) and 9 SNPs on SSC7 (RC). Linkage analysis revealed two complete linkage disequilibrium haplotype blocks that contained seven and four SNPs, respectively. In block 1, the most significant SNP, MARC0033464, was present. Annotations from pig reference genome suggested six genes (GRM4, HMGA1, NUDT3, RPS10, SPDEF and PACSIN1) in block 1 (495 kb), and one gene (SCUBE3) in block 3 (124 kb). Functional analysis indicated that HMGA1 and SCUBE3 genes are the potential genes controlling BH, BL, and RC in pigs, with an application in breeding programs. We screened several candidate intervals and genes based on SNP location and gene function, and predicted their function using bioinformatics analyses. PMID:25090094

  14. Hawaiian Drosophila genomes: size variation and evolutionary expansions.

    PubMed

    Craddock, Elysse M; Gall, Joseph G; Jonas, Mark

    2016-02-01

    This paper reports genome sizes of one Hawaiian Scaptomyza and 16 endemic Hawaiian Drosophila species that include five members of the antopocerus species group, one member of the modified mouthpart group, and ten members of the picture wing clade. Genome size expansions have occurred independently multiple times among Hawaiian Drosophila lineages, and have resulted in an over 2.3-fold range of genome sizes among species, with the largest observed in Drosophila cyrtoloma (1C = 0.41 pg). We find evidence that these repeated genome size expansions were likely driven by the addition of significant amounts of heterochromatin and satellite DNA. For example, our data reveal that the addition of seven heterochromatic chromosome arms to the ancestral haploid karyotype, and a remarkable proportion of ~70 % satellite DNA, account for the greatly expanded size of the D. cyrtoloma genome. Moreover, the genomes of 13/17 Hawaiian picture wing species are composed of substantial proportions (22-70 %) of detectable satellites (all but one of which are AT-rich). Our results suggest that in this tightly knit group of recently evolved species, genomes have expanded, in large part, via evolutionary amplifications of satellite DNA sequences in centric and pericentric domains (especially of the X and dot chromosomes), which have resulted in longer acrocentric chromosomes or metacentrics with an added heterochromatic chromosome arm. We discuss possible evolutionary mechanisms that may have shaped these patterns, including rapid fixation of novel expanded genomes during founder-effect speciation.
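
    For readers more used to base pairs than picograms, the genome sizes above can be converted with the widely used factor of roughly 978 Mbp per picogram of DNA. The factor is an assumption brought in here, not a value from the abstract, and the lower bound derived from the "over 2.3-fold" range is only indicative.

```python
MBP_PER_PG = 978.0  # approximate conversion factor, assumed here


def pg_to_mbp(picograms: float) -> float:
    """Convert a 1C genome size in picograms to megabase pairs."""
    return picograms * MBP_PER_PG


largest_pg = 0.41                      # D. cyrtoloma, from the abstract
largest_mbp = pg_to_mbp(largest_pg)

# "Over 2.3-fold range" implies the smallest genome is at most this size.
smallest_pg_upper_bound = largest_pg / 2.3
smallest_mbp_upper_bound = pg_to_mbp(smallest_pg_upper_bound)

print(round(largest_mbp), round(smallest_mbp_upper_bound))
```

    With this factor the D. cyrtoloma genome comes to about 400 Mbp, and the smallest genome in the set to no more than roughly 175 Mbp.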

  15. Excavating the Genome: Large Scale Mutagenesis Screening for the Discovery of New Mouse Models

    PubMed Central

    Sundberg, John P.; Dadras, Soheil S.; Silva, Kathleen A.; Kennedy, Victoria E.; Murray, Stephen A.; Denegre, James; Schofield, Paul N.; King, Lloyd E.; Wiles, Michael; Pratt, C. Herbert

    2016-01-01

    Technology now exists for rapid screening of mutated laboratory mice to identify phenotypes associated with specific genetic mutations. Large repositories exist for spontaneous mutants and those induced by chemical mutagenesis, many of which have never been studied or comprehensively evaluated. To supplement these resources, a variety of techniques have been consolidated in an international effort to create mutations in all known protein coding genes in the mouse. With targeted embryonic stem cell lines now available for almost all protein coding genes and more recently CRISPR/Cas9 technology, large-scale efforts are underway to create novel mutant mouse strains and to characterize their phenotypes. However, accurate diagnosis of skin, hair, and nail diseases still relies on careful gross and histological analysis. While not automated to the level of the physiological phenotyping, histopathology provides the most direct and accurate diagnosis and correlation with human diseases. As a result of these efforts, many new mouse dermatological disease models are being developed. PMID:26551941

  16. Large gene overlaps and tRNA processing in the compact mitochondrial genome of the crustacean Armadillidium vulgare.

    PubMed

    Doublet, Vincent; Ubrig, Elodie; Alioua, Abdelmalek; Bouchon, Didier; Marcadé, Isabelle; Maréchal-Drouard, Laurence

    2015-01-01

    A faithful expression of the mitochondrial DNA is crucial for cell survival. Animal mitochondrial DNA (mtDNA) presents a highly compact gene organization. The typical 16.5 kbp animal mtDNA encodes 13 proteins, 2 rRNAs and 22 tRNAs. In the backyard pillbug Armadillidium vulgare, the rather small 13.9 kbp mtDNA encodes the same set of proteins and rRNAs as compared to animal kingdom mtDNA, but seems to harbor an incomplete set of tRNA genes. Here, we first confirm the expression of 13 tRNA genes in this mtDNA. Then we show the extensive repair of a truncated tRNA, the expression of tRNA involved in large gene overlaps and of tRNA genes partially or fully integrated within protein-coding genes in either direct or opposite orientation. Under selective pressure, overlaps between genes have been likely favored for strong genome size reduction. Our study underlines the existence of unknown biochemical mechanisms for the complete gene expression of A. vulgare mtDNA, and of co-evolutionary processes to keep overlapping genes functional in a compacted mitochondrial genome.

  17. DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies

    PubMed Central

    Ye, Chengxi; Hill, Christopher M.; Wu, Shigang; Ruan, Jue; Ma, Zhanshan (Sam)

    2016-01-01

    The highly anticipated transition from next generation sequencing (NGS) to third generation sequencing (3GS) has been difficult primarily due to high error rates and excessive sequencing cost. The high error rates make the assembly of long erroneous reads of large genomes challenging because existing software solutions are often overwhelmed by error correction tasks. Here we report a hybrid assembly approach that simultaneously utilizes NGS and 3GS data to address both issues. We gain advantages from three general and basic design principles: (i) Compact representation of the long reads leads to efficient alignments. (ii) Base-level errors can be skipped; structural errors need to be detected and corrected. (iii) Structurally correct 3GS reads are assembled and polished. In our implementation, preassembled NGS contigs are used to derive the compact representation of the long reads, motivating an algorithmic conversion from a de Bruijn graph to an overlap graph, the two major assembly paradigms. Moreover, since NGS and 3GS data can compensate for each other, our hybrid assembly approach reduces both of their sequencing requirements. Experiments show that our software is able to assemble mammalian-sized genomes orders of magnitude more quickly than existing methods without consuming a lot of memory, while saving about half of the sequencing cost. PMID:27573208

  18. Genomic organization and reproductive regulation of a large lipid transfer protein in the varroa mite, Varroa destructor (Anderson & Trueman).

    PubMed

    Cabrera, A R; Shirk, P D; Duehl, A J; Donohue, K V; Grozinger, C M; Evans, J D; Teal, P E A

    2013-10-01

    The complete genomic region and corresponding transcript of the most abundant protein in phoretic varroa mites, Varroa destructor (Anderson & Trueman), were sequenced and have homology with acarine hemelipoglycoproteins and the large lipid transfer protein (LLTP) superfamily. The genomic sequence of VdLLTP included 14 introns and the mature transcript coded for a predicted polypeptide of 1575 amino acid residues. VdLLTP shared a minimum of 25% sequence identity with acarine LLTPs. Phylogenetic assessment showed VdLLTP was most closely related to Metaseiulus occidentalis vitellogenin and LLTP proteins of ticks; however, no heme binding by VdLLTP was detected. Analysis of lipids associated with VdLLTP showed that it was a carrier for free and esterified C12-C22 fatty acids from triglycerides, diacylglycerides and monoacylglycerides. Additionally, cholesterol and β-sitosterol were found as cholesterol esters linked to common fatty acids. Transcript levels of VdLLTP were 42 and 310 times higher in phoretic female mites when compared with males and quiescent deutonymphs, respectively. Coincident with initiation of the reproductive phase, VdLLTP transcript levels declined to a third of those in phoretic female mites. VdLLTP functions as an important lipid transporter and should provide a significant RNA interference target for assessing the control of varroa mites.

  19. DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies.

    PubMed

    Ye, Chengxi; Hill, Christopher M; Wu, Shigang; Ruan, Jue; Ma, Zhanshan Sam

    2016-01-01

    The highly anticipated transition from next generation sequencing (NGS) to third generation sequencing (3GS) has been difficult primarily due to high error rates and excessive sequencing cost. The high error rates make the assembly of long erroneous reads of large genomes challenging because existing software solutions are often overwhelmed by error correction tasks. Here we report a hybrid assembly approach that simultaneously utilizes NGS and 3GS data to address both issues. We gain advantages from three general and basic design principles: (i) Compact representation of the long reads leads to efficient alignments. (ii) Base-level errors can be skipped; structural errors need to be detected and corrected. (iii) Structurally correct 3GS reads are assembled and polished. In our implementation, preassembled NGS contigs are used to derive the compact representation of the long reads, motivating an algorithmic conversion from a de Bruijn graph to an overlap graph, the two major assembly paradigms. Moreover, since NGS and 3GS data can compensate for each other, our hybrid assembly approach reduces both of their sequencing requirements. Experiments show that our software is able to assemble mammalian-sized genomes orders of magnitude more quickly than existing methods without consuming a lot of memory, while saving about half of the sequencing cost. PMID:27573208

  20. Large gene overlaps and tRNA processing in the compact mitochondrial genome of the crustacean Armadillidium vulgare.

    PubMed

    Doublet, Vincent; Ubrig, Elodie; Alioua, Abdelmalek; Bouchon, Didier; Marcadé, Isabelle; Maréchal-Drouard, Laurence

    2015-01-01

    A faithful expression of the mitochondrial DNA is crucial for cell survival. Animal mitochondrial DNA (mtDNA) presents a highly compact gene organization. The typical 16.5 kbp animal mtDNA encodes 13 proteins, 2 rRNAs and 22 tRNAs. In the backyard pillbug Armadillidium vulgare, the rather small 13.9 kbp mtDNA encodes the same set of proteins and rRNAs as compared to animal kingdom mtDNA, but seems to harbor an incomplete set of tRNA genes. Here, we first confirm the expression of 13 tRNA genes in this mtDNA. Then we show the extensive repair of a truncated tRNA, the expression of tRNA involved in large gene overlaps and of tRNA genes partially or fully integrated within protein-coding genes in either direct or opposite orientation. Under selective pressure, overlaps between genes have been likely favored for strong genome size reduction. Our study underlines the existence of unknown biochemical mechanisms for the complete gene expression of A. vulgare mtDNA, and of co-evolutionary processes to keep overlapping genes functional in a compacted mitochondrial genome. PMID:26361137

  1. Genomic characterization of a large panel of patient-derived hepatocellular carcinoma xenograft tumor models for preclinical development.

    PubMed

    Gu, Qingyang; Zhang, Bin; Sun, Hongye; Xu, Qiang; Tan, Yexiong; Wang, Guan; Luo, Qin; Xu, Weiguo; Yang, Shuqun; Li, Jian; Fu, Jing; Chen, Lei; Yuan, Shengxian; Liang, Guibai; Ji, Qunsheng; Chen, Shu-Hui; Chan, Chi-Chung; Zhou, Weiping; Xu, Xiaowei; Wang, Hongyang; Fang, Douglas D

    2015-08-21

    Lack of clinically relevant tumor models dramatically hampers development of effective therapies for hepatocellular carcinoma (HCC). Establishment of patient-derived xenograft (PDX) models that faithfully recapitulate the genetic and phenotypic features of HCC becomes important. In this study, we first established a cohort of 65 stable PDX models of HCC from corresponding Chinese patients. Then we showed that the histology and gene expression patterns of PDX models were highly consistent between xenografts and case-matched original tumors. Genetic alterations, including mutations and DNA copy number alterations (CNAs), of the xenografts correlated well with the published data of HCC patient specimens. Furthermore, differential responses to sorafenib, the standard-of-care agent, in randomly chosen xenografts were unveiled. Finally, in the models expressing high levels of FGFR1 gene according to the genomic data, FGFR1 inhibitor lenvatinib showed greater efficacy than sorafenib. Taken together, our data indicate that PDX models resemble histopathological and genomic characteristics of clinical HCC tumors, as well as recapitulate the differential responses of HCC patients to the standard-of-care treatment. Overall, this large collection of PDX models becomes a clinically relevant platform for drug screening, biomarker discovery and translational research in preclinical setting.

  2. Genomic characterization of a large panel of patient-derived hepatocellular carcinoma xenograft tumor models for preclinical development

    PubMed Central

    Sun, Hongye; Xu, Qiang; Tan, Yexiong; Wang, Guan; Luo, Qin; Xu, Weiguo; Yang, Shuqun; Li, Jian; Fu, Jing; Chen, Lei; Yuan, Shengxian; Liang, Guibai; Ji, Qunsheng; Chen, Shu-Hui; Chan, Chi-Chung; Zhou, Weiping; Xu, Xiaowei; Wang, Hongyang; Fang, Douglas D.

    2015-01-01

    Lack of clinically relevant tumor models dramatically hampers development of effective therapies for hepatocellular carcinoma (HCC). Establishment of patient-derived xenograft (PDX) models that faithfully recapitulate the genetic and phenotypic features of HCC becomes important. In this study, we first established a cohort of 65 stable PDX models of HCC from corresponding Chinese patients. Then we showed that the histology and gene expression patterns of PDX models were highly consistent between xenografts and case-matched original tumors. Genetic alterations, including mutations and DNA copy number alterations (CNAs), of the xenografts correlated well with the published data of HCC patient specimens. Furthermore, differential responses to sorafenib, the standard-of-care agent, in randomly chosen xenografts were unveiled. Finally, in the models expressing high levels of FGFR1 gene according to the genomic data, FGFR1 inhibitor lenvatinib showed greater efficacy than sorafenib. Taken together, our data indicate that PDX models resemble histopathological and genomic characteristics of clinical HCC tumors, as well as recapitulate the differential responses of HCC patients to the standard-of-care treatment. Overall, this large collection of PDX models becomes a clinically relevant platform for drug screening, biomarker discovery and translational research in preclinical setting. PMID:26062443

  3. Large gene overlaps and tRNA processing in the compact mitochondrial genome of the crustacean Armadillidium vulgare

    PubMed Central

    Doublet, Vincent; Ubrig, Elodie; Alioua, Abdelmalek; Bouchon, Didier; Marcadé, Isabelle; Maréchal-Drouard, Laurence

    2015-01-01

    A faithful expression of the mitochondrial DNA is crucial for cell survival. Animal mitochondrial DNA (mtDNA) presents a highly compact gene organization. The typical 16.5 kbp animal mtDNA encodes 13 proteins, 2 rRNAs and 22 tRNAs. In the backyard pillbug Armadillidium vulgare, the rather small 13.9 kbp mtDNA encodes the same set of proteins and rRNAs as compared to animal kingdom mtDNA, but seems to harbor an incomplete set of tRNA genes. Here, we first confirm the expression of 13 tRNA genes in this mtDNA. Then we show the extensive repair of a truncated tRNA, the expression of tRNA involved in large gene overlaps and of tRNA genes partially or fully integrated within protein-coding genes in either direct or opposite orientation. Under selective pressure, overlaps between genes have been likely favored for strong genome size reduction. Our study underlines the existence of unknown biochemical mechanisms for the complete gene expression of A. vulgare mtDNA, and of co-evolutionary processes to keep overlapping genes functional in a compacted mitochondrial genome. PMID:26361137

  4. Integrating large-scale functional genomics data to dissect metabolic networks for hydrogen production

    SciTech Connect

    Harwood, Caroline S

    2012-12-17

    The goal of this project is to identify gene networks that are critical for efficient biohydrogen production by leveraging variation in gene content and gene expression in independently isolated Rhodopseudomonas palustris strains. Coexpression methods were applied to large data sets that we have collected to define probabilistic causal gene networks. To our knowledge this is a first systems-level approach that takes advantage of strain-to-strain variability to computationally define networks critical for a particular bacterial phenotypic trait.

  5. Large genomic fragment deletion and functional gene cassette knock-in via Cas9 protein mediated genome editing in one-cell rodent embryos.

    PubMed

    Wang, Liren; Shao, Yanjiao; Guan, Yuting; Li, Liang; Wu, Lijuan; Chen, Fangrui; Liu, Meizhen; Chen, Huaqing; Ma, Yanlin; Ma, Xueyun; Liu, Mingyao; Li, Dali

    2015-01-01

    The CRISPR-Cas RNA-guided system has versatile uses in many organisms and allows modification of multiple target sites simultaneously. Generating novel genetically modified mouse and rat models is one valuable application of this system. Through the injection of Cas9 protein instead of mRNA into embryos, we observed fewer off-target effects of Cas9 and increased point mutation knock-in efficiency. Large genomic DNA fragment (up to 95 kb) deletion mice were generated for in vivo study of lncRNAs and gene clusters. Site-specific insertion of a 2.7 kb CreERT2 cassette into the mouse Nfatc1 locus allowed labeling and tracing of hair follicle stem cells. In addition, we combined the Cre-Loxp system with a gene-trap strategy to insert a GFP reporter in the reverse orientation into the rat Lgr5 locus, which was later inverted by Cre-mediated recombination, yielding a conditional knockout/reporter strategy suitable for mosaic mutation analysis. PMID:26620761

  6. Origin of human chromosome 2: An ancestral telomere-telomere fusion

    SciTech Connect

    Ijdo, J.W.; Baldini, A.; Ward, D.C.; Reeders, S.T.; Wells, R.A.

    1991-10-15

    The authors identified two allelic genomic cosmids from human chromosome 2, c8.1 and c29B, each containing two inverted arrays of the vertebrate telomeric repeat in a head-to-head arrangement, 5′-(TTAGGG)n-(CCCTAA)m-3′. Sequences flanking this telomeric repeat are characteristic of present-day human pretelomeres. BAL-31 nuclease experiments with yeast artificial chromosome clones of human telomeres and fluorescence in situ hybridization reveal that sequences flanking these inverted repeats hybridize both to band 2q13 and to different, but overlapping, subsets of human chromosome ends. They conclude that the locus cloned in cosmids c8.1 and c29B is the relic of an ancient telomere-telomere fusion and marks the point at which two ancestral ape chromosomes fused to give rise to human chromosome 2.
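
    The head-to-head arrangement described in this record, a run of TTAGGG repeats directly followed by a run of CCCTAA repeats, can be illustrated with a small pattern search. The following is only a minimal sketch: the function name `find_fusion_sites`, the `min_repeats` threshold, and the toy sequence are hypothetical and not taken from the record, and real fusion sites contain degenerate repeats that an exact-match pattern like this one would miss.

```python
import re

def find_fusion_sites(seq, min_repeats=2):
    """Return (junction_index, n, m) for each (TTAGGG)n(CCCTAA)m junction.

    min_repeats is an illustrative threshold (an assumption, not a value
    from the record): both arrays must contain at least this many exact repeats.
    """
    pattern = re.compile(
        r"((?:TTAGGG){%d,})((?:CCCTAA){%d,})" % (min_repeats, min_repeats)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        n = len(match.group(1)) // 6   # repeats in the forward array
        m = len(match.group(2)) // 6   # repeats in the inverted array
        hits.append((match.end(1), n, m))  # junction = start of the CCCTAA array
    return hits

# Toy sequence (hypothetical): 3 forward repeats meeting 4 inverted repeats.
toy = "ACGT" + "TTAGGG" * 3 + "CCCTAA" * 4 + "GGCA"
print(find_fusion_sites(toy))
```

    On the toy sequence the junction is reported at index 22, with n = 3 and m = 4; a sequence holding only one of the two arrays returns no hits.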

  7. Inter- and intra-genomic homology of the Brassica genomes: implications for their origin and evolution.

    PubMed

    Truco, M J; Hu, J; Sadowski, J; Quiros, C F

    1996-12-01

    In order to determine the homologous regions shared by the cultivated Brassica genomes, linkage maps of the diploid cultivated B. rapa (A genome, n = 10), B. nigra (B genome, n = 8) and B. oleracea (C genome, n = 9), were compared. We found intergenomic conserved regions but with extensive reordering among the genomes. Eighteen linkage groups from all three species could be associated on the basis of homologous segments based on at least three common markers. Intragenomic homologous conservation was also observed for some of the chromosomes of the A, B and C genomes. A possible chromosome phylogenetic pathway based on an ancestral genome of at least five, and no more than seven chromosomes, was drawn from the chromosomal inter-relationships observed. These results demonstrate that extensive duplication and rearrangement have been involved in the formation of the Brassica genomes from a smaller ancestral genome.

  8. Nonhomologous recombination between the large unassigned region of the male and female mitochondrial genomes in the mussel, Mytilus trossulus.

    PubMed

    Rawson, Paul D

    2005-12-01

    Doubly uniparental inheritance of mtDNA (DUI) is commonly observed in several genera of bivalves. Under DUI, female offspring inherit mtDNA from their mothers, while male offspring inherit mtDNA from both parents but preferentially transmit the paternally inherited mtDNA to their sons. Several studies have shown that the female- and male-specific mtDNA lineages in blue mussels, Mytilus spp., vary by upward of 20% at the nucleotide level. In addition to high levels of nucleotide substitution, the present study observed substantial gender-based length polymorphism in the presumptive mitochondrial control region (=large unassigned region; LUR) of North American M. trossulus. In this species, female lineage LUR haplotypes are over 2 kb larger than male lineage LUR haplotypes. Analysis of sequence data for these length variants indicates that the F LUR haplotypes of North American M. trossulus contain sequences similar to the F lineage control region in the congeners M. edulis and M. galloprovincialis. Relative to the F LUR in the latter two species, however, the F lineage LUR haplotypes in M. trossulus contain two large sequence insertions, each nearly 1 kb in size. One of these insertions has high sequence similarity to the male lineage LUR of M. trossulus. The tandem arrangement of F and M control region sequences in the F lineage LUR of M. trossulus is most likely the result of nonhomologous recombination between the male and the female mitochondrial genomes in M. trossulus, a finding that has important implications regarding the transmission and evolution of blue mussel mitochondrial genomes.

  9. Extensive Capsule Locus Variation and Large-Scale Genomic Recombination within the Klebsiella pneumoniae Clonal Group 258

    PubMed Central

    Wyres, Kelly L.; Gorrie, Claire; Edwards, David J.; Wertheim, Heiman F.L.; Hsu, Li Yang; Van Kinh, Nguyen; Zadoks, Ruth; Baker, Stephen; Holt, Kathryn E.

    2015-01-01

    Klebsiella pneumoniae clonal group (CG) 258, comprising sequence types (STs) 258, 11, and closely related variants, is associated with dissemination of the K. pneumoniae carbapenemase (KPC). Hospital outbreaks of KPC CG258 infections have been observed globally and are very difficult to treat. As a consequence, there is renewed interest in alternative infection control measures such as vaccines and phage or depolymerase treatments targeting the K. pneumoniae polysaccharide capsule. To date, 78 immunologically distinct capsule variants have been described in K. pneumoniae. Previous investigations of ST258 and a small number of closely related strains suggested that capsular variation was limited within this clone; only two distinct ST258 capsule polysaccharide synthesis (cps) loci have been identified, both acquired through large-scale recombination events (>50 kb). In contrast to previous studies, we report a comparative genomic analysis of the broader K. pneumoniae CG258 (n = 39). We identified 11 different cps loci within CG258, indicating that capsular switching is actually common within the complex. We observed several insertion sequences (IS) within the cps loci, and show further intraclone diversification of two cps loci through IS activity. Our data also indicate that several large-scale recombination events have shaped the genomes of CG258, and that definition of the complex should be broadened to include ST395 (also reported to harbor KPC). As only the second report of extensive intraclonal cps variation among Gram-negative bacterial species, our findings alter our understanding of the evolution of these organisms and have key implications for the design of control measures targeting K. pneumoniae capsules. PMID:25861820

  10. Extensive Capsule Locus Variation and Large-Scale Genomic Recombination within the Klebsiella pneumoniae Clonal Group 258.

    PubMed

    Wyres, Kelly L; Gorrie, Claire; Edwards, David J; Wertheim, Heiman F L; Hsu, Li Yang; Van Kinh, Nguyen; Zadoks, Ruth; Baker, Stephen; Holt, Kathryn E

    2015-05-01

    Klebsiella pneumoniae clonal group (CG) 258, comprising sequence types (STs) 258, 11, and closely related variants, is associated with dissemination of the K. pneumoniae carbapenemase (KPC). Hospital outbreaks of KPC CG258 infections have been observed globally and are very difficult to treat. As a consequence, there is renewed interest in alternative infection control measures such as vaccines and phage or depolymerase treatments targeting the K. pneumoniae polysaccharide capsule. To date, 78 immunologically distinct capsule variants have been described in K. pneumoniae. Previous investigations of ST258 and a small number of closely related strains suggested that capsular variation was limited within this clone; only two distinct ST258 capsule polysaccharide synthesis (cps) loci have been identified, both acquired through large-scale recombination events (>50 kb). In contrast to previous studies, we report a comparative genomic analysis of the broader K. pneumoniae CG258 (n = 39). We identified 11 different cps loci within CG258, indicating that capsular switching is actually common within the complex. We observed several insertion sequences (IS) within the cps loci, and show further intraclone diversification of two cps loci through IS activity. Our data also indicate that several large-scale recombination events have shaped the genomes of CG258, and that definition of the complex should be broadened to include ST395 (also reported to harbor KPC). As only the second report of extensive intraclonal cps variation among Gram-negative bacterial species, our findings alter our understanding of the evolution of these organisms and have key implications for the design of control measures targeting K. pneumoniae capsules. PMID:25861820

  11. Genome-wide association study for the level of serum electrolytes in Italian Large White pigs.

    PubMed

    Bovo, S; Schiavo, G; Mazzoni, G; Dall'Olio, S; Galimberti, G; Calò, D G; Scotti, E; Bertolini, F; Buttazzoni, L; Samorè, A B; Fontanesi, L

    2016-10-01

    Calcium, magnesium and phosphorus are essential electrolytes involved in a large number of biological processes. Imbalance of these minerals in blood may indicate clinically relevant conditions and is important in inferring acute or chronic pathologies in humans and animals. In this work, we carried out a genome-wide association study (GWAS) for the level of these three electrolytes in the serum of 843 performance-tested Italian Large White pigs. All pigs were genotyped with the Illumina PorcineSNP60 BeadChip, and GWAS was carried out using genome-wide efficient mixed-model association. For the level of Ca2+, eight single nucleotide polymorphisms (SNPs) were significant, considering a false discovery rate (FDR) < 0.05, and another eight were above the moderate association threshold (nominal P value < 5.00E-05). These SNPs are distributed in four porcine chromosomes (SSC): SSC8, SSC11, SSC12 and SSC13. In particular, a few putative different signals of association detected on SSC13 and one on SSC12 were in genes or close to genes involved in calcium metabolism (P2RY1, RAP2B, SLC9A9, C3orf58, TSC22D2, PLCH1 and CACNB1). Only one SNP (on SSC7) and six SNPs (on SSC2 and SSC7) showed moderate association with the level of magnesium and phosphorus respectively. The association signals for these two latter minerals might identify genes not known thus far for playing a role in their biological functions and regulations. In conclusion, our GWAS contributed to increased knowledge on the role that calcium, magnesium and phosphorus may play in the genetically determined physiological mechanisms affecting the natural variability of mineral levels in mammalian blood.
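
    The record reports significance at a false discovery rate (FDR) below 0.05 but does not say which FDR procedure was used. As an illustration only, the sketch below applies the widely used Benjamini-Hochberg step-up rule; the choice of that procedure, the function name `benjamini_hochberg`, and the toy P values are all assumptions and do not come from the study.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Flag tests declared significant by the Benjamini-Hochberg step-up rule.

    Assumed procedure: the record does not state how its FDR was controlled.
    """
    m = len(pvalues)
    # Indices of the tests, ordered from smallest to largest P value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose P value satisfies p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant

# Toy P values (hypothetical), one per SNP.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.60]
print(benjamini_hochberg(toy_p))
```

    With five tests the per-rank thresholds are 0.01, 0.02, 0.03, 0.04 and 0.05, so only the two smallest toy P values pass.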

  12. Genome-wide association study for the level of serum electrolytes in Italian Large White pigs.

    PubMed

    Bovo, S; Schiavo, G; Mazzoni, G; Dall'Olio, S; Galimberti, G; Calò, D G; Scotti, E; Bertolini, F; Buttazzoni, L; Samorè, A B; Fontanesi, L

    2016-10-01

    Calcium, magnesium and phosphorus are essential electrolytes involved in a large number of biological processes. Imbalance of these minerals in blood may indicate clinically relevant conditions and is important in inferring acute or chronic pathologies in humans and animals. In this work, we carried out a genome-wide association study (GWAS) for the level of these three electrolytes in the serum of 843 performance-tested Italian Large White pigs. All pigs were genotyped with the Illumina PorcineSNP60 BeadChip, and GWAS was carried out using genome-wide efficient mixed-model association. For the level of Ca2+, eight single nucleotide polymorphisms (SNPs) were significant, considering a false discovery rate (FDR) < 0.05, and another eight were above the moderate association threshold (nominal P value < 5.00E-05). These SNPs are distributed in four porcine chromosomes (SSC): SSC8, SSC11, SSC12 and SSC13. In particular, a few putative different signals of association detected on SSC13 and one on SSC12 were in genes or close to genes involved in calcium metabolism (P2RY1, RAP2B, SLC9A9, C3orf58, TSC22D2, PLCH1 and CACNB1). Only one SNP (on SSC7) and six SNPs (on SSC2 and SSC7) showed moderate association with the level of magnesium and phosphorus respectively. The association signals for these two latter minerals might identify genes not known thus far for playing a role in their biological functions and regulations. In conclusion, our GWAS contributed to increased knowledge on the role that calcium, magnesium and phosphorus may play in the genetically determined physiological mechanisms affecting the natural variability of mineral levels in mammalian blood. PMID:27296164

  13. A large-scale genomic approach affords unprecedented resolution for the molecular epidemiology and evolutionary history of contagious caprine pleuropneumonia.

    PubMed

    Dupuy, Virginie; Verdier, Axel; Thiaucourt, François; Manso-Silván, Lucía

    2015-01-01

    Contagious caprine pleuropneumonia (CCPP), caused by Mycoplasma capricolum subsp. capripneumoniae (Mccp), is a devastating disease of domestic goats and of some wild ungulate species. The disease is currently spreading in Africa and Asia and poses a serious threat to disease-free areas. A comprehensive view of the evolutionary history and dynamics of Mccp is essential to understand the epidemiology of CCPP. Yet, analysing the diversity of genetically monomorphic pathogens, such as Mccp, is complicated due to their low variability. In this study, the molecular epidemiology and evolution of CCPP was investigated using a large-scale genomic approach based on next-generation sequencing technologies, applied to a sample of strains representing the global distribution of this disease. A highly discriminatory multigene typing system was developed, allowing the differentiation of 24 haplotypes among 25 Mccp strains distributed in six genotyping groups, which showed some correlation with geographic origin. A Bayesian approach was used to infer the first robust phylogeny of the species and to date the principal events of its evolutionary history. The emergence of Mccp was estimated only at about 270 years ago, which explains the low genetic diversity of this species despite its high mutation rate, evaluated at 1.3 × 10^-6 substitutions per site per year. Finally, plausible scenarios were proposed to elucidate the evolution and dynamics of CCPP in Asia and Africa, though limited by the paucity of Mccp strains, particularly in Asia. This study shows how combining large-scale genomic data with spatial and temporal data makes it possible to obtain a comprehensive view of the epidemiology of CCPP, a precondition for the development of improved disease surveillance and control measures. PMID:26149260

  14. A genome-wide scan for common genetic variants with a large influence on warfarin maintenance dose

    PubMed Central

    Cooper, Gregory M.; Johnson, Julie A.; Langaee, Taimour Y.; Feng, Hua; Stanaway, Ian B.; Schwarz, Ute I.; Ritchie, Marylyn D.; Stein, C. Michael; Roden, Dan M.; Smith, Joshua D.; Veenstra, David L.; Rettie, Allan E.

    2008-01-01

    Warfarin dosing is correlated with polymorphisms in vitamin K epoxide reductase complex 1 (VKORC1) and the cytochrome P450 2C9 (CYP2C9) genes. Recently, the FDA revised warfarin labeling to raise physician awareness about these genetic effects. Randomized clinical trials are underway to test genetically based dosing algorithms. It is thus important to determine whether common single nucleotide polymorphisms (SNPs) in other gene(s) have a large effect on warfarin dosing. A retrospective genome-wide association study was designed to identify polymorphisms that could explain a large fraction of the dose variance. White patients from an index warfarin population (n = 181) and 2 independent replication patient populations (n = 374) were studied. From the approximately 550 000 polymorphisms tested, the most significant independent effect was associated with VKORC1 polymorphisms (P = 6.2 × 10^-13) in the index patients. CYP2C9 (rs1057910 (CYP2C9*3) and rs4917639) was associated with dose at moderate significance levels (P ∼ 10^-4). Replication polymorphisms (355 SNPs) from the index study did not show any significant effects in the replication patient sets. We conclude that common SNPs with large effects on warfarin dose are unlikely to be discovered outside of the CYP2C9 and VKORC1 genes. Randomized clinical trials that account for these 2 genes should therefore produce results that are definitive and broadly applicable. PMID:18535201

  15. Excavating the Genome: Large-Scale Mutagenesis Screening for the Discovery of New Mouse Models.

    PubMed

    Sundberg, John P; Dadras, Soheil S; Silva, Kathleen A; Kennedy, Victoria E; Murray, Stephen A; Denegre, James M; Schofield, Paul N; King, Lloyd E; Wiles, Michael V; Pratt, C Herbert

    2015-11-01

    Technology now exists for rapid screening of mutated laboratory mice to identify phenotypes associated with specific genetic mutations. Large repositories exist for spontaneous mutants and those induced by chemical mutagenesis, many of which have never been fully studied or comprehensively evaluated. To supplement these resources, a variety of techniques have been consolidated in an international effort to create mutations in all known protein coding genes in the mouse. With targeted embryonic stem cell lines now available for almost all protein coding genes and more recently CRISPR/Cas9 technology, large-scale efforts are underway to create further novel mutant mouse strains and to characterize their phenotypes. However, accurate diagnosis of skin, hair, and nail diseases still relies on careful gross and histological analysis, and while not automated to the level of the physiological phenotyping, histopathology still provides the most direct and accurate diagnosis and correlation with human diseases. As a result of these efforts, many new mouse dermatological disease models are being characterized and developed. PMID:26551941

  16. Advances in computer simulation of genome evolution: toward more realistic evolutionary genomics analysis by approximate Bayesian computation.

    PubMed

    Arenas, Miguel

    2015-04-01

    NGS technologies enable fast and cheap generation of genomic data. Nevertheless, ancestral genome inference is not straightforward because of the complex evolutionary processes acting on this material, such as inversions, translocations, and other genome rearrangements that, in addition to their implicit complexity, can co-occur and confound ancestral inferences. Models of genome evolution that accommodate such complex genomic events have recently begun to emerge. This letter explores these novel evolutionary models and proposes their incorporation into robust statistical approaches based on computer simulations, such as approximate Bayesian computation, that may produce a more realistic evolutionary analysis of genomic data. Advantages and pitfalls in using these analytical methods are discussed. Potential applications of these ancestral genomic inferences are also pointed out.
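
    The simulation-based idea behind approximate Bayesian computation can be illustrated with a minimal rejection-sampling sketch. This is not the method of the letter above: the toy model (independent per-site mutation), the uniform prior, the observed count, and every function name are assumptions chosen only to show the draw, simulate, compare, accept loop.

```python
import random


def simulate_mutated_sites(rate, n_sites, rng):
    """Toy simulator (assumed model): each site mutates independently with probability `rate`."""
    return sum(rng.random() < rate for _ in range(n_sites))


def abc_rejection(observed, n_sites, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary statistic lands within `tolerance` of the data."""
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.0, 0.2)  # assumed flat prior on the per-site rate
        simulated = simulate_mutated_sites(rate, n_sites, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(rate)
    return accepted


rng = random.Random(0)  # fixed seed so the sketch is reproducible
posterior = abc_rejection(observed=50, n_sites=1000, n_draws=2000, tolerance=5, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} draws, posterior mean rate ~ {posterior_mean:.3f}")
```

    In a realistic analysis the simulator would be a genome-evolution model (for example one including rearrangements) and the summary statistics would be far richer; the accept/reject logic stays the same.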

  17. Mitogenomics and phylogenomics reveal priapulid worms as extant models of the ancestral Ecdysozoan.

    PubMed

    Webster, Bonnie L; Copley, Richard R; Jenner, Ronald A; Mackenzie-Dodds, Jacqueline A; Bourlat, Sarah J; Rota-Stabelli, Omar; Littlewood, D T J; Telford, Maximilian J

    2006-01-01

    Research into arthropod evolution is hampered by the derived nature and rapid evolution of the best-studied out-group: the nematodes. We consider priapulids as an alternative out-group. Priapulids are a small phylum of bottom-dwelling marine worms; their tubular body with spiny proboscis or introvert has changed little over 520 million years and recognizable priapulids are common among exceptionally preserved Cambrian fossils. Using the complete mitochondrial genome and 42 nuclear genes from Priapulus caudatus, we show that priapulids are slowly evolving ecdysozoans; almost all these priapulid genes have evolved more slowly than nematode orthologs and the priapulid mitochondrial gene order may be unchanged since the Cambrian. Considering their primitive bodyplan and embryology and the great conservation of both nuclear and mitochondrial genomes, priapulids may deserve the popular epithet of "living fossil." Their study is likely to yield significant new insights into the early evolution of the Ecdysozoa and the origins of the arthropods and their kin as well as aiding inference of the morphology of ancestral Ecdysozoa and Bilateria and their genomes. PMID:17073934

  18. Reciprocal chromosome painting among human, aardvark, and elephant (superorder Afrotheria) reveals the likely eutherian ancestral karyotype

    PubMed Central

    Yang, F.; Alkalaeva, E. Z.; Perelman, P. L.; Pardini, A. T.; Harrison, W. R.; O'Brien, P. C. M.; Fu, B.; Graphodatsky, A. S.; Ferguson-Smith, M. A.; Robinson, T. J.

    2003-01-01

    The Afrotheria, a supraordinal grouping of mammals whose radiation is rooted in Africa, is strongly supported by DNA sequence data but not by their disparate anatomical features. We have used flow-sorted human, aardvark, and African elephant chromosome painting probes and applied reciprocal painting schemes to representatives of two of the Afrotherian orders, the Tubulidentata (aardvark) and Proboscidea (elephants), in an attempt to shed additional light on the evolutionary affinities of this enigmatic group of mammals. Although we have not yet found any unique cytogenetic signatures that support the monophyly of the Afrotheria, embedded within the aardvark genome we find the strongest evidence yet of a mammalian ancestral karyotype comprising 2n = 44. This karyotype includes nine chromosomes that show complete conserved synteny to those of man, six that show conservation as single chromosome arms or blocks in the human karyotype but that occur on two different chromosomes in the ancestor, and seven neighbor-joining combinations (i.e., the synteny is maintained in the majority of species of the orders studied so far, but which corresponds to two chromosomes in humans). The comparative chromosome maps presented between human and these Afrotherian species provide further insight into mammalian genome organization and comparative genomic data for the Afrotheria, one of the four major evolutionary clades postulated for the Eutheria. PMID:12552116

  19. Ancestral developmental potential facilitates parallel evolution in ants.

    PubMed

    Rajakumar, Rajendhran; San Mauro, Diego; Dijkstra, Michiel B; Huang, Ming H; Wheeler, Diana E; Hiou-Tim, Francois; Khila, Abderrahman; Cournoyea, Michael; Abouheif, Ehab

    2012-01-01

    Complex worker caste systems have contributed to the evolutionary success of advanced ant societies; however, little is known about the developmental processes underlying their origin and evolution. We combined hormonal manipulation, gene expression, and phylogenetic analyses with field observations to understand how novel worker subcastes evolve. We uncovered an ancestral developmental potential to produce a "supersoldier" subcaste that has been actualized at least two times independently in the hyperdiverse ant genus Pheidole. This potential has been retained and can be environmentally induced throughout the genus. Therefore, the retention and induction of this potential have facilitated the parallel evolution of supersoldiers through a process known as genetic accommodation. The recurrent induction of ancestral developmental potential may facilitate the adaptive and parallel evolution of phenotypes.

  20. Inferring ancestral sequences in taxon-rich phylogenies.

    PubMed

    Gascuel, Olivier; Steel, Mike

    2010-10-01

    Statistical consistency in phylogenetics has traditionally referred to the accuracy of estimating phylogenetic parameters for a fixed number of species as we increase the number of characters. However, it is also useful to consider a dual type of statistical consistency where we increase the number of species, rather than characters. This raises some basic questions: what can we learn about the evolutionary process as we increase the number of species? In particular, does having more species allow us to infer the ancestral state of characters accurately? This question is particularly important when sequence evolution varies in a complex way from character to character, as methods applicable for i.i.d. models may no longer be valid. In this paper, we assemble a collection of results to analyse various approaches for inferring ancestral information with increasing accuracy as the number of taxa increases.

  1. deBWT: parallel construction of Burrows–Wheeler Transform for large collection of genomes with de Bruijn-branch encoding

    PubMed Central

    Liu, Bo; Zhu, Dixian; Wang, Yadong

    2016-01-01

    Motivation: With the development of high-throughput sequencing, the number of assembled genomes continues to rise. It is critical to organize and index these assembled genomes well to promote future genomics studies. The Burrows–Wheeler Transform (BWT) is an important data structure for genome indexing with many fundamental applications; however, constructing the BWT for a large collection of genomes remains non-trivial, especially for highly similar or repetitive genomes. Moreover, state-of-the-art approaches do not support scalable parallel computing well owing to their incremental nature, which is a bottleneck when using modern computers to accelerate BWT construction. Results: We propose de Bruijn branch-based BWT constructor (deBWT), a novel parallel BWT construction approach. deBWT represents and organizes the suffixes of the input sequence with a novel data structure, de Bruijn branch encoding. This data structure takes advantage of the de Bruijn graph to facilitate comparison between suffixes with long common prefixes, which breaks the bottleneck of BWT construction for repetitive genomic sequences. Meanwhile, deBWT also uses the structure of the de Bruijn graph to reduce unnecessary comparisons between suffixes. Benchmarking suggests that deBWT is efficient and scalable for constructing the BWT of large datasets by parallel computing. It is well suited to indexing many genomes, such as a collection of individual human genomes, with multiple-core servers or clusters. Availability and implementation: deBWT is implemented in C; the source code is available at https://github.com/hitbc/deBWT or https://github.com/DixianZhu/deBWT. Contact: ydwang@hit.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307614
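
    For readers unfamiliar with the transform itself, the following is a minimal sketch of the textbook definition of the BWT via sorted rotations, together with its inverse. It is not the deBWT algorithm and uses no de Bruijn branch encoding; the function names and the `$` sentinel are illustrative assumptions, and the quadratic memory use makes it suitable only for short strings.

```python
def bwt(text, sentinel="$"):
    """Naive BWT: sort all rotations of text+sentinel and read off the last column."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, sentinel="$"):
    """Invert the BWT by repeatedly prepending the transform column and re-sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(transformed[i] + table[i] for i in range(len(transformed)))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


print(bwt("banana"))             # annb$aa
print(inverse_bwt(bwt("GATTACAGATTACA")))
```

    Sorting whole rotations compares long shared prefixes again and again on repetitive input, which is the cost that specialised constructors for large genome collections aim to avoid.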

  2. Bilingualism (Ancestral Language Maintenance) among Native American, Vietnamese American, and Hispanic American College Students.

    ERIC Educational Resources Information Center

    Wharry, Cheryl

    1993-01-01

    A survey of 21 Hispanic, 22 Native American, and 10 Vietnamese American college students found that adoption or maintenance of ancestral language was related to attitudes toward ancestral language, beliefs about parental attitudes, and integrative motivation (toward family and ancestral ethnic group). There were significant differences by gender…

  3. State of cat genomics.

    PubMed

    O'Brien, Stephen J; Johnson, Warren; Driscoll, Carlos; Pontius, Joan; Pecon-Slattery, Jill; Menotti-Raymond, Marilyn

    2008-06-01

    Our knowledge of cat family biology was recently expanded to include a genomics perspective with the completion of a draft whole genome sequence of an Abyssinian cat. The utility of the new genome information has been demonstrated by applications ranging from disease gene discovery and comparative genomics to species conservation. Patterns of genomic organization among cats and inbred domestic cat breeds have illuminated our view of domestication, revealing linkage disequilibrium tracks consequent of breed formation, defining chromosome exchanges that punctuated major lineages of mammals and suggesting ancestral continental migration events that led to 37 modern species of Felidae. We review these recent advances here. As the genome resources develop, the cat is poised to make a major contribution to many areas in genetics and biology.

  4. Amplification of an ancestral mammalian L1 family of long interspersed repeated DNA occurred just before the murine radiation

    SciTech Connect

    Pascale, E.; Valle, E.; Furano, A.V.

    1990-12-01

    Each mammalian genus examined so far contains 50,000-100,000 members of an L1 (LINE 1) family of long interspersed repeated DNA elements. Current knowledge on the evolution of L1 families presents a paradox because, although L1 families have been in mammalian genomes since before the mammalian radiation approximately 80 million years ago, most members of the L1 families are only a few million years old. Accordingly, it has been suggested either that the extensive amplification that characterizes present-day L1 families did not occur in the past or that old members were removed as new ones were generated. However, the authors show here that an ancestral rodent L1 family was extensively amplified approximately 10 million years ago and that the relics of this amplification have persisted in modern murine genomes. This amplification occurred just before the divergence of modern murine genera from their common ancestor and identifies the murine node in the lineage of modern muroid rodents. The results suggest that repeated amplification of L1 elements is a feature of the evolution of mammalian genomes and that ancestral amplification events could provide a useful tool for determining mammalian lineages.

  5. Comparative genomics supports a deep evolutionary origin for the large, four-module transcriptional mediator complex.

    PubMed

    Bourbon, Henri-Marc

    2008-07-01

    The multisubunit Mediator (MED) complex bridges DNA-bound transcriptional regulators to the RNA polymerase II (PolII) initiation machinery. In yeast, the 25 MED subunits are distributed within three core subcomplexes and a separable kinase module composed of Med12, Med13 and the Cdk8-CycC pair thought to control the reversible interaction between MED and PolII by phosphorylating repeated heptapeptides within the Rpb1 carboxyl-terminal domain (CTD). Here, MED conservation has been investigated across the eukaryotic kingdom. Saccharomyces cerevisiae Med2, Med3/Pgd1 and Med5/Nut1 subunits are apparent homologs of metazoan Med29/Intersex, Med27/Crsp34 and Med24/Trap100, respectively, and these and 30 other identified human MED subunits have detectable counterparts in the amoeba Dictyostelium discoideum, indicating that none is specific to metazoans. Indeed, animal/fungal subunits are also conserved in plants, green and red algae, entamoebids, oomycetes, diatoms, apicomplexans, ciliates and the 'deep-branching' protists Trichomonas vaginalis and Giardia lamblia. Surprisingly, although lacking CTD heptads, T. vaginalis displays 44 MED subunit homologs, including several CycC, Med12 and Med13 paralogs. Such observations have allowed the identification of a conserved 17-subunit framework around which peripheral subunits may be assembled, and support a very ancient eukaryotic origin for a large, four-module MED. The implications of this comprehensive work for MED structure-function relationships are discussed.

  6. Genomic islands of divergence in hybridizing Heliconius butterflies identified by large-scale targeted sequencing

    PubMed Central

    Nadeau, Nicola J.; Whibley, Annabel; Jones, Robert T.; Davey, John W.; Dasmahapatra, Kanchon K.; Baxter, Simon W.; Quail, Michael A.; Joron, Mathieu; ffrench-Constant, Richard H.; Blaxter, Mark L.; Mallet, James; Jiggins, Chris D.

    2012-01-01

    Heliconius butterflies represent a recent radiation of species, in which wing pattern divergence has been implicated in speciation. Several loci that control wing pattern phenotypes have been mapped and two were identified through sequencing. These same gene regions play a role in adaptation across the whole Heliconius radiation. Previous studies of population genetic patterns at these regions have sequenced small amplicons. Here, we use targeted next-generation sequence capture to survey patterns of divergence across these entire regions in divergent geographical races and species of Heliconius. This technique was successful both within and between species for obtaining high coverage of almost all coding regions and sufficient coverage of non-coding regions to perform population genetic analyses. We find major peaks of elevated population differentiation between races across hybrid zones, which indicate regions under strong divergent selection. These ‘islands’ of divergence appear to be more extensive between closely related species, but there is less clear evidence for such islands between more distantly related species at two further points along the ‘speciation continuum’. We also sequence fosmid clones across these regions in different Heliconius melpomene races. We find no major structural rearrangements but many relatively large (greater than 1 kb) insertion/deletion events (including gain/loss of transposable elements) that are variable between races. PMID:22201164

  7. Ancestral facial morphology of Old World higher primates.

    PubMed Central

    Benefit, B R; McCrossin, M L

    1991-01-01

    Fossil remains of the cercopithecoid Victoriapithecus recently recovered from middle Miocene deposits of Maboko Island (Kenya) provide evidence of the cranial anatomy of Old World monkeys prior to the evolutionary divergence of the extant subfamilies Colobinae and Cercopithecinae. Victoriapithecus shares a suite of craniofacial features with the Oligocene catarrhine Aegyptopithecus and early Miocene hominoid Afropithecus. All three genera manifest supraorbital costae, anteriorly convergent temporal lines, the absence of a postglabellar fossa, a moderate to long snout, great facial height below the orbits, a deep cheek region, and anteriorly tapering premaxilla. The shared presence of these features in a catarrhine generally ancestral to apes and Old World monkeys, an early ape, and an early Old World monkey indicates that they are primitive characteristics that typified the last common ancestor of Hominoidea and Cercopithecoidea. These results contradict prevailing cranial morphotype reconstructions for ancestral catarrhines as Colobus- or Hylobates-like, characterized by a globular anterior braincase and orthognathy. By resolving several equivocal craniofacial morphocline polarities, these discoveries lay the foundation for a revised interpretation of the ancestral cranial morphology of Catarrhini more consistent with neontological and existing paleontological evidence. PMID:2052606

  8. An ancestral bacterial division system is widespread in eukaryotic mitochondria.

    PubMed

    Leger, Michelle M; Petrů, Markéta; Žárský, Vojtěch; Eme, Laura; Vlček, Čestmír; Harding, Tommy; Lang, B Franz; Eliáš, Marek; Doležal, Pavel; Roger, Andrew J

    2015-08-18

    Bacterial division initiates at the site of a contractile Z-ring composed of polymerized FtsZ. The location of the Z-ring in the cell is controlled by a system of three mutually antagonistic proteins, MinC, MinD, and MinE. Plastid division is also known to be dependent on homologs of these proteins, derived from the ancestral cyanobacterial endosymbiont that gave rise to plastids. In contrast, the mitochondria of model systems such as Saccharomyces cerevisiae, mammals, and Arabidopsis thaliana seem to have replaced the ancestral α-proteobacterial Min-based division machinery with host-derived dynamin-related proteins that form outer contractile rings. Here, we show that the mitochondrial division system of these model organisms is the exception, rather than the rule, for eukaryotes. We describe endosymbiont-derived, bacterial-like division systems comprising FtsZ and Min proteins in diverse less-studied eukaryote protistan lineages, including jakobid and heterolobosean excavates, a malawimonad, stramenopiles, amoebozoans, a breviate, and an apusomonad. For two of these taxa, the amoebozoan Dictyostelium purpureum and the jakobid Andalucia incarcerata, we confirm a mitochondrial localization of these proteins by their heterologous expression in Saccharomyces cerevisiae. The discovery of a proteobacterial-like division system in mitochondria of diverse eukaryotic lineages suggests that it was the ancestral feature of all eukaryotic mitochondria and has been supplanted by a host-derived system multiple times in distinct eukaryote lineages.

  9. The ancestral eutherian karyotype is present in Xenarthra.

    PubMed

    Svartman, Marta; Stone, Gary; Stanyon, Roscoe

    2006-07-01

    Molecular studies have led recently to the proposal of a new super-ordinal arrangement of the 18 extant Eutherian orders. From the four proposed super-orders, Afrotheria and Xenarthra were considered the most basal. Chromosome-painting studies with human probes in these two mammalian groups are thus key in the quest to establish the ancestral Eutherian karyotype. Although a reasonable amount of chromosome-painting data with human probes have already been obtained for Afrotheria, no Xenarthra species has been thoroughly analyzed with this approach. We hybridized human chromosome probes to metaphases of species (Dasypus novemcinctus, Tamandua tetradactyla, and Choloepus hoffmanii) representing three of the four Xenarthra families. Our data allowed us to review the current hypotheses for the ancestral Eutherian karyotype, which range from 2n = 44 to 2n = 48. One of the species studied, the two-toed sloth C. hoffmanii (2n = 50), showed a chromosome complement strikingly similar to the proposed 2n = 48 ancestral Eutherian karyotype, strongly reinforcing it.

  10. Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data

    PubMed Central

    Bhaskar, Anand; Wang, Y.X. Rachel; Song, Yun S.

    2015-01-01

    With the recent increase in study sample sizes in human genetics, there has been growing interest in inferring historical population demography from genomic variation data. Here, we present an efficient inference method that can scale up to very large samples, with tens or hundreds of thousands of individuals. Specifically, by utilizing analytic results on the expected frequency spectrum under the coalescent and by leveraging the technique of automatic differentiation, which allows us to compute gradients exactly, we develop a very efficient algorithm to infer piecewise-exponential models of the historical effective population size from the distribution of sample allele frequencies. Our method is orders of magnitude faster than previous demographic inference methods based on the frequency spectrum. In addition to inferring demography, our method can also accurately estimate locus-specific mutation rates. We perform extensive validation of our method on simulated data and show that it can accurately infer multiple recent epochs of rapid exponential growth, a signal that is difficult to pick up with small sample sizes. Lastly, we use our method to analyze data from recent sequencing studies, including a large-sample exome-sequencing data set of tens of thousands of individuals assayed at a few hundred genic regions. PMID:25564017

  11. Genome-wide association study identifies a variant in HDAC9 associated with large vessel ischemic stroke

    PubMed Central

    2012-01-01

    Genetic factors have been implicated in stroke risk but few replicated associations have been reported. We conducted a genome-wide association study (GWAS) in ischemic stroke and its subtypes in 3,548 cases and 5,972 controls, all of European ancestry. Replication of potential signals was performed in 5,859 cases and 6,281 controls. We replicated reported associations between variants close to PITX2 and ZFHX3 with cardioembolic stroke, and a 9p21 locus with large vessel stroke. We identified a novel association for a SNP within the histone deacetylase 9 (HDAC9) gene on chromosome 7p21.1 which was associated with large vessel stroke including additional replication in a further 735 cases and 28,583 controls (rs11984041, combined P = 1.87 × 10⁻¹¹, OR = 1.42, 95% CI 1.28-1.57). All four loci exhibit evidence for heterogeneity of effect across the stroke subtypes, with some, and possibly all, affecting risk for only one subtype. This suggests differing genetic architectures for different stroke subtypes. PMID:22306652
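
    As a rough consistency check on the figures quoted above, an odds ratio and its 95% confidence interval can be converted back into an approximate z-score and two-sided P value. The sketch below assumes a Wald-type interval that is symmetric on the log-odds scale, which the abstract does not state; the function name is hypothetical, and because the published OR and CI are rounded the result can only be expected to match the reported P value in order of magnitude.

```python
import math


def z_and_p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Back out the log-OR standard error, z-score and two-sided P value.

    Assumes the interval is log(OR) +/- z_crit * SE (a Wald interval).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return se, z, p


# Values quoted in the abstract for rs11984041
se, z, p = z_and_p_from_or_ci(1.42, 1.28, 1.57)
print(f"SE(log OR) = {se:.4f}, z = {z:.2f}, P = {p:.2e}")
```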

  12. Large-Scale Genome-Wide Association Studies and Meta-Analyses of Longitudinal Change in Adult Lung Function

    PubMed Central

    Tang, Wenbo; Kowgier, Matthew; Loth, Daan W.; Soler Artigas, María; Joubert, Bonnie R.; Hodge, Emily; Gharib, Sina A.; Smith, Albert V.; Ruczinski, Ingo; Gudnason, Vilmundur; Mathias, Rasika A.; Harris, Tamara B.; Hansel, Nadia N.; Launer, Lenore J.; Barnes, Kathleen C.; Hansen, Joyanna G.; Albrecht, Eva; Aldrich, Melinda C.; Allerhand, Michael; Barr, R. Graham; Brusselle, Guy G.; Couper, David J.; Curjuric, Ivan; Davies, Gail; Deary, Ian J.; Dupuis, Josée; Fall, Tove; Foy, Millennia; Franceschini, Nora; Gao, Wei; Gläser, Sven; Gu, Xiangjun; Hancock, Dana B.; Heinrich, Joachim; Hofman, Albert; Imboden, Medea; Ingelsson, Erik; James, Alan; Karrasch, Stefan; Koch, Beate; Kritchevsky, Stephen B.; Kumar, Ashish; Lahousse, Lies; Li, Guo; Lind, Lars; Lindgren, Cecilia; Liu, Yongmei; Lohman, Kurt; Lumley, Thomas; McArdle, Wendy L.; Meibohm, Bernd; Morris, Andrew P.; Morrison, Alanna C.; Musk, Bill; North, Kari E.; Palmer, Lyle J.; Probst-Hensch, Nicole M.; Psaty, Bruce M.; Rivadeneira, Fernando; Rotter, Jerome I.; Schulz, Holger; Smith, Lewis J.; Sood, Akshay; Starr, John M.; Strachan, David P.; Teumer, Alexander; Uitterlinden, André G.; Völzke, Henry; Voorman, Arend; Wain, Louise V.; Wells, Martin T.; Wilk, Jemma B.; Williams, O. Dale; Heckbert, Susan R.; Stricker, Bruno H.; London, Stephanie J.; Fornage, Myriam; Tobin, Martin D.; O'Connor, George T.; Hall, Ian P.; Cassano, Patricia A.

    2014-01-01

    Background: Genome-wide association studies (GWAS) have identified numerous loci influencing cross-sectional lung function, but less is known about genes influencing longitudinal change in lung function. Methods: We performed GWAS of the rate of change in forced expiratory volume in the first second (FEV1) in 14 longitudinal, population-based cohort studies comprising 27,249 adults of European ancestry using linear mixed effects models and combined cohort-specific results using fixed effect meta-analysis to identify novel genetic loci associated with longitudinal change in lung function. Gene expression analyses were subsequently performed for identified genetic loci. As a secondary aim, we estimated the mean rate of decline in FEV1 by smoking pattern, irrespective of genotypes, across these 14 studies using meta-analysis. Results: The overall meta-analysis produced suggestive evidence for association at the novel IL16/STARD5/TMC3 locus on chromosome 15 (P = 5.71 × 10⁻⁷). In addition, meta-analysis using the five cohorts with ≥3 FEV1 measurements per participant identified the novel ME3 locus on chromosome 11 (P = 2.18 × 10⁻⁸) at genome-wide significance. Neither locus was associated with FEV1 decline in two additional cohort studies. We confirmed gene expression of IL16, STARD5, and ME3 in multiple lung tissues. Publicly available microarray data confirmed differential expression of all three genes in lung samples from COPD patients compared with controls. Irrespective of genotypes, the combined estimate for FEV1 decline was 26.9, 29.2 and 35.7 mL/year in never, former, and persistent smokers, respectively. Conclusions: In this large-scale GWAS, we identified two novel genetic loci in association with the rate of change in FEV1 that harbor candidate genes with biologically plausible functional links to lung function. PMID:24983941

  13. "Black Holes" and Bacterial Pathogenicity: A Large Genomic Deletion that Enhances the Virulence of Shigella spp. and Enteroinvasive Escherichia coli

    NASA Astrophysics Data System (ADS)

    Maurelli, Anthony T.; Fernandez, Reinaldo E.; Bloch, Craig A.; Rode, Christopher K.; Fasano, Alessio

    1998-03-01

    Plasmids, bacteriophages, and pathogenicity islands are genomic additions that contribute to the evolution of bacterial pathogens. For example, Shigella spp., the causative agents of bacillary dysentery, differ from the closely related commensal Escherichia coli in the presence of a plasmid in Shigella that encodes virulence functions. However, pathogenic bacteria also may lack properties that are characteristic of nonpathogens. Lysine decarboxylase (LDC) activity is present in ≈90% of E. coli strains but is uniformly absent in Shigella strains. When the gene for LDC, cadA, was introduced into Shigella flexneri 2a, virulence became attenuated, and enterotoxin activity was inhibited greatly. The enterotoxin inhibitor was identified as cadaverine, a product of the reaction catalyzed by LDC. Comparison of the S. flexneri 2a and laboratory E. coli K-12 genomes in the region of cadA revealed a large deletion in Shigella. Representative strains of Shigella spp. and enteroinvasive E. coli displayed similar deletions of cadA. Our results suggest that, as Shigella spp. evolved from E. coli to become pathogens, they not only acquired virulence genes on a plasmid but also shed genes via deletions. The formation of these "black holes," deletions of genes that are detrimental to a pathogenic lifestyle, provides an evolutionary pathway that enables a pathogen to enhance virulence. Furthermore, the demonstration that cadaverine can inhibit enterotoxin activity may lead to more general models about toxin activity or entry into cells and suggests an avenue for antitoxin therapy. Thus, understanding the role of black holes in pathogen evolution may yield clues to new treatments of infectious diseases.

  14. "Black holes" and bacterial pathogenicity: a large genomic deletion that enhances the virulence of Shigella spp. and enteroinvasive Escherichia coli.

    PubMed

    Maurelli, A T; Fernández, R E; Bloch, C A; Rode, C K; Fasano, A

    1998-03-31

    Plasmids, bacteriophages, and pathogenicity islands are genomic additions that contribute to the evolution of bacterial pathogens. For example, Shigella spp., the causative agents of bacillary dysentery, differ from the closely related commensal Escherichia coli in the presence of a plasmid in Shigella that encodes virulence functions. However, pathogenic bacteria also may lack properties that are characteristic of nonpathogens. Lysine decarboxylase (LDC) activity is present in approximately 90% of E. coli strains but is uniformly absent in Shigella strains. When the gene for LDC, cadA, was introduced into Shigella flexneri 2a, virulence became attenuated, and enterotoxin activity was inhibited greatly. The enterotoxin inhibitor was identified as cadaverine, a product of the reaction catalyzed by LDC. Comparison of the S. flexneri 2a and laboratory E. coli K-12 genomes in the region of cadA revealed a large deletion in Shigella. Representative strains of Shigella spp. and enteroinvasive E. coli displayed similar deletions of cadA. Our results suggest that, as Shigella spp. evolved from E. coli to become pathogens, they not only acquired virulence genes on a plasmid but also shed genes via deletions. The formation of these "black holes," deletions of genes that are detrimental to a pathogenic lifestyle, provides an evolutionary pathway that enables a pathogen to enhance virulence. Furthermore, the demonstration that cadaverine can inhibit enterotoxin activity may lead to more general models about toxin activity or entry into cells and suggests an avenue for antitoxin therapy. Thus, understanding the role of black holes in pathogen evolution may yield clues to new treatments of infectious diseases.

  15. Comment on Schielzeth et al. (2014): "Genome size variation affects song attractiveness in grasshoppers: Evidence for sexual selection against large genomes".

    PubMed

    Camacho, Juan Pedro M

    2016-06-01

    Schielzeth et al. (2014) concluded that attractive grasshopper singers have significantly smaller genomes thus suggesting a possible role for sexual selection on genome size. Whereas this conclusion could still be conceivably valid, it is not supported by the data presented due to some technical flaws. In addition, the interpretation of the results, speculating on the possible presence of B chromosomes, is not justified. PMID:27327141

  16. The Historical Speciation of Mauremys Sensu Lato: Ancestral Area Reconstruction and Interspecific Gene Flow Level Assessment Provide New Insights

    PubMed Central

    Zhou, Huaxing; Jiang, Yuan; Nie, Liuwang; Yin, Huazong; Li, Haifeng; Dong, Xianmei; Zhao, Feifei; Zhang, Huanhuan; Pu, Youguang; Huang, Zhenfeng; Song, Jiaolian; Sun, Entao

    2015-01-01

    Mauremys sensu lato was divided into Mauremys, Chinemys, Ocadia, and Annamemys based on earlier morphological research. Phylogenetic research on this group has been controversial because of disagreements regarding taxonomy, and its historical speciation is still poorly understood. In this study, 32 individuals of eight species that are widely distributed in Eurasia were collected. The complete mitochondrial (mt) genomes of 14 individuals of eight species were sequenced. Phylogenetic relationships, interspecific divergence times, and ancestral area reconstructions were explored using mt genome data (10,854 bp). Interspecific gene flow was then assessed using five unlinked polymorphic microsatellite loci. The Bayesian and maximum likelihood analyses revealed a paraphyletic relationship among the four old genera (Mauremys, Annamemys, Chinemys, and Ocadia) and suggested that they should be merged into a single genus (Mauremys). Ancestral area reconstruction and divergence time estimation suggested that Southeast Asia may be the area of origin for the common ancestral species of this genus and that genetic drift may have played a decisive role in species divergence owing to isolation during a glacial age. However, M. japonica may have speciated as a result of the formation of the island of Japan. The detection of extensive gene flow suggested that no vicariance occurred between Asia and Southeast Asia. Inconsistent results between the gene flow assessment and the phylogenetic analysis revealed the hybrid origin of M. mutica (Southeast Asian). Here, ancestral area reconstruction and interspecific gene flow level assessment were used for the first time to explore the species origins and evolution of Mauremys sensu lato, providing new insights into this genus. PMID:26657158

  17. The mitochondrial genome of Frankliniella intonsa: insights into the evolution of mitochondrial genomes at lower taxonomic levels in Thysanoptera.

    PubMed

    Yan, Dankan; Tang, Yunxia; Hu, Min; Liu, Fengquan; Zhang, Dongfang; Fan, Jiaqin

    2014-10-01

    Thrips are an ideal group for studying the evolution of mitochondrial (mt) genomes at the genus and family levels because of independent rearrangements within this order. In this study, the complete mitochondrial DNA (mtDNA) of the flower thrips Frankliniella intonsa was sequenced and annotated. The circular genome is 15,215 bp in length with an A+T content of 75.9%; it contains the typical 37 genes and three putative control regions (CRs). Nucleotide composition is A+T biased, and the majority of the protein-coding genes present opposite CG skew, which is reflected in the nucleotide composition and in codon and amino acid usage. Although the known thrips have massive gene rearrangements, this genome showed no reversal of strand asymmetry. Gene rearrangements have been found at the lower taxonomic levels of thrips. Three tRNA genes were translocated in the genus Frankliniella and eight tRNA genes in the family Thripidae. Although the gene arrangements of the mt genomes of all three thrips species differ massively from that of the ancestral insect, they are all very similar to each other, indicating that a large rearrangement occurred before the most recent common ancestor of these three species and that very little genomic evolution or rearrangement has occurred since. The extremely similar sequences among the CRs suggest that they are undergoing concerted evolution. Analyses of the sequences upstream and downstream of the CRs reveal that CR2 is actually the ancestral CR. The three CRs are in the same position in each of the three thrips mt genomes, which share identical inverted genes. These characteristics may have been inherited from the most recent common ancestor of these three thrips. The above observations suggest that the mt genomes of the three thrips retain a single massive rearrangement from the common ancestor and have low evolutionary rates among them.

  18. The Gut Fungus Basidiobolus ranarum Has a Large Genome and Different Copy Numbers of Putatively Functionally Redundant Elongation Factor Genes

    PubMed Central

    Henk, Daniel A.; Fisher, Matthew C.

    2012-01-01

    Fungal genomes range in size from 2.3 Mb for the microsporidian Encephalitozoon intestinalis up to 8000 Mb for Entomophaga aulicae, with a mean genome size of 37 Mb. Basidiobolus, a common inhabitant of vertebrate guts, is distantly related to all other fungi, and is unique in possessing both EF-1α and EFL genes. Using DNA sequencing and a quantitative PCR approach, we estimated a haploid genome size for Basidiobolus at 350 Mb. However, based on allelic variation, the nuclear genome is at least diploid, leading us to believe that the final genome size is at least 700 Mb. We also found that EFL was in three times the copy number of its putatively functionally overlapping paralog EF-1α. This suggests that gene or genome duplication may be an important feature of B. ranarum evolution, and also suggests that B. ranarum may have mechanisms in place that favor the preservation of functionally overlapping genes. PMID:22363602

  19. The gut fungus Basidiobolus ranarum has a large genome and different copy numbers of putatively functionally redundant elongation factor genes.

    PubMed

    Henk, Daniel A; Fisher, Matthew C

    2012-01-01

    Fungal genomes range in size from 2.3 Mb for the microsporidian Encephalitozoon intestinalis up to 8000 Mb for Entomophaga aulicae, with a mean genome size of 37 Mb. Basidiobolus, a common inhabitant of vertebrate guts, is distantly related to all other fungi, and is unique in possessing both EF-1α and EFL genes. Using DNA sequencing and a quantitative PCR approach, we estimated a haploid genome size for Basidiobolus at 350 Mb. However, based on allelic variation, the nuclear genome is at least diploid, leading us to believe that the final genome size is at least 700 Mb. We also found that EFL was in three times the copy number of its putatively functionally overlapping paralog EF-1α. This suggests that gene or genome duplication may be an important feature of B. ranarum evolution, and also suggests that B. ranarum may have mechanisms in place that favor the preservation of functionally overlapping genes. PMID:22363602

  20. Implementation of genomic recursions in single-step genomic best linear unbiased predictor for US Holsteins with a large number of genotyped animals.

    PubMed

    Masuda, Y; Misztal, I; Tsuruta, S; Legarra, A; Aguilar, I; Lourenco, D A L; Fragomeni, B O; Lawlor, T J

    2016-03-01

    The objectives of this study were to develop and evaluate an efficient implementation in the computation of the inverse of genomic relationship matrix with the recursion algorithm, called the algorithm for proven and young (APY), in single-step genomic BLUP. We validated genomic predictions for young bulls with more than 500,000 genotyped animals in final score for US Holsteins. Phenotypic data included 11,626,576 final scores on 7,093,380 US Holstein cows, and genotypes were available for 569,404 animals. Daughter deviations for young bulls with no classified daughters in 2009, but at least 30 classified daughters in 2014 were computed using all the phenotypic data. Genomic predictions for the same bulls were calculated with single-step genomic BLUP using phenotypes up to 2009. We calculated the inverse of the genomic relationship matrix GAPY(-1) based on a direct inversion of genomic relationship matrix on a small subset of genotyped animals (core animals) and extended that information to noncore animals by recursion. We tested several sets of core animals including 9,406 bulls with at least 1 classified daughter, 9,406 bulls and 1,052 classified dams of bulls, 9,406 bulls and 7,422 classified cows, and random samples of 5,000 to 30,000 animals. Validation reliability was assessed by the coefficient of determination from regression of daughter deviation on genomic predictions for the predicted young bulls. The reliabilities were 0.39 with 5,000 randomly chosen core animals, 0.45 with the 9,406 bulls, and 7,422 cows as core animals, and 0.44 with the remaining sets. With phenotypes truncated in 2009 and the preconditioned conjugate gradient to solve mixed model equations, the number of rounds to convergence for core animals defined by bulls was 1,343; defined by bulls and cows, 2,066; and defined by 10,000 random animals, at most 1,629. With complete phenotype data, the number of rounds decreased to 858, 1,299, and at most 1,092, respectively. 
Setting up GAPY(-1
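
    The recursion summarized in this abstract is usually written in the APY literature in the block form below. This is a sketch of the general APY formulation, not a formula quoted from the (truncated) abstract; the subscripts c (core) and n (noncore) are notation introduced here for illustration.

```latex
G_{\mathrm{APY}}^{-1} =
\begin{bmatrix} G_{cc}^{-1} & 0 \\ 0 & 0 \end{bmatrix}
+
\begin{bmatrix} -G_{cc}^{-1} G_{cn} \\ I \end{bmatrix}
M_{nn}^{-1}
\begin{bmatrix} -G_{nc} G_{cc}^{-1} & I \end{bmatrix},
\qquad
m_{ii} = g_{ii} - g_{ic}\, G_{cc}^{-1}\, g_{ci}
```

    Here G_cc is the genomic relationship matrix among core animals, G_cn = G_nc' holds the core-by-noncore relationships, and M_nn is a diagonal matrix with one element m_ii per noncore animal i. Only G_cc is inverted directly, which is why the choice and size of the core set matter in the comparisons reported above.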

  1. Accessing complex crop genomes with next-generation sequencing.

    PubMed

    Edwards, David; Batley, Jacqueline; Snowdon, Rod J

    2013-01-01

    Many important crop species have genomes originating from ancestral or recent polyploidisation events. Multiple homoeologous gene copies, chromosomal rearrangements and amplification of repetitive DNA within large and complex crop genomes can considerably complicate genome analysis and gene discovery by conventional, forward genetics approaches. On the other hand, ongoing technological advances in molecular genetics and genomics today offer unprecedented opportunities to analyse and access even more recalcitrant genomes. In this review, we describe next-generation sequencing and data analysis techniques that vastly improve our ability to dissect and mine genomes for causal genes underlying key traits and allelic variation of interest to breeders. We focus primarily on wheat and oilseed rape, two leading examples of major polyploid crop genomes whose size or complexity present different, significant challenges. In both cases, the latest DNA sequencing technologies, applied using quite different approaches, have enabled considerable progress towards unravelling the respective genomes. Our ability to discover the extent and distribution of genetic diversity in crop gene pools, and its relationship to yield and quality-related traits, is swiftly gathering momentum as DNA sequencing and the bioinformatic tools to deal with growing quantities of genomic data continue to develop. In the coming decade, genomic and transcriptomic sequencing, discovery and high-throughput screening of single nucleotide polymorphisms, presence-absence variations and other structural chromosomal variants in diverse germplasm collections will give detailed insight into the origins, domestication and available trait-relevant variation of polyploid crops, in the process facilitating novel approaches and possibilities for genomics-assisted breeding.

  2. Gene deregulation and spatial genome reorganization near breakpoints prior to formation of translocations in anaplastic large cell lymphoma

    PubMed Central

    Mathas, Stephan; Kreher, Stephan; Meaburn, Karen J.; Jöhrens, Korinna; Lamprecht, Björn; Assaf, Chalid; Sterry, Wolfram; Kadin, Marshall E.; Daibata, Masanori; Joos, Stefan; Hummel, Michael; Stein, Harald; Janz, Martin; Anagnostopoulos, Ioannis; Schrock, Evelin; Misteli, Tom; Dörken, Bernd

    2009-01-01

    Although the identification and characterization of translocations have rapidly increased, little is known about the mechanisms of how translocations occur in vivo. We used anaplastic large cell lymphoma (ALCL) with and without the characteristic t(2;5)(p23;q35) translocation to study the mechanisms of formation of translocations and of ALCL transformation. We report deregulation of several genes located near the ALCL translocation breakpoint, regardless of whether the tumor contains the t(2;5). The affected genes include the oncogenic transcription factor Fra2 (located on 2p23), the HLH protein Id2 (2p25), and the oncogenic tyrosine kinase CSF1-receptor (5q33.1). Their up-regulation promotes cell survival and repression of T cell-specific gene expression programs that are characteristic for ALCL. The deregulated genes are in spatial proximity within the nuclear space of t(2;5)-negative ALCL cells, facilitating their translocation on induction of double-strand breaks. These data suggest that deregulation of breakpoint-proximal genes occurs before the formation of translocations, and that aberrant transcriptional activity of genomic regions is linked to their propensity to undergo chromosomal translocations. Also, our data demonstrate that deregulation of breakpoint-proximal genes has a key role in ALCL. PMID:19321746

  3. The genomic architecture of NLRP7 is Alu rich and predisposes to disease-associated large deletions.

    PubMed

    Reddy, Ramesh; Nguyen, Ngoc M P; Sarrabay, Guillaume; Rezaei, Maryam; Rivas, Mayra C G; Kavasoglu, Aysenur; Berkil, Hakan; Elshafey, Alaa; Nunez, Kristin P; Dreyfus, Hélène; Philippe, Merviel; Hadipour, Zahra; Durmaz, Asude; Eaton, Erin E; Schubert, Brittany; Ulker, Volkan; Hadipour, Fatemeh; Ahmadpour, Fatemeh; Touitou, Isabelle; Fardaei, Majid; Slim, Rima

    2016-10-01

    NLRP7 is a major gene responsible for recurrent hydatidiform moles. Here, we report 11 novel NLRP7 protein-truncating variants, of which five are deletions of more than 1 kb. We analyzed the transcriptional consequences of four variants. We demonstrate that one large homozygous deletion removes the NLRP7 transcription start site and results in the complete absence of its transcripts in a patient who is in good health apart from her reproductive problem. This observation strengthens existing data on the requirement of NLRP7 only for female reproduction. We show that two other variants affecting the splice acceptor of exon 6 lead to its in-frame skipping, while another variant affecting the splice donor site of exon 9 leads to an in-frame insertion of 54 amino acids. Our characterization of the deletion breakpoints demonstrated that most of the breakpoints occurred within Alu repeats and that the deletions were most likely mediated by microhomology events. Our data define a hotspot of Alu instability and deletions in intron 5 with six different breakpoints and rearrangements. Analysis of NLRP7 genomic sequences for repetitive elements demonstrated that Alu repeats represent 48% of its intronic sequences and that these repeats seem to have been inserted into the common NLRP2/7 primate ancestor before its duplication into two genes.

  4. Extensive Chromosomal Reorganization in the Evolution of New World Muroid Rodents (Cricetidae, Sigmodontinae): Searching for Ancestral Phylogenetic Traits.

    PubMed

    Pereira, Adenilson Leão; Malcher, Stella Miranda; Nagamachi, Cleusa Yoshiko; O'Brien, Patricia Caroline Mary; Ferguson-Smith, Malcolm Andrew; Mendes-Oliveira, Ana Cristina; Pieczarka, Julio Cesar

    2016-01-01

    Sigmodontinae rodents show great diversity and complexity in morphology and ecology. This diversity is accompanied by extensive chromosome variation challenging attempts to reconstruct their ancestral genome. The species Hylaeamys megacephalus--HME (Oryzomyini, 2n = 54), Necromys lasiurus--NLA (Akodontini, 2n = 34) and Akodon sp.--ASP (Akodontini, 2n = 10) have extreme diploid numbers that make it difficult to understand the rearrangements that are responsible for such differences. In this study we analyzed these changes using whole chromosome probes of HME in cross-species painting of NLA and ASP to construct chromosome homology maps that reveal the rearrangements between species. We include data from the literature for other Sigmodontinae previously studied with probes from HME and Mus musculus (MMU) probes. We also use the HME probes on MMU chromosomes for the comparative analysis of NLA with other species already mapped by MMU probes. Our results show that NLA and ASP have highly rearranged karyotypes when compared to HME. Eleven HME syntenic blocks are shared among the species studied here. Four syntenies may be ancestral to Akodontini (HME2/18, 3/25, 18/25 and 4/11/16) and eight to Sigmodontinae (HME26, 1/12, 6/21, 7/9, 5/17, 11/16, 20/13 and 19/14/19). Using MMU data we identified six associations shared among rodents from seven subfamilies, where MMU3/18 and MMU8/13 are phylogenetic signatures of Sigmodontinae. We suggest that the associations MMU2entire, MMU6proximal/12entire, MMU3/18, MMU8/13, MMU1/17, MMU10/17, MMU12/17, MMU5/16, MMU5/6 and MMU7/19 are part of the ancestral Sigmodontinae genome.

  5. Extensive Chromosomal Reorganization in the Evolution of New World Muroid Rodents (Cricetidae, Sigmodontinae): Searching for Ancestral Phylogenetic Traits

    PubMed Central

    Pereira, Adenilson Leão; Malcher, Stella Miranda; Nagamachi, Cleusa Yoshiko; O’Brien, Patricia Caroline Mary; Ferguson-Smith, Malcolm Andrew; Mendes-Oliveira, Ana Cristina; Pieczarka, Julio Cesar

    2016-01-01

    Sigmodontinae rodents show great diversity and complexity in morphology and ecology. This diversity is accompanied by extensive chromosome variation challenging attempts to reconstruct their ancestral genome. The species Hylaeamys megacephalus–HME (Oryzomyini, 2n = 54), Necromys lasiurus—NLA (Akodontini, 2n = 34) and Akodon sp.–ASP (Akodontini, 2n = 10) have extreme diploid numbers that make it difficult to understand the rearrangements that are responsible for such differences. In this study we analyzed these changes using whole chromosome probes of HME in cross-species painting of NLA and ASP to construct chromosome homology maps that reveal the rearrangements between species. We include data from the literature for other Sigmodontinae previously studied with probes from HME and Mus musculus (MMU) probes. We also use the HME probes on MMU chromosomes for the comparative analysis of NLA with other species already mapped by MMU probes. Our results show that NLA and ASP have highly rearranged karyotypes when compared to HME. Eleven HME syntenic blocks are shared among the species studied here. Four syntenies may be ancestral to Akodontini (HME2/18, 3/25, 18/25 and 4/11/16) and eight to Sigmodontinae (HME26, 1/12, 6/21, 7/9, 5/17, 11/16, 20/13 and 19/14/19). Using MMU data we identified six associations shared among rodents from seven subfamilies, where MMU3/18 and MMU8/13 are phylogenetic signatures of Sigmodontinae. We suggest that the associations MMU2entire, MMU6proximal/12entire, MMU3/18, MMU8/13, MMU1/17, MMU10/17, MMU12/17, MMU5/16, MMU5/6 and MMU7/19 are part of the ancestral Sigmodontinae genome. PMID:26800516

  6. Human Genetic Ancestral Composition Correlates with the Origin of Mycobacterium leprae Strains in a Leprosy Endemic Population

    PubMed Central

    Cardona-Castro, Nora; Cortés, Edwin; Beltrán, Camilo; Romero, Marcela; Badel-Mogollón, Jaime E.; Bedoya, Gabriel

    2015-01-01

    Recent reports have suggested that leprosy originated in Africa, extended to Asia and Europe, and arrived in the Americas during European colonization and the African slave trade. Due to colonization, the contemporary Colombian population is an admixture of Native-American, European and African ancestries. Because microorganisms are known to accompany humans during migrations, patterns of human migration can be traced by examining genomic changes in associated microbes. The current study analyzed 118 leprosy cases and 116 unrelated controls from two Colombian regions endemic for leprosy (Atlantic and Andean) in order to determine possible associations of leprosy with patient ancestral background (determined using 36 ancestry informative markers), Mycobacterium leprae genotype and/or patient geographical origin. We found significant differences between ancestral genetic composition. European components were predominant in Andean populations. In contrast, African components were higher in the Atlantic region. M. leprae genotypes were then analyzed for cluster associations and compared with the ancestral composition of leprosy patients. Two M. leprae principal clusters were found: haplotypes C54 and T45. Haplotype C54 associated with African origin and was more frequent in patients from the Atlantic region with a high African component. In contrast, haplotype T45 associated with European origin and was more frequent in Andean patients with a higher European component. These results suggest that the human and M. leprae genomes have co-existed since the African and European origins of the disease, with leprosy ultimately arriving in Colombia during colonization. Distinct M. leprae strains followed European and African settlement in the country and can be detected in contemporary Colombian populations. PMID:26360617

  7. Visual system evolution and the nature of the ancestral snake.

    PubMed

    Simões, B F; Sampaio, F L; Jared, C; Antoniazzi, M M; Loew, E R; Bowmaker, J K; Rodriguez, A; Hart, N S; Hunt, D M; Partridge, J C; Gower, D J

    2015-07-01

    The dominant hypothesis for the evolutionary origin of snakes from 'lizards' (non-snake squamates) is that stem snakes acquired many snake features while passing through a profound burrowing (fossorial) phase. To investigate this, we examined the visual pigments and their encoding opsin genes in a range of squamate reptiles, focusing on fossorial lizards and snakes. We sequenced opsin transcripts isolated from retinal cDNA and used microspectrophotometry to measure directly the spectral absorbance of the photoreceptor visual pigments in a subset of samples. In snakes, but not lizards, dedicated fossoriality (as in Scolecophidia and the alethinophidian Anilius scytale) corresponds with loss of all visual opsins other than RH1 (λmax 490-497 nm); all other snakes (including less dedicated burrowers) also have functional sws1 and lws opsin genes. In contrast, the retinas of all lizards sampled, even highly fossorial amphisbaenians with reduced eyes, express functional lws, sws1, sws2 and rh1 genes, and most also express rh2 (i.e. they express all five of the visual opsin genes present in the ancestral vertebrate). Our evidence of visual pigment complements suggests that the visual system of stem snakes was partly reduced, with two (RH2 and SWS2) of the ancestral vertebrate visual pigments being eliminated, but that this did not extend to the extreme additional loss of SWS1 and LWS that subsequently occurred (probably independently) in highly fossorial extant scolecophidians and A. scytale. We therefore consider it unlikely that the ancestral snake was as fossorial as extant scolecophidians, whether or not the latter are para- or monophyletic.

  8. A branch-heterogeneous model of protein evolution for efficient inference of ancestral sequences.

    PubMed

    Groussin, M; Boussau, B; Gouy, M

    2013-07-01

    Most models of nucleotide or amino acid substitution used in phylogenetic studies assume that the evolutionary process has been homogeneous across lineages and that composition of nucleotides or amino acids has remained the same throughout the tree. These oversimplified assumptions are refuted by the observation that compositional variability characterizes extant biological sequences. Branch-heterogeneous models of protein evolution that account for compositional variability have been developed, but are not yet in common use because of the large number of parameters required, leading to high computational costs and potential overparameterization. Here, we present a new branch-nonhomogeneous and nonstationary model of protein evolution that captures more accurately the high complexity of sequence evolution. This model, henceforth called Correspondence and likelihood analysis (COaLA), makes use of a correspondence analysis to reduce the number of parameters to be optimized through maximum likelihood, focusing on most of the compositional variation observed in the data. The model was thoroughly tested on both simulated and biological data sets to show its high performance in terms of data fitting and CPU time. COaLA efficiently estimates ancestral amino acid frequencies and sequences, making it relevant for studies aiming at reconstructing and resurrecting ancestral amino acid sequences. Finally, we applied COaLA on a concatenate of universal amino acid sequences to confirm previous results obtained with a nonhomogeneous Bayesian model regarding the early pattern of adaptation to optimal growth temperature, supporting the mesophilic nature of the Last Universal Common Ancestor.

  9. The Organellar Genomes of Chromera and Vitrella, the Phototrophic Relatives of Apicomplexan Parasites.

    PubMed

    Oborník, Miroslav; Lukeš, Julius

    2015-01-01

    Apicomplexa are known to contain greatly reduced organellar genomes. Their mitochondrial genome carries only three protein-coding genes, and their plastid genome is reduced to a 35-kb-long circle. The discovery of coral-endosymbiotic algae Chromera velia and Vitrella brassicaformis, which share a common ancestry with Apicomplexa, provided an opportunity to study possibly ancestral forms of organellar genomes, a unique glimpse into the evolutionary history of apicomplexan parasites. The structurally similar mitochondrial genomes of Chromera and Vitrella differ in gene content, which is reflected in the composition of their respiratory chains. Thus, Chromera lacks respiratory complexes I and III, whereas Vitrella and apicomplexan parasites are missing only complex I. Plastid genomes differ substantially between these algae, particularly in structure: The Chromera plastid genome is a linear, 120-kb molecule with large and divergent genes, whereas the plastid genome of Vitrella is a highly compact circle that is only 85 kb long but nonetheless contains more genes than that of Chromera. It appears that organellar genomes have already been reduced in free-living phototrophic ancestors of apicomplexan parasites, and such reduction is not associated with parasitism. PMID:26092225

  10. Whole Genome Sequence Analysis of a Large Isoniazid-Resistant Tuberculosis Outbreak in London: A Retrospective Observational Study

    PubMed Central

    Casali, Nicola; Broda, Agnieszka; Harris, Simon R.; Brown, Timothy; Drobniewski, Francis

    2016-01-01

    Background: A large isoniazid-resistant tuberculosis outbreak centred on London, United Kingdom, has been ongoing since 1995. The aim of this study was to investigate the power and value of whole genome sequencing (WGS) to resolve the transmission network compared to current molecular strain typing approaches, including analysis of intra-host diversity within a specimen, across body sites, and over time, with identification of genetic factors underlying the epidemiological success of this cluster. Methods and Findings: We sequenced 344 outbreak isolates from individual patients collected over 14 y (2 February 1998–22 June 2012). This demonstrated that 96 (27.9%) were indistinguishable, and only one differed from this major clone by more than five single nucleotide polymorphisms (SNPs). The maximum number of SNPs between any pair of isolates was nine SNPs, and the modal distance between isolates was two SNPs. WGS was able to reveal the direction of transmission of tuberculosis in 16 cases within the outbreak (4.7%), including within a multidrug-resistant cluster that carried a rare rpoB mutation associated with rifampicin resistance. Eleven longitudinal pairs of patient pulmonary isolates collected up to 48 mo apart differed from each other by between zero and four SNPs. Extrapulmonary dissemination resulted in acquisition of a SNP in two of five cases. WGS analysis of 27 individual colonies cultured from a single patient specimen revealed ten loci differed amongst them, with a maximum distance between any pair of six SNPs. A limitation of this study, as in previous studies, is that indels and SNPs in repetitive regions were not assessed due to the difficulty in reliably determining this variation. Conclusions: Our study suggests that (1) certain paradigms need to be revised, such as the 12 SNP distance as the gold standard upper threshold to identify plausible transmissions; (2) WGS technology is helpful to rule out the possibility of direct transmission when
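
    The SNP-distance summaries reported in this abstract (maximum and modal pairwise distance between isolates) can be illustrated with a minimal Python sketch. The isolate names and toy sequences are hypothetical and are not data from the study; real analyses work from variant calls on whole-genome alignments rather than short strings.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Tuple


def snp_distance(a: str, b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def pairwise_distances(isolates: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Return the SNP distance for every unordered pair of isolates."""
    return {
        (i, j): snp_distance(isolates[i], isolates[j])
        for i, j in combinations(sorted(isolates), 2)
    }


# Hypothetical toy alignment (illustration only, not data from the study).
toy_isolates = {
    "iso_A": "ACGTACGTAC",
    "iso_B": "ACGTACGTAC",
    "iso_C": "ACGTACGTAT",
    "iso_D": "ACGAACGTAT",
}

distances = pairwise_distances(toy_isolates)
max_distance = max(distances.values())
# Most frequent pairwise distance across all pairs.
modal_distance = Counter(distances.values()).most_common(1)[0][0]
```

    A transmission threshold such as the 12 SNP cutoff mentioned in the abstract would then be applied to the values in `distances`.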

  11. A Large Genome-Wide Association Study of Age-Related Hearing Impairment Using Electronic Health Records

    PubMed Central

    Hoffmann, Thomas J.; Keats, Bronya J.; Yoshikawa, Noriko; Risch, Neil

    2016-01-01

    Age-related hearing impairment (ARHI), one of the most common sensory disorders, can be mitigated, but not cured or eliminated. To identify genetic influences underlying ARHI, we conducted a genome-wide association study of ARHI in 6,527 cases and 45,882 controls among the non-Hispanic whites from the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort. We identified two novel genome-wide significant SNPs: rs4932196 (odds ratio = 1.185, p = 4.0 × 10⁻¹¹), 52 kb 3′ of ISG20, which replicated in a meta-analysis of the other GERA race/ethnicity groups (1,025 cases, 12,388 controls, p = 0.00094) and in a UK Biobank case-control analysis (30,802 self-reported cases, 78,586 controls, p = 0.015); and rs58389158 (odds ratio = 1.132, p = 1.8 × 10⁻⁹), which replicated in the UK Biobank (p = 0.00021). The latter SNP lies just outside exon 8 and is highly correlated (r² = 0.96) with the missense SNP rs5756795 in exon 7 of TRIOBP, a gene previously associated with prelingual nonsyndromic hearing loss. We further tested these SNPs in phenotypes from audiologist notes available on a subset of GERA (4,903 individuals), stratified by case/control status, to construct an independent replication test, and found a significant effect of rs58389158 on speech reception threshold (SRT; overall GERA meta-analysis p = 1.9 × 10⁻⁶). We also tested variants within exons of 132 other previously-identified hearing loss genes, and identified two common additional significant SNPs: rs2877561 (synonymous change in ILDR1, p = 6.2 × 10⁻⁵), which replicated in the UK Biobank (p = 0.00057), and had a significant GERA SRT (p = 0.00019) and speech discrimination score (SDS; p = 0.0019); and rs9493627 (missense change in EYA4, p = 0.00011) which replicated in the UK Biobank (p = 0.0095), other GERA groups (p = 0.0080), and had a consistent significant result for SRT (p = 0.041) and suggestive result for SDS (p = 0.081). Large cohorts with GWAS data and electronic health records may be a useful
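
    As a small illustration of what "genome-wide significant" means for the two lead SNPs, the Python sketch below compares the discovery p-values quoted in this abstract with the conventional threshold of 5e-8. That threshold is an assumption here; the abstract does not state which cutoff the authors applied.

```python
# Conventional genome-wide significance threshold (an assumption;
# the abstract does not state the cutoff that was used).
GENOME_WIDE_THRESHOLD = 5e-8

# Discovery p-values quoted in the abstract for the two lead SNPs.
discovery_p = {
    "rs4932196": 4.0e-11,
    "rs58389158": 1.8e-9,
}


def is_genome_wide_significant(p: float, threshold: float = GENOME_WIDE_THRESHOLD) -> bool:
    """Return True when a p-value falls below the significance threshold."""
    return p < threshold


significant = {snp: is_genome_wide_significant(p) for snp, p in discovery_p.items()}
```

    The exon-level SNPs in the candidate-gene analysis (for example p = 6.2e-5 for rs2877561) would not pass this cutoff, which is consistent with their being tested within a restricted set of previously identified genes rather than genome-wide.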

  12. Genome of horsepox virus.

    PubMed

    Tulman, E R; Delhon, G; Afonso, C L; Lu, Z; Zsak, L; Sandybaev, N T; Kerembekova, U Z; Zaitsev, V L; Kutish, G F; Rock, D L

    2006-09-01

    Here we present the genomic sequence of horsepox virus (HSPV) isolate MNR-76, an orthopoxvirus (OPV) isolated in 1976 from diseased Mongolian horses. The 212-kbp genome contained 7.5-kbp inverted terminal repeats and lacked extensive terminal tandem repetition. HSPV contained 236 open reading frames (ORFs) with similarity to those in other OPVs, with those in the central 100-kbp region most conserved relative to other OPVs. Phylogenetic analysis of the conserved region indicated that HSPV is closely related to sequenced isolates of vaccinia virus (VACV) and rabbitpox virus, clearly grouping together these VACV-like viruses. Fifty-four HSPV ORFs likely represented fragments of 25 orthologous OPV genes, including in the central region the only known fragmented form of an OPV ribonucleotide reductase large subunit gene. In terminal genomic regions, HSPV lacked full-length homologues of genes variably fragmented in other VACV-like viruses but was unique in fragmentation of the homologue of VACV strain Copenhagen B6R, a gene intact in other known VACV-like viruses. Notably, HSPV contained in terminal genomic regions 17 kbp of OPV-like sequence absent in known VACV-like viruses, including fragments of genes intact in other OPVs and approximately 1.4 kb of sequence present only in cowpox virus (CPXV). HSPV also contained seven full-length genes fragmented or missing in other VACV-like viruses, including intact homologues of the CPXV strain GRI-90 D2L/I4R CrmB and D13L CD30-like tumor necrosis factor receptors, D3L/I3R and C1L ankyrin repeat proteins, B19R kelch-like protein, D7L BTB/POZ domain protein, and B22R variola virus B22R-like protein. These results indicated that HSPV contains unique genomic features likely contributing to a unique virulence/host range phenotype. They also indicated that while closely related to known VACV-like viruses, HSPV contains additional, potentially ancestral sequences absent in other VACV-like viruses.

  13. Applied animal genomics: results from the field.

    PubMed

    Van Eenennaam, Alison L; Weigel, Kent A; Young, Amy E; Cleveland, Matthew A; Dekkers, Jack C M

    2014-02-01

    Genomic selection (GS) is the use of statistical methods to estimate the genetic merit of a genotyped animal based on prediction equations derived from large ancestral populations with both phenotypes and genotypes. It has revolutionized the dairy cattle breeding industry and has been implemented with varying degrees of success in other animal breeding programs, including swine, poultry, and beef cattle. The findings of empirical field studies applying GS to the breeding sectors of these main animal protein industries are reviewed. Several translational considerations must be addressed before implementing GS in genetic improvement programs. These include determining and obtaining economically relevant phenotypes and determining the optimal size of the training population, cost-effective genotyping strategies, the practicality of field implementation, and the relative costs versus the benefits of the realized rate of genetic gain. GS may additionally change the optimal breeding scheme design, and studies that address this consideration are also reviewed briefly.

  14. Evidence from opsin genes rejects nocturnality in ancestral primates

    PubMed Central

    Tan, Ying; Yoder, Anne D.; Yamashita, Nayuta; Li, Wen-Hsiung

    2005-01-01

    It is firmly believed that ancestral primates were nocturnal, with nocturnality having been maintained in most prosimian lineages. Under this traditional view, the opsin genes in all nocturnal prosimians should have undergone similar degrees of functional relaxation and accumulated similar extents of deleterious mutations. This expectation is rejected by the short-wavelength (S) opsin gene sequences from 14 representative prosimians. We found severe defects of the S opsin gene only in lorisiforms, but no defect in five nocturnal and two diurnal lemur species and only minor defects in two tarsiers and two nocturnal lemurs. Further, the nonsynonymous-to-synonymous rate ratio of the S opsin gene is highest in the lorisiforms and varies among the other prosimian branches, indicating different time periods of functional relaxation among lineages. These observations suggest that the ancestral primates were diurnal or cathemeral and that nocturnality has evolved several times in the prosimians, first in the lorisiforms but much later in other lineages. This view is further supported by the distribution pattern of the middle-wavelength (M) and long-wavelength (L) opsin genes among prosimians. PMID:16192351

  15. Functional conservation of an ancestral Pellino protein in helminth species.

    PubMed

    Cluxton, Christopher D; Caffrey, Brian E; Kinsella, Gemma K; Moynagh, Paul N; Fares, Mario A; Fallon, Padraic G

    2015-01-01

    The immune system of H. sapiens has innate signaling pathways that arose in ancestral species. This is exemplified by the discovery of the Toll-like receptor (TLR) pathway using free-living model organisms such as Drosophila melanogaster. The TLR pathway is ubiquitous and controls sensitivity to pathogen-associated molecular patterns (PAMPs) in eukaryotes. There is, however, a marked absence of this pathway from the Platyhelminthes, with the exception of the Pellino protein family, which is present in a number of species from this phylum. Helminth Pellino proteins are conserved, showing high similarity to human Pellino proteins at both the sequence and predicted structural levels. Pellino from a model helminth, Schistosoma mansoni Pellino (SmPellino), was shown to bind and poly-ubiquitinate human IRAK-1, displaying E3 ligase activity consistent with its human counterparts. When transfected into human cells, SmPellino is functional, interacting with signaling proteins and modulating mammalian signaling pathways. Strict conservation of a protein family in species lacking its niche signaling pathway is rare and provides a platform to examine the ancestral functions of Pellino proteins that may translate into novel mechanisms of immune regulation in humans. PMID:26120048

  17. Functional conservation of an ancestral Pellino protein in helminth species

    PubMed Central

    Cluxton, Christopher D.; Caffrey, Brian E.; Kinsella, Gemma K.; Moynagh, Paul N.; Fares, Mario A.; Fallon, Padraic G.

    2015-01-01

    The immune system of H. sapiens has innate signaling pathways that arose in ancestral species. This is exemplified by the discovery of the Toll-like receptor (TLR) pathway using free-living model organisms such as Drosophila melanogaster. The TLR pathway is ubiquitous and controls sensitivity to pathogen-associated molecular patterns (PAMPs) in eukaryotes. There is, however, a marked absence of this pathway from the Platyhelminthes, with the exception of the Pellino protein family, which is present in a number of species from this phylum. Helminth Pellino proteins are conserved, showing high similarity to human Pellino proteins at both the sequence and predicted structural levels. Pellino from a model helminth, Schistosoma mansoni Pellino (SmPellino), was shown to bind and poly-ubiquitinate human IRAK-1, displaying E3 ligase activity consistent with its human counterparts. When transfected into human cells, SmPellino is functional, interacting with signaling proteins and modulating mammalian signaling pathways. Strict conservation of a protein family in species lacking its niche signaling pathway is rare and provides a platform to examine the ancestral functions of Pellino proteins that may translate into novel mechanisms of immune regulation in humans. PMID:26120048

  18. Evidence from opsin genes rejects nocturnality in ancestral primates.

    PubMed

    Tan, Ying; Yoder, Anne D; Yamashita, Nayuta; Li, Wen-Hsiung

    2005-10-11

    It is firmly believed that ancestral primates were nocturnal, with nocturnality having been maintained in most prosimian lineages. Under this traditional view, the opsin genes in all nocturnal prosimians should have undergone similar degrees of functional relaxation and accumulated similar extents of deleterious mutations. This expectation is rejected by the short-wavelength (S) opsin gene sequences from 14 representative prosimians. We found severe defects of the S opsin gene only in lorisiforms, but no defect in five nocturnal and two diurnal lemur species and only minor defects in two tarsiers and two nocturnal lemurs. Further, the nonsynonymous-to-synonymous rate ratio of the S opsin gene is highest in the lorisiforms and varies among the other prosimian branches, indicating different time periods of functional relaxation among lineages. These observations suggest that the ancestral primates were diurnal or cathemeral and that nocturnality has evolved several times in the prosimians, first in the lorisiforms but much later in other lineages. This view is further supported by the distribution pattern of the middle-wavelength (M) and long-wavelength (L) opsin genes among prosimians.

  19. Chromosome painting between human and lorisiform prosimians: evidence for the HSA 7/16 synteny in the primate ancestral karyotype.

    PubMed

    Nie, Wenhui; O'Brien, Patricia C M; Fu, Beiyuan; Wang, Jinhuan; Su, Weiting; Ferguson-Smith, Malcolm A; Robinson, Terence J; Yang, Fengtang

    2006-02-01

    Multidirectional chromosome painting with probes derived from flow-sorted chromosomes of humans (Homo sapiens, HSA, 2n = 46) and galagos (Galago moholi, GMO, 2n = 38) allowed us to map evolutionarily conserved chromosomal segments among humans, galagos, and slow lorises (Nycticebus coucang, NCO, 2n = 50). In total, the 22 human autosomal painting probes detected 40 homologous chromosomal segments in the slow loris genome. The genome of the slow loris contains 16 syntenic associations of human homologues. The ancient syntenic associations of human chromosomes such as HSA 3/21, 7/16, 12/22 (twice), and 14/15, reported in most mammalian species, were also present in the slow loris genome. Six associations (HSA 1a/19a, 2a/12a, 6a/14b, 7a/12c, 9/15b, and 10a/19b) were shared by the slow loris and galago. Five associations (HSA 1b/6b, 4a/5a, 11b/15a, 12b/19b, and 15b/16b) were unique to the slow loris. In contrast, 30 homologous chromosome segments were identified in the slow loris genome when using galago chromosome painting probes. The data showed that the karyotypic differences between these two species were mainly due to Robertsonian translocations. Reverse painting, using galago painting probes onto human chromosomes, confirmed most of the chromosome homologies between humans and galagos established previously, and documented the HSA 7/16 association in galagos, which was not reported previously. The presence of the HSA 7/16 association in the slow loris and galago suggests that the 7/16 association is an ancestral synteny for primates. Based on our results and the published homology maps between humans and other primate species, we propose an ancestral karyotype (2n = 60) for lorisiform primates.

  20. Inter-genomic DNA Exchanges and Homeologous Gene Silencing Shaped the Nascent Allopolyploid Coffee Genome (Coffea arabica L.)

    PubMed Central

    Lashermes, Philippe; Hueber, Yann; Combes, Marie-Christine; Severac, Dany; Dereeper, Alexis

    2016-01-01

    Allopolyploidization is a biological process that has played a major role in plant speciation and evolution. Genomic changes are common consequences of polyploidization, but their dynamics over time are still poorly understood. Coffea arabica, a recently formed allotetraploid, was chosen to study genetic changes that accompany allopolyploid formation. Both RNA-seq and DNA-seq data were generated from two genetically distant C. arabica accessions. Genomic structural variation was investigated using C. canephora, one of its diploid progenitors, as reference genome. The fate of 9047 duplicate homeologous genes was inferred and compared between the accessions. The pattern of SNP density along the reference genome was consistent with the allopolyploid structure. Large genomic duplications or deletions were not detected. Two homeologous copies were retained and expressed in 96% of the genes analyzed. Nevertheless, duplicated genes were found to be affected by various genomic changes leading to homeolog loss or silencing. Genetic and epigenetic changes were evidenced that could have played a major role in the stabilization of the unique ancestral allotetraploid and its subsequent diversification. While the early evolution of C. arabica mainly involved homeologous crossover exchanges, the later stage appears to have relied on more gradual evolution involving gene conversion and homeolog silencing. PMID:27440920

  1. Positive-selection and ligation-independent cloning vectors for large scale in planta expression for plant functional genomics.

    PubMed

    Oh, Sang-Keun; Kim, Saet-Byul; Yeom, Seon-In; Lee, Hyun-Ah; Choi, Doil

    2010-12-01

    Transient expression is an easy, rapid and powerful technique for producing proteins of interest in plants. Recombinational cloning is highly efficient but has disadvantages, including complicated, time-consuming cloning procedures and expensive enzymes for large-scale gene cloning. To overcome these limitations, we developed new ligation-independent cloning (LIC) vectors derived from binary vectors including tobacco mosaic virus (pJL-TRBO), potato virus X (pGR106) and the pBI121 vector-based pMBP1. LIC vectors were modified to enable directional cloning of PCR products without restriction enzyme digestion or ligation reactions. In addition, the ccdB gene, which encodes a potent cell-killing protein, was introduced between the two LIC adapter sites in the pJL-LIC, pGR-LIC, and pMBP-LIC vectors for the efficient selection of recombinant clones. This new vector does not require restriction enzymes, alkaline phosphatase, or DNA ligase for cloning. To clone, the three LIC vectors are digested with SnaBI and treated with T4 DNA polymerase, which includes 3' to 5' exonuclease activity in the presence of only one dNTP (dGTP for the inserts and dCTP for the vector). To make recombinants, the vector plasmid and the insert PCR fragment were annealed at room temperature for 20 min prior to transformation into the host. Bacterial transformation was accomplished with 100% efficiency. To validate the new LIC vector systems, we used them to coexpress the Phytophthora AVR and potato resistance (R) genes in N. benthamiana by infiltration of Agrobacterium. Coexpressed AVR and R genes in N. benthamiana induced the typical hypersensitive cell death resulting from in vivo interaction of the two proteins. These LIC vectors could be efficiently used for high-throughput cloning and laboratory-scale in planta expression. These vectors could provide a powerful tool for high-throughput transient expression assays for functional genomic studies in plants. PMID:21340673

  2. Complete Genome Sequence of a Porcine Epidemic Diarrhea Virus Strain from Vietnam, HUA-14PED96, with a Large Genomic Deletion

    PubMed Central

    Choe, Se-Eun; Park, Kee-Hwan; Lim, Seong-In; Hien, Nguyen Ba; Thach, Pham Ngoc; Phuong, Le Huynh Thanh; An, Byung-Hyun; Han, Song Hee; Cho, In-Soo

    2016-01-01

    A highly virulent strain of Porcine epidemic diarrhea virus (PEDV) causing severe diarrhea has recently emerged in Vietnam. Genomic sequences from a novel strain, HUA-14PED96, isolated from a Vietnamese piglet with serious diarrhea show relatively high identity with U.S.-like PEDV strains, and have a 72-nt deletion in the open reading frame 1a (ORF1a) gene. PMID:26893409

  3. Genome Reduction Uncovers a Large Dispensable Genome and Adaptive Role for Copy Number Variation in Asexually Propagated Solanum tuberosum

    PubMed Central

    Hardigan, Michael A.; Crisovan, Emily; Hamilton, John P.; Laimbeer, Parker; Leisner, Courtney P.; Manrique-Carpintero, Norma C.; Newton, Linsey; Pham, Gina M.; Vaillancourt, Brieanne; Zeng, Zixian; Jiang, Jiming

    2016-01-01

    Clonally reproducing plants have the potential to bear a significantly greater mutational load than sexually reproducing species. To investigate this possibility, we examined the breadth of genome-wide structural variation in a panel of monoploid/doubled monoploid clones generated from native populations of diploid potato (Solanum tuberosum), a highly heterozygous asexually propagated plant. As rare instances of purely homozygous clones, they provided an ideal set for determining the degree of structural variation tolerated by this species and deriving its minimal gene complement. Extensive copy number variation (CNV) was uncovered, impacting 219.8 Mb (30.2%) of the potato genome with nearly 30% of genes subject to at least partial duplication or deletion, revealing the highly heterogeneous nature of the potato genome. Dispensable genes (>7000) were associated with limited transcription and/or a recent evolutionary history, with lower deletion frequency observed in genes conserved across angiosperms. Association of CNV with plant adaptation was highlighted by enrichment in gene clusters encoding functions for environmental stress response, with gene duplication playing a part in species-specific expansions of stress-related gene families. This study revealed unique impacts of CNV in a species with asexual reproductive habits and how CNV may drive adaption through evolution of key stress pathways. PMID:26772996

  4. Complete mitochondrial genome of the Antarctic midge Parochlus steinenii (Diptera: Chironomidae).

    PubMed

    Kim, Sanghee; Kim, Hanna; Shin, Seung Chul

    2016-09-01

    Parochlus steinenii is a winged midge found in the Antarctic Peninsula and its offshore islands. We determined the complete mitochondrial genome sequence of P. steinenii, which is comprised of 16 803 nucleotides and contains 13 protein-coding genes (PCGs), 22 tRNA genes, and the large (rrnL) and small (rrnS) rRNA genes. Its total A + T content is 72.5%. The PCG arrangement of P. steinenii is identical to that of the ancestral Diptera ground pattern. This is the first report on the mitogenome sequence of an Antarctic midge, and provides insights into the evolution of dipterans in Antarctica. PMID:26642812
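The A + T content quoted in this abstract is a simple base-composition ratio. As a minimal sketch (the function name `at_content` and the toy sequence are illustrative assumptions, not taken from the paper), it could be computed from a sequence string like this:

```python
def at_content(seq: str) -> float:
    """Return the percentage of A and T among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    # Ignore ambiguity codes such as N so they do not dilute the ratio.
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("A") + seq.count("T")) / counted


# Toy sequence (hypothetical, not the P. steinenii mitogenome): 8 of 10 bases are A or T.
toy = "ATTAGCATTA"
print(f"{at_content(toy):.1f}%")  # 80.0%
```

Applied to the full 16 803-nucleotide mitogenome sequence, the same ratio would be expected to give the reported 72.5%, assuming ambiguous bases are handled the same way as in the original analysis.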

  6. Large Scale Full-Length cDNA Sequencing Reveals a Unique Genomic Landscape in a Lepidopteran Model Insect, Bombyx mori

    PubMed Central

    Suetsugu, Yoshitaka; Futahashi, Ryo; Kanamori, Hiroyuki; Kadono-Okuda, Keiko; Sasanuma, Shun-ichi; Narukawa, Junko; Ajimura, Masahiro; Jouraku, Akiya; Namiki, Nobukazu; Shimomura, Michihiko; Sezutsu, Hideki; Osanai-Futahashi, Mizuko; Suzuki, Masataka G; Daimon, Takaaki; Shinoda, Tetsuro; Taniai, Kiyoko; Asaoka, Kiyoshi; Niwa, Ryusuke; Kawaoka, Shinpei; Katsuma, Susumu; Tamura, Toshiki; Noda, Hiroaki; Kasahara, Masahiro; Sugano, Sumio; Suzuki, Yutaka; Fujiwara, Haruhiko; Kataoka, Hiroshi; Arunkumar, Kallare P.; Tomar, Archana; Nagaraju, Javaregowda; Goldsmith, Marian R.; Feng, Qili; Xia, Qingyou; Yamamoto, Kimiko; Shimada, Toru; Mita, Kazuei

    2013-01-01

    The establishment of a complete genomic sequence of silkworm, the model species of Lepidoptera, laid a foundation for its functional genomics. A more complete annotation of the genome will benefit functional and comparative studies and accelerate extensive industrial applications for this insect. To realize these goals, we embarked upon a large-scale full-length cDNA collection from 21 full-length cDNA libraries derived from 14 tissues of the domesticated silkworm and performed full sequencing by primer walking for 11,104 full-length cDNAs. The large average intron size was 1904 bp, resulting from a high accumulation of transposons. Using gene models predicted by GLEAN and published mRNAs, we identified 16,823 gene loci on the silkworm genome assembly. Orthology analysis of 153 species, including 11 insects, revealed that among three Lepidoptera including Monarch and Heliconius butterflies, the 403 largest silkworm-specific genes were composed mainly of protective immunity, hormone-related, and characteristic structural proteins. Analysis of testis-/ovary-specific genes revealed distinctive features of sexual dimorphism, including depletion of ovary-specific genes on the Z chromosome in contrast to an enrichment of testis-specific genes. More than 40% of genes expressed in specific tissues mapped in tissue-specific chromosomal clusters. The newly obtained FL-cDNA sequences enabled us to annotate the genome of this lepidopteran model insect more accurately, enhancing genomic and functional studies of Lepidoptera and comparative analyses with other insect orders, and yielding new insights into the evolution and organization of lepidopteran-specific genes. PMID:23821615

  7. Large scale full-length cDNA sequencing reveals a unique genomic landscape in a lepidopteran model insect, Bombyx mori.

    PubMed

    Suetsugu, Yoshitaka; Futahashi, Ryo; Kanamori, Hiroyuki; Kadono-Okuda, Keiko; Sasanuma, Shun-ichi; Narukawa, Junko; Ajimura, Masahiro; Jouraku, Akiya; Namiki, Nobukazu; Shimomura, Michihiko; Sezutsu, Hideki; Osanai-Futahashi, Mizuko; Suzuki, Masataka G; Daimon, Takaaki; Shinoda, Tetsuro; Taniai, Kiyoko; Asaoka, Kiyoshi; Niwa, Ryusuke; Kawaoka, Shinpei; Katsuma, Susumu; Tamura, Toshiki; Noda, Hiroaki; Kasahara, Masahiro; Sugano, Sumio; Suzuki, Yutaka; Fujiwara, Haruhiko; Kataoka, Hiroshi; Arunkumar, Kallare P; Tomar, Archana; Nagaraju, Javaregowda; Goldsmith, Marian R; Feng, Qili; Xia, Qingyou; Yamamoto, Kimiko; Shimada, Toru; Mita, Kazuei

    2013-09-04

    The establishment of a complete genomic sequence of silkworm, the model species of Lepidoptera, laid a foundation for its functional genomics. A more complete annotation of the genome will benefit functional and comparative studies and accelerate extensive industrial applications for this insect. To realize these goals, we embarked upon a large-scale full-length cDNA collection from 21 full-length cDNA libraries derived from 14 tissues of the domesticated silkworm and performed full sequencing by primer walking for 11,104 full-length cDNAs. The large average intron size was 1904 bp, resulting from a high accumulation of transposons. Using gene models predicted by GLEAN and published mRNAs, we identified 16,823 gene loci on the silkworm genome assembly. Orthology analysis of 153 species, including 11 insects, revealed that among three Lepidoptera including Monarch and Heliconius butterflies, the 403 largest silkworm-specific genes were composed mainly of protective immunity, hormone-related, and characteristic structural proteins. Analysis of testis-/ovary-specific genes revealed distinctive features of sexual dimorphism, including depletion of ovary-specific genes on the Z chromosome in contrast to an enrichment of testis-specific genes. More than 40% of genes expressed in specific tissues mapped in tissue-specific chromosomal clusters. The newly obtained FL-cDNA sequences enabled us to annotate the genome of this lepidopteran model insect more accurately, enhancing genomic and functional studies of Lepidoptera and comparative analyses with other insect orders, and yielding new insights into the evolution and organization of lepidopteran-specific genes.

  8. Inferring Demography from Runs of Homozygosity in Whole-Genome Sequence, with Correction for Sequence Errors

    PubMed Central

    MacLeod, Iona M.; Larkin, Denis M.; Lewin, Harris A.; Hayes, Ben J.; Goddard, Mike E.

    2013-01-01

    Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges to fully utilize such large data sets and to ensure that sequencing errors do not introduce bias into the inferred demography. Using whole-genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to present. There was a strong upward bias in estimates of recent effective population size (Ne) if the correction method was not applied to the data, both for our method and the Li and Durbin (Inference of human population history from individual whole-genome sequences. Nature 475:493–496) pairwise sequentially Markovian coalescent method. To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in Ne. The LD statistic summarizes the distribution of runs of homozygosity for any given demography. We infer a best fit demography as one that predicts a match with the observed distribution of runs of homozygosity in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in the Ne around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction coinciding with the period of cattle domestication, with Ne of between 3,500 and 6,000. The most recent reduction of Ne to approximately 100 in the Holstein breed agrees well with estimates from pedigrees. Our approach can be applied to whole-genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals. PMID:23842528
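The observed summary statistic in this abstract is the distribution of runs of homozygosity. The sketch below only illustrates how such run lengths might be tabulated from the positions of heterozygous sites in one individual; it does not implement the authors' error correction or their LD-based demographic predictor. The function names, the 1-based coordinate convention, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def roh_lengths(het_positions, chrom_length):
    """Lengths (bp) of homozygous stretches between heterozygous sites.

    het_positions: 1-based positions of heterozygous sites on one chromosome.
    chrom_length:  chromosome length in bp.
    """
    # Sentinels at 0 and chrom_length + 1 capture the runs at both chromosome ends.
    bounds = [0] + sorted(het_positions) + [chrom_length + 1]
    lengths = (right - left - 1 for left, right in zip(bounds, bounds[1:]))
    return [length for length in lengths if length > 0]


def roh_histogram(lengths, bin_size):
    """Count runs per length bin; key k covers [k * bin_size, (k + 1) * bin_size)."""
    return Counter(length // bin_size for length in lengths)


# Toy example (hypothetical data): three heterozygous sites on a 30-bp chromosome.
runs = roh_lengths([5, 6, 20], chrom_length=30)
print(runs)                              # [4, 13, 10]
print(dict(roh_histogram(runs, 10)))     # {0: 1, 1: 2}
```

Because undetected sequencing errors create spurious heterozygous sites, they break long runs into shorter ones, which is one way to see why uncorrected data could bias inferred recent Ne upward as the abstract reports.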

  9. Genome-wide SNP scan in a porcine Large White×Minzhu intercross population reveals a locus influencing muscle mass on chromosome 2.

    PubMed

    Liu, Xin; Wang, Li Gang; Luo, Wei Zhen; Li, Yong; Liang, Jing; Yan, Hua; Zhao, Ke Bin; Wang, Li Xian; Zhang, Long Chao

    2014-12-01

    A high-density single nucleotide polymorphism (SNP) array containing 62 163 markers was employed for a genome-wide association study (GWAS) to identify variants associated with lean meat in ham (LMH, %) and lean meat percentage (LMP, %) within a porcine Large White×Minzhu intercross population. For each individual, LMH and LMP were measured after slaughter at the age of 240±7 days. A total of 557 F2 animals were genotyped. The GWAS revealed that 21 SNPs showed significant genome-wide or chromosome-wide associations with LMH and LMP by the Genome-wide Rapid Association using Mixed Model and Regression-Genomic Control approach. Nineteen significant genome-wide SNPs were mapped to the distal end of Sus scrofa chromosome (SSC) 2, where IGF2, a major known gene responsible for muscle mass, is located. A conditioned analysis, in which the genotype of the most strongly associated SNP is included as a fixed effect in the model, showed that those significant SNPs on SSC2 were derived from a single quantitative trait locus. The two chromosome-wide associated SNPs on SSC1 disappeared after conditioned analysis, suggesting that this association signal is a false association arising from the use of an F2 population. The present result is expected to lead to novel insights into muscle mass in different pig breeds and lays a preliminary foundation for follow-up studies for identification of causal mutations for subsequent application in marker-assisted selection programs for improving muscle mass in pigs.

  10. Genome physical mapping from large-insert clones by fingerprint analysis with capillary electrophoresis: a robust physical map of Penicillium chrysogenum.

    PubMed

    Xu, Zhanyou; van den Berg, Marco A; Scheuring, Chantel; Covaleda, Lina; Lu, Hong; Santos, Felipe A; Uhm, Taesik; Lee, Mi-Kyung; Wu, Chengcang; Liu, Steve; Zhang, Hong-Bin

    2005-01-01

    Physical mapping with large-insert clones is becoming an active area of genomics research, and capillary electrophoresis (CE) promises to revolutionize the physical mapping technology. Here, we demonstrate the utility of the CE technology for genome physical mapping with large-insert clones by constructing a robust, binary bacterial artificial chromosome (BIBAC)-based physical map of Penicillium chrysogenum. We fingerprinted 23.1x coverage BIBAC clones with five restriction enzymes and the SNaPshot kit containing four fluorescent-ddNTPs using the CE technology, and explored various strategies to construct quality physical maps. It was shown that the fingerprints labeled with one or two colors, resulting in 40-70 bands per clone, were assembled into much better quality maps than those labeled with three or four colors. The selection of fingerprinting enzymes was crucial to quality map construction. From the dataset labeled with ddTTP-dROX, we assembled a physical map for P. chrysogenum, with 2-3 contigs per chromosome, and anchored the map to its chromosomes. This map represents the first physical map constructed using the CE technology, thus providing not only a platform for genomic studies of the penicillin-producing species, but also strategies for efficient use of the CE technology for genome physical mapping of plants, animals and microbes. PMID:15767275

  11. Life-Cycle and Genome of OtV5, a Large DNA Virus of the Pelagic Marine Unicellular Green Alga Ostreococcus tauri

    PubMed Central

    Derelle, Evelyne; Ferraz, Conchita; Escande, Marie-Line; Eychenié, Sophie; Cooke, Richard; Piganeau, Gwenaël; Desdevises, Yves; Bellec, Laure; Moreau, Hervé; Grimsley, Nigel

    2008-01-01

    Large DNA viruses are ubiquitous, infecting diverse organisms ranging from algae to man, and have probably evolved from an ancient common ancestor. In aquatic environments, such algal viruses control blooms and shape the evolution of biodiversity in phytoplankton, but little is known about their biological functions. We show that Ostreococcus tauri, the smallest known marine photosynthetic eukaryote, whose genome is completely characterized, is a host for large DNA viruses, and present an analysis of the life-cycle and 186,234 bp long linear genome of OtV5. OtV5 is a lytic phycodnavirus which unexpectedly does not degrade its host chromosomes before the host cell bursts. Analysis of its complete genome sequence confirmed that it lacks expected site-specific endonucleases, and revealed the presence of 16 genes whose predicted functions are novel to this group of viruses. OtV5 carries at least one predicted gene whose protein closely resembles its host counterpart and several other host-like sequences, suggesting that horizontal gene transfers between host and viral genomes may occur frequently on an evolutionary scale. Fifty-seven percent of the 268 predicted proteins present no similarities with any known protein in GenBank, underlining the wealth of undiscovered biological diversity present in oceanic viruses, which are estimated to harbour 200 Mt of carbon. PMID:18509524

  13. Did warfare among ancestral hunter-gatherers affect the evolution of human social behaviors?

    PubMed

    Bowles, Samuel

    2009-06-01

    Since Darwin, intergroup hostilities have figured prominently in explanations of the evolution of human social behavior. Yet whether ancestral humans were largely "peaceful" or "warlike" remains controversial. I ask a more precise question: If more cooperative groups were more likely to prevail in conflicts with other groups, was the level of intergroup violence sufficient to influence the evolution of human social behavior? Using a model of the evolutionary impact of between-group competition and a new data set that combines archaeological evidence on causes of death during the Late Pleistocene and early Holocene with ethnographic and historical reports on hunter-gatherer populations, I find that the estimated level of mortality in intergroup conflicts would have had substantial effects, allowing the proliferation of group-beneficial behaviors that were quite costly to the individual altruist. PMID:19498163

  14. High-resolution genomic profiling reveals clonal evolution and competition in gastrointestinal marginal zone B-cell lymphoma and its large cell variant.

    PubMed

    Flossbach, Lucia; Holzmann, Karlheinz; Mattfeldt, Torsten; Buck, Michaela; Lanz, Karin; Held, Michael; Möller, Peter; Barth, Thomas F E

    2013-02-01

    We studied marginal zone B-cell lymphomas of the gastrointestinal tract including seven small cell lymphomas, eight large cell areas of composite lymphomas and 13 large cell variants using SNP array profiling. We found an increase of genomic complexity with lymphoma progression from small to large cytology, and identified gains of prominent (proto) oncogenes such as REL, BCL11A, ETS1, PTPN1, PTEN and KRAS which were found exclusively in the large cell variants. Copy numbers of ADAM3A, SCAPER and SIRPB1 varied among the three modes of presentation, suggesting that these aberrations are associated with progression from small to large cell lymphoma. The number of aberrations was slightly higher in the large cell part of composite lymphomas than in large cell lymphomas, suggesting that clonal selection takes place and that composite lymphomas are in a transition state. To further investigate this, we comparatively analyzed samples of two morphologically different regions of the same small cell tumor with a BIRC3-MALT1 translocation, as well as material acquired at two different time points from one composite lymphoma. We found genomic heterogeneity in both cases, supporting the theory of competing subclones in the evolution and progression of extranodal marginal zone B-cell lymphoma.

  15. Once a Batesian mimic, not always a Batesian mimic: mimic reverts back to ancestral phenotype when the model is absent.

    PubMed

    Prudic, Kathleen L; Oliver, Jeffrey C

    2008-05-22

    Batesian mimics gain protection from predation through the evolution of physical similarities to a model species that possesses anti-predator defences. This protection should not be effective in the absence of the model since the predator does not identify the mimic as potentially dangerous and both the model and the mimic are highly conspicuous. Thus, Batesian mimics should probably encounter strong predation pressure outside the geographical range of the model species. There are several documented examples of Batesian mimics occurring in locations without their models, but the evolutionary responses remain largely unidentified. A mimetic species has four alternative evolutionary responses to the loss of model presence. If predation is weak, it could maintain its mimetic signal. If predation is intense, it is widely presumed the mimic will go extinct. However, the mimic could also evolve a new colour pattern to mimic another model species or it could revert back to its ancestral, less conspicuous phenotype. We used molecular phylogenetic approaches to reconstruct and test the evolution of mimicry in the North American admiral butterflies (Limenitis: Nymphalidae). We confirmed that the more cryptic white-banded form is the ancestral phenotype of North American admiral butterflies. However, one species, Limenitis arthemis, evolved the black pipevine swallowtail mimetic form but later reverted to the white-banded more cryptic ancestral form. This character reversion is strongly correlated with the geographical absence of the model species and its host plant, but not the host plant distribution of L. arthemis. Our results support the prediction that a Batesian mimic does not persist in locations without its model, but it does not go extinct either. The mimic can revert back to its ancestral, less conspicuous form and persist.

  16. Quality assessment of maize assembled genomic islands (MAGIs) and large-scale experimental verification of predicted genes.

    PubMed

    Fu, Yan; Emrich, Scott J; Guo, Ling; Wen, Tsui-Jung; Ashlock, Daniel A; Aluru, Srinivas; Schnable, Patrick S

    2005-08-23

    Recent sequencing efforts have targeted the gene-rich regions of the maize (Zea mays L.) genome. We report the release of an improved assembly of maize assembled genomic islands (MAGIs). The 114,173 resulting contigs have been subjected to computational and physical quality assessments. Comparisons to the sequences of maize bacterial artificial chromosomes suggest that at least 97% (160 of 165) of MAGIs are correctly assembled. Because the rates at which junction-testing PCR primers for genomic survey sequences (90-92%) amplify genomic DNA are not significantly different from those of control primers (approximately 91%), we conclude that a very high percentage of genic MAGIs accurately reflect the structure of the maize genome. EST alignments, ab initio gene prediction, and sequence similarity searches of the MAGIs are available at the Iowa State University MAGI web site. This assembly contains 46,688 ab initio predicted genes. The expression of almost half (628 of 1,369) of a sample of the predicted genes that lack expression evidence was validated by RT-PCR. Our analyses suggest that the maize genome contains between approximately 33,000 and approximately 54,000 expressed genes. Approximately 5% (32 of 628) of the maize transcripts discovered do not have detectable paralogs among maize ESTs or detectable homologs from other species in the GenBank NR nucleotide/protein database. Analyses therefore suggest that this assembly of the maize genome contains approximately 350 previously uncharacterized expressed genes. We hypothesize that these "orphans" evolved quickly during maize evolution and/or domestication.
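
    The proportions quoted in this abstract can be recomputed from the counts it gives. The following is a minimal sketch, not part of the original record; the dictionary labels and the helper name `percent` are illustrative assumptions, and only the counts come from the abstract.

```python
# Counts quoted in the abstract: (numerator, denominator, percentage as reported).
# The labels and the helper name are illustrative, not taken from the paper.
reported = {
    "MAGIs correctly assembled": (160, 165, 97),
    "predicted genes validated by RT-PCR": (628, 1369, None),  # "almost half"
    "validated transcripts without paralogs or homologs": (32, 628, 5),
}


def percent(numerator, denominator):
    """Return numerator/denominator expressed as a percentage."""
    return 100.0 * numerator / denominator


for label, (num, den, stated) in reported.items():
    value = percent(num, den)
    if stated is not None:
        note = f"reported as about {stated}%"
    else:
        note = "reported as 'almost half'"
    print(f"{label}: {num}/{den} = {value:.1f}% ({note})")
```

    The sketch only checks the stated percentages against their counts; it does not attempt to reproduce the estimate of approximately 350 uncharacterized genes, because the abstract does not give the inputs for that figure.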

  17. The evolution of MICOS: Ancestral and derived functions and interactions

    PubMed Central

    Muñoz-Gómez, Sergio A; Slamovits, Claudio H; Dacks, Joel B; Wideman, Jeremy G

    2015-01-01

    The MItochondrial Contact Site and Cristae Organizing System (MICOS) is required for the biogenesis and maintenance of mitochondrial cristae as well as the proper tethering of the mitochondrial inner and outer membranes. We recently demonstrated that the core components of MICOS, Mic10 and Mic60, are near-ubiquitous eukaryotic features inferred to have been present in the last eukaryote common ancestor. We also showed that Mic60 could be traced to α-proteobacteria, which suggests that mitochondrial cristae evolved from α-proteobacterial intracytoplasmic membranes. Here, we extend our evolutionary analysis to MICOS-interacting proteins (e.g., Sam50, Mia40, DNAJC11, DISC-1, QIL1, Aim24, and Cox17) and discuss the implications for both derived and ancestral functions of MICOS. PMID:27065250

  18. Experimental evidence for the thermophilicity of ancestral life

    PubMed Central

    Akanuma, Satoshi; Nakajima, Yoshiki; Yokobori, Shin-ichi; Kimura, Mitsuo; Nemoto, Naoki; Mase, Tomoko; Miyazono, Ken-ichi; Tanokura, Masaru; Yamagishi, Akihiko

    2013-01-01

    Theoretical studies have focused on the environmental temperature of the universal common ancestor of life with conflicting conclusions. Here we provide experimental support for the existence of a thermophilic universal common ancestor. We present the thermal stabilities and catalytic efficiencies of nucleoside diphosphate kinases (NDK), designed using the information contained in predictive phylogenetic trees, that seem to represent the last common ancestors of Archaea and of Bacteria. These enzymes display extreme thermal stabilities, suggesting thermophilic ancestries for Archaea and Bacteria. The results are robust to the uncertainties associated with the sequence predictions and to the tree topologies used to infer the ancestral sequences. Moreover, mutagenesis experiments suggest that the universal ancestor also possessed a very thermostable NDK. Because, as we show, the stability of an NDK is directly related to the environmental temperature of its host organism, our results indicate that the last common ancestor of extant life was a thermophile that flourished at a very high temperature. PMID:23776221

  19. Catastrophic debris avalanche from ancestral Mount Shasta volcano, California

    NASA Astrophysics Data System (ADS)

    Crandell, D. R.; Miller, C. D.; Glicken, H. X.; Christiansen, R. L.; Newhall, C. G.

    1984-03-01

    A debris-avalanche deposit extends 43 km northwestward from the base of Mount Shasta across the floor of Shasta Valley, California, where it covers an area of at least 450 km². The surface of the deposit is dotted with hundreds of mounds, hills, and ridges, all formed of blocks of pyroxene andesite and unconsolidated volcaniclastic deposits derived from an ancestral Mount Shasta. Individual hills are separated by flat-topped laharlike deposits that also form the matrix of the debris avalanche and slope northwestward about 5 m/km. Radiometric ages of rocks in the deposit and of a postavalanche basalt flow indicate that the avalanche occurred between about 300,000 and 360,000 yr ago. An inferred average thickness of the deposit, plus a computed volume of about 4 km³ for the hills and ridges, indicate an estimated volume of about 26 km³, making it the largest known Quaternary landslide on Earth.
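
    The abstract gives the area and the volumes but not the inferred average thickness itself. As a rough back-calculation, and assuming (this is not stated in the abstract) that the total volume is the area times the average thickness plus the volume of the hills and ridges, the implied thickness can be sketched as follows. The function and constant names are hypothetical.

```python
# Figures quoted in the abstract (all approximate).
AREA_KM2 = 450.0          # "at least 450 km²", so a lower bound on the area
TOTAL_VOLUME_KM3 = 26.0   # estimated volume of the whole deposit
HILLS_VOLUME_KM3 = 4.0    # computed volume of the hills and ridges


def implied_matrix_thickness_m(area_km2, total_km3, hills_km3):
    """Average thickness in metres of the deposit excluding hills and ridges.

    Assumption (not from the abstract): total = area * thickness + hills.
    """
    matrix_volume_km3 = total_km3 - hills_km3
    return 1000.0 * matrix_volume_km3 / area_km2


thickness = implied_matrix_thickness_m(AREA_KM2, TOTAL_VOLUME_KM3, HILLS_VOLUME_KM3)
print(f"implied average thickness: about {thickness:.0f} m")
```

    Because the area is quoted as a minimum, a larger true area would give a smaller implied thickness under the same assumption.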

  20. Ancestral dichlorodiphenyltrichloroethane (DDT) exposure promotes epigenetic transgenerational inheritance of obesity

    PubMed Central

    2013-01-01

    Background: Ancestral environmental exposures to a variety of environmental factors and toxicants have been shown to promote the epigenetic transgenerational inheritance of adult onset disease. The present work examined the potential transgenerational actions of the insecticide dichlorodiphenyltrichloroethane (DDT) on obesity and associated disease. Methods: Outbred gestating female rats were transiently exposed to a vehicle control or DDT and the F1 generation offspring bred to generate the F2 generation and F2 generation bred to generate the F3 generation. The F1 and F3 generation control and DDT lineage rats were aged and various pathologies investigated. The F3 generation male sperm were collected to investigate methylation between the control and DDT lineage male sperm. Results: The F1 generation offspring (directly exposed as a fetus) derived from the F0 generation exposed gestating female rats were not found to develop obesity. The F1 generation DDT lineage animals did develop kidney disease, prostate disease, ovary disease and tumor development as adults. Interestingly, the F3 generation (great grand-offspring) had over 50% of males and females develop obesity. Several transgenerational diseases previously shown to be associated with metabolic syndrome and obesity were observed in the testis, ovary and kidney. The transgenerational transmission of disease was through both female (egg) and male (sperm) germlines. F3 generation sperm epimutations, differential DNA methylation regions (DMR), induced by DDT were identified. A number of the genes associated with the DMR have previously been shown to be associated with obesity. Conclusions: Observations indicate ancestral exposure to DDT can promote obesity and associated disease transgenerationally. The etiology of disease such as obesity may be in part due to environmentally induced epigenetic transgenerational inheritance. PMID:24228800

  2. Large differences in the genome organization of different plant Trypanosomatid parasites (Phytomonas spp.) reveal wide evolutionary divergences between taxa.

    PubMed

    Marín, C; Dollet, M; Pagès, M; Bastien, P

    2009-03-01

    All currently known plant trypanosomes have been grouped in the genus Phytomonas spp., although they can differ greatly in terms of both their biological properties and effects upon the host. Those parasitizing the phloem sap are specifically associated with lethal syndromes in Latin America, such as phloem necrosis of coffee, 'Hartrot' of coconut and 'Marchitez sorpresiva' of oil palm, which inflict considerable economic losses in endemic countries. The genomic organization of one group of Phytomonas (D) considered as representative of the genus has been published previously. The present work presents the genomic structure of two representative isolates from the pathogenic phloem-restricted group (H) of Phytomonas, analyzed by pulsed field gel electrophoresis followed by hybridization with chromosome-specific DNA markers. It came as a surprise to observe an extremely different genomic organization in this group as compared with that of group D. Most notably, the chromosome number is 7 in this group (with a genome size of 10 Mb) versus 21 in group D (totalling 25 Mb). These data reveal an unsuspected genomic diversity within plant trypanosomatids that may justify a further debate about their division into different genera. PMID:19111630

  3. Genome sequence reveals that Pseudomonas fluorescens F113 possesses a large and diverse array of systems for rhizosphere function and host interaction

    PubMed Central

    2013-01-01

    Background: Pseudomonas fluorescens F113 is a plant growth-promoting rhizobacterium (PGPR) isolated from the sugar-beet rhizosphere. This bacterium has been extensively studied as a model strain for genetic regulation of secondary metabolite production in P. fluorescens, as a candidate biocontrol agent against phytopathogens, and as a heterologous host for expression of genes with biotechnological application. The F113 genome sequence and annotation has been recently reported. Results: Comparative analysis of 50 genome sequences of strains belonging to the P. fluorescens group has revealed the existence of five distinct subgroups. F113 belongs to subgroup I, which is mostly composed of strains classified as P. brassicacearum. The core genome of these five strains is highly conserved and represents approximately 76% of the protein-coding genes in any given genome. Despite this strong conservation, F113 also contains a large number of unique protein-coding genes that encode traits potentially involved in the rhizocompetence of this strain. These features include protein coding genes required for denitrification, diterpenoids catabolism, motility and chemotaxis, protein secretion and production of antimicrobial compounds and insect toxins. Conclusions: The genome of P. fluorescens F113 is composed of numerous protein-coding genes, not usually found together in previously sequenced genomes, which are potentially decisive during the colonisation of the rhizosphere and/or interaction with other soil organisms. This includes genes encoding proteins involved in the production of a second flagellar apparatus, the use of abietic acid as a growth substrate, the complete denitrification pathway, the possible production of a macrolide antibiotic and the assembly of multiple protein secretion systems. PMID:23350846

  4. Genome-wide DNA methylation analysis of neuroblastic tumors reveals clinically relevant epigenetic events and large-scale epigenomic alterations localized to telomeric regions.

    PubMed

    Buckley, Patrick G; Das, Sudipto; Bryan, Kenneth; Watters, Karen M; Alcock, Leah; Koster, Jan; Versteeg, Rogier; Stallings, Raymond L

    2011-05-15

    The downregulation of specific genes through DNA hypermethylation is a major hallmark of cancer, although the extent and genomic distribution of hypermethylation occurring within cancer genomes is poorly understood. We report on the first genome-wide analysis of DNA methylation alterations in different neuroblastic tumor subtypes and cell lines, revealing higher order organization and clinically relevant alterations of the epigenome. The methylation status of 33,485 discrete loci representing all annotated CpG islands and RefSeq gene promoters was assessed in primary neuroblastic tumors and cell lines. A comparison of genes that were hypermethylated exclusively in the clinically favorable ganglioneuroma/ganglioneuroblastoma tumors revealed that nine genes were associated with poor clinical outcome when overexpressed in the unfavorable neuroblastoma (NB) tumors. Moreover, an integrated DNA methylation and copy number analysis identified 80 genes that were recurrently concomitantly deleted and hypermethylated in NB, with 37 reactivated by 5-aza-deoxycytidine. Lower expression of four of these genes was correlated with poor clinical outcome, further implicating their inactivation in aggressive disease pathogenesis. Analysis of genome-wide hypermethylation patterns revealed 70 recurrent large-scale blocks of contiguously hypermethylated promoters/CpG islands, up to 590 kb in length, with a distribution bias toward telomeric regions. Genome-wide hypermethylation events in neuroblastic tumors are extensive and frequently occur in large-scale blocks with a significant bias toward telomeric regions, indicating that some methylation alterations have occurred in a coordinated manner. Our results indicate that methylation contributes toward the clinicopathological features of neuroblastic tumors, revealing numerous genes associated with poor patient survival in NB.

  5. Clinical and biological implications of ancestral and non-ancestral IDH1 and IDH2 mutations in myeloid neoplasms.

    PubMed

    Molenaar, R J; Thota, S; Nagata, Y; Patel, B; Clemente, M; Przychodzen, B; Hirsh, C; Viny, A D; Hosano, N; Bleeker, F E; Meggendorfer, M; Alpermann, T; Shiraishi, Y; Chiba, K; Tanaka, H; van Noorden, C J F; Radivoyevitch, T; Carraway, H E; Makishima, H; Miyano, S; Sekeres, M A; Ogawa, S; Haferlach, T; Maciejewski, J P

    2015-11-01

    Mutations in isocitrate dehydrogenase 1/2 (IDH1/2(MT)) are drivers of a variety of myeloid neoplasms. As they yield the same oncometabolite, D-2-hydroxyglutarate, they are often treated as equivalent, and pooled. We studied the validity of this approach and found IDH1/2 mutations in 179 of 2119 myeloid neoplasms (8%). Cross-sectionally, the frequencies of these mutations increased from lower- to higher risk disease, thus suggesting a role in clinical progression. Variant allelic frequencies indicated that IDH1(MT) and IDH2(MT) are ancestral in up to 14/74 (19%) vs 34/99 (34%; P=0.027) of cases, respectively, illustrating the pathogenic role of these lesions in myeloid neoplasms. IDH1/2(MT) was associated with poor overall survival, particularly in lower risk myelodysplastic syndromes. Ancestral IDH1(MT) cases were associated with a worse prognosis than subclonal IDH1(MT) cases, whereas the position of IDH2(MT) within clonal hierarchy did not impact survival. This may relate to distinct mutational spectra with more DNMT3A and NPM1 mutations associated with IDH1(MT) cases, and more ASXL1, SRSF2, RUNX1, STAG2 mutations associated with IDH2(MT) cases. Our data demonstrate important clinical and biological differences between IDH1(MT) and IDH2(MT) myeloid neoplasms. These mutations should be considered separately as their differences could have implications for diagnosis, prognosis and treatment with IDH1/2(MT) inhibitors of IDH1/2(MT) patients. PMID:25836588
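
    The comparison of ancestral frequencies (14/74 vs 34/99) can be checked with a standard test on a 2x2 table. The abstract does not say which test produced P=0.027, so the two-sided Fisher exact test below is an assumption chosen for illustration; the function and variable names are hypothetical and only the counts come from the abstract.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Counts from the abstract: ancestral cases out of all cases with that mutation.
idh1_ancestral, idh1_total = 14, 74
idh2_ancestral, idh2_total = 34, 99

p_value = fisher_exact_two_sided(
    idh1_ancestral, idh1_total - idh1_ancestral,
    idh2_ancestral, idh2_total - idh2_ancestral,
)
print(f"IDH1 ancestral: {100 * idh1_ancestral / idh1_total:.0f}%")
print(f"IDH2 ancestral: {100 * idh2_ancestral / idh2_total:.0f}%")
print(f"two-sided Fisher exact p = {p_value:.3f}")
```

    A different test (for example a chi-square test with or without continuity correction) would give a somewhat different p-value from the same counts, so agreement with the reported figure should not be read as confirmation of the authors' method.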

  6. A unifying model of genome evolution under parsimony

    PubMed Central

    2014-01-01

    Background: Parsimony and maximum likelihood methods of phylogenetic tree estimation and parsimony methods for genome rearrangements are central to the study of genome evolution yet to date they have largely been pursued in isolation. Results: We present a data structure called a history graph that offers a practical basis for the analysis of genome evolution. It conceptually simplifies the study of parsimonious evolutionary histories by representing both substitutions and double cut and join (DCJ) rearrangements in the presence of duplications. The problem of constructing parsimonious history graphs thus subsumes related maximum parsimony problems in the fields of phylogenetic reconstruction and genome rearrangement. We show that tractable functions can be used to define upper and lower bounds on the minimum number of substitutions and DCJ rearrangements needed to explain any history graph. These bounds become tight for a special type of unambiguous history graph called an ancestral variation graph (AVG), which constrains in its combinatorial structure the number of operations required. We finally demonstrate that for a given history graph G, a finite set of AVGs describe all parsimonious interpretations of G, and this set can be explored with a few sampling moves. Conclusion: This theoretical study describes a model in which the inference of genome rearrangements and phylogeny can be unified under parsimony. PMID:24946830

  7. Phylogenetic analysis of kindlins suggests subfunctionalization of an ancestral unduplicated kindlin into three paralogs in vertebrates.

    PubMed

    Khan, Ammad Aslam; Janke, Axel; Shimokawa, Takashi; Zhang, Hongquan

    2011-01-01

    Kindlin proteins represent a newly discovered family of evolutionarily conserved FERM domain-containing proteins. This family includes three highly conserved proteins: Kindlin-1, Kindlin-2 and Kindlin-3. All three Kindlin proteins are associated with focal adhesions and are involved in integrin activation. The FERM domain of each Kindlin is bipartite and plays a key role in integrin activation. We herein explore for the first time the evolutionary history of these proteins. The phylogeny of the Kindlins suggests a single ancestral Kindlin protein present in even the earliest metazoan, i.e., hydra. This protein then underwent duplication events in insects and also experienced genome duplication in vertebrates, leading to the Kindlin family. A comparative study of the Kindlin paralogs showed that Kindlin-2 is the slowest evolving protein among the three family members. The analysis of synonymous and non-synonymous substitutions in orthologous Kindlin sequences in different species showed that all three Kindlins have been evolving under the influence of purifying selection. The expression pattern of Kindlins along with phylogenetic studies supports the subfunctionalization model of gene duplication.

  8. Small but Powerful, the Primary Endosymbiont of Moss Bugs, Candidatus Evansia muelleri, Holds a Reduced Genome with Large Biosynthetic Capabilities

    PubMed Central

    Santos-Garcia, Diego; Latorre, Amparo; Moya, Andrés; Gibbs, George; Hartung, Viktor; Dettner, Konrad; Kuechler, Stefan Martin; Silva, Francisco J.

    2014-01-01

    Moss bugs (Coleorrhyncha: Peloridiidae) are members of the order Hemiptera, and like many hemipterans, they have symbiotic associations with intracellular bacteria to fulfill nutritional requirements resulting from their unbalanced diet. The primary endosymbiont of the moss bugs, Candidatus Evansia muelleri, is phylogenetically related to Candidatus Carsonella ruddii and Candidatus Portiera aleyrodidarum, primary endosymbionts of psyllids and whiteflies, respectively. In this work, we report the genome of Candidatus Evansia muelleri Xc1 from Xenophyes cascus, which is the only obligate endosymbiont present in the association. This endosymbiont possesses an extremely reduced genome similar to Carsonella and Portiera. It has crossed the borderline to be considered as an autonomous cell, requiring the support of the insect host for some housekeeping cell functions. Interestingly, in spite of its small genome size, Evansia maintains enriched amino acid (complete or partial pathways for ten essential and six nonessential amino acids) and sulfur metabolisms, probably related to the poor diet of the insect, based on bryophytes, which contains very low levels of nitrogenous and sulfur compounds. Several facts, including the congruence of host (moss bugs, whiteflies, and psyllids) and endosymbiont phylogenies and the retention of the same ribosomal RNA operon during genome reduction in Evansia, Portiera, and Carsonella, suggest the existence of an ancient endosymbiotic Halomonadaceae clade associated with Hemiptera. Three possible scenarios for the origin of these three primary endosymbiont genera are proposed and discussed. PMID:25115011

  9. An atypical human induced pluripotent stem cell line with a complex, stable, and balanced genomic rearrangement including a large de novo 1q uniparental disomy.

    PubMed

    Steichen, Clara; Maluenda, Jérôme; Tosca, Lucie; Luce, Eléanor; Pineau, Dominique; Dianat, Noushin; Hannoun, Zara; Tachdjian, Gérard; Melki, Judith; Dubart-Kupperschmitt, Anne

    2015-03-01

    Human induced pluripotent stem cells (hiPSCs) hold great promise for cell therapy through their use as vital tools for regenerative and personalized medicine. However, the genomic integrity of hiPSCs still raises some concern and is one of the barriers limiting their use in clinical applications. Numerous articles have reported the occurrence of aneuploidies, copy number variations, or single point mutations in hiPSCs, and nonintegrative reprogramming strategies have been developed to minimize the impact of the reprogramming process on the hiPSC genome. Here, we report the characterization of an hiPSC line generated by daily transfections of modified messenger RNAs, displaying several genomic abnormalities. Karyotype analysis showed a complex genomic rearrangement, which remained stable during long-term culture. Fluorescent in situ hybridization analyses were performed on the hiPSC line showing that this karyotype is balanced. Interestingly, single-nucleotide polymorphism analysis revealed the presence of a large 1q region of uniparental disomy (UPD), demonstrating for the first time that UPD can occur in a noncompensatory context during nonintegrative reprogramming of normal fibroblasts.

  10. An Atypical Human Induced Pluripotent Stem Cell Line With a Complex, Stable, and Balanced Genomic Rearrangement Including a Large De Novo 1q Uniparental Disomy

    PubMed Central

    Steichen, Clara; Maluenda, Jérôme; Tosca, Lucie; Luce, Eléanor; Pineau, Dominique; Dianat, Noushin; Hannoun, Zara; Tachdjian, Gérard; Melki, Judith

    2015-01-01

    Human induced pluripotent stem cells (hiPSCs) hold great promise for cell therapy through their use as vital tools for regenerative and personalized medicine. However, the genomic integrity of hiPSCs still raises some concern and is one of the barriers limiting their use in clinical applications. Numerous articles have reported the occurrence of aneuploidies, copy number variations, or single point mutations in hiPSCs, and nonintegrative reprogramming strategies have been developed to minimize the impact of the reprogramming process on the hiPSC genome. Here, we report the characterization of an hiPSC line generated by daily transfections of modified messenger RNAs, displaying several genomic abnormalities. Karyotype analysis showed a complex genomic rearrangement, which remained stable during long-term culture. Fluorescent in situ hybridization analyses were performed on the hiPSC line showing that this karyotype is balanced. Interestingly, single-nucleotide polymorphism analysis revealed the presence of a large 1q region of uniparental disomy (UPD), demonstrating for the first time that UPD can occur in a noncompensatory context during nonintegrative reprogramming of normal fibroblasts. PMID:25650439

  11. Large-scale genetic variation of the symbiosis-required megaplasmid pSymA revealed by comparative genomic analysis of Sinorhizobium meliloti natural strains