Science.gov

Sample records for large ancestral genomes

  1. Reconstruction of ancestral gene orders using intermediate genomes

    PubMed Central

    2015-01-01

    Background The problem of reconstructing ancestral genomes in a given phylogenetic tree arises in many different comparative genomics fields. Here, we focus on reconstructing the gene order of ancestral genomes, a problem that has been widely studied over the past 20 years, especially with the increasing availability of whole-genome DNA sequences. There are two main approaches to this problem: event-based methods, which try to find the ancestral genomes that minimize the number of rearrangement events in the tree; and homology-based methods, which look for conserved structures, such as adjacent genes in the extant genomes, to build the ancestral genomes. Results We propose algorithms that use the concept of intermediate genomes, arising in optimal pairwise rearrangement scenarios. We show that intermediate genomes have combinatorial properties that make them easy to reconstruct, and develop fast algorithms that yield better reconstructed ancestral genomes than current event-based methods. The proposed framework is also designed to accept extra information, such as results from homology-based approaches, giving rise to combined algorithms with better results than the original methods. PMID:26451811

  2. Yeast Ancestral Genome Reconstructions: The Possibilities of Computational Methods

    NASA Astrophysics Data System (ADS)

    Tannier, Eric

    In 2006, a debate arose over how effectively bioinformatics methods can reconstruct mammalian ancestral genomes. Three years later, Gordon et al. (PLoS Genetics, 5(5), 2009) chose not to use automatic methods to build the genome of a 100-million-year-old Saccharomyces cerevisiae ancestor. Their manually constructed ancestor provides a reference genome to test whether automatic methods are indeed unable to approach confident reconstructions. Adapting several methodological frameworks to the same yeast gene order data, I discuss the possibilities, differences and similarities of the available algorithms for ancestral genome reconstruction. The methods can be classified into two types: local and global. Studying the properties of both helps to clarify what we can expect from their usage. Both types of methods propose contiguous ancestral regions that come very close (> 95% identity) to the manually predicted ancestral yeast chromosomes, with good coverage of the extant genomes.

  3. Ancestral genome inference using a genetic algorithm approach.

    PubMed

    Gao, Nan; Yang, Ning; Tang, Jijun

    2013-01-01

    Recent advances in technology have made it routine to obtain and compare gene orders within genomes. Rearrangements of gene orders by operations such as reversal and transposition are rare events that enable researchers to reconstruct deep evolutionary histories. An important application of genome rearrangement analysis is to infer gene orders of ancestral genomes, which is valuable for identifying patterns of evolution and for modeling the evolutionary processes. Among the various available methods, parsimony-based methods (including GRAPPA and MGR) are the most widely used. Since the core algorithms of these methods are solvers for the so-called median problem, providing an efficient and accurate median solver has attracted much attention in this field. The "double-cut-and-join" (DCJ) model uses the single DCJ operation to account for all genome rearrangement events. Because it is mathematically much simpler than handling events directly, parsimony methods using DCJ median solvers have better speed and accuracy. However, the DCJ median problem is NP-hard, and although several exact algorithms are available, they all have great difficulty when the given genomes are distant. In this paper, we present a new algorithm that combines a genetic algorithm (GA) with genomic sorting to produce a method that can solve the DCJ median problem in limited time and space, especially for large and distant datasets. Our experimental results show that this new GA-based method can find optimal or near-optimal results for problems ranging from easy to very difficult. Compared to existing parsimony methods, which may severely underestimate the true number of evolutionary events, the sorting-based approach can infer ancestral genomes that are much closer to their true ancestors. The code is available at http://phylo.cse.sc.edu. PMID:23658708
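
    To make the DCJ model mentioned in this record more concrete, the following is a minimal sketch of the pairwise DCJ distance between two genomes. It is not the GA-based median solver described in the abstract. It assumes that both genomes contain the same genes, each in a single copy, and that all chromosomes are circular, in which case the standard cycle-counting formula d = N - C applies (N genes, C cycles in the adjacency graph). The function names and the signed-integer gene encoding are illustrative choices, not taken from the source.

```python
def extremities(gene):
    """Return the (left, right) extremities of a signed gene as it is read.

    A gene g read forward goes tail -> head; read in reverse, head -> tail.
    """
    g = abs(gene)
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def adjacency_map(genome):
    """Map every gene extremity to the extremity it is adjacent to.

    `genome` is a list of circular chromosomes, each a list of signed ints
    (assumed encoding: sign = orientation, absolute value = gene id).
    """
    partner = {}
    for chrom in genome:
        n = len(chrom)
        for i in range(n):
            _, right = extremities(chrom[i])
            left, _ = extremities(chrom[(i + 1) % n])  # wrap around: circular
            partner[right] = left
            partner[left] = right
    return partner


def dcj_distance(genome_a, genome_b):
    """DCJ distance for circular genomes on the same gene set: N - C."""
    pa, pb = adjacency_map(genome_a), adjacency_map(genome_b)
    if pa.keys() != pb.keys():
        raise ValueError("genomes must contain exactly the same genes")
    seen = set()
    cycles = 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Alternate between an adjacency of A and an adjacency of B
        # until the walk returns to its starting extremity.
        while x not in seen:
            seen.add(x)
            y = pa[x]
            seen.add(y)
            x = pb[y]
    n_genes = len(pa) // 2  # two extremities per gene
    return n_genes - cycles


if __name__ == "__main__":
    # A single inversion of gene 2 separates these two circular genomes.
    print(dcj_distance([[1, 2, 3]], [[1, -2, 3]]))
```

    Counting cycles rather than enumerating rearrangement scenarios is what makes the pairwise distance cheap to compute; the hard part addressed by the paper is the median of three or more genomes, which this sketch does not attempt.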

  4. Deciphering the diploid ancestral genome of the Mesohexaploid Brassica rapa.

    PubMed

    Cheng, Feng; Mandáková, Terezie; Wu, Jian; Xie, Qi; Lysak, Martin A; Wang, Xiaowu

    2013-05-01

    The genus Brassica includes several important agricultural and horticultural crops. Their current genome structures were shaped by whole-genome triplication followed by extensive diploidization. The availability of several crucifer genome sequences, especially that of Chinese cabbage (Brassica rapa), enables study of the evolution of the mesohexaploid Brassica genomes from their diploid progenitors. We reconstructed three ancestral subgenomes of B. rapa (n = 10) by comparing its whole-genome sequence to ancestral and extant Brassicaceae genomes. All three B. rapa paleogenomes apparently consisted of seven chromosomes, similar to the ancestral translocation Proto-Calepineae Karyotype (tPCK; n = 7), which is the evolutionarily younger variant of the Proto-Calepineae Karyotype (n = 7). Based on comparative analysis of genome sequences or linkage maps of Brassica oleracea, Brassica nigra, radish (Raphanus sativus), and other closely related species, we propose a two-step merging of three tPCK-like genomes to form the hexaploid ancestor of the tribe Brassiceae with 42 chromosomes. Subsequent diversification of the Brassiceae was marked by extensive genome reshuffling and chromosome number reduction mediated by translocation events and followed by loss and/or inactivation of centromeres. Furthermore, via interspecies genome comparison, we refined intervals for seven of the genomic blocks of the Ancestral Crucifer Karyotype (n = 8), thus revising the key reference genome for evolutionary genomics of crucifers. PMID:23653472

  5. Comparative paleogenomics of crucifers: ancestral genomic blocks revisited.

    PubMed

    Lysak, Martin A; Mandáková, Terezie; Schranz, M Eric

    2016-04-01

    A decade ago, the concept of the Ancestral Crucifer Karyotype (ACK) and the definition of 24 conserved genomic blocks were presented. Subsequently, 35 cytogenetic reconstructions and/or draft genome sequences of crucifer species (members of the Brassicaceae family) have been analyzed in the context of this system, placing crucifers at the forefront of plant phylogenomics. In this review, we highlight how the ACK and genomic blocks have facilitated and guided genomic analysis of crucifers in the last 10 years and provide an update of this robust model. PMID:26945766

  6. Synteny conservation between the Prunus genome and both the present and ancestral Arabidopsis genomes

    PubMed Central

    Jung, Sook; Main, Dorrie; Staton, Margaret; Cho, Ilhyung; Zhebentyayeva, Tatyana; Arús, Pere; Abbott, Albert

    2006-01-01

    Background Due to the lack of availability of large genomic sequences for peach or other Prunus species, the degree of synteny conservation between the Prunus species and Arabidopsis has not been systematically assessed. Using the recently available peach EST sequences that are anchored to Prunus genetic maps and to the peach physical map, we analyzed the extent of conserved synteny between the Prunus and the Arabidopsis genomes. The reconstructed pseudo-ancestral Arabidopsis genome, which existed prior to the proposed recent polyploidy event, was also utilized in our analysis to further elucidate the evolutionary relationship. Results We analyzed the synteny conservation between the Prunus and the Arabidopsis genomes by comparing 475 peach ESTs that are anchored to Prunus genetic maps and their Arabidopsis homologs detected by sequence similarity. Microsyntenic regions were detected between all five Arabidopsis chromosomes and seven of the eight linkage groups of the Prunus reference map. An additional 1097 peach ESTs that are anchored to 431 BAC contigs of the peach physical map and their Arabidopsis homologs were also analyzed. Microsyntenic regions were detected in 77 BAC contigs. The syntenic regions from both data sets were short and contained only a couple of conserved gene pairs. The synteny between peach and Arabidopsis was fragmentary; all the Prunus linkage groups containing syntenic regions matched to more than two different Arabidopsis chromosomes, and most BAC contigs with multiple conserved syntenic regions corresponded to multiple Arabidopsis chromosomes. Using the same peach EST datasets and their Arabidopsis homologs, we also detected conserved syntenic regions in the pseudo-ancestral Arabidopsis genome. In many cases, the gene order and content of peach regions were more conserved in the ancestral genome than in the present Arabidopsis region. Statistical significance of each syntenic group was calculated using a simulated Arabidopsis genome. Conclusion We

  7. Reconstruction of the ancestral plastid genome in Geraniaceae reveals a correlation between genome rearrangements, repeats, and nucleotide substitution rates.

    PubMed

    Weng, Mao-Lun; Blazier, John C; Govindu, Madhumita; Jansen, Robert K

    2014-03-01

    Geraniaceae plastid genomes are highly rearranged, and each of the four genera already sequenced in the family has a distinct genome organization. This study reports plastid genome sequences of six additional species, Francoa sonchifolia, Melianthus villosus, and Viviania marifolia from Geraniales, and Pelargonium alternans, California macrophylla, and Hypseocharis bilobata from Geraniaceae. These genome sequences, combined with previously published species, provide sufficient taxon sampling to reconstruct the ancestral plastid genome organization of Geraniaceae and the rearrangements unique to each genus. The ancestral plastid genome of Geraniaceae has a 4 kb inversion and a reduced, Pelargonium-like small single copy region. Our ancestral genome reconstruction suggests that a few minor rearrangements occurred in the stem branch of Geraniaceae followed by independent rearrangements in each genus. The genomic comparison demonstrates that a series of inverted repeat boundary shifts and inversions played a major role in shaping genome organization in the family. The distribution of repeats is strongly associated with breakpoints in the rearranged genomes, and the proportion and the number of large repeats (>20 bp and >60 bp) are significantly correlated with the degree of genome rearrangements. Increases in the degree of plastid genome rearrangements are correlated with the acceleration in nonsynonymous substitution rates (dN) but not with synonymous substitution rates (dS). Possible mechanisms that might contribute to this correlation, including DNA repair system and selection, are discussed. PMID:24336877

  8. Co-evolutionary Models for Reconstructing Ancestral Genomic Sequences: Computational Issues and Biological Examples

    NASA Astrophysics Data System (ADS)

    Tuller, Tamir; Birin, Hadas; Kupiec, Martin; Ruppin, Eytan

    The inference of ancestral genomes is a fundamental problem in molecular evolution. Due to the statistical nature of this problem, the most likely or the most parsimonious ancestral genomes usually include considerable error rates. In general, these errors cannot be abolished by utilizing more exhaustive computational approaches, by using longer genomic sequences, or by analyzing more taxa. In recent studies we showed that co-evolution is an important force that can be used for significantly improving the inference of ancestral genome content.

  9. Reflections on ancestral haplotypes: medical genomics, evolution, and human individuality.

    PubMed

    Steele, Edward J

    2014-01-01

    The major histocompatibility complex (MHC), once labelled the "sphinx of immunology" by Jan Klein, provides powerful challenges to evolutionary thinking. This essay highlights the main discoveries that established the block ancestral haplotype structure of the MHC and the wider genome, focusing on the work by the Perth (Australia) group, led by Roger Dawkins, and the Boston group, led by Chester Alper and Edmond Yunis. Their achievements have been overlooked in the rush to sequence the first and subsequent drafts of the human genome. In Caucasoids, where most of the detailed work has been done, about 70% of all known allelic MHC diversity can be accounted for by 30 or so ancestral haplotypes (AHs), or conserved sequences of many mega-bases, and their recombinants. The block haplotype structure of the genome, as shown for the MHC (and other genetic regions), is a story that needs to be understood in its own right, particularly given the promotion of the "HapMap" project and single nucleotide polymorphism (SNP) linkage disequilibrium (LD) analysis, which has been wrongly touted as the only way to pinpoint those genes that are important in genetic disorders or other desired (qualitative) characteristics. PMID:25544323

  10. A Cooperative Co-Evolutionary Genetic Algorithm for Tree Scoring and Ancestral Genome Inference.

    PubMed

    Gao, Nan; Zhang, Yan; Feng, Bing; Tang, Jijun

    2015-01-01

    Recent advances in technology have made it easy to obtain and compare whole genomes. Rearrangements of genomes through operations such as reversals and transpositions are rare events that enable researchers to reconstruct deep evolutionary history among species. Some of the popular methods need to search a large tree space for the best-scoring tree, so it is desirable to have a fast and accurate method that can score a given tree efficiently. During the tree scoring procedure, the genomic structures of internal tree nodes are also produced, which provides important information for inferring ancestral genomes and for modeling the evolutionary processes. However, computing tree scores and ancestral genomes is very difficult, and many researchers have to rely on heuristic methods, which have various disadvantages. In this paper, we describe the first genetic algorithm for tree scoring and ancestor inference, which uses a fitness function considering co-evolution, adopts different initial seeding methods to initialize the first population pool, and utilizes a sorting-based approach to realize evolution. Our extensive experiments show that, compared with other existing algorithms, this new method is more accurate and can infer ancestral genomes that are much closer to the true ancestors. PMID:26671797

  11. Consistency of genome-wide associations across major ancestral groups.

    PubMed

    Ntzani, Evangelia E; Liberopoulos, George; Manolio, Teri A; Ioannidis, John P A

    2012-07-01

    It is not well known whether genetic markers identified through genome-wide association studies (GWAS) confer similar or different risks across people of different ancestry. We screened a regularly updated catalog of all published GWAS curated at the NHGRI website for GWAS-identified associations that had reached genome-wide significance (p ≤ 5 × 10⁻⁸) in at least one major ancestry group (European, Asian, African) and for which replication data were available for comparison in at least two different major ancestry groups. These groups were compared for the correlation between and differences in risk allele frequencies and genetic effects' estimates. Data on 108 eligible GWAS-identified associations with a total of 900 datasets (European, n = 624; Asian, n = 217; African, n = 60) were analyzed. Risk-allele frequencies were modestly correlated between ancestry groups, with >10% absolute differences in 75-89% of the three pairwise comparisons of ancestry groups. Genetic effect (odds ratio) point estimates between ancestry groups correlated modestly (pairwise comparisons' correlation coefficients: 0.20-0.33) and point estimates of risks were opposite in direction or differed more than twofold in 57%, 79%, and 89% of the European versus Asian, European versus African, and Asian versus African comparisons, respectively. The modest correlations, differing risk estimates, and considerable between-association heterogeneity suggest that differential ancestral effects can be anticipated and genomic risk markers may need separate further evaluation in different ancestry groups. PMID:22183176

  12. Genome-Wide Inference of Ancestral Recombination Graphs

    PubMed Central

    Rasmussen, Matthew D.; Hubisz, Melissa J.; Gronau, Ilan; Siepel, Adam

    2014-01-01

    The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the “ancestral recombination graph” (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n − 1 chromosomes, an operation we call “threading.” Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps. PMID:24831947

  13. Genome-wide inference of ancestral recombination graphs.

    PubMed

    Rasmussen, Matthew D; Hubisz, Melissa J; Gronau, Ilan; Siepel, Adam

    2014-01-01

    The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the "ancestral recombination graph" (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n − 1 chromosomes, an operation we call "threading." Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps. PMID:24831947

  14. Reconstruction of an ancestral Yersinia pestis genome and comparison with an ancient sequence

    PubMed Central

    2015-01-01

    Background We propose the computational reconstruction of a whole bacterial ancestral genome at the nucleotide scale, and its validation by a sequence of ancient DNA. This rare possibility is offered by an ancient sequence of the late Middle Ages plague agent. It has been hypothesized to be ancestral to extant Yersinia pestis strains based on the pattern of nucleotide substitutions. But the dynamics of indels, duplications, insertion sequences and rearrangements have affected all genomes much more than the substitution process, which makes the ancestral reconstruction task challenging. Results We use a set of gene families from 13 Yersinia species, construct reconciled phylogenies for all of them, and determine gene orders in ancestral species. Gene trees integrate information from the sequence, the species tree and gene order. We reconstruct ancestral sequences for ancestral genic and intergenic regions, providing a nearly complete genome sequence for the ancestor, containing a chromosome and three plasmids. Conclusion The comparison of the ancestral and ancient sequences provides a unique opportunity to assess the quality of ancestral genome reconstruction methods. But the quality of the sequencing and assembly of the ancient sequence can also be questioned by this comparison. PMID:26450112

  15. Comparative analysis of rosaceous genomes and the reconstruction of a putative ancestral genome for the family

    PubMed Central

    2011-01-01

    Background Comparative genome mapping studies in Rosaceae have been conducted until now by aligning genetic maps within the same genus or closely related genera, using a limited number of common markers. The growing body of genomics resources and sequence data for both Prunus and Fragaria permits detailed comparisons between these genera and the recently released Malus × domestica genome sequence. Results We generated a comparative analysis using 806 molecular markers that are anchored genetically to the Prunus and/or Fragaria reference maps, and physically to the Malus genome sequence. The numbers of markers shared between Malus and Prunus and between Malus and Fragaria were 784 and 148, respectively. The correspondence between marker positions was high, and conserved syntenic blocks were identified among the three genera in the Rosaceae. We reconstructed a proposed ancestral genome for the Rosaceae. Conclusions A genome containing nine chromosomes is the most likely candidate for the ancestral Rosaceae progenitor. The number of chromosomal translocations observed between the three genera investigated was low. However, the number of inversions identified between Malus and Prunus was much higher than in any reported genome comparison in plants, suggesting that small inversions have played an important role in the evolution of these two genera or of the Rosaceae. PMID:21226921

  16. DUPCAR: Reconstructing Contiguous Ancestral Regions with Duplications

    PubMed Central

    Ratan, Aakrosh; Raney, Brian J.; Suh, Bernard B.; Zhang, Louxin; Miller, Webb; Haussler, David

    2008-01-01

    Accurately reconstructing the large-scale gene order in an ancestral genome is a critical step to better understand genome evolution. In this paper, we propose a heuristic algorithm, called DUPCAR, for reconstructing ancestral genomic orders with duplications. The method starts from the order of genes in modern genomes and predicts predecessor and successor relationships in the ancestor. Then a greedy algorithm is used to reconstruct the ancestral orders by connecting genes into contiguous regions based on predicted adjacencies. Computer simulation was used to validate the algorithm. We also applied the method to reconstruct the ancestral chromosome X of placental mammals and the ancestral genomes of the ciliate Paramecium tetraurelia. PMID:18774902
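
    The greedy step described in this record, connecting genes into contiguous regions from predicted adjacencies, can be illustrated with a deliberately simplified sketch. This is not the DUPCAR implementation: it assumes unsigned genes with no duplications, and it assumes that each predicted adjacency already carries a numeric support weight (a hypothetical input format). Adjacencies are accepted from strongest to weakest as long as no gene ends up with more than two neighbours and no circular region is formed.

```python
def greedy_cars(adjacencies):
    """Greedily chain genes into contiguous ancestral regions (CARs).

    `adjacencies` is an iterable of (gene_a, gene_b, weight) tuples, where
    `weight` is an assumed support score for the predicted adjacency.
    Returns a list of gene paths; each gene appears in exactly one path.
    """
    adjacencies = list(adjacencies)
    genes = {g for a, b, _ in adjacencies for g in (a, b)}
    neighbours = {g: [] for g in genes}
    parent = {g: g for g in genes}  # union-find forest for cycle detection

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Strongest adjacencies first; ties keep their input order (stable sort).
    for a, b, _weight in sorted(adjacencies, key=lambda t: -t[2]):
        if a == b or len(neighbours[a]) >= 2 or len(neighbours[b]) >= 2:
            continue  # a gene has at most one predecessor and one successor
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # accepting this adjacency would close a cycle
        parent[root_a] = root_b
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Walk every path starting from one of its endpoints.
    cars, visited = [], set()
    for gene in sorted(genes):
        if gene in visited or len(neighbours[gene]) == 2:
            continue
        path, prev, cur = [], None, gene
        while cur is not None:
            path.append(cur)
            visited.add(cur)
            nxt = [g for g in neighbours[cur] if g != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
        cars.append(path)
    return cars


if __name__ == "__main__":
    predicted = [("a", "b", 3), ("b", "c", 2), ("c", "a", 2),
                 ("c", "d", 1), ("b", "d", 1)]
    print(greedy_cars(predicted))
```

    In the usage example, the adjacency c-a is rejected because it would close a cycle, and b-d is rejected because b already has two neighbours, so the four genes end up in one linear region.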

  17. Reconstruction of Ancestral Genomes in Presence of Gene Gain and Loss.

    PubMed

    Avdeyev, Pavel; Jiang, Shuai; Aganezov, Sergey; Hu, Fei; Alekseyev, Max A

    2016-03-01

    Since the most dramatic genomic changes are caused by genome rearrangements as well as gene duplications and gain/loss events, it becomes crucial to understand their mechanisms and to reconstruct the ancestral genomes of the given genomes. This problem was shown to be NP-complete even in the "simplest" case of three genomes, thus calling for heuristic rather than exact algorithmic solutions. At the same time, a larger number of input genomes may actually simplify the problem in practice, as was earlier illustrated with MGRA, a state-of-the-art software tool for reconstruction of ancestral genomes of multiple genomes. One of the key obstacles for MGRA and other similar tools is the presence of breakpoint reuse, in which the same breakpoint region is broken by several different genome rearrangements in the course of evolution. Furthermore, such tools are often limited to genomes composed of the same genes, with each gene present in a single copy in every genome. This limitation makes these tools inapplicable to many biological datasets and degrades the resolution of ancestral reconstructions in diverse datasets. We address these deficiencies by extending the MGRA algorithm to genomes with unequal gene contents. The developed next-generation tool MGRA2 can handle gene gain/loss events and shares the ability of MGRA to reconstruct ancestral genomes uniquely in the case of limited breakpoint reuse. Furthermore, MGRA2 employs a number of novel heuristics to cope with higher breakpoint reuse and process datasets inaccessible to MGRA. In practical experiments, MGRA2 shows superior performance for simulated and real genomes as compared to other ancestral genome reconstruction tools. PMID:26885568

  18. Genomic evolution in domestic cattle: ancestral haplotypes and healthy beef.

    PubMed

    Williamson, Joseph F; Steele, Edward J; Lester, Susan; Kalai, Oscar; Millman, John A; Wolrige, Lindsay; Bayard, Dominic; McLure, Craig; Dawkins, Roger L

    2011-05-01

    We have identified numerous Ancestral Haplotypes encoding a 14-Mb region of Bota C19. Three are frequent in Simmental, Angus and Wagyu and have been conserved since common progenitor populations. Others are more relevant to the differences between these 3 breeds including fat content and distribution in muscle. SREBF1 and Growth Hormone, which have been implicated in the production of healthy beef, are included within these haplotypes. However, we conclude that alleles at these 2 loci are less important than other sequences within the haplotypes. Identification of breeds and hybrids is improved by using haplotypes rather than individual alleles. PMID:21338665

  19. Ancient hybridizations among the ancestral genomes of bread wheat.

    PubMed

    Marcussen, Thomas; Sandve, Simen R; Heier, Lise; Spannagl, Manuel; Pfeifer, Matthias; Jakobsen, Kjetill S; Wulff, Brande B H; Steuernagel, Burkhard; Mayer, Klaus F X; Olsen, Odd-Arne

    2014-07-18

    The allohexaploid bread wheat genome consists of three closely related subgenomes (A, B, and D), but a clear understanding of their phylogenetic history has been lacking. We used genome assemblies of bread wheat and five diploid relatives to analyze genome-wide samples of gene trees, as well as to estimate evolutionary relatedness and divergence times. We show that the A and B genomes diverged from a common ancestor ~7 million years ago and that these genomes gave rise to the D genome through homoploid hybrid speciation 1 to 2 million years later. Our findings imply that the present-day bread wheat genome is a product of multiple rounds of hybrid speciation (homoploid and polyploid) and lay the foundation for a new framework for understanding the wheat genome as a multilevel phylogenetic mosaic. PMID:25035499

  20. Whole genome profiling physical map and ancestral annotation of tobacco Hicks Broadleaf

    PubMed Central

    Sierro, Nicolas; van Oeveren, Jan; van Eijk, Michiel J T; Martin, Florian; Stormo, Keith E; Peitsch, Manuel C; Ivanov, Nikolai V

    2013-01-01

    Genomics-based breeding of economically important crops such as banana, coffee, cotton, potato, tobacco and wheat is often hampered by genome size, polyploidy and high repeat content. We adapted sequence-based whole-genome profiling (WGP™) technology to obtain insight into the polyploidy of the model plant Nicotiana tabacum (tobacco). N. tabacum is assumed to originate from a hybridization event between ancestors of Nicotiana sylvestris and Nicotiana tomentosiformis approximately 200 000 years ago. This resulted in tobacco having a haploid genome size of 4500 million base pairs, approximately four times larger than the related tomato (Solanum lycopersicum) and potato (Solanum tuberosum) genomes. In this study, a physical map containing 9750 contigs of bacterial artificial chromosomes (BACs) was constructed. The mean contig size was 462 kbp, and the calculated genome coverage equaled the estimated tobacco genome size. We used a method for determination of the ancestral origin of the genome by annotation of WGP sequence tags. This assignment agreed with the ancestral annotation available from the tobacco genetic map, and may be used to investigate the evolution of homoeologous genome segments after polyploidization. The map generated is an essential scaffold for the tobacco genome. We propose the combination of WGP physical mapping technology and tag profiling of ancestral lines as a generally applicable method to elucidate the ancestral origin of genome segments of polyploid species. The physical mapping of genes and their origins will enable application of biotechnology to polyploid plants aimed at accelerating and increasing the precision of breeding for abiotic and biotic stress resistance. PMID:23672264

  1. Unexpectedly large number of conserved noncoding regions within the ancestral chordate Hox cluster.

    PubMed

    Pascual-Anaya, Juan; D'Aniello, Salvatore; Garcia-Fernàndez, Jordi

    2008-12-01

    The single amphioxus Hox cluster contains 15 genes and may well resemble the ancestral chordate Hox cluster. We have sequenced the Hox genomic complement of the European amphioxus Branchiostoma lanceolatum and compared it to that of the American species, Branchiostoma floridae, by phylogenetic footprinting to gain insights into the evolution of Hox gene regulation in chordates. We found that Hox intergenic regions are largely conserved between the two amphioxus species, especially in the case of genes located at the 3' end of the cluster, a trend previously observed in vertebrates. We further compared the amphioxus Hox cluster with the human HoxA, HoxB, HoxC, and HoxD clusters, finding several conserved noncoding regions, both in intergenic and intronic regions. This suggests that the regulation of Hox genes is highly conserved across chordates, consistent with the similar Hox expression patterns in vertebrates and amphioxus. PMID:18791732

  2. Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs

    PubMed Central

    Green, Richard E; Braun, Edward L; Armstrong, Joel; Earl, Dent; Nguyen, Ngan; Hickey, Glenn; Vandewege, Michael W; St John, John A; Capella-Gutiérrez, Salvador; Castoe, Todd A; Kern, Colin; Fujita, Matthew K; Opazo, Juan C; Jurka, Jerzy; Kojima, Kenji K; Caballero, Juan; Hubley, Robert M; Smit, Arian F; Platt, Roy N; Lavoie, Christine A; Ramakodi, Meganathan P; Finger, John W; Suh, Alexander; Isberg, Sally R; Miles, Lee; Chong, Amanda Y; Jaratlerdsiri, Weerachai; Gongora, Jaime; Moran, Christopher; Iriarte, Andrés; McCormack, John; Burgess, Shane C; Edwards, Scott V; Lyons, Eric; Williams, Christina; Breen, Matthew; Howard, Jason T; Gresham, Cathy R; Peterson, Daniel G; Schmitz, Jürgen; Pollock, David D; Haussler, David; Triplett, Eric W; Zhang, Guojie; Irie, Naoki; Jarvis, Erich D; Brochu, Christopher A; Schmidt, Carl J; McCarthy, Fiona M; Faircloth, Brant C; Hoffmann, Federico G; Glenn, Travis C; Gabaldón, Toni; Paten, Benedict; Ray, David A

    2015-01-01

    To provide context for the diversifications of archosaurs, the group that includes crocodilians, dinosaurs and birds, we generated draft genomes of three crocodilians, Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We observed an exceptionally slow rate of genome evolution within crocodilians at all levels, including nucleotide substitutions, indels, transposable element content and movement, gene family evolution, and chromosomal synteny. When placed within the context of related taxa including birds and turtles, this suggests that the common ancestor of all of these taxa also exhibited slow genome evolution and that the relatively rapid evolution of bird genomes represents an autapomorphy within that clade. The data also provided the opportunity to analyze heterozygosity in crocodilians, which indicates a likely reduction in population size for all three taxa through the Pleistocene. Finally, these new data combined with newly published bird genomes allowed us to reconstruct the partial genome of the common ancestor of archosaurs providing a tool to investigate the genetic starting material of crocodilians, birds, and dinosaurs. PMID:25504731

  3. Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs.

    PubMed

    Green, Richard E; Braun, Edward L; Armstrong, Joel; Earl, Dent; Nguyen, Ngan; Hickey, Glenn; Vandewege, Michael W; St John, John A; Capella-Gutiérrez, Salvador; Castoe, Todd A; Kern, Colin; Fujita, Matthew K; Opazo, Juan C; Jurka, Jerzy; Kojima, Kenji K; Caballero, Juan; Hubley, Robert M; Smit, Arian F; Platt, Roy N; Lavoie, Christine A; Ramakodi, Meganathan P; Finger, John W; Suh, Alexander; Isberg, Sally R; Miles, Lee; Chong, Amanda Y; Jaratlerdsiri, Weerachai; Gongora, Jaime; Moran, Christopher; Iriarte, Andrés; McCormack, John; Burgess, Shane C; Edwards, Scott V; Lyons, Eric; Williams, Christina; Breen, Matthew; Howard, Jason T; Gresham, Cathy R; Peterson, Daniel G; Schmitz, Jürgen; Pollock, David D; Haussler, David; Triplett, Eric W; Zhang, Guojie; Irie, Naoki; Jarvis, Erich D; Brochu, Christopher A; Schmidt, Carl J; McCarthy, Fiona M; Faircloth, Brant C; Hoffmann, Federico G; Glenn, Travis C; Gabaldón, Toni; Paten, Benedict; Ray, David A

    2014-12-12

    To provide context for the diversification of archosaurs--the group that includes crocodilians, dinosaurs, and birds--we generated draft genomes of three crocodilians: Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We observed an exceptionally slow rate of genome evolution within crocodilians at all levels, including nucleotide substitutions, indels, transposable element content and movement, gene family evolution, and chromosomal synteny. When placed within the context of related taxa including birds and turtles, this suggests that the common ancestor of all of these taxa also exhibited slow genome evolution and that the comparatively rapid evolution is derived in birds. The data also provided the opportunity to analyze heterozygosity in crocodilians, which indicates a likely reduction in population size for all three taxa through the Pleistocene. Finally, these data combined with newly published bird genomes allowed us to reconstruct the partial genome of the common ancestor of archosaurs, thereby providing a tool to investigate the genetic starting material of crocodilians, birds, and dinosaurs. PMID:25504731

  4. Monotreme IGF2 expression and ancestral origin of genomic imprinting.

    PubMed

    Killian, J K; Nolan, C M; Stewart, N; Munday, B L; Andersen, N A; Nicol, S; Jirtle, R L

    2001-08-15

    IGF2 (insulin-like growth factor 2) and M6P/IGF2R (mannose 6-phosphate/insulin-like growth factor 2 receptor) are imprinted in marsupials and eutherians but not in birds. These results along with the absence of M6P/IGF2R imprinting in the egg-laying monotremes indicate that the parental imprinting of fetal growth-regulatory genes may be unique to viviparous mammals. In this investigation, we have cloned IGF2 from two monotreme mammals, the platypus and echidna, to further investigate the origin of imprinting. We report herein that like M6P/IGF2R, IGF2 is not imprinted in monotremes. Thus, although IGF2 encodes a highly conserved growth factor in chordates, it is only imprinted in therian mammals. These findings support a concurrent origin of IGF2 and M6P/IGF2R imprinting in the late Jurassic/early Cretaceous period. The absence of imprinting in monotremes, despite apparent interparental conflicts over maternal-offspring exchange, argues that a fortuitous congruency of genetic and epigenetic events may have limited the phylogenetic breadth of genomic imprinting to therian mammals. J. Exp. Zool. (Mol. Dev. Evol.) 291:205-212, 2001. PMID:11479919

  5. Analyses of Charophyte Chloroplast Genomes Help Characterize the Ancestral Chloroplast Genome of Land Plants

    PubMed Central

    Civáň, Peter; Foster, Peter G.; Embley, Martin T.; Séneca, Ana; Cox, Cymon J.

    2014-01-01

    Despite the significance of the relationships between embryophytes and their charophyte algal ancestors in deciphering the origin and evolutionary success of land plants, few chloroplast genomes of the charophyte algae have been reconstructed to date. Here, we present new data for three chloroplast genomes of the freshwater charophytes Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae). The chloroplast genome of Klebsormidium has a quadripartite organization with exceptionally large inverted repeat (IR) regions and, uniquely among streptophytes, has lost the rrn5 and rrn4.5 genes from the ribosomal RNA (rRNA) gene cluster operon. The chloroplast genome of Roya differs from other zygnematophycean chloroplasts, including the newly sequenced Mesotaenium, by having a quadripartite structure that is typical of other streptophytes. On the basis of the improbability of the novel gain of IR regions, we infer that the quadripartite structure has likely been lost independently in at least three zygnematophycean lineages, although the absence of the usual rRNA operonic synteny in the IR regions of Roya may indicate their de novo origin. Significantly, all zygnematophycean chloroplast genomes have undergone substantial genomic rearrangement, which may be the result of ancient retroelement activity evidenced by the presence of integrase-like and reverse transcriptase-like elements in the Roya chloroplast genome. Our results corroborate the close phylogenetic relationship between Zygnematophyceae and land plants and identify 89 protein-coding genes and 22 introns present in the chloroplast genome at the time of the evolutionary transition of plants to land, all of which can be found in the chloroplast genomes of extant charophytes. PMID:24682153

  6. Analyses of charophyte chloroplast genomes help characterize the ancestral chloroplast genome of land plants.

    PubMed

    Civáň, Peter; Foster, Peter G; Embley, Martin T; Séneca, Ana; Cox, Cymon J

    2014-04-01

    Despite the significance of the relationships between embryophytes and their charophyte algal ancestors in deciphering the origin and evolutionary success of land plants, few chloroplast genomes of the charophyte algae have been reconstructed to date. Here, we present new data for three chloroplast genomes of the freshwater charophytes Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae). The chloroplast genome of Klebsormidium has a quadripartite organization with exceptionally large inverted repeat (IR) regions and, uniquely among streptophytes, has lost the rrn5 and rrn4.5 genes from the ribosomal RNA (rRNA) gene cluster operon. The chloroplast genome of Roya differs from other zygnematophycean chloroplasts, including the newly sequenced Mesotaenium, by having a quadripartite structure that is typical of other streptophytes. On the basis of the improbability of the novel gain of IR regions, we infer that the quadripartite structure has likely been lost independently in at least three zygnematophycean lineages, although the absence of the usual rRNA operonic synteny in the IR regions of Roya may indicate their de novo origin. Significantly, all zygnematophycean chloroplast genomes have undergone substantial genomic rearrangement, which may be the result of ancient retroelement activity evidenced by the presence of integrase-like and reverse transcriptase-like elements in the Roya chloroplast genome. Our results corroborate the close phylogenetic relationship between Zygnematophyceae and land plants and identify 89 protein-coding genes and 22 introns present in the chloroplast genome at the time of the evolutionary transition of plants to land, all of which can be found in the chloroplast genomes of extant charophytes. PMID:24682153

  7. Exploring the diploid wheat ancestral A genome through sequence comparison at the high-molecular-weight glutenin locus region.

    PubMed

    Dong, Lingli; Huo, Naxin; Wang, Yi; Deal, Karin; Luo, Ming-Cheng; Wang, Daowen; Anderson, Olin D; Gu, Yong Qiang

    2012-12-01

    The polyploid nature of hexaploid wheat (T. aestivum, AABBDD) often represents a great challenge in various aspects of research including genetic mapping, map-based cloning of important genes, and sequencing and accurate assembly of its genome. To explore the utility of ancestral diploid species of polyploid wheat, sequence variation of T. urartu (A(u)A(u)) was analyzed by comparing its 277-kb large genomic region carrying the important Glu-1 locus with the homologous regions from the A genomes of the diploid T. monococcum (A(m)A(m)), tetraploid T. turgidum (AABB), and hexaploid T. aestivum (AABBDD). Our results revealed that in addition to a high degree of gene collinearity, nested retroelement structures were also considerably conserved among the A(u) genome and the A genomes in polyploid wheats, suggesting that the majority of the repetitive sequences in the A genomes of polyploid wheats originated from the diploid A(u) genome. The difference in the compared region between A(u) and A is mainly caused by four differential TE insertion and two deletion events between these genomes. The estimated divergence time of A genomes calculated on nucleotide substitution rate in both shared TEs and collinear genes further supports the closer evolutionary relationship of A to A(u) than to A(m). The structure conservation in the repetitive regions prompted us to develop repeat junction markers based on the A(u) sequence for mapping the A genome in hexaploid wheat. Eighty percent of these repeat junction markers were successfully mapped to the corresponding region in hexaploid wheat, suggesting that T. urartu could serve as a useful resource for developing molecular markers for genetic and breeding studies in hexaploid wheat. PMID:23052831

  8. Evolution of the ancestral recombination graph along the genome in case of selective sweep.

    PubMed

    Leocard, Stephanie; Pardoux, Etienne

    2010-12-01

    We consider the genome of a sample of n individuals taken at the end of a selective sweep, which is the fixation of an advantageous allele in the population. When the selective advantage is high, the genealogy at a locus under selective sweep can be approximated by a comb with n teeth. However, because of recombinations during the selective sweep, the hitchhiking effect decreases as the distance from the selected site increases, so that far from this locus, the tree can be approximated by a Kingman coalescent tree, as in the neutral case. We first give the distribution of the tree at a given locus. Then we focus on the evolution of this tree along the genome. Since this tree-valued process is not Markovian, we study the evolution of the Ancestral Recombination Graph along the genome in case of selective sweep. PMID:20077118
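
    For the neutral Kingman coalescent mentioned in the abstract above, a standard result is that while k lineages remain, the waiting time to the next coalescence is exponential with rate k(k-1)/2, so the expected time to the most recent common ancestor of n samples is 2(1 - 1/n) in coalescent time units. The sketch below samples only that neutral tree height. It does not model the selective sweep, the comb-shaped genealogy, or the ancestral recombination graph studied by the authors, and all function names are illustrative.

```python
import random


def kingman_tmrca(n: int, rng: random.Random) -> float:
    """Sample the time to the most recent common ancestor of n lineages.

    While k lineages remain, the waiting time to the next coalescence is
    exponential with rate k * (k - 1) / 2 (coalescent time units).
    """
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2)
    return t


def expected_tmrca(n: int) -> float:
    """Closed form: sum over k = 2..n of 2 / (k * (k - 1)) = 2 * (1 - 1/n)."""
    return 2 * (1 - 1 / n)


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 10
samples = [kingman_tmrca(n, rng) for _ in range(20000)]
mean_tmrca = sum(samples) / len(samples)
print(f"simulated mean: {mean_tmrca:.3f}, expected: {expected_tmrca(n):.3f}")
```

    The simulated mean should sit close to the closed-form value, which approaches 2 as n grows; this is the tree shape the abstract says is recovered far from the selected site.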

  9. Major Chromosomal Rearrangements Distinguish Willow and Poplar After the Ancestral "Salicoid" Genome Duplication.

    PubMed

    Hou, Jing; Ye, Ning; Dong, Zhongyuan; Lu, Mengzhu; Li, Laigeng; Yin, Tongming

    2016-01-01

    Populus (poplar) and Salix (willow) are sister genera in the Salicaceae family. In both lineages extant species are predominantly diploid. Genome analysis previously revealed that the two lineages originated from a common tetraploid ancestor. In this study, we conducted a syntenic comparison of the corresponding 19 chromosome members of the poplar and willow genomes. Our observations revealed that almost every chromosomal segment had a parallel paralogous segment elsewhere in the genomes, and the two lineages shared a similar syntenic pinwheel pattern for most of the chromosomes, which indicated that the two lineages diverged after the genome reorganization in the common progenitor. The pinwheel patterns showed distinct differences for two chromosome pairs in each lineage. Further analysis detected two major interchromosomal rearrangements that distinguished the karyotypes of willow and poplar. Chromosome I of willow was a conjunction of poplar chromosome XVI and the lower portion of poplar chromosome I, whereas willow chromosome XVI corresponded to the upper portion of poplar chromosome I. Scientists have suggested that Populus is evolutionarily more primitive than Salix. Therefore, we propose that, after the "salicoid" duplication event, fission and fusion of the ancestral chromosomes first gave rise to the diploid progenitor of extant Populus species. During the evolutionary process, fission and fusion of poplar chromosomes I and XVI subsequently gave rise to the progenitor of extant Salix species. This study contributes to an improved understanding of genome divergence after ancient genome duplication in closely related lineages of higher plants. PMID:27352946

  10. Major Chromosomal Rearrangements Distinguish Willow and Poplar After the Ancestral “Salicoid” Genome Duplication

    PubMed Central

    Hou, Jing; Ye, Ning; Dong, Zhongyuan; Lu, Mengzhu; Li, Laigeng; Yin, Tongming

    2016-01-01

    Populus (poplar) and Salix (willow) are sister genera in the Salicaceae family. In both lineages extant species are predominantly diploid. Genome analysis previously revealed that the two lineages originated from a common tetraploid ancestor. In this study, we conducted a syntenic comparison of the corresponding 19 chromosome members of the poplar and willow genomes. Our observations revealed that almost every chromosomal segment had a parallel paralogous segment elsewhere in the genomes, and the two lineages shared a similar syntenic pinwheel pattern for most of the chromosomes, which indicated that the two lineages diverged after the genome reorganization in the common progenitor. The pinwheel patterns showed distinct differences for two chromosome pairs in each lineage. Further analysis detected two major interchromosomal rearrangements that distinguished the karyotypes of willow and poplar. Chromosome I of willow was a conjunction of poplar chromosome XVI and the lower portion of poplar chromosome I, whereas willow chromosome XVI corresponded to the upper portion of poplar chromosome I. Scientists have suggested that Populus is evolutionarily more primitive than Salix. Therefore, we propose that, after the “salicoid” duplication event, fission and fusion of the ancestral chromosomes first gave rise to the diploid progenitor of extant Populus species. During the evolutionary process, fission and fusion of poplar chromosomes I and XVI subsequently gave rise to the progenitor of extant Salix species. This study contributes to an improved understanding of genome divergence after ancient genome duplication in closely related lineages of higher plants. PMID:27352946

  11. Exploiting ancestral mammalian genomes for the prediction of human transcription factor binding sites

    PubMed Central

    2012-01-01

    Background The computational prediction of Transcription Factor Binding Sites (TFBS) remains a challenge due to their short length and low information content. Comparative genomics approaches that simultaneously consider several related species and favor sites that have been conserved throughout evolution improve the accuracy (specificity) of the predictions but are limited due to a phenomenon called binding site turnover, where sequence evolution causes one TFBS to replace another in the same region. In parallel to this development, an increasing number of mammalian genomes are now sequenced and it is becoming possible to infer, to a surprisingly high degree of accuracy, ancestral mammalian sequences. Results We propose a TFBS prediction approach that makes use of the availability of inferred ancestral mammalian genomes to improve its accuracy. This method aims to identify binding loci, which are regions of a few hundred base pairs that have preserved their potential to bind a given transcription factor over evolutionary time. After proposing a neutral evolutionary model of predicted TFBS counts in a DNA region of a given length, we use it to identify regions that have preserved the number of predicted TFBS they contain to an unexpected degree given their divergence. The approach is applied to human chromosome 1 and shows significant gains in accuracy as compared to both existing single-species and multi-species TFBS prediction approaches, in particular for transcription factors that are subject to high turnover rates. Availability The source code and predictions made by the program are available at http://www.cs.mcgill.ca/~blanchem/bindingLoci. PMID:23281809

  12. Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure

    PubMed Central

    Basu, Analabha; Sarkar-Roy, Neeta; Majumder, Partha P.

    2016-01-01

    India, occupying the center stage of Paleolithic and Neolithic migrations, has been underrepresented in genome-wide studies of variation. Systematic analysis of genome-wide data, using multiple robust statistical methods, on (i) 367 unrelated individuals drawn from 18 mainland and 2 island (Andaman and Nicobar Islands) populations selected to represent geographic, linguistic, and ethnic diversities, and (ii) individuals from populations represented in the Human Genome Diversity Panel (HGDP), reveals four major ancestries in mainland India. This contrasts with an earlier inference of two ancestries based on limited population sampling. A distinct ancestry of the populations of Andaman archipelago was identified and found to be coancestral to Oceanic populations. Analysis of ancestral haplotype blocks revealed that extant mainland populations (i) admixed widely irrespective of ancestry, although admixture between populations was not always symmetric, and (ii) this practice was rapidly replaced by endogamy about 70 generations ago, among upper castes and Indo-European speakers predominantly. This estimated time coincides with the historical period of formulation and adoption of sociocultural norms restricting intermarriage in large social strata. A similar replacement observed among tribal populations was temporally less uniform. PMID:26811443

  13. Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure.

    PubMed

    Basu, Analabha; Sarkar-Roy, Neeta; Majumder, Partha P

    2016-02-01

    India, occupying the center stage of Paleolithic and Neolithic migrations, has been underrepresented in genome-wide studies of variation. Systematic analysis of genome-wide data, using multiple robust statistical methods, on (i) 367 unrelated individuals drawn from 18 mainland and 2 island (Andaman and Nicobar Islands) populations selected to represent geographic, linguistic, and ethnic diversities, and (ii) individuals from populations represented in the Human Genome Diversity Panel (HGDP), reveals four major ancestries in mainland India. This contrasts with an earlier inference of two ancestries based on limited population sampling. A distinct ancestry of the populations of Andaman archipelago was identified and found to be coancestral to Oceanic populations. Analysis of ancestral haplotype blocks revealed that extant mainland populations (i) admixed widely irrespective of ancestry, although admixture between populations was not always symmetric, and (ii) this practice was rapidly replaced by endogamy about 70 generations ago, among upper castes and Indo-European speakers predominantly. This estimated time coincides with the historical period of formulation and adoption of sociocultural norms restricting intermarriage in large social strata. A similar replacement observed among tribal populations was temporally less uniform. PMID:26811443

  14. Exploring the diploid wheat ancestral A genome through sequence comparison at the High-Molecular-Weight glutenin locus region

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The polyploid nature of hexaploid wheat (T. aestivum, AABBDD) often represents a great challenge in various aspects of research including genetic mapping, map-based cloning of important genes, and sequencing and accurate assembly of its genome. To explore the utility of ancestral diploid species o...

  15. MADS goes genomic in conifers: towards determining the ancestral set of MADS-box genes in seed plants

    PubMed Central

    Gramzow, Lydia; Weilandt, Lisa; Theißen, Günter

    2014-01-01

    Background and Aims MADS-box genes comprise a gene family coding for transcription factors. This gene family expanded greatly during land plant evolution such that the number of MADS-box genes ranges from one or two in green algae to around 100 in angiosperms. Given the crucial functions of MADS-box genes for nearly all aspects of plant development, the expansion of this gene family probably contributed to the increasing complexity of plants. However, the expansion of MADS-box genes during one important step of land plant evolution, namely the origin of seed plants, remains poorly understood due to the previous lack of whole-genome data for gymnosperms. Methods The newly available genome sequences of Picea abies, Picea glauca and Pinus taeda were used to identify the complete set of MADS-box genes in these conifers. In addition, MADS-box genes were identified in the growing number of transcriptomes available for gymnosperms. With these datasets, phylogenies were constructed to determine the ancestral set of MADS-box genes of seed plants and to infer the ancestral functions of these genes. Key Results Type I MADS-box genes are under-represented in gymnosperms and only a minimum of two Type I MADS-box genes have been present in the most recent common ancestor (MRCA) of seed plants. In contrast, a large number of Type II MADS-box genes were found in gymnosperms. The MRCA of extant seed plants probably possessed at least 11–14 Type II MADS-box genes. In gymnosperms two duplications of Type II MADS-box genes were found, such that the MRCA of extant gymnosperms had at least 14–16 Type II MADS-box genes. Conclusions The implied ancestral set of MADS-box genes for seed plants shows simplicity for Type I MADS-box genes and remarkable complexity for Type II MADS-box genes in terms of phylogeny and putative functions. The analysis of transcriptome data reveals that gymnosperm MADS-box genes are expressed in a great variety of tissues, indicating diverse roles of MADS

  16. Genomic organization of the crested ibis MHC provides new insight into ancestral avian MHC structure

    PubMed Central

    Chen, Li-Cheng; Lan, Hong; Sun, Li; Deng, Yan-Li; Tang, Ke-Yi; Wan, Qiu-Hong

    2015-01-01

    The major histocompatibility complex (MHC) plays an important role in the immune response. Avian MHCs are not well characterized; reports so far cover only the highly compact Galliformes MHCs and the extensively fragmented zebra finch MHC. We report the first genomic structure of an endangered Pelecaniformes (crested ibis) MHC containing 54 genes in three regions spanning ~500 kb. In contrast to the loose BG (26 loci within 265 kb) and Class I (11 within 150 kb) genomic structures, the Core Region is condensed (17 within 85 kb). Furthermore, this Region exhibits a COL11A2 gene, followed by four tandem MHC class II αβ dyads retaining two suites of anciently duplicated “αβ” lineages. Thus, the crested ibis MHC structure is entirely different from the known avian MHC architectures but similar to that of mammalian MHCs, suggesting that the fundamental structure of ancestral avian class II MHCs should be “COL11A2-IIαβ1-IIαβ2.” The gene structures, residue characteristics, and expression levels of the five class I genes reveal inter-locus functional divergence. However, phylogenetic analysis indicates that these five genes generate a well-supported intra-species clade, showing evidence for recent duplications. Our analyses suggest dramatic structural variation among avian MHC lineages, help elucidate avian MHC evolution, and provide a foundation for future conservation studies. PMID:25608659
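
    The contrast between the loose and condensed regions in the abstract above can be made concrete as a gene density. The following is a minimal sketch using only the locus counts and spans quoted there, taking all three spans to be in kb (an assumption for the two spans given without a unit, supported by their sum matching the stated ~500 kb total); the names are illustrative.

```python
# (number of loci, span in kb) for each MHC region, as quoted in the abstract.
# Assumption: the Class I and Core spans are in kb, like the BG span.
REGIONS = {
    "BG": (26, 265),
    "Class I": (11, 150),
    "Core": (17, 85),
}


def loci_per_100kb(loci: int, span_kb: float) -> float:
    """Gene density expressed as loci per 100 kb."""
    return 100 * loci / span_kb


densities = {name: loci_per_100kb(loci, span) for name, (loci, span) in REGIONS.items()}
total_loci = sum(loci for loci, _ in REGIONS.values())
total_span_kb = sum(span for _, span in REGIONS.values())

for name, density in densities.items():
    print(f"{name}: {density:.1f} loci per 100 kb")
print(f"total: {total_loci} loci in {total_span_kb} kb")
```

    On these figures the Core Region carries about twice the gene density of the BG region, and the three regions add up to the 54 genes and roughly 500 kb reported for the whole MHC.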

  17. Complete genome sequence of Macrococcus caseolyticus strain JCSC5402, reflecting the ancestral genome of the human-pathogenic staphylococci.

    PubMed

    Baba, Tadashi; Kuwahara-Arai, Kyoko; Uchiyama, Ikuo; Takeuchi, Fumihiko; Ito, Teruyo; Hiramatsu, Keiichi

    2009-02-01

    We isolated the methicillin-resistant Macrococcus caseolyticus strain JCSC5402 from animal meat in a supermarket and determined its whole-genome nucleotide sequence. This is the first report on the genome analysis of a macrococcal species that is evolutionarily closely related to the human pathogens Staphylococcus aureus and Bacillus anthracis. The essential biological pathways of M. caseolyticus are similar to those of staphylococci. However, the species has a small chromosome (2.1 Mb) and lacks many sugar and amino acid metabolism pathways and a plethora of virulence genes that are present in S. aureus. On the other hand, M. caseolyticus possesses a series of oxidative phosphorylation machineries that are closely related to those in the family Bacillaceae. We also discovered a probable primordial form of a Macrococcus methicillin resistance gene complex, mecIRAm, on one of the eight plasmids harbored by the M. caseolyticus strain. This is the first finding of a plasmid-encoded methicillin resistance gene. Macrococcus is considered to reflect the genome of ancestral bacteria before the speciation of staphylococcal species and may be closely associated with the origin of the methicillin resistance gene complex of the notorious human pathogen methicillin-resistant S. aureus. PMID:19074389

  18. Complete Genome Sequence of Macrococcus caseolyticus Strain JCSC5402, Reflecting the Ancestral Genome of the Human-Pathogenic Staphylococci

    PubMed Central

    Baba, Tadashi; Kuwahara-Arai, Kyoko; Uchiyama, Ikuo; Takeuchi, Fumihiko; Ito, Teruyo; Hiramatsu, Keiichi

    2009-01-01

    We isolated the methicillin-resistant Macrococcus caseolyticus strain JCSC5402 from animal meat in a supermarket and determined its whole-genome nucleotide sequence. This is the first report on the genome analysis of a macrococcal species that is evolutionarily closely related to the human pathogens Staphylococcus aureus and Bacillus anthracis. The essential biological pathways of M. caseolyticus are similar to those of staphylococci. However, the species has a small chromosome (2.1 Mb) and lacks many sugar and amino acid metabolism pathways and a plethora of virulence genes that are present in S. aureus. On the other hand, M. caseolyticus possesses a series of oxidative phosphorylation machineries that are closely related to those in the family Bacillaceae. We also discovered a probable primordial form of a Macrococcus methicillin resistance gene complex, mecIRAm, on one of the eight plasmids harbored by the M. caseolyticus strain. This is the first finding of a plasmid-encoded methicillin resistance gene. Macrococcus is considered to reflect the genome of ancestral bacteria before the speciation of staphylococcal species and may be closely associated with the origin of the methicillin resistance gene complex of the notorious human pathogen methicillin-resistant S. aureus. PMID:19074389

  19. Two Rounds of Whole Genome Duplication in the Ancestral Vertebrate

    SciTech Connect

    Dehal, Paramvir; Boore, Jeffrey L.

    2005-04-12

    The hypothesis that the relatively large and complex vertebrate genome was created by two ancient, whole genome duplications has been hotly debated, but remains unresolved. We reconstructed the evolutionary relationships of all gene families from the complete gene sets of a tunicate, fish, mouse, and human, then determined when each gene duplicated relative to the evolutionary tree of the organisms. We confirmed the results of earlier studies that there remains little signal of these events in numbers of duplicated genes, gene tree topology, or the number of genes per multigene family. However, when we plotted the genomic map positions of only the subset of paralogous genes that were duplicated prior to the fish-tetrapod split, their global physical organization provides unmistakable evidence of two distinct genome duplication events early in vertebrate evolution indicated by clear patterns of 4-way paralogous regions covering a large part of the human genome. Our results highlight the potential for these large-scale genomic events to have driven the evolutionary success of the vertebrate lineage.

  20. The mitochondrial genome of the onychophoran Opisthopatus cinctipes (Peripatopsidae) reflects the ancestral mitochondrial gene arrangement of Panarthropoda and Ecdysozoa.

    PubMed

    Braband, Anke; Cameron, Stephen L; Podsiadlowski, Lars; Daniels, Savel R; Mayer, Georg

    2010-10-01

    The ancestral genome composition in Onychophora (velvet worms) is unknown since only a single species of Peripatidae has been studied thus far, which shows a highly derived gene order with numerous translocated genes. Due to this lack of information from Onychophora, it is difficult to infer the ancestral mitochondrial gene arrangement patterns for Panarthropoda and Ecdysozoa. Hence, we analyzed the complete mitochondrial genome of the onychophoran Opisthopatus cinctipes, a representative of Peripatopsidae. Our data show that O. cinctipes possesses a highly conserved gene order, similar to that found in various arthropods. By comparing our results to those from different outgroups, we reconstruct the ancestral gene arrangement in Panarthropoda and Ecdysozoa. Our phylogenetic analysis of protein-coding gene sequences from 60 protostome species (including outgroups) provides some support for the sister group relationship of Onychophora and Arthropoda, which was not recovered by using a single species of Peripatidae, Epiperipatus biolleyi, in a previous study. A comparison of the strand-specific bias between onychophorans, arthropods, and a priapulid suggests that the peripatid E. biolleyi is less suitable for phylogenetic analyses of Ecdysozoa using mitochondrial genomic data than the peripatopsid O. cinctipes. PMID:20493270

  1. Reconstruction of ancestral chromosome architecture and gene repertoire reveals principles of genome evolution in a model yeast genus.

    PubMed

    Vakirlis, Nikolaos; Sarilar, Véronique; Drillon, Guénola; Fleiss, Aubin; Agier, Nicolas; Meyniel, Jean-Philippe; Blanpain, Lou; Carbone, Alessandra; Devillers, Hugo; Dubois, Kenny; Gillet-Markowska, Alexandre; Graziani, Stéphane; Huu-Vang, Nguyen; Poirel, Marion; Reisser, Cyrielle; Schott, Jonathan; Schacherer, Joseph; Lafontaine, Ingrid; Llorente, Bertrand; Neuvéglise, Cécile; Fischer, Gilles

    2016-07-01

    Reconstructing genome history is complex but necessary to reveal quantitative principles governing genome evolution. Such reconstruction requires recapitulating into a single evolutionary framework the evolution of genome architecture and gene repertoire. Here, we reconstructed the genome history of the genus Lachancea that appeared to cover a continuous evolutionary range from closely related to more diverged yeast species. Our approach integrated the generation of a high-quality genome data set; the development of AnChro, a new algorithm for reconstructing ancestral genome architecture; and a comprehensive analysis of gene repertoire evolution. We found that the ancestral genome of the genus Lachancea contained eight chromosomes and about 5173 protein-coding genes. Moreover, we characterized 24 horizontal gene transfers and 159 putative gene creation events that punctuated species diversification. We retraced all chromosomal rearrangements, including gene losses, gene duplications, chromosomal inversions and translocations at single gene resolution. Gene duplications outnumbered losses and balanced rearrangements with 1503, 929, and 423 events, respectively. Gene content variations between extant species are mainly driven by differential gene losses, while gene duplications remained globally constant in all lineages. Remarkably, we discovered that balanced chromosomal rearrangements could be responsible for up to 14% of all gene losses by disrupting genes at their breakpoints. Finally, we found that nonsynonymous substitutions reached fixation at a coordinated pace with chromosomal inversions, translocations, and duplications, but not deletions. Overall, we provide a granular view of genome evolution within an entire eukaryotic genus, linking gene content, chromosome rearrangements, and protein divergence into a single evolutionary framework. PMID:27247244
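
    As a rough illustration of how balanced rearrangements can be detected from gene orders, the sketch below counts breakpoints, that is, gene adjacencies present in one genome but absent from another. It is not the AnChro algorithm described in the abstract; the function names and the toy gene orders are hypothetical.

```python
def adjacencies(order):
    """Return the orientation-aware adjacencies of a linear chromosome.

    Genes are signed integers: +g is the forward strand, -g the reverse.
    The adjacency (a, b) read on one strand equals (-b, -a) read on the
    other, so each adjacency is stored in one canonical form.
    """
    adj = set()
    for a, b in zip(order, order[1:]):
        adj.add(min((a, b), (-b, -a)))
    return adj


def count_breakpoints(reference, other):
    """Count adjacencies of `reference` that are not conserved in `other`."""
    return len(adjacencies(reference) - adjacencies(other))


# Hypothetical toy chromosomes: `derived` carries an inversion of genes 2 and 3.
ancestral = [1, 2, 3, 4, 5]
derived = [1, -3, -2, 4, 5]
breakpoints = count_breakpoints(ancestral, derived)
```

    A single inversion disrupts the two adjacencies at its ends and leaves the adjacency inside the inverted segment intact, which is why breakpoint counts are a common first summary of rearrangement distance.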

  2. Vertebrate codon bias indicates a highly GC-rich ancestral genome.

    PubMed

    Nabiyouni, Maryam; Prakash, Ashwin; Fedorov, Alexei

    2013-04-25

    Two factors are thought to have contributed to the origin of codon usage bias in eukaryotes: 1) genome-wide mutational forces that shape overall GC-content and create context-dependent nucleotide bias, and 2) positive selection for codons that maximize efficient and accurate translation. Particularly in vertebrates, these two explanations contradict each other and cloud the origin of codon bias in the taxon. On the one hand, mutational forces fail to explain GC-richness (~60%) of third codon positions, given the GC-poor overall genomic composition among vertebrates (~40%). On the other hand, positive selection cannot easily explain strict regularities in codon preferences. Large-scale bioinformatic assessment of nucleotide composition of coding and non-coding sequences in vertebrates and other taxa suggests a simple possible resolution for this contradiction. Specifically, we propose that the last common vertebrate ancestor had a GC-rich genome (~65% GC). The data suggest that whole-genome mutational bias is the major driving force for generating codon bias. As the bias becomes prominent, it begins to affect translation and can result in positive selection for optimal codons. The positive selection can, in turn, significantly modulate codon preferences. PMID:23376453
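
    The contrast drawn above between overall GC content and GC content at third codon positions (often called GC3) can be computed as in the following minimal sketch. The function names and the toy coding sequence are hypothetical and are not taken from the study.

```python
def gc_fraction(seq):
    """Fraction of G or C bases in a DNA string (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_fraction(cds):
    """GC fraction at third codon positions of an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return gc_fraction(cds[2:usable:3])


# Hypothetical in-frame toy sequence (five codons), not real data.
toy_cds = "ATGGCCAAGCTGTAA"
overall_gc = gc_fraction(toy_cds)
third_position_gc = gc3_fraction(toy_cds)
```

    For a real analysis the same two quantities would be computed over whole gene sets and over non-coding sequence, which is the comparison the abstract describes.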

  3. A linear mitochondrial genome of Cyclospora cayetanensis (Eimeriidae, Eucoccidiorida, Coccidiasina, Apicomplexa) suggests the ancestral start position within mitochondrial genomes of eimeriid coccidia.

    PubMed

    Ogedengbe, Mosun E; Qvarnstrom, Yvonne; da Silva, Alexandre J; Arrowood, Michael J; Barta, John R

    2015-05-01

    The near complete mitochondrial genome for Cyclospora cayetanensis is 6184 bp in length with three protein-coding genes (Cox1, Cox3, CytB) and numerous lsrDNA and ssrDNA fragments. Gene arrangements were conserved with other coccidia in the Eimeriidae, but the C. cayetanensis mitochondrial genome is not circular-mapping. Terminal transferase tailing and nested PCR completed the 5'-terminus of the genome starting with a 21 bp A/T-only region that forms a potential stem-loop. Regions homologous to the C. cayetanensis mitochondrial genome 5'-terminus are found in all eimeriid mitochondrial genomes available and suggest this may be the ancestral start of eimeriid mitochondrial genomes. PMID:25812835

  4. Comparative Genome-Scale Reconstruction of Gapless Metabolic Networks for Present and Ancestral Species

    PubMed Central

    Pitkänen, Esa; Jouhten, Paula; Hou, Jian; Syed, Muhammad Fahad; Blomberg, Peter; Kludas, Jana; Oja, Merja; Holm, Liisa; Penttilä, Merja; Rousu, Juho; Arvas, Mikko

    2014-01-01

    We introduce a novel computational approach, CoReCo, for comparative metabolic reconstruction and provide genome-scale metabolic network models for 49 important fungal species. Leveraging the exponential growth in sequenced genome availability, our method reconstructs genome-scale gapless metabolic networks simultaneously for a large number of species by integrating sequence data in a probabilistic framework. High reconstruction accuracy is demonstrated by comparisons to the well-curated Saccharomyces cerevisiae consensus model and large-scale knock-out experiments. Our comparative approach is particularly useful in scenarios where the quality of available sequence data is lacking, and when reconstructing evolutionarily distant species. Moreover, the reconstructed networks are fully carbon mapped, allowing their use in 13C flux analysis. We demonstrate the functionality and usability of the reconstructed fungal models with a computational steady-state biomass production experiment, as these fungi include some of the most important production organisms in industrial biotechnology. In contrast to many existing reconstruction techniques, only minimal manual effort is required before the reconstructed models are usable in flux balance experiments. CoReCo is available at http://esaskar.github.io/CoReCo/. PMID:24516375

  5. Comparative Genomics of Candidate Phylum TM6 Suggests That Parasitism Is Widespread and Ancestral in This Lineage

    PubMed Central

    Yeoh, Yun Kit; Sekiguchi, Yuji; Parks, Donovan H.; Hugenholtz, Philip

    2016-01-01

    Candidate phylum TM6 is a major bacterial lineage recognized through culture-independent rRNA surveys to be low abundance members in a wide range of habitats; however, they are poorly characterized due to a lack of pure culture representatives. Two recent genomic studies of TM6 bacteria revealed small genomes and limited gene repertoire, consistent with known or inferred dependence on eukaryotic hosts for their metabolic needs. Here, we obtained additional near-complete genomes of TM6 populations from agricultural soil and upflow anaerobic sludge blanket reactor metagenomes which, together with the two publicly available TM6 genomes, represent seven distinct family level lineages in the TM6 phylum. Genome-based phylogenetic analysis confirms that TM6 is an independent phylum level lineage in the bacterial domain, possibly affiliated with the Patescibacteria superphylum. All seven genomes are small (1.0–1.5 Mb) and lack complete biosynthetic pathways for various essential cellular building blocks including amino acids, lipids, and nucleotides. These and other features identified in the TM6 genomes such as a degenerated cell envelope, ATP/ADP translocases for parasitizing host ATP pools, and protein motifs to facilitate eukaryotic host interactions indicate that parasitism is widespread in this phylum. Phylogenetic analysis of ATP/ADP translocase genes suggests that the ancestral TM6 lineage was also parasitic. We propose the name Dependentiae (phyl. nov.) to reflect dependence of TM6 bacteria on host organisms. PMID:26615204

  6. Comparative Genomics of Candidate Phylum TM6 Suggests That Parasitism Is Widespread and Ancestral in This Lineage.

    PubMed

    Yeoh, Yun Kit; Sekiguchi, Yuji; Parks, Donovan H; Hugenholtz, Philip

    2016-04-01

    Candidate phylum TM6 is a major bacterial lineage recognized through culture-independent rRNA surveys to be low abundance members in a wide range of habitats; however, they are poorly characterized due to a lack of pure culture representatives. Two recent genomic studies of TM6 bacteria revealed small genomes and limited gene repertoire, consistent with known or inferred dependence on eukaryotic hosts for their metabolic needs. Here, we obtained additional near-complete genomes of TM6 populations from agricultural soil and upflow anaerobic sludge blanket reactor metagenomes which, together with the two publicly available TM6 genomes, represent seven distinct family level lineages in the TM6 phylum. Genome-based phylogenetic analysis confirms that TM6 is an independent phylum level lineage in the bacterial domain, possibly affiliated with the Patescibacteria superphylum. All seven genomes are small (1.0-1.5 Mb) and lack complete biosynthetic pathways for various essential cellular building blocks including amino acids, lipids, and nucleotides. These and other features identified in the TM6 genomes such as a degenerated cell envelope, ATP/ADP translocases for parasitizing host ATP pools, and protein motifs to facilitate eukaryotic host interactions indicate that parasitism is widespread in this phylum. Phylogenetic analysis of ATP/ADP translocase genes suggests that the ancestral TM6 lineage was also parasitic. We propose the name Dependentiae (phyl. nov.) to reflect dependence of TM6 bacteria on host organisms. PMID:26615204

  7. Genomic structure and evolution of the ancestral chromosome fusion site in 2q13-2q14.1 and paralogous regions on other human chromosomes.

    PubMed

    Fan, Yuxin; Linardopoulou, Elena; Friedman, Cynthia; Williams, Eleanor; Trask, Barbara J

    2002-11-01

    Human chromosome 2 was formed by the head-to-head fusion of two ancestral chromosomes that remained separate in other primates. Sequences that once resided near the ends of the ancestral chromosomes are now interstitially located in 2q13-2q14.1. Portions of these sequences had duplicated to other locations prior to the fusion. Here we present analyses of the genomic structure and evolutionary history of >600 kb surrounding the fusion site and closely related sequences on other human chromosomes. Sequence blocks that closely flank the inverted arrays of degenerate telomere repeats marking the fusion site are duplicated at many, primarily subtelomeric, locations. In addition, large portions of a 168-kb centromere-proximal block are duplicated at 9pter, 9p11.2, and 9q13, with 98%-99% average sequence identity. A 67-kb block on the distal side of the fusion site is highly homologous to sequences at 22qter. A third ~100-kb segment is 96% identical to a region in 2q11.2. By integrating data on the extent and similarity of these paralogous blocks, including the presence of phylogenetically informative repetitive elements, with observations of their chromosomal distribution in nonhuman primates, we infer the order of the duplications that led to their current arrangement. Several of these duplicated blocks may be associated with breakpoints of inversions that occurred during primate evolution and of recurrent chromosome rearrangements in humans. PMID:12421751

  8. Evaluation of the TREX1 gene in a large multi-ancestral lupus cohort

    PubMed Central

    Namjou, Bahram; Kothari, Parul H.; Kelly, Jennifer A.; Glenn, Stuart B.; Ojwang, Joshua O.; Adler, Adam; Alarcón-Riquelme, Marta E.; Gallant, Caroline J.; Boackle, Susan A.; Criswell, Lindsey A.; Kimberly, Robert P.; Brown, Elizabeth; Edberg, Jeffrey; Stevens, Anne M.; Jacob, Chaim O.; Tsao, Betty P.; Gilkeson, Gary S.; Kamen, Diane L.; Merrill, Joan T.; Petri, Michelle; Goldman, Rosalind Ramsey; Vila, Luis M.; Anaya, Juan-Manuel; Niewold, Timothy B.; Martin, Javier; Pons-Estel, Bernardo A.; Sabio, Jose M.; Callejas, Jose L.; Vyse, Timothy J.; Bae, Sang-Cheol; Perrino, Fred W.; Freedman, Barry I.; Scofield, R. Hal; Moser, Kathy L.; Gaffney, Patrick M.; James, Judith A.; Langefeld, Carl D.; Kaufman, Kenneth M.; Harley, John B.; Atkinson, John P.

    2011-01-01

    Systemic Lupus Erythematosus (SLE) is a prototypic autoimmune disorder with a complex pathogenesis in which genetic, hormonal and environmental factors play a role. Rare mutations in the TREX1 gene, the major mammalian 3′-5′ exonuclease, have been reported in sporadic SLE cases. Some of these mutations have also been identified in a rare pediatric neurologic condition featuring an inflammatory encephalopathy known as Aicardi-Goutières syndrome (AGS). We sought to investigate the frequency of these mutations in a large multi-ancestral cohort of SLE cases and controls. Methods Forty single-nucleotide polymorphisms (SNPs), including both common and rare variants, across the TREX1 gene were evaluated in ∼8370 patients with SLE and ∼7490 control subjects. Stringent quality control procedures were applied and principal components and admixture proportions were calculated to identify outliers for removal from analysis. Population-based case-control association analyses were performed. P values, false discovery rate q values, and odds ratios with 95% confidence intervals were calculated. Results The estimated frequency of TREX1 mutations in our lupus cohort was 0.5%. Five heterozygous mutations were detected at the Y305C polymorphism in European lupus cases but none were observed in European controls. Five African cases incurred heterozygous mutations at the E266G polymorphism and, again, none were observed in the African controls. A rare homozygous R114H mutation was identified in one Asian SLE patient whereas all genotypes at this mutation in previous reports for SLE were heterozygous. Analysis of common TREX1 SNPs (MAF >10%) revealed a relatively common risk haplotype in European SLE patients with neurologic manifestations, especially seizures, with a frequency of 58% in lupus cases compared to 45% in normal controls (p=0.0008, OR=1.73, 95% CI=1.25-2.39). Finally, the presence or absence of specific autoantibodies in certain populations produced significant

  9. Phylogenomics of primates and their ancestral populations

    PubMed Central

    Siepel, Adam

    2009-01-01

    Genome assemblies are now available for nine primate species, and large-scale sequencing projects are underway or approved for six others. An explicitly evolutionary and phylogenetic approach to comparative genomics, called phylogenomics, will be essential in unlocking the valuable information about evolutionary history and genomic function that is contained within these genomes. However, most phylogenomic analyses so far have ignored the effects of variation in ancestral populations on patterns of sequence divergence. These effects can be pronounced in the primates, owing to large ancestral effective population sizes relative to the intervals between speciation events. In particular, local genealogies can vary considerably across loci, which can produce biases and diminished power in many phylogenomic analyses of interest, including phylogeny reconstruction, the identification of functional elements, and the detection of natural selection. At the same time, this variation in genealogies can be exploited to gain insight into the nature of ancestral populations. In this Perspective, I explore this area of intersection between phylogenetics and population genetics, and its implications for primate phylogenomics. I begin by “lifting the hood” on the conventional tree-like representation of the phylogenetic relationships between species, to expose the population-genetic processes that operate along its branches. Next, I briefly review an emerging literature that makes use of the complex relationships among coalescence, recombination, and speciation to produce inferences about evolutionary histories, ancestral populations, and natural selection. Finally, I discuss remaining challenges and future prospects at this nexus of phylogenetics, population genetics, and genomics. PMID:19801602

  10. Ancestral whole-genome duplication in the marine chelicerate horseshoe crabs.

    PubMed

    Kenny, N J; Chan, K W; Nong, W; Qu, Z; Maeso, I; Yip, H Y; Chan, T F; Kwan, H S; Holland, P W H; Chu, K H; Hui, J H L

    2016-02-01

    Whole-genome duplication (WGD) results in new genomic resources that can be exploited by evolution for rewiring genetic regulatory networks in organisms. In metazoans, WGD occurred before the last common ancestor of vertebrates, and has been postulated as a major evolutionary force that contributed to their speciation and diversification of morphological structures. Here, we have sequenced genomes from three of the four extant species of horseshoe crabs: Carcinoscorpius rotundicauda, Limulus polyphemus and Tachypleus tridentatus. Phylogenetic and sequence analyses of their Hox and other homeobox genes, which encode crucial transcription factors and have been used as indicators of WGD in animals, strongly suggest that WGD happened before the last common ancestor of these marine chelicerates >135 million years ago. Signatures of subfunctionalisation of paralogues of Hox genes are revealed in the appendages of two species of horseshoe crabs. Further, residual homeobox pseudogenes are observed in the three lineages. The existence of WGD in the horseshoe crabs, noted for relative morphological stasis over geological time, suggests that genomic diversity need not always be reflected phenotypically, in contrast to the suggested situation in vertebrates. This study provides evidence of ancient WGD in the ecdysozoan lineage, and reveals new opportunities for studying genomic and regulatory evolution after WGD in the Metazoa. PMID:26419336

  11. The complete mitochondrial genomes of two ghost moths, Thitarodes renzhiensis and Thitarodes yunnanensis: the ancestral gene arrangement in Lepidoptera

    PubMed Central

    2012-01-01

    Background Lepidoptera encompasses more than 160,000 described species that have been classified into 45–48 superfamilies. The previously determined Lepidoptera mitochondrial genomes (mitogenomes) are limited to six superfamilies of the lineage Ditrysia. Compared with the ancestral insect gene order, these mitogenomes all contain a tRNA rearrangement. To gain new insights into Lepidoptera mitogenome evolution, we sequenced the mitogenomes of two ghost moths that belong to the non-ditrysian lineage Hepialoidea and conducted a comparative mitogenomic analysis across Lepidoptera. Results The mitogenomes of Thitarodes renzhiensis and T. yunnanensis are 16,173 bp and 15,816 bp long with an A + T content of 81.28 % and 82.34 %, respectively. Both mitogenomes include 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and the A + T-rich region. Different tandem repeats in the A + T-rich region mainly account for the size difference between the two mitogenomes. All the protein-coding genes start with typical mitochondrial initiation codons, except for cox1 (CGA) and nad1 (TTG) in both mitogenomes. The anticodon of trnS(AGN) in T. renzhiensis and T. yunnanensis is UCU instead of the mostly used GCU in other sequenced Lepidoptera mitogenomes. The 1,584-bp sequence from rrnS to nad2 was also determined for an unspecified ghost moth (Thitarodes sp.), which has no repetitive sequence in the A + T-rich region. All three Thitarodes species possess the ancestral gene order with trnI-trnQ-trnM located between the A + T-rich region and nad2, which is different from the gene order trnM-trnI-trnQ in all previously sequenced Lepidoptera species. The formerly identified conserved elements of Lepidoptera mitogenomes (i.e. the motif ‘ATAGA’ and poly-T stretch in the A + T-rich region and the long intergenic spacer upstream of nad2) are absent in the Thitarodes mitogenomes. Conclusion The mitogenomes of T. renzhiensis and T

  12. Reconstruction of the ancestral marsupial karyotype from comparative gene maps

    PubMed Central

    2013-01-01

    Background The increasing number of assembled mammalian genomes makes it possible to compare genome organisation across mammalian lineages and reconstruct chromosomes of the ancestral marsupial and therian (marsupial and eutherian) mammals. However, the reconstruction of ancestral genomes requires genome assemblies to be anchored to chromosomes. The recently sequenced tammar wallaby (Macropus eugenii) genome was assembled into over 300,000 contigs. We previously devised an efficient strategy for mapping large evolutionarily conserved blocks in non-model mammals, and applied this to determine the arrangement of conserved blocks on all wallaby chromosomes, thereby permitting comparative maps to be constructed and resolving the long-debated issue between a 2n = 14 and 2n = 22 ancestral marsupial karyotype. Results We identified large blocks of genes conserved between human and opossum, and mapped genes corresponding to the ends of these blocks by fluorescence in situ hybridization (FISH). A total of 242 genes was assigned to wallaby chromosomes in the present study, bringing the total number of genes mapped to 554 and making it the most densely cytogenetically mapped marsupial genome. We used these gene assignments to construct comparative maps between wallaby and opossum, which uncovered many intrachromosomal rearrangements, particularly for genes found on wallaby chromosomes X and 3. Expanding comparisons to include chicken and human permitted the putative ancestral marsupial (2n = 14) and therian mammal (2n = 19) karyotypes to be reconstructed. Conclusions Our physical mapping data for the tammar wallaby has uncovered the events shaping marsupial genomes and enabled us to predict the ancestral marsupial karyotype, supporting a 2n = 14 ancestor. Furthermore, our predicted therian ancestral karyotype has helped to understand the evolution of the ancestral eutherian genome. PMID:24261750

  13. Ancient human genomes suggest three ancestral populations for present-day Europeans.

    PubMed

    Lazaridis, Iosif; Patterson, Nick; Mittnik, Alissa; Renaud, Gabriel; Mallick, Swapan; Kirsanow, Karola; Sudmant, Peter H; Schraiber, Joshua G; Castellano, Sergi; Lipson, Mark; Berger, Bonnie; Economou, Christos; Bollongino, Ruth; Fu, Qiaomei; Bos, Kirsten I; Nordenfelt, Susanne; Li, Heng; de Filippo, Cesare; Prüfer, Kay; Sawyer, Susanna; Posth, Cosimo; Haak, Wolfgang; Hallgren, Fredrik; Fornander, Elin; Rohland, Nadin; Delsate, Dominique; Francken, Michael; Guinet, Jean-Michel; Wahl, Joachim; Ayodo, George; Babiker, Hamza A; Bailliet, Graciela; Balanovska, Elena; Balanovsky, Oleg; Barrantes, Ramiro; Bedoya, Gabriel; Ben-Ami, Haim; Bene, Judit; Berrada, Fouad; Bravi, Claudio M; Brisighelli, Francesca; Busby, George B J; Cali, Francesco; Churnosov, Mikhail; Cole, David E C; Corach, Daniel; Damba, Larissa; van Driem, George; Dryomov, Stanislav; Dugoujon, Jean-Michel; Fedorova, Sardana A; Gallego Romero, Irene; Gubina, Marina; Hammer, Michael; Henn, Brenna M; Hervig, Tor; Hodoglugil, Ugur; Jha, Aashish R; Karachanak-Yankova, Sena; Khusainova, Rita; Khusnutdinova, Elza; Kittles, Rick; Kivisild, Toomas; Klitz, William; Kučinskas, Vaidutis; Kushniarevich, Alena; Laredj, Leila; Litvinov, Sergey; Loukidis, Theologos; Mahley, Robert W; Melegh, Béla; Metspalu, Ene; Molina, Julio; Mountain, Joanna; Näkkäläjärvi, Klemetti; Nesheva, Desislava; Nyambo, Thomas; Osipova, Ludmila; Parik, Jüri; Platonov, Fedor; Posukh, Olga; Romano, Valentino; Rothhammer, Francisco; Rudan, Igor; Ruizbakiev, Ruslan; Sahakyan, Hovhannes; Sajantila, Antti; Salas, Antonio; Starikovskaya, Elena B; Tarekegn, Ayele; Toncheva, Draga; Turdikulova, Shahlo; Uktveryte, Ingrida; Utevska, Olga; Vasquez, René; Villena, Mercedes; Voevoda, Mikhail; Winkler, Cheryl A; Yepiskoposyan, Levon; Zalloua, Pierre; Zemunik, Tatijana; Cooper, Alan; Capelli, Cristian; Thomas, Mark G; Ruiz-Linares, Andres; Tishkoff, Sarah A; Singh, Lalji; Thangaraj, Kumarasamy; Villems, Richard; Comas, David; Sukernik, Rem; Metspalu, Mait; Meyer, Matthias; Eichler, Evan E; Burger, Joachim; Slatkin, Montgomery; Pääbo, Svante; Kelso, Janet; Reich, David; Krause, Johannes

    2014-09-18

    We sequenced the genomes of a ∼7,000-year-old farmer from Germany and eight ∼8,000-year-old hunter-gatherers from Luxembourg and Sweden. We analysed these and other ancient genomes with 2,345 contemporary humans to show that most present-day Europeans derive from at least three highly differentiated populations: west European hunter-gatherers, who contributed ancestry to all Europeans but not to Near Easterners; ancient north Eurasians related to Upper Palaeolithic Siberians, who contributed to both Europeans and Near Easterners; and early European farmers, who were mainly of Near Eastern origin but also harboured west European hunter-gatherer related ancestry. We model these populations' deep relationships and show that early European farmers had ∼44% ancestry from a 'basal Eurasian' population that split before the diversification of other non-African lineages. PMID:25230663

  14. Ancient human genomes suggest three ancestral populations for present-day Europeans

    PubMed Central

    Lazaridis, Iosif; Patterson, Nick; Mittnik, Alissa; Renaud, Gabriel; Mallick, Swapan; Kirsanow, Karola; Sudmant, Peter H.; Schraiber, Joshua G.; Castellano, Sergi; Lipson, Mark; Berger, Bonnie; Economou, Christos; Bollongino, Ruth; Fu, Qiaomei; Bos, Kirsten I.; Nordenfelt, Susanne; Li, Heng; de Filippo, Cesare; Prüfer, Kay; Sawyer, Susanna; Posth, Cosimo; Haak, Wolfgang; Hallgren, Fredrik; Fornander, Elin; Rohland, Nadin; Delsate, Dominique; Francken, Michael; Guinet, Jean-Michel; Wahl, Joachim; Ayodo, George; Babiker, Hamza A.; Bailliet, Graciela; Balanovska, Elena; Balanovsky, Oleg; Barrantes, Ramiro; Bedoya, Gabriel; Ben-Ami, Haim; Bene, Judit; Berrada, Fouad; Bravi, Claudio M.; Brisighelli, Francesca; Busby, George B. J.; Cali, Francesco; Churnosov, Mikhail; Cole, David E. C.; Corach, Daniel; Damba, Larissa; van Driem, George; Dryomov, Stanislav; Dugoujon, Jean-Michel; Fedorova, Sardana A.; Romero, Irene Gallego; Gubina, Marina; Hammer, Michael; Henn, Brenna M.; Hervig, Tor; Hodoglugil, Ugur; Jha, Aashish R.; Karachanak-Yankova, Sena; Khusainova, Rita; Khusnutdinova, Elza; Kittles, Rick; Kivisild, Toomas; Klitz, William; Kučinskas, Vaidutis; Kushniarevich, Alena; Laredj, Leila; Litvinov, Sergey; Loukidis, Theologos; Mahley, Robert W.; Melegh, Béla; Metspalu, Ene; Molina, Julio; Mountain, Joanna; Näkkäläjärvi, Klemetti; Nesheva, Desislava; Nyambo, Thomas; Osipova, Ludmila; Parik, Jüri; Platonov, Fedor; Posukh, Olga; Romano, Valentino; Rothhammer, Francisco; Rudan, Igor; Ruizbakiev, Ruslan; Sahakyan, Hovhannes; Sajantila, Antti; Salas, Antonio; Starikovskaya, Elena B.; Tarekegn, Ayele; Toncheva, Draga; Turdikulova, Shahlo; Uktveryte, Ingrida; Utevska, Olga; Vasquez, René; Villena, Mercedes; Voevoda, Mikhail; Winkler, Cheryl; Yepiskoposyan, Levon; Zalloua, Pierre; Zemunik, Tatijana; Cooper, Alan; Capelli, Cristian; Thomas, Mark G.; Ruiz-Linares, Andres; Tishkoff, Sarah A.; Singh, Lalji; Thangaraj, Kumarasamy; Villems, Richard; Comas, David; Sukernik, Rem; Metspalu, Mait; Meyer, Matthias; Eichler, Evan E.; Burger, Joachim; Slatkin, Montgomery; Pääbo, Svante; Kelso, Janet; Reich, David; Krause, Johannes

    2014-01-01

    We sequenced the genomes of a ~7,000 year old farmer from Germany and eight ~8,000 year old hunter-gatherers from Luxembourg and Sweden. We analyzed these and other ancient genomes [1–4] with 2,345 contemporary humans to show that most present Europeans derive from at least three highly differentiated populations: West European Hunter-Gatherers (WHG), who contributed ancestry to all Europeans but not to Near Easterners; Ancient North Eurasians (ANE) related to Upper Paleolithic Siberians [3], who contributed to both Europeans and Near Easterners; and Early European Farmers (EEF), who were mainly of Near Eastern origin but also harbored WHG-related ancestry. We model these populations’ deep relationships and show that EEF had ~44% ancestry from a “Basal Eurasian” population that split prior to the diversification of other non-African lineages. PMID:25230663

  15. Genesis of the vertebrate FoxP subfamily member genes occurred during two ancestral whole genome duplication events.

    PubMed

    Song, Xiaowei; Tang, Yezhong; Wang, Yajun

    2016-08-22

    The vertebrate FoxP subfamily genes play important roles in the construction of essential functional modules involved in physiological and developmental processes. To explore the adaptive evolution of functional modules associated with the FoxP subfamily member genes, it is necessary to study the gene duplication process. We detected four member genes of the FoxP subfamily in sea lampreys (a representative species of jawless vertebrates) through genome screenings and phylogenetic analyses. Reliable paralogons (i.e. paralogous chromosome segments) have rarely been detected in scaffolds of FoxP subfamily member genes in sea lampreys due to the considerable existence of HTH_Tnp_Tc3_2 transposases. However, these transposases did not alter gene numbers of the FoxP subfamily in sea lampreys. The coincidence between the "1-4" gene duplication pattern of FoxP subfamily genes from invertebrates to vertebrates and two rounds of ancestral whole genome duplication (1R- and 2R-WGD) events reveals that the FoxP subfamily of vertebrates was quadruplicated in the 1R- and 2R-WGD events. Furthermore, we deduced that a synchronous gene duplication process occurred for the FoxP subfamily and for three linked gene families/subfamilies (i.e. MIT family, mGluR group III and PLXNA subfamily) in the 1R- and 2R-WGD events using phylogenetic analyses and mirror-dendrogram methods (i.e. algorithms to test protein-protein interactions). Specifically, the ancestor of FoxP1 and FoxP3 and the ancestor of FoxP2 and FoxP4 were generated in the 1R-WGD event. In the subsequent 2R-WGD event, these two ancestral genes gave rise to FoxP1, FoxP2, FoxP3 and FoxP4. The elucidation of these gene duplication processes sheds light on the phylogenetic relationships between functional modules of the FoxP subfamily member genes. PMID:27188254

  16. Calibrating the Human Mutation Rate via Ancestral Recombination Density in Diploid Genomes

    PubMed Central

    Lipson, Mark; Loh, Po-Ru; Sankararaman, Sriram; Patterson, Nick; Berger, Bonnie; Reich, David

    2015-01-01

    The human mutation rate is an essential parameter for studying the evolution of our species, interpreting present-day genetic variation, and understanding the incidence of genetic disease. Nevertheless, our current estimates of the rate are uncertain. Most notably, recent approaches based on counting de novo mutations in family pedigrees have yielded significantly smaller values than classical methods based on sequence divergence. Here, we propose a new method that uses the fine-scale human recombination map to calibrate the rate of accumulation of mutations. By comparing local heterozygosity levels in diploid genomes to the genetic distance scale over which these levels change, we are able to estimate a long-term mutation rate averaged over hundreds or thousands of generations. We infer a rate of (1.61 ± 0.13) × 10⁻⁸ mutations per base per generation, which falls in between phylogenetic and pedigree-based estimates, and we suggest possible mechanisms to reconcile our estimate with previous studies. Our results support intermediate-age divergences among human populations and between humans and other great apes. PMID:26562831
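
    To relate the per-generation estimate above to divergence dating, it is usually converted to a per-year rate. The short sketch below does that arithmetic. The 29-year generation time is an assumption made here for illustration and is not stated in the abstract; the constant and function names are likewise hypothetical.

```python
RATE_PER_GENERATION = 1.61e-8  # from the abstract: mutations per base per generation
RATE_UNCERTAINTY = 0.13e-8     # reported +/- on that estimate
ASSUMED_GENERATION_YEARS = 29.0  # assumption for illustration, not from the abstract


def per_year_rate(rate_per_generation, generation_years):
    """Convert a per-generation mutation rate to a per-year rate."""
    return rate_per_generation / generation_years


central = per_year_rate(RATE_PER_GENERATION, ASSUMED_GENERATION_YEARS)
low = per_year_rate(RATE_PER_GENERATION - RATE_UNCERTAINTY, ASSUMED_GENERATION_YEARS)
high = per_year_rate(RATE_PER_GENERATION + RATE_UNCERTAINTY, ASSUMED_GENERATION_YEARS)
```

    Under that assumed generation time the central value is about 5.6 × 10⁻¹⁰ mutations per base per year; a different generation time shifts the per-year rate proportionally.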

  17. Genome-wide association study and ancestral origins of the slick-hair coat in tropically adapted cattle

    PubMed Central

    Huson, Heather J.; Kim, Eui-Soo; Godfrey, Robert W.; Olson, Timothy A.; McClure, Matthew C.; Chase, Chad C.; Rizzi, Rita; O'Brien, Ana M. P.; Van Tassell, Curt P.; Garcia, José F.; Sonstegard, Tad S.

    2014-01-01

    The slick hair coat (SLICK) is a dominantly inherited trait typically associated with tropically adapted cattle that are of Criollo descent through Spanish colonization of cattle into the New World. The trait is of interest relative to climate change, due to its association with improved thermo-tolerance and subsequent increased productivity. Previous studies localized the SLICK locus to a 4 cM region on chromosome (BTA) 20 and identified signatures of selection in this region derived from Senepol cattle. The current study compares three slick-haired Criollo-derived breeds including Senepol, Carora, and Romosinuano and three additional slick-haired cross-bred lineages to non-slick ancestral breeds. Genome-wide association (GWA), haplotype analysis, signatures of selection, runs of homozygosity (ROH), and identity by state (IBS) calculations were used to identify a 0.8 Mb (37.7–38.5 Mb) consensus region for the SLICK locus on BTA20, which contains SKP2 and SPEF2 as possible candidate genes. Three specific haplotype patterns are identified in slick individuals, all with zero frequency in non-slick individuals. Admixture analysis identified common genetic patterns between the three slick breeds at the SLICK locus. Principal component analysis (PCA) and admixture results show Senepol and Romosinuano sharing a higher degree of genetic similarity to one another with a much lesser degree of similarity to Carora. Variation in GWA, haplotype analysis, and IBS calculations with accompanying population structure information supports potentially two mutations, one common to Senepol and Romosinuano and another in Carora, affecting genes contained within our refined location for the SLICK locus. PMID:24808908

  18. The first complete mitochondrial genome sequences of Amblypygi (Chelicerata: Arachnida) reveal conservation of the ancestral arthropod gene order.

    PubMed

    Fahrein, Kathrin; Masta, Susan E; Podsiadlowski, Lars

    2009-05-01

    Amblypygi (whip spiders) are terrestrial chelicerates inhabiting the subtropics and tropics. In morphological and rRNA-based phylogenetic analyses, Amblypygi cluster with Uropygi (whip scorpions) and Araneae (spiders) to form the taxon Tetrapulmonata, but there is controversy regarding the interrelationship of these three taxa. Mitochondrial genomes provide an additional large data set of phylogenetic information (sequences, gene order, RNA secondary structure), but in arachnids, mitochondrial genome data are missing for some of the major orders. In the course of an ongoing project concerning arachnid mitochondrial genomics, we present the first two complete mitochondrial genomes from Amblypygi. Both genomes were found to be typical circular duplex DNA molecules with all 37 genes usually present in bilaterian mitochondrial genomes. In both species, gene order is identical to that of Limulus polyphemus (Xiphosura), which is assumed to reflect the putative arthropod ground pattern. All tRNA gene sequences have the potential to fold into structures that are typical of metazoan mitochondrial tRNAs, except for tRNA-Ala, which lacks the D arm in both amblypygids, suggesting the loss of this feature early in amblypygid evolution. Phylogenetic analysis resulted in weak support for Uropygi being the sister group of Amblypygi. PMID:19448726

  19. Genome Content and Phylogenomics Reveal both Ancestral and Lateral Evolutionary Pathways in Plant-Pathogenic Streptomyces Species.

    PubMed

    Huguet-Tapia, Jose C; Lefebure, Tristan; Badger, Jonathan H; Guan, Dongli; Pettis, Gregg S; Stanhope, Michael J; Loria, Rosemary

    2016-04-01

    Streptomyces spp. are highly differentiated actinomycetes with large, linear chromosomes that encode an arsenal of biologically active molecules and catabolic enzymes. Members of this genus are well equipped for life in nutrient-limited environments and are common soil saprophytes. Out of the hundreds of species in the genus Streptomyces, a small group has evolved the ability to infect plants. The recent availability of Streptomyces genome sequences, including four genomes of pathogenic species, provided an opportunity to characterize the gene content specific to these pathogens and to study phylogenetic relationships among them. Genome sequencing, comparative genomics, and phylogenetic analysis enabled us to discriminate pathogenic from saprophytic Streptomyces strains; moreover, we calculated that the pathogen-specific genome contains 4,662 orthologs. Phylogenetic reconstruction suggested that Streptomyces scabies and S. ipomoeae share an ancestor but that their biosynthetic clusters encoding the required virulence factor thaxtomin have diverged. In contrast, S. turgidiscabies and S. acidiscabies, two relatively unrelated pathogens, possess highly similar thaxtomin biosynthesis clusters, which suggests that the acquisition of these genes was through lateral gene transfer. PMID:26826232

  20. Comparative sequence analyses indicate that Coffea (Asterids) and Vitis (Rosids) derive from the same paleo-hexaploid ancestral genome.

    PubMed

    Cenci, Alberto; Combes, Marie-Christine; Lashermes, Philippe

    2010-05-01

    The complete sequence of Vitis vinifera revealed that the rosid clade derives from a hexaploid ancestor. At present, no analysis of complete genome sequence is available for an asterid, the other large eudicot clade, which includes the economically important species potato, tomato and coffee. To elucidate the genomic history of asterids, we compared the sequence of an 800 kb region of diploid Coffea genome to the orthologous regions of V. vinifera, Populus trichocarpa and Arabidopsis thaliana. We found a very high level of collinearity between around 80 genes of the three rosid species and Coffea. Collinearity comparisons between orthologous and paralogous regions indicate that (1) the Coffea (and consequently all asterids) and rosids share the same hexaploid ancestor; (2) the diploidization process (loss of duplicated and redundant copies from the whole genome duplication) was very advanced in the most recent common ancestor of rosids and asterids. Finally, no additional polyploidization events were detected in the Coffea lineage. Differences in gene loss rates were detected among the three rosid species and linked to the divergence in protein sequences. PMID:20361338

  1. Annotating Large Genomes With Exact Word Matches

    PubMed Central

    Healy, John; Thomas, Elizabeth E.; Schwartz, Jacob T.; Wigler, Michael

    2003-01-01

    We have developed a tool for rapidly determining the number of exact matches of any word within large, internally repetitive genomes or sets of genomes. Thus we can readily annotate any sequence, including the entire human genome, with the counts of its constituent words. We create a Burrows-Wheeler transform of the genome, which together with auxiliary data structures facilitating counting, can reside in about one gigabyte of RAM. Our original interest was motivated by oligonucleotide probe design, and we describe a general protocol for defining unique hybridization probes. But our method also has applications for the analysis of genome structure and assembly. We demonstrate the identification of chromosome-specific repeats, and outline a general procedure for finding undiscovered repeats. We also illustrate the changing contents of the human genome assemblies by comparing the annotations built from different genome freezes. PMID:12975312
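
    The counting idea in this abstract can be illustrated with a toy sketch. This is not the authors' tool: it is a minimal FM-index style backward search over a Burrows-Wheeler transform, assuming a plain uppercase string as input and a naive suffix sort that would be far too slow and memory-hungry for a real genome. The names bwt_index and count_word are hypothetical.

```python
from collections import Counter


def bwt_index(text):
    """Build a toy index: BWT-derived counting tables for `text`."""
    text += "$"  # sentinel, assumed absent from the input and smaller than any base
    # Naive suffix array; acceptable for a demo only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    # first[c]: number of characters in the text that sort strictly before c.
    counts = Counter(bwt)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return first, occ, len(bwt)


def count_word(index, word):
    """Count exact (possibly overlapping) occurrences of `word` by backward search."""
    first, occ, n = index
    lo, hi = 0, n  # current suffix-array interval [lo, hi)
    for c in reversed(word):
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = bwt_index("GATTACAGATTACA")
print(count_word(index, "GATTACA"))
```

    Each query costs time proportional to the word length, independent of genome size, which is the property that makes annotating every word of a sequence with its count practical. A production index would compress or sample the occurrence table rather than store it in full.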

  2. Atypical regions in large genomic DNA sequences

    SciTech Connect

    Scherer, S.; McPeek, M.S.; Speed, T.P.

    1994-07-19

    Large genomic DNA sequences contain regions with distinctive patterns of sequence organization. The authors describe a method using logarithms of probabilities based on seventh-order Markov chains to rapidly identify genomic sequences that do not resemble models of genome organization built from compilations of octanucleotide usage. Data bases have been constructed from Escherichia coli and Saccharomyces cerevisiae DNA sequences of >1000 nt and human sequences of >10,000 nt. Atypical genes and clusters of genes have been located in bacteriophage, yeast, and primate DNA sequences. The authors consider criteria for statistical significance of the results, offer possible explanations for the observed variation in genome organization, and give additional applications of these methods in DNA sequence analysis.
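
    A rough sketch of the scoring idea described above, under stated assumptions: a k-th order Markov model is estimated from (k+1)-mer counts (octanucleotides for k = 7), and a window is scored by its mean log conditional probability, so that low scores flag atypical regions. The add-one pseudocount, the function names, and the tiny order-2 demo are illustrative choices, not details taken from the report.

```python
import math
from collections import Counter


def train(seqs, order=7, pseudo=1.0):
    """Count contexts (length `order`) and full words (length `order` + 1)."""
    ctx, full = Counter(), Counter()
    for s in seqs:
        for i in range(len(s) - order):
            full[s[i:i + order + 1]] += 1
            ctx[s[i:i + order]] += 1
    return ctx, full, order, pseudo


def mean_log_prob(model, window):
    """Mean log P(base | previous `order` bases) over the window."""
    ctx, full, order, pseudo = model
    steps = len(window) - order
    if steps <= 0:
        raise ValueError("window must be longer than the model order")
    total = 0.0
    for i in range(steps):
        word = window[i:i + order + 1]
        # Smoothed conditional probability over the 4-letter DNA alphabet.
        p = (full[word] + pseudo) / (ctx[word[:-1]] + 4 * pseudo)
        total += math.log(p)
    return total / steps


# Tiny demo with order 2 so the toy training text is large enough.
model = train(["ACGT" * 50], order=2)
typical = mean_log_prob(model, "ACGTACGTACGT")
atypical = mean_log_prob(model, "AAAAAAAAAAAA")
print(typical > atypical)
```

    In practice the model would be trained on a large compilation of sequences from the organism, and the score would be computed in sliding windows along the query sequence.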

  3. Gene map of large yellow croaker (Larimichthys crocea) provides insights into teleost genome evolution and conserved regions associated with growth

    PubMed Central

    Xiao, Shijun; Wang, Panpan; Zhang, Yan; Fang, Lujing; Liu, Yang; Li, Jiong-Tang; Wang, Zhi-Yong

    2015-01-01

    The genetic map of a species is essential for its whole genome assembly and can be applied to the mapping of important traits. In this study, we performed RNA-seq for a family of large yellow croakers (Larimichthys crocea) and constructed a high-density genetic map. In this map, 24 linkage groups comprised 3,448 polymorphic SNP markers. Approximately 72.4% (2,495) of the markers were located in protein-coding regions. Comparison of the croaker genome with those of five model fish species revealed that the croaker genome structure was closer to that of the medaka than to the remaining four genomes. Because the medaka genome preserves the teleost ancestral karyotype, this result indicated that the croaker genome might also maintain the teleost ancestral genome structure. The analysis also revealed different genome rearrangements across teleosts. QTL mapping and association analysis consistently identified growth-related QTL regions and associated genes. Orthologs of the associated genes in other species were demonstrated to regulate development, indicating that these genes might regulate development and growth in croaker. This gene map will enable us to construct the croaker genome for comparative studies and to provide an important resource for selective breeding of croaker. PMID:26689832

  4. A Dense Linkage Map for Chinook salmon (Oncorhynchus tshawytscha) Reveals Variable Chromosomal Divergence After an Ancestral Whole Genome Duplication Event

    PubMed Central

    Brieuc, Marine S. O.; Waters, Charles D.; Seeb, James E.; Naish, Kerry A.

    2014-01-01

    Comparisons between the genomes of salmon species reveal that they underwent extensive chromosomal rearrangements following whole genome duplication that occurred in their lineage 58−63 million years ago. Extant salmonids are diploid, but occasional pairing between homeologous chromosomes exists in males. The consequences of re-diploidization can be characterized by mapping the position of duplicated loci in such species. Linkage maps are also a valuable tool for genome-wide applications such as genome-wide association studies, quantitative trait loci mapping or genome scans. Here, we investigated chromosomal evolution in Chinook salmon (Oncorhynchus tshawytscha) after genome duplication by mapping 7146 restriction-site associated DNA loci in gynogenetic haploid, gynogenetic diploid, and diploid crosses. In the process, we developed a reference database of restriction-site associated DNA loci for Chinook salmon comprising 48528 non-duplicated loci and 6409 known duplicated loci, which will facilitate locus identification and data sharing. We created a very dense linkage map anchored to all 34 chromosomes for the species, and all arms were identified through centromere mapping. The map positions of 799 duplicated loci revealed that homeologous pairs have diverged at different rates following whole genome duplication, and that degree of differentiation along arms was variable. Many of the homeologous pairs with high numbers of duplicated markers appear conserved with other salmon species, suggesting that retention of conserved homeologous pairing in some arms preceded species divergence. As chromosome arms are highly conserved across species, the major resources developed for Chinook salmon in this study are also relevant for other related species. PMID:24381192

  5. Global Alignment System for Large Genomic Sequencing

    Energy Science and Technology Software Center (ESTSC)

    2002-03-01

    AVID is a global alignment system tailored for the alignment of large genomic sequences up to megabases in length. Features include the possibility of one sequence being in draft form, fast alignment, robustness and accuracy. The method is an anchor based alignment using maximal matches derived from suffix trees.

  6. TYPES AND RATES OF SEQUENCE EVOLUTION AT HMW-GLUTENIN LOCUS IN HEXAPLOID WHEAT AND ITS ANCESTRAL GENOMES

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The Glu-1 locus, encoding the High Molecular Weight-glutenin protein subunits, controls bread-making quality in hexaploid wheat (Triticum aestivum) and represents a recently evolved region unique to Triticeae genomes. To understand the molecular evolution of this locus region, three orthologous Glu...

  7. Rapid genome-wide evolution in Brassica rapa populations following drought revealed by sequencing of ancestral and descendant gene pools.

    PubMed

    Franks, Steven J; Kane, Nolan C; O'Hara, Niamh B; Tittes, Silas; Rest, Joshua S

    2016-08-01

    There is increasing evidence that evolution can occur rapidly in response to selection. Recent advances in sequencing suggest the possibility of documenting genetic changes as they occur in populations, thus uncovering the genetic basis of evolution, particularly if samples are available from both before and after selection. Here, we had a unique opportunity to directly assess genetic changes in natural populations following an evolutionary response to a fluctuation in climate. We analysed genome-wide differences between ancestors and descendants of natural populations of Brassica rapa plants from two locations that rapidly evolved changes in multiple phenotypic traits, including flowering time, following a multiyear late-season drought in California. These ancestor-descendant comparisons revealed evolutionary shifts in allele frequencies in many genes. Some genes showing evolutionary shifts have functions related to drought stress and flowering time, consistent with an adaptive response to selection. Loci differentiated between ancestors and descendants (FST outliers) were generally different from those showing signatures of selection based on site frequency spectrum analysis (Tajima's D), indicating that the loci that evolved in response to the recent drought and those under historical selection were generally distinct. Very few genes showed similar evolutionary responses between two geographically distinct populations, suggesting independent genetic trajectories of evolution yielding parallel phenotypic changes. The results show that selection can result in rapid genome-wide evolutionary shifts in allele frequencies in natural populations, and highlight the usefulness of combining resurrection experiments in natural populations with genomics for studying the genetic basis of adaptive evolution. PMID:27072809

  8. Evolutionary genomics of nucleo-cytoplasmic large DNA viruses.

    PubMed

    Iyer, Lakshminarayan M; Balaji, S; Koonin, Eugene V; Aravind, L

    2006-04-01

    A previous comparative-genomic study of large nuclear and cytoplasmic DNA viruses (NCLDVs) of eukaryotes revealed the monophyletic origin of four viral families: poxviruses, asfarviruses, iridoviruses, and phycodnaviruses [Iyer, L.M., Aravind, L., Koonin, E.V., 2001. Common origin of four diverse families of large eukaryotic DNA viruses. J. Virol. 75 (23), 11720-11734]. Here we update this analysis by including the recently sequenced giant genome of the mimiviruses and several additional genomes of iridoviruses, phycodnaviruses, and poxviruses. The parsimonious reconstruction of the gene complement of the ancestral NCLDV shows that it was a complex virus with at least 41 genes that encoded the replication machinery, up to four RNA polymerase subunits, at least three transcription factors, capping and polyadenylation enzymes, the DNA packaging apparatus, and structural components of an icosahedral capsid and the viral membrane. The phylogeny of the NCLDVs is reconstructed by cladistic analysis of the viral gene complements, and it is shown that the two principal lineages of NCLDVs are comprised of poxviruses grouped with asfarviruses and iridoviruses grouped with phycodnaviruses-mimiviruses. The phycodna-mimivirus grouping was strongly supported by several derived shared characters, which seemed to rule out the previously suggested basal position of the mimivirus [Raoult, D., Audic, S., Robert, C., Abergel, C., Renesto, P., Ogata, H., La Scola, B., Suzan, M., Claverie, J.M. 2004. The 1.2-megabase genome sequence of Mimivirus. Science 306 (5700), 1344-1350]. These results indicate that the divergence of the major NCLDV families occurred at an early stage of evolution, prior to the divergence of the major eukaryotic lineages. It is shown that subsequent evolution of the NCLDV genomes involved lineage-specific expansion of paralogous gene families and acquisition of numerous genes via horizontal gene transfer from the eukaryotic hosts, other viruses, and bacteria

  9. Genome-wide Association Study Identifies HLA 8.1 Ancestral Haplotype Alleles as Major Genetic Risk Factors for Myositis Phenotypes

    PubMed Central

    Miller, Frederick W.; Chen, Wei; O’Hanlon, Terrance P.; Cooper, Robert G.; Vencovsky, Jiri; Rider, Lisa G.; Danko, Katalin; Wedderburn, Lucy R.; Lundberg, Ingrid E.; Pachman, Lauren M.; Reed, Ann M.; Ytterberg, Steven R.; Padyukov, Leonid; Selva-O’Callaghan, Albert; Radstake, Timothy R.; Isenberg, David A.; Chinoy, Hector; Ollier, William E.R.; Scheet, Paul; Peng, Bo; Lee, Annette; Byun, Jinyoung; Lamb, Janine A.; Gregersen, Peter K.; Amos, Christopher I.

    2016-01-01

    Autoimmune muscle diseases (myositis) comprise a group of complex phenotypes influenced by genetic and environmental factors. To identify genetic risk factors in patients of European ancestry, we conducted a genome-wide association study (GWAS) of the major myositis phenotypes in a total of 1710 cases, which included 705 adult dermatomyositis; 473 juvenile dermatomyositis; 532 polymyositis; and 202 adult dermatomyositis, juvenile dermatomyositis or polymyositis patients with anti-histidyl tRNA synthetase (anti-Jo-1) autoantibodies, and compared them with 4724 controls. Single-nucleotide polymorphisms showing strong associations (P < 5 × 10^−8) in GWAS were identified in the major histocompatibility complex (MHC) region for all myositis phenotypes together, as well as for the four clinical and autoantibody phenotypes studied separately. Imputation and regression analyses found that alleles comprising the human leukocyte antigen (HLA) 8.1 ancestral haplotype (AH8.1) defined essentially all the genetic risk in the phenotypes studied. Although the HLA DRB1*03:01 allele showed slightly stronger associations with adult and juvenile dermatomyositis, and HLA B*08:01 with polymyositis and anti-Jo-1 autoantibody-positive myositis, multiple alleles of AH8.1 were required for the full risk effects. Our findings establish that alleles of the AH8.1 haplotype comprise the primary genetic risk factors associated with the major myositis phenotypes in geographically diverse Caucasian populations. PMID:26291516

  10. Genome-wide association study identifies HLA 8.1 ancestral haplotype alleles as major genetic risk factors for myositis phenotypes.

    PubMed

    Miller, F W; Chen, W; O'Hanlon, T P; Cooper, R G; Vencovsky, J; Rider, L G; Danko, K; Wedderburn, L R; Lundberg, I E; Pachman, L M; Reed, A M; Ytterberg, S R; Padyukov, L; Selva-O'Callaghan, A; Radstake, T R; Isenberg, D A; Chinoy, H; Ollier, W E R; Scheet, P; Peng, B; Lee, A; Byun, J; Lamb, J A; Gregersen, P K; Amos, C I

    2015-10-01

    Autoimmune muscle diseases (myositis) comprise a group of complex phenotypes influenced by genetic and environmental factors. To identify genetic risk factors in patients of European ancestry, we conducted a genome-wide association study (GWAS) of the major myositis phenotypes in a total of 1710 cases, which included 705 adult dermatomyositis, 473 juvenile dermatomyositis, 532 polymyositis and 202 adult dermatomyositis, juvenile dermatomyositis or polymyositis patients with anti-histidyl-tRNA synthetase (anti-Jo-1) autoantibodies, and compared them with 4724 controls. Single-nucleotide polymorphisms showing strong associations (P < 5 × 10^−8) in GWAS were identified in the major histocompatibility complex (MHC) region for all myositis phenotypes together, as well as for the four clinical and autoantibody phenotypes studied separately. Imputation and regression analyses found that alleles comprising the human leukocyte antigen (HLA) 8.1 ancestral haplotype (AH8.1) defined essentially all the genetic risk in the phenotypes studied. Although the HLA DRB1*03:01 allele showed slightly stronger associations with adult and juvenile dermatomyositis, and HLA B*08:01 with polymyositis and anti-Jo-1 autoantibody-positive myositis, multiple alleles of AH8.1 were required for the full risk effects. Our findings establish that alleles of the AH8.1 comprise the primary genetic risk factors associated with the major myositis phenotypes in geographically diverse Caucasian populations. PMID:26291516

  11. Genomes of Helicobacter pylori from native Peruvians suggest admixture of ancestral and modern lineages and reveal a western type cag-pathogenicity island

    PubMed Central

    Devi, S Manjulata; Ahmed, Irshad; Khan, Aleem A; Rahman, Syed Asad; Alvi, Ayesha; Sechi, Leonardo A; Ahmed, Niyaz

    2006-01-01

    Background Helicobacter pylori is presumed to be co-evolved with its human host and is a highly diverse gastric pathogen at genetic levels. Ancient origins of H. pylori in the New World are still debatable. It is not clear how different waves of human migrations in South America contributed to the evolution of strain diversity of H. pylori. The objective of our 'phylogeographic' study was to gain fresh insights into these issues through mapping genetic origins of H. pylori of native Peruvians (of Amerindian ancestry) and their genomic comparison with isolates from Spain and Japan. Results For this purpose, we attempted to dissect genetic identity of strains by fluorescent amplified fragment length polymorphism (FAFLP) analysis, multilocus sequence typing (MLST) of the 7 housekeeping genes (atpA, efp, ureI, ppa, mutY, trpC, yphC) and the sequence analyses of the babB adhesin and oipA genes. The whole cag pathogenicity-island (cagPAI) from these strains was analyzed using PCR and the geographic type of cagA phosphorylation motif EPIYA was determined by gene sequencing. We observed that while European genotype (hp-Europe) predominates in native Peruvian strains, approximately 20% of these strains represent a sub-population with an Amerindian ancestry (hsp-Amerind). All of these strains, however, irrespective of their ancestral affiliation, harbored a complete, 'western' type cagPAI and the motifs surrounding it. This indicates a possible acquisition of cagPAI by the hsp-Amerind strains from the European strains, during decades of co-colonization. Conclusion Our observations suggest presence of ancestral H. pylori (hsp-Amerind) in Peruvian Amerindians which possibly managed to survive and compete against the Spanish strains that arrived in the New World about 500 years ago. We suggest that this might have happened after native Peruvian H. pylori strains acquired cagPAI sequences, either by new acquisition in cag-negative strains or by recombination in cag positive

  12. Comparative genome maps of the pangolin, hedgehog, sloth, anteater and human revealed by cross-species chromosome painting: further insight into the ancestral karyotype and genome evolution of eutherian mammals.

    PubMed

    Yang, Fengtang; Graphodatsky, Alexander S; Li, Tangliang; Fu, Beiyuan; Dobigny, Gauthier; Wang, Jinghuan; Perelman, Polina L; Serdukova, Natalya A; Su, Weiting; O'Brien, Patricia Cm; Wang, Yingxiang; Ferguson-Smith, Malcolm A; Volobouev, Vitaly; Nie, Wenhui

    2006-01-01

    To better understand the evolution of genome organization of eutherian mammals, comparative maps based on chromosome painting have been constructed between human and representative species of three eutherian orders: Xenarthra, Pholidota, and Eulipotyphla, as well as between representative species of the Carnivora and Pholidota. These maps demonstrate the conservation of such syntenic segment associations as HSA3/21, 4/8, 7/16, 12/22, 14/15 and 16/19 in Eulipotyphla, Pholidota and Xenarthra and thus further consolidate the notion that they form part of the ancestral karyotype of the eutherian mammals. Our study has revealed many potential ancestral syntenic associations of human chromosomal segments that serve to link the families as well as orders within the major superordinal eutherian clades defined by molecular markers. The HSA2/8 and 7/10 associations could be the cytogenetic signatures that unite the Xenarthrans, while the HSA1/19p could be a putative signature that links the Afrotheria and Xenarthra. But caution is required in the interpretation of apparently shared syntenic associations as detailed analyses also show examples of apparent convergent evolution that differ in breakpoints and extent of the involved segments. PMID:16628499

  13. The Psychiatric Genomics Consortium Posttraumatic Stress Disorder Workgroup: Posttraumatic Stress Disorder Enters the Age of Large-Scale Genomic Collaboration

    PubMed Central

    Logue, Mark W; Amstadter, Ananda B; Baker, Dewleen G; Duncan, Laramie; Koenen, Karestan C; Liberzon, Israel; Miller, Mark W; Morey, Rajendra A; Nievergelt, Caroline M; Ressler, Kerry J; Smith, Alicia K; Smoller, Jordan W; Stein, Murray B; Sumner, Jennifer A; Uddin, Monica

    2015-01-01

    The development of posttraumatic stress disorder (PTSD) is influenced by genetic factors. Although there have been some replicated candidates, the identification of risk variants for PTSD has lagged behind genetic research of other psychiatric disorders such as schizophrenia, autism, and bipolar disorder. Psychiatric genetics has moved beyond examination of specific candidate genes in favor of the genome-wide association study (GWAS) strategy of very large numbers of samples, which allows for the discovery of previously unsuspected genes and molecular pathways. The successes of genetic studies of schizophrenia and bipolar disorder have been aided by the formation of a large-scale GWAS consortium: the Psychiatric Genomics Consortium (PGC). In contrast, only a handful of GWAS of PTSD have appeared in the literature to date. Here we describe the formation of a group dedicated to large-scale study of PTSD genetics: the PGC-PTSD. The PGC-PTSD faces challenges related to the contingency on trauma exposure and the large degree of ancestral genetic diversity within and across participating studies. Using the PGC analysis pipeline supplemented by analyses tailored to address these challenges, we anticipate that our first large-scale GWAS of PTSD will comprise over 10 000 cases and 30 000 trauma-exposed controls. Following in the footsteps of our PGC forerunners, this collaboration—of a scope that is unprecedented in the field of traumatic stress—will lead the search for replicable genetic associations and new insights into the biological underpinnings of PTSD. PMID:25904361

  14. Large-Scale Development of Gene-Associated Single-Nucleotide Polymorphism Markers for Molluscan Population Genomic, Comparative Genomic, and Genome-Wide Association Studies

    PubMed Central

    Jiao, Wenqian; Fu, Xiaoteng; Li, Jinqin; Li, Ling; Feng, Liying; Lv, Jia; Zhang, Lu; Wang, Xiaojian; Li, Yangping; Hou, Rui; Zhang, Lingling; Hu, Xiaoli; Wang, Shi; Bao, Zhenmin

    2014-01-01

    Mollusca is the second most diverse group of animals in the world. Despite their perceived importance, omics-level studies have seldom been applied to this group of animals largely due to a paucity of genomic resources. Here, we report the first large-scale gene-associated marker development and evaluation for a bivalve mollusc, Chlamys farreri. More than 21,000 putative single-nucleotide polymorphisms (SNPs) were identified from the C. farreri transcriptome. Primers and probes were designed and synthesized for 4500 SNPs, and 1492 polymorphic markers were successfully developed using a high-resolution melting genotyping platform. These markers are particularly suitable for population genomic analysis due to high polymorphism within and across populations, a low frequency of null alleles, and conformation to neutral expectations. Unexpectedly, high cross-species transferability was observed, suggesting that the transferable SNPs may largely represent ancestral genetic variations that have been preserved differentially among subfamilies of Pectinidae. Gene annotations were available for 73% of the markers, and 65% could be anchored to the recently released Pacific oyster genome. Large-scale association analysis revealed key candidate genes responsible for scallop growth regulation, and provided markers for further genetic improvement of C. farreri in breeding programmes. PMID:24277739

  15. A consensus map in cultivated hexaploid oat reveals conserved grass synteny with substantial sub-genome rearrangement

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Hexaploid oat (Avena sativa, 2n = 6x = 42) is a member of the Poaceae family with a very large genome (~13 Gb) containing 21 chromosome pairs: seven from each of two similar ancestral diploids (A and D) and seven from a more diverged ancestral diploid (C). Physical rearrangements among ancestral oat...

  16. Core-SINE blocks comprise a large fraction of monotreme genomes; implications for vertebrate chromosome evolution.

    PubMed

    Kirby, Patrick J; Greaves, Ian K; Koina, Edda; Waters, Paul D; Marshall Graves, Jennifer A

    2007-01-01

    The genomes of the egg-laying platypus and echidna are of particular interest because monotremes are the most basal mammal group. The chromosomal distribution of an ancient family of short interspersed repeats (SINEs), the core-SINEs, was investigated to better understand monotreme genome organization and evolution. Previous studies have identified the core-SINE as the predominant SINE in the platypus genome, and in this study we quantified, characterized and localized subfamilies. Dot blot analysis suggested that a very large fraction (32% of the platypus and 16% of the echidna genome) is composed of Mon core-SINEs. Core-SINE-specific primers were used to amplify PCR products from platypus and echidna genomic DNA. Sequence analysis suggests a common consensus sequence Mon 1-B, shared by platypus and echidna, as well as platypus-specific Mon 1-C and echidna-specific Mon 1-D consensus sequences. FISH mapping of the Mon core-SINE products to platypus metaphase spreads demonstrates that the Mon 1-C subfamily is responsible for the striking Mon core-SINE accumulation in the distal regions of the six large autosomal pairs and the largest X chromosome. This unusual distribution highlights the dichotomy between the seven large chromosome pairs and the 19 smaller pairs in the monotreme karyotype, which has some similarity to the macro- and micro-chromosomes of birds and reptiles, and suggests that accumulation of repetitive sequences may have enlarged small chromosomes in an ancestral vertebrate. In the forthcoming sequence of the platypus genome there are still large gaps, and the extensive Mon core-SINE accumulation on the distal regions of the six large autosomal pairs may provide one explanation for this missing sequence. PMID:18185983

  17. Genome comparison of Pseudomonas aeruginosa large phages.

    PubMed

    Hertveldt, Kirsten; Lavigne, Rob; Pleteneva, Elena; Sernova, Natalia; Kurochkina, Lidia; Korchevskii, Roman; Robben, Johan; Mesyanzhinov, Vadim; Krylov, Victor N; Volckaert, Guido

    2005-12-01

    Pseudomonas aeruginosa phage EL is a dsDNA phage related to the giant phiKZ-like Myoviridae. The EL genome sequence comprises 211,215 bp and has 201 predicted open reading frames (ORFs). The EL genome does not share DNA sequence homology with other viruses and micro-organisms sequenced to date. However, one-third of the predicted EL gene products (gps) shares similarity (Blast alignments of 17-55% amino acid identity) with phiKZ proteins. Comparative EL and phiKZ genomics reveals that these giant phages are an example of substantially diverged genetic mosaics. Based on the position of similar EL and phiKZ predicted gene products, five genome regions can be delineated in EL, four of which are relatively conserved between EL and phiKZ. Region IV, a 17.7 kb genome region with 28 predicted ORFs, is unique to EL. Fourteen EL ORFs have been assigned a putative function based on protein similarity. Assigned proteins are involved in DNA replication and nucleotide metabolism (NAD+-dependent DNA ligase, ribonuclease HI, helicase, thymidylate kinase), host lysis and particle structure. EL-gp146 is the first chaperonin GroEL sequence identified in a viral genome. Besides a putative transposase, EL harbours predicted mobile endonucleases related to H-N-H and LAGLIDADG homing endonucleases associated with group I intron and intein intervening sequences. PMID:16256135

  18. Precision Editing of Large Animal Genomes

    PubMed Central

    Tan, Wenfang (Spring); Carlson, Daniel F.; Walton, Mark W.; Fahrenkrug, Scott C.; Hackett, Perry B.

    2013-01-01

    Transgenic animals are an important source of protein and nutrition for most humans and will play key roles in satisfying the increasing demand for food in an ever-increasing world population. The past decade has experienced a revolution in the development of methods that permit the introduction of specific alterations to complex genomes. This precision will enhance genome-based improvement of farm animals for food production. Precision genetics also will enhance the development of therapeutic biomaterials and models of human disease as resources for the development of advanced patient therapies. PMID:23084873

  19. Ancestral Origins and Genetic History of Tibetan Highlanders.

    PubMed

    Lu, Dongsheng; Lou, Haiyi; Yuan, Kai; Wang, Xiaoji; Wang, Yuchen; Zhang, Chao; Lu, Yan; Yang, Xiong; Deng, Lian; Zhou, Ying; Feng, Qidi; Hu, Ya; Ding, Qiliang; Yang, Yajun; Li, Shilin; Jin, Li; Guan, Yaqun; Su, Bing; Kang, Longli; Xu, Shuhua

    2016-09-01

    The origin of Tibetans remains one of the most contentious puzzles in history, anthropology, and genetics. Analyses of deeply sequenced (30×-60×) genomes of 38 Tibetan highlanders and 39 Han Chinese lowlanders, together with available data on archaic and modern humans, allow us to comprehensively characterize the ancestral makeup of Tibetans and uncover their origins. Non-modern human sequences compose ∼6% of the Tibetan gene pool and form unique haplotypes in some genomic regions, where Denisovan-like, Neanderthal-like, ancient-Siberian-like, and unknown ancestries are entangled and elevated. The shared ancestry of Tibetan-enriched sequences dates back to ∼62,000-38,000 years ago, predating the Last Glacial Maximum (LGM) and representing early colonization of the plateau. Nonetheless, most of the Tibetan gene pool is of modern human origin and diverged from that of Han Chinese ∼15,000 to ∼9,000 years ago, which can be largely attributed to post-LGM arrivals. Analysis of ∼200 contemporary populations showed that Tibetans share ancestry with populations from East Asia (∼82%), Central Asia and Siberia (∼11%), South Asia (∼6%), and western Eurasia and Oceania (∼1%). Our results support that Tibetans arose from a mixture of multiple ancestral gene pools but that their origins are much more complicated and ancient than previously suspected. We provide compelling evidence of the co-existence of Paleolithic and Neolithic ancestries in the Tibetan gene pool, indicating a genetic continuity between pre-historical highland-foragers and present-day Tibetans. In particular, highly differentiated sequences harbored in highlanders' genomes were most likely inherited from pre-LGM settlers of multiple ancestral origins (SUNDer) and maintained in high frequency by natural selection. PMID:27569548

  20. On the analysis of large-scale genomic structures.

    PubMed

    Oiwa, Nestor Norio; Goldman, Carla

    2005-01-01

    We apply methods from statistical physics (histograms, correlation functions, fractal dimensions, and singularity spectra) to characterize the large-scale structure of the distribution of nucleotides along genomic sequences. We discuss the role of the extension of noncoding segments ("junk DNA") for the genomic organization, and the connection between the coding segment distribution and the high-eukaryotic chromatin condensation. The following sequences taken from GenBank were analyzed: complete genome of Xanthomonas campestris, complete genome of yeast, chromosome V of Caenorhabditis elegans, and human chromosome XVII around gene BRCA1. The results are compared with random and periodic sequences and with those generated by simple and generalized fractal Cantor sets. PMID:15858230

  1. GDC 2: Compression of large collections of genomes.

    PubMed

    Deorowicz, Sebastian; Danek, Agnieszka; Niemiec, Marcin

    2015-01-01

    The falling price of high-throughput genome sequencing is changing the landscape of modern genomics. A number of large-scale projects aimed at sequencing many human genomes are in progress. Genome sequencing is also becoming an important aid in personalized medicine. One significant side effect of this change is the need to store and transfer huge amounts of genomic data. In this paper we deal with the problem of compressing large collections of complete genomic sequences. We propose an algorithm that is able to compress a collection of 1092 human diploid genomes about 9,500 times. This result is about 4 times better than what is offered by other existing compressors. Moreover, our algorithm is very fast, processing the data at 200 MB/s on a modern workstation. As a consequence, the proposed algorithm allows complete genomic collections to be stored at low cost; e.g., the examined collection of 1092 human genomes needs only about 700 MB when compressed, compared with about 6.7 TB of uncompressed FASTA files. The source code is available at http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&project=gdc&subpage=about. PMID:26108279
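
    The figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is only an illustration, not part of GDC 2: the variable and function names are hypothetical, and decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) are an assumption, since the abstract does not say which convention it uses.

```python
# Sizes quoted in the abstract; decimal units are an assumption.
uncompressed_bytes = 6.7e12   # about 6.7 TB of uncompressed FASTA files
compressed_bytes = 700e6      # about 700 MB after compression
genome_count = 1092           # genomes in the examined collection


def compression_ratio(raw_bytes, packed_bytes):
    """Return how many times smaller the packed data is than the raw data."""
    return raw_bytes / packed_bytes


ratio = compression_ratio(uncompressed_bytes, compressed_bytes)
# Average compressed footprint per genome, in megabytes.
per_genome_mb = compressed_bytes / genome_count / 1e6

print(f"compression ratio: about {ratio:,.0f}x")
print(f"compressed size per genome: about {per_genome_mb:.2f} MB")
```

    Under these assumptions the ratio comes out in the mid-9,000s, in line with the "about 9,500 times" stated above.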

  2. GDC 2: Compression of large collections of genomes

    PubMed Central

    Deorowicz, Sebastian; Danek, Agnieszka; Niemiec, Marcin

    2015-01-01

    The falling price of high-throughput genome sequencing is changing the landscape of modern genomics. A number of large-scale projects aimed at sequencing many human genomes are in progress. Genome sequencing is also becoming an important aid in personalized medicine. One significant side effect of this change is the need to store and transfer huge amounts of genomic data. In this paper we deal with the problem of compressing large collections of complete genomic sequences. We propose an algorithm that is able to compress a collection of 1092 human diploid genomes about 9,500 times. This result is about 4 times better than what is offered by other existing compressors. Moreover, our algorithm is very fast, processing the data at 200 MB/s on a modern workstation. As a consequence, the proposed algorithm allows complete genomic collections to be stored at low cost; e.g., the examined collection of 1092 human genomes needs only about 700 MB when compressed, compared with about 6.7 TB of uncompressed FASTA files. The source code is available at http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&project=gdc&subpage=about. PMID:26108279

  3. Identification of large-scale genomic variation in cancer genomes using in silico reference models.

    PubMed

    Killcoyne, Sarah; Del Sol, Antonio

    2016-01-01

    Identifying large-scale structural variation in cancer genomes continues to be a challenge to researchers. Current methods rely on genome alignments based on a reference that can be a poor fit to highly variant and complex tumor genomes. To address this challenge we developed a method that uses available breakpoint information to generate models of structural variations. We use these models as references to align previously unmapped and discordant reads from a genome. By using these models to align unmapped reads, we show that our method can help to identify large-scale variations that have been previously missed. PMID:26264669

  4. Identification of large-scale genomic variation in cancer genomes using in silico reference models

    PubMed Central

    Killcoyne, Sarah; del Sol, Antonio

    2016-01-01

    Identifying large-scale structural variation in cancer genomes continues to be a challenge to researchers. Current methods rely on genome alignments based on a reference that can be a poor fit to highly variant and complex tumor genomes. To address this challenge we developed a method that uses available breakpoint information to generate models of structural variations. We use these models as references to align previously unmapped and discordant reads from a genome. By using these models to align unmapped reads, we show that our method can help to identify large-scale variations that have been previously missed. PMID:26264669

  5. Ancestral gene synteny reconstruction improves extant species scaffolding

    PubMed Central

    2015-01-01

    We exploit the methodological similarity between ancestral genome reconstruction and extant genome scaffolding. We present a method, called ARt-DeCo that constructs neighborhood relationships between genes or contigs, in both ancestral and extant genomes, in a phylogenetic context. It is able to handle dozens of complete genomes, including genes with complex histories, by using gene phylogenies reconciled with a species tree, that is, annotated with speciation, duplication and loss events. Reconstructed ancestral or extant synteny comes with a support computed from an exhaustive exploration of the solution space. We compare our method with a previously published one that follows the same goal on a small number of genomes with universal unicopy genes. Then we test it on the whole Ensembl database, by proposing partial ancestral genome structures, as well as a more complete scaffolding for many partially assembled genomes on 69 eukaryote species. We carefully analyze a couple of extant adjacencies proposed by our method, and show that they are indeed real links in the extant genomes, that were missing in the current assembly. On a reduced data set of 39 eutherian mammals, we estimate the precision and sensitivity of ARt-DeCo by simulating a fragmentation in some well assembled genomes, and measure how many adjacencies are recovered. We find a very high precision, while the sensitivity depends on the quality of the data and on the proximity of closely related genomes. PMID:26450761

  6. BACFinder: genomic localisation of large insert genomic clones based on restriction fingerprinting

    PubMed Central

    Crowe, Mark L.; Rana, Debashis; Fraser, Fiona; Bancroft, Ian; Trick, Martin

    2002-01-01

    We have developed software that allows the prediction of the genomic location of a bacterial artificial chromosome (BAC) clone, or other large genomic clone, based on a simple restriction digest of the BAC. The mapping is performed by comparing the experimentally derived restriction digest of the BAC DNA with a virtual restriction digest of the whole genome sequence. Our trials indicate that this program identified the genomic regions represented by BAC clones with a degree of accuracy comparable to that of end-sequencing, but at considerably less cost. Although the program has been developed principally for use with Arabidopsis BACs, it should align large insert genomic clones to any fully sequenced genome. PMID:12409477
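
    The idea of comparing an experimental digest with a virtual digest of a reference sequence can be sketched in a few lines. This is a toy illustration under stated assumptions, not BACFinder's actual algorithm: the function names, the greedy matching rule, the 5% size tolerance, and the toy sequence are all hypothetical. The recognition site AAGCTT with a cut one base into the site corresponds to the enzyme HindIII.

```python
def virtual_digest(seq, site, cut_offset):
    """Return fragment lengths from cutting seq at every occurrence of site.

    cut_offset is the cut position counted from the start of the site.
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length fragments that arise if a cut falls on a sequence end.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def match_score(observed, predicted, tolerance=0.05):
    """Count observed sizes matched to a distinct predicted size.

    A match requires the sizes to agree within a relative tolerance;
    each predicted fragment can be used at most once (greedy matching).
    """
    remaining = sorted(predicted)
    hits = 0
    for size in sorted(observed):
        for i, candidate in enumerate(remaining):
            if abs(candidate - size) <= tolerance * size:
                hits += 1
                del remaining[i]
                break
    return hits


# Toy reference region containing two HindIII sites (hypothetical data).
region = "GG" + "AAGCTT" + "CCCC" + "AAGCTT" + "T"
fragments = virtual_digest(region, "AAGCTT", cut_offset=1)
score = match_score(observed=[10, 6], predicted=fragments)
print(fragments, score)
```

    In a genome-scale setting one would presumably slide a window along the reference, digest each window in silico, and report the windows with the highest score; how the published program scores and ranks candidate regions is not stated in the abstract.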

  7. Exon capture optimization in amphibians with large genomes.

    PubMed

    McCartney-Melstad, Evan; Mount, Genevieve G; Shaffer, H Bradley

    2016-09-01

    Gathering genomic-scale data efficiently is challenging for nonmodel species with large, complex genomes. Transcriptome sequencing is accessible for organisms with large genomes, and sequence capture probes can be designed from such mRNA sequences to enrich and sequence exonic regions. Maximizing enrichment efficiency is important to reduce sequencing costs, but relatively few data exist for exon capture experiments in nonmodel organisms with large genomes. Here, we conducted a replicated factorial experiment to explore the effects of several modifications to standard protocols that might increase sequence capture efficiency for amphibians and other taxa with large, complex genomes. Increasing the amounts of C0t-1 repetitive sequence blocker and individual input DNA used in target enrichment reactions reduced the rates of PCR duplication. This reduction led to an increase in the percentage of unique reads mapping to target sequences, essentially doubling overall efficiency of the target capture from 10.4% to nearly 19.9% and rendering target capture experiments more efficient and affordable. Our results indicate that target capture protocols can be modified to efficiently screen vertebrates with large genomes, including amphibians. PMID:27223337

  8. Large-scale structure of genomic methylation patterns.

    PubMed

    Rollins, Robert A; Haghighi, Fatemeh; Edwards, John R; Das, Rajdeep; Zhang, Michael Q; Ju, Jingyue; Bestor, Timothy H

    2006-02-01

    The mammalian genome depends on patterns of methylated cytosines for normal function, but the relationship between genomic methylation patterns and the underlying sequence is unclear. We have characterized the methylation landscape of the human genome by global analysis of patterns of CpG depletion and by direct sequencing of 3073 unmethylated domains and 2565 methylated domains from human brain DNA. The genome was found to consist of short (<4 kb) unmethylated domains embedded in a matrix of long methylated domains. Unmethylated domains were enriched in promoters, CpG islands, and first exons, while methylated domains comprised interspersed and tandem-repeated sequences, exons other than first exons, and non-annotated single-copy sequences that are depleted in the CpG dinucleotide. The enrichment of regulatory sequences in the relatively small unmethylated compartment suggests that cytosine methylation constrains the effective size of the genome through the selective exposure of regulatory sequences. This buffers regulatory networks against changes in total genome size and provides an explanation for the C value paradox, which concerns the wide variations in genome size that scale independently of gene number. This suggestion is compatible with the finding that cytosine methylation is universal among large-genome eukaryotes, while many eukaryotes with genome sizes <5 × 10^8 bp do not methylate their DNA. PMID:16365381

  9. A method to capture large DNA fragments from genomic DNA.

    PubMed

    Ball, Geneviève; Filloux, Alain; Voulhoux, Romé

    2014-01-01

    The gene capture technique is a powerful tool that allows the cloning of large DNA regions (up to 80 kb), such as entire genomic islands, without using restriction enzymes or DNA amplification. This technique takes advantage of the high recombination capacity of yeast. A "capture" vector containing both ends of the target DNA region must first be constructed. The target region is then captured by co-transformation and recombination in yeast between the "capture" vector and appropriate genomic DNA. The selected recombinant plasmid can be verified by sequencing and transferred into bacteria for multiple applications. This chapter describes a protocol specifically adapted for Pseudomonas aeruginosa genomic DNA capture. PMID:24818928

  10. Stability analysis of chickpea large genomic DNA inserts in Agrobacterium.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Agrobacterium tumefaciens-mediated transformation of large DNA inserts directly into plants facilitates the transfer of gene clusters and flanking regulatory elements. It is recommended that the integrity of large genomic fragments in Agrobacterium be verified prior to plant transformation. In this ...

  11. Genome size variation affects song attractiveness in grasshoppers: evidence for sexual selection against large genomes.

    PubMed

    Schielzeth, Holger; Streitner, Corinna; Lampe, Ulrike; Franzke, Alexandra; Reinhold, Klaus

    2014-12-01

    Genome size is largely uncorrelated to organismal complexity and adaptive scenarios. Genetic drift as well as intragenomic conflict have been put forward to explain this observation. We here study the impact of genome size on sexual attractiveness in the bow-winged grasshopper Chorthippus biguttulus. Grasshoppers show particularly large variation in genome size due to the high prevalence of supernumerary chromosomes that are considered (mildly) selfish, as evidenced by non-Mendelian inheritance and fitness costs if present in high numbers. We ranked male grasshoppers by song characteristics that are known to affect female preferences in this species and scored genome sizes of attractive and unattractive individuals from the extremes of this distribution. We find that attractive singers have significantly smaller genomes, demonstrating that genome size is reflected in male courtship songs and that females prefer songs of males with small genomes. Such a genome size dependent mate preference effectively selects against selfish genetic elements that tend to increase genome size. The data therefore provide a novel example of how sexual selection can reinforce natural selection and can act as an agent in an intragenomic arms race. Furthermore, our findings indicate an underappreciated route of how choosy females could gain indirect benefits. PMID:25200798

  12. Unraveling recombination rate evolution using ancestral recombination maps

    PubMed Central

    Munch, Kasper; Schierup, Mikkel H; Mailund, Thomas

    2014-01-01

    Recombination maps of ancestral species can be constructed from comparative analyses of genomes from closely related species, exemplified by a recently published map of the human-chimpanzee ancestor. Such maps resolve differences in recombination rate between species into changes along individual branches in the speciation tree, and allow identification of associated changes in the genomic sequences. We describe how coalescent hidden Markov models are able to call individual recombination events in ancestral species through inference of incomplete lineage sorting along a genomic alignment. In the great apes, speciation events are sufficiently close in time that a map can be inferred for the ancestral species at each internal branch - allowing evolution of recombination rate to be tracked over evolutionary time scales from speciation event to speciation event. We see this approach as a way of characterizing the evolution of recombination rate and the genomic properties that influence it. PMID:25043668

  13. Territorial Polymers and Large Scale Genome Organization

    NASA Astrophysics Data System (ADS)

    Grosberg, Alexander

    2012-02-01

    Chromatin fiber in the interphase nucleus is effectively a very long polymer packed in a restricted volume. Although polymer models of chromatin organization have been considered, most of them disregard the fact that DNA must stay not too entangled in order to function properly. One polymer model with no entanglements is the melt of unknotted, unconcatenated rings. Extensive simulations indicate that rings in the melt at large length (monomer number) N approach the compact state, with gyration radius scaling as N^(1/3), suggesting that every ring is compact and segregated from the surrounding rings. The segregation is consistent with the known phenomenon of chromosome territories. The surface exponent β (describing the number of contacts between neighboring rings, which scales as N^β) appears only slightly below unity, β ≈ 0.95. This suggests that the loop factor (the probability that two monomers a linear distance s apart meet) should decay as s^(-γ), where γ = 2 - β is slightly above one. The latter result is consistent with Hi-C data on real human interphase chromosomes and does not contradict the older FISH data. The dynamics of rings in the melt indicates that the motion of one ring remains subdiffusive on time scales well above the stress relaxation time.
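
    The scaling relations stated in this abstract can be collected in one place, with the value of γ worked out from the quoted β. The symbols R_g (gyration radius), n_contacts (number of contacts between neighboring rings) and P(s) (loop factor) are labels chosen here for illustration, and the quoted value of 0.95 for β is read as approximate.

```latex
\begin{align*}
R_g &\sim N^{1/3} && \text{(compact, segregated rings)} \\
n_{\text{contacts}} &\sim N^{\beta}, \quad \beta \approx 0.95 \\
P(s) &\sim s^{-\gamma}, \quad \gamma = 2 - \beta \approx 2 - 0.95 = 1.05
\end{align*}
```

    A value of γ near 1.05 is what the abstract describes as "slightly above one".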

  14. Roary: rapid large-scale prokaryote pan genome analysis

    PubMed Central

    Page, Andrew J.; Cummins, Carla A.; Hunt, Martin; Wong, Vanessa K.; Reuter, Sandra; Holden, Matthew T.G.; Fookes, Maria; Falush, Daniel; Keane, Jacqueline A.; Parkhill, Julian

    2015-01-01

    Summary: A typical prokaryote population sequencing study can now consist of hundreds or thousands of isolates. Interrogating these datasets can provide detailed insights into the genetic structure of prokaryotic genomes. We introduce Roary, a tool that rapidly builds large-scale pan genomes, identifying the core and accessory genes. Roary makes construction of the pan genome of thousands of prokaryote samples possible on a standard desktop without compromising on the accuracy of results. Using a single CPU Roary can produce a pan genome consisting of 1000 isolates in 4.5 hours using 13 GB of RAM, with further speedups possible using multiple processors. Availability and implementation: Roary is implemented in Perl and is freely available under an open source GPLv3 license from http://sanger-pathogens.github.io/Roary Contact: roary@sanger.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26198102

  15. Ancestral Relationships Using Metafounders: Finite Ancestral Populations and Across Population Relationships.

    PubMed

    Legarra, Andres; Christensen, Ole F; Vitezica, Zulma G; Aguilar, Ignacio; Misztal, Ignacy

    2015-06-01

    Recent use of genomic (marker-based) relationships shows that relationships exist within and across base populations (breeds or lines). However, current treatment of pedigree relationships is unable to consider relationships within or across base populations, although such relationships must exist due to finite size of the ancestral population and connections between populations. This complicates the conciliation of both approaches and, in particular, combining pedigree with genomic relationships. We present a coherent theoretical framework to consider base population in pedigree relationships. We suggest a conceptual framework that considers each ancestral population as a finite-sized pool of gametes. This generates across-individual relationships and contrasts with the classical view, in which each population is considered as an infinite, unrelated pool. Several ancestral populations may be connected and therefore related. Each ancestral population can be represented as a "metafounder," a pseudo-individual included as founder of the pedigree and similar to an "unknown parent group." Metafounders have self- and across relationships according to a set of parameters, which measure ancestral relationships, i.e., homozygosities within populations and relationships across populations. These parameters can be estimated from existing pedigree and marker genotypes using maximum likelihood or a method based on summary statistics, for arbitrarily complex pedigrees. Equivalences of genetic variance and variance components between the classical and this new parameterization are shown. Segregation variance on crosses of populations is modeled. Efficient algorithms for computation of relationship matrices, their inverses, and inbreeding coefficients are presented. Use of metafounders leads to compatibility of genomic and pedigree relationship matrices and to simple computing algorithms. Examples and code are given. PMID:25873631

  16. Large-scale data mining pilot project in human genome

    SciTech Connect

    Musick, R.; Fidelis, R.; Slezak, T.

    1997-05-01

    This whitepaper briefly describes a new, aggressive effort in large-scale data Livermore National Labs. The implications of "large-scale" will be clarified Section. In the short term, this effort will focus on several mission-critical questions of Genome project. We will adapt current data mining techniques to the Genome domain, to quantify the accuracy of inference results, and lay the groundwork for a more extensive effort in large-scale data mining. A major aspect of the approach is that we will be fully-staffed data warehousing effort in the human Genome area. The long term goal is strong applications-oriented research program in large-scale data mining. The tools, skill set gained will be directly applicable to a wide spectrum of tasks involving a for large spatial and multidimensional data. This includes applications in ensuring non-proliferation, stockpile stewardship, enabling Global Ecology (Materials Database Industrial Ecology), advancing the Biosciences (Human Genome Project), and supporting data for others (Battlefield Management, Health Care).

  17. Optimization of AFLP for extremely large genomes over 70 Gb.

    PubMed

    Veselá, Petra; Volařík, Daniel; Mráček, Jaroslav

    2016-07-01

    Here, we present an improved amplified fragment length polymorphism (AFLP) protocol using restriction enzymes (AscI and SbfI) that recognize 8-base pair sequences to provide alternative optimization suitable for species with a genome size over 70 Gb. This cost-effective optimization massively reduces the number of amplified fragments using only +3 selective bases per primer during selective amplification. We demonstrate the effects of the number of fragments and genome size on the appearance of nonidentical comigrating fragments (size homoplasy), which has a negative impact on the informative value of AFLP genotypes. We also present various reaction conditions and their effects on reproducibility and the band intensity of the extremely large genome of Viscum album. The reproducibility of this octo-cutter protocol was calculated using several species with genome sizes ranging from 1 Gb (Carex panicea) to 76 Gb (V. album). The improved protocol also succeeded in detecting high intraspecific variability in species with large genomes (V. album, Galanthus nivalis and Pinus pumila). PMID:26849414

  18. Kernel methods for large-scale genomic data analysis

    PubMed Central

    Xing, Eric P.; Schaid, Daniel J.

    2015-01-01

    Machine learning, particularly kernel methods, has been demonstrated as a promising new tool to tackle the challenges imposed by today’s explosive data growth in genomics. Kernel methods provide a practical and principled approach to learning how a large number of genetic variants are associated with complex phenotypes, helping to reveal the complexity in the relationship between the genetic markers and the outcome of interest. In this review, we highlight the potential key role these methods will have in modern genomic data processing, especially with regard to integration with classical methods for gene prioritizing, prediction and data fusion. PMID:25053743

  19. Genome resequencing in Populus: Revealing large-scale genome variation and implications on specialized-trait genomics

    SciTech Connect

    Muchero, Wellington; Labbe, Jessy L; Priya, Ranjan; DiFazio, Steven P; Tuskan, Gerald A

    2014-01-01

    To date, Populus ranks among a few plant species with a complete genome sequence and other highly developed genomic resources. With the first genome sequence among all tree species, Populus has been adopted as a suitable model organism for genomic studies in trees. However, far from being just a model species, Populus is a key renewable economic resource that plays a significant role in providing raw materials for the biofuel and pulp and paper industries. Therefore, aside from leading frontiers of basic tree molecular biology and ecological research, Populus leads frontiers in addressing global economic challenges related to fuel and fiber production. The latter fact suggests that research aimed at improving quality and quantity of Populus as a raw material will likely drive the pursuit of more targeted and deeper research in order to unlock the economic potential tied to the molecular biology processes that drive this tree species. Advances in genome sequence-driven technologies, such as resequencing individual genotypes, which in turn facilitates large-scale SNP discovery and identification of large-scale polymorphisms, are key determinants of future success in these initiatives. In this treatise we discuss implications of genome sequence-enabled technologies on Populus genomic and genetic studies of complex and specialized traits.

  20. Primate chromosome evolution: ancestral karyotypes, marker order and neocentromeres.

    PubMed

    Stanyon, R; Rocchi, M; Capozzi, O; Roberto, R; Misceo, D; Ventura, M; Cardone, M F; Bigoni, F; Archidiacono, N

    2008-01-01

    In 1992 the Japanese macaque was the first species for which the homology of the entire karyotype was established by cross-species chromosome painting. Today, there are chromosome painting data on more than 50 species of primates. Although chromosome painting is a rapid and economical method for tracking translocations, it has limited utility for revealing intrachromosomal rearrangements. Fortunately, the use of BAC-FISH in the last few years has allowed remarkable progress in determining marker order along primate chromosomes and there are now marker order data on an array of primate species for a good number of chromosomes. These data reveal inversions, but also show that centromeres of many orthologous chromosomes are embedded in different genomic contexts. Even if the mechanisms of neocentromere formation and progression are just beginning to be understood, it is clear that these phenomena had a significant impact on shaping the primate genome and are fundamental to our understanding of genome evolution. In this report we complete and integrate the dataset of BAC-FISH marker order for human syntenies 1, 2, 4, 5, 8, 12, 17, 18, 19, 21, 22 and the X. These results allowed us to develop hypotheses about the content, marker order and centromere position in ancestral karyotypes at five major branching points on the primate evolutionary tree: ancestral primate, ancestral anthropoid, ancestral platyrrhine, ancestral catarrhine and ancestral hominoid. Current models suggest that between-species structural rearrangements are often intimately related to speciation. Comparative primate cytogenetics has become an important tool for elucidating the phylogeny and the taxonomy of primates. It has become increasingly apparent that molecular cytogenetic data in the future can be fruitfully combined with whole-genome assemblies to advance our understanding of primate genome evolution as well as the mechanisms and processes that have led to the origin of the human genome. PMID

  1. Whole genome analysis of Vietnamese G2P[4] rotavirus strains possessing the NSP2 gene sharing an ancestral sequence with Chinese sheep and goat rotavirus strains.

    PubMed

    Do, Loan Phuong; Doan, Yen Hai; Nakagomi, Toyoko; Gauchan, Punita; Kaneko, Miho; Agbemabiese, Chantal; Dang, Anh Duc; Nakagomi, Osamu

    2015-10-01

    Because imminent introduction into Vietnam of a vaccine against Rotavirus A is anticipated, baseline information on the whole genome of representative strains is needed to understand changes in circulating strains that may occur after vaccine introduction. In this study, the whole genomes of two G2P[4] strains detected in Nha Trang, Vietnam in 2008 were sequenced, this being the last period during which virtually no rotavirus vaccine was used in this country. The two strains were found to be >99.9% identical in sequence and had a typical DS-1 like G2-P[4]-I2-R2-C2-M2-A2-N2-T2-E2-H2 genotype constellation. Analysis of the Vietnamese strains with >184 G2P[4] strains retrieved from GenBank/EMBL/DDBJ DNA databases placed the Vietnamese strains in one of the lineages commonly found among contemporary strains, with the exception of the NSP2 and NSP4 genes. The NSP2 genes were found to belong to a previously undescribed lineage that diverged from Chinese sheep and goat rotavirus strains, including a Chinese rotavirus vaccine strain LLR with 95% nucleotide identity; the time of their most recent common ancestor was 1975. The NSP4 genes were found to belong, together with Thai and USA strains, to an emergent lineage (VIII), adding further diversity to ever diversifying NSP4 lineages. Thus, there is a need to enhance surveillance of locally-circulating strains from both children and animals at the whole genome level to address the effect of rotavirus vaccines on changing strain distribution. PMID:26382233

  2. Whole-Genome Comparison of Two Campylobacter jejuni Isolates of the Same Sequence Type Reveals Multiple Loci of Different Ancestral Lineage

    PubMed Central

    Biggs, Patrick J.; Fearnhead, Paul; Hotter, Grant; Mohan, Vathsala; Collins-Emerson, Julie; Kwan, Errol; Besser, Thomas E.; Cookson, Adrian; Carter, Philip E.; French, Nigel P.

    2011-01-01

    Campylobacter jejuni ST-474 is the most important human enteric pathogen in New Zealand, and yet this genotype is rarely found elsewhere in the world. Insight into the evolution of this organism was gained by a whole genome comparison of two ST-474, flaA SVR-14 isolates and other available C. jejuni isolates and genomes. The two isolates were collected from different sources, human (H22082) and retail poultry (P110b), at the same time and from the same geographical location. Solexa sequencing of each isolate resulted in 1.659 Mb (H22082) and 1.656 Mb (P110b) of assembled sequences within 28 (H22082) and 29 (P110b) contigs. We analysed 1502 genes for which we had sequences within both ST-474 isolates and within at least one of 11 C. jejuni reference genomes. Although 94.5% of genes were identical between the two ST-474 isolates, we identified 83 genes that differed by at least one nucleotide, including 55 genes with non-synonymous substitutions. These covered 101 kb and contained 672 point differences. We inferred that 22 (3.3%) of these differences were due to mutation and 650 (96.7%) were imported via recombination. Our analysis estimated 38 recombinant breakpoints within these 83 genes, which correspond to recombination events affecting at least 19 loci regions and gives a tract length estimate of 2 kb. This includes a 12 kb region displaying non-homologous recombination in one of the ST-474 genomes, with the insertion of two genes, including ykgC, a putative oxidoreductase, and a conserved hypothetical protein of unknown function. Furthermore, our analysis indicates that the source of this recombined DNA is more likely to have come from C. jejuni strains that are more closely related to ST-474. This suggests that the rates of recombination and mutation are similar in order of magnitude, but that recombination has been much more important for generating divergence between the two ST-474 isolates. PMID:22096527

  3. Indexes of Large Genome Collections on a PC

    PubMed Central

    Danek, Agnieszka; Deorowicz, Sebastian; Grabowski, Szymon

    2014-01-01

    The availability of thousands of individual genomes of one species should boost rapid progress in personalized medicine or understanding of the interaction between genotype and phenotype, to name a few applications. A key operation useful in such analyses is aligning sequencing reads against a collection of genomes, which is costly with the use of existing algorithms due to their large memory requirements. We present MuGI, Multiple Genome Index, which reports all occurrences of a given pattern, in exact and approximate matching model, against a collection of thousand(s) genomes. Its unique feature is the small index size, which is customisable. It fits in a standard computer with 16–32 GB, or even 8 GB, of RAM, for the 1000GP collection of 1092 diploid human genomes. The solution is also fast. For example, the exact matching queries (of average length 150 bp) are handled in average time of 39 µs and with up to 3 mismatches in 373 µs on the test PC with the index size of 13.4 GB. For a smaller index, occupying 7.4 GB in memory, the respective times grow to 76 µs and 917 µs. Software is available at http://sun.aei.polsl.pl/mugi under a free license. Data S1 is available at PLOS One online. PMID:25289699
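
    To illustrate only the query semantics described in this abstract (report every occurrence of a pattern, exactly or with a bounded number of mismatches), here is a minimal sketch. It is a naive linear scan over a single string, not the MuGI index, and the function name and parameters are hypothetical.

```python
def find_with_mismatches(text, pattern, max_mismatches=0):
    """Return start positions where pattern matches text with at most
    max_mismatches substitutions (Hamming distance).

    Naive scan for illustration only; this is not how MuGI works internally.
    """
    hits = []
    m = len(pattern)
    for start in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[start:start + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many substitutions at this position
        else:
            # inner loop finished without exceeding the mismatch budget
            hits.append(start)
    return hits


exact_hits = find_with_mismatches("ACGTACGA", "ACGT")
approx_hits = find_with_mismatches("ACGTACGA", "ACGT", max_mismatches=1)
```

    A scan like this costs time proportional to text length times pattern length for each query, which is the kind of cost an index over a genome collection is meant to avoid.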

  4. BMPER Mutation in Diaphanospondylodysostosis Identified by Ancestral Autozygosity Mapping and Targeted High-Throughput Sequencing

    PubMed Central

    Funari, Vincent A.; Krakow, Deborah; Nevarez, Lisette; Chen, Zugen; Funari, Tara L.; Vatanavicharn, Nithiwat; Wilcox, William R.; Rimoin, David L.; Nelson, Stanley F.; Cohn, Daniel H.

    2010-01-01

    Diaphanospondylodysostosis (DSD) is a rare, recessively inherited, perinatal lethal skeletal disorder. The low frequency and perinatal lethality of DSD make assembling a large set of families for traditional linkage-based genetic approaches challenging. By searching for evidence of unknown ancestral consanguinity, we identified two autozygous intervals, comprising 34 Mbps, unique to a single case of DSD. Empirically testing for ancestral consanguinity was effective in localizing the causative variant, thereby reducing the genomic space within which the mutation resides. High-throughput sequence analysis of exons captured from these intervals demonstrated that the affected individual was homozygous for a null mutation in BMPER, which encodes the bone morphogenetic protein-binding endothelial cell precursor-derived regulator. Mutations in BMPER were subsequently found in three additional DSD cases, confirming that defects in BMPER produce DSD. Phenotypic similarities between DSD and Bmper null mice indicate that BMPER-mediated signaling plays an essential role in vertebral segmentation early in human development. PMID:20869035

  5. What was the ancestral sex-determining mechanism in amniote vertebrates?

    PubMed

    Johnson Pokorná, Martina; Kratochvíl, Lukáš

    2016-02-01

    Amniote vertebrates, the group consisting of mammals and reptiles including birds, possess various mechanisms of sex determination. Under environmental sex determination (ESD), the sex of individuals depends on the environmental conditions occurring during their development and therefore there are no sexual differences present in their genotypes. Alternatively, through the mode of genotypic sex determination (GSD), sex is determined by a sex-specific genotype, i.e. by the combination of sex chromosomes at various stages of differentiation at conception. As well as influencing sex determination, sex-specific parts of genomes may, and often do, develop specific reproductive or ecological roles in their bearers. Accordingly, an individual with a mismatch between phenotypic (gonadal) and genotypic sex, for example an individual sex-reversed by environmental effects, should have a lower fitness due to the lack of specialized, sex-specific parts of their genome. In this case, evolutionary transitions from GSD to ESD should be less likely than transitions in the opposite direction. This prediction contrasts with the view that GSD was the ancestral sex-determining mechanism for amniote vertebrates. Ancestral GSD would require several transitions from GSD to ESD associated with an independent dedifferentiation of sex chromosomes, at least in the ancestors of crocodiles, turtles, and lepidosaurs (tuataras and squamate reptiles). In this review, we argue that the alternative theory postulating ESD as ancestral in amniotes is more parsimonious and is largely concordant with the theoretical expectations and current knowledge of the phylogenetic distribution and homology of sex-determining mechanisms. PMID:25424152

  6. The Mitochondrial Genome of the Leaf-Cutter Ant Atta laevigata: A Mitogenome with a Large Number of Intergenic Spacers

    PubMed Central

    Rodovalho, Cynara de Melo; Lyra, Mariana Lúcio; Ferro, Milene; Bacci, Maurício

    2014-01-01

    In this paper we describe the nearly complete mitochondrial genome of the leaf-cutter ant Atta laevigata, assembled using transcriptomic libraries from Sanger and Illumina next generation sequencing (NGS), and PCR products. This mitogenome was found to be very large (18,729 bp), given the presence of 30 non-coding intergenic spacers (IGS) spanning 3,808 bp. A portion of the putative control region remained unsequenced. The gene content and organization correspond to that inferred for the ancestral pancrustacea, except for two tRNA gene rearrangements that have been described previously in other ants. The IGS were highly variable in length and dispersed through the mitogenome. This pattern was also found for the other hymenopterans in particular for the monophyletic Apocrita. These spacers with unknown function may be valuable for characterizing genome evolution and distinguishing closely related species and individuals. NGS provided better coverage than Sanger sequencing, especially for tRNA and ribosomal subunit genes, thus facilitating efforts to fill in sequence gaps. The results obtained showed that data from transcriptomic libraries contain valuable information for assembling mitogenomes. The present data also provide a source of molecular markers that will be very important for improving our understanding of genomic evolutionary processes and phylogenetic relationships among hymenopterans. PMID:24828084

  7. Recombination-mediated genetic engineering of large genomic DNA transgenes.

    PubMed

    Ejsmont, Radoslaw Kamil; Ahlfeld, Peter; Pozniakovsky, Andrei; Stewart, A Francis; Tomancak, Pavel; Sarov, Mihail

    2011-01-01

    Faithful gene activity reporters are a useful tool for evo-devo studies enabling selective introduction of specific loci between species and assaying the activity of large gene regulatory sequences. The use of large genomic constructs such as BACs and fosmids provides an efficient platform for exploration of gene function under endogenous regulatory control. Despite their large size they can be easily engineered using in vivo homologous recombination in Escherichia coli (recombineering). We have previously demonstrated that the efficiency and fidelity of recombineering are sufficient to allow high-throughput transgene engineering in liquid culture, and have successfully applied this approach in several model systems. Here, we present a detailed protocol for recombineering of BAC/fosmid transgenes for expression of fluorescent or affinity tagged proteins in Drosophila under endogenous in vivo regulatory control. The tag coding sequence is seamlessly recombineered into the genomic region contained in the BAC/fosmid clone, which is then integrated into the fly genome using ϕC31 recombination. This protocol can be easily adapted to other recombineering projects. PMID:22065454

  8. The vertebrate ancestral repertoire of visual opsins, transducin alpha subunits and oxytocin/vasopressin receptors was established by duplication of their shared genomic region in the two rounds of early vertebrate genome duplications

    PubMed Central

    2013-01-01

    Background Vertebrate color vision is dependent on four major color opsin subtypes: RH2 (green opsin), SWS1 (ultraviolet opsin), SWS2 (blue opsin), and LWS (red opsin). Together with the dim-light receptor rhodopsin (RH1), these form the family of vertebrate visual opsins. Vertebrate genomes contain many multi-membered gene families that can largely be explained by the two rounds of whole genome duplication (WGD) in the vertebrate ancestor (2R) followed by a third round in the teleost ancestor (3R). Related chromosome regions resulting from WGD or block duplications are said to form a paralogon. We describe here a paralogon containing the genes for visual opsins, the G-protein alpha subunit families for transducin (GNAT) and adenylyl cyclase inhibition (GNAI), the oxytocin and vasopressin receptors (OT/VP-R), and the L-type voltage-gated calcium channels (CACNA1-L). Results Sequence-based phylogenies and analyses of conserved synteny show that the above-mentioned gene families, and many neighboring gene families, expanded in the early vertebrate WGDs. This allows us to deduce the following evolutionary scenario: The vertebrate ancestor had a chromosome containing the genes for two visual opsins, one GNAT, one GNAI, two OT/VP-Rs and one CACNA1-L gene. This chromosome was quadrupled in 2R. Subsequent gene losses resulted in a set of five visual opsin genes, three GNAT and GNAI genes, six OT/VP-R genes and four CACNA1-L genes. These regions were duplicated again in 3R resulting in additional teleost genes for some of the families. Major chromosomal rearrangements have taken place in the teleost genomes. By comparison with the corresponding chromosomal regions in the spotted gar, which diverged prior to 3R, we could time these rearrangements to post-3R. Conclusions We present an extensive analysis of the paralogon housing the visual opsin, GNAT and GNAI, OT/VP-R, and CACNA1-L gene families. 
    The combined data imply that the early vertebrate WGD events contributed to the

  9. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions fr...

  10. The Ancestral Gene for Transcribed, Low-Copy Repeats in the Prader-Willi/Angelman Region Encodes a Large Protein Implicated in Protein Trafficking that is Deficient in Mice with Neuromuscular and

    SciTech Connect

    Ji, Y.

    1999-01-01

    Transcribed, low-copy repeat elements are associated with the breakpoint regions of common deletions in Prader-Willi and Angelman syndromes. We report here the identification of the ancestral gene (HERC2) and a family of duplicated, truncated copies that comprise these low-copy repeats. This gene encodes a highly conserved giant protein, HERC2, that is distantly related to p532 (HERC1), a guanine nucleotide exchange factor (GEF) implicated in vesicular trafficking. The mouse genome contains a single Herc2 locus, located in the jdf2 (juvenile development and fertility-2) interval of chromosome 7C. We have identified single nucleotide splice junction mutations in Herc2 in three independent N-ethyl-N-nitrosourea-induced jdf2 mutant alleles, each leading to exon skipping with premature termination of translation and/or deletion of conserved amino acids. Therefore, mutations in Herc2 lead to the neuromuscular secretory vesicle and sperm acrosome defects, other developmental abnormalities and juvenile lethality of jdf2 mice. Combined, these findings suggest that HERC2 is an important gene encoding a GEF involved in protein trafficking and degradation pathways in the cell.

  11. ProCARs: Progressive Reconstruction of Ancestral Gene Orders

    PubMed Central

    2015-01-01

    Background In the context of ancestral gene order reconstruction from extant genomes, there exist two main computational approaches: rearrangement-based, and homology-based methods. The rearrangement-based methods consist in minimizing a total rearrangement distance on the branches of a species tree. The homology-based methods consist in the detection of a set of potential ancestral contiguity features, followed by the assembling of these features into Contiguous Ancestral Regions (CARs). Results In this paper, we present a new homology-based method that uses a progressive approach for both the detection and the assembling of ancestral contiguity features into CARs. The method is based on detecting a set of potential ancestral adjacencies iteratively using the current set of CARs at each step, and constructing CARs progressively using a 2-phase assembling method. Conclusion We show the usefulness of the method through a reconstruction of the boreoeutherian ancestral gene order, and a comparison with three other homology-based methods: AnGeS, InferCARs and GapAdj. The program, written in Python, and the dataset used in this paper are available at http://bioinfo.lifl.fr/procars/. PMID:26040958

  12. Large-Scale Sequencing: The Future of Genomic Sciences Colloquium

    SciTech Connect

    Margaret Riley; Merry Buckley

    2009-01-01

    Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes - including bacteria, archaea, viruses, and microbial eukaryotes - provides researchers insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing. Until recently, sequencing entire microbial genomes has been laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencing is within reach for even the smallest facilities, and the ability to sequence the genomes of a significant fraction of microbial life may be possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact. The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed. The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. 
    A coordinated sequencing effort of cultured organisms is an appropriate place to begin

  13. Large-scale genomic analysis suggests a neutral punctuated dynamics of transposable elements in bacterial genomes.

    PubMed

    Iranzo, Jaime; Gómez, Manuel J; López de Saro, Francisco J; Manrubia, Susanna

    2014-06-01

    Insertion sequences (IS) are the simplest and most abundant form of transposable DNA found in bacterial genomes. When present in multiple copies, it is thought that they can promote genomic plasticity and genetic exchange, thus being a major force of evolutionary change. The main processes that determine IS content in genomes are, though, a matter of debate. In this work, we take advantage of the large amount of genomic data currently available and study the abundance distributions of 33 IS families in 1811 bacterial chromosomes. This allows us to test simple models of IS dynamics and estimate their key parameters by means of a maximum likelihood approach. We evaluate the roles played by duplication, lateral gene transfer, deletion and purifying selection. We find that the observed IS abundances are compatible with a neutral scenario where IS proliferation is controlled by deletions instead of purifying selection. Even if there may be some cases driven by selection, neutral behavior dominates over large evolutionary scales. According to this view, IS and hosts tend to coexist in a dynamic equilibrium state for most of the time. Our approach also allows for a detection of recent IS expansions, and supports the hypothesis that rapid expansions constitute transient events (punctuations) during which the state of coexistence of IS and host becomes perturbed. PMID:24967627

  14. SMRT® Sequencing Solutions for Large Genomes and Transcriptomes

    PubMed Central

    Chin, J.; Peluso, P.; Rank, D.; Kim, K.; Landolin, J.; Koren, S.; Phillippy, A.M.; Tseng, E.; Wang, S.; Baybayan, P.; Gu, J.

    2014-01-01

    Single Molecule, Real-Time (SMRT) Sequencing holds promise for addressing new frontiers in large genome complexities, such as long, highly repetitive, low-complexity regions and duplication events, and differentiating between transcript isoforms that are difficult to resolve with short-read technologies. We present solutions available for both reference genome improvement (100 MB) and transcriptome research to best leverage long reads that have exceeded 20 Kb in length. Benefits for these applications are further realized with consistent use of size-selection of input sample using the BluePippin™ device from Sage Science. Highlights from our genome improvement projects using the latest P5-C3 chemistry on model organisms with contig N50 exceeding 6 Mb and longest contig exceeding 12.5 Mb with an average base quality of QV50 will be shared. Additionally, the value of long, intact reads to provide a no-assembly approach to investigate transcript isoforms using our Iso-Seq protocol will be presented.
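
    The contig N50 quoted in this abstract is a standard assembly statistic: the length of the contig at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that definition follows; the function name is an illustrative assumption and is not part of any tool named above.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # first contig at which the running sum covers half the assembly
        if 2 * running >= total:
            return length
    return 0


example_n50 = n50([2, 3, 4, 5, 6])
```

    In the example the total length is 20; the two longest contigs (6 and 5) together cover 11, which is at least half, so the N50 is 5.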

  15. Fast randomization of large genomic datasets while preserving alteration counts

    PubMed Central

    Gobbi, Andrea; Iorio, Francesco; Dawson, Kevin J.; Wedge, David C.; Tamborero, David; Alexandrov, Ludmil B.; Lopez-Bigas, Nuria; Garnett, Mathew J.; Jurman, Giuseppe; Saez-Rodriguez, Julio

    2014-01-01

    Motivation: Studying combinatorial patterns in cancer genomic datasets has recently emerged as a tool for identifying novel cancer driver networks. Approaches have been devised to quantify, for example, the tendency of a set of genes to be mutated in a ‘mutually exclusive’ manner. The significance of the proposed metrics is usually evaluated by computing P-values under appropriate null models. To this end, a Monte Carlo method (the switching-algorithm) is used to sample simulated datasets under a null model that preserves patient- and gene-wise mutation rates. In this method, a genomic dataset is represented as a bipartite network, to which Markov chain updates (switching-steps) are applied. These steps modify the network topology, and a minimal number of them must be executed to draw simulated datasets independently under the null model. This number has previously been deduced empirically to be a linear function of the total number of variants, making this process computationally expensive. Results: We present a novel approximate lower bound for the number of switching-steps, derived analytically. Additionally, we have developed the R package BiRewire, including new efficient implementations of the switching-algorithm. We illustrate the performances of BiRewire by applying it to large real cancer genomics datasets. We report vast reductions in time requirement, with respect to existing implementations/bounds and equivalent P-value computations. Thus, we propose BiRewire to study statistical properties in genomic datasets, and other data that can be modeled as bipartite networks. Availability and implementation: BiRewire is available on BioConductor at http://www.bioconductor.org/packages/2.13/bioc/html/BiRewire.html Contact: iorio@ebi.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25161255
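
    The switching-step described in this abstract can be sketched on a binary gene-by-patient matrix: pick two mutations (edges of the bipartite network) and swap their patient endpoints whenever this creates no duplicate edge, which leaves every row and column sum unchanged. The sketch below is a plain Python illustration under those assumptions; it is not the BiRewire implementation, the function name and the `n_steps` parameter are hypothetical, and the number of steps is left to the caller rather than derived from the paper's bound.

```python
import random


def switching_steps(matrix, n_steps, seed=0):
    """Randomize a 0/1 matrix while preserving all row and column sums.

    Illustrative sketch of switching-steps on a bipartite network;
    not the BiRewire code.
    """
    rng = random.Random(seed)
    m = [row[:] for row in matrix]  # work on a copy
    edges = [(i, j) for i, row in enumerate(m) for j, v in enumerate(row) if v]
    for _ in range(n_steps):
        if len(edges) < 2:
            break
        a, b = rng.sample(range(len(edges)), 2)
        (i, j), (k, l) = edges[a], edges[b]
        # swap endpoints only if rows and columns differ and no edge is duplicated
        if i != k and j != l and m[i][l] == 0 and m[k][j] == 0:
            m[i][j] = m[k][l] = 0
            m[i][l] = m[k][j] = 1
            edges[a], edges[b] = (i, l), (k, j)
    return m


observed = [[1, 0, 1, 0],
            [0, 1, 0, 1],
            [1, 1, 0, 0]]
randomized = switching_steps(observed, n_steps=200, seed=1)
```

    Rejected proposals still count as steps here, so the number of successful swaps is at most `n_steps`; a statistic computed on many such randomized matrices gives an empirical null distribution for a P-value.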

  16. The ClinSeq Project: Piloting large-scale genome sequencing for research in genomic medicine

    PubMed Central

    Biesecker, Leslie G.; Mullikin, James C.; Facio, Flavia M.; Turner, Clesson; Cherukuri, Praveen F.; Blakesley, Robert W.; Bouffard, Gerard G.; Chines, Peter S.; Cruz, Pedro; Hansen, Nancy F.; Teer, Jamie K.; Maskeri, Baishali; Young, Alice C.; Manolio, Teri A.; Wilson, Alexander F.; Finkel, Toren; Hwang, Paul; Arai, Andrew; Remaley, Alan T.; Sachdev, Vandana; Shamburek, Robert; Cannon, Richard O.; Green, Eric D.

    2009-01-01

    ClinSeq is a pilot project to investigate the use of whole-genome sequencing as a tool for clinical research. By piloting the acquisition of large amounts of DNA sequence data from individual human subjects, we are fostering the development of hypothesis-generating approaches for performing research in genomic medicine, including the exploration of issues related to the genetic architecture of disease, implementation of genomic technology, informed consent, disclosure of genetic information, and archiving, analyzing, and displaying sequence data. In the initial phase of ClinSeq, we are enrolling roughly 1000 participants; the evaluation of each includes obtaining a detailed family and medical history, as well as a clinical evaluation. The participants are being consented broadly for research on many traits and for whole-genome sequencing. Initially, Sanger-based sequencing of 300–400 genes thought to be relevant to atherosclerosis is being performed, with the resulting data analyzed for rare, high-penetrance variants associated with specific clinical traits. The participants are also being consented to allow the contact of family members for additional studies of sequence variants to explore their potential association with specific phenotypes. Here, we present the general considerations in designing ClinSeq, preliminary results based on the generation of an initial 826 Mb of sequence data, the findings for several genes that serve as positive controls for the project, and our views about the potential implications of ClinSeq. The early experiences with ClinSeq illustrate how large-scale medical sequencing can be a practical, productive, and critical component of research in genomic medicine. PMID:19602640

  17. Optimizing restriction fragment fingerprinting methods for ordering large genomic libraries

    SciTech Connect

    Branscomb, E.; Slezak, T.; Pae, R.; Carrano, A.V.; Galas, D.; Waterman, M.

    1990-01-01

    The authors present a statistical analysis of the problem of ordering large genomic cloned libraries through overlap detection based on restriction fingerprinting. Such ordering projects involve a large investment of effort involving many repetitious experiments. Their primary purpose here is to provide methods of maximizing the efficiency of such efforts. To this end, they adopt a statistical approach that uses the likelihood ratio as a statistic to detect overlap. The main advantages of this approach are that (1) it allows the relatively straightforward incorporation of the observed statistical properties of the data; (2) it permits the efficiency of a particular experimental method for detecting overlap to be quantitatively defined so that alternative experimental designs may be compared and optimized; and (3) it yields a direct estimate of the probability that any two library members overlap. This estimate is a critical tool for the accurate, automatic assembly of overlapping sets of fragments into islands called "contigs." These contigs must subsequently be connected by other methods to provide an ordered set of overlapping fragments covering the entire genome.

  18. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach.

    PubMed

    Boitard, Simon; Rodríguez, Willy; Jay, Flora; Mona, Stefano; Austerlitz, Frédéric

    2016-03-01

    Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles. PMID:26943927
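
    One of the two summary statistics named in this abstract, the folded allele frequency spectrum, can be computed directly from unpolarized SNP data: since the ancestral allele is unknown, each SNP is counted by its minor allele count. The sketch below shows only that definition; it is not taken from PopSizeABC, and the function name and input layout (one allele count per SNP) are assumptions made for illustration.

```python
from collections import Counter


def folded_sfs(allele_counts, n_chrom):
    """Folded allele frequency spectrum.

    allele_counts: for each SNP, the number of copies of one (arbitrarily
                   chosen) allele among the sampled chromosomes
    n_chrom:       number of sampled chromosomes (2 x diploid individuals)
    Returns a list whose entry k-1 is the number of SNPs with minor allele
    count k, for k = 1 .. n_chrom // 2. Monomorphic sites are ignored.
    """
    minor = Counter(min(c, n_chrom - c) for c in allele_counts)
    return [minor.get(k, 0) for k in range(1, n_chrom // 2 + 1)]


spectrum = folded_sfs([1, 9, 5, 3, 7, 0, 10], n_chrom=10)
```

    In the example, counts 1 and 9 both fold to a minor allele count of 1, counts 3 and 7 fold to 3, and the sites with counts 0 and 10 are monomorphic and dropped.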

  19. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach

    PubMed Central

    Boitard, Simon; Rodríguez, Willy; Jay, Flora; Mona, Stefano; Austerlitz, Frédéric

    2016-01-01

    Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles. PMID:26943927

  20. CGCI Investigators Reveal Comprehensive Landscape of Diffuse Large B-Cell Lymphoma (DLBCL) Genomes | Office of Cancer Genomics

    Cancer.gov

    Researchers from British Columbia Cancer Agency used whole genome sequencing to analyze 40 DLBCL cases and 13 cell lines in order to fill in the gaps of the complex landscape of DLBCL genomes. Their analysis, “Mutational and structural analysis of diffuse large B-cell lymphoma using whole genome sequencing,” was published online in Blood on May 22. The authors are Ryan Morin, Marco Marra, and colleagues.  

  1. Volume visualization of multiple alignment of large genomic DNA

    SciTech Connect

    Shah, Nameeta; Dillard, Scott E.; Weber, Gunther H.; Hamann, Bernd

    2005-07-25

    Genomes of hundreds of species have been sequenced to date, and many more are being sequenced. As more and more sequence data sets become available, and as the challenge of comparing these massive "billion basepair DNA sequences" becomes substantial, so does the need for more powerful tools supporting the exploration of these data sets. Similarity score data used to compare aligned DNA sequences is inherently one-dimensional. One-dimensional (1D) representations of these data sets do not effectively utilize screen real estate. As a result, tools using 1D representations are incapable of providing an informative overview for extremely large data sets. We present a technique to arrange 1D data in 3D space to allow us to apply state-of-the-art interactive volume visualization techniques for data exploration. We demonstrate our technique using multi-millions-basepair-long aligned DNA sequence data and compare it with traditional 1D line plots. The results show that our technique is superior in providing an overview of entire data sets. Our technique, coupled with 1D line plots, results in effective multi-resolution visualization of very large aligned sequence data sets.

  2. Genomic analysis of regulatory network dynamics reveals large topological changes

    NASA Astrophysics Data System (ADS)

    Luscombe, Nicholas M.; Madan Babu, M.; Yu, Haiyuan; Snyder, Michael; Teichmann, Sarah A.; Gerstein, Mark

    2004-09-01

    Network analysis has been applied widely, providing a unifying language to describe disparate systems ranging from social interactions to power grids. It has recently been used in molecular biology, but so far the resulting networks have only been analysed statically. Here we present the dynamics of a biological network on a genomic scale, by integrating transcriptional regulatory information and gene-expression data for multiple conditions in Saccharomyces cerevisiae. We develop an approach for the statistical analysis of network dynamics, called SANDY, combining well-known global topological measures, local motifs and newly derived statistics. We uncover large changes in underlying network architecture that are unexpected given current viewpoints and random simulations. In response to diverse stimuli, transcription factors alter their interactions to varying degrees, thereby rewiring the network. A few transcription factors serve as permanent hubs, but most act transiently only during certain conditions. By studying sub-network structures, we show that environmental responses facilitate fast signal propagation (for example, with short regulatory cascades), whereas the cell cycle and sporulation direct temporal progression through multiple stages (for example, with highly inter-connected transcription factors). Indeed, to drive the latter processes forward, phase-specific transcription factors inter-regulate serially, and ubiquitously active transcription factors layer above them in a two-tiered hierarchy. We anticipate that many of the concepts presented here, particularly the large-scale topological changes and hub transience, will apply to other biological networks, including complex sub-systems in higher eukaryotes.

  3. Ancestral reconstruction of tick lineages.

    PubMed

    Mans, Ben J; de Castro, Minique H; Pienaar, Ronel; de Klerk, Daniel; Gaven, Philasande; Genu, Siyamcela; Latif, Abdalla A

    2016-06-01

    Ancestral reconstruction in its fullest sense aims to describe the complete evolutionary history of a lineage. This depends on accurate phylogenies and an understanding of the key characters of each parental lineage. An attempt is made to delineate our current knowledge with regard to the ancestral reconstruction of the tick (Ixodida) lineage. Tick characters may be assigned to Core of Life, Lineages of Life or Edges of Life phenomena depending on how far back these characters may be assigned in the evolutionary Tree of Life. These include housekeeping genes, sub-cellular systems, heme processing (Core of Life), development, moulting, appendages, nervous and organ systems, homeostasis, respiration (Lineages of Life), specific adaptations to a blood-feeding lifestyle, including the complexities of salivary gland secretions and tick-host interactions (Edges of Life). The phylogenetic relationships of lineages, their origins and importance in ancestral reconstruction are discussed. Uncertainties with respect to systematic relationships, ancestral reconstruction and the challenges faced in comparative transcriptomics (next-generation sequencing approaches) are highlighted. While almost 150 years of information regarding tick biology have been assembled, progress in recent years indicates that we are in the infancy of understanding tick evolution. Even so, broad reconstructions can be made with relation to biological features associated with various lineages. Conservation of characters shared with sister and parent lineages are evident, but appreciable differences are present in the tick lineage indicating modification with descent, as expected for Darwinian evolutionary theory. Many of these differences can be related to the hematophagous lifestyle of ticks. PMID:26868413

  4. Intron-genome size relationship on a large evolutionary scale.

    PubMed

    Vinogradov, A E

    1999-09-01

    The intron-genome size relationship was studied across a wide evolutionary range (from slime mold and yeast to human and maize), as well as the relationship between genome size and the ratio of intervening/coding sequence size. The average intron size is scaled to genome size with a slope of about one-fourth for the log-transformed values; i.e., on the global scale its increase in evolution is lower than the increase in genome size by four orders of magnitude. There are exceptions to the general trend. In baker's yeast introns are extraordinarily long for its genome size. Tetrapods also have longer introns than expected for their genome sizes. In teleost fish the mean intron size does not differ significantly, notwithstanding the differences in genome size. In contrast to previous reports, avian introns were not found to be significantly shorter than introns of mammals, although avian genomes are smaller than genomes of mammals on average by about a factor of 2.5. The extra-/intragenic ratio of noncoding DNA can be higher in fungi than in animals, notwithstanding the smaller fungal genomes. In vertebrates and invertebrates taken separately, this ratio increases with genome size. Two hypotheses are proposed to explain the variation in the extra-/intragenic ratio of noncoding DNA in organisms with similar numbers of genes: transition (dynamic) and equilibrium (static). According to the transition model, this variation arises with the rapid shift of genome size because the bulk of extragenic DNA can be changed more rapidly than the finely interspersed intron sequences. The equilibrium model assumes that this variation is a result of selective adjustment of genome size with constraints imposed on the intron size due to its putative link to chromatin structure (and constraints of the splicing machinery). PMID:10473779
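
    Read as a power law, a log-log slope of about one-fourth means that average intron size grows roughly as the fourth root of genome size. The sketch below works through that arithmetic; the function name is hypothetical, and the slope is the approximate value quoted in the abstract rather than a fitted estimate.

```python
APPROX_SLOPE = 0.25  # log-log slope "of about one-fourth" from the abstract


def intron_fold_change(genome_fold_change, slope=APPROX_SLOPE):
    """Expected fold change in average intron size for a given fold change
    in genome size, assuming log(intron) = a + slope * log(genome)."""
    return genome_fold_change ** slope


# A genome 10,000 times larger (four orders of magnitude) is expected to
# carry introns only about 10 times longer (one order of magnitude).
print(intron_fold_change(1e4))
```

    The exceptions listed in the abstract (baker's yeast, tetrapods, teleost fish) are departures from this global trend, so the function should not be applied within those groups.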

  5. Evaluation of Target Preparation Methods for Single Feature Polymorphism Detection in Large Complex Plant Genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    For those genomes low in repetitive DNA, hybridizing total genomic DNA to high-density expression arrays offers an effective strategy for scoring single feature polymorphisms (SFPs). Of the ~2.5 Gb that constitute the maize genome (Zea mays L.), only 10-20% are genic sequences, with large amounts o...

  6. Combining p-values in large-scale genomics experiments.

    PubMed

    Zaykin, Dmitri V; Zhivotovsky, Lev A; Czika, Wendy; Shao, Susan; Wolfinger, Russell D

    2007-01-01

    In large-scale genomics experiments involving thousands of statistical tests, such as association scans and microarray expression experiments, a key question is: Which of the L tests represent true associations (TAs)? The traditional way to control false findings is via individual adjustments. In the presence of multiple TAs, p-value combination methods offer certain advantages. Both Fisher's and Lancaster's combination methods use an inverse gamma transformation. We identify the relation of the shape parameter of that distribution to the implicit threshold value; p-values below that threshold are favored by the inverse gamma method (GM). We explore this feature to improve power over Fisher's method when L is large and the number of TAs is moderate. However, the improvement in power provided by combination methods is at the expense of a weaker claim made upon rejection of the null hypothesis - that there are some TAs among the L tests. Thus, GM remains a global test. To allow a stronger claim about a subset of p-values that is smaller than L, we investigate two methods with an explicit truncation: the rank truncated product method (RTP) that combines the first K-ordered p-values, and the truncated product method (TPM) that combines p-values that are smaller than a specified threshold. We conclude that TPM allows claims to be made about subsets of p-values, while the claim of the RTP is, like GM, more appropriately about all L tests. GM gives somewhat higher power than TPM, RTP, Fisher, and Simes methods across a range of simulations. PMID:17879330
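
    The combination statistics compared in this abstract are short to write down. The sketch below is a minimal standard-library illustration with hypothetical function names: it computes Fisher's combined p-value using the closed-form chi-square tail for even degrees of freedom, and only the raw TPM and RTP statistics. The null distributions of TPM and RTP, and the inverse gamma method itself, are not implemented, and the default threshold of 0.05 is an illustrative assumption.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2L degrees
    of freedom under the global null. Requires every p in (0, 1]."""
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Chi-square survival function for even df = 2L:
    # P(X > x) = exp(-x/2) * sum_{k=0}^{L-1} (x/2)^k / k!
    term, total = 1.0, 0.0
    for k in range(len(pvalues)):
        if k > 0:
            term *= half_stat / k
        total += term
    return math.exp(-half_stat) * total


def tpm_statistic(pvalues, tau=0.05):
    """Truncated product: product of the p-values at or below tau
    (1 if none qualify)."""
    return math.prod(p for p in pvalues if p <= tau)


def rtp_statistic(pvalues, k):
    """Rank truncated product: product of the k smallest p-values."""
    return math.prod(sorted(pvalues)[:k])
```

    Turning the TPM or RTP statistic into a p-value requires its null distribution, for example by simulating uniform p-values; that step is left out here.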

  7. Combining p-values in large scale genomics experiments

    PubMed Central

    Zaykin, Dmitri V.; Zhivotovsky, Lev A.; Czika, Wendy; Shao, Susan; Wolfinger, Russell D.

    2008-01-01

    Summary In large-scale genomics experiments involving thousands of statistical tests, such as association scans and microarray expression experiments, a key question is: Which of the L tests represent true associations (TAs)? The traditional way to control false findings is via individual adjustments. In the presence of multiple TAs, p-value combination methods offer certain advantages. Both Fisher’s and Lancaster’s combination methods use an inverse gamma transformation. We identify the relation of the shape parameter of that distribution to the implicit threshold value; p-values below that threshold are favored by the inverse gamma method (GM). We explore this feature to improve power over Fisher’s method when L is large and the number of TAs is moderate. However, the improvement in power provided by combination methods is at the expense of a weaker claim made upon rejection of the null hypothesis – that there are some TAs among the L tests. Thus, GM remains a global test. To allow a stronger claim about a subset of p-values that is smaller than L, we investigate two methods with an explicit truncation: the rank truncated product method (RTP) that combines the first K ordered p-values, and the truncated product method (TPM) that combines p-values that are smaller than a specified threshold. We conclude that TPM allows claims to be made about subsets of p-values, while the claim of the RTP is, like GM, more appropriately about all L tests. GM gives somewhat higher power than TPM, RTP, Fisher, and Simes methods across a range of simulations. PMID:17879330

  8. Genomic evidence for large, long-lived ancestors to placental mammals.

    PubMed

    Romiguier, J; Ranwez, V; Douzery, E J P; Galtier, N

    2013-01-01

    It is widely assumed that our mammalian ancestors, which lived in the Cretaceous era, were tiny animals that survived massive asteroid impacts in shelters and evolved into modern forms after dinosaurs went extinct, 65 Ma. The small size of most Mesozoic mammalian fossils essentially supports this view. Paleontology, however, is not conclusive regarding the ancestry of extant mammals, because Cretaceous and Paleocene fossils are not easily linked to modern lineages. Here, we use full-genome data to estimate the longevity and body mass of early placental mammals. Analyzing 36 fully sequenced mammalian genomes, we reconstruct two aspects of the ancestral genome dynamics, namely GC-content evolution and nonsynonymous over synonymous rate ratio. Linking these molecular evolutionary processes to life-history traits in modern species, we estimate that early placental mammals had a life span above 25 years and a body mass above 1 kg. This is similar to current primates, cetartiodactyls, or carnivores, but markedly different from mice or shrews, challenging the dominant view about mammalian origin and evolution. Our results imply that long-lived mammals existed in the Cretaceous era and were the most successful in evolution, opening new perspectives about the conditions for survival to the Cretaceous-Tertiary crisis. PMID:22949523

  9. Antarctic krill population genomics: apparent panmixia, but genome complexity and large population size muddy the water.

    PubMed

    Deagle, Bruce E; Faux, Cassandra; Kawaguchi, So; Meyer, Bettina; Jarman, Simon N

    2015-10-01

    Antarctic krill (Euphausia superba; hereafter krill) are an incredibly abundant pelagic crustacean which has a wide, but patchy, distribution in the Southern Ocean. Several studies have examined the potential for population genetic structuring in krill, but DNA-based analyses have focused on a limited number of markers and have covered only part of their circum-Antarctic range. We used mitochondrial DNA and restriction site-associated DNA sequencing (RAD-seq) to investigate genetic differences between krill from five sites, including two from East Antarctica. Our mtDNA results show no discernible genetic structuring between sites separated by thousands of kilometres, which is consistent with previous studies. Using standard RAD-seq methodology, we obtained over a billion sequences from >140 krill, and thousands of variable nucleotides were identified at hundreds of loci. However, downstream analysis found that markers with sufficient coverage were primarily from multicopy genomic regions. Careful examination of these data highlights the complexity of the RAD-seq approach in organisms with very large genomes. To characterize the multicopy markers, we recorded sequence counts from variable nucleotide sites rather than the derived genotypes; we also examined a small number of manually curated genotypes. Although these analyses effectively fingerprinted individuals, and uncovered a minor laboratory batch effect, no population structuring was observed. Overall, our results are consistent with panmixia of krill throughout their distribution. This result may indicate ongoing gene flow. However, krill's enormous population size creates substantial panmictic inertia, so genetic differentiation may not occur on an ecologically relevant timescale even if demographically separate populations exist. PMID:26340718

  10. Accommodating the load: The transposable element content of very large genomes.

    PubMed

    Metcalfe, Cushla J; Casane, Didier

    2013-03-01

    Very large genomes, that is, those above 20 Gb, are rare but widely distributed throughout the eukaryotes. They are found within the diatoms, dinoflagellates, metazoans and green plants, but so far have not been found in the excavates. There is a known positive correlation between genome size and the proportion of the genome composed of transposable elements (TEs). Very large genomes may therefore be expected to be almost entirely composed of TEs. Of the large genomes examined, in the angiosperms, gymnosperms and the dinoflagellates only a small portion of the genome was identified as TEs, most of these genomes were unidentified and may be novel or diverse TEs. In the salamanders and lungfish, 25 to 47% of the genome were identifiable retrotransposons, that is, TEs that copy themselves before insertion. However, the predominant class of TEs found in the lungfish was not the same as that found in the salamanders. The little data we have at the moment suggests therefore that the diversity and abundance of TEs is variable between taxa with large genomes, similar to patterns found in taxa with smaller genomes. Based on results from the human genome, we suggest that the 'missing' portion of the lungfish and salamander genomes are old, highly divergent, and therefore inactive copies of TEs. The data available indicate that, unlike plants with large genomes, neither the lungfish nor the salamanders show an increased risk of extinction. Based on a slow rate of DNA loss in salamanders it has been suggested that the large salamander genome is the result of run-away genome expansion involving genome size increases via TE proliferation associated with reduced recombination rate. We know of no studies on DNA loss or recombination rates in lungfish genomes, however a similar scenario could describe the process of genome expansion in the lungfish. A series of waves of TE transposition and sequence decay would describe the pattern of TE content seen in both the lungfish and the

  11. Invariants of DNA genomic signals

    NASA Astrophysics Data System (ADS)

    Cristea, Paul Dan A.

    2005-02-01

    For large scale analysis purposes, the conversion of genomic sequences into digital signals opens the possibility to use powerful signal processing methods for handling genomic information. The study of complex genomic signals reveals large scale features, maintained over the scale of whole chromosomes, that would be difficult to find by using only the symbolic representation. Based on genomic signal methods and on statistical techniques, the paper defines parameters of DNA sequences which are invariant to transformations induced by SNPs, splicing or crossover. Re-orienting concatenated coding regions in the same direction, regularities shared by the genomic material in all exons are revealed, pointing towards the hypothesis of a regular ancestral structure from which the current chromosome structures have evolved. This property is not found in non-nuclear genomic material, e.g., plasmids.

  12. Identifying Recent Adaptations in Large-scale Genomic Data

    PubMed Central

    Grossman, Sharon R.; Andersen, Kristian G.; Shlyakhter, Ilya; Tabrizi, Shervin; Winnicki, Sarah; Yen, Angela; Park, Daniel J.; Griesemer, Dustin; Karlsson, Elinor K.; Wong, Sunny H.; Cabili, Moran; Adegbola, Richard A.; Bamezai, Rameshwar N. K.; Hill, Adrian V. S.; Vannberg, Fredrik O.; Rinn, John L.; Lander, Eric S.; Schaffner, Stephen F.; Sabeti, Pardis C.

    2013-01-01

    SUMMARY While several hundred regions of the human genome harbor signals of positive natural selection, few of the relevant adaptive traits and variants have been elucidated. Using full-genome sequence variation from the 1000 Genomes Project (1000G) and the Composite of Multiple Signals (CMS) test, we investigated 412 candidate signals and leveraged functional annotation, protein structure modeling, epigenetics, and association studies to identify and extensively annotate candidate causal variants. The resulting catalog provides a tractable list for experimental follow-up; it includes thirty-five high-scoring non-synonymous variants, fifty-nine variants associated with expression levels of a nearby coding gene or lincRNA, and numerous variants associated with susceptibility to infectious disease and other phenotypes. We experimentally characterized one candidate non-synonymous variant in TLR5, and show that it leads to altered NF-κB signaling in response to bacterial flagellin. PMID:23415221

  13. Targeted Large-Scale Deletion of Bacterial Genomes Using CRISPR-Nickases

    PubMed Central

    2015-01-01

    Programmable CRISPR-Cas systems have augmented our ability to produce precise genome manipulations. Here we demonstrate and characterize the ability of CRISPR-Cas derived nickases to direct targeted recombination of both small and large genomic regions flanked by repetitive elements in Escherichia coli. While CRISPR directed double-stranded DNA breaks are highly lethal in many bacteria, we show that CRISPR-guided nickase systems can be programmed to make precise, nonlethal, single-stranded incisions in targeted genomic regions. This induces recombination events and leads to targeted deletion. We demonstrate that dual-targeted nicking enables deletion of 36 and 97 Kb of the genome. Furthermore, multiplex targeting enables deletion of 133 Kb, accounting for approximately 3% of the entire E. coli genome. This technology provides a framework for methods to manipulate bacterial genomes using CRISPR-nickase systems. We envision this system working synergistically with preexisting bacterial genome engineering methods. PMID:26451892

  14. The influence of large scale genomics and the changing role of ex situ collections

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The development of large scale genomics resources in non-model organisms promises to have a fundamental impact on the utilization of genetic resources. Technical innovation in high through-put sequencing has reduced the cost to a point where genome-wide SNP development is feasible across a range of ...

  15. Mutational and structural analysis of diffuse large B-cell lymphoma using whole genome sequencing | Office of Cancer Genomics

    Cancer.gov

    Abstract: Diffuse large B-cell lymphoma (DLBCL) is a genetically heterogeneous cancer comprising at least two molecular subtypes that differ in gene expression and distribution of mutations. Recently, application of genome/exome sequencing and RNA-seq to DLBCL has revealed numerous genes that are recurrent targets of somatic point mutation in this disease.

  16. GEnomes Management Application (GEM.app): a new software tool for large-scale collaborative genome analysis.

    PubMed

    Gonzalez, Michael A; Lebrigio, Rafael F Acosta; Van Booven, Derek; Ulloa, Rick H; Powell, Eric; Speziani, Fiorella; Tekin, Mustafa; Schüle, Rebecca; Züchner, Stephan

    2013-06-01

    Novel genes are now identified at a rapid pace for many Mendelian disorders, and increasingly, for genetically complex phenotypes. However, new challenges have also become evident: (1) effectively managing larger exome and/or genome datasets, especially for smaller labs; (2) direct hands-on analysis and contextual interpretation of variant data in large genomic datasets; and (3) many small and medium-sized clinical and research-based investigative teams around the world are generating data that, if combined and shared, will significantly increase the opportunities for the entire community to identify new genes. To address these challenges, we have developed GEnomes Management Application (GEM.app), a software tool to annotate, manage, visualize, and analyze large genomic datasets (https://genomics.med.miami.edu/). GEM.app currently contains ∼1,600 whole exomes from 50 different phenotypes studied by 40 principal investigators from 15 different countries. The focus of GEM.app is on user-friendly analysis for nonbioinformaticians to make next-generation sequencing data directly accessible. Yet, GEM.app provides powerful and flexible filter options, including single family filtering, across family/phenotype queries, nested filtering, and evaluation of segregation in families. In addition, the system is fast, obtaining results within 4 sec across ∼1,200 exomes. We believe that this system will further enhance identification of genetic causes of human disease. PMID:23463597

  17. GE-17 ALTERATION OF THE p53 PATHWAY AND ANCESTRAL PROGENITORS ARE ASSOCIATED WITH TUMOR RECURRENCE IN GLIOBLASTOMA

    PubMed Central

    Kim, Hoon; Zheng, Siyuan; Amini, Seyed; Virk, Selene; Mikkelsen, Tom; Brat, Daniel; Sougnez, Carrie; Muller, Florian; Hu, Jian; Sloan, Andrew; Cohen, Mark; Van Meir, Erwin; Scarpace, Lisa; Lander, Eric; Gabriel, Stacey; Getz, Gad; Meyerson, Matthew; Chin, Lynda; Barnholtz-Sloan, Jill; Verhaak, Roel

    2014-01-01

    To evaluate evolutionary patterns of GBM recurrence, we analyzed whole genome sequencing (WGS) and multi-sector exome sequencing data from pairs of primary and posttreatment GBM. WGS on ten primary-recurrent pairs detected a median number of 12,214 mutations which we utilized to uncover clonal structures, by analyzing the distribution of mutation cellular frequencies (the fraction of tumor cells harboring a mutation). On average, 41% of the mutations were shared by primary and recurrence. The majority of shared mutations were clonal in both primary and recurrence, but we also observed many clonal mutations that were uniquely detected in either the primary or the recurrence. This raises the intriguing possibility that major tumor clones in the primary tumor and disease relapse both evolved from a shared ancestral tumor cell population. At least one subclone was identified in the majority of WGS samples, and we observed groups of mutations that were at low cancer cell fractions in both primary and recurrence, suggesting that both subclones evolved from the same ancestral tumor cells separate from the major clone ancestral cells. To address the possibility that the lack of overlap between subsequent tumors was due to intratumoral heterogeneity, we analyzed exome sequencing from a second tumor sector of seven primary and six recurrent tumors. We found that the majority of "second biopsy" mutations were not conserved between time points, suggesting that intratumoral heterogeneity did not explain the large number of mutations uniquely detected in primary and recurrence. The limited overlap of mutations in primary and recurrence provides evidence for ancestral tumor cell populations that could not be eradicated by therapy, while offspring cell populations contained unique mutations, were selectively killed by treatment and could therefore no longer be detected after disease relapse. This study has provided new insights into patterns and dynamics of tumor evolution.

  18. BactoGeNIE: a large-scale comparative genome visualization for big displays

    PubMed Central

    2015-01-01

    Background The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. Results In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. Conclusions BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics. PMID:26329021

  19. BactoGeNIE: A large-scale comparative genome visualization for big displays

    DOE PAGES Beta

    Aurisano, Jillian; Reda, Khairi; Johnson, Andrew; Marai, Elisabeta G.; Leigh, Jason

    2015-08-13

    The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. In conclusion, BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics.

  20. BactoGeNIE: A large-scale comparative genome visualization for big displays

    SciTech Connect

    Aurisano, Jillian; Reda, Khairi; Johnson, Andrew; Marai, Elisabeta G.; Leigh, Jason

    2015-08-13

    The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. In conclusion, BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics.

  1. Recreating a functional ancestral archosaur visual pigment.

    PubMed

    Chang, Belinda S W; Jönsson, Karolina; Kazmi, Manija A; Donoghue, Michael J; Sakmar, Thomas P

    2002-09-01

    The ancestors of the archosaurs, a major branch of the diapsid reptiles, originated more than 240 MYA near the dawn of the Triassic Period. We used maximum likelihood phylogenetic ancestral reconstruction methods and explored different models of evolution for inferring the amino acid sequence of a putative ancestral archosaur visual pigment. Three different types of maximum likelihood models were used: nucleotide-based, amino acid-based, and codon-based models. Where possible, within each type of model, likelihood ratio tests were used to determine which model best fit the data. Ancestral reconstructions of the ancestral archosaur node using the best-fitting models of each type were found to be in agreement, except for three amino acid residues at which one reconstruction differed from the other two. To determine if these ancestral pigments would be functionally active, the corresponding genes were chemically synthesized and then expressed in a mammalian cell line in tissue culture. The expressed artificial genes were all found to bind to 11-cis-retinal to yield stable photoactive pigments with λmax values of about 508 nm, which is slightly redshifted relative to that of extant vertebrate pigments. The ancestral archosaur pigments also activated the retinal G protein transducin, as measured in a fluorescence assay. Our results show that ancestral genes from ancient organisms can be reconstructed de novo and tested for function using a combination of phylogenetic and biochemical methods. PMID:12200476

  2. Multiway admixture deconvolution using phased or unphased ancestral panels.

    PubMed

    Churchhouse, Claire; Marchini, Jonathan

    2013-01-01

    We describe a novel method for inferring the local ancestry of admixed individuals from dense genome-wide single nucleotide polymorphism data. The method, called MULTIMIX, allows multiple source populations, models population linkage disequilibrium between markers and is applicable to datasets in which the sample and source populations are either phased or unphased. The model is based upon a hidden Markov model of switches in ancestry between consecutive windows of loci. We model the observed haplotypes within each window using a multivariate normal distribution with parameters estimated from the ancestral panels. We present three methods to fit the model: Markov chain Monte Carlo sampling, the Expectation Maximization algorithm, and a Classification Expectation Maximization algorithm. The performance of our method on individuals simulated to be admixed with European and West African ancestry shows it to be comparable to HAPMIX, with the ancestry calls of the two methods agreeing at 99.26% of loci across the three parameter groups. In addition to being faster than HAPMIX, it is also found to perform well over a range of extents of admixture in a simulation involving three ancestral populations. In an analysis of real data, we estimate the contribution of European, West African and Native American ancestry to each locus in the Mexican samples of HapMap, giving estimates of ancestral proportions that are consistent with those previously reported. PMID:23136122
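
    The window-based hidden Markov model described in this abstract can be illustrated with a toy decoder. The sketch below is not the MULTIMIX implementation: it assumes one summary value per window with a univariate normal emission (the abstract describes a multivariate normal), fixed made-up parameters instead of the MCMC/EM/CEM fitting, and a single hypothetical `switch_prob` for ancestry switches between consecutive windows. It only shows how a most likely ancestry path could be read off such a model.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(observations, means, sds, switch_prob):
    """Most likely ancestry state per window under a toy HMM.

    States are source populations; switch_prob (hypothetical parameter) is
    the chance of changing ancestry between consecutive windows.
    """
    k = len(means)
    stay = math.log(1 - switch_prob)
    switch = math.log(switch_prob / (k - 1))

    # Uniform prior over ancestries for the first window.
    score = [math.log(1 / k) + log_normal_pdf(observations[0], means[s], sds[s])
             for s in range(k)]
    back = []
    for x in observations[1:]:
        new_score, pointers = [], []
        for s in range(k):
            # Best predecessor state for ending in state s at this window.
            best_prev = max(range(k),
                            key=lambda p: score[p] + (stay if p == s else switch))
            trans = stay if best_prev == s else switch
            new_score.append(score[best_prev] + trans
                             + log_normal_pdf(x, means[s], sds[s]))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Made-up per-window summaries for two hypothetical source populations.
windows = [0.1, 0.2, 0.15, 0.9, 0.85, 0.95, 0.1]
path = viterbi(windows, means=[0.1, 0.9], sds=[0.1, 0.1], switch_prob=0.05)
print(path)
```

    With these toy values the decoded path switches ancestry exactly where the window summaries jump, because the emission penalty for staying far outweighs the switch penalty.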

  3. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity.

    PubMed

    Pope, Welkin H; Bowman, Charles A; Russell, Daniel A; Jacobs-Sera, Deborah; Asai, David J; Cresawn, Steven G; Jacobs, William R; Hendrix, Roger W; Lawrence, Jeffrey G; Hatfull, Graham F

    2015-01-01

    The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis was performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery. PMID:25919952

  4. Ancestral gene and "complementary" antibody dominate early ontogeny.

    PubMed

    Arend, Peter

    2013-05-01

    According to N.K. Jerne the somatic generation of immune recognition occurs in conjunction with germ cell evolution and precedes the formation of the zygote, i.e. operates before clonal selection. We propose that it is based on interspecies inherent, ancestral forces maintaining the lineage. Murine oogenesis may be offered as a model. So in C57BL/10BL sera an anti-A reactive, mercapto-ethanol sensitive glycoprotein of up to now unknown cellular origin, but exhibiting immunoglobulin M character, presents itself "complementary" to a syngeneic epitope, which encoded by histocompatibility gene A or meanwhile accepted ancestor of the ABO gene family, arises predominantly in ovarian tissue and was detected statistically significant exclusively in polar glycolipids. Reports either on loss, pronounced expressions or de novo appearances of A-type structures in various conditions of accelerated growth like germ cell evolution, wound healing, inflammation and tumor proliferation in man and ABO related animals might show the dynamics of ancestral functions guaranteeing stem cell fidelity in maturation and tissue renewal processes. Procedures vice versa generating pluripotent stem cells for therapeutical reasons may indicate, that any artificially started growth should somehow pass through the germ line from the beginning, where according to growing knowledge exclusively the oocyte's genome provides a completely channeling ancestral information. In predatory animals such as the modern-day sea anemone, ancestral proteins, particularly those of the p53 gene family govern the reproduction processes, and are active up to the current mammalian female germ line. Lectins, providing the dual function of growth promotion and defense in higher plants, are suggested to represent the evolutionary precursors of the mammalian immunoglobulin M molecules, or protein moiety implying the greatest functional diversity in nature.
And apart from any established mammalian genetic tree, a common vetch

  5. Large-scale genomic comparison using two-dimensional DNA gels

    SciTech Connect

    Sidman, C.L.; Shaffer, D.J.

    1994-09-01

    Two-dimensional electrophoresis (2DE) of DNA fragments, in which separation occurs first by size and then by sequence variation, is a method enabling large-scale comparison of complex genomes. Combining 2DE with probing for various classes of repetitive genomic elements allows rapid and efficient comparison of thousands of fragments and millions of basepairs of DNA distributed across most genomic regions. This approach is demonstrated here by analyzing the extent of genomic relatedness of different inbred strains of mice. Such strains are shown to differ from each other by approximately 0.2-1% of their nucleotides, above which level reproductive speciation occurs. The 2DE method of assessing the overall relationship between two genomes represents an appropriate tool for analyzing members of a single species, but is too sensitive for use in interspecies comparisons. 51 refs., 4 figs., 1 tab.

  6. Radiation hybrid maps of D-genome of Aegilops tauschii and their application in sequence assembly of large and complex plant genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The large and complex genome of bread wheat (Triticum aestivum L., ~17 Gb) requires high-resolution genome maps saturated with ordered markers to assist in anchoring and orienting BAC contigs/ sequence scaffolds for whole genome sequence assembly. Radiation hybrid (RH) mapping has proven to be an e...

  7. Large-scale profiling of microRNAs for The Cancer Genome Atlas

    PubMed Central

    Chu, Andy; Robertson, Gordon; Brooks, Denise; Mungall, Andrew J.; Birol, Inanc; Coope, Robin; Ma, Yussanne; Jones, Steven; Marra, Marco A.

    2016-01-01

    The comprehensive multiplatform genomics data generated by The Cancer Genome Atlas (TCGA) Research Network is an enabling resource for cancer research. It includes an unprecedented amount of microRNA sequence data: ∼11 000 libraries across 33 cancer types. Combined with initiatives like the National Cancer Institute Genomics Cloud Pilots, such data resources will make intensive analysis of large-scale cancer genomics data widely accessible. To support such initiatives, and to enable comparison of TCGA microRNA data to data from other projects, we describe the process that we developed and used to generate the microRNA sequence data, from library construction through to submission of data to repositories. In the context of this process, we describe the computational pipeline that we used to characterize microRNA expression across large patient cohorts. PMID:26271990

  8. Large-scale profiling of microRNAs for The Cancer Genome Atlas.

    PubMed

    Chu, Andy; Robertson, Gordon; Brooks, Denise; Mungall, Andrew J; Birol, Inanc; Coope, Robin; Ma, Yussanne; Jones, Steven; Marra, Marco A

    2016-01-01

    The comprehensive multiplatform genomics data generated by The Cancer Genome Atlas (TCGA) Research Network is an enabling resource for cancer research. It includes an unprecedented amount of microRNA sequence data: ~11 000 libraries across 33 cancer types. Combined with initiatives like the National Cancer Institute Genomics Cloud Pilots, such data resources will make intensive analysis of large-scale cancer genomics data widely accessible. To support such initiatives, and to enable comparison of TCGA microRNA data to data from other projects, we describe the process that we developed and used to generate the microRNA sequence data, from library construction through to submission of data to repositories. In the context of this process, we describe the computational pipeline that we used to characterize microRNA expression across large patient cohorts. PMID:26271990

  9. The draft genome of the large yellow croaker reveals well-developed innate immunity.

    PubMed

    Wu, Changwen; Zhang, Di; Kan, Mengyuan; Lv, Zhengmin; Zhu, Aiyi; Su, Yongquan; Zhou, Daizhan; Zhang, Jianshe; Zhang, Zhou; Xu, Meiying; Jiang, Lihua; Guo, Baoying; Wang, Ting; Chi, Changfeng; Mao, Yong; Zhou, Jiajian; Yu, Xinxiu; Wang, Hailing; Weng, Xiaoling; Jin, Jason Gang; Ye, Junyi; He, Lin; Liu, Yun

    2014-01-01

    The large yellow croaker, Larimichthys crocea, is one of the most economically important marine fish species endemic to China. Its wild stocks have severely suffered from overfishing, and the aquacultured species are vulnerable to various marine pathogens. Here we report the creation of a draft genome of a wild large yellow croaker using a whole-genome sequencing strategy. We estimate the genome size to be 728 Mb with 19,362 protein-coding genes. Phylogenetic analysis shows that the stickleback is most closely related to the large yellow croaker. Rapidly evolving genes under positive selection are significantly enriched in pathways related to innate immunity. We also confirm the existence of several genes and identify the expansion of gene families that are important for innate immunity. Our results may reflect a well-developed innate immune system in the large yellow croaker, which could aid in the development of wild resource preservation and mariculture strategies. PMID:25407894

  10. Inference of Ancestral Recombination Graphs through Topological Data Analysis

    PubMed Central

    Cámara, Pablo G.; Levine, Arnold J.; Rabadán, Raúl

    2016-01-01

    The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relationships within and across species. Recombination, reassortment and horizontal gene transfer constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, they are computationally costly to reconstruct, usually being infeasible for more than few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build upon previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations, human recombination, and horizontal evolution in finches inhabiting the Galápagos Islands. PMID:27532298

  11. Inference of Ancestral Recombination Graphs through Topological Data Analysis.

    PubMed

    Cámara, Pablo G; Levine, Arnold J; Rabadán, Raúl

    2016-08-01

    The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relationships within and across species. Recombination, reassortment and horizontal gene transfer constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, they are computationally costly to reconstruct, usually being infeasible for more than few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build upon previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations, human recombination, and horizontal evolution in finches inhabiting the Galápagos Islands. PMID:27532298

  12. Feasibility of Large-Scale Genomic Testing to Facilitate Enrollment Onto Genomically Matched Clinical Trials

    PubMed Central

    Meric-Bernstam, Funda; Brusco, Lauren; Shaw, Kenna; Horombe, Chacha; Kopetz, Scott; Davies, Michael A.; Routbort, Mark; Piha-Paul, Sarina A.; Janku, Filip; Ueno, Naoto; Hong, David; De Groot, John; Ravi, Vinod; Li, Yisheng; Luthra, Raja; Patel, Keyur; Broaddus, Russell; Mendelsohn, John; Mills, Gordon B.

    2015-01-01

    Purpose We report the experience with 2,000 consecutive patients with advanced cancer who underwent testing on a genomic testing protocol, including the frequency of actionable alterations across tumor types, subsequent enrollment onto clinical trials, and the challenges for trial enrollment. Patients and Methods Standardized hotspot mutation analysis was performed in 2,000 patients, using either an 11-gene (251 patients) or a 46- or 50-gene (1,749 patients) multiplex platform. Thirty-five genes were considered potentially actionable based on their potential to be targeted with approved or investigational therapies. Results Seven hundred eighty-nine patients (39%) had at least one mutation in potentially actionable genes. Eighty-three patients (11%) with potentially actionable mutations went on genotype-matched trials targeting these alterations. Of 230 patients with PIK3CA/AKT1/PTEN/BRAF mutations that returned for therapy, 116 (50%) received a genotype-matched drug. Forty patients (17%) were treated on a genotype-selected trial requiring a mutation for eligibility, 16 (7%) were treated on a genotype-relevant trial targeting a genomic alteration without biomarker selection, and 40 (17%) received a genotype-relevant drug off trial. Challenges to trial accrual included patient preference of noninvestigational treatment or local treatment, poor performance status or other reasons for trial ineligibility, lack of trials/slots, and insurance denial. Conclusion Broad implementation of multiplex hotspot testing is feasible; however, only a small portion of patients with actionable alterations were actually enrolled onto genotype-matched trials. Increased awareness of therapeutic implications and access to novel therapeutics are needed to optimally leverage results from broad-based genomic testing. PMID:26014291
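
    The percentages quoted in this abstract can be re-derived from the stated counts. The short check below uses only numbers given in the abstract; the helper name `pct` is ours, and rounding to the nearest whole percent is an assumption about how the figures were reported.

```python
def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


tested = 251 + 1749                 # 11-gene panel + 46- or 50-gene panel
actionable = 789                    # patients with a potentially actionable mutation
matched_trials = 83                 # went on genotype-matched trials
returned = 230                      # PIK3CA/AKT1/PTEN/BRAF patients returning for therapy

print(tested)                               # total patients tested
print(pct(actionable, tested))              # share with actionable mutations
print(pct(matched_trials, actionable))      # share enrolled on matched trials
print(pct(116, returned), pct(40, returned), pct(16, returned))
```

    Under that rounding assumption the counts reproduce the reported 39%, 11%, 50%, 17% and 7%.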

  13. Phylogeny-driven target selection for large-scale genome-sequencing (and other) projects

    PubMed Central

    Göker, Markus; Klenk, Hans-Peter

    2013-01-01

    Despite the steadily decreasing costs of genome sequencing, prioritizing organisms for sequencing remains important in large-scale projects. Phylogeny-based selection is of interest to identify those organisms whose genomes can be expected to differ most from those that have already been sequenced. Here, we describe a method that infers a phylogenetic scoring independent of which set of organisms has previously been targeted, which is computationally simple and easy to apply in practice. The scoring itself, as well as pre- and post-processing of the data, is illustrated using two real-world examples in which the method has already been applied for selecting targets for genome sequencing. These projects are the JGI CSP Genomic Encyclopedia of Bacteria and Archaea phase I, targeting 1,000 type strains, and, on a smaller-scale, the phylogenomics of the Roseobacter clade. Potential artifacts of the method are discussed and compared to a selection approach based on the taxonomic classification. PMID:23991265

  14. Patterns and Mechanisms of Ancestral Histone Protein Inheritance in Budding Yeast

    PubMed Central

    van Welsem, Tibor; Friedman, Nir; Rando, Oliver J.; van Leeuwen, Fred

    2011-01-01

    Replicating chromatin involves disruption of histone-DNA contacts and subsequent reassembly of maternal histones on the new daughter genomes. In bulk, maternal histones are randomly segregated to the two daughters, but little is known about the fine details of this process: do maternal histones re-assemble at preferred locations or close to their original loci? Here, we use a recently developed method for swapping epitope tags to measure the disposition of ancestral histone H3 across the yeast genome over six generations. We find that ancestral H3 is preferentially retained at the 5′ ends of most genes, with strongest retention at long, poorly transcribed genes. We recapitulate these observations with a quantitative model in which the majority of maternal histones are reincorporated within 400 bp of their pre-replication locus during replication, with replication-independent replacement and transcription-related retrograde nucleosome movement shaping the resulting distributions of ancestral histones. We find a key role for Topoisomerase I in retrograde histone movement during transcription, and we find that loss of Chromatin Assembly Factor-1 affects replication-independent turnover. Together, these results show that specific loci are enriched for histone proteins first synthesized several generations beforehand, and that maternal histones re-associate close to their original locations on daughter genomes after replication. Our findings further suggest that accumulation of ancestral histones could play a role in shaping histone modification patterns. PMID:21666805

  15. Vertebrate Protein CTCF and its Multiple Roles in a Large-Scale Regulation of Genome Activity

    PubMed Central

    Nikolaev, L.G; Akopov, S.B; Didych, D.A; Sverdlov, E.D

    2009-01-01

    The CTCF transcription factor is a multifunctional protein with 11 zinc fingers that uses different zinc finger combinations to recognize and bind different sites within DNA. CTCF is thought to participate in various gene regulatory networks including transcription activation and repression, formation of independently functioning chromatin domains and regulation of imprinting. Sequencing of human and other genomes opened up the possibility to ascertain the genomic distribution of CTCF binding sites and to identify CTCF-dependent cis-regulatory elements, including insulators. In the review, we summarized recent data on genomic distribution of CTCF binding sites in the human and other genomes within a framework of the loop domain hypothesis of large-scale regulation of the genome activity. We also tried to formulate possible lines of studies on a variety of CTCF functions which probably depend on its ability to specifically bind DNA, interact with other proteins and form di- and multimers. These three fundamental properties allow CTCF to serve as a transcription factor, an insulator and a constitutive dispersed genome-wide demarcation tool able to recruit various factors that emerge in response to diverse external and internal signals, and thus to exert its signal-specific function(s). PMID:20119526

  16. Vertebrate Protein CTCF and its Multiple Roles in a Large-Scale Regulation of Genome Activity.

    PubMed

    Nikolaev, L G; Akopov, S B; Didych, D A; Sverdlov, E D

    2009-08-01

    The CTCF transcription factor is a multifunctional protein with 11 zinc fingers that uses different zinc finger combinations to recognize and bind different sites within DNA. CTCF is thought to participate in various gene regulatory networks including transcription activation and repression, formation of independently functioning chromatin domains and regulation of imprinting. Sequencing of human and other genomes opened up the possibility to ascertain the genomic distribution of CTCF binding sites and to identify CTCF-dependent cis-regulatory elements, including insulators. In the review, we summarized recent data on genomic distribution of CTCF binding sites in the human and other genomes within a framework of the loop domain hypothesis of large-scale regulation of the genome activity. We also tried to formulate possible lines of studies on a variety of CTCF functions which probably depend on its ability to specifically bind DNA, interact with other proteins and form di- and multimers. These three fundamental properties allow CTCF to serve as a transcription factor, an insulator and a constitutive dispersed genome-wide demarcation tool able to recruit various factors that emerge in response to diverse external and internal signals, and thus to exert its signal-specific function(s). PMID:20119526

  17. The PRRS Host Genomic Consortium (PHGC) Database: Management of large data sets.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In any consortium project where large amounts of phenotypic and genotypic data are collected across several research labs, issues arise with maintenance and analysis of datasets. The PRRS Host Genomic Consortium (PHGC) Database was developed to meet this need for the PRRS research community. The sch...

  18. Software engineering the mixed model for genome-wide association studies on large samples

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample siz...

  19. Physical mapping resources for large plant genomes: radiation hybrids for wheat D-genome progenitor Aegilops tauschii

    PubMed Central

    2012-01-01

    Background Development of a high quality reference sequence is a daunting task in crops like wheat with a large (~17 Gb), highly repetitive (>80%) and polyploid genome. To achieve complete sequence assembly of such genomes, development of a high quality physical map is a necessary first step. However, due to the lack of recombination in certain regions of the chromosomes, genetic mapping, which uses recombination frequency to map marker loci, alone is not sufficient to develop high quality marker scaffolds for a sequence ready physical map. Radiation hybrid (RH) mapping, which uses radiation induced chromosomal breaks, has proven to be a successful approach for developing marker scaffolds for sequence assembly in animal systems. Here, the development and characterization of an RH panel for the mapping of the D-genome of wheat progenitor Aegilops tauschii is reported. Results Radiation dosages of 350 and 450 Gy were optimized for seed irradiation of a synthetic hexaploid (AABBDD) wheat with the D-genome of Ae. tauschii accession AL8/78. The surviving plants after irradiation were crossed to durum wheat (AABB), to produce pentaploid RH1s (AABBD), which allows the simultaneous mapping of the whole D-genome. A panel of 1,510 RH1 plants was obtained, of which 592 plants were generated from the mature RH1 seeds, and 918 plants were rescued through embryo culture due to poor germination (<3%) of mature RH1 seeds. This panel showed a homogenous marker loss (2.1%) after screening with SSR markers uniformly covering all the D-genome chromosomes. Different marker systems mostly detected different lines with deletions. Using markers covering known distances, the mapping resolution of this RH panel was estimated to be <140 kb. Analysis of only 16 RH lines carrying deletions on chromosome 2D resulted in a physical map with cM/cR ratio of 1:5.2 and 15 distinct bins. Additionally, with this small set of lines, almost all the tested ESTs could be mapped. A set of 399 most informative RH

  20. Captured segment exchange: a strategy for custom engineering large genomic regions in Drosophila melanogaster.

    PubMed

    Bateman, Jack R; Palopoli, Michael F; Dale, Sarah T; Stauffer, Jennifer E; Shah, Anita L; Johnson, Justine E; Walsh, Conor W; Flaten, Hanna; Parsons, Christine M

    2013-02-01

    Site-specific recombinases (SSRs) are valuable tools for manipulating genomes. In Drosophila, thousands of transgenic insertions carrying SSR recognition sites have been distributed throughout the genome by several large-scale projects. Here we describe a method with the potential to use these insertions to make custom alterations to the Drosophila genome in vivo. Specifically, by employing recombineering techniques and a dual recombinase-mediated cassette exchange strategy based on the phiC31 integrase and FLP recombinase, we show that a large genomic segment that lies between two SSR recognition-site insertions can be "captured" as a target cassette and exchanged for a sequence that was engineered in bacterial cells. We demonstrate this approach by targeting a 50-kb segment spanning the tsh gene, replacing the existing segment with corresponding recombineered sequences through simple and efficient manipulations. Given the high density of SSR recognition-site insertions in Drosophila, our method affords a straightforward and highly efficient approach to explore gene function in situ for a substantial portion of the Drosophila genome. PMID:23150604

  1. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity

    PubMed Central

    Pope, Welkin H; Bowman, Charles A; Russell, Daniel A; Jacobs-Sera, Deborah; Asai, David J; Cresawn, Steven G; Jacobs, William R; Hendrix, Roger W; Lawrence, Jeffrey G; Hatfull, Graham F; Abbazia, Patrick; Ababio, Amma; Adam, Naazneen

    2015-01-01

    The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis was performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery. DOI: http://dx.doi.org/10.7554/eLife.06416.001 PMID:25919952

  2. Epistatic interactions between ancestral genotype and beneficial mutations shape evolvability in Pseudomonas aeruginosa.

    PubMed

    Gifford, Danna R; Toll-Riera, Macarena; MacLean, R Craig

    2016-07-01

    The idea that interactions between mutations influence adaptation by driving populations to low and high fitness peaks on adaptive landscapes is deeply ingrained in evolutionary theory. Here, we investigate the impact of epistasis on evolvability by challenging populations of two Pseudomonas aeruginosa clones bearing different initial mutations (in rpoB conferring rifampicin resistance, and the type IV pili gene network) to adaptation to a medium containing l-serine as the sole carbon source. Despite being initially indistinguishable in fitness, populations founded by the two ancestral genotypes reached different fitness following 300 generations of evolution. Genome sequencing revealed that the difference could not be explained by acquiring mutations in different targets of selection; the majority of clones from both ancestors converged on one of the following two strategies: (1) acquiring mutations in either PA2449 (gcsR, an l-serine-metabolism RpoN enhancer binding protein) or (2) protease genes. Additionally, populations from both ancestors converged on loss-of-function mutations in the type IV pili gene network, either due to ancestral or acquired mutations. No compensatory or reversion mutations were observed in RNA polymerase (RNAP) genes, in spite of the large fitness costs typically associated with mutations in rpoB. Although current theory points to sign epistasis as the dominant constraint on evolvability, these results suggest that the role of magnitude epistasis in constraining evolvability may be underappreciated. The contribution of magnitude epistasis is likely to be greatest under the biologically relevant mutation supply rates that make back mutations probabilistically unlikely. PMID:27230588

  3. Insertion sequence-caused large-scale rearrangements in the genome of Escherichia coli

    PubMed Central

    Lee, Heewook; Doak, Thomas G.; Popodi, Ellen; Foster, Patricia L.; Tang, Haixu

    2016-01-01

    A majority of large-scale bacterial genome rearrangements involve mobile genetic elements such as insertion sequence (IS) elements. Here we report novel insertions and excisions of IS elements and recombination between homologous IS elements identified in a large collection of Escherichia coli mutation accumulation lines by analysis of whole genome shotgun sequencing data. Based on 857 identified events (758 IS insertions, 98 recombinations and 1 excision), we estimate that the rate of IS insertion is 3.5 × 10⁻⁴ insertions per genome per generation and the rate of IS homologous recombination is 4.5 × 10⁻⁵ recombinations per genome per generation. These events are mostly contributed by the IS elements IS1, IS2, IS5 and IS186. Spatial analysis of new insertions suggests that transposition is biased to proximal insertions, and the length spectrum of IS-caused deletions is largely explained by local hopping. For any of the ISs studied there is no region of the circular genome that is favored or disfavored for new insertions but there are notable hotspots for deletions. Some elements have preferences for non-coding sequence or for the beginning and end of coding regions, largely explained by target site motifs. Interestingly, transposition and deletion rates remain constant across the wild-type and 12 mutant E. coli lines, each deficient in a distinct DNA repair pathway. Finally, we characterized the target sites of four IS families, confirming previous results and characterizing a highly specific pattern at IS186 target-sites, 5′-GGGG(N6/N7)CCCC-3′. We also detected 48 long deletions not involving IS elements. PMID:27431326
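
    The event counts and rates in this abstract can be put side by side with a few lines of arithmetic. The sketch below uses only the figures quoted above, reading the rates as 3.5 × 10⁻⁴ and 4.5 × 10⁻⁵ per genome per generation; the variable names are ours, and treating the reciprocal of a rate as an expected waiting time assumes a constant rate.

```python
# Event counts reported in the abstract.
insertions, recombinations, excisions = 758, 98, 1
total_events = insertions + recombinations + excisions

# Estimated rates, per genome per generation.
insertion_rate = 3.5e-4
recombination_rate = 4.5e-5

# Insertions relative to recombinations, by rate and by raw count.
rate_ratio = insertion_rate / recombination_rate
count_ratio = insertions / recombinations

# Expected generations between IS insertions in one genome,
# assuming the rate is constant over time.
gens_per_insertion = 1 / insertion_rate

print(total_events)
print(f"{rate_ratio:.2f} {count_ratio:.2f}")
print(f"{gens_per_insertion:.0f}")
```

    Both ratios come out close to eight insertions per recombination, so the rate estimates are consistent with the raw counts.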

  4. FVGWAS: Fast voxelwise genome wide association analysis of large-scale imaging genetic data.

    PubMed

    Huang, Meiyan; Nichols, Thomas; Huang, Chao; Yu, Yang; Lu, Zhaohua; Knickmeyer, Rebecca C; Feng, Qianjin; Zhu, Hongtu

    2015-09-01

    More and more large-scale imaging genetic studies are being widely conducted to collect a rich set of imaging, genetic, and clinical data to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. Several major big-data challenges arise from testing genome-wide (N_C > 12 million known variants) associations with signals at millions of locations (N_V ~ 10⁶) in the brain from thousands of subjects (n ~ 10³). The aim of this paper is to develop a Fast Voxelwise Genome Wide Association analysiS (FVGWAS) framework to efficiently carry out whole-genome analyses of whole-brain data. FVGWAS consists of three components including a heteroscedastic linear model, a global sure independence screening (GSIS) procedure, and a detection procedure based on wild bootstrap methods. Specifically, for standard linear association, the computational complexity is O(n N_V N_C) for the voxelwise genome wide association analysis (VGWAS) method compared with O((N_C + N_V) n²) for FVGWAS. Simulation studies show that FVGWAS is an efficient method of searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. Finally, we have successfully applied FVGWAS to a large-scale imaging genetic data analysis of ADNI data with 708 subjects, 193,275 voxels in RAVENS maps, and 501,584 SNPs, and the total processing time was 203,645 s for a single CPU. Our FVGWAS may be a valuable statistical toolbox for large-scale imaging genetic analysis as the field is rapidly advancing with ultra-high-resolution imaging and whole-genome sequencing. PMID:26025292
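To make the two complexity expressions in this abstract concrete, the sketch below evaluates the leading terms for the problem sizes quoted in the text. Big-O notation hides constant factors, so the ratios are only indicative of scaling, not measured speed-ups; the function names are made up for this illustration.

```python
def vgwas_ops(n, n_v, n_c):
    """Leading term of O(n * N_V * N_C) for standard voxelwise GWAS."""
    return n * n_v * n_c

def fvgwas_ops(n, n_v, n_c):
    """Leading term of O((N_C + N_V) * n^2) quoted for FVGWAS."""
    return (n_c + n_v) * n ** 2

# Order-of-magnitude sizes from the abstract: n ~ 10^3 subjects,
# N_V ~ 10^6 voxels, N_C ~ 12 million variants.
generic = dict(n=10**3, n_v=10**6, n_c=12 * 10**6)
# The ADNI analysis described in the abstract.
adni = dict(n=708, n_v=193_275, n_c=501_584)

for label, size in (("generic", generic), ("ADNI", adni)):
    ratio = vgwas_ops(**size) / fvgwas_ops(**size)
    print(f"{label}: VGWAS / FVGWAS leading-term ratio = {ratio:.0f}")
```

The ratio simplifies to N_V N_C / ((N_C + N_V) n), so the advantage grows with the number of voxels and variants and shrinks as the number of subjects increases.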

  5. Bringing large-scale multiple genome analysis one step closer: ScalaBLAST and beyond

    SciTech Connect

    Oehmen, Christopher S.; Sofia, Heidi J.; Baxter, Douglas; Szeto, Ernest; Hugenholtz, Philip; Kyrpides, Nikos; Markowitz, Victor; Straatsma, Tjerk P.

    2007-06-01

    Genome sequence comparisons of exponentially growing data sets form the foundation for the comparative analysis tools provided by community biological data resources such as the Integrated Microbial Genomes (IMG) system at the Joint Genome Institute (JGI). We present an example of how ScalaBLAST, a high-throughput sequence analysis program, harnesses increasingly critical high-performance computing to perform sequence analysis which is a critical component of maintaining a state-of-the-art sequence data repository. The Integrated Microbial Genomes (IMG) system is a data management and analysis platform for microbial genomes hosted at the JGI. IMG contains both draft and complete JGI genomes integrated with other publicly available microbial genomes of all three domains of life. IMG provides tools and viewers for interactive analysis of genomes, genes and functions, individually or in a comparative context. Most of these tools are based on pre-computed pairwise sequence similarities involving millions of genes. These computations are becoming prohibitively time consuming with the rapid increase in the number of newly sequenced genomes incorporated into IMG and the need to refresh regularly the content of IMG in order to reflect changes in the annotations of existing genomes. Thus, building IMG 2.0 (released on December 1st 2006) entailed reloading from NCBI's RefSeq all the genomes in the previous version of IMG (IMG 1.6, as of September 1st, 2006) together with 1,541 new public microbial, viral and eukaryal genomes, bringing the total of IMG genomes to 2,301. A critical part of building IMG 2.0 involved using PNNL ScalaBLAST software for computing pairwise similarities for over 2.2 million genes in under 26 hours on 1,000 processors, thus illustrating the impact that new generation bioinformatics tools are poised to make in biology. The BLAST algorithm is a familiar bioinformatics application for computing sequence similarity, and has become a workhorse in large

  6. Biological Consequences of Ancient Gene Acquisition and Duplication in the Large Genome of Candidatus Solibacter usitatus Ellin6076

    SciTech Connect

    Challacombe, Jean F; Eichorst, Stephanie A; Hauser, Loren John; Land, Miriam L; Xie, Gary; Kuske, Cheryl R

    2011-01-01

    Members of the bacterial phylum Acidobacteria are widespread in soils and sediments worldwide, and are abundant in many soils. Acidobacteria are challenging to culture in vitro, and many basic features of their biology and functional roles in the soil have not been determined. Candidatus Solibacter usitatus strain Ellin6076 has a 9.9 Mb genome that is approximately 2–5 times as large as the other sequenced Acidobacteria genomes. Bacterial genome sizes typically range from 0.5 to 10 Mb and are influenced by gene duplication, horizontal gene transfer, gene loss and other evolutionary processes. Our comparative genome analyses indicate that the Ellin6076 large genome has arisen by horizontal gene transfer via ancient bacteriophage and/or plasmid-mediated transduction, and widespread small-scale gene duplications, resulting in an increased number of paralogs. Low amino acid sequence identities among functional group members, and lack of conserved gene order and orientation in regions containing similar groups of paralogs, suggest that most of the paralogs are not the result of recent duplication events. The genome sizes of additional cultured Acidobacteria strains were estimated using pulsed-field gel electrophoresis to determine the prevalence of the large genome trait within the phylum. Members of subdivision 3 had larger genomes than those of subdivision 1, but none were as large as the Ellin6076 genome. The large genome of Ellin6076 may not be typical of the phylum, and encodes traits that could provide a selective metabolic, defensive and regulatory advantage in the soil environment.

  7. Final report. Human artificial episomal chromosome (HAEC) for building large genomic libraries

    SciTech Connect

    Jean-Michael H. Vos

    1999-12-09

    Collections of human DNA fragments are maintained for research purposes as clones in bacterial host cells. However, for unknown reasons, some regions of the human genome appear to be unclonable or unstable in bacteria. The team has developed a system using episomes (extrachromosomal, autonomously replicating DNA) that maintains large DNA fragments in human cells. This human artificial episomal chromosome (HAEC) system may prove useful for coverage of these especially difficult regions. In the broader biomedical community, the HAEC system also shows promise for use in functional genomics and gene therapy. Recent improvements to the HAEC system and its application to mapping, sequencing, and functionally studying human and mouse DNA are summarized. Mapping and sequencing the human genome and model organisms are only the first steps in determining the function of various genetic units critical for gene regulation, DNA replication, chromatin packaging, chromosomal stability, and chromatid segregation. Such studies will require the ability to transfer and manipulate entire functional units into mammalian cells.

  8. CyanoGEBA: A Better Understanding of Cynobacterial Diversity through Large-scale Genomics (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

    ScienceCinema

    Shih, Patrick [Kerfeld Lab, UC Berkeley and JGI]

    2013-01-22

    Patrick Shih, representing both the University of California, Berkeley and JGI, gives a talk titled "CyanoGEBA: A Better Understanding of Cynobacterial Diversity through Large-scale Genomics" at the JGI 7th Annual Users Meeting: Genomics of Energy & Environment Meeting on March 22, 2012 in Walnut Creek, California.

  9. CyanoGEBA: A Better Understanding of Cynobacterial Diversity through Large-scale Genomics (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

    SciTech Connect

    Shih, Patrick

    2012-03-22

    Patrick Shih, representing both the University of California, Berkeley and JGI, gives a talk titled "CyanoGEBA: A Better Understanding of Cynobacterial Diversity through Large-scale Genomics" at the JGI 7th Annual Users Meeting: Genomics of Energy & Environment Meeting on March 22, 2012 in Walnut Creek, California.

  10. Hyper-expansion of large DNA segments in the genome of kuruma shrimp, Marsupenaeus japonicus

    PubMed Central

    2010-01-01

    Background Higher crustaceans (class Malacostraca) represent the most species-rich and morphologically diverse group of non-insect arthropods and many of its members are commercially important. Although the crustacean DNA sequence information is growing exponentially, little is known about the genome organization of Malacostraca. Here, we constructed a bacterial artificial chromosome (BAC) library and performed BAC-end sequencing to provide genomic information for kuruma shrimp (Marsupenaeus japonicus), one of the most widely cultured species among crustaceans, and found the presence of a redundant sequence in the BAC library. We examined the BAC clone that includes the redundant sequence to further analyze its length, copy number and location in the kuruma shrimp genome. Results Mj024A04 BAC clone, which includes one redundant sequence, contained 27 putative genes and seemed to display a normal genomic DNA structure. Notably, of the putative genes, 3 genes encode homologous proteins to the inhibitor of apoptosis protein and 7 genes encode homologous proteins to white spot syndrome virus, a virulent pathogen known to affect crustaceans. Colony hybridization and PCR analysis of 381 BAC clones showed that almost half of the BAC clones maintain DNA segments whose sequences are homologous to the representative BAC clone Mj024A04. The Mj024A04 partial sequence was detected multiple times in the kuruma shrimp nuclear genome with a calculated copy number of at least 100. Microsatellites based BAC genotyping clearly showed that Mj024A04 homologous sequences were cloned from at least 48 different chromosomal loci. The absence of micro-syntenic relationships with the available genomic sequences of Daphnia and Drosophila suggests the uniqueness of these fragments in kuruma shrimp from current arthropod genome sequences. Conclusions Our results demonstrate that hyper-expansion of large DNA segments took place in the kuruma shrimp genome. 
Although we analyzed only a part of the

  11. The Dunaliella salina organelle genomes: large sequences, inflated with intronic and intergenic DNA

    SciTech Connect

    Smith, David R.; Lee, Robert W.; Cushman, John C.; Magnuson, Jon K.; Tran, Duc; Polle, Juergen E.

    2010-05-07

    Background: Dunaliella salina Teodoresco, a unicellular, halophilic green alga belonging to the Chlorophyceae, is among the most industrially important microalgae. This is because D. salina can produce massive amounts of β-carotene, which can be collected for commercial purposes, and because of its potential as a feedstock for biofuels production. Although the biochemistry and physiology of D. salina have been studied in great detail, virtually nothing is known about the genomes it carries, especially those within its mitochondrion and plastid. This study presents the complete mitochondrial and plastid genome sequences of D. salina and compares them with those of the model green algae Chlamydomonas reinhardtii and Volvox carteri. Results: The D. salina organelle genomes are large, circular-mapping molecules with ~60% noncoding DNA, placing them among the most inflated organelle DNAs sampled from the Chlorophyta. In fact, the D. salina plastid genome, at 269 kb, is the largest complete plastid DNA (ptDNA) sequence currently deposited in GenBank, and both the mitochondrial and plastid genomes have unprecedentedly high intron densities for organelle DNA: ~1.5 and ~0.4 introns per gene, respectively. Moreover, what appear to be the relics of genes, introns, and intronic open reading frames are found scattered throughout the intergenic ptDNA regions -- a trait without parallel in other characterized organelle genomes and one that gives insight into the mechanisms and modes of expansion of the D. salina ptDNA. Conclusions: These findings confirm the notion that chlamydomonadalean algae have some of the most extreme organelle genomes of all eukaryotes. They also suggest that the events giving rise to the expanded ptDNA architecture of D. salina and other Chlamydomonadales may have occurred early in the evolution of this lineage. Although interesting from a genome evolution standpoint, the D. salina organelle DNA sequences will aid in the development of a viable

  12. Large-scale genomic sequencing of extraintestinal pathogenic Escherichia coli strains

    PubMed Central

    Salipante, Stephen J.; Roach, David J.; Kitzman, Jacob O.; Snyder, Matthew W.; Stackhouse, Bethany; Butler-Wu, Susan M.; Lee, Choli; Cookson, Brad T.

    2015-01-01

    Large-scale bacterial genome sequencing efforts to date have provided limited information on the most prevalent category of disease: sporadically acquired infections caused by common pathogenic bacteria. Here, we performed whole-genome sequencing and de novo assembly of 312 blood- or urine-derived isolates of extraintestinal pathogenic (ExPEC) Escherichia coli, a common agent of sepsis and community-acquired urinary tract infections, obtained during the course of routine clinical care at a single institution. We find that ExPEC E. coli are highly genomically heterogeneous, consistent with pan-genome analyses encompassing the larger species. Investigation of differential virulence factor content and antibiotic resistance phenotypes reveals markedly different profiles among lineages and among strains infecting different body sites. We use high-resolution molecular epidemiology to explore the dynamics of infections at the level of individual patients, including identification of possible person-to-person transmission. Notably, a limited number of discrete lineages caused the majority of bloodstream infections, including one subclone (ST131-H30) responsible for 28% of bacteremic E. coli infections over a 3-yr period. We additionally use a microbial genome-wide-association study (GWAS) approach to identify individual genes responsible for antibiotic resistance, successfully recovering known genes but notably not identifying any novel factors. We anticipate that in the near future, whole-genome sequencing of microorganisms associated with clinical disease will become routine. Our study reveals what kind of information can be obtained from sequencing clinical isolates on a large scale, even well-characterized organisms such as E. coli, and provides insight into how this information might be utilized in a healthcare setting. PMID:25373147

  13. Insights into the Genome of Large Sulfur Bacteria Revealed by Analysis of Single Filaments

    PubMed Central

    Richter, Michael; de Beer, Dirk; Preisler, André; Jørgensen, Bo B; Huntemann, Marcel; Glöckner, Frank Oliver; Amann, Rudolf; Koopman, Werner J. H; Lasken, Roger S; Janto, Benjamin; Hogg, Justin; Stoodley, Paul; Boissy, Robert; Ehrlich, Garth D

    2007-01-01

    Marine sediments are frequently covered by mats of the filamentous Beggiatoa and other large nitrate-storing bacteria that oxidize hydrogen sulfide using either oxygen or nitrate, which they store in intracellular vacuoles. Despite their conspicuous metabolic properties and their biogeochemical importance, little is known about their genetic repertoire because of the lack of pure cultures. Here, we present a unique approach to access the genome of single filaments of Beggiatoa by combining whole genome amplification, pyrosequencing, and optical genome mapping. Sequence assemblies were incomplete and yielded average contig sizes of approximately 1 kb. Pathways for sulfur oxidation, nitrate and oxygen respiration, and CO2 fixation confirm the chemolithoautotrophic physiology of Beggiatoa. In addition, Beggiatoa potentially utilize inorganic sulfur compounds and dimethyl sulfoxide as electron acceptors. We propose a mechanism of vacuolar nitrate accumulation that is linked to proton translocation by vacuolar-type ATPases. Comparative genomics indicates substantial horizontal gene transfer of storage, metabolic, and gliding capabilities between Beggiatoa and cyanobacteria. These capabilities enable Beggiatoa to overcome non-overlapping availabilities of electron donors and acceptors while gliding between oxic and sulfidic zones. The first look into the genome of these filamentous sulfur-oxidizing bacteria substantially deepens the understanding of their evolution and their contribution to sulfur and nitrogen cycling in marine sediments. PMID:17760503

  14. Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets

    PubMed Central

    Heath, Allison P; Greenway, Matthew; Powell, Raymond; Spring, Jonathan; Suarez, Rafael; Hanley, David; Bandlamudi, Chai; McNerney, Megan E; White, Kevin P; Grossman, Robert L

    2014-01-01

    Background As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze it. Methods Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal, and associated middleware that provides a single entry point and a single sign on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required. Results Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Conclusions Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics. PMID:24464852

  15. The ancestral gene repertoire of animal stem cells.

    PubMed

    Alié, Alexandre; Hayashi, Tetsutaro; Sugimura, Itsuro; Manuel, Michaël; Sugano, Wakana; Mano, Akira; Satoh, Nori; Agata, Kiyokazu; Funayama, Noriko

    2015-12-22

    Stem cells are pivotal for development and tissue homeostasis of multicellular animals, and the quest for a gene toolkit associated with the emergence of stem cells in a common ancestor of all metazoans remains a major challenge for evolutionary biology. We reconstructed the conserved gene repertoire of animal stem cells by transcriptomic profiling of totipotent archeocytes in the demosponge Ephydatia fluviatilis and by tracing shared molecular signatures with flatworm and Hydra stem cells. Phylostratigraphy analyses indicated that most of these stem-cell genes predate animal origin, with only few metazoan innovations, notably including several partners of the Piwi machinery known to promote genome stability. The ancestral stem-cell transcriptome is strikingly poor in transcription factors. Instead, it is rich in RNA regulatory actors, including components of the "germ-line multipotency program" and many RNA-binding proteins known as critical regulators of mammalian embryonic stem cells. PMID:26644562

  16. The ancestral gene repertoire of animal stem cells

    PubMed Central

    Alié, Alexandre; Hayashi, Tetsutaro; Sugimura, Itsuro; Manuel, Michaël; Sugano, Wakana; Mano, Akira; Satoh, Nori; Agata, Kiyokazu; Funayama, Noriko

    2015-01-01

    Stem cells are pivotal for development and tissue homeostasis of multicellular animals, and the quest for a gene toolkit associated with the emergence of stem cells in a common ancestor of all metazoans remains a major challenge for evolutionary biology. We reconstructed the conserved gene repertoire of animal stem cells by transcriptomic profiling of totipotent archeocytes in the demosponge Ephydatia fluviatilis and by tracing shared molecular signatures with flatworm and Hydra stem cells. Phylostratigraphy analyses indicated that most of these stem-cell genes predate animal origin, with only few metazoan innovations, notably including several partners of the Piwi machinery known to promote genome stability. The ancestral stem-cell transcriptome is strikingly poor in transcription factors. Instead, it is rich in RNA regulatory actors, including components of the “germ-line multipotency program” and many RNA-binding proteins known as critical regulators of mammalian embryonic stem cells. PMID:26644562

  17. An improved method for oriT-directed cloning and functionalization of large bacterial genomic regions.

    PubMed

    Kvitko, Brian H; McMillan, Ian A; Schweizer, Herbert P

    2013-08-01

    We have made significant improvements to a broad-host-range system for the cloning and manipulation of large bacterial genomic regions based on site-specific recombination between directly repeated oriT sites during conjugation. Using two suicide capture vectors carrying flanking homology regions, oriT sites are recombined on either side of the target region. Using a broad-host-range conjugation helper plasmid, the region between the oriT sites is conjugated into an Escherichia coli recipient strain, where it is circularized and maintained as a chimeric mini-F vector. The cloned target region is functionalized in multiple ways to accommodate downstream manipulation. The target region is flanked with Gateway attB sites for recombination into other vectors and by rare 18-bp I-SceI restriction sites for subcloning. The Tn7-functionalized target can also be inserted at a naturally occurring chromosomal attTn7 site(s) or maintained as a broad-host-range plasmid for complementation or heterologous expression studies. We have used the oriTn7 capture technique to clone and complement Burkholderia pseudomallei genomic regions up to 140 kb in size and have created isogenic Burkholderia strains with various combinations of genomic islands. We believe this system will greatly aid the cloning and genetic analysis of genomic islands, biosynthetic gene clusters, and large open reading frames. PMID:23747708

  18. An Improved Method for oriT-Directed Cloning and Functionalization of Large Bacterial Genomic Regions

    PubMed Central

    Kvitko, Brian H.; McMillan, Ian A.

    2013-01-01

    We have made significant improvements to a broad-host-range system for the cloning and manipulation of large bacterial genomic regions based on site-specific recombination between directly repeated oriT sites during conjugation. Using two suicide capture vectors carrying flanking homology regions, oriT sites are recombined on either side of the target region. Using a broad-host-range conjugation helper plasmid, the region between the oriT sites is conjugated into an Escherichia coli recipient strain, where it is circularized and maintained as a chimeric mini-F vector. The cloned target region is functionalized in multiple ways to accommodate downstream manipulation. The target region is flanked with Gateway attB sites for recombination into other vectors and by rare 18-bp I-SceI restriction sites for subcloning. The Tn7-functionalized target can also be inserted at a naturally occurring chromosomal attTn7 site(s) or maintained as a broad-host-range plasmid for complementation or heterologous expression studies. We have used the oriTn7 capture technique to clone and complement Burkholderia pseudomallei genomic regions up to 140 kb in size and have created isogenic Burkholderia strains with various combinations of genomic islands. We believe this system will greatly aid the cloning and genetic analysis of genomic islands, biosynthetic gene clusters, and large open reading frames. PMID:23747708

  19. Whole-genome mapping reveals a large chromosomal inversion on Iberian Brucella suis biovar 2 strains.

    PubMed

    Ferreira, Ana Cristina; Dias, Ricardo; de Sá, Maria Inácia Corrêa; Tenreiro, Rogério

    2016-08-30

    Optical mapping is a technology able to quickly generate high resolution ordered whole-genome restriction maps of bacteria, being a proven approach to search for diversity among bacterial isolates. In this work, optical whole-genome maps were used to compare closely-related Brucella suis biovar 2 strains. This biovar is the only one isolated in domestic pigs and wild boars in Portugal and Spain, and most of the strains share specific molecular characteristics establishing an Iberian clonal lineage that can be differentiated from another lineage mainly isolated in several Central European countries. We performed the BamHI whole-genome optical maps of five B. suis biovar 2 field strains, isolated from wild boars in Portugal and Spain (three from the Iberian lineage and two from the Central European one) as well as of the reference strain B. suis biovar 2 ATCC 23445 (Central European lineage, Denmark). Each strain showed a distinct, highly individual configuration of 228-231 BamHI fragments. Nevertheless, a lower divergence was observed overall in chromosome II (1.6%) relative to chromosome I (2.4%). Optical mapping also disclosed genomic events associated with B. suis strains in chromosome I, namely one indel (3.5 kb) and one large inversion (944 kb). By using targeted-PCR in a set of 176 B. suis strains, including all biovars and haplotypes, the indel was found to be specific to the reference strain ATCC 23445 and the large inversion was shown to be an exclusive genomic marker of the Iberian clonal lineage of biovar 2. PMID:27527786

  20. OPERA-LG: efficient and exact scaffolding of large, repeat-rich eukaryotic genomes with performance guarantees.

    PubMed

    Gao, Song; Bertrand, Denis; Chia, Burton K H; Nagarajan, Niranjan

    2016-01-01

    The assembly of large, repeat-rich eukaryotic genomes represents a significant challenge in genomics. While long-read technologies have made the high-quality assembly of small, microbial genomes increasingly feasible, data generation can be expensive for larger genomes. OPERA-LG is a scalable, exact algorithm for the scaffold assembly of large, repeat-rich genomes, out-performing state-of-the-art programs for scaffold correctness and contiguity. It provides a rigorous framework for scaffolding of repetitive sequences and a systematic approach for combining data from different second-generation and third-generation sequencing technologies. OPERA-LG provides an avenue for systematic augmentation and improvement of thousands of existing draft eukaryotic genome assemblies. PMID:27169502

  1. Distilling Artificial Recombinants from Large Sets of Complete mtDNA Genomes

    PubMed Central

    Kong, Qing-Peng; Salas, Antonio; Sun, Chang; Fuku, Noriyuki; Tanaka, Masashi; Zhong, Li; Wang, Cheng-Ye; Yao, Yong-Gang; Bandelt, Hans-Jürgen

    2008-01-01

    Background Large-scale genome sequencing poses enormous problems to the logistics of laboratory work and data handling. When numerous fragments of different genomes are PCR amplified and sequenced in a laboratory, there is a high immanent risk of sample confusion. For genetic markers, such as mitochondrial DNA (mtDNA), which are free of natural recombination, single instances of sample mix-up involving different branches of the mtDNA phylogeny would give rise to reticulate patterns and should therefore be detectable. Methodology/Principal Findings We have developed a strategy for comparing new complete mtDNA genomes, one by one, to a current skeleton of the worldwide mtDNA phylogeny. The mutations distinguishing the reference sequence from a putative recombinant sequence can then be allocated to two or more different branches of this phylogenetic skeleton. Thus, one would search for two (or three) near-matches in the total mtDNA database that together best explain the variation seen in the recombinants. The evolutionary pathway from the mtDNA tree connecting this pair together with the recombinant then generate a grid-like median network, from which one can read off the exchanged segments. Conclusions We have applied this procedure to a large collection of complete human mtDNA sequences, where several recombinants could be distilled by our method. All these recombinant sequences were subsequently corrected by de novo experiments – fully concordant with the predictions from our data-analytical approach. PMID:18714389
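The search for two near-matches that together explain a putative recombinant can be illustrated with a brute-force sketch. This is not the authors' median-network procedure: it is a simplified stand-in that treats the genome as linear with a single breakpoint, represents each sequence as a set of variant positions relative to the reference, and uses made-up haplotype names and toy variant sets.

```python
from itertools import permutations

def mosaic(left, right, breakpoint):
    """Variants expected if `left` supplies positions below the breakpoint
    and `right` supplies positions at or above it."""
    return ({p for p in left if p < breakpoint}
            | {p for p in right if p >= breakpoint})

def best_two_parent_explanation(query, database):
    """Return (mismatches, left_name, right_name, breakpoint) for the
    ordered pair and breakpoint whose mosaic differs least from `query`."""
    candidate_breakpoints = sorted(set().union(query, *database.values()))
    best = None
    for (name_a, a), (name_b, b) in permutations(database.items(), 2):
        for bp in candidate_breakpoints:
            mismatches = len(query ^ mosaic(a, b, bp))
            if best is None or mismatches < best[0]:
                best = (mismatches, name_a, name_b, bp)
    return best

# Toy data (hypothetical): variant positions relative to a reference.
database = {
    "hapA": {73, 263, 750, 8701, 9540, 10398},
    "hapB": {73, 263, 1438, 4769, 15326, 16223},
    "hapC": {152, 263, 3010, 14766},
}
# An artificial mix-up: hapA on the left, hapB on the right.
query = {73, 263, 750, 15326, 16223}

print(best_two_parent_explanation(query, database))
```

A real mtDNA analysis would need to handle the circular molecule (two breakpoints), recurrent mutations, and a phylogeny-aware database rather than flat variant sets.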

  2. Genome-scale phylogenetic function annotation of large and diverse protein families

    PubMed Central

    Engelhardt, Barbara E.; Jordan, Michael I.; Srouji, John R.; Brenner, Steven E.

    2011-01-01

    The Statistical Inference of Function Through Evolutionary Relationships (SIFTER) framework uses a statistical graphical model that applies phylogenetic principles to automate precise protein function prediction. Here we present a revised approach (SIFTER version 2.0) that enables annotations on a genomic scale. SIFTER 2.0 produces equivalently precise predictions compared to the earlier version on a carefully studied family and on a collection of 100 protein families. We have added an approximation method to SIFTER 2.0 and show a 500-fold improvement in speed with minimal impact on prediction results in the functionally diverse sulfotransferase protein family. On the Nudix protein family, previously inaccessible to the SIFTER framework because of the 66 possible molecular functions, SIFTER achieved 47.4% accuracy on experimental data (where BLAST achieved 34.0%). Finally, we used SIFTER to annotate all of the Schizosaccharomyces pombe proteins with experimental functional characterizations, based on annotations from proteins in 46 fungal genomes. SIFTER precisely predicted molecular function for 45.5% of the characterized proteins in this genome, as compared with four current function prediction methods that precisely predicted function for 62.6%, 30.6%, 6.0%, and 5.7% of these proteins. We use both precision-recall curves and ROC analyses to compare these genome-scale predictions across the different methods and to assess performance on different types of applications. SIFTER 2.0 is capable of predicting protein molecular function for large and functionally diverse protein families using an approximate statistical model, enabling phylogenetics-based protein function prediction for genome-wide analyses. The code for SIFTER and protein family data are available at http://sifter.berkeley.edu. PMID:21784873

  3. Biological consequences of ancient gene acquisition and duplication in the large genome soil bacterium, "Solibacter usitatus" strain Ellin6076

    SciTech Connect

    Challacombe, Jean F; Eichorst, Stephanie A; Xie, Gary; Kuske, Cheryl R; Hauser, Loren; Land, Miriam

    2009-01-01

    Bacterial genome sizes range from ca. 0.5 to 10 Mb and are influenced by gene duplication, horizontal gene transfer, gene loss and other evolutionary processes. Sequenced genomes of strains in the phylum Acidobacteria revealed that 'Solibacter usitatus' strain Ellin6076 harbors a 9.9 Mb genome. This large genome appears to have arisen by horizontal gene transfer via ancient bacteriophage and plasmid-mediated transduction, as well as widespread small-scale gene duplications. This has resulted in an increased number of paralogs that are potentially ecologically important (ecoparalogs). Low amino acid sequence identities among functional group members and lack of conserved gene order and orientation in the regions containing similar groups of paralogs suggest that most of the paralogs were not the result of recent duplication events. The genome sizes of cultured subdivision 1 and 3 strains in the phylum Acidobacteria were estimated using pulsed-field gel electrophoresis to determine the prevalence of the large genome trait within the phylum. Members of subdivision 1 were estimated to have smaller genome sizes ranging from ca. 2.0 to 4.8 Mb, whereas members of subdivision 3 had slightly larger genomes, from ca. 5.8 to 9.9 Mb. It is hypothesized that the large genome of strain Ellin6076 encodes traits that provide a selective metabolic, defensive and regulatory advantage in the variable soil environment.

  4. Initial characterization of the large genome of the salamander Ambystoma mexicanum using shotgun and laser capture chromosome sequencing.

    PubMed

    Keinath, Melissa C; Timoshevskiy, Vladimir A; Timoshevskaya, Nataliya Y; Tsonis, Panagiotis A; Voss, S Randal; Smith, Jeramiah J

    2015-01-01

    Vertebrates exhibit substantial diversity in genome size, and some of the largest genomes exist in species that uniquely inform diverse areas of basic and biomedical research. For example, the salamander Ambystoma mexicanum (the Mexican axolotl) is a model organism for studies of regeneration, development and genome evolution, yet its genome is ~10× larger than the human genome. As part of a hierarchical approach toward improving genome resources for the species, we generated 600 Gb of shotgun sequence data and developed methods for sequencing individual laser-captured chromosomes. Based on these data, we estimate that the A. mexicanum genome is ~32 Gb. Notably, as much as 19 Gb of the A. mexicanum genome can potentially be considered single copy, which presumably reflects the evolutionary diversification of mobile elements that accumulated during an ancient episode of genome expansion. Chromosome-targeted sequencing permitted the development of assemblies within the constraints of modern computational platforms, allowed us to place 2062 genes on the two smallest A. mexicanum chromosomes and resolves key events in the history of vertebrate genome evolution. Our analyses show that the capture and sequencing of individual chromosomes is likely to provide valuable information for the systematic sequencing, assembly and scaffolding of large genomes. PMID:26553646

  5. Initial characterization of the large genome of the salamander Ambystoma mexicanum using shotgun and laser capture chromosome sequencing

    PubMed Central

    Keinath, Melissa C.; Timoshevskiy, Vladimir A.; Timoshevskaya, Nataliya Y.; Tsonis, Panagiotis A.; Voss, S. Randal; Smith, Jeramiah J.

    2015-01-01

    Vertebrates exhibit substantial diversity in genome size, and some of the largest genomes exist in species that uniquely inform diverse areas of basic and biomedical research. For example, the salamander Ambystoma mexicanum (the Mexican axolotl) is a model organism for studies of regeneration, development and genome evolution, yet its genome is ~10× larger than the human genome. As part of a hierarchical approach toward improving genome resources for the species, we generated 600 Gb of shotgun sequence data and developed methods for sequencing individual laser-captured chromosomes. Based on these data, we estimate that the A. mexicanum genome is ~32 Gb. Notably, as much as 19 Gb of the A. mexicanum genome can potentially be considered single copy, which presumably reflects the evolutionary diversification of mobile elements that accumulated during an ancient episode of genome expansion. Chromosome-targeted sequencing permitted the development of assemblies within the constraints of modern computational platforms, allowed us to place 2062 genes on the two smallest A. mexicanum chromosomes and resolves key events in the history of vertebrate genome evolution. Our analyses show that the capture and sequencing of individual chromosomes is likely to provide valuable information for the systematic sequencing, assembly and scaffolding of large genomes. PMID:26553646

  6. A Roadmap for Natural Product Discovery Based on Large-Scale Genomics and Metabolomics

    PubMed Central

    Doroghazi, James R.; Albright, Jessica C.; Goering, Anthony W.; Ju, Kou-San; Haines, Robert R.; Tchalukov, Konstantin A.; Labeda, David P.; Kelleher, Neil L.; Metcalf, William W.

    2014-01-01

    Actinobacteria encode a wealth of natural product biosynthetic gene clusters (NPGCs), whose systematic study is complicated by numerous repetitive motifs. By combining several metrics we developed a method for global classification of these gene clusters into families (GCFs) and analyzed the biosynthetic capacity of Actinobacteria in 830 genome sequences, including 344 obtained for this project. The GCF network, comprised of 11,422 gene clusters grouped into 4,122 GCFs, was validated in hundreds of strains by correlating confident mass spectrometric detection of known small molecules with the presence/absence of their established biosynthetic gene clusters. The method also linked previously unassigned GCFs to known natural products, an approach that will enable de novo, bioassay-free discovery of novel natural products using large data sets. Extrapolation from the 830-genome dataset reveals that Actinobacteria encode hundreds of thousands of future drug leads, while the strong correlation between phylogeny and GCFs frames a roadmap to efficiently access them. PMID:25262415

  7. Ultra Large Gene Families: A Matter of Adaptation or Genomic Parasites?

    PubMed

    Schiffer, Philipp H; Gravemeyer, Jan; Rauscher, Martina; Wiehe, Thomas

    2016-01-01

    Gene duplication is an important mechanism of molecular evolution. It offers a fast track to modification, diversification, redundancy or rescue of gene function. However, duplication may also be neutral or (slightly) deleterious, and often ends in pseudogenisation. Here, we investigate the phylogenetic distribution of ultra large gene families on long and short evolutionary time scales. In particular, we focus on a family of NACHT-domain and leucine-rich-repeat-containing (NLR)-genes, which we previously found in large numbers to occupy one chromosome arm of the zebrafish genome. We were interested to see whether such a tight clustering is characteristic of ultra large gene families. Our data reconfirm that most gene family inflations are lineage-specific, but we can only identify very few gene clusters. Based on our observations we hypothesise that, beyond a certain size threshold, ultra large gene families continue to proliferate in a mechanism we term "run-away evolution". This process might ultimately lead to the failure of genomic integrity and drive species to extinction. PMID:27509525

  8. Draft genome sequence of the Daphnia pathogen Octosporea bayeri: insights into the gene content of a large microsporidian genome and a model for host-parasite interactions

    PubMed Central

    2009-01-01

    Background The highly compacted 2.9-Mb genome of Encephalitozoon cuniculi placed the microsporidia in the spotlight, encoding a mere 2,000 proteins and a highly reduced suite of biochemical pathways. This extreme level of reduction is not universal across the microsporidia, with genomes known to vary up to sixfold in size, suggesting that some genomes may harbor a gene content that is not as reduced as that of Enc. cuniculi. In this study, we present an in-depth survey of the large genome of Octosporea bayeri, a pathogen of Daphnia magna, with an estimated genome size of 24 Mb, in order to shed light on the organization and content of a large microsporidian genome. Results Using Illumina sequencing, 898 Mb of O. bayeri genome sequence was generated, resulting in 13.3 Mb of unique sequence. We annotated a total of 2,174 genes, of which 893 encode proteins with assigned function. The gene density of the O. bayeri genome is very low on average, but also highly uneven, so gene-dense regions also occur. The data presented here suggest that the O. bayeri proteome is well represented in this analysis and is more complex than that of Enc. cuniculi. Functional annotation of O. bayeri proteins suggests that this species might be less biochemically dependent on its host for its metabolism than its more reduced relatives. Conclusions The combination of the data presented here, together with the imminent annotated genome of Daphnia magna, will provide a wealth of genetic and genomic tools to study host-parasite interactions in an interesting model for pathogenesis. PMID:19807911

  9. Ancestral European roots of Helicobacter pylori in India

    PubMed Central

    Devi, S Manjulata; Ahmed, Irshad; Francalacci, Paolo; Hussain, M Abid; Akhter, Yusuf; Alvi, Ayesha; Sechi, Leonardo A; Mégraud, Francis; Ahmed, Niyaz

    2007-01-01

    Background The human gastric pathogen Helicobacter pylori has co-evolved with its host and therefore, origins and expansion of multiple populations and sub-populations of H. pylori mirror ancient human migrations. Ancestral origins of H. pylori in the vast Indian subcontinent are debatable. It is not clear how different waves of human migrations in South Asia shaped the population structure of H. pylori. We tried to address these issues through mapping genetic origins of present day H. pylori in India and their genomic comparison with hundreds of isolates from different geographic regions. Results We attempted to dissect genetic identity of strains by multilocus sequence typing (MLST) of the 7 housekeeping genes (atpA, efp, ureI, ppa, mutY, trpC, yphC) and phylogeographic analysis of haplotypes using MEGA and NETWORK software while incorporating DNA sequences and genotyping data of whole cag pathogenicity-islands (cagPAI). The distribution of cagPAI genes within these strains was analyzed by using PCR and the geographic type of cagA phosphorylation motif EPIYA was determined by gene sequencing. All the isolates analyzed revealed European ancestry and belonged to H. pylori sub-population, hpEurope. The cagPAI harbored by Indian strains revealed European features upon PCR based analysis and whole PAI sequencing. Conclusion These observations suggest that H. pylori strains in India share ancestral origins with their European counterparts. Further, non-existence of other sub-populations such as hpAfrica and hpEastAsia, at least in our collection of isolates, suggests that the hpEurope strains enjoyed a special fitness advantage in Indian stomachs to out-compete any endogenous strains. These results also might support hypotheses related to gene flow in India through Indo-Aryans and arrival of Neolithic practices and languages from the Fertile Crescent. PMID:17584914

  10. Reverse engineering and analysis of large genome-scale gene networks

    PubMed Central

    Aluru, Maneesha; Zola, Jaroslaw; Nettleton, Dan; Aluru, Srinivas

    2013-01-01

    Reverse engineering the whole-genome networks of complex multicellular organisms continues to remain a challenge. While simpler models easily scale to large numbers of genes and gene expression datasets, more accurate models are compute intensive, limiting their scale of applicability. To enable fast and accurate reconstruction of large networks, we developed Tool for Inferring Network of Genes (TINGe), a parallel mutual information (MI)-based program. The novel features of our approach include: (i) B-spline-based formulation for linear-time computation of MI, (ii) a novel algorithm for direct permutation testing and (iii) development of parallel algorithms to reduce run-time and facilitate construction of large networks. We assess the quality of our method by comparison with ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) and GeneNet and demonstrate its unique capability by reverse engineering the whole-genome network of Arabidopsis thaliana from 3137 Affymetrix ATH1 GeneChips in just 9 min on a 1024-core cluster. We further report on the development of a new software Gene Network Analyzer (GeNA) for extracting context-specific subnetworks from a given set of seed genes. Using TINGe and GeNA, we performed analysis of 241 Arabidopsis AraCyc 8.0 pathways, and the results are made available through the web. PMID:23042249

  11. Ancestral paralogs and pseudoparalogs and their role in the emergence of the eukaryotic cell

    PubMed Central

    Makarova, Kira S.; Wolf, Yuri I.; Mekhedov, Sergey L.; Mirkin, Boris G.; Koonin, Eugene V.

    2005-01-01

    Gene duplication is a crucial mechanism of evolutionary innovation. A substantial fraction of eukaryotic genomes consists of paralogous gene families. We assess the extent of ancestral paralogy, which dates back to the last common ancestor of all eukaryotes, and examine the origins of the ancestral paralogs and their potential roles in the emergence of the eukaryotic cell complexity. A parsimonious reconstruction of ancestral gene repertoires shows that 4137 orthologous gene sets in the last eukaryotic common ancestor (LECA) map back to 2150 orthologous sets in the hypothetical first eukaryotic common ancestor (FECA) [paralogy quotient (PQ) of 1.92]. Analogous reconstructions show significantly lower levels of paralogy in prokaryotes, 1.19 for archaea and 1.25 for bacteria. The only functional class of eukaryotic proteins with a significant excess of paralogous clusters over the mean includes molecular chaperones and proteins with related functions. Almost all genes in this category underwent multiple duplications during early eukaryotic evolution. In structural terms, the most prominent sets of paralogs are superstructure-forming proteins with repetitive domains, such as WD-40 and TPR. In addition to the true ancestral paralogs which evolved via duplication at the onset of eukaryotic evolution, numerous pseudoparalogs were detected, i.e. homologous genes that apparently were acquired by early eukaryotes via different routes, including horizontal gene transfer (HGT) from diverse bacteria. The results of this study demonstrate a major increase in the level of gene paralogy as a hallmark of the early evolution of eukaryotes. PMID:16106042
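
    The paralogy quotient quoted above can be reproduced from the two set counts in the abstract. The sketch below assumes, from the reported numbers only, that PQ is the ratio of orthologous sets in LECA to the sets they map back to in FECA; the function name is illustrative and not taken from the paper.

```python
# Counts reported in the abstract above.
leca_sets = 4137  # orthologous gene sets in the last eukaryotic common ancestor
feca_sets = 2150  # orthologous sets they map back to in the hypothetical FECA


def paralogy_quotient(descendant_sets: int, ancestor_sets: int) -> float:
    """Ratio of descendant to ancestral orthologous sets (assumed definition)."""
    return descendant_sets / ancestor_sets


pq = paralogy_quotient(leca_sets, feca_sets)
print(f"PQ = {pq:.2f}")  # agrees with the 1.92 quoted in the abstract
```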

  12. Breeding signatures of rice improvement revealed by a genomic variation map from a large germplasm collection

    PubMed Central

    Xie, Weibo; Wang, Gongwei; Yuan, Meng; Yao, Wen; Lyu, Kai; Zhao, Hu; Yang, Meng; Li, Pingbo; Zhang, Xing; Yuan, Jing; Wang, Quanxiu; Liu, Fang; Dong, Huaxia; Zhang, Lejing; Li, Xinglei; Meng, Xiangzhou; Zhang, Wan; Xiong, Lizhong; He, Yuqing; Wang, Shiping; Yu, Sibin; Xu, Caiguo; Luo, Jie; Li, Xianghua; Xiao, Jinghua; Lian, Xingming; Zhang, Qifa

    2015-01-01

    Intensive rice breeding over the past 50 y has dramatically increased productivity especially in the indica subspecies, but our knowledge of the genomic changes associated with such improvement has been limited. In this study, we analyzed low-coverage sequencing data of 1,479 rice accessions from 73 countries, including landraces and modern cultivars. We identified two major subpopulations, indica I (IndI) and indica II (IndII), in the indica subspecies, which corresponded to the two putative heterotic groups resulting from independent breeding efforts. We detected 200 regions spanning 7.8% of the rice genome that had been differentially selected between IndI and IndII, and thus referred to as breeding signatures. These regions included large numbers of known functional genes and loci associated with important agronomic traits revealed by genome-wide association studies. Grain yield was positively correlated with the number of breeding signatures in a variety, suggesting that the number of breeding signatures in a line may be useful for predicting agronomic potential and the selected loci may provide targets for rice improvement. PMID:26358652

  13. In search of ancestral Kilauea volcano

    USGS Publications Warehouse

    Lipman, P.W.; Sisson, T.W.; Ui, T.; Naka, J.

    2000-01-01

    Submersible observations and samples show that the lower south flank of Hawaii, offshore from Kilauea volcano and the active Hilina slump system, consists entirely of compositionally diverse volcaniclastic rocks; pillow lavas are confined to shallow slopes. Submarine-erupted basalt clasts have strongly variable alkalic and transitional basalt compositions (to 41% SiO2, 10.8% alkalies), contrasting with present-day Kilauea tholeiites. The volcaniclastic rocks provide a unique record of ancestral alkalic growth of an archetypal hotspot volcano, including transition to its tholeiitic shield stage, and associated slope-failure events.

  14. A method for the large scale isolation of high transformation efficiency fungal genomic DNA.

    PubMed

    Zhang, D; Yang, Y; Castlebury, L A; Cerniglia, C E

    1996-12-01

    A procedure for isolation of genomic DNA from the zygomycete Cunninghamella elegans and other filamentous fungi and yeasts is reported. This procedure involves disruption of cells by grinding using dry ice, removal of polysaccharides using cetyltrimethylammonium bromide and by phenol extractions, and precipitation of DNA with isopropanol at room temperature. The isolation method produced large scale (approximately 1 mg DNA/5 g wet cells) and highly purified high molecular mass DNA. Sau3AI partially digested DNA showed high transformation efficiency (> 10^6/100 ng DNA) when ligated to ZAP-express lambda vector. PMID:8961565

  15. The gradient boosting algorithm and random boosting for genome-assisted evaluation in large data sets.

    PubMed

    González-Recio, O; Jiménez-Montero, J A; Alenda, R

    2013-01-01

    In the next few years, with the advent of high-density single nucleotide polymorphism (SNP) arrays and genome sequencing, genomic evaluation methods will need to deal with a large number of genetic variants and an increasing sample size. The boosting algorithm is a machine-learning technique that may alleviate the drawbacks of dealing with such large data sets. This algorithm combines different predictors in a sequential manner with some shrinkage on them; each predictor is applied consecutively to the residuals from the committee formed by the previous ones to form a final prediction based on a subset of covariates. Here, a detailed description is provided and examples using a toy data set are included. A modification of the algorithm called "random boosting" was proposed to increase predictive ability and decrease computation time of genome-assisted evaluation in large data sets. Random boosting uses a random selection of markers to add a subsequent weak learner to the predictive model. These modifications were applied to a real data set composed of 1,797 bulls genotyped for 39,714 SNP. Deregressed proofs of 4 yield traits and 1 type trait from January 2009 routine evaluations were used as dependent variables. A 2-fold cross-validation scenario was implemented. Sires born before 2005 were used as a training sample (1,576 and 1,562 for production and type traits, respectively), whereas younger sires were used as a testing sample to evaluate predictive ability of the algorithm on yet-to-be-observed phenotypes. Comparison with the original algorithm was provided. The predictive ability of the algorithm was measured as Pearson correlations between observed and predicted responses. Further, estimated bias was computed as the average difference between observed and predicted phenotypes. The results showed that the modification of the original boosting algorithm could be run in 1% of the time used with the original algorithm and with negligible differences in accuracy
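
    The boosting idea described in this abstract (weak learners fitted one after another to the current residuals, each added with shrinkage, and only a random subset of markers considered at each step) can be sketched on toy data as follows. This is a minimal illustration, not the authors' implementation: the single-marker least-squares weak learner, the parameter names (`shrinkage`, `frac_markers`) and the simulated genotypes are all assumptions made for the example.

```python
import random


def random_boosting(X, y, n_iter=200, shrinkage=0.1, frac_markers=0.5, seed=0):
    """Boost single-marker least-squares learners on residuals (toy sketch)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    intercept = sum(y) / n
    pred = [intercept] * n
    coefs = [0.0] * p
    k = max(1, int(frac_markers * p))  # markers examined per iteration
    for _ in range(n_iter):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        best_j, best_b, best_sse = None, 0.0, float("inf")
        for j in rng.sample(range(p), k):  # the "random" part of random boosting
            sxx = sum(row[j] ** 2 for row in X)
            if sxx == 0.0:
                continue
            b = sum(row[j] * r for row, r in zip(X, resid)) / sxx
            sse = sum((r - b * row[j]) ** 2 for row, r in zip(X, resid))
            if sse < best_sse:
                best_j, best_b, best_sse = j, b, sse
        if best_j is None:
            continue
        coefs[best_j] += shrinkage * best_b  # shrunken update of the committee
        pred = [pi + shrinkage * best_b * row[best_j] for pi, row in zip(pred, X)]
    return intercept, coefs


def predict(intercept, coefs, X):
    return [intercept + sum(c * x for c, x in zip(coefs, row)) for row in X]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / (va * vb) ** 0.5


# Simulated data (hypothetical): 200 individuals, 10 markers coded -1/0/1,
# with markers 0 and 3 affecting the phenotype.
data_rng = random.Random(1)
X = [[data_rng.choice([-1, 0, 0, 1]) for _ in range(10)] for _ in range(200)]
y = [1.0 * row[0] - 0.5 * row[3] + data_rng.gauss(0.0, 0.5) for row in X]

# Train on the first 150 individuals, evaluate on the remaining 50.
intercept, coefs = random_boosting(X[:150], y[:150])
y_hat = predict(intercept, coefs, X[150:])
accuracy = pearson(y[150:], y_hat)  # predictive ability as a Pearson correlation
bias = sum(o - h for o, h in zip(y[150:], y_hat)) / len(y_hat)  # mean observed minus predicted
print(f"correlation = {accuracy:.3f}, bias = {bias:.3f}")
```

    Setting `frac_markers=1.0` examines every marker at every step, which corresponds to ordinary component-wise boosting in this sketch; lowering it reduces the work per iteration, which is the trade-off the abstract describes.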

  16. Managing Large-Scale Genomic Datasets and Translation into Clinical Practice

    PubMed Central

    2014-01-01

    Summary Objective To summarize excellent current research in the field of Bioinformatics and Translational Informatics with application in the health domain. Method We provide a synopsis of the articles selected for the IMIA Yearbook 2014, from which we attempt to derive a synthetic overview of current and future activities in the field. A first step of selection was performed by querying MEDLINE with a list of MeSH descriptors completed by a list of terms adapted to the section. Each section editor evaluated independently the set of 1,851 articles and 15 articles were retained for peer-review. Results The selection and evaluation process of this Yearbook’s section on Bioinformatics and Translational Informatics yielded three excellent articles regarding data management and genome medicine. In the first article, the authors present VEST (Variant Effect Scoring Tool) which is a supervised machine learning tool for prioritizing variants found in exome sequencing projects that are more likely involved in human Mendelian diseases. In the second article, the authors show how to infer surnames of male individuals by crossing anonymous publicly available genomic data from the Y chromosome and public genealogy data banks. The third article presents a statistical framework called iCluster+ that can perform pattern discovery in integrated cancer genomic data. This framework was able to determine different tumor subtypes in colon cancer. Conclusions The current research activities still attest the continuous convergence of Bioinformatics and Medical Informatics, with a focus this year on large-scale biological, genomic, and Electronic Health Records data. Indeed, there is a need for powerful tools for managing and interpreting complex data, but also a need for user-friendly tools developed for the clinicians in their daily practice. All the recent research and development efforts are contributing to the challenge of impacting clinically the results and even going towards a

  17. A Common Ancestral Mutation in CRYBB3 Identified in Multiple Consanguineous Families with Congenital Cataracts

    PubMed Central

    Irum, Bushra; Khan, Arif O.; Wang, Qiwei; Li, David; Khan, Asma A.; Husnain, Tayyab; Akram, Javed; Riazuddin, Sheikh

    2016-01-01

    Purpose This study was performed to investigate the genetic determinants of autosomal recessive congenital cataracts in large consanguineous families. Methods Affected individuals underwent a detailed ophthalmological examination and slit-lamp photographs of the cataractous lenses were obtained. An aliquot of blood was collected from all participating family members and genomic DNA was extracted from white blood cells. Initially, a genome-wide scan was performed with genomic DNAs of family PKCC025 followed by exclusion analysis of our familial cohort of congenital cataracts. Protein-coding exons of CRYBB1, CRYBB2, CRYBB3, and CRYBA4 were sequenced bidirectionally. A haplotype was constructed with SNPs flanking the causal mutation for affected individuals in all four families, while the probability that the four familial cases have a common founder was estimated using EM and CHM-based algorithms. The expression of Crybb3 in the developing murine lens was investigated using TaqMan assays. Results The clinical and ophthalmological examinations suggested that all affected individuals had nuclear cataracts. Genome-wide linkage analysis localized the causal phenotype in family PKCC025 to chromosome 22q with statistically significant two-point logarithm of odds (LOD) scores. Subsequently, we localized three additional families, PKCC063, PKCC131, and PKCC168 to chromosome 22q. Bidirectional Sanger sequencing identified a missense variation: c.493G>C (p.Gly165Arg) in CRYBB3 that segregated with the disease phenotype in all four familial cases. This variation was not found in ethnically matched control chromosomes, the NHLBI exome variant server, or the 1000 Genomes or dbSNP databases. Interestingly, all four families harbor a unique disease haplotype that strongly suggests a common founder of the causal mutation (p<1.64E-10). We observed expression of Crybb3 in the mouse lens as early as embryonic day 15 (E15), and expression remained relatively steady throughout

  18. The search for ancestral nervous systems: an integrative and comparative approach.

    PubMed

    Satterlie, Richard A

    2015-02-15

    Even the most basal multicellular nervous systems are capable of producing complex behavioral acts that involve the integration and combination of simple responses, and decision-making when presented with conflicting stimuli. This requires an understanding beyond that available from genomic investigations, and calls for an integrative and comparative approach, where the power of genomic/transcriptomic techniques is coupled with morphological, physiological and developmental experimentation to identify common and species-specific nervous system properties for the development and elaboration of phylogenomic reconstructions. With careful selection of genes and gene products, we can continue to make significant progress in our search for ancestral nervous system organizations. PMID:25696824

  19. Structural characterization of genomes by large scale sequence-structure threading: application of reliability analysis in structural genomics

    PubMed Central

    Cherkasov, Artem; Ho Sui, Shannan J; Brunham, Robert C; Jones, Steven JM

    2004-01-01

    Background We establish that the occurrence of protein folds among genomes can be accurately described with a Weibull function. Systems which exhibit Weibull character can be interpreted with reliability theory commonly used in engineering analysis. For instance, Weibull distributions are widely used in reliability, maintainability and safety work to model time-to-failure of mechanical devices, mechanisms, building constructions and equipment. Results We have found that the Weibull function describes protein fold distribution within and among genomes more accurately than conventional power functions which have been used in a number of structural genomic studies reported to date. It has also been found that the Weibull reliability parameter β for protein fold distributions varies between genomes and may reflect differences in rates of gene duplication in evolutionary history of organisms. Conclusions The results of this work demonstrate that reliability analysis can provide useful insights and testable predictions in the fields of comparative and structural genomics. PMID:15274750
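
    As an illustration of the reliability-theory framing used above, the sketch below evaluates a two-parameter Weibull reliability function and recovers its shape parameter β by the standard linearisation ln(-ln R) = β ln x - β ln η. The two-parameter form, the scale symbol η and the synthetic data points are assumptions for the example; the abstract does not state which parameterisation or fitting procedure the authors used.

```python
import math


def weibull_reliability(x, beta, eta):
    """Two-parameter Weibull reliability (survival) function R(x)."""
    return math.exp(-((x / eta) ** beta))


def fit_weibull(xs, rs):
    """Least-squares fit of beta and eta on the linearised Weibull form."""
    u = [math.log(x) for x in xs]
    v = [math.log(-math.log(r)) for r in rs]  # requires 0 < r < 1
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    beta = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / sum((a - mu) ** 2 for a in u)
    eta = math.exp(mu - mv / beta)
    return beta, eta


# Synthetic, noise-free points generated from known parameters (hypothetical values).
true_beta, true_eta = 0.8, 10.0
xs = [1, 2, 5, 10, 20, 50]
rs = [weibull_reliability(x, true_beta, true_eta) for x in xs]

beta_hat, eta_hat = fit_weibull(xs, rs)
print(f"beta = {beta_hat:.3f}, eta = {eta_hat:.3f}")
```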

  20. A new tool called DISSECT for analysing large genomic data sets using a Big Data approach

    PubMed Central

    Canela-Xandri, Oriol; Law, Andy; Gray, Alan; Woolliams, John A.; Tenesa, Albert

    2015-01-01

    Large-scale genetic and genomic data are increasingly available and the major bottleneck in their analysis is a lack of sufficiently scalable computational tools. To address this problem in the context of complex traits analysis, we present DISSECT. DISSECT is a new and freely available software that is able to exploit the distributed-memory parallel computational architectures of compute clusters, to perform a wide range of genomic and epidemiologic analyses, which currently can only be carried out on reduced sample sizes or under restricted conditions. We demonstrate the usefulness of our new tool by addressing the challenge of predicting phenotypes from genotype data in human populations using mixed-linear model analysis. We analyse simulated traits from 470,000 individuals genotyped for 590,004 SNPs in ∼4 h using the combined computational power of 8,400 processor cores. We find that prediction accuracies in excess of 80% of the theoretical maximum could be achieved with large sample sizes. PMID:26657010

  1. A new tool called DISSECT for analysing large genomic data sets using a Big Data approach.

    PubMed

    Canela-Xandri, Oriol; Law, Andy; Gray, Alan; Woolliams, John A; Tenesa, Albert

    2015-01-01

    Large-scale genetic and genomic data are increasingly available and the major bottleneck in their analysis is a lack of sufficiently scalable computational tools. To address this problem in the context of complex traits analysis, we present DISSECT. DISSECT is a new and freely available software that is able to exploit the distributed-memory parallel computational architectures of compute clusters, to perform a wide range of genomic and epidemiologic analyses, which currently can only be carried out on reduced sample sizes or under restricted conditions. We demonstrate the usefulness of our new tool by addressing the challenge of predicting phenotypes from genotype data in human populations using mixed-linear model analysis. We analyse simulated traits from 470,000 individuals genotyped for 590,004 SNPs in ∼4 h using the combined computational power of 8,400 processor cores. We find that prediction accuracies in excess of 80% of the theoretical maximum could be achieved with large sample sizes. PMID:26657010

  2. Cross-Platform Assessment of Genomic Imbalance Confirms the Clinical Relevance of Genomic Complexity and Reveals Loci with Potential Pathogenic Roles in Diffuse Large B-Cell Lymphoma

    PubMed Central

    Dias, Lizalynn M.; Thodima, Venkata; Friedman, Julia; Ma, Charles; Guttapalli, Asha; Mendiratta, Geetu; Siddiqi, Imran N.; Syrbu, Sergei; Chaganti, R. S. K.; Houldsworth, Jane

    2016-01-01

    Genomic copy number alterations (CNAs) in diffuse large B-cell lymphoma (DLBCL) have roles in disease pathogenesis but overall clinical relevance remains unclear. Herein, an unbiased algorithm was uniformly applied across three genome profiling datasets comprising 392 newly-diagnosed DLBCL specimens that defined 32 overlapping CNAs, involving 36 minimal common regions (MCRs). Scoring criteria were established for 50 aberrations within the MCRs while considering peak gains/losses. Application of these criteria to independent datasets revealed novel candidate genes with coordinated expression, such as CNOT2, potentially with pathogenic roles. No one single aberration significantly associated with patient outcome across datasets, but genomic complexity, defined by imbalance in more than one MCR, significantly portended adverse outcome in two of three independent datasets. Thus, the standardized scoring of CNAs currently developed can be uniformly applied across platforms, affording robust validation of genomic imbalance and complexity in DLBCL and overall clinical utility as biomarkers of patient outcome. PMID:26294112
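
    The complexity criterion stated in this abstract, imbalance in more than one MCR, amounts to a simple per-specimen count. A minimal sketch follows; the specimen names and MCR identifiers are hypothetical placeholders, not data from the study.

```python
def is_genomically_complex(altered_mcrs):
    """True when imbalance is observed in more than one distinct MCR."""
    return len(set(altered_mcrs)) > 1


# Hypothetical specimens mapped to the MCRs in which imbalance was called.
specimens = {
    "specimen_A": ["MCR_01"],
    "specimen_B": ["MCR_01", "MCR_07", "MCR_12"],
    "specimen_C": [],
}

complexity = {name: is_genomically_complex(mcrs) for name, mcrs in specimens.items()}
print(complexity)
```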

  3. An Ancestral Recombination Graph for Diploid Populations with Skewed Offspring Distribution

    PubMed Central

    Birkner, Matthias; Blath, Jochen; Eldon, Bjarki

    2013-01-01

    A large offspring-number diploid biparental multilocus population model of Moran type is our object of study. At each time step, a pair of diploid individuals drawn uniformly at random contributes offspring to the population. The number of offspring can be large relative to the total population size. Similar “heavily skewed” reproduction mechanisms have been recently considered by various authors (cf. e.g., Eldon and Wakeley 2006, 2008) and reviewed by Hedgecock and Pudovkin (2011). Each diploid parental individual contributes exactly one chromosome to each diploid offspring, and hence ancestral lineages can coalesce only when in distinct individuals. A separation-of-timescales phenomenon is thus observed. A result of Möhle (1998) is extended to obtain convergence of the ancestral process to an ancestral recombination graph necessarily admitting simultaneous multiple mergers of ancestral lineages. The usual ancestral recombination graph is obtained as a special case of our model when the parents contribute only one offspring to the population each time. Due to diploidy and large offspring numbers, novel effects appear. For example, the marginal genealogy at each locus admits simultaneous multiple mergers in up to four groups, and different loci remain substantially correlated even as the recombination rate grows large. Thus, genealogies for loci far apart on the same chromosome remain correlated. Correlation in coalescence times for two loci is derived and shown to be a function of the coalescence parameters of our model. Extending the observations by Eldon and Wakeley (2008), predictions of linkage disequilibrium are shown to be functions of the reproduction parameters of our model, in addition to the recombination rate. Correlations in ratios of coalescence times between loci can be high, even when the recombination rate is high and sample size is large, in large offspring-number populations, as suggested by simulations, hinting at how to distinguish between

  4. Transitions in Sexuality: Recapitulation of an Ancestral Tri- and Tetrapolar Mating System in Cryptococcus neoformans

    PubMed Central

    Hsueh, Yen-Ping; Fraser, James A.; Heitman, Joseph

    2008-01-01

    Sex is orchestrated by the mating-type locus (MAT) in fungi and by sex chromosomes in plants and animals. In fungi, two patterns of sexuality occur: bipolar with a single, typically biallelic sex determinant that promotes inbreeding, and tetrapolar with two unlinked, often multiallelic sex determinants that restrict inbreeding. Multiallelism in either bipolar or tetrapolar mating systems promotes outcrossing. Cryptococcus neoformans is a pathogenic bipolar yeast with two unusually large MAT alleles (a/α) spanning >100 kb, ∼100-fold larger than many other fungal MAT loci. Based on comparative genomic analysis, this unusual MAT locus is hypothesized to have evolved from an ancestral tetrapolar system. In this model, the unlinked homeodomain (HD) transcription factor and pheromone/receptor tetrapolar loci acquired additional sex-related genes and then fused via chromosomal translocation, forming an intermediate transitional mating system (which we term tripolar), which then underwent recombination and gene conversion to fashion the extant bipolar MAT alleles. To experimentally validate this model, C. neoformans was engineered to have a tetrapolar mating system by relocating the MAT SXI1α and SXI2a HD genes to an unlinked genomic locale. Genetic and molecular analyses revealed that this modified organism could complete a tetrapolar sexual cycle. Analysis of progeny generated from bipolar, tripolar, and tetrapolar crosses provides direct experimental evidence that the tripolar state confers decreased fertility and therefore may represent an unstable evolutionary intermediate. These findings illustrate how transitions between outcrossing and inbreeding preference occur by involving sex determinant linkage and collapse from multiallelic to biallelic sex determination, providing insights into both fungal sex evolution and early steps in sex chromosome evolution. PMID:18723606

  5. Rapid pair-wise synteny analysis of large bacterial genomes using web-based GeneOrder4.0

    PubMed Central

    2010-01-01

    Background The growing whole genome sequence databases necessitate the development of user-friendly software tools to mine these data. Web-based tools are particularly useful to wet-bench biologists as they enable platform-independent analysis of sequence data, without having to perform complex programming tasks and software compiling. Findings GeneOrder4.0 is a web-based "on-the-fly" synteny and gene order analysis tool for comparative bacterial genomics (ca. 8 Mb). It enables the visualization of synteny by plotting protein similarity scores between two genomes and it also provides visual annotation of "hypothetical" proteins from older archived genomes based on more recent annotations. Conclusions The web-based software tool GeneOrder4.0 is a user-friendly application that has been updated to allow the rapid analysis of synteny and gene order in large bacterial genomes. It is developed with the wet-bench researcher in mind. PMID:20178631

  6. Genomic mechanisms underlying PARK2 large deletions identified in a cohort of patients with PD

    PubMed Central

    Morais, Sara; Bastos-Ferreira, Rita; Sequeiros, Jorge

    2016-01-01

    Objectives: To identify the genomic mechanisms that result in PARK2 large gene deletions. Methods: We conducted mutation screening using PCR amplification of PARK2-coding regions and exon-intron boundaries, followed by sequencing to evaluate a large series of 244 unrelated Portuguese patients with symptoms of Parkinson disease. For the detection of large gene rearrangements, we performed multiplex ligation-dependent probe amplification, followed by long-range PCR and sequencing to map deletion breakpoints. Results: We identified biallelic pathogenic parkin mutations in 40 of the 244 patients. There were 18 different mutations, some of them novel. This study included mapping of 17 deletion breakpoints showing that nonhomologous end joining is the most common mechanism responsible for these gene rearrangements. None of these deletion breakpoints were previously described, and only one was present in 2 unrelated families, indicating that most of the deletions result from independent events. Conclusions: The c.155delA mutation is highly prevalent in the Portuguese population (62.5% of the cases). Large deletions were present in 42.5% of the patients. We present the largest study on the molecular mechanisms that mediate PARK2 deletions in a homogeneous population. PMID:27182553

  7. Differentially expressed genes match bill morphology and plumage despite largely undifferentiated genomes in a Holarctic songbird.

    PubMed

    Mason, Nicholas A; Taylor, Scott A

    2015-06-01

    Understanding the patterns and processes that contribute to phenotypic diversity and speciation is a central goal of evolutionary biology. Recently, high-throughput sequencing has provided unprecedented phylogenetic resolution in many lineages that have experienced rapid diversification. The Holarctic redpoll finches (Genus: Acanthis) provide an intriguing example of a recent, phenotypically diverse lineage; traditional sequencing and genotyping methods have failed to detect any genetic differences between currently recognized species, despite marked variation in plumage and morphology within the genus. We examined variation among 20 712 anonymous single nucleotide polymorphisms (SNPs) distributed throughout the redpoll genome in combination with 215 825 SNPs within the redpoll transcriptome, gene expression data and ecological niche modelling to evaluate genetic and ecological differentiation among currently recognized species. Expanding upon previous findings, we present evidence of (i) largely undifferentiated genomes among currently recognized species; (ii) substantial niche overlap across the North American Acanthis range; and (iii) a strong relationship between polygenic patterns of gene expression and continuous phenotypic variation within a sample of redpolls from North America. The patterns we report may be caused by high levels of ongoing gene flow between polymorphic populations, incomplete lineage sorting accompanying very recent or ongoing divergence, variation in cis-regulatory elements, or phenotypic plasticity, but do not support a scenario of prolonged isolation and subsequent secondary contact. Together, these findings highlight ongoing theoretical and computational challenges presented by recent, rapid bouts of phenotypic diversification and provide new insight into the evolutionary dynamics of an intriguing, understudied non-model system. PMID:25735539

  8. Large-scale analysis of tandem repeat variability in the human genome

    PubMed Central

    Duitama, Jorge; Zablotskaya, Alena; Gemayel, Rita; Jansen, An; Belet, Stefanie; Vermeesch, Joris R.; Verstrepen, Kevin J.; Froyen, Guy

    2014-01-01

    Tandem repeats are short DNA sequences that are repeated head-to-tail with a propensity to be variable. They constitute a significant proportion of the human genome, also occurring within coding and regulatory regions. Variation in these repeats can alter the function and/or expression of genes allowing organisms to swiftly adapt to novel environments. Importantly, some repeat expansions have also been linked to certain neurodegenerative diseases. Therefore, accurate sequencing of tandem repeats could contribute to our understanding of common phenotypic variability and might uncover missing genetic factors in idiopathic clinical conditions. However, despite long-standing evidence for the functional role of repeats, they are largely ignored because of technical limitations in sequencing, mapping and typing. Here, we report on a novel capture technique and data filtering protocol that allowed simultaneous sequencing of thousands of tandem repeats in the human genomes of a three generation family using GS-FLX-plus Titanium technology. Our results demonstrated that up to 7.6% of tandem repeats in this family (4% in coding sequences) differ from the reference sequence, and identified a de novo variation in the family tree. The method opens new routes to look at this underappreciated type of genetic variability, including the identification of novel disease-related repeats. PMID:24682812

  9. The Exceptionally Large Chloroplast Genome of the Green Alga Floydiella terrestris Illuminates the Evolutionary History of the Chlorophyceae

    PubMed Central

    Brouard, Jean-Simon; Otis, Christian; Lemieux, Claude; Turmel, Monique

    2010-01-01

    The Chlorophyceae, an advanced class of chlorophyte green algae, comprises five lineages that form two major clades (Chlamydomonadales + Sphaeropleales and Oedogoniales + Chaetopeltidales + Chaetophorales). The four complete chloroplast DNA (cpDNA) sequences currently available for chlorophyceans uncovered an extraordinarily fluid genome architecture as well as many structural features distinguishing this group from other green algae. We report here the 521,168-bp cpDNA sequence from a member of the Chaetopeltidales (Floydiella terrestris), the sole chlorophycean lineage not previously sampled for chloroplast genome analysis. This genome, which contains 97 conserved genes and 26 introns (19 group I and 7 group II introns), is the largest chloroplast genome ever sequenced. Intergenic regions account for 77.8% of the genome size and are populated by short repeats. Numerous genomic features are shared with the cpDNA of the chaetophoralean Stigeoclonium helveticum, notably the absence of a large inverted repeat and the presence of unique gene clusters and trans-spliced group II introns. Although only one of the Floydiella group I introns encodes a homing endonuclease gene, our finding of five free-standing reading frames having similarity with such genes suggests that chloroplast group I introns endowed with mobility were once more abundant in the Floydiella lineage. Parsimony analysis of structural genomic features and phylogenetic analysis of chloroplast sequence data unambiguously resolved the Oedogoniales as sister to the Chaetopeltidales and Chaetophorales. An evolutionary scenario of the molecular events that shaped the chloroplast genome in the Chlorophyceae is presented. PMID:20624729

  10. Rapidly Registering Identity-by-Descent Across Ancestral Recombination Graphs.

    PubMed

    Yang, Shuo; Carmi, Shai; Pe'er, Itsik

    2016-06-01

    The genomes of remotely related individuals occasionally contain long segments that are identical by descent (IBD). Sharing of IBD segments has many applications in population and medical genetics, and it is thus desirable to study their properties in simulations. However, no current method provides a direct, efficient means to extract IBD segments from simulated genealogies. Here, we introduce computationally efficient approaches to extract ground-truth IBD segments from a sequence of genealogies, or equivalently, an ancestral recombination graph. Specifically, we use a two-step scheme, where we first identify putative shared segments by comparing the common ancestors of all pairs of individuals at some distance apart. This reduces the search space considerably, and we then proceed by determining the true IBD status of the candidate segments. Under some assumptions and when allowing a limited resolution of segment lengths, our run-time complexity is reduced from O(n^3 log n) for the naïve algorithm to O(n log n), where n is the number of individuals in the sample. PMID:27104872
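
    As a point of reference for the abstract above, the following is a minimal sketch of a naive pairwise baseline, not the authors' two-step scheme. It assumes (hypothetically) that the sequence of genealogies is given as a list of `(start, end, parent)` tuples, where `parent` maps each node to its parent and node identifiers are shared across trees, and it treats a maximal run of adjacent trees in which a pair keeps the same most recent common ancestor as one IBD segment. The names `mrca`, `ibd_segments` and `min_length` are illustrative only.

```python
from itertools import combinations


def mrca(parent, a, b):
    """Most recent common ancestor of a and b in a tree given as child -> parent.

    Returns None if the two nodes share no ancestor in this tree.
    """
    seen = set()
    while a is not None:
        seen.add(a)
        a = parent.get(a)
    while b is not None and b not in seen:
        b = parent.get(b)
    return b


def ibd_segments(trees, samples, min_length=0):
    """Naive extraction: scan every pair of samples across every local tree.

    trees: list of (start, end, parent) sorted by start, covering adjacent intervals.
    Returns a list of (sample_a, sample_b, start, end, ancestor) tuples.
    """
    segments = []

    def flush(a, b, run):
        # Keep a finished run only if it has an ancestor and is long enough.
        if run is not None and run[2] is not None and run[1] - run[0] >= min_length:
            segments.append((a, b, run[0], run[1], run[2]))

    for a, b in combinations(samples, 2):
        run = None  # (start, end, ancestor) of the segment being extended
        for start, end, parent in trees:
            anc = mrca(parent, a, b)
            if run is not None and anc == run[2] and start == run[1]:
                run = (run[0], end, anc)  # same ancestor, adjacent tree: extend
            else:
                flush(a, b, run)
                run = (start, end, anc)
        flush(a, b, run)
    return segments


# Toy input: three local trees over the interval [0, 400) for samples a, b, c.
toy_trees = [
    (0, 100, {"a": "x", "b": "x", "c": "y", "x": "y"}),
    (100, 250, {"a": "x", "b": "x", "c": "z", "x": "z"}),
    (250, 400, {"a": "w", "c": "w", "b": "z", "w": "z"}),
]
long_segments = ibd_segments(toy_trees, ["a", "b", "c"], min_length=200)
```

    Because every pair is compared in every tree, the work grows at least quadratically with the number of samples, which is the kind of cost the two-step scheme described in the abstract is designed to avoid.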

  11. The Bimodal Distribution of Genic GC Content Is Ancestral to Monocot Species

    PubMed Central

    Clément, Yves; Fustier, Margaux-Alison; Nabholz, Benoit; Glémin, Sylvain

    2015-01-01

    In grasses such as rice or maize, the distribution of genic GC content is well known to be bimodal. It is mainly driven by GC content at third codon positions (GC3 for short). This feature is thought to be specific to grasses as closely related species like banana have a unimodal GC3 distribution. GC3 is associated with numerous genomics features and uncovering the origin of this peculiar distribution will help understanding the potential roles and consequences of GC3 variations within and between genomes. Until recently, the origin of the peculiar GC3 distribution in grasses has remained unknown. Thanks to the recent publication of several complete genomes and transcriptomes of nongrass monocots, we studied more than 1,000 groups of one-to-one orthologous genes in seven grasses and three outgroup species (banana, palm tree, and yam). Using a maximum likelihood-based method, we reconstructed GC3 at several ancestral nodes. We found that the bimodal GC3 distribution observed in extant grasses is ancestral to both grasses and most monocot species, and that other species studied here have lost this peculiar structure. We also found that GC3 in grass lineages is globally evolving very slowly and that the decreasing GC3 gradient observed from 5′ to 3′ along coding sequences is also conserved and ancestral to monocots. This result strongly challenges the previous views on the specificity of grass genomes and we discuss its implications for the possible causes of the evolution of GC content in monocots. PMID:25527839
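
    The GC3 statistic used in the abstract above is simple to compute for a single gene. The sketch below is illustrative only: the function name `gc3` is hypothetical, and it assumes an in-frame coding sequence in plain A/C/G/T letters, ignoring any trailing partial codon. It does not reproduce the maximum likelihood reconstruction of ancestral GC3 described by the authors.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C among the third codon positions of an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # drop a trailing partial codon, if any
    thirds = cds[2:usable:3]           # bases at positions 3, 6, 9, ...
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


# Codons ATG, AAA, TTC have third bases G, A, C, so two of three are G or C.
example_gc3 = gc3("ATGAAATTC")
```

    Applied to every gene in a genome, the resulting values form the distribution whose shape (unimodal or bimodal) the abstract discusses.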

  12. Software engineering the mixed model for genome-wide association studies on large samples.

    PubMed

    Zhang, Zhiwu; Buckler, Edward S; Casstevens, Terry M; Bradbury, Peter J

    2009-11-01

    Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample size and number of markers used for GWAS is increasing dramatically, resulting in greater statistical power to detect those associations. The use of mixed models with increasingly large data sets depends on the availability of software for analyzing those models. While multiple software packages implement the mixed model method, no single package provides the best combination of fast computation, ability to handle large samples, flexible modeling and ease of use. Key elements of association analysis with mixed models are reviewed, including modeling phenotype-genotype associations using mixed models, population stratification, kinship and its estimation, variance component estimation, use of best linear unbiased predictors or residuals in place of raw phenotype, improving efficiency and software-user interaction. The available software packages are evaluated, and suggestions made for future software development. PMID:19933212

  13. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans

    PubMed Central

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-01-01

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. PMID:26199191

  14. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans.

    PubMed

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-08-01

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. PMID:26199191

  15. Physical mapping in large genomes: accelerating anchoring of BAC contigs to genetic maps through in silico analysis.

    PubMed

    Paux, Etienne; Legeai, Fabrice; Guilhot, Nicolas; Adam-Blondon, Anne-Françoise; Alaux, Michaël; Salse, Jérôme; Sourdille, Pierre; Leroy, Philippe; Feuillet, Catherine

    2008-02-01

    Anchored physical maps represent essential frameworks for map-based cloning, comparative genomics studies, and genome sequencing projects. High throughput anchoring can be achieved by polymerase chain reaction (PCR) screening of bacterial artificial chromosome (BAC) library pools with molecular markers. However, for large genomes such as wheat, the development of high dimension pools and the number of reactions that need to be performed can be extremely large, making the screening laborious and costly. To improve the cost efficiency of anchoring in such large genomes, we have developed a new software tool named Elephant (electronic physical map anchoring tool) that combines BAC contig information generated by FingerPrinted Contig with results of BAC library pool screening to identify BAC addresses with a minimal number of PCRs. Elephant was evaluated during the construction of a physical map of chromosome 3B of hexaploid wheat. Results show that a one-dimensional pool screening can be sufficient to anchor a BAC contig while reducing the number of PCRs 384-fold, thereby demonstrating that Elephant is an efficient and cost-effective tool to support physical mapping in large genomes. PMID:18038165

  16. The genomic and physical organization of Ty1-copia-like sequences as a component of large genomes in Pinus elliottii var. elliottii and other gymnosperms.

    PubMed Central

    Kamm, A; Doudrick, R L; Heslop-Harrison, J S; Schmidt, T

    1996-01-01

    A DNA sequence, TPE1, representing the internal domain of a Ty1-copia retroelement, was isolated from genomic DNA of Pinus elliottii Engelm. var. elliottii (slash pine). Genomic Southern analysis showed that this sequence, carrying partial reverse transcriptase and integrase gene sequences, is highly amplified within the genome of slash pine and part of a dispersed element >4.8 kbp. Fluorescent in situ hybridization to metaphase chromosomes shows that the element is relatively uniformly dispersed over all 12 chromosome pairs and is highly abundant in the genome. It is largely excluded from centromeric regions and intercalary chromosomal sites representing the 18S-5.8S-25S rRNA genes. Southern hybridization with specific DNA probes for the reverse transcriptase gene shows that TPE1 represents a large subgroup of heterogeneous Ty1-copia retrotransposons in Pinus species. Because no TPE1 transcription could be detected, it is most likely an inactive element, at least in needle tissue. Further evidence for inactivity was found in recombinant reverse transcriptase and integrase sequences. The distribution of TPE1 within different gymnosperms that contain Ty1-copia group retrotransposons, as shown by a PCR assay, was investigated by Southern hybridization. The TPE1 family is highly amplified and conserved in all Pinus species analyzed, showing a similar genomic organization in the three- and five-needle pine species investigated. It is also present in spruce, bald cypress (swamp cypress), and in ginkgo but in fewer copies and a different genomic organization. PMID:8610105

  17. Evo-Devo: Variations on Ancestral Themes

    PubMed Central

    De Robertis, E.M.

    2008-01-01

    Most animals evolved from a common ancestor, Urbilateria, which already had in place the developmental genetic networks for shaping body plans. Comparative genomics has revealed rather unexpectedly that many of the genes present in bilaterian animal ancestors were lost by individual phyla during evolution. Reconstruction of the archetypal developmental genomic tool-kit present in Urbilateria will help to elucidate the contribution of gene loss and developmental constraints to the evolution of animal body plans. PMID:18243095

  18. Needles: Toward Large-Scale Genomic Prediction with Marker-by-Environment Interaction.

    PubMed

    De Coninck, Arne; De Baets, Bernard; Kourounis, Drosos; Verbosio, Fabio; Schenk, Olaf; Maenhout, Steven; Fostier, Jan

    2016-05-01

    Genomic prediction relies on genotypic marker information to predict the agronomic performance of future hybrid breeds based on trial records. Because the effect of markers may vary substantially under the influence of different environmental conditions, marker-by-environment interaction effects have to be taken into account. However, this may lead to a dramatic increase in the computational resources needed for analyzing large-scale trial data. A high-performance computing solution, called Needles, is presented for handling such data sets. Needles is tailored to the particular properties of the underlying algebraic framework by exploiting a sparse matrix formalism where suited and by utilizing distributed computing techniques to enable the use of a dedicated computing cluster. It is demonstrated that large-scale analyses can be performed within reasonable time frames with this framework. Moreover, by analyzing simulated trial data, it is shown that the effects of markers with a high environmental interaction can be predicted more accurately when more records per environment are available in the training data. The availability of such data and their analysis with Needles also may lead to the discovery of highly contributing QTL in specific environmental conditions. Such a framework thus opens the path for plant breeders to select crops based on these QTL, resulting in hybrid lines with optimized agronomic performance in specific environmental conditions. PMID:26936924

  19. Diversity and relationships of cocirculating modern human rotaviruses revealed using large-scale comparative genomics.

    PubMed

    McDonald, Sarah M; McKell, Allison O; Rippinger, Christine M; McAllen, John K; Akopov, Asmik; Kirkness, Ewen F; Payne, Daniel C; Edwards, Kathryn M; Chappell, James D; Patton, John T

    2012-09-01

    Group A rotaviruses (RVs) are 11-segmented, double-stranded RNA viruses and are primary causes of gastroenteritis in young children. Despite their medical relevance, the genetic diversity of modern human RVs is poorly understood, and the impact of vaccine use on circulating strains remains unknown. In this study, we report the complete genome sequence analysis of 58 RVs isolated from children with severe diarrhea and/or vomiting at Vanderbilt University Medical Center (VUMC) in Nashville, TN, during the years spanning community vaccine implementation (2005 to 2009). The RVs analyzed include 36 G1P[8], 18 G3P[8], and 4 G12P[8] Wa-like genogroup 1 strains with VP6-VP1-VP2-VP3-NSP1-NSP2-NSP3-NSP4-NSP5/6 genotype constellations of I1-R1-C1-M1-A1-N1-T1-E1-H1. By constructing phylogenetic trees, we identified 2 to 5 subgenotype alleles for each gene. The results show evidence of intragenogroup gene reassortment among the cocirculating strains. However, several isolates from different seasons maintained identical allele constellations, consistent with the notion that certain RV clades persisted in the community. By comparing the genes of VUMC RVs to those of other archival and contemporary RV strains for which sequences are available, we defined phylogenetic lineages and verified that the diversity of the strains analyzed in this study reflects that seen in other regions of the world. Importantly, the VP4 and VP7 proteins encoded by VUMC RVs and other contemporary strains show amino acid changes in or near neutralization domains, which might reflect antigenic drift of the virus. Thus, this large-scale, comparative genomic study of modern human RVs provides significant insight into how this pathogen evolves during its spread in the community. PMID:22696651

  20. Diversity and Relationships of Cocirculating Modern Human Rotaviruses Revealed Using Large-Scale Comparative Genomics

    PubMed Central

    McKell, Allison O.; Rippinger, Christine M.; McAllen, John K.; Akopov, Asmik; Kirkness, Ewen F.; Payne, Daniel C.; Edwards, Kathryn M.; Chappell, James D.; Patton, John T.

    2012-01-01

    Group A rotaviruses (RVs) are 11-segmented, double-stranded RNA viruses and are primary causes of gastroenteritis in young children. Despite their medical relevance, the genetic diversity of modern human RVs is poorly understood, and the impact of vaccine use on circulating strains remains unknown. In this study, we report the complete genome sequence analysis of 58 RVs isolated from children with severe diarrhea and/or vomiting at Vanderbilt University Medical Center (VUMC) in Nashville, TN, during the years spanning community vaccine implementation (2005 to 2009). The RVs analyzed include 36 G1P[8], 18 G3P[8], and 4 G12P[8] Wa-like genogroup 1 strains with VP6-VP1-VP2-VP3-NSP1-NSP2-NSP3-NSP4-NSP5/6 genotype constellations of I1-R1-C1-M1-A1-N1-T1-E1-H1. By constructing phylogenetic trees, we identified 2 to 5 subgenotype alleles for each gene. The results show evidence of intragenogroup gene reassortment among the cocirculating strains. However, several isolates from different seasons maintained identical allele constellations, consistent with the notion that certain RV clades persisted in the community. By comparing the genes of VUMC RVs to those of other archival and contemporary RV strains for which sequences are available, we defined phylogenetic lineages and verified that the diversity of the strains analyzed in this study reflects that seen in other regions of the world. Importantly, the VP4 and VP7 proteins encoded by VUMC RVs and other contemporary strains show amino acid changes in or near neutralization domains, which might reflect antigenic drift of the virus. Thus, this large-scale, comparative genomic study of modern human RVs provides significant insight into how this pathogen evolves during its spread in the community. PMID:22696651

  1. Comparative genomics of protoploid Saccharomycetaceae.

    PubMed

    Souciet, Jean-Luc; Dujon, Bernard; Gaillardin, Claude; Johnston, Mark; Baret, Philippe V; Cliften, Paul; Sherman, David J; Weissenbach, Jean; Westhof, Eric; Wincker, Patrick; Jubin, Claire; Poulain, Julie; Barbe, Valérie; Ségurens, Béatrice; Artiguenave, François; Anthouard, Véronique; Vacherie, Benoit; Val, Marie-Eve; Fulton, Robert S; Minx, Patrick; Wilson, Richard; Durrens, Pascal; Jean, Géraldine; Marck, Christian; Martin, Tiphaine; Nikolski, Macha; Rolland, Thomas; Seret, Marie-Line; Casarégola, Serge; Despons, Laurence; Fairhead, Cécile; Fischer, Gilles; Lafontaine, Ingrid; Leh, Véronique; Lemaire, Marc; de Montigny, Jacky; Neuvéglise, Cécile; Thierry, Agnès; Blanc-Lenfle, Isabelle; Bleykasten, Claudine; Diffels, Julie; Fritsch, Emilie; Frangeul, Lionel; Goëffon, Adrien; Jauniaux, Nicolas; Kachouri-Lafond, Rym; Payen, Célia; Potier, Serge; Pribylova, Lenka; Ozanne, Christophe; Richard, Guy-Franck; Sacerdot, Christine; Straub, Marie-Laure; Talla, Emmanuel

    2009-10-01

    Our knowledge of yeast genomes remains largely dominated by the extensive studies on Saccharomyces cerevisiae and the consequences of its ancestral duplication, leaving the evolution of the entire class of hemiascomycetes only partly explored. We concentrate here on five species of Saccharomycetaceae, a large subdivision of hemiascomycetes, that we call "protoploid" because they diverged from the S. cerevisiae lineage prior to its genome duplication. We determined the complete genome sequences of three of these species: Kluyveromyces (Lachancea) thermotolerans and Saccharomyces (Lachancea) kluyveri (two members of the newly described Lachancea clade), and Zygosaccharomyces rouxii. We included in our comparisons the previously available sequences of Kluyveromyces lactis and Ashbya (Eremothecium) gossypii. Despite their broad evolutionary range and significant individual variations in each lineage, the five protoploid Saccharomycetaceae share a core repertoire of approximately 3300 protein families and a high degree of conserved synteny. Synteny blocks were used to define gene orthology and to infer ancestors. Far from representing minimal genomes without redundancy, the five protoploid yeasts contain numerous copies of paralogous genes, either dispersed or in tandem arrays, that, altogether, constitute a third of each genome. Ancient, conserved paralogs as well as novel, lineage-specific paralogs were identified. PMID:19525356

  2. Genomic exploration and molecular marker development in a large and complex conifer genome using RADseq and mRNAseq.

    PubMed

    Karam, M-J; Lefèvre, F; Dagher-Kharrat, M Bou; Pinosio, S; Vendramin, G G

    2015-05-01

    We combined restriction site associated DNA sequencing (RADseq) using a hypomethylation-sensitive enzyme and messenger RNA sequencing (mRNAseq) to develop molecular markers for the 16 gigabase genome of Cedrus atlantica, a conifer tree species. With each method, Illumina® reads from one individual were used to generate de novo assemblies. SNPs from the RADseq data set were detected in a panel of one single individual and three pools of three individuals each. We developed a flexible script to estimate the ascertainment bias in SNP detection considering the pooling and sampling effects on the probability of not detecting an existing polymorphism. Gene Ontology (GO) and transposable element (TE) search analyses were applied to both data sets. The RADseq and the mRNAseq assemblies represented 0.1% and 0.6% of the genome, respectively. Genome complexity reduction resulted in 17% of the RADseq contigs potentially coding for proteins. This rate was doubled in the mRNAseq data set, suggesting that RADseq also explores noncoding low-repeat regions. The two methods gave very similar GO-slim profiles. As expected, the two assemblies were poor in TE-like sequences (<4% of contigs length). We identified 17,348 single nucleotide polymorphisms (SNPs) in the RADseq data set and 5,714 simple sequence repeats (SSRs) in the transcriptome. A subset of 282 SNPs was validated using the Fluidigm genotyping technology, giving a conversion rate of 50.4%, falling within the expected range for conifers. Increasing sample size had the greatest effect for ascertainment bias reduction. These results validated the utility of the RADseq approach for highly complex genomes such as conifers. PMID:25224750

  3. Genome Reduction Uncovers a Large Dispensable Genome and Adaptive Role for Copy Number Variation in Asexually Propagated Solanum tuberosum.

    PubMed

    Hardigan, Michael A; Crisovan, Emily; Hamilton, John P; Kim, Jeongwoon; Laimbeer, Parker; Leisner, Courtney P; Manrique-Carpintero, Norma C; Newton, Linsey; Pham, Gina M; Vaillancourt, Brieanne; Yang, Xueming; Zeng, Zixian; Douches, David S; Jiang, Jiming; Veilleux, Richard E; Buell, C Robin

    2016-02-01

    Clonally reproducing plants have the potential to bear a significantly greater mutational load than sexually reproducing species. To investigate this possibility, we examined the breadth of genome-wide structural variation in a panel of monoploid/doubled monoploid clones generated from native populations of diploid potato (Solanum tuberosum), a highly heterozygous asexually propagated plant. As rare instances of purely homozygous clones, they provided an ideal set for determining the degree of structural variation tolerated by this species and deriving its minimal gene complement. Extensive copy number variation (CNV) was uncovered, impacting 219.8 Mb (30.2%) of the potato genome with nearly 30% of genes subject to at least partial duplication or deletion, revealing the highly heterogeneous nature of the potato genome. Dispensable genes (>7000) were associated with limited transcription and/or a recent evolutionary history, with lower deletion frequency observed in genes conserved across angiosperms. Association of CNV with plant adaptation was highlighted by enrichment in gene clusters encoding functions for environmental stress response, with gene duplication playing a part in species-specific expansions of stress-related gene families. This study revealed unique impacts of CNV in a species with asexual reproductive habits and how CNV may drive adaption through evolution of key stress pathways. PMID:26772996
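    The abstract states that copy number variation affects 219.8 Mb, or 30.2%, of the potato genome. A rough genome size follows from those two figures. The sketch below is a back-of-the-envelope check under the assumption that both values refer to the same reference assembly; the result is implied by the rounded numbers and is not stated in the abstract.

```python
# Values quoted in the abstract.
CNV_AFFECTED_MB = 219.8   # sequence impacted by CNV, in megabases
CNV_FRACTION = 0.302      # the same quantity as a fraction of the genome (30.2%)

# Implied total genome size, assuming both figures use the same reference.
implied_genome_mb = CNV_AFFECTED_MB / CNV_FRACTION

print(f"Implied genome size: {implied_genome_mb:.1f} Mb")
```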

  4. Reconstruction of Oomycete Genome Evolution Identifies Differences in Evolutionary Trajectories Leading to Present-Day Large Gene Families

    PubMed Central

    Seidl, Michael F.; Van den Ackerveken, Guido; Govers, Francine; Snel, Berend

    2012-01-01

    The taxonomic class of oomycetes contains numerous pathogens of plants and animals but is related to nonpathogenic diatoms and brown algae. Oomycetes have flexible genomes comprising large gene families that play roles in pathogenicity. The evolutionary processes that shaped the gene content have not yet been studied by applying systematic tree reconciliation of the phylome of these species. We analyzed evolutionary dynamics of ten Stramenopiles. Gene gains, duplications, and losses were inferred by tree reconciliation of 18,459 gene trees constituting the phylome with a highly supported species phylogeny. We reconstructed a strikingly large last common ancestor of the Stramenopiles that contained ∼10,000 genes. Throughout evolution, the genomes of pathogenic oomycetes have constantly gained and lost genes, though gene gains through duplications outnumber the losses. The branch leading to the plant pathogenic Phytophthora genus was identified as a major transition point characterized by increased frequency of duplication events that has likely driven the speciation within this genus. Large gene families encoding different classes of enzymes associated with pathogenicity such as glycoside hydrolases are formed by complex and distinct patterns of duplications and losses leading to their expansion in extant oomycetes. This study unveils the large-scale evolutionary dynamics that shaped the genomes of pathogenic oomycetes. By the application of phylogenetic based analyses methods, it provides additional insights that shed light on the complex history of oomycete genome evolution and the emergence of large gene families characteristic for this important class of pathogens. PMID:22230142

  5. Comparative analysis of the primate X-inactivation center region and reconstruction of the ancestral primate XIST locus

    PubMed Central

    Horvath, Julie E.; Sheedy, Christina B.; Merrett, Stephanie L.; Diallo, Abdoulaye Banire; Swofford, David L.; NISC Comparative Sequencing Program; Green, Eric D.; Willard, Huntington F.

    2011-01-01

    Here we provide a detailed comparative analysis across the candidate X-Inactivation Center (XIC) region and the XIST locus in the genomes of six primates and three mammalian outgroup species. Since lemurs and other strepsirrhine primates represent the sister lineage to all other primates, this analysis focuses on lemurs to reconstruct the ancestral primate sequences and to gain insight into the evolution of this region and the genes within it. This comparative evolutionary genomics approach reveals significant expansion in genomic size across the XIC region in higher primates, with minimal size alterations across the XIST locus itself. Reconstructed primate ancestral XIC sequences show that the most dramatic changes during the past 80 million years occurred between the ancestral primate and the lineage leading to Old World monkeys. In contrast, the XIST locus compared between human and the primate ancestor does not indicate any dramatic changes to exons or XIST-specific repeats; rather, evolution of this locus reflects small incremental changes in overall sequence identity and short repeat insertions. While this comparative analysis reinforces that the region around XIST has been subject to significant genomic change, even among primates, our data suggest that evolution of the XIST sequences themselves represents only small lineage-specific changes across the past 80 million years. PMID:21518738

  6. Large Genomic Fragment Deletions and Insertions in Mouse Using CRISPR/Cas9

    PubMed Central

    Satheka, Achim Cchitvsanzwhoh; Togo, Jacques; An, Yao; Humphrey, Mabwi; Ban, Luying; Ji, Yan; Jin, Honghong; Feng, Xuechao; Zheng, Yaowu

    2015-01-01

    ZFNs, TALENs, and the CRISPR/Cas9 system have been used to generate point mutations and large fragment deletions and insertions in genomic modifications. The CRISPR/Cas9 system is the most flexible and fastest-developing technology and has been used extensively to make mutations in all kinds of organisms. However, most mutations reported to date are small insertions and deletions. In this report, the CRISPR/Cas9 system was used to make large DNA fragment deletions and insertions in mouse, including deletion of the entire Dip2a gene, about 65 kb in size, and insertion of a β-galactosidase (lacZ) reporter gene larger than 5 kb. About 11.8% (11/93) of transfected and diluted ES clones were positive for the 65 kb deletion. High targeting efficiencies in ES cells were also achieved with G418 selection: 46.2% (12/26) and 73.1% (19/26) for the left and right arms, respectively. Targeted large fragment deletion efficiency is about 21.4% of live pups or 6.0% of injected embryos. Targeted insertion of the lacZ reporter with a NEO cassette showed a targeting rate of 27.1% (13/48) by ES cell transfection and 11.1% (2/18) by direct zygote injection. The procedures bypass in vitro transcription by directly co-injecting zygotes or co-transfecting embryonic stem cells with circular plasmid DNA. The methods are technically easy, time saving, and cost effective in generating mouse models and will certainly facilitate gene function studies. PMID:25803037
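    Several efficiencies in this abstract are given both as a percentage and as a raw count, so they can be cross-checked directly. The sketch below recomputes each percentage from its count and compares it with the quoted value; the labels are shorthand chosen for this example, not the authors' terminology.

```python
# (successes, total, percentage quoted in the abstract)
reported = {
    "65 kb deletion, diluted ES clones": (11, 93, 11.8),
    "left arm, G418 selection": (12, 26, 46.2),
    "right arm, G418 selection": (19, 26, 73.1),
    "lacZ insertion, ES cell transfection": (13, 48, 27.1),
    "lacZ insertion, zygote injection": (2, 18, 11.1),
}


def percent(successes: int, total: int) -> float:
    """Return successes/total as a percentage rounded to one decimal place."""
    return round(100.0 * successes / total, 1)


recomputed = {label: percent(s, t) for label, (s, t, _) in reported.items()}

for label, (s, t, quoted) in reported.items():
    print(f"{label}: {s}/{t} = {recomputed[label]}% (abstract: {quoted}%)")
```

    The 21.4% of live pups and 6.0% of injected embryos are quoted without counts, so they are left out of the check.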

  7. Using large-scale genome variation cohorts to decipher the molecular mechanism of cancer.

    PubMed

    Habermann, Nina; Mardin, Balca R; Yakneen, Sergei; Korbel, Jan O

    2016-01-01

    Characterizing genomic structural variations (SVs) in the human genome remains challenging, and there is a growing interest to understand somatic SVs occurring in cancer, a disease of the genome. A havoc-causing SV process known as chromothripsis scars the genome when localized chromosome shattering and repair occur in a one-off catastrophe. Recent efforts led to the development of a set of conceptual criteria for the inference of chromothripsis events in cancer genomes and to the development of experimental model systems for studying this striking DNA alteration process in vitro. We discuss these approaches, and additionally touch upon current "Big Data" efforts that employ hybrid cloud computing to enable studies of numerous cancer genomes in an effort to search for commonalities and differences in molecular DNA alteration processes in cancer. PMID:27342254

  8. Physical mapping of a large plant genome using global high-information content fingerprinting: a distal region of wheat chromosome 3DS

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Physical maps employing libraries of bacterial artificial chromosome (BAC) clones are essential for comparative genomics and sequencing of large and repetitive genomes such as those of wheat. We report the use of the Ae. tauschii, the diploid ancestor of the wheat D genome, for the construction of t...

  9. Physical mapping of a large plant genome using global high-information-content-fingerprinting: the distal region of the wheat ancestor Aegilops tauschii chromosome 3DS.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Physical maps employing libraries of bacterial artificial chromosome (BAC) clones are essential for comparative genomics and sequencing of large and repetitive genomes such as those of the hexaploid bread wheat. The diploid ancestor of wheat genome, Aegilops tauschii, is used as a resource for wheat...

  10. Evolution of Prdm Genes in Animals: Insights from Comparative Genomics.

    PubMed

    Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre

    2016-03-01

    Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes. PMID:26560352

  11. Evolution of Prdm Genes in Animals: Insights from Comparative Genomics

    PubMed Central

    Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre

    2016-01-01

    Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes. PMID:26560352

  12. Draft Genome Sequence of Rheinheimera sp. F8, a Biofilm-Forming Strain Which Produces Large Amounts of Extracellular DNA

    PubMed Central

    Szewzyk, Ulrich

    2016-01-01

    Rheinheimera sp. strain F8 is a biofilm-forming gammaproteobacterium that has been found to produce large amounts of filamentous extracellular DNA. Here, we announce the de novo assembly of its genome. It is estimated to be 4,464,511 bp in length, with 3,970 protein-coding sequences and 92 RNA-coding sequences. PMID:26966195

  13. Exploring the feasibility of using copy number variants as genetic markers through large-scale whole genome sequencing experiments

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Copy number variants (CNV) are large scale duplications or deletions of genomic sequence that are caused by a diverse set of molecular phenomena that are distinct from single nucleotide polymorphism (SNP) formation. Due to their different mechanisms of formation, CNVs are often difficult to track us...

  14. An Integrative Approach for the Large-scale Identification of Human Genome Kinases Regulating Cancer Metastasis

    PubMed Central

    Zhang, Hanshuo; Wu, Pu-Yen; Ma, Ming; Ye, Yanzheng; Hao, Yang; Yang, Junyu; Yin, Shenyi; Sun, Changhong; Phan, John H.; Wang, May D.; Xi, Jianzhong Jeff

    2016-01-01

    Kinases regulate the majority of biological processes and have become one of the important groups of drug targets. To identify more kinases with potential for cancer therapy, we developed an integrative approach for the large-scale screening of functional genes capable of regulating the main traits of cancer metastasis, including cell migration and invasion. We first employed a self-assembled cell microarray (SAMcell) to screen for functional genes that regulate cancer cell migration using a siRNA library targeting 710 human genome kinase genes. We identified 81 genes capable of significantly regulating cancer cell migration. Following invasion assays and bioinformatics analysis, we discovered that 16 genes with differential expression in cancer samples can regulate both cell migration and invasion, among which 10 genes are well known to play critical roles in cancer development. The remaining 6 genes were experimentally validated to have the capacity to regulate metastasis-related traits, including cell proliferation, apoptosis and anoikis activities, in addition to cell motility. Together, these findings provide new insight into the therapeutic use of human kinases. PMID:23751374

  15. Complete mitochondrial DNA sequence of the ark shell Scapharca broughtonii: an ultra-large metazoan mitochondrial genome.

    PubMed

    Liu, Yun-Guo; Kurokawa, Tadahide; Sekino, Masashi; Tanabe, Toru; Watanabe, Kazuhito

    2013-03-01

    The complete mitochondrial (mt) genome of the ark shell Scapharca broughtonii was determined using long PCR and a genome walking sequencing strategy with genus-specific primers. The S. broughtonii mt genome (GenBank accession number AB729113) contained 12 protein-coding genes (the atp8 gene is missing, as in most bivalves), 2 ribosomal RNA genes, and 42 transfer RNA (tRNA) genes, with a length of 46,985 nucleotides for mtDNA carrying only one copy of the heteroplasmic tandem repeat (HTR) unit. Moreover, the S. broughtonii mt genome shows size variation; these genomes ranged in size from about 47 kb to about 50 kb because of variation in the number of repeat sequences in the non-coding region. The mt genome of S. broughtonii is, to date, the longest reported metazoan mtDNA sequence. Sequence duplication in the non-coding region and the formation of HTR arrays were two of the factors responsible for the ultra-large size of this mt genome. All the tRNA genes were found within the S. broughtonii mt genome, unlike other bivalves, which usually lack one or more tRNA genes. Twelve additional specimens were used to analyze the patterns of tandem repeat arrays by PCR amplification and agarose electrophoresis. Each of the 12 specimens displayed extensive heteroplasmy and had 8-10 length variants. The motifs of the HTR arrays are about 353-362 bp and the number of repeats ranges from 1 to 11. PMID:23291309
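    The size range quoted in this abstract (about 47 kb to about 50 kb) can be related to the repeat figures given at the end of it. The sketch below assumes, purely for illustration, that each additional HTR copy adds exactly one repeat unit to the 46,985 nt single-copy genome and that nothing else varies; the function name and constants are hypothetical.

```python
# Figures quoted in the abstract.
BASE_LENGTH_BP = 46_985      # mtDNA length with a single HTR copy
UNIT_BP_RANGE = (353, 362)   # approximate HTR motif length range
COPY_RANGE = (1, 11)         # observed number of repeats


def genome_length(copies: int, unit_bp: int) -> int:
    """Length of the mt genome with `copies` HTR units of `unit_bp` each.

    Assumes every copy beyond the first adds exactly one unit.
    """
    return BASE_LENGTH_BP + (copies - 1) * unit_bp


smallest = genome_length(COPY_RANGE[0], UNIT_BP_RANGE[0])
largest = genome_length(COPY_RANGE[1], UNIT_BP_RANGE[1])

print(f"Smallest expected genome: {smallest:,} bp")
print(f"Largest expected genome:  {largest:,} bp")
```

    Under these assumptions the largest variant comes out slightly above 50 kb, which is in line with the approximate range given in the abstract.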

  16. The Korarchaeota: Archaeal orphans representing an ancestral lineage of life

    SciTech Connect

    Elkins, James G.; Kunin, Victor; Anderson, Iain; Barry, Kerrie; Goltsman, Eugene; Lapidus, Alla; Hedlund, Brian; Hugenholtz, Phil; Kyrpides, Nikos; Graham, David; Keller, Martin; Wanner, Gerhard; Richardson, Paul; Stetter, Karl O.

    2007-05-01

    Based on conserved cellular properties, all life on Earth can be grouped into different phyla which belong to the primary domains Bacteria, Archaea, and Eukarya. However, tracing back their evolutionary relationships has been impeded by horizontal gene transfer and gene loss. Within the Archaea, the kingdoms Crenarchaeota and Euryarchaeota exhibit a profound divergence. In order to elucidate the evolution of these two major kingdoms, representatives of more deeply diverged lineages would be required. Based on their environmental small subunit ribosomal (ss RNA) sequences, the Korarchaeota had been originally suggested to have an ancestral relationship to all known Archaea although this assessment has been refuted. Here we describe the cultivation and initial characterization of the first member of the Korarchaeota, highly unusual, ultrathin filamentous cells about 0.16 µm in diameter. A complete genome sequence obtained from enrichment cultures revealed an unprecedented combination of signature genes which were thought to be characteristic of either the Crenarchaeota, Euryarchaeota, or Eukarya. Cell division appears to be mediated through a FtsZ-dependent mechanism which is highly conserved throughout the Bacteria and Euryarchaeota. An rpb8 subunit of the DNA-dependent RNA polymerase was identified which is absent from other Archaea and has been described as a eukaryotic signature gene. In addition, the representative organism possesses a ribosome structure typical for members of the Crenarchaeota. Based on its gene complement, this lineage likely diverged near the separation of the two major kingdoms of Archaea. Further investigations of these unique organisms may shed additional light onto the evolution of extant life.

  17. Evidence for an Ancestral Association of Human Coronavirus 229E with Bats

    PubMed Central

    Corman, Victor Max; Baldwin, Heather J.; Tateno, Adriana Fumie; Zerbinati, Rodrigo Melim; Annan, Augustina; Owusu, Michael; Nkrumah, Evans Ewald; Maganga, Gael Darren; Oppong, Samuel; Adu-Sarkodie, Yaw; Vallo, Peter; da Silva Filho, Luiz Vicente Ribeiro Ferreira; Leroy, Eric M.; Thiel, Volker; van der Hoek, Lia; Poon, Leo L. M.; Tschapka, Marco

    2015-01-01

    ABSTRACT We previously showed that close relatives of human coronavirus 229E (HCoV-229E) exist in African bats. The small sample and limited genomic characterizations have prevented further analyses so far. Here, we tested 2,087 fecal specimens from 11 bat species sampled in Ghana for HCoV-229E-related viruses by reverse transcription-PCR (RT-PCR). Only hipposiderid bats tested positive. To compare the genetic diversity of bat viruses and HCoV-229E, we tested historical isolates and diagnostic specimens sampled globally over 10 years. Bat viruses were 5- and 6-fold more diversified than HCoV-229E in the RNA-dependent RNA polymerase (RdRp) and spike genes. In phylogenetic analyses, HCoV-229E strains were monophyletic and not intermixed with animal viruses. Bat viruses formed three large clades in close and more distant sister relationships. A recently described 229E-related alpaca virus occupied an intermediate phylogenetic position between bat and human viruses. According to taxonomic criteria, human, alpaca, and bat viruses form a single CoV species showing evidence for multiple recombination events. HCoV-229E and the alpaca virus showed a major deletion in the spike S1 region compared to all bat viruses. Analyses of four full genomes from 229E-related bat CoVs revealed an eighth open reading frame (ORF8) located at the genomic 3′ end. ORF8 also existed in the 229E-related alpaca virus. Reanalysis of HCoV-229E sequences showed a conserved transcription regulatory sequence preceding remnants of this ORF, suggesting its loss after acquisition of a 229E-related CoV by humans. These data suggested an evolutionary origin of 229E-related CoVs in hipposiderid bats, hypothetically with camelids as intermediate hosts preceding the establishment of HCoV-229E. IMPORTANCE The ancestral origins of major human coronaviruses (HCoVs) likely involve bat hosts. Here, we provide conclusive genetic evidence for an evolutionary origin of the common cold virus HCoV-229E in

  18. Characterisation of monotreme caseins reveals lineage-specific expansion of an ancestral casein locus in mammals.

    PubMed

    Lefèvre, Christophe M; Sharp, Julie A; Nicholas, Kevin R

    2009-01-01

    Using a milk-cell cDNA sequencing approach we characterised milk-protein sequences from two monotreme species, platypus (Ornithorhynchus anatinus) and echidna (Tachyglossus aculeatus) and found a full set of caseins and casein variants. The genomic organisation of the platypus casein locus is compared with other mammalian genomes, including the marsupial opossum and several eutherians. Physical linkage of casein genes has been seen in the casein loci of all mammalian genomes examined and we confirm that this is also observed in platypus. However, we show that a recent duplication of beta-casein occurred in the monotreme lineage, as opposed to more ancient duplications of alpha-casein in the eutherian lineage, while marsupials possess only single copies of alpha- and beta-caseins. Despite this variability, the close proximity of the main alpha- and beta-casein genes in an inverted tail-tail orientation and the relative orientation of the more distant kappa-casein genes are similar in all mammalian genome sequences so far available. Overall, the conservation of the genomic organisation of the caseins indicates the early, pre-monotreme development of the fundamental role of caseins during lactation. In contrast, the lineage-specific gene duplications that have occurred within the casein locus of monotremes and eutherians but not marsupials, which may have lost part of the ancestral casein locus, emphasises the independent selection on milk provision strategies to the young, most likely linked to different developmental strategies. The monotremes therefore provide insight into the ancestral drivers for lactation and how these have adapted in different lineages. PMID:19874726

  19. Comparative genomics and evolution of eukaryotic phospholipid biosynthesis

    SciTech Connect

    Lykidis, Athanasios

    2006-12-01

    Phospholipid biosynthetic enzymes produce diverse molecular structures and are often present in multiple forms encoded by different genes. This work utilizes comparative genomics and phylogenetics for exploring the distribution, structure and evolution of phospholipid biosynthetic genes and pathways in 26 eukaryotic genomes. Although the basic structure of the pathways was formed early in eukaryotic evolution, the emerging picture indicates that individual enzyme families followed unique evolutionary courses. For example, choline and ethanolamine kinases and cytidylyltransferases emerged in ancestral eukaryotes, whereas, multiple forms of the corresponding phosphatidyltransferases evolved mainly in a lineage specific manner. Furthermore, several unicellular eukaryotes maintain bacterial-type enzymes and reactions for the synthesis of phosphatidylglycerol and cardiolipin. Also, base-exchange phosphatidylserine synthases are widespread and ancestral enzymes. The multiplicity of phospholipid biosynthetic enzymes has been largely generated by gene expansion in a lineage specific manner. Thus, these observations suggest that phospholipid biosynthesis has been an actively evolving system. Finally, comparative genomic analysis indicates the existence of novel phosphatidyltransferases and provides a candidate for the uncharacterized eukaryotic phosphatidylglycerol phosphate phosphatase.

  20. Selection for Unequal Densities of Sigma70 Promoter-like Signals in Different Regions of Large Bacterial Genomes

    SciTech Connect

    Huerta, Araceli M.; Francino, M. Pilar; Morett, Enrique; Collado-Vides, Julio

    2006-03-01

    distribution of promoter-like signals between regulatory and nonregulatory regions detected in large bacterial genomes confers a significant, although small, fitness advantage. This study paves the way for further identification of the specific types of selective constraints that affect the organization of regulatory regions and the overall distribution of promoter-like signals through more detailed comparative analyses among closely-related bacterial genomes.

  1. Obligate Insect Endosymbionts Exhibit Increased Ortholog Length Variation and Loss of Large Accessory Proteins Concurrent with Genome Shrinkage

    PubMed Central

    Kenyon, Laura J.; Sabree, Zakee L.

    2014-01-01

    Extreme genome reduction has been observed in obligate intracellular insect mutualists and is an assumed consequence of fixed, long-term host isolation. Rapid accumulation of mutations and pseudogenization of genes no longer vital for an intracellular lifestyle, followed by deletion of many genes, are factors that lead to genome reduction. Size reductions in individual genes due to small-scale deletions have also been implicated in contributing to overall genome shrinkage. Conserved protein functional domains are expected to exhibit low tolerance for mutations and therefore remain relatively unchanged throughout protein length reduction while nondomain regions, presumably under less selective pressures, would shorten. This hypothesis was tested using orthologous protein sets from the Flavobacteriaceae (phylum: Bacteroidetes) and Enterobacteriaceae (subphylum: Gammaproteobacteria) families, each of which includes some of the smallest known genomes. Upon examination of protein, functional domain, and nondomain region lengths, we found that proteins were not uniformly shrinking with genome reduction, but instead increased length variability and variability was observed in both the functional domain and nondomain regions. Additionally, as complete gene loss also contributes to overall genome shrinkage, we found that the largest proteins in the proteomes of nonhost-restricted bacteroidetial and gammaproteobacterial species often were inferred to be involved in secondary metabolic processes, extracellular sensing, or of unknown function. These proteins were absent in the proteomes of obligate insect endosymbionts. Therefore, loss of genes encoding large proteins not required for host-restricted lifestyles in obligate endosymbiont proteomes likely contributes to extreme genome reduction to a greater degree than gene shrinkage. PMID:24671745

  2. Neanderthal and Denisova genetic affinities with contemporary humans: introgression versus common ancestral polymorphisms.

    PubMed

    Lowery, Robert K; Uribe, Gabriel; Jimenez, Eric B; Weiss, Mark A; Herrera, Kristian J; Regueiro, Maria; Herrera, Rene J

    2013-11-01

    Analyses of the genetic relationships among modern humans, Neanderthals and Denisovans have suggested that 1-4% of the non-Sub-Saharan African gene pool may be Neanderthal derived, while 6-8% of the Melanesian gene pool may be the product of admixture between the Denisovans and the direct ancestors of Melanesians. In the present study, we analyzed single nucleotide polymorphism (SNP) diversity among a worldwide collection of contemporary human populations with respect to the genetic constitution of these two archaic hominins and Pan troglodytes (chimpanzee). We partitioned SNPs into subsets, including those that are derived in both archaic lineages, those that are ancestral in both archaic lineages and those that are only derived in one archaic lineage. By doing this, we have conducted separate examinations of subsets of mutations with higher probabilities of divergent phylogenetic origins. While previous investigations have excluded SNPs from common ancestors in principal component analyses, we included common ancestral SNPs in our analyses to visualize the relative placement of the Neanderthal and Denisova among human populations. To assess the genetic similarities among the various hominin lineages, we performed genetic structure analyses to provide a comparison of genetic patterns found within contemporary human genomes that may have archaic or common ancestral roots. Our results indicate that 3.6% of the Neanderthal genome is shared with roughly 65.4% of the average European gene pool, which clinally diminishes with distance from Europe. Our results suggest that Neanderthal genetic associations with contemporary non-Sub-Saharan African populations, as well as the genetic affinities observed between Denisovans and Melanesians most likely result from the retention of ancient mutations in these populations. PMID:23872234
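    The SNP partitioning described in this abstract can be illustrated with a small classifier. The sketch below is not the authors' pipeline: it assumes that the chimpanzee allele stands in for the ancestral state and that each archaic genome contributes a single allele per site. The SNP identifiers and alleles are made-up toy data, and the function and subset names are hypothetical.

```python
from collections import defaultdict


def classify(chimp: str, neanderthal: str, denisova: str) -> str:
    """Assign a SNP to a subset by the allele state in the two archaic genomes.

    Assumption: the chimpanzee allele is treated as the ancestral state.
    """
    neanderthal_derived = neanderthal != chimp
    denisova_derived = denisova != chimp
    if neanderthal_derived and denisova_derived:
        return "derived_in_both"
    if not neanderthal_derived and not denisova_derived:
        return "ancestral_in_both"
    return "derived_in_one"


# Toy data: SNP id -> (chimp, Neanderthal, Denisova) alleles. Invented values.
snps = {
    "snp1": ("A", "G", "G"),
    "snp2": ("C", "C", "C"),
    "snp3": ("T", "T", "C"),
    "snp4": ("G", "A", "G"),
}

subsets = defaultdict(list)
for snp_id, alleles in snps.items():
    subsets[classify(*alleles)].append(snp_id)

for name, members in subsets.items():
    print(f"{name}: {members}")
```

    Each subset could then be analyzed separately, as the abstract describes for mutations with different likely phylogenetic origins.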

  3. The mammary gland-specific marsupial ELP and eutherian CTI share a common ancestral gene

    PubMed Central

    2012-01-01

    Background The marsupial early lactation protein (ELP) gene is expressed in the mammary gland and the protein is secreted into milk during early lactation (Phase 2A). Mature ELP shares approximately 55.4% similarity with the colostrum-specific bovine colostrum trypsin inhibitor (CTI) protein. Although ELP and CTI both have a single bovine pancreatic trypsin inhibitor (BPTI)-Kunitz domain and are secreted only during the early lactation phases, their evolutionary history is yet to be investigated. Results Tammar ELP was isolated from a genomic library and the fat-tailed dunnart and Southern koala ELP genes cloned from genomic DNA. The tammar ELP gene was expressed only in the mammary gland during late pregnancy (Phase 1) and early lactation (Phase 2A). The opossum and fat-tailed dunnart ELP and cow CTI transcripts were cloned from RNA isolated from the mammary gland and dog CTI from cells in colostrum. The putative mature ELP and CTI peptides shared 44.6%-62.2% similarity. In silico analyses identified the ELP and CTI genes in the other species examined and provided compelling evidence that they evolved from a common ancestral gene. In addition, whilst the eutherian CTI gene was conserved in the Laurasiatherian orders Carnivora and Cetartiodactyla, it had become a pseudogene in others. These data suggest that bovine CTI may be the ancestral gene of the Artiodactyla-specific, rapidly evolving chromosome 13 pancreatic trypsin inhibitor (PTI), spleen trypsin inhibitor (STI) and the five placenta-specific trophoblast Kunitz domain protein (TKDP1-5) genes. Conclusions Marsupial ELP and eutherian CTI evolved from an ancestral therian mammal gene before the divergence of marsupials and eutherians between 130 and 160 million years ago. The retention of the ELP gene in marsupials suggests that this early lactation-specific milk protein may have an important role in the immunologically naïve young of these species. PMID:22681678

  4. Efficient generation of large-scale genome-modified mice using gRNA and CAS9 endonuclease.

    PubMed

    Fujii, Wataru; Kawasaki, Kurenai; Sugiura, Koji; Naito, Kunihiko

    2013-11-01

    The generation of genome-modified animals is a powerful approach to analyze gene functions. The CAS9/guide RNA (gRNA) system is expected to become widely used for the efficient generation of genome-modified animals, but detailed studies on optimum conditions and availability are limited. In the present study, we attempted to generate large-scale genome-modified mice with an optimized CAS9/gRNA system, and confirmed the transmission of these mutations to the next generations. A comparison of different types of gRNA indicated that the target loci of almost all pups were modified successfully by the use of long-type gRNAs with CAS9. We showed that this system has much higher mutation efficiency and much lower off-target effect compared to zinc-finger nuclease. We propose that most of these off-target effects can be avoided by the careful control of CAS9 mRNA concentration and that the genome-modification efficiency depends rather on the gRNA concentration. Under optimized conditions, large-scale (~10 kb) genome-modified mice can be efficiently generated by modifying two loci on a single chromosome using two gRNAs at once in mouse zygotes. In addition, the normal transmission of these CAS9/gRNA-induced mutations to the next generation was confirmed. These results indicate that CAS9/gRNA system can become a highly effective tool for the generation of genome-modified animals. PMID:23997119

  5. Leveraging Large-Scale Cancer Genomics Datasets for Germline Discovery - TCGA

    Cancer.gov

    The session will review how data types have changed over time, focusing on how next-generation sequencing is being employed to yield more precise information about the underlying genomic variation that influences tumor etiology and biology.

  6. A large maize (Zea Mays L.) SNP genotyping array: development and germplasm genotyping, and genetic mapping to compare with the B73 reference genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    SNP genotyping arrays have been useful for many applications that require a large number of molecular markers such as high-density genetic mapping, genome-wide association studies (GWAS), and genomic selection for accelerated breeding. We report the establishment of a large SNP array for maize and i...

  7. Minimal genome: Worthwhile or worthless efforts toward being smaller?

    PubMed

    Choe, Donghui; Cho, Suhyung; Kim, Sun Chang; Cho, Byung-Kwan

    2016-02-01

    Microbial cells are versatile hosts for the production of value-added products due to the well-established background knowledge, various genetic tools, and ease of manipulation. Despite those advantages, efficiency of newly incorporated synthetic pathways in microbial cells is frequently limited by innate metabolism, product toxicity, and growth-mediated genetic instability. To overcome those obstacles, a minimal genome harboring only the essential set of genes was proposed, which is a fascinating concept with potential for use as a platform strain. Here, we review the currently available artificial reduced genomes and discuss the prospects for extending use of the genome-reduced strains as programmable chassis. The genome-reduced strains generally showed comparable growth to and higher productivity than their ancestral strains. In Escherichia coli, about 300 genes are estimated as the minimal number of genes under laboratory conditions. However, recent advances revealed that there are non-essential components in essential genes, suggesting that the design principle of minimal genomes should be reconstructed. Current technology is not efficient enough to reduce large amounts of interspaced genomic regions or to synthesize the genome. Furthermore, construction of minimal genomes has frequently failed due to lack of genomic information. Technological breakthroughs and intense systematic studies on genomes remain tasks. PMID:26356135

  8. Analysis of the bread wheat genome using whole-genome shotgun sequencing.

    PubMed

    Brenchley, Rachel; Spannagl, Manuel; Pfeifer, Matthias; Barker, Gary L A; D'Amore, Rosalinda; Allen, Alexandra M; McKenzie, Neil; Kramer, Melissa; Kerhornou, Arnaud; Bolser, Dan; Kay, Suzanne; Waite, Darren; Trick, Martin; Bancroft, Ian; Gu, Yong; Huo, Naxin; Luo, Ming-Cheng; Sehgal, Sunish; Gill, Bikram; Kianian, Sharyar; Anderson, Olin; Kersey, Paul; Dvorak, Jan; McCombie, W Richard; Hall, Anthony; Mayer, Klaus F X; Edwards, Keith J; Bevan, Michael W; Hall, Neil

    2012-11-29

    Bread wheat (Triticum aestivum) is a globally important crop, accounting for 20 per cent of the calories consumed by humans. Major efforts are underway worldwide to increase wheat production by extending genetic diversity and analysing key traits, and genomic resources can accelerate progress. But so far the very large size and polyploid complexity of the bread wheat genome have been substantial barriers to genome analysis. Here we report the sequencing of its large, 17-gigabase-pair, hexaploid genome using 454 pyrosequencing, and comparison of this with the sequences of diploid ancestral and progenitor genomes. We identified between 94,000 and 96,000 genes, and assigned two-thirds to the three component genomes (A, B and D) of hexaploid wheat. High-resolution synteny maps identified many small disruptions to conserved gene order. We show that the hexaploid genome is highly dynamic, with significant loss of gene family members on polyploidization and domestication, and an abundance of gene fragments. Several classes of genes involved in energy harvesting, metabolism and growth are among expanded gene families that could be associated with crop productivity. Our analyses, coupled with the identification of extensive genetic variation, provide a resource for accelerating gene discovery and improving this major crop. PMID:23192148

  9. Ancestral-derived effects on the mutational landscape of laryngeal cancer.

    PubMed

    Ramakodi, Meganathan P; Kulathinal, Rob J; Chung, Yujin; Serebriiskii, Ilya; Liu, Jeffrey C; Ragin, Camille C

    2016-03-01

    Laryngeal cancer disproportionately affects more African-Americans than European-Americans. Here, we analyze the genome-wide somatic point mutations from the tumors of 13 African-Americans and 57 European-Americans from TCGA to differentiate between environmental and ancestrally-inherited factors. The mean number of mutations was different between African-Americans (151.31) and European-Americans (277.63). Other differences in the overall mutational landscape between African-Americans and European-Americans were also found. The frequencies of C>A and C>G mutations were significantly different between the two populations (p-value<0.05). Context nucleotide signatures for some mutation types significantly differ between these two populations. Thus, the context nucleotide signatures along with other factors could be related to the observed mutational landscape differences between the two races. Finally, we show that mutated genes associated with these mutational differences differ between the two populations. Thus, at the molecular level, race appears to be a factor in the progression of laryngeal cancer with ancestral genomic signatures best explaining these differences. PMID:26721311

  10. Genome Sequence of the Pathogenic Intestinal Spirochete Brachyspira hyodysenteriae Reveals Adaptations to Its Lifestyle in the Porcine Large Intestine

    PubMed Central

    La, Tom; Ryan, Karon; Moolhuijzen, Paula; Albertyn, Zayed; Shaban, Babak; Motro, Yair; Dunn, David S.; Schibeci, David; Hunter, Adam; Barrero, Roberto; Phillips, Nyree D.; Hampson, David J.

    2009-01-01

    Brachyspira hyodysenteriae is an anaerobic intestinal spirochete that colonizes the large intestine of pigs and causes swine dysentery, a disease of significant economic importance. The genome sequence of B. hyodysenteriae strain WA1 was determined, making it the first representative of the genus Brachyspira to be sequenced, and the seventeenth spirochete genome to be reported. The genome consisted of a circular 3,000,694 base pair (bp) chromosome, and a 35,940 bp circular plasmid that has not previously been described. The spirochete had 2,122 protein-coding sequences. Of the predicted proteins, more had similarities to proteins of the enteric Escherichia coli and Clostridium species than they did to proteins of other spirochetes. Many of these genes were associated with transport and metabolism, and they may have been gradually acquired through horizontal gene transfer in the environment of the large intestine. A reconstruction of central metabolic pathways identified a complete set of coding sequences for glycolysis, gluconeogenesis, a non-oxidative pentose phosphate pathway, nucleotide metabolism, lipooligosaccharide biosynthesis, and a respiratory electron transport chain. A notable finding was the presence on the plasmid of the genes involved in rhamnose biosynthesis. Potential virulence genes included those for 15 proteases and six hemolysins. Other adaptations to an enteric lifestyle included the presence of large numbers of genes associated with chemotaxis and motility. B. hyodysenteriae has diverged from other spirochetes in the process of accommodating to its habitat in the porcine large intestine. PMID:19262690

  11. Ancestral Rocky Mountain Tectonics: A Sedimentary Record of Ancestral Front Range and Uncompahgre Exhumation

    NASA Astrophysics Data System (ADS)

    Smith, T. M.; Saylor, J. E.; Lapen, T. J.

    2015-12-01

    The Ancestral Rocky Mountains (ARM) encompass multiple crustal provinces with characteristic crystallization ages across the central and western US. Two driving mechanisms have been proposed to explain ARM deformation. (1) Ouachita-Marathon collision SE of the ARM uplifts has been linked to an E-to-W sequence of uplift and is consistent with proposed disruption of a larger Paradox-Central Colorado Trough Basin by exhumation of the Uncompahgre Uplift. Initial exhumation of the Amarillo-Wichita Uplift to the east would provide a unique ~530 Ma signal absent from source areas to the SW, and result in initial exhumation of the Ancestral Front Range. (2) Alternatively, deformation due to flat slab subduction along a hypothesized plate boundary to the SW suggests a SW-to-NE younging of exhumation. This hypothesis suggests a SW-derived Grenville signature, and would trigger uplift of the Uncompahgre first. We analyzed depositional environments, sediment dispersal patterns, and sediment and basement zircon U-Pb and (U-Th)/He ages in 3 locations in the Paradox Basin and Central Colorado Trough (CCT). The Paradox Basin exhibits an up-section transition in fluvial style that suggests a decrease in overbank stability and increased lateral migration. Similarly, the CCT records a long-term progradation of depositional environments from marginal marine to fluvial, indicating that sediment supply in both basins outpaced accommodation. Preliminary provenance results indicate little to no input from the Amarillo-Wichita uplift in either basin despite uniformly westward sediment dispersal systems in both basins. Results also show that the Uncompahgre Uplift was the source for sediment throughout Paradox Basin deposition. These observations are inconsistent with the predictions of scenario 1 above. Rather, they suggest either a synchronous response to tectonic stress across the ARM provinces or an SW-to-NE pattern of deformation.

  12. Retroviral envelope syncytin capture in an ancestrally diverged mammalian clade for placentation in the primitive Afrotherian tenrecs

    PubMed Central

    Cornelis, Guillaume; Vernochet, Cécile; Malicorne, Sébastien; Souquere, Sylvie; Tzika, Athanasia C.; Goodman, Steven M.; Catzeflis, François; Robinson, Terence J.; Milinkovitch, Michel C.; Pierron, Gérard; Heidmann, Odile; Dupressoir, Anne; Heidmann, Thierry

    2014-01-01

    Syncytins are fusogenic envelope (env) genes of retroviral origin that have been captured for a function in placentation. Syncytins have been identified in Euarchontoglires (primates, rodents, Leporidae) and Laurasiatheria (Carnivora, ruminants) placental mammals. Here, we searched for similar genes in species that retained characteristic features of primitive mammals, namely the Malagasy and mainland African Tenrecidae. They belong to the superorder Afrotheria, an early lineage that diverged from Euarchontoglires and Laurasiatheria 100 Mya, during the Cretaceous terrestrial revolution. An in silico search for env genes with full coding capacity within a Tenrecidae genome identified several candidates, with one displaying placenta-specific expression as revealed by RT-PCR analysis of a large panel of Setifer setosus tissues. Cloning of this endogenous retroviral env gene demonstrated fusogenicity in an ex vivo cell–cell fusion assay on a panel of mammalian cells. Refined analysis of placental architecture and ultrastructure combined with in situ hybridization demonstrated specific expression of the gene in multinucleate cellular masses and layers at the materno–fetal interface, consistent with a role in syncytium formation. This gene, which we named “syncytin-Ten1,” is conserved among Tenrecidae, with evidence of purifying selection and conservation of fusogenic activity. To our knowledge, it is the first syncytin identified to date within the ancestrally diverged Afrotheria superorder. PMID:25267646

  13. Chromatin organization and cytological features of carnivorous Genlisea species with large genome size differences

    PubMed Central

    Tran, Trung D.; Cao, Hieu X.; Jovtchev, Gabriele; Novák, Petr; Vu, Giang T. H.; Macas, Jiří; Schubert, Ingo; Fuchs, Joerg

    2015-01-01

    The monophyletic carnivorous genus Genlisea (Lentibulariaceae) is characterized by a bi-directional genome size evolution resulting in a 25-fold difference in nuclear DNA content. This is one of the largest ranges found within a genus so far and makes Genlisea an interesting subject to study mechanisms of genome and karyotype evolution. Genlisea nigrocaulis, with 86 Mbp one of the smallest plant genomes, and the 18-fold larger genome of G. hispidula (1,550 Mbp) possess identical chromosome numbers (2n = 40) but differ considerably in chromatin organization, nuclear and cell size. Interphase nuclei of G. nigrocaulis and of related species with small genomes, G. aurea (133 Mbp, 2n ≈ 104) and G. pygmaea (179 Mbp, 2n = 80), are hallmarked by intensely DAPI-stained chromocenters, carrying typical heterochromatin-associated methylation marks (5-methylcytosine, H3K9me2), while in G. hispidula and surprisingly also in the small genome of G. margaretae (184 Mbp, 2n = 38) the heterochromatin marks are more evenly distributed. Probes of tandem repetitive sequences together with rDNA allow the unequivocal discrimination of 13 out of 20 chromosome pairs of G. hispidula. One of the repetitive sequences labeled half of the chromosome set almost homogenously supporting an allopolyploid status of G. hispidula and its close relative G. subglabra (1,622 Mbp, 2n = 40). In G. nigrocaulis 11 chromosome pairs could be individualized using a combination of rDNA and unique genomic probes. The presented data provide a basis for future studies of karyotype evolution within the genus Genlisea. PMID:26347752

  14. Chromatin organization and cytological features of carnivorous Genlisea species with large genome size differences.

    PubMed

    Tran, Trung D; Cao, Hieu X; Jovtchev, Gabriele; Novák, Petr; Vu, Giang T H; Macas, Jiří; Schubert, Ingo; Fuchs, Joerg

    2015-01-01

    The monophyletic carnivorous genus Genlisea (Lentibulariaceae) is characterized by a bi-directional genome size evolution resulting in a 25-fold difference in nuclear DNA content. This is one of the largest ranges found within a genus so far and makes Genlisea an interesting subject to study mechanisms of genome and karyotype evolution. Genlisea nigrocaulis, with 86 Mbp one of the smallest plant genomes, and the 18-fold larger genome of G. hispidula (1,550 Mbp) possess identical chromosome numbers (2n = 40) but differ considerably in chromatin organization, nuclear and cell size. Interphase nuclei of G. nigrocaulis and of related species with small genomes, G. aurea (133 Mbp, 2n ≈ 104) and G. pygmaea (179 Mbp, 2n = 80), are hallmarked by intensely DAPI-stained chromocenters, carrying typical heterochromatin-associated methylation marks (5-methylcytosine, H3K9me2), while in G. hispidula and surprisingly also in the small genome of G. margaretae (184 Mbp, 2n = 38) the heterochromatin marks are more evenly distributed. Probes of tandem repetitive sequences together with rDNA allow the unequivocal discrimination of 13 out of 20 chromosome pairs of G. hispidula. One of the repetitive sequences labeled half of the chromosome set almost homogenously supporting an allopolyploid status of G. hispidula and its close relative G. subglabra (1,622 Mbp, 2n = 40). In G. nigrocaulis 11 chromosome pairs could be individualized using a combination of rDNA and unique genomic probes. The presented data provide a basis for future studies of karyotype evolution within the genus Genlisea. PMID:26347752

  15. Mechanisms for the Evolution of a Derived Function in the Ancestral Glucocorticoid Receptor

    SciTech Connect

    Carroll, Sean Michael; Ortlund, Eric A; Thornton, Joseph W.

    2012-03-16

    Understanding the genetic, structural, and biophysical mechanisms that caused protein functions to evolve is a central goal of molecular evolutionary studies. Ancestral sequence reconstruction (ASR) offers an experimental approach to these questions. Here we use ASR to shed light on the earliest functions and evolution of the glucocorticoid receptor (GR), a steroid-activated transcription factor that plays a key role in the regulation of vertebrate physiology. Prior work showed that GR and its paralog, the mineralocorticoid receptor (MR), duplicated from a common ancestor roughly 450 million years ago; the ancestral functions were largely conserved in the MR lineage, but the functions of GRs - reduced sensitivity to all hormones and increased selectivity for glucocorticoids - are derived. Although the mechanisms for the evolution of glucocorticoid specificity have been identified, how reduced sensitivity evolved has not yet been studied. Here we report on the reconstruction of the deepest ancestor in the GR lineage (AncGR1) and demonstrate that GR's reduced sensitivity evolved before the acquisition of restricted hormone specificity, shortly after the GR-MR split. Using site-directed mutagenesis, X-ray crystallography, and computational analyses of protein stability to recapitulate and determine the effects of historical mutations, we show that AncGR1's reduced ligand sensitivity evolved primarily due to three key substitutions. Two large-effect mutations weakened hydrogen bonds and van der Waals interactions within the ancestral protein, reducing its stability. The degenerative effect of these two mutations is extremely strong, but a third permissive substitution, which has no apparent effect on function in the ancestral background and is likely to have occurred first, buffered the effects of the destabilizing mutations. Taken together, our results highlight the potentially creative role of substitutions that partially degrade protein structure and function and

  16. Maintenance of Large Numbers of Virus Genomes in Human Cytomegalovirus-Infected T98G Glioblastoma Cells

    PubMed Central

    Duan, Ying-Liang; Ye, Han-Qing; Zavala, Anamaria G.; Yang, Cui-Qing; Miao, Ling-Feng; Fu, Bi-Shi; Seo, Keun Seok; Davrinche, Christian

    2014-01-01

    ABSTRACT After infection, human cytomegalovirus (HCMV) persists for life. Primary infections and reactivation of latent virus can both result in congenital infection, a leading cause of central nervous system birth defects. We previously reported long-term HCMV infection in the T98G glioblastoma cell line (1). HCMV infection has been further characterized in T98Gs, emphasizing the presence of HCMV DNA over an extended time frame. T98Gs were infected with either HCMV Towne or AD169-IE2-enhanced green fluorescent protein (eGFP) strains. Towne infections yielded mixed IE1 antigen-positive and -negative (Ag+/Ag−) populations. AD169-IE2-eGFP infections also yielded mixed populations, which were sorted to obtain an IE2− (Ag−) population. Viral gene expression over the course of infection was determined by immunofluorescent analysis (IFA) and reverse transcription-PCR (RT-PCR). The presence of HCMV genomes was determined by PCR, nested PCR (n-PCR), and fluorescence in situ hybridization (FISH). Compared to the HCMV latency model, THP-1, Towne-infected T98Gs expressed IE1 and latency-associated transcripts for longer periods, contained many more HCMV genomes during early passages, and carried genomes for a greatly extended period of passaging. Large numbers of HCMV genomes were also found in purified Ag− AD169-infected cells for the first several passages. Interestingly, latency transcripts were observed from very early times in the Towne-infected cells, even when IE1 was expressed at low levels. Although AD169-infected Ag− cells expressed no detectable levels of either IE1 or latency transcripts, they also maintained large numbers of genomes within the cell nuclei for several passages. These results identify HCMV-infected T98Gs as an attractive new model in the study of the long-term maintenance of virus genomes in the context of neural cell types. IMPORTANCE Our previous work showed that T98G glioblastoma cells were semipermissive to HCMV infection; virus

  17. RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome

    PubMed Central

    2013-01-01

    Background Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date along with a comparative analysis of the specific features of RNAseq and genome-predicted transcriptomes currently available in the freshwater hydrozoan Hydra vulgaris. Results To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving the primacy to Illumina over 454 reads to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48,909 unique sequences including splice variants, representing approximately 24,450 distinct genes. Comparative analysis to the available genome-predicted transcriptomes identified 10,597 novel Hydra transcripts that encode 529 evolutionarily-conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, transcription and chromatin regulation. However, a majority of these novel transcripts encodes short ORFs, at least 767 of them corresponding to pseudogenes. This RNAseq transcriptome also lacks 11,270 predicted transcripts that correspond either to silent genes or to genes expressed below the detection level of this study. Conclusions We established a simple and powerful strategy to combine Illumina and 454 reads and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes led to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events. PMID:23530871

  18. Comparative genome analysis of a large Dutch Legionella pneumophila strain collection identifies five markers highly correlated with clinical strains

    PubMed Central

    2010-01-01

    Background Discrimination between clinical and environmental strains within many bacterial species is currently underexplored. Genomic analyses have clearly shown the enormous variability in genome composition between different strains of a bacterial species. In this study we have used Legionella pneumophila, the causative agent of Legionnaires' disease, to search for genomic markers related to pathogenicity. During a large surveillance study in The Netherlands, well-characterized patient-derived strains and environmental strains were collected. We have used a mixed-genome microarray to perform comparative-genome analysis of 257 strains from this collection. Results Microarray analysis indicated that 480 DNA markers (out of a total of 3360 markers) showed clear variation in presence between individual strains and these were therefore selected for further analysis. Unsupervised statistical analysis of these markers showed the enormous genomic variation within the species but did not show any correlation with a pathogenic phenotype. We therefore used supervised statistical analysis to identify discriminating markers. Genetic programming was used both to identify predictive markers and to define their interrelationships. A model consisting of five markers was developed that together correctly predicted 100% of the clinical strains and 69% of the environmental strains. Conclusions A novel approach for identifying predictive markers enabling discrimination between clinical and environmental isolates of L. pneumophila is presented. Out of over 3000 possible markers, five were selected that together enabled correct prediction of all the clinical strains included in this study. This novel approach for identifying predictive markers can be applied to all bacterial species, allowing for better discrimination between strains well equipped to cause human disease and relatively harmless strains. PMID:20630115
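
    The two percentages reported above (100% of clinical strains and 69% of environmental strains correctly predicted) can be read as per-class accuracies. The following is a minimal sketch of how such figures are computed from a list of predictions, treating "clinical" as the positive class. The function name and the small label lists are hypothetical illustrations, not data or code from the study.

```python
def per_class_accuracy(truth, predicted):
    """Return (fraction of clinical strains called clinical,
    fraction of environmental strains called environmental).

    truth, predicted: equal-length lists of the strings
    "clinical" or "environmental".
    """
    pairs = list(zip(truth, predicted))
    clinical = [(t, p) for t, p in pairs if t == "clinical"]
    environmental = [(t, p) for t, p in pairs if t == "environmental"]
    clinical_ok = sum(1 for t, p in clinical if p == t)
    environmental_ok = sum(1 for t, p in environmental if p == t)
    return clinical_ok / len(clinical), environmental_ok / len(environmental)


# Hypothetical toy data: 4 clinical and 10 environmental strains.
truth = ["clinical"] * 4 + ["environmental"] * 10
predicted = ["clinical"] * 4 + ["environmental"] * 7 + ["clinical"] * 3

clinical_rate, environmental_rate = per_class_accuracy(truth, predicted)
```

    In this toy case every clinical strain is recovered, while three of the ten environmental strains are misclassified as clinical, the same kind of asymmetry the abstract describes.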

  19. Ancestral state reconstruction of body size in the Caniformia (Carnivora, Mammalia): the effects of incorporating data from the fossil record.

    PubMed

    Finarelli, John A; Flynn, John J

    2006-04-01

    A recent molecular phylogeny of the mammalian order Carnivora implied large body size as the ancestral condition for the caniform subclade Arctoidea using the distribution of species mean body sizes among living taxa. "Extant taxa-only" approaches such as these discount character state observations for fossil members of living clades and completely ignore data from extinct lineages. To more rigorously reconstruct body sizes of ancestral forms within the Caniformia, body size and first appearance data were collected for 149 extant and 367 extinct taxa. Body sizes were reconstructed for four ancestral nodes using weighted squared-change parsimony on log-transformed body mass data. Reconstructions based on extant taxa alone favored large body sizes (on the order of 10 to 50 kg) for the last common ancestors of both the Caniformia and Arctoidea. In contrast, reconstructions incorporating fossil data support small body sizes (< 5 kg) for the ancestors of those clades. When the temporal information associated with fossil data was discarded, body size reconstructions became ambiguous, demonstrating that incorporating both character state and temporal information from fossil taxa unambiguously supports a small ancestral body size, thereby falsifying hypotheses derived from extant taxa alone. Body size reconstructions for Caniformia, Arctoidea, and Musteloidea were not sensitive to potential errors introduced by uncertainty in the position of extinct lineages relative to the molecular topology, or to missing body size data for extinct members of an entire major clade (the aquatic Pinnipedia). Incorporating character state observations and temporal information from the fossil record into hypothesis testing has a significant impact on the ability to reconstruct ancestral characters and constrains the range of potential hypotheses of character evolution. Fossil data here provide the evidence to reliably document trends of both increasing and decreasing body size in several

  20. Large-scale genomics unveil polygenic architecture of human cortical surface area.

    PubMed

    Chen, Chi-Hua; Peng, Qian; Schork, Andrew J; Lo, Min-Tzu; Fan, Chun-Chieh; Wang, Yunpeng; Desikan, Rahul S; Bettella, Francesco; Hagler, Donald J; Westlye, Lars T; Kremen, William S; Jernigan, Terry L; Le Hellard, Stephanie; Steen, Vidar M; Espeseth, Thomas; Huentelman, Matt; Håberg, Asta K; Agartz, Ingrid; Djurovic, Srdjan; Andreassen, Ole A; Schork, Nicholas; Dale, Anders M

    2015-01-01

    Little is known about how genetic variation contributes to neuroanatomical variability, and whether particular genomic regions comprising genes or evolutionarily conserved elements are enriched for effects that influence brain morphology. Here, we examine brain imaging and single-nucleotide polymorphisms (SNPs) data from ∼2,700 individuals. We show that a substantial proportion of variation in cortical surface area is explained by additive effects of SNPs dispersed throughout the genome, with a larger heritable effect for visual and auditory sensory and insular cortices (h²∼0.45). Genome-wide SNPs collectively account for, on average, about half of twin heritability across cortical regions (N=466 twins). We find enriched genetic effects in or near genes. We also observe that SNPs in evolutionarily more conserved regions contributed significantly to the heritability of cortical surface area, particularly, for medial and temporal cortical regions. SNPs in less conserved regions contributed more to occipital and dorsolateral prefrontal cortices. PMID:26189703

  1. Large-scale genomics unveil polygenic architecture of human cortical surface area

    PubMed Central

    Chen, Chi-Hua; Peng, Qian; Schork, Andrew J.; Lo, Min-Tzu; Fan, Chun-Chieh; Wang, Yunpeng; Desikan, Rahul S.; Bettella, Francesco; Hagler, Donald J.; McCabe, Connor; Chang, Linda; Akshoomoff, Natacha; Newman, Erik; Ernst, Thomas; Van Zijl, Peter; Kuperman, Joshua; Murray, Sarah; Bloss, Cinnamon; Appelbaum, Mark; Gamst, Anthony; Thompson, Wesley; Bartsch, Hauke; Weiner, Michael; Aisen, Paul; Petersen, Ronald; Jack Jr, Clifford R.; Jagust, William; Trojanowki, John Q.; Toga, Arthur W.; Beckett, Laurel; Green, Robert C.; Saykin, Andrew J.; Morris, John; Shaw, Leslie M.; Khachaturian, Zaven; Sorensen, Greg; Carrillo, Maria; Kuller, Lew; Raichle, Marc; Paul, Steven; Davies, Peter; Fillit, Howard; Hefti, Franz; Holtzman, Davie; Mesulman, M. Marcel; Potter, William; Snyder, Peter J.; Schwartz, Adam; Montine, Tom; Thomas, Ronald G.; Donohue, Michael; Walter, Sarah; Gessert, Devon; Sather, Tamie; Jiminez, Gus; Harvey, Danielle; Bernstein, Matthew; Fox, Nick; Thompson, Paul; Schuff, Norbert; DeCarli, Charles; Borowski, Bret; Gunter, Jeff; Senjem, Matt; Vemuri, Prashanthi; Jones, David; Kantarci, Kejal; Ward, Chad; Koeppe, Robert A.; Foster, Norm; Reiman, Eric M.; Chen, Kewei; Mathis, Chet; Landau, Susan; Cairns, Nigel J.; Householder, Erin; Taylor-Reinwald, Lisa; Lee, Virginia M.Y.; Korecka, Magdalena; Figurski, Michal; Crawford, Karen; Neu, Scott; Foroud, Tatiana M.; Potkin, Steven; Shen, Li; Faber, Kelley; Kim, Sungeun; Nho, Kwangsik; Thal, Leon; Frank, Richard; Buckholtz, Neil; Albert, Marilyn; Hsiao, John; Westlye, Lars T.; Kremen, William S.; Jernigan, Terry L.; Hellard, Stephanie Le; Steen, Vidar M.; Espeseth, Thomas; Huentelman, Matt; Håberg, Asta K.; Agartz, Ingrid; Djurovic, Srdjan; Andreassen, Ole A.; Schork, Nicholas; Dale, Anders M.

    2015-01-01

    Little is known about how genetic variation contributes to neuroanatomical variability, and whether particular genomic regions comprising genes or evolutionarily conserved elements are enriched for effects that influence brain morphology. Here, we examine brain imaging and single-nucleotide polymorphisms (SNPs) data from ∼2,700 individuals. We show that a substantial proportion of variation in cortical surface area is explained by additive effects of SNPs dispersed throughout the genome, with a larger heritable effect for visual and auditory sensory and insular cortices (h²∼0.45). Genome-wide SNPs collectively account for, on average, about half of twin heritability across cortical regions (N=466 twins). We find enriched genetic effects in or near genes. We also observe that SNPs in evolutionarily more conserved regions contributed significantly to the heritability of cortical surface area, particularly, for medial and temporal cortical regions. SNPs in less conserved regions contributed more to occipital and dorsolateral prefrontal cortices. PMID:26189703

  2. Twenty years of artificial directional selection have shaped the genome of the Italian Large White pig breed.

    PubMed

    Schiavo, G; Galimberti, G; Calò, D G; Samorè, A B; Bertolini, F; Russo, V; Gallo, M; Buttazzoni, L; Fontanesi, L

    2016-04-01

    In this study, we investigated at the genome-wide level whether 20 years of artificial directional selection, based on boar genetic evaluation obtained with a classical BLUP animal model, shaped the genome of the Italian Large White pig breed. The most influential boars of this breed (n = 192), born from 1992 (the beginning of the selection program of this breed) to 2012, with an estimated breeding value reliability of >0.85, were genotyped with the Illumina Porcine SNP60 BeadChip. After grouping the boars in eight classes according to their year of birth, filtered single nucleotide polymorphisms (SNPs) were used to evaluate the effects of time on genotype frequency changes using multinomial logistic regression models. Of these markers, 493 had a Bonferroni-corrected P < 0.10. However, there was an increasing number of SNPs with a decreasing level of allele frequency changes over time, representing a continuous profile across the genome. The largest proportion of the 493 SNPs was on porcine chromosome (SSC) 7, SSC2, SSC8 and SSC18 for a total of 204 haploblocks. Functional annotations of genomic regions, including the 493 shifted SNPs, reported a few Gene Ontology terms that might underlie the biological processes that contributed to the increased performance of the pigs over the 20 years of the selection program. The obtained results indicated that the genome of the Italian Large White pigs was shaped by a directional selection program derived from the application of methodologies assuming the infinitesimal model, which captured a continuous trend of allele frequency changes in the boar population. PMID:26644200
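
    The Bonferroni threshold mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard definition of the correction (each raw P value multiplied by the number of tests and capped at 1); the P values below are hypothetical and are not taken from the study, and the function name is made up for illustration.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted P values: min(1, p * m) for m tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw P values for five SNPs (illustrative only, not study data).
raw_p = [0.00001, 0.004, 0.019, 0.03, 0.5]
adjusted = bonferroni_adjust(raw_p)

# 0.10 is the cutoff quoted in the abstract.
threshold = 0.10
significant = [i for i, p in enumerate(adjusted) if p < threshold]
print(adjusted)
print(significant)
```

    With many thousands of markers on a SNP chip, the multiplier m becomes large, which is why only a small subset of markers passes such a cutoff.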

  3. Large Scale Sequencing of Dothideomycetes Provides Insights into Genome Evolution and Adaptation

    SciTech Connect

    Haridas, Sajeet; Crous, Pedro; Binder, Manfred; Spatafora, Joseph; Grigoriev, Igor

    2015-03-16

    Dothideomycetes is the largest and most diverse class of ascomycete fungi, with 23 orders, 110 families, 1,300 genera and over 19,000 known species. We present a comparative analysis of 70 Dothideomycete genomes, including over 50 that we sequenced and that are as yet unpublished. This extensive sampling has almost quadrupled the previous study of 18 species and uncovered a 10-fold range of genome sizes. We were able to clarify the phylogenetic positions of several species whose origins were unclear in previous morphological and sequence comparison studies. We analyzed selected gene families, including proteases, transporters and small secreted proteins, and show that major differences in gene content are influenced by speciation.

  4. A large genome centre’s improvements to the Illumina sequencing system

    PubMed Central

    Quail, Michael A.; Kozarewa, Iwanka; Smith, Frances; Scally, Aylwyn; Stephens, Philip J.; Durbin, Richard; Swerdlow, Harold; Turner, Daniel J.

    2008-01-01

    Preface: The Wellcome Trust Sanger Institute is one of the world’s largest genome centres, and a substantial amount of our sequencing is performed on ‘next generation’ massively parallel sequencing technologies: in June 2008 the quantity of purity-filtered sequence data generated by our Genome Analyzer (Illumina) platforms reached 1 terabase, and our average weekly Illumina production output is currently 64 gigabases (Gb). Here we describe a set of improvements we have made to the standard Illumina protocols to make the library preparation more reliable in a high throughput environment, to reduce bias, tighten insert size distribution, and reliably obtain high yields of data. PMID:19034268

  5. Derived Immune and Ancestral Pigmentation Alleles in a 7,000-Year-old Mesolithic European

    PubMed Central

    Olalde, Iñigo; Allentoft, Morten E.; Sánchez-Quinto, Federico; Santpere, Gabriel; Chiang, Charleston W. K.; DeGiorgio, Michael; Prado-Martínez, Javier; Rodríguez, Juan Antonio; Rasmussen, Simon; Quilez, Javier; Ramírez, Oscar; Marigorta, Urko M.; Fernández-Callejo, Marcos; Prada, María Encina; Encinas, Julio Manuel Vidal; Nielsen, Rasmus; Netea, Mihai G.; Novembre, John; Sturm, Richard A.; Sabeti, Pardis; Marquès-Bonet, Tomàs; Navarro, Arcadi; Willerslev, Eske; Lalueza-Fox, Carles

    2014-01-01

    Ancient genomic sequences have started revealing the origin and the demographic impact of Neolithic farmers spreading into Europe [1–3]. The adoption of farming, stock breeding and sedentary societies during the Neolithic may have resulted in adaptive changes in genes associated with immunity and diet [4]. However, the limited data available from earlier hunter-gatherers precludes an understanding of the selective processes associated with this crucial transition to agriculture in recent human evolution. By sequencing a ~7,000-year-old Mesolithic skeleton discovered at the La Braña-Arintero site in León (Spain), we retrieved the first complete pre-agricultural European human genome. Analysis of this genome in the context of other ancient samples suggests the existence of a common ancient genomic signature across Western and Central Eurasia from the Upper Paleolithic to the Mesolithic. The La Braña individual carries ancestral alleles in several skin pigmentation genes, suggesting that the light skin of modern Europeans was not yet ubiquitous in Mesolithic times. Moreover, we provide evidence that a significant number of derived, putatively adaptive variants associated with pathogen resistance in modern Europeans were already present in this hunter-gatherer. Hence, these genomic variants cannot represent novel mutations that occurred during the adaptation to the farming lifestyle. PMID:24463515

  6. Kmasker--a tool for in silico prediction of single-copy FISH probes for the large-genome species Hordeum vulgare.

    PubMed

    Schmutzer, T; Ma, L; Pousarebani, N; Bull, F; Stein, N; Houben, A; Scholz, U

    2014-01-01

    Specific localization of large genomic fragments by fluorescence in situ hybridization (FISH) is challenging in large-genome plant species due to the high content of repetitive sequences. We report an automated workflow (Kmasker) for in silico extraction of unique genomic sequences of large genomic fragments suitable for FISH in barley. This method can be widely used for the integration of genetic and cytogenetic maps in plants and other species with large and complex genomes if the probe sequence (e.g. BACs, sequence contigs) and a low coverage (8-fold) of unassembled sequences of the species of interest are available. Kmasker has been made publicly available as a web tool at http://webblast.ipk-gatersleben.de/kmasker. PMID:24335088

  7. Magmatism and Epithermal Gold-Silver Deposits of the Southern Ancestral Cascade Arc, Western Nevada and Eastern California

    USGS Publications Warehouse

    John, David A.; du Bray, Edward A.; Henry, Christopher D., (compiler); Vikre, Peter

    2015-01-01

    Many epithermal gold-silver deposits are temporally and spatially associated with late Oligocene to Pliocene magmatism of the southern ancestral Cascade arc in western Nevada and eastern California. These deposits, which include both quartz-adularia (low- and intermediate-sulfidation; Comstock Lode, Tonopah, Bodie) and quartz-alunite (high-sulfidation; Goldfield, Paradise Peak) types, were major producers of gold and silver. Ancestral Cascade arc magmatism preceded that of the modern High Cascades arc and reflects subduction of the Farallon plate beneath North America. Ancestral arc magmatism began about 45 Ma, continued until about 3 Ma, and extended from near the Canada-United States border in Washington southward to about 250 km southeast of Reno, Nevada. The ancestral arc was split into northern and southern segments across an inferred tear in the subducting slab between Mount Shasta and Lassen Peak in northern California. The southern segment extends between 42°N in northern California and 37°N in western Nevada and was active from about 30 to 3 Ma. It is bounded on the east by the northeast edge of the Walker Lane. Ancestral arc volcanism represents an abrupt change in composition and style of magmatism relative to that in central Nevada. Large volume, caldera-forming, silicic ignimbrites associated with the 37 to 19 Ma ignimbrite flareup are dominant in central Nevada, whereas volcanic centers of the ancestral arc in western Nevada consist of andesitic stratovolcanoes and dacitic to rhyolitic lava domes that mostly formed between 25 and 4 Ma. Both ancestral arc and ignimbrite flareup magmatism resulted from rollback of the shallowly dipping slab that began about 45 Ma in northeast Nevada and migrated south-southwest with time. Most southern segment ancestral arc rocks have oxidized, high potassium, calc-alkaline compositions with silica contents ranging continuously from about 55 to 77 wt%. Most lavas are porphyritic and contain coarse plagioclase

  8. Whole genome analysis of an MDR Beijing/W strain of Mycobacterium tuberculosis with large genomic deletions associated with resistance to isoniazid.

    PubMed

    Zhang, Qiufen; Wan, Baoshan; Zhou, Aiping; Ni, Jinjing; Xu, Zhihong; Li, Shuxian; Tao, Jing; Yao, YuFeng

    2016-05-15

    Mycobacterium tuberculosis (M.tb) is one of the most prevalent bacterial pathogens in the world. With its wide geographical spread and hypervirulence, the Beijing/W family is the most successful M.tb lineage. China has a high burden of tuberculosis (TB) and of multidrug-resistant TB (MDR-TB), and Beijing/W family strains account for the largest share of MDR strains. To study the genetic basis of the virulence and drug resistance of Beijing/W family strains, we performed whole genome sequencing of M.tb strain W146, a clinical Beijing/W genotype MDR isolate from Wuxi, Jiangsu province, China. Compared with the genome sequence of M.tb strain H37Rv, strain W146 lacks three large fragments, and the loss of the furA-katG operon confers isoniazid resistance. Besides lacking the furA-katG operon, strain W146 harbored almost all known drug resistance-associated mutations. Comparative analysis of single nucleotide polymorphisms (SNPs) and indels between strain W146 and both Beijing/W genotype and non-Beijing/W genotype strains revealed that strain W146 possessed some unique mutations, which may be related to drug resistance, transmission and pathogenicity. These findings will help to understand the large sequence polymorphisms (LSPs) and the genetic characteristics related to transmission and drug resistance of the Beijing/W genotype of M.tb. PMID:26854371

  9. Evolutionary analysis of a large mtDNA translocation (numt) into the nuclear genome of the Panthera genus species

    PubMed Central

    Kim, Jae-Heup; Antunes, Agostinho; Luo, Shu-Jin; Menninger, Joan; Nash, William G.; O’Brien, Stephen J.; Johnson, Warren E.

    2006-01-01

    Translocation of cymtDNA into the nuclear genome, also referred to as numt, has been reported in many species, including several closely related to the domestic cat (Felis catus). We describe the recent transposition of 12,536 bp of the 17 kb mitochondrial genome into the nucleus of the common ancestor of the five Panthera genus species: tiger, P. tigris; snow leopard, P. uncia; jaguar, P. onca; leopard, P. pardus; and lion, P. leo. This nuclear integration, representing 74% of the mitochondrial genome, is one of the largest to be reported in eukaryotes. The Panthera genus numt differs from the numt previously described in the Felis genus in: (1) chromosomal location (F2 – telomeric region vs. D2 – centromeric region), (2) gene make up (from the ND5 to the ATP8 vs. from the CR to the COII), (3) size (12.5 kb vs. 7.9 kb), and (4) structure (single monomer vs. tandemly repeated in Felis). These distinctions indicate that the origin of this large numt fragment in the nuclear genome of the Panthera species is an independent insertion from that of the domestic cat lineage, which has been further supported by phylogenetic analyses. The tiger cymtDNA shared around 90% sequence identity with the homologous numt sequence, suggesting an origin for the Panthera numt at around 3.5 million years ago, prior to the radiation of the five extant Panthera species. PMID:16380222

  10. A Protocol for mtGenome Analysis on Large Sample Numbers

    PubMed Central

    Hamoy, Igor G; Ribeiro-dos-Santos, André M; Alvarez, Luiz; Barbosa, Silvanira; Silva, Artur; Santos, Sidney; Gusmão, Leonor; Ribeiro-dos-Santos, Ândrea

    2014-01-01

    The mitochondrial genome is widely studied in a variety of fields, such as population, forensic, and human and medical genetics. Most studies have been limited to a small portion of the sequence that, although highly diverse, does not describe the total variability. The arrival of modern high-throughput sequencing technologies has made it possible to investigate larger sequences in a shorter amount of time as well as in a more affordable fashion. This work aims to describe a protocol for sequencing and analyzing the complete mitochondrial genome with the Ion PGM™ platform. To evaluate the protocol, the mitochondrial genome was sequenced to approximately 210 Mbp, with high-quality sequences distributed between 12 samples that had an average coverage of 1023× per sample. Several variant callers were compared to improve the protocol outcome. The results suggest that it is possible to run up to 120 samples per run without any significant loss of quality. Therefore, this protocol is an efficient and accurate tool for full mitochondrial genome analysis. PMID:25002812
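
    The coverage figures in this abstract can be sanity-checked with a back-of-the-envelope calculation. The sketch below is only an approximation under stated assumptions: that the samples are human (reference mitochondrial genome length 16,569 bp), that every sequenced base maps to the mitochondrial genome, and that the run yield is split evenly across samples. The function name is hypothetical.

```python
# Length of the human mitochondrial reference sequence (rCRS), in bp.
# Assumption: the protocol targets human mtDNA.
MT_GENOME_BP = 16_569


def mean_coverage(total_bases, n_samples, genome_bp=MT_GENOME_BP):
    """Mean per-sample depth, assuming bases are shared evenly across samples."""
    return total_bases / (n_samples * genome_bp)


run_yield = 210_000_000  # "approximately 210 Mbp" from the abstract

# 12 samples per run, as in the evaluation described above.
print(round(mean_coverage(run_yield, 12)))   # 1056

# 120 samples per run, the upper limit suggested by the authors.
print(round(mean_coverage(run_yield, 120)))  # 106
```

    The estimate for 12 samples is close to, but not identical with, the reported 1023×, as expected given the approximate yield and the idealized assumptions.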

  11. Discovery of novel phosphonate natural products and their biosynthetic pathways by large-scale genome mining

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genome mining has revolutionized the field of natural products, providing hope that new antibiotics can be discovered in time before all remainders are rendered useless against multidrug resistant pathogens. While this approach has been successful in academic settings focused on small collections or...

  12. High-resolution typing by integration of genome sequencing data in a large tuberculosis cluster.

    PubMed

    Schürch, Anita C; Kremer, Kristin; Daviena, Olaf; Kiers, Albert; Boeree, Martin J; Siezen, Roland J; van Soolingen, Dick

    2010-09-01

    To investigate whether genome sequencing yields more useful markers than those currently used to study the epidemiology of tuberculosis, it was applied to three Mycobacterium tuberculosis isolates of the Harlingen outbreak. Our findings suggest that single nucleotide polymorphisms can be used to identify transmission chains in restriction fragment length polymorphism clusters. PMID:20592143

  13. High-Resolution Typing by Integration of Genome Sequencing Data in a Large Tuberculosis Cluster

    PubMed Central

    Schürch, Anita C.; Kremer, Kristin; Daviena, Olaf; Kiers, Albert; Boeree, Martin J.; Siezen, Roland J.; van Soolingen, Dick

    2010-01-01

    To investigate whether genome sequencing yields more useful markers than those currently used to study the epidemiology of tuberculosis, it was applied to three Mycobacterium tuberculosis isolates of the Harlingen outbreak. Our findings suggest that single nucleotide polymorphisms can be used to identify transmission chains in restriction fragment length polymorphism clusters. PMID:20592143

  14. The conditional ancestral selection graph with strong balancing selection.

    PubMed

    Wakeley, John; Sargsyan, Ori

    2009-06-01

    Using a heuristic separation-of-time-scales argument, we describe the behavior of the conditional ancestral selection graph with very strong balancing selection between a pair of alleles. In the limit as the strength of selection tends to infinity, we find that the ancestral process converges to a neutral structured coalescent, with two subpopulations representing the two alleles and mutation playing the role of migration. This agrees with a previous result of Kaplan et al., obtained using a different approach. We present the results of computer simulations to support our heuristic mathematical results. We also present a more rigorous demonstration that the neutral conditional ancestral process converges to the Kingman coalescent in the limit as the mutation rate tends to infinity. PMID:19371754
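
    For readers unfamiliar with the Kingman coalescent named in this abstract, the following is a minimal sketch of the standard neutral process only, not of the authors' conditional ancestral selection graph. It assumes time is measured in coalescent units, so that while k lineages remain the waiting time to the next merger is exponential with rate k(k-1)/2. The function names, sample size and replicate count are arbitrary illustrative choices.

```python
import random


def expected_tmrca(n):
    """E[T_MRCA] for n lineages under the Kingman coalescent: 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


def simulate_tmrca(n, rng):
    """Draw one time to the most recent common ancestor for n sampled lineages."""
    t = 0.0
    for k in range(n, 1, -1):
        # With k lineages, each of the k*(k-1)/2 pairs coalesces at rate 1.
        t += rng.expovariate(k * (k - 1) / 2.0)
    return t


rng = random.Random(1)  # fixed seed so the run is reproducible
n = 10
replicates = 20000
mean_t = sum(simulate_tmrca(n, rng) for _ in range(replicates)) / replicates
print(expected_tmrca(n), mean_t)
```

    The simulated mean should lie close to the analytical expectation, which follows from summing the mean waiting times 2/(k(k-1)) over k = 2, ..., n.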

  15. Emergence, Retention and Selection: A Trilogy of Origination for Functional De Novo Proteins from Ancestral LncRNAs in Primates

    PubMed Central

    Peng, Jiguang; He, Bin Z.; Li, Yumei; Liu, Chu-Jun; Luan, Xuke; Ding, Wanqiu; Li, Shuxian; Chen, Chunyan; Tan, Bertrand Chin-Ming; Zhang, Yong E.; He, Aibin; Li, Chuan-Yun

    2015-01-01

    While some human-specific protein-coding genes have been proposed to originate from ancestral lncRNAs, the transition process remains poorly understood. Here we identified 64 hominoid-specific de novo genes and report a mechanism for the origination of functional de novo proteins from ancestral lncRNAs with precise splicing structures and specific tissue expression profiles. Whole-genome sequencing of dozens of rhesus macaque animals revealed that these lncRNAs are generally not more selectively constrained than other lncRNA loci. The existence of these newly originated de novo proteins is also not beyond anticipation under neutral expectation, as they generally have a longer theoretical lifespan than their current age, due to their GC-rich sequence property enabling stable ORFs with a lower chance of nonsense mutations. Interestingly, although the emergence and retention of these de novo genes are likely driven by neutral forces, a population genetics study in 67 human individuals and 82 macaque animals revealed signatures of purifying selection on these genes specifically in the human population, indicating that a proportion of these newly originated proteins are already functional in humans. We thus propose a mechanism for the creation of functional de novo proteins from ancestral lncRNAs during primate evolution, which may contribute to human-specific genetic novelties by taking advantage of existing genomic contexts. PMID:26177073

  16. Complete Genome Sequence of the Multiresistant Acinetobacter baumannii Strain AbH12O-A2, Isolated during a Large Outbreak in Spain.

    PubMed

    Merino, M; Alvarez-Fraga, L; Gómez, M J; Aransay, A M; Lavín, J L; Chaves, F; Bou, G; Poza, M

    2014-01-01

    We report the complete genome sequence of Acinetobacter baumannii strain AbH12O-A2, isolated during a large outbreak in Spain. The genome has 3,875,775 bp and 3,526 coding sequences, with 39.4% G+C content. The availability of this genome will facilitate the study of the pathogenicity of the Acinetobacter species. PMID:25395646
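
    The summary statistics quoted in this abstract (genome length, number of coding sequences, G+C content) are simple to derive from a sequence and its annotation. The sketch below shows the arithmetic only; the toy sequence is made up, the function name is hypothetical, and only the genome length and coding sequence count are taken from the abstract.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


# Toy sequence for illustration only (not from the AbH12O-A2 genome).
toy = "ATGCGCATTA"
print(gc_content(toy))  # 0.4

# Coding density from the figures reported in the abstract.
genome_bp = 3_875_775
n_cds = 3_526
cds_per_mb = n_cds / (genome_bp / 1_000_000)
print(round(cds_per_mb, 1))  # 909.8
```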

  17. Complete Genome Sequence of the Multiresistant Acinetobacter baumannii Strain AbH12O-A2, Isolated during a Large Outbreak in Spain

    PubMed Central

    Merino, M.; Alvarez-Fraga, L.; Gómez, M. J.; Aransay, A. M.; Lavín, J. L.; Chaves, F.

    2014-01-01

    We report the complete genome sequence of Acinetobacter baumannii strain AbH12O-A2, isolated during a large outbreak in Spain. The genome has 3,875,775 bp and 3,526 coding sequences, with 39.4% G+C content. The availability of this genome will facilitate the study of the pathogenicity of the Acinetobacter species. PMID:25395646

  18. The Complete Genome Sequence and Comparative Genome Analysis of the High Pathogenicity Yersinia enterocolitica Strain 8081

    PubMed Central

    Thomson, Nicholas R; Howard, Sarah; Wren, Brendan W; Holden, Matthew T. G; Crossman, Lisa; Challis, Gregory L; Churcher, Carol; Mungall, Karen; Brooks, Karen; Chillingworth, Tracey; Feltwell, Theresa; Abdellah, Zahra; Hauser, Heidi; Jagels, Kay; Maddison, Mark; Moule, Sharon; Sanders, Mandy; Whitehead, Sally; Quail, Michael A; Dougan, Gordon; Parkhill, Julian; Prentice, Michael B

    2006-01-01

    The human enteropathogen, Yersinia enterocolitica, is a significant link in the range of Yersinia pathologies extending from mild gastroenteritis to bubonic plague. Comparison at the genomic level is a key step in our understanding of the genetic basis for this pathogenicity spectrum. Here we report the genome of Y. enterocolitica strain 8081 (serotype O:8; biotype 1B) and extensive microarray data relating to the genetic diversity of the Y. enterocolitica species. Our analysis reveals that the genome of Y. enterocolitica strain 8081 is a patchwork of horizontally acquired genetic loci, including a plasticity zone of 199 kb containing an extraordinarily high density of virulence genes. Microarray analysis has provided insights into species-specific Y. enterocolitica gene functions and the intraspecies differences between the high, low, and nonpathogenic Y. enterocolitica biotypes. Through comparative genome sequence analysis we provide new information on the evolution of the Yersinia. We identify numerous loci that represent ancestral clusters of genes potentially important in enteric survival and pathogenesis, which have been lost or are in the process of being lost, in the other sequenced Yersinia lineages. Our analysis also highlights large metabolic operons in Y. enterocolitica that are absent in the related enteropathogen, Yersinia pseudotuberculosis, indicating major differences in niche and nutrients used within the mammalian gut. These include clusters directing the production of hydrogenases, tetrathionate respiration, cobalamin synthesis, and propanediol utilisation. Along with ancestral gene clusters, the genome of Y. enterocolitica has revealed species-specific and enteropathogen-specific loci. This has provided important insights into the pathology of this bacterium and, more broadly, into the evolution of the genus. Moreover, wider investigations looking at the patterns of gene loss and gain in the Yersinia have highlighted common themes in the

  19. The complete genome sequence and comparative genome analysis of the high pathogenicity Yersinia enterocolitica strain 8081.

    PubMed

    Thomson, Nicholas R; Howard, Sarah; Wren, Brendan W; Holden, Matthew T G; Crossman, Lisa; Challis, Gregory L; Churcher, Carol; Mungall, Karen; Brooks, Karen; Chillingworth, Tracey; Feltwell, Theresa; Abdellah, Zahra; Hauser, Heidi; Jagels, Kay; Maddison, Mark; Moule, Sharon; Sanders, Mandy; Whitehead, Sally; Quail, Michael A; Dougan, Gordon; Parkhill, Julian; Prentice, Michael B

    2006-12-15

    The human enteropathogen, Yersinia enterocolitica, is a significant link in the range of Yersinia pathologies extending from mild gastroenteritis to bubonic plague. Comparison at the genomic level is a key step in our understanding of the genetic basis for this pathogenicity spectrum. Here we report the genome of Y. enterocolitica strain 8081 (serotype O:8; biotype 1B) and extensive microarray data relating to the genetic diversity of the Y. enterocolitica species. Our analysis reveals that the genome of Y. enterocolitica strain 8081 is a patchwork of horizontally acquired genetic loci, including a plasticity zone of 199 kb containing an extraordinarily high density of virulence genes. Microarray analysis has provided insights into species-specific Y. enterocolitica gene functions and the intraspecies differences between the high, low, and nonpathogenic Y. enterocolitica biotypes. Through comparative genome sequence analysis we provide new information on the evolution of the Yersinia. We identify numerous loci that represent ancestral clusters of genes potentially important in enteric survival and pathogenesis, which have been lost or are in the process of being lost, in the other sequenced Yersinia lineages. Our analysis also highlights large metabolic operons in Y. enterocolitica that are absent in the related enteropathogen, Yersinia pseudotuberculosis, indicating major differences in niche and nutrients used within the mammalian gut. These include clusters directing the production of hydrogenases, tetrathionate respiration, cobalamin synthesis, and propanediol utilisation. Along with ancestral gene clusters, the genome of Y. enterocolitica has revealed species-specific and enteropathogen-specific loci. This has provided important insights into the pathology of this bacterium and, more broadly, into the evolution of the genus. Moreover, wider investigations looking at the patterns of gene loss and gain in the Yersinia have highlighted common themes in the

  20. Large-Scale Gene Relocations following an Ancient Genome Triplication Associated with the Diversification of Core Eudicots

    PubMed Central

    Wang, Yupeng; Ficklin, Stephen P.; Wang, Xiyin; Feltus, F. Alex; Paterson, Andrew H.

    2016-01-01

    Different modes of gene duplication including whole-genome duplication (WGD), and tandem, proximal and dispersed duplications are widespread in angiosperm genomes. Small-scale, stochastic gene relocations and transposed gene duplications are widely accepted to be the primary mechanisms for the creation of dispersed duplicates. However, here we show that most surviving ancient dispersed duplicates in core eudicots originated from large-scale gene relocations within a narrow window of time following a genome triplication (γ) event that occurred in the stem lineage of core eudicots. We name these surviving ancient dispersed duplicates as relocated γ duplicates. In Arabidopsis thaliana, relocated γ, WGD and single-gene duplicates have distinct features with regard to gene functions, essentiality, and protein interactions. Relative to γ duplicates, relocated γ duplicates have higher non-synonymous substitution rates, but comparable levels of expression and regulation divergence. Thus, relocated γ duplicates should be distinguished from WGD and single-gene duplicates for evolutionary investigations. Our results suggest large-scale gene relocations following the γ event were associated with the diversification of core eudicots. PMID:27195960

  1. Large, Male Germ Cell-Specific Hypomethylated DNA Domains With Unique Genomic and Epigenomic Features on the Mouse X Chromosome

    PubMed Central

    Ikeda, Rieko; Shiura, Hirosuke; Numata, Koji; Sugimoto, Michihiko; Kondo, Masayo; Mise, Nathan; Suzuki, Masako; Greally, John M.; Abe, Kuniya

    2013-01-01

    To understand the epigenetic regulation required for germ cell-specific gene expression in the mouse, we analysed DNA methylation profiles of developing germ cells using a microarray-based assay adapted for a small number of cells. The analysis revealed differentially methylated sites between cell types tested. Here, we focused on a group of genomic sequences hypomethylated specifically in germline cells as candidate regions involved in the epigenetic regulation of germline gene expression. These hypomethylated sequences tend to be clustered, forming large (10 kb to ∼9 Mb) genomic domains, particularly on the X chromosome of male germ cells. Most of these regions, designated here as large hypomethylated domains (LoDs), correspond to segmentally duplicated regions that contain gene families showing germ cell- or testis-specific expression, including cancer testis antigen genes. We found an inverse correlation between DNA methylation level and expression of genes in these domains. Most LoDs appear to be enriched with H3 lysine 9 dimethylation, usually regarded as a repressive histone modification, although some LoD genes can be expressed in male germ cells. It thus appears that such a unique epigenomic state associated with the LoDs may constitute a basis for the specific expression of genes contained in these genomic domains. PMID:23861320

  2. Research guidelines in the era of large-scale collaborations: an analysis of Genome-wide Association Study Consortia.

    PubMed

    Austin, Melissa A; Hair, Marilyn S; Fullerton, Stephanie M

    2012-05-01

    Scientific research has shifted from studies conducted by single investigators to the creation of large consortia. Genetic epidemiologists, for example, now collaborate extensively for genome-wide association studies (GWAS). The effect has been a stream of confirmed disease-gene associations. However, effects on human subjects oversight, data-sharing, publication and authorship practices, research organization and productivity, and intellectual property remain to be examined. The aim of this analysis was to identify all research consortia that had published the results of a GWAS analysis since 2005, characterize them, determine which have publicly accessible guidelines for research practices, and summarize the policies in these guidelines. A review of the National Human Genome Research Institute's Catalog of Published Genome-Wide Association Studies identified 55 GWAS consortia as of April 1, 2011. These consortia were comprised of individual investigators, research centers, studies, or other consortia and studied 48 different diseases or traits. Only 14 (25%) were found to have publicly accessible research guidelines on consortia websites. The available guidelines provide information on organization, governance, and research protocols; half address institutional review board approval. Details of publication, authorship, data-sharing, and intellectual property vary considerably. Wider access to consortia guidelines is needed to establish appropriate research standards with broad applicability to emerging forms of large-scale collaboration. PMID:22491085

  3. Research Guidelines in the Era of Large-scale Collaborations: An Analysis of Genome-wide Association Study Consortia

    PubMed Central

    Austin, Melissa A.; Hair, Marilyn S.; Fullerton, Stephanie M.

    2012-01-01

    Scientific research has shifted from studies conducted by single investigators to the creation of large consortia. Genetic epidemiologists, for example, now collaborate extensively for genome-wide association studies (GWAS). The effect has been a stream of confirmed disease-gene associations. However, effects on human subjects oversight, data-sharing, publication and authorship practices, research organization and productivity, and intellectual property remain to be examined. The aim of this analysis was to identify all research consortia that had published the results of a GWAS analysis since 2005, characterize them, determine which have publicly accessible guidelines for research practices, and summarize the policies in these guidelines. A review of the National Human Genome Research Institute’s Catalog of Published Genome-Wide Association Studies identified 55 GWAS consortia as of April 1, 2011. These consortia were comprised of individual investigators, research centers, studies, or other consortia and studied 48 different diseases or traits. Only 14 (25%) were found to have publicly accessible research guidelines on consortia websites. The available guidelines provide information on organization, governance, and research protocols; half address institutional review board approval. Details of publication, authorship, data-sharing, and intellectual property vary considerably. Wider access to consortia guidelines is needed to establish appropriate research standards with broad applicability to emerging forms of large-scale collaboration. PMID:22491085

  4. Comparative genomic de-convolution of the cotton genome revealed a decaploid ancestor and widespread chromosomal fractionation.

    PubMed

    Wang, Xiyin; Guo, Hui; Wang, Jinpeng; Lei, Tianyu; Liu, Tao; Wang, Zhenyi; Li, Yuxian; Lee, Tae-Ho; Li, Jingping; Tang, Haibao; Jin, Dianchuan; Paterson, Andrew H

    2016-02-01

    The 'apparently' simple genomes of many angiosperms mask complex evolutionary histories. The reference genome sequence for cotton (Gossypium spp.) revealed a ploidy change of unprecedented complexity, whose exact dosage could not be determined. Herein, by developing several comparative, computational and statistical approaches, we revealed a 5× multiplication in the cotton lineage of an ancestral genome common to cotton and cacao, and proposed evolutionary models to show how such a decaploid ancestor formed. The c. 70% gene loss necessary to bring the ancestral decaploid to its current gene count appears to fit an approximate geometrical model; that is, although many genes may be lost by single-gene deletion events, some may be lost in groups of consecutive genes. Gene loss following cotton decaploidy has largely just reduced gene copy numbers of some homologous groups. We designed a novel approach to deconvolute layers of chromosome homology, providing definitive information on gene orthology and paralogy across broad evolutionary distances, both of fundamental value and serving as an important platform to support further studies in and beyond the cotton and genomics communities. PMID:26756535

  5. Strain Dependent Genetic Networks for Antibiotic-Sensitivity in a Bacterial Pathogen with a Large Pan-Genome.

    PubMed

    van Opijnen, Tim; Dedrick, Sandra; Bento, José

    2016-09-01

    The interaction between an antibiotic and bacterium is not merely restricted to the drug and its direct target; rather, antibiotic-induced stress seems to resonate through the bacterium, creating selective pressures that drive the emergence of adaptive mutations not only in the direct target, but in genes involved in many different fundamental processes as well. Surprisingly, it has been shown that adaptive mutations do not necessarily have the same effect in all species, indicating that the genetic background influences how phenotypes are manifested. However, to what extent the genetic background affects the manner in which a bacterium experiences antibiotic stress, and how this stress is processed, is unclear. Here we employ the genome-wide tool Tn-Seq to construct daptomycin-sensitivity profiles for two strains of the bacterial pathogen Streptococcus pneumoniae. Remarkably, over half of the genes that are important for dealing with antibiotic-induced stress in one strain are dispensable in another. By confirming over 100 genotype-phenotype relationships, probing potassium-loss, and employing genetic interaction mapping as well as temporal gene-expression experiments, we reveal genome-wide conditionally important/essential genes, discover roles for genes with unknown function, and uncover parts of the antibiotic's mode-of-action. Moreover, by mapping the underlying genomic network for two query genes we encounter little conservation in network connectivity between strains as well as profound differences in regulatory relationships. Our approach uniquely enables genome-wide fitness comparisons across strains, facilitating the discovery that antibiotic responses are complex events that can vary widely between strains, which suggests that in some cases the emergence of resistance could be strain specific and, at least for species with a large pan-genome, less predictable. PMID:27607357

  6. Completion of the swine genome will simplify the production of swine as a large animal biomedical model

    PubMed Central

    2012-01-01

    Background Anatomic and physiological similarities to the human make swine an excellent large animal model for human health and disease. Methods Cloning from a modified somatic cell, which can be determined in cells prior to making the animal, is the only method available for the production of targeted modifications in swine. Results Since some strains of swine are similar in size to humans, technologies that have been developed for swine can be readily adapted to humans and vice versa. Here the importance of swine as a biomedical model, current technologies to produce genetically enhanced swine, current biomedical models, and how the completion of the swine genome will promote swine as a biomedical model are discussed. Conclusions The completion of the swine genome will enhance the continued use and development of swine as models of human health, syndromes and conditions. PMID:23151353

  7. β-Propeller Blades as Ancestral Peptides in Protein Evolution

    PubMed Central

    Kopec, Klaus O.; Lupas, Andrei N.

    2013-01-01

    Proteins of the β-propeller fold are ubiquitous in nature and widely used as structural scaffolds for ligand binding and enzymatic activity. This fold comprises between four and twelve four-stranded β-meanders, the so called blades that are arranged circularly around a central funnel-shaped pore. Despite the large size range of β-propellers, their blades frequently show sequence similarity indicative of a common ancestry and it has been proposed that the majority of β-propellers arose divergently by amplification and diversification of an ancestral blade. Given the structural versatility of β-propellers and the hypothesis that the first folded proteins evolved from a simpler set of peptides, we investigated whether this blade may have given rise to other folds as well. Using sequence comparisons, we identified proteins of four other folds as potential homologs of β-propellers: the luminal domain of inositol-requiring enzyme 1 (IRE1-LD), type II β-prisms, β-pinwheels, and WW domains. Because, with increasing evolutionary distance and decreasing sequence length, the statistical significance of sequence comparisons becomes progressively harder to distinguish from the background of convergent similarities, we complemented our analyses with a new method that evaluates possible homology based on the correlation between sequence and structure similarity. Our results indicate a homologous relationship of IRE1-LD and type II β-prisms with β-propellers, and an analogous one for β-pinwheels and WW domains. Whereas IRE1-LD most likely originated by fold-changing mutations from a fully formed PQQ motif β-propeller, type II β-prisms originated by amplification and differentiation of a single blade, possibly also of the PQQ type. We conclude that both β-propellers and type II β-prisms arose by independent amplification of a blade-sized fragment, which represents a remnant of an ancient peptide world. PMID:24143202

  8. DNA content variation in monilophytes and lycophytes: large genomes that are not endopolyploid.

    PubMed

    Bainard, Jillian D; Henry, Thomas A; Bainard, Luke D; Newmaster, Steven G

    2011-08-01

    Less than 1% of known monilophytes and lycophytes have a genome size estimate, and substantially less is known about the presence and prevalence of endopolyploid nuclei in these groups. Thirty-one monilophyte species (including three horsetails) and six lycophyte species were collected in Ontario, Canada. Using flow cytometry, genome size and degree of endopolyploidy were estimated for 37 species. Across the five orders covered, 1Cx-values averaged 4.2 pg in the Lycopodiales, 18.1 pg for the Equisetales, 5.06 pg for a single representative of the Ophioglossales, 14.3 pg for the Osmundales, and 7.06 pg for the Polypodiales. There was no indication of endoreduplication in any of the leaf, stem, or root tissue analyzed. This information is essential to our understanding of DNA content evolution in land plants. PMID:21847691

  9. Multiple recent horizontal transfers of a large genomic region in cheese making fungi

    PubMed Central

    Cheeseman, Kevin; Ropars, Jeanne; Renault, Pierre; Dupont, Joëlle; Gouzy, Jérôme; Branca, Antoine; Abraham, Anne-Laure; Ceppi, Maurizio; Conseiller, Emmanuel; Debuchy, Robert; Malagnac, Fabienne; Goarin, Anne; Silar, Philippe; Lacoste, Sandrine; Sallet, Erika; Bensimon, Aaron; Giraud, Tatiana; Brygoo, Yves

    2014-01-01

    While the extent and impact of horizontal transfers in prokaryotes are widely acknowledged, their importance to the eukaryotic kingdom is unclear and thought by many to be anecdotal. Here we report multiple recent transfers of a huge genomic island between Penicillium spp. found in the food environment. Sequencing of the two leading filamentous fungi used in cheese making, P. roqueforti and P. camemberti, and comparison with the penicillin producer P. rubens reveals a 575 kb long genomic island in P. roqueforti—called Wallaby—present as identical fragments at non-homologous loci in P. camemberti and P. rubens. Wallaby is detected in Penicillium collections exclusively in strains from food environments. Wallaby encompasses about 250 predicted genes, some of which are probably involved in competition with microorganisms. The occurrence of multiple recent eukaryotic transfers in the food environment provides strong evidence for the importance of this understudied and probably underestimated phenomenon in eukaryotes. PMID:24407037

  10. Large homogeneous genome regions (isochores) in soybean [Glycine max (L.) Merr.].

    PubMed

    Woody, J L; Beavis, W; Shoemaker, R C

    2012-01-01

    The landscape of plant genomes, while slowly being characterized and defined, is still composed primarily of regions of undefined function. Many eukaryotic genomes contain isochore regions, mosaics of homogeneous GC content that can abruptly change from one neighboring isochore to the next. Isochores are broken into families that are characterized by their GC levels. We identified 4,339 compositionally distinct domains and 331 of these were identified as long homogeneous genome regions (LHGRs). We assigned these to four families based on finite mixture models of GC content. We then characterized each family with respect to exon length, gene content, and transposable elements. The LHGR pattern of soybeans is unique in that while the majority of the genes within LHGRs are found within a single LHGR family with a narrow GC range (Family B), that family is not the highest in GC content as seen in vertebrates and invertebrates. Instead Family B has a mean GC content of 35%. The range of GC content for all LHGRs is 16-59% GC which is a larger range than what is typical of vertebrates. This is the first study in which LHGRs have been identified in soybeans and the functions of the genes within the LHGRs have been analyzed. PMID:22934101
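The GC-content measurements that underlie this kind of analysis can be illustrated with a minimal sketch. It is not the authors' segmentation or mixture-model procedure; it only computes GC fractions in fixed windows along a sequence. The function names `gc_fraction` and `windowed_gc`, and the window and step parameters, are assumptions made for illustration.

```python
def gc_fraction(seq):
    """Return the fraction of G/C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    # Windows made only of ambiguous bases (e.g. N) have no defined GC level.
    return gc / acgt if acgt else float("nan")


def windowed_gc(seq, window, step):
    """Return (start, GC fraction) for each full window along seq."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy sequence: a GC-rich stretch followed by an AT-rich stretch.
    for start, gc in windowed_gc("GGCCGGCCAATTAATT", window=8, step=8):
        print(start, round(gc, 2))
```

Abrupt changes in the windowed values between neighboring windows are the kind of signal that a compositional segmentation method would then formalize.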

  11. Genomic diversity of large-plaque-forming podoviruses infecting the phytopathogen Ralstonia solanacearum.

    PubMed

    Kawasaki, Takeru; Narulita, Erlia; Matsunami, Minaho; Ishikawa, Hiroki; Shimizu, Mio; Fujie, Makoto; Bhunchoth, Anjana; Phironrit, Namthip; Chatchawankanphanich, Orawan; Yamada, Takashi

    2016-05-01

    The genome organization, gene structure, and host range of five podoviruses that infect Ralstonia solanacearum, the causative agent of bacterial wilt disease, were characterized. The phages fell into two distinctive groups based on the genome position of the RNA polymerase gene (i.e., T7-type and ϕKMV-type). One-step growth experiments revealed that ϕRSB2 (a T7-like phage) lysed host cells more efficiently with a shorter infection cycle (ca. 60 min, corresponding to half the doubling time of the host) than ϕKMV-like phages such as ϕRSB1 (with an infection cycle of ca. 180 min). Co-infection experiments with ϕRSB1 and ϕRSB2 showed that ϕRSB2 always predominated in the phage progeny independent of host strains. Most phages had wide host-ranges and the phage particles usually did not attach to the resistant strains; when occasionally some did, the phage genome was injected into the resistant strain's cytoplasm, as revealed by fluorescence microscopy with SYBR Gold-labeled phage particles. PMID:26901487

  12. Merlin: Computer-Aided Oligonucleotide Design for Large Scale Genome Engineering with MAGE.

    PubMed

    Quintin, Michael; Ma, Natalie J; Ahmed, Samir; Bhatia, Swapnil; Lewis, Aaron; Isaacs, Farren J; Densmore, Douglas

    2016-06-17

    Genome engineering technologies now enable precise manipulation of organism genotype, but can be limited in scalability by their design requirements. Here we describe Merlin ( http://merlincad.org ), an open-source web-based tool to assist biologists in designing experiments using multiplex automated genome engineering (MAGE). Merlin provides methods to generate pools of single-stranded DNA oligonucleotides (oligos) for MAGE experiments by performing free energy calculation and BLAST scoring on a sliding window spanning the targeted site. These oligos are designed not only to improve recombination efficiency, but also to minimize off-target interactions. The application further assists experiment planning by reporting predicted allelic replacement rates after multiple MAGE cycles, and enables rapid result validation by generating primer sequences for multiplexed allele-specific colony PCR. Here we describe the Merlin oligo and primer design procedures and validate their functionality compared to OptMAGE by eliminating seven AvrII restriction sites from the Escherichia coli genome. PMID:27054880

  13. Creation of Functional Viruses from Non-Functional cDNA Clones Obtained from an RNA Virus Population by the Use of Ancestral Reconstruction.

    PubMed

    Fahnøe, Ulrik; Pedersen, Anders Gorm; Dräger, Carolin; Orton, Richard J; Blome, Sandra; Höper, Dirk; Beer, Martin; Rasmussen, Thomas Bruun

    2015-01-01

    RNA viruses have the highest known mutation rates. Consequently it is likely that a high proportion of individual RNA virus genomes, isolated from an infected host, will contain lethal mutations and be non-functional. This is problematic if the aim is to clone and investigate high-fitness, functional cDNAs and may also pose problems for sequence-based analysis of viral evolution. To address these challenges we have performed a study of the evolution of classical swine fever virus (CSFV) using deep sequencing and analysis of 84 full-length cDNA clones, each representing individual genomes from a moderately virulent isolate. In addition to here being used as a model for RNA viruses generally, CSFV has high socioeconomic importance and remains a threat to animal welfare and pig production. We find that the majority of the investigated genomes are non-functional and only 12% produced infectious RNA transcripts. Full length sequencing of cDNA clones and deep sequencing of the parental population identified substitutions important for the observed phenotypes. The investigated cDNA clones were furthermore used as the basis for inferring the sequence of functional viruses. Since each unique clone must necessarily be the descendant of a functional ancestor, we hypothesized that it should be possible to produce functional clones by reconstructing ancestral sequences. To test this we used phylogenetic methods to infer two ancestral sequences, which were then reconstructed as cDNA clones. Viruses rescued from the reconstructed cDNAs were tested in cell culture and pigs. Both reconstructed ancestral genomes proved functional, and displayed distinct phenotypes in vitro and in vivo. We suggest that reconstruction of ancestral viruses is a useful tool for experimental and computational investigations of virulence and viral evolution. Importantly, ancestral reconstruction can be done even on the basis of a set of sequences that all correspond to non-functional variants. PMID

  14. Creation of Functional Viruses from Non-Functional cDNA Clones Obtained from an RNA Virus Population by the Use of Ancestral Reconstruction

    PubMed Central

    Fahnøe, Ulrik; Pedersen, Anders Gorm; Dräger, Carolin; Orton, Richard J; Blome, Sandra; Höper, Dirk; Beer, Martin; Rasmussen, Thomas Bruun

    2015-01-01

    RNA viruses have the highest known mutation rates. Consequently it is likely that a high proportion of individual RNA virus genomes, isolated from an infected host, will contain lethal mutations and be non-functional. This is problematic if the aim is to clone and investigate high-fitness, functional cDNAs and may also pose problems for sequence-based analysis of viral evolution. To address these challenges we have performed a study of the evolution of classical swine fever virus (CSFV) using deep sequencing and analysis of 84 full-length cDNA clones, each representing individual genomes from a moderately virulent isolate. In addition to here being used as a model for RNA viruses generally, CSFV has high socioeconomic importance and remains a threat to animal welfare and pig production. We find that the majority of the investigated genomes are non-functional and only 12% produced infectious RNA transcripts. Full length sequencing of cDNA clones and deep sequencing of the parental population identified substitutions important for the observed phenotypes. The investigated cDNA clones were furthermore used as the basis for inferring the sequence of functional viruses. Since each unique clone must necessarily be the descendant of a functional ancestor, we hypothesized that it should be possible to produce functional clones by reconstructing ancestral sequences. To test this we used phylogenetic methods to infer two ancestral sequences, which were then reconstructed as cDNA clones. Viruses rescued from the reconstructed cDNAs were tested in cell culture and pigs. Both reconstructed ancestral genomes proved functional, and displayed distinct phenotypes in vitro and in vivo. We suggest that reconstruction of ancestral viruses is a useful tool for experimental and computational investigations of virulence and viral evolution. Importantly, ancestral reconstruction can be done even on the basis of a set of sequences that all correspond to non-functional variants. PMID

  15. The complete mitochondrial genome of Solemya velum (Mollusca: Bivalvia) and its relationships with Conchifera

    PubMed Central

    2013-01-01

    Background Bivalve mitochondrial genomes exhibit a wide array of uncommon features, like extensive gene rearrangements, large sizes, and unusual ways of inheritance. Species pertaining to the order Solemyida (subclass Opponobranchia) show many peculiar evolutionary adaptations, for instance extensive symbiosis with chemoautotrophic bacteria. Although Opponobranchia are central in bivalve phylogeny, being considered the sister group of all Autobranchia, a complete mitochondrial genome has not been sequenced yet. Results In this paper, we characterized the complete mitochondrial genome of the Atlantic awning clam Solemya velum: A-T content, gene arrangement and other features are more similar to putative ancestral mollusks than to other bivalves. Two supernumerary open reading frames are present in a large, otherwise unassigned, region, while the origin of replication could be located in a region upstream of the cox3 gene. Conclusions We show that the S. velum mitogenome retains most of the ancestral conchiferan features, which is unusual among bivalve mollusks, and we discuss the main peculiarities of this first example of an organellar genome coming from the subclass Opponobranchia. Mitochondrial genomes of Solemya (for bivalves) and Haliotis (for gastropods) seem to retain the original condition of mollusks, as most probably exemplified by Katharina. PMID:23777315

  16. Detection of Weakly Conserved Ancestral Mammalian Regulatory Sequences by Primate Comparisons

    SciTech Connect

    Wang, Qian-fei; Prabhakar, Shyam; Chanan, Sumita; Cheng, Jan-Fang; Rubin, Edward M.; Boffelli, Dario

    2006-06-01

    Genomic comparisons between human and distant, non-primate mammals are commonly used to identify cis-regulatory elements based on constrained sequence evolution. However, these methods fail to detect cryptic functional elements, which are too weakly conserved among mammals to distinguish from nonfunctional DNA. To address this problem, we explored the potential of deep intra-primate sequence comparisons. We sequenced the orthologs of 558 kb of human genomic sequence, covering multiple loci involved in cholesterol homeostasis, in 6 nonhuman primates. Our analysis identified 6 noncoding DNA elements displaying significant conservation among primates, but undetectable in more distant comparisons. In vitro and in vivo tests revealed that at least three of these 6 elements have regulatory function. Notably, the mouse orthologs of these three functional human sequences had regulatory activity despite their lack of significant sequence conservation, indicating that they are cryptic ancestral cis-regulatory elements. These regulatory elements could still be detected in a smaller set of three primate species including human, rhesus and marmoset. Since the human and rhesus genome sequences are already available, and the marmoset genome is actively being sequenced, the primate-specific conservation analysis described here can be applied in the near future on a whole-genome scale, to complement the annotation provided by more distant species comparisons.

  17. Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European.

    PubMed

    Olalde, Iñigo; Allentoft, Morten E; Sánchez-Quinto, Federico; Santpere, Gabriel; Chiang, Charleston W K; DeGiorgio, Michael; Prado-Martinez, Javier; Rodríguez, Juan Antonio; Rasmussen, Simon; Quilez, Javier; Ramírez, Oscar; Marigorta, Urko M; Fernández-Callejo, Marcos; Prada, María Encina; Encinas, Julio Manuel Vidal; Nielsen, Rasmus; Netea, Mihai G; Novembre, John; Sturm, Richard A; Sabeti, Pardis; Marquès-Bonet, Tomàs; Navarro, Arcadi; Willerslev, Eske; Lalueza-Fox, Carles

    2014-03-13

    Ancient genomic sequences have started to reveal the origin and the demographic impact of farmers from the Neolithic period spreading into Europe. The adoption of farming, stock breeding and sedentary societies during the Neolithic may have resulted in adaptive changes in genes associated with immunity and diet. However, the limited data available from earlier hunter-gatherers preclude an understanding of the selective processes associated with this crucial transition to agriculture in recent human evolution. Here we sequence an approximately 7,000-year-old Mesolithic skeleton discovered at the La Braña-Arintero site in León, Spain, to retrieve a complete pre-agricultural European human genome. Analysis of this genome in the context of other ancient samples suggests the existence of a common ancient genomic signature across western and central Eurasia from the Upper Paleolithic to the Mesolithic. The La Braña individual carries ancestral alleles in several skin pigmentation genes, suggesting that the light skin of modern Europeans was not yet ubiquitous in Mesolithic times. Moreover, we provide evidence that a significant number of derived, putatively adaptive variants associated with pathogen resistance in modern Europeans were already present in this hunter-gatherer. PMID:24463515

  18. Sexually dimorphic effects of ancestral exposure to vinclozolin on stress reactivity in rats.

    PubMed

    Gillette, Ross; Miller-Crews, Isaac; Nilsson, Eric E; Skinner, Michael K; Gore, Andrea C; Crews, David

    2014-10-01

    How an individual responds to the environment depends upon both personal life history as well as inherited genetic and epigenetic factors from ancestors. Using a 2-hit, 3 generations apart model, we tested how F3 descendants of rats given in utero exposure to the environmental endocrine-disrupting chemical (EDC) vinclozolin reacted to stress during adolescence in their own lives, focusing on sexually dimorphic phenotypic outcomes. In adulthood, male and female F3 vinclozolin- or vehicle-lineage rats, stressed or nonstressed, were behaviorally characterized on a battery of tests and then euthanized. Serum was used for hormone assays, and brains were used for quantitative PCR and transcriptome analyses. Results showed that the effects of ancestral exposure to vinclozolin converged with stress experienced during adolescence in a sexually dimorphic manner. Debilitating effects were seen at all levels of the phenotype, including physiology, behavior, brain metabolism, gene expression, and genome-wide transcriptome modifications in specific brain nuclei. Additionally, females were significantly more vulnerable than males to transgenerational effects of vinclozolin on anxiety but not sociality tests. This fundamental transformation occurs in a manner not predicted by the ancestral exposure or the proximate effects of stress during adolescence, an interaction we refer to as synchronicity. PMID:25051444

  19. Sexually Dimorphic Effects of Ancestral Exposure to Vinclozolin on Stress Reactivity in Rats

    PubMed Central

    Gillette, Ross; Miller-Crews, Isaac; Nilsson, Eric E.; Skinner, Michael K.; Gore, Andrea C.

    2014-01-01

    How an individual responds to the environment depends upon both personal life history as well as inherited genetic and epigenetic factors from ancestors. Using a 2-hit, 3 generations apart model, we tested how F3 descendants of rats given in utero exposure to the environmental endocrine-disrupting chemical (EDC) vinclozolin reacted to stress during adolescence in their own lives, focusing on sexually dimorphic phenotypic outcomes. In adulthood, male and female F3 vinclozolin- or vehicle-lineage rats, stressed or nonstressed, were behaviorally characterized on a battery of tests and then euthanized. Serum was used for hormone assays, and brains were used for quantitative PCR and transcriptome analyses. Results showed that the effects of ancestral exposure to vinclozolin converged with stress experienced during adolescence in a sexually dimorphic manner. Debilitating effects were seen at all levels of the phenotype, including physiology, behavior, brain metabolism, gene expression, and genome-wide transcriptome modifications in specific brain nuclei. Additionally, females were significantly more vulnerable than males to transgenerational effects of vinclozolin on anxiety but not sociality tests. This fundamental transformation occurs in a manner not predicted by the ancestral exposure or the proximate effects of stress during adolescence, an interaction we refer to as synchronicity. PMID:25051444

  20. Are survival processing memory advantages based on ancestral priorities?

    PubMed

    Soderstrom, Nicholas C; McCabe, David P

    2011-06-01

    Recent research has suggested that our memory systems are especially tuned to process information according to its survival relevance, and that inducing problems of "ancestral priorities" faced by our ancestors should lead to optimal recall performance (Nairne & Pandeirada, Cognitive Psychology, 2010). The present study investigated the specificity of this idea by comparing an ancestor-consistent scenario and a modern survival scenario that involved threats that were encountered by human ancestors (e.g., predators) or threats from fictitious creatures (i.e., zombies). Participants read one of four survival scenarios in which the environment and the explicit threat were either consistent or inconsistent with ancestrally based problems (i.e., grasslands-predators, grasslands-zombies, city-attackers, city-zombies), or they rated words for pleasantness. After rating words based on their survival relevance (or pleasantness), the participants performed a free recall task. All survival scenarios led to better recall than did pleasantness ratings, but recall was greater when zombies were the threat, as compared to predators or attackers. Recall did not differ for the modern (i.e., city) and ancestral (i.e., grasslands) scenarios. These recall differences persisted when valence and arousal ratings for the scenarios were statistically controlled as well. These data challenge the specificity of ancestral priorities in survival-processing advantages in memory. PMID:21327372

  1. Reaching Children through Their Ancestral Language and Authentic Literature

    ERIC Educational Resources Information Center

    Bannon, Kay Thorpe

    2004-01-01

    In this article, the author describes a program of Eastern Cherokee ancestral language restoration in Cherokee, North Carolina. One of the primary goals of the program is to enhance the self-concept of the children and motivate the students to experience academic excitement and success. The use of authentic legends and stories is one method…

  2. A comparison of ancestral state reconstruction methods for quantitative characters.

    PubMed

    Royer-Carenzi, Manuela; Didier, Gilles

    2016-09-01

    Choosing an ancestral state reconstruction method among the alternatives available for quantitative characters may be puzzling. We present here a comparison of seven of them, namely the maximum likelihood, restricted maximum likelihood, generalized least squares under Brownian, Brownian-with-trend and Ornstein-Uhlenbeck models, phylogenetic independent contrasts and squared parsimony methods. A review of the relations between these methods shows that the maximum likelihood, the restricted maximum likelihood and the generalized least squares under Brownian model infer the same ancestral states and can only be distinguished by the distributions accounting for the reconstruction uncertainty which they provide. The respective accuracy of the methods is assessed over character evolution simulated under a Brownian motion with (and without) directional or stabilizing selection. We give the general form of ancestral state distributions conditioned on leaf states under the simulation models. Ancestral distributions are used first, to give a theoretical lower bound of the expected reconstruction error, and second, to develop an original evaluation scheme which is more efficient than comparing the reconstructed and the simulated states. Our simulations show that: (i) the distributions of the reconstruction uncertainty provided by the methods generally make sense (some more than others); (ii) it is essential to detect the presence of an evolutionary trend and to choose a reconstruction method accordingly; (iii) all the methods show good performances on characters under stabilizing selection; (iv) without trend or stabilizing selection, the maximum likelihood method is generally the most accurate. PMID:27234644
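As a rough illustration of one of the simplest cases compared here, the root state of a binary tree under a plain Brownian model (no trend, no stabilizing selection) can be estimated with the pruning recursion used for phylogenetic independent contrasts, whose root value coincides with the generalized least squares estimate under Brownian motion. This is a minimal sketch and not the authors' code. The nested-tuple tree encoding and the name `bm_root_estimate` are assumptions made for illustration, and all branch lengths are assumed to be strictly positive.

```python
def bm_root_estimate(node):
    """Estimate the state at the root of `node` under Brownian motion.

    A leaf is a number (the observed trait value). An internal node is a
    pair ((left_subtree, left_branch_length), (right_subtree, right_branch_length)).
    Returns (estimate, extra), where `extra` is the branch length to add
    above this node to account for the uncertainty of its estimate.
    """
    if isinstance(node, (int, float)):
        return float(node), 0.0  # observed leaf: no estimation uncertainty

    (left, v_left), (right, v_right) = node
    x_left, extra_left = bm_root_estimate(left)
    x_right, extra_right = bm_root_estimate(right)

    # Effective branch lengths include the uncertainty of the child estimates.
    v_l = v_left + extra_left
    v_r = v_right + extra_right

    # Weighted mean of the children, with weights 1 / branch length.
    estimate = (x_left / v_l + x_right / v_r) / (1.0 / v_l + 1.0 / v_r)
    extra = v_l * v_r / (v_l + v_r)
    return estimate, extra


if __name__ == "__main__":
    # Hypothetical tree: ((A:1, B:1):0.5, C:1) with trait values A=1, B=3, C=5.
    tree = ((((1.0, 1.0), (3.0, 1.0)), 0.5), (5.0, 1.0))
    print(bm_root_estimate(tree)[0])
```

The recursion yields the Brownian estimate only at the root of the tree it is called on; the values computed at internal nodes along the way use only the leaves below them, so estimating another node in the same sense would require rerooting the tree at that node.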

  3. Advanced Intestinal Cancers often Maintain a Multi-Ancestral Architecture

    PubMed Central

    Zahm, Christopher D.; Szulczewski, Joseph M.; Leystra, Alyssa A.; Paul Olson, Terrah J.; Clipson, Linda; Albrecht, Dawn M.; Middlebrooks, Malisa; Thliveris, Andrew T.; Matkowskyj, Kristina A.; Washington, Mary Kay; Newton, Michael A.; Eliceiri, Kevin W.; Halberg, Richard B.

    2016-01-01

    A widely accepted paradigm in the field of cancer biology is that solid tumors are uni-ancestral, being derived from a single founder and its descendants. However, data have been steadily accruing that indicate early tumors in mice and humans can have a multi-ancestral origin in which an initiated primogenitor facilitates the transformation of neighboring co-genitors. We developed a new mouse model that permits the determination of clonal architecture of intestinal tumors in vivo and ex vivo, have validated this model, and then used it to assess the clonal architecture of adenomas, intramucosal carcinomas, and invasive adenocarcinomas of the intestine. The percentage of multi-ancestral tumors did not significantly change as tumors progressed from adenomas with low-grade dysplasia [40/65 (62%)], to adenomas with high-grade dysplasia [21/37 (57%)], to intramucosal carcinomas [10/23 (43%)], to invasive adenocarcinomas [13/19 (68%)], indicating that the clone arising from the primogenitor continues to coexist with clones arising from co-genitors. Moreover, neoplastic cells from distinct clones within a multi-ancestral adenocarcinoma have even been observed to simultaneously invade into the underlying musculature [2/15 (13%)]. Thus, intratumoral heterogeneity arising early in tumor formation persists throughout tumorigenesis. PMID:26919712

  4. Isolation of ancestral sylvatic dengue virus type 1, Malaysia.

    PubMed

    Teoh, Boon-Teong; Sam, Sing-Sin; Abd-Jamil, Juraina; AbuBakar, Sazaly

    2010-11-01

    Ancestral sylvatic dengue virus type 1, which was isolated from a monkey in 1972, was isolated from a patient with dengue fever in Malaysia. The virus is neutralized by serum of patients with endemic DENV-1 infection. Rare isolation of this virus suggests a limited spillover infection from an otherwise restricted sylvatic cycle. PMID:21029545

  5. A genome-wide linkage analysis for reproductive traits in F2 Large White × Meishan cross gilts

    PubMed Central

    Hernandez, S C; Finlayson, H A; Ashworth, C J; Haley, C S; Archibald, A L

    2014-01-01

    Female reproductive performance traits in pigs have low heritabilities thus limiting improvement through traditional selective breeding programmes. However, there is substantial genetic variation found between pig breeds with the Chinese Meishan being one of the most prolific pig breeds known. In this study, three cohorts of Large White × Meishan F2 cross-bred pigs were analysed to identify quantitative trait loci (QTL) with effects on reproductive traits, including ovulation rate, teat number, litter size, total born alive and prenatal survival. A total of 307 individuals were genotyped for 174 genetic markers across the genome. The genome-wide analysis of the trait-recorded F2 gilts in their first parity/litter revealed one QTL for teat number significant at the genome level and a total of 12 QTL, which are significant at the chromosome-wide level, for: litter size (three QTL), total born alive (two QTL), ovulation rate (four QTL), prenatal survival (one QTL) and teat number (two QTL). Further support for eight of these QTL is provided by results from other studies. Four of these 12 QTL were mapped for the first time in this study: on SSC15 for ovulation rate and on SSC18 for teat number, ovulation rate and litter size. PMID:24456574

  6. Genomic profiling of high-grade large-cell neuroendocrine carcinoma of the colon

    PubMed Central

    Hammond, William A.; Crozier, Jennifer A.; Nakhleh, Raouf E.

    2016-01-01

    High-grade neuroendocrine carcinoma (HGNEC) of the colon is a rare and aggressive cancer that has a poor prognosis. Currently no standard treatment exists, and published case series report an overall survival of approximately one year with treatment. Typically patients receive treatment similar to that recommended for small-cell lung cancer, extrapolating from the similarity in cancer biology. Here we report a case of HGNEC of the colon with genomic profiling that identified a KRAS G12D mutation and a PI3K mutation that has not yet been reported in the literature for this tumor type. PMID:27034803

  7. Hawaiian Drosophila genomes: size variation and evolutionary expansions.

    PubMed

    Craddock, Elysse M; Gall, Joseph G; Jonas, Mark

    2016-02-01

    This paper reports genome sizes of one Hawaiian Scaptomyza and 16 endemic Hawaiian Drosophila species that include five members of the antopocerus species group, one member of the modified mouthpart group, and ten members of the picture wing clade. Genome size expansions have occurred independently multiple times among Hawaiian Drosophila lineages, and have resulted in an over 2.3-fold range of genome sizes among species, with the largest observed in Drosophila cyrtoloma (1C = 0.41 pg). We find evidence that these repeated genome size expansions were likely driven by the addition of significant amounts of heterochromatin and satellite DNA. For example, our data reveal that the addition of seven heterochromatic chromosome arms to the ancestral haploid karyotype, and a remarkable proportion of ~70 % satellite DNA, account for the greatly expanded size of the D. cyrtoloma genome. Moreover, the genomes of 13/17 Hawaiian picture wing species are composed of substantial proportions (22-70 %) of detectable satellites (all but one of which are AT-rich). Our results suggest that in this tightly knit group of recently evolved species, genomes have expanded, in large part, via evolutionary amplifications of satellite DNA sequences in centric and pericentric domains (especially of the X and dot chromosomes), which have resulted in longer acrocentric chromosomes or metacentrics with an added heterochromatic chromosome arm. We discuss possible evolutionary mechanisms that may have shaped these patterns, including rapid fixation of novel expanded genomes during founder-effect speciation. PMID:26790663

  8. Genome-wide Association Study of Porcine Hematological Parameters in a Large White × Minzhu F2 Resource Population

    PubMed Central

    Luo, Weizhen; Chen, Shaokang; Cheng, Duxue; Wang, Ligang; Li, Yong; Ma, Xiaojun; Song, Xin; Liu, Xin; Li, Wen; Liang, Jing; Yan, Hua; Zhao, Kebin; Wang, Chuduan; Wang, Lixian; Zhang, Longchao

    2012-01-01

    Hematological traits, which are important indicators of immune function in animals, have been commonly examined as biomarkers of disease and disease severity in humans and animals. Genome-wide significant quantitative trait loci (QTLs) provide important information for use in breeding programs of animals such as pigs. QTLs for hematological parameters (hematological traits) have been detected in pig chromosomes, although these are often mapped by linkage analysis to large intervals making identification of the underlying mutation problematic. Single nucleotide polymorphisms (SNPs) are the common form of genetic variation among individuals and are thought to account for the majority of inherited traits. In this study, a genome-wide association study (GWAS) was performed to detect regions of association with hematological traits in a three-generation resource population produced by intercrossing Large White boars and Minzhu sows during the period from 2007 to 2011. Illumina PorcineSNP60 BeadChip technology was used to genotype each animal and seven hematological parameters were measured (hematocrit (HCT), hemoglobin (HGB), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), mean corpuscular volume (MCV), red blood cell count (RBC) and red blood cell volume distribution width (RDW)). Data were analyzed in a three step Genome-wide Rapid Association using the Mixed Model and Regression-Genomic Control (GRAMMAR-GC) method. A total of 62 genome-wide significant and three chromosome-wide significant SNPs associated with hematological parameters were detected in this GWAS. Seven and five SNPs were associated with HCT and HGB, respectively. These SNPs were all located within the region of 34.6-36.5 Mb on SSC7. Four SNPs within the region of 43.7-47.0 Mb and fifty-five SNPs within the region of 42.2-73.8 Mb on SSC8 showed significant association with MCH and MCV, respectively. At chromosome-wide significant level, one SNP at 29.2 Mb on SSC1

  9. Large-scale whole-genome sequencing of the Icelandic population.

    PubMed

    Gudbjartsson, Daniel F; Helgason, Hannes; Gudjonsson, Sigurjon A; Zink, Florian; Oddson, Asmundur; Gylfason, Arnaldur; Besenbacher, Soren; Magnusson, Gisli; Halldorsson, Bjarni V; Hjartarson, Eirikur; Sigurdsson, Gunnar Th; Stacey, Simon N; Frigge, Michael L; Holm, Hilma; Saemundsdottir, Jona; Helgadottir, Hafdis Th; Johannsdottir, Hrefna; Sigfusson, Gunnlaugur; Thorgeirsson, Gudmundur; Sverrisson, Jon Th; Gretarsdottir, Solveig; Walters, G Bragi; Rafnar, Thorunn; Thjodleifsson, Bjarni; Bjornsson, Einar S; Olafsson, Sigurdur; Thorarinsdottir, Hildur; Steingrimsdottir, Thora; Gudmundsdottir, Thora S; Theodors, Asgeir; Jonasson, Jon G; Sigurdsson, Asgeir; Bjornsdottir, Gyda; Jonsson, Jon J; Thorarensen, Olafur; Ludvigsson, Petur; Gudbjartsson, Hakon; Eyjolfsson, Gudmundur I; Sigurdardottir, Olof; Olafsson, Isleifur; Arnar, David O; Magnusson, Olafur Th; Kong, Augustine; Masson, Gisli; Thorsteinsdottir, Unnur; Helgason, Agnar; Sulem, Patrick; Stefansson, Kari

    2015-05-01

    Here we describe the insights gained from sequencing the whole genomes of 2,636 Icelanders to a median depth of 20×. We found 20 million SNPs and 1.5 million insertions-deletions (indels). We describe the density and frequency spectra of sequence variants in relation to their functional annotation, gene position, pathway and conservation score. We demonstrate an excess of homozygosity and rare protein-coding variants in Iceland. We imputed these variants into 104,220 individuals down to a minor allele frequency of 0.1% and found a recessive frameshift mutation in MYL4 that causes early-onset atrial fibrillation, several mutations in ABCB4 that increase risk of liver diseases and an intronic variant in GNAS associating with increased thyroid-stimulating hormone levels when maternally inherited. These data provide a study design that can be used to determine how variation in the sequence of the human genome gives rise to human diversity. PMID:25807286

  10. Perspectives on Clinical Informatics: Integrating Large-Scale Clinical, Genomic, and Health Information for Clinical Care

    PubMed Central

    Choi, In Young; Kim, Tae-Min; Kim, Myung Shin; Mun, Seong K.

    2013-01-01

    The advances in electronic medical records (EMRs) and bioinformatics (BI) represent two significant trends in healthcare. The widespread adoption of EMR systems and the completion of the Human Genome Project developed the technologies for data acquisition, analysis, and visualization in two different domains. The massive amount of data from both clinical and biology domains is expected to provide personalized, preventive, and predictive healthcare services in the near future. The integrated use of EMR and BI data needs to consider four key informatics areas: data modeling, analytics, standardization, and privacy. Bioclinical data warehouses integrating heterogeneous patient-related clinical or omics data should be considered. The representative standardization effort by the Clinical Bioinformatics Ontology (CBO) aims to provide uniquely identified concepts to include molecular pathology terminologies. Since individual genome data are easily used to predict current and future health status, different safeguards to ensure confidentiality should be considered. In this paper, we focused on the informatics aspects of integrating the EMR community and BI community by identifying opportunities, challenges, and approaches to provide the best possible care service for our patients and the population. PMID:24465229

  11. Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins.

    PubMed

    Croucher, Nicholas J; Page, Andrew J; Connor, Thomas R; Delaney, Aidan J; Keane, Jacqueline A; Bentley, Stephen D; Parkhill, Julian; Harris, Simon R

    2015-02-18

    The emergence of new sequencing technologies has facilitated the use of bacterial whole genome alignments for evolutionary studies and outbreak analyses. These datasets, of increasing size, often include examples of multiple different mechanisms of horizontal sequence transfer resulting in substantial alterations to prokaryotic chromosomes. The impact of these processes demands rapid and flexible approaches able to account for recombination when reconstructing isolates' recent diversification. Gubbins is an iterative algorithm that uses spatial scanning statistics to identify loci containing elevated densities of base substitutions suggestive of horizontal sequence transfer while concurrently constructing a maximum likelihood phylogeny based on the putative point mutations outside these regions of high sequence diversity. Simulations demonstrate the algorithm generates highly accurate reconstructions under realistically parameterized models of bacterial evolution, and achieves convergence in only a few hours on alignments of hundreds of bacterial genome sequences. Gubbins is appropriate for reconstructing the recent evolutionary history of a variety of haploid genotype alignments, as it makes no assumptions about the underlying mechanism of recombination. The software is freely available for download at github.com/sanger-pathogens/Gubbins, implemented in Python and C and supported on Linux and Mac OS X. PMID:25414349
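
    The scanning idea in the abstract above can be illustrated with a minimal sketch. This is not Gubbins' actual implementation: it only slides fixed, non-overlapping windows along a genome and flags those whose substitution count is improbably high under a binomial null that uses the genome-wide substitution density. The function names, window size, and significance threshold are all assumptions chosen for illustration.

```python
from math import comb

def binom_sf(k, n, p):
    """Exact upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def scan_for_dense_windows(snp_positions, genome_length, window=100, alpha=1e-3):
    """Return (start, end, count) for windows with an excess of substitutions.

    Hypothetical helper: the null per-site rate is the genome-wide density,
    and windows are non-overlapping for simplicity.
    """
    density = len(snp_positions) / genome_length
    flagged = []
    for start in range(0, genome_length - window + 1, window):
        end = start + window
        count = sum(1 for p in snp_positions if start <= p < end)
        if count and binom_sf(count, window, density) < alpha:
            flagged.append((start, end, count))
    return flagged

# Toy data: one background substitution per 100 bp, plus a dense cluster
# at positions 400-418 standing in for a putative recombinant segment.
background = list(range(50, 1000, 100))
cluster = list(range(400, 420, 2))
hits = scan_for_dense_windows(background + cluster, 1000)
print(hits)
```

    In the toy data only the window containing the cluster is flagged. A real analysis would then build the phylogeny from the substitutions outside the flagged regions and iterate, as the abstract describes.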

  12. Perspectives on clinical informatics: integrating large-scale clinical, genomic, and health information for clinical care.

    PubMed

    Choi, In Young; Kim, Tae-Min; Kim, Myung Shin; Mun, Seong K; Chung, Yeun-Jun

    2013-12-01

    The advances in electronic medical records (EMRs) and bioinformatics (BI) represent two significant trends in healthcare. The widespread adoption of EMR systems and the completion of the Human Genome Project developed the technologies for data acquisition, analysis, and visualization in two different domains. The massive amount of data from both clinical and biology domains is expected to provide personalized, preventive, and predictive healthcare services in the near future. The integrated use of EMR and BI data needs to consider four key informatics areas: data modeling, analytics, standardization, and privacy. Bioclinical data warehouses integrating heterogeneous patient-related clinical or omics data should be considered. The representative standardization effort by the Clinical Bioinformatics Ontology (CBO) aims to provide uniquely identified concepts to include molecular pathology terminologies. Since individual genome data are easily used to predict current and future health status, different safeguards to ensure confidentiality should be considered. In this paper, we focused on the informatics aspects of integrating the EMR community and BI community by identifying opportunities, challenges, and approaches to provide the best possible care service for our patients and the population. PMID:24465229

  13. Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins

    PubMed Central

    Croucher, Nicholas J.; Page, Andrew J.; Connor, Thomas R.; Delaney, Aidan J.; Keane, Jacqueline A.; Bentley, Stephen D.; Parkhill, Julian; Harris, Simon R.

    2015-01-01

    The emergence of new sequencing technologies has facilitated the use of bacterial whole genome alignments for evolutionary studies and outbreak analyses. These datasets, of increasing size, often include examples of multiple different mechanisms of horizontal sequence transfer resulting in substantial alterations to prokaryotic chromosomes. The impact of these processes demands rapid and flexible approaches able to account for recombination when reconstructing isolates’ recent diversification. Gubbins is an iterative algorithm that uses spatial scanning statistics to identify loci containing elevated densities of base substitutions suggestive of horizontal sequence transfer while concurrently constructing a maximum likelihood phylogeny based on the putative point mutations outside these regions of high sequence diversity. Simulations demonstrate the algorithm generates highly accurate reconstructions under realistically parameterized models of bacterial evolution, and achieves convergence in only a few hours on alignments of hundreds of bacterial genome sequences. Gubbins is appropriate for reconstructing the recent evolutionary history of a variety of haploid genotype alignments, as it makes no assumptions about the underlying mechanism of recombination. The software is freely available for download at github.com/sanger-pathogens/Gubbins, implemented in Python and C and supported on Linux and Mac OS X. PMID:25414349

  14. Sequence variants from whole genome sequencing a large group of Icelanders.

    PubMed

    Gudbjartsson, Daniel F; Sulem, Patrick; Helgason, Hannes; Gylfason, Arnaldur; Gudjonsson, Sigurjon A; Zink, Florian; Oddson, Asmundur; Magnusson, Gisli; Halldorsson, Bjarni V; Hjartarson, Eirikur; Sigurdsson, Gunnar Th; Kong, Augustine; Helgason, Agnar; Masson, Gisli; Magnusson, Olafur Th; Thorsteinsdottir, Unnur; Stefansson, Kari

    2015-01-01

    We have accumulated considerable data on the genetic makeup of the Icelandic population by sequencing the whole genomes of 2,636 Icelanders to depth of at least 10X and by chip genotyping 101,584 more. The sequencing was done with Illumina technology. The median sequencing depth was 20X and 909 individuals were sequenced to a depth of at least 30X. We found 20 million single nucleotide polymorphisms (SNPs) and 1.5 million insertions/deletions (indels) that passed stringent quality control. Almost all the common SNPs (derived allele frequency (DAF) over 2%) that we identified in Iceland have been observed by either dbSNP (build 137) or the Exome Sequencing Project (ESP) while only 60 and 20% of rare (DAF<0.5%) SNPs and indels in coding regions, the most heavily studied parts of the genome, have been observed in the public databases. Features of our variant data, such as the transition/transversion ratio and the length distribution of indels, are similar to published reports. PMID:25977816

  15. Ancestral relationships of the major eukaryotic lineages.

    PubMed

    Sogin, M L; Morrison, H G; Hinkle, G; Silberman, J D

    1996-03-01

    Molecular systematics has revolutionized our understanding of microbial evolution. Phylogenetic frameworks relating all organisms in this biosphere can be inferred from comparisons of slowly evolving molecules such as the small and large subunit ribosomal RNAs. Unlike today's textbook standard, the "Five Kingdoms" (plants, animals, fungi, protists and bacteria), molecular studies define three primary lines of descent (Eukaryotes, Eubacteria, and Archaebacteria). Within the Eukaryotes, the "higher" kingdoms (Fungi, Plantae, and Animalia) are joined by at least two novel complex evolutionary assemblages, the "Alveolates" (ciliates, dinoflagellates and apicomplexans) and the "Stramenopiles" (diatoms, oomycetes, labyrinthulids, brown algae and chrysophytes). The separation of these eukaryotic groups (described as the eukaryotic "crown") occurred approximately 10^9 years ago and was preceded by a succession of earlier diverging protist lineages, some as ancient as the separation of the prokaryotic domains. The molecular phylogenies suggest that multiple endosymbiotic events introduced plastids into discrete eukaryotic lineages. PMID:9019131

  16. Origin of human chromosome 2: An ancestral telomere-telomere fusion

    SciTech Connect

    Ijdo, J.W.; Baldini, A.; Ward, D.C.; Reeders, S.T.; Wells, R.A.

    1991-10-15

    The authors identified two allelic genomic cosmids from human chromosome 2, c8.1 and c29B, each containing two inverted arrays of the vertebrate telomeric repeat in a head-to-head arrangement, 5'-(TTAGGG)n-(CCCTAA)m-3'. Sequences flanking this telomeric repeat are characteristic of present-day human pretelomeres. BAL-31 nuclease experiments with yeast artificial chromosome clones of human telomeres and fluorescence in situ hybridization reveal that sequences flanking these inverted repeats hybridize both to band 2q13 and to different, but overlapping, subsets of human chromosome ends. They conclude that the locus cloned in cosmids c8.1 and c29B is the relic of an ancient telomere-telomere fusion and marks the point at which two ancestral ape chromosomes fused to give rise to human chromosome 2.
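
    The head-to-head arrangement described above, a TTAGGG array running directly into its reverse complement CCCTAA, can be searched for with a short sketch. The function names and the minimum repeat count are assumptions for illustration, and the sketch requires perfect repeats, whereas real relic sequence is likely to be degenerate.

```python
import re

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_head_to_head_fusion(seq, min_repeats=2):
    """Return the index where a (TTAGGG)n array meets a (CCCTAA)m array.

    Hypothetical helper: returns None when no perfect junction is present.
    """
    forward = "TTAGGG"
    inverted = revcomp(forward)  # "CCCTAA"
    pattern = re.compile(
        r"(?:%s){%d,}(?=(?:%s){%d,})" % (forward, min_repeats, inverted, min_repeats)
    )
    match = pattern.search(seq)
    return match.end() if match else None

# Toy sequence: 4 bp of flank, three forward repeats, three inverted repeats.
toy = "ACGT" + "TTAGGG" * 3 + "CCCTAA" * 3 + "GGCA"
junction = find_head_to_head_fusion(toy)
print(junction)
```

    The returned index marks the first base of the inverted array, that is, the putative fusion point in the toy sequence.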

  17. Agnathan VIP, PACAP and Their Receptors: Ancestral Origins of Today's Highly Diversified Forms

    PubMed Central

    Ng, Stephanie Y. L.; Chow, Billy K. C.; Kasamatsu, Jun; Kasahara, Masanori; Lee, Leo T. O.

    2012-01-01

    VIP and PACAP are pleiotropic peptides belonging to the secretin superfamily of brain-gut peptides and interact specifically with three receptors (VPAC1, PAC1 and VPAC2) from the class II B G protein-coupled receptor family. There is immense interest regarding their molecular evolution which is often described closely alongside gene and/or genome duplications. Despite the wide array of information available in various vertebrates and one invertebrate, the tunicate, their evolutionary origins remain unresolved. Through searches of genome databases and molecular cloning techniques, the first lamprey VIP/PACAP ligands and VPAC receptors are identified from the Japanese lamprey. In addition, two VPAC receptors (VPACa/b) are identified from inshore hagfish and ligands predicted for sea lamprey. Phylogenetic analyses group these molecules into their respective PHI/VIP, PRP/PACAP and VPAC receptor families and show they resemble ancestral forms. Japanese lamprey VIP/PACAP peptides synthesized were tested with the hagfish VPAC receptors. hfVPACa transduces signal via both adenylyl cyclase and phospholipase C pathways, whilst hfVPACb was only able to transduce through the calcium pathway. In contrast to the widespread distribution of VIP/PACAP ligands and receptors in many species, the agnathan PACAP and VPAC receptors were found almost exclusively in the brain. In situ hybridisation further showed their abundance throughout the brain. The range of VIP/PACAP ligands and receptors found are highly useful, providing a glimpse into the evolutionary events both at the structural and functional levels. Though representative of ancestral forms, the VIP/PACAP ligands in particular have retained high sequence conservation indicating the importance of their functions even early in vertebrate evolution. During these nascent stages, only two VPAC receptors are likely responsible for eliciting functions before evolving later into specific subtypes post-Agnatha. We also propose VIP and

  18. Genomic characterization of a large outbreak of Legionella pneumophila serogroup 1 strains in Quebec City, 2012.

    PubMed

    Lévesque, Simon; Plante, Pier-Luc; Mendis, Nilmini; Cantin, Philippe; Marchand, Geneviève; Charest, Hugues; Raymond, Frédéric; Huot, Caroline; Goupil-Sormany, Isabelle; Desbiens, François; Faucher, Sébastien P; Corbeil, Jacques; Tremblay, Cécile

    2014-01-01

    During the summer of 2012, a major Legionella pneumophila serogroup 1 outbreak occurred in Quebec City, Canada, which caused 182 declared cases of Legionnaire's disease and included 13 fatalities. Legionella pneumophila serogroup 1 isolates from 23 patients as well as from 32 cooling towers located in the vicinity of the outbreak were recovered for analysis. In addition, 6 isolates from the 1996 Quebec City outbreak and 4 isolates from patients unrelated to both outbreaks were added to allow comparison. We characterized the isolates using pulsed-field gel electrophoresis, sequence-based typing, and whole genome sequencing. The comparison of patients-isolated strains to cooling tower isolates allowed the identification of the tower that was the source of the outbreak. Legionella pneumophila strain Quebec 2012 was identified as a ST-62 by sequence-based typing methodology. Two new Legionellaceae plasmids were found only in the epidemic strain. The LVH type IV secretion system was found in the 2012 outbreak isolates but not in the ones from the 1996 outbreak and only in half of the contemporary human isolates. The epidemic strains replicated more efficiently and were more cytotoxic to human macrophages than the environmental strains tested. At least four Icm/Dot effectors in the epidemic strains were absent in the environmental strains suggesting that some effectors could impact the intracellular replication in human macrophages. Sequence-based typing and pulsed-field gel electrophoresis combined with whole genome sequencing allowed the identification and the analysis of the causative strain including its likely environmental source. PMID:25105285

  19. Genome evolution of a tertiary dinoflagellate plastid.

    PubMed

    Gabrielsen, Tove M; Minge, Marianne A; Espelund, Mari; Tooming-Klunderud, Ave; Patil, Vishwanath; Nederbragt, Alexander J; Otis, Christian; Turmel, Monique; Shalchian-Tabrizi, Kamran; Lemieux, Claude; Jakobsen, Kjetill S

    2011-01-01

    The dinoflagellates have repeatedly replaced their ancestral peridinin-plastid by plastids derived from a variety of algal lineages ranging from green algae to diatoms. Here, we have characterized the genome of a dinoflagellate plastid of tertiary origin in order to understand the evolutionary processes that have shaped the organelle since it was acquired as a symbiont cell. To address this, the genome of the haptophyte-derived plastid in Karlodinium veneficum was analyzed by Sanger sequencing of library clones and 454 pyrosequencing of plastid enriched DNA fractions. The sequences were assembled into a single contig of 143 kb, encoding 70 proteins, 3 rRNAs and a nearly full set of tRNAs. Comparative genomics revealed massive rearrangements and gene losses compared to the haptophyte plastid; only a small fraction of the gene clusters usually found in haptophytes as well as other types of plastids are present in K. veneficum. Despite the reduced number of genes, the K. veneficum plastid genome has retained a large size due to expanded intergenic regions. Some of the plastid genes are highly diverged and may be pseudogenes or subject to RNA editing. Gene losses and rearrangements are also features of the genomes of the peridinin-containing plastids, apicomplexa and Chromera, suggesting that the evolutionary processes that once shaped these plastids have occurred at multiple independent occasions over the history of the Alveolata. PMID:21541332

  20. Genome Evolution of a Tertiary Dinoflagellate Plastid

    PubMed Central

    Espelund, Mari; Tooming-Klunderud, Ave; Patil, Vishwanath; Nederbragt, Alexander J.; Otis, Christian; Turmel, Monique; Shalchian-Tabrizi, Kamran; Lemieux, Claude; Jakobsen, Kjetill S.

    2011-01-01

    The dinoflagellates have repeatedly replaced their ancestral peridinin-plastid by plastids derived from a variety of algal lineages ranging from green algae to diatoms. Here, we have characterized the genome of a dinoflagellate plastid of tertiary origin in order to understand the evolutionary processes that have shaped the organelle since it was acquired as a symbiont cell. To address this, the genome of the haptophyte-derived plastid in Karlodinium veneficum was analyzed by Sanger sequencing of library clones and 454 pyrosequencing of plastid enriched DNA fractions. The sequences were assembled into a single contig of 143 kb, encoding 70 proteins, 3 rRNAs and a nearly full set of tRNAs. Comparative genomics revealed massive rearrangements and gene losses compared to the haptophyte plastid; only a small fraction of the gene clusters usually found in haptophytes as well as other types of plastids are present in K. veneficum. Despite the reduced number of genes, the K. veneficum plastid genome has retained a large size due to expanded intergenic regions. Some of the plastid genes are highly diverged and may be pseudogenes or subject to RNA editing. Gene losses and rearrangements are also features of the genomes of the peridinin-containing plastids, apicomplexa and Chromera, suggesting that the evolutionary processes that once shaped these plastids have occurred at multiple independent occasions over the history of the Alveolata. PMID:21541332

  1. Exceptionally high cumulative percentage of NUMTs originating from linear mitochondrial DNA molecules in the Hydra magnipapillata genome

    PubMed Central

    2013-01-01

    Background In contrast to most animal genomes, mitochondrial genomes in species belonging to the phylum Cnidaria show distinct variations in genome structure, including the mtDNA structure (linear or circular) and the presence or absence of introns in protein-coding genes. Therefore, the analysis of nuclear insertions of mitochondrial sequences (NUMTs) in cnidarians allows us to compare the NUMT content in animals with different mitochondrial genome structures. Results NUMT identification in the Hydra magnipapillata, Nematostella vectensis and Acropora digitifera genomes showed that the NUMT density in the H. magnipapillata genome clearly exceeds that in other two cnidarians with circular mitochondrial genomes. We found that H. magnipapillata is an exceptional ancestral metazoan with a high NUMT cumulative percentage but a large genome, and its mitochondrial genome linearisation might be responsible for the NUMT enrichment. We also detected the co-transposition of exonic and intronic fragments within NUMTs in N. vectensis and provided direct evidence that mitochondrial sequences can be transposed into the nuclear genome through DNA-mediated fragment transfer. In addition, NUMT expression analyses showed that NUMTs are co-expressed with adjacent protein-coding genes, suggesting the relevance of their biological function. Conclusions Taken together, our results provide valuable information for understanding the impact of mitochondrial genome structure on the interaction of mitochondrial molecules and nuclear genomes. PMID:23826818

  2. Length Distribution of Ancestral Tracks under a General Admixture Model and Its Applications in Population History Inference

    PubMed Central

    Ni, Xumin; Yang, Xiong; Guo, Wei; Yuan, Kai; Zhou, Ying; Ma, Zhiming; Xu, Shuhua

    2016-01-01

    The length of ancestral tracks decays with the passing of generations, which can be used to infer population admixture histories. Previous studies have shown the power in recovering the histories of admixed populations via the length distributions of ancestral tracks even under simple models. We believe that the deduction of length distributions under a general model will greatly elevate the power. Here we first deduced the length distributions under a general model and proposed general principles in parameter estimation and model selection with the deduced length distributions. Next, we focused on studying the length distributions and their applications under three typical special cases. Extensive simulations showed that the length distributions of ancestral tracks were well predicted by our theoretical framework. We further developed a new method, AdmixInfer, based on the length distributions and good performance was observed when it was applied to infer population histories under the three typical models. Notably, our method was insensitive to demographic history, sample size and threshold to discard short tracks. Finally, good performance was also observed when applied to some real datasets of African Americans, Mexicans and South Asian populations from the HapMap project and the Human Genome Diversity Project. PMID:26818889
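
    To make the decay of track lengths concrete, here is a minimal sketch under a deliberately simplified assumption that is not taken from the abstract: a single admixture pulse T generations ago, with track lengths (in Morgans) from a source contributing proportion m treated as exponentially distributed with rate (1 - m) * T. This is not the AdmixInfer method, and the function name and parameters are hypothetical. Because the exponential distribution is memoryless, discarding tracks shorter than a threshold only shifts the mean by that threshold.

```python
def estimate_generations(track_lengths_morgans, ancestry_proportion, min_length=0.0):
    """Maximum-likelihood estimate of T under the assumed exponential model.

    Tracks no longer than min_length are discarded; the remaining excess
    lengths (length - min_length) are exponential with the same rate.
    """
    kept = [x for x in track_lengths_morgans if x > min_length]
    if not kept:
        raise ValueError("no tracks longer than min_length")
    mean_excess = sum(kept) / len(kept) - min_length
    rate = 1.0 / mean_excess                # estimate of (1 - m) * T
    return rate / (1.0 - ancestry_proportion)

# Toy data: mean track length 0.2 Morgans from a source with m = 0.5.
t_hat = estimate_generations([0.1, 0.2, 0.3], ancestry_proportion=0.5)
print(round(t_hat, 3))
```

    Shorter average tracks imply a larger estimate of T, which is the qualitative relationship the abstract relies on.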

  3. Retroviral enhancer detection insertions in zebrafish combined with comparative genomics reveal genomic regulatory blocks - a fundamental feature of vertebrate genomes

    PubMed Central

    Kikuta, Hiroshi; Fredman, David; Rinkwitz, Silke; Lenhard, Boris; Becker, Thomas S

    2007-01-01

    A large-scale enhancer detection screen was performed in the zebrafish using a retroviral vector carrying a basal promoter and a fluorescent protein reporter cassette. Analysis of insertional hotspots uncovered areas around developmental regulatory genes in which an insertion results in the same global expression pattern, irrespective of exact position. These areas coincide with vertebrate chromosomal segments containing identical gene order; a phenomenon known as conserved synteny and thought to be a vestige of evolution. Genomic comparative studies have found large numbers of highly conserved noncoding elements (HCNEs) spanning these and other loci. HCNEs are thought to act as transcriptional enhancers based on the finding that many of those that have been tested direct tissue specific expression in transient or transgenic assays. Although gene order in hox and other gene clusters has long been known to be conserved because of shared regulatory sequences or overlapping transcriptional units, the chromosomal areas found through insertional hotspots contain only one or a few developmental regulatory genes as well as phylogenetically unrelated genes. We have termed these regions genomic regulatory blocks (GRBs), and show that they underlie the phenomenon of conserved synteny through all sequenced vertebrate genomes. After teleost whole genome duplication, a subset of GRBs were retained in two copies, underwent degenerative changes compared with tetrapod loci that exist as single copy, and that therefore can be viewed as representing the ancestral form. We discuss these findings in light of evolution of vertebrate chromosomal architecture and the identification of human disease mutations. PMID:18047696

  4. Bilingualism (Ancestral Language Maintenance) among Native American, Vietnamese American, and Hispanic American College Students.

    ERIC Educational Resources Information Center

    Wharry, Cheryl

    1993-01-01

    A survey of 21 Hispanic, 22 Native American, and 10 Vietnamese American college students found that adoption or maintenance of ancestral language was related to attitudes toward ancestral language, beliefs about parental attitudes, and integrative motivation (toward family and ancestral ethnic group). There were significant differences by gender…

  5. Amplification of an ancestral mammalian L1 family of long interspersed repeated DNA occurred just before the murine radiation

    SciTech Connect

    Pascale, E.; Valle, E.; Furano, A.V.

    1990-12-01

    Each mammalian genus examined so far contains 50,000-100,000 members of an L1 (LINE 1) family of long interspersed repeated DNA elements. Current knowledge on the evolution of L1 families presents a paradox because, although L1 families have been in mammalian genomes since before the mammalian radiation approximately 80 million years ago, most members of the L1 families are only a few million years old. Accordingly it has been suggested either that the extensive amplification that characterizes present-day L1 families did not occur in the past or that old members were removed as new ones were generated. However, the authors show here that an ancestral rodent L1 family was extensively amplified approximately 10 million years ago and that the relics of this amplification have persisted in modern murine genomes. This amplification occurred just before the divergence of modern murine genera from their common ancestor and identifies the murine node in the lineage of modern muroid rodents. The results suggest that repeated amplification of L1 elements is a feature of the evolution of mammalian genomes and that ancestral amplification events could provide a useful tool for determining mammalian lineages.

  6. Amplification of an ancestral mammalian L1 family of long interspersed repeated DNA occurred just before the murine radiation.

    PubMed Central

    Pascale, E; Valle, E; Furano, A V

    1990-01-01

    Each mammalian genus examined so far contains 50,000-100,000 members of an L1 (LINE 1) family of long interspersed repeated DNA elements. Current knowledge on the evolution of L1 families presents a paradox because, although L1 families have been in mammalian genomes since before the mammalian radiation approximately 80 million years ago, most members of the L1 families are only a few million years old. Accordingly it has been suggested either that the extensive amplification that characterizes present-day L1 families did not occur in the past or that old members were removed as new ones were generated. However, we show here that an ancestral rodent L1 family was extensively amplified approximately 10 million years ago and that the relics (approximately 60,000 copies) of this amplification have persisted in modern murine genomes (Old World rats and mice). This amplification occurred just before the divergence of modern murine genera from their common ancestor and identifies the murine node in the lineage of modern muroid rodents. Our results suggest that repeated amplification of L1 elements is a feature of the evolution of mammalian genomes and that ancestral amplification events could provide a useful tool for determining mammalian lineages. PMID:2251288

  7. Even modest prediction accuracy of genomic models can have large clinical utility

    PubMed Central

    Dhurandhar, Emily J.; Vazquez, Ana I.; Argyropoulos, George A.; Allison, David B.

    2014-01-01

    Whole Genome Prediction (WGP) jointly fits thousands of SNPs into a regression model to yield estimates for the contribution of markers to the overall variance of a particular trait, and for their associations with that trait. To date, WGP has offered only modest prediction accuracy, but in some cases even modest prediction accuracy may be useful. We provide an illustration of this using a theoretical simulation that used WGP to predict weight loss after bariatric surgery with moderate accuracy (R² = 0.07) to assess the clinical utility of WGP despite these limitations. Prevention of Type 2 Diabetes (T2DM) post-surgery was considered the major outcome. Treating only patients above a predefined threshold of predicted weight loss in our simulation, in the realistic context of finite resources for the surgery, significantly reduced lifetime risk of T2DM in the treatable population by selecting those most likely to succeed. Thus, our example illustrates how WGP may be clinically useful in some situations, and even with moderate accuracy, may provide a clear path for turning personalized medicine from theory to reality. PMID:25506355
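
The intuition behind selecting patients above a prediction threshold can be sketched numerically. The example below is not the authors' simulation: it assumes, purely for illustration, that predicted and true weight loss are standardized and jointly normal with correlation equal to the square root of the reported R² of 0.07, and that a fixed fraction of patients (here 20%, a made-up figure) can be treated. The function name `expected_gain` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def expected_gain(r_squared, selected_fraction):
    """Mean true outcome, in SD units above the population mean, among the
    top `selected_fraction` of patients ranked by predicted outcome.

    Assumes (for illustration only) a standardized bivariate normal model,
    where E[Y | prediction > t] = r * pdf(t) / (1 - cdf(t)).
    """
    r = sqrt(r_squared)                      # correlation implied by R^2
    std_normal = NormalDist()                # mean 0, standard deviation 1
    threshold = std_normal.inv_cdf(1.0 - selected_fraction)
    return r * std_normal.pdf(threshold) / selected_fraction


# R^2 = 0.07 comes from the abstract; the 20% treated fraction is assumed.
gain = expected_gain(0.07, 0.20)
print(f"correlation = {sqrt(0.07):.3f}, expected gain = {gain:.2f} SD")
```

Under these assumptions the selected group is expected to do a bit over a third of a standard deviation better than an unselected group, which shows how a weak predictor can still shift outcomes when treatment capacity is limited.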

  8. A general framework for association tests with multivariate traits in large-scale genomics studies.

    PubMed

    He, Qianchuan; Avery, Christy L; Lin, Dan-Yu

    2013-12-01

    Genetic association studies often collect data on multiple traits that are correlated. Discovery of genetic variants influencing multiple traits can lead to better understanding of the etiology of complex human diseases. Conventional univariate association tests may miss variants that have weak or moderate effects on individual traits. We propose several multivariate test statistics to complement univariate tests. Our framework covers both studies of unrelated individuals and family studies and allows any type/mixture of traits. We relate the marginal distributions of multivariate traits to genetic variants and covariates through generalized linear models without modeling the dependence among the traits or family members. We construct score-type statistics, which are computationally fast and numerically stable even in the presence of covariates and which can be combined efficiently across studies with different designs and arbitrary patterns of missing data. We compare the power of the test statistics both theoretically and empirically. We provide a strategy to determine genome-wide significance that properly accounts for the linkage disequilibrium (LD) of genetic variants. The application of the new methods to the meta-analysis of five major cardiovascular cohort studies identifies a new locus (HSCB) that is pleiotropic for the four traits analyzed. PMID:24227293

  9. High proportion of large genomic deletions and a genotype–phenotype update in 80 unrelated families with juvenile polyposis syndrome

    PubMed Central

    Aretz, S; Stienen, D; Uhlhaas, S; Stolte, M; Entius, M M; Loff, S; Back, W; Kaufmann, A; Keller, K‐M; Blaas, S H; Siebert, R; Vogt, S; Spranger, S; Holinski‐Feder, E; Sunde, L; Propping, P; Friedl, W

    2007-01-01

    Background In patients with juvenile polyposis syndrome (JPS) the frequency of large genomic deletions in the SMAD4 and BMPR1A genes was unknown. Methods Mutation and phenotype analysis was used in 80 unrelated patients of whom 65 met the clinical criteria for JPS (typical JPS) and 15 were suspected to have JPS. Results By direct sequencing of the two genes, point mutations were identified in 30 patients (46% of typical JPS). Using MLPA, large genomic deletions were found in 14% of all patients with typical JPS (six deletions in SMAD4 and three deletions in BMPR1A). Mutation analysis of the PTEN gene in the remaining 41 mutation negative cases uncovered a point mutation in two patients (5%). SMAD4 mutation carriers had a significantly higher frequency of gastric polyposis (73%) than did patients with BMPR1A mutations (8%) (p<0.001); all seven cases of gastric cancer occurred in families with SMAD4 mutations. SMAD4 mutation carriers with gastric polyps were significantly older at gastroscopy than those without (p<0.001). In 22% of the 23 unrelated SMAD4 mutation carriers, hereditary hemorrhagic telangiectasia (HHT) was also diagnosed clinically. The documented histologic findings encompassed a wide distribution of different polyp types, comparable with that described in hereditary mixed polyposis syndromes (HMPS). Conclusions Screening for large deletions raised the mutation detection rate to 60% in the 65 patients with typical JPS. A strong genotype‐phenotype correlation for gastric polyposis, gastric cancer, and HHT was identified, which should have implications for counselling and surveillance. Histopathological results in hamartomatous polyposis syndromes must be critically interpreted. PMID:17873119
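
The detection rates quoted in this abstract can be cross-checked against its own counts. The short sketch below only redoes that arithmetic; the variable and function names are illustrative, and it assumes that every point mutation and large deletion was found in a distinct patient with typical JPS.

```python
# Counts taken from the abstract; names are illustrative only.
all_patients = 80
typical_jps = 65
point_mutations = 30
large_deletions = 6 + 3          # SMAD4 deletions + BMPR1A deletions


def pct(count, total):
    """Percentage of `count` in `total`, rounded to the nearest integer."""
    return round(100 * count / total)


point_rate = pct(point_mutations, typical_jps)
deletion_rate = pct(large_deletions, typical_jps)
combined_rate = pct(point_mutations + large_deletions, typical_jps)

# Patients left without a SMAD4/BMPR1A finding, then screened for PTEN.
mutation_negative = all_patients - (point_mutations + large_deletions)
pten_rate = pct(2, mutation_negative)

print(point_rate, deletion_rate, combined_rate, mutation_negative, pten_rate)
```

The rounded values agree with the 46%, 14%, 60%, 41 cases, and 5% reported in the abstract.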

  10. The Historical Speciation of Mauremys Sensu Lato: Ancestral Area Reconstruction and Interspecific Gene Flow Level Assessment Provide New Insights

    PubMed Central

    Zhou, Huaxing; Jiang, Yuan; Nie, Liuwang; Yin, Huazong; Li, Haifeng; Dong, Xianmei; Zhao, Feifei; Zhang, Huanhuan; Pu, Youguang; Huang, Zhenfeng; Song, Jiaolian; Sun, Entao

    2015-01-01

    Mauremys sensu lato was divided into Mauremys, Chinemys, Ocadia, and Annamemys based on earlier research on morphology. Phylogenetic research on this group has been controversial because of disagreements regarding taxonomy, and the historical speciation is still poorly understood. In this study, 32 individuals of eight species that are widely distributed in Eurasia were collected. The complete mitochondrial (mt) sequences of 14 individuals of eight species were sequenced. Phylogenetic relationships, interspecific divergence times, and ancestral area reconstructions were explored using mt genome data (10,854 bp). Subsequent interspecific gene flow level assessment was performed using five unlinked polymorphic microsatellite loci. The Bayesian and maximum likelihood analyses revealed a paraphyletic relationship among four old genera (Mauremys, Annamemys, Chinemys, and Ocadia) and suggested the four old genera should be merged into the genus (Mauremys). Ancestral area reconstruction and divergence time estimation suggested Southeast Asia may be the area of origin for the common ancestral species of this genus and genetic drift may have played a decisive role in species divergence due to the isolated event of a glacial age. However, M. japonica may have been speciated due to the creation of the island of Japan. The detection of extensive gene flow suggested no vicariance occurred between Asia and Southeast Asia. Inconsistent results between gene flow assessment and phylogenetic analysis revealed the hybrid origin of M. mutica (Southeast Asian). Here ancestral area reconstruction and interspecific gene flow level assessment were first used to explore species origins and evolution of Mauremys sensu lato, which provided new insights on this genus. PMID:26657158

  11. An ancestral bacterial division system is widespread in eukaryotic mitochondria.

    PubMed

    Leger, Michelle M; Petrů, Markéta; Žárský, Vojtěch; Eme, Laura; Vlček, Čestmír; Harding, Tommy; Lang, B Franz; Eliáš, Marek; Doležal, Pavel; Roger, Andrew J

    2015-08-18

    Bacterial division initiates at the site of a contractile Z-ring composed of polymerized FtsZ. The location of the Z-ring in the cell is controlled by a system of three mutually antagonistic proteins, MinC, MinD, and MinE. Plastid division is also known to be dependent on homologs of these proteins, derived from the ancestral cyanobacterial endosymbiont that gave rise to plastids. In contrast, the mitochondria of model systems such as Saccharomyces cerevisiae, mammals, and Arabidopsis thaliana seem to have replaced the ancestral α-proteobacterial Min-based division machinery with host-derived dynamin-related proteins that form outer contractile rings. Here, we show that the mitochondrial division system of these model organisms is the exception, rather than the rule, for eukaryotes. We describe endosymbiont-derived, bacterial-like division systems comprising FtsZ and Min proteins in diverse less-studied eukaryote protistan lineages, including jakobid and heterolobosean excavates, a malawimonad, stramenopiles, amoebozoans, a breviate, and an apusomonad. For two of these taxa, the amoebozoan Dictyostelium purpureum and the jakobid Andalucia incarcerata, we confirm a mitochondrial localization of these proteins by their heterologous expression in Saccharomyces cerevisiae. The discovery of a proteobacterial-like division system in mitochondria of diverse eukaryotic lineages suggests that it was the ancestral feature of all eukaryotic mitochondria and has been supplanted by a host-derived system multiple times in distinct eukaryote lineages. PMID:25831547

  12. Infant and juvenile growth in ancestral Pueblo Indians.

    PubMed

    Schillaci, Michael A; Nikitovic, Dejana; Akins, Nancy J; Tripp, Lianne; Palkovich, Ann M

    2011-06-01

    The present study examines patterns of infant and juvenile growth in a diachronic sample of ancestral Pueblo Indians (AD 1300-1680) from the American Southwest. An assessment of growth patterns is accompanied by an evaluation of pathological conditions often considered to be indicators of nutritional deficiencies and/or gastrointestinal infections. Growth patterns and the distribution of pathological conditions are interpreted relative to culturally relevant age categories defined by Puebloan rites of passage described in the ethnographic literature. A visual comparison of growth distance curves revealed that relative to a modern comparative group our sample of ancestral Pueblo infant and juveniles exhibited faltering growth beginning soon after birth to about 5 years of age. A comparison of curves describing growth relative to adult femoral length, however, indicated reduced growth occurring later, by around 2 years of age. Similar to previous studies, we observed a high proportion of nonsurvivors exhibiting porotic cranial lesions during the first 2 years of life. Contrary to expectations, infants and juveniles without evidence of porotic cranial lesions exhibited a higher degree of stunting. Our study is generally consistent with previous research reporting poor health and high mortality for ancestral Pueblo Indian infants and juveniles. Through use of a culturally relevant context defining childhood, we argue that the observed poor health and high mortality in our sample occur before the important transition from young to older child and the concomitant initial incorporation into tribal ritual organization. PMID:21469079

  13. An ancestral bacterial division system is widespread in eukaryotic mitochondria

    PubMed Central

    Leger, Michelle M.; Petrů, Markéta; Žárský, Vojtěch; Eme, Laura; Vlček, Čestmír; Harding, Tommy; Lang, B. Franz; Eliáš, Marek; Doležal, Pavel; Roger, Andrew J.

    2015-01-01

    Bacterial division initiates at the site of a contractile Z-ring composed of polymerized FtsZ. The location of the Z-ring in the cell is controlled by a system of three mutually antagonistic proteins, MinC, MinD, and MinE. Plastid division is also known to be dependent on homologs of these proteins, derived from the ancestral cyanobacterial endosymbiont that gave rise to plastids. In contrast, the mitochondria of model systems such as Saccharomyces cerevisiae, mammals, and Arabidopsis thaliana seem to have replaced the ancestral α-proteobacterial Min-based division machinery with host-derived dynamin-related proteins that form outer contractile rings. Here, we show that the mitochondrial division system of these model organisms is the exception, rather than the rule, for eukaryotes. We describe endosymbiont-derived, bacterial-like division systems comprising FtsZ and Min proteins in diverse less-studied eukaryote protistan lineages, including jakobid and heterolobosean excavates, a malawimonad, stramenopiles, amoebozoans, a breviate, and an apusomonad. For two of these taxa, the amoebozoan Dictyostelium purpureum and the jakobid Andalucia incarcerata, we confirm a mitochondrial localization of these proteins by their heterologous expression in Saccharomyces cerevisiae. The discovery of a proteobacterial-like division system in mitochondria of diverse eukaryotic lineages suggests that it was the ancestral feature of all eukaryotic mitochondria and has been supplanted by a host-derived system multiple times in distinct eukaryote lineages. PMID:25831547

  14. Ancestral facial morphology of Old World higher primates.

    PubMed Central

    Benefit, B R; McCrossin, M L

    1991-01-01

    Fossil remains of the cercopithecoid Victoriapithecus recently recovered from middle Miocene deposits of Maboko Island (Kenya) provide evidence of the cranial anatomy of Old World monkeys prior to the evolutionary divergence of the extant subfamilies Colobinae and Cercopithecinae. Victoriapithecus shares a suite of craniofacial features with the Oligocene catarrhine Aegyptopithecus and early Miocene hominoid Afropithecus. All three genera manifest supraorbital costae, anteriorly convergent temporal lines, the absence of a postglabellar fossa, a moderate to long snout, great facial height below the orbits, a deep cheek region, and anteriorly tapering premaxilla. The shared presence of these features in a catarrhine generally ancestral to apes and Old World monkeys, an early ape, and an early Old World monkey indicates that they are primitive characteristics that typified the last common ancestor of Hominoidea and Cercopithecoidea. These results contradict prevailing cranial morphotype reconstructions for ancestral catarrhines as Colobus- or Hylobates-like, characterized by a globular anterior braincase and orthognathy. By resolving several equivocal craniofacial morphocline polarities, these discoveries lay the foundation for a revised interpretation of the ancestral cranial morphology of Catarrhini more consistent with neontological and existing paleontological evidence. PMID:2052606

  15. Dynamic evolution of Rht-1 homologous regions in grass genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Bread wheat contains A, B, and D subgenomes with its well characterized ancestral genomes that exist at the diploid and tetraploid levels. Therefore, the wheat genome system acts as a model species for studying genome evolutionary dynamics. Here, we performed intra- and inter-species comparative ana...

  16. Analysis of FOXO1 mutations in diffuse large B-cell lymphoma | Office of Cancer Genomics

    Cancer.gov

    Abstract: Diffuse large B-cell lymphoma (DLBCL) accounts for 30% to 40% of newly diagnosed lymphomas and has an overall cure rate of approximately 60%. Previously, we observed FOXO1 mutations in non-Hodgkin lymphoma patient samples. To explore the effects of FOXO1 mutations, we assessed FOXO1 status in 279 DLBCL patient samples and 22 DLBCL-derived cell lines.

  17. Genomic characterization of a large panel of patient-derived hepatocellular carcinoma xenograft tumor models for preclinical development.

    PubMed

    Gu, Qingyang; Zhang, Bin; Sun, Hongye; Xu, Qiang; Tan, Yexiong; Wang, Guan; Luo, Qin; Xu, Weiguo; Yang, Shuqun; Li, Jian; Fu, Jing; Chen, Lei; Yuan, Shengxian; Liang, Guibai; Ji, Qunsheng; Chen, Shu-Hui; Chan, Chi-Chung; Zhou, Weiping; Xu, Xiaowei; Wang, Hongyang; Fang, Douglas D

    2015-08-21

    Lack of clinically relevant tumor models dramatically hampers development of effective therapies for hepatocellular carcinoma (HCC). Establishment of patient-derived xenograft (PDX) models that faithfully recapitulate the genetic and phenotypic features of HCC becomes important. In this study, we first established a cohort of 65 stable PDX models of HCC from corresponding Chinese patients. Then we showed that the histology and gene expression patterns of PDX models were highly consistent between xenografts and case-matched original tumors. Genetic alterations, including mutations and DNA copy number alterations (CNAs), of the xenografts correlated well with the published data of HCC patient specimens. Furthermore, differential responses to sorafenib, the standard-of-care agent, in randomly chosen xenografts were unveiled. Finally, in the models expressing high levels of FGFR1 gene according to the genomic data, FGFR1 inhibitor lenvatinib showed greater efficacy than sorafenib. Taken together, our data indicate that PDX models resemble histopathological and genomic characteristics of clinical HCC tumors, as well as recapitulate the differential responses of HCC patients to the standard-of-care treatment. Overall, this large collection of PDX models becomes a clinically relevant platform for drug screening, biomarker discovery and translational research in preclinical setting. PMID:26062443

  18. Large gene overlaps and tRNA processing in the compact mitochondrial genome of the crustacean Armadillidium vulgare.

    PubMed

    Doublet, Vincent; Ubrig, Elodie; Alioua, Abdelmalek; Bouchon, Didier; Marcadé, Isabelle; Maréchal-Drouard, Laurence

    2015-01-01

    A faithful expression of the mitochondrial DNA is crucial for cell survival. Animal mitochondrial DNA (mtDNA) presents a highly compact gene organization. The typical 16.5 kbp animal mtDNA encodes 13 proteins, 2 rRNAs and 22 tRNAs. In the backyard pillbug Armadillidium vulgare, the rather small 13.9 kbp mtDNA encodes the same set of proteins and rRNAs as compared to animal kingdom mtDNA, but seems to harbor an incomplete set of tRNA genes. Here, we first confirm the expression of 13 tRNA genes in this mtDNA. Then we show the extensive repair of a truncated tRNA, the expression of tRNA involved in large gene overlaps and of tRNA genes partially or fully integrated within protein-coding genes in either direct or opposite orientation. Under selective pressure, overlaps between genes have been likely favored for strong genome size reduction. Our study underlines the existence of unknown biochemical mechanisms for the complete gene expression of A. vulgare mtDNA, and of co-evolutionary processes to keep overlapping genes functional in a compacted mitochondrial genome. PMID:26361137

  19. Mining the genome for susceptibility to diabetic nephropathy: the role of large-scale studies and consortia.

    PubMed

    Iyengar, Sudha K; Freedman, Barry I; Sedor, John R

    2007-03-01

    Approximately 30% of individuals with type 1 and type 2 diabetes develop persistent albuminuria, lose renal function, and are at increased risk for cardiovascular and other microvascular complications. Diabetes and kidney diseases rank within the top 10 causes of death in Westernized countries and cause significant morbidity. Given these observations, genetic, genomic, and proteomic investigations have been initiated to better define basic mechanisms for disease initiation and progression, to identify individuals at risk for diabetic complications, and to develop more efficacious therapies. In this review we have focused on linkage analyses of candidate genes or chromosomal regions, or coarse genome-wide scans, which have mapped either categorical (chronic kidney disease or end-stage renal disease) or quantitative kidney traits (albuminuria/proteinuria or glomerular filtration rate). Most loci identified to date have not been replicated; however, several linked chromosomal regions are concordant between independent samples, suggesting the presence of a diabetic nephropathy gene. Two genes, carnosinase (CNDP1) on 18q, and engulfment and cell motility 1 (ELMO1) on 7p14, have been identified as diabetic nephropathy susceptibility genes, but these results require authentication. The availability of patient data sets with large sample sizes, improvements in informatics, genotyping technology, and statistical methodologies should accelerate the discovery of valid diabetic nephropathy susceptibility genes. PMID:17418689

  20. Large gene overlaps and tRNA processing in the compact mitochondrial genome of the crustacean Armadillidium vulgare

    PubMed Central

    Doublet, Vincent; Ubrig, Elodie; Alioua, Abdelmalek; Bouchon, Didier; Marcadé, Isabelle; Maréchal-Drouard, Laurence

    2015-01-01

    A faithful expression of the mitochondrial DNA is crucial for cell survival. Animal mitochondrial DNA (mtDNA) presents a highly compact gene organization. The typical 16.5 kbp animal mtDNA encodes 13 proteins, 2 rRNAs and 22 tRNAs. In the backyard pillbug Armadillidium vulgare, the rather small 13.9 kbp mtDNA encodes the same set of proteins and rRNAs as compared to animal kingdom mtDNA, but seems to harbor an incomplete set of tRNA genes. Here, we first confirm the expression of 13 tRNA genes in this mtDNA. Then we show the extensive repair of a truncated tRNA, the expression of tRNA involved in large gene overlaps and of tRNA genes partially or fully integrated within protein-coding genes in either direct or opposite orientation. Under selective pressure, overlaps between genes have been likely favored for strong genome size reduction. Our study underlines the existence of unknown biochemical mechanisms for the complete gene expression of A. vulgare mtDNA, and of co-evolutionary processes to keep overlapping genes functional in a compacted mitochondrial genome. PMID:26361137

  1. Genomic organization and reproductive regulation of a large lipid transfer protein in the varroa mite, Varroa destructor (Anderson & Trueman).

    PubMed

    Cabrera, A R; Shirk, P D; Duehl, A J; Donohue, K V; Grozinger, C M; Evans, J D; Teal, P E A

    2013-10-01

    The complete genomic region and corresponding transcript of the most abundant protein in phoretic varroa mites, Varroa destructor (Anderson & Trueman), were sequenced and have homology with acarine hemelipoglycoproteins and the large lipid transfer protein (LLTP) super family. The genomic sequence of VdLLTP included 14 introns and the mature transcript coded for a predicted polypeptide of 1575 amino acid residues. VdLLTP shared a minimum of 25% sequence identity with acarine LLTPs. Phylogenetic assessment showed VdLLTP was most closely related to Metaseiulus occidentalis vitellogenin and LLTP proteins of ticks; however, no heme binding by VdLLTP was detected. Analysis of lipids associated with VdLLTP showed that it was a carrier for free and esterified C12-C22 fatty acids from triglycerides, diacylglycerides and monoacylglycerides. Additionally, cholesterol and β-sitosterol were found as cholesterol esters linked to common fatty acids. Transcript levels of VdLLTP were 42 and 310 times higher in phoretic female mites when compared with males and quiescent deutonymphs, respectively. Coincident with initiation of the reproductive phase, VdLLTP transcript levels declined to a third of those in phoretic female mites. VdLLTP functions as an important lipid transporter and should provide a significant RNA interference target for assessing the control of varroa mites. PMID:23834736

  2. DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies.

    PubMed

    Ye, Chengxi; Hill, Christopher M; Wu, Shigang; Ruan, Jue; Ma, Zhanshan Sam

    2016-01-01

    The highly anticipated transition from next generation sequencing (NGS) to third generation sequencing (3GS) has been difficult primarily due to high error rates and excessive sequencing cost. The high error rates make the assembly of long erroneous reads of large genomes challenging because existing software solutions are often overwhelmed by error correction tasks. Here we report a hybrid assembly approach that simultaneously utilizes NGS and 3GS data to address both issues. We gain advantages from three general and basic design principles: (i) Compact representation of the long reads leads to efficient alignments. (ii) Base-level errors can be skipped; structural errors need to be detected and corrected. (iii) Structurally correct 3GS reads are assembled and polished. In our implementation, preassembled NGS contigs are used to derive the compact representation of the long reads, motivating an algorithmic conversion from a de Bruijn graph to an overlap graph, the two major assembly paradigms. Moreover, since NGS and 3GS data can compensate for each other, our hybrid assembly approach reduces both of their sequencing requirements. Experiments show that our software is able to assemble mammalian-sized genomes orders of magnitude more quickly than existing methods without consuming a lot of memory, while saving about half of the sequencing cost. PMID:27573208
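
The first two design principles in this abstract (a compact representation of long reads, and skipping base-level errors) can be illustrated with a toy sketch. This is not the DBG2OLC implementation: the function names, the exact k-mer lookup, and the `min_shared` cutoff are all assumptions made for illustration. Each long read is reduced to the ordered list of NGS contig identifiers it touches, and candidate overlaps are then found by comparing those short lists instead of the raw, error-prone bases.

```python
from itertools import combinations


def compress_read(read, contig_kmers, k):
    """Reduce a long read to the ordered list of contig IDs whose k-mers it contains.

    `contig_kmers` maps a k-mer string to the ID of the contig it came from
    (a hypothetical index built from preassembled NGS contigs). K-mers that
    hit no contig, e.g. because of sequencing errors, are simply skipped.
    """
    ids = []
    for i in range(len(read) - k + 1):
        contig_id = contig_kmers.get(read[i:i + k])
        # Record a contig once per consecutive run of hits.
        if contig_id is not None and (not ids or ids[-1] != contig_id):
            ids.append(contig_id)
    return ids


def candidate_overlaps(compressed, min_shared=2):
    """Return pairs of read names whose compressed forms share enough contig IDs."""
    pairs = []
    for a, b in combinations(sorted(compressed), 2):
        if len(set(compressed[a]) & set(compressed[b])) >= min_shared:
            pairs.append((a, b))
    return pairs


# Toy data: contig IDs stand in for whole preassembled contigs.
kmer_index = {"ACG": 1, "GTT": 2}
toy_ids = compress_read("ACGTT", kmer_index, k=3)

reads = {"r1": [1, 2, 3, 4], "r2": [3, 4, 5], "r3": [7, 8]}
overlaps = candidate_overlaps(reads)
print(toy_ids, overlaps)
```

Because the comparison works on contig identifiers, a mismatched base inside a read only drops a k-mer hit rather than breaking the alignment, which is the sense in which base-level errors can be skipped in this sketch.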

  3. DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies

    PubMed Central

    Ye, Chengxi; Hill, Christopher M.; Wu, Shigang; Ruan, Jue; Ma, Zhanshan (Sam)

    2016-01-01

    The highly anticipated transition from next generation sequencing (NGS) to third generation sequencing (3GS) has been difficult primarily due to high error rates and excessive sequencing cost. The high error rates make the assembly of long erroneous reads of large genomes challenging because existing software solutions are often overwhelmed by error correction tasks. Here we report a hybrid assembly approach that simultaneously utilizes NGS and 3GS data to address both issues. We gain advantages from three general and basic design principles: (i) Compact representation of the long reads leads to efficient alignments. (ii) Base-level errors can be skipped; structural errors need to be detected and corrected. (iii) Structurally correct 3GS reads are assembled and polished. In our implementation, preassembled NGS contigs are used to derive the compact representation of the long reads, motivating an algorithmic conversion from a de Bruijn graph to an overlap graph, the two major assembly paradigms. Moreover, since NGS and 3GS data can compensate for each other, our hybrid assembly approach reduces both of their sequencing requirements. Experiments show that our software is able to assemble mammalian-sized genomes orders of magnitude more quickly than existing methods without consuming a lot of memory, while saving about half of the sequencing cost. PMID:27573208

  4. Large genomic fragment deletion and functional gene cassette knock-in via Cas9 protein mediated genome editing in one-cell rodent embryos

    PubMed Central

    Wang, Liren; Shao, Yanjiao; Guan, Yuting; Li, Liang; Wu, Lijuan; Chen, Fangrui; Liu, Meizhen; Chen, Huaqing; Ma, Yanlin; Ma, Xueyun; Liu, Mingyao; Li, Dali

    2015-01-01

    The CRISPR-Cas RNA-guided system has versatile uses in many organisms and allows modification of multiple target sites simultaneously. Generating novel genetically modified mouse and rat models is one valuable application of this system. Through the injection of Cas9 protein instead of mRNA into embryos, we observed fewer off-target effects of Cas9 and increased point mutation knock-in efficiency. Large genomic DNA fragment (up to 95 kb) deletion mice were generated for in vivo study of lncRNAs and gene clusters. Site-specific insertion of a 2.7 kb CreERT2 cassette into the mouse Nfatc1 locus allowed labeling and tracing of hair follicle stem cells. In addition, we combined the Cre-Loxp system with a gene-trap strategy to insert a GFP reporter in the reverse orientation into the rat Lgr5 locus, which was later inverted by Cre-mediated recombination, yielding a conditional knockout/reporter strategy suitable for mosaic mutation analysis. PMID:26620761

  5. Large genomic fragment deletion and functional gene cassette knock-in via Cas9 protein mediated genome editing in one-cell rodent embryos.

    PubMed

    Wang, Liren; Shao, Yanjiao; Guan, Yuting; Li, Liang; Wu, Lijuan; Chen, Fangrui; Liu, Meizhen; Chen, Huaqing; Ma, Yanlin; Ma, Xueyun; Liu, Mingyao; Li, Dali

    2015-01-01

    The CRISPR-Cas RNA-guided system has versatile uses in many organisms and allows modification of multiple target sites simultaneously. Generating novel genetically modified mouse and rat models is one valuable application of this system. Through the injection of Cas9 protein instead of mRNA into embryos, we observed fewer off-target effects of Cas9 and increased point mutation knock-in efficiency. Large genomic DNA fragment (up to 95 kb) deletion mice were generated for in vivo study of lncRNAs and gene clusters. Site-specific insertion of a 2.7 kb CreERT2 cassette into the mouse Nfatc1 locus allowed labeling and tracing of hair follicle stem cells. In addition, we combined the Cre-Loxp system with a gene-trap strategy to insert a GFP reporter in the reverse orientation into the rat Lgr5 locus, which was later inverted by Cre-mediated recombination, yielding a conditional knockout/reporter strategy suitable for mosaic mutation analysis. PMID:26620761

  6. Sequencing-based large-scale genomics approaches with small numbers of isolated maize meiocytes

    PubMed Central

    Dukowic-Schulze, Stefanie; Sundararajan, Anitha; Ramaraj, Thiruvarangan; Mudge, Joann; Chen, Changbin

    2014-01-01

    High-throughput sequencing has become the large-scale approach of choice to study global gene expression and the distribution of specific chromatin marks and features. However, the limited availability of large amounts of purified cells made it very challenging to apply sequencing-based techniques in plant meiosis research in the past. In this paper, we describe a method to isolate meiocytes from maize anthers and detailed protocols to successfully perform RNA-seq, smRNA-seq, H3K4me3-ChIP-seq, and DNA bisulfite conversion sequencing with 5000–30,000 isolated maize male meiotic cells. These methods can be adjusted for other flowering plant species as well. PMID:24611068

  7. A new way to protect privacy in large-scale genome-wide association studies

    PubMed Central

    Kamm, Liina; Bogdanov, Dan; Laur, Sven; Vilo, Jaak

    2013-01-01

    Motivation: Increased availability of various genotyping techniques has initiated a race for finding genetic markers that can be used in diagnostics and personalized medicine. Although many genetic risk factors are known, key causes of common diseases with complex heritage patterns are still unknown. Identification of such complex traits requires a targeted study over a large collection of data. Ideally, such studies bring together data from many biobanks. However, data aggregation on such a large scale raises many privacy issues. Results: We show how to conduct such studies without violating privacy of individual donors and without leaking the data to third parties. The presented solution has provable security guarantees. Contact: jaak.vilo@ut.ee Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23413435

  8. Excavating the Genome: Large Scale Mutagenesis Screening for the Discovery of New Mouse Models

    PubMed Central

    Sundberg, John P.; Dadras, Soheil S.; Silva, Kathleen A.; Kennedy, Victoria E.; Murray, Stephen A.; Denegre, James; Schofield, Paul N.; King, Lloyd E.; Wiles, Michael; Pratt, C. Herbert

    2016-01-01

    Technology now exists for rapid screening of mutated laboratory mice to identify phenotypes associated with specific genetic mutations. Large repositories exist for spontaneous mutants and those induced by chemical mutagenesis, many of which have never been studied or comprehensively evaluated. To supplement these resources, a variety of techniques have been consolidated in an international effort to create mutations in all known protein coding genes in the mouse. With targeted embryonic stem cell lines now available for almost all protein coding genes and more recently CRISPR/Cas9 technology, large-scale efforts are underway to create novel mutant mouse strains and to characterize their phenotypes. However, accurate diagnosis of skin, hair, and nail diseases still relies on careful gross and histological analysis. While not automated to the level of the physiological phenotyping, histopathology provides the most direct and accurate diagnosis and correlation with human diseases. As a result of these efforts, many new mouse dermatological disease models are being developed. PMID:26551941

  9. Integrating large-scale functional genomics data to dissect metabolic networks for hydrogen production

    SciTech Connect

    Harwood, Caroline S

    2012-12-17

    The goal of this project is to identify gene networks that are critical for efficient biohydrogen production by leveraging variation in gene content and gene expression in independently isolated Rhodopseudomonas palustris strains. Coexpression methods were applied to large data sets that we have collected to define probabilistic causal gene networks. To our knowledge, this is a first systems-level approach that takes advantage of strain-to-strain variability to computationally define networks critical for a particular bacterial phenotypic trait.

  10. Genome-Wide Association Study of Event-Free Survival in Diffuse Large B-Cell Lymphoma Treated With Immunochemotherapy

    PubMed Central

    Ghesquieres, Hervé; Slager, Susan L.; Jardin, Fabrice; Veron, Amelie S.; Asmann, Yan W.; Maurer, Matthew J.; Fest, Thierry; Habermann, Thomas M.; Bene, Marie C.; Novak, Anne J.; Mareschal, Sylvain; Haioun, Corinne; Lamy, Thierry; Ansell, Stephen M.; Tilly, Herve; Witzig, Thomas E.; Weiner, George J.; Feldman, Andrew L.; Dogan, Ahmet; Cunningham, Julie M.; Olswold, Curtis L.; Molina, Thierry Jo; Link, Brian K.; Milpied, Noel; Cox, David G.; Salles, Gilles A.; Cerhan, James R.

    2015-01-01

    Purpose We performed a multistage genome-wide association study to identify inherited genetic variants that predict outcome in diffuse large B-cell lymphoma patients treated with immunochemotherapy. Methods We conducted a meta-analysis of two genome-wide association study data sets, one from the LNH2003B trial (N = 540), a prospective clinical trial from the Lymphoma Study Association, and the other from the Molecular Epidemiology Resource study (N = 312), a prospective observational study from the University of Iowa–Mayo Clinic Lymphoma Specialized Program of Research Excellence. Top single nucleotide polymorphisms were then genotyped in independent cohorts of patients from the Specialized Program of Research Excellence (N = 391) and the Groupe Ouest-Est des Leucémies Aiguës et Maladies du Sang (GOELAMS) -075 randomized trial (N = 294). We calculated the hazard ratios (HRs) and 95% CIs for event-free survival (EFS) and overall survival (OS) using a log-additive genetic model with adjustment for age, sex, and age-adjusted International Prognostic Index. Results In a meta-analysis of the four studies, the top loci for EFS were marked by rs7712513 at 5q23.2 (near SNX2 and SNCAIP; HR, 1.39; 95% CI, 1.23 to 1.57; P = 2.08 × 10⁻⁷), and rs7765004 at 6q21 (near MARCKS and HDAC2; HR, 1.38; 95% CI, 1.22 to 1.57; P = 7.09 × 10⁻⁷), although they did not reach conventional genome-wide significance (P = 5 × 10⁻⁸). Both rs7712513 (HR, 1.49; 95% CI, 1.29 to 1.72; P = 3.53 × 10⁻⁸) and rs7765004 (HR, 1.47; 95% CI, 1.27 to 1.71; P = 5.36 × 10⁻⁷) were also associated with OS. In exploratory analyses, a two–single nucleotide polymorphism risk score was highly predictive of EFS (P = 1.78 × 10⁻¹²) and was independent of treatment, IPI, and cell-of-origin classification. Conclusion Our study provides encouraging evidence for associations between loci at 5q23.2 and 6q21 with EFS and OS in patients with diffuse large B-cell lymphoma treated with immunochemotherapy.
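
    The relationship between a reported hazard ratio, its 95% CI, and its P value can be sketched as follows. This is only an illustrative back-calculation, assuming a two-sided Wald test on the log hazard ratio and a symmetric CI on the log scale; the function names are hypothetical and not taken from the study, and because the published HR and CI are rounded, the result only approximates the reported P value.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower: float, upper: float, z_crit: float = Z_95) -> float:
    """Standard error of log(HR), recovered from the CI bounds (assumed symmetric on the log scale)."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z_crit)


def wald_p_value(hr: float, lower: float, upper: float) -> float:
    """Two-sided Wald P value for the null hypothesis HR = 1."""
    z = math.log(hr) / se_from_ci(lower, upper)
    # For a standard normal Z, P(|Z| > z) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Rounded EFS values quoted in the abstract for rs7712513.
    p = wald_p_value(1.39, 1.23, 1.57)
    print(f"approximate P = {p:.2e}")
```

    With the rounded inputs from the abstract, the back-calculated value lands in the same order of magnitude as the reported P value, which is about as much agreement as rounding allows.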

  11. Intrastrand annealing leads to the formation of a large DNA palindrome and determines the boundaries of genomic amplification in human cancer.

    PubMed

    Tanaka, Hisashi; Cao, Yi; Bergstrom, Donald A; Kooperberg, Charles; Tapscott, Stephen J; Yao, Meng-Chao

    2007-03-01

    Amplification of large chromosomal regions (gene amplification) is a common somatic alteration in human cancer cells and often is associated with advanced disease. A critical event initiating gene amplification is a DNA double-strand break (DSB), which is immediately followed by the formation of a large DNA palindrome. Large DNA palindromes are frequent and nonrandomly distributed in the genomes of cancer cells and facilitate a further increase in copy number. Although the importance of the formation of large DNA palindromes as a very early event in gene amplification is widely recognized, it is not known how a DSB is resolved to form a large DNA palindrome and whether any local DNA structure determines the location of large DNA palindromes. We show here that intrastrand annealing following a DNA double-strand break leads to the formation of large DNA palindromes and that DNA inverted repeats in the genome determine the efficiency of this event. Furthermore, in human Colo320DM cancer cells, a DNA inverted repeat in the genome marks the border between amplified and nonamplified DNA. Therefore, an early step of gene amplification is a regulated process that is facilitated by DNA inverted repeats in the genome. PMID:17242211

  12. deBWT: parallel construction of Burrows–Wheeler Transform for large collection of genomes with de Bruijn-branch encoding

    PubMed Central

    Liu, Bo; Zhu, Dixian; Wang, Yadong

    2016-01-01

    Motivation: With the development of high-throughput sequencing, the number of assembled genomes continues to rise. Organizing and indexing many assembled genomes well is critical to promoting future genomics studies. The Burrows–Wheeler Transform (BWT) is an important data structure for genome indexing, which has many fundamental applications; however, it is still non-trivial to construct the BWT for a large collection of genomes, especially for highly similar or repetitive genomes. Moreover, the state-of-the-art approaches do not support scalable parallel computing well owing to their incremental nature, which is a bottleneck when using modern computers to accelerate BWT construction. Results: We propose de Bruijn branch-based BWT constructor (deBWT), a novel parallel BWT construction approach. deBWT represents and organizes the suffixes of the input sequence with a novel data structure, de Bruijn branch encoding. This data structure takes advantage of the de Bruijn graph to facilitate the comparison between suffixes with long common prefixes, which breaks the bottleneck of BWT construction for repetitive genomic sequences. Meanwhile, deBWT also uses the structure of the de Bruijn graph to reduce unnecessary comparisons between suffixes. Benchmarking suggests that deBWT is efficient and scalable for constructing the BWT of large datasets by parallel computing. It is well-suited to index many genomes, such as a collection of individual human genomes, with multiple-core servers or clusters. Availability and implementation: deBWT is implemented in C; the source code is available at https://github.com/hitbc/deBWT or https://github.com/DixianZhu/deBWT Contact: ydwang@hit.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307614
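
    For readers unfamiliar with the transform itself, a minimal sketch of the textbook BWT follows. It is not deBWT's de Bruijn branch encoding: it sorts suffixes by direct string comparison, which is exactly the step that becomes expensive on repetitive genomes with long common prefixes. The function names and the `$` sentinel are illustrative assumptions.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort all suffixes of text + sentinel, then emit the
    character that precedes each suffix in sorted order."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    # Direct suffix comparison; slow when suffixes share long prefixes.
    suffix_order = sorted(range(len(s)), key=lambda i: s[i:])
    # s[i - 1] wraps to the sentinel for the suffix starting at 0.
    return "".join(s[i - 1] for i in suffix_order)


def inverse_bwt(last: str, sentinel: str = "$") -> str:
    """Recover the original text from its BWT.

    A stable sort of the last column by character links each row to the
    row whose last character is the current row's first character."""
    order = sorted(range(len(last)), key=lambda i: last[i])
    row = last.index(sentinel)  # the row holding the unrotated text
    out = []
    for _ in range(len(last) - 1):
        row = order[row]
        out.append(last[row])
    return "".join(out)


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)               # annb$aa
    print(inverse_bwt(transformed))  # banana
```

    The repeated `s[i:]` comparisons make the cost of this sketch grow with the length of shared prefixes, which is the bottleneck the abstract describes for highly similar genomes.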

  13. Excavating the Genome: Large-Scale Mutagenesis Screening for the Discovery of New Mouse Models.

    PubMed

    Sundberg, John P; Dadras, Soheil S; Silva, Kathleen A; Kennedy, Victoria E; Murray, Stephen A; Denegre, James M; Schofield, Paul N; King, Lloyd E; Wiles, Michael V; Pratt, C Herbert

    2015-11-01

    Technology now exists for rapid screening of mutated laboratory mice to identify phenotypes associated with specific genetic mutations. Large repositories exist for spontaneous mutants and those induced by chemical mutagenesis, many of which have never been fully studied or comprehensively evaluated. To supplement these resources, a variety of techniques have been consolidated in an international effort to create mutations in all known protein coding genes in the mouse. With targeted embryonic stem cell lines now available for almost all protein coding genes and more recently CRISPR/Cas9 technology, large-scale efforts are underway to create further novel mutant mouse strains and to characterize their phenotypes. However, accurate diagnosis of skin, hair, and nail diseases still relies on careful gross and histological analysis, and while not automated to the level of the physiological phenotyping, histopathology still provides the most direct and accurate diagnosis and correlation with human diseases. As a result of these efforts, many new mouse dermatological disease models are being characterized and developed. PMID:26551941

  14. The Mitochondrial Genome of an Aquatic Plant, Spirodela polyrhiza

    PubMed Central

    Wang, Wenqin; Wu, Yongrui; Messing, Joachim

    2012-01-01

    Background Spirodela polyrhiza is a species of the order Alismatales, which represents the basal lineage of monocots with more ancestral features than the Poales. The complete sequence of its mitochondrial (mt) genome could provide clues for understanding the evolution of mt genomes in plants. Methods The Spirodela polyrhiza mt genome was sequenced from total genomic DNA without physical separation of chloroplast and nuclear DNA using the SOLiD platform. Using a genome copy number sensitive assembly algorithm, the mt genome was successfully assembled. Gap closure and accuracy were determined with PCR products sequenced with the dideoxy method. Conclusions This is the most compact monocot mitochondrial genome with 228,493 bp. A total of 57 genes encode 35 known proteins, 3 ribosomal RNAs, and 19 tRNAs that recognize 15 amino acids. There are about 600 RNA editing sites predicted and three lineage specific protein-coding-gene losses. The mitochondrial genes, pseudogenes, and other hypothetical genes (ORFs) cover 71,783 bp (31.0%) of the genome. Imported plastid DNA accounts for an additional 9,295 bp (4.1%) of the mitochondrial DNA. Absence of transposable element sequences suggests that very few nuclear sequences have migrated into Spirodela mtDNA. Phylogenetic analysis of conserved protein-coding genes suggests that Spirodela shares a common ancestor with other monocots, but there is no obvious synteny between Spirodela and rice mtDNAs. After eliminating genes, introns, ORFs, and plastid-derived DNA, nearly four-fifths of the Spirodela mitochondrial genome is of unknown origin and function. Although it contains a similar chloroplast DNA content and range of RNA editing as other monocots, it is void of nuclear insertions, active gene loss, and comprises large regions of sequences of unknown origin in non-coding regions. Moreover, the lack of synteny with known mitochondrial genomic sequences sheds new light on the early evolution of monocot mitochondrial genomes.

  15. Extensive Chromosomal Reorganization in the Evolution of New World Muroid Rodents (Cricetidae, Sigmodontinae): Searching for Ancestral Phylogenetic Traits

    PubMed Central

    Pereira, Adenilson Leão; Malcher, Stella Miranda; Nagamachi, Cleusa Yoshiko; O’Brien, Patricia Caroline Mary; Ferguson-Smith, Malcolm Andrew; Mendes-Oliveira, Ana Cristina; Pieczarka, Julio Cesar

    2016-01-01

    Sigmodontinae rodents show great diversity and complexity in morphology and ecology. This diversity is accompanied by extensive chromosome variation challenging attempts to reconstruct their ancestral genome. The species Hylaeamys megacephalus (HME; Oryzomyini, 2n = 54), Necromys lasiurus (NLA; Akodontini, 2n = 34) and Akodon sp. (ASP; Akodontini, 2n = 10) have extreme diploid numbers that make it difficult to understand the rearrangements that are responsible for such differences. In this study we analyzed these changes using whole chromosome probes of HME in cross-species painting of NLA and ASP to construct chromosome homology maps that reveal the rearrangements between species. We include data from the literature for other Sigmodontinae previously studied with HME and Mus musculus (MMU) probes. We also use the HME probes on MMU chromosomes for the comparative analysis of NLA with other species already mapped by MMU probes. Our results show that NLA and ASP have highly rearranged karyotypes when compared to HME. Eleven HME syntenic blocks are shared among the species studied here. Four syntenies may be ancestral to Akodontini (HME2/18, 3/25, 18/25 and 4/11/16) and eight to Sigmodontinae (HME26, 1/12, 6/21, 7/9, 5/17, 11/16, 20/13 and 19/14/19). Using MMU data we identified six associations shared among rodents from seven subfamilies, where MMU3/18 and MMU8/13 are phylogenetic signatures of Sigmodontinae. We suggest that the associations MMU2entire, MMU6proximal/12entire, MMU3/18, MMU8/13, MMU1/17, MMU10/17, MMU12/17, MMU5/16, MMU5/6 and MMU7/19 are part of the ancestral Sigmodontinae genome. PMID:26800516

  16. Human Genetic Ancestral Composition Correlates with the Origin of Mycobacterium leprae Strains in a Leprosy Endemic Population

    PubMed Central

    Cardona-Castro, Nora; Cortés, Edwin; Beltrán, Camilo; Romero, Marcela; Badel-Mogollón, Jaime E.; Bedoya, Gabriel

    2015-01-01

    Recent reports have suggested that leprosy originated in Africa, extended to Asia and Europe, and arrived in the Americas during European colonization and the African slave trade. Due to colonization, the contemporary Colombian population is an admixture of Native-American, European and African ancestries. Because microorganisms are known to accompany humans during migrations, patterns of human migration can be traced by examining genomic changes in associated microbes. The current study analyzed 118 leprosy cases and 116 unrelated controls from two Colombian regions endemic for leprosy (Atlantic and Andean) in order to determine possible associations of leprosy with patient ancestral background (determined using 36 ancestry informative markers), Mycobacterium leprae genotype and/or patient geographical origin. We found significant differences between ancestral genetic composition. European components were predominant in Andean populations. In contrast, African components were higher in the Atlantic region. M. leprae genotypes were then analyzed for cluster associations and compared with the ancestral composition of leprosy patients. Two M. leprae principal clusters were found: haplotypes C54 and T45. Haplotype C54 associated with African origin and was more frequent in patients from the Atlantic region with a high African component. In contrast, haplotype T45 associated with European origin and was more frequent in Andean patients with a higher European component. These results suggest that the human and M. leprae genomes have co-existed since the African and European origins of the disease, with leprosy ultimately arriving in Colombia during colonization. Distinct M. leprae strains followed European and African settlement in the country and can be detected in contemporary Colombian populations. PMID:26360617

  17. Extensive Chromosomal Reorganization in the Evolution of New World Muroid Rodents (Cricetidae, Sigmodontinae): Searching for Ancestral Phylogenetic Traits.

    PubMed

    Pereira, Adenilson Leão; Malcher, Stella Miranda; Nagamachi, Cleusa Yoshiko; O'Brien, Patricia Caroline Mary; Ferguson-Smith, Malcolm Andrew; Mendes-Oliveira, Ana Cristina; Pieczarka, Julio Cesar

    2016-01-01

    Sigmodontinae rodents show great diversity and complexity in morphology and ecology. This diversity is accompanied by extensive chromosome variation challenging attempts to reconstruct their ancestral genome. The species Hylaeamys megacephalus (HME; Oryzomyini, 2n = 54), Necromys lasiurus (NLA; Akodontini, 2n = 34) and Akodon sp. (ASP; Akodontini, 2n = 10) have extreme diploid numbers that make it difficult to understand the rearrangements that are responsible for such differences. In this study we analyzed these changes using whole chromosome probes of HME in cross-species painting of NLA and ASP to construct chromosome homology maps that reveal the rearrangements between species. We include data from the literature for other Sigmodontinae previously studied with HME and Mus musculus (MMU) probes. We also use the HME probes on MMU chromosomes for the comparative analysis of NLA with other species already mapped by MMU probes. Our results show that NLA and ASP have highly rearranged karyotypes when compared to HME. Eleven HME syntenic blocks are shared among the species studied here. Four syntenies may be ancestral to Akodontini (HME2/18, 3/25, 18/25 and 4/11/16) and eight to Sigmodontinae (HME26, 1/12, 6/21, 7/9, 5/17, 11/16, 20/13 and 19/14/19). Using MMU data we identified six associations shared among rodents from seven subfamilies, where MMU3/18 and MMU8/13 are phylogenetic signatures of Sigmodontinae. We suggest that the associations MMU2entire, MMU6proximal/12entire, MMU3/18, MMU8/13, MMU1/17, MMU10/17, MMU12/17, MMU5/16, MMU5/6 and MMU7/19 are part of the ancestral Sigmodontinae genome. PMID:26800516

  18. Human Genetic Ancestral Composition Correlates with the Origin of Mycobacterium leprae Strains in a Leprosy Endemic Population.

    PubMed

    Cardona-Castro, Nora; Cortés, Edwin; Beltrán, Camilo; Romero, Marcela; Badel-Mogollón, Jaime E; Bedoya, Gabriel

    2015-01-01

    Recent reports have suggested that leprosy originated in Africa, extended to Asia and Europe, and arrived in the Americas during European colonization and the African slave trade. Due to colonization, the contemporary Colombian population is an admixture of Native-American, European and African ancestries. Because microorganisms are known to accompany humans during migrations, patterns of human migration can be traced by examining genomic changes in associated microbes. The current study analyzed 118 leprosy cases and 116 unrelated controls from two Colombian regions endemic for leprosy (Atlantic and Andean) in order to determine possible associations of leprosy with patient ancestral background (determined using 36 ancestry informative markers), Mycobacterium leprae genotype and/or patient geographical origin. We found significant differences between ancestral genetic composition. European components were predominant in Andean populations. In contrast, African components were higher in the Atlantic region. M. leprae genotypes were then analyzed for cluster associations and compared with the ancestral composition of leprosy patients. Two M. leprae principal clusters were found: haplotypes C54 and T45. Haplotype C54 associated with African origin and was more frequent in patients from the Atlantic region with a high African component. In contrast, haplotype T45 associated with European origin and was more frequent in Andean patients with a higher European component. These results suggest that the human and M. leprae genomes have co-existed since the African and European origins of the disease, with leprosy ultimately arriving in Colombia during colonization. Distinct M. leprae strains followed European and African settlement in the country and can be detected in contemporary Colombian populations. PMID:26360617

  19. Full-length genomes of 16 hepatitis C virus genotype 1 isolates representing subtypes 1c, 1d, 1e, 1g, 1h, 1i, 1j and 1k, and two new subtypes 1m and 1n, and four unclassified variants reveal ancestral relationships among subtypes.

    PubMed

    Lu, Ling; Li, Chunhua; Xu, Yan; Murphy, Donald G

    2014-07-01

    We characterized the full-length genomes of 16 distinct hepatitis C virus genotype 1 (HCV-1) isolates. Among them, four represented the first full-length genomes for subtypes 1d (QC103), 1i (QC181), 1j (QC329) and 1k (QC82), and another four corresponded to subtypes 1c (QC165), 1g (QC78), 1h (QC156) and 1e (QC172). Both QC196 and QC87 were assigned into a new subtype 1m, and QC113 and QC74 into another new subtype 1n. The remaining four (QC60, QC316, QC152 and QC180) did not classify among the established subtypes and corresponded to four new lineages. Subtypes 1j, 1k, 1m, 1n and the unclassified isolate QC60 were identified in Haitian immigrants. In the updated HCV nomenclature of 2005, a total of 12 subtypes of HCV-1 were designated. Including the data from the present study, all but subtype 1f now have their full-length genomes defined. Further analysis of partial NS5B sequences available in GenBank denoted a total of 21 unclassified lineages, indicating the taxonomic complexity of HCV-1. Among them, six have had their full-length genomes characterized. Based on the available full-length genome sequences, a timescale phylogenetic tree was reconstructed which estimated important time points in the evolution of HCV-1. It revealed that subtype 1a diverged from its nearest relatives 135 years ago and subtype 1b diverged from its nearest relatives 112 years ago. When subtypes 1a, 1j, 1k, 1m, 1n and six close relatives (all but one from Haitian immigrants) were considered as a whole, the divergence time was 176 years ago. This diversification was concurrent with the time period when the transatlantic slave trade was active. When taking all the HCV-1 isolates as a single lineage, the divergence time was 326 years ago. This analysis suggested the existence of a recent common ancestor for subtype 1a and the Haitian variants; a co-origin for subtypes 1b, 1i and 1d was also implied. PMID:24718832

  20. In Silico Resurrection of the Major Vault Protein Suggests It Is Ancestral in Modern Eukaryotes

    PubMed Central

    Daly, Toni K.; Sutherland-Smith, Andrew J.; Penny, David

    2013-01-01

    Vaults are very large oligomeric ribonucleoproteins conserved among a variety of species. The rat vault 3D structure shows an ovoid oligomeric particle, consisting of 78 major vault protein monomers, each of approximately 861 amino acids. Vaults are probably the largest ribonucleoprotein structures in eukaryote cells, being approximately 70 nm in length with a diameter of 40 nm (the size of three ribosomes) and with a lumen capacity of 50 million Å³. We use both protein sequences and inferred ancestral sequences for in silico virtual resurrection of tertiary and quaternary structures to search for vaults in a wide variety of eukaryotes. We find that the vault’s phylogenetic distribution is widespread in eukaryotes, but is apparently absent in some notable model organisms. Our conclusion from the distribution of vaults is that they were present in the last eukaryote common ancestor but they have apparently been lost from a number of groups including fungi, insects, and probably plants. Our approach of inferring ancestral 3D and quaternary structures is expected to be useful generally. PMID:23887922

  1. In silico resurrection of the major vault protein suggests it is ancestral in modern eukaryotes.

    PubMed

    Daly, Toni K; Sutherland-Smith, Andrew J; Penny, David

    2013-01-01

    Vaults are very large oligomeric ribonucleoproteins conserved among a variety of species. The rat vault 3D structure shows an ovoid oligomeric particle, consisting of 78 major vault protein monomers, each of approximately 861 amino acids. Vaults are probably the largest ribonucleoprotein structures in eukaryote cells, being approximately 70 nm in length with a diameter of 40 nm (the size of three ribosomes) and with a lumen capacity of 50 million Å³. We use both protein sequences and inferred ancestral sequences for in silico virtual resurrection of tertiary and quaternary structures to search for vaults in a wide variety of eukaryotes. We find that the vault's phylogenetic distribution is widespread in eukaryotes, but is apparently absent in some notable model organisms. Our conclusion from the distribution of vaults is that they were present in the last eukaryote common ancestor but they have apparently been lost from a number of groups including fungi, insects, and probably plants. Our approach of inferring ancestral 3D and quaternary structures is expected to be useful generally. PMID:23887922

  2. The Organellar Genomes of Chromera and Vitrella, the Phototrophic Relatives of Apicomplexan Parasites.

    PubMed

    Oborník, Miroslav; Lukeš, Julius

    2015-01-01

    Apicomplexa are known to contain greatly reduced organellar genomes. Their mitochondrial genome carries only three protein-coding genes, and their plastid genome is reduced to a 35-kb-long circle. The discovery of coral-endosymbiotic algae Chromera velia and Vitrella brassicaformis, which share a common ancestry with Apicomplexa, provided an opportunity to study possibly ancestral forms of organellar genomes, a unique glimpse into the evolutionary history of apicomplexan parasites. The structurally similar mitochondrial genomes of Chromera and Vitrella differ in gene content, which is reflected in the composition of their respiratory chains. Thus, Chromera lacks respiratory complexes I and III, whereas Vitrella and apicomplexan parasites are missing only complex I. Plastid genomes differ substantially between these algae, particularly in structure: The Chromera plastid genome is a linear, 120-kb molecule with large and divergent genes, whereas the plastid genome of Vitrella is a highly compact circle that is only 85 kb long but nonetheless contains more genes than that of Chromera. It appears that organellar genomes have already been reduced in free-living phototrophic ancestors of apicomplexan parasites, and such reduction is not associated with parasitism. PMID:26092225

  3. The genome of woodland strawberry (Fragaria vesca)

    PubMed Central

    Shulaev, Vladimir; Sargent, Daniel J; Crowhurst, Ross N; Mockler, Todd C; Folkerts, Otto; Delcher, Arthur L; Jaiswal, Pankaj; Mockaitis, Keithanne; Liston, Aaron; Mane, Shrinivasrao P; Burns, Paul; Davis, Thomas M; Slovin, Janet P; Bassil, Nahla; Hellens, Roger P; Evans, Clive; Harkins, Tim; Kodira, Chinnappa; Desany, Brian; Crasta, Oswald R; Jensen, Roderick V; Allan, Andrew C; Michael, Todd P; Setubal, Joao Carlos; Celton, Jean-Marc; Rees, D Jasper G; Williams, Kelly P; Holt, Sarah H; Ruiz Rojas, Juan Jairo; Chatterjee, Mithu; Liu, Bo; Silva, Herman; Meisel, Lee; Adato, Avital; Filichkin, Sergei A; Troggio, Michela; Viola, Roberto; Ashman, Tia-Lynn; Wang, Hao; Dharmawardhana, Palitha; Elser, Justin; Raja, Rajani; Priest, Henry D; Bryant, Douglas W; Fox, Samuel E; Givan, Scott A; Wilhelm, Larry J; Naithani, Sushma; Christoffels, Alan; Salama, David Y; Carter, Jade; Girona, Elena Lopez; Zdepski, Anna; Wang, Wenqin; Kerstetter, Randall A; Schwab, Wilfried; Korban, Schuyler S; Davik, Jahn; Monfort, Amparo; Denoyes-Rothan, Beatrice; Arus, Pere; Mittler, Ron; Flinn, Barry; Aharoni, Asaph; Bennetzen, Jeffrey L; Salzberg, Steven L; Dickerman, Allan W; Velasco, Riccardo; Borodovsky, Mark; Veilleux, Richard E; Folta, Kevin M

    2012-01-01

    The woodland strawberry, Fragaria vesca (2n = 2x = 14), is a versatile experimental plant system. This diminutive herbaceous perennial has a small genome (240 Mb), is amenable to genetic transformation and shares substantial sequence identity with the cultivated strawberry (Fragaria × ananassa) and other economically important rosaceous plants. Here we report the draft F. vesca genome, which was sequenced to ×39 coverage using second-generation technology, assembled de novo and then anchored to the genetic linkage map into seven pseudochromosomes. This diploid strawberry sequence lacks the large genome duplications seen in other rosids. Gene prediction modeling identified 34,809 genes, with most being supported by transcriptome mapping. Genes critical to valuable horticultural traits including flavor, nutritional value and flowering time were identified. Macrosyntenic relationships between Fragaria and Prunus predict a hypothetical ancestral Rosaceae genome that had nine chromosomes. New phylogenetic analysis of 154 protein-coding genes suggests that assignment of Populus to Malvidae, rather than Fabidae, is warranted. PMID:21186353

  4. Genomic islands of divergence in hybridizing Heliconius butterflies identified by large-scale targeted sequencing

    PubMed Central

    Nadeau, Nicola J.; Whibley, Annabel; Jones, Robert T.; Davey, John W.; Dasmahapatra, Kanchon K.; Baxter, Simon W.; Quail, Michael A.; Joron, Mathieu; ffrench-Constant, Richard H.; Blaxter, Mark L.; Mallet, James; Jiggins, Chris D.

    2012-01-01

    Heliconius butterflies represent a recent radiation of species, in which wing pattern divergence has been implicated in speciation. Several loci that control wing pattern phenotypes have been mapped and two were identified through sequencing. These same gene regions play a role in adaptation across the whole Heliconius radiation. Previous studies of population genetic patterns at these regions have sequenced small amplicons. Here, we use targeted next-generation sequence capture to survey patterns of divergence across these entire regions in divergent geographical races and species of Heliconius. This technique was successful both within and between species for obtaining high coverage of almost all coding regions and sufficient coverage of non-coding regions to perform population genetic analyses. We find major peaks of elevated population differentiation between races across hybrid zones, which indicate regions under strong divergent selection. These ‘islands’ of divergence appear to be more extensive between closely related species, but there is less clear evidence for such islands between more distantly related species at two further points along the ‘speciation continuum’. We also sequence fosmid clones across these regions in different Heliconius melpomene races. We find no major structural rearrangements but many relatively large (greater than 1 kb) insertion/deletion events (including gain/loss of transposable elements) that are variable between races. PMID:22201164

  5. A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes

    PubMed Central

    Koonin, Eugene V; Fedorova, Natalie D; Jackson, John D; Jacobs, Aviva R; Krylov, Dmitri M; Makarova, Kira S; Mazumder, Raja; Mekhedov, Sergei L; Nikolskaya, Anastasia N; Rao, B Sridhar; Rogozin, Igor B; Smirnov, Sergei; Sorokin, Alexander V; Sverdlov, Alexander V; Vasudevan, Sona; Wolf, Yuri I; Yin, Jodie J; Natale, Darren A

    2004-01-01

    the crown group consists of 3,413 KOGs and largely includes proteins involved in genome replication and expression, and central metabolism. Only 44% of the KOGs, mostly from the reconstructed gene set of the last common ancestor of the crown group, have detectable homologs in prokaryotes; the remainder apparently evolved via duplication with divergence and invention of new genes. Conclusions The KOG analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes. The results provide quantitative support for major trends of eukaryotic evolution noticed previously at the qualitative level and a basis for detailed reconstruction of evolution of eukaryotic genomes and biology of ancestral forms. PMID:14759257

  6. Comment on Schielzeth et al. (2014): "Genome size variation affects song attractiveness in grasshoppers: Evidence for sexual selection against large genomes".

    PubMed

    Camacho, Juan Pedro M

    2016-06-01

    Schielzeth et al. (2014) concluded that attractive grasshopper singers have significantly smaller genomes thus suggesting a possible role for sexual selection on genome size. Whereas this conclusion could still be conceivably valid, it is not supported by the data presented due to some technical flaws. In addition, the interpretation of the results, speculating on the possible presence of B chromosomes, is not justified. PMID:27327141

  7. Reconstructing the ancestor of Mycobacterium leprae: The dynamics of gene loss and genome reduction

    PubMed Central

    Gómez-Valero, Laura; Rocha, Eduardo P.C.; Latorre, Amparo; Silva, Francisco J.

    2007-01-01

    We have reconstructed the gene content and order of the last common ancestor of the human pathogens Mycobacterium leprae and Mycobacterium tuberculosis. During the reductive evolution of M. leprae, 1537 of 2977 ancestral genes were lost, among which we found 177 previously unnoticed pseudogenes. We find evidence that a massive gene inactivation took place very recently in the M. leprae lineage, leading to the loss of hundreds of ancestral genes. A large proportion of their nucleotide content (∼89%) still remains in the genome, which allowed us to characterize and date them. The age of the pseudogenes was computed using a new methodology based on the rates and patterns of substitution in the pseudogenes and functional orthologous genes of closely related genomes. The position of the genes that were lost in the ancestor’s genome revealed that the process of function loss and degradation mainly took place through a gene-to-gene inactivation process, followed by the gradual loss of their DNA. This suggests a scenario of massive genome reduction through many nearly simultaneous pseudogenization events, leading to a highly specialized pathogen. PMID:17623808

  8. A Genome-Wide Association Study in Large White and Landrace Pig Populations for Number Piglets Born Alive

    PubMed Central

    Bergfelder-Drüing, Sarah; Grosse-Brinkhaus, Christine; Lind, Bianca; Erbe, Malena; Schellander, Karl; Simianer, Henner; Tholen, Ernst

    2015-01-01

    The number of piglets born alive (NBA) per litter is one of the most important traits in pig breeding due to its influence on production efficiency. It is difficult to improve NBA because the heritability of the trait is low and it is governed by a high number of loci with low to moderate effects. To clarify the biological and genetic background of NBA, genome-wide association studies (GWAS) were performed using 4,012 Large White and Landrace pigs from herdbook and commercial breeding companies in Germany (3), Austria (1) and Switzerland (1). The animals were genotyped with the Illumina PorcineSNP60 BeadChip. Because of population stratifications within and between breeds, clusters were formed using the genetic distances between the populations. Five clusters for each breed were formed and analysed by GWAS approaches. In total, 17 different significant markers affecting NBA were found in regions with known effects on female reproduction. No overlapping significant chromosome areas or QTL between Large White and Landrace breed were detected. PMID:25781935

  9. A genome-wide association study in large white and landrace pig populations for number piglets born alive.

    PubMed

    Bergfelder-Drüing, Sarah; Grosse-Brinkhaus, Christine; Lind, Bianca; Erbe, Malena; Schellander, Karl; Simianer, Henner; Tholen, Ernst

    2015-01-01

    The number of piglets born alive (NBA) per litter is one of the most important traits in pig breeding due to its influence on production efficiency. It is difficult to improve NBA because the heritability of the trait is low and it is governed by a high number of loci with low to moderate effects. To clarify the biological and genetic background of NBA, genome-wide association studies (GWAS) were performed using 4,012 Large White and Landrace pigs from herdbook and commercial breeding companies in Germany (3), Austria (1) and Switzerland (1). The animals were genotyped with the Illumina PorcineSNP60 BeadChip. Because of population stratifications within and between breeds, clusters were formed using the genetic distances between the populations. Five clusters for each breed were formed and analysed by GWAS approaches. In total, 17 different significant markers affecting NBA were found in regions with known effects on female reproduction. No overlapping significant chromosome areas or QTL between Large White and Landrace breed were detected. PMID:25781935

  10. Genome-wide association study identifies a variant in HDAC9 associated with large vessel ischemic stroke

    PubMed Central

    2012-01-01

    Genetic factors have been implicated in stroke risk but few replicated associations have been reported. We conducted a genome-wide association study (GWAS) in ischemic stroke and its subtypes in 3,548 cases and 5,972 controls, all of European ancestry. Replication of potential signals was performed in 5,859 cases and 6,281 controls. We replicated reported associations between variants close to PITX2 and ZFHX3 with cardioembolic stroke, and a 9p21 locus with large vessel stroke. We identified a novel association for a SNP within the histone deacetylase 9 (HDAC9) gene on chromosome 7p21.1 which was associated with large vessel stroke including additional replication in a further 735 cases and 28,583 controls (rs11984041, combined P = 1.87 × 10^-11, OR = 1.42, 95% CI 1.28-1.57). All four loci exhibit evidence for heterogeneity of effect across the stroke subtypes, with some, and possibly all, affecting risk for only one subtype. This suggests differing genetic architectures for different stroke subtypes. PMID:22306652

  11. Efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data

    PubMed Central

    Bhaskar, Anand; Wang, Y.X. Rachel; Song, Yun S.

    2015-01-01

    With the recent increase in study sample sizes in human genetics, there has been growing interest in inferring historical population demography from genomic variation data. Here, we present an efficient inference method that can scale up to very large samples, with tens or hundreds of thousands of individuals. Specifically, by utilizing analytic results on the expected frequency spectrum under the coalescent and by leveraging the technique of automatic differentiation, which allows us to compute gradients exactly, we develop a very efficient algorithm to infer piecewise-exponential models of the historical effective population size from the distribution of sample allele frequencies. Our method is orders of magnitude faster than previous demographic inference methods based on the frequency spectrum. In addition to inferring demography, our method can also accurately estimate locus-specific mutation rates. We perform extensive validation of our method on simulated data and show that it can accurately infer multiple recent epochs of rapid exponential growth, a signal that is difficult to pick up with small sample sizes. Lastly, we use our method to analyze data from recent sequencing studies, including a large-sample exome-sequencing data set of tens of thousands of individuals assayed at a few hundred genic regions. PMID:25564017
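
    The abstract above fits demographic models to the distribution of sample allele frequencies. As a minimal sketch of that summary statistic only (not the authors' inference algorithm, which uses piecewise-exponential models and automatic differentiation), the following tabulates a site frequency spectrum and gives the standard constant-population-size expectation, theta / i, as a baseline. The function names and the toy input are hypothetical.

```python
from collections import Counter


def site_frequency_spectrum(derived_counts, n):
    """Tabulate the unfolded site frequency spectrum.

    derived_counts: derived-allele count at each site, among n sampled
    chromosomes. Entry i-1 of the result is the number of sites where the
    derived allele appears exactly i times, for i = 1 .. n-1. Sites with
    count 0 or n are not segregating and are ignored.
    """
    tally = Counter(derived_counts)
    return [tally.get(i, 0) for i in range(1, n)]


def expected_sfs_constant_size(theta, n):
    """Expected spectrum under the standard neutral coalescent with a
    constant population size: E[xi_i] = theta / i."""
    return [theta / i for i in range(1, n)]


# Hypothetical toy data: six sites observed in a sample of four chromosomes.
observed = site_frequency_spectrum([1, 1, 2, 3, 0, 4], n=4)
baseline = expected_sfs_constant_size(theta=6.0, n=4)
```

    An excess of low-frequency variants in the observed spectrum relative to the constant-size baseline is the kind of signal that recent rapid growth is expected to leave.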

  12. Large number of replacement polymorphisms in rapidly evolving genes of Drosophila. Implications for genome-wide surveys of DNA polymorphism.

    PubMed Central

    Schmid, K J; Nigro, L; Aquadro, C F; Tautz, D

    1999-01-01

    We present a survey of nucleotide polymorphism of three novel, rapidly evolving genes in populations of Drosophila melanogaster and D. simulans. Levels of silent polymorphism are comparable to other loci, but the number of replacement polymorphisms is higher than that in most other genes surveyed in D. melanogaster and D. simulans. Tests of neutrality fail to reject neutral evolution with one exception. This concerns a gene located in a region of high recombination rate in D. simulans and in a region of low recombination rate in D. melanogaster, due to an inversion. In the latter case it shows a very low number of polymorphisms, presumably due to selective sweeps in the region. Patterns of nucleotide polymorphism suggest that most substitutions are neutral or nearly neutral and that weak (positive and purifying) selection plays a significant role in the evolution of these genes. At all three loci, purifying selection of slightly deleterious replacement mutations appears to be more efficient in D. simulans than in D. melanogaster, presumably due to different effective population sizes. Our analysis suggests that current knowledge about genome-wide patterns of nucleotide polymorphism is far from complete with respect to the types and range of nucleotide substitutions and that further analysis of differences between local populations will be required to understand the forces more completely. We note that rapidly diverging and nearly neutrally evolving genes cannot be expected only in the genome of Drosophila, but are likely to occur in large numbers also in other organisms and that their function and evolution are little understood so far. PMID:10581279

  13. RNA-seq pinpoints a Xanthomonas TAL-effector activated resistance gene in a large-crop genome

    PubMed Central

    Strauß, Tina; van Poecke, Remco M. P.; Strauß, Annett; Römer, Patrick; Minsavage, Gerald V.; Singh, Sylvia; Wolf, Christina; Strauß, Axel; Kim, Seungill; Lee, Hyun-Ah; Yeom, Seon-In; Parniske, Martin; Stall, Robert E.; Jones, Jeffrey B.; Choi, Doil; Prins, Marcel; Lahaye, Thomas

    2012-01-01

    Transcription activator-like effector (TALE) proteins of the plant pathogenic bacterial genus Xanthomonas bind to and transcriptionally activate host susceptibility genes, promoting disease. Plant immune systems have taken advantage of this mechanism by evolving TALE binding sites upstream of resistance (R) genes. For example, the pepper Bs3 and rice Xa27 genes are hypersensitive reaction plant R genes that are transcriptionally activated by corresponding TALEs. Both R genes have a hallmark expression pattern in which their transcripts are detectable only in the presence and not the absence of the corresponding TALE. By transcriptome profiling using next-generation sequencing (RNA-seq), we tested whether we could avoid laborious positional cloning for the isolation of TALE-induced R genes. In a proof-of-principle experiment, RNA-seq was used to identify a candidate for Bs4C, an R gene from pepper that mediates recognition of the Xanthomonas TALE protein AvrBs4. We identified one major Bs4C candidate transcript by RNA-seq that was expressed exclusively in the presence of AvrBs4. Complementation studies confirmed that the candidate corresponds to the Bs4C gene and that an AvrBs4 binding site in the Bs4C promoter directs its transcriptional activation. Comparison of Bs4C with a nonfunctional allele that is unable to recognize AvrBs4 revealed a 2-bp polymorphism within the TALE binding site of the Bs4C promoter. Bs4C encodes a structurally unique R protein and Bs4C-like genes that are present in many solanaceous genomes seem to be as tightly regulated as pepper Bs4C. These findings demonstrate that TALE-specific R genes can be cloned from large-genome crops with a highly efficient RNA-seq approach. PMID:23132937

  14. Large-Scale Genome-Wide Association Studies and Meta-Analyses of Longitudinal Change in Adult Lung Function

    PubMed Central

    Tang, Wenbo; Kowgier, Matthew; Loth, Daan W.; Soler Artigas, María; Joubert, Bonnie R.; Hodge, Emily; Gharib, Sina A.; Smith, Albert V.; Ruczinski, Ingo; Gudnason, Vilmundur; Mathias, Rasika A.; Harris, Tamara B.; Hansel, Nadia N.; Launer, Lenore J.; Barnes, Kathleen C.; Hansen, Joyanna G.; Albrecht, Eva; Aldrich, Melinda C.; Allerhand, Michael; Barr, R. Graham; Brusselle, Guy G.; Couper, David J.; Curjuric, Ivan; Davies, Gail; Deary, Ian J.; Dupuis, Josée; Fall, Tove; Foy, Millennia; Franceschini, Nora; Gao, Wei; Gläser, Sven; Gu, Xiangjun; Hancock, Dana B.; Heinrich, Joachim; Hofman, Albert; Imboden, Medea; Ingelsson, Erik; James, Alan; Karrasch, Stefan; Koch, Beate; Kritchevsky, Stephen B.; Kumar, Ashish; Lahousse, Lies; Li, Guo; Lind, Lars; Lindgren, Cecilia; Liu, Yongmei; Lohman, Kurt; Lumley, Thomas; McArdle, Wendy L.; Meibohm, Bernd; Morris, Andrew P.; Morrison, Alanna C.; Musk, Bill; North, Kari E.; Palmer, Lyle J.; Probst-Hensch, Nicole M.; Psaty, Bruce M.; Rivadeneira, Fernando; Rotter, Jerome I.; Schulz, Holger; Smith, Lewis J.; Sood, Akshay; Starr, John M.; Strachan, David P.; Teumer, Alexander; Uitterlinden, André G.; Völzke, Henry; Voorman, Arend; Wain, Louise V.; Wells, Martin T.; Wilk, Jemma B.; Williams, O. Dale; Heckbert, Susan R.; Stricker, Bruno H.; London, Stephanie J.; Fornage, Myriam; Tobin, Martin D.; O'Connor, George T.; Hall, Ian P.; Cassano, Patricia A.

    2014-01-01

    Background Genome-wide association studies (GWAS) have identified numerous loci influencing cross-sectional lung function, but less is known about genes influencing longitudinal change in lung function. Methods We performed GWAS of the rate of change in forced expiratory volume in the first second (FEV1) in 14 longitudinal, population-based cohort studies comprising 27,249 adults of European ancestry using linear mixed effects model and combined cohort-specific results using fixed effect meta-analysis to identify novel genetic loci associated with longitudinal change in lung function. Gene expression analyses were subsequently performed for identified genetic loci. As a secondary aim, we estimated the mean rate of decline in FEV1 by smoking pattern, irrespective of genotypes, across these 14 studies using meta-analysis. Results The overall meta-analysis produced suggestive evidence for association at the novel IL16/STARD5/TMC3 locus on chromosome 15 (P = 5.71 × 10^-7). In addition, meta-analysis using the five cohorts with ≥3 FEV1 measurements per participant identified the novel ME3 locus on chromosome 11 (P = 2.18 × 10^-8) at genome-wide significance. Neither locus was associated with FEV1 decline in two additional cohort studies. We confirmed gene expression of IL16, STARD5, and ME3 in multiple lung tissues. Publicly available microarray data confirmed differential expression of all three genes in lung samples from COPD patients compared with controls. Irrespective of genotypes, the combined estimate for FEV1 decline was 26.9, 29.2 and 35.7 mL/year in never, former, and persistent smokers, respectively. Conclusions In this large-scale GWAS, we identified two novel genetic loci in association with the rate of change in FEV1 that harbor candidate genes with biologically plausible functional links to lung function. PMID:24983941
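
    The abstract above combines cohort-specific results by fixed effect meta-analysis. A minimal sketch of the usual inverse-variance form of that technique follows; the abstract does not state the exact weighting or software used, so the function name and the example numbers are assumptions for illustration, not values from the study.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Pool per-cohort effect estimates with inverse-variance weights.

    Each cohort is weighted by 1 / SE^2, so more precise cohorts count
    for more. Returns (pooled_estimate, pooled_standard_error).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical per-cohort effects (e.g. mL/year per allele) and their
# standard errors; these are NOT numbers reported by the study.
pooled, pooled_se = fixed_effect_meta([1.0, 3.0], [1.0, 1.0])
z_score = pooled / pooled_se
```

    With equal standard errors the pooled estimate reduces to the plain average of the cohort estimates, which is a quick sanity check on the weighting.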

  15. "Black Holes" and Bacterial Pathogenicity: A Large Genomic Deletion that Enhances the Virulence of Shigella spp. and Enteroinvasive Escherichia coli

    NASA Astrophysics Data System (ADS)

    Maurelli, Anthony T.; Fernandez, Reinaldo E.; Bloch, Craig A.; Rode, Christopher K.; Fasano, Alessio

    1998-03-01

    Plasmids, bacteriophages, and pathogenicity islands are genomic additions that contribute to the evolution of bacterial pathogens. For example, Shigella spp., the causative agents of bacillary dysentery, differ from the closely related commensal Escherichia coli in the presence of a plasmid in Shigella that encodes virulence functions. However, pathogenic bacteria also may lack properties that are characteristic of nonpathogens. Lysine decarboxylase (LDC) activity is present in ≈90% of E. coli strains but is uniformly absent in Shigella strains. When the gene for LDC, cadA, was introduced into Shigella flexneri 2a, virulence became attenuated, and enterotoxin activity was inhibited greatly. The enterotoxin inhibitor was identified as cadaverine, a product of the reaction catalyzed by LDC. Comparison of the S. flexneri 2a and laboratory E. coli K-12 genomes in the region of cadA revealed a large deletion in Shigella. Representative strains of Shigella spp. and enteroinvasive E. coli displayed similar deletions of cadA. Our results suggest that, as Shigella spp. evolved from E. coli to become pathogens, they not only acquired virulence genes on a plasmid but also shed genes via deletions. The formation of these "black holes," deletions of genes that are detrimental to a pathogenic lifestyle, provides an evolutionary pathway that enables a pathogen to enhance virulence. Furthermore, the demonstration that cadaverine can inhibit enterotoxin activity may lead to more general models about toxin activity or entry into cells and suggests an avenue for antitoxin therapy. Thus, understanding the role of black holes in pathogen evolution may yield clues to new treatments of infectious diseases.

  16. Evolution of pathogenicity and sexual reproduction in eight Candida genomes.

    PubMed

    Butler, Geraldine; Rasmussen, Matthew D; Lin, Michael F; Santos, Manuel A S; Sakthikumar, Sharadha; Munro, Carol A; Rheinbay, Esther; Grabherr, Manfred; Forche, Anja; Reedy, Jennifer L; Agrafioti, Ino; Arnaud, Martha B; Bates, Steven; Brown, Alistair J P; Brunke, Sascha; Costanzo, Maria C; Fitzpatrick, David A; de Groot, Piet W J; Harris, David; Hoyer, Lois L; Hube, Bernhard; Klis, Frans M; Kodira, Chinnappa; Lennard, Nicola; Logue, Mary E; Martin, Ronny; Neiman, Aaron M; Nikolaou, Elissavet; Quail, Michael A; Quinn, Janet; Santos, Maria C; Schmitzberger, Florian F; Sherlock, Gavin; Shah, Prachi; Silverstein, Kevin A T; Skrzypek, Marek S; Soll, David; Staggs, Rodney; Stansfield, Ian; Stumpf, Michael P H; Sudbery, Peter E; Srikantha, Thyagarajan; Zeng, Qiandong; Berman, Judith; Berriman, Matthew; Heitman, Joseph; Gow, Neil A R; Lorenz, Michael C; Birren, Bruce W; Kellis, Manolis; Cuomo, Christina A

    2009-06-01

    Candida species are the most common cause of opportunistic fungal infection worldwide. Here we report the genome sequences of six Candida species and compare these and related pathogens and non-pathogens. There are significant expansions of cell wall, secreted and transporter gene families in pathogenic species, suggesting adaptations associated with virulence. Large genomic tracts are homozygous in three diploid species, possibly resulting from recent recombination events. Surprisingly, key components of the mating and meiosis pathways are missing from several species. These include major differences at the mating-type loci (MTL); Lodderomyces elongisporus lacks MTL, and components of the a1/α2 cell identity determinant were lost in other species, raising questions about how mating and cell types are controlled. Analysis of the CUG leucine-to-serine genetic-code change reveals that 99% of ancestral CUG codons were erased and new ones arose elsewhere. Lastly, we revise the Candida albicans gene catalogue, identifying many new genes. PMID:19465905

  17. Cloning of complete genome sets of six dsRNA viruses using an improved cloning method for large dsRNA genes.

    PubMed

    Potgieter, A C; Steele, A D; van Dijk, A A

    2002-09-01

    Cloning full-length large (>3 kb) dsRNA genome segments from small amounts of dsRNA has thus far remained problematic. Here, a single-primer amplification sequence-independent dsRNA cloning procedure was perfected for large genes and tailored for routine use to clone complete genome sets or individual genes. Nine complete viral genome sets were amplified by PCR, namely those of two human rotaviruses, two African horsesickness viruses (AHSV), two equine encephalosis viruses (EEV), one bluetongue virus (BTV), one reovirus and bacteriophage Phi12. Of these amplified genomes, six complete genome sets were cloned for viruses with genes ranging in size from 0.8 to 6.8 kb. Rotavirus dsRNA was extracted directly from stool samples. Co-expressed EEV VP3 and VP7 assembled into core-like particles that have typical orbivirus capsomeres. This work presents the first EEV sequence data and establishes that EEV genes have the same conserved termini (5' GUU and UAC 3') and coding assignment as AHSV and BTV. To clone complete genome sets, one-tube reactions were developed for oligo-ligation, cDNA synthesis and PCR amplification. The method is simple and efficient compared to other methods. Complete genomes can be cloned from as little as 1 ng dsRNA and a considerably reduced number of PCR cycles (22-30 cycles compared to 30-35 of other methods). This progress with cloning large dsRNA genes is important for recombinant vaccine development and determination of the role of terminal sequences for replication and gene expression. PMID:12185276

  18. Inter-genomic DNA Exchanges and Homeologous Gene Silencing Shaped the Nascent Allopolyploid Coffee Genome (Coffea arabica L.)

    PubMed Central

    Lashermes, Philippe; Hueber, Yann; Combes, Marie-Christine; Severac, Dany; Dereeper, Alexis

    2016-01-01

    Allopolyploidization is a biological process that has played a major role in plant speciation and evolution. Genomic changes are common consequences of polyploidization, but their dynamics over time are still poorly understood. Coffea arabica, a recently formed allotetraploid, was chosen to study genetic changes that accompany allopolyploid formation. Both RNA-seq and DNA-seq data were generated from two genetically distant C. arabica accessions. Genomic structural variation was investigated using C. canephora, one of its diploid progenitors, as reference genome. The fate of 9047 duplicate homeologous genes was inferred and compared between the accessions. The pattern of SNP density along the reference genome was consistent with the allopolyploid structure. Large genomic duplications or deletions were not detected. Two homeologous copies were retained and expressed in 96% of the genes analyzed. Nevertheless, duplicated genes were found to be affected by various genomic changes leading to homeolog loss or silencing. Genetic and epigenetic changes were evidenced that could have played a major role in the stabilization of the unique ancestral allotetraploid and its subsequent diversification. While the early evolution of C. arabica mainly involved homeologous crossover exchanges, the later stage appears to have relied on more gradual evolution involving gene conversion and homeolog silencing. PMID:27440920

  19. Inferring Demography from Runs of Homozygosity in Whole-Genome Sequence, with Correction for Sequence Errors

    PubMed Central

    MacLeod, Iona M.; Larkin, Denis M.; Lewin, Harris A.; Hayes, Ben J.; Goddard, Mike E.

    2013-01-01

    Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges to fully utilize such large data sets and to ensure that sequencing errors do not introduce bias into the inferred demography. Using whole-genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to present. There was a strong upward bias in estimates of recent effective population size (Ne) if the correction method was not applied to the data, both for our method and the Li and Durbin (Inference of human population history from individual whole-genome sequences. Nature 475:493–496) pairwise sequentially Markovian coalescent method. To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in Ne. The LD statistic summarizes the distribution of runs of homozygosity for any given demography. We infer a best fit demography as one that predicts a match with the observed distribution of runs of homozygosity in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in the Ne around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction coinciding with the period of cattle domestication, with Ne of between 3,500 and 6,000. The most recent reduction of Ne to approximately 100 in the Holstein breed agrees well with estimates from pedigrees. Our approach can be applied to whole-genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals. PMID:23842528

  20. Complete mitochondrial genome of the Antarctic midge Parochlus steinenii (Diptera: Chironomidae).

    PubMed

    Kim, Sanghee; Kim, Hanna; Shin, Seung Chul

    2016-09-01

    Parochlus steinenii is a winged midge found in the Antarctic Peninsula and its offshore islands. We determined the complete mitochondrial genome sequence of P. steinenii, which is comprised of 16 803 nucleotides and contains 13 protein-coding genes (PCGs), 22 tRNA genes, and the large (rrnL) and small (rrnS) rRNA genes. Its total A + T content is 72.5%. The PCG arrangement of P. steinenii is identical to that of the ancestral Diptera ground pattern. This is the first report on the mitogenome sequence of an Antarctic midge, and provides insights into the evolution of dipterans in Antarctica. PMID:26642812
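
    The abstract above reports a total A + T content for the mitogenome. A minimal sketch of how such a percentage can be computed from a nucleotide string follows; the function name and the short test sequence are hypothetical, and treating ambiguous bases (such as N) as excluded from the denominator is an assumption, not something the abstract specifies.

```python
def at_content(sequence):
    """Return the A + T percentage of a DNA sequence.

    Case-insensitive. Only unambiguous bases (A, C, G, T) are counted,
    so ambiguity codes such as N do not affect the result.
    """
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    at_count = sum(1 for b in bases if b in "AT")
    return 100.0 * at_count / len(bases)


# Hypothetical toy sequence, not the P. steinenii mitogenome.
example_percent = at_content("AATTGC")
```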

  1. Evidence from opsin genes rejects nocturnality in ancestral primates

    PubMed Central

    Tan, Ying; Yoder, Anne D.; Yamashita, Nayuta; Li, Wen-Hsiung

    2005-01-01

    It is firmly believed that ancestral primates were nocturnal, with nocturnality having been maintained in most prosimian lineages. Under this traditional view, the opsin genes in all nocturnal prosimians should have undergone similar degrees of functional relaxation and accumulated similar extents of deleterious mutations. This expectation is rejected by the short-wavelength (S) opsin gene sequences from 14 representative prosimians. We found severe defects of the S opsin gene only in lorisiforms, but no defect in five nocturnal and two diurnal lemur species and only minor defects in two tarsiers and two nocturnal lemurs. Further, the nonsynonymous-to-synonymous rate ratio of the S opsin gene is highest in the lorisiforms and varies among the other prosimian branches, indicating different time periods of functional relaxation among lineages. These observations suggest that the ancestral primates were diurnal or cathemeral and that nocturnality has evolved several times in the prosimians, first in the lorisiforms but much later in other lineages. This view is further supported by the distribution pattern of the middle-wavelength (M) and long-wavelength (L) opsin genes among prosimians. PMID:16192351

  2. Functional conservation of an ancestral Pellino protein in helminth species.

    PubMed

    Cluxton, Christopher D; Caffrey, Brian E; Kinsella, Gemma K; Moynagh, Paul N; Fares, Mario A; Fallon, Padraic G

    2015-01-01

    The immune system of H. sapiens has innate signaling pathways that arose in ancestral species. This is exemplified by the discovery of the Toll-like receptor (TLR) pathway using free-living model organisms such as Drosophila melanogaster. The TLR pathway is ubiquitous and controls sensitivity to pathogen-associated molecular patterns (PAMPs) in eukaryotes. There is, however, a marked absence of this pathway from the Platyhelminthes, with the exception of the Pellino protein family, which is present in a number of species from this phylum. Helminth Pellino proteins are conserved, having high similarity, both at the sequence and predicted structural protein level, with that of human Pellino proteins. Pellino from a model helminth, Schistosoma mansoni Pellino (SmPellino), was shown to bind and poly-ubiquitinate human IRAK-1, displaying E3 ligase activity consistent with its human counterparts. When transfected into human cells SmPellino is functional, interacting with signaling proteins and modulating mammalian signaling pathways. Strict conservation of a protein family in species lacking its niche signalling pathway is rare and provides a platform to examine the ancestral functions of Pellino proteins that may translate into novel mechanisms of immune regulation in humans. PMID:26120048

  3. Paired-End Sequencing of Long-Range DNA Fragments for De Novo Assembly of Large, Complex Mammalian Genomes by Direct Intra-Molecule Ligation

    PubMed Central

    Wu, Kui; Cai, Qingle; Wang, Yu; Lang, Yongshan; Cao, Hongzhi; Yang, Huangming; Wang, Jian; Zhang, Xiuqing

    2012-01-01

    Background The relatively short read lengths from next generation sequencing (NGS) technologies still pose a challenge for de novo assembly of complex mammalian genomes. One important solution is to use paired-end (PE) sequence information experimentally obtained from long-range DNA fragments (>1 kb). Here, we characterize and extend a long-range PE library construction method based on direct intra-molecule ligation (or molecular linker-free circularization) for NGS. Results We found that the method performs stably for PE sequencing of 2- to 5-kb DNA fragments, and can be extended to 10–20 kb (and even in extremes, up to ∼35 kb). We also characterized the impact of low quality input DNA on the method, and developed a whole-genome amplification (WGA) based protocol using limited input DNA (<1 µg). Using this PE dataset, we accurately assembled the YanHuang (YH) genome, the first sequenced Asian genome, into a scaffold N50 size of >2 Mb, which is over 100 times greater than the initial size produced with only small insert PE reads (17 kb). In addition, we mapped two 7- to 8-kb insertions in the YH genome using the larger insert sizes of the long-range PE data. Conclusions In conclusion, we demonstrate here the effectiveness of this long-range PE sequencing method and its use for the de novo assembly of a large, complex genome using NGS short reads. PMID:23029438
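
    The abstract above summarizes assembly contiguity with the scaffold N50. As a minimal sketch of that statistic under its common definition (the length L such that scaffolds of length L or longer hold at least half of the total assembled bases), the following may help; the function name and the toy lengths are hypothetical and unrelated to the YH assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are visited from longest to shortest; the N50 is the length
    of the scaffold at which the running total first reaches half of the
    summed length of all scaffolds.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly: total length 20, so half is 10;
# 6 + 5 = 11 reaches it, giving an N50 of 5.
toy_n50 = n50([2, 3, 4, 5, 6])
```

    Because N50 depends only on the length distribution, it says nothing about whether scaffolds are joined correctly, which is why the abstract separately describes the assembly as accurate.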

  4. Patterns of metabolite changes identified from large-scale gene perturbations in Arabidopsis using a genome-scale metabolic network.

    PubMed

    Kim, Taehyong; Dreher, Kate; Nilo-Poyanco, Ricardo; Lee, Insuk; Fiehn, Oliver; Lange, Bernd Markus; Nikolau, Basil J; Sumner, Lloyd; Welti, Ruth; Wurtele, Eve S; Rhee, Seung Y

    2015-04-01

    Metabolomics enables quantitative evaluation of metabolic changes caused by genetic or environmental perturbations. However, little is known about how perturbing a single gene changes the metabolic system as a whole and which network and functional properties are involved in this response. To answer this question, we investigated the metabolite profiles from 136 mutants with single gene perturbations of functionally diverse Arabidopsis (Arabidopsis thaliana) genes. Fewer than 10 metabolites were changed significantly relative to the wild type in most of the mutants, indicating that the metabolic network was robust to perturbations of single metabolic genes. These changed metabolites were closer to each other in a genome-scale metabolic network than expected by chance, supporting the notion that the genetic perturbations changed the network more locally than globally. Surprisingly, the changed metabolites were close to the perturbed reactions in only 30% of the mutants of the well-characterized genes. To determine the factors that contributed to the distance between the observed metabolic changes and the perturbation site in the network, we examined nine network and functional properties of the perturbed genes. Only the isozyme number affected the distance between the perturbed reactions and changed metabolites. This study revealed patterns of metabolic changes from large-scale gene perturbations and relationships between characteristics of the perturbed genes and metabolic changes. PMID:25670818

  5. Once a Batesian mimic, not always a Batesian mimic: mimic reverts back to ancestral phenotype when the model is absent

    PubMed Central

    Prudic, Kathleen L; Oliver, Jeffrey C

    2008-01-01

    Batesian mimics gain protection from predation through the evolution of physical similarities to a model species that possesses anti-predator defences. This protection should not be effective in the absence of the model since the predator does not identify the mimic as potentially dangerous and both the model and the mimic are highly conspicuous. Thus, Batesian mimics should probably encounter strong predation pressure outside the geographical range of the model species. There are several documented examples of Batesian mimics occurring in locations without their models, but the evolutionary responses remain largely unidentified. A mimetic species has four alternative evolutionary responses to the loss of model presence. If predation is weak, it could maintain its mimetic signal. If predation is intense, it is widely presumed the mimic will go extinct. However, the mimic could also evolve a new colour pattern to mimic another model species or it could revert back to its ancestral, less conspicuous phenotype. We used molecular phylogenetic approaches to reconstruct and test the evolution of mimicry in the North American admiral butterflies (Limenitis: Nymphalidae). We confirmed that the more cryptic white-banded form is the ancestral phenotype of North American admiral butterflies. However, one species, Limenitis arthemis, evolved the black pipevine swallowtail mimetic form but later reverted to the white-banded more cryptic ancestral form. This character reversion is strongly correlated with the geographical absence of the model species and its host plant, but not the host plant distribution of L. arthemis. Our results support the prediction that a Batesian mimic does not persist in locations without its model, but it does not go extinct either. The mimic can revert back to its ancestral, less conspicuous form and persist. PMID:18285285

  6. Large and variable genome size unrelated to serpentine adaptation but supportive of cryptic sexuality in Cenococcum geophilum.

    PubMed

    Bourne, Elizabeth C; Mina, Diogo; Gonçalves, Susana C; Loureiro, João; Freitas, Helena; Muller, Ludo A H

    2014-01-01

    Estimations of genome size and its variation can provide valuable information regarding the genetic diversity of organisms and their adaptation potential to heterogeneous environments. We used flow cytometry to characterize the variation in genome size among 40 isolates of Cenococcum geophilum, an ectomycorrhizal fungus with a wide ecological and geographical distribution, obtained from two serpentine and two non-serpentine sites in Portugal. Besides determining the genome size and its intraspecies variation, we wanted to assess whether a relationship exists between genome size and the edaphic background of the C. geophilum isolates. Our results reveal C. geophilum to have one of the largest genome sizes so far measured in the Ascomycota, with a mean haploid genome size estimate of 0.208 pg (203 Mbp). However, no relationship was found between genome size and the edaphic background of the sampled isolates, indicating genetic and demographic processes to be more important for shaping the genome size variation in this species than environmental selection. The detection of variation in ploidy level among our isolates, including a single individual with both presumed haploid and diploid nuclei, provides supportive evidence for a possible cryptic sexual or parasexual cycle in C. geophilum (although other mechanisms may have caused this variation). The existence of such a cycle would have wide significance, explaining the high levels of genetic diversity and likelihood of recombination previously reported in this species, and adds to the increasing number of studies suggesting sexual cycles in previously assumed asexual fungi. PMID:23754539
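
The two genome-size figures in the abstract above (0.208 pg and 203 Mbp) are related by the commonly used conversion of roughly 978 Mbp per picogram of DNA. The short Python sketch below checks that arithmetic; the constant and function names are illustrative, and the conversion factor is the widely cited approximation rather than a value stated in the abstract.

```python
# Approximate number of megabase pairs in one picogram of double-stranded DNA
# (a widely used conversion factor; assumed here, not taken from the abstract).
MBP_PER_PG = 978.0


def pg_to_mbp(picograms):
    """Convert a DNA mass in picograms to an approximate size in Mbp."""
    return picograms * MBP_PER_PG


# Mean haploid genome size reported for C. geophilum in the abstract.
print(round(pg_to_mbp(0.208)))
```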

  7. A novel sandwich hybridization method for selecting cDNAs from large genomic regions: Identification of cDNAs from the cloned genomic DNA spanning the XLRP locus

    SciTech Connect

    Yan, D.; McHenry, C.; Fujita, R.

    1994-09-01

    We have developed an efficient hybridization-based cDNA-selection method. A sandwich of three species - single-stranded cDNA, tagged RNA derived from genomic DNA, and biotinylated RNA complementary to the tag - allows specific retention of hybrids on an avidin-matrix. Previously, using model experiments, we demonstrated highly specific and efficient selection of a retinal gene, NRL, from complex mixtures of cDNA clones, using a sub-library from a 5 kb NRL genomic clone. We have now applied this selection strategy to isolate cDNAs from human adult retina and fetal eye libraries, with the "genomic RNA" derived from two YAC clones (OTC-C and 55B) spanning the region of X-linked retinitis pigmentosa (XLRP) locus RP3 at Xp21.1. Effectiveness of the selection method was monitored by enrichment of the TCTEX-1L gene, which maps within the 55B YAC. Of the 15 selected cDNA clones that hybridized to the 55B YAC DNA, five appear to map to specific cosmid clones derived from the 55B YAC. Inserts in these selected cDNA clones range from 0.5 to 2.3 kb in size. Additional clones are now being isolated and characterized. This procedure should be independent of the size or complexity of genomic DNA being used for selection, allow for the isolation of full-length cDNAs, and may have wider application.

  8. DEDUCTIONS ABOUT THE NUMBER, ORGANIZATION AND EVOLUTION OF GENES IN THE TOMATO GENOME BASED ON ANALYSIS OF LARGE EST COLLECTION AND SELECTIVE GENOMIC SEQUENCING

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Analysis of a collection of 120,892 single pass ESTs, derived from 26 different tomato cDNA libraries and reduced to a set of 27,274 unique consensus sequences (unigenes) reveals that 70% of the unigenes have identifiable homologs in the arabidopsis genome. Many of the most highly conserved multige...

  9. Did warfare among ancestral hunter-gatherers affect the evolution of human social behaviors?

    PubMed

    Bowles, Samuel

    2009-06-01

    Since Darwin, intergroup hostilities have figured prominently in explanations of the evolution of human social behavior. Yet whether ancestral humans were largely "peaceful" or "warlike" remains controversial. I ask a more precise question: If more cooperative groups were more likely to prevail in conflicts with other groups, was the level of intergroup violence sufficient to influence the evolution of human social behavior? Using a model of the evolutionary impact of between-group competition and a new data set that combines archaeological evidence on causes of death during the Late Pleistocene and early Holocene with ethnographic and historical reports on hunter-gatherer populations, I find that the estimated level of mortality in intergroup conflicts would have had substantial effects, allowing the proliferation of group-beneficial behaviors that were quite costly to the individual altruist. PMID:19498163

  10. Estimation of hominoid ancestral population sizes under Bayesian coalescent models incorporating mutation rate variation and sequencing errors.

    PubMed

    Burgess, Ralph; Yang, Ziheng

    2008-09-01

    Estimation of population parameters for the common ancestors of humans and the great apes is important in understanding our evolutionary history. In particular, inference of population size for the human-chimpanzee common ancestor may shed light on the process by which the 2 species separated and on whether the human population experienced a severe size reduction in its early evolutionary history. In this study, the Bayesian method of ancestral inference of Rannala and Yang (2003. Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics. 164:1645-1656) was extended to accommodate variable mutation rates among loci and random species-specific sequencing errors. The model was applied to analyze a genome-wide data set of approximately 15,000 neutral loci (7.4 Mb) aligned for human, chimpanzee, gorilla, orangutan, and macaque. We obtained robust and precise estimates for effective population sizes along the hominoid lineage extending back approximately 30 Myr to the cercopithecoid divergence. The results showed that ancestral populations were 5-10 times larger than modern humans along the entire hominoid lineage. The estimates were robust to the priors used and to model assumptions about recombination. The unusually low X chromosome divergence between human and chimpanzee could not be explained by variation in the male mutation bias or by current models of hybridization and introgression. Instead, our parameter estimates were consistent with a simple instantaneous process for human-chimpanzee speciation but showed a major reduction in X chromosome effective population size peculiar to the human-chimpanzee common ancestor, possibly due to selective sweeps on the X prior to separation of the 2 species. PMID:18603620

  11. Experimental evidence for the thermophilicity of ancestral life

    PubMed Central

    Akanuma, Satoshi; Nakajima, Yoshiki; Yokobori, Shin-ichi; Kimura, Mitsuo; Nemoto, Naoki; Mase, Tomoko; Miyazono, Ken-ichi; Tanokura, Masaru; Yamagishi, Akihiko

    2013-01-01

    Theoretical studies have focused on the environmental temperature of the universal common ancestor of life with conflicting conclusions. Here we provide experimental support for the existence of a thermophilic universal common ancestor. We present the thermal stabilities and catalytic efficiencies of nucleoside diphosphate kinases (NDK), designed using the information contained in predictive phylogenetic trees, that seem to represent the last common ancestors of Archaea and of Bacteria. These enzymes display extreme thermal stabilities, suggesting thermophilic ancestries for Archaea and Bacteria. The results are robust to the uncertainties associated with the sequence predictions and to the tree topologies used to infer the ancestral sequences. Moreover, mutagenesis experiments suggest that the universal ancestor also possessed a very thermostable NDK. Because, as we show, the stability of an NDK is directly related to the environmental temperature of its host organism, our results indicate that the last common ancestor of extant life was a thermophile that flourished at a very high temperature. PMID:23776221

  12. Ancestral reproductive structure in basal kelp Aureophycus aleuticus

    PubMed Central

    Kawai, Hiroshi; Hanyuda, Takeaki; Ridgway, L. Michelle; Holser, Karin

    2013-01-01

    Laminarialean species (so-called kelps) are the largest photosynthetic organisms in aquatic environments, constituting significant ecological components of coastal ecosystems. The largest kelps such as Macrocystis exhibit differentiation between stipe and blade, as well as buoyancy to maintain the distal portion at the water's surface for photosynthesis, while bearing reproductive structures only near the base on special blades (sporophylls). There is a considerable gap between basic kelps such as Chorda and derived kelps, and the evolution of kelp specialization remains unclear. Here we report novel reproductive adaptations in the recently discovered species Aureophycus aleuticus; unlike any known kelps, A. aleuticus forms zoidangia only on the expanded, disc-shaped holdfast. Molecular phylogeny suggests that A. aleuticus is most basal among derived kelps. Because Aureophycus lacks any of the elaborate anatomical structures found in other derived kelps, we suggest that it exhibits some of the most ancestral morphological features of kelps. PMID:23966101

  13. The evolution of MICOS: Ancestral and derived functions and interactions

    PubMed Central

    Muñoz-Gómez, Sergio A; Slamovits, Claudio H; Dacks, Joel B; Wideman, Jeremy G

    2015-01-01

    The MItochondrial Contact Site and Cristae Organizing System (MICOS) is required for the biogenesis and maintenance of mitochondrial cristae as well as the proper tethering of the mitochondrial inner and outer membranes. We recently demonstrated that the core components of MICOS, Mic10 and Mic60, are near-ubiquitous eukaryotic features inferred to have been present in the last eukaryote common ancestor. We also showed that Mic60 could be traced to α-proteobacteria, which suggests that mitochondrial cristae evolved from α-proteobacterial intracytoplasmic membranes. Here, we extend our evolutionary analysis to MICOS-interacting proteins (e.g., Sam50, Mia40, DNAJC11, DISC-1, QIL1, Aim24, and Cox17) and discuss the implications for both derived and ancestral functions of MICOS. PMID:27065250

  14. Female song is widespread and ancestral in songbirds.

    PubMed

    Odom, Karan J; Hall, Michelle L; Riebel, Katharina; Omland, Kevin E; Langmore, Naomi E

    2014-01-01

    Bird song has historically been considered an almost exclusively male trait, an observation fundamental to the formulation of Darwin's theory of sexual selection. Like other male ornaments, song is used by male songbirds to attract females and compete with rivals. Thus, bird song has become a textbook example of the power of sexual selection to lead to extreme neurological and behavioural sex differences. Here we present an extensive survey and ancestral state reconstruction of female song across songbirds showing that female song is present in 71% of surveyed species including 32 families, and that females sang in the common ancestor of modern songbirds. Our results reverse classical assumptions about the evolution of song and sex differences in birds. The challenge now is to identify whether sexual selection alone or broader processes, such as social or natural selection, best explain the evolution of elaborate traits in both sexes. PMID:24594930

  15. Genome Reduction Uncovers a Large Dispensable Genome and Adaptive Role for Copy Number Variation in Asexually Propagated Solanum tuberosum

    PubMed Central

    Hardigan, Michael A.; Crisovan, Emily; Hamilton, John P.; Laimbeer, Parker; Leisner, Courtney P.; Manrique-Carpintero, Norma C.; Newton, Linsey; Pham, Gina M.; Vaillancourt, Brieanne; Zeng, Zixian; Jiang, Jiming

    2016-01-01

    Clonally reproducing plants have the potential to bear a significantly greater mutational load than sexually reproducing species. To investigate this possibility, we examined the breadth of genome-wide structural variation in a panel of monoploid/doubled monoploid clones generated from native populations of diploid potato (Solanum tuberosum), a highly heterozygous asexually propagated plant. As rare instances of purely homozygous clones, they provided an ideal set for determining the degree of structural variation tolerated by this species and deriving its minimal gene complement. Extensive copy number variation (CNV) was uncovered, impacting 219.8 Mb (30.2%) of the potato genome with nearly 30% of genes subject to at least partial duplication or deletion, revealing the highly heterogeneous nature of the potato genome. Dispensable genes (>7000) were associated with limited transcription and/or a recent evolutionary history, with lower deletion frequency observed in genes conserved across angiosperms. Association of CNV with plant adaptation was highlighted by enrichment in gene clusters encoding functions for environmental stress response, with gene duplication playing a part in species-specific expansions of stress-related gene families. This study revealed unique impacts of CNV in a species with asexual reproductive habits and how CNV may drive adaption through evolution of key stress pathways. PMID:26772996

  16. Large-scale sequencing based on full-length-enriched cDNA libraries in pigs: contribution to annotation of the pig genome draft sequence

    PubMed Central

    2012-01-01

    Background Along with the draft sequencing of the pig genome, which has been completed by an international consortium, collection of the nucleotide sequences of genes expressed in various tissues and determination of entire cDNA sequences are necessary for investigations of gene function. The sequences of expressed genes are also useful for genome annotation, which is important for isolating the genes responsible for particular traits. Results We performed a large-scale expressed sequence tag (EST) analysis in pigs by using 32 full-length-enriched cDNA libraries derived from 28 kinds of tissues and cells, including seven tissues (brain, cerebellum, colon, hypothalamus, inguinal lymph node, ovary, and spleen) derived from pigs that were cloned from a sow subjected to genome sequencing. We obtained more than 330,000 EST reads from the 5′-ends of the cDNA clones. Comparison with human and bovine gene catalogs revealed that the ESTs corresponded to at least 15,000 genes. cDNA clones representing contigs and singlets generated by assembly of the EST reads were subjected to full-length determination of inserts. We have finished sequencing 31,079 cDNA clones corresponding to more than 12,000 genes. Mapping of the sequences of these cDNA clones on the draft sequence of the pig genome has indicated that the clones are derived from about 15,000 independent loci on the pig genome. Conclusions ESTs and cDNA sequences derived from full-length-enriched libraries are valuable for annotation of the draft sequence of the pig genome. This information will also contribute to the exploration of promoter sequences on the genome and to molecular biology-based analyses in pigs. PMID:23150988

  17. Large scale full-length cDNA sequencing reveals a unique genomic landscape in a lepidopteran model insect, Bombyx mori.

    PubMed

    Suetsugu, Yoshitaka; Futahashi, Ryo; Kanamori, Hiroyuki; Kadono-Okuda, Keiko; Sasanuma, Shun-ichi; Narukawa, Junko; Ajimura, Masahiro; Jouraku, Akiya; Namiki, Nobukazu; Shimomura, Michihiko; Sezutsu, Hideki; Osanai-Futahashi, Mizuko; Suzuki, Masataka G; Daimon, Takaaki; Shinoda, Tetsuro; Taniai, Kiyoko; Asaoka, Kiyoshi; Niwa, Ryusuke; Kawaoka, Shinpei; Katsuma, Susumu; Tamura, Toshiki; Noda, Hiroaki; Kasahara, Masahiro; Sugano, Sumio; Suzuki, Yutaka; Fujiwara, Haruhiko; Kataoka, Hiroshi; Arunkumar, Kallare P; Tomar, Archana; Nagaraju, Javaregowda; Goldsmith, Marian R; Feng, Qili; Xia, Qingyou; Yamamoto, Kimiko; Shimada, Toru; Mita, Kazuei

    2013-09-01

    The establishment of a complete genomic sequence of silkworm, the model species of Lepidoptera, laid a foundation for its functional genomics. A more complete annotation of the genome will benefit functional and comparative studies and accelerate extensive industrial applications for this insect. To realize these goals, we embarked upon a large-scale full-length cDNA collection from 21 full-length cDNA libraries derived from 14 tissues of the domesticated silkworm and performed full sequencing by primer walking for 11,104 full-length cDNAs. The large average intron size was 1904 bp, resulting from a high accumulation of transposons. Using gene models predicted by GLEAN and published mRNAs, we identified 16,823 gene loci on the silkworm genome assembly. Orthology analysis of 153 species, including 11 insects, revealed that among three Lepidoptera including Monarch and Heliconius butterflies, the 403 largest silkworm-specific genes were composed mainly of protective immunity, hormone-related, and characteristic structural proteins. Analysis of testis-/ovary-specific genes revealed distinctive features of sexual dimorphism, including depletion of ovary-specific genes on the Z chromosome in contrast to an enrichment of testis-specific genes. More than 40% of genes expressed in specific tissues mapped in tissue-specific chromosomal clusters. The newly obtained FL-cDNA sequences enabled us to annotate the genome of this lepidopteran model insect more accurately, enhancing genomic and functional studies of Lepidoptera and comparative analyses with other insect orders, and yielding new insights into the evolution and organization of lepidopteran-specific genes. PMID:23821615

  18. Large Scale Full-Length cDNA Sequencing Reveals a Unique Genomic Landscape in a Lepidopteran Model Insect, Bombyx mori

    PubMed Central

    Suetsugu, Yoshitaka; Futahashi, Ryo; Kanamori, Hiroyuki; Kadono-Okuda, Keiko; Sasanuma, Shun-ichi; Narukawa, Junko; Ajimura, Masahiro; Jouraku, Akiya; Namiki, Nobukazu; Shimomura, Michihiko; Sezutsu, Hideki; Osanai-Futahashi, Mizuko; Suzuki, Masataka G; Daimon, Takaaki; Shinoda, Tetsuro; Taniai, Kiyoko; Asaoka, Kiyoshi; Niwa, Ryusuke; Kawaoka, Shinpei; Katsuma, Susumu; Tamura, Toshiki; Noda, Hiroaki; Kasahara, Masahiro; Sugano, Sumio; Suzuki, Yutaka; Fujiwara, Haruhiko; Kataoka, Hiroshi; Arunkumar, Kallare P.; Tomar, Archana; Nagaraju, Javaregowda; Goldsmith, Marian R.; Feng, Qili; Xia, Qingyou; Yamamoto, Kimiko; Shimada, Toru; Mita, Kazuei

    2013-01-01

    The establishment of a complete genomic sequence of silkworm, the model species of Lepidoptera, laid a foundation for its functional genomics. A more complete annotation of the genome will benefit functional and comparative studies and accelerate extensive industrial applications for this insect. To realize these goals, we embarked upon a large-scale full-length cDNA collection from 21 full-length cDNA libraries derived from 14 tissues of the domesticated silkworm and performed full sequencing by primer walking for 11,104 full-length cDNAs. The large average intron size was 1904 bp, resulting from a high accumulation of transposons. Using gene models predicted by GLEAN and published mRNAs, we identified 16,823 gene loci on the silkworm genome assembly. Orthology analysis of 153 species, including 11 insects, revealed that among three Lepidoptera including Monarch and Heliconius butterflies, the 403 largest silkworm-specific genes were composed mainly of protective immunity, hormone-related, and characteristic structural proteins. Analysis of testis-/ovary-specific genes revealed distinctive features of sexual dimorphism, including depletion of ovary-specific genes on the Z chromosome in contrast to an enrichment of testis-specific genes. More than 40% of genes expressed in specific tissues mapped in tissue-specific chromosomal clusters. The newly obtained FL-cDNA sequences enabled us to annotate the genome of this lepidopteran model insect more accurately, enhancing genomic and functional studies of Lepidoptera and comparative analyses with other insect orders, and yielding new insights into the evolution and organization of lepidopteran-specific genes. PMID:23821615

  19. Positive-selection and ligation-independent cloning vectors for large scale in planta expression for plant functional genomics.

    PubMed

    Oh, Sang-Keun; Kim, Saet-Byul; Yeom, Seon-In; Lee, Hyun-Ah; Choi, Doil

    2010-12-01

    Transient expression is an easy, rapid and powerful technique for producing proteins of interest in plants. Recombinational cloning is highly efficient but has disadvantages, including complicated, time-consuming cloning procedures and expensive enzymes for large-scale gene cloning. To overcome these limitations, we developed new ligation-independent cloning (LIC) vectors derived from binary vectors including tobacco mosaic virus (pJL-TRBO), potato virus X (pGR106) and the pBI121 vector-based pMBP1. LIC vectors were modified to enable directional cloning of PCR products without restriction enzyme digestion or ligation reactions. In addition, the ccdB gene, which encodes a potent cell-killing protein, was introduced between the two LIC adapter sites in the pJL-LIC, pGR-LIC, and pMBP-LIC vectors for the efficient selection of recombinant clones. These new vectors do not require restriction enzymes, alkaline phosphatase, or DNA ligase for cloning. To clone, the three LIC vectors are digested with SnaBI and treated with T4 DNA polymerase, which includes 3' to 5' exonuclease activity in the presence of only one dNTP (dGTP for the inserts and dCTP for the vector). To make recombinants, the vector plasmid and the insert PCR fragment were annealed at room temperature for 20 min prior to transformation into the host. Bacterial transformation was accomplished with 100% efficiency. To validate the new LIC vector systems, we used them to coexpress the Phytophthora AVR and potato resistance (R) genes in N. benthamiana by infiltration of Agrobacterium. Coexpressed AVR and R genes in N. benthamiana induced the typical hypersensitive cell death resulting from in vivo interaction of the two proteins. These LIC vectors could be efficiently used for high-throughput cloning and laboratory-scale in planta expression. These vectors could provide a powerful tool for high-throughput transient expression assays for functional genomic studies in plants. PMID:21340673
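
The single-dNTP step described in the abstract above can be pictured with a toy model: in the presence of only one dNTP, the 3' to 5' exonuclease activity of T4 DNA polymerase trims a strand from its 3' end until it reaches the first base that matches the supplied nucleotide, leaving a defined single-stranded overhang on the opposite strand. The Python sketch below models only that stopping rule; the function name `chew_back` and the example sequence are hypothetical and are not the adapter sequences of the pJL-LIC, pGR-LIC, or pMBP-LIC vectors.

```python
def chew_back(strand_5to3, dntp):
    """Toy model of T4 DNA polymerase trimming with a single dNTP present.

    strand_5to3: one DNA strand written 5'->3'.
    dntp: the single base supplied ('G' for dGTP, 'C' for dCTP, ...).

    Bases are removed from the 3' end until the 3'-terminal base equals
    `dntp`, where removal and re-incorporation balance and trimming stops.
    If the base never occurs, this simplified model trims the whole strand.
    """
    end = len(strand_5to3)
    while end > 0 and strand_5to3[end - 1] != dntp:
        end -= 1
    return strand_5to3[:end]


# Hypothetical 9-nt strand, for illustration only.
print(chew_back("ATGCCTTAA", "G"))
```

In this simplified picture, designing the insert and vector ends so that the trimmed regions are complementary is what lets the two pieces anneal without ligase.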

  20. Divergence in Enzymatic Activities in the Soybean GST Supergene Family Provides New Insight into the Evolutionary Dynamics of Whole-Genome Duplicates

    PubMed Central

    Liu, Hai-Jing; Tang, Zhen-Xin; Han, Xue-Min; Yang, Zhi-Ling; Zhang, Fu-Min; Yang, Hai-Ling; Liu, Yan-Jing; Zeng, Qing-Yin

    2015-01-01

    Whole-genome duplication (WGD), or polyploidy, is a major force in plant genome evolution. A duplicate of all genes is present in the genome immediately following a WGD event. However, the evolutionary mechanisms responsible for the loss of, or retention and subsequent functional divergence of, polyploidy-derived duplicates remain largely unknown. In this study we reconstructed the evolutionary history of the glutathione S-transferase (GST) gene family from the soybean genome, and identified 72 GST duplicated gene pairs formed by a recent Glycine-specific WGD event occurring approximately 13 Ma. We found that 72% of duplicated GST gene pairs experienced gene losses or pseudogenization, whereas 28% of GST gene pairs have been retained in the soybean genome. The GST pseudogenes were under relaxed selective constraints, whereas functional GSTs were subject to strong purifying selection. Plant GST genes play important roles in stress tolerance and detoxification metabolism. By examining the gene expression responses to abiotic stresses and enzymatic properties of the ancestral and current proteins, we found that polyploidy-derived GST duplicates show divergence in enzymatic activities. Through site-directed mutagenesis of ancestral proteins, this study revealed that nonsynonymous substitutions of key amino acid sites play an important role in the divergence of enzymatic functions of polyploidy-derived GST duplicates. These findings provide new insights into the evolutionary and functional dynamics of polyploidy-derived duplicate genes. PMID:26219583

  1. Transgenerational actions of environmental compounds on reproductive disease and identification of epigenetic biomarkers of ancestral exposures.

    PubMed

    Manikkam, Mohan; Guerrero-Bosagna, Carlos; Tracey, Rebecca; Haque, Md M; Skinner, Michael K

    2012-01-01

    Environmental factors during fetal development can induce a permanent epigenetic change in the germ line (sperm) that then transmits epigenetic transgenerational inheritance of adult-onset disease in the absence of any subsequent exposure. The epigenetic transgenerational actions of various environmental compounds and relevant mixtures were investigated with the use of a pesticide mixture (permethrin and insect repellant DEET), a plastic mixture (bisphenol A and phthalates), dioxin (TCDD) and a hydrocarbon mixture (jet fuel, JP8). After transient exposure of F0 gestating female rats during the period of embryonic gonadal sex determination, the subsequent F1-F3 generations were obtained in the absence of any environmental exposure. The effects on the F1, F2 and F3 generations pubertal onset and gonadal function were assessed. The plastics, dioxin and jet fuel were found to promote early-onset female puberty transgenerationally (F3 generation). Spermatogenic cell apoptosis was affected transgenerationally. Ovarian primordial follicle pool size was significantly decreased with all treatments transgenerationally. Differential DNA methylation of the F3 generation sperm promoter epigenome was examined. Differential DNA methylation regions (DMR) were identified in the sperm of all exposure lineage males and found to be consistent within a specific exposure lineage, but different between the exposures. Several genomic features of the DMR, such as low density CpG content, were identified. Exposure-specific epigenetic biomarkers were identified that may allow for the assessment of ancestral environmental exposures associated with adult onset disease. PMID:22389676

  2. Generation of large numbers of SNP in cattle by coupling reduced genome representation with high throughput sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Whole genome sequencing projects have produced draft sequences for species from diverse evolutionary clades for comparative evolutionary studies. Generally, these projects have not simultaneously created extensive single nucleotide polymorphism (SNP) resources for use in genetics studies within the...

  3. Phylogenetic analysis of a newfound bat-borne hantavirus supports a laurasiatherian host association for ancestral mammalian hantaviruses.

    PubMed

    Witkowski, Peter T; Drexler, Jan F; Kallies, René; Ličková, Martina; Bokorová, Silvia; Mananga, Gael D; Szemes, Tomáš; Leroy, Eric M; Krüger, Detlev H; Drosten, Christian; Klempa, Boris

    2016-07-01

    Until recently, hantaviruses (family Bunyaviridae) were believed to originate from rodent reservoirs. However, genetically distinct hantaviruses were lately found in shrews and moles, as well as in bats from Africa and Asia. Bats (order Chiroptera) are considered important reservoir hosts for emerging human pathogens. Here, we report on the identification of a novel hantavirus, provisionally named Makokou virus (MAKV), in Noack's Roundleaf Bat (Hipposideros ruber) in Gabon, Central Africa. Phylogenetic analysis of the genomic L-segment showed that MAKV was the most closely related to other bat-borne hantaviruses and shared a most recent common ancestor with the Asian hantaviruses Xuan Son and Laibin. Breakdown of the virus load in a bat showed that MAKV resembles rodent-borne hantaviruses in its organ distribution in that it predominantly occurred in the spleen and kidney; this provides a first insight into the infection pattern of bat-borne hantaviruses. Ancestral state reconstruction based on a tree of L gene sequences of all relevant hantavirus lineages was combined with phylogenetic fossil host hypothesis testing, leading to a statistically significant rejection of the mammalian superorder Euarchontoglires (including rodents) but not the superorder Laurasiatheria (including shrews, moles, and bats) as potential hosts of ancestral hantaviruses at most basal tree nodes. Our data support the emerging concept of bats as previously overlooked hantavirus reservoir hosts. PMID:27051047

  4. Adaptive Memory: Ancestral Priorities and the Mnemonic Value of Survival Processing

    ERIC Educational Resources Information Center

    Nairne, James S.; Pandeirada, Josefa N. S.

    2010-01-01

    Evolutionary psychologists often propose that humans carry around "stone-age" brains, along with a toolkit of cognitive adaptations designed originally to solve hunter-gatherer problems. This perspective predicts that optimal cognitive performance might sometimes be induced by ancestrally-based problems, those present in ancestral environments,…

  5. Ovarian Cancers Harboring Inactivating Mutations in CDK12 Display a Distinct Genomic Instability Pattern Characterized by Large Tandem Duplications.

    PubMed

    Popova, Tatiana; Manié, Elodie; Boeva, Valentina; Battistella, Aude; Goundiam, Oumou; Smith, Nicholas K; Mueller, Christopher R; Raynal, Virginie; Mariani, Odette; Sastre-Garau, Xavier; Stern, Marc-Henri

    2016-04-01

    CDK12 is a recurrently mutated gene in serous ovarian carcinoma, whose downregulation is associated with impaired expression of DNA damage repair genes and subsequent hypersensitivity to DNA-damaging agents and PARP1/2 inhibitors. In this study, we investigated the genomic landscape associated with CDK12 inactivation in patients with serous ovarian carcinoma. We show that CDK12 loss was consistently associated with a particular genomic instability pattern characterized by hundreds of tandem duplications of up to 10 megabases (Mb) in size. Tandem duplications were characterized by a bimodal (∼0.3 and ∼3 Mb) size distribution and overlapping microhomology at the breakpoints. This genomic instability, denoted as the CDK12 TD-plus phenotype, is remarkably distinct from other alteration patterns described in breast and ovarian cancers. The CDK12 TD-plus phenotype was associated with a greater than 10% gain in genomic content and occurred at a 3% to 4% rate in The Cancer Genome Atlas-derived and in-house cohorts of patients with serous ovarian carcinoma. Moreover, CDK12-inactivating mutations together with the TD-plus phenotype were also observed in prostate cancers. Our finding provides new insight toward deciphering the function of CDK12 in genome maintenance and oncogenesis. Cancer Res; 76(7); 1882-91. ©2016 AACR. PMID:26787835

  6. Genome physical mapping from large-insert clones by fingerprint analysis with capillary electrophoresis: a robust physical map of Penicillium chrysogenum.

    PubMed

    Xu, Zhanyou; van den Berg, Marco A; Scheuring, Chantel; Covaleda, Lina; Lu, Hong; Santos, Felipe A; Uhm, Taesik; Lee, Mi-Kyung; Wu, Chengcang; Liu, Steve; Zhang, Hong-Bin

    2005-01-01

    Physical mapping with large-insert clones is becoming an active area of genomics research, and capillary electrophoresis (CE) promises to revolutionize the physical mapping technology. Here, we demonstrate the utility of the CE technology for genome physical mapping with large-insert clones by constructing a robust, binary bacterial artificial chromosome (BIBAC)-based physical map of Penicillium chrysogenum. We fingerprinted 23.1x coverage BIBAC clones with five restriction enzymes and the SNaPshot kit containing four fluorescent-ddNTPs using the CE technology, and explored various strategies to construct quality physical maps. It was shown that the fingerprints labeled with one or two colors, resulting in 40-70 bands per clone, were assembled into much better quality maps than those labeled with three or four colors. The selection of fingerprinting enzymes was crucial to quality map construction. From the dataset labeled with ddTTP-dROX, we assembled a physical map for P.chrysogenum, with 2-3 contigs per chromosome and anchored the map to its chromosomes. This map represents the first physical map constructed using the CE technology, thus providing not only a platform for genomic studies of the penicillin-producing species, but also strategies for efficient use of the CE technology for genome physical mapping of plants, animals and microbes. PMID:15767275

  7. Genes Suggest Ancestral Colour Polymorphisms Are Shared across Morphologically Cryptic Species in Arctic Bumblebees

    PubMed Central

    Williams, Paul H.; Byvaltsev, Alexandr M.; Cederberg, Björn; Berezin, Mikhail V.; Ødegaard, Frode; Rasmussen, Claus; Richardson, Leif L.; Huang, Jiaxing; Sheffield, Cory S.; Williams, Suzanne T.

    2015-01-01

    Our grasp of biodiversity is fine-tuned through the process of revisionary taxonomy. If species do exist in nature and can be discovered with available techniques, then we expect these revisions to converge on broadly shared interpretations of species. But for the primarily arctic bumblebees of the subgenus Alpinobombus of the genus Bombus, revisions by some of the most experienced specialists are unusual for bumblebees in that they have all reached different conclusions on the number of species present. Recent revisions based on skeletal morphology have concluded that there are from four to six species, while variation in colour pattern of the hair raised questions as to whether at least seven species might be present. Even more species are supported if we accept the recent move away from viewing species as morphotypes to viewing them instead as evolutionarily independent lineages (EILs) using data from genes. EILs are recognised here in practice from the gene coalescents that provide direct evidence for their evolutionary independence. We show from fitting both general mixed Yule/coalescent (GMYC) models and Poisson-tree-process (PTP) models to data for the mitochondrial COI gene that there is support for nine species in the subgenus Alpinobombus. Examination of the more slowly evolving nuclear PEPCK gene shows further support for a previously unrecognised taxon as a new species in northwestern North America. The three pairs of the most morphologically similar sister species are separated allopatrically and prevented from interbreeding by oceans. We also find that most of the species show multiple shared colour patterns, giving the appearance of mimicry among parts of the different species. However, reconstructing ancestral colour-pattern states shows that speciation is likely to have cut across widespread ancestral polymorphisms, without or largely without convergence. 
In the particular case of Alpinobombus, morphological, colour-pattern, and genetic groups show

  8. Multiple lineages of ancient CR1 retroposons shaped the early genome evolution of amniotes.

    PubMed

    Suh, Alexander; Churakov, Gennady; Ramakodi, Meganathan P; Platt, Roy N; Jurka, Jerzy; Kojima, Kenji K; Caballero, Juan; Smit, Arian F; Vliet, Kent A; Hoffmann, Federico G; Brosius, Jürgen; Green, Richard E; Braun, Edward L; Ray, David A; Schmitz, Jürgen

    2015-01-01

    Chicken repeat 1 (CR1) retroposons are long interspersed elements (LINEs) that are ubiquitous within amniote genomes and constitute the most abundant family of transposed elements in birds, crocodilians, turtles, and snakes. They are also present in mammalian genomes, where they reside as numerous relics of ancient retroposition events. Yet, despite their relevance for understanding amniote genome evolution, the diversity and evolution of CR1 elements has never been studied on an amniote-wide level. We reconstruct the temporal and quantitative activity of CR1 subfamilies via presence/absence analyses across crocodilian phylogeny and comparative analyses of 12 crocodilian genomes, revealing relative genomic stasis of retroposition during genome evolution of extant Crocodylia. Our large-scale phylogenetic analysis of amniote CR1 subfamilies suggests the presence of at least seven ancient CR1 lineages in the amniote ancestor; and amniote-wide analyses of CR1 successions and quantities reveal differential retention (presence of ancient relics or recent activity) of these CR1 lineages across amniote genome evolution. Interestingly, birds and lepidosaurs retained the fewest ancient CR1 lineages among amniotes and also exhibit smaller genome sizes. Our study is the first to analyze CR1 evolution in a genome-wide and amniote-wide context and the data strongly suggest that the ancestral amniote genome contained myriad CR1 elements from multiple ancient lineages, and remnants of these are still detectable in the relatively stable genomes of crocodilians and turtles. Early mammalian genome evolution was thus characterized by a drastic shift from CR1 prevalence to dominance and hyperactivity of L2 LINEs in monotremes and L1 LINEs in therians. PMID:25503085

  9. Multiple Lineages of Ancient CR1 Retroposons Shaped the Early Genome Evolution of Amniotes

    PubMed Central

    Suh, Alexander; Churakov, Gennady; Ramakodi, Meganathan P.; Platt, Roy N.; Jurka, Jerzy; Kojima, Kenji K.; Caballero, Juan; Smit, Arian F.; Vliet, Kent A.; Hoffmann, Federico G.; Brosius, Jürgen; Green, Richard E.; Braun, Edward L.; Ray, David A.; Schmitz, Jürgen

    2015-01-01

    Chicken repeat 1 (CR1) retroposons are long interspersed elements (LINEs) that are ubiquitous within amniote genomes and constitute the most abundant family of transposed elements in birds, crocodilians, turtles, and snakes. They are also present in mammalian genomes, where they reside as numerous relics of ancient retroposition events. Yet, despite their relevance for understanding amniote genome evolution, the diversity and evolution of CR1 elements has never been studied on an amniote-wide level. We reconstruct the temporal and quantitative activity of CR1 subfamilies via presence/absence analyses across crocodilian phylogeny and comparative analyses of 12 crocodilian genomes, revealing relative genomic stasis of retroposition during genome evolution of extant Crocodylia. Our large-scale phylogenetic analysis of amniote CR1 subfamilies suggests the presence of at least seven ancient CR1 lineages in the amniote ancestor; and amniote-wide analyses of CR1 successions and quantities reveal differential retention (presence of ancient relics or recent activity) of these CR1 lineages across amniote genome evolution. Interestingly, birds and lepidosaurs retained the fewest ancient CR1 lineages among amniotes and also exhibit smaller genome sizes. Our study is the first to analyze CR1 evolution in a genome-wide and amniote-wide context and the data strongly suggest that the ancestral amniote genome contained myriad CR1 elements from multiple ancient lineages, and remnants of these are still detectable in the relatively stable genomes of crocodilians and turtles. Early mammalian genome evolution was thus characterized by a drastic shift from CR1 prevalence to dominance and hyperactivity of L2 LINEs in monotremes and L1 LINEs in therians. PMID:25503085

  10. Evolutionary site-number changes of ribosomal DNA loci during speciation: complex scenarios of ancestral and more recent polyploid events

    PubMed Central

    Rosato, Marcela; Moreno-Saiz, Juan C.; Galián, José A.; Rosselló, Josep A.

    2015-01-01

    Several genome duplications have been identified in the evolution of seed plants, providing unique systems for studying karyological processes promoting diversification and speciation. Knowledge about the number of ribosomal DNA (rDNA) loci, together with their chromosomal distribution and structure, provides clues about organismal and molecular evolution at various phylogenetic levels. In this work, we aim to elucidate the evolutionary dynamics of karyological and rDNA site-number variation in all known taxa of subtribe Vellinae, showing a complex scenario of ancestral and more recent polyploid events. Specifically, we aim to infer the ancestral chromosome numbers and patterns of chromosome number variation, assess patterns of variation of both 45S and 5S rDNA families, trends in site-number change of rDNA loci within homoploid and polyploid series, and reconstruct the evolutionary history of rDNA site number using a phylogenetic hypothesis as a framework. The best-fitting model of chromosome number evolution with a high likelihood score suggests that the Vellinae core showing x = 17 chromosomes arose by duplication events from a recent x = 8 ancestor. Our survey suggests more complex patterns of polyploid evolution than previously noted for Vellinae. High polyploidization events (6x, 8x) arose independently in the basal clade Vella castrilensis–V. lucentina, where extant diploid species are unknown. Reconstruction of ancestral rDNA states in Vellinae supports the inference that the ancestral number of loci in the subtribe was two for each multigene family, suggesting that an overall tendency towards a net loss of 5S rDNA loci occurred during the splitting of Vellinae ancestors from the remaining Brassiceae lineages. A contrasting pattern for rDNA site change in both paleopolyploid and neopolyploid species was linked to diversification of Vellinae lineages. This suggests dynamic and independent changes in rDNA site number during speciation processes and a

  11. Streptococcus thermophilus Biofilm Formation: A Remnant Trait of Ancestral Commensal Life?

    PubMed Central

    Gautier, Céline; Renault, Pierre; Briandet, Romain; Guédon, Eric

    2015-01-01

    Microorganisms have a long history of use in food production and preservation. Their adaptation to food environments has profoundly modified their features, mainly through genomic flux. Streptococcus thermophilus, one of the most frequent starter culture organisms consumed daily by humans, emerged recently from a commensal ancestor. As such, it is a useful model for genomic studies of bacterial domestication processes. Many streptococcal species form biofilms, a key feature of the major lifestyle of these bacteria in nature. However, few descriptions of S. thermophilus biofilms have been reported. An analysis of the ability of a representative collection of natural isolates to form biofilms revealed that S. thermophilus was a poor biofilm producer and that this characteristic was associated with an inability to attach firmly to surfaces. The identification of three biofilm-associated genes in the strain producing the most biofilms shed light on the reasons for the rarity of this trait in this species. These genes encode proteins involved in crucial stages of biofilm formation and are heterogeneously distributed between strains. One of the biofilm genes appears to have been acquired by horizontal transfer. The other two are located in loci presenting features of reductive evolution, and are absent from most of the strains analyzed. Their orthologs in commensal bacteria are involved in adhesion to host cells, suggesting that they are remnants of ancestral functions. The biofilm phenotype appears to be a commensal trait that has been lost during the genetic domestication of S. thermophilus, consistent with its adaptation to the milk environment and the selection of starter strains for dairy fermentations. PMID:26035177

  12. Reconstructing an ancestral genotype of two hexachlorocyclohexane-degrading Sphingobium species using metagenomic sequence data

    PubMed Central

    Sangwan, Naseer; Verma, Helianthous; Kumar, Roshan; Negi, Vivek; Lax, Simon; Khurana, Paramjit; Khurana, Jitendra P; Gilbert, Jack A; Lal, Rup

    2014-01-01

    Over the last 60 years, the use of hexachlorocyclohexane (HCH) as a pesticide has resulted in the production of >4 million tons of HCH waste, which has been dumped in open sinks across the globe. Here, the combination of the genomes of two genetic subspecies (Sphingobium japonicum UT26 and Sphingobium indicum B90A; isolated from two discrete geographical locations, Japan and India, respectively) capable of degrading HCH, with metagenomic data from an HCH dumpsite (∼450 mg HCH per g soil), enabled the reconstruction and validation of the last-common ancestor (LCA) genotype. Mapping the LCA genotype (3128 genes) to the subspecies genomes demonstrated that >20% of the genes in each subspecies were absent in the LCA. This includes two enzymes from the 'upper' HCH degradation pathway, suggesting that the ancestor was unable to degrade HCH isomers, but descendants acquired lin genes by transposon-mediated lateral gene transfer. In addition, anthranilate and homogentisate degradation traits were found to be strain (selectively retained only by UT26) and environment (absent in the LCA and subspecies, but prevalent in the metagenome) specific, respectively. One draft secondary chromosome, two near complete plasmids and eight complete lin transposons were assembled from the metagenomic DNA. Collectively, these results reinforce the elastic nature of the genus Sphingobium, and describe the evolutionary acquisition mechanism of a xenobiotic degradation phenotype in response to environmental pollution. This also demonstrates for the first time the use of metagenomic data in ancestral genotype reconstruction, highlighting its potential to provide significant insight into the development of such phenotypes. PMID:24030592

  13. Convergence of ion channel genome content in early animal evolution.

    PubMed

    Liebeskind, Benjamin J; Hillis, David M; Zakon, Harold H

    2015-02-24

    Multicellularity has evolved multiple times, but animals are the only multicellular lineage with nervous systems. This fact implies that the origin of nervous systems was an unlikely event, yet recent comparisons among extant taxa suggest that animal nervous systems may have evolved multiple times independently. Here, we use ancestral gene content reconstruction to track the timing of gene family expansions for the major families of ion-channel proteins that drive nervous system function. We find that animals with nervous systems have broadly similar complements of ion-channel types but that these complements likely evolved independently. We also find that ion-channel gene family evolution has included large loss events, two of which were immediately followed by rounds of duplication. Ctenophores, cnidarians, and bilaterians underwent independent bouts of gene expansion in channel families involved in synaptic transmission and action potential shaping. We suggest that expansions of these family types may represent a genomic signature of expanding nervous system complexity. Ancestral nodes in which nervous systems are currently hypothesized to have originated did not experience large expansions, making it difficult to distinguish among competing hypotheses of nervous system origins and suggesting that the origin of nerves was not attended by an immediate burst of complexity. Rather, the evolution of nervous system complexity appears to resemble a slow fuse in stem animals followed by many independent bouts of gene gain and loss. PMID:25675537

  14. Convergence of ion channel genome content in early animal evolution

    PubMed Central

    Liebeskind, Benjamin J.; Hillis, David M.; Zakon, Harold H.

    2015-01-01

    Multicellularity has evolved multiple times, but animals are the only multicellular lineage with nervous systems. This fact implies that the origin of nervous systems was an unlikely event, yet recent comparisons among extant taxa suggest that animal nervous systems may have evolved multiple times independently. Here, we use ancestral gene content reconstruction to track the timing of gene family expansions for the major families of ion-channel proteins that drive nervous system function. We find that animals with nervous systems have broadly similar complements of ion-channel types but that these complements likely evolved independently. We also find that ion-channel gene family evolution has included large loss events, two of which were immediately followed by rounds of duplication. Ctenophores, cnidarians, and bilaterians underwent independent bouts of gene expansion in channel families involved in synaptic transmission and action potential shaping. We suggest that expansions of these family types may represent a genomic signature of expanding nervous system complexity. Ancestral nodes in which nervous systems are currently hypothesized to have originated did not experience large expansions, making it difficult to distinguish among competing hypotheses of nervous system origins and suggesting that the origin of nerves was not attended by an immediate burst of complexity. Rather, the evolution of nervous system complexity appears to resemble a slow fuse in stem animals followed by many independent bouts of gene gain and loss. PMID:25675537

  15. Extensive and biased intergenomic nonreciprocal DNA exchanges shaped a nascent polyploid genome, Gossypium (cotton)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Cultivated cotton is composed of a tetraploid genome derived from two ancestral genomes that are related but divergent from each other. The “A” genome is derived from a cotton species that is used for low quality spinnable-fiber production in low production areas and has an African origin. The “D”...

  16. The evolution and functional impact of human deletion variants shared with archaic hominin genomes.

    PubMed

    Lin, Yen-Lung; Pavlidis, Pavlos; Karakoc, Emre; Ajay, Jerry; Gokcumen, Omer

    2015-04-01

    Allele sharing between modern and archaic hominin genomes has been variously interpreted to have originated from ancestral genetic structure or through non-African introgression from archaic hominins. However, evolution of polymorphic human deletions that are shared with archaic hominin genomes has yet to be studied. We identified 427 polymorphic human deletions that are shared with archaic hominin genomes, approximately 87% of which originated before the Human-Neandertal divergence (ancient) and only approximately 9% of which have been introgressed from Neandertals (introgressed). Recurrence, incomplete lineage sorting between human and chimp lineages, and hominid-specific insertions constitute the remaining approximately 4% of allele sharing between humans and archaic hominins. We observed that ancient deletions correspond to more than 13% of all common (>5% allele frequency) deletion variation among modern humans. Our analyses indicate that the genomic landscapes of both ancient and introgressed deletion variants were primarily shaped by purifying selection, eliminating large and exonic variants. We found 17 exonic deletions that are shared with archaic hominin genomes, including those leading to three fusion transcripts. The affected genes are involved in metabolism of external and internal compounds, growth and sperm formation, as well as susceptibility to psoriasis and Crohn's disease. Our analyses suggest that these "exonic" deletion variants have evolved through different adaptive forces, including balancing and population-specific positive selection. Our findings reveal that genomic structural variants that are shared between humans and archaic hominin genomes are common among modern humans and can influence biomedically and evolutionarily important phenotypes. PMID:25556237

  17. Large differences in the genome organization of different plant Trypanosomatid parasites (Phytomonas spp.) reveal wide evolutionary divergences between taxa.

    PubMed

    Marín, C; Dollet, M; Pagès, M; Bastien, P

    2009-03-01

    All currently known plant trypanosomes have been grouped in the genus Phytomonas spp., although they can differ greatly in terms of both their biological properties and effects upon the host. Those parasitizing the phloem sap are specifically associated with lethal syndromes in Latin America, such as phloem necrosis of coffee, 'Hartrot' of coconut and 'Marchitez sorpresiva' of oil palm, that inflict considerable economic losses in endemic countries. The genomic organization of one group of Phytomonas (D) considered as representative of the genus has been published previously. The present work presents the genomic structure of two representative isolates from the pathogenic phloem-restricted group (H) of Phytomonas, analyzed by pulsed field gel electrophoresis followed by hybridization with chromosome-specific DNA markers. It came as a surprise to observe an extremely different genomic organization in this group as compared with that of group D. Most notably, the chromosome number is 7 in this group (with a genome size of 10 Mb) versus 21 in group D (totalling 25 Mb). These data reveal an unsuspected genomic diversity within plant trypanosomatids that may justify a further debate about their division into different genera. PMID:19111630

  18. Genome sequence reveals that Pseudomonas fluorescens F113 possesses a large and diverse array of systems for rhizosphere function and host interaction

    PubMed Central

    2013-01-01

    Background Pseudomonas fluorescens F113 is a plant growth-promoting rhizobacterium (PGPR) isolated from the sugar-beet rhizosphere. This bacterium has been extensively studied as a model strain for genetic regulation of secondary metabolite production in P. fluorescens, as a candidate biocontrol agent against phytopathogens, and as a heterologous host for expression of genes with biotechnological application. The F113 genome sequence and annotation has been recently reported. Results Comparative analysis of 50 genome sequences of strains belonging to the P. fluorescens group has revealed the existence of five distinct subgroups. F113 belongs to subgroup I, which is mostly composed of strains classified as P. brassicacearum. The core genome of these five strains is highly conserved and represents approximately 76% of the protein-coding genes in any given genome. Despite this strong conservation, F113 also contains a large number of unique protein-coding genes that encode traits potentially involved in the rhizocompetence of this strain. These features include protein coding genes required for denitrification, diterpenoids catabolism, motility and chemotaxis, protein secretion and production of antimicrobial compounds and insect toxins. Conclusions The genome of P. fluorescens F113 is composed of numerous protein-coding genes, not usually found together in previously sequenced genomes, which are potentially decisive during the colonisation of the rhizosphere and/or interaction with other soil organisms. This includes genes encoding proteins involved in the production of a second flagellar apparatus, the use of abietic acid as a growth substrate, the complete denitrification pathway, the possible production of a macrolide antibiotic and the assembly of multiple protein secretion systems. PMID:23350846

  19. SVA retrotransposon insertion-associated deletion represents a novel mutational mechanism underlying large genomic copy number changes with non-recurrent breakpoints

    PubMed Central

    2014-01-01

    Background Genomic disorders are caused by copy number changes that may exhibit recurrent breakpoints processed by nonallelic homologous recombination. However, region-specific disease-associated copy number changes have also been observed which exhibit non-recurrent breakpoints. The mechanisms underlying these non-recurrent copy number changes have not yet been fully elucidated. Results We analyze large NF1 deletions with non-recurrent breakpoints as a model to investigate the full spectrum of causative mechanisms, and observe that they are mediated by various DNA double strand break repair mechanisms, as well as aberrant replication. Further, two of the 17 NF1 deletions with non-recurrent breakpoints, identified in unrelated patients, occur in association with the concomitant insertion of SINE/variable number of tandem repeats/Alu (SVA) retrotransposons at the deletion breakpoints. The respective breakpoints are refractory to analysis by standard breakpoint-spanning PCRs and are only identified by means of optimized PCR protocols designed to amplify across GC-rich sequences. The SVA elements are integrated within SUZ12P intron 8 in both patients, and were mediated by target-primed reverse transcription of SVA mRNA intermediates derived from retrotranspositionally active source elements. Both SVA insertions occurred during early postzygotic development and are uniquely associated with large deletions of 1 Mb and 867 kb, respectively, at the insertion sites. Conclusions Since active SVA elements are abundant in the human genome and the retrotranspositional activity of many SVA source elements is high, SVA insertion-associated large genomic deletions encompassing many hundreds of kilobases could constitute a novel and as yet under-appreciated mechanism underlying large-scale copy number changes in the human genome. PMID:24958239

  20. Orientia tsutsugamushi, agent of scrub typhus, displays a single metapopulation with maintenance of ancestral haplotypes throughout continental South East Asia.

    PubMed

    Wongprompitak, Patimaporn; Duong, Veasna; Anukool, Wichittra; Sreyrath, Lay; Mai, Trinh Thi Xuan; Gavotte, Laurent; Moulia, Catherine; Cornillot, Emmanuel; Ekpo, Pattama; Suputtamongkol, Yupin; Buchy, Philippe; Frutos, Roger

    2015-04-01

    Orientia tsutsugamushi is the causative agent of scrub typhus, a major cause of febrile illness in rural areas of the Asia-Pacific region. A multi-locus sequence typing (MLST) analysis was performed on strains isolated from human patients from 3 countries in Southeast Asia: Cambodia, Vietnam and Thailand. The phylogeny of the 56-kDa protein encoding gene was analyzed on the same strains and showed a structured topology with genetically distinct clusters. MLST analysis did not lead to the same conclusion. DNA polymorphism and phylogeny of individual gene loci indicated a significant level of recombination and genetic diversity whereas the ST distribution indicated the presence of isolated patches. No correlation was found with the geographic origin. This work suggests that weak divergence in the core genome and ancestral haplotypes are maintained by permanent recombination in mites, while the 56-kDa protein gene is diverging at a higher rate due to selection by the mammalian immune system. PMID:25577986

  1. 'Solo' large terminal repeats (LTR) of an endogenous retrovirus-like gene family (VL30) in the mouse genome.

    PubMed Central

    Rotman, G; Itin, A; Keshet, E

    1984-01-01

    VL30 genetic elements constitute a murine multicopy gene family that is retrovirus-like, despite the lack of sequence homology with any known retrovirus. Over one hundred copies of VL30 units are dispersed throughout the mouse genome. We report here that the mouse genome also contains 'solo' VL30 long terminal repeats (LTRs). These are structures which contain the LTR detached from the rest of the VL30 sequences. The isolation of solo LTRs from a mouse embryonic gene library with the aid of sub-genomic VL30 probes is described. Direct DNA sequencing established that the solo LTR unit is grossly similar to a standard VL30 LTR and that the LTR is flanked by a 4-base pair duplication. The analogy to the occurrence of solitary LTR units of transposable elements is discussed. PMID:6324110

  2. Genome fluctuations in cyanobacteria reflect evolutionary, developmental and adaptive traits

    PubMed Central

    2011-01-01

    Background Cyanobacteria belong to an ancient group of photosynthetic prokaryotes with pronounced variations in their cellular differentiation strategies, physiological capacities and choice of habitat. Sequencing efforts have shown that genomes within this phylum are equally diverse in terms of size and protein-coding capacity. To increase our understanding of genomic changes in the lineage, the genomes of 58 contemporary cyanobacteria were analysed for shared and unique orthologs. Results A total of 404 protein families, present in all cyanobacterial genomes, were identified. Two of these are unique to the phylum, corresponding to an AbrB family transcriptional regulator and a gene that escapes functional annotation although its genomic neighbourhood is conserved among the organisms examined. The evolution of cyanobacterial genome sizes involves a mix of gains and losses in the clade encompassing complex cyanobacteria, while a single event of reduction is evident in a clade dominated by unicellular cyanobacteria. Genome sizes and gene family copy numbers evolve at a higher rate in the former clade, and multi-copy genes were predominant in large genomes. Orthologs unique to cyanobacteria exhibiting specific characteristics, such as filament formation, heterocyst differentiation, diazotrophy and symbiotic competence, were also identified. An ancestral character reconstruction suggests that the most recent common ancestor of cyanobacteria had a genome size of approx. 4.5 Mbp and 1678 to 3291 protein-coding genes, 4%-6% of which are unique to cyanobacteria today. Conclusions The different rates of genome-size evolution and multi-copy gene abundance suggest two routes of genome development in the history of cyanobacteria. The expansion strategy is driven by gene-family enlargement and generates a broad adaptive potential, while the genome streamlining strategy imposes adaptations to highly specific niches, also reflected in their different functional capacities. 

  3. Genomics of Volvocine Algae

    PubMed Central

    Umen, James G.; Olson, Bradley J.S.C.

    2015-01-01

    Volvocine algae are a group of chlorophytes that together comprise a unique model for evolutionary and developmental biology. The species Chlamydomonas reinhardtii and Volvox carteri represent extremes in morphological diversity within the Volvocine clade. Chlamydomonas is unicellular and reflects the ancestral state of the group, while Volvox is multicellular and has evolved numerous innovations including germ-soma differentiation, sexual dimorphism, and complex morphogenetic patterning. The Chlamydomonas genome sequence has shed light on several areas of eukaryotic cell biology, metabolism and evolution, while the Volvox genome sequence has enabled a comparison with Chlamydomonas that reveals some of the underlying changes that enabled its transition to multicellularity, but also underscores the subtlety of this transition. Many of the tools and resources are in place to further develop Volvocine algae as a model for evolutionary genomics. PMID:25883411

  4. Allatotropin: An Ancestral Myotropic Neuropeptide Involved in Feeding

    PubMed Central

    Alzugaray, María Eugenia; Adami, Mariana Laura; Diambra, Luis Anibal; Hernandez-Martinez, Salvador; Damborenea, Cristina; Noriega, Fernando Gabriel; Ronderos, Jorge Rafael

    2013-01-01

    Background Cell-cell interactions are a basic principle for the organization of tissues and organs allowing them to perform integrated functions and to organize themselves spatially and temporally. Peptidic molecules secreted by neurons and epithelial cells play fundamental roles in cell-cell interactions, acting as local neuromodulators, neurohormones, as well as endocrine and paracrine messengers. Allatotropin (AT) is a neuropeptide originally described as a regulator of Juvenile Hormone synthesis, which plays multiple neural, endocrine and myoactive roles in insects and other organisms. Methods A combination of immunohistochemistry using AT-antibodies and AT-Qdot nanocrystal conjugates was used to identify immunoreactive nerve cells containing the peptide and epithelial-muscular cells targeted by AT in Hydra plagiodesmica. Physiological assays using AT and AT-antibodies revealed that while AT stimulated the extrusion of the hypostome in a dose-response fashion in starved hydroids, the activity of the hypostome in hydroids challenged with food was blocked by treatments with different doses of AT-antibodies. Conclusions AT-antibodies immunolabeled nerve cells in the stalk, pedal disc, tentacles and hypostome. AT-Qdot conjugates recognized epithelial-muscular cells in the same tissues, suggesting the existence of anatomical and functional relationships between these two cell populations. Physiological assays indicated that the AT-like peptide facilitates food ingestion. Significance Immunochemical, physiological and bioinformatics evidence indicates that AT is an ancestral neuropeptide involved in myoregulatory activities associated with meal ingestion and digestion. PMID:24143240

  5. Deep phylogeny, ancestral groups and the four ages of life.

    PubMed

    Cavalier-Smith, Thomas

    2010-01-12

    Organismal phylogeny depends on cell division, stasis, mutational divergence, cell mergers (by sex or symbiogenesis), lateral gene transfer and death. The tree of life is a useful metaphor for organismal genealogical history provided we recognize that branches sometimes fuse. Hennigian cladistics emphasizes only lineage splitting, ignoring most other major phylogenetic processes. Though methodologically useful, it has been conceptually confusing and harmed taxonomy, especially in mistakenly opposing ancestral (paraphyletic) taxa. The history of life involved about 10 really major innovations in cell structure. In membrane topology, there were five successive kinds of cell: (i) negibacteria, with two bounding membranes, (ii) unibacteria, with one bounding and no internal membranes, (iii) eukaryotes with endomembranes and mitochondria, (iv) plants with chloroplasts and (v) finally, chromists with plastids inside the rough endoplasmic reticulum. Membrane chemistry divides negibacteria into the more advanced Glycobacteria (e.g. Cyanobacteria and Proteobacteria) with outer membrane lipopolysaccharide and primitive Eobacteria without lipopolysaccharide (deserving intenser study). It also divides unibacteria into posibacteria, ancestors of eukaryotes, and archaebacteria, the sisters (not ancestors) of eukaryotes and the youngest bacterial phylum. Anaerobic eobacteria, oxygenic cyanobacteria, desiccation-resistant posibacteria and finally neomura (eukaryotes plus archaebacteria) successively transformed Earth. Accidents and organizational constraints are as important as adaptiveness in body plan evolution. PMID:20008390

  6. Female rule in lemurs is ancestral and hormonally mediated.

    PubMed

    Petty, Joseph M A; Drea, Christine M

    2015-01-01

    Female social dominance (FSD) over males is unusual in mammals, yet characterizes most Malagasy lemurs, which represent almost 30% of all primates. Despite its prevalence in this suborder, both the evolutionary trajectory and proximate mechanism of FSD remain unclear. Potentially associated with FSD is a suite of behavioural, physiological and morphological traits in females that implicates (as a putative mechanism) 'masculinization' via androgen exposure; however, relative to conspecific males, female lemurs curiously show little evidence of raised androgen concentrations. By observing mixed-sex pairs of related Eulemur species, we identified two key study groups--one comprised of species expressing FSD and increased female scent marking, the other comprised of species (from a recently evolved clade) showing equal status between the sexes and the more traditional pattern of sexually dimorphic behaviour. Comparing females from these two groups, we show that FSD is associated with more masculine androgen profiles. Based on the widespread prevalence of male-like features in female lemurs and a current phylogeny, we suggest that relaxation of hormonally mediated FSD emerged only recently and that female masculinization may be the ancestral lemur condition, an idea that could revolutionize our understanding of the ancient socioecology and evolution of primate social systems. PMID:25950904

  7. Female rule in lemurs is ancestral and hormonally mediated

    PubMed Central

    Petty, Joseph M. A.; Drea, Christine M.

    2015-01-01

    Female social dominance (FSD) over males is unusual in mammals, yet characterizes most Malagasy lemurs, which represent almost 30% of all primates. Despite its prevalence in this suborder, both the evolutionary trajectory and proximate mechanism of FSD remain unclear. Potentially associated with FSD is a suite of behavioural, physiological and morphological traits in females that implicates (as a putative mechanism) ‘masculinization’ via androgen exposure; however, relative to conspecific males, female lemurs curiously show little evidence of raised androgen concentrations. By observing mixed‐sex pairs of related Eulemur species, we identified two key study groups ‐‐ one comprised of species expressing FSD and increased female scent marking, the other comprised of species (from a recently evolved clade) showing equal status between the sexes and the more traditional pattern of sexually dimorphic behaviour. Comparing females from these two groups, we show that FSD is associated with more masculine androgen profiles. Based on the widespread prevalence of male‐like features in female lemurs and a current phylogeny, we suggest that relaxation of hormonally mediated FSD emerged only recently and that female masculinization may be the ancestral lemur condition, an idea that could revolutionize our understanding of the ancient socioecology and evolution of primate social systems. PMID:25950904

  8. Ancestral genetic complexity of arachidonic acid metabolism in Metazoa.

    PubMed

    Yuan, Dongjuan; Zou, Qiuqiong; Yu, Ting; Song, Cuikai; Huang, Shengfeng; Chen, Shangwu; Ren, Zhenghua; Xu, Anlong

    2014-09-01

    Eicosanoids play an important role in inducing complex and crucial physiological processes in animals. Eicosanoid biosynthesis in animals is widely reported; however, eicosanoid production in invertebrate tissue differs markedly from that in vertebrates and in certain respects remains elusive. For the first time, we compared the orthologs involved in arachidonic acid (AA) metabolism in 14 species of invertebrates and 3 species of vertebrates. Based on parsimony, a complex AA-metabolic system may have existed in the common ancestor of the Metazoa, and then expanded and diversified through invertebrate lineages. A primary vertebrate-like AA-metabolic system via cyclooxygenase (COX), lipoxygenase (LOX), and cytochrome P450 (CYP) pathways was further identified in the basal chordate, amphioxus. The expression profiling of AA-metabolic enzymes and lipidomic analysis of eicosanoid production in the tissues of amphioxus supported our supposition. Thus, we propose that the ancestral complexity of the AA-metabolic network diversified with the different lineages of invertebrates, adapting to the diversity of body plans and ecological opportunity, and arriving at the vertebrate-like pattern in the basal chordate, amphioxus. PMID:24801744

  9. Deep phylogeny, ancestral groups and the four ages of life

    PubMed Central

    Cavalier-Smith, Thomas

    2010-01-01

    Organismal phylogeny depends on cell division, stasis, mutational divergence, cell mergers (by sex or symbiogenesis), lateral gene transfer and death. The tree of life is a useful metaphor for organismal genealogical history provided we recognize that branches sometimes fuse. Hennigian cladistics emphasizes only lineage splitting, ignoring most other major phylogenetic processes. Though methodologically useful, it has been conceptually confusing and harmed taxonomy, especially in mistakenly opposing ancestral (paraphyletic) taxa. The history of life involved about 10 really major innovations in cell structure. In membrane topology, there were five successive kinds of cell: (i) negibacteria, with two bounding membranes, (ii) unibacteria, with one bounding and no internal membranes, (iii) eukaryotes with endomembranes and mitochondria, (iv) plants with chloroplasts and (v) finally, chromists with plastids inside the rough endoplasmic reticulum. Membrane chemistry divides negibacteria into the more advanced Glycobacteria (e.g. Cyanobacteria and Proteobacteria) with outer membrane lipopolysaccharide and primitive Eobacteria without lipopolysaccharide (deserving intenser study). It also divides unibacteria into posibacteria, ancestors of eukaryotes, and archaebacteria, the sisters (not ancestors) of eukaryotes and the youngest bacterial phylum. Anaerobic eobacteria, oxygenic cyanobacteria, desiccation-resistant posibacteria and finally neomura (eukaryotes plus archaebacteria) successively transformed Earth. Accidents and organizational constraints are as important as adaptiveness in body plan evolution. PMID:20008390

  10. A 4-gigabase physical map unlocks the structure and evolution of the complex genome of Aegilops tauschii, the wheat D-genome progenitor

    PubMed Central

    Luo, Ming-Cheng; Gu, Yong Q.; You, Frank M.; Deal, Karin R.; Ma, Yaqin; Hu, Yuqin; Huo, Naxin; Wang, Yi; Wang, Jirui; Chen, Shiyong; Jorgensen, Chad M.; Zhang, Yong; McGuire, Patrick E.; Pasternak, Shiran; Stein, Joshua C.; Ware, Doreen; Kramer, Melissa; McCombie, W. Richard; Kianian, Shahryar F.; Martis, Mihaela M.; Mayer, Klaus F. X.; Sehgal, Sunish K.; Li, Wanlong; Gill, Bikram S.; Bevan, Michael W.; Šimková, Hana; Doležel, Jaroslav; Weining, Song; Lazo, Gerard R.; Anderson, Olin D.; Dvorak, Jan

    2013-01-01

    The current limitations in genome sequencing technology require the construction of physical maps for high-quality draft sequences of large plant genomes, such as that of Aegilops tauschii, the wheat D-genome progenitor. To construct a physical map of the Ae. tauschii genome, we fingerprinted 461,706 bacterial artificial chromosome clones, assembled contigs, designed a 10K Ae. tauschii Infinium SNP array, constructed a 7,185-marker genetic map, and anchored on the map contigs totaling 4.03 Gb. Using whole genome shotgun reads, we extended the SNP marker sequences and found 17,093 genes and gene fragments. We showed that collinearity of the Ae. tauschii genes with Brachypodium distachyon, rice, and sorghum decreased with phylogenetic distance and that structural genome evolution rates have been high across all investigated lineages in subfamily Pooideae, including that of Brachypodieae. We obtained additional information about the evolution of the seven Triticeae chromosomes from 12 ancestral chromosomes and uncovered a pattern of centromere inactivation accompanying nested chromosome insertions in grasses. We showed that the density of noncollinear genes along the Ae. tauschii chromosomes positively correlates with recombination rates, suggested a cause, and showed that new genes, exemplified by disease resistance genes, are preferentially located in high-recombination chromosome regions. PMID:23610408

  11. An atypical human induced pluripotent stem cell line with a complex, stable, and balanced genomic rearrangement including a large de novo 1q uniparental disomy.

    PubMed

    Steichen, Clara; Maluenda, Jérôme; Tosca, Lucie; Luce, Eléanor; Pineau, Dominique; Dianat, Noushin; Hannoun, Zara; Tachdjian, Gérard; Melki, Judith; Dubart-Kupperschmitt, Anne

    2015-03-01

    Human induced pluripotent stem cells (hiPSCs) hold great promise for cell therapy through their use as vital tools for regenerative and personalized medicine. However, the genomic integrity of hiPSCs still raises some concern and is one of the barriers limiting their use in clinical applications. Numerous articles have reported the occurrence of aneuploidies, copy number variations, or single point mutations in hiPSCs, and nonintegrative reprogramming strategies have been developed to minimize the impact of the reprogramming process on the hiPSC genome. Here, we report the characterization of an hiPSC line generated by daily transfections of modified messenger RNAs, displaying several genomic abnormalities. Karyotype analysis showed a complex genomic rearrangement, which remained stable during long-term culture. Fluorescent in situ hybridization analyses were performed on the hiPSC line showing that this karyotype is balanced. Interestingly, single-nucleotide polymorphism analysis revealed the presence of a large 1q region of uniparental disomy (UPD), demonstrating for the first time that UPD can occur in a noncompensatory context during nonintegrative reprogramming of normal fibroblasts. PMID:25650439

  12. Development of genome-wide informative simple sequence repeat markers for large-scale genotyping applications in chickpea and development of web resource

    PubMed Central

    Parida, Swarup K.; Verma, Mohit; Yadav, Santosh K.; Ambawat, Supriya; Das, Shouvik; Garg, Rohini; Jain, Mukesh

    2015-01-01

    Development of informative polymorphic simple sequence repeat (SSR) markers at a genome-wide scale is essential for efficient large-scale genotyping applications. We identified genome-wide 1835 SSRs showing polymorphism between desi and kabuli chickpea. A total of 1470 polymorphic SSR markers from diverse coding and non-coding regions of the chickpea genome were developed. These physically mapped SSR markers exhibited robust amplification efficiency (73.9%) and high intra- and inter-specific polymorphic potential (63.5%), thereby suggesting their immense use in various genomics-assisted breeding applications. The SSR markers particularly derived from intergenic and intronic sequences revealed high polymorphic potential. Using the mapped SSR markers, a wider functional molecular diversity (16–94%, mean: 68%), and parentage- and cultivar-specific admixed domestication pattern and phylogenetic relationships in a structured population of desi and kabuli chickpea genotypes was evident. The intra-specific polymorphism (47.6%) and functional molecular diversity (65%) potential of polymorphic SSR markers developed in our study is much higher than that of previous documentations. Finally, we have developed a user-friendly web resource, Chickpea Microsatellite Database (CMsDB; http://www.nipgr.res.in/CMsDB.html), which provides public access to the data and results reported in this study. The developed informative SSR markers can serve as a resource for various genotyping applications, including genetic enhancement studies in chickpea. PMID:26347762

  13. Small but Powerful, the Primary Endosymbiont of Moss Bugs, Candidatus Evansia muelleri, Holds a Reduced Genome with Large Biosynthetic Capabilities

    PubMed Central

    Santos-Garcia, Diego; Latorre, Amparo; Moya, Andrés; Gibbs, George; Hartung, Viktor; Dettner, Konrad; Kuechler, Stefan Martin; Silva, Francisco J.

    2014-01-01

    Moss bugs (Coleorrhyncha: Peloridiidae) are members of the order Hemiptera, and like many hemipterans, they have symbiotic associations with intracellular bacteria to fulfill nutritional requirements resulting from their unbalanced diet. The primary endosymbiont of the moss bugs, Candidatus Evansia muelleri, is phylogenetically related to Candidatus Carsonella ruddii and Candidatus Portiera aleyrodidarum, primary endosymbionts of psyllids and whiteflies, respectively. In this work, we report the genome of Candidatus Evansia muelleri Xc1 from Xenophyes cascus, which is the only obligate endosymbiont present in the association. This endosymbiont possesses an extremely reduced genome similar to Carsonella and Portiera. It has crossed the borderline to be considered as an autonomous cell, requiring the support of the insect host for some housekeeping cell functions. Interestingly, in spite of its small genome size, Evansia maintains enriched amino acid (complete or partial pathways for ten essential and six nonessential amino acids) and sulfur metabolisms, probably related to the poor diet of the insect, based on bryophytes, which contains very low levels of nitrogenous and sulfur compounds. Several facts, including the congruence of host (moss bugs, whiteflies, and psyllids) and endosymbiont phylogenies and the retention of the same ribosomal RNA operon during genome reduction in Evansia, Portiera, and Carsonella, suggest the existence of an ancient endosymbiotic Halomonadaceae clade associated with Hemiptera. Three possible scenarios for the origin of these three primary endosymbiont genera are proposed and discussed. PMID:25115011

  14. Evolutionary assembly patterns of prokaryotic genomes.

    PubMed

    Press, Maximilian O; Queitsch, Christine; Borenstein, Elhanan

    2016-06-01

    Evolutionary innovation must occur in the context of some genomic background, which limits available evolutionary paths. For example, protein evolution by sequence substitution is constrained by epistasis between residues. In prokaryotes, evolutionary innovation frequently happens by macrogenomic events such as horizontal gene transfer (HGT). Previous work has suggested that HGT can be influenced by ancestral genomic content, yet the extent of such gene-level constraints has not yet been systematically characterized. Here, we evaluated the evolutionary impact of such constraints in prokaryotes, using probabilistic ancestral reconstructions from 634 extant prokaryotic genomes and a novel framework for detecting evolutionary constraints on HGT events. We identified 8228 directional dependencies between genes and demonstrated that many such dependencies reflect known functional relationships, including for example, evolutionary dependencies of the photosynthetic enzyme RuBisCO. Modeling all dependencies as a network, we adapted an approach from graph theory to establish chronological precedence in the acquisition of different genomic functions. Specifically, we demonstrated that specific functions tend to be gained sequentially, suggesting that evolution in prokaryotes is governed by functional assembly patterns. Finally, we showed that these dependencies are universal rather than clade-specific and are often sufficient for predicting whether or not a given ancestral genome will acquire specific genes. Combined, our results indicate that evolutionary innovation via HGT is profoundly constrained by epistasis and historical contingency, similar to the evolution of proteins and phenotypic characters, and suggest that the emergence of specific metabolic and pathological phenotypes in prokaryotes can be predictable from current genomes. PMID:27197212

  15. Serotype IV Streptococcus agalactiae ST-452 has arisen from large genomic recombination events between CC23 and the hypervirulent CC17 lineages

    PubMed Central

    Campisi, Edmondo; Rinaudo, C. Daniela; Donati, Claudio; Barucco, Mara; Torricelli, Giulia; Edwards, Morven S.; Baker, Carol J.; Margarit, Imma; Rosini, Roberto

    2016-01-01

    Streptococcus agalactiae (Group B Streptococcus, GBS) causes life-threatening infections in newborns and adults with chronic medical conditions. Serotype IV strains are emerging both among carriers and as a cause of invasive disease, and recent studies revealed two main Sequence Types (STs), ST-452 and ST-459, assigned to Clonal Complexes CC23 and CC1, respectively. Whole genome sequencing of 70 type IV GBS and subsequent phylogenetic analysis elucidated the localization of type IV isolates in a SNP-based phylogenetic tree and suggested that ST-452 could have originated through genetic recombination. SNP density analysis of the core genome confirmed that the founder strain of this lineage originated from a single large horizontal gene transfer event between CC23 and the hypervirulent CC17. Indeed, ST-452 genomes are composed of two parts that are nearly identical to corresponding regions in ST-24 (CC23) and ST-291 (CC17). Chromosome mapping of the major GBS virulence factors showed that ST-452 strains have an intermediate yet unique profile among CC23 and CC17 strains. We describe previously unreported large recombination events, involving the cps IV operon and resulting in the expansion of serotype IV to CC23. This work sheds further light on the evolution of GBS, providing new insights into the recent emergence of serotype IV. PMID:27411639

  16. A large-scale introgression of genomic components of Brassica rapa into B. napus by the bridge of hexaploid derived from hybridization between B. napus and B. oleracea.

    PubMed

    Li, Qinfei; Mei, Jiaqin; Zhang, Yongjing; Li, Jiana; Ge, Xianhong; Li, Zaiyun; Qian, Wei

    2013-08-01

    Brassica rapa (AA) has been used to widen the genetic basis of B. napus (AACC), which is a new but important oilseed crop worldwide. In the present study, we propose a strategy to develop new type B. napus carrying genomic components of B. rapa by crossing B. rapa with a hexaploid (AACCCC) derived from B. napus and B. oleracea (CC). The hexaploid exhibited large flowers and a high frequency of normal chromosome segregation, resulting in good seed set (average of 4.48 and 12.53 seeds per pod by self and open pollination, respectively) and high pollen fertility (average of 87.05%). It was easy to develop new type B. napus by crossing the hexaploid with 142 lines of B. rapa from three ecotype groups, with an average crossability of 9.24 seeds per pod. The genetic variation of new type B. napus differed from that of current B. napus, especially in the A subgenome, as revealed by genome-specific simple sequence repeat markers. Our data suggest that the strategy proposed here is a large-scale and highly efficient method to introgress genomic components of B. rapa into B. napus. PMID:23699961

  17. Serotype IV Streptococcus agalactiae ST-452 has arisen from large genomic recombination events between CC23 and the hypervirulent CC17 lineages.

    PubMed

    Campisi, Edmondo; Rinaudo, C Daniela; Donati, Claudio; Barucco, Mara; Torricelli, Giulia; Edwards, Morven S; Baker, Carol J; Margarit, Imma; Rosini, Roberto

    2016-01-01

    Streptococcus agalactiae (Group B Streptococcus, GBS) causes life-threatening infections in newborns and adults with chronic medical conditions. Serotype IV strains are emerging both among carriers and as a cause of invasive disease, and recent studies revealed two main Sequence Types (STs), ST-452 and ST-459, assigned to Clonal Complexes CC23 and CC1, respectively. Whole genome sequencing of 70 type IV GBS and subsequent phylogenetic analysis elucidated the localization of type IV isolates in a SNP-based phylogenetic tree and suggested that ST-452 could have originated through genetic recombination. SNP density analysis of the core genome confirmed that the founder strain of this lineage originated from a single large horizontal gene transfer event between CC23 and the hypervirulent CC17. Indeed, ST-452 genomes are composed of two parts that are nearly identical to corresponding regions in ST-24 (CC23) and ST-291 (CC17). Chromosome mapping of the major GBS virulence factors showed that ST-452 strains have an intermediate yet unique profile among CC23 and CC17 strains. We describe previously unreported large recombination events, involving the cps IV operon and resulting in the expansion of serotype IV to CC23. This work sheds further light on the evolution of GBS, providing new insights into the recent emergence of serotype IV. PMID:27411639

  18. Ancestral state reconstruction of ontogeny supports a bilaterian affinity for Dickinsonia.

    PubMed

    Gold, David A; Runnegar, Bruce; Gehling, James G; Jacobs, David K

    2015-01-01

    Despite numerous attempts, classification of the Precambrian fossil Dickinsonia has eluded scientific consensus. This is largely because Dickinsonia and its relatives are structurally simple, lacking morphological synapomorphies to clarify their relationship to modern taxa. However, there is increasing precedent for using ontogeny to constrain enigmatic fossils, and growth of the type species Dickinsonia costata is well understood. This study formalizes the connection between ontogeny in Dickinsonia (which grows by the addition of metameric units onto one end of its primary axis) and terminal addition, defined as growth and patterning from a posterior, subterminal growth zone. We employ ancestral state reconstruction and stochastic character mapping to conclude that terminal addition is a synapomorphy of bilaterian animals. Thus, terminal addition allies Dickinsonia with the bilaterians, providing evidence that large stem- or crown-group bilaterians made up a significant proportion of the Precambrian biota. This study also illustrates the potential for combining developmental and phylogenetic data in constraining the placement of ancient problematic fossil taxa on the evolutionary tree. PMID:26492825

  19. Data sharing and intellectual property in a genomic epidemiology network: policies for large-scale research collaboration.

    PubMed Central

    Chokshi, Dave A.; Parker, Michael; Kwiatkowski, Dominic P.

    2006-01-01

    Genomic epidemiology is a field of research that seeks to improve the prevention and management of common diseases through an understanding of their molecular origins. It involves studying thousands of individuals, often from different populations, with exacting techniques. The scale and complexity of such research has required the formation of research consortia. Members of these consortia need to agree on policies for managing shared resources and handling genetic data. Here we consider data-sharing and intellectual property policies for an international research consortium working on the genomic epidemiology of malaria. We outline specific guidelines governing how samples and data are transferred among its members; how results are released into the public domain; when to seek protection for intellectual property; and how intellectual property should be managed. We outline some pragmatic solutions founded on the basic principles of promoting innovation and access. PMID:16710548

  20. The mitochondrial genomes of the early land plants Treubia lacunosa and Anomodon rugelii: dynamic and conservative evolution.

    PubMed

    Liu, Yang; Xue, Jia-Yu; Wang, Bin; Li, Libo; Qiu, Yin-Long

    2011-01-01

    Early land plant mitochondrial genomes captured important changes of mitochondrial genome evolution when plants colonized land. The chondromes of seed plants show several derived characteristics, e.g., large genome size variation, rapid intra-genomic rearrangement, abundant introns, and highly variable levels of RNA editing. On the other hand, the chondromes of charophytic algae are still largely ancestral in these aspects, resembling those of early eukaryotes. When the transition happened has been a long-standing question in studies of mitochondrial genome evolution. Here we report complete mitochondrial genome sequences from an early-diverging liverwort, Treubia lacunosa, and a late-evolving moss, Anomodon rugelii. The two genomes, 151,983 and 104,239 base pairs in size respectively, contain standard sets of protein coding genes for respiration and protein synthesis, as well as nearly full sets of rRNA and tRNA genes found in the chondromes of the liverworts Marchantia polymorpha and Pleurozia purpurea and the moss Physcomitrella patens. The gene orders of these two chondromes are identical to those of the other liverworts and moss. Their intron contents, with all cis-spliced group I or group II introns, are also similar to those in the previously sequenced liverwort and moss chondromes. These five chondromes plus the two from the hornworts Phaeoceros laevis and Megaceros aenigmaticus for the first time allowed comprehensive comparative analyses of structure and organization of mitochondrial genomes both within and across the three major lineages of bryophytes. These analyses led to the conclusion that the mitochondrial genome experienced dynamic evolution in genome size, gene content, intron acquisition, gene order, and RNA editing during the origins of land plants and their major clades. However, evolution of this organellar genome has remained rather conservative since the origin and initial radiation of early land plants, except within vascular plants. 

  1. Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies

    SciTech Connect

    Catfish Genome Consortium; Wang, Shaolin; Peatman, Eric; Abernathy, Jason; Waldbieser, Geoff; Lindquist, Erika; Richardson, Paul; Lucas, Susan; Wang, Mei; Li, Ping; Thimmapuram, Jyothi; Liu, Lei; Vullaganti, Deepika; Kucuktas, Huseyin; Murdock, Christopher; Small, Brian C; Wilson, Melanie; Liu, Hong; Jiang, Yanliang; Lee, Yoona; Chen, Fei; Lu, Jianguo; Wang, Wenqi; Xu, Peng; Somridhivej, Benjaporn; Baoprasertkul, Puttharat; Quilang, Jonas; Sha, Zhenxia; Bao, Baolong; Wang, Yaping; Wang, Qun; Takano, Tomokazu; Nandi, Samiran; Liu, Shikai; Wong, Lilian; Kaltenboeck, Ludmilla; Quiniou, Sylvie; Bengten, Eva; Miller, Norman; Trant, John; Rokhsar, Daniel; Liu, Zhanjiang

    2010-03-23

    Background Through the Community Sequencing Program, a catfish EST sequencing project was carried out through a collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited EST resource from catfish was available for the purpose of SNP identification. Results A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35% of the unique sequences had significant similarities to known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs have been identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and the minor allele presence of at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis. Conclusions This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related Ictalurid catfishes should also provide powerful means for the evaluation of ancient and recent gene duplications, and for the development of high-density microarrays in catfish. The inter- and intra-specific SNPs identified from all catfish EST dataset assembly will greatly benefit the catfish introgression breeding program and whole genome association studies.

  2. Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies

    PubMed Central

    2010-01-01

    Background Through the Community Sequencing Program, a catfish EST sequencing project was carried out through a collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited EST resource from catfish was available for the purpose of SNP identification. Results A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35% of the unique sequences had significant similarities to known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs have been identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and the minor allele presence of at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis. Conclusions This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related Ictalurid catfishes should also provide powerful means for the evaluation of ancient and recent gene duplications, and for the development of high-density microarrays in catfish. The inter- and intra-specific SNPs identified from all catfish EST dataset assembly will greatly benefit the catfish introgression breeding program and whole genome association studies. PMID:20096101
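
    The high-quality SNP criterion quoted in this record (contigs with at least four sequences, minor allele present in at least two of them) can be illustrated with a minimal Python sketch. The function name, its parameters, and the representation of a contig position as a string of bases (one per EST) are assumptions made for illustration; the record does not describe the authors' actual pipeline.

```python
from collections import Counter


def is_high_quality_snp(column_bases, min_sequences=4, min_minor=2):
    """Return True if one contig position passes the quoted SNP filter.

    column_bases: hypothetical input, the bases seen at a single contig
    position, one character per EST in the contig (e.g. "AAGGA").
    """
    # Count only unambiguous bases; gaps and N calls are ignored (assumption).
    counts = Counter(b for b in column_bases.upper() if b in "ACGT")
    n_sequences = sum(counts.values())

    # At least four sequences must cover the position, and two alleles must exist.
    if n_sequences < min_sequences or len(counts) < 2:
        return False

    # The minor allele is taken to be the second most frequent base.
    minor_count = counts.most_common(2)[1][1]
    return minor_count >= min_minor


if __name__ == "__main__":
    for column in ("AAGG", "AAAG", "AGG", "AAAA"):
        print(column, is_high_quality_snp(column))
```

    Under these assumptions, a position with a singleton variant (such as "AAAG") is rejected as a possible sequencing error, which is the usual motivation for requiring the minor allele in at least two reads.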

  3. Genome Wide Analysis of Drug-Induced Torsades de Pointes: Lack of Common Variants with Large Effect Sizes

    PubMed Central

    Behr, Elijah R.; Ritchie, Marylyn D.; Tanaka, Toshihiro; Kääb, Stefan; Crawford, Dana C.; Nicoletti, Paola; Floratos, Aris; Sinner, Moritz F.; Kannankeril, Prince J.; Wilde, Arthur A. M.; Bezzina, Connie R.; Schulze-Bahr, Eric; Zumhagen, Sven; Guicheney, Pascale; Bishopric, Nanette H.; Marshall, Vanessa; Shakir, Saad; Dalageorgou, Chrysoula; Bevan, Steve; Jamshidi, Yalda; Bastiaenen, Rachel; Myerburg, Robert J.; Schott, Jean-Jacques; Camm, A. John; Steinbeck, Gerhard; Norris, Kris; Altman, Russ B.; Tatonetti, Nicholas P.; Jeffery, Steve; Kubo, Michiaki; Nakamura, Yusuke; Shen, Yufeng; George, Alfred L.; Roden, Dan M.

    2013-01-01

    Marked prolongation of the QT interval on the electrocardiogram associated with the polymorphic ventricular tachycardia Torsades de Pointes is a serious adverse event during treatment with antiarrhythmic drugs and other culprit medications, and is a common cause for drug relabeling and withdrawal. Although clinical risk factors have been identified, the syndrome remains unpredictable in an individual patient. Here we used genome-wide association analysis to search for common predisposing genetic variants. Cases of drug-induced Torsades de Pointes (diTdP), treatment tolerant controls, and general population controls were ascertained across multiple sites using common definitions, and genotyped on the Illumina 610k or 1M-Duo BeadChips. Principal Components Analysis was used to select 216 Northwestern European diTdP cases and 771 ancestry-matched controls, including treatment-tolerant and general population subjects. With these sample sizes, there is 80% power to detect a variant at genome-wide significance with minor allele frequency of 10% and conferring an odds ratio of ≥2.7. Tests of association were carried out for each single nucleotide polymorphism (SNP) by logistic regression adjusting for gender and population structure. No SNP reached genome-wide significance; the variant with the lowest P value was rs2276314, a non-synonymous coding variant in C18orf21 (p = 3×10⁻⁷, odds ratio = 2, 95% confidence interval: 1.5–2.6). The haplotype formed by rs2276314 and a second SNP, rs767531, was significantly more frequent in controls than cases (p = 3×10⁻⁹). Expanding the number of controls and a gene-based analysis did not yield significant associations. This study argues that common genomic variants do not contribute importantly to risk for drug-induced Torsades de Pointes across multiple drugs. PMID:24223155

  4. Thermotolerant Yeast Strains Adapted by Laboratory Evolution Show Trade-Off at Ancestral Temperatures and Preadaptation to Other Stresses

    PubMed Central

    Nielsen, Jens

    2015-01-01

    ABSTRACT A major challenge for the production of ethanol from biomass-derived feedstocks is to develop yeasts that can sustain growth under the variety of inhibitory conditions present in the production process, e.g., high osmolality, high ethanol titers, and/or elevated temperatures (≥40°C). Using adaptive laboratory evolution, we previously isolated seven Saccharomyces cerevisiae strains with improved growth at 40°C. Here, we show that genetic adaptations to high temperature caused a growth trade-off at ancestral temperatures, reduced cellular functions, and improved tolerance of other stresses. Thermotolerant yeast strains showed horizontal displacement of their thermal reaction norms to higher temperatures. Hence, their optimal and maximum growth temperatures increased by about 3°C, whereas they showed a growth trade-off at temperatures below 34°C. Computational analysis of the physical properties of proteins showed that the lethal temperature for yeast is around 49°C, as a large fraction of the yeast proteins denature above this temperature. Our analysis also indicated that the number of functions involved in controlling the growth rate decreased in the thermotolerant strains compared with the number in the ancestral strain. The latter is an advantageous attribute for acquiring thermotolerance and correlates with the reduction of yeast functions associated with loss of respiration capacity. This trait caused glycerol overproduction that was associated with the growth trade-off at ancestral temperatures. In combination with altered sterol composition of cellular membranes, glycerol overproduction was also associated with yeast osmotolerance and improved tolerance of high concentrations of glucose and ethanol. Our study shows that thermal adaptation of yeast is suitable for improving yeast resistance to inhibitory conditions found in industrial ethanol production processes. PMID:26199325

  5. Whole genome comparisons of Fragaria, Prunus and Malus reveal different modes of evolution between Rosaceous subfamilies

    PubMed Central

    2012-01-01

    Background Rosaceae include numerous economically important and morphologically diverse species. Comparative mapping between the member species in Rosaceae has indicated some level of synteny. Recently, the whole genomes of three crop species, peach, apple and strawberry, which belong to different genera of the Rosaceae family, have been sequenced, allowing in-depth comparison of these genomes. Results Our analysis using the whole genome sequences of peach, apple and strawberry identified 1399 orthologous regions between the three genomes, with a mean length of around 100 kb. Each peach chromosome showed major orthology mostly to one strawberry chromosome, but to more than two apple chromosomes, suggesting that the apple genome went through more chromosomal fissions in addition to the whole genome duplication after the divergence of the three genera. However, the distribution of contiguous ancestral regions, identified using the multiple genome rearrangements and ancestors (MGRA) algorithm, suggested that the Fragaria genome went through a greater number of small scale rearrangements compared to the other genomes since they diverged from a common ancestor. Using the contiguous ancestral regions, we reconstructed a hypothetical ancestral genome for the Rosaceae composed of nine chromosomes and propose the evolutionary steps from the ancestral genome to the extant Fragaria, Prunus and Malus genomes. Conclusion Our analysis shows that different modes of evolution may have played major roles in different subfamilies of Rosaceae. The hypothetical ancestral genome of Rosaceae and the evolutionary steps that led to three different lineages of Rosaceae will facilitate our understanding of plant genome evolution as well as have a practical impact on knowledge transfer among member species of Rosaceae. PMID:22475018

  6. Community structure and metabolism through reconstruction of microbial genomes from the environment

    SciTech Connect

    Tyson, Gene W.; Chapman, Jarrod; Hugenholtz, Phillip; Allen, Eric E.; Rachna, Ram J.; Richardson, Paul M.; Solovyev, Victor V.; Rubin, Edward M.; Rokhsar, Daniel S.; Banfield, Jillian F.

    2004-01-01

    Microbial communities are vital in the functioning of all ecosystems; however, most microorganisms are uncultivated, and their roles in natural systems are unclear. Here, using random shotgun sequencing of DNA from a natural acidophilic biofilm, we report reconstruction of near-complete genomes of Leptospirillum group II and Ferroplasma type II, and partial recovery of three other genomes. This was possible because the biofilm was dominated by a small number of species populations and the frequency of genomic rearrangements and gene insertions or deletions was relatively low. Because each sequence read came from a different individual, we could determine that single-nucleotide polymorphisms are the predominant form of heterogeneity at the strain level. The Leptospirillum group II genome had remarkably few nucleotide polymorphisms, despite the existence of low-abundance variants. The Ferroplasma type II genome seems to be a composite from three ancestral strains that have undergone homologous recombination to form a large population of mosaic genomes. Analysis of the gene complement for each organism revealed the pathways for carbon and nitrogen fixation and energy generation, and provided insights into survival strategies in an extreme environment.

  7. WARACS: Wrappers to Automate the Reconstruction of Ancestral Character States

    PubMed Central

    Gruenstaeudl, Michael

    2016-01-01

    Premise of the study: Reconstructions of ancestral character states are among the most widely used analyses for evaluating the morphological, cytological, or ecological evolution of an organismic lineage. The software application Mesquite remains the most popular application for such reconstructions among plant scientists, even though its support for automating complex analyses is limited. A software tool is needed that automates the reconstruction and visualization of ancestral character states with Mesquite and similar applications. Methods and Results: A set of command line–based Python scripts was developed that (a) communicates standardized input to and output from the software applications Mesquite, BayesTraits, and TreeGraph2; (b) automates the process of ancestral character state reconstruction; and (c) facilitates the visualization of reconstruction results. Conclusions: WARACS provides a simple tool that streamlines the reconstruction and visualization of ancestral character states over a wide array of parameters, including tree distribution, character state, and optimality criterion. PMID:26949580

  8. An Ancestral Role for CONSTITUTIVE TRIPLE RESPONSE1 Proteins in Both Ethylene and Abscisic Acid Signaling.

    PubMed

    Yasumura, Yuki; Pierik, Ronald; Kelly, Steven; Sakuta, Masaaki; Voesenek, Laurentius A C J; Harberd, Nicholas P

    2015-09-01

    Land plants have evolved adaptive regulatory mechanisms enabling the survival of environmental stresses associated with terrestrial life. Here, we focus on the evolution of the regulatory CONSTITUTIVE TRIPLE RESPONSE1 (CTR1) component of the ethylene signaling pathway that modulates stress-related changes in plant growth and development. First, we compare CTR1-like proteins from a bryophyte, Physcomitrella patens (representative of early divergent land plants), with those of more recently diverged lycophyte and angiosperm species (including Arabidopsis [Arabidopsis thaliana]) and identify a monophyletic CTR1 family. The fully sequenced P. patens genome encodes only a single member of this family (PpCTR1L). Next, we compare the functions of PpCTR1L with that of related angiosperm proteins. We show that, like angiosperm CTR1 proteins (e.g. AtCTR1 of Arabidopsis), PpCTR1L modulates downstream ethylene signaling via direct interaction with ethylene receptors. These functions, therefore, likely predate the divergence of the bryophytes from the land-plant lineage. However, we also show that PpCTR1L unexpectedly has dual functions and additionally modulates abscisic acid (ABA) signaling. In contrast, while AtCTR1 lacks detectable ABA signaling functions, Arabidopsis has during evolution acquired another homolog that is functionally distinct from AtCTR1. In conclusion, the roles of CTR1-related proteins appear to have functionally diversified during land-plant evolution, and angiosperm CTR1-related proteins appear to have lost an ancestral ABA signaling function. Our study provides new insights into how molecular events such as gene duplication and functional differentiation may have contributed to the adaptive evolution of regulatory mechanisms in plants. PMID:26243614

  9. Distinguishing noise from signal in patterns of genomic divergence in a highly polymorphic avian radiation.

    PubMed

    Campagna, Leonardo; Gronau, Ilan; Silveira, Luís Fábio; Siepel, Adam; Lovette, Irby J

    2015-08-01

    Recently diverged taxa provide the opportunity to search for the genetic basis of the phenotypes that distinguish them. Genomic scans aim to identify loci that are diverged with respect to an otherwise weakly differentiated genetic background. These loci are candidates for being past targets of selection because they behave differently from the rest of the genome that has either not yet differentiated or that may cross species barriers through introgressive hybridization. Here we use a reduced-representation genomic approach to explore divergence among six species of southern capuchino seedeaters, a group of recently radiated sympatric passerine birds in the genus Sporophila. For the first time in these taxa, we discovered a small proportion of markers that appeared differentiated among species. However, when assessing the significance of these signatures of divergence, we found that similar patterns can also be recovered from random grouping of individuals representing different species. A detailed demographic inference indicates that genetic differences among Sporophila species could be the consequence of neutral processes, which include a very large ancestral effective population size that accentuates the effects of incomplete lineage sorting. As these neutral phenomena can generate genomic scan patterns that mimic those of markers involved in speciation and phenotypic differentiation, they highlight the need for caution when ascertaining and interpreting differentiated markers between species, especially when large numbers of markers are surveyed. Our study provides new insights into the demography of the southern capuchino radiation and proposes controls to distinguish signal from noise in similar genomic scans. PMID:26175196

  10. Global MLST of Salmonella Typhi Revisited in Post-genomic Era: Genetic Conservation, Population Structure, and Comparative Genomics of Rare Sequence Types

    PubMed Central

    Yap, Kien-Pong; Ho, Wing S.; Gan, Han M.; Chai, Lay C.; Thong, Kwai L.

    2016-01-01

    Typhoid fever, caused by Salmonella enterica serovar Typhi, remains an important public health burden in Southeast Asia and other endemic countries. Various genotyping methods have been applied to study the genetic variations of this human-restricted pathogen. Multilocus sequence typing (MLST) is one of the widely accepted methods, and recently there has been growing interest in the re-application of MLST in the post-genomic era. In this study, we provide the global MLST distribution of S. Typhi utilizing 1,826 publicly available S. Typhi genome sequences in addition to performing conventional MLST on S. Typhi strains isolated from various endemic regions spanning over a century. Our global MLST analysis confirms the predominance of two sequence types (ST1 and ST2) co-existing in the endemic regions. Interestingly, S. Typhi strains with ST8 are currently confined within the African continent. Comparative genomic analyses of ST8 and other rare STs with genomes of ST1/ST2 revealed unique mutations in important virulence genes such as flhB, sipC, and tviD that may explain the variations that differentiate between seemingly successful (widespread) and unsuccessful (poor dissemination) S. Typhi populations. Large scale whole-genome phylogeny demonstrated evidence of phylogeographical structuring and showed that ST8 may have diverged from the earlier ancestral population of ST1 and ST2, which later lost some of its fitness advantages, leading to poor worldwide dissemination. In response to the unprecedented increase in genomic data, this study demonstrates and highlights the utility of large-scale genome-based MLST as a quick and effective approach to narrow the scope of in-depth comparative genomic analysis and consequently provide new insights into the fine scale of pathogen evolution and population structure. PMID:26973639

  11. The Survival Effect in Memory: Does It Hold into Old Age and Non-Ancestral Scenarios?

    PubMed Central

    Yang, Lixia; Lau, Karen P. L.; Truong, Linda

    2014-01-01

    The survival effect in memory refers to the memory enhancement for materials encoded in reference to a survival scenario compared to those encoded in reference to a control scenario or with other encoding strategies [1]. The current study examined whether this effect is well maintained in old age by testing young (ages 18–29) and older adults (ages 65–87) on the survival effect in memory for words encoded in ancestral and/or non-ancestral modern survival scenarios relative to a non-survival control scenario. A pilot study was conducted to select the best matched comparison scenarios based on potential confounding variables, such as valence and arousal. Experiment 1 assessed the survival effect with a well-matched negative control scenario in both young and older adults. The results showed an age-equivalent survival effect across an ancestral and a non-ancestral modern survival scenario. Experiment 2 replicated the survival effect in both age groups with a positive control scenario. Taken together, the data suggest a robust survival effect that is well preserved in old age across ancestral and non-ancestral survival scenarios. PMID:24788755

  12. Ancestral polymorphism at the major histocompatibility complex (MHCIIß) in the Nesospiza bunting species complex and its sister species (Rowettia goughensis)

    PubMed Central

    2012-01-01

    Background The major histocompatibility complex (MHC) is an important component of the vertebrate immune system and is frequently used to characterise adaptive variation in wild populations due to its co-evolution with pathogens. Passerine birds have an exceptionally diverse MHC with multiple gene copies and large numbers of alleles compared to other avian taxa. The Nesospiza bunting species complex (two species on Nightingale Island; one species with three sub-species on Inaccessible Island) represents a rapid adaptive radiation at a small, isolated archipelago, and is thus an excellent model for the study of adaptation and speciation. In this first study of MHC in Nesospiza buntings, we aim to characterize MHCIIß variation, determine the strength of selection acting at this gene region and assess the level of shared polymorphism between the Nesospiza species complex and its putative sister taxon, Rowettia goughensis, from Gough Island. Results In total, 23 unique alleles were found in 14 Nesospiza and 2 R. goughensis individuals encoding at least four presumably functional loci and two pseudogenes. There was no evidence of ongoing selection on the peptide binding region (PBR). Of the 23 alleles, 15 were found on both the islands inhabited by Nesospiza species, and seven in both Nesospiza and Rowettia; indications of shared, ancestral polymorphism. A gene tree of Nesospiza MHCIIß alleles with several other passerine birds shows three highly supported Nesospiza-specific groups. All R. goughensis alleles were shared with Nesospiza, and these alleles were found in all three Nesospiza sequence groups in the gene tree, suggesting that most of the observed variation predates their phylogenetic split. Conclusions Lack of evidence of selection on the PBR, together with shared polymorphism across the gene tree, suggests that population variation of MHCIIß among Nesospiza and Rowettia is due to ancestral polymorphism rather than local selective forces. Weak or no

  13. Integrons in Xanthomonas: A source of species genome diversity

    PubMed Central

    Gillings, Michael R.; Holley, Marita P.; Stokes, H. W.; Holmes, Andrew J.

    2005-01-01

    Integrons are best known for assembling antibiotic resistance genes in clinical bacteria. They capture genes by using integrase-mediated site-specific recombination of mobile gene cassettes. Integrons also occur in the chromosomes of many bacteria, notably β- and γ-Proteobacteria. In a survey of Xanthomonas, integrons were found in all 32 strains representing 12 pathovars of two species. Their chromosomal location was downstream from the acid dehydratase gene, ilvD, suggesting that an integron was present at this site in the ancestral xanthomonad. There was considerable sequence and structural diversity among the extant integrons. The majority of integrase genes were predicted to be inactivated by frameshifts, stop codons, or large deletions, suggesting that the associated gene cassettes can no longer be mobilized. In support, groups of strains with the same deletions or stop codons/frameshifts in their integrase gene usually contained identical arrays of gene cassettes. In general, strains within individual pathovars had identical cassettes, and these exhibited no similarity to cassettes detected in other pathovars. The variety and characteristics of contemporary gene cassettes suggest that the ancestral integron had access to a diverse pool of these mobile elements, and that their genes originated outside the Xanthomonas genome. Subsequent inactivation of the integrase gene in particular lineages has largely fixed the gene cassette arrays in particular pathovars during their differentiation and specialization into ecological niches. The acquisition of diverse gene cassettes by different lineages within Xanthomonas has contributed to the species-genome diversity of the genus. The role of gene cassettes in survival on plant surfaces is currently unknown. PMID:15755815

  14. The complete chloroplast DNA sequence of the green alga Oltmannsiellopsis viridis reveals a distinctive quadripartite architecture in the chloroplast genome of early diverging ulvophytes

    PubMed Central

    Pombert, Jean-François; Lemieux, Claude; Turmel, Monique

    2006-01-01

    Background The phylum Chlorophyta contains the majority of the green algae and is divided into four classes. The basal position of the Prasinophyceae has been well documented, but the divergence order of the Ulvophyceae, Trebouxiophyceae and Chlorophyceae is currently debated. The four complete chloroplast DNA (cpDNA) sequences presently available for representatives of these classes have revealed extensive variability in overall structure, gene content, intron composition and gene order. The chloroplast genome of Pseudendoclonium (Ulvophyceae), in particular, is characterized by an atypical quadripartite architecture that deviates from the ancestral type by a large inverted repeat (IR) featuring an inverted rRNA operon and a small single-copy (SSC) region containing 14 genes normally found in the large single-copy (LSC) region. To gain insights into the nature of the events that led to the reorganization of the chloroplast genome in the Ulvophyceae, we have determined the complete cpDNA sequence of Oltmannsiellopsis viridis, a representative of a distinct, early diverging lineage. Results The 151,933 bp IR-containing genome of Oltmannsiellopsis differs considerably from Pseudendoclonium and other chlorophyte cpDNAs in intron content and gene order, but shares close similarities with its ulvophyte homologue at the levels of quadripartite architecture, gene content and gene density. Oltmannsiellopsis cpDNA encodes 105 genes, contains five group I introns, and features many short dispersed repeats. As in Pseudendoclonium cpDNA, the rRNA genes in the IR are transcribed toward the single copy region featuring the genes typically found in the ancestral LSC region, and the opposite single copy region harbours genes characteristic of both the ancestral SSC and LSC regions. The 52 genes that were transferred from the ancestral LSC to SSC region include 12 of those observed in Pseudendoclonium cpDNA. Surprisingly, the overall gene organization of Oltmannsiellopsis cp

  15. Systematic CpT (ApG) depletion and CpG excess are unique genomic signatures of large DNA viruses infecting invertebrates.

    PubMed

    Upadhyay, Mohita; Sharma, Neha; Vivekanandan, Perumal

    2014-01-01

    Differences in the relative abundance of dinucleotides, if any, may provide important clues on host-driven evolution of viruses. We studied dinucleotide frequencies of large DNA viruses infecting vertebrates (n = 105; viruses infecting mammals = 99; viruses infecting aves = 6; viruses infecting reptiles = 1) and invertebrates (n = 88; viruses infecting insects = 84; viruses infecting crustaceans = 4). We have identified systematic depletion of CpT(ApG) dinucleotides and over-representation of CpG dinucleotides as the unique genomic signature of large DNA viruses infecting invertebrates. Detailed investigation of this unique genomic signature suggests the existence of invertebrate host-induced pressures specifically targeting CpT(ApG) and CpG dinucleotides. The depletion of CpT dinucleotides among large DNA viruses infecting invertebrates is, at least in part, explained by non-canonical DNA methylation by the infected host. Our findings highlight the role of invertebrate host-related factors in shaping virus evolution and they also provide the necessary framework for future studies on evolution, epigenetics and molecular biology of viruses infecting this group of hosts. PMID:25369195

  16. Systematic CpT (ApG) Depletion and CpG Excess Are Unique Genomic Signatures of Large DNA Viruses Infecting Invertebrates

    PubMed Central

    Upadhyay, Mohita; Sharma, Neha; Vivekanandan, Perumal

    2014-01-01

    Differences in the relative abundance of dinucleotides, if any, may provide important clues on host-driven evolution of viruses. We studied dinucleotide frequencies of large DNA viruses infecting vertebrates (n = 105; viruses infecting mammals = 99; viruses infecting aves = 6; viruses infecting reptiles = 1) and invertebrates (n = 88; viruses infecting insects = 84; viruses infecting crustaceans = 4). We have identified systematic depletion of CpT(ApG) dinucleotides and over-representation of CpG dinucleotides as the unique genomic signature of large DNA viruses infecting invertebrates. Detailed investigation of this unique genomic signature suggests the existence of invertebrate host-induced pressures specifically targeting CpT(ApG) and CpG dinucleotides. The depletion of CpT dinucleotides among large DNA viruses infecting invertebrates is, at least in part, explained by non-canonical DNA methylation by the infected host. Our findings highlight the role of invertebrate host-related factors in shaping virus evolution and they also provide the necessary framework for future studies on evolution, epigenetics and molecular biology of viruses infecting this group of hosts. PMID:25369195
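
    The notion of dinucleotide relative abundance used in this record can be illustrated with a minimal Python sketch. It computes the common observed/expected odds ratio, f(XY) / (f(X) · f(Y)), on a single strand; whether the authors used exactly this statistic is an assumption, and the function name and example sequence are hypothetical. Values well below 1 would indicate depletion (as reported for CpT) and values well above 1 over-representation (as reported for CpG).

```python
from collections import Counter


def dinucleotide_odds_ratio(seq, dinuc):
    """Observed/expected ratio for one dinucleotide on a single strand.

    seq:   DNA sequence as a string (hypothetical input).
    dinuc: two-letter dinucleotide such as "CG" or "CT".
    Returns NaN when either base of the dinucleotide is absent from seq.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        return float("nan")

    # Mononucleotide counts and overlapping dinucleotide counts.
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))

    f_x = mono[dinuc[0]] / n
    f_y = mono[dinuc[1]] / n
    if f_x == 0 or f_y == 0:
        return float("nan")

    f_xy = di[dinuc] / (n - 1)
    return f_xy / (f_x * f_y)


if __name__ == "__main__":
    toy = "CGCGCGCG"  # toy sequence, not real viral data
    print(round(dinucleotide_odds_ratio(toy, "CG"), 4))
```

    A strand-symmetric variant, which pools a dinucleotide with its reverse complement (CpT with ApG), would be a natural extension for double-stranded genomes; it is omitted here to keep the sketch short.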

  17. Large-scale exploration of ge