Science.gov

Sample records for large ancestral genomes

  1. Reconstruction of ancestral gene orders using intermediate genomes

    PubMed Central

    2015-01-01

    Background The problem of reconstructing ancestral genomes in a given phylogenetic tree arises in many different comparative genomics fields. Here, we focus on reconstructing the gene order of ancestral genomes, a problem that has been widely studied in the past 20 years, especially with the increasing availability of whole genome DNA sequences. There are two main approaches to this problem: event-based methods, which try to find the ancestral genomes that minimize the number of rearrangement events in the tree; and homology-based methods, which look for conserved structures, such as adjacent genes in the extant genomes, to build the ancestral genomes. Results We propose algorithms that use the concept of intermediate genomes, arising in optimal pairwise rearrangement scenarios. We show that intermediate genomes have combinatorial properties that make them easy to reconstruct, and develop fast algorithms with better reconstructed ancestral genomes than current event-based methods. The proposed framework is also designed to accept extra information, such as results from homology-based approaches, giving rise to combined algorithms with better results than the original methods. PMID:26451811

  2. Yeast Ancestral Genome Reconstructions: The Possibilities of Computational Methods

    NASA Astrophysics Data System (ADS)

    Tannier, Eric

    In 2006, a debate arose over the efficiency of bioinformatics methods for reconstructing mammalian ancestral genomes. Three years later, Gordon et al. (PLoS Genetics, 5(5), 2009) chose not to use automatic methods to build up the genome of a 100 million year old Saccharomyces cerevisiae ancestor. Their manually constructed ancestor provides a reference genome to test whether automatic methods are indeed unable to approach confident reconstructions. Adapting several methodological frameworks to the same yeast gene order data, I discuss the possibilities, differences and similarities of the available algorithms for ancestral genome reconstructions. The methods can be classified into two types: local and global. Studying the properties of both helps to clarify what we can expect from their usage. Both methods propose contiguous ancestral regions that come very close (> 95% identity) to the manually predicted ancestral yeast chromosomes, with a good coverage of the extant genomes.

  3. Ancestral genome inference using a genetic algorithm approach.

    PubMed

    Gao, Nan; Yang, Ning; Tang, Jijun

    2013-01-01

    Recent advances in technology have made it routine to obtain and compare gene orders within genomes. Rearrangements of gene orders by operations such as reversal and transposition are rare events that enable researchers to reconstruct deep evolutionary histories. An important application of genome rearrangement analysis is to infer gene orders of ancestral genomes, which is valuable for identifying patterns of evolution and for modeling the evolutionary processes. Among the various available methods, parsimony-based methods (including GRAPPA and MGR) are the most widely used. Since the core algorithms of these methods are solvers for the so-called median problem, providing efficient and accurate median solvers has attracted a lot of attention in this field. The "double-cut-and-join" (DCJ) model uses the single DCJ operation to account for all genome rearrangement events. Because it is mathematically much simpler than handling events directly, parsimony methods using DCJ median solvers have better speed and accuracy. However, the DCJ median problem is NP-hard, and although several exact algorithms are available, they all have great difficulties when the given genomes are distant. In this paper, we present a new algorithm that combines a genetic algorithm (GA) with genomic sorting to produce a new method which can solve the DCJ median problem in limited time and space, especially in large and distant datasets. Our experimental results show that this new GA-based method can find optimal or near-optimal results for problems ranging from easy to very difficult. Compared to existing parsimony methods, which may severely underestimate the true number of evolutionary events, the sorting-based approach can infer ancestral genomes which are much closer to their true ancestors. The code is available at http://phylo.cse.sc.edu. PMID:23658708
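
As a rough illustration of why the DCJ model is mathematically convenient, the pairwise DCJ distance between two circular, single-chromosome genomes with identical single-copy gene content can be computed as the number of genes minus the number of cycles in the graph formed by the adjacencies of both genomes. The sketch below is not taken from the paper above and does not solve the median problem; the function names `adjacencies` and `dcj_distance` and the encoding of a genome as a list of signed integers are assumptions made for illustration.

```python
def adjacencies(genome):
    """Return a dict mapping each gene extremity to its neighbouring extremity.

    `genome` is a circular gene order given as signed integers (assumption:
    one circular chromosome, each gene present exactly once).
    An extremity is a pair (gene, "h") for the head or (gene, "t") for the tail.
    """
    def exit_end(g):   # extremity through which gene g is left
        return (abs(g), "h") if g > 0 else (abs(g), "t")

    def entry_end(g):  # extremity through which gene g is entered
        return (abs(g), "t") if g > 0 else (abs(g), "h")

    neighbour = {}
    n = len(genome)
    for i, g in enumerate(genome):
        a = exit_end(g)
        b = entry_end(genome[(i + 1) % n])  # wrap around: circular chromosome
        neighbour[a] = b
        neighbour[b] = a
    return neighbour


def dcj_distance(genome_a, genome_b):
    """DCJ distance for two circular genomes on the same gene set: n - cycles."""
    if sorted(map(abs, genome_a)) != sorted(map(abs, genome_b)):
        raise ValueError("genomes must contain the same single-copy genes")
    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)
    seen = set()
    cycles = 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Walk the cycle, alternating an adjacency of A with an adjacency of B.
        while x not in seen:
            seen.add(x)
            y = adj_a[x]
            seen.add(y)
            x = adj_b[y]
    return len(genome_a) - cycles


if __name__ == "__main__":
    print(dcj_distance([1, 2, 3], [1, 2, 3]))   # identical genomes
    print(dcj_distance([1, 2, 3], [1, -2, 3]))  # one gene inverted
```

Linear chromosomes and unequal gene content need additional bookkeeping (paths as well as cycles), which this sketch leaves out.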

  4. Deciphering the diploid ancestral genome of the Mesohexaploid Brassica rapa.

    PubMed

    Cheng, Feng; Mandáková, Terezie; Wu, Jian; Xie, Qi; Lysak, Martin A; Wang, Xiaowu

    2013-05-01

    The genus Brassica includes several important agricultural and horticultural crops. Their current genome structures were shaped by whole-genome triplication followed by extensive diploidization. The availability of several crucifer genome sequences, especially that of Chinese cabbage (Brassica rapa), enables study of the evolution of the mesohexaploid Brassica genomes from their diploid progenitors. We reconstructed three ancestral subgenomes of B. rapa (n = 10) by comparing its whole-genome sequence to ancestral and extant Brassicaceae genomes. All three B. rapa paleogenomes apparently consisted of seven chromosomes, similar to the ancestral translocation Proto-Calepineae Karyotype (tPCK; n = 7), which is the evolutionarily younger variant of the Proto-Calepineae Karyotype (n = 7). Based on comparative analysis of genome sequences or linkage maps of Brassica oleracea, Brassica nigra, radish (Raphanus sativus), and other closely related species, we propose a two-step merging of three tPCK-like genomes to form the hexaploid ancestor of the tribe Brassiceae with 42 chromosomes. Subsequent diversification of the Brassiceae was marked by extensive genome reshuffling and chromosome number reduction mediated by translocation events and followed by loss and/or inactivation of centromeres. Furthermore, via interspecies genome comparison, we refined intervals for seven of the genomic blocks of the Ancestral Crucifer Karyotype (n = 8), thus revising the key reference genome for evolutionary genomics of crucifers. PMID:23653472

  5. Comparative paleogenomics of crucifers: ancestral genomic blocks revisited.

    PubMed

    Lysak, Martin A; Mandáková, Terezie; Schranz, M Eric

    2016-04-01

    A decade ago, the concept of the Ancestral Crucifer Karyotype (ACK) and the definition of 24 conserved genomic blocks were presented. Subsequently, 35 cytogenetic reconstructions and/or draft genome sequences of crucifer species (members of the Brassicaceae family) have been analyzed in the context of this system, placing crucifers at the forefront of plant phylogenomics. In this review, we highlight how the ACK and genomic blocks have facilitated and guided genomic analysis of crucifers in the last 10 years and provide an update of this robust model. PMID:26945766

  6. Synteny conservation between the Prunus genome and both the present and ancestral Arabidopsis genomes

    PubMed Central

    Jung, Sook; Main, Dorrie; Staton, Margaret; Cho, Ilhyung; Zhebentyayeva, Tatyana; Arús, Pere; Abbott, Albert

    2006-01-01

    Background Due to the lack of availability of large genomic sequences for peach or other Prunus species, the degree of synteny conservation between the Prunus species and Arabidopsis has not been systematically assessed. Using the recently available peach EST sequences that are anchored to Prunus genetic maps and to the peach physical map, we analyzed the extent of conserved synteny between the Prunus and the Arabidopsis genomes. The reconstructed pseudo-ancestral Arabidopsis genome, which existed prior to the proposed recent polyploidy event, was also utilized in our analysis to further elucidate the evolutionary relationship. Results We analyzed the synteny conservation between the Prunus and the Arabidopsis genomes by comparing 475 peach ESTs that are anchored to Prunus genetic maps and their Arabidopsis homologs detected by sequence similarity. Microsyntenic regions were detected between all five Arabidopsis chromosomes and seven of the eight linkage groups of the Prunus reference map. An additional 1097 peach ESTs that are anchored to 431 BAC contigs of the peach physical map and their Arabidopsis homologs were also analyzed. Microsyntenic regions were detected in 77 BAC contigs. The syntenic regions from both data sets were short and contained only a couple of conserved gene pairs. The synteny between peach and Arabidopsis was fragmentary; all the Prunus linkage groups containing syntenic regions matched to more than two different Arabidopsis chromosomes, and most BAC contigs with multiple conserved syntenic regions corresponded to multiple Arabidopsis chromosomes. Using the same peach EST datasets and their Arabidopsis homologs, we also detected conserved syntenic regions in the pseudo-ancestral Arabidopsis genome. In many cases, the gene order and content of peach regions were more conserved in the ancestral genome than in the present Arabidopsis region. Statistical significance of each syntenic group was calculated using a simulated Arabidopsis genome. Conclusion We

  7. Reconstruction of the ancestral plastid genome in Geraniaceae reveals a correlation between genome rearrangements, repeats, and nucleotide substitution rates.

    PubMed

    Weng, Mao-Lun; Blazier, John C; Govindu, Madhumita; Jansen, Robert K

    2014-03-01

    Geraniaceae plastid genomes are highly rearranged, and each of the four genera already sequenced in the family has a distinct genome organization. This study reports plastid genome sequences of six additional species, Francoa sonchifolia, Melianthus villosus, and Viviania marifolia from Geraniales, and Pelargonium alternans, California macrophylla, and Hypseocharis bilobata from Geraniaceae. These genome sequences, combined with previously published species, provide sufficient taxon sampling to reconstruct the ancestral plastid genome organization of Geraniaceae and the rearrangements unique to each genus. The ancestral plastid genome of Geraniaceae has a 4 kb inversion and a reduced, Pelargonium-like small single copy region. Our ancestral genome reconstruction suggests that a few minor rearrangements occurred in the stem branch of Geraniaceae followed by independent rearrangements in each genus. The genomic comparison demonstrates that a series of inverted repeat boundary shifts and inversions played a major role in shaping genome organization in the family. The distribution of repeats is strongly associated with breakpoints in the rearranged genomes, and the proportion and the number of large repeats (>20 bp and >60 bp) are significantly correlated with the degree of genome rearrangements. Increases in the degree of plastid genome rearrangements are correlated with the acceleration in nonsynonymous substitution rates (dN) but not with synonymous substitution rates (dS). Possible mechanisms that might contribute to this correlation, including DNA repair system and selection, are discussed. PMID:24336877

  8. Co-evolutionary Models for Reconstructing Ancestral Genomic Sequences: Computational Issues and Biological Examples

    NASA Astrophysics Data System (ADS)

    Tuller, Tamir; Birin, Hadas; Kupiec, Martin; Ruppin, Eytan

    The inference of ancestral genomes is a fundamental problem in molecular evolution. Due to the statistical nature of this problem, the most likely or the most parsimonious ancestral genomes usually include considerable error rates. In general, these errors cannot be abolished by utilizing more exhaustive computational approaches, by using longer genomic sequences, or by analyzing more taxa. In recent studies we showed that co-evolution is an important force that can be used for significantly improving the inference of ancestral genome content.

  9. Reflections on ancestral haplotypes: medical genomics, evolution, and human individuality.

    PubMed

    Steele, Edward J

    2014-01-01

    The major histocompatibility complex (MHC), once labelled the "sphinx of immunology" by Jan Klein, provides powerful challenges to evolutionary thinking. This essay highlights the main discoveries that established the block ancestral haplotype structure of the MHC and the wider genome, focusing on the work by the Perth (Australia) group, led by Roger Dawkins, and the Boston group, led by Chester Alper and Edmond Yunis. Their achievements have been overlooked in the rush to sequence the first and subsequent drafts of the human genome. In Caucasoids, where most of the detailed work has been done, about 70% of all known allelic MHC diversity can be accounted for by 30 or so ancestral haplotypes (AHs), or conserved sequences of many mega-bases, and their recombinants. The block haplotype structure of the genome, as shown for the MHC (and other genetic regions), is a story that needs to be understood in its own right, particularly given the promotion of the "HapMap" project and single nucleotide polymorphism (SNP) linkage disequilibrium (LD) analysis, which has been wrongly touted as the only way to pinpoint those genes that are important in genetic disorders or other desired (qualitative) characteristics. PMID:25544323

  10. A Cooperative Co-Evolutionary Genetic Algorithm for Tree Scoring and Ancestral Genome Inference.

    PubMed

    Gao, Nan; Zhang, Yan; Feng, Bing; Tang, Jijun

    2015-01-01

    Recent advances in technology have made it easy to obtain and compare whole genomes. Rearrangements of genomes through operations such as reversals and transpositions are rare events that enable researchers to reconstruct deep evolutionary history among species. Some of the popular methods need to search a large tree space for the best-scoring tree, so it is desirable to have a fast and accurate method that can score a given tree efficiently. During the tree scoring procedure, the genomic structures of internal tree nodes are also produced, which provide important information for inferring ancestral genomes and for modeling the evolutionary processes. However, computing tree scores and ancestral genomes is very difficult, and many researchers have to rely on heuristic methods which have various disadvantages. In this paper, we describe the first genetic algorithm for tree scoring and ancestor inference, which uses a fitness function considering co-evolution, adopts different initial seeding methods to initialize the first population pool, and utilizes a sorting-based approach to realize evolution. Our extensive experiments show that compared with other existing algorithms, this new method is more accurate and can infer ancestral genomes that are much closer to the true ancestors. PMID:26671797

  11. Consistency of genome-wide associations across major ancestral groups.

    PubMed

    Ntzani, Evangelia E; Liberopoulos, George; Manolio, Teri A; Ioannidis, John P A

    2012-07-01

    It is not well known whether genetic markers identified through genome-wide association studies (GWAS) confer similar or different risks across people of different ancestry. We screened a regularly updated catalog of all published GWAS curated at the NHGRI website for GWAS-identified associations that had reached genome-wide significance (p ≤ 5 × 10⁻⁸) in at least one major ancestry group (European, Asian, African) and for which replication data were available for comparison in at least two different major ancestry groups. These groups were compared for the correlation between and differences in risk allele frequencies and genetic effects' estimates. Data on 108 eligible GWAS-identified associations with a total of 900 datasets (European, n = 624; Asian, n = 217; African, n = 60) were analyzed. Risk-allele frequencies were modestly correlated between ancestry groups, with >10% absolute differences in 75-89% of the three pairwise comparisons of ancestry groups. Genetic effect (odds ratio) point estimates between ancestry groups correlated modestly (pairwise comparisons' correlation coefficients: 0.20-0.33) and point estimates of risks were opposite in direction or differed more than twofold in 57%, 79%, and 89% of the European versus Asian, European versus African, and Asian versus African comparisons, respectively. The modest correlations, differing risk estimates, and considerable between-association heterogeneity suggest that differential ancestral effects can be anticipated and genomic risk markers may need separate further evaluation in different ancestry groups. PMID:22183176

  12. Genome-Wide Inference of Ancestral Recombination Graphs

    PubMed Central

    Rasmussen, Matthew D.; Hubisz, Melissa J.; Gronau, Ilan; Siepel, Adam

    2014-01-01

    The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the “ancestral recombination graph” (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n − 1 chromosomes, an operation we call “threading.” Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps. PMID:24831947

  13. Genome-wide inference of ancestral recombination graphs.

    PubMed

    Rasmussen, Matthew D; Hubisz, Melissa J; Gronau, Ilan; Siepel, Adam

    2014-01-01

    The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the "ancestral recombination graph" (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n − 1 chromosomes, an operation we call "threading." Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps. PMID:24831947

  14. Reconstruction of an ancestral Yersinia pestis genome and comparison with an ancient sequence

    PubMed Central

    2015-01-01

    Background We propose the computational reconstruction of a whole bacterial ancestral genome at the nucleotide scale, and its validation by a sequence of ancient DNA. This rare possibility is offered by an ancient sequence of the late middle ages plague agent. It has been hypothesized to be ancestral to extant Yersinia pestis strains based on the pattern of nucleotide substitutions. But the dynamics of indels, duplications, insertion sequences and rearrangements has impacted all genomes much more than the substitution process, which makes the ancestral reconstruction task challenging. Results We use a set of gene families from 13 Yersinia species, construct reconciled phylogenies for all of them, and determine gene orders in ancestral species. Gene trees integrate information from the sequence, the species tree and gene order. We reconstruct ancestral sequences for ancestral genic and intergenic regions, providing nearly a complete genome sequence for the ancestor, containing a chromosome and three plasmids. Conclusion The comparison of the ancestral and ancient sequences provides a unique opportunity to assess the quality of ancestral genome reconstruction methods. But the quality of the sequencing and assembly of the ancient sequence can also be questioned by this comparison. PMID:26450112

  15. Comparative analysis of rosaceous genomes and the reconstruction of a putative ancestral genome for the family

    PubMed Central

    2011-01-01

    Background Comparative genome mapping studies in Rosaceae have been conducted until now by aligning genetic maps within the same genus, or closely related genera and using a limited number of common markers. The growing body of genomics resources and sequence data for both Prunus and Fragaria permits detailed comparisons between these genera and the recently released Malus × domestica genome sequence. Results We generated a comparative analysis using 806 molecular markers that are anchored genetically to the Prunus and/or Fragaria reference maps, and physically to the Malus genome sequence. Markers in common for Malus and Prunus, and Malus and Fragaria, respectively were 784 and 148. The correspondence between marker positions was high and conserved syntenic blocks were identified among the three genera in the Rosaceae. We reconstructed a proposed ancestral genome for the Rosaceae. Conclusions A genome containing nine chromosomes is the most likely candidate for the ancestral Rosaceae progenitor. The number of chromosomal translocations observed between the three genera investigated was low. However, the number of inversions identified among Malus and Prunus was much higher than any reported genome comparisons in plants, suggesting that small inversions have played an important role in the evolution of these two genera or of the Rosaceae. PMID:21226921

  16. DUPCAR: Reconstructing Contiguous Ancestral Regions with Duplications

    PubMed Central

    Ratan, Aakrosh; Raney, Brian J.; Suh, Bernard B.; Zhang, Louxin; Miller, Webb; Haussler, David

    2008-01-01

    Abstract Accurately reconstructing the large-scale gene order in an ancestral genome is a critical step to better understand genome evolution. In this paper, we propose a heuristic algorithm, called DUPCAR, for reconstructing ancestral genomic orders with duplications. The method starts from the order of genes in modern genomes and predicts predecessor and successor relationships in the ancestor. Then a greedy algorithm is used to reconstruct the ancestral orders by connecting genes into contiguous regions based on predicted adjacencies. Computer simulation was used to validate the algorithm. We also applied the method to reconstruct the ancestral chromosome X of placental mammals and the ancestral genomes of the ciliate Paramecium tetraurelia. PMID:18774902
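
The greedy step described above, connecting genes into contiguous regions from predicted predecessor and successor relationships, can be sketched roughly as follows. This is a simplified illustration rather than DUPCAR's actual implementation: it ignores gene orientation and duplications, and the function name `greedy_chains`, the `(predecessor, successor, score)` input format, and the scores themselves are hypothetical.

```python
def greedy_chains(scored_adjacencies):
    """Greedily join genes into linear chains from scored adjacencies.

    `scored_adjacencies` is a list of (predecessor, successor, score) tuples
    (a hypothetical input format). Higher scores are accepted first. An
    adjacency is rejected if its predecessor already has a successor, its
    successor already has a predecessor, or it would close a cycle.
    """
    genes = set()
    for a, b, _ in scored_adjacencies:
        genes.update((a, b))

    parent = {g: g for g in genes}  # union-find, used only for cycle detection

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    succ, pred = {}, {}
    for a, b, _ in sorted(scored_adjacencies, key=lambda t: -t[2]):
        if a == b or a in succ or b in pred:
            continue  # conflicts with an adjacency accepted earlier
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # both genes are already in the same chain
        succ[a], pred[b] = b, a
        parent[root_a] = root_b

    # Read each chain from its first gene (the one without a predecessor).
    chains = []
    for g in sorted(genes):
        if g not in pred:
            chain = [g]
            while chain[-1] in succ:
                chain.append(succ[chain[-1]])
            chains.append(chain)
    return chains


if __name__ == "__main__":
    example = [("a", "b", 0.9), ("b", "c", 0.8), ("a", "c", 0.7),
               ("c", "a", 0.6), ("d", "e", 0.5)]
    print(greedy_chains(example))
```

In this toy input, the adjacency a→c is rejected because a already has a successor, and c→a is rejected because it would turn the chain a, b, c into a cycle.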

  17. Reconstruction of Ancestral Genomes in Presence of Gene Gain and Loss.

    PubMed

    Avdeyev, Pavel; Jiang, Shuai; Aganezov, Sergey; Hu, Fei; Alekseyev, Max A

    2016-03-01

    Since the most dramatic genomic changes are caused by genome rearrangements as well as gene duplications and gain/loss events, it becomes crucial to understand their mechanisms and reconstruct ancestral genomes of the given genomes. This problem was shown to be NP-complete even in the "simplest" case of three genomes, thus calling for heuristic rather than exact algorithmic solutions. At the same time, a larger number of input genomes may actually simplify the problem in practice, as was earlier illustrated with MGRA, a state-of-the-art software tool for reconstruction of ancestral genomes of multiple genomes. One of the key obstacles for MGRA and other similar tools is the presence of breakpoint reuse, in which the same breakpoint region is broken by several different genome rearrangements in the course of evolution. Furthermore, such tools are often limited to genomes composed of the same genes with each gene present in a single copy in every genome. This limitation makes these tools inapplicable for many biological datasets and degrades the resolution of ancestral reconstructions in diverse datasets. We address these deficiencies by extending the MGRA algorithm to genomes with unequal gene contents. The developed next-generation tool MGRA2 can handle gene gain/loss events and shares the ability of MGRA to reconstruct ancestral genomes uniquely in the case of limited breakpoint reuse. Furthermore, MGRA2 employs a number of novel heuristics to cope with higher breakpoint reuse and process datasets inaccessible for MGRA. In practical experiments, MGRA2 shows superior performance for simulated and real genomes as compared to other ancestral genome reconstruction tools. PMID:26885568

  18. Genomic evolution in domestic cattle: ancestral haplotypes and healthy beef.

    PubMed

    Williamson, Joseph F; Steele, Edward J; Lester, Susan; Kalai, Oscar; Millman, John A; Wolrige, Lindsay; Bayard, Dominic; McLure, Craig; Dawkins, Roger L

    2011-05-01

    We have identified numerous Ancestral Haplotypes encoding a 14-Mb region of Bota C19. Three are frequent in Simmental, Angus and Wagyu and have been conserved since common progenitor populations. Others are more relevant to the differences between these 3 breeds including fat content and distribution in muscle. SREBF1 and Growth Hormone, which have been implicated in the production of healthy beef, are included within these haplotypes. However, we conclude that alleles at these 2 loci are less important than other sequences within the haplotypes. Identification of breeds and hybrids is improved by using haplotypes rather than individual alleles. PMID:21338665

  19. Ancient hybridizations among the ancestral genomes of bread wheat.

    PubMed

    Marcussen, Thomas; Sandve, Simen R; Heier, Lise; Spannagl, Manuel; Pfeifer, Matthias; Jakobsen, Kjetill S; Wulff, Brande B H; Steuernagel, Burkhard; Mayer, Klaus F X; Olsen, Odd-Arne

    2014-07-18

    The allohexaploid bread wheat genome consists of three closely related subgenomes (A, B, and D), but a clear understanding of their phylogenetic history has been lacking. We used genome assemblies of bread wheat and five diploid relatives to analyze genome-wide samples of gene trees, as well as to estimate evolutionary relatedness and divergence times. We show that the A and B genomes diverged from a common ancestor ~7 million years ago and that these genomes gave rise to the D genome through homoploid hybrid speciation 1 to 2 million years later. Our findings imply that the present-day bread wheat genome is a product of multiple rounds of hybrid speciation (homoploid and polyploid) and lay the foundation for a new framework for understanding the wheat genome as a multilevel phylogenetic mosaic. PMID:25035499

  20. Whole genome profiling physical map and ancestral annotation of tobacco Hicks Broadleaf

    PubMed Central

    Sierro, Nicolas; van Oeveren, Jan; van Eijk, Michiel J T; Martin, Florian; Stormo, Keith E; Peitsch, Manuel C; Ivanov, Nikolai V

    2013-01-01

    Genomics-based breeding of economically important crops such as banana, coffee, cotton, potato, tobacco and wheat is often hampered by genome size, polyploidy and high repeat content. We adapted sequence-based whole-genome profiling (WGP™) technology to obtain insight into the polyploidy of the model plant Nicotiana tabacum (tobacco). N. tabacum is assumed to originate from a hybridization event between ancestors of Nicotiana sylvestris and Nicotiana tomentosiformis approximately 200 000 years ago. This resulted in tobacco having a haploid genome size of 4500 million base pairs, approximately four times larger than the related tomato (Solanum lycopersicum) and potato (Solanum tuberosum) genomes. In this study, a physical map containing 9750 contigs of bacterial artificial chromosomes (BACs) was constructed. The mean contig size was 462 kbp, and the calculated genome coverage equaled the estimated tobacco genome size. We used a method for determination of the ancestral origin of the genome by annotation of WGP sequence tags. This assignment agreed with the ancestral annotation available from the tobacco genetic map, and may be used to investigate the evolution of homoeologous genome segments after polyploidization. The map generated is an essential scaffold for the tobacco genome. We propose the combination of WGP physical mapping technology and tag profiling of ancestral lines as a generally applicable method to elucidate the ancestral origin of genome segments of polyploid species. The physical mapping of genes and their origins will enable application of biotechnology to polyploid plants aimed at accelerating and increasing the precision of breeding for abiotic and biotic stress resistance. PMID:23672264

  1. Unexpectedly large number of conserved noncoding regions within the ancestral chordate Hox cluster.

    PubMed

    Pascual-Anaya, Juan; D'Aniello, Salvatore; Garcia-Fernàndez, Jordi

    2008-12-01

    The single amphioxus Hox cluster contains 15 genes and may well resemble the ancestral chordate Hox cluster. We have sequenced the Hox genomic complement of the European amphioxus Branchiostoma lanceolatum and compared it to the American species, Branchiostoma floridae, by phylogenetic footprinting to gain insights into the evolution of Hox gene regulation in chordates. We found that Hox intergenic regions are largely conserved between the two amphioxus species, especially in the case of genes located at the 3' of the cluster, a trend previously observed in vertebrates. We further compared the amphioxus Hox cluster with the human HoxA, HoxB, HoxC, and HoxD clusters, finding several conserved noncoding regions, both in intergenic and intronic regions. This suggests that the regulation of Hox genes is highly conserved across chordates, consistent with the similar Hox expression patterns in vertebrates and amphioxus. PMID:18791732

  2. Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs

    PubMed Central

    Green, Richard E; Braun, Edward L; Armstrong, Joel; Earl, Dent; Nguyen, Ngan; Hickey, Glenn; Vandewege, Michael W; St John, John A; Capella-Gutiérrez, Salvador; Castoe, Todd A; Kern, Colin; Fujita, Matthew K; Opazo, Juan C; Jurka, Jerzy; Kojima, Kenji K; Caballero, Juan; Hubley, Robert M; Smit, Arian F; Platt, Roy N; Lavoie, Christine A; Ramakodi, Meganathan P; Finger, John W; Suh, Alexander; Isberg, Sally R; Miles, Lee; Chong, Amanda Y; Jaratlerdsiri, Weerachai; Gongora, Jaime; Moran, Christopher; Iriarte, Andrés; McCormack, John; Burgess, Shane C; Edwards, Scott V; Lyons, Eric; Williams, Christina; Breen, Matthew; Howard, Jason T; Gresham, Cathy R; Peterson, Daniel G; Schmitz, Jürgen; Pollock, David D; Haussler, David; Triplett, Eric W; Zhang, Guojie; Irie, Naoki; Jarvis, Erich D; Brochu, Christopher A; Schmidt, Carl J; McCarthy, Fiona M; Faircloth, Brant C; Hoffmann, Federico G; Glenn, Travis C; Gabaldón, Toni; Paten, Benedict; Ray, David A

    2015-01-01

    To provide context for the diversifications of archosaurs, the group that includes crocodilians, dinosaurs and birds, we generated draft genomes of three crocodilians, Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We observed an exceptionally slow rate of genome evolution within crocodilians at all levels, including nucleotide substitutions, indels, transposable element content and movement, gene family evolution, and chromosomal synteny. When placed within the context of related taxa including birds and turtles, this suggests that the common ancestor of all of these taxa also exhibited slow genome evolution and that the relatively rapid evolution of bird genomes represents an autapomorphy within that clade. The data also provided the opportunity to analyze heterozygosity in crocodilians, which indicates a likely reduction in population size for all three taxa through the Pleistocene. Finally, these new data combined with newly published bird genomes allowed us to reconstruct the partial genome of the common ancestor of archosaurs providing a tool to investigate the genetic starting material of crocodilians, birds, and dinosaurs. PMID:25504731

  3. Three crocodilian genomes reveal ancestral patterns of evolution among archosaurs.

    PubMed

    Green, Richard E; Braun, Edward L; Armstrong, Joel; Earl, Dent; Nguyen, Ngan; Hickey, Glenn; Vandewege, Michael W; St John, John A; Capella-Gutiérrez, Salvador; Castoe, Todd A; Kern, Colin; Fujita, Matthew K; Opazo, Juan C; Jurka, Jerzy; Kojima, Kenji K; Caballero, Juan; Hubley, Robert M; Smit, Arian F; Platt, Roy N; Lavoie, Christine A; Ramakodi, Meganathan P; Finger, John W; Suh, Alexander; Isberg, Sally R; Miles, Lee; Chong, Amanda Y; Jaratlerdsiri, Weerachai; Gongora, Jaime; Moran, Christopher; Iriarte, Andrés; McCormack, John; Burgess, Shane C; Edwards, Scott V; Lyons, Eric; Williams, Christina; Breen, Matthew; Howard, Jason T; Gresham, Cathy R; Peterson, Daniel G; Schmitz, Jürgen; Pollock, David D; Haussler, David; Triplett, Eric W; Zhang, Guojie; Irie, Naoki; Jarvis, Erich D; Brochu, Christopher A; Schmidt, Carl J; McCarthy, Fiona M; Faircloth, Brant C; Hoffmann, Federico G; Glenn, Travis C; Gabaldón, Toni; Paten, Benedict; Ray, David A

    2014-12-12

    To provide context for the diversification of archosaurs--the group that includes crocodilians, dinosaurs, and birds--we generated draft genomes of three crocodilians: Alligator mississippiensis (the American alligator), Crocodylus porosus (the saltwater crocodile), and Gavialis gangeticus (the Indian gharial). We observed an exceptionally slow rate of genome evolution within crocodilians at all levels, including nucleotide substitutions, indels, transposable element content and movement, gene family evolution, and chromosomal synteny. When placed within the context of related taxa including birds and turtles, this suggests that the common ancestor of all of these taxa also exhibited slow genome evolution and that the comparatively rapid evolution is derived in birds. The data also provided the opportunity to analyze heterozygosity in crocodilians, which indicates a likely reduction in population size for all three taxa through the Pleistocene. Finally, these data combined with newly published bird genomes allowed us to reconstruct the partial genome of the common ancestor of archosaurs, thereby providing a tool to investigate the genetic starting material of crocodilians, birds, and dinosaurs. PMID:25504731

  4. Monotreme IGF2 expression and ancestral origin of genomic imprinting.

    PubMed

    Killian, J K; Nolan, C M; Stewart, N; Munday, B L; Andersen, N A; Nicol, S; Jirtle, R L

    2001-08-15

    IGF2 (insulin-like growth factor 2) and M6P/IGF2R (mannose 6-phosphate/insulin-like growth factor 2 receptor) are imprinted in marsupials and eutherians but not in birds. These results along with the absence of M6P/IGF2R imprinting in the egg-laying monotremes indicate that the parental imprinting of fetal growth-regulatory genes may be unique to viviparous mammals. In this investigation, we have cloned IGF2 from two monotreme mammals, the platypus and echidna, to further investigate the origin of imprinting. We report herein that like M6P/IGF2R, IGF2 is not imprinted in monotremes. Thus, although IGF2 encodes for a highly conserved growth factor in chordates, it is only imprinted in therian mammals. These findings support a concurrent origin of IGF2 and M6P/IGF2R imprinting in the late Jurassic/early Cretaceous period. The absence of imprinting in monotremes, despite apparent interparental conflicts over maternal-offspring exchange, argues that a fortuitous congruency of genetic and epigenetic events may have limited the phylogenetic breadth of genomic imprinting to therian mammals. J. Exp. Zool. (Mol. Dev. Evol.) 291:205-212, 2001. PMID:11479919

  5. Analyses of Charophyte Chloroplast Genomes Help Characterize the Ancestral Chloroplast Genome of Land Plants

    PubMed Central

    Civáň, Peter; Foster, Peter G.; Embley, Martin T.; Séneca, Ana; Cox, Cymon J.

    2014-01-01

    Despite the significance of the relationships between embryophytes and their charophyte algal ancestors in deciphering the origin and evolutionary success of land plants, few chloroplast genomes of the charophyte algae have been reconstructed to date. Here, we present new data for three chloroplast genomes of the freshwater charophytes Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae). The chloroplast genome of Klebsormidium has a quadripartite organization with exceptionally large inverted repeat (IR) regions and, uniquely among streptophytes, has lost the rrn5 and rrn4.5 genes from the ribosomal RNA (rRNA) gene cluster operon. The chloroplast genome of Roya differs from other zygnematophycean chloroplasts, including the newly sequenced Mesotaenium, by having a quadripartite structure that is typical of other streptophytes. On the basis of the improbability of the novel gain of IR regions, we infer that the quadripartite structure has likely been lost independently in at least three zygnematophycean lineages, although the absence of the usual rRNA operonic synteny in the IR regions of Roya may indicate their de novo origin. Significantly, all zygnematophycean chloroplast genomes have undergone substantial genomic rearrangement, which may be the result of ancient retroelement activity evidenced by the presence of integrase-like and reverse transcriptase-like elements in the Roya chloroplast genome. Our results corroborate the close phylogenetic relationship between Zygnematophyceae and land plants and identify 89 protein-coding genes and 22 introns present in the chloroplast genome at the time of the evolutionary transition of plants to land, all of which can be found in the chloroplast genomes of extant charophytes. PMID:24682153

  6. Analyses of charophyte chloroplast genomes help characterize the ancestral chloroplast genome of land plants.

    PubMed

    Civaň, Peter; Foster, Peter G; Embley, Martin T; Séneca, Ana; Cox, Cymon J

    2014-04-01

    Despite the significance of the relationships between embryophytes and their charophyte algal ancestors in deciphering the origin and evolutionary success of land plants, few chloroplast genomes of the charophyte algae have been reconstructed to date. Here, we present new data for three chloroplast genomes of the freshwater charophytes Klebsormidium flaccidum (Klebsormidiophyceae), Mesotaenium endlicherianum (Zygnematophyceae), and Roya anglica (Zygnematophyceae). The chloroplast genome of Klebsormidium has a quadripartite organization with exceptionally large inverted repeat (IR) regions and, uniquely among streptophytes, has lost the rrn5 and rrn4.5 genes from the ribosomal RNA (rRNA) gene cluster operon. The chloroplast genome of Roya differs from other zygnematophycean chloroplasts, including the newly sequenced Mesotaenium, by having a quadripartite structure that is typical of other streptophytes. On the basis of the improbability of the novel gain of IR regions, we infer that the quadripartite structure has likely been lost independently in at least three zygnematophycean lineages, although the absence of the usual rRNA operonic synteny in the IR regions of Roya may indicate their de novo origin. Significantly, all zygnematophycean chloroplast genomes have undergone substantial genomic rearrangement, which may be the result of ancient retroelement activity evidenced by the presence of integrase-like and reverse transcriptase-like elements in the Roya chloroplast genome. Our results corroborate the close phylogenetic relationship between Zygnematophyceae and land plants and identify 89 protein-coding genes and 22 introns present in the chloroplast genome at the time of the evolutionary transition of plants to land, all of which can be found in the chloroplast genomes of extant charophytes. PMID:24682153

  7. Exploring the diploid wheat ancestral A genome through sequence comparison at the high-molecular-weight glutenin locus region.

    PubMed

    Dong, Lingli; Huo, Naxin; Wang, Yi; Deal, Karin; Luo, Ming-Cheng; Wang, Daowen; Anderson, Olin D; Gu, Yong Qiang

    2012-12-01

    The polyploid nature of hexaploid wheat (T. aestivum, AABBDD) often represents a great challenge in various aspects of research including genetic mapping, map-based cloning of important genes, and sequencing and accurate assembly of its genome. To explore the utility of ancestral diploid species of polyploid wheat, sequence variation of T. urartu (A(u)A(u)) was analyzed by comparing its 277-kb genomic region carrying the important Glu-1 locus with the homologous regions from the A genomes of the diploid T. monococcum (A(m)A(m)), tetraploid T. turgidum (AABB), and hexaploid T. aestivum (AABBDD). Our results revealed that in addition to a high degree of gene collinearity, nested retroelement structures were also considerably conserved among the A(u) genome and the A genomes in polyploid wheats, suggesting that the majority of the repetitive sequences in the A genomes of polyploid wheats originated from the diploid A(u) genome. The difference in the compared region between A(u) and A is mainly caused by four differential TE insertion events and two deletion events between these genomes. The estimated divergence time of A genomes, calculated from the nucleotide substitution rate in both shared TEs and collinear genes, further supports the closer evolutionary relationship of A to A(u) than to A(m). The structural conservation in the repetitive regions prompted us to develop repeat junction markers based on the A(u) sequence for mapping the A genome in hexaploid wheat. Eighty percent of these repeat junction markers were successfully mapped to the corresponding region in hexaploid wheat, suggesting that T. urartu could serve as a useful resource for developing molecular markers for genetic and breeding studies in hexaploid wheat. PMID:23052831

  8. Evolution of the ancestral recombination graph along the genome in case of selective sweep.

    PubMed

    Leocard, Stephanie; Pardoux, Etienne

    2010-12-01

    We consider the genome of a sample of n individuals taken at the end of a selective sweep, which is the fixation of an advantageous allele in the population. When the selective advantage is high, the genealogy at a locus under selective sweep can be approximated by a comb with n teeth. However, because of recombinations during the selective sweep, the hitchhiking effect decreases as the distance from the selected site increases, so that far from this locus, the tree can be approximated by a Kingman coalescent tree, as in the neutral case. We first give the distribution of the tree at a given locus. Then we focus on the evolution of this tree along the genome. Since this tree-valued process is not Markovian, we study the evolution of the Ancestral Recombination Graph along the genome in case of selective sweep. PMID:20077118
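
    For orientation, the neutral baseline mentioned in this abstract (the Kingman coalescent that applies far from the selected site) can be sketched in a few lines. This is a minimal illustration of the standard Kingman coalescent only, not the authors' sweep model or their Ancestral Recombination Graph construction; time is in the usual coalescent units, and all function names are hypothetical.

```python
import random


def expected_tmrca(n):
    """Expected time to the most recent common ancestor of n lineages
    under Kingman's coalescent: sum over k of 2 / (k * (k - 1))."""
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))


def simulate_tmrca(n, rng):
    """Draw one TMRCA: while k lineages remain, the waiting time to the
    next pairwise merger is exponential with rate k * (k - 1) / 2."""
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2.0)
    return t


def mean_simulated_tmrca(n, reps, seed=0):
    """Average TMRCA over `reps` independent draws (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(simulate_tmrca(n, rng) for _ in range(reps)) / reps


if __name__ == "__main__":
    n = 10
    print("analytic :", expected_tmrca(n))  # equals 2 * (1 - 1/n)
    print("simulated:", mean_simulated_tmrca(n, reps=20000))
```

    In contrast, the comb-shaped genealogy described for the selected locus has all n lineages merging at nearly the same time, so the staggered waiting times simulated here would collapse into a single short interval.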

  9. Major Chromosomal Rearrangements Distinguish Willow and Poplar After the Ancestral “Salicoid” Genome Duplication

    PubMed Central

    Hou, Jing; Ye, Ning; Dong, Zhongyuan; Lu, Mengzhu; Li, Laigeng; Yin, Tongming

    2016-01-01

    Populus (poplar) and Salix (willow) are sister genera in the Salicaceae family. In both lineages extant species are predominantly diploid. Genome analysis previously revealed that the two lineages originated from a common tetraploid ancestor. In this study, we conducted a syntenic comparison of the corresponding 19 chromosome members of the poplar and willow genomes. Our observations revealed that almost every chromosomal segment had a parallel paralogous segment elsewhere in the genomes, and the two lineages shared a similar syntenic pinwheel pattern for most of the chromosomes, which indicated that the two lineages diverged after the genome reorganization in the common progenitor. The pinwheel patterns showed distinct differences for two chromosome pairs in each lineage. Further analysis detected two major interchromosomal rearrangements that distinguished the karyotypes of willow and poplar. Chromosome I of willow was a conjunction of poplar chromosome XVI and the lower portion of poplar chromosome I, whereas willow chromosome XVI corresponded to the upper portion of poplar chromosome I. Scientists have suggested that Populus is evolutionarily more primitive than Salix. Therefore, we propose that, after the “salicoid” duplication event, fission and fusion of the ancestral chromosomes first gave rise to the diploid progenitor of extant Populus species. During the evolutionary process, fission and fusion of poplar chromosomes I and XVI subsequently gave rise to the progenitor of extant Salix species. This study contributes to an improved understanding of genome divergence after ancient genome duplication in closely related lineages of higher plants. PMID:27352946

  10. Major Chromosomal Rearrangements Distinguish Willow and Poplar After the Ancestral "Salicoid" Genome Duplication.

    PubMed

    Hou, Jing; Ye, Ning; Dong, Zhongyuan; Lu, Mengzhu; Li, Laigeng; Yin, Tongming

    2016-01-01

    Populus (poplar) and Salix (willow) are sister genera in the Salicaceae family. In both lineages extant species are predominantly diploid. Genome analysis previously revealed that the two lineages originated from a common tetraploid ancestor. In this study, we conducted a syntenic comparison of the corresponding 19 chromosome members of the poplar and willow genomes. Our observations revealed that almost every chromosomal segment had a parallel paralogous segment elsewhere in the genomes, and the two lineages shared a similar syntenic pinwheel pattern for most of the chromosomes, which indicated that the two lineages diverged after the genome reorganization in the common progenitor. The pinwheel patterns showed distinct differences for two chromosome pairs in each lineage. Further analysis detected two major interchromosomal rearrangements that distinguished the karyotypes of willow and poplar. Chromosome I of willow was a conjunction of poplar chromosome XVI and the lower portion of poplar chromosome I, whereas willow chromosome XVI corresponded to the upper portion of poplar chromosome I. Scientists have suggested that Populus is evolutionarily more primitive than Salix. Therefore, we propose that, after the "salicoid" duplication event, fission and fusion of the ancestral chromosomes first gave rise to the diploid progenitor of extant Populus species. During the evolutionary process, fission and fusion of poplar chromosomes I and XVI subsequently gave rise to the progenitor of extant Salix species. This study contributes to an improved understanding of genome divergence after ancient genome duplication in closely related lineages of higher plants. PMID:27352946

  11. Exploiting ancestral mammalian genomes for the prediction of human transcription factor binding sites

    PubMed Central

    2012-01-01

    Background The computational prediction of Transcription Factor Binding Sites (TFBS) remains a challenge due to their short length and low information content. Comparative genomics approaches that simultaneously consider several related species and favor sites that have been conserved throughout evolution improve the accuracy (specificity) of the predictions but are limited due to a phenomenon called binding site turnover, where sequence evolution causes one TFBS to replace another in the same region. In parallel to this development, an increasing number of mammalian genomes are now sequenced and it is becoming possible to infer, to a surprisingly high degree of accuracy, ancestral mammalian sequences. Results We propose a TFBS prediction approach that makes use of the availability of inferred ancestral mammalian genomes to improve its accuracy. This method aims to identify binding loci, which are regions of a few hundred base pairs that have preserved their potential to bind a given transcription factor over evolutionary time. After proposing a neutral evolutionary model of predicted TFBS counts in a DNA region of a given length, we use it to identify regions that have preserved the number of predicted TFBS they contain to an unexpected degree given their divergence. The approach is applied to human chromosome 1 and shows significant gains in accuracy as compared to both existing single-species and multi-species TFBS prediction approaches, in particular for transcription factors that are subject to high turnover rates. Availability The source code and predictions made by the program are available at http://www.cs.mcgill.ca/~blanchem/bindingLoci. PMID:23281809

  12. Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure

    PubMed Central

    Basu, Analabha; Sarkar-Roy, Neeta; Majumder, Partha P.

    2016-01-01

    India, occupying the center stage of Paleolithic and Neolithic migrations, has been underrepresented in genome-wide studies of variation. Systematic analysis of genome-wide data, using multiple robust statistical methods, on (i) 367 unrelated individuals drawn from 18 mainland and 2 island (Andaman and Nicobar Islands) populations selected to represent geographic, linguistic, and ethnic diversities, and (ii) individuals from populations represented in the Human Genome Diversity Panel (HGDP), reveals four major ancestries in mainland India. This contrasts with an earlier inference of two ancestries based on limited population sampling. A distinct ancestry of the populations of the Andaman archipelago was identified and found to be coancestral to Oceanic populations. Analysis of ancestral haplotype blocks revealed that extant mainland populations (i) admixed widely irrespective of ancestry, although admixture between populations was not always symmetric, and (ii) this practice was rapidly replaced by endogamy about 70 generations ago, among upper castes and Indo-European speakers predominantly. This estimated time coincides with the historical period of formulation and adoption of sociocultural norms restricting intermarriage in large social strata. A similar replacement observed among tribal populations was temporally less uniform. PMID:26811443

  13. Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure.

    PubMed

    Basu, Analabha; Sarkar-Roy, Neeta; Majumder, Partha P

    2016-02-01

    India, occupying the center stage of Paleolithic and Neolithic migrations, has been underrepresented in genome-wide studies of variation. Systematic analysis of genome-wide data, using multiple robust statistical methods, on (i) 367 unrelated individuals drawn from 18 mainland and 2 island (Andaman and Nicobar Islands) populations selected to represent geographic, linguistic, and ethnic diversities, and (ii) individuals from populations represented in the Human Genome Diversity Panel (HGDP), reveals four major ancestries in mainland India. This contrasts with an earlier inference of two ancestries based on limited population sampling. A distinct ancestry of the populations of the Andaman archipelago was identified and found to be coancestral to Oceanic populations. Analysis of ancestral haplotype blocks revealed that extant mainland populations (i) admixed widely irrespective of ancestry, although admixture between populations was not always symmetric, and (ii) this practice was rapidly replaced by endogamy about 70 generations ago, among upper castes and Indo-European speakers predominantly. This estimated time coincides with the historical period of formulation and adoption of sociocultural norms restricting intermarriage in large social strata. A similar replacement observed among tribal populations was temporally less uniform. PMID:26811443

  14. Exploring the diploid wheat ancestral A genome through sequence comparison at the High-Molecular-Weight glutenin locus region

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The polyploid nature of hexaploid wheat (T. aestivum, AABBDD) often represents a great challenge in various aspects of research including genetic mapping, map-based cloning of important genes, and sequencing and accurate assembly of its genome. To explore the utility of ancestral diploid species o...

  15. MADS goes genomic in conifers: towards determining the ancestral set of MADS-box genes in seed plants

    PubMed Central

    Gramzow, Lydia; Weilandt, Lisa; Theißen, Günter

    2014-01-01

    Background and Aims MADS-box genes comprise a gene family coding for transcription factors. This gene family expanded greatly during land plant evolution such that the number of MADS-box genes ranges from one or two in green algae to around 100 in angiosperms. Given the crucial functions of MADS-box genes for nearly all aspects of plant development, the expansion of this gene family probably contributed to the increasing complexity of plants. However, the expansion of MADS-box genes during one important step of land plant evolution, namely the origin of seed plants, remains poorly understood due to the previous lack of whole-genome data for gymnosperms. Methods The newly available genome sequences of Picea abies, Picea glauca and Pinus taeda were used to identify the complete set of MADS-box genes in these conifers. In addition, MADS-box genes were identified in the growing number of transcriptomes available for gymnosperms. With these datasets, phylogenies were constructed to determine the ancestral set of MADS-box genes of seed plants and to infer the ancestral functions of these genes. Key Results Type I MADS-box genes are under-represented in gymnosperms and only a minimum of two Type I MADS-box genes have been present in the most recent common ancestor (MRCA) of seed plants. In contrast, a large number of Type II MADS-box genes were found in gymnosperms. The MRCA of extant seed plants probably possessed at least 11–14 Type II MADS-box genes. In gymnosperms two duplications of Type II MADS-box genes were found, such that the MRCA of extant gymnosperms had at least 14–16 Type II MADS-box genes. Conclusions The implied ancestral set of MADS-box genes for seed plants shows simplicity for Type I MADS-box genes and remarkable complexity for Type II MADS-box genes in terms of phylogeny and putative functions. The analysis of transcriptome data reveals that gymnosperm MADS-box genes are expressed in a great variety of tissues, indicating diverse roles of MADS

  16. Genomic organization of the crested ibis MHC provides new insight into ancestral avian MHC structure

    PubMed Central

    Chen, Li-Cheng; Lan, Hong; Sun, Li; Deng, Yan-Li; Tang, Ke-Yi; Wan, Qiu-Hong

    2015-01-01

    The major histocompatibility complex (MHC) plays an important role in immune response. Avian MHCs are not well characterized; only the highly compact Galliformes MHCs and the extensively fragmented zebra finch MHC have been reported. We report the first genomic structure of an endangered Pelecaniformes (crested ibis) MHC containing 54 genes in three regions spanning ~500 kb. In contrast to the loose BG (26 loci within 265 kb) and Class I (11 loci within 150 kb) genomic structures, the Core Region is condensed (17 loci within 85 kb). Furthermore, this Region exhibits a COL11A2 gene, followed by four tandem MHC class II αβ dyads retaining two suites of anciently duplicated “αβ” lineages. Thus, the crested ibis MHC structure is entirely different from the known avian MHC architectures but similar to that of mammalian MHCs, suggesting that the fundamental structure of ancestral avian class II MHCs should be “COL11A2-IIαβ1-IIαβ2.” The gene structures, residue characteristics, and expression levels of the five class I genes reveal inter-locus functional divergence. However, phylogenetic analysis indicates that these five genes generate a well-supported intra-species clade, showing evidence for recent duplications. Our analyses suggest dramatic structural variation among avian MHC lineages, help elucidate avian MHC evolution, and provide a foundation for future conservation studies. PMID:25608659
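
    The "loose" versus "condensed" contrast in this abstract can be made concrete by converting the quoted counts into locus densities. The sketch below is illustrative only: the names are hypothetical, and reading the Class I and Core spans as kilobases is an assumption, inferred from the fact that the three spans then sum to the stated ~500 kb.

```python
# (loci, span in kb) as quoted in the abstract. The "kb" unit for the
# Class I and Core spans is an assumption inferred from the ~500 kb total.
REGIONS = {"BG": (26, 265), "Class I": (11, 150), "Core": (17, 85)}


def loci_per_100kb(loci, span_kb):
    """Locus density normalised to a 100 kb window."""
    return 100 * loci / span_kb


densities = {name: loci_per_100kb(*v) for name, v in REGIONS.items()}
total_loci = sum(loci for loci, _ in REGIONS.values())
total_kb = sum(span for _, span in REGIONS.values())

for name, density in densities.items():
    print(f"{name:8s} {density:5.1f} loci per 100 kb")
print(f"total: {total_loci} loci in {total_kb} kb")
```

    On these figures the Core Region carries roughly twice the locus density of the BG region, and the totals reproduce the 54 genes and ~500 kb given in the abstract.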

  17. Complete Genome Sequence of Macrococcus caseolyticus Strain JSCS5402, Reflecting the Ancestral Genome of the Human-Pathogenic Staphylococci

    PubMed Central

    Baba, Tadashi; Kuwahara-Arai, Kyoko; Uchiyama, Ikuo; Takeuchi, Fumihiko; Ito, Teruyo; Hiramatsu, Keiichi

    2009-01-01

    We isolated the methicillin-resistant Macrococcus caseolyticus strain JCSC5402 from animal meat in a supermarket and determined its whole-genome nucleotide sequence. This is the first report on the genome analysis of a macrococcal species that is evolutionarily closely related to the human pathogens Staphylococcus aureus and Bacillus anthracis. The essential biological pathways of M. caseolyticus are similar to those of staphylococci. However, the species has a small chromosome (2.1 MB) and lacks many sugar and amino acid metabolism pathways and a plethora of virulence genes that are present in S. aureus. On the other hand, M. caseolyticus possesses a series of oxidative phosphorylation machineries that are closely related to those in the family Bacillaceae. We also discovered a probable primordial form of a Macrococcus methicillin resistance gene complex, mecIRAm, on one of the eight plasmids harbored by the M. caseolyticus strain. This is the first finding of a plasmid-encoding methicillin resistance gene. Macrococcus is considered to reflect the genome of ancestral bacteria before the speciation of staphylococcal species and may be closely associated with the origin of the methicillin resistance gene complex of the notorious human pathogen methicillin-resistant S. aureus. PMID:19074389

  18. Complete genome sequence of Macrococcus caseolyticus strain JCSCS5402, [corrected] reflecting the ancestral genome of the human-pathogenic staphylococci.

    PubMed

    Baba, Tadashi; Kuwahara-Arai, Kyoko; Uchiyama, Ikuo; Takeuchi, Fumihiko; Ito, Teruyo; Hiramatsu, Keiichi

    2009-02-01

    We isolated the methicillin-resistant Macrococcus caseolyticus strain JCSC5402 from animal meat in a supermarket and determined its whole-genome nucleotide sequence. This is the first report on the genome analysis of a macrococcal species that is evolutionarily closely related to the human pathogens Staphylococcus aureus and Bacillus anthracis. The essential biological pathways of M. caseolyticus are similar to those of staphylococci. However, the species has a small chromosome (2.1 MB) and lacks many sugar and amino acid metabolism pathways and a plethora of virulence genes that are present in S. aureus. On the other hand, M. caseolyticus possesses a series of oxidative phosphorylation machineries that are closely related to those in the family Bacillaceae. We also discovered a probable primordial form of a Macrococcus methicillin resistance gene complex, mecIRAm, on one of the eight plasmids harbored by the M. caseolyticus strain. This is the first finding of a plasmid-encoding methicillin resistance gene. Macrococcus is considered to reflect the genome of ancestral bacteria before the speciation of staphylococcal species and may be closely associated with the origin of the methicillin resistance gene complex of the notorious human pathogen methicillin-resistant S. aureus. PMID:19074389

  19. Two Rounds of Whole Genome Duplication in the Ancestral Vertebrate

    SciTech Connect

    Dehal, Paramvir; Boore, Jeffrey L.

    2005-04-12

    The hypothesis that the relatively large and complex vertebrate genome was created by two ancient, whole genome duplications has been hotly debated, but remains unresolved. We reconstructed the evolutionary relationships of all gene families from the complete gene sets of a tunicate, fish, mouse, and human, then determined when each gene duplicated relative to the evolutionary tree of the organisms. We confirmed the results of earlier studies that there remains little signal of these events in numbers of duplicated genes, gene tree topology, or the number of genes per multigene family. However, when we plotted the genomic map positions of only the subset of paralogous genes that were duplicated prior to the fish-tetrapod split, their global physical organization provides unmistakable evidence of two distinct genome duplication events early in vertebrate evolution indicated by clear patterns of 4-way paralogous regions covering a large part of the human genome. Our results highlight the potential for these large-scale genomic events to have driven the evolutionary success of the vertebrate lineage.

  20. The mitochondrial genome of the onychophoran Opisthopatus cinctipes (Peripatopsidae) reflects the ancestral mitochondrial gene arrangement of Panarthropoda and Ecdysozoa.

    PubMed

    Braband, Anke; Cameron, Stephen L; Podsiadlowski, Lars; Daniels, Savel R; Mayer, Georg

    2010-10-01

    The ancestral genome composition in Onychophora (velvet worms) is unknown since only a single species of Peripatidae has been studied thus far, which shows a highly derived gene order with numerous translocated genes. Due to this lack of information from Onychophora, it is difficult to infer the ancestral mitochondrial gene arrangement patterns for Panarthropoda and Ecdysozoa. Hence, we analyzed the complete mitochondrial genome of the onychophoran Opisthopatus cinctipes, a representative of Peripatopsidae. Our data show that O. cinctipes possesses a highly conserved gene order, similar to that found in various arthropods. By comparing our results to those from different outgroups, we reconstruct the ancestral gene arrangement in Panarthropoda and Ecdysozoa. Our phylogenetic analysis of protein-coding gene sequences from 60 protostome species (including outgroups) provides some support for the sister group relationship of Onychophora and Arthropoda, which was not recovered by using a single species of Peripatidae, Epiperipatus biolleyi, in a previous study. A comparison of the strand-specific bias between onychophorans, arthropods, and a priapulid suggests that the peripatid E. biolleyi is less suitable for phylogenetic analyses of Ecdysozoa using mitochondrial genomic data than the peripatopsid O. cinctipes. PMID:20493270

  1. Reconstruction of ancestral chromosome architecture and gene repertoire reveals principles of genome evolution in a model yeast genus.

    PubMed

    Vakirlis, Nikolaos; Sarilar, Véronique; Drillon, Guénola; Fleiss, Aubin; Agier, Nicolas; Meyniel, Jean-Philippe; Blanpain, Lou; Carbone, Alessandra; Devillers, Hugo; Dubois, Kenny; Gillet-Markowska, Alexandre; Graziani, Stéphane; Huu-Vang, Nguyen; Poirel, Marion; Reisser, Cyrielle; Schott, Jonathan; Schacherer, Joseph; Lafontaine, Ingrid; Llorente, Bertrand; Neuvéglise, Cécile; Fischer, Gilles

    2016-07-01

    Reconstructing genome history is complex but necessary to reveal quantitative principles governing genome evolution. Such reconstruction requires recapitulating into a single evolutionary framework the evolution of genome architecture and gene repertoire. Here, we reconstructed the genome history of the genus Lachancea that appeared to cover a continuous evolutionary range from closely related to more diverged yeast species. Our approach integrated the generation of a high-quality genome data set; the development of AnChro, a new algorithm for reconstructing ancestral genome architecture; and a comprehensive analysis of gene repertoire evolution. We found that the ancestral genome of the genus Lachancea contained eight chromosomes and about 5173 protein-coding genes. Moreover, we characterized 24 horizontal gene transfers and 159 putative gene creation events that punctuated species diversification. We retraced all chromosomal rearrangements, including gene losses, gene duplications, chromosomal inversions and translocations at single gene resolution. Gene duplications outnumbered losses and balanced rearrangements with 1503, 929, and 423 events, respectively. Gene content variations between extant species are mainly driven by differential gene losses, while gene duplications remained globally constant in all lineages. Remarkably, we discovered that balanced chromosomal rearrangements could be responsible for up to 14% of all gene losses by disrupting genes at their breakpoints. Finally, we found that nonsynonymous substitutions reached fixation at a coordinated pace with chromosomal inversions, translocations, and duplications, but not deletions. Overall, we provide a granular view of genome evolution within an entire eukaryotic genus, linking gene content, chromosome rearrangements, and protein divergence into a single evolutionary framework. PMID:27247244

  2. Vertebrate codon bias indicates a highly GC-rich ancestral genome.

    PubMed

    Nabiyouni, Maryam; Prakash, Ashwin; Fedorov, Alexei

    2013-04-25

    Two factors are thought to have contributed to the origin of codon usage bias in eukaryotes: 1) genome-wide mutational forces that shape overall GC-content and create context-dependent nucleotide bias, and 2) positive selection for codons that maximize efficient and accurate translation. Particularly in vertebrates, these two explanations contradict each other and cloud the origin of codon bias in the taxon. On the one hand, mutational forces fail to explain GC-richness (~60%) of third codon positions, given the GC-poor overall genomic composition among vertebrates (~40%). On the other hand, positive selection cannot easily explain strict regularities in codon preferences. A large-scale bioinformatic assessment of the nucleotide composition of coding and non-coding sequences in vertebrates and other taxa suggests a simple possible resolution for this contradiction. Specifically, we propose that the last common vertebrate ancestor had a GC-rich genome (~65% GC). The data suggest that whole-genome mutational bias is the major driving force for generating codon bias. As the bias becomes prominent, it begins to affect translation and can result in positive selection for optimal codons. The positive selection can, in turn, significantly modulate codon preferences. PMID:23376453
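
    The contrast drawn above between GC content at third codon positions and overall GC content can be made concrete with a small calculation. The following is only an illustrative sketch, not the authors' pipeline: the function names and the toy coding sequence are hypothetical, and a real analysis would run over whole-genome annotations.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C bases in a nucleotide sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds: str) -> float:
    """GC fraction at third codon positions of an in-frame coding sequence.

    Slicing from index 2 in steps of 3 picks the third base of every
    complete codon; a trailing incomplete codon contributes nothing.
    """
    return gc_content(cds[2::3])


# Hypothetical toy sequence (four codons: ATG GCC AAG TAA), for illustration only.
toy_cds = "ATGGCCAAGTAA"
print(f"overall GC = {gc_content(toy_cds):.3f}")
print(f"GC3        = {gc3_content(toy_cds):.3f}")
```

    In this toy case the third positions are G, C, G and A, so GC3 exceeds the overall GC fraction, which is the kind of gap the abstract describes for vertebrate genes.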

  3. A linear mitochondrial genome of Cyclospora cayetanensis (Eimeriidae, Eucoccidiorida, Coccidiasina, Apicomplexa) suggests the ancestral start position within mitochondrial genomes of eimeriid coccidia.

    PubMed

    Ogedengbe, Mosun E; Qvarnstrom, Yvonne; da Silva, Alexandre J; Arrowood, Michael J; Barta, John R

    2015-05-01

    The near complete mitochondrial genome for Cyclospora cayetanensis is 6184 bp in length with three protein-coding genes (Cox1, Cox3, CytB) and numerous lsrDNA and ssrDNA fragments. Gene arrangements were conserved with other coccidia in the Eimeriidae, but the C. cayetanensis mitochondrial genome is not circular-mapping. Terminal transferase tailing and nested PCR completed the 5'-terminus of the genome starting with a 21 bp A/T-only region that forms a potential stem-loop. Regions homologous to the C. cayetanensis mitochondrial genome 5'-terminus are found in all eimeriid mitochondrial genomes available and suggest this may be the ancestral start of eimeriid mitochondrial genomes. PMID:25812835

  4. Comparative Genome-Scale Reconstruction of Gapless Metabolic Networks for Present and Ancestral Species

    PubMed Central

    Pitkänen, Esa; Jouhten, Paula; Hou, Jian; Syed, Muhammad Fahad; Blomberg, Peter; Kludas, Jana; Oja, Merja; Holm, Liisa; Penttilä, Merja; Rousu, Juho; Arvas, Mikko

    2014-01-01

    We introduce a novel computational approach, CoReCo, for comparative metabolic reconstruction and provide genome-scale metabolic network models for 49 important fungal species. Leveraging the exponential growth in sequenced genome availability, our method reconstructs genome-scale gapless metabolic networks simultaneously for a large number of species by integrating sequence data in a probabilistic framework. High reconstruction accuracy is demonstrated by comparisons to the well-curated Saccharomyces cerevisiae consensus model and large-scale knock-out experiments. Our comparative approach is particularly useful in scenarios where the quality of available sequence data is lacking, and when reconstructing evolutionarily distant species. Moreover, the reconstructed networks are fully carbon mapped, allowing their use in 13C flux analysis. We demonstrate the functionality and usability of the reconstructed fungal models with a computational steady-state biomass production experiment, as these fungi include some of the most important production organisms in industrial biotechnology. In contrast to many existing reconstruction techniques, only minimal manual effort is required before the reconstructed models are usable in flux balance experiments. CoReCo is available at http://esaskar.github.io/CoReCo/. PMID:24516375

  5. Comparative Genomics of Candidate Phylum TM6 Suggests That Parasitism Is Widespread and Ancestral in This Lineage

    PubMed Central

    Yeoh, Yun Kit; Sekiguchi, Yuji; Parks, Donovan H.; Hugenholtz, Philip

    2016-01-01

    Candidate phylum TM6 is a major bacterial lineage recognized through culture-independent rRNA surveys to be low abundance members in a wide range of habitats; however, they are poorly characterized due to a lack of pure culture representatives. Two recent genomic studies of TM6 bacteria revealed small genomes and limited gene repertoire, consistent with known or inferred dependence on eukaryotic hosts for their metabolic needs. Here, we obtained additional near-complete genomes of TM6 populations from agricultural soil and upflow anaerobic sludge blanket reactor metagenomes which, together with the two publicly available TM6 genomes, represent seven distinct family level lineages in the TM6 phylum. Genome-based phylogenetic analysis confirms that TM6 is an independent phylum level lineage in the bacterial domain, possibly affiliated with the Patescibacteria superphylum. All seven genomes are small (1.0–1.5 Mb) and lack complete biosynthetic pathways for various essential cellular building blocks including amino acids, lipids, and nucleotides. These and other features identified in the TM6 genomes such as a degenerated cell envelope, ATP/ADP translocases for parasitizing host ATP pools, and protein motifs to facilitate eukaryotic host interactions indicate that parasitism is widespread in this phylum. Phylogenetic analysis of ATP/ADP translocase genes suggests that the ancestral TM6 lineage was also parasitic. We propose the name Dependentiae (phyl. nov.) to reflect dependence of TM6 bacteria on host organisms. PMID:26615204

  6. Comparative Genomics of Candidate Phylum TM6 Suggests That Parasitism Is Widespread and Ancestral in This Lineage.

    PubMed

    Yeoh, Yun Kit; Sekiguchi, Yuji; Parks, Donovan H; Hugenholtz, Philip

    2016-04-01

    Candidate phylum TM6 is a major bacterial lineage recognized through culture-independent rRNA surveys to be low abundance members in a wide range of habitats; however, they are poorly characterized due to a lack of pure culture representatives. Two recent genomic studies of TM6 bacteria revealed small genomes and limited gene repertoire, consistent with known or inferred dependence on eukaryotic hosts for their metabolic needs. Here, we obtained additional near-complete genomes of TM6 populations from agricultural soil and upflow anaerobic sludge blanket reactor metagenomes which, together with the two publicly available TM6 genomes, represent seven distinct family level lineages in the TM6 phylum. Genome-based phylogenetic analysis confirms that TM6 is an independent phylum level lineage in the bacterial domain, possibly affiliated with the Patescibacteria superphylum. All seven genomes are small (1.0-1.5 Mb) and lack complete biosynthetic pathways for various essential cellular building blocks including amino acids, lipids, and nucleotides. These and other features identified in the TM6 genomes such as a degenerated cell envelope, ATP/ADP translocases for parasitizing host ATP pools, and protein motifs to facilitate eukaryotic host interactions indicate that parasitism is widespread in this phylum. Phylogenetic analysis of ATP/ADP translocase genes suggests that the ancestral TM6 lineage was also parasitic. We propose the name Dependentiae (phyl. nov.) to reflect dependence of TM6 bacteria on host organisms. PMID:26615204

  7. Genomic structure and evolution of the ancestral chromosome fusion site in 2q13-2q14.1 and paralogous regions on other human chromosomes.

    PubMed

    Fan, Yuxin; Linardopoulou, Elena; Friedman, Cynthia; Williams, Eleanor; Trask, Barbara J

    2002-11-01

    Human chromosome 2 was formed by the head-to-head fusion of two ancestral chromosomes that remained separate in other primates. Sequences that once resided near the ends of the ancestral chromosomes are now interstitially located in 2q13-2q14.1. Portions of these sequences had duplicated to other locations prior to the fusion. Here we present analyses of the genomic structure and evolutionary history of >600 kb surrounding the fusion site and closely related sequences on other human chromosomes. Sequence blocks that closely flank the inverted arrays of degenerate telomere repeats marking the fusion site are duplicated at many, primarily subtelomeric, locations. In addition, large portions of a 168-kb centromere-proximal block are duplicated at 9pter, 9p11.2, and 9q13, with 98%-99% average sequence identity. A 67-kb block on the distal side of the fusion site is highly homologous to sequences at 22qter. A third ~100-kb segment is 96% identical to a region in 2q11.2. By integrating data on the extent and similarity of these paralogous blocks, including the presence of phylogenetically informative repetitive elements, with observations of their chromosomal distribution in nonhuman primates, we infer the order of the duplications that led to their current arrangement. Several of these duplicated blocks may be associated with breakpoints of inversions that occurred during primate evolution and of recurrent chromosome rearrangements in humans. PMID:12421751

  8. Evaluation of the TREX1 gene in a large multi-ancestral lupus cohort

    PubMed Central

    Namjou, Bahram; Kothari, Parul H.; Kelly, Jennifer A.; Glenn, Stuart B.; Ojwang, Joshua O.; Adler, Adam; Alarcón-Riquelme, Marta E.; Gallant, Caroline J.; Boackle, Susan A.; Criswell, Lindsey A.; Kimberly, Robert P.; Brown, Elizabeth; Edberg, Jeffrey; Stevens, Anne M.; Jacob, Chaim O.; Tsao, Betty P.; Gilkeson, Gary S.; Kamen, Diane L.; Merrill, Joan T.; Petri, Michelle; Goldman, Rosalind Ramsey; Vila, Luis M.; Anaya, Juan-Manuel; Niewold, Timothy B.; Martin, Javier; Pons-Estel, Bernardo A.; Sabio, Jose M.; Callejas, Jose L.; Vyse, Timothy J.; Bae, Sang-Cheol; Perrino, Fred W.; Freedman, Barry I.; Scofield, R. Hal; Moser, Kathy L.; Gaffney, Patrick M.; James, Judith A.; Langefeld, Carl D.; Kaufman, Kenneth M.; Harley, John B.; Atkinson, John P.

    2011-01-01

    Systemic Lupus Erythematosus (SLE) is a prototypic autoimmune disorder with a complex pathogenesis in which genetic, hormonal and environmental factors play a role. Rare mutations in the TREX1 gene, the major mammalian 3′-5′ exonuclease, have been reported in sporadic SLE cases. Some of these mutations have also been identified in a rare pediatric neurologic condition featuring an inflammatory encephalopathy known as Aicardi-Goutières syndrome (AGS). We sought to investigate the frequency of these mutations in a large multi-ancestral cohort of SLE cases and controls. Methods Forty single-nucleotide polymorphisms (SNPs), including both common and rare variants, across the TREX1 gene were evaluated in ∼8370 patients with SLE and ∼7490 control subjects. Stringent quality control procedures were applied and principal components and admixture proportions were calculated to identify outliers for removal from analysis. Population-based case-control association analyses were performed. P values, false discovery rate q values, and odds ratios with 95% confidence intervals were calculated. Results The estimated frequency of TREX1 mutations in our lupus cohort was 0.5%. Five heterozygous mutations were detected at the Y305C polymorphism in European lupus cases but none were observed in European controls. Five African cases incurred heterozygous mutations at the E266G polymorphism and, again, none were observed in the African controls. A rare homozygous R114H mutation was identified in one Asian SLE patient whereas all genotypes at this mutation in previous reports for SLE were heterozygous. Analysis of common TREX1 SNPs (MAF >10%) revealed a relatively common risk haplotype in European SLE patients with neurologic manifestations, especially seizures, with a frequency of 58% in lupus cases compared to 45% in normal controls (p=0.0008, OR=1.73, 95% CI=1.25-2.39). Finally, the presence or absence of specific autoantibodies in certain populations produced significant

  9. Phylogenomics of primates and their ancestral populations

    PubMed Central

    Siepel, Adam

    2009-01-01

    Genome assemblies are now available for nine primate species, and large-scale sequencing projects are underway or approved for six others. An explicitly evolutionary and phylogenetic approach to comparative genomics, called phylogenomics, will be essential in unlocking the valuable information about evolutionary history and genomic function that is contained within these genomes. However, most phylogenomic analyses so far have ignored the effects of variation in ancestral populations on patterns of sequence divergence. These effects can be pronounced in the primates, owing to large ancestral effective population sizes relative to the intervals between speciation events. In particular, local genealogies can vary considerably across loci, which can produce biases and diminished power in many phylogenomic analyses of interest, including phylogeny reconstruction, the identification of functional elements, and the detection of natural selection. At the same time, this variation in genealogies can be exploited to gain insight into the nature of ancestral populations. In this Perspective, I explore this area of intersection between phylogenetics and population genetics, and its implications for primate phylogenomics. I begin by “lifting the hood” on the conventional tree-like representation of the phylogenetic relationships between species, to expose the population-genetic processes that operate along its branches. Next, I briefly review an emerging literature that makes use of the complex relationships among coalescence, recombination, and speciation to produce inferences about evolutionary histories, ancestral populations, and natural selection. Finally, I discuss remaining challenges and future prospects at this nexus of phylogenetics, population genetics, and genomics. PMID:19801602

  10. Ancestral whole-genome duplication in the marine chelicerate horseshoe crabs.

    PubMed

    Kenny, N J; Chan, K W; Nong, W; Qu, Z; Maeso, I; Yip, H Y; Chan, T F; Kwan, H S; Holland, P W H; Chu, K H; Hui, J H L

    2016-02-01

    Whole-genome duplication (WGD) results in new genomic resources that can be exploited by evolution for rewiring genetic regulatory networks in organisms. In metazoans, WGD occurred before the last common ancestor of vertebrates, and has been postulated as a major evolutionary force that contributed to their speciation and diversification of morphological structures. Here, we have sequenced genomes from three of the four extant species of horseshoe crabs: Carcinoscorpius rotundicauda, Limulus polyphemus and Tachypleus tridentatus. Phylogenetic and sequence analyses of their Hox and other homeobox genes, which encode crucial transcription factors and have been used as indicators of WGD in animals, strongly suggest that WGD happened before the last common ancestor of these marine chelicerates >135 million years ago. Signatures of subfunctionalisation of paralogues of Hox genes are revealed in the appendages of two species of horseshoe crabs. Further, residual homeobox pseudogenes are observed in the three lineages. The existence of WGD in the horseshoe crabs, noted for relative morphological stasis over geological time, suggests that genomic diversity need not always be reflected phenotypically, in contrast to the suggested situation in vertebrates. This study provides evidence of ancient WGD in the ecdysozoan lineage, and reveals new opportunities for studying genomic and regulatory evolution after WGD in the Metazoa. PMID:26419336

  11. The complete mitochondrial genomes of two ghost moths, Thitarodes renzhiensis and Thitarodes yunnanensis: the ancestral gene arrangement in Lepidoptera

    PubMed Central

    2012-01-01

    Background Lepidoptera encompasses more than 160,000 described species that have been classified into 45–48 superfamilies. The previously determined Lepidoptera mitochondrial genomes (mitogenomes) are limited to six superfamilies of the lineage Ditrysia. Compared with the ancestral insect gene order, these mitogenomes all contain a tRNA rearrangement. To gain new insights into Lepidoptera mitogenome evolution, we sequenced the mitogenomes of two ghost moths that belong to the non-ditrysian lineage Hepialoidea and conducted a comparative mitogenomic analysis across Lepidoptera. Results The mitogenomes of Thitarodes renzhiensis and T. yunnanensis are 16,173 bp and 15,816 bp long with an A + T content of 81.28 % and 82.34 %, respectively. Both mitogenomes include 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and the A + T-rich region. Different tandem repeats in the A + T-rich region mainly account for the size difference between the two mitogenomes. All the protein-coding genes start with typical mitochondrial initiation codons, except for cox1 (CGA) and nad1 (TTG) in both mitogenomes. The anticodon of trnS(AGN) in T. renzhiensis and T. yunnanensis is UCU instead of the mostly used GCU in other sequenced Lepidoptera mitogenomes. The 1,584-bp sequence from rrnS to nad2 was also determined for an unspecified ghost moth (Thitarodes sp.), which has no repetitive sequence in the A + T-rich region. All three Thitarodes species possess the ancestral gene order with trnI-trnQ-trnM located between the A + T-rich region and nad2, which is different from the gene order trnM-trnI-trnQ in all previously sequenced Lepidoptera species. The formerly identified conserved elements of Lepidoptera mitogenomes (i.e. the motif ‘ATAGA’ and poly-T stretch in the A + T-rich region and the long intergenic spacer upstream of nad2) are absent in the Thitarodes mitogenomes. Conclusion The mitogenomes of T. renzhiensis and T
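
    The difference between the ancestral order trnI-trnQ-trnM and the order trnM-trnI-trnQ reported above can be expressed as a count of shared and broken gene adjacencies. The sketch below is a simplified illustration, not the authors' method: it ignores gene strand, uses hypothetical helper and label names, and assumes that both arrangements are flanked by the A + T-rich region and nad2 (the record states this only for the ancestral order).

```python
def adjacencies(order):
    """Return the set of unordered neighbouring gene pairs in a linear order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


# Ancestral arrangement, as described for the Thitarodes mitogenomes.
ancestral = ["AT_rich_region", "trnI", "trnQ", "trnM", "nad2"]
# Arrangement in previously sequenced Lepidoptera; identical flanks are an assumption.
derived = ["AT_rich_region", "trnM", "trnI", "trnQ", "nad2"]

shared = adjacencies(ancestral) & adjacencies(derived)
broken = adjacencies(ancestral) - adjacencies(derived)

print("shared adjacencies:", sorted(sorted(pair) for pair in shared))
print("ancestral adjacencies lost:", sorted(sorted(pair) for pair in broken))
```

    Under these assumptions only the trnI-trnQ adjacency is shared, and three ancestral adjacencies are broken, which is what moving a single gene (trnM) to a new position produces.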

  12. Reconstruction of the ancestral marsupial karyotype from comparative gene maps

    PubMed Central

    2013-01-01

    Background The increasing number of assembled mammalian genomes makes it possible to compare genome organisation across mammalian lineages and reconstruct chromosomes of the ancestral marsupial and therian (marsupial and eutherian) mammals. However, the reconstruction of ancestral genomes requires genome assemblies to be anchored to chromosomes. The recently sequenced tammar wallaby (Macropus eugenii) genome was assembled into over 300,000 contigs. We previously devised an efficient strategy for mapping large evolutionarily conserved blocks in non-model mammals, and applied this to determine the arrangement of conserved blocks on all wallaby chromosomes, thereby permitting comparative maps to be constructed and resolving the long-debated issue between a 2n = 14 and 2n = 22 ancestral marsupial karyotype. Results We identified large blocks of genes conserved between human and opossum, and mapped genes corresponding to the ends of these blocks by fluorescence in situ hybridization (FISH). A total of 242 genes was assigned to wallaby chromosomes in the present study, bringing the total number of genes mapped to 554 and making it the most densely cytogenetically mapped marsupial genome. We used these gene assignments to construct comparative maps between wallaby and opossum, which uncovered many intrachromosomal rearrangements, particularly for genes found on wallaby chromosomes X and 3. Expanding comparisons to include chicken and human permitted the putative ancestral marsupial (2n = 14) and therian mammal (2n = 19) karyotypes to be reconstructed. Conclusions Our physical mapping data for the tammar wallaby has uncovered the events shaping marsupial genomes and enabled us to predict the ancestral marsupial karyotype, supporting a 2n = 14 ancestor. Furthermore, our predicted therian ancestral karyotype has helped to understand the evolution of the ancestral eutherian genome. PMID:24261750

  13. Ancient human genomes suggest three ancestral populations for present-day Europeans.

    PubMed

    Lazaridis, Iosif; Patterson, Nick; Mittnik, Alissa; Renaud, Gabriel; Mallick, Swapan; Kirsanow, Karola; Sudmant, Peter H; Schraiber, Joshua G; Castellano, Sergi; Lipson, Mark; Berger, Bonnie; Economou, Christos; Bollongino, Ruth; Fu, Qiaomei; Bos, Kirsten I; Nordenfelt, Susanne; Li, Heng; de Filippo, Cesare; Prüfer, Kay; Sawyer, Susanna; Posth, Cosimo; Haak, Wolfgang; Hallgren, Fredrik; Fornander, Elin; Rohland, Nadin; Delsate, Dominique; Francken, Michael; Guinet, Jean-Michel; Wahl, Joachim; Ayodo, George; Babiker, Hamza A; Bailliet, Graciela; Balanovska, Elena; Balanovsky, Oleg; Barrantes, Ramiro; Bedoya, Gabriel; Ben-Ami, Haim; Bene, Judit; Berrada, Fouad; Bravi, Claudio M; Brisighelli, Francesca; Busby, George B J; Cali, Francesco; Churnosov, Mikhail; Cole, David E C; Corach, Daniel; Damba, Larissa; van Driem, George; Dryomov, Stanislav; Dugoujon, Jean-Michel; Fedorova, Sardana A; Gallego Romero, Irene; Gubina, Marina; Hammer, Michael; Henn, Brenna M; Hervig, Tor; Hodoglugil, Ugur; Jha, Aashish R; Karachanak-Yankova, Sena; Khusainova, Rita; Khusnutdinova, Elza; Kittles, Rick; Kivisild, Toomas; Klitz, William; Kučinskas, Vaidutis; Kushniarevich, Alena; Laredj, Leila; Litvinov, Sergey; Loukidis, Theologos; Mahley, Robert W; Melegh, Béla; Metspalu, Ene; Molina, Julio; Mountain, Joanna; Näkkäläjärvi, Klemetti; Nesheva, Desislava; Nyambo, Thomas; Osipova, Ludmila; Parik, Jüri; Platonov, Fedor; Posukh, Olga; Romano, Valentino; Rothhammer, Francisco; Rudan, Igor; Ruizbakiev, Ruslan; Sahakyan, Hovhannes; Sajantila, Antti; Salas, Antonio; Starikovskaya, Elena B; Tarekegn, Ayele; Toncheva, Draga; Turdikulova, Shahlo; Uktveryte, Ingrida; Utevska, Olga; Vasquez, René; Villena, Mercedes; Voevoda, Mikhail; Winkler, Cheryl A; Yepiskoposyan, Levon; Zalloua, Pierre; Zemunik, Tatijana; Cooper, Alan; Capelli, Cristian; Thomas, Mark G; Ruiz-Linares, Andres; Tishkoff, Sarah A; Singh, Lalji; Thangaraj, Kumarasamy; Villems, Richard; Comas, David; Sukernik, Rem; Metspalu, Mait; Meyer, Matthias; Eichler, Evan E; Burger, Joachim; Slatkin, Montgomery; Pääbo, Svante; Kelso, Janet; Reich, David; Krause, Johannes

    2014-09-18

    We sequenced the genomes of a ∼7,000-year-old farmer from Germany and eight ∼8,000-year-old hunter-gatherers from Luxembourg and Sweden. We analysed these and other ancient genomes with 2,345 contemporary humans to show that most present-day Europeans derive from at least three highly differentiated populations: west European hunter-gatherers, who contributed ancestry to all Europeans but not to Near Easterners; ancient north Eurasians related to Upper Palaeolithic Siberians, who contributed to both Europeans and Near Easterners; and early European farmers, who were mainly of Near Eastern origin but also harboured west European hunter-gatherer related ancestry. We model these populations' deep relationships and show that early European farmers had ∼44% ancestry from a 'basal Eurasian' population that split before the diversification of other non-African lineages. PMID:25230663

  14. Ancient human genomes suggest three ancestral populations for present-day Europeans

    PubMed Central

    Lazaridis, Iosif; Patterson, Nick; Mittnik, Alissa; Renaud, Gabriel; Mallick, Swapan; Kirsanow, Karola; Sudmant, Peter H.; Schraiber, Joshua G.; Castellano, Sergi; Lipson, Mark; Berger, Bonnie; Economou, Christos; Bollongino, Ruth; Fu, Qiaomei; Bos, Kirsten I.; Nordenfelt, Susanne; Li, Heng; de Filippo, Cesare; Prüfer, Kay; Sawyer, Susanna; Posth, Cosimo; Haak, Wolfgang; Hallgren, Fredrik; Fornander, Elin; Rohland, Nadin; Delsate, Dominique; Francken, Michael; Guinet, Jean-Michel; Wahl, Joachim; Ayodo, George; Babiker, Hamza A.; Bailliet, Graciela; Balanovska, Elena; Balanovsky, Oleg; Barrantes, Ramiro; Bedoya, Gabriel; Ben-Ami, Haim; Bene, Judit; Berrada, Fouad; Bravi, Claudio M.; Brisighelli, Francesca; Busby, George B. J.; Cali, Francesco; Churnosov, Mikhail; Cole, David E. C.; Corach, Daniel; Damba, Larissa; van Driem, George; Dryomov, Stanislav; Dugoujon, Jean-Michel; Fedorova, Sardana A.; Romero, Irene Gallego; Gubina, Marina; Hammer, Michael; Henn, Brenna M.; Hervig, Tor; Hodoglugil, Ugur; Jha, Aashish R.; Karachanak-Yankova, Sena; Khusainova, Rita; Khusnutdinova, Elza; Kittles, Rick; Kivisild, Toomas; Klitz, William; Kučinskas, Vaidutis; Kushniarevich, Alena; Laredj, Leila; Litvinov, Sergey; Loukidis, Theologos; Mahley, Robert W.; Melegh, Béla; Metspalu, Ene; Molina, Julio; Mountain, Joanna; Näkkäläjärvi, Klemetti; Nesheva, Desislava; Nyambo, Thomas; Osipova, Ludmila; Parik, Jüri; Platonov, Fedor; Posukh, Olga; Romano, Valentino; Rothhammer, Francisco; Rudan, Igor; Ruizbakiev, Ruslan; Sahakyan, Hovhannes; Sajantila, Antti; Salas, Antonio; Starikovskaya, Elena B.; Tarekegn, Ayele; Toncheva, Draga; Turdikulova, Shahlo; Uktveryte, Ingrida; Utevska, Olga; Vasquez, René; Villena, Mercedes; Voevoda, Mikhail; Winkler, Cheryl; Yepiskoposyan, Levon; Zalloua, Pierre; Zemunik, Tatijana; Cooper, Alan; Capelli, Cristian; Thomas, Mark G.; Ruiz-Linares, Andres; Tishkoff, Sarah A.; Singh, Lalji; Thangaraj, Kumarasamy; Villems, Richard; Comas, David; Sukernik, Rem; Metspalu, Mait; Meyer, Matthias; Eichler, Evan E.; Burger, Joachim; Slatkin, Montgomery; Pääbo, Svante; Kelso, Janet; Reich, David; Krause, Johannes

    2014-01-01

    We sequenced the genomes of a ~7,000 year old farmer from Germany and eight ~8,000 year old hunter-gatherers from Luxembourg and Sweden. We analyzed these and other ancient genomes [1–4] with 2,345 contemporary humans to show that most present Europeans derive from at least three highly differentiated populations: West European Hunter-Gatherers (WHG), who contributed ancestry to all Europeans but not to Near Easterners; Ancient North Eurasians (ANE) related to Upper Paleolithic Siberians [3], who contributed to both Europeans and Near Easterners; and Early European Farmers (EEF), who were mainly of Near Eastern origin but also harbored WHG-related ancestry. We model these populations’ deep relationships and show that EEF had ~44% ancestry from a “Basal Eurasian” population that split prior to the diversification of other non-African lineages. PMID:25230663

  15. Genesis of the vertebrate FoxP subfamily member genes occurred during two ancestral whole genome duplication events.

    PubMed

    Song, Xiaowei; Tang, Yezhong; Wang, Yajun

    2016-08-22

    The vertebrate FoxP subfamily genes play important roles in the construction of essential functional modules involved in physiological and developmental processes. To explore the adaptive evolution of functional modules associated with the FoxP subfamily member genes, it is necessary to study the gene duplication process. We detected four member genes of the FoxP subfamily in sea lampreys (a representative species of jawless vertebrates) through genome screenings and phylogenetic analyses. Reliable paralogons (i.e. paralogous chromosome segments) have rarely been detected in scaffolds of FoxP subfamily member genes in sea lampreys due to the considerable existence of HTH_Tnp_Tc3_2 transposases. However, these transposases did not alter gene numbers of the FoxP subfamily in sea lampreys. The coincidence between the "1-4" gene duplication pattern of FoxP subfamily genes from invertebrates to vertebrates and two rounds of ancestral whole genome duplication (1R- and 2R-WGD) events reveals that the FoxP subfamily of vertebrates was quadruplicated in the 1R- and 2R-WGD events. Furthermore, we deduced that a synchronous gene duplication process occurred for the FoxP subfamily and for three linked gene families/subfamilies (i.e. MIT family, mGluR group III and PLXNA subfamily) in the 1R- and 2R-WGD events using phylogenetic analyses and mirror-dendrogram methods (i.e. algorithms to test protein-protein interactions). Specifically, the ancestor of FoxP1 and FoxP3 and the ancestor of FoxP2 and FoxP4 were generated in the 1R-WGD event. In the subsequent 2R-WGD event, these two ancestral genes were changed into FoxP1, FoxP2, FoxP3 and FoxP4. The elucidation of these gene duplication processes sheds light on the phylogenetic relationships between functional modules of the FoxP subfamily member genes. PMID:27188254

  16. Calibrating the Human Mutation Rate via Ancestral Recombination Density in Diploid Genomes

    PubMed Central

    Lipson, Mark; Loh, Po-Ru; Sankararaman, Sriram; Patterson, Nick; Berger, Bonnie; Reich, David

    2015-01-01

    The human mutation rate is an essential parameter for studying the evolution of our species, interpreting present-day genetic variation, and understanding the incidence of genetic disease. Nevertheless, our current estimates of the rate are uncertain. Most notably, recent approaches based on counting de novo mutations in family pedigrees have yielded significantly smaller values than classical methods based on sequence divergence. Here, we propose a new method that uses the fine-scale human recombination map to calibrate the rate of accumulation of mutations. By comparing local heterozygosity levels in diploid genomes to the genetic distance scale over which these levels change, we are able to estimate a long-term mutation rate averaged over hundreds or thousands of generations. We infer a rate of (1.61 ± 0.13) × 10^-8 mutations per base per generation, which falls in between phylogenetic and pedigree-based estimates, and we suggest possible mechanisms to reconcile our estimate with previous studies. Our results support intermediate-age divergences among human populations and between humans and other great apes. PMID:26562831
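
    Per-generation rates such as the one reported above are often converted to per-year rates when dating divergences. The sketch below shows that arithmetic only; the generation time of 29 years is an assumed, hypothetical value chosen for illustration and does not come from this record, and the function name is likewise hypothetical.

```python
# Point estimate and uncertainty quoted in the abstract (per base per generation).
RATE_PER_GENERATION = 1.61e-8
RATE_UNCERTAINTY = 0.13e-8

# ASSUMPTION: mean generation time in years; not stated in the abstract.
ASSUMED_GENERATION_TIME_YEARS = 29.0


def per_year_rate(rate_per_generation: float, generation_time_years: float) -> float:
    """Convert a per-generation mutation rate to a per-year rate."""
    if generation_time_years <= 0:
        raise ValueError("generation time must be positive")
    return rate_per_generation / generation_time_years


low = RATE_PER_GENERATION - RATE_UNCERTAINTY
high = RATE_PER_GENERATION + RATE_UNCERTAINTY

for label, rate in (("low", low), ("point", RATE_PER_GENERATION), ("high", high)):
    yearly = per_year_rate(rate, ASSUMED_GENERATION_TIME_YEARS)
    print(f"{label:>5}: {rate:.2e} per generation -> {yearly:.2e} per year")
```

    A different assumed generation time rescales all three values proportionally, so the converted rates are only as reliable as that assumption.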

  17. Genome-wide association study and ancestral origins of the slick-hair coat in tropically adapted cattle

    PubMed Central

    Huson, Heather J.; Kim, Eui-Soo; Godfrey, Robert W.; Olson, Timothy A.; McClure, Matthew C.; Chase, Chad C.; Rizzi, Rita; O'Brien, Ana M. P.; Van Tassell, Curt P.; Garcia, José F.; Sonstegard, Tad S.

    2014-01-01

    The slick hair coat (SLICK) is a dominantly inherited trait typically associated with tropically adapted cattle that are from Criollo descent through Spanish colonization of cattle into the New World. The trait is of interest relative to climate change, due to its association with improved thermo-tolerance and subsequent increased productivity. Previous studies localized the SLICK locus to a 4 cM region on chromosome (BTA) 20 and identified signatures of selection in this region derived from Senepol cattle. The current study compares three slick-haired Criollo-derived breeds including Senepol, Carora, and Romosinuano and three additional slick-haired cross-bred lineages to non-slick ancestral breeds. Genome-wide association (GWA), haplotype analysis, signatures of selection, runs of homozygosity (ROH), and identity by state (IBS) calculations were used to identify a 0.8 Mb (37.7–38.5 Mb) consensus region for the SLICK locus on BTA20 that contains SKP2 and SPEF2 as possible candidate genes. Three specific haplotype patterns are identified in slick individuals, all with zero frequency in non-slick individuals. Admixture analysis identified common genetic patterns between the three slick breeds at the SLICK locus. Principal component analysis (PCA) and admixture results show Senepol and Romosinuano sharing a higher degree of genetic similarity to one another with a much lesser degree of similarity to Carora. Variation in GWA, haplotype analysis, and IBS calculations with accompanying population structure information supports potentially two mutations, one common to Senepol and Romosinuano and another in Carora, affecting genes contained within our refined location for the SLICK locus. PMID:24808908

  18. The first complete mitochondrial genome sequences of Amblypygi (Chelicerata: Arachnida) reveal conservation of the ancestral arthropod gene order.

    PubMed

    Fahrein, Kathrin; Masta, Susan E; Podsiadlowski, Lars

    2009-05-01

    Amblypygi (whip spiders) are terrestrial chelicerates inhabiting the subtropics and tropics. In morphological and rRNA-based phylogenetic analyses, Amblypygi cluster with Uropygi (whip scorpions) and Araneae (spiders) to form the taxon Tetrapulmonata, but there is controversy regarding the interrelationship of these three taxa. Mitochondrial genomes provide an additional large data set of phylogenetic information (sequences, gene order, RNA secondary structure), but in arachnids, mitochondrial genome data are missing for some of the major orders. In the course of an ongoing project concerning arachnid mitochondrial genomics, we present the first two complete mitochondrial genomes from Amblypygi. Both genomes were found to be typical circular duplex DNA molecules with all 37 genes usually present in bilaterian mitochondrial genomes. In both species, gene order is identical to that of Limulus polyphemus (Xiphosura), which is assumed to reflect the putative arthropod ground pattern. All tRNA gene sequences have the potential to fold into structures that are typical of metazoan mitochondrial tRNAs, except for tRNA-Ala, which lacks the D arm in both amblypygids, suggesting the loss of this feature early in amblypygid evolution. Phylogenetic analysis resulted in weak support for Uropygi being the sister group of Amblypygi. PMID:19448726

  19. Genome Content and Phylogenomics Reveal both Ancestral and Lateral Evolutionary Pathways in Plant-Pathogenic Streptomyces Species.

    PubMed

    Huguet-Tapia, Jose C; Lefebure, Tristan; Badger, Jonathan H; Guan, Dongli; Pettis, Gregg S; Stanhope, Michael J; Loria, Rosemary

    2016-04-01

    Streptomyces spp. are highly differentiated actinomycetes with large, linear chromosomes that encode an arsenal of biologically active molecules and catabolic enzymes. Members of this genus are well equipped for life in nutrient-limited environments and are common soil saprophytes. Out of the hundreds of species in the genus Streptomyces, a small group has evolved the ability to infect plants. The recent availability of Streptomyces genome sequences, including four genomes of pathogenic species, provided an opportunity to characterize the gene content specific to these pathogens and to study phylogenetic relationships among them. Genome sequencing, comparative genomics, and phylogenetic analysis enabled us to discriminate pathogenic from saprophytic Streptomyces strains; moreover, we calculated that the pathogen-specific genome contains 4,662 orthologs. Phylogenetic reconstruction suggested that Streptomyces scabies and S. ipomoeae share an ancestor but that their biosynthetic clusters encoding the required virulence factor thaxtomin have diverged. In contrast, S. turgidiscabies and S. acidiscabies, two relatively unrelated pathogens, possess highly similar thaxtomin biosynthesis clusters, which suggests that the acquisition of these genes was through lateral gene transfer. PMID:26826232

  20. Comparative sequence analyses indicate that Coffea (Asterids) and Vitis (Rosids) derive from the same paleo-hexaploid ancestral genome.

    PubMed

    Cenci, Alberto; Combes, Marie-Christine; Lashermes, Philippe

    2010-05-01

    The complete sequence of Vitis vinifera revealed that the rosid clade derives from a hexaploid ancestor. At present, no analysis of complete genome sequence is available for an asterid, the other large eudicot clade, which includes the economically important species potato, tomato and coffee. To elucidate the genomic history of asterids, we compared the sequence of an 800 kb region of diploid Coffea genome to the orthologous regions of V. vinifera, Populus trichocarpa and Arabidopsis thaliana. We found a very high level of collinearity between around 80 genes of the three rosid species and Coffea. Collinearity comparisons between orthologous and paralogous regions indicate that (1) Coffea (and consequently all asterids) and rosids share the same hexaploid ancestor; (2) the diploidization process (loss of duplicated and redundant copies from the whole genome duplication) was very advanced in the most recent common ancestor of rosids and asterids. Finally, no additional polyploidization events were detected in the Coffea lineage. Differences in gene loss rates were detected among the three rosid species and linked to the divergence in protein sequences. PMID:20361338

  1. Annotating Large Genomes With Exact Word Matches

    PubMed Central

    Healy, John; Thomas, Elizabeth E.; Schwartz, Jacob T.; Wigler, Michael

    2003-01-01

    We have developed a tool for rapidly determining the number of exact matches of any word within large, internally repetitive genomes or sets of genomes. Thus we can readily annotate any sequence, including the entire human genome, with the counts of its constituent words. We create a Burrows-Wheeler transform of the genome, which together with auxiliary data structures facilitating counting, can reside in about one gigabyte of RAM. Our original interest was motivated by oligonucleotide probe design, and we describe a general protocol for defining unique hybridization probes. But our method also has applications for the analysis of genome structure and assembly. We demonstrate the identification of chromosome-specific repeats, and outline a general procedure for finding undiscovered repeats. We also illustrate the changing contents of the human genome assemblies by comparing the annotations built from different genome freezes. PMID:12975312
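
    The counting step described above can be illustrated with a small sketch. This is not the authors' tool: it is a minimal FM-index (Burrows-Wheeler transform plus occurrence tables) with backward search, assuming a toy in-memory genome string and a naive suffix sort that would not scale to real genomes. The names build_fm_index and count_matches are hypothetical.

```python
def build_fm_index(text):
    """Build a toy FM-index: BWT-derived tables for exact-match counting."""
    text += "$"  # sentinel, assumed absent from the genome and smallest in ASCII order
    # Naive suffix array; fine for a demo, far too slow for a real genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps to the sentinel
    alphabet = sorted(set(text))
    # first[c] = number of characters in text strictly smaller than c
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return first, occ, len(bwt)


def count_matches(index, word):
    """Count exact occurrences of a non-empty word by backward search."""
    first, occ, n = index
    lo, hi = 0, n  # current suffix-array interval [lo, hi)
    for ch in reversed(word):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = build_fm_index("ACGTACGTAC")
    for w in ("ACG", "AC", "TTT"):
        print(w, count_matches(index, w))
```

    Each query touches the tables once per character of the word, so the cost of a count depends on the word length rather than on the genome length, which is what makes annotating every word of a sequence practical.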

  2. Atypical regions in large genomic DNA sequences

    SciTech Connect

    Scherer, S.; McPeek, M.S.; Speed, T.P.

    1994-07-19

    Large genomic DNA sequences contain regions with distinctive patterns of sequence organization. The authors describe a method using logarithms of probabilities based on seventh-order Markov chains to rapidly identify genomic sequences that do not resemble models of genome organization built from compilations of octanucleotide usage. Data bases have been constructed from Escherichia coli and Saccharomyces cerevisiae DNA sequences of >1000 nt and human sequences of >10,000 nt. Atypical genes and clusters of genes have been located in bacteriophage, yeast, and primate DNA sequences. The authors consider criteria for statistical significance of the results, offer possible explanations for the observed variation in genome organization, and give additional applications of these methods in DNA sequence analysis.
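
    As a rough illustration of the scoring idea, the sketch below trains a k-th order Markov chain from (k+1)-mer counts and reports the mean log-probability per base in sliding windows; low-scoring windows are candidates for atypical regions. It is a sketch under stated assumptions, not the authors' method: the add-one smoothing, the window and step sizes, and the function names train_markov and window_scores are all hypothetical choices. The default order of 7 corresponds to octanucleotide counts.

```python
import math
from collections import Counter


def train_markov(sequences, order=7, alphabet="ACGT"):
    """Return log_prob(context, base) estimated from (order+1)-mer counts."""
    kmer = Counter()  # counts of context followed by the next base
    ctx = Counter()   # counts of each context that is followed by a base
    for seq in sequences:
        for i in range(len(seq) - order):
            kmer[seq[i:i + order + 1]] += 1
            ctx[seq[i:i + order]] += 1

    def log_prob(context, base):
        # Add-one smoothing is an assumption made here so that unseen
        # (order+1)-mers get a finite log-probability.
        return math.log((kmer[context + base] + 1) / (ctx[context] + len(alphabet)))

    return log_prob


def window_scores(seq, log_prob, order=7, window=100, step=50):
    """Mean log-probability per base for each sliding window of seq."""
    scores = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        total = sum(log_prob(w[i - order:i], w[i]) for i in range(order, len(w)))
        scores.append((start, total / (len(w) - order)))
    return scores


if __name__ == "__main__":
    model = train_markov(["ACGT" * 50], order=2)
    query = "ACGT" * 5 + "A" * 20  # typical stretch followed by an atypical one
    for start, score in window_scores(query, model, order=2, window=20, step=20):
        print(start, round(score, 3))
```

    In the toy run the second window, which does not resemble the training sequence, receives a clearly lower score than the first.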

  3. Gene map of large yellow croaker (Larimichthys crocea) provides insights into teleost genome evolution and conserved regions associated with growth

    PubMed Central

    Xiao, Shijun; Wang, Panpan; Zhang, Yan; Fang, Lujing; Liu, Yang; Li, Jiong-Tang; Wang, Zhi-Yong

    2015-01-01

    The genetic map of a species is essential for its whole genome assembly and can be applied to the mapping of important traits. In this study, we performed RNA-seq for a family of large yellow croakers (Larimichthys crocea) and constructed a high-density genetic map. In this map, 24 linkage groups comprised 3,448 polymorphic SNP markers. Approximately 72.4% (2,495) of the markers were located in protein-coding regions. Comparison of the croaker genome with those of five model fish species revealed that the croaker genome structure was closer to that of the medaka than to the remaining four genomes. Because the medaka genome preserves the teleost ancestral karyotype, this result indicated that the croaker genome might also maintain the teleost ancestral genome structure. The analysis also revealed different genome rearrangements across teleosts. QTL mapping and association analysis consistently identified growth-related QTL regions and associated genes. Orthologs of the associated genes in other species were demonstrated to regulate development, indicating that these genes might regulate development and growth in croaker. This gene map will enable us to construct the croaker genome for comparative studies and to provide an important resource for selective breeding of croaker. PMID:26689832

  4. A Dense Linkage Map for Chinook salmon (Oncorhynchus tshawytscha) Reveals Variable Chromosomal Divergence After an Ancestral Whole Genome Duplication Event

    PubMed Central

    Brieuc, Marine S. O.; Waters, Charles D.; Seeb, James E.; Naish, Kerry A.

    2014-01-01

    Comparisons between the genomes of salmon species reveal that they underwent extensive chromosomal rearrangements following whole genome duplication that occurred in their lineage 58−63 million years ago. Extant salmonids are diploid, but occasional pairing between homeologous chromosomes exists in males. The consequences of re-diploidization can be characterized by mapping the position of duplicated loci in such species. Linkage maps are also a valuable tool for genome-wide applications such as genome-wide association studies, quantitative trait loci mapping or genome scans. Here, we investigated chromosomal evolution in Chinook salmon (Oncorhynchus tshawytscha) after genome duplication by mapping 7146 restriction-site associated DNA loci in gynogenetic haploid, gynogenetic diploid, and diploid crosses. In the process, we developed a reference database of restriction-site associated DNA loci for Chinook salmon comprising 48528 non-duplicated loci and 6409 known duplicated loci, which will facilitate locus identification and data sharing. We created a very dense linkage map anchored to all 34 chromosomes for the species, and all arms were identified through centromere mapping. The map positions of 799 duplicated loci revealed that homeologous pairs have diverged at different rates following whole genome duplication, and that degree of differentiation along arms was variable. Many of the homeologous pairs with high numbers of duplicated markers appear conserved with other salmon species, suggesting that retention of conserved homeologous pairing in some arms preceded species divergence. As chromosome arms are highly conserved across species, the major resources developed for Chinook salmon in this study are also relevant for other related species. PMID:24381192

  5. Global Alignment System for Large Genomic Sequencing

    Energy Science and Technology Software Center (ESTSC)

    2002-03-01

    AVID is a global alignment system tailored for the alignment of large genomic sequences up to megabases in length. Features include the possibility of one sequence being in draft form, fast alignment, robustness and accuracy. The method is an anchor based alignment using maximal matches derived from suffix trees.

  6. TYPES AND RATES OF SEQUENCE EVOLUTION AT HMW-GLUTENIN LOCUS IN HEXAPLOID WHEAT AND ITS ANCESTRAL GENOMES

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The Glu-1 locus, encoding the High Molecular Weight-glutenin protein subunits, controls bread-making quality in hexaploid wheat (Triticum aestivum) and represents a recently evolved region unique to Triticeae genomes. To understand the molecular evolution of this locus region, three orthologous Glu...

  7. Rapid genome-wide evolution in Brassica rapa populations following drought revealed by sequencing of ancestral and descendant gene pools.

    PubMed

    Franks, Steven J; Kane, Nolan C; O'Hara, Niamh B; Tittes, Silas; Rest, Joshua S

    2016-08-01

    There is increasing evidence that evolution can occur rapidly in response to selection. Recent advances in sequencing suggest the possibility of documenting genetic changes as they occur in populations, thus uncovering the genetic basis of evolution, particularly if samples are available from both before and after selection. Here, we had a unique opportunity to directly assess genetic changes in natural populations following an evolutionary response to a fluctuation in climate. We analysed genome-wide differences between ancestors and descendants of natural populations of Brassica rapa plants from two locations that rapidly evolved changes in multiple phenotypic traits, including flowering time, following a multiyear late-season drought in California. These ancestor-descendant comparisons revealed evolutionary shifts in allele frequencies in many genes. Some genes showing evolutionary shifts have functions related to drought stress and flowering time, consistent with an adaptive response to selection. Loci differentiated between ancestors and descendants (FST outliers) were generally different from those showing signatures of selection based on site frequency spectrum analysis (Tajima's D), indicating that the loci that evolved in response to the recent drought and those under historical selection were generally distinct. Very few genes showed similar evolutionary responses between two geographically distinct populations, suggesting independent genetic trajectories of evolution yielding parallel phenotypic changes. The results show that selection can result in rapid genome-wide evolutionary shifts in allele frequencies in natural populations, and highlight the usefulness of combining resurrection experiments in natural populations with genomics for studying the genetic basis of adaptive evolution. PMID:27072809
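
    For readers unfamiliar with the FST statistic mentioned above, the sketch below computes a simple per-locus FST between two samples (for example, ancestors and descendants) from allele frequencies. It assumes a biallelic locus, equal weighting of the two samples, and the basic definition FST = (HT - HS) / HT; the study itself may well have used a different estimator, and the function name fst_two_pops is hypothetical.

```python
def fst_two_pops(p1, p2):
    """Per-locus FST for a biallelic site from two allele frequencies.

    p1, p2: frequency of the same allele in each sample (0..1).
    Uses FST = (HT - HS) / HT with equal weights, which is an assumption.
    """
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-sample heterozygosity
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)                      # heterozygosity of the pooled sample
    if ht == 0:
        return 0.0  # locus is monomorphic overall; no differentiation to measure
    return (ht - hs) / ht


if __name__ == "__main__":
    # Hypothetical allele frequencies before and after a selective event.
    for anc, desc in [(0.5, 0.5), (0.2, 0.8), (0.0, 1.0)]:
        print(anc, desc, round(fst_two_pops(anc, desc), 3))
```

    Loci whose value lies far in the upper tail of the genome-wide distribution are the kind of outliers the abstract refers to.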

  8. Evolutionary genomics of nucleo-cytoplasmic large DNA viruses.

    PubMed

    Iyer, Lakshminarayan M; Balaji, S; Koonin, Eugene V; Aravind, L

    2006-04-01

    A previous comparative-genomic study of large nuclear and cytoplasmic DNA viruses (NCLDVs) of eukaryotes revealed the monophyletic origin of four viral families: poxviruses, asfarviruses, iridoviruses, and phycodnaviruses [Iyer, L.M., Aravind, L., Koonin, E.V., 2001. Common origin of four diverse families of large eukaryotic DNA viruses. J. Virol. 75 (23), 11720-11734]. Here we update this analysis by including the recently sequenced giant genome of the mimiviruses and several additional genomes of iridoviruses, phycodnaviruses, and poxviruses. The parsimonious reconstruction of the gene complement of the ancestral NCLDV shows that it was a complex virus with at least 41 genes that encoded the replication machinery, up to four RNA polymerase subunits, at least three transcription factors, capping and polyadenylation enzymes, the DNA packaging apparatus, and structural components of an icosahedral capsid and the viral membrane. The phylogeny of the NCLDVs is reconstructed by cladistic analysis of the viral gene complements, and it is shown that the two principal lineages of NCLDVs are comprised of poxviruses grouped with asfarviruses and iridoviruses grouped with phycodnaviruses-mimiviruses. The phycodna-mimivirus grouping was strongly supported by several derived shared characters, which seemed to rule out the previously suggested basal position of the mimivirus [Raoult, D., Audic, S., Robert, C., Abergel, C., Renesto, P., Ogata, H., La Scola, B., Suzan, M., Claverie, J.M. 2004. The 1.2-megabase genome sequence of Mimivirus. Science 306 (5700), 1344-1350]. These results indicate that the divergence of the major NCLDV families occurred at an early stage of evolution, prior to the divergence of the major eukaryotic lineages. It is shown that subsequent evolution of the NCLDV genomes involved lineage-specific expansion of paralogous gene families and acquisition of numerous genes via horizontal gene transfer from the eukaryotic hosts, other viruses, and bacteria

  9. Genome-wide association study identifies HLA 8.1 ancestral haplotype alleles as major genetic risk factors for myositis phenotypes.

    PubMed

    Miller, F W; Chen, W; O'Hanlon, T P; Cooper, R G; Vencovsky, J; Rider, L G; Danko, K; Wedderburn, L R; Lundberg, I E; Pachman, L M; Reed, A M; Ytterberg, S R; Padyukov, L; Selva-O'Callaghan, A; Radstake, T R; Isenberg, D A; Chinoy, H; Ollier, W E R; Scheet, P; Peng, B; Lee, A; Byun, J; Lamb, J A; Gregersen, P K; Amos, C I

    2015-10-01

    Autoimmune muscle diseases (myositis) comprise a group of complex phenotypes influenced by genetic and environmental factors. To identify genetic risk factors in patients of European ancestry, we conducted a genome-wide association study (GWAS) of the major myositis phenotypes in a total of 1710 cases, which included 705 adult dermatomyositis, 473 juvenile dermatomyositis, 532 polymyositis and 202 adult dermatomyositis, juvenile dermatomyositis or polymyositis patients with anti-histidyl-tRNA synthetase (anti-Jo-1) autoantibodies, and compared them with 4724 controls. Single-nucleotide polymorphisms showing strong associations (P < 5 × 10^-8) in GWAS were identified in the major histocompatibility complex (MHC) region for all myositis phenotypes together, as well as for the four clinical and autoantibody phenotypes studied separately. Imputation and regression analyses found that alleles comprising the human leukocyte antigen (HLA) 8.1 ancestral haplotype (AH8.1) defined essentially all the genetic risk in the phenotypes studied. Although the HLA DRB1*03:01 allele showed slightly stronger associations with adult and juvenile dermatomyositis, and HLA B*08:01 with polymyositis and anti-Jo-1 autoantibody-positive myositis, multiple alleles of AH8.1 were required for the full risk effects. Our findings establish that alleles of the AH8.1 comprise the primary genetic risk factors associated with the major myositis phenotypes in geographically diverse Caucasian populations. PMID:26291516

  10. Genome-wide Association Study Identifies HLA 8.1 Ancestral Haplotype Alleles as Major Genetic Risk Factors for Myositis Phenotypes

    PubMed Central

    Miller, Frederick W.; Chen, Wei; O’Hanlon, Terrance P.; Cooper, Robert G.; Vencovsky, Jiri; Rider, Lisa G.; Danko, Katalin; Wedderburn, Lucy R.; Lundberg, Ingrid E.; Pachman, Lauren M.; Reed, Ann M.; Ytterberg, Steven R.; Padyukov, Leonid; Selva-O’Callaghan, Albert; Radstake, Timothy R.; Isenberg, David A.; Chinoy, Hector; Ollier, William E.R.; Scheet, Paul; Peng, Bo; Lee, Annette; Byun, Jinyoung; Lamb, Janine A.; Gregersen, Peter K.; Amos, Christopher I.

    2016-01-01

    Autoimmune muscle diseases (myositis) comprise a group of complex phenotypes influenced by genetic and environmental factors. To identify genetic risk factors in patients of European ancestry, we conducted a genome-wide association study (GWAS) of the major myositis phenotypes in a total of 1710 cases, which included 705 adult dermatomyositis; 473 juvenile dermatomyositis; 532 polymyositis; and 202 adult dermatomyositis, juvenile dermatomyositis or polymyositis patients with anti-histidyl tRNA synthetase (anti-Jo-1) autoantibodies, and compared them with 4724 controls. Single-nucleotide polymorphisms showing strong associations (P < 5 × 10^-8) in GWAS were identified in the major histocompatibility complex (MHC) region for all myositis phenotypes together, as well as for the four clinical and autoantibody phenotypes studied separately. Imputation and regression analyses found that alleles comprising the human leukocyte antigen (HLA) 8.1 ancestral haplotype (AH8.1) defined essentially all the genetic risk in the phenotypes studied. Although the HLA DRB1*03:01 allele showed slightly stronger associations with adult and juvenile dermatomyositis, and HLA B*08:01 with polymyositis and anti-Jo-1 autoantibody-positive myositis, multiple alleles of AH8.1 were required for the full risk effects. Our findings establish that alleles of the AH8.1 haplotype comprise the primary genetic risk factors associated with the major myositis phenotypes in geographically diverse Caucasian populations. PMID:26291516

  11. Genomes of Helicobacter pylori from native Peruvians suggest admixture of ancestral and modern lineages and reveal a western type cag-pathogenicity island

    PubMed Central

    Devi, S Manjulata; Ahmed, Irshad; Khan, Aleem A; Rahman, Syed Asad; Alvi, Ayesha; Sechi, Leonardo A; Ahmed, Niyaz

    2006-01-01

    Background Helicobacter pylori is presumed to be co-evolved with its human host and is a highly diverse gastric pathogen at genetic levels. Ancient origins of H. pylori in the New World are still debatable. It is not clear how different waves of human migrations in South America contributed to the evolution of strain diversity of H. pylori. The objective of our 'phylogeographic' study was to gain fresh insights into these issues through mapping genetic origins of H. pylori of native Peruvians (of Amerindian ancestry) and their genomic comparison with isolates from Spain, and Japan. Results For this purpose, we attempted to dissect genetic identity of strains by fluorescent amplified fragment length polymorphism (FAFLP) analysis, multilocus sequence typing (MLST) of the 7 housekeeping genes (atpA, efp, ureI, ppa, mutY, trpC, yphC) and the sequence analyses of the babB adhesin and oipA genes. The whole cag pathogenicity-island (cagPAI) from these strains was analyzed using PCR and the geographic type of cagA phosphorylation motif EPIYA was determined by gene sequencing. We observed that while European genotype (hp-Europe) predominates in native Peruvian strains, approximately 20% of these strains represent a sub-population with an Amerindian ancestry (hsp-Amerind). All of these strains however, irrespective of their ancestral affiliation harbored a complete, 'western' type cagPAI and the motifs surrounding it. This indicates a possible acquisition of cagPAI by the hsp-Amerind strains from the European strains, during decades of co-colonization. Conclusion Our observations suggest presence of ancestral H. pylori (hsp-Amerind) in Peruvian Amerindians which possibly managed to survive and compete against the Spanish strains that arrived to the New World about 500 years ago. We suggest that this might have happened after native Peruvian H. pylori strains acquired cagPAI sequences, either by new acquisition in cag-negative strains or by recombination in cag positive

  12. Comparative genome maps of the pangolin, hedgehog, sloth, anteater and human revealed by cross-species chromosome painting: further insight into the ancestral karyotype and genome evolution of eutherian mammals.

    PubMed

    Yang, Fengtang; Graphodatsky, Alexander S; Li, Tangliang; Fu, Beiyuan; Dobigny, Gauthier; Wang, Jinghuan; Perelman, Polina L; Serdukova, Natalya A; Su, Weiting; O'Brien, Patricia Cm; Wang, Yingxiang; Ferguson-Smith, Malcolm A; Volobouev, Vitaly; Nie, Wenhui

    2006-01-01

    To better understand the evolution of genome organization of eutherian mammals, comparative maps based on chromosome painting have been constructed between human and representative species of three eutherian orders: Xenarthra, Pholidota, and Eulipotyphla, as well as between representative species of the Carnivora and Pholidota. These maps demonstrate the conservation of such syntenic segment associations as HSA3/21, 4/8, 7/16, 12/22, 14/15 and 16/19 in Eulipotyphla, Pholidota and Xenarthra and thus further consolidate the notion that they form part of the ancestral karyotype of the eutherian mammals. Our study has revealed many potential ancestral syntenic associations of human chromosomal segments that serve to link the families as well as orders within the major superordinal eutherian clades defined by molecular markers. The HSA2/8 and 7/10 associations could be the cytogenetic signatures that unite the Xenarthrans, while the HSA1/19p could be a putative signature that links the Afrotheria and Xenarthra. But caution is required in the interpretation of apparently shared syntenic associations as detailed analyses also show examples of apparent convergent evolution that differ in breakpoints and extent of the involved segments. PMID:16628499

  13. The Psychiatric Genomics Consortium Posttraumatic Stress Disorder Workgroup: Posttraumatic Stress Disorder Enters the Age of Large-Scale Genomic Collaboration

    PubMed Central

    Logue, Mark W; Amstadter, Ananda B; Baker, Dewleen G; Duncan, Laramie; Koenen, Karestan C; Liberzon, Israel; Miller, Mark W; Morey, Rajendra A; Nievergelt, Caroline M; Ressler, Kerry J; Smith, Alicia K; Smoller, Jordan W; Stein, Murray B; Sumner, Jennifer A; Uddin, Monica

    2015-01-01

    The development of posttraumatic stress disorder (PTSD) is influenced by genetic factors. Although there have been some replicated candidates, the identification of risk variants for PTSD has lagged behind genetic research of other psychiatric disorders such as schizophrenia, autism, and bipolar disorder. Psychiatric genetics has moved beyond examination of specific candidate genes in favor of the genome-wide association study (GWAS) strategy of very large numbers of samples, which allows for the discovery of previously unsuspected genes and molecular pathways. The successes of genetic studies of schizophrenia and bipolar disorder have been aided by the formation of a large-scale GWAS consortium: the Psychiatric Genomics Consortium (PGC). In contrast, only a handful of GWAS of PTSD have appeared in the literature to date. Here we describe the formation of a group dedicated to large-scale study of PTSD genetics: the PGC-PTSD. The PGC-PTSD faces challenges related to the contingency on trauma exposure and the large degree of ancestral genetic diversity within and across participating studies. Using the PGC analysis pipeline supplemented by analyses tailored to address these challenges, we anticipate that our first large-scale GWAS of PTSD will comprise over 10 000 cases and 30 000 trauma-exposed controls. Following in the footsteps of our PGC forerunners, this collaboration—of a scope that is unprecedented in the field of traumatic stress—will lead the search for replicable genetic associations and new insights into the biological underpinnings of PTSD. PMID:25904361

  14. Large-Scale Development of Gene-Associated Single-Nucleotide Polymorphism Markers for Molluscan Population Genomic, Comparative Genomic, and Genome-Wide Association Studies

    PubMed Central

    Jiao, Wenqian; Fu, Xiaoteng; Li, Jinqin; Li, Ling; Feng, Liying; Lv, Jia; Zhang, Lu; Wang, Xiaojian; Li, Yangping; Hou, Rui; Zhang, Lingling; Hu, Xiaoli; Wang, Shi; Bao, Zhenmin

    2014-01-01

    Mollusca is the second most diverse group of animals in the world. Despite their perceived importance, omics-level studies have seldom been applied to this group of animals largely due to a paucity of genomic resources. Here, we report the first large-scale gene-associated marker development and evaluation for a bivalve mollusc, Chlamys farreri. More than 21,000 putative single-nucleotide polymorphisms (SNPs) were identified from the C. farreri transcriptome. Primers and probes were designed and synthesized for 4500 SNPs, and 1492 polymorphic markers were successfully developed using a high-resolution melting genotyping platform. These markers are particularly suitable for population genomic analysis due to high polymorphism within and across populations, a low frequency of null alleles, and conformation to neutral expectations. Unexpectedly, high cross-species transferability was observed, suggesting that the transferable SNPs may largely represent ancestral genetic variations that have been preserved differentially among subfamilies of Pectinidae. Gene annotations were available for 73% of the markers, and 65% could be anchored to the recently released Pacific oyster genome. Large-scale association analysis revealed key candidate genes responsible for scallop growth regulation, and provided markers for further genetic improvement of C. farreri in breeding programmes. PMID:24277739

  15. A consensus map in cultivated hexaploid oat reveals conserved grass synteny with substantial sub-genome rearrangement

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Hexaploid oat (Avena sativa, 2n = 6x = 42) is a member of the Poaceae family with a very large genome (~13 Gb) containing 21 chromosome pairs: seven from each of two similar ancestral diploids (A and D) and seven from a more diverged ancestral diploid (C). Physical rearrangements among ancestral oat...

  16. Core-SINE blocks comprise a large fraction of monotreme genomes; implications for vertebrate chromosome evolution.

    PubMed

    Kirby, Patrick J; Greaves, Ian K; Koina, Edda; Waters, Paul D; Marshall Graves, Jennifer A

    2007-01-01

    The genomes of the egg-laying platypus and echidna are of particular interest because monotremes are the most basal mammal group. The chromosomal distribution of an ancient family of short interspersed repeats (SINEs), the core-SINEs, was investigated to better understand monotreme genome organization and evolution. Previous studies have identified the core-SINE as the predominant SINE in the platypus genome, and in this study we quantified, characterized and localized subfamilies. Dot blot analysis suggested that a very large fraction (32% of the platypus and 16% of the echidna genome) is composed of Mon core-SINEs. Core-SINE-specific primers were used to amplify PCR products from platypus and echidna genomic DNA. Sequence analysis suggests a common consensus sequence Mon 1-B, shared by platypus and echidna, as well as platypus-specific Mon 1-C and echidna-specific Mon 1-D consensus sequences. FISH mapping of the Mon core-SINE products to platypus metaphase spreads demonstrates that the Mon 1-C subfamily is responsible for the striking Mon core-SINE accumulation in the distal regions of the six large autosomal pairs and the largest X chromosome. This unusual distribution highlights the dichotomy between the seven large chromosome pairs and the 19 smaller pairs in the monotreme karyotype, which has some similarity to the macro- and micro-chromosomes of birds and reptiles, and suggests that accumulation of repetitive sequences may have enlarged small chromosomes in an ancestral vertebrate. In the forthcoming sequence of the platypus genome there are still large gaps, and the extensive Mon core-SINE accumulation on the distal regions of the six large autosomal pairs may provide one explanation for this missing sequence. PMID:18185983

  17. Genome comparison of Pseudomonas aeruginosa large phages.

    PubMed

    Hertveldt, Kirsten; Lavigne, Rob; Pleteneva, Elena; Sernova, Natalia; Kurochkina, Lidia; Korchevskii, Roman; Robben, Johan; Mesyanzhinov, Vadim; Krylov, Victor N; Volckaert, Guido

    2005-12-01

    Pseudomonas aeruginosa phage EL is a dsDNA phage related to the giant phiKZ-like Myoviridae. The EL genome sequence comprises 211,215 bp and has 201 predicted open reading frames (ORFs). The EL genome does not share DNA sequence homology with other viruses and micro-organisms sequenced to date. However, one-third of the predicted EL gene products (gps) shares similarity (Blast alignments of 17-55% amino acid identity) with phiKZ proteins. Comparative EL and phiKZ genomics reveals that these giant phages are an example of substantially diverged genetic mosaics. Based on the position of similar EL and phiKZ predicted gene products, five genome regions can be delineated in EL, four of which are relatively conserved between EL and phiKZ. Region IV, a 17.7 kb genome region with 28 predicted ORFs, is unique to EL. Fourteen EL ORFs have been assigned a putative function based on protein similarity. Assigned proteins are involved in DNA replication and nucleotide metabolism (NAD+-dependent DNA ligase, ribonuclease HI, helicase, thymidylate kinase), host lysis and particle structure. EL-gp146 is the first chaperonin GroEL sequence identified in a viral genome. Besides a putative transposase, EL harbours predicted mobile endonucleases related to H-N-H and LAGLIDADG homing endonucleases associated with group I intron and intein intervening sequences. PMID:16256135

  18. Precision Editing of Large Animal Genomes

    PubMed Central

    Tan, Wenfang (Spring); Carlson, Daniel F.; Walton, Mark W.; Fahrenkrug, Scott C.; Hackett, Perry B.

    2013-01-01

    Transgenic animals are an important source of protein and nutrition for most humans and will play key roles in satisfying the increasing demand for food in an ever-increasing world population. The past decade has experienced a revolution in the development of methods that permit the introduction of specific alterations to complex genomes. This precision will enhance genome-based improvement of farm animals for food production. Precision genetics also will enhance the development of therapeutic biomaterials and models of human disease as resources for the development of advanced patient therapies. PMID:23084873

  19. Ancestral Origins and Genetic History of Tibetan Highlanders.

    PubMed

    Lu, Dongsheng; Lou, Haiyi; Yuan, Kai; Wang, Xiaoji; Wang, Yuchen; Zhang, Chao; Lu, Yan; Yang, Xiong; Deng, Lian; Zhou, Ying; Feng, Qidi; Hu, Ya; Ding, Qiliang; Yang, Yajun; Li, Shilin; Jin, Li; Guan, Yaqun; Su, Bing; Kang, Longli; Xu, Shuhua

    2016-09-01

    The origin of Tibetans remains one of the most contentious puzzles in history, anthropology, and genetics. Analyses of deeply sequenced (30×-60×) genomes of 38 Tibetan highlanders and 39 Han Chinese lowlanders, together with available data on archaic and modern humans, allow us to comprehensively characterize the ancestral makeup of Tibetans and uncover their origins. Non-modern human sequences compose ∼6% of the Tibetan gene pool and form unique haplotypes in some genomic regions, where Denisovan-like, Neanderthal-like, ancient-Siberian-like, and unknown ancestries are entangled and elevated. The shared ancestry of Tibetan-enriched sequences dates back to ∼62,000-38,000 years ago, predating the Last Glacial Maximum (LGM) and representing early colonization of the plateau. Nonetheless, most of the Tibetan gene pool is of modern human origin and diverged from that of Han Chinese ∼15,000 to ∼9,000 years ago, which can be largely attributed to post-LGM arrivals. Analysis of ∼200 contemporary populations showed that Tibetans share ancestry with populations from East Asia (∼82%), Central Asia and Siberia (∼11%), South Asia (∼6%), and western Eurasia and Oceania (∼1%). Our results support that Tibetans arose from a mixture of multiple ancestral gene pools but that their origins are much more complicated and ancient than previously suspected. We provide compelling evidence of the co-existence of Paleolithic and Neolithic ancestries in the Tibetan gene pool, indicating a genetic continuity between pre-historical highland-foragers and present-day Tibetans. In particular, highly differentiated sequences harbored in highlanders' genomes were most likely inherited from pre-LGM settlers of multiple ancestral origins (SUNDer) and maintained in high frequency by natural selection. PMID:27569548

  20. On the analysis of large-scale genomic structures.

    PubMed

    Oiwa, Nestor Norio; Goldman, Carla

    2005-01-01

    We apply methods from statistical physics (histograms, correlation functions, fractal dimensions, and singularity spectra) to characterize the large-scale structure of the distribution of nucleotides along genomic sequences. We discuss the role of the extension of noncoding segments ("junk DNA") for the genomic organization, and the connection between the coding segment distribution and the high-eukaryotic chromatin condensation. The following sequences taken from GenBank were analyzed: complete genome of Xanthomonas campestris, complete genome of yeast, chromosome V of Caenorhabditis elegans, and human chromosome XVII around gene BRCA1. The results are compared with random and periodic sequences and those generated by simple and generalized fractal Cantor sets. PMID:15858230

  1. GDC 2: Compression of large collections of genomes.

    PubMed

    Deorowicz, Sebastian; Danek, Agnieszka; Niemiec, Marcin

    2015-01-01

    The falling price of high-throughput genome sequencing is changing the landscape of modern genomics. A number of large-scale projects aimed at sequencing many human genomes are in progress. Genome sequencing is also becoming an important aid in personalized medicine. One of the significant side effects of this change is the need to store and transfer huge amounts of genomic data. In this paper we deal with the problem of compressing large collections of complete genomic sequences. We propose an algorithm that is able to compress a collection of 1092 human diploid genomes about 9,500 times. This result is about 4 times better than what is offered by other existing compressors. Moreover, our algorithm is very fast, as it processes the data at a speed of 200 MB/s on a modern workstation. As a consequence, the proposed algorithm allows complete genomic collections to be stored at low cost, e.g., the examined collection of 1092 human genomes needs only about 700 MB when compressed, which can be compared to about 6.7 TB of uncompressed FASTA files. The source code is available at http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&project=gdc&subpage=about. PMID:26108279
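
    The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check, not part of GDC 2; it assumes decimal units and assumes that the 200 MB/s throughput refers to the uncompressed input, neither of which the abstract states.

```python
# Figures quoted in the abstract. Decimal units are an assumption
# (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
uncompressed_bytes = 6.7e12   # about 6.7 TB of uncompressed FASTA
compressed_bytes = 700e6      # about 700 MB after compression
throughput = 200e6            # 200 MB/s, assumed to apply to the raw input

# Ratio of raw size to compressed size.
ratio = uncompressed_bytes / compressed_bytes

# Time for a single pass over the raw input at the quoted speed.
hours = uncompressed_bytes / throughput / 3600

print(f"compression ratio: about {ratio:,.0f}x")
print(f"one pass over the input: about {hours:.1f} h")
```

    Under these assumptions the ratio lands close to the quoted "about 9,500 times", so the size figures in the abstract are mutually consistent.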

  2. GDC 2: Compression of large collections of genomes

    PubMed Central

    Deorowicz, Sebastian; Danek, Agnieszka; Niemiec, Marcin

    2015-01-01

    The falling price of high-throughput genome sequencing is changing the landscape of modern genomics. A number of large-scale projects aimed at sequencing many human genomes are in progress. Genome sequencing is also becoming an important aid in personalized medicine. One of the significant side effects of this change is the need to store and transfer huge amounts of genomic data. In this paper we deal with the problem of compressing large collections of complete genomic sequences. We propose an algorithm that is able to compress a collection of 1092 human diploid genomes about 9,500 times. This result is about 4 times better than what is offered by other existing compressors. Moreover, our algorithm is very fast, as it processes the data at a speed of 200 MB/s on a modern workstation. As a consequence, the proposed algorithm allows complete genomic collections to be stored at low cost, e.g., the examined collection of 1092 human genomes needs only about 700 MB when compressed, which can be compared to about 6.7 TB of uncompressed FASTA files. The source code is available at http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&project=gdc&subpage=about. PMID:26108279

  3. Identification of large-scale genomic variation in cancer genomes using in silico reference models.

    PubMed

    Killcoyne, Sarah; Del Sol, Antonio

    2016-01-01

    Identifying large-scale structural variation in cancer genomes continues to be a challenge to researchers. Current methods rely on genome alignments based on a reference that can be a poor fit to highly variant and complex tumor genomes. To address this challenge we developed a method that uses available breakpoint information to generate models of structural variations. We use these models as references to align previously unmapped and discordant reads from a genome. By using these models to align unmapped reads, we show that our method can help to identify large-scale variations that have been previously missed. PMID:26264669

  4. Identification of large-scale genomic variation in cancer genomes using in silico reference models

    PubMed Central

    Killcoyne, Sarah; del Sol, Antonio

    2016-01-01

    Identifying large-scale structural variation in cancer genomes continues to be a challenge to researchers. Current methods rely on genome alignments based on a reference that can be a poor fit to highly variant and complex tumor genomes. To address this challenge we developed a method that uses available breakpoint information to generate models of structural variations. We use these models as references to align previously unmapped and discordant reads from a genome. By using these models to align unmapped reads, we show that our method can help to identify large-scale variations that have been previously missed. PMID:26264669

  5. Ancestral gene synteny reconstruction improves extant species scaffolding

    PubMed Central

    2015-01-01

    We exploit the methodological similarity between ancestral genome reconstruction and extant genome scaffolding. We present a method, called ARt-DeCo that constructs neighborhood relationships between genes or contigs, in both ancestral and extant genomes, in a phylogenetic context. It is able to handle dozens of complete genomes, including genes with complex histories, by using gene phylogenies reconciled with a species tree, that is, annotated with speciation, duplication and loss events. Reconstructed ancestral or extant synteny comes with a support computed from an exhaustive exploration of the solution space. We compare our method with a previously published one that follows the same goal on a small number of genomes with universal unicopy genes. Then we test it on the whole Ensembl database, by proposing partial ancestral genome structures, as well as a more complete scaffolding for many partially assembled genomes on 69 eukaryote species. We carefully analyze a couple of extant adjacencies proposed by our method, and show that they are indeed real links in the extant genomes, that were missing in the current assembly. On a reduced data set of 39 eutherian mammals, we estimate the precision and sensitivity of ARt-DeCo by simulating a fragmentation in some well assembled genomes, and measure how many adjacencies are recovered. We find a very high precision, while the sensitivity depends on the quality of the data and on the proximity of closely related genomes. PMID:26450761

  6. BACFinder: genomic localisation of large insert genomic clones based on restriction fingerprinting

    PubMed Central

    Crowe, Mark L.; Rana, Debashis; Fraser, Fiona; Bancroft, Ian; Trick, Martin

    2002-01-01

    We have developed software that allows the prediction of the genomic location of a bacterial artificial chromosome (BAC) clone, or other large genomic clone, based on a simple restriction digest of the BAC. The mapping is performed by comparing the experimentally derived restriction digest of the BAC DNA with a virtual restriction digest of the whole genome sequence. Our trials indicate that this program identified the genomic regions represented by BAC clones with a degree of accuracy comparable to that of end-sequencing, but at considerably less cost. Although the program has been developed principally for use with Arabidopsis BACs, it should align large insert genomic clones to any fully sequenced genome. PMID:12409477
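
    The core idea described here, comparing an experimental digest with a virtual digest of the genome sequence, can be illustrated with a small sketch. This is not BACFinder's code or scoring scheme: the function names, the cut-offset convention, and the 5% size tolerance are all hypothetical choices made for illustration.

```python
def virtual_digest(sequence, site, cut_offset=0):
    """Return fragment lengths from cutting `sequence` at every `site`.

    `cut_offset` is the cut position within the recognition site
    (hypothetical convention: 1 for G^AATTC).
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    # Lengths between consecutive cut positions, skipping empty fragments.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def count_matches(observed, predicted, tolerance=0.05):
    """Count observed fragments that pair with a predicted fragment of
    similar size; each predicted fragment is used at most once."""
    remaining = sorted(predicted)
    matched = 0
    for size in sorted(observed):
        for i, candidate in enumerate(remaining):
            if abs(candidate - size) <= tolerance * size:
                matched += 1
                del remaining[i]
                break
    return matched


# Toy region with two EcoRI sites (GAATTC).
region = "AAGAATTCGGGAATTCTT"
fragments = virtual_digest(region, "GAATTC", cut_offset=1)
score = count_matches([3, 8], fragments)
print(fragments, score)
```

    In a real setting the virtual digest would be computed for windows along the whole genome and the window with the best score reported; gel sizing error would also make the tolerance a tuning parameter.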

  7. Exon capture optimization in amphibians with large genomes.

    PubMed

    McCartney-Melstad, Evan; Mount, Genevieve G; Shaffer, H Bradley

    2016-09-01

    Gathering genomic-scale data efficiently is challenging for nonmodel species with large, complex genomes. Transcriptome sequencing is accessible for organisms with large genomes, and sequence capture probes can be designed from such mRNA sequences to enrich and sequence exonic regions. Maximizing enrichment efficiency is important to reduce sequencing costs, but relatively few data exist for exon capture experiments in nonmodel organisms with large genomes. Here, we conducted a replicated factorial experiment to explore the effects of several modifications to standard protocols that might increase sequence capture efficiency for amphibians and other taxa with large, complex genomes. Increasing the amounts of C0t-1 repetitive sequence blocker and individual input DNA used in target enrichment reactions reduced the rates of PCR duplication. This reduction led to an increase in the percentage of unique reads mapping to target sequences, essentially doubling overall efficiency of the target capture from 10.4% to nearly 19.9% and rendering target capture experiments more efficient and affordable. Our results indicate that target capture protocols can be modified to efficiently screen vertebrates with large genomes, including amphibians. PMID:27223337

  8. Large-scale structure of genomic methylation patterns.

    PubMed

    Rollins, Robert A; Haghighi, Fatemeh; Edwards, John R; Das, Rajdeep; Zhang, Michael Q; Ju, Jingyue; Bestor, Timothy H

    2006-02-01

    The mammalian genome depends on patterns of methylated cytosines for normal function, but the relationship between genomic methylation patterns and the underlying sequence is unclear. We have characterized the methylation landscape of the human genome by global analysis of patterns of CpG depletion and by direct sequencing of 3073 unmethylated domains and 2565 methylated domains from human brain DNA. The genome was found to consist of short (<4 kb) unmethylated domains embedded in a matrix of long methylated domains. Unmethylated domains were enriched in promoters, CpG islands, and first exons, while methylated domains comprised interspersed and tandem-repeated sequences, exons other than first exons, and non-annotated single-copy sequences that are depleted in the CpG dinucleotide. The enrichment of regulatory sequences in the relatively small unmethylated compartment suggests that cytosine methylation constrains the effective size of the genome through the selective exposure of regulatory sequences. This buffers regulatory networks against changes in total genome size and provides an explanation for the C value paradox, which concerns the wide variations in genome size that scale independently of gene number. This suggestion is compatible with the finding that cytosine methylation is universal among large-genome eukaryotes, while many eukaryotes with genome sizes <5 × 10^8 bp do not methylate their DNA. PMID:16365381
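
    CpG depletion is commonly quantified as the ratio of observed to expected CpG dinucleotides in a sequence. The sketch below uses that widely used ratio as an illustration; the abstract does not say which measure the authors applied, and the function name is hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G).

    Values well below 1 indicate CpG depletion; values near or above 1
    are typical of CpG-rich regions.
    """
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    n_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if n_c == 0 or n_g == 0:
        return 0.0
    return n_cpg * len(seq) / (n_c * n_g)


print(cpg_obs_exp("CGCGCGCG"))  # CpG-rich toy sequence
print(cpg_obs_exp("CATGCATG"))  # contains C and G but no CpG
```

    Applied in sliding windows along a genome, a ratio like this separates CpG-depleted stretches from CpG-rich ones, which is the kind of contrast the abstract draws between methylated and unmethylated domains.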

  9. A method to capture large DNA fragments from genomic DNA.

    PubMed

    Ball, Geneviève; Filloux, Alain; Voulhoux, Romé

    2014-01-01

    The gene capture technique is a powerful tool that allows the cloning of large DNA regions (up to 80 kb), such as entire genomic islands, without using restriction enzymes or DNA amplification. This technique takes advantage of the high recombination capacity of yeast. A "capture" vector containing both ends of the target DNA region must first be constructed. The target region is then captured by co-transformation and recombination in yeast between the "capture" vector and appropriate genomic DNA. The selected recombinant plasmid can be verified by sequencing and transferred into bacteria for multiple applications. This chapter describes a protocol specifically adapted for Pseudomonas aeruginosa genomic DNA capture. PMID:24818928

  10. Stability analysis of chickpea large genomic DNA inserts in Agrobacterium.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Agrobacterium tumefaciens-mediated transformation of large DNA inserts directly into plants facilitates the transfer of gene clusters and flanking regulatory elements. It is recommended that the integrity of large genomic fragments in Agrobacterium be verified prior to plant transformation. In this ...

  11. Genome size variation affects song attractiveness in grasshoppers: evidence for sexual selection against large genomes.

    PubMed

    Schielzeth, Holger; Streitner, Corinna; Lampe, Ulrike; Franzke, Alexandra; Reinhold, Klaus

    2014-12-01

    Genome size is largely uncorrelated to organismal complexity and adaptive scenarios. Genetic drift as well as intragenomic conflict have been put forward to explain this observation. We here study the impact of genome size on sexual attractiveness in the bow-winged grasshopper Chorthippus biguttulus. Grasshoppers show particularly large variation in genome size due to the high prevalence of supernumerary chromosomes that are considered (mildly) selfish, as evidenced by non-Mendelian inheritance and fitness costs if present in high numbers. We ranked male grasshoppers by song characteristics that are known to affect female preferences in this species and scored genome sizes of attractive and unattractive individuals from the extremes of this distribution. We find that attractive singers have significantly smaller genomes, demonstrating that genome size is reflected in male courtship songs and that females prefer songs of males with small genomes. Such a genome size dependent mate preference effectively selects against selfish genetic elements that tend to increase genome size. The data therefore provide a novel example of how sexual selection can reinforce natural selection and can act as an agent in an intragenomic arms race. Furthermore, our findings indicate an underappreciated route of how choosy females could gain indirect benefits. PMID:25200798

  12. Unraveling recombination rate evolution using ancestral recombination maps

    PubMed Central

    Munch, Kasper; Schierup, Mikkel H; Mailund, Thomas

    2014-01-01

    Recombination maps of ancestral species can be constructed from comparative analyses of genomes from closely related species, exemplified by a recently published map of the human-chimpanzee ancestor. Such maps resolve differences in recombination rate between species into changes along individual branches in the speciation tree, and allow identification of associated changes in the genomic sequences. We describe how coalescent hidden Markov models are able to call individual recombination events in ancestral species through inference of incomplete lineage sorting along a genomic alignment. In the great apes, speciation events are sufficiently close in time that a map can be inferred for the ancestral species at each internal branch - allowing evolution of recombination rate to be tracked over evolutionary time scales from speciation event to speciation event. We see this approach as a way of characterizing the evolution of recombination rate and the genomic properties that influence it. PMID:25043668

  13. Territorial Polymers and Large Scale Genome Organization

    NASA Astrophysics Data System (ADS)

    Grosberg, Alexander

    2012-02-01

    Chromatin fiber in the interphase nucleus is effectively a very long polymer packed in a restricted volume. Although polymer models of chromatin organization have been considered, most of them disregard the fact that DNA has to stay not too entangled in order to function properly. One polymer model with no entanglements is the melt of unknotted unconcatenated rings. Extensive simulations indicate that rings in the melt at large length (monomer number) N approach the compact state, with gyration radius scaling as N^(1/3), suggesting that every ring is compact and segregated from the surrounding rings. The segregation is consistent with the known phenomenon of chromosome territories. The surface exponent β (describing the number of contacts between neighboring rings, scaling as N^β) appears only slightly below unity, β ≈ 0.95. This suggests that the loop factor (the probability that two monomers a linear distance s apart meet) should decay as s^-γ, where γ = 2 - β is slightly above one. The latter result is consistent with HiC data on real human interphase chromosomes, and does not contradict the older FISH data. The dynamics of rings in the melt indicates that the motion of one ring remains subdiffusive on time scales well above the stress relaxation time.
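
    Substituting the quoted surface exponent into the stated relation gives the loop-factor exponent explicitly. This is only a worked restatement of the abstract's numbers; P(s) is a label introduced here for the loop factor.

```latex
\gamma = 2 - \beta \approx 2 - 0.95 = 1.05,
\qquad
P(s) \propto s^{-\gamma} \approx s^{-1.05}
```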

  14. Roary: rapid large-scale prokaryote pan genome analysis

    PubMed Central

    Page, Andrew J.; Cummins, Carla A.; Hunt, Martin; Wong, Vanessa K.; Reuter, Sandra; Holden, Matthew T.G.; Fookes, Maria; Falush, Daniel; Keane, Jacqueline A.; Parkhill, Julian

    2015-01-01

    Summary: A typical prokaryote population sequencing study can now consist of hundreds or thousands of isolates. Interrogating these datasets can provide detailed insights into the genetic structure of prokaryotic genomes. We introduce Roary, a tool that rapidly builds large-scale pan genomes, identifying the core and accessory genes. Roary makes construction of the pan genome of thousands of prokaryote samples possible on a standard desktop without compromising on the accuracy of results. Using a single CPU Roary can produce a pan genome consisting of 1000 isolates in 4.5 hours using 13 GB of RAM, with further speedups possible using multiple processors. Availability and implementation: Roary is implemented in Perl and is freely available under an open source GPLv3 license from http://sanger-pathogens.github.io/Roary Contact: roary@sanger.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26198102
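
    The distinction between core and accessory genes can be illustrated with a toy presence/absence calculation. This is a conceptual sketch, not Roary's implementation: the function name, the input layout, and the `core_fraction` threshold are assumptions made for illustration, and the sketch skips the gene clustering step that a real pan-genome tool has to perform first.

```python
def split_pan_genome(isolate_genes, core_fraction=1.0):
    """Split gene families into core and accessory sets.

    `isolate_genes` maps an isolate name to the gene families it carries.
    A family is "core" if it occurs in at least `core_fraction` of isolates.
    """
    counts = {}
    for genes in isolate_genes.values():
        for gene in set(genes):  # count each family once per isolate
            counts[gene] = counts.get(gene, 0) + 1
    n_isolates = len(isolate_genes)
    core = {g for g, k in counts.items() if k >= core_fraction * n_isolates}
    accessory = set(counts) - core
    return core, accessory


# Hypothetical toy data: three isolates, four gene families.
isolates = {
    "isolate_a": ["g1", "g2", "g3"],
    "isolate_b": ["g1", "g2"],
    "isolate_c": ["g1", "g4"],
}
core, accessory = split_pan_genome(isolates)
print(sorted(core), sorted(accessory))
```

    The pan genome is the union of the two sets; lowering `core_fraction` relaxes the definition of core to tolerate families missing from a few isolates.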

  15. Large-scale data mining pilot project in human genome

    SciTech Connect

    Musick, R.; Fidelis, R.; Slezak, T.

    1997-05-01

    This whitepaper briefly describes a new, aggressive effort in large-scale data Livermore National Labs. The implications of 'large-scale' will be clarified Section. In the short term, this effort will focus on several mission-critical questions of Genome project. We will adapt current data mining techniques to the Genome domain, to quantify the accuracy of inference results, and lay the groundwork for a more extensive effort in large-scale data mining. A major aspect of the approach is that we will be fully-staffed data warehousing effort in the human Genome area. The long term goal is strong applications-oriented research program in large-scale data mining. The tools, skill set gained will be directly applicable to a wide spectrum of tasks involving a for large spatial and multidimensional data. This includes applications in ensuring non-proliferation, stockpile stewardship, enabling Global Ecology (Materials Database Industrial Ecology), advancing the Biosciences (Human Genome Project), and supporting data for others (Battlefield Management, Health Care).

  16. Ancestral Relationships Using Metafounders: Finite Ancestral Populations and Across Population Relationships.

    PubMed

    Legarra, Andres; Christensen, Ole F; Vitezica, Zulma G; Aguilar, Ignacio; Misztal, Ignacy

    2015-06-01

    Recent use of genomic (marker-based) relationships shows that relationships exist within and across base populations (breeds or lines). However, current treatment of pedigree relationships is unable to consider relationships within or across base populations, although such relationships must exist due to finite size of the ancestral population and connections between populations. This complicates the conciliation of both approaches and, in particular, combining pedigree with genomic relationships. We present a coherent theoretical framework to consider base populations in pedigree relationships. We suggest a conceptual framework that considers each ancestral population as a finite-sized pool of gametes. This generates across-individual relationships and contrasts with the classical view in which each population is considered as an infinite, unrelated pool. Several ancestral populations may be connected and therefore related. Each ancestral population can be represented as a "metafounder," a pseudo-individual included as founder of the pedigree and similar to an "unknown parent group." Metafounders have self- and across relationships according to a set of parameters, which measure ancestral relationships, i.e., homozygosities within populations and relationships across populations. These parameters can be estimated from existing pedigree and marker genotypes using maximum likelihood or a method based on summary statistics, for arbitrarily complex pedigrees. Equivalences of genetic variance and variance components between the classical and this new parameterization are shown. Segregation variance on crosses of populations is modeled. Efficient algorithms for computation of relationship matrices, their inverses, and inbreeding coefficients are presented. Use of metafounders leads to compatibility of genomic and pedigree relationship matrices and to simple computing algorithms. Examples and code are given. PMID:25873631

  17. Optimization of AFLP for extremely large genomes over 70 Gb.

    PubMed

    Veselá, Petra; Volařík, Daniel; Mráček, Jaroslav

    2016-07-01

    Here, we present an improved amplified fragment length polymorphism (AFLP) protocol using restriction enzymes (AscI and SbfI) that recognize 8-base pair sequences to provide alternative optimization suitable for species with a genome size over 70 Gb. This cost-effective optimization massively reduces the number of amplified fragments using only +3 selective bases per primer during selective amplification. We demonstrate the effects of the number of fragments and genome size on the appearance of nonidentical comigrating fragments (size homoplasy), which has a negative impact on the informative value of AFLP genotypes. We also present various reaction conditions and their effects on reproducibility and the band intensity of the extremely large genome of Viscum album. The reproducibility of this octo-cutter protocol was calculated using several species with genome sizes ranging from 1 Gb (Carex panicea) to 76 Gb (V. album). The improved protocol also succeeded in detecting high intraspecific variability in species with large genomes (V. album, Galanthus nivalis and Pinus pumila). PMID:26849414

  18. Kernel methods for large-scale genomic data analysis

    PubMed Central

    Xing, Eric P.; Schaid, Daniel J.

    2015-01-01

    Machine learning, particularly kernel methods, has been demonstrated as a promising new tool to tackle the challenges imposed by today’s explosive data growth in genomics. They provide a practical and principled approach to learning how a large number of genetic variants are associated with complex phenotypes, to help reveal the complexity in the relationship between the genetic markers and the outcome of interest. In this review, we highlight the potential key role it will have in modern genomic data processing, especially with regard to integration with classical methods for gene prioritizing, prediction and data fusion. PMID:25053743

  19. Genome resequencing in Populus: Revealing large-scale genome variation and implications on specialized-trait genomics

    SciTech Connect

    Muchero, Wellington; Labbe, Jessy L; Priya, Ranjan; DiFazio, Steven P; Tuskan, Gerald A

    2014-01-01

    To date, Populus ranks among a few plant species with a complete genome sequence and other highly developed genomic resources. With the first genome sequence among all tree species, Populus has been adopted as a suitable model organism for genomic studies in trees. However, far from being just a model species, Populus is a key renewable economic resource that plays a significant role in providing raw materials for the biofuel and pulp and paper industries. Therefore, aside from leading frontiers of basic tree molecular biology and ecological research, Populus leads frontiers in addressing global economic challenges related to fuel and fiber production. The latter fact suggests that research aimed at improving quality and quantity of Populus as a raw material will likely drive the pursuit of more targeted and deeper research in order to unlock the economic potential tied in molecular biology processes that drive this tree species. Advances in genome sequence-driven technologies, such as resequencing individual genotypes, which in turn facilitates large-scale SNP discovery and identification of large-scale polymorphisms, are key determinants of future success in these initiatives. In this treatise we discuss implications of genome sequence-enabled technologies on Populus genomic and genetic studies of complex and specialized traits.

  20. Primate chromosome evolution: ancestral karyotypes, marker order and neocentromeres.

    PubMed

    Stanyon, R; Rocchi, M; Capozzi, O; Roberto, R; Misceo, D; Ventura, M; Cardone, M F; Bigoni, F; Archidiacono, N

    2008-01-01

    In 1992 the Japanese macaque was the first species for which the homology of the entire karyotype was established by cross-species chromosome painting. Today, there are chromosome painting data on more than 50 species of primates. Although chromosome painting is a rapid and economical method for tracking translocations, it has limited utility for revealing intrachromosomal rearrangements. Fortunately, the use of BAC-FISH in the last few years has allowed remarkable progress in determining marker order along primate chromosomes and there are now marker order data on an array of primate species for a good number of chromosomes. These data reveal inversions, but also show that centromeres of many orthologous chromosomes are embedded in different genomic contexts. Even if the mechanisms of neocentromere formation and progression are just beginning to be understood, it is clear that these phenomena had a significant impact on shaping the primate genome and are fundamental to our understanding of genome evolution. In this report we complete and integrate the dataset of BAC-FISH marker order for human syntenies 1, 2, 4, 5, 8, 12, 17, 18, 19, 21, 22 and the X. These results allowed us to develop hypotheses about the content, marker order and centromere position in ancestral karyotypes at five major branching points on the primate evolutionary tree: ancestral primate, ancestral anthropoid, ancestral platyrrhine, ancestral catarrhine and ancestral hominoid. Current models suggest that between-species structural rearrangements are often intimately related to speciation. Comparative primate cytogenetics has become an important tool for elucidating the phylogeny and the taxonomy of primates. It has become increasingly apparent that molecular cytogenetic data in the future can be fruitfully combined with whole-genome assemblies to advance our understanding of primate genome evolution as well as the mechanisms and processes that have led to the origin of the human genome. PMID

  1. Whole genome analysis of Vietnamese G2P[4] rotavirus strains possessing the NSP2 gene sharing an ancestral sequence with Chinese sheep and goat rotavirus strains.

    PubMed

    Do, Loan Phuong; Doan, Yen Hai; Nakagomi, Toyoko; Gauchan, Punita; Kaneko, Miho; Agbemabiese, Chantal; Dang, Anh Duc; Nakagomi, Osamu

    2015-10-01

    Because imminent introduction into Vietnam of a vaccine against Rotavirus A is anticipated, baseline information on the whole genome of representative strains is needed to understand changes in circulating strains that may occur after vaccine introduction. In this study, the whole genomes of two G2P[4] strains detected in Nha Trang, Vietnam in 2008 were sequenced, this being the last period during which virtually no rotavirus vaccine was used in this country. The two strains were found to be >99.9% identical in sequence and had a typical DS-1 like G2-P[4]-I2-R2-C2-M2-A2-N2-T2-E2-H2 genotype constellation. Analysis of the Vietnamese strains with >184 G2P[4] strains retrieved from GenBank/EMBL/DDBJ DNA databases placed the Vietnamese strains in one of the lineages commonly found among contemporary strains, with the exception of the NSP2 and NSP4 genes. The NSP2 genes were found to belong to a previously undescribed lineage that diverged from Chinese sheep and goat rotavirus strains, including a Chinese rotavirus vaccine strain LLR with 95% nucleotide identity; the time of their most recent common ancestor was 1975. The NSP4 genes were found to belong, together with Thai and USA strains, to an emergent lineage (VIII), adding further diversity to ever diversifying NSP4 lineages. Thus, there is a need to enhance surveillance of locally-circulating strains from both children and animals at the whole genome level to address the effect of rotavirus vaccines on changing strain distribution. PMID:26382233

  2. Whole-Genome Comparison of Two Campylobacter jejuni Isolates of the Same Sequence Type Reveals Multiple Loci of Different Ancestral Lineage

    PubMed Central

    Biggs, Patrick J.; Fearnhead, Paul; Hotter, Grant; Mohan, Vathsala; Collins-Emerson, Julie; Kwan, Errol; Besser, Thomas E.; Cookson, Adrian; Carter, Philip E.; French, Nigel P.

    2011-01-01

    Campylobacter jejuni ST-474 is the most important human enteric pathogen in New Zealand, and yet this genotype is rarely found elsewhere in the world. Insight into the evolution of this organism was gained by a whole genome comparison of two ST-474, flaA SVR-14 isolates and other available C. jejuni isolates and genomes. The two isolates were collected from different sources, human (H22082) and retail poultry (P110b), at the same time and from the same geographical location. Solexa sequencing of each isolate resulted in 1.659 Mb (H22082) and 1.656 Mb (P110b) of assembled sequences within 28 (H22082) and 29 (P110b) contigs. We analysed 1502 genes for which we had sequences within both ST-474 isolates and within at least one of 11 C. jejuni reference genomes. Although 94.5% of genes were identical between the two ST-474 isolates, we identified 83 genes that differed by at least one nucleotide, including 55 genes with non-synonymous substitutions. These covered 101 kb and contained 672 point differences. We inferred that 22 (3.3%) of these differences were due to mutation and 650 (96.7%) were imported via recombination. Our analysis estimated 38 recombinant breakpoints within these 83 genes, which correspond to recombination events affecting at least 19 loci regions and gives a tract length estimate of 2 kb. This includes a 12 kb region displaying non-homologous recombination in one of the ST-474 genomes, with the insertion of two genes, including ykgC, a putative oxidoreductase, and a conserved hypothetical protein of unknown function. Furthermore, our analysis indicates that the source of this recombined DNA is more likely to have come from C. jejuni strains that are more closely related to ST-474. This suggests that the rates of recombination and mutation are similar in order of magnitude, but that recombination has been much more important for generating divergence between the two ST-474 isolates. PMID:22096527

  3. Indexes of Large Genome Collections on a PC

    PubMed Central

    Danek, Agnieszka; Deorowicz, Sebastian; Grabowski, Szymon

    2014-01-01

    The availability of thousands of individual genomes of one species should boost rapid progress in personalized medicine or understanding of the interaction between genotype and phenotype, to name a few applications. A key operation useful in such analyses is aligning sequencing reads against a collection of genomes, which is costly with the use of existing algorithms due to their large memory requirements. We present MuGI, Multiple Genome Index, which reports all occurrences of a given pattern, under exact and approximate matching models, against a collection of thousands of genomes. Its unique feature is the small index size, which is customisable. It fits in a standard computer with 16–32 GB, or even 8 GB, of RAM, for the 1000GP collection of 1092 diploid human genomes. The solution is also fast. For example, exact-matching queries (of average length 150 bp) are handled in an average time of 39 µs, and queries with up to 3 mismatches in 373 µs, on the test PC with an index size of 13.4 GB. For a smaller index, occupying 7.4 GB in memory, the respective times grow to 76 µs and 917 µs. Software is available at http://sun.aei.polsl.pl/mugi under a free license. Data S1 is available at PLOS One online. PMID:25289699

  4. BMPER Mutation in Diaphanospondylodysostosis Identified by Ancestral Autozygosity Mapping and Targeted High-Throughput Sequencing

    PubMed Central

    Funari, Vincent A.; Krakow, Deborah; Nevarez, Lisette; Chen, Zugen; Funari, Tara L.; Vatanavicharn, Nithiwat; Wilcox, William R.; Rimoin, David L.; Nelson, Stanley F.; Cohn, Daniel H.

    2010-01-01

    Diaphanospondylodysostosis (DSD) is a rare, recessively inherited, perinatal lethal skeletal disorder. The low frequency and perinatal lethality of DSD makes assembling a large set of families for traditional linkage-based genetic approaches challenging. By searching for evidence of unknown ancestral consanguinity, we identified two autozygous intervals, comprising 34 Mbps, unique to a single case of DSD. Empirically testing for ancestral consanguinity was effective in localizing the causative variant, thereby reducing the genomic space within which the mutation resides. High-throughput sequence analysis of exons captured from these intervals demonstrated that the affected individual was homozygous for a null mutation in BMPER, which encodes the bone morphogenetic protein-binding endothelial cell precursor-derived regulator. Mutations in BMPER were subsequently found in three additional DSD cases, confirming that defects in BMPER produce DSD. Phenotypic similarities between DSD and Bmper null mice indicate that BMPER-mediated signaling plays an essential role in vertebral segmentation early in human development. PMID:20869035

  5. What was the ancestral sex-determining mechanism in amniote vertebrates?

    PubMed

    Johnson Pokorná, Martina; Kratochvíl, Lukáš

    2016-02-01

    Amniote vertebrates, the group consisting of mammals and reptiles including birds, possess various mechanisms of sex determination. Under environmental sex determination (ESD), the sex of individuals depends on the environmental conditions occurring during their development and therefore there are no sexual differences present in their genotypes. Alternatively, through the mode of genotypic sex determination (GSD), sex is determined by a sex-specific genotype, i.e. by the combination of sex chromosomes at various stages of differentiation at conception. As well as influencing sex determination, sex-specific parts of genomes may, and often do, develop specific reproductive or ecological roles in their bearers. Accordingly, an individual with a mismatch between phenotypic (gonadal) and genotypic sex, for example an individual sex-reversed by environmental effects, should have a lower fitness due to the lack of specialized, sex-specific parts of their genome. In this case, evolutionary transitions from GSD to ESD should be less likely than transitions in the opposite direction. This prediction contrasts with the view that GSD was the ancestral sex-determining mechanism for amniote vertebrates. Ancestral GSD would require several transitions from GSD to ESD associated with an independent dedifferentiation of sex chromosomes, at least in the ancestors of crocodiles, turtles, and lepidosaurs (tuataras and squamate reptiles). In this review, we argue that the alternative theory postulating ESD as ancestral in amniotes is more parsimonious and is largely concordant with the theoretical expectations and current knowledge of the phylogenetic distribution and homology of sex-determining mechanisms. PMID:25424152

  6. The Mitochondrial Genome of the Leaf-Cutter Ant Atta laevigata: A Mitogenome with a Large Number of Intergenic Spacers

    PubMed Central

    Rodovalho, Cynara de Melo; Lyra, Mariana Lúcio; Ferro, Milene; Bacci, Maurício

    2014-01-01

    In this paper we describe the nearly complete mitochondrial genome of the leaf-cutter ant Atta laevigata, assembled using transcriptomic libraries from Sanger and Illumina next generation sequencing (NGS), and PCR products. This mitogenome was found to be very large (18,729 bp), given the presence of 30 non-coding intergenic spacers (IGS) spanning 3,808 bp. A portion of the putative control region remained unsequenced. The gene content and organization correspond to that inferred for the ancestral pancrustacea, except for two tRNA gene rearrangements that have been described previously in other ants. The IGS were highly variable in length and dispersed through the mitogenome. This pattern was also found for the other hymenopterans in particular for the monophyletic Apocrita. These spacers with unknown function may be valuable for characterizing genome evolution and distinguishing closely related species and individuals. NGS provided better coverage than Sanger sequencing, especially for tRNA and ribosomal subunit genes, thus facilitating efforts to fill in sequence gaps. The results obtained showed that data from transcriptomic libraries contain valuable information for assembling mitogenomes. The present data also provide a source of molecular markers that will be very important for improving our understanding of genomic evolutionary processes and phylogenetic relationships among hymenopterans. PMID:24828084

  7. Recombination-mediated genetic engineering of large genomic DNA transgenes.

    PubMed

    Ejsmont, Radoslaw Kamil; Ahlfeld, Peter; Pozniakovsky, Andrei; Stewart, A Francis; Tomancak, Pavel; Sarov, Mihail

    2011-01-01

    Faithful gene activity reporters are a useful tool for evo-devo studies enabling selective introduction of specific loci between species and assaying the activity of large gene regulatory sequences. The use of large genomic constructs such as BACs and fosmids provides an efficient platform for exploration of gene function under endogenous regulatory control. Despite their large size they can be easily engineered using in vivo homologous recombination in Escherichia coli (recombineering). We have previously demonstrated that the efficiency and fidelity of recombineering are sufficient to allow high-throughput transgene engineering in liquid culture, and have successfully applied this approach in several model systems. Here, we present a detailed protocol for recombineering of BAC/fosmid transgenes for expression of fluorescent or affinity tagged proteins in Drosophila under endogenous in vivo regulatory control. The tag coding sequence is seamlessly recombineered into the genomic region contained in the BAC/fosmid clone, which is then integrated into the fly genome using ϕC31 recombination. This protocol can be easily adapted to other recombineering projects. PMID:22065454

  8. The vertebrate ancestral repertoire of visual opsins, transducin alpha subunits and oxytocin/vasopressin receptors was established by duplication of their shared genomic region in the two rounds of early vertebrate genome duplications

    PubMed Central

    2013-01-01

    Background Vertebrate color vision is dependent on four major color opsin subtypes: RH2 (green opsin), SWS1 (ultraviolet opsin), SWS2 (blue opsin), and LWS (red opsin). Together with the dim-light receptor rhodopsin (RH1), these form the family of vertebrate visual opsins. Vertebrate genomes contain many multi-membered gene families that can largely be explained by the two rounds of whole genome duplication (WGD) in the vertebrate ancestor (2R) followed by a third round in the teleost ancestor (3R). Related chromosome regions resulting from WGD or block duplications are said to form a paralogon. We describe here a paralogon containing the genes for visual opsins, the G-protein alpha subunit families for transducin (GNAT) and adenylyl cyclase inhibition (GNAI), the oxytocin and vasopressin receptors (OT/VP-R), and the L-type voltage-gated calcium channels (CACNA1-L). Results Sequence-based phylogenies and analyses of conserved synteny show that the above-mentioned gene families, and many neighboring gene families, expanded in the early vertebrate WGDs. This allows us to deduce the following evolutionary scenario: The vertebrate ancestor had a chromosome containing the genes for two visual opsins, one GNAT, one GNAI, two OT/VP-Rs and one CACNA1-L gene. This chromosome was quadrupled in 2R. Subsequent gene losses resulted in a set of five visual opsin genes, three GNAT and GNAI genes, six OT/VP-R genes and four CACNA1-L genes. These regions were duplicated again in 3R resulting in additional teleost genes for some of the families. Major chromosomal rearrangements have taken place in the teleost genomes. By comparison with the corresponding chromosomal regions in the spotted gar, which diverged prior to 3R, we could time these rearrangements to post-3R. Conclusions We present an extensive analysis of the paralogon housing the visual opsin, GNAT and GNAI, OT/VP-R, and CACNA1-L gene families. 
    The combined data imply that the early vertebrate WGD events contributed to the...

  9. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions fr...

  10. The Ancestral Gene for Transcribed, Low-Copy Repeats in the Prader-Willi/Angelman Region Encodes a Large Protein Implicated in Protein Trafficking that is Deficient in Mice with Neuromuscular and

    SciTech Connect

    Ji, Y.

    1999-01-01

    Transcribed, low-copy repeat elements are associated with the breakpoint regions of common deletions in Prader-Willi and Angelman syndromes. We report here the identification of the ancestral gene (HERC2) and a family of duplicated, truncated copies that comprise these low-copy repeats. This gene encodes a highly conserved giant protein, HERC2, that is distantly related to p532 (HERC1), a guanine nucleotide exchange factor (GEF) implicated in vesicular trafficking. The mouse genome contains a single Herc2 locus, located in the jdf2 (juvenile development and fertility-2) interval of chromosome 7C. We have identified single nucleotide splice junction mutations in Herc2 in three independent N-ethyl-N-nitrosourea-induced jdf2 mutant alleles, each leading to exon skipping with premature termination of translation and/or deletion of conserved amino acids. Therefore, mutations in Herc2 lead to the neuromuscular secretory vesicle and sperm acrosome defects, other developmental abnormalities and juvenile lethality of jdf2 mice. Combined, these findings suggest that HERC2 is an important gene encoding a GEF involved in protein trafficking and degradation pathways in the cell.

  11. ProCARs: Progressive Reconstruction of Ancestral Gene Orders

    PubMed Central

    2015-01-01

    Background In the context of ancestral gene order reconstruction from extant genomes, there exist two main computational approaches: rearrangement-based, and homology-based methods. The rearrangement-based methods consist in minimizing a total rearrangement distance on the branches of a species tree. The homology-based methods consist in the detection of a set of potential ancestral contiguity features, followed by the assembling of these features into Contiguous Ancestral Regions (CARs). Results In this paper, we present a new homology-based method that uses a progressive approach for both the detection and the assembling of ancestral contiguity features into CARs. The method is based on detecting a set of potential ancestral adjacencies iteratively using the current set of CARs at each step, and constructing CARs progressively using a 2-phase assembling method. Conclusion We show the usefulness of the method through a reconstruction of the boreoeutherian ancestral gene order, and a comparison with three other homology-based methods: AnGeS, InferCARs and GapAdj. The program, written in Python, and the dataset used in this paper are available at http://bioinfo.lifl.fr/procars/. PMID:26040958

  12. Large-Scale Sequencing: The Future of Genomic Sciences Colloquium

    SciTech Connect

    Margaret Riley; Merry Buckley

    2009-01-01

    Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes - including bacteria, archaea, viruses, and microbial eukaryotes - provides researchers insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing. Until recently, sequencing entire microbial genomes has been laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencing is within reach for even the smallest facilities, and the ability to sequence the genomes of a significant fraction of microbial life may be possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact. The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed. The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. 
    A coordinated sequencing effort of cultured organisms is an appropriate place to begin...

  13. Large-scale genomic analysis suggests a neutral punctuated dynamics of transposable elements in bacterial genomes.

    PubMed

    Iranzo, Jaime; Gómez, Manuel J; López de Saro, Francisco J; Manrubia, Susanna

    2014-06-01

    Insertion sequences (IS) are the simplest and most abundant form of transposable DNA found in bacterial genomes. When present in multiple copies, it is thought that they can promote genomic plasticity and genetic exchange, thus being a major force of evolutionary change. The main processes that determine IS content in genomes are, though, a matter of debate. In this work, we take advantage of the large amount of genomic data currently available and study the abundance distributions of 33 IS families in 1811 bacterial chromosomes. This allows us to test simple models of IS dynamics and estimate their key parameters by means of a maximum likelihood approach. We evaluate the roles played by duplication, lateral gene transfer, deletion and purifying selection. We find that the observed IS abundances are compatible with a neutral scenario where IS proliferation is controlled by deletions instead of purifying selection. Even if there may be some cases driven by selection, neutral behavior dominates over large evolutionary scales. According to this view, IS and hosts tend to coexist in a dynamic equilibrium state for most of the time. Our approach also allows for a detection of recent IS expansions, and supports the hypothesis that rapid expansions constitute transient events (punctuations) during which the state of coexistence of IS and host becomes perturbed. PMID:24967627

  14. SMRT® Sequencing Solutions for Large Genomes and Transcriptomes

    PubMed Central

    Chin, J.; Peluso, P.; Rank, D.; Kim, K.; Landolin, J.; Koren, S.; Phillippy, A.M.; Tseng, E.; Wang, S.; Baybayan, P.; Gu, J.

    2014-01-01

    Single Molecule, Real-Time (SMRT) Sequencing holds promise for addressing new frontiers in large genome complexities, such as long, highly repetitive, low-complexity regions and duplication events, and differentiating between transcript isoforms that are difficult to resolve with short-read technologies. We present solutions available for both reference genome improvement (100 MB) and transcriptome research to best leverage long reads that have exceeded 20 Kb in length. Benefits for these applications are further realized with consistent use of size-selection of input sample using the BluePippin™ device from Sage Science. Highlights from our genome improvement projects using the latest P5-C3 chemistry on model organisms with contig N50 exceeding 6 Mb and longest contig exceeding 12.5 Mb with an average base quality of QV50 will be shared. Additionally, the value of long, intact reads to provide a no-assembly approach to investigate transcript isoforms using our Iso-Seq protocol will be presented.
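
The contig N50 quoted in the abstract above is a summary statistic of assembly contiguity: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembled bases. A minimal Python sketch of that definition follows; the function name `n50` and the toy lengths are illustrative assumptions, not part of the SMRT tooling.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Toy assembly of five contigs, 20 bases in total.
print(n50([2, 3, 4, 5, 6]))  # 6 + 5 = 11 >= 10, so N50 is 5
```

Integer arithmetic (`2 * running >= total`) is used so that odd totals do not depend on floating-point rounding.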

  15. Fast randomization of large genomic datasets while preserving alteration counts

    PubMed Central

    Gobbi, Andrea; Iorio, Francesco; Dawson, Kevin J.; Wedge, David C.; Tamborero, David; Alexandrov, Ludmil B.; Lopez-Bigas, Nuria; Garnett, Mathew J.; Jurman, Giuseppe; Saez-Rodriguez, Julio

    2014-01-01

    Motivation: Studying combinatorial patterns in cancer genomic datasets has recently emerged as a tool for identifying novel cancer driver networks. Approaches have been devised to quantify, for example, the tendency of a set of genes to be mutated in a ‘mutually exclusive’ manner. The significance of the proposed metrics is usually evaluated by computing P-values under appropriate null models. To this end, a Monte Carlo method (the switching-algorithm) is used to sample simulated datasets under a null model that preserves patient- and gene-wise mutation rates. In this method, a genomic dataset is represented as a bipartite network, to which Markov chain updates (switching-steps) are applied. These steps modify the network topology, and a minimal number of them must be executed to draw simulated datasets independently under the null model. This number has previously been deduced empirically to be a linear function of the total number of variants, making this process computationally expensive. Results: We present a novel approximate lower bound for the number of switching-steps, derived analytically. Additionally, we have developed the R package BiRewire, including new efficient implementations of the switching-algorithm. We illustrate the performance of BiRewire by applying it to large real cancer genomics datasets. We report vast reductions in time requirement, with respect to existing implementations/bounds and equivalent P-value computations. Thus, we propose BiRewire to study statistical properties in genomic datasets, and other data that can be modeled as bipartite networks. Availability and implementation: BiRewire is available on BioConductor at http://www.bioconductor.org/packages/2.13/bioc/html/BiRewire.html Contact: iorio@ebi.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25161255
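
The switching-steps described in the abstract above can be illustrated on a binary gene-by-patient matrix. The Python sketch below is a generic textbook version of the switching algorithm, not the BiRewire implementation (which is an R package); the function name `switching_randomize`, its parameters, and the toy matrix are assumptions made for illustration. The number of steps is left to the caller, since the bound derived in the paper is not reproduced in this record.

```python
import random


def switching_randomize(matrix, n_steps, seed=None):
    """Return a rewired copy of a 0/1 matrix with unchanged row and column sums."""
    rng = random.Random(seed)
    m = [row[:] for row in matrix]  # work on a copy
    # Each 1 in the matrix is an edge (row, column) of the bipartite network.
    edges = [(i, j) for i, row in enumerate(m) for j, v in enumerate(row) if v]
    if len(edges) < 2:
        return m
    for _ in range(n_steps):
        a, b = rng.sample(range(len(edges)), 2)
        (i, j), (k, l) = edges[a], edges[b]
        # Swap endpoints only if the two new edges do not already exist.
        if i != k and j != l and m[i][l] == 0 and m[k][j] == 0:
            m[i][j] = m[k][l] = 0
            m[i][l] = m[k][j] = 1
            edges[a], edges[b] = (i, l), (k, j)
    return m


# Hypothetical toy dataset: rows are genes, columns are patients.
toy = [[1, 0, 1],
       [0, 1, 0],
       [1, 1, 0]]
rewired = switching_randomize(toy, n_steps=100, seed=1)
```

Rejected swaps still count as steps here, which is one common convention; other implementations count only successful swaps.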

  16. The ClinSeq Project: Piloting large-scale genome sequencing for research in genomic medicine

    PubMed Central

    Biesecker, Leslie G.; Mullikin, James C.; Facio, Flavia M.; Turner, Clesson; Cherukuri, Praveen F.; Blakesley, Robert W.; Bouffard, Gerard G.; Chines, Peter S.; Cruz, Pedro; Hansen, Nancy F.; Teer, Jamie K.; Maskeri, Baishali; Young, Alice C.; Manolio, Teri A.; Wilson, Alexander F.; Finkel, Toren; Hwang, Paul; Arai, Andrew; Remaley, Alan T.; Sachdev, Vandana; Shamburek, Robert; Cannon, Richard O.; Green, Eric D.

    2009-01-01

    ClinSeq is a pilot project to investigate the use of whole-genome sequencing as a tool for clinical research. By piloting the acquisition of large amounts of DNA sequence data from individual human subjects, we are fostering the development of hypothesis-generating approaches for performing research in genomic medicine, including the exploration of issues related to the genetic architecture of disease, implementation of genomic technology, informed consent, disclosure of genetic information, and archiving, analyzing, and displaying sequence data. In the initial phase of ClinSeq, we are enrolling roughly 1000 participants; the evaluation of each includes obtaining a detailed family and medical history, as well as a clinical evaluation. The participants are being consented broadly for research on many traits and for whole-genome sequencing. Initially, Sanger-based sequencing of 300–400 genes thought to be relevant to atherosclerosis is being performed, with the resulting data analyzed for rare, high-penetrance variants associated with specific clinical traits. The participants are also being consented to allow the contact of family members for additional studies of sequence variants to explore their potential association with specific phenotypes. Here, we present the general considerations in designing ClinSeq, preliminary results based on the generation of an initial 826 Mb of sequence data, the findings for several genes that serve as positive controls for the project, and our views about the potential implications of ClinSeq. The early experiences with ClinSeq illustrate how large-scale medical sequencing can be a practical, productive, and critical component of research in genomic medicine. PMID:19602640

  17. Optimizing restriction fragment fingerprinting methods for ordering large genomic libraries

    SciTech Connect

    Branscomb, E.; Slezak, T.; Pae, R.; Carrano, A.V.; Galas, D.; Waterman, M.

    1990-01-01

    The authors present a statistical analysis of the problem of ordering large genomic cloned libraries through overlap detection based on restriction fingerprinting. Such ordering projects require a large investment of effort and many repetitious experiments. Their primary purpose here is to provide methods of maximizing the efficiency of such efforts. To this end, they adopt a statistical approach that uses the likelihood ratio as a statistic to detect overlap. The main advantages of this approach are that (1) it allows the relatively straightforward incorporation of the observed statistical properties of the data; (2) it permits the efficiency of a particular experimental method for detecting overlap to be quantitatively defined so that alternative experimental designs may be compared and optimized; and (3) it yields a direct estimate of the probability that any two library members overlap. This estimate is a critical tool for the accurate, automatic assembly of overlapping sets of fragments into islands called contigs. These contigs must subsequently be connected by other methods to provide an ordered set of overlapping fragments covering the entire genome.

  18. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach.

    PubMed

    Boitard, Simon; Rodríguez, Willy; Jay, Flora; Mona, Stefano; Austerlitz, Frédéric

    2016-03-01

    Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles. PMID:26943927
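
One of the two summary statistics named in the abstract above, the folded allele frequency spectrum, can be computed directly from unphased and unpolarized genotypes because it counts the minor allele at each site rather than the derived allele. The Python sketch below assumes diploid genotypes coded as 0, 1 or 2 copies of an arbitrary allele with no missing data; the function name `folded_afs` and the toy data are illustrative, and PopSizeABC's exact conventions (for example, filtering or normalization) may differ.

```python
from collections import Counter


def folded_afs(genotypes):
    """Count sites by minor allele count.

    genotypes: list of sites; each site is a list with one entry per diploid
    individual giving the number of copies (0, 1 or 2) of an arbitrary allele.
    Returns a list s where s[k] is the number of sites whose minor allele
    appears k times, for k = 0 .. number of individuals.
    """
    if not genotypes:
        return []
    n_chrom = 2 * len(genotypes[0])
    counts = Counter()
    for site in genotypes:
        alt = sum(site)
        # Folding: the rarer of the two alleles defines the class.
        counts[min(alt, n_chrom - alt)] += 1
    return [counts[k] for k in range(n_chrom // 2 + 1)]


# Four hypothetical sites genotyped in three individuals (six chromosomes).
sites = [[0, 1, 0], [2, 2, 1], [1, 1, 1], [0, 0, 0]]
print(folded_afs(sites))  # [1, 2, 0, 1]
```

Index 0 holds monomorphic sites; an analysis restricted to SNPs would drop that entry.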

  19. Inferring Population Size History from Large Samples of Genome-Wide Molecular Data - An Approximate Bayesian Computation Approach

    PubMed Central

    Boitard, Simon; Rodríguez, Willy; Jay, Flora; Mona, Stefano; Austerlitz, Frédéric

    2016-01-01

    Inferring the ancestral dynamics of effective population size is a long-standing question in population genetics, which can now be tackled much more accurately thanks to the massive genomic data available in many species. Several promising methods that take advantage of whole-genome sequences have been recently developed in this context. However, they can only be applied to rather small samples, which limits their ability to estimate recent population size history. Besides, they can be very sensitive to sequencing or phasing errors. Here we introduce a new approximate Bayesian computation approach named PopSizeABC that allows estimating the evolution of the effective population size through time, using a large sample of complete genomes. This sample is summarized using the folded allele frequency spectrum and the average zygotic linkage disequilibrium at different bins of physical distance, two classes of statistics that are widely used in population genetics and can be easily computed from unphased and unpolarized SNP data. Our approach provides accurate estimations of past population sizes, from the very first generations before present back to the expected time to the most recent common ancestor of the sample, as shown by simulations under a wide range of demographic scenarios. When applied to samples of 15 or 25 complete genomes in four cattle breeds (Angus, Fleckvieh, Holstein and Jersey), PopSizeABC revealed a series of population declines, related to historical events such as domestication or modern breed creation. We further highlight that our approach is robust to sequencing errors, provided summary statistics are computed from SNPs with common alleles. PMID:26943927

  20. CGCI Investigators Reveal Comprehensive Landscape of Diffuse Large B-Cell Lymphoma (DLBCL) Genomes | Office of Cancer Genomics

    Cancer.gov

    Researchers from British Columbia Cancer Agency used whole genome sequencing to analyze 40 DLBCL cases and 13 cell lines in order to fill in the gaps of the complex landscape of DLBCL genomes. Their analysis, “Mutational and structural analysis of diffuse large B-cell lymphoma using whole genome sequencing,” was published online in Blood on May 22. The authors are Ryan Morin, Marco Marra, and colleagues.  

  1. Volume visualization of multiple alignment of large genomic DNA

    SciTech Connect

    Shah, Nameeta; Dillard, Scott E.; Weber, Gunther H.; Hamann, Bernd

    2005-07-25

    Genomes of hundreds of species have been sequenced to date, and many more are being sequenced. As more and more sequence data sets become available, and as the challenge of comparing these massive "billion basepair DNA sequences" becomes substantial, so does the need for more powerful tools supporting the exploration of these data sets. Similarity score data used to compare aligned DNA sequences is inherently one-dimensional. One-dimensional (1D) representations of these data sets do not effectively utilize screen real estate. As a result, tools using 1D representations are incapable of providing an informative overview for extremely large data sets. We present a technique to arrange 1D data in 3D space to allow us to apply state-of-the-art interactive volume visualization techniques for data exploration. We demonstrate our technique using multi-million-basepair-long aligned DNA sequence data and compare it with traditional 1D line plots. The results show that our technique is superior in providing an overview of entire data sets. Our technique, coupled with 1D line plots, results in effective multi-resolution visualization of very large aligned sequence data sets.

  2. Genomic analysis of regulatory network dynamics reveals large topological changes

    NASA Astrophysics Data System (ADS)

    Luscombe, Nicholas M.; Madan Babu, M.; Yu, Haiyuan; Snyder, Michael; Teichmann, Sarah A.; Gerstein, Mark

    2004-09-01

    Network analysis has been applied widely, providing a unifying language to describe disparate systems ranging from social interactions to power grids. It has recently been used in molecular biology, but so far the resulting networks have only been analysed statically. Here we present the dynamics of a biological network on a genomic scale, by integrating transcriptional regulatory information and gene-expression data for multiple conditions in Saccharomyces cerevisiae. We develop an approach for the statistical analysis of network dynamics, called SANDY, combining well-known global topological measures, local motifs and newly derived statistics. We uncover large changes in underlying network architecture that are unexpected given current viewpoints and random simulations. In response to diverse stimuli, transcription factors alter their interactions to varying degrees, thereby rewiring the network. A few transcription factors serve as permanent hubs, but most act transiently only during certain conditions. By studying sub-network structures, we show that environmental responses facilitate fast signal propagation (for example, with short regulatory cascades), whereas the cell cycle and sporulation direct temporal progression through multiple stages (for example, with highly inter-connected transcription factors). Indeed, to drive the latter processes forward, phase-specific transcription factors inter-regulate serially, and ubiquitously active transcription factors layer above them in a two-tiered hierarchy. We anticipate that many of the concepts presented here, particularly the large-scale topological changes and hub transience, will apply to other biological networks, including complex sub-systems in higher eukaryotes.

  3. Ancestral reconstruction of tick lineages.

    PubMed

    Mans, Ben J; de Castro, Minique H; Pienaar, Ronel; de Klerk, Daniel; Gaven, Philasande; Genu, Siyamcela; Latif, Abdalla A

    2016-06-01

    Ancestral reconstruction in its fullest sense aims to describe the complete evolutionary history of a lineage. This depends on accurate phylogenies and an understanding of the key characters of each parental lineage. An attempt is made to delineate our current knowledge with regard to the ancestral reconstruction of the tick (Ixodida) lineage. Tick characters may be assigned to Core of Life, Lineages of Life or Edges of Life phenomena depending on how far back these characters may be assigned in the evolutionary Tree of Life. These include housekeeping genes, sub-cellular systems, heme processing (Core of Life), development, moulting, appendages, nervous and organ systems, homeostasis, respiration (Lineages of Life), specific adaptations to a blood-feeding lifestyle, including the complexities of salivary gland secretions and tick-host interactions (Edges of Life). The phylogenetic relationships of lineages, their origins and importance in ancestral reconstruction are discussed. Uncertainties with respect to systematic relationships, ancestral reconstruction and the challenges faced in comparative transcriptomics (next-generation sequencing approaches) are highlighted. While almost 150 years of information regarding tick biology have been assembled, progress in recent years indicates that we are in the infancy of understanding tick evolution. Even so, broad reconstructions can be made with relation to biological features associated with various lineages. Conservation of characters shared with sister and parent lineages are evident, but appreciable differences are present in the tick lineage indicating modification with descent, as expected for Darwinian evolutionary theory. Many of these differences can be related to the hematophagous lifestyle of ticks. PMID:26868413

  4. Intron-genome size relationship on a large evolutionary scale.

    PubMed

    Vinogradov, A E

    1999-09-01

    The intron-genome size relationship was studied across a wide evolutionary range (from slime mold and yeast to human and maize), as well as the relationship between genome size and the ratio of intervening/coding sequence size. The average intron size is scaled to genome size with a slope of about one-fourth for the log-transformed values; i.e., on the global scale its increase in evolution is lower than the increase in genome size by four orders of magnitude. There are exceptions to the general trend. In baker's yeast introns are extraordinarily long for its genome size. Tetrapods also have longer introns than expected for their genome sizes. In teleost fish the mean intron size does not differ significantly, notwithstanding the differences in genome size. In contrast to previous reports, avian introns were not found to be significantly shorter than introns of mammals, although avian genomes are smaller than genomes of mammals on average by about a factor of 2.5. The extra-/intragenic ratio of noncoding DNA can be higher in fungi than in animals, notwithstanding the smaller fungal genomes. In vertebrates and invertebrates taken separately, this ratio increases with genome size. Two hypotheses are proposed to explain the variation in the extra-/intragenic ratio of noncoding DNA in organisms with similar numbers of genes: transition (dynamic) and equilibrium (static). According to the transition model, this variation arises with the rapid shift of genome size because the bulk of extragenic DNA can be changed more rapidly than the finely interspersed intron sequences. The equilibrium model assumes that this variation is a result of selective adjustment of genome size with constraints imposed on the intron size due to its putative link to chromatin structure (and constraints of the splicing machinery). PMID:10473779
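
    To make the log-log scaling in this abstract concrete, the short Python sketch below treats the reported slope of about one-fourth as the exponent of a power law, intron size ∝ (genome size)^0.25. The function name and the exact exponent of 0.25 are illustrative assumptions; the abstract gives only an approximate slope and no intercept.

```python
def intron_fold_change(genome_fold_change, slope=0.25):
    """Fold change in average intron size implied by a log-log slope.

    If log(intron) = a + slope * log(genome), then multiplying genome size
    by a factor f multiplies intron size by f ** slope.
    """
    return genome_fold_change ** slope


# A 10,000-fold difference in genome size maps to roughly a 10-fold
# difference in average intron size under the assumed slope.
ratio = intron_fold_change(10_000)
```

    Under this reading, intron size grows far more slowly than genome size, which matches the general trend the abstract describes.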

  5. Evaluation of Target Preparation Methods for Single Feature Polymorphism Detection in Large Complex Plant Genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    For those genomes low in repetitive DNA, hybridizing total genomic DNA to high-density expression arrays offers an effective strategy for scoring single feature polymorphisms (SFPs). Of the ~2.5 Gb that constitute the maize genome (Zea mays L.), only 10-20% are genic sequences, with large amounts o...

  6. Combining p-values in large-scale genomics experiments.

    PubMed

    Zaykin, Dmitri V; Zhivotovsky, Lev A; Czika, Wendy; Shao, Susan; Wolfinger, Russell D

    2007-01-01

    In large-scale genomics experiments involving thousands of statistical tests, such as association scans and microarray expression experiments, a key question is: Which of the L tests represent true associations (TAs)? The traditional way to control false findings is via individual adjustments. In the presence of multiple TAs, p-value combination methods offer certain advantages. Both Fisher's and Lancaster's combination methods use an inverse gamma transformation. We identify the relation of the shape parameter of that distribution to the implicit threshold value; p-values below that threshold are favored by the inverse gamma method (GM). We explore this feature to improve power over Fisher's method when L is large and the number of TAs is moderate. However, the improvement in power provided by combination methods is at the expense of a weaker claim made upon rejection of the null hypothesis - that there are some TAs among the L tests. Thus, GM remains a global test. To allow a stronger claim about a subset of p-values that is smaller than L, we investigate two methods with an explicit truncation: the rank truncated product method (RTP) that combines the first K-ordered p-values, and the truncated product method (TPM) that combines p-values that are smaller than a specified threshold. We conclude that TPM allows claims to be made about subsets of p-values, while the claim of the RTP is, like GM, more appropriately about all L tests. GM gives somewhat higher power than TPM, RTP, Fisher, and Simes methods across a range of simulations. PMID:17879330

  7. Combining p-values in large scale genomics experiments

    PubMed Central

    Zaykin, Dmitri V.; Zhivotovsky, Lev A.; Czika, Wendy; Shao, Susan; Wolfinger, Russell D.

    2008-01-01

    Summary In large-scale genomics experiments involving thousands of statistical tests, such as association scans and microarray expression experiments, a key question is: Which of the L tests represent true associations (TAs)? The traditional way to control false findings is via individual adjustments. In the presence of multiple TAs, p-value combination methods offer certain advantages. Both Fisher’s and Lancaster’s combination methods use an inverse gamma transformation. We identify the relation of the shape parameter of that distribution to the implicit threshold value; p-values below that threshold are favored by the inverse gamma method (GM). We explore this feature to improve power over Fisher’s method when L is large and the number of TAs is moderate. However, the improvement in power provided by combination methods is at the expense of a weaker claim made upon rejection of the null hypothesis – that there are some TAs among the L tests. Thus, GM remains a global test. To allow a stronger claim about a subset of p-values that is smaller than L, we investigate two methods with an explicit truncation: the rank truncated product method (RTP) that combines the first K ordered p-values, and the truncated product method (TPM) that combines p-values that are smaller than a specified threshold. We conclude that TPM allows claims to be made about subsets of p-values, while the claim of the RTP is, like GM, more appropriately about all L tests. GM gives somewhat higher power than TPM, RTP, Fisher, and Simes methods across a range of simulations. PMID:17879330
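
    The combination statistics discussed in this abstract can be sketched in a few lines of Python. The code below computes Fisher's combined p-value in closed form (the chi-square tail with an even number of degrees of freedom has a finite-sum expression) and the log statistics for the rank truncated product (RTP) and truncated product (TPM) methods, with a simple Monte Carlo null. All function names, the simulation settings, and the example p-values are assumptions for illustration, and the sketch assumes independent tests with p-values strictly greater than zero. It does not reproduce the authors' inverse gamma method.

```python
import math
import random


def fisher_combined_pvalue(pvalues):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2L df."""
    n_tests = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # X / 2
    # Chi-square survival function for even df 2L:
    # exp(-X/2) * sum_{k=0}^{L-1} (X/2)^k / k!
    term = 1.0
    total = 1.0
    for k in range(1, n_tests):
        term *= half_stat / k
        total += term
    return math.exp(-half_stat) * total


def rtp_log_stat(pvalues, k):
    """Log of the product of the k smallest p-values (RTP statistic)."""
    return sum(math.log(p) for p in sorted(pvalues)[:k])


def tpm_log_stat(pvalues, tau):
    """Log of the product of p-values at or below tau (TPM statistic)."""
    return sum(math.log(p) for p in pvalues if p <= tau)


def monte_carlo_pvalue(stat_fn, pvalues, n_sim=10000, seed=0):
    """Empirical p-value of a product statistic under independent uniforms."""
    rng = random.Random(seed)
    observed = stat_fn(pvalues)
    n_tests = len(pvalues)
    hits = 0
    for _ in range(n_sim):
        # 1 - random() lies in (0, 1], which avoids log(0).
        null = [1.0 - rng.random() for _ in range(n_tests)]
        if stat_fn(null) <= observed:  # smaller product = more extreme
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Hypothetical p-values from L = 6 tests.
example = [0.001, 0.02, 0.3, 0.45, 0.6, 0.9]
p_fisher = fisher_combined_pvalue(example)
p_rtp = monte_carlo_pvalue(lambda ps: rtp_log_stat(ps, 2), example)
p_tpm = monte_carlo_pvalue(lambda ps: tpm_log_stat(ps, 0.05), example)
```

    Simulation is used for the truncated statistics only to keep the sketch short; it stands in for whatever null distribution the original methods use.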

  8. Genomic evidence for large, long-lived ancestors to placental mammals.

    PubMed

    Romiguier, J; Ranwez, V; Douzery, E J P; Galtier, N

    2013-01-01

    It is widely assumed that our mammalian ancestors, which lived in the Cretaceous era, were tiny animals that survived massive asteroid impacts in shelters and evolved into modern forms after dinosaurs went extinct, 65 Ma. The small size of most Mesozoic mammalian fossils essentially supports this view. Paleontology, however, is not conclusive regarding the ancestry of extant mammals, because Cretaceous and Paleocene fossils are not easily linked to modern lineages. Here, we use full-genome data to estimate the longevity and body mass of early placental mammals. Analyzing 36 fully sequenced mammalian genomes, we reconstruct two aspects of the ancestral genome dynamics, namely GC-content evolution and nonsynonymous over synonymous rate ratio. Linking these molecular evolutionary processes to life-history traits in modern species, we estimate that early placental mammals had a life span above 25 years and a body mass above 1 kg. This is similar to current primates, cetartiodactyls, or carnivores, but markedly different from mice or shrews, challenging the dominant view about mammalian origin and evolution. Our results imply that long-lived mammals existed in the Cretaceous era and were the most successful in evolution, opening new perspectives about the conditions for survival to the Cretaceous-Tertiary crisis. PMID:22949523

  9. Antarctic krill population genomics: apparent panmixia, but genome complexity and large population size muddy the water.

    PubMed

    Deagle, Bruce E; Faux, Cassandra; Kawaguchi, So; Meyer, Bettina; Jarman, Simon N

    2015-10-01

    Antarctic krill (Euphausia superba; hereafter krill) are an incredibly abundant pelagic crustacean which has a wide, but patchy, distribution in the Southern Ocean. Several studies have examined the potential for population genetic structuring in krill, but DNA-based analyses have focused on a limited number of markers and have covered only part of their circum-Antarctic range. We used mitochondrial DNA and restriction site-associated DNA sequencing (RAD-seq) to investigate genetic differences between krill from five sites, including two from East Antarctica. Our mtDNA results show no discernible genetic structuring between sites separated by thousands of kilometres, which is consistent with previous studies. Using standard RAD-seq methodology, we obtained over a billion sequences from >140 krill, and thousands of variable nucleotides were identified at hundreds of loci. However, downstream analysis found that markers with sufficient coverage were primarily from multicopy genomic regions. Careful examination of these data highlights the complexity of the RAD-seq approach in organisms with very large genomes. To characterize the multicopy markers, we recorded sequence counts from variable nucleotide sites rather than the derived genotypes; we also examined a small number of manually curated genotypes. Although these analyses effectively fingerprinted individuals, and uncovered a minor laboratory batch effect, no population structuring was observed. Overall, our results are consistent with panmixia of krill throughout their distribution. This result may indicate ongoing gene flow. However, krill's enormous population size creates substantial panmictic inertia, so genetic differentiation may not occur on an ecologically relevant timescale even if demographically separate populations exist. PMID:26340718

  10. Accommodating the load: The transposable element content of very large genomes.

    PubMed

    Metcalfe, Cushla J; Casane, Didier

    2013-03-01

    Very large genomes, that is, those above 20 Gb, are rare but widely distributed throughout the eukaryotes. They are found within the diatoms, dinoflagellates, metazoans and green plants, but so far have not been found in the excavates. There is a known positive correlation between genome size and the proportion of the genome composed of transposable elements (TEs). Very large genomes may therefore be expected to be almost entirely composed of TEs. Of the large genomes examined, in the angiosperms, gymnosperms and the dinoflagellates only a small portion of the genome was identified as TEs; most of these genomes were unidentified and may be novel or diverse TEs. In the salamanders and lungfish, 25 to 47% of the genome were identifiable retrotransposons, that is, TEs that copy themselves before insertion. However, the predominant class of TEs found in the lungfish was not the same as that found in the salamanders. The little data we have at the moment therefore suggest that the diversity and abundance of TEs is variable between taxa with large genomes, similar to patterns found in taxa with smaller genomes. Based on results from the human genome, we suggest that the 'missing' portion of the lungfish and salamander genomes are old, highly divergent, and therefore inactive copies of TEs. The data available indicate that, unlike plants with large genomes, neither the lungfish nor the salamanders show an increased risk of extinction. Based on a slow rate of DNA loss in salamanders it has been suggested that the large salamander genome is the result of run-away genome expansion involving genome size increases via TE proliferation associated with reduced recombination rate. We know of no studies on DNA loss or recombination rates in lungfish genomes; however, a similar scenario could describe the process of genome expansion in the lungfish. A series of waves of TE transposition and sequence decay would describe the pattern of TE content seen in both the lungfish and the

  11. Invariants of DNA genomic signals

    NASA Astrophysics Data System (ADS)

    Cristea, Paul Dan A.

    2005-02-01

    For large scale analysis purposes, the conversion of genomic sequences into digital signals opens the possibility to use powerful signal processing methods for handling genomic information. The study of complex genomic signals reveals large scale features, maintained over the scale of whole chromosomes, that would be difficult to find by using only the symbolic representation. Based on genomic signal methods and on statistical techniques, the paper defines parameters of DNA sequences which are invariant to transformations induced by SNPs, splicing or crossover. Re-orienting concatenated coding regions in the same direction, regularities shared by the genomic material in all exons are revealed, pointing towards the hypothesis of a regular ancestral structure from which the current chromosome structures have evolved. This property is not found in non-nuclear genomic material, e.g., plasmids.

  12. Identifying Recent Adaptations in Large-scale Genomic Data

    PubMed Central

    Grossman, Sharon R.; Andersen, Kristian G.; Shlyakhter, Ilya; Tabrizi, Shervin; Winnicki, Sarah; Yen, Angela; Park, Daniel J.; Griesemer, Dustin; Karlsson, Elinor K.; Wong, Sunny H.; Cabili, Moran; Adegbola, Richard A.; Bamezai, Rameshwar N. K.; Hill, Adrian V. S.; Vannberg, Fredrik O.; Rinn, John L.; Lander, Eric S.; Schaffner, Stephen F.; Sabeti, Pardis C.

    2013-01-01

    SUMMARY While several hundred regions of the human genome harbor signals of positive natural selection, few of the relevant adaptive traits and variants have been elucidated. Using full-genome sequence variation from the 1000 Genomes Project (1000G) and the Composite of Multiple Signals (CMS) test, we investigated 412 candidate signals and leveraged functional annotation, protein structure modeling, epigenetics, and association studies to identify and extensively annotate candidate causal variants. The resulting catalog provides a tractable list for experimental follow-up; it includes thirty-five high-scoring non-synonymous variants, fifty-nine variants associated with expression levels of a nearby coding gene or lincRNA, and numerous variants associated with susceptibility to infectious disease and other phenotypes. We experimentally characterized one candidate non-synonymous variant in TLR5, and show that it leads to altered NF-κB signaling in response to bacterial flagellin. PMID:23415221

  13. Targeted Large-Scale Deletion of Bacterial Genomes Using CRISPR-Nickases

    PubMed Central

    2015-01-01

    Programmable CRISPR-Cas systems have augmented our ability to produce precise genome manipulations. Here we demonstrate and characterize the ability of CRISPR-Cas derived nickases to direct targeted recombination of both small and large genomic regions flanked by repetitive elements in Escherichia coli. While CRISPR directed double-stranded DNA breaks are highly lethal in many bacteria, we show that CRISPR-guided nickase systems can be programmed to make precise, nonlethal, single-stranded incisions in targeted genomic regions. This induces recombination events and leads to targeted deletion. We demonstrate that dual-targeted nicking enables deletion of 36 and 97 Kb of the genome. Furthermore, multiplex targeting enables deletion of 133 Kb, accounting for approximately 3% of the entire E. coli genome. This technology provides a framework for methods to manipulate bacterial genomes using CRISPR-nickase systems. We envision this system working synergistically with preexisting bacterial genome engineering methods. PMID:26451892

  14. The influence of large scale genomics and the changing role of ex situ collections

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The development of large-scale genomics resources in non-model organisms promises to have a fundamental impact on the utilization of genetic resources. Technical innovation in high-throughput sequencing has reduced the cost to a point where genome-wide SNP development is feasible across a range of ...

  15. Mutational and structural analysis of diffuse large B-cell lymphoma using whole genome sequencing | Office of Cancer Genomics

    Cancer.gov

    Abstract: Diffuse large B-cell lymphoma (DLBCL) is a genetically heterogeneous cancer comprising at least two molecular subtypes that differ in gene expression and distribution of mutations. Recently, application of genome/exome sequencing and RNA-seq to DLBCL has revealed numerous genes that are recurrent targets of somatic point mutation in this disease.

  16. GEnomes Management Application (GEM.app): a new software tool for large-scale collaborative genome analysis.

    PubMed

    Gonzalez, Michael A; Lebrigio, Rafael F Acosta; Van Booven, Derek; Ulloa, Rick H; Powell, Eric; Speziani, Fiorella; Tekin, Mustafa; Schüle, Rebecca; Züchner, Stephan

    2013-06-01

    Novel genes are now identified at a rapid pace for many Mendelian disorders, and increasingly, for genetically complex phenotypes. However, new challenges have also become evident: (1) effectively managing larger exome and/or genome datasets, especially for smaller labs; (2) direct hands-on analysis and contextual interpretation of variant data in large genomic datasets; and (3) many small and medium-sized clinical and research-based investigative teams around the world are generating data that, if combined and shared, will significantly increase the opportunities for the entire community to identify new genes. To address these challenges, we have developed GEnomes Management Application (GEM.app), a software tool to annotate, manage, visualize, and analyze large genomic datasets (https://genomics.med.miami.edu/). GEM.app currently contains ∼1,600 whole exomes from 50 different phenotypes studied by 40 principal investigators from 15 different countries. The focus of GEM.app is on user-friendly analysis for nonbioinformaticians to make next-generation sequencing data directly accessible. Yet, GEM.app provides powerful and flexible filter options, including single family filtering, across family/phenotype queries, nested filtering, and evaluation of segregation in families. In addition, the system is fast, obtaining results within 4 sec across ∼1,200 exomes. We believe that this system will further enhance identification of genetic causes of human disease. PMID:23463597

  17. GE-17 ALTERATION OF THE p53 PATHWAY AND ANCESTRAL PROGENITORS ARE ASSOCIATED WITH TUMOR RECURRENCE IN GLIOBLASTOMA

    PubMed Central

    Kim, Hoon; Zheng, Siyuan; Amini, Seyed; Virk, Selene; Mikkelsen, Tom; Brat, Daniel; Sougnez, Carrie; Muller, Florian; Hu, Jian; Sloan, Andrew; Cohen, Mark; Van Meir, Erwin; Scarpace, Lisa; Lander, Eric; Gabriel, Stacey; Getz, Gad; Meyerson, Matthew; Chin, Lynda; Barnholtz-Sloan, Jill; Verhaak, Roel

    2014-01-01

    To evaluate evolutionary patterns of GBM recurrence, we analyzed whole genome sequencing (WGS) and multi-sector exome sequencing data from pairs of primary and posttreatment GBM. WGS on ten primary-recurrent pairs detected a median number of 12,214 mutations which we utilized to uncover clonal structures, by analyzing the distribution of mutation cellular frequencies (the fraction of tumor cells harboring a mutation). On average, 41% of the mutations were shared by primary and recurrence. The majority of shared mutations were clonal in both primary and recurrence, but we also observed many clonal mutations that were uniquely detected in either the primary or the recurrence. This raises the intriguing possibility that major tumor clones in the primary tumor and disease relapse both evolved from a shared ancestral tumor cell population. At least one subclone was identified in the majority of WGS samples, and we observed groups of mutations that were at low cancer cell fractions in both primary and recurrence, suggesting that both subclones evolved from the same ancestral tumor cells separate from the major clone ancestral cells. To address the possibility that the lack of overlap between subsequent tumors was due to intratumoral heterogeneity, we analyzed exome sequencing from a second tumor sector of seven primary and six recurrent tumors. We found that the majority of "second biopsy" mutations were not conserved between time points, suggesting that intratumoral heterogeneity did not explain the large number of mutations uniquely detected in primary and recurrence. The limited overlap of mutations in primary and recurrence provides evidence for ancestral tumor cell populations that could not be eradicated by therapy, while offspring cell populations contained unique mutations, were selectively killed by treatment and could therefore no longer be detected after disease relapse. This study has provided new insights into patterns and dynamics of tumor evolution.

  18. BactoGeNIE: a large-scale comparative genome visualization for big displays

    PubMed Central

    2015-01-01

    Background The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. Results In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. Conclusions BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics. PMID:26329021

  19. BactoGeNIE: A large-scale comparative genome visualization for big displays

    DOE PAGESBeta

    Aurisano, Jillian; Reda, Khairi; Johnson, Andrew; Marai, Elisabeta G.; Leigh, Jason

    2015-08-13

    The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. In conclusion, BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics.

  20. BactoGeNIE: A large-scale comparative genome visualization for big displays

    SciTech Connect

    Aurisano, Jillian; Reda, Khairi; Johnson, Andrew; Marai, Elisabeta G.; Leigh, Jason

    2015-08-13

    The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. In conclusion, BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics.

  1. Recreating a functional ancestral archosaur visual pigment.

    PubMed

    Chang, Belinda S W; Jönsson, Karolina; Kazmi, Manija A; Donoghue, Michael J; Sakmar, Thomas P

    2002-09-01

    The ancestors of the archosaurs, a major branch of the diapsid reptiles, originated more than 240 MYA near the dawn of the Triassic Period. We used maximum likelihood phylogenetic ancestral reconstruction methods and explored different models of evolution for inferring the amino acid sequence of a putative ancestral archosaur visual pigment. Three different types of maximum likelihood models were used: nucleotide-based, amino acid-based, and codon-based models. Where possible, within each type of model, likelihood ratio tests were used to determine which model best fit the data. Ancestral reconstructions of the ancestral archosaur node using the best-fitting models of each type were found to be in agreement, except for three amino acid residues at which one reconstruction differed from the other two. To determine if these ancestral pigments would be functionally active, the corresponding genes were chemically synthesized and then expressed in a mammalian cell line in tissue culture. The expressed artificial genes were all found to bind to 11-cis-retinal to yield stable photoactive pigments with λmax values of about 508 nm, which is slightly redshifted relative to that of extant vertebrate pigments. The ancestral archosaur pigments also activated the retinal G protein transducin, as measured in a fluorescence assay. Our results show that ancestral genes from ancient organisms can be reconstructed de novo and tested for function using a combination of phylogenetic and biochemical methods. PMID:12200476

  2. Multiway admixture deconvolution using phased or unphased ancestral panels.

    PubMed

    Churchhouse, Claire; Marchini, Jonathan

    2013-01-01

    We describe a novel method for inferring the local ancestry of admixed individuals from dense genome-wide single nucleotide polymorphism data. The method, called MULTIMIX, allows multiple source populations, models population linkage disequilibrium between markers and is applicable to datasets in which the sample and source populations are either phased or unphased. The model is based upon a hidden Markov model of switches in ancestry between consecutive windows of loci. We model the observed haplotypes within each window using a multivariate normal distribution with parameters estimated from the ancestral panels. We present three methods to fit the model: Markov chain Monte Carlo sampling, the Expectation Maximization algorithm, and a Classification Expectation Maximization algorithm. The performance of our method on individuals simulated to be admixed with European and West African ancestry shows it to be comparable to HAPMIX, the ancestry calls of the two methods agreeing at 99.26% of loci across the three parameter groups. In addition to it being faster than HAPMIX, it is also found to perform well over a range of extent of admixture in a simulation involving three ancestral populations. In an analysis of real data, we estimate the contribution of European, West African and Native American ancestry to each locus in the Mexican samples of HapMap, giving estimates of ancestral proportions that are consistent with those previously reported. PMID:23136122
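
    The hidden Markov model of ancestry switches between consecutive windows, as described in this abstract, can be illustrated with a small decoding sketch. This is not MULTIMIX's implementation: the function name `viterbi`, the single `switch_prob` parameter, the uniform starting distribution and the symmetric switching between ancestries are all assumptions made for illustration, and the per-window log-likelihoods (which the abstract says come from a multivariate normal fitted to the ancestral panels) are simply passed in as numbers.

```python
import math


def viterbi(log_emit, switch_prob):
    """Most likely ancestry label per window under a simple switch HMM.

    log_emit[t][k] is the (hypothetical) log-likelihood of window t under
    ancestry k; switch_prob is the assumed probability of changing ancestry
    between consecutive windows. Requires at least two ancestries.
    """
    n_states = len(log_emit[0])
    log_stay = math.log(1.0 - switch_prob)
    # Assumption: a switch is equally likely to land on any other ancestry.
    log_switch = math.log(switch_prob / (n_states - 1))

    # Uniform prior over ancestries for the first window (assumption).
    score = [math.log(1.0 / n_states) + e for e in log_emit[0]]
    back = []  # back[t][k] = best predecessor of state k at window t + 1

    for emit in log_emit[1:]:
        new_score, pointers = [], []
        for k in range(n_states):
            # Score of arriving in state k from every possible predecessor j.
            candidates = [
                score[j] + (log_stay if j == k else log_switch)
                for j in range(n_states)
            ]
            best_prev = max(range(n_states), key=lambda j: candidates[j])
            new_score.append(candidates[best_prev] + emit[k])
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

    With a small `switch_prob`, a single window that weakly favours another ancestry is absorbed into its neighbours, while a sustained run of evidence produces a switch; that smoothing is the reason for modelling ancestry as a Markov chain over windows rather than classifying each window on its own.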

  3. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity.

    PubMed

    Pope, Welkin H; Bowman, Charles A; Russell, Daniel A; Jacobs-Sera, Deborah; Asai, David J; Cresawn, Steven G; Jacobs, William R; Hendrix, Roger W; Lawrence, Jeffrey G; Hatfull, Graham F

    2015-01-01

    The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis was performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery. PMID:25919952

  4. Ancestral gene and "complementary" antibody dominate early ontogeny.

    PubMed

    Arend, Peter

    2013-05-01

    According to N.K. Jerne the somatic generation of immune recognition occurs in conjunction with germ cell evolution and precedes the formation of the zygote, i.e. operates before clonal selection. We propose that it is based on interspecies inherent, ancestral forces maintaining the lineage. Murine oogenesis may be offered as a model. So in C57BL/10BL sera an anti-A reactive, mercapto-ethanol sensitive glycoprotein of up to now unknown cellular origin, but exhibiting immunoglobulin M character, presents itself "complementary" to a syngeneic epitope, which, encoded by histocompatibility gene A or meanwhile accepted ancestor of the ABO gene family, arises predominantly in ovarian tissue and was detected statistically significant exclusively in polar glycolipids. Reports either on loss, pronounced expressions or de novo appearances of A-type structures in various conditions of accelerated growth like germ cell evolution, wound healing, inflammation and tumor proliferation in man and ABO related animals might show the dynamics of ancestral functions guaranteeing stem cell fidelity in maturation and tissue renewal processes. Procedures vice versa generating pluripotent stem cells for therapeutic reasons may indicate that any artificially started growth should somehow pass through the germ line from the beginning, where according to growing knowledge exclusively the oocyte's genome provides a completely channeling ancestral information. In predatory animals such as the modern-day sea anemone, ancestral proteins, particularly those of the p53 gene family, govern the reproduction processes, and are active up to the current mammalian female germ line. Lectins, providing the dual function of growth promotion and defense in higher plants, are suggested to represent the evolutionary precursors of the mammalian immunoglobulin M molecules, or protein moiety implying the greatest functional diversity in nature. And apart from any established mammalian genetic tree, a common vetch

  5. Large-scale genomic comparison using two-dimensional DNA gels

    SciTech Connect

    Sidman, C.L.; Shaffer, D.J.

    1994-09-01

    Two-dimensional electrophoresis (2DE) of DNA fragments, in which separation occurs first by size and then by sequence variation, is a method enabling large-scale comparison of complex genomes. Combining 2DE with probing for various classes of repetitive genomic elements allows rapid and efficient comparison of thousands of fragments and millions of basepairs of DNA distributed across most genomic regions. This approach is demonstrated here by analyzing the extent of genomic relatedness of different inbred strains of mice. Such strains are shown to differ from each other by approximately 0.2-1% of their nucleotides, above which level reproductive speciation occurs. The 2DE method of assessing the overall relationship between two genomes represents an appropriate tool for analyzing members of a single species, but is too sensitive for use in interspecies comparisons. 51 refs., 4 figs., 1 tab.

  6. Radiation hybrid maps of D-genome of Aegilops tauschii and their application in sequence assembly of large and complex plant genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The large and complex genome of bread wheat (Triticum aestivum L., ~17 Gb) requires high-resolution genome maps saturated with ordered markers to assist in anchoring and orienting BAC contigs/ sequence scaffolds for whole genome sequence assembly. Radiation hybrid (RH) mapping has proven to be an e...

  7. Large-scale profiling of microRNAs for The Cancer Genome Atlas.

    PubMed

    Chu, Andy; Robertson, Gordon; Brooks, Denise; Mungall, Andrew J; Birol, Inanc; Coope, Robin; Ma, Yussanne; Jones, Steven; Marra, Marco A

    2016-01-01

    The comprehensive multiplatform genomics data generated by The Cancer Genome Atlas (TCGA) Research Network is an enabling resource for cancer research. It includes an unprecedented amount of microRNA sequence data: ~11 000 libraries across 33 cancer types. Combined with initiatives like the National Cancer Institute Genomics Cloud Pilots, such data resources will make intensive analysis of large-scale cancer genomics data widely accessible. To support such initiatives, and to enable comparison of TCGA microRNA data to data from other projects, we describe the process that we developed and used to generate the microRNA sequence data, from library construction through to submission of data to repositories. In the context of this process, we describe the computational pipeline that we used to characterize microRNA expression across large patient cohorts. PMID:26271990

  8. Large-scale profiling of microRNAs for The Cancer Genome Atlas

    PubMed Central

    Chu, Andy; Robertson, Gordon; Brooks, Denise; Mungall, Andrew J.; Birol, Inanc; Coope, Robin; Ma, Yussanne; Jones, Steven; Marra, Marco A.

    2016-01-01

    The comprehensive multiplatform genomics data generated by The Cancer Genome Atlas (TCGA) Research Network is an enabling resource for cancer research. It includes an unprecedented amount of microRNA sequence data: ∼11 000 libraries across 33 cancer types. Combined with initiatives like the National Cancer Institute Genomics Cloud Pilots, such data resources will make intensive analysis of large-scale cancer genomics data widely accessible. To support such initiatives, and to enable comparison of TCGA microRNA data to data from other projects, we describe the process that we developed and used to generate the microRNA sequence data, from library construction through to submission of data to repositories. In the context of this process, we describe the computational pipeline that we used to characterize microRNA expression across large patient cohorts. PMID:26271990

  9. The draft genome of the large yellow croaker reveals well-developed innate immunity.

    PubMed

    Wu, Changwen; Zhang, Di; Kan, Mengyuan; Lv, Zhengmin; Zhu, Aiyi; Su, Yongquan; Zhou, Daizhan; Zhang, Jianshe; Zhang, Zhou; Xu, Meiying; Jiang, Lihua; Guo, Baoying; Wang, Ting; Chi, Changfeng; Mao, Yong; Zhou, Jiajian; Yu, Xinxiu; Wang, Hailing; Weng, Xiaoling; Jin, Jason Gang; Ye, Junyi; He, Lin; Liu, Yun

    2014-01-01

    The large yellow croaker, Larimichthys crocea, is one of the most economically important marine fish species endemic to China. Its wild stocks have severely suffered from overfishing, and the aquacultured species are vulnerable to various marine pathogens. Here we report the creation of a draft genome of a wild large yellow croaker using a whole-genome sequencing strategy. We estimate the genome size to be 728 Mb with 19,362 protein-coding genes. Phylogenetic analysis shows that the stickleback is most closely related to the large yellow croaker. Rapidly evolving genes under positive selection are significantly enriched in pathways related to innate immunity. We also confirm the existence of several genes and identify the expansion of gene families that are important for innate immunity. Our results may reflect a well-developed innate immune system in the large yellow croaker, which could aid in the development of wild resource preservation and mariculture strategies. PMID:25407894

  10. Inference of Ancestral Recombination Graphs through Topological Data Analysis

    PubMed Central

    Cámara, Pablo G.; Levine, Arnold J.; Rabadán, Raúl

    2016-01-01

    The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relationships within and across species. Recombination, reassortment and horizontal gene transfer constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, they are computationally costly to reconstruct, usually being infeasible for more than a few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build upon previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations, human recombination, and horizontal evolution in finches inhabiting the Galápagos Islands. PMID:27532298

  11. Inference of Ancestral Recombination Graphs through Topological Data Analysis.

    PubMed

    Cámara, Pablo G; Levine, Arnold J; Rabadán, Raúl

    2016-08-01

    The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relationships within and across species. Recombination, reassortment and horizontal gene transfer constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, they are computationally costly to reconstruct, usually being infeasible for more than a few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build upon previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations, human recombination, and horizontal evolution in finches inhabiting the Galápagos Islands. PMID:27532298

  12. Feasibility of Large-Scale Genomic Testing to Facilitate Enrollment Onto Genomically Matched Clinical Trials

    PubMed Central

    Meric-Bernstam, Funda; Brusco, Lauren; Shaw, Kenna; Horombe, Chacha; Kopetz, Scott; Davies, Michael A.; Routbort, Mark; Piha-Paul, Sarina A.; Janku, Filip; Ueno, Naoto; Hong, David; De Groot, John; Ravi, Vinod; Li, Yisheng; Luthra, Raja; Patel, Keyur; Broaddus, Russell; Mendelsohn, John; Mills, Gordon B.

    2015-01-01

    Purpose We report the experience with 2,000 consecutive patients with advanced cancer who underwent testing on a genomic testing protocol, including the frequency of actionable alterations across tumor types, subsequent enrollment onto clinical trials, and the challenges for trial enrollment. Patients and Methods Standardized hotspot mutation analysis was performed in 2,000 patients, using either an 11-gene (251 patients) or a 46- or 50-gene (1,749 patients) multiplex platform. Thirty-five genes were considered potentially actionable based on their potential to be targeted with approved or investigational therapies. Results Seven hundred eighty-nine patients (39%) had at least one mutation in potentially actionable genes. Eighty-three patients (11%) with potentially actionable mutations went on genotype-matched trials targeting these alterations. Of 230 patients with PIK3CA/AKT1/PTEN/BRAF mutations that returned for therapy, 116 (50%) received a genotype-matched drug. Forty patients (17%) were treated on a genotype-selected trial requiring a mutation for eligibility, 16 (7%) were treated on a genotype-relevant trial targeting a genomic alteration without biomarker selection, and 40 (17%) received a genotype-relevant drug off trial. Challenges to trial accrual included patient preference of noninvestigational treatment or local treatment, poor performance status or other reasons for trial ineligibility, lack of trials/slots, and insurance denial. Conclusion Broad implementation of multiplex hotspot testing is feasible; however, only a small portion of patients with actionable alterations were actually enrolled onto genotype-matched trials. Increased awareness of therapeutic implications and access to novel therapeutics are needed to optimally leverage results from broad-based genomic testing. PMID:26014291

  13. Phylogeny-driven target selection for large-scale genome-sequencing (and other) projects

    PubMed Central

    Göker, Markus; Klenk, Hans-Peter

    2013-01-01

    Despite the steadily decreasing costs of genome sequencing, prioritizing organisms for sequencing remains important in large-scale projects. Phylogeny-based selection is of interest to identify those organisms whose genomes can be expected to differ most from those that have already been sequenced. Here, we describe a method that infers a phylogenetic scoring independent of which set of organisms has previously been targeted, which is computationally simple and easy to apply in practice. The scoring itself, as well as pre- and post-processing of the data, is illustrated using two real-world examples in which the method has already been applied for selecting targets for genome sequencing. These projects are the JGI CSP Genomic Encyclopedia of Bacteria and Archaea phase I, targeting 1,000 type strains, and, on a smaller-scale, the phylogenomics of the Roseobacter clade. Potential artifacts of the method are discussed and compared to a selection approach based on the taxonomic classification. PMID:23991265

  14. Patterns and Mechanisms of Ancestral Histone Protein Inheritance in Budding Yeast

    PubMed Central

    van Welsem, Tibor; Friedman, Nir; Rando, Oliver J.; van Leeuwen, Fred

    2011-01-01

    Replicating chromatin involves disruption of histone-DNA contacts and subsequent reassembly of maternal histones on the new daughter genomes. In bulk, maternal histones are randomly segregated to the two daughters, but little is known about the fine details of this process: do maternal histones re-assemble at preferred locations or close to their original loci? Here, we use a recently developed method for swapping epitope tags to measure the disposition of ancestral histone H3 across the yeast genome over six generations. We find that ancestral H3 is preferentially retained at the 5′ ends of most genes, with strongest retention at long, poorly transcribed genes. We recapitulate these observations with a quantitative model in which the majority of maternal histones are reincorporated within 400 bp of their pre-replication locus during replication, with replication-independent replacement and transcription-related retrograde nucleosome movement shaping the resulting distributions of ancestral histones. We find a key role for Topoisomerase I in retrograde histone movement during transcription, and we find that loss of Chromatin Assembly Factor-1 affects replication-independent turnover. Together, these results show that specific loci are enriched for histone proteins first synthesized several generations beforehand, and that maternal histones re-associate close to their original locations on daughter genomes after replication. Our findings further suggest that accumulation of ancestral histones could play a role in shaping histone modification patterns. PMID:21666805

  15. Vertebrate Protein CTCF and its Multiple Roles in a Large-Scale Regulation of Genome Activity

    PubMed Central

    Nikolaev, L.G; Akopov, S.B; Didych, D.A; Sverdlov, E.D

    2009-01-01

    The CTCF transcription factor is an 11 zinc fingers multifunctional protein that uses different zinc finger combinations to recognize and bind different sites within DNA. CTCF is thought to participate in various gene regulatory networks including transcription activation and repression, formation of independently functioning chromatin domains and regulation of imprinting. Sequencing of human and other genomes opened up a possibility to ascertain the genomic distribution of CTCF binding sites and to identify CTCF-dependent cis-regulatory elements, including insulators. In the review, we summarized recent data on genomic distribution of CTCF binding sites in the human and other genomes within a framework of the loop domain hypothesis of large-scale regulation of the genome activity. We also tried to formulate possible lines of studies on a variety of CTCF functions which probably depend on its ability to specifically bind DNA, interact with other proteins and form di- and multimers. These three fundamental properties allow CTCF to serve as a transcription factor, an insulator and a constitutive dispersed genome-wide demarcation tool able to recruit various factors that emerge in response to diverse external and internal signals, and thus to exert its signal-specific function(s). PMID:20119526

  16. Vertebrate Protein CTCF and its Multiple Roles in a Large-Scale Regulation of Genome Activity.

    PubMed

    Nikolaev, L G; Akopov, S B; Didych, D A; Sverdlov, E D

    2009-08-01

    The CTCF transcription factor is an 11 zinc fingers multifunctional protein that uses different zinc finger combinations to recognize and bind different sites within DNA. CTCF is thought to participate in various gene regulatory networks including transcription activation and repression, formation of independently functioning chromatin domains and regulation of imprinting. Sequencing of human and other genomes opened up a possibility to ascertain the genomic distribution of CTCF binding sites and to identify CTCF-dependent cis-regulatory elements, including insulators. In the review, we summarized recent data on genomic distribution of CTCF binding sites in the human and other genomes within a framework of the loop domain hypothesis of large-scale regulation of the genome activity. We also tried to formulate possible lines of studies on a variety of CTCF functions which probably depend on its ability to specifically bind DNA, interact with other proteins and form di- and multimers. These three fundamental properties allow CTCF to serve as a transcription factor, an insulator and a constitutive dispersed genome-wide demarcation tool able to recruit various factors that emerge in response to diverse external and internal signals, and thus to exert its signal-specific function(s). PMID:20119526

  17. The PRRS Host Genomic Consortium (PHGC) Database: Management of large data sets.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In any consortium project where large amounts of phenotypic and genotypic data are collected across several research labs, issues arise with maintenance and analysis of datasets. The PRRS Host Genomic Consortium (PHGC) Database was developed to meet this need for the PRRS research community. The sch...

  18. Software engineering the mixed model for genome-wide association studies on large samples

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample siz...

  19. Physical mapping resources for large plant genomes: radiation hybrids for wheat D-genome progenitor Aegilops tauschii

    PubMed Central

    2012-01-01

    Background Development of a high quality reference sequence is a daunting task in crops like wheat with large (~17Gb), highly repetitive (>80%) and polyploid genome. To achieve complete sequence assembly of such genomes, development of a high quality physical map is a necessary first step. However, due to the lack of recombination in certain regions of the chromosomes, genetic mapping, which uses recombination frequency to map marker loci, alone is not sufficient to develop high quality marker scaffolds for a sequence ready physical map. Radiation hybrid (RH) mapping, which uses radiation induced chromosomal breaks, has proven to be a successful approach for developing marker scaffolds for sequence assembly in animal systems. Here, the development and characterization of a RH panel for the mapping of D-genome of wheat progenitor Aegilops tauschii is reported. Results Radiation dosages of 350 and 450 Gy were optimized for seed irradiation of a synthetic hexaploid (AABBDD) wheat with the D-genome of Ae. tauschii accession AL8/78. The surviving plants after irradiation were crossed to durum wheat (AABB), to produce pentaploid RH1s (AABBD), which allows the simultaneous mapping of the whole D-genome. A panel of 1,510 RH1 plants was obtained, of which 592 plants were generated from the mature RH1 seeds, and 918 plants were rescued through embryo culture due to poor germination (<3%) of mature RH1 seeds. This panel showed a homogenous marker loss (2.1%) after screening with SSR markers uniformly covering all the D-genome chromosomes. Different marker systems mostly detected different lines with deletions. Using markers covering known distances, the mapping resolution of this RH panel was estimated to be <140kb. Analysis of only 16 RH lines carrying deletions on chromosome 2D resulted in a physical map with cM/cR ratio of 1:5.2 and 15 distinct bins. Additionally, with this small set of lines, almost all the tested ESTs could be mapped. A set of 399 most informative RH

  20. Captured segment exchange: a strategy for custom engineering large genomic regions in Drosophila melanogaster.

    PubMed

    Bateman, Jack R; Palopoli, Michael F; Dale, Sarah T; Stauffer, Jennifer E; Shah, Anita L; Johnson, Justine E; Walsh, Conor W; Flaten, Hanna; Parsons, Christine M

    2013-02-01

    Site-specific recombinases (SSRs) are valuable tools for manipulating genomes. In Drosophila, thousands of transgenic insertions carrying SSR recognition sites have been distributed throughout the genome by several large-scale projects. Here we describe a method with the potential to use these insertions to make custom alterations to the Drosophila genome in vivo. Specifically, by employing recombineering techniques and a dual recombinase-mediated cassette exchange strategy based on the phiC31 integrase and FLP recombinase, we show that a large genomic segment that lies between two SSR recognition-site insertions can be "captured" as a target cassette and exchanged for a sequence that was engineered in bacterial cells. We demonstrate this approach by targeting a 50-kb segment spanning the tsh gene, replacing the existing segment with corresponding recombineered sequences through simple and efficient manipulations. Given the high density of SSR recognition-site insertions in Drosophila, our method affords a straightforward and highly efficient approach to explore gene function in situ for a substantial portion of the Drosophila genome. PMID:23150604

  1. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity

    PubMed Central

    Pope, Welkin H; Bowman, Charles A; Russell, Daniel A; Jacobs-Sera, Deborah; Asai, David J; Cresawn, Steven G; Jacobs, William R; Hendrix, Roger W; Lawrence, Jeffrey G; Hatfull, Graham F; Abbazia, Patrick; Ababio, Amma; Adam, Naazneen

    2015-01-01

    The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis was performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery. DOI: http://dx.doi.org/10.7554/eLife.06416.001 PMID:25919952

  2. Insertion sequence-caused large-scale rearrangements in the genome of Escherichia coli

    PubMed Central

    Lee, Heewook; Doak, Thomas G.; Popodi, Ellen; Foster, Patricia L.; Tang, Haixu

    2016-01-01

    A majority of large-scale bacterial genome rearrangements involve mobile genetic elements such as insertion sequence (IS) elements. Here we report novel insertions and excisions of IS elements and recombination between homologous IS elements identified in a large collection of Escherichia coli mutation accumulation lines by analysis of whole genome shotgun sequencing data. Based on 857 identified events (758 IS insertions, 98 recombinations and 1 excision), we estimate that the rate of IS insertion is 3.5 × 10^-4 insertions per genome per generation and the rate of IS homologous recombination is 4.5 × 10^-5 recombinations per genome per generation. These events are mostly contributed by the IS elements IS1, IS2, IS5 and IS186. Spatial analysis of new insertions suggests that transposition is biased to proximal insertions, and the length spectrum of IS-caused deletions is largely explained by local hopping. For any of the ISs studied there is no region of the circular genome that is favored or disfavored for new insertions but there are notable hotspots for deletions. Some elements have preferences for non-coding sequence or for the beginning and end of coding regions, largely explained by target site motifs. Interestingly, transposition and deletion rates remain constant across the wild-type and 12 mutant E. coli lines, each deficient in a distinct DNA repair pathway. Finally, we characterized the target sites of four IS families, confirming previous results and characterizing a highly specific pattern at IS186 target-sites, 5′-GGGG(N6/N7)CCCC-3′. We also detected 48 long deletions not involving IS elements. PMID:27431326
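
    The IS186 target-site pattern quoted in this abstract can be turned into a simple sequence scan. The sketch below is only an illustration, not the authors' pipeline: it assumes that "(N6/N7)" means a spacer of six or seven arbitrary nucleotides, and the names `IS186_SITE` and `find_is186_sites` are hypothetical.

```python
import re

# Assumed reading of 5'-GGGG(N6/N7)CCCC-3': GGGG, then 6 or 7 bases of any
# kind, then CCCC. The lookahead lets overlapping candidate sites be reported.
IS186_SITE = re.compile(r"(?=(GGGG[ACGT]{6,7}CCCC))")


def find_is186_sites(seq):
    """Return (start, matched_text) for each candidate site in seq.

    Positions are 0-based. When both a 6-base and a 7-base spacer would
    match at the same start, only the 7-base match is returned, because the
    quantifier is greedy.
    """
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in IS186_SITE.finditer(seq)]
```

    Because the reverse complement of GGGG(N)CCCC is again GGGG(N)CCCC, scanning one strand is enough to find every site that fits this pattern.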

  3. FVGWAS: Fast voxelwise genome wide association analysis of large-scale imaging genetic data.

    PubMed

    Huang, Meiyan; Nichols, Thomas; Huang, Chao; Yu, Yang; Lu, Zhaohua; Knickmeyer, Rebecca C; Feng, Qianjin; Zhu, Hongtu

    2015-09-01

    More and more large-scale imaging genetic studies are being widely conducted to collect a rich set of imaging, genetic, and clinical data to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. Several major big-data challenges arise from testing genome-wide (N_C > 12 million known variants) associations with signals at millions of locations (N_V ~ 10^6) in the brain from thousands of subjects (n ~ 10^3). The aim of this paper is to develop a Fast Voxelwise Genome Wide Association analysiS (FVGWAS) framework to efficiently carry out whole-genome analyses of whole-brain data. FVGWAS consists of three components including a heteroscedastic linear model, a global sure independence screening (GSIS) procedure, and a detection procedure based on wild bootstrap methods. Specifically, for standard linear association, the computational complexity is O(n N_V N_C) for voxelwise genome wide association analysis (VGWAS) method compared with O((N_C + N_V) n^2) for FVGWAS. Simulation studies show that FVGWAS is an efficient method of searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. Finally, we have successfully applied FVGWAS to a large-scale imaging genetic data analysis of ADNI data with 708 subjects, 193,275 voxels in RAVENS maps, and 501,584 SNPs, and the total processing time was 203,645 s for a single CPU. Our FVGWAS may be a valuable statistical toolbox for large-scale imaging genetic analysis as the field is rapidly advancing with ultra-high-resolution imaging and whole-genome sequencing. PMID:26025292
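
    To give a feel for the two complexity orders quoted in this abstract, the sketch below plugs in the ADNI dimensions the abstract reports (708 subjects, 193,275 voxels, 501,584 SNPs). The function names are hypothetical, and because big-O expressions drop constant factors, the resulting ratio is only a rough indication of relative growth, not a measured speed-up.

```python
def vgwas_cost(n, n_voxels, n_snps):
    """Operation count proportional to n * N_V * N_C (voxelwise GWAS order)."""
    return n * n_voxels * n_snps


def fvgwas_cost(n, n_voxels, n_snps):
    """Operation count proportional to (N_C + N_V) * n^2 (FVGWAS order)."""
    return (n_snps + n_voxels) * n ** 2


# Dimensions reported for the ADNI analysis in the abstract.
ADNI = {"n": 708, "n_voxels": 193_275, "n_snps": 501_584}

ratio = vgwas_cost(**ADNI) / fvgwas_cost(**ADNI)
print(f"VGWAS order / FVGWAS order for ADNI sizes: about {ratio:.0f}x")
```

    The ratio simplifies to N_V N_C / ((N_C + N_V) n), so the gap widens as the number of voxels and variants grows and narrows as the number of subjects grows.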

  4. Bringing large-scale multiple genome analysis one step closer: ScalaBLAST and beyond

    SciTech Connect

    Oehmen, Christopher S.; Sofia, Heidi J.; Baxter, Douglas; Szeto, Ernest; Hugenholtz, Philip; Kyrpides, Nikos; Markowitz, Victor; Straatsma, Tjerk P.

    2007-06-01

    Genome sequence comparisons of exponentially growing data sets form the foundation for the comparative analysis tools provided by community biological data resources such as the Integrated Microbial Genomes (IMG) system at the Joint Genome Institute (JGI). We present an example of how ScalaBLAST, a high-throughput sequence analysis program, harnesses increasingly critical high-performance computing to perform sequence analysis, which is a critical component of maintaining a state-of-the-art sequence data repository. The Integrated Microbial Genomes (IMG) system [1] is a data management and analysis platform for microbial genomes hosted at the JGI. IMG contains both draft and complete JGI genomes integrated with other publicly available microbial genomes of all three domains of life. IMG provides tools and viewers for interactive analysis of genomes, genes and functions, individually or in a comparative context. Most of these tools are based on pre-computed pairwise sequence similarities involving millions of genes. These computations are becoming prohibitively time consuming with the rapid increase in the number of newly sequenced genomes incorporated into IMG and the need to refresh regularly the content of IMG in order to reflect changes in the annotations of existing genomes. Thus, building IMG 2.0 (released on December 1st 2006) entailed reloading from NCBI's RefSeq all the genomes in the previous version of IMG (IMG 1.6, as of September 1st, 2006) together with 1,541 new public microbial, viral and eukaryal genomes, bringing the total of IMG genomes to 2,301. A critical part of building IMG 2.0 involved using PNNL ScalaBLAST software for computing pairwise similarities for over 2.2 million genes in under 26 hours on 1,000 processors, thus illustrating the impact that new generation bioinformatics tools are poised to make in biology. The BLAST algorithm [2, 3] is a familiar bioinformatics application for computing sequence similarity, and has become a workhorse in large

  5. Epistatic interactions between ancestral genotype and beneficial mutations shape evolvability in Pseudomonas aeruginosa.

    PubMed

    Gifford, Danna R; Toll-Riera, Macarena; MacLean, R Craig

    2016-07-01

    The idea that interactions between mutations influence adaptation by driving populations to low and high fitness peaks on adaptive landscapes is deeply ingrained in evolutionary theory. Here, we investigate the impact of epistasis on evolvability by challenging populations of two Pseudomonas aeruginosa clones bearing different initial mutations (in rpoB conferring rifampicin resistance, and the type IV pili gene network) to adaptation to a medium containing l-serine as the sole carbon source. Despite being initially indistinguishable in fitness, populations founded by the two ancestral genotypes reached different fitness following 300 generations of evolution. Genome sequencing revealed that the difference could not be explained by acquiring mutations in different targets of selection; the majority of clones from both ancestors converged on one of the following two strategies: (1) acquiring mutations in either PA2449 (gcsR, an l-serine-metabolism RpoN enhancer binding protein) or (2) protease genes. Additionally, populations from both ancestors converged on loss-of-function mutations in the type IV pili gene network, either due to ancestral or acquired mutations. No compensatory or reversion mutations were observed in RNA polymerase (RNAP) genes, in spite of the large fitness costs typically associated with mutations in rpoB. Although current theory points to sign epistasis as the dominant constraint on evolvability, these results suggest that the role of magnitude epistasis in constraining evolvability may be underappreciated. The contribution of magnitude epistasis is likely to be greatest under the biologically relevant mutation supply rates that make back mutations probabilistically unlikely. PMID:27230588

  6. Biological Consequences of Ancient Gene Acquisition and Duplication in the Large Genome of Candidatus Solibacter usitatus Ellin6076

    SciTech Connect

    Challacombe, Jean F; Eichorst, Stephanie A; Hauser, Loren John; Land, Miriam L; Xie, Gary; Kuske, Cheryl R

    2011-01-01

    Members of the bacterial phylum Acidobacteria are widespread in soils and sediments worldwide, and are abundant in many soils. Acidobacteria are challenging to culture in vitro, and many basic features of their biology and functional roles in the soil have not been determined. Candidatus Solibacter usitatus strain Ellin6076 has a 9.9 Mb genome that is approximately 2-5 times as large as the other sequenced Acidobacteria genomes. Bacterial genome sizes typically range from 0.5 to 10 Mb and are influenced by gene duplication, horizontal gene transfer, gene loss and other evolutionary processes. Our comparative genome analyses indicate that the Ellin6076 large genome has arisen by horizontal gene transfer via ancient bacteriophage and/or plasmid-mediated transduction, and widespread small-scale gene duplications, resulting in an increased number of paralogs. Low amino acid sequence identities among functional group members, and lack of conserved gene order and orientation in regions containing similar groups of paralogs, suggest that most of the paralogs are not the result of recent duplication events. The genome sizes of additional cultured Acidobacteria strains were estimated using pulsed-field gel electrophoresis to determine the prevalence of the large genome trait within the phylum. Members of subdivision 3 had larger genomes than those of subdivision 1, but none were as large as the Ellin6076 genome. The large genome of Ellin6076 may not be typical of the phylum, and encodes traits that could provide a selective metabolic, defensive and regulatory advantage in the soil environment.

  7. Final report. Human artificial episomal chromosome (HAEC) for building large genomic libraries

    SciTech Connect

    Jean-Michael H. Vos

    1999-12-09

    Collections of human DNA fragments are maintained for research purposes as clones in bacterial host cells. However, for unknown reasons, some regions of the human genome appear to be unclonable or unstable in bacteria. Their team has developed a system using episomes (extrachromosomal, autonomously replicating DNA) that maintains large DNA fragments in human cells. This human artificial episomal chromosome (HAEC) system may prove useful for coverage of these especially difficult regions. In the broader biomedical community, the HAEC system also shows promise for use in functional genomics and gene therapy. Recent improvements to the HAEC system and its application to mapping, sequencing, and functionally studying human and mouse DNA are summarized. Mapping and sequencing the human genome and model organisms are only the first steps in determining the function of various genetic units critical for gene regulation, DNA replication, chromatin packaging, chromosomal stability, and chromatid segregation. Such studies will require the ability to transfer and manipulate entire functional units into mammalian cells.

  8. CyanoGEBA: A Better Understanding of Cynobacterial Diversity through Large-scale Genomics (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

    ScienceCinema

    Shih, Patrick [Kerfeld Lab, UC Berkeley and JGI]

    2013-01-22

    Patrick Shih, representing both the University of California, Berkeley and JGI, gives a talk titled "CyanoGEBA: A Better Understanding of Cynobacterial Diversity through Large-scale Genomics" at the JGI 7th Annual Users Meeting: Genomics of Energy & Environment Meeting on March 22, 2012 in Walnut Creek, California.

  9. CyanoGEBA: A Better Understanding of Cynobacterial Diversity through Large-scale Genomics (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

    SciTech Connect

    Shih, Patrick

    2012-03-22

    Patrick Shih, representing both the University of California, Berkeley and JGI, gives a talk titled "CyanoGEBA: A Better Understanding of Cynobacterial Diversity through Large-scale Genomics" at the JGI 7th Annual Users Meeting: Genomics of Energy & Environment Meeting on March 22, 2012 in Walnut Creek, California.

  10. The Dunaliella salina organelle genomes: large sequences, inflated with intronic and intergenic DNA

    SciTech Connect

    Smith, David R.; Lee, Robert W.; Cushman, John C.; Magnuson, Jon K.; Tran, Duc; Polle, Juergen E.

    2010-05-07

    Background: Dunaliella salina Teodoresco, a unicellular, halophilic green alga belonging to the Chlorophyceae, is among the most industrially important microalgae. This is because D. salina can produce massive amounts of β-carotene, which can be collected for commercial purposes, and because of its potential as a feedstock for biofuels production. Although the biochemistry and physiology of D. salina have been studied in great detail, virtually nothing is known about the genomes it carries, especially those within its mitochondrion and plastid. This study presents the complete mitochondrial and plastid genome sequences of D. salina and compares them with those of the model green algae Chlamydomonas reinhardtii and Volvox carteri. Results: The D. salina organelle genomes are large, circular-mapping molecules with ~60% noncoding DNA, placing them among the most inflated organelle DNAs sampled from the Chlorophyta. In fact, the D. salina plastid genome, at 269 kb, is the largest complete plastid DNA (ptDNA) sequence currently deposited in GenBank, and both the mitochondrial and plastid genomes have unprecedentedly high intron densities for organelle DNA: ~1.5 and ~0.4 introns per gene, respectively. Moreover, what appear to be the relics of genes, introns, and intronic open reading frames are found scattered throughout the intergenic ptDNA regions -- a trait without parallel in other characterized organelle genomes and one that gives insight into the mechanisms and modes of expansion of the D. salina ptDNA. Conclusions: These findings confirm the notion that chlamydomonadalean algae have some of the most extreme organelle genomes of all eukaryotes. They also suggest that the events giving rise to the expanded ptDNA architecture of D. salina and other Chlamydomonadales may have occurred early in the evolution of this lineage. Although interesting from a genome evolution standpoint, the D. salina organelle DNA sequences will aid in the development of a viable

  11. Hyper-expansion of large DNA segments in the genome of kuruma shrimp, Marsupenaeus japonicus

    PubMed Central

    2010-01-01

    Background Higher crustaceans (class Malacostraca) represent the most species-rich and morphologically diverse group of non-insect arthropods and many of its members are commercially important. Although the crustacean DNA sequence information is growing exponentially, little is known about the genome organization of Malacostraca. Here, we constructed a bacterial artificial chromosome (BAC) library and performed BAC-end sequencing to provide genomic information for kuruma shrimp (Marsupenaeus japonicus), one of the most widely cultured species among crustaceans, and found the presence of a redundant sequence in the BAC library. We examined the BAC clone that includes the redundant sequence to further analyze its length, copy number and location in the kuruma shrimp genome. Results Mj024A04 BAC clone, which includes one redundant sequence, contained 27 putative genes and seemed to display a normal genomic DNA structure. Notably, of the putative genes, 3 genes encode homologous proteins to the inhibitor of apoptosis protein and 7 genes encode homologous proteins to white spot syndrome virus, a virulent pathogen known to affect crustaceans. Colony hybridization and PCR analysis of 381 BAC clones showed that almost half of the BAC clones maintain DNA segments whose sequences are homologous to the representative BAC clone Mj024A04. The Mj024A04 partial sequence was detected multiple times in the kuruma shrimp nuclear genome with a calculated copy number of at least 100. Microsatellite-based BAC genotyping clearly showed that Mj024A04 homologous sequences were cloned from at least 48 different chromosomal loci. The absence of micro-syntenic relationships with the available genomic sequences of Daphnia and Drosophila suggests the uniqueness of these fragments in kuruma shrimp from current arthropod genome sequences. Conclusions Our results demonstrate that hyper-expansion of large DNA segments took place in the kuruma shrimp genome. Although we analyzed only a part of the

  12. Large-scale genomic sequencing of extraintestinal pathogenic Escherichia coli strains

    PubMed Central

    Salipante, Stephen J.; Roach, David J.; Kitzman, Jacob O.; Snyder, Matthew W.; Stackhouse, Bethany; Butler-Wu, Susan M.; Lee, Choli; Cookson, Brad T.

    2015-01-01

    Large-scale bacterial genome sequencing efforts to date have provided limited information on the most prevalent category of disease: sporadically acquired infections caused by common pathogenic bacteria. Here, we performed whole-genome sequencing and de novo assembly of 312 blood- or urine-derived isolates of extraintestinal pathogenic (ExPEC) Escherichia coli, a common agent of sepsis and community-acquired urinary tract infections, obtained during the course of routine clinical care at a single institution. We find that ExPEC E. coli are highly genomically heterogeneous, consistent with pan-genome analyses encompassing the larger species. Investigation of differential virulence factor content and antibiotic resistance phenotypes reveals markedly different profiles among lineages and among strains infecting different body sites. We use high-resolution molecular epidemiology to explore the dynamics of infections at the level of individual patients, including identification of possible person-to-person transmission. Notably, a limited number of discrete lineages caused the majority of bloodstream infections, including one subclone (ST131-H30) responsible for 28% of bacteremic E. coli infections over a 3-yr period. We additionally use a microbial genome-wide-association study (GWAS) approach to identify individual genes responsible for antibiotic resistance, successfully recovering known genes but notably not identifying any novel factors. We anticipate that in the near future, whole-genome sequencing of microorganisms associated with clinical disease will become routine. Our study reveals what kind of information can be obtained from sequencing clinical isolates on a large scale, even well-characterized organisms such as E. coli, and provides insight into how this information might be utilized in a healthcare setting. PMID:25373147

  13. Insights into the Genome of Large Sulfur Bacteria Revealed by Analysis of Single Filaments

    PubMed Central

    Richter, Michael; de Beer, Dirk; Preisler, André; Jørgensen, Bo B; Huntemann, Marcel; Glöckner, Frank Oliver; Amann, Rudolf; Koopman, Werner J. H; Lasken, Roger S; Janto, Benjamin; Hogg, Justin; Stoodley, Paul; Boissy, Robert; Ehrlich, Garth D

    2007-01-01

    Marine sediments are frequently covered by mats of the filamentous Beggiatoa and other large nitrate-storing bacteria that oxidize hydrogen sulfide using either oxygen or nitrate, which they store in intracellular vacuoles. Despite their conspicuous metabolic properties and their biogeochemical importance, little is known about their genetic repertoire because of the lack of pure cultures. Here, we present a unique approach to access the genome of single filaments of Beggiatoa by combining whole genome amplification, pyrosequencing, and optical genome mapping. Sequence assemblies were incomplete and yielded average contig sizes of approximately 1 kb. Pathways for sulfur oxidation, nitrate and oxygen respiration, and CO2 fixation confirm the chemolithoautotrophic physiology of Beggiatoa. In addition, Beggiatoa potentially utilize inorganic sulfur compounds and dimethyl sulfoxide as electron acceptors. We propose a mechanism of vacuolar nitrate accumulation that is linked to proton translocation by vacuolar-type ATPases. Comparative genomics indicates substantial horizontal gene transfer of storage, metabolic, and gliding capabilities between Beggiatoa and cyanobacteria. These capabilities enable Beggiatoa to overcome non-overlapping availabilities of electron donors and acceptors while gliding between oxic and sulfidic zones. The first look into the genome of these filamentous sulfur-oxidizing bacteria substantially deepens the understanding of their evolution and their contribution to sulfur and nitrogen cycling in marine sediments. PMID:17760503

  14. Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets

    PubMed Central

    Heath, Allison P; Greenway, Matthew; Powell, Raymond; Spring, Jonathan; Suarez, Rafael; Hanley, David; Bandlamudi, Chai; McNerney, Megan E; White, Kevin P; Grossman, Robert L

    2014-01-01

    Background As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze them. Methods Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal, and associated middleware that provides a single entry point and a single sign on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required. Results Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Conclusions Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics. PMID:24464852

  15. An improved method for oriT-directed cloning and functionalization of large bacterial genomic regions.

    PubMed

    Kvitko, Brian H; McMillan, Ian A; Schweizer, Herbert P

    2013-08-01

    We have made significant improvements to a broad-host-range system for the cloning and manipulation of large bacterial genomic regions based on site-specific recombination between directly repeated oriT sites during conjugation. Using two suicide capture vectors carrying flanking homology regions, oriT sites are recombined on either side of the target region. Using a broad-host-range conjugation helper plasmid, the region between the oriT sites is conjugated into an Escherichia coli recipient strain, where it is circularized and maintained as a chimeric mini-F vector. The cloned target region is functionalized in multiple ways to accommodate downstream manipulation. The target region is flanked with Gateway attB sites for recombination into other vectors and by rare 18-bp I-SceI restriction sites for subcloning. The Tn7-functionalized target can also be inserted at a naturally occurring chromosomal attTn7 site(s) or maintained as a broad-host-range plasmid for complementation or heterologous expression studies. We have used the oriTn7 capture technique to clone and complement Burkholderia pseudomallei genomic regions up to 140 kb in size and have created isogenic Burkholderia strains with various combinations of genomic islands. We believe this system will greatly aid the cloning and genetic analysis of genomic islands, biosynthetic gene clusters, and large open reading frames. PMID:23747708

  16. An Improved Method for oriT-Directed Cloning and Functionalization of Large Bacterial Genomic Regions

    PubMed Central

    Kvitko, Brian H.; McMillan, Ian A.

    2013-01-01

    We have made significant improvements to a broad-host-range system for the cloning and manipulation of large bacterial genomic regions based on site-specific recombination between directly repeated oriT sites during conjugation. Using two suicide capture vectors carrying flanking homology regions, oriT sites are recombined on either side of the target region. Using a broad-host-range conjugation helper plasmid, the region between the oriT sites is conjugated into an Escherichia coli recipient strain, where it is circularized and maintained as a chimeric mini-F vector. The cloned target region is functionalized in multiple ways to accommodate downstream manipulation. The target region is flanked with Gateway attB sites for recombination into other vectors and by rare 18-bp I-SceI restriction sites for subcloning. The Tn7-functionalized target can also be inserted at a naturally occurring chromosomal attTn7 site(s) or maintained as a broad-host-range plasmid for complementation or heterologous expression studies. We have used the oriTn7 capture technique to clone and complement Burkholderia pseudomallei genomic regions up to 140 kb in size and have created isogenic Burkholderia strains with various combinations of genomic islands. We believe this system will greatly aid the cloning and genetic analysis of genomic islands, biosynthetic gene clusters, and large open reading frames. PMID:23747708

  17. The ancestral gene repertoire of animal stem cells.

    PubMed

    Alié, Alexandre; Hayashi, Tetsutaro; Sugimura, Itsuro; Manuel, Michaël; Sugano, Wakana; Mano, Akira; Satoh, Nori; Agata, Kiyokazu; Funayama, Noriko

    2015-12-22

    Stem cells are pivotal for development and tissue homeostasis of multicellular animals, and the quest for a gene toolkit associated with the emergence of stem cells in a common ancestor of all metazoans remains a major challenge for evolutionary biology. We reconstructed the conserved gene repertoire of animal stem cells by transcriptomic profiling of totipotent archeocytes in the demosponge Ephydatia fluviatilis and by tracing shared molecular signatures with flatworm and Hydra stem cells. Phylostratigraphy analyses indicated that most of these stem-cell genes predate animal origin, with only few metazoan innovations, notably including several partners of the Piwi machinery known to promote genome stability. The ancestral stem-cell transcriptome is strikingly poor in transcription factors. Instead, it is rich in RNA regulatory actors, including components of the "germ-line multipotency program" and many RNA-binding proteins known as critical regulators of mammalian embryonic stem cells. PMID:26644562

  18. The ancestral gene repertoire of animal stem cells

    PubMed Central

    Alié, Alexandre; Hayashi, Tetsutaro; Sugimura, Itsuro; Manuel, Michaël; Sugano, Wakana; Mano, Akira; Satoh, Nori; Agata, Kiyokazu; Funayama, Noriko

    2015-01-01

    Stem cells are pivotal for development and tissue homeostasis of multicellular animals, and the quest for a gene toolkit associated with the emergence of stem cells in a common ancestor of all metazoans remains a major challenge for evolutionary biology. We reconstructed the conserved gene repertoire of animal stem cells by transcriptomic profiling of totipotent archeocytes in the demosponge Ephydatia fluviatilis and by tracing shared molecular signatures with flatworm and Hydra stem cells. Phylostratigraphy analyses indicated that most of these stem-cell genes predate animal origin, with only few metazoan innovations, notably including several partners of the Piwi machinery known to promote genome stability. The ancestral stem-cell transcriptome is strikingly poor in transcription factors. Instead, it is rich in RNA regulatory actors, including components of the “germ-line multipotency program” and many RNA-binding proteins known as critical regulators of mammalian embryonic stem cells. PMID:26644562

  19. Whole-genome mapping reveals a large chromosomal inversion on Iberian Brucella suis biovar 2 strains.

    PubMed

    Ferreira, Ana Cristina; Dias, Ricardo; de Sá, Maria Inácia Corrêa; Tenreiro, Rogério

    2016-08-30

    Optical mapping is a technology able to quickly generate high resolution ordered whole-genome restriction maps of bacteria, being a proven approach to search for diversity among bacterial isolates. In this work, optical whole-genome maps were used to compare closely-related Brucella suis biovar 2 strains. This biovar is the only one isolated in domestic pigs and wild boars in Portugal and Spain and most of the strains share specific molecular characteristics establishing an Iberian clonal lineage that can be differentiated from another lineage mainly isolated in several Central European countries. We performed the BamHI whole-genome optical maps of five B. suis biovar 2 field strains, isolated from wild boars in Portugal and Spain (three from the Iberian lineage and two from the Central European one) as well as of the reference strain B. suis biovar 2 ATCC 23445 (Central European lineage, Denmark). Each strain showed a distinct, highly individual configuration of 228-231 BamHI fragments. Nevertheless, a low divergence was globally observed in chromosome II (1.6%) relative to chromosome I (2.4%). Optical mapping also disclosed genomic events associated with B. suis strains in chromosome I, namely one indel (3.5 kb) and one large inversion (944 kb). By using targeted-PCR in a set of 176 B. suis strains, including all biovars and haplotypes, the indel was found to be specific to the reference strain ATCC 23445 and the large inversion was shown to be an exclusive genomic marker of the Iberian clonal lineage of biovar 2. PMID:27527786

  20. OPERA-LG: efficient and exact scaffolding of large, repeat-rich eukaryotic genomes with performance guarantees.

    PubMed

    Gao, Song; Bertrand, Denis; Chia, Burton K H; Nagarajan, Niranjan

    2016-01-01

    The assembly of large, repeat-rich eukaryotic genomes represents a significant challenge in genomics. While long-read technologies have made the high-quality assembly of small, microbial genomes increasingly feasible, data generation can be expensive for larger genomes. OPERA-LG is a scalable, exact algorithm for the scaffold assembly of large, repeat-rich genomes, out-performing state-of-the-art programs for scaffold correctness and contiguity. It provides a rigorous framework for scaffolding of repetitive sequences and a systematic approach for combining data from different second-generation and third-generation sequencing technologies. OPERA-LG provides an avenue for systematic augmentation and improvement of thousands of existing draft eukaryotic genome assemblies. PMID:27169502

  1. Distilling Artificial Recombinants from Large Sets of Complete mtDNA Genomes

    PubMed Central

    Kong, Qing-Peng; Salas, Antonio; Sun, Chang; Fuku, Noriyuki; Tanaka, Masashi; Zhong, Li; Wang, Cheng-Ye; Yao, Yong-Gang; Bandelt, Hans-Jürgen

    2008-01-01

    Background Large-scale genome sequencing poses enormous problems to the logistics of laboratory work and data handling. When numerous fragments of different genomes are PCR amplified and sequenced in a laboratory, there is a high immanent risk of sample confusion. For genetic markers, such as mitochondrial DNA (mtDNA), which are free of natural recombination, single instances of sample mix-up involving different branches of the mtDNA phylogeny would give rise to reticulate patterns and should therefore be detectable. Methodology/Principal Findings We have developed a strategy for comparing new complete mtDNA genomes, one by one, to a current skeleton of the worldwide mtDNA phylogeny. The mutations distinguishing the reference sequence from a putative recombinant sequence can then be allocated to two or more different branches of this phylogenetic skeleton. Thus, one would search for two (or three) near-matches in the total mtDNA database that together best explain the variation seen in the recombinants. The evolutionary pathway from the mtDNA tree connecting this pair together with the recombinant then generates a grid-like median network, from which one can read off the exchanged segments. Conclusions We have applied this procedure to a large collection of complete human mtDNA sequences, where several recombinants could be distilled by our method. All these recombinant sequences were subsequently corrected by de novo experiments – fully concordant with the predictions from our data-analytical approach. PMID:18714389
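    The search for two near-matches that together best explain a putative recombinant can be illustrated with a toy brute-force scan. The sketch below is not the authors' procedure, which works on a phylogenetic skeleton and median networks. It assumes, purely for illustration, that each sequence is summarized as a set of mutated positions relative to the reference and that the artificial recombinant has a single breakpoint on a linearized coordinate. The function name and the haplotype labels are hypothetical.

```python
from itertools import permutations


def best_two_parent_explanation(query, database):
    """Find the ordered pair of database entries and the breakpoint that
    leave the fewest unexplained differences from the query.

    query    -- set of mutated positions in the putative recombinant
    database -- dict mapping a label to a set of mutated positions
    Returns (mismatches, left_label, right_label, breakpoint), or None
    when fewer than two database entries are available.
    """
    query = set(query)
    positions = sorted(query.union(*database.values()))
    # Candidate breakpoints: every observed position plus one past the last.
    candidates = positions + [positions[-1] + 1] if positions else [0]
    best = None
    for left, right in permutations(database, 2):
        for bp in candidates:
            # Expected mutations if the segment below bp comes from `left`
            # and the segment from bp onward comes from `right`.
            expected = {m for m in database[left] if m < bp}
            expected |= {m for m in database[right] if m >= bp}
            score = len(expected ^ query)  # symmetric difference = mismatches
            if best is None or score < best[0]:
                best = (score, left, right, bp)
    return best


if __name__ == "__main__":
    # Hypothetical toy data: positions are made up for illustration only.
    db = {"hapA": {100, 200, 9000}, "hapB": {150, 8000, 12000}, "hapC": {300, 5000}}
    print(best_two_parent_explanation({100, 200, 8000, 12000}, db))
```

    A zero-mismatch pair whose members sit on different branches of the phylogeny would flag the query as a candidate artificial recombinant; a realistic implementation would also have to allow for private mutations, two breakpoints on the circular molecule, and three-way mixtures.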

  2. Genome-scale phylogenetic function annotation of large and diverse protein families

    PubMed Central

    Engelhardt, Barbara E.; Jordan, Michael I.; Srouji, John R.; Brenner, Steven E.

    2011-01-01

    The Statistical Inference of Function Through Evolutionary Relationships (SIFTER) framework uses a statistical graphical model that applies phylogenetic principles to automate precise protein function prediction. Here we present a revised approach (SIFTER version 2.0) that enables annotations on a genomic scale. SIFTER 2.0 produces equivalently precise predictions compared to the earlier version on a carefully studied family and on a collection of 100 protein families. We have added an approximation method to SIFTER 2.0 and show a 500-fold improvement in speed with minimal impact on prediction results in the functionally diverse sulfotransferase protein family. On the Nudix protein family, previously inaccessible to the SIFTER framework because of the 66 possible molecular functions, SIFTER achieved 47.4% accuracy on experimental data (where BLAST achieved 34.0%). Finally, we used SIFTER to annotate all of the Schizosaccharomyces pombe proteins with experimental functional characterizations, based on annotations from proteins in 46 fungal genomes. SIFTER precisely predicted molecular function for 45.5% of the characterized proteins in this genome, as compared with four current function prediction methods that precisely predicted function for 62.6%, 30.6%, 6.0%, and 5.7% of these proteins. We use both precision-recall curves and ROC analyses to compare these genome-scale predictions across the different methods and to assess performance on different types of applications. SIFTER 2.0 is capable of predicting protein molecular function for large and functionally diverse protein families using an approximate statistical model, enabling phylogenetics-based protein function prediction for genome-wide analyses. The code for SIFTER and protein family data are available at http://sifter.berkeley.edu. PMID:21784873

  3. Biological consequences of ancient gene acquisition and duplication in the large genome soil bacterium, "Solibacter usitatus" strain Ellin6076

    SciTech Connect

    Challacombe, Jean F; Eichorst, Stephanie A; Xie, Gary; Kuske, Cheryl R; Hauser, Loren; Land, Miriam

    2009-01-01

    Bacterial genome sizes range from ca. 0.5 to 10 Mb and are influenced by gene duplication, horizontal gene transfer, gene loss and other evolutionary processes. Sequenced genomes of strains in the phylum Acidobacteria revealed that 'Solibacter usitatus' strain Ellin6076 harbors a 9.9 Mb genome. This large genome appears to have arisen by horizontal gene transfer via ancient bacteriophage and plasmid-mediated transduction, as well as widespread small-scale gene duplications. This has resulted in an increased number of paralogs that are potentially ecologically important (ecoparalogs). Low amino acid sequence identities among functional group members and lack of conserved gene order and orientation in the regions containing similar groups of paralogs suggest that most of the paralogs were not the result of recent duplication events. The genome sizes of cultured subdivision 1 and 3 strains in the phylum Acidobacteria were estimated using pulsed-field gel electrophoresis to determine the prevalence of the large genome trait within the phylum. Members of subdivision 1 were estimated to have smaller genome sizes ranging from ca. 2.0 to 4.8 Mb, whereas members of subdivision 3 had slightly larger genomes, from ca. 5.8 to 9.9 Mb. It is hypothesized that the large genome of strain Ellin6076 encodes traits that provide a selective metabolic, defensive and regulatory advantage in the variable soil environment.

  4. Initial characterization of the large genome of the salamander Ambystoma mexicanum using shotgun and laser capture chromosome sequencing

    PubMed Central

    Keinath, Melissa C.; Timoshevskiy, Vladimir A.; Timoshevskaya, Nataliya Y.; Tsonis, Panagiotis A.; Voss, S. Randal; Smith, Jeramiah J.

    2015-01-01

    Vertebrates exhibit substantial diversity in genome size, and some of the largest genomes exist in species that uniquely inform diverse areas of basic and biomedical research. For example, the salamander Ambystoma mexicanum (the Mexican axolotl) is a model organism for studies of regeneration, development and genome evolution, yet its genome is ~10× larger than the human genome. As part of a hierarchical approach toward improving genome resources for the species, we generated 600 Gb of shotgun sequence data and developed methods for sequencing individual laser-captured chromosomes. Based on these data, we estimate that the A. mexicanum genome is ~32 Gb. Notably, as much as 19 Gb of the A. mexicanum genome can potentially be considered single copy, which presumably reflects the evolutionary diversification of mobile elements that accumulated during an ancient episode of genome expansion. Chromosome-targeted sequencing permitted the development of assemblies within the constraints of modern computational platforms, allowed us to place 2062 genes on the two smallest A. mexicanum chromosomes and resolves key events in the history of vertebrate genome evolution. Our analyses show that the capture and sequencing of individual chromosomes is likely to provide valuable information for the systematic sequencing, assembly and scaffolding of large genomes. PMID:26553646

  5. Initial characterization of the large genome of the salamander Ambystoma mexicanum using shotgun and laser capture chromosome sequencing.

    PubMed

    Keinath, Melissa C; Timoshevskiy, Vladimir A; Timoshevskaya, Nataliya Y; Tsonis, Panagiotis A; Voss, S Randal; Smith, Jeramiah J

    2015-01-01

    Vertebrates exhibit substantial diversity in genome size, and some of the largest genomes exist in species that uniquely inform diverse areas of basic and biomedical research. For example, the salamander Ambystoma mexicanum (the Mexican axolotl) is a model organism for studies of regeneration, development and genome evolution, yet its genome is ~10× larger than the human genome. As part of a hierarchical approach toward improving genome resources for the species, we generated 600 Gb of shotgun sequence data and developed methods for sequencing individual laser-captured chromosomes. Based on these data, we estimate that the A. mexicanum genome is ~32 Gb. Notably, as much as 19 Gb of the A. mexicanum genome can potentially be considered single copy, which presumably reflects the evolutionary diversification of mobile elements that accumulated during an ancient episode of genome expansion. Chromosome-targeted sequencing permitted the development of assemblies within the constraints of modern computational platforms, allowed us to place 2062 genes on the two smallest A. mexicanum chromosomes and resolves key events in the history of vertebrate genome evolution. Our analyses show that the capture and sequencing of individual chromosomes is likely to provide valuable information for the systematic sequencing, assembly and scaffolding of large genomes. PMID:26553646

  6. A Roadmap for Natural Product Discovery Based on Large-Scale Genomics and Metabolomics

    PubMed Central

    Doroghazi, James R.; Albright, Jessica C.; Goering, Anthony W.; Ju, Kou-San; Haines, Robert R.; Tchalukov, Konstantin A.; Labeda, David P.; Kelleher, Neil L.; Metcalf, William W.

    2014-01-01

    Actinobacteria encode a wealth of natural product biosynthetic gene clusters (NPGCs), whose systematic study is complicated by numerous repetitive motifs. By combining several metrics we developed a method for global classification of these gene clusters into families (GCFs) and analyzed the biosynthetic capacity of Actinobacteria in 830 genome sequences, including 344 obtained for this project. The GCF network, comprised of 11,422 gene clusters grouped into 4,122 GCFs, was validated in hundreds of strains by correlating confident mass spectrometric detection of known small molecules with the presence/absence of their established biosynthetic gene clusters. The method also linked previously unassigned GCFs to known natural products, an approach that will enable de novo, bioassay-free discovery of novel natural products using large data sets. Extrapolation from the 830-genome dataset reveals that Actinobacteria encode hundreds of thousands of future drug leads, while the strong correlation between phylogeny and GCFs frames a roadmap to efficiently access them. PMID:25262415

  7. Ultra Large Gene Families: A Matter of Adaptation or Genomic Parasites?

    PubMed

    Schiffer, Philipp H; Gravemeyer, Jan; Rauscher, Martina; Wiehe, Thomas

    2016-01-01

    Gene duplication is an important mechanism of molecular evolution. It offers a fast track to modification, diversification, redundancy or rescue of gene function. However, duplication may also be neutral or (slightly) deleterious, and often ends in pseudogenisation. Here, we investigate the phylogenetic distribution of ultra large gene families on long and short evolutionary time scales. In particular, we focus on a family of NACHT-domain and leucine-rich-repeat-containing (NLR)-genes, which we previously found in large numbers to occupy one chromosome arm of the zebrafish genome. We were interested to see whether such a tight clustering is characteristic for ultra large gene families. Our data reconfirm that most gene family inflations are lineage-specific, but we can only identify very few gene clusters. Based on our observations we hypothesise that, beyond a certain size threshold, ultra large gene families continue to proliferate in a mechanism we term "run-away evolution". This process might ultimately lead to the failure of genomic integrity and drive species to extinction. PMID:27509525

  8. Draft genome sequence of the Daphnia pathogen Octosporea bayeri: insights into the gene content of a large microsporidian genome and a model for host-parasite interactions

    PubMed Central

    2009-01-01

    Background The highly compacted 2.9-Mb genome of Encephalitozoon cuniculi placed the microsporidia in the spotlight, encoding a mere 2,000 proteins and a highly reduced suite of biochemical pathways. This extreme level of reduction is not universal across the microsporidia, with genomes known to vary up to sixfold in size, suggesting that some genomes may harbor a gene content that is not as reduced as that of Enc. cuniculi. In this study, we present an in-depth survey of the large genome of Octosporea bayeri, a pathogen of Daphnia magna, with an estimated genome size of 24 Mb, in order to shed light on the organization and content of a large microsporidian genome. Results Using Illumina sequencing, 898 Mb of O. bayeri genome sequence was generated, resulting in 13.3 Mb of unique sequence. We annotated a total of 2,174 genes, of which 893 encode proteins with assigned function. The gene density of the O. bayeri genome is very low on average, but also highly uneven, so gene-dense regions also occur. The data presented here suggest that the O. bayeri proteome is well represented in this analysis and is more complex than that of Enc. cuniculi. Functional annotation of O. bayeri proteins suggests that this species might be less biochemically dependent on its host for its metabolism than its more reduced relatives. Conclusions The combination of the data presented here, together with the imminent annotated genome of Daphnia magna, will provide a wealth of genetic and genomic tools to study host-parasite interactions in an interesting model for pathogenesis. PMID:19807911

  9. Ancestral European roots of Helicobacter pylori in India

    PubMed Central

    Devi, S Manjulata; Ahmed, Irshad; Francalacci, Paolo; Hussain, M Abid; Akhter, Yusuf; Alvi, Ayesha; Sechi, Leonardo A; Mégraud, Francis; Ahmed, Niyaz

    2007-01-01

    Background The human gastric pathogen Helicobacter pylori has co-evolved with its host and therefore, origins and expansion of multiple populations and sub-populations of H. pylori mirror ancient human migrations. Ancestral origins of H. pylori in the vast Indian subcontinent are debatable. It is not clear how different waves of human migrations in South Asia shaped the population structure of H. pylori. We tried to address these issues through mapping genetic origins of present day H. pylori in India and their genomic comparison with hundreds of isolates from different geographic regions. Results We attempted to dissect genetic identity of strains by multilocus sequence typing (MLST) of the 7 housekeeping genes (atpA, efp, ureI, ppa, mutY, trpC, yphC) and phylogeographic analysis of haplotypes using MEGA and NETWORK software while incorporating DNA sequences and genotyping data of whole cag pathogenicity-islands (cagPAI). The distribution of cagPAI genes within these strains was analyzed by using PCR and the geographic type of cagA phosphorylation motif EPIYA was determined by gene sequencing. All the isolates analyzed revealed European ancestry and belonged to H. pylori sub-population, hpEurope. The cagPAI harbored by Indian strains revealed European features upon PCR based analysis and whole PAI sequencing. Conclusion These observations suggest that H. pylori strains in India share ancestral origins with their European counterparts. Further, non-existence of other sub-populations such as hpAfrica and hpEastAsia, at least in our collection of isolates, suggests that the hpEurope strains enjoyed a special fitness advantage in Indian stomachs to out-compete any endogenous strains. These results also might support hypotheses related to gene flow in India through Indo-Aryans and arrival of Neolithic practices and languages from the Fertile Crescent. PMID:17584914

  10. Reverse engineering and analysis of large genome-scale gene networks

    PubMed Central

    Aluru, Maneesha; Zola, Jaroslaw; Nettleton, Dan; Aluru, Srinivas

    2013-01-01

    Reverse engineering the whole-genome networks of complex multicellular organisms remains a challenge. While simpler models easily scale to large numbers of genes and gene expression datasets, more accurate models are compute-intensive, limiting their scale of applicability. To enable fast and accurate reconstruction of large networks, we developed Tool for Inferring Network of Genes (TINGe), a parallel mutual information (MI)-based program. The novel features of our approach include: (i) B-spline-based formulation for linear-time computation of MI, (ii) a novel algorithm for direct permutation testing and (iii) development of parallel algorithms to reduce run-time and facilitate construction of large networks. We assess the quality of our method by comparison with ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) and GeneNet and demonstrate its unique capability by reverse engineering the whole-genome network of Arabidopsis thaliana from 3137 Affymetrix ATH1 GeneChips in just 9 min on a 1024-core cluster. We further report on the development of a new software tool, Gene Network Analyzer (GeNA), for extracting context-specific subnetworks from a given set of seed genes. Using TINGe and GeNA, we performed analysis of 241 Arabidopsis AraCyc 8.0 pathways, and the results are made available through the web. PMID:23042249
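
    The idea of scoring a gene pair by mutual information and then checking that score against permuted data can be sketched in a few lines. This is only an illustration, not TINGe itself: it swaps the B-spline estimator for a plain equal-width histogram estimator, runs serially, and the function names (`binned_mi`, `permutation_p_value`) and the bin and permutation counts are assumptions made for the example.

```python
import math
import random
from collections import Counter


def binned_mi(x, y, bins=4):
    """Mutual information (in nats) between two expression profiles.

    Uses equal-width bins, a simpler stand-in for a B-spline estimator.
    """
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # avoid zero width for constant input
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # sum over occupied cells of p(i,j) * log(p(i,j) / (p(i) * p(j)))
    return sum((c / n) * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())


def permutation_p_value(x, y, n_perm=200, seed=0):
    """Observed MI and an empirical p-value from shuffling one profile."""
    rng = random.Random(seed)
    observed = binned_mi(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any dependence between x and y
        if binned_mi(x, shuffled) >= observed:
            hits += 1
    # add-one correction keeps the p-value strictly positive
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy profiles: gene_b is a deterministic function of gene_a.
gene_a = [float(i) for i in range(40)]
gene_b = [2.0 * a + 1.0 for a in gene_a]
mi, p = permutation_p_value(gene_a, gene_b)
```

    In a network setting, an edge between two genes would be kept only when its p-value falls below a chosen threshold; repeating this over all gene pairs is the step that motivates a parallel implementation.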

  11. Breeding signatures of rice improvement revealed by a genomic variation map from a large germplasm collection

    PubMed Central

    Xie, Weibo; Wang, Gongwei; Yuan, Meng; Yao, Wen; Lyu, Kai; Zhao, Hu; Yang, Meng; Li, Pingbo; Zhang, Xing; Yuan, Jing; Wang, Quanxiu; Liu, Fang; Dong, Huaxia; Zhang, Lejing; Li, Xinglei; Meng, Xiangzhou; Zhang, Wan; Xiong, Lizhong; He, Yuqing; Wang, Shiping; Yu, Sibin; Xu, Caiguo; Luo, Jie; Li, Xianghua; Xiao, Jinghua; Lian, Xingming; Zhang, Qifa

    2015-01-01

    Intensive rice breeding over the past 50 y has dramatically increased productivity especially in the indica subspecies, but our knowledge of the genomic changes associated with such improvement has been limited. In this study, we analyzed low-coverage sequencing data of 1,479 rice accessions from 73 countries, including landraces and modern cultivars. We identified two major subpopulations, indica I (IndI) and indica II (IndII), in the indica subspecies, which corresponded to the two putative heterotic groups resulting from independent breeding efforts. We detected 200 regions spanning 7.8% of the rice genome that had been differentially selected between IndI and IndII, and thus referred to as breeding signatures. These regions included large numbers of known functional genes and loci associated with important agronomic traits revealed by genome-wide association studies. Grain yield was positively correlated with the number of breeding signatures in a variety, suggesting that the number of breeding signatures in a line may be useful for predicting agronomic potential and the selected loci may provide targets for rice improvement. PMID:26358652

  12. Ancestral paralogs and pseudoparalogs and their role in the emergence of the eukaryotic cell

    PubMed Central

    Makarova, Kira S.; Wolf, Yuri I.; Mekhedov, Sergey L.; Mirkin, Boris G.; Koonin, Eugene V.

    2005-01-01

    Gene duplication is a crucial mechanism of evolutionary innovation. A substantial fraction of eukaryotic genomes consists of paralogous gene families. We assess the extent of ancestral paralogy, which dates back to the last common ancestor of all eukaryotes, and examine the origins of the ancestral paralogs and their potential roles in the emergence of the eukaryotic cell complexity. A parsimonious reconstruction of ancestral gene repertoires shows that 4137 orthologous gene sets in the last eukaryotic common ancestor (LECA) map back to 2150 orthologous sets in the hypothetical first eukaryotic common ancestor (FECA) [paralogy quotient (PQ) of 1.92]. Analogous reconstructions show significantly lower levels of paralogy in prokaryotes, 1.19 for archaea and 1.25 for bacteria. The only functional class of eukaryotic proteins with a significant excess of paralogous clusters over the mean includes molecular chaperones and proteins with related functions. Almost all genes in this category underwent multiple duplications during early eukaryotic evolution. In structural terms, the most prominent sets of paralogs are superstructure-forming proteins with repetitive domains, such as WD-40 and TPR. In addition to the true ancestral paralogs which evolved via duplication at the onset of eukaryotic evolution, numerous pseudoparalogs were detected, i.e. homologous genes that apparently were acquired by early eukaryotes via different routes, including horizontal gene transfer (HGT) from diverse bacteria. The results of this study demonstrate a major increase in the level of gene paralogy as a hallmark of the early evolution of eukaryotes. PMID:16106042
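
    The paralogy quotient quoted above is consistent with a simple ratio of descendant orthologous sets to the ancestral sets they map back to. A minimal check of that arithmetic, assuming this reading of PQ (the function name is hypothetical):

```python
def paralogy_quotient(descendant_sets, ancestral_sets):
    """Ratio of orthologous gene sets in the descendant to those in the ancestor."""
    return descendant_sets / ancestral_sets


# LECA sets mapped back to FECA sets, using the counts given in the abstract.
pq_eukaryotes = paralogy_quotient(4137, 2150)
print(round(pq_eukaryotes, 2))  # 1.92
```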

  13. A method for the large scale isolation of high transformation efficiency fungal genomic DNA.

    PubMed

    Zhang, D; Yang, Y; Castlebury, L A; Cerniglia, C E

    1996-12-01

    A procedure for isolation of genomic DNA from the zygomycete Cunninghamella elegans and other filamentous fungi and yeasts is reported. This procedure involves disruption of cells by grinding using dry ice, removal of polysaccharides using cetyltrimethylammonium bromide and by phenol extractions, and precipitation of DNA with isopropanol at room temperature. The isolation method produced large scale (approximately 1 mg DNA/5 g wet cells) and highly purified high molecular mass DNA. Sau3AI partially digested DNA showed high transformation efficiency (> 10^6/100 ng DNA) when ligated to ZAP-express lambda vector. PMID:8961565

  14. The gradient boosting algorithm and random boosting for genome-assisted evaluation in large data sets.

    PubMed

    González-Recio, O; Jiménez-Montero, J A; Alenda, R

    2013-01-01

    In the next few years, with the advent of high-density single nucleotide polymorphism (SNP) arrays and genome sequencing, genomic evaluation methods will need to deal with a large number of genetic variants and an increasing sample size. The boosting algorithm is a machine-learning technique that may alleviate the drawbacks of dealing with such large data sets. This algorithm combines different predictors in a sequential manner with some shrinkage on them; each predictor is applied consecutively to the residuals from the committee formed by the previous ones to form a final prediction based on a subset of covariates. Here, a detailed description is provided and examples using a toy data set are included. A modification of the algorithm called "random boosting" was proposed to increase predictive ability and decrease computation time of genome-assisted evaluation in large data sets. Random boosting uses a random selection of markers to add a subsequent weak learner to the predictive model. These modifications were applied to a real data set composed of 1,797 bulls genotyped for 39,714 SNP. Deregressed proofs of 4 yield traits and 1 type trait from January 2009 routine evaluations were used as dependent variables. A 2-fold cross-validation scenario was implemented. Sires born before 2005 were used as a training sample (1,576 and 1,562 for production and type traits, respectively), whereas younger sires were used as a testing sample to evaluate predictive ability of the algorithm on yet-to-be-observed phenotypes. Comparison with the original algorithm was provided. The predictive ability of the algorithm was measured as Pearson correlations between observed and predicted responses. Further, estimated bias was computed as the average difference between observed and predicted phenotypes. The results showed that the modification of the original boosting algorithm could be run in 1% of the time used with the original algorithm and with negligible differences in accuracy
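
    The sequential fit-to-residuals scheme with shrinkage, and the random-marker variant, can be sketched as follows. This is a rough illustration under stated assumptions rather than the authors' implementation: the weak learner is taken to be a one-marker least-squares fit, and the names (`random_boost`, `subset_frac`) and default settings are invented for the example.

```python
import random


def fit_single_marker(column, residuals):
    """Least-squares slope and intercept of the residuals on one marker."""
    n = len(column)
    mean_x = sum(column) / n
    mean_r = sum(residuals) / n
    sxx = sum((x - mean_x) ** 2 for x in column)
    if sxx == 0:  # monomorphic marker: only the intercept can be fitted
        return 0.0, mean_r
    sxr = sum((x - mean_x) * (r - mean_r) for x, r in zip(column, residuals))
    slope = sxr / sxx
    return slope, mean_r - slope * mean_x


def mse(y, pred):
    """Mean squared error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)


def random_boost(X, y, n_rounds=100, shrinkage=0.1, subset_frac=0.5, seed=0):
    """Boosting where each round only sees a random subset of markers.

    X is a list of rows (individuals) of marker codes; y holds phenotypes.
    Returns the list of (marker, slope, intercept) learners and the fitted values.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    k = max(1, int(subset_frac * p))
    pred = [sum(y) / n] * n  # start from the phenotype mean
    learners = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        best = None
        for j in rng.sample(range(p), k):  # random selection of markers
            column = [row[j] for row in X]
            slope, intercept = fit_single_marker(column, residuals)
            sse = sum((r - intercept - slope * x) ** 2
                      for x, r in zip(column, residuals))
            if best is None or sse < best[0]:
                best = (sse, j, slope, intercept)
        _, j, slope, intercept = best
        # shrink the chosen weak learner before adding it to the committee
        learners.append((j, shrinkage * slope, shrinkage * intercept))
        pred = [pi + shrinkage * (intercept + slope * row[j])
                for pi, row in zip(pred, X)]
    return learners, pred
```

    Setting `subset_frac=1.0` makes every round search all markers, which corresponds to the ordinary boosting loop in this sketch; smaller fractions cut the per-round search cost, which is the kind of saving the abstract attributes to random boosting.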

  15. In search of ancestral Kilauea volcano

    USGS Publications Warehouse

    Lipman, P.W.; Sisson, T.W.; Ui, T.; Naka, J.

    2000-01-01

    Submersible observations and samples show that the lower south flank of Hawaii, offshore from Kilauea volcano and the active Hilina slump system, consists entirely of compositionally diverse volcaniclastic rocks; pillow lavas are confined to shallow slopes. Submarine-erupted basalt clasts have strongly variable alkalic and transitional basalt compositions (to 41% SiO2, 10.8% alkalies), contrasting with present-day Kilauea tholeiites. The volcaniclastic rocks provide a unique record of ancestral alkalic growth of an archetypal hotspot volcano, including transition to its tholeiitic shield stage, and associated slope-failure events.

  16. Managing Large-Scale Genomic Datasets and Translation into Clinical Practice

    PubMed Central

    2014-01-01

    Summary Objective To summarize excellent current research in the field of Bioinformatics and Translational Informatics with application in the health domain. Method We provide a synopsis of the articles selected for the IMIA Yearbook 2014, from which we attempt to derive a synthetic overview of current and future activities in the field. A first step of selection was performed by querying MEDLINE with a list of MeSH descriptors completed by a list of terms adapted to the section. Each section editor evaluated independently the set of 1,851 articles and 15 articles were retained for peer-review. Results The selection and evaluation process of this Yearbook’s section on Bioinformatics and Translational Informatics yielded three excellent articles regarding data management and genome medicine. In the first article, the authors present VEST (Variant Effect Scoring Tool) which is a supervised machine learning tool for prioritizing variants found in exome sequencing projects that are more likely involved in human Mendelian diseases. In the second article, the authors show how to infer surnames of male individuals by crossing anonymous publicly available genomic data from the Y chromosome and public genealogy data banks. The third article presents a statistical framework called iCluster+ that can perform pattern discovery in integrated cancer genomic data. This framework was able to determine different tumor subtypes in colon cancer. Conclusions The current research activities still attest the continuous convergence of Bioinformatics and Medical Informatics, with a focus this year on large-scale biological, genomic, and Electronic Health Records data. Indeed, there is a need for powerful tools for managing and interpreting complex data, but also a need for user-friendly tools developed for the clinicians in their daily practice. All the recent research and development efforts are contributing to the challenge of impacting clinically the results and even going towards a

  17. A Common Ancestral Mutation in CRYBB3 Identified in Multiple Consanguineous Families with Congenital Cataracts

    PubMed Central

    Irum, Bushra; Khan, Arif O.; Wang, Qiwei; Li, David; Khan, Asma A.; Husnain, Tayyab; Akram, Javed; Riazuddin, Sheikh

    2016-01-01

    Purpose This study was performed to investigate the genetic determinants of autosomal recessive congenital cataracts in large consanguineous families. Methods Affected individuals underwent a detailed ophthalmological examination and slit-lamp photographs of the cataractous lenses were obtained. An aliquot of blood was collected from all participating family members and genomic DNA was extracted from white blood cells. Initially, a genome-wide scan was performed with genomic DNAs of family PKCC025 followed by exclusion analysis of our familial cohort of congenital cataracts. Protein-coding exons of CRYBB1, CRYBB2, CRYBB3, and CRYBA4 were sequenced bidirectionally. A haplotype was constructed with SNPs flanking the causal mutation for affected individuals in all four families, while the probability that the four familial cases have a common founder was estimated using EM and CHM-based algorithms. The expression of Crybb3 in the developing murine lens was investigated using TaqMan assays. Results The clinical and ophthalmological examinations suggested that all affected individuals had nuclear cataracts. Genome-wide linkage analysis localized the causal phenotype in family PKCC025 to chromosome 22q with statistically significant two-point logarithm of odds (LOD) scores. Subsequently, we localized three additional families, PKCC063, PKCC131, and PKCC168 to chromosome 22q. Bidirectional Sanger sequencing identified a missense variation: c.493G>C (p.Gly165Arg) in CRYBB3 that segregated with the disease phenotype in all four familial cases. This variation was not found in ethnically matched control chromosomes, the NHLBI exome variant server, or the 1000 Genomes or dbSNP databases. Interestingly, all four families harbor a unique disease haplotype that strongly suggests a common founder of the causal mutation (p<1.64E-10). We observed expression of Crybb3 in the mouse lens as early as embryonic day 15 (E15), and expression remained relatively steady throughout

  18. The search for ancestral nervous systems: an integrative and comparative approach.

    PubMed

    Satterlie, Richard A

    2015-02-15

    Even the most basal multicellular nervous systems are capable of producing complex behavioral acts that involve the integration and combination of simple responses, and decision-making when presented with conflicting stimuli. This requires an understanding beyond that available from genomic investigations, and calls for an integrative and comparative approach, where the power of genomic/transcriptomic techniques is coupled with morphological, physiological and developmental experimentation to identify common and species-specific nervous system properties for the development and elaboration of phylogenomic reconstructions. With careful selection of genes and gene products, we can continue to make significant progress in our search for ancestral nervous system organizations. PMID:25696824

  19. Structural characterization of genomes by large scale sequence-structure threading: application of reliability analysis in structural genomics

    PubMed Central

    Cherkasov, Artem; Ho Sui, Shannan J; Brunham, Robert C; Jones, Steven JM

    2004-01-01

    Background We establish that the occurrence of protein folds among genomes can be accurately described with a Weibull function. Systems which exhibit Weibull character can be interpreted with reliability theory commonly used in engineering analysis. For instance, Weibull distributions are widely used in reliability, maintainability and safety work to model time-to-failure of mechanical devices, mechanisms, building constructions and equipment. Results We have found that the Weibull function describes protein fold distribution within and among genomes more accurately than conventional power functions which have been used in a number of structural genomic studies reported to date. It has also been found that the Weibull reliability parameter β for protein fold distributions varies between genomes and may reflect differences in rates of gene duplication in evolutionary history of organisms. Conclusions The results of this work demonstrate that reliability analysis can provide useful insights and testable predictions in the fields of comparative and structural genomics. PMID:15274750
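
    For readers unfamiliar with the reliability-theory vocabulary, the two-parameter Weibull reliability function is R(x) = exp(-(x/η)^β), and its shape parameter β can be estimated from a straight-line "Weibull plot". The sketch below is a generic illustration of that fit, not the procedure used in the study, which the abstract does not describe; the median-rank formula and the function names are assumptions.

```python
import math


def weibull_reliability(x, beta, eta):
    """Probability of exceeding x under a Weibull with shape beta and scale eta."""
    return math.exp(-((x / eta) ** beta))


def fit_weibull(samples):
    """Estimate (beta, eta) by least squares on the linearized Weibull plot.

    Uses ln(-ln(1 - F)) = beta * ln(x) - beta * ln(eta), with F taken from
    Bernard's median-rank approximation. Samples must be positive and not
    all identical.
    """
    xs = sorted(samples)
    n = len(xs)
    points = []
    for rank, x in enumerate(xs, start=1):
        f = (rank - 0.3) / (n + 0.4)  # median-rank estimate of the CDF
        points.append((math.log(x), math.log(-math.log(1.0 - f))))
    mean_u = sum(u for u, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    beta = (sum((u - mean_u) * (v - mean_v) for u, v in points)
            / sum((u - mean_u) ** 2 for u, _ in points))
    intercept = mean_v - beta * mean_u  # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta
```

    Comparing fitted β values between data sets is the kind of between-genome comparison the abstract refers to; a power-law fit would instead regress log frequency on log rank.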

  20. A new tool called DISSECT for analysing large genomic data sets using a Big Data approach

    PubMed Central

    Canela-Xandri, Oriol; Law, Andy; Gray, Alan; Woolliams, John A.; Tenesa, Albert

    2015-01-01

    Large-scale genetic and genomic data are increasingly available and the major bottleneck in their analysis is a lack of sufficiently scalable computational tools. To address this problem in the context of complex traits analysis, we present DISSECT. DISSECT is a new and freely available software that is able to exploit the distributed-memory parallel computational architectures of compute clusters, to perform a wide range of genomic and epidemiologic analyses, which currently can only be carried out on reduced sample sizes or under restricted conditions. We demonstrate the usefulness of our new tool by addressing the challenge of predicting phenotypes from genotype data in human populations using mixed-linear model analysis. We analyse simulated traits from 470,000 individuals genotyped for 590,004 SNPs in ∼4 h using the combined computational power of 8,400 processor cores. We find that prediction accuracies in excess of 80% of the theoretical maximum could be achieved with large sample sizes. PMID:26657010

  1. A new tool called DISSECT for analysing large genomic data sets using a Big Data approach.

    PubMed

    Canela-Xandri, Oriol; Law, Andy; Gray, Alan; Woolliams, John A; Tenesa, Albert

    2015-01-01

    Large-scale genetic and genomic data are increasingly available and the major bottleneck in their analysis is a lack of sufficiently scalable computational tools. To address this problem in the context of complex traits analysis, we present DISSECT. DISSECT is a new and freely available software that is able to exploit the distributed-memory parallel computational architectures of compute clusters, to perform a wide range of genomic and epidemiologic analyses, which currently can only be carried out on reduced sample sizes or under restricted conditions. We demonstrate the usefulness of our new tool by addressing the challenge of predicting phenotypes from genotype data in human populations using mixed-linear model analysis. We analyse simulated traits from 470,000 individuals genotyped for 590,004 SNPs in ∼4 h using the combined computational power of 8,400 processor cores. We find that prediction accuracies in excess of 80% of the theoretical maximum could be achieved with large sample sizes. PMID:26657010

  2. Cross-Platform Assessment of Genomic Imbalance Confirms the Clinical Relevance of Genomic Complexity and Reveals Loci with Potential Pathogenic Roles in Diffuse Large B-Cell Lymphoma

    PubMed Central

    Dias, Lizalynn M.; Thodima, Venkata; Friedman, Julia; Ma, Charles; Guttapalli, Asha; Mendiratta, Geetu; Siddiqi, Imran N.; Syrbu, Sergei; Chaganti, R. S. K.; Houldsworth, Jane

    2016-01-01

    Genomic copy number alterations (CNAs) in diffuse large B-cell lymphoma (DLBCL) have roles in disease pathogenesis but overall clinical relevance remains unclear. Herein, an unbiased algorithm was uniformly applied across three genome profiling datasets comprising 392 newly-diagnosed DLBCL specimens that defined 32 overlapping CNAs, involving 36 minimal common regions (MCRs). Scoring criteria were established for 50 aberrations within the MCRs while considering peak gains/losses. Application of these criteria to independent datasets revealed novel candidate genes with coordinated expression, such as CNOT2, potentially with pathogenic roles. No one single aberration significantly associated with patient outcome across datasets, but genomic complexity, defined by imbalance in more than one MCR, significantly portended adverse outcome in two of three independent datasets. Thus, the standardized scoring of CNAs currently developed can be uniformly applied across platforms, affording robust validation of genomic imbalance and complexity in DLBCL and overall clinical utility as biomarkers of patient outcome. PMID:26294112

  3. An Ancestral Recombination Graph for Diploid Populations with Skewed Offspring Distribution

    PubMed Central

    Birkner, Matthias; Blath, Jochen; Eldon, Bjarki

    2013-01-01

    A large offspring-number diploid biparental multilocus population model of Moran type is our object of study. At each time step, a pair of diploid individuals drawn uniformly at random contributes offspring to the population. The number of offspring can be large relative to the total population size. Similar “heavily skewed” reproduction mechanisms have been recently considered by various authors (cf. e.g., Eldon and Wakeley 2006, 2008) and reviewed by Hedgecock and Pudovkin (2011). Each diploid parental individual contributes exactly one chromosome to each diploid offspring, and hence ancestral lineages can coalesce only when in distinct individuals. A separation-of-timescales phenomenon is thus observed. A result of Möhle (1998) is extended to obtain convergence of the ancestral process to an ancestral recombination graph necessarily admitting simultaneous multiple mergers of ancestral lineages. The usual ancestral recombination graph is obtained as a special case of our model when the parents contribute only one offspring to the population each time. Due to diploidy and large offspring numbers, novel effects appear. For example, the marginal genealogy at each locus admits simultaneous multiple mergers in up to four groups, and different loci remain substantially correlated even as the recombination rate grows large. Thus, genealogies for loci far apart on the same chromosome remain correlated. Correlation in coalescence times for two loci is derived and shown to be a function of the coalescence parameters of our model. Extending the observations by Eldon and Wakeley (2008), predictions of linkage disequilibrium are shown to be functions of the reproduction parameters of our model, in addition to the recombination rate. Correlations in ratios of coalescence times between loci can be high, even when the recombination rate is high and sample size is large, in large offspring-number populations, as suggested by simulations, hinting at how to distinguish between

  4. Transitions in Sexuality: Recapitulation of an Ancestral Tri- and Tetrapolar Mating System in Cryptococcus neoformans

    PubMed Central

    Hsueh, Yen-Ping; Fraser, James A.; Heitman, Joseph

    2008-01-01

    Sex is orchestrated by the mating-type locus (MAT) in fungi and by sex chromosomes in plants and animals. In fungi, two patterns of sexuality occur: bipolar with a single, typically biallelic sex determinant that promotes inbreeding, and tetrapolar with two unlinked, often multiallelic sex determinants that restrict inbreeding. Multiallelism in either bipolar or tetrapolar mating systems promotes outcrossing. Cryptococcus neoformans is a pathogenic bipolar yeast with two unusually large MAT alleles (a/α) spanning >100 kb, ∼100-fold larger than many other fungal MAT loci. Based on comparative genomic analysis, this unusual MAT locus is hypothesized to have evolved from an ancestral tetrapolar system. In this model, the unlinked homeodomain (HD) transcription factor and pheromone/receptor tetrapolar loci acquired additional sex-related genes and then fused via chromosomal translocation, forming an intermediate transitional mating system (which we term tripolar), which then underwent recombination and gene conversion to fashion the extant bipolar MAT alleles. To experimentally validate this model, C. neoformans was engineered to have a tetrapolar mating system by relocating the MAT SXI1α and SXI2a HD genes to an unlinked genomic locale. Genetic and molecular analyses revealed that this modified organism could complete a tetrapolar sexual cycle. Analysis of progeny generated from bipolar, tripolar, and tetrapolar crosses provides direct experimental evidence that the tripolar state confers decreased fertility and therefore may represent an unstable evolutionary intermediate. These findings illustrate how transitions between outcrossing and inbreeding preference occur by involving sex determinant linkage and collapse from multiallelic to biallelic sex determination, providing insights into both fungal sex evolution and early steps in sex chromosome evolution. PMID:18723606

  5. Rapid pair-wise synteny analysis of large bacterial genomes using web-based GeneOrder4.0

    PubMed Central

    2010-01-01

    Background The growing whole genome sequence databases necessitate the development of user-friendly software tools to mine these data. Web-based tools are particularly useful to wet-bench biologists as they enable platform-independent analysis of sequence data, without having to perform complex programming tasks and software compiling. Findings GeneOrder4.0 is a web-based "on-the-fly" synteny and gene order analysis tool for comparative bacterial genomics (ca. 8 Mb). It enables the visualization of synteny by plotting protein similarity scores between two genomes and it also provides visual annotation of "hypothetical" proteins from older archived genomes based on more recent annotations. Conclusions The web-based software tool GeneOrder4.0 is a user-friendly application that has been updated to allow the rapid analysis of synteny and gene order in large bacterial genomes. It is developed with the wet-bench researcher in mind. PMID:20178631

  6. Genomic mechanisms underlying PARK2 large deletions identified in a cohort of patients with PD

    PubMed Central

    Morais, Sara; Bastos-Ferreira, Rita; Sequeiros, Jorge

    2016-01-01

    Objectives: To identify the genomic mechanisms that result in PARK2 large gene deletions. Methods: We conducted mutation screening using PCR amplification of PARK2-coding regions and exon-intron boundaries, followed by sequencing to evaluate a large series of 244 unrelated Portuguese patients with symptoms of Parkinson disease. For the detection of large gene rearrangements, we performed multiplex ligation-dependent probe amplification, followed by long-range PCR and sequencing to map deletion breakpoints. Results: We identified biallelic pathogenic parkin mutations in 40 of the 244 patients. There were 18 different mutations, some of them novel. This study included mapping of 17 deletion breakpoints showing that nonhomologous end joining is the most common mechanism responsible for these gene rearrangements. None of these deletion breakpoints were previously described, and only one was present in 2 unrelated families, indicating that most of the deletions result from independent events. Conclusions: The c.155delA mutation is highly prevalent in the Portuguese population (62.5% of the cases). Large deletions were present in 42.5% of the patients. We present the largest study on the molecular mechanisms that mediate PARK2 deletions in a homogeneous population. PMID:27182553

  7. Differentially expressed genes match bill morphology and plumage despite largely undifferentiated genomes in a Holarctic songbird.

    PubMed

    Mason, Nicholas A; Taylor, Scott A

    2015-06-01

    Understanding the patterns and processes that contribute to phenotypic diversity and speciation is a central goal of evolutionary biology. Recently, high-throughput sequencing has provided unprecedented phylogenetic resolution in many lineages that have experienced rapid diversification. The Holarctic redpoll finches (Genus: Acanthis) provide an intriguing example of a recent, phenotypically diverse lineage; traditional sequencing and genotyping methods have failed to detect any genetic differences between currently recognized species, despite marked variation in plumage and morphology within the genus. We examined variation among 20 712 anonymous single nucleotide polymorphisms (SNPs) distributed throughout the redpoll genome in combination with 215 825 SNPs within the redpoll transcriptome, gene expression data and ecological niche modelling to evaluate genetic and ecological differentiation among currently recognized species. Expanding upon previous findings, we present evidence of (i) largely undifferentiated genomes among currently recognized species; (ii) substantial niche overlap across the North American Acanthis range; and (iii) a strong relationship between polygenic patterns of gene expression and continuous phenotypic variation within a sample of redpolls from North America. The patterns we report may be caused by high levels of ongoing gene flow between polymorphic populations, incomplete lineage sorting accompanying very recent or ongoing divergence, variation in cis-regulatory elements, or phenotypic plasticity, but do not support a scenario of prolonged isolation and subsequent secondary contact. Together, these findings highlight ongoing theoretical and computational challenges presented by recent, rapid bouts of phenotypic diversification and provide new insight into the evolutionary dynamics of an intriguing, understudied non-model system. PMID:25735539

  8. Large-scale analysis of tandem repeat variability in the human genome

    PubMed Central

    Duitama, Jorge; Zablotskaya, Alena; Gemayel, Rita; Jansen, An; Belet, Stefanie; Vermeesch, Joris R.; Verstrepen, Kevin J.; Froyen, Guy

    2014-01-01

    Tandem repeats are short DNA sequences that are repeated head-to-tail with a propensity to be variable. They constitute a significant proportion of the human genome, also occurring within coding and regulatory regions. Variation in these repeats can alter the function and/or expression of genes, allowing organisms to swiftly adapt to novel environments. Importantly, some repeat expansions have also been linked to certain neurodegenerative diseases. Therefore, accurate sequencing of tandem repeats could contribute to our understanding of common phenotypic variability and might uncover missing genetic factors in idiopathic clinical conditions. However, despite long-standing evidence for the functional role of repeats, they are largely ignored because of technical limitations in sequencing, mapping and typing. Here, we report on a novel capture technique and data filtering protocol that allowed simultaneous sequencing of thousands of tandem repeats in the human genomes of a three-generation family using GS-FLX-plus Titanium technology. Our results demonstrated that up to 7.6% of tandem repeats in this family (4% in coding sequences) differ from the reference sequence, and identified a de novo variation in the family tree. The method opens new routes to look at this underappreciated type of genetic variability, including the identification of novel disease-related repeats. PMID:24682812

  9. The Exceptionally Large Chloroplast Genome of the Green Alga Floydiella terrestris Illuminates the Evolutionary History of the Chlorophyceae

    PubMed Central

    Brouard, Jean-Simon; Otis, Christian; Lemieux, Claude; Turmel, Monique

    2010-01-01

    The Chlorophyceae, an advanced class of chlorophyte green algae, comprises five lineages that form two major clades (Chlamydomonadales + Sphaeropleales and Oedogoniales + Chaetopeltidales + Chaetophorales). The four complete chloroplast DNA (cpDNA) sequences currently available for chlorophyceans uncovered an extraordinarily fluid genome architecture as well as many structural features distinguishing this group from other green algae. We report here the 521,168-bp cpDNA sequence from a member of the Chaetopeltidales (Floydiella terrestris), the sole chlorophycean lineage not previously sampled for chloroplast genome analysis. This genome, which contains 97 conserved genes and 26 introns (19 group I and 7 group II introns), is the largest chloroplast genome ever sequenced. Intergenic regions account for 77.8% of the genome size and are populated by short repeats. Numerous genomic features are shared with the cpDNA of the chaetophoralean Stigeoclonium helveticum, notably the absence of a large inverted repeat and the presence of unique gene clusters and trans-spliced group II introns. Although only one of the Floydiella group I introns encodes a homing endonuclease gene, our finding of five free-standing reading frames having similarity with such genes suggests that chloroplast group I introns endowed with mobility were once more abundant in the Floydiella lineage. Parsimony analysis of structural genomic features and phylogenetic analysis of chloroplast sequence data unambiguously resolved the Oedogoniales as sister to the Chaetopeltidales and Chaetophorales. An evolutionary scenario of the molecular events that shaped the chloroplast genome in the Chlorophyceae is presented. PMID:20624729

  10. Rapidly Registering Identity-by-Descent Across Ancestral Recombination Graphs.

    PubMed

    Yang, Shuo; Carmi, Shai; Pe'er, Itsik

    2016-06-01

    The genomes of remotely related individuals occasionally contain long segments that are identical by descent (IBD). Sharing of IBD segments has many applications in population and medical genetics, and it is thus desirable to study their properties in simulations. However, no current method provides a direct, efficient means to extract IBD segments from simulated genealogies. Here, we introduce computationally efficient approaches to extract ground-truth IBD segments from a sequence of genealogies, or equivalently, an ancestral recombination graph. Specifically, we use a two-step scheme, where we first identify putative shared segments by comparing the common ancestors of all pairs of individuals at some distance apart. This reduces the search space considerably, and we then proceed by determining the true IBD status of the candidate segments. Under some assumptions and when allowing a limited resolution of segment lengths, our run-time complexity is reduced from O(n^3 log n) for the naïve algorithm to O(n log n), where n is the number of individuals in the sample. PMID:27104872

  11. The Bimodal Distribution of Genic GC Content Is Ancestral to Monocot Species

    PubMed Central

    Clément, Yves; Fustier, Margaux-Alison; Nabholz, Benoit; Glémin, Sylvain

    2015-01-01

    In grasses such as rice or maize, the distribution of genic GC content is well known to be bimodal. It is mainly driven by GC content at third codon positions (GC3 for short). This feature is thought to be specific to grasses as closely related species like banana have a unimodal GC3 distribution. GC3 is associated with numerous genomic features and uncovering the origin of this peculiar distribution will help in understanding the potential roles and consequences of GC3 variations within and between genomes. Until recently, the origin of the peculiar GC3 distribution in grasses has remained unknown. Thanks to the recent publication of several complete genomes and transcriptomes of nongrass monocots, we studied more than 1,000 groups of one-to-one orthologous genes in seven grasses and three outgroup species (banana, palm tree, and yam). Using a maximum likelihood-based method, we reconstructed GC3 at several ancestral nodes. We found that the bimodal GC3 distribution observed in extant grasses is ancestral to both grasses and most monocot species, and that other species studied here have lost this peculiar structure. We also found that GC3 in grass lineages is globally evolving very slowly and that the decreasing GC3 gradient observed from 5′ to 3′ along coding sequences is also conserved and ancestral to monocots. This result strongly challenges the previous views on the specificity of grass genomes and we discuss its implications for the possible causes of the evolution of GC content in monocots. PMID:25527839

  12. Software engineering the mixed model for genome-wide association studies on large samples.

    PubMed

    Zhang, Zhiwu; Buckler, Edward S; Casstevens, Terry M; Bradbury, Peter J

    2009-11-01

    Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample size and number of markers used for GWAS are increasing dramatically, resulting in greater statistical power to detect those associations. The use of mixed models with increasingly large data sets depends on the availability of software for analyzing those models. While multiple software packages implement the mixed model method, no single package provides the best combination of fast computation, ability to handle large samples, flexible modeling and ease of use. Key elements of association analysis with mixed models are reviewed, including modeling phenotype-genotype associations using mixed models, population stratification, kinship and its estimation, variance component estimation, use of best linear unbiased predictors or residuals in place of raw phenotype, improving efficiency and software-user interaction. The available software packages are evaluated, and suggestions made for future software development. PMID:19933212

  13. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans

    PubMed Central

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-01-01

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. PMID:26199191

  14. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans.

    PubMed

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-08-01

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. PMID:26199191

  15. Physical mapping in large genomes: accelerating anchoring of BAC contigs to genetic maps through in silico analysis.

    PubMed

    Paux, Etienne; Legeai, Fabrice; Guilhot, Nicolas; Adam-Blondon, Anne-Françoise; Alaux, Michaël; Salse, Jérôme; Sourdille, Pierre; Leroy, Philippe; Feuillet, Catherine

    2008-02-01

    Anchored physical maps represent essential frameworks for map-based cloning, comparative genomics studies, and genome sequencing projects. High throughput anchoring can be achieved by polymerase chain reaction (PCR) screening of bacterial artificial chromosome (BAC) library pools with molecular markers. However, for large genomes such as wheat, the development of high dimension pools and the number of reactions that need to be performed can be extremely large, making the screening laborious and costly. To improve the cost efficiency of anchoring in such large genomes, we have developed a new software tool named Elephant (electronic physical map anchoring tool) that combines BAC contig information generated by FingerPrinted Contig with results of BAC library pools screening to identify BAC addresses with a minimal number of PCRs. Elephant was evaluated during the construction of a physical map of chromosome 3B of hexaploid wheat. Results show that a one dimensional pool screening can be sufficient to anchor a BAC contig while reducing the number of PCRs 384-fold, thereby demonstrating that Elephant is an efficient and cost-effective tool to support physical mapping in large genomes. PMID:18038165

  16. The genomic and physical organization of Ty1-copia-like sequences as a component of large genomes in Pinus elliottii var. elliottii and other gymnosperms.

    PubMed Central

    Kamm, A; Doudrick, R L; Heslop-Harrison, J S; Schmidt, T

    1996-01-01

    A DNA sequence, TPE1, representing the internal domain of a Ty1-copia retroelement, was isolated from genomic DNA of Pinus elliottii Engelm. var. elliottii (slash pine). Genomic Southern analysis showed that this sequence, carrying partial reverse transcriptase and integrase gene sequences, is highly amplified within the genome of slash pine and part of a dispersed element >4.8 kbp. Fluorescent in situ hybridization to metaphase chromosomes shows that the element is relatively uniformly dispersed over all 12 chromosome pairs and is highly abundant in the genome. It is largely excluded from centromeric regions and intercalary chromosomal sites representing the 18S-5.8S-25S rRNA genes. Southern hybridization with specific DNA probes for the reverse transcriptase gene shows that TPE1 represents a large subgroup of heterogeneous Ty1-copia retrotransposons in Pinus species. Because no TPE1 transcription could be detected, it is most likely an inactive element, at least in needle tissue. Further evidence for inactivity was found in recombinant reverse transcriptase and integrase sequences. The distribution of TPE1 within different gymnosperms that contain Ty1-copia group retrotransposons, as shown by a PCR assay, was investigated by Southern hybridization. The TPE1 family is highly amplified and conserved in all Pinus species analyzed, showing a similar genomic organization in the three- and five-needle pine species investigated. It is also present in spruce, bald cypress (swamp cypress), and in ginkgo but in fewer copies and a different genomic organization. PMID:8610105

  17. Evo-Devo: Variations on Ancestral Themes

    PubMed Central

    De Robertis, E.M.

    2008-01-01

    Most animals evolved from a common ancestor, Urbilateria, which already had in place the developmental genetic networks for shaping body plans. Comparative genomics has revealed rather unexpectedly that many of the genes present in bilaterian animal ancestors were lost by individual phyla during evolution. Reconstruction of the archetypal developmental genomic tool-kit present in Urbilateria will help to elucidate the contribution of gene loss and developmental constraints to the evolution of animal body plans. PMID:18243095

  18. Needles: Toward Large-Scale Genomic Prediction with Marker-by-Environment Interaction.

    PubMed

    De Coninck, Arne; De Baets, Bernard; Kourounis, Drosos; Verbosio, Fabio; Schenk, Olaf; Maenhout, Steven; Fostier, Jan

    2016-05-01

    Genomic prediction relies on genotypic marker information to predict the agronomic performance of future hybrid breeds based on trial records. Because the effect of markers may vary substantially under the influence of different environmental conditions, marker-by-environment interaction effects have to be taken into account. However, this may lead to a dramatic increase in the computational resources needed for analyzing large-scale trial data. A high-performance computing solution, called Needles, is presented for handling such data sets. Needles is tailored to the particular properties of the underlying algebraic framework by exploiting a sparse matrix formalism where suited and by utilizing distributed computing techniques to enable the use of a dedicated computing cluster. It is demonstrated that large-scale analyses can be performed within reasonable time frames with this framework. Moreover, by analyzing simulated trial data, it is shown that the effects of markers with a high environmental interaction can be predicted more accurately when more records per environment are available in the training data. The availability of such data and their analysis with Needles also may lead to the discovery of highly contributing QTL in specific environmental conditions. Such a framework thus opens the path for plant breeders to select crops based on these QTL, resulting in hybrid lines with optimized agronomic performance in specific environmental conditions. PMID:26936924

  19. Diversity and relationships of cocirculating modern human rotaviruses revealed using large-scale comparative genomics.

    PubMed

    McDonald, Sarah M; McKell, Allison O; Rippinger, Christine M; McAllen, John K; Akopov, Asmik; Kirkness, Ewen F; Payne, Daniel C; Edwards, Kathryn M; Chappell, James D; Patton, John T

    2012-09-01

    Group A rotaviruses (RVs) are 11-segmented, double-stranded RNA viruses and are primary causes of gastroenteritis in young children. Despite their medical relevance, the genetic diversity of modern human RVs is poorly understood, and the impact of vaccine use on circulating strains remains unknown. In this study, we report the complete genome sequence analysis of 58 RVs isolated from children with severe diarrhea and/or vomiting at Vanderbilt University Medical Center (VUMC) in Nashville, TN, during the years spanning community vaccine implementation (2005 to 2009). The RVs analyzed include 36 G1P[8], 18 G3P[8], and 4 G12P[8] Wa-like genogroup 1 strains with VP6-VP1-VP2-VP3-NSP1-NSP2-NSP3-NSP4-NSP5/6 genotype constellations of I1-R1-C1-M1-A1-N1-T1-E1-H1. By constructing phylogenetic trees, we identified 2 to 5 subgenotype alleles for each gene. The results show evidence of intragenogroup gene reassortment among the cocirculating strains. However, several isolates from different seasons maintained identical allele constellations, consistent with the notion that certain RV clades persisted in the community. By comparing the genes of VUMC RVs to those of other archival and contemporary RV strains for which sequences are available, we defined phylogenetic lineages and verified that the diversity of the strains analyzed in this study reflects that seen in other regions of the world. Importantly, the VP4 and VP7 proteins encoded by VUMC RVs and other contemporary strains show amino acid changes in or near neutralization domains, which might reflect antigenic drift of the virus. Thus, this large-scale, comparative genomic study of modern human RVs provides significant insight into how this pathogen evolves during its spread in the community. PMID:22696651

  20. Diversity and Relationships of Cocirculating Modern Human Rotaviruses Revealed Using Large-Scale Comparative Genomics

    PubMed Central

    McKell, Allison O.; Rippinger, Christine M.; McAllen, John K.; Akopov, Asmik; Kirkness, Ewen F.; Payne, Daniel C.; Edwards, Kathryn M.; Chappell, James D.; Patton, John T.

    2012-01-01

    Group A rotaviruses (RVs) are 11-segmented, double-stranded RNA viruses and are primary causes of gastroenteritis in young children. Despite their medical relevance, the genetic diversity of modern human RVs is poorly understood, and the impact of vaccine use on circulating strains remains unknown. In this study, we report the complete genome sequence analysis of 58 RVs isolated from children with severe diarrhea and/or vomiting at Vanderbilt University Medical Center (VUMC) in Nashville, TN, during the years spanning community vaccine implementation (2005 to 2009). The RVs analyzed include 36 G1P[8], 18 G3P[8], and 4 G12P[8] Wa-like genogroup 1 strains with VP6-VP1-VP2-VP3-NSP1-NSP2-NSP3-NSP4-NSP5/6 genotype constellations of I1-R1-C1-M1-A1-N1-T1-E1-H1. By constructing phylogenetic trees, we identified 2 to 5 subgenotype alleles for each gene. The results show evidence of intragenogroup gene reassortment among the cocirculating strains. However, several isolates from different seasons maintained identical allele constellations, consistent with the notion that certain RV clades persisted in the community. By comparing the genes of VUMC RVs to those of other archival and contemporary RV strains for which sequences are available, we defined phylogenetic lineages and verified that the diversity of the strains analyzed in this study reflects that seen in other regions of the world. Importantly, the VP4 and VP7 proteins encoded by VUMC RVs and other contemporary strains show amino acid changes in or near neutralization domains, which might reflect antigenic drift of the virus. Thus, this large-scale, comparative genomic study of modern human RVs provides significant insight into how this pathogen evolves during its spread in the community. PMID:22696651

  1. Comparative genomics of protoploid Saccharomycetaceae.

    PubMed

    Souciet, Jean-Luc; Dujon, Bernard; Gaillardin, Claude; Johnston, Mark; Baret, Philippe V; Cliften, Paul; Sherman, David J; Weissenbach, Jean; Westhof, Eric; Wincker, Patrick; Jubin, Claire; Poulain, Julie; Barbe, Valérie; Ségurens, Béatrice; Artiguenave, François; Anthouard, Véronique; Vacherie, Benoit; Val, Marie-Eve; Fulton, Robert S; Minx, Patrick; Wilson, Richard; Durrens, Pascal; Jean, Géraldine; Marck, Christian; Martin, Tiphaine; Nikolski, Macha; Rolland, Thomas; Seret, Marie-Line; Casarégola, Serge; Despons, Laurence; Fairhead, Cécile; Fischer, Gilles; Lafontaine, Ingrid; Leh, Véronique; Lemaire, Marc; de Montigny, Jacky; Neuvéglise, Cécile; Thierry, Agnès; Blanc-Lenfle, Isabelle; Bleykasten, Claudine; Diffels, Julie; Fritsch, Emilie; Frangeul, Lionel; Goëffon, Adrien; Jauniaux, Nicolas; Kachouri-Lafond, Rym; Payen, Célia; Potier, Serge; Pribylova, Lenka; Ozanne, Christophe; Richard, Guy-Franck; Sacerdot, Christine; Straub, Marie-Laure; Talla, Emmanuel

    2009-10-01

    Our knowledge of yeast genomes remains largely dominated by the extensive studies on Saccharomyces cerevisiae and the consequences of its ancestral duplication, leaving the evolution of the entire class of hemiascomycetes only partly explored. We concentrate here on five species of Saccharomycetaceae, a large subdivision of hemiascomycetes, that we call "protoploid" because they diverged from the S. cerevisiae lineage prior to its genome duplication. We determined the complete genome sequences of three of these species: Kluyveromyces (Lachancea) thermotolerans and Saccharomyces (Lachancea) kluyveri (two members of the newly described Lachancea clade), and Zygosaccharomyces rouxii. We included in our comparisons the previously available sequences of Kluyveromyces lactis and Ashbya (Eremothecium) gossypii. Despite their broad evolutionary range and significant individual variations in each lineage, the five protoploid Saccharomycetaceae share a core repertoire of approximately 3300 protein families and a high degree of conserved synteny. Synteny blocks were used to define gene orthology and to infer ancestors. Far from representing minimal genomes without redundancy, the five protoploid yeasts contain numerous copies of paralogous genes, either dispersed or in tandem arrays, that, altogether, constitute a third of each genome. Ancient, conserved paralogs as well as novel, lineage-specific paralogs were identified. PMID:19525356

  2. Genomic exploration and molecular marker development in a large and complex conifer genome using RADseq and mRNAseq.

    PubMed

    Karam, M-J; Lefèvre, F; Dagher-Kharrat, M Bou; Pinosio, S; Vendramin, G G

    2015-05-01

    We combined restriction site associated DNA sequencing (RADseq) using a hypomethylation-sensitive enzyme and messenger RNA sequencing (mRNAseq) to develop molecular markers for the 16 gigabase genome of Cedrus atlantica, a conifer tree species. With each method, Illumina® reads from one individual were used to generate de novo assemblies. SNPs from the RADseq data set were detected in a panel of one single individual and three pools of three individuals each. We developed a flexible script to estimate the ascertainment bias in SNP detection considering the pooling and sampling effects on the probability of not detecting an existing polymorphism. Gene Ontology (GO) and transposable element (TE) search analyses were applied to both data sets. The RADseq and the mRNAseq assemblies represented 0.1% and 0.6% of the genome, respectively. Genome complexity reduction resulted in 17% of the RADseq contigs potentially coding for proteins. This rate was doubled in the mRNAseq data set, suggesting that RADseq also explores noncoding low-repeat regions. The two methods gave very similar GO-slim profiles. As expected, the two assemblies were poor in TE-like sequences (<4% of contig length). We identified 17,348 single nucleotide polymorphisms (SNPs) in the RADseq data set and 5,714 simple sequence repeats (SSRs) in the transcriptome. A subset of 282 SNPs was validated using the Fluidigm genotyping technology, giving a conversion rate of 50.4%, falling within the expected range for conifers. Increasing sample size had the greatest effect for ascertainment bias reduction. These results validated the utility of the RADseq approach for highly complex genomes such as conifers. PMID:25224750
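
    The sampling component of the ascertainment bias mentioned in the abstract above can be illustrated with a short calculation. The sketch below is not the authors' script. It assumes a biallelic site, chromosomes drawn independently at random, no sequencing error, and equal representation of every chromosome in a pool; the function names `miss_probability` and `detection_probability` are hypothetical. Under those assumptions, a polymorphism with allele frequency p goes undetected in a sample of n chromosomes when all n carry the same allele, which happens with probability p^n + (1 - p)^n.

```python
def miss_probability(freq: float, n_chromosomes: int) -> float:
    """Probability that a random sample of n chromosomes is monomorphic
    at a biallelic site whose allele frequency is `freq`.

    Simplifying assumptions (not taken from the paper): independent
    draws, no sequencing error, no unequal pooling.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must lie in [0, 1]")
    if n_chromosomes < 1:
        raise ValueError("n_chromosomes must be at least 1")
    # The site looks monomorphic if every sampled chromosome carries
    # the first allele, or every one carries the second allele.
    return freq ** n_chromosomes + (1.0 - freq) ** n_chromosomes


def detection_probability(freq: float, n_chromosomes: int) -> float:
    """Probability that both alleles appear at least once in the sample."""
    return 1.0 - miss_probability(freq, n_chromosomes)


if __name__ == "__main__":
    # Compare one diploid individual (2 chromosomes) with ten diploid
    # individuals (20 chromosomes) across a few allele frequencies.
    for freq in (0.05, 0.1, 0.25, 0.5):
        small = detection_probability(freq, 2)
        large = detection_probability(freq, 20)
        print(f"freq={freq:.2f}  n=2: {small:.3f}  n=20: {large:.3f}")
```

    Rare alleles are missed far more often in small samples, which is consistent with the abstract's remark that increasing sample size had the greatest effect on reducing ascertainment bias. Modelling the pooling effects the authors also considered would require further assumptions that the abstract does not supply.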

  3. Genome Reduction Uncovers a Large Dispensable Genome and Adaptive Role for Copy Number Variation in Asexually Propagated Solanum tuberosum.

    PubMed

    Hardigan, Michael A; Crisovan, Emily; Hamilton, John P; Kim, Jeongwoon; Laimbeer, Parker; Leisner, Courtney P; Manrique-Carpintero, Norma C; Newton, Linsey; Pham, Gina M; Vaillancourt, Brieanne; Yang, Xueming; Zeng, Zixian; Douches, David S; Jiang, Jiming; Veilleux, Richard E; Buell, C Robin

    2016-02-01

    Clonally reproducing plants have the potential to bear a significantly greater mutational load than sexually reproducing species. To investigate this possibility, we examined the breadth of genome-wide structural variation in a panel of monoploid/doubled monoploid clones generated from native populations of diploid potato (Solanum tuberosum), a highly heterozygous asexually propagated plant. As rare instances of purely homozygous clones, they provided an ideal set for determining the degree of structural variation tolerated by this species and deriving its minimal gene complement. Extensive copy number variation (CNV) was uncovered, impacting 219.8 Mb (30.2%) of the potato genome with nearly 30% of genes subject to at least partial duplication or deletion, revealing the highly heterogeneous nature of the potato genome. Dispensable genes (>7000) were associated with limited transcription and/or a recent evolutionary history, with lower deletion frequency observed in genes conserved across angiosperms. Association of CNV with plant adaptation was highlighted by enrichment in gene clusters encoding functions for environmental stress response, with gene duplication playing a part in species-specific expansions of stress-related gene families. This study revealed unique impacts of CNV in a species with asexual reproductive habits and how CNV may drive adaptation through evolution of key stress pathways. PMID:26772996

  4. Reconstruction of Oomycete Genome Evolution Identifies Differences in Evolutionary Trajectories Leading to Present-Day Large Gene Families

    PubMed Central

    Seidl, Michael F.; Van den Ackerveken, Guido; Govers, Francine; Snel, Berend

    2012-01-01

    The taxonomic class of oomycetes contains numerous pathogens of plants and animals but is related to nonpathogenic diatoms and brown algae. Oomycetes have flexible genomes comprising large gene families that play roles in pathogenicity. The evolutionary processes that shaped the gene content have not yet been studied by applying systematic tree reconciliation of the phylome of these species. We analyzed evolutionary dynamics of ten Stramenopiles. Gene gains, duplications, and losses were inferred by tree reconciliation of 18,459 gene trees constituting the phylome with a highly supported species phylogeny. We reconstructed a strikingly large last common ancestor of the Stramenopiles that contained ∼10,000 genes. Throughout evolution, the genomes of pathogenic oomycetes have constantly gained and lost genes, though gene gains through duplications outnumber the losses. The branch leading to the plant pathogenic Phytophthora genus was identified as a major transition point characterized by increased frequency of duplication events that has likely driven the speciation within this genus. Large gene families encoding different classes of enzymes associated with pathogenicity such as glycoside hydrolases are formed by complex and distinct patterns of duplications and losses leading to their expansion in extant oomycetes. This study unveils the large-scale evolutionary dynamics that shaped the genomes of pathogenic oomycetes. By the application of phylogeny-based analysis methods, it provides additional insights that shed light on the complex history of oomycete genome evolution and the emergence of large gene families characteristic of this important class of pathogens. PMID:22230142

  5. Comparative analysis of the primate X-inactivation center region and reconstruction of the ancestral primate XIST locus

    PubMed Central

    Horvath, Julie E.; Sheedy, Christina B.; Merrett, Stephanie L.; Diallo, Abdoulaye Banire; Swofford, David L.; NISC Comparative Sequencing Program; Green, Eric D.; Willard, Huntington F.

    2011-01-01

    Here we provide a detailed comparative analysis across the candidate X-Inactivation Center (XIC) region and the XIST locus in the genomes of six primates and three mammalian outgroup species. Since lemurs and other strepsirrhine primates represent the sister lineage to all other primates, this analysis focuses on lemurs to reconstruct the ancestral primate sequences and to gain insight into the evolution of this region and the genes within it. This comparative evolutionary genomics approach reveals significant expansion in genomic size across the XIC region in higher primates, with minimal size alterations across the XIST locus itself. Reconstructed primate ancestral XIC sequences show that the most dramatic changes during the past 80 million years occurred between the ancestral primate and the lineage leading to Old World monkeys. In contrast, the XIST locus compared between human and the primate ancestor does not indicate any dramatic changes to exons or XIST-specific repeats; rather, evolution of this locus reflects small incremental changes in overall sequence identity and short repeat insertions. While this comparative analysis reinforces that the region around XIST has been subject to significant genomic change, even among primates, our data suggest that evolution of the XIST sequences themselves represents only small lineage-specific changes across the past 80 million years. PMID:21518738

  6. Large Genomic Fragment Deletions and Insertions in Mouse Using CRISPR/Cas9

    PubMed Central

    Satheka, Achim Cchitvsanzwhoh; Togo, Jacques; An, Yao; Humphrey, Mabwi; Ban, Luying; Ji, Yan; Jin, Honghong; Feng, Xuechao; Zheng, Yaowu

    2015-01-01

    ZFNs, TALENs and the CRISPR/Cas9 system have been used to generate point mutations and large fragment deletions and insertions in genomic modifications. The CRISPR/Cas9 system is the most flexible and fastest-developing technology and has been extensively used to make mutations in all kinds of organisms. However, most mutations reported to date are small insertions and deletions. In this report, the CRISPR/Cas9 system was used to make large DNA fragment deletions and insertions in mouse, including deletion of the entire Dip2a gene, about 65 kb in size, and insertion of a β-galactosidase (lacZ) reporter gene larger than 5 kb. About 11.8% (11/93) of transfected and diluted ES clones were positive for the 65 kb deletion. High targeting efficiencies in ES cells were also achieved with G418 selection, 46.2% (12/26) and 73.1% (19/26) for the left and right arms respectively. Targeted large fragment deletion efficiency is about 21.4% of live pups or 6.0% of injected embryos. Targeted insertion of the lacZ reporter with a NEO cassette showed a targeting rate of 27.1% (13/48) by ES cell transfection and 11.1% (2/18) by direct zygote injection. The procedures bypass in vitro transcription by direct co-injection of zygotes or co-transfection of embryonic stem cells with circular plasmid DNA. The methods are technically easy, time saving, and cost effective in generating mouse models and will certainly facilitate gene function studies. PMID:25803037
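
    The efficiencies quoted in this abstract are simple ratios of positive to screened samples. The short sketch below recomputes them from the counts given in the abstract. The labels and the helper name `percent` are illustrative choices, not part of the original record.

```python
# (label, positives, screened, percentage reported in the abstract)
REPORTED = [
    ("65 kb deletion, diluted ES clones", 11, 93, 11.8),
    ("left arm, G418 selection", 12, 26, 46.2),
    ("right arm, G418 selection", 19, 26, 73.1),
    ("lacZ insertion, ES cell transfection", 13, 48, 27.1),
    ("lacZ insertion, direct zygote injection", 2, 18, 11.1),
]


def percent(positives: int, screened: int) -> float:
    """Return positives/screened as a percentage rounded to one decimal."""
    if screened <= 0:
        raise ValueError("screened must be positive")
    return round(100.0 * positives / screened, 1)


for label, k, n, reported in REPORTED:
    computed = percent(k, n)
    status = "matches" if abs(computed - reported) < 0.05 else "differs"
    print(f"{label}: {k}/{n} = {computed}% ({status} reported {reported}%)")
```

    The 21.4% and 6.0% figures for live pups and injected embryos are left out because the abstract does not give the underlying counts.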

  7. Using large-scale genome variation cohorts to decipher the molecular mechanism of cancer.

    PubMed

    Habermann, Nina; Mardin, Balca R; Yakneen, Sergei; Korbel, Jan O

    2016-01-01

    Characterizing genomic structural variations (SVs) in the human genome remains challenging, and there is a growing interest in understanding somatic SVs occurring in cancer, a disease of the genome. A havoc-causing SV process known as chromothripsis scars the genome when localized chromosome shattering and repair occur in a one-off catastrophe. Recent efforts led to the development of a set of conceptual criteria for the inference of chromothripsis events in cancer genomes and to the development of experimental model systems for studying this striking DNA alteration process in vitro. We discuss these approaches, and additionally touch upon current "Big Data" efforts that employ hybrid cloud computing to enable studies of numerous cancer genomes in an effort to search for commonalities and differences in molecular DNA alteration processes in cancer. PMID:27342254

  8. Physical mapping of a large plant genome using global high-information content fingerprinting: a distal region of wheat chromosome 3DS

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Physical maps employing libraries of bacterial artificial chromosome (BAC) clones are essential for comparative genomics and sequencing of large and repetitive genomes such as those of wheat. We report the use of the Ae. tauschii, the diploid ancestor of the wheat D genome, for the construction of t...

  9. Physical mapping of a large plant genome using global high-information-content-fingerprinting: the distal region of the wheat ancestor Aegilops tauschii chromosome 3DS.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Physical maps employing libraries of bacterial artificial chromosome (BAC) clones are essential for comparative genomics and sequencing of large and repetitive genomes such as those of the hexaploid bread wheat. The diploid ancestor of wheat genome, Aegilops tauschii, is used as a resource for wheat...

  10. Evolution of Prdm Genes in Animals: Insights from Comparative Genomics.

    PubMed

    Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre

    2016-03-01

    Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes. PMID:26560352

  11. Evolution of Prdm Genes in Animals: Insights from Comparative Genomics

    PubMed Central

    Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre

    2016-01-01

    Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes. PMID:26560352

  12. Exploring the feasibility of using copy number variants as genetic markers through large-scale whole genome sequencing experiments

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Copy number variants (CNV) are large scale duplications or deletions of genomic sequence that are caused by a diverse set of molecular phenomena that are distinct from single nucleotide polymorphism (SNP) formation. Due to their different mechanisms of formation, CNVs are often difficult to track us...

  13. Draft Genome Sequence of Rheinheimera sp. F8, a Biofilm-Forming Strain Which Produces Large Amounts of Extracellular DNA

    PubMed Central

    Szewzyk, Ulrich

    2016-01-01

    Rheinheimera sp. strain F8 is a biofilm-forming gammaproteobacterium that has been found to produce large amounts of filamentous extracellular DNA. Here, we announce the de novo assembly of its genome. It is estimated to be 4,464,511 bp in length, with 3,970 protein-coding sequences and 92 RNA-coding sequences. PMID:26966195

  14. An Integrative Approach for the Large-scale Identification of Human Genome Kinases Regulating Cancer Metastasis

    PubMed Central

    Zhang, Hanshuo; Wu, Pu-Yen; Ma, Ming; Ye, Yanzheng; Hao, Yang; Yang, Junyu; Yin, Shenyi; Sun, Changhong; Phan, John H.; Wang, May D.; Xi, Jianzhong Jeff

    2016-01-01

    Kinases regulate the majority of biological processes and have become one of the important groups of drug targets. To identify more kinases with potential for cancer therapy, we developed an integrative approach for the large-scale screening of functional genes capable of regulating the main traits of cancer metastasis, including cell migration as well as invasion. We first employed a self-assembled cell microarray (SAMcell) to screen functional genes that regulate cancer cell migration using a siRNA library targeting 710 human genome kinase genes. We identified 81 genes capable of significantly regulating cancer cell migration. Following invasion assays and bioinformatics analysis, we discovered that 16 genes with differential expression in cancer samples can regulate both cell migration and invasion, among which 10 genes are well known to play critical roles in cancer development. The remaining 6 genes were experimentally validated to be capable of regulating the metastasis-related traits, including cell proliferation, apoptosis and anoikis activities besides cell motility. Together, these findings provide a new insight into the therapeutic use of human kinases. PMID:23751374

  15. Complete mitochondrial DNA sequence of the ark shell Scapharca broughtonii: an ultra-large metazoan mitochondrial genome.

    PubMed

    Liu, Yun-Guo; Kurokawa, Tadahide; Sekino, Masashi; Tanabe, Toru; Watanabe, Kazuhito

    2013-03-01

    The complete mitochondrial (mt) genome of the ark shell Scapharca broughtonii was determined using long PCR and a genome walking sequencing strategy with genus-specific primers. The S. broughtonii mt genome (GenBank accession number AB729113) contained 12 protein-coding genes (the atp8 gene is missing, as in most bivalves), 2 ribosomal RNA genes, and 42 transfer RNA (tRNA) genes, in a length of 46,985 nucleotides for the size of mtDNA with only one copy of the heteroplasmic tandem repeat (HTR) unit. Moreover, the S. broughtonii mt genome shows size variation; these genomes ranged in size from about 47 kb to about 50 kb because of variation in the number of repeat sequences in the non-coding region. The mt genome of S. broughtonii is, to date, the longest reported metazoan mtDNA sequence. Sequence duplication in the non-coding region and the formation of HTR arrays were two of the factors responsible for the ultra-large size of this mt genome. All the tRNA genes were found within the S. broughtonii mt genome, unlike other bivalves, which usually lack one or more tRNA genes. Twelve additional specimens were used to analyze the patterns of tandem repeat arrays by PCR amplification and agarose electrophoresis. Each of the 12 specimens displayed extensive heteroplasmy and had 8-10 length variants. The motifs of the HTR arrays are about 353-362 bp and the number of repeats ranges from 1 to 11. PMID:23291309
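
    The size range quoted in this abstract follows from the repeat figures it gives. The sketch below works through that arithmetic under one assumption that the abstract does not state explicitly: each HTR copy beyond the first adds its full unit length to the 46,985-nucleotide single-copy genome. The function and constant names are hypothetical.

```python
BASE_LENGTH_NT = 46_985  # reported mtDNA length with one HTR unit


def mt_genome_length(n_repeats: int, unit_length_nt: int) -> int:
    """Estimated genome length for a given HTR copy number.

    Assumes every copy beyond the first adds exactly one unit length.
    """
    if n_repeats < 1:
        raise ValueError("n_repeats must be at least 1")
    return BASE_LENGTH_NT + (n_repeats - 1) * unit_length_nt


# Repeat units are reported as about 353-362 bp, with 1 to 11 copies.
for n_repeats in (1, 11):
    for unit in (353, 362):
        length = mt_genome_length(n_repeats, unit)
        print(f"{n_repeats:2d} copies x {unit} bp -> {length:,} nt")
```

    With 11 copies this estimate gives roughly 50.5 to 50.6 kb, in line with the reported range of about 47 kb to about 50 kb.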

  16. The Korarchaeota: Archaeal orphans representing an ancestral lineage of life

    SciTech Connect

    Elkins, James G.; Kunin, Victor; Anderson, Iain; Barry, Kerrie; Goltsman, Eugene; Lapidus, Alla; Hedlund, Brian; Hugenholtz, Phil; Kyrpides, Nikos; Graham, David; Keller, Martin; Wanner, Gerhard; Richardson, Paul; Stetter, Karl O.

    2007-05-01

    Based on conserved cellular properties, all life on Earth can be grouped into different phyla which belong to the primary domains Bacteria, Archaea, and Eukarya. However, tracing back their evolutionary relationships has been impeded by horizontal gene transfer and gene loss. Within the Archaea, the kingdoms Crenarchaeota and Euryarchaeota exhibit a profound divergence. In order to elucidate the evolution of these two major kingdoms, representatives of more deeply diverged lineages would be required. Based on their environmental small subunit ribosomal RNA (SSU rRNA) sequences, the Korarchaeota had been originally suggested to have an ancestral relationship to all known Archaea although this assessment has been refuted. Here we describe the cultivation and initial characterization of the first member of the Korarchaeota, highly unusual, ultrathin filamentous cells about 0.16 µm in diameter. A complete genome sequence obtained from enrichment cultures revealed an unprecedented combination of signature genes which were thought to be characteristic of either the Crenarchaeota, Euryarchaeota, or Eukarya. Cell division appears to be mediated through an FtsZ-dependent mechanism which is highly conserved throughout the Bacteria and Euryarchaeota. An rpb8 subunit of the DNA-dependent RNA polymerase was identified which is absent from other Archaea and has been described as a eukaryotic signature gene. In addition, the representative organism possesses a ribosome structure typical for members of the Crenarchaeota. Based on its gene complement, this lineage likely diverged near the separation of the two major kingdoms of Archaea. Further investigations of these unique organisms may shed additional light onto the evolution of extant life.

  17. Evidence for an Ancestral Association of Human Coronavirus 229E with Bats

    PubMed Central

    Corman, Victor Max; Baldwin, Heather J.; Tateno, Adriana Fumie; Zerbinati, Rodrigo Melim; Annan, Augustina; Owusu, Michael; Nkrumah, Evans Ewald; Maganga, Gael Darren; Oppong, Samuel; Adu-Sarkodie, Yaw; Vallo, Peter; da Silva Filho, Luiz Vicente Ribeiro Ferreira; Leroy, Eric M.; Thiel, Volker; van der Hoek, Lia; Poon, Leo L. M.; Tschapka, Marco

    2015-01-01

    ABSTRACT We previously showed that close relatives of human coronavirus 229E (HCoV-229E) exist in African bats. The small sample and limited genomic characterizations have prevented further analyses so far. Here, we tested 2,087 fecal specimens from 11 bat species sampled in Ghana for HCoV-229E-related viruses by reverse transcription-PCR (RT-PCR). Only hipposiderid bats tested positive. To compare the genetic diversity of bat viruses and HCoV-229E, we tested historical isolates and diagnostic specimens sampled globally over 10 years. Bat viruses were 5- and 6-fold more diversified than HCoV-229E in the RNA-dependent RNA polymerase (RdRp) and spike genes. In phylogenetic analyses, HCoV-229E strains were monophyletic and not intermixed with animal viruses. Bat viruses formed three large clades in close and more distant sister relationships. A recently described 229E-related alpaca virus occupied an intermediate phylogenetic position between bat and human viruses. According to taxonomic criteria, human, alpaca, and bat viruses form a single CoV species showing evidence for multiple recombination events. HCoV-229E and the alpaca virus showed a major deletion in the spike S1 region compared to all bat viruses. Analyses of four full genomes from 229E-related bat CoVs revealed an eighth open reading frame (ORF8) located at the genomic 3′ end. ORF8 also existed in the 229E-related alpaca virus. Reanalysis of HCoV-229E sequences showed a conserved transcription regulatory sequence preceding remnants of this ORF, suggesting its loss after acquisition of a 229E-related CoV by humans. These data suggested an evolutionary origin of 229E-related CoVs in hipposiderid bats, hypothetically with camelids as intermediate hosts preceding the establishment of HCoV-229E. IMPORTANCE The ancestral origins of major human coronaviruses (HCoVs) likely involve bat hosts. Here, we provide conclusive genetic evidence for an evolutionary origin of the common cold virus HCoV-229E in

  18. Characterisation of monotreme caseins reveals lineage-specific expansion of an ancestral casein locus in mammals.

    PubMed

    Lefèvre, Christophe M; Sharp, Julie A; Nicholas, Kevin R

    2009-01-01

    Using a milk-cell cDNA sequencing approach we characterised milk-protein sequences from two monotreme species, platypus (Ornithorhynchus anatinus) and echidna (Tachyglossus aculeatus) and found a full set of caseins and casein variants. The genomic organisation of the platypus casein locus is compared with other mammalian genomes, including the marsupial opossum and several eutherians. Physical linkage of casein genes has been seen in the casein loci of all mammalian genomes examined and we confirm that this is also observed in platypus. However, we show that a recent duplication of beta-casein occurred in the monotreme lineage, as opposed to more ancient duplications of alpha-casein in the eutherian lineage, while marsupials possess only single copies of alpha- and beta-caseins. Despite this variability, the close proximity of the main alpha- and beta-casein genes in an inverted tail-tail orientation and the relative orientation of the more distant kappa-casein genes are similar in all mammalian genome sequences so far available. Overall, the conservation of the genomic organisation of the caseins indicates the early, pre-monotreme development of the fundamental role of caseins during lactation. In contrast, the lineage-specific gene duplications that have occurred within the casein locus of monotremes and eutherians but not marsupials, which may have lost part of the ancestral casein locus, emphasise the independent selection on milk provision strategies to the young, most likely linked to different developmental strategies. The monotremes therefore provide insight into the ancestral drivers for lactation and how these have adapted in different lineages. PMID:19874726

  19. Comparative genomics and evolution of eukaryotic phospholipid biosynthesis

    SciTech Connect

    Lykidis, Athanasios

    2006-12-01

    Phospholipid biosynthetic enzymes produce diverse molecular structures and are often present in multiple forms encoded by different genes. This work utilizes comparative genomics and phylogenetics for exploring the distribution, structure and evolution of phospholipid biosynthetic genes and pathways in 26 eukaryotic genomes. Although the basic structure of the pathways was formed early in eukaryotic evolution, the emerging picture indicates that individual enzyme families followed unique evolutionary courses. For example, choline and ethanolamine kinases and cytidylyltransferases emerged in ancestral eukaryotes, whereas multiple forms of the corresponding phosphatidyltransferases evolved mainly in a lineage-specific manner. Furthermore, several unicellular eukaryotes maintain bacterial-type enzymes and reactions for the synthesis of phosphatidylglycerol and cardiolipin. Also, base-exchange phosphatidylserine synthases are widespread and ancestral enzymes. The multiplicity of phospholipid biosynthetic enzymes has been largely generated by gene expansion in a lineage-specific manner. Thus, these observations suggest that phospholipid biosynthesis has been an actively evolving system. Finally, comparative genomic analysis indicates the existence of novel phosphatidyltransferases and provides a candidate for the uncharacterized eukaryotic phosphatidylglycerol phosphate phosphatase.

  20. Selection for Unequal Densities of Sigma70 Promoter-like Signals in Different Regions of Large Bacterial Genomes

    SciTech Connect

    Huerta, Araceli M.; Francino, M. Pilar; Morett, Enrique; Collado-Vides, Julio

    2006-03-01

    distribution of promoter-like signals between regulatory and nonregulatory regions detected in large bacterial genomes confers a significant, although small, fitness advantage. This study paves the way for further identification of the specific types of selective constraints that affect the organization of regulatory regions and the overall distribution of promoter-like signals through more detailed comparative analyses among closely-related bacterial genomes.

  1. Obligate Insect Endosymbionts Exhibit Increased Ortholog Length Variation and Loss of Large Accessory Proteins Concurrent with Genome Shrinkage

    PubMed Central

    Kenyon, Laura J.; Sabree, Zakee L.

    2014-01-01

    Extreme genome reduction has been observed in obligate intracellular insect mutualists and is an assumed consequence of fixed, long-term host isolation. Rapid accumulation of mutations and pseudogenization of genes no longer vital for an intracellular lifestyle, followed by deletion of many genes, are factors that lead to genome reduction. Size reductions in individual genes due to small-scale deletions have also been implicated in contributing to overall genome shrinkage. Conserved protein functional domains are expected to exhibit low tolerance for mutations and therefore remain relatively unchanged throughout protein length reduction while nondomain regions, presumably under less selective pressure, would shorten. This hypothesis was tested using orthologous protein sets from the Flavobacteriaceae (phylum: Bacteroidetes) and Enterobacteriaceae (subphylum: Gammaproteobacteria) families, each of which includes some of the smallest known genomes. Upon examination of protein, functional domain, and nondomain region lengths, we found that proteins were not uniformly shrinking with genome reduction, but instead showed increased length variability, and this variability was observed in both the functional domain and nondomain regions. Additionally, as complete gene loss also contributes to overall genome shrinkage, we found that the largest proteins in the proteomes of nonhost-restricted bacteroidetial and gammaproteobacterial species often were inferred to be involved in secondary metabolic processes, extracellular sensing, or of unknown function. These proteins were absent in the proteomes of obligate insect endosymbionts. Therefore, loss of genes encoding large proteins not required for host-restricted lifestyles in obligate endosymbiont proteomes likely contributes to extreme genome reduction to a greater degree than gene shrinkage. PMID:24671745

  2. Neanderthal and Denisova genetic affinities with contemporary humans: introgression versus common ancestral polymorphisms.

    PubMed

    Lowery, Robert K; Uribe, Gabriel; Jimenez, Eric B; Weiss, Mark A; Herrera, Kristian J; Regueiro, Maria; Herrera, Rene J

    2013-11-01

    Analyses of the genetic relationships among modern humans, Neanderthals and Denisovans have suggested that 1-4% of the non-Sub-Saharan African gene pool may be Neanderthal derived, while 6-8% of the Melanesian gene pool may be the product of admixture between the Denisovans and the direct ancestors of Melanesians. In the present study, we analyzed single nucleotide polymorphism (SNP) diversity among a worldwide collection of contemporary human populations with respect to the genetic constitution of these two archaic hominins and Pan troglodytes (chimpanzee). We partitioned SNPs into subsets, including those that are derived in both archaic lineages, those that are ancestral in both archaic lineages and those that are only derived in one archaic lineage. By doing this, we have conducted separate examinations of subsets of mutations with higher probabilities of divergent phylogenetic origins. While previous investigations have excluded SNPs from common ancestors in principal component analyses, we included common ancestral SNPs in our analyses to visualize the relative placement of the Neanderthal and Denisova among human populations. To assess the genetic similarities among the various hominin lineages, we performed genetic structure analyses to provide a comparison of genetic patterns found within contemporary human genomes that may have archaic or common ancestral roots. Our results indicate that 3.6% of the Neanderthal genome is shared with roughly 65.4% of the average European gene pool, which clinally diminishes with distance from Europe. Our results suggest that Neanderthal genetic associations with contemporary non-Sub-Saharan African populations, as well as the genetic affinities observed between Denisovans and Melanesians most likely result from the retention of ancient mutations in these populations. PMID:23872234

  3. The mammary gland-specific marsupial ELP and eutherian CTI share a common ancestral gene

    PubMed Central

    2012-01-01

    Background The marsupial early lactation protein (ELP) gene is expressed in the mammary gland and the protein is secreted into milk during early lactation (Phase 2A). Mature ELP shares approximately 55.4% similarity with the colostrum-specific bovine colostrum trypsin inhibitor (CTI) protein. Although ELP and CTI both have a single bovine pancreatic trypsin inhibitor (BPTI)-Kunitz domain and are secreted only during the early lactation phases, their evolutionary history is yet to be investigated. Results Tammar ELP was isolated from a genomic library and the fat-tailed dunnart and Southern koala ELP genes cloned from genomic DNA. The tammar ELP gene was expressed only in the mammary gland during late pregnancy (Phase 1) and early lactation (Phase 2A). The opossum and fat-tailed dunnart ELP and cow CTI transcripts were cloned from RNA isolated from the mammary gland and dog CTI from cells in colostrum. The putative mature ELP and CTI peptides shared 44.6%-62.2% similarity. In silico analyses identified the ELP and CTI genes in the other species examined and provided compelling evidence that they evolved from a common ancestral gene. In addition, whilst the eutherian CTI gene was conserved in the Laurasiatherian orders Carnivora and Cetartiodactyla, it had become a pseudogene in others. These data suggest that bovine CTI may be the ancestral gene of the Artiodactyla-specific, rapidly evolving chromosome 13 pancreatic trypsin inhibitor (PTI), spleen trypsin inhibitor (STI) and the five placenta-specific trophoblast Kunitz domain protein (TKDP1-5) genes. Conclusions Marsupial ELP and eutherian CTI evolved from an ancestral therian mammal gene before the divergence of marsupials and eutherians between 130 and 160 million years ago. The retention of the ELP gene in marsupials suggests that this early lactation-specific milk protein may have an important role in the immunologically naïve young of these species. PMID:22681678

  4. Efficient generation of large-scale genome-modified mice using gRNA and CAS9 endonuclease.

    PubMed

    Fujii, Wataru; Kawasaki, Kurenai; Sugiura, Koji; Naito, Kunihiko

    2013-11-01

    The generation of genome-modified animals is a powerful approach to analyze gene functions. The CAS9/guide RNA (gRNA) system is expected to become widely used for the efficient generation of genome-modified animals, but detailed studies on optimum conditions and availability are limited. In the present study, we attempted to generate large-scale genome-modified mice with an optimized CAS9/gRNA system, and confirmed the transmission of these mutations to the next generation. A comparison of different types of gRNA indicated that the target loci of almost all pups were modified successfully by the use of long-type gRNAs with CAS9. We showed that this system has much higher mutation efficiency and much lower off-target effects compared to zinc-finger nuclease. We propose that most of these off-target effects can be avoided by the careful control of CAS9 mRNA concentration and that the genome-modification efficiency depends rather on the gRNA concentration. Under optimized conditions, large-scale (~10 kb) genome-modified mice can be efficiently generated by modifying two loci on a single chromosome using two gRNAs at once in mouse zygotes. In addition, the normal transmission of these CAS9/gRNA-induced mutations to the next generation was confirmed. These results indicate that the CAS9/gRNA system can become a highly effective tool for the generation of genome-modified animals. PMID:23997119

  5. Leveraging Large-Scale Cancer Genomics Datasets for Germline Discovery - TCGA

    Cancer.gov

    The session will review how data types have changed over time, focusing on how next-generation sequencing is being employed to yield more precise information about the underlying genomic variation that influences tumor etiology and biology.

  6. A large maize (Zea mays L.) SNP genotyping array: development and germplasm genotyping, and genetic mapping to compare with the B73 reference genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    SNP genotyping arrays have been useful for many applications that require a large number of molecular markers such as high-density genetic mapping, genome-wide association studies (GWAS), and genomic selection for accelerated breeding. We report the establishment of a large SNP array for maize and i...

  7. Minimal genome: Worthwhile or worthless efforts toward being smaller?

    PubMed

    Choe, Donghui; Cho, Suhyung; Kim, Sun Chang; Cho, Byung-Kwan

    2016-02-01

    Microbial cells are versatile hosts for the production of value-added products due to well-established background knowledge, various genetic tools, and ease of manipulation. Despite those advantages, the efficiency of newly incorporated synthetic pathways in microbial cells is frequently limited by innate metabolism, product toxicity, and growth-mediated genetic instability. To overcome those obstacles, a minimal genome harboring only the essential set of genes was proposed, which is a fascinating concept with potential for use as a platform strain. Here, we review the currently available artificial reduced genomes and discuss the prospects for extending use of the genome-reduced strains as programmable chassis. The genome-reduced strains generally showed comparable growth to and higher productivity than their ancestral strains. In Escherichia coli, about 300 genes are estimated as the minimal number of genes under laboratory conditions. However, recent advances revealed that there are non-essential components in essential genes, suggesting that the design principle of minimal genomes should be reconstructed. Current technology is not efficient enough to reduce large amounts of interspaced genomic regions or to synthesize the genome. Furthermore, construction of minimal genomes has frequently failed due to a lack of genomic information. Technological breakthroughs and intense systematic studies on genomes remain tasks. PMID:26356135

  8. Analysis of the bread wheat genome using whole-genome shotgun sequencing.

    PubMed

    Brenchley, Rachel; Spannagl, Manuel; Pfeifer, Matthias; Barker, Gary L A; D'Amore, Rosalinda; Allen, Alexandra M; McKenzie, Neil; Kramer, Melissa; Kerhornou, Arnaud; Bolser, Dan; Kay, Suzanne; Waite, Darren; Trick, Martin; Bancroft, Ian; Gu, Yong; Huo, Naxin; Luo, Ming-Cheng; Sehgal, Sunish; Gill, Bikram; Kianian, Sharyar; Anderson, Olin; Kersey, Paul; Dvorak, Jan; McCombie, W Richard; Hall, Anthony; Mayer, Klaus F X; Edwards, Keith J; Bevan, Michael W; Hall, Neil

    2012-11-29

    Bread wheat (Triticum aestivum) is a globally important crop, accounting for 20 per cent of the calories consumed by humans. Major efforts are underway worldwide to increase wheat production by extending genetic diversity and analysing key traits, and genomic resources can accelerate progress. But so far the very large size and polyploid complexity of the bread wheat genome have been substantial barriers to genome analysis. Here we report the sequencing of its large, 17-gigabase-pair, hexaploid genome using 454 pyrosequencing, and comparison of this with the sequences of diploid ancestral and progenitor genomes. We identified between 94,000 and 96,000 genes, and assigned two-thirds to the three component genomes (A, B and D) of hexaploid wheat. High-resolution synteny maps identified many small disruptions to conserved gene order. We show that the hexaploid genome is highly dynamic, with significant loss of gene family members on polyploidization and domestication, and an abundance of gene fragments. Several classes of genes involved in energy harvesting, metabolism and growth are among expanded gene families that could be associated with crop productivity. Our analyses, coupled with the identification of extensive genetic variation, provide a resource for accelerating gene discovery and improving this major crop. PMID:23192148

  9. Ancestral-derived effects on the mutational landscape of laryngeal cancer.

    PubMed

    Ramakodi, Meganathan P; Kulathinal, Rob J; Chung, Yujin; Serebriiskii, Ilya; Liu, Jeffrey C; Ragin, Camille C

    2016-03-01

    Laryngeal cancer disproportionately affects African-Americans compared with European-Americans. Here, we analyze the genome-wide somatic point mutations from the tumors of 13 African-Americans and 57 European-Americans from TCGA to differentiate between environmental and ancestrally-inherited factors. The mean number of mutations was different between African-Americans (151.31) and European-Americans (277.63). Other differences in the overall mutational landscape between African-Americans and European-Americans were also found. The frequencies of C>A and C>G mutations were significantly different between the two populations (p-value<0.05). Context nucleotide signatures for some mutation types significantly differ between these two populations. Thus, the context nucleotide signatures along with other factors could be related to the observed mutational landscape differences between the two races. Finally, we show that mutated genes associated with these mutational differences differ between the two populations. Thus, at the molecular level, race appears to be a factor in the progression of laryngeal cancer with ancestral genomic signatures best explaining these differences. PMID:26721311

  10. Genome Sequence of the Pathogenic Intestinal Spirochete Brachyspira hyodysenteriae Reveals Adaptations to Its Lifestyle in the Porcine Large Intestine

    PubMed Central

    La, Tom; Ryan, Karon; Moolhuijzen, Paula; Albertyn, Zayed; Shaban, Babak; Motro, Yair; Dunn, David S.; Schibeci, David; Hunter, Adam; Barrero, Roberto; Phillips, Nyree D.; Hampson, David J.

    2009-01-01

    Brachyspira hyodysenteriae is an anaerobic intestinal spirochete that colonizes the large intestine of pigs and causes swine dysentery, a disease of significant economic importance. The genome sequence of B. hyodysenteriae strain WA1 was determined, making it the first representative of the genus Brachyspira to be sequenced, and the seventeenth spirochete genome to be reported. The genome consisted of a circular 3,000,694 base pair (bp) chromosome, and a 35,940 bp circular plasmid that has not previously been described. The spirochete had 2,122 protein-coding sequences. Of the predicted proteins, more had similarities to proteins of the enteric Escherichia coli and Clostridium species than they did to proteins of other spirochetes. Many of these genes were associated with transport and metabolism, and they may have been gradually acquired through horizontal gene transfer in the environment of the large intestine. A reconstruction of central metabolic pathways identified a complete set of coding sequences for glycolysis, gluconeogenesis, a non-oxidative pentose phosphate pathway, nucleotide metabolism, lipooligosaccharide biosynthesis, and a respiratory electron transport chain. A notable finding was the presence on the plasmid of the genes involved in rhamnose biosynthesis. Potential virulence genes included those for 15 proteases and six hemolysins. Other adaptations to an enteric lifestyle included the presence of large numbers of genes associated with chemotaxis and motility. B. hyodysenteriae has diverged from other spirochetes in the process of accommodating to its habitat in the porcine large intestine. PMID:19262690