USDA-ARS's Scientific Manuscript database
Genomic selection (GS) models use genome-wide genetic information to predict genetic values of candidates for selection. Originally these models were developed without considering genotype × environment interaction (GE). Several authors have proposed extensions of the canonical GS model that accomm...
Advances and Challenges in Genomic Selection for Disease Resistance.
Poland, Jesse; Rutkoski, Jessica
2016-08-04
Breeding for disease resistance is a central focus of plant breeding programs, as any successful variety must have the complete package of high yield, disease resistance, agronomic performance, and end-use quality. With the need to accelerate the development of improved varieties, genomics-assisted breeding is becoming an important tool in breeding programs. With marker-assisted selection, there has been success in breeding for disease resistance; however, much of this work and research has focused on identifying, mapping, and selecting for major resistance genes that tend to be highly effective but vulnerable to breakdown with rapid changes in pathogen races. In contrast, breeding for minor-gene quantitative resistance tends to produce more durable varieties but is a more challenging breeding objective. As the genetic architecture of resistance shifts from single major R genes to a diffused architecture of many minor genes, the best approach for molecular breeding will shift from marker-assisted selection to genomic selection. Genomics-assisted breeding for quantitative resistance will therefore necessitate whole-genome prediction models and selection methodology as implemented for classical complex traits such as yield. Here, we examine multiple case studies testing whole-genome prediction models and genomic selection for disease resistance. In general, whole-genome models for disease resistance can produce prediction accuracy suitable for application in breeding. These models also largely outperform multiple linear regression as would be applied in marker-assisted selection. With the implementation of genomic selection for yield and other agronomic traits, whole-genome marker profiles will be available for the entire set of breeding lines, enabling genomic selection for disease at no additional direct cost. 
In this context, the scope of implementing genomic selection for disease resistance, and specifically for quantitative resistance and quarantined pathogens, becomes a tractable and powerful approach in breeding programs.
Michel, Sebastian; Ametz, Christian; Gungor, Huseyin; Akgöl, Batuhan; Epure, Doru; Grausgruber, Heinrich; Löschenberger, Franziska; Buerstmayr, Hermann
2017-02-01
Early generation genomic selection is superior to conventional phenotypic selection in line breeding and can be strongly improved by including additional information from preliminary yield trials. The selection of lines that enter resource-demanding multi-environment trials is a crucial decision in every line breeding program, as a large amount of resources is allocated for thoroughly testing these potential varietal candidates. We compared conventional phenotypic selection with various genomic selection approaches across multiple years and assessed the merit of integrating phenotypic information from preliminary yield trials into the genomic selection framework. The prediction accuracy using only phenotypic data was rather low (r = 0.21) for grain yield but could be improved by modeling genetic relationships in unreplicated preliminary yield trials (r = 0.33). Genomic selection models were nevertheless found to be superior to conventional phenotypic selection for predicting grain yield performance of lines across years (r = 0.39). We subsequently simplified the problem of predicting untested lines in untested years to predicting tested lines in untested years by combining breeding values from preliminary yield trials and predictions from genomic selection models by a heritability index. This genomic assisted selection led to a 20% increase in prediction accuracy, which could be further enhanced by an appropriate marker selection for both grain yield (r = 0.48) and protein content (r = 0.63). The easy-to-implement and robust genomic assisted selection thus gave a higher prediction accuracy than either conventional phenotypic or genomic selection alone. The proposed method took the complex inheritance of both low and high heritable traits into account and appears capable of supporting breeders in their selection decisions to develop enhanced varieties more efficiently.
Genomic selection in a commercial winter wheat population.
He, Sang; Schulthess, Albert Wilhelm; Mirdita, Vilson; Zhao, Yusheng; Korzun, Viktor; Bothe, Reiner; Ebmeyer, Erhard; Reif, Jochen C; Jiang, Yong
2016-03-01
Genomic selection models can be trained using historical data, and filtering genotypes based on phenotyping intensity and a reliability criterion can increase the prediction ability. We implemented genomic selection based on a large commercial population incorporating 2325 European winter wheat lines. Our objectives were (1) to study whether modeling epistasis besides additive genetic effects enhances the prediction ability of genomic selection, (2) to assess prediction ability when the training population comprised historical or less-intensively phenotyped lines, and (3) to explore the prediction ability in subpopulations selected based on the reliability criterion. We found a 5% increase in prediction ability when shifting from additive to additive plus epistatic effects models. In addition, only a marginal loss from 0.65 to 0.50 in accuracy was observed using the data collected from 1 year to predict genotypes of the following year, revealing that stable genomic selection models can be accurately calibrated to predict subsequent breeding stages. Moreover, prediction ability was maximized when the genotypes evaluated in a single location were excluded from the training set but subsequently decreased again when the phenotyping intensity was increased above two locations, suggesting that the update of the training population should be performed considering all the selected genotypes but excluding those evaluated in a single location. The genomic prediction ability was substantially higher in subpopulations selected based on the reliability criterion, indicating that phenotypic selection for highly reliable individuals could be directly replaced by applying genomic selection to them. We empirically conclude that there is a high potential to assist commercial wheat breeding programs employing genomic selection approaches.
Valente, Bruno D.; Morota, Gota; Peñagaricano, Francisco; Gianola, Daniel; Weigel, Kent; Rosa, Guilherme J. M.
2015-01-01
The term “effect” in additive genetic effect suggests a causal meaning. However, inferences of such quantities for selection purposes are typically viewed and conducted as a prediction task. Predictive ability as tested by cross-validation is currently the most acceptable criterion for comparing models and evaluating new methodologies. Nevertheless, it does not directly indicate if predictors reflect causal effects. Such evaluations would require causal inference methods that are not typical in genomic prediction for selection. This suggests that the usual approach to infer genetic effects contradicts the label of the quantity inferred. Here we investigate if genomic predictors for selection should be treated as standard predictors or if they must reflect a causal effect to be useful, requiring causal inference methods. Conducting the analysis as a prediction or as a causal inference task affects, for example, how covariates of the regression model are chosen, which may heavily affect the magnitude of genomic predictors and therefore selection decisions. We demonstrate that selection requires learning causal genetic effects. However, genomic predictors from some models might capture noncausal signal, providing good predictive ability but poorly representing true genetic effects. Simulated examples are used to show that aiming for predictive ability may lead to poor modeling decisions, while causal inference approaches may guide the construction of regression models that better infer the target genetic effect even when they underperform in cross-validation tests. In conclusion, genomic selection models should be constructed to aim primarily for identifiability of causal genetic effects, not for predictive ability. PMID:25908318
Assessing Predictive Properties of Genome-Wide Selection in Soybeans
Xavier, Alencar; Muir, William M.; Rainey, Katy Martin
2016-01-01
Many economically important traits in plant breeding have low heritability or are difficult to measure. For these traits, genomic selection has attractive features and may boost genetic gains. Our goal was to evaluate alternative scenarios to implement genomic selection for yield components in soybean (Glycine max (L.) Merr.). We used a nested association panel with cross validation to evaluate the impacts of training population size, genotyping density, and prediction model on the accuracy of genomic prediction. Our results indicate that training population size was the factor most relevant to improvement in genome-wide prediction, with greatest improvement observed in training sets up to 2000 individuals. We discuss assumptions that influence the choice of the prediction model. Although alternative models had minor impacts on prediction accuracy, the most robust prediction model was the combination of reproducing kernel Hilbert space regression and BayesB. Higher genotyping density marginally improved accuracy. Our study finds that breeding programs seeking efficient genomic selection in soybeans would best allocate resources by investing in a representative training set. PMID:27317786
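The cross-validation design described in this abstract (prediction accuracy as a function of training population size) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are simulated, ridge regression stands in for the models the authors actually compared, and the names `ridge_predict`, `cv_accuracy`, and the penalty `lam` are hypothetical.

```python
import numpy as np

def ridge_predict(X_train, y_train, X_test, lam):
    """Fit ridge regression on marker scores and predict the test lines."""
    mu = y_train.mean()
    col_mean = X_train.mean(axis=0)
    Xc = X_train - col_mean
    # Dual form: solve an n x n system instead of a p x p one.
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(y_train)), y_train - mu)
    beta = Xc.T @ alpha  # estimated marker effects
    return mu + (X_test - col_mean) @ beta

def cv_accuracy(X, y, n_train, lam=100.0, n_rep=20, seed=0):
    """Mean correlation between predictions and phenotypes over random splits."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_rep):
        perm = rng.permutation(len(y))
        train, test = perm[:n_train], perm[n_train:]
        y_hat = ridge_predict(X[train], y[train], X[test], lam)
        accs.append(np.corrcoef(y_hat, y[test])[0, 1])
    return float(np.mean(accs))

# Simulated data (not from the study): 400 lines, 200 markers coded 0/1/2.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(400, 200)).astype(float)
g = X @ rng.normal(0.0, 1.0, 200)           # simulated genetic values
y = g + rng.normal(0.0, g.std(), 400)       # heritability of about 0.5
acc_by_size = {n: cv_accuracy(X, y, n) for n in (50, 100, 200, 300)}
```

In this toy setting, accuracy rises with training set size, which is the qualitative pattern the abstract reports; the magnitudes have no bearing on the soybean results.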
USDA-ARS's Scientific Manuscript database
In this study, we aimed to (1) predict genomic estimated breeding value (GEBV) for bacterial cold water disease (BCWD) resistance by genotyping training (n=583) and validation samples (n=53) with two genotyping platforms (24K RAD-SNP and 49K SNP) and using different genomic selection (GS) models (Ba...
USDA-ARS's Scientific Manuscript database
Bacterial cold water disease (BCWD) causes significant economic losses in salmonid aquaculture, and traditional family-based breeding programs aimed at improving BCWD resistance have been limited to exploiting only between-family variation. We used genomic selection (GS) models to predict genomic br...
Lado, Bettina; Matus, Ivan; Rodríguez, Alejandra; Inostroza, Luis; Poland, Jesse; Belzile, François; del Pozo, Alejandro; Quincke, Martín; Castro, Marina; von Zitzewitz, Jarislav
2013-12-09
In crop breeding, the interest of predicting the performance of candidate cultivars in the field has increased due to recent advances in molecular breeding technologies. However, the complexity of the wheat genome presents some challenges for applying new technologies in molecular marker identification with next-generation sequencing. We applied genotyping-by-sequencing, a recently developed method to identify single-nucleotide polymorphisms, in the genomes of 384 wheat (Triticum aestivum) genotypes that were field tested under three different water regimes in Mediterranean climatic conditions: rain-fed only, mild water stress, and fully irrigated. We identified 102,324 single-nucleotide polymorphisms in these genotypes, and the phenotypic data were used to train and test genomic selection models intended to predict yield, thousand-kernel weight, number of kernels per spike, and heading date. Phenotypic data showed marked spatial variation. Therefore, different models were tested to correct the trends observed in the field. A mixed-model using moving-means as a covariate was found to best fit the data. When we applied the genomic selection models, the accuracy of predicted traits increased with spatial adjustment. Multiple genomic selection models were tested, and a Gaussian kernel model was determined to give the highest accuracy. The best predictions between environments were obtained when data from different years were used to train the model. Our results confirm that genotyping-by-sequencing is an effective tool to obtain genome-wide information for crops with complex genomes, that these data are efficient for predicting traits, and that correction of spatial variation is a crucial ingredient to increase prediction accuracy in genomic selection models.
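The idea of using moving means as a covariate to correct field trends can be sketched as follows. This is a simplified illustration, not the authors' mixed model: it uses ordinary least squares on a simulated rectangular field, and the function names `moving_mean` and `adjust_for_trend` and the `radius` parameter are hypothetical.

```python
import numpy as np

def moving_mean(field, radius=1):
    """Mean of the neighbouring plots around each plot, excluding the plot itself."""
    n_rows, n_cols = field.shape
    out = np.empty((n_rows, n_cols), dtype=float)
    for r in range(n_rows):
        for c in range(n_cols):
            r0, r1 = max(0, r - radius), min(n_rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(n_cols, c + radius + 1)
            window = field[r0:r1, c0:c1]
            out[r, c] = (window.sum() - field[r, c]) / (window.size - 1)
    return out

def adjust_for_trend(field, radius=1):
    """Regress plot values on the moving-mean covariate and remove the fitted trend."""
    cov = moving_mean(field, radius).ravel()
    y = field.ravel()
    slope, _intercept = np.polyfit(cov, y, 1)
    adjusted = y - slope * (cov - cov.mean())
    return adjusted.reshape(field.shape)

# Simulated field (not from the study): a fertility gradient across columns plus noise.
rng = np.random.default_rng(0)
trend = np.tile(np.linspace(0.0, 3.0, 12), (10, 1))
field = trend + rng.normal(0.0, 0.5, size=(10, 12))
adjusted = adjust_for_trend(field)
```

A mixed model would estimate the covariate effect jointly with genotype effects; the two-step version here only shows how the covariate is built and why it absorbs smooth spatial variation.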
2017-01-01
The consequences of selection at linked sites are multiple and widespread across the genomes of most species. Here, I first review the main concepts behind models of selection and linkage in recombining genomes, present the difficulty in parametrizing these models simply as a reduction in effective population size (Ne) and discuss the predicted impact of recombination rates on levels of diversity across genomes. Arguments are then put forward in favour of using a model of selection and linkage with neutral and deleterious mutations (i.e. the background selection model, BGS) as a sensible null hypothesis for investigating the presence of other forms of selection, such as balancing or positive. I also describe and compare two studies that have generated high-resolution landscapes of the predicted consequences of selection at linked sites in Drosophila melanogaster. Both studies show that BGS can explain a very large fraction of the observed variation in diversity across the whole genome, thus supporting its use as null model. Finally, I identify and discuss a number of caveats and challenges in studies of genetic hitchhiking that have been often overlooked, with several of them sharing a potential bias towards overestimating the evidence supporting recent selective sweeps to the detriment of a BGS explanation. One potential source of bias is the analysis of non-equilibrium populations: it is precisely because models of selection and linkage predict variation in Ne across chromosomes that demographic dynamics are not expected to be equivalent chromosome- or genome-wide. Other challenges include the use of incomplete genome annotations, the assumption of temporally stable recombination landscapes, the presence of genes under balancing selection and the consequences of ignoring non-crossover (gene conversion) recombination events. This article is part of the themed issue ‘Evolutionary causes and consequences of recombination rate variation in sexual organisms’. 
PMID:29109230
Juliana, Philomin; Singh, Ravi P; Singh, Pawan K; Crossa, Jose; Rutkoski, Jessica E; Poland, Jesse A; Bergstrom, Gary C; Sorrells, Mark E
2017-07-01
The leaf spotting diseases in wheat that include Septoria tritici blotch (STB) caused by Zymoseptoria tritici, Stagonospora nodorum blotch (SNB) caused by Parastagonospora nodorum, and tan spot (TS) caused by Pyrenophora tritici-repentis pose challenges to breeding programs in selecting for resistance. A promising approach that could enable selection prior to phenotyping is genomic selection that uses genome-wide markers to estimate breeding values (BVs) for quantitative traits. To evaluate this approach for seedling and/or adult plant resistance (APR) to STB, SNB, and TS, we compared the predictive ability of least-squares (LS) approach with genomic-enabled prediction models including genomic best linear unbiased predictor (GBLUP), Bayesian ridge regression (BRR), Bayes A (BA), Bayes B (BB), Bayes Cπ (BC), Bayesian least absolute shrinkage and selection operator (BL), and reproducing kernel Hilbert spaces markers (RKHS-M), a pedigree-based model (RKHS-P) and RKHS markers and pedigree (RKHS-MP). We observed that LS gave the lowest prediction accuracies and RKHS-MP, the highest. The genomic-enabled prediction models and RKHS-P gave similar accuracies. The increase in accuracy using genomic prediction models over LS was 48%. The mean genomic prediction accuracies were 0.45 for STB (APR), 0.55 for SNB (seedling), 0.66 for TS (seedling) and 0.48 for TS (APR). We also compared markers from two whole-genome profiling approaches: genotyping by sequencing (GBS) and diversity arrays technology sequencing (DArTseq) for prediction. While GBS markers performed slightly better than DArTseq, combining markers from the two approaches did not improve accuracies. We conclude that implementing GS in breeding for these diseases would help to achieve higher accuracies and rapid gains from selection. Copyright © 2017 Crop Science Society of America.
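Of the models listed in this abstract, GBLUP is the simplest to write down. The sketch below is a generic illustration under stated assumptions, not the authors' pipeline: it builds a VanRaden-type genomic relationship matrix from simulated 0/1/2 marker scores and assumes the heritability `h2` is known rather than estimated. The function names are hypothetical.

```python
import numpy as np

def vanraden_g(M):
    """Genomic relationship matrix from an n x p matrix of 0/1/2 marker scores."""
    p = M.mean(axis=0) / 2.0            # allele frequency per marker
    Z = M - 2.0 * p                     # centre each marker
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y, train_idx, test_idx, h2):
    """Predict test-line values from training phenotypes, assuming a known h2."""
    lam = (1.0 - h2) / h2               # residual-to-genetic variance ratio
    mu = y[train_idx].mean()
    G_tt = G[np.ix_(train_idx, train_idx)]
    G_st = G[np.ix_(test_idx, train_idx)]
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), y[train_idx] - mu)
    return mu + G_st @ alpha

# Simulated data (not from the study): 500 lines, 200 markers.
rng = np.random.default_rng(42)
freqs = rng.uniform(0.1, 0.9, 200)
M = rng.binomial(2, freqs, size=(500, 200)).astype(float)
g = M @ rng.normal(0.0, 1.0, 200)
y = g + rng.normal(0.0, g.std(), 500)   # heritability of about 0.5

G = vanraden_g(M)
train_idx, test_idx = np.arange(400), np.arange(400, 500)
pred = gblup_predict(G, y, train_idx, test_idx, h2=0.5)
accuracy = np.corrcoef(pred, y[test_idx])[0, 1]
```

In practice the variance components would be estimated, for example by REML, rather than supplied; fixing `h2` keeps the sketch short.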
Genome-enabled prediction models for yield related traits in chickpea
USDA-ARS's Scientific Manuscript database
Genomic selection (GS) unlike marker-assisted backcrossing (MABC) predicts breeding values of lines using genome-wide marker profiling and allows selection of lines prior to field-phenotyping, thereby shortening the breeding cycle. A collection of 320 elite breeding lines was selected and phenotyped...
Measuring genomic pre-selection in theory and in practice
USDA-ARS's Scientific Manuscript database
Potential biases from genomic pre-selection were estimated from actual selection and mating patterns of US Holsteins. Traditional models using only phenotypes and pedigrees do not adjust for average genomic merit of an animal’s parents, progeny, mates, or contemporaries. Positive assortative mating ...
Lado, Bettina; Matus, Ivan; Rodríguez, Alejandra; Inostroza, Luis; Poland, Jesse; Belzile, François; del Pozo, Alejandro; Quincke, Martín; Castro, Marina; von Zitzewitz, Jarislav
2013-01-01
In crop breeding, the interest of predicting the performance of candidate cultivars in the field has increased due to recent advances in molecular breeding technologies. However, the complexity of the wheat genome presents some challenges for applying new technologies in molecular marker identification with next-generation sequencing. We applied genotyping-by-sequencing, a recently developed method to identify single-nucleotide polymorphisms, in the genomes of 384 wheat (Triticum aestivum) genotypes that were field tested under three different water regimes in Mediterranean climatic conditions: rain-fed only, mild water stress, and fully irrigated. We identified 102,324 single-nucleotide polymorphisms in these genotypes, and the phenotypic data were used to train and test genomic selection models intended to predict yield, thousand-kernel weight, number of kernels per spike, and heading date. Phenotypic data showed marked spatial variation. Therefore, different models were tested to correct the trends observed in the field. A mixed-model using moving-means as a covariate was found to best fit the data. When we applied the genomic selection models, the accuracy of predicted traits increased with spatial adjustment. Multiple genomic selection models were tested, and a Gaussian kernel model was determined to give the highest accuracy. The best predictions between environments were obtained when data from different years were used to train the model. Our results confirm that genotyping-by-sequencing is an effective tool to obtain genome-wide information for crops with complex genomes, that these data are efficient for predicting traits, and that correction of spatial variation is a crucial ingredient to increase prediction accuracy in genomic selection models. PMID:24082033
A Primer on High-Throughput Computing for Genomic Selection
Wu, Xiao-Lin; Beissinger, Timothy M.; Bauck, Stewart; Woodward, Brent; Rosa, Guilherme J. M.; Weigel, Kent A.; Gatti, Natalia de Leon; Gianola, Daniel
2011-01-01
High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of achieving high throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model based on known data, and the resulting model is used to predict the genetic merit of selection candidates. Sophisticated models are very computationally demanding and, with several traits to be evaluated sequentially, computing time is long, and output is low. In this paper, we present scenarios and basic principles of how HTC can be used in genomic selection, implemented using various techniques from simple batch processing to pipelining in distributed computer clusters. Various scripting languages, such as shell scripting, Perl, and R, are also very useful to devise pipelines. By pipelining, we can reduce total computing time and consequently increase throughput. In comparison to the traditional data processing pipeline residing on the central processors, performing general-purpose computation on a graphics processing unit provides a new-generation approach to massively parallel computing in genomic selection. While the concept of HTC may still be new to many researchers in animal breeding, plant breeding, and genetics, HTC infrastructures have already been built in many institutions, such as the University of Wisconsin–Madison, which can be leveraged for genomic selection, in terms of central processing unit capacity, network connectivity, storage availability, and middleware connectivity. Exploring existing HTC infrastructures as well as general-purpose computing environments will further expand our capability to meet increasing computing demands posed by unprecedented genomic data that we have today. 
We anticipate that HTC will impact genomic selection via better statistical models, faster solutions, and more competitive products (e.g., from design of marker panels to realized genetic gain). Eventually, HTC may change our view of data analysis as well as decision-making in the post-genomic era of selection programs in animals and plants, or in the study of complex diseases in humans. PMID:22303303
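The point about evaluating several traits concurrently instead of sequentially can be shown at small scale. The sketch below is only an analogy to the batch processing the abstract describes: it runs on one machine with Python's standard `concurrent.futures` thread pool rather than on a cluster scheduler, and the per-trait model (`fit_trait`, a plain ridge regression) is a hypothetical stand-in for an expensive genomic model.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def fit_trait(X, y, lam=10.0):
    """Ridge marker effects for one trait (stand-in for an expensive model)."""
    Xc = X - X.mean(axis=0)
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y - y.mean()))

def fit_all_traits(X, traits, max_workers=4):
    """Submit one independent job per trait and collect the results by name."""
    names = list(traits)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        effects = list(pool.map(lambda name: fit_trait(X, traits[name]), names))
    return dict(zip(names, effects))

# Simulated data (illustrative only): 100 individuals, 50 markers, 3 traits.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(100, 50)).astype(float)
traits = {name: rng.normal(size=100) for name in ("milk", "fat", "protein")}
results = fit_all_traits(X, traits)
```

Because the per-trait jobs share no state, the same structure maps onto one cluster job per trait; only the submission mechanism changes.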
Calus, M P L; de Haas, Y; Veerkamp, R F
2013-10-01
Genomic selection holds the promise to be particularly beneficial for traits that are difficult or expensive to measure, such that access to phenotypes on large daughter groups of bulls is limited. Instead, cow reference populations can be generated, potentially supplemented with existing information from the same or (highly) correlated traits available on bull reference populations. The objective of this study, therefore, was to develop a model to perform genomic predictions and genome-wide association studies based on a combined cow and bull reference data set, with the accuracy of the phenotypes differing between the cow and bull genomic selection reference populations. The developed bivariate Bayesian stochastic search variable selection model allowed for an unbalanced design by imputing residuals in the residual updating scheme for all missing records. The performance of this model is demonstrated on a real data example, where the analyzed trait, being milk fat or protein yield, was either measured only on a cow or a bull reference population, or recorded on both. Our results were that the developed bivariate Bayesian stochastic search variable selection model was able to analyze 2 traits, even though animals had measurements on only 1 of 2 traits. The Bayesian stochastic search variable selection model yielded consistently higher accuracy for fat yield compared with a model without variable selection, both for the univariate and bivariate analyses, whereas the accuracy of both models was very similar for protein yield. The bivariate model identified several additional quantitative trait loci peaks compared with the single-trait models on either trait. In addition, the bivariate models showed a marginal increase in accuracy of genomic predictions for the cow traits (0.01-0.05), although a greater increase in accuracy is expected as the size of the bull population increases. 
Our results emphasize that the chosen values of priors in Bayesian genomic prediction models are especially important in small data sets. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Genome-wide heterogeneity of nucleotide substitution model fit.
Arbiza, Leonardo; Patricio, Mateus; Dopazo, Hernán; Posada, David
2011-01-01
At a genomic scale, the patterns that have shaped molecular evolution are believed to be largely heterogeneous. Consequently, comparative analyses should use appropriate probabilistic substitution models that capture the main features under which different genomic regions have evolved. While efforts have concentrated on the development and understanding of model selection techniques, no descriptions of overall relative substitution model fit at the genome level have been reported. Here, we provide a characterization of best-fit substitution models across three genomic data sets including coding regions from mammals, vertebrates, and Drosophila (24,000 alignments). According to the Akaike Information Criterion (AIC), 82 of 88 models considered were selected as best-fit models on at least one occasion, although with very different frequencies. Most parameter estimates also varied broadly among genes. Patterns found for vertebrates and Drosophila were quite similar and often more complex than those found in mammals. Phylogenetic trees derived from models in the 95% confidence interval set showed much less variance and were significantly closer to the tree estimated under the best-fit model than trees derived from models outside this interval. Although alternative criteria selected simpler models than the AIC, they suggested similar patterns. Altogether, our results show that at a genomic scale, different gene alignments for the same set of taxa are best explained by a large variety of different substitution models and that model choice has implications on different parameter estimates including the inferred phylogenetic trees. After taking into account the differences related to sample size, our results suggest a noticeable diversity in the underlying evolutionary process. Altogether, we conclude that the use of model selection techniques is important to obtain consistent phylogenetic estimates from real data at a genomic scale.
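AIC-based model choice and a 95% confidence set of models, as mentioned in this abstract, can be sketched in a few lines. The log-likelihood values below are invented for illustration and do not come from the study; the parameter counts are the free substitution parameters of each model (branch lengths excluded), and the helper names `aic_table` and `confidence_set` are hypothetical.

```python
import math

def aic_table(models):
    """Return {name: (AIC, Akaike weight)} from {name: (log-likelihood, n_params)}."""
    aic = {name: 2 * k - 2 * lnl for name, (lnl, k) in models.items()}
    best = min(aic.values())
    rel = {name: math.exp(-0.5 * (a - best)) for name, a in aic.items()}
    total = sum(rel.values())
    return {name: (aic[name], rel[name] / total) for name in models}

def confidence_set(table, level=0.95):
    """Smallest set of models, by descending weight, whose weights sum to >= level."""
    chosen, cumulative = [], 0.0
    for name, (_aic, weight) in sorted(table.items(), key=lambda kv: -kv[1][1]):
        chosen.append(name)
        cumulative += weight
        if cumulative >= level:
            break
    return chosen

# Illustrative values only: (log-likelihood, number of free substitution parameters).
models = {"JC69": (-5230.0, 0), "HKY85": (-5105.0, 4), "GTR": (-5101.5, 8)}
table = aic_table(models)
best_model = min(table, key=lambda name: table[name][0])
conf_set = confidence_set(table)
```

With these made-up numbers HKY85 has the lowest AIC by one unit over GTR, so both fall inside the 95% set while JC69 is excluded; this shows how a best-fit model and its near competitors are separated from the rest.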
Theory of prokaryotic genome evolution.
Sela, Itamar; Wolf, Yuri I; Koonin, Eugene V
2016-10-11
Bacteria and archaea typically possess small genomes that are tightly packed with protein-coding genes. The compactness of prokaryotic genomes is commonly perceived as evidence of adaptive genome streamlining caused by strong purifying selection in large microbial populations. In such populations, even the small cost incurred by nonfunctional DNA because of extra energy and time expenditure is thought to be sufficient for this extra genetic material to be eliminated by selection. However, contrary to the predictions of this model, there exists a consistent, positive correlation between the strength of selection at the protein sequence level, measured as the ratio of nonsynonymous to synonymous substitution rates, and microbial genome size. Here, by fitting the genome size distributions in multiple groups of prokaryotes to predictions of mathematical models of population evolution, we show that only models in which acquisition of additional genes is, on average, slightly beneficial yield a good fit to genomic data. These results suggest that the number of genes in prokaryotic genomes reflects the equilibrium between the benefit of additional genes that diminishes as the genome grows and deletion bias (i.e., the rate of deletion of genetic material being slightly greater than the rate of acquisition). Thus, new genes acquired by microbial genomes, on average, appear to be adaptive. The tight spacing of protein-coding genes likely results from a combination of the deletion bias and purifying selection that efficiently eliminates nonfunctional, noncoding sequences.
Genetic signatures of natural selection in a model invasive ascidian
NASA Astrophysics Data System (ADS)
Lin, Yaping; Chen, Yiyong; Yi, Changho; Fong, Jonathan J.; Kim, Won; Rius, Marc; Zhan, Aibin
2017-03-01
Invasive species represent promising models to study species’ responses to rapidly changing environments. Although local adaptation frequently occurs during contemporary range expansion, the associated genetic signatures at both population and genomic levels remain largely unknown. Here, we use genome-wide gene-associated microsatellites to investigate genetic signatures of natural selection in a model invasive ascidian, Ciona robusta. Population genetic analyses of 150 individuals sampled in Korea, New Zealand, South Africa and Spain showed significant genetic differentiation among populations. Based on outlier tests, we found high incidence of signatures of directional selection at 19 loci. Hitchhiking mapping analyses identified 12 directional selective sweep regions, and all selective sweep windows on chromosomes were narrow (~8.9 kb). Further analyses identified 132 candidate genes under selection. When we compared our genetic data and six crucial environmental variables, 16 putatively selected loci showed significant correlation with these environmental variables. This suggests that the local environmental conditions have left significant signatures of selection at both population and genomic levels. Finally, we identified “plastic” genomic regions and genes that are promising regions to investigate evolutionary responses to rapid environmental change in C. robusta.
Ovenden, Ben; Milgate, Andrew; Wade, Len J; Rebetzke, Greg J; Holland, James B
2018-05-31
Abiotic stress tolerance traits are often complex and recalcitrant targets for conventional breeding improvement in many crop species. This study evaluated the potential of genomic selection to predict water-soluble carbohydrate concentration (WSCC), an important drought tolerance trait, in wheat under field conditions. A panel of 358 varieties and breeding lines constrained for maturity was evaluated under rainfed and irrigated treatments across two locations and two years. Whole-genome marker profiles and factor analytic mixed models were used to generate genomic estimated breeding values (GEBVs) for specific environments and environment groups. Additive genetic variance was smaller than residual genetic variance for WSCC, such that genotypic values were dominated by residual genetic effects rather than additive breeding values. As a result, GEBVs were not accurate predictors of genotypic values of the extant lines, but GEBVs should be reliable selection criteria to choose parents for intermating to produce new populations. The accuracy of GEBVs for untested lines was sufficient to increase predicted genetic gain from genomic selection per unit time compared to phenotypic selection if the breeding cycle is reduced by half by the use of GEBVs in off-season generations. Further, genomic prediction accuracy depended on having phenotypic data from environments with strong correlations with target production environments to build prediction models. By combining high-density marker genotypes, stress-managed field evaluations, and mixed models that model simultaneously covariances among genotypes and covariances of complex trait performance between pairs of environments, we were able to train models with good accuracy to facilitate genetic gain from genomic selection. Copyright © 2018 Ovenden et al.
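The claim about genetic gain per unit time follows from the standard breeder's equation. As a worked sketch, with symbols that are conventional rather than taken from this abstract (selection intensity i, selection accuracy r, additive genetic standard deviation \sigma_A, cycle length L), and assuming i and \sigma_A are the same under genomic selection (GS) and phenotypic selection (PS):

```latex
\frac{\Delta G}{t} = \frac{i \, r \, \sigma_A}{L}
\qquad\Longrightarrow\qquad
\frac{(\Delta G / t)_{\mathrm{GS}}}{(\Delta G / t)_{\mathrm{PS}}}
  = \frac{r_{\mathrm{GS}}}{r_{\mathrm{PS}}} \cdot \frac{L_{\mathrm{PS}}}{L_{\mathrm{GS}}}
```

If the cycle is halved, so that L_GS = L_PS / 2, the ratio exceeds 1 whenever r_GS > r_PS / 2. Under these assumptions, GEBVs can therefore be less accurate than phenotypes and still deliver more gain per unit time.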
Liabeuf, Debora; Sim, Sung-Chur; Francis, David M
2018-03-01
Bacterial spot affects tomato crops (Solanum lycopersicum) grown under humid conditions. Major genes and quantitative trait loci (QTL) for resistance have been described, and multiple loci from diverse sources need to be combined to improve disease control. We investigated genomic selection (GS) prediction models for resistance to Xanthomonas euvesicatoria and experimentally evaluated the accuracy of these models. The training population consisted of 109 families combining resistance from four sources and directionally selected from a population of 1,100 individuals. The families were evaluated on a plot basis in replicated inoculated trials and genotyped with single nucleotide polymorphisms (SNP). We compared the prediction ability of models developed with 14 to 387 SNP. Genomic estimated breeding values (GEBV) were derived using Bayesian least absolute shrinkage and selection operator regression (BL) and ridge regression (RR). Evaluations were based on leave-one-out cross validation and on empirical observations in replicated field trials using the next generation of inbred progeny and a hybrid population resulting from selections in the training population. Prediction ability was evaluated based on correlations between GEBV and phenotypes (r_g), percentage of coselection between genomic and phenotypic selection, and relative efficiency of selection (r_g/r_p). Results were similar with BL and RR models. Models using only markers previously identified as significantly associated with resistance but weighted based on GEBV and mixed models with markers associated with resistance treated as fixed effects and markers distributed in the genome treated as random effects offered greater accuracy and a high percentage of coselection. The accuracy of these models to predict the performance of progeny and hybrids exceeded the accuracy of phenotypic selection.
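Two of the evaluation measures named in this abstract, the correlation between GEBV and phenotype and the percentage of coselection, can be computed as sketched below. The exact definitions used by the authors are not given in the abstract, so this is one plausible reading: coselection is taken as the share of lines chosen by both criteria when the same fraction is selected on each, with higher values treated as better. The function names and the `fraction` parameter are hypothetical.

```python
import numpy as np

def prediction_correlation(gebv, phenotype):
    """Pearson correlation between predicted breeding values and phenotypes."""
    return float(np.corrcoef(gebv, phenotype)[0, 1])

def coselection_percent(gebv, phenotype, fraction=0.2):
    """Percentage of selected lines shared by genomic and phenotypic selection."""
    n_sel = max(1, int(round(fraction * len(gebv))))
    top_genomic = set(np.argsort(gebv)[-n_sel:].tolist())
    top_phenotypic = set(np.argsort(phenotype)[-n_sel:].tolist())
    return 100.0 * len(top_genomic & top_phenotypic) / n_sel

# Toy values (illustrative only): identical rankings versus reversed rankings.
scores = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
same = coselection_percent(scores, scores, fraction=0.4)
opposite = coselection_percent(scores, scores[::-1], fraction=0.4)
```

For a disease score where lower is better, the inputs would be negated before calling these helpers.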
Parallel altitudinal clines reveal trends in adaptive evolution of genome size in Zea mays
Berg, Jeremy J.; Birchler, James A.; Grote, Mark N.; Lorant, Anne; Quezada, Juvenal
2018-01-01
While the vast majority of genome size variation in plants is due to differences in repetitive sequence, we know little about how selection acts on repeat content in natural populations. Here we investigate parallel changes in intraspecific genome size and repeat content of domesticated maize (Zea mays) landraces and their wild relative teosinte across altitudinal gradients in Mesoamerica and South America. We combine genotyping, low coverage whole-genome sequence data, and flow cytometry to test for evidence of selection on genome size and individual repeat abundance. We find that population structure alone cannot explain the observed variation, implying that clinal patterns of genome size are maintained by natural selection. Our modeling additionally provides evidence of selection on individual heterochromatic knob repeats, likely due to their large individual contribution to genome size. To better understand the phenotypes driving selection on genome size, we conducted a growth chamber experiment using a population of highland teosinte exhibiting extensive variation in genome size. We find weak support for a positive correlation between genome size and cell size, but stronger support for a negative correlation between genome size and the rate of cell production. Reanalyzing published data of cell counts in maize shoot apical meristems, we then identify a negative correlation between cell production rate and flowering time. Together, our data suggest a model in which variation in genome size is driven by natural selection on flowering time across altitudinal clines, connecting intraspecific variation in repetitive sequence to important differences in adaptive phenotypes. PMID:29746459
USDA-ARS's Scientific Manuscript database
Genomic Selection (GS) is a new breeding method in which genome-wide markers are used to predict the breeding value of individuals in a breeding population. GS has been shown to improve breeding efficiency in dairy cattle and several crop plant species, and here we evaluate for the first time its ef...
A Selective Review of Group Selection in High-Dimensional Models
Huang, Jian; Breheny, Patrick; Ma, Shuangge
2013-01-01
Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome wide association studies. We also highlight some issues that require further study. PMID:24174707
USDA-ARS's Scientific Manuscript database
Human selection has reshaped crop genomes. Here we report an apple genome variation map generated through genome sequencing of 117 diverse accessions. A comprehensive model of apple speciation and domestication along the Silk Road was proposed based on evidence from diverse genomic analyses. Cultiva...
Jonas, Elisabeth; de Koning, Dirk Jan
Genomic selection is an important topic in quantitative genetics and breeding. Not only does it allow the full use of current molecular genetic technologies, it also stimulates the development of new methods and models. Genomic selection, if fully implemented in commercial farming, should have a major impact on the productivity of various agricultural systems. But suggested approaches need to be applicable in commercial breeding populations. Many of the published research studies focus on methodologies. We conclude from the reviewed publications that a stronger focus on strategies for the implementation of genomic selection in advanced breeding lines, introduction of new varieties, hybrids or multi-line crosses is needed. Efforts to find solutions for a better prediction and integration of environmental influences need to continue within applied breeding schemes. Goals of the implementation of genomic selection into crop breeding should be carefully defined and crop breeders in the private sector will play a substantial part in the decision-making process. However, the lack of published results from studies within, or in collaboration with, private companies diminishes the knowledge on the status of genomic selection within applied breeding programmes. Studies on the implementation of genomic selection in plant breeding need to evaluate models and methods with an enhanced emphasis on population-specific requirements and production environments. Adaptation of methods to breeding schemes or changes to breeding programmes for a better integration of genomic selection strategies are needed across species. More openness with a continuous exchange will contribute to successes.
Genomic Selection in Multi-environment Crop Trials.
Oakey, Helena; Cullis, Brian; Thompson, Robin; Comadran, Jordi; Halpin, Claire; Waugh, Robbie
2016-05-03
Genomic selection in crop breeding introduces modeling challenges not found in animal studies. These include the need to accommodate replicate plants for each line, consider spatial variation in field trials, address line by environment interactions, and capture nonadditive effects. Here, we propose a flexible single-stage genomic selection approach that resolves these issues. Our linear mixed model incorporates spatial variation through environment-specific terms, and also randomization-based design terms. It considers marker, and marker by environment interactions using ridge regression best linear unbiased prediction to extend genomic selection to multiple environments. Since the approach uses the raw data from line replicates, the line genetic variation is partitioned into marker and nonmarker residual genetic variation (i.e., additive and nonadditive effects). This results in a more precise estimate of marker genetic effects. Using barley height data from trials, in 2 different years, of up to 477 cultivars, we demonstrate that our new genomic selection model improves predictions compared to current models. Analyzing single trials revealed improvements in predictive ability of up to 5.7%. For the multiple environment trial (MET) model, combining both year trials improved predictive ability up to 11.4% compared to a single environment analysis. Benefits were significant even when fewer markers were used. Compared to a single-year standard model run with 3490 markers, our partitioned MET model achieved the same predictive ability using between 500 and 1000 markers depending on the trial. Our approach can be used to increase accuracy and confidence in the selection of the best lines for breeding and/or, to reduce costs by using fewer markers. Copyright © 2016 Oakey et al.
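The marker and marker-by-environment part of such a model can be sketched with a stacked ridge regression. This is a simplified illustration, not the single-stage model of the abstract: it omits the spatial, design, and nonmarker residual genetic terms, and the function names, the fixed penalties `lam_main` and `lam_int`, and the simulated data are assumptions.

```python
import numpy as np


def fit_marker_by_env_ridge(X_by_env, y_by_env, lam_main=1.0, lam_int=1.0):
    """Ridge fit of main marker effects plus environment-specific deviations."""
    n_env = len(X_by_env)
    p = X_by_env[0].shape[1]
    rows = []
    for e, X in enumerate(X_by_env):
        # Columns: [main effects | deviations env 0 | deviations env 1 | ...]
        blocks = [X] + [X if k == e else np.zeros_like(X) for k in range(n_env)]
        rows.append(np.hstack(blocks))
    W = np.vstack(rows)
    # Center phenotypes within environment to absorb environment means
    y = np.concatenate([y_e - y_e.mean() for y_e in y_by_env])
    penalty = np.concatenate([np.full(p, lam_main), np.full(n_env * p, lam_int)])
    coef = np.linalg.solve(W.T @ W + np.diag(penalty), W.T @ y)
    return coef[:p], coef[p:].reshape(n_env, p)


def predict_in_env(X_new, main, dev, env):
    """Predicted (environment-centered) genetic value in environment `env`."""
    return X_new @ (main + dev[env])


# Hypothetical data: two trials with different numbers of lines, 10 markers.
rng = np.random.default_rng(1)
X1 = rng.integers(0, 3, size=(30, 10)).astype(float) - 1.0
X2 = rng.integers(0, 3, size=(25, 10)).astype(float) - 1.0
y1 = rng.normal(size=30)
y2 = rng.normal(size=25)

main, dev = fit_marker_by_env_ridge([X1, X2], [y1, y2])
pred = predict_in_env(X1, main, dev, env=0)
```

Using separate penalties for main effects and deviations lets the fit borrow strength across trials while still allowing environment-specific marker effects.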
Via, Sara
2012-01-01
In allopatric populations, geographical separation simultaneously isolates the entire genome, allowing genetic divergence to accumulate virtually anywhere in the genome. In sympatric populations, however, the strong divergent selection required to overcome migration produces a genetic mosaic of divergent and non-divergent genomic regions. In some recent genome scans, each divergent genomic region has been interpreted as an independent incidence of migration/selection balance, such that the reduction of gene exchange is restricted to a few kilobases around each divergently selected gene. I propose an alternative mechanism, ‘divergence hitchhiking’ (DH), in which divergent selection can reduce gene exchange for several megabases around a gene under strong divergent selection. Not all genes/markers within a DH region are divergently selected, yet the entire region is protected to some degree from gene exchange, permitting genetic divergence from mechanisms other than divergent selection to accumulate secondarily. After contrasting DH and multilocus migration/selection balance (MM/SB), I outline a model in which genomic isolation at a given genomic location is jointly determined by DH and genome-wide effects of the progressive reduction in realized migration, then illustrate DH using data from several pairs of incipient species in the wild. PMID:22201174
USDA-ARS's Scientific Manuscript database
Previously we have shown that bacterial cold water disease (BCWD) resistance in rainbow trout can be improved using traditional family-based selection, but progress has been limited to exploiting only between-family genetic variation. Genomic selection (GS) is a new alternative enabling exploitation...
Genetic signatures of natural selection in a model invasive ascidian
Lin, Yaping; Chen, Yiyong; Yi, Changho; Fong, Jonathan J.; Kim, Won; Rius, Marc; Zhan, Aibin
2017-01-01
Invasive species represent promising models to study species’ responses to rapidly changing environments. Although local adaptation frequently occurs during contemporary range expansion, the associated genetic signatures at both population and genomic levels remain largely unknown. Here, we use genome-wide gene-associated microsatellites to investigate genetic signatures of natural selection in a model invasive ascidian, Ciona robusta. Population genetic analyses of 150 individuals sampled in Korea, New Zealand, South Africa and Spain showed significant genetic differentiation among populations. Based on outlier tests, we found a high incidence of signatures of directional selection at 19 loci. Hitchhiking mapping analyses identified 12 directional selective sweep regions, and all selective sweep windows on chromosomes were narrow (~8.9 kb). Further analyses identified 132 candidate genes under selection. When we compared our genetic data and six crucial environmental variables, 16 putatively selected loci showed significant correlation with these environmental variables. This suggests that the local environmental conditions have left significant signatures of selection at both population and genomic levels. Finally, we identified “plastic” genomic regions and genes that are promising regions to investigate evolutionary responses to rapid environmental change in C. robusta. PMID:28266616
Resende, R T; Resende, M D V; Silva, F F; Azevedo, C F; Takahashi, E K; Silva-Junior, O B; Grattapaglia, D
2017-10-01
We report a genomic selection (GS) study of growth and wood quality traits in an outbred F2 hybrid Eucalyptus population (n=768) using high-density single-nucleotide polymorphism (SNP) genotyping. Going beyond previous reports in forest trees, models were developed for different selection targets, namely, families, individuals within families and individuals across the entire population using a genomic model including dominance. To provide a more breeder-intelligible assessment of the performance of GS we calculated the expected response as the percentage gain over the population average expected genetic value (EGV) for different proportions of genomically selected individuals, using a rigorous cross-validation (CV) scheme that removed relatedness between training and validation sets. Predictive abilities (PAs) were 0.40-0.57 for individual selection and 0.56-0.75 for family selection. An additive+dominance model improved PAs by 5 to 14% for growth depending on the selection target, but no improvement was seen for wood traits. The good performance of GS with no relatedness in CV suggested that our average SNP density (~25 kb) captured some short-range linkage disequilibrium. Truncation GS successfully selected individuals with an average EGV significantly higher than the population average. Response to GS on a per year basis was ~100% more efficient than by phenotypic selection and more so with higher selection intensities. These results contribute further experimental data supporting the positive prospects of GS in forest trees. Because generation times are long, traits are complex and costs of DNA genotyping are plummeting, genomic prediction has good prospects for adoption in tree breeding practice.
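The expected response described above, the percentage gain of the selected group over the population average EGV for a given selected proportion, can be sketched as a truncation on predicted values. This is a minimal illustration: the function name and the four-individual example are hypothetical, and the study's cross-validation scheme is not reproduced here.

```python
import numpy as np


def percent_gain(predicted, egv, proportion):
    """Percentage gain in mean EGV of the top `proportion` (ranked by prediction)
    over the population mean EGV. Assumes a positive population mean."""
    predicted = np.asarray(predicted, dtype=float)
    egv = np.asarray(egv, dtype=float)
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    n_selected = max(1, int(round(proportion * len(predicted))))
    top = np.argsort(predicted)[::-1][:n_selected]   # highest predictions first
    population_mean = egv.mean()
    return 100.0 * (egv[top].mean() - population_mean) / population_mean


# Hypothetical example: four individuals with genomic predictions and EGVs.
predicted = [0.1, 0.4, 0.2, 0.9]
egv = [10.0, 12.0, 14.0, 16.0]
gain_top_quarter = percent_gain(predicted, egv, 0.25)
gain_top_half = percent_gain(predicted, egv, 0.50)
gain_all = percent_gain(predicted, egv, 1.0)
```

Selecting a smaller proportion raises the selection intensity, which is consistent with the abstract's observation that the advantage of GS grew with higher selection intensities.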
Genomic Selection in Plant Breeding: Methods, Models, and Perspectives.
Crossa, José; Pérez-Rodríguez, Paulino; Cuevas, Jaime; Montesinos-López, Osval; Jarquín, Diego; de Los Campos, Gustavo; Burgueño, Juan; González-Camacho, Juan M; Pérez-Elizalde, Sergio; Beyene, Yoseph; Dreisigacker, Susanne; Singh, Ravi; Zhang, Xuecai; Gowda, Manje; Roorkiwal, Manish; Rutkoski, Jessica; Varshney, Rajeev K
2017-11-01
Genomic selection (GS) facilitates the rapid selection of superior genotypes and accelerates the breeding cycle. In this review, we discuss the history, principles, and basis of GS and genomic-enabled prediction (GP) as well as the genetics and statistical complexities of GP models, including genomic genotype×environment (G×E) interactions. We also examine the accuracy of GP models and methods for two cereal crops and two legume crops based on random cross-validation. GS applied to maize breeding has shown tangible genetic gains. Based on GP results, we speculate how GS in germplasm enhancement (i.e., prebreeding) programs could accelerate the flow of genes from gene bank accessions to elite lines. Recent advances in hyperspectral image technology could be combined with GS and pedigree-assisted breeding. Copyright © 2017 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Rachmatia, H.; Kusuma, W. A.; Hasibuan, L. S.
2017-05-01
Selection in plant breeding could be more effective and efficient if it were based on genomic data. Genomic selection (GS) is a new approach to plant-breeding selection that exploits genomic data through a mechanism called genomic prediction (GP). Most GP models use linear methods that ignore interactions among genes and higher-order nonlinear effects. The deep belief network (DBN), a deep learning architecture, can model data at a high level of abstraction that captures nonlinear effects in the data. This study implemented a DBN to develop a GP model using whole-genome single nucleotide polymorphisms (SNPs) as training and testing data. The case study was a set of traits in maize. The maize dataset was acquired from the Global Maize program of CIMMYT (International Maize and Wheat Improvement Center). Based on Pearson correlation, DBN outperformed the other methods, namely reproducing kernel Hilbert space (RKHS) regression, Bayesian LASSO (BL), and best linear unbiased predictor (BLUP), for traits presumed to be non-additive. DBN achieved a correlation of 0.579 (on a scale of -1 to 1).
Genome-Assisted Prediction of Quantitative Traits Using the R Package sommer.
Covarrubias-Pazaran, Giovanny
2016-01-01
Most traits of agronomic importance are quantitative in nature, and genetic markers have been used for decades to dissect such traits. Recently, genomic selection has earned attention as next generation sequencing technologies became feasible for major and minor crops. Mixed models have become a key tool for fitting genomic selection models, but most current genomic selection software can only include a single variance component other than the error, making hybrid prediction using additive, dominance and epistatic effects unfeasible for species displaying heterotic effects. Moreover, likelihood-based software for fitting mixed models with multiple random effects that allows the user to specify the variance-covariance structure of random effects has not been fully exploited. A new open-source R package called sommer is presented to facilitate the use of mixed models for genomic selection and hybrid prediction purposes using more than one variance component and allowing specification of covariance structures. The use of sommer for genomic prediction is demonstrated through several examples using maize and wheat genotypic and phenotypic data. At its core, the program contains three algorithms for estimating variance components: Average information (AI), Expectation-Maximization (EM) and Efficient Mixed Model Association (EMMA). Kernels for calculating the additive, dominance and epistatic relationship matrices are included, along with other useful functions for genomic analysis. Results from sommer were comparable to other software, but the analysis was faster than Bayesian counterparts on the order of hours to days. In addition, the ability to deal with missing data, combined with greater flexibility and speed than other REML-based software, was achieved by putting together some of the most efficient algorithms to fit models in a gentle environment such as R.
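To make the idea of an additive relationship kernel concrete, the sketch below computes a genomic relationship matrix with the widely used VanRaden (2008) method 1 formula. It is written in Python for illustration and is not sommer's code; the package's own kernels may differ in details such as missing-data handling. The function name and the toy genotype matrix are assumptions.

```python
import numpy as np


def additive_relationship(M):
    """Genomic relationship matrix G = ZZ' / (2 * sum(p * (1 - p))),
    where M holds 0/1/2 allele counts (rows: individuals, columns: markers)."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0            # allele frequency per marker
    Z = M - 2.0 * p                     # center each marker by twice its frequency
    denom = 2.0 * np.sum(p * (1.0 - p))
    if denom == 0:
        raise ValueError("all markers are monomorphic")
    return Z @ Z.T / denom


# Toy genotypes (hypothetical): three individuals, three markers.
M = [[0, 1, 2],
     [2, 1, 0],
     [1, 1, 1]]
G = additive_relationship(M)
```

A matrix like `G` is what a mixed-model solver would take as the covariance structure of the additive random effect; dominance and epistatic kernels follow the same pattern with different marker codings.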
Economic evaluation of genomic selection in small ruminants: a sheep meat breeding program.
Shumbusho, F; Raoul, J; Astruc, J M; Palhiere, I; Lemarié, S; Fugeray-Scarbel, A; Elsen, J M
2016-06-01
Recent genomic evaluation studies using real data and predicting genetic gain by modeling breeding programs have reported moderate expected benefits from the replacement of classic selection schemes by genomic selection (GS) in small ruminants. The objectives of this study were to compare the cost, monetary genetic gain and economic efficiency of classic selection and GS schemes in the meat sheep industry. Deterministic methods were used to model selection based on multi-trait indices from a sheep meat breeding program. Decisional variables related to male selection candidates and progeny testing were optimized to maximize the annual monetary genetic gain (AMGG), that is, a weighted sum of meat and maternal traits annual genetic gains. For GS, a reference population of 2000 individuals was assumed and genomic information was available for evaluation of male candidates only. In the classic selection scheme, males breeding values were estimated from own and offspring phenotypes. In GS, different scenarios were considered, differing by the information used to select males (genomic only, genomic+own performance, genomic+offspring phenotypes). The results showed that all GS scenarios were associated with higher total variable costs than classic selection (if the cost of genotyping was 123 euros/animal). In terms of AMGG and economic returns, GS scenarios were found to be superior to classic selection only if genomic information was combined with their own meat phenotypes (GS-Pheno) or with their progeny test information. The predicted economic efficiency, defined as returns (proportional to number of expressions of AMGG in the nucleus and commercial flocks) minus total variable costs, showed that the best GS scenario (GS-Pheno) was up to 15% more efficient than classic selection. For all selection scenarios, optimization increased the overall AMGG, returns and economic efficiency. 
As a conclusion, our study shows that some forms of GS strategies are more advantageous than classic selection, provided that GS is already initiated (i.e. the initial reference population is available). Optimizing decisional variables of the classic selection scheme could be of greater benefit than including genomic information in optimized designs.
The locus of sexual selection: moving sexual selection studies into the post-genomics era.
Wilkinson, G S; Breden, F; Mank, J E; Ritchie, M G; Higginson, A D; Radwan, J; Jaquiery, J; Salzburger, W; Arriero, E; Barribeau, S M; Phillips, P C; Renn, S C P; Rowe, L
2015-04-01
Sexual selection drives fundamental evolutionary processes such as trait elaboration and speciation. Despite this importance, there are surprisingly few examples of genes unequivocally responsible for variation in sexually selected phenotypes. This lack of information inhibits our ability to predict phenotypic change due to universal behaviours, such as fighting over mates and mate choice. Here, we discuss reasons for this apparent gap and provide recommendations for how it can be overcome by adopting contemporary genomic methods, exploiting underutilized taxa that may be ideal for detecting the effects of sexual selection and adopting appropriate experimental paradigms. Identifying genes that determine variation in sexually selected traits has the potential to improve theoretical models and reveal whether the genetic changes underlying phenotypic novelty utilize common or unique molecular mechanisms. Such a genomic approach to sexual selection will help answer questions in the evolution of sexually selected phenotypes that were first asked by Darwin and can furthermore serve as a model for the application of genomics in all areas of evolutionary biology. © 2015 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2015 European Society For Evolutionary Biology.
Theory of microbial genome evolution
NASA Astrophysics Data System (ADS)
Koonin, Eugene
Bacteria and archaea have small genomes tightly packed with protein-coding genes. This compactness is commonly perceived as evidence of adaptive genome streamlining caused by strong purifying selection in large microbial populations. In such populations, even the small cost incurred by nonfunctional DNA because of extra energy and time expenditure is thought to be sufficient for this extra genetic material to be eliminated by selection. However, contrary to the predictions of this model, there exists a consistent, positive correlation between the strength of selection at the protein sequence level, measured as the ratio of nonsynonymous to synonymous substitution rates, and microbial genome size. By fitting the genome size distributions in multiple groups of prokaryotes to predictions of mathematical models of population evolution, we show that only models in which acquisition of additional genes is, on average, slightly beneficial yield a good fit to genomic data. Thus, the number of genes in prokaryotic genomes seems to reflect the equilibrium between the benefit of additional genes that diminishes as the genome grows and deletion bias. New genes acquired by microbial genomes, on average, appear to be adaptive. Evolution of bacterial and archaeal genomes involves extensive horizontal gene transfer and gene loss. Many microbes have open pangenomes, where each newly sequenced genome contains more than 10% 'ORFans', genes without detectable homologues in other species. A simple, steady-state evolutionary model reveals two sharply distinct classes of microbial genes, one of which (ORFans) is characterized by effectively instantaneous gene replacement, whereas the other consists of genes with finite, distributed replacement rates. These findings imply a conservative estimate of at least a billion distinct genes in the prokaryotic genomic universe.
Rutkoski, Jessica; Poland, Jesse; Mondal, Suchismita; Autrique, Enrique; Pérez, Lorena González; Crossa, José; Reynolds, Matthew; Singh, Ravi
2016-01-01
Genomic selection can be applied prior to phenotyping, enabling shorter breeding cycles and greater rates of genetic gain relative to phenotypic selection. Traits measured using high-throughput phenotyping based on proximal or remote sensing could be useful for improving pedigree and genomic prediction model accuracies for traits not yet possible to phenotype directly. We tested if using aerial measurements of canopy temperature, and green and red normalized difference vegetation index as secondary traits in pedigree and genomic best linear unbiased prediction models could increase accuracy for grain yield in wheat, Triticum aestivum L., using 557 lines in five environments. Secondary traits on training and test sets, and grain yield on the training set were modeled as multivariate, and compared to univariate models with grain yield on the training set only. Cross validation accuracies were estimated within and across-environment, with and without replication, and with and without correcting for days to heading. We observed that, within environment, with unreplicated secondary trait data, and without correcting for days to heading, secondary traits increased accuracies for grain yield by 56% in pedigree, and 70% in genomic prediction models, on average. Secondary traits increased accuracy slightly more when replicated, and considerably less when models corrected for days to heading. In across-environment prediction, trends were similar but less consistent. These results show that secondary traits measured in high-throughput could be used in pedigree and genomic prediction to improve accuracy. This approach could improve selection in wheat during early stages if validated in early-generation breeding plots. PMID:27402362
Improving the baking quality of bread wheat by genomic selection in early generations.
Michel, Sebastian; Kummer, Christian; Gallee, Martin; Hellinger, Jakob; Ametz, Christian; Akgöl, Batuhan; Epure, Doru; Güngör, Huseyin; Löschenberger, Franziska; Buerstmayr, Hermann
2018-02-01
Genomic selection shows great promise for pre-selecting lines with superior bread baking quality in early generations, 3 years ahead of labour-intensive, time-consuming, and costly quality analysis. The genetic improvement of baking quality is one of the grand challenges in wheat breeding as the assessment of the associated traits often involves time-consuming, labour-intensive, and costly testing forcing breeders to postpone sophisticated quality tests to the very last phases of variety development. The prospect of genomic selection for complex traits like grain yield has been shown in numerous studies, and might thus be also an interesting method to select for baking quality traits. Hence, we focused in this study on the accuracy of genomic selection for laborious and expensive to phenotype quality traits as well as its selection response in comparison with phenotypic selection. More than 400 genotyped wheat lines were, therefore, phenotyped for protein content, dough viscoelastic and mixing properties related to baking quality in multi-environment trials 2009-2016. The average prediction accuracy across three independent validation populations was r = 0.39 and could be increased to r = 0.47 by modelling major QTL as fixed effects as well as employing multi-trait prediction models, which resulted in an acceptable prediction accuracy for all dough rheological traits (r = 0.38-0.63). Genomic selection can furthermore be applied 2-3 years earlier than direct phenotypic selection, and the estimated selection response was nearly twice as high in comparison with indirect selection by protein content for baking quality related traits. This considerable advantage of genomic selection could accordingly support breeders in their selection decisions and aid in efficiently combining superior baking quality with grain yield in newly developed wheat varieties.
Multilocus approaches for the measurement of selection on correlated genetic loci.
Gompert, Zachariah; Egan, Scott P; Barrett, Rowan D H; Feder, Jeffrey L; Nosil, Patrik
2017-01-01
The study of ecological speciation is inherently linked to the study of selection. Methods for estimating phenotypic selection within a generation based on associations between trait values and fitness (e.g. survival) of individuals are established. These methods attempt to disentangle selection acting directly on a trait from indirect selection caused by correlations with other traits via multivariate statistical approaches (i.e. inference of selection gradients). The estimation of selection on genotypic or genomic variation could also benefit from disentangling direct and indirect selection on genetic loci. However, achieving this goal is difficult with genomic data because the number of potentially correlated genetic loci (p) is very large relative to the number of individuals sampled (n). In other words, the number of model parameters exceeds the number of observations (p ≫ n). We present simulations examining the utility of whole-genome regression approaches (i.e. Bayesian sparse linear mixed models) for quantifying direct selection in cases where p ≫ n. Such models have been used for genome-wide association mapping and are common in artificial breeding. Our results show they hold promise for studies of natural selection in the wild and thus of ecological speciation. But we also demonstrate important limitations to the approach and discuss study designs required for more robust inferences. © 2016 John Wiley & Sons Ltd.
Brøndum, R F; Su, G; Janss, L; Sahana, G; Guldbrandtsen, B; Boichard, D; Lund, M S
2015-06-01
This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected with the aim of augmenting the custom low-density Illumina BovineLD SNP chip (San Diego, CA) used in the Nordic countries. The single-marker analysis was done breed-wise on all 16 index traits included in the breeding goals for Nordic Holstein, Danish Jersey, and Nordic Red cattle plus the total merit index itself. Depending on the trait's economic weight, 15, 10, or 5 quantitative trait loci (QTL) were selected per trait per breed and 3 to 5 markers were selected to tag each QTL. After removing duplicate markers (same marker selected for more than one trait or breed) and filtering for high pairwise linkage disequilibrium and assaying performance on the array, a total of 1,623 QTL markers were selected for inclusion on the custom chip. Genomic prediction analyses were performed for Nordic and French Holstein and Nordic Red animals using either a genomic BLUP or a Bayesian variable selection model. When using the genomic BLUP model including the QTL markers in the analysis, reliability was increased by up to 4 percentage points for production traits in Nordic Holstein animals, up to 3 percentage points for Nordic Reds, and up to 5 percentage points for French Holstein. Smaller gains of up to 1 percentage point were observed for mastitis, but only a 0.5 percentage point increase was seen for fertility. When using a Bayesian model, accuracies were generally higher with only 54k data compared with the genomic BLUP approach, but increases in reliability were relatively smaller when QTL markers were included. 
Results from this study indicate that the reliability of genomic prediction can be increased by including markers significant in genome-wide association studies on whole genome sequence data alongside the 54k SNP set. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Are there laws of genome evolution?
Koonin, Eugene V
2011-08-01
Research in quantitative evolutionary genomics and systems biology led to the discovery of several universal regularities connecting genomic and molecular phenomic variables. These universals include the log-normal distribution of the evolutionary rates of orthologous genes; the power law-like distributions of paralogous family size and node degree in various biological networks; the negative correlation between a gene's sequence evolution rate and expression level; and differential scaling of functional classes of genes with genome size. The universals of genome evolution can be accounted for by simple mathematical models similar to those used in statistical physics, such as the birth-death-innovation model. These models do not explicitly incorporate selection; therefore, the observed universal regularities do not appear to be shaped by selection but rather are emergent properties of gene ensembles. Although a complete physical theory of evolutionary biology is inconceivable, the universals of genome evolution might qualify as "laws of evolutionary genomics" in the same sense "law" is understood in modern physics.
Frantz, Laurent A F; Schraiber, Joshua G; Madsen, Ole; Megens, Hendrik-Jan; Cagan, Alex; Bosse, Mirte; Paudel, Yogesh; Crooijmans, Richard P M A; Larson, Greger; Groenen, Martien A M
2015-10-01
Traditionally, the process of domestication is assumed to be initiated by humans, involve few individuals and rely on reproductive isolation between wild and domestic forms. We analyzed pig domestication using over 100 genome sequences and tested whether pig domestication followed a traditional linear model or a more complex, reticulate model. We found that the assumptions of traditional models, such as reproductive isolation and strong domestication bottlenecks, are incompatible with the genetic data. In addition, our results show that, despite gene flow, the genomes of domestic pigs have strong signatures of selection at loci that affect behavior and morphology. We argue that recurrent selection for domestic traits likely counteracted the homogenizing effect of gene flow from wild boars and created 'islands of domestication' in the genome. Our results have major ramifications for the understanding of animal domestication and suggest that future studies should employ models that do not assume reproductive isolation.
2013-01-01
Background: Genomic selection is an appealing method to select purebreds for crossbred performance. In the case of crossbred records, single nucleotide polymorphism (SNP) effects can be estimated using an additive model or a breed-specific allele model. In most studies, additive gene action is assumed. However, dominance is the likely genetic basis of heterosis. Advantages of incorporating dominance in genomic selection were investigated in a two-way crossbreeding program for a trait with different magnitudes of dominance. Training was carried out only once in the simulation. Results: When the dominance variance and heterosis were large and overdominance was present, a dominance model including both additive and dominance SNP effects gave substantially greater cumulative response to selection than the additive model. Extra response was the result of an increase in heterosis but at a cost of reduced purebred performance. When the dominance variance and heterosis were realistic but with overdominance, the advantage of the dominance model decreased but was still significant. When overdominance was absent, the dominance model was slightly favored over the additive model, but the difference in response between the models increased as the number of quantitative trait loci increased. This reveals the importance of exploiting dominance even in the absence of overdominance. When there was no dominance, response to selection for the dominance model was as high as for the additive model, indicating robustness of the dominance model. The breed-specific allele model was inferior to the dominance model in all cases and to the additive model except when the dominance variance and heterosis were large and with overdominance. However, the advantage of the dominance model over the breed-specific allele model may decrease as differences in linkage disequilibrium between the breeds increase.
Retraining is expected to reduce the advantage of the dominance model over the alternatives, because in general, the advantage becomes important only after five or six generations post-training. Conclusion: Under dominance and without retraining, genomic selection based on the dominance model is superior to the additive model and the breed-specific allele model to maximize crossbred performance through purebred selection. PMID:23621868
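As an illustration of the additive and dominance SNP effects compared in the abstract above, the sketch below codes genotypes in one common way: an additive code of -1/0/1 and a dominance indicator equal to 1 for heterozygotes. The function names and this coding choice are assumptions made for illustration; the abstract does not state the parameterization used in the simulation.

```python
import numpy as np

def additive_dominance_codes(genotypes):
    """Code SNP genotypes given as allele counts (0, 1, 2).

    Returns an additive code (-1, 0, 1) and a dominance indicator
    (1 for heterozygotes, 0 for homozygotes). Hypothetical helper.
    """
    g = np.asarray(genotypes)
    additive = g - 1
    dominance = (g == 1).astype(int)
    return additive, dominance

def genotypic_value(genotypes, a, d):
    """Sum additive effects a and dominance effects d over SNPs."""
    additive, dominance = additive_dominance_codes(genotypes)
    return additive @ a + dominance @ d
```

Setting every element of d to zero reduces the function to a purely additive model, which is the comparison the abstract describes.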
Manel, S; Perrier, C; Pratlong, M; Abi-Rached, L; Paganini, J; Pontarotti, P; Aurelle, D
2016-01-01
Genome scans represent powerful approaches to investigate the action of natural selection on the genetic variation of natural populations and to better understand local adaptation. This is very useful, for example, in the field of conservation biology and evolutionary biology. Thanks to Next Generation Sequencing, genomic resources are growing exponentially, improving genome scan analyses in non-model species. Thousands of SNPs called using Reduced Representation Sequencing are increasingly used in genome scans. Besides, genome sequences are also becoming increasingly available, allowing better processing of short-read data, offering physical localization of variants, and improving haplotype reconstruction and data imputation. Ultimately, genome sequences are also becoming the raw material for selection inferences. Here, we discuss how the increasing availability of such genomic resources, notably genome sequences, influences the detection of signals of selection. Mainly, increasing data density and having the information of physical linkage data expand genome scans by (i) improving the overall quality of the data, (ii) helping the reconstruction of demographic history for the population studied to decrease false-positive rates and (iii) improving the statistical power of methods to detect the signal of selection. Of particular importance, the availability of a high-quality reference genome can improve the detection of the signal of selection by (i) allowing matching the potential candidate loci to linked coding regions under selection, (ii) rapidly moving the investigation to the gene and function and (iii) ensuring that the highly variable regions of the genomes that include functional genes are also investigated. For all those reasons, using reference genomes in genome scan analyses is highly recommended. © 2015 John Wiley & Sons Ltd.
Auinger, Hans-Jürgen; Schönleben, Manfred; Lehermeier, Christina; Schmidt, Malthe; Korzun, Viktor; Geiger, Hartwig H; Piepho, Hans-Peter; Gordillo, Andres; Wilde, Peer; Bauer, Eva; Schön, Chris-Carolin
2016-11-01
Genomic prediction accuracy can be significantly increased by model calibration across multiple breeding cycles as long as selection cycles are connected by common ancestors. In hybrid rye breeding, application of genome-based prediction is expected to increase selection gain because of long selection cycles in population improvement and development of hybrid components. Essentially two prediction scenarios arise: (1) prediction of the genetic value of lines from the same breeding cycle in which model training is performed and (2) prediction of lines from subsequent cycles. It is the latter from which a reduction in cycle length and consequently the strongest impact on selection gain is expected. We empirically investigated genome-based prediction of grain yield, plant height and thousand kernel weight within and across four selection cycles of a hybrid rye breeding program. Prediction performance was assessed using genomic and pedigree-based best linear unbiased prediction (GBLUP and PBLUP). A total of 1040 S2 lines were genotyped with 16k SNPs and each year testcrosses of 260 S2 lines were phenotyped in seven or eight locations. The performance gap between GBLUP and PBLUP increased significantly for all traits when model calibration was performed on aggregated data from several cycles. Prediction accuracies obtained from cross-validation were in the order of 0.70 for all traits when data from all cycles (N_CS = 832) were used for model training and exceeded within-cycle accuracies in all cases. As long as selection cycles are connected by a sufficient number of common ancestors and prediction accuracy has not reached a plateau when increasing sample size, aggregating data from several preceding cycles is recommended for predicting genetic values in subsequent cycles despite decreasing relatedness over time.
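A minimal sketch of how GBLUP predictions of the kind described above can be computed, assuming a VanRaden-style genomic relationship matrix and a known variance ratio lam (residual variance divided by genetic variance). The function names, the intercept-only fixed effect, and the fixed lam are assumptions for illustration; the study itself would estimate variance components and is not necessarily implemented this way.

```python
import numpy as np

def vanraden_g(M):
    """Genomic relationship matrix from an n x m matrix of 0/1/2 allele counts."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0                 # allele frequency per marker
    Z = M - 2.0 * p                          # center each marker column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup_predict(G, y_train, train_idx, test_idx, lam):
    """Predict test lines from training lines.

    lam is the assumed ratio of residual to genetic variance.
    """
    y_train = np.asarray(y_train, dtype=float)
    G_tt = G[np.ix_(train_idx, train_idx)]   # training x training block
    G_vt = G[np.ix_(test_idx, train_idx)]    # test x training block
    mu = y_train.mean()                      # intercept as the only fixed effect
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), y_train - mu)
    return mu + G_vt @ alpha
```

Aggregating several cycles corresponds to enlarging train_idx; predictions for a later cycle then depend on the relationships in G_vt between the new lines and all earlier ones.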
Relating hybrid advantage and genome replacement in unisexual salamanders.
Charney, Noah D
2012-05-01
Unisexual vertebrates are model systems for understanding the evolution of sex. Many predominantly clonal lineages allow occasional genetic recombination, which may be sufficient to avoid the accumulation of deleterious mutations and parasites. Introgression of paternal DNA into an all-female lineage represents a one-way flow of genetic material. Over many generations, this could result in complete replacement of the unisexual genomes by those of the donor species. The process of genome replacement may be counteracted by contemporary dispersal or by positive selection on hybrid nuclear genomes in ecotones. I present a conceptual model that relates nuclear genome replacement, positive selection on hybrids and biogeography in unisexual systems. I execute an individual-based simulation of the fate of hybrid genotypes in contact with a single host species. I parameterize these models for unisexual salamanders in the Ambystoma genus, for which the frequency of genome replacement has been a source of ongoing debate. I find that, if genome replacement occurs at a rate greater than 1/10,000 in Ambystoma, then there must be compensating positive selection in order to maintain observed levels of hybrid nuclei. Future researchers studying unisexual systems may use this framework as a guide to evaluating the hybrid superiority hypothesis. © 2011 The Author. Evolution © 2011 The Society for the Study of Evolution.
Efficient use of historical data for genomic selection: a case study of rust resistance in wheat
USDA-ARS's Scientific Manuscript database
Genomic selection (GS) is a new methodology that can improve wheat breeding efficiency. To implement GS, a training population (TP) with both phenotypic and genotypic data is required to train a statistical model used to predict genotyped selection candidates (SCs). Several factors impact prediction...
Heslot, Nicolas; Akdemir, Deniz; Sorrells, Mark E; Jannink, Jean-Luc
2014-02-01
Development of models to predict genotype by environment interactions, in unobserved environments, using environmental covariates, a crop model and genomic selection. Application to a large winter wheat dataset. Genotype by environment interaction (G*E) is one of the key issues when analyzing phenotypes. The use of environment data to model G*E has long been a subject of interest but is limited by the same problems as those addressed by genomic selection methods: a large number of correlated predictors each explaining a small amount of the total variance. In addition, non-linear responses of genotypes to stresses are expected to further complicate the analysis. Using a crop model to derive stress covariates from daily weather data for predicted crop development stages, we propose an extension of the factorial regression model to genomic selection. This model is further extended to the marker level, enabling the modeling of quantitative trait loci (QTL) by environment interaction (Q*E), on a genome-wide scale. A newly developed ensemble method, soft rule fit, was used to improve this model and capture non-linear responses of QTL to stresses. The method is tested using a large winter wheat dataset, representative of the type of data available in a large-scale commercial breeding program. Accuracy in predicting genotype performance in unobserved environments for which weather data were available increased by 11.1% on average and the variability in prediction accuracy decreased by 10.8%. By leveraging agronomic knowledge and the large historical datasets generated by breeding programs, this new model provides insight into the genetic architecture of genotype by environment interactions and could predict genotype performance based on past and future weather scenarios.
Jonas, Elisabeth; de Koning, Dirk-Jan
2015-01-01
Genomic selection is a promising development in agriculture, aiming at improved production by exploiting molecular genetic markers to design novel breeding programs and to develop new marker-based models for genetic evaluation. It opens opportunities for research, as novel algorithms and lab methodologies are developed. Genomic selection can be applied in many breeds and species. Further research on the implementation of genomic selection (GS) in breeding programs is highly desirable not only for the common good, but also for the private sector (breeding companies). It has been projected that this approach will improve selection routines, especially in species with long reproduction cycles, late or sex-limited or expensive trait recording and for complex traits. The task of integrating GS into existing breeding programs is, however, not straightforward. Despite successful integration into breeding programs for dairy cattle, it has yet to be shown how much emphasis can be given to the genomic information and how much additional phenotypic information is needed from new selection candidates. Genomic selection is already part of future planning in many breeding companies of pigs and beef cattle among others, but further research is needed to fully estimate how effective the use of genomic information will be for the prediction of the performance of future breeding stock. Genomic prediction of production in crossbreeding and across-breed schemes, costs and choice of individuals for genotyping are reasons for a reluctance to fully rely on genomic information for selection decisions. Breeding objectives are highly dependent on the industry and the additional gain when using genomic information has to be considered carefully. This review synthesizes some of the suggested approaches in selected livestock species including cattle, pig, chicken, and fish. It outlines tasks to help understand possible consequences when applying genomic information in breeding scenarios.
PMID:25750652
Lenz, Patrick R N; Beaulieu, Jean; Mansfield, Shawn D; Clément, Sébastien; Desponts, Mireille; Bousquet, Jean
2017-04-28
Genomic selection (GS) uses information from genomic signatures consisting of thousands of genetic markers to predict complex traits. As such, GS represents a promising approach to accelerate tree breeding, which is especially relevant for the genetic improvement of boreal conifers characterized by long breeding cycles. In the present study, we tested GS in an advanced-breeding population of the boreal black spruce (Picea mariana [Mill.] BSP) for growth and wood quality traits, and concurrently examined factors affecting GS model accuracy. The study relied on 734 25-year-old trees belonging to 34 full-sib families derived from 27 parents and that were established on two contrasting sites. Genomic profiles were obtained from 4993 Single Nucleotide Polymorphisms (SNPs) representative of as many gene loci distributed among the 12 linkage groups common to spruce. GS models were obtained for four growth and wood traits. Validation using independent sets of trees showed that GS model accuracy was high, related to trait heritability and equivalent to that of conventional pedigree-based models. In forward selection, gains per unit of time were three times higher with the GS approach than with conventional selection. In addition, models were also accurate across sites, indicating little genotype-by-environment interaction in the area investigated. Using information from half-sibs instead of full-sibs led to a significant reduction in model accuracy, indicating that the inclusion of relatedness in the model contributed to its higher accuracies. About 500 to 1000 markers were sufficient to obtain GS model accuracy almost equivalent to that obtained with all markers, whether they were well spread across the genome or from a single linkage group, further confirming the implication of relatedness and potential long-range linkage disequilibrium (LD) in the high accuracy estimates obtained. 
Only slightly higher model accuracy was obtained when using marker subsets that were identified to carry large effects, indicating a minor role for short-range LD in this population. This study supports the integration of GS models in advanced-generation tree breeding programs, given that high genomic prediction accuracy was obtained with a relatively small number of markers due to high relatedness and family structure in the population. In boreal spruce breeding programs and similar ones with long breeding cycles, much larger gain per unit of time can be obtained from genomic selection at an early age than by the conventional approach. GS thus appears highly profitable, especially in the context of forward selection in species which are amenable to mass vegetative propagation of selected stock, such as spruces.
Orsini, Luisa; Spanier, Katina I; De Meester, Luc
2012-05-01
Natural populations are confronted with multiple selection pressures resulting in a mosaic of environmental stressors at the landscape level. Identifying the genetic underpinning of adaptation to these complex selection environments and assigning causes of natural selection within multidimensional selection regimes in the wild is challenging. The water flea Daphnia is a renowned ecological model system with its well-documented ecology, the possibility to analyse subfossil dormant egg banks and the short generation time allowing an experimental evolution approach. Capitalizing on the strengths of this model system, we here link candidate genome regions to three selection pressures, known to induce micro-evolutionary responses in Daphnia magna: fish predation, parasitism and land use. Using a genome scan approach in space, time and experimental evolution trials, we provide solid evidence of selection at the genome level under well-characterized environmental gradients in the wild and identify candidate genes linked to the three environmental stressors. Our study reveals differential selection at the genome level in Daphnia populations and provides evidence for repeatable patterns of local adaptation in a geographic mosaic of environmental stressors fuelled by standing genetic variation. Our results imply high evolutionary potential of local populations, which is relevant to understand the dynamics of trait changes in natural populations and their impact on community and ecosystem responses through eco-evolutionary feedbacks. © 2012 Blackwell Publishing Ltd.
Microsatellites as targets of natural selection.
Haasl, Ryan J; Payseur, Bret A
2013-02-01
The ability to survey polymorphism on a genomic scale has enabled genome-wide scans for the targets of natural selection. Theory that connects patterns of genetic variation to evidence of natural selection most often assumes a diallelic locus and no recurrent mutation. Although these assumptions are suitable to selection that targets single nucleotide variants, fundamentally different types of mutation generate abundant polymorphism in genomes. Moreover, recent empirical results suggest that mutationally complex, multiallelic loci including microsatellites and copy number variants are sometimes targeted by natural selection. Given their abundance, the lack of inference methods tailored to the mutational peculiarities of these types of loci represents a notable gap in our ability to interrogate genomes for signatures of natural selection. Previous theoretical investigations of mutation-selection balance at multiallelic loci include assumptions that limit their application to inference from empirical data. Focusing on microsatellites, we assess the dynamics and population-level consequences of selection targeting mutationally complex variants. We develop general models of a multiallelic fitness surface, a realistic model of microsatellite mutation, and an efficient simulation algorithm. Using these tools, we explore mutation-selection-drift equilibrium at microsatellites and investigate the mutational history and selective regime of the microsatellite that causes Friedreich's ataxia. We characterize microsatellite selective events by their duration and cost, note similarities to sweeps from standing point variation, and conclude that it is premature to label microsatellites as ubiquitous agents of efficient adaptive change. Together, our models and simulation algorithm provide a powerful framework for statistical inference, which can be used to test the neutrality of microsatellites and other multiallelic variants.
Microsatellites as Targets of Natural Selection
Haasl, Ryan J.; Payseur, Bret A.
2013-01-01
The ability to survey polymorphism on a genomic scale has enabled genome-wide scans for the targets of natural selection. Theory that connects patterns of genetic variation to evidence of natural selection most often assumes a diallelic locus and no recurrent mutation. Although these assumptions are suitable to selection that targets single nucleotide variants, fundamentally different types of mutation generate abundant polymorphism in genomes. Moreover, recent empirical results suggest that mutationally complex, multiallelic loci including microsatellites and copy number variants are sometimes targeted by natural selection. Given their abundance, the lack of inference methods tailored to the mutational peculiarities of these types of loci represents a notable gap in our ability to interrogate genomes for signatures of natural selection. Previous theoretical investigations of mutation-selection balance at multiallelic loci include assumptions that limit their application to inference from empirical data. Focusing on microsatellites, we assess the dynamics and population-level consequences of selection targeting mutationally complex variants. We develop general models of a multiallelic fitness surface, a realistic model of microsatellite mutation, and an efficient simulation algorithm. Using these tools, we explore mutation-selection-drift equilibrium at microsatellites and investigate the mutational history and selective regime of the microsatellite that causes Friedreich’s ataxia. We characterize microsatellite selective events by their duration and cost, note similarities to sweeps from standing point variation, and conclude that it is premature to label microsatellites as ubiquitous agents of efficient adaptive change. Together, our models and simulation algorithm provide a powerful framework for statistical inference, which can be used to test the neutrality of microsatellites and other multiallelic variants. PMID:23104080
A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data.
Bertl, Johanna; Guo, Qianyun; Juul, Malene; Besenbacher, Søren; Nielsen, Morten Muhlig; Hornshøj, Henrik; Pedersen, Jakob Skou; Hobolth, Asger
2018-04-19
Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration. To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance in predicting mutation rate for both regional based models and site-specific models. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than regional based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures. We find that our site-specific multinomial regression model outperforms the regional based models. The possibility of including genomic variables on different scales and patient specific variables makes it a versatile framework for studying different mutational mechanisms. 
Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.
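The site-specific idea above, a multinomial logistic regression that assigns each genomic position its own probability for every mutation type, can be sketched as follows. The coefficient layout, the choice of "no mutation" as the reference class, and the function names are assumptions for illustration and are not taken from the authors' implementation.

```python
import numpy as np

def mutation_probabilities(x, B):
    """Per-site class probabilities from a multinomial logistic model.

    x: (n_sites, n_features) explanatory variables per position.
    B: (n_features, n_types) coefficients for each mutation type,
       relative to a reference class "no mutation" (an assumption).
    Returns (n_sites, n_types + 1); column 0 is the reference class.
    """
    eta = np.asarray(x, dtype=float) @ np.asarray(B, dtype=float)
    # The reference class has linear predictor 0.
    eta = np.hstack([np.zeros((eta.shape[0], 1)), eta])
    eta -= eta.max(axis=1, keepdims=True)    # numerical stability
    expo = np.exp(eta)
    return expo / expo.sum(axis=1, keepdims=True)

def expected_mutation_counts(probs):
    """Expected count of each mutation type over a region: sum of site probabilities."""
    return probs[:, 1:].sum(axis=0)
```

Summing per-site probabilities, rather than averaging the explanatory variables over the region first, is what lets the prediction respond to fine-scale variation within an element.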
Exploring new alleles for frost tolerance in winter rye.
Erath, Wiltrud; Bauer, Eva; Fowler, D Brian; Gordillo, Andres; Korzun, Viktor; Ponomareva, Mira; Schmidt, Malthe; Schmiedchen, Brigitta; Wilde, Peer; Schön, Chris-Carolin
2017-10-01
Rye genetic resources provide a valuable source of new alleles for the improvement of frost tolerance in rye breeding programs. Frost tolerance is a must-have trait for winter cereal production in northern and continental cropping areas. Genetic resources should harbor promising alleles for the improvement of frost tolerance of winter rye elite lines. For frost tolerance breeding, the identification of quantitative trait loci (QTL) and the choice of optimum genome-based selection methods are essential. We identified genomic regions involved in frost tolerance of winter rye by QTL mapping in a biparental population derived from a highly frost tolerant selection from the Canadian cultivar Puma and the European elite line Lo157. Lines per se and their testcrosses were phenotyped in a controlled freeze test and in multi-location field trials in Russia and Canada. Three QTL on chromosomes 4R, 5R, and 7R were consistently detected across environments. The QTL on 5R is congruent with the genomic region harboring the Frost resistance locus 2 (Fr-2) in Triticeae. The Puma allele at the Fr-R2 locus was found to significantly increase frost tolerance. A comparison of predictive ability obtained from the QTL-based model with different whole-genome prediction models revealed that besides a few large, also small QTL effects contribute to the genomic variance of frost tolerance in rye. Genomic prediction models assigning a high weight to the Fr-R2 locus allow increasing the selection intensity for frost tolerance by genome-based pre-selection of promising candidates.
Da, Yang
2015-12-18
The amount of functional genomic information has been growing rapidly but remains largely unused in genomic selection. Genomic prediction and estimation using haplotypes in genome regions with functional elements such as all genes of the genome can be an approach to integrate functional and structural genomic information for genomic selection. Towards this goal, this article develops a new haplotype approach for genomic prediction and estimation. A multi-allelic haplotype model treating each haplotype as an 'allele' was developed for genomic prediction and estimation based on the partition of a multi-allelic genotypic value into additive and dominance values. Each additive value is expressed as a function of h - 1 additive effects, where h = number of alleles or haplotypes, and each dominance value is expressed as a function of h(h - 1)/2 dominance effects. For a sample of q individuals, the limit number of effects is 2q - 1 for additive effects and is the number of heterozygous genotypes for dominance effects. Additive values are factorized as a product between the additive model matrix and the h - 1 additive effects, and dominance values are factorized as a product between the dominance model matrix and the h(h - 1)/2 dominance effects. Genomic additive relationship matrix is defined as a function of the haplotype model matrix for additive effects, and genomic dominance relationship matrix is defined as a function of the haplotype model matrix for dominance effects. Based on these results, a mixed model implementation for genomic prediction and variance component estimation that jointly use haplotypes and single markers is established, including two computing strategies for genomic prediction and variance component estimation with identical results. 
The multi-allelic genetic partition fills a theoretical gap in genetic partition by providing general formulations for partitioning multi-allelic genotypic values and provides a haplotype method based on the quantitative genetics model towards the utilization of functional and structural genomic information for genomic prediction and estimation.
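The effect counts stated in the abstract above (h - 1 additive effects, h(h - 1)/2 dominance effects, and an upper limit of 2q - 1 additive effects for q individuals) can be checked with a few lines of arithmetic. The function names are hypothetical and only restate those formulas.

```python
def haplotype_effect_counts(h):
    """Number of additive and dominance effects for h haplotypes ('alleles')."""
    if h < 1:
        raise ValueError("h must be at least 1")
    additive = h - 1
    dominance = h * (h - 1) // 2
    return additive, dominance

def max_additive_effects(q):
    """Upper limit on additive effects for a sample of q diploid individuals."""
    if q < 1:
        raise ValueError("q must be at least 1")
    return 2 * q - 1
```

For example, a region with four haplotypes has three additive and six dominance effects, and a sample of 100 individuals can carry at most 200 distinct haplotypes, hence at most 199 additive effects.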
Selective significance of genome size in a plant community with heavy metal pollution.
Vidic, T; Greilhuber, J; Vilhar, B; Dermastia, M
2009-09-01
In eukaryotes, nuclear genome sizes vary by more than five orders of magnitude. This variation is not related to organismal complexity, and its origin and biological significance are still disputed. One of the open questions is whether genome size has an adaptive role. We tested the hypothesis that genome size has selective significance, using five grassland communities occurring on a gradient of metal pollution of the soil as a model. We detected a negative correlation between the concentration of contaminating metals in the soil and the number of vascular plant species. Analysis of genome sizes of 70 herbaceous dicot perennial species occurring on the investigated plots revealed a negative correlation between the concentration of contaminating metals in the soil and the proportion of species with large genomes in plant communities. Consistent with the hypothesis, these results show that species with large genomes are at selective disadvantage in extreme environmental conditions.
Bayesian Genomic Prediction with Genotype × Environment Interaction Kernel Models
Cuevas, Jaime; Crossa, José; Montesinos-López, Osval A.; Burgueño, Juan; Pérez-Rodríguez, Paulino; de los Campos, Gustavo
2016-01-01
The phenomenon of genotype × environment (G × E) interaction in plant breeding decreases selection accuracy, thereby negatively affecting genetic gains. Several genomic prediction models incorporating G × E have been recently developed and used in genomic selection of plant breeding programs. Genomic prediction models for assessing multi-environment G × E interaction are extensions of a single-environment model, and have advantages and limitations. In this study, we propose two multi-environment Bayesian genomic models: the first model considers genetic effects (u) that can be assessed by the Kronecker product of variance–covariance matrices of genetic correlations between environments and genomic kernels through markers under two linear kernel methods, linear (genomic best linear unbiased predictors, GBLUP) and Gaussian (Gaussian kernel, GK). The other model has the same genetic component as the first model (u) plus an extra component, f, that captures random effects between environments that were not captured by the random effects u. We used five CIMMYT data sets (one maize and four wheat) that were previously used in different studies. Results show that models with G × E always have superior prediction ability than single-environment models, and the higher prediction ability of multi-environment models with u and f over the multi-environment model with only u occurred 85% of the time with GBLUP and 45% of the time with GK across the five data sets. The latter result indicated that including the random effect f is still beneficial for increasing prediction ability after adjusting by the random effect u. PMID:27793970
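The covariance structure described above for the genetic effects u, a Kronecker product of a between-environment covariance matrix and a genomic kernel, can be sketched as follows. The function names, the marker centering, and the scaling of the squared distance by the number of markers in the Gaussian kernel are assumptions for illustration; the abstract does not specify these details.

```python
import numpy as np

def linear_kernel(X):
    """GBLUP-style linear kernel from an n x m marker matrix (columns centered)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    return Xc @ Xc.T / X.shape[1]

def gaussian_kernel(X, bandwidth=1.0):
    """Gaussian kernel exp(-bandwidth * d^2 / m), with d^2 the squared
    Euclidean distance between marker profiles and m the marker count
    (the scaling by m is an assumption)."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-bandwidth * d2 / X.shape[1])

def multi_env_covariance(U_E, K):
    """Covariance of u across environments and lines: kron(U_E, K).

    U_E: (e x e) genetic covariance between environments.
    K:   (n x n) genomic kernel between lines.
    Rows and columns are ordered environment by environment.
    """
    return np.kron(np.asarray(U_E, dtype=float), np.asarray(K, dtype=float))
```

With this ordering, the block of the result for environments i and j equals U_E[i, j] times K, so a line's records in two environments are correlated in proportion to the genetic covariance between those environments.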
Allele frequency changes due to hitch-hiking in genomic selection programs
2014-01-01
Background: Genomic selection makes it possible to reduce pedigree-based inbreeding over best linear unbiased prediction (BLUP) by increasing emphasis on own rather than family information. However, pedigree inbreeding might not accurately reflect loss of genetic variation and the true level of inbreeding due to changes in allele frequencies and hitch-hiking. This study aimed at understanding the impact of using long-term genomic selection on changes in allele frequencies, genetic variation and level of inbreeding. Methods: Selection was performed in simulated scenarios with a population of 400 animals for 25 consecutive generations. Six genetic models were considered with different heritabilities and numbers of QTL (quantitative trait loci) affecting the trait. Four selection criteria were used, including selection on own phenotype and on estimated breeding values (EBV) derived using phenotype-BLUP, genomic BLUP and Bayesian Lasso. Changes in allele frequencies at QTL, markers and linked neutral loci were investigated for the different selection criteria and different scenarios, along with the loss of favourable alleles and the rate of inbreeding measured by pedigree and runs of homozygosity. Results: For each selection criterion, hitch-hiking in the vicinity of the QTL appeared more extensive when accuracy of selection was higher and the number of QTL was lower. When inbreeding was measured by pedigree information, selection on genomic BLUP EBV resulted in lower levels of inbreeding than selection on phenotype BLUP EBV, but this did not always apply when inbreeding was measured by runs of homozygosity. Compared to genomic BLUP, selection on EBV from Bayesian Lasso led to less genetic drift, reduced loss of favourable alleles and more effectively controlled the rate of both pedigree and genomic inbreeding in all simulated scenarios. In addition, selection on EBV from Bayesian Lasso showed a higher selection differential for Mendelian sampling terms than selection on genomic BLUP EBV. Conclusions: Neutral variation can be shaped to a great extent by the hitch-hiking effects associated with selection, rather than just by genetic drift. When implementing long-term genomic selection, strategies for genomic control of inbreeding are essential, due to a considerable hitch-hiking effect, regardless of the method that is used for prediction of EBV. PMID:24495634
Iwata, Hiroyoshi; Hayashi, Takeshi; Terakami, Shingo; Takada, Norio; Sawamura, Yutaka; Yamamoto, Toshiya
2013-01-01
Although the potential of marker-assisted selection (MAS) in fruit tree breeding has been reported, bi-parental QTL mapping before MAS has hindered the introduction of MAS to fruit tree breeding programs. Genome-wide association studies (GWAS) are an alternative to bi-parental QTL mapping in long-lived perennials. Selection based on genomic predictions of breeding values (genomic selection: GS) is another alternative for MAS. This study examined the potential of GWAS and GS in pear breeding with 76 Japanese pear cultivars to detect significant associations of 162 markers with nine agronomic traits. We applied multilocus Bayesian models accounting for ordinal categorical phenotypes for GWAS and GS model training. Significant associations were detected at harvest time, black spot resistance and the number of spurs and two of the associations were closely linked to known loci. Genome-wide predictions for GS were accurate at the highest level (0.75) in harvest time, at medium levels (0.38–0.61) in resistance to black spot, firmness of flesh, fruit shape in longitudinal section, fruit size, acid content and number of spurs and at low levels (<0.2) in all soluble solid content and vigor of tree. Results suggest the potential of GWAS and GS for use in future breeding programs in Japanese pear. PMID:23641189
Spindel, Jennifer; Begum, Hasina; Akdemir, Deniz; Virk, Parminder; Collard, Bertrand; Redoña, Edilberto; Atlin, Gary; Jannink, Jean-Luc; McCouch, Susan R.
2015-01-01
Genomic Selection (GS) is a new breeding method in which genome-wide markers are used to predict the breeding value of individuals in a breeding population. GS has been shown to improve breeding efficiency in dairy cattle and several crop plant species, and here we evaluate for the first time its efficacy for breeding inbred lines of rice. We performed a genome-wide association study (GWAS) in conjunction with five-fold GS cross-validation on a population of 363 elite breeding lines from the International Rice Research Institute's (IRRI) irrigated rice breeding program and herein report the GS results. The population was genotyped with 73,147 markers using genotyping-by-sequencing. The training population, statistical method used to build the GS model, number of markers, and trait were varied to determine their effect on prediction accuracy. For all three traits, genomic prediction models outperformed prediction based on pedigree records alone. Prediction accuracies ranged from 0.31 and 0.34 for grain yield and plant height to 0.63 for flowering time. Analyses using subsets of the full marker set suggest that using one marker every 0.2 cM is sufficient for genomic selection in this collection of rice breeding materials. RR-BLUP was the best performing statistical method for grain yield where no large effect QTL were detected by GWAS, while for flowering time, where a single very large effect QTL was detected, the non-GS multiple linear regression method outperformed GS models. For plant height, in which four mid-sized QTL were identified by GWAS, random forest produced the most consistently accurate GS models. Our results suggest that GS, informed by GWAS interpretations of genetic architecture and population structure, could become an effective tool for increasing the efficiency of rice breeding as the costs of genotyping continue to decline. PMID:25689273
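The cross-validation procedure mentioned in this abstract can be sketched as follows. This is a simplified ridge-regression stand-in for RR-BLUP, evaluated by five-fold cross-validation with prediction accuracy taken as the Pearson correlation between predicted and observed values. The simulated markers, effect sizes, noise level, and the fixed shrinkage parameter `lam` are assumptions for illustration; RR-BLUP as used in practice estimates the shrinkage from variance components, and the intercept here is approximated by the training mean.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ridge_fit(X, y, lam):
    """Marker effects from (X'X + lam*I) beta = X'(y - mean(y))."""
    n, p = len(X), len(X[0])
    mu = sum(y) / n
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    Xty = [sum(X[i][a] * (y[i] - mu) for i in range(n)) for a in range(p)]
    return mu, solve(XtX, Xty)

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    su = sum((a - mu_u) ** 2 for a in u) ** 0.5
    sv = sum((b - mu_v) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

def cv_accuracy(X, y, lam=1.0, k=5, seed=0):
    """k-fold cross-validated accuracy: correlation of held-out predictions with y."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    pred = [0.0] * len(y)
    for f in range(k):
        test = idx[f::k]
        held = set(test)
        train = [i for i in idx if i not in held]
        mu, beta = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        for i in test:
            pred[i] = mu + sum(b * x for b, x in zip(beta, X[i]))
    return pearson(pred, y)

# Hypothetical simulated data: 60 lines, 10 markers coded -1/0/1, three causal markers.
rng = random.Random(1)
X = [[rng.choice([-1, 0, 1]) for _ in range(10)] for _ in range(60)]
true_beta = [1.0, -0.5, 0.8] + [0.0] * 7
y = [sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0, 0.5) for row in X]
acc = cv_accuracy(X, y)
```

With far more markers than lines, as in the study, the same equations are usually solved in their kernel form to keep the linear system at the size of the number of lines.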
Flores-Ponce, Mitzi; Vallebueno-Estrada, Miguel; González-Orozco, Eduardo; Ramos-Aboites, Hilda E; García-Chávez, J Noé; Simões, Nelson; Montiel, Rafael
2017-04-26
The entomopathogenic nematode Steinernema carpocapsae has been used worldwide as a biocontrol agent for insect pests, making it an interesting model for understanding parasite-host interactions. Two models propose that these interactions are co-evolutionary processes in such a way that equilibrium is never reached. In one model, known as "arms race", new alleles in relevant genes are fixed in both host and pathogens by directional positive selection, producing recurrent and alternating selective sweeps. In the other model, known as "trench warfare", persistent dynamic fluctuations in allele frequencies are sustained by balancing selection. There are some examples of genes evolving according to both models; however, it is not clear to what extent these interactions might alter genome-level evolutionary patterns and intraspecific diversity. Here we investigate some of these aspects by studying genomic variation in S. carpocapsae and other pathogenic and free-living nematodes from phylogenetic clades IV and V. To look for signatures of an arms-race dynamic, we conducted massive scans to detect directional positive selection in interspecific data. In free-living nematodes, we detected a significantly higher proportion of genes with sites under positive selection than in parasitic nematodes. However, in these genes, we found more enriched Gene Ontology terms in parasites. To detect possible effects of dynamic polymorphism interactions, we looked for signatures of balancing selection in intraspecific genomic data. The observed distribution of Tajima's D values in S. carpocapsae was more skewed to positive values and significantly different from the observed distribution in the free-living Caenorhabditis briggsae. Also, the proportion of significant positive values of Tajima's D was elevated in genes that were differentially expressed after induction with insect tissues as compared to both non-differentially expressed genes and the global scan. Our study provides a first portrait of the effects that lifestyle might have in shaping the patterns of selection at the genomic level. An arms race between hosts and pathogens seems to be affecting specific genetic functions but not necessarily increasing the number of positively selected genes. Trench warfare dynamics seem to be acting more generally in the genome, likely focusing on genes responding to the interaction, rather than targeting specific genetic functions.
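For readers unfamiliar with the statistic used in this abstract, Tajima's D compares two estimates of diversity: the mean number of pairwise differences and the number of segregating sites scaled by a harmonic-number constant. The sketch below implements the standard formula for a list of equal-length 0/1 haplotype strings. The input encoding, the function name, and the two toy samples are assumptions for illustration and are not taken from the study.

```python
import math
from itertools import combinations

def tajimas_d(haps):
    """Tajima's D from equal-length haplotype strings (one character per site)."""
    n = len(haps)
    sites = list(zip(*haps))
    S = sum(1 for col in sites if len(set(col)) > 1)   # segregating sites
    if S == 0:
        return 0.0  # D is undefined without variation; 0.0 is an assumed convention
    n_pairs = n * (n - 1) / 2
    pi = sum(sum(a != b for a, b in zip(h1, h2))
             for h1, h2 in combinations(haps, 2)) / n_pairs
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))

# Hypothetical toy samples: an excess of rare variants versus
# variants held at intermediate frequency.
rare_variants = ["1000", "0100", "0010", "0001"]
intermediate = ["1111", "1111", "0000", "0000"]
d_rare = tajimas_d(rare_variants)       # negative: singletons inflate S relative to pi
d_balanced = tajimas_d(intermediate)    # positive: intermediate frequencies inflate pi
```

Positive values, as reported for S. carpocapsae, are the pattern expected when alleles are maintained at intermediate frequencies, although demography can produce the same signal.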
Integrating genomic selection into dairy cattle breeding programmes: a review.
Bouquet, A; Juga, J
2013-05-01
Extensive genetic progress has been achieved in dairy cattle populations on many traits of economic importance because of efficient breeding programmes. Success of these programmes has relied on progeny testing of the best young males to accurately assess their genetic merit and hence their potential for breeding. Over the last few years, the integration of dense genomic information into statistical tools used to make selection decisions, commonly referred to as genomic selection, has enabled gains in predicting accuracy of breeding values for young animals without own performance. The possibility to select animals at an early stage allows defining new breeding strategies aimed at boosting genetic progress while reducing costs. The first objective of this article was to review methods used to model and optimize breeding schemes integrating genomic selection and to discuss their relative advantages and limitations. The second objective was to summarize the main results and perspectives on the use of genomic selection in practical breeding schemes, on the basis of the example of dairy cattle populations. Two main designs of breeding programmes integrating genomic selection were studied in dairy cattle. Genomic selection can be used either for pre-selecting males to be progeny tested or for selecting males to be used as active sires in the population. The first option produces moderate genetic gains without changing the structure of breeding programmes. The second option leads to large genetic gains, up to double those of conventional schemes because of a major reduction in the mean generation interval, but it requires greater changes in breeding programme structure. The literature suggests that genomic selection becomes more attractive when it is coupled with embryo transfer technologies to further increase selection intensity on the dam-to-sire pathway. The use of genomic information also offers new opportunities to improve preservation of genetic variation. 
However, recent simulation studies have shown that putting constraints on genomic inbreeding rates for defining optimal contributions of breeding animals could significantly reduce achievable genetic gain. Finally, the article summarizes the potential of genomic selection to include new traits in the breeding goal to meet societal demands regarding animal health and environmental efficiency in animal production.
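The claim that a shorter generation interval can roughly double genetic gain follows from the breeder's equation, in which annual gain is selection intensity times accuracy times additive genetic standard deviation, divided by the generation interval. The sketch below works one example. All numbers are hypothetical, and the single-pathway form is a simplification of the four-pathway calculation normally used for dairy cattle.

```python
def annual_gain(intensity, accuracy, sigma_a, gen_interval_years):
    """Expected genetic gain per year: i * r * sigma_A / L."""
    return intensity * accuracy * sigma_a / gen_interval_years

# Hypothetical values: progeny testing is more accurate but slow,
# genomic selection of young sires is less accurate but much faster.
progeny_test = annual_gain(intensity=2.0, accuracy=0.90, sigma_a=1.0,
                           gen_interval_years=6.0)
genomic = annual_gain(intensity=2.0, accuracy=0.75, sigma_a=1.0,
                      gen_interval_years=2.5)
ratio = genomic / progeny_test   # about 2 under these assumed inputs
```

Under these assumed inputs the loss in accuracy is more than offset by the shorter interval, which is the mechanism the review describes.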
Comparative Reannotation of 21 Aspergillus Genomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Salamov, Asaf; Riley, Robert; Kuo, Alan
2013-03-08
We used comparative gene modeling to reannotate 21 Aspergillus genomes. Initial automatic annotation of individual genomes may contain some errors of different nature, e.g. missing genes, incorrect exon-intron structures, 'chimeras', which fuse 2 or more real genes or alternatively splitting some real genes into 2 or more models. The main premise behind the comparative modeling approach is that for closely related genomes most orthologous families have the same conserved gene structure. The algorithm maps all gene models predicted in each individual Aspergillus genome to the other genomes and, for each locus, selects from potentially many competing models, the one which most closely resembles the orthologous genes from other genomes. This procedure is iterated until no further change in gene models is observed. For Aspergillus genomes we predicted in total 4503 new gene models (~2% per genome), supported by comparative analysis, additionally correcting ~18% of old gene models. This resulted in a total of 4065 more genes with annotated PFAM domains (~3% increase per genome). Analysis of a few genomes with EST/transcriptomics data shows that the new annotation sets also have a higher number of EST-supported splice sites at exon-intron boundaries.
Training set optimization under population structure in genomic selection
USDA-ARS's Scientific Manuscript database
The optimization of the training set (TRS) in genomic selection (GS) has received much interest in both animal and plant breeding, because it is critical to the accuracy of the prediction models. In this study, five different TRS sampling algorithms, stratified sampling, mean of the Coefficient of D...
Lignin-degrading Peroxidases from Genome of Selective Ligninolytic Fungus Ceriporiopsis subvermispora
Elena Fernandez-Fueyo; Francisco J. Ruiz-Duenas; Yuta Miki; Marta Jesus Martinez; Kenneth E. Hammel; Angel T. Martinez
2012-01-01
Background: The first genome of a selective lignin degrader is available. Results: Its screening shows 26 peroxidase genes, and 5 genes were heterologously expressed and the catalytic properties investigated. Conclusion: Two new peroxidases oxidize simple and dimeric lignin models and efficiently depolymerize lignin. Significance: Although lignin peroxidase and...
Barría, Agustín; Christensen, Kris A.; Yoshida, Grazyella M.; Correa, Katharina; Jedlicki, Ana; Lhorente, Jean P.; Davidson, William S.; Yáñez, José M.
2018-01-01
Piscirickettsia salmonis causes one of the main infectious diseases affecting coho salmon (Oncorhynchus kisutch) farming, and current treatments have been ineffective for the control of this disease. Genetic improvement for P. salmonis resistance has been proposed as a feasible alternative for the control of this infectious disease in farmed fish. Genotyping by sequencing (GBS) strategies allow genotyping of hundreds of individuals with thousands of single nucleotide polymorphisms (SNPs), which can be used to perform genome wide association studies (GWAS) and predict genetic values using genome-wide information. We used double-digest restriction-site associated DNA (ddRAD) sequencing to dissect the genetic architecture of resistance against P. salmonis in a farmed coho salmon population and to identify molecular markers associated with the trait. We also evaluated genomic selection (GS) models in order to determine the potential to accelerate the genetic improvement of this trait by means of using genome-wide molecular information. A total of 764 individuals from 33 full-sib families (17 highly resistant and 16 highly susceptible) were experimentally challenged against P. salmonis and their genotypes were assayed using ddRAD sequencing. A total of 9,389 SNP markers were identified in the population. These markers were used to test genomic selection models and compare different GWAS methodologies for resistance measured as day of death (DD) and binary survival (BIN). Genomic selection models showed higher accuracies than the traditional pedigree-based best linear unbiased prediction (PBLUP) method, for both DD and BIN. The models showed an improvement of up to 95% and 155% respectively over PBLUP. One SNP related to B-cell development was identified as a potential functional candidate associated with resistance to P. salmonis defined as DD. PMID:29440129
Joost, Stéphane; Kalbermatten, Michael; Bezault, Etienne; Seehausen, Ole
2012-01-01
When searching for loci possibly under selection in the genome, an alternative to population genetics theoretical models is to establish allele distribution models (ADM) for each locus to directly correlate allelic frequencies and environmental variables such as precipitation, temperature, or sun radiation. Such an approach, running multiple logistic regression models in parallel, was implemented in a computing program named MATSAM. Recently, this application was improved in order to support qualitative environmental predictors as well as to permit the identification of associations between genomic variation and individual phenotypes, allowing the detection of loci involved in the genetic architecture of polymorphic characters. Here, we present the corresponding methodological developments and compare the results produced by software implementing population genetics theoretical models (DFDIST and BAYESCAN) and ADM (MATSAM) in an empirical context to detect signatures of genomic divergence associated with speciation in Lake Victoria cichlid fishes.
Duan, Naibin; Bai, Yang; Sun, Honghe; Wang, Nan; Ma, Yumin; Li, Mingjun; Wang, Xin; Jiao, Chen; Legall, Noah; Mao, Linyong; Wan, Sibao; Wang, Kun; He, Tianming; Feng, Shouqian; Zhang, Zongying; Mao, Zhiquan; Shen, Xiang; Chen, Xiaoliu; Jiang, Yuanmao; Wu, Shujing; Yin, Chengmiao; Ge, Shunfeng; Yang, Long; Jiang, Shenghui; Xu, Haifeng; Liu, Jingxuan; Wang, Deyun; Qu, Changzhi; Wang, Yicheng; Zuo, Weifang; Xiang, Li; Liu, Chang; Zhang, Daoyuan; Gao, Yuan; Xu, Yimin; Xu, Kenong; Chao, Thomas; Fazio, Gennaro; Shu, Huairui; Zhong, Gan-Yuan; Cheng, Lailiang; Fei, Zhangjun; Chen, Xuesen
2017-08-15
Human selection has reshaped crop genomes. Here we report an apple genome variation map generated through genome sequencing of 117 diverse accessions. A comprehensive model of apple speciation and domestication along the Silk Road is proposed based on evidence from diverse genomic analyses. Cultivated apples likely originate from Malus sieversii in Kazakhstan, followed by intensive introgressions from M. sylvestris. M. sieversii in Xinjiang of China turns out to be an "ancient" isolated ecotype not directly contributing to apple domestication. We have identified selective sweeps underlying quantitative trait loci/genes of important fruit quality traits including fruit texture and flavor, and provide evidence supporting a model of apple fruit size evolution comprising two major events with one occurring prior to domestication and the other during domestication. This study outlines the genetic basis of apple domestication and evolution, and provides valuable information for facilitating marker-assisted breeding and apple improvement. Apple is one of the most important fruit crops. Here, the authors perform deep genome resequencing of 117 diverse accessions and reveal comprehensive models of apple origin, speciation, domestication, and fruit size evolution as well as candidate genes associated with important agronomic traits.
Schiavo, G; Galimberti, G; Calò, D G; Samorè, A B; Bertolini, F; Russo, V; Gallo, M; Buttazzoni, L; Fontanesi, L
2016-04-01
In this study, we investigated at the genome-wide level if 20 years of artificial directional selection based on boar genetic evaluation obtained with a classical BLUP animal model shaped the genome of the Italian Large White pig breed. The most influential boars of this breed (n = 192), born from 1992 (the beginning of the selection program of this breed) to 2012, with an estimated breeding value reliability of >0.85, were genotyped with the Illumina Porcine SNP60 BeadChip. After grouping the boars in eight classes according to their year of birth, filtered single nucleotide polymorphisms (SNPs) were used to evaluate the effects of time on genotype frequency changes using multinomial logistic regression models. Of these markers, 493 had a Bonferroni-corrected P < 0.10. However, there was an increasing number of SNPs with a decreasing level of allele frequency changes over time, representing a continuous profile across the genome. The largest proportion of the 493 SNPs was on porcine chromosome (SSC) 7, SSC2, SSC8 and SSC18 for a total of 204 haploblocks. Functional annotations of genomic regions, including the 493 shifted SNPs, reported a few Gene Ontology terms that might underlie the biological processes that contributed to the increased performance of the pigs over the 20 years of the selection program. The obtained results indicated that the genome of the Italian Large White pigs was shaped by a directional selection program derived by the application of methodologies assuming the infinitesimal model that captured a continuous trend of allele frequency changes in the boar population. © 2015 Stichting International Foundation for Animal Genetics.
USDA-ARS's Scientific Manuscript database
Single-step Genomic Best Linear Unbiased Predictor (ssGBLUP) has become increasingly popular for whole-genome prediction (WGP) modeling as it utilizes any available pedigree and phenotypes on both genotyped and non-genotyped individuals. The WGP accuracy of ssGBLUP has been demonstrated to be greate...
Zhao, Y; Mette, M F; Gowda, M; Longin, C F H; Reif, J C
2014-01-01
Based on data from field trials with a large collection of 135 elite winter wheat inbred lines and 1604 F1 hybrids derived from them, we compared the accuracy of prediction of marker-assisted selection and current genomic selection approaches for the model traits heading time and plant height in a cross-validation approach. For heading time, the high accuracy seen with marker-assisted selection severely dropped with genomic selection approaches RR-BLUP (ridge regression best linear unbiased prediction) and BayesCπ, whereas for plant height, accuracy was low with marker-assisted selection as well as RR-BLUP and BayesCπ. Differences in the linkage disequilibrium structure of the functional and single-nucleotide polymorphism markers relevant for the two traits were identified in a simulation study as a likely explanation for the different trends in accuracies of prediction. A new genomic selection approach, weighted best linear unbiased prediction (W-BLUP), designed to treat the effects of known functional markers more appropriately, proved to increase the accuracy of prediction for both traits and thus closes the gap between marker-assisted and genomic selection. PMID:24518889
Optimization of multi-environment trials for genomic selection based on crop models.
Rincent, R; Kuhn, E; Monod, H; Oury, F-X; Rousset, M; Allard, V; Le Gouis, J
2017-08-01
We propose a statistical criterion to optimize multi-environment trials to predict genotype × environment interactions more efficiently, by combining crop growth models and genomic selection models. Genotype × environment interactions (GEI) are common in plant multi-environment trials (METs). In this context, models developed for genomic selection (GS) that refers to the use of genome-wide information for predicting breeding values of selection candidates need to be adapted. One promising way to increase prediction accuracy in various environments is to combine ecophysiological and genetic modelling thanks to crop growth models (CGM) incorporating genetic parameters. The efficiency of this approach relies on the quality of the parameter estimates, which depends on the environments composing this MET used for calibration. The objective of this study was to determine a method to optimize the set of environments composing the MET for estimating genetic parameters in this context. A criterion called OptiMET was defined to this aim, and was evaluated on simulated and real data, with the example of wheat phenology. The MET defined with OptiMET allowed estimating the genetic parameters with lower error, leading to higher QTL detection power and higher prediction accuracies. MET defined with OptiMET was on average more efficient than random MET composed of twice as many environments, in terms of quality of the parameter estimates. OptiMET is thus a valuable tool to determine optimal experimental conditions to best exploit MET and the phenotyping tools that are currently developed.
Dias, Kaio Olímpio Das Graças; Gezan, Salvador Alejandro; Guimarães, Claudia Teixeira; Nazarian, Alireza; da Costa E Silva, Luciano; Parentoni, Sidney Netto; de Oliveira Guimarães, Paulo Evaristo; de Oliveira Anoni, Carina; Pádua, José Maria Villela; de Oliveira Pinto, Marcos; Noda, Roberto Willians; Ribeiro, Carlos Alexandre Gomes; de Magalhães, Jurandir Vieira; Garcia, Antonio Augusto Franco; de Souza, João Cândido; Guimarães, Lauro José Moreira; Pastina, Maria Marta
2018-07-01
Breeding for drought tolerance is a challenging task that requires costly, extensive, and precise phenotyping. Genomic selection (GS) can be used to maximize selection efficiency and the genetic gains in maize (Zea mays L.) breeding programs for drought tolerance. Here, we evaluated the accuracy of genomic selection (GS) using additive (A) and additive + dominance (AD) models to predict the performance of untested maize single-cross hybrids for drought tolerance in multi-environment trials. Phenotypic data of five drought tolerance traits were measured in 308 hybrids along eight trials under water-stressed (WS) and well-watered (WW) conditions over two years and two locations in Brazil. Hybrids' genotypes were inferred based on their parents' genotypes (inbred lines) using single-nucleotide polymorphism markers obtained via genotyping-by-sequencing. GS analyses were performed using genomic best linear unbiased prediction by fitting a factor analytic (FA) multiplicative mixed model. Two cross-validation (CV) schemes were tested: CV1 and CV2. The FA framework allowed for investigating the stability of additive and dominance effects across environments, as well as the additive-by-environment and the dominance-by-environment interactions, with interesting applications for parental and hybrid selection. Results showed differences in the predictive accuracy between A and AD models, using both CV1 and CV2, for the five traits in both water conditions. For grain yield (GY) under WS and using CV1, the AD model doubled the predictive accuracy in comparison to the A model. Through CV2, GS models benefit from borrowing information of correlated trials, resulting in an increase of 40% and 9% in the predictive accuracy of GY under WS for A and AD models, respectively. These results highlight the importance of multi-environment trial analyses using GS models that incorporate additive and dominance effects for genomic predictions of GY under drought in maize single-cross hybrids.
Heuristic Bayesian segmentation for discovery of coexpressed genes within genomic regions.
Pehkonen, Petri; Wong, Garry; Törönen, Petri
2010-01-01
Segmentation aims to separate homogeneous areas from the sequential data, and plays a central role in data mining. It has applications ranging from finance to molecular biology, where bioinformatics tasks such as genome data analysis are active application fields. In this paper, we present a novel application of segmentation in locating genomic regions with coexpressed genes. We aim at automated discovery of such regions without requirement for user-given parameters. In order to perform the segmentation within a reasonable time, we use heuristics. Most of the heuristic segmentation algorithms require some decision on the number of segments. This is usually accomplished by using asymptotic model selection methods like the Bayesian information criterion. Such methods are based on some simplification, which can limit their usage. In this paper, we propose a Bayesian model selection to choose the most proper result from heuristic segmentation. Our Bayesian model presents a simple prior for the segmentation solutions with various segment numbers and a modified Dirichlet prior for modeling multinomial data. We show with various artificial data sets in our benchmark system that our model selection criterion has the best overall performance. The application of our method in yeast cell-cycle gene expression data reveals potential active and passive regions of the genome.
Whole-Genome Regression and Prediction Methods Applied to Plant and Animal Breeding
de los Campos, Gustavo; Hickey, John M.; Pong-Wong, Ricardo; Daetwyler, Hans D.; Calus, Mario P. L.
2013-01-01
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade. PMID:22745228
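One reason large-p, small-n regressions are tractable is that ridge-type marker regression can be solved in an n-by-n system instead of a p-by-p one. The sketch below checks that identity numerically on simulated data. It is an illustration only: the sample sizes, the penalty `lam`, and the 0/1/2 marker coding are assumptions, and the article itself covers a much wider family of models.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 50, 500, 10.0            # assumed sizes: far more markers than records

# Simulated marker matrix coded 0/1/2, centred by column.
X = rng.integers(0, 3, size=(n, p)).astype(float)
X -= X.mean(axis=0)
y = X @ rng.normal(0.0, 0.05, size=p) + rng.normal(size=n)

def ridge_primal(X, y, lam):
    """Marker effects from the p x p system (X'X + lam I) b = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ridge_dual(X, y, lam):
    """Same marker effects from the n x n system: b = X'(XX' + lam I)^-1 y."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

b_primal = ridge_primal(X, y, lam)
b_dual = ridge_dual(X, y, lam)
```

The two solutions agree up to floating-point error, so the cost of the solve is governed by the smaller of n and p.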
Conditional Selection of Genomic Alterations Dictates Cancer Evolution and Oncogenic Dependencies.
Mina, Marco; Raynaud, Franck; Tavernari, Daniele; Battistello, Elena; Sungalee, Stephanie; Saghafinia, Sadegh; Laessle, Titouan; Sanchez-Vega, Francisco; Schultz, Nikolaus; Oricchio, Elisa; Ciriello, Giovanni
2017-08-14
Cancer evolves through the emergence and selection of molecular alterations. Cancer genome profiling has revealed that specific events are more or less likely to be co-selected, suggesting that the selection of one event depends on the others. However, the nature of these evolutionary dependencies and their impact remain unclear. Here, we designed SELECT, an algorithmic approach to systematically identify evolutionary dependencies from alteration patterns. By analyzing 6,456 genomes from multiple tumor types, we constructed a map of oncogenic dependencies associated with cellular pathways, transcriptional readouts, and therapeutic response. Finally, modeling of cancer evolution shows that alteration dependencies emerge only under conditional selection. These results provide a framework for the design of strategies to predict cancer progression and therapeutic response. Copyright © 2017 Elsevier Inc. All rights reserved.
Studying the genetic basis of speciation in high gene flow marine invertebrates
2016-01-01
A growing number of genes responsible for reproductive incompatibilities between species (barrier loci) exhibit the signals of positive selection. However, the possibility that genes experiencing positive selection diverge early in speciation and commonly cause reproductive incompatibilities has not been systematically investigated on a genome-wide scale. Here, I outline a research program for studying the genetic basis of speciation in broadcast spawning marine invertebrates that uses a priori genome-wide information on a large, unbiased sample of genes tested for positive selection. A targeted sequence capture approach is proposed that scores single-nucleotide polymorphisms (SNPs) in widely separated species populations at an early stage of allopatric divergence. The targeted capture of both coding and non-coding sequences enables SNPs to be characterized at known locations across the genome and at genes with known selective or neutral histories. The neutral coding and non-coding SNPs provide robust background distributions for identifying FST-outliers within genes that can, in principle, identify specific mutations experiencing diversifying selection. If natural hybridization occurs between species, the neutral coding and non-coding SNPs can provide a neutral admixture model for genomic clines analyses aimed at finding genes exhibiting strong blocks to introgression. Strongylocentrotid sea urchins are used as a model system to outline the approach but it can be used for any group that has a complete reference genome available. PMID:29491951
García-Ruiz, Adriana; Cole, John B.; VanRaden, Paul M.; Wiggans, George R.; Ruiz-López, Felipe J.; Van Tassell, Curtis P.
2016-01-01
Seven years after the introduction of genomic selection in the United States, it is now possible to evaluate the impact of this technology on the population. Selection differential(s) (SD) and generation interval(s) (GI) were characterized in a four-path selection model that included sire(s) of bulls (SB), sire(s) of cows (SC), dam(s) of bulls (DB), and dam(s) of cows (DC). Changes in SD over time were estimated for milk, fat, and protein yield; somatic cell score (SCS); productive life (PL); and daughter pregnancy rate (DPR) for the Holstein breed. In the period following implementation of genomic selection, dramatic reductions were seen in GI, especially the SB and SC paths. The SB GI reduced from ∼7 y to less than 2.5 y, and the DB GI fell from about 4 y to nearly 2.5 y. SD were relatively stable for yield traits, although modest gains were noted in recent years. The most dramatic response to genomic selection was observed for the lowly heritable traits DPR, PL, and SCS. Genetic trends changed from close to zero to large and favorable, resulting in rapid genetic improvement in fertility, lifespan, and health in a breed where these traits eroded over time. These results clearly demonstrate the positive impact of genomic selection in US dairy cattle, even though this technology has only been in use for a short time. Based on the four-path selection model, rates of genetic gain per year increased from ∼50–100% for yield traits and from threefold to fourfold for lowly heritable traits. PMID:27354521
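The four-path model referred to above is usually written as the sum of the four selection differentials divided by the sum of the four generation intervals. The sketch below applies that formula to show how shorter generation intervals alone raise annual gain. The SB and DB intervals are rounded from the abstract (about 7 y and 4 y before, about 2.5 y after); the SC and DC intervals and all selection differentials are placeholder values chosen for illustration, not figures from the study.

```python
PATHS = ("SB", "SC", "DB", "DC")

def annual_genetic_gain(sd, gi):
    """Four-path gain per year: sum of selection differentials / sum of intervals."""
    return sum(sd[path] for path in PATHS) / sum(gi[path] for path in PATHS)

# Placeholder selection differentials (arbitrary units), held constant.
sd = {path: 1.0 for path in PATHS}

# SB and DB intervals rounded from the abstract; SC and DC are placeholders.
gi_before = {"SB": 7.0, "SC": 5.0, "DB": 4.0, "DC": 5.0}
gi_after = {"SB": 2.5, "SC": 5.0, "DB": 2.5, "DC": 5.0}

gain_before = annual_genetic_gain(sd, gi_before)
gain_after = annual_genetic_gain(sd, gi_after)
ratio = gain_after / gain_before
```

With these placeholder inputs the ratio depends only on the generation intervals, which isolates the contribution of faster turnover from any change in selection differentials.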
Genomic selection in sugar beet breeding populations.
Würschum, Tobias; Reif, Jochen C; Kraft, Thomas; Janssen, Geert; Zhao, Yusheng
2013-09-18
Genomic selection exploits dense genome-wide marker data to predict breeding values. In this study we used a large sugar beet population of 924 lines representing different germplasm types present in breeding populations: unselected segregating families and diverse lines from more advanced stages of selection. All lines have been intensively phenotyped in multi-location field trials for six agronomically important traits and genotyped with 677 SNP markers. We used ridge regression best linear unbiased prediction in combination with fivefold cross-validation and obtained high prediction accuracies for all except one trait. In addition, we investigated whether a calibration developed based on a training population composed of diverse lines is suited to predict the phenotypic performance within families. Our results show that the prediction accuracy is lower than that obtained within the diverse set of lines, but comparable to that obtained by cross-validation within the respective families. The results presented in this study suggest that a training population derived from intensively phenotyped and genotyped diverse lines from a breeding program does hold potential to build up robust calibration models for genomic selection. Taken together, our results indicate that genomic selection is a valuable tool and can thus complement the genomics toolbox in sugar beet breeding.
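The evaluation described above, ridge regression with fivefold cross-validation, can be sketched as follows. This is a minimal illustration on simulated data, not the study's analysis: the dimensions (924 lines, 677 markers) mirror the abstract, but the genotypes, phenotypes, and the fixed shrinkage value `lam` are assumptions (in practice the shrinkage parameter is usually estimated, for example by REML).

```python
import numpy as np

def rr_cv_accuracy(X, y, lam, n_folds=5, seed=0):
    """Correlation between observed phenotypes and out-of-fold ridge predictions."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    folds = rng.permutation(n) % n_folds
    y_hat = np.empty(n)
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # Centre markers and phenotypes using training records only.
        mu = X[train].mean(axis=0)
        ybar = y[train].mean()
        Xc = X[train] - mu
        beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p),
                               Xc.T @ (y[train] - ybar))
        y_hat[test] = ybar + (X[test] - mu) @ beta
    return float(np.corrcoef(y, y_hat)[0, 1])

# Simulated stand-in data; only the dimensions follow the abstract.
rng = np.random.default_rng(42)
n, p = 924, 677
X = rng.integers(0, 3, size=(n, p)).astype(float)
g = X @ rng.normal(size=p)                      # simulated genetic values
y = g + rng.normal(0.0, 0.5 * g.std(), size=n)  # add residual noise
acc = rr_cv_accuracy(X, y, lam=100.0)
```

Each line is predicted exactly once from a model that never saw its phenotype, so `acc` is a predictive ability in the sense used in the abstract.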
Jiang, Y; Zhao, Y; Rodemann, B; Plieske, J; Kollers, S; Korzun, V; Ebmeyer, E; Argillier, O; Hinze, M; Ling, J; Röder, M S; Ganal, M W; Mette, M F; Reif, J C
2015-03-01
Genome-wide mapping approaches in diverse populations are powerful tools to unravel the genetic architecture of complex traits. The main goals of our study were to investigate the potential and limits to unravel the genetic architecture and to identify the factors determining the accuracy of prediction of the genotypic variation of Fusarium head blight (FHB) resistance in wheat (Triticum aestivum L.) based on data collected with a diverse panel of 372 European varieties. The wheat lines were phenotyped in multi-location field trials for FHB resistance and genotyped with 782 simple sequence repeat (SSR) markers, and 9k and 90k single-nucleotide polymorphism (SNP) arrays. We applied genome-wide association mapping in combination with fivefold cross-validations and observed surprisingly high accuracies of prediction for marker-assisted selection based on the detected quantitative trait loci (QTLs). Using a random sample of markers not selected for marker-trait associations revealed only a slight decrease in prediction accuracy compared with marker-based selection exploiting the QTL information. The same picture was confirmed in a simulation study, suggesting that relatedness is a main driver of the accuracy of prediction in marker-assisted selection of FHB resistance. When the accuracy of prediction of three genomic selection models was contrasted for the three marker data sets, no significant differences in accuracies among marker platforms and genomic selection models were observed. Marker density impacted the accuracy of prediction only marginally. Consequently, genomic selection of FHB resistance can be implemented most cost-efficiently based on low- to medium-density SNP arrays.
Genomic selection for slaughter age in pigs using the Cox frailty model.
Santos, V S; Martins Filho, S; Resende, M D V; Azevedo, C F; Lopes, P S; Guimarães, S E F; Glória, L S; Silva, F F
2015-10-19
The aim of this study was to compare genomic selection methodologies using a linear mixed model and the Cox survival model. We used data from an F2 population of pigs, in which the response variable was the time in days from birth to the culling of the animal and the covariates were 238 markers [237 single nucleotide polymorphism (SNP) plus the halothane gene]. The data were corrected for fixed effects, and the accuracy of the method was determined based on the correlation of the ranks of predicted genomic breeding values (GBVs) in both models with the corrected phenotypic values. The analysis was repeated with a subset of SNP markers with largest absolute effects. The results were in agreement with the GBV prediction and the estimation of marker effects for both models for uncensored data and for normality. However, when considering censored data, the Cox model with a normal random effect (S1) was more appropriate. Since there was no agreement between the linear mixed model and the imputed data (L2) for the prediction of genomic values and the estimation of marker effects, the model S1 was considered superior as it took into account the latent variable and the censored data. Marker selection increased correlations between the ranks of predicted GBVs by the linear and Cox frailty models and the corrected phenotypic values, and 120 markers were required to increase the predictive ability for the characteristic analyzed.
Genomic prediction using imputed whole-genome sequence data in Holstein Friesian cattle.
van Binsbergen, Rianne; Calus, Mario P L; Bink, Marco C A M; van Eeuwijk, Fred A; Schrooten, Chris; Veerkamp, Roel F
2015-09-17
In contrast to currently used single nucleotide polymorphism (SNP) panels, the use of whole-genome sequence data is expected to enable the direct estimation of the effects of causal mutations on a given trait. This could lead to higher reliabilities of genomic predictions compared to those based on SNP genotypes. Also, at each generation of selection, recombination events between a SNP and a mutation can cause decay in reliability of genomic predictions based on markers rather than on the causal variants. Our objective was to investigate the use of imputed whole-genome sequence genotypes versus high-density SNP genotypes on (the persistency of) the reliability of genomic predictions using real cattle data. Highly accurate phenotypes based on daughter performance and Illumina BovineHD Beadchip genotypes were available for 5503 Holstein Friesian bulls. The BovineHD genotypes (631,428 SNPs) of each bull were used to impute whole-genome sequence genotypes (12,590,056 SNPs) using the Beagle software. Imputation was done using a multi-breed reference panel of 429 sequenced individuals. Genomic estimated breeding values for three traits were predicted using a Bayesian stochastic search variable selection (BSSVS) model and a genome-enabled best linear unbiased prediction model (GBLUP). Reliabilities of predictions were based on 2087 validation bulls, while the other 3416 bulls were used for training. Prediction reliabilities ranged from 0.37 to 0.52. BSSVS performed better than GBLUP in all cases. Reliabilities of genomic predictions were slightly lower with imputed sequence data than with BovineHD chip data. Also, the reliabilities tended to be lower for both sequence data and BovineHD chip data when relationships between training animals were low. No increase in persistency of prediction reliability using imputed sequence data was observed. Compared to BovineHD genotype data, using imputed sequence data for genomic prediction produced no advantage. 
To investigate the putative advantage of genomic prediction using (imputed) sequence data, a training set with a larger number of individuals that are distantly related to each other and genomic prediction models that incorporate biological information on the SNPs or that apply stricter SNP pre-selection should be considered.
GAPIT: genome association and prediction integrated tool.
Lipka, Alexander E; Tian, Feng; Wang, Qishan; Peiffer, Jason; Li, Meng; Bradbury, Peter J; Gore, Michael A; Buckler, Edward S; Zhang, Zhiwu
2012-09-15
Software programs that conduct genome-wide association studies and genomic prediction and selection need to use methodologies that maximize statistical power, provide high prediction accuracy and run in a computationally efficient manner. We developed an R package called Genome Association and Prediction Integrated Tool (GAPIT) that implements advanced statistical methods including the compressed mixed linear model (CMLM) and CMLM-based genomic prediction and selection. The GAPIT package can handle large datasets in excess of 10 000 individuals and 1 million single-nucleotide polymorphisms with minimal computational time, while providing user-friendly access and concise tables and graphs to interpret results. Availability: http://www.maizegenetics.net/GAPIT. Contact: zhiwu.zhang@cornell.edu. Supplementary data are available at Bioinformatics online.
Recent advances in understanding the role of nutrition in human genome evolution.
Ye, Kaixiong; Gu, Zhenglong
2011-11-01
Dietary transitions in human history have been suggested to play important roles in the evolution of mankind. Genetic variations caused by adaptation to diet during human evolution could have important health consequences in current society. The advance of sequencing technologies and the rapid accumulation of genome information provide an unprecedented opportunity to comprehensively characterize genetic variations in human populations and unravel the genetic basis of human evolution. Series of selection detection methods, based on various theoretical models and exploiting different aspects of selection signatures, have been developed. Their applications at the species and population levels have respectively led to the identification of human specific selection events that distinguish human from nonhuman primates and local adaptation events that contribute to human diversity. Scrutiny of candidate genes has revealed paradigms of adaptations to specific nutritional components and genome-wide selection scans have verified the prevalence of diet-related selection events and provided many more candidates awaiting further investigation. Understanding the role of diet in human evolution is fundamental for the development of evidence-based, genome-informed nutritional practices in the era of personal genomics.
2014-01-01
Background: Locating the protein-coding genes in novel genomes is essential to understanding and exploiting the genomic information but it is still difficult to accurately predict all the genes. The recent availability of detailed information about transcript structure from high-throughput sequencing of messenger RNA (RNA-Seq) delineates many expressed genes and promises increased accuracy in gene prediction. Computational gene predictors have been intensively developed for and tested in well-studied animal genomes. Hundreds of fungal genomes are now or will soon be sequenced. The differences of fungal genomes from animal genomes and the phylogenetic sparsity of well-studied fungi call for gene-prediction tools tailored to them. Results: SnowyOwl is a new gene prediction pipeline that uses RNA-Seq data to train and provide hints for the generation of Hidden Markov Model (HMM)-based gene predictions and to evaluate the resulting models. The pipeline has been developed and streamlined by comparing its predictions to manually curated gene models in three fungal genomes and validated against the high-quality gene annotation of Neurospora crassa; SnowyOwl predicted N. crassa genes with 83% sensitivity and 65% specificity. SnowyOwl gains sensitivity by repeatedly running the HMM gene predictor Augustus with varied input parameters and selectivity by choosing the models with best homology to known proteins and best agreement with the RNA-Seq data. Conclusions: SnowyOwl efficiently uses RNA-Seq data to produce accurate gene models in both well-studied and novel fungal genomes. The source code for the SnowyOwl pipeline (in Python) and a web interface (in PHP) is freely available from http://sourceforge.net/projects/snowyowl/. PMID:24980894
Beaulieu, Jean; Doerksen, Trevor K; MacKay, John; Rainville, André; Bousquet, Jean
2014-12-02
Genomic selection (GS) may improve selection response over conventional pedigree-based selection if markers capture more detailed information than pedigrees in recently domesticated tree species and/or make it more cost effective. Genomic prediction accuracies using 1748 trees and 6932 SNPs representative of as many distinct gene loci were determined for growth and wood traits in white spruce, within and between environments and breeding groups (BG), each with an effective size of Ne ≈ 20. Marker subsets were also tested. Model fits and/or cross-validation (CV) prediction accuracies for ridge regression (RR) and the least absolute shrinkage and selection operator models approached those of pedigree-based models. With strong relatedness between CV sets, prediction accuracies for RR within environment and BG were high for wood (r = 0.71-0.79) and moderately high for growth (r = 0.52-0.69) traits, in line with trends in heritabilities. For both classes of traits, these accuracies achieved between 83% and 92% of those obtained with phenotypes and pedigree information. Prediction into untested environments remained moderately high for wood (r ≥ 0.61) but dropped significantly for growth (r ≥ 0.24) traits, emphasizing the need to phenotype in all test environments and model genotype-by-environment interactions for growth traits. Removing relatedness between CV sets sharply decreased prediction accuracies for all traits and subpopulations, falling near zero between BGs with no known shared ancestry. For marker subsets, similar patterns were observed but with lower prediction accuracies. Given the need for high relatedness between CV sets to obtain good prediction accuracies, we recommend to build GS models for prediction within the same breeding population only. 
Breeding groups could be merged to build genomic prediction models as long as the total effective population size does not exceed 50 individuals in order to obtain high prediction accuracy such as that obtained in the present study. A number of markers limited to a few hundred would not negatively impact prediction accuracies, but these could decrease more rapidly over generations. The most promising short-term approach for genomic selection would likely be the selection of superior individuals within large full-sib families vegetatively propagated to implement multiclonal forestry.
Will genomic selection be a practical method for plant breeding?
Nakaya, Akihiro; Isobe, Sachiko N
2012-11-01
Genomic selection or genome-wide selection (GS) has been highlighted as a new approach for marker-assisted selection (MAS) in recent years. GS is a form of MAS that selects favourable individuals based on genomic estimated breeding values. Previous studies have suggested the utility of GS, especially for capturing small-effect quantitative trait loci, but GS has not become a popular methodology in the field of plant breeding, possibly because there is insufficient information available on GS for practical use. In this review, GS is discussed from a practical breeding viewpoint. Statistical approaches employed in GS are briefly described, before the recent progress in GS studies is surveyed. GS practices in plant breeding are then reviewed before future prospects are discussed. Statistical concepts used in GS are discussed with genetic models and variance decomposition, heritability, breeding value and linear model. Recent progress in GS studies is reviewed with a focus on empirical studies. For the practice of GS in plant breeding, several specific points are discussed including linkage disequilibrium, feature of populations and genotyped markers and breeding scheme. Currently, GS is not perfect, but it is a potent, attractive and valuable approach for plant breeding. This method will be integrated into many practical breeding programmes in the near future with further advances and the maturing of its theory.
Laurenson, Yan C S M; Kyriazakis, Ilias; Bishop, Stephen C
2013-10-18
Estimated breeding values (EBV) for faecal egg count (FEC) and genetic markers for host resistance to nematodes may be used to identify resistant animals for selective breeding programmes. Similarly, targeted selective treatment (TST) requires the ability to identify the animals that will benefit most from anthelmintic treatment. A mathematical model was used to combine the concepts and evaluate the potential of using genetic-based methods to identify animals for a TST regime. EBVs obtained by genomic prediction were predicted to be the best determinant criterion for TST in terms of the impact on average empty body weight and average FEC, whereas pedigree-based EBVs for FEC were predicted to be marginally worse than using phenotypic FEC as a determinant criterion. Whilst each method has financial implications, if the identification of host resistance is incorporated into wider genomic selection indices or selective breeding programmes, then genetic or genomic information may plausibly be included in TST regimes. Copyright © 2013 Elsevier B.V. All rights reserved.
Terekhanova, Nadezhda V.; Logacheva, Maria D.; Penin, Aleksey A.; Neretina, Tatiana V.; Barmintseva, Anna E.; Bazykin, Georgii A.; Kondrashov, Alexey S.; Mugue, Nikolai S.
2014-01-01
Adaptation is driven by natural selection; however, many adaptations are caused by weak selection acting over large timescales, complicating its study. Therefore, it is rarely possible to study selection comprehensively in natural environments. The threespine stickleback (Gasterosteus aculeatus) is a well-studied model organism with a short generation time, small genome size, and many genetic and genomic tools available. Within this originally marine species, populations have recurrently adapted to freshwater all over its range. This evolution involved extensive parallelism: pre-existing alleles that adapt sticklebacks to freshwater habitats, but are also present at low frequencies in marine populations, have been recruited repeatedly. While a number of genomic regions responsible for this adaptation have been identified, the details of selection remain poorly understood. Using whole-genome resequencing, we compare pooled genomic samples from marine and freshwater populations of the White Sea basin, and identify 19 short genomic regions that are highly divergent between them, including three known inversions. 17 of these regions overlap protein-coding genes, including a number of genes with predicted functions that are relevant for adaptation to the freshwater environment. We then analyze four additional independently derived young freshwater populations of known ages, two natural and two artificially established, and use the observed shifts of allelic frequencies to estimate the strength of positive selection. Adaptation turns out to be quite rapid, indicating strong selection acting simultaneously at multiple regions of the genome, with selection coefficients of up to 0.27. High divergence between marine and freshwater genotypes, lack of reduction in polymorphism in regions responsible for adaptation, and high frequencies of freshwater alleles observed even in young freshwater populations are all consistent with rapid assembly of G. aculeatus freshwater genotypes from pre-existing genomic regions of adaptive variation, with strong selection that favors this assembly acting simultaneously at multiple loci. PMID:25299485
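The step from a shift in allele frequency over a known number of generations to a selection coefficient can be illustrated with a deliberately simple model. The sketch below assumes genic (haploid-style) selection with discrete generations, under which the odds p/(1-p) of the favoured allele grow by a factor (1+s) each generation. This model, the function names, and the starting frequency are assumptions for illustration; the abstract does not state which estimator the authors used. Only the value 0.27 is taken from the abstract.

```python
def allele_trajectory(p0, s, generations):
    """Frequency after `generations` rounds of genic selection with coefficient s."""
    p = p0
    for _ in range(generations):
        p = p * (1.0 + s) / (1.0 + s * p)
    return p

def selection_coefficient(p0, pt, generations):
    """Invert the model: s = (odds_t / odds_0) ** (1 / t) - 1."""
    if not (0.0 < p0 < 1.0 and 0.0 < pt < 1.0) or generations <= 0:
        raise ValueError("frequencies must lie in (0, 1) and generations must be > 0")
    odds_ratio = (pt / (1.0 - pt)) / (p0 / (1.0 - p0))
    return odds_ratio ** (1.0 / generations) - 1.0

# Round trip: a rare allele (hypothetical start at 1%) under s = 0.27 for 30 generations.
p_end = allele_trajectory(p0=0.01, s=0.27, generations=30)
s_hat = selection_coefficient(p0=0.01, pt=p_end, generations=30)
```

Under this model a coefficient as large as 0.27 carries a rare allele to high frequency within a few dozen generations, which matches the qualitative picture of rapid adaptation in young populations.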
Evolutionary signals of selection on cognition from the great tit genome and methylome
Laine, Veronika N.; Gossmann, Toni I.; Schachtschneider, Kyle M.; Garroway, Colin J.; Madsen, Ole; Verhoeven, Koen J. F.; de Jager, Victor; Megens, Hendrik-Jan; Warren, Wesley C.; Minx, Patrick; Crooijmans, Richard P. M. A.; Corcoran, Pádraic; Adriaensen, Frank; Belda, Eduardo; Bushuev, Andrey; Cichon, Mariusz; Charmantier, Anne; Dingemanse, Niels; Doligez, Blandine; Eeva, Tapio; Erikstad, Kjell Einar; Fedorov, Slava; Hau, Michaela; Hille, Sabine; Hinde, Camilla; Kempenaers, Bart; Kerimov, Anvar; Krist, Milos; Mand, Raivo; Matthysen, Erik; Nager, Reudi; Norte, Claudia; Orell, Markku; Richner, Heinz; Slagsvold, Tore; Tilgar, Vallo; Tinbergen, Joost; Torok, Janos; Tschirren, Barbara; Yuta, Tera; Sheldon, Ben C.; Slate, Jon; Zeng, Kai; van Oers, Kees; Visser, Marcel E.; Groenen, Martien A. M.
2016-01-01
For over 50 years, the great tit (Parus major) has been a model species for research in evolutionary, ecological and behavioural research; in particular, learning and cognition have been intensively studied. Here, to provide further insight into the molecular mechanisms behind these important traits, we de novo assemble a great tit reference genome and whole-genome re-sequence another 29 individuals from across Europe. We show an overrepresentation of genes related to neuronal functions, learning and cognition in regions under positive selection, as well as increased CpG methylation in these regions. In addition, great tit neuronal non-CpG methylation patterns are very similar to those observed in mammals, suggesting a universal role in neuronal epigenetic regulation which can affect learning-, memory- and experience-induced plasticity. The high-quality great tit genome assembly will play an instrumental role in furthering the integration of ecological, evolutionary, behavioural and genomic approaches in this model species. PMID:26805030
USDA-ARS's Scientific Manuscript database
Background: Several studies have examined the accuracy of genomic selection both within and across purebred beef or dairy populations. However, the accuracy of direct genomic breeding values (DGVs) has been less well studied in crossbred or admixed cattle populations. We used a population of 3,240 cr...
Genotyping by sequencing for genomic prediction in a soybean breeding population.
Jarquín, Diego; Kocak, Kyle; Posadas, Luis; Hyma, Katie; Jedlicka, Joseph; Graef, George; Lorenz, Aaron
2014-08-29
Advances in genotyping technology, such as genotyping by sequencing (GBS), are making genomic prediction more attractive to reduce breeding cycle times and costs associated with phenotyping. Genomic prediction and selection have been studied in several crop species, but no reports exist in soybean. The objectives of this study were (i) to evaluate prospects for genomic selection using GBS in a typical soybean breeding program and (ii) to evaluate the effect of GBS marker selection and imputation on genomic prediction accuracy. To achieve these objectives, a set of soybean lines sampled from the University of Nebraska Soybean Breeding Program was genotyped using GBS and evaluated for yield and other agronomic traits at multiple Nebraska locations. Genotyping by sequencing scored 16,502 single nucleotide polymorphisms (SNPs) with minor-allele frequency (MAF) > 0.05 and percentage of missing values ≤ 5% on 301 elite soybean breeding lines. When SNPs with up to 80% missing values were included, 52,349 SNPs were scored. Prediction accuracy for grain yield, assessed using cross validation, was estimated to be 0.64, indicating good potential for using genomic selection for grain yield in soybean. Filtering SNPs based on missing data percentage had little to no effect on prediction accuracy, especially when random forest imputation was used to impute missing values. The highest accuracies were observed when random forest imputation was used on all SNPs, but differences were not significant. A standard additive G-BLUP model was robust; modeling additive-by-additive epistasis did not provide any improvement in prediction accuracy. The effect of training population size on accuracy began to plateau around 100, but accuracy steadily climbed until the largest possible size was used in this analysis. Including only SNPs with MAF > 0.30 provided higher accuracies when training populations were smaller. 
Using GBS for genomic prediction in soybean holds good potential to expedite genetic gain. Our results suggest that standard additive G-BLUP models can be used on unfiltered, imputed GBS data without loss in accuracy.
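The marker filters quoted in the abstract above (MAF > 0.05, missing values ≤ 5%) can be illustrated with a minimal sketch. The function name `filter_snps`, the 0/1/2 dosage coding and the use of `NaN` for missing calls are assumptions made for illustration; the abstract does not describe the authors' actual pipeline.

```python
import numpy as np

def filter_snps(genotypes, min_maf=0.05, max_missing=0.05):
    """Return a boolean mask of SNP columns passing MAF and missingness filters.

    genotypes: (lines x SNPs) array of allele dosages 0/1/2, np.nan = missing call.
    This coding is an assumption for the sketch, not taken from the study.
    """
    g = np.asarray(genotypes, dtype=float)
    observed = ~np.isnan(g)
    n_observed = observed.sum(axis=0)
    missing_rate = 1.0 - n_observed / g.shape[0]
    # Allele frequency from observed calls only; all-missing columns get frequency 0.
    allele_sum = np.where(observed, g, 0.0).sum(axis=0)
    allele_freq = allele_sum / (2.0 * np.maximum(n_observed, 1))
    maf = np.minimum(allele_freq, 1.0 - allele_freq)
    return (maf > min_maf) & (missing_rate <= max_missing)
```

Relaxing `max_missing` to 0.80 corresponds to the more permissive marker set mentioned in the abstract, which would then need imputation before model fitting.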
Genomic selection in sugar beet breeding populations
2013-01-01
Background Genomic selection exploits dense genome-wide marker data to predict breeding values. In this study we used a large sugar beet population of 924 lines representing different germplasm types present in breeding populations: unselected segregating families and diverse lines from more advanced stages of selection. All lines have been intensively phenotyped in multi-location field trials for six agronomically important traits and genotyped with 677 SNP markers. Results We used ridge regression best linear unbiased prediction in combination with fivefold cross-validation and obtained high prediction accuracies for all except one trait. In addition, we investigated whether a calibration developed based on a training population composed of diverse lines is suited to predict the phenotypic performance within families. Our results show that the prediction accuracy is lower than that obtained within the diverse set of lines, but comparable to that obtained by cross-validation within the respective families. Conclusions The results presented in this study suggest that a training population derived from intensively phenotyped and genotyped diverse lines from a breeding program does hold potential to build up robust calibration models for genomic selection. Taken together, our results indicate that genomic selection is a valuable tool and can thus complement the genomics toolbox in sugar beet breeding. PMID:24047500
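As a rough illustration of ridge regression prediction with fivefold cross-validation, the sketch below estimates marker effects by ridge regression and reports the mean correlation between predicted and observed values across folds. It is a simplification: in RR-BLUP the shrinkage parameter is normally derived from estimated variance components, whereas here `lam` is a fixed, user-supplied assumption, and the function names are hypothetical.

```python
import numpy as np

def ridge_marker_effects(X, y, lam):
    """Solve (X'X + lam*I) b = X'y for the marker effects b."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def kfold_prediction_accuracy(X, y, lam=1.0, k=5, seed=0):
    """Mean correlation between predicted and observed values over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accuracies = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        # Center with training-set means only, so the test fold stays unseen.
        x_mean = X[train_idx].mean(axis=0)
        y_mean = y[train_idx].mean()
        b = ridge_marker_effects(X[train_idx] - x_mean, y[train_idx] - y_mean, lam)
        predicted = (X[test_idx] - x_mean) @ b + y_mean
        accuracies.append(np.corrcoef(predicted, y[test_idx])[0, 1])
    return float(np.mean(accuracies))
```

Predicting within families from a calibration built on diverse lines, as the abstract describes, would replace the random folds with a fixed split: diverse lines as the training set and one family as the test set.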
Staubach, Fabian; Lorenc, Anna; Messer, Philipp W.; Tang, Kun; Petrov, Dmitri A.; Tautz, Diethard
2012-01-01
General parameters of selection, such as the frequency and strength of positive selection in natural populations or the role of introgression, are still insufficiently understood. The house mouse (Mus musculus) is a particularly well-suited model system to approach such questions, since it has a defined history of splits into subspecies and populations and since extensive genome information is available. We have used high-density single-nucleotide polymorphism (SNP) typing arrays to assess genomic patterns of positive selection and introgression of alleles in two natural populations of each of the subspecies M. m. domesticus and M. m. musculus. Applying different statistical procedures, we find a large number of regions subject to apparent selective sweeps, indicating frequent positive selection on rare alleles or novel mutations. Genes in the regions include well-studied imprinted loci (e.g. Plagl1/Zac1), homologues of human genes involved in adaptations (e.g. alpha-amylase genes) or in genetic diseases (e.g. Huntingtin and Parkin). Haplotype matching between the two subspecies reveals a large number of haplotypes that show patterns of introgression from specific populations of the respective other subspecies, with at least 10% of the genome being affected by partial or full introgression. Using neutral simulations for comparison, we find that the size and the fraction of introgressed haplotypes are not compatible with a pure migration or incomplete lineage sorting model. Hence, it appears that introgressed haplotypes can rise in frequency due to positive selection and thus can contribute to the adaptive genomic landscape of natural populations. Our data support the notion that natural genomes are subject to complex adaptive processes, including the introgression of haplotypes from other differentiated populations or species at a larger scale than previously assumed for animals. 
This implies that some of the admixture found in inbred strains of mice may also have a natural origin. PMID:22956910
SweeD: likelihood-based detection of selective sweeps in thousands of genomes.
Pavlidis, Pavlos; Živkovic, Daniel; Stamatakis, Alexandros; Alachiotis, Nikolaos
2013-09-01
The advent of modern DNA sequencing technology is the driving force in obtaining complete intra-specific genomes that can be used to detect loci that have been subject to positive selection in the recent past. Based on selective sweep theory, beneficial loci can be detected by examining the single nucleotide polymorphism patterns in intraspecific genome alignments. In the last decade, a plethora of algorithms for identifying selective sweeps have been developed. However, the majority of these algorithms have not been designed for analyzing whole-genome data. We present SweeD (Sweep Detector), an open-source tool for the rapid detection of selective sweeps in whole genomes. It analyzes site frequency spectra and represents a substantial extension of the widely used SweepFinder program. The sequential version of SweeD is up to 22 times faster than SweepFinder and, more importantly, is able to analyze thousands of sequences. We also provide a parallel implementation of SweeD for multi-core processors. Furthermore, we implemented a checkpointing mechanism that allows SweeD to be deployed on cluster systems with queue execution time restrictions and long-running analyses to be resumed after processor failures. In addition, the user can specify various demographic models via the command line to calculate their theoretically expected site frequency spectra. Therefore, in contrast to SweepFinder, the neutral site frequencies can optionally be calculated directly from a given demographic model. We show that an increase in sample size results in more precise detection of positive selection. Thus, the ability to analyze substantially larger sample sizes by using SweeD leads to more accurate sweep detection. We validate SweeD via simulations and by scanning the first chromosome from the 1000 Genomes Project (human) for selective sweeps. We compare SweeD results with results from a linkage-disequilibrium-based approach and identify common outliers.
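The site frequency spectrum (SFS) that such sweep scans analyze can be sketched as follows. This is not SweeD's code or interface: the function names are hypothetical, the input is assumed to be a list of derived-allele counts per site, and the neutral expectation shown is the textbook constant-population-size result (expected count of sites with derived-allele count i is θ/i), not one of the demographic models the tool supports.

```python
import numpy as np

def unfolded_sfs(derived_counts, n_sequences):
    """Count segregating sites whose derived allele appears in i of n sequences.

    Entry i-1 of the result is the number of sites with derived-allele count i,
    for i = 1 .. n-1. Monomorphic sites (count 0 or n) are ignored.
    """
    counts = np.asarray(derived_counts, dtype=int)
    segregating = counts[(counts > 0) & (counts < n_sequences)]
    return np.bincount(segregating, minlength=n_sequences)[1:n_sequences]

def neutral_expected_sfs(n_sequences, theta=1.0):
    """Standard neutral, constant-size expectation: E[xi_i] = theta / i."""
    i = np.arange(1, n_sequences)
    return theta / i
```

A likelihood-based scan compares, window by window, how well the observed spectrum fits a sweep model versus a neutral background spectrum such as the one returned by `neutral_expected_sfs`.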
Genomic selection using beef commercial carcass phenotypes.
Todd, D L; Roughsedge, T; Woolliams, J A
2014-03-01
In this study, an industry terminal breeding goal was used in a deterministic simulation, using selection index methodology, to predict genetic gain in a beef population modelled on the UK pedigree Limousin, when using genomic selection (GS) and incorporating phenotype information from novel commercial carcass traits. The effect of genotype-environment interaction was investigated by including the model variations of the genetic correlation between purebred and commercial cross-bred performance (ρX). Three genomic scenarios were considered: (1) genomic breeding values (GBV)+estimated breeding values (EBV) for existing selection traits; (2) GBV for three novel commercial carcass traits+EBV in existing traits; and (3) GBV for novel and existing traits plus EBV for existing traits. Each of the three scenarios was simulated for a range of training population (TP) sizes and with three values of ρX. Scenarios 2 and 3 predicted substantially higher percentage increases over current selection than Scenario 1. A TP of 2000 sires, each with 20 commercial progeny with carcass phenotypes, and assuming a ρX of 0.7, is predicted to increase gain by 40% over current selection in Scenario 3. The percentage increase in gain over current selection increased with decreasing ρX; however, the effect of varying ρX was reduced at high TP sizes for Scenarios 2 and 3. A further non-genomic scenario (4) was considered simulating a conventional population-wide progeny test using EBV only. With 20 commercial cross-bred progenies per sire, similar gain was predicted to Scenario 3 with TP=5000 and ρX=1.0. The range of increases in genetic gain predicted for terminal traits when using GS are of similar magnitude to those observed after the implementation of BLUP technology in the United Kingdom. 
It is concluded that implementation of GS in a terminal sire breeding goal, using purebred phenotypes alone, will be sub-optimal compared with the inclusion of novel commercial carcass phenotypes in genomic evaluations.
Veerkamp, Roel F; Bouwman, Aniek C; Schrooten, Chris; Calus, Mario P L
2016-12-01
Whole-genome sequence data is expected to capture genetic variation more completely than common genotyping panels. Our objective was to compare the proportion of variance explained and the accuracy of genomic prediction by using imputed sequence data or preselected SNPs from a genome-wide association study (GWAS) with imputed whole-genome sequence data. Phenotypes were available for 5503 Holstein-Friesian bulls. Genotypes were imputed up to whole-genome sequence (13,789,029 segregating DNA variants) by using run 4 of the 1000 bull genomes project. The program GCTA was used to perform GWAS for protein yield (PY), somatic cell score (SCS) and interval from first to last insemination (IFL). From the GWAS, subsets of variants were selected and genomic relationship matrices (GRM) were used to estimate the variance explained in 2087 validation animals and to evaluate the genomic prediction ability. Finally, two GRM were fitted together in several models to evaluate the effect of selected variants that were in competition with all the other variants. The GRM based on full sequence data explained only marginally more genetic variation than that based on common SNP panels: for PY, SCS and IFL, genomic heritability improved from 0.81 to 0.83, 0.83 to 0.87 and 0.69 to 0.72, respectively. Sequence data also helped to identify more variants linked to quantitative trait loci and resulted in clearer GWAS peaks across the genome. The proportion of total variance explained by the selected variants combined in a GRM was considerably smaller than that explained by all variants (less than 0.31 for all traits). When selected variants were used, accuracy of genomic predictions decreased and bias increased. Although 35 to 42 variants were detected that together explained 13 to 19% of the total variance (18 to 23% of the genetic variance) when fitted alone, there was no advantage in using dense sequence information for genomic prediction in the Holstein data used in our study. 
Detection and selection of variants within a single breed are difficult due to long-range linkage disequilibrium. Stringent selection of variants resulted in more biased genomic predictions, although this might be due to the training population being the same dataset from which the selected variants were identified.
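A genomic relationship matrix (GRM) of the kind fitted above can be built in several ways, and the abstract does not state which scaling was used. The sketch below shows one common construction (VanRaden's first method), with allele frequencies estimated from the data itself; the function name and the 0/1/2 dosage coding are assumptions.

```python
import numpy as np

def genomic_relationship_matrix(dosages):
    """GRM as Z Z' / (2 * sum_j p_j (1 - p_j)), where Z is the centered dosage matrix.

    dosages: (animals x variants) array of allele counts 0/1/2 with no missing values.
    """
    M = np.asarray(dosages, dtype=float)
    p = M.mean(axis=0) / 2.0          # allele frequency per variant
    Z = M - 2.0 * p                   # center each variant at its expected dosage
    scale = 2.0 * np.sum(p * (1.0 - p))
    return Z @ Z.T / scale
```

Fitting two GRMs together, as described in the abstract, amounts to building one matrix from the selected variants and a second from the remaining variants and entering both as separate random effects in the mixed model.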
Spindel, J E; Begum, H; Akdemir, D; Collard, B; Redoña, E; Jannink, J-L; McCouch, S
2016-01-01
To address the multiple challenges to food security posed by global climate change, population growth and rising incomes, plant breeders are developing new crop varieties that can enhance both agricultural productivity and environmental sustainability. Current breeding practices, however, are unable to keep pace with demand. Genomic selection (GS) is a new technique that helps accelerate the rate of genetic gain in breeding by using whole-genome data to predict the breeding value of offspring. Here, we describe a new GS model that combines RR-BLUP with markers fit as fixed effects selected from the results of a genome-wide-association study (GWAS) on the RR-BLUP training data. We term this model GS + de novo GWAS. In a breeding population of tropical rice, GS + de novo GWAS outperformed six other models for a variety of traits and in multiple environments. On the basis of these results, we propose an extended, two-part breeding design that can be used to efficiently integrate novel variation into elite breeding populations, thus expanding genetic diversity and enhancing the potential for sustainable productivity gains. PMID:26860200
Ishiyama, Izumi; Tanzawa, Tetsuro; Watanabe, Maiko; Maeda, Tadahiko; Muto, Kaori; Tamakoshi, Akiko; Nagai, Akiko; Yamagata, Zentaro
2012-05-01
This study aimed to assess public attitudes in Japan to the promotion of genomic selection in crop studies and to examine associated factors. We analysed data from a nationwide opinion survey. A total of 4,000 people were selected from the Japanese general population by a stratified two-phase sampling method, and 2,171 people participated by post; this survey asked about the pros and cons of crop-related genomic studies promotion, examined people's scientific literacy in genomics, and investigated factors thought to be related to genomic literacy and attitude. The relationships were examined using logistic regression models stratified by gender. Survey results showed that 50.0% of respondents approved of the promotion of crop-related genomic studies, while 6.7% disapproved. No correlation was found between literacy and attitude towards promotion. Trust in experts, belief in science, an interest in genomic studies and willingness to purchase new products correlated with a positive attitude towards crop-related genomic studies.
Genome-wide selective sweeps and gene-specific sweeps in natural bacterial populations
Bendall, Matthew L.; Stevens, Sarah L.R.; Chan, Leong-Keat; ...
2016-01-08
Multiple models describe the formation and evolution of distinct microbial phylogenetic groups. These evolutionary models make different predictions regarding how adaptive alleles spread through populations and how genetic diversity is maintained. Processes predicted by competing evolutionary models, for example, genome-wide selective sweeps vs gene-specific sweeps, could be captured in natural populations using time-series metagenomics if the approach were applied over a sufficiently long time frame. Direct observations of either process would help resolve how distinct microbial groups evolve. Using a 9-year metagenomic study of a freshwater lake (2005–2013), we explore changes in single-nucleotide polymorphism (SNP) frequencies and patterns of gene gain and loss in 30 bacterial populations. SNP analyses revealed substantial genetic heterogeneity within these populations, although the degree of heterogeneity varied by >1000-fold among populations. SNP allele frequencies also changed dramatically over time within some populations. Interestingly, nearly all SNP variants were slowly purged over several years from one population of green sulfur bacteria, while at the same time multiple genes either swept through or were lost from this population. Furthermore, these patterns were consistent with a genome-wide selective sweep in progress, a process predicted by the ‘ecotype model’ of speciation but not previously observed in nature. In contrast, other populations contained large, SNP-free genomic regions that appear to have swept independently through the populations prior to the study without purging diversity elsewhere in the genome. Finally, evidence for both genome-wide and gene-specific sweeps suggests that different models of bacterial speciation may apply to different populations coexisting in the same environment.
Hsieh, PingHsun; Veeramah, Krishna R.; Lachance, Joseph; Tishkoff, Sarah A.; Wall, Jeffrey D.; Hammer, Michael F.; Gutenkunst, Ryan N.
2016-01-01
African Pygmies practicing a mobile hunter-gatherer lifestyle are phenotypically and genetically diverged from other anatomically modern humans, and they likely experienced strong selective pressures due to their unique lifestyle in the Central African rainforest. To identify genomic targets of adaptation, we sequenced the genomes of four Biaka Pygmies from the Central African Republic and jointly analyzed these data with the genome sequences of three Baka Pygmies from Cameroon and nine Yoruba farmers. To account for the complex demographic history of these populations that includes both isolation and gene flow, we fit models using the joint allele frequency spectrum and validated them using independent approaches. Our two best-fit models both suggest ancient divergence between the ancestors of the farmers and Pygmies, 90,000 or 150,000 yr ago. We also find that bidirectional asymmetric gene flow is statistically better supported than a single pulse of unidirectional gene flow from farmers to Pygmies, as previously suggested. We then applied complementary statistics to scan the genome for evidence of selective sweeps and polygenic selection. We found that conventional statistical outlier approaches were biased toward identifying candidates in regions of high mutation or low recombination rate. To avoid this bias, we assigned P-values for candidates using whole-genome simulations incorporating demography and variation in both recombination and mutation rates. We found that genes and gene sets involved in muscle development, bone synthesis, immunity, reproduction, cell signaling and development, and energy metabolism are likely to be targets of positive natural selection in Western African Pygmies or their recent ancestors. PMID:26888263
Gompert, Zachariah; Lucas, Lauren K; Nice, Chris C; Fordyce, James A; Forister, Matthew L; Buerkle, C Alex
2012-07-01
Speciation is the process by which reproductively isolated lineages arise, and is one of the fundamental means by which the diversity of life increases. Whereas numerous studies have documented an association between ecological divergence and reproductive isolation, relatively little is known about the role of natural selection in genome divergence during the process of speciation. Here, we use genome-wide DNA sequences and Bayesian models to test the hypothesis that loci under divergent selection between two butterfly species (Lycaeides idas and L. melissa) also affect fitness in an admixed population. Locus-specific measures of genetic differentiation between L. idas and L. melissa and genomic introgression in hybrids varied across the genome. The most differentiated genetic regions were characterized by elevated L. idas ancestry in the admixed population, which occurs in L. idas-like habitat, consistent with the hypothesis that local adaptation contributes to speciation. Moreover, locus-specific measures of genetic differentiation (a metric of divergent selection) were positively associated with extreme genomic introgression (a metric of hybrid fitness). Interestingly, concordance of differentiation and introgression was only partial. We discuss multiple, complementary explanations for this partial concordance. © 2012 The Author(s).
Island-Model Genomic Selection for Long-Term Genetic Improvement of Autogamous Crops.
Yabe, Shiori; Yamasaki, Masanori; Ebana, Kaworu; Hayashi, Takeshi; Iwata, Hiroyoshi
2016-01-01
Acceleration of genetic improvement of autogamous crops such as wheat and rice is necessary to increase cereal production in response to the global food crisis. Population and pedigree methods of breeding, which are based on inbred line selection, are used commonly in the genetic improvement of autogamous crops. These methods, however, produce few novel combinations of genes in a breeding population. Recurrent selection promotes recombination among genes and produces novel combinations of genes in a breeding population, but it relies on inaccurate single-plant evaluation for selection. Genomic selection (GS), which can predict genetic potential of individuals based on their marker genotype, might have high reliability of single-plant evaluation and might be effective in recurrent selection. To evaluate the efficiency of recurrent selection with GS, we conducted simulations using real marker genotype data of rice cultivars. Additionally, we introduced the concept of an "island model" inspired by evolutionary algorithms that might be useful to maintain genetic variation through the breeding process. We conducted GS simulations using real marker genotype data of rice cultivars to evaluate the efficiency of recurrent selection and the island model in an autogamous species. Results demonstrated the importance of producing novel combinations of genes through recurrent selection. An initial population derived from admixture of multiple bi-parental crosses showed larger genetic gains than a population derived from a single bi-parental cross in whole cycles, suggesting the importance of genetic variation in an initial population. The island-model GS better maintained genetic improvement in later generations than the other GS methods, suggesting that the island-model GS can utilize genetic variation in breeding and can retain alleles with small effects in the breeding population. 
The island-model GS will become a new breeding method that enhances the potential of genomic selection in autogamous crops, especially bringing long-term improvement.
Sun, Jin; Rutkoski, Jessica E; Poland, Jesse A; Crossa, José; Jannink, Jean-Luc; Sorrells, Mark E
2017-07-01
High-throughput phenotyping (HTP) platforms can be used to measure traits that are genetically correlated with wheat (Triticum aestivum L.) grain yield across time. Incorporating such secondary traits in the multivariate pedigree and genomic prediction models would be desirable to improve indirect selection for grain yield. In this study, we evaluated three statistical models, simple repeatability (SR), multitrait (MT), and random regression (RR), for the longitudinal data of secondary traits and compared the impact of the proposed models for secondary traits on their predictive abilities for grain yield. Grain yield and secondary traits, canopy temperature (CT) and normalized difference vegetation index (NDVI), were collected in five diverse environments for 557 wheat lines with available pedigree and genomic information. A two-stage analysis was applied for pedigree and genomic selection (GS). First, secondary traits were fitted by SR, MT, or RR models, separately, within each environment. Then, best linear unbiased predictions (BLUPs) of secondary traits from the above models were used in the multivariate prediction models to compare predictive abilities for grain yield. Predictive ability was substantially improved by 70%, on average, from multivariate pedigree and genomic models when including secondary traits in both training and test populations. Additionally, (i) predictive abilities slightly varied for MT, RR, or SR models in this data set, (ii) results indicated that including BLUPs of secondary traits from the MT model was the best in severe drought, and (iii) the RR model was slightly better than SR and MT models under drought environment. Copyright © 2017 Crop Science Society of America.
Miyashita, Shuhei; Ishibashi, Kazuhiro; Kishino, Hirohisa; Ishikawa, Masayuki
2015-01-01
Recent studies on evolutionarily distant viral groups have shown that the number of viral genomes that establish cell infection after cell-to-cell transmission is unexpectedly small (1–20 genomes). This aspect of viral infection appears to be important for the adaptation and survival of viruses. To clarify how the number of viral genomes that establish cell infection is determined, we developed a simulation model of cell infection for tomato mosaic virus (ToMV), a positive-strand RNA virus. The model showed that stochastic processes that govern the replication or degradation of individual genomes result in the infection by a small number of genomes, while a large number of infectious genomes are introduced in the cell. It also predicted two interesting characteristics regarding cell infection patterns: stochastic variation among cells in the number of viral genomes that establish infection and stochastic inequality in the accumulation of their progenies in each cell. Both characteristics were validated experimentally by inoculating tobacco cells with a library of nucleotide sequence–tagged ToMV and analyzing the viral genomes that accumulated in each cell using a high-throughput sequencer. An additional simulation model revealed that these two characteristics enhance selection during tissue infection. The cell infection model also predicted a mechanism that enhances selection at the cellular level: a small difference in the replication abilities of coinfected variants results in a large difference in individual accumulation via the multiple-round formation of the replication complex (i.e., the replication machinery). Importantly, this predicted effect was observed in vivo. The cell infection model was robust to changes in the parameter values, suggesting that other viruses could adopt similar adaptation mechanisms. 
Taken together, these data reveal a comprehensive picture of viral infection processes including replication, cell-to-cell transmission, and evolution, which are based on the stochastic behavior of the viral genome molecules in each cell. PMID:25781391
NASA Astrophysics Data System (ADS)
Wang, Quanchao; Yu, Yang; Li, Fuhua; Zhang, Xiaojun; Xiang, Jianhai
2017-09-01
Genomic selection (GS) can be used to accelerate genetic improvement by shortening the selection interval. The successful application of GS depends largely on the accuracy of the prediction of genomic estimated breeding value (GEBV). This study is a first attempt to understand the practicality of GS in Litopenaeus vannamei and aims to evaluate models for GS on growth traits. The performance of GS models in L. vannamei was evaluated in a population consisting of 205 individuals, which were genotyped for 6 359 single nucleotide polymorphism (SNP) markers by specific length amplified fragment sequencing (SLAF-seq) and phenotyped for body length and body weight. Three GS models (RR-BLUP, BayesA, and Bayesian LASSO) were used to obtain the GEBV, and their predictive ability was assessed by the reliability of the GEBV and the bias of the predicted phenotypes. The mean reliability of the GEBVs for body length and body weight predicted by the different models was 0.296 and 0.411, respectively. For each trait, the performances of the three models were very similar to each other with respect to predictability. The regression coefficients estimated by the three models were close to one, suggesting near to zero bias for the predictions. Therefore, when GS was applied in a L. vannamei population for the studied scenarios, all three models appeared practicable. Further analyses suggested that improved estimation of the genomic prediction could be realized by increasing the size of the training population as well as the density of SNPs.
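One common way to check prediction bias in genomic selection studies is to regress observed phenotypes on predicted values: a slope near one suggests the predictions are neither inflated nor deflated. The sketch below assumes that convention, which the abstract does not spell out; the function name is hypothetical.

```python
import numpy as np

def regression_slope(observed, predicted):
    """Slope of the least-squares regression of observed values on predictions."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    centered_pred = predicted - predicted.mean()
    centered_obs = observed - observed.mean()
    return float(centered_pred @ centered_obs / (centered_pred @ centered_pred))
```

A slope below one would indicate that the predicted values are too spread out (inflated), and a slope above one that they are over-shrunk.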
Campo, D; Lehmann, K; Fjeldsted, C; Souaiaia, T; Kao, J; Nuzhdin, S V
2013-10-01
The prevailing demographic model for Drosophila melanogaster suggests that the colonization of North America occurred very recently from a subset of European flies that rapidly expanded across the continent. This model implies a sudden population growth and range expansion consistent with very low or no population subdivision. As flies adapt to new environments, local adaptation events may be expected. To describe demographic and selective events during North American colonization, we have generated a data set of 35 individual whole-genome sequences from inbred lines of D. melanogaster from a west coast US population (Winters, California, USA) and compared them with a public genome data set from Raleigh (Raleigh, North Carolina, USA). We analysed nuclear and mitochondrial genomes and described levels of variation and divergence within and between these two North American D. melanogaster populations. Both populations exhibit negative values of Tajima's D across the genome, a common signature of demographic expansion. We also detected a low but significant level of genome-wide differentiation between the two populations, as well as multiple allele surfing events, which can be the result of gene drift in local subpopulations on the edge of an expansion wave. In contrast to this genome-wide pattern, we uncovered a 50-kilobase segment in chromosome arm 3L that showed all the hallmarks of a soft selective sweep in both populations. A comparison of allele frequencies within this divergent region among six populations from three continents allowed us to cluster these populations in two differentiated groups, providing evidence for the action of natural selection on a global scale. © 2013 John Wiley & Sons Ltd.
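Tajima's D, the statistic reported above, compares two estimators of the population mutation rate: mean pairwise diversity and the number of segregating sites scaled by a harmonic sum. The sketch below follows the standard published formula; the function name and the input format (a derived- or minor-allele count per site in a sample of `n` sequences) are assumptions for illustration.

```python
import math

def tajimas_d(allele_counts, n):
    """Tajima's D from per-site allele counts in a sample of n sequences."""
    counts = [k for k in allele_counts if 0 < k < n]   # segregating sites only
    S = len(counts)
    if S == 0:
        return float("nan")
    # Mean pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in counts)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

An excess of rare variants, as expected after a population expansion, pulls pairwise diversity below the segregating-site estimate and makes D negative, which matches the genome-wide pattern the abstract describes.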
Chen, Ze-Hui; Zhang, Min; Lv, Feng-Hua; Ren, Xue; Li, Wen-Rong; Liu, Ming-Jun; Nam, Kiwoong; Bruford, Michael W; Li, Meng-Hua
2018-04-01
Analyses of genomic diversity along the X chromosome and of its correlation with autosomal diversity can facilitate understanding of evolutionary forces in shaping sex-linked genomic architecture. Strong selective sweeps and accelerated genetic drift on the X-chromosome have been inferred in primates and other model species, but no such insight has yet been gained in domestic animals compared with their wild relatives. Here, we analyzed X-chromosome variability in a large ovine data set, including a BeadChip array for 943 ewes from the world's sheep populations and 110 whole genomes of wild and domestic sheep. Analyzing whole-genome sequences, we observed a substantially reduced X-to-autosome diversity ratio (∼0.6) compared with the value expected under a neutral model (0.75). In particular, one large X-linked segment (43.05-79.25 Mb) was found to show extremely low diversity, most likely due to a high density of coding genes, featuring highly conserved regions. In general, we observed higher nucleotide diversity on the autosomes, but a flat diversity gradient in X-linked segments, as a function of increasing distance from the nearest genes, leading to a decreased X: autosome (X/A) diversity ratio and contrasting to the positive correlation detected in primates and other model animals. Our evidence suggests that accelerated genetic drift but reduced directional selection on X chromosome, as well as sex-biased demographic events, explain low X-chromosome diversity in sheep species. The distinct patterns of X-linked and X/A diversity we observed between Middle Eastern and non-Middle Eastern sheep populations can be explained by multiple migrations, selection, and admixture during the domestic sheep's recent postdomestication demographic expansion, coupled with natural selection for adaptation to new environments. 
In addition, we identify important novel genes involved in abnormal behavioral phenotypes, metabolism, and immunity, under selection on the sheep X-chromosome.
Chen, Ze-Hui; Zhang, Min; Lv, Feng-Hua; Ren, Xue; Li, Wen-Rong; Liu, Ming-Jun; Nam, Kiwoong; Bruford, Michael W; Li, Meng-Hua
2018-01-01
Analyses of genomic diversity along the X chromosome and of its correlation with autosomal diversity can facilitate understanding of evolutionary forces in shaping sex-linked genomic architecture. Strong selective sweeps and accelerated genetic drift on the X-chromosome have been inferred in primates and other model species, but no such insight has yet been gained in domestic animals compared with their wild relatives. Here, we analyzed X-chromosome variability in a large ovine data set, including a BeadChip array for 943 ewes from the world’s sheep populations and 110 whole genomes of wild and domestic sheep. Analyzing whole-genome sequences, we observed a substantially reduced X-to-autosome diversity ratio (∼0.6) compared with the value expected under a neutral model (0.75). In particular, one large X-linked segment (43.05–79.25 Mb) was found to show extremely low diversity, most likely due to a high density of coding genes, featuring highly conserved regions. In general, we observed higher nucleotide diversity on the autosomes, but a flat diversity gradient in X-linked segments, as a function of increasing distance from the nearest genes, leading to a decreased X:autosome (X/A) diversity ratio and contrasting to the positive correlation detected in primates and other model animals. Our evidence suggests that accelerated genetic drift but reduced directional selection on X chromosome, as well as sex-biased demographic events, explain low X-chromosome diversity in sheep species. The distinct patterns of X-linked and X/A diversity we observed between Middle Eastern and non-Middle Eastern sheep populations can be explained by multiple migrations, selection, and admixture during the domestic sheep’s recent postdomestication demographic expansion, coupled with natural selection for adaptation to new environments. 
In addition, we identify important novel genes involved in abnormal behavioral phenotypes, metabolism, and immunity, under selection on the sheep X-chromosome. PMID:29790980
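The neutral expectation of 0.75 quoted in this abstract can be recovered from the textbook effective-size formulas. The derivation below is a sketch under assumptions not stated in the abstract: N_m and N_f denote the numbers of breeding males and females, mutation rates are equal on the X and the autosomes, and the population is at neutral equilibrium.

```latex
\begin{align}
N_{e,A} &= \frac{4 N_m N_f}{N_m + N_f}, \qquad
N_{e,X} = \frac{9 N_m N_f}{4 N_m + 2 N_f}, \\
\frac{\pi_X}{\pi_A} &\approx \frac{N_{e,X}}{N_{e,A}}
  = \frac{9\,(N_m + N_f)}{8\,(2 N_m + N_f)}, \\
N_m = N_f \;&\Rightarrow\; \frac{\pi_X}{\pi_A} \approx \frac{18}{24} = 0.75 .
\end{align}
```

Under these assumptions alone, an unequal breeding sex ratio moves the ratio between 9/16 (when N_m is much larger than N_f) and 9/8 (when N_f is much larger than N_m), which is one reason an observed value near 0.6 is usually read together with evidence on selection and demography rather than on its own.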
2011-05-01
genome was determined and compared to simian and human herpesvirus genomes representing alpha-herpesviruses, beta-herpesviruses and gamma-1 and...of JMRV Genome with Select Simian and Human Herpesvirus Genomes Showing Percent Nucleotide Sequence Identity Virus JMRV RRV KSHV HVS RhLCV EBV RhCMV...2 - Introduction Particular viruses, especially gamma-herpesviruses, may act as a trigger of multiple sclerosis (MS) (Levin et
Zueva, Ksenia J.; Lumme, Jaakko; Veselov, Alexey E.; Kent, Matthew P.; Lien, Sigbjørn; Primmer, Craig R.
2014-01-01
Mechanisms of host-parasite co-adaptation have long been of interest in evolutionary biology; however, determining the genetic basis of parasite resistance has been challenging. Current advances in genome technologies provide new opportunities for obtaining a genome-scale view of the action of parasite-driven natural selection in wild populations and thus facilitate the search for specific genomic regions underlying inter-population differences in pathogen response. European populations of Atlantic salmon (Salmo salar L.) exhibit natural variance in susceptibility levels to the ectoparasite Gyrodactylus salaris Malmberg 1957, ranging from resistance to extreme susceptibility, and are therefore a good model for studying the evolution of virulence and resistance. However, distinguishing the molecular signatures of genetic drift and environment-associated selection in small populations such as land-locked Atlantic salmon populations presents a challenge, specifically in the search for pathogen-driven selection. We used a novel genome-scan analysis approach that enabled us to i) identify signals of selection in salmon populations affected by varying levels of genetic drift and ii) separate potentially selected loci into the categories of pathogen (G. salaris)-driven selection and selection acting upon other environmental characteristics. A total of 4631 single nucleotide polymorphisms (SNPs) were screened in Atlantic salmon from 12 different northern European populations. We identified three genomic regions potentially affected by parasite-driven selection, as well as three regions presumably affected by salinity-driven directional selection. Functional annotation of candidate SNPs is consistent with the role of the detected genomic regions in immune defence and, implicitly, in osmoregulation. These results provide new insights into the genetic basis of pathogen susceptibility in Atlantic salmon and will enable future searches for the specific genes involved. 
PMID:24670947
Nwakanma, Davis C.; Duffy, Craig W.; Amambua-Ngwa, Alfred; Oriero, Eniyou C.; Bojang, Kalifa A.; Pinder, Margaret; Drakeley, Chris J.; Sutherland, Colin J.; Milligan, Paul J.; MacInnis, Bronwyn; Kwiatkowski, Dominic P.; Clark, Taane G.; Greenwood, Brian M.; Conway, David J.
2014-01-01
Background. Analysis of genome-wide polymorphism in many organisms has potential to identify genes under recent selection. However, data on historical allele frequency changes are rarely available for direct confirmation. Methods. We genotyped single nucleotide polymorphisms (SNPs) in 4 Plasmodium falciparum drug resistance genes in 668 archived parasite-positive blood samples of a Gambian population between 1984 and 2008. This covered a period before antimalarial resistance was detected locally, through subsequent failure of multiple drugs until introduction of artemisinin combination therapy. We separately performed genome-wide sequence analysis of 52 clinical isolates from 2008 to prospect for loci under recent directional selection. Results. Resistance alleles increased from very low frequencies, peaking in 2000 for chloroquine resistance-associated crt and mdr1 genes and at the end of the survey period for dhfr and dhps genes respectively associated with pyrimethamine and sulfadoxine resistance. Temporal changes fit a model incorporating likely selection coefficients over the period. Three of the drug resistance loci were in the top 4 regions under strong selection implicated by the genome-wide analysis. Conclusions. Genome-wide polymorphism analysis of an endemic population sample robustly identifies loci with detailed documentation of recent selection, demonstrating power to prospectively detect emerging drug resistance genes. PMID:24265439
Figueras, Antonio; Robledo, Diego; Corvelo, André; Hermida, Miguel; Pereiro, Patricia; Rubiolo, Juan A.; Gómez-Garrido, Jèssica; Carreté, Laia; Bello, Xabier; Gut, Marta; Gut, Ivo Glynne; Marcet-Houben, Marina; Forn-Cuní, Gabriel; Galán, Beatriz; García, José Luis; Abal-Fabeiro, José Luis; Pardo, Belen G.; Taboada, Xoana; Fernández, Carlos; Vlasova, Anna; Hermoso-Pulido, Antonio; Guigó, Roderic; Álvarez-Dios, José Antonio; Gómez-Tato, Antonio; Viñas, Ana; Maside, Xulio; Gabaldón, Toni; Novoa, Beatriz; Bouza, Carmen; Alioto, Tyler; Martínez, Paulino
2016-01-01
The turbot is a flatfish (Pleuronectiformes) with increasing commercial value, which has prompted active genomic research aimed at more efficient selection. Here we present the sequence and annotation of the turbot genome, which represents a milestone for both boosting breeding programmes and ascertaining the origin and diversification of flatfish. We compare the turbot genome with model fish genomes to investigate teleost chromosome evolution. We observe a conserved macrosyntenic pattern within Percomorpha and identify large syntenic blocks within the turbot genome related to the teleost genome duplication. We identify gene family expansions and positive selection of genes associated with vision and metabolism of membrane lipids, which suggests adaptation to demersal lifestyle and to cold temperatures, respectively. Our data indicate a quick evolution and diversification of flatfish to adapt to benthic life and provide clues for understanding their controversial origin. Moreover, we investigate the genomic architecture of growth, sex determination and disease resistance, key traits for understanding local adaptation and boosting turbot production, by mapping candidate genes and previously reported quantitative trait loci. The genomic architecture of these productive traits has allowed the identification of candidate genes and enriched pathways that may represent useful information for future marker-assisted selection in turbot. PMID:26951068
Figueras, Antonio; Robledo, Diego; Corvelo, André; Hermida, Miguel; Pereiro, Patricia; Rubiolo, Juan A; Gómez-Garrido, Jèssica; Carreté, Laia; Bello, Xabier; Gut, Marta; Gut, Ivo Glynne; Marcet-Houben, Marina; Forn-Cuní, Gabriel; Galán, Beatriz; García, José Luis; Abal-Fabeiro, José Luis; Pardo, Belen G; Taboada, Xoana; Fernández, Carlos; Vlasova, Anna; Hermoso-Pulido, Antonio; Guigó, Roderic; Álvarez-Dios, José Antonio; Gómez-Tato, Antonio; Viñas, Ana; Maside, Xulio; Gabaldón, Toni; Novoa, Beatriz; Bouza, Carmen; Alioto, Tyler; Martínez, Paulino
2016-06-01
The turbot is a flatfish (Pleuronectiformes) with increasing commercial value, which has prompted active genomic research aimed at more efficient selection. Here we present the sequence and annotation of the turbot genome, which represents a milestone for both boosting breeding programmes and ascertaining the origin and diversification of flatfish. We compare the turbot genome with model fish genomes to investigate teleost chromosome evolution. We observe a conserved macrosyntenic pattern within Percomorpha and identify large syntenic blocks within the turbot genome related to the teleost genome duplication. We identify gene family expansions and positive selection of genes associated with vision and metabolism of membrane lipids, which suggests adaptation to demersal lifestyle and to cold temperatures, respectively. Our data indicate a quick evolution and diversification of flatfish to adapt to benthic life and provide clues for understanding their controversial origin. Moreover, we investigate the genomic architecture of growth, sex determination and disease resistance, key traits for understanding local adaptation and boosting turbot production, by mapping candidate genes and previously reported quantitative trait loci. The genomic architecture of these productive traits has allowed the identification of candidate genes and enriched pathways that may represent useful information for future marker-assisted selection in turbot. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
2010-01-01
Background. The information provided by dense genome-wide markers using high throughput technology is of considerable potential in human disease studies and livestock breeding programs. Genome-wide association studies relate individual single nucleotide polymorphisms (SNP) from dense SNP panels to individual measurements of complex traits, with the underlying assumption being that any association is caused by linkage disequilibrium (LD) between SNP and quantitative trait loci (QTL) affecting the trait. Often SNP are in genomic regions of no trait variation. Whole genome Bayesian models are an effective way of incorporating this and other important prior information into modelling. However a full Bayesian analysis is often not feasible due to the large computational time involved. Results. This article proposes an expectation-maximization (EM) algorithm called emBayesB which allows only a proportion of SNP to be in LD with QTL and incorporates prior information about the distribution of SNP effects. The posterior probability of being in LD with at least one QTL is calculated for each SNP along with estimates of the hyperparameters for the mixture prior. A simulated example of genomic selection from an international workshop is used to demonstrate the features of the EM algorithm. The accuracy of prediction is comparable to a full Bayesian analysis but the EM algorithm is considerably faster. The EM algorithm was accurate in locating QTL which explained more than 1% of the total genetic variation. A computational algorithm for very large SNP panels is described. Conclusions. emBayesB is a fast and accurate EM algorithm for implementing genomic selection and predicting complex traits by mapping QTL in genome-wide dense SNP marker data. Its accuracy is similar to Bayesian methods but it takes only a fraction of the time. PMID:20969788
Potential benefits of genomic selection on genetic gain of small ruminant breeding programs.
Shumbusho, F; Raoul, J; Astruc, J M; Palhiere, I; Elsen, J M
2013-08-01
In conventional small ruminant breeding programs, only pedigree and phenotype records are used to make selection decisions but prospects of including genomic information are now under consideration. The objective of this study was to assess the potential benefits of genomic selection on the genetic gain in French sheep and goat breeding designs of today. Traditional and genomic scenarios were modeled with deterministic methods for 3 breeding programs. The models included decisional variables related to male selection candidates, progeny testing capacity, and economic weights that were optimized to maximize annual genetic gain (AGG) of i) a meat sheep breeding program that improved a meat trait of heritability (h²) = 0.30 and a maternal trait of h² = 0.09 and ii) dairy sheep and goat breeding programs that improved a milk trait of h² = 0.30. Values of ±0.20 of genetic correlation between meat and maternal traits were considered to study their effects on AGG. The Bulmer effect was accounted for and the results presented here are the averages of AGG after 10 generations of selection. Results showed that current traditional breeding programs provide an AGG of 0.095 genetic standard deviation (σa) for meat and 0.061 σa for maternal trait in meat breed and 0.147 σa and 0.120 σa in sheep and goat dairy breeds, respectively. By optimizing decisional variables, the AGG with traditional selection methods increased to 0.139 σa for meat and 0.096 σa for maternal traits in meat breeding programs and to 0.174 σa and 0.183 σa in dairy sheep and goat breeding programs, respectively. With a medium-sized reference population (nref) of 2,000 individuals, the best genomic scenarios gave an AGG that was 17.9% greater than with traditional selection methods with optimized values of decisional variables for combined meat and maternal traits in meat sheep, 51.7% in dairy sheep, and 26.2% in dairy goats. 
The superiority of genomic schemes increased with the size of the reference population and genomic selection gave the best results when nref > 1,000 individuals for dairy breeds and nref > 2,000 individuals for meat breed. Genetic correlation between meat and maternal traits had a large impact on the genetic gain of both traits. Changes in AGG due to correlation were greatest for low heritable maternal traits. As a general rule, AGG was increased both by optimizing selection designs and including genomic information.
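To make the quantity "annual genetic gain" concrete, here is a minimal sketch of the classical multi-pathway breeder's equation (sum of intensity times accuracy over pathways, divided by the sum of generation intervals, scaled by the genetic standard deviation). It is far simpler than the deterministic models of the study and ignores the Bulmer effect; the function name and the pathway values in the usage note are hypothetical.

```python
def annual_genetic_gain(pathways, sigma_a=1.0):
    """Annual genetic gain from the classical multi-pathway breeder's equation.

    pathways: iterable of (selection_intensity, accuracy, generation_interval_years),
              one tuple per selection pathway (for example sires and dams).
    sigma_a:  additive genetic standard deviation; with 1.0 the result is in sigma_a units.
    """
    pathways = list(pathways)
    # Genetic superiority contributed by each pathway: intensity * accuracy.
    superiority = sum(i * r for i, r, _ in pathways)
    # Gain per year divides by the summed generation intervals.
    total_interval = sum(interval for _, _, interval in pathways)
    return sigma_a * superiority / total_interval
```

For example, `annual_genetic_gain([(2.0, 0.8, 4.0), (1.0, 0.4, 2.0)])` evaluates (1.6 + 0.4) / 6.0. Genomic schemes typically act on this expression by changing the accuracy available at a young age and by shortening the generation interval.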
Yabe, Shiori; Hara, Takashi; Ueno, Mariko; Enoki, Hiroyuki; Kimura, Tatsuro; Nishimura, Satoru; Yasui, Yasuo; Ohsawa, Ryo; Iwata, Hiroyoshi
2018-01-01
To evaluate the potential of genomic selection (GS), a selection experiment with GS and phenotypic selection (PS) was performed in an allogamous crop, common buckwheat (Fagopyrum esculentum Moench). To indirectly select for seed yield per unit area, which cannot be measured on a single-plant basis, a selection index was constructed from seven agro-morphological traits measurable on a single plant basis. Over 3 years, we performed two GS and one PS cycles per year for improvement in the selection index. In GS, a prediction model was updated every year on the basis of genotypes of 14,598–50,000 markers and phenotypes. Plants grown from seeds derived from a series of generations of GS and PS populations were evaluated for the traits in the selection index and other yield-related traits. GS resulted in a 20.9% increase and PS in a 15.0% increase in the selection index in comparison with the initial population. Although the level of linkage disequilibrium in the breeding population was low, the target trait was improved with GS. Traits with higher weights in the selection index were improved more than those with lower weights, especially when prediction accuracy was high. No trait changed in an unintended direction in either GS or PS. The accuracy of genomic prediction models built in the first cycle decreased in the later cycles because the genetic bottleneck through the selection cycles changed linkage disequilibrium patterns in the breeding population. The present study emphasizes the importance of updating models in GS and demonstrates the potential of GS in mass selection of allogamous crop species, and provided a pilot example of successful application of GS to plant breeding. PMID:29619035
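The selection index described above combines several single-plant traits into one score. A minimal sketch of such an index follows, assuming a simple weighted sum of standardized trait values; the actual traits and weights used in the study are not given in the abstract, so the names and numbers here are hypothetical.

```python
from statistics import mean, pstdev


def selection_index(records, weights):
    """Weighted sum of standardized traits for each plant.

    records: list of per-plant trait lists (same trait order in every row).
    weights: one weight per trait (hypothetical values, not the study's).
    Assumes every trait varies among plants (non-zero standard deviation).
    """
    columns = list(zip(*records))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]
    return [sum(w * (x - m) / s for x, m, s, w in zip(row, means, sds, weights))
            for row in records]


def select_top(index, fraction):
    """Positions of the best plants, highest index first."""
    k = max(1, round(fraction * len(index)))
    ranked = sorted(range(len(index)), key=lambda i: index[i], reverse=True)
    return ranked[:k]
```

In phenotypic selection the index would be computed from measured traits; in genomic selection the same ranking step would be applied to index values predicted from marker genotypes.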
Demographic history, selection and functional diversity of the canine genome.
Ostrander, Elaine A; Wayne, Robert K; Freedman, Adam H; Davis, Brian W
2017-12-01
The domestic dog represents one of the most dramatic long-term evolutionary experiments undertaken by humans. From a large wolf-like progenitor, unparalleled diversity in phenotype and behaviour has developed in dogs, providing a model for understanding the developmental and genomic mechanisms of diversification. We discuss pattern and process in domestication, beginning with general findings about early domestication and problems in documenting selection at the genomic level. Furthermore, we summarize genotype-phenotype studies based first on single nucleotide polymorphism (SNP) genotyping and then with whole-genome data and show how an understanding of evolution informs topics as different as human history, adaptive and deleterious variation, morphological development, ageing, cancer and behaviour.
Genomic and pedigree-based prediction for leaf, stem, and stripe rust resistance in wheat.
Juliana, Philomin; Singh, Ravi P; Singh, Pawan K; Crossa, Jose; Huerta-Espino, Julio; Lan, Caixia; Bhavani, Sridhar; Rutkoski, Jessica E; Poland, Jesse A; Bergstrom, Gary C; Sorrells, Mark E
2017-07-01
Genomic prediction for seedling and adult plant resistance to wheat rusts was compared to prediction using few markers as fixed effects in a least-squares approach and pedigree-based prediction. The unceasing plant-pathogen arms race and ephemeral nature of some rust resistance genes have been challenging for wheat (Triticum aestivum L.) breeding programs and farmers. Hence, it is important to devise strategies for effective evaluation and exploitation of quantitative rust resistance. One promising approach that could accelerate gain from selection for rust resistance is 'genomic selection' which utilizes dense genome-wide markers to estimate the breeding values (BVs) for quantitative traits. Our objective was to compare three genomic prediction models including genomic best linear unbiased prediction (GBLUP), GBLUP A that was GBLUP with selected loci as fixed effects and reproducing kernel Hilbert spaces-markers (RKHS-M) with least-squares (LS) approach, RKHS-pedigree (RKHS-P), and RKHS markers and pedigree (RKHS-MP) to determine the BVs for seedling and/or adult plant resistance (APR) to leaf rust (LR), stem rust (SR), and stripe rust (YR). The 333 lines in the 45th IBWSN and the 313 lines in the 46th IBWSN were genotyped using genotyping-by-sequencing and phenotyped in replicated trials. The mean prediction accuracies ranged from 0.31-0.74 for LR seedling, 0.12-0.56 for LR APR, 0.31-0.65 for SR APR, 0.70-0.78 for YR seedling, and 0.34-0.71 for YR APR. For most datasets, the RKHS-MP model gave the highest accuracies, while LS gave the lowest. GBLUP, GBLUP A, RKHS-M, and RKHS-P models gave similar accuracies. Using genome-wide marker-based models resulted in an average of 42% increase in accuracy over LS. We conclude that GS is a promising approach for improvement of quantitative rust resistance and can be implemented in the breeding pipeline.
Badke, Yvonne M; Bates, Ronald O; Ernst, Catherine W; Fix, Justin; Steibel, Juan P
2014-04-16
Genomic selection has the potential to increase genetic progress. Genotype imputation of high-density single-nucleotide polymorphism (SNP) genotypes can improve the cost efficiency of genomic breeding value (GEBV) prediction for pig breeding. Consequently, the objectives of this work were to: (1) estimate accuracy of genomic evaluation and GEBV for three traits in a Yorkshire population and (2) quantify the loss of accuracy of genomic evaluation and GEBV when genotypes were imputed under two scenarios: a high-cost, high-accuracy scenario in which only selection candidates were imputed from a low-density platform and a low-cost, low-accuracy scenario in which all animals were imputed using a small reference panel of haplotypes. Phenotypes and genotypes obtained with the PorcineSNP60 BeadChip were available for 983 Yorkshire boars. Genotypes of selection candidates were masked and imputed using tagSNP in the GeneSeek Genomic Profiler (10K). Imputation was performed with BEAGLE using 128 or 1800 haplotypes as reference panels. GEBV were obtained through an animal-centric ridge regression model using de-regressed breeding values as response variables. Accuracy of genomic evaluation was estimated as the correlation between estimated breeding values and GEBV in a 10-fold cross validation design. Accuracy of genomic evaluation using observed genotypes was high for all traits (0.65-0.68). Using genotypes imputed from a large reference panel (accuracy: R² = 0.95) for genomic evaluation did not significantly decrease accuracy, whereas a scenario with genotypes imputed from a small reference panel (R² = 0.88) did show a significant decrease in accuracy. Genomic evaluation based on imputed genotypes in selection candidates can be implemented at a fraction of the cost of a genomic evaluation using observed genotypes and still yield virtually the same accuracy. 
On the other hand, using a very small reference panel of haplotypes to impute training animals and candidates for selection results in lower accuracy of genomic evaluation.
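The evaluation scheme in this abstract (ridge regression GEBV, with accuracy measured as a correlation in 10-fold cross-validation) can be sketched as follows. This is an illustration under assumptions, not the authors' code: it uses the marker-effect form of ridge regression rather than the animal-centric form they describe (the two give the same predictions for a matching penalty), it correlates predictions with the response itself, and the penalty `lam` and all function names are hypothetical.

```python
import numpy as np


def ridge_gebv(Z_train, y_train, Z_test, lam):
    """Predict breeding values for test animals from marker-effect ridge regression."""
    mu = y_train.mean()
    col_means = Z_train.mean(axis=0)
    Zc = Z_train - col_means                      # center genotypes on the training set
    p = Zc.shape[1]
    # Solve (Z'Z + lam * I) b = Z'(y - mu) for the marker effects b.
    effects = np.linalg.solve(Zc.T @ Zc + lam * np.eye(p), Zc.T @ (y_train - mu))
    return mu + (Z_test - col_means) @ effects


def cv_accuracy(Z, y, lam, k=10, seed=0):
    """Correlation between the response and out-of-fold predictions in k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    pred = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        pred[test_idx] = ridge_gebv(Z[train_idx], y[train_idx], Z[test_idx], lam)
    return np.corrcoef(y, pred)[0, 1]
```

To mimic the imputation comparison, the same `cv_accuracy` call would be run once with observed genotypes and once with the candidates' rows of `Z` replaced by imputed genotypes, and the two correlations compared.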
Muley, Vijaykumar Yogesh; Ranjan, Akash
2012-01-01
Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history. This history can be traced out for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs in multiple genomes known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood and co-occurrence of the orthologous protein coding genes in the same cluster or operon. These are collectively known as genomic context methods. On the other hand a method called mirrortree is based on the similarity of phylogenetic trees between two interacting proteins. Comprehensive performance analyses of these methods have been frequently reported in literature. However, very few studies provide insight into the effect of reference genome selection on detection of meaningful protein interactions. We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled in accordance with phylogenetic diversity and relationship between organisms from 565 bacteria. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods. Higher performance for predicting protein-protein interactions was achievable even with 100-150 bacterial genomes out of 565 genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that in order to obtain a good performance, it is better to sample few genomes of related genera of prokaryotes from the large number of available genomes. 
Moreover, such a sampling allows for selecting 50-100 genomes for comparable accuracy of predictions when computational resources are limited.
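Phylogenetic profiling, one of the genomic context methods named above, scores a protein pair by how similarly the two proteins are present or absent across the reference genomes. The sketch below uses the Jaccard coefficient as the similarity measure; that choice, the threshold, and the gene names in the usage note are assumptions for illustration, and the study's own scoring variants may differ.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles (sequences of 0/1)."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def predict_links(profiles, threshold):
    """Return (gene1, gene2, score) for pairs whose profiles are similar enough.

    profiles: dict mapping a gene name to its profile, one entry per reference
              genome (1 = a homolog is present in that genome, 0 = absent).
    """
    links = []
    for (g1, p1), (g2, p2) in combinations(sorted(profiles.items()), 2):
        score = jaccard(p1, p2)
        if score >= threshold:
            links.append((g1, g2, score))
    return sorted(links, key=lambda link: -link[2])
```

With `{"geneA": (1, 1, 0, 1, 0, 0), "geneB": (1, 1, 0, 1, 0, 0), "geneC": (0, 0, 1, 0, 1, 1)}` and a threshold of 0.8, only the geneA/geneB pair is reported. The choice of reference genomes matters because it determines the columns of every profile: closely related genomes add near-duplicate columns that carry little extra information.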
Li, Lun; Long, Yan; Zhang, Libin; Dalton-Morgan, Jessica; Batley, Jacqueline; Yu, Longjiang; Meng, Jinling; Li, Maoteng
2015-01-01
The prediction of the flowering time (FT) trait in Brassica napus based on genome-wide markers and the detection of underlying genetic factors is important not only for oilseed producers around the world but also for other crop industries in the rotation system in China. In previous studies, the low density and mixture of biomarkers used obstructed genomic selection in B. napus and comprehensive mapping of FT-related loci. In this study, a high-density genome-wide SNP set was genotyped from a double-haploid population of B. napus. We first performed genomic prediction of FT traits in B. napus using SNPs across the genome under ten environments of three geographic regions via eight existing genomic predictive models. The results showed that all the models achieved comparably high accuracies, verifying the feasibility of genomic prediction in B. napus. Next, we performed a large-scale mapping of FT-related loci among three regions, and found 437 associated SNPs, some of which represented known FT genes, such as AP1 and PHYE. The genes tagged by the associated SNPs were enriched in biological processes involved in the formation of flowers. Epistasis analysis showed that significant interactions were found between detected loci, even among some known FT-related genes. All the results showed that our large-scale and high-density genotype data are of great practical and scientific value for B. napus. To the best of our knowledge, this is the first evaluation of genomic selection models in B. napus based on a high-density SNP dataset and large-scale mapping of FT loci.
Genomic analyses provide insights into the history of tomato breeding.
Lin, Tao; Zhu, Guangtao; Zhang, Junhong; Xu, Xiangyang; Yu, Qinghui; Zheng, Zheng; Zhang, Zhonghua; Lun, Yaoyao; Li, Shuai; Wang, Xiaoxuan; Huang, Zejun; Li, Junming; Zhang, Chunzhi; Wang, Taotao; Zhang, Yuyang; Wang, Aoxue; Zhang, Yancong; Lin, Kui; Li, Chuanyou; Xiong, Guosheng; Xue, Yongbiao; Mazzucato, Andrea; Causse, Mathilde; Fei, Zhangjun; Giovannoni, James J; Chetelat, Roger T; Zamir, Dani; Städler, Thomas; Li, Jingfu; Ye, Zhibiao; Du, Yongchen; Huang, Sanwen
2014-11-01
The histories of crop domestication and breeding are recorded in genomes. Although tomato is a model species for plant biology and breeding, the nature of human selection that altered its genome remains largely unknown. Here we report a comprehensive analysis of tomato evolution based on the genome sequences of 360 accessions. We provide evidence that domestication and improvement focused on two independent sets of quantitative trait loci (QTLs), resulting in modern tomato fruit ∼100 times larger than its ancestor. Furthermore, we discovered a major genomic signature for modern processing tomatoes, identified the causative variants that confer pink fruit color and precisely visualized the linkage drag associated with wild introgressions. This study outlines the accomplishments as well as the costs of historical selection and provides molecular insights toward further improvement.
Exact Solution of Mutator Model with Linear Fitness and Finite Genome Length
NASA Astrophysics Data System (ADS)
Saakian, David B.
2017-08-01
We considered the infinite population version of the mutator phenomenon in evolutionary dynamics, looking at the uni-directional mutations in the mutator-specific genes and linear selection. We solved exactly the model for the finite genome length case, looking at the quasispecies version of the phenomenon. We calculated the mutator probability both in the statics and dynamics. The exact solution is important for us because the mutator probability depends on the genome length in a highly non-trivial way.
Will genomic selection be a practical method for plant breeding?
Nakaya, Akihiro; Isobe, Sachiko N.
2012-01-01
Background Genomic selection or genome-wide selection (GS) has been highlighted as a new approach for marker-assisted selection (MAS) in recent years. GS is a form of MAS that selects favourable individuals based on genomic estimated breeding values. Previous studies have suggested the utility of GS, especially for capturing small-effect quantitative trait loci, but GS has not become a popular methodology in the field of plant breeding, possibly because there is insufficient information available on GS for practical use. Scope In this review, GS is discussed from a practical breeding viewpoint. Statistical approaches employed in GS are briefly described, before the recent progress in GS studies is surveyed. GS practices in plant breeding are then reviewed before future prospects are discussed. Conclusions Statistical concepts used in GS are discussed with genetic models and variance decomposition, heritability, breeding value and linear model. Recent progress in GS studies is reviewed with a focus on empirical studies. For the practice of GS in plant breeding, several specific points are discussed including linkage disequilibrium, feature of populations and genotyped markers and breeding scheme. Currently, GS is not perfect, but it is a potent, attractive and valuable approach for plant breeding. This method will be integrated into many practical breeding programmes in the near future with further advances and the maturing of its theory. PMID:22645117
Multi-locus analysis of genomic time series data from experimental evolution.
Terhorst, Jonathan; Schlötterer, Christian; Song, Yun S
2015-04-01
Genomic time series data generated by evolve-and-resequence (E&R) experiments offer a powerful window into the mechanisms that drive evolution. However, standard population genetic inference procedures do not account for sampling serially over time, and new methods are needed to make full use of modern experimental evolution data. To address this problem, we develop a Gaussian process approximation to the multi-locus Wright-Fisher process with selection over a time course of tens of generations. The mean and covariance structure of the Gaussian process are obtained by computing the corresponding moments in discrete-time Wright-Fisher models conditioned on the presence of a linked selected site. This enables our method to account for the effects of linkage and selection, both along the genome and across sampled time points, in an approximate but principled manner. We first use simulated data to demonstrate the power of our method to correctly detect, locate and estimate the fitness of a selected allele from among several linked sites. We study how this power changes for different values of selection strength, initial haplotypic diversity, population size, sampling frequency, experimental duration, number of replicates, and sequencing coverage depth. In addition to providing quantitative estimates of selection parameters from experimental evolution data, our model can be used by practitioners to design E&R experiments with requisite power. We also explore how our likelihood-based approach can be used to infer other model parameters, including effective population size and recombination rate. Then, we apply our method to analyze genome-wide data from a real E&R experiment designed to study the adaptation of D. melanogaster to a new laboratory environment with alternating cold and hot temperatures.
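As a point of reference for the Wright-Fisher process with selection mentioned in this abstract, the following is a minimal single-locus sketch in Python. It is not the multi-locus Gaussian process approximation the authors develop. The function name, its parameters, and the haploid selection scheme (relative fitness 1 + s for the selected allele) are illustrative assumptions.

```python
import random

def wright_fisher_trajectory(n_pop, p0, s, generations, seed=0):
    """Simulate an allele frequency trajectory under a haploid
    Wright-Fisher model with selection (illustrative assumption).

    n_pop       -- number of haploid individuals, held constant
    p0          -- initial frequency of the selected allele
    s           -- selection coefficient (relative fitness 1 + s)
    generations -- number of discrete generations to simulate
    """
    rng = random.Random(seed)
    p = p0
    freqs = [p0]
    for _ in range(generations):
        # Selection step: expected frequency after fitness weighting.
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Drift step: binomial sampling of n_pop offspring.
        count = sum(1 for _ in range(n_pop) if rng.random() < p_sel)
        p = count / n_pop
        freqs.append(p)
    return freqs

# Example: tens of generations, as in an evolve-and-resequence time course.
traj = wright_fisher_trajectory(n_pop=500, p0=0.1, s=0.1, generations=30)
print([round(f, 2) for f in traj[::10]])
```

Sampling such a trajectory at a few time points, with added sequencing noise, gives the kind of time series data the abstract describes.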
Reitzel, A M; Herrera, S; Layden, M J; Martindale, M Q; Shank, T M
2013-06-01
Characterization of large numbers of single-nucleotide polymorphisms (SNPs) throughout a genome has the power to refine the understanding of population demographic history and to identify genomic regions under selection in natural populations. To this end, population genomic approaches that harness the power of next-generation sequencing to understand the ecology and evolution of marine invertebrates represent a boon to test long-standing questions in marine biology and conservation. We employed restriction-site-associated DNA sequencing (RAD-seq) to identify SNPs in natural populations of the sea anemone Nematostella vectensis, an emerging cnidarian model with a broad geographic range in estuarine habitats in North and South America, and portions of England. We identified hundreds of SNP-containing tags in thousands of RAD loci from 30 barcoded individuals inhabiting four locations from Nova Scotia to South Carolina. Population genomic analyses using high-confidence SNPs resulted in a highly-resolved phylogeography, a result not achieved in previous studies using traditional markers. Plots of locus-specific FST against heterozygosity suggest that a majority of polymorphic sites are neutral, with a smaller proportion suggesting evidence for balancing selection. Loci inferred to be under balancing selection were mapped to the genome, where 90% were located in gene bodies, indicating potential targets of selection. The results from analyses with and without a reference genome supported similar conclusions, further highlighting RAD-seq as a method that can be efficiently applied to species lacking existing genomic resources. We discuss the utility of RAD-seq approaches in burgeoning Nematostella research as well as in other cnidarian species, particularly corals and jellyfishes, to determine phylogeographic relationships of populations and identify regions of the genome undergoing selection. © 2013 John Wiley & Sons Ltd.
Reitzel, A.M.; Herrera, S.; Layden, M.J.; Martindale, M.Q.; Shank, T.M.
2013-01-01
Characterization of large numbers of single nucleotide polymorphisms (SNPs) throughout a genome has the power to refine the understanding of population demographic history and to identify genomic regions under selection in natural populations. To this end, population genomic approaches that harness the power of next-generation sequencing to understand the ecology and evolution of marine invertebrates represent a boon to test long-standing questions in marine biology and conservation. We employed restriction-site-associated DNA sequencing (RAD-seq) to identify SNPs in natural populations of the sea anemone Nematostella vectensis, an emerging cnidarian model with a broad geographic range in estuarine habitats in North and South America, and portions of England. We identified hundreds of SNP-containing tags in thousands of RAD loci from 30 barcoded individuals inhabiting four locations from Nova Scotia to South Carolina. Population genomic analyses using high-confidence SNPs resulted in a highly-resolved phylogeography, a result not achieved in previous studies using traditional markers. Plots of locus-specific FST against heterozygosity suggest that a majority of polymorphic sites are neutral, with a smaller proportion suggesting evidence for balancing selection. Loci inferred to be under balancing selection were mapped to the genome, where 90% were located in gene bodies, indicating potential targets of selection. Results from analyses with and without a reference genome supported similar conclusions, further supporting RAD-seq as a method that can be efficiently applied to species lacking existing genomic resources. We discuss the utility of RAD-seq approaches in burgeoning Nematostella research as well as in other cnidarian species, particularly corals, to determine phylogeographic relationships of populations and identify regions of the genome undergoing selection. PMID:23473066
Chen, Minhui; Wang, Jiying; Wang, Yanping; Wu, Ying; Fu, Jinluan; Liu, Jian-Feng
2018-05-18
Genome-wide scans for positive selection signatures have so far been carried out mainly in commercial breeds, and few studies have focused on the selection footprints of indigenous breeds. The Laiwu pig is an invaluable Chinese indigenous breed with an extremely high proportion of intramuscular fat (IMF), and an excellent model for detecting footprints of natural and artificial selection for fat deposition in muscle. In this study, based on GeneSeek Genomic Profiler Porcine HD data, three complementary methods, FST, iHS (integrated haplotype homozygosity score) and CLR (composite likelihood ratio), were implemented to detect selection signatures in the whole genome of Laiwu pigs. In total, 175 candidate selected regions were identified by at least two of the three methods; these covered 43.75 Mb and corresponded to 1.79% of the genome sequence. Gene annotation of the selected regions revealed a list of functionally important genes for feed intake and fat deposition, reproduction, and immune response. In particular, in accordance with the phenotypic features of Laiwu pigs, the candidate genes included several genes, NPY1R, NPY5R, PIK3R1 and JAKMIP1, involved in the actions of two sets of neurons that are central regulators of the balance between food intake and energy expenditure. Our results identified a number of regions showing signatures of selection, as well as a list of functional candidate genes with potential effects on phenotypic traits, especially fat deposition in muscle. Our findings provide insights into the mechanisms of artificial selection for fat deposition and will facilitate follow-up functional studies.
Schematic for efficient computation of GC, GC3, and AT3 bias spectra of genome
Rizvi, Ahsan Z; Venu Gopal, T; Bhattacharya, C
2012-01-01
Selection of synonymous codons for an amino acid is biased in the protein translation process. This biased selection causes repetition of synonymous codons in structural parts of the genome, which appears as high N/3 peaks in the DNA spectrum. The period-3 spectral property is utilized here to produce a 3-phase network model based on polyphase filterbank concepts for derivation of codon bias spectra (CBS). Modification of parameters in this model can produce GC, GC3, and AT3 bias spectra. A complete schematic in the LabVIEW platform is presented here for efficient and parallel computation of GC, GC3, and AT3 bias spectra of genomes, along with results of CBS patterns. We have performed correlation coefficient analysis of GC, GC3, and AT3 bias spectra with the codon bias patterns of CBS to assess the biological and statistical significance of this model. PMID:22368390
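The period-3 property referred to in this abstract can be illustrated with a direct Fourier sum at frequency index N/3. The sketch below is a plain-Python illustration only; it is not the polyphase filterbank model or the LabVIEW schematic of the paper. The function names and the use of four binary base-indicator sequences are assumptions made for the example.

```python
import cmath

def period3_power(seq):
    """Spectral power at frequency index N/3, summed over the four
    binary indicator sequences (one per base) and divided by N."""
    seq = seq.upper()
    n = len(seq)
    total = 0.0
    for base in "ACGT":
        # DFT term at k = N/3 reduces to a phase step of 2*pi/3 per position.
        coeff = sum(cmath.exp(-2j * cmath.pi * i / 3)
                    for i, ch in enumerate(seq) if ch == base)
        total += abs(coeff) ** 2
    return total / n

def gc3_fraction(seq):
    """Fraction of G or C at third codon positions, assuming the
    sequence starts in frame."""
    thirds = seq.upper()[2::3]
    return sum(ch in "GC" for ch in thirds) / len(thirds)

# A perfectly codon-periodic toy sequence gives a strong N/3 peak,
# while a homopolymer of the same length gives essentially none.
print(period3_power("ATG" * 30), period3_power("A" * 90))
print(gc3_fraction("ATGGCC"))
```

Restricting the indicator sequences to G and C, or to the third codon position, gives quantities analogous in spirit to the GC and GC3 bias spectra named in the abstract.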
Island-Model Genomic Selection for Long-Term Genetic Improvement of Autogamous Crops
Yabe, Shiori; Yamasaki, Masanori; Ebana, Kaworu; Hayashi, Takeshi; Iwata, Hiroyoshi
2016-01-01
Acceleration of genetic improvement of autogamous crops such as wheat and rice is necessary to increase cereal production in response to the global food crisis. Population and pedigree methods of breeding, which are based on inbred line selection, are used commonly in the genetic improvement of autogamous crops. These methods, however, produce few novel combinations of genes in a breeding population. Recurrent selection promotes recombination among genes and produces novel combinations of genes in a breeding population, but it relies on single-plant evaluation, which is inaccurate, for selection. Genomic selection (GS), which can predict the genetic potential of individuals based on their marker genotype, might have high reliability of single-plant evaluation and might be effective in recurrent selection. To evaluate the efficiency of recurrent selection with GS, we conducted simulations using real marker genotype data of rice cultivars. Additionally, we introduced the concept of an “island model” inspired by evolutionary algorithms that might be useful to maintain genetic variation through the breeding process. We conducted GS simulations using real marker genotype data of rice cultivars to evaluate the efficiency of recurrent selection and the island model in an autogamous species. Results demonstrated the importance of producing novel combinations of genes through recurrent selection. An initial population derived from admixture of multiple bi-parental crosses showed larger genetic gains than a population derived from a single bi-parental cross in whole cycles, suggesting the importance of genetic variation in an initial population. The island-model GS better maintained genetic improvement in later generations than the other GS methods, suggesting that the island-model GS can utilize genetic variation in breeding and can retain alleles with small effects in the breeding population. The island-model GS will become a new breeding method that enhances the potential of genomic selection in autogamous crops, especially bringing long-term improvement. PMID:27115872
Beissinger, Timothy M.; Hirsch, Candice N.; Vaillancourt, Brieanne; Deshpande, Shweta; Barry, Kerrie; Buell, C. Robin; Kaeppler, Shawn M.; Gianola, Daniel; de Leon, Natalia
2014-01-01
A genome-wide scan to detect evidence of selection was conducted in the Golden Glow maize long-term selection population. The population had been subjected to selection for increased number of ears per plant for 30 generations, with an empirically estimated effective population size ranging from 384 to 667 individuals and an increase of more than threefold in the number of ears per plant. Allele frequencies at >1.2 million single-nucleotide polymorphism loci were estimated from pooled whole-genome resequencing data, and FST values across sliding windows were employed to assess divergence between the population preselection and the population postselection. Twenty-eight highly divergent regions were identified, with half of these regions providing gene-level resolution on potentially selected variants. Approximately 93% of the divergent regions do not demonstrate a significant decrease in heterozygosity, which suggests that they are not approaching fixation. Also, most regions display a pattern consistent with a soft-sweep model as opposed to a hard-sweep model, suggesting that selection mostly operated on standing genetic variation. For at least 25% of the regions, results suggest that selection operated on variants located outside of currently annotated coding regions. These results provide insights into the underlying genetic effects of long-term artificial selection and identification of putative genetic elements underlying number of ears per plant in maize. PMID:24381334
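The sliding-window FST scan described in this abstract can be sketched as follows. The abstract does not state which FST estimator was used, so the heterozygosity-based definition FST = (HT - HS) / HT and the ratio-of-sums window summary below are assumptions, as are the function names and the window parameters.

```python
def fst_site_components(p1, p2):
    """Return (numerator, denominator) of FST = (HT - HS) / HT for one
    biallelic site, given the allele frequency in each population."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                       # total heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-population
    return h_t - h_s, h_t

def sliding_window_fst(freqs_pre, freqs_post, window, step):
    """Windowed FST over consecutive SNPs, computed as a ratio of sums.

    freqs_pre, freqs_post -- allele frequencies before and after selection,
                             in the same SNP order
    window, step          -- window size and step, both in number of SNPs
    """
    results = []
    for start in range(0, len(freqs_pre) - window + 1, step):
        num = den = 0.0
        for p1, p2 in zip(freqs_pre[start:start + window],
                          freqs_post[start:start + window]):
            n, d = fst_site_components(p1, p2)
            num += n
            den += d
        results.append((start, num / den if den > 0 else 0.0))
    return results

# Toy data: the first two SNPs are unchanged, the last two are fixed differences.
pre = [0.5, 0.5, 0.0, 0.0]
post = [0.5, 0.5, 1.0, 1.0]
print(sliding_window_fst(pre, post, window=2, step=2))
```

Windows with values in the upper tail of the genome-wide distribution would be the candidates for "highly divergent regions"; the threshold itself is a separate choice not covered here.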
USDA-ARS's Scientific Manuscript database
Breeding and selection for the traits with polygenic inheritance is a challenging task that can be done by phenotypic selection, by marker-assisted selection or by genome wide selection. We tested predictive ability of four selection models in a biparental population genotyped with 95 SNP markers an...
DOE Office of Scientific and Technical Information (OSTI.GOV)
McLoughlin, Kevin
2016-01-11
This report describes the design and implementation of an algorithm for estimating relative microbial abundances, together with confidence limits, using data from metagenomic DNA sequencing. For the background behind this project and a detailed discussion of our modeling approach for metagenomic data, we refer the reader to our earlier technical report, dated March 4, 2014. Briefly, we described a fully Bayesian generative model for paired-end sequence read data, incorporating the effects of the relative abundances, the distribution of sequence fragment lengths, fragment position bias, sequencing errors and variations between the sampled genomes and the nearest reference genomes. A distinctive feature of our modeling approach is the use of a Chinese restaurant process (CRP) to describe the selection of genomes to be sampled, and thus the relative abundances. The CRP component is desirable for fitting abundances to reads that may map ambiguously to multiple targets, because it naturally leads to sparse solutions that select the best representative from each set of nearly equivalent genomes.
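For readers unfamiliar with the Chinese restaurant process, the sketch below samples from a generic CRP prior. It is not the report's generative model for reads. Reading "tables" as genomes and "customers" as sampled fragments is an illustrative interpretation, and the function name and the concentration parameter alpha are assumptions.

```python
import random

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Sample table assignments from a CRP with concentration alpha.

    Returns (assignments, table_counts): the table index chosen by each
    customer and the final number of customers at each table.
    """
    rng = random.Random(seed)
    table_counts = []
    assignments = []
    for i in range(n_customers):
        # Existing table k is chosen with probability count_k / (i + alpha),
        # a new table with probability alpha / (i + alpha).
        r = rng.random() * (i + alpha)
        cumulative = 0.0
        chosen = len(table_counts)  # default: open a new table
        for k, count in enumerate(table_counts):
            cumulative += count
            if r < cumulative:
                chosen = k
                break
        if chosen == len(table_counts):
            table_counts.append(0)
        table_counts[chosen] += 1
        assignments.append(chosen)
    return assignments, table_counts

assignments, counts = chinese_restaurant_process(n_customers=200, alpha=1.0)
print(len(counts), sorted(counts, reverse=True)[:5])
```

The "rich get richer" rule concentrates most customers on a few tables, which is the source of the sparsity the abstract mentions: a small alpha favours explaining the data with few distinct genomes.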
Efficient Breeding by Genomic Mating.
Akdemir, Deniz; Sánchez, Julio I
2016-01-01
Selection in breeding programs can be done by using phenotypes (phenotypic selection), pedigree relationship (breeding value selection) or molecular markers (marker assisted selection or genomic selection). All these methods are based on truncation selection, focusing on the best performance of parents before mating. In this article we propose an approach to breeding, named genomic mating, which focuses on mating instead of truncation selection. Genomic mating uses information in a similar fashion to genomic selection but includes information on complementation of parents to be mated. Following the efficiency frontier surface, genomic mating uses concepts of estimated breeding values, risk (usefulness) and coefficient of ancestry to optimize mating between parents. We used a genetic algorithm to find solutions to this optimization problem, and the results from our simulations comparing genomic selection, phenotypic selection and the mating approach indicate that the proposed approach for breeding complex traits is more favorable than phenotypic and genomic selection. Genomic mating is similar to genomic selection in terms of estimating marker effects, but in genomic mating the genetic information and the estimated marker effects are used to decide which genotypes should be crossed to obtain the next breeding population.
Genomic selection in domestic animals: Principles, applications and perspectives.
Boichard, Didier; Ducrocq, Vincent; Croiseau, Pascal; Fritz, Sébastien
2016-01-01
The principles of genomic selection are described, with the main factors affecting its efficiency and the assumptions underlying the different models proposed. The reasons of its fast adoption in dairy cattle are explained and the conditions of its application to other species are discussed. Perspectives of development include: selection for new traits and new breeding objectives; adoption of more robust approaches based on information on causal variants; predictions of genotype×environment interactions. Copyright © 2016 Académie des sciences. Published by Elsevier SAS. All rights reserved.
Christe, Camille; Stölting, Kai N; Bresadola, Luisa; Fussi, Barbara; Heinze, Berthold; Wegmann, Daniel; Lexer, Christian
2016-06-01
Natural hybrid zones have proven to be precious tools for understanding the origin and maintenance of reproductive isolation (RI) and therefore species. Most available genomic studies of hybrid zones using whole- or partial-genome resequencing approaches have focused on comparisons of the parental source populations involved in genome admixture, rather than exploring fine-scale patterns of chromosomal ancestry across the full admixture gradient present between hybridizing species. We have studied three well-known European 'replicate' hybrid zones of Populus alba and P. tremula, two widespread, ecologically divergent forest trees, using up to 432 505 single-nucleotide polymorphisms (SNPs) from restriction site-associated DNA (RAD) sequencing. Estimates of fine-scale chromosomal ancestry, genomic divergence and differentiation across all 19 poplar chromosomes revealed strikingly contrasting results, including an unexpected preponderance of F1 hybrids in the centre of genomic clines on the one hand, and genomically localized, spatially variable shared variants consistent with ancient introgression between the parental species on the other. Genetic ancestry had a significant effect on survivorship of hybrid seedlings in a common garden trial, pointing to selection against early-generation recombinants. Our results indicate a role for selection against recombinant genotypes in maintaining RI in the face of apparent F1 fertility, consistent with the intragenomic 'coadaptation' model of barriers to introgression upon secondary contact. Whole-genome resequencing of hybridizing populations will clarify the roles of specific genetic pathways in RI between these model forest trees and may reveal which loci are affected most strongly by its cyclic breakdown. © 2016 John Wiley & Sons Ltd.
Whole-Genome Positive Selection and Habitat-Driven Evolution in a Shallow and a Deep-Sea Urchin
Oliver, Thomas A.; Garfield, David A.; Manier, Mollie K.; Haygood, Ralph; Wray, Gregory A.; Palumbi, Stephen R.
2010-01-01
Comparisons of genomic sequence between divergent species can provide insight into the action of natural selection across many distinct classes of proteins. Here, we examine the extent of positive selection as a function of tissue-specific and stage-specific gene expression in two closely-related sea urchins, the shallow-water Strongylocentrotus purpuratus and the deep-sea Allocentrotus fragilis, which have diverged greatly in their adult but not larval habitats. Genes that are expressed specifically in adult somatic tissue have significantly higher dN/dS ratios than the genome-wide average, whereas those in larvae are indistinguishable from the genome-wide average. Testis-specific genes have the highest dN/dS values, whereas ovary-specific have the lowest. Branch-site models involving the outgroup S. franciscanus indicate greater selection (ωFG) along the A. fragilis branch than along the S. purpuratus branch. The A. fragilis branch also shows a higher proportion of genes under positive selection, including those involved in skeletal development, endocytosis, and sulfur metabolism. Both lineages are approximately equal in enrichment for positive selection of genes involved in immunity, development, and cell–cell communication. The branch-site models further suggest that adult-specific genes have experienced greater positive selection than those expressed in larvae and that ovary-specific genes are more conserved (i.e., experienced greater negative selection) than those expressed specifically in adult somatic tissues and testis. Our results chart the patterns of protein change that have occurred after habitat divergence in these two species and show that the developmental or functional context in which a gene acts can play an important role in how divergent species adapt to new environments. PMID:20935062
Mehrban, Hossein; Lee, Deuk Hwan; Moradi, Mohammad Hossein; IlCho, Chung; Naserkheil, Masoumeh; Ibáñez-Escriche, Noelia
2017-01-04
Hanwoo beef is known for its marbled fat, tenderness, juiciness and characteristic flavor, as well as for its low cholesterol and high omega 3 fatty acid contents. As yet, there has been no comprehensive investigation to estimate genomic selection accuracy for carcass traits in Hanwoo cattle using dense markers. This study aimed at evaluating the accuracy of alternative statistical methods that differed in assumptions about the underlying genetic model for various carcass traits: backfat thickness (BT), carcass weight (CW), eye muscle area (EMA), and marbling score (MS). Accuracies of direct genomic breeding values (DGV) for carcass traits were estimated by applying fivefold cross-validation to a dataset including 1183 animals and approximately 34,000 single nucleotide polymorphisms (SNPs). Accuracies of BayesC, Bayesian LASSO (BayesL) and genomic best linear unbiased prediction (GBLUP) methods were similar for BT, EMA and MS. However, for CW, DGV accuracy was 7% higher with BayesC than with BayesL and GBLUP. The increased accuracy of BayesC, compared to GBLUP and BayesL, was maintained for CW, regardless of the training sample size, but not for BT, EMA, and MS. Genome-wide association studies detected consistent large effects for SNPs on chromosomes 6 and 14 for CW. The predictive performance of the models depended on the trait analyzed. For CW, the results showed a clear superiority of BayesC compared to GBLUP and BayesL. These findings indicate the importance of using a proper variable selection method for genomic selection of traits and also suggest that the genetic architecture that underlies CW differs from that of the other carcass traits analyzed. Thus, our study provides significant new insights into the carcass traits of Hanwoo cattle.
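The fivefold cross-validation of prediction accuracy described in this abstract can be sketched as below. The model used is ridge regression on marker effects, a common stand-in that is closely related to GBLUP; it is not the BayesC or Bayesian LASSO implementation of the study. All function names, the penalty `lam`, and the simulated data are assumptions for illustration.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ridge_fit(z, y, lam):
    """Estimate marker effects from (Zc'Zc + lam*I) b = Zc'(y - mean(y)),
    where Zc is the column-centred genotype matrix."""
    n, p = len(z), len(z[0])
    mu = sum(y) / n
    means = [sum(row[j] for row in z) / n for j in range(p)]
    zc = [[row[j] - means[j] for j in range(p)] for row in z]
    lhs = [[sum(r[i] * r[j] for r in zc) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * (yi - mu) for r, yi in zip(zc, y)) for i in range(p)]
    return mu, means, solve(lhs, rhs)

def ridge_predict(model, row):
    mu, means, effects = model
    return mu + sum(b * (x - m) for b, x, m in zip(effects, row, means))

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def cross_validated_accuracy(z, y, k=5, lam=1.0, seed=0):
    """Mean correlation between predicted and observed values over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = ridge_fit([z[i] for i in train], [y[i] for i in train], lam)
        predicted = [ridge_predict(model, z[i]) for i in fold]
        accuracies.append(pearson(predicted, [y[i] for i in fold]))
    return sum(accuracies) / k

# Simulated toy data: 60 individuals, 5 markers coded 0/1/2, two with real effects.
rng = random.Random(1)
genotypes = [[rng.choice([0, 1, 2]) for _ in range(5)] for _ in range(60)]
phenotypes = [2.0 * g[0] - 1.0 * g[1] + rng.gauss(0, 0.1) for g in genotypes]
print(round(cross_validated_accuracy(genotypes, phenotypes), 3))
```

Comparing methods as in the abstract would amount to swapping `ridge_fit` and `ridge_predict` for another model while keeping the same folds, so that accuracy differences reflect the model rather than the data split.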
The Hydra genome: insights, puzzles and opportunities for developmental biologists.
Steele, Robert E
2012-01-01
The sequencing of a Hydra genome marked the beginning of a new era in the use of Hydra as a developmental model. Analysis of the genome sequence has led to a number of interesting findings, has required revisiting of previous work, and most importantly presents new opportunities for understanding the developmental biology of Hydra. This review will describe the history of the Hydra genome project, a selection of results from it that are relevant to developmental biologists, and some future research opportunities provided by Hydra genomics.
Rat Genome and Model Resources.
Shimoyama, Mary; Smith, Jennifer R; Bryda, Elizabeth; Kuramoto, Takashi; Saba, Laura; Dwinell, Melinda
2017-07-01
Rats remain a major model for studying disease mechanisms and discovery, validation, and testing of new compounds to improve human health. The rat's value continues to grow as indicated by the more than 1.4 million publications (second to human) at PubMed documenting important discoveries using this model. Advanced sequencing technologies, genome modification techniques, and the development of embryonic stem cell protocols ensure the rat remains an important mammalian model for disease studies. The 2004 release of the reference genome has been followed by the production of complete genomes for more than two dozen individual strains utilizing NextGen sequencing technologies; their analyses have identified over 80 million variants. This explosion in genomic data has been accompanied by the ability to selectively edit the rat genome, leading to hundreds of new strains through multiple technologies. A number of resources have been developed to provide investigators with access to precision rat models, comprehensive datasets, and sophisticated software tools necessary for their research. Those profiled here include the Rat Genome Database, PhenoGen, Gene Editing Rat Resource Center, Rat Resource and Research Center, and the National BioResource Project for the Rat in Japan. © The Author 2017. Published by Oxford University Press.
Azevedo Peixoto, Leonardo de; Laviola, Bruno Galvêas; Alves, Alexandre Alonso; Rosado, Tatiana Barbosa; Bhering, Leonardo Lopes
2017-01-01
Genome-wide selection (GWS) is a promising approach for improving the selection accuracy in plant breeding, particularly in species with long life cycles, such as Jatropha. Therefore, the objectives of this study were to estimate the genetic parameters for grain yield (GY) and the weight of 100 seeds (W100S) using restricted maximum likelihood (REML); to compare the performance of GWS methods to predict GY and W100S; and to estimate how many markers are needed to train the GWS model to obtain the maximum accuracy. Eight GWS models were compared in terms of predictive ability. The impact that the marker density had on the predictive ability was investigated using a varying number of markers, from 2 to 1,248. Because the genetic variance between evaluated genotypes was significant, it was possible to obtain selection gain. All of the GWS methods tested in this study can be used to predict GY and W100S in Jatropha. A training model fitted using 1,000 and 800 markers is sufficient to capture the maximum genetic variance and, consequently, maximum prediction ability of GY and W100S, respectively. This study demonstrated the applicability of genome-wide prediction to identify useful genetic sources of GY and W100S for Jatropha breeding. Further research is needed to confirm the applicability of the proposed approach to other complex traits.
Rouam, Sigrid; Broët, Philippe
2013-08-01
To identify genomic markers with a consistent effect on tumor dynamics across multiple cancer series, discrimination indices based on proportional hazards models can be used since they do not depend heavily on the sample size. However, the underlying assumption of proportionality of the hazards does not always hold, especially when the studied population is a mixture of cured and uncured patients, as in early-stage cancers. We propose a novel index that quantifies the capability of a genomic marker to separate uncured patients according to their time-to-event outcomes. It allows the identification of genomic markers characterizing tumor growth dynamics across multiple studies. Simulation results show that our index performs better than classical indices based on the Cox model. It is affected neither by the sample size nor by the cure rate fraction. In a cross-study of early-stage breast cancers, the index allows the selection of genomic markers with a potential consistent effect on tumor growth dynamics. Copyright © 2013 Elsevier Inc. All rights reserved.
Hill, William G
2014-01-01
Although animal breeding was practiced long before the science of genetics and the relevant disciplines of population and quantitative genetics were known, breeding programs have mainly relied on simply selecting and mating the best individuals on their own or relatives' performance. This is based on sound quantitative genetic principles, developed and expounded by Lush, who attributed much of his understanding to Wright, and formalized in Fisher's infinitesimal model. Analysis at the level of individual loci and gene frequency distributions has had relatively little impact. Now with access to genomic data, a revolution in which molecular information is being used to enhance response with "genomic selection" is occurring. The predictions of breeding value still utilize multiple loci throughout the genome and, indeed, are largely compatible with additive and specifically infinitesimal model assumptions. I discuss some of the history and genetic issues as applied to the science of livestock improvement, which has had and continues to have major spin-offs into ideas and applications in other areas.
Random genetic drift, natural selection, and noise in human cranial evolution.
Roseman, Charles C
2016-08-01
This study assesses the extent to which relationships among groups complicate comparative studies of adaptation in recent human cranial variation and the extent to which departures from neutral additive models of evolution hinder the reconstruction of population relationships among groups using cranial morphology. Using a maximum likelihood evolutionary model fitting approach and a mixed population genomic and cranial data set, I evaluate the relative fits of several widely used models of human cranial evolution. Moreover, I compare the goodness of fit of models of cranial evolution constrained by genomic variation to test hypotheses about population specific departures from neutrality. Models from population genomics are much better fits to cranial variation than are traditional models from comparative human biology. There is not enough evolutionary information in the cranium to reconstruct much of recent human evolution but the influence of population history on cranial variation is strong enough to cause comparative studies of adaptation serious difficulties. Deviations from a model of random genetic drift along a tree-like population history show the importance of environmental effects, gene flow, and/or natural selection on human cranial variation. Moreover, there is a strong signal of the effect of natural selection or an environmental factor on a group of humans from Siberia. The evolution of the human cranium is complex and no one evolutionary process has prevailed at the expense of all others. A holistic unification of phenome, genome, and environmental context, gives us a strong point of purchase on these problems, which is unavailable to any one traditional approach alone. Am J Phys Anthropol 160:582-592, 2016. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Huang, Huateng; Rabosky, Daniel L
2015-09-16
Sexual dichromatism is the tendency for sexes to differ in color pattern and represents a striking form of within-species morphological variation. Conspicuous intersexual differences in avian plumage are generally thought to result from Darwinian sexual selection, to the extent that dichromatism is often treated as a surrogate for the intensity of sexual selection in phylogenetic comparative studies. Intense sexual selection is predicted to leave a footprint on genetic evolution by reducing genetic diversity on the sex chromosomes relative to that on the autosomes. In this study, we test the association between plumage dichromatism and sex-linked genetic diversity using eight species pairs with contrasting levels of dichromatism. We estimated Z-linked and autosomal genetic diversity for these non-model avian species using restriction-site associated (RAD) loci that covered ~3% of the genome. We find that monochromatic birds consistently have reduced sex-linked genomic variation relative to phylogenetically-paired dichromatic species and this pattern is robust to mutational biases. Our results are consistent with several interpretations. If present-day sexual selection is stronger in dichromatic birds, our results suggest that its impact on sex-linked genomic variation is offset by other processes that lead to proportionately lower Z-linked variation in monochromatic species. We discuss possible factors that may contribute to this discrepancy between phenotypes and genomic variation. Conversely, it is possible that present-day sexual selection -- as measured by the variance in male reproductive success -- is stronger in the set of monochromatic taxa we have examined, potentially reflecting the importance of song, behavior and other non-plumage associated traits as targets of sexual selection. 
This counterintuitive finding suggests that the relationship between genomic variation and sexual selection is complex and highlights the need for a more comprehensive survey of genomic variation in avian taxa that vary markedly in social and genetic mating systems.
Cow genotyping strategies for genomic selection in a small dairy cattle population.
Jenko, J; Wiggans, G R; Cooper, T A; Eaglen, S A E; Luff, W G de L; Bichard, M; Pong-Wong, R; Woolliams, J A
2017-01-01
This study compares how different cow genotyping strategies increase the accuracy of genomic estimated breeding values (EBV) in dairy cattle breeds with low numbers. In these breeds, few sires have progeny records, and genotyping cows can improve the accuracy of genomic EBV. The Guernsey breed is a small dairy cattle breed with approximately 14,000 recorded individuals worldwide. Predictions of phenotypes of milk yield, fat yield, protein yield, and calving interval were made for Guernsey cows from England and Guernsey Island using genomic EBV, with training sets including 197 de-regressed proofs of genotyped bulls, with cows selected from among 1,440 genotyped cows using different genotyping strategies. Accuracies of predictions were tested using 10-fold cross-validation among the cows. Genomic EBV were predicted using 4 different methods: (1) pedigree BLUP, (2) genomic BLUP using only bulls, (3) univariate genomic BLUP using bulls and cows, and (4) bivariate genomic BLUP. Genotyping cows with phenotypes and using their data for the prediction of single nucleotide polymorphism effects increased the correlation between genomic EBV and phenotypes compared with using only bulls by 0.163±0.022 for milk yield, 0.111±0.021 for fat yield, and 0.113±0.018 for protein yield; a decrease of 0.014±0.010 for calving interval from a low base was the only exception. Genetic correlation between phenotypes from bulls and cows were approximately 0.6 for all yield traits and significantly different from 1. Only a very small change occurred in correlation between genomic EBV and phenotypes when using the bivariate model. It was always better to genotype all the cows, but when only half of the cows were genotyped, a divergent selection strategy was better compared with the random or directional selection approach. Divergent selection of 30% of the cows remained superior for the yield traits in 8 of 10 folds. Copyright © 2017 American Dairy Science Association. 
Published by Elsevier Inc. All rights reserved.
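The evaluation scheme described in this abstract (10-fold cross-validation, with accuracy reported as the correlation between genomic EBV and phenotypes) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names and the `predict` callback are hypothetical, and any genomic BLUP model would have to be supplied by the caller.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_accuracy(phenotypes, predict, k=10, seed=0):
    """Mean over folds of cor(prediction, phenotype) on held-out animals.

    `predict(train_idx, test_idx)` is a hypothetical callback that fits a
    model on the training animals and returns predictions for the test ones.
    """
    accuracies = []
    for test in k_fold_indices(len(phenotypes), k, seed):
        held_out = set(test)
        train = [i for i in range(len(phenotypes)) if i not in held_out]
        predictions = predict(train, test)
        observed = [phenotypes[i] for i in test]
        accuracies.append(pearson(predictions, observed))
    return mean(accuracies)
```

Comparing two strategies, as the study does, would then amount to calling `cross_validated_accuracy` once per strategy with the same folds (same `seed`) and taking the difference between the two mean correlations.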
Dissimilarity based Partial Least Squares (DPLS) for genomic prediction from SNPs.
Singh, Priyanka; Engel, Jasper; Jansen, Jeroen; de Haan, Jorn; Buydens, Lutgarde Maria Celina
2016-05-04
Genomic prediction (GP) allows breeders to select plants and animals based on their breeding potential for desirable traits, without lengthy and expensive field trials or progeny testing. We have proposed to use Dissimilarity-based Partial Least Squares (DPLS) for GP. As a case study, we use the DPLS approach to predict Bacterial wilt (BW) in tomatoes using SNPs as predictors. The DPLS approach was compared with the Genomic Best-Linear Unbiased Prediction (GBLUP) and single-SNP regression with SNP as a fixed effect to assess the performance of DPLS. Eight genomic distance measures were used to quantify relationships between the tomato accessions from the SNPs. Subsequently, each of these distance measures was used to predict the BW using the DPLS prediction model. The DPLS model was found to be robust to the choice of distance measures; similar prediction performances were obtained for each distance measure. DPLS greatly outperformed the single-SNP regression approach, showing that BW is a comprehensive trait dependent on several loci. Next, the performance of the DPLS model was compared to that of GBLUP. Although GBLUP and DPLS are conceptually very different, the prediction quality (PQ) of the DPLS models was similar to the prediction statistics obtained from GBLUP. A considerable advantage of DPLS is that the genotype-phenotype relationship can easily be visualized in a 2-D scatter plot. This so-called score-plot gives breeders insight for selecting candidates for their future breeding program. DPLS is a highly appropriate method for GP. The model prediction performance was similar to the GBLUP and far better than the single-SNP approach. The proposed method can be used in combination with a wide range of genomic dissimilarity measures and genotype representations such as allele-count, haplotypes or allele-intensity values. 
Additionally, the data can be insightfully visualized by the DPLS model, allowing for selection of desirable candidates from the breeding experiments. In this study, we have assessed the DPLS performance on a single trait.
Kerner, Berit; North, Kari E; Fallin, M Daniele
2010-01-01
Participants analyzed actual and simulated longitudinal data from the Framingham Heart Study for various metabolic and cardiovascular traits. The genetic information incorporated into these investigations ranged from selected single-nucleotide polymorphisms to genome-wide association arrays. Genotypes were incorporated using a broad range of methodological approaches including conditional logistic regression, linear mixed models, generalized estimating equations, linear growth curve estimation, growth modeling, growth mixture modeling, population attributable risk fraction based on survival functions under the proportional hazards models, and multivariate adaptive splines for the analysis of longitudinal data. The specific scientific questions addressed by these different approaches also varied, ranging from a more precise definition of the phenotype, bias reduction in control selection, estimation of effect sizes and genotype associated risk, to direct incorporation of genetic data into longitudinal modeling approaches and the exploration of population heterogeneity with regard to longitudinal trajectories. The group reached several overall conclusions: 1) The additional information provided by longitudinal data may be useful in genetic analyses. 2) The precision of the phenotype definition as well as control selection in nested designs may be improved, especially if traits demonstrate a trend over time or have strong age-of-onset effects. 3) Analyzing genetic data stratified for high-risk subgroups defined by a unique development over time could be useful for the detection of rare mutations in common multi-factorial diseases. 4) Estimation of the population impact of genomic risk variants could be more precise. The challenges and computational complexity demanded by genome-wide single-nucleotide polymorphism data were also discussed. PMID:19924713
2010-01-01
Background Food supply from the ocean is constrained by the shortage of domesticated and selected fish. Development of genomic models of economically important fishes should assist with the removal of this bottleneck. European sea bass Dicentrarchus labrax L. (Moronidae, Perciformes, Teleostei) is one of the most important fishes in European marine aquaculture; growing genomic resources put it on its way to serve as an economic model. Results End sequencing of a sea bass genomic BAC-library enabled the comparative mapping of the sea bass genome using the three-spined stickleback Gasterosteus aculeatus genome as a reference. BAC-end sequences (102,690) were aligned to the stickleback genome. The number of mappable BACs was improved using a two-fold coverage WGS dataset of sea bass resulting in a comparative BAC-map covering 87% of stickleback chromosomes with 588 BAC-contigs. The minimum size of 83 contigs covering 50% of the reference was 1.2 Mbp; the largest BAC-contig comprised 8.86 Mbp. More than 22,000 BAC-clones aligned with both ends to the reference genome. Intra-chromosomal rearrangements between sea bass and stickleback were identified. Size distributions of mapped BACs were used to calculate that the genome of sea bass may be only 1.3 fold larger than the 460 Mbp stickleback genome. Conclusions The BAC map is used for sequencing single BACs or BAC-pools covering defined genomic entities by second generation sequencing technologies. Together with the WGS dataset it initiates a sea bass genome sequencing project. This will allow the quantification of polymorphisms through resequencing, which is important for selecting highly performing domesticated fish. PMID:20105308
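Two of the figures in this abstract are simple computations: the N50-style contig statistic (the smallest contig among the largest ones that together cover 50% of the reference) and the genome size estimate of 1.3 times the 460 Mbp stickleback genome. The sketch below shows both under stated assumptions; the function name and the toy contig lengths are hypothetical and are not the study's data.

```python
def covering_contigs(lengths, reference_size, fraction=0.5):
    """Return (count, smallest_length) for the largest contigs that together
    cover at least `fraction` of `reference_size`.

    Contigs are taken from largest to smallest, as in an N50-style statistic.
    """
    covered = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        covered += length
        if covered >= fraction * reference_size:
            return count, length
    raise ValueError("contigs do not cover the requested fraction")


# Toy example (hypothetical lengths in Mbp, reference of 23 Mbp):
# 8 + 5 = 13 >= 11.5, so two contigs suffice and the smaller is 5 Mbp.
print(covering_contigs([8, 5, 4, 3, 2, 1], reference_size=23))

# Genome size implied by the abstract: 1.3-fold the 460 Mbp stickleback genome.
sea_bass_mbp = round(1.3 * 460)
print(sea_bass_mbp)  # about 598 Mbp
```

Read this way, the abstract's statement corresponds to a result of `(83, 1.2)` when the function is applied to the 588 BAC-contig lengths in Mbp.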
2012-01-01
Background The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's genbank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from other mammalian genomes. Conclusions The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742
Navigating the Interface Between Landscape Genetics and Landscape Genomics.
Storfer, Andrew; Patton, Austin; Fraik, Alexandra K
2018-01-01
As next-generation sequencing data become increasingly available for non-model organisms, a shift has occurred in the focus of studies of the geographic distribution of genetic variation. Whereas landscape genetics studies primarily focus on testing the effects of landscape variables on gene flow and genetic population structure, landscape genomics studies focus on detecting candidate genes under selection that indicate possible local adaptation. Navigating the transition between landscape genomics and landscape genetics can be challenging. The number of molecular markers analyzed has shifted from what used to be a few dozen loci to thousands of loci and even full genomes. Although genome scale data can be separated into sets of neutral loci for analyses of gene flow and population structure and putative loci under selection for inference of local adaptation, there are inherent differences in the questions that are addressed in the two study frameworks. We discuss these differences and their implications for study design, marker choice and downstream analysis methods. Similar to the rapid proliferation of analysis methods in the early development of landscape genetics, new analytical methods for detection of selection in landscape genomics studies are burgeoning. We focus on genome scan methods for detection of selection, and in particular, outlier differentiation methods and genetic-environment association tests because they are the most widely used. Use of genome scan methods requires an understanding of the potential mismatches between the biology of a species and assumptions inherent in analytical methods used, which can lead to high false positive rates of detected loci under selection. Key to choosing appropriate genome scan methods is an understanding of the underlying demographic structure of study populations, and such data can be obtained using neutral loci from the generated genome-wide data or prior knowledge of a species' phylogeographic history. 
To this end, we summarize recent simulation studies that test the power and accuracy of genome scan methods under a variety of demographic scenarios and sampling designs. We conclude with a discussion of additional considerations for future method development, and a summary of methods that show promise for landscape genomics studies but are not yet widely used.
PMID:29593776
Sattath, Shmuel; Elyashiv, Eyal; Kolodny, Oren; Rinott, Yosef; Sella, Guy
2011-02-10
In Drosophila, multiple lines of evidence converge in suggesting that beneficial substitutions to the genome may be common. All suffer from confounding factors, however, such that the interpretation of the evidence-in particular, conclusions about the rate and strength of beneficial substitutions-remains tentative. Here, we use genome-wide polymorphism data in D. simulans and sequenced genomes of its close relatives to construct a readily interpretable characterization of the effects of positive selection: the shape of average neutral diversity around amino acid substitutions. As expected under recurrent selective sweeps, we find a trough in diversity levels around amino acid but not around synonymous substitutions, a distinctive pattern that is not expected under alternative models. This characterization is richer than previous approaches, which relied on limited summaries of the data (e.g., the slope of a scatter plot), and relates to underlying selection parameters in a straightforward way, allowing us to make more reliable inferences about the prevalence and strength of adaptation. Specifically, we develop a coalescent-based model for the shape of the entire curve and use it to infer adaptive parameters by maximum likelihood. Our inference suggests that ∼13% of amino acid substitutions cause selective sweeps. Interestingly, it reveals two classes of beneficial fixations: a minority (approximately 3%) that appears to have had large selective effects and accounts for most of the reduction in diversity, and the remaining 10%, which seem to have had very weak selective effects. These estimates therefore help to reconcile the apparent conflict among previously published estimates of the strength of selection. More generally, our findings provide unequivocal evidence for strongly beneficial substitutions in Drosophila and illustrate how the rapidly accumulating genome-wide data can be leveraged to address enduring questions about the genetic basis of adaptation.
flyDIVaS: A Comparative Genomics Resource for Drosophila Divergence and Selection
Stanley, Craig E.; Kulathinal, Rob J.
2016-01-01
With arguably the best finished and expertly annotated genome assembly, Drosophila melanogaster is a formidable genetics model to study all aspects of biology. Nearly a decade ago, the 12 Drosophila genomes project expanded D. melanogaster’s breadth as a comparative model through the community-development of an unprecedented genus- and genome-wide comparative resource. However, since its inception, these datasets for evolutionary inference and biological discovery have become increasingly outdated, outmoded, and inaccessible. Here, we provide an updated and upgradable comparative genomics resource of Drosophila divergence and selection, flyDIVaS, based on the latest genomic assemblies, curated FlyBase annotations, and recent OrthoDB orthology calls. flyDIVaS is an online database containing D. melanogaster-centric orthologous gene sets, CDS and protein alignments, divergence statistics (% gaps, dN, dS, dN/dS), and codon-based tests of positive Darwinian selection. Out of 13,920 protein-coding D. melanogaster genes, ∼80% have one aligned ortholog in the closely related species, D. simulans, and ∼50% have 1–1 12-way alignments in the original 12 sequenced species that span over 80 million yr of divergence. Genes and their orthologs can be chosen from four different taxonomic datasets differing in phylogenetic depth and coverage density, and visualized via interactive alignments and phylogenetic trees. Users can also batch download entire comparative datasets. A functional survey finds conserved mitotic and neural genes, highly diverged immune and reproduction-related genes, more conspicuous signals of divergence across tissue-specific genes, and an enrichment of positive selection among highly diverged genes. flyDIVaS will be regularly updated and can be freely accessed at www.flydivas.info. 
We encourage researchers to regularly use this resource as a tool for biological inference and discovery, and in their classrooms to help train the next generation of biologists to creatively use such genomic big data resources in an integrative manner. PMID:27226167
Copyright © 2016 Stanley and Kulathinal.
Fernandez, Ronan; Berro, Julien
2017-01-01
Fission yeast is a powerful model organism that has provided insights into important cellular processes thanks to the ease of its genome editing by homologous recombination. However, creation of strains with a large number of targeted mutations or containing plasmids has been challenging because only a very small number of selection markers is available in Schizosaccharomyces pombe. In this paper, we identify two fission yeast fluoride exporter channels (Fex1p and Fex2p) and describe the development of a new strategy using Fex1p as a selection marker for transformants in rich media supplemented with fluoride. To our knowledge this is the first positive selection marker identified in S. pombe that does not use auxotrophy or drug resistance and that can be used for plasmid transformation or genomic integration in rich media. We illustrate the application of our new marker by significantly accelerating the protocol for genome editing using CRISPR/Cas9 in S. pombe. PMID:27327046
Biazzi, Elisa; Nazzicari, Nelson; Pecetti, Luciano; Brummer, E. Charles; Palmonari, Alberto; Tava, Aldo; Annicchiarico, Paolo
2017-01-01
Genetic progress for forage quality has been poor in alfalfa (Medicago sativa L.), the most-grown forage legume worldwide. This study aimed at exploring opportunities for marker-assisted selection (MAS) and genomic selection of forage quality traits based on breeding values of parent plants. Some 154 genotypes from a broadly-based reference population were genotyped by genotyping-by-sequencing (GBS), and phenotyped for leaf-to-stem ratio, leaf and stem contents of protein, neutral detergent fiber (NDF) and acid detergent lignin (ADL), and leaf and stem NDF digestibility after 24 hours (NDFD), of their dense-planted half-sib progenies in three growing conditions (summer harvest, full irrigation; summer harvest, suspended irrigation; autumn harvest). Trait-marker analyses were performed on progeny values averaged over conditions, owing to modest germplasm × condition interaction. Genomic selection exploited 11,450 polymorphic SNP markers, whereas a subset of 8,494 M. truncatula-aligned markers were used for a genome-wide association study (GWAS). GWAS confirmed the polygenic control of quality traits and, in agreement with phenotypic correlations, indicated substantially different genetic control of a given trait in stems and leaves. It detected several SNPs in different annotated genes that were highly linked to stem protein content. Also, it identified a small genomic region on chromosome 8 with high concentration of annotated genes associated with leaf ADL, including one gene probably involved in the lignin pathway. Three genomic selection models, i.e., Ridge-regression BLUP, Bayes B and Bayesian Lasso, displayed similar prediction accuracy, whereas SVR-lin was less accurate. Accuracy values were moderate (0.3–0.4) for stem NDFD and leaf protein content, modest for leaf ADL and NDFD, and low to very low for the other traits. 
Along with previous results for the same germplasm set, this study indicates that GBS data can be exploited to improve both quality traits (by genomic selection or MAS) and forage yield. PMID:28068350
Šmarda, Petr; Bureš, Petr; Horová, Lucie
2007-01-01
Background and Aims The spatial and statistical distribution of genome sizes and the adaptivity of genome size to some types of habitat, vegetation or microclimatic conditions were investigated in a tetraploid population of Festuca pallens. The population was previously documented to vary highly in genome size and is taken as a model for the study of the initial stages of genome size differentiation. Methods Using DAPI flow cytometry, samples were measured repeatedly with diploid Festuca pallens as the internal standard. Altogether 172 plants from 57 plots (2·25 m²), distributed in contrasting habitats over the whole locality in South Moravia, Czech Republic, were sampled. The differences in DNA content were confirmed by the double peaks of simultaneously measured samples. Key Results At maximum, a 1·115-fold difference in genome size was observed. The statistical distribution of genome sizes was found to be continuous and best fits the extreme (Gumbel) distribution with rare occurrences of extremely large genomes (positive-skewed), similar to the log-normal distribution of genome sizes across the Angiosperms as a whole. Even plants from the same plot frequently varied considerably in genome size and the spatial distribution of genome sizes was generally random and unautocorrelated (P > 0·05). The observed spatial pattern and the overall lack of correlations of genome size with recognized vegetation types or microclimatic conditions indicate the absence of ecological adaptivity of genome size in the studied population. Conclusions These experimental data on intraspecific genome size variability in Festuca pallens argue for the absence of natural selection and the selective non-significance of genome size in the initial stages of genome size differentiation, and corroborate the current hypothetical model of genome size evolution in Angiosperms (Bennetzen et al., 2005, Annals of Botany 95: 127–132). PMID:17565968
Minimal-assumption inference from population-genomic data
NASA Astrophysics Data System (ADS)
Weissman, Daniel; Hallatschek, Oskar
Samples of multiple complete genome sequences contain vast amounts of information about the evolutionary history of populations, much of it in the associations among polymorphisms at different loci. Current methods that take advantage of this linkage information rely on models of recombination and coalescence, limiting the sample sizes and populations that they can analyze. We introduce a method, Minimal-Assumption Genomic Inference of Coalescence (MAGIC), that reconstructs key features of the evolutionary history, including the distribution of coalescence times, by integrating information across genomic length scales without using an explicit model of recombination, demography or selection. Using simulated data, we show that MAGIC's performance is comparable to PSMC' on single diploid samples generated with standard coalescent and recombination models. More importantly, MAGIC can also analyze arbitrarily large samples and is robust to changes in the coalescent and recombination processes. Using MAGIC, we show that the inferred coalescence time histories of samples of multiple human genomes exhibit inconsistencies with a description in terms of an effective population size based on single-genome data.
Genomic signals of selection predict climate-driven population declines in a migratory bird.
Bay, Rachael A; Harrigan, Ryan J; Underwood, Vinh Le; Gibbs, H Lisle; Smith, Thomas B; Ruegg, Kristen
2018-01-05
The ongoing loss of biodiversity caused by rapid climatic shifts requires accurate models for predicting species' responses. Despite evidence that evolutionary adaptation could mitigate climate change impacts, evolution is rarely integrated into predictive models. Integrating population genomics and environmental data, we identified genomic variation associated with climate across the breeding range of the migratory songbird, yellow warbler (Setophaga petechia). Populations requiring the greatest shifts in allele frequencies to keep pace with future climate change have experienced the largest population declines, suggesting that failure to adapt may have already negatively affected populations. Broadly, our study suggests that the integration of genomic adaptation can increase the accuracy of future species distribution models and ultimately guide more effective mitigation efforts. Copyright © 2018, American Association for the Advancement of Science.
Sukumaran, Sivakumar; Crossa, Jose; Jarquin, Diego; Lopes, Marta; Reynolds, Matthew P
2017-02-09
Developing genomic selection (GS) models is an important step in applying GS to accelerate the rate of genetic gain in grain yield in plant breeding. In this study, seven genomic prediction models under two cross-validation (CV) scenarios were tested on 287 advanced elite spring wheat lines phenotyped for grain yield (GY), thousand-grain weight (GW), grain number (GN), and thermal time for flowering (TTF) in 18 international environments (year-location combinations) in major wheat-producing countries in 2010 and 2011. Prediction models with genomic and pedigree information included main effects and interaction with environments. Two random CV schemes were applied to predict a subset of lines that were not observed in any of the 18 environments (CV1), and a subset of lines that were not observed in a set of the environments, but were observed in other environments (CV2). Genomic prediction models, including genotype × environment (G×E) interaction, had the highest average prediction ability under the CV1 scenario for GY (0.31), GN (0.32), GW (0.45), and TTF (0.27). For CV2, the average prediction ability of the model including the interaction terms was generally high for GY (0.38), GN (0.43), GW (0.63), and TTF (0.53). Wheat lines in site-year combinations in Mexico and India had relatively high prediction ability for GY and GW. Results indicated that prediction ability of lines not observed in certain environments could be relatively high for genomic selection when predicting G×E interaction in multi-environment trials. Copyright © 2017 Sukumaran et al.
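The two cross-validation scenarios in the abstract above differ only in what is masked. The following is a minimal sketch, assuming a simple random assignment of records to folds; the function names, the fold count and the toy line and environment labels are hypothetical and are not taken from the study.

```python
import random

def cv1_folds(lines, envs, n_folds, seed=0):
    """CV1: a held-out line is masked in every environment."""
    rng = random.Random(seed)
    shuffled = list(lines)
    rng.shuffle(shuffled)
    # One fold per line, shared by all of that line's records.
    fold_of_line = {line: i % n_folds for i, line in enumerate(shuffled)}
    return {(line, env): fold_of_line[line] for line in lines for env in envs}

def cv2_folds(lines, envs, n_folds, seed=0):
    """CV2: single (line, environment) records are masked, so a held-out
    line is still observed in other environments."""
    rng = random.Random(seed)
    folds = {}
    for line in lines:
        order = list(envs)
        rng.shuffle(order)
        # Spread each line's environments over different folds.
        for i, env in enumerate(order):
            folds[(line, env)] = i % n_folds
    return folds

# Toy example (hypothetical labels): 6 lines, 3 environments, 3 folds.
lines = ["L1", "L2", "L3", "L4", "L5", "L6"]
envs = ["E1", "E2", "E3"]
cv1 = cv1_folds(lines, envs, n_folds=3)
cv2 = cv2_folds(lines, envs, n_folds=3)
```

Holding out fold k means training on every record whose fold differs from k. Under `cv1` that removes whole lines, while under `cv2` each removed line keeps records in the remaining environments.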
The Joint Effects of Background Selection and Genetic Recombination on Local Gene Genealogies
Zeng, Kai; Charlesworth, Brian
2011-01-01
Background selection, the effects of the continual removal of deleterious mutations by natural selection on variability at linked sites, is potentially a major determinant of DNA sequence variability. However, the joint effects of background selection and genetic recombination on the shape of the neutral gene genealogy have proved hard to study analytically. The only existing formula concerns the mean coalescent time for a pair of alleles, making it difficult to assess the importance of background selection from genome-wide data on sequence polymorphism. Here we develop a structured coalescent model of background selection with recombination and implement it in a computer program that efficiently generates neutral gene genealogies for an arbitrary sample size. We check the validity of the structured coalescent model against forward-in-time simulations and show that it accurately captures the effects of background selection. The model produces more accurate predictions of the mean coalescent time than the existing formula and supports the conclusion that the effect of background selection is greater in the interior of a deleterious region than at its boundaries. The level of linkage disequilibrium between sites is elevated by background selection, to an extent that is well summarized by a change in effective population size. The structured coalescent model is readily extendable to more realistic situations and should prove useful for analyzing genome-wide polymorphism data. PMID:21705759
The joint effects of background selection and genetic recombination on local gene genealogies.
Zeng, Kai; Charlesworth, Brian
2011-09-01
Background selection, the effects of the continual removal of deleterious mutations by natural selection on variability at linked sites, is potentially a major determinant of DNA sequence variability. However, the joint effects of background selection and genetic recombination on the shape of the neutral gene genealogy have proved hard to study analytically. The only existing formula concerns the mean coalescent time for a pair of alleles, making it difficult to assess the importance of background selection from genome-wide data on sequence polymorphism. Here we develop a structured coalescent model of background selection with recombination and implement it in a computer program that efficiently generates neutral gene genealogies for an arbitrary sample size. We check the validity of the structured coalescent model against forward-in-time simulations and show that it accurately captures the effects of background selection. The model produces more accurate predictions of the mean coalescent time than the existing formula and supports the conclusion that the effect of background selection is greater in the interior of a deleterious region than at its boundaries. The level of linkage disequilibrium between sites is elevated by background selection, to an extent that is well summarized by a change in effective population size. The structured coalescent model is readily extendable to more realistic situations and should prove useful for analyzing genome-wide polymorphism data.
Uniparental Inheritance Promotes Adaptive Evolution in Cytoplasmic Genomes.
Christie, Joshua R; Beekman, Madeleine
2017-03-01
Eukaryotes carry numerous asexual cytoplasmic genomes (mitochondria and plastids). Lacking recombination, asexual genomes should theoretically suffer from impaired adaptive evolution. Yet, empirical evidence indicates that cytoplasmic genomes experience higher levels of adaptive evolution than predicted by theory. In this study, we use a computational model to show that the unique biology of cytoplasmic genomes, specifically their organization into host cells and their uniparental (maternal) inheritance, enables them to undergo effective adaptive evolution. Uniparental inheritance of cytoplasmic genomes decreases competition between different beneficial substitutions (clonal interference), promoting the accumulation of beneficial substitutions. Uniparental inheritance also facilitates selection against deleterious cytoplasmic substitutions, slowing Muller's ratchet. In addition, uniparental inheritance generally reduces genetic hitchhiking of deleterious substitutions during selective sweeps. Overall, uniparental inheritance promotes adaptive evolution by increasing the level of beneficial substitutions relative to deleterious substitutions. When we assume that cytoplasmic genome inheritance is biparental, decreasing the number of genomes transmitted during gametogenesis (bottleneck) aids adaptive evolution. Nevertheless, adaptive evolution is always more efficient when inheritance is uniparental. Our findings explain empirical observations that cytoplasmic genomes, despite their asexual mode of reproduction, can readily undergo adaptive evolution. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
USDA-ARS's Scientific Manuscript database
Population genetics is a powerful tool for invasion biology and pest management, from tracing invasion pathways to informing management decisions with inference of population demographics. Genomics greatly increases the resolution of population-scale analyses, yet outside of model species with exten...
Hagen, Ingerid J; Billing, Anna M; Rønning, Bernt; Pedersen, Sindre A; Pärn, Henrik; Slate, Jon; Jensen, Henrik
2013-05-01
With the advent of next generation sequencing, new avenues have opened to study genomics in wild populations of non-model species. Here, we describe a successful approach to a genome-wide medium density Single Nucleotide Polymorphism (SNP) panel in a non-model species, the house sparrow (Passer domesticus), through the development of a 10 K Illumina iSelect HD BeadChip. Genomic DNA and cDNA derived from six individuals were sequenced on a 454 GS FLX system and generated a total of 1.2 million sequences, in which SNPs were detected. As no reference genome exists for the house sparrow, we used the zebra finch (Taeniopygia guttata) reference genome to determine the most likely position of each SNP. The 10 000 SNPs on the SNP-chip were selected to be distributed evenly across 31 chromosomes, giving on average one SNP per 100 000 bp. The SNP-chip was screened across 1968 individual house sparrows from four island populations. Of the original 10 000 SNPs, 7413 were found to be variable, and 99% of these SNPs were successfully called in at least 93% of all individuals. We used the SNP-chip to demonstrate the ability of such genome-wide marker data to detect population sub-division, and compared these results to similar analyses using microsatellites. The SNP-chip will be used to map Quantitative Trait Loci (QTL) for fitness-related phenotypic traits in natural populations. © 2013 Blackwell Publishing Ltd.
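The marker counts quoted in the abstract above can be checked with a few lines of arithmetic. This is only a sketch: it assumes that the mean spacing multiplied by the number of SNPs approximates the genomic span covered by the chip, and the variable names are illustrative.

```python
# Figures quoted in the abstract.
snps_on_chip = 10_000
bp_per_snp = 100_000        # on average one SNP per 100 000 bp
variable_snps = 7_413

# Approximate span covered if SNPs are spread evenly (assumption).
implied_span_bp = snps_on_chip * bp_per_snp

# Share of the chip that turned out to be polymorphic.
variable_fraction = variable_snps / snps_on_chip

# 99% of the variable SNPs were called in at least 93% of individuals.
well_called_snps = round(0.99 * variable_snps)

print(implied_span_bp, variable_fraction, well_called_snps)
```

On these numbers the chip spans roughly 1 Gb, and about 74% of the designed SNPs were variable in the screened populations.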
Strong Selection at MHC in Mexicans since Admixture
Zhou, Quan; Zhao, Liang; Guan, Yongtao
2016-01-01
Mexicans are a recent admixture of Amerindians, Europeans, and Africans. We performed local ancestry analysis of Mexican samples from two genome-wide association studies obtained from dbGaP, and discovered that at the MHC region Mexicans have excessive African ancestral alleles compared to the rest of the genome, which is the hallmark of recent selection for admixed samples. The estimated selection coefficients are 0.05 and 0.07 for two datasets, which put our finding among the strongest known selections observed in humans, namely, lactase selection in northern Europeans and sickle-cell trait in Africans. Using inaccurate Amerindian training samples was a major concern for the credibility of previously reported selection signals in Latinos. Taking advantage of the flexibility of our statistical model, we devised a model fitting technique that can learn Amerindian ancestral haplotype from the admixed samples, which allows us to infer local ancestries for Mexicans using only European and African training samples. The strong selection signal at the MHC remains without Amerindian training samples. Finally, we note that medical history studies suggest such a strong selection at MHC is plausible in Mexicans. PMID:26863142
A Model-Based Approach for Identifying Signatures of Ancient Balancing Selection in Genetic Data
DeGiorgio, Michael; Lohmueller, Kirk E.; Nielsen, Rasmus
2014-01-01
While much effort has focused on detecting positive and negative directional selection in the human genome, relatively little work has been devoted to balancing selection. This lack of attention is likely due to the paucity of sophisticated methods for identifying sites under balancing selection. Here we develop two composite likelihood ratio tests for detecting balancing selection. Using simulations, we show that these methods outperform competing methods under a variety of assumptions and demographic models. We apply the new methods to whole-genome human data, and find a number of previously-identified loci with strong evidence of balancing selection, including several HLA genes. Additionally, we find evidence for many novel candidates, the strongest of which is FANK1, an imprinted gene that suppresses apoptosis, is expressed during meiosis in males, and displays marginal signs of segregation distortion. We hypothesize that balancing selection acts on this locus to stabilize the segregation distortion and negative fitness effects of the distorter allele. Thus, our methods are able to reproduce many previously-hypothesized signals of balancing selection, as well as discover novel interesting candidates. PMID:25144706
A model-based approach for identifying signatures of ancient balancing selection in genetic data.
DeGiorgio, Michael; Lohmueller, Kirk E; Nielsen, Rasmus
2014-08-01
While much effort has focused on detecting positive and negative directional selection in the human genome, relatively little work has been devoted to balancing selection. This lack of attention is likely due to the paucity of sophisticated methods for identifying sites under balancing selection. Here we develop two composite likelihood ratio tests for detecting balancing selection. Using simulations, we show that these methods outperform competing methods under a variety of assumptions and demographic models. We apply the new methods to whole-genome human data, and find a number of previously-identified loci with strong evidence of balancing selection, including several HLA genes. Additionally, we find evidence for many novel candidates, the strongest of which is FANK1, an imprinted gene that suppresses apoptosis, is expressed during meiosis in males, and displays marginal signs of segregation distortion. We hypothesize that balancing selection acts on this locus to stabilize the segregation distortion and negative fitness effects of the distorter allele. Thus, our methods are able to reproduce many previously-hypothesized signals of balancing selection, as well as discover novel interesting candidates.
Evolution Models with Conditional Mutation Rates: Strange Plateaus in Population Distribution
NASA Astrophysics Data System (ADS)
Saakian, David B.
2017-08-01
Cancer is related to clonal evolution with a strongly nonlinear, collective behavior. Here we investigate a slightly advanced version of the popular Crow-Kimura evolution model, suggested recently, by simply assuming a conditional mutation rate. We investigated the steady-state solution and found a highly intriguing plateau in the distribution. There are selective and nonselective phases, with a rather narrow plateau in the distribution at the peak in the first phase, and a wide plateau for many Hamming classes (a collection of genomes with the same number of mutations from the reference genome) in the second phase. We analytically solved the steady state distribution in the selective and nonselective phases, calculating the widths of the plateaus. Numerically, we also found an intermediate phase with several plateaus in the steady-state distribution, related to large finite-genome-length corrections. We assume that the newly observed phenomena should exist in other versions of evolution dynamics when the parameters of the model are conditioned to the population distribution.
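The notion of a Hamming class used in the abstract above (all genomes with the same number of mutations from the reference genome) can be illustrated with a short sketch. It assumes genomes are equal-length strings; the helper names and the toy population are hypothetical, and the code does not implement the Crow-Kimura dynamics themselves.

```python
from collections import Counter

def hamming_class(genome, reference):
    """Number of positions at which a genome differs from the reference."""
    if len(genome) != len(reference):
        raise ValueError("genome and reference must have the same length")
    return sum(a != b for a, b in zip(genome, reference))

def class_distribution(population, reference):
    """Fraction of the population that falls in each Hamming class."""
    counts = Counter(hamming_class(g, reference) for g in population)
    total = len(population)
    return {k: counts[k] / total for k in sorted(counts)}

# Toy population of binary genomes (illustrative only).
reference = "0000"
population = ["0000", "0001", "0010", "0011"]
distribution = class_distribution(population, reference)
```

A plateau in the sense of the abstract would appear as a run of neighbouring classes with nearly equal fractions in such a distribution.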
Medaka: a promising model animal for comparative population genomics
Matsumoto, Yoshifumi; Oota, Hiroki; Asaoka, Yoichi; Nishina, Hiroshi; Watanabe, Koji; Bujnicki, Janusz M; Oda, Shoji; Kawamura, Shoji; Mitani, Hiroshi
2009-01-01
Background Within-species genome diversity has been best studied in humans. The international HapMap project has revealed a tremendous amount of single-nucleotide polymorphisms (SNPs) among humans, many of which show signals of positive selection during human evolution. In most of the cases, however, functional differences between the alleles remain experimentally unverified due to the inherent difficulty of human genetic studies. It would therefore be highly useful to have a vertebrate model with the following characteristics: (1) high within-species genetic diversity, (2) a variety of gene-manipulation protocols already developed, and (3) a completely sequenced genome. Medaka (Oryzias latipes) and its congeneric species, tiny fresh-water teleosts distributed broadly in East and Southeast Asia, meet these criteria. Findings Using Oryzias species from 27 local populations, we conducted a simple screening of nonsynonymous SNPs for 11 genes with apparent orthology between medaka and humans. We found medaka SNPs for which the same sites in human orthologs are known to be highly differentiated among the HapMap populations. Importantly, some of these SNPs show signals of positive selection. Conclusion These results indicate that medaka is a promising model system for comparative population genomics exploring the functional and adaptive significance of allelic differentiations. PMID:19426554
Basket Studies: Redefining Clinical Trials in the Era of Genome-Driven Oncology.
Tao, Jessica J; Schram, Alison M; Hyman, David M
2018-01-29
Understanding a tumor's detailed molecular profile has become increasingly necessary to deliver the standard of care for patients with advanced cancer. Innovations in both tumor genomic sequencing technology and the development of drugs that target molecular alterations have fueled recent gains in genome-driven oncology care. "Basket studies," or histology-agnostic clinical trials in genomically selected patients, represent one important research tool to continue making progress in this field. We review key aspects of genome-driven oncology care, including the purpose and utility of basket studies, biostatistical considerations in trial design, genomic knowledgebase development, and patient matching and enrollment models, which are critical for translating our genomic knowledge into clinically meaningful outcomes.
Puttini, Stefania; Ouvrard-Pascaud, Antoine; Palais, Gael; Beggah, Ahmed T; Gascard, Philippe; Cohen-Tannoudji, Michel; Babinet, Charles; Blot-Chabaud, Marcel; Jaisser, Frederic
2005-03-16
Functional genomic analysis is a challenging step in the so-called post-genomic field. Identification of potential targets using large-scale gene expression analysis requires functional validation to identify those that are physiologically relevant. Genetically modified cell models are often used for this purpose allowing up- or down-expression of selected targets in a well-defined and if possible highly differentiated cell type. However, the generation of such models remains time-consuming and expensive. In order to alleviate this step, we developed a strategy aimed at the rapid and efficient generation of genetically modified cell lines with conditional, inducible expression of various target genes. Efficient knock-in of various constructs, called targeted transgenesis, in a locus selected for its permissibility to the tet inducible system, was obtained through the stimulation of site-specific homologous recombination by the meganuclease I-SceI. Our results demonstrate that targeted transgenesis in a reference inducible locus greatly facilitated the functional analysis of the selected recombinant cells. The efficient screening strategy we have designed makes possible automation of the transfection and selection steps. Furthermore, this strategy could be applied to a variety of highly differentiated cells.
A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction
Montesinos-López, Osval A.; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José Cricelio; Luna-Vázquez, Francisco Javier; Salinas-Ruiz, Josafhat; Herrera-Morales, José R.; Buenrostro-Mariscal, Raymundo
2017-01-01
There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments. PMID:28391241
A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction.
Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José Cricelio; Luna-Vázquez, Francisco Javier; Salinas-Ruiz, Josafhat; Herrera-Morales, José R; Buenrostro-Mariscal, Raymundo
2017-06-07
There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments. Copyright © 2017 Montesinos-López et al.
Selfish drive can trump function when animal mitochondrial genomes compete.
Ma, Hansong; O'Farrell, Patrick H
2016-07-01
Mitochondrial genomes compete for transmission from mother to progeny. We explored this competition by introducing a second genome into Drosophila melanogaster to follow transmission. Competitions between closely related genomes favored those functional in electron transport, resulting in a host-beneficial purifying selection. In contrast, matchups between distantly related genomes often favored those with negligible, negative or lethal consequences, indicating selfish selection. Exhibiting powerful selfish selection, a genome carrying a detrimental mutation displaced a complementing genome, leading to population death after several generations. In a different pairing, opposing selfish and purifying selection counterbalanced to give stable transmission of two genomes. Sequencing of recombinant mitochondrial genomes showed that the noncoding region, containing origins of replication, governs selfish transmission. Uniparental inheritance prevents encounters between distantly related genomes. Nonetheless, in each maternal lineage, constant competition among sibling genomes selects for super-replicators. We suggest that this relentless competition drives positive selection, promoting change in the sequences influencing transmission.
Selfish drive can trump function when animal mitochondrial genomes compete
Ma, Hansong; O’Farrell, Patrick H.
2016-01-01
Mitochondrial genomes compete for transmission from mother to progeny. We explored this competition by introducing a second genome into Drosophila melanogaster to follow transmission. Competitions between closely related genomes favored those functional in electron transport, resulting in a host-beneficial purifying selection. Contrastingly, matchups between distant genomes often favored those with negligible, negative or lethal consequences, indicating selfish selection. Exhibiting powerful selfish selection, a genome carrying a detrimental mutation displaced a complementing genome leading to population death after several generations. In a different pairing, opposing selfish and purifying selection counterbalanced to give stable transmission of two genomes. Sequencing of recombinant mitochondrial genomes revealed that the non-coding region, containing origins of replication, governs selfish transmission. Uniparental inheritance prevents encounters between distantly related genomes. Nonetheless, within each maternal lineage, constant competition among sibling genomes selects for super-replicators. We suggest that this relentless competition drives positive selection promoting change in the sequences influencing transmission. PMID:27270106
Pervasive positive selection on duplicated and nonduplicated vertebrate protein coding genes.
Studer, Romain A; Penel, Simon; Duret, Laurent; Robinson-Rechavi, Marc
2008-09-01
A stringent branch-site codon model was used to detect positive selection in vertebrate evolution. We show that the test is robust to the large evolutionary distances involved. Positive selection was detected in 77% of 884 genes studied. Most positive selection concerns a few sites on a single branch of the phylogenetic tree: Between 0.9% and 4.7% of sites are affected by positive selection depending on the branches. No functional category was overrepresented among genes under positive selection. Surprisingly, whole genome duplication had no effect on the prevalence of positive selection, whether the fish-specific genome duplication or the two rounds at the origin of vertebrates. Thus positive selection has not been limited to a few gene classes, or to specific evolutionary events such as duplication, but has been pervasive during vertebrate evolution.
Finding Fingerprints of Selection in Poplar Genomes
Tuskan, Gerald
2018-05-30
Jerry Tuskan of Oak Ridge National Laboratory and the DOE JGI talks about poplar trees as models for selective adaptation to an environment. This video complements a study published online ahead of print on August 24, 2014 in Nature Genetics.
Comparative genomics of wild type yeast strains unveils important genome diversity
Carreto, Laura; Eiriz, Maria F; Gomes, Ana C; Pereira, Patrícia M; Schuller, Dorit; Santos, Manuel AS
2008-01-01
Background Genome variability generates phenotypic heterogeneity and is of relevance for adaptation to environmental change, but the extent of such variability in natural populations is still poorly understood. For example, selected Saccharomyces cerevisiae strains are variable at the ploidy level, have gene amplifications, changes in chromosome copy number, and gross chromosomal rearrangements. This suggests that genome plasticity provides important genetic diversity upon which natural selection mechanisms can operate. Results In this study, we have used wild-type S. cerevisiae (yeast) strains to investigate genome variation in natural and artificial environments. We have used comparative genome hybridization on array (aCGH) to characterize the genome variability of 16 yeast strains, of laboratory and commercial origin, isolated from vineyards and wine cellars, and from opportunistic human infections. Interestingly, sub-telomeric instability was associated with the clinical phenotype, while Ty element insertion regions determined genomic differences of natural wine fermentation strains. Copy number depletion of ASP3 and YRF1 genes was found in all wild-type strains. Other gene families involved in transmembrane transport, sugar and alcohol metabolism or drug resistance had copy number changes, which also distinguished wine from clinical isolates. Conclusion We have isolated and genotyped more than 1000 yeast strains from natural environments and carried out an aCGH analysis of 16 strains representative of distinct genotype clusters. Important genomic variability was identified between these strains, in particular in sub-telomeric regions and in Ty-element insertion sites, suggesting that this type of genome variability is the main source of genetic diversity in natural populations of yeast. The data highlights the usefulness of yeast as a model system to unravel intraspecific natural genome diversity and to elucidate how natural selection shapes the yeast genome. PMID:18983662
A Ranking Approach to Genomic Selection.
Blondel, Mathieu; Onogi, Akio; Iwata, Hiroyoshi; Ueda, Naonori
2015-01-01
Genomic selection (GS) is a recent selective breeding method which uses predictive models based on whole-genome molecular markers. Until now, existing studies formulated GS as the problem of modeling an individual's breeding value for a particular trait of interest, i.e., as a regression problem. To assess predictive accuracy of the model, the Pearson correlation between observed and predicted trait values was used. In this paper, we propose to formulate GS as the problem of ranking individuals according to their breeding value. Our proposed framework allows us to employ machine learning methods for ranking which had previously not been considered in the GS literature. To assess ranking accuracy of a model, we introduce a new measure originating from the information retrieval literature called normalized discounted cumulative gain (NDCG). NDCG rewards more strongly models which assign a high rank to individuals with high breeding value. Therefore, NDCG reflects a prerequisite objective in selective breeding: accurate selection of individuals with high breeding value. We conducted a comparison of 10 existing regression methods and 3 new ranking methods on 6 datasets, consisting of 4 plant species and 25 traits. Our experimental results suggest that tree-based ensemble methods including McRank, Random Forests and Gradient Boosting Regression Trees achieve excellent ranking accuracy. RKHS regression and RankSVM also achieve good accuracy when used with an RBF kernel. Traditional regression methods such as Bayesian lasso, wBSR and BayesC were found less suitable for ranking. Pearson correlation was found to correlate poorly with NDCG. Our study suggests two important messages. First, ranking methods are a promising research direction in GS. Second, NDCG can be a useful evaluation measure for GS.
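The NDCG measure described in the abstract above can be sketched as follows. This is a minimal illustration that assumes the common log2 rank discount and uses the observed trait values directly as gains; the gain function and cut-off used in the paper are not stated here, so `dcg`, `ndcg_at_k` and the toy values are hypothetical.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: the gain at 1-based rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(observed, predicted, k):
    """NDCG@k for ranking individuals by predicted breeding value.

    observed  -- true trait values, assumed non-negative and used as gains
    predicted -- model scores; only their ordering matters
    """
    # Rank individuals by predicted score, best first, and keep the top k.
    order = sorted(range(len(observed)), key=lambda i: predicted[i], reverse=True)
    gains_pred = [observed[i] for i in order[:k]]
    # The ideal ranking sorts by the observed values themselves.
    gains_ideal = sorted(observed, reverse=True)[:k]
    ideal = dcg(gains_ideal)
    return dcg(gains_pred) / ideal if ideal > 0 else 0.0

# Toy data (illustrative only).
observed = [3.0, 1.0, 2.0, 0.5]
good_scores = [0.9, 0.2, 0.5, 0.1]   # same ordering as the observed values
poor_scores = [0.1, 0.2, 0.3, 0.9]   # puts the worst individual first
ndcg_good = ndcg_at_k(observed, good_scores, k=2)
ndcg_poor = ndcg_at_k(observed, poor_scores, k=2)
```

Because the discount shrinks with rank, errors at the top of the list cost more than errors further down, which is the property the abstract ties to selecting individuals with high breeding value.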
Uniparental Inheritance Promotes Adaptive Evolution in Cytoplasmic Genomes
Christie, Joshua R.; Beekman, Madeleine
2017-01-01
Eukaryotes carry numerous asexual cytoplasmic genomes (mitochondria and plastids). Lacking recombination, asexual genomes should theoretically suffer from impaired adaptive evolution. Yet, empirical evidence indicates that cytoplasmic genomes experience higher levels of adaptive evolution than predicted by theory. In this study, we use a computational model to show that the unique biology of cytoplasmic genomes—specifically their organization into host cells and their uniparental (maternal) inheritance—enable them to undergo effective adaptive evolution. Uniparental inheritance of cytoplasmic genomes decreases competition between different beneficial substitutions (clonal interference), promoting the accumulation of beneficial substitutions. Uniparental inheritance also facilitates selection against deleterious cytoplasmic substitutions, slowing Muller’s ratchet. In addition, uniparental inheritance generally reduces genetic hitchhiking of deleterious substitutions during selective sweeps. Overall, uniparental inheritance promotes adaptive evolution by increasing the level of beneficial substitutions relative to deleterious substitutions. When we assume that cytoplasmic genome inheritance is biparental, decreasing the number of genomes transmitted during gametogenesis (bottleneck) aids adaptive evolution. Nevertheless, adaptive evolution is always more efficient when inheritance is uniparental. Our findings explain empirical observations that cytoplasmic genomes—despite their asexual mode of reproduction—can readily undergo adaptive evolution. PMID:28025277
Statistical Selection of Biological Models for Genome-Wide Association Analyses.
Bi, Wenjian; Kang, Guolian; Pounds, Stanley B
2018-05-24
Genome-wide association studies have discovered many biologically important associations of genes with phenotypes. Typically, genome-wide association analyses formally test the association of each genetic feature (SNP, CNV, etc) with the phenotype of interest and summarize the results with multiplicity-adjusted p-values. However, very small p-values only provide evidence against the null hypothesis of no association without indicating which biological model best explains the observed data. Correctly identifying a specific biological model may improve the scientific interpretation and can be used to more effectively select and design a follow-up validation study. Thus, statistical methodology to identify the correct biological model for a particular genotype-phenotype association can be very useful to investigators. Here, we propose a general statistical method to summarize how accurately each of five biological models (null, additive, dominant, recessive, co-dominant) represents the data observed for each variant in a GWAS study. We show that the new method stringently controls the false discovery rate and asymptotically selects the correct biological model. Simulations of two-stage discovery-validation studies show that the new method has these properties and that its validation power is similar to or exceeds that of simple methods that use the same statistical model for all SNPs. Example analyses of three data sets also highlight these advantages of the new method. An R package is freely available at www.stjuderesearch.org/site/depts/biostats/maew. Copyright © 2018. Published by Elsevier Inc.
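A small sketch of the conventional genotype codings that correspond to the five biological models named in the abstract. This shows only the standard way each model translates a minor-allele count into regression covariates; it is not the authors' model-selection procedure, and the function name is hypothetical.

```python
def encode_genotype(minor_allele_count, model):
    """Return the covariate tuple for one genotype under a given genetic model.

    minor_allele_count: 0, 1 or 2 copies of the minor allele.
    model: 'null', 'additive', 'dominant', 'recessive' or 'co-dominant'.
    """
    g = minor_allele_count
    if g not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2")
    if model == "null":
        return ()                            # no genetic effect at all
    if model == "additive":
        return (g,)                          # effect proportional to allele count
    if model == "dominant":
        return (int(g >= 1),)                # one copy is enough
    if model == "recessive":
        return (int(g == 2),)                # two copies are required
    if model == "co-dominant":
        return (int(g == 1), int(g == 2))    # separate effect per genotype class
    raise ValueError(f"unknown model: {model}")
```

The co-dominant coding uses two indicator covariates, so it can represent any pattern of three genotype means; the other non-null models are one-parameter special cases of it.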
Genome-wide Selective Sweeps in Natural Bacterial Populations Revealed by Time-series Metagenomics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chan, Leong-Keat; Bendall, Matthew L.; Malfatti, Stephanie
2014-05-12
Multiple evolutionary models have been proposed to explain the formation of genetically and ecologically distinct bacterial groups. Time-series metagenomics enables direct observation of evolutionary processes in natural populations, and if applied over a sufficiently long time frame, this approach could capture events such as gene-specific or genome-wide selective sweeps. Direct observations of either process could help resolve how distinct groups form in natural microbial assemblages. Here, from a three-year metagenomic study of a freshwater lake, we explore changes in single nucleotide polymorphism (SNP) frequencies and patterns of gene gain and loss in populations of Chlorobiaceae and Methylophilaceae. SNP analyses revealed substantial genetic heterogeneity within these populations, although the degree of heterogeneity varied considerably among closely related, co-occurring Methylophilaceae populations. SNP allele frequencies, as well as the relative abundance of certain genes, changed dramatically over time in each population. Interestingly, SNP diversity was purged at nearly every genome position in one of the Chlorobiaceae populations over the course of three years, while at the same time multiple genes either swept through or were swept from this population. These patterns were consistent with a genome-wide selective sweep, a process predicted by the ecotype model of diversification, but not previously observed in natural populations.
Population-genetic models of sex-limited genomic imprinting.
Kelly, S Thomas; Spencer, Hamish G
2017-06-01
Genomic imprinting is a form of epigenetic modification involving parent-of-origin-dependent gene expression, usually the inactivation of one gene copy in some tissues, at least, for some part of the diploid life cycle. Occurring at a number of loci in mammals and flowering plants, this mode of non-Mendelian expression can be viewed more generally as parentally-specific differential gene expression. The effects of natural selection on genetic variation at imprinted loci have previously been examined in several population-genetic models. Here we expand the existing one-locus, two-allele population-genetic models of viability selection with genomic imprinting to include sex-limited imprinting, i.e., imprinted expression occurring only in one sex, and differential viability between the sexes. We first consider models of complete inactivation of either parental allele and these models are subsequently generalized to incorporate differential expression. Stable polymorphic equilibrium was possible without heterozygote advantage as observed in some prior models of imprinting in both sexes. In contrast to these latter models, in the sex-limited case it was critical whether the paternally inherited or maternally inherited allele was inactivated. The parental origin of inactivated alleles had a different impact on how the population responded to the different selection pressures between the sexes. Under the same fitness parameters, imprinting in the other sex altered the number of possible equilibrium states and their stability. When the parental origin of imprinted alleles and the sex in which they are inactive differ, an allele cannot be inactivated in consecutive generations. The system dynamics became more complex with more equilibrium points emerging. Our results show that selection can interact with epigenetic factors to maintain genetic variation in previously unanticipated ways. Copyright © 2017 Elsevier Inc. All rights reserved.
The scope and strength of sex-specific selection in genome evolution
Wright, A E; Mank, J E
2013-01-01
Males and females share the vast majority of their genomes and yet are often subject to different, even conflicting, selection. Genomic and transcriptomic developments have made it possible to assess sex-specific selection at the molecular level, and it is clear that sex-specific selection shapes the evolutionary properties of several genomic characteristics, including transcription, post-transcriptional regulation, imprinting, genome structure and gene sequence. Sex-specific selection is strongly influenced by mating system, which also causes neutral evolutionary changes that affect different regions of the genome in different ways. Here, we synthesize theoretical and molecular work in order to provide a cohesive view of the role of sex-specific selection and mating system in genome evolution. We also highlight the need for a combined approach, incorporating both genomic data and experimental phenotypic studies, in order to understand precisely how sex-specific selection drives evolutionary change across the genome. PMID:23848139
2013-01-01
Background Machine learning techniques are becoming useful as an alternative approach to conventional medical diagnosis or prognosis as they are good for handling noisy and incomplete data, and significant results can be attained despite a small sample size. Traditionally, clinicians make prognostic decisions based on clinicopathologic markers. However, it is not easy for even the most skilful clinician to arrive at an accurate prognosis by using these markers alone. Thus, there is a need to use genomic markers to improve the accuracy of prognosis. The main aim of this research is to apply a hybrid of feature selection and machine learning methods in oral cancer prognosis based on the parameters of the correlation of clinicopathologic and genomic markers. Results In the first stage of this research, five feature selection methods were proposed and tested on the oral cancer prognosis dataset. In the second stage, the models built from the features selected by each feature selection method were tested on the proposed classifiers. Four types of classifiers were chosen, namely ANFIS, artificial neural network, support vector machine and logistic regression. A k-fold cross-validation was implemented on all types of classifiers due to the small sample size. The hybrid model of ReliefF-GA-ANFIS with 3-input features of drink, invasion and p63 achieved the best accuracy (accuracy = 93.81%; AUC = 0.90) for the oral cancer prognosis. Conclusions The results revealed that the prognosis is superior with the presence of both clinicopathologic and genomic markers. The selected features can be investigated further to validate their potential as a significant prognostic signature in oral cancer studies. PMID:23725313
Evolutionary versatility of eukaryotic protein domains revealed by their bigram networks
Xie, Xueying; Jin, Jing; Mao, Yongyi
2011-08-18
Protein domains are globular structures of independently folded polypeptides that exert catalytic or binding activities. Their sequences are recognized as evolutionary units that, through genome recombination, constitute protein repertoires of linkage patterns. Via mutations, domains acquire modified functions that contribute to the fitness of cells and organisms. Recent studies have addressed the evolutionary selection that may have shaped the functions of individual domains and the emergence of particular domain combinations, which led to new cellular functions in multi-cellular animals. This study focuses on modeling domain linkage globally and investigates evolutionary implications that may be revealed by novel computational analysis. A survey of 77 completely sequenced eukaryotic genomes implies a potential hierarchical and modular organization of biological functions in most living organisms. Domains in a genome or multiple genomes are modeled as a network of hetero-duplex covalent linkages, termed bigrams. A novel computational technique is introduced to decompose such networks, whereby the notion of domain "networking versatility" is derived and measured. The most and least "versatile" domains (termed "core domains" and "peripheral domains" respectively) are examined both computationally via sequence conservation measures and experimentally using selected domains. Our study suggests that such a versatility measure extracted from the bigram networks correlates with the adaptivity of domains during evolution, where the network core domains are highly adaptive, significantly contrasting the network peripheral domains. Domain recombination has played a major part in the evolution of eukaryotes, contributing to genome complexity. From a system point of view, as the results of selection and constant refinement, networks of domain linkage are structured in a hierarchical modular fashion. Domains with a high degree of networking versatility appear to be evolutionarily adaptive, potentially through functional innovations. Domain bigram networks are informative as a model of biological functions. The networking versatility indices extracted from such networks for individual domains reflect the strength of evolutionary selection that the domains have experienced. PMID:21849086
TARGETED CAPTURE IN EVOLUTIONARY AND ECOLOGICAL GENOMICS
Jones, Matthew R.; Good, Jeffrey M.
2016-01-01
The rapid expansion of next-generation sequencing has yielded a powerful array of tools to address fundamental biological questions at a scale that was inconceivable just a few years ago. Various genome partitioning strategies to sequence select subsets of the genome have emerged as powerful alternatives to whole genome sequencing in ecological and evolutionary genomic studies. High throughput targeted capture is one such strategy that involves the parallel enrichment of pre-selected genomic regions of interest. The growing use of targeted capture demonstrates its potential power to address a range of research questions, yet these approaches have yet to expand broadly across labs focused on evolutionary and ecological genomics. In part, the use of targeted capture has been hindered by the logistics of capture design and implementation in species without established reference genomes. Here we aim to 1) increase the accessibility of targeted capture to researchers working in non-model taxa by discussing capture methods that circumvent the need of a reference genome, 2) highlight the evolutionary and ecological applications where this approach is emerging as a powerful sequencing strategy, and 3) discuss the future of targeted capture and other genome partitioning approaches in light of the increasing accessibility of whole genome sequencing. Given the practical advantages and increasing feasibility of high-throughput targeted capture, we anticipate an ongoing expansion of capture-based approaches in evolutionary and ecological research, synergistic with an expansion of whole genome sequencing. PMID:26137993
Darwinian evolution in the light of genomics
Koonin, Eugene V.
2009-01-01
Comparative genomics and systems biology offer unprecedented opportunities for testing central tenets of evolutionary biology formulated by Darwin in the Origin of Species in 1859 and expanded in the Modern Synthesis 100 years later. Evolutionary-genomic studies show that natural selection is only one of the forces that shape genome evolution and is not quantitatively dominant, whereas non-adaptive processes are much more prominent than previously suspected. Major contributions of horizontal gene transfer and diverse selfish genetic elements to genome evolution undermine the Tree of Life concept. An adequate depiction of evolution requires the more complex concept of a network or ‘forest’ of life. There is no consistent tendency of evolution towards increased genomic complexity, and when complexity increases, this appears to be a non-adaptive consequence of evolution under weak purifying selection rather than an adaptation. Several universals of genome evolution were discovered including the invariant distributions of evolutionary rates among orthologous genes from diverse genomes and of paralogous gene family sizes, and the negative correlation between gene expression level and sequence evolution rate. Simple, non-adaptive models of evolution explain some of these universals, suggesting that a new synthesis of evolutionary biology might become feasible in a not so remote future. PMID:19213802
Hymenoptera Genome Database: integrating genome annotations in HymenopteraMine
Elsik, Christine G.; Tayal, Aditi; Diesh, Colin M.; Unni, Deepak R.; Emery, Marianne L.; Nguyen, Hung N.; Hagen, Darren E.
2016-01-01
We report an update of the Hymenoptera Genome Database (HGD) (http://HymenopteraGenome.org), a model organism database for insect species of the order Hymenoptera (ants, bees and wasps). HGD maintains genomic data for 9 bee species, 10 ant species and 1 wasp, including the versions of genome and annotation data sets published by the genome sequencing consortiums and those provided by NCBI. A new data-mining warehouse, HymenopteraMine, based on the InterMine data warehousing system, integrates the genome data with data from external sources and facilitates cross-species analyses based on orthology. New genome browsers and annotation tools based on JBrowse/WebApollo provide easy genome navigation, and viewing of high throughput sequence data sets and can be used for collaborative genome annotation. All of the genomes and annotation data sets are combined into a single BLAST server that allows users to select and combine sequence data sets to search. PMID:26578564
Wolc, Anna; Stricker, Chris; Arango, Jesus; Settar, Petek; Fulton, Janet E; O'Sullivan, Neil P; Preisinger, Rudolf; Habier, David; Fernando, Rohan; Garrick, Dorian J; Lamont, Susan J; Dekkers, Jack C M
2011-01-21
Genomic selection involves breeding value estimation of selection candidates based on high-density SNP genotypes. To quantify the potential benefit of genomic selection, accuracies of estimated breeding values (EBV) obtained with different methods using pedigree or high-density SNP genotypes were evaluated and compared in a commercial layer chicken breeding line. The following traits were analyzed: egg production, egg weight, egg color, shell strength, age at sexual maturity, body weight, albumen height, and yolk weight. Predictions appropriate for early or late selection were compared. A total of 2,708 birds were genotyped for 23,356 segregating SNP, including 1,563 females with records. Phenotypes on relatives without genotypes were incorporated in the analysis (in total 13,049 production records). The data were analyzed with a Reduced Animal Model using a relationship matrix based on pedigree data or on marker genotypes and with a Bayesian method using model averaging. Using a validation set that consisted of individuals from the generation following training, these methods were compared by correlating EBV with phenotypes corrected for fixed effects, selecting the top 30 individuals based on EBV and evaluating their mean phenotype, and by regressing phenotypes on EBV. Using high-density SNP genotypes increased accuracies of EBV up to two-fold for selection at an early age and by up to 88% for selection at a later age. Accuracy increases at an early age can be mostly attributed to improved estimates of parental EBV for shell quality and egg production, while for other egg quality traits it is mostly due to improved estimates of Mendelian sampling effects. A relatively small number of markers was sufficient to explain most of the genetic variation for egg weight and body weight.
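The three validation criteria described in this abstract (correlation of EBV with corrected phenotype, mean phenotype of the top-ranked individuals, and regression of phenotype on EBV) can be sketched as below. This is an illustration under simple assumptions, not the authors' code: phenotypes are assumed to be already corrected for fixed effects, and the function name is hypothetical.

```python
from statistics import mean


def validation_summary(ebv, phenotype, top_n=30):
    """Return (correlation, regression slope, mean phenotype of top_n by EBV)."""
    m_e, m_p = mean(ebv), mean(phenotype)
    # Sums of squares and cross-products around the means.
    s_ep = sum((e - m_e) * (p - m_p) for e, p in zip(ebv, phenotype))
    s_ee = sum((e - m_e) ** 2 for e in ebv)
    s_pp = sum((p - m_p) ** 2 for p in phenotype)
    correlation = s_ep / (s_ee * s_pp) ** 0.5
    slope = s_ep / s_ee  # regression of phenotype on EBV
    # Indices of the top_n candidates ranked by EBV, best first.
    top = sorted(range(len(ebv)), key=lambda i: ebv[i], reverse=True)[:top_n]
    top_mean = mean(phenotype[i] for i in top)
    return correlation, slope, top_mean
```

A slope near 1 is usually read as a sign that the EBV are neither inflated nor deflated, while the top-n mean measures what selection on EBV would actually deliver.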
Martínez, R; Gómez, Y; Rocha, J F M
2014-08-25
Whole genome selection represents an important tool for improving parameters related to the production of livestock. In order to build genomic selection indexes within a particular breed, it is important to identify polymorphisms that have the most significant association with a desired trait. A genome-wide marker association approach based on the Illumina BovineSNP50 BeadChip(TM) was used to identify genomic regions affecting birth weight (BW), weaning weight (WW), and daily weight gain (DWG) in purebred and crossbred creole cattle populations. We genotyped 654 individuals of Blanco Orejinegro (BON), Romosinuano (ROMO) and Cebú breeds and the crossbreeds BON x Cebú and ROMO x Cebú, and tested 5 genetic control models. In total, 85 single nucleotide polymorphisms (SNPs) were related (P < 0.05) to the 3 evaluated traits; BW was associated with the highest number of SNPs. For statistical false-positive correction, Bonferroni correction was used. From the results, we identified 7, 6, and 4 SNPs with strong associations with BW, WW, and DWG, respectively. Many of these SNPs were located on important coding regions of the bovine genome; their ontology and interactions are discussed herein. The results could contribute to the identification of genes involved in the physiology of beef cattle growth and the development of new strategies for breeding management via genomic selection to improve the productivity of creole cattle herds.
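A minimal sketch of a Bonferroni correction as mentioned in this abstract: with m tests and family-wise level alpha, a SNP is retained only if its p-value is at most alpha / m. The abstract does not report the number of tests or the exact threshold used, so the values in the usage note are purely illustrative.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return indices of tests that remain significant after Bonferroni correction."""
    m = len(p_values)
    threshold = alpha / m  # per-test significance level
    return [i for i, p in enumerate(p_values) if p <= threshold]
```

For example, with four hypothetical p-values `[0.001, 0.02, 0.04, 0.0001]` the per-test threshold is 0.0125, so only the first and last tests are retained even though all four pass an uncorrected 0.05 cutoff.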
Johnston, Iain G; Williams, Ben P
2016-02-24
Since their endosymbiotic origin, mitochondria have lost most of their genes. Although many selective mechanisms underlying the evolution of mitochondrial genomes have been proposed, a data-driven exploration of these hypotheses is lacking, and a quantitatively supported consensus remains absent. We developed HyperTraPS, a methodology coupling stochastic modeling with Bayesian inference, to identify the ordering of evolutionary events and suggest their causes. Using 2015 complete mitochondrial genomes, we inferred evolutionary trajectories of mtDNA gene loss across the eukaryotic tree of life. We find that proteins comprising the structural cores of the electron transport chain are preferentially encoded within mitochondrial genomes across eukaryotes. A combination of high GC content and high protein hydrophobicity is required to explain patterns of mtDNA gene retention; a model that accounts for these selective pressures can also predict the success of artificial gene transfer experiments in vivo. This work provides a general method for data-driven inference of the ordering of evolutionary and progressive events, here identifying the distinct features shaping mitochondrial genomes of present-day species. Copyright © 2016 Elsevier Inc. All rights reserved.
Genomic Prediction of Testcross Performance in Canola (Brassica napus)
Jan, Habib U.; Abbadi, Amine; Lücke, Sophie; Nichols, Richard A.; Snowdon, Rod J.
2016-01-01
Genomic selection (GS) is a modern breeding approach where genome-wide single-nucleotide polymorphism (SNP) marker profiles are simultaneously used to estimate performance of untested genotypes. In this study, the potential of genomic selection methods to predict testcross performance for hybrid canola breeding was evaluated for various agronomic traits based on genome-wide marker profiles. A total of 475 genetically diverse spring-type canola pollinator lines were genotyped at 24,403 single-copy, genome-wide SNP loci. In parallel, the 950 F1 testcross combinations between the pollinators and two representative testers were evaluated for a number of important agronomic traits including seedling emergence, days to flowering, lodging, oil yield and seed yield along with essential seed quality characters including seed oil content and seed glucosinolate content. A ridge-regression best linear unbiased prediction (RR-BLUP) model was applied in combination with 500 cross-validations for each trait to predict testcross performance, both across the whole population as well as within individual subpopulations or clusters, based solely on SNP profiles. Subpopulations were determined using multidimensional scaling and K-means clustering. Genomic prediction accuracy across the whole population was highest for seed oil content (0.81) followed by oil yield (0.75) and lowest for seedling emergence (0.29). For seed yield, seed glucosinolate, lodging resistance and days to onset of flowering (DTF), prediction accuracies were 0.45, 0.61, 0.39 and 0.56, respectively. Prediction accuracies could be increased for some traits by treating subpopulations separately; a strategy which only led to moderate improvements for some traits with low heritability, like seedling emergence. No useful or consistent increase in accuracy was obtained by inclusion of a population substructure covariate in the model. 
Testcross performance prediction using genome-wide SNP markers shows considerable potential for pre-selection of promising hybrid combinations prior to resource-intensive field testing over multiple locations and years. PMID:26824924
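The combination of ridge-regression marker effects and repeated cross-validation described above can be sketched as follows. This is a simplified stand-in, not the authors' pipeline: the shrinkage parameter `lam` is fixed by hand here (an assumption; RR-BLUP normally derives it from estimated variance components), the split sizes are arbitrary, and all function names are hypothetical.

```python
import numpy as np


def ridge_marker_effects(X, y, lam):
    """Solve (X'X + lam * I) b = X'y for marker effects on centred data."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def cv_accuracy(X, y, lam=1.0, n_splits=20, test_fraction=0.2, seed=0):
    """Mean Pearson correlation between predicted and observed values
    over repeated random training/validation splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(2, int(round(test_fraction * n)))
    accuracies = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        # Centre markers and phenotype using training-set means only.
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        b = ridge_marker_effects(X[train] - x_mean, y[train] - y_mean, lam)
        pred = (X[test] - x_mean) @ b + y_mean
        accuracies.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(accuracies))
```

With tens of thousands of markers, solving the marker-by-marker system directly becomes expensive; an equivalent formulation based on a line-by-line relationship matrix is the usual choice in that case.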
Understanding the Origin of Species with Genome-Scale Data: the Role of Gene Flow
Sousa, Vitor; Hey, Jody
2017-01-01
As it becomes easier to sequence multiple genomes from closely related species, evolutionary biologists working on speciation are struggling to get the most out of very large population-genomic data sets. Such data hold the potential to resolve evolutionary biology’s long-standing questions about the role of gene exchange in species formation. In principle the new population genomic data can be used to disentangle the conflicting roles of natural selection and gene flow during the divergence process. However there are great challenges in taking full advantage of such data, especially with regard to including recombination in genetic models of the divergence process. Current data, models, methods and the potential pitfalls in using them will be considered here. PMID:23657479
Deep Learning for Population Genetic Inference
Sheehan, Sara; Song, Yun S.
2016-01-01
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme. PMID:27018908
Rochus, Christina Marie; Tortereau, Flavie; Plisson-Petit, Florence; Restoux, Gwendal; Moreno-Romieux, Carole; Tosser-Klopp, Gwenola; Servin, Bertrand
2018-01-23
One of the approaches to detect genetic variants affecting fitness traits is to identify their surrounding genomic signatures of past selection. With established methods for detecting selection signatures and the current and future availability of large datasets, such studies should have the power to not only detect these signatures but also to infer their selective histories. Domesticated animals offer a powerful model for these approaches as they adapted rapidly to environmental and human-mediated constraints in a relatively short time. We investigated this question by studying a large dataset of 542 individuals from 27 domestic sheep populations raised in France, genotyped for more than 500,000 SNPs. Population structure analysis revealed that this set of populations harbours a large part of European sheep diversity in a small geographical area, offering a powerful model for the study of adaptation. Identification of extreme SNP and haplotype frequency differences between populations listed 126 genomic regions likely affected by selection. These signatures revealed selection at loci commonly identified as selection targets in many species ("selection hotspots") including ABCG2, LCORL/NCAPG, MSTN, and coat colour genes such as ASIP, MC1R, MITF, and TYRP1. For one of these regions (ABCG2, LCORL/NCAPG), we could propose a historical scenario leading to the introgression of an adaptive allele into a new genetic background. Among selection signatures, we found clear evidence for parallel selection events in different genetic backgrounds, most likely for different mutations. We confirmed this allelic heterogeneity in one case by resequencing the MC1R gene in three black-faced breeds. Our study illustrates how dense genetic data in multiple populations allows the deciphering of evolutionary history of populations and of their adaptive mutations.
Bohlin, Jon; Eldholm, Vegard; Pettersson, John H O; Brynildsrud, Ola; Snipen, Lars
2017-02-10
The core genome consists of genes shared by the vast majority of a species and is therefore assumed to have been subjected to substantially stronger purifying selection than the more mobile elements of the genome, also known as the accessory genome. Here we examine intragenic base composition differences in core genomes and corresponding accessory genomes in 36 species, represented by the genomes of 731 bacterial strains, to assess the impact of selective forces on base composition in microbes. We also explore, in turn, how these results compare with findings for whole genome intragenic regions. We found that GC content in coding regions is significantly higher in core genomes than accessory genomes and whole genomes. Likewise, GC content variation within coding regions was significantly lower in core genomes than in accessory genomes and whole genomes. Relative entropy in coding regions, measured as the difference between observed and expected trinucleotide frequencies estimated from mononucleotide frequencies, was significantly higher in the core genomes than in accessory and whole genomes. Relative entropy was positively associated with coding region GC content within the accessory genomes, but not within the corresponding coding regions of core or whole genomes. The higher intragenic GC content and relative entropy, as well as the lower GC content variation, observed in the core genomes are most likely associated with selective constraints. It is unclear whether the positive association between GC content and relative entropy in the more mobile accessory genomes constitutes a signature of selection or of selectively neutral processes.
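One way to make the relative-entropy measure concrete is the Kullback-Leibler divergence between observed trinucleotide frequencies and those expected if bases were independent. The sketch below assumes overlapping trinucleotide windows, log base 2, and a sequence made up only of A, C, G and T; the abstract does not specify these details, and the function name is hypothetical.

```python
import math
from collections import Counter


def trinucleotide_relative_entropy(seq):
    """KL divergence (bits) of observed trinucleotide frequencies from the
    frequencies expected as products of mononucleotide frequencies."""
    seq = seq.upper()
    mono = Counter(seq)
    n_mono = sum(mono.values())
    # Overlapping windows of length 3 (an assumption of this sketch).
    tri = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    n_tri = sum(tri.values())
    total = 0.0
    for word, count in tri.items():
        observed = count / n_tri
        expected = math.prod(mono[base] / n_mono for base in word)
        total += observed * math.log2(observed / expected)
    return total
```

A value of zero means the trinucleotide usage is fully explained by base composition alone; larger values indicate additional structure beyond base composition.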
Genome-wide association analysis for feed efficiency in Angus cattle.
Rolf, M M; Taylor, J F; Schnabel, R D; McKay, S D; McClure, M C; Northcutt, S L; Kerley, M S; Weaber, R L
2012-08-01
Estimated breeding values for average daily feed intake (AFI; kg/day), residual feed intake (RFI; kg/day) and average daily gain (ADG; kg/day) were generated using a mixed linear model incorporating genomic relationships for 698 Angus steers genotyped with the Illumina BovineSNP50 assay. Association analyses of estimated breeding values (EBVs) were performed for 41,028 single nucleotide polymorphisms (SNPs), and permutation analysis was used to empirically establish the genome-wide significance threshold (P < 0.05) for each trait. SNPs significantly associated with each trait were used in a forward selection algorithm to identify genomic regions putatively harbouring genes with effects on each trait. A total of 53, 66 and 68 SNPs explained 54.12% (24.10%), 62.69% (29.85%) and 55.13% (26.54%) of the additive genetic variation (when accounting for the genomic relationships) in steer breeding values for AFI, RFI and ADG, respectively, within this population. Evaluation by pathway analysis revealed that many of these SNPs are in genomic regions that harbour genes with metabolic functions. The presence of genetic correlations between traits resulted in 13.2% of SNPs selected for AFI and 4.5% of SNPs selected for RFI also being selected for ADG in the analysis of breeding values. While our study identifies panels of SNPs significant for efficiency traits in our population, validation of all SNPs in independent populations will be necessary before commercialization. © 2011 The Authors, Animal Genetics © 2011 Stichting International Foundation for Animal Genetics.
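The permutation step mentioned in this abstract, used to set an empirical genome-wide threshold, can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the absolute Pearson correlation stands in for their association statistic, and the function names, the default of 200 permutations and the seed are all hypothetical.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_threshold(genotypes, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Empirical genome-wide threshold: the (1 - alpha) quantile of the
    maximum statistic across all SNPs under shuffled phenotypes."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        maxima.append(max(abs_corr(snp, shuffled) for snp in genotypes))
    maxima.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return maxima[index]
```

Taking the maximum over SNPs within each permutation is what makes the threshold genome-wide rather than per-marker.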
Mitigating Mitochondrial Genome Erosion Without Recombination.
Radzvilavicius, Arunas L; Kokko, Hanna; Christie, Joshua R
2017-11-01
Mitochondria are ATP-producing organelles of bacterial ancestry that played a key role in the origin and early evolution of complex eukaryotic cells. Most modern eukaryotes transmit mitochondrial genes uniparentally, often without recombination among genetically divergent organelles. While this asymmetric inheritance maintains the efficacy of purifying selection at the level of the cell, the absence of recombination could also make the genome susceptible to Muller's ratchet. How mitochondria escape this irreversible defect accumulation is a fundamental unsolved question. Occasional paternal leakage could in principle promote recombination, but it would also compromise the purifying selection benefits of uniparental inheritance. We assess this tradeoff using a stochastic population-genetic model. In the absence of recombination, uniparental inheritance of freely-segregating genomes mitigates mutational erosion, while paternal leakage exacerbates the ratchet effect. Mitochondrial fusion-fission cycles ensure independent genome segregation, improving purifying selection. Paternal leakage provides opportunity for recombination to slow down the mutation accumulation, but always at a cost of increased steady-state mutation load. Our findings indicate that random segregation of mitochondrial genomes under uniparental inheritance can effectively combat the mutational meltdown, and that homologous recombination under paternal leakage might not be needed. Copyright © 2017 by the Genetics Society of America.
Genomic selection in plant breeding
USDA-ARS's Scientific Manuscript database
Genomic selection (GS) is a method to predict the genetic value of selection candidates based on the genomic estimated breeding value (GEBV) predicted from high-density markers positioned throughout the genome. Unlike marker-assisted selection, the GEBV is based on all markers including both minor ...
The scope and strength of sex-specific selection in genome evolution.
Wright, A E; Mank, J E
2013-09-01
Males and females share the vast majority of their genomes and yet are often subject to different, even conflicting, selection. Genomic and transcriptomic developments have made it possible to assess sex-specific selection at the molecular level, and it is clear that sex-specific selection shapes the evolutionary properties of several genomic characteristics, including transcription, post-transcriptional regulation, imprinting, genome structure and gene sequence. Sex-specific selection is strongly influenced by mating system, which also causes neutral evolutionary changes that affect different regions of the genome in different ways. Here, we synthesize theoretical and molecular work in order to provide a cohesive view of the role of sex-specific selection and mating system in genome evolution. We also highlight the need for a combined approach, incorporating both genomic data and experimental phenotypic studies, in order to understand precisely how sex-specific selection drives evolutionary change across the genome. © 2013 The Authors. Journal of Evolutionary Biology © 2013 European Society For Evolutionary Biology.
Nguyen, Thanh-Tung; Huang, Joshua; Wu, Qingyao; Nguyen, Thuy; Li, Mark
2015-01-01
Selection and identification of single-nucleotide polymorphisms (SNPs) are the most important tasks in genome-wide association data analysis. The problem is difficult because genome-wide association data are very high dimensional and a large portion of the SNPs in the data are irrelevant to the disease. Advanced machine learning methods have been successfully used in genome-wide association studies (GWAS) for identification of genetic variants that have relatively big effects in some common, complex diseases. Among them, the most successful one is Random Forests (RF). Despite performing well in terms of prediction accuracy on some moderately sized data sets, RF still struggles in GWAS to select informative SNPs and build accurate prediction models. In this paper, we propose to use a new two-stage quality-based sampling method in random forests, named ts-RF, for SNP subspace selection for GWAS. The method first applies p-value assessment to find a cut-off point that separates informative and irrelevant SNPs into two groups. The informative SNPs group is further divided into two sub-groups: highly informative and weakly informative SNPs. When sampling the SNP subspace for building trees for the forest, only those SNPs from the two sub-groups are taken into account. The feature subspaces always contain highly informative SNPs when used to split a node in a tree. This approach enables one to generate more accurate trees with a lower prediction error, while possibly avoiding overfitting. It allows one to detect interactions of multiple SNPs with the diseases, and to reduce the dimensionality and the amount of genome-wide association data needed for learning the RF model.
Extensive experiments on two genome-wide SNP data sets (Parkinson case-control data comprised of 408,803 SNPs and Alzheimer case-control data comprised of 380,157 SNPs) and 10 gene data sets have demonstrated that the proposed model significantly reduced prediction errors and outperformed most existing state-of-the-art random forests. The top 25 SNPs in the Parkinson data set were identified by the proposed model, including four interesting genes associated with neurological disorders. The presented approach has been shown to be effective in selecting informative sub-groups of SNPs potentially associated with diseases that traditional statistical approaches might fail to detect. The new RF works well for data where the number of case-control objects is much smaller than the number of SNPs, which is a typical problem in gene data and GWAS. Experimental results demonstrated the effectiveness of the proposed RF model, which outperformed the state-of-the-art RFs, including Breiman's RF, GRRF and wsRF methods.
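The two-stage sampling idea described in this abstract can be sketched roughly as below. This is not the ts-RF implementation: the two p-value thresholds, the proportional allocation between sub-groups and the function names `split_snps` and `sample_subspace` are assumptions made only for illustration.

```python
import random


def split_snps(p_values, cutoff=0.05, strong=0.001):
    """Stage 1: drop SNPs above `cutoff`, then split the rest into highly and
    weakly informative groups. Both thresholds are illustrative placeholders."""
    high = [i for i, p in enumerate(p_values) if p <= strong]
    weak = [i for i, p in enumerate(p_values) if strong < p <= cutoff]
    return high, weak


def sample_subspace(high, weak, mtry, rng):
    """Stage 2: draw `mtry` SNP indices from the two informative groups only,
    always including at least one highly informative SNP when one exists.
    Proportional allocation is an assumption of this sketch."""
    pool = len(high) + len(weak)
    mtry = min(mtry, pool)
    if mtry == 0:
        return []
    n_high = round(mtry * len(high) / pool)
    if high:
        n_high = max(1, n_high)
    n_high = min(n_high, len(high), mtry)
    n_weak = min(mtry - n_high, len(weak))
    n_high = min(len(high), mtry - n_weak)  # top up if the weak group ran short
    return rng.sample(high, n_high) + rng.sample(weak, n_weak)
```

In a forest, a subspace like this would be redrawn at each node split, so irrelevant SNPs never compete for a split while the weakly informative ones still get a chance to contribute.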
Genomics of local adaptation with gene flow.
Tigano, Anna; Friesen, Vicki L
2016-05-01
Gene flow is a fundamental evolutionary force in adaptation that is especially important to understand as humans are rapidly changing both the natural environment and natural levels of gene flow. Theory proposes a multifaceted role for gene flow in adaptation, but it focuses mainly on the disruptive effect that gene flow has on adaptation when selection is not strong enough to prevent the loss of locally adapted alleles. The role of gene flow in adaptation is now better understood due to the recent development of both genomic models of adaptive evolution and genomic techniques, which both point to the importance of genetic architecture in the origin and maintenance of adaptation with gene flow. In this review, we discuss three main topics on the genomics of adaptation with gene flow. First, we investigate selection on migration and gene flow. Second, we discuss the three potential sources of adaptive variation in relation to the role of gene flow in the origin of adaptation. Third, we explain how local adaptation is maintained despite gene flow: we provide a synthesis of recent genomic models of adaptation, discuss the genomic mechanisms and review empirical studies on the genomics of adaptation with gene flow. Despite predictions on the disruptive effect of gene flow in adaptation, an increasing number of studies show that gene flow can promote adaptation, that local adaptations can be maintained despite high gene flow, and that genetic architecture plays a fundamental role in the origin and maintenance of local adaptation with gene flow. © 2016 John Wiley & Sons Ltd.
Conserved noncoding sequences conserve biological networks and influence genome evolution.
Xie, Jianbo; Qian, Kecheng; Si, Jingna; Xiao, Liang; Ci, Dong; Zhang, Deqiang
2018-05-01
Comparative genomics approaches have identified numerous conserved cis-regulatory sequences near genes in plant genomes. Despite the identification of these conserved noncoding sequences (CNSs), our knowledge of their functional importance and selection remains limited. Here, we used a combination of DNA methylome analysis, microarray expression analyses, and functional annotation to study these sequences in the model tree Populus trichocarpa. Methylation in CG contexts and non-CG contexts was lower in CNSs, particularly CNSs in the 5'-upstream regions of genes, compared with other sites in the genome. We observed that CNSs are enriched in genes with transcription and binding functions, and this also associated with syntenic genes and those from whole-genome duplications, suggesting that cis-regulatory sequences play a key role in genome evolution. We detected a significant positive correlation between CNS number and protein interactions, suggesting that CNSs may have roles in the evolution and maintenance of biological networks. The divergence of CNSs indicates that duplication-degeneration-complementation drives the subfunctionalization of a proportion of duplicated genes from whole-genome duplication. Furthermore, population genomics confirmed that most CNSs are under strong purifying selection and only a small subset of CNSs shows evidence of adaptive evolution. These findings provide a foundation for future studies exploring these key genomic features in the maintenance of biological networks, local adaptation, and transcription.
The Evolution of the Human Genome
Simonti, Corinne N.; Capra, John A.
2015-01-01
Human genomes hold a record of the evolutionary forces that have shaped our species. Advances in DNA sequencing, functional genomics, and population genetic modeling have deepened our understanding of human demographic history, natural selection, and many other long-studied topics. These advances have also revealed many previously underappreciated factors that influence the evolution of the human genome, including functional modifications to DNA and histones, conserved 3D topological chromatin domains, structural variation, and heterogeneous mutation patterns along the genome. Using evolutionary theory as a lens to study these phenomena will lead to significant breakthroughs in understanding what makes us human and why we get sick. PMID:26338498
Genome-Wide Motif Statistics are Shaped by DNA Binding Proteins over Evolutionary Time Scales
NASA Astrophysics Data System (ADS)
Qian, Long; Kussell, Edo
2016-10-01
The composition of a genome with respect to all possible short DNA motifs impacts the ability of DNA binding proteins to locate and bind their target sites. Since nonfunctional DNA binding can be detrimental to cellular functions and ultimately to organismal fitness, organisms could benefit from reducing the number of nonfunctional DNA binding sites genome wide. Using in vitro measurements of binding affinities for a large collection of DNA binding proteins, in multiple species, we detect a significant global avoidance of weak binding sites in genomes. We demonstrate that the underlying evolutionary process leaves a distinct genomic hallmark in that similar words have correlated frequencies, a signal that we detect in all species across domains of life. We consider the possibility that natural selection against weak binding sites contributes to this process, and using an evolutionary model we show that the strength of selection needed to maintain global word compositions is on the order of point mutation rates. Likewise, we show that evolutionary mechanisms based on interference of protein-DNA binding with replication and mutational repair processes could yield similar results and operate with similar rates. On the basis of these modeling and bioinformatic results, we conclude that genome-wide word compositions have been molded by DNA binding proteins acting through tiny evolutionary steps over time scales spanning millions of generations.
Vallejo, Roger L; Leeds, Timothy D; Gao, Guangtu; Parsons, James E; Martin, Kyle E; Evenhuis, Jason P; Fragomeni, Breno O; Wiens, Gregory D; Palti, Yniv
2017-02-01
Previously, we have shown that bacterial cold water disease (BCWD) resistance in rainbow trout can be improved using traditional family-based selection, but progress has been limited to exploiting only between-family genetic variation. Genomic selection (GS) is a new alternative that enables exploitation of within-family genetic variation. We compared three GS models [single-step genomic best linear unbiased prediction (ssGBLUP), weighted ssGBLUP (wssGBLUP), and BayesB] to predict genomic-enabled breeding values (GEBV) for BCWD resistance in a commercial rainbow trout population, and compared the accuracy of GEBV to traditional estimates of breeding values (EBV) from a pedigree-based BLUP (P-BLUP) model. We also assessed the impact of sampling design on the accuracy of GEBV predictions. For these comparisons, we used BCWD survival phenotypes recorded on 7893 fish from 102 families, of which 1473 fish from 50 families had genotypes [57 K single nucleotide polymorphism (SNP) array]. Naïve siblings of the training fish (n = 930 testing fish) were genotyped to predict their GEBV and mated to produce 138 progeny testing families. In the following generation, 9968 progeny were phenotyped to empirically assess the accuracy of GEBV predictions made on their non-phenotyped parents. The accuracy of GEBV from all tested GS models was substantially higher than that of the P-BLUP model EBV. The highest increase in accuracy relative to the P-BLUP model was achieved with BayesB (97.2 to 108.8%), followed by wssGBLUP at iteration 2 (94.4 to 97.1%) and 3 (88.9 to 91.2%) and ssGBLUP (83.3 to 85.3%). Reducing the training sample size to n = ~1000 had no negative impact on the accuracy (0.67 to 0.72), but with n = ~500 the accuracy dropped to 0.53 to 0.61 if the training and testing fish were full-sibs, and even substantially lower, to 0.22 to 0.25, when they were not full-sibs.
Using progeny performance data, we showed that the accuracy of genomic predictions is substantially higher than estimates obtained from the traditional pedigree-based BLUP model for BCWD resistance. Overall, we found that using a much smaller training sample size compared to similar studies in livestock, GS can substantially improve the selection accuracy and genetic gains for this trait in a commercial rainbow trout breeding population.
Fuller, Zachary L; Niño, Elina L; Patch, Harland M; Bedoya-Reina, Oscar C; Baumgarten, Tracey; Muli, Elliud; Mumoki, Fiona; Ratan, Aakrosh; McGraw, John; Frazier, Maryann; Masiga, Daniel; Schuster, Stephen; Grozinger, Christina M; Miller, Webb
2015-07-10
With the development of inexpensive, high-throughput sequencing technologies, it has become feasible to examine questions related to population genetics and molecular evolution of non-model species in their ecological contexts on a genome-wide scale. Here, we employed a newly developed suite of integrated, web-based programs to examine population dynamics and signatures of selection across the genome using several well-established tests, including FST, pN/pS, and McDonald-Kreitman. We applied these techniques to study populations of honey bees (Apis mellifera) in East Africa. In Kenya, there are several described A. mellifera subspecies, which are thought to be localized to distinct ecological regions. We performed whole genome sequencing of 11 worker honey bees from apiaries distributed throughout Kenya and identified 3.6 million putative single-nucleotide polymorphisms. The dense coverage allowed us to apply several computational procedures to study population structure and the evolutionary relationships among the populations, and to detect signs of adaptive evolution across the genome. While there is considerable gene flow among the sampled populations, there are clear distinctions between populations from the northern desert region and those from the temperate, savannah region. We identified several genes showing population genetic patterns consistent with positive selection within African bee populations, and between these populations and European A. mellifera or Asian Apis florea. These results lay the groundwork for future studies of adaptive ecological evolution in honey bees, and demonstrate the use of new, freely available web-based tools and workflows (http://usegalaxy.org/r/kenyanbee) that can be applied to any model system with genomic information.
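Of the tests named in this abstract, FST is the simplest to illustrate. The sketch below uses the textbook heterozygosity form, FST = (HT - HS) / HT, for one biallelic SNP in two equally weighted populations. The study's web-based tools may well use a different estimator with sample-size corrections, and the function name `fst_biallelic` is hypothetical.

```python
def fst_biallelic(p1: float, p2: float) -> float:
    """Wright's FST for one biallelic SNP from the allele frequencies of two
    populations, assuming equal population weights and no sampling correction."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                     # expected heterozygosity, pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    if h_total == 0:
        return 0.0  # monomorphic site: no differentiation to measure
    return (h_total - h_within) / h_total
```

Identical frequencies give 0 and alternately fixed alleles give 1, so scanning this value along the genome highlights SNPs with unusually strong differentiation between populations.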
iPat: intelligent prediction and association tool for genomic research.
Chen, Chunpeng James; Zhang, Zhiwu
2018-06-01
The ultimate goal of genomic research is to effectively predict phenotypes from genotypes so that medical management can improve human health and molecular breeding can increase agricultural production. Genomic prediction or selection (GS) plays a complementary role to genome-wide association studies (GWAS), which are the primary method to identify genes underlying phenotypes. Unfortunately, most computing tools cannot perform data analyses for both GWAS and GS. Furthermore, the majority of these tools are executed through a command-line interface (CLI), which requires programming skills. Non-programmers struggle to use them efficiently because of the steep learning curves and zero tolerance for data formats and mistakes when inputting keywords and parameters. To address these problems, this study developed a software package, named the Intelligent Prediction and Association Tool (iPat), with a user-friendly graphical user interface. With iPat, GWAS or GS can be performed using a pointing device to simply drag and/or click on graphical elements to specify input data files, choose input parameters and select analytical models. Models available to users include those implemented in third party CLI packages such as GAPIT, PLINK, FarmCPU, BLINK, rrBLUP and BGLR. Users can choose any data format and conduct analyses with any of these packages. File conversions are automatically conducted for specified input data and selected packages. A GWAS-assisted genomic prediction method was implemented to perform genomic prediction using any GWAS method such as FarmCPU. iPat was written in Java for adaptation to multiple operating systems including Windows, Mac and Linux. The iPat executable file, user manual, tutorials and example datasets are freely available at http://zzlab.net/iPat. Contact: zhiwu.zhang@wsu.edu.
Shedding genomic light on Aristotle's lantern.
Sodergren, Erica; Shen, Yufeng; Song, Xingzhi; Zhang, Lan; Gibbs, Richard A; Weinstock, George M
2006-12-01
Sea urchins have proved fascinating to biologists since the time of Aristotle, who compared the appearance of their bony mouth structure to a lantern in The History of Animals. Throughout modern times the sea urchin has been a model system for research in developmental biology. Now, the genome of the sea urchin Strongylocentrotus purpuratus is the first echinoderm genome to be sequenced. A high quality draft sequence assembly was produced using the Atlas assembler to combine whole genome shotgun sequences with sequences from a collection of BACs selected to form a minimal tiling path along the genome. A formidable challenge was presented by the high degree of heterozygosity between the two haplotypes of the selected male representative of this marine organism. This was overcome by use of the BAC tiling path backbone, in which each BAC represents a single haplotype, as well as by improvements in the Atlas software. Another innovation introduced in this project was the sequencing of pools of tiling path BACs rather than individual BAC sequencing. The Clone-Array Pooled Shotgun Strategy greatly reduced the cost and time devoted to preparing shotgun libraries from BAC clones. The genome sequence was analyzed with several gene prediction methods to produce a comprehensive gene list that was then manually refined and annotated by a volunteer team of sea urchin experts. This annotation community edited over 9000 gene models and uncovered many unexpected aspects of the sea urchin genetic content impacting transcriptional regulation, immunology, sensory perception, and an organism's development. Analysis of the basic deuterostome genetic complement supports the sea urchin's role as a model system for deuterostome and, by extension, chordate development.
Analyses of pig genomes provide insight into porcine demography and evolution
Groenen, Martien A. M.; Archibald, Alan L.; Uenishi, Hirohide; Tuggle, Christopher K.; Takeuchi, Yasuhiro; Rothschild, Max F.; Rogel-Gaillard, Claire; Park, Chankyu; Milan, Denis; Megens, Hendrik-Jan; Li, Shengting; Larkin, Denis M.; Kim, Heebal; Frantz, Laurent A. F.; Caccamo, Mario; Ahn, Hyeonju; Aken, Bronwen L.; Anselmo, Anna; Anthon, Christian; Auvil, Loretta; Badaoui, Bouabid; Beattie, Craig W.; Bendixen, Christian; Berman, Daniel; Blecha, Frank; Blomberg, Jonas; Bolund, Lars; Bosse, Mirte; Botti, Sara; Bujie, Zhan; Bystrom, Megan; Capitanu, Boris; Silva, Denise Carvalho; Chardon, Patrick; Chen, Celine; Cheng, Ryan; Choi, Sang-Haeng; Chow, William; Clark, Richard C.; Clee, Christopher; Crooijmans, Richard P. M. A.; Dawson, Harry D.; Dehais, Patrice; De Sapio, Fioravante; Dibbits, Bert; Drou, Nizar; Du, Zhi-Qiang; Eversole, Kellye; Fadista, João; Fairley, Susan; Faraut, Thomas; Faulkner, Geoffrey J.; Fowler, Katie E.; Fredholm, Merete; Fritz, Eric; Gilbert, James G. R.; Giuffra, Elisabetta; Gorodkin, Jan; Griffin, Darren K.; Harrow, Jennifer L.; Hayward, Alexander; Howe, Kerstin; Hu, Zhi-Liang; Humphray, Sean J.; Hunt, Toby; Hornshøj, Henrik; Jeon, Jin-Tae; Jern, Patric; Jones, Matthew; Jurka, Jerzy; Kanamori, Hiroyuki; Kapetanovic, Ronan; Kim, Jaebum; Kim, Jae-Hwan; Kim, Kyu-Won; Kim, Tae-Hun; Larson, Greger; Lee, Kyooyeol; Lee, Kyung-Tai; Leggett, Richard; Lewin, Harris A.; Li, Yingrui; Liu, Wansheng; Loveland, Jane E.; Lu, Yao; Lunney, Joan K.; Ma, Jian; Madsen, Ole; Mann, Katherine; Matthews, Lucy; McLaren, Stuart; Morozumi, Takeya; Murtaugh, Michael P.; Narayan, Jitendra; Nguyen, Dinh Truong; Ni, Peixiang; Oh, Song-Jung; Onteru, Suneel; Panitz, Frank; Park, Eung-Woo; Park, Hong-Seog; Pascal, Geraldine; Paudel, Yogesh; Perez-Enciso, Miguel; Ramirez-Gonzalez, Ricardo; Reecy, James M.; Zas, Sandra Rodriguez; Rohrer, Gary A.; Rund, Lauretta; Sang, Yongming; Schachtschneider, Kyle; Schraiber, Joshua G.; Schwartz, John; Scobie, Linda; Scott, Carol; Searle, 
Stephen; Servin, Bertrand; Southey, Bruce R.; Sperber, Goran; Stadler, Peter; Sweedler, Jonathan V.; Tafer, Hakim; Thomsen, Bo; Wali, Rashmi; Wang, Jian; Wang, Jun; White, Simon; Xu, Xun; Yerle, Martine; Zhang, Guojie; Zhang, Jianguo; Zhang, Jie; Zhao, Shuhong; Rogers, Jane; Churcher, Carol; Schook, Lawrence B.
2013-01-01
For 10,000 years pigs and humans have shared a close and complex relationship. From domestication to modern breeding practices, humans have shaped the genomes of domestic pigs. Here we present the assembly and analysis of the genome sequence of a female domestic Duroc pig (Sus scrofa) and a comparison with the genomes of wild and domestic pigs from Europe and Asia. Wild pigs emerged in South East Asia and subsequently spread across Eurasia. Our results reveal a deep phylogenetic split between European and Asian wild boars ~1 million years ago, and a selective sweep analysis indicates selection on genes involved in RNA processing and regulation. Genes associated with immune response and olfaction exhibit fast evolution. Pigs have the largest repertoire of functional olfactory receptor genes, reflecting the importance of smell in this scavenging animal. The pig genome sequence provides an important resource for further improvements of this important livestock species, and our identification of many putative disease-causing variants extends the potential of the pig as a biomedical model. PMID:23151582
Latent feature decompositions for integrative analysis of multi-platform genomic data
Gregory, Karl B.; Momin, Amin A.; Coombes, Kevin R.; Baladandayuthapani, Veerabhadran
2015-01-01
Increased availability of multi-platform genomics data on matched samples has sparked research efforts to discover how diverse molecular features interact both within and between platforms. In addition, simultaneous measurements of genetic and epigenetic characteristics illuminate the roles their complex relationships play in disease progression and outcomes. However, integrative methods for diverse genomics data are faced with the challenges of ultra-high dimensionality and the existence of complex interactions both within and between platforms. We propose a novel modeling framework for integrative analysis based on decompositions of the large number of platform-specific features into a smaller number of latent features. Subsequently we build a predictive model for clinical outcomes accounting for both within- and between-platform interactions based on Bayesian model averaging procedures. Principal components, partial least squares and non-negative matrix factorization as well as sparse counterparts of each are used to define the latent features, and the performance of these decompositions is compared both on real and simulated data. The latent feature interactions are shown to preserve interactions between the original features and not only aid prediction but also allow explicit selection of outcome-related features. The methods are motivated by, and applied to, a glioblastoma multiforme dataset from The Cancer Genome Atlas to predict patient survival times integrating gene expression, microRNA, copy number and methylation data. For the glioblastoma data, we find a high concordance between our selected prognostic genes and genes with known associations with glioblastoma. In addition, our model discovers several relevant cross-platform interactions such as copy number variation associated gene dosing and epigenetic regulation through promoter methylation.
On simulated data, we show that our proposed method successfully incorporates interactions within and between genomic platforms to aid accurate prediction and variable selection. Our methods perform best when principal components are used to define the latent features. PMID:26146492
Toward Genomics-Based Breeding in C3 Cool-Season Perennial Grasses.
Talukder, Shyamal K; Saha, Malay C
2017-01-01
Most important food and feed crops in the world belong to the C3 grass family. The future of food security is highly reliant on achieving genetic gains of those grasses. Conventional breeding methods have already reached a plateau for improving major crops. Genomics tools and resources have opened an avenue to explore genome-wide variability and make use of the variation for enhancing genetic gains in breeding programs. Major C3 annual cereal breeding programs are well equipped with genomic tools; however, genomic research of C3 cool-season perennial grasses is lagging behind. In this review, we discuss the currently available genomics tools and approaches useful for C3 cool-season perennial grass breeding. Along with a general review, we emphasize the discussion focusing on forage grasses that were considered orphan and have little or no genetic information available. Transcriptome sequencing and genotype-by-sequencing technology for genome-wide marker detection using next-generation sequencing (NGS) are very promising as genomics tools. Most C3 cool-season perennial grass members have no prior genetic information; thus NGS technology will enhance collinear study with other C3 model grasses like Brachypodium and rice. Transcriptomics data can be used for identification of functional genes and molecular markers, i.e., polymorphism markers and simple sequence repeats (SSRs). Genome-wide association study with NGS-based markers will facilitate marker identification for marker-assisted selection. With limited genetic information, genomic selection holds great promise to breeders for attaining maximum genetic gain of the cool-season C3 perennial grasses. Application of all these tools can ensure better genetic gains, reduce length of selection cycles, and facilitate cultivar development to meet the future demand for food and fodder.
Genomic Characterisation of the Indigenous Irish Kerry Cattle Breed
Browett, Sam; McHugo, Gillian; Richardson, Ian W.; Magee, David A.; Park, Stephen D. E.; Fahey, Alan G.; Kearney, John F.; Correia, Carolina N.; Randhawa, Imtiaz A. S.; MacHugh, David E.
2018-01-01
Kerry cattle are an endangered landrace heritage breed of cultural importance to Ireland. In the present study we have used genome-wide SNP array data to evaluate genomic diversity within the Kerry population and between Kerry cattle and other European breeds. Patterns of genetic differentiation and gene flow among breeds using phylogenetic trees with ancestry graphs highlighted historical gene flow from the British Shorthorn breed into the ancestral population of modern Kerry cattle. Principal component analysis (PCA) and genetic clustering emphasised the genetic distinctiveness of Kerry cattle relative to comparator British and European cattle breeds. Modelling of genetic effective population size (Ne) revealed a demographic trend of diminishing Ne over time and that recent estimated Ne values for the Kerry breed may be less than the threshold for sustainable genetic conservation. In addition, analysis of genome-wide autozygosity (FROH) showed that genomic inbreeding has increased significantly during the 20 years between 1992 and 2012. Finally, signatures of selection revealed genomic regions subject to natural and artificial selection as Kerry cattle adapted to the climate, physical geography and agro-ecology of southwest Ireland. PMID:29520297
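The genomic inbreeding coefficient FROH mentioned in this abstract is conventionally the fraction of the genome lying in runs of homozygosity (ROH). A minimal sketch of that ratio follows; it assumes ROH segments have already been called and are given as (start, end) base-pair pairs. The 1 Mb minimum length and the function name `froh` are illustrative choices, not values taken from the study.

```python
def froh(roh_segments, genome_length_bp, min_length_bp=1_000_000):
    """Fraction of the genome covered by runs of homozygosity.

    roh_segments: iterable of (start_bp, end_bp) pairs for one animal.
    min_length_bp: shorter runs are ignored (illustrative threshold).
    """
    covered = sum(
        end - start
        for start, end in roh_segments
        if end - start >= min_length_bp
    )
    return covered / genome_length_bp
```

Comparing this value across animals sampled in different years is one way a trend in genomic inbreeding, such as the increase reported here, can be quantified.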
Chromosome-scale selective sweeps shape Caenorhabditis elegans genomic diversity
Andersen, Erik C.; Gerke, Justin P.; Shapiro, Joshua A.; Crissman, Jonathan R.; Ghosh, Rajarshi; Bloom, Joshua S.; Félix, Marie-Anne; Kruglyak, Leonid
2011-01-01
The nematode Caenorhabditis elegans is central to research in molecular, cell, and developmental biology, but nearly all of this research has been conducted on a single strain. Comparatively little is known about the population genomic and evolutionary history of this species. We characterized C. elegans genetic variation by high-throughput selective sequencing of a worldwide collection of 200 wild strains, identifying 41,188 single nucleotide polymorphisms. Unexpectedly, C. elegans genome variation is dominated by a set of commonly shared haplotypes on four of the six chromosomes, each spanning many megabases. Population-genetic modeling shows that this pattern was generated by chromosome-scale selective sweeps that have reduced variation worldwide; at least one of these sweeps likely occurred in the past few hundred years. These sweeps, which we hypothesize to be a result of human activity, have dramatically reshaped the global C. elegans population in the recent past. PMID:22286215
2012-05-01
determined and compared to simian and human herpesvirus genomes representing alpha-herpesviruses, beta-herpesviruses and gamma-1 and gamma-2 her...report the isolation of a previously unknown herpesvirus, JMRV, isolated from acute JME TABLE 2: Clustal W Alignment of JMRV Genome with Select Simian and...to use this model in pre-clinical screens of novel agents with the potential to inhibit MS attacks and to promote remyelination and regeneration
Zhang, Xiaoshuai; Xue, Fuzhong; Liu, Hong; Zhu, Dianwen; Peng, Bin; Wiemels, Joseph L; Yang, Xiaowei
2014-12-10
Genome-wide Association Studies (GWAS) are typically designed to identify phenotype-associated single nucleotide polymorphisms (SNPs) individually using univariate analysis methods. Though providing valuable insights into genetic risks of common diseases, the genetic variants identified by GWAS generally account for only a small proportion of the total heritability for complex diseases. To solve this "missing heritability" problem, we implemented a strategy called integrative Bayesian Variable Selection (iBVS), which is based on a hierarchical model that incorporates an informative prior by considering the gene interrelationship as a network. It was applied here to both simulated and real data sets. Simulation studies indicated that the iBVS method performed best, with the highest AUC in both variable selection and outcome prediction, when compared to Stepwise and LASSO based strategies. In an analysis of a leprosy case-control study, iBVS selected 94 SNPs as predictors, while LASSO selected 100 SNPs. The Stepwise regression yielded a more parsimonious model with only 3 SNPs. The prediction results demonstrated that the iBVS method had performance comparable to that of LASSO and better than that of Stepwise strategies. The proposed iBVS strategy is a novel and valid method for Genome-wide Association Studies, with the additional advantage that it produces more interpretable posterior probabilities for each variable, unlike LASSO and other penalized regression methods.
Coalescence and genetic diversity in sexual populations under selection.
Neher, Richard A; Kessinger, Taylor A; Shraiman, Boris I
2013-09-24
In sexual populations, selection operates neither on the whole genome, which is repeatedly taken apart and reassembled by recombination, nor on individual alleles that are tightly linked to the chromosomal neighborhood. The resulting interference between linked alleles reduces the efficiency of selection and distorts patterns of genetic diversity. Inference of evolutionary history from diversity shaped by linked selection requires an understanding of these patterns. Here, we present a simple but powerful scaling analysis identifying the unit of selection as the genomic "linkage block" with a characteristic length determined in a self-consistent manner by the condition that the rate of recombination within the block is comparable to the fitness differences between different alleles of the block. We find that an asexual model with the strength of selection tuned to that of the linkage block provides an excellent description of genetic diversity and the site frequency spectra compared with computer simulations. This linkage block approximation is accurate for the entire spectrum of strength of selection and is particularly powerful in scenarios with many weakly selected loci. The latter limit allows us to characterize coalescence, genetic diversity, and the speed of adaptation in the infinitesimal model of quantitative genetics.
The Genomic Impacts of Drift and Selection for Hybrid Performance in Maize
Gerke, Justin P.; Edwards, Jode W.; Guill, Katherine E.; Ross-Ibarra, Jeffrey; McMullen, Michael D.
2015-01-01
Although maize is naturally an outcrossing organism, modern breeding utilizes highly inbred lines in controlled crosses to produce hybrids. The U.S. Department of Agriculture’s reciprocal recurrent selection experiment between the Iowa Stiff Stalk Synthetic (BSSS) and the Iowa Corn Borer Synthetic No. 1 (BSCB1) populations represents one of the longest running experiments to understand the response to selection for hybrid performance. To investigate the genomic impact of this selection program, we genotyped the progenitor lines and >600 individuals across multiple cycles of selection using a genome-wide panel of ∼40,000 SNPs. We confirmed previous results showing a steady temporal decrease in genetic diversity within populations and a corresponding increase in differentiation between populations. Thanks to detailed historical information on experimental design, we were able to perform extensive simulations using founder haplotypes to replicate the experiment in the absence of selection. These simulations demonstrate that while most of the observed reduction in genetic diversity can be attributed to genetic drift, heterozygosity in each population has fallen more than expected. We then took advantage of our high-density genotype data to identify extensive regions of haplotype fixation and trace haplotype ancestry to single founder inbred lines. The vast majority of regions showing such evidence of selection differ between the two populations, providing evidence for the dominance model of heterosis. We discuss how this pattern is likely to occur during selection for hybrid performance and how it poses challenges for dissecting the impacts of modern breeding and selection on the maize genome. PMID:26385980
Oilseed rape: learning about ancient and recent polyploid evolution from a recent crop species.
Mason, A S; Snowdon, R J
2016-11-01
Oilseed rape (Brassica napus) is one of our youngest crop species, arising several times under cultivation in the last few thousand years and completely unknown in the wild. Oilseed rape originated from hybridisation events between progenitor diploid species B. rapa and B. oleracea, both important vegetable species. The diploid progenitors are also ancient polyploids, with remnants of two previous polyploidisation events evident in the triplicated genome structure. This history of polyploid evolution and human agricultural selection makes B. napus an excellent model with which to investigate processes of genomic evolution and selection in polyploid crops. The ease of de novo interspecific hybridisation, responsiveness to tissue culture, and the close relationship of oilseed rape to the model plant Arabidopsis thaliana, coupled with the recent availability of reference genome sequences and suites of molecular cytogenetic and high-throughput genotyping tools, allow detailed dissection of genetic, genomic and phenotypic interactions in this crop. In this review we discuss the past and present uses of B. napus as a model for polyploid speciation and evolution in crop species, along with current and developing analysis tools and resources. We further outline unanswered questions that may now be tractable to investigation. © 2016 German Botanical Society and The Royal Botanical Society of the Netherlands.
Montague, Michael J; Li, Gang; Gandolfi, Barbara; Khan, Razib; Aken, Bronwen L; Searle, Steven M J; Minx, Patrick; Hillier, LaDeana W; Koboldt, Daniel C; Davis, Brian W; Driscoll, Carlos A; Barr, Christina S; Blackistone, Kevin; Quilez, Javier; Lorente-Galdos, Belen; Marques-Bonet, Tomas; Alkan, Can; Thomas, Gregg W C; Hahn, Matthew W; Menotti-Raymond, Marilyn; O'Brien, Stephen J; Wilson, Richard K; Lyons, Leslie A; Murphy, William J; Warren, Wesley C
2014-12-02
Little is known about the genetic changes that distinguish domestic cat populations from their wild progenitors. Here we describe a high-quality domestic cat reference genome assembly and comparative inferences made with other cat breeds, wildcats, and other mammals. Based upon these comparisons, we identified positively selected genes enriched for genes involved in lipid metabolism that underpin adaptations to a hypercarnivorous diet. We also found positive selection signals within genes underlying sensory processes, especially those affecting vision and hearing in the carnivore lineage. We observed an evolutionary tradeoff between functional olfactory and vomeronasal receptor gene repertoires in the cat and dog genomes, with an expansion of the feline chemosensory system for detecting pheromones at the expense of odorant detection. Genomic regions harboring signatures of natural selection that distinguish domestic cats from their wild congeners are enriched in neural crest-related genes associated with behavior and reward in mouse models, as predicted by the domestication syndrome hypothesis. Our description of a previously unidentified allele for the gloving pigmentation pattern found in the Birman breed supports the hypothesis that cat breeds experienced strong selection on specific mutations drawn from random bred populations. Collectively, these findings provide insight into how the process of domestication altered the ancestral wildcat genome and build a resource for future disease mapping and phylogenomic studies across all members of the Felidae. PMID:25385592
Prospects and Potential Uses of Genomic Prediction of Key Performance Traits in Tetraploid Potato
Stich, Benjamin; Van Inghelandt, Delphine
2018-01-01
Genomic prediction is a routine tool in breeding programs of most major animal and plant species. However, its usefulness for potato breeding has not yet been evaluated in detail. The objectives of this study were to (i) examine the prospects of genomic prediction of key performance traits in a diversity panel of tetraploid potato modeling additive, dominance, and epistatic effects, (ii) investigate the effects of size and make up of training set, number of test environments and molecular markers on prediction accuracy, and (iii) assess the effect of including markers from candidate genes on the prediction accuracy. With genomic best linear unbiased prediction (GBLUP), BayesA, BayesCπ, and Bayesian LASSO, four different prediction methods were used for genomic prediction of relative area under disease progress curve after a Phytophthora infestans infection, plant maturity, maturity corrected resistance, tuber starch content, tuber starch yield (TSY), and tuber yield (TY) of 184 tetraploid potato clones or subsets thereof genotyped with the SolCAP 8.3k SNP array. The cross-validated prediction accuracies with GBLUP and the three Bayesian approaches for the six evaluated traits ranged from about 0.5 to about 0.8. For traits with a high expected genetic complexity, such as TSY and TY, we observed an 8% higher prediction accuracy using a model with additive and dominance effects compared with a model with additive effects only. Our results suggest that for oligogenic traits in general and when diagnostic markers are available in particular, the use of Bayesian methods for genomic prediction is highly recommended and that the diagnostic markers should be modeled as fixed effects. The evaluation of the relative performance of genomic prediction vs. phenotypic selection indicated that the former is superior, assuming cycle lengths and selection intensities that are possible to realize in commercial potato breeding programs. PMID:29563919
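To make "cross-validated prediction accuracy" concrete: it is commonly reported as the Pearson correlation between predicted and observed values for individuals held out of model training. The abstract does not state the exact definition the authors used, so the sketch below is only an assumed, generic version. The trait values are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical held-out fold: observed phenotypes vs. genomic predictions.
observed = [10.0, 12.0, 9.0, 15.0]
predicted = [11.0, 12.5, 9.5, 13.0]
accuracy = pearson_r(observed, predicted)
print(round(accuracy, 3))
```

In a k-fold scheme this correlation would be computed once per held-out fold and then averaged across folds.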
SENCA: A Multilayered Codon Model to Study the Origins and Dynamics of Codon Usage
Pouyet, Fanny; Bailly-Bechet, Marc; Mouchiroud, Dominique; Guéguen, Laurent
2016-01-01
Gene sequences are the target of evolution operating at different levels, including the nucleotide, codon, and amino acid levels. Disentangling the impact of those different levels on gene sequences requires developing a probabilistic model with three layers. Here we present SENCA (site evolution of nucleotides, codons, and amino acids), a codon substitution model that separately describes 1) nucleotide processes which apply on all sites of a sequence such as the mutational bias, 2) preferences between synonymous codons, and 3) preferences among amino acids. We argue that most synonymous substitutions are not neutral and that SENCA provides more accurate estimates of selection compared with more classical codon sequence models. We study the forces that drive the genomic content evolution, intraspecifically in the core genome of 21 prokaryotes and interspecifically for five Enterobacteria. We retrieve the existence of a universal mutational bias toward AT, and that taking into account selection on synonymous codon usage has consequences on the measurement of selection on nonsynonymous substitutions. We also confirm that codon usage bias is mostly driven by selection on preferred codons. We propose new summary statistics to measure the relative importance of the different evolutionary processes acting on sequences. PMID:27401173
Ghosh, Soma; Prava, Jyoti; Samal, Himanshu Bhusan; Suar, Mrutyunjay; Mahapatra, Rajani Kanta
2014-06-01
The increasing emergence of antibiotic-resistant pathogenic microorganisms is one of the biggest challenges in disease management. In the present study, comparative genomics, metabolic pathway analysis and additional parameters were used to identify 94 non-homologous essential proteins in the Staphylococcus aureus genome. Further study prioritized 19 proteins as vaccine candidates, whereas a druggability study reported 34 proteins as suitable drug targets. Enzymes from peptidoglycan biosynthesis and folate biosynthesis were identified as candidates for drug development. Furthermore, bacterial secretory proteins and a few hypothetical proteins identified in our analysis fulfill the criteria for vaccine candidates. As a case study, we built a homology model of one of the potential drug targets, MurA ligase, using MODELLER (9v12) software. The model was then selected for an in silico docking study with inhibitors from the DrugBank database. Results from this study could facilitate selection of proteins for entry into drug design and vaccine production pipelines. Copyright © 2014 Elsevier B.V. All rights reserved.
Genomic selection in plant breeding.
Newell, Mark A; Jannink, Jean-Luc
2014-01-01
Genomic selection (GS) is a method to predict the genetic value of selection candidates based on the genomic estimated breeding value (GEBV) predicted from high-density markers positioned throughout the genome. Unlike marker-assisted selection, the GEBV is based on all markers including both minor and major marker effects. Thus, the GEBV may capture more of the genetic variation for the particular trait under selection.
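The idea that a GEBV sums the contributions of all markers can be sketched as follows. This is a minimal illustration under an assumed additive model in which each genotype is coded as the count (0, 1 or 2) of one allele and marker effects have already been estimated. The effect values, line names and function name are hypothetical.

```python
def gebv(genotypes, marker_effects):
    """Genomic estimated breeding value under a simple additive model.

    genotypes: allele counts (0, 1 or 2) for one candidate, one per marker.
    marker_effects: estimated effect per allele copy, one per marker.
    """
    if len(genotypes) != len(marker_effects):
        raise ValueError("one effect is required per marker")
    return sum(g * a for g, a in zip(genotypes, marker_effects))


# Hypothetical effects: one larger and several small marker effects.
effects = [0.5, -0.2, 0.1, 0.05]
candidates = {
    "line_A": [2, 0, 1, 2],
    "line_B": [0, 2, 1, 0],
}

# Rank selection candidates from highest to lowest GEBV.
ranked = sorted(candidates, key=lambda name: gebv(candidates[name], effects), reverse=True)
print(ranked)
```

Because every marker contributes to the sum, small-effect markers shift the ranking alongside large-effect ones, which is the contrast with marker-assisted selection drawn in the abstract.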
A genomic perspective on the generation and maintenance of genetic diversity in herbivorous insects
Gloss, Andrew D.; Groen, Simon C.; Whiteman, Noah K.
2017-01-01
Understanding the processes that generate and maintain genetic variation within populations is a central goal in evolutionary biology. Theory predicts that some of this variation is maintained as a consequence of adapting to variable habitats. Studies in herbivorous insects have played a key role in confirming this prediction. Here, we highlight theoretical and conceptual models for the maintenance of genetic diversity in herbivorous insects, empirical genomic studies testing these models, and pressing questions within the realm of evolutionary and functional genomic studies. To address key gaps, we propose an integrative approach combining population genomic scans for adaptation, genome-wide characterization of targets of selection through experimental manipulations, mapping the genetic architecture of traits influencing fitness, and functional studies. We also stress the importance of studying the maintenance of genetic variation across biological scales—from variation within populations to divergence among populations—to form a comprehensive view of adaptation in herbivorous insects. PMID:28736510
Algorithmic methods to infer the evolutionary trajectories in cancer progression
Graudenzi, Alex; Ramazzotti, Daniele; Sanz-Pamplona, Rebeca; De Sano, Luca; Mauri, Giancarlo; Moreno, Victor; Antoniotti, Marco; Mishra, Bud
2016-01-01
The genomic evolution inherent to cancer relates directly to a renewed focus on the voluminous next-generation sequencing data and machine learning for the inference of explanatory models of how the (epi)genomic events are choreographed in cancer initiation and development. However, despite the increasing availability of multiple additional -omics data, this quest has been frustrated by various theoretical and technical hurdles, mostly stemming from the dramatic heterogeneity of the disease. In this paper, we build on our recent work on the “selective advantage” relation among driver mutations in cancer progression and investigate its applicability to the modeling problem at the population level. Here, we introduce PiCnIc (Pipeline for Cancer Inference), a versatile, modular, and customizable pipeline to extract ensemble-level progression models from cross-sectional sequenced cancer genomes. The pipeline has many translational implications because it combines state-of-the-art techniques for sample stratification, driver selection, identification of fitness-equivalent exclusive alterations, and progression model inference. We demonstrate PiCnIc’s ability to reproduce much of the current knowledge on colorectal cancer progression as well as to suggest novel experimentally verifiable hypotheses. PMID:27357673
Introgression of a Block of Genome Under Infinitesimal Selection.
Sachdeva, Himani; Barton, Nicholas H
2018-06-12
Adaptive introgression is common in nature and can be driven by selection acting on multiple, linked genes. We explore the effects of polygenic selection on introgression under the infinitesimal model with linkage. This model assumes that the introgressing block has an effectively infinite number of loci, each with an infinitesimal effect on the trait under selection. The block is assumed to introgress under directional selection within a native population that is genetically homogeneous. We use individual-based simulations and a branching process approximation to compute various statistics of the introgressing block, and explore how these depend on parameters such as the map length and initial trait value associated with the introgressing block, the genetic variability along the block, and the strength of selection. Our results show that the introgression dynamics of a block under infinitesimal selection are qualitatively different from the dynamics of neutral introgression. We also find that in the long run, surviving descendant blocks are likely to have intermediate lengths, and clarify how their length is shaped by the interplay between linkage and infinitesimal selection. Our results suggest that it may be difficult to distinguish the long-term introgression of a block of genome with a single strongly selected locus from the introgression of a block with multiple, tightly linked and weakly selected loci. Copyright © 2018, Genetics.
Gutierrez, Jahir M; Lewis, Nathan E
2015-07-01
Eukaryotic cell lines, including Chinese hamster ovary cells, yeast, and insect cells, are invaluable hosts for the production of many recombinant proteins. With the advent of genomic resources, one can now leverage genome-scale computational modeling of cellular pathways to rationally engineer eukaryotic host cells. Genome-scale models of metabolism include all known biochemical reactions occurring in a specific cell. By describing these mathematically and using tools such as flux balance analysis, the models can simulate cell physiology and provide targets for cell engineering that could lead to enhanced cell viability, titer, and productivity. Here we review examples in which metabolic models in eukaryotic cell cultures have been used to rationally select targets for genetic modification, improve cellular metabolic capabilities, design media supplementation, and interpret high-throughput omics data. As more comprehensive models of metabolism and other cellular processes are developed for eukaryotic cell culture, these will enable further exciting developments in cell line engineering, thus accelerating recombinant protein production and biotechnology in the years to come. Copyright © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Lefébure, Tristan; Stanhope, Michael J
2007-01-01
Background: The genus Streptococcus is one of the most diverse and important human and agricultural pathogens. This study employs comparative evolutionary analyses of 26 Streptococcus genomes to yield an improved understanding of the relative roles of recombination and positive selection in pathogen adaptation to their hosts. Results: Streptococcus genomes exhibit extreme levels of evolutionary plasticity, with high levels of gene gain and loss during species and strain evolution. S. agalactiae has a large pan-genome, with little recombination in its core-genome, while S. pyogenes has a smaller pan-genome and much more recombination of its core-genome, perhaps reflecting the greater habitat, and gene pool, diversity for S. agalactiae compared to S. pyogenes. Core-genome recombination was evident in all lineages (18% to 37% of the core-genome judged to be recombinant), while positive selection was mainly observed during species differentiation (from 11% to 34% of the core-genome). Positive selection pressure was unevenly distributed across lineages and biochemical main role categories. S. suis was the lineage with the greatest level of positive selection pressure, the largest number of unique loci selected, and the largest amount of gene gain and loss. Conclusion: Recombination is an important evolutionary force in shaping Streptococcus genomes, not only in the acquisition of significant portions of the genome as lineage specific loci, but also in facilitating rapid evolution of the core-genome. Positive selection, although undoubtedly a slower process, has nonetheless played an important role in adaptation of the core-genome of different Streptococcus species to different hosts. PMID:17475002
Inference of Gorilla Demographic and Selective History from Whole-Genome Sequence Data
McManus, Kimberly F.; Kelley, Joanna L.; Song, Shiya; Veeramah, Krishna R.; Woerner, August E.; Stevison, Laurie S.; Ryder, Oliver A.; Great Ape Genome Project; Kidd, Jeffrey M.; Wall, Jeffrey D.; Bustamante, Carlos D.; Hammer, Michael F.
2015-01-01
Although population-level genomic sequence data have been gathered extensively for humans, similar data from our closest living relatives are just beginning to emerge. Examination of genomic variation within great apes offers many opportunities to increase our understanding of the forces that have differentially shaped the evolutionary history of hominid taxa. Here, we expand upon the work of the Great Ape Genome Project by analyzing medium to high coverage whole-genome sequences from 14 western lowland gorillas (Gorilla gorilla gorilla), 2 eastern lowland gorillas (G. beringei graueri), and a single Cross River individual (G. gorilla diehli). We infer that the ancestors of western and eastern lowland gorillas diverged from a common ancestor approximately 261 ka, and that the ancestors of the Cross River population diverged from the western lowland gorilla lineage approximately 68 ka. Using a diffusion approximation approach to model the genome-wide site frequency spectrum, we infer a history of western lowland gorillas that includes an ancestral population expansion of 1.4-fold around 970 ka and a recent 5.6-fold contraction in population size 23 ka. The latter may correspond to a major reduction in African equatorial forests around the Last Glacial Maximum. We also analyze patterns of variation among western lowland gorillas to identify several genomic regions with strong signatures of recent selective sweeps. We find that processes related to taste, pancreatic and saliva secretion, sodium ion transmembrane transport, and cardiac muscle function are overrepresented in genomic regions predicted to have experienced recent positive selection. PMID:25534031
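For readers unfamiliar with the site frequency spectrum (SFS) referred to in this abstract, the sketch below tabulates an observed, unfolded SFS from per-site derived-allele counts. It only builds the summary statistic and does not implement the diffusion approximation used for demographic inference. The input counts and function name are hypothetical.

```python
from collections import Counter


def site_frequency_spectrum(derived_counts, n_chromosomes):
    """Unfolded SFS: entry k-1 is the number of sites with k derived copies.

    derived_counts: derived-allele count at each site in the sample.
    n_chromosomes: number of sampled chromosomes (2 x diploid individuals).
    Sites that are monomorphic in the sample (0 or n copies) are skipped.
    """
    counts = Counter(c for c in derived_counts if 0 < c < n_chromosomes)
    return [counts.get(k, 0) for k in range(1, n_chromosomes)]


# Hypothetical sample of 3 diploid individuals (6 chromosomes), 8 sites.
sfs = site_frequency_spectrum([1, 1, 2, 5, 1, 3, 0, 6], 6)
print(sfs)  # → [3, 1, 1, 0, 1]
```

Demographic models are then fitted by comparing an expected spectrum under candidate population-size histories with an observed vector of this kind.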
Campos, José L.; Halligan, Daniel L.; Haddrill, Penelope R.; Charlesworth, Brian
2014-01-01
Genetic recombination associated with sexual reproduction increases the efficiency of natural selection by reducing the strength of Hill–Robertson interference. Such interference can be caused either by selective sweeps of positively selected alleles or by background selection (BGS) against deleterious mutations. Its consequences can be studied by comparing patterns of molecular evolution and variation in genomic regions with different rates of crossing over. We carried out a comprehensive study of the benefits of recombination in Drosophila melanogaster, both by contrasting five independent genomic regions that lack crossing over with the rest of the genome and by comparing regions with different rates of crossing over, using data on DNA sequence polymorphisms from an African population that is geographically close to the putatively ancestral population for the species, and on sequence divergence from a related species. We observed reductions in sequence diversity in noncrossover (NC) regions that are inconsistent with the effects of hard selective sweeps in the absence of recombination. Overall, the observed patterns suggest that the recombination rate experienced by a gene is positively related to an increase in the efficiency of both positive and purifying selection. The results are consistent with a BGS model with interference among selected sites in NC regions, and joint effects of BGS, selective sweeps, and a past population expansion on variability in regions of the genome that experience crossing over. In such crossover regions, the X chromosome exhibits a higher rate of adaptive protein sequence evolution than the autosomes, implying a Faster-X effect. PMID:24489114
Upweighting rare favourable alleles increases long-term genetic gain in genomic selection programs.
Liu, Huiming; Meuwissen, Theo H E; Sørensen, Anders C; Berg, Peer
2015-03-21
The short-term impact of using different genomic prediction (GP) models in genomic selection has been intensively studied, but their long-term impact is poorly understood. Furthermore, long-term genetic gain of genomic selection is expected to improve by using Jannink's weighting (JW) method, in which rare favourable marker alleles are upweighted in the selection criterion. In this paper, we extend the JW method by including an additional parameter to decrease the emphasis on rare favourable alleles over the time horizon, with the purpose of further improving the long-term genetic gain. We call this new method dynamic weighting (DW). The paper explores the long-term impact of different GP models with or without weighting methods. Different selection criteria were tested by simulating a population of 500 animals with truncation selection of five males and 50 females. Selection criteria included unweighted and weighted genomic estimated breeding values using the JW or DW methods, for which ridge regression (RR) and Bayesian lasso (BL) were used to estimate marker effects. The impacts of these selection criteria were compared under three genetic architectures, i.e. varying numbers of QTL for the trait and for two time horizons of 15 (TH15) or 40 (TH40) generations. For unweighted GP, BL resulted in up to 21.4% higher long-term genetic gain and 23.5% lower rate of inbreeding under TH40 than RR. For weighted GP, DW resulted in 1.3 to 5.5% higher long-term gain compared to unweighted GP. JW, however, showed a 6.8% lower long-term genetic gain relative to unweighted GP when BL was used to estimate the marker effects. Under TH40, both DW and JW obtained significantly higher genetic gain than unweighted GP. With DW, the long-term genetic gain was increased by up to 30.8% relative to unweighted GP, and also increased by 8% relative to JW, although at the expense of a lower short-term gain. 
Irrespective of the number of QTL simulated, BL is superior to RR in maintaining genetic variance and therefore results in higher long-term genetic gain. Moreover, DW is a promising method with which high long-term genetic gain can be expected within a fixed time frame.
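The weighting idea in the abstract above can be illustrated with a minimal sketch. It assumes the commonly cited form of Jannink's weighting, in which each marker effect is divided by the square root of the frequency of its favourable allele; the abstract does not give the formula, and the extra time-horizon parameter of the DW method is not reproduced here. The function names and toy numbers are hypothetical.

```python
from math import sqrt


def unweighted_gebv(genotypes, effects):
    """Plain GEBV: sum of allele counts (0, 1, 2) times estimated marker effects."""
    return sum(g * a for g, a in zip(genotypes, effects))


def jw_weighted_gebv(genotypes, effects, freqs):
    """GEBV with rare favourable alleles upweighted (assumed form: effect / sqrt(p_fav)).

    genotypes: copies of the reference allele carried at each marker (0, 1 or 2)
    effects:   estimated effect of the reference allele at each marker
    freqs:     population frequency of the reference allele at each marker
    """
    score = 0.0
    for g, a, p in zip(genotypes, effects, freqs):
        # The favourable allele is the reference allele when its effect is positive,
        # otherwise the alternative allele.
        p_fav = p if a >= 0 else 1.0 - p
        p_fav = max(p_fav, 1e-6)  # guard against division by zero at fixed loci
        score += g * a / sqrt(p_fav)
    return score


# Toy example: two markers with equal effects; the favourable allele is common
# at the first marker (p = 0.9) and rare at the second (p = 0.1).
effects = [1.0, 1.0]
freqs = [0.9, 0.1]
animal_common = [2, 0]  # carries only the common favourable allele
animal_rare = [0, 2]    # carries only the rare favourable allele

print(unweighted_gebv(animal_common, effects), unweighted_gebv(animal_rare, effects))
print(jw_weighted_gebv(animal_common, effects, freqs),
      jw_weighted_gebv(animal_rare, effects, freqs))
```

The two animals tie on the unweighted criterion, but the carrier of the rare favourable allele ranks higher once weights are applied, which is the mechanism by which such weighting is expected to retain rare favourable alleles.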
USDA-ARS's Scientific Manuscript database
Bacterial cold water disease (BCWD) causes significant economic losses in salmonid aquaculture. At the National Center for Cool and Cold Water Aquaculture (NCCCWA), we have pursued selective breeding to increase rainbow trout genetic resistance against BCWD and found that post-challenge survival is ...
Evolutionary maintenance of filovirus-like genes in bat genomes
2011-01-01
Background: Little is known of the biological significance and evolutionary maintenance of integrated non-retroviral RNA virus genes in eukaryotic host genomes. Here, we isolated novel filovirus-like genes from bat genomes and tested for evolutionary maintenance. We also estimated the age of filovirus VP35-like gene integrations and tested the phylogenetic hypotheses that there is a eutherian mammal clade and a marsupial/ebolavirus/Marburgvirus dichotomy for filoviruses. Results: We detected homologous copies of VP35-like and NP-like gene integrations in both Old World and New World species of Myotis (bats). We also detected previously unknown VP35-like genes in rodents that are positionally homologous. Comprehensive phylogenetic estimates for filovirus NP-like and VP35-like loci support two main clades with a marsupial and a rodent grouping within the ebolavirus/Lloviu virus/Marburgvirus clade. The concordance of VP35-like, NP-like and mitochondrial gene trees with the expected species tree supports the notion that the copies we examined are orthologs that predate the global spread and radiation of the genus Myotis. Parametric simulations were consistent with selective maintenance for the open reading frame (ORF) of VP35-like genes in Myotis. The ORF of the filovirus-like VP35 gene has been maintained in bat genomes for an estimated 13.4 MY. ORFs were disrupted for the NP-like genes in Myotis. Likelihood ratio tests revealed that a model that accommodates positive selection is a significantly better fit to the data than a model that does not allow for positive selection for VP35-like sequences. Moreover, site-by-site analysis of selection using two methods indicated at least 25 sites in the VP35-like alignment are under positive selection in Myotis. Conclusions: Our results indicate that filovirus-like elements have significance beyond genomic imprints of prior infection.
That is, there appears to be, or have been, functionally maintained copies of such genes in mammals. "Living fossils" of filoviruses appear to be selectively maintained in a diverse mammalian genus (Myotis). PMID:22093762
Riginos, Cynthia; Crandall, Eric D.; Liggins, Libby; Bongaerts, Pim; Treml, Eric A.
2016-01-01
Population genomic approaches are making rapid inroads in the study of non-model organisms, including marine taxa. To date, these marine studies have predominantly focused on rudimentary metrics describing the spatial and environmental context of their study region (e.g., geographical distance, average sea surface temperature, average salinity). We contend that a more nuanced and considered approach to quantifying seascape dynamics and patterns can strengthen population genomic investigations and help identify spatial, temporal, and environmental factors associated with differing selective regimes or demographic histories. Nevertheless, approaches for quantifying marine landscapes are complicated. Characteristic features of the marine environment, including pelagic living in flowing water (experienced by most marine taxa at some point in their life cycle), require a well-designed spatial-temporal sampling strategy and analysis. Many genetic summary statistics used to describe populations may be inappropriate for marine species with large population sizes, large species ranges, stochastic recruitment, and asymmetrical gene flow. Finally, statistical approaches for testing associations between seascapes and population genomic patterns are still maturing with no single approach able to capture all relevant considerations. None of these issues are completely unique to marine systems and therefore similar issues and solutions will be shared for many organisms regardless of habitat. Here, we outline goals and spatial approaches for landscape genomics with an emphasis on marine systems and review the growing empirical literature on seascape genomics. We review established tools and approaches and highlight promising new strategies to overcome select issues including a strategy to spatially optimize sampling. 
Despite the many challenges, we argue that marine systems may be especially well suited for identifying candidate genomic regions under environmentally mediated selection and that seascape genomic approaches are especially useful for identifying robust locus-by-environment associations. PMID:29491947
Hymenoptera Genome Database: integrating genome annotations in HymenopteraMine.
Elsik, Christine G; Tayal, Aditi; Diesh, Colin M; Unni, Deepak R; Emery, Marianne L; Nguyen, Hung N; Hagen, Darren E
2016-01-04
We report an update of the Hymenoptera Genome Database (HGD) (http://HymenopteraGenome.org), a model organism database for insect species of the order Hymenoptera (ants, bees and wasps). HGD maintains genomic data for 9 bee species, 10 ant species and 1 wasp, including the versions of genome and annotation data sets published by the genome sequencing consortiums and those provided by NCBI. A new data-mining warehouse, HymenopteraMine, based on the InterMine data warehousing system, integrates the genome data with data from external sources and facilitates cross-species analyses based on orthology. New genome browsers and annotation tools based on JBrowse/WebApollo provide easy genome navigation, and viewing of high throughput sequence data sets and can be used for collaborative genome annotation. All of the genomes and annotation data sets are combined into a single BLAST server that allows users to select and combine sequence data sets to search. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Genetic Variation in the Acorn Barnacle from Allozymes to Population Genomics
Flight, Patrick A.; Rand, David M.
2012-01-01
Understanding the patterns of genetic variation within and among populations is a central problem in population and evolutionary genetics. We examine this question in the acorn barnacle, Semibalanus balanoides, in which the allozyme loci Mpi and Gpi have been implicated in balancing selection due to varying selective pressures at different spatial scales. We review the patterns of genetic variation at the Mpi locus, compare this to levels of population differentiation at mtDNA and microsatellites, and place these data in the context of genome-wide variation from high-throughput sequencing of population samples spanning the North Atlantic. Despite considerable geographic variation in the patterns of selection at the Mpi allozyme, this locus shows rather low levels of population differentiation at ecological and trans-oceanic scales (FST ∼ 5%). Pooled population sequencing was performed on samples from Rhode Island (RI), Maine (ME), and Southwold, England (UK). Analysis of more than 650 million reads identified approximately 335,000 high-quality SNPs in 19 million base pairs of the S. balanoides genome. Much variation is shared across the Atlantic, but there are significant examples of strong population differentiation among samples from RI, ME, and UK. An FST outlier screen of more than 22,000 contigs provided a genome-wide context for interpretation of earlier studies on allozymes, mtDNA, and microsatellites. FST values for allozymes, mtDNA and microsatellites are close to the genome-wide average for random SNPs, with the exception of the trans-Atlantic FST for mtDNA. The majority of FST outliers were unique between individual pairs of populations, but some genes show shared patterns of excess differentiation. These data indicate that gene flow is high, that selection is strong on a subset of genes, and that a variety of genes are experiencing diversifying selection at large spatial scales. This survey of polymorphism in S. balanoides provides a number of genomic tools that promise to make this a powerful model for ecological genomics of the rocky intertidal. PMID:22767487
Lessons learned from the dog genome.
Wayne, Robert K; Ostrander, Elaine A
2007-11-01
Extensive genetic resources and a high-quality genome sequence position the dog as an important model species for understanding genome evolution, population genetics and genes underlying complex phenotypic traits. Newly developed genomic resources have expanded our understanding of canine evolutionary history and dog origins. Domestication involved genetic contributions from multiple populations of gray wolves probably through backcrossing. More recently, the advent of controlled breeding practices has segregated genetic variability into distinct dog breeds that possess specific phenotypic traits. Consequently, genome-wide association and selective sweep scans now allow the discovery of genes underlying breed-specific characteristics. The dog is finally emerging as a novel resource for studying the genetic basis of complex traits, including behavior.
USDA-ARS's Scientific Manuscript database
Genomic selection (GS) simultaneously incorporates dense SNP marker genotypes with phenotypic data from related animals to predict animal-specific genomic breeding value (GEBV), which circumvents the need to measure the disease phenotype in potential breeders. Marker assisted selection (MAS) involv...
Zueva, Ksenia J; Lumme, Jaakko; Veselov, Alexey E; Kent, Matthew P; Primmer, Craig R
2018-06-01
Understanding the genomic basis of host-parasite adaptation is important for predicting the long-term viability of species and developing successful management practices. However, in wild populations, identifying specific signatures of parasite-driven selection often presents a challenge, as it is difficult to unravel the molecular signatures of selection driven by different, but correlated, environmental factors. Furthermore, separating parasite-mediated selection from similar signatures due to genetic drift and population history can also be difficult. Populations of Atlantic salmon (Salmo salar L.) from northern Europe have pronounced differences in their reactions to the parasitic flatworm Gyrodactylus salaris Malmberg 1957 and are therefore a good model to search for specific genomic regions underlying inter-population differences in pathogen response. We used a dense Atlantic salmon SNP array, along with extensive sampling of 43 salmon populations representing the two G. salaris response extremes (extreme susceptibility vs resistant), to screen the salmon genome for signatures of directional selection while attempting to separate the parasite effect from other factors. After combining the results from two independent genome scan analyses, 57 candidate genes potentially under positive selection were identified, out of which 50 were functionally annotated. This candidate gene set was shown to be functionally enriched for lymph node development, focal adhesion genes and anti-viral response, which suggests that the regulation of both innate and acquired immunity might be an important mechanism for salmon response to G. salaris. Overall, our results offer insights into the apparently complex genetic basis of pathogen susceptibility in salmon and highlight methodological challenges for separating the effects of various environmental factors. Copyright © 2018 Elsevier B.V. All rights reserved.
Esfandyari, Hadi; Sørensen, Anders Christian; Bijma, Piter
2015-09-29
Breeding goals in a crossbreeding system should be defined at the commercial crossbred level. However, selection is often performed to improve purebred performance. A genomic selection (GS) model that includes dominance effects can be used to select purebreds for crossbred performance. Optimization of the GS model raises the question of whether marker effects should be estimated from data on the pure lines or crossbreds. Therefore, the first objective of this study was to compare response to selection of crossbreds by simulating a two-way crossbreeding program with either a purebred or a crossbred training population. We assumed a trait of interest that was controlled by loci with additive and dominance effects. Animals were selected on estimated breeding values for crossbred performance. There was no genotype by environment interaction. Linkage phase and strength of linkage disequilibrium between quantitative trait loci (QTL) and single nucleotide polymorphisms (SNPs) can differ between breeds, which causes apparent effects of SNPs to be line-dependent. Thus, our second objective was to compare response to GS based on crossbred phenotypes when the line origin of alleles was taken into account or not in the estimation of breeding values. Training on crossbred animals yielded a larger response to selection in crossbred offspring compared to training on both pure lines separately or on both pure lines combined into a single reference population. Response to selection in crossbreds was larger if both phenotypes and genotypes were collected on crossbreds than if phenotypes were only recorded on crossbreds and genotypes on their parents. If both parental lines were distantly related, tracing the line origin of alleles improved genomic prediction, whereas if both parental lines were closely related and the reference population was small, it was better to ignore the line origin of alleles. 
Response to selection in crossbreeding programs can be increased by training on crossbred genotypes and phenotypes. Moreover, if the reference population is sufficiently large and both pure lines are not very closely related, tracing the line origin of alleles in crossbreds improves genomic prediction.
Distance from sub-Saharan Africa predicts mutational load in diverse human genomes.
Henn, Brenna M; Botigué, Laura R; Peischl, Stephan; Dupanloup, Isabelle; Lipatov, Mikhail; Maples, Brian K; Martin, Alicia R; Musharoff, Shaila; Cann, Howard; Snyder, Michael P; Excoffier, Laurent; Kidd, Jeffrey M; Bustamante, Carlos D
2016-01-26
The Out-of-Africa (OOA) dispersal ∼ 50,000 y ago is characterized by a series of founder events as modern humans expanded into multiple continents. Population genetics theory predicts an increase of mutational load in populations undergoing serial founder effects during range expansions. To test this hypothesis, we have sequenced full genomes and high-coverage exomes from seven geographically divergent human populations from Namibia, Congo, Algeria, Pakistan, Cambodia, Siberia, and Mexico. We find that individual genomes vary modestly in the overall number of predicted deleterious alleles. We show via spatially explicit simulations that the observed distribution of deleterious allele frequencies is consistent with the OOA dispersal, particularly under a model where deleterious mutations are recessive. We conclude that there is a strong signal of purifying selection at conserved genomic positions within Africa, but that many predicted deleterious mutations have evolved as if they were neutral during the expansion out of Africa. Under a model where selection is inversely related to dominance, we show that OOA populations are likely to have a higher mutation load due to increased allele frequencies of nearly neutral variants that are recessive or partially recessive.
Genome-wide selection components analysis in a fish with male pregnancy.
Flanagan, Sarah P; Jones, Adam G
2017-04-01
A major goal of evolutionary biology is to identify the genome-level targets of natural and sexual selection. With the advent of next-generation sequencing, whole-genome selection components analysis provides a promising avenue in the search for loci affected by selection in nature. Here, we implement a genome-wide selection components analysis in the sex-role-reversed Gulf pipefish, Syngnathus scovelli. Our approach involves a double-digest restriction-site associated DNA sequencing (ddRAD-seq) technique, applied to adult females, nonpregnant males, pregnant males, and their offspring. An FST comparison of allele frequencies among these groups reveals 47 genomic regions putatively experiencing sexual selection, as well as 468 regions showing a signature of differential viability selection between males and females. A complementary likelihood ratio test identifies similar patterns in the data as the FST analysis. Sexual selection and viability selection both tend to favor the rare alleles in the population. Ultimately, we conclude that genome-wide selection components analysis can be a useful tool to complement other approaches in the effort to pinpoint genome-level targets of selection in the wild. © 2017 The Author(s). Evolution © 2017 The Society for the Study of Evolution.
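To make the FST comparison of allele frequencies concrete, here is a minimal sketch of Wright's FST for one biallelic locus and two equally weighted groups (for example, pregnant versus nonpregnant males). This is the textbook heterozygosity-based definition, not necessarily the estimator used by the authors, and the function name is hypothetical.

```python
def fst_biallelic(p1, p2):
    """Wright's FST at one biallelic locus for two equally weighted groups.

    p1, p2: frequency of the same allele in group 1 and group 2.
    FST = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the
    pooled groups and H_S is the mean within-group expected heterozygosity.
    """
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        # Locus is monomorphic across both groups: no differentiation to measure.
        return 0.0
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_t - h_s) / h_t


print(fst_biallelic(0.5, 0.5))  # identical frequencies
print(fst_biallelic(0.2, 0.8))  # strongly diverged frequencies
```

In a genome scan, a value like this would be computed per locus, and loci with unusually high values between life-history groups would be flagged as candidates.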
Lin, Zibei; Shi, Fan; Hayes, Ben J; Daetwyler, Hans D
2017-05-01
Heuristic genomic inbreeding controls reduce inbreeding in genomic breeding schemes without reducing genetic gain. Genomic selection is increasingly being implemented in plant breeding programs to accelerate genetic gain of economically important traits. However, it may cause significant loss of genetic diversity when compared with traditional schemes using phenotypic selection. We propose heuristic strategies to control the rate of inbreeding in outbred plants, which can be categorised into three types: controls during mate allocation, during selection, and simultaneous selection and mate allocation. The proposed mate allocation measure GminF allocates two or more parents for mating in mating groups that minimise coancestry using a genomic relationship matrix. Two types of relationship-adjusted genomic breeding values for parent selection candidates ([Formula: see text]) and potential offspring ([Formula: see text]) are devised to control inbreeding during selection and even enabling simultaneous selection and mate allocation. These strategies were tested in a case study using a simulated perennial ryegrass breeding scheme. As compared to the genomic selection scheme without controls, all proposed strategies could significantly decrease inbreeding while achieving comparable genetic gain. In particular, the scenario using [Formula: see text] in simultaneous selection and mate allocation reduced inbreeding to one-third of the original genomic selection scheme. The proposed strategies are readily applicable in any outbred plant breeding program.
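The abstract does not spell out how GminF forms mating groups, so the following is only a hedged illustration of the general idea of allocating mates to minimise coancestry from a genomic relationship matrix. It uses a simple greedy pairing in which each parent is used at most once; both the greedy rule and the single-use constraint are assumptions, and the function name and toy matrix are hypothetical.

```python
def greedy_min_coancestry_pairs(G, n_pairs):
    """Greedily pick mating pairs with the lowest genomic relationship.

    G:       symmetric genomic relationship matrix as a list of lists
    n_pairs: number of mating pairs to form
    Each parent is used at most once (an assumption of this sketch).
    """
    n = len(G)
    # All candidate pairs, sorted from least to most related.
    candidates = sorted((G[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    used, pairs = set(), []
    for _, i, j in candidates:
        if len(pairs) == n_pairs:
            break
        if i in used or j in used:
            continue
        pairs.append((i, j))
        used.update((i, j))
    return pairs


# Toy relationship matrix for four selected parents.
G = [
    [1.00, 0.50, 0.00, 0.10],
    [0.50, 1.00, 0.20, 0.05],
    [0.00, 0.20, 1.00, 0.60],
    [0.10, 0.05, 0.60, 1.00],
]
print(greedy_min_coancestry_pairs(G, 2))
```

A greedy rule is not guaranteed to minimise total coancestry across all pairs; an exact assignment method could be substituted where the number of candidates allows it.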
Kessner, Darren; Novembre, John
2015-01-01
Evolve and resequence studies combine artificial selection experiments with massively parallel sequencing technology to study the genetic basis for complex traits. In these experiments, individuals are selected for extreme values of a trait, causing alleles at quantitative trait loci (QTL) to increase or decrease in frequency in the experimental population. We present a new analysis of the power of artificial selection experiments to detect and localize quantitative trait loci. This analysis uses a simulation framework that explicitly models whole genomes of individuals, quantitative traits, and selection based on individual trait values. We find that explicitly modeling QTL provides qualitatively different insights than considering independent loci with constant selection coefficients. Specifically, we observe how interference between QTL under selection affects the trajectories and lengthens the fixation times of selected alleles. We also show that a substantial portion of the genetic variance of the trait (50–100%) can be explained by detected QTL in as little as 20 generations of selection, depending on the trait architecture and experimental design. Furthermore, we show that power depends crucially on the opportunity for recombination during the experiment. Finally, we show that an increase in power is obtained by leveraging founder haplotype information to obtain allele frequency estimates. PMID:25672748
Connallon, Tim; Clark, Andrew G
2010-12-01
Sex-biased genes (genes that are differentially expressed within males and females) are nonrandomly distributed across animal genomes, with sex chromosomes and autosomes often carrying markedly different concentrations of male- and female-biased genes. These linkage patterns are often gene- and lineage-dependent, differing between functional genetic categories and between species. Although sex-specific selection is often hypothesized to shape the evolution of sex-linked and autosomal gene content, population genetics theory has yet to account for many of the gene- and lineage-specific idiosyncrasies emerging from the empirical literature. With the goal of improving the connection between evolutionary theory and a rapidly growing body of genome-wide empirical studies, we extend previous population genetics theory of sex-specific selection by developing and analyzing a biologically informed model that incorporates sex linkage, pleiotropy, recombination, and epistasis, factors that are likely to vary between genes and between species. Our results demonstrate that sex-specific selection and sex-specific recombination rates can generate, and are compatible with, the gene- and species-specific linkage patterns reported in the genomics literature. The theory suggests that sexual selection may strongly influence the architectures of animal genomes, as well as the chromosomal distribution of fixed substitutions underlying sexually dimorphic traits. © 2010 The Author(s). Evolution © 2010 The Society for the Study of Evolution.
Genome analysis of Legionella pneumophila strains using a mixed-genome microarray.
Euser, Sjoerd M; Nagelkerke, Nico J; Schuren, Frank; Jansen, Ruud; Den Boer, Jeroen W
2012-01-01
Legionella, the causative agent for Legionnaires' disease, is ubiquitous in both natural and man-made aquatic environments. The distribution of Legionella genotypes within clinical strains is significantly different from that found in environmental strains. Developing novel genotypic methods that offer the ability to distinguish clinical from environmental strains could help to focus on more relevant (virulent) Legionella species in control efforts. Mixed-genome microarray data can be used to perform a comparative-genome analysis of strain collections, and advanced statistical approaches, such as the Random Forest algorithm are available to process these data. Microarray analysis was performed on a collection of 222 Legionella pneumophila strains, which included patient-derived strains from notified cases in The Netherlands in the period 2002-2006 and the environmental strains that were collected during the source investigation for those patients within the Dutch National Legionella Outbreak Detection Programme. The Random Forest algorithm combined with a logistic regression model was used to select predictive markers and to construct a predictive model that could discriminate between strains from different origin: clinical or environmental. Four genetic markers were selected that correctly predicted 96% of the clinical strains and 66% of the environmental strains collected within the Dutch National Legionella Outbreak Detection Programme. The Random Forest algorithm is well suited for the development of prediction models that use mixed-genome microarray data to discriminate between Legionella strains from different origin. The identification of these predictive genetic markers could offer the possibility to identify virulence factors within the Legionella genome, which in the future may be implemented in the daily practice of controlling Legionella in the public health environment.
Ridge, Lasso and Bayesian additive-dominance genomic models.
Azevedo, Camila Ferreira; de Resende, Marcos Deon Vilela; E Silva, Fabyano Fonseca; Viana, José Marcelo Soriano; Valente, Magno Sávio Ferreira; Resende, Márcio Fernando Ribeiro; Muñoz, Patricio
2015-08-25
A complete approach for genome-wide selection (GWS) involves reliable statistical genetics models and methods. Reports on this topic are common for additive genetic models but not for additive-dominance models. The objective of this paper was (i) to compare the performance of 10 additive-dominance predictive models (including current models and proposed modifications), fitted using Bayesian, Lasso and Ridge regression approaches; and (ii) to decompose genomic heritability and accuracy in terms of three quantitative genetic information sources, namely, linkage disequilibrium (LD), co-segregation (CS) and pedigree relationships or family structure (PR). The simulation study considered two broad sense heritability levels (0.30 and 0.50, associated with narrow sense heritabilities of 0.20 and 0.35, respectively) and two genetic architectures for traits (the first consisting of small gene effects and the second consisting of a mixed inheritance model with five major genes). G-REML/G-BLUP and a modified Bayesian/Lasso (called BayesA*B* or t-BLASSO) method performed best in the prediction of genomic breeding as well as the total genotypic values of individuals in all four scenarios (two heritabilities x two genetic architectures). The BayesA*B*-type method showed a better ability to recover the dominance variance/additive variance ratio. Decomposition of genomic heritability and accuracy revealed the following descending importance order of information: LD, CS and PR not captured by markers, the last two being very close. Amongst the 10 models/methods evaluated, the G-BLUP, BAYESA*B* (-2,8) and BAYESA*B* (4,6) methods presented the best results and were found to be adequate for accurately predicting genomic breeding and total genotypic values as well as for estimating additive and dominance in additive-dominance genomic models.
Ryu, J; Lee, C
2016-04-01
Selection signals of Korean cattle might be attributed largely to artificial selection for meat quality. Rapidly increased intragenic markers of newly annotated genes in the bovine genome would help overcome limited findings of genetic markers associated with meat quality at the selection signals in a previous study. The present study examined genetic associations of marbling score (MS) with intragenic nucleotide variants at selection signals of Korean cattle. A total of 39 092 nucleotide variants of 407 Korean cattle were utilized in the association analysis. A total of 129 variants were selected within newly annotated genes in the bovine genome. Their genetic associations were analyzed using the mixed model with random polygenic effects based on identical-by-state genetic relationships among animals in order to control for spurious associations produced by population structure. Genetic associations of MS were found (P < 3.88 × 10⁻⁴) with six intragenic nucleotide variants on bovine autosomes 3 (cache domain containing 1, CACHD1), 5 (like-glycosyltransferase, LARGE), 16 (cell division cycle 42 binding protein kinase alpha, CDC42BPA) and 21 (snurportin 1, SNUPN; protein tyrosine phosphatase, non-receptor type 9, PTPN9; chondroitin sulfate proteoglycan 4, CSPG4). In particular, the genetic associations with CDC42BPA and LARGE were confirmed using an independent data set of Korean cattle. The results implied that allele frequencies of functional variants and their proximity variants have been augmented by directional selection for greater MS and remain selection signals in the bovine genome. Further studies of fine mapping would be useful to incorporate favorable alleles in marker-assisted selection for MS of Korean cattle.
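The significance threshold quoted in the abstract above happens to equal 0.05 divided by the 129 tested variants. The abstract does not say how the threshold was chosen, so reading it as a Bonferroni correction is an assumption; the short check below only shows that the arithmetic is consistent with that reading.

```python
# Assumed family-wise error rate and the number of variants reported in the abstract.
alpha = 0.05
n_variants = 129

# Bonferroni-style per-test threshold: alpha divided by the number of tests.
threshold = alpha / n_variants
print(f"{threshold:.2e}")  # prints 3.88e-04
```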
Adaptive Evolution as a Predictor of Species-Specific Innate Immune Response.
Webb, Andrew E; Gerek, Z Nevin; Morgan, Claire C; Walsh, Thomas A; Loscher, Christine E; Edwards, Scott V; O'Connell, Mary J
2015-07-01
It has been proposed that positive selection may be associated with protein functional change. For example, human and macaque have different outcomes to HIV infection and it has been shown that residues under positive selection in the macaque TRIM5α receptor locate to the region known to influence species-specific response to HIV. In general, however, the relationship between sequence and function has proven difficult to fully elucidate, and it is the role of large-scale studies to help bridge this gap in our understanding by revealing major patterns in the data that correlate genotype with function or phenotype. In this study, we investigate the level of species-specific positive selection in innate immune genes from human and mouse. In total, we analyzed 456 innate immune genes using codon-based models of evolution, comparing human, mouse, and 19 other vertebrate species to identify putative species-specific positive selection. Then we used population genomic data from the recently completed Neanderthal genome project, the 1000 human genomes project, and the 17 laboratory mouse genomes project to determine whether the residues that were putatively positively selected are fixed or variable in these populations. We find evidence of species-specific positive selection on both the human and the mouse branches and we show that the classes of genes under positive selection cluster by function and by interaction. Data from this study provide us with targets to test the relationship between positive selection and protein function and ultimately to test the relationship between positive selection and discordant phenotypes. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Accuracy of genomic breeding values for meat tenderness in Polled Nellore cattle.
Magnabosco, C U; Lopes, F B; Fragoso, R C; Eifert, E C; Valente, B D; Rosa, G J M; Sainz, R D
2016-07-01
Zebu (Bos indicus) cattle, mostly of the Nellore breed, comprise more than 80% of the beef cattle in Brazil, given their tolerance of the tropical climate and high resistance to ectoparasites. Despite their advantages for production in tropical environments, zebu cattle tend to produce tougher meat than Bos taurus breeds. Traditional genetic selection to improve meat tenderness is constrained by the difficulty and cost of phenotypic evaluation for meat quality. Therefore, genomic selection may be the best strategy to improve meat quality traits. This study was performed to compare the accuracies of different Bayesian regression models in predicting molecular breeding values for meat tenderness in Polled Nellore cattle. The data set was composed of Warner-Bratzler shear force (WBSF) of longissimus muscle from 205, 141, and 81 animals slaughtered in 2005, 2010, and 2012, respectively, which were selected and mated so as to create extreme segregation for WBSF. The animals were genotyped with either the Illumina BovineHD (HD; 777,000 from 90 samples) chip or the GeneSeek Genomic Profiler (GGP Indicus HD; 77,000 from 337 samples). The quality controls of SNP were Hardy-Weinberg proportion P-value ≥ 0.1%, minor allele frequency > 1%, and call rate > 90%. The FImpute program was used for imputation from the GGP Indicus HD chip to the HD chip. The effect of each SNP was estimated using ridge regression, least absolute shrinkage and selection operator (LASSO), Bayes A, Bayes B, and Bayes Cπ methods. Different numbers of SNP were used, with 1, 2, 3, 4, 5, 7, 10, 20, 40, 60, 80, or 100% of the markers preselected based on their significance test (P-value from genome-wide association studies [GWAS]) or randomly sampled. The prediction accuracy was assessed by the correlation between genomic breeding value and the observed WBSF phenotype, using a leave-one-out cross-validation methodology. 
The prediction accuracies using all markers were all very similar for all models, ranging from 0.22 (Bayes Cπ) to 0.25 (Bayes B). When preselecting SNP based on GWAS results, the highest correlation (0.27) between WBSF and the genomic breeding value was achieved using the Bayesian LASSO model with 15,030 (3%) markers. Although this study used relatively few animals, the design of the segregating population ensured wide genetic variability for meat tenderness, which was important to achieve acceptable accuracy of genomic prediction. Although all models showed similar levels of prediction accuracy, some small advantages were observed with the Bayes B approach when higher numbers of markers were preselected based on their P-values resulting from a GWAS analysis.
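The accuracy measure described in this abstract (correlation between predicted and observed phenotype under leave-one-out cross-validation) can be illustrated with a minimal sketch. It covers only the ridge regression case; the Bayesian models are not shown. The function names, the penalty `lam`, and the toy genotypes are illustrative assumptions, not the authors' code or data.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ridge_fit(x_rows, y, lam):
    """Ridge regression on centered data: solve (X'X + lam I) beta = X'y."""
    m = len(x_rows[0])
    y_mean = sum(y) / len(y)
    x_means = [sum(r[j] for r in x_rows) / len(x_rows) for j in range(m)]
    xc = [[r[j] - x_means[j] for j in range(m)] for r in x_rows]
    yc = [v - y_mean for v in y]
    xtx = [[sum(r[j] * r[k] for r in xc) + (lam if j == k else 0.0)
            for k in range(m)] for j in range(m)]
    xty = [sum(r[j] * v for r, v in zip(xc, yc)) for j in range(m)]
    return solve_linear(xtx, xty), x_means, y_mean


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def loo_accuracy(x_rows, y, lam=1.0):
    """Predict each animal from a model trained on all the others,
    then correlate the predictions with the observed phenotypes."""
    preds = []
    for i in range(len(y)):
        train_x = x_rows[:i] + x_rows[i + 1:]
        train_y = y[:i] + y[i + 1:]
        beta, x_means, y_mean = ridge_fit(train_x, train_y, lam)
        preds.append(y_mean + sum(b * (g - mu) for b, g, mu
                                  in zip(beta, x_rows[i], x_means)))
    return pearson(preds, y)


# Toy data (hypothetical): 6 animals, 2 SNPs coded as 0/1/2 allele counts.
genotypes = [[0, 1], [1, 0], [2, 1], [0, 2], [1, 1], [2, 0]]
phenotypes = [1.0 + 2.0 * g[0] - 1.0 * g[1] for g in genotypes]
accuracy = loo_accuracy(genotypes, phenotypes, lam=1e-6)
```

With noise-free toy phenotypes and a near-zero penalty the accuracy approaches 1; real data such as those in the study yield far lower values because of noise and the small number of animals relative to markers.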
Universal features in the genome-level evolution of protein domains.
Cosentino Lagomarsino, Marco; Sellerio, Alessandro L; Heijning, Philip D; Bassetti, Bruno
2009-01-01
Protein domains can be used to study proteome evolution at a coarse scale. In particular, they are found on genomes with notable statistical distributions. It is known that the distribution of domains with a given topology follows a power law. We focus on a further aspect: these distributions, and the number of distinct topologies, follow collective trends, or scaling laws, depending on the total number of domains only, and not on genome-specific features. We present a stochastic duplication/innovation model, in the class of the so-called 'Chinese restaurant processes', that explains this observation with two universal parameters, representing a minimal number of domains and the relative weight of innovation to duplication. Furthermore, we study a model variant where new topologies are related to occurrence in genomic data, accounting for fold specificity. Both models have general quantitative agreement with data from hundreds of genomes, which indicates that the domains of a genome are built with a combination of specificity and robust self-organizing phenomena. The latter are related to the basic evolutionary 'moves' of duplication and innovation, and give rise to the observed scaling laws, a priori of the specific evolutionary history of a genome. We interpret this as the concurrent effect of neutral and selective drives, which increase duplication and decrease innovation in larger and more complex genomes. The validity of our model would imply that the empirical observation of a small number of folds in nature may be a consequence of their evolution.
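A duplication/innovation process of the Chinese restaurant type can be sketched in a few lines. The version below is a generic two-parameter Chinese restaurant process, offered as an assumption about the general model class rather than the authors' exact parameterization; `alpha`, `theta`, and `simulate_crp` are hypothetical names.

```python
import random


def simulate_crp(n_domains, alpha=0.3, theta=1.0, seed=0):
    """Grow a genome one domain at a time.

    Each new domain either founds a new topology class (innovation) or
    copies an existing class (duplication). Assumes 0 <= alpha < 1 and
    theta > -alpha. Returns the list of class sizes.
    """
    rng = random.Random(seed)
    class_sizes = [1]  # the first domain founds the first class
    for n in range(1, n_domains):
        n_classes = len(class_sizes)
        # Innovation probability grows with the number of existing classes.
        p_new = (theta + alpha * n_classes) / (n + theta)
        if rng.random() < p_new:
            class_sizes.append(1)
        else:
            # Duplication: larger classes are proportionally more likely.
            weights = [size - alpha for size in class_sizes]
            idx = rng.choices(range(n_classes), weights=weights)[0]
            class_sizes[idx] += 1
    return class_sizes


sizes = simulate_crp(500)
```

Running the simulation for increasing `n_domains` and recording `len(sizes)` gives the kind of curve (distinct topologies against total domains) that the abstract describes as depending only on genome size.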
SuperDCA for genome-wide epistasis analysis.
Puranen, Santeri; Pesonen, Maiju; Pensar, Johan; Xu, Ying Ying; Lees, John A; Bentley, Stephen D; Croucher, Nicholas J; Corander, Jukka
2018-05-29
The potential for genome-wide modelling of epistasis has recently surfaced given the possibility of sequencing densely sampled populations and the emerging families of statistical interaction models. Direct coupling analysis (DCA) has previously been shown to yield valuable predictions for single protein structures, and has recently been extended to genome-wide analysis of bacteria, identifying novel interactions in the co-evolution between resistance, virulence and core genome elements. However, earlier computational DCA methods have not been scalable to enable model fitting simultaneously to 10⁴-10⁵ polymorphisms, representing the amount of core genomic variation observed in analyses of many bacterial species. Here, we introduce a novel inference method (SuperDCA) that employs a new scoring principle, efficient parallelization, optimization and filtering on phylogenetic information to achieve scalability for up to 10⁵ polymorphisms. Using two large population samples of Streptococcus pneumoniae, we demonstrate the ability of SuperDCA to make additional significant biological findings about this major human pathogen. We also show that our method can uncover signals of selection that are not detectable by genome-wide association analysis, even though our analysis does not require phenotypic measurements. SuperDCA, thus, holds considerable potential in building understanding about numerous organisms at a systems biological level.
Genomic Prediction Accounting for Residual Heteroskedasticity
Ou, Zhining; Tempelman, Robert J.; Steibel, Juan P.; Ernst, Catherine W.; Bates, Ronald O.; Bello, Nora M.
2015-01-01
Whole-genome prediction (WGP) models that use single-nucleotide polymorphism marker information to predict genetic merit of animals and plants typically assume homogeneous residual variance. However, variability is often heterogeneous across agricultural production systems and may subsequently bias WGP-based inferences. This study extends classical WGP models based on normality, heavy-tailed specifications and variable selection to explicitly account for environmentally-driven residual heteroskedasticity under a hierarchical Bayesian mixed-models framework. WGP models assuming homogeneous or heterogeneous residual variances were fitted to training data generated under simulation scenarios reflecting a gradient of increasing heteroskedasticity. Model fit was based on pseudo-Bayes factors and also on prediction accuracy of genomic breeding values computed on a validation data subset one generation removed from the simulated training dataset. Homogeneous vs. heterogeneous residual variance WGP models were also fitted to two quantitative traits, namely 45-min postmortem carcass temperature and loin muscle pH, recorded in a swine resource population dataset prescreened for high and mild residual heteroskedasticity, respectively. Fit of competing WGP models was compared using pseudo-Bayes factors. Predictive ability, defined as the correlation between predicted and observed phenotypes in validation sets of a five-fold cross-validation was also computed. Heteroskedastic error WGP models showed improved model fit and enhanced prediction accuracy compared to homoskedastic error WGP models although the magnitude of the improvement was small (less than two percentage points net gain in prediction accuracy). Nevertheless, accounting for residual heteroskedasticity did improve accuracy of selection, especially on individuals of extreme genetic merit. PMID:26564950
The Evolution of Genome Structure by Natural and Sexual Selection.
Kirkpatrick, Mark
2017-01-01
Progress on understanding how genome structure evolves is accelerating with the arrival of new genomic, comparative, and theoretical approaches. This article reviews progress in understanding how chromosome inversions and sex chromosomes evolve, and how their evolution affects species' ecology. Analyses of clines in inversion frequencies in flies and mosquitoes imply strong local adaptation, and roles for both over- and underdominant selection. Those results are consistent with the hypothesis that inversions become established when they capture locally adapted alleles. Inversions can carry alleles that are beneficial to closely related species, causing them to introgress following hybridization. Models show that this "adaptive cassette" scenario can trigger large range expansions, as recently happened in malaria mosquitoes. Sex chromosomes are the most rapidly evolving genome regions of some taxa. Sexually antagonistic selection may be the key force driving transitions of sex determination between different pairs of chromosomes and between XY and ZW systems. Fusions between sex chromosomes and autosomes most often involve the Y chromosome, a pattern that can be explained if fusions are mildly deleterious and fix by drift. Sexually antagonistic selection is one of several hypotheses to explain the recent discovery that the sex determination system has strong effects on the adult sex ratios of tetrapods. The emerging view of how genome structure evolves invokes a much richer constellation of forces than was envisioned during the Golden Age of research on Drosophila karyotypes. © The American Genetic Association 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Additive Genetic Variability and the Bayesian Alphabet
Gianola, Daniel; de los Campos, Gustavo; Hill, William G.; Manfredi, Eduardo; Fernando, Rohan
2009-01-01
The use of all available molecular markers in statistical models for prediction of quantitative traits has led to what could be termed a genomic-assisted selection paradigm in animal and plant breeding. This article provides a critical review of some theoretical and statistical concepts in the context of genomic-assisted genetic evaluation of animals and crops. First, relationships between the (Bayesian) variance of marker effects in some regression models and additive genetic variance are examined under standard assumptions. Second, the connection between marker genotypes and resemblance between relatives is explored, and linkages between a marker-based model and the infinitesimal model are reviewed. Third, issues associated with the use of Bayesian models for marker-assisted selection, with a focus on the role of the priors, are examined from a theoretical angle. The sensitivity of a Bayesian specification that has been proposed (called “Bayes A”) with respect to priors is illustrated with a simulation. Methods that can solve potential shortcomings of some of these Bayesian regression procedures are discussed briefly. PMID:19620397
USDA-ARS's Scientific Manuscript database
The compelling elegance of using genome-wide scans to detect the signature of selection is difficult to resist, but is countered by the low demonstrated efficacy of pinpointing the actual genes and traits that are the targets of selection in non-model species. While the difficulty of going from a s...
Signatures of selection in tilapia revealed by whole genome resequencing.
Xia, Jun Hong; Bai, Zhiyi; Meng, Zining; Zhang, Yong; Wang, Le; Liu, Feng; Jing, Wu; Wan, Zi Yi; Li, Jiale; Lin, Haoran; Yue, Gen Hua
2015-09-16
Natural selection and selective breeding for genetic improvement have left detectable signatures within the genome of a species. Identification of selection signatures is important in evolutionary biology and for detecting genes that can help accelerate genetic improvement. However, selection signatures, whether from artificial or natural selection, have been identified at the whole-genome level in only a few genetically improved fish species. Tilapia is one of the most important genetically improved fish species in the world. Using next-generation sequencing, we sequenced the genomes of 47 tilapia individuals. We identified a total of 1.43 million high-quality SNPs and found that the LD block sizes ranged from 10-100 kb in tilapia. We detected over a hundred putative selective sweep regions in each line of tilapia. Most selection signatures were located in non-coding regions of the tilapia genome. The Wnt signaling, gonadotropin-releasing hormone receptor and integrin signaling pathways were under positive selection in all improved tilapia lines. Our study provides a genome-wide map of genetic variation and selection footprints in tilapia, which could be important for genetic studies and accelerating genetic improvement of tilapia.
Genome-Wide Motif Statistics are Shaped by DNA Binding Proteins over Evolutionary Time Scales
NASA Astrophysics Data System (ADS)
Qian, Long; Kussell, Edo
The composition of genomes with respect to short DNA motifs impacts the ability of DNA binding proteins to locate and bind their target sites. Since nonfunctional DNA binding can be detrimental to cellular functions and ultimately to organismal fitness, organisms could benefit from reducing the number of nonfunctional binding sites genome wide. Using in vitro measurements of binding affinities for a large collection of DNA binding proteins, in multiple species, we detect a significant global avoidance of weak binding sites in genomes. The underlying evolutionary process leaves a distinct genomic hallmark in that similar words have correlated frequencies, which we detect in all species across domains of life. We hypothesize that natural selection against weak binding sites contributes to this process, and using an evolutionary model we show that the strength of selection needed to maintain global word compositions is on the order of point mutation rates. Alternative contributions may come from interference of protein-DNA binding with replication and mutational repair processes, which operates with similar rates. We conclude that genome-wide word compositions have been molded by DNA binding proteins through tiny evolutionary steps over timescales spanning millions of generations.
Santos, Bruno F S; van der Werf, Julius H J; Gibson, John P; Byrne, Timothy J; Amer, Peter R
2017-01-17
Performance recording and genotyping in the multiplier tier of multi-tiered sheep breeding schemes could potentially reduce the difference in the average genetic merit between nucleus and commercial flocks, and create additional economic benefits for the breeding structure. The genetic change in a multiple-trait breeding objective was predicted for various selection strategies that included performance recording, parentage testing and genomic selection. A deterministic simulation model was used to predict selection differentials and the flow of genetic superiority through the different tiers. Cumulative discounted economic benefits were calculated based on trait gains achieved in each of the tiers and considering the extra revenue and associated costs of applying recording, genotyping and selection practices in the multiplier tier of the breeding scheme. Performance recording combined with genomic or parentage information in the multiplier tier reduced the genetic lag between the nucleus and commercial flock by 2 to 3 years. The overall economic benefits of improved performance in the commercial tier offset the costs of recording the multiplier. However, it took more than 18 years before the cumulative net present value of benefits offset the costs at current test prices. Strategies in which recorded multiplier ewes were selected as replacements for the nucleus flock did modestly increase profitability when compared to a closed nucleus structure. Applying genomic selection is the most beneficial strategy if testing costs can be reduced or by genotyping only a proportion of the selection candidates. When the cost of genotyping was reduced, scenarios that combine performance recording with genomic selection were more profitable and reached breakeven point about 10 years earlier. Economic benefits can be generated in multiplier flocks by implementing performance recording in conjunction with either DNA pedigree recording or genomic technology. 
These recording practices reduce the long genetic lag between the nucleus and commercial flocks in multi-tiered breeding programs. Under current genotyping costs, the time to breakeven was found to be generally very long, although this varied between strategies. Strategies using either genomic selection or DNA pedigree verification were found to be economically viable in the long term, provided the price paid for the tests is lower than current prices.
How and how much does RAD-seq bias genetic diversity estimates?
Cariou, Marie; Duret, Laurent; Charlat, Sylvain
2016-11-08
RAD-seq is a powerful tool, increasingly used in population genomics. However, earlier studies have raised red flags regarding possible biases associated with this technique. In particular, polymorphism on restriction sites results in preferential sampling of closely related haplotypes, so that RAD data tends to underestimate genetic diversity. Here we (1) clarify the theoretical basis of this bias, highlighting the potential confounding effects of population structure and selection, (2) compare predictions with real data from in silico digestion of full genomes and (3) provide a proof of concept toward an ABC-based correction of the RAD-seq bias. Under a neutral and panmictic model, we confirm the previously established relationship between the true polymorphism and its RAD-based estimation, showing a more pronounced bias when polymorphism is high. Using more elaborate models, we show that selection, resulting in heterogeneous levels of polymorphism along the genome, exacerbates the bias and leads to a more pronounced underestimation. On the contrary, spatial genetic structure tends to reduce the bias. We compare the neutral and panmictic model with "ideal" empirical data (in silico RAD-sequencing) using full genomes from natural populations of the fruit fly Drosophila melanogaster and the fungus Schizophyllum commune, harbouring respectively moderate and high genetic diversity. In D. melanogaster, predictions fit the model, but the small difference between the true and RAD polymorphism makes this comparison insensitive to deviations from the model. In the highly polymorphic fungus, the model captures a large part of the bias but makes inaccurate predictions. Accordingly, ABC corrections based on this model improve the estimations, albeit with some imprecision. The RAD-seq underestimation of genetic diversity associated with polymorphism in restriction sites becomes more pronounced when polymorphism is high. 
In practice, this means that in many systems where polymorphism does not exceed 2 %, the bias is of minor importance in the face of other sources of uncertainty, such as heterogeneous base composition or technical artefacts. The neutral panmictic model provides a practical means to correct the bias through ABC, albeit with some imprecision. More elaborate ABC methods might integrate additional parameters, such as population structure and selection, but their opposite effects could hinder accurate corrections.
A little bit of sex matters for genome evolution in asexual plants.
Hojsgaard, Diego; Hörandl, Elvira
2015-01-01
Genome evolution in asexual organisms is theoretically expected to be shaped by various factors: first, hybrid origin, and polyploidy confer a genomic constitution of highly heterozygous genotypes with multiple copies of genes; second, asexuality confers a lack of recombination and variation in populations, which reduces the efficiency of selection against deleterious mutations; hence, the accumulation of mutations and a gradual increase in mutational load (Muller's ratchet) would lead to rapid extinction of asexual lineages; third, allelic sequence divergence is expected to result in rapid divergence of lineages (Meselson effect). Recent transcriptome studies on the asexual polyploid complex Ranunculus auricomus using single-nucleotide polymorphisms confirmed neutral allelic sequence divergence within a short time frame, but rejected a hypothesis of a genome-wide accumulation of mutations in asexuals compared to sexuals, except for a few genes related to reproductive development. We discuss a general model that the observed incidence of facultative sexuality in plants may unmask deleterious mutations with partial dominance and expose them efficiently to purging selection. A little bit of sex may help to avoid genomic decay and extinction.
A segmentation/clustering model for the analysis of array CGH data.
Picard, F; Robin, S; Lebarbier, E; Daudin, J-J
2007-09-01
Microarray-CGH (comparative genomic hybridization) experiments are used to detect and map chromosomal imbalances. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose representative sequences share the same relative copy number on average. Segmentation methods constitute a natural framework for the analysis, but they do not provide a biological status for the detected segments. We propose a new model for this segmentation/clustering problem, combining a segmentation model with a mixture model. We present a new hybrid algorithm called dynamic programming-expectation maximization (DP-EM) to estimate the parameters of the model by maximum likelihood. This algorithm combines DP and the EM algorithm. We also propose a model selection heuristic to select the number of clusters and the number of segments. An example of our procedure is presented, based on publicly available data sets. We compare our method to segmentation methods and to hidden Markov models, and we show that the new segmentation/clustering model is a promising alternative that can be applied in the more general context of signal processing.
Inference of gorilla demographic and selective history from whole-genome sequence data.
McManus, Kimberly F; Kelley, Joanna L; Song, Shiya; Veeramah, Krishna R; Woerner, August E; Stevison, Laurie S; Ryder, Oliver A; Great Ape Genome Project; Kidd, Jeffrey M; Wall, Jeffrey D; Bustamante, Carlos D; Hammer, Michael F
2015-03-01
Although population-level genomic sequence data have been gathered extensively for humans, similar data from our closest living relatives are just beginning to emerge. Examination of genomic variation within great apes offers many opportunities to increase our understanding of the forces that have differentially shaped the evolutionary history of hominid taxa. Here, we expand upon the work of the Great Ape Genome Project by analyzing medium to high coverage whole-genome sequences from 14 western lowland gorillas (Gorilla gorilla gorilla), 2 eastern lowland gorillas (G. beringei graueri), and a single Cross River individual (G. gorilla diehli). We infer that the ancestors of western and eastern lowland gorillas diverged from a common ancestor approximately 261 ka, and that the ancestors of the Cross River population diverged from the western lowland gorilla lineage approximately 68 ka. Using a diffusion approximation approach to model the genome-wide site frequency spectrum, we infer a history of western lowland gorillas that includes an ancestral population expansion of 1.4-fold around 970 ka and a recent 5.6-fold contraction in population size 23 ka. The latter may correspond to a major reduction in African equatorial forests around the Last Glacial Maximum. We also analyze patterns of variation among western lowland gorillas to identify several genomic regions with strong signatures of recent selective sweeps. We find that processes related to taste, pancreatic and saliva secretion, sodium ion transmembrane transport, and cardiac muscle function are overrepresented in genomic regions predicted to have experienced recent positive selection. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Rolf, Megan M; Taylor, Jeremy F; Schnabel, Robert D; McKay, Stephanie D; McClure, Matthew C; Northcutt, Sally L; Kerley, Monty S; Weaber, Robert L
2010-04-19
Molecular estimates of breeding value are expected to increase selection response due to improvements in the accuracy of selection and a reduction in generation interval, particularly for traits that are difficult or expensive to record or are measured late in life. Several statistical methods for incorporating molecular data into breeding value estimation have been proposed, however, most studies have utilized simulated data in which the generated linkage disequilibrium may not represent the targeted livestock population. A genomic relationship matrix was developed for 698 Angus steers and 1,707 Angus sires using 41,028 single nucleotide polymorphisms and breeding values were estimated using feed efficiency phenotypes (average daily feed intake, residual feed intake, and average daily gain) recorded on the steers. The number of SNPs needed to accurately estimate a genomic relationship matrix was evaluated in this population. Results were compared to estimates produced from pedigree-based mixed model analysis of 862 Angus steers with 34,864 identified paternal relatives but no female ancestors. Estimates of additive genetic variance and breeding value accuracies were similar for AFI and RFI using the numerator and genomic relationship matrices despite fewer animals in the genomic analysis. Bootstrap analyses indicated that 2,500-10,000 markers are required for robust estimation of genomic relationship matrices in cattle. This research shows that breeding values and their accuracies may be estimated for commercially important sires for traits recorded in experimental populations without the need for pedigree data to establish identity by descent between members of the commercial and experimental populations when at least 2,500 SNPs are available for the generation of a genomic relationship matrix.
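The abstract does not state which formula was used to build the genomic relationship matrix. As an illustration only, the sketch below uses one widely used construction (VanRaden's first method, G = ZZ'/(2Σp(1−p))), with allele frequencies estimated from the genotyped sample itself; both choices are assumptions, and the function name and toy genotypes are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Build a genomic relationship matrix from SNP genotypes.

    genotypes: list of rows (one per animal), each a list of allele
    counts coded 0, 1, or 2 for every SNP.
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    # Allele frequency of the counted allele at each SNP (sample estimate).
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n_animals)
             for j in range(n_snps)]
    # Scaling term that puts G on the same scale as a pedigree matrix.
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all SNPs are monomorphic")
    # Center each genotype by twice the allele frequency.
    centered = [[row[j] - 2.0 * freqs[j] for j in range(n_snps)]
                for row in genotypes]
    return [[sum(centered[a][j] * centered[b][j] for j in range(n_snps)) / scale
             for b in range(n_animals)]
            for a in range(n_animals)]


# Toy example (hypothetical): 3 animals genotyped at 3 SNPs.
G = genomic_relationship_matrix([[0, 1, 2], [2, 1, 0], [1, 1, 1]])
```

In this toy case the first two animals carry opposite homozygous genotypes, so their off-diagonal relationship is negative, while the fully heterozygous third animal sits at the sample mean and has zero relationship with everyone.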
Deleterious Mutations, Apparent Stabilizing Selection and the Maintenance of Quantitative Variation
Kondrashov, A. S.; Turelli, M.
1992-01-01
Apparent stabilizing selection on a quantitative trait that is not causally connected to fitness can result from the pleiotropic effects of unconditionally deleterious mutations, because as N. Barton noted, "... individuals with extreme values of the trait will tend to carry more deleterious alleles ...." We use a simple model to investigate the dependence of this apparent selection on the genomic deleterious mutation rate, U; the equilibrium distribution of K, the number of deleterious mutations per genome; and the parameters describing directional selection against deleterious mutations. Unlike previous analyses, we allow for epistatic selection against deleterious alleles. For various selection functions and realistic parameter values, the distribution of K, the distribution of breeding values for a pleiotropically affected trait, and the apparent stabilizing selection function are all nearly Gaussian. The additive genetic variance for the quantitative trait is kQa², where k is the average number of deleterious mutations per genome, Q is the proportion of deleterious mutations that affect the trait, and a² is the variance of pleiotropic effects for individual mutations that do affect the trait. In contrast, when the trait is measured in units of its additive standard deviation, the apparent fitness function is essentially independent of Q and a²; and β, the intensity of selection, measured as the ratio of additive genetic variance to the "variance" of the fitness curve, is very close to s = U/k, the selection coefficient against individual deleterious mutations at equilibrium. Therefore, this model predicts appreciable apparent stabilizing selection if s exceeds about 0.03, which is consistent with various data. However, the model also predicts that β must equal V_m/V_G, the ratio of new additive variance for the trait introduced each generation by mutation to the standing additive variance. 
Most, although not all, estimates of this ratio imply apparent stabilizing selection weaker than generally observed. A qualitative argument suggests that even when direct selection is responsible for most of the selection observed on a character, it may be essentially irrelevant to the maintenance of variation for the character by mutation-selection balance. Simple experiments can indicate the fraction of observed stabilizing selection attributable to the pleiotropic effects of deleterious mutations. PMID:1427047
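The relations stated in this abstract can be collected in one place. The block below only restates them in LaTeX, writing a² for the variance of pleiotropic effects; the value U = 1 in the closing comment is an assumed figure chosen for arithmetic convenience, not one reported in the abstract.

```latex
\begin{align}
V_A   &= k\,Q\,a^{2}, \\
\beta &\approx s = \frac{U}{k}, \\
\beta &= \frac{V_m}{V_G}.
\end{align}
% Illustration (assumed value): with U = 1, the condition s > 0.03
% corresponds to k = U/s < 1/0.03 \approx 33 deleterious mutations per genome.
```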
USDA-ARS's Scientific Manuscript database
The objective of this study was to compare genetic trends from a single-step genomic BLUP (ssGBLUP) and the traditional BLUP models for milk production traits in US Holstein. Phenotypes were 305-day milk, fat, and protein yield from 21,527,040 cows recorded between January, 1990 and August, 2015. Th...
Kawakami, Takeshi; Backström, Niclas; Burri, Reto; Husby, Arild; Olason, Pall; Rice, Amber M; Ålund, Murielle; Qvarnström, Anna; Ellegren, Hans
2014-01-01
With access to draft genome sequence assemblies and whole-genome resequencing data from population samples, molecular ecology studies will be able to take truly genome-wide approaches. This now applies to an avian model system in ecological and evolutionary research: Old World flycatchers of the genus Ficedula, for which we recently obtained a 1.1 Gb collared flycatcher genome assembly and identified 13 million single-nucleotide polymorphisms (SNPs) in population resequencing of this species and its sister species, pied flycatcher. Here, we developed a custom 50K Illumina iSelect flycatcher SNP array with markers covering 30 autosomes and the Z chromosome. Using a number of selection criteria for inclusion in the array, both genotyping success rate and polymorphism information content (mean marker heterozygosity = 0.41) were high. We used the array to assess linkage disequilibrium (LD) and hybridization in flycatchers. Linkage disequilibrium declined quickly to the background level at an average distance of 17 kb, but the extent of LD varied markedly within the genome and was more than 10-fold higher in ‘genomic islands’ of differentiation than in the rest of the genome. Genetic ancestry analysis identified 33 F1 hybrids but no later-generation hybrids from sympatric populations of collared flycatchers and pied flycatchers, contradicting earlier reports of backcrosses identified from a much smaller number of markers. With an estimated divergence time as recently as <1 Ma, this suggests strong selection against F1 hybrids and unusually rapid evolution of reproductive incompatibility in an avian system. PMID:24784959
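The abstract does not say which statistic was used to measure linkage disequilibrium. A common choice is the squared correlation r² between alleles at two loci, and the sketch below computes it from phased haplotypes as an illustration; the function name, the 0/1 haplotype coding, and the toy data are assumptions.

```python
def ld_r2(hap_a, hap_b):
    """Squared allelic correlation (r^2) between two biallelic loci.

    hap_a, hap_b: equal-length lists of 0/1 alleles, one entry per
    phased haplotype, in the same haplotype order.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    # Frequency of haplotypes carrying allele 1 at both loci.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("at least one locus is monomorphic")
    return d * d / denom


complete_ld = ld_r2([0, 0, 1, 1], [0, 0, 1, 1])
no_ld = ld_r2([0, 0, 1, 1], [0, 1, 0, 1])
```

Averaging such pairwise values within bins of physical distance between markers gives an LD decay curve of the kind summarized in the abstract.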
Population genomics reveal recent speciation and rapid evolutionary adaptation in polar bears.
Liu, Shiping; Lorenzen, Eline D; Fumagalli, Matteo; Li, Bo; Harris, Kelley; Xiong, Zijun; Zhou, Long; Korneliussen, Thorfinn Sand; Somel, Mehmet; Babbitt, Courtney; Wray, Greg; Li, Jianwen; He, Weiming; Wang, Zhuo; Fu, Wenjing; Xiang, Xueyan; Morgan, Claire C; Doherty, Aoife; O'Connell, Mary J; McInerney, James O; Born, Erik W; Dalén, Love; Dietz, Rune; Orlando, Ludovic; Sonne, Christian; Zhang, Guojie; Nielsen, Rasmus; Willerslev, Eske; Wang, Jun
2014-05-08
Polar bears are uniquely adapted to life in the High Arctic and have undergone drastic physiological changes in response to Arctic climates and a hyper-lipid diet of primarily marine mammal prey. We analyzed 89 complete genomes of polar bear and brown bear using population genomic modeling and show that the species diverged only 479-343 thousand years BP. We find that genes on the polar bear lineage have been under stronger positive selection than in brown bears; nine of the top 16 genes under strong positive selection are associated with cardiomyopathy and vascular disease, implying important reorganization of the cardiovascular system. One of the genes showing the strongest evidence of selection, APOB, encodes the primary lipoprotein component of low-density lipoprotein (LDL); functional mutations in APOB may explain how polar bears are able to cope with life-long elevated LDL levels that are associated with high risk of heart disease in humans. Copyright © 2014 Elsevier Inc. All rights reserved.
POPULATION GENOMICS REVEAL RECENT SPECIATION AND RAPID EVOLUTIONARY ADAPTATION IN POLAR BEARS
Liu, Shiping; Lorenzen, Eline D.; Fumagalli, Matteo; Li, Bo; Harris, Kelley; Xiong, Zijun; Zhou, Long; Korneliussen, Thorfinn Sand; Somel, Mehmet; Babbitt, Courtney; Wray, Greg; Li, Jianwen; He, Weiming; Wang, Zhuo; Fu, Wenjing; Xiang, Xueyan; Morgan, Claire C.; Doherty, Aoife; O’Connell, Mary J.; McInerney, James O.; Born, Erik W.; Dalén, Love; Dietz, Rune; Orlando, Ludovic; Sonne, Christian; Zhang, Guojie; Nielsen, Rasmus; Willerslev, Eske; Wang, Jun
2014-01-01
SUMMARY Polar bears are uniquely adapted to life in the High Arctic and have undergone drastic physiological changes in response to Arctic climates and a hyperlipid diet of primarily marine mammal prey. We analyzed 89 complete genomes of polar bear and brown bear using population genomic modeling and show that the species diverged only 479–343 thousand years BP. We find that genes on the polar bear lineage have been under stronger positive selection than in brown bears; nine of the top 16 genes under strong positive selection are associated with cardiomyopathy and vascular disease, implying important reorganization of the cardiovascular system. One of the genes showing the strongest evidence of selection, APOB, encodes the primary lipoprotein component of low-density lipoprotein (LDL); functional mutations in APOB may explain how polar bears are able to cope with life-long elevated LDL levels that are associated with high risk of heart disease in humans. PMID:24813606
Vive la résistance: genome-wide selection against introduced alleles in invasive hybrid zones
Kovach, Ryan P.; Hand, Brian K.; Hohenlohe, Paul A.; Cosart, Ted F.; Boyer, Matthew C.; Neville, Helen H.; Muhlfeld, Clint C.; Amish, Stephen J.; Carim, Kellie; Narum, Shawn R.; Lowe, Winsor H.; Allendorf, Fred W.; Luikart, Gordon
2016-01-01
Evolutionary and ecological consequences of hybridization between native and invasive species are notoriously complicated because patterns of selection acting on non-native alleles can vary throughout the genome and across environments. Rapid advances in genomics now make it feasible to assess locus-specific and genome-wide patterns of natural selection acting on invasive introgression within and among natural populations occupying diverse environments. We quantified genome-wide patterns of admixture across multiple independent hybrid zones of native westslope cutthroat trout and invasive rainbow trout, the world's most widely introduced fish, by genotyping 339 individuals from 21 populations using 9380 species-diagnostic loci. A significantly greater proportion of the genome appeared to be under selection favouring native cutthroat trout (rather than rainbow trout), and this pattern was pervasive across the genome (detected on most chromosomes). Furthermore, selection against invasive alleles was consistent across populations and environments, even in those where rainbow trout were predicted to have a selective advantage (warm environments). These data corroborate field studies showing that hybrids between these species have lower fitness than the native taxa, and show that these fitness differences are due to selection favouring many native genes distributed widely throughout the genome.
Vitezica, Zulma G; Varona, Luis; Legarra, Andres
2013-12-01
Genomic evaluation models can fit additive and dominant SNP effects. Under quantitative genetics theory, additive or "breeding" values of individuals are generated by substitution effects, which involve both "biological" additive and dominant effects of the markers. Dominance deviations include only a portion of the biological dominant effects of the markers. Additive variance includes variation due to the additive and dominant effects of the markers. We describe a matrix of dominant genomic relationships across individuals, D, which is similar to the G matrix used in genomic best linear unbiased prediction. This matrix can be used in a mixed-model context for genomic evaluations or to estimate dominant and additive variances in the population. From the "genotypic" value of individuals, an alternative parameterization defines additive and dominance as the parts attributable to the additive and dominant effect of the markers. This approach underestimates the additive genetic variance and overestimates the dominance variance. Transforming the variances from one model into the other is trivial if the distribution of allelic frequencies is known. We illustrate these results with mouse data (four traits, 1884 mice, and 10,946 markers) and simulated data (2100 individuals and 10,000 markers). Variance components were estimated correctly in the model, considering breeding values and dominance deviations. For the model considering genotypic values, the inclusion of dominant effects biased the estimate of additive variance. Genomic models were more accurate for the estimation of variance components than their pedigree-based counterparts.
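The additive (G) and dominance (D) genomic relationship matrices described in the abstract above can be sketched as follows. The abstract does not state the formulas, so the marker codings used here (centered allele counts for G; -2q², 2pq, -2p² for the three genotypes in D) are an assumption based on the commonly cited parameterization, and every function name is hypothetical. Genotypes are assumed to be coded 0/1/2 as counts of one allele, with allele frequencies estimated from the same sample.

```python
def allele_freqs(genotypes):
    """Frequency p of the counted allele at each marker (genotypes coded 0/1/2)."""
    n = len(genotypes)
    m = len(genotypes[0])
    return [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]


def dominance_code(g, p):
    """Assumed dominance coding: -2q^2 (g=2), 2pq (g=1), -2p^2 (g=0)."""
    q = 1.0 - p
    if g == 2:
        return -2.0 * q * q
    if g == 1:
        return 2.0 * p * q
    return -2.0 * p * p


def scaled_cross_product(rows, scale):
    """Return rows * rows^T / scale as a list of lists."""
    return [[sum(a * b for a, b in zip(ri, rj)) / scale for rj in rows]
            for ri in rows]


def additive_relationship(genotypes):
    """G = Z Z^T / sum(2pq), with Z the allele counts centered by 2p."""
    p = allele_freqs(genotypes)
    z = [[g - 2.0 * pj for g, pj in zip(row, p)] for row in genotypes]
    return scaled_cross_product(z, sum(2.0 * pj * (1.0 - pj) for pj in p))


def dominance_relationship(genotypes):
    """D = W W^T / sum((2pq)^2), with W built from dominance_code."""
    p = allele_freqs(genotypes)
    w = [[dominance_code(g, pj) for g, pj in zip(row, p)] for row in genotypes]
    return scaled_cross_product(w, sum((2.0 * pj * (1.0 - pj)) ** 2 for pj in p))


# Toy data: 4 individuals x 3 markers (illustrative values only).
toy = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]
G = additive_relationship(toy)
D = dominance_relationship(toy)
```

Under Hardy-Weinberg proportions this dominance coding has mean zero at each marker, which is what allows D to be used alongside G in a mixed model in the way the abstract describes.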
2007-01-01
Background The usage of synonymous codons shows considerable variation among mammalian genes. How and why this usage is non-random are fundamental biological questions and remain controversial. It is also important to explore whether mammalian genes that are selectively expressed at different developmental stages bear different molecular features. Results In two models of mouse stem cell differentiation, we established correlations between codon usage and the patterns of gene expression. We found that the optimal codons exhibited variation (AT- or GC-ending codons) in different cell types within the developmental hierarchy. We also found that genes that were enriched (developmental-pivotal genes) or specifically expressed (developmental-specific genes) at different developmental stages had different patterns of codon usage and local genomic GC (GCg) content. Moreover, at the same developmental stage, developmental-specific genes generally used more GC-ending codons and had higher GCg content compared with developmental-pivotal genes. Further analyses suggest that the model of translational selection might be consistent with the developmental stage-related patterns of codon usage, especially for the AT-ending optimal codons. In addition, our data show that after human-mouse divergence, the influence of selective constraints is still detectable. Conclusion Our findings suggest that developmental stage-related patterns of gene expression are correlated with codon usage (GC3) and GCg content in stem cell hierarchies. Moreover, this paper provides evidence for the influence of natural selection at synonymous sites in the mouse genome and novel clues for linking the molecular features of genes to their patterns of expression during mammalian ontogenesis. PMID:17349061
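The two sequence features named in the abstract above, GC3 (GC content at third codon positions) and local GC content, can be computed along these lines. This is a minimal sketch, assuming an in-frame coding sequence given as a plain string; the function names are hypothetical and the abstract does not describe the authors' actual pipeline.

```python
def gc_content(seq):
    """Fraction of G or C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3(cds):
    """Fraction of complete codons whose third base is G or C.

    Assumes the sequence starts in frame; a trailing partial codon is ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    thirds = [seq[i + 2] for i in range(0, usable, 3)]
    if not thirds:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


# Codons ATG, GCC, AAA have third bases G, C, A.
example_gc3 = gc3("ATGGCCAAA")
example_gc = gc_content("ATGGCCAAA")
```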
Kaddis Maldonado, Rebecca J.; Parent, Leslie J.
2016-01-01
Infectious retrovirus particles contain two copies of unspliced viral RNA that serve as the viral genome. Unspliced retroviral RNA is transcribed in the nucleus by the host RNA polymerase II and has three potential fates: (1) it can be spliced into subgenomic messenger RNAs (mRNAs) for the translation of viral proteins; or it can remain unspliced to serve as either (2) the mRNA for the translation of Gag and Gag–Pol; or (3) the genomic RNA (gRNA) that is packaged into virions. The Gag structural protein recognizes and binds the unspliced viral RNA to select it as a genome, which is selected in preference to spliced viral RNAs and cellular RNAs. In this review, we summarize the current state of understanding about how retroviral packaging is orchestrated within the cell and explore potential new mechanisms based on recent discoveries in the field. We discuss the cis-acting elements in the unspliced viral RNA and the properties of the Gag protein that are required for their interaction. In addition, we discuss the role of host factors in influencing the fate of the newly transcribed viral RNA, current models for how retroviruses distinguish unspliced viral mRNA from viral genomic RNA, and the possible subcellular sites of genomic RNA dimerization and selection by Gag. Although this review centers primarily on the wealth of data available for the alpharetrovirus Rous sarcoma virus, in which a discrete RNA packaging sequence has been identified, we have also summarized the cis- and trans-acting factors as well as the mechanisms governing gRNA packaging of other retroviruses for comparison. PMID:27657110
Analysis of Cytoskeletal and Motility Proteins in the Sea Urchin Genome Assembly
Morris, RL; Hoffman, MP; Obar, RA; McCafferty, SS; Gibbons, IR; Leone, AD; Cool, J; Allgood, EL; Musante, AM; Judkins, KM; Rossetti, BJ; Rawson, AP; Burgess, DR
2007-01-01
The sea urchin embryo is a classical model system for studying the role of the cytoskeleton in such events as fertilization, mitosis, cleavage, cell migration and gastrulation. We have conducted an analysis of gene models derived from the Strongylocentrotus purpuratus genome assembly and have gathered strong evidence for the existence of multiple gene families encoding cytoskeletal proteins and their regulators in sea urchin. While many cytoskeletal genes have been cloned from sea urchin with sequences already existing in public databases, genome analysis reveals a significantly higher degree of diversity within certain gene families. Furthermore, genes are described corresponding to homologs of cytoskeletal proteins not previously documented in sea urchins. To illustrate the varying degree of sequence diversity that exists within cytoskeletal gene families, we conducted an analysis of genes encoding actins, specific actin-binding proteins, myosins, tubulins, kinesins, dyneins, specific microtubule-associated proteins, and intermediate filaments. We conducted ontological analysis of select genes to better understand the relatedness of urchin cytoskeletal genes to those of other deuterostomes. We analyzed developmental expression (EST) data to confirm the existence of select gene models and to understand their differential expression during various stages of early development. PMID:17027957
Population genomic data reveal genes related to important traits of quail.
Wu, Yan; Zhang, Yaolei; Hou, Zhuocheng; Fan, Guangyi; Pi, Jinsong; Sun, Shuai; Chen, Jiang; Liu, Huaqiao; Du, Xiao; Shen, Jie; Hu, Gang; Chen, Wenbin; Pan, Ailuan; Yin, Pingping; Chen, Xiaoli; Pu, Yuejin; Zhang, He; Liang, Zhenhua; Jian, Jianbo; Zhang, Hao; Wu, Bin; Sun, Jing; Chen, Jianwei; Tao, Hu; Yang, Ting; Xiao, Hongwei; Yang, Huan; Zheng, Chuanwei; Bai, Mingzhou; Fang, Xiaodong; Burt, David W; Wang, Wen; Li, Qingyi; Xu, Xun; Li, Chengfeng; Yang, Huanming; Wang, Jian; Yang, Ning; Liu, Xin; Du, Jinping
2018-05-01
Japanese quail (Coturnix japonica), a recently domesticated poultry species, is important not only as an agricultural product, but also as a model bird species for genetic research. However, most of the biological questions concerning genomics, phylogenetics, and genetics of some important economic traits have not been answered. It is thus necessary to complete a high-quality genome sequence as well as a series of comparative genomics, evolution, and functional studies. Here, we present a quail genome assembly spanning 1.04 Gb with 86.63% of sequences anchored to 30 chromosomes (28 autosomes and 2 sex chromosomes Z/W). Our genomic data have resolved the long-term debate of phylogeny among Perdicinae (Japanese quail), Meleagridinae (turkey), and Phasianinae (chicken). Comparative genomics and functional genomic data found that four candidate genes involved in early maturation had experienced positive selection, and one of them encodes follicle stimulating hormone beta (FSHβ), which is correlated with different FSHβ levels in quail and chicken. We re-sequenced 31 quails (10 wild, 11 egg-type, and 10 meat-type) and identified 18 and 26 candidate selective sweep regions in the egg-type and meat-type lines, respectively. That only one of them is shared between egg-type and meat-type lines suggests that they were subject to an independent selection. We also detected a haplotype on chromosome Z, which was closely linked with maroon/yellow plumage in quail using population resequencing and a genome-wide association study. This haplotype block will be useful for quail breeding programs. This study provided a high-quality quail reference genome, identified quail-specific genes, and resolved quail phylogeny. We have identified genes related to quail early maturation and a marker for plumage color, which is significant for quail breeding. These results will facilitate biological discovery in quails and help us elucidate the evolutionary processes within the Phasianidae family.
Selective Packaging of Host tRNA's by Murine Leukemia Virus Particles Does Not Require Genomic RNA
Levin, Judith G.; Seidman, J. G.
1979-01-01
The 4S RNA contained in RNA tumor virus particles consists of a selected population of host tRNA's. However, the mechanism by which virions select host tRNA's has not been elucidated. We have considered a model which specifies that 35S genomic RNA determines which tRNA's are to be encapsidated as well as the relative amounts of these tRNA's within the virion. The model was tested by comparing the free 4S RNA composition of normal murine leukemia virus (MuLV) particles and noninfectious virions from actinomycin D (ActD)-treated cells, which are deficient in genomic RNA (ActD virions). Viral 4S RNA was analyzed by two-dimensional polyacrylamide gel electrophoresis. Surprisingly, the patterns obtained for control and ActD 4S RNA were identical to each other and were clearly distinct from the cell 4S RNA pattern. The viral patterns had three prominent areas of radioactivity. One of the spots was identified on the basis of its oligonucleotide fingerprint as tRNA(Pro), the primer for MuLV RNA-directed DNA synthesis. These results were obtained with two different MuLV strains, AKR and Moloney, each grown in SC-1 cells. The demonstration that ActD virions contain primer tRNA and in general exhibit the characteristic MuLV tRNA pattern rather than the complete representation of cell 4S RNA leads to the conclusion that genomic RNA is not the major determinant in selective packaging of host tRNA's. A possible role for one or more viral proteins, including reverse transcriptase, is suggested. PMID:219227
Kim, Kwondo; Jung, Jaehoon; Caetano-Anollés, Kelsey; Sung, Samsun; Yoo, DongAhn; Choi, Bong-Hwan; Kim, Hyung-Chul; Jeong, Jin-Young; Cho, Yong-Min; Park, Eung-Woo; Choi, Tae-Jeong; Park, Byoungho; Lim, Dajeong
2018-01-01
Artificial selection has been demonstrated to have a rapid and significant effect on the phenotype and genome of an organism. However, most previous studies on artificial selection have focused solely on genomic sequences modified by artificial selection or genomic sequences associated with a specific trait. In this study, we generated whole genome sequencing data of 126 cattle under artificial selection, and 24,973,862 single nucleotide variants, to investigate the relationship among artificial selection, genomic sequences and trait. Using runs of homozygosity detected from the variants, we showed that inbreeding has increased over decades, and at the same time demonstrated a small influence of recent inbreeding on body weight. We also identified a ~0.2 Mb run-of-homozygosity segment that may have been created by recent artificial selection. This approach may aid in the development of genetic markers directly influenced by artificial selection, and provide insight into the process of artificial selection. PMID:29561881
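Runs of homozygosity of the kind used in the abstract above can be detected with a simple scan over ordered genotype calls. The sketch below is not the authors' method: it assumes genotypes coded 0/1/2 on a single chromosome, treats any heterozygous call as ending a run, and uses hypothetical thresholds (`min_snps`, `min_length_bp`) that a real analysis would need to tune.

```python
def find_roh(positions, genotypes, min_snps=3, min_length_bp=200):
    """Return (start_bp, end_bp, n_snps) for each run of homozygous calls.

    positions: sorted base-pair coordinates on one chromosome.
    genotypes: calls coded 0/1/2; 1 (heterozygous) breaks a run.
    """
    runs = []
    start = None  # index of the first SNP in the current run

    def close_run(first, last):
        n_snps = last - first + 1
        length = positions[last] - positions[first]
        if n_snps >= min_snps and length >= min_length_bp:
            runs.append((positions[first], positions[last], n_snps))

    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start is None:
                start = i
        elif start is not None:
            close_run(start, i - 1)
            start = None
    if start is not None:
        close_run(start, len(genotypes) - 1)
    return runs


def f_roh(runs, genome_length_bp):
    """Share of the genome covered by ROH, a common inbreeding measure."""
    return sum(end - begin for begin, end, _ in runs) / genome_length_bp


pos = [100, 200, 300, 400, 500, 600, 700, 800]
calls = [0, 2, 2, 1, 0, 0, 2, 2]
segments = find_roh(pos, calls)
```

Production tools usually also tolerate a small number of heterozygous or missing calls inside a run; that refinement is omitted here to keep the scan readable.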
Doolittle-Hall, Janet M.; Cunningham Glasspoole, Danielle L.; Seaman, William T.; Webster-Cyriaque, Jennifer
2015-01-01
Oncoviruses cause tremendous global cancer burden. For several DNA tumor viruses, human genome integration is consistently associated with cancer development. However, genomic features associated with tumor viral integration are poorly understood. We sought to define genomic determinants for 1897 loci prone to hosting human papillomavirus (HPV), hepatitis B virus (HBV) or Merkel cell polyomavirus (MCPyV). These were compared to HIV, whose enzyme-mediated integration is well understood. A comprehensive catalog of integration sites was constructed from the literature and experimentally-determined HPV integration sites. Features were scored in eight categories (genes, expression, open chromatin, histone modifications, methylation, protein binding, chromatin segmentation and repeats) and compared to random loci. Random forest models determined loci classification and feature selection. HPV and HBV integrants were not fragile site associated. MCPyV preferred integration near sensory perception genes. Unique signatures of integration-associated predictive genomic features were detected. Importantly, repeats, actively-transcribed regions and histone modifications were common tumor viral integration signatures. PMID:26569308
Improving Genomic Prediction in Cassava Field Experiments Using Spatial Analysis.
Elias, Ani A; Rabbi, Ismail; Kulakow, Peter; Jannink, Jean-Luc
2018-01-04
Cassava (Manihot esculenta Crantz) is an important staple food in sub-Saharan Africa. Cassava breeding experiments were conducted at the International Institute of Tropical Agriculture to select elite parents. Taking into account the heterogeneity in the field while evaluating these trials can increase the accuracy in estimation of breeding values. We used an exploratory approach using the parametric spatial kernels Power, Spherical, and Gaussian to determine the best kernel for a given scenario. The spatial kernel was fit simultaneously with a genomic kernel in a genomic selection model. Predictability of these models was tested through a 10-fold cross-validation method repeated five times. The best model was chosen as the one with the lowest prediction root mean squared error compared to that of the base model having no spatial kernel. Results from our real and simulated data studies indicated that predictability can be increased by accounting for spatial variation irrespective of the heritability of the trait. In real data scenarios we observed that the accuracy can be increased by a median value of 3.4%. Through simulations, we showed that a 21% increase in accuracy can be achieved. We also found that Range (row) directional spatial kernels, mostly Gaussian, explained the spatial variance in 71% of the scenarios when spatial correlation was significant. Copyright © 2018 Elias et al.
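The evaluation scheme in the abstract above (three parametric spatial kernels, 10-fold cross-validation repeated five times, model choice by prediction RMSE) can be outlined as below. The kernel formulas are the usual textbook forms and are an assumption here, since the abstract does not give them; the model-fitting step is left out, and all function names are hypothetical.

```python
import math
import random


def gaussian_kernel(d, rho):
    """Assumed form exp(-(d/rho)^2) for distance d and range rho > 0."""
    return math.exp(-(d / rho) ** 2)


def spherical_kernel(d, rho):
    """Assumed form 1 - 1.5h + 0.5h^3 with h = d/rho, zero beyond the range."""
    h = d / rho
    return 1.0 - 1.5 * h + 0.5 * h ** 3 if h < 1.0 else 0.0


def power_kernel(d, rho):
    """Assumed form rho^d with 0 < rho < 1."""
    return rho ** d


def repeated_kfold(n, k=10, repeats=5, seed=0):
    """Yield (repeat, fold, train_indices, test_indices); assumes n >= k."""
    rng = random.Random(seed)
    for rep in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
        for f, test in enumerate(folds):
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield rep, f, train, test


def rmse(observed, predicted):
    """Root mean squared prediction error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))


splits = list(repeated_kfold(23))
```

In use, each candidate model (base model, then base plus one spatial kernel) would be fitted on `train`, scored with `rmse` on `test`, and the scores averaged over the 50 splits before comparing models.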
Bürger, R; Gimelfarb, A
1999-01-01
Stabilizing selection for an intermediate optimum is generally considered to deplete genetic variation in quantitative traits. However, conflicting results from various types of models have been obtained. While classical analyses assuming a large number of independent additive loci with individually small effects indicated that no genetic variation is preserved under stabilizing selection, several analyses of two-locus models showed the contrary. We perform a complete analysis of a generalization of Wright's two-locus quadratic-optimum model and investigate numerically the ability of quadratic stabilizing selection to maintain genetic variation in additive quantitative traits controlled by up to five loci. A statistical approach is employed by choosing randomly 4000 parameter sets (allelic effects, recombination rates, and strength of selection) for a given number of loci. For each parameter set we iterate the recursion equations that describe the dynamics of gamete frequencies starting from 20 randomly chosen initial conditions until an equilibrium is reached, record the quantities of interest, and calculate their corresponding mean values. As the number of loci increases from two to five, the fraction of the genome expected to be polymorphic declines surprisingly rapidly, and the loci that are polymorphic increasingly are those with small effects on the trait. As a result, the genetic variance expected to be maintained under stabilizing selection decreases very rapidly with increased number of loci. The equilibrium structure expected under stabilizing selection on an additive trait differs markedly from that expected under selection with no constraints on genotypic fitness values. The expected genetic variance, the expected polymorphic fraction of the genome, as well as other quantities of interest, are only weakly dependent on the selection intensity and the level of recombination. PMID:10353920
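The numerical procedure in the abstract above, iterating gamete-frequency recursions from random starting points until equilibrium, can be illustrated for the two-locus case. This is a sketch under stated assumptions rather than the authors' code: two diallelic loci with additive effects `a1` and `a2`, an optimum at the double-heterozygote value of zero, fitness 1 - s·G², recombination rate `r`, and the standard two-locus recursion with gametes ordered AB, Ab, aB, ab. Parameter values and function names are hypothetical.

```python
import random


def fitness_matrix(a1, a2, s):
    """w[i][j] = 1 - s*G^2, where G is the sum of the two gametic values."""
    values = [(a1 + a2) / 2, (a1 - a2) / 2, (-a1 + a2) / 2, (-a1 - a2) / 2]
    return [[1.0 - s * (gi + gj) ** 2 for gj in values] for gi in values]


def next_generation(x, w, r):
    """One generation of selection followed by recombination."""
    marginal = [sum(x[j] * w[i][j] for j in range(4)) for i in range(4)]
    mean_w = sum(x[i] * marginal[i] for i in range(4))
    d = x[0] * x[3] - x[1] * x[2]  # linkage disequilibrium
    signs = (1, -1, -1, 1)
    w_het = w[0][3]                # fitness of double heterozygotes
    return [(x[i] * marginal[i] - signs[i] * r * w_het * d) / mean_w
            for i in range(4)]


def iterate_to_equilibrium(x, w, r, tol=1e-12, max_gen=100000):
    """Iterate until no gamete frequency changes by more than tol."""
    for _ in range(max_gen):
        new = next_generation(x, w, r)
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x


def random_start(rng):
    """Random point on the simplex of four gamete frequencies."""
    draws = [rng.expovariate(1.0) for _ in range(4)]
    total = sum(draws)
    return [v / total for v in draws]


rng = random.Random(1)
w = fitness_matrix(a1=1.0, a2=0.5, s=0.2)
start = random_start(rng)
after_one = next_generation(start, w, r=0.1)
equilibrium = iterate_to_equilibrium(start, w, r=0.1, max_gen=5000)
```

Repeating the last three lines over many parameter sets and starting points, and recording which loci remain polymorphic at `equilibrium`, mirrors the statistical approach the abstract outlines.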
DOE Office of Scientific and Technical Information (OSTI.GOV)
Immonen, Taina T.; Conway, Jessica M.; Romero-Severson, Ethan O.; ...
2015-12-22
HIV-1 is subject to immune pressure exerted by the host, giving variants that escape the immune response an advantage. Virus released from activated latent cells competes against variants that have continually evolved and adapted to host immune pressure. Nevertheless, there is increasing evidence that virus displaying a signal of latency survives in patient plasma despite having reduced fitness due to long-term immune memory. We investigated the survival of virus with latent envelope genomic fragments by simulating within-host HIV-1 sequence evolution and the cycling of viral lineages in and out of the latent reservoir. Our model incorporates a detailed mutation process including nucleotide substitution, recombination, latent reservoir dynamics, diversifying selection pressure driven by the immune response, and purifying selection pressure asserted by deleterious mutations. We evaluated the ability of our model to capture sequence evolution in vivo by comparing our simulated sequences to HIV-1 envelope sequence data from 16 HIV-infected untreated patients. Empirical sequence divergence and diversity measures were qualitatively and quantitatively similar to those of our simulated HIV-1 populations, suggesting that our model invokes realistic trends of HIV-1 genetic evolution. Moreover, reconstructed phylogenies of simulated and patient HIV-1 populations showed similar topological structures. Our simulation results suggest that recombination is a key mechanism facilitating the persistence of virus with latent envelope genomic fragments in the productively infected cell population. Recombination increased the survival probability of latent virus forms approximately 13-fold. Prevalence of virus with latent fragments in productively infected cells was observed in only 2% of simulations when we ignored recombination, while the proportion increased to 27% of simulations when we allowed recombination.
We also found that the selection pressures exerted by different fitness landscapes influenced the shape of phylogenies, diversity trends, and survival of virus with latent genomic fragments. Furthermore, our model predicts that the persistence of latent genomic fragments from multiple different ancestral origins increases sequence diversity in plasma for reasonable fitness landscapes.
How to infer relative fitness from a sample of genomic sequences.
Dayarian, Adel; Shraiman, Boris I
2014-07-01
Mounting evidence suggests that natural populations can harbor extensive fitness diversity with numerous genomic loci under selection. It is also known that genealogical trees for populations under selection are quantifiably different from those expected under neutral evolution and described statistically by Kingman's coalescent. While differences in the statistical structure of genealogies have long been used as a test for the presence of selection, the full extent of the information that they contain has not been exploited. Here we demonstrate that the shape of the reconstructed genealogical tree for a moderately large number of random genomic samples taken from a fitness diverse, but otherwise unstructured, asexual population can be used to predict the relative fitness of individuals within the sample. To achieve this we define a heuristic algorithm, which we test in silico, using simulations of a Wright-Fisher model for a realistic range of mutation rates and selection strength. Our inferred fitness ranking is based on a linear discriminator that identifies rapidly coalescing lineages in the reconstructed tree. Inferred fitness ranking correlates strongly with actual fitness, with a genome in the top 10% ranked being in the top 20% fittest with false discovery rate of 0.1-0.3, depending on the mutation/selection parameters. The ranking also enables us to predict the genotypes that future populations inherit from the present one. While the inference accuracy increases monotonically with sample size, samples of 200 nearly saturate the performance. We propose that our approach can be used for inferring relative fitness of genomes obtained in single-cell sequencing of tumors and in monitoring viral outbreaks. Copyright © 2014 by the Genetics Society of America.
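The kind of Wright-Fisher simulation the abstract above uses as a test bed can be sketched for an asexual population with heritable fitness. The abstract does not specify the mutation model or the ranking heuristic, so the choices here are assumptions for illustration only: each offspring picks a parent with probability proportional to parental fitness, and with probability `mu` its fitness is multiplied by (1 + s) or (1 - s). All names and parameter values are hypothetical.

```python
import random


def wright_fisher_step(fitness, mu, s, rng):
    """Advance one generation; return (parent_indices, offspring_fitness)."""
    n = len(fitness)
    # Fitness-weighted sampling of parents, with replacement.
    parents = rng.choices(range(n), weights=fitness, k=n)
    offspring = []
    for p in parents:
        f = fitness[p]
        if rng.random() < mu:
            f *= (1.0 + s) if rng.random() < 0.5 else (1.0 - s)
        offspring.append(f)
    return parents, offspring


def simulate(n, generations, mu, s, seed=0):
    """Run the model and keep each generation's parent pointers."""
    rng = random.Random(seed)
    fitness = [1.0] * n
    genealogy = []
    for _ in range(generations):
        parents, fitness = wright_fisher_step(fitness, mu, s, rng)
        genealogy.append(parents)
    return genealogy, fitness


genealogy, final_fitness = simulate(n=50, generations=20, mu=0.1, s=0.05, seed=1)
```

The stored parent pointers are what a genealogical tree for a sample would be reconstructed from; the inference step itself is not attempted here.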
Wernegreen, Jennifer J
2017-09-15
Ancient associations between insects and bacteria provide models to study intimate host-microbe interactions. Currently, a wealth of genome sequence data for long-term, obligately intracellular (primary) endosymbionts of insects reveals profound genomic consequences of this specialized bacterial lifestyle. Those consequences include severe genome reduction and extreme base compositions. This minireview highlights the utility of genome sequence data to understand how, and why, endosymbionts have been pushed to such extremes, and to illuminate the functional consequences of such extensive genome change. While the static snapshots provided by individual endosymbiont genomes are valuable, comparative analyses of multiple genomes have shed light on evolutionary mechanisms. Namely, genome comparisons have told us that selection is important in fine-tuning gene content, but at the same time, mutational pressure and genetic drift contribute to genome degradation. Examples from Blochmannia, the primary endosymbiont of the ant tribe Camponotini, illustrate the value and constraints of genome sequence data, and exemplify how genomes can serve as a springboard for further comparative and experimental inquiry. Copyright © 2017. Published by Elsevier Inc.
Signatures of selection in tilapia revealed by whole genome resequencing
Hong Xia, Jun; Bai, Zhiyi; Meng, Zining; Zhang, Yong; Wang, Le; Liu, Feng; Jing, Wu; Yi Wan, Zi; Li, Jiale; Lin, Haoran; Hua Yue, Gen
2015-01-01
Natural selection and selective breeding for genetic improvement have left detectable signatures within the genome of a species. Identification of selection signatures is important in evolutionary biology and for detecting genes that can accelerate genetic improvement. However, selection signatures, including those of artificial selection and natural selection, have only been identified at the whole genome level in several genetically improved fish species. Tilapia is one of the most important genetically improved fish species in the world. Using next-generation sequencing, we sequenced the genomes of 47 tilapia individuals. We identified a total of 1.43 million high-quality SNPs and found that the LD block sizes ranged from 10 to 100 kb in tilapia. We detected over a hundred putative selective sweep regions in each line of tilapia. Most selection signatures were located in non-coding regions of the tilapia genome. The Wnt signaling, gonadotropin-releasing hormone receptor and integrin signaling pathways were under positive selection in all improved tilapia lines. Our study provides a genome-wide map of genetic variation and selection footprints in tilapia, which could be important for genetic studies and accelerating genetic improvement of tilapia. PMID:26373374
Metabolic 'engines' of flight drive genome size reduction in birds.
Wright, Natalie A; Gregory, T Ryan; Witt, Christopher C
2014-03-22
The tendency for flying organisms to possess small genomes has been interpreted as evidence of natural selection acting on the physical size of the genome. Nonetheless, the flight-genome link and its mechanistic basis have yet to be well established by comparative studies within a volant clade. Is there a particular functional aspect of flight such as brisk metabolism, lift production or maneuverability that impinges on the physical genome? We measured genome sizes, wing dimensions and heart, flight muscle and body masses from a phylogenetically diverse set of bird species. In phylogenetically controlled analyses, we found that genome size was negatively correlated with relative flight muscle size and heart index (i.e. ratio of heart to body mass), but positively correlated with body mass and wing loading. The proportional masses of the flight muscles and heart were the most important parameters explaining variation in genome size in multivariate models. Hence, the metabolic intensity of powered flight appears to have driven genome size reduction in birds.
Determining the Effect of Natural Selection on Linked Neutral Divergence across Species
Phung, Tanya N.; Lohmueller, Kirk E.
2016-01-01
A major goal in evolutionary biology is to understand how natural selection has shaped patterns of genetic variation across genomes. Studies in a variety of species have shown that neutral genetic diversity (intra-species differences) has been reduced at sites linked to those under direct selection. However, the effect of linked selection on neutral sequence divergence (inter-species differences) remains ambiguous. While empirical studies have reported correlations between divergence and recombination, which is interpreted as evidence for natural selection reducing linked neutral divergence, theory argues otherwise, especially for species that have diverged long ago. Here we address these outstanding issues by examining whether natural selection can affect divergence between both closely and distantly related species. We show that neutral divergence between closely related species (e.g. human-primate) is negatively correlated with functional content and positively correlated with human recombination rate. We also find that neutral divergence between distantly related species (e.g. human-rodent) is negatively correlated with functional content and positively correlated with estimates of background selection from primates. These patterns persist after accounting for the confounding factors of hypermutable CpG sites, GC content, and biased gene conversion. Coalescent models indicate that even when the contribution of ancestral polymorphism to divergence is small, background selection in the ancestral population can still explain a large proportion of the variance in divergence across the genome, generating the observed correlations. Our findings reveal that, contrary to previous intuition, natural selection can indirectly affect linked neutral divergence between both closely and distantly related species. 
Though we cannot formally exclude the possibility that the direct effects of purifying selection drive some of these patterns, such a scenario would be possible only if more of the genome is under purifying selection than currently believed. Our work has implications for understanding the evolution of genomes and interpreting patterns of genetic variation. PMID:27508305
Determining the Effect of Natural Selection on Linked Neutral Divergence across Species.
Phung, Tanya N; Huber, Christian D; Lohmueller, Kirk E
2016-08-01
A major goal in evolutionary biology is to understand how natural selection has shaped patterns of genetic variation across genomes. Studies in a variety of species have shown that neutral genetic diversity (intra-species differences) has been reduced at sites linked to those under direct selection. However, the effect of linked selection on neutral sequence divergence (inter-species differences) remains ambiguous. While empirical studies have reported correlations between divergence and recombination, which is interpreted as evidence for natural selection reducing linked neutral divergence, theory argues otherwise, especially for species that have diverged long ago. Here we address these outstanding issues by examining whether natural selection can affect divergence between both closely and distantly related species. We show that neutral divergence between closely related species (e.g. human-primate) is negatively correlated with functional content and positively correlated with human recombination rate. We also find that neutral divergence between distantly related species (e.g. human-rodent) is negatively correlated with functional content and positively correlated with estimates of background selection from primates. These patterns persist after accounting for the confounding factors of hypermutable CpG sites, GC content, and biased gene conversion. Coalescent models indicate that even when the contribution of ancestral polymorphism to divergence is small, background selection in the ancestral population can still explain a large proportion of the variance in divergence across the genome, generating the observed correlations. Our findings reveal that, contrary to previous intuition, natural selection can indirectly affect linked neutral divergence between both closely and distantly related species. 
Though we cannot formally exclude the possibility that the direct effects of purifying selection drive some of these patterns, such a scenario would be possible only if more of the genome is under purifying selection than currently believed. Our work has implications for understanding the evolution of genomes and interpreting patterns of genetic variation.
Rare beneficial mutations can halt Muller's ratchet
NASA Astrophysics Data System (ADS)
Balick, Daniel; Goyal, Sidhartha; Jerison, Elizabeth; Neher, Richard; Shraiman, Boris; Desai, Michael
2012-02-01
In viral, bacterial, and other asexual populations, the vast majority of non-neutral mutations are deleterious. This motivates the application of models without beneficial mutations. Here we show that the presence of surprisingly few compensatory mutations halts fitness decay in these models. Production of deleterious mutations is balanced by purifying selection, stabilizing the fitness distribution. However, stochastic vanishing of fitness classes can lead to slow fitness decay (i.e. Muller's ratchet). For weakly deleterious mutations, production overwhelms purification, rapidly decreasing population fitness. We show that when beneficial mutations are introduced, a stable steady state emerges in the form of a dynamic mutation-selection balance. We argue this state is generic for all mutation rates and population sizes, and is reached as an end state as genomes become saturated by either beneficial or deleterious mutations. Assuming all mutations have the same magnitude selective effect, we calculate the fraction of beneficial mutations necessary to maintain the dynamic balance. This may explain the unexpected maintenance of asexual genomes, as in mitochondria, in the presence of selection. This will affect the statistics of genetic diversity in these populations.
Friedrich, Torben; Rahmann, Sven; Weigel, Wilfried; Rabsch, Wolfgang; Fruth, Angelika; Ron, Eliora; Gunzer, Florian; Dandekar, Thomas; Hacker, Jörg; Müller, Tobias; Dobrindt, Ulrich
2010-10-21
The Enterobacteriaceae comprise a large number of clinically relevant species with several individual subspecies. Overlapping virulence-associated gene pools and the high overall genome plasticity often interfere with correct enterobacterial strain typing and risk assessment. Array technology offers a fast, reproducible and standardisable means for bacterial typing and thus provides many advantages for bacterial diagnostics, risk assessment and surveillance. The development of highly discriminative broad-range microbial diagnostic microarrays remains a challenge, because of marked genome plasticity of many bacterial pathogens. We developed a DNA microarray for strain typing and detection of major antimicrobial resistance genes of clinically relevant enterobacteria. For this purpose, we applied a global genome-wide probe selection strategy on 32 available complete enterobacterial genomes combined with a regression model for pathogen classification. The discriminative power of the probe set was further tested in silico on 15 additional complete enterobacterial genome sequences. DNA microarrays based on the selected probes were used to type 92 clinical enterobacterial isolates. Phenotypic tests confirmed the array-based typing results and corroborated that the selected probes allowed correct typing and prediction of major antibiotic resistances of clinically relevant Enterobacteriaceae, including the subspecies level, e.g. the reliable distinction of different E. coli pathotypes. Our results demonstrate that the global probe selection approach based on longest common factor statistics as well as the design of a DNA microarray with a restricted set of discriminative probes enables robust discrimination of different enterobacterial variants and represents a proof of concept that can be adopted for diagnostics of a wide range of microbial pathogens. 
Our approach circumvents misclassifications arising from the application of virulence markers, which are highly affected by horizontal gene transfer. Moreover, a broad range of pathogens have been covered by an efficient probe set size enabling the design of high-throughput diagnostics.
Syed, Khajamohiddin; Shale, Karabo; Pagadala, Nataraj Sekhar; Tuszynski, Jack
2014-01-01
Genome sequencing of basidiomycetes, a group of fungi capable of degrading/mineralizing plant material, revealed the presence of numerous cytochrome P450 monooxygenases (P450s) in their genomes, with some exceptions. Considering the large repertoire of P450s found in fungi, it is difficult to identify P450s that play an important role in fungal metabolism and the adaptation of fungi to diverse ecological niches. In this study, we followed Sir Charles Darwin’s theory of natural selection to identify such P450s in model basidiomycete fungi showing a preference for different types of plant components degradation. Any P450 family comprising a large number of member P450s compared to other P450 families indicates its natural selection over other P450 families by its important role in fungal physiology. Genome-wide comparative P450 analysis in the basidiomycete species, Phanerochaete chrysosporium, Phanerochaete carnosa, Agaricus bisporus, Postia placenta, Ganoderma sp. and Serpula lacrymans, revealed enrichment of 11 P450 families (out of 68 P450 families), CYP63, CYP512, CYP5035, CYP5037, CYP5136, CYP5141, CYP5144, CYP5146, CYP5150, CYP5348 and CYP5359. Phylogenetic analysis of the P450 family showed species-specific alignment of P450s across the P450 families with the exception of P450s of Phanerochaete chrysosporium and Phanerochaete carnosa, suggesting paralogous evolution of P450s in model basidiomycetes. P450 gene-structure analysis revealed high conservation in the size of exons and the location of introns. P450s with the same gene structure were found tandemly arranged in the genomes of selected fungi. This clearly suggests that extensive gene duplications, particularly tandem gene duplications, led to the enrichment of selective P450 families in basidiomycetes. Functional analysis and gene expression profiling data suggest that members of the P450 families are catalytically versatile and possibly involved in fungal colonization of plant material. 
To our knowledge, this is the first report on the identification and comparative-evolutionary analysis of P450 families enriched in model basidiomycetes. PMID:24466198
Li, Mingkun; Rothwell, Rebecca; Vermaat, Martijn; Wachsmuth, Manja; Schröder, Roland; Laros, Jeroen F.J.; van Oven, Mannis; de Bakker, Paul I.W.; Bovenberg, Jasper A.; van Duijn, Cornelia M.; van Ommen, Gert-Jan B.; Slagboom, P. Eline; Swertz, Morris A.; Wijmenga, Cisca; Kayser, Manfred; Boomsma, Dorret I.; Zöllner, Sebastian; de Knijff, Peter; Stoneking, Mark
2016-01-01
Although previous studies have documented a bottleneck in the transmission of mtDNA genomes from mothers to offspring, several aspects remain unclear, including the size and nature of the bottleneck. Here, we analyze the dynamics of mtDNA heteroplasmy transmission in the Genomes of the Netherlands (GoNL) data, which consists of complete mtDNA genome sequences from 228 trios, eight dizygotic (DZ) twin quartets, and 10 monozygotic (MZ) twin quartets. Using a minor allele frequency (MAF) threshold of 2%, we identified 189 heteroplasmies in the trio mothers, of which 59% were transmitted to offspring, and 159 heteroplasmies in the trio offspring, of which 70% were inherited from the mothers. MZ twin pairs exhibited greater similarity in MAF at heteroplasmic sites than DZ twin pairs, suggesting that the heteroplasmy MAF in the oocyte is the major determinant of the heteroplasmy MAF in the offspring. We used a likelihood method to estimate the effective number of mtDNA genomes transmitted to offspring under different bottleneck models; a variable bottleneck size model provided the best fit to the data, with an estimated mean of nine individual mtDNA genomes transmitted. We also found evidence for negative selection during transmission against novel heteroplasmies (in which the minor allele has never been observed in polymorphism data). These novel heteroplasmies are enhanced for tRNA and rRNA genes, and mutations associated with mtDNA diseases frequently occur in these genes. Our results thus suggest that the female germ line is able to recognize and select against deleterious heteroplasmies. PMID:26916109
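The effect of a transmission bottleneck on heteroplasmy can be illustrated with a simple sampling sketch. This is not the likelihood method used in the study. It assumes a fixed bottleneck of N genomes drawn at random from the maternal pool, whereas the abstract reports that a variable-size model fit best. Under that assumption the offspring minor allele frequency has variance p(1 - p)/N. All function names and example values are hypothetical.

```python
import random


def transmit(maternal_maf, bottleneck_size, rng):
    """Sample `bottleneck_size` genomes at random; return the offspring MAF."""
    minor = sum(1 for _ in range(bottleneck_size) if rng.random() < maternal_maf)
    return minor / bottleneck_size


def expected_variance(maternal_maf, bottleneck_size):
    """Binomial sampling variance of the offspring MAF (fixed-size assumption)."""
    return maternal_maf * (1.0 - maternal_maf) / bottleneck_size


def loss_probability(maternal_maf, bottleneck_size):
    """Probability that no minor-allele genome passes the bottleneck."""
    return (1.0 - maternal_maf) ** bottleneck_size


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the run is repeatable
    draws = [transmit(0.10, 9, rng) for _ in range(10000)]
    print("simulated mean MAF:", sum(draws) / len(draws))
    print("expected variance:", expected_variance(0.10, 9))
    print("chance of losing the minor allele:", loss_probability(0.10, 9))
```

With a bottleneck this narrow, pure sampling already removes many low-frequency heteroplasmies, so any inference of selection has to be made relative to this drift-only baseline.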
Jha, Aashish R.; Miles, Cecelia M.; Lippert, Nodia R.; Brown, Christopher D.; White, Kevin P.; Kreitman, Martin
2015-01-01
Complete genome resequencing of populations holds great promise in deconstructing complex polygenic traits to elucidate molecular and developmental mechanisms of adaptation. Egg size is a classic adaptive trait in insects, birds, and other taxa, but its highly polygenic architecture has prevented high-resolution genetic analysis. We used replicated experimental evolution in Drosophila melanogaster and whole-genome sequencing to identify consistent signatures of polygenic egg-size adaptation. A generalized linear-mixed model revealed reproducible allele frequency differences between replicated experimental populations selected for large and small egg volumes at approximately 4,000 single nucleotide polymorphisms (SNPs). Several hundred distinct genomic regions contain clusters of these SNPs and have lower heterozygosity than the genomic background, consistent with selection acting on polymorphisms in these regions. These SNPs are also enriched among genes expressed in Drosophila ovaries and many of these genes have well-defined functions in Drosophila oogenesis. Additional genes regulating egg development, growth, and cell size show evidence of directional selection as genes regulating these biological processes are enriched for highly differentiated SNPs. Genetic crosses performed with a subset of candidate genes demonstrated that these genes influence egg size, at least in the large genetic background. These findings confirm the highly polygenic architecture of this adaptive trait, and suggest the involvement of many novel candidate genes in regulating egg size. PMID:26044351
The Blueprint of a Minimal Cell: MiniBacillus
Reuß, Daniel R.; Commichau, Fabian M.; Gundlach, Jan; Zhu, Bingyao
2016-01-01
Bacillus subtilis is one of the best-studied organisms. Due to the broad knowledge and annotation and the well-developed genetic system, this bacterium is an excellent starting point for genome minimization with the aim of constructing a minimal cell. We have analyzed the genome of B. subtilis and selected all genes that are required to allow life in complex medium at 37°C. This selection is based on the known information on essential genes and functions as well as on gene and protein expression data and gene conservation. The list presented here includes 523 and 119 genes coding for proteins and RNAs, respectively. These proteins and RNAs are required for the basic functions of life in information processing (replication and chromosome maintenance, transcription, translation, protein folding, and secretion), metabolism, cell division, and the integrity of the minimal cell. The completeness of the selected metabolic pathways, reactions, and enzymes was verified by the development of a model of metabolism of the minimal cell. A comparison of the MiniBacillus genome to the recently reported designed minimal genome of Mycoplasma mycoides JCVI-syn3.0 indicates excellent agreement in the information-processing pathways, whereas each species has a metabolism that reflects specific evolution and adaptation. The blueprint of MiniBacillus presented here serves as the starting point for a successive reduction of the B. subtilis genome. PMID:27681641
Short template switch events explain mutation clusters in the human genome.
Löytynoja, Ari; Goldman, Nick
2017-06-01
Resequencing efforts are uncovering the extent of genetic variation in humans and provide data to study the evolutionary processes shaping our genome. One recurring puzzle in both intra- and inter-species studies is the high frequency of complex mutations comprising multiple nearby base substitutions or insertion-deletions. We devised a generalized mutation model of template switching during replication that extends existing models of genome rearrangement and used this to study the role of template switch events in the origin of short mutation clusters. Applied to the human genome, our model detects thousands of template switch events during the evolution of human and chimp from their common ancestor and hundreds of events between two independently sequenced human genomes. Although many of these are consistent with a template switch mechanism previously proposed for bacteria, our model also identifies new types of mutations that create short inversions, some flanked by paired inverted repeats. The local template switch process can create numerous complex mutation patterns, including hairpin loop structures, and explains multinucleotide mutations and compensatory substitutions without invoking positive selection, speculative mechanisms, or implausible coincidence. Clustered sequence differences are challenging for current mapping and variant calling methods, and we show that many erroneous variant annotations exist in human reference data. Local template switch events may have been neglected as an explanation for complex mutations because of biases in commonly used analyses. Incorporation of our model into reference-based analysis pipelines and comparisons of de novo assembled genomes will lead to improved understanding of genome variation and evolution. © 2017 Löytynoja and Goldman; Published by Cold Spring Harbor Laboratory Press.
Genome-wide evidence for divergent selection between populations of a major agricultural pathogen.
Hartmann, Fanny E; McDonald, Bruce A; Croll, Daniel
2018-06-01
The genetic and environmental homogeneity in agricultural ecosystems is thought to impose strong and uniform selection pressures. However, the impact of this selection on plant pathogen genomes remains largely unknown. We aimed to identify the proportion of the genome and the specific gene functions under positive selection in populations of the fungal wheat pathogen Zymoseptoria tritici. First, we performed genome scans in four field populations that were sampled from different continents and on distinct wheat cultivars to test which genomic regions are under recent selection. Based on extended haplotype homozygosity and composite likelihood ratio tests, we identified 384 and 81 selective sweeps affecting 4% and 0.5% of the 35 Mb core genome, respectively. We found differences both in the number and the position of selective sweeps across the genome between populations. Using an XtX-based outlier detection approach, we identified 51 extremely divergent genomic regions between the allopatric populations, suggesting that divergent selection led to locally adapted pathogen populations. We performed an outlier detection analysis between two sympatric populations infecting two different wheat cultivars to identify evidence for host-driven selection. Selective sweep regions harboured genes that are likely to play a role in successfully establishing host infections. We also identified secondary metabolite gene clusters and an enrichment in genes encoding transporter and protein localization functions. The latter gene functions mediate responses to environmental stress, including interactions with the host. The distinct gene functions under selection indicate that both local host genotypes and abiotic factors contributed to local adaptation. © 2018 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.
Mueller, Jakob C; Kuhl, Heiner; Timmermann, Bernd; Kempenaers, Bart
2016-03-01
Decoding genomic sequences and determining their variation within populations has potential to reveal adaptive processes and unravel the genetic basis of ecologically relevant trait variation within a species. The blue tit Cyanistes caeruleus--a long-time ecological model species--has been used to investigate fitness consequences of variation in mating and reproductive behaviour. However, very little is known about the underlying genetic changes due to natural and sexual selection in the genome of this songbird. As a step to bridge this gap, we assembled the first draft genome of a single blue tit, mapped the transcriptome of five females and five males to this reference, identified genomewide variants and performed sex-differential expression analysis in the gonads, brain and other tissues. In the gonads, we found a high number of sex-biased genes, and of those, a similar proportion were sex-limited (genes only expressed in one sex) in males and females. However, in the brain, the proportion of female-limited genes within the female-biased gene category (82%) was substantially higher than the proportion of male-limited genes within the male-biased category (6%). This suggests a predominant on-off switching mechanism for the female-limited genes. In addition, most male-biased genes were located on the Z-chromosome, indicating incomplete dosage compensation for the male-biased genes. We called more than 500,000 SNPs from the RNA-seq data. Heterozygote detection in the single reference individual was highly congruent between DNA-seq and RNA-seq calling. Using information from these polymorphisms, we identified potential selection signals in the genome. We list candidate genes which can be used for further sequencing and detailed selection studies, including genes potentially related to meiotic drive evolution. A public genome browser of the blue tit with the described information is available at http://public-genomes-ngs.molgen.mpg.de. © 2015 John Wiley & Sons Ltd.
Genomic signatures of positive selection in humans and the limits of outlier approaches.
Kelley, Joanna L; Madeoy, Jennifer; Calhoun, John C; Swanson, Willie; Akey, Joshua M
2006-08-01
Identifying regions of the human genome that have been targets of positive selection will provide important insights into recent human evolutionary history and may facilitate the search for complex disease genes. However, the confounding effects of population demographic history and selection on patterns of genetic variation complicate inferences of selection when a small number of loci are studied. To this end, identifying outlier loci from empirical genome-wide distributions of genetic variation is a promising strategy to detect targets of selection. Here, we evaluate the power and efficiency of a simple outlier approach and describe a genome-wide scan for positive selection using a dense catalog of 1.58 million SNPs that were genotyped in three human populations. In total, we analyzed 14,589 genes, 385 of which possess patterns of genetic variation consistent with the hypothesis of positive selection. Furthermore, several extended genomic regions were found, spanning >500 kb, that contained multiple contiguous candidate selection genes. More generally, these data provide important practical insights into the limits of outlier approaches in genome-wide scans for selection, provide strong candidate selection genes to study in greater detail, and may have important implications for disease related research.
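The outlier strategy described above can be sketched in a few lines. This is an illustration only: the abstract does not state the summary statistic or the tail cutoff used in the scan, so the per-gene scores, the `tail_fraction` parameter and the function name here are hypothetical.

```python
def empirical_outliers(scores, tail_fraction=0.01):
    """Return the gene names in the upper tail of the empirical score distribution.

    `scores` maps a gene name to a per-gene summary statistic, where larger
    values are taken (by assumption) to be more consistent with selection.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    n_keep = max(1, int(len(ranked) * tail_fraction))  # always keep at least one
    return [name for name, _ in ranked[:n_keep]]


if __name__ == "__main__":
    # Toy scores for eight hypothetical genes.
    toy_scores = {"gene%d" % i: float(i) for i in range(8)}
    print(empirical_outliers(toy_scores, tail_fraction=0.25))
```

By construction, the function returns the same fraction of genes whether or not any of them were targets of selection. This is one practical limit of outlier approaches: the tail of an empirical distribution is a ranking, not a test against a neutral model.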
Akanno, E C; Schenkel, F S; Sargolzaei, M; Friendship, R M; Robinson, J A B
2014-10-01
Genetic improvement of pigs in tropical developing countries has focused on imported exotic populations which have been subjected to intensive selection with attendant high population-wide linkage disequilibrium (LD). Presently, indigenous pig populations with limited selection and low LD are being considered for improvement. Given that the infrastructure for genetic improvement using the conventional BLUP selection methods is lacking, a genome-wide selection (GS) program was proposed for developing countries. A simulation study was conducted to evaluate the option of using a 60 K SNP panel and the observed amount of LD in the exotic and indigenous pig populations. Several scenarios were evaluated including different size and structure of training and validation populations, different selection methods and long-term accuracy of GS in different population/breeding structures and traits. The training set included a previously selected exotic population, an unselected indigenous population and their crossbreds. Traits studied included number born alive (NBA), average daily gain (ADG) and back fat thickness (BFT). The ridge regression method was used to train the prediction model. The results showed that accuracies of genomic breeding values (GBVs) in the range of 0.30 (NBA) to 0.86 (BFT) in the validation population are expected if high density marker panels are utilized. The GS method improved accuracy of breeding values more than the pedigree-based approach for traits with low heritability and in young animals with no performance data. The crossbred training population performed better than purebreds when validation was in populations with a structure similar to or different from that of the training set. Genome-wide selection holds promise for genetic improvement of pigs in the tropics. © 2014 Blackwell Verlag GmbH.
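Ridge regression, named above as the training method, can be sketched as follows. This is a minimal illustration and not the simulation pipeline of the study. It assumes centered genotype codes and phenotypes (so no intercept is fitted) and solves (X'X + lambda I) b = X'y for the marker effects b. The penalty value and all function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (aug[i][m] - tail) / aug[i][i]
    return x


def ridge_marker_effects(genotypes, phenotypes, lam):
    """Estimate marker effects from centered genotypes with ridge penalty `lam`."""
    n, m = len(genotypes), len(genotypes[0])
    xtx = [[sum(genotypes[k][i] * genotypes[k][j] for k in range(n))
            + (lam if i == j else 0.0)          # add the penalty on the diagonal
            for j in range(m)] for i in range(m)]
    xty = [sum(genotypes[k][i] * phenotypes[k] for k in range(n)) for i in range(m)]
    return solve_linear_system(xtx, xty)


def predict_gbv(genotypes, effects):
    """Genomic breeding value = sum of genotype code times marker effect."""
    return [sum(g * e for g, e in zip(row, effects)) for row in genotypes]


if __name__ == "__main__":
    train_x = [[1.0, 0.0], [0.0, 1.0]]
    train_y = [2.0, 4.0]
    effects = ridge_marker_effects(train_x, train_y, lam=1.0)
    print(effects, predict_gbv([[1.0, 1.0]], effects))
```

The penalty shrinks every marker effect toward zero, which is what allows a panel with far more markers than animals to be fitted at all. A dense solver like this one is only practical for small toy panels, not for 60 K markers.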
Clear: Composition of Likelihoods for Evolve and Resequence Experiments.
Iranmehr, Arya; Akbari, Ali; Schlötterer, Christian; Bafna, Vineet
2017-06-01
The advent of next generation sequencing technologies has made whole-genome and whole-population sampling possible, even for eukaryotes with large genomes. With this development, experimental evolution studies can be designed to observe molecular evolution "in action" via evolve-and-resequence (E&R) experiments. Among other applications, E&R studies can be used to locate the genes and variants responsible for genetic adaptation. Most existing literature on time-series data analysis assumes large population size, accurate allele frequency estimates, or wide time spans. These assumptions do not hold in many E&R studies. In this article, we propose a method, composition of likelihoods for evolve-and-resequence experiments (Clear), to identify signatures of selection in small population E&R experiments. Clear takes whole-genome sequences of pools of individuals as input, and properly addresses heterogeneous ascertainment bias resulting from uneven coverage. Clear also provides unbiased estimates of model parameters, including population size, selection strength, and dominance, while being computationally efficient. Extensive simulations show that Clear achieves higher power in detecting and localizing selection over a wide range of parameters, and is robust to variation of coverage. We applied the Clear statistic to multiple E&R experiments, including data from a study of adaptation of Drosophila melanogaster to alternating temperatures and a study of outcrossing yeast populations, and identified multiple regions under selection with genome-wide significance. Copyright © 2017 by the Genetics Society of America.
Jang, In Sock; Dienstmann, Rodrigo; Margolin, Adam A; Guinney, Justin
2015-01-01
Complex mechanisms involving genomic aberrations in numerous proteins and pathways are believed to be a key cause of many diseases such as cancer. With recent advances in genomics, elucidating the molecular basis of cancer at a patient level is now feasible, and has led to personalized treatment strategies whereby a patient is treated according to his or her genomic profile. However, there is growing recognition that existing treatment modalities are overly simplistic, and do not fully account for the deep genomic complexity associated with sensitivity or resistance to cancer therapies. To overcome these limitations, large-scale pharmacogenomic screens of cancer cell lines--in conjunction with modern statistical learning approaches--have been used to explore the genetic underpinnings of drug response. While these analyses have demonstrated the ability to infer genetic predictors of compound sensitivity, to date most modeling approaches have been data-driven, i.e. they do not explicitly incorporate domain-specific knowledge (priors) in the process of learning a model. While a purely data-driven approach offers an unbiased perspective of the data--and may yield unexpected or novel insights--this strategy introduces challenges for both model interpretability and accuracy. In this study, we propose a novel prior-incorporated sparse regression model in which the choice of informative predictor sets is carried out by knowledge-driven priors (gene sets) in a stepwise fashion. Under regularization in a linear regression model, our algorithm is able to incorporate prior biological knowledge across the predictive variables thereby improving the interpretability of the final model with no loss--and often an improvement--in predictive performance. 
We evaluate the performance of our algorithm compared to well-known regularization methods such as LASSO, Ridge and Elastic net regression in the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (Sanger) pharmacogenomics datasets, demonstrating that incorporation of the biological priors selected by our model confers improved predictability and interpretability, despite much fewer predictors, over existing state-of-the-art methods.
Atanur, Santosh S; Diaz, Ana Garcia; Maratou, Klio; Sarkis, Allison; Rotival, Maxime; Game, Laurence; Tschannen, Michael R; Kaisaki, Pamela J; Otto, Georg W; Ma, Man Chun John; Keane, Thomas M; Hummel, Oliver; Saar, Kathrin; Chen, Wei; Guryev, Victor; Gopalakrishnan, Kathirvel; Garrett, Michael R; Joe, Bina; Citterio, Lorena; Bianchi, Giuseppe; McBride, Martin; Dominiczak, Anna; Adams, David J; Serikawa, Tadao; Flicek, Paul; Cuppen, Edwin; Hubner, Norbert; Petretto, Enrico; Gauguier, Dominique; Kwitek, Anne; Jacob, Howard; Aitman, Timothy J
2013-08-01
Large numbers of inbred laboratory rat strains have been developed for a range of complex disease phenotypes. To gain insights into the evolutionary pressures underlying selection for these phenotypes, we sequenced the genomes of 27 rat strains, including 11 models of hypertension, diabetes, and insulin resistance, along with their respective control strains. Altogether, we identified more than 13 million single-nucleotide variants, indels, and structural variants across these rat strains. Analysis of strain-specific selective sweeps and gene clusters implicated genes and pathways involved in cation transport, angiotensin production, and regulators of oxidative stress in the development of cardiovascular disease phenotypes in rats. Many of the rat loci that we identified overlap with previously mapped loci for related traits in humans, indicating the presence of shared pathways underlying these phenotypes in rats and humans. These data represent a step change in resources available for evolutionary analysis of complex traits in disease models. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
Atanur, Santosh S.; Diaz, Ana Garcia; Maratou, Klio; Sarkis, Allison; Rotival, Maxime; Game, Laurence; Tschannen, Michael R.; Kaisaki, Pamela J.; Otto, Georg W.; Ma, Man Chun John; Keane, Thomas M.; Hummel, Oliver; Saar, Kathrin; Chen, Wei; Guryev, Victor; Gopalakrishnan, Kathirvel; Garrett, Michael R.; Joe, Bina; Citterio, Lorena; Bianchi, Giuseppe; McBride, Martin; Dominiczak, Anna; Adams, David J.; Serikawa, Tadao; Flicek, Paul; Cuppen, Edwin; Hubner, Norbert; Petretto, Enrico; Gauguier, Dominique; Kwitek, Anne; Jacob, Howard; Aitman, Timothy J.
2013-01-01
Large numbers of inbred laboratory rat strains have been developed for a range of complex disease phenotypes. To gain insights into the evolutionary pressures underlying selection for these phenotypes, we sequenced the genomes of 27 rat strains, including 11 models of hypertension, diabetes, and insulin resistance, along with their respective control strains. Altogether, we identified more than 13 million single-nucleotide variants, indels, and structural variants across these rat strains. Analysis of strain-specific selective sweeps and gene clusters implicated genes and pathways involved in cation transport, angiotensin production, and regulators of oxidative stress in the development of cardiovascular disease phenotypes in rats. Many of the rat loci that we identified overlap with previously mapped loci for related traits in humans, indicating the presence of shared pathways underlying these phenotypes in rats and humans. These data represent a step change in resources available for evolutionary analysis of complex traits in disease models. PMID:23890820
Genome-Wide Association Analysis of Adaptation Using Environmentally Predicted Traits.
van Heerwaarden, Joost; van Zanten, Martijn; Kruijer, Willem
2015-10-01
Current methods for studying the genetic basis of adaptation evaluate genetic associations with ecologically relevant traits or single environmental variables, under the implicit assumption that natural selection imposes correlations between phenotypes, environments and genotypes. In practice, observed trait and environmental data are manifestations of unknown selective forces and are only indirectly associated with adaptive genetic variation. In theory, improved estimation of these forces could enable more powerful detection of loci under selection. Here we present an approach in which we approximate adaptive variation by modeling phenotypes as a function of the environment and using the predicted trait in multivariate and univariate genome-wide association analysis (GWAS). Based on computer simulations and published flowering time data from the model plant Arabidopsis thaliana, we find that environmentally predicted traits lead to higher recovery of functional loci in multivariate GWAS and are more strongly correlated to allele frequencies at adaptive loci than individual environmental variables. Our results provide an example of the use of environmental data to obtain independent and meaningful information on adaptive genetic variation.
Selections that isolate recombinant mitochondrial genomes in animals
Ma, Hansong; O'Farrell, Patrick H
2015-01-01
Homologous recombination is widespread and catalyzes evolution. Nonetheless, its existence in animal mitochondrial DNA is questioned. We designed selections for recombination between co-resident mitochondrial genomes in various heteroplasmic Drosophila lines. In four experimental settings, recombinant genomes became the sole or dominant genome in the progeny. Thus, selection uncovers the occurrence of homologous recombination in Drosophila mtDNA and documents its functional benefit. Double-strand breaks enhanced recombination in the germline and revealed somatic recombination. When the recombination partner was a diverged Drosophila melanogaster genome or a genome from a different species such as Drosophila yakuba, sequencing revealed long continuous stretches of exchange. In addition, the distribution of sequence polymorphisms in recombinants allowed us to map a selected trait to a particular region in the Drosophila mitochondrial genome. Thus, recombination can be harnessed to dissect the function and evolution of the mitochondrial genome. DOI: http://dx.doi.org/10.7554/eLife.07247.001 PMID:26237110
Genomic Prediction Accounting for Residual Heteroskedasticity.
Ou, Zhining; Tempelman, Robert J; Steibel, Juan P; Ernst, Catherine W; Bates, Ronald O; Bello, Nora M
2015-11-12
Whole-genome prediction (WGP) models that use single-nucleotide polymorphism marker information to predict genetic merit of animals and plants typically assume homogeneous residual variance. However, variability is often heterogeneous across agricultural production systems and may subsequently bias WGP-based inferences. This study extends classical WGP models based on normality, heavy-tailed specifications and variable selection to explicitly account for environmentally-driven residual heteroskedasticity under a hierarchical Bayesian mixed-models framework. WGP models assuming homogeneous or heterogeneous residual variances were fitted to training data generated under simulation scenarios reflecting a gradient of increasing heteroskedasticity. Model fit was based on pseudo-Bayes factors and also on prediction accuracy of genomic breeding values computed on a validation data subset one generation removed from the simulated training dataset. Homogeneous vs. heterogeneous residual variance WGP models were also fitted to two quantitative traits, namely 45-min postmortem carcass temperature and loin muscle pH, recorded in a swine resource population dataset prescreened for high and mild residual heteroskedasticity, respectively. Fit of competing WGP models was compared using pseudo-Bayes factors. Predictive ability, defined as the correlation between predicted and observed phenotypes in validation sets of a five-fold cross-validation was also computed. Heteroskedastic error WGP models showed improved model fit and enhanced prediction accuracy compared to homoskedastic error WGP models although the magnitude of the improvement was small (less than two percentage points net gain in prediction accuracy). Nevertheless, accounting for residual heteroskedasticity did improve accuracy of selection, especially on individuals of extreme genetic merit. Copyright © 2016 Ou et al.
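The predictive ability described above (correlation between predicted and observed phenotypes across five cross-validation folds) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the one-predictor least-squares fit in `fit_line` is a hypothetical stand-in for a WGP model, the simulated `score` and `pheno` values are invented, and averaging the per-fold correlations (rather than pooling predictions) is an assumption.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def fit_line(x, y):
    """Least squares for y = a + b*x; a hypothetical stand-in for a WGP model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - b * mx, b


def cv_predictive_ability(x, y, k=5, seed=0):
    """Mean correlation of predicted vs. observed phenotypes over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint validation sets
    cors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        pred = [a + b * x[i] for i in fold]
        cors.append(pearson(pred, [y[i] for i in fold]))
    return sum(cors) / k


# Invented data: a genomic score with true slope 2 plus unit-variance noise.
rng = random.Random(1)
score = [rng.gauss(0, 1) for _ in range(100)]
pheno = [2.0 * s + rng.gauss(0, 1) for s in score]
ability = cv_predictive_ability(score, pheno)
```

Swapping `fit_line` for a homoskedastic or heteroskedastic model while keeping the same folds would allow the kind of side-by-side comparison the abstract reports.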
Spiked GBS: A unified, open platform for single marker genotyping and whole-genome profiling
USDA-ARS's Scientific Manuscript database
In plant breeding, there are two primary applications for DNA markers in selection: 1) selection of known genes using a single marker assay (marker-assisted selection; MAS); and 2) whole-genome profiling and prediction (genomic selection; GS). Typically, marker platforms have addressed only one of t...
Simulating a base population in honey bee for molecular genetic studies
2012-01-01
Background Over the past years, reports have indicated that honey bee populations are declining and that infestation by an ecto-parasitic mite (Varroa destructor) is one of the main causes. Selective breeding of resistant bees can help to prevent losses due to the parasite, but it requires that a robust breeding program and genetic evaluation are implemented. Genomic selection has emerged as an important tool in animal breeding programs and simulation studies have shown that it yields more accurate breeding value estimates, higher genetic gain and low rates of inbreeding. Since genomic selection relies on marker data, simulations conducted on a genomic dataset are a pre-requisite before selection can be implemented. Although genomic datasets have been simulated in other species undergoing genetic evaluation, simulation of a genomic dataset specific to the honey bee is required since this species has a distinct genetic and reproductive biology. Our software program was aimed at constructing a base population by simulating a random mating honey bee population. A forward-time population simulation approach was applied since it allows modeling of genetic characteristics and reproductive behavior specific to the honey bee. Results Our software program yielded a genomic dataset for a base population in linkage disequilibrium. In addition, information was obtained on (1) the position of markers on each chromosome, (2) allele frequency, (3) χ² statistics for Hardy-Weinberg equilibrium, (4) a sorted list of markers with a minor allele frequency less than or equal to the input value, (5) average r² values of linkage disequilibrium between all pairs of simulated marker loci for all generations and (6) average r² value of linkage disequilibrium in the last generation for selected markers with the highest minor allele frequency.
Conclusion We developed a software program that takes into account the genetic and reproductive biology specific to the honey bee and that can be used to constitute a genomic dataset compatible with the simulation studies necessary to optimize breeding programs. The source code together with an instruction file is freely accessible at http://msproteomics.org/Research/Misc/honeybeepopulationsimulator.html PMID:22520469
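The summary statistics listed in this abstract (allele frequency, χ² for Hardy-Weinberg equilibrium, r² of linkage disequilibrium) follow standard textbook definitions, which can be sketched as below. This is not the published simulator's code: the function names are hypothetical, genotypes are assumed to come from diploid individuals coded as 0, 1 or 2 copies of one allele, haplotypes are assumed phased and coded 0/1, and all loci are assumed polymorphic (a monomorphic locus would cause a division by zero).

```python
def allele_freq(genotypes):
    """Frequency of the counted allele; genotypes coded as 0, 1 or 2 copies."""
    return sum(genotypes) / (2 * len(genotypes))


def hwe_chi2(genotypes):
    """Chi-square statistic for departure from Hardy-Weinberg proportions."""
    n = len(genotypes)
    p = allele_freq(genotypes)
    q = 1 - p
    expected = {0: n * q * q, 1: 2 * n * p * q, 2: n * p * p}
    observed = {g: genotypes.count(g) for g in (0, 1, 2)}
    return sum((observed[g] - expected[g]) ** 2 / expected[g] for g in (0, 1, 2))


def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic loci from phased 0/1 haplotype alleles."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                 # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Invented example data.
in_hwe = [0] * 25 + [1] * 50 + [2] * 25   # exact Hardy-Weinberg proportions at p = 0.5
no_hets = [0] * 50 + [2] * 50             # same allele frequency, no heterozygotes
chi2_in_hwe = hwe_chi2(in_hwe)
chi2_no_hets = hwe_chi2(no_hets)
r2_linked = ld_r2([0, 0, 1, 1], [0, 0, 1, 1])
r2_unlinked = ld_r2([0, 1, 0, 1], [0, 0, 1, 1])
```

A filter on minor allele frequency, as in item (4) of the abstract, would compare `min(p, 1 - p)` from `allele_freq` against the chosen threshold.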
Simulating a base population in honey bee for molecular genetic studies.
Gupta, Pooja; Conrad, Tim; Spötter, Andreas; Reinsch, Norbert; Bienefeld, Kaspar
2012-06-27
Over the past years, reports have indicated that honey bee populations are declining and that infestation by an ecto-parasitic mite (Varroa destructor) is one of the main causes. Selective breeding of resistant bees can help to prevent losses due to the parasite, but it requires that a robust breeding program and genetic evaluation are implemented. Genomic selection has emerged as an important tool in animal breeding programs and simulation studies have shown that it yields more accurate breeding value estimates, higher genetic gain and low rates of inbreeding. Since genomic selection relies on marker data, simulations conducted on a genomic dataset are a pre-requisite before selection can be implemented. Although genomic datasets have been simulated in other species undergoing genetic evaluation, simulation of a genomic dataset specific to the honey bee is required since this species has a distinct genetic and reproductive biology. Our software program was aimed at constructing a base population by simulating a random mating honey bee population. A forward-time population simulation approach was applied since it allows modeling of genetic characteristics and reproductive behavior specific to the honey bee. Our software program yielded a genomic dataset for a base population in linkage disequilibrium. In addition, information was obtained on (1) the position of markers on each chromosome, (2) allele frequency, (3) χ² statistics for Hardy-Weinberg equilibrium, (4) a sorted list of markers with a minor allele frequency less than or equal to the input value, (5) average r² values of linkage disequilibrium between all pairs of simulated marker loci for all generations and (6) average r² value of linkage disequilibrium in the last generation for selected markers with the highest minor allele frequency.
We developed a software program that takes into account the genetic and reproductive biology specific to the honey bee and that can be used to constitute a genomic dataset compatible with the simulation studies necessary to optimize breeding programs. The source code together with an instruction file is freely accessible at http://msproteomics.org/Research/Misc/honeybeepopulationsimulator.html.
Effect of Artificial Selection on Runs of Homozygosity in U.S. Holstein Cattle
Kim, Eui-Soo; Cole, John B.; Huson, Heather; Wiggans, George R.; Van Tassell, Curtis P.; Crooker, Brian A.; Liu, George; Da, Yang; Sonstegard, Tad S.
2013-01-01
The intensive selection programs for milk made possible by mass artificial insemination increased the similarity among the genomes of North American (NA) Holsteins tremendously since the 1960s. This migration of elite alleles has caused certain regions of the genome to have runs of homozygosity (ROH) occasionally spanning millions of continuous base pairs at a specific locus. In this study, genome signatures of artificial selection in NA Holsteins born between 1953 and 2008 were identified by comparing changes in ROH between three distinct groups under different selective pressure for milk production. The ROH regions were also used to estimate the inbreeding coefficients. The comparisons of genomic autozygosity between groups selected or unselected since 1964 for milk production revealed significant differences with respect to overall ROH frequency and distribution. These results indicate selection has increased overall autozygosity across the genome, whereas the autozygosity in an unselected line has not changed significantly across most of the chromosomes. In addition, ROH distribution was more variable across the genomes of selected animals in comparison to a more even ROH distribution for unselected animals. Further analysis of genome-wide autozygosity changes and the association between traits and haplotypes identified more than 40 genomic regions under selection on several chromosomes (Chr) including Chr 2, 7, 16 and 20. Many of these selection signatures corresponded to quantitative trait loci for milk, fat, and protein yield previously found in contemporary Holsteins. PMID:24348915
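The idea of using ROH to estimate inbreeding can be illustrated with a small scan over one animal's SNP genotypes. This is a minimal sketch under stated assumptions, not the method used in the study: the function names, the 0/1/2 genotype coding, and the thresholds `min_snps` and `min_length_bp` are all hypothetical and far smaller than thresholds used on real SNP-chip data, and no heterozygous calls are tolerated inside a run.

```python
def find_roh(genotypes, positions, min_snps=3, min_length_bp=1000):
    """Return (start_bp, end_bp) for runs of consecutive homozygous SNPs.

    genotypes: allele counts per SNP coded 0/1/2 (1 = heterozygous).
    positions: base-pair coordinates in ascending order, same length.
    """
    runs, start = [], None
    # A sentinel heterozygote at the end closes any run still open.
    for i, g in enumerate(list(genotypes) + [1]):
        if g != 1 and start is None:
            start = i                      # a homozygous stretch begins
        elif g == 1 and start is not None:
            n_snps = i - start
            length = positions[i - 1] - positions[start]
            if n_snps >= min_snps and length >= min_length_bp:
                runs.append((positions[start], positions[i - 1]))
            start = None
    return runs


def f_roh(runs, genome_length_bp):
    """Genomic inbreeding estimate: fraction of the genome covered by ROH."""
    return sum(end - start for start, end in runs) / genome_length_bp


# Invented example: 11 SNPs on a 10 kb toy chromosome.
geno = [0, 0, 2, 2, 1, 0, 1, 2, 2, 2, 0]
pos = [100, 600, 1200, 2000, 2500, 3000, 3500, 4000, 5000, 6000, 7500]
runs = find_roh(geno, pos)
inbreeding = f_roh(runs, 10_000)
```

Comparing the resulting run lists between selected and unselected groups, region by region, is the kind of contrast the abstract describes.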
Jackson, Benjamin C.; Campos, José L.; Haddrill, Penelope R.; Charlesworth, Brian
2017-01-01
Four-fold degenerate coding sites form a major component of the genome, and are often used to make inferences about selection and demography, so that understanding their evolution is important. Despite previous efforts, many questions regarding the causes of base composition changes at these sites in Drosophila remain unanswered. To shed further light on this issue, we obtained a new whole-genome polymorphism data set from D. simulans. We analyzed samples from the putatively ancestral range of D. simulans, as well as an existing polymorphism data set from an African population of D. melanogaster. By using D. yakuba as an outgroup, we found clear evidence for selection on 4-fold sites along both lineages over a substantial period, with the intensity of selection increasing with GC content. Based on an explicit model of base composition evolution, we suggest that the observed AT-biased substitution pattern in both lineages is probably due to an ancestral reduction in selection intensity, and is unlikely to be the result of an increase in mutational bias towards AT alone. By using two polymorphism-based methods for estimating selection coefficients over different timescales, we show that the selection intensity on codon usage has been rather stable in D. simulans in the recent past, but the long-term estimates in D. melanogaster are much higher than the short-term ones, indicating a continuing decline in selection intensity, to such an extent that the short-term estimates suggest that selection is only active in the most GC-rich parts of the genome. Finally, we provide evidence for complex evolutionary patterns in the putatively neutral short introns, which cannot be explained by the standard GC-biased gene conversion model. These results reveal a dynamic picture of base composition evolution. PMID:28082609
Dadousis, C; Biffani, S; Cipolat-Gotet, C; Nicolazzi, E L; Rossoni, A; Santus, E; Bittante, G; Cecchinato, A
2016-05-01
Cheese production is increasing in many countries, and a desire toward genetic selection for milk coagulation properties in dairy cattle breeding exists. However, measurements of individual cheesemaking properties are hampered by high costs and labor, whereas traditional single-point milk coagulation properties (MCP) are sometimes criticized. Nevertheless, new modeling of the entire curd firmness and syneresis process (CFt equation) offers new insight into the cheesemaking process. Moreover, identification of genomic regions regulating milk cheesemaking properties might enhance direct selection of individuals in breeding programs based on cheese ability rather than related milk components. Therefore, the objective of this study was to perform genome-wide association studies to identify genomic regions linked to traditional MCP and new CFt parameters, milk acidity (pH), and milk protein percentage. Milk and DNA samples from 1,043 Italian Brown Swiss cows were used. Milk pH and 3 MCP traits were grouped together to represent the MCP set. Four CFt equation parameters, 2 derived traits, and protein percentage were considered as the second group of traits (CFt set). Animals were genotyped with the Illumina SNP50 BeadChip v.2 (Illumina Inc., San Diego, CA). Multitrait animal models were used to estimate variance components. For genome-wide association studies, the genome-wide association using mixed model and regression-genomic control approach was used. In total, 106 significant marker traits associations and 66 single nucleotide polymorphisms were identified on 12 chromosomes (1, 6, 9, 11, 13, 15, 16, 19, 20, 23, 26, and 28). Sharp peaks were detected at 84 to 88 Mbp on Bos taurus autosome (BTA) 6, with a peak at 87.4 Mbp in the region harboring the casein genes. Evidence of quantitative trait loci at 82.6 and 88.4 Mbp on the same chromosome was found. All chromosomes but BTA6, BTA11, and BTA28 were associated with only one trait. 
Only BTA6 was in common between MCP and CFt sets. The new CFt traits reinforced the support of MCP signals and provided additional information on genomic regions that might be involved in regulation of the coagulation process of bovine milk. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Ramu, P; Kassahun, B; Senthilvel, S; Ashok Kumar, C; Jayashree, B; Folkertsma, R T; Reddy, L Ananda; Kuruvinashetti, M S; Haussmann, B I G; Hash, C T
2009-11-01
The sequencing and detailed comparative functional analysis of genomes of a number of select botanical models open new doors into comparative genomics among the angiosperms, with potential benefits for improvement of many orphan crops that feed large populations. In this study, a set of simple sequence repeat (SSR) markers was developed by mining the expressed sequence tag (EST) database of sorghum. Among the SSR-containing sequences, only those sharing considerable homology with rice genomic sequences across the lengths of the 12 rice chromosomes were selected. Thus, 600 SSR-containing sorghum EST sequences (50 homologous sequences on each of the 12 rice chromosomes) were selected, with the intention of providing coverage for corresponding homologous regions of the sorghum genome. Primer pairs were designed and polymorphism detection ability was assessed using parental pairs of two existing sorghum mapping populations. About 28% of these new markers detected polymorphism in this 4-entry panel. A subset of 55 polymorphic EST-derived SSR markers were mapped onto the existing skeleton map of a recombinant inbred population derived from cross N13 x E 36-1, which is segregating for Striga resistance and the stay-green component of terminal drought tolerance. These new EST-derived SSR markers mapped across all 10 sorghum linkage groups, mostly to regions expected based on prior knowledge of rice-sorghum synteny. The ESTs from which these markers were derived were then mapped in silico onto the aligned sorghum genome sequence, and 88% of the best hits corresponded to linkage-based positions. This study demonstrates the utility of comparative genomic information in targeted development of markers to fill gaps in linkage maps of related crop species for which sufficient genomic tools are not available.
Performance of genomic prediction within and across generations in maritime pine.
Bartholomé, Jérôme; Van Heerwaarden, Joost; Isik, Fikret; Boury, Christophe; Vidal, Marjorie; Plomion, Christophe; Bouffier, Laurent
2016-08-11
Genomic selection (GS) is a promising approach for decreasing breeding cycle length in forest trees. Assessment of progeny performance and of the prediction accuracy of GS models over generations is therefore a key issue. A reference population of maritime pine (Pinus pinaster) with an estimated effective inbreeding population size (status number) of 25 was first selected with simulated data. This reference population (n = 818) covered three generations (G0, G1 and G2) and was genotyped with 4436 single-nucleotide polymorphism (SNP) markers. We evaluated the effects on prediction accuracy of both the relatedness between the calibration and validation sets and validation on the basis of progeny performance. Pedigree-based (best linear unbiased prediction, ABLUP) and marker-based (genomic BLUP and Bayesian LASSO) models were used to predict breeding values for three different traits: circumference, height and stem straightness. On average, the ABLUP model outperformed genomic prediction models, with a maximum difference in prediction accuracies of 0.12, depending on the trait and the validation method. A mean difference in prediction accuracy of 0.17 was found between validation methods differing in terms of relatedness. Including the progenitors in the calibration set reduced this difference in prediction accuracy to 0.03. When only genotypes from the G0 and G1 generations were used in the calibration set and genotypes from G2 were used in the validation set (progeny validation), prediction accuracies ranged from 0.70 to 0.85. This study suggests that the training of prediction models on parental populations can predict the genetic merit of the progeny with high accuracy: an encouraging result for the implementation of GS in the maritime pine breeding program.
Megacycles of atmospheric carbon dioxide concentration correlate with fossil plant genome size.
Franks, Peter J; Freckleton, Rob P; Beaulieu, Jeremy M; Leitch, Ilia J; Beerling, David J
2012-02-19
Tectonic processes drive megacycles of atmospheric carbon dioxide (CO₂) concentration, c_a, that force large fluctuations in global climate. With a period of several hundred million years, these megacycles have been linked to the evolution of vascular plants, but adaptation at the subcellular scale has been difficult to determine because fossils typically do not preserve this information. Here we show, after accounting for evolutionary relatedness using phylogenetic comparative methods, that plant nuclear genome size (measured as the haploid DNA amount) and the size of stomatal guard cells are correlated across a broad taxonomic range of extant species. This phylogenetic regression was used to estimate the mean genome size of fossil plants from the size of fossil stomata. For the last 400 Myr, spanning almost the full evolutionary history of vascular plants, we found a significant correlation between fossil plant genome size and c_a, modelled independently using geochemical data. The correlation is consistent with selection for stomatal size and genome size by c_a as plants adapted towards optimal leaf gas exchange under a changing CO₂ regime. Our findings point to the possibility that major episodes of change in c_a throughout Earth history might have selected for changes in genome size, influencing plant diversification.
NASA Astrophysics Data System (ADS)
Hsu, Fei-Man; Chen, Pao-Yang
2017-03-01
Von Neumann and Morgenstern published the Theory of Games and Economic Behavior in 1944, describing game theory as a model in which intelligent rational decision-makers manage to find their best strategies in conflict, cooperative or other mutualistic relationships to acquire the greatest benefit [1]. This model was subsequently incorporated in ecology to simulate the "fitness" of a species during natural selection, designated evolutionary game theory (EGT) [2]. Wang et al. proposed "epiGame", taking paternal and maternal genomes as "intelligent" players that compete, cooperate or both during embryogenesis to maximize the fitness of the embryo [3]. They further extended game theory to an individual or single cell environment. During early zygote development, DNA methylation is reprogrammed such that the paternal genome is demethylated before the maternal genome. After the reset, the blastocyst is re-methylated during embryogenesis. At that time, the paternal and maternal genomes have a conflict of interest related to the expression of their own genes. The proposed epiGame models such interactive regulation between the parental genomes to reach a balance for embryo development (equation (2)).
Viral genome structures, charge, and sequences are optimal for capsid assembly
NASA Astrophysics Data System (ADS)
Hagan, Michael
2014-03-01
For many viruses, the spontaneous assembly of a capsid shell around the nucleic acid (NA) genome is an essential step in the viral life cycle. Capsid formation is a multicomponent, out-of-equilibrium assembly process for which kinetic effects and thermodynamic constraints compete to determine the outcome. Understanding how viral components drive highly efficient assembly under these constraints could promote biomedical efforts to block viral propagation, and would elucidate the factors controlling assembly in a wide range of systems containing proteins and polyelectrolytes. This talk will describe coarse-grained models of capsid proteins and NAs with which we investigate the dynamics and thermodynamics of virus assembly. In contrast to recent theoretical models, we find that capsids spontaneously 'overcharge'; that is, the NA length which is kinetically and thermodynamically optimal possesses a negative charge greater than the positive charge of the capsid. When applied to specific virus capsids, the calculated optimal NA lengths closely correspond to the natural viral genome lengths. These results suggest that the features included in this model (i.e. electrostatics, excluded volume, and NA tertiary structure) play key roles in determining assembly thermodynamics and consequently exert selective pressure on viral evolution. I will then discuss mechanisms by which sequence-specific interactions between NAs and capsid proteins promote selective encapsidation of the viral genome. This work was supported by NIH R01GM108021 and the Brandeis MRSEC NSF-MRSEC-0820492.
USDA-ARS's Scientific Manuscript database
The size and repetitive nature of the Rhipicephalus microplus genome makes obtaining a full genome sequence difficult. Cot filtration/selection techniques were used to reduce the repetitive fraction of the tick genome and enrich for the fraction of DNA with gene-containing regions. The Cot-selected ...
Acosta-Pech, Rocío; Crossa, José; de Los Campos, Gustavo; Teyssèdre, Simon; Claustres, Bruno; Pérez-Elizalde, Sergio; Pérez-Rodríguez, Paulino
2017-07-01
A new genomic model that incorporates genotype × environment interaction gave increased prediction accuracy of untested hybrid response for traits such as percent starch content, percent dry matter content and silage yield of maize hybrids. The prediction of hybrid performance (HP) is very important in agricultural breeding programs. In plant breeding, multi-environment trials play an important role in the selection of important traits, such as stability across environments, grain yield and pest resistance. Environmental conditions modulate gene expression causing genotype × environment interaction (G × E), such that the estimated genetic correlations of the performance of individual lines across environments summarize the joint action of genes and environmental conditions. This article proposes a genomic statistical model that incorporates G × E for general and specific combining ability for predicting the performance of hybrids in environments. The proposed model can also be applied to any other hybrid species with distinct parental pools. In this study, we evaluated the predictive ability of two HP prediction models using a cross-validation approach applied in extensive maize hybrid data, comprising 2724 hybrids derived from 507 dent lines and 24 flint lines, which were evaluated for three traits in 58 environments over 12 years; analyses were performed for each year. On average, genomic models that include the interaction of general and specific combining ability with environments have greater predictive ability than genomic models without interaction with environments (ranging from 12 to 22%, depending on the trait). We concluded that including G × E in the prediction of untested maize hybrids increases the accuracy of genomic models.
Meuwissen, Theo H E; Indahl, Ulf G; Ødegård, Jørgen
2017-12-27
Non-linear Bayesian genomic prediction models such as BayesA/B/C/R involve iteration and mostly Markov chain Monte Carlo (MCMC) algorithms, which are computationally expensive, especially when whole-genome sequence (WGS) data are analyzed. Singular value decomposition (SVD) of the genotype matrix can facilitate genomic prediction in large datasets, and can be used to estimate marker effects and their prediction error variances (PEV) in a computationally efficient manner. Here, we developed, implemented, and evaluated a direct, non-iterative method for the estimation of marker effects for the BayesC genomic prediction model. The BayesC model assumes a priori that markers have normally distributed effects with probability [Formula: see text] and no effect with probability (1 - [Formula: see text]). Marker effects and their PEV are estimated by using SVD and the posterior probability of the marker having a non-zero effect is calculated. These posterior probabilities are used to obtain marker-specific effect variances, which are subsequently used to approximate BayesC estimates of marker effects in a linear model. A computer simulation study was conducted to compare alternative genomic prediction methods, where a single reference generation was used to estimate marker effects, which were subsequently used for 10 generations of forward prediction, for which accuracies were evaluated. SVD-based posterior probabilities of markers having non-zero effects were generally lower than MCMC-based posterior probabilities, but for some regions the opposite occurred, resulting in clear signals for QTL-rich regions. The accuracies of breeding values estimated using SVD- and MCMC-based BayesC analyses were similar across the 10 generations of forward prediction. 
For an intermediate number of generations (2 to 5) of forward prediction, accuracies obtained with the BayesC model tended to be slightly higher than accuracies obtained using the best linear unbiased prediction of SNP effects (SNP-BLUP model). When reducing marker density from WGS data to 30 K, SNP-BLUP tended to yield the highest accuracies, at least in the short term. Based on SVD of the genotype matrix, we developed a direct method for the calculation of BayesC estimates of marker effects. Although SVD- and MCMC-based marker effects differed slightly, their prediction accuracies were similar. Assuming that the SVD of the marker genotype matrix is already performed for other reasons (e.g. for SNP-BLUP), computation times for the BayesC predictions were comparable to those of SNP-BLUP.
Sun, Yu; Tamarit, Daniel
2017-01-01
Abstract The major codon preference model suggests that codons read by tRNAs in high concentrations are preferentially utilized in highly expressed genes. However, the identity of the optimal codons differs between species although the forces driving such changes are poorly understood. We suggest that these questions can be tackled by placing codon usage studies in a phylogenetic framework and that bacterial genomes with extreme nucleotide composition biases provide informative model systems. Switches in the background substitution biases from GC to AT have occurred in Gardnerella vaginalis (GC = 32%), and from AT to GC in Lactobacillus delbrueckii (GC = 62%) and Lactobacillus fermentum (GC = 63%). We show that despite the large effects on codon usage patterns by these switches, all three species evolve under selection on synonymous sites. In G. vaginalis, the dramatic codon frequency changes coincide with shifts of optimal codons. In contrast, the optimal codons have not shifted in the two Lactobacillus genomes despite an increased fraction of GC-ending codons. We suggest that all three species are in different phases of an on-going shift of optimal codons, and attribute the difference to a stronger background substitution bias and/or longer time since the switch in G. vaginalis. We show that comparative and correlative methods for optimal codon identification yield conflicting results for genomes in flux and discuss possible reasons for the mispredictions. We conclude that switches in the direction of the background substitution biases can drive major shifts in codon preference patterns even under sustained selection on synonymous codon sites. PMID:27540085
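The "fraction of GC-ending codons" mentioned above is a simple quantity to compute from a coding sequence. The sketch below is illustrative only: the function names and the example sequence are invented, the input is assumed to be an in-frame coding sequence, and identifying optimal codons (the harder problem the abstract discusses) is not attempted.

```python
from collections import Counter


def split_codons(cds):
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def gc3_fraction(cds):
    """Fraction of codons whose third base is G or C."""
    codons = split_codons(cds)
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


def codon_usage(cds):
    """Raw codon counts, the starting point for any codon preference analysis."""
    return Counter(split_codons(cds))


# Invented example sequence: ATG GCC AAA TTC GCC
example = "ATGGCCAAATTCGCC"
gc3 = gc3_fraction(example)
usage = codon_usage(example)
```

Contrasting `gc3_fraction` between highly expressed genes and the genome background is one simple way to see whether a compositional shift has reached synonymous sites.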
Genomic-Enabled Prediction in Maize Using Kernel Models with Genotype × Environment Interaction
Bandeira e Sousa, Massaine; Cuevas, Jaime; de Oliveira Couto, Evellyn Giselly; Pérez-Rodríguez, Paulino; Jarquín, Diego; Fritsche-Neto, Roberto; Burgueño, Juan; Crossa, Jose
2017-01-01
Multi-environment trials are routinely conducted in plant breeding to select candidates for the next selection cycle. In this study, we compare the prediction accuracy of four developed genomic-enabled prediction models: (1) single-environment, main genotypic effect model (SM); (2) multi-environment, main genotypic effects model (MM); (3) multi-environment, single variance G×E deviation model (MDs); and (4) multi-environment, environment-specific variance G×E deviation model (MDe). Each of these four models was fitted using two kernel methods: a linear kernel, the Genomic Best Linear Unbiased Predictor, GBLUP (GB), and a nonlinear Gaussian kernel (GK). The eight model-method combinations were applied to two extensive Brazilian maize data sets (HEL and USP data sets), having different numbers of maize hybrids evaluated in different environments for grain yield (GY), plant height (PH), and ear height (EH). Results show that the MDe and the MDs models fitted with the Gaussian kernel (MDe-GK, and MDs-GK) had the highest prediction accuracy. For GY in the HEL data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 9 to 32%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 9 to 49%. For GY in the USP data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 0 to 7%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 34 to 70%. For traits PH and EH, gains in prediction accuracy of models with GK compared to models with GB were smaller than those achieved in GY. Also, these gains in prediction accuracy decreased when a more difficult prediction problem was studied. PMID:28455415
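The two kernels contrasted in this abstract can be sketched from a marker matrix as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the linear kernel uses column-centered markers divided by the number of markers (one common GBLUP scaling among several), and the Gaussian kernel uses squared Euclidean distances scaled by their median with a bandwidth `h` fixed at 1, which is one common form that the study may or may not have used exactly.

```python
import math
from statistics import median


def center_columns(X):
    """Subtract each marker's mean so the kernel reflects deviations from the average."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def linear_kernel(X):
    """GBLUP-style kernel: G = Z Z' / p with Z the column-centered markers."""
    Z, p = center_columns(X), len(X[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / p for zj in Z] for zi in Z]


def gaussian_kernel(X, h=1.0):
    """K_ij = exp(-h * d_ij^2 / q), with q the median off-diagonal squared distance."""
    n = len(X)
    d2 = [[sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
          for i in range(n)]
    # q is zero if more than half of all pairs are identical; not handled here.
    q = median(d2[i][j] for i in range(n) for j in range(i + 1, n))
    return [[math.exp(-h * d2[i][j] / q) for j in range(n)] for i in range(n)]


# Invented marker matrix: 3 lines x 3 markers, coded 0/1/2.
X = [[0, 1, 2],
     [2, 1, 0],
     [1, 1, 1]]
G = linear_kernel(X)
K = gaussian_kernel(X)
```

The linear kernel can go negative for dissimilar lines, whereas the Gaussian kernel stays in (0, 1] and decays with distance; that is the nonlinearity the GK models exploit.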
Genomic-Enabled Prediction in Maize Using Kernel Models with Genotype × Environment Interaction.
Bandeira E Sousa, Massaine; Cuevas, Jaime; de Oliveira Couto, Evellyn Giselly; Pérez-Rodríguez, Paulino; Jarquín, Diego; Fritsche-Neto, Roberto; Burgueño, Juan; Crossa, Jose
2017-06-07
Multi-environment trials are routinely conducted in plant breeding to select candidates for the next selection cycle. In this study, we compare the prediction accuracy of four developed genomic-enabled prediction models: (1) single-environment, main genotypic effect model (SM); (2) multi-environment, main genotypic effects model (MM); (3) multi-environment, single variance G×E deviation model (MDs); and (4) multi-environment, environment-specific variance G×E deviation model (MDe). Each of these four models was fitted using two kernel methods: a linear kernel, the Genomic Best Linear Unbiased Predictor, GBLUP (GB), and a nonlinear Gaussian kernel (GK). The eight model-method combinations were applied to two extensive Brazilian maize data sets (HEL and USP data sets), having different numbers of maize hybrids evaluated in different environments for grain yield (GY), plant height (PH), and ear height (EH). Results show that the MDe and the MDs models fitted with the Gaussian kernel (MDe-GK and MDs-GK) had the highest prediction accuracy. For GY in the HEL data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 9 to 32%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 9 to 49%. For GY in the USP data set, the increase in prediction accuracy of SM-GK over SM-GB ranged from 0 to 7%. For the MM, MDs, and MDe models, the increase in prediction accuracy of GK over GB ranged from 34 to 70%. For traits PH and EH, gains in prediction accuracy of models with GK compared to models with GB were smaller than those achieved in GY. Also, these gains in prediction accuracy decreased when a more difficult prediction problem was studied. Copyright © 2017 Bandeira e Sousa et al.
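The two kernel methods compared in this abstract (a linear GBLUP kernel and a nonlinear Gaussian kernel) can be illustrated with a minimal Python sketch. The abstract does not give the exact kernel definitions, so the marker centring, the scaling by the median squared distance, and the bandwidth parameter `h` below are assumptions chosen for illustration, not the authors' implementation. The marker matrix `X` is random toy data.

```python
import numpy as np

def linear_kernel(X):
    """GBLUP-style linear kernel G = Z Z' / p, Z = column-centred markers (assumed form)."""
    Z = X - X.mean(axis=0)
    return Z @ Z.T / X.shape[1]

def gaussian_kernel(X, h=1.0):
    """Gaussian kernel K_ij = exp(-h * d_ij^2 / median(d^2)); h is a hypothetical bandwidth."""
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances between all pairs of lines (rows of X).
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Scale by the median off-diagonal squared distance.
    off_diag = d2[np.triu_indices_from(d2, k=1)]
    return np.exp(-h * d2 / np.median(off_diag))

# Toy data: 6 lines genotyped at 50 markers coded 0/1/2.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(6, 50)).astype(float)
G = linear_kernel(X)
K = gaussian_kernel(X)
```

Either matrix could then serve as the covariance structure of the genotypic effect in a mixed model; the Gaussian kernel differs in that similarity decays nonlinearly with marker distance.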
Assessing genomic selection prediction accuracy in a dynamic barley breeding
USDA-ARS's Scientific Manuscript database
Genomic selection is a method to improve quantitative traits in crops and livestock by estimating breeding values of selection candidates using phenotype and genome-wide marker data sets. Prediction accuracy has been evaluated through simulation and cross-validation, however validation based on prog...
Identifying Loci Under Selection Against Gene Flow in Isolation-with-Migration Models
Sousa, Vitor C.; Carneiro, Miguel; Ferrand, Nuno; Hey, Jody
2013-01-01
When divergence occurs in the presence of gene flow, there can arise an interesting dynamic in which selection against gene flow, at sites associated with population-specific adaptations or genetic incompatibilities, can cause net gene flow to vary across the genome. Loci linked to sites under selection may experience reduced gene flow and may experience genetic bottlenecks by the action of nearby selective sweeps. Data from histories such as these may be poorly fitted by conventional neutral model approaches to demographic inference, which treat all loci as equally subject to forces of genetic drift and gene flow. To allow for demographic inference in the face of such histories, as well as the identification of loci affected by selection, we developed an isolation-with-migration model that explicitly provides for variation among genomic regions in migration rates and/or rates of genetic drift. The method allows for loci to fall into any of multiple groups, each characterized by a different set of parameters, thus relaxing the assumption that all loci share the same demography. By grouping loci, the method can be applied to data with multiple loci and still have tractable dimensionality and statistical power. We studied the performance of the method using simulated data, and we applied the method to study the divergence of two subspecies of European rabbits (Oryctolagus cuniculus). PMID:23457232
Divergent clonal selection dominates medulloblastoma at recurrence
Morrissy, A. Sorana; Garzia, Livia; Shih, David J. H.; Zuyderduyn, Scott; Huang, Xi; Skowron, Patryk; Remke, Marc; Cavalli, Florence M. G.; Ramaswamy, Vijay; Lindsay, Patricia E.; Jelveh, Salomeh; Donovan, Laura K.; Wang, Xin; Luu, Betty; Zayne, Kory; Li, Yisu; Mayoh, Chelsea; Thiessen, Nina; Mercier, Eloi; Mungall, Karen L.; Ma, Yusanne; Tse, Kane; Zeng, Thomas; Shumansky, Karey; Roth, Andrew J. L.; Shah, Sohrab; Farooq, Hamza; Kijima, Noriyuki; Holgado, Borja L.; Lee, John J. Y.; Matan-Lithwick, Stuart; Liu, Jessica; Mack, Stephen C.; Manno, Alex; Michealraj, K. A.; Nor, Carolina; Peacock, John; Qin, Lei; Reimand, Juri; Rolider, Adi; Thompson, Yuan Y.; Wu, Xiaochong; Pugh, Trevor; Ally, Adrian; Bilenky, Mikhail; Butterfield, Yaron S. N.; Carlsen, Rebecca; Cheng, Young; Chuah, Eric; Corbett, Richard D.; Dhalla, Noreen; He, An; Lee, Darlene; Li, Haiyan I.; Long, William; Mayo, Michael; Plettner, Patrick; Qian, Jenny Q.; Schein, Jacqueline E.; Tam, Angela; Wong, Tina; Birol, Inanc; Zhao, Yongjun; Faria, Claudia C.; Pimentel, José; Nunes, Sofia; Shalaby, Tarek; Grotzer, Michael; Pollack, Ian F.; Hamilton, Ronald L.; Li, Xiao-Nan; Bendel, Anne E.; Fults, Daniel W.; Walter, Andrew W.; Kumabe, Toshihiro; Tominaga, Teiji; Collins, V. Peter; Cho, Yoon-Jae; Hoffman, Caitlin; Lyden, David; Wisoff, Jeffrey H.; Garvin, James H.; Stearns, Duncan S.; Massimi, Luca; Schüller, Ulrich; Sterba, Jaroslav; Zitterbart, Karel; Puget, Stephanie; Ayrault, Olivier; Dunn, Sandra E.; Tirapelli, Daniela P. 
C.; Carlotti, Carlos G.; Wheeler, Helen; Hallahan, Andrew R.; Ingram, Wendy; MacDonald, Tobey J.; Olson, Jeffrey J.; Van Meir, Erwin G.; Lee, Ji-Yeoun; Wang, Kyu-Chang; Kim, Seung-Ki; Cho, Byung-Kyu; Pietsch, Torsten; Fleischhack, Gudrun; Tippelt, Stephan; Ra, Young Shin; Bailey, Simon; Lindsey, Janet C.; Clifford, Steven C.; Eberhart, Charles G.; Cooper, Michael K.; Packer, Roger J.; Massimino, Maura; Garre, Maria Luisa; Bartels, Ute; Tabori, Uri; Hawkins, Cynthia E.; Dirks, Peter; Bouffet, Eric; Rutka, James T.; Wechsler-Reya, Robert J.; Weiss, William A.; Collier, Lara S.; Dupuy, Adam J.; Korshunov, Andrey; Jones, David T. W.; Kool, Marcel; Northcott, Paul A.; Pfister, Stefan M.; Largaespada, David A.; Mungall, Andrew J.; Moore, Richard A.; Jabado, Nada; Bader, Gary D.; Jones, Steven J. M.; Malkin, David; Marra, Marco A.; Taylor, Michael D.
2016-01-01
The development of targeted anti-cancer therapies through the study of cancer genomes is intended to increase survival rates and decrease treatment-related toxicity. We treated a transposon–driven, functional genomic mouse model of medulloblastoma with ‘humanized’ in vivo therapy (microneurosurgical tumour resection followed by multi-fractionated, image-guided radiotherapy). Genetic events in recurrent murine medulloblastoma exhibit a very poor overlap with those in matched murine diagnostic samples (<5%). Whole-genome sequencing of 33 pairs of human diagnostic and post-therapy medulloblastomas demonstrated substantial genetic divergence of the dominant clone after therapy (<12% diagnostic events were retained at recurrence). In both mice and humans, the dominant clone at recurrence arose through clonal selection of a pre-existing minor clone present at diagnosis. Targeted therapy is unlikely to be effective in the absence of the target, therefore our results offer a simple, proximal, and remediable explanation for the failure of prior clinical trials of targeted therapy. PMID:26760213
Mutants of Cre recombinase with improved accuracy
Eroshenko, Nikolai; Church, George M.
2013-01-01
Despite rapid advances in genome engineering technologies, inserting genes into precise locations in the human genome remains an outstanding problem. It has been suggested that site-specific recombinases can be adapted towards use as transgene delivery vectors. The specificity of recombinases can be altered either with directed evolution or via fusions to modular DNA-binding domains. Unfortunately, both wildtype and altered variants often have detectable activities at off-target sites. Here we use bacterial selections to identify mutations in the dimerization surface of Cre recombinase (R32V, R32M, and 303GVSdup) that improve the accuracy of recombination. The mutants are functional in bacteria, in human cells, and in vitro (except for 303GVSdup, which we did not purify), and have improved selectivity against both model off-target sites and the entire E. coli genome. We propose that destabilizing binding cooperativity may be a general strategy for improving the accuracy of dimeric DNA-binding proteins. PMID:24056590
A deep auto-encoder model for gene expression prediction.
Xie, Rui; Wen, Jia; Quitadamo, Andrew; Cheng, Jianlin; Shi, Xinghua
2017-11-17
Gene expression is a key intermediate level through which genotypes lead to a particular trait. Gene expression is affected by various factors including genotypes of genetic variants. With the aim of delineating the genetic impact on gene expression, we build a deep auto-encoder model to assess how well genetic variants contribute to gene expression changes. This new deep learning model is a regression-based predictive model based on the MultiLayer Perceptron and Stacked Denoising Auto-encoder (MLP-SAE). The model is trained using a stacked denoising auto-encoder for feature selection and a multilayer perceptron framework for backpropagation. We further improve the model by introducing dropout to prevent overfitting and improve performance. To demonstrate the usage of this model, we apply MLP-SAE to a real genomic dataset with genotypes and gene expression profiles measured in yeast. Our results show that the MLP-SAE model with dropout outperforms other models including Lasso, Random Forests and the MLP-SAE model without dropout. Using the MLP-SAE model with dropout, we show that gene expression quantifications predicted by the model solely based on genotypes align well with true gene expression patterns. We provide a deep auto-encoder model for predicting gene expression from SNP genotypes. This study demonstrates that deep learning is appropriate for tackling another genomic problem, i.e., building predictive models to understand genotypes' contribution to gene expression. With the emerging availability of richer genomic data, we anticipate that deep learning models will play a bigger role in modeling and interpreting genomics.
Development of plant condition measurement - The Jimah Model
NASA Astrophysics Data System (ADS)
Evans, Roy F.; Syuhaimi, Mohd; Mazli, Mohammad; Kamarudin, Nurliyana; Maniza Othman, Faiz
2012-05-01
The Jimah Model is an information management model. The model has been designed to facilitate analysis of machine condition by integrating diagnostic data with quantitative and qualitative information. The model treats data as a single strand of information - metaphorically a 'genome' of data. The 'Genome' is structured to be representative of plant function and identifies the condition of selected components (or genes) in each machine. To date in industry, computer aided work processes used with traditional industrial practices, have been unable to consistently deliver a standard of information suitable for holistic evaluation of machine condition and change. Significantly the reengineered site strategies necessary for implementation of this "data genome concept" have resulted in enhanced knowledge and management of plant condition. In large plant with high initial equipment cost and subsequent high maintenance costs, accurate measurement of major component condition becomes central to whole of life management and replacement decisions. A case study following implementation of the model at a major power station site in Malaysia (Jimah) shows that modeling of plant condition and wear (in real time) can be made a practical reality.
Su, Fei; Ou, Hong-Yu; Tao, Fei; Tang, Hongzhi; Xu, Ping
2013-12-27
With genomic sequences of many closely related bacterial strains made available by deep sequencing, it is now possible to investigate trends in prokaryotic microevolution. Positive selection is a sub-process of microevolution, in which a particular mutation is favored, causing the allele frequency to continuously shift in one direction. Wide scanning of prokaryotic genomes has shown that positive selection at the molecular level is much more frequent than expected. Genes with significant positive selection may play key roles in bacterial adaptation to different environmental pressures. However, selection pressure analyses are computationally intensive and awkward to configure. Here we describe an open access web server, designated PSP (Positive Selection analysis for Prokaryotic genomes), for performing evolutionary analysis on orthologous coding genes, specially designed for rapid comparison of dozens of closely related prokaryotic genomes. Remarkably, PSP facilitates functional exploration at multiple levels by assignments and enrichments of KO, GO or COG terms. To illustrate this user-friendly tool, we analyzed Escherichia coli and Bacillus cereus genomes and found that several genes, which play key roles in human infection and antibiotic resistance, show significant evidence of positive selection. PSP is freely available to all users without any login requirement at: http://db-mml.sjtu.edu.cn/PSP/. PSP ultimately allows researchers to do genome-scale analysis for evolutionary selection across multiple prokaryotic genomes rapidly and easily, and identify the genes undergoing positive selection, which may play key roles in host-pathogen interactions and/or environmental adaptation.
2012-01-01
Background Multi-trait genomic models in a Bayesian context can be used to estimate genomic (co)variances, either for a complete genome or for genomic regions (e.g. per chromosome) for the purpose of multi-trait genomic selection or to gain further insight into the genomic architecture of related traits such as mammary disease traits in dairy cattle. Methods Data on progeny means of six traits related to mastitis resistance in dairy cattle (general mastitis resistance and five pathogen-specific mastitis resistance traits) were analyzed using a bivariate Bayesian SNP-based genomic model with a common prior distribution for the marker allele substitution effects and estimation of the hyperparameters in this prior distribution from the progeny means data. From the Markov chain Monte Carlo samples of the allele substitution effects, genomic (co)variances were calculated on a whole-genome level, per chromosome, and in regions of 100 SNP on a chromosome. Results Genomic proportions of the total variance differed between traits. Genomic correlations were lower than pedigree-based genetic correlations and they were highest between general mastitis and pathogen-specific traits because of the part-whole relationship between these traits. The chromosome-wise genomic proportions of the total variance differed between traits, with some chromosomes explaining higher or lower values than expected in relation to chromosome size. Few chromosomes showed pleiotropic effects and only chromosome 19 had a clear effect on all traits, indicating the presence of QTL with a general effect on mastitis resistance. The region-wise patterns of genomic variances differed between traits. Peaks indicating QTL were identified but were not very distinctive because a common prior for the marker effects was used. There was a clear difference in the region-wise patterns of genomic correlation among combinations of traits, with distinctive peaks indicating the presence of pleiotropic QTL. 
Conclusions The results show that it is possible to estimate genome-wide and region-wise genomic (co)variances of mastitis resistance traits in dairy cattle using multivariate genomic models. PMID:22640006
Global Organization of a Positive-strand RNA Virus Genome
Wu, Baodong; Grigull, Jörg; Ore, Moriam O.; Morin, Sylvie; White, K. Andrew
2013-01-01
The genomes of plus-strand RNA viruses contain many regulatory sequences and structures that direct different viral processes. The traditional view of these RNA elements is of local structures present in non-coding regions. However, this view is changing due to the discovery of regulatory elements in coding regions and functional long-range intra-genomic base pairing interactions. The ∼4.8 kb long RNA genome of the tombusvirus tomato bushy stunt virus (TBSV) contains these types of structural features, including six different functional long-distance interactions. We hypothesized that to achieve these multiple interactions this viral genome must utilize a large-scale organizational strategy and, accordingly, we sought to assess the global conformation of the entire TBSV genome. Atomic force micrographs of the genome indicated a mostly condensed structure composed of interconnected protrusions extending from a central hub. This configuration was consistent with the genomic secondary structure model generated using high-throughput selective 2′-hydroxyl acylation analysed by primer extension (i.e. SHAPE), which predicted different sized RNA domains originating from a central region. Known RNA elements were identified in both domain and inter-domain regions, and novel structural features were predicted and functionally confirmed. Interestingly, only two of the six long-range interactions known to form were present in the structural model. However, for those interactions that did not form, complementary partner sequences were positioned relatively close to each other in the structure, suggesting that the secondary structure level of viral genome structure could provide a basic scaffold for the formation of different long-range interactions. The higher-order structural model for the TBSV RNA genome provides a snapshot of the complex framework that allows multiple functional components to operate in concert within a confined context. PMID:23717202
Genome-wide Selective Sweeps in Natural Bacterial Populations Revealed by Time-series Metagenomics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chan, Leong-Keat; Bendall, Matthew L.; Malfatti, Stephanie
2014-06-18
Multiple evolutionary models have been proposed to explain the formation of genetically and ecologically distinct bacterial groups. Time-series metagenomics enables direct observation of evolutionary processes in natural populations, and if applied over a sufficiently long time frame, this approach could capture events such as gene-specific or genome-wide selective sweeps. Direct observations of either process could help resolve how distinct groups form in natural microbial assemblages. Here, from a three-year metagenomic study of a freshwater lake, we explore changes in single nucleotide polymorphism (SNP) frequencies and patterns of gene gain and loss in populations of Chlorobiaceae and Methylophilaceae. SNP analyses revealed substantial genetic heterogeneity within these populations, although the degree of heterogeneity varied considerably among closely related, co-occurring Methylophilaceae populations. SNP allele frequencies, as well as the relative abundance of certain genes, changed dramatically over time in each population. Interestingly, SNP diversity was purged at nearly every genome position in one of the Chlorobiaceae populations over the course of three years, while at the same time multiple genes either swept through or were swept from this population. These patterns were consistent with a genome-wide selective sweep, a process predicted by the ‘ecotype model’ of diversification, but not previously observed in natural populations.
Diploidy and the selective advantage for sexual reproduction in unicellular organisms.
Kleiman, Maya; Tannenbaum, Emmanuel
2009-11-01
This article develops mathematical models describing the evolutionary dynamics of both asexually and sexually reproducing populations of diploid unicellular organisms. The asexual and sexual life cycles are based on the asexual and sexual life cycles in Saccharomyces cerevisiae, Baker's yeast, which normally reproduces by asexual budding, but switches to sexual reproduction when stressed. The mathematical models consider three reproduction pathways: (1) Asexual reproduction, (2) self-fertilization, and (3) sexual reproduction. We also consider two forms of genome organization. In the first case, we assume that the genome consists of two multi-gene chromosomes, whereas in the second case, we consider the opposite extreme and assume that each gene defines a separate chromosome, which we call the multi-chromosome genome. These two cases are considered to explore the role that recombination has on the mutation-selection balance and the selective advantage of the various reproduction strategies. We assume that the purpose of diploidy is to provide redundancy, so that damage to a gene may be repaired using the other, presumably undamaged copy (a process known as homologous recombination repair). As a result, we assume that the fitness of the organism only depends on the number of homologous gene pairs that contain at least one functional copy of a given gene. If the organism has at least one functional copy of every gene in the genome, we assume a fitness of 1. In general, if the organism has $l$ homologous pairs that lack a functional copy of the given gene, then the fitness of the organism is $\kappa(l)$. The $\kappa(l)$ are assumed to be monotonically decreasing, so that $\kappa(0) = 1 > \kappa(1) > \kappa(2) > \cdots > \kappa(\infty) = 0$. 
For nearly all of the reproduction strategies we consider, we find, in the limit of large $N$, that the mean fitness at mutation-selection balance is $\max\{2e^{-\mu} - 1, 0\}$, where $N$ is the number of genes in the haploid set of the genome, $\epsilon$ is the probability that a given DNA template strand of a given gene produces a mutated daughter during replication, and $\mu = N\epsilon$. The only exception is the sexual reproduction pathway for the multi-chromosomed genome. Assuming a multiplicative fitness landscape where $\kappa(l) = \alpha^{l}$ for $\alpha \in (0, 1)$, this strategy is found to have a mean fitness that exceeds the mean fitness of all the other strategies. Furthermore, while other reproduction strategies experience a total loss of viability due to the steady accumulation of deleterious mutations once $\mu$ exceeds [Formula: see text], no such transition occurs in the sexual pathway. Indeed, in the limit as $\alpha \to 1$ for the multiplicative landscape, we can show that the mean fitness for the sexual pathway with the multi-chromosomed genome converges to $e^{-2\mu}$, which is always positive. We explicitly allow for mitotic recombination in this study, which, in contrast to previous studies using different models, does not have any advantage over other asexual reproduction strategies. The results of this article provide a basis for understanding the selective advantage of the specific meiotic pathway that is employed by sexually reproducing organisms. The results of this article also suggest an explanation for why unicellular organisms such as Saccharomyces cerevisiae (Baker's yeast) switch to a sexual mode of reproduction when stressed. While the results of this article are based on modeling mutation-propagation in unicellular organisms, they nevertheless suggest that, in more complex organisms with significantly larger genomes, sex is necessary to prevent the loss of viability of a population due to genetic drift. 
Finally, and perhaps most importantly, the results of this article demonstrate a selective advantage for sexual reproduction with fewer and much less restrictive assumptions than those of previous studies.
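The two closed-form mean-fitness expressions quoted in this abstract, max{2e^(-mu) - 1, 0} for most strategies and e^(-2mu) for the sexual pathway with the multi-chromosomed genome in the limit alpha -> 1, can be compared numerically. The sketch below is illustrative only: the function names are hypothetical, and `mu_zero` is simply the value of mu at which the first expression reaches zero (solving 2e^(-mu) = 1), computed here from the stated formula rather than taken from the elided threshold in the abstract.

```python
import math

def mean_fitness_general(mu):
    """Mean fitness max{2*exp(-mu) - 1, 0} quoted for most reproduction strategies."""
    return max(2.0 * math.exp(-mu) - 1.0, 0.0)

def mean_fitness_sexual_limit(mu):
    """Limiting mean fitness exp(-2*mu) quoted for the sexual, multi-chromosomed case."""
    return math.exp(-2.0 * mu)

# Value of mu where 2*exp(-mu) - 1 = 0, derived from the formula above.
mu_zero = math.log(2.0)

for mu in (0.1, 0.5, mu_zero, 1.0):
    print(f"mu={mu:.3f}  general={mean_fitness_general(mu):.4f}  "
          f"sexual_limit={mean_fitness_sexual_limit(mu):.4f}")
```

Since e^(-2mu) - (2e^(-mu) - 1) = (e^(-mu) - 1)^2 is never negative, the second expression is at least as large as the first for every mu, which is consistent with the abstract's statement that the sexual pathway retains positive mean fitness.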
Bernatchez, L
2016-12-01
The first goal of this paper was to overview modern approaches to local adaptation, with a focus on the use of population genomics data to detect signals of natural selection in fishes. Several mechanisms are discussed that may enhance the maintenance of genetic variation and evolutionary potential, which have been overlooked and should be considered in future theoretical development and predictive models: the prevalence of soft sweeps, polygenic basis of adaptation, balancing selection and transient polymorphisms, parallel evolution, as well as epigenetic variation. Research on fish population genomics has provided ample evidence for local adaptation at the genome level. Pervasive adaptive evolution, however, seems to almost never involve the fixation of beneficial alleles. Instead, adaptation apparently proceeds most commonly by soft sweeps entailing shifts in frequencies of alleles being shared between differentially adapted populations. One obvious factor contributing to the maintenance of standing genetic variation in the face of selective pressures is that adaptive phenotypic traits are most often highly polygenic, and consequently the response to selection should derive mostly from allelic co-variances among causative loci rather than pronounced allele frequency changes. Balancing selection in its various forms may also play an important role in maintaining adaptive genetic variation and the evolutionary potential of species to cope with environmental change. A large body of literature on fishes also shows that repeated evolution of adaptive phenotypes is a ubiquitous evolutionary phenomenon that seems to occur most often via different genetic solutions, further adding to the potential options of species to cope with a changing environment. 
Moreover, a paradox is emerging from recent fish studies whereby populations of highly reduced effective population sizes and impoverished genetic diversity can apparently retain their adaptive potential in some circumstances. Although more empirical support is needed, several recent studies suggest that epigenetic variation could account for this apparent paradox. Therefore, epigenetic variation should be fully integrated with considerations pertaining to role of soft sweeps, polygenic and balancing selection, as well as repeated adaptation involving different genetic basis towards improving models predicting the evolutionary potential of species to cope with a changing world. © 2016 The Fisheries Society of the British Isles.
De La Torre, Amanda R; Roberts, David R; Aitken, Sally N
2014-01-01
The maintenance of species boundaries despite interspecific gene flow has been a continuous source of interest in evolutionary biology. Many hybridizing species have porous genomes with regions impermeable to introgression, conferring reproductive barriers between species. We used ecological niche modelling to study the glacial and postglacial recolonization patterns between the widely hybridizing spruce species Picea glauca and P. engelmannii in western North America. Genome-wide estimates of admixture based on a panel of 311 candidate gene single nucleotide polymorphisms (SNP) from 290 genes were used to assess levels of admixture and introgression and to identify loci putatively involved in adaptive differences or reproductive barriers between species. Our palaeoclimatic modelling suggests that these two closely related species have a long history of hybridization and introgression, dating to at least 21 000 years ago, yet species integrity is maintained by a combination of strong environmental selection and reduced current interspecific gene flow. Twenty loci showed evidence of divergent selection, including six loci that were both Fst outliers and associated with climatic gradients, and fourteen loci that were either outliers or showed associations with climate. These included genes responsible for carbohydrate metabolism, signal transduction and transcription factors. PMID:24597663
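The Fst outlier idea mentioned in this abstract rests on a per-locus measure of differentiation between populations. The abstract does not state which estimator or outlier test was used, so the following is only a minimal sketch of one textbook form, Fst = (H_T - H_S) / H_T for a biallelic locus in two populations of equal size; the function name and the example allele frequencies are hypothetical.

```python
def fst_biallelic(p1, p2):
    """Fst = (H_T - H_S) / H_T for one biallelic locus, two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)                          # expected heterozygosity, pooled
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-population value
    if h_t == 0.0:
        return 0.0  # locus is monomorphic overall; no differentiation to measure
    return (h_t - h_s) / h_t

# Hypothetical allele frequencies in the two species at three loci.
for p1, p2 in [(0.5, 0.5), (0.2, 0.8), (1.0, 0.0)]:
    print(p1, p2, round(fst_biallelic(p1, p2), 3))
```

Loci whose values sit far in the upper tail of the genome-wide distribution would be candidates for divergent selection; a real analysis would use an estimator that corrects for sample size and a formal outlier test.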
Leempoel, Kevin; Parisod, Christian; Geiser, Céline; Joost, Stéphane
2018-02-01
Plant species are known to adapt locally to their environment, particularly in mountainous areas where conditions can vary drastically over short distances. The climate of such landscapes being largely influenced by topography, using fine-scale models to evaluate environmental heterogeneity may help detect adaptation to micro-habitats. Here, we applied a multiscale landscape genomic approach to detect evidence of local adaptation in the alpine plant Biscutella laevigata. The two gene pools identified, experiencing limited gene flow along a 1-km ridge, were different in regard to several habitat features derived from a very high resolution (VHR) digital elevation model (DEM). A correlative approach detected signatures of selection along environmental gradients such as altitude, wind exposure, and solar radiation, indicating adaptive pressures likely driven by fine-scale topography. Using a large panel of DEM-derived variables as ecologically relevant proxies, our results highlighted the critical role of spatial resolution. These high-resolution multiscale variables indeed indicate that the robustness of associations between genetic loci and environmental features depends on spatial parameters that are poorly documented. We argue that the scale issue is critical in landscape genomics and that multiscale ecological variables are key to improving our understanding of local adaptation in highly heterogeneous landscapes.
Mitochondrial genetic codes evolve to match amino acid requirements of proteins.
Swire, Jonathan; Judson, Olivia P; Burt, Austin
2005-01-01
Mitochondria often use genetic codes different from the standard genetic code. Now that many mitochondrial genomes have been sequenced, these variant codes provide the first opportunity to examine empirically the processes that produce new genetic codes. The key question is: Are codon reassignments the sole result of mutation and genetic drift? Or are they the result of natural selection? Here we present an analysis of 24 phylogenetically independent codon reassignments in mitochondria. Although the mutation-drift hypothesis can explain reassignments from stop to an amino acid, we found that it cannot explain reassignments from one amino acid to another. In particular--and contrary to the predictions of the mutation-drift hypothesis--the codon involved in such a reassignment was not rare in the ancestral genome. Instead, such reassignments appear to take place while the codon is in use at an appreciable frequency. Moreover, the comparison of inferred amino acid usage in the ancestral genome with the neutral expectation shows that the amino acid gaining the codon was selectively favored over the amino acid losing the codon. These results are consistent with a simple model of weak selection on the amino acid composition of proteins in which codon reassignments are selected because they compensate for multiple slightly deleterious mutations throughout the mitochondrial genome. We propose that the selection pressure is for reduced protein synthesis cost: most reassignments give amino acids that are less expensive to synthesize. Taken together, our results strongly suggest that mitochondrial genetic codes evolve to match the amino acid requirements of proteins.
Genetic diversity in the interference selection limit.
Good, Benjamin H; Walczak, Aleksandra M; Neher, Richard A; Desai, Michael M
2014-03-01
Pervasive natural selection can strongly influence observed patterns of genetic variation, but these effects remain poorly understood when multiple selected variants segregate in nearby regions of the genome. Classical population genetics fails to account for interference between linked mutations, which grows increasingly severe as the density of selected polymorphisms increases. Here, we describe a simple limit that emerges when interference is common, in which the fitness effects of individual mutations play a relatively minor role. Instead, similar to models of quantitative genetics, molecular evolution is determined by the variance in fitness within the population, defined over an effectively asexual segment of the genome (a "linkage block"). We exploit this insensitivity in a new "coarse-grained" coalescent framework, which approximates the effects of many weakly selected mutations with a smaller number of strongly selected mutations that create the same variance in fitness. This approximation generates accurate and efficient predictions for silent site variability when interference is common. However, these results suggest that there is reduced power to resolve individual selection pressures when interference is sufficiently widespread, since a broad range of parameters possess nearly identical patterns of silent site variability.
Amino Acid Usage Is Asymmetrically Biased in AT- and GC-Rich Microbial Genomes
Bohlin, Jon; Brynildsrud, Ola; Vesth, Tammi; Skjerve, Eystein; Ussery, David W.
2013-01-01
Introduction: Genomic base composition ranges from less than 25% AT to more than 85% AT in prokaryotes. Since only a small fraction of prokaryotic genomes is not protein coding, even a minor change in genomic base composition will induce profound protein changes. We examined how amino acid and codon frequencies were distributed in over 2000 microbial genomes and how these distributions were affected by base compositional changes. In addition, we wanted to know how genome-wide amino acid usage was biased in the different genomes and how changes to base composition and mutations affected this bias. To carry this out, we used a Generalized Additive Mixed-effects Model (GAMM) to explore non-linear associations and strong data dependences in closely related microbes; principal component analysis (PCA) was used to examine genomic amino acid- and codon frequencies, while the concept of relative entropy was used to analyze genomic mutation rates. Results: We found that genomic amino acid frequencies carried a stronger phylogenetic signal than codon frequencies, but that this signal was weak compared to that of genomic %AT. Further, in contrast to codon usage bias (CUB), amino acid usage bias (AAUB) was differently distributed in AT- and GC-rich genomes in the sense that AT-rich genomes did not prefer specific amino acids over others to the same extent as GC-rich genomes. AAUB was also associated with relative entropy; genomes with low AAUB contained more random mutations as a consequence of relaxed purifying selection than genomes with higher AAUB. Conclusion: Genomic base composition has a substantial effect on both amino acid- and codon frequencies in bacterial genomes. While phylogeny influenced amino acid usage more in GC-rich genomes, AT-content was driving amino acid usage in AT-rich genomes. We found the GAMM model to be an excellent tool to analyze the genomic data used in this study. PMID:23922837
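The abstract above names relative entropy without giving a formula. As an illustration only, the following Python sketch computes the standard relative entropy (Kullback-Leibler divergence) between two amino acid frequency distributions. The function names, the uniform reference distribution, and the choice of bits as the unit are assumptions made for this example; the abstract does not say which distributions the authors compared.

```python
import math
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical reference: every amino acid equally frequent.
UNIFORM = {aa: 1 / 20 for aa in AMINO_ACIDS}


def amino_acid_frequencies(protein: str) -> dict[str, float]:
    """Relative frequency of each standard amino acid in a sequence."""
    counts = Counter(aa for aa in protein.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def relative_entropy(p: dict[str, float], q: dict[str, float]) -> float:
    """Kullback-Leibler divergence D(p || q) in bits.

    Terms with p == 0 contribute nothing; q must be positive
    wherever p is positive, otherwise the divergence is infinite.
    """
    total = 0.0
    for aa, p_i in p.items():
        if p_i == 0:
            continue
        q_i = q[aa]
        if q_i == 0:
            raise ValueError(f"q is zero for {aa!r} where p is positive")
        total += p_i * math.log2(p_i / q_i)
    return total


if __name__ == "__main__":
    freqs = amino_acid_frequencies("MKVLAAGIVGLLLAQ")
    print(round(relative_entropy(freqs, UNIFORM), 3))
```

A value of zero means the observed usage matches the reference exactly; larger values indicate a stronger departure from it.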
Genomic selection for quantitative adult plant stem rust resistance in wheat
USDA-ARS's Scientific Manuscript database
Quantitative adult plant resistance (APR) to stem rust (Puccinia graminis f. sp. tritici) is an important breeding target in wheat (Triticum aestivum L.) and a potential target for genomic selection (GS). To evaluate the relative importance of known APR loci in applying genomic selection, we charact...
Citalopram and escitalopram plasma drug and metabolite concentrations: genome-wide associations
Ji, Yuan; Schaid, Daniel J; Desta, Zeruesenay; Kubo, Michiaki; Batzler, Anthony J; Snyder, Karen; Mushiroda, Taisei; Kamatani, Naoyuki; Ogburn, Evan; Hall-Flavin, Daniel; Flockhart, David; Nakamura, Yusuke; Mrazek, David A; Weinshilboum, Richard M
2014-01-01
Aims: Citalopram (CT) and escitalopram (S-CT) are among the most widely prescribed selective serotonin reuptake inhibitors used to treat major depressive disorder (MDD). We applied a genome-wide association study to identify genetic factors that contribute to variation in plasma concentrations of CT or S-CT and their metabolites in MDD patients treated with CT or S-CT. Methods: Our genome-wide association study was performed using samples from 435 MDD patients. Linear mixed models were used to account for within-subject correlations of longitudinal measures of plasma drug/metabolite concentrations (4 and 8 weeks after the initiation of drug therapy), and single-nucleotide polymorphisms (SNPs) were modelled as additive allelic effects. Results: Genome-wide significant associations were observed for S-CT concentration with SNPs in or near the CYP2C19 gene on chromosome 10 (rs1074145, P = 4.1 × 10^-9) and with S-didesmethylcitalopram concentration for SNPs near the CYP2D6 locus on chromosome 22 (rs1065852, P = 2.0 × 10^-16), supporting the important role of these cytochrome P450 (CYP) enzymes in biotransformation of citalopram. After adjustment for the effect of CYP2C19 functional alleles, the analyses also identified novel loci that will require future replication and functional validation. Conclusions: In vitro and in vivo studies have suggested that the biotransformation of CT to monodesmethylcitalopram and didesmethylcitalopram is mediated by CYP isozymes. The results of our genome-wide association study performed in MDD patients treated with CT or S-CT have confirmed those observations but also identified novel genomic loci that might play a role in variation in plasma levels of CT or its metabolites during the treatment of MDD patients with these selective serotonin reuptake inhibitors. PMID:24528284
Landscape genomics: natural selection drives the evolution of mitogenome in penguins.
Ramos, Barbara; González-Acuña, Daniel; Loyola, David E; Johnson, Warren E; Parker, Patricia G; Massaro, Melanie; Dantas, Gisele P M; Miranda, Marcelo D; Vianna, Juliana A
2018-01-16
Mitochondria play a key role in the balance of energy and heat production, and therefore the mitochondrial genome is under natural selection by environmental temperature and food availability, since starvation can generate more efficient coupling of energy production. However, selection on mitochondrial DNA (mtDNA) genes has usually been evaluated at the population level. We sequenced 12 mitogenomes by NGS and, together with four published genomes, assessed genetic variation in ten penguin species distributed from the equator to Antarctica. Signatures of selection in 13 mitochondrial protein-coding genes were evaluated by comparing among species within and among genera (Spheniscus, Pygoscelis, Eudyptula, Eudyptes and Aptenodytes). The genetic data were correlated with environmental data obtained through remote sensing (sea surface temperature [SST], chlorophyll levels [Chl] and a combination of SST and Chl [COM]) across the distribution of these species. We identified the complete mtDNA genomes of several penguin species, including ND6 and 8 tRNAs on the light strand and 12 protein-coding genes, 14 tRNAs and two rRNAs positioned on the heavy strand. The highest diversity was found in NADH dehydrogenase genes and the lowest in COX genes. The lowest evolutionary divergence among species was between Humboldt (Spheniscus humboldti) and Galapagos (S. mendiculus) penguins (0.004), while the highest was observed between little penguin (Eudyptula minor) and Adélie penguin (Pygoscelis adeliae) (0.097). We identified a signature of purifying selection (Ka/Ks < 1) across the mitochondrial genome, which is consistent with the hypothesis that purifying selection is constraining mitogenome evolution to maintain oxidative phosphorylation (OXPHOS) proteins and functionality. 
Pairwise species maximum-likelihood analyses of selection at codon sites suggest positive selection has occurred on ATP8 (Fixed-Effects Likelihood, FEL) and ND4 (Single Likelihood Ancestral Counting, SLAC) in all penguins. In contrast, COX1 had a signature of strong negative selection. ND4 Ka/Ks ratios were highly correlated with SST (Mantel, p-value: 0.0001; GLM, p-value: 0.00001) and thus may be related to climate adaptation throughout penguin speciation. These results identify mtDNA candidate genes under selection which could be involved in broad-scale adaptations of penguins to their environment. Such knowledge may be particularly useful for developing predictive models of how these species may respond to severe climatic changes in the future.
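The Ka/Ks criterion used in the abstract above (Ka/Ks < 1 read as purifying selection) can be made concrete with a minimal Python sketch. It assumes that the numbers of nonsynonymous and synonymous differences and sites have already been counted for a pair of aligned coding sequences (for instance by a Nei-Gojobori style count, not shown), and it applies the Jukes-Cantor correction to each proportion. The function names and the example counts are hypothetical, and this is not the codon-site likelihood approach (FEL, SLAC) that the authors report.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        raise ValueError("Ks is zero, so the ratio is undefined")
    return ka / ks


def classify(ratio: float, tolerance: float = 1e-9) -> str:
    """Conventional reading of a Ka/Ks ratio."""
    if ratio < 1 - tolerance:
        return "purifying"
    if ratio > 1 + tolerance:
        return "positive"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical counts for one gene in one species pair.
    ratio = ka_ks(nonsyn_diffs=5, nonsyn_sites=700, syn_diffs=30, syn_sites=300)
    print(classify(ratio))
```

A gene-wide ratio averages over all codons, so a value below 1 does not rule out positive selection at individual sites, which is why site-level methods are reported separately in the abstract.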
Repeated divergent selection on pigmentation genes in a rapid finch radiation
Campagna, Leonardo; Repenning, Márcio; Silveira, Luís Fábio; Fontana, Carla Suertegaray; Tubaro, Pablo L.; Lovette, Irby J.
2017-01-01
Instances of recent and rapid speciation are suitable for associating phenotypes with their causal genotypes, especially if gene flow homogenizes areas of the genome that are not under divergent selection. We study a rapid radiation of nine sympatric bird species known as capuchino seedeaters, which are differentiated in sexually selected characters of male plumage and song. We sequenced the genomes of a phenotypically diverse set of species to search for differentiated genomic regions. Capuchinos show differences in a small proportion of their genomes, yet selection has acted independently on the same targets in different members of this radiation. Many divergent regions contain genes involved in the melanogenesis pathway, with the strongest signal originating from putative regulatory regions. Selection has acted on these same genomic regions in different lineages, likely shaping the evolution of cis-regulatory elements, which control how more conserved genes are expressed and thereby generate diversity in classically sexually selected traits. PMID:28560331
Bellenguez, Céline; Strange, Amy; Freeman, Colin; Donnelly, Peter; Spencer, Chris C A
2012-01-01
High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections. The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer. Contact: chris.spencer@well.ox.ac.uk. Supplementary data are available at Bioinformatics online.
Predicting discovery rates of genomic features.
Gravel, Simon
2014-06-01
Successful sequencing experiments require judicious sample selection. However, this selection must often be performed on the basis of limited preliminary data. Predicting the statistical properties of the final sample based on preliminary data can be challenging, because numerous uncertain model assumptions may be involved. Here, we ask whether we can predict "omics" variation across many samples by sequencing only a fraction of them. In the infinite-genome limit, we find that a pilot study sequencing 5% of a population is sufficient to predict the number of genetic variants in the entire population within 6% of the correct value, using an estimator agnostic to demography, selection, or population structure. To reach similar accuracy in a finite genome with millions of polymorphisms, the pilot study would require ∼15% of the population. We present computationally efficient jackknife and linear programming methods that exhibit substantially less bias than the state of the art when applied to simulated data and subsampled 1000 Genomes Project data. Extrapolating based on the National Heart, Lung, and Blood Institute Exome Sequencing Project data, we predict that 7.2% of sites in the capture region would be variable in a sample of 50,000 African Americans and 8.8% in a European sample of equal size. Finally, we show how the linear programming method can also predict discovery rates of various genomic features, such as the number of transcription factor binding sites across different cell types. Copyright © 2014 by the Genetics Society of America.
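To illustrate the general idea of jackknife extrapolation mentioned in the abstract above, here is a minimal Python sketch of the classical first-order jackknife richness estimator (Burnham and Overton), which adds a correction based on the number of variants seen in exactly one sample. This is a textbook estimator swapped in for illustration, not the estimator developed in the paper; the function name and the input layout (one carrier count per variant) are assumptions.

```python
def first_order_jackknife(carrier_counts: list[int], n_samples: int) -> float:
    """Estimate the total number of variants from a pilot sample.

    carrier_counts[i] is the number of sampled individuals carrying
    variant i; zero counts (variants not seen) are ignored.
    Estimate = S_obs + f1 * (n - 1) / n, where f1 is the number of
    variants observed in exactly one individual.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    observed = [c for c in carrier_counts if c > 0]
    singletons = sum(1 for c in observed if c == 1)
    return len(observed) + singletons * (n_samples - 1) / n_samples


if __name__ == "__main__":
    # Hypothetical pilot study: 10 individuals, 4 variants seen, 2 of them once.
    print(first_order_jackknife([1, 1, 2, 5, 0], n_samples=10))
```

The intuition is that many singletons in the pilot sample signal many variants still unseen, so the estimate rises with the singleton count.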
Jha, Aashish R; Miles, Cecelia M; Lippert, Nodia R; Brown, Christopher D; White, Kevin P; Kreitman, Martin
2015-10-01
Complete genome resequencing of populations holds great promise in deconstructing complex polygenic traits to elucidate molecular and developmental mechanisms of adaptation. Egg size is a classic adaptive trait in insects, birds, and other taxa, but its highly polygenic architecture has prevented high-resolution genetic analysis. We used replicated experimental evolution in Drosophila melanogaster and whole-genome sequencing to identify consistent signatures of polygenic egg-size adaptation. A generalized linear-mixed model revealed reproducible allele frequency differences between replicated experimental populations selected for large and small egg volumes at approximately 4,000 single nucleotide polymorphisms (SNPs). Several hundred distinct genomic regions contain clusters of these SNPs and have lower heterozygosity than the genomic background, consistent with selection acting on polymorphisms in these regions. These SNPs are also enriched among genes expressed in Drosophila ovaries and many of these genes have well-defined functions in Drosophila oogenesis. Additional genes regulating egg development, growth, and cell size show evidence of directional selection as genes regulating these biological processes are enriched for highly differentiated SNPs. Genetic crosses performed with a subset of candidate genes demonstrated that these genes influence egg size, at least in the large genetic background. These findings confirm the highly polygenic architecture of this adaptive trait, and suggest the involvement of many novel candidate genes in regulating egg size. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Zhen, Ying; Harrigan, Ryan J; Ruegg, Kristen C; Anderson, Eric C; Ng, Thomas C; Lao, Sirena; Lohmueller, Kirk E; Smith, Thomas B
2017-10-01
The little greenbul, a common rainforest passerine from sub-Saharan Africa, has been the subject of long-term evolutionary studies to understand the mechanisms leading to rainforest speciation. Previous research found morphological and behavioural divergence across rainforest-savannah transition zones (ecotones), and a pattern of divergence with gene flow suggesting that divergent natural selection has contributed to adaptive divergence and that ecotones could be important areas for rainforest speciation. Recent advances in genomics and environmental modelling make it possible to examine patterns of genetic divergence in a more comprehensive fashion. To assess the extent to which natural selection may drive patterns of differentiation, here we investigate patterns of genomic differentiation among populations across environmental gradients and regions. We find compelling evidence that individuals form discrete genetic clusters corresponding to distinctive environmental characteristics and habitat types. Pairwise FST between populations in different habitats is significantly higher than within habitats, and this differentiation is greater than what is expected from geographic distance alone. Moreover, we identified 140 SNPs that showed extreme differentiation among populations through a genomewide selection scan. These outliers were significantly enriched in exonic and coding regions, suggesting their functional importance. Environmental association analysis of SNP variation indicates that several environmental variables, including temperature and elevation, play important roles in driving the pattern of genomic diversification. These results lend important new genomic evidence for environmental gradients being important in population differentiation. © 2017 John Wiley & Sons Ltd.
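The pairwise FST comparison described in the abstract above can be illustrated with a minimal Python sketch for a single biallelic SNP. It uses the basic definition FST = (HT - HS) / HT with equal population weights and no sample-size correction; the abstract does not state which estimator the authors used, and the function name and the example allele frequencies are hypothetical.

```python
def fst_biallelic(allele_freqs: list[float]) -> float:
    """FST for one biallelic SNP from per-population allele frequencies.

    HS is the mean expected heterozygosity within populations, HT the
    expected heterozygosity of the pooled population (equal weights).
    Returns 0.0 when the SNP is monomorphic overall (HT == 0).
    """
    if not allele_freqs:
        raise ValueError("need at least one population")
    if any(not 0 <= p <= 1 for p in allele_freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")
    k = len(allele_freqs)
    h_s = sum(2 * p * (1 - p) for p in allele_freqs) / k
    p_bar = sum(allele_freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical rainforest vs. ecotone frequencies at one SNP.
    print(round(fst_biallelic([0.2, 0.8]), 2))
```

Values near 0 indicate populations with similar allele frequencies, and values near 1 indicate populations fixed for different alleles.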
Genome characterization of the selected long- and short-sleep mouse lines.
Dowell, Robin; Odell, Aaron; Richmond, Phillip; Malmer, Daniel; Halper-Stromberg, Eitan; Bennett, Beth; Larson, Colin; Leach, Sonia; Radcliffe, Richard A
2016-12-01
The Inbred Long- and Short-Sleep (ILS, ISS) mouse lines were selected for differences in acute ethanol sensitivity using the loss of righting response (LORR) as the selection trait. The lines show an over tenfold difference in LORR and, along with a recombinant inbred panel derived from them (the LXS), have been widely used to dissect the genetic underpinnings of acute ethanol sensitivity. Here we have sequenced the genomes of the ILS and ISS to investigate the DNA variants that contribute to their sensitivity difference. We identified ~2.7 million high-confidence SNPs and small indels and ~7000 structural variants between the lines; variants were found to occur in 6382 annotated genes. Using a hidden Markov model, we were able to reconstruct the genome-wide ancestry patterns of the eight inbred progenitor strains from which the ILS and ISS were derived, and found that quantitative trait loci that have been mapped for LORR were slightly enriched for DNA variants. Finally, by mapping and quantifying RNA-seq reads from the ILS and ISS to their strain-specific genomes rather than to the reference genome, we found a substantial improvement in a differential expression analysis between the lines. This work will help in identifying and characterizing the DNA sequence variants that contribute to the difference in ethanol sensitivity between the ILS and ISS and will also aid in accurate quantification of RNA-seq data generated from the LXS RIs.
Lalusin, Antonio; Borromeo, Teresita; Gregorio, Glenn; Hernandez, Jose; Virk, Parminder; Collard, Bertrand; McCouch, Susan R.
2015-01-01
Genome-wide association mapping studies (GWAS) are frequently used to detect QTL in diverse collections of crop germplasm, based on historic recombination events and linkage disequilibrium across the genome. Generally, diversity panels genotyped with high density SNP panels are utilized in order to assay a wide range of alleles and haplotypes and to monitor recombination breakpoints across the genome. By contrast, GWAS have not generally been performed in breeding populations. In this study we performed association mapping for 19 agronomic traits including yield and yield components in a breeding population of elite irrigated tropical rice breeding lines so that the results would be more directly applicable to breeding than those from a diversity panel. The population was genotyped with 71,710 SNPs using genotyping-by-sequencing (GBS), and GWAS performed with the explicit goal of expediting selection in the breeding program. Using this breeding panel we identified 52 QTL for 11 agronomic traits, including large effect QTLs for flowering time and grain length/grain width/grain-length-breadth ratio. We also identified haplotypes that can be used to select plants in our population for short stature (plant height), early flowering time, and high yield, and thus demonstrate the utility of association mapping in breeding populations for informing breeding decisions. We conclude by exploring how the newly identified significant SNPs and insights into the genetic architecture of these quantitative traits can be leveraged to build genomic-assisted selection models. PMID:25785447
Jiang, Peng; Shi, Feng-Xue; Li, Ming-Rui; Liu, Bao; Wen, Jun; Xiao, Hong-Xing; Li, Lin-Feng
2018-01-01
Panax L. (the ginseng genus) is a shade-demanding group within the family Araliaceae and all of its species are of crucial significance in traditional Chinese medicine. Phylogenetic and biogeographic analyses demonstrated that two rounds of whole genome duplications, accompanied by geographic and ecological isolation, promoted the diversification of Panax species. However, contributions of the cytoplasmic genomes to the adaptive evolution of Panax species remain largely uninvestigated. In this study, we sequenced the chloroplast and mitochondrial genomes of 11 accessions belonging to seven Panax species. Our results show that heterogeneity in nucleotide substitution rate is abundant in both cytoplasmic genomes, with the mitochondrial genome possessing more variants overall but the chloroplast genome showing higher sequence polymorphism in the genic regions. Genome-wide scanning for positive selection identified five and 12 genes from the chloroplast and mitochondrial genomes, respectively. Functional analyses further revealed that these selected genes play important roles in plant development, cellular metabolism and adaptation. We therefore conclude that positive selection might be one of the potential evolutionary forces that shaped the nucleotide variation patterns of these Panax species. In particular, the mitochondrial genes evolved under stronger selective pressure compared to the chloroplast genes. PMID:29670636
Molecular hyperdiversity and evolution in very large populations.
Cutter, Asher D; Jovelin, Richard; Dey, Alivia
2013-04-01
The genomic density of sequence polymorphisms critically affects the sensitivity of inferences about ongoing sequence evolution, function and demographic history. Most animal and plant genomes have relatively low densities of polymorphisms, but some species are hyperdiverse with neutral nucleotide heterozygosity exceeding 5%. Eukaryotes with extremely large populations, mimicking bacterial and viral populations, present novel opportunities for studying molecular evolution in sexually reproducing taxa with complex development. In particular, hyperdiverse species can help answer controversial questions about the evolution of genome complexity, the limits of natural selection, modes of adaptation and subtleties of the mutation process. However, such systems have some inherent complications and here we identify topics in need of theoretical developments. Close relatives of the model organisms Caenorhabditis elegans and Drosophila melanogaster provide known examples of hyperdiverse eukaryotes, encouraging functional dissection of resulting molecular evolutionary patterns. We recommend how best to exploit hyperdiverse populations for analysis, for example, in quantifying the impact of noncrossover recombination in genomes and for determining the identity and micro-evolutionary selective pressures on noncoding regulatory elements. © 2013 Blackwell Publishing Ltd.
Genomic signatures of selection at linked sites: unifying the disparity among species
Cutter, Asher D.; Payseur, Bret A.
2014-01-01
Population genetics theory supplies powerful predictions about how natural selection interacts with genetic linkage to sculpt the genomic landscape of nucleotide polymorphism. Both the spread of beneficial mutations and removal of deleterious mutations act to depress polymorphism levels, especially in low-recombination regions. However, empiricists have documented extreme disparities among species. Here we characterize the dominant features that could drive variation in linked selection among species, including roles for selective sweeps being ‘hard’ or ‘soft’, and concealing by demography and genomic confounds. We advocate targeted studies of close relatives to unify our understanding of how selection and linkage interact to shape genome evolution. PMID:23478346
Adventures in hepatocarcinogenesis.
Pitot, Henry C
2007-01-01
Neoplasia is a heritably altered, relatively autonomous growth of tissue. Hepatocarcinogenesis, the pathogenesis of neoplasia in liver, as modeled in the rat exhibits three distinct, quantifiable stages: initiation, promotion, and progression. Simple mutations and/or epigenetic alterations may result in the irreversible stage of initiation. The stage of promotion results from selective enhancement of cell replication and selective inhibition of cellular apoptosis of initiated cells dependent on the genetic and/or epigenetic alterations of the latter. The irreversible stage of progression results from initial karyotypic alterations that evolve into greater degrees of genomic instability. The initial genomic alteration in the transition from promotion to progression may involve primarily epigenetic mechanisms driven by epigenetic and genetic alterations fixed during the stage of promotion.
Klassen, Jonathan L.
2010-01-01
Background: Carotenoids are multifunctional, taxonomically widespread and biotechnologically important pigments. Their biosynthesis serves as a model system for understanding the evolution of secondary metabolism. Microbial carotenoid diversity and evolution has hitherto been analyzed primarily from structural and biosynthetic perspectives, with the few phylogenetic analyses of microbial carotenoid biosynthetic proteins either using limited datasets or lacking methodological rigor. Given the recent accumulation of microbial genome sequences, a reappraisal of microbial carotenoid biosynthetic diversity and evolution from the perspective of comparative genomics is warranted to validate and complement models of microbial carotenoid diversity and evolution based upon structural and biosynthetic data. Methodology/Principal Findings: Comparative genomics were used to identify and analyze in silico microbial carotenoid biosynthetic pathways. Four major phylogenetic lineages of carotenoid biosynthesis are suggested, composed of: (i) Proteobacteria; (ii) Firmicutes; (iii) Chlorobi, Cyanobacteria and photosynthetic eukaryotes; and (iv) Archaea, Bacteroidetes and two separate sub-lineages of Actinobacteria. Using this phylogenetic framework, specific evolutionary mechanisms are proposed for carotenoid desaturase CrtI-family enzymes and carotenoid cyclases. Several phylogenetic lineage-specific evolutionary mechanisms are also suggested, including: (i) horizontal gene transfer; (ii) gene acquisition followed by differential gene loss; (iii) co-evolution with other biochemical structures such as proteorhodopsins; and (iv) positive selection. Conclusions/Significance: Comparative genomics analyses of microbial carotenoid biosynthetic proteins indicate a much greater taxonomic diversity than that identified based on structural and biosynthetic data, and divide microbial carotenoid biosynthesis into several well-supported phylogenetic lineages not evident previously. 
This phylogenetic framework is applicable to understanding the evolution of specific carotenoid biosynthetic proteins or the unique characteristics of carotenoid biosynthetic evolution in a specific phylogenetic lineage. Together, these analyses suggest a “bramble” model for microbial carotenoid biosynthesis whereby later biosynthetic steps exhibit greater evolutionary plasticity and reticulation compared to those closer to the biosynthetic “root”. Structural diversification may be constrained (“trimmed”) where selection is strong, but less so where selection is weaker. These analyses also highlight likely productive avenues for future research and bioprospecting by identifying both gaps in current knowledge and taxa which may particularly facilitate carotenoid diversification. PMID:20582313
Population genetics inside a cell: Mutations and mitochondrial genome maintenance
NASA Astrophysics Data System (ADS)
Goyal, Sidhartha; Shraiman, Boris; Gottschling, Dan
2012-02-01
In realistic ecological and evolutionary systems, natural selection acts on multiple levels, i.e., it acts on individuals as well as on collections of individuals. Our understanding of the evolutionary dynamics of such systems is limited, in large part due to the lack of experimental systems that can challenge theoretical models. Mitochondrial genomes (mtDNA) are subjected to selection acting at the cellular as well as the organelle level. It is well accepted that mtDNA in the yeast Saccharomyces cerevisiae is unstable and can degrade over time scales comparable to the yeast cell division time. We utilize a recent technology designed in the Gottschling lab to extract DNA from populations of aged yeast cells, together with deep sequencing, to characterize mtDNA variation in a population of young and old cells. In tandem, we developed a stochastic model that includes the essential features of mitochondrial biology and provides a null model for expected mtDNA variation. Overall, we find that approximately 2% of the polymorphic loci show a significant increase in frequency as cells age, providing direct evidence for organelle-level selection. Such a quantitative study of mtDNA dynamics is essential for understanding the propagation of mtDNA mutations linked to a spectrum of age-related diseases in humans.
Chromatin Landscapes of Retroviral and Transposon Integration Profiles
Badhai, Jitendra; Rust, Alistair G.; Rad, Roland; Hilkens, John; Berns, Anton; van Lohuizen, Maarten; Wessels, Lodewyk F. A.; de Ridder, Jeroen
2014-01-01
The ability of retroviruses and transposons to insert their genetic material into host DNA makes them widely used tools in molecular biology, cancer research and gene therapy. However, these systems have biases that may strongly affect research outcomes. To address this issue, we generated very large datasets consisting of to unselected integrations in the mouse genome for the Sleeping Beauty (SB) and piggyBac (PB) transposons, and the Mouse Mammary Tumor Virus (MMTV). We analyzed (epi)genomic features to generate bias maps at both local and genome-wide scales. MMTV showed a remarkably uniform distribution of integrations across the genome. More distinct preferences were observed for the two transposons, with PB showing remarkable resemblance to bias profiles of the Murine Leukemia Virus. Furthermore, we present a model where target site selection is directed at multiple scales. At a large scale, target site selection is similar across systems, and defined by domain-oriented features, namely expression of proximal genes, proximity to CpG islands and to genic features, chromatin compaction and replication timing. Notable differences between the systems are mainly observed at smaller scales, and are directed by a diverse range of features. To study the effect of these biases on integration sites occupied under selective pressure, we turned to insertional mutagenesis (IM) screens. In IM screens, putative cancer genes are identified by finding frequently targeted genomic regions, or Common Integration Sites (CISs). Within three recently completed IM screens, we identified 7%–33% putative false positive CISs, which are likely not the result of the oncogenic selection process. Moreover, results indicate that PB, compared to SB, is more suited to tag oncogenes. PMID:24721906
Genome-wide scans for candidate genes involved in the aquatic adaptation of dolphins.
Sun, Yan-Bo; Zhou, Wei-Ping; Liu, He-Qun; Irwin, David M; Shen, Yong-Yi; Zhang, Ya-Ping
2013-01-01
Since their divergence from the terrestrial artiodactyls, cetaceans have fully adapted to an aquatic lifestyle, which represents one of the most dramatic transformations in mammalian evolutionary history. Numerous morphological and physiological characters of cetaceans have been acquired in response to this drastic habitat transition, such as thickened blubber, echolocation, and the ability to hold their breath for a long period of time. However, knowledge about the molecular basis underlying these adaptations is still limited. The genome sequence of Tursiops truncatus provides an opportunity for comparative genomic analyses to examine the molecular adaptation of this species. Here, we constructed 11,838 high-quality orthologous gene alignments culled from the dolphin and four other terrestrial mammalian genomes and screened for positive selection occurring in the dolphin lineage. In total, 368 (3.1%) of the genes were identified as having undergone positive selection by the branch-site model. Functional characterization of these genes showed that they are significantly enriched in the categories of lipid transport and localization, ATPase activity, sense perception of sound, and muscle contraction, areas that are potentially related to cetacean adaptations. In contrast, we did not find a similar pattern in the cow, a closely related species. We resequenced some of the positively selected sites (PSSs) within the positively selected genes and showed that most of our identified PSSs (50/52) could be replicated. The results from this study should have important implications for our understanding of cetacean evolution and their adaptations to the aquatic environment.
A whole genome Bayesian scan for adaptive genetic divergence in West African cattle
2009-01-01
Background The recent settlement of cattle in West Africa after several waves of migration from remote centres of domestication has imposed dramatic changes in their environmental conditions, in particular through exposure to new pathogens. West African cattle populations thus represent an appealing model to unravel the genome response to adaptation to tropical conditions. The purpose of this study was to identify footprints of adaptive selection at the whole genome level in a newly collected data set comprising 36,320 SNPs genotyped in 9 West African cattle populations. Results After a detailed analysis of population structure, we performed a scan for SNP differentiation via a previously proposed Bayesian procedure including extensions to improve the detection of loci under selection. Based on these results we identified 53 genomic regions and 42 strong candidate genes. Their physiological functions were mainly related to immune response (MHC region, which was found to be under strong balancing selection, CD79A, CXCR4, DLK1, RFX3, SEMA4A, TICAM1 and TRIM21), nervous system (NEUROD6, OLFM2, MAGI1, SEMA4A and HTR4) and skin and hair properties (EDNRB, TRSP1 and KRTAP8-1). Conclusion The main possible underlying selective pressures may be related to climatic conditions but also to the host response to pathogens such as Trypanosoma sp. Overall, these results might open the way towards the identification of important variants involved in adaptation to tropical conditions and in particular to resistance to tropical infectious diseases. PMID:19930592
Empirical Performance of Cross-Validation With Oracle Methods in a Genomics Context.
Martinez, Josue G; Carroll, Raymond J; Müller, Samuel; Sampson, Joshua N; Chatterjee, Nilanjan
2011-11-01
When employing model selection methods with oracle properties such as the smoothly clipped absolute deviation (SCAD) and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNP), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using SCAD and the Adaptive Lasso, with 10-fold cross-validation, is a random variable that has considerable and surprising variation. Similar remarks apply to non-oracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the SCAD and Adaptive Lasso.
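The variability described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' procedure: it uses the plain Lasso (a non-oracle method, to which the abstract says similar remarks apply) fitted by coordinate descent, and every name, the simulated data, the effect sizes and the lambda grid are assumptions chosen for illustration. It repeats m-fold cross-validation on one fixed dataset with different random fold assignments and records how many variables are selected each time.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=30):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X b, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def cv_choose_lambda(X, y, lambdas, m, rng):
    """Pick the lambda with the smallest m-fold CV squared error; folds are random."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[k::m] for k in range(m)]
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = 0.0
        for fold in folds:
            held = set(fold)
            train = [i for i in idx if i not in held]
            beta = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
            for i in fold:
                pred = sum(b * x for b, x in zip(beta, X[i]))
                err += (y[i] - pred) ** 2
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam

def simulate(n, p, rng):
    """Hypothetical sparse model with weak signals: 3 small effects, unit noise."""
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    beta_true = [0.3, -0.3, 0.3] + [0.0] * (p - 3)
    y = [sum(b * x for b, x in zip(beta_true, row)) + rng.gauss(0, 1) for row in X]
    return X, y

def selected_counts(n_repeats=6, seed=1):
    """Number of variables selected after each independent run of 5-fold CV."""
    rng = random.Random(seed)
    X, y = simulate(40, 12, rng)
    lambdas = [0.02, 0.05, 0.1, 0.2, 0.4]
    counts = []
    for _ in range(n_repeats):
        lam = cv_choose_lambda(X, y, lambdas, m=5, rng=rng)
        beta = lasso_cd(X, y, lam)
        counts.append(sum(1 for b in beta if b != 0.0))
    return counts

if __name__ == "__main__":
    print(selected_counts())
```

Only the fold assignment changes between repeats, so any spread in the printed counts comes from the cross-validation step alone. Summarizing several repeats rather than relying on a single run is one way to act on the abstract's warning.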
Imputation of unordered markers and the impact on genomic selection accuracy
USDA-ARS's Scientific Manuscript database
Genomic selection, a breeding method that promises to accelerate rates of genetic gain, requires dense, genome-wide marker data. Genotyping-by-sequencing can generate a large number of de novo markers. However, without a reference genome, these markers are unordered and typically have a large propo...
The development of genomics applied to dairy breeding
USDA-ARS's Scientific Manuscript database
Genomic selection (GS) has profoundly changed dairy cattle breeding in the last decade and can be defined as the use of genomic breeding values (GEBV) in selection programs. The GEBV is the sum of the effects of dense DNA markers across the whole genome, capturing all the quantitative trait loci (QT...
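The definition of a GEBV as a sum of marker effects can be written out in a few lines. This is a minimal sketch under stated assumptions: genotypes are coded as allele counts (0, 1 or 2), the marker effects have already been estimated, and all names and numbers below are hypothetical illustration values, not data from the record above.

```python
def gebv(genotype, marker_effects):
    """Genomic breeding value as the sum over markers of allele count times effect."""
    if len(genotype) != len(marker_effects):
        raise ValueError("genotype and marker_effects must have equal length")
    return sum(g * a for g, a in zip(genotype, marker_effects))

# Hypothetical estimated effects for three SNP markers.
marker_effects = [0.5, -0.25, 0.125]

# Hypothetical selection candidates with allele counts per marker.
candidates = {
    "A": [2, 1, 0],
    "B": [0, 2, 2],
    "C": [1, 0, 1],
}

scores = {name: gebv(g, marker_effects) for name, g in candidates.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    for name in ranking:
        print(name, scores[name])
```

Ranking candidates by this score is the simplest form of selection on GEBV. Real evaluations use many thousands of markers and usually add a mean and other fixed effects.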
The function of dog models in developing gene therapy strategies for human health.
Nowend, Keri L; Starr-Moss, Alison N; Murphy, Keith E
2011-08-01
The domestic dog is of great benefit to humankind, not only through companionship and working activities cultivated through domestication and selective breeding, but also as a model for biomedical research. Many single-gene traits have been well-characterized at the genomic level, and recent advances in whole-genome association studies will allow for better understanding of complex, multigenic hereditary diseases. Additionally, the dog serves as an invaluable large animal model for assessment of novel therapeutic agents. Thus, the dog has filled a crucial step in the translation of basic research to new treatment regimens for various human diseases. Four well-characterized diseases in canine models are discussed as they relate to other animal model availability, novel therapeutic approach, and extrapolation to human gene therapy trials.
Schielzeth, Holger; Streitner, Corinna; Lampe, Ulrike; Franzke, Alexandra; Reinhold, Klaus
2014-12-01
Genome size is largely uncorrelated to organismal complexity and adaptive scenarios. Genetic drift as well as intragenomic conflict have been put forward to explain this observation. We here study the impact of genome size on sexual attractiveness in the bow-winged grasshopper Chorthippus biguttulus. Grasshoppers show particularly large variation in genome size due to the high prevalence of supernumerary chromosomes that are considered (mildly) selfish, as evidenced by non-Mendelian inheritance and fitness costs if present in high numbers. We ranked male grasshoppers by song characteristics that are known to affect female preferences in this species and scored genome sizes of attractive and unattractive individuals from the extremes of this distribution. We find that attractive singers have significantly smaller genomes, demonstrating that genome size is reflected in male courtship songs and that females prefer songs of males with small genomes. Such a genome size dependent mate preference effectively selects against selfish genetic elements that tend to increase genome size. The data therefore provide a novel example of how sexual selection can reinforce natural selection and can act as an agent in an intragenomic arms race. Furthermore, our findings indicate an underappreciated route of how choosy females could gain indirect benefits. © 2014 The Author(s). Evolution © 2014 The Society for the Study of Evolution.
NASA Astrophysics Data System (ADS)
Su, Hailin; Li, Hengde; Wang, Shi; Wang, Yangfan; Bao, Zhenmin
2017-02-01
Genomic selection is increasingly popular in animal and plant breeding industries around the world, as it can be applied early in life without impacting selection candidates. The objective of this study was to bring the advantages of genomic selection to scallop breeding. Two genomic selection tools, MixP and gsbay, were applied to the genomic evaluation of simulated data and Zhikong scallop (Chlamys farreri) field data. The results were compared with those of the genomic best linear unbiased prediction (GBLUP) method, which has been widely applied. Our results showed that both MixP and gsbay could accurately estimate single-nucleotide polymorphism (SNP) marker effects, and thereby could be applied to the analysis of genomic estimated breeding values (GEBV). In simulated data from different scenarios, the accuracy of GEBV ranged from 0.20 to 0.78 with MixP, from 0.21 to 0.67 with gsbay, and from 0.21 to 0.61 with GBLUP. Estimations made by MixP and gsbay were expected to be more reliable than those estimated by GBLUP. Predictions made by gsbay were more robust, while computation with MixP was much faster, especially in dealing with large-scale data. These results suggested that the algorithms implemented by MixP and gsbay are both feasible for carrying out genomic selection in scallop breeding, and that more genotype data will be necessary to produce genomic estimated breeding values with higher accuracy for the industry.
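The abstract does not say how GEBV accuracy was computed. In simulation studies it is commonly taken to be the Pearson correlation between predicted and true breeding values, and the sketch below assumes that definition. The function name and the numbers are hypothetical and are not taken from MixP, gsbay or the study.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical simulated true breeding values and their genomic predictions.
true_bv = [1.2, -0.4, 0.3, 0.9, -1.1]
predicted_gebv = [0.8, -0.1, 0.5, 0.4, -0.9]

accuracy = pearson(predicted_gebv, true_bv)

if __name__ == "__main__":
    print(round(accuracy, 3))
```

With field data the true breeding values are unknown, so studies typically correlate predictions with phenotypes or other proxies instead. Those correlations are not directly comparable to the simulated accuracies.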
Aagaard, Jan E; George, Renee D; Fishman, Lila; Maccoss, Michael J; Swanson, Willie J
2013-01-01
Understanding the genetic basis of reproductive isolation promises insight into speciation and the origins of biological diversity. While progress has been made in identifying genes underlying barriers to reproduction that function after fertilization (post-zygotic isolation), we know much less about earlier acting pre-zygotic barriers. Of particular interest are barriers involved in mating and fertilization that can evolve extremely rapidly under sexual selection, suggesting they may play a prominent role in the initial stages of reproductive isolation. A significant challenge to the field of speciation genetics is developing new approaches for identification of candidate genes underlying these barriers, particularly among non-traditional model systems. We employ powerful proteomic and genomic strategies to study the genetic basis of conspecific pollen precedence, an important component of pre-zygotic reproductive isolation among yellow monkeyflowers (Mimulus spp.) resulting from male pollen competition. We use isotopic labeling in combination with shotgun proteomics to identify more than 2,000 male function (pollen tube) proteins within maternal reproductive structures (styles) of M. guttatus flowers where pollen competition occurs. We then sequence array-captured pollen tube exomes from a large outcrossing population of M. guttatus, and identify those genes with evidence of selective sweeps or balancing selection consistent with their role in pollen competition. We also test for evidence of positive selection on these genes more broadly across yellow monkeyflowers, because a signal of adaptive divergence is a common feature of genes causing reproductive isolation. Together the molecular evolution studies identify 159 pollen tube proteins that are candidate genes for conspecific pollen precedence. 
Our work demonstrates how powerful proteomic and genomic tools can be readily adapted to non-traditional model systems, allowing for genome-wide screens towards the goal of identifying the molecular basis of genetically complex traits.
Trait variation and genetic diversity in a banana genomic selection training population
Nyine, Moses; Uwimana, Brigitte; Swennen, Rony; Batte, Michael; Brown, Allan; Christelová, Pavla; Hřibová, Eva; Lorenzen, Jim; Doležel, Jaroslav
2017-01-01
Banana (Musa spp.) is an important crop in the African Great Lakes region in terms of income and food security, with the highest per capita consumption worldwide. Pests, diseases and climate change hamper sustainable production of bananas. New breeding tools with increased crossbreeding efficiency are being investigated to breed for resistant, high yielding hybrids of East African Highland banana (EAHB). These include genomic selection (GS), which will benefit breeding through increased genetic gain per unit time. Understanding trait variation and the correlation among economically important traits is an essential first step in the development and selection of suitable GS models for banana. In this study, we tested the hypothesis that trait variations in bananas are not affected by cross combination, cycle, field management and their interaction with genotype. A training population created using EAHB breeding material and its progeny was phenotyped in two contrasting conditions. A high level of correlation among vegetative and yield related traits was observed. Therefore, genomic selection models could be developed for traits that are easily measured. It is likely that the predictive ability of traits that are difficult to phenotype will be similar to less difficult traits they are highly correlated with. Genotype response to cycle and field management practices varied greatly with respect to traits. Yield related traits accounted for 31–35% of principal component variation under low and high input field management conditions. Resistance to Black Sigatoka was stable across cycles but varied under different field management depending on the genotype. The best cross combination was 1201K-1xSH3217 based on selection response (R) of hybrids. Genotyping using simple sequence repeat (SSR) markers revealed that the training population was genetically diverse, reflecting a complex pedigree background, which was mostly influenced by the male parents. PMID:28586365
Stochastic model search with binary outcomes for genome-wide association studies.
Russu, Alberto; Malovini, Alberto; Puca, Annibale A; Bellazzi, Riccardo
2012-06-01
The spread of case-control genome-wide association studies (GWASs) has stimulated the development of new variable selection methods and predictive models. We introduce a novel Bayesian model search algorithm, Binary Outcome Stochastic Search (BOSS), which addresses the model selection problem when the number of predictors far exceeds the number of binary responses. Our method is based on a latent variable model that links the observed outcomes to the underlying genetic variables. A Markov Chain Monte Carlo approach is used for model search and to evaluate the posterior probability of each predictor. BOSS is compared with three established methods (stepwise regression, logistic lasso, and elastic net) in a simulated benchmark. Two real case studies are also investigated: a GWAS on the genetic bases of longevity, and the type 2 diabetes study from the Wellcome Trust Case Control Consortium. Simulations show that BOSS achieves higher precision than the reference methods while preserving good recall rates. In both experimental studies, BOSS successfully detects genetic polymorphisms previously reported to be associated with the analyzed phenotypes. BOSS outperforms the other methods in terms of F-measure on simulated data. In the two real studies, BOSS successfully detects biologically relevant features, some of which are missed by univariate analysis and the three reference techniques. The proposed algorithm is an advance in the methodology for model selection with a large number of features. Our simulated and experimental results showed that BOSS is effective in detecting relevant markers while providing a parsimonious model.
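The precision, recall and F-measure used to compare the methods can be computed from the set of selected markers and the set of truly associated markers in a simulation. The sketch below assumes the balanced F-measure (F1, the harmonic mean of precision and recall), since the abstract does not state the weighting. The marker names are hypothetical, and the code is not part of BOSS.

```python
def precision_recall_f(selected, causal):
    """Return (precision, recall, F1) for a set of selected markers."""
    selected, causal = set(selected), set(causal)
    true_positives = len(selected & causal)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(causal) if causal else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical simulation: three causal SNPs, four SNPs selected by a method.
causal_snps = {"snp1", "snp2", "snp5"}
selected_snps = {"snp1", "snp2", "snp3", "snp4"}

precision, recall, f_measure = precision_recall_f(selected_snps, causal_snps)

if __name__ == "__main__":
    print(precision, recall, f_measure)
```

A method that selects few markers but mostly correct ones scores high on precision, which matches the abstract's description of a parsimonious model with good recall.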
Breeding and Genetics Symposium: networks and pathways to guide genomic selection.
Snelling, W M; Cushman, R A; Keele, J W; Maltecca, C; Thomas, M G; Fortes, M R S; Reverter, A
2013-02-01
Many traits affecting profitability and sustainability of meat, milk, and fiber production are polygenic, with no single gene having an overwhelming influence on observed variation. No knowledge of the specific genes controlling these traits has been needed to make substantial improvement through selection. Significant gains have been made through phenotypic selection enhanced by pedigree relationships and continually improving statistical methodology. Genomic selection, recently enabled by assays for dense SNP located throughout the genome, promises to increase selection accuracy and accelerate genetic improvement by emphasizing the SNP most strongly correlated to phenotype although the genes and sequence variants affecting phenotype remain largely unknown. These genomic predictions theoretically rely on linkage disequilibrium (LD) between genotyped SNP and unknown functional variants, but familial linkage may increase effectiveness when predicting individuals related to those in the training data. Genomic selection with functional SNP genotypes should be less reliant on LD patterns shared by training and target populations, possibly allowing robust prediction across unrelated populations. Although the specific variants causing polygenic variation may never be known with certainty, a number of tools and resources can be used to identify those most likely to affect phenotype. Associations of dense SNP genotypes with phenotype provide a 1-dimensional approach for identifying genes affecting specific traits; in contrast, associations with multiple traits allow defining networks of genes interacting to affect correlated traits. Such networks are especially compelling when corroborated by existing functional annotation and established molecular pathways. The SNP occurring within network genes, obtained from public databases or derived from genome and transcriptome sequences, may be classified according to expected effects on gene products. 
As illustrated by functionally informed genomic predictions being more accurate than naive whole-genome predictions of beef tenderness, coupling evidence from livestock genotypes, phenotypes, gene expression, and genomic variants with existing knowledge of gene functions and interactions may provide greater insight into the genes and genomic mechanisms affecting polygenic traits and facilitate functional genomic selection for economically important traits.
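Linkage disequilibrium between a genotyped SNP and a functional variant is often summarized by the squared correlation r^2 between their alleles. The following minimal sketch assumes phased haplotypes coded 0/1 at two biallelic loci. The function name and the example haplotypes are hypothetical.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes given as (a, b) pairs of 0/1."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom

# Marker allele always travels with the functional allele: complete LD.
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
# All four haplotypes equally frequent: no LD.
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]

if __name__ == "__main__":
    print(ld_r2(linked), ld_r2(unlinked))
```

A marker with r^2 near 1 is a good proxy for the functional variant in that population. The same marker can have a much lower r^2 in an unrelated population, which is why the abstract expects predictions based on functional SNP to transfer better across populations.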
Signatures of natural selection and ecological differentiation in microbial genomes.
Shapiro, B Jesse
2014-01-01
We live in a microbial world. Most of the genetic and metabolic diversity that exists on earth - and has existed for billions of years - is microbial. Making sense of this vast diversity is a daunting task, but one that can be approached systematically by analyzing microbial genome sequences. This chapter explores how the evolutionary forces of recombination and selection act to shape microbial genome sequences, leaving signatures that can be detected using comparative genomics and population-genetic tests for selection. I describe the major classes of tests, paying special attention to their relative strengths and weaknesses when applied to microbes. Specifically, I apply a suite of tests for selection to a set of closely-related bacterial genomes with different microhabitat preferences within the marine water column, shedding light on the genomic mechanisms of ecological differentiation in the wild. I will focus on the joint problem of simultaneously inferring the boundaries between microbial populations, and the selective forces operating within and between populations.
Genomic prediction in a nuclear population of layers using single-step models.
Yan, Yiyuan; Wu, Guiqin; Liu, Aiqiao; Sun, Congjiao; Han, Wenpeng; Li, Guangqi; Yang, Ning
2018-02-01
The single-step genomic prediction method has been proposed to improve the accuracy of genomic prediction by incorporating information from both genotyped and ungenotyped animals. The objective of this study was to compare the prediction performance of the single-step model with 2-step models and pedigree-based models in a nuclear population of layers. A total of 1,344 chickens across 4 generations were genotyped with a 600 K SNP chip. Four traits were analyzed, i.e., body weight at 28 wk (BW28), egg weight at 28 wk (EW28), laying rate at 38 wk (LR38), and Haugh unit at 36 wk (HU36). In predicting offspring, individuals from generations 1 to 3 were used as training data and females from generation 4 were used as the validation set. The accuracies of breeding values predicted by pedigree BLUP (PBLUP), genomic BLUP (GBLUP), SSGBLUP and single-step blending (SSBlending) were compared for both genotyped and ungenotyped individuals. For genotyped females, GBLUP performed no better than PBLUP because of the small size of the training data, while the 2 single-step models predicted more accurately than the PBLUP model. The average predictive abilities of SSGBLUP and SSBlending were 16.0% and 10.8% higher than that of the PBLUP model across traits, respectively. Furthermore, the predictive abilities for ungenotyped individuals were also enhanced. The average improvements in predictive ability were 5.9% and 1.5% for the SSGBLUP and SSBlending models, respectively. It was concluded that single-step models, especially the SSGBLUP model, can yield more accurate prediction of genetic merit and are preferable for practical implementation of genomic selection in layers. © 2017 Poultry Science Association Inc.
A first generation BAC-based physical map of the rainbow trout genome
Palti, Yniv; Luo, Ming-Cheng; Hu, Yuqin; Genet, Carine; You, Frank M; Vallejo, Roger L; Thorgaard, Gary H; Wheeler, Paul A; Rexroad, Caird E
2009-01-01
Background Rainbow trout (Oncorhynchus mykiss) are the most-widely cultivated cold freshwater fish in the world and an important model species for many research areas. Coupling great interest in this species as a research model with the need for genetic improvement of aquaculture production efficiency traits justifies the continued development of genomics research resources. Many quantitative trait loci (QTL) have been identified for production and life-history traits in rainbow trout. A bacterial artificial chromosome (BAC) physical map is needed to facilitate fine mapping of QTL and the selection of positional candidate genes for incorporation in marker-assisted selection (MAS) for improving rainbow trout aquaculture production. This resource will also facilitate efforts to obtain and assemble a whole-genome reference sequence for this species. Results The physical map was constructed from DNA fingerprinting of 192,096 BAC clones using the 4-color high-information content fingerprinting (HICF) method. The clones were assembled into physical map contigs using the finger-printing contig (FPC) program. The map is composed of 4,173 contigs and 9,379 singletons. The total number of unique fingerprinting fragments (consensus bands) in contigs is 1,185,157, which corresponds to an estimated physical length of 2.0 Gb. The map assembly was validated by 1) comparison with probe hybridization results and agarose gel fingerprinting contigs; and 2) anchoring large contigs to the microsatellite-based genetic linkage map. Conclusion The production and validation of the first BAC physical map of the rainbow trout genome is described in this paper. We are currently integrating this map with the NCCCWA genetic map using more than 200 microsatellites isolated from BAC end sequences and by identifying BACs that harbor more than 300 previously mapped markers. 
The availability of an integrated physical and genetic map will enable detailed comparative genome analyses, fine mapping of QTL, positional cloning, selection of positional candidate genes for economically important traits and the incorporation of MAS into rainbow trout breeding programs. PMID:19814815
Kogelman, Lisette J A; Zhernakova, Daria V; Westra, Harm-Jan; Cirera, Susanna; Fredholm, Merete; Franke, Lude; Kadarmideen, Haja N
2015-10-20
Obesity is a multi-factorial health problem in which genetic factors play an important role. Limited results have been obtained in single-gene studies using either genomic or transcriptomic data. RNA sequencing technology has shown its potential in gaining accurate knowledge about the transcriptome, and may reveal novel genes affecting complex diseases. Integration of genomic and transcriptomic variation (expression quantitative trait loci [eQTL] mapping) has identified causal variants that affect complex diseases. We integrated transcriptomic data from adipose tissue and genomic data from a porcine model to investigate the mechanisms involved in obesity using a systems genetics approach. Using a selective gene expression profiling approach, we selected 36 animals based on a previously created genomic Obesity Index for RNA sequencing of subcutaneous adipose tissue. Differential expression analysis was performed using the Obesity Index as a continuous variable in a linear model. eQTL mapping was then performed to integrate 60 K porcine SNP chip data with the RNA sequencing data. Results were restricted based on genome-wide significant single nucleotide polymorphisms, detected differentially expressed genes, and previously detected co-expressed gene modules. Further data integration was performed by detecting co-expression patterns among eQTLs and integration with protein data. Differential expression analysis of RNA sequencing data revealed 458 differentially expressed genes. The eQTL mapping resulted in 987 cis-eQTLs and 73 trans-eQTLs (false discovery rate < 0.05), of which the cis-eQTLs were associated with metabolic pathways. We reduced the eQTL search space by focusing on differentially expressed and co-expressed genes and disease-associated single nucleotide polymorphisms to detect obesity-related genes and pathways. Building a co-expression network using eQTLs resulted in the detection of a module strongly associated with lipid pathways. 
Furthermore, we detected several obesity candidate genes, for example, ENPP1, CTSL, and ABHD12B. To our knowledge, this is the first study to perform an integrated genomics and transcriptomics (eQTL) study using, and modeling, genomic and subcutaneous adipose tissue RNA sequencing data on obesity in a porcine model. We detected several pathways and potential causal genes for obesity. Further validation and investigation may reveal their exact function and association with obesity.
Non-additive Effects in Genomic Selection
Varona, Luis; Legarra, Andres; Toro, Miguel A.; Vitezica, Zulma G.
2018-01-01
In the last decade, genomic selection has become a standard in the genetic evaluation of livestock populations. However, most procedures for the implementation of genomic selection only consider the additive effects associated with SNP (Single Nucleotide Polymorphism) markers used to calculate the prediction of the breeding values of candidates for selection. Nevertheless, the availability of estimates of non-additive effects is of interest because: (i) they contribute to an increase in the accuracy of the prediction of breeding values and the genetic response; (ii) they allow the definition of mate allocation procedures between candidates for selection; and (iii) they can be used to enhance non-additive genetic variation through the definition of appropriate crossbreeding or purebred breeding schemes. This study presents a review of methods for the incorporation of non-additive genetic effects into genomic selection procedures and their potential applications in the prediction of future performance, mate allocation, crossbreeding, and purebred selection. The work concludes with a brief outline of some ideas for future lines of research that may help the standard inclusion of non-additive effects in genomic selection. PMID:29559995
Lo, Chiao-Ling; Lossie, Amy C; Liang, Tiebing; Liu, Yunlong; Xuei, Xiaoling; Lumeng, Lawrence; Zhou, Feng C; Muir, William M
2016-08-01
Investigations on the influence of nature vs. nurture on Alcoholism (Alcohol Use Disorder) in humans have yet to provide a clear view on potential genomic etiologies. To address this issue, we sequenced a replicated animal model system bidirectionally-selected for alcohol preference (AP). This model is uniquely suited to map genetic effects with high reproducibility and resolution. The origin of the rat lines (an 8-way cross) resulted in small haplotype blocks (HB) with a corresponding high level of resolution. We sequenced DNAs from 40 samples (10 per line of each replicate) to determine allele frequencies and HB. We achieved ~46X coverage per line and replicate. Excessive differentiation in the genomic architecture between lines, across replicates, termed signatures of selection (SS), were classified according to gene and region. We identified SS in 930 genes associated with AP. The majority (50%) of the SS were confined to single gene regions, the greatest numbers of which were in promoters (284) and intronic regions (169) with the least in exons (4), suggesting that differences in AP were primarily due to alterations in regulatory regions. We confirmed previously identified genes and found many new genes associated with AP. Of those newly identified genes, several demonstrated neuronal function involved in synaptic memory and reward behavior, e.g. ion channels (Kcnf1, Kcnn3, Scn5a), excitatory receptors (Grin2a, Gria3, Grip1), neurotransmitters (Pomc), and synapses (Snap29). This study not only reveals the polygenic architecture of AP, but also emphasizes the importance of regulatory elements, consistent with other complex traits.
Coi, A L; Bigey, F; Mallet, S; Marsit, S; Zara, G; Gladieux, P; Galeote, V; Budroni, M; Dequin, S; Legras, J L
2017-04-01
The molecular and evolutionary processes underlying fungal domestication remain largely unknown despite the importance of fungi to bioindustry and for comparative adaptation genomics in eukaryotes. Wine fermentation and biological ageing are performed by strains of S. cerevisiae with, respectively, pelagic fermentative growth on glucose and biofilm aerobic growth utilizing ethanol. Here, we use environmental samples of wine and flor yeasts to investigate the genomic basis of yeast adaptation to contrasted anthropogenic environments. Phylogenetic inference and population structure analysis based on single nucleotide polymorphisms revealed a group of flor yeasts separated from wine yeasts. A combination of methods revealed several highly differentiated regions between wine and flor yeasts, and analyses using codon-substitution models for detecting molecular adaptation identified sites under positive selection in the high-affinity transporter gene ZRT1. The cross-population composite likelihood ratio revealed selective sweeps at three regions, including in the hexose transporter gene HXT7, the yapsin gene YPS6 and the membrane protein coding gene MTS27. Our analyses also revealed that the biological ageing environment has led to the accumulation of numerous mutations in proteins from several networks, including Flo11 regulation and divalent metal transport. Together, our findings suggest that the tuning of FLO11 expression and zinc transport networks are a distinctive feature of the genetic changes underlying the domestication of flor yeasts. Our study highlights the multiplicity of genomic changes underlying yeast adaptation to man-made habitats and reveals that flor/wine yeast lineage can serve as a useful model for studying the genomics of adaptive divergence. © 2017 John Wiley & Sons Ltd.
SD-MSAEs: Promoter recognition in human genome based on deep feature extraction.
Xu, Wenxuan; Zhang, Li; Lu, Yaping
2016-06-01
The prediction and recognition of promoters in the human genome play an important role in DNA sequence analysis. Entropy, in the Shannon sense of information theory, is a versatile tool in bioinformatic analysis. Relative entropy estimator methods based on statistical divergence (SD) are used to extract meaningful features that distinguish different regions of DNA sequences. In this paper, we choose context features and use a set of SD methods to select the most effective n-mers for distinguishing promoter regions from other DNA regions in the human genome. From the total possible combinations of n-mers, we obtain four sparse distributions based on promoter and non-promoter training samples. The informative n-mers are selected by optimizing the degree to which these distributions differ. Specifically, we combine the advantages of statistical divergence and multiple sparse auto-encoders (MSAEs) in deep learning to extract deep features for promoter recognition. We then apply multiple SVMs and a decision model to construct a human promoter recognition method called SD-MSAEs. The framework is flexible in that it can freely integrate new feature extraction or classification models. Experimental results show that our method has high sensitivity and specificity. Copyright © 2016 Elsevier Inc. All rights reserved.
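As a rough illustration of divergence-based n-mer selection, the sketch below ranks n-mers by their contribution to the Kullback-Leibler (relative entropy) divergence between a promoter set and a background set. It is not the SD-MSAEs method itself: the function names, the pseudocount smoothing, and the choice of KL divergence as the single statistical divergence are all assumptions made for the example, and the auto-encoder and SVM stages are not shown.

```python
from collections import Counter
from itertools import product
from math import log2


def kmer_distribution(seqs, k, pseudocount=1.0):
    """Return a smoothed probability for every possible DNA k-mer."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Only A/C/G/T k-mers enter the total; the pseudocount avoids zero probabilities.
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}


def kl_contributions(p, q):
    """Per-k-mer terms p * log2(p / q) of the divergence D(p || q)."""
    return {m: p[m] * log2(p[m] / q[m]) for m in p}


def top_kmers(promoters, background, k=2, n=3):
    """Rank k-mers by how much they separate promoters from background."""
    p = kmer_distribution(promoters, k)
    q = kmer_distribution(background, k)
    terms = kl_contributions(p, q)
    ranked = sorted(terms, key=terms.get, reverse=True)
    return ranked[:n], sum(terms.values())


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    best, divergence = top_kmers(["TATAAATATA", "GTATAAAG"],
                                 ["GCGCGCGC", "CCGGCCGG"])
    print(best, round(divergence, 3))
```

The n-mers with the largest positive terms are the ones over-represented in the promoter set relative to the background, which is the sense in which they are "informative" here.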
Comparative analysis and visualization of multiple collinear genomes
2012-01-01
Background Genome browsers are a common tool used by biologists to visualize genomic features including genes, polymorphisms, and many others. However, existing genome browsers and visualization tools are not well-suited to perform meaningful comparative analysis among a large number of genomes. With the increasing quantity and availability of genomic data, there is an increased burden to provide useful visualization and analysis tools for comparison of multiple collinear genomes such as the large panels of model organisms which are the basis for much of the current genetic research. Results We have developed a novel web-based tool for visualizing and analyzing multiple collinear genomes. Our tool illustrates genome-sequence similarity through a mosaic of intervals representing local phylogeny, subspecific origin, and haplotype identity. Comparative analysis is facilitated through reordering and clustering of tracks, which can vary throughout the genome. In addition, we provide local phylogenetic trees as an alternate visualization to assess local variations. Conclusions Unlike previous genome browsers and viewers, ours allows for simultaneous and comparative analysis. Our browser provides intuitive selection and interactive navigation about features of interest. Dynamic visualizations adjust to scale and data content making analysis at variable resolutions and of multiple data sets more informative. We demonstrate our genome browser for an extensive set of genomic data sets composed of almost 200 distinct mouse laboratory strains. PMID:22536897
Park, Seongjun; Ruhlman, Tracey A; Weng, Mao-Lun; Hajrah, Nahid H; Sabir, Jamal S M; Jansen, Robert K
2017-06-01
Geraniaceae have emerged as a model system for investigating the causes and consequences of variation in plastid and mitochondrial genomes. Incredible structural variation in plastid genomes (plastomes) and highly accelerated evolutionary rates have been reported in selected lineages and functional groups of genes in both plastomes and mitochondrial genomes (mitogenomes), and these phenomena have been implicated in cytonuclear incompatibility. Previous organelle genome studies have included limited sampling of Geranium, the largest genus in the family with over 400 species. This study reports on rates and patterns of nucleotide substitutions in plastomes and mitogenomes of 17 species of Geranium and representatives of other Geraniaceae. As detected across other angiosperms, substitution rates in the plastome are 3.5 times higher than the mitogenome in most Geranium. However, in the branch leading to Geranium brycei/Geranium incanum mitochondrial genes experienced significantly higher dN and dS than plastid genes, a pattern that has only been detected in one other angiosperm. Furthermore, rate accelerations differ in the two organelle genomes with plastomes having increased dN and mitogenomes with increased dS. In the Geranium phaeum/Geranium reflexum clade, duplicate copies of clpP and rpoA genes that experienced asymmetric rate divergence were detected in the single copy region of the plastome. In the case of rpoA, the branch leading to G. phaeum/G. reflexum experienced positive selection or relaxation of purifying selection. Finally, the evolution of acetyl-CoA carboxylase is unusual in Geraniaceae because it is only the second angiosperm family where both prokaryotic and eukaryotic ACCases functionally coexist in the plastid. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Park, Seongjun; Ruhlman, Tracey A.; Weng, Mao-Lun; Hajrah, Nahid H.; Sabir, Jamal S.M.
2017-01-01
Abstract Geraniaceae have emerged as a model system for investigating the causes and consequences of variation in plastid and mitochondrial genomes. Incredible structural variation in plastid genomes (plastomes) and highly accelerated evolutionary rates have been reported in selected lineages and functional groups of genes in both plastomes and mitochondrial genomes (mitogenomes), and these phenomena have been implicated in cytonuclear incompatibility. Previous organelle genome studies have included limited sampling of Geranium, the largest genus in the family with over 400 species. This study reports on rates and patterns of nucleotide substitutions in plastomes and mitogenomes of 17 species of Geranium and representatives of other Geraniaceae. As detected across other angiosperms, substitution rates in the plastome are 3.5 times higher than the mitogenome in most Geranium. However, in the branch leading to Geranium brycei/Geranium incanum mitochondrial genes experienced significantly higher dN and dS than plastid genes, a pattern that has only been detected in one other angiosperm. Furthermore, rate accelerations differ in the two organelle genomes with plastomes having increased dN and mitogenomes with increased dS. In the Geranium phaeum/Geranium reflexum clade, duplicate copies of clpP and rpoA genes that experienced asymmetric rate divergence were detected in the single copy region of the plastome. In the case of rpoA, the branch leading to G. phaeum/G. reflexum experienced positive selection or relaxation of purifying selection. Finally, the evolution of acetyl-CoA carboxylase is unusual in Geraniaceae because it is only the second angiosperm family where both prokaryotic and eukaryotic ACCases functionally coexist in the plastid. PMID:28854633
Hornett, Emily A; Moran, Bruce; Reynolds, Louise A; Charlat, Sylvain; Tazzyman, Samuel; Wedell, Nina; Jiggins, Chris D; Hurst, Greg D D
2014-12-01
Symbionts that distort their host's sex ratio by favouring the production and survival of females are common in arthropods. Their presence produces intense Fisherian selection to return the sex ratio to parity, typified by the rapid spread of host 'suppressor' loci that restore male survival/development. In this study, we investigated the genomic impact of a selective event of this kind in the butterfly Hypolimnas bolina. Through linkage mapping, we first identified a genomic region that was necessary for males to survive Wolbachia-induced male-killing. We then investigated the genomic impact of the rapid spread of suppression, which converted the Samoan population of this butterfly from a 100:1 female-biased sex ratio in 2001 to a 1:1 sex ratio by 2006. Models of this process revealed the potential for a chromosome-wide effect. To measure the impact of this episode of selection directly, the pattern of genetic variation before and after the spread of suppression was compared. Changes in allele frequencies were observed over a 25 cM region surrounding the suppressor locus, with a reduction in overall diversity observed at loci that co-segregate with the suppressor. These changes exceeded those expected from drift and occurred alongside the generation of linkage disequilibrium. The presence of novel allelic variants in 2006 suggests that the suppressor was likely to have been introduced via immigration rather than through de novo mutation. In addition, further sampling in 2010 indicated that many of the introduced variants were lost or had declined in frequency since 2006. We hypothesize that this loss may have resulted from a period of purifying selection, removing deleterious material that introgressed during the initial sweep. Our observations of the impact of suppression of sex ratio distorting activity reveal a very wide genomic imprint, reflecting its status as one of the strongest selective forces in nature.
2013-01-01
Background Artificial selection played an important role in the origin of modern Glycine max cultivars from the wild soybean Glycine soja. To elucidate the consequences of artificial selection accompanying the domestication and modern improvement of soybean, 25 new and 30 published whole-genome re-sequencing accessions, which represent wild, domesticated landrace, and Chinese elite soybean populations were analyzed. Results A total of 5,102,244 single nucleotide polymorphisms (SNPs) and 707,969 insertion/deletions were identified. Among the SNPs detected, 25.5% were not described previously. We found that artificial selection during domestication led to more pronounced reduction in the genetic diversity of soybean than the switch from landraces to elite cultivars. Only a small proportion (2.99%) of the whole genomic regions appear to be affected by artificial selection for preferred agricultural traits. The selection regions were not distributed randomly or uniformly throughout the genome. Instead, clusters of selection hotspots in certain genomic regions were observed. Moreover, a set of candidate genes (4.38% of the total annotated genes) significantly affected by selection underlying soybean domestication and genetic improvement were identified. Conclusions Given the uniqueness of the soybean germplasm sequenced, this study drew a clear picture of human-mediated evolution of the soybean genomes. The genomic resources and information provided by this study would also facilitate the discovery of genes/loci underlying agronomically important traits. PMID:23984715
Yoshizumi, Takeshi; Oikawa, Kazusato; Chuah, Jo-Ann; Kodama, Yutaka; Numata, Keiji
2018-05-14
Selective gene delivery into organellar genomes (mitochondrial and plastid genomes) has been limited because of a lack of appropriate platform technology, even though these organelles are essential for metabolite and energy production. Techniques for selective organellar modification are needed to functionally improve organelles and produce transplastomic/transmitochondrial plants. However, no method for mitochondrial genome modification has yet been established for multicellular organisms including plants. Likewise, modification of plastid genomes has been limited to a few plant species and algae. In the present study, we developed ionic complexes of fusion peptides containing organellar targeting signal and plasmid DNA for selective delivery of exogenous DNA into the plastid and mitochondrial genomes of intact plants. This is the first report of exogenous DNA being integrated into the mitochondrial genomes of not only plants, but also multicellular organisms in general. This fusion peptide-mediated gene delivery system is a breakthrough platform for both plant organellar biotechnology and gene therapy for mitochondrial diseases in animals.
DOE Office of Scientific and Technical Information (OSTI.GOV)
McLoughlin, K.
2016-01-11
The overall aim of this project is to develop a software package, called MetaQuant, that can determine the constituents of a complex microbial sample and estimate their relative abundances by analysis of metagenomic sequencing data. The goal for Task 1 is to create a generative model describing the stochastic process underlying the creation of sequence read pairs in the data set. The stages in this generative process include the selection of a source genome sequence for each read pair, with probability dependent on its abundance in the sample. The other stages describe the evolution of the source genome from its nearest common ancestor with a reference genome, breakage of the source DNA into short fragments, and the errors in sequencing the ends of the fragments to produce read pairs.
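The generative stages described above can be sketched as a small simulation. This is a minimal toy under stated assumptions, not MetaQuant code: the function names, fixed fragment and read lengths, and the uniform substitution error model are hypothetical, and the stage describing evolution of the source genome away from a reference is omitted.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def add_errors(read, error_rate, rng):
    """Substitute each base with a different base with probability error_rate."""
    out = []
    for base in read:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def sample_read_pair(genomes, abundances, rng,
                     frag_len=12, read_len=5, error_rate=0.01):
    """Draw one read pair: pick a genome, cut a fragment, read both ends."""
    names = list(genomes)
    # Stage 1: source genome chosen with probability proportional to abundance.
    source = rng.choices(names, weights=[abundances[n] for n in names], k=1)[0]
    genome = genomes[source]
    # Stage 2: breakage into a short fragment at a uniformly random position.
    start = rng.randrange(len(genome) - frag_len + 1)
    fragment = genome[start:start + frag_len]
    # Stage 3: sequence both fragment ends, with errors.
    read1 = add_errors(fragment[:read_len], error_rate, rng)
    read2 = add_errors(reverse_complement(fragment)[:read_len], error_rate, rng)
    return source, read1, read2


if __name__ == "__main__":
    rng = random.Random(0)
    genomes = {"g1": "ACGTACGTACGTACGTACGT", "g2": "TTTTTTTTTTGGGGGGGGGG"}
    print(sample_read_pair(genomes, {"g1": 0.8, "g2": 0.2}, rng))
```

Estimating abundances would then amount to inverting this process: finding the abundance vector under which the observed read pairs are most probable.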
van Geest, Geert; Voorrips, Roeland E; Esselink, Danny; Post, Aike; Visser, Richard Gf; Arens, Paul
2017-08-07
Cultivated chrysanthemum is an outcrossing hexaploid (2n = 6× = 54) with a disputed mode of inheritance. In this paper, we present a single nucleotide polymorphism (SNP) selection pipeline that was used to design an Affymetrix Axiom array with 183 k SNPs from RNA sequencing data (1). With this array, we genotyped four bi-parental populations (with sizes of 405, 53, 76 and 37 offspring plants respectively), and a cultivar panel of 63 genotypes. Further, we present a method for dosage scoring in hexaploids from signal intensities of the array based on mixture models (2) and validation of selection steps in the SNP selection pipeline (3). The resulting genotypic data is used to draw conclusions on the mode of inheritance in chrysanthemum (4), and to make an inference on allelic expression bias (5). With use of the mixture model approach, we successfully called the dosage of 73,936 out of 183,130 SNPs (40.4%) that segregated in any of the bi-parental populations. To investigate the mode of inheritance, we analysed markers that segregated in the large bi-parental population (n = 405). Analysis of segregation of duplex x nulliplex SNPs resulted in evidence for genome-wide hexasomic inheritance. This evidence was substantiated by the absence of strong linkage between markers in repulsion, which indicated absence of full disomic inheritance. We present the success rate of SNP discovery out of RNA sequencing data as affected by different selection steps, among which SNP coverage over genotypes and use of different types of sequence read mapping software. Genomic dosage highly correlated with relative allele coverage from the RNA sequencing data, indicating that most alleles are expressed according to their genomic dosage. The large population, genotyped with a very large number of markers, is a unique framework for extensive genetic analyses in hexaploid chrysanthemum. As starting point, we show conclusive evidence for genome-wide hexasomic inheritance.
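To see why duplex x nulliplex markers are informative about the mode of inheritance, the expected offspring dosage ratios can be worked out directly. The sketch below is an illustration rather than the authors' analysis: it assumes random sampling of three of the six homologues without double reduction for the hexasomic case, and strict pairing within three fixed homologous pairs for the disomic case. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def hexasomic_gametes(dosage, ploidy=6):
    """Gamete dosage distribution when ploidy/2 homologues are drawn at random."""
    n = ploidy // 2
    total = comb(ploidy, n)
    dist = {}
    for d in range(n + 1):
        # Ways to pick d marker-carrying and n - d non-carrying homologues.
        ways = comb(dosage, d) * comb(ploidy - dosage, n - d)
        if ways:
            dist[d] = Fraction(ways, total)
    return dist


def disomic_gametes(pair_dosages):
    """Gamete dosage distribution when each fixed pair contributes one homologue.

    pair_dosages gives the marker allele count (0, 1 or 2) on each pair.
    """
    dist = {0: Fraction(1)}
    for pd in pair_dosages:
        per_pair = {0: Fraction(2 - pd, 2), 1: Fraction(pd, 2)}
        new = {}
        for d, p in dist.items():
            for e, q in per_pair.items():
                if q:
                    new[d + e] = new.get(d + e, 0) + p * q
        dist = new
    return dist


if __name__ == "__main__":
    # The nulliplex parent contributes no marker alleles, so offspring
    # dosage equals the dosage of the gamete from the duplex parent.
    print("hexasomic:", hexasomic_gametes(2))
    print("disomic, alleles on different pairs:", disomic_gametes((1, 1, 0)))
    print("disomic, alleles on the same pair:", disomic_gametes((2, 0, 0)))
```

Under these assumptions the hexasomic expectation is 1:3:1 (nulliplex:simplex:duplex), while disomic inheritance gives either 1:2:1 or all simplex depending on where the two alleles sit, so observed segregation ratios can help discriminate between the models.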
Genome-Wide Association Analysis of Adaptation Using Environmentally Predicted Traits
van Zanten, Martijn
2015-01-01
Current methods for studying the genetic basis of adaptation evaluate genetic associations with ecologically relevant traits or single environmental variables, under the implicit assumption that natural selection imposes correlations between phenotypes, environments and genotypes. In practice, observed trait and environmental data are manifestations of unknown selective forces and are only indirectly associated with adaptive genetic variation. In theory, improved estimation of these forces could enable more powerful detection of loci under selection. Here we present an approach in which we approximate adaptive variation by modeling phenotypes as a function of the environment and using the predicted trait in multivariate and univariate genome-wide association analysis (GWAS). Based on computer simulations and published flowering time data from the model plant Arabidopsis thaliana, we find that environmentally predicted traits lead to higher recovery of functional loci in multivariate GWAS and are more strongly correlated to allele frequencies at adaptive loci than individual environmental variables. Our results provide an example of the use of environmental data to obtain independent and meaningful information on adaptive genetic variation. PMID:26496492
Evolution and the complexity of bacteriophages.
Serwer, Philip
2007-03-13
The genomes of both long-genome (> 200 Kb) bacteriophages and long-genome eukaryotic viruses have cellular gene homologs whose selective advantage is not explained. These homologs add genomic and possibly biochemical complexity. Understanding their significance requires a definition of complexity that is more biochemically oriented than past empirically based definitions. Initially, I propose two biochemistry-oriented definitions of complexity: either decreased randomness or increased encoded information that does not serve immediate needs. Then, I make the assumption that these two definitions are equivalent. This assumption and recent data lead to the following four-part hypothesis that explains the presence of cellular gene homologs in long bacteriophage genomes and also provides a pathway for complexity increases in prokaryotic cells: (1) Prokaryotes underwent evolutionary increases in biochemical complexity after the eukaryote/prokaryote splits. (2) Some of the complexity increases occurred via multi-step, weak selection that was both protected from strong selection and accelerated by embedding evolving cellular genes in the genomes of bacteriophages and, presumably, also archaeal viruses (first tier selection). (3) The mechanisms for retaining cellular genes in viral genomes evolved under additional, longer-term selection that was stronger (second tier selection). (4) The second tier selection was based on increased access by prokaryotic cells to improved biochemical systems. This access was achieved when DNA transfer moved to prokaryotic cells both the more evolved genes and their more competitive and complex biochemical systems. 
I propose testing this hypothesis by controlled evolution in microbial communities to (1) determine the effects of deleting individual cellular gene homologs on the growth and evolution of long genome bacteriophages and hosts, (2) find the environmental conditions that select for the presence of cellular gene homologs, (3) determine which, if any, bacteriophage genes were selected for maintaining the homologs and (4) determine the dynamics of homolog evolution. This hypothesis is an explanation of evolutionary leaps in general. If accurate, it will assist both understanding and influencing the evolution of microbes and their communities. Analysis of evolutionary complexity increase for at least prokaryotes should include analysis of genomes of long-genome bacteriophages.
Li, Q G; Wadell, G
1988-01-01
Restriction endonucleases BamHI, BclI, BglI, BglII, BstEII, EcoRI, HindIII, HpaI, SalI, SmaI, XbaI, and XhoI were used to analyze 61 selected strains of adenovirus type 3 (Ad3) isolated from Africa, Asia, Australia, Europe, North America, and South America. It was noted that the use of BamHI, BclI, BglII, HpaI, SalI, and SmaI was sufficient to distinguish 17 genome types; 13 of them were newly identified. All 17 Ad3 genome types could be divided into three genomic clusters. Genome types of Ad3 cluster 1 occurred in Africa, Europe, South America, and North America. Genomic cluster 2 was identified in Africa; genomic cluster 3 was identified in Africa, Asia, Australia, Europe (a few), and North America. This was of interest because 15 identified genome types of Ad7 could also be divided into three genomic clusters. The degree of genetic relatedness between the 17 Ad3 and the 15 Ad7 genome types was analyzed and was expressed in a three-dimensional model. PMID:2838500
Dynamics of Genome Rearrangement in Bacterial Populations
Darling, Aaron E.; Miklós, István; Ragan, Mark A.
2008-01-01
Genome structure variation has profound impacts on phenotype in organisms ranging from microbes to humans, yet little is known about how natural selection acts on genome arrangement. Pathogenic bacteria such as Yersinia pestis, which causes bubonic and pneumonic plague, often exhibit a high degree of genomic rearrangement. The recent availability of several Yersinia genomes offers an unprecedented opportunity to study the evolution of genome structure and arrangement. We introduce a set of statistical methods to study patterns of rearrangement in circular chromosomes and apply them to the Yersinia. We constructed a multiple alignment of eight Yersinia genomes using Mauve software to identify 78 conserved segments that are internally free from genome rearrangement. Based on the alignment, we applied Bayesian statistical methods to infer the phylogenetic inversion history of Yersinia. The sampling of genome arrangement reconstructions contains seven parsimonious tree topologies, each having different histories of 79 inversions. Topologies with a greater number of inversions also exist, but were sampled less frequently. The inversion phylogenies agree with results suggested by SNP patterns. We then analyzed reconstructed inversion histories to identify patterns of rearrangement. We confirm an over-representation of “symmetric inversions”—inversions with endpoints that are equally distant from the origin of chromosomal replication. Ancestral genome arrangements demonstrate moderate preference for replichore balance in Yersinia. We found that all inversions are shorter than expected under a neutral model, whereas inversions acting within a single replichore are much shorter than expected. We also found evidence for a canonical configuration of the origin and terminus of replication. Finally, breakpoint reuse analysis reveals that inversions with endpoints proximal to the origin of DNA replication are nearly three times more frequent. 
Our findings represent the first characterization of genome arrangement evolution in a bacterial population evolving outside laboratory conditions. Insight into the process of genomic rearrangement may further the understanding of pathogen population dynamics and selection on the architecture of circular bacterial chromosomes. PMID:18650965
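The notion of a symmetric inversion, with endpoints equally distant from the origin of replication, can be made concrete on a circular coordinate system. The sketch below is only an illustration: the function names and the tolerance threshold are assumptions and are not taken from the study's statistical methods.

```python
def circular_distance(pos, origin, genome_len):
    """Shortest distance between two positions on a circular chromosome."""
    d = abs(pos - origin) % genome_len
    return min(d, genome_len - d)


def inversion_asymmetry(left, right, origin, genome_len):
    """Difference between the two endpoints' distances from the origin."""
    return abs(circular_distance(left, origin, genome_len)
               - circular_distance(right, origin, genome_len))


def is_symmetric(left, right, origin, genome_len, tolerance=0.05):
    """Call an inversion symmetric if the asymmetry is a small genome fraction.

    The 5% default tolerance is an arbitrary choice for this example.
    """
    return inversion_asymmetry(left, right, origin, genome_len) <= tolerance * genome_len


if __name__ == "__main__":
    genome_len = 1000
    print(is_symmetric(100, 900, origin=0, genome_len=genome_len))
    print(is_symmetric(100, 400, origin=0, genome_len=genome_len))
```

An inversion whose endpoints lie at equal distances on either side of the origin keeps both replichores the same length, which is one reason such inversions are expected to be better tolerated.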
Wu, Xiao-Lin; Sun, Chuanyu; Beissinger, Timothy M; Rosa, Guilherme Jm; Weigel, Kent A; Gatti, Natalia de Leon; Gianola, Daniel
2012-09-25
Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from series computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Parallel Monte Carlo Markov chain algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, yet some variants are discussed as well. Features and strategies of the parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which does not only lead to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs.
2012-01-01
Background Most Bayesian models for the analysis of complex traits are not analytically tractable and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from series computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Results Parallel Monte Carlo Markov chain algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, yet some variants are discussed as well. Features and strategies of the parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Conclusions Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which does not only lead to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs. PMID:23009363
Genomic Signature of Kin Selection in an Ant with Obligately Sterile Workers
Warner, Michael R.; Mikheyev, Alexander S.
2017-01-01
Abstract Kin selection is thought to drive the evolution of cooperation and conflict, but the specific genes and genome-wide patterns shaped by kin selection are unknown. We identified thousands of genes associated with the sterile ant worker caste, the archetype of an altruistic phenotype shaped by kin selection, and then used population and comparative genomic approaches to study patterns of molecular evolution at these genes. Consistent with population genetic theoretical predictions, worker-upregulated genes experienced reduced selection compared with genes upregulated in reproductive castes. Worker-upregulated genes included more taxonomically restricted genes, indicating that the worker caste has recruited more novel genes, yet these genes also experienced reduced selection. Our study identifies a putative genomic signature of kin selection and helps to integrate emerging sociogenomic data with longstanding social evolution theory. PMID:28419349
A decade of pig genome sequencing: a window on pig domestication and evolution.
Groenen, Martien A M
2016-03-29
Insight into how genomes change and adapt due to selection addresses key questions in evolutionary biology and in domestication of animals and plants by humans. In that regard, the pig and its close relatives found in Africa and Eurasia represent an excellent group of species that enables studies of the effect of both natural and human-mediated selection on the genome. The recent completion of the draft genome sequence of a domestic pig and the development of next-generation sequencing technology during the past decade have created unprecedented possibilities to address these questions in great detail. In this paper, I review recent whole-genome sequencing studies in the pig and closely-related species that provide insight into the demography, admixture and selection of these species and, in particular, how domestication and subsequent selection of Sus scrofa have shaped the genomes of these animals.
Zou, Meng; Liu, Zhaoqi; Zhang, Xiang-Sun; Wang, Yong
2015-10-15
In prognosis and survival studies, an important goal is to identify multi-biomarker panels with predictive power using molecular characteristics or clinical observations. Such analysis is often challenged by censored, small-sample-size, but high-dimensional genomic profiles or clinical data. Therefore, sophisticated models and algorithms are urgently needed. In this study, we propose a novel Area Under Curve (AUC) optimization method for multi-biomarker panel identification named Nearest Centroid Classifier for AUC optimization (NCC-AUC). Our method is motivated by the connection between the AUC score for classification accuracy evaluation and Harrell's concordance index in survival analysis. This connection allows us to convert the survival time regression problem to a binary classification problem. Then an optimization model is formulated to directly maximize AUC and meanwhile minimize the number of selected features to construct a predictor in the nearest centroid classifier framework. NCC-AUC performs well when validated on both genomic data of breast cancer and clinical data of stage IB Non-Small-Cell Lung Cancer (NSCLC). For the genomic data, NCC-AUC outperforms Support Vector Machine (SVM) and Support Vector Machine-based Recursive Feature Elimination (SVM-RFE) in classification accuracy. It tends to select a multi-biomarker panel with low average redundancy and enriched biological meanings. NCC-AUC also separates low- and high-risk cohorts more significantly than the widely used Cox model (Cox proportional-hazards regression model) and the L1-Cox model (L1-penalized Cox model). These performance gains of NCC-AUC are quite robust across 5 subtypes of breast cancer. Further, on an independent clinical data set, NCC-AUC outperforms SVM and SVM-RFE in predictive accuracy and is consistently better than the Cox model and the L1-Cox model in grouping patients into high and low risk categories.
In summary, NCC-AUC provides a rigorous optimization framework to systematically reveal multi-biomarker panels from genomic and clinical data. It can serve as a useful tool to identify prognostic biomarkers for survival analysis. NCC-AUC is available at http://doc.aporc.org/wiki/NCC-AUC. Contact: ywang@amss.ac.cn. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
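The connection between AUC and Harrell's concordance index that this abstract relies on can be made concrete with a small sketch. The code below is not NCC-AUC itself; the function names and the toy data are assumptions chosen for illustration. It computes the concordance index over comparable pairs (the subject with the shorter time must have an observed event) and shows that, when survival times take only two values and nothing is censored, the index equals the pairwise AUC with the early-event group as the positive class.

```python
def concordance_index(times, events, risk):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when subject i has an observed event and
    a strictly shorter time than subject j. Ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC for binary labels, positives labelled 1."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumption): two early events, two late events, no censoring.
risk = [0.8, 0.3, 0.5, 0.1]
c_two_groups = concordance_index([1, 1, 2, 2], [1, 1, 1, 1], risk)
auc_two_groups = auc([1, 1, 0, 0], risk)
print(c_two_groups, auc_two_groups)
```

With censoring, pairs whose earlier member is censored simply drop out of the denominator, which is why the index remains usable on censored survival data.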
Stephan, Wolfgang
2016-01-01
In the past 15 years, numerous methods have been developed to detect selective sweeps underlying adaptations. These methods are based on relatively simple population genetic models, including one or two loci at which positive directional selection occurs, and one or two marker loci at which the impact of selection on linked neutral variation is quantified. Information about the phenotype under selection is not included in these models (except for fitness). In contrast, in the quantitative genetic models of adaptation, selection acts on one or more phenotypic traits, such that a genotype-phenotype map is required to bridge the gap to population genetics theory. Here I describe the range of population genetic models from selective sweeps in a panmictic population of constant size to evolutionary traffic when simultaneous sweeps at multiple loci interfere, and I also consider the case of polygenic selection characterized by subtle allele frequency shifts at many loci. Furthermore, I present an overview of the statistical tests that have been proposed based on these population genetics models to detect evidence for positive selection in the genome. © 2015 John Wiley & Sons Ltd.
Tollis, Marc; Hutchins, Elizabeth D; Stapley, Jessica; Rupp, Shawn M; Eckalbar, Walter L; Maayan, Inbar; Lasku, Eris; Infante, Carlos R; Dennis, Stuart R; Robertson, Joel A; May, Catherine M; Bermingham, Eldredge; DeNardo, Dale F; Hsieh, Shi-Tong Tonia; Kulathinal, Rob J; McMillan, William Owen; Menke, Douglas B; Pratt, Stephen C; Rawls, Jeffery Alan; Sanjur, Oris; Wilson-Rawls, Jeanne; Wilson Sayres, Melissa A; Fisher, Rebecca E
2018-01-01
Abstract Squamates include all lizards and snakes, and display some of the most diverse and extreme morphological adaptations among vertebrates. However, compared with birds and mammals, relatively few resources exist for comparative genomic analyses of squamates, hampering efforts to understand the molecular bases of phenotypic diversification in such a speciose clade. In particular, the ∼400 species of anole lizard represent an extensive squamate radiation. Here, we sequence and assemble the draft genomes of three anole species—Anolis frenatus, Anolis auratus, and Anolis apletophallus—for comparison with the available reference genome of Anolis carolinensis. Comparative analyses reveal a rapid background rate of molecular evolution consistent with a model of punctuated equilibrium, and strong purifying selection on functional genomic elements in anoles. We find evidence for accelerated evolution in genes involved in behavior, sensory perception, and reproduction, as well as in genes regulating limb bud development and hindlimb specification. Morphometric analyses of anole fore and hindlimbs corroborated these findings. We detect signatures of positive selection across several genes related to the development and regulation of the forebrain, hormones, and the iguanian lizard dewlap, suggesting molecular changes underlying behavioral adaptations known to reinforce species boundaries were a key component in the diversification of anole lizards. PMID:29360978
NASA Astrophysics Data System (ADS)
Hei, T. K.; Piao, C. Q.; Wu, L. J.; Willey, J. C.; Hall, E. J.
1998-11-01
Carcinogenesis is postulated to be a progressive multistage process characterized by an increase in genomic instability and clonal selection with each mutational event endowing a selective growth advantage. Genomic instability as manifested by the amplification of specific gene fragments is common among tumor and transformed cells. In the present study, immortalized human bronchial (BEP2D) cells were irradiated with graded doses of either 1 GeV/nucleon 56Fe ions or 150 keV/μm alpha particles. Transformed cells developed through a series of successive steps before becoming tumorigenic in nude mice. Tumorigenic cells showed neither ras mutations nor deletion in the p16 tumor suppressor gene. In contrast, they harbored mutations in the p53 gene and over-expressed cyclin D1. Genomic instability among transformed cells at various stages of the carcinogenic process was examined based on frequencies of PALA resistance. Incidence of genomic instability was highest among established tumor cell lines relative to transformed, non-tumorigenic and control cell lines. Treatment of BEP2D cells with a 4 mM dose of the aminothiol WR-1065 significantly reduced their neoplastic transforming response to 56Fe particles. This model provides an opportunity to study the cellular and molecular mechanisms involved in malignant transformation of human epithelial cells by heavy ions.
Mousel, Michelle R.; Reynolds, James O.; White, Stephen N.
2015-01-01
Entropion is an inward rolling of the eyelid allowing contact between the eyelashes and cornea that may lead to blindness if not corrected. Although many mammalian species, including humans and dogs, are afflicted by congenital entropion, no specific genes or gene regions related to development of entropion have been reported in any mammalian species to date. Entropion in domestic sheep is known to have a genetic component; therefore, we used domestic sheep as a model system to identify genomic regions containing genes associated with entropion. A genome-wide association was conducted with congenital entropion in 998 Columbia, Polypay, and Rambouillet sheep genotyped with 50,000 SNP markers. Prevalence of entropion was 6.01%, with all breeds represented. Logistic regression was performed in PLINK with additive allelic, recessive, dominant, and genotypic inheritance models. Two genome-wide significant (empirical P<0.05) SNP were identified, specifically markers in SLC2A9 (empirical P = 0.007; genotypic model) and near NLN (empirical P = 0.026; dominance model). Six additional genome-wide suggestive SNP (nominal P<1x10-5) were identified including markers in or near PIK3CB (P = 2.22x10-6; additive model), KCNB1 (P = 2.93x10-6; dominance model), ZC3H12C (P = 3.25x10-6; genotypic model), JPH1 (P = 4.68x10-6; genotypic model), and MYO3B (P = 5.74x10-6; recessive model). This is the first report of specific gene regions associated with congenital entropion in any mammalian species, to our knowledge. Further, none of these genes have previously been associated with any eyelid traits. These results represent the first genome-wide analysis of gene regions associated with entropion and provide target regions for the development of sheep genetic markers for marker-assisted selection. PMID:26098909
Mousel, Michelle R; Reynolds, James O; White, Stephen N
2015-01-01
Entropion is an inward rolling of the eyelid allowing contact between the eyelashes and cornea that may lead to blindness if not corrected. Although many mammalian species, including humans and dogs, are afflicted by congenital entropion, no specific genes or gene regions related to development of entropion have been reported in any mammalian species to date. Entropion in domestic sheep is known to have a genetic component; therefore, we used domestic sheep as a model system to identify genomic regions containing genes associated with entropion. A genome-wide association was conducted with congenital entropion in 998 Columbia, Polypay, and Rambouillet sheep genotyped with 50,000 SNP markers. Prevalence of entropion was 6.01%, with all breeds represented. Logistic regression was performed in PLINK with additive allelic, recessive, dominant, and genotypic inheritance models. Two genome-wide significant (empirical P<0.05) SNP were identified, specifically markers in SLC2A9 (empirical P = 0.007; genotypic model) and near NLN (empirical P = 0.026; dominance model). Six additional genome-wide suggestive SNP (nominal P<1x10(-5)) were identified including markers in or near PIK3CB (P = 2.22x10(-6); additive model), KCNB1 (P = 2.93x10(-6); dominance model), ZC3H12C (P = 3.25x10(-6); genotypic model), JPH1 (P = 4.68x10(-6); genotypic model), and MYO3B (P = 5.74x10(-6); recessive model). This is the first report of specific gene regions associated with congenital entropion in any mammalian species, to our knowledge. Further, none of these genes have previously been associated with any eyelid traits. These results represent the first genome-wide analysis of gene regions associated with entropion and provide target regions for the development of sheep genetic markers for marker-assisted selection.
Xu, Shuhua
2015-01-01
Noncoding DNA sequences (NCS) have attracted much attention recently due to their functional potentials. Here we attempted to reveal the functional roles of noncoding sequences from the point of view of natural selection that typically indicates the functional potentials of certain genomic elements. We analyzed nearly 37 million single nucleotide polymorphisms (SNPs) of Phase I data of the 1000 Genomes Project. We estimated a series of key parameters of population genetics and molecular evolution to characterize sequence variations of the noncoding genome within and between populations, and identified the natural selection footprints in NCS in worldwide human populations. Our results showed that purifying selection is prevalent and there is substantial constraint of variations in NCS, while positive selection is more likely to be specific to some particular genomic regions and regional populations. Intriguingly, we observed a larger fraction of non-conserved NCS variants with lower derived allele frequency in the genome, indicating possible functional gain of non-conserved NCS. Notably, NCS elements are enriched for potentially functional markers such as eQTLs, TF motifs, and DNase I footprints in the genome. More interestingly, some NCS variants associated with diseases such as Alzheimer's disease, Type 1 diabetes, and immune-related bowel disorder (IBD) showed signatures of positive selection, although the majority of NCS variants, reported as risk alleles by genome-wide association studies, showed signatures of negative selection. Our analyses provided compelling evidence of natural selection forces on noncoding sequences in the human genome and advanced our understanding of their functional potentials that play important roles in disease etiology and human evolution. PMID:26053627
Ecological genomics of natural plant populations: the Israeli perspective.
Nevo, Eviatar
2009-01-01
The genomic era revolutionized evolutionary population biology. The ecological genomics of the wild progenitors of wheat and barley reviewed here was central in the research program of the Institute of Evolution, University of Haifa, since 1975 ( http://evolution.haifa.ac.il ). We explored the following questions: (1) How much of the genomic and phenomic diversity of wild progenitors of cultivars (wild emmer wheat, Triticum dicoccoides, the progenitor of most wheat, plus wild relatives of the Aegilops species; wild barley, Hordeum spontaneum, the progenitor of cultivated barley; wild oat, Avena sterilis, the progenitor of cultivated oats; and wild lettuce species, Lactuca, the progenitor and relatives of cultivated lettuce) are adaptive and processed by natural selection at both coding and noncoding genomic regions? (2) What is the origin and evolution of genomic adaptation and speciation processes and their regulation by mutation, recombination, and transposons under spatiotemporal variables and stressful macrogeographic and microgeographic environments? (3) How much genetic resources are harbored in the wild progenitors for crop improvement? We advanced ecological genetics into ecological genomics and analyzed (regionally across Israel and the entire Near East Fertile Crescent and locally at microsites, focusing on the "Evolution Canyon" model) hundreds of populations and thousands of genotypes for protein (allozyme) and deoxyribonucleic acid (DNA) (coding and noncoding) diversity, partly combined with phenotypic diversity. The environmental stresses analyzed included abiotic (climatic and microclimatic, edaphic) and biotic (pathogens, demographic) stresses. Recently, we introduced genetic maps, cloning, and transformation of candidate genes. Our results indicate abundant genotypic and phenotypic diversity in natural plant populations. 
The organization and evolution of molecular and organismal diversity in plant populations, at all genomic regions and geographical scales, are nonrandom and are positively correlated with, and partly predictable by, abiotic and biotic environmental heterogeneity and stress. Biodiversity evolution, even in small isolated populations, is primarily driven by natural selection including diversifying, balancing, cyclical, and purifying selection regimes interacting with, but, ultimately, overriding the effects of mutation, migration, and stochasticity. The progenitors of cultivated plants harbor rich genetic resources and are the best hope for crop improvement by both classical and modern biotechnological methods. Future studies should focus on the interplay between structural and functional genome organization focusing on gene regulation.
Evaluation of redundancy analysis to identify signatures of local adaptation.
Capblancq, Thibaut; Luu, Keurcien; Blum, Michael G B; Bazin, Eric
2018-05-26
Ordination is a common tool in ecology that aims at representing complex biological information in a reduced space. In landscape genetics, ordination methods such as principal component analysis (PCA) have been used to detect adaptive variation based on genomic data. Taking advantage of environmental data in addition to genotype data, redundancy analysis (RDA) is another ordination approach that is useful to detect adaptive variation. This paper aims at proposing a test statistic based on RDA to search for loci under selection. We compare redundancy analysis to pcadapt, which is a nonconstrained ordination method, and to a latent factor mixed model (LFMM), which is a univariate genotype-environment association method. Individual-based simulations identify evolutionary scenarios where RDA genome scans have a greater statistical power than genome scans based on PCA. By constraining the analysis with environmental variables, RDA performs better than PCA in identifying adaptive variation when selection gradients are weakly correlated with population structure. Additionally, we show that if RDA and LFMM have a similar power to identify genetic markers associated with environmental variables, the RDA-based procedure has the advantage to identify the main selective gradients as a combination of environmental variables. To give a concrete illustration of RDA in population genomics, we apply this method to the detection of outliers and selective gradients on an SNP data set of Populus trichocarpa (Geraldes et al., 2013). The RDA-based approach identifies the main selective gradient contrasting southern and coastal populations to northern and continental populations in the northwestern American coast. This article is protected by copyright. All rights reserved.
Watanabe, Takahito; Noji, Sumihare; Mito, Taro
2016-01-01
Hemimetabolous, or incompletely metamorphosing, insects are phylogenetically basal. These insects include many deleterious species. The cricket, Gryllus bimaculatus, is an emerging model for hemimetabolous insects, based on the success of RNA interference (RNAi)-based gene-functional analyses and transgenic technology. Taking advantage of genome-editing technologies in this species would greatly promote functional genomics studies. Genome editing using transcription activator-like effector nucleases (TALENs) has proven to be an effective method for site-specific genome manipulation in various species. TALENs are artificial nucleases that are capable of inducing DNA double-strand breaks into specified target sequences. Here, we describe a protocol for TALEN-based gene knockout in G. bimaculatus, including a mutant selection scheme via mutation detection assays, for generating homozygous knockout organisms.
Conditions for the Evolution of Gene Clusters in Bacterial Genomes
Ballouz, Sara; Francis, Andrew R.; Lan, Ruiting; Tanaka, Mark M.
2010-01-01
Genes encoding proteins in a common pathway are often found near each other along bacterial chromosomes. Several explanations have been proposed to account for the evolution of these structures. For instance, natural selection may directly favour gene clusters through a variety of mechanisms, such as increased efficiency of coregulation. An alternative and controversial hypothesis is the selfish operon model, which asserts that clustered arrangements of genes are more easily transferred to other species, thus improving the prospects for survival of the cluster. According to another hypothesis (the persistence model), genes that are in close proximity are less likely to be disrupted by deletions. Here we develop computational models to study the conditions under which gene clusters can evolve and persist. First, we examine the selfish operon model by re-implementing the simulation and running it under a wide range of conditions. Second, we introduce and study a Moran process in which there is natural selection for gene clustering and rearrangement occurs by genome inversion events. Finally, we develop and study a model that includes selection and inversion, which tracks the occurrence and fixation of rearrangements. Surprisingly, gene clusters fail to evolve under a wide range of conditions. Factors that promote the evolution of gene clusters include a low number of genes in the pathway, a high population size, and in the case of the selfish operon model, a high horizontal transfer rate. The computational analysis here has shown that the evolution of gene clusters can occur under both direct and indirect selection as long as certain conditions hold. Under these conditions the selfish operon model is still viable as an explanation for the evolution of gene clusters. PMID:20168992
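For readers unfamiliar with the Moran process named in this abstract, the sketch below shows the basic birth-death update in its simplest two-type form. It is a generic textbook version under stated assumptions, not the authors' model: it has no genome inversions or gene order, only a single variant with relative fitness `r` in a population of size `n`, and all function names are hypothetical. The simulated fixation frequency can be compared with the standard closed-form fixation probability for this simple case.

```python
import random


def fixation_probability(n, r):
    """Closed-form fixation probability of one mutant with relative fitness r."""
    if r == 1.0:
        return 1.0 / n
    return (1.0 - 1.0 / r) / (1.0 - r ** (-n))


def simulate_fixation(n, r, rng):
    """Run one two-type Moran process from a single mutant until loss or fixation."""
    mutants = 1
    while 0 < mutants < n:
        # Birth: parent chosen in proportion to fitness.
        birth_is_mutant = rng.random() < r * mutants / (r * mutants + n - mutants)
        # Death: individual chosen uniformly at random.
        death_is_mutant = rng.random() < mutants / n
        if birth_is_mutant and not death_is_mutant:
            mutants += 1
        elif death_is_mutant and not birth_is_mutant:
            mutants -= 1
    return mutants == n


def estimate_fixation(n, r, trials, seed=0):
    """Monte Carlo estimate of the fixation probability."""
    rng = random.Random(seed)
    fixed = sum(simulate_fixation(n, r, rng) for _ in range(trials))
    return fixed / trials


estimate = estimate_fixation(n=10, r=1.5, trials=4000)
print(estimate, fixation_probability(10, 1.5))
```

In this simple setting a larger population lowers the fixation probability of a single advantageous variant toward 1 - 1/r; how population size interacts with rearrangement in the authors' model is a separate question that this sketch does not address.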
Aris-Brosou, Stéphane; Bielawski, Joseph P
2006-08-15
A popular approach to examine the roles of mutation and selection in the evolution of genomes has been to consider the relationship between codon bias and synonymous rates of molecular evolution. A significant relationship between these two quantities is taken to indicate the action of weak selection on substitutions among synonymous codons. The neutral theory predicts that the rate of evolution is inversely related to the level of functional constraint. Therefore, selection against the use of non-preferred codons among those coding for the same amino acid should result in lower rates of synonymous substitution as compared with sites not subject to such selection pressures. However, reliably measuring the extent of such a relationship is problematic, as estimates of synonymous rates are sensitive to our assumptions about the process of molecular evolution. Previous studies showed the importance of accounting for unequal codon frequencies, in particular when synonymous codon usage is highly biased. Yet, unequal codon frequencies can be modeled in different ways, making different assumptions about the mutation process. Here we conduct a simulation study to evaluate two different ways of modeling uneven codon frequencies and show that both model parameterizations can have a dramatic impact on rate estimates and affect biological conclusions about genome evolution. We reanalyze three large data sets to demonstrate the relevance of our results to empirical data analysis.
Prospects for Genomic Selection in Cassava Breeding.
Wolfe, Marnin D; Del Carpio, Dunia Pino; Alabi, Olumide; Ezenwaka, Lydia C; Ikeogu, Ugochukwu N; Kayondo, Ismail S; Lozano, Roberto; Okeke, Uche G; Ozimati, Alfred A; Williams, Esuma; Egesi, Chiedozie; Kawuki, Robert S; Kulakow, Peter; Rabbi, Ismail Y; Jannink, Jean-Luc
2017-11-01
Cassava (Manihot esculenta Crantz) is a clonally propagated staple food crop in the tropics. Genomic selection (GS) has been implemented at three breeding institutions in Africa to reduce cycle times. Initial studies provided promising estimates of predictive abilities. Here, we expand on previous analyses by assessing the accuracy of seven prediction models for seven traits in three prediction scenarios: cross-validation within populations, cross-population prediction and cross-generation prediction. We also evaluated the impact of increasing the training population (TP) size by phenotyping progenies selected either at random or with a genetic algorithm. Cross-validation results were mostly consistent across programs, with nonadditive models predicting 10% better on average. Cross-population accuracy was generally low (mean = 0.18) but prediction of cassava mosaic disease increased up to 57% in one Nigerian population when data from another related population were combined. Accuracy across generations was poorer than within-generation accuracy, as expected, but accuracy for dry matter content and mosaic disease severity should be sufficient for rapid-cycling GS. Selection of a prediction model made some difference across generations, but increasing TP size was more important. With a genetic algorithm, selection of one-third of progeny could achieve an accuracy equivalent to phenotyping all progeny. We are in the early stages of GS for this crop but the results are promising for some traits. General guidelines that are emerging are that TPs need to continue to grow but phenotyping can be done on a cleverly selected subset of individuals, reducing the overall phenotyping burden. Copyright © 2017 Crop Science Society of America.
USDA-ARS's Scientific Manuscript database
A reassociation kinetics-based approach was used to reduce the complexity of genomic DNA from the Deutsch laboratory strain of the cattle tick, Rhipicephalus microplus, to facilitate genome sequencing. Selected genomic DNA (Cot value = 660) was sequenced using 454 GS FLX technology, resulting in 356...
Belfield, Eric J.; Gan, Xiangchao; Mithani, Aziz; Brown, Carly; Jiang, Caifu; Franklin, Keara; Alvey, Elizabeth; Wibowo, Anjar; Jung, Marko; Bailey, Kit; Kalwani, Sharan; Ragoussis, Jiannis; Mott, Richard; Harberd, Nicholas P.
2012-01-01
Ionizing radiation has long been known to induce heritable mutagenic change in DNA sequence. However, the genome-wide effect of radiation is not well understood. Here we report the molecular properties and frequency of mutations in phenotypically selected mutant lines isolated following exposure of the genetic model flowering plant Arabidopsis thaliana to fast neutrons (FNs). Previous studies suggested that FNs predominantly induce deletions longer than a kilobase in A. thaliana. However, we found a higher frequency of single base substitution than deletion mutations. While the overall frequency and molecular spectrum of fast-neutron (FN)–induced single base substitutions differed substantially from those of “background” mutations arising spontaneously in laboratory-grown plants, G:C>A:T transitions were favored in both. We found that FN-induced G:C>A:T transitions were concentrated at pyrimidine dinucleotide sites, suggesting that FNs promote the formation of mutational covalent linkages between adjacent pyrimidine residues. In addition, we found that FNs induced more single base than large deletions, and that these single base deletions were possibly caused by replication slippage. Our observations provide an initial picture of the genome-wide molecular profile of mutations induced in A. thaliana by FN irradiation and are particularly informative of the nature and extent of genome-wide mutation in lines selected on the basis of mutant phenotypes from FN-mutagenized A. thaliana populations. PMID:22499668
Genomic Signatures Reveal New Evidences for Selection of Important Traits in Domestic Cattle
Xu, Lingyang; Bickhart, Derek M.; Cole, John B.; Schroeder, Steven G.; Song, Jiuzhou; Tassell, Curtis P. Van; Sonstegard, Tad S.; Liu, George E.
2015-01-01
We investigated diverse genomic selections using high-density single nucleotide polymorphism data of five distinct cattle breeds. Based on allele frequency differences, we detected hundreds of candidate regions under positive selection across Holstein, Angus, Charolais, Brahman, and N'Dama. In addition to well-known genes such as KIT, MC1R, ASIP, GHR, LCORL, NCAPG, WIF1, and ABCA12, we found evidence for a variety of novel and less-known genes under selection in cattle, such as LAP3, SAR1B, LRIG3, FGF5, and NUDCD3. Selective sweeps near LAP3 were then validated by next-generation sequencing. Genome-wide association analysis involving 26,362 Holsteins confirmed that LAP3 and SAR1B were related to milk production traits, suggesting that our candidate regions were likely functional. In addition, haplotype network analyses further revealed distinct selective pressures and evolution patterns across these five cattle breeds. Our results provided a glimpse into diverse genomic selection during cattle domestication, breed formation, and recent genetic improvement. These findings will facilitate genome-assisted breeding to improve animal production and health. PMID:25431480
Pentatricopeptide 336 and mitochondrial sorting in cucumber
USDA-ARS's Scientific Manuscript database
Cucumber is a unique model plant for organellar genetics because its three genomes show differential transmission: maternal for chloroplast, paternal for mitochondrial and bi-parental for nuclear. A cucumber line has been selected showing a paternally transmitted, strongly mosaic (MSC) phenotype as...
Does genomic selection have a future in plant breeding?
Jonas, Elisabeth; de Koning, Dirk-Jan
2013-09-01
Plant breeding largely depends on phenotypic selection in plots and only for some, often disease-resistance-related traits, uses genetic markers. The more recently developed concept of genomic selection, using a black box approach with no need of prior knowledge about the effect or function of individual markers, has also been proposed as a great opportunity for plant breeding. Several empirical and theoretical studies have focused on the possibility to implement this as a novel molecular method across various species. Although we do not question the potential of genomic selection in general, in this Opinion, we emphasize that genomic selection approaches from dairy cattle breeding cannot be easily applied to complex plant breeding. Copyright © 2013 Elsevier Ltd. All rights reserved.
LOSITAN: a workbench to detect molecular adaptation based on a Fst-outlier method.
Antao, Tiago; Lopes, Ana; Lopes, Ricardo J; Beja-Pereira, Albano; Luikart, Gordon
2008-07-28
Testing for selection is becoming one of the most important steps in the analysis of multilocus population genetics data sets. Existing applications are difficult to use, leaving many non-trivial, error-prone tasks to the user. Here we present LOSITAN, a selection detection workbench based on a well-evaluated Fst-outlier detection method. LOSITAN greatly facilitates correct approximation of model parameters (e.g., genome-wide average, neutral Fst), provides data import and export functions, iterative contour smoothing and generation of graphics in an easy-to-use graphical user interface. LOSITAN is able to use modern multi-core processor architectures by locally parallelizing fdist, reducing computation time by half in current dual-core machines and with almost linear performance gains in machines with more cores. LOSITAN makes selection detection feasible to a much wider range of users, even for large population genomic datasets, by both providing an easy-to-use interface and essential functionality to complete the whole selection detection process.
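To make the idea of an Fst-outlier scan concrete, here is a deliberately crude sketch. It is not LOSITAN or fdist: those tools compare each locus against a simulated neutral distribution, whereas this example simply computes a per-locus Fst from allele frequencies (assuming biallelic loci and equally sized populations) and flags loci above a fixed cutoff. The locus names, frequencies, cutoff, and function names are all made up for illustration.

```python
def fst(freqs):
    """Fst for one biallelic locus from per-population allele frequencies.

    Uses (Ht - Hs) / Ht, where Hs is the mean within-population expected
    heterozygosity and Ht is the expected heterozygosity of the pooled
    population (equal population sizes assumed).
    """
    p_bar = sum(freqs) / len(freqs)
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # monomorphic locus: no differentiation to measure
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


def flag_outliers(loci, cutoff):
    """Return the names of loci whose Fst exceeds the cutoff, in sorted order."""
    return sorted(name for name, freqs in loci.items() if fst(freqs) > cutoff)


# Hypothetical allele frequencies in two populations.
loci = {
    "locus_a": [0.50, 0.50],
    "locus_b": [0.45, 0.55],
    "locus_c": [0.20, 0.80],
    "locus_d": [0.05, 0.95],
}
outliers = flag_outliers(loci, cutoff=0.3)
print(outliers)
```

A fixed cutoff ignores the dependence of Fst on heterozygosity and on the genome-wide neutral average, which is the part that the simulation-based approach described in the abstract is designed to handle.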
Energy efficiency trade-offs drive nucleotide usage in transcribed regions
Chen, Wei-Hua; Lu, Guanting; Bork, Peer; Hu, Songnian; Lercher, Martin J.
2016-01-01
Efficient nutrient usage is a trait under universal selection. A substantial part of cellular resources is spent on making nucleotides. We thus expect preferential use of cheaper nucleotides especially in transcribed sequences, which are often amplified thousand-fold compared with genomic sequences. To test this hypothesis, we derive a mutation-selection-drift equilibrium model for nucleotide skews (strand-specific usage of ‘A' versus ‘T' and ‘G' versus ‘C'), which explains nucleotide skews across 1,550 prokaryotic genomes as a consequence of selection on efficient resource usage. Transcription-related selection generally favours the cheaper nucleotides ‘U' and ‘C' at synonymous sites. However, the information encoded in mRNA is further amplified through translation. Due to unexpected trade-offs in the codon table, cheaper nucleotides encode on average energetically more expensive amino acids. These trade-offs apply to both strand-specific nucleotide usage and GC content, causing a universal bias towards the more expensive nucleotides ‘A' and ‘G' at non-synonymous coding sites. PMID:27098217
The Nuclear and Mitochondrial Genomes of the Facultatively Eusocial Orchid Bee Euglossa dilemma
Brand, Philipp; Saleh, Nicholas; Pan, Hailin; Li, Cai; Kapheim, Karen M.; Ramírez, Santiago R.
2017-01-01
Bees provide indispensable pollination services to both agricultural crops and wild plant populations, and several species of bees have become important models for the study of learning and memory, plant–insect interactions, and social behavior. Orchid bees (Apidae: Euglossini) are especially important to the fields of pollination ecology, evolution, and species conservation. Here we report the nuclear and mitochondrial genome sequences of the orchid bee Euglossa dilemma Bembé & Eltz. E. dilemma was selected because it is widely distributed, highly abundant, and it was recently naturalized in the southeastern United States. We provide a high-quality assembly of the 3.3 Gb genome, and an official gene set of 15,904 gene annotations. We find high conservation of gene synteny with the honey bee throughout 80 MY of divergence time. This genomic resource represents the first draft genome of the orchid bee genus Euglossa, and the first draft orchid bee mitochondrial genome, thus representing a valuable resource to the research community. PMID:28701376
The Nuclear and Mitochondrial Genomes of the Facultatively Eusocial Orchid Bee Euglossa dilemma.
Brand, Philipp; Saleh, Nicholas; Pan, Hailin; Li, Cai; Kapheim, Karen M; Ramírez, Santiago R
2017-09-07
Bees provide indispensable pollination services to both agricultural crops and wild plant populations, and several species of bees have become important models for the study of learning and memory, plant-insect interactions, and social behavior. Orchid bees (Apidae: Euglossini) are especially important to the fields of pollination ecology, evolution, and species conservation. Here we report the nuclear and mitochondrial genome sequences of the orchid bee Euglossa dilemma Bembé & Eltz. E. dilemma was selected because it is widely distributed, highly abundant, and it was recently naturalized in the southeastern United States. We provide a high-quality assembly of the 3.3 Gb genome, and an official gene set of 15,904 gene annotations. We find high conservation of gene synteny with the honey bee throughout 80 MY of divergence time. This genomic resource represents the first draft genome of the orchid bee genus Euglossa, and the first draft orchid bee mitochondrial genome, thus representing a valuable resource to the research community. Copyright © 2017 Brand et al.
Assessing non-additive effects in GBLUP model.
Vieira, I C; Dos Santos, J P R; Pires, L P M; Lima, B M; Gonçalves, F M A; Balestre, M
2017-05-10
Understanding non-additive effects in the expression of quantitative traits is very important in genotype selection, especially in species where the commercial products are clones or hybrids. The use of molecular markers has allowed the study of non-additive genetic effects on a genomic level, in addition to a better understanding of its importance in quantitative traits. Thus, the purpose of this study was to evaluate the behavior of the GBLUP model in different genetic models and relationship matrices and their influence on the estimates of genetic parameters. We used real data of the circumference at breast height in Eucalyptus spp. and simulated data from an F2 population. Three commonly reported kinship structures in the literature were adopted. The simulation results showed that the inclusion of epistatic kinship improved prediction estimates of genomic breeding values. However, the non-additive effects were not accurately recovered. The Fisher information matrix for real dataset showed high collinearity in estimates of additive, dominant, and epistatic variance, causing no gain in the prediction of the unobserved data and convergence problems. Estimates presented differences of genetic parameters and correlations considering the different kinship structures. Our results show that the inclusion of non-additive effects can improve the predictive ability or even the prediction of additive effects. However, the high distortions observed in the variance estimates when the Hardy-Weinberg equilibrium assumption is violated due to the presence of selection or inbreeding can converge at zero gains in models that consider epistasis in genomic kinship.
Wang, Juan; Xue, Dong-Xiu; Zhang, Bai-Dong; Li, Yu-Long; Liu, Bing-Jian; Liu, Jin-Xian
2016-01-01
Next-generation sequencing and the collection of genome-wide single-nucleotide polymorphisms (SNPs) allow identifying fine-scale population genetic structure and genomic regions under selection. The spotted sea bass (Lateolabrax maculatus) is a non-model species of ecological and commercial importance and widely distributed in northwestern Pacific. A total of 22 648 SNPs was discovered across the genome of L. maculatus by paired-end sequencing of restriction-site associated DNA (RAD-PE) for 30 individuals from two populations. The nucleotide diversity (π) for each population was 0.0028±0.0001 in Dandong and 0.0018±0.0001 in Beihai, respectively. Shallow but significant genetic differentiation was detected between the two populations analyzed by using both the whole data set (FST = 0.0550, P < 0.001) and the putatively neutral SNPs (FST = 0.0347, P < 0.001). However, the two populations were highly differentiated based on the putatively adaptive SNPs (FST = 0.6929, P < 0.001). Moreover, a total of 356 SNPs representing 298 unique loci were detected as outliers putatively under divergent selection by FST-based outlier tests as implemented in BAYESCAN and LOSITAN. Functional annotation of the contigs containing putatively adaptive SNPs yielded hits for 22 of 55 (40%) significant BLASTX matches. Candidate genes for local selection constituted a wide array of functions, including binding, catalytic and metabolic activities, etc. The analyses with the SNPs developed in the present study highlighted the importance of genome-wide genetic variation for inference of population structure and local adaptation in L. maculatus. PMID:27336696
Genome-wide scans for loci under selection in humans
2005-01-01
Natural selection, which can be defined as the differential contribution of genetic variants to future generations, is the driving force of Darwinian evolution. Identifying regions of the human genome that have been targets of natural selection is an important step in clarifying human evolutionary history and understanding how genetic variation results in phenotypic diversity; it may also facilitate the search for complex disease genes. Technological advances in high-throughput DNA sequencing and single nucleotide polymorphism genotyping have enabled several genome-wide scans of natural selection to be undertaken. Here, some of the observations that are beginning to emerge from these studies will be reviewed, including evidence for geographically restricted selective pressures (i.e., local adaptation) and a relationship between genes subject to natural selection and human disease. In addition, the paper will highlight several important problems that need to be addressed in future genome-wide studies of natural selection. PMID:16004726
Model-Averaged ℓ1 Regularization using Markov Chain Monte Carlo Model Composition
Fraley, Chris; Percival, Daniel
2014-01-01
Bayesian Model Averaging (BMA) is an effective technique for addressing model uncertainty in variable selection problems. However, current BMA approaches have computational difficulty dealing with data in which there are many more measurements (variables) than samples. This paper presents a method for combining ℓ1 regularization and Markov chain Monte Carlo model composition techniques for BMA. By treating the ℓ1 regularization path as a model space, we propose a method to resolve the model uncertainty issues arising in model averaging from solution path point selection. We show that this method is computationally and empirically effective for regression and classification in high-dimensional datasets. We apply our technique in simulations, as well as to some applications that arise in genomics. PMID:25642001
Ren, Wen-Long; Wen, Yang-Jun; Dunwell, Jim M; Zhang, Yuan-Ming
2018-03-01
Although nonparametric methods in genome-wide association studies (GWAS) are robust in quantitative trait nucleotide (QTN) detection, the absence of polygenic background control in single-marker association in genome-wide scans results in a high false positive rate. To overcome this issue, we proposed an integrated nonparametric method for multi-locus GWAS. First, a new model transformation was used to whiten the covariance matrix of polygenic matrix K and environmental noise. Using the transferred model, the Kruskal-Wallis test along with least angle regression was then used to select all the markers that were potentially associated with the trait. Finally, all the selected markers were placed into a multi-locus model, their effects were estimated by empirical Bayes, and all the nonzero effects were further identified by a likelihood ratio test for true QTN detection. This method, named pKWmEB, was validated by a series of Monte Carlo simulation studies. As a result, pKWmEB effectively controlled the false positive rate, although a less stringent significance criterion was adopted. More importantly, pKWmEB retained the high power of the Kruskal-Wallis test, and provided QTN effect estimates. To further validate pKWmEB, we re-analyzed four flowering time related traits in Arabidopsis thaliana, and detected some previously reported genes that were not identified by the other methods.
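The single-marker screening step that the abstract builds on can be sketched as a plain Kruskal-Wallis H statistic, computed on phenotypes grouped by genotype class at one marker. This is only an illustration under stated assumptions: it omits the tie correction, the model transformation, least angle regression and the empirical Bayes stage of pKWmEB, and the function names are hypothetical.

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(phenotypes, genotypes):
    """Kruskal-Wallis H for one marker (no tie correction applied)."""
    n = len(phenotypes)
    ranks = average_ranks(phenotypes)
    rank_sums = defaultdict(float)
    counts = defaultdict(int)
    for rank, genotype in zip(ranks, genotypes):
        rank_sums[genotype] += rank
        counts[genotype] += 1
    between = sum(rank_sums[g] ** 2 / counts[g] for g in counts)
    return 12.0 / (n * (n + 1)) * between - 3 * (n + 1)


if __name__ == "__main__":
    # Toy data: six individuals, one biallelic marker coded 0/1
    trait = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(kruskal_wallis_h(trait, [0, 0, 0, 1, 1, 1]))  # fully separated groups
    print(kruskal_wallis_h(trait, [0, 1, 0, 1, 0, 1]))  # interleaved groups
```

Under the null hypothesis, H is approximately chi-square distributed with (number of genotype classes minus 1) degrees of freedom, so a larger H means stronger evidence that the marker is associated with the trait.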
New transgenic models of Parkinson's disease using genome editing technology.
Cota-Coronado, J A; Sandoval-Ávila, S; Gaytan-Dávila, Y P; Diaz, N F; Vega-Ruiz, B; Padilla-Camberos, E; Díaz-Martínez, N E
2017-11-28
Parkinson's disease (PD) is the second most common neurodegenerative disorder. It is characterised by selective loss of dopaminergic neurons in the substantia nigra pars compacta, which results in dopamine depletion, leading to a number of motor and non-motor symptoms. In recent years, the development of new animal models using nuclease-based genome-editing technology (ZFN, TALEN, and CRISPR/Cas9 nucleases) has enabled the introduction of custom-made modifications into the genome to replicate key features of PD, leading to significant advances in our understanding of the pathophysiology of the disease. We review the most recent studies on this new generation of in vitro and in vivo PD models, which replicate the most relevant symptoms of the disease and enable better understanding of the aetiology and mechanisms of PD. This may be helpful in the future development of effective treatments to halt or slow disease progression. Copyright © 2017 Sociedad Española de Neurología. Publicado por Elsevier España, S.L.U. All rights reserved.
Genetic Diversity in the Interference Selection Limit
Good, Benjamin H.; Walczak, Aleksandra M.; Neher, Richard A.; Desai, Michael M.
2014-01-01
Pervasive natural selection can strongly influence observed patterns of genetic variation, but these effects remain poorly understood when multiple selected variants segregate in nearby regions of the genome. Classical population genetics fails to account for interference between linked mutations, which grows increasingly severe as the density of selected polymorphisms increases. Here, we describe a simple limit that emerges when interference is common, in which the fitness effects of individual mutations play a relatively minor role. Instead, similar to models of quantitative genetics, molecular evolution is determined by the variance in fitness within the population, defined over an effectively asexual segment of the genome (a “linkage block”). We exploit this insensitivity in a new “coarse-grained” coalescent framework, which approximates the effects of many weakly selected mutations with a smaller number of strongly selected mutations that create the same variance in fitness. This approximation generates accurate and efficient predictions for silent site variability when interference is common. However, these results suggest that there is reduced power to resolve individual selection pressures when interference is sufficiently widespread, since a broad range of parameters possess nearly identical patterns of silent site variability. PMID:24675740
Zhou, Zhan; Zou, Yangyun; Liu, Gangbiao; Zhou, Jingqi; Wu, Jingcheng; Zhao, Shimin; Su, Zhixi; Gu, Xun
2017-08-29
Human genes exhibit different effects on fitness in cancer and normal cells. Here, we present an evolutionary approach to measure the selection pressure on human genes, using the well-known ratio of the nonsynonymous to synonymous substitution rate in both cancer genomes (CN/CS) and normal populations (pN/pS). A new mutation-profile-based method that adopts sample-specific mutation rate profiles instead of conventional substitution models was developed. We found that cancer-specific selection pressure is quite different from the selection pressure at the species and population levels. Both the relaxation of purifying selection on passenger mutations and the positive selection of driver mutations may contribute to the increased CN/CS values of human genes in cancer genomes compared with the pN/pS values in human populations. The CN/CS values also contribute to the improved classification of cancer genes and a better understanding of the onco-functionalization of cancer genes during oncogenesis. The use of our computational pipeline to identify cancer-specific positively and negatively selected genes may provide useful information for understanding the evolution of cancers and identifying possible targets for therapeutic intervention.
Interpreting the genomic landscape of speciation: a road map for finding barriers to gene flow.
Ravinet, M; Faria, R; Butlin, R K; Galindo, J; Bierne, N; Rafajlović, M; Noor, M A F; Mehlig, B; Westram, A M
2017-08-01
Speciation, the evolution of reproductive isolation among populations, is continuous, complex, and involves multiple, interacting barriers. Until it is complete, the effects of this process vary along the genome and can lead to a heterogeneous genomic landscape with peaks and troughs of differentiation and divergence. When gene flow occurs during speciation, barriers restricting gene flow locally in the genome lead to patterns of heterogeneity. However, genomic heterogeneity can also be produced or modified by variation in factors such as background selection and selective sweeps, recombination and mutation rate variation, and heterogeneous gene density. Extracting the effects of gene flow, divergent selection and reproductive isolation from such modifying factors presents a major challenge to speciation genomics. We argue one of the principal aims of the field is to identify the barrier loci involved in limiting gene flow. We first summarize the expected signatures of selection at barrier loci, at the genomic regions linked to them and across the entire genome. We then discuss the modifying factors that complicate the interpretation of the observed genomic landscape. Finally, we end with a road map for future speciation research: a proposal for how to account for these modifying factors and to progress towards understanding the nature of barrier loci. Despite the difficulties of interpreting empirical data, we argue that the availability of promising technical and analytical methods will shed further light on the important roles that gene flow and divergent selection have in shaping the genomic landscape of speciation. © 2017 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2017 European Society For Evolutionary Biology.
USDA-ARS's Scientific Manuscript database
Background: BAC-based physical maps provide for sequencing across an entire genome or selected sub-genome regions of biological interest. Using the minimum tiling path as a guide, it is possible to select specific BAC clones from prioritized genome sections such as a genetically defined QTL interv...
solGS: a web-based tool for genomic selection
USDA-ARS's Scientific Manuscript database
Genomic selection (GS) promises to improve accuracy in estimating breeding values and genetic gain for quantitative traits compared to traditional breeding methods. Its reliance on high-throughput genome-wide markers and statistical complexity, however, is a serious challenge in data management, ana...
The population genomics of rhesus macaques (Macaca mulatta) based on whole-genome sequences
Xue, Cheng; Raveendran, Muthuswamy; Harris, R. Alan; Fawcett, Gloria L.; Liu, Xiaoming; White, Simon; Dahdouli, Mahmoud; Rio Deiros, David; Below, Jennifer E.; Salerno, William; Cox, Laura; Fan, Guoping; Ferguson, Betsy; Horvath, Julie; Johnson, Zach; Kanthaswamy, Sree; Kubisch, H. Michael; Liu, Dahai; Platt, Michael; Smith, David G.; Sun, Binghua; Vallender, Eric J.; Wang, Feng; Wiseman, Roger W.; Chen, Rui; Muzny, Donna M.; Gibbs, Richard A.; Yu, Fuli; Rogers, Jeffrey
2016-01-01
Rhesus macaques (Macaca mulatta) are the most widely used nonhuman primate in biomedical research, have the largest natural geographic distribution of any nonhuman primate, and have been the focus of much evolutionary and behavioral investigation. Consequently, rhesus macaques are one of the most thoroughly studied nonhuman primate species. However, little is known about genome-wide genetic variation in this species. A detailed understanding of extant genomic variation among rhesus macaques has implications for the use of this species as a model for studies of human health and disease, as well as for evolutionary population genomics. Whole-genome sequencing analysis of 133 rhesus macaques revealed more than 43.7 million single-nucleotide variants, including thousands predicted to alter protein sequences, transcript splicing, and transcription factor binding sites. Rhesus macaques exhibit 2.5-fold higher overall nucleotide diversity and slightly elevated putative functional variation compared with humans. This functional variation in macaques provides opportunities for analyses of coding and noncoding variation, and its cellular consequences. Despite modestly higher levels of nonsynonymous variation in the macaques, the estimated distribution of fitness effects and the ratio of nonsynonymous to synonymous variants suggest that purifying selection has had stronger effects in rhesus macaques than in humans. Demographic reconstructions indicate this species has experienced a consistently large but fluctuating population size. Overall, the results presented here provide new insights into the population genomics of nonhuman primates and expand genomic information directly relevant to primate models of human disease. PMID:27934697
Sanjak, Jaleal S.; Long, Anthony D.; Thornton, Kevin R.
2017-01-01
The genetic component of complex disease risk in humans remains largely unexplained. A corollary is that the allelic spectrum of genetic variants contributing to complex disease risk is unknown. Theoretical models that relate population genetic processes to the maintenance of genetic variation for quantitative traits may suggest profitable avenues for future experimental design. Here we use forward simulation to model a genomic region evolving under a balance between recurrent deleterious mutation and Gaussian stabilizing selection. We consider multiple genetic and demographic models, and several different methods for identifying genomic regions harboring variants associated with complex disease risk. We demonstrate that the model of gene action, relating genotype to phenotype, has a qualitative effect on several relevant aspects of the population genetic architecture of a complex trait. In particular, the genetic model impacts genetic variance component partitioning across the allele frequency spectrum and the power of statistical tests. Models with partial recessivity closely match the minor allele frequency distribution of significant hits from empirical genome-wide association studies without requiring homozygous effect sizes to be small. We highlight a particular gene-based model of incomplete recessivity that is appealing from first principles. Under that model, deleterious mutations in a genomic region partially fail to complement one another. This model of gene-based recessivity predicts the empirically observed inconsistency between twin and SNP-based estimates of dominance heritability. Furthermore, this model predicts considerable levels of unexplained variance associated with intralocus epistasis. Our results suggest a need for improved statistical tools for region-based genetic association and heritability estimation. PMID:28103232
Fernández-Fueyo, Elena; Ruiz-Dueñas, Francisco J.; Miki, Yuta; Martínez, María Jesús; Hammel, Kenneth E.; Martínez, Angel T.
2012-01-01
The white-rot fungus Ceriporiopsis subvermispora delignifies lignocellulose with high selectivity, but until now it has appeared to lack the specialized peroxidases, termed lignin peroxidases (LiPs) and versatile peroxidases (VPs), that are generally thought important for ligninolysis. We screened the recently sequenced C. subvermispora genome for genes that encode peroxidases with a potential ligninolytic role. A total of 26 peroxidase genes was apparent after a structural-functional classification based on homology modeling and a search for diagnostic catalytic amino acid residues. In addition to revealing the presence of nine heme-thiolate peroxidase superfamily members and the unexpected absence of the dye-decolorizing peroxidase superfamily, the search showed that the C. subvermispora genome encodes 16 class II enzymes in the plant-fungal-bacterial peroxidase superfamily, where LiPs and VPs are classified. The 16 encoded enzymes include 13 putative manganese peroxidases and one generic peroxidase but most notably two peroxidases containing the catalytic tryptophan characteristic of LiPs and VPs. We expressed these two enzymes in Escherichia coli and determined their substrate specificities on typical LiP/VP substrates, including nonphenolic lignin model monomers and dimers, as well as synthetic lignin. The results show that the two newly discovered C. subvermispora peroxidases are functionally competent LiPs and also suggest that they are phylogenetically and catalytically intermediate between classical LiPs and VPs. These results offer new insight into selective lignin degradation by C. subvermispora. PMID:22437835
Stochastic model search with binary outcomes for genome-wide association studies
Malovini, Alberto; Puca, Annibale A; Bellazzi, Riccardo
2012-01-01
Objective: The spread of case–control genome-wide association studies (GWASs) has stimulated the development of new variable selection methods and predictive models. We introduce a novel Bayesian model search algorithm, Binary Outcome Stochastic Search (BOSS), which addresses the model selection problem when the number of predictors far exceeds the number of binary responses. Materials and methods: Our method is based on a latent variable model that links the observed outcomes to the underlying genetic variables. A Markov Chain Monte Carlo approach is used for model search and to evaluate the posterior probability of each predictor. Results: BOSS is compared with three established methods (stepwise regression, logistic lasso, and elastic net) in a simulated benchmark. Two real case studies are also investigated: a GWAS on the genetic bases of longevity, and the type 2 diabetes study from the Wellcome Trust Case Control Consortium. Simulations show that BOSS achieves higher precision than the reference methods while preserving good recall rates. In both experimental studies, BOSS successfully detects genetic polymorphisms previously reported to be associated with the analyzed phenotypes. Discussion: BOSS outperforms the other methods in terms of F-measure on simulated data. In the two real studies, BOSS successfully detects biologically relevant features, some of which are missed by univariate analysis and the three reference techniques. Conclusion: The proposed algorithm is an advance in the methodology for model selection with a large number of features. Our simulated and experimental results showed that BOSS proves effective in detecting relevant markers while providing a parsimonious model. PMID:22534080
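The precision, recall and F-measure used to compare variable selection methods on simulated data can be made concrete with a short sketch. The abstract does not say which F-measure weighting was used, so the balanced F1 score (beta = 1) is assumed here. The function name and the marker identifiers are illustrative only.

```python
def selection_scores(selected, causal, beta=1.0):
    """Precision, recall and F-measure of a selected marker set.

    selected: markers reported by a method.
    causal:   markers truly associated in the simulation.
    beta:     recall weight; beta = 1 gives F1 (an assumption, not from the source).
    """
    selected, causal = set(selected), set(causal)
    true_positives = len(selected & causal)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(causal) if causal else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta ** 2
    f_score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    # Hypothetical marker IDs: two of four selected markers are truly causal
    picked = ["snp1", "snp2", "snp3", "snp4"]
    truth = ["snp1", "snp2", "snp5"]
    print(selection_scores(picked, truth))
```

A method that returns a large model can reach high recall at the cost of precision. Because the F-measure penalises that trade-off, it suits comparisons where a parsimonious model is the goal.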
DNA replication stress as a hallmark of cancer.
Macheret, Morgane; Halazonetis, Thanos D
2015-01-01
Human cancers share properties referred to as hallmarks, among which sustained proliferation, escape from apoptosis, and genomic instability are the most pervasive. The sustained proliferation hallmark can be explained by mutations in oncogenes and tumor suppressors that regulate cell growth, whereas the escape from apoptosis hallmark can be explained by mutations in the TP53, ATM, or MDM2 genes. A model to explain the presence of the three hallmarks listed above, as well as the patterns of genomic instability observed in human cancers, proposes that the genes driving cell proliferation induce DNA replication stress, which, in turn, generates genomic instability and selects for escape from apoptosis. Here, we review the data that support this model, as well as the mechanisms by which oncogenes induce replication stress. Further, we argue that DNA replication stress should be considered as a hallmark of cancer because it likely drives cancer development and is very prevalent.
2016-01-01
This review aimed to arrange the process of a systematic review of genome-wide association studies in order to practice and apply a genome-wide meta-analysis (GWMA). The process has a series of five steps: searching and selection, extraction of related information, evaluation of validity, meta-analysis by type of genetic model, and evaluation of heterogeneity. In contrast to intervention meta-analyses, GWMA has to evaluate the Hardy–Weinberg equilibrium (HWE) in the third step and conduct meta-analyses by five potential genetic models, including dominant, recessive, homozygote contrast, heterozygote contrast, and allelic contrast in the fourth step. The ‘genhwcci’ and ‘metan’ commands of STATA software evaluate the HWE and calculate a summary effect size, respectively. A meta-regression using the ‘metareg’ command of STATA should be conducted to evaluate related factors of heterogeneities. PMID:28092928
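The Hardy–Weinberg check in the third step can be illustrated with a minimal sketch of the standard chi-square goodness-of-fit test on genotype counts at a biallelic locus. It is written in Python for illustration and is not the STATA ‘genhwcci’ command. The function name is hypothetical, and the 3.84 cut-off is the usual 5% critical value for one degree of freedom.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for departure from Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed counts of the three genotypes at a biallelic locus.
    """
    n = n_aa + n_ab + n_bb
    # Allele frequency of A estimated from the genotype counts
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic locus)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    CRITICAL_1DF_5PCT = 3.84  # chi-square critical value, 1 df, alpha = 0.05
    for counts in [(25, 50, 25), (50, 0, 50)]:
        stat = hwe_chi_square(*counts)
        verdict = "deviates from HWE" if stat > CRITICAL_1DF_5PCT else "consistent with HWE"
        print(counts, round(stat, 2), verdict)
```

In a genome-wide meta-analysis this check is typically run on the control genotypes of each included study, and studies that deviate are examined in a sensitivity analysis.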
"Wrecks of Ancient Life": Genetic Variants Vetted by Natural Selection.
Postlethwait, John H
2015-07-01
The Genetics Society of America's George W. Beadle Award honors individuals who have made outstanding contributions to the community of genetics researchers and who exemplify the qualities of its namesake as a respected academic, administrator, and public servant. The 2015 recipient is John Postlethwait. He has made groundbreaking contributions in developing the zebrafish as a molecular genetic model and in understanding the evolution of new gene functions in vertebrates. He built the first zebrafish genetic map and showed that its genome, along with that of distantly related teleost fish, had been duplicated. Postlethwait played an integral role in the zebrafish genome-sequencing project and elucidated the genomic organization of several fish species. Postlethwait is also honored for his active involvement with the zebrafish community, advocacy for zebrafish as a model system, and commitment to driving the field forward. Copyright © 2015 by the Genetics Society of America.
MIPS: analysis and annotation of genome information in 2007
Mewes, H. W.; Dietmann, S.; Frishman, D.; Gregory, R.; Mannhaupt, G.; Mayer, K. F. X.; Münsterkötter, M.; Ruepp, A.; Spannagl, M.; Stümpflen, V.; Rattei, T.
2008-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) combines automatic processing of large amounts of sequences with manual annotation of selected model genomes. Due to the massive growth of the available data, the depth of annotation varies widely between independent databases. Also, the criteria for the transfer of information from known to orthologous sequences are diverse. Coping with the task of global in-depth genome annotation has become unfeasible. Therefore, our efforts are dedicated to three levels of annotation: (i) the curation of selected genomes, in particular from fungal and plant taxa (e.g. CYGD, MNCDB, MatDB), (ii) the comprehensive, consistent, automatic annotation employing exhaustive methods for the computation of sequence similarities and sequence-related attributes as well as the classification of individual sequences (SIMAP, PEDANT and FunCat) and (iii) the compilation of manually curated databases for protein interactions based on scrutinized information from the literature to serve as an accepted set of reliable annotated interaction data (MPACT, MPPI, CORUM). All databases and tools described as well as the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:18158298
Machine learning applications in genetics and genomics.
Libbrecht, Maxwell W; Noble, William Stafford
2015-06-01
The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets. Here, we provide an overview of machine learning applications for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic or metabolomic data. We present considerations and recurrent challenges in the application of supervised, semi-supervised and unsupervised machine learning methods, as well as of generative and discriminative modelling approaches. We provide general guidelines to assist in the selection of these machine learning methods and their practical application for the analysis of genetic and genomic data sets.
Identifying spatio-temporal dynamics of Ebola in Sierra Leone using virus genomes
Proctor, Joshua L.
2017-01-01
Containing the recent West African outbreak of Ebola virus (EBOV) required the deployment of substantial global resources. Despite recent progress in analysing and modelling EBOV epidemiological data, a complete characterization of the spatio-temporal spread of Ebola cases remains a challenge. In this work, we offer a novel perspective on the EBOV epidemic in Sierra Leone that uses individual virus genome sequences to inform population-level, spatial models. Calibrated to phylogenetic linkages of virus genomes, these spatial models provide unique insight into the disease mobility of EBOV in Sierra Leone without the need for human mobility data. Consistent with other investigations, our results show that the spread of EBOV during the beginning and middle portions of the epidemic strongly depended on the size of and distance between populations. Our phylodynamic analysis also revealed a change in model preference towards a spatial model with power-law characteristics in the latter portion of the epidemic, correlated with the timing of major intervention campaigns. More generally, we believe this framework, pairing molecular diagnostics with a dynamic model selection procedure, has the potential to be a powerful forecasting tool along with offering operationally relevant guidance for surveillance and sampling strategies during an epidemic. PMID:29187639
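The dependence on population size and distance described in this abstract can be illustrated with a gravity-style coupling. This is a minimal sketch only: the abstract does not give the functional form, so the product-over-distance-power weight, the exponent `rho`, and the district names and numbers below are all illustrative assumptions, and the calibration to phylogenetic linkages is not reproduced.

```python
def gravity_weight(pop_i, pop_j, distance_km, rho=2.0):
    """Relative coupling between two populations under an assumed gravity form."""
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    return (pop_i * pop_j) / (distance_km ** rho)


def normalized_couplings(source_pop, targets, rho=2.0):
    """Turn gravity weights into relative probabilities that sum to 1.

    targets maps a (hypothetical) district name to (population, distance_km).
    """
    weights = {
        name: gravity_weight(source_pop, pop, dist, rho)
        for name, (pop, dist) in targets.items()
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical districts: B is four times larger than A but twice as far away,
# so with rho = 2 the two effects cancel and A and B are equally coupled.
targets = {"A": (100_000, 50.0), "B": (400_000, 100.0), "C": (100_000, 200.0)}
probs = normalized_couplings(250_000, targets)
for name, prob in sorted(probs.items()):
    print(name, round(prob, 3))
```

Changing `rho` alters how quickly coupling decays with distance, which is the kind of model difference a model selection procedure would need to discriminate.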
Powell, Kim L.; Zhu, Mingfu; Campbell, C. Ryan; Maia, Jessica M.; Ren, Zhong; Jones, Nigel C.; O’Brien, Terence J.; Petrovski, Slavé
2017-01-01
Objective: The Genetic Absence Epilepsy Rats from Strasbourg (GAERS) are an inbred Wistar rat strain widely used as a model of genetic generalised epilepsy with absence seizures. As in humans, the genetic architecture that results in genetic generalised epilepsy in GAERS is poorly understood. Here we present the strain-specific variants found among the epileptic GAERS and their related Non-Epileptic Control (NEC) strain. The GAERS and NEC represent a powerful opportunity to identify neurobiological factors that are associated with the genetic generalised epilepsy phenotype. Methods: We performed whole genome sequencing on adult epileptic GAERS and adult NEC rats, a strain derived from the same original Wistar colony. We also performed whole genome sequencing on four double-crossed (GAERS with NEC) F2 rats selected for high-seizing (n = 2) and non-seizing (n = 2) phenotypes. Results: Specific to the GAERS genome, we identified 1.12 million single nucleotide variants, 296.5K short insertion-deletions, and 354 putative copy number variants that result in complete or partial loss/duplication of 41 genes. Of the GAERS-specific variants that met high quality criteria, 25 are annotated as stop codon gain/loss, 56 as putative essential splice sites, and 56 indels are predicted to result in a frameshift. Subsequent screening against the two F2 progeny with the highest and the two F2 progeny with the lowest seizure burden identified only the selected Cacna1h GAERS-private protein-coding variant as exclusively co-segregating with the two high-seizing F2 rats. Significance: This study highlights an approach for using whole genome sequencing to narrow down to a manageable candidate list of genetic variants in a complex genetic epilepsy animal model, and suggests utility of this sequencing design to investigate other spontaneously occurring animal models of human disease. PMID:28708842
Transcriptome sequencing reveals genome-wide variation in molecular evolutionary rate among ferns.
Grusz, Amanda L; Rothfels, Carl J; Schuettpelz, Eric
2016-08-30
Transcriptomics in non-model plant systems has recently reached a point where the examination of nuclear genome-wide patterns in understudied groups is an achievable reality. This progress is especially notable in evolutionary studies of ferns, for which molecular resources to date have been derived primarily from the plastid genome. Here, we utilize transcriptome data in the first genome-wide comparative study of molecular evolutionary rate in ferns. We focus on the ecologically diverse family Pteridaceae, which comprises about 10% of fern diversity and includes the enigmatic vittarioid ferns, an epiphytic, tropical lineage known for dramatically reduced morphologies and radically elongated phylogenetic branch lengths. Using expressed sequence data for 2091 loci, we perform pairwise comparisons of molecular evolutionary rate among 12 species spanning the three largest clades in the family and ask whether previously documented heterogeneity in plastid substitution rates is reflected in their nuclear genomes. We then inquire whether variation in evolutionary rate is being shaped by genes belonging to specific functional categories and test for differential patterns of selection. We find significant, genome-wide differences in evolutionary rate for vittarioid ferns relative to all other lineages within the Pteridaceae, but we recover few significant correlations between faster/slower vittarioid loci and known functional gene categories. We demonstrate that the faster rates characteristic of the vittarioid ferns are likely not driven by positive selection, nor are they unique to any particular type of nucleotide substitution. Our results reinforce recently reviewed mechanisms hypothesized to shape molecular evolutionary rates in vittarioid ferns and provide novel insight into substitution rate variation both within and among fern nuclear genomes.
Kelleher, Erin S; Barbash, Daniel A
2013-08-01
The Piwi-interacting RNA (piRNA) pathway defends animal genomes against the harmful consequences of transposable element (TE) infection by imposing small-RNA-mediated silencing. Because silencing is targeted by TE-derived piRNAs, piRNA production is posited to be central to the evolution of genome defense. We harnessed genomic data sets from Drosophila melanogaster, including genome-wide measures of piRNA, mRNA, and genomic abundance, along with estimates of age structure and risk of ectopic recombination, to address fundamental questions about the functional and evolutionary relationships between TE families and their regulatory piRNAs. We demonstrate that mRNA transcript abundance, robustness of "ping-pong" amplification, and representation in piRNA clusters together explain the majority of variation in piRNA abundance between TE families, providing the first robust statistical support for the prevailing model of piRNA biogenesis. Intriguingly, we also discover that the most transpositionally active TE families, with the greatest capacity to induce harmful mutations or disrupt gametogenesis, are not necessarily the most abundant among piRNAs. Rather, the level of piRNA targeting is largely independent of recent transposition rate for active TE families, but is rapidly lost for inactive TEs. These observations are consistent with population genetic theory that suggests a limited selective advantage for host repression of transposition. Additionally, we find no evidence that piRNA targeting responds to selection against a second major cost of TE infection: ectopic recombination between TE insertions. Our observations confirm the pivotal role of piRNA-mediated silencing in defending the genome against selfish transposition, yet also suggest limits to the optimization of host genome defense.
An overview of bioinformatics methods for modeling biological pathways in yeast
Hou, Jie; Acharya, Lipi; Zhu, Dongxiao
2016-01-01
The advent of high-throughput genomics techniques, along with the completion of genome sequencing projects, identification of protein–protein interactions and reconstruction of genome-scale pathways, has accelerated the development of systems biology research in the yeast organism Saccharomyces cerevisiae. In particular, discovery of biological pathways in yeast has become an important forefront in systems biology, which aims to understand the interactions among molecules within a cell leading to certain cellular processes in response to a specific environment. While the existing theoretical and experimental approaches enable the investigation of well-known pathways involved in metabolism, gene regulation and signal transduction, bioinformatics methods offer new insights into computational modeling of biological pathways. A wide range of computational approaches has been proposed in the past for reconstructing biological pathways from high-throughput datasets. Here we review selected bioinformatics approaches for modeling biological pathways in S. cerevisiae, including metabolic pathways, gene-regulatory pathways and signaling pathways. We start with reviewing the research on biological pathways followed by discussing key biological databases. In addition, several representative computational approaches for modeling biological pathways in yeast are discussed. PMID:26476430
The genome landscape of indigenous African cattle.
Kim, Jaemin; Hanotte, Olivier; Mwai, Okeyo Ally; Dessie, Tadelle; Bashir, Salim; Diallo, Boubacar; Agaba, Morris; Kim, Kwondo; Kwak, Woori; Sung, Samsun; Seo, Minseok; Jeong, Hyeonsoo; Kwon, Taehyung; Taye, Mengistie; Song, Ki-Duk; Lim, Dajeong; Cho, Seoae; Lee, Hyun-Jeong; Yoon, Duhak; Oh, Sung Jong; Kemp, Stephen; Lee, Hak-Kyo; Kim, Heebal
2017-02-20
The history of African indigenous cattle and their adaptation to environmental and human selection pressure is at the root of their remarkable diversity. Characterization of this diversity is an essential step towards understanding the genomic basis of productivity and adaptation to survival under African farming systems. We analyze patterns of African cattle genetic variation by sequencing 48 genomes from five indigenous populations and comparing them to the genomes of 53 commercial taurine breeds. We find the highest genetic diversity among African zebu and sanga cattle. Our search for genomic regions under selection reveals signatures of selection for environmental adaptive traits. In particular, we identify signatures of selection including genes and/or pathways controlling anemia and feeding behavior in the trypanotolerant N'Dama, coat color and horn development in Ankole, and heat tolerance and tick resistance across African cattle especially in zebu breeds. Our findings unravel at the genome-wide level, the unique adaptive diversity of African cattle while emphasizing the opportunities for sustainable improvement of livestock productivity on the continent.
Fitness consequences of sex-specific selection.
Connallon, Tim; Cox, Robert M; Calsbeek, Ryan
2010-06-01
Theory suggests that sex-specific selection can facilitate adaptation in sexually reproducing populations. However, sexual conflict theory and recent experiments indicate that sex-specific selection is potentially costly due to sexual antagonism: alleles harmful to one sex can accumulate within a population because they are favored in the other sex. Whether sex-specific selection provides a net fitness benefit or cost depends, in part, on the relative frequency and strength of sexually concordant versus sexually antagonistic selection throughout a species' genome. Here, we model the net fitness consequences of sex-specific selection while explicitly considering both sexually concordant and sexually antagonistic selection. The model shows that, even when sexual antagonism is rare, the fitness costs that it imposes will generally overwhelm fitness benefits of sexually concordant selection. Furthermore, the cost of sexual antagonism is, at best, only partially resolved by the evolution of sex-limited gene expression. To evaluate the key parameters of the model, we analyze an extensive dataset of sex-specific selection gradients from wild populations, along with data from the experimental evolution literature. The model and data imply that sex-specific selection may likely impose a net cost on sexually reproducing species, although additional research will be required to confirm this conclusion.
Konijnendijk, Nellie; Shikano, Takahito; Daneels, Dorien; Volckaert, Filip A M; Raeymaekers, Joost A M
2015-09-01
Local adaptation is often obvious when gene flow is impeded, such as observed at large spatial scales and across strong ecological contrasts. However, it becomes less certain at small scales such as between adjacent populations or across weak ecological contrasts, when gene flow is strong. While studies on genomic adaptation tend to focus on the former, less is known about the genomic targets of natural selection in the latter situation. In this study, we investigate genomic adaptation in populations of the three-spined stickleback Gasterosteus aculeatus L. across a small-scale ecological transition with salinities ranging from brackish to fresh. Adaptation to salinity has been repeatedly demonstrated in this species. A genome scan based on 87 microsatellite markers revealed only few signatures of selection, likely owing to the constraints that homogenizing gene flow puts on adaptive divergence. However, the detected loci appear repeatedly as targets of selection in similar studies of genomic adaptation in the three-spined stickleback. We conclude that the signature of genomic selection in the face of strong gene flow is weak, yet detectable. We argue that the range of studies of genomic divergence should be extended to include more systems characterized by limited geographical and ecological isolation, which is often a realistic setting in nature.
Bahbahani, Hussain; Clifford, Harry; Wragg, David; Mbole-Kariuki, Mary N; Van Tassell, Curtis; Sonstegard, Tad; Woolhouse, Mark; Hanotte, Olivier
2015-01-01
The small East African Shorthorn Zebu (EASZ) is the main indigenous cattle across East Africa. A recent genome wide SNP analysis revealed an ancient stable African taurine x Asian zebu admixture. Here, we assess the presence of candidate signatures of positive selection in their genome, with the aim to provide qualitative insights about the corresponding selective pressures. Four hundred and twenty-five EASZ and four reference populations (Holstein-Friesian, Jersey, N’Dama and Nellore) were analysed using 46,171 SNPs covering all autosomes and the X chromosome. Following FST and two extended haplotype homozygosity-based (iHS and Rsb) analyses, 24 candidate genome regions within 14 autosomes and the X chromosome were revealed, of which 18 and 4 were previously identified in tropical-adapted and commercial breeds, respectively. These regions overlap with 340 bovine QTL. They include 409 annotated genes, of which 37 were considered as candidates. These genes are involved in various biological pathways (e.g. immunity, reproduction, development and heat tolerance). Our results support that different selection pressures (e.g. environmental constraints, human selection, genome admixture constraints) have shaped the genome of EASZ. We argue that these candidate regions represent genome landmarks to be maintained in breeding programs aiming to improve sustainable livestock productivity in the tropics. PMID:26130263
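The FST step of such a selection scan can be sketched for a single biallelic SNP. This is an illustration under stated assumptions, not the study's pipeline: it uses Wright's heterozygosity-based definition with two equally weighted populations, and the SNP names, allele frequencies, and the 0.5 cutoff are hypothetical. The haplotype-based iHS and Rsb statistics are not covered.

```python
def wright_fst(p1, p2):
    """Wright's FST for one biallelic SNP from allele frequencies in two populations.

    Assumes equal weighting of the two populations and ignores sample-size
    corrections, which real analyses usually apply.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)          # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0                                  # same allele fixed in both populations
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies (focal population, reference population).
snps = {"snp_a": (0.10, 0.90), "snp_b": (0.50, 0.55), "snp_c": (0.02, 0.70)}
cutoff = 0.5  # arbitrary illustrative threshold
candidates = sorted(name for name, (p1, p2) in snps.items()
                    if wright_fst(p1, p2) >= cutoff)
print(candidates)
```

In practice candidate regions are usually defined from windows of adjacent SNPs and an empirical genome-wide distribution rather than a fixed per-SNP cutoff.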
Nagasaki, Masao; Yamaguchi, Rui; Yoshida, Ryo; Imoto, Seiya; Doi, Atsushi; Tamada, Yoshinori; Matsuno, Hiroshi; Miyano, Satoru; Higuchi, Tomoyuki
2006-01-01
We propose a method for automatically constructing a hybrid functional Petri net as a simulation model of biological pathways. The problems we consider are how to choose the parameter values and how to set the network structure. Usually, these unknown factors are tuned empirically so that the simulation results are consistent with biological knowledge. Obviously, this approach limits the size of the network that can be studied. To extend the capability of the simulation model, we propose the use of a data assimilation approach that was originally established in the field of geophysical simulation science. We provide a genomic data assimilation framework that establishes a link between our simulation model and observed data, such as microarray gene expression data, by using a nonlinear state space model. A key idea of our genomic data assimilation is that the unknown parameters in the simulation model are converted into parameters of the state space model, and the estimates are obtained as the maximum a posteriori estimators. In the parameter estimation process, the simulation model is used to generate the system model in the state space model. Such a formulation enables us to handle both the model construction and the parameter tuning within a framework of Bayesian statistical inference. In particular, the Bayesian approach provides a way of controlling overfitting during parameter estimation, which is essential for constructing a reliable biological pathway. We demonstrate the effectiveness of our approach using synthetic data. As a result, parameter estimation using genomic data assimilation works very well and the network structure is suitably selected.
Orozco-terWengel, Pablo; Kapun, Martin; Nolte, Viola; Kofler, Robert; Flatt, Thomas; Schlötterer, Christian
2012-10-01
The genomic basis of adaptation to novel environments is a fundamental problem in evolutionary biology that has gained additional importance in the light of the recent global change discussion. Here, we combined laboratory natural selection (experimental evolution) in Drosophila melanogaster with genome-wide next generation sequencing of DNA pools (Pool-Seq) to identify alleles that are favourable in a novel laboratory environment and traced their trajectories during the adaptive process. Already after 15 generations, we identified a pronounced genomic response to selection, with almost 5000 single nucleotide polymorphisms (SNP; genome-wide false discovery rates < 0.005%) deviating from neutral expectation. Importantly, the evolutionary trajectories of the selected alleles were heterogeneous, with the alleles falling into two distinct classes: (i) alleles that continuously rise in frequency; and (ii) alleles that at first increase rapidly but whose frequencies then reach a plateau. Our data thus suggest that the genomic response to selection can involve a large number of selected SNPs that show unexpectedly complex evolutionary trajectories, possibly due to nonadditive effects. © 2012 Blackwell Publishing Ltd.
Supervised Learning for Detection of Duplicates in Genomic Sequence Databases.
Chen, Qingyu; Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin
2016-01-01
First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All Data are available as described in the supplementary material.
Guyon, Richard; Senger, Fabrice; Rakotomanga, Michaelle; Sadequi, Naoual; Volckaert, Filip A M; Hitte, Christophe; Galibert, Francis
2010-10-01
The selective breeding of fish for aquaculture purposes requires the understanding of the genetic basis of traits such as growth, behaviour, resistance to pathogens and sex determinism. Access to well-developed genomic resources is a prerequisite to improve the knowledge of these traits. Having this aim in mind, a radiation hybrid (RH) panel of European sea bass (Dicentrarchus labrax) was constructed from splenocytes irradiated at 3000 rad, allowing the construction of a 1581 marker RH map. A total of 1440 gene markers providing ~4400 anchors with the genomes of three-spined stickleback, medaka, pufferfish and zebrafish, helped establish synteny relationships with these model species. The identification of Conserved Segments Ordered (CSO) between sea bass and model species allows the anticipation of the position of any sea bass gene from its location in model genomes. Synteny relationships between sea bass and gilthead seabream were addressed by mapping 37 orthologous markers. The sea bass genetic linkage map was integrated in the RH map through the mapping of 141 microsatellites. We are thus able to present the first complete gene map of sea bass. It will facilitate linkage studies and the identification of candidate genes and Quantitative Trait Loci (QTL). The RH map further positions sea bass as a genetic and evolutionary model of Perciformes and supports their ongoing aquaculture expansion. Copyright © 2010 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vamathevan, Jessica J., E-mail: jessica.j.vamathevan@gsk.com; Hall, Matthew D.; Hasan, Samiul
Improving drug attrition remains a challenge in pharmaceutical discovery and development. A major cause of early attrition is the demonstration of safety signals which can negate any therapeutic index previously established. Safety attrition needs to be put in context of clinical translation (i.e. human relevance) and is negatively impacted by differences between animal models and human. In order to minimize such an impact, an earlier assessment of pharmacological target homology across animal model species will enhance understanding of the context of animal safety signals and aid species selection during later regulatory toxicology studies. Here we sequenced the genomes of the Sus scrofa Göttingen minipig and the Canis familiaris beagle, two widely used animal species in regulatory safety studies. Comparative analyses of these new genomes with other key model organisms, namely mouse, rat, cynomolgus macaque, rhesus macaque, two related breeds (S. scrofa Duroc and C. familiaris boxer) and human reveal considerable variation in gene content. Key genes in toxicology and metabolism studies, such as the UGT2 family, CYP2D6, and SLCO1A2, displayed unique duplication patterns. Comparisons of 317 known human drug targets revealed surprising variation such as species-specific positive selection, duplication and higher occurrences of pseudogenized targets in beagle (41 genes) relative to minipig (19 genes). These data will facilitate the more effective use of animals in biomedical research. - Highlights: • Genomes of the minipig and beagle dog, two species used in pharmaceutical studies. • First systematic comparative genome analysis of human and six experimental animals. • Key drug toxicology genes display unique duplication patterns across species. • Comparison of 317 drug targets show species-specific evolutionary patterns.
Schrider, Daniel R; Kern, Andrew D
2014-06-09
Identifying the complete set of functional elements within the human genome would be a windfall for multiple areas of biological research including medicine, molecular biology, and evolution. Complete knowledge of function would aid in the prioritization of loci when searching for the genetic bases of disease or adaptive phenotypes. Because mutations that disrupt function are disfavored by natural selection, purifying selection leaves a detectable signature within functional elements; accordingly, this signal has been exploited for over a decade through the use of genomic comparisons of distantly related species. While this is so, the functional complement of the genome changes extensively across time and between lineages; therefore, evidence of the current action of purifying selection in humans is essential. Because the removal of deleterious mutations by natural selection also reduces within-species genetic diversity within functional loci, dense population genetic data have the potential to reveal genomic elements that are currently functional. Here, we assess the potential of this approach by examining an ultradeep sample of human mitochondrial genomes (n = 16,411). We show that the high density of polymorphism in this data set precisely delineates regions experiencing purifying selection. Furthermore, we show that the number of segregating alleles at a site is strongly correlated with its divergence across species after accounting for known mutational biases in human mitochondrial DNA (ρ = 0.51; P < 2.2 × 10^-16). These two measures track one another at a remarkably fine scale across many loci, a correlation that is purely the result of natural selection. Our results demonstrate that genetic variation has the potential to reveal with surprising precision which regions in the genome are currently performing important functions and likely to have deleterious fitness effects when mutated. 
As more complete human genomes are sequenced, similar power to reveal purifying selection may be achievable in the human nuclear genome. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
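The rank correlation reported in this abstract (ρ between the number of segregating alleles at a site and its divergence across species) can be illustrated with a plain Spearman computation. This is a sketch on made-up per-site values: the variable names and numbers are hypothetical, and the adjustment for mutational biases that the authors mention is not modelled.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-site values: segregating allele counts and cross-species divergence.
alleles_per_site = [1, 1, 2, 3, 2, 4, 1, 3]
divergence = [0.00, 0.02, 0.10, 0.25, 0.05, 0.40, 0.01, 0.12]
print(round(spearman_rho(alleles_per_site, divergence), 3))
```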
Rübben, Albert; Nordhoff, Ole
2013-01-01
Summary: Most clinically distinguishable malignant tumors are characterized by specific mutations, specific patterns of chromosomal rearrangements and a predominant mechanism of genetic instability, but it remains unsolved whether modifications of cancer genomes can be explained solely by mutations and selection through the cancer microenvironment. It has been suggested that internal dynamics of genomic modifications as opposed to the external evolutionary forces have a significant and complex impact on Darwinian species evolution. A similar situation can be expected for somatic cancer evolution as molecular key mechanisms encountered in species evolution also constitute prevalent mutation mechanisms in human cancers. This assumption is developed into a systems approach of carcinogenesis which focuses on possible inner constraints of the genome architecture on lineage selection during somatic cancer evolution. The proposed systems approach can be considered an analogy to the concept of evolvability in species evolution. The principal hypothesis is that permissive or restrictive effects of the genome architecture on lineage selection during somatic cancer evolution exist and have a measurable impact. The systems approach postulates three classes of lineage selection effects of the genome architecture on somatic cancer evolution: i) effects mediated by changes of fitness of cells of cancer lineage, ii) effects mediated by changes of mutation probabilities and iii) effects mediated by changes of gene designation and physical and functional genome redundancy. Physical genome redundancy is the copy number of identical genetic sequences. Functional genome redundancy of a gene or a regulatory element is defined as the number of different genetic elements, regardless of copy number, coding for the same specific biological function within a cancer cell. 
Complex interactions of the genome architecture on lineage selection may be expected when modifications of the genome architecture have multiple and possibly opposed effects which manifest themselves at disparate times and progression stages. Dissection of putative mechanisms mediating constraints exerted by the genome architecture on somatic cancer evolution may provide an algorithm for understanding and predicting as well as modifying somatic cancer evolution in individual patients. PMID:23336076
Accuracy of genomic selection in European maize elite breeding populations.
Zhao, Yusheng; Gowda, Manje; Liu, Wenxin; Würschum, Tobias; Maurer, Hans P; Longin, Friedrich H; Ranc, Nicolas; Reif, Jochen C
2012-03-01
Genomic selection is a promising breeding strategy for rapid improvement of complex traits. The objective of our study was to investigate the prediction accuracy of genomic breeding values through cross validation. The study was based on experimental data of six segregating populations from a half-diallel mating design with 788 testcross progenies from an elite maize breeding program. The plants were intensively phenotyped in multi-location field trials and fingerprinted with 960 SNP markers. We used random regression best linear unbiased prediction in combination with fivefold cross validation. The prediction accuracy across populations was higher for grain moisture (0.90) than for grain yield (0.58). The accuracy of genomic selection realized for grain yield corresponds to the precision of phenotyping at unreplicated field trials in 3-4 locations. Because up to three generations per year are feasible for maize, selection gain per unit time is high and, consequently, genomic selection holds great promise for maize breeding programs.
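The combination of random regression BLUP and fivefold cross validation named in this abstract can be sketched as ridge regression on marker genotypes, scored on held-out lines. Everything below is an assumption for illustration: the data are simulated (200 lines, 100 markers), the shrinkage parameter is set from the simulated variances rather than estimated by REML, and accuracy is taken as the plain correlation between predicted and observed phenotypes, which may differ from the standardization used in the study.

```python
import numpy as np


def ridge_marker_effects(X, y, lam):
    """Solve (X'X + lam * I) beta = X'y for marker effects on centered data."""
    n_markers = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_markers), X.T @ y)


def cross_validated_accuracy(X, y, lam, k=5, seed=0):
    """Mean correlation between predicted and observed values over k held-out folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accuracies = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        X_train, y_train = X[train_mask], y[train_mask]
        # Center with training-set means only, so the test fold stays unseen.
        x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
        beta = ridge_marker_effects(X_train - x_mean, y_train - y_mean, lam)
        y_pred = (X[test_idx] - x_mean) @ beta + y_mean
        accuracies.append(np.corrcoef(y_pred, y[test_idx])[0, 1])
    return float(np.mean(accuracies))


# Simulated data: genotypes coded 0/1/2, normally distributed marker effects.
rng = np.random.default_rng(42)
n_lines, n_markers = 200, 100
X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)
true_effects = rng.normal(0.0, 1.0, n_markers)
genetic_value = X @ true_effects
y = genetic_value + rng.normal(0.0, genetic_value.std(), n_lines)  # heritability near 0.5

# Ratio of residual variance to marker-effect variance (the latter is 1 here).
lam = float(genetic_value.var())
accuracy = cross_validated_accuracy(X, y, lam)
print(round(accuracy, 2))
```

Centering inside each fold matters: using means computed from all lines would leak information from the validation set into the training step.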
Mutation-selection balance in mixed mating populations.
Kelly, John K
2007-05-21
An approximation to the average number of deleterious mutations per gamete, Q, is derived from a model allowing selection on both zygotes and male gametes. Progeny are produced by either outcrossing or self-fertilization with fixed probabilities. The genetic model is a standard in evolutionary biology: mutations occur at unlinked loci, have equivalent effects, and combine multiplicatively to determine fitness. The approximation developed here treats individual mutation counts with a generalized Poisson model conditioned on the distribution of selfing histories in the population. The approximation is accurate across the range of parameter sets considered and provides both analytical insights and greatly increased computational speed. Model predictions are discussed in relation to several outstanding problems, including the estimation of the genomic deleterious mutation rate (U), the generality of "selective interference" among loci, and the consequences of gametic selection for the joint distribution of inbreeding depression and mating system across species. Finally, conflicting results from previous analytical treatments of mutation-selection balance are resolved by tracing them to assumptions about the life cycle and the initial fate of mutations.
Hao, Chenyang; Wang, Yuquan; Chao, Shiaoman; Li, Tian; Liu, Hongxia; Wang, Lanfen; Zhang, Xueyong
2017-01-30
A Chinese wheat mini core collection was genotyped using the wheat 9 K iSelect SNP array. A total of 2420 and 2396 polymorphic SNPs were detected on the A and the B genome chromosomes, respectively, which formed 878 haplotype blocks. There were more blocks in the B genome, but the average block size was significantly (P < 0.05) smaller than that in the A genome. Intense selection (domestication and breeding) had a stronger effect on the A than on the B genome chromosomes. Based on the genetic pedigrees, many blocks can be traced back to a well-known Strampelli cross, which was made one century ago. Furthermore, polyploidization of wheat (both tetraploidization and hexaploidization) induced revolutionary changes in both the A and the B genomes, with a greater increase of gene diversity compared to their diploid ancestors. Modern breeding has dramatically increased diversity in the gene coding regions, though obvious blocks were formed on most of the chromosomes in both tetraploid and hexaploid wheats. Tag-SNP markers identified in this study can be used for marker assisted selection using haplotype blocks as a wheat breeding strategy. This strategy can also be employed to facilitate genome selection in other self-pollinating crop species.
Genomic selection of agronomic traits in hybrid rice using an NCII population.
Xu, Yang; Wang, Xin; Ding, Xiaowen; Zheng, Xingfei; Yang, Zefeng; Xu, Chenwu; Hu, Zhongli
2018-05-10
Hybrid breeding is an effective tool to improve yield in rice, while parental selection remains the key and difficult issue. Genomic selection (GS) provides opportunities to predict the performance of hybrids before phenotypes are measured. However, the application of GS is influenced by several genetic and statistical factors. Here, we used a rice North Carolina II (NC II) population constructed by crossing 115 rice varieties with five male sterile lines as a model to evaluate effects of statistical methods, heritability, marker density and training population size on prediction for hybrid performance. From the comparison of six GS methods, we found that predictabilities for different methods are significantly different, with genomic best linear unbiased prediction (GBLUP) and least absolute shrinkage and selection operator (LASSO) being the best, and support vector machine (SVM) and partial least squares (PLS) being the worst. Marker density has less influence on predicting rice hybrid performance than the size of the training population. Additionally, we used the 575 (115 × 5) rice hybrids as a training population to predict eight agronomic traits of all hybrids derived from 120 (115 + 5) rice varieties each mating with 3023 rice accessions from the 3000 rice genomes project (3 K RGP). Of the 362,760 potential hybrids, selection of the top 100 predicted hybrids would lead to 35.5%, 23.25%, 30.21%, 42.87%, 61.80%, 75.83%, 19.24% and 36.12% increase in grain yield per plant, thousand-grain weight, panicle number per plant, plant height, secondary branch number, grain number per panicle, panicle length and primary branch number, respectively. This study evaluated the factors affecting predictabilities for hybrid prediction and demonstrated the implementation of GS to predict hybrid performance of rice. Our results suggest that GS could enable the rapid selection of superior hybrids, thus increasing the efficiency of rice hybrid breeding.
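The population sizes quoted in this abstract follow from simple products, which the short check below reproduces. The variable names are illustrative only; the counts themselves are taken from the abstract.

```python
# Counts taken from the abstract; names are illustrative
n_varieties = 115       # rice varieties used as one set of parents
n_sterile_lines = 5     # male sterile lines
n_accessions = 3023     # accessions from the 3 K RGP

training_hybrids = n_varieties * n_sterile_lines        # NC II training set
candidate_parents = n_varieties + n_sterile_lines       # 120 varieties in total
potential_hybrids = candidate_parents * n_accessions    # all predicted crosses

# Share of candidates retained when the top 100 predicted hybrids are kept
selected_fraction = 100 / potential_hybrids

print(training_hybrids, candidate_parents, potential_hybrids)
print(f"selected fraction: {selected_fraction:.4%}")
```

Keeping 100 out of several hundred thousand candidates is a very high selection intensity, which is consistent with the large predicted gains listed for the eight traits.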
Watanabe, Takahito; Noji, Sumihare; Mito, Taro
2014-08-15
Hemimetabolous, or incompletely metamorphosing, insects are phylogenetically basal. These insects include many deleterious species. The cricket, Gryllus bimaculatus, is an emerging model for hemimetabolous insects, based on the success of RNA interference (RNAi)-based gene-functional analyses and transgenic technology. Taking advantage of genome-editing technologies in this species would greatly promote functional genomics studies. Genome editing using transcription activator-like effector nucleases (TALENs) has proven to be an effective method for site-specific genome manipulation in various species. TALENs are artificial nucleases that are capable of inducing DNA double-strand breaks into specified target sequences. Here, we describe a protocol for TALEN-based gene knockout in G. bimaculatus, including a mutant selection scheme via mutation detection assays, for generating homozygous knockout organisms. Copyright © 2014 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Zheng, Xiu-Deng; Tao, Yi
2017-03-01
The evolutionary significance of the interaction between paternal and maternal genomes in fertilized zygotes is a very interesting and challenging question. Wang et al. developed the concept of epigenetic game theory and use it to explain the interaction between paternal and maternal genomes in fertilized zygotes [1]. They emphasize that embryogenesis can be considered as an ecological system in which two highly distinct and specialized gametes coordinate through either cooperation or competition, or both, to maximize the fitness of embryos under Darwinian selection. More specifically, they integrate game theory to model the pattern of coordination of paternal and maternal genomes mediated by DNA methylation dynamics, and they call this epigenetic game theory.
Swaggart, Kayleigh A.; Pavlicev, Mihaela; Muglia, Louis J.
2015-01-01
The molecular mechanisms controlling human birth timing at term, or resulting in preterm birth, have been the focus of considerable investigation, but limited insights have been gained over the past 50 years. In part, these processes have remained elusive because of divergence in reproductive strategies and physiology shown by model organisms, making extrapolation to humans uncertain. Here, we summarize the evolution of progesterone signaling and variation in pregnancy maintenance and termination. We use this comparative physiology to support the hypothesis that selective pressure on genomic loci involved in the timing of parturition has shaped human birth timing, and that these loci can be identified with comparative genomic strategies. Previous limitations imposed by divergence of mechanisms provide an important new opportunity to elucidate fundamental pathways of parturition control through increasing availability of sequenced genomes and associated reproductive physiology characteristics across diverse organisms. PMID:25646385
An experimental validation of genomic selection in octoploid strawberry
Gezan, Salvador A; Osorio, Luis F; Verma, Sujeet; Whitaker, Vance M
2017-01-01
The primary goal of genomic selection is to increase genetic gains for complex traits by predicting performance of individuals for which phenotypic data are not available. The objective of this study was to experimentally evaluate the potential of genomic selection in strawberry breeding and to define a strategy for its implementation. Four clonally replicated field trials, two in each of 2 years and comprising a total of 1628 individuals, were established in 2013–2014 and 2014–2015. Five complex yield and fruit quality traits with moderate to low heritability were assessed in each trial. High-density genotyping was performed with the Affymetrix Axiom IStraw90 single-nucleotide polymorphism array, and 17 479 polymorphic markers were chosen for analysis. Several methods were compared, including Genomic BLUP, Bayes B, Bayes C, Bayesian LASSO Regression, Bayesian Ridge Regression and Reproducing Kernel Hilbert Spaces. Cross-validation within training populations resulted in higher values than for true validations across trials. For true validations, Bayes B gave the highest predictive abilities on average and also the highest selection efficiencies, particularly for yield traits, which had the lowest heritabilities. Selection efficiencies using Bayes B for parent selection ranged from 74% for average fruit weight to 34% for early marketable yield. A breeding strategy is proposed in which advanced selection trials are utilized as training populations and in which genomic selection can reduce the breeding cycle from 3 to 2 years for a subset of untested parents based on their predicted genomic breeding values. PMID:28090334
De Kort, H; Vandepitte, K; Mergeay, J; Mijnsbrugge, K V; Honnay, O
2015-01-01
The evaluation of the molecular signatures of selection in species lacking an available closely related reference genome remains challenging, yet it may provide valuable fundamental insights into the capacity of populations to respond to environmental cues. We screened 25 native populations of the tree species Frangula alnus subsp. alnus (Rhamnaceae), covering three different geographical scales, for 183 annotated single-nucleotide polymorphisms (SNPs). Standard population genomic outlier screens were combined with individual-based and multivariate landscape genomic approaches to examine the strength of selection relative to neutral processes in shaping genomic variation, and to identify the main environmental agents driving selection. Our results demonstrate a more distinct signature of selection with increasing geographical distance, as indicated by the proportion of SNPs (i) showing exceptional patterns of genetic diversity and differentiation (outliers) and (ii) associated with climate. Both temperature and precipitation have an important role as selective agents in shaping adaptive genomic differentiation in F. alnus subsp. alnus, although their relative importance differed among spatial scales. At the ‘intermediate’ and ‘regional’ scales, where limited genetic clustering and high population diversity were observed, some indications of natural selection may suggest a major role for gene flow in safeguarding adaptability. High genetic diversity, at loci under selection in particular, indicated considerable adaptive potential, which may nevertheless be compromised by the combined effects of climate change and habitat fragmentation. PMID:25944466
Palaiokostas, Christos; Ferraresso, Serena; Franch, Rafaella; Houston, Ross D.; Bargelloni, Luca
2016-01-01
Gilthead sea bream (Sparus aurata) is a species of paramount importance to the Mediterranean aquaculture industry, with an annual production exceeding 140,000 metric tons. Pasteurellosis due to the Gram-negative bacterium Photobacterium damselae subsp. piscicida (Phdp) causes significant mortality, especially during larval and juvenile stages, and poses a serious threat to bream production. Selective breeding for improved resistance to pasteurellosis is a promising avenue for disease control, and the use of genetic markers to predict breeding values can improve the accuracy of selection, and allow accurate calculation of estimated breeding values of nonchallenged animals. In the current study, a population of 825 sea bream juveniles, originating from a factorial cross between 67 broodfish (32 sires, 35 dams), were challenged by 30 min immersion with 1 × 10^5 CFU virulent Phdp. Mortalities and survivors were recorded and sampled for genotyping by sequencing. The restriction-site associated DNA sequencing approach, 2b-RAD, was used to generate genome-wide single nucleotide polymorphism (SNP) genotypes for all samples. A high-density linkage map containing 12,085 SNPs grouped into 24 linkage groups (consistent with the karyotype) was constructed. The heritability of surviving days (censored data) was 0.22 (95% highest density interval: 0.11–0.36) and 0.28 (95% highest density interval: 0.17–0.4) using the pedigree and the genomic relationship matrix respectively. A genome-wide association study did not reveal individual SNPs significantly associated with resistance at a genome-wide significance level. Genomic prediction approaches were tested to investigate the potential of the SNPs obtained by 2b-RAD for estimating breeding values for resistance. The accuracy of the genomic prediction models (r = 0.38–0.46) outperformed the traditional BLUP approach based on pedigree records (r = 0.30). 
Overall results suggest that major quantitative trait loci affecting resistance to pasteurellosis were not present in this population, but highlight the effectiveness of 2b-RAD genotyping by sequencing for genomic selection in a mass spawning fish species. PMID:27652890
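A genomic relationship matrix of the kind mentioned in this abstract is commonly built from SNP dosages. The sketch below uses VanRaden's first method as an assumed choice, since the abstract does not say which construction the authors used; the genotype matrix is simulated and the function name is hypothetical.

```python
import numpy as np

def vanraden_grm(M):
    """Genomic relationship matrix from an individuals x SNPs matrix coded 0/1/2.

    Assumed construction (VanRaden method 1): G = Z Z' / (2 * sum p(1-p)),
    where Z is the dosage matrix centered by twice the allele frequency.
    """
    p = M.mean(axis=0) / 2.0              # allele frequency per SNP
    Z = M - 2.0 * p                       # center each SNP column
    denom = 2.0 * np.sum(p * (1.0 - p))   # scales G toward the pedigree relationship scale
    if denom <= 0:
        raise ValueError("all markers are monomorphic")
    return Z @ Z.T / denom

# Simulated stand-in genotypes (assumption): 6 fish, 50 SNPs
rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(6, 50)).astype(float)
G = vanraden_grm(M)
print(np.round(G, 2))
```

Because each SNP column is centered with frequencies estimated from the same sample, every row of G sums to zero, which is a quick sanity check on the computation.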
Cell Context Dependent p53 Genome-Wide Binding Patterns and Enrichment at Repeats
Botcheva, Krassimira; McCorkle, Sean R.
2014-11-21
The p53 ability to elicit stress specific and cell type specific responses is well recognized, but how that specificity is established remains to be defined. Whether upon activation p53 binds to its genomic targets in a cell type and stress type dependent manner is still an open question. Here we show that the p53 binding to the human genome is selective and cell context-dependent. We mapped the genomic binding sites for the endogenous wild type p53 protein in the human cancer cell line HCT116 and compared them to those we previously determined in the normal cell line IMR90. We report distinct p53 genome-wide binding landscapes in two different cell lines, analyzed under the same treatment and experimental conditions, using the same ChIP-seq approach. This is evidence for cell context dependent p53 genomic binding. The observed differences affect the p53 binding sites distribution with respect to major genomic and epigenomic elements (promoter regions, CpG islands and repeats). We correlated the high-confidence p53 ChIP-seq peaks positions with the annotated human repeats (UCSC Human Genome Browser) and observed both common and cell line specific trends. In HCT116, the p53 binding was specifically enriched at LINE repeats, compared to IMR90 cells. The p53 genome-wide binding patterns in HCT116 and IMR90 likely reflect the different epigenetic landscapes in these two cell lines, resulting from cancer-associated changes (accumulated in HCT116) superimposed on tissue specific differences (HCT116 has epithelial, while IMR90 has mesenchymal origin). In conclusion, our data support the model for p53 binding to the human genome in a highly selective manner, mobilizing distinct sets of genes, contributing to distinct pathways.
Sunflower Hybrid Breeding: From Markers to Genomic Selection
Dimitrijevic, Aleksandra; Horn, Renate
2018-01-01
In sunflower, molecular markers for simple traits such as fertility restoration, high oleic acid content, herbicide tolerance or resistances to Plasmopara halstedii, Puccinia helianthi, or Orobanche cumana have been successfully used in marker-assisted breeding programs for years. However, agronomically important complex quantitative traits like yield, heterosis, drought tolerance, oil content or selection for disease resistance, e.g., against Sclerotinia sclerotiorum, have been challenging and will require genome-wide approaches. Plant genetic resources for sunflower, which are valuable for studying complex traits, are being collected and conserved worldwide. Sunflower association panels provide the basis for genome-wide association studies, overcoming disadvantages of biparental populations. Advances in technologies and the availability of the sunflower genome sequence made novel approaches on the whole genome level possible. Genotyping-by-sequencing and whole genome sequencing based on next generation sequencing technologies facilitated the production of large amounts of SNP markers for high density maps as well as SNP arrays and allowed genome-wide association studies and genomic selection in sunflower. Genome wide or candidate gene based association studies have been performed for traits like branching, flowering time, and resistance to Sclerotinia head and stalk rot. First steps in genomic selection with regard to hybrid performance and hybrid oil content have shown that genomic selection can successfully address complex quantitative traits in sunflower and will help to speed up sunflower breeding programs in the future. To make sunflower more competitive with other oil crops, higher levels of resistance against pathogens and better yield performance are required. In addition, optimizing plant architecture toward a more complex growth type for higher plant densities has the potential to considerably increase yields per hectare. 
Integrative approaches combining omic technologies (genomics, transcriptomics, proteomics, metabolomics and phenomics) using bioinformatic tools will facilitate the identification of target genes and markers for complex traits and will give a better insight into the mechanisms behind the traits. PMID:29387071
Schrider, Daniel R.; Mendes, Fábio K.; Hahn, Matthew W.; Kern, Andrew D.
2015-01-01
Characterizing the nature of the adaptive process at the genetic level is a central goal for population genetics. In particular, we know little about the sources of adaptive substitution or about the number of adaptive variants currently segregating in nature. Historically, population geneticists have focused attention on the hard-sweep model of adaptation in which a de novo beneficial mutation arises and rapidly fixes in a population. Recently more attention has been given to soft-sweep models, in which alleles that were previously neutral, or nearly so, drift until such a time as the environment shifts and their selection coefficient changes to become beneficial. It remains an active and difficult problem, however, to tease apart the telltale signatures of hard vs. soft sweeps in genomic polymorphism data. Through extensive simulations of hard- and soft-sweep models, here we show that indeed the two might not be separable through the use of simple summary statistics. In particular, it seems that recombination in regions linked to, but distant from, sites of hard sweeps can create patterns of polymorphism that closely mirror what is expected to be found near soft sweeps. We find that a very similar situation arises when using haplotype-based statistics that are aimed at detecting partial or ongoing selective sweeps, such that it is difficult to distinguish the shoulder of a hard sweep from the center of a partial sweep. While knowing the location of the selected site mitigates this problem slightly, we show that stochasticity in signatures of natural selection will frequently cause the signal to reach its zenith far from this site and that this effect is more severe for soft sweeps; thus inferences of the target as well as the mode of positive selection may be inaccurate. In addition, both the time since a sweep ends and biologically realistic levels of allelic gene conversion lead to errors in the classification and identification of selective sweeps. 
This general problem of “soft shoulders” underscores the difficulty in differentiating soft and partial sweeps from hard-sweep scenarios in molecular population genomics data. The soft-shoulder effect also implies that the more common hard sweeps have been in recent evolutionary history, the more prevalent spurious signatures of soft or partial sweeps may appear in some genome-wide scans. PMID:25716978
Schrider, Daniel R; Mendes, Fábio K; Hahn, Matthew W; Kern, Andrew D
2015-05-01
Characterizing the nature of the adaptive process at the genetic level is a central goal for population genetics. In particular, we know little about the sources of adaptive substitution or about the number of adaptive variants currently segregating in nature. Historically, population geneticists have focused attention on the hard-sweep model of adaptation in which a de novo beneficial mutation arises and rapidly fixes in a population. Recently more attention has been given to soft-sweep models, in which alleles that were previously neutral, or nearly so, drift until such a time as the environment shifts and their selection coefficient changes to become beneficial. It remains an active and difficult problem, however, to tease apart the telltale signatures of hard vs. soft sweeps in genomic polymorphism data. Through extensive simulations of hard- and soft-sweep models, here we show that indeed the two might not be separable through the use of simple summary statistics. In particular, it seems that recombination in regions linked to, but distant from, sites of hard sweeps can create patterns of polymorphism that closely mirror what is expected to be found near soft sweeps. We find that a very similar situation arises when using haplotype-based statistics that are aimed at detecting partial or ongoing selective sweeps, such that it is difficult to distinguish the shoulder of a hard sweep from the center of a partial sweep. While knowing the location of the selected site mitigates this problem slightly, we show that stochasticity in signatures of natural selection will frequently cause the signal to reach its zenith far from this site and that this effect is more severe for soft sweeps; thus inferences of the target as well as the mode of positive selection may be inaccurate. In addition, both the time since a sweep ends and biologically realistic levels of allelic gene conversion lead to errors in the classification and identification of selective sweeps. 
This general problem of "soft shoulders" underscores the difficulty in differentiating soft and partial sweeps from hard-sweep scenarios in molecular population genomics data. The soft-shoulder effect also implies that the more common hard sweeps have been in recent evolutionary history, the more prevalent spurious signatures of soft or partial sweeps may appear in some genome-wide scans. Copyright © 2015 by the Genetics Society of America.
Reproductive technologies combine well with genomic selection in dairy breeding programs.
Thomasen, J R; Willam, A; Egger-Danner, C; Sørensen, A C
2016-02-01
The objective of the present study was to examine whether genomic selection of females interacts with the use of reproductive technologies (RT) to increase annual monetary genetic gain (AMGG). This was tested using a factorial design with 3 factors: genomic selection of females (0 or 2,000 genotyped heifers per year), RT (0 or 50 donors selected at 14 mo of age for producing 10 offspring), and 2 reliabilities of genomic prediction. In addition, different strategies for use of RT and how strategies interact with the reliability of genomic prediction were investigated using stochastic simulation by varying (1) number of donors (25, 50, 100, 200), (2) number of calves born per donor (10 or 20), (3) age of donor (2 or 14 mo), and (4) number of sires (25, 50, 100, 200). In total, 72 different breeding schemes were investigated. The profitability of the different breeding strategies was evaluated by deterministic simulation by varying the cost of a calf born through reproductive technologies at levels of €500, €1,000, and €1,500. The results confirm our hypothesis that combining genomic selection of females with use of RT increases AMGG more than in a reference scheme without genomic selection in females. When the reliability of genomic prediction is high, the effect on rate of inbreeding (ΔF) is small. The study also demonstrates favorable interaction effects between the components of the breeder's equation (selection intensity, selection accuracy, generation interval) for the bull dam donor path, leading to higher AMGG. Increasing the donor program and number of born calves to achieve higher AMGG is associated with the undesirable effect of increased ΔF. This can be alleviated, however, by increasing the numbers of sires without markedly compromising AMGG. For the major part of the investigated donor schemes, the investment in RT is profitable in dairy cattle populations, even at high levels of costs for RT. Copyright © 2016 American Dairy Science Association. 
Published by Elsevier Inc. All rights reserved.
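The components of the breeder's equation named in this abstract combine, for a single selection path, as annual gain = intensity × accuracy × additive genetic standard deviation / generation interval. The sketch below illustrates that relationship only; all numeric values are hypothetical and are not taken from the study, and a full breeding scheme would sum intensities and intervals over several selection paths.

```python
def annual_genetic_gain(intensity, accuracy, sigma_a, generation_interval):
    """Single-path breeder's equation: i * r * sigma_A / L (gain per year)."""
    if generation_interval <= 0:
        raise ValueError("generation interval must be positive")
    return intensity * accuracy * sigma_a / generation_interval

# Hypothetical values: same intensity and accuracy, shorter generation interval
gain_long = annual_genetic_gain(intensity=1.8, accuracy=0.6,
                                sigma_a=1.0, generation_interval=2.5)
gain_short = annual_genetic_gain(intensity=1.8, accuracy=0.6,
                                 sigma_a=1.0, generation_interval=1.5)

print(f"gain with 2.5-year interval: {gain_long:.3f} sigma_A per year")
print(f"gain with 1.5-year interval: {gain_short:.3f} sigma_A per year")
```

The example shows why selecting donors at a younger age can raise annual gain when accuracy is held constant: the same selection differential is realized over a shorter interval.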
New frontiers in the study of human cultural and genetic evolution.
Ross, Cody T; Richerson, Peter J
2014-12-01
In this review, we discuss the dynamic linkages between culture and the genetic evolution of the human species. We begin by briefly describing the framework of gene-culture coevolutionary (or dual-inheritance) models for human evolutionary change. Until recently, the literature on gene-culture coevolution was composed primarily of mathematical models and formalized theory describing the complex dynamics underlying human behavior, adaptation, and technological evolution, but had little empirical support concerning genetics. The rapid progress in the fields of molecular genetics and genomics, however, is now providing the kinds of data needed to produce rich empirical support for gene-culture coevolutionary models. We briefly outline how theoretical and methodological progress in genome sciences has provided ways for the strength of selection on genes to be evaluated, and then outline how evidence of selection on several key genes can be directly linked to human cultural practices. We then describe some exciting new directions in the empirical study of gene-culture coevolution, and conclude with a discussion of the role of gene-culture evolutionary models in the future integration of medical, biological, and social sciences. Copyright © 2014 Elsevier Ltd. All rights reserved.
Kling, Teresia; Johansson, Patrik; Sanchez, José; Marinescu, Voichita D.; Jörnsten, Rebecka; Nelander, Sven
2015-01-01
Statistical network modeling techniques are increasingly important tools to analyze cancer genomics data. However, current tools and resources are not designed to work across multiple diagnoses and technical platforms, thus limiting their applicability to comprehensive pan-cancer datasets such as The Cancer Genome Atlas (TCGA). To address this, we describe a new data driven modeling method, based on generalized Sparse Inverse Covariance Selection (SICS). The method integrates genetic, epigenetic and transcriptional data from multiple cancers, to define links that are present in multiple cancers, a subset of cancers, or a single cancer. It is shown to be statistically robust and effective at detecting direct pathway links in data from TCGA. To facilitate interpretation of the results, we introduce a publicly accessible tool (cancerlandscapes.org), in which the derived networks are explored as interactive web content, linked to several pathway and pharmacological databases. To evaluate the performance of the method, we constructed a model for eight TCGA cancers, using data from 3900 patients. The model rediscovered known mechanisms and contained interesting predictions. Possible applications include prediction of regulatory relationships, comparison of network modules across multiple forms of cancer and identification of drug targets. PMID:25953855
Optimization of Swine Breeding Programs Using Genomic Selection with ZPLAN+
Lopez, B. M.; Kang, H. S.; Kim, T. H.; Viterbo, V. S.; Kim, H. S.; Na, C. S.; Seo, K. S.
2016-01-01
The objective of this study was to evaluate the present conventional selection program of a swine nucleus farm and compare it with a new selection strategy employing genomic enhanced breeding value (GEBV) as the selection criterion. The ZPLAN+ software was employed to calculate and compare the genetic gain, total cost, return and profit of each selection strategy. The first strategy reflected the current conventional breeding program, which was a progeny test system (CS). The second strategy was a selection scheme based strictly on genomic information (GS1). The third scenario was the same as GS1, but the selection by GEBV was further supplemented by the performance test (GS2). The last scenario was a mixture of genomic information and progeny tests (GS3). The results showed that the accuracy of the selection index of young boars of GS1 was 26% higher than that of CS. On the other hand, both GS2 and GS3 gave 31% higher accuracy than CS for young boars. The annual monetary genetic gain of GS1, GS2 and GS3 was 10%, 12%, and 11% higher, respectively, than that of CS. As expected, the discounted costs of genomic selection strategies were higher than those of CS. The costs of GS1, GS2 and GS3 were 35%, 73%, and 89% higher than those of CS, respectively, assuming a genotyping cost of $120. As a result, the discounted profit per animal of GS1 and GS2 was 8% and 2% higher, respectively, than that of CS while GS3 was 6% lower. Comparison among genomic breeding scenarios revealed that GS1 was more profitable than GS2 and GS3. The genomic selection schemes, especially GS1 and GS2, were clearly superior to the conventional scheme in terms of monetary genetic gain and profit. PMID:26954222
Feltus, F Alex
2014-06-01
Understanding the control of any trait optimally requires the detection of causal genes, gene interaction, and mechanism of action to discover and model the biochemical pathways underlying the expressed phenotype. Functional genomics techniques, including RNA expression profiling via microarray and high-throughput DNA sequencing, allow for the precise genome localization of biological information. Powerful genetic approaches, including quantitative trait locus (QTL) and genome-wide association study mapping, link phenotype with genome positions, yet genetics is less precise in localizing the relevant mechanistic information encoded in DNA. The coupling of salient functional genomic signals with genetically mapped positions is an appealing approach to discover meaningful gene-phenotype relationships. Techniques used to define this genetic-genomic convergence comprise the field of systems genetics. This short review will address an application of systems genetics where RNA profiles are associated with genetically mapped genome positions of individual genes (eQTL mapping) or as gene sets (co-expression network modules). Both approaches can be applied for knowledge independent selection of candidate genes (and possible control mechanisms) underlying complex traits where multiple, likely unlinked, genomic regions might control specific complex traits. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
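The eQTL mapping idea summarized in this abstract, associating an RNA expression profile with a genetically mapped genome position, can be illustrated in its simplest form as a regression of expression on genotype dosage at one marker. This is a sketch under stated assumptions: the data are simulated, the effect size of 0.8 is arbitrary, and the function name is hypothetical; real analyses test many marker-gene pairs and correct for multiple testing and covariates.

```python
import numpy as np

def single_marker_eqtl(genotype, expression):
    """Least-squares slope and t-statistic for expression regressed on dosage."""
    g = genotype - genotype.mean()
    e = expression - expression.mean()
    slope = (g @ e) / (g @ g)
    resid = e - slope * g
    dof = len(g) - 2                                  # intercept and slope estimated
    se = np.sqrt((resid @ resid) / dof / (g @ g))     # standard error of the slope
    return slope, slope / se

# Simulated stand-in data (assumption): 500 individuals, one SNP, one transcript
rng = np.random.default_rng(0)
genotype = rng.integers(0, 3, size=500).astype(float)      # 0/1/2 allele copies
expression = 0.8 * genotype + rng.normal(0.0, 1.0, size=500)

slope, t_stat = single_marker_eqtl(genotype, expression)
print(f"estimated effect per allele copy: {slope:.2f}, t = {t_stat:.1f}")
```

A large t-statistic at a marker near the gene would suggest a local (cis) eQTL, whereas signals at distant positions would point to trans regulation.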