Genomic selection in plant breeding
USDA-ARS?s Scientific Manuscript database
Genomic selection (GS) is a method to predict the genetic value of selection candidates based on the genomic estimated breeding value (GEBV) predicted from high-density markers positioned throughout the genome. Unlike marker-assisted selection, the GEBV is based on all markers including both minor ...
Muley, Vijaykumar Yogesh; Ranjan, Akash
2012-01-01
Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history. This history can be traced out for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs in multiple genomes known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood and co-occurrence of the orthologous protein coding genes in the same cluster or operon. These are collectively known as genomic context methods. On the other hand a method called mirrortree is based on the similarity of phylogenetic trees between two interacting proteins. Comprehensive performance analyses of these methods have been frequently reported in literature. However, very few studies provide insight into the effect of reference genome selection on detection of meaningful protein interactions. We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled in accordance with phylogenetic diversity and relationship between organisms from 565 bacteria. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods. Higher performance for predicting protein-protein interactions was achievable even with 100-150 bacterial genomes out of 565 genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that in order to obtain a good performance, it is better to sample few genomes of related genera of prokaryotes from the large number of available genomes. Moreover, such a sampling allows for selecting 50-100 genomes for comparable accuracy of predictions when computational resources are limited.
Genomic selection in plant breeding.
Newell, Mark A; Jannink, Jean-Luc
2014-01-01
Genomic selection (GS) is a method to predict the genetic value of selection candidates based on the genomic estimated breeding value (GEBV) predicted from high-density markers positioned throughout the genome. Unlike marker-assisted selection, the GEBV is based on all markers including both minor and major marker effects. Thus, the GEBV may capture more of the genetic variation for the particular trait under selection.
Navigating the Interface Between Landscape Genetics and Landscape Genomics.
Storfer, Andrew; Patton, Austin; Fraik, Alexandra K
2018-01-01
As next-generation sequencing data become increasingly available for non-model organisms, a shift has occurred in the focus of studies of the geographic distribution of genetic variation. Whereas landscape genetics studies primarily focus on testing the effects of landscape variables on gene flow and genetic population structure, landscape genomics studies focus on detecting candidate genes under selection that indicate possible local adaptation. Navigating the transition between landscape genomics and landscape genetics can be challenging. The number of molecular markers analyzed has shifted from what used to be a few dozen loci to thousands of loci and even full genomes. Although genome scale data can be separated into sets of neutral loci for analyses of gene flow and population structure and putative loci under selection for inference of local adaptation, there are inherent differences in the questions that are addressed in the two study frameworks. We discuss these differences and their implications for study design, marker choice and downstream analysis methods. Similar to the rapid proliferation of analysis methods in the early development of landscape genetics, new analytical methods for detection of selection in landscape genomics studies are burgeoning. We focus on genome scan methods for detection of selection, and in particular, outlier differentiation methods and genetic-environment association tests because they are the most widely used. Use of genome scan methods requires an understanding of the potential mismatches between the biology of a species and assumptions inherent in analytical methods used, which can lead to high false positive rates of detected loci under selection. Key to choosing appropriate genome scan methods is an understanding of the underlying demographic structure of study populations, and such data can be obtained using neutral loci from the generated genome-wide data or prior knowledge of a species' phylogeographic history. To this end, we summarize recent simulation studies that test the power and accuracy of genome scan methods under a variety of demographic scenarios and sampling designs. We conclude with a discussion of additional considerations for future method development, and a summary of methods that show promise for landscape genomics studies but are not yet widely used.
Navigating the Interface Between Landscape Genetics and Landscape Genomics
Storfer, Andrew; Patton, Austin; Fraik, Alexandra K.
2018-01-01
As next-generation sequencing data become increasingly available for non-model organisms, a shift has occurred in the focus of studies of the geographic distribution of genetic variation. Whereas landscape genetics studies primarily focus on testing the effects of landscape variables on gene flow and genetic population structure, landscape genomics studies focus on detecting candidate genes under selection that indicate possible local adaptation. Navigating the transition between landscape genomics and landscape genetics can be challenging. The number of molecular markers analyzed has shifted from what used to be a few dozen loci to thousands of loci and even full genomes. Although genome scale data can be separated into sets of neutral loci for analyses of gene flow and population structure and putative loci under selection for inference of local adaptation, there are inherent differences in the questions that are addressed in the two study frameworks. We discuss these differences and their implications for study design, marker choice and downstream analysis methods. Similar to the rapid proliferation of analysis methods in the early development of landscape genetics, new analytical methods for detection of selection in landscape genomics studies are burgeoning. We focus on genome scan methods for detection of selection, and in particular, outlier differentiation methods and genetic-environment association tests because they are the most widely used. Use of genome scan methods requires an understanding of the potential mismatches between the biology of a species and assumptions inherent in analytical methods used, which can lead to high false positive rates of detected loci under selection. Key to choosing appropriate genome scan methods is an understanding of the underlying demographic structure of study populations, and such data can be obtained using neutral loci from the generated genome-wide data or prior knowledge of a species' phylogeographic history. To this end, we summarize recent simulation studies that test the power and accuracy of genome scan methods under a variety of demographic scenarios and sampling designs. We conclude with a discussion of additional considerations for future method development, and a summary of methods that show promise for landscape genomics studies but are not yet widely used. PMID:29593776
Efficient Breeding by Genomic Mating.
Akdemir, Deniz; Sánchez, Julio I
2016-01-01
Selection in breeding programs can be done by using phenotypes (phenotypic selection), pedigree relationship (breeding value selection) or molecular markers (marker assisted selection or genomic selection). All these methods are based on truncation selection, focusing on the best performance of parents before mating. In this article we proposed an approach to breeding, named genomic mating, which focuses on mating instead of truncation selection. Genomic mating uses information in a similar fashion to genomic selection but includes information on complementation of parents to be mated. Following the efficiency frontier surface, genomic mating uses concepts of estimated breeding values, risk (usefulness) and coefficient of ancestry to optimize mating between parents. We used a genetic algorithm to find solutions to this optimization problem and the results from our simulations comparing genomic selection, phenotypic selection and the mating approach indicate that current approach for breeding complex traits is more favorable than phenotypic and genomic selection. Genomic mating is similar to genomic selection in terms of estimating marker effects, but in genomic mating the genetic information and the estimated marker effects are used to decide which genotypes should be crossed to obtain the next breeding population.
Jonas, Elisabeth; de Koning, Dirk Jan
Genomic Selection is an important topic in quantitative genetics and breeding. Not only does it allow the full use of current molecular genetic technologies, it stimulates also the development of new methods and models. Genomic selection, if fully implemented in commercial farming, should have a major impact on the productivity of various agricultural systems. But suggested approaches need to be applicable in commercial breeding populations. Many of the published research studies focus on methodologies. We conclude from the reviewed publications, that a stronger focus on strategies for the implementation of genomic selection in advanced breeding lines, introduction of new varieties, hybrids or multi-line crosses is needed. Efforts to find solutions for a better prediction and integration of environmental influences need to continue within applied breeding schemes. Goals of the implementation of genomic selection into crop breeding should be carefully defined and crop breeders in the private sector will play a substantial part in the decision-making process. However, the lack of published results from studies within, or in collaboration with, private companies diminishes the knowledge on the status of genomic selection within applied breeding programmes. Studies on the implementation of genomic selection in plant breeding need to evaluate models and methods with an enhanced emphasis on population-specific requirements and production environments. Adaptation of methods to breeding schemes or changes to breeding programmes for a better integration of genomic selection strategies are needed across species. More openness with a continuous exchange will contribute to successes.
A Selective Review of Group Selection in High-Dimensional Models
Huang, Jian; Breheny, Patrick; Ma, Shuangge
2013-01-01
Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome wide association studies. We also highlight some issues that require further study. PMID:24174707
Assessing genomic selection prediction accuracy in a dynamic barley breeding
USDA-ARS?s Scientific Manuscript database
Genomic selection is a method to improve quantitative traits in crops and livestock by estimating breeding values of selection candidates using phenotype and genome-wide marker data sets. Prediction accuracy has been evaluated through simulation and cross-validation, however validation based on prog...
Accuracy and training population design for genomic selection in elite north american oats
USDA-ARS?s Scientific Manuscript database
Genomic selection (GS) is a method to estimate the breeding values of individuals by using markers throughout the genome. We evaluated the accuracies of GS using data from five traits on 446 oat lines genotyped with 1005 Diversity Array Technology (DArT) markers and two GS methods (RR-BLUP and Bayes...
Imputation of unordered markers and the impact on genomic selection accuracy
USDA-ARS?s Scientific Manuscript database
Genomic selection, a breeding method that promises to accelerate rates of genetic gain, requires dense, genome-wide marker data. Genotyping-by-sequencing can generate a large number of de novo markers. However, without a reference genome, these markers are unordered and typically have a large propo...
Practical Approaches for Detecting Selection in Microbial Genomes.
Hedge, Jessica; Wilson, Daniel J
2016-02-01
Microbial genome evolution is shaped by a variety of selective pressures. Understanding how these processes occur can help to address important problems in microbiology by explaining observed differences in phenotypes, including virulence and resistance to antibiotics. Greater access to whole-genome sequencing provides microbiologists with the opportunity to perform large-scale analyses of selection in novel settings, such as within individual hosts. This tutorial aims to guide researchers through the fundamentals underpinning popular methods for measuring selection in pathogens. These methods are transferable to a wide variety of organisms, and the exercises provided are designed for researchers with any level of programming experience.
Lee, Ciaran M; Cradick, Thomas J; Fine, Eli J; Bao, Gang
2016-01-01
The rapid advancement in targeted genome editing using engineered nucleases such as ZFNs, TALENs, and CRISPR/Cas9 systems has resulted in a suite of powerful methods that allows researchers to target any genomic locus of interest. A complementary set of design tools has been developed to aid researchers with nuclease design, target site selection, and experimental validation. Here, we review the various tools available for target selection in designing engineered nucleases, and for quantifying nuclease activity and specificity, including web-based search tools and experimental methods. We also elucidate challenges in target selection, especially in predicting off-target effects, and discuss future directions in precision genome editing and its applications. PMID:26750397
solGS: a web-based tool for genomic selection
USDA-ARS?s Scientific Manuscript database
Genomic selection (GS) promises to improve accuracy in estimating breeding values and genetic gain for quantitative traits compared to traditional breeding methods. Its reliance on high-throughput genome-wide markers and statistical complexity, however, is a serious challenge in data management, ana...
USDA-ARS?s Scientific Manuscript database
Genomic Selection (GS) is a new breeding method in which genome-wide markers are used to predict the breeding value of individuals in a breeding population. GS has been shown to improve breeding efficiency in dairy cattle and several crop plant species, and here we evaluate for the first time its ef...
Allele frequency changes due to hitch-hiking in genomic selection programs
2014-01-01
Background Genomic selection makes it possible to reduce pedigree-based inbreeding over best linear unbiased prediction (BLUP) by increasing emphasis on own rather than family information. However, pedigree inbreeding might not accurately reflect loss of genetic variation and the true level of inbreeding due to changes in allele frequencies and hitch-hiking. This study aimed at understanding the impact of using long-term genomic selection on changes in allele frequencies, genetic variation and level of inbreeding. Methods Selection was performed in simulated scenarios with a population of 400 animals for 25 consecutive generations. Six genetic models were considered with different heritabilities and numbers of QTL (quantitative trait loci) affecting the trait. Four selection criteria were used, including selection on own phenotype and on estimated breeding values (EBV) derived using phenotype-BLUP, genomic BLUP and Bayesian Lasso. Changes in allele frequencies at QTL, markers and linked neutral loci were investigated for the different selection criteria and different scenarios, along with the loss of favourable alleles and the rate of inbreeding measured by pedigree and runs of homozygosity. Results For each selection criterion, hitch-hiking in the vicinity of the QTL appeared more extensive when accuracy of selection was higher and the number of QTL was lower. When inbreeding was measured by pedigree information, selection on genomic BLUP EBV resulted in lower levels of inbreeding than selection on phenotype BLUP EBV, but this did not always apply when inbreeding was measured by runs of homozygosity. Compared to genomic BLUP, selection on EBV from Bayesian Lasso led to less genetic drift, reduced loss of favourable alleles and more effectively controlled the rate of both pedigree and genomic inbreeding in all simulated scenarios. In addition, selection on EBV from Bayesian Lasso showed a higher selection differential for mendelian sampling terms than selection on genomic BLUP EBV. Conclusions Neutral variation can be shaped to a great extent by the hitch-hiking effects associated with selection, rather than just by genetic drift. When implementing long-term genomic selection, strategies for genomic control of inbreeding are essential, due to a considerable hitch-hiking effect, regardless of the method that is used for prediction of EBV. PMID:24495634
NASA Astrophysics Data System (ADS)
Su, Hailin; Li, Hengde; Wang, Shi; Wang, Yangfan; Bao, Zhenmin
2017-02-01
Genomic selection is more and more popular in animal and plant breeding industries all around the world, as it can be applied early in life without impacting selection candidates. The objective of this study was to bring the advantages of genomic selection to scallop breeding. Two different genomic selection tools MixP and gsbay were applied on genomic evaluation of simulated data and Zhikong scallop ( Chlamys farreri) field data. The data were compared with genomic best linear unbiased prediction (GBLUP) method which has been applied widely. Our results showed that both MixP and gsbay could accurately estimate single-nucleotide polymorphism (SNP) marker effects, and thereby could be applied for the analysis of genomic estimated breeding values (GEBV). In simulated data from different scenarios, the accuracy of GEBV acquired was ranged from 0.20 to 0.78 by MixP; it was ranged from 0.21 to 0.67 by gsbay; and it was ranged from 0.21 to 0.61 by GBLUP. Estimations made by MixP and gsbay were expected to be more reliable than those estimated by GBLUP. Predictions made by gsbay were more robust, while with MixP the computation is much faster, especially in dealing with large-scale data. These results suggested that both algorithms implemented by MixP and gsbay are feasible to carry out genomic selection in scallop breeding, and more genotype data will be necessary to produce genomic estimated breeding values with a higher accuracy for the industry.
USDA-ARS?s Scientific Manuscript database
Bacterial cold water disease (BCWD) causes significant economic losses in salmonid aquaculture, and traditional family-based breeding programs aimed at improving BCWD resistance have been limited to exploiting only between-family variation. We used genomic selection (GS) models to predict genomic br...
Schrider, Daniel R.; Kern, Andrew D.
2015-01-01
The comparative genomics revolution of the past decade has enabled the discovery of functional elements in the human genome via sequence comparison. While that is so, an important class of elements, those specific to humans, is entirely missed by searching for sequence conservation across species. Here we present an analysis based on variation data among human genomes that utilizes a supervised machine learning approach for the identification of human-specific purifying selection in the genome. Using only allele frequency information from the complete low-coverage 1000 Genomes Project data set in conjunction with a support vector machine trained from known functional and nonfunctional portions of the genome, we are able to accurately identify portions of the genome constrained by purifying selection. Our method identifies previously known human-specific gains or losses of function and uncovers many novel candidates. Candidate targets for gain and loss of function along the human lineage include numerous putative regulatory regions of genes essential for normal development of the central nervous system, including a significant enrichment of gain of function events near neurotransmitter receptor genes. These results are consistent with regulatory turnover being a key mechanism in the evolution of human-specific characteristics of brain development. Finally, we show that the majority of the genome is unconstrained by natural selection currently, in agreement with what has been estimated from phylogenetic methods but in sharp contrast to estimates based on transcriptomics or other high-throughput functional methods. PMID:26590212
NASA Astrophysics Data System (ADS)
Rachmatia, H.; Kusuma, W. A.; Hasibuan, L. S.
2017-05-01
Selection in plant breeding could be more effective and more efficient if it is based on genomic data. Genomic selection (GS) is a new approach for plant-breeding selection that exploits genomic data through a mechanism called genomic prediction (GP). Most of GP models used linear methods that ignore effects of interaction among genes and effects of higher order nonlinearities. Deep belief network (DBN), one of the architectural in deep learning methods, is able to model data in high level of abstraction that involves nonlinearities effects of the data. This study implemented DBN for developing a GP model utilizing whole-genome Single Nucleotide Polymorphisms (SNPs) as data for training and testing. The case study was a set of traits in maize. The maize dataset was acquisitioned from CIMMYT’s (International Maize and Wheat Improvement Center) Global Maize program. Based on Pearson correlation, DBN is outperformed than other methods, kernel Hilbert space (RKHS) regression, Bayesian LASSO (BL), best linear unbiased predictor (BLUP), in case allegedly non-additive traits. DBN achieves correlation of 0.579 within -1 to 1 range.
Michel, Sebastian; Ametz, Christian; Gungor, Huseyin; Akgöl, Batuhan; Epure, Doru; Grausgruber, Heinrich; Löschenberger, Franziska; Buerstmayr, Hermann
2017-02-01
Early generation genomic selection is superior to conventional phenotypic selection in line breeding and can be strongly improved by including additional information from preliminary yield trials. The selection of lines that enter resource-demanding multi-environment trials is a crucial decision in every line breeding program as a large amount of resources are allocated for thoroughly testing these potential varietal candidates. We compared conventional phenotypic selection with various genomic selection approaches across multiple years as well as the merit of integrating phenotypic information from preliminary yield trials into the genomic selection framework. The prediction accuracy using only phenotypic data was rather low (r = 0.21) for grain yield but could be improved by modeling genetic relationships in unreplicated preliminary yield trials (r = 0.33). Genomic selection models were nevertheless found to be superior to conventional phenotypic selection for predicting grain yield performance of lines across years (r = 0.39). We subsequently simplified the problem of predicting untested lines in untested years to predicting tested lines in untested years by combining breeding values from preliminary yield trials and predictions from genomic selection models by a heritability index. This genomic assisted selection led to a 20% increase in prediction accuracy, which could be further enhanced by an appropriate marker selection for both grain yield (r = 0.48) and protein content (r = 0.63). The easy to implement and robust genomic assisted selection gave thus a higher prediction accuracy than either conventional phenotypic or genomic selection alone. The proposed method took the complex inheritance of both low and high heritable traits into account and appears capable to support breeders in their selection decisions to develop enhanced varieties more efficiently.
Application of genomic selection in farm animal breeding.
Tan, Cheng; Bian, Cheng; Yang, Da; Li, Ning; Wu, Zhen-Fang; Hu, Xiao-Xiang
2017-11-20
Genomic selection (GS) has become a widely accepted method in animal breeding to genetically improve economic traits. With the declining costs of high-density SNP chips and next-generation sequencing, GS has been applied in dairy cattle, swine, poultry and other animals and gained varying degrees of success. Currently, major challenges in GS studies include further reducing the cost of genome-wide SNP genotyping and improving the predictive accuracy of genomic estimated breeding value (GEBV). In this review, we summarize various methods for genome-wide SNP genotyping and GEBV prediction, and give a brief introduction of GS in livestock and poultry breeding. This review will provide a reference for further implementation of GS in farm animal breeding.
Whole-genome regression and prediction methods applied to plant and animal breeding.
de Los Campos, Gustavo; Hickey, John M; Pong-Wong, Ricardo; Daetwyler, Hans D; Calus, Mario P L
2013-02-01
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade.
Whole-Genome Regression and Prediction Methods Applied to Plant and Animal Breeding
de los Campos, Gustavo; Hickey, John M.; Pong-Wong, Ricardo; Daetwyler, Hans D.; Calus, Mario P. L.
2013-01-01
Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade. PMID:22745228
Laurenson, Yan C S M; Kyriazakis, Ilias; Bishop, Stephen C
2013-10-18
Estimated breeding values (EBV) for faecal egg count (FEC) and genetic markers for host resistance to nematodes may be used to identify resistant animals for selective breeding programmes. Similarly, targeted selective treatment (TST) requires the ability to identify the animals that will benefit most from anthelmintic treatment. A mathematical model was used to combine the concepts and evaluate the potential of using genetic-based methods to identify animals for a TST regime. EBVs obtained by genomic prediction were predicted to be the best determinant criterion for TST in terms of the impact on average empty body weight and average FEC, whereas pedigree-based EBVs for FEC were predicted to be marginally worse than using phenotypic FEC as a determinant criterion. Whilst each method has financial implications, if the identification of host resistance is incorporated into a wider genomic selection indices or selective breeding programmes, then genetic or genomic information may be plausibly included in TST regimes. Copyright © 2013 Elsevier B.V. All rights reserved.
Valente, Bruno D.; Morota, Gota; Peñagaricano, Francisco; Gianola, Daniel; Weigel, Kent; Rosa, Guilherme J. M.
2015-01-01
The term “effect” in additive genetic effect suggests a causal meaning. However, inferences of such quantities for selection purposes are typically viewed and conducted as a prediction task. Predictive ability as tested by cross-validation is currently the most acceptable criterion for comparing models and evaluating new methodologies. Nevertheless, it does not directly indicate if predictors reflect causal effects. Such evaluations would require causal inference methods that are not typical in genomic prediction for selection. This suggests that the usual approach to infer genetic effects contradicts the label of the quantity inferred. Here we investigate if genomic predictors for selection should be treated as standard predictors or if they must reflect a causal effect to be useful, requiring causal inference methods. Conducting the analysis as a prediction or as a causal inference task affects, for example, how covariates of the regression model are chosen, which may heavily affect the magnitude of genomic predictors and therefore selection decisions. We demonstrate that selection requires learning causal genetic effects. However, genomic predictors from some models might capture noncausal signal, providing good predictive ability but poorly representing true genetic effects. Simulated examples are used to show that aiming for predictive ability may lead to poor modeling decisions, while causal inference approaches may guide the construction of regression models that better infer the target genetic effect even when they underperform in cross-validation tests. In conclusion, genomic selection models should be constructed to aim primarily for identifiability of causal genetic effects, not for predictive ability. PMID:25908318
Daetwyler, Hans D; Calus, Mario P L; Pong-Wong, Ricardo; de Los Campos, Gustavo; Hickey, John M
2013-02-01
The genomic prediction of phenotypes and breeding values in animals and plants has developed rapidly into its own research field. Results of genomic prediction studies are often difficult to compare because data simulation varies, real or simulated data are not fully described, and not all relevant results are reported. In addition, some new methods have been compared only in limited genetic architectures, leading to potentially misleading conclusions. In this article we review simulation procedures, discuss validation and reporting of results, and apply benchmark procedures for a variety of genomic prediction methods in simulated and real example data. Plant and animal breeding programs are being transformed by the use of genomic data, which are becoming widely available and cost-effective to predict genetic merit. A large number of genomic prediction studies have been published using both simulated and real data. The relative novelty of this area of research has made the development of scientific conventions difficult with regard to description of the real data, simulation of genomes, validation and reporting of results, and forward in time methods. In this review article we discuss the generation of simulated genotype and phenotype data, using approaches such as the coalescent and forward in time simulation. We outline ways to validate simulated data and genomic prediction results, including cross-validation. The accuracy and bias of genomic prediction are highlighted as performance indicators that should be reported. We suggest that a measure of relatedness between the reference and validation individuals be reported, as its impact on the accuracy of genomic prediction is substantial. A large number of methods were compared in example simulated and real (pine and wheat) data sets, all of which are publicly available. In our limited simulations, most methods performed similarly in traits with a large number of quantitative trait loci (QTL), whereas in traits with fewer QTL variable selection did have some advantages. In the real data sets examined here all methods had very similar accuracies. We conclude that no single method can serve as a benchmark for genomic prediction. We recommend comparing accuracy and bias of new methods to results from genomic best linear prediction and a variable selection approach (e.g., BayesB), because, together, these methods are appropriate for a range of genetic architectures. An accompanying article in this issue provides a comprehensive review of genomic prediction methods and discusses a selection of topics related to application of genomic prediction in plants and animals.
Daetwyler, Hans D.; Calus, Mario P. L.; Pong-Wong, Ricardo; de los Campos, Gustavo; Hickey, John M.
2013-01-01
The genomic prediction of phenotypes and breeding values in animals and plants has developed rapidly into its own research field. Results of genomic prediction studies are often difficult to compare because data simulation varies, real or simulated data are not fully described, and not all relevant results are reported. In addition, some new methods have been compared only in limited genetic architectures, leading to potentially misleading conclusions. In this article we review simulation procedures, discuss validation and reporting of results, and apply benchmark procedures for a variety of genomic prediction methods in simulated and real example data. Plant and animal breeding programs are being transformed by the use of genomic data, which are becoming widely available and cost-effective to predict genetic merit. A large number of genomic prediction studies have been published using both simulated and real data. The relative novelty of this area of research has made the development of scientific conventions difficult with regard to description of the real data, simulation of genomes, validation and reporting of results, and forward in time methods. In this review article we discuss the generation of simulated genotype and phenotype data, using approaches such as the coalescent and forward in time simulation. We outline ways to validate simulated data and genomic prediction results, including cross-validation. The accuracy and bias of genomic prediction are highlighted as performance indicators that should be reported. We suggest that a measure of relatedness between the reference and validation individuals be reported, as its impact on the accuracy of genomic prediction is substantial. A large number of methods were compared in example simulated and real (pine and wheat) data sets, all of which are publicly available. In our limited simulations, most methods performed similarly in traits with a large number of quantitative trait loci (QTL), whereas in traits with fewer QTL variable selection did have some advantages. In the real data sets examined here all methods had very similar accuracies. We conclude that no single method can serve as a benchmark for genomic prediction. We recommend comparing accuracy and bias of new methods to results from genomic best linear prediction and a variable selection approach (e.g., BayesB), because, together, these methods are appropriate for a range of genetic architectures. An accompanying article in this issue provides a comprehensive review of genomic prediction methods and discusses a selection of topics related to application of genomic prediction in plants and animals. PMID:23222650
Detecting and Characterizing Genomic Signatures of Positive Selection in Global Populations
Liu, Xuanyao; Ong, Rick Twee-Hee; Pillai, Esakimuthu Nisha; Elzein, Abier M.; Small, Kerrin S.; Clark, Taane G.; Kwiatkowski, Dominic P.; Teo, Yik-Ying
2013-01-01
Natural selection is a significant force that shapes the architecture of the human genome and introduces diversity across global populations. The question of whether advantageous mutations have arisen in the human genome as a result of single or multiple mutation events remains unanswered except for the fact that there exist a handful of genes such as those that confer lactase persistence, affect skin pigmentation, or cause sickle cell anemia. We have developed a long-range-haplotype method for identifying genomic signatures of positive selection to complement existing methods, such as the integrated haplotype score (iHS) or cross-population extended haplotype homozygosity (XP-EHH), for locating signals across the entire allele frequency spectrum. Our method also locates the founder haplotypes that carry the advantageous variants and infers their corresponding population frequencies. This presents an opportunity to systematically interrogate the whole human genome whether a selection signal shared across different populations is the consequence of a single mutation process followed subsequently by gene flow between populations or of convergent evolution due to the occurrence of multiple independent mutation events either at the same variant or within the same gene. The application of our method to data from 14 populations across the world revealed that positive-selection events tend to cluster in populations of the same ancestry. Comparing the founder haplotypes for events that are present across different populations revealed that convergent evolution is a rare occurrence and that the majority of shared signals stem from the same evolutionary event. PMID:23731540
Mating programs including genomic relationships and dominance effects
USDA-ARS?s Scientific Manuscript database
Breed associations, artificial-insemination organizations, and on-farm software providers need new computerized mating programs for genomic selection so that genomic inbreeding could be minimized by comparing genotypes of potential mates. Efficient methods for transferring elements of the genomic re...
Constructs and methods for genome editing and genetic engineering of fungi and protists
Hittinger, Christopher Todd; Alexander, William Gerald
2018-01-30
Provided herein are constructs for genome editing or genetic engineering in fungi or protists, methods of using the constructs and media for use in selecting cells. The construct include a polynucleotide encoding a thymidine kinase operably connected to a promoter, suitably a constitutive promoter; a polynucleotide encoding an endonuclease operably connected to an inducible promoter; and a recognition site for the endonuclease. The constructs may also include selectable markers for use in selecting recombinations.
Does genomic selection have a future in plant breeding?
Jonas, Elisabeth; de Koning, Dirk-Jan
2013-09-01
Plant breeding largely depends on phenotypic selection in plots and only for some, often disease-resistance-related traits, uses genetic markers. The more recently developed concept of genomic selection, using a black box approach with no need of prior knowledge about the effect or function of individual markers, has also been proposed as a great opportunity for plant breeding. Several empirical and theoretical studies have focused on the possibility to implement this as a novel molecular method across various species. Although we do not question the potential of genomic selection in general, in this Opinion, we emphasize that genomic selection approaches from dairy cattle breeding cannot be easily applied to complex plant breeding. Copyright © 2013 Elsevier Ltd. All rights reserved.
Manel, S; Perrier, C; Pratlong, M; Abi-Rached, L; Paganini, J; Pontarotti, P; Aurelle, D
2016-01-01
Genome scans represent powerful approaches to investigate the action of natural selection on the genetic variation of natural populations and to better understand local adaptation. This is very useful, for example, in the field of conservation biology and evolutionary biology. Thanks to Next Generation Sequencing, genomic resources are growing exponentially, improving genome scan analyses in non-model species. Thousands of SNPs called using Reduced Representation Sequencing are increasingly used in genome scans. Besides, genome sequences are also becoming increasingly available, allowing better processing of short-read data, offering physical localization of variants, and improving haplotype reconstruction and data imputation. Ultimately, genome sequences are also becoming the raw material for selection inferences. Here, we discuss how the increasing availability of such genomic resources, notably genome sequences, influences the detection of signals of selection. Mainly, increasing data density and having the information of physical linkage data expand genome scans by (i) improving the overall quality of the data, (ii) helping the reconstruction of demographic history for the population studied to decrease false-positive rates and (iii) improving the statistical power of methods to detect the signal of selection. Of particular importance, the availability of a high-quality reference genome can improve the detection of the signal of selection by (i) allowing matching the potential candidate loci to linked coding regions under selection, (ii) rapidly moving the investigation to the gene and function and (iii) ensuring that the highly variable regions of the genomes that include functional genes are also investigated. For all those reasons, using reference genomes in genome scan analyses is highly recommended. © 2015 John Wiley & Sons Ltd.
Clark, Samuel A; Hickey, John M; Daetwyler, Hans D; van der Werf, Julius H J
2012-02-09
The theory of genomic selection is based on the prediction of the effects of genetic markers in linkage disequilibrium with quantitative trait loci. However, genomic selection also relies on relationships between individuals to accurately predict genetic value. This study aimed to examine the importance of information on relatives versus that of unrelated or more distantly related individuals on the estimation of genomic breeding values. Simulated and real data were used to examine the effects of various degrees of relationship on the accuracy of genomic selection. Genomic Best Linear Unbiased Prediction (gBLUP) was compared to two pedigree based BLUP methods, one with a shallow one generation pedigree and the other with a deep ten generation pedigree. The accuracy of estimated breeding values for different groups of selection candidates that had varying degrees of relationships to a reference data set of 1750 animals was investigated. The gBLUP method predicted breeding values more accurately than BLUP. The most accurate breeding values were estimated using gBLUP for closely related animals. Similarly, the pedigree based BLUP methods were also accurate for closely related animals, however when the pedigree based BLUP methods were used to predict unrelated animals, the accuracy was close to zero. In contrast, gBLUP breeding values, for animals that had no pedigree relationship with animals in the reference data set, allowed substantial accuracy. An animal's relationship to the reference data set is an important factor for the accuracy of genomic predictions. Animals that share a close relationship to the reference data set had the highest accuracy from genomic predictions. However a baseline accuracy that is driven by the reference data set size and the overall population effective population size enables gBLUP to estimate a breeding value for unrelated animals within a population (breed), using information previously ignored by pedigree based BLUP methods.
Kwong, Qi Bin; Ong, Ai Ling; Teh, Chee Keng; Chew, Fook Tim; Tammi, Martti; Mayes, Sean; Kulaveerasingam, Harikrishna; Yeoh, Suat Hui; Harikrishna, Jennifer Ann; Appleton, David Ross
2017-06-06
Genomic selection (GS) uses genome-wide markers to select individuals with the desired overall combination of breeding traits. A total of 1,218 individuals from a commercial population of Ulu Remis x AVROS (UR x AVROS) were genotyped using the OP200K array. The traits of interest included: shell-to-fruit ratio (S/F, %), mesocarp-to-fruit ratio (M/F, %), kernel-to-fruit ratio (K/F, %), fruit per bunch (F/B, %), oil per bunch (O/B, %) and oil per palm (O/P, kg/palm/year). Genomic heritabilities of these traits were estimated to be in the range of 0.40 to 0.80. GS methods assessed were RR-BLUP, Bayes A (BA), Cπ (BC), Lasso (BL) and Ridge Regression (BRR). All methods resulted in almost equal prediction accuracy. The accuracy achieved ranged from 0.40 to 0.70, correlating with the heritability of traits. By selecting the most important markers, RR-BLUP B has the potential to outperform other methods. The marker density for certain traits can be further reduced based on the linkage disequilibrium (LD). Together with in silico breeding, GS is now being used in oil palm breeding programs to hasten parental palm selection.
Genomic Selection in Plant Breeding: Methods, Models, and Perspectives.
Crossa, José; Pérez-Rodríguez, Paulino; Cuevas, Jaime; Montesinos-López, Osval; Jarquín, Diego; de Los Campos, Gustavo; Burgueño, Juan; González-Camacho, Juan M; Pérez-Elizalde, Sergio; Beyene, Yoseph; Dreisigacker, Susanne; Singh, Ravi; Zhang, Xuecai; Gowda, Manje; Roorkiwal, Manish; Rutkoski, Jessica; Varshney, Rajeev K
2017-11-01
Genomic selection (GS) facilitates the rapid selection of superior genotypes and accelerates the breeding cycle. In this review, we discuss the history, principles, and basis of GS and genomic-enabled prediction (GP) as well as the genetics and statistical complexities of GP models, including genomic genotype×environment (G×E) interactions. We also examine the accuracy of GP models and methods for two cereal crops and two legume crops based on random cross-validation. GS applied to maize breeding has shown tangible genetic gains. Based on GP results, we speculate how GS in germplasm enhancement (i.e., prebreeding) programs could accelerate the flow of genes from gene bank accessions to elite lines. Recent advances in hyperspectral image technology could be combined with GS and pedigree-assisted breeding. Copyright © 2017 Elsevier Ltd. All rights reserved.
2009-01-01
Background Genomic selection (GS) uses molecular breeding values (MBV) derived from dense markers across the entire genome for selection of young animals. The accuracy of MBV prediction is important for a successful application of GS. Recently, several methods have been proposed to estimate MBV. Initial simulation studies have shown that these methods can accurately predict MBV. In this study we compared the accuracies and possible bias of five different regression methods in an empirical application in dairy cattle. Methods Genotypes of 7,372 SNP and highly accurate EBV of 1,945 dairy bulls were used to predict MBV for protein percentage (PPT) and a profit index (Australian Selection Index, ASI). Marker effects were estimated by least squares regression (FR-LS), Bayesian regression (Bayes-R), random regression best linear unbiased prediction (RR-BLUP), partial least squares regression (PLSR) and nonparametric support vector regression (SVR) in a training set of 1,239 bulls. Accuracy and bias of MBV prediction were calculated from cross-validation of the training set and tested against a test team of 706 young bulls. Results For both traits, FR-LS using a subset of SNP was significantly less accurate than all other methods which used all SNP. Accuracies obtained by Bayes-R, RR-BLUP, PLSR and SVR were very similar for ASI (0.39-0.45) and for PPT (0.55-0.61). Overall, SVR gave the highest accuracy. All methods resulted in biased MBV predictions for ASI, for PPT only RR-BLUP and SVR predictions were unbiased. A significant decrease in accuracy of prediction of ASI was seen in young test cohorts of bulls compared to the accuracy derived from cross-validation of the training set. This reduction was not apparent for PPT. Combining MBV predictions with pedigree based predictions gave 1.05 - 1.34 times higher accuracies compared to predictions based on pedigree alone. Some methods have largely different computational requirements, with PLSR and RR-BLUP requiring the least computing time. Conclusions The four methods which use information from all SNP namely RR-BLUP, Bayes-R, PLSR and SVR generate similar accuracies of MBV prediction for genomic selection, and their use in the selection of immediate future generations in dairy cattle will be comparable. The use of FR-LS in genomic selection is not recommended. PMID:20043835
Yoshizumi, Takeshi; Oikawa, Kazusato; Chuah, Jo-Ann; Kodama, Yutaka; Numata, Keiji
2018-05-14
Selective gene delivery into organellar genomes (mitochondrial and plastid genomes) has been limited because of a lack of appropriate platform technology, even though these organelles are essential for metabolite and energy production. Techniques for selective organellar modification are needed to functionally improve organelles and produce transplastomic/transmitochondrial plants. However, no method for mitochondrial genome modification has yet been established for multicellular organisms including plants. Likewise, modification of plastid genomes has been limited to a few plant species and algae. In the present study, we developed ionic complexes of fusion peptides containing organellar targeting signal and plasmid DNA for selective delivery of exogenous DNA into the plastid and mitochondrial genomes of intact plants. This is the first report of exogenous DNA being integrated into the mitochondrial genomes of not only plants, but also multicellular organisms in general. This fusion peptide-mediated gene delivery system is a breakthrough platform for both plant organellar biotechnology and gene therapy for mitochondrial diseases in animals.
Avian Disease and Oncology Laboratory (ADOL) research update
USDA-ARS?s Scientific Manuscript database
GENOMICS To meet the growing demands of consumers, the poultry industry will need to continue to improve methods of selection in breeding programs for production and associated traits. One possible solution is genome-wide marker-assisted selection (GWMAS). In brief, evenly-spaced genetic markers s...
Selecting sequence variants to improve genomic predictions for dairy cattle
USDA-ARS?s Scientific Manuscript database
Millions of genetic variants have been identified by population-scale sequencing projects, but subsets are needed for routine genomic predictions or to include on genotyping arrays. Methods of selecting sequence variants were compared using both simulated sequence genotypes and actual data from run ...
Chen, Minhui; Wang, Jiying; Wang, Yanping; Wu, Ying; Fu, Jinluan; Liu, Jian-Feng
2018-05-18
Currently, genome-wide scans for positive selection signatures in commercial breed have been investigated. However, few studies have focused on selection footprints of indigenous breeds. Laiwu pig is an invaluable Chinese indigenous pig breed with extremely high proportion of intramuscular fat (IMF), and an excellent model to detect footprint as the result of natural and artificial selection for fat deposition in muscle. In this study, based on GeneSeek Genomic profiler Porcine HD data, three complementary methods, F ST , iHS (integrated haplotype homozygosity score) and CLR (composite likelihood ratio), were implemented to detect selection signatures in the whole genome of Laiwu pigs. Totally, 175 candidate selected regions were obtained by at least two of the three methods, which covered 43.75 Mb genomic regions and corresponded to 1.79% of the genome sequence. Gene annotation of the selected regions revealed a list of functionally important genes for feed intake and fat deposition, reproduction, and immune response. Especially, in accordance to the phenotypic features of Laiwu pigs, among the candidate genes, we identified several genes, NPY1R, NPY5R, PIK3R1 and JAKMIP1, involved in the actions of two sets of neurons, which are central regulators in maintaining the balance between food intake and energy expenditure. Our results identified a number of regions showing signatures of selection, as well as a list of functionally candidate genes with potential effect on phenotypic traits, especially fat deposition in muscle. Our findings provide insights into the mechanisms of artificial selection of fat deposition and further facilitate follow-up functional studies.
Non-additive Effects in Genomic Selection
Varona, Luis; Legarra, Andres; Toro, Miguel A.; Vitezica, Zulma G.
2018-01-01
In the last decade, genomic selection has become a standard in the genetic evaluation of livestock populations. However, most procedures for the implementation of genomic selection only consider the additive effects associated with SNP (Single Nucleotide Polymorphism) markers used to calculate the prediction of the breeding values of candidates for selection. Nevertheless, the availability of estimates of non-additive effects is of interest because: (i) they contribute to an increase in the accuracy of the prediction of breeding values and the genetic response; (ii) they allow the definition of mate allocation procedures between candidates for selection; and (iii) they can be used to enhance non-additive genetic variation through the definition of appropriate crossbreeding or purebred breeding schemes. This study presents a review of methods for the incorporation of non-additive genetic effects into genomic selection procedures and their potential applications in the prediction of future performance, mate allocation, crossbreeding, and purebred selection. The work concludes with a brief outline of some ideas for future lines of that may help the standard inclusion of non-additive effects in genomic selection. PMID:29559995
Non-additive Effects in Genomic Selection.
Varona, Luis; Legarra, Andres; Toro, Miguel A; Vitezica, Zulma G
2018-01-01
In the last decade, genomic selection has become a standard in the genetic evaluation of livestock populations. However, most procedures for the implementation of genomic selection only consider the additive effects associated with SNP (Single Nucleotide Polymorphism) markers used to calculate the prediction of the breeding values of candidates for selection. Nevertheless, the availability of estimates of non-additive effects is of interest because: (i) they contribute to an increase in the accuracy of the prediction of breeding values and the genetic response; (ii) they allow the definition of mate allocation procedures between candidates for selection; and (iii) they can be used to enhance non-additive genetic variation through the definition of appropriate crossbreeding or purebred breeding schemes. This study presents a review of methods for the incorporation of non-additive genetic effects into genomic selection procedures and their potential applications in the prediction of future performance, mate allocation, crossbreeding, and purebred selection. The work concludes with a brief outline of some ideas for future lines of that may help the standard inclusion of non-additive effects in genomic selection.
Detecting and characterizing genomic signatures of positive selection in global populations.
Liu, Xuanyao; Ong, Rick Twee-Hee; Pillai, Esakimuthu Nisha; Elzein, Abier M; Small, Kerrin S; Clark, Taane G; Kwiatkowski, Dominic P; Teo, Yik-Ying
2013-06-06
Natural selection is a significant force that shapes the architecture of the human genome and introduces diversity across global populations. The question of whether advantageous mutations have arisen in the human genome as a result of single or multiple mutation events remains unanswered except for the fact that there exist a handful of genes such as those that confer lactase persistence, affect skin pigmentation, or cause sickle cell anemia. We have developed a long-range-haplotype method for identifying genomic signatures of positive selection to complement existing methods, such as the integrated haplotype score (iHS) or cross-population extended haplotype homozygosity (XP-EHH), for locating signals across the entire allele frequency spectrum. Our method also locates the founder haplotypes that carry the advantageous variants and infers their corresponding population frequencies. This presents an opportunity to systematically interrogate the whole human genome whether a selection signal shared across different populations is the consequence of a single mutation process followed subsequently by gene flow between populations or of convergent evolution due to the occurrence of multiple independent mutation events either at the same variant or within the same gene. The application of our method to data from 14 populations across the world revealed that positive-selection events tend to cluster in populations of the same ancestry. Comparing the founder haplotypes for events that are present across different populations revealed that convergent evolution is a rare occurrence and that the majority of shared signals stem from the same evolutionary event. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
swga: a primer design toolkit for selective whole genome amplification.
Clarke, Erik L; Sundararaman, Sesh A; Seifert, Stephanie N; Bushman, Frederic D; Hahn, Beatrice H; Brisson, Dustin
2017-07-15
Population genomic analyses are often hindered by difficulties in obtaining sufficient numbers of genomes for analysis by DNA sequencing. Selective whole-genome amplification (SWGA) provides an efficient approach to amplify microbial genomes from complex backgrounds for sequence acquisition. However, the process of designing sets of primers for this method has many degrees of freedom and would benefit from an automated process to evaluate the vast number of potential primer sets. Here, we present swga , a program that identifies primer sets for SWGA and evaluates them for efficiency and selectivity. We used swga to design and test primer sets for the selective amplification of Wolbachia pipientis genomic DNA from infected Drosophila melanogaster and Mycobacterium tuberculosis from human blood. We identify primer sets that successfully amplify each against their backgrounds and describe a general method for using swga for arbitrary targets. In addition, we describe characteristics of primer sets that correlate with successful amplification, and present guidelines for implementation of SWGA to detect new targets. Source code and documentation are freely available on https://www.github.com/eclarke/swga . The program is implemented in Python and C and licensed under the GNU Public License. ecl@mail.med.upenn.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Leichty, Aaron R; Brisson, Dustin
2014-10-01
Population genomic analyses have demonstrated power to address major questions in evolutionary and molecular microbiology. Collecting populations of genomes is hindered in many microbial species by the absence of a cost effective and practical method to collect ample quantities of sufficiently pure genomic DNA for next-generation sequencing. Here we present a simple method to amplify genomes of a target microbial species present in a complex, natural sample. The selective whole genome amplification (SWGA) technique amplifies target genomes using nucleotide sequence motifs that are common in the target microbe genome, but rare in the background genomes, to prime the highly processive phi29 polymerase. SWGA thus selectively amplifies the target genome from samples in which it originally represented a minor fraction of the total DNA. The post-SWGA samples are enriched in target genomic DNA, which are ideal for population resequencing. We demonstrate the efficacy of SWGA using both laboratory-prepared mixtures of cultured microbes as well as a natural host-microbe association. Targeted amplification of Borrelia burgdorferi mixed with Escherichia coli at genome ratios of 1:2000 resulted in >10(5)-fold amplification of the target genomes with <6.7-fold amplification of the background. SWGA-treated genomic extracts from Wolbachia pipientis-infected Drosophila melanogaster resulted in up to 70% of high-throughput resequencing reads mapping to the W. pipientis genome. By contrast, 2-9% of sequencing reads were derived from W. pipientis without prior amplification. The SWGA technique results in high sequencing coverage at a fraction of the sequencing effort, thus allowing population genomic studies at affordable costs. Copyright © 2014 by the Genetics Society of America.
Multilocus approaches for the measurement of selection on correlated genetic loci.
Gompert, Zachariah; Egan, Scott P; Barrett, Rowan D H; Feder, Jeffrey L; Nosil, Patrik
2017-01-01
The study of ecological speciation is inherently linked to the study of selection. Methods for estimating phenotypic selection within a generation based on associations between trait values and fitness (e.g. survival) of individuals are established. These methods attempt to disentangle selection acting directly on a trait from indirect selection caused by correlations with other traits via multivariate statistical approaches (i.e. inference of selection gradients). The estimation of selection on genotypic or genomic variation could also benefit from disentangling direct and indirect selection on genetic loci. However, achieving this goal is difficult with genomic data because the number of potentially correlated genetic loci (p) is very large relative to the number of individuals sampled (n). In other words, the number of model parameters exceeds the number of observations (p ≫ n). We present simulations examining the utility of whole-genome regression approaches (i.e. Bayesian sparse linear mixed models) for quantifying direct selection in cases where p ≫ n. Such models have been used for genome-wide association mapping and are common in artificial breeding. Our results show they hold promise for studies of natural selection in the wild and thus of ecological speciation. But we also demonstrate important limitations to the approach and discuss study designs required for more robust inferences. © 2016 John Wiley & Sons Ltd.
Primer in Genetics and Genomics, Article 2-Advancing Nursing Research With Genomic Approaches.
Lee, Hyunhwa; Gill, Jessica; Barr, Taura; Yun, Sijung; Kim, Hyungsuk
2017-03-01
Nurses investigate reasons for variable patient symptoms and responses to treatments to inform how best to improve outcomes. Genomics has the potential to guide nursing research exploring contributions to individual variability. This article is meant to serve as an introduction to the novel methods available through genomics for addressing this critical issue and includes a review of methodological considerations for selected genomic approaches. This review presents essential concepts in genetics and genomics that will allow readers to identify upcoming trends in genomics nursing research and improve research practice. It introduces general principles of genomic research and provides an overview of the research process. It also highlights selected nursing studies that serve as clinical examples of the use of genomic technologies. Finally, the authors provide suggestions about how to apply genomic technology in nursing research along with directions for future research. Using genomic approaches in nursing research can advance the understanding of the complex pathophysiology of disease susceptibility and different patient responses to interventions. Nurses should be incorporating genomics into education, clinical practice, and research as the influence of genomics in health-care research and practice continues to grow. Nurses are also well placed to translate genomic discoveries into improved methods for patient assessment and intervention.
Genomic selection for fruit quality traits in apple (Malus×domestica Borkh.).
Kumar, Satish; Chagné, David; Bink, Marco C A M; Volz, Richard K; Whitworth, Claire; Carlisle, Charmaine
2012-01-01
The genome sequence of apple (Malus×domestica Borkh.) was published more than a year ago, which helped develop an 8K SNP chip to assist in implementing genomic selection (GS). In apple breeding programmes, GS can be used to obtain genomic breeding values (GEBV) for choosing next-generation parents or selections for further testing as potential commercial cultivars at a very early stage. Thus GS has the potential to accelerate breeding efficiency significantly because of decreased generation interval or increased selection intensity. We evaluated the accuracy of GS in a population of 1120 seedlings generated from a factorial mating design of four females and two male parents. All seedlings were genotyped using an Illumina Infinium chip comprising 8,000 single nucleotide polymorphisms (SNPs), and were phenotyped for various fruit quality traits. Random-regression best liner unbiased prediction (RR-BLUP) and the Bayesian LASSO method were used to obtain GEBV, and compared using a cross-validation approach for their accuracy to predict unobserved BLUP-BV. Accuracies were very similar for both methods, varying from 0.70 to 0.90 for various fruit quality traits. The selection response per unit time using GS compared with the traditional BLUP-based selection were very high (>100%) especially for low-heritability traits. Genome-wide average estimated linkage disequilibrium (LD) between adjacent SNPs was 0.32, with a relatively slow decay of LD in the long range (r(2) = 0.33 and 0.19 at 100 kb and 1,000 kb respectively), contributing to the higher accuracy of GS. Distribution of estimated SNP effects revealed involvement of large effect genes with likely pleiotropic effects. These results demonstrated that genomic selection is a credible alternative to conventional selection for fruit quality traits.
USDA-ARS?s Scientific Manuscript database
The development of genomic selection methodology, with accompanying substantial gains in reliability for low-heritability traits, may dramatically improve the feasibility of genetic improvement of dairy cow health. Many methods for genomic analysis have now been developed, including the “Bayesian Al...
Integrating genomic selection into dairy cattle breeding programmes: a review.
Bouquet, A; Juga, J
2013-05-01
Extensive genetic progress has been achieved in dairy cattle populations on many traits of economic importance because of efficient breeding programmes. Success of these programmes has relied on progeny testing of the best young males to accurately assess their genetic merit and hence their potential for breeding. Over the last few years, the integration of dense genomic information into statistical tools used to make selection decisions, commonly referred to as genomic selection, has enabled gains in predicting accuracy of breeding values for young animals without own performance. The possibility to select animals at an early stage allows defining new breeding strategies aimed at boosting genetic progress while reducing costs. The first objective of this article was to review methods used to model and optimize breeding schemes integrating genomic selection and to discuss their relative advantages and limitations. The second objective was to summarize the main results and perspectives on the use of genomic selection in practical breeding schemes, on the basis of the example of dairy cattle populations. Two main designs of breeding programmes integrating genomic selection were studied in dairy cattle. Genomic selection can be used either for pre-selecting males to be progeny tested or for selecting males to be used as active sires in the population. The first option produces moderate genetic gains without changing the structure of breeding programmes. The second option leads to large genetic gains, up to double those of conventional schemes because of a major reduction in the mean generation interval, but it requires greater changes in breeding programme structure. The literature suggests that genomic selection becomes more attractive when it is coupled with embryo transfer technologies to further increase selection intensity on the dam-to-sire pathway. The use of genomic information also offers new opportunities to improve preservation of genetic variation. However, recent simulation studies have shown that putting constraints on genomic inbreeding rates for defining optimal contributions of breeding animals could significantly reduce achievable genetic gain. Finally, the article summarizes the potential of genomic selection to include new traits in the breeding goal to meet societal demands regarding animal health and environmental efficiency in animal production.
Mehrban, Hossein; Lee, Deuk Hwan; Moradi, Mohammad Hossein; IlCho, Chung; Naserkheil, Masoumeh; Ibáñez-Escriche, Noelia
2017-01-04
Hanwoo beef is known for its marbled fat, tenderness, juiciness and characteristic flavor, as well as for its low cholesterol and high omega 3 fatty acid contents. As yet, there has been no comprehensive investigation to estimate genomic selection accuracy for carcass traits in Hanwoo cattle using dense markers. This study aimed at evaluating the accuracy of alternative statistical methods that differed in assumptions about the underlying genetic model for various carcass traits: backfat thickness (BT), carcass weight (CW), eye muscle area (EMA), and marbling score (MS). Accuracies of direct genomic breeding values (DGV) for carcass traits were estimated by applying fivefold cross-validation to a dataset including 1183 animals and approximately 34,000 single nucleotide polymorphisms (SNPs). Accuracies of BayesC, Bayesian LASSO (BayesL) and genomic best linear unbiased prediction (GBLUP) methods were similar for BT, EMA and MS. However, for CW, DGV accuracy was 7% higher with BayesC than with BayesL and GBLUP. The increased accuracy of BayesC, compared to GBLUP and BayesL, was maintained for CW, regardless of the training sample size, but not for BT, EMA, and MS. Genome-wide association studies detected consistent large effects for SNPs on chromosomes 6 and 14 for CW. The predictive performance of the models depended on the trait analyzed. For CW, the results showed a clear superiority of BayesC compared to GBLUP and BayesL. These findings indicate the importance of using a proper variable selection method for genomic selection of traits and also suggest that the genetic architecture that underlies CW differs from that of the other carcass traits analyzed. Thus, our study provides significant new insights into the carcass traits of Hanwoo cattle.
Bangera, Rama; Correa, Katharina; Lhorente, Jean P; Figueroa, René; Yáñez, José M
2017-01-31
Salmon Rickettsial Syndrome (SRS) caused by Piscirickettsia salmonis is a major disease affecting the Chilean salmon industry. Genomic selection (GS) is a method wherein genome-wide markers and phenotype information of full-sibs are used to predict genomic EBV (GEBV) of selection candidates and is expected to have increased accuracy and response to selection over traditional pedigree based Best Linear Unbiased Prediction (PBLUP). Widely used GS methods such as genomic BLUP (GBLUP), SNPBLUP, Bayes C and Bayesian Lasso may perform differently with respect to accuracy of GEBV prediction. Our aim was to compare the accuracy, in terms of reliability of genome-enabled prediction, from different GS methods with PBLUP for resistance to SRS in an Atlantic salmon breeding program. Number of days to death (DAYS), binary survival status (STATUS) phenotypes, and 50 K SNP array genotypes were obtained from 2601 smolts challenged with P. salmonis. The reliability of different GS methods at different SNP densities with and without pedigree were compared to PBLUP using a five-fold cross validation scheme. Heritability estimated from GS methods was significantly higher than PBLUP. Pearson's correlation between predicted GEBV from PBLUP and GS models ranged from 0.79 to 0.91 and 0.79-0.95 for DAYS and STATUS, respectively. The relative increase in reliability from different GS methods for DAYS and STATUS with 50 K SNP ranged from 8 to 25% and 27-30%, respectively. All GS methods outperformed PBLUP at all marker densities. DAYS and STATUS showed superior reliability over PBLUP even at the lowest marker density of 3 K and 500 SNP, respectively. 20 K SNP showed close to maximal reliability for both traits with little improvement using higher densities. These results indicate that genomic predictions can accelerate genetic progress for SRS resistance in Atlantic salmon and implementation of this approach will contribute to the control of SRS in Chile. We recommend GBLUP for routine GS evaluation because this method is computationally faster and the results are very similar with other GS methods. The use of lower density SNP or the combination of low density SNP and an imputation strategy may help to reduce genotyping costs without compromising gain in reliability.
A Model-Based Approach for Identifying Signatures of Ancient Balancing Selection in Genetic Data
DeGiorgio, Michael; Lohmueller, Kirk E.; Nielsen, Rasmus
2014-01-01
While much effort has focused on detecting positive and negative directional selection in the human genome, relatively little work has been devoted to balancing selection. This lack of attention is likely due to the paucity of sophisticated methods for identifying sites under balancing selection. Here we develop two composite likelihood ratio tests for detecting balancing selection. Using simulations, we show that these methods outperform competing methods under a variety of assumptions and demographic models. We apply the new methods to whole-genome human data, and find a number of previously-identified loci with strong evidence of balancing selection, including several HLA genes. Additionally, we find evidence for many novel candidates, the strongest of which is FANK1, an imprinted gene that suppresses apoptosis, is expressed during meiosis in males, and displays marginal signs of segregation distortion. We hypothesize that balancing selection acts on this locus to stabilize the segregation distortion and negative fitness effects of the distorter allele. Thus, our methods are able to reproduce many previously-hypothesized signals of balancing selection, as well as discover novel interesting candidates. PMID:25144706
A model-based approach for identifying signatures of ancient balancing selection in genetic data.
DeGiorgio, Michael; Lohmueller, Kirk E; Nielsen, Rasmus
2014-08-01
While much effort has focused on detecting positive and negative directional selection in the human genome, relatively little work has been devoted to balancing selection. This lack of attention is likely due to the paucity of sophisticated methods for identifying sites under balancing selection. Here we develop two composite likelihood ratio tests for detecting balancing selection. Using simulations, we show that these methods outperform competing methods under a variety of assumptions and demographic models. We apply the new methods to whole-genome human data, and find a number of previously-identified loci with strong evidence of balancing selection, including several HLA genes. Additionally, we find evidence for many novel candidates, the strongest of which is FANK1, an imprinted gene that suppresses apoptosis, is expressed during meiosis in males, and displays marginal signs of segregation distortion. We hypothesize that balancing selection acts on this locus to stabilize the segregation distortion and negative fitness effects of the distorter allele. Thus, our methods are able to reproduce many previously-hypothesized signals of balancing selection, as well as discover novel interesting candidates.
Oba, Mami; Tsuchiaka, Shinobu; Omatsu, Tsutomu; Katayama, Yukie; Otomaru, Konosuke; Hirata, Teppei; Aoki, Hiroshi; Murata, Yoshiteru; Makino, Shinji; Nagai, Makoto; Mizutani, Tetsuya
2018-01-08
We tested usefulness of a target enrichment system SureSelect, a comprehensive viral nucleic acid detection method, for rapid identification of viral pathogens in feces samples of cattle, pigs and goats. This system enriches nucleic acids of target viruses in clinical/field samples by using a library of biotinylated RNAs with sequences complementary to the target viruses. The enriched nucleic acids are amplified by PCR and subjected to next generation sequencing to identify the target viruses. In many samples, SureSelect target enrichment method increased efficiencies for detection of the viruses listed in the biotinylated RNA library. Furthermore, this method enabled us to determine nearly full-length genome sequence of porcine parainfluenza virus 1 and greatly increased Breadth, a value indicating the ratio of the mapping consensus length in the reference genome, in pig samples. Our data showed usefulness of SureSelect target enrichment system for comprehensive analysis of genomic information of various viruses in field samples. Copyright © 2017 Elsevier Inc. All rights reserved.
Potential benefits of genomic selection on genetic gain of small ruminant breeding programs.
Shumbusho, F; Raoul, J; Astruc, J M; Palhiere, I; Elsen, J M
2013-08-01
In conventional small ruminant breeding programs, only pedigree and phenotype records are used to make selection decisions but prospects of including genomic information are now under consideration. The objective of this study was to assess the potential benefits of genomic selection on the genetic gain in French sheep and goat breeding designs of today. Traditional and genomic scenarios were modeled with deterministic methods for 3 breeding programs. The models included decisional variables related to male selection candidates, progeny testing capacity, and economic weights that were optimized to maximize annual genetic gain (AGG) of i) a meat sheep breeding program that improved a meat trait of heritability (h(2)) = 0.30 and a maternal trait of h(2) = 0.09 and ii) dairy sheep and goat breeding programs that improved a milk trait of h(2) = 0.30. Values of ±0.20 of genetic correlation between meat and maternal traits were considered to study their effects on AGG. The Bulmer effect was accounted for and the results presented here are the averages of AGG after 10 generations of selection. Results showed that current traditional breeding programs provide an AGG of 0.095 genetic standard deviation (σa) for meat and 0.061 σa for maternal trait in meat breed and 0.147 σa and 0.120 σa in sheep and goat dairy breeds, respectively. By optimizing decisional variables, the AGG with traditional selection methods increased to 0.139 σa for meat and 0.096 σa for maternal traits in meat breeding programs and to 0.174 σa and 0.183 σa in dairy sheep and goat breeding programs, respectively. With a medium-sized reference population (nref) of 2,000 individuals, the best genomic scenarios gave an AGG that was 17.9% greater than with traditional selection methods with optimized values of decisional variables for combined meat and maternal traits in meat sheep, 51.7% in dairy sheep, and 26.2% in dairy goats. The superiority of genomic schemes increased with the size of the reference population and genomic selection gave the best results when nref > 1,000 individuals for dairy breeds and nref > 2,000 individuals for meat breed. Genetic correlation between meat and maternal traits had a large impact on the genetic gain of both traits. Changes in AGG due to correlation were greatest for low heritable maternal traits. As a general rule, AGG was increased both by optimizing selection designs and including genomic information.
Spindel, Jennifer; Begum, Hasina; Akdemir, Deniz; Virk, Parminder; Collard, Bertrand; Redoña, Edilberto; Atlin, Gary; Jannink, Jean-Luc; McCouch, Susan R
2015-02-01
Genomic Selection (GS) is a new breeding method in which genome-wide markers are used to predict the breeding value of individuals in a breeding population. GS has been shown to improve breeding efficiency in dairy cattle and several crop plant species, and here we evaluate for the first time its efficacy for breeding inbred lines of rice. We performed a genome-wide association study (GWAS) in conjunction with five-fold GS cross-validation on a population of 363 elite breeding lines from the International Rice Research Institute's (IRRI) irrigated rice breeding program and herein report the GS results. The population was genotyped with 73,147 markers using genotyping-by-sequencing. The training population, statistical method used to build the GS model, number of markers, and trait were varied to determine their effect on prediction accuracy. For all three traits, genomic prediction models outperformed prediction based on pedigree records alone. Prediction accuracies ranged from 0.31 and 0.34 for grain yield and plant height to 0.63 for flowering time. Analyses using subsets of the full marker set suggest that using one marker every 0.2 cM is sufficient for genomic selection in this collection of rice breeding materials. RR-BLUP was the best performing statistical method for grain yield where no large effect QTL were detected by GWAS, while for flowering time, where a single very large effect QTL was detected, the non-GS multiple linear regression method outperformed GS models. For plant height, in which four mid-sized QTL were identified by GWAS, random forest produced the most consistently accurate GS models. Our results suggest that GS, informed by GWAS interpretations of genetic architecture and population structure, could become an effective tool for increasing the efficiency of rice breeding as the costs of genotyping continue to decline.
Spindel, Jennifer; Begum, Hasina; Akdemir, Deniz; Virk, Parminder; Collard, Bertrand; Redoña, Edilberto; Atlin, Gary; Jannink, Jean-Luc; McCouch, Susan R.
2015-01-01
Genomic Selection (GS) is a new breeding method in which genome-wide markers are used to predict the breeding value of individuals in a breeding population. GS has been shown to improve breeding efficiency in dairy cattle and several crop plant species, and here we evaluate for the first time its efficacy for breeding inbred lines of rice. We performed a genome-wide association study (GWAS) in conjunction with five-fold GS cross-validation on a population of 363 elite breeding lines from the International Rice Research Institute's (IRRI) irrigated rice breeding program and herein report the GS results. The population was genotyped with 73,147 markers using genotyping-by-sequencing. The training population, statistical method used to build the GS model, number of markers, and trait were varied to determine their effect on prediction accuracy. For all three traits, genomic prediction models outperformed prediction based on pedigree records alone. Prediction accuracies ranged from 0.31 and 0.34 for grain yield and plant height to 0.63 for flowering time. Analyses using subsets of the full marker set suggest that using one marker every 0.2 cM is sufficient for genomic selection in this collection of rice breeding materials. RR-BLUP was the best performing statistical method for grain yield where no large effect QTL were detected by GWAS, while for flowering time, where a single very large effect QTL was detected, the non-GS multiple linear regression method outperformed GS models. For plant height, in which four mid-sized QTL were identified by GWAS, random forest produced the most consistently accurate GS models. Our results suggest that GS, informed by GWAS interpretations of genetic architecture and population structure, could become an effective tool for increasing the efficiency of rice breeding as the costs of genotyping continue to decline. PMID:25689273
Akanno, E C; Schenkel, F S; Sargolzaei, M; Friendship, R M; Robinson, J A B
2014-10-01
Genetic improvement of pigs in tropical developing countries has focused on imported exotic populations which have been subjected to intensive selection with attendant high population-wide linkage disequilibrium (LD). Presently, indigenous pig population with limited selection and low LD are being considered for improvement. Given that the infrastructure for genetic improvement using the conventional BLUP selection methods are lacking, a genome-wide selection (GS) program was proposed for developing countries. A simulation study was conducted to evaluate the option of using 60 K SNP panel and observed amount of LD in the exotic and indigenous pig populations. Several scenarios were evaluated including different size and structure of training and validation populations, different selection methods and long-term accuracy of GS in different population/breeding structures and traits. The training set included previously selected exotic population, unselected indigenous population and their crossbreds. Traits studied included number born alive (NBA), average daily gain (ADG) and back fat thickness (BFT). The ridge regression method was used to train the prediction model. The results showed that accuracies of genomic breeding values (GBVs) in the range of 0.30 (NBA) to 0.86 (BFT) in the validation population are expected if high density marker panels are utilized. The GS method improved accuracy of breeding values better than pedigree-based approach for traits with low heritability and in young animals with no performance data. Crossbred training population performed better than purebreds when validation was in populations with similar or a different structure as in the training set. Genome-wide selection holds promise for genetic improvement of pigs in the tropics. © 2014 Blackwell Verlag GmbH.
Khang, Chang Hyun; Park, Sook-Young; Lee, Yong-Hwan; Kang, Seogchan
2005-06-01
Rapid progress in fungal genome sequencing presents many new opportunities for functional genomic analysis of fungal biology through the systematic mutagenesis of the genes identified through sequencing. However, the lack of efficient tools for targeted gene replacement is a limiting factor for fungal functional genomics, as it often necessitates the screening of a large number of transformants to identify the desired mutant. We developed an efficient method of gene replacement and evaluated factors affecting the efficiency of this method using two plant pathogenic fungi, Magnaporthe grisea and Fusarium oxysporum. This method is based on Agrobacterium tumefaciens-mediated transformation with a mutant allele of the target gene flanked by the herpes simplex virus thymidine kinase (HSVtk) gene as a conditional negative selection marker against ectopic transformants. The HSVtk gene product converts 5-fluoro-2'-deoxyuridine to a compound toxic to diverse fungi. Because ectopic transformants express HSVtk, while gene replacement mutants lack HSVtk, growing transformants on a medium amended with 5-fluoro-2'-deoxyuridine facilitates the identification of targeted mutants by counter-selecting against ectopic transformants. In addition to M. grisea and F. oxysporum, the method and associated vectors are likely to be applicable to manipulating genes in a broad spectrum of fungi, thus potentially serving as an efficient, universal functional genomic tool for harnessing the growing body of fungal genome sequence data to study fungal biology.
Upweighting rare favourable alleles increases long-term genetic gain in genomic selection programs.
Liu, Huiming; Meuwissen, Theo H E; Sørensen, Anders C; Berg, Peer
2015-03-21
The short-term impact of using different genomic prediction (GP) models in genomic selection has been intensively studied, but their long-term impact is poorly understood. Furthermore, long-term genetic gain of genomic selection is expected to improve by using Jannink's weighting (JW) method, in which rare favourable marker alleles are upweighted in the selection criterion. In this paper, we extend the JW method by including an additional parameter to decrease the emphasis on rare favourable alleles over the time horizon, with the purpose of further improving the long-term genetic gain. We call this new method dynamic weighting (DW). The paper explores the long-term impact of different GP models with or without weighting methods. Different selection criteria were tested by simulating a population of 500 animals with truncation selection of five males and 50 females. Selection criteria included unweighted and weighted genomic estimated breeding values using the JW or DW methods, for which ridge regression (RR) and Bayesian lasso (BL) were used to estimate marker effects. The impacts of these selection criteria were compared under three genetic architectures, i.e. varying numbers of QTL for the trait and for two time horizons of 15 (TH15) or 40 (TH40) generations. For unweighted GP, BL resulted in up to 21.4% higher long-term genetic gain and 23.5% lower rate of inbreeding under TH40 than RR. For weighted GP, DW resulted in 1.3 to 5.5% higher long-term gain compared to unweighted GP. JW, however, showed a 6.8% lower long-term genetic gain relative to unweighted GP when BL was used to estimate the marker effects. Under TH40, both DW and JW obtained significantly higher genetic gain than unweighted GP. With DW, the long-term genetic gain was increased by up to 30.8% relative to unweighted GP, and also increased by 8% relative to JW, although at the expense of a lower short-term gain. Irrespective of the number of QTL simulated, BL is superior to RR in maintaining genetic variance and therefore results in higher long-term genetic gain. Moreover, DW is a promising method with which high long-term genetic gain can be expected within a fixed time frame.
2014-01-01
Background Signatures of selection are regions in the genome that have been preferentially increased in frequency and fixed in a population because of their functional importance in specific processes. These regions can be detected because of their lower genetic variability and specific regional linkage disequilibrium (LD) patterns. Methods By comparing the differences in regional LD variation between dairy and beef cattle types, and between indicine and taurine subspecies, we aim at finding signatures of selection for production and adaptation in cattle breeds. The VarLD method was applied to compare the LD variation in the autosomal genome between breeds, including Angus and Brown Swiss, representing taurine breeds, and Nelore and Gir, representing indicine breeds. Genomic regions containing the top 0.01 and 0.1 percentile of signals were characterized using the UMD3.1 Bos taurus genome assembly to identify genes in those regions and compared with previously reported selection signatures and regions with copy number variation. Results For all comparisons, the top 0.01 and 0.1 percentile included 26 and 165 signals and 17 and 125 genes, respectively, including TECRL, BT.23182 or FPPS, CAST, MYOM1, UVRAG and DNAJA1. Conclusions The VarLD method is a powerful tool to identify differences in linkage disequilibrium between cattle populations and putative signatures of selection with potential adaptive and productive importance. PMID:24592996
Genome dynamics of the human embryonic kidney 293 lineage in response to cell biology manipulations.
Lin, Yao-Cheng; Boone, Morgane; Meuris, Leander; Lemmens, Irma; Van Roy, Nadine; Soete, Arne; Reumers, Joke; Moisse, Matthieu; Plaisance, Stéphane; Drmanac, Radoje; Chen, Jason; Speleman, Frank; Lambrechts, Diether; Van de Peer, Yves; Tavernier, Jan; Callewaert, Nico
2014-09-03
The HEK293 human cell lineage is widely used in cell biology and biotechnology. Here we use whole-genome resequencing of six 293 cell lines to study the dynamics of this aneuploid genome in response to the manipulations used to generate common 293 cell derivatives, such as transformation and stable clone generation (293T); suspension growth adaptation (293S); and cytotoxic lectin selection (293SG). Remarkably, we observe that copy number alteration detection could identify the genomic region that enabled cell survival under selective conditions (i.c. ricin selection). Furthermore, we present methods to detect human/vector genome breakpoints and a user-friendly visualization tool for the 293 genome data. We also establish that the genome structure composition is in steady state for most of these cell lines when standard cell culturing conditions are used. This resource enables novel and more informed studies with 293 cells, and we will distribute the sequenced cell lines to this effect.
A genome-wide scan for signatures of selection in Chinese indigenous and commercial pig breeds.
Yang, Songbai; Li, Xiuling; Li, Kui; Fan, Bin; Tang, Zhonglin
2014-01-15
Modern breeding and artificial selection play critical roles in pig domestication and shape the genetic variation of different breeds. China has many indigenous pig breeds with various characteristics in morphology and production performance that differ from those of foreign commercial pig breeds. However, the signatures of selection on genes implying for economic traits between Chinese indigenous and commercial pigs have been poorly understood. We identified footprints of positive selection at the whole genome level, comprising 44,652 SNPs genotyped in six Chinese indigenous pig breeds, one developed breed and two commercial breeds. An empirical genome-wide distribution of Fst (F-statistics) was constructed based on estimations of Fst for each SNP across these nine breeds. We detected selection at the genome level using the High-Fst outlier method and found that 81 candidate genes show high evidence of positive selection. Furthermore, the results of network analyses showed that the genes that displayed evidence of positive selection were mainly involved in the development of tissues and organs, and the immune response. In addition, we calculated the pairwise Fst between Chinese indigenous and commercial breeds (CHN VS EURO) and between Northern and Southern Chinese indigenous breeds (Northern VS Southern). The IGF1R and ESR1 genes showed evidence of positive selection in the CHN VS EURO and Northern VS Southern groups, respectively. In this study, we first identified the genomic regions that showed evidences of selection between Chinese indigenous and commercial pig breeds using the High-Fst outlier method. These regions were found to be involved in the development of tissues and organs, the immune response, growth and litter size. The results of this study provide new insights into understanding the genetic variation and domestication in pigs.
A genome-wide scan for signatures of selection in Chinese indigenous and commercial pig breeds
2014-01-01
Background Modern breeding and artificial selection play critical roles in pig domestication and shape the genetic variation of different breeds. China has many indigenous pig breeds with various characteristics in morphology and production performance that differ from those of foreign commercial pig breeds. However, the signatures of selection on genes implying for economic traits between Chinese indigenous and commercial pigs have been poorly understood. Results We identified footprints of positive selection at the whole genome level, comprising 44,652 SNPs genotyped in six Chinese indigenous pig breeds, one developed breed and two commercial breeds. An empirical genome-wide distribution of Fst (F-statistics) was constructed based on estimations of Fst for each SNP across these nine breeds. We detected selection at the genome level using the High-Fst outlier method and found that 81 candidate genes show high evidence of positive selection. Furthermore, the results of network analyses showed that the genes that displayed evidence of positive selection were mainly involved in the development of tissues and organs, and the immune response. In addition, we calculated the pairwise Fst between Chinese indigenous and commercial breeds (CHN VS EURO) and between Northern and Southern Chinese indigenous breeds (Northern VS Southern). The IGF1R and ESR1 genes showed evidence of positive selection in the CHN VS EURO and Northern VS Southern groups, respectively. Conclusions In this study, we first identified the genomic regions that showed evidences of selection between Chinese indigenous and commercial pig breeds using the High-Fst outlier method. These regions were found to be involved in the development of tissues and organs, the immune response, growth and litter size. The results of this study provide new insights into understanding the genetic variation and domestication in pigs. PMID:24422716
Screening of duplicated loci reveals hidden divergence patterns in a complex salmonid genome
Limborg, Morten T.; Larson, Wesley; Seeb, Lisa W.; Seeb, James E.
2017-01-01
A whole-genome duplication (WGD) doubles the entire genomic content of a species and is thought to have catalysed adaptive radiation in some polyploid-origin lineages. However, little is known about general consequences of a WGD because gene duplicates (i.e., paralogs) are commonly filtered in genomic studies; such filtering may remove substantial portions of the genome in data sets from polyploid-origin species. We demonstrate a new method that enables genome-wide scans for signatures of selection at both nonduplicated and duplicated loci by taking locus-specific copy number into account. We apply this method to RAD sequence data from different ecotypes of a polyploid-origin salmonid (Oncorhynchus nerka) and reveal signatures of divergent selection that would have been missed if duplicated loci were filtered. We also find conserved signatures of elevated divergence at pairs of homeologous chromosomes with residual tetrasomic inheritance, suggesting that joint evolution of some nondiverged gene duplicates may affect the adaptive potential of these genes. These findings illustrate that including duplicated loci in genomic analyses enables novel insights into the evolutionary consequences of WGDs and local segmental gene duplications.
Economic evaluation of genomic selection in small ruminants: a sheep meat breeding program.
Shumbusho, F; Raoul, J; Astruc, J M; Palhiere, I; Lemarié, S; Fugeray-Scarbel, A; Elsen, J M
2016-06-01
Recent genomic evaluation studies using real data and predicting genetic gain by modeling breeding programs have reported moderate expected benefits from the replacement of classic selection schemes by genomic selection (GS) in small ruminants. The objectives of this study were to compare the cost, monetary genetic gain and economic efficiency of classic selection and GS schemes in the meat sheep industry. Deterministic methods were used to model selection based on multi-trait indices from a sheep meat breeding program. Decisional variables related to male selection candidates and progeny testing were optimized to maximize the annual monetary genetic gain (AMGG), that is, a weighted sum of meat and maternal traits annual genetic gains. For GS, a reference population of 2000 individuals was assumed and genomic information was available for evaluation of male candidates only. In the classic selection scheme, males breeding values were estimated from own and offspring phenotypes. In GS, different scenarios were considered, differing by the information used to select males (genomic only, genomic+own performance, genomic+offspring phenotypes). The results showed that all GS scenarios were associated with higher total variable costs than classic selection (if the cost of genotyping was 123 euros/animal). In terms of AMGG and economic returns, GS scenarios were found to be superior to classic selection only if genomic information was combined with their own meat phenotypes (GS-Pheno) or with their progeny test information. The predicted economic efficiency, defined as returns (proportional to number of expressions of AMGG in the nucleus and commercial flocks) minus total variable costs, showed that the best GS scenario (GS-Pheno) was up to 15% more efficient than classic selection. For all selection scenarios, optimization increased the overall AMGG, returns and economic efficiency. As a conclusion, our study shows that some forms of GS strategies are more advantageous than classic selection, provided that GS is already initiated (i.e. the initial reference population is available). Optimizing decisional variables of the classic selection scheme could be of greater benefit than including genomic information in optimized designs.
The locus of sexual selection: moving sexual selection studies into the post-genomics era.
Wilkinson, G S; Breden, F; Mank, J E; Ritchie, M G; Higginson, A D; Radwan, J; Jaquiery, J; Salzburger, W; Arriero, E; Barribeau, S M; Phillips, P C; Renn, S C P; Rowe, L
2015-04-01
Sexual selection drives fundamental evolutionary processes such as trait elaboration and speciation. Despite this importance, there are surprisingly few examples of genes unequivocally responsible for variation in sexually selected phenotypes. This lack of information inhibits our ability to predict phenotypic change due to universal behaviours, such as fighting over mates and mate choice. Here, we discuss reasons for this apparent gap and provide recommendations for how it can be overcome by adopting contemporary genomic methods, exploiting underutilized taxa that may be ideal for detecting the effects of sexual selection and adopting appropriate experimental paradigms. Identifying genes that determine variation in sexually selected traits has the potential to improve theoretical models and reveal whether the genetic changes underlying phenotypic novelty utilize common or unique molecular mechanisms. Such a genomic approach to sexual selection will help answer questions in the evolution of sexually selected phenotypes that were first asked by Darwin and can furthermore serve as a model for the application of genomics in all areas of evolutionary biology. © 2015 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2015 European Society For Evolutionary Biology.
GAPIT: genome association and prediction integrated tool.
Lipka, Alexander E; Tian, Feng; Wang, Qishan; Peiffer, Jason; Li, Meng; Bradbury, Peter J; Gore, Michael A; Buckler, Edward S; Zhang, Zhiwu
2012-09-15
Software programs that conduct genome-wide association studies and genomic prediction and selection need to use methodologies that maximize statistical power, provide high prediction accuracy and run in a computationally efficient manner. We developed an R package called Genome Association and Prediction Integrated Tool (GAPIT) that implements advanced statistical methods including the compressed mixed linear model (CMLM) and CMLM-based genomic prediction and selection. The GAPIT package can handle large datasets in excess of 10 000 individuals and 1 million single-nucleotide polymorphisms with minimal computational time, while providing user-friendly access and concise tables and graphs to interpret results. http://www.maizegenetics.net/GAPIT. zhiwu.zhang@cornell.edu Supplementary data are available at Bioinformatics online.
[Genome editing of industrial microorganism].
Zhu, Linjiang; Li, Qi
2015-03-01
Genome editing is defined as highly-effective and precise modification of cellular genome in a large scale. In recent years, such genome-editing methods have been rapidly developed in the field of industrial strain improvement. The quickly-updating methods thoroughly change the old mode of inefficient genetic modification, which is "one modification, one selection marker, and one target site". Highly-effective modification mode in genome editing have been developed including simultaneous modification of multiplex genes, highly-effective insertion, replacement, and deletion of target genes in the genome scale, cut-paste of a large DNA fragment. These new tools for microbial genome editing will certainly be applied widely, and increase the efficiency of industrial strain improvement, and promote the revolution of traditional fermentation industry and rapid development of novel industrial biotechnology like production of biofuel and biomaterial. The technological principle of these genome-editing methods and their applications were summarized in this review, which can benefit engineering and construction of industrial microorganism.
Recent advances in understanding the role of nutrition in human genome evolution.
Ye, Kaixiong; Gu, Zhenglong
2011-11-01
Dietary transitions in human history have been suggested to play important roles in the evolution of mankind. Genetic variations caused by adaptation to diet during human evolution could have important health consequences in current society. The advance of sequencing technologies and the rapid accumulation of genome information provide an unprecedented opportunity to comprehensively characterize genetic variations in human populations and unravel the genetic basis of human evolution. Series of selection detection methods, based on various theoretical models and exploiting different aspects of selection signatures, have been developed. Their applications at the species and population levels have respectively led to the identification of human specific selection events that distinguish human from nonhuman primates and local adaptation events that contribute to human diversity. Scrutiny of candidate genes has revealed paradigms of adaptations to specific nutritional components and genome-wide selection scans have verified the prevalence of diet-related selection events and provided many more candidates awaiting further investigation. Understanding the role of diet in human evolution is fundamental for the development of evidence-based, genome-informed nutritional practices in the era of personal genomics.
Colleau, Jean-Jacques; Palhière, Isabelle; Rodríguez-Ramilo, Silvia T; Legarra, Andres
2017-12-01
Pedigree-based management of genetic diversity in populations, e.g., using optimal contributions, involves computation of the [Formula: see text] type yielding elements (relationships) or functions (usually averages) of relationship matrices. For pedigree-based relationships [Formula: see text], a very efficient method exists. When all the individuals of interest are genotyped, genomic management can be addressed using the genomic relationship matrix [Formula: see text]; however, to date, the computational problem of efficiently computing [Formula: see text] has not been well studied. When some individuals of interest are not genotyped, genomic management should consider the relationship matrix [Formula: see text] that combines genotyped and ungenotyped individuals; however, direct computation of [Formula: see text] is computationally very demanding, because construction of a possibly huge matrix is required. Our work presents efficient ways of computing [Formula: see text] and [Formula: see text], with applications on real data from dairy sheep and dairy goat breeding schemes. For genomic relationships, an efficient indirect computation with quadratic instead of cubic cost is [Formula: see text], where Z is a matrix relating animals to genotypes. For the relationship matrix [Formula: see text], we propose an indirect method based on the difference between vectors [Formula: see text], which involves computation of [Formula: see text] and of products such as [Formula: see text] and [Formula: see text], where [Formula: see text] is a working vector derived from [Formula: see text]. The latter computation is the most demanding but can be done using sparse Cholesky decompositions of matrix [Formula: see text], which allows handling very large genomic and pedigree data files. Studies based on simulations reported in the literature show that the trends of average relationships in [Formula: see text] and [Formula: see text] differ as genomic selection proceeds. When selection is based on genomic relationships but management is based on pedigree data, the true genetic diversity is overestimated. However, our tests on real data from sheep and goat obtained before genomic selection started do not show this. We present efficient methods to compute elements and statistics of the genomic relationships [Formula: see text] and of matrix [Formula: see text] that combines ungenotyped and genotyped individuals. These methods should be useful to monitor and handle genomic diversity.
Multi-locus analysis of genomic time series data from experimental evolution.
Terhorst, Jonathan; Schlötterer, Christian; Song, Yun S
2015-04-01
Genomic time series data generated by evolve-and-resequence (E&R) experiments offer a powerful window into the mechanisms that drive evolution. However, standard population genetic inference procedures do not account for sampling serially over time, and new methods are needed to make full use of modern experimental evolution data. To address this problem, we develop a Gaussian process approximation to the multi-locus Wright-Fisher process with selection over a time course of tens of generations. The mean and covariance structure of the Gaussian process are obtained by computing the corresponding moments in discrete-time Wright-Fisher models conditioned on the presence of a linked selected site. This enables our method to account for the effects of linkage and selection, both along the genome and across sampled time points, in an approximate but principled manner. We first use simulated data to demonstrate the power of our method to correctly detect, locate and estimate the fitness of a selected allele from among several linked sites. We study how this power changes for different values of selection strength, initial haplotypic diversity, population size, sampling frequency, experimental duration, number of replicates, and sequencing coverage depth. In addition to providing quantitative estimates of selection parameters from experimental evolution data, our model can be used by practitioners to design E&R experiments with requisite power. We also explore how our likelihood-based approach can be used to infer other model parameters, including effective population size and recombination rate. Then, we apply our method to analyze genome-wide data from a real E&R experiment designed to study the adaptation of D. melanogaster to a new laboratory environment with alternating cold and hot temperatures.
2013-01-01
Background Machine learning techniques are becoming useful as an alternative approach to conventional medical diagnosis or prognosis as they are good for handling noisy and incomplete data, and significant results can be attained despite a small sample size. Traditionally, clinicians make prognostic decisions based on clinicopathologic markers. However, it is not easy for the most skilful clinician to come out with an accurate prognosis by using these markers alone. Thus, there is a need to use genomic markers to improve the accuracy of prognosis. The main aim of this research is to apply a hybrid of feature selection and machine learning methods in oral cancer prognosis based on the parameters of the correlation of clinicopathologic and genomic markers. Results In the first stage of this research, five feature selection methods have been proposed and experimented on the oral cancer prognosis dataset. In the second stage, the model with the features selected from each feature selection methods are tested on the proposed classifiers. Four types of classifiers are chosen; these are namely, ANFIS, artificial neural network, support vector machine and logistic regression. A k-fold cross-validation is implemented on all types of classifiers due to the small sample size. The hybrid model of ReliefF-GA-ANFIS with 3-input features of drink, invasion and p63 achieved the best accuracy (accuracy = 93.81%; AUC = 0.90) for the oral cancer prognosis. Conclusions The results revealed that the prognosis is superior with the presence of both clinicopathologic and genomic markers. The selected features can be investigated further to validate the potential of becoming as significant prognostic signature in the oral cancer studies. PMID:23725313
Zhang, Xiaoshuai; Xue, Fuzhong; Liu, Hong; Zhu, Dianwen; Peng, Bin; Wiemels, Joseph L; Yang, Xiaowei
2014-12-10
Genome-wide Association Studies (GWAS) are typically designed to identify phenotype-associated single nucleotide polymorphisms (SNPs) individually using univariate analysis methods. Though providing valuable insights into genetic risks of common diseases, the genetic variants identified by GWAS generally account for only a small proportion of the total heritability for complex diseases. To solve this "missing heritability" problem, we implemented a strategy called integrative Bayesian Variable Selection (iBVS), which is based on a hierarchical model that incorporates an informative prior by considering the gene interrelationship as a network. It was applied here to both simulated and real data sets. Simulation studies indicated that the iBVS method was advantageous in its performance with highest AUC in both variable selection and outcome prediction, when compared to Stepwise and LASSO based strategies. In an analysis of a leprosy case-control study, iBVS selected 94 SNPs as predictors, while LASSO selected 100 SNPs. The Stepwise regression yielded a more parsimonious model with only 3 SNPs. The prediction results demonstrated that the iBVS method had comparable performance with that of LASSO, but better than Stepwise strategies. The proposed iBVS strategy is a novel and valid method for Genome-wide Association Studies, with the additional advantage in that it produces more interpretable posterior probabilities for each variable unlike LASSO and other penalized regression methods.
A field ornithologist’s guide to genomics: Practical considerations for ecology and conservation
Oyler-McCance, Sara J.; Oh, Kevin; Langin, Kathryn; Aldridge, Cameron L.
2016-01-01
Vast improvements in sequencing technology have made it practical to simultaneously sequence millions of nucleotides distributed across the genome, opening the door for genomic studies in virtually any species. Ornithological research stands to benefit in three substantial ways. First, genomic methods enhance our ability to parse and simultaneously analyze both neutral and non-neutral genomic regions, thus providing insight into adaptive evolution and divergence. Second, the sheer quantity of sequence data generated by current sequencing platforms allows increased precision and resolution in analyses. Third, high-throughput sequencing can benefit applications that focus on a small number of loci that are otherwise prohibitively expensive, time-consuming, and technically difficult using traditional sequencing methods. These advances have improved our ability to understand evolutionary processes like speciation and local adaptation, but they also offer many practical applications in the fields of population ecology, migration tracking, conservation planning, diet analyses, and disease ecology. This review provides a guide for field ornithologists interested in incorporating genomic approaches into their research program, with an emphasis on techniques related to ecology and conservation. We present a general overview of contemporary genomic approaches and methods, as well as important considerations when selecting a genomic technique. We also discuss research questions that are likely to benefit from utilizing high-throughput sequencing instruments, highlighting select examples from recent avian studies.
Improving the baking quality of bread wheat by genomic selection in early generations.
Michel, Sebastian; Kummer, Christian; Gallee, Martin; Hellinger, Jakob; Ametz, Christian; Akgöl, Batuhan; Epure, Doru; Güngör, Huseyin; Löschenberger, Franziska; Buerstmayr, Hermann
2018-02-01
Genomic selection shows great promise for pre-selecting lines with superior bread baking quality in early generations, 3 years ahead of labour-intensive, time-consuming, and costly quality analysis. The genetic improvement of baking quality is one of the grand challenges in wheat breeding as the assessment of the associated traits often involves time-consuming, labour-intensive, and costly testing forcing breeders to postpone sophisticated quality tests to the very last phases of variety development. The prospect of genomic selection for complex traits like grain yield has been shown in numerous studies, and might thus be also an interesting method to select for baking quality traits. Hence, we focused in this study on the accuracy of genomic selection for laborious and expensive to phenotype quality traits as well as its selection response in comparison with phenotypic selection. More than 400 genotyped wheat lines were, therefore, phenotyped for protein content, dough viscoelastic and mixing properties related to baking quality in multi-environment trials 2009-2016. The average prediction accuracy across three independent validation populations was r = 0.39 and could be increased to r = 0.47 by modelling major QTL as fixed effects as well as employing multi-trait prediction models, which resulted in an acceptable prediction accuracy for all dough rheological traits (r = 0.38-0.63). Genomic selection can furthermore be applied 2-3 years earlier than direct phenotypic selection, and the estimated selection response was nearly twice as high in comparison with indirect selection by protein content for baking quality related traits. This considerable advantage of genomic selection could accordingly support breeders in their selection decisions and aid in efficiently combining superior baking quality with grain yield in newly developed wheat varieties.
Thinking too positive? Revisiting current methods of population genetic selection inference.
Bank, Claudia; Ewing, Gregory B; Ferrer-Admettla, Anna; Foll, Matthieu; Jensen, Jeffrey D
2014-12-01
In the age of next-generation sequencing, the availability of increasing amounts and improved quality of data at decreasing cost ought to allow for a better understanding of how natural selection is shaping the genome than ever before. However, alternative forces, such as demography and background selection (BGS), obscure the footprints of positive selection that we would like to identify. In this review, we illustrate recent developments in this area, and outline a roadmap for improved selection inference. We argue (i) that the development and obligatory use of advanced simulation tools is necessary for improved identification of selected loci, (ii) that genomic information from multiple time points will enhance the power of inference, and (iii) that results from experimental evolution should be utilized to better inform population genomic studies. Copyright © 2014 Elsevier Ltd. All rights reserved.
Deep Learning for Population Genetic Inference.
Sheehan, Sara; Song, Yun S
2016-03-01
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme.
Deep Learning for Population Genetic Inference
Sheehan, Sara; Song, Yun S.
2016-01-01
Given genomic variation data from multiple individuals, computing the likelihood of complex population genetic models is often infeasible. To circumvent this problem, we introduce a novel likelihood-free inference framework by applying deep learning, a powerful modern technique in machine learning. Deep learning makes use of multilayer neural networks to learn a feature-based function from the input (e.g., hundreds of correlated summary statistics of data) to the output (e.g., population genetic parameters of interest). We demonstrate that deep learning can be effectively employed for population genetic inference and learning informative features of data. As a concrete application, we focus on the challenging problem of jointly inferring natural selection and demography (in the form of a population size change history). Our method is able to separate the global nature of demography from the local nature of selection, without sequential steps for these two factors. Studying demography and selection jointly is motivated by Drosophila, where pervasive selection confounds demographic analysis. We apply our method to 197 African Drosophila melanogaster genomes from Zambia to infer both their overall demography, and regions of their genome under selection. We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection. Interestingly, we find that soft sweeps and balancing selection occur more frequently closer to the centromere of each chromosome. In addition, our demographic inference suggests that previously estimated bottlenecks for African Drosophila melanogaster are too extreme. PMID:27018908
Interpreting the genomic landscape of speciation: a road map for finding barriers to gene flow.
Ravinet, M; Faria, R; Butlin, R K; Galindo, J; Bierne, N; Rafajlović, M; Noor, M A F; Mehlig, B; Westram, A M
2017-08-01
Speciation, the evolution of reproductive isolation among populations, is continuous, complex, and involves multiple, interacting barriers. Until it is complete, the effects of this process vary along the genome and can lead to a heterogeneous genomic landscape with peaks and troughs of differentiation and divergence. When gene flow occurs during speciation, barriers restricting gene flow locally in the genome lead to patterns of heterogeneity. However, genomic heterogeneity can also be produced or modified by variation in factors such as background selection and selective sweeps, recombination and mutation rate variation, and heterogeneous gene density. Extracting the effects of gene flow, divergent selection and reproductive isolation from such modifying factors presents a major challenge to speciation genomics. We argue one of the principal aims of the field is to identify the barrier loci involved in limiting gene flow. We first summarize the expected signatures of selection at barrier loci, at the genomic regions linked to them and across the entire genome. We then discuss the modifying factors that complicate the interpretation of the observed genomic landscape. Finally, we end with a road map for future speciation research: a proposal for how to account for these modifying factors and to progress towards understanding the nature of barrier loci. Despite the difficulties of interpreting empirical data, we argue that the availability of promising technical and analytical methods will shed further light on the important roles that gene flow and divergent selection have in shaping the genomic landscape of speciation. © 2017 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2017 European Society For Evolutionary Biology.
Hickey, John M; Chiurugwi, Tinashe; Mackay, Ian; Powell, Wayne
2017-08-30
The rate of annual yield increases for major staple crops must more than double relative to current levels in order to feed a predicted global population of 9 billion by 2050. Controlled hybridization and selective breeding have been used for centuries to adapt plant and animal species for human use. However, achieving higher, sustainable rates of improvement in yields in various species will require renewed genetic interventions and dramatic improvement of agricultural practices. Genomic prediction of breeding values has the potential to improve selection, reduce costs and provide a platform that unifies breeding approaches, biological discovery, and tools and methods. Here we compare and contrast some animal and plant breeding approaches to make a case for bringing the two together through the application of genomic selection. We propose a strategy for the use of genomic selection as a unifying approach to deliver innovative 'step changes' in the rate of genetic gain at scale.
Will genomic selection be a practical method for plant breeding?
Nakaya, Akihiro; Isobe, Sachiko N
2012-11-01
Genomic selection or genome-wide selection (GS) has been highlighted as a new approach for marker-assisted selection (MAS) in recent years. GS is a form of MAS that selects favourable individuals based on genomic estimated breeding values. Previous studies have suggested the utility of GS, especially for capturing small-effect quantitative trait loci, but GS has not become a popular methodology in the field of plant breeding, possibly because there is insufficient information available on GS for practical use. In this review, GS is discussed from a practical breeding viewpoint. Statistical approaches employed in GS are briefly described, before the recent progress in GS studies is surveyed. GS practices in plant breeding are then reviewed before future prospects are discussed. Statistical concepts used in GS are discussed with genetic models and variance decomposition, heritability, breeding value and linear model. Recent progress in GS studies is reviewed with a focus on empirical studies. For the practice of GS in plant breeding, several specific points are discussed including linkage disequilibrium, feature of populations and genotyped markers and breeding scheme. Currently, GS is not perfect, but it is a potent, attractive and valuable approach for plant breeding. This method will be integrated into many practical breeding programmes in the near future with further advances and the maturing of its theory.
Detection of genomic signatures of recent selection in commercial broiler chickens.
Fu, Weixuan; Lee, William R; Abasht, Behnam
2016-08-26
Identification of the genomic signatures of recent selection may help uncover causal polymorphisms controlling traits relevant to recent decades of selective breeding in livestock. In this study, we aimed at detecting signatures of recent selection in commercial broiler chickens using genotype information from single nucleotide polymorphisms (SNPs). A total of 565 chickens from five commercial purebred lines, including three broiler sire (male) lines and two broiler dam (female) lines, were genotyped using the 60K SNP Illumina iSelect chicken array. To detect genomic signatures of recent selection, we applied two methods based on population comparison, cross-population extended haplotype homozygosity (XP-EHH) and cross-population composite likelihood ratio (XP-CLR), and further analyzed the results to find genomic regions under recent selection in multiple purebred lines. A total of 321 candidate selection regions spanning approximately 1.45 % of the chicken genome in each line were detected by consensus of results of both XP-EHH and XP-CLR methods. To minimize false discovery due to genetic drift, only 42 of the candidate selection regions that were shared by 2 or more purebred lines were considered as high-confidence selection regions in the study. Of these 42 regions, 20 were 50 kb or less while 4 regions were larger than 0.5 Mb. In total, 91 genes could be found in the 42 regions, among which 19 regions contained only 1 or 2 genes, and 9 regions were located at gene deserts. Our results provide a genome-wide scan of recent selection signatures in five purebred lines of commercial broiler chickens. We found several candidate genes for recent selection in multiple lines, such as SOX6 (Sex Determining Region Y-Box 6) and cTR (Thyroid hormone receptor beta). These genes may have been under recent selection due to their essential roles in growth, development and reproduction in chickens. Furthermore, our results suggest that in some candidate regions, the same or opposite alleles have been under recent selection in multiple lines. Most of the candidate genes in the selection regions are novel, and as such they should be of great interest for future research into the genetic architecture of traits relevant to modern broiler breeding.
An experimental validation of genomic selection in octoploid strawberry
Gezan, Salvador A; Osorio, Luis F; Verma, Sujeet; Whitaker, Vance M
2017-01-01
The primary goal of genomic selection is to increase genetic gains for complex traits by predicting performance of individuals for which phenotypic data are not available. The objective of this study was to experimentally evaluate the potential of genomic selection in strawberry breeding and to define a strategy for its implementation. Four clonally replicated field trials, two in each of 2 years comprised of a total of 1628 individuals, were established in 2013–2014 and 2014–2015. Five complex yield and fruit quality traits with moderate to low heritability were assessed in each trial. High-density genotyping was performed with the Affymetrix Axiom IStraw90 single-nucleotide polymorphism array, and 17 479 polymorphic markers were chosen for analysis. Several methods were compared, including Genomic BLUP, Bayes B, Bayes C, Bayesian LASSO Regression, Bayesian Ridge Regression and Reproducing Kernel Hilbert Spaces. Cross-validation within training populations resulted in higher values than for true validations across trials. For true validations, Bayes B gave the highest predictive abilities on average and also the highest selection efficiencies, particularly for yield traits that were the lowest heritability traits. Selection efficiencies using Bayes B for parent selection ranged from 74% for average fruit weight to 34% for early marketable yield. A breeding strategy is proposed in which advanced selection trials are utilized as training populations and in which genomic selection can reduce the breeding cycle from 3 to 2 years for a subset of untested parents based on their predicted genomic breeding values. PMID:28090334
USDA-ARS?s Scientific Manuscript database
The objective of this study was to compare methods for genomic evaluation in a Rainbow Trout (Oncorhynchus mykiss) population for survival when challenged by Flavobacterium psychrophilum, the causative agent of bacterial cold water disease (BCWD). The used methods were: 1)regular ssGBLUP that assume...
Ishiyama, Izumi; Tanzawa, Tetsuro; Watanabe, Maiko; Maeda, Tadahiko; Muto, Kaori; Tamakoshi, Akiko; Nagai, Akiko; Yamagata, Zentaro
2012-05-01
This study aimed to assess public attitudes in Japan to the promotion of genomic selection in crop studies and to examine associated factors. We analysed data from a nationwide opinion survey. A total of 4,000 people were selected from the Japanese general population by a stratified two-phase sampling method, and 2,171 people participated by post; this survey asked about the pros and cons of crop-related genomic studies promotion, examined people's scientific literacy in genomics, and investigated factors thought to be related to genomic literacy and attitude. The relationships were examined using logistic regression models stratified by gender. Survey results showed that 50.0% of respondents approved of the promotion of crop-related genomic studies, while 6.7% disapproved. No correlation was found between literacy and attitude towards promotion. Trust in experts, belief in science, an interest in genomic studies and willingness to purchase new products correlated with a positive attitude towards crop-related genomic studies.
A Meta-Assembly of Selection Signatures in Cattle
Randhawa, Imtiaz A. S.; Khatkar, Mehar S.; Thomson, Peter C.; Raadsma, Herman W.
2016-01-01
Since domestication, significant genetic improvement has been achieved for many traits of commercial importance in cattle, including adaptation, appearance and production. In response to such intense selection pressures, the bovine genome has undergone changes at the underlying regions of functional genetic variants, which are termed “selection signatures”. This article reviews 64 recent (2009–2015) investigations testing genomic diversity for departure from neutrality in worldwide cattle populations. In particular, we constructed a meta-assembly of 16,158 selection signatures for individual breeds and their archetype groups (European, African, Zebu and composite) from 56 genome-wide scans representing 70,743 animals of 90 pure and crossbred cattle breeds. Meta-selection-scores (MSS) were computed by combining published results at every given locus, within a sliding window span. MSS were adjusted for common samples across studies and were weighted for significance thresholds across and within studies. Published selection signatures show extensive coverage across the bovine genome, however, the meta-assembly provides a consensus profile of 263 genomic regions of which 141 were unique (113 were breed-specific) and 122 were shared across cattle archetypes. The most prominent peaks of MSS represent regions under selection across multiple populations and harboured genes of known major effects (coat color, polledness and muscle hypertrophy) and genes known to influence polygenic traits (stature, adaptation, feed efficiency, immunity, behaviour, reproduction, beef and dairy production). As the first meta-assembly of selection signatures, it offers novel insights about the hotspots of selective sweeps in the bovine genome, and this method could equally be applied to other species. PMID:27045296
Genomics and Biochemistry of Saccharomyces cerevisiae Wine Yeast Strains.
Eldarov, M A; Kishkovskaia, S A; Tanaschuk, T N; Mardanov, A V
2016-12-01
Saccharomyces yeasts have been used for millennia for the production of beer, wine, bread, and other fermented products. Long-term "unconscious" selection and domestication led to the selection of hundreds of strains with desired production traits having significant phenotypic and genetic differences from their wild ancestors. This review summarizes the results of recent research in deciphering the genomes of wine Saccharomyces strains, the use of comparative genomics methods to study the mechanisms of yeast genome evolution under conditions of artificial selection, and the use of genomic and postgenomic approaches to identify the molecular nature of the important characteristics of commercial wine strains of Saccharomyces. Succinctly, data concerning metagenomics of microbial communities of grapes and wine and the dynamics of yeast and bacterial flora in the course of winemaking is provided. A separate section is devoted to an overview of the physiological, genetic, and biochemical features of sherry yeast strains used to produce biologically aged wines. The goal of the review is to convince the reader of the efficacy of new genomic and postgenomic technologies as tools for developing strategies for targeted selection and creation of new strains using "classical" and modern techniques for improving winemaking technology.
CRISPR-Cas9-Based Genome Editing of Human Induced Pluripotent Stem Cells.
Giacalone, Joseph C; Sharma, Tasneem P; Burnight, Erin R; Fingert, John F; Mullins, Robert F; Stone, Edwin M; Tucker, Budd A
2018-02-28
Human induced pluripotent stem cells (hiPSCs) are the ideal cell source for autologous cell replacement. However, for patients with Mendelian diseases, genetic correction of the original disease-causing mutation is likely required prior to cellular differentiation and transplantation. The emergence of the CRISPR-Cas9 system has revolutionized the field of genome editing. By introducing inexpensive reagents that are relatively straightforward to design and validate, it is now possible to correct genetic variants or insert desired sequences at any location within the genome. CRISPR-based genome editing of patient-specific iPSCs shows great promise for future autologous cell replacement therapies. One caveat, however, is that hiPSCs are notoriously difficult to transfect, and optimized experimental design considerations are often necessary. This unit describes design strategies and methods for efficient CRISPR-based genome editing of patient- specific iPSCs. Additionally, it details a flexible approach that utilizes positive selection to generate clones with a desired genomic modification, Cre-lox recombination to remove the integrated selection cassette, and negative selection to eliminate residual hiPSCs with intact selection cassettes. © 2018 by John Wiley & Sons, Inc. Copyright © 2018 John Wiley & Sons, Inc.
Variation block-based genomics method for crop plants.
Kim, Yul Ho; Park, Hyang Mi; Hwang, Tae-Young; Lee, Seuk Ki; Choi, Man Soo; Jho, Sungwoong; Hwang, Seungwoo; Kim, Hak-Min; Lee, Dongwoo; Kim, Byoung-Chul; Hong, Chang Pyo; Cho, Yun Sung; Kim, Hyunmin; Jeong, Kwang Ho; Seo, Min Jung; Yun, Hong Tai; Kim, Sun Lim; Kwon, Young-Up; Kim, Wook Han; Chun, Hye Kyung; Lim, Sang Jong; Shin, Young-Ah; Choi, Ik-Young; Kim, Young Sun; Yoon, Ho-Sung; Lee, Suk-Ha; Lee, Sunghoon
2014-06-15
In contrast with wild species, cultivated crop genomes consist of reshuffled recombination blocks, which occurred by crossing and selection processes. Accordingly, recombination block-based genomics analysis can be an effective approach for the screening of target loci for agricultural traits. We propose the variation block method, which is a three-step process for recombination block detection and comparison. The first step is to detect variations by comparing the short-read DNA sequences of the cultivar to the reference genome of the target crop. Next, sequence blocks with variation patterns are examined and defined. The boundaries between the variation-containing sequence blocks are regarded as recombination sites. All the assumed recombination sites in the cultivar set are used to split the genomes, and the resulting sequence regions are termed variation blocks. Finally, the genomes are compared using the variation blocks. The variation block method identified recurring recombination blocks accurately and successfully represented block-level diversities in the publicly available genomes of 31 soybean and 23 rice accessions. The practicality of this approach was demonstrated by the identification of a putative locus determining soybean hilum color. We suggest that the variation block method is an efficient genomics method for the recombination block-level comparison of crop genomes. We expect that this method will facilitate the development of crop genomics by bringing genomics technologies to the field of crop breeding.
Evaluation of redundancy analysis to identify signatures of local adaptation.
Capblancq, Thibaut; Luu, Keurcien; Blum, Michael G B; Bazin, Eric
2018-05-26
Ordination is a common tool in ecology that aims at representing complex biological information in a reduced space. In landscape genetics, ordination methods such as principal component analysis (PCA) have been used to detect adaptive variation based on genomic data. Taking advantage of environmental data in addition to genotype data, redundancy analysis (RDA) is another ordination approach that is useful to detect adaptive variation. This paper aims at proposing a test statistic based on RDA to search for loci under selection. We compare redundancy analysis to pcadapt, which is a nonconstrained ordination method, and to a latent factor mixed model (LFMM), which is a univariate genotype-environment association method. Individual-based simulations identify evolutionary scenarios where RDA genome scans have a greater statistical power than genome scans based on PCA. By constraining the analysis with environmental variables, RDA performs better than PCA in identifying adaptive variation when selection gradients are weakly correlated with population structure. Additionally, we show that if RDA and LFMM have a similar power to identify genetic markers associated with environmental variables, the RDA-based procedure has the advantage to identify the main selective gradients as a combination of environmental variables. To give a concrete illustration of RDA in population genomics, we apply this method to the detection of outliers and selective gradients on an SNP data set of Populus trichocarpa (Geraldes et al., 2013). The RDA-based approach identifies the main selective gradient contrasting southern and coastal populations to northern and continental populations in the northwestern American coast. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.
Evaluation of methods and marker Systems in Genomic Selection of oil palm (Elaeis guineensis Jacq.).
Kwong, Qi Bin; Teh, Chee Keng; Ong, Ai Ling; Chew, Fook Tim; Mayes, Sean; Kulaveerasingam, Harikrishna; Tammi, Martti; Yeoh, Suat Hui; Appleton, David Ross; Harikrishna, Jennifer Ann
2017-12-11
Genomic selection (GS) uses genome-wide markers as an attempt to accelerate genetic gain in breeding programs of both animals and plants. This approach is particularly useful for perennial crops such as oil palm, which have long breeding cycles, and for which the optimal method for GS is still under debate. In this study, we evaluated the effect of different marker systems and modeling methods for implementing GS in an introgressed dura family derived from a Deli dura x Nigerian dura (Deli x Nigerian) with 112 individuals. This family is an important breeding source for developing new mother palms for superior oil yield and bunch characters. The traits of interest selected for this study were fruit-to-bunch (F/B), shell-to-fruit (S/F), kernel-to-fruit (K/F), mesocarp-to-fruit (M/F), oil per palm (O/P) and oil-to-dry mesocarp (O/DM). The marker systems evaluated were simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs). RR-BLUP, Bayesian A, B, Cπ, LASSO, Ridge Regression and two machine learning methods (SVM and Random Forest) were used to evaluate GS accuracy of the traits. The kinship coefficient between individuals in this family ranged from 0.35 to 0.62. S/F and O/DM had the highest genomic heritability, whereas F/B and O/P had the lowest. The accuracies using 135 SSRs were low, with accuracies of the traits around 0.20. The average accuracy of machine learning methods was 0.24, as compared to 0.20 achieved by other methods. The trait with the highest mean accuracy was F/B (0.28), while the lowest were both M/F and O/P (0.18). By using whole genomic SNPs, the accuracies for all traits, especially for O/DM (0.43), S/F (0.39) and M/F (0.30) were improved. The average accuracy of machine learning methods was 0.32, compared to 0.31 achieved by other methods. Due to high genomic resolution, the use of whole-genome SNPs improved the efficiency of GS dramatically for oil palm and is recommended for dura breeding programs. Machine learning slightly outperformed other methods, but required parameters optimization for GS implementation.
Lampe, David J; Witherspoon, David J; Soto-Adames, Felipe N; Robertson, Hugh M
2003-04-01
We report the isolation and sequencing of genomic copies of mariner transposons involved in recent horizontal transfers into the genomes of the European earwig, Forficula auricularia; the European honey bee, Apis mellifera; the Mediterranean fruit fly, Ceratitis capitata; and a blister beetle, Epicauta funebris, insects from four different orders. These elements are in the mellifera subfamily and are the second documented example of full-length mariner elements involved in this kind of phenomenon. We applied maximum likelihood methods to the coding sequences and determined that the copies in each genome were evolving neutrally, whereas reconstructed ancestral coding sequences appeared to be under selection, which strengthens our previous hypothesis that the primary selective constraint on mariner sequence evolution is the act of horizontal transfer between genomes.
Nwakanma, Davis C.; Duffy, Craig W.; Amambua-Ngwa, Alfred; Oriero, Eniyou C.; Bojang, Kalifa A.; Pinder, Margaret; Drakeley, Chris J.; Sutherland, Colin J.; Milligan, Paul J.; MacInnis, Bronwyn; Kwiatkowski, Dominic P.; Clark, Taane G.; Greenwood, Brian M.; Conway, David J.
2014-01-01
Background. Analysis of genome-wide polymorphism in many organisms has potential to identify genes under recent selection. However, data on historical allele frequency changes are rarely available for direct confirmation. Methods. We genotyped single nucleotide polymorphisms (SNPs) in 4 Plasmodium falciparum drug resistance genes in 668 archived parasite-positive blood samples of a Gambian population between 1984 and 2008. This covered a period before antimalarial resistance was detected locally, through subsequent failure of multiple drugs until introduction of artemisinin combination therapy. We separately performed genome-wide sequence analysis of 52 clinical isolates from 2008 to prospect for loci under recent directional selection. Results. Resistance alleles increased from very low frequencies, peaking in 2000 for chloroquine resistance-associated crt and mdr1 genes and at the end of the survey period for dhfr and dhps genes respectively associated with pyrimethamine and sulfadoxine resistance. Temporal changes fit a model incorporating likely selection coefficients over the period. Three of the drug resistance loci were in the top 4 regions under strong selection implicated by the genome-wide analysis. Conclusions. Genome-wide polymorphism analysis of an endemic population sample robustly identifies loci with detailed documentation of recent selection, demonstrating power to prospectively detect emerging drug resistance genes. PMID:24265439
Heuristic Bayesian segmentation for discovery of coexpressed genes within genomic regions.
Pehkonen, Petri; Wong, Garry; Törönen, Petri
2010-01-01
Segmentation aims to separate homogeneous areas from the sequential data, and plays a central role in data mining. It has applications ranging from finance to molecular biology, where bioinformatics tasks such as genome data analysis are active application fields. In this paper, we present a novel application of segmentation in locating genomic regions with coexpressed genes. We aim at automated discovery of such regions without requirement for user-given parameters. In order to perform the segmentation within a reasonable time, we use heuristics. Most of the heuristic segmentation algorithms require some decision on the number of segments. This is usually accomplished by using asymptotic model selection methods like the Bayesian information criterion. Such methods are based on some simplification, which can limit their usage. In this paper, we propose a Bayesian model selection to choose the most proper result from heuristic segmentation. Our Bayesian model presents a simple prior for the segmentation solutions with various segment numbers and a modified Dirichlet prior for modeling multinomial data. We show with various artificial data sets in our benchmark system that our model selection criterion has the best overall performance. The application of our method in yeast cell-cycle gene expression data reveals potential active and passive regions of the genome.
Will genomic selection be a practical method for plant breeding?
Nakaya, Akihiro; Isobe, Sachiko N.
2012-01-01
Background Genomic selection or genome-wide selection (GS) has been highlighted as a new approach for marker-assisted selection (MAS) in recent years. GS is a form of MAS that selects favourable individuals based on genomic estimated breeding values. Previous studies have suggested the utility of GS, especially for capturing small-effect quantitative trait loci, but GS has not become a popular methodology in the field of plant breeding, possibly because there is insufficient information available on GS for practical use. Scope In this review, GS is discussed from a practical breeding viewpoint. Statistical approaches employed in GS are briefly described, before the recent progress in GS studies is surveyed. GS practices in plant breeding are then reviewed before future prospects are discussed. Conclusions Statistical concepts used in GS are discussed with genetic models and variance decomposition, heritability, breeding value and linear model. Recent progress in GS studies is reviewed with a focus on empirical studies. For the practice of GS in plant breeding, several specific points are discussed including linkage disequilibrium, feature of populations and genotyped markers and breeding scheme. Currently, GS is not perfect, but it is a potent, attractive and valuable approach for plant breeding. This method will be integrated into many practical breeding programmes in the near future with further advances and the maturing of its theory. PMID:22645117
Martin-Ortigosa, Susana; Peterson, David J.; Valenstein, Justin S.; Lin, Victor S.-Y.; Trewyn, Brian G.; Lyznik, L. Alexander; Wang, Kan
2014-01-01
The delivery of proteins instead of DNA into plant cells allows for a transient presence of the protein or enzyme that can be useful for biochemical analysis or genome modifications. This may be of particular interest for genome editing, because it can avoid DNA (transgene) integration into the genome and generate precisely modified “nontransgenic” plants. In this work, we explore direct protein delivery to plant cells using mesoporous silica nanoparticles (MSNs) as carriers to deliver Cre recombinase protein into maize (Zea mays) cells. Cre protein was loaded inside the pores of gold-plated MSNs, and these particles were delivered by the biolistic method to plant cells harboring loxP sites flanking a selection gene and a reporter gene. Cre protein was released inside the cell, leading to recombination of the loxP sites and elimination of both genes. Visual selection was used to select recombination events from which fertile plants were regenerated. Up to 20% of bombarded embryos produced calli with the recombined loxP sites under our experimental conditions. This direct and reproducible technology offers an alternative for DNA-free genome-editing technologies in which MSNs can be tailored to accommodate the desired enzyme and to reach the desired tissue through the biolistic method. PMID:24376280
Kim, Tae Hoon; Dekker, Job
2018-05-01
Owing to its digital nature, ChIP-seq has become the standard method for genome-wide ChIP analysis. Using next-generation sequencing platforms (notably the Illumina Genome Analyzer), millions of short sequence reads can be obtained. The densities of recovered ChIP sequence reads along the genome are used to determine the binding sites of the protein. Although a relatively small amount of ChIP DNA is required for ChIP-seq, the current sequencing platforms still require amplification of the ChIP DNA by ligation-mediated PCR (LM-PCR). This protocol, which involves linker ligation followed by size selection, is the standard ChIP-seq protocol using an Illumina Genome Analyzer. The size-selected ChIP DNA is amplified by LM-PCR and size-selected for the second time. The purified ChIP DNA is then loaded into the Genome Analyzer. The ChIP DNA can also be processed in parallel for ChIP-chip results. © 2018 Cold Spring Harbor Laboratory Press.
Song, Zhijiao; Zhang, Miaomiao; Li, Fagen; Weng, Qijie; Zhou, Chanpin; Li, Mei; Li, Jie; Huang, Huanhua; Mo, Xiaoyong; Gan, Siming
2016-01-01
Identification of loci or genes under natural selection is important for both understanding the genetic basis of local adaptation and practical applications, and genome scans provide a powerful means for such identification purposes. In this study, genome-wide simple sequence repeats markers (SSRs) were used to scan for molecular footprints of divergent selection in Eucalyptus grandis, a hardwood species occurring widely in costal areas from 32° S to 16° S in Australia. High population diversity levels and weak population structure were detected with putatively neutral genomic SSRs. Using three FST outlier detection methods, a total of 58 outlying SSRs were collectively identified as loci under divergent selection against three non-correlated climatic variables, namely, mean annual temperature, isothermality and annual precipitation. Using a spatial analysis method, nine significant associations were revealed between FST outlier allele frequencies and climatic variables, involving seven alleles from five SSR loci. Of the five significant SSRs, two (EUCeSSR1044 and Embra394) contained alleles of putative genes with known functional importance for response to climatic factors. Our study presents critical information on the population diversity and structure of the important woody species E. grandis and provides insight into the adaptive responses of perennial trees to climatic variations. PMID:27748400
A privacy-preserving solution for compressed storage and selective retrieval of genomic data.
Huang, Zhicong; Ayday, Erman; Lin, Huang; Aiyar, Raeka S; Molyneaux, Adam; Xu, Zhenyu; Fellay, Jacques; Steinmetz, Lars M; Hubaux, Jean-Pierre
2016-12-01
In clinical genomics, the continuous evolution of bioinformatic algorithms and sequencing platforms makes it beneficial to store patients' complete aligned genomic data in addition to variant calls relative to a reference sequence. Due to the large size of human genome sequence data files (varying from 30 GB to 200 GB depending on coverage), two major challenges facing genomics laboratories are the costs of storage and the efficiency of the initial data processing. In addition, privacy of genomic data is becoming an increasingly serious concern, yet no standard data storage solutions exist that enable compression, encryption, and selective retrieval. Here we present a privacy-preserving solution named SECRAM (Selective retrieval on Encrypted and Compressed Reference-oriented Alignment Map) for the secure storage of compressed aligned genomic data. Our solution enables selective retrieval of encrypted data and improves the efficiency of downstream analysis (e.g., variant calling). Compared with BAM, the de facto standard for storing aligned genomic data, SECRAM uses 18% less storage. Compared with CRAM, one of the most compressed nonencrypted formats (using 34% less storage than BAM), SECRAM maintains efficient compression and downstream data processing, while allowing for unprecedented levels of security in genomic data storage. Compared with previous work, the distinguishing features of SECRAM are that (1) it is position-based instead of read-based, and (2) it allows random querying of a subregion from a BAM-like file in an encrypted form. Our method thus offers a space-saving, privacy-preserving, and effective solution for the storage of clinical genomic data. © 2016 Huang et al.; Published by Cold Spring Harbor Laboratory Press.
A privacy-preserving solution for compressed storage and selective retrieval of genomic data
Huang, Zhicong; Ayday, Erman; Lin, Huang; Aiyar, Raeka S.; Molyneaux, Adam; Xu, Zhenyu; Hubaux, Jean-Pierre
2016-01-01
In clinical genomics, the continuous evolution of bioinformatic algorithms and sequencing platforms makes it beneficial to store patients’ complete aligned genomic data in addition to variant calls relative to a reference sequence. Due to the large size of human genome sequence data files (varying from 30 GB to 200 GB depending on coverage), two major challenges facing genomics laboratories are the costs of storage and the efficiency of the initial data processing. In addition, privacy of genomic data is becoming an increasingly serious concern, yet no standard data storage solutions exist that enable compression, encryption, and selective retrieval. Here we present a privacy-preserving solution named SECRAM (Selective retrieval on Encrypted and Compressed Reference-oriented Alignment Map) for the secure storage of compressed aligned genomic data. Our solution enables selective retrieval of encrypted data and improves the efficiency of downstream analysis (e.g., variant calling). Compared with BAM, the de facto standard for storing aligned genomic data, SECRAM uses 18% less storage. Compared with CRAM, one of the most compressed nonencrypted formats (using 34% less storage than BAM), SECRAM maintains efficient compression and downstream data processing, while allowing for unprecedented levels of security in genomic data storage. Compared with previous work, the distinguishing features of SECRAM are that (1) it is position-based instead of read-based, and (2) it allows random querying of a subregion from a BAM-like file in an encrypted form. Our method thus offers a space-saving, privacy-preserving, and effective solution for the storage of clinical genomic data. PMID:27789525
Heslot, Nicolas; Akdemir, Deniz; Sorrells, Mark E; Jannink, Jean-Luc
2014-02-01
Development of models to predict genotype by environment interactions, in unobserved environments, using environmental covariates, a crop model and genomic selection. Application to a large winter wheat dataset. Genotype by environment interaction (G*E) is one of the key issues when analyzing phenotypes. The use of environment data to model G*E has long been a subject of interest but is limited by the same problems as those addressed by genomic selection methods: a large number of correlated predictors each explaining a small amount of the total variance. In addition, non-linear responses of genotypes to stresses are expected to further complicate the analysis. Using a crop model to derive stress covariates from daily weather data for predicted crop development stages, we propose an extension of the factorial regression model to genomic selection. This model is further extended to the marker level, enabling the modeling of quantitative trait loci (QTL) by environment interaction (Q*E), on a genome-wide scale. A newly developed ensemble method, soft rule fit, was used to improve this model and capture non-linear responses of QTL to stresses. The method is tested using a large winter wheat dataset, representative of the type of data available in a large-scale commercial breeding program. Accuracy in predicting genotype performance in unobserved environments for which weather data were available increased by 11.1% on average and the variability in prediction accuracy decreased by 10.8%. By leveraging agronomic knowledge and the large historical datasets generated by breeding programs, this new model provides insight into the genetic architecture of genotype by environment interactions and could predict genotype performance based on past and future weather scenarios.
Reitzel, A M; Herrera, S; Layden, M J; Martindale, M Q; Shank, T M
2013-06-01
Characterization of large numbers of single-nucleotide polymorphisms (SNPs) throughout a genome has the power to refine the understanding of population demographic history and to identify genomic regions under selection in natural populations. To this end, population genomic approaches that harness the power of next-generation sequencing to understand the ecology and evolution of marine invertebrates represent a boon to test long-standing questions in marine biology and conservation. We employed restriction-site-associated DNA sequencing (RAD-seq) to identify SNPs in natural populations of the sea anemone Nematostella vectensis, an emerging cnidarian model with a broad geographic range in estuarine habitats in North and South America, and portions of England. We identified hundreds of SNP-containing tags in thousands of RAD loci from 30 barcoded individuals inhabiting four locations from Nova Scotia to South Carolina. Population genomic analyses using high-confidence SNPs resulted in a highly-resolved phylogeography, a result not achieved in previous studies using traditional markers. Plots of locus-specific FST against heterozygosity suggest that a majority of polymorphic sites are neutral, with a smaller proportion suggesting evidence for balancing selection. Loci inferred to be under balancing selection were mapped to the genome, where 90% were located in gene bodies, indicating potential targets of selection. The results from analyses with and without a reference genome supported similar conclusions, further highlighting RAD-seq as a method that can be efficiently applied to species lacking existing genomic resources. We discuss the utility of RAD-seq approaches in burgeoning Nematostella research as well as in other cnidarian species, particularly corals and jellyfishes, to determine phylogeographic relationships of populations and identify regions of the genome undergoing selection. © 2013 John Wiley & Sons Ltd.
Reitzel, A.M.; Herrera, S.; Layden, M.J.; Martindale, M.Q.; Shank, T.M.
2013-01-01
Characterization of large numbers of single nucleotide polymorphisms (SNPs) throughout a genome has the power to refine the understanding of population demographic history and to identify genomic regions under selection in natural populations. To this end, population genomic approaches that harness the power of next-generation sequencing to understand the ecology and evolution of marine invertebrates represent a boon to test long-standing questions in marine biology and conservation. We employed restriction-site-associated DNA sequencing (RAD-seq) to identify SNPs in natural populations of the sea anemone Nematostella vectensis, an emerging cnidarian model with a broad geographic range in estuarine habitats in North and South America, and portions of England. We identified hundreds of SNP-containing tags in thousands of RAD loci from 30 barcoded individuals inhabiting four locations from Nova Scotia to South Carolina. Population genomic analyses using high-confidence SNPs resulted in a highly-resolved phylogeography, a result not achieved in previous studies using traditional markers. Plots of locus-specific FST against heterozygosity suggest that a majority of polymorphic sites are neutral, with a smaller proportion suggesting evidence for balancing selection. Loci inferred to be under balancing selection were mapped to the genome, where 90% were located in gene bodies, indicating potential targets of selection. Results from analyses with and without a reference genome supported similar conclusions, further supporting RAD-seq as a method that can be efficiently applied to species lacking existing genomic resources. We discuss the utility of RAD-seq approaches in burgeoning Nematostella research as well as in other cnidarian species, particularly corals, to determine phylogeographic relationships of populations and identify regions of the genome undergoing selection. PMID:23473066
Selective Amplification of the Genome Surrounding Key Placental Genes in Trophoblast Giant Cells.
Hannibal, Roberta L; Baker, Julie C
2016-01-25
While most cells maintain a diploid state, polyploid cells exist in many organisms and are particularly prevalent within the mammalian placenta [1], where they can generate more than 900 copies of the genome [2]. Polyploidy is thought to be an efficient method of increasing the content of the genome by avoiding the costly and slow process of cytokinesis [1, 3, 4]. Polyploidy can also affect gene regulation by amplifying a subset of genomic regions required for specific cellular function [1, 3, 4]. This mechanism is found in the fruit fly Drosophila melanogaster, where polyploid ovarian follicle cells amplify genomic regions containing chorion genes, which facilitate secretion of eggshell proteins [5]. Here, we report that genomic amplification also occurs in mammals at selective regions of the genome in parietal trophoblast giant cells (p-TGCs) of the mouse placenta. Using whole-genome sequencing (WGS) and digital droplet PCR (ddPCR) of mouse p-TGCs, we identified five amplified regions, each containing a gene family known to be involved in mammalian placentation: the prolactins (two clusters), serpins, cathepsins, and the natural killer (NK)/C-type lectin (CLEC) complex [6-12]. We report here the first description of amplification at selective genomic regions in mammals and present evidence that this is an important mode of genome regulation in placental TGCs. Copyright © 2016 Elsevier Ltd. All rights reserved.
USDA-ARS?s Scientific Manuscript database
Breeding and selection for the traits with polygenic inheritance is a challenging task that can be done by phenotypic selection, by marker-assisted selection or by genome wide selection. We tested predictive ability of four selection models in a biparental population genotyped with 95 SNP markers an...
A Ranking Approach to Genomic Selection.
Blondel, Mathieu; Onogi, Akio; Iwata, Hiroyoshi; Ueda, Naonori
2015-01-01
Genomic selection (GS) is a recent selective breeding method which uses predictive models based on whole-genome molecular markers. Until now, existing studies formulated GS as the problem of modeling an individual's breeding value for a particular trait of interest, i.e., as a regression problem. To assess predictive accuracy of the model, the Pearson correlation between observed and predicted trait values was used. In this paper, we propose to formulate GS as the problem of ranking individuals according to their breeding value. Our proposed framework allows us to employ machine learning methods for ranking which had previously not been considered in the GS literature. To assess ranking accuracy of a model, we introduce a new measure originating from the information retrieval literature called normalized discounted cumulative gain (NDCG). NDCG rewards more strongly models which assign a high rank to individuals with high breeding value. Therefore, NDCG reflects a prerequisite objective in selective breeding: accurate selection of individuals with high breeding value. We conducted a comparison of 10 existing regression methods and 3 new ranking methods on 6 datasets, consisting of 4 plant species and 25 traits. Our experimental results suggest that tree-based ensemble methods including McRank, Random Forests and Gradient Boosting Regression Trees achieve excellent ranking accuracy. RKHS regression and RankSVM also achieve good accuracy when used with an RBF kernel. Traditional regression methods such as Bayesian lasso, wBSR and BayesC were found less suitable for ranking. Pearson correlation was found to correlate poorly with NDCG. Our study suggests two important messages. First, ranking methods are a promising research direction in GS. Second, NDCG can be a useful evaluation measure for GS.
Mapping genomic features to functional traits through microbial whole genome sequences.
Zhang, Wei; Zeng, Erliang; Liu, Dan; Jones, Stuart E; Emrich, Scott
2014-01-01
Recently, the utility of trait-based approaches for microbial communities has been identified. Increasing availability of whole genome sequences provide the opportunity to explore the genetic foundations of a variety of functional traits. We proposed a machine learning framework to quantitatively link the genomic features with functional traits. Genes from bacteria genomes belonging to different functional traits were grouped to Cluster of Orthologs (COGs), and were used as features. Then, TF-IDF technique from the text mining domain was applied to transform the data to accommodate the abundance and importance of each COG. After TF-IDF processing, COGs were ranked using feature selection methods to identify their relevance to the functional trait of interest. Extensive experimental results demonstrated that functional trait related genes can be detected using our method. Further, the method has the potential to provide novel biological insights.
iPat: intelligent prediction and association tool for genomic research.
Chen, Chunpeng James; Zhang, Zhiwu
2018-06-01
The ultimate goal of genomic research is to effectively predict phenotypes from genotypes so that medical management can improve human health and molecular breeding can increase agricultural production. Genomic prediction or selection (GS) plays a complementary role to genome-wide association studies (GWAS), which is the primary method to identify genes underlying phenotypes. Unfortunately, most computing tools cannot perform data analyses for both GWAS and GS. Furthermore, the majority of these tools are executed through a command-line interface (CLI), which requires programming skills. Non-programmers struggle to use them efficiently because of the steep learning curves and zero tolerance for data formats and mistakes when inputting keywords and parameters. To address these problems, this study developed a software package, named the Intelligent Prediction and Association Tool (iPat), with a user-friendly graphical user interface. With iPat, GWAS or GS can be performed using a pointing device to simply drag and/or click on graphical elements to specify input data files, choose input parameters and select analytical models. Models available to users include those implemented in third party CLI packages such as GAPIT, PLINK, FarmCPU, BLINK, rrBLUP and BGLR. Users can choose any data format and conduct analyses with any of these packages. File conversions are automatically conducted for specified input data and selected packages. A GWAS-assisted genomic prediction method was implemented to perform genomic prediction using any GWAS method such as FarmCPU. iPat was written in Java for adaptation to multiple operating systems including Windows, Mac and Linux. The iPat executable file, user manual, tutorials and example datasets are freely available at http://zzlab.net/iPat. zhiwu.zhang@wsu.edu.
Introduction of structural affinity handles as a tool in selective nucleic acid separations
NASA Technical Reports Server (NTRS)
Willson, III, Richard Coale (Inventor); Cano, Luis Antonio (Inventor)
2011-01-01
The method is used for separating nucleic acids and other similar constructs. It involves selective introduction, enhancement, or stabilization of affinity handles such as single-strandedness in the undesired (or desired) nucleic acids as compared to the usual structure (e.g., double-strandedness) of the desired (or undesired) nucleic acids. The undesired (or desired) nucleic acids are separated from the desired (or undesired) nucleic acids due to capture by methods including but not limited to immobilized metal affinity chromatography, immobilized single-stranded DNA binding (SSB) protein, and immobilized oligonucleotides. The invention is useful to: remove contaminating genomic DNA from plasmid DNA; remove genomic DNA from plasmids, BACs, and similar constructs; selectively separate oligonucleotides and similar DNA fragments from their partner strands; purification of aptamers, (deoxy)-ribozymes and other highly structured nucleic acids; Separation of restriction fragments without using agarose gels; manufacture recombinant Taq polymerase or similar products that are sensitive to host genomic DNA contamination; and other applications.
Novel genetic tools for studying food-borne Salmonella.
Andrews-Polymenis, Helene L; Santiviago, Carlos A; McClelland, Michael
2009-04-01
Nontyphoidal Salmonellae are highly prevalent food-borne pathogens. High-throughput sequencing of Salmonella genomes is expanding our knowledge of the evolution of serovars and epidemic isolates. Genome sequences have also allowed the creation of complete microarrays. Microarrays have improved the throughput of in vivo expression technology (IVET) used to uncover promoters active during infection. In another method, signature tagged mutagenesis (STM), pools of mutants are subjected to selection. Changes in the population are monitored on a microarray, revealing genes under selection. Complete genome sequences permit the construction of pools of targeted in-frame deletions that have improved STM by minimizing the number of clones and the polarity of each mutant. Together, genome sequences and the continuing development of new tools for functional genomics will drive a revolution in the understanding of Salmonellae in many different niches that are critical for food safety.
Zhang, Cheng; Ni, Pan; Ahmad, Hafiz Ishfaq; Gemingguli, M; Baizilaitibei, A; Gulibaheti, D; Fang, Yaping; Wang, Haiyang; Asif, Akhtar Rasool; Xiao, Changyi; Chen, Jianhai; Ma, Yunlong; Liu, Xiangdong; Du, Xiaoyong; Zhao, Shuhong
2018-01-01
Animal domestication gives rise to gradual changes at the genomic level through selection in populations. Selective sweeps have been traced in the genomes of many animal species, including humans, cattle, and dogs. However, little is known regarding positional candidate genes and genomic regions that exhibit signatures of selection in domestic horses. In addition, an understanding of the genetic processes underlying horse domestication, especially the origin of Chinese native populations, is still lacking. In our study, we generated whole genome sequences from 4 Chinese native horses and combined them with 48 publicly available full genome sequences, from which 15 341 213 high-quality unique single-nucleotide polymorphism variants were identified. Kazakh and Lichuan horses are 2 typical Asian native breeds that were formed in Kazakh or Northwest China and South China, respectively. We detected 1390 loss-of-function (LoF) variants in protein-coding genes, and gene ontology (GO) enrichment analysis revealed that some LoF-affected genes were overrepresented in GO terms related to the immune response. Bayesian clustering, distance analysis, and principal component analysis demonstrated that the population structure of these breeds largely reflected weak geographic patterns. Kazakh and Lichuan horses were assigned to the same lineage with other Asian native breeds, in agreement with previous studies on the genetic origin of Chinese domestic horses. We applied the composite likelihood ratio method to scan for genomic regions showing signals of recent selection in the horse genome. A total of 1052 genomic windows of 10 kB, corresponding to 933 distinct core regions, significantly exceeded neutral simulations. The GO enrichment analysis revealed that the genes under selective sweeps were overrepresented with GO terms, including “negative regulation of canonical Wnt signaling pathway,” “muscle contraction,” and “axon guidance.” Frequent exercise training in domestic horses may have resulted in changes in the expression of genes related to metabolism, muscle structure, and the nervous system.
Lado, Bettina; Matus, Ivan; Rodríguez, Alejandra; Inostroza, Luis; Poland, Jesse; Belzile, François; del Pozo, Alejandro; Quincke, Martín; Castro, Marina; von Zitzewitz, Jarislav
2013-12-09
In crop breeding, the interest of predicting the performance of candidate cultivars in the field has increased due to recent advances in molecular breeding technologies. However, the complexity of the wheat genome presents some challenges for applying new technologies in molecular marker identification with next-generation sequencing. We applied genotyping-by-sequencing, a recently developed method to identify single-nucleotide polymorphisms, in the genomes of 384 wheat (Triticum aestivum) genotypes that were field tested under three different water regimes in Mediterranean climatic conditions: rain-fed only, mild water stress, and fully irrigated. We identified 102,324 single-nucleotide polymorphisms in these genotypes, and the phenotypic data were used to train and test genomic selection models intended to predict yield, thousand-kernel weight, number of kernels per spike, and heading date. Phenotypic data showed marked spatial variation. Therefore, different models were tested to correct the trends observed in the field. A mixed-model using moving-means as a covariate was found to best fit the data. When we applied the genomic selection models, the accuracy of predicted traits increased with spatial adjustment. Multiple genomic selection models were tested, and a Gaussian kernel model was determined to give the highest accuracy. The best predictions between environments were obtained when data from different years were used to train the model. Our results confirm that genotyping-by-sequencing is an effective tool to obtain genome-wide information for crops with complex genomes, that these data are efficient for predicting traits, and that correction of spatial variation is a crucial ingredient to increase prediction accuracy in genomic selection models.
Ferrarini, Alberto; Forcato, Claudio; Buson, Genny; Tononi, Paola; Del Monaco, Valentina; Terracciano, Mario; Bolognesi, Chiara; Fontana, Francesca; Medoro, Gianni; Neves, Rui; Möhlendick, Birte; Rihawi, Karim; Ardizzoni, Andrea; Sumanasuriya, Semini; Flohr, Penny; Lambros, Maryou; de Bono, Johann; Stoecklein, Nikolas H; Manaresi, Nicolò
2018-01-01
Chromosomal instability and associated chromosomal aberrations are hallmarks of cancer and play a critical role in disease progression and development of resistance to drugs. Single-cell genome analysis has gained interest in latest years as a source of biomarkers for targeted-therapy selection and drug resistance, and several methods have been developed to amplify the genomic DNA and to produce libraries suitable for Whole Genome Sequencing (WGS). However, most protocols require several enzymatic and cleanup steps, thus increasing the complexity and length of protocols, while robustness and speed are key factors for clinical applications. To tackle this issue, we developed a single-tube, single-step, streamlined protocol, exploiting ligation mediated PCR (LM-PCR) Whole Genome Amplification (WGA) method, for low-pass genome sequencing with the Ion Torrent™ platform and copy number alterations (CNAs) calling from single cells. The method was evaluated on single cells isolated from 6 aberrant cell lines of the NCI-H series. In addition, to demonstrate the feasibility of the workflow on clinical samples, we analyzed single circulating tumor cells (CTCs) and white blood cells (WBCs) isolated from the blood of patients affected by prostate cancer or lung adenocarcinoma. The results obtained show that the developed workflow generates data accurately representing whole genome absolute copy number profiles of single cell and allows alterations calling at resolutions down to 100 Kbp with as few as 200,000 reads. The presented data demonstrate the feasibility of the Ampli1™ WGA-based low-pass workflow for detection of CNAs in single tumor cells which would be of particular interest for genome-driven targeted therapy selection and for monitoring of disease progression.
Evans, Nicholas G; Moreno, Jonathan D
2015-02-01
A recent article by Maxwell J. Mehlman and Tracy Yeheng Li, in the Journal of Law and the Biosciences , sought to examine the ethical, legal, social, and policy issues associated with the use of genetic screening and germ-line therapies ('genomic technologies') by the US Military. In this commentary, we will elaborate several related matters: the relationship between genetic and non-genetic screening methods, the history of selection processes and force strength, and the consequences and ethics of, as Mehlman and Li suggest, engineering enhanced soldiers. We contend, first, that the strengths of genomic testing as a method of determining enrollment in the armed forces has limited appeal, given the state of current selection methods in the US armed forces. Second, that the vagaries of genetic selection, much like other forms of selection that do not bear causally or reliably on soldier performance (such as race, gender, and sexuality), pose a systematic threat to force strength by limiting the (valuable) diversity of combat units. Third, that the idea of enhancing warfighters through germ-line interventions poses serious ethical issues in terms of the control and ownership of 'enhancements' when members separate from service.
Using Partial Genomic Fosmid Libraries for Sequencing CompleteOrganellar Genomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.
2005-08-26
Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. Amore » minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.« less
Azevedo, C F; Nascimento, M; Silva, F F; Resende, M D V; Lopes, P S; Guimarães, S E F; Glória, L S
2015-10-09
A significant contribution of molecular genetics is the direct use of DNA information to identify genetically superior individuals. With this approach, genome-wide selection (GWS) can be used for this purpose. GWS consists of analyzing a large number of single nucleotide polymorphism markers widely distributed in the genome; however, because the number of markers is much larger than the number of genotyped individuals, and such markers are highly correlated, special statistical methods are widely required. Among these methods, independent component regression, principal component regression, partial least squares, and partial principal components stand out. Thus, the aim of this study was to propose an application of the methods of dimensionality reduction to GWS of carcass traits in an F2 (Piau x commercial line) pig population. The results show similarities between the principal and the independent component methods and provided the most accurate genomic breeding estimates for most carcass traits in pigs.
Sorbolini, Silvia; Marras, Gabriele; Gaspa, Giustino; Dimauro, Corrado; Cellesi, Massimo; Valentini, Alessio; Macciotta, Nicolò Pp
2015-06-23
Domestication and selection are processes that alter the pattern of within- and between-population genetic variability. They can be investigated at the genomic level by tracing the so-called selection signatures. Recently, sequence polymorphisms at the genome-wide level have been investigated in a wide range of animals. A common approach to detect selection signatures is to compare breeds that have been selected for different breeding goals (i.e. dairy and beef cattle). However, genetic variations in different breeds with similar production aptitudes and similar phenotypes can be related to differences in their selection history. In this study, we investigated selection signatures between two Italian beef cattle breeds, Piemontese and Marchigiana, using genotyping data that was obtained with the Illumina BovineSNP50 BeadChip. The comparison was based on the fixation index (Fst), combined with a locally weighted scatterplot smoothing (LOWESS) regression and a control chart approach. In addition, analyses of Fst were carried out to confirm candidate genes. In particular, data were processed using the varLD method, which compares the regional variation of linkage disequilibrium between populations. Genome scans confirmed the presence of selective sweeps in the genomic regions that harbour candidate genes that are known to affect productive traits in cattle such as DGAT1, ABCG2, CAPN3, MSTN and FTO. In addition, several new putative candidate genes (for example ALAS1, ABCB8, ACADS and SOD1) were detected. This study provided evidence on the different selection histories of two cattle breeds and the usefulness of genomic scans to detect selective sweeps even in cattle breeds that are bred for similar production aptitudes.
A probabilistic method for testing and estimating selection differences between populations
He, Yungang; Wang, Minxian; Huang, Xin; Li, Ran; Xu, Hongyang; Xu, Shuhua; Jin, Li
2015-01-01
Human populations around the world encounter various environmental challenges and, consequently, develop genetic adaptations to different selection forces. Identifying the differences in natural selection between populations is critical for understanding the roles of specific genetic variants in evolutionary adaptation. Although numerous methods have been developed to detect genetic loci under recent directional selection, a probabilistic solution for testing and quantifying selection differences between populations is lacking. Here we report the development of a probabilistic method for testing and estimating selection differences between populations. By use of a probabilistic model of genetic drift and selection, we showed that logarithm odds ratios of allele frequencies provide estimates of the differences in selection coefficients between populations. The estimates approximate a normal distribution, and variance can be estimated using genome-wide variants. This allows us to quantify differences in selection coefficients and to determine the confidence intervals of the estimate. Our work also revealed the link between genetic association testing and hypothesis testing of selection differences. It therefore supplies a solution for hypothesis testing of selection differences. This method was applied to a genome-wide data analysis of Han and Tibetan populations. The results confirmed that both the EPAS1 and EGLN1 genes are under statistically different selection in Han and Tibetan populations. We further estimated differences in the selection coefficients for genetic variants involved in melanin formation and determined their confidence intervals between continental population groups. Application of the method to empirical data demonstrated the outstanding capability of this novel approach for testing and quantifying differences in natural selection. PMID:26463656
Island-Model Genomic Selection for Long-Term Genetic Improvement of Autogamous Crops.
Yabe, Shiori; Yamasaki, Masanori; Ebana, Kaworu; Hayashi, Takeshi; Iwata, Hiroyoshi
2016-01-01
Acceleration of genetic improvement of autogamous crops such as wheat and rice is necessary to increase cereal production in response to the global food crisis. Population and pedigree methods of breeding, which are based on inbred line selection, are used commonly in the genetic improvement of autogamous crops. These methods, however, produce a few novel combinations of genes in a breeding population. Recurrent selection promotes recombination among genes and produces novel combinations of genes in a breeding population, but it requires inaccurate single-plant evaluation for selection. Genomic selection (GS), which can predict genetic potential of individuals based on their marker genotype, might have high reliability of single-plant evaluation and might be effective in recurrent selection. To evaluate the efficiency of recurrent selection with GS, we conducted simulations using real marker genotype data of rice cultivars. Additionally, we introduced the concept of an "island model" inspired by evolutionary algorithms that might be useful to maintain genetic variation through the breeding process. We conducted GS simulations using real marker genotype data of rice cultivars to evaluate the efficiency of recurrent selection and the island model in an autogamous species. Results demonstrated the importance of producing novel combinations of genes through recurrent selection. An initial population derived from admixture of multiple bi-parental crosses showed larger genetic gains than a population derived from a single bi-parental cross in whole cycles, suggesting the importance of genetic variation in an initial population. The island-model GS better maintained genetic improvement in later generations than the other GS methods, suggesting that the island-model GS can utilize genetic variation in breeding and can retain alleles with small effects in the breeding population. The island-model GS will become a new breeding method that enhances the potential of genomic selection in autogamous crops, especially bringing long-term improvement.
Island-Model Genomic Selection for Long-Term Genetic Improvement of Autogamous Crops
Yabe, Shiori; Yamasaki, Masanori; Ebana, Kaworu; Hayashi, Takeshi; Iwata, Hiroyoshi
2016-01-01
Acceleration of genetic improvement of autogamous crops such as wheat and rice is necessary to increase cereal production in response to the global food crisis. Population and pedigree methods of breeding, which are based on inbred line selection, are used commonly in the genetic improvement of autogamous crops. These methods, however, produce a few novel combinations of genes in a breeding population. Recurrent selection promotes recombination among genes and produces novel combinations of genes in a breeding population, but it requires inaccurate single-plant evaluation for selection. Genomic selection (GS), which can predict genetic potential of individuals based on their marker genotype, might have high reliability of single-plant evaluation and might be effective in recurrent selection. To evaluate the efficiency of recurrent selection with GS, we conducted simulations using real marker genotype data of rice cultivars. Additionally, we introduced the concept of an “island model” inspired by evolutionary algorithms that might be useful to maintain genetic variation through the breeding process. We conducted GS simulations using real marker genotype data of rice cultivars to evaluate the efficiency of recurrent selection and the island model in an autogamous species. Results demonstrated the importance of producing novel combinations of genes through recurrent selection. An initial population derived from admixture of multiple bi-parental crosses showed larger genetic gains than a population derived from a single bi-parental cross in whole cycles, suggesting the importance of genetic variation in an initial population. The island-model GS better maintained genetic improvement in later generations than the other GS methods, suggesting that the island-model GS can utilize genetic variation in breeding and can retain alleles with small effects in the breeding population. The island-model GS will become a new breeding method that enhances the potential of genomic selection in autogamous crops, especially bringing long-term improvement. PMID:27115872
Chávez-Galarza, Julio; Henriques, Dora; Johnston, J Spencer; Azevedo, João C; Patton, John C; Muñoz, Irene; De la Rúa, Pilar; Pinto, M Alice
2013-12-01
Understanding the genetic mechanisms of adaptive population divergence is one of the most fundamental endeavours in evolutionary biology and is becoming increasingly important as it will allow predictions about how organisms will respond to global environmental crisis. This is particularly important for the honey bee, a species of unquestionable ecological and economical importance that has been exposed to increasing human-mediated selection pressures. Here, we conducted a single nucleotide polymorphism (SNP)-based genome scan in honey bees collected across an environmental gradient in Iberia and used four FST -based outlier tests to identify genomic regions exhibiting signatures of selection. Additionally, we analysed associations between genetic and environmental data for the identification of factors that might be correlated or act as selective pressures. With these approaches, 4.4% (17 of 383) of outlier loci were cross-validated by four FST -based methods, and 8.9% (34 of 383) were cross-validated by at least three methods. Of the 34 outliers, 15 were found to be strongly associated with one or more environmental variables. Further support for selection, provided by functional genomic information, was particularly compelling for SNP outliers mapped to different genes putatively involved in the same function such as vision, xenobiotic detoxification and innate immune response. This study enabled a more rigorous consideration of selection as the underlying cause of diversity patterns in Iberian honey bees, representing an important first step towards the identification of polymorphisms implicated in local adaptation and possibly in response to recent human-mediated environmental changes. © 2013 John Wiley & Sons Ltd.
Cow genotyping strategies for genomic selection in a small dairy cattle population.
Jenko, J; Wiggans, G R; Cooper, T A; Eaglen, S A E; Luff, W G de L; Bichard, M; Pong-Wong, R; Woolliams, J A
2017-01-01
This study compares how different cow genotyping strategies increase the accuracy of genomic estimated breeding values (EBV) in dairy cattle breeds with low numbers. In these breeds, few sires have progeny records, and genotyping cows can improve the accuracy of genomic EBV. The Guernsey breed is a small dairy cattle breed with approximately 14,000 recorded individuals worldwide. Predictions of phenotypes of milk yield, fat yield, protein yield, and calving interval were made for Guernsey cows from England and Guernsey Island using genomic EBV, with training sets including 197 de-regressed proofs of genotyped bulls, with cows selected from among 1,440 genotyped cows using different genotyping strategies. Accuracies of predictions were tested using 10-fold cross-validation among the cows. Genomic EBV were predicted using 4 different methods: (1) pedigree BLUP, (2) genomic BLUP using only bulls, (3) univariate genomic BLUP using bulls and cows, and (4) bivariate genomic BLUP. Genotyping cows with phenotypes and using their data for the prediction of single nucleotide polymorphism effects increased the correlation between genomic EBV and phenotypes compared with using only bulls by 0.163±0.022 for milk yield, 0.111±0.021 for fat yield, and 0.113±0.018 for protein yield; a decrease of 0.014±0.010 for calving interval from a low base was the only exception. Genetic correlation between phenotypes from bulls and cows were approximately 0.6 for all yield traits and significantly different from 1. Only a very small change occurred in correlation between genomic EBV and phenotypes when using the bivariate model. It was always better to genotype all the cows, but when only half of the cows were genotyped, a divergent selection strategy was better compared with the random or directional selection approach. Divergent selection of 30% of the cows remained superior for the yield traits in 8 of 10 folds. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Palaiokostas, Christos; Cariou, Sophie; Bestin, Anastasia; Bruant, Jean-Sebastien; Haffray, Pierrick; Morin, Thierry; Cabon, Joëlle; Allal, François; Vandeputte, Marc; Houston, Ross D
2018-06-08
European sea bass (Dicentrarchus labrax) is one of the most important species for European aquaculture. Viral nervous necrosis (VNN), commonly caused by the redspotted grouper nervous necrosis virus (RGNNV), can result in high levels of morbidity and mortality, mainly during the larval and juvenile stages of cultured sea bass. In the absence of efficient therapeutic treatments, selective breeding for host resistance offers a promising strategy to control this disease. Our study aimed at investigating genetic resistance to VNN and genomic-based approaches to improve disease resistance by selective breeding. A population of 1538 sea bass juveniles from a factorial cross between 48 sires and 17 dams was challenged with RGNNV with mortalities and survivors being recorded and sampled for genotyping by the RAD sequencing approach. We used genome-wide genotype data from 9195 single nucleotide polymorphisms (SNPs) for downstream analysis. Estimates of heritability of survival on the underlying scale for the pedigree and genomic relationship matrices were 0.27 (HPD interval 95%: 0.14-0.40) and 0.43 (0.29-0.57), respectively. Classical genome-wide association analysis detected genome-wide significant quantitative trait loci (QTL) for resistance to VNN on chromosomes (unassigned scaffolds in the case of 'chromosome' 25) 3, 20 and 25 (P < 1e06). Weighted genomic best linear unbiased predictor provided additional support for the QTL on chromosome 3 and suggested that it explained 4% of the additive genetic variation. Genomic prediction approaches were tested to investigate the potential of using genome-wide SNP data to estimate breeding values for resistance to VNN and showed that genomic prediction resulted in a 13% increase in successful classification of resistant and susceptible animals compared to pedigree-based methods, with Bayes A and Bayes B giving the highest predictive ability. Genome-wide significant QTL were identified but each with relatively small effects on the trait. Tests of genomic prediction suggested that incorporating genome-wide SNP data is likely to result in higher accuracy of estimated breeding values for resistance to VNN. RAD sequencing is an effective method for generating such genome-wide SNPs, and our findings highlight the potential of genomic selection to breed farmed European sea bass with improved resistance to VNN.
Validating genomic reliabilities and gains from phenotypic updates
USDA-ARS?s Scientific Manuscript database
Reliability can be validated from the variance of the difference of earlier and later estimated breeding values as a fraction of the genetic variance. This new method avoids using squared correlations that can be biased downward by selection. Published genomic reliabilities of U.S. young bulls agree...
IDENTIFICATION OF CHICKEN-SPECIFIC FECAL MICROBIAL SEQUENCES USING A METAGENOMIC APPROACH
In this study, we applied a genome fragment enrichment (GFE) method to select for genomic regions that differ between different fecal metagenomes. Competitive DNA hybridizations were performed between chicken fecal DNA and pig fecal DNA (C-P) and between chicken fecal DNA and an ...
A Novel Yeast Genomics Method for Identifying New Breast Cancer Susceptibility
2005-05-01
selectable marker and tracing this marker through several passages in nonselective medium. The selectable marker will be the hygromycin phosphotransferase ... hygromycin and sensitivity to (32), thereby providing both positive and negative selectivity. The assay involved measurement of the frequency of gancyclovir
GWASinlps: Nonlocal prior based iterative SNP selection tool for genome-wide association studies.
Sanyal, Nilotpal; Lo, Min-Tzu; Kauppi, Karolina; Djurovic, Srdjan; Andreassen, Ole A; Johnson, Valen E; Chen, Chi-Hua
2018-06-19
Multiple marker analysis of the genome-wide association study (GWAS) data has gained ample attention in recent years. However, because of the ultra high-dimensionality of GWAS data, such analysis is challenging. Frequently used penalized regression methods often lead to large number of false positives, whereas Bayesian methods are computationally very expensive. Motivated to ameliorate these issues simultaneously, we consider the novel approach of using nonlocal priors in an iterative variable selection framework. We develop a variable selection method, named, iterative nonlocal prior based selection for GWAS, or GWASinlps, that combines, in an iterative variable selection framework, the computational efficiency of the screen-and-select approach based on some association learning and the parsimonious uncertainty quantification provided by the use of nonlocal priors. The hallmark of our method is the introduction of 'structured screen-and-select' strategy, that considers hierarchical screening, which is not only based on response-predictor associations, but also based on response-response associations, and concatenates variable selection within that hierarchy. Extensive simulation studies with SNPs having realistic linkage disequilibrium structures demonstrate the advantages of our computationally efficient method compared to several frequentist and Bayesian variable selection methods, in terms of true positive rate, false discovery rate, mean squared error, and effect size estimation error. Further, we provide empirical power analysis useful for study design. Finally, a real GWAS data application was considered with human height as phenotype. An R-package for implementing the GWASinlps method is available at https://cran.r-project.org/web/packages/GWASinlps/index.html. Supplementary data are available at Bioinformatics online.
Kavakiotis, Ioannis; Samaras, Patroklos; Triantafyllidis, Alexandros; Vlahavas, Ioannis
2017-11-01
Single Nucleotide Polymorphism (SNPs) are, nowadays, becoming the marker of choice for biological analyses involving a wide range of applications with great medical, biological, economic and environmental interest. Classification tasks i.e. the assignment of individuals to groups of origin based on their (multi-locus) genotypes, are performed in many fields such as forensic investigations, discrimination between wild and/or farmed populations and others. Τhese tasks, should be performed with a small number of loci, for computational as well as biological reasons. Thus, feature selection should precede classification tasks, especially for Single Nucleotide Polymorphism (SNP) datasets, where the number of features can amount to hundreds of thousands or millions. In this paper, we present a novel data mining approach, called FIFS - Frequent Item Feature Selection, based on the use of frequent items for selection of the most informative markers from population genomic data. It is a modular method, consisting of two main components. The first one identifies the most frequent and unique genotypes for each sampled population. The second one selects the most appropriate among them, in order to create the informative SNP subsets to be returned. The proposed method (FIFS) was tested on a real dataset, which comprised of a comprehensive coverage of pig breed types present in Britain. This dataset consisted of 446 individuals divided in 14 sub-populations, genotyped at 59,436 SNPs. Our method outperforms the state-of-the-art and baseline methods in every case. More specifically, our method surpassed the assignment accuracy threshold of 95% needing only half the number of SNPs selected by other methods (FIFS: 28 SNPs, Delta: 70 SNPs Pairwise FST: 70 SNPs, In: 100 SNPs.) CONCLUSION: Our approach successfully deals with the problem of informative marker selection in high dimensional genomic datasets. It offers better results compared to existing approaches and can aid biologists in selecting the most informative markers with maximum discrimination power for optimization of cost-effective panels with applications related to e.g. species identification, wildlife management, and forensics. Copyright © 2017 Elsevier Ltd. All rights reserved.
An Efficient Method for Genomic DNA Extraction from Different Molluscs Species
Pereira, Jorge C.; Chaves, Raquel; Bastos, Estela; Leitão, Alexandra; Guedes-Pinto, Henrique
2011-01-01
The selection of a DNA extraction method is a critical step when subsequent analysis depends on the DNA quality and quantity. Unlike mammals, for which several capable DNA extraction methods have been developed, for molluscs the availability of optimized genomic DNA extraction protocols is clearly insufficient. Several aspects such as animal physiology, the type (e.g., adductor muscle or gills) or quantity of tissue, can explain the lack of efficiency (quality and yield) in molluscs genomic DNA extraction procedure. In an attempt to overcome these aspects, this work describes an efficient method for molluscs genomic DNA extraction that was tested in several species from different orders: Veneridae, Ostreidae, Anomiidae, Cardiidae (Bivalvia) and Muricidae (Gastropoda), with different weight sample tissues. The isolated DNA was of high molecular weight with high yield and purity, even with reduced quantities of tissue. Moreover, the genomic DNA isolated, demonstrated to be suitable for several downstream molecular techniques, such as PCR sequencing among others. PMID:22174651
Limited Evidence for Classic Selective Sweeps in African Populations
Granka, Julie M.; Henn, Brenna M.; Gignoux, Christopher R.; Kidd, Jeffrey M.; Bustamante, Carlos D.; Feldman, Marcus W.
2012-01-01
While hundreds of loci have been identified as reflecting strong-positive selection in human populations, connections between candidate loci and specific selective pressures often remain obscure. This study investigates broader patterns of selection in African populations, which are underrepresented despite their potential to offer key insights into human adaptation. We scan for hard selective sweeps using several haplotype and allele-frequency statistics with a data set of nearly 500,000 genome-wide single-nucleotide polymorphisms in 12 highly diverged African populations that span a range of environments and subsistence strategies. We find that positive selection does not appear to be a strong determinant of allele-frequency differentiation among these African populations. Haplotype statistics do identify putatively selected regions that are shared across African populations. However, as assessed by extensive simulations, patterns of haplotype sharing between African populations follow neutral expectations and suggest that tails of the empirical distributions contain false-positive signals. After highlighting several genomic regions where positive selection can be inferred with higher confidence, we use a novel method to identify biological functions enriched among populations’ empirical tail genomic windows, such as immune response in agricultural groups. In general, however, it seems that current methods for selection scans are poorly suited to populations that, like the African populations in this study, are affected by ascertainment bias and have low levels of linkage disequilibrium, possibly old selective sweeps, and potentially reduced phasing accuracy. Additionally, population history can confound the interpretation of selection statistics, suggesting that greater care is needed in attributing broad genetic patterns to human adaptation. PMID:22960214
Genomic adaptation of admixed dairy cattle in East Africa
Kim, Eui-Soo; Rothschild, Max F.
2014-01-01
Dairy cattle in East Africa imported from the U.S. and Europe have been adapted to new environments. In small local farms, cattle have generally been maintained by crossbreeding that could increase survivability under a severe environment. Eventually, genomic ancestry of a specific breed will be nearly fixed in genomic regions of local breeds or crossbreds when it is advantageous for survival or production in harsh environments. To examine this situation, 25 Friesians and 162 local cattle produced by crossbreeding of dairy breeds in Kenya were sampled and genotyped using 50K SNPs. Using principal component analysis (PCA), the admixed local cattle were found to consist of several imported breeds, including Guernsey, Norwegian Red, and Holstein. To infer the influence of parental breeds on genomic regions, local ancestry mapping was performed based on the similarity of haplotypes. As a consequence, it appears that no genomic region has been under the complete influence of a specific parental breed. Nonetheless, the ancestry of Holstein-Friesians was substantial in most genomic regions (>80%). Furthermore, we examined the frequency of the most common haplotypes from parental breeds that have changed substantially in Kenyan crossbreds during admixture. The frequency of these haplotypes from parental breeds, which were likely to be selected in temperate regions, has deviated considerably from expected frequency in 11 genomic regions. Additionally, extended haplotype homozygosity (EHH) based methods were applied to identify the regions responding to recent selection in crossbreds, called candidate regions, resulting in seven regions that appeared to be affected by Holstein-Friesians. However, some signatures of selection were less dependent on Holsteins-Friesians, suggesting evidence of adaptation in East Africa. The analysis of local ancestry is a useful approach to understand the detailed genomic structure and may reveal regions of the genome required for specialized adaptation when combined with methods for searching for the recent changes of haplotype frequency in an admixed population. PMID:25566325
2012-01-01
Background Efficient, robust, and accurate genotype imputation algorithms make large-scale application of genomic selection cost effective. An algorithm that imputes alleles or allele probabilities for all animals in the pedigree and for all genotyped single nucleotide polymorphisms (SNP) provides a framework to combine all pedigree, genomic, and phenotypic information into a single-stage genomic evaluation. Methods An algorithm was developed for imputation of genotypes in pedigreed populations that allows imputation for completely ungenotyped animals and for low-density genotyped animals, accommodates a wide variety of pedigree structures for genotyped animals, imputes unmapped SNP, and works for large datasets. The method involves simple phasing rules, long-range phasing and haplotype library imputation and segregation analysis. Results Imputation accuracy was high and computational cost was feasible for datasets with pedigrees of up to 25 000 animals. The resulting single-stage genomic evaluation increased the accuracy of estimated genomic breeding values compared to a scenario in which phenotypes on relatives that were not genotyped were ignored. Conclusions The developed imputation algorithm and software and the resulting single-stage genomic evaluation method provide powerful new ways to exploit imputation and to obtain more accurate genetic evaluations. PMID:22462519
Lado, Bettina; Matus, Ivan; Rodríguez, Alejandra; Inostroza, Luis; Poland, Jesse; Belzile, François; del Pozo, Alejandro; Quincke, Martín; Castro, Marina; von Zitzewitz, Jarislav
2013-01-01
In crop breeding, the interest of predicting the performance of candidate cultivars in the field has increased due to recent advances in molecular breeding technologies. However, the complexity of the wheat genome presents some challenges for applying new technologies in molecular marker identification with next-generation sequencing. We applied genotyping-by-sequencing, a recently developed method to identify single-nucleotide polymorphisms, in the genomes of 384 wheat (Triticum aestivum) genotypes that were field tested under three different water regimes in Mediterranean climatic conditions: rain-fed only, mild water stress, and fully irrigated. We identified 102,324 single-nucleotide polymorphisms in these genotypes, and the phenotypic data were used to train and test genomic selection models intended to predict yield, thousand-kernel weight, number of kernels per spike, and heading date. Phenotypic data showed marked spatial variation. Therefore, different models were tested to correct the trends observed in the field. A mixed-model using moving-means as a covariate was found to best fit the data. When we applied the genomic selection models, the accuracy of predicted traits increased with spatial adjustment. Multiple genomic selection models were tested, and a Gaussian kernel model was determined to give the highest accuracy. The best predictions between environments were obtained when data from different years were used to train the model. Our results confirm that genotyping-by-sequencing is an effective tool to obtain genome-wide information for crops with complex genomes, that these data are efficient for predicting traits, and that correction of spatial variation is a crucial ingredient to increase prediction accuracy in genomic selection models. PMID:24082033
Yu, Feiqiao Brian; Blainey, Paul C; Schulz, Frederik; Woyke, Tanja; Horowitz, Mark A; Quake, Stephen R
2017-07-05
Metagenomics and single-cell genomics have enabled genome discovery from unknown branches of life. However, extracting novel genomes from complex mixtures of metagenomic data can still be challenging and represents an ill-posed problem which is generally approached with ad hoc methods. Here we present a microfluidic-based mini-metagenomic method which offers a statistically rigorous approach to extract novel microbial genomes while preserving single-cell resolution. We used this approach to analyze two hot spring samples from Yellowstone National Park and extracted 29 new genomes, including three deeply branching lineages. The single-cell resolution enabled accurate quantification of genome function and abundance, down to 1% in relative abundance. Our analyses of genome level SNP distributions also revealed low to moderate environmental selection. The scale, resolution, and statistical power of microfluidic-based mini-metagenomics make it a powerful tool to dissect the genomic structure of microbial communities while effectively preserving the fundamental unit of biology, the single cell.
Repar, Jelena; Warnecke, Tobias
2017-01-01
Abstract Inversions are a major contributor to structural genome evolution in prokaryotes. Here, using a novel alignment-based method, we systematically compare 1,651 bacterial and 98 archaeal genomes to show that inversion landscapes are frequently biased toward (symmetric) inversions around the origin–terminus axis. However, symmetric inversion bias is not a universal feature of prokaryotic genome evolution but varies considerably across clades. At the extremes, inversion landscapes in Bacillus–Clostridium and Actinobacteria are dominated by symmetric inversions, while there is little or no systematic bias favoring symmetric rearrangements in archaea with a single origin of replication. Within clades, we find strong but clade-specific relationships between symmetric inversion bias and different features of adaptive genome architecture, including the distance of essential genes to the origin of replication and the preferential localization of genes on the leading strand. We suggest that heterogeneous selection pressures have converged to produce similar patterns of structural genome evolution across prokaryotes. PMID:28407093
Nurses' competence in genetics: An integrative review.
Wright, Helen; Zhao, Lin; Birks, Melanie; Mills, Jane
2018-01-29
The aim of this integrative review was to update a mixed method systematic review by Skirton, O'Connor, and Humphreys (2012) that reported on nurses' levels of competence in using genetics in clinical practice. Three electronic databases were searched using selected key words. Research studies published in English between January 2011 and September 2017 reporting levels of nurse competence in genetics or genomics were eligible for inclusion. The selected studies were subjected to thematic analysis. Three main themes were identified: (i) genomic knowledge and utilization, (ii) perceived relevance to practice, and (iii) genomic education. While the reviewed papers produced varied findings, many nurses were shown to have poor genomic knowledge and/or competency, and yet there was a consensus that most nurses believe genomics is important to their practice. The present review indicated that in the past 5 years nurses have made minimal progress toward achieving the core genomic competencies appropriate for clinical practice. © 2018 John Wiley & Sons Australia, Ltd.
Detection of selective sweeps in structured populations: a comparison of recent methods.
Vatsiou, Alexandra I; Bazin, Eric; Gaggiotti, Oscar E
2016-01-01
Identifying genomic regions targeted by positive selection has been a long-standing interest of evolutionary biologists. This objective was difficult to achieve until the recent emergence of next-generation sequencing, which is fostering the development of large-scale catalogues of genetic variation for increasing number of species. Several statistical methods have been recently developed to analyse these rich data sets, but there is still a poor understanding of the conditions under which these methods produce reliable results. This study aims at filling this gap by assessing the performance of genome-scan methods that consider explicitly the physical linkage among SNPs surrounding a selected variant. Our study compares the performance of seven recent methods for the detection of selective sweeps (iHS, nSL, EHHST, xp-EHH, XP-EHHST, XPCLR and hapFLK). We use an individual-based simulation approach to investigate the power and accuracy of these methods under a wide range of population models under both hard and soft sweeps. Our results indicate that XPCLR and hapFLK perform best and can detect soft sweeps under simple population structure scenarios if migration rate is low. All methods perform poorly with moderate-to-high migration rates, or with weak selection and very poorly under a hierarchical population structure. Finally, no single method is able to detect both starting and nearly completed selective sweeps. However, combining several methods (XPCLR or hapFLK with iHS or nSL) can greatly increase the power to pinpoint the selected region. © 2015 John Wiley & Sons Ltd.
SNP selection and classification of genome-wide SNP data using stratified sampling random forests.
Wu, Qingyao; Ye, Yunming; Liu, Yang; Ng, Michael K
2012-09-01
For high dimensional genome-wide association (GWA) case-control data of complex disease, there are usually a large portion of single-nucleotide polymorphisms (SNPs) that are irrelevant with the disease. A simple random sampling method in random forest using default mtry parameter to choose feature subspace, will select too many subspaces without informative SNPs. Exhaustive searching an optimal mtry is often required in order to include useful and relevant SNPs and get rid of vast of non-informative SNPs. However, it is too time-consuming and not favorable in GWA for high-dimensional data. The main aim of this paper is to propose a stratified sampling method for feature subspace selection to generate decision trees in a random forest for GWA high-dimensional data. Our idea is to design an equal-width discretization scheme for informativeness to divide SNPs into multiple groups. In feature subspace selection, we randomly select the same number of SNPs from each group and combine them to form a subspace to generate a decision tree. The advantage of this stratified sampling procedure can make sure each subspace contains enough useful SNPs, but can avoid a very high computational cost of exhaustive search of an optimal mtry, and maintain the randomness of a random forest. We employ two genome-wide SNP data sets (Parkinson case-control data comprised of 408 803 SNPs and Alzheimer case-control data comprised of 380 157 SNPs) to demonstrate that the proposed stratified sampling method is effective, and it can generate better random forest with higher accuracy and lower error bound than those by Breiman's random forest generation method. For Parkinson data, we also show some interesting genes identified by the method, which may be associated with neurological disorders for further biological investigations.
Genome-wide regression and prediction with the BGLR statistical package.
Pérez, Paulino; de los Campos, Gustavo
2014-10-01
Many modern genomic data analyses require implementing regressions where the number of parameters (p, e.g., the number of marker effects) exceeds sample size (n). Implementing these large-p-with-small-n regressions poses several statistical and computational challenges, some of which can be confronted using Bayesian methods. This approach allows integrating various parametric and nonparametric shrinkage and variable selection procedures in a unified and consistent manner. The BGLR R-package implements a large collection of Bayesian regression models, including parametric variable selection and shrinkage methods and semiparametric procedures (Bayesian reproducing kernel Hilbert spaces regressions, RKHS). The software was originally developed for genomic applications; however, the methods implemented are useful for many nongenomic applications as well. The response can be continuous (censored or not) or categorical (either binary or ordinal). The algorithm is based on a Gibbs sampler with scalar updates and the implementation takes advantage of efficient compiled C and Fortran routines. In this article we describe the methods implemented in BGLR, present examples of the use of the package, and discuss practical issues emerging in real-data analysis. Copyright © 2014 by the Genetics Society of America.
Manunza, A.; Cardoso, T. F.; Noce, A.; Martínez, A.; Pons, A.; Bermejo, L. A.; Landi, V.; Sànchez, A.; Jordana, J.; Delgado, J. V.; Adán, S.; Capote, J.; Vidal, O.; Ugarte, E.; Arranz, J. J.; Calvo, J. H.; Casellas, J.; Amills, M.
2016-01-01
The goals of the current work were to analyse the population structure of 11 Spanish ovine breeds and to detect genomic regions that may have been targeted by selection. A total of 141 individuals were genotyped with the Infinium 50 K Ovine SNP BeadChip (Illumina). We combined this dataset with Spanish ovine data previously reported by the International Sheep Genomics Consortium (N = 229). Multidimensional scaling and Admixture analyses revealed that Canaria de Pelo and, to a lesser extent, Roja Mallorquina, Latxa and Churra are clearly differentiated populations, while the remaining seven breeds (Ojalada, Castellana, Gallega, Xisqueta, Ripollesa, Rasa Aragonesa and Segureña) share a similar genetic background. Performance of a genome scan with BayeScan and hapFLK allowed us identifying three genomic regions that are consistently detected with both methods i.e. Oar3 (150–154 Mb), Oar6 (4–49 Mb) and Oar13 (68–74 Mb). Neighbor-joining trees based on polymorphisms mapping to these three selective sweeps did not show a clustering of breeds according to their predominant productive specialization (except the local tree based on Oar13 SNPs). Such cryptic signatures of selection have been also found in the bovine genome, posing a considerable challenge to understand the biological consequences of artificial selection. PMID:27272025
Manunza, A; Cardoso, T F; Noce, A; Martínez, A; Pons, A; Bermejo, L A; Landi, V; Sànchez, A; Jordana, J; Delgado, J V; Adán, S; Capote, J; Vidal, O; Ugarte, E; Arranz, J J; Calvo, J H; Casellas, J; Amills, M
2016-06-07
The goals of the current work were to analyse the population structure of 11 Spanish ovine breeds and to detect genomic regions that may have been targeted by selection. A total of 141 individuals were genotyped with the Infinium 50 K Ovine SNP BeadChip (Illumina). We combined this dataset with Spanish ovine data previously reported by the International Sheep Genomics Consortium (N = 229). Multidimensional scaling and Admixture analyses revealed that Canaria de Pelo and, to a lesser extent, Roja Mallorquina, Latxa and Churra are clearly differentiated populations, while the remaining seven breeds (Ojalada, Castellana, Gallega, Xisqueta, Ripollesa, Rasa Aragonesa and Segureña) share a similar genetic background. Performance of a genome scan with BayeScan and hapFLK allowed us identifying three genomic regions that are consistently detected with both methods i.e. Oar3 (150-154 Mb), Oar6 (4-49 Mb) and Oar13 (68-74 Mb). Neighbor-joining trees based on polymorphisms mapping to these three selective sweeps did not show a clustering of breeds according to their predominant productive specialization (except the local tree based on Oar13 SNPs). Such cryptic signatures of selection have been also found in the bovine genome, posing a considerable challenge to understand the biological consequences of artificial selection.
Genomic selection of agronomic traits in hybrid rice using an NCII population.
Xu, Yang; Wang, Xin; Ding, Xiaowen; Zheng, Xingfei; Yang, Zefeng; Xu, Chenwu; Hu, Zhongli
2018-05-10
Hybrid breeding is an effective tool to improve yield in rice, while parental selection remains the key and difficult issue. Genomic selection (GS) provides opportunities to predict the performance of hybrids before phenotypes are measured. However, the application of GS is influenced by several genetic and statistical factors. Here, we used a rice North Carolina II (NC II) population constructed by crossing 115 rice varieties with five male sterile lines as a model to evaluate effects of statistical methods, heritability, marker density and training population size on prediction for hybrid performance. From the comparison of six GS methods, we found that predictabilities for different methods are significantly different, with genomic best linear unbiased prediction (GBLUP) and least absolute shrinkage and selection operation (LASSO) being the best, support vector machine (SVM) and partial least square (PLS) being the worst. The marker density has lower influence on predicting rice hybrid performance compared with the size of training population. Additionally, we used the 575 (115 × 5) hybrid rice as a training population to predict eight agronomic traits of all hybrids derived from 120 (115 + 5) rice varieties each mating with 3023 rice accessions from the 3000 rice genomes project (3 K RGP). Of the 362,760 potential hybrids, selection of the top 100 predicted hybrids would lead to 35.5%, 23.25%, 30.21%, 42.87%, 61.80%, 75.83%, 19.24% and 36.12% increase in grain yield per plant, thousand-grain weight, panicle number per plant, plant height, secondary branch number, grain number per panicle, panicle length and primary branch number, respectively. This study evaluated the factors affecting predictabilities for hybrid prediction and demonstrated the implementation of GS to predict hybrid performance of rice. Our results suggest that GS could enable the rapid selection of superior hybrids, thus increasing the efficiency of rice hybrid breeding.
Nguyen, Quan H; Tellam, Ross L; Naval-Sanchez, Marina; Porto-Neto, Laercio R; Barendse, William; Reverter, Antonio; Hayes, Benjamin; Kijas, James; Dalrymple, Brian P
2018-01-01
Abstract Genome sequences for hundreds of mammalian species are available, but an understanding of their genomic regulatory regions, which control gene expression, is only beginning. A comprehensive prediction of potential active regulatory regions is necessary to functionally study the roles of the majority of genomic variants in evolution, domestication, and animal production. We developed a computational method to predict regulatory DNA sequences (promoters, enhancers, and transcription factor binding sites) in production animals (cows and pigs) and extended its broad applicability to other mammals. The method utilizes human regulatory features identified from thousands of tissues, cell lines, and experimental assays to find homologous regions that are conserved in sequences and genome organization and are enriched for regulatory elements in the genome sequences of other mammalian species. Importantly, we developed a filtering strategy, including a machine learning classification method, to utilize a very small number of species-specific experimental datasets available to select for the likely active regulatory regions. The method finds the optimal combination of sensitivity and accuracy to unbiasedly predict regulatory regions in mammalian species. Furthermore, we demonstrated the utility of the predicted regulatory datasets in cattle for prioritizing variants associated with multiple production and climate change adaptation traits and identifying potential genome editing targets. PMID:29618048
Nguyen, Quan H; Tellam, Ross L; Naval-Sanchez, Marina; Porto-Neto, Laercio R; Barendse, William; Reverter, Antonio; Hayes, Benjamin; Kijas, James; Dalrymple, Brian P
2018-03-01
Genome sequences for hundreds of mammalian species are available, but an understanding of their genomic regulatory regions, which control gene expression, is only beginning. A comprehensive prediction of potential active regulatory regions is necessary to functionally study the roles of the majority of genomic variants in evolution, domestication, and animal production. We developed a computational method to predict regulatory DNA sequences (promoters, enhancers, and transcription factor binding sites) in production animals (cows and pigs) and extended its broad applicability to other mammals. The method utilizes human regulatory features identified from thousands of tissues, cell lines, and experimental assays to find homologous regions that are conserved in sequences and genome organization and are enriched for regulatory elements in the genome sequences of other mammalian species. Importantly, we developed a filtering strategy, including a machine learning classification method, to utilize a very small number of species-specific experimental datasets available to select for the likely active regulatory regions. The method finds the optimal combination of sensitivity and accuracy to unbiasedly predict regulatory regions in mammalian species. Furthermore, we demonstrated the utility of the predicted regulatory datasets in cattle for prioritizing variants associated with multiple production and climate change adaptation traits and identifying potential genome editing targets.
A probabilistic method for testing and estimating selection differences between populations.
He, Yungang; Wang, Minxian; Huang, Xin; Li, Ran; Xu, Hongyang; Xu, Shuhua; Jin, Li
2015-12-01
Human populations around the world encounter various environmental challenges and, consequently, develop genetic adaptations to different selection forces. Identifying the differences in natural selection between populations is critical for understanding the roles of specific genetic variants in evolutionary adaptation. Although numerous methods have been developed to detect genetic loci under recent directional selection, a probabilistic solution for testing and quantifying selection differences between populations is lacking. Here we report the development of a probabilistic method for testing and estimating selection differences between populations. By use of a probabilistic model of genetic drift and selection, we showed that logarithm odds ratios of allele frequencies provide estimates of the differences in selection coefficients between populations. The estimates approximate a normal distribution, and variance can be estimated using genome-wide variants. This allows us to quantify differences in selection coefficients and to determine the confidence intervals of the estimate. Our work also revealed the link between genetic association testing and hypothesis testing of selection differences. It therefore supplies a solution for hypothesis testing of selection differences. This method was applied to a genome-wide data analysis of Han and Tibetan populations. The results confirmed that both the EPAS1 and EGLN1 genes are under statistically different selection in Han and Tibetan populations. We further estimated differences in the selection coefficients for genetic variants involved in melanin formation and determined their confidence intervals between continental population groups. Application of the method to empirical data demonstrated the outstanding capability of this novel approach for testing and quantifying differences in natural selection. © 2015 He et al.; Published by Cold Spring Harbor Laboratory Press.
Parks, Donovan H.; Imelfort, Michael; Skennerton, Connor T.; Hugenholtz, Philip; Tyson, Gene W.
2015-01-01
Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. Although this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally make use of a limited number of “marker” genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate-, single-cell-, and metagenome-derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently impacting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. In order to facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities. PMID:25977477
Parks, Donovan H; Imelfort, Michael; Skennerton, Connor T; Hugenholtz, Philip; Tyson, Gene W
2015-07-01
Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. Although this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally make use of a limited number of "marker" genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate-, single-cell-, and metagenome-derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently impacting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. In order to facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities. © 2015 Parks et al.; Published by Cold Spring Harbor Laboratory Press.
Bomba, Lorenzo; Nicolazzi, Ezequiel L; Milanesi, Marco; Negrini, Riccardo; Mancini, Giordano; Biscarini, Filippo; Stella, Alessandra; Valentini, Alessio; Ajmone-Marsan, Paolo
2015-04-02
A number of methods are available to scan a genome for selection signatures by evaluating patterns of diversity within and between breeds. Among these, "extended haplotype homozygosity" (EHH) is a reliable approach to detect genome regions under recent selective pressure. The objective of this study was to use this approach to identify regions that are under recent positive selection and shared by the most representative Italian dairy and beef cattle breeds. A total of 3220 animals from Italian Holstein (2179), Italian Brown (775), Simmental (493), Marchigiana (485) and Piedmontese (379) breeds were genotyped with the Illumina BovineSNP50 BeadChip v.1. After standard quality control procedures, genotypes were phased and core haplotypes were identified. The decay of linkage disequilibrium (LD) for each core haplotype was assessed by measuring the EHH. Since accurate estimates of local recombination rates were not available, relative EHH (rEHH) was calculated for each core haplotype. Genomic regions that carry frequent core haplotypes and with significant rEHH values were considered as candidates for recent positive selection. Candidate regions were aligned across to identify signals shared by dairy or beef cattle breeds. Overall, 82 and 87 common regions were detected among dairy and beef cattle breeds, respectively. Bioinformatic analysis identified 244 and 232 genes in these common genomic regions. Gene annotation and pathway analysis showed that these genes are involved in molecular functions that are biologically related to milk or meat production. Our results suggest that a multi-breed approach can lead to the identification of genomic signatures in breeds of cattle that are selected for the same production goal and thus to the localisation of genomic regions of interest in dairy and beef production.
Selfish drive can trump function when animal mitochondrial genomes compete.
Ma, Hansong; O'Farrell, Patrick H
2016-07-01
Mitochondrial genomes compete for transmission from mother to progeny. We explored this competition by introducing a second genome into Drosophila melanogaster to follow transmission. Competitions between closely related genomes favored those functional in electron transport, resulting in a host-beneficial purifying selection. In contrast, matchups between distantly related genomes often favored those with negligible, negative or lethal consequences, indicating selfish selection. Exhibiting powerful selfish selection, a genome carrying a detrimental mutation displaced a complementing genome, leading to population death after several generations. In a different pairing, opposing selfish and purifying selection counterbalanced to give stable transmission of two genomes. Sequencing of recombinant mitochondrial genomes showed that the noncoding region, containing origins of replication, governs selfish transmission. Uniparental inheritance prevents encounters between distantly related genomes. Nonetheless, in each maternal lineage, constant competition among sibling genomes selects for super-replicators. We suggest that this relentless competition drives positive selection, promoting change in the sequences influencing transmission.
Selfish drive can trump function when animal mitochondrial genomes compete
Ma, Hansong; O’Farrell, Patrick H.
2016-01-01
Mitochondrial genomes compete for transmission from mother to progeny. We explored this competition by introducing a second genome into Drosophila melanogaster to follow transmission. Competitions between closely related genomes favored those functional in electron transport, resulting in a host-beneficial purifying selection1. Contrastingly, matchups between distant genomes often favored those with negligible, negative or lethal consequences, indicating selfish selection. Exhibiting powerful selfish selection, a genome carrying a detrimental mutation displaced a complementing genome leading to population death after several generations. In a different pairing, opposing selfish and purifying selection counterbalanced to give stable transmission of two genomes. Sequencing of recombinant mitochondrial genomes revealed that the non-coding region, containing origins of replication, governs selfish transmission. Uniparental inheritance prevents encounters between distantly related genomes. Nonetheless, within each maternal lineage, constant competition among sibling genomes selects for super-replicators. We suggest that this relentless competition drives positive selection promoting change in the sequences influencing transmission. PMID:27270106
Wolc, Anna; Stricker, Chris; Arango, Jesus; Settar, Petek; Fulton, Janet E; O'Sullivan, Neil P; Preisinger, Rudolf; Habier, David; Fernando, Rohan; Garrick, Dorian J; Lamont, Susan J; Dekkers, Jack C M
2011-01-21
Genomic selection involves breeding value estimation of selection candidates based on high-density SNP genotypes. To quantify the potential benefit of genomic selection, accuracies of estimated breeding values (EBV) obtained with different methods using pedigree or high-density SNP genotypes were evaluated and compared in a commercial layer chicken breeding line. The following traits were analyzed: egg production, egg weight, egg color, shell strength, age at sexual maturity, body weight, albumen height, and yolk weight. Predictions appropriate for early or late selection were compared. A total of 2,708 birds were genotyped for 23,356 segregating SNP, including 1,563 females with records. Phenotypes on relatives without genotypes were incorporated in the analysis (in total 13,049 production records).The data were analyzed with a Reduced Animal Model using a relationship matrix based on pedigree data or on marker genotypes and with a Bayesian method using model averaging. Using a validation set that consisted of individuals from the generation following training, these methods were compared by correlating EBV with phenotypes corrected for fixed effects, selecting the top 30 individuals based on EBV and evaluating their mean phenotype, and by regressing phenotypes on EBV. Using high-density SNP genotypes increased accuracies of EBV up to two-fold for selection at an early age and by up to 88% for selection at a later age. Accuracy increases at an early age can be mostly attributed to improved estimates of parental EBV for shell quality and egg production, while for other egg quality traits it is mostly due to improved estimates of Mendelian sampling effects. A relatively small number of markers was sufficient to explain most of the genetic variation for egg weight and body weight.
Predicting discovery rates of genomic features.
Gravel, Simon
2014-06-01
Successful sequencing experiments require judicious sample selection. However, this selection must often be performed on the basis of limited preliminary data. Predicting the statistical properties of the final sample based on preliminary data can be challenging, because numerous uncertain model assumptions may be involved. Here, we ask whether we can predict "omics" variation across many samples by sequencing only a fraction of them. In the infinite-genome limit, we find that a pilot study sequencing 5% of a population is sufficient to predict the number of genetic variants in the entire population within 6% of the correct value, using an estimator agnostic to demography, selection, or population structure. To reach similar accuracy in a finite genome with millions of polymorphisms, the pilot study would require ∼15% of the population. We present computationally efficient jackknife and linear programming methods that exhibit substantially less bias than the state of the art when applied to simulated data and subsampled 1000 Genomes Project data. Extrapolating based on the National Heart, Lung, and Blood Institute Exome Sequencing Project data, we predict that 7.2% of sites in the capture region would be variable in a sample of 50,000 African Americans and 8.8% in a European sample of equal size. Finally, we show how the linear programming method can also predict discovery rates of various genomic features, such as the number of transcription factor binding sites across different cell types. Copyright © 2014 by the Genetics Society of America.
Microsatellites as targets of natural selection.
Haasl, Ryan J; Payseur, Bret A
2013-02-01
The ability to survey polymorphism on a genomic scale has enabled genome-wide scans for the targets of natural selection. Theory that connects patterns of genetic variation to evidence of natural selection most often assumes a diallelic locus and no recurrent mutation. Although these assumptions are suitable to selection that targets single nucleotide variants, fundamentally different types of mutation generate abundant polymorphism in genomes. Moreover, recent empirical results suggest that mutationally complex, multiallelic loci including microsatellites and copy number variants are sometimes targeted by natural selection. Given their abundance, the lack of inference methods tailored to the mutational peculiarities of these types of loci represents a notable gap in our ability to interrogate genomes for signatures of natural selection. Previous theoretical investigations of mutation-selection balance at multiallelic loci include assumptions that limit their application to inference from empirical data. Focusing on microsatellites, we assess the dynamics and population-level consequences of selection targeting mutationally complex variants. We develop general models of a multiallelic fitness surface, a realistic model of microsatellite mutation, and an efficient simulation algorithm. Using these tools, we explore mutation-selection-drift equilibrium at microsatellites and investigate the mutational history and selective regime of the microsatellite that causes Friedreich's ataxia. We characterize microsatellite selective events by their duration and cost, note similarities to sweeps from standing point variation, and conclude that it is premature to label microsatellites as ubiquitous agents of efficient adaptive change. Together, our models and simulation algorithm provide a powerful framework for statistical inference, which can be used to test the neutrality of microsatellites and other multiallelic variants.
Microsatellites as Targets of Natural Selection
Haasl, Ryan J.; Payseur, Bret A.
2013-01-01
The ability to survey polymorphism on a genomic scale has enabled genome-wide scans for the targets of natural selection. Theory that connects patterns of genetic variation to evidence of natural selection most often assumes a diallelic locus and no recurrent mutation. Although these assumptions are suitable to selection that targets single nucleotide variants, fundamentally different types of mutation generate abundant polymorphism in genomes. Moreover, recent empirical results suggest that mutationally complex, multiallelic loci including microsatellites and copy number variants are sometimes targeted by natural selection. Given their abundance, the lack of inference methods tailored to the mutational peculiarities of these types of loci represents a notable gap in our ability to interrogate genomes for signatures of natural selection. Previous theoretical investigations of mutation-selection balance at multiallelic loci include assumptions that limit their application to inference from empirical data. Focusing on microsatellites, we assess the dynamics and population-level consequences of selection targeting mutationally complex variants. We develop general models of a multiallelic fitness surface, a realistic model of microsatellite mutation, and an efficient simulation algorithm. Using these tools, we explore mutation-selection-drift equilibrium at microsatellites and investigate the mutational history and selective regime of the microsatellite that causes Friedreich’s ataxia. We characterize microsatellite selective events by their duration and cost, note similarities to sweeps from standing point variation, and conclude that it is premature to label microsatellites as ubiquitous agents of efficient adaptive change. Together, our models and simulation algorithm provide a powerful framework for statistical inference, which can be used to test the neutrality of microsatellites and other multiallelic variants. PMID:23104080
TARGETED CAPTURE IN EVOLUTIONARY AND ECOLOGICAL GENOMICS
Jones, Matthew R.; Good, Jeffrey M.
2016-01-01
The rapid expansion of next-generation sequencing has yielded a powerful array of tools to address fundamental biological questions at a scale that was inconceivable just a few years ago. Various genome partitioning strategies to sequence select subsets of the genome have emerged as powerful alternatives to whole genome sequencing in ecological and evolutionary genomic studies. High throughput targeted capture is one such strategy that involves the parallel enrichment of pre-selected genomic regions of interest. The growing use of targeted capture demonstrates its potential power to address a range of research questions, yet these approaches have yet to expand broadly across labs focused on evolutionary and ecological genomics. In part, the use of targeted capture has been hindered by the logistics of capture design and implementation in species without established reference genomes. Here we aim to 1) increase the accessibility of targeted capture to researchers working in non-model taxa by discussing capture methods that circumvent the need of a reference genome, 2) highlight the evolutionary and ecological applications where this approach is emerging as a powerful sequencing strategy, and 3) discuss the future of targeted capture and other genome partitioning approaches in light of the increasing accessibility of whole genome sequencing. Given the practical advantages and increasing feasibility of high-throughput targeted capture, we anticipate an ongoing expansion of capture-based approaches in evolutionary and ecological research, synergistic with an expansion of whole genome sequencing. PMID:26137993
Muleta, Kebede T; Bulli, Peter; Zhang, Zhiwu; Chen, Xianming; Pumphrey, Michael
2017-11-01
Harnessing diversity from germplasm collections is more feasible today because of the development of lower-cost and higher-throughput genotyping methods. However, the cost of phenotyping is still generally high, so efficient methods of sampling and exploiting useful diversity are needed. Genomic selection (GS) has the potential to enhance the use of desirable genetic variation in germplasm collections through predicting the genomic estimated breeding values (GEBVs) for all traits that have been measured. Here, we evaluated the effects of various scenarios of population genetic properties and marker density on the accuracy of GEBVs in the context of applying GS for wheat ( L.) germplasm use. Empirical data for adult plant resistance to stripe rust ( f. sp. ) collected on 1163 spring wheat accessions and genotypic data based on the wheat 9K single nucleotide polymorphism (SNP) iSelect assay were used for various genomic prediction tests. Unsurprisingly, the results of the cross-validation tests demonstrated that prediction accuracy increased with an increase in training population size and marker density. It was evident that using all the available markers (5619) was unnecessary for capturing the trait variation in the germplasm collection, with no further gain in prediction accuracy beyond 1 SNP per 3.2 cM (∼1850 markers), which is close to the linkage disequilibrium decay rate in this population. Collectively, our results suggest that larger germplasm collections may be efficiently sampled via lower-density genotyping methods, whereas genetic relationships between the training and validation populations remain critical when exploiting GS to select from germplasm collections. Copyright © 2017 Crop Science Society of America.
Zvinorova, P I; Halimani, T E; Muchadeyi, F C; Matika, O; Riggio, V; Dzama, K
2016-07-30
The control of gastrointestinal nematodes (GIN) is mainly based on the use of drugs, grazing management, use of copper oxide wire particles and bioactive forages. Resistance to anthelmintic drugs in small ruminants is documented worldwide. Host genetic resistance to parasites, has been increasingly used as a complementary control strategy, along with the conventional intervention methods mentioned above. Genetic diversity in resistance to GIN has been well studied in experimental and commercial flocks in temperate climates and more developed economies. However, there are very few report outputs from the more extensive low-input/output smallholder systems in developing and emerging countries. Furthermore, results on quantitative trait loci (QTL) associated with nematode resistance from various studies have not always been consistent, mainly due to the different nematodes studied, different host breeds, ages, climates, natural infections versus artificial challenges, infection level at sampling periods, among others. The increasing use of genetic markers (Single Nucleotide Polymorphisms, SNPs) in GWAS or the use of whole genome sequence data and a plethora of analytic methods offer the potential to identify loci or regions associated nematode resistance. Genomic selection as a genome-wide level method overcomes the need to identify candidate genes. Benefits in genomic selection are now being realised in dairy cattle and sheep under commercial settings in the more advanced countries. However, despite the commercial benefits of using these tools, there are practical problems associated with incorporating the use of marker-assisted selection or genomic selection in low-input/output smallholder farming systems breeding schemes. Unlike anthelmintic resistance, there is no empirical evidence suggesting that nematodes will evolve rapidly in response to resistant hosts. The strategy of nematode control has evolved to a more practical manipulation of host-parasite equilibrium in grazing systems by implementation of various strategies, in which improvement of genetic resistance of small ruminant should be included. Therefore, selection for resistant hosts can be considered as one of the sustainable control strategy, although it will be most effective when used to complement other control strategies such as grazing management and improving efficiency of anthelmintics currently. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Automated ensemble assembly and validation of microbial genomes.
Koren, Sergey; Treangen, Todd J; Hill, Christopher M; Pop, Mihai; Phillippy, Adam M
2014-05-03
The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible. To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers. Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome. Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to select an assembly best tailored to their specific needs.
The scope and strength of sex-specific selection in genome evolution
Wright, A E; Mank, J E
2013-01-01
Males and females share the vast majority of their genomes and yet are often subject to different, even conflicting, selection. Genomic and transcriptomic developments have made it possible to assess sex-specific selection at the molecular level, and it is clear that sex-specific selection shapes the evolutionary properties of several genomic characteristics, including transcription, post-transcriptional regulation, imprinting, genome structure and gene sequence. Sex-specific selection is strongly influenced by mating system, which also causes neutral evolutionary changes that affect different regions of the genome in different ways. Here, we synthesize theoretical and molecular work in order to provide a cohesive view of the role of sex-specific selection and mating system in genome evolution. We also highlight the need for a combined approach, incorporating both genomic data and experimental phenotypic studies, in order to understand precisely how sex-specific selection drives evolutionary change across the genome. PMID:23848139
Exploring new alleles for frost tolerance in winter rye.
Erath, Wiltrud; Bauer, Eva; Fowler, D Brian; Gordillo, Andres; Korzun, Viktor; Ponomareva, Mira; Schmidt, Malthe; Schmiedchen, Brigitta; Wilde, Peer; Schön, Chris-Carolin
2017-10-01
Rye genetic resources provide a valuable source of new alleles for the improvement of frost tolerance in rye breeding programs. Frost tolerance is a must-have trait for winter cereal production in northern and continental cropping areas. Genetic resources should harbor promising alleles for the improvement of frost tolerance of winter rye elite lines. For frost tolerance breeding, the identification of quantitative trait loci (QTL) and the choice of optimum genome-based selection methods are essential. We identified genomic regions involved in frost tolerance of winter rye by QTL mapping in a biparental population derived from a highly frost tolerant selection from the Canadian cultivar Puma and the European elite line Lo157. Lines per se and their testcrosses were phenotyped in a controlled freeze test and in multi-location field trials in Russia and Canada. Three QTL on chromosomes 4R, 5R, and 7R were consistently detected across environments. The QTL on 5R is congruent with the genomic region harboring the Frost resistance locus 2 (Fr-2) in Triticeae. The Puma allele at the Fr-R2 locus was found to significantly increase frost tolerance. A comparison of predictive ability obtained from the QTL-based model with different whole-genome prediction models revealed that besides a few large, also small QTL effects contribute to the genomic variance of frost tolerance in rye. Genomic prediction models assigning a high weight to the Fr-R2 locus allow increasing the selection intensity for frost tolerance by genome-based pre-selection of promising candidates.
Selective recruitment of nuclear factors to productively replicating herpes simplex virus genomes.
Dembowski, Jill A; DeLuca, Neal A
2015-05-01
Much of the HSV-1 life cycle is carried out in the cell nucleus, including the expression, replication, repair, and packaging of viral genomes. Viral proteins, as well as cellular factors, play essential roles in these processes. Isolation of proteins on nascent DNA (iPOND) was developed to label and purify cellular replication forks. We adapted aspects of this method to label viral genomes to both image, and purify replicating HSV-1 genomes for the identification of associated proteins. Many viral and cellular factors were enriched on viral genomes, including factors that mediate DNA replication, repair, chromatin remodeling, transcription, and RNA processing. As infection proceeded, packaging and structural components were enriched to a greater extent. Among the more abundant proteins that copurified with genomes were the viral transcription factor ICP4 and the replication protein ICP8. Furthermore, all seven viral replication proteins were enriched on viral genomes, along with cellular PCNA and topoisomerases, while other cellular replication proteins were not detected. The chromatin-remodeling complexes present on viral genomes included the INO80, SWI/SNF, NURD, and FACT complexes, which may prevent chromatinization of the genome. Consistent with this conclusion, histones were not readily recovered with purified viral genomes, and imaging studies revealed an underrepresentation of histones on viral genomes. RNA polymerase II, the mediator complex, TFIID, TFIIH, and several other transcriptional activators and repressors were also affinity purified with viral DNA. The presence of INO80, NURD, SWI/SNF, mediator, TFIID, and TFIIH components is consistent with previous studies in which these complexes copurified with ICP4. Therefore, ICP4 is likely involved in the recruitment of these key cellular chromatin remodeling and transcription factors to viral genomes. Taken together, iPOND is a valuable method for the study of viral genome dynamics during infection and provides a comprehensive view of how HSV-1 selectively utilizes cellular resources.
Glebes, Tirzah Y; Sandoval, Nicholas R; Gillis, Jacob H; Gill, Ryan T
2015-01-01
Engineering both feedstock and product tolerance is important for transitioning towards next-generation biofuels derived from renewable sources. Tolerance to chemical inhibitors typically results in complex phenotypes, for which multiple genetic changes must often be made to confer tolerance. Here, we performed a genome-wide search for furfural-tolerant alleles using the TRackable Multiplex Recombineering (TRMR) method (Warner et al. (2010), Nature Biotechnology), which uses chromosomally integrated mutations directed towards increased or decreased expression of virtually every gene in Escherichia coli. We employed various growth selection strategies to assess the role of selection design towards growth enrichments. We also compared genes with increased fitness from our TRMR selection to those from a previously reported genome-wide identification study of furfural tolerance genes using a plasmid-based genomic library approach (Glebes et al. (2014) PLOS ONE). In several cases, growth improvements were observed for the chromosomally integrated promoter/RBS mutations but not for the plasmid-based overexpression constructs. Through this assessment, four novel tolerance genes, ahpC, yhjH, rna, and dicA, were identified and confirmed for their effect on improving growth in the presence of furfural. © 2014 Wiley Periodicals, Inc.
Latent feature decompositions for integrative analysis of multi-platform genomic data
Gregory, Karl B.; Momin, Amin A.; Coombes, Kevin R.; Baladandayuthapani, Veerabhadran
2015-01-01
Increased availability of multi-platform genomics data on matched samples has sparked research efforts to discover how diverse molecular features interact both within and between platforms. In addition, simultaneous measurements of genetic and epigenetic characteristics illuminate the roles their complex relationships play in disease progression and outcomes. However, integrative methods for diverse genomics data are faced with the challenges of ultra-high dimensionality and the existence of complex interactions both within and between platforms. We propose a novel modeling framework for integrative analysis based on decompositions of the large number of platform-specific features into a smaller number of latent features. Subsequently we build a predictive model for clinical outcomes accounting for both within- and between-platform interactions based on Bayesian model averaging procedures. Principal components, partial least squares and non-negative matrix factorization as well as sparse counterparts of each are used to define the latent features, and the performance of these decompositions is compared both on real and simulated data. The latent feature interactions are shown to preserve interactions between the original features and not only aid prediction but also allow explicit selection of outcome-related features. The methods are motivated by and applied to, a glioblastoma multiforme dataset from The Cancer Genome Atlas to predict patient survival times integrating gene expression, microRNA, copy number and methylation data. For the glioblastoma data, we find a high concordance between our selected prognostic genes and genes with known associations with glioblastoma. In addition, our model discovers several relevant cross-platform interactions such as copy number variation associated gene dosing and epigenetic regulation through promoter methylation. On simulated data, we show that our proposed method successfully incorporates interactions within and between genomic platforms to aid accurate prediction and variable selection. Our methods perform best when principal components are used to define the latent features. PMID:26146492
Detecting Signatures of Positive Selection along Defined Branches of a Population Tree Using LSD.
Librado, Pablo; Orlando, Ludovic
2018-06-01
Identifying the genomic basis underlying local adaptation is paramount to evolutionary biology, and bears many applications in the fields of conservation biology, crop, and animal breeding, as well as personalized medicine. Although many approaches have been developed to detect signatures of positive selection within single populations and population pairs, the increasing wealth of high-throughput sequencing data requires improved methods capable of handling multiple, and ideally large number of, populations in a single analysis. In this study, we introduce LSD (levels of exclusively shared differences), a fast and flexible framework to perform genome-wide selection scans, along the internal and external branches of a given population tree. We use forward simulations to demonstrate that LSD can identify branches targeted by positive selection with remarkable sensitivity and specificity. We illustrate a range of potential applications by analyzing data from the 1000 Genomes Project and uncover a list of adaptive candidates accompanying the expansion of anatomically modern humans out of Africa and their spread to Europe.
Sternburg, Erin L; Dias, Kristen C; Karginov, Fedor V
2017-06-16
The CRISPR/Cas9 genome engineering system has revolutionized biology by allowing for precise genome editing with little effort. Guided by a single guide RNA (sgRNA) that confers specificity, the Cas9 protein cleaves both DNA strands at the targeted locus. The DNA break can trigger either non-homologous end joining (NHEJ) or homology directed repair (HDR). NHEJ can introduce small deletions or insertions which lead to frame-shift mutations, while HDR allows for larger and more precise perturbations. Here, we present protocols for generating knockout cell lines by coupling established CRISPR/Cas9 methods with two options for downstream selection/screening. The NHEJ approach uses a single sgRNA cut site and selection-independent screening, where protein production is assessed by dot immunoblot in a high-throughput manner. The HDR approach uses two sgRNA cut sites that span the gene of interest. Together with a provided HDR template, this method can achieve deletion of tens of kb, aided by the inserted selectable resistance marker. The appropriate applications and advantages of each method are discussed.
Genome Engineering in Bacillus anthracis Using Cre Recombinase
Pomerantsev, Andrei P.; Sitaraman, Ramakrishnan; Galloway, Craig R.; Kivovich, Violetta; Leppla, Stephen H.
2006-01-01
Genome engineering is a powerful method for the study of bacterial virulence. With the availability of the complete genomic sequence of Bacillus anthracis, it is now possible to inactivate or delete selected genes of interest. However, many current methods for disrupting or deleting more than one gene require use of multiple antibiotic resistance determinants. In this report we used an approach that temporarily inserts an antibiotic resistance marker into a selected region of the genome and subsequently removes it, leaving the target region (a single gene or a larger genomic segment) permanently mutated. For this purpose, a spectinomycin resistance cassette flanked by bacteriophage P1 loxP sites oriented as direct repeats was inserted within a selected gene. After identification of strains having the spectinomycin cassette inserted by a double-crossover event, a thermo-sensitive plasmid expressing Cre recombinase was introduced at the permissive temperature. Cre recombinase action at the loxP sites excised the spectinomycin marker, leaving a single loxP site within the targeted gene or genomic segment. The Cre-expressing plasmid was then removed by growth at the restrictive temperature. The procedure could then be repeated to mutate additional genes. In this way, we sequentially mutated two pairs of genes: pepM and spo0A, and mcrB and mrr. Furthermore, loxP sites introduced at distant genes could be recombined by Cre recombinase to cause deletion of large intervening regions. In this way, we deleted the capBCAD region of the pXO2 plasmid and the entire 30 kb of chromosomal DNA between the mcrB and mrr genes, and in the latter case we found that the 32 intervening open reading frames were not essential to growth. PMID:16369025
A multi-strategy approach to informative gene identification from gene expression data.
Liu, Ziying; Phan, Sieu; Famili, Fazel; Pan, Youlian; Lenferink, Anne E G; Cantin, Christiane; Collins, Catherine; O'Connor-McCourt, Maureen D
2010-02-01
An unsupervised multi-strategy approach has been developed to identify informative genes from high throughput genomic data. Several statistical methods have been used in the field to identify differentially expressed genes. Since different methods generate different lists of genes, it is very challenging to determine the most reliable gene list and the appropriate method. This paper presents a multi-strategy method, in which a combination of several data analysis techniques are applied to a given dataset and a confidence measure is established to select genes from the gene lists generated by these techniques to form the core of our final selection. The remainder of the genes that form the peripheral region are subject to exclusion or inclusion into the final selection. This paper demonstrates this methodology through its application to an in-house cancer genomics dataset and a public dataset. The results indicate that our method provides more reliable list of genes, which are validated using biological knowledge, biological experiments, and literature search. We further evaluated our multi-strategy method by consolidating two pairs of independent datasets, each pair is for the same disease, but generated by different labs using different platforms. The results showed that our method has produced far better results.
Yu, Feiqiao Brian; Blainey, Paul C; Schulz, Frederik; Woyke, Tanja; Horowitz, Mark A; Quake, Stephen R
2017-01-01
Metagenomics and single-cell genomics have enabled genome discovery from unknown branches of life. However, extracting novel genomes from complex mixtures of metagenomic data can still be challenging and represents an ill-posed problem which is generally approached with ad hoc methods. Here we present a microfluidic-based mini-metagenomic method which offers a statistically rigorous approach to extract novel microbial genomes while preserving single-cell resolution. We used this approach to analyze two hot spring samples from Yellowstone National Park and extracted 29 new genomes, including three deeply branching lineages. The single-cell resolution enabled accurate quantification of genome function and abundance, down to 1% in relative abundance. Our analyses of genome level SNP distributions also revealed low to moderate environmental selection. The scale, resolution, and statistical power of microfluidic-based mini-metagenomics make it a powerful tool to dissect the genomic structure of microbial communities while effectively preserving the fundamental unit of biology, the single cell. DOI: http://dx.doi.org/10.7554/eLife.26580.001 PMID:28678007
Comparison of Methods of Detection of Exceptional Sequences in Prokaryotic Genomes.
Rusinov, I S; Ershova, A S; Karyagina, A S; Spirin, S A; Alexeevski, A V
2018-02-01
Many proteins need recognition of specific DNA sequences for functioning. The number of recognition sites and their distribution along the DNA might be of biological importance. For example, the number of restriction sites is often reduced in prokaryotic and phage genomes to decrease the probability of DNA cleavage by restriction endonucleases. We call a sequence an exceptional one if its frequency in a genome significantly differs from one predicted by some mathematical model. An exceptional sequence could be either under- or over-represented, depending on its frequency in comparison with the predicted one. Exceptional sequences could be considered biologically meaningful, for example, as targets of DNA-binding proteins or as parts of abundant repetitive elements. Several methods to predict frequency of a short sequence in a genome, based on actual frequencies of certain its subsequences, are used. The most popular are methods based on Markov chain models. But any rigorous comparison of the methods has not previously been performed. We compared three methods for the prediction of short sequence frequencies: the maximum-order Markov chain model-based method, the method that uses geometric mean of extended Markovian estimates, and the method that utilizes frequencies of all subsequences including discontiguous ones. We applied them to restriction sites in complete genomes of 2500 prokaryotic species and demonstrated that the results depend greatly on the method used: lists of 5% of the most under-represented sites differed by up to 50%. The method designed by Burge and coauthors in 1992, which utilizes all subsequences of the sequence, showed a higher precision than the other two methods both on prokaryotic genomes and randomly generated sequences after computational imitation of selective pressure. We propose this method as the first choice for detection of exceptional sequences in prokaryotic genomes.
Bohlin, Jon; Eldholm, Vegard; Pettersson, John H O; Brynildsrud, Ola; Snipen, Lars
2017-02-10
The core genome consists of genes shared by the vast majority of a species and is therefore assumed to have been subjected to substantially stronger purifying selection than the more mobile elements of the genome, also known as the accessory genome. Here we examine intragenic base composition differences in core genomes and corresponding accessory genomes in 36 species, represented by the genomes of 731 bacterial strains, to assess the impact of selective forces on base composition in microbes. We also explore, in turn, how these results compare with findings for whole genome intragenic regions. We found that GC content in coding regions is significantly higher in core genomes than accessory genomes and whole genomes. Likewise, GC content variation within coding regions was significantly lower in core genomes than in accessory genomes and whole genomes. Relative entropy in coding regions, measured as the difference between observed and expected trinucleotide frequencies estimated from mononucleotide frequencies, was significantly higher in the core genomes than in accessory and whole genomes. Relative entropy was positively associated with coding region GC content within the accessory genomes, but not within the corresponding coding regions of core or whole genomes. The higher intragenic GC content and relative entropy, as well as the lower GC content variation, observed in the core genomes is most likely associated with selective constraints. It is unclear whether the positive association between GC content and relative entropy in the more mobile accessory genomes constitutes signatures of selection or selective neutral processes.
SNP discovery and genotyping using Genotyping-by-Sequencing in Pekin ducks.
Zhu, Feng; Cui, Qian-Qian; Hou, Zhuo-Cheng
2016-11-15
Genomic selection and genome-wide association studies need thousands to millions of SNPs. However, many non-model species do not have reference chips for detecting variation. Our goal was to develop and validate an inexpensive but effective method for detecting SNP variation. Genotyping by sequencing (GBS) can be a highly efficient strategy for genome-wide SNP detection, as an alternative to microarray chips. Here, we developed a GBS protocol for ducks and tested it to genotype 49 Pekin ducks. A total of 169,209 SNPs were identified from all animals, with a mean of 55,920 SNPs per individual. The average SNP density reached 1156 SNPs/MB. In this study, the first application of GBS to ducks, we demonstrate the power and simplicity of this method. GBS can be used for genetic studies in to provide an effective method for genome-wide SNP discovery.
Methods for Optimizing CRISPR-Cas9 Genome Editing Specificity
Tycko, Josh; Myer, Vic E.; Hsu, Patrick D.
2016-01-01
Summary Advances in the development of delivery, repair, and specificity strategies for the CRISPR-Cas9 genome engineering toolbox are helping researchers understand gene function with unprecedented precision and sensitivity. CRISPR-Cas9 also holds enormous therapeutic potential for the treatment of genetic disorders by directly correcting disease-causing mutations. Although the Cas9 protein has been shown to bind and cleave DNA at off-target sites, the field of Cas9 specificity is rapidly progressing with marked improvements in guide RNA selection, protein and guide engineering, novel enzymes, and off-target detection methods. We review important challenges and breakthroughs in the field as a comprehensive practical guide to interested users of genome editing technologies, highlighting key tools and strategies for optimizing specificity. The genome editing community should now strive to standardize such methods for measuring and reporting off-target activity, while keeping in mind that the goal for specificity should be continued improvement and vigilance. PMID:27494557
Statistical Selection of Biological Models for Genome-Wide Association Analyses.
Bi, Wenjian; Kang, Guolian; Pounds, Stanley B
2018-05-24
Genome-wide association studies have discovered many biologically important associations of genes with phenotypes. Typically, genome-wide association analyses formally test the association of each genetic feature (SNP, CNV, etc) with the phenotype of interest and summarize the results with multiplicity-adjusted p-values. However, very small p-values only provide evidence against the null hypothesis of no association without indicating which biological model best explains the observed data. Correctly identifying a specific biological model may improve the scientific interpretation and can be used to more effectively select and design a follow-up validation study. Thus, statistical methodology to identify the correct biological model for a particular genotype-phenotype association can be very useful to investigators. Here, we propose a general statistical method to summarize how accurately each of five biological models (null, additive, dominant, recessive, co-dominant) represents the data observed for each variant in a GWAS study. We show that the new method stringently controls the false discovery rate and asymptotically selects the correct biological model. Simulations of two-stage discovery-validation studies show that the new method has these properties and that its validation power is similar to or exceeds that of simple methods that use the same statistical model for all SNPs. Example analyses of three data sets also highlight these advantages of the new method. An R package is freely available at www.stjuderesearch.org/site/depts/biostats/maew. Copyright © 2018. Published by Elsevier Inc.
Repar, Jelena; Warnecke, Tobias
2017-08-01
Inversions are a major contributor to structural genome evolution in prokaryotes. Here, using a novel alignment-based method, we systematically compare 1,651 bacterial and 98 archaeal genomes to show that inversion landscapes are frequently biased toward (symmetric) inversions around the origin-terminus axis. However, symmetric inversion bias is not a universal feature of prokaryotic genome evolution but varies considerably across clades. At the extremes, inversion landscapes in Bacillus-Clostridium and Actinobacteria are dominated by symmetric inversions, while there is little or no systematic bias favoring symmetric rearrangements in archaea with a single origin of replication. Within clades, we find strong but clade-specific relationships between symmetric inversion bias and different features of adaptive genome architecture, including the distance of essential genes to the origin of replication and the preferential localization of genes on the leading strand. We suggest that heterogeneous selection pressures have converged to produce similar patterns of structural genome evolution across prokaryotes. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Dynamics of Genome Rearrangement in Bacterial Populations
Darling, Aaron E.; Miklós, István; Ragan, Mark A.
2008-01-01
Genome structure variation has profound impacts on phenotype in organisms ranging from microbes to humans, yet little is known about how natural selection acts on genome arrangement. Pathogenic bacteria such as Yersinia pestis, which causes bubonic and pneumonic plague, often exhibit a high degree of genomic rearrangement. The recent availability of several Yersinia genomes offers an unprecedented opportunity to study the evolution of genome structure and arrangement. We introduce a set of statistical methods to study patterns of rearrangement in circular chromosomes and apply them to the Yersinia. We constructed a multiple alignment of eight Yersinia genomes using Mauve software to identify 78 conserved segments that are internally free from genome rearrangement. Based on the alignment, we applied Bayesian statistical methods to infer the phylogenetic inversion history of Yersinia. The sampling of genome arrangement reconstructions contains seven parsimonious tree topologies, each having different histories of 79 inversions. Topologies with a greater number of inversions also exist, but were sampled less frequently. The inversion phylogenies agree with results suggested by SNP patterns. We then analyzed reconstructed inversion histories to identify patterns of rearrangement. We confirm an over-representation of “symmetric inversions”—inversions with endpoints that are equally distant from the origin of chromosomal replication. Ancestral genome arrangements demonstrate moderate preference for replichore balance in Yersinia. We found that all inversions are shorter than expected under a neutral model, whereas inversions acting within a single replichore are much shorter than expected. We also found evidence for a canonical configuration of the origin and terminus of replication. Finally, breakpoint reuse analysis reveals that inversions with endpoints proximal to the origin of DNA replication are nearly three times more frequent. Our findings represent the first characterization of genome arrangement evolution in a bacterial population evolving outside laboratory conditions. Insight into the process of genomic rearrangement may further the understanding of pathogen population dynamics and selection on the architecture of circular bacterial chromosomes. PMID:18650965
Toward Genomics-Based Breeding in C3 Cool-Season Perennial Grasses.
Talukder, Shyamal K; Saha, Malay C
2017-01-01
Most important food and feed crops in the world belong to the C3 grass family. The future of food security is highly reliant on achieving genetic gains of those grasses. Conventional breeding methods have already reached a plateau for improving major crops. Genomics tools and resources have opened an avenue to explore genome-wide variability and make use of the variation for enhancing genetic gains in breeding programs. Major C3 annual cereal breeding programs are well equipped with genomic tools; however, genomic research of C3 cool-season perennial grasses is lagging behind. In this review, we discuss the currently available genomics tools and approaches useful for C3 cool-season perennial grass breeding. Along with a general review, we emphasize the discussion focusing on forage grasses that were considered orphan and have little or no genetic information available. Transcriptome sequencing and genotype-by-sequencing technology for genome-wide marker detection using next-generation sequencing (NGS) are very promising as genomics tools. Most C3 cool-season perennial grass members have no prior genetic information; thus NGS technology will enhance collinear study with other C3 model grasses like Brachypodium and rice. Transcriptomics data can be used for identification of functional genes and molecular markers, i.e., polymorphism markers and simple sequence repeats (SSRs). Genome-wide association study with NGS-based markers will facilitate marker identification for marker-assisted selection. With limited genetic information, genomic selection holds great promise to breeders for attaining maximum genetic gain of the cool-season C3 perennial grasses. Application of all these tools can ensure better genetic gains, reduce length of selection cycles, and facilitate cultivar development to meet the future demand for food and fodder.
The scope and strength of sex-specific selection in genome evolution.
Wright, A E; Mank, J E
2013-09-01
Males and females share the vast majority of their genomes and yet are often subject to different, even conflicting, selection. Genomic and transcriptomic developments have made it possible to assess sex-specific selection at the molecular level, and it is clear that sex-specific selection shapes the evolutionary properties of several genomic characteristics, including transcription, post-transcriptional regulation, imprinting, genome structure and gene sequence. Sex-specific selection is strongly influenced by mating system, which also causes neutral evolutionary changes that affect different regions of the genome in different ways. Here, we synthesize theoretical and molecular work in order to provide a cohesive view of the role of sex-specific selection and mating system in genome evolution. We also highlight the need for a combined approach, incorporating both genomic data and experimental phenotypic studies, in order to understand precisely how sex-specific selection drives evolutionary change across the genome. © 2013 The Authors. Journal of Evolutionary Biology © 2013 European Society For Evolutionary Biology.
2013-01-01
Background High resolution melting analysis (HRM) is a rapid and cost-effective technique for the characterisation of PCR amplicons. Because the reverse genetics of segmented influenza A viruses allows the generation of numerous influenza A virus reassortants within a short time, methods for the rapid selection of the correct recombinants are very useful. Methods PCR primer pairs covering the single nucleotide polymorphism (SNP) positions of two different influenza A H5N1 strains were designed. Reassortants of the two different H5N1 isolates were used as a model to prove the suitability of HRM for the selection of the correct recombinants. Furthermore, two different cycler instruments were compared. Results Both cycler instruments generated comparable average melting peaks, which allowed the easy identification and selection of the correct cloned segments or reassorted viruses. Conclusions HRM is a highly suitable method for the rapid and precise characterisation of cloned influenza A genomes. PMID:24028349
Fleming, Damarius S.; Weigend, Steffen; Simianer, Henner; Weigend, Annett; Rothschild, Max; Schmidt, Carl; Ashwell, Chris; Persia, Mike; Reecy, James; Lamont, Susan J.
2017-01-01
Global climate change is increasing the magnitude of environmental stressors, such as temperature, pathogens, and drought, that limit the survivability and sustainability of livestock production. Poultry production and its expansion is dependent upon robust animals that are able to cope with stressors in multiple environments. Understanding the genetic strategies that indigenous, noncommercial breeds have evolved to survive in their environment could help to elucidate molecular mechanisms underlying biological traits of environmental adaptation. We examined poultry from diverse breeds and climates of Africa and Northern Europe for selection signatures that have allowed them to adapt to their indigenous environments. Selection signatures were studied using a combination of population genomic methods that employed FST, integrated haplotype score (iHS), and runs of homozygosity (ROH) procedures. All the analyses indicated differences in environment as a driver of selective pressure in both groups of populations. The analyses revealed unique differences in the genomic regions under selection pressure from the environment for each population. The African chickens showed stronger selection toward stress signaling and angiogenesis, while the Northern European chickens showed more selection pressure toward processes related to energy homeostasis. The results suggest that chromosomes 2 and 27 are the most diverged between populations and the most selected upon within the African (chromosome 27) and Northern European (chromosome 2) birds. Examination of the divergent populations has provided new insight into genes under possible selection related to tolerance of a population’s indigenous environment that may be baselines for examining the genomic contribution to tolerance adaptions. PMID:28341699
Fleming, Damarius S; Weigend, Steffen; Simianer, Henner; Weigend, Annett; Rothschild, Max; Schmidt, Carl; Ashwell, Chris; Persia, Mike; Reecy, James; Lamont, Susan J
2017-05-05
Global climate change is increasing the magnitude of environmental stressors, such as temperature, pathogens, and drought, that limit the survivability and sustainability of livestock production. Poultry production and its expansion is dependent upon robust animals that are able to cope with stressors in multiple environments. Understanding the genetic strategies that indigenous, noncommercial breeds have evolved to survive in their environment could help to elucidate molecular mechanisms underlying biological traits of environmental adaptation. We examined poultry from diverse breeds and climates of Africa and Northern Europe for selection signatures that have allowed them to adapt to their indigenous environments. Selection signatures were studied using a combination of population genomic methods that employed F ST , integrated haplotype score (iHS), and runs of homozygosity (ROH) procedures. All the analyses indicated differences in environment as a driver of selective pressure in both groups of populations. The analyses revealed unique differences in the genomic regions under selection pressure from the environment for each population. The African chickens showed stronger selection toward stress signaling and angiogenesis, while the Northern European chickens showed more selection pressure toward processes related to energy homeostasis. The results suggest that chromosomes 2 and 27 are the most diverged between populations and the most selected upon within the African (chromosome 27) and Northern European (chromosome 2) birds. Examination of the divergent populations has provided new insight into genes under possible selection related to tolerance of a population's indigenous environment that may be baselines for examining the genomic contribution to tolerance adaptions. Copyright © 2017 Fleming et al.
Empirical Performance of Cross-Validation With Oracle Methods in a Genomics Context.
Martinez, Josue G; Carroll, Raymond J; Müller, Samuel; Sampson, Joshua N; Chatterjee, Nilanjan
2011-11-01
When employing model selection methods with oracle properties such as the smoothly clipped absolute deviation (SCAD) and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNP), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using SCAD and the Adaptive Lasso, with 10-fold cross-validation, is a random variable that has considerable and surprising variation. Similar remarks apply to non-oracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the SCAD and Adaptive Lasso.
SD-MSAEs: Promoter recognition in human genome based on deep feature extraction.
Xu, Wenxuan; Zhang, Li; Lu, Yaping
2016-06-01
The prediction and recognition of promoter in human genome play an important role in DNA sequence analysis. Entropy, in Shannon sense, of information theory is a multiple utility in bioinformatic details analysis. The relative entropy estimator methods based on statistical divergence (SD) are used to extract meaningful features to distinguish different regions of DNA sequences. In this paper, we choose context feature and use a set of methods of SD to select the most effective n-mers distinguishing promoter regions from other DNA regions in human genome. Extracted from the total possible combinations of n-mers, we can get four sparse distributions based on promoter and non-promoters training samples. The informative n-mers are selected by optimizing the differentiating extents of these distributions. Specially, we combine the advantage of statistical divergence and multiple sparse auto-encoders (MSAEs) in deep learning to extract deep feature for promoter recognition. And then we apply multiple SVMs and a decision model to construct a human promoter recognition method called SD-MSAEs. Framework is flexible that it can integrate new feature extraction or new classification models freely. Experimental results show that our method has high sensitivity and specificity. Copyright © 2016 Elsevier Inc. All rights reserved.
Simultaneous gene finding in multiple genomes.
König, Stefanie; Romoth, Lars W; Gerischer, Lizzy; Stanke, Mario
2016-11-15
As the tree of life is populated with sequenced genomes ever more densely, the new challenge is the accurate and consistent annotation of entire clades of genomes. We address this problem with a new approach to comparative gene finding that takes a multiple genome alignment of closely related species and simultaneously predicts the location and structure of protein-coding genes in all input genomes, thereby exploiting negative selection and sequence conservation. The model prefers potential gene structures in the different genomes that are in agreement with each other, or-if not-where the exon gains and losses are plausible given the species tree. We formulate the multi-species gene finding problem as a binary labeling problem on a graph. The resulting optimization problem is NP hard, but can be efficiently approximated using a subgradient-based dual decomposition approach. The proposed method was tested on whole-genome alignments of 12 vertebrate and 12 Drosophila species. The accuracy was evaluated for human, mouse and Drosophila melanogaster and compared to competing methods. Results suggest that our method is well-suited for annotation of (a large number of) genomes of closely related species within a clade, in particular, when RNA-Seq data are available for many of the genomes. The transfer of existing annotations from one genome to another via the genome alignment is more accurate than previous approaches that are based on protein-spliced alignments, when the genomes are at close to medium distances. The method is implemented in C ++ as part of Augustus and available open source at http://bioinf.uni-greifswald.de/augustus/ CONTACT: stefaniekoenig@ymail.com or mario.stanke@uni-greifswald.deSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Transformation-associated recombination (TAR) cloning for genomics studies and synthetic biology
Kouprina, Natalay; Larionov, Vladimir
2016-01-01
Transformation-associated recombination (TAR) cloning represents a unique tool for isolation and manipulation of large DNA molecules. The technique exploits a high level of homologous recombination in the yeast Sacharomyces cerevisiae. So far, TAR cloning is the only method available to selectively recover chromosomal segments up to 300 kb in length from complex and simple genomes. In addition, TAR cloning allows the assembly and cloning of entire microbe genomes up to several Mb as well as engineering of large metabolic pathways. In this review, we summarize applications of TAR cloning for functional/structural genomics and synthetic biology. PMID:27116033
Gamete selection for forage quality improvement in tall fescue
USDA-ARS?s Scientific Manuscript database
Within the Festuca-Lolium genome complex there is a need for modern breeding approaches that facilitate the rapid development of improved germplasm or cultivars. Traditional recurrent or mass-selection methods for population or synthetic development are labor intensive and time consuming. The use ...
LOSITAN: a workbench to detect molecular adaptation based on a Fst-outlier method.
Antao, Tiago; Lopes, Ana; Lopes, Ricardo J; Beja-Pereira, Albano; Luikart, Gordon
2008-07-28
Testing for selection is becoming one of the most important steps in the analysis of multilocus population genetics data sets. Existing applications are difficult to use, leaving many non-trivial, error-prone tasks to the user. Here we present LOSITAN, a selection detection workbench based on a well evaluated Fst-outlier detection method. LOSITAN greatly facilitates correct approximation of model parameters (e.g., genome-wide average, neutral Fst), provides data import and export functions, iterative contour smoothing and generation of graphics in a easy to use graphical user interface. LOSITAN is able to use modern multi-core processor architectures by locally parallelizing fdist, reducing computation time by half in current dual core machines and with almost linear performance gains in machines with more cores. LOSITAN makes selection detection feasible to a much wider range of users, even for large population genomic datasets, by both providing an easy to use interface and essential functionality to complete the whole selection detection process.
Minimal-assumption inference from population-genomic data
NASA Astrophysics Data System (ADS)
Weissman, Daniel; Hallatschek, Oskar
Samples of multiple complete genome sequences contain vast amounts of information about the evolutionary history of populations, much of it in the associations among polymorphisms at different loci. Current methods that take advantage of this linkage information rely on models of recombination and coalescence, limiting the sample sizes and populations that they can analyze. We introduce a method, Minimal-Assumption Genomic Inference of Coalescence (MAGIC), that reconstructs key features of the evolutionary history, including the distribution of coalescence times, by integrating information across genomic length scales without using an explicit model of recombination, demography or selection. Using simulated data, we show that MAGIC's performance is comparable to PSMC' on single diploid samples generated with standard coalescent and recombination models. More importantly, MAGIC can also analyze arbitrarily large samples and is robust to changes in the coalescent and recombination processes. Using MAGIC, we show that the inferred coalescence time histories of samples of multiple human genomes exhibit inconsistencies with a description in terms of an effective population size based on single-genome data.
Advances and Challenges in Genomic Selection for Disease Resistance.
Poland, Jesse; Rutkoski, Jessica
2016-08-04
Breeding for disease resistance is a central focus of plant breeding programs, as any successful variety must have the complete package of high yield, disease resistance, agronomic performance, and end-use quality. With the need to accelerate the development of improved varieties, genomics-assisted breeding is becoming an important tool in breeding programs. With marker-assisted selection, there has been success in breeding for disease resistance; however, much of this work and research has focused on identifying, mapping, and selecting for major resistance genes that tend to be highly effective but vulnerable to breakdown with rapid changes in pathogen races. In contrast, breeding for minor-gene quantitative resistance tends to produce more durable varieties but is a more challenging breeding objective. As the genetic architecture of resistance shifts from single major R genes to a diffused architecture of many minor genes, the best approach for molecular breeding will shift from marker-assisted selection to genomic selection. Genomics-assisted breeding for quantitative resistance will therefore necessitate whole-genome prediction models and selection methodology as implemented for classical complex traits such as yield. Here, we examine multiple case studies testing whole-genome prediction models and genomic selection for disease resistance. In general, whole-genome models for disease resistance can produce prediction accuracy suitable for application in breeding. These models also largely outperform multiple linear regression as would be applied in marker-assisted selection. With the implementation of genomic selection for yield and other agronomic traits, whole-genome marker profiles will be available for the entire set of breeding lines, enabling genomic selection for disease at no additional direct cost. In this context, the scope of implementing genomics selection for disease resistance, and specifically for quantitative resistance and quarantined pathogens, becomes a tractable and powerful approach in breeding programs.
The hunt for origins of DNA replication in multicellular eukaryotes
Urban, John M.; Foulk, Michael S.; Casella, Cinzia
2015-01-01
Origins of DNA replication (ORIs) occur at defined regions in the genome. Although DNA sequence defines the position of ORIs in budding yeast, the factors for ORI specification remain elusive in metazoa. Several methods have been used recently to map ORIs in metazoan genomes with the hope that features for ORI specification might emerge. These methods are reviewed here with analysis of their advantages and shortcomings. The various factors that may influence ORI selection for initiation of DNA replication are discussed. PMID:25926981
2011-01-01
Genome targeting methods enable cost-effective capture of specific subsets of the genome for sequencing. We present here an automated, highly scalable method for carrying out the Solution Hybrid Selection capture approach that provides a dramatic increase in scale and throughput of sequence-ready libraries produced. Significant process improvements and a series of in-process quality control checkpoints are also added. These process improvements can also be used in a manual version of the protocol. PMID:21205303
The Strength of Selection against Neanderthal Introgression
Juric, Ivan
2016-01-01
Hybridization between humans and Neanderthals has resulted in a low level of Neanderthal ancestry scattered across the genomes of many modern-day humans. After hybridization, on average, selection appears to have removed Neanderthal alleles from the human population. Quantifying the strength and causes of this selection against Neanderthal ancestry is key to understanding our relationship to Neanderthals and, more broadly, how populations remain distinct after secondary contact. Here, we develop a novel method for estimating the genome-wide average strength of selection and the density of selected sites using estimates of Neanderthal allele frequency along the genomes of modern-day humans. We confirm that East Asians had somewhat higher initial levels of Neanderthal ancestry than Europeans even after accounting for selection. We find that the bulk of purifying selection against Neanderthal ancestry is best understood as acting on many weakly deleterious alleles. We propose that the majority of these alleles were effectively neutral—and segregating at high frequency—in Neanderthals, but became selected against after entering human populations of much larger effective size. While individually of small effect, these alleles potentially imposed a heavy genetic load on the early-generation human–Neanderthal hybrids. This work suggests that differences in effective population size may play a far more important role in shaping levels of introgression than previously thought. PMID:27824859
Edea, Z; Hong, J-K; Jung, J-H; Kim, D-W; Kim, Y-M; Kim, E-S; Shin, S S; Jung, Y C; Kim, K-S
2017-08-01
The development of high throughput genotyping techniques has facilitated the identification of selection signatures of pigs. The detection of genomic selection signals in a population subjected to differential selection pressures may provide insights into the genes associated with economically and biologically important traits. To identify genomic regions under selection, we genotyped 488 Duroc (D) pigs and 155 D × Korean native pigs (DKNPs) using the Porcine SNP70K BeadChip. By applying the F ST and extended haplotype homozygosity (EHH-Rsb) methods, we detected genes under directional selection associated with growth/stature (DOCK7, PLCB4, HS2ST1, FBP2 and TG), carcass and meat quality (TG, COL14A1, FBXO5, NR3C1, SNX7, ARHGAP26 and DPYD), number of teats (LOC100153159 and LRRC1), pigmentation (MME) and ear morphology (SOX5), which are all mostly near or at fixation. These results could be a basis for investigating the underlying mutations associated with observed phenotypic variation. Validation using genome-wide association analysis would also facilitate the inclusion of some of these markers in genetic evaluation programs. © 2017 Stichting International Foundation for Animal Genetics.
Sun, J-T; Jin, P-Y; Hoffmann, A A; Duan, X-Z; Dai, J; Hu, G; Xue, X-F; Hong, X-Y
2018-05-24
There is increasing evidence that mitochondrial genomes (mitogenomes) can be under selection, whereas the selective regimes shaping mitogenome evolution remain largely unclear. To test for mitochondrial genome evolution in relation to the climate adaptation, we explored mtDNA variation in two spider mite (Tetranychus) species, which distribute across different climates. We sequenced 26 complete mitogenomes of T. truncatus which occurs in both warm and cold regions, and 9 complete mitogenomes of T. pueraricola which is only restricted in warm regions. Patterns of evolution in the two species mitogenomes were compared through a series of d N /d S methods and physicochemical profiles of amino acid replacements. We found that (1) the mitogenomes of both species were under widespread purifying selection. (2) Elevated directional adaptive selection was observed in the T. truncatus mitogenome, perhaps linked to the cold climates adaptation of T. truncatus. (3) The strength of selection varied across genes, and diversifying positive selection detected on ND4 and ATP6 pointed to their crucial roles during adaptation to different climatic conditions. This study gained insight into the mitogenome evolution in relation to the climate adaptation. This article is protected by copyright. All rights reserved. © 2018 The Royal Entomological Society.
Zhou, Zhan; Zou, Yangyun; Liu, Gangbiao; Zhou, Jingqi; Wu, Jingcheng; Zhao, Shimin; Su, Zhixi; Gu, Xun
2017-08-29
Human genes exhibit different effects on fitness in cancer and normal cells. Here, we present an evolutionary approach to measure the selection pressure on human genes, using the well-known ratio of the nonsynonymous to synonymous substitution rate in both cancer genomes ( C N / C S ) and normal populations ( p N / p S ). A new mutation-profile-based method that adopts sample-specific mutation rate profiles instead of conventional substitution models was developed. We found that cancer-specific selection pressure is quite different from the selection pressure at the species and population levels. Both the relaxation of purifying selection on passenger mutations and the positive selection of driver mutations may contribute to the increased C N / C S values of human genes in cancer genomes compared with the p N / p S values in human populations. The C N / C S values also contribute to the improved classification of cancer genes and a better understanding of the onco-functionalization of cancer genes during oncogenesis. The use of our computational pipeline to identify cancer-specific positively and negatively selected genes may provide useful information for understanding the evolution of cancers and identifying possible targets for therapeutic intervention.
Identification of coding and non-coding mutational hotspots in cancer genomes.
Piraino, Scott W; Furney, Simon J
2017-01-05
The identification of mutations that play a causal role in tumour development, so called "driver" mutations, is of critical importance for understanding how cancers form and how they might be treated. Several large cancer sequencing projects have identified genes that are recurrently mutated in cancer patients, suggesting a role in tumourigenesis. While the landscape of coding drivers has been extensively studied and many of the most prominent driver genes are well characterised, comparatively less is known about the role of mutations in the non-coding regions of the genome in cancer development. The continuing fall in genome sequencing costs has resulted in a concomitant increase in the number of cancer whole genome sequences being produced, facilitating systematic interrogation of both the coding and non-coding regions of cancer genomes. To examine the mutational landscapes of tumour genomes we have developed a novel method to identify mutational hotspots in tumour genomes using both mutational data and information on evolutionary conservation. We have applied our methodology to over 1300 whole cancer genomes and show that it identifies prominent coding and non-coding regions that are known or highly suspected to play a role in cancer. Importantly, we applied our method to the entire genome, rather than relying on predefined annotations (e.g. promoter regions) and we highlight recurrently mutated regions that may have resulted from increased exposure to mutational processes rather than selection, some of which have been identified previously as targets of selection. Finally, we implicate several pan-cancer and cancer-specific candidate non-coding regions, which could be involved in tumourigenesis. We have developed a framework to identify mutational hotspots in cancer genomes, which is applicable to the entire genome. This framework identifies known and novel coding and non-coding mutional hotspots and can be used to differentiate candidate driver regions from likely passenger regions susceptible to somatic mutation.
Thermodynamically optimal whole-genome tiling microarray design and validation.
Cho, Hyejin; Chou, Hui-Hsien
2016-06-13
Microarray is an efficient apparatus to interrogate the whole transcriptome of species. Microarray can be designed according to annotated gene sets, but the resulted microarrays cannot be used to identify novel transcripts and this design method is not applicable to unannotated species. Alternatively, a whole-genome tiling microarray can be designed using only genomic sequences without gene annotations, and it can be used to detect novel RNA transcripts as well as known genes. The difficulty with tiling microarray design lies in the tradeoff between probe-specificity and coverage of the genome. Sequence comparison methods based on BLAST or similar software are commonly employed in microarray design, but they cannot precisely determine the subtle thermodynamic competition between probe targets and partially matched probe nontargets during hybridizations. Using the whole-genome thermodynamic analysis software PICKY to design tiling microarrays, we can achieve maximum whole-genome coverage allowable under the thermodynamic constraints of each target genome. The resulted tiling microarrays are thermodynamically optimal in the sense that all selected probes share the same melting temperature separation range between their targets and closest nontargets, and no additional probes can be added without violating the specificity of the microarray to the target genome. This new design method was used to create two whole-genome tiling microarrays for Escherichia coli MG1655 and Agrobacterium tumefaciens C58 and the experiment results validated the design.
Ranking of Prokaryotic Genomes Based on Maximization of Sortedness of Gene Lengths
Bolshoy, A; Salih, B; Cohen, I; Tatarinova, T
2014-01-01
How variations of gene lengths (some genes become longer than their predecessors, while other genes become shorter and the sizes of these factions are randomly different from organism to organism) depend on organismal evolution and adaptation is still an open question. We propose to rank the genomes according to lengths of their genes, and then find association between the genome rank and variousproperties, such as growth temperature, nucleotide composition, and pathogenicity. This approach reveals evolutionary driving factors. The main purpose of this study is to test effectiveness and robustness of several ranking methods. The selected method of evaluation is measuring of overall sortedness of the data. We have demonstrated that all considered methods give consistent results and Bubble Sort and Simulated Annealing achieve the highest sortedness. Also, Bubble Sort is considerably faster than the Simulated Annealing method. PMID:26146586
Ranking of Prokaryotic Genomes Based on Maximization of Sortedness of Gene Lengths.
Bolshoy, A; Salih, B; Cohen, I; Tatarinova, T
How variations of gene lengths (some genes become longer than their predecessors, while other genes become shorter and the sizes of these factions are randomly different from organism to organism) depend on organismal evolution and adaptation is still an open question. We propose to rank the genomes according to lengths of their genes, and then find association between the genome rank and variousproperties, such as growth temperature, nucleotide composition, and pathogenicity. This approach reveals evolutionary driving factors. The main purpose of this study is to test effectiveness and robustness of several ranking methods. The selected method of evaluation is measuring of overall sortedness of the data. We have demonstrated that all considered methods give consistent results and Bubble Sort and Simulated Annealing achieve the highest sortedness. Also, Bubble Sort is considerably faster than the Simulated Annealing method.
A Novel Tool for Microbial Genome Editing Using the Restriction-Modification System.
Bai, Hua; Deng, Aihua; Liu, Shuwen; Cui, Di; Qiu, Qidi; Wang, Laiyou; Yang, Zhao; Wu, Jie; Shang, Xiuling; Zhang, Yun; Wen, Tingyi
2018-01-19
Scarless genetic manipulation of genomes is an essential tool for biological research. The restriction-modification (R-M) system is a defense system in bacteria that protects against invading genomes on the basis of its ability to distinguish foreign DNA from self DNA. Here, we designed an R-M system-mediated genome editing (RMGE) technique for scarless genetic manipulation in different microorganisms. For bacteria with Type IV REase, an RMGE technique using the inducible DNA methyltransferase gene, bceSIIM (RMGE-bceSIIM), as the counter-selection cassette was developed to edit the genome of Escherichia coli. For bacteria without Type IV REase, an RMGE technique based on a restriction endonuclease (RMGE-mcrA) was established in Bacillus subtilis. These techniques were successfully used for gene deletion and replacement with nearly 100% counter-selection efficiencies, which were higher and more stable compared to conventional methods. Furthermore, precise point mutation without limiting sites was achieved in E. coli using RMGE-bceSIIM to introduce a single base mutation of A128C into the rpsL gene. In addition, the RMGE-mcrA technique was applied to delete the CAN1 gene in Saccharomyces cerevisiae DAY414 with 100% counter-selection efficiency. The effectiveness of the RMGE technique in E. coli, B. subtilis, and S. cerevisiae suggests the potential universal usefulness of this technique for microbial genome manipulation.
Tamate, Satoshi; Iwasaki, Watal M; Krysko, Kenneth L; Camposano, Brian J; Mori, Hideaki; Funayama, Ryo; Nakayama, Keiko; Makino, Takashi; Kawata, Masakado
2017-12-21
Invaded species often can rapidly expand and establish in novel environments through adaptive evolution, resulting in devastating effects on native communities. However, it is unclear if genetic variation at whole-genomic levels is actually reduced in the introduced populations and which genetic changes have occurred responding to adaptation to new environments. In the 1960s, Anolis carolinensis was introduced onto one of the Ogasawara Islands, Japan, and subsequently expanded its range rapidly throughout two of the islands. Morphological comparison showed that lower hindlimb length in the introduced populations tended to be longer than those in its native Florida populations. Using re-sequenced whole genomic data, we estimated that the effective population size at the time of introduction was actually small (less than 50). We also inferred putative genomic regions subject to natural selection after this introduction event using SweeD and a method based on Tajima's D, π and F ST . Five candidate genes that were potentially subject to selection were estimated by both methods. The results suggest that there were standing variations that could potentially contribute to adaptation to nonnative environments despite the founder population being small.
2010-01-01
Background The information provided by dense genome-wide markers using high throughput technology is of considerable potential in human disease studies and livestock breeding programs. Genome-wide association studies relate individual single nucleotide polymorphisms (SNP) from dense SNP panels to individual measurements of complex traits, with the underlying assumption being that any association is caused by linkage disequilibrium (LD) between SNP and quantitative trait loci (QTL) affecting the trait. Often SNP are in genomic regions of no trait variation. Whole genome Bayesian models are an effective way of incorporating this and other important prior information into modelling. However a full Bayesian analysis is often not feasible due to the large computational time involved. Results This article proposes an expectation-maximization (EM) algorithm called emBayesB which allows only a proportion of SNP to be in LD with QTL and incorporates prior information about the distribution of SNP effects. The posterior probability of being in LD with at least one QTL is calculated for each SNP along with estimates of the hyperparameters for the mixture prior. A simulated example of genomic selection from an international workshop is used to demonstrate the features of the EM algorithm. The accuracy of prediction is comparable to a full Bayesian analysis but the EM algorithm is considerably faster. The EM algorithm was accurate in locating QTL which explained more than 1% of the total genetic variation. A computational algorithm for very large SNP panels is described. Conclusions emBayesB is a fast and accurate EM algorithm for implementing genomic selection and predicting complex traits by mapping QTL in genome-wide dense SNP marker data. Its accuracy is similar to Bayesian methods but it takes only a fraction of the time. PMID:20969788
Via, Sara
2012-01-01
In allopatric populations, geographical separation simultaneously isolates the entire genome, allowing genetic divergence to accumulate virtually anywhere in the genome. In sympatric populations, however, the strong divergent selection required to overcome migration produces a genetic mosaic of divergent and non-divergent genomic regions. In some recent genome scans, each divergent genomic region has been interpreted as an independent incidence of migration/selection balance, such that the reduction of gene exchange is restricted to a few kilobases around each divergently selected gene. I propose an alternative mechanism, ‘divergence hitchhiking’ (DH), in which divergent selection can reduce gene exchange for several megabases around a gene under strong divergent selection. Not all genes/markers within a DH region are divergently selected, yet the entire region is protected to some degree from gene exchange, permitting genetic divergence from mechanisms other than divergent selection to accumulate secondarily. After contrasting DH and multilocus migration/selection balance (MM/SB), I outline a model in which genomic isolation at a given genomic location is jointly determined by DH and genome-wide effects of the progressive reduction in realized migration, then illustrate DH using data from several pairs of incipient species in the wild. PMID:22201174
Nguyen, Thanh-Tung; Huang, Joshua; Wu, Qingyao; Nguyen, Thuy; Li, Mark
2015-01-01
Single-nucleotide polymorphisms (SNPs) selection and identification are the most important tasks in Genome-wide association data analysis. The problem is difficult because genome-wide association data is very high dimensional and a large portion of SNPs in the data is irrelevant to the disease. Advanced machine learning methods have been successfully used in Genome-wide association studies (GWAS) for identification of genetic variants that have relatively big effects in some common, complex diseases. Among them, the most successful one is Random Forests (RF). Despite of performing well in terms of prediction accuracy in some data sets with moderate size, RF still suffers from working in GWAS for selecting informative SNPs and building accurate prediction models. In this paper, we propose to use a new two-stage quality-based sampling method in random forests, named ts-RF, for SNP subspace selection for GWAS. The method first applies p-value assessment to find a cut-off point that separates informative and irrelevant SNPs in two groups. The informative SNPs group is further divided into two sub-groups: highly informative and weak informative SNPs. When sampling the SNP subspace for building trees for the forest, only those SNPs from the two sub-groups are taken into account. The feature subspaces always contain highly informative SNPs when used to split a node at a tree. This approach enables one to generate more accurate trees with a lower prediction error, meanwhile possibly avoiding overfitting. It allows one to detect interactions of multiple SNPs with the diseases, and to reduce the dimensionality and the amount of Genome-wide association data needed for learning the RF model. Extensive experiments on two genome-wide SNP data sets (Parkinson case-control data comprised of 408,803 SNPs and Alzheimer case-control data comprised of 380,157 SNPs) and 10 gene data sets have demonstrated that the proposed model significantly reduced prediction errors and outperformed most existing the-state-of-the-art random forests. The top 25 SNPs in Parkinson data set were identified by the proposed model including four interesting genes associated with neurological disorders. The presented approach has shown to be effective in selecting informative sub-groups of SNPs potentially associated with diseases that traditional statistical approaches might fail. The new RF works well for the data where the number of case-control objects is much smaller than the number of SNPs, which is a typical problem in gene data and GWAS. Experiment results demonstrated the effectiveness of the proposed RF model that outperformed the state-of-the-art RFs, including Breiman's RF, GRRF and wsRF methods.
Lefébure, Tristan; Stanhope, Michael J
2007-01-01
Background The genus Streptococcus is one of the most diverse and important human and agricultural pathogens. This study employs comparative evolutionary analyses of 26 Streptococcus genomes to yield an improved understanding of the relative roles of recombination and positive selection in pathogen adaptation to their hosts. Results Streptococcus genomes exhibit extreme levels of evolutionary plasticity, with high levels of gene gain and loss during species and strain evolution. S. agalactiae has a large pan-genome, with little recombination in its core-genome, while S. pyogenes has a smaller pan-genome and much more recombination of its core-genome, perhaps reflecting the greater habitat, and gene pool, diversity for S. agalactiae compared to S. pyogenes. Core-genome recombination was evident in all lineages (18% to 37% of the core-genome judged to be recombinant), while positive selection was mainly observed during species differentiation (from 11% to 34% of the core-genome). Positive selection pressure was unevenly distributed across lineages and biochemical main role categories. S. suis was the lineage with the greatest level of positive selection pressure, the largest number of unique loci selected, and the largest amount of gene gain and loss. Conclusion Recombination is an important evolutionary force in shaping Streptococcus genomes, not only in the acquisition of significant portions of the genome as lineage specific loci, but also in facilitating rapid evolution of the core-genome. Positive selection, although undoubtedly a slower process, has nonetheless played an important role in adaptation of the core-genome of different Streptococcus species to different hosts. PMID:17475002
Machine learning applications in genetics and genomics.
Libbrecht, Maxwell W; Noble, William Stafford
2015-06-01
The field of machine learning, which aims to develop computer algorithms that improve with experience, holds promise to enable computers to assist humans in the analysis of large, complex data sets. Here, we provide an overview of machine learning applications for the analysis of genome sequencing data sets, including the annotation of sequence elements and epigenetic, proteomic or metabolomic data. We present considerations and recurrent challenges in the application of supervised, semi-supervised and unsupervised machine learning methods, as well as of generative and discriminative modelling approaches. We provide general guidelines to assist in the selection of these machine learning methods and their practical application for the analysis of genetic and genomic data sets.
LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms
Money, Daniel; Gardner, Kyle; Migicovsky, Zoë; Schwaninger, Heidi; Zhong, Gan-Yuan; Myles, Sean
2015-01-01
Obtaining genome-wide genotype data from a set of individuals is the first step in many genomic studies, including genome-wide association and genomic selection. All genotyping methods suffer from some level of missing data, and genotype imputation can be used to fill in the missing data and improve the power of downstream analyses. Model organisms like human and cattle benefit from high-quality reference genomes and panels of reference genotypes that aid in imputation accuracy. In nonmodel organisms, however, genetic and physical maps often are either of poor quality or are completely absent, and there are no panels of reference genotypes available. There is therefore a need for imputation methods designed specifically for nonmodel organisms in which genomic resources are poorly developed and marker order is unreliable or unknown. Here we introduce LinkImpute, a software package based on a k-nearest neighbor genotype imputation method, LD-kNNi, which is designed for unordered markers. No physical or genetic maps are required, and it is designed to work on unphased genotype data from heterozygous species. It exploits the fact that markers useful for imputation often are not physically close to the missing genotype but rather distributed throughout the genome. Using genotyping-by-sequencing data from diverse and heterozygous accessions of apples, grapes, and maize, we compare LD-kNNi with several genotype imputation methods and show that LD-kNNi is fast, comparable in accuracy to the best-existing methods, and exhibits the least bias in allele frequency estimates. PMID:26377960
Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq
Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru
2015-01-01
Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of aHIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only < 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593
Berry, Nadine Kaye; Bain, Nicole L; Enjeti, Anoop K; Rowlings, Philip
2014-01-01
Aim To evaluate the role of whole genome comparative genomic hybridisation microarray (array-CGH) in detecting genomic imbalances as compared to conventional karyotype (GTG-analysis) or myeloma specific fluorescence in situ hybridisation (FISH) panel in a diagnostic setting for plasma cell dyscrasia (PCD). Methods A myeloma-specific interphase FISH (i-FISH) panel was carried out on CD138 PC-enriched bone marrow (BM) from 20 patients having BM biopsies for evaluation of PCD. Whole genome array-CGH was performed on reference (control) and neoplastic (test patient) genomic DNA extracted from CD138 PC-enriched BM and analysed. Results Comparison of techniques demonstrated a much higher detection rate of genomic imbalances using array-CGH. Genomic imbalances were detected in 1, 19 and 20 patients using GTG-analysis, i-FISH and array-CGH, respectively. Genomic rearrangements were detected in one patient using GTG-analysis and seven patients using i-FISH, while none were detected using array-CGH. I-FISH was the most sensitive method for detecting gene rearrangements and GTG-analysis was the least sensitive method overall. All copy number aberrations observed in GTG-analysis were detected using array-CGH and i-FISH. Conclusions We show that array-CGH performed on CD138-enriched PCs significantly improves the detection of clinically relevant and possibly novel genomic abnormalities in PCD, and thus could be considered as a standard diagnostic technique in combination with IGH rearrangement i-FISH. PMID:23969274
Roux, Julien; Liu, Jialin; Robinson-Rechavi, Marc
2017-01-01
Abstract The evolutionary history of vertebrates is marked by three ancient whole-genome duplications: two successive rounds in the ancestor of vertebrates, and a third one specific to teleost fishes. Biased loss of most duplicates enriched the genome for specific genes, such as slow evolving genes, but this selective retention process is not well understood. To understand what drives the long-term preservation of duplicate genes, we characterized duplicated genes in terms of their expression patterns. We used a new method of expression enrichment analysis, TopAnat, applied to in situ hybridization data from thousands of genes from zebrafish and mouse. We showed that the presence of expression in the nervous system is a good predictor of a higher rate of retention of duplicate genes after whole-genome duplication. Further analyses suggest that purifying selection against the toxic effects of misfolded or misinteracting proteins, which is particularly strong in nonrenewing neural tissues, likely constrains the evolution of coding sequences of nervous system genes, leading indirectly to the preservation of duplicate genes after whole-genome duplication. Whole-genome duplications thus greatly contributed to the expansion of the toolkit of genes available for the evolution of profound novelties of the nervous system at the base of the vertebrate radiation. PMID:28981708
Mojo Hand, a TALEN design tool for genome editing applications.
Neff, Kevin L; Argue, David P; Ma, Alvin C; Lee, Han B; Clark, Karl J; Ekker, Stephen C
2013-01-16
Recent studies of transcription activator-like (TAL) effector domains fused to nucleases (TALENs) demonstrate enormous potential for genome editing. Effective design of TALENs requires a combination of selecting appropriate genetic features, finding pairs of binding sites based on a consensus sequence, and, in some cases, identifying endogenous restriction sites for downstream molecular genetic applications. We present the web-based program Mojo Hand for designing TAL and TALEN constructs for genome editing applications (http://www.talendesign.org). We describe the algorithm and its implementation. The features of Mojo Hand include (1) automatic download of genomic data from the National Center for Biotechnology Information, (2) analysis of any DNA sequence to reveal pairs of binding sites based on a user-defined template, (3) selection of restriction-enzyme recognition sites in the spacer between the TAL monomer binding sites including options for the selection of restriction enzyme suppliers, and (4) output files designed for subsequent TALEN construction using the Golden Gate assembly method. Mojo Hand enables the rapid identification of TAL binding sites for use in TALEN design. The assembly of TALEN constructs, is also simplified by using the TAL-site prediction program in conjunction with a spreadsheet management aid of reagent concentrations and TALEN formulation. Mojo Hand enables scientists to more rapidly deploy TALENs for genome editing applications.
A trait stacking system via intra-genomic homologous recombination.
Kumar, Sandeep; Worden, Andrew; Novak, Stephen; Lee, Ryan; Petolino, Joseph F
2016-11-01
A gene targeting method has been developed, which allows the conversion of 'breeding stacks', containing unlinked transgenes into a 'molecular stack' and thereby circumventing the breeding challenges associated with transgene segregation. A gene targeting method has been developed for converting two unlinked trait loci into a single locus transgene stack. The method utilizes intra-genomic homologous recombination (IGHR) between stably integrated target and donor loci which share sequence homology and nuclease cleavage sites whereby the donor contains a promoterless herbicide resistance transgene. Upon crossing with a zinc finger nuclease (ZFN)-expressing plant, double-strand breaks (DSB) are created in both the stably integrated target and donor loci. DSBs flanking the donor locus result in intra-genomic mobilization of a promoterless selectable marker-containing donor sequence, which can be utilized as a template for homology-directed repair of a concomitant DSB at the target locus resulting in a functional selectable marker via nuclease-mediated cassette exchange (NMCE). The method was successfully demonstrated in maize using a glyphosate tolerance gene as a donor whereby up to 3.3 % of the resulting progeny embryos cultured on selection medium regenerated plants with the donor sequence integrated into the target locus. The process could be extended to multiple cycles of trait stacking by virtue of a unique intron sequence homology for NMCE between the target and the donor loci. This is the first report that describes NMCE via IGHR, thereby enabling trait stacking using conventional crossing.
SuperDCA for genome-wide epistasis analysis.
Puranen, Santeri; Pesonen, Maiju; Pensar, Johan; Xu, Ying Ying; Lees, John A; Bentley, Stephen D; Croucher, Nicholas J; Corander, Jukka
2018-05-29
The potential for genome-wide modelling of epistasis has recently surfaced given the possibility of sequencing densely sampled populations and the emerging families of statistical interaction models. Direct coupling analysis (DCA) has previously been shown to yield valuable predictions for single protein structures, and has recently been extended to genome-wide analysis of bacteria, identifying novel interactions in the co-evolution between resistance, virulence and core genome elements. However, earlier computational DCA methods have not been scalable to enable model fitting simultaneously to 10 4 -10 5 polymorphisms, representing the amount of core genomic variation observed in analyses of many bacterial species. Here, we introduce a novel inference method (SuperDCA) that employs a new scoring principle, efficient parallelization, optimization and filtering on phylogenetic information to achieve scalability for up to 10 5 polymorphisms. Using two large population samples of Streptococcus pneumoniae, we demonstrate the ability of SuperDCA to make additional significant biological findings about this major human pathogen. We also show that our method can uncover signals of selection that are not detectable by genome-wide association analysis, even though our analysis does not require phenotypic measurements. SuperDCA, thus, holds considerable potential in building understanding about numerous organisms at a systems biological level.
A genome-wide scan for signatures of selection in Azeri and Khuzestani buffalo breeds.
Mokhber, Mahdi; Moradi-Shahrbabak, Mohammad; Sadeghi, Mostafa; Moradi-Shahrbabak, Hossein; Stella, Alessandra; Nicolzzi, Ezequiel; Rahmaninia, Javad; Williams, John L
2018-06-11
Identification of genomic regions that have been targets of selection may shed light on the genetic history of livestock populations and help to identify variation controlling commercially important phenotypes. The Azeri and Kuzestani buffalos are the most common indigenous Iranian breeds which have been subjected to divergent selection and are well adapted to completely different regions. Examining the genetic structure of these populations may identify genomic regions associated with adaptation to the different environments and production goals. A set of 385 water buffalo samples from Azeri (N = 262) and Khuzestani (N = 123) breeds were genotyped using the Axiom® Buffalo Genotyping 90 K Array. The unbiased fixation index method (F ST ) was used to detect signatures of selection. In total, 13 regions with outlier F ST values (0.1%) were identified. Annotation of these regions using the UMD3.1 Bos taurus Genome Assembly was performed to find putative candidate genes and QTLs within the selected regions. Putative candidate genes identified include FBXO9, NDFIP1, ACTR3, ARHGAP26, SERPINF2, BOLA-DRB3, BOLA-DQB, CLN8, and MYOM2. Candidate genes identified in regions potentially under selection were associated with physiological pathways including milk production, cytoskeleton organization, growth, metabolic function, apoptosis and domestication-related changes include immune and nervous system development. The QTL identified are involved in economically important traits in buffalo related to milk composition, udder structure, somatic cell count, meat quality, and carcass and body weight.
Liu, Lei; Ang, Keng Pee; Elliott, J A K; Kent, Matthew Peter; Lien, Sigbjørn; MacDonald, Danielle; Boulding, Elizabeth Grace
2017-03-01
Comparative genome scans can be used to identify chromosome regions, but not traits, that are putatively under selection. Identification of targeted traits may be more likely in recently domesticated populations under strong artificial selection for increased production. We used a North American Atlantic salmon 6K SNP dataset to locate genome regions of an aquaculture strain (Saint John River) that were highly diverged from that of its putative wild founder population (Tobique River). First, admixed individuals with partial European ancestry were detected using STRUCTURE and removed from the dataset. Outlier loci were then identified as those showing extreme differentiation between the aquaculture population and the founder population. All Arlequin methods identified an overlapping subset of 17 outlier loci, three of which were also identified by BayeScan. Many outlier loci were near candidate genes and some were near published quantitative trait loci (QTLs) for growth, appetite, maturity, or disease resistance. Parallel comparisons using a wild, nonfounder population (Stewiacke River) yielded only one overlapping outlier locus as well as a known maturity QTL. We conclude that genome scans comparing a recently domesticated strain with its wild founder population can facilitate identification of candidate genes for traits known to have been under strong artificial selection.
Taye, Mengistie; Lee, Wonseok; Caetano-Anolles, Kelsey; Dessie, Tadelle; Hanotte, Olivier; Mwai, Okeyo Ally; Kemp, Stephen; Cho, Seoae; Oh, Sung Jong; Lee, Hak-Kyo; Kim, Heebal
2017-12-01
As African indigenous cattle evolved in a hot tropical climate, they have developed an inherent thermotolerance; survival mechanisms include a light-colored and shiny coat, increased sweating, and cellular and molecular mechanisms to cope with high environmental temperature. Here, we report the positive selection signature of genes in African cattle breeds which contribute for their heat tolerance mechanisms. We compared the genomes of five indigenous African cattle breeds with the genomes of four commercial cattle breeds using cross-population composite likelihood ratio (XP-CLR) and cross-population extended haplotype homozygosity (XP-EHH) statistical methods. We identified 296 (XP-EHH) and 327 (XP-CLR) positively selected genes. Gene ontology analysis resulted in 41 biological process terms and six Kyoto Encyclopedia of Genes and Genomes pathways. Several genes and pathways were found to be involved in oxidative stress response, osmotic stress response, heat shock response, hair and skin properties, sweat gland development and sweating, feed intake and metabolism, and reproduction functions. The genes and pathways identified directly or indirectly contribute to the superior heat tolerance mechanisms in African cattle populations. The result will improve our understanding of the biological mechanisms of heat tolerance in African cattle breeds and opens an avenue for further study. © 2017 Japanese Society of Animal Science.
USDA-ARS?s Scientific Manuscript database
Genomic selection (GS) simultaneously incorporates dense SNP marker genotypes with phenotypic data from related animals to predict animal-specific genomic breeding value (GEBV), which circumvents the need to measure the disease phenotype in potential breeders. Marker assisted selection (MAS) involv...
A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data.
Bertl, Johanna; Guo, Qianyun; Juul, Malene; Besenbacher, Søren; Nielsen, Morten Muhlig; Hornshøj, Henrik; Pedersen, Jakob Skou; Hobolth, Asger
2018-04-19
Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration. To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance in predicting mutation rate for both regional based models and site-specific models. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than regional based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures. We find that our site-specific multinomial regression model outperforms the regional based models. The possibility of including genomic variables on different scales and patient specific variables makes it a versatile framework for studying different mutational mechanisms. Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.
McElroy, Michel S; Navarro, Alberto J R; Mustiga, Guiliana; Stack, Conrad; Gezan, Salvador; Peña, Geover; Sarabia, Widem; Saquicela, Diego; Sotomayor, Ignacio; Douglas, Gavin M; Migicovsky, Zoë; Amores, Freddy; Tarqui, Omar; Myles, Sean; Motamayor, Juan C
2018-01-01
Cacao ( Theobroma cacao ) is a globally important crop, and its yield is severely restricted by disease. Two of the most damaging diseases, witches' broom disease (WBD) and frosty pod rot disease (FPRD), are caused by a pair of related fungi: Moniliophthora perniciosa and Moniliophthora roreri , respectively. Resistant cultivars are the most effective long-term strategy to address Moniliophthora diseases, but efficiently generating resistant and productive new cultivars will require robust methods for screening germplasm before field testing. Marker-assisted selection (MAS) and genomic selection (GS) provide two potential avenues for predicting the performance of new genotypes, potentially increasing the selection gain per unit time. To test the effectiveness of these two approaches, we performed a genome-wide association study (GWAS) and GS on three related populations of cacao in Ecuador genotyped with a 15K single nucleotide polymorphism (SNP) microarray for three measures of WBD infection (vegetative broom, cushion broom, and chirimoya pod), one of FPRD (monilia pod) and two productivity traits (total fresh weight of pods and % healthy pods produced). GWAS yielded several SNPs associated with disease resistance in each population, but none were significantly correlated with the same trait in other populations. Genomic selection, using one population as a training set to estimate the phenotypes of the remaining two (composed of different families), varied among traits, from a mean prediction accuracy of 0.46 (vegetative broom) to 0.15 (monilia pod), and varied between training populations. Simulations demonstrated that selecting seedlings using GWAS markers alone generates no improvement over selecting at random, but that GS improves the selection process significantly. Our results suggest that the GWAS markers discovered here are not sufficiently predictive across diverse germplasm to be useful for MAS, but that using all markers in a GS framework holds substantial promise in accelerating disease-resistance in cacao.
Application of Genomic In Situ Hybridization in Horticultural Science
Ramzan, Fahad; Lim, Ki-Byung
2017-01-01
Molecular cytogenetic techniques, such as in situ hybridization methods, are admirable tools to analyze the genomic structure and function, chromosome constituents, recombination patterns, alien gene introgression, genome evolution, aneuploidy, and polyploidy and also genome constitution visualization and chromosome discrimination from different genomes in allopolyploids of various horticultural crops. Using GISH advancement as multicolor detection is a significant approach to analyze the small and numerous chromosomes in fruit species, for example, Diospyros hybrids. This analytical technique has proved to be the most exact and effective way for hybrid status confirmation and helps remarkably to distinguish donor parental genomes in hybrids such as Clivia, Rhododendron, and Lycoris ornamental hybrids. The genome characterization facilitates in hybrid selection having potential desirable characteristics during the early hybridization breeding, as this technique expedites to detect introgressed sequence chromosomes. This review study epitomizes applications and advancements of genomic in situ hybridization (GISH) techniques in horticultural plants. PMID:28459054
Genome-wide selection components analysis in a fish with male pregnancy.
Flanagan, Sarah P; Jones, Adam G
2017-04-01
A major goal of evolutionary biology is to identify the genome-level targets of natural and sexual selection. With the advent of next-generation sequencing, whole-genome selection components analysis provides a promising avenue in the search for loci affected by selection in nature. Here, we implement a genome-wide selection components analysis in the sex role reversed Gulf pipefish, Syngnathus scovelli. Our approach involves a double-digest restriction-site associated DNA sequencing (ddRAD-seq) technique, applied to adult females, nonpregnant males, pregnant males, and their offspring. An F ST comparison of allele frequencies among these groups reveals 47 genomic regions putatively experiencing sexual selection, as well as 468 regions showing a signature of differential viability selection between males and females. A complementary likelihood ratio test identifies similar patterns in the data as the F ST analysis. Sexual selection and viability selection both tend to favor the rare alleles in the population. Ultimately, we conclude that genome-wide selection components analysis can be a useful tool to complement other approaches in the effort to pinpoint genome-level targets of selection in the wild. © 2017 The Author(s). Evolution © 2017 The Society for the Study of Evolution.
Lin, Zibei; Shi, Fan; Hayes, Ben J; Daetwyler, Hans D
2017-05-01
Heuristic genomic inbreeding controls reduce inbreeding in genomic breeding schemes without reducing genetic gain. Genomic selection is increasingly being implemented in plant breeding programs to accelerate genetic gain of economically important traits. However, it may cause significant loss of genetic diversity when compared with traditional schemes using phenotypic selection. We propose heuristic strategies to control the rate of inbreeding in outbred plants, which can be categorised into three types: controls during mate allocation, during selection, and simultaneous selection and mate allocation. The proposed mate allocation measure GminF allocates two or more parents for mating in mating groups that minimise coancestry using a genomic relationship matrix. Two types of relationship-adjusted genomic breeding values for parent selection candidates ([Formula: see text]) and potential offspring ([Formula: see text]) are devised to control inbreeding during selection and even enabling simultaneous selection and mate allocation. These strategies were tested in a case study using a simulated perennial ryegrass breeding scheme. As compared to the genomic selection scheme without controls, all proposed strategies could significantly decrease inbreeding while achieving comparable genetic gain. In particular, the scenario using [Formula: see text] in simultaneous selection and mate allocation reduced inbreeding to one-third of the original genomic selection scheme. The proposed strategies are readily applicable in any outbred plant breeding program.
Identifying Loci Under Selection Against Gene Flow in Isolation-with-Migration Models
Sousa, Vitor C.; Carneiro, Miguel; Ferrand, Nuno; Hey, Jody
2013-01-01
When divergence occurs in the presence of gene flow, there can arise an interesting dynamic in which selection against gene flow, at sites associated with population-specific adaptations or genetic incompatibilities, can cause net gene flow to vary across the genome. Loci linked to sites under selection may experience reduced gene flow and may experience genetic bottlenecks by the action of nearby selective sweeps. Data from histories such as these may be poorly fitted by conventional neutral model approaches to demographic inference, which treat all loci as equally subject to forces of genetic drift and gene flow. To allow for demographic inference in the face of such histories, as well as the identification of loci affected by selection, we developed an isolation-with-migration model that explicitly provides for variation among genomic regions in migration rates and/or rates of genetic drift. The method allows for loci to fall into any of multiple groups, each characterized by a different set of parameters, thus relaxing the assumption that all loci share the same demography. By grouping loci, the method can be applied to data with multiple loci and still have tractable dimensionality and statistical power. We studied the performance of the method using simulated data, and we applied the method to study the divergence of two subspecies of European rabbits (Oryctolagus cuniculus). PMID:23457232
Genome engineering using a synthetic gene circuit in Bacillus subtilis.
Jeong, Da-Eun; Park, Seung-Hwan; Pan, Jae-Gu; Kim, Eui-Joong; Choi, Soo-Keun
2015-03-31
Genome engineering without leaving foreign DNA behind requires an efficient counter-selectable marker system. Here, we developed a genome engineering method in Bacillus subtilis using a synthetic gene circuit as a counter-selectable marker system. The system contained two repressible promoters (B. subtilis xylA (Pxyl) and spac (Pspac)) and two repressor genes (lacI and xylR). Pxyl-lacI was integrated into the B. subtilis genome with a target gene containing a desired mutation. The xylR and Pspac-chloramphenicol resistant genes (cat) were located on a helper plasmid. In the presence of xylose, repression of XylR by xylose induced LacI expression, the LacIs repressed the Pspac promoter and the cells become chloramphenicol sensitive. Thus, to survive in the presence of chloramphenicol, the cell must delete Pxyl-lacI by recombination between the wild-type and mutated target genes. The recombination leads to mutation of the target gene. The remaining helper plasmid was removed easily under the chloramphenicol absent condition. In this study, we showed base insertion, deletion and point mutation of the B. subtilis genome without leaving any foreign DNA behind. Additionally, we successfully deleted a 2-kb gene (amyE) and a 38-kb operon (ppsABCDE). This method will be useful to construct designer Bacillus strains for various industrial applications. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Azevedo Peixoto, Leonardo de; Laviola, Bruno Galvêas; Alves, Alexandre Alonso; Rosado, Tatiana Barbosa; Bhering, Leonardo Lopes
2017-01-01
Genomic wide selection is a promising approach for improving the selection accuracy in plant breeding, particularly in species with long life cycles, such as Jatropha. Therefore, the objectives of this study were to estimate the genetic parameters for grain yield (GY) and the weight of 100 seeds (W100S) using restricted maximum likelihood (REML); to compare the performance of GWS methods to predict GY and W100S; and to estimate how many markers are needed to train the GWS model to obtain the maximum accuracy. Eight GWS models were compared in terms of predictive ability. The impact that the marker density had on the predictive ability was investigated using a varying number of markers, from 2 to 1,248. Because the genetic variance between evaluated genotypes was significant, it was possible to obtain selection gain. All of the GWS methods tested in this study can be used to predict GY and W100S in Jatropha. A training model fitted using 1,000 and 800 markers is sufficient to capture the maximum genetic variance and, consequently, maximum prediction ability of GY and W100S, respectively. This study demonstrated the applicability of genome-wide prediction to identify useful genetic sources of GY and W100S for Jatropha breeding. Further research is needed to confirm the applicability of the proposed approach to other complex traits.
Beyond the standard plate count: genomic views into microbial food ecology
USDA-ARS?s Scientific Manuscript database
Food spoilage is a complex process that involves multiple species with specific niches and metabolic processes; bacterial culturing techniques are the traditional methods for identifying the microbes responsible. These culture-dependent methods may be considered selective, targeting the isolation of...
Gourlay, Louise J; Peano, Clelia; Deantonio, Cecilia; Perletti, Lucia; Pietrelli, Alessandro; Villa, Riccardo; Matterazzo, Elena; Lassaux, Patricia; Santoro, Claudio; Puccio, Simone; Sblattero, Daniele; Bolognesi, Martino
2015-11-01
The 1.8 Å resolution crystal structure of a conserved domain of the potential Burkholderia pseudomallei antigen and trimeric autotransporter BPSL2063 is presented as a structural vaccinology target for melioidosis vaccine development. Since BPSL2063 (1090 amino acids) hosts only one conserved domain, and the expression/purification of the full-length protein proved to be problematic, a domain-filtering library was generated using β-lactamase as a reporter gene to select further BPSL2063 domains. As a result, two domains (D1 and D2) were identified and produced in soluble form in Escherichia coli. Furthermore, as a general tool, a genomic open reading frame-filtering library from the B. pseudomallei genome was also constructed to facilitate the selection of domain boundaries from the entire ORFeome. Such an approach allowed the selection of three potential protein antigens that were also produced in soluble form. The results imply the further development of ORF-filtering methods as a tool in protein-based research to improve the selection and production of soluble proteins or domains for downstream applications such as X-ray crystallography.
Applications of Genomic Selection in Breeding Wheat for Rust Resistance.
Ornella, Leonardo; González-Camacho, Juan Manuel; Dreisigacker, Susanne; Crossa, Jose
2017-01-01
There are a lot of methods developed to predict untested phenotypes in schemes commonly used in genomic selection (GS) breeding. The use of GS for predicting disease resistance has its own particularities: (a) most populations shows additivity in quantitative adult plant resistance (APR); (b) resistance needs effective combinations of major and minor genes; and (c) phenotype is commonly expressed in ordinal categorical traits, whereas most parametric applications assume that the response variable is continuous and normally distributed. Machine learning methods (MLM) can take advantage of examples (data) that capture characteristics of interest from an unknown underlying probability distribution (i.e., data-driven). We introduce some state-of-the-art MLM capable to predict rust resistance in wheat. We also present two parametric R packages for the reader to be able to compare.
Hodzic, Jasin; Gurbeta, Lejla; Omanovic-Miklicanin, Enisa; Badnjevic, Almir
2017-01-01
Introduction: Major advancements in DNA sequencing methods introduced in the first decade of the new millennium initiated a rapid expansion of sequencing studies, which yielded a tremendous amount of DNA sequence data, including whole sequenced genomes of various species, including plants. A set of novel sequencing platforms, often collectively named as “next-generation sequencing” (NGS) completely transformed the life sciences, by allowing extensive throughput, while greatly reducing the necessary time, labor and cost of any sequencing endeavor. Purpose: of this paper is to present an overview NGS platforms used to produce the current compendium of published draft genomes of various plants, namely the Roche/454, ABI/SOLiD, and Solexa/Illumina, and to determine the most frequently used platform for the whole genome sequencing of plants in light of genotypization of immortelle plant. Materials and methods: 45 papers were selected (with 47 presented plant genome draft sequences), and utilized sequencing techniques and NGS platforms (Roche/454, ABI/SOLiD and Illumina/Solexa) in selected papers were determined. Subsequently, frequency of usage of each platform or combination of platforms was calculated. Results: Illumina/Solexa platforms are by used either as sole sequencing tool in 40.42% of published genomes, or in combination with other platforms - additional 48.94% of published genomes, followed by Roche/454 platforms, used in combination with traditional Sanger sequencing method (10.64%), and never as a sole tool. ABI/SOLiD was only used in combination with Illumina/Solexa and Roche/454 in 4.25% of publications. Conclusions: Illumina/Solexa platforms are by far most preferred by researchers, most probably due to most affordable sequencing costs. Taking into consideration the current economic situation in the Balkans region, Illumina Solexa is the best (if not the only) platform choice if the sequencing of immortelle plant (Helichrysium arenarium) is to be performed by the researchers in this region. PMID:28974852
Barría, Agustín; Christensen, Kris A.; Yoshida, Grazyella M.; Correa, Katharina; Jedlicki, Ana; Lhorente, Jean P.; Davidson, William S.; Yáñez, José M.
2018-01-01
Piscirickettsia salmonis is one of the main infectious diseases affecting coho salmon (Oncorhynchus kisutch) farming, and current treatments have been ineffective for the control of this disease. Genetic improvement for P. salmonis resistance has been proposed as a feasible alternative for the control of this infectious disease in farmed fish. Genotyping by sequencing (GBS) strategies allow genotyping of hundreds of individuals with thousands of single nucleotide polymorphisms (SNPs), which can be used to perform genome wide association studies (GWAS) and predict genetic values using genome-wide information. We used double-digest restriction-site associated DNA (ddRAD) sequencing to dissect the genetic architecture of resistance against P. salmonis in a farmed coho salmon population and to identify molecular markers associated with the trait. We also evaluated genomic selection (GS) models in order to determine the potential to accelerate the genetic improvement of this trait by means of using genome-wide molecular information. A total of 764 individuals from 33 full-sib families (17 highly resistant and 16 highly susceptible) were experimentally challenged against P. salmonis and their genotypes were assayed using ddRAD sequencing. A total of 9,389 SNPs markers were identified in the population. These markers were used to test genomic selection models and compare different GWAS methodologies for resistance measured as day of death (DD) and binary survival (BIN). Genomic selection models showed higher accuracies than the traditional pedigree-based best linear unbiased prediction (PBLUP) method, for both DD and BIN. The models showed an improvement of up to 95% and 155% respectively over PBLUP. One SNP related with B-cell development was identified as a potential functional candidate associated with resistance to P. salmonis defined as DD. PMID:29440129
Rolf, Megan M; Taylor, Jeremy F; Schnabel, Robert D; McKay, Stephanie D; McClure, Matthew C; Northcutt, Sally L; Kerley, Monty S; Weaber, Robert L
2010-04-19
Molecular estimates of breeding value are expected to increase selection response due to improvements in the accuracy of selection and a reduction in generation interval, particularly for traits that are difficult or expensive to record or are measured late in life. Several statistical methods for incorporating molecular data into breeding value estimation have been proposed, however, most studies have utilized simulated data in which the generated linkage disequilibrium may not represent the targeted livestock population. A genomic relationship matrix was developed for 698 Angus steers and 1,707 Angus sires using 41,028 single nucleotide polymorphisms and breeding values were estimated using feed efficiency phenotypes (average daily feed intake, residual feed intake, and average daily gain) recorded on the steers. The number of SNPs needed to accurately estimate a genomic relationship matrix was evaluated in this population. Results were compared to estimates produced from pedigree-based mixed model analysis of 862 Angus steers with 34,864 identified paternal relatives but no female ancestors. Estimates of additive genetic variance and breeding value accuracies were similar for AFI and RFI using the numerator and genomic relationship matrices despite fewer animals in the genomic analysis. Bootstrap analyses indicated that 2,500-10,000 markers are required for robust estimation of genomic relationship matrices in cattle. This research shows that breeding values and their accuracies may be estimated for commercially important sires for traits recorded in experimental populations without the need for pedigree data to establish identity by descent between members of the commercial and experimental populations when at least 2,500 SNPs are available for the generation of a genomic relationship matrix.
Tollenaere, C; Jacquet, S; Ivanova, S; Loiseau, A; Duplantier, J-M; Streiff, R; Brouat, C
2013-01-01
Genome scans using amplified fragment length polymorphism (AFLP) markers became popular in nonmodel species within the last 10 years, but few studies have tried to characterize the anonymous outliers identified. This study follows on from an AFLP genome scan in the black rat (Rattus rattus), the reservoir of plague (Yersinia pestis infection) in Madagascar. We successfully sequenced 17 of the 22 markers previously shown to be potentially affected by plague-mediated selection and associated with a plague resistance phenotype. Searching these sequences in the genome of the closely related species Rattus norvegicus assigned them to 14 genomic regions, revealing a random distribution of outliers in the genome (no clustering). We compared these results with those of an in silico AFLP study of the R. norvegicus genome, which showed that outlier sequences could not have been inferred by this method in R. rattus (only four of the 15 sequences were predicted). However, in silico analysis allowed the prediction of AFLP markers distribution and the estimation of homoplasy rates, confirming its potential utility for designing AFLP studies in nonmodel species. The 14 genomic regions surrounding AFLP outliers (less than 300 kb from the marker) contained 75 genes encoding proteins of known function, including nine involved in immune function and pathogen defence. We identified the two interleukin 1 genes (Il1a and Il1b) that share homology with an antigen of Y. pestis, as the best candidates for genes subject to plague-mediated natural selection. At least six other genes known to be involved in proinflammatory pathways may also be affected by plague-mediated selection. © 2012 Blackwell Publishing Ltd.
Signatures of selection in tilapia revealed by whole genome resequencing.
Xia, Jun Hong; Bai, Zhiyi; Meng, Zining; Zhang, Yong; Wang, Le; Liu, Feng; Jing, Wu; Wan, Zi Yi; Li, Jiale; Lin, Haoran; Yue, Gen Hua
2015-09-16
Natural selection and selective breeding for genetic improvement have left detectable signatures within the genome of a species. Identification of selection signatures is important in evolutionary biology and for detecting genes that facilitate to accelerate genetic improvement. However, selection signatures, including artificial selection and natural selection, have only been identified at the whole genome level in several genetically improved fish species. Tilapia is one of the most important genetically improved fish species in the world. Using next-generation sequencing, we sequenced the genomes of 47 tilapia individuals. We identified a total of 1.43 million high-quality SNPs and found that the LD block sizes ranged from 10-100 kb in tilapia. We detected over a hundred putative selective sweep regions in each line of tilapia. Most selection signatures were located in non-coding regions of the tilapia genome. The Wnt signaling, gonadotropin-releasing hormone receptor and integrin signaling pathways were under positive selection in all improved tilapia lines. Our study provides a genome-wide map of genetic variation and selection footprints in tilapia, which could be important for genetic studies and accelerating genetic improvement of tilapia.
Evolutionary and comparative analyses of the soybean genome
Cannon, Steven B.; Shoemaker, Randy C.
2012-01-01
The soybean genome assembly has been available since the end of 2008. Significant features of the genome include large, gene-poor, repeat-dense pericentromeric regions, spanning roughly 57% of the genome sequence; a relatively large genome size of ~1.15 billion bases; remnants of a genome duplication that occurred ~13 million years ago (Mya); and fainter remnants of older polyploidies that occurred ~58 Mya and >130 Mya. The genome sequence has been used to identify the genetic basis for numerous traits, including disease resistance, nutritional characteristics, and developmental features. The genome sequence has provided a scaffold for placement of many genomic feature elements, both from within soybean and from related species. These may be accessed at several websites, including http://www.phytozome.net, http://soybase.org, http://comparative-legumes.org, and http://www.legumebase.brc.miyazaki-u.ac.jp. The taxonomic position of soybean in the Phaseoleae tribe of the legumes means that there are approximately two dozen other beans and relatives that have undergone independent domestication, and which may have traits that will be useful for transfer to soybean. Methods of translating information between species in the Phaseoleae range from design of markers for marker assisted selection, to transformation with Agrobacterium or with other experimental transformation methods. PMID:23136483
Wragg, David; Marti-Marimon, Maria; Basso, Benjamin; Bidanel, Jean-Pierre; Labarthe, Emmanuelle; Bouchez, Olivier; Le Conte, Yves; Vignal, Alain
2016-06-03
Four main evolutionary lineages of A. mellifera have been described including eastern Europe (C) and western and northern Europe (M). Many apiculturists prefer bees from the C lineage due to their docility and high productivity. In France, the routine importation of bees from the C lineage has resulted in the widespread admixture of bees from the M lineage. The haplodiploid nature of the honeybee Apis mellifera, and its small genome size, permits affordable and extensive genomics studies. As a pilot study of a larger project to characterise French honeybee populations, we sequenced 60 drones sampled from two commercial populations managed for the production of honey and royal jelly. Results indicate a C lineage origin, whilst mitochondrial analysis suggests two drones originated from the O lineage. Analysis of heterozygous SNPs identified potential copy number variants near to genes encoding odorant binding proteins and several cytochrome P450 genes. Signatures of selection were detected using the hapFLK haplotype-based method, revealing several regions under putative selection for royal jelly production. The framework developed during this study will be applied to a broader sampling regime, allowing the genetic diversity of French honeybees to be characterised in detail.
Wragg, David; Marti-Marimon, Maria; Basso, Benjamin; Bidanel, Jean-Pierre; Labarthe, Emmanuelle; Bouchez, Olivier; Le Conte, Yves; Vignal, Alain
2016-01-01
Four main evolutionary lineages of A. mellifera have been described including eastern Europe (C) and western and northern Europe (M). Many apiculturists prefer bees from the C lineage due to their docility and high productivity. In France, the routine importation of bees from the C lineage has resulted in the widespread admixture of bees from the M lineage. The haplodiploid nature of the honeybee Apis mellifera, and its small genome size, permits affordable and extensive genomics studies. As a pilot study of a larger project to characterise French honeybee populations, we sequenced 60 drones sampled from two commercial populations managed for the production of honey and royal jelly. Results indicate a C lineage origin, whilst mitochondrial analysis suggests two drones originated from the O lineage. Analysis of heterozygous SNPs identified potential copy number variants near to genes encoding odorant binding proteins and several cytochrome P450 genes. Signatures of selection were detected using the hapFLK haplotype-based method, revealing several regions under putative selection for royal jelly production. The framework developed during this study will be applied to a broader sampling regime, allowing the genetic diversity of French honeybees to be characterised in detail. PMID:27255426
Jonas, Elisabeth; de Koning, Dirk-Jan
2015-01-01
Genomic selection is a promising development in agriculture, aiming improved production by exploiting molecular genetic markers to design novel breeding programs and to develop new markers-based models for genetic evaluation. It opens opportunities for research, as novel algorithms and lab methodologies are developed. Genomic selection can be applied in many breeds and species. Further research on the implementation of genomic selection (GS) in breeding programs is highly desirable not only for the common good, but also the private sector (breeding companies). It has been projected that this approach will improve selection routines, especially in species with long reproduction cycles, late or sex-limited or expensive trait recording and for complex traits. The task of integrating GS into existing breeding programs is, however, not straightforward. Despite successful integration into breeding programs for dairy cattle, it has yet to be shown how much emphasis can be given to the genomic information and how much additional phenotypic information is needed from new selection candidates. Genomic selection is already part of future planning in many breeding companies of pigs and beef cattle among others, but further research is needed to fully estimate how effective the use of genomic information will be for the prediction of the performance of future breeding stock. Genomic prediction of production in crossbreeding and across-breed schemes, costs and choice of individuals for genotyping are reasons for a reluctance to fully rely on genomic information for selection decisions. Breeding objectives are highly dependent on the industry and the additional gain when using genomic information has to be considered carefully. This review synthesizes some of the suggested approaches in selected livestock species including cattle, pig, chicken, and fish. It outlines tasks to help understanding possible consequences when applying genomic information in breeding scenarios. PMID:25750652
Jonas, Elisabeth; de Koning, Dirk-Jan
2015-01-01
Genomic selection is a promising development in agriculture, aiming improved production by exploiting molecular genetic markers to design novel breeding programs and to develop new markers-based models for genetic evaluation. It opens opportunities for research, as novel algorithms and lab methodologies are developed. Genomic selection can be applied in many breeds and species. Further research on the implementation of genomic selection (GS) in breeding programs is highly desirable not only for the common good, but also the private sector (breeding companies). It has been projected that this approach will improve selection routines, especially in species with long reproduction cycles, late or sex-limited or expensive trait recording and for complex traits. The task of integrating GS into existing breeding programs is, however, not straightforward. Despite successful integration into breeding programs for dairy cattle, it has yet to be shown how much emphasis can be given to the genomic information and how much additional phenotypic information is needed from new selection candidates. Genomic selection is already part of future planning in many breeding companies of pigs and beef cattle among others, but further research is needed to fully estimate how effective the use of genomic information will be for the prediction of the performance of future breeding stock. Genomic prediction of production in crossbreeding and across-breed schemes, costs and choice of individuals for genotyping are reasons for a reluctance to fully rely on genomic information for selection decisions. Breeding objectives are highly dependent on the industry and the additional gain when using genomic information has to be considered carefully. This review synthesizes some of the suggested approaches in selected livestock species including cattle, pig, chicken, and fish. It outlines tasks to help understanding possible consequences when applying genomic information in breeding scenarios.
Signatures of Long-Term Balancing Selection in Human Genomes
de Filippo, Cesare; Teixeira, João C; Schmidt, Joshua M; Kleinert, Philip; Meyer, Diogo; Andrés, Aida M
2018-01-01
Abstract Balancing selection maintains advantageous diversity in populations through various mechanisms. Although extensively explored from a theoretical perspective, an empirical understanding of its prevalence and targets lags behind our knowledge of positive selection. Here, we describe the Non-central Deviation (NCD), a simple yet powerful statistic to detect long-term balancing selection (LTBS) that quantifies how close frequencies are to expectations under LTBS, and provides the basis for a neutrality test. NCD can be applied to a single locus or genomic data, and can be implemented considering only polymorphisms (NCD1) or also considering fixed differences with respect to an outgroup (NCD2) species. Incorporating fixed differences improves power, and NCD2 has higher power to detect LTBS in humans under different frequencies of the balanced allele(s) than other available methods. Applied to genome-wide data from African and European human populations, in both cases using chimpanzee as an outgroup, NCD2 shows that, albeit not prevalent, LTBS affects a sizable portion of the genome: ∼0.6% of analyzed genomic windows and 0.8% of analyzed positions. Significant windows (P < 0.0001) contain 1.6% of SNPs in the genome, which disproportionally fall within exons and change protein sequence, but are not enriched in putatively regulatory sites. These windows overlap ∼8% of the protein-coding genes, and these have larger number of transcripts than expected by chance even after controlling for gene length. Our catalog includes known targets of LTBS but a majority of them (90%) are novel. As expected, immune-related genes are among those with the strongest signatures, although most candidates are involved in other biological functions, suggesting that LTBS potentially influences diverse human phenotypes. PMID:29608730
Jenko, Janez; Gorjanc, Gregor; Cleveland, Matthew A; Varshney, Rajeev K; Whitelaw, C Bruce A; Woolliams, John A; Hickey, John M
2015-07-02
Genome editing (GE) is a method that enables specific nucleotides in the genome of an individual to be changed. To date, use of GE in livestock has focussed on simple traits that are controlled by a few quantitative trait nucleotides (QTN) with large effects. The aim of this study was to evaluate the potential of GE to improve quantitative traits that are controlled by many QTN, referred to here as promotion of alleles by genome editing (PAGE). Multiple scenarios were simulated to test alternative PAGE strategies for a quantitative trait. They differed in (i) the number of edits per sire (0 to 100), (ii) the number of edits per generation (0 to 500), and (iii) the extent of use of PAGE (i.e. editing all sires or only a proportion of them). The base line scenario involved selecting individuals on true breeding values (i.e., genomic selection only (GS only)-genomic selection with perfect accuracy) for several generations. Alternative scenarios complemented this base line scenario with PAGE (GS + PAGE). The effect of different PAGE strategies was quantified by comparing response to selection, changes in allele frequencies, the number of distinct QTN edited, the sum of absolute effects of the edited QTN per generation, and inbreeding. Response to selection after 20 generations was between 1.08 and 4.12 times higher with GS + PAGE than with GS only. Increases in response to selection were larger with more edits per sire and more sires edited. When the total resources for PAGE were limited, editing a few sires for many QTN resulted in greater response to selection and inbreeding compared to editing many sires for a few QTN. Between the scenarios GS only and GS + PAGE, there was little difference in the average change in QTN allele frequencies, but there was a major difference for the QTN with the largest effects. The sum of the effects of the edited QTN decreased across generations. This study showed that PAGE has great potential for application in livestock breeding programs, but inbreeding needs to be managed.
[Genetic system for maintaining the mitochondrial human genome in yeast Yarrowia lipolytica].
Isakova, E P; Deryabina, Yu I; Velyakova, A V; Biryukova, J K; Teplova, V V; Shevelev, A B
2016-01-01
For the first time, the possibility of maintaining an intact human mitochondrial genome in a heterologous system in the mitochondria of yeast Yarrowia lipolytica is shown. A method for introducing directional changes into the structure of the mitochondrial human genome replicating in Y. lipolytica by an artificially induced ability of yeast mitochondria for homologous recombination is proposed. A method of introducing and using phenotypic selection markers for the presence or absence of defects in genes tRNA-Lys and tRNA-Leu of the mitochondrial genome is developed. The proposed system can be used to correct harmful mutations of the human mitochondrial genome associated with mitochondrial diseases and for preparative amplification of intact mitochondrial DNA with an adjusted sequence in yeast cells. The applicability of the new system for the correction of mutations in the genes of Lys- and Leu-specific tRNAs of the human mitochondrial genome associated with serious and widespread human mitochondrial diseases such as myoclonic epilepsy with lactic acidosis (MELAS) and myoclonic epilepsy with ragged-red fibers (MERRF) is shown.
Empirical Performance of Cross-Validation With Oracle Methods in a Genomics Context
Martinez, Josue G.; Carroll, Raymond J.; Müller, Samuel; Sampson, Joshua N.; Chatterjee, Nilanjan
2012-01-01
When employing model selection methods with oracle properties such as the smoothly clipped absolute deviation (SCAD) and the Adaptive Lasso, it is typical to estimate the smoothing parameter by m-fold cross-validation, for example, m = 10. In problems where the true regression function is sparse and the signals large, such cross-validation typically works well. However, in regression modeling of genomic studies involving Single Nucleotide Polymorphisms (SNP), the true regression functions, while thought to be sparse, do not have large signals. We demonstrate empirically that in such problems, the number of selected variables using SCAD and the Adaptive Lasso, with 10-fold cross-validation, is a random variable that has considerable and surprising variation. Similar remarks apply to non-oracle methods such as the Lasso. Our study strongly questions the suitability of performing only a single run of m-fold cross-validation with any oracle method, and not just the SCAD and Adaptive Lasso. PMID:22347720
Genome-wide scan for selection signatures in six cattle breeds in South Africa.
Makina, Sithembile O; Muchadeyi, Farai C; van Marle-Köster, Este; Taylor, Jerry F; Makgahlela, Mahlako L; Maiwashe, Azwihangwisi
2015-11-26
The detection of selection signatures in breeds of livestock species can contribute to the identification of regions of the genome that are, or have been, functionally important and, as a consequence, have been targeted by selection. This study used two approaches to detect signatures of selection within and between six cattle breeds in South Africa, including Afrikaner (n = 44), Nguni (n = 54), Drakensberger (n = 47), Bonsmara (n = 44), Angus (n = 31) and Holstein (n = 29). The first approach was based on the detection of genomic regions in which haplotypes have been driven towards complete fixation within breeds. The second approach identified regions of the genome that had very different allele frequencies between populations (F ST). Forty-seven candidate genomic regions were identified as harbouring putative signatures of selection using both methods. Twelve of these candidate selected regions were shared among the breeds and ten were validated by previous studies. Thirty-three of these regions were successfully annotated and candidate genes were identified. Among these genes the keratin genes (KRT222, KRT24, KRT25, KRT26, and KRT27) and one heat shock protein gene (HSPB9) on chromosome 19 between 42,896,570 and 42,897,840 bp were detected for the Nguni breed. These genes were previously associated with adaptation to tropical environments in Zebu cattle. In addition, a number of candidate genes associated with the nervous system (WNT5B, FMOD, PRELP, and ATP2B), immune response (CYM, CDC6, and CDK10), production (MTPN, IGFBP4, TGFB1, and AJAP1) and reproductive performance (ADIPOR2, OVOS2, and RBBP8) were also detected as being under selection. The results presented here provide a foundation for detecting mutations that underlie genetic variation of traits that have economic importance for cattle breeds in South Africa.
Behura, Susanta K; Severson, David W
2013-02-01
Codon usage bias refers to the phenomenon where specific codons are used more often than other synonymous codons during translation of genes, the extent of which varies within and among species. Molecular evolutionary investigations suggest that codon bias is manifested as a result of balance between mutational and translational selection of such genes and that this phenomenon is widespread across species and may contribute to genome evolution in a significant manner. With the advent of whole-genome sequencing of numerous species, both prokaryotes and eukaryotes, genome-wide patterns of codon bias are emerging in different organisms. Various factors such as expression level, GC content, recombination rates, RNA stability, codon position, gene length and others (including environmental stress and population size) can influence codon usage bias within and among species. Moreover, there has been a continuous quest towards developing new concepts and tools to measure the extent of codon usage bias of genes. In this review, we outline the fundamental concepts of evolution of the genetic code, discuss various factors that may influence biased usage of synonymous codons and then outline different principles and methods of measurement of codon usage bias. Finally, we discuss selected studies performed using whole-genome sequences of different insect species to show how codon bias patterns vary within and among genomes. We conclude with generalized remarks on specific emerging aspects of codon bias studies and highlight the recent explosion of genome-sequencing efforts on arthropods (such as twelve Drosophila species, species of ants, honeybee, Nasonia and Anopheles mosquitoes as well as the recent launch of a genome-sequencing project involving 5000 insects and other arthropods) that may help us to understand better the evolution of codon bias and its biological significance. © 2012 The Authors. Biological Reviews © 2012 Cambridge Philosophical Society.
Research progress of plant population genomics based on high-throughput sequencing.
Wang, Yun-sheng
2016-08-01
Population genomics, a new paradigm for population genetics, combine the concepts and techniques of genomics with the theoretical system of population genetics and improve our understanding of microevolution through identification of site-specific effect and genome-wide effects using genome-wide polymorphic sites genotypeing. With the appearance and improvement of the next generation high-throughput sequencing technology, the numbers of plant species with complete genome sequences increased rapidly and large scale resequencing has also been carried out in recent years. Parallel sequencing has also been done in some plant species without complete genome sequences. These studies have greatly promoted the development of population genomics and deepened our understanding of the genetic diversity, level of linking disequilibium, selection effect, demographical history and molecular mechanism of complex traits of relevant plant population at a genomic level. In this review, I briely introduced the concept and research methods of population genomics and summarized the research progress of plant population genomics based on high-throughput sequencing. I also discussed the prospect as well as existing problems of plant population genomics in order to provide references for related studies.
NASA Astrophysics Data System (ADS)
Zhan, Aibin; Bao, Zhenmin; Hu, Xiaoli; Lu, Wei; Hu, Jingjie
2009-06-01
Microsatellite markers have become one kind of the most important molecular tools used in various researches. A large number of microsatellite markers are required for the whole genome survey in the fields of molecular ecology, quantitative genetics and genomics. Therefore, it is extremely necessary to select several versatile, low-cost, efficient and time- and labor-saving methods to develop a large panel of microsatellite markers. In this study, we used Zhikong scallop ( Chlamys farreri) as the target species to compare the efficiency of the five methods derived from three strategies for microsatellite marker development. The results showed that the strategy of constructing small insert genomic DNA library resulted in poor efficiency, while the microsatellite-enriched strategy highly improved the isolation efficiency. Although the mining public database strategy is time- and cost-saving, it is difficult to obtain a large number of microsatellite markers, mainly due to the limited sequence data of non-model species deposited in public databases. Based on the results in this study, we recommend two methods, microsatellite-enriched library construction method and FIASCO-colony hybridization method, for large-scale microsatellite marker development. Both methods were derived from the microsatellite-enriched strategy. The experimental results obtained from Zhikong scallop also provide the reference for microsatellite marker development in other species with large genomes.
LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms.
Money, Daniel; Gardner, Kyle; Migicovsky, Zoë; Schwaninger, Heidi; Zhong, Gan-Yuan; Myles, Sean
2015-09-15
Obtaining genome-wide genotype data from a set of individuals is the first step in many genomic studies, including genome-wide association and genomic selection. All genotyping methods suffer from some level of missing data, and genotype imputation can be used to fill in the missing data and improve the power of downstream analyses. Model organisms like human and cattle benefit from high-quality reference genomes and panels of reference genotypes that aid in imputation accuracy. In nonmodel organisms, however, genetic and physical maps often are either of poor quality or are completely absent, and there are no panels of reference genotypes available. There is therefore a need for imputation methods designed specifically for nonmodel organisms in which genomic resources are poorly developed and marker order is unreliable or unknown. Here we introduce LinkImpute, a software package based on a k-nearest neighbor genotype imputation method, LD-kNNi, which is designed for unordered markers. No physical or genetic maps are required, and it is designed to work on unphased genotype data from heterozygous species. It exploits the fact that markers useful for imputation often are not physically close to the missing genotype but rather distributed throughout the genome. Using genotyping-by-sequencing data from diverse and heterozygous accessions of apples, grapes, and maize, we compare LD-kNNi with several genotype imputation methods and show that LD-kNNi is fast, comparable in accuracy to the best-existing methods, and exhibits the least bias in allele frequency estimates. Copyright © 2015 Money et al.
A multi-landing pad DNA integration platform for mammalian cell engineering
Gaidukov, Leonid; Wroblewska, Liliana; Teague, Brian; Nelson, Tom; Zhang, Xin; Liu, Yan; Jagtap, Kalpana; Mamo, Selamawit; Tseng, Wen Allen; Lowe, Alexis; Das, Jishnu; Bandara, Kalpanie; Baijuraj, Swetha; Summers, Nevin M; Zhang, Lin; Weiss, Ron
2018-01-01
Abstract Engineering mammalian cell lines that stably express many transgenes requires the precise insertion of large amounts of heterologous DNA into well-characterized genomic loci, but current methods are limited. To facilitate reliable large-scale engineering of CHO cells, we identified 21 novel genomic sites that supported stable long-term expression of transgenes, and then constructed cell lines containing one, two or three ‘landing pad’ recombination sites at selected loci. By using a highly efficient BxB1 recombinase along with different selection markers at each site, we directed recombinase-mediated insertion of heterologous DNA to selected sites, including targeting all three with a single transfection. We used this method to controllably integrate up to nine copies of a monoclonal antibody, representing about 100 kb of heterologous DNA in 21 transcriptional units. Because the integration was targeted to pre-validated loci, recombinant protein expression remained stable for weeks and additional copies of the antibody cassette in the integrated payload resulted in a linear increase in antibody expression. Overall, this multi-copy site-specific integration platform allows for controllable and reproducible insertion of large amounts of DNA into stable genomic sites, which has broad applications for mammalian synthetic biology, recombinant protein production and biomanufacturing. PMID:29617873
Vive la résistance: genome-wide selection against introduced alleles in invasive hybrid zones
Kovach, Ryan P.; Hand, Brian K.; Hohenlohe, Paul A.; Cosart, Ted F.; Boyer, Matthew C.; Neville, Helen H.; Muhlfeld, Clint C.; Amish, Stephen J.; Carim, Kellie; Narum, Shawn R.; Lowe, Winsor H.; Allendorf, Fred W.; Luikart, Gordon
2016-01-01
Evolutionary and ecological consequences of hybridization between native and invasive species are notoriously complicated because patterns of selection acting on non-native alleles can vary throughout the genome and across environments. Rapid advances in genomics now make it feasible to assess locus-specific and genome-wide patterns of natural selection acting on invasive introgression within and among natural populations occupying diverse environments. We quantified genome-wide patterns of admixture across multiple independent hybrid zones of native westslope cutthroat trout and invasive rainbow trout, the world's most widely introduced fish, by genotyping 339 individuals from 21 populations using 9380 species-diagnostic loci. A significantly greater proportion of the genome appeared to be under selection favouring native cutthroat trout (rather than rainbow trout), and this pattern was pervasive across the genome (detected on most chromosomes). Furthermore, selection against invasive alleles was consistent across populations and environments, even in those where rainbow trout were predicted to have a selective advantage (warm environments). These data corroborate field studies showing that hybrids between these species have lower fitness than the native taxa, and show that these fitness differences are due to selection favouring many native genes distributed widely throughout the genome.
A programmable method for massively parallel targeted sequencing
Hopmans, Erik S.; Natsoulis, Georges; Bell, John M.; Grimes, Susan M.; Sieh, Weiva; Ji, Hanlee P.
2014-01-01
We have developed a targeted resequencing approach referred to as Oligonucleotide-Selective Sequencing. In this study, we report a series of significant improvements and novel applications of this method whereby the surface of a sequencing flow cell is modified in situ to capture specific genomic regions of interest from a sample and then sequenced. These improvements include a fully automated targeted sequencing platform through the use of a standard Illumina cBot fluidics station. Targeting optimization increased the yield of total on-target sequencing data 2-fold compared to the previous iteration, while simultaneously increasing the percentage of reads that could be mapped to the human genome. The described assays cover up to 1421 genes with a total coverage of 5.5 Megabases (Mb). We demonstrate a 10-fold abundance uniformity of greater than 90% in 1 log distance from the median and a targeting rate of up to 95%. We also sequenced continuous genomic loci up to 1.5 Mb while simultaneously genotyping SNPs and genes. Variants with low minor allele fraction were sensitively detected at levels of 5%. Finally, we determined the exact breakpoint sequence of cancer rearrangements. Overall, this approach has high performance for selective sequencing of genome targets, configuration flexibility and variant calling accuracy. PMID:24782526
Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw
2017-01-01
Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare . However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes.
Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw
2017-01-01
Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes. PMID:29250096
Quantifying Selection with Pool-Seq Time Series Data.
Taus, Thomas; Futschik, Andreas; Schlötterer, Christian
2017-11-01
Allele frequency time series data constitute a powerful resource for unraveling mechanisms of adaptation, because the temporal dimension captures important information about evolutionary forces. In particular, Evolve and Resequence (E&R), the whole-genome sequencing of replicated experimentally evolving populations, is becoming increasingly popular. Based on computer simulations several studies proposed experimental parameters to optimize the identification of the selection targets. No such recommendations are available for the underlying parameters selection strength and dominance. Here, we introduce a highly accurate method to estimate selection parameters from replicated time series data, which is fast enough to be applied on a genome scale. Using this new method, we evaluate how experimental parameters can be optimized to obtain the most reliable estimates for selection parameters. We show that the effective population size (Ne) and the number of replicates have the largest impact. Because the number of time points and sequencing coverage had only a minor effect, we suggest that time series analysis is feasible without major increase in sequencing costs. We anticipate that time series analysis will become routine in E&R studies. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Kim, Kwondo; Jung, Jaehoon; Caetano-Anollés, Kelsey; Sung, Samsun; Yoo, DongAhn; Choi, Bong-Hwan; Kim, Hyung-Chul; Jeong, Jin-Young; Cho, Yong-Min; Park, Eung-Woo; Choi, Tae-Jeong; Park, Byoungho; Lim, Dajeong
2018-01-01
Artificial selection has been demonstrated to have a rapid and significant effect on the phenotype and genome of an organism. However, most previous studies on artificial selection have focused solely on genomic sequences modified by artificial selection or genomic sequences associated with a specific trait. In this study, we generated whole genome sequencing data of 126 cattle under artificial selection, and 24,973,862 single nucleotide variants to investigate the relationship among artificial selection, genomic sequences and trait. Using runs of homozygosity detected by the variants, we showed increase of inbreeding for decades, and at the same time demonstrated a little influence of recent inbreeding on body weight. Also, we could identify ~0.2 Mb runs of homozygosity segment which may be created by recent artificial selection. This approach may aid in development of genetic markers directly influenced by artificial selection, and provide insight into the process of artificial selection. PMID:29561881
Roux, Julien; Liu, Jialin; Robinson-Rechavi, Marc
2017-11-01
The evolutionary history of vertebrates is marked by three ancient whole-genome duplications: two successive rounds in the ancestor of vertebrates, and a third one specific to teleost fishes. Biased loss of most duplicates enriched the genome for specific genes, such as slow evolving genes, but this selective retention process is not well understood. To understand what drives the long-term preservation of duplicate genes, we characterized duplicated genes in terms of their expression patterns. We used a new method of expression enrichment analysis, TopAnat, applied to in situ hybridization data from thousands of genes from zebrafish and mouse. We showed that the presence of expression in the nervous system is a good predictor of a higher rate of retention of duplicate genes after whole-genome duplication. Further analyses suggest that purifying selection against the toxic effects of misfolded or misinteracting proteins, which is particularly strong in nonrenewing neural tissues, likely constrains the evolution of coding sequences of nervous system genes, leading indirectly to the preservation of duplicate genes after whole-genome duplication. Whole-genome duplications thus greatly contributed to the expansion of the toolkit of genes available for the evolution of profound novelties of the nervous system at the base of the vertebrate radiation. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Signatures of selection in tilapia revealed by whole genome resequencing
Hong Xia, Jun; Bai, Zhiyi; Meng, Zining; Zhang, Yong; Wang, Le; Liu, Feng; Jing, Wu; Yi Wan, Zi; Li, Jiale; Lin, Haoran; Hua Yue, Gen
2015-01-01
Natural selection and selective breeding for genetic improvement have left detectable signatures within the genome of a species. Identification of selection signatures is important in evolutionary biology and for detecting genes that facilitate to accelerate genetic improvement. However, selection signatures, including artificial selection and natural selection, have only been identified at the whole genome level in several genetically improved fish species. Tilapia is one of the most important genetically improved fish species in the world. Using next-generation sequencing, we sequenced the genomes of 47 tilapia individuals. We identified a total of 1.43 million high-quality SNPs and found that the LD block sizes ranged from 10–100 kb in tilapia. We detected over a hundred putative selective sweep regions in each line of tilapia. Most selection signatures were located in non-coding regions of the tilapia genome. The Wnt signaling, gonadotropin-releasing hormone receptor and integrin signaling pathways were under positive selection in all improved tilapia lines. Our study provides a genome-wide map of genetic variation and selection footprints in tilapia, which could be important for genetic studies and accelerating genetic improvement of tilapia. PMID:26373374
Clear: Composition of Likelihoods for Evolve and Resequence Experiments.
Iranmehr, Arya; Akbari, Ali; Schlötterer, Christian; Bafna, Vineet
2017-06-01
The advent of next generation sequencing technologies has made whole-genome and whole-population sampling possible, even for eukaryotes with large genomes. With this development, experimental evolution studies can be designed to observe molecular evolution "in action" via evolve-and-resequence (E&R) experiments. Among other applications, E&R studies can be used to locate the genes and variants responsible for genetic adaptation. Most existing literature on time-series data analysis often assumes large population size, accurate allele frequency estimates, or wide time spans. These assumptions do not hold in many E&R studies. In this article, we propose a method-composition of likelihoods for evolve-and-resequence experiments (Clear)-to identify signatures of selection in small population E&R experiments. Clear takes whole-genome sequences of pools of individuals as input, and properly addresses heterogeneous ascertainment bias resulting from uneven coverage. Clear also provides unbiased estimates of model parameters, including population size, selection strength, and dominance, while being computationally efficient. Extensive simulations show that Clear achieves higher power in detecting and localizing selection over a wide range of parameters, and is robust to variation of coverage. We applied the Clear statistic to multiple E&R experiments, including data from a study of adaptation of Drosophila melanogaster to alternating temperatures and a study of outcrossing yeast populations, and identified multiple regions under selection with genome-wide significance. Copyright © 2017 by the Genetics Society of America.
Mogollon, Catherin Marin; van Pul, Fiona J A; Imai, Takashi; Ramesar, Jai; Chevalley-Maurel, Séverine; de Roo, Guido M; Veld, Sabrina A J; Kroeze, Hans; Franke-Fayard, Blandine M D; Janse, Chris J; Khan, Shahid M
2016-01-01
The CRISPR/Cas9 system is a powerful genome editing technique employed in a wide variety of organisms including recently the human malaria parasite, P. falciparum. Here we report on further improvements to the CRISPR/Cas9 transfection constructs and selection protocol to more rapidly modify the P. falciparum genome and to introduce transgenes into the parasite genome without the inclusion of drug-selectable marker genes. This method was used to stably integrate the gene encoding GFP into the P. falciparum genome under the control of promoters of three different Plasmodium genes (calmodulin, gapdh and hsp70). These genes were selected as they are highly transcribed in blood stages. We show that the three reporter parasite lines generated in this study (GFP@cam, GFP@gapdh and GFP@hsp70) have in vitro blood stage growth kinetics and drug-sensitivity profiles comparable to the parental P. falciparum (NF54) wild-type line. Both asexual and sexual blood stages of the three reporter lines expressed GFP-fluorescence with GFP@hsp70 having the highest fluorescent intensity in schizont stages as shown by flow cytometry analysis of GFP-fluorescence intensity. The improved CRISPR/Cas9 constructs/protocol will aid in the rapid generation of transgenic and modified P. falciparum parasites, including those expressing different reporters proteins under different (stage specific) promoters.
Mogollon, Catherin Marin; van Pul, Fiona J. A.; Imai, Takashi; Ramesar, Jai; Chevalley-Maurel, Séverine; de Roo, Guido M.; Veld, Sabrina A. J.; Kroeze, Hans; Franke-Fayard, Blandine M. D.; Janse, Chris J.
2016-01-01
The CRISPR/Cas9 system is a powerful genome editing technique employed in a wide variety of organisms including recently the human malaria parasite, P. falciparum. Here we report on further improvements to the CRISPR/Cas9 transfection constructs and selection protocol to more rapidly modify the P. falciparum genome and to introduce transgenes into the parasite genome without the inclusion of drug-selectable marker genes. This method was used to stably integrate the gene encoding GFP into the P. falciparum genome under the control of promoters of three different Plasmodium genes (calmodulin, gapdh and hsp70). These genes were selected as they are highly transcribed in blood stages. We show that the three reporter parasite lines generated in this study (GFP@cam, GFP@gapdh and GFP@hsp70) have in vitro blood stage growth kinetics and drug-sensitivity profiles comparable to the parental P. falciparum (NF54) wild-type line. Both asexual and sexual blood stages of the three reporter lines expressed GFP-fluorescence with GFP@hsp70 having the highest fluorescent intensity in schizont stages as shown by flow cytometry analysis of GFP-fluorescence intensity. The improved CRISPR/Cas9 constructs/protocol will aid in the rapid generation of transgenic and modified P. falciparum parasites, including those expressing different reporters proteins under different (stage specific) promoters. PMID:27997583
Qanbari, Saber; Strom, Tim M.; Haberer, Georg; Weigend, Steffen; Gheyas, Almas A.; Turner, Frances; Burt, David W.; Preisinger, Rudolf; Gianola, Daniel; Simianer, Henner
2012-01-01
In most studies aimed at localizing footprints of past selection, outliers at tails of the empirical distribution of a given test statistic are assumed to reflect locus-specific selective forces. Significance cutoffs are subjectively determined, rather than being related to a clear set of hypotheses. Here, we define an empirical p-value for the summary statistic by means of a permutation method that uses the observed SNP structure in the real data. To illustrate the methodology, we applied our approach to a panel of 2.9 million autosomal SNPs identified from re-sequencing a pool of 15 individuals from a brown egg layer line. We scanned the genome for local reductions in heterozygosity, suggestive of selective sweeps. We also employed a modified sliding window approach that accounts for gaps in the sequence and increases scanning resolution by moving the overlapping windows by steps of one SNP only, and suggest to call this a “creeping window” strategy. The approach confirmed selective sweeps in the region of previously described candidate genes, i.e. TSHR, PRL, PRLHR, INSR, LEPR, IGF1, and NRAMP1 when used as positive controls. The genome scan revealed 82 distinct regions with strong evidence of selection (genome-wide p-value<0.001), including genes known to be associated with eggshell structure and immune system such as CALB1 and GAL cluster, respectively. A substantial proportion of signals was found in poor gene content regions including the most extreme signal on chromosome 1. The observation of multiple signals in a highly selected layer line of chicken is consistent with the hypothesis that egg production is a complex trait controlled by many genes. PMID:23209582
Bourret, Vincent; Dionne, Mélanie; Bernatchez, Louis
2014-09-01
Wild populations of Atlantic salmon have declined worldwide. While the causes for this decline may be complex and numerous, increased mortality at sea is predicted to be one of the major contributing factors. Examining the potential changes occurring in the genome-wide composition of populations during this migration has the potential to tease apart some of the factors influencing marine mortality. Here, we genotyped 5568 SNPs in Atlantic salmon populations representing two distinct regional genetic groups and across two cohorts to test for differential allelic and genotypic frequencies between juveniles (smolts) migrating to sea and adults (grilses) returning to freshwater after 1 year at sea. Given the complexity of the traits potentially associated with sea mortality, we contrasted the outcomes of a single-locus F(ST) based genome scan method with a new multilocus framework to test for genetically based differential mortality at sea. While numerous outliers were identified by the single-locus analysis, no evidence for parallel, temporally repeated selection was found. In contrast, the multilocus approach detected repeated patterns of selection for a multilocus group of 34 covarying SNPs in one of the two populations. No significant pattern of selective mortality was detected in the other population, suggesting different causes of mortality among populations. These results first support the hypothesis that selection mainly causes small changes in allele frequencies among many covarying loci rather than a small number of changes in loci with large effects. They also point out that moving away from the a strict 'selective sweep paradigm' towards a multilocus genetics framework may be a more useful approach for studying the genomic signatures of natural selection on complex traits in wild populations. © 2014 John Wiley & Sons Ltd.
Muhire, Brejnev Muhizi; Golden, Michael; Murrell, Ben; Lefeuvre, Pierre; Lett, Jean-Michel; Gray, Alistair; Poon, Art Y F; Ngandu, Nobubelo Kwanele; Semegni, Yves; Tanov, Emil Pavlov; Monjane, Adérito Luis; Harkins, Gordon William; Varsani, Arvind; Shepherd, Dionne Natalie; Martin, Darren Patrick
2014-02-01
Single-stranded DNA (ssDNA) viruses have genomes that are potentially capable of forming complex secondary structures through Watson-Crick base pairing between their constituent nucleotides. A few of the structural elements formed by such base pairings are, in fact, known to have important functions during the replication of many ssDNA viruses. Unknown, however, are (i) whether numerous additional ssDNA virus genomic structural elements predicted to exist by computational DNA folding methods actually exist and (ii) whether those structures that do exist have any biological relevance. We therefore computationally inferred lists of the most evolutionarily conserved structures within a diverse selection of animal- and plant-infecting ssDNA viruses drawn from the families Circoviridae, Anelloviridae, Parvoviridae, Nanoviridae, and Geminiviridae and analyzed these for evidence of natural selection favoring the maintenance of these structures. While we find evidence that is consistent with purifying selection being stronger at nucleotide sites that are predicted to be base paired than at sites predicted to be unpaired, we also find strong associations between sites that are predicted to pair with one another and site pairs that are apparently coevolving in a complementary fashion. Collectively, these results indicate that natural selection actively preserves much of the pervasive secondary structure that is evident within eukaryote-infecting ssDNA virus genomes and, therefore, that much of this structure is biologically functional. Lastly, we provide examples of various highly conserved but completely uncharacterized structural elements that likely have important functions within some of the ssDNA virus genomes analyzed here.
Muhire, Brejnev Muhizi; Golden, Michael; Murrell, Ben; Lefeuvre, Pierre; Lett, Jean-Michel; Gray, Alistair; Poon, Art Y. F.; Ngandu, Nobubelo Kwanele; Semegni, Yves; Tanov, Emil Pavlov; Monjane, Adérito Luis; Harkins, Gordon William; Varsani, Arvind; Shepherd, Dionne Natalie
2014-01-01
Single-stranded DNA (ssDNA) viruses have genomes that are potentially capable of forming complex secondary structures through Watson-Crick base pairing between their constituent nucleotides. A few of the structural elements formed by such base pairings are, in fact, known to have important functions during the replication of many ssDNA viruses. Unknown, however, are (i) whether numerous additional ssDNA virus genomic structural elements predicted to exist by computational DNA folding methods actually exist and (ii) whether those structures that do exist have any biological relevance. We therefore computationally inferred lists of the most evolutionarily conserved structures within a diverse selection of animal- and plant-infecting ssDNA viruses drawn from the families Circoviridae, Anelloviridae, Parvoviridae, Nanoviridae, and Geminiviridae and analyzed these for evidence of natural selection favoring the maintenance of these structures. While we find evidence that is consistent with purifying selection being stronger at nucleotide sites that are predicted to be base paired than at sites predicted to be unpaired, we also find strong associations between sites that are predicted to pair with one another and site pairs that are apparently coevolving in a complementary fashion. Collectively, these results indicate that natural selection actively preserves much of the pervasive secondary structure that is evident within eukaryote-infecting ssDNA virus genomes and, therefore, that much of this structure is biologically functional. Lastly, we provide examples of various highly conserved but completely uncharacterized structural elements that likely have important functions within some of the ssDNA virus genomes analyzed here. PMID:24284329
Metabolomic prediction of yield in hybrid rice.
Xu, Shizhong; Xu, Yang; Gong, Liang; Zhang, Qifa
2016-10-01
Rice (Oryza sativa) provides a staple food source for more than 50% of the world's population. An increase in yield can significantly contribute to global food security. Hybrid breeding can potentially help to meet this goal because hybrid rice often shows a considerable increase in yield when compared with pure-bred cultivars. We recently developed a marker-guided prediction method for hybrid yield and showed a substantial increase in yield through genomic hybrid breeding. We now have transcriptomic and metabolomic data as potential resources for prediction. Using six prediction methods, including least absolute shrinkage and selection operator (LASSO), best linear unbiased prediction (BLUP), stochastic search variable selection, partial least squares, and support vector machines using the radial basis function and polynomial kernel function, we found that the predictability of hybrid yield can be further increased using these omic data. LASSO and BLUP are the most efficient methods for yield prediction. For high heritability traits, genomic data remain the most efficient predictors. When metabolomic data are used, the predictability of hybrid yield is almost doubled compared with genomic prediction. Of the 21 945 potential hybrids derived from 210 recombinant inbred lines, selection of the top 10 hybrids predicted from metabolites would lead to a ~30% increase in yield. We hypothesize that each metabolite represents a biologically built-in genetic network for yield; thus, using metabolites for prediction is equivalent to using information integrated from these hidden genetic networks for yield prediction. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.
Campos, G S; Reimann, F A; Cardoso, L L; Ferreira, C E R; Junqueira, V S; Schmidt, P I; Braccini Neto, J; Yokoo, M J I; Sollero, B P; Boligon, A A; Cardoso, F F
2018-05-07
The objective of the present study was to evaluate the accuracy and bias of direct and blended genomic predictions using different methods and cross-validation techniques for growth traits (weight and weight gains) and visual scores (conformation, precocity, muscling and size) obtained at weaning and at yearling in Hereford and Braford breeds. Phenotypic data contained 126,290 animals belonging to the Delta G Connection genetic improvement program, and a set of 3,545 animals genotyped with the 50K chip and 131 sires with the 777K. After quality control, 41,045 markers remained for all animals. An animal model was used to estimate (co)variances components and to predict breeding values, which were later used to calculate the deregressed estimated breeding values (DEBV). Animals with genotype and phenotype for the traits studied were divided into four or five groups by random and k-means clustering cross-validation strategies. The values of accuracy of the direct genomic values (DGV) were moderate to high magnitude for at weaning and at yearling traits, ranging from 0.19 to 0.45 for the k-means and 0.23 to 0.78 for random clustering among all traits. The greatest gain in relation to the pedigree BLUP (PBLUP) was 9.5% with the BayesB method with both the k-means and the random clustering. Blended genomic value accuracies ranged from 0.19 to 0.56 for k-means and from 0.21 to 0.82 for random clustering. The analyzes using the historical pedigree and phenotypes contributed additional information to calculate the GEBV and in general, the largest gains were for the single-step (ssGBLUP) method in bivariate analyses with a mean increase of 43.00% among all traits measured at weaning and of 46.27% for those evaluated at yearling. The accuracy values for the marker effects estimation methods were lower for k-means clustering, indicating that the training set relationship to the selection candidates is a major factor affecting accuracy of genomic predictions. The gains in accuracy obtained with genomic blending methods, mainly ssGBLUP in bivariate analyses, indicate that genomic predictions should be used as a tool to improve genetic gains in relation to the traditional PBLUP selection.
Effect of sample stratification on dairy GWAS results
2012-01-01
Background Artificial insemination and genetic selection are major factors contributing to population stratification in dairy cattle. In this study, we analyzed the effect of sample stratification and the effect of stratification correction on results of a dairy genome-wide association study (GWAS). Three methods for stratification correction were used: the efficient mixed-model association expedited (EMMAX) method accounting for correlation among all individuals, a generalized least squares (GLS) method based on half-sib intraclass correlation, and a principal component analysis (PCA) approach. Results Historical pedigree data revealed that the 1,654 contemporary cows in the GWAS were all related when traced through approximately 10–15 generations of ancestors. Genome and phenotype stratifications had a striking overlap with the half-sib structure. A large elite half-sib family of cows contributed to the detection of favorable alleles that had low frequencies in the general population and high frequencies in the elite cows and contributed to the detection of X chromosome effects. All three methods for stratification correction reduced the number of significant effects. EMMAX method had the most severe reduction in the number of significant effects, and the PCA method using 20 principal components and GLS had similar significance levels. Removal of the elite cows from the analysis without using stratification correction removed many effects that were also removed by the three methods for stratification correction, indicating that stratification correction could have removed some true effects due to the elite cows. SNP effects with good consensus between different methods and effect size distributions from USDA’s Holstein genomic evaluation included the DGAT1-NIBP region of BTA14 for production traits, a SNP 45kb upstream from PIGY on BTA6 and two SNPs in NIBP on BTA14 for protein percentage. However, most of these consensus effects had similar frequencies in the elite and average cows. Conclusions Genetic selection and extensive use of artificial insemination contributed to overlapped genome, pedigree and phenotype stratifications. The presence of an elite cluster of cows was related to the detection of rare favorable alleles that had high frequencies in the elite cluster and low frequencies in the remaining cows. Methods for stratification correction could have removed some true effects associated with genetic selection. PMID:23039970
High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture.
Inagaki, Soichi; Henry, Isabelle M; Lieberman, Meric C; Comai, Luca
2015-01-01
Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.
Šmarda, Petr; Bureš, Petr; Horová, Lucie
2007-01-01
Background and Aims The spatial and statistical distribution of genome sizes and the adaptivity of genome size to some types of habitat, vegetation or microclimatic conditions were investigated in a tetraploid population of Festuca pallens. The population was previously documented to vary highly in genome size and is assumed as a model for the study of the initial stages of genome size differentiation. Methods Using DAPI flow cytometry, samples were measured repeatedly with diploid Festuca pallens as the internal standard. Altogether 172 plants from 57 plots (2·25 m2), distributed in contrasting habitats over the whole locality in South Moravia, Czech Republic, were sampled. The differences in DNA content were confirmed by the double peaks of simultaneously measured samples. Key Results At maximum, a 1·115-fold difference in genome size was observed. The statistical distribution of genome sizes was found to be continuous and best fits the extreme (Gumbel) distribution with rare occurrences of extremely large genomes (positive-skewed), as it is similar for the log-normal distribution of the whole Angiosperms. Even plants from the same plot frequently varied considerably in genome size and the spatial distribution of genome sizes was generally random and unautocorrelated (P > 0·05). The observed spatial pattern and the overall lack of correlations of genome size with recognized vegetation types or microclimatic conditions indicate the absence of ecological adaptivity of genome size in the studied population. Conclusions These experimental data on intraspecific genome size variability in Festuca pallens argue for the absence of natural selection and the selective non-significance of genome size in the initial stages of genome size differentiation, and corroborate the current hypothetical model of genome size evolution in Angiosperms (Bennetzen et al., 2005, Annals of Botany 95: 127–132). PMID:17565968
Genome-wide evidence for divergent selection between populations of a major agricultural pathogen.
Hartmann, Fanny E; McDonald, Bruce A; Croll, Daniel
2018-06-01
The genetic and environmental homogeneity in agricultural ecosystems is thought to impose strong and uniform selection pressures. However, the impact of this selection on plant pathogen genomes remains largely unknown. We aimed to identify the proportion of the genome and the specific gene functions under positive selection in populations of the fungal wheat pathogen Zymoseptoria tritici. First, we performed genome scans in four field populations that were sampled from different continents and on distinct wheat cultivars to test which genomic regions are under recent selection. Based on extended haplotype homozygosity and composite likelihood ratio tests, we identified 384 and 81 selective sweeps affecting 4% and 0.5% of the 35 Mb core genome, respectively. We found differences both in the number and the position of selective sweeps across the genome between populations. Using a XtX-based outlier detection approach, we identified 51 extremely divergent genomic regions between the allopatric populations, suggesting that divergent selection led to locally adapted pathogen populations. We performed an outlier detection analysis between two sympatric populations infecting two different wheat cultivars to identify evidence for host-driven selection. Selective sweep regions harboured genes that are likely to play a role in successfully establishing host infections. We also identified secondary metabolite gene clusters and an enrichment in genes encoding transporter and protein localization functions. The latter gene functions mediate responses to environmental stress, including interactions with the host. The distinct gene functions under selection indicate that both local host genotypes and abiotic factors contributed to local adaptation. © 2018 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.
Genomic selection in a commercial winter wheat population.
He, Sang; Schulthess, Albert Wilhelm; Mirdita, Vilson; Zhao, Yusheng; Korzun, Viktor; Bothe, Reiner; Ebmeyer, Erhard; Reif, Jochen C; Jiang, Yong
2016-03-01
Genomic selection models can be trained using historical data and filtering genotypes based on phenotyping intensity and reliability criterion are able to increase the prediction ability. We implemented genomic selection based on a large commercial population incorporating 2325 European winter wheat lines. Our objectives were (1) to study whether modeling epistasis besides additive genetic effects results in enhancement on prediction ability of genomic selection, (2) to assess prediction ability when training population comprised historical or less-intensively phenotyped lines, and (3) to explore the prediction ability in subpopulations selected based on the reliability criterion. We found a 5 % increase in prediction ability when shifting from additive to additive plus epistatic effects models. In addition, only a marginal loss from 0.65 to 0.50 in accuracy was observed using the data collected from 1 year to predict genotypes of the following year, revealing that stable genomic selection models can be accurately calibrated to predict subsequent breeding stages. Moreover, prediction ability was maximized when the genotypes evaluated in a single location were excluded from the training set but subsequently decreased again when the phenotyping intensity was increased above two locations, suggesting that the update of the training population should be performed considering all the selected genotypes but excluding those evaluated in a single location. The genomic prediction ability was substantially higher in subpopulations selected based on the reliability criterion, indicating that phenotypic selection for highly reliable individuals could be directly replaced by applying genomic selection to them. We empirically conclude that there is a high potential to assist commercial wheat breeding programs employing genomic selection approaches.
The efficiency of genome-wide selection for genetic improvement of net merit.
Togashi, K; Lin, C Y; Yamazaki, T
2011-10-01
Four methods of selection for net merit comprising 2 correlated traits were compared in this study: 1) EBV-only index (I₁), which consists of the EBV of both traits (i.e., traditional 2-trait BLUP selection); 2) GEBV-only index (I₂), which comprises the genomic EBV (GEBV) of both traits; 3) GEBV-assisted index (I₃), which combines both the EBV and the GEBV of both traits; and 4) GBV-assisted index (I₄), which combines both the EBV and the true genomic breeding value (GBV) of both traits. Comparisons of these indices were based on 3 evaluation criteria [selection accuracy, genetic response (ΔH), and relative efficiency] under 64 scenarios that arise from combining 2 levels of genetic correlation (r(G)), 2 ratios of genetic variances between traits, 2 ratios of the genomic variance to total genetic variances for trait 1, 4 accuracies of EBV, and 2 proportions of r(G) explained by the GBV. Both selection accuracy and genetic responses of the indices I₁, I₃, and I₄ increased as the accuracy of EBV increased, but the efficiency of the indices I₃ and I₄ relative to I₁ decreased as the accuracy of EBV increased. The relative efficiency of both I₃ and I₄ was generally greater when the accuracy of EBV was 0.6 than when it was 0.9, suggesting that the genomic markers are most useful to assist selection when the accuracy of EBV is low. The GBV-assisted index I₄ was superior to the GEBV-assisted I₃ in all 64 cases examined, indicating the importance of improving the accuracy of prediction of genomic breeding values. Other parameters being identical, increasing the genetic variance of a high heritability trait would increase the genetic response of the genomic indices (I₂, I₃, and I₄). The genetic responses to I₂, I₃, and I(4) was greater when the genetic correlation between traits was positive (r(G) = 0.5) than when it was negative (r(G) = -0.5). The results of this study indicate that the effectiveness of the GEBV-assisted index I₃ is affected by heritability of and genetic correlation between traits, the ratio of genetic variances between traits, the genomic-genetic variance ratio of each index trait, the proportion of genetic correlation accounted for by the genomic markers, and the accuracy of predictions of both EBV and GBV. However, most of these affecting factors are genetic characteristics of a population that is beyond the control of the breeders. The key factor subject to manipulation is to maximize both the proportion of the genetic variance explained by GEBV and the accuracy of both GEBV and EBV. The developed procedures provide means to investigate the efficiency of various genomic indices for any given combination of the genetic factors studied.
A statistical approach to selecting and confirming validation targets in -omics experiments
2012-01-01
Background Genomic technologies are, by their very nature, designed for hypothesis generation. In some cases, the hypotheses that are generated require that genome scientists confirm findings about specific genes or proteins. But one major advantage of high-throughput technology is that global genetic, genomic, transcriptomic, and proteomic behaviors can be observed. Manual confirmation of every statistically significant genomic result is prohibitively expensive. This has led researchers in genomics to adopt the strategy of confirming only a handful of the most statistically significant results, a small subset chosen for biological interest, or a small random subset. But there is no standard approach for selecting and quantitatively evaluating validation targets. Results Here we present a new statistical method and approach for statistically validating lists of significant results based on confirming only a small random sample. We apply our statistical method to show that the usual practice of confirming only the most statistically significant results does not statistically validate result lists. We analyze an extensively validated RNA-sequencing experiment to show that confirming a random subset can statistically validate entire lists of significant results. Finally, we analyze multiple publicly available microarray experiments to show that statistically validating random samples can both (i) provide evidence to confirm long gene lists and (ii) save thousands of dollars and hundreds of hours of labor over manual validation of each significant result. Conclusions For high-throughput -omics studies, statistical validation is a cost-effective and statistically valid approach to confirming lists of significant results. PMID:22738145
Genome-wide detection and characterization of positive selection in human populations.
Sabeti, Pardis C; Varilly, Patrick; Fry, Ben; Lohmueller, Jason; Hostetter, Elizabeth; Cotsapas, Chris; Xie, Xiaohui; Byrne, Elizabeth H; McCarroll, Steven A; Gaudet, Rachelle; Schaffner, Stephen F; Lander, Eric S; Frazer, Kelly A; Ballinger, Dennis G; Cox, David R; Hinds, David A; Stuve, Laura L; Gibbs, Richard A; Belmont, John W; Boudreau, Andrew; Hardenbol, Paul; Leal, Suzanne M; Pasternak, Shiran; Wheeler, David A; Willis, Thomas D; Yu, Fuli; Yang, Huanming; Zeng, Changqing; Gao, Yang; Hu, Haoran; Hu, Weitao; Li, Chaohua; Lin, Wei; Liu, Siqi; Pan, Hao; Tang, Xiaoli; Wang, Jian; Wang, Wei; Yu, Jun; Zhang, Bo; Zhang, Qingrun; Zhao, Hongbin; Zhao, Hui; Zhou, Jun; Gabriel, Stacey B; Barry, Rachel; Blumenstiel, Brendan; Camargo, Amy; Defelice, Matthew; Faggart, Maura; Goyette, Mary; Gupta, Supriya; Moore, Jamie; Nguyen, Huy; Onofrio, Robert C; Parkin, Melissa; Roy, Jessica; Stahl, Erich; Winchester, Ellen; Ziaugra, Liuda; Altshuler, David; Shen, Yan; Yao, Zhijian; Huang, Wei; Chu, Xun; He, Yungang; Jin, Li; Liu, Yangfan; Shen, Yayun; Sun, Weiwei; Wang, Haifeng; Wang, Yi; Wang, Ying; Xiong, Xiaoyan; Xu, Liang; Waye, Mary M Y; Tsui, Stephen K W; Xue, Hong; Wong, J Tze-Fei; Galver, Luana M; Fan, Jian-Bing; Gunderson, Kevin; Murray, Sarah S; Oliphant, Arnold R; Chee, Mark S; Montpetit, Alexandre; Chagnon, Fanny; Ferretti, Vincent; Leboeuf, Martin; Olivier, Jean-François; Phillips, Michael S; Roumy, Stéphanie; Sallée, Clémentine; Verner, Andrei; Hudson, Thomas J; Kwok, Pui-Yan; Cai, Dongmei; Koboldt, Daniel C; Miller, Raymond D; Pawlikowska, Ludmila; Taillon-Miller, Patricia; Xiao, Ming; Tsui, Lap-Chee; Mak, William; Song, You Qiang; Tam, Paul K H; Nakamura, Yusuke; Kawaguchi, Takahisa; Kitamoto, Takuya; Morizono, Takashi; Nagashima, Atsushi; Ohnishi, Yozo; Sekine, Akihiro; Tanaka, Toshihiro; Tsunoda, Tatsuhiko; Deloukas, Panos; Bird, Christine P; Delgado, Marcos; Dermitzakis, Emmanouil T; Gwilliam, Rhian; Hunt, Sarah; Morrison, Jonathan; Powell, Don; Stranger, Barbara E; Whittaker, Pamela; Bentley, David R; Daly, Mark J; de Bakker, Paul I W; Barrett, Jeff; Chretien, Yves R; Maller, Julian; McCarroll, Steve; Patterson, Nick; Pe'er, Itsik; Price, Alkes; Purcell, Shaun; Richter, Daniel J; Sabeti, Pardis; Saxena, Richa; Schaffner, Stephen F; Sham, Pak C; Varilly, Patrick; Altshuler, David; Stein, Lincoln D; Krishnan, Lalitha; Smith, Albert Vernon; Tello-Ruiz, Marcela K; Thorisson, Gudmundur A; Chakravarti, Aravinda; Chen, Peter E; Cutler, David J; Kashuk, Carl S; Lin, Shin; Abecasis, Gonçalo R; Guan, Weihua; Li, Yun; Munro, Heather M; Qin, Zhaohui Steve; Thomas, Daryl J; McVean, Gilean; Auton, Adam; Bottolo, Leonardo; Cardin, Niall; Eyheramendy, Susana; Freeman, Colin; Marchini, Jonathan; Myers, Simon; Spencer, Chris; Stephens, Matthew; Donnelly, Peter; Cardon, Lon R; Clarke, Geraldine; Evans, David M; Morris, Andrew P; Weir, Bruce S; Tsunoda, Tatsuhiko; Johnson, Todd A; Mullikin, James C; Sherry, Stephen T; Feolo, Michael; Skol, Andrew; Zhang, Houcan; Zeng, Changqing; Zhao, Hui; Matsuda, Ichiro; Fukushima, Yoshimitsu; Macer, Darryl R; Suda, Eiko; Rotimi, Charles N; Adebamowo, Clement A; Ajayi, Ike; Aniagwu, Toyin; Marshall, Patricia A; Nkwodimmah, Chibuzor; Royal, Charmaine D M; Leppert, Mark F; Dixon, Missy; Peiffer, Andy; Qiu, Renzong; Kent, Alastair; Kato, Kazuto; Niikawa, Norio; Adewole, Isaac F; Knoppers, Bartha M; Foster, Morris W; Clayton, Ellen Wright; Watkin, Jessica; Gibbs, Richard A; Belmont, John W; Muzny, Donna; Nazareth, Lynne; Sodergren, Erica; Weinstock, George M; Wheeler, David A; Yakub, Imtaz; Gabriel, Stacey B; Onofrio, Robert C; Richter, Daniel J; Ziaugra, Liuda; Birren, Bruce W; Daly, Mark J; Altshuler, David; Wilson, Richard K; Fulton, Lucinda L; Rogers, Jane; Burton, John; Carter, Nigel P; Clee, Christopher M; Griffiths, Mark; Jones, Matthew C; McLay, Kirsten; Plumb, Robert W; Ross, Mark T; Sims, Sarah K; Willey, David L; Chen, Zhu; Han, Hua; Kang, Le; Godbout, Martin; Wallenburg, John C; L'Archevêque, Paul; Bellemare, Guy; Saeki, Koji; Wang, Hongguang; An, Daochang; Fu, Hongbo; Li, Qing; Wang, Zhen; Wang, Renwu; Holden, Arthur L; Brooks, Lisa D; McEwen, Jean E; Guyer, Mark S; Wang, Vivian Ota; Peterson, Jane L; Shi, Michael; Spiegel, Jack; Sung, Lawrence M; Zacharia, Lynn F; Collins, Francis S; Kennedy, Karen; Jamieson, Ruth; Stewart, John
2007-10-18
With the advent of dense maps of human genetic variation, it is now possible to detect positive natural selection across the human genome. Here we report an analysis of over 3 million polymorphisms from the International HapMap Project Phase 2 (HapMap2). We used 'long-range haplotype' methods, which were developed to identify alleles segregating in a population that have undergone recent selection, and we also developed new methods that are based on cross-population comparisons to discover alleles that have swept to near-fixation within a population. The analysis reveals more than 300 strong candidate regions. Focusing on the strongest 22 regions, we develop a heuristic for scrutinizing these regions to identify candidate targets of selection. In a complementary analysis, we identify 26 non-synonymous, coding, single nucleotide polymorphisms showing regional evidence of positive selection. Examination of these candidates highlights three cases in which two genes in a common biological process have apparently undergone positive selection in the same population:LARGE and DMD, both related to infection by the Lassa virus, in West Africa;SLC24A5 and SLC45A2, both involved in skin pigmentation, in Europe; and EDAR and EDA2R, both involved in development of hair follicles, in Asia.
Genomic signatures of positive selection in humans and the limits of outlier approaches.
Kelley, Joanna L; Madeoy, Jennifer; Calhoun, John C; Swanson, Willie; Akey, Joshua M
2006-08-01
Identifying regions of the human genome that have been targets of positive selection will provide important insights into recent human evolutionary history and may facilitate the search for complex disease genes. However, the confounding effects of population demographic history and selection on patterns of genetic variation complicate inferences of selection when a small number of loci are studied. To this end, identifying outlier loci from empirical genome-wide distributions of genetic variation is a promising strategy to detect targets of selection. Here, we evaluate the power and efficiency of a simple outlier approach and describe a genome-wide scan for positive selection using a dense catalog of 1.58 million SNPs that were genotyped in three human populations. In total, we analyzed 14,589 genes, 385 of which possess patterns of genetic variation consistent with the hypothesis of positive selection. Furthermore, several extended genomic regions were found, spanning >500 kb, that contained multiple contiguous candidate selection genes. More generally, these data provide important practical insights into the limits of outlier approaches in genome-wide scans for selection, provide strong candidate selection genes to study in greater detail, and may have important implications for disease related research.
Megacycles of atmospheric carbon dioxide concentration correlate with fossil plant genome size.
Franks, Peter J; Freckleton, Rob P; Beaulieu, Jeremy M; Leitch, Ilia J; Beerling, David J
2012-02-19
Tectonic processes drive megacycles of atmospheric carbon dioxide (CO(2)) concentration, c(a), that force large fluctuations in global climate. With a period of several hundred million years, these megacycles have been linked to the evolution of vascular plants, but adaptation at the subcellular scale has been difficult to determine because fossils typically do not preserve this information. Here we show, after accounting for evolutionary relatedness using phylogenetic comparative methods, that plant nuclear genome size (measured as the haploid DNA amount) and the size of stomatal guard cells are correlated across a broad taxonomic range of extant species. This phylogenetic regression was used to estimate the mean genome size of fossil plants from the size of fossil stomata. For the last 400 Myr, spanning almost the full evolutionary history of vascular plants, we found a significant correlation between fossil plant genome size and c(a), modelled independently using geochemical data. The correlation is consistent with selection for stomatal size and genome size by c(a) as plants adapted towards optimal leaf gas exchange under a changing CO(2) regime. Our findings point to the possibility that major episodes of change in c(a) throughout Earth history might have selected for changes in genome size, influencing plant diversification.
Monfort, Matthias; Furlong, Eileen E M; Girardot, Charles
2017-07-15
Visualization of genomic data is fundamental for gaining insights into genome function. Yet, co-visualization of a large number of datasets remains a challenge in all popular genome browsers and the development of new visualization methods is needed to improve the usability and user experience of genome browsers. We present Dynamix, a JBrowse plugin that enables the parallel inspection of hundreds of genomic datasets. Dynamix takes advantage of a priori knowledge to automatically display data tracks with signal within a genomic region of interest. As the user navigates through the genome, Dynamix automatically updates data tracks and limits all manual operations otherwise needed to adjust the data visible on screen. Dynamix also introduces a new carousel view that optimizes screen utilization by enabling users to independently scroll through groups of tracks. Dynamix is hosted at http://furlonglab.embl.de/Dynamix . charles.girardot@embl.de. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Selections that isolate recombinant mitochondrial genomes in animals
Ma, Hansong; O'Farrell, Patrick H
2015-01-01
Homologous recombination is widespread and catalyzes evolution. Nonetheless, its existence in animal mitochondrial DNA is questioned. We designed selections for recombination between co-resident mitochondrial genomes in various heteroplasmic Drosophila lines. In four experimental settings, recombinant genomes became the sole or dominant genome in the progeny. Thus, selection uncovers occurrence of homologous recombination in Drosophila mtDNA and documents its functional benefit. Double-strand breaks enhanced recombination in the germline and revealed somatic recombination. When the recombination partner was a diverged Drosophila melanogaster genome or a genome from a different species such as Drosophila yakuba, sequencing revealed long continuous stretches of exchange. In addition, the distribution of sequence polymorphisms in recombinants allowed us to map a selected trait to a particular region in the Drosophila mitochondrial genome. Thus, recombination can be harnessed to dissect function and evolution of mitochondrial genome. DOI: http://dx.doi.org/10.7554/eLife.07247.001 PMID:26237110
Zotova, Anastasia; Lopatukhina, Elena; Filatov, Alexander; Khaitov, Musa; Mazurov, Dmitriy
2017-11-02
Programmable endonucleases introduce DNA breaks at specific sites, which are repaired by non-homologous end joining (NHEJ) or homology recombination (HDR). Genome editing in human lymphoid cells is challenging as these difficult-to-transfect cells may also inefficiently repair DNA by HDR. Here, we estimated efficiencies and dynamics of knockout (KO) and knockin (KI) generation in human T and B cell lines depending on repair template, target loci and types of genomic endonucleases. Using zinc finger nuclease (ZFN), we have engineered Jurkat and CEM cells with the 8.2 kb human immunodeficiency virus type 1 (HIV-1) ∆Env genome integrated at the adeno-associated virus integration site 1 (AAVS1) locus that stably produce virus particles and mediate infection upon transfection with helper vectors. Knockouts generated by ZFN or clustered regularly interspaced short palindromic repeats (CRISPR/Cas9) double nicking techniques were comparably efficient in lymphoid cells. However, unlike polyclonal sorted cells, gene-edited cells selected by cloning exerted tremendous deviations in functionality as estimated by replication of HIV-1 and human T cell leukemia virus type 1 (HTLV-1) in these cells. Notably, the recently reported high-fidelity eCas9 1.1 when combined to the nickase mutation displayed gene-dependent decrease in on-target activity. Thus, the balance between off-target effects and on-target efficiency of nucleases, as well as choice of the optimal method of edited cell selection should be taken into account for proper gene function validation in lymphoid cells.
Spiked GBS: A unified, open platform for single marker genotyping and whole-genome profiling
USDA-ARS?s Scientific Manuscript database
In plant breeding, there are two primary applications for DNA markers in selection: 1) selection of known genes using a single marker assay (marker-assisted selection; MAS); and 2) whole-genome profiling and prediction (genomic selection; GS). Typically, marker platforms have addressed only one of t...
Effect of Artificial Selection on Runs of Homozygosity in U.S. Holstein Cattle
Kim, Eui-Soo; Cole, John B.; Huson, Heather; Wiggans, George R.; Van Tassell, Curtis P.; Crooker, Brian A.; Liu, George; Da, Yang; Sonstegard, Tad S.
2013-01-01
The intensive selection programs for milk made possible by mass artificial insemination increased the similarity among the genomes of North American (NA) Holsteins tremendously since the 1960s. This migration of elite alleles has caused certain regions of the genome to have runs of homozygosity (ROH) occasionally spanning millions of continuous base pairs at a specific locus. In this study, genome signatures of artificial selection in NA Holsteins born between 1953 and 2008 were identified by comparing changes in ROH between three distinct groups under different selective pressure for milk production. The ROH regions were also used to estimate the inbreeding coefficients. The comparisons of genomic autozygosity between groups selected or unselected since 1964 for milk production revealed significant differences with respect to overall ROH frequency and distribution. These results indicate selection has increased overall autozygosity across the genome, whereas the autozygosity in an unselected line has not changed significantly across most of the chromosomes. In addition, ROH distribution was more variable across the genomes of selected animals in comparison to a more even ROH distribution for unselected animals. Further analysis of genome-wide autozygosity changes and the association between traits and haplotypes identified more than 40 genomic regions under selection on several chromosomes (Chr) including Chr 2, 7, 16 and 20. Many of these selection signatures corresponded to quantitative trait loci for milk, fat, and protein yield previously found in contemporary Holsteins. PMID:24348915
Parallel altitudinal clines reveal trends in adaptive evolution of genome size in Zea mays
Berg, Jeremy J.; Birchler, James A.; Grote, Mark N.; Lorant, Anne; Quezada, Juvenal
2018-01-01
While the vast majority of genome size variation in plants is due to differences in repetitive sequence, we know little about how selection acts on repeat content in natural populations. Here we investigate parallel changes in intraspecific genome size and repeat content of domesticated maize (Zea mays) landraces and their wild relative teosinte across altitudinal gradients in Mesoamerica and South America. We combine genotyping, low coverage whole-genome sequence data, and flow cytometry to test for evidence of selection on genome size and individual repeat abundance. We find that population structure alone cannot explain the observed variation, implying that clinal patterns of genome size are maintained by natural selection. Our modeling additionally provides evidence of selection on individual heterochromatic knob repeats, likely due to their large individual contribution to genome size. To better understand the phenotypes driving selection on genome size, we conducted a growth chamber experiment using a population of highland teosinte exhibiting extensive variation in genome size. We find weak support for a positive correlation between genome size and cell size, but stronger support for a negative correlation between genome size and the rate of cell production. Reanalyzing published data of cell counts in maize shoot apical meristems, we then identify a negative correlation between cell production rate and flowering time. Together, our data suggest a model in which variation in genome size is driven by natural selection on flowering time across altitudinal clines, connecting intraspecific variation in repetitive sequence to important differences in adaptive phenotypes. PMID:29746459
Ridge, Lasso and Bayesian additive-dominance genomic models.
Azevedo, Camila Ferreira; de Resende, Marcos Deon Vilela; E Silva, Fabyano Fonseca; Viana, José Marcelo Soriano; Valente, Magno Sávio Ferreira; Resende, Márcio Fernando Ribeiro; Muñoz, Patricio
2015-08-25
A complete approach for genome-wide selection (GWS) involves reliable statistical genetics models and methods. Reports on this topic are common for additive genetic models but not for additive-dominance models. The objective of this paper was (i) to compare the performance of 10 additive-dominance predictive models (including current models and proposed modifications), fitted using Bayesian, Lasso and Ridge regression approaches; and (ii) to decompose genomic heritability and accuracy in terms of three quantitative genetic information sources, namely, linkage disequilibrium (LD), co-segregation (CS) and pedigree relationships or family structure (PR). The simulation study considered two broad sense heritability levels (0.30 and 0.50, associated with narrow sense heritabilities of 0.20 and 0.35, respectively) and two genetic architectures for traits (the first consisting of small gene effects and the second consisting of a mixed inheritance model with five major genes). G-REML/G-BLUP and a modified Bayesian/Lasso (called BayesA*B* or t-BLASSO) method performed best in the prediction of genomic breeding as well as the total genotypic values of individuals in all four scenarios (two heritabilities x two genetic architectures). The BayesA*B*-type method showed a better ability to recover the dominance variance/additive variance ratio. Decomposition of genomic heritability and accuracy revealed the following descending importance order of information: LD, CS and PR not captured by markers, the last two being very close. Amongst the 10 models/methods evaluated, the G-BLUP, BAYESA*B* (-2,8) and BAYESA*B* (4,6) methods presented the best results and were found to be adequate for accurately predicting genomic breeding and total genotypic values as well as for estimating additive and dominance in additive-dominance genomic models.
SigHunt: horizontal gene transfer finder optimized for eukaryotic genomes.
Jaron, Kamil S; Moravec, Jiří C; Martínková, Natália
2014-04-15
Genomic islands (GIs) are DNA fragments incorporated into a genome through horizontal gene transfer (also called lateral gene transfer), often with functions novel for a given organism. While methods for their detection are well researched in prokaryotes, the complexity of eukaryotic genomes makes direct utilization of these methods unreliable, and so labour-intensive phylogenetic searches are used instead. We present a surrogate method that investigates nucleotide base composition of the DNA sequence in a eukaryotic genome and identifies putative GIs. We calculate a genomic signature as a vector of tetranucleotide (4-mer) frequencies using a sliding window approach. Extending the neighbourhood of the sliding window, we establish a local kernel density estimate of the 4-mer frequency. We score the number of 4-mer frequencies in the sliding window that deviate from the credibility interval of their local genomic density using a newly developed discrete interval accumulative score (DIAS). To further improve the effectiveness of DIAS, we select informative 4-mers in a range of organisms using the tetranucleotide quality score developed herein. We show that the SigHunt method is computationally efficient and able to detect GIs in eukaryotic genomes that represent non-ameliorated integration. Thus, it is suited to scanning for change in organisms with different DNA composition. Source code and scripts freely available for download at http://www.iba.muni.cz/index-en.php?pg=research-data-analysis-tools-sighunt are implemented in C and R and are platform-independent. 376090@mail.muni.cz or martinkova@ivb.cz. © The Author 2013. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Ren, Wen-Long; Wen, Yang-Jun; Dunwell, Jim M; Zhang, Yuan-Ming
2018-03-01
Although nonparametric methods in genome-wide association studies (GWAS) are robust in quantitative trait nucleotide (QTN) detection, the absence of polygenic background control in single-marker association in genome-wide scans results in a high false positive rate. To overcome this issue, we proposed an integrated nonparametric method for multi-locus GWAS. First, a new model transformation was used to whiten the covariance matrix of polygenic matrix K and environmental noise. Using the transferred model, Kruskal-Wallis test along with least angle regression was then used to select all the markers that were potentially associated with the trait. Finally, all the selected markers were placed into multi-locus model, these effects were estimated by empirical Bayes, and all the nonzero effects were further identified by a likelihood ratio test for true QTN detection. This method, named pKWmEB, was validated by a series of Monte Carlo simulation studies. As a result, pKWmEB effectively controlled false positive rate, although a less stringent significance criterion was adopted. More importantly, pKWmEB retained the high power of Kruskal-Wallis test, and provided QTN effect estimates. To further validate pKWmEB, we re-analyzed four flowering time related traits in Arabidopsis thaliana, and detected some previously reported genes that were not identified by the other methods.
USDA-ARS?s Scientific Manuscript database
The size and repetitive nature of the Rhipicephalus microplus genome makes obtaining a full genome sequence difficult. Cot filtration/selection techniques were used to reduce the repetitive fraction of the tick genome and enrich for the fraction of DNA with gene-containing regions. The Cot-selected ...
Use of simulated data sets to evaluate the fidelity of metagenomic processing methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mavromatis, K; Ivanova, N; Barry, Kerrie
2007-01-01
Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based ( blast hit distribution) and twomore » sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.« less
Use of simulated data sets to evaluate the fidelity of Metagenomicprocessing methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mavromatis, Konstantinos; Ivanova, Natalia; Barry, Kerri
2006-12-01
Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity--based (blast hit distribution) and twomore » sequence composition--based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.« less
Identification of selection signatures in cattle breeds selected for dairy production.
Stella, Alessandra; Ajmone-Marsan, Paolo; Lazzari, Barbara; Boettcher, Paul
2010-08-01
The genomics revolution has spurred the undertaking of HapMap studies of numerous species, allowing for population genomics to increase the understanding of how selection has created genetic differences between subspecies populations. The objectives of this study were to (1) develop an approach to detect signatures of selection in subsets of phenotypically similar breeds of livestock by comparing single nucleotide polymorphism (SNP) diversity between the subset and a larger population, (2) verify this method in breeds selected for simply inherited traits, and (3) apply this method to the dairy breeds in the International Bovine HapMap (IBHM) study. The data consisted of genotypes for 32,689 SNPs of 497 animals from 19 breeds. For a given subset of breeds, the test statistic was the parametric composite log likelihood (CLL) of the differences in allelic frequencies between the subset and the IBHM for a sliding window of SNPs. The null distribution was obtained by calculating CLL for 50,000 random subsets (per chromosome) of individuals. The validity of this approach was confirmed by obtaining extremely large CLLs at the sites of causative variation for polled (BTA1) and black-coat-color (BTA18) phenotypes. Across the 30 bovine chromosomes, 699 putative selection signatures were detected. The largest CLL was on BTA6 and corresponded to KIT, which is responsible for the piebald phenotype present in four of the five dairy breeds. Potassium channel-related genes were at the site of the largest CLL on three chromosomes (BTA14, -16, and -25) whereas integrins (BTA18 and -19) and serine/arginine rich splicing factors (BTA20 and -23) each had the largest CLL on two chromosomes. On the basis of the results of this study, the application of population genomics to farm animals seems quite promising. Comparisons between breed groups have the potential to identify genomic regions influencing complex traits with no need for complex equipment and the collection of extensive phenotypic records and can contribute to the identification of candidate genes and to the understanding of the biological mechanisms controlling complex traits.
Genome-enabled prediction models for yield related traits in chickpea
USDA-ARS?s Scientific Manuscript database
Genomic selection (GS) unlike marker-assisted backcrossing (MABC) predicts breeding values of lines using genome-wide marker profiling and allows selection of lines prior to field-phenotyping, thereby shortening the breeding cycle. A collection of 320 elite breeding lines was selected and phenotyped...
Measuring genomic pre-selection in theory and in practice
USDA-ARS?s Scientific Manuscript database
Potential biases from genomic pre-selection were estimated from actual selection and mating patterns of US Holsteins. Traditional models using only phenotypes and pedigrees do not adjust for average genomic merit of an animal’s parents, progeny, mates, or contemporaries. Positive assortative mating ...
Berry, Nadine Kaye; Bain, Nicole L; Enjeti, Anoop K; Rowlings, Philip
2014-01-01
To evaluate the role of whole genome comparative genomic hybridisation microarray (array-CGH) in detecting genomic imbalances as compared to conventional karyotype (GTG-analysis) or myeloma specific fluorescence in situ hybridisation (FISH) panel in a diagnostic setting for plasma cell dyscrasia (PCD). A myeloma-specific interphase FISH (i-FISH) panel was carried out on CD138 PC-enriched bone marrow (BM) from 20 patients having BM biopsies for evaluation of PCD. Whole genome array-CGH was performed on reference (control) and neoplastic (test patient) genomic DNA extracted from CD138 PC-enriched BM and analysed. Comparison of techniques demonstrated a much higher detection rate of genomic imbalances using array-CGH. Genomic imbalances were detected in 1, 19 and 20 patients using GTG-analysis, i-FISH and array-CGH, respectively. Genomic rearrangements were detected in one patient using GTG-analysis and seven patients using i-FISH, while none were detected using array-CGH. I-FISH was the most sensitive method for detecting gene rearrangements and GTG-analysis was the least sensitive method overall. All copy number aberrations observed in GTG-analysis were detected using array-CGH and i-FISH. We show that array-CGH performed on CD138-enriched PCs significantly improves the detection of clinically relevant and possibly novel genomic abnormalities in PCD, and thus could be considered as a standard diagnostic technique in combination with IGH rearrangement i-FISH.
Understanding the Origin of Species with Genome-Scale Data: the Role of Gene Flow
Sousa, Vitor; Hey, Jody
2017-01-01
As it becomes easier to sequence multiple genomes from closely related species, evolutionary biologists working on speciation are struggling to get the most out of very large population-genomic data sets. Such data hold the potential to resolve evolutionary biology’s long-standing questions about the role of gene exchange in species formation. In principle the new population genomic data can be used to disentangle the conflicting roles of natural selection and gene flow during the divergence process. However there are great challenges in taking full advantage of such data, especially with regard to including recombination in genetic models of the divergence process. Current data, models, methods and the potential pitfalls in using them will be considered here. PMID:23657479
Su, Fei; Ou, Hong-Yu; Tao, Fei; Tang, Hongzhi; Xu, Ping
2013-12-27
With genomic sequences of many closely related bacterial strains made available by deep sequencing, it is now possible to investigate trends in prokaryotic microevolution. Positive selection is a sub-process of microevolution, in which a particular mutation is favored, causing the allele frequency to continuously shift in one direction. Wide scanning of prokaryotic genomes has shown that positive selection at the molecular level is much more frequent than expected. Genes with significant positive selection may play key roles in bacterial adaption to different environmental pressures. However, selection pressure analyses are computationally intensive and awkward to configure. Here we describe an open access web server, which is designated as PSP (Positive Selection analysis for Prokaryotic genomes) for performing evolutionary analysis on orthologous coding genes, specially designed for rapid comparison of dozens of closely related prokaryotic genomes. Remarkably, PSP facilitates functional exploration at the multiple levels by assignments and enrichments of KO, GO or COG terms. To illustrate this user-friendly tool, we analyzed Escherichia coli and Bacillus cereus genomes and found that several genes, which play key roles in human infection and antibiotic resistance, show significant evidence of positive selection. PSP is freely available to all users without any login requirement at: http://db-mml.sjtu.edu.cn/PSP/. PSP ultimately allows researchers to do genome-scale analysis for evolutionary selection across multiple prokaryotic genomes rapidly and easily, and identify the genes undergoing positive selection, which may play key roles in the interactions of host-pathogen and/or environmental adaptation.
Deciphering the genomic targets of alkylating polyamide conjugates using high-throughput sequencing
Chandran, Anandhakumar; Syed, Junetha; Taylor, Rhys D.; Kashiwazaki, Gengo; Sato, Shinsuke; Hashiya, Kaori; Bando, Toshikazu; Sugiyama, Hiroshi
2016-01-01
Chemically engineered small molecules targeting specific genomic sequences play an important role in drug development research. Pyrrole-imidazole polyamides (PIPs) are a group of molecules that can bind to the DNA minor-groove and can be engineered to target specific sequences. Their biological effects rely primarily on their selective DNA binding. However, the binding mechanism of PIPs at the chromatinized genome level is poorly understood. Herein, we report a method using high-throughput sequencing to identify the DNA-alkylating sites of PIP-indole-seco-CBI conjugates. High-throughput sequencing analysis of conjugate 2 showed highly similar DNA-alkylating sites on synthetic oligos (histone-free DNA) and on human genomes (chromatinized DNA context). To our knowledge, this is the first report identifying alkylation sites across genomic DNA by alkylating PIP conjugates using high-throughput sequencing. PMID:27098039
Johnston, Iain G; Williams, Ben P
2016-02-24
Since their endosymbiotic origin, mitochondria have lost most of their genes. Although many selective mechanisms underlying the evolution of mitochondrial genomes have been proposed, a data-driven exploration of these hypotheses is lacking, and a quantitatively supported consensus remains absent. We developed HyperTraPS, a methodology coupling stochastic modeling with Bayesian inference, to identify the ordering of evolutionary events and suggest their causes. Using 2015 complete mitochondrial genomes, we inferred evolutionary trajectories of mtDNA gene loss across the eukaryotic tree of life. We find that proteins comprising the structural cores of the electron transport chain are preferentially encoded within mitochondrial genomes across eukaryotes. A combination of high GC content and high protein hydrophobicity is required to explain patterns of mtDNA gene retention; a model that accounts for these selective pressures can also predict the success of artificial gene transfer experiments in vivo. This work provides a general method for data-driven inference of the ordering of evolutionary and progressive events, here identifying the distinct features shaping mitochondrial genomes of present-day species. Copyright © 2016 Elsevier Inc. All rights reserved.
Lara-Ramírez, Edgar E.; Salazar, Ma Isabel; López-López, María de Jesús; Salas-Benito, Juan Santiago; Sánchez-Varela, Alejandro
2014-01-01
The increasing number of dengue virus (DENV) genome sequences available allows identifying the contributing factors to DENV evolution. In the present study, the codon usage in serotypes 1–4 (DENV1–4) has been explored for 3047 sequenced genomes using different statistics methods. The correlation analysis of total GC content (GC) with GC content at the three nucleotide positions of codons (GC1, GC2, and GC3) as well as the effective number of codons (ENC, ENCp) versus GC3 plots revealed mutational bias and purifying selection pressures as the major forces influencing the codon usage, but with distinct pressure on specific nucleotide position in the codon. The correspondence analysis (CA) and clustering analysis on relative synonymous codon usage (RSCU) within each serotype showed similar clustering patterns to the phylogenetic analysis of nucleotide sequences for DENV1–4. These clustering patterns are strongly related to the virus geographic origin. The phylogenetic dependence analysis also suggests that stabilizing selection acts on the codon usage bias. Our analysis of a large scale reveals new feature on DENV genomic evolution. PMID:25136631
High-efficiency targeted editing of large viral genomes by RNA-guided nucleases.
Bi, Yanwei; Sun, Le; Gao, Dandan; Ding, Chen; Li, Zhihua; Li, Yadong; Cun, Wei; Li, Qihan
2014-05-01
A facile and efficient method for the precise editing of large viral genomes is required for the selection of attenuated vaccine strains and the construction of gene therapy vectors. The type II prokaryotic CRISPR-Cas (clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas)) RNA-guided nuclease system can be introduced into host cells during viral replication. The CRISPR-Cas9 system robustly stimulates targeted double-stranded breaks in the genomes of DNA viruses, where the non-homologous end joining (NHEJ) and homology-directed repair (HDR) pathways can be exploited to introduce site-specific indels or insert heterologous genes with high frequency. Furthermore, CRISPR-Cas9 can specifically inhibit the replication of the original virus, thereby significantly increasing the abundance of the recombinant virus among progeny virus. As a result, purified recombinant virus can be obtained with only a single round of selection. In this study, we used recombinant adenovirus and type I herpes simplex virus as examples to demonstrate that the CRISPR-Cas9 system is a valuable tool for editing the genomes of large DNA viruses.
High-Efficiency Targeted Editing of Large Viral Genomes by RNA-Guided Nucleases
Gao, Dandan; Ding, Chen; Li, Zhihua; Li, Yadong; Cun, Wei; Li, Qihan
2014-01-01
A facile and efficient method for the precise editing of large viral genomes is required for the selection of attenuated vaccine strains and the construction of gene therapy vectors. The type II prokaryotic CRISPR-Cas (clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas)) RNA-guided nuclease system can be introduced into host cells during viral replication. The CRISPR-Cas9 system robustly stimulates targeted double-stranded breaks in the genomes of DNA viruses, where the non-homologous end joining (NHEJ) and homology-directed repair (HDR) pathways can be exploited to introduce site-specific indels or insert heterologous genes with high frequency. Furthermore, CRISPR-Cas9 can specifically inhibit the replication of the original virus, thereby significantly increasing the abundance of the recombinant virus among progeny virus. As a result, purified recombinant virus can be obtained with only a single round of selection. In this study, we used recombinant adenovirus and type I herpes simplex virus as examples to demonstrate that the CRISPR-Cas9 system is a valuable tool for editing the genomes of large DNA viruses. PMID:24788700
Ultsch, Alfred; Kringel, Dario; Kalso, Eija; Mogil, Jeffrey S; Lötsch, Jörn
2016-12-01
The increasing availability of "big data" enables novel research approaches to chronic pain while also requiring novel techniques for data mining and knowledge discovery. We used machine learning to combine the knowledge about n = 535 genes identified empirically as relevant to pain with the knowledge about the functions of thousands of genes. Starting from an accepted description of chronic pain as displaying systemic features described by the terms "learning" and "neuronal plasticity," a functional genomics analysis proposed that among the functions of the 535 "pain genes," the biological processes "learning or memory" (P = 8.6 × 10) and "nervous system development" (P = 2.4 × 10) are statistically significantly overrepresented as compared with the annotations to these processes expected by chance. After establishing that the hypothesized biological processes were among important functional genomics features of pain, a subset of n = 34 pain genes were found to be annotated with both Gene Ontology terms. Published empirical evidence supporting their involvement in chronic pain was identified for almost all these genes, including 1 gene identified in March 2016 as being involved in pain. By contrast, such evidence was virtually absent in a randomly selected set of 34 other human genes. Hence, the present computational functional genomics-based method can be used for candidate gene selection, providing an alternative to established methods.
2011-01-01
Background BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. Results This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Conclusions Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed. PMID:21794110
Feltus, Frank A; Saski, Christopher A; Mockaitis, Keithanne; Haiminen, Niina; Parida, Laxmi; Smith, Zachary; Ford, James; Staton, Margaret E; Ficklin, Stephen P; Blackmon, Barbara P; Cheng, Chun-Huai; Schnell, Raymond J; Kuhn, David N; Motamayor, Juan-Carlos
2011-07-27
BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed.
Bayesian methods for estimating GEBVs of threshold traits
Wang, C-L; Ding, X-D; Wang, J-Y; Liu, J-F; Fu, W-X; Zhang, Z; Yin, Z-J; Zhang, Q
2013-01-01
Estimation of genomic breeding values is the key step in genomic selection (GS). Many methods have been proposed for continuous traits, but methods for threshold traits are still scarce. Here we introduced threshold model to the framework of GS, and specifically, we extended the three Bayesian methods BayesA, BayesB and BayesCπ on the basis of threshold model for estimating genomic breeding values of threshold traits, and the extended methods are correspondingly termed BayesTA, BayesTB and BayesTCπ. Computing procedures of the three BayesT methods using Markov Chain Monte Carlo algorithm were derived. A simulation study was performed to investigate the benefit of the presented methods in accuracy with the genomic estimated breeding values (GEBVs) for threshold traits. Factors affecting the performance of the three BayesT methods were addressed. As expected, the three BayesT methods generally performed better than the corresponding normal Bayesian methods, in particular when the number of phenotypic categories was small. In the standard scenario (number of categories=2, incidence=30%, number of quantitative trait loci=50, h2=0.3), the accuracies were improved by 30.4%, 2.4%, and 5.7% points, respectively. In most scenarios, BayesTB and BayesTCπ generated similar accuracies and both performed better than BayesTA. In conclusion, our work proved that threshold model fits well for predicting GEBVs of threshold traits, and BayesTCπ is supposed to be the method of choice for GS of threshold traits. PMID:23149458
Analysis of single nucleotide polymorphisms in case-control studies.
Li, Yonghong; Shiffman, Dov; Oberbauer, Rainer
2011-01-01
Single nucleotide polymorphisms (SNPs) are the most common type of genetic variants in the human genome. SNPs are known to modify susceptibility to complex diseases. We describe and discuss methods used to identify SNPs associated with disease in case-control studies. An outline on study population selection, sample collection and genotyping platforms is presented, complemented by SNP selection, data preprocessing and analysis.
Genomic Prediction of Testcross Performance in Canola (Brassica napus)
Jan, Habib U.; Abbadi, Amine; Lücke, Sophie; Nichols, Richard A.; Snowdon, Rod J.
2016-01-01
Genomic selection (GS) is a modern breeding approach where genome-wide single-nucleotide polymorphism (SNP) marker profiles are simultaneously used to estimate performance of untested genotypes. In this study, the potential of genomic selection methods to predict testcross performance for hybrid canola breeding was applied for various agronomic traits based on genome-wide marker profiles. A total of 475 genetically diverse spring-type canola pollinator lines were genotyped at 24,403 single-copy, genome-wide SNP loci. In parallel, the 950 F1 testcross combinations between the pollinators and two representative testers were evaluated for a number of important agronomic traits including seedling emergence, days to flowering, lodging, oil yield and seed yield along with essential seed quality characters including seed oil content and seed glucosinolate content. A ridge-regression best linear unbiased prediction (RR-BLUP) model was applied in combination with 500 cross-validations for each trait to predict testcross performance, both across the whole population as well as within individual subpopulations or clusters, based solely on SNP profiles. Subpopulations were determined using multidimensional scaling and K-means clustering. Genomic prediction accuracy across the whole population was highest for seed oil content (0.81) followed by oil yield (0.75) and lowest for seedling emergence (0.29). For seed yieId, seed glucosinolate, lodging resistance and days to onset of flowering (DTF), prediction accuracies were 0.45, 0.61, 0.39 and 0.56, respectively. Prediction accuracies could be increased for some traits by treating subpopulations separately; a strategy which only led to moderate improvements for some traits with low heritability, like seedling emergence. No useful or consistent increase in accuracy was obtained by inclusion of a population substructure covariate in the model. Testcross performance prediction using genome-wide SNP markers shows considerable potential for pre-selection of promising hybrid combinations prior to resource-intensive field testing over multiple locations and years. PMID:26824924
Optimization of multi-environment trials for genomic selection based on crop models.
Rincent, R; Kuhn, E; Monod, H; Oury, F-X; Rousset, M; Allard, V; Le Gouis, J
2017-08-01
We propose a statistical criterion to optimize multi-environment trials to predict genotype × environment interactions more efficiently, by combining crop growth models and genomic selection models. Genotype × environment interactions (GEI) are common in plant multi-environment trials (METs). In this context, models developed for genomic selection (GS) that refers to the use of genome-wide information for predicting breeding values of selection candidates need to be adapted. One promising way to increase prediction accuracy in various environments is to combine ecophysiological and genetic modelling thanks to crop growth models (CGM) incorporating genetic parameters. The efficiency of this approach relies on the quality of the parameter estimates, which depends on the environments composing this MET used for calibration. The objective of this study was to determine a method to optimize the set of environments composing the MET for estimating genetic parameters in this context. A criterion called OptiMET was defined to this aim, and was evaluated on simulated and real data, with the example of wheat phenology. The MET defined with OptiMET allowed estimating the genetic parameters with lower error, leading to higher QTL detection power and higher prediction accuracies. MET defined with OptiMET was on average more efficient than random MET composed of twice as many environments, in terms of quality of the parameter estimates. OptiMET is thus a valuable tool to determine optimal experimental conditions to best exploit MET and the phenotyping tools that are currently developed.
Genomic selection for quantitative adult plant stem rust resistance in wheat
USDA-ARS?s Scientific Manuscript database
Quantitative adult plant resistance (APR) to stem rust (Puccinia graminis f. sp. tritici) is an important breeding target in wheat (Triticum aestivum L.) and a potential target for genomic selection (GS). To evaluate the relative importance of known APR loci in applying genomic selection, we charact...
USDA-ARS?s Scientific Manuscript database
Genomic selection (GS) models use genome-wide genetic information to predict genetic values of candidates for selection. Originally these models were developed without considering genotype ' environment interaction (GE). Several authors have proposed extensions of the cannonical GS model that accomm...
Rochus, Christina Marie; Tortereau, Flavie; Plisson-Petit, Florence; Restoux, Gwendal; Moreno-Romieux, Carole; Tosser-Klopp, Gwenola; Servin, Bertrand
2018-01-23
One of the approaches to detect genetics variants affecting fitness traits is to identify their surrounding genomic signatures of past selection. With established methods for detecting selection signatures and the current and future availability of large datasets, such studies should have the power to not only detect these signatures but also to infer their selective histories. Domesticated animals offer a powerful model for these approaches as they adapted rapidly to environmental and human-mediated constraints in a relatively short time. We investigated this question by studying a large dataset of 542 individuals from 27 domestic sheep populations raised in France, genotyped for more than 500,000 SNPs. Population structure analysis revealed that this set of populations harbour a large part of European sheep diversity in a small geographical area, offering a powerful model for the study of adaptation. Identification of extreme SNP and haplotype frequency differences between populations listed 126 genomic regions likely affected by selection. These signatures revealed selection at loci commonly identified as selection targets in many species ("selection hotspots") including ABCG2, LCORL/NCAPG, MSTN, and coat colour genes such as ASIP, MC1R, MITF, and TYRP1. For one of these regions (ABCG2, LCORL/NCAPG), we could propose a historical scenario leading to the introgression of an adaptive allele into a new genetic background. Among selection signatures, we found clear evidence for parallel selection events in different genetic backgrounds, most likely for different mutations. We confirmed this allelic heterogeneity in one case by resequencing the MC1R gene in three black-faced breeds. Our study illustrates how dense genetic data in multiple populations allows the deciphering of evolutionary history of populations and of their adaptive mutations.
Error baseline rates of five sample preparation methods used to characterize RNA virus populations.
Kugelman, Jeffrey R; Wiley, Michael R; Nagle, Elyse R; Reyes, Daniel; Pfeffer, Brad P; Kuhn, Jens H; Sanchez-Lockhart, Mariano; Palacios, Gustavo F
2017-01-01
Individual RNA viruses typically occur as populations of genomes that differ slightly from each other due to mutations introduced by the error-prone viral polymerase. Understanding the variability of RNA virus genome populations is critical for understanding virus evolution because individual mutant genomes may gain evolutionary selective advantages and give rise to dominant subpopulations, possibly even leading to the emergence of viruses resistant to medical countermeasures. Reverse transcription of virus genome populations followed by next-generation sequencing is the only available method to characterize variation for RNA viruses. However, both steps may lead to the introduction of artificial mutations, thereby skewing the data. To better understand how such errors are introduced during sample preparation, we determined and compared error baseline rates of five different sample preparation methods by analyzing in vitro transcribed Ebola virus RNA from an artificial plasmid-based system. These methods included: shotgun sequencing from plasmid DNA or in vitro transcribed RNA as a basic "no amplification" method, amplicon sequencing from the plasmid DNA or in vitro transcribed RNA as a "targeted" amplification method, sequence-independent single-primer amplification (SISPA) as a "random" amplification method, rolling circle reverse transcription sequencing (CirSeq) as an advanced "no amplification" method, and Illumina TruSeq RNA Access as a "targeted" enrichment method. The measured error frequencies indicate that RNA Access offers the best tradeoff between sensitivity and sample preparation error (1.4-5) of all compared methods.
Error baseline rates of five sample preparation methods used to characterize RNA virus populations
Kugelman, Jeffrey R.; Wiley, Michael R.; Nagle, Elyse R.; Reyes, Daniel; Pfeffer, Brad P.; Kuhn, Jens H.; Sanchez-Lockhart, Mariano; Palacios, Gustavo F.
2017-01-01
Individual RNA viruses typically occur as populations of genomes that differ slightly from each other due to mutations introduced by the error-prone viral polymerase. Understanding the variability of RNA virus genome populations is critical for understanding virus evolution because individual mutant genomes may gain evolutionary selective advantages and give rise to dominant subpopulations, possibly even leading to the emergence of viruses resistant to medical countermeasures. Reverse transcription of virus genome populations followed by next-generation sequencing is the only available method to characterize variation for RNA viruses. However, both steps may lead to the introduction of artificial mutations, thereby skewing the data. To better understand how such errors are introduced during sample preparation, we determined and compared error baseline rates of five different sample preparation methods by analyzing in vitro transcribed Ebola virus RNA from an artificial plasmid-based system. These methods included: shotgun sequencing from plasmid DNA or in vitro transcribed RNA as a basic “no amplification” method, amplicon sequencing from the plasmid DNA or in vitro transcribed RNA as a “targeted” amplification method, sequence-independent single-primer amplification (SISPA) as a “random” amplification method, rolling circle reverse transcription sequencing (CirSeq) as an advanced “no amplification” method, and Illumina TruSeq RNA Access as a “targeted” enrichment method. The measured error frequencies indicate that RNA Access offers the best tradeoff between sensitivity and sample preparation error (1.4−5) of all compared methods. PMID:28182717
2011-01-01
Background 'Selection signatures' delimit regions of the genome that are, or have been, functionally important and have therefore been under either natural or artificial selection. In this study, two different and complementary methods--integrated Haplotype Homozygosity Score (|iHS|) and population differentiation index (FST)--were applied to identify traces of decades of intensive artificial selection for traits of economic importance in modern cattle. Results We scanned the genome of a diverse set of dairy and beef breeds from Germany, Canada and Australia genotyped with a 50 K SNP panel. Across breeds, a total of 109 extreme |iHS| values exceeded the empirical threshold level of 5% with 19, 27, 9, 10 and 17 outliers in Holstein, Brown Swiss, Australian Angus, Hereford and Simmental, respectively. Annotating the regions harboring clustered |iHS| signals revealed a panel of interesting candidate genes like SPATA17, MGAT1, PGRMC2 and ACTC1, COL23A1, MATN2, respectively, in the context of reproduction and muscle formation. In a further step, a new Bayesian FST-based approach was applied with a set of geographically separated populations including Holstein, Brown Swiss, Simmental, North American Angus and Piedmontese for detecting differentiated loci. In total, 127 regions exceeding the 2.5 per cent threshold of the empirical posterior distribution were identified as extremely differentiated. In a substantial number (56 out of 127 cases) the extreme FST values were found to be positioned in poor gene content regions which deviated significantly (p < 0.05) from the expectation assuming a random distribution. However, significant FST values were found in regions of some relevant genes such as SMCP and FGF1. Conclusions Overall, 236 regions putatively subject to recent positive selection in the cattle genome were detected. Both |iHS| and FST suggested selection in the vicinity of the Sialic acid binding Ig-like lectin 5 gene on BTA18. This region was recently reported to be a major QTL with strong effects on productive life and fertility traits in Holstein cattle. We conclude that high-resolution genome scans of selection signatures can be used to identify genomic regions contributing to within- and inter-breed phenotypic variation. PMID:21679429
Repeated divergent selection on pigmentation genes in a rapid finch radiation
Campagna, Leonardo; Repenning, Márcio; Silveira, Luís Fábio; Fontana, Carla Suertegaray; Tubaro, Pablo L.; Lovette, Irby J.
2017-01-01
Instances of recent and rapid speciation are suitable for associating phenotypes with their causal genotypes, especially if gene flow homogenizes areas of the genome that are not under divergent selection. We study a rapid radiation of nine sympatric bird species known as capuchino seedeaters, which are differentiated in sexually selected characters of male plumage and song. We sequenced the genomes of a phenotypically diverse set of species to search for differentiated genomic regions. Capuchinos show differences in a small proportion of their genomes, yet selection has acted independently on the same targets in different members of this radiation. Many divergent regions contain genes involved in the melanogenesis pathway, with the strongest signal originating from putative regulatory regions. Selection has acted on these same genomic regions in different lineages, likely shaping the evolution of cis-regulatory elements, which control how more conserved genes are expressed and thereby generate diversity in classically sexually selected traits. PMID:28560331
Doerr, Daniel; Chauve, Cedric
2017-01-01
Yersinia pestis is the causative agent of the bubonic plague, a disease responsible for several dramatic historical pandemics. Progress in ancient DNA (aDNA) sequencing rendered possible the sequencing of whole genomes of important human pathogens, including the ancient Y. pestis strains responsible for outbreaks of the bubonic plague in London in the 14th century and in Marseille in the 18th century, among others. However, aDNA sequencing data are still characterized by short reads and non-uniform coverage, so assembling ancient pathogen genomes remains challenging and often prevents a detailed study of genome rearrangements. It has recently been shown that comparative scaffolding approaches can improve the assembly of ancient Y. pestis genomes at a chromosome level. In the present work, we address the last step of genome assembly, the gap-filling stage. We describe an optimization-based method AGapEs (ancestral gap estimation) to fill in inter-contig gaps using a combination of a template obtained from related extant genomes and aDNA reads. We show how this approach can be used to refine comparative scaffolding by selecting contig adjacencies supported by a mix of unassembled aDNA reads and comparative signal. We applied our method to two Y. pestis data sets from the London and Marseilles outbreaks, for which we obtained highly improved genome assemblies for both genomes, comprised of, respectively, five and six scaffolds with 95 % of the assemblies supported by ancient reads. We analysed the genome evolution between both ancient genomes in terms of genome rearrangements, and observed a high level of synteny conservation between these strains. PMID:29114402
Dissimilarity based Partial Least Squares (DPLS) for genomic prediction from SNPs.
Singh, Priyanka; Engel, Jasper; Jansen, Jeroen; de Haan, Jorn; Buydens, Lutgarde Maria Celina
2016-05-04
Genomic prediction (GP) allows breeders to select plants and animals based on their breeding potential for desirable traits, without lengthy and expensive field trials or progeny testing. We have proposed to use Dissimilarity-based Partial Least Squares (DPLS) for GP. As a case study, we use the DPLS approach to predict Bacterial wilt (BW) in tomatoes using SNPs as predictors. The DPLS approach was compared with the Genomic Best-Linear Unbiased Prediction (GBLUP) and single-SNP regression with SNP as a fixed effect to assess the performance of DPLS. Eight genomic distance measures were used to quantify relationships between the tomato accessions from the SNPs. Subsequently, each of these distance measures was used to predict the BW using the DPLS prediction model. The DPLS model was found to be robust to the choice of distance measures; similar prediction performances were obtained for each distance measure. DPLS greatly outperformed the single-SNP regression approach, showing that BW is a comprehensive trait dependent on several loci. Next, the performance of the DPLS model was compared to that of GBLUP. Although GBLUP and DPLS are conceptually very different, the prediction quality (PQ) measured by DPLS models were similar to the prediction statistics obtained from GBLUP. A considerable advantage of DPLS is that the genotype-phenotype relationship can easily be visualized in a 2-D scatter plot. This so-called score-plot provides breeders an insight to select candidates for their future breeding program. DPLS is a highly appropriate method for GP. The model prediction performance was similar to the GBLUP and far better than the single-SNP approach. The proposed method can be used in combination with a wide range of genomic dissimilarity measures and genotype representations such as allele-count, haplotypes or allele-intensity values. Additionally, the data can be insightfully visualized by the DPLS model, allowing for selection of desirable candidates from the breeding experiments. In this study, we have assessed the DPLS performance on a single trait.
Jiang, Peng; Shi, Feng-Xue; Li, Ming-Rui; Liu, Bao; Wen, Jun; Xiao, Hong-Xing; Li, Lin-Feng
2018-01-01
Panax L. (the ginseng genus) is a shade-demanding group within the family Araliaceae and all of its species are of crucial significance in traditional Chinese medicine. Phylogenetic and biogeographic analyses demonstrated that two rounds of whole genome duplications accompanying with geographic and ecological isolations promoted the diversification of Panax species. However, contributions of the cytoplasmic genomes to the adaptive evolution of Panax species remained largely uninvestigated. In this study, we sequenced the chloroplast and mitochondrial genomes of 11 accessions belonging to seven Panax species. Our results show that heterogeneity in nucleotide substitution rate is abundant in both of the two cytoplasmic genomes, with the mitochondrial genome possessing more variants at the total level but the chloroplast showing higher sequence polymorphisms at the genic regions. Genome-wide scanning of positive selection identified five and 12 genes from the chloroplast and mitochondrial genomes, respectively. Functional analyses further revealed that these selected genes play important roles in plant development, cellular metabolism and adaptation. We therefore conclude that positive selection might be one of the potential evolutionary forces that shaped nucleotide variation pattern of these Panax species. In particular, the mitochondrial genes evolved under stronger selective pressure compared to the chloroplast genes. PMID:29670636
Jiang, Peng; Shi, Feng-Xue; Li, Ming-Rui; Liu, Bao; Wen, Jun; Xiao, Hong-Xing; Li, Lin-Feng
2018-01-01
Panax L. (the ginseng genus) is a shade-demanding group within the family Araliaceae and all of its species are of crucial significance in traditional Chinese medicine. Phylogenetic and biogeographic analyses demonstrated that two rounds of whole genome duplications accompanying with geographic and ecological isolations promoted the diversification of Panax species. However, contributions of the cytoplasmic genomes to the adaptive evolution of Panax species remained largely uninvestigated. In this study, we sequenced the chloroplast and mitochondrial genomes of 11 accessions belonging to seven Panax species. Our results show that heterogeneity in nucleotide substitution rate is abundant in both of the two cytoplasmic genomes, with the mitochondrial genome possessing more variants at the total level but the chloroplast showing higher sequence polymorphisms at the genic regions. Genome-wide scanning of positive selection identified five and 12 genes from the chloroplast and mitochondrial genomes, respectively. Functional analyses further revealed that these selected genes play important roles in plant development, cellular metabolism and adaptation. We therefore conclude that positive selection might be one of the potential evolutionary forces that shaped nucleotide variation pattern of these Panax species. In particular, the mitochondrial genes evolved under stronger selective pressure compared to the chloroplast genes.
Genomic signatures of selection at linked sites: unifying the disparity among species
Cutter, Asher D.; Payseur, Bret A.
2014-01-01
Population genetics theory supplies powerful predictions about how natural selection interacts with genetic linkage to sculpt the genomic landscape of nucleotide polymorphism. Both the spread of beneficial mutations and removal of deleterious mutations act to depress polymorphism levels, especially in low-recombination regions. However, empiricists have documented extreme disparities among species. Here we characterize the dominant features that could drive variation in linked selection among species, including roles for selective sweeps being ‘hard’ or ‘soft’, and concealing by demography and genomic confounds. We advocate targeted studies of close relatives to unify our understanding of how selection and linkage interact to shape genome evolution. PMID:23478346
High-throughput analysis of T-DNA location and structure using sequence capture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Inagaki, Soichi; Henry, Isabelle M.; Lieberman, Meric C.
Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA—genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously,more » using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. As a result, our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.« less
High-throughput analysis of T-DNA location and structure using sequence capture
Inagaki, Soichi; Henry, Isabelle M.; Lieberman, Meric C.; ...
2015-10-07
Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA—genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously,more » using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. As a result, our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.« less
Zhao, Zhongming; Liu, Zhandong; Chen, Ken; Guo, Yan; Allen, Genevera I; Zhang, Jiajie; Jim Zheng, W; Ruan, Jianhua
2017-10-03
In this editorial, we first summarize the 2016 International Conference on Intelligent Biology and Medicine (ICIBM 2016) that was held on December 8-10, 2016 in Houston, Texas, USA, and then briefly introduce the ten research articles included in this supplement issue. ICIBM 2016 included four workshops or tutorials, four keynote lectures, four conference invited talks, eight concurrent scientific sessions and a poster session for 53 accepted abstracts, covering current topics in bioinformatics, systems biology, intelligent computing, and biomedical informatics. Through our call for papers, a total of 77 original manuscripts were submitted to ICIBM 2016. After peer review, 11 articles were selected in this special issue, covering topics such as single cell RNA-seq analysis method, genome sequence and variation analysis, bioinformatics method for vaccine development, and cancer genomics.
Mattagajasingh, Ilwola; Mukherjee, Arup Kumar; Das, Premananda
2006-01-01
Thirty-one species of Mammillaria were selected to study the molecular phylogeny using random amplified polymorphic DNA (RAPD) markers. High amount of mucilage (gelling polysaccharides) present in Mammillaria was a major obstacle in isolating good quality genomic DNA. The CTAB (cetyl trimethyl ammonium bromide) method was modified to obtain good quality genomic DNA. Twenty-two random decamer primers resulted in 621 bands, all of which were polymorphic. The similarity matrix value varied from 0.109 to 0.622 indicating wide variability among the studied species. The dendrogram obtained from the unweighted pair group method using arithmetic averages (UPGMA) analysis revealed that some of the species did not follow the conventional classification. The present work shows the usefulness of RAPD markers for genetic characterization to establish phylogenetic relations among Mammillaria species.
TEAM Webinar Series | EGRP/DCCPS/NCI/NIH
View archived webinars from the Transforming Epidemiology through Advanced Methods (TEAM) Webinar Series, hosted by NCI's Epidemiology and Genomics Research Program. Topics include participant engagement, data coordination, mHealth tools, sample selection, and instruments for diet & physical activity assessment.
The development of genomics applied to dairy breeding
USDA-ARS?s Scientific Manuscript database
Genomic selection (GS) has profoundly changed dairy cattle breeding in the last decade and can be defined as the use of genomic breeding values (GEBV) in selection programs. The GEBV is the sum of the effects of dense DNA markers across the whole genome, capturing all the quantitative trait loci (QT...
Evolutionary Quantitative Genomics of Populus trichocarpa
McKown, Athena D.; La Mantia, Jonathan; Guy, Robert D.; Ingvarsson, Pär K.; Hamelin, Richard; Mansfield, Shawn D.; Ehlting, Jürgen; Douglas, Carl J.; El-Kassaby, Yousry A.
2015-01-01
Forest trees generally show high levels of local adaptation and efforts focusing on understanding adaptation to climate will be crucial for species survival and management. Here, we address fundamental questions regarding the molecular basis of adaptation in undomesticated forest tree populations to past climatic environments by employing an integrative quantitative genetics and landscape genomics approach. Using this comprehensive approach, we studied the molecular basis of climate adaptation in 433 Populus trichocarpa (black cottonwood) genotypes originating across western North America. Variation in 74 field-assessed traits (growth, ecophysiology, phenology, leaf stomata, wood, and disease resistance) was investigated for signatures of selection (comparing Q ST -F ST) using clustering of individuals by climate of origin (temperature and precipitation). 29,354 SNPs were investigated employing three different outlier detection methods and marker-inferred relatedness was estimated to obtain the narrow-sense estimate of population differentiation in wild populations. In addition, we compared our results with previously assessed selection of candidate SNPs using the 25 topographical units (drainages) across the P. trichocarpa sampling range as population groupings. Narrow-sense Q ST for 53% of distinct field traits was significantly divergent from expectations of neutrality (indicating adaptive trait variation); 2,855 SNPs showed signals of diversifying selection and of these, 118 SNPs (within 81 genes) were associated with adaptive traits (based on significant Q ST). Many SNPs were putatively pleiotropic for functionally uncorrelated adaptive traits, such as autumn phenology, height, and disease resistance. Evolutionary quantitative genomics in P. trichocarpa provides an enhanced understanding regarding the molecular basis of climate-driven selection in forest trees and we highlight that important loci underlying adaptive trait variation also show relationship to climate of origin. We consider our approach the most comprehensive, as it uncovers the molecular mechanisms of adaptation using multiple methods and tests. We also provide a detailed outline of the required analyses for studying adaptation to the environment in a population genomics context to better understand the species’ potential adaptive capacity to future climatic scenarios. PMID:26599762
Khlestkina, E K; Shumny, V K
2016-07-01
Integration of the methods of contemporary genetics and biotechnology into the breeding process is assessed, and the potential role and efficacy of genome editing as a novel approach is discussed. Use of molecular (DNA) markers for breeding was proposed more than 30 years ago. Nowadays, they are widely used as an accessory tool in order to select plants by mono- and olygogenic traits. Presently, the genomic approaches are actively introduced into the breeding processes owing to automatization of DNA polymorphism analyses and development of comparatively cheap methods of DNA sequencing. These approaches provide effective selection by complex quantitative traits, and are based on the full-genome genotyping of the breeding material. Moreover, biotechnological tools, such as doubled haploids production, which provides fast obtainment of homozygotes, are widely used in plant breeding. Use of genomic and biotechnological approaches makes the development of varieties less time consuming. It also decreases the cultivated areas and financial expenditures required for accomplishment of the breeding process. However, the capacities of modern breeding are not limited to only these advantages. Experiments carried out on plants about 10 years ago provided the first data on genome editing. In the last two years, we have observed a sharp increase in the number of publications that report about successful experiments aimed at plant genome editing owing to the use of the relatively simple and convenient CRISPR/Cas9 system. The goal of some of these experiments was to modify agriculturally valuable genes of cultivated plants, such as potato, cabbage, tomato, maize, rice, wheat, barley, soybean and sorghum. These studies show that it is possible to obtain nontransgenic plants carrying stably inherited, specifically determined mutations using the CRISPR/Cas9 system. This possibility offers the challenge to obtain varieties with predetermined mono- and olygogenic traits.
Prospects and Potential Uses of Genomic Prediction of Key Performance Traits in Tetraploid Potato.
Stich, Benjamin; Van Inghelandt, Delphine
2018-01-01
Genomic prediction is a routine tool in breeding programs of most major animal and plant species. However, its usefulness for potato breeding has not yet been evaluated in detail. The objectives of this study were to (i) examine the prospects of genomic prediction of key performance traits in a diversity panel of tetraploid potato modeling additive, dominance, and epistatic effects, (ii) investigate the effects of size and make up of training set, number of test environments and molecular markers on prediction accuracy, and (iii) assess the effect of including markers from candidate genes on the prediction accuracy. With genomic best linear unbiased prediction (GBLUP), BayesA, BayesCπ, and Bayesian LASSO, four different prediction methods were used for genomic prediction of relative area under disease progress curve after a Phytophthora infestans infection, plant maturity, maturity corrected resistance, tuber starch content, tuber starch yield (TSY), and tuber yield (TY) of 184 tetraploid potato clones or subsets thereof genotyped with the SolCAP 8.3k SNP array. The cross-validated prediction accuracies with GBLUP and the three Bayesian approaches for the six evaluated traits ranged from about 0.5 to about 0.8. For traits with a high expected genetic complexity, such as TSY and TY, we observed an 8% higher prediction accuracy using a model with additive and dominance effects compared with a model with additive effects only. Our results suggest that for oligogenic traits in general and when diagnostic markers are available in particular, the use of Bayesian methods for genomic prediction is highly recommended and that the diagnostic markers should be modeled as fixed effects. The evaluation of the relative performance of genomic prediction vs. phenotypic selection indicated that the former is superior, assuming cycle lengths and selection intensities that are possible to realize in commercial potato breeding programs.
Prospects and Potential Uses of Genomic Prediction of Key Performance Traits in Tetraploid Potato
Stich, Benjamin; Van Inghelandt, Delphine
2018-01-01
Genomic prediction is a routine tool in breeding programs of most major animal and plant species. However, its usefulness for potato breeding has not yet been evaluated in detail. The objectives of this study were to (i) examine the prospects of genomic prediction of key performance traits in a diversity panel of tetraploid potato modeling additive, dominance, and epistatic effects, (ii) investigate the effects of size and make up of training set, number of test environments and molecular markers on prediction accuracy, and (iii) assess the effect of including markers from candidate genes on the prediction accuracy. With genomic best linear unbiased prediction (GBLUP), BayesA, BayesCπ, and Bayesian LASSO, four different prediction methods were used for genomic prediction of relative area under disease progress curve after a Phytophthora infestans infection, plant maturity, maturity corrected resistance, tuber starch content, tuber starch yield (TSY), and tuber yield (TY) of 184 tetraploid potato clones or subsets thereof genotyped with the SolCAP 8.3k SNP array. The cross-validated prediction accuracies with GBLUP and the three Bayesian approaches for the six evaluated traits ranged from about 0.5 to about 0.8. For traits with a high expected genetic complexity, such as TSY and TY, we observed an 8% higher prediction accuracy using a model with additive and dominance effects compared with a model with additive effects only. Our results suggest that for oligogenic traits in general and when diagnostic markers are available in particular, the use of Bayesian methods for genomic prediction is highly recommended and that the diagnostic markers should be modeled as fixed effects. The evaluation of the relative performance of genomic prediction vs. phenotypic selection indicated that the former is superior, assuming cycle lengths and selection intensities that are possible to realize in commercial potato breeding programs. PMID:29563919
Schielzeth, Holger; Streitner, Corinna; Lampe, Ulrike; Franzke, Alexandra; Reinhold, Klaus
2014-12-01
Genome size is largely uncorrelated to organismal complexity and adaptive scenarios. Genetic drift as well as intragenomic conflict have been put forward to explain this observation. We here study the impact of genome size on sexual attractiveness in the bow-winged grasshopper Chorthippus biguttulus. Grasshoppers show particularly large variation in genome size due to the high prevalence of supernumerary chromosomes that are considered (mildly) selfish, as evidenced by non-Mendelian inheritance and fitness costs if present in high numbers. We ranked male grasshoppers by song characteristics that are known to affect female preferences in this species and scored genome sizes of attractive and unattractive individuals from the extremes of this distribution. We find that attractive singers have significantly smaller genomes, demonstrating that genome size is reflected in male courtship songs and that females prefer songs of males with small genomes. Such a genome size dependent mate preference effectively selects against selfish genetic elements that tend to increase genome size. The data therefore provide a novel example of how sexual selection can reinforce natural selection and can act as an agent in an intragenomic arms race. Furthermore, our findings indicate an underappreciated route of how choosy females could gain indirect benefits. © 2014 The Author(s). Evolution © 2014 The Society for the Study of Evolution.
Identifying Bacterial Immune Evasion Proteins Using Phage Display.
Fevre, Cindy; Scheepmaker, Lisette; Haas, Pieter-Jan
2017-01-01
Methods aimed at identification of immune evasion proteins are mainly rely on in silico prediction of sequence, structural homology to known evasion proteins or use a proteomics driven approach. Although proven successful these methods are limited by a low efficiency and or lack of functional identification. Here we describe a high-throughput genomic strategy to functionally identify bacterial immune evasion proteins using phage display technology. Genomic bacterial DNA is randomly fragmented and ligated into a phage display vector that is used to create a phage display library expressing bacterial secreted and membrane bound proteins. This library is used to select displayed bacterial secretome proteins that interact with host immune components.
Population Genomics of Fungal and Oomycete Pathogens.
Grünwald, Niklaus J; McDonald, Bruce A; Milgroom, Michael G
2016-08-04
We are entering a new era in plant pathology in which whole-genome sequences of many individuals of a pathogen species are becoming readily available. Population genomics aims to discover genetic mechanisms underlying phenotypes associated with adaptive traits such as pathogenicity, virulence, fungicide resistance, and host specialization, as genome sequences or large numbers of single nucleotide polymorphisms become readily available from multiple individuals of the same species. This emerging field encompasses detailed genetic analyses of natural populations, comparative genomic analyses of closely related species, identification of genes under selection, and linkage analyses involving association studies in natural populations or segregating populations resulting from crosses. The era of pathogen population genomics will provide new opportunities and challenges, requiring new computational and analytical tools. This review focuses on conceptual and methodological issues as well as the approaches to answering questions in population genomics. The major steps start with defining relevant biological and evolutionary questions, followed by sampling, genotyping, and phenotyping, and ending in analytical methods and interpretations. We provide examples of recent applications of population genomics to fungal and oomycete plant pathogens.
Veerkamp, Roel F; Bouwman, Aniek C; Schrooten, Chris; Calus, Mario P L
2016-12-01
Whole-genome sequence data is expected to capture genetic variation more completely than common genotyping panels. Our objective was to compare the proportion of variance explained and the accuracy of genomic prediction by using imputed sequence data or preselected SNPs from a genome-wide association study (GWAS) with imputed whole-genome sequence data. Phenotypes were available for 5503 Holstein-Friesian bulls. Genotypes were imputed up to whole-genome sequence (13,789,029 segregating DNA variants) by using run 4 of the 1000 bull genomes project. The program GCTA was used to perform GWAS for protein yield (PY), somatic cell score (SCS) and interval from first to last insemination (IFL). From the GWAS, subsets of variants were selected and genomic relationship matrices (GRM) were used to estimate the variance explained in 2087 validation animals and to evaluate the genomic prediction ability. Finally, two GRM were fitted together in several models to evaluate the effect of selected variants that were in competition with all the other variants. The GRM based on full sequence data explained only marginally more genetic variation than that based on common SNP panels: for PY, SCS and IFL, genomic heritability improved from 0.81 to 0.83, 0.83 to 0.87 and 0.69 to 0.72, respectively. Sequence data also helped to identify more variants linked to quantitative trait loci and resulted in clearer GWAS peaks across the genome. The proportion of total variance explained by the selected variants combined in a GRM was considerably smaller than that explained by all variants (less than 0.31 for all traits). When selected variants were used, accuracy of genomic predictions decreased and bias increased. Although 35 to 42 variants were detected that together explained 13 to 19% of the total variance (18 to 23% of the genetic variance) when fitted alone, there was no advantage in using dense sequence information for genomic prediction in the Holstein data used in our study. Detection and selection of variants within a single breed are difficult due to long-range linkage disequilibrium. Stringent selection of variants resulted in more biased genomic predictions, although this might be due to the training population being the same dataset from which the selected variants were identified.
Breeding and Genetics Symposium: networks and pathways to guide genomic selection.
Snelling, W M; Cushman, R A; Keele, J W; Maltecca, C; Thomas, M G; Fortes, M R S; Reverter, A
2013-02-01
Many traits affecting profitability and sustainability of meat, milk, and fiber production are polygenic, with no single gene having an overwhelming influence on observed variation. No knowledge of the specific genes controlling these traits has been needed to make substantial improvement through selection. Significant gains have been made through phenotypic selection enhanced by pedigree relationships and continually improving statistical methodology. Genomic selection, recently enabled by assays for dense SNP located throughout the genome, promises to increase selection accuracy and accelerate genetic improvement by emphasizing the SNP most strongly correlated to phenotype although the genes and sequence variants affecting phenotype remain largely unknown. These genomic predictions theoretically rely on linkage disequilibrium (LD) between genotyped SNP and unknown functional variants, but familial linkage may increase effectiveness when predicting individuals related to those in the training data. Genomic selection with functional SNP genotypes should be less reliant on LD patterns shared by training and target populations, possibly allowing robust prediction across unrelated populations. Although the specific variants causing polygenic variation may never be known with certainty, a number of tools and resources can be used to identify those most likely to affect phenotype. Associations of dense SNP genotypes with phenotype provide a 1-dimensional approach for identifying genes affecting specific traits; in contrast, associations with multiple traits allow defining networks of genes interacting to affect correlated traits. Such networks are especially compelling when corroborated by existing functional annotation and established molecular pathways. The SNP occurring within network genes, obtained from public databases or derived from genome and transcriptome sequences, may be classified according to expected effects on gene products. As illustrated by functionally informed genomic predictions being more accurate than naive whole-genome predictions of beef tenderness, coupling evidence from livestock genotypes, phenotypes, gene expression, and genomic variants with existing knowledge of gene functions and interactions may provide greater insight into the genes and genomic mechanisms affecting polygenic traits and facilitate functional genomic selection for economically important traits.
Signatures of natural selection and ecological differentiation in microbial genomes.
Shapiro, B Jesse
2014-01-01
We live in a microbial world. Most of the genetic and metabolic diversity that exists on earth - and has existed for billions of years - is microbial. Making sense of this vast diversity is a daunting task, but one that can be approached systematically by analyzing microbial genome sequences. This chapter explores how the evolutionary forces of recombination and selection act to shape microbial genome sequences, leaving signatures that can be detected using comparative genomics and population-genetic tests for selection. I describe the major classes of tests, paying special attention to their relative strengths and weaknesses when applied to microbes. Specifically, I apply a suite of tests for selection to a set of closely-related bacterial genomes with different microhabitat preferences within the marine water column, shedding light on the genomic mechanisms of ecological differentiation in the wild. I will focus on the joint problem of simultaneously inferring the boundaries between microbial populations, and the selective forces operating within and between populations.
Nishida, I; Sugiura, M; Enju, A; Nakamura, M
2000-12-01
A new isogene for acyl-(acyl-carrier-protein):glycerol-3-phosphate acyltransferase (GPAT; EC 2.3.1.15) in squash has been cloned and the gene product was identified as oleate-selective GPAT. Using PCR primers that could hybridise with exons for a previously cloned squash GPAT, we obtained two PCR products of different size: one coded for a previously cloned squash GPAT corresponding to non-selective isoforms AT2 and AT3, and the other for a new isozyme, probably the oleate-selective isoform AT1. Full-length amino acid sequences of respective isozymes were deduced from the nucleotide sequences of genomic genes and cDNAs, which were cloned by a series of PCR-based methods. Thus, we designated the new gene CmATS1;1 and the other one CmATS1;2. Genome blot analysis revealed that the squash genome contained the two isogenes at non-allelic loci. AT1-active fractions were partially purified, and three polypeptide bands were identified as being AT1 polypeptides, which exhibited relative molecular masses of 39.5-40.5 kDa, pI values of 6.75-7.15, and oleate selectivity over palmitate. Partial amino-terminal sequences obtained from two of these bands verified that the new isogene codes for AT1 polypeptides.
Optimal marker-assisted selection to increase the effective size of small populations.
Wang, J
2001-02-01
An approach to the optimal utilization of marker and pedigree information in minimizing the rates of inbreeding and genetic drift at the average locus of the genome (not just the marked loci) in a small diploid population is proposed, and its efficiency is investigated by stochastic simulations. The approach is based on estimating the expected pedigree of each chromosome by using marker and individual pedigree information and minimizing the average coancestry of selected chromosomes by quadratic integer programming. It is shown that the approach is much more effective and much less computer demanding in implementation than previous ones. For pigs with 10 offspring per mother genotyped for two markers (each with four alleles at equal initial frequency) per chromosome of 100 cM, the approach can increase the average effective size for the whole genome by approximately 40 and 55% if mating ratios (the number of females mated with a male) are 3 and 12, respectively, compared with the corresponding values obtained by optimizing between-family selection using pedigree information only. The efficiency of the marker-assisted selection method increases with increasing amount of marker information (number of markers per chromosome, heterozygosity per marker) and family size, but decreases with increasing genome size. For less prolific species, the approach is still effective if the mating ratio is large so that a high marker-assisted selection pressure on the rarer sex can be maintained.
Da, Yang
2015-12-18
The amount of functional genomic information has been growing rapidly but remains largely unused in genomic selection. Genomic prediction and estimation using haplotypes in genome regions with functional elements such as all genes of the genome can be an approach to integrate functional and structural genomic information for genomic selection. Towards this goal, this article develops a new haplotype approach for genomic prediction and estimation. A multi-allelic haplotype model treating each haplotype as an 'allele' was developed for genomic prediction and estimation based on the partition of a multi-allelic genotypic value into additive and dominance values. Each additive value is expressed as a function of h - 1 additive effects, where h = number of alleles or haplotypes, and each dominance value is expressed as a function of h(h - 1)/2 dominance effects. For a sample of q individuals, the limit number of effects is 2q - 1 for additive effects and is the number of heterozygous genotypes for dominance effects. Additive values are factorized as a product between the additive model matrix and the h - 1 additive effects, and dominance values are factorized as a product between the dominance model matrix and the h(h - 1)/2 dominance effects. Genomic additive relationship matrix is defined as a function of the haplotype model matrix for additive effects, and genomic dominance relationship matrix is defined as a function of the haplotype model matrix for dominance effects. Based on these results, a mixed model implementation for genomic prediction and variance component estimation that jointly use haplotypes and single markers is established, including two computing strategies for genomic prediction and variance component estimation with identical results. The multi-allelic genetic partition fills a theoretical gap in genetic partition by providing general formulations for partitioning multi-allelic genotypic values and provides a haplotype method based on the quantitative genetics model towards the utilization of functional and structural genomic information for genomic prediction and estimation.
Abe, Kiyomi; Oshima, Masao; Akasaka, Maiko; Konagaya, Ken-Ichi; Nanasato, Yoshihiko; Okuzaki, Ayako; Taniguchi, Yojiro; Tanaka, Junichi; Tabei, Yutaka
2018-03-01
Genomic selection is attracting attention in the field of crop breeding. To apply genomic selection effectively for autogamous (self-pollinating) crops, an efficient outcross system is desired. Since dominant male sterility is a powerful tool for easy and successive outcross of autogamous crops, we developed transgenic dominant male sterile rice ( Oryza sativa L.) using the barnase gene that is expressed by the tapetum-specific promoter BoA9 . Barnase -induced male sterile rice No. 10 (BMS10) was selected for its stable male sterility and normal growth characteristics. The BMS10 flowering habits, including heading date, flowering date, and daily flowering time of BMS10 tended to be delayed compared to wild type. When BMS10 and wild type were placed side-by-side and crossed under an open-pollinating condition, the seed-setting rate was <1.5%. When the clipping method was used to avoid the influence of late flowering habits, the seed-setting rate of BMS10 increased to a maximum of 86.4%. Although flowering synchronicity should be improved to increase the seed-setting rate, our results showed that this system can produce stable transgenic male sterility with normal female fertility in rice. The transgenic male sterile rice would promote a genomic selection-based breeding system in rice.
Watanabe, Takahito; Noji, Sumihare; Mito, Taro
2016-01-01
Hemimetabolous, or incompletely metamorphosing, insects are phylogenetically basal. These insects include many deleterious species. The cricket, Gryllus bimaculatus, is an emerging model for hemimetabolous insects, based on the success of RNA interference (RNAi)-based gene-functional analyses and transgenic technology. Taking advantage of genome-editing technologies in this species would greatly promote functional genomics studies. Genome editing using transcription activator-like effector nucleases (TALENs) has proven to be an effective method for site-specific genome manipulation in various species. TALENs are artificial nucleases that are capable of inducing DNA double-strand breaks into specified target sequences. Here, we describe a protocol for TALEN-based gene knockout in G. bimaculatus, including a mutant selection scheme via mutation detection assays, for generating homozygous knockout organisms.
Application of single-step genomic evaluation for crossbred performance in pig.
Xiang, T; Nielsen, B; Su, G; Legarra, A; Christensen, O F
2016-03-01
Crossbreding is predominant and intensively used in commercial meat production systems, especially in poultry and swine. Genomic evaluation has been successfully applied for breeding within purebreds but also offers opportunities of selecting purebreds for crossbred performance by combining information from purebreds with information from crossbreds. However, it generally requires that all relevant animals are genotyped, which is costly and presently does not seem to be feasible in practice. Recently, a novel single-step BLUP method for genomic evaluation of both purebred and crossbred performance has been developed that can incorporate marker genotypes into a traditional animal model. This new method has not been validated in real data sets. In this study, we applied this single-step method to analyze data for the maternal trait of total number of piglets born in Danish Landrace, Yorkshire, and two-way crossbred pigs in different scenarios. The genetic correlation between purebred and crossbred performances was investigated first, and then the impact of (crossbred) genomic information on prediction reliability for crossbred performance was explored. The results confirm the existence of a moderate genetic correlation, and it was seen that the standard errors on the estimates were reduced when including genomic information. Models with marker information, especially crossbred genomic information, improved model-based reliabilities for crossbred performance of purebred boars and also improved the predictive ability for crossbred animals and, to some extent, reduced the bias of prediction. We conclude that the new single-step BLUP method is a good tool in the genetic evaluation for crossbred performance in purebred animals.
Fuentes-Pardo, Angela P; Ruzzante, Daniel E
2017-10-01
Whole-genome resequencing (WGR) is a powerful method for addressing fundamental evolutionary biology questions that have not been fully resolved using traditional methods. WGR includes four approaches: the sequencing of individuals to a high depth of coverage with either unresolved or resolved haplotypes, the sequencing of population genomes to a high depth by mixing equimolar amounts of unlabelled-individual DNA (Pool-seq) and the sequencing of multiple individuals from a population to a low depth (lcWGR). These techniques require the availability of a reference genome. This, along with the still high cost of shotgun sequencing and the large demand for computing resources and storage, has limited their implementation in nonmodel species with scarce genomic resources and in fields such as conservation biology. Our goal here is to describe the various WGR methods, their pros and cons and potential applications in conservation biology. WGR offers an unprecedented marker density and surveys a wide diversity of genetic variations not limited to single nucleotide polymorphisms (e.g., structural variants and mutations in regulatory elements), increasing their power for the detection of signatures of selection and local adaptation as well as for the identification of the genetic basis of phenotypic traits and diseases. Currently, though, no single WGR approach fulfils all requirements of conservation genetics, and each method has its own limitations and sources of potential bias. We discuss proposed ways to minimize such biases. We envision a not distant future where the analysis of whole genomes becomes a routine task in many nonmodel species and fields including conservation biology. © 2017 John Wiley & Sons Ltd.
2013-01-01
Background Artificial selection played an important role in the origin of modern Glycine max cultivars from the wild soybean Glycine soja. To elucidate the consequences of artificial selection accompanying the domestication and modern improvement of soybean, 25 new and 30 published whole-genome re-sequencing accessions, which represent wild, domesticated landrace, and Chinese elite soybean populations were analyzed. Results A total of 5,102,244 single nucleotide polymorphisms (SNPs) and 707,969 insertion/deletions were identified. Among the SNPs detected, 25.5% were not described previously. We found that artificial selection during domestication led to more pronounced reduction in the genetic diversity of soybean than the switch from landraces to elite cultivars. Only a small proportion (2.99%) of the whole genomic regions appear to be affected by artificial selection for preferred agricultural traits. The selection regions were not distributed randomly or uniformly throughout the genome. Instead, clusters of selection hotspots in certain genomic regions were observed. Moreover, a set of candidate genes (4.38% of the total annotated genes) significantly affected by selection underlying soybean domestication and genetic improvement were identified. Conclusions Given the uniqueness of the soybean germplasm sequenced, this study drew a clear picture of human-mediated evolution of the soybean genomes. The genomic resources and information provided by this study would also facilitate the discovery of genes/loci underlying agronomically important traits. PMID:23984715
Future animal improvement programs applied to global populations
USDA-ARS?s Scientific Manuscript database
Breeding programs evolved gradually from within-herd phenotypic selection to local and regional cooperatives to national evaluations and now international evaluations. In the future, breeders may adapt reproductive, computational, and genomic methods to global populations as easily as with national ...
Ecological genomics of natural plant populations: the Israeli perspective.
Nevo, Eviatar
2009-01-01
The genomic era revolutionized evolutionary population biology. The ecological genomics of the wild progenitors of wheat and barley reviewed here was central in the research program of the Institute of Evolution, University of Haifa, since 1975 ( http://evolution.haifa.ac.il ). We explored the following questions: (1) How much of the genomic and phenomic diversity of wild progenitors of cultivars (wild emmer wheat, Triticum dicoccoides, the progenitor of most wheat, plus wild relatives of the Aegilops species; wild barley, Hordeum spontaneum, the progenitor of cultivated barley; wild oat, Avena sterilis, the progenitor of cultivated oats; and wild lettuce species, Lactuca, the progenitor and relatives of cultivated lettuce) are adaptive and processed by natural selection at both coding and noncoding genomic regions? (2) What is the origin and evolution of genomic adaptation and speciation processes and their regulation by mutation, recombination, and transposons under spatiotemporal variables and stressful macrogeographic and microgeographic environments? (3) How much genetic resources are harbored in the wild progenitors for crop improvement? We advanced ecological genetics into ecological genomics and analyzed (regionally across Israel and the entire Near East Fertile Crescent and locally at microsites, focusing on the "Evolution Canyon" model) hundreds of populations and thousands of genotypes for protein (allozyme) and deoxyribonucleic acid (DNA) (coding and noncoding) diversity, partly combined with phenotypic diversity. The environmental stresses analyzed included abiotic (climatic and microclimatic, edaphic) and biotic (pathogens, demographic) stresses. Recently, we introduced genetic maps, cloning, and transformation of candidate genes. Our results indicate abundant genotypic and phenotypic diversity in natural plant populations. The organization and evolution of molecular and organismal diversity in plant populations, at all genomic regions and geographical scales, are nonrandom and are positively correlated with, and partly predictable by, abiotic and biotic environmental heterogeneity and stress. Biodiversity evolution, even in small isolated populations, is primarily driven by natural selection including diversifying, balancing, cyclical, and purifying selection regimes interacting with, but, ultimately, overriding the effects of mutation, migration, and stochasticity. The progenitors of cultivated plants harbor rich genetic resources and are the best hope for crop improvement by both classical and modern biotechnological methods. Future studies should focus on the interplay between structural and functional genome organization focusing on gene regulation.
Evolution and the complexity of bacteriophages.
Serwer, Philip
2007-03-13
The genomes of both long-genome (> 200 Kb) bacteriophages and long-genome eukaryotic viruses have cellular gene homologs whose selective advantage is not explained. These homologs add genomic and possibly biochemical complexity. Understanding their significance requires a definition of complexity that is more biochemically oriented than past empirically based definitions. Initially, I propose two biochemistry-oriented definitions of complexity: either decreased randomness or increased encoded information that does not serve immediate needs. Then, I make the assumption that these two definitions are equivalent. This assumption and recent data lead to the following four-part hypothesis that explains the presence of cellular gene homologs in long bacteriophage genomes and also provides a pathway for complexity increases in prokaryotic cells: (1) Prokaryotes underwent evolutionary increases in biochemical complexity after the eukaryote/prokaryote splits. (2) Some of the complexity increases occurred via multi-step, weak selection that was both protected from strong selection and accelerated by embedding evolving cellular genes in the genomes of bacteriophages and, presumably, also archaeal viruses (first tier selection). (3) The mechanisms for retaining cellular genes in viral genomes evolved under additional, longer-term selection that was stronger (second tier selection). (4) The second tier selection was based on increased access by prokaryotic cells to improved biochemical systems. This access was achieved when DNA transfer moved to prokaryotic cells both the more evolved genes and their more competitive and complex biochemical systems. I propose testing this hypothesis by controlled evolution in microbial communities to (1) determine the effects of deleting individual cellular gene homologs on the growth and evolution of long genome bacteriophages and hosts, (2) find the environmental conditions that select for the presence of cellular gene homologs, (3) determine which, if any, bacteriophage genes were selected for maintaining the homologs and (4) determine the dynamics of homolog evolution. This hypothesis is an explanation of evolutionary leaps in general. If accurate, it will assist both understanding and influencing the evolution of microbes and their communities. Analysis of evolutionary complexity increase for at least prokaryotes should include analysis of genomes of long-genome bacteriophages.
Genomics Assisted Ancestry Deconvolution in Grape
Sawler, Jason; Reisch, Bruce; Aradhya, Mallikarjuna K.; Prins, Bernard; Zhong, Gan-Yuan; Schwaninger, Heidi; Simon, Charles; Buckler, Edward; Myles, Sean
2013-01-01
The genus Vitis (the grapevine) is a group of highly diverse, diploid woody perennial vines consisting of approximately 60 species from across the northern hemisphere. It is the world’s most valuable horticultural crop with ~8 million hectares planted, most of which is processed into wine. To gain insights into the use of wild Vitis species during the past century of interspecific grape breeding and to provide a foundation for marker-assisted breeding programmes, we present a principal components analysis (PCA) based ancestry estimation method to calculate admixture proportions of hybrid grapes in the United States Department of Agriculture grape germplasm collection using genome-wide polymorphism data. We find that grape breeders have backcrossed to both the domesticated V. vinifera and wild Vitis species and that reasonably accurate genome-wide ancestry estimation can be performed on interspecific Vitis hybrids using a panel of fewer than 50 ancestry informative markers (AIMs). We compare measures of ancestry informativeness used in selecting SNP panels for two-way admixture estimation, and verify the accuracy of our method on simulated populations of admixed offspring. Our method of ancestry deconvolution provides a first step towards selection at the seed or seedling stage for desirable admixture profiles, which will facilitate marker-assisted breeding that aims to introgress traits from wild Vitis species while retaining the desirable characteristics of elite V. vinifera cultivars. PMID:24244717
Genomic Signature of Kin Selection in an Ant with Obligately Sterile Workers
Warner, Michael R.; Mikheyev, Alexander S.
2017-01-01
Abstract Kin selection is thought to drive the evolution of cooperation and conflict, but the specific genes and genome-wide patterns shaped by kin selection are unknown. We identified thousands of genes associated with the sterile ant worker caste, the archetype of an altruistic phenotype shaped by kin selection, and then used population and comparative genomic approaches to study patterns of molecular evolution at these genes. Consistent with population genetic theoretical predictions, worker-upregulated genes experienced reduced selection compared with genes upregulated in reproductive castes. Worker-upregulated genes included more taxonomically restricted genes, indicating that the worker caste has recruited more novel genes, yet these genes also experienced reduced selection. Our study identifies a putative genomic signature of kin selection and helps to integrate emerging sociogenomic data with longstanding social evolution theory. PMID:28419349
Prediction-Oriented Marker Selection (PROMISE): With Application to High-Dimensional Regression.
Kim, Soyeon; Baladandayuthapani, Veerabhadran; Lee, J Jack
2017-06-01
In personalized medicine, biomarkers are used to select therapies with the highest likelihood of success based on an individual patient's biomarker/genomic profile. Two goals are to choose important biomarkers that accurately predict treatment outcomes and to cull unimportant biomarkers to reduce the cost of biological and clinical verifications. These goals are challenging due to the high dimensionality of genomic data. Variable selection methods based on penalized regression (e.g., the lasso and elastic net) have yielded promising results. However, selecting the right amount of penalization is critical to simultaneously achieving these two goals. Standard approaches based on cross-validation (CV) typically provide high prediction accuracy with high true positive rates but at the cost of too many false positives. Alternatively, stability selection (SS) controls the number of false positives, but at the cost of yielding too few true positives. To circumvent these issues, we propose prediction-oriented marker selection (PROMISE), which combines SS with CV to conflate the advantages of both methods. Our application of PROMISE with the lasso and elastic net in data analysis shows that, compared to CV, PROMISE produces sparse solutions, few false positives, and small type I + type II error, and maintains good prediction accuracy, with a marginal decrease in the true positive rates. Compared to SS, PROMISE offers better prediction accuracy and true positive rates. In summary, PROMISE can be applied in many fields to select regularization parameters when the goals are to minimize false positives and maximize prediction accuracy.
A decade of pig genome sequencing: a window on pig domestication and evolution.
Groenen, Martien A M
2016-03-29
Insight into how genomes change and adapt due to selection addresses key questions in evolutionary biology and in domestication of animals and plants by humans. In that regard, the pig and its close relatives found in Africa and Eurasia represent an excellent group of species that enables studies of the effect of both natural and human-mediated selection on the genome. The recent completion of the draft genome sequence of a domestic pig and the development of next-generation sequencing technology during the past decade have created unprecedented possibilities to address these questions in great detail. In this paper, I review recent whole-genome sequencing studies in the pig and closely-related species that provide insight into the demography, admixture and selection of these species and, in particular, how domestication and subsequent selection of Sus scrofa have shaped the genomes of these animals.
Optofluidic Cell Selection from Complex Microbial Communities for Single-Genome Analysis
Landry, Zachary C.; Giovanonni, Stephen J.; Quake, Stephen R.; Blainey, Paul C.
2013-01-01
Genetic analysis of single cells is emerging as a powerful approach for studies of heterogeneous cell populations. Indeed, the notion of homogeneous cell populations is receding as approaches to resolve genetic and phenotypic variation between single cells are applied throughout the life sciences. A key step in single-cell genomic analysis today is the physical isolation of individual cells from heterogeneous populations, particularly microbial populations, which often exhibit high diversity. Here, we detail the construction and use of instrumentation for optical trapping inside microfluidic devices to select individual cells for analysis by methods including nucleic acid sequencing. This approach has unique advantages for analyses of rare community members, cells with irregular morphologies, small quantity samples, and studies that employ advanced optical microscopy. PMID:24060116
Fischer, Martin C; Foll, Matthieu; Heckel, Gerald; Excoffier, Laurent
2014-01-01
Genetic adaptation to different environmental conditions is expected to lead to large differences between populations at selected loci, thus providing a signature of positive selection. Whereas balancing selection can maintain polymorphisms over long evolutionary periods and even geographic scale, thus leads to low levels of divergence between populations at selected loci. However, little is known about the relative importance of these two selective forces in shaping genomic diversity, partly due to difficulties in recognizing balancing selection in species showing low levels of differentiation. Here we address this problem by studying genomic diversity in the European common vole (Microtus arvalis) presenting high levels of differentiation between populations (average F ST = 0.31). We studied 3,839 Amplified Fragment Length Polymorphism (AFLP) markers genotyped in 444 individuals from 21 populations distributed across the European continent and hence over different environmental conditions. Our statistical approach to detect markers under selection is based on a Bayesian method specifically developed for AFLP markers, which treats AFLPs as a nearly codominant marker system, and therefore has increased power to detect selection. The high number of screened populations allowed us to detect the signature of balancing selection across a large geographic area. We detected 33 markers potentially under balancing selection, hence strong evidence of stabilizing selection in 21 populations across Europe. However, our analyses identified four-times more markers (138) being under positive selection, and geographical patterns suggest that some of these markers are probably associated with alpine regions, which seem to have environmental conditions that favour adaptation. We conclude that despite favourable conditions in this study for the detection of balancing selection, this evolutionary force seems to play a relatively minor role in shaping the genomic diversity of the common vole, which is more influenced by positive selection and neutral processes like drift and demographic history.
An Efficient, Rapid, and Recyclable System for CRISPR-Mediated Genome Editing in Candida albicans.
Nguyen, Namkha; Quail, Morgan M F; Hernday, Aaron D
2017-01-01
Candida albicans is the most common fungal pathogen of humans. Historically, molecular genetic analysis of this important pathogen has been hampered by the lack of stable plasmids or meiotic cell division, limited selectable markers, and inefficient methods for generating gene knockouts. The recent development of clustered regularly interspaced short palindromic repeat(s) (CRISPR)-based tools for use with C. albicans has opened the door to more efficient genome editing; however, previously reported systems have specific limitations. We report the development of an optimized CRISPR-based genome editing system for use with C. albicans . Our system is highly efficient, does not require molecular cloning, does not leave permanent markers in the genome, and supports rapid, precise genome editing in C. albicans . We also demonstrate the utility of our system for generating two independent homozygous gene knockouts in a single transformation and present a method for generating homozygous wild-type gene addbacks at the native locus. Furthermore, each step of our protocol is compatible with high-throughput strain engineering approaches, thus opening the door to the generation of a complete C. albicans gene knockout library. IMPORTANCE Candida albicans is the major fungal pathogen of humans and is the subject of intense biomedical and discovery research. Until recently, the pace of research in this field has been hampered by the lack of efficient methods for genome editing. We report the development of a highly efficient and flexible genome editing system for use with C. albicans . This system improves upon previously published C. albicans CRISPR systems and enables rapid, precise genome editing without the use of permanent markers. This new tool kit promises to expedite the pace of research on this important fungal pathogen.
Investigative pathology: leading the post-genomic revolution.
Berman, David M; Bosenberg, Marcus W; Orwant, Robin L; Thurberg, Beth L; Draetta, Gulio F; Fletcher, Christopher D M; Loda, Massimo
2012-01-01
The completion of the Human Genome Project and the development of genome-based technologies over the past decade have set the stage for a new era of personalized medicine. By all rights, molecularly trained investigative pathologists should be leading this revolution. Singularly well suited for this work, molecular pathologists have the rare ability to wed genomic tools with unique diagnostic skills and tissue-based pathology techniques for integrated diagnosis of human disease. However, the number of pathologists with expertise in genome-based research has remained relatively low due to outdated training methods and a reluctance among some traditional pathologists to embrace new technologies. Moreover, because budding pathologists may not appreciate the vast selection of jobs available to them, they often end up choosing jobs that focus almost entirely on routine diagnosis rather than new frontiers in molecular pathology. This review calls for changes aimed at rectifying these troubling trends to ensure that pathology continues to guide patient care in a post-genomic era.
Paix, Alexandre; Wang, Yuemeng; Smith, Harold E.; Lee, Chih-Yung S.; Calidas, Deepika; Lu, Tu; Smith, Jarrett; Schmidt, Helen; Krause, Michael W.; Seydoux, Geraldine
2014-01-01
Homology-directed repair (HDR) of double-strand DNA breaks is a promising method for genome editing, but is thought to be less efficient than error-prone nonhomologous end joining in most cell types. We have investigated HDR of double-strand breaks induced by CRISPR-associated protein 9 (Cas9) in Caenorhabditis elegans. We find that HDR is very robust in the C. elegans germline. Linear repair templates with short (∼30–60 bases) homology arms support the integration of base and gene-sized edits with high efficiency, bypassing the need for selection. Based on these findings, we developed a systematic method to mutate, tag, or delete any gene in the C. elegans genome without the use of co-integrated markers or long homology arms. We generated 23 unique edits at 11 genes, including premature stops, whole-gene deletions, and protein fusions to antigenic peptides and GFP. Whole-genome sequencing of five edited strains revealed the presence of passenger variants, but no mutations at predicted off-target sites. The method is scalable for multi-gene editing projects and could be applied to other animals with an accessible germline. PMID:25249454
Bringing the fathead minnow into the genomic era | Science ...
The fathead minnow is a well-established ecotoxicological model organism that has been widely used for regulatory ecotoxicity testing and research for over a half century. While a large amount of molecular information has been gathered on the fathead minnow over the years, the lack of genomic sequence data has limited the utility of the fathead minnow for certain applications. To address this limitation, high-throughput Illumina sequencing technology was employed to sequence the fathead minnow genome. Approximately 100X coverage was achieved by sequencing several libraries of paired-end reads with differing genome insert sizes. Two draft genome assemblies were generated using the SOAPdenovo and String Graph Assembler (SGA) methods, respectively. When these were compared, the SOAPdenovo assembly had a higher scaffold N50 value of 60.4 kbp versus 15.4 kbp, and it also performed better in a Core Eukaryotic Genes Mapping Analysis (CEGMA), mapping 91% versus 67% of genes. As such, this assembly was selected for further development and annotation. The foundation for genome annotation was generated using AUGUSTUS, an ab initio method for gene prediction. A total of 43,345 potential coding sequences were predicted on the genome assembly. These predicted sequences were translated to peptides and queried in a BLAST search against all vertebrates, with 28,290 of these sequences corresponding to zebrafish peptides and 5,242 producing no significant alignments. Additional ty
McKinney, Garrett J; Larson, Wesley A; Seeb, Lisa W; Seeb, James E
2017-05-01
In their recently corrected manuscript, "Breaking RAD: An evaluation of the utility of restriction site associated DNA sequencing for genome scans of adaptation", Lowry et al. argue that genome scans using RADseq will miss many loci under selection due to a combination of sparse marker density and low levels of linkage disequilibrium in most species. We agree that marker density and levels of LD are important considerations when designing a RADseq study; however, we dispute that RAD-based genome scans are as prone to failure as Lowry et al. suggest. Their arguments ignore the flexible nature of RADseq; the availability of different restriction enzymes and capacity for combining restriction enzymes ensures that a well-designed study should be able to generate enough markers for efficient genome coverage. We further believe that simplifying assumptions about linkage disequilibrium in their simulations are invalid in many species. Finally, it is important to note that the alternative methods proposed by Lowry et al. have limitations equal to or greater than RADseq. The wealth of studies with positive impactful findings that have used RAD genome scans instead supports the argument that properly conducted RAD genome scans are an effective method for gaining insight into ecology and evolution, particularly for non-model organisms and those with large or complex genomes. © 2016 John Wiley & Sons Ltd.
Xu, Shuhua
2015-01-01
Noncoding DNA sequences (NCS) have attracted much attention recently due to their functional potentials. Here we attempted to reveal the functional roles of noncoding sequences from the point of view of natural selection that typically indicates the functional potentials of certain genomic elements. We analyzed nearly 37 million single nucleotide polymorphisms (SNPs) of Phase I data of the 1000 Genomes Project. We estimated a series of key parameters of population genetics and molecular evolution to characterize sequence variations of the noncoding genome within and between populations, and identified the natural selection footprints in NCS in worldwide human populations. Our results showed that purifying selection is prevalent and there is substantial constraint of variations in NCS, while positive selectionis more likely to be specific to some particular genomic regions and regional populations. Intriguingly, we observed larger fraction of non-conserved NCS variants with lower derived allele frequency in the genome, indicating possible functional gain of non-conserved NCS. Notably, NCS elements are enriched for potentially functional markers such as eQTLs, TF motif, and DNase I footprints in the genome. More interestingly, some NCS variants associated with diseases such as Alzheimer's disease, Type 1 diabetes, and immune-related bowel disorder (IBD) showed signatures of positive selection, although the majority of NCS variants, reported as risk alleles by genome-wide association studies, showed signatures of negative selection. Our analyses provided compelling evidence of natural selection forces on noncoding sequences in the human genome and advanced our understanding of their functional potentials that play important roles in disease etiology and human evolution. PMID:26053627
Richardson, Kris; Schnitzler, Gavin R; Lai, Chao-Qiang; Ordovas, Jose M
2015-12-01
Cardiovascular disease and type 2 diabetes mellitus represent overlapping diseases where a large portion of the variation attributable to genetics remains unexplained. An important player in their pathogenesis is peroxisome proliferator-activated receptor γ (PPARγ) that is involved in lipid and glucose metabolism and maintenance of metabolic homeostasis. We used a functional genomics methodology to interrogate human chromatin immunoprecipitation-sequencing, genome-wide association studies, and expression quantitative trait locus data to inform selection of candidate functional single nucleotide polymorphisms (SNPs) falling in PPARγ motifs. We derived 27 328 chromatin immunoprecipitation-sequencing peaks for PPARγ in human adipocytes through meta-analysis of 3 data sets. The PPARγ consensus motif showed greatest enrichment and mapped to 8637 peaks. We identified 146 SNPs in these motifs. This number was significantly less than would be expected by chance, and Inference of Natural Selection from Interspersed Genomically coHerent elemenTs analysis indicated that these motifs are under weak negative selection. A screen of these SNPs against genome-wide association studies for cardiometabolic traits revealed significant enrichment with 16 SNPs. A screen against the MuTHER expression quantitative trait locus data revealed 8 of these were significantly associated with altered gene expression in human adipose, more than would be expected by chance. Several SNPs fall close, or are linked by expression quantitative trait locus to lipid-metabolism loci including CYP26A1. We demonstrated the use of functional genomics to identify SNPs of potential function. Specifically, that SNPs within PPARγ motifs that bind PPARγ in adipocytes are significantly associated with cardiometabolic disease and with the regulation of transcription in adipose. This method may be used to uncover functional SNPs that do not reach significance thresholds in the agnostic approach of genome-wide association studies. © 2015 American Heart Association, Inc.
Structural Genomics: Correlation Blocks, Population Structure, and Genome Architecture
Hu, Xin-Sheng; Yeh, Francis C.; Wang, Zhiquan
2011-01-01
An integration of the pattern of genome-wide inter-site associations with evolutionary forces is important for gaining insights into the genomic evolution in natural or artificial populations. Here, we assess the inter-site correlation blocks and their distributions along chromosomes. A correlation block is broadly termed as the DNA segment within which strong correlations exist between genetic diversities at any two sites. We bring together the population genetic structure and the genomic diversity structure that have been independently built on different scales and synthesize the existing theories and methods for characterizing genomic structure at the population level. We discuss how population structure could shape correlation blocks and their patterns within and between populations. Effects of evolutionary forces (selection, migration, genetic drift, and mutation) on the pattern of genome-wide correlation blocks are discussed. In eukaryote organisms, we briefly discuss the associations between the pattern of correlation blocks and genome assembly features in eukaryote organisms, including the impacts of multigene family, the perturbation of transposable elements, and the repetitive nongenic sequences and GC-rich isochores. Our reviews suggest that the observable pattern of correlation blocks can refine our understanding of the ecological and evolutionary processes underlying the genomic evolution at the population level. PMID:21886455
Genomics-assisted breeding in fruit trees.
Iwata, Hiroyoshi; Minamikawa, Mai F; Kajiya-Kanegae, Hiromi; Ishimori, Motoyuki; Hayashi, Takeshi
2016-01-01
Recent advancements in genomic analysis technologies have opened up new avenues to promote the efficiency of plant breeding. Novel genomics-based approaches for plant breeding and genetics research, such as genome-wide association studies (GWAS) and genomic selection (GS), are useful, especially in fruit tree breeding. The breeding of fruit trees is hindered by their long generation time, large plant size, long juvenile phase, and the necessity to wait for the physiological maturity of the plant to assess the marketable product (fruit). In this article, we describe the potential of genomics-assisted breeding, which uses these novel genomics-based approaches, to break through these barriers in conventional fruit tree breeding. We first introduce the molecular marker systems and whole-genome sequence data that are available for fruit tree breeding. Next we introduce the statistical methods for biparental linkage and quantitative trait locus (QTL) mapping as well as GWAS and GS. We then review QTL mapping, GWAS, and GS studies conducted on fruit trees. We also review novel technologies for rapid generation advancement. Finally, we note the future prospects of genomics-assisted fruit tree breeding and problems that need to be overcome in the breeding.
Genomics-assisted breeding in fruit trees
Iwata, Hiroyoshi; Minamikawa, Mai F.; Kajiya-Kanegae, Hiromi; Ishimori, Motoyuki; Hayashi, Takeshi
2016-01-01
Recent advancements in genomic analysis technologies have opened up new avenues to promote the efficiency of plant breeding. Novel genomics-based approaches for plant breeding and genetics research, such as genome-wide association studies (GWAS) and genomic selection (GS), are useful, especially in fruit tree breeding. The breeding of fruit trees is hindered by their long generation time, large plant size, long juvenile phase, and the necessity to wait for the physiological maturity of the plant to assess the marketable product (fruit). In this article, we describe the potential of genomics-assisted breeding, which uses these novel genomics-based approaches, to break through these barriers in conventional fruit tree breeding. We first introduce the molecular marker systems and whole-genome sequence data that are available for fruit tree breeding. Next we introduce the statistical methods for biparental linkage and quantitative trait locus (QTL) mapping as well as GWAS and GS. We then review QTL mapping, GWAS, and GS studies conducted on fruit trees. We also review novel technologies for rapid generation advancement. Finally, we note the future prospects of genomics-assisted fruit tree breeding and problems that need to be overcome in the breeding. PMID:27069395
Evolutionary maintenance of filovirus-like genes in bat genomes
2011-01-01
Background Little is known of the biological significance and evolutionary maintenance of integrated non-retroviral RNA virus genes in eukaryotic host genomes. Here, we isolated novel filovirus-like genes from bat genomes and tested for evolutionary maintenance. We also estimated the age of filovirus VP35-like gene integrations and tested the phylogenetic hypotheses that there is a eutherian mammal clade and a marsupial/ebolavirus/Marburgvirus dichotomy for filoviruses. Results We detected homologous copies of VP35-like and NP-like gene integrations in both Old World and New World species of Myotis (bats). We also detected previously unknown VP35-like genes in rodents that are positionally homologous. Comprehensive phylogenetic estimates for filovirus NP-like and VP35-like loci support two main clades with a marsupial and a rodent grouping within the ebolavirus/Lloviu virus/Marburgvirus clade. The concordance of VP35-like, NP-like and mitochondrial gene trees with the expected species tree supports the notion that the copies we examined are orthologs that predate the global spread and radiation of the genus Myotis. Parametric simulations were consistent with selective maintenance for the open reading frame (ORF) of VP35-like genes in Myotis. The ORF of the filovirus-like VP35 gene has been maintained in bat genomes for an estimated 13. 4 MY. ORFs were disrupted for the NP-like genes in Myotis. Likelihood ratio tests revealed that a model that accommodates positive selection is a significantly better fit to the data than a model that does not allow for positive selection for VP35-like sequences. Moreover, site-by-site analysis of selection using two methods indicated at least 25 sites in the VP35-like alignment are under positive selection in Myotis. Conclusions Our results indicate that filovirus-like elements have significance beyond genomic imprints of prior infection. That is, there appears to be, or have been, functionally maintained copies of such genes in mammals. "Living fossils" of filoviruses appear to be selectively maintained in a diverse mammalian genus (Myotis). PMID:22093762
Adapting legume crops to climate change using genomic approaches.
Mousavi-Derazmahalleh, Mahsa; Bayer, Philipp E; Hane, James K; Valliyodan, Babu; Nguyen, Henry T; Nelson, Matthew N; Erskine, William; Varshney, Rajeev K; Papa, Roberto; Edwards, David
2018-03-30
Our agricultural system and hence food security is threatened by combination of events, such as increasing population, the impacts of climate change, and the need to a more sustainable development. Evolutionary adaptation may help some species to overcome environmental changes through new selection pressures driven by climate change. However, success of evolutionary adaptation is dependent on various factors, one of which is the extent of genetic variation available within species. Genomic approaches provide an exceptional opportunity to identify genetic variation that can be employed in crop improvement programs. In this review, we illustrate some of the routinely used genomics-based methods as well as recent breakthroughs, which facilitate assessment of genetic variation and discovery of adaptive genes in legumes. Although additional information is needed, the current utility of selection tools indicate a robust ability to utilize existing variation among legumes to address the challenges of climate uncertainty. © 2018 The Authors. Plant, Cell & Environment Published by John Wiley & Sons Ltd.
Generation of Leishmania Hybrids by Whole Genomic DNA Transformation
Coelho, Adriano C.; Leprohon, Philippe; Ouellette, Marc
2012-01-01
Genetic exchange is a powerful tool to study gene function in microorganisms. Here, we tested the feasibility of generating Leishmania hybrids by electroporating genomic DNA of donor cells into recipient Leishmania parasites. The donor DNA was marked with a drug resistance marker facilitating the selection of DNA transfer into the recipient cells. The transferred DNA was integrated exclusively at homologous locus and was as large as 45 kb. The independent generation of L. infantum hybrids with L. major sequences was possible for several chromosomal regions. Interfering with the mismatch repair machinery by inactivating the MSH2 gene enabled an increased efficiency of recombination between divergent sequences, hence favouring the selection of hybrids between species. Hybrids were shown to acquire the phenotype derived from the donor cells, as demonstrated for the transfer of drug resistance genes from L. major into L. infantum. The described method is a first step allowing the generation of in vitro hybrids for testing gene functions in a natural genomic context in the parasite Leishmania. PMID:23029579
USDA-ARS?s Scientific Manuscript database
A reassociation kinetics-based approach was used to reduce the complexity of genomic DNA from the Deutsch laboratory strain of the cattle tick, Rhipicephalus microplus, to facilitate genome sequencing. Selected genomic DNA (Cot value = 660) was sequenced using 454 GS FLX technology, resulting in 356...
Selective chemical labeling reveals the genome-wide distribution of 5-hydroxymethylcytosine.
Song, Chun-Xiao; Szulwach, Keith E; Fu, Ye; Dai, Qing; Yi, Chengqi; Li, Xuekun; Li, Yujing; Chen, Chih-Hsin; Zhang, Wen; Jian, Xing; Wang, Jing; Zhang, Li; Looney, Timothy J; Zhang, Baichen; Godley, Lucy A; Hicks, Leslie M; Lahn, Bruce T; Jin, Peng; He, Chuan
2011-01-01
In contrast to 5-methylcytosine (5-mC), which has been studied extensively, little is known about 5-hydroxymethylcytosine (5-hmC), a recently identified epigenetic modification present in substantial amounts in certain mammalian cell types. Here we present a method for determining the genome-wide distribution of 5-hmC. We use the T4 bacteriophage β-glucosyltransferase to transfer an engineered glucose moiety containing an azide group onto the hydroxyl group of 5-hmC. The azide group can be chemically modified with biotin for detection, affinity enrichment and sequencing of 5-hmC-containing DNA fragments in mammalian genomes. Using this method, we demonstrate that 5-hmC is present in human cell lines beyond those previously recognized. We also find a gene expression level-dependent enrichment of intragenic 5-hmC in mouse cerebellum and an age-dependent acquisition of this modification in specific gene bodies linked to neurodegenerative disorders.
A Primer on High-Throughput Computing for Genomic Selection
Wu, Xiao-Lin; Beissinger, Timothy M.; Bauck, Stewart; Woodward, Brent; Rosa, Guilherme J. M.; Weigel, Kent A.; Gatti, Natalia de Leon; Gianola, Daniel
2011-01-01
High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of accomplishing high-throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model based on known data, and the resulting model is used to predict the genetic merit of selection candidates. Sophisticated models are very computationally demanding and, with several traits to be evaluated sequentially, computing time is long, and output is low. In this paper, we present scenarios and basic principles of how HTC can be used in genomic selection, implemented using various techniques from simple batch processing to pipelining in distributed computer clusters. Various scripting languages, such as shell scripting, Perl, and R, are also very useful to devise pipelines. By pipelining, we can reduce total computing time and consequently increase throughput. In comparison to the traditional data processing pipeline residing on the central processors, performing general-purpose computation on a graphics processing unit provide a new-generation approach to massive parallel computing in genomic selection. While the concept of HTC may still be new to many researchers in animal breeding, plant breeding, and genetics, HTC infrastructures have already been built in many institutions, such as the University of Wisconsin–Madison, which can be leveraged for genomic selection, in terms of central processing unit capacity, network connectivity, storage availability, and middleware connectivity. Exploring existing HTC infrastructures as well as general-purpose computing environments will further expand our capability to meet increasing computing demands posed by unprecedented genomic data that we have today. We anticipate that HTC will impact genomic selection via better statistical models, faster solutions, and more competitive products (e.g., from design of marker panels to realized genetic gain). Eventually, HTC may change our view of data analysis as well as decision-making in the post-genomic era of selection programs in animals and plants, or in the study of complex diseases in humans. PMID:22303303
Terekhanova, Nadezhda V.; Logacheva, Maria D.; Penin, Aleksey A.; Neretina, Tatiana V.; Barmintseva, Anna E.; Bazykin, Georgii A.; Kondrashov, Alexey S.; Mugue, Nikolai S.
2014-01-01
Adaptation is driven by natural selection; however, many adaptations are caused by weak selection acting over large timescales, complicating its study. Therefore, it is rarely possible to study selection comprehensively in natural environments. The threespine stickleback (Gasterosteus aculeatus) is a well-studied model organism with a short generation time, small genome size, and many genetic and genomic tools available. Within this originally marine species, populations have recurrently adapted to freshwater all over its range. This evolution involved extensive parallelism: pre-existing alleles that adapt sticklebacks to freshwater habitats, but are also present at low frequencies in marine populations, have been recruited repeatedly. While a number of genomic regions responsible for this adaptation have been identified, the details of selection remain poorly understood. Using whole-genome resequencing, we compare pooled genomic samples from marine and freshwater populations of the White Sea basin, and identify 19 short genomic regions that are highly divergent between them, including three known inversions. 17 of these regions overlap protein-coding genes, including a number of genes with predicted functions that are relevant for adaptation to the freshwater environment. We then analyze four additional independently derived young freshwater populations of known ages, two natural and two artificially established, and use the observed shifts of allelic frequencies to estimate the strength of positive selection. Adaptation turns out to be quite rapid, indicating strong selection acting simultaneously at multiple regions of the genome, with selection coefficients of up to 0.27. High divergence between marine and freshwater genotypes, lack of reduction in polymorphism in regions responsible for adaptation, and high frequencies of freshwater alleles observed even in young freshwater populations are all consistent with rapid assembly of G. aculeatus freshwater genotypes from pre-existing genomic regions of adaptive variation, with strong selection that favors this assembly acting simultaneously at multiple loci. PMID:25299485
Terekhanova, Nadezhda V; Logacheva, Maria D; Penin, Aleksey A; Neretina, Tatiana V; Barmintseva, Anna E; Bazykin, Georgii A; Kondrashov, Alexey S; Mugue, Nikolai S
2014-10-01
Adaptation is driven by natural selection; however, many adaptations are caused by weak selection acting over large timescales, complicating its study. Therefore, it is rarely possible to study selection comprehensively in natural environments. The threespine stickleback (Gasterosteus aculeatus) is a well-studied model organism with a short generation time, small genome size, and many genetic and genomic tools available. Within this originally marine species, populations have recurrently adapted to freshwater all over its range. This evolution involved extensive parallelism: pre-existing alleles that adapt sticklebacks to freshwater habitats, but are also present at low frequencies in marine populations, have been recruited repeatedly. While a number of genomic regions responsible for this adaptation have been identified, the details of selection remain poorly understood. Using whole-genome resequencing, we compare pooled genomic samples from marine and freshwater populations of the White Sea basin, and identify 19 short genomic regions that are highly divergent between them, including three known inversions. 17 of these regions overlap protein-coding genes, including a number of genes with predicted functions that are relevant for adaptation to the freshwater environment. We then analyze four additional independently derived young freshwater populations of known ages, two natural and two artificially established, and use the observed shifts of allelic frequencies to estimate the strength of positive selection. Adaptation turns out to be quite rapid, indicating strong selection acting simultaneously at multiple regions of the genome, with selection coefficients of up to 0.27. High divergence between marine and freshwater genotypes, lack of reduction in polymorphism in regions responsible for adaptation, and high frequencies of freshwater alleles observed even in young freshwater populations are all consistent with rapid assembly of G. aculeatus freshwater genotypes from pre-existing genomic regions of adaptive variation, with strong selection that favors this assembly acting simultaneously at multiple loci.
Optimization of cDNA-AFLP experiments using genomic sequence data.
Kivioja, Teemu; Arvas, Mikko; Saloheimo, Markku; Penttilä, Merja; Ukkonen, Esko
2005-06-01
cDNA amplified fragment length polymorphism (cDNA-AFLP) is one of the few genome-wide level expression profiling methods capable of finding genes that have not yet been cloned or even predicted from sequence but have interesting expression patterns under the studied conditions. In cDNA-AFLP, a complex cDNA mixture is divided into small subsets using restriction enzymes and selective PCR. A large cDNA-AFLP experiment can require a substantial amount of resources, such as hundreds of PCR amplifications and gel electrophoresis runs, followed by manual cutting of a large number of bands from the gels. Our aim was to test whether this workload can be reduced by rational design of the experiment. We used the available genomic sequence information to optimize cDNA-AFLP experiments beforehand so that as many transcripts as possible could be profiled with a given amount of resources. Optimization of the selection of both restriction enzymes and selective primers for cDNA-AFLP experiments has not been performed previously. The in silico tests performed suggest that substantial amounts of resources can be saved by the optimization of cDNA-AFLP experiments.
Genomic Signatures Reveal New Evidences for Selection of Important Traits in Domestic Cattle
Xu, Lingyang; Bickhart, Derek M.; Cole, John B.; Schroeder, Steven G.; Song, Jiuzhou; Tassell, Curtis P. Van; Sonstegard, Tad S.; Liu, George E.
2015-01-01
We investigated diverse genomic selections using high-density single nucleotide polymorphism data of five distinct cattle breeds. Based on allele frequency differences, we detected hundreds of candidate regions under positive selection across Holstein, Angus, Charolais, Brahman, and N'Dama. In addition to well-known genes such as KIT, MC1R, ASIP, GHR, LCORL, NCAPG, WIF1, and ABCA12, we found evidence for a variety of novel and less-known genes under selection in cattle, such as LAP3, SAR1B, LRIG3, FGF5, and NUDCD3. Selective sweeps near LAP3 were then validated by next-generation sequencing. Genome-wide association analysis involving 26,362 Holsteins confirmed that LAP3 and SAR1B were related to milk production traits, suggesting that our candidate regions were likely functional. In addition, haplotype network analyses further revealed distinct selective pressures and evolution patterns across these five cattle breeds. Our results provided a glimpse into diverse genomic selection during cattle domestication, breed formation, and recent genetic improvement. These findings will facilitate genome-assisted breeding to improve animal production and health. PMID:25431480
Biotechnology of trees: Chestnut
C.D. Nelson; W.A. Powell; S.A. Merkle; J.E. Carlson; F.V. Hebard; N Islam-Faridi; M.E. Staton; L. Georgi
2014-01-01
Biotechnology has been practiced on chestnuts (Castanea spp.) for many decades, including vegetative propagation, controlled crossing followed by testing and selection, genetic and cytogenetic mapping, genetic modifi cation, and gene and genome sequencing. Vegetative propagation methods have ranged from grafting and rooting to somatic embryogenesis, often in...
Lin, Wei; Feng, Rui; Li, Hongzhe
2014-01-01
In genetical genomics studies, it is important to jointly analyze gene expression data and genetic variants in exploring their associations with complex traits, where the dimensionality of gene expressions and genetic variants can both be much larger than the sample size. Motivated by such modern applications, we consider the problem of variable selection and estimation in high-dimensional sparse instrumental variables models. To overcome the difficulty of high dimensionality and unknown optimal instruments, we propose a two-stage regularization framework for identifying and estimating important covariate effects while selecting and estimating optimal instruments. The methodology extends the classical two-stage least squares estimator to high dimensions by exploiting sparsity using sparsity-inducing penalty functions in both stages. The resulting procedure is efficiently implemented by coordinate descent optimization. For the representative L1 regularization and a class of concave regularization methods, we establish estimation, prediction, and model selection properties of the two-stage regularized estimators in the high-dimensional setting where the dimensionality of co-variates and instruments are both allowed to grow exponentially with the sample size. The practical performance of the proposed method is evaluated by simulation studies and its usefulness is illustrated by an analysis of mouse obesity data. Supplementary materials for this article are available online. PMID:26392642
Advances in genetic modification of pluripotent stem cells.
Fontes, Andrew; Lakshmipathy, Uma
2013-11-15
Genetically engineered stem cells aid in dissecting basic cell function and are valuable tools for drug discovery, in vivo cell tracking, and gene therapy. Gene transfer into pluripotent stem cells has been a challenge due to their intrinsic feature of growing in clusters and hence not amenable to common gene delivery methods. Several advances have been made in the rapid assembly of DNA elements, optimization of culture conditions, and DNA delivery methods. This has lead to the development of viral and non-viral methods for transient or stable modification of cells, albeit with varying efficiencies. Most methods require selection and clonal expansion that demand prolonged culture and are not suited for cells with limited proliferative potential. Choosing the right platform based on preferred length, strength, and context of transgene expression is a critical step. Random integration of the transgene into the genome can be complicated due to silencing or altered regulation of expression due to genomic effects. An alternative to this are site-specific methods that target transgenes followed by screening to identify the genomic loci that support long-term expression with stem cell proliferation and differentiation. A highly precise and accurate editing of the genome driven by homology can be achieved using traditional methods as well as the newer technologies such as zinc finger nuclease, TAL effector nucleases and CRISPR. In this review, we summarize the different genetic engineering methods that have been successfully used to create modified embryonic and induced pluripotent stem cells. © 2013. Published by Elsevier Inc. All rights reserved.
A RecET-assisted CRISPR-Cas9 genome editing in Corynebacterium glutamicum.
Wang, Bo; Hu, Qitiao; Zhang, Yu; Shi, Ruilin; Chai, Xin; Liu, Zhe; Shang, Xiuling; Zhang, Yun; Wen, Tingyi
2018-04-23
Extensive modification of genome is an efficient manner to regulate the metabolic network for producing target metabolites or non-native products using Corynebacterium glutamicum as a cell factory. Genome editing approaches by means of homologous recombination and counter-selection markers are laborious and time consuming due to multiple round manipulations and low editing efficiencies. The current two-plasmid-based CRISPR-Cas9 editing methods generate false positives due to the potential instability of Cas9 on the plasmid, and require a high transformation efficiency for co-occurrence of two plasmids transformation. Here, we developed a RecET-assisted CRISPR-Cas9 genome editing method using a chromosome-borne Cas9-RecET and a single plasmid harboring sgRNA and repair templates. The inducible expression of chromosomal RecET promoted the frequencies of homologous recombination, and increased the efficiency for gene deletion. Due to the high transformation efficiency of a single plasmid, this method enabled 10- and 20-kb region deletion, 2.5-, 5.7- and 7.5-kb expression cassette insertion and precise site-specific mutation, suggesting a versatility of this method. Deletion of argR and farR regulators as well as site-directed mutation of argB and pgi genes generated the mutant capable of accumulating L-arginine, indicating the stability of chromosome-borne Cas9 for iterative genome editing. Using this method, the model-predicted target genes were modified to redirect metabolic flux towards 1,2-propanediol biosynthetic pathway. The final engineered strain produced 6.75 ± 0.46 g/L of 1,2-propanediol that is the highest titer reported in C. glutamicum. Furthermore, this method is available for Corynebacterium pekinense 1.563, suggesting its universal applicability in other Corynebacterium species. The RecET-assisted CRISPR-Cas9 genome editing method will facilitate engineering of metabolic networks for the synthesis of interested bio-based products from renewable biomass using Corynebacterium species as cell factories.
Structural Genomics and Drug Discovery for Infectious Diseases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Anderson, W.F.
The application of structural genomics methods and approaches to proteins from organisms causing infectious diseases is making available the three dimensional structures of many proteins that are potential drug targets and laying the groundwork for structure aided drug discovery efforts. There are a number of structural genomics projects with a focus on pathogens that have been initiated worldwide. The Center for Structural Genomics of Infectious Diseases (CSGID) was recently established to apply state-of-the-art high throughput structural biology technologies to the characterization of proteins from the National Institute for Allergy and Infectious Diseases (NIAID) category A-C pathogens and organisms causing emerging,more » or re-emerging infectious diseases. The target selection process emphasizes potential biomedical benefits. Selected proteins include known drug targets and their homologs, essential enzymes, virulence factors and vaccine candidates. The Center also provides a structure determination service for the infectious disease scientific community. The ultimate goal is to generate a library of structures that are available to the scientific community and can serve as a starting point for further research and structure aided drug discovery for infectious diseases. To achieve this goal, the CSGID will determine protein crystal structures of 400 proteins and protein-ligand complexes using proven, rapid, highly integrated, and cost-effective methods for such determination, primarily by X-ray crystallography. High throughput crystallographic structure determination is greatly aided by frequent, convenient access to high-performance beamlines at third-generation synchrotron X-ray sources.« less
Neugebauer, Tomasz; Bordeleau, Eric; Burrus, Vincent; Brzezinski, Ryszard
2015-01-01
Data visualization methods are necessary during the exploration and analysis activities of an increasingly data-intensive scientific process. There are few existing visualization methods for raw nucleotide sequences of a whole genome or chromosome. Software for data visualization should allow the researchers to create accessible data visualization interfaces that can be exported and shared with others on the web. Herein, novel software developed for generating DNA data visualization interfaces is described. The software converts DNA data sets into images that are further processed as multi-scale images to be accessed through a web-based interface that supports zooming, panning and sequence fragment selection. Nucleotide composition frequencies and GC skew of a selected sequence segment can be obtained through the interface. The software was used to generate DNA data visualization of human and bacterial chromosomes. Examples of visually detectable features such as short and long direct repeats, long terminal repeats, mobile genetic elements, heterochromatic segments in microbial and human chromosomes, are presented. The software and its source code are available for download and further development. The visualization interfaces generated with the software allow for the immediate identification and observation of several types of sequence patterns in genomes of various sizes and origins. The visualization interfaces generated with the software are readily accessible through a web browser. This software is a useful research and teaching tool for genetics and structural genomics.
A segmentation/clustering model for the analysis of array CGH data.
Picard, F; Robin, S; Lebarbier, E; Daudin, J-J
2007-09-01
Microarray-CGH (comparative genomic hybridization) experiments are used to detect and map chromosomal imbalances. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose representative sequences share the same relative copy number on average. Segmentation methods constitute a natural framework for the analysis, but they do not provide a biological status for the detected segments. We propose a new model for this segmentation/clustering problem, combining a segmentation model with a mixture model. We present a new hybrid algorithm called dynamic programming-expectation maximization (DP-EM) to estimate the parameters of the model by maximum likelihood. This algorithm combines DP and the EM algorithm. We also propose a model selection heuristic to select the number of clusters and the number of segments. An example of our procedure is presented, based on publicly available data sets. We compare our method to segmentation methods and to hidden Markov models, and we show that the new segmentation/clustering model is a promising alternative that can be applied in the more general context of signal processing.
Amyotte, Beatrice; Bowen, Amy J.; Banks, Travis; Rajcan, Istvan; Somers, Daryl J.
2017-01-01
Breeding apples is a long-term endeavour and it is imperative that new cultivars are selected to have outstanding consumer appeal. This study has taken the approach of merging sensory science with genome wide association analyses in order to map the human perception of apple flavour and texture onto the apple genome. The goal was to identify genomic associations that could be used in breeding apples for improved fruit quality. A collection of 85 apple cultivars was examined over two years through descriptive sensory evaluation by a trained sensory panel. The trained sensory panel scored randomized sliced samples of each apple cultivar for seventeen taste, flavour and texture attributes using controlled sensory evaluation practices. In addition, the apple collection was subjected to genotyping by sequencing for marker discovery. A genome wide association analysis suggested significant genomic associations for several sensory traits including juiciness, crispness, mealiness and fresh green apple flavour. The findings include previously unreported genomic regions that could be used in apple breeding and suggest that similar sensory association mapping methods could be applied in other plants. PMID:28231290
Amyotte, Beatrice; Bowen, Amy J; Banks, Travis; Rajcan, Istvan; Somers, Daryl J
2017-01-01
Breeding apples is a long-term endeavour and it is imperative that new cultivars are selected to have outstanding consumer appeal. This study has taken the approach of merging sensory science with genome wide association analyses in order to map the human perception of apple flavour and texture onto the apple genome. The goal was to identify genomic associations that could be used in breeding apples for improved fruit quality. A collection of 85 apple cultivars was examined over two years through descriptive sensory evaluation by a trained sensory panel. The trained sensory panel scored randomized sliced samples of each apple cultivar for seventeen taste, flavour and texture attributes using controlled sensory evaluation practices. In addition, the apple collection was subjected to genotyping by sequencing for marker discovery. A genome wide association analysis suggested significant genomic associations for several sensory traits including juiciness, crispness, mealiness and fresh green apple flavour. The findings include previously unreported genomic regions that could be used in apple breeding and suggest that similar sensory association mapping methods could be applied in other plants.
MIPS: analysis and annotation of genome information in 2007
Mewes, H. W.; Dietmann, S.; Frishman, D.; Gregory, R.; Mannhaupt, G.; Mayer, K. F. X.; Münsterkötter, M.; Ruepp, A.; Spannagl, M.; Stümpflen, V.; Rattei, T.
2008-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) combines automatic processing of large amounts of sequences with manual annotation of selected model genomes. Due to the massive growth of the available data, the depth of annotation varies widely between independent databases. Also, the criteria for the transfer of information from known to orthologous sequences are diverse. To cope with the task of global in-depth genome annotation has become unfeasible. Therefore, our efforts are dedicated to three levels of annotation: (i) the curation of selected genomes, in particular from fungal and plant taxa (e.g. CYGD, MNCDB, MatDB), (ii) the comprehensive, consistent, automatic annotation employing exhaustive methods for the computation of sequence similarities and sequence-related attributes as well as the classification of individual sequences (SIMAP, PEDANT and FunCat) and (iii) the compilation of manually curated databases for protein interactions based on scrutinized information from the literature to serve as an accepted set of reliable annotated interaction data (MPACT, MPPI, CORUM). All databases and tools described as well as the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:18158298
MIPS: analysis and annotation of genome information in 2007.
Mewes, H W; Dietmann, S; Frishman, D; Gregory, R; Mannhaupt, G; Mayer, K F X; Münsterkötter, M; Ruepp, A; Spannagl, M; Stümpflen, V; Rattei, T
2008-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) combines automatic processing of large amounts of sequences with manual annotation of selected model genomes. Due to the massive growth of the available data, the depth of annotation varies widely between independent databases. Also, the criteria for the transfer of information from known to orthologous sequences are diverse. To cope with the task of global in-depth genome annotation has become unfeasible. Therefore, our efforts are dedicated to three levels of annotation: (i) the curation of selected genomes, in particular from fungal and plant taxa (e.g. CYGD, MNCDB, MatDB), (ii) the comprehensive, consistent, automatic annotation employing exhaustive methods for the computation of sequence similarities and sequence-related attributes as well as the classification of individual sequences (SIMAP, PEDANT and FunCat) and (iii) the compilation of manually curated databases for protein interactions based on scrutinized information from the literature to serve as an accepted set of reliable annotated interaction data (MPACT, MPPI, CORUM). All databases and tools described as well as the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).
Genome-wide scans for loci under selection in humans
2005-01-01
Natural selection, which can be defined as the differential contribution of genetic variants to future generations, is the driving force of Darwinian evolution. Identifying regions of the human genome that have been targets of natural selection is an important step in clarifying human evolutionary history and understanding how genetic variation results in phenotypic diversity, it may also facilitate the search for complex disease genes. Technological advances in high-throughput DNA sequencing and single nucleotide polymorphism genotyping have enabled several genome-wide scans of natural selection to be undertaken. Here, some of the observations that are beginning to emerge from these studies will be reviewed, including evidence for geographically restricted selective pressures (ie local adaptation) and a relationship between genes subject to natural selection and human disease. In addition, the paper will highlight several important problems that need to be addressed in future genome-wide studies of natural selection. PMID:16004726
Gagnaire, Pierre-Alexandre; Broquet, Thomas; Aurelle, Didier; Viard, Frédérique; Souissi, Ahmed; Bonhomme, François; Arnaud-Haond, Sophie; Bierne, Nicolas
2015-01-01
Estimating the rate of exchange of individuals among populations is a central concern to evolutionary ecology and its applications to conservation and management. For instance, the efficiency of protected areas in sustaining locally endangered populations and ecosystems depends on reserve network connectivity. The population genetics theory offers a powerful framework for estimating dispersal distances and migration rates from molecular data. In the marine realm, however, decades of molecular studies have met limited success in inferring genetic connectivity, due to the frequent lack of spatial genetic structure in species exhibiting high fecundity and dispersal capabilities. This is especially true within biogeographic regions bounded by well-known hotspots of genetic differentiation. Here, we provide an overview of the current methods for estimating genetic connectivity using molecular markers and propose several directions for improving existing approaches using large population genomic datasets. We highlight several issues that limit the effectiveness of methods based on neutral markers when there is virtually no genetic differentiation among samples. We then focus on alternative methods based on markers influenced by selection. Although some of these methodologies are still underexplored, our aim was to stimulate new research to test how broadly they are applicable to nonmodel marine species. We argue that the increased ability to apply the concepts of cline analyses will improve dispersal inferences across physical and ecological barriers that reduce connectivity locally. We finally present how neutral markers hitchhiking with selected loci can also provide information about connectivity patterns within apparently well-mixed biogeographic regions. We contend that one of the most promising applications of population genomics is the use of outlier loci to delineate relevant conservation units and related eco-geographic features across which connectivity can be measured. PMID:26366195
Use of direct and iterative solvers for estimation of SNP effects in genome-wide selection
2010-01-01
The aim of this study was to compare iterative and direct solvers for estimation of marker effects in genomic selection. One iterative and two direct methods were used: Gauss-Seidel with Residual Update, Cholesky Decomposition and Gentleman-Givens rotations. For resembling different scenarios with respect to number of markers and of genotyped animals, a simulated data set divided into 25 subsets was used. Number of markers ranged from 1,200 to 5,925 and number of animals ranged from 1,200 to 5,865. Methods were also applied to real data comprising 3081 individuals genotyped for 45181 SNPs. Results from simulated data showed that the iterative solver was substantially faster than direct methods for larger numbers of markers. Use of a direct solver may allow for computing (co)variances of SNP effects. When applied to real data, performance of the iterative method varied substantially, depending on the level of ill-conditioning of the coefficient matrix. From results with real data, Gentleman-Givens rotations would be the method of choice in this particular application as it provided an exact solution within a fairly reasonable time frame (less than two hours). It would indeed be the preferred method whenever computer resources allow its use. PMID:21637627
RUCS: rapid identification of PCR primers for unique core sequences.
Thomsen, Martin Christen Frølund; Hasman, Henrik; Westh, Henrik; Kaya, Hülya; Lund, Ole
2017-12-15
Designing PCR primers to target a specific selection of whole genome sequenced strains can be a long, arduous and sometimes impractical task. Such tasks would benefit greatly from an automated tool to both identify unique targets, and to validate the vast number of potential primer pairs for the targets in silico. Here we present RUCS, a program that will find PCR primer pairs and probes for the unique core sequences of a positive genome dataset complement to a negative genome dataset. The resulting primer pairs and probes are in addition to simple selection also validated through a complex in silico PCR simulation. We compared our method, which identifies the unique core sequences, against an existing tool called ssGeneFinder, and found that our method was 6.5-20 times more sensitive. We used RUCS to design primer pairs that would target a set of genomes known to contain the mcr-1 colistin resistance gene. Three of the predicted pairs were chosen for experimental validation using PCR and gel electrophoresis. All three pairs successfully produced an amplicon with the target length for the samples containing mcr-1 and no amplification products were produced for the negative samples. The novel methods presented in this manuscript can reduce the time needed to identify target sequences, and provide a quick virtual PCR validation to eliminate time wasted on ambiguously binding primers. Source code is freely available on https://bitbucket.org/genomicepidemiology/rucs. Web service is freely available on https://cge.cbs.dtu.dk/services/RUCS. mcft@cbs.dtu.dk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Ye, Weixing; Zhu, Lei; Liu, Yingying; Crickmore, Neil; Peng, Donghai; Ruan, Lifang; Sun, Ming
2012-07-01
We have designed a high-throughput system for the identification of novel crystal protein genes (cry) from Bacillus thuringiensis strains. The system was developed with two goals: (i) to acquire the mixed plasmid-enriched genomic sequence of B. thuringiensis using next-generation sequencing biotechnology, and (ii) to identify cry genes with a computational pipeline (using BtToxin_scanner). In our pipeline method, we employed three different kinds of well-developed prediction methods, BLAST, hidden Markov model (HMM), and support vector machine (SVM), to predict the presence of Cry toxin genes. The pipeline proved to be fast (average speed, 1.02 Mb/min for proteins and open reading frames [ORFs] and 1.80 Mb/min for nucleotide sequences), sensitive (it detected 40% more protein toxin genes than a keyword extraction method using genomic sequences downloaded from GenBank), and highly specific. Twenty-one strains from our laboratory's collection were selected based on their plasmid pattern and/or crystal morphology. The plasmid-enriched genomic DNA was extracted from these strains and mixed for Illumina sequencing. The sequencing data were de novo assembled, and a total of 113 candidate cry sequences were identified using the computational pipeline. Twenty-seven candidate sequences were selected on the basis of their low level of sequence identity to known cry genes, and eight full-length genes were obtained with PCR. Finally, three new cry-type genes (primary ranks) and five cry holotypes, which were designated cry8Ac1, cry7Ha1, cry21Ca1, cry32Fa1, and cry21Da1 by the B. thuringiensis Toxin Nomenclature Committee, were identified. The system described here is both efficient and cost-effective and can greatly accelerate the discovery of novel cry genes.
Bayesian Genomic Prediction with Genotype × Environment Interaction Kernel Models
Cuevas, Jaime; Crossa, José; Montesinos-López, Osval A.; Burgueño, Juan; Pérez-Rodríguez, Paulino; de los Campos, Gustavo
2016-01-01
The phenomenon of genotype × environment (G × E) interaction in plant breeding decreases selection accuracy, thereby negatively affecting genetic gains. Several genomic prediction models incorporating G × E have been recently developed and used in genomic selection of plant breeding programs. Genomic prediction models for assessing multi-environment G × E interaction are extensions of a single-environment model, and have advantages and limitations. In this study, we propose two multi-environment Bayesian genomic models: the first model considers genetic effects (u) that can be assessed by the Kronecker product of variance–covariance matrices of genetic correlations between environments and genomic kernels through markers under two linear kernel methods, linear (genomic best linear unbiased predictors, GBLUP) and Gaussian (Gaussian kernel, GK). The other model has the same genetic component as the first model (u) plus an extra component, f, that captures random effects between environments that were not captured by the random effects u. We used five CIMMYT data sets (one maize and four wheat) that were previously used in different studies. Results show that models with G × E always have superior prediction ability than single-environment models, and the higher prediction ability of multi-environment models with u and f over the multi-environment model with only u occurred 85% of the time with GBLUP and 45% of the time with GK across the five data sets. The latter result indicated that including the random effect f is still beneficial for increasing prediction ability after adjusting by the random effect u. PMID:27793970
Bayesian Genomic Prediction with Genotype × Environment Interaction Kernel Models.
Cuevas, Jaime; Crossa, José; Montesinos-López, Osval A; Burgueño, Juan; Pérez-Rodríguez, Paulino; de Los Campos, Gustavo
2017-01-05
The phenomenon of genotype × environment (G × E) interaction in plant breeding decreases selection accuracy, thereby negatively affecting genetic gains. Several genomic prediction models incorporating G × E have been recently developed and used in genomic selection of plant breeding programs. Genomic prediction models for assessing multi-environment G × E interaction are extensions of a single-environment model, and have advantages and limitations. In this study, we propose two multi-environment Bayesian genomic models: the first model considers genetic effects [Formula: see text] that can be assessed by the Kronecker product of variance-covariance matrices of genetic correlations between environments and genomic kernels through markers under two linear kernel methods, linear (genomic best linear unbiased predictors, GBLUP) and Gaussian (Gaussian kernel, GK). The other model has the same genetic component as the first model [Formula: see text] plus an extra component, F: , that captures random effects between environments that were not captured by the random effects [Formula: see text] We used five CIMMYT data sets (one maize and four wheat) that were previously used in different studies. Results show that models with G × E always have superior prediction ability than single-environment models, and the higher prediction ability of multi-environment models with [Formula: see text] over the multi-environment model with only u occurred 85% of the time with GBLUP and 45% of the time with GK across the five data sets. The latter result indicated that including the random effect f is still beneficial for increasing prediction ability after adjusting by the random effect [Formula: see text]. Copyright © 2017 Cuevas et al.
Zhao, Y; Mette, M F; Gowda, M; Longin, C F H; Reif, J C
2014-06-01
Based on data from field trials with a large collection of 135 elite winter wheat inbred lines and 1604 F1 hybrids derived from them, we compared the accuracy of prediction of marker-assisted selection and current genomic selection approaches for the model traits heading time and plant height in a cross-validation approach. For heading time, the high accuracy seen with marker-assisted selection severely dropped with genomic selection approaches RR-BLUP (ridge regression best linear unbiased prediction) and BayesCπ, whereas for plant height, accuracy was low with marker-assisted selection as well as RR-BLUP and BayesCπ. Differences in the linkage disequilibrium structure of the functional and single-nucleotide polymorphism markers relevant for the two traits were identified in a simulation study as a likely explanation for the different trends in accuracies of prediction. A new genomic selection approach, weighted best linear unbiased prediction (W-BLUP), designed to treat the effects of known functional markers more appropriately, proved to increase the accuracy of prediction for both traits and thus closes the gap between marker-assisted and genomic selection.
Zhao, Y; Mette, M F; Gowda, M; Longin, C F H; Reif, J C
2014-01-01
Based on data from field trials with a large collection of 135 elite winter wheat inbred lines and 1604 F1 hybrids derived from them, we compared the accuracy of prediction of marker-assisted selection and current genomic selection approaches for the model traits heading time and plant height in a cross-validation approach. For heading time, the high accuracy seen with marker-assisted selection severely dropped with genomic selection approaches RR-BLUP (ridge regression best linear unbiased prediction) and BayesCπ, whereas for plant height, accuracy was low with marker-assisted selection as well as RR-BLUP and BayesCπ. Differences in the linkage disequilibrium structure of the functional and single-nucleotide polymorphism markers relevant for the two traits were identified in a simulation study as a likely explanation for the different trends in accuracies of prediction. A new genomic selection approach, weighted best linear unbiased prediction (W-BLUP), designed to treat the effects of known functional markers more appropriately, proved to increase the accuracy of prediction for both traits and thus closes the gap between marker-assisted and genomic selection. PMID:24518889
Xiao, Shijun; Wang, Panpan; Dong, Linsong; Zhang, Yaguang; Han, Zhaofang; Wang, Qiurong
2016-01-01
Whole-genome single-nucleotide polymorphism (SNP) markers are valuable genetic resources for the association and conservation studies. Genome-wide SNP development in many teleost species are still challenging because of the genome complexity and the cost of re-sequencing. Genotyping-By-Sequencing (GBS) provided an efficient reduced representative method to squeeze cost for SNP detection; however, most of recent GBS applications were reported on plant organisms. In this work, we used an EcoRI-NlaIII based GBS protocol to teleost large yellow croaker, an important commercial fish in China and East-Asia, and reported the first whole-genome SNP development for the species. 69,845 high quality SNP markers that evenly distributed along genome were detected in at least 80% of 500 individuals. Nearly 95% randomly selected genotypes were successfully validated by Sequenom MassARRAY assay. The association studies with the muscle eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) content discovered 39 significant SNP markers, contributing as high up to ∼63% genetic variance that explained by all markers. Functional genes that involved in fat digestion and absorption pathway were identified, such as APOB, CRAT and OSBPL10. Notably, PPT2 Gene, previously identified in the association study of the plasma n-3 and n-6 polyunsaturated fatty acid level in human, was re-discovered in large yellow croaker. Our study verified that EcoRI-NlaIII based GBS could produce quality SNP markers in a cost-efficient manner in teleost genome. The developed SNP markers and the EPA and DHA associated SNP loci provided invaluable resources for the population structure, conservation genetics and genomic selection of large yellow croaker and other fish organisms. PMID:28028455
Genome-Wide Specific Selection in Three Domestic Sheep Breeds.
Wang, Huihua; Zhang, Li; Cao, Jiaxve; Wu, Mingming; Ma, Xiaomeng; Liu, Zhen; Liu, Ruizao; Zhao, Fuping; Wei, Caihong; Du, Lixin
2015-01-01
Commercial sheep raised for mutton grow faster than traditional Chinese sheep breeds. Here, we aimed to evaluate genetic selection among three different types of sheep breed: two well-known commercial mutton breeds and one indigenous Chinese breed. We first combined locus-specific branch lengths and di statistical methods to detect candidate regions targeted by selection in the three different populations. The results showed that the genetic distances reached at least medium divergence for each pairwise combination. We found these two methods were highly correlated, and identified many growth-related candidate genes undergoing artificial selection. For production traits, APOBR and FTO are associated with body mass index. For meat traits, ALDOA, STK32B and FAM190A are related to marbling. For reproduction traits, CCNB2 and SLC8A3 affect oocyte development. We also found two well-known genes, GHR (which affects meat production and quality) and EDAR (associated with hair thickness) were associated with German mutton merino sheep. Furthermore, four genes (POL, RPL7, MSL1 and SHISA9) were associated with pre-weaning gain in our previous genome-wide association study. Our results indicated that combine locus-specific branch lengths and di statistical approaches can reduce the searching ranges for specific selection. And we got many credible candidate genes which not only confirm the results of previous reports, but also provide a suite of novel candidate genes in defined breeds to guide hybridization breeding.
Nunes, Vera L; Beaumont, Mark A; Butlin, Roger K; Paulo, Octávio S
2011-01-01
Identification of loci with adaptive importance is a key step to understand the speciation process in natural populations, because those loci are responsible for phenotypic variation that affects fitness in different environments. We conducted an AFLP genome scan in populations of ocellated lizards (Lacerta lepida) to search for candidate loci influenced by selection along an environmental gradient in the Iberian Peninsula. This gradient is strongly influenced by climatic variables, and two subspecies can be recognized at the opposite extremes: L. lepida iberica in the northwest and L. lepida nevadensis in the southeast. Both subspecies show substantial morphological differences that may be involved in their local adaptation to the climatic extremes. To investigate how the use of a particular outlier detection method can influence the results, a frequentist method, DFDIST, and a Bayesian method, BayeScan, were used to search for outliers influenced by selection. Additionally, the spatial analysis method was used to test for associations of AFLP marker band frequencies with 54 climatic variables by logistic regression. Results obtained with each method highlight differences in their sensitivity. DFDIST and BayeScan detected a similar proportion of outliers (3-4%), but only a few loci were simultaneously detected by both methods. Several loci detected as outliers were also associated with temperature, insolation or precipitation according to spatial analysis method. These results are in accordance with reported data in the literature about morphological and life-history variation of L. lepida subspecies along the environmental gradient. © 2010 Blackwell Publishing Ltd.
Haasl, Ryan J.; Payseur, Bret A.
2016-01-01
Genomewide scans for natural selection (GWSS) have become increasingly common over the last 15 years due to increased availability of genome-scale genetic data. Here, we report a representative survey of GWSS from 1999 to present and find that (i) between 1999 and 2009, 35 of 49 (71%) GWSS focused on human, while from 2010 to present, only 38 of 83 (46%) of GWSS focused on human, indicating increased focus on nonmodel organisms; (ii) the large majority of GWSS incorporate interpopulation or interspecific comparisons using, for example FST, cross-population extended haplotype homozygosity or the ratio of nonsynonymous to synonymous substitutions; (iii) most GWSS focus on detection of directional selection rather than other modes such as balancing selection; and (iv) in human GWSS, there is a clear shift after 2004 from microsatellite markers to dense SNP data. A survey of GWSS meant to identify loci positively selected in response to severe hypoxic conditions support an approach to GWSS in which a list of a priori candidate genes based on potential selective pressures are used to filter the list of significant hits a posteriori. We also discuss four frequently ignored determinants of genomic heterogeneity that complicate GWSS: mutation, recombination, selection and the genetic architecture of adaptive traits. We recommend that GWSS methodology should better incorporate aspects of genomewide heterogeneity using empirical estimates of relevant parameters and/or realistic, whole-chromosome simulations to improve interpretation of GWSS results. Finally, we argue that knowledge of potential selective agents improves interpretation of GWSS results and that new methods focused on correlations between environmental variables and genetic variation can help automate this approach. PMID:26224644
USDA-ARS?s Scientific Manuscript database
Background: BAC-based physical maps provide for sequencing across an entire genome or selected sub-genome regions of biological interest. Using the minimum tiling path as a guide, it is possible to select specific BAC clones from prioritized genome sections such as a genetically defined QTL interv...
USDA-ARS?s Scientific Manuscript database
In this study, we aimed to (1) predict genomic estimated breeding value (GEBV) for bacterial cold water disease (BCWD) resistance by genotyping training (n=583) and validation samples (n=53) with two genotyping platforms (24K RAD-SNP and 49K SNP) and using different genomic selection (GS) models (Ba...
Stochastic model search with binary outcomes for genome-wide association studies.
Russu, Alberto; Malovini, Alberto; Puca, Annibale A; Bellazzi, Riccardo
2012-06-01
The spread of case-control genome-wide association studies (GWASs) has stimulated the development of new variable selection methods and predictive models. We introduce a novel Bayesian model search algorithm, Binary Outcome Stochastic Search (BOSS), which addresses the model selection problem when the number of predictors far exceeds the number of binary responses. Our method is based on a latent variable model that links the observed outcomes to the underlying genetic variables. A Markov Chain Monte Carlo approach is used for model search and to evaluate the posterior probability of each predictor. BOSS is compared with three established methods (stepwise regression, logistic lasso, and elastic net) in a simulated benchmark. Two real case studies are also investigated: a GWAS on the genetic bases of longevity, and the type 2 diabetes study from the Wellcome Trust Case Control Consortium. Simulations show that BOSS achieves higher precisions than the reference methods while preserving good recall rates. In both experimental studies, BOSS successfully detects genetic polymorphisms previously reported to be associated with the analyzed phenotypes. BOSS outperforms the other methods in terms of F-measure on simulated data. In the two real studies, BOSS successfully detects biologically relevant features, some of which are missed by univariate analysis and the three reference techniques. The proposed algorithm is an advance in the methodology for model selection with a large number of features. Our simulated and experimental results showed that BOSS proves effective in detecting relevant markers while providing a parsimonious model.
Theory of prokaryotic genome evolution.
Sela, Itamar; Wolf, Yuri I; Koonin, Eugene V
2016-10-11
Bacteria and archaea typically possess small genomes that are tightly packed with protein-coding genes. The compactness of prokaryotic genomes is commonly perceived as evidence of adaptive genome streamlining caused by strong purifying selection in large microbial populations. In such populations, even the small cost incurred by nonfunctional DNA because of extra energy and time expenditure is thought to be sufficient for this extra genetic material to be eliminated by selection. However, contrary to the predictions of this model, there exists a consistent, positive correlation between the strength of selection at the protein sequence level, measured as the ratio of nonsynonymous to synonymous substitution rates, and microbial genome size. Here, by fitting the genome size distributions in multiple groups of prokaryotes to predictions of mathematical models of population evolution, we show that only models in which acquisition of additional genes is, on average, slightly beneficial yield a good fit to genomic data. These results suggest that the number of genes in prokaryotic genomes reflects the equilibrium between the benefit of additional genes that diminishes as the genome grows and deletion bias (i.e., the rate of deletion of genetic material being slightly greater than the rate of acquisition). Thus, new genes acquired by microbial genomes, on average, appear to be adaptive. The tight spacing of protein-coding genes likely results from a combination of the deletion bias and purifying selection that efficiently eliminates nonfunctional, noncoding sequences.
Watanabe, Takahito; Noji, Sumihare; Mito, Taro
2014-08-15
Hemimetabolous, or incompletely metamorphosing, insects are phylogenetically basal. These insects include many deleterious species. The cricket, Gryllus bimaculatus, is an emerging model for hemimetabolous insects, based on the success of RNA interference (RNAi)-based gene-functional analyses and transgenic technology. Taking advantage of genome-editing technologies in this species would greatly promote functional genomics studies. Genome editing using transcription activator-like effector nucleases (TALENs) has proven to be an effective method for site-specific genome manipulation in various species. TALENs are artificial nucleases that are capable of inducing DNA double-strand breaks into specified target sequences. Here, we describe a protocol for TALEN-based gene knockout in G. bimaculatus, including a mutant selection scheme via mutation detection assays, for generating homozygous knockout organisms. Copyright © 2014 Elsevier Inc. All rights reserved.
Selective significance of genome size in a plant community with heavy metal pollution.
Vidic, T; Greilhuber, J; Vilhar, B; Dermastia, M
2009-09-01
In eukaryotes, nuclear genome sizes vary by more than five orders of magnitude. This variation is not related to organismal complexity, and its origin and biological significance are still disputed. One of the open questions is whether genome size has an adaptive role. We tested the hypothesis that genome size has selective significance, using five grassland communities occurring on a gradient of metal pollution of the soil as a model. We detected a negative correlation between the concentration of contaminating metals in the soil and the number of vascular plant species. Analysis of genome sizes of 70 herbaceous dicot perennial species occurring on the investigated plots revealed a negative correlation between the concentration of contaminating metals in the soil and the proportion of species with large genomes in plant communities. Consistent with the hypothesis, these results show that species with large genomes are at selective disadvantage in extreme environmental conditions.
Suchan, Tomasz; Pitteloud, Camille; Gerasimova, Nadezhda S.; Kostikova, Anna; Schmid, Sarah; Arrigo, Nils; Pajkovic, Mila; Ronikier, Michał; Alvarez, Nadir
2016-01-01
In the recent years, many protocols aimed at reproducibly sequencing reduced-genome subsets in non-model organisms have been published. Among them, RAD-sequencing is one of the most widely used. It relies on digesting DNA with specific restriction enzymes and performing size selection on the resulting fragments. Despite its acknowledged utility, this method is of limited use with degraded DNA samples, such as those isolated from museum specimens, as these samples are less likely to harbor fragments long enough to comprise two restriction sites making possible ligation of the adapter sequences (in the case of double-digest RAD) or performing size selection of the resulting fragments (in the case of single-digest RAD). Here, we address these limitations by presenting a novel method called hybridization RAD (hyRAD). In this approach, biotinylated RAD fragments, covering a random fraction of the genome, are used as baits for capturing homologous fragments from genomic shotgun sequencing libraries. This simple and cost-effective approach allows sequencing of orthologous loci even from highly degraded DNA samples, opening new avenues of research in the field of museum genomics. Not relying on the restriction site presence, it improves among-sample loci coverage. In a trial study, hyRAD allowed us to obtain a large set of orthologous loci from fresh and museum samples from a non-model butterfly species, with a high proportion of single nucleotide polymorphisms present in all eight analyzed specimens, including 58-year-old museum samples. The utility of the method was further validated using 49 museum and fresh samples of a Palearctic grasshopper species for which the spatial genetic structure was previously assessed using mtDNA amplicons. The application of the method is eventually discussed in a wider context. As it does not rely on the restriction site presence, it is therefore not sensitive to among-sample loci polymorphisms in the restriction sites that usually causes loci dropout. This should enable the application of hyRAD to analyses at broader evolutionary scales. PMID:26999359
Isolation of single Chlamydia-infected cells using laser microdissection.
Podgorny, Oleg V; Polina, Nadezhda F; Babenko, Vladislav V; Karpova, Irina Y; Kostryukova, Elena S; Govorun, Vadim M; Lazarev, Vassili N
2015-02-01
Chlamydia are obligate intracellular parasites of humans and animals that cause a wide range of acute and chronic infections. To elucidate the genetic basis of chlamydial parasitism, several approaches for making genetic modifications to Chlamydia have recently been reported. However, the lack of the available methods for the fast and effective selection of genetically modified bacteria restricts the application of genetic tools. We suggest the use of laser microdissection to isolate of single live Chlamydia-infected cells for the re-cultivation and whole-genome sequencing of single inclusion-derived Chlamydia. To visualise individual infected cells, we made use of the vital labelling of inclusions with the fluorescent Golgi-specific dye BODIPY® FL C5-ceramide. We demonstrated that single Chlamydia-infected cells isolated by laser microdissection and placed onto a host cell monolayer resulted in new cycles of infection. We also demonstrated the successful use of whole-genome sequencing to study the genomic variability of Chlamydia derived from a single inclusion. Our work provides the first evidence of the successful use of laser microdissection for the isolation of single live Chlamydia-infected cells, thus demonstrating that this method can help overcome the barriers to the fast and effective selection of Chlamydia. Copyright © 2014 Elsevier B.V. All rights reserved.
Genomic prediction of reproduction traits for Merino sheep.
Bolormaa, S; Brown, D J; Swan, A A; van der Werf, J H J; Hayes, B J; Daetwyler, H D
2017-06-01
Economically important reproduction traits in sheep, such as number of lambs weaned and litter size, are expressed only in females and later in life after most selection decisions are made, which makes them ideal candidates for genomic selection. Accurate genomic predictions would lead to greater genetic gain for these traits by enabling accurate selection of young rams with high genetic merit. The aim of this study was to design and evaluate the accuracy of a genomic prediction method for female reproduction in sheep using daughter trait deviations (DTD) for sires and ewe phenotypes (when individual ewes were genotyped) for three reproduction traits: number of lambs born (NLB), litter size (LSIZE) and number of lambs weaned. Genomic best linear unbiased prediction (GBLUP), BayesR and pedigree BLUP analyses of the three reproduction traits measured on 5340 sheep (4503 ewes and 837 sires) with real and imputed genotypes for 510 174 SNPs were performed. The prediction of breeding values using both sire and ewe trait records was validated in Merino sheep. Prediction accuracy was evaluated by across sire family and random cross-validations. Accuracies of genomic estimated breeding values (GEBVs) were assessed as the mean Pearson correlation adjusted by the accuracy of the input phenotypes. The addition of sire DTD into the prediction analysis resulted in higher accuracies compared with using only ewe records in genomic predictions or pedigree BLUP. Using GBLUP, the average accuracy based on the combined records (ewes and sire DTD) was 0.43 across traits, but the accuracies varied by trait and type of cross-validations. The accuracies of GEBVs from random cross-validations (range 0.17-0.61) were higher than were those from sire family cross-validations (range 0.00-0.51). The GEBV accuracies of 0.41-0.54 for NLB and LSIZE based on the combined records were amongst the highest in the study. Although BayesR was not significantly different from GBLUP in prediction accuracy, it identified several candidate genes which are known to be associated with NLB and LSIZE. The approach provides a way to make use of all data available in genomic prediction for traits that have limited recording. © 2017 Stichting International Foundation for Animal Genetics.
Assessing Predictive Properties of Genome-Wide Selection in Soybeans
Xavier, Alencar; Muir, William M.; Rainey, Katy Martin
2016-01-01
Many economically important traits in plant breeding have low heritability or are difficult to measure. For these traits, genomic selection has attractive features and may boost genetic gains. Our goal was to evaluate alternative scenarios to implement genomic selection for yield components in soybean (Glycine max L. merr). We used a nested association panel with cross validation to evaluate the impacts of training population size, genotyping density, and prediction model on the accuracy of genomic prediction. Our results indicate that training population size was the factor most relevant to improvement in genome-wide prediction, with greatest improvement observed in training sets up to 2000 individuals. We discuss assumptions that influence the choice of the prediction model. Although alternative models had minor impacts on prediction accuracy, the most robust prediction model was the combination of reproducing kernel Hilbert space regression and BayesB. Higher genotyping density marginally improved accuracy. Our study finds that breeding programs seeking efficient genomic selection in soybeans would best allocate resources by investing in a representative training set. PMID:27317786
The genome landscape of indigenous African cattle.
Kim, Jaemin; Hanotte, Olivier; Mwai, Okeyo Ally; Dessie, Tadelle; Bashir, Salim; Diallo, Boubacar; Agaba, Morris; Kim, Kwondo; Kwak, Woori; Sung, Samsun; Seo, Minseok; Jeong, Hyeonsoo; Kwon, Taehyung; Taye, Mengistie; Song, Ki-Duk; Lim, Dajeong; Cho, Seoae; Lee, Hyun-Jeong; Yoon, Duhak; Oh, Sung Jong; Kemp, Stephen; Lee, Hak-Kyo; Kim, Heebal
2017-02-20
The history of African indigenous cattle and their adaptation to environmental and human selection pressure is at the root of their remarkable diversity. Characterization of this diversity is an essential step towards understanding the genomic basis of productivity and adaptation to survival under African farming systems. We analyze patterns of African cattle genetic variation by sequencing 48 genomes from five indigenous populations and comparing them to the genomes of 53 commercial taurine breeds. We find the highest genetic diversity among African zebu and sanga cattle. Our search for genomic regions under selection reveals signatures of selection for environmental adaptive traits. In particular, we identify signatures of selection including genes and/or pathways controlling anemia and feeding behavior in the trypanotolerant N'Dama, coat color and horn development in Ankole, and heat tolerance and tick resistance across African cattle especially in zebu breeds. Our findings unravel at the genome-wide level, the unique adaptive diversity of African cattle while emphasizing the opportunities for sustainable improvement of livestock productivity on the continent.
Detecting signatures of positive selection associated with musical aptitude in the human genome
Liu, Xuanyao; Kanduri, Chakravarthi; Oikkonen, Jaana; Karma, Kai; Raijas, Pirre; Ukkola-Vuoti, Liisa; Teo, Yik-Ying; Järvelä, Irma
2016-01-01
Abilities related to musical aptitude appear to have a long history in human evolution. To elucidate the molecular and evolutionary background of musical aptitude, we compared genome-wide genotyping data (641 K SNPs) of 148 Finnish individuals characterized for musical aptitude. We assigned signatures of positive selection in a case-control setting using three selection methods: haploPS, XP-EHH and FST. Gene ontology classification revealed that the positive selection regions contained genes affecting inner-ear development. Additionally, literature survey has shown that several of the identified genes were known to be involved in auditory perception (e.g. GPR98, USH2A), cognition and memory (e.g. GRIN2B, IL1A, IL1B, RAPGEF5), reward mechanisms (RGS9), and song perception and production of songbirds (e.g. FOXP1, RGS9, GPR98, GRIN2B). Interestingly, genes related to inner-ear development and cognition were also detected in a previous genome-wide association study of musical aptitude. However, the candidate genes detected in this study were not reported earlier in studies of musical abilities. Identification of genes related to language development (FOXP1 and VLDLR) support the popular hypothesis that music and language share a common genetic and evolutionary background. The findings are consistent with the evolutionary conservation of genes related to auditory processes in other species and provide first empirical evidence for signatures of positive selection for abilities that contribute to musical aptitude. PMID:26879527
Detecting signatures of positive selection associated with musical aptitude in the human genome.
Liu, Xuanyao; Kanduri, Chakravarthi; Oikkonen, Jaana; Karma, Kai; Raijas, Pirre; Ukkola-Vuoti, Liisa; Teo, Yik-Ying; Järvelä, Irma
2016-02-16
Abilities related to musical aptitude appear to have a long history in human evolution. To elucidate the molecular and evolutionary background of musical aptitude, we compared genome-wide genotyping data (641 K SNPs) of 148 Finnish individuals characterized for musical aptitude. We assigned signatures of positive selection in a case-control setting using three selection methods: haploPS, XP-EHH and FST. Gene ontology classification revealed that the positive selection regions contained genes affecting inner-ear development. Additionally, literature survey has shown that several of the identified genes were known to be involved in auditory perception (e.g. GPR98, USH2A), cognition and memory (e.g. GRIN2B, IL1A, IL1B, RAPGEF5), reward mechanisms (RGS9), and song perception and production of songbirds (e.g. FOXP1, RGS9, GPR98, GRIN2B). Interestingly, genes related to inner-ear development and cognition were also detected in a previous genome-wide association study of musical aptitude. However, the candidate genes detected in this study were not reported earlier in studies of musical abilities. Identification of genes related to language development (FOXP1 and VLDLR) support the popular hypothesis that music and language share a common genetic and evolutionary background. The findings are consistent with the evolutionary conservation of genes related to auditory processes in other species and provide first empirical evidence for signatures of positive selection for abilities that contribute to musical aptitude.
High degree of genetic differentiation in marine three-spined sticklebacks (Gasterosteus aculeatus).
Defaveri, Jacquelin; Shikano, Takahito; Shimada, Yukinori; Merilä, Juha
2013-09-01
Populations of widespread marine organisms are typically characterized by a low degree of genetic differentiation in neutral genetic markers, but much less is known about differentiation in genes whose functional roles are associated with specific selection regimes. To uncover possible adaptive population divergence and heterogeneous genomic differentiation in marine three-spined sticklebacks (Gasterosteus aculeatus), we used a candidate gene-based genome-scan approach to analyse variability in 138 microsatellite loci located within/close to (<6 kb) functionally important genes in samples collected from ten geographic locations. The degree of genetic differentiation in markers classified as neutral or under balancing selection-as determined with several outlier detection methods-was low (F(ST) = 0.033 or 0.011, respectively), whereas average FST for directionally selected markers was significantly higher (F(ST) = 0.097). Clustering analyses provided support for genomic and geographic heterogeneity in selection: six genetic clusters were identified based on allele frequency differences in the directionally selected loci, whereas four were identified with the neutral loci. Allelic variation in several loci exhibited significant associations with environmental variables, supporting the conjecture that temperature and salinity, but not optic conditions, are important drivers of adaptive divergence among populations. In general, these results suggest that in spite of the high degree of physical connectivity and gene flow as inferred from neutral marker genes, marine stickleback populations are strongly genetically structured in loci associated with functionally relevant genes. © 2013 John Wiley & Sons Ltd.
ovoD Co-selection: A Method for Enriching CRISPR/Cas9-Edited Alleles in Drosophila.
Ewen-Campen, Ben; Perrimon, Norbert
2018-06-22
Screening for successful CRISPR/Cas9 editing events remains a time consuming technical bottleneck in the field of Drosophila genome editing. This step can be particularly laborious for events that do not cause a visible phenotype, or those which occur at relatively low frequency. A promising strategy to enrich for desired CRISPR events is to co-select for an independent CRISPR event that produces an easily detectable phenotype. Here, we describe a simple negative co-selection strategy involving CRISPR-editing of a dominant female sterile allele, ovo D1 In this system (" ovo D co-selection"), the only functional germ cells in injected females are those that have been edited at the ovo D1 locus, and thus all offspring of these flies have undergone editing of at least one locus. We demonstrate that ovo D co-selection can be used to enrich for knock-out mutagenesis via nonhomologous end-joining (NHEJ), and for knock-in alleles via homology-directed repair (HDR). Altogether, our results demonstrate that ovoD co-selection reduces the amount of screening necessary to isolate desired CRISPR events in Drosophila. Copyright © 2018, G3: Genes, Genomes, Genetics.
Konijnendijk, Nellie; Shikano, Takahito; Daneels, Dorien; Volckaert, Filip A M; Raeymaekers, Joost A M
2015-09-01
Local adaptation is often obvious when gene flow is impeded, such as observed at large spatial scales and across strong ecological contrasts. However, it becomes less certain at small scales such as between adjacent populations or across weak ecological contrasts, when gene flow is strong. While studies on genomic adaptation tend to focus on the former, less is known about the genomic targets of natural selection in the latter situation. In this study, we investigate genomic adaptation in populations of the three-spined stickleback Gasterosteus aculeatus L. across a small-scale ecological transition with salinities ranging from brackish to fresh. Adaptation to salinity has been repeatedly demonstrated in this species. A genome scan based on 87 microsatellite markers revealed only few signatures of selection, likely owing to the constraints that homogenizing gene flow puts on adaptive divergence. However, the detected loci appear repeatedly as targets of selection in similar studies of genomic adaptation in the three-spined stickleback. We conclude that the signature of genomic selection in the face of strong gene flow is weak, yet detectable. We argue that the range of studies of genomic divergence should be extended to include more systems characterized by limited geographical and ecological isolation, which is often a realistic setting in nature.
Bahbahani, Hussain; Clifford, Harry; Wragg, David; Mbole-Kariuki, Mary N; Van Tassell, Curtis; Sonstegard, Tad; Woolhouse, Mark; Hanotte, Olivier
2015-01-01
The small East African Shorthorn Zebu (EASZ) is the main indigenous cattle across East Africa. A recent genome wide SNP analysis revealed an ancient stable African taurine x Asian zebu admixture. Here, we assess the presence of candidate signatures of positive selection in their genome, with the aim to provide qualitative insights about the corresponding selective pressures. Four hundred and twenty-five EASZ and four reference populations (Holstein-Friesian, Jersey, N’Dama and Nellore) were analysed using 46,171 SNPs covering all autosomes and the X chromosome. Following FST and two extended haplotype homozygosity-based (iHS and Rsb) analyses 24 candidate genome regions within 14 autosomes and the X chromosome were revealed, in which 18 and 4 were previously identified in tropical-adapted and commercial breeds, respectively. These regions overlap with 340 bovine QTL. They include 409 annotated genes, in which 37 were considered as candidates. These genes are involved in various biological pathways (e.g. immunity, reproduction, development and heat tolerance). Our results support that different selection pressures (e.g. environmental constraints, human selection, genome admixture constrains) have shaped the genome of EASZ. We argue that these candidate regions represent genome landmarks to be maintained in breeding programs aiming to improve sustainable livestock productivity in the tropics. PMID:26130263
Orozco-terWengel, Pablo; Kapun, Martin; Nolte, Viola; Kofler, Robert; Flatt, Thomas; Schlötterer, Christian
2012-10-01
The genomic basis of adaptation to novel environments is a fundamental problem in evolutionary biology that has gained additional importance in the light of the recent global change discussion. Here, we combined laboratory natural selection (experimental evolution) in Drosophila melanogaster with genome-wide next generation sequencing of DNA pools (Pool-Seq) to identify alleles that are favourable in a novel laboratory environment and traced their trajectories during the adaptive process. Already after 15 generations, we identified a pronounced genomic response to selection, with almost 5000 single nucleotide polymorphisms (SNP; genome-wide false discovery rates < 0.005%) deviating from neutral expectation. Importantly, the evolutionary trajectories of the selected alleles were heterogeneous, with the alleles falling into two distinct classes: (i) alleles that continuously rise in frequency; and (ii) alleles that at first increase rapidly but whose frequencies then reach a plateau. Our data thus suggest that the genomic response to selection can involve a large number of selected SNPs that show unexpectedly complex evolutionary trajectories, possibly due to nonadditive effects. © 2012 Blackwell Publishing Ltd.
Sequencing technologies - the next generation.
Metzker, Michael L
2010-01-01
Demand has never been greater for revolutionary technologies that deliver fast, inexpensive and accurate genome information. This challenge has catalysed the development of next-generation sequencing (NGS) technologies. The inexpensive production of large volumes of sequence data is the primary advantage over conventional methods. Here, I present a technical review of template preparation, sequencing and imaging, genome alignment and assembly approaches, and recent advances in current and near-term commercially available NGS instruments. I also outline the broad range of applications for NGS technologies, in addition to providing guidelines for platform selection to address biological questions of interest.
Schrider, Daniel R; Kern, Andrew D
2014-06-09
Identifying the complete set of functional elements within the human genome would be a windfall for multiple areas of biological research including medicine, molecular biology, and evolution. Complete knowledge of function would aid in the prioritization of loci when searching for the genetic bases of disease or adaptive phenotypes. Because mutations that disrupt function are disfavored by natural selection, purifying selection leaves a detectable signature within functional elements; accordingly, this signal has been exploited for over a decade through the use of genomic comparisons of distantly related species. While this is so, the functional complement of the genome changes extensively across time and between lineages; therefore, evidence of the current action of purifying selection in humans is essential. Because the removal of deleterious mutations by natural selection also reduces within-species genetic diversity within functional loci, dense population genetic data have the potential to reveal genomic elements that are currently functional. Here, we assess the potential of this approach by examining an ultradeep sample of human mitochondrial genomes (n = 16,411). We show that the high density of polymorphism in this data set precisely delineates regions experiencing purifying selection. Furthermore, we show that the number of segregating alleles at a site is strongly correlated with its divergence across species after accounting for known mutational biases in human mitochondrial DNA (ρ = 0.51; P < 2.2 × 10(-16)). These two measures track one another at a remarkably fine scale across many loci-a correlation that is purely the result of natural selection. Our results demonstrate that genetic variation has the potential to reveal with surprising precision which regions in the genome are currently performing important functions and likely to have deleterious fitness effects when mutated. As more complete human genomes are sequenced, similar power to reveal purifying selection may be achievable in the human nuclear genome. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Rübben, Albert; Nordhoff, Ole
2013-01-01
Summary Most clinically distinguishable malignant tumors are characterized by specific mutations, specific patterns of chromosomal rearrangements and a predominant mechanism of genetic instability but it remains unsolved whether modifications of cancer genomes can be explained solely by mutations and selection through the cancer microenvironment. It has been suggested that internal dynamics of genomic modifications as opposed to the external evolutionary forces have a significant and complex impact on Darwinian species evolution. A similar situation can be expected for somatic cancer evolution as molecular key mechanisms encountered in species evolution also constitute prevalent mutation mechanisms in human cancers. This assumption is developed into a systems approach of carcinogenesis which focuses on possible inner constraints of the genome architecture on lineage selection during somatic cancer evolution. The proposed systems approach can be considered an analogy to the concept of evolvability in species evolution. The principal hypothesis is that permissive or restrictive effects of the genome architecture on lineage selection during somatic cancer evolution exist and have a measurable impact. The systems approach postulates three classes of lineage selection effects of the genome architecture on somatic cancer evolution: i) effects mediated by changes of fitness of cells of cancer lineage, ii) effects mediated by changes of mutation probabilities and iii) effects mediated by changes of gene designation and physical and functional genome redundancy. Physical genome redundancy is the copy number of identical genetic sequences. Functional genome redundancy of a gene or a regulatory element is defined as the number of different genetic elements, regardless of copy number, coding for the same specific biological function within a cancer cell. Complex interactions of the genome architecture on lineage selection may be expected when modifications of the genome architecture have multiple and possibly opposed effects which manifest themselves at disparate times and progression stages. Dissection of putative mechanisms mediating constraints exerted by the genome architecture on somatic cancer evolution may provide an algorithm for understanding and predicting as well as modifying somatic cancer evolution in individual patients. PMID:23336076
Genome wide selection in Citrus breeding.
Gois, I B; Borém, A; Cristofani-Yaly, M; de Resende, M D V; Azevedo, C F; Bastianel, M; Novelli, V M; Machado, M A
2016-10-17
Genome wide selection (GWS) is essential for the genetic improvement of perennial species such as Citrus because of its ability to increase gain per unit time and to enable the efficient selection of characteristics with low heritability. This study assessed GWS efficiency in a population of Citrus and compared it with selection based on phenotypic data. A total of 180 individual trees from a cross between Pera sweet orange (Citrus sinensis Osbeck) and Murcott tangor (Citrus sinensis Osbeck x Citrus reticulata Blanco) were evaluated for 10 characteristics related to fruit quality. The hybrids were genotyped using 5287 DArT_seq TM (diversity arrays technology) molecular markers and their effects on phenotypes were predicted using the random regression - best linear unbiased predictor (rr-BLUP) method. The predictive ability, prediction bias, and accuracy of GWS were estimated to verify its effectiveness for phenotype prediction. The proportion of genetic variance explained by the markers was also computed. The heritability of the traits, as determined by markers, was 16-28%. The predictive ability of these markers ranged from 0.53 to 0.64, and the regression coefficients between predicted and observed phenotypes were close to unity. Over 35% of the genetic variance was accounted for by the markers. Accuracy estimates with GWS were lower than those obtained by phenotypic analysis; however, GWS was superior in terms of genetic gain per unit time. Thus, GWS may be useful for Citrus breeding as it can predict phenotypes early and accurately, and reduce the length of the selection cycle. This study demonstrates the feasibility of genomic selection in Citrus.
Accuracy of genomic selection in European maize elite breeding populations.
Zhao, Yusheng; Gowda, Manje; Liu, Wenxin; Würschum, Tobias; Maurer, Hans P; Longin, Friedrich H; Ranc, Nicolas; Reif, Jochen C
2012-03-01
Genomic selection is a promising breeding strategy for rapid improvement of complex traits. The objective of our study was to investigate the prediction accuracy of genomic breeding values through cross validation. The study was based on experimental data of six segregating populations from a half-diallel mating design with 788 testcross progenies from an elite maize breeding program. The plants were intensively phenotyped in multi-location field trials and fingerprinted with 960 SNP markers. We used random regression best linear unbiased prediction in combination with fivefold cross validation. The prediction accuracy across populations was higher for grain moisture (0.90) than for grain yield (0.58). The accuracy of genomic selection realized for grain yield corresponds to the precision of phenotyping at unreplicated field trials in 3-4 locations. As for maize up to three generations are feasible per year, selection gain per unit time is high and, consequently, genomic selection holds great promise for maize breeding programs.
Schlötterer, C; Kofler, R; Versace, E; Tobler, R; Franssen, S U
2015-05-01
Evolve and resequence (E&R) is a new approach to investigate the genomic responses to selection during experimental evolution. By using whole genome sequencing of pools of individuals (Pool-Seq), this method can identify selected variants in controlled and replicable experimental settings. Reviewing the current state of the field, we show that E&R can be powerful enough to identify causative genes and possibly even single-nucleotide polymorphisms. We also discuss how the experimental design and the complexity of the trait could result in a large number of false positive candidates. We suggest experimental and analytical strategies to maximize the power of E&R to uncover the genotype-phenotype link and serve as an important research tool for a broad range of evolutionary questions.
2013-01-01
Background Genomic selection is an appealing method to select purebreds for crossbred performance. In the case of crossbred records, single nucleotide polymorphism (SNP) effects can be estimated using an additive model or a breed-specific allele model. In most studies, additive gene action is assumed. However, dominance is the likely genetic basis of heterosis. Advantages of incorporating dominance in genomic selection were investigated in a two-way crossbreeding program for a trait with different magnitudes of dominance. Training was carried out only once in the simulation. Results When the dominance variance and heterosis were large and overdominance was present, a dominance model including both additive and dominance SNP effects gave substantially greater cumulative response to selection than the additive model. Extra response was the result of an increase in heterosis but at a cost of reduced purebred performance. When the dominance variance and heterosis were realistic but with overdominance, the advantage of the dominance model decreased but was still significant. When overdominance was absent, the dominance model was slightly favored over the additive model, but the difference in response between the models increased as the number of quantitative trait loci increased. This reveals the importance of exploiting dominance even in the absence of overdominance. When there was no dominance, response to selection for the dominance model was as high as for the additive model, indicating robustness of the dominance model. The breed-specific allele model was inferior to the dominance model in all cases and to the additive model except when the dominance variance and heterosis were large and with overdominance. However, the advantage of the dominance model over the breed-specific allele model may decrease as differences in linkage disequilibrium between the breeds increase. Retraining is expected to reduce the advantage of the dominance model over the alternatives, because in general, the advantage becomes important only after five or six generations post-training. Conclusion Under dominance and without retraining, genomic selection based on the dominance model is superior to the additive model and the breed-specific allele model to maximize crossbred performance through purebred selection. PMID:23621868
Hao, Chenyang; Wang, Yuquan; Chao, Shiaoman; Li, Tian; Liu, Hongxia; Wang, Lanfen; Zhang, Xueyong
2017-01-30
A Chinese wheat mini core collection was genotyped using the wheat 9 K iSelect SNP array. Total 2420 and 2396 polymorphic SNPs were detected on the A and the B genome chromosomes, which formed 878 haplotype blocks. There were more blocks in the B genome, but the average block size was significantly (P < 0.05) smaller than those in the A genome. Intense selection (domestication and breeding) had a stronger effect on the A than on the B genome chromosomes. Based on the genetic pedigrees, many blocks can be traced back to a well-known Strampelli cross, which was made one century ago. Furthermore, polyploidization of wheat (both tetraploidization and hexaploidization) induced revolutionary changes in both the A and the B genomes, with a greater increase of gene diversity compared to their diploid ancestors. Modern breeding has dramatically increased diversity in the gene coding regions, though obvious blocks were formed on most of the chromosomes in both tetraploid and hexaploid wheats. Tag-SNP markers identified in this study can be used for marker assisted selection using haplotype blocks as a wheat breeding strategy. This strategy can also be employed to facilitate genome selection in other self-pollinating crop species.
De Kort, H; Vandepitte, K; Mergeay, J; Mijnsbrugge, K V; Honnay, O
2015-01-01
The evaluation of the molecular signatures of selection in species lacking an available closely related reference genome remains challenging, yet it may provide valuable fundamental insights into the capacity of populations to respond to environmental cues. We screened 25 native populations of the tree species Frangula alnus subsp. alnus (Rhamnaceae), covering three different geographical scales, for 183 annotated single-nucleotide polymorphisms (SNPs). Standard population genomic outlier screens were combined with individual-based and multivariate landscape genomic approaches to examine the strength of selection relative to neutral processes in shaping genomic variation, and to identify the main environmental agents driving selection. Our results demonstrate a more distinct signature of selection with increasing geographical distance, as indicated by the proportion of SNPs (i) showing exceptional patterns of genetic diversity and differentiation (outliers) and (ii) associated with climate. Both temperature and precipitation have an important role as selective agents in shaping adaptive genomic differentiation in F. alnus subsp. alnus, although their relative importance differed among spatial scales. At the ‘intermediate' and ‘regional' scales, where limited genetic clustering and high population diversity were observed, some indications of natural selection may suggest a major role for gene flow in safeguarding adaptability. High genetic diversity at loci under selection in particular, indicated considerable adaptive potential, which may nevertheless be compromised by the combined effects of climate change and habitat fragmentation. PMID:25944466
Weigel, K A; VanRaden, P M; Norman, H D; Grosu, H
2017-12-01
In the early 1900s, breed society herdbooks had been established and milk-recording programs were in their infancy. Farmers wanted to improve the productivity of their cattle, but the foundations of population genetics, quantitative genetics, and animal breeding had not been laid. Early animal breeders struggled to identify genetically superior families using performance records that were influenced by local environmental conditions and herd-specific management practices. Daughter-dam comparisons were used for more than 30 yr and, although genetic progress was minimal, the attention given to performance recording, genetic theory, and statistical methods paid off in future years. Contemporary (herdmate) comparison methods allowed more accurate accounting for environmental factors and genetic progress began to accelerate when these methods were coupled with artificial insemination and progeny testing. Advances in computing facilitated the implementation of mixed linear models that used pedigree and performance data optimally and enabled accurate selection decisions. Sequencing of the bovine genome led to a revolution in dairy cattle breeding, and the pace of scientific discovery and genetic progress accelerated rapidly. Pedigree-based models have given way to whole-genome prediction, and Bayesian regression models and machine learning algorithms have joined mixed linear models in the toolbox of modern animal breeders. Future developments will likely include elucidation of the mechanisms of genetic inheritance and epigenetic modification in key biological pathways, and genomic data will be used with data from on-farm sensors to facilitate precision management on modern dairy farms. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Genomic selection across multiple breeding cycles in applied bread wheat breeding.
Michel, Sebastian; Ametz, Christian; Gungor, Huseyin; Epure, Doru; Grausgruber, Heinrich; Löschenberger, Franziska; Buerstmayr, Hermann
2016-06-01
We evaluated genomic selection across five breeding cycles of bread wheat breeding. Bias of within-cycle cross-validation and methods for improving the prediction accuracy were assessed. The prospect of genomic selection has been frequently shown by cross-validation studies using the same genetic material across multiple environments, but studies investigating genomic selection across multiple breeding cycles in applied bread wheat breeding are lacking. We estimated the prediction accuracy of grain yield, protein content and protein yield of 659 inbred lines across five independent breeding cycles and assessed the bias of within-cycle cross-validation. We investigated the influence of outliers on the prediction accuracy and predicted protein yield by its components traits. A high average heritability was estimated for protein content, followed by grain yield and protein yield. The bias of the prediction accuracy using populations from individual cycles using fivefold cross-validation was accordingly substantial for protein yield (17-712 %) and less pronounced for protein content (8-86 %). Cross-validation using the cycles as folds aimed to avoid this bias and reached a maximum prediction accuracy of [Formula: see text] = 0.51 for protein content, [Formula: see text] = 0.38 for grain yield and [Formula: see text] = 0.16 for protein yield. Dropping outlier cycles increased the prediction accuracy of grain yield to [Formula: see text] = 0.41 as estimated by cross-validation, while dropping outlier environments did not have a significant effect on the prediction accuracy. Independent validation suggests, on the other hand, that careful consideration is necessary before an outlier correction is undertaken, which removes lines from the training population. Predicting protein yield by multiplying genomic estimated breeding values of grain yield and protein content raised the prediction accuracy to [Formula: see text] = 0.19 for this derived trait.
Juliana, Philomin; Singh, Ravi P; Singh, Pawan K; Crossa, Jose; Rutkoski, Jessica E; Poland, Jesse A; Bergstrom, Gary C; Sorrells, Mark E
2017-07-01
The leaf spotting diseases in wheat that include Septoria tritici blotch (STB) caused by , Stagonospora nodorum blotch (SNB) caused by , and tan spot (TS) caused by pose challenges to breeding programs in selecting for resistance. A promising approach that could enable selection prior to phenotyping is genomic selection that uses genome-wide markers to estimate breeding values (BVs) for quantitative traits. To evaluate this approach for seedling and/or adult plant resistance (APR) to STB, SNB, and TS, we compared the predictive ability of least-squares (LS) approach with genomic-enabled prediction models including genomic best linear unbiased predictor (GBLUP), Bayesian ridge regression (BRR), Bayes A (BA), Bayes B (BB), Bayes Cπ (BC), Bayesian least absolute shrinkage and selection operator (BL), and reproducing kernel Hilbert spaces markers (RKHS-M), a pedigree-based model (RKHS-P) and RKHS markers and pedigree (RKHS-MP). We observed that LS gave the lowest prediction accuracies and RKHS-MP, the highest. The genomic-enabled prediction models and RKHS-P gave similar accuracies. The increase in accuracy using genomic prediction models over LS was 48%. The mean genomic prediction accuracies were 0.45 for STB (APR), 0.55 for SNB (seedling), 0.66 for TS (seedling) and 0.48 for TS (APR). We also compared markers from two whole-genome profiling approaches: genotyping by sequencing (GBS) and diversity arrays technology sequencing (DArTseq) for prediction. While, GBS markers performed slightly better than DArTseq, combining markers from the two approaches did not improve accuracies. We conclude that implementing GS in breeding for these diseases would help to achieve higher accuracies and rapid gains from selection. Copyright © 2017 Crop Science Society of America.
Sunflower Hybrid Breeding: From Markers to Genomic Selection
Dimitrijevic, Aleksandra; Horn, Renate
2018-01-01
In sunflower, molecular markers for simple traits as, e.g., fertility restoration, high oleic acid content, herbicide tolerance or resistances to Plasmopara halstedii, Puccinia helianthi, or Orobanche cumana have been successfully used in marker-assisted breeding programs for years. However, agronomically important complex quantitative traits like yield, heterosis, drought tolerance, oil content or selection for disease resistance, e.g., against Sclerotinia sclerotiorum have been challenging and will require genome-wide approaches. Plant genetic resources for sunflower are being collected and conserved worldwide that represent valuable resources to study complex traits. Sunflower association panels provide the basis for genome-wide association studies, overcoming disadvantages of biparental populations. Advances in technologies and the availability of the sunflower genome sequence made novel approaches on the whole genome level possible. Genotype-by-sequencing, and whole genome sequencing based on next generation sequencing technologies facilitated the production of large amounts of SNP markers for high density maps as well as SNP arrays and allowed genome-wide association studies and genomic selection in sunflower. Genome wide or candidate gene based association studies have been performed for traits like branching, flowering time, resistance to Sclerotinia head and stalk rot. First steps in genomic selection with regard to hybrid performance and hybrid oil content have shown that genomic selection can successfully address complex quantitative traits in sunflower and will help to speed up sunflower breeding programs in the future. To make sunflower more competitive toward other oil crops higher levels of resistance against pathogens and better yield performance are required. In addition, optimizing plant architecture toward a more complex growth type for higher plant densities has the potential to considerably increase yields per hectare. Integrative approaches combining omic technologies (genomics, transcriptomics, proteomics, metabolomics and phenomics) using bioinformatic tools will facilitate the identification of target genes and markers for complex traits and will give a better insight into the mechanisms behind the traits. PMID:29387071
Badke, Yvonne M; Bates, Ronald O; Ernst, Catherine W; Fix, Justin; Steibel, Juan P
2014-04-16
Genomic selection has the potential to increase genetic progress. Genotype imputation of high-density single-nucleotide polymorphism (SNP) genotypes can improve the cost efficiency of genomic breeding value (GEBV) prediction for pig breeding. Consequently, the objectives of this work were to: (1) estimate accuracy of genomic evaluation and GEBV for three traits in a Yorkshire population and (2) quantify the loss of accuracy of genomic evaluation and GEBV when genotypes were imputed under two scenarios: a high-cost, high-accuracy scenario in which only selection candidates were imputed from a low-density platform and a low-cost, low-accuracy scenario in which all animals were imputed using a small reference panel of haplotypes. Phenotypes and genotypes obtained with the PorcineSNP60 BeadChip were available for 983 Yorkshire boars. Genotypes of selection candidates were masked and imputed using tagSNP in the GeneSeek Genomic Profiler (10K). Imputation was performed with BEAGLE using 128 or 1800 haplotypes as reference panels. GEBV were obtained through an animal-centric ridge regression model using de-regressed breeding values as response variables. Accuracy of genomic evaluation was estimated as the correlation between estimated breeding values and GEBV in a 10-fold cross validation design. Accuracy of genomic evaluation using observed genotypes was high for all traits (0.65-0.68). Using genotypes imputed from a large reference panel (accuracy: R(2) = 0.95) for genomic evaluation did not significantly decrease accuracy, whereas a scenario with genotypes imputed from a small reference panel (R(2) = 0.88) did show a significant decrease in accuracy. Genomic evaluation based on imputed genotypes in selection candidates can be implemented at a fraction of the cost of a genomic evaluation using observed genotypes and still yield virtually the same accuracy. On the other side, using a very small reference panel of haplotypes to impute training animals and candidates for selection results in lower accuracy of genomic evaluation.
Stochastic model search with binary outcomes for genome-wide association studies
Malovini, Alberto; Puca, Annibale A; Bellazzi, Riccardo
2012-01-01
Objective The spread of case–control genome-wide association studies (GWASs) has stimulated the development of new variable selection methods and predictive models. We introduce a novel Bayesian model search algorithm, Binary Outcome Stochastic Search (BOSS), which addresses the model selection problem when the number of predictors far exceeds the number of binary responses. Materials and methods Our method is based on a latent variable model that links the observed outcomes to the underlying genetic variables. A Markov Chain Monte Carlo approach is used for model search and to evaluate the posterior probability of each predictor. Results BOSS is compared with three established methods (stepwise regression, logistic lasso, and elastic net) in a simulated benchmark. Two real case studies are also investigated: a GWAS on the genetic bases of longevity, and the type 2 diabetes study from the Wellcome Trust Case Control Consortium. Simulations show that BOSS achieves higher precisions than the reference methods while preserving good recall rates. In both experimental studies, BOSS successfully detects genetic polymorphisms previously reported to be associated with the analyzed phenotypes. Discussion BOSS outperforms the other methods in terms of F-measure on simulated data. In the two real studies, BOSS successfully detects biologically relevant features, some of which are missed by univariate analysis and the three reference techniques. Conclusion The proposed algorithm is an advance in the methodology for model selection with a large number of features. Our simulated and experimental results showed that BOSS proves effective in detecting relevant markers while providing a parsimonious model. PMID:22534080
Reproductive technologies combine well with genomic selection in dairy breeding programs.
Thomasen, J R; Willam, A; Egger-Danner, C; Sørensen, A C
2016-02-01
The objective of the present study was to examine whether genomic selection of females interacts with the use of reproductive technologies (RT) to increase annual monetary genetic gain (AMGG). This was tested using a factorial design with 3 factors: genomic selection of females (0 or 2,000 genotyped heifers per year), RT (0 or 50 donors selected at 14 mo of age for producing 10 offspring), and 2 reliabilities of genomic prediction. In addition, different strategies for use of RT and how strategies interact with the reliability of genomic prediction were investigated using stochastic simulation by varying (1) number of donors (25, 50, 100, 200), (2) number of calves born per donor (10 or 20), (3) age of donor (2 or 14 mo), and (4) number of sires (25, 50, 100, 200). In total, 72 different breeding schemes were investigated. The profitability of the different breeding strategies was evaluated by deterministic simulation by varying the costs of a born calf with reproductive technologies at levels of €500, €1,000, and €1,500. The results confirm our hypothesis that combining genomic selection of females with use of RT increases AMGG more than in a reference scheme without genomic selection in females. When the reliability of genomic prediction is high, the effect on rate of inbreeding (ΔF) is small. The study also demonstrates favorable interaction effects between the components of the breeder's equation (selection intensity, selection accuracy, generation interval) for the bull dam donor path, leading to higher AMGG. Increasing the donor program and number of born calves to achieve higher AMGG is associated with the undesirable effect of increased ΔF. This can be alleviated, however, by increasing the numbers of sires without compromising AMGG remarkably. For the major part of the investigated donor schemes, the investment in RT is profitable in dairy cattle populations, even at high levels of costs for RT. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Genomic selection for slaughter age in pigs using the Cox frailty model.
Santos, V S; Martins Filho, S; Resende, M D V; Azevedo, C F; Lopes, P S; Guimarães, S E F; Glória, L S; Silva, F F
2015-10-19
The aim of this study was to compare genomic selection methodologies using a linear mixed model and the Cox survival model. We used data from an F2 population of pigs, in which the response variable was the time in days from birth to the culling of the animal and the covariates were 238 markers [237 single nucleotide polymorphism (SNP) plus the halothane gene]. The data were corrected for fixed effects, and the accuracy of the method was determined based on the correlation of the ranks of predicted genomic breeding values (GBVs) in both models with the corrected phenotypic values. The analysis was repeated with a subset of SNP markers with largest absolute effects. The results were in agreement with the GBV prediction and the estimation of marker effects for both models for uncensored data and for normality. However, when considering censored data, the Cox model with a normal random effect (S1) was more appropriate. Since there was no agreement between the linear mixed model and the imputed data (L2) for the prediction of genomic values and the estimation of marker effects, the model S1 was considered superior as it took into account the latent variable and the censored data. Marker selection increased correlations between the ranks of predicted GBVs by the linear and Cox frailty models and the corrected phenotypic values, and 120 markers were required to increase the predictive ability for the characteristic analyzed.
Optimization of Swine Breeding Programs Using Genomic Selection with ZPLAN+
Lopez, B. M.; Kang, H. S.; Kim, T. H.; Viterbo, V. S.; Kim, H. S.; Na, C. S.; Seo, K. S.
2016-01-01
The objective of this study was to evaluate the present conventional selection program of a swine nucleus farm and compare it with a new selection strategy employing genomic enhanced breeding value (GEBV) as the selection criteria. The ZPLAN+ software was employed to calculate and compare the genetic gain, total cost, return and profit of each selection strategy. The first strategy reflected the current conventional breeding program, which was a progeny test system (CS). The second strategy was a selection scheme based strictly on genomic information (GS1). The third scenario was the same as GS1, but the selection by GEBV was further supplemented by the performance test (GS2). The last scenario was a mixture of genomic information and progeny tests (GS3). The results showed that the accuracy of the selection index of young boars of GS1 was 26% higher than that of CS. On the other hand, both GS2 and GS3 gave 31% higher accuracy than CS for young boars. The annual monetary genetic gain of GS1, GS2 and GS3 was 10%, 12%, and 11% higher, respectively, than that of CS. As expected, the discounted costs of genomic selection strategies were higher than those of CS. The costs of GS1, GS2 and GS3 were 35%, 73%, and 89% higher than those of CS, respectively, assuming a genotyping cost of $120. As a result, the discounted profit per animal of GS1 and GS2 was 8% and 2% higher, respectively, than that of CS while GS3 was 6% lower. Comparison among genomic breeding scenarios revealed that GS1 was more profitable than GS2 and GS3. The genomic selection schemes, especially GS1 and GS2, were clearly superior to the conventional scheme in terms of monetary genetic gain and profit. PMID:26954222
2017-01-01
The consequences of selection at linked sites are multiple and widespread across the genomes of most species. Here, I first review the main concepts behind models of selection and linkage in recombining genomes, present the difficulty in parametrizing these models simply as a reduction in effective population size (Ne) and discuss the predicted impact of recombination rates on levels of diversity across genomes. Arguments are then put forward in favour of using a model of selection and linkage with neutral and deleterious mutations (i.e. the background selection model, BGS) as a sensible null hypothesis for investigating the presence of other forms of selection, such as balancing or positive. I also describe and compare two studies that have generated high-resolution landscapes of the predicted consequences of selection at linked sites in Drosophila melanogaster. Both studies show that BGS can explain a very large fraction of the observed variation in diversity across the whole genome, thus supporting its use as null model. Finally, I identify and discuss a number of caveats and challenges in studies of genetic hitchhiking that have been often overlooked, with several of them sharing a potential bias towards overestimating the evidence supporting recent selective sweeps to the detriment of a BGS explanation. One potential source of bias is the analysis of non-equilibrium populations: it is precisely because models of selection and linkage predict variation in Ne across chromosomes that demographic dynamics are not expected to be equivalent chromosome- or genome-wide. Other challenges include the use of incomplete genome annotations, the assumption of temporally stable recombination landscapes, the presence of genes under balancing selection and the consequences of ignoring non-crossover (gene conversion) recombination events. This article is part of the themed issue ‘Evolutionary causes and consequences of recombination rate variation in sexual organisms’. PMID:29109230
Lu, Qing; Niu, Xiaojun; Zhang, Mengchen; Wang, Caihong; Xu, Qun; Feng, Yue; Yang, Yaolong; Wang, Shan; Yuan, Xiaoping; Yu, Hanyong; Wang, Yiping; Chen, Xiaoping; Liang, Xuanqiang; Wei, Xinghua
2018-01-01
Seed dormancy is an important agronomic trait affecting grain yield and quality because of pre-harvest germination and is influenced by both environmental and genetic factors. However, our knowledge of the factors controlling seed dormancy remains limited. To better reveal the molecular mechanism underlying this trait, a genome-wide association study was conducted in an indica-only population consisting of 453 accessions genotyped using 5,291 SNPs. Nine known and new significant SNPs were identified on eight chromosomes. These lead SNPs explained 34.9% of the phenotypic variation, and four of them were designed as dCAPS markers in the hope of accelerating molecular breeding. Moreover, a total of 212 candidate genes was predicted and eight candidate genes showed plant tissue-specific expression in expression profile data from different public bioinformatics databases. In particular, LOC_Os03g10110, which had a maize homolog involved in embryo development, was identified as a candidate regulator for further biological function investigations. Additionally, a polymorphism information content ratio method was used to screen improvement footprints and 27 selective sweeps were identified, most of which harbored domestication-related genes. Further studies suggested that three significant SNPs were adjacent to the candidate selection signals, supporting the accuracy of our genome-wide association study (GWAS) results. These findings show that genome-wide screening for selective sweeps can be used to identify new improvement-related DNA regions, although the phenotypes are unknown. This study enhances our knowledge of the genetic variation in seed dormancy, and the new dormancy-associated SNPs will provide real benefits in molecular breeding. PMID:29354150
Genome-wide signatures of convergent evolution in echolocating mammals
Parker, Joe; Tsagkogeorga, Georgia; Cotton, James A.; Liu, Yuan; Provero, Paolo; Stupka, Elia; Rossiter, Stephen J.
2013-01-01
Evolution is typically thought to proceed through divergence of genes, proteins, and ultimately phenotypes1-3. However, similar traits might also evolve convergently in unrelated taxa due to similar selection pressures4,5. Adaptive phenotypic convergence is widespread in nature, and recent results from a handful of genes have suggested that this phenomenon is powerful enough to also drive recurrent evolution at the sequence level6-9. Where homoplasious substitutions do occur these have long been considered the result of neutral processes. However, recent studies have demonstrated that adaptive convergent sequence evolution can be detected in vertebrates using statistical methods that model parallel evolution9,10 although the extent to which sequence convergence between genera occurs across genomes is unknown. Here we analyse genomic sequence data in mammals that have independently evolved echolocation and show for the first time that convergence is not a rare process restricted to a handful of loci but is instead widespread, continuously distributed and commonly driven by natural selection acting on a small number of sites per locus. Systematic analyses of convergent sequence evolution in 805,053 amino acids within 2,326 orthologous coding gene sequences compared across 22 mammals (including four new bat genomes) revealed signatures consistent with convergence in nearly 200 loci. Strong and significant support for convergence among bats and the dolphin was seen in numerous genes linked to hearing or deafness, consistent with an involvement in echolocation. Surprisingly we also found convergence in many genes linked to vision: the convergent signal of many sensory genes was robustly correlated with the strength of natural selection. This first attempt to detect genome-wide convergent sequence evolution across divergent taxa reveals the phenomenon to be much more pervasive than previously recognised. PMID:24005325
Seamless Genome Editing in Rice via Gene Targeting and Precise Marker Elimination.
Nishizawa-Yokoi, Ayako; Saika, Hiroaki; Toki, Seiichi
2016-01-01
Positive-negative selection using hygromycin phosphotransferase (hpt) and diphtheria toxin A-fragment (DT-A) as positive and negative selection markers, respectively, allows enrichment of cells harboring target genes modified via gene targeting (GT). We have developed a successful GT system employing positive-negative selection and subsequent precise marker excision via the piggyBac transposon derived from the cabbage looper moth to introduce desired modifications into target genes in the rice genome. This approach could be applied to the precision genome editing of almost all endogenous genes throughout the genome, at least in rice.
Genomic selection in sugar beet breeding populations.
Würschum, Tobias; Reif, Jochen C; Kraft, Thomas; Janssen, Geert; Zhao, Yusheng
2013-09-18
Genomic selection exploits dense genome-wide marker data to predict breeding values. In this study we used a large sugar beet population of 924 lines representing different germplasm types present in breeding populations: unselected segregating families and diverse lines from more advanced stages of selection. All lines have been intensively phenotyped in multi-location field trials for six agronomically important traits and genotyped with 677 SNP markers. We used ridge regression best linear unbiased prediction in combination with fivefold cross-validation and obtained high prediction accuracies for all except one trait. In addition, we investigated whether a calibration developed based on a training population composed of diverse lines is suited to predict the phenotypic performance within families. Our results show that the prediction accuracy is lower than that obtained within the diverse set of lines, but comparable to that obtained by cross-validation within the respective families. The results presented in this study suggest that a training population derived from intensively phenotyped and genotyped diverse lines from a breeding program does hold potential to build up robust calibration models for genomic selection. Taken together, our results indicate that genomic selection is a valuable tool and can thus complement the genomics toolbox in sugar beet breeding.
Combining genomic selection and gene identification for crop improvement
USDA-ARS?s Scientific Manuscript database
The use of genetic information to predict the value of individuals in plant breeding populations began about 40 years ago. The original paradigm was to identify genomic regions with outsize influence on a trait of economic value, then to use markers in that genomic region to select individuals carry...
Using genomics to enhance selection of novel traits in North American dairy cattle
USDA-ARS?s Scientific Manuscript database
Genomics offers new opportunities for the effective selection of novel traits. For traits such as mastitis resistance, hoof health, or the prediction of milk composition from mid-infrared (MIR) data, for example, enough records are usually available to carry out genomic evaluations using sire genoty...
Perspectives for genomic selection applications and research in plants
USDA-ARS?s Scientific Manuscript database
Genomic selection (GS) has created a lot of excitement and expectations in the animal and plant breeding research communities. In this review, we briefly describe how genomic prediction can be integrated into breeding efforts and point out achievements and areas where more research is needed. GS pro...
Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data
2016-01-01
Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable for imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted ‘glmnet’). We selected glmnet to serve as a benchmark imputation method for this reason. We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and calculating the sample Pearson correlation between observed and imputed genotype dosages at the site and individual level; computation time served as a second metric for comparison. We then set out to examine factors affecting imputation accuracy, such as levels of missing data, read depth, minor allele frequency (MAF), and reference panel composition. PMID:27537694
Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data.
Chan, Ariel W; Hamblin, Martha T; Jannink, Jean-Luc
2016-01-01
Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordable, datasets derived from HTS methods suffer from sequencing error, alignment errors, and missing data, all of which introduce noise and uncertainty to variant discovery and genotype calling. Under such circumstances, meaningful analysis of the data is difficult. Our primary interest lies in the issue of how one can accurately infer or impute missing genotypes in HTS-derived datasets. Many of the existing genotype imputation algorithms and software packages were primarily developed by and optimized for the human genetics community, a field where a complete and accurate reference genome has been constructed and SNP arrays have, in large part, been the common genotyping platform. We set out to answer two questions: 1) can we use existing imputation methods developed by the human genetics community to impute missing genotypes in datasets derived from non-human species and 2) are these methods, which were developed and optimized to impute ascertained variants, amenable for imputation of missing genotypes at HTS-derived variants? We selected Beagle v.4, a widely used algorithm within the human genetics community with reportedly high accuracy, to serve as our imputation contender. We performed a series of cross-validation experiments, using GBS data collected from the species Manihot esculenta by the Next Generation (NEXTGEN) Cassava Breeding Project. NEXTGEN currently imputes missing genotypes in their datasets using a LASSO-penalized, linear regression method (denoted 'glmnet'). We selected glmnet to serve as a benchmark imputation method for this reason. We obtained estimates of imputation accuracy by masking a subset of observed genotypes, imputing, and calculating the sample Pearson correlation between observed and imputed genotype dosages at the site and individual level; computation time served as a second metric for comparison. We then set out to examine factors affecting imputation accuracy, such as levels of missing data, read depth, minor allele frequency (MAF), and reference panel composition.
Willing, Eva-Maria; Bentzen, Paul; van Oosterhout, Cock; Hoffmann, Margarete; Cable, Joanne; Breden, Felix; Weigel, Detlef; Dreyer, Christine
2010-03-01
Adaptation of guppies (Poecilia reticulata) to contrasting upland and lowland habitats has been extensively studied with respect to behaviour, morphology and life history traits. Yet population history has not been studied at the whole-genome level. Although single nucleotide polymorphisms (SNPs) are the most abundant form of variation in many genomes and consequently very informative for a genome-wide picture of standing natural variation in populations, genome-wide SNP data are rarely available for wild vertebrates. Here we use genetically mapped SNP markers to comprehensively survey genetic variation within and among naturally occurring guppy populations from a wide geographic range in Trinidad and Venezuela. Results from three different clustering methods, Neighbor-net, principal component analysis (PCA) and Bayesian analysis show that the population substructure agrees with geographic separation and largely with previously hypothesized patterns of historical colonization. Within major drainages (Caroni, Oropouche and Northern), populations are genetically similar, but those in different geographic regions are highly divergent from one another, with some indications of ancient shared polymorphisms. Clear genomic signatures of a previous introduction experiment were seen, and we detected additional potential admixture events. Headwater populations were significantly less heterozygous than downstream populations. Pairwise F(ST) values revealed marked differences in allele frequencies among populations from different regions, and also among populations within the same region. F(ST) outlier methods indicated some regions of the genome as being under directional selection. Overall, this study demonstrates the power of a genome-wide SNP data set to inform for studies on natural variation, adaptation and evolution of wild populations.
A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction
Montesinos-López, Osval A.; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José Cricelio; Luna-Vázquez, Francisco Javier; Salinas-Ruiz, Josafhat; Herrera-Morales, José R.; Buenrostro-Mariscal, Raymundo
2017-01-01
There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments. PMID:28391241
A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction.
Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José Cricelio; Luna-Vázquez, Francisco Javier; Salinas-Ruiz, Josafhat; Herrera-Morales, José R; Buenrostro-Mariscal, Raymundo
2017-06-07
There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments. Copyright © 2017 Montesinos-López et al.
Pajuelo, Mónica J.; Eguiluz, María; Dahlstrom, Eric; Requena, David; Guzmán, Frank; Ramirez, Manuel; Sheen, Patricia; Frace, Michael; Sammons, Scott; Cama, Vitaliano; Anzick, Sarah; Bruno, Dan; Mahanty, Siddhartha; Wilkins, Patricia; Nash, Theodore; Gonzalez, Armando; García, Héctor H.; Gilman, Robert H.; Porcella, Steve; Zimic, Mirko
2015-01-01
Background Infections with Taenia solium are the most common cause of adult acquired seizures worldwide, and are the leading cause of epilepsy in developing countries. A better understanding of the genetic diversity of T. solium will improve parasite diagnostics and transmission pathways in endemic areas thereby facilitating the design of future control measures and interventions. Microsatellite markers are useful genome features, which enable strain typing and identification in complex pathogen genomes. Here we describe microsatellite identification and characterization in T. solium, providing information that will assist in global efforts to control this important pathogen. Methods For genome sequencing, T. solium cysts and proglottids were collected from Huancayo and Puno in Peru, respectively. Using next generation sequencing (NGS) and de novo assembly, we assembled two draft genomes and one hybrid genome. Microsatellite sequences were identified and 36 of them were selected for further analysis. Twenty T. solium isolates were collected from Tumbes in the northern region, and twenty from Puno in the southern region of Peru. The size-polymorphism of the selected microsatellites was determined with multi-capillary electrophoresis. We analyzed the association between microsatellite polymorphism and the geographic origin of the samples. Results The predicted size of the hybrid (proglottid genome combined with cyst genome) T. solium genome was 111 MB with a GC content of 42.54%. A total of 7,979 contigs (>1,000 nt) were obtained. We identified 9,129 microsatellites in the Puno-proglottid genome and 9,936 in the Huancayo-cyst genome, with 5 or more repeats, ranging from mono- to hexa-nucleotide. Seven microsatellites were polymorphic and 29 were monomorphic within the analyzed isolates. T. solium tapeworms were classified into two genetic groups that correlated with the North/South geographic origin of the parasites. Conclusions/Significance The availability of draft genomes for T. solium represents a significant step towards the understanding the biology of the parasite. We report here a set of T. solium polymorphic microsatellite markers that appear promising for genetic epidemiology studies. PMID:26697878
Lomonaco, Sara; Furumoto, Emily J; Loquasto, Joseph R; Morra, Patrizia; Grassi, Ausilia; Roberts, Robert F
2015-02-01
Identification at the genus, species, and strain levels is desirable when a probiotic microorganism is added to foods. Strains of Bifidobacterium animalis ssp. lactis (BAL) are commonly used worldwide in dairy products supplemented with probiotic strains. However, strain discrimination is difficult because of the high degree of genome identity (99.975%) between different genomes of this subspecies. Typing of monomorphic species can be carried out efficiently by targeting informative single nucleotide polymorphisms (SNP). Findings from a previous study analyzing both reference and commercial strains of BAL identified SNP that could be used to discriminate common strains into 8 groups. This paper describes development of a minisequencing assay based on the primer extension reaction (PER) targeting multiple SNP that can allow strain differentiation of BAL. Based on previous data, 6 informative SNP were selected for further testing, and a multiplex preliminary PCR was optimized to amplify the DNA regions containing the selected SNP. Extension primers (EP) annealing immediately adjacent to the selected SNP were developed and tested in simplex and multiplex PER to evaluate their performance. Twenty-five strains belonging to 9 distinct genomic clusters of B. animalis ssp. lactis were selected and analyzed using the developed minisequencing assay, simultaneously targeting the 6 selected SNP. Fragment analysis was subsequently carried out in duplicate and demonstrated that the assay yielded 8 specific profiles separating the most commonly used commercial strains. This novel multiplex PER approach provides a simple, rapid, flexible SNP-based subtyping method for proper characterization and identification of commercial probiotic strains of BAL from fermented dairy products. To assess the usefulness of this method, DNA was extracted from yogurt manufactured with and without the addition of B. animalis ssp. lactis BB-12. Extracted DNA was then subjected to the minisequencing protocol, resulting in a SNP profile matching the profile for the strain BB-12. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Bohra, Abhishek; Singh, Narendra P
2015-08-01
Unprecedented developments in legume genomics over the last decade have resulted in the acquisition of a wide range of modern genomic resources to underpin genetic improvement of grain legumes. The genome enabled insights direct investigators in various ways that primarily include unearthing novel structural variations, retrieving the lost genetic diversity, introducing novel/exotic alleles from wider gene pools, finely resolving the complex quantitative traits and so forth. To this end, ready availability of cost-efficient and high-density genotyping assays allows genome wide prediction to be increasingly recognized as the key selection criterion in crop breeding. Further, the high-dimensional measurements of agronomically significant phenotypes obtained by using new-generation screening techniques will empower reference based resequencing as well as allele mining and trait mapping methods to comprehensively associate genome diversity with the phenome scale variation. Besides stimulating the forward genetic systems, accessibility to precisely delineated genomic segments reveals novel candidates for reverse genetic techniques like targeted genome editing. The shifting paradigm in plant genomics in turn necessitates optimization of crop breeding strategies to enable the most efficient integration of advanced omics knowledge and tools. We anticipate that the crop improvement schemes will be bolstered remarkably with rational deployment of these genome-guided approaches, ultimately resulting in expanded plant breeding capacities and improved crop performance.
Maintenance of genetic diversity through plant-herbivore interactions
Gloss, Andrew D.; Dittrich, Anna C. Nelson; Goldman-Huertas, Benjamin; Whiteman, Noah K.
2013-01-01
Identifying the factors governing the maintenance of genetic variation is a central challenge in evolutionary biology. New genomic data, methods and conceptual advances provide increasing evidence that balancing selection, mediated by antagonistic species interactions, maintains functionally-important genetic variation within species and natural populations. Because diverse interactions between plants and herbivorous insects dominate terrestrial communities, they provide excellent systems to address this hypothesis. Population genomic studies of Arabidopsis thaliana and its relatives suggest spatial variation in herbivory maintains adaptive genetic variation controlling defense phenotypes, both within and among populations. Conversely, inter-species variation in plant defenses promotes adaptive genetic variation in herbivores. Emerging genomic model herbivores of Arabidopsis could illuminate how genetic variation in herbivores and plants interact simultaneously. PMID:23834766
Zueva, Ksenia J.; Lumme, Jaakko; Veselov, Alexey E.; Kent, Matthew P.; Lien, Sigbjørn; Primmer, Craig R.
2014-01-01
Mechanisms of host-parasite co-adaptation have long been of interest in evolutionary biology; however, determining the genetic basis of parasite resistance has been challenging. Current advances in genome technologies provide new opportunities for obtaining a genome-scale view of the action of parasite-driven natural selection in wild populations and thus facilitate the search for specific genomic regions underlying inter-population differences in pathogen response. European populations of Atlantic salmon (Salmo salar L.) exhibit natural variance in susceptibility levels to the ectoparasite Gyrodactylus salaris Malmberg 1957, ranging from resistance to extreme susceptibility, and are therefore a good model for studying the evolution of virulence and resistance. However, distinguishing the molecular signatures of genetic drift and environment-associated selection in small populations such as land-locked Atlantic salmon populations presents a challenge, specifically in the search for pathogen-driven selection. We used a novel genome-scan analysis approach that enabled us to i) identify signals of selection in salmon populations affected by varying levels of genetic drift and ii) separate potentially selected loci into the categories of pathogen (G. salaris)-driven selection and selection acting upon other environmental characteristics. A total of 4631 single nucleotide polymorphisms (SNPs) were screened in Atlantic salmon from 12 different northern European populations. We identified three genomic regions potentially affected by parasite-driven selection, as well as three regions presumably affected by salinity-driven directional selection. Functional annotation of candidate SNPs is consistent with the role of the detected genomic regions in immune defence and, implicitly, in osmoregulation. These results provide new insights into the genetic basis of pathogen susceptibility in Atlantic salmon and will enable future searches for the specific genes involved. PMID:24670947
Zueva, Ksenia J; Lumme, Jaakko; Veselov, Alexey E; Kent, Matthew P; Lien, Sigbjørn; Primmer, Craig R
2014-01-01
Mechanisms of host-parasite co-adaptation have long been of interest in evolutionary biology; however, determining the genetic basis of parasite resistance has been challenging. Current advances in genome technologies provide new opportunities for obtaining a genome-scale view of the action of parasite-driven natural selection in wild populations and thus facilitate the search for specific genomic regions underlying inter-population differences in pathogen response. European populations of Atlantic salmon (Salmo salar L.) exhibit natural variance in susceptibility levels to the ectoparasite Gyrodactylus salaris Malmberg 1957, ranging from resistance to extreme susceptibility, and are therefore a good model for studying the evolution of virulence and resistance. However, distinguishing the molecular signatures of genetic drift and environment-associated selection in small populations such as land-locked Atlantic salmon populations presents a challenge, specifically in the search for pathogen-driven selection. We used a novel genome-scan analysis approach that enabled us to i) identify signals of selection in salmon populations affected by varying levels of genetic drift and ii) separate potentially selected loci into the categories of pathogen (G. salaris)-driven selection and selection acting upon other environmental characteristics. A total of 4631 single nucleotide polymorphisms (SNPs) were screened in Atlantic salmon from 12 different northern European populations. We identified three genomic regions potentially affected by parasite-driven selection, as well as three regions presumably affected by salinity-driven directional selection. Functional annotation of candidate SNPs is consistent with the role of the detected genomic regions in immune defence and, implicitly, in osmoregulation. These results provide new insights into the genetic basis of pathogen susceptibility in Atlantic salmon and will enable future searches for the specific genes involved.
Nasrullah, Izza; Butt, Azeem M; Tahir, Shifa; Idrees, Muhammad; Tong, Yigang
2015-08-26
The Marburg virus (MARV) has a negative-sense single-stranded RNA genome, belongs to the family Filoviridae, and is responsible for several outbreaks of highly fatal hemorrhagic fever. Codon usage patterns of viruses reflect a series of evolutionary changes that enable viruses to shape their survival rates and fitness toward the external environment and, most importantly, their hosts. To understand the evolution of MARV at the codon level, we report a comprehensive analysis of synonymous codon usage patterns in MARV genomes. Multiple codon analysis approaches and statistical methods were performed to determine overall codon usage patterns, biases in codon usage, and influence of various factors, including mutation pressure, natural selection, and its two hosts, Homo sapiens and Rousettus aegyptiacus. Nucleotide composition and relative synonymous codon usage (RSCU) analysis revealed that MARV shows mutation bias and prefers U- and A-ended codons to code amino acids. Effective number of codons analysis indicated that overall codon usage among MARV genomes is slightly biased. The Parity Rule 2 plot analysis showed that GC and AU nucleotides were not used proportionally which accounts for the presence of natural selection. Codon usage patterns of MARV were also found to be influenced by its hosts. This indicates that MARV have evolved codon usage patterns that are specific to both of its hosts. Moreover, selection pressure from R. aegyptiacus on the MARV RSCU patterns was found to be dominant compared with that from H. sapiens. Overall, mutation pressure was found to be the most important and dominant force that shapes codon usage patterns in MARV. To our knowledge, this is the first detailed codon usage analysis of MARV and extends our understanding of the mechanisms that contribute to codon usage and evolution of MARV.
SiNoPsis: Single Nucleotide Polymorphisms selection and promoter profiling.
Boloc, Daniel; Rodríguez, Natalia; Gassó, Patricia; Abril, Josep F; Bernardo, Miquel; Lafuente, Amalia; Mas, Sergi
2017-09-14
The selection of a Single Nucleotide Polymorphism (SNP) using bibliographic methods can be a very time-consuming task. Moreover, a SNP selected in this way may not be easily visualized in its genomic context by a standard user hoping to correlate it with other valuable information. Here we propose a web form built on top of Circos that can assist SNP-centred screening, based on their location in the genome and the regulatory modules they can disrupt. Its use may allow researchers to prioritize SNPs in genotyping and disease studies. SiNoPsis is bundled as a web portal. It focuses on the different structures involved in the genomic expression of a gene, especially those found in the core promoter upstream region. These structures include transcription factor binding sites (for promoter and enhancer signals), histones, and promoter flanking regions. Additionally, the tool provides eQTL and linkage disequilibrium (LD) properties for a given SNP query, yielding further clues about other indirectly associated SNPs. Possible disruptions of the aforementioned structures affecting gene transcription are reported using multiple resource databases. SiNoPsis has a simple user-friendly interface, which allows single queries by gene symbol, genomic coordinates, Ensembl gene identifiers, RefSeq transcript identifiers and SNPs. It is the only portal providing useful SNP selection based on regulatory modules and LD with functional variants in both textual and graphic modes (by properly defining the arguments and parameters needed to run Circos). SiNoPsis is freely available at https://compgen.bio.ub.edu/SiNoPsis /. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Bolormaa, S; Pryce, J E; Kemper, K; Savin, K; Hayes, B J; Barendse, W; Zhang, Y; Reich, C M; Mason, B A; Bunch, R J; Harrison, B E; Reverter, A; Herd, R M; Tier, B; Graser, H-U; Goddard, M E
2013-07-01
The aim of this study was to assess the accuracy of genomic predictions for 19 traits including feed efficiency, growth, and carcass and meat quality traits in beef cattle. The 10,181 cattle in our study had real or imputed genotypes for 729,068 SNP although not all cattle were measured for all traits. Animals included Bos taurus, Brahman, composite, and crossbred animals. Genomic EBV (GEBV) were calculated using 2 methods of genomic prediction [BayesR and genomic BLUP (GBLUP)] either using a common training dataset for all breeds or using a training dataset comprising only animals of the same breed. Accuracies of GEBV were assessed using 5-fold cross-validation. The accuracy of genomic prediction varied by trait and by method. Traits with a large number of recorded and genotyped animals and with high heritability gave the greatest accuracy of GEBV. Using GBLUP, the average accuracy was 0.27 across traits and breeds, but the accuracies between breeds and between traits varied widely. When the training population was restricted to animals from the same breed as the validation population, GBLUP accuracies declined by an average of 0.04. The greatest decline in accuracy was found for the 4 composite breeds. The BayesR accuracies were greater by an average of 0.03 than GBLUP accuracies, particularly for traits with known genes of moderate to large effect mutations segregating. The accuracies of 0.43 to 0.48 for IGF-I traits were among the greatest in the study. Although accuracies are low compared with those observed in dairy cattle, genomic selection would still be beneficial for traits that are hard to improve by conventional selection, such as tenderness and residual feed intake. BayesR identified many of the same quantitative trait loci as a genomewide association study but appeared to map them more precisely. All traits appear to be highly polygenic with thousands of SNP independently associated with each trait.
Freytag, Saskia; Manitz, Juliane; Schlather, Martin; Kneib, Thomas; Amos, Christopher I.; Risch, Angela; Chang-Claude, Jenny; Heinrich, Joachim; Bickeböller, Heike
2014-01-01
Biological pathways provide rich information and biological context on the genetic causes of complex diseases. The logistic kernel machine test integrates prior knowledge on pathways in order to analyze data from genome-wide association studies (GWAS). Here, the kernel converts genomic information of two individuals to a quantitative value reflecting their genetic similarity. With the selection of the kernel one implicitly chooses a genetic effect model. Like many other pathway methods, none of the available kernels accounts for topological structure of the pathway or gene-gene interaction types. However, evidence indicates that connectivity and neighborhood of genes are crucial in the context of GWAS, because genes associated with a disease often interact. Thus, we propose a novel kernel that incorporates the topology of pathways and information on interactions. Using simulation studies, we demonstrate that the proposed method maintains the type I error correctly and can be more effective in the identification of pathways associated with a disease than non-network-based methods. We apply our approach to genome-wide association case control data on lung cancer and rheumatoid arthritis. We identify some promising new pathways associated with these diseases, which may improve our current understanding of the genetic mechanisms. PMID:24434848
Genome-wide gene order distances support clustering the gram-positive bacteria
House, Christopher H.; Pellegrini, Matteo; Fitz-Gibbon, Sorel T.
2015-01-01
Initially using 143 genomes, we developed a method for calculating the pair-wise distance between prokaryotic genomes using a Monte Carlo method to estimate the conservation of gene order. The method was based on repeatedly selecting five or six non-adjacent random orthologs from each of two genomes and determining if the chosen orthologs were in the same order. The raw distances were then corrected for gene order convergence using an adaptation of the Jukes-Cantor model, as well as using the common distance correction D′ = −ln(1-D). First, we compared the distances found via the order of six orthologs to distances found based on ortholog gene content and small subunit rRNA sequences. The Jukes-Cantor gene order distances are reasonably well correlated with the divergence of rRNA (R2 = 0.24), especially at rRNA Jukes-Cantor distances of less than 0.2 (R2 = 0.52). Gene content is only weakly correlated with rRNA divergence (R2 = 0.04) over all distances, however, it is especially strongly correlated at rRNA Jukes-Cantor distances of less than 0.1 (R2 = 0.67). This initial work suggests that gene order may be useful in conjunction with other methods to help understand the relatedness of genomes. Using the gene order distances in 143 genomes, the relations of prokaryotes were studied using neighbor joining and agreement subtrees. We then repeated our study of the relations of prokaryotes using gene order in 172 complete genomes better representing a wider-diversity of prokaryotes. Consistently, our trees show the Actinobacteria as a sister group to the bulk of the Firmicutes. In fact, the robustness of gene order support was found to be considerably greater for uniting these two phyla than for uniting any of the proteobacterial classes together. The results are supportive of the idea that Actinobacteria and Firmicutes are closely related, which in turn implies a single origin for the gram-positive cell. PMID:25653643
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ostrander, E.A.; Sprague, G.F. Jr.; Rine, J.
1993-04-01
A large block of simple sequence repeat (SSR) polymorphisms for the dog genome has been isolated and characterized. Screening of primary libraries by conventional hybridization methods as well as by screening of enriched marker-selected libraries led to the isolation of a large number of genomic clones that contained (CA)[sub n] repeats. The sequences of 101 clones showed that the size and complexity of (CA)[sub n] repeats in the dog genome were similar to those reported for these markers in the human genome. Detailed analysis of a representative subset of these markers revealed that most markers were moderately to highly polymorphic,more » with PIC values exceeding 0.70 for 33% of the markers tested. An association between higher PIC values and markers containing longer (CA)[sub n] repeats was observed in these studies, as previously noted for similar markers in the human genome. A list of primer sequences that tag each characterized marker is provided, and a comprehensive system of nomenclature for the dog genome is suggested. 28 refs., 4 figs., 2 tabs.« less
Wang, Jing; Street, Nathaniel R.; Scofield, Douglas G.; Ingvarsson, Pär K.
2016-01-01
A central aim of evolutionary genomics is to identify the relative roles that various evolutionary forces have played in generating and shaping genetic variation within and among species. Here we use whole-genome resequencing data to characterize and compare genome-wide patterns of nucleotide polymorphism, site frequency spectrum, and population-scaled recombination rates in three species of Populus: Populus tremula, P. tremuloides, and P. trichocarpa. We find that P. tremuloides has the highest level of genome-wide variation, skewed allele frequencies, and population-scaled recombination rates, whereas P. trichocarpa harbors the lowest. Our findings highlight multiple lines of evidence suggesting that natural selection, due to both purifying and positive selection, has widely shaped patterns of nucleotide polymorphism at linked neutral sites in all three species. Differences in effective population sizes and rates of recombination largely explain the disparate magnitudes and signatures of linked selection that we observe among species. The present work provides the first phylogenetic comparative study on a genome-wide scale in forest trees. This information will also improve our ability to understand how various evolutionary forces have interacted to influence genome evolution among related species. PMID:26721855
Wang, Jing; Street, Nathaniel R; Scofield, Douglas G; Ingvarsson, Pär K
2016-03-01
A central aim of evolutionary genomics is to identify the relative roles that various evolutionary forces have played in generating and shaping genetic variation within and among species. Here we use whole-genome resequencing data to characterize and compare genome-wide patterns of nucleotide polymorphism, site frequency spectrum, and population-scaled recombination rates in three species of Populus: Populus tremula, P. tremuloides, and P. trichocarpa. We find that P. tremuloides has the highest level of genome-wide variation, skewed allele frequencies, and population-scaled recombination rates, whereas P. trichocarpa harbors the lowest. Our findings highlight multiple lines of evidence suggesting that natural selection, due to both purifying and positive selection, has widely shaped patterns of nucleotide polymorphism at linked neutral sites in all three species. Differences in effective population sizes and rates of recombination largely explain the disparate magnitudes and signatures of linked selection that we observe among species. The present work provides the first phylogenetic comparative study on a genome-wide scale in forest trees. This information will also improve our ability to understand how various evolutionary forces have interacted to influence genome evolution among related species. Copyright © 2016 by the Genetics Society of America.
Zhu, Bo; Niu, Hong; Zhang, Wengang; Wang, Zezhao; Liang, Yonghu; Guan, Long; Guo, Peng; Chen, Yan; Zhang, Lupei; Guo, Yong; Ni, Heming; Gao, Xue; Gao, Huijiang; Xu, Lingyang; Li, Junya
2017-06-14
Fatty acid composition of muscle is an important trait contributing to meat quality. Recently, genome-wide association study (GWAS) has been extensively used to explore the molecular mechanism underlying important traits in cattle. In this study, we performed GWAS using high density SNP array to analyze the association between SNPs and fatty acids and evaluated the accuracy of genomic prediction for fatty acids in Chinese Simmental cattle. Using the BayesB method, we identified 35 and 7 regions in Chinese Simmental cattle that displayed significant associations with individual fatty acids and fatty acid groups, respectively. We further obtained several candidate genes which may be involved in fatty acid biosynthesis including elongation of very long chain fatty acids protein 5 (ELOVL5), fatty acid synthase (FASN), caspase 2 (CASP2) and thyroglobulin (TG). Specifically, we obtained strong evidence of association signals for one SNP located at 51.3 Mb for FASN using Genome-wide Rapid Association Mixed Model and Regression-Genomic Control (GRAMMAR-GC) approaches. Also, region-based association test identified multiple SNPs within FASN and ELOVL5 for C14:0. In addition, our result revealed that the effectiveness of genomic prediction for fatty acid composition using BayesB was slightly superior over GBLUP in Chinese Simmental cattle. We identified several significantly associated regions and loci which can be considered as potential candidate markers for genomics-assisted breeding programs. Using multiple methods, our results revealed that FASN and ELOVL5 are associated with fatty acids with strong evidence. Our finding also suggested that it is feasible to perform genomic selection for fatty acids in Chinese Simmental cattle.
Analysis of Genome-Wide Association Studies with Multiple Outcomes Using Penalization
Liu, Jin; Huang, Jian; Ma, Shuangge
2012-01-01
Genome-wide association studies have been extensively conducted, searching for markers for biologically meaningful outcomes and phenotypes. Penalization methods have been adopted in the analysis of the joint effects of a large number of SNPs (single nucleotide polymorphisms) and marker identification. This study is partly motivated by the analysis of heterogeneous stock mice dataset, in which multiple correlated phenotypes and a large number of SNPs are available. Existing penalization methods designed to analyze a single response variable cannot accommodate the correlation among multiple response variables. With multiple response variables sharing the same set of markers, joint modeling is first employed to accommodate the correlation. The group Lasso approach is adopted to select markers associated with all the outcome variables. An efficient computational algorithm is developed. Simulation study and analysis of the heterogeneous stock mice dataset show that the proposed method can outperform existing penalization methods. PMID:23272092
2014-01-01
Background Breeders in the allo-octoploid strawberry currently make little use of molecular marker tools. As a first step of a QTL discovery project on fruit quality traits and resistance to soil-borne pathogens such as Phytophthora cactorum and Verticillium we built a genome-wide SSR linkage map for the cross Holiday x Korona. We used the previously published MADCE method to obtain full haplotype information for both of the parental cultivars, facilitating in-depth studies on their genomic organisation. Results The linkage map incorporates 508 segregating loci and represents each of the 28 chromosome pairs of octoploid strawberry, spanning an estimated length of 2050 cM. The sub-genomes are denoted according to their sequence divergence from F. vesca as revealed by marker performance. The map revealed high overall synteny between the sub-genomes, but also revealed two large inversions on LG2C and LG2D, of which the latter was confirmed using a separate mapping population. We discovered interesting breeding features within the parental cultivars by in-depth analysis of our haplotype data. The linkage map-derived homozygosity level of Holiday was similar to the pedigree-derived inbreeding level (33% and 29%, respectively). For Korona we found that the observed homozygosity level was over three times higher than expected from the pedigree (13% versus 3.6%). This could indicate selection pressure on genes that have favourable effects in homozygous states. The level of kinship between Holiday and Korona derived from our linkage map was 2.5 times higher than the pedigree-derived value. This large difference could be evidence of selection pressure enacted by strawberry breeders towards specific haplotypes. Conclusion The obtained SSR linkage map provides a good base for QTL discovery. It also provides the first biologically relevant basis for the discernment and notation of sub-genomes. For the first time, we revealed genomic rearrangements that were verified in a separate mapping population. We believe that haplotype information will become increasingly important in identifying marker-trait relationships and regions that are under selection pressure within breeding material. Our attempt at providing a biological basis for the discernment of sub-genomes warrants follow-up studies to streamline the naming of the sub-genomes among different octoploid strawberry maps. PMID:24581289
van Dijk, Thijs; Pagliarani, Giulia; Pikunova, Anna; Noordijk, Yolanda; Yilmaz-Temel, Hulya; Meulenbroek, Bert; Visser, Richard G F; van de Weg, Eric
2014-03-01
Breeders in the allo-octoploid strawberry currently make little use of molecular marker tools. As a first step of a QTL discovery project on fruit quality traits and resistance to soil-borne pathogens such as Phytophthora cactorum and Verticillium we built a genome-wide SSR linkage map for the cross Holiday x Korona. We used the previously published MADCE method to obtain full haplotype information for both of the parental cultivars, facilitating in-depth studies on their genomic organisation. The linkage map incorporates 508 segregating loci and represents each of the 28 chromosome pairs of octoploid strawberry, spanning an estimated length of 2050 cM. The sub-genomes are denoted according to their sequence divergence from F. vesca as revealed by marker performance. The map revealed high overall synteny between the sub-genomes, but also revealed two large inversions on LG2C and LG2D, of which the latter was confirmed using a separate mapping population. We discovered interesting breeding features within the parental cultivars by in-depth analysis of our haplotype data. The linkage map-derived homozygosity level of Holiday was similar to the pedigree-derived inbreeding level (33% and 29%, respectively). For Korona we found that the observed homozygosity level was over three times higher than expected from the pedigree (13% versus 3.6%). This could indicate selection pressure on genes that have favourable effects in homozygous states. The level of kinship between Holiday and Korona derived from our linkage map was 2.5 times higher than the pedigree-derived value. This large difference could be evidence of selection pressure enacted by strawberry breeders towards specific haplotypes. The obtained SSR linkage map provides a good base for QTL discovery. It also provides the first biologically relevant basis for the discernment and notation of sub-genomes. For the first time, we revealed genomic rearrangements that were verified in a separate mapping population. We believe that haplotype information will become increasingly important in identifying marker-trait relationships and regions that are under selection pressure within breeding material. Our attempt at providing a biological basis for the discernment of sub-genomes warrants follow-up studies to streamline the naming of the sub-genomes among different octoploid strawberry maps.
Taranto, F; D'Agostino, N; Greco, B; Cardi, T; Tripodi, P
2016-11-21
Knowledge on population structure and genetic diversity in vegetable crops is essential for association mapping studies and genomic selection. Genotyping by sequencing (GBS) represents an innovative method for large scale SNP detection and genotyping of genetic resources. Herein we used the GBS approach for the genome-wide identification of SNPs in a collection of Capsicum spp. accessions and for the assessment of the level of genetic diversity in a subset of 222 cultivated pepper (Capsicum annum) genotypes. GBS analysis generated a total of 7,568,894 master tags, of which 43.4% uniquely aligned to the reference genome CM334. A total of 108,591 SNP markers were identified, of which 105,184 were in C. annuum accessions. In order to explore the genetic diversity of C. annuum and to select a minimal core set representing most of the total genetic variation with minimum redundancy, a subset of 222 C. annuum accessions were analysed using 32,950 high quality SNPs. Based on Bayesian and Hierarchical clustering it was possible to divide the collection into three clusters. Cluster I had the majority of varieties and landraces mainly from Southern and Northern Italy, and from Eastern Europe, whereas clusters II and III comprised accessions of different geographical origins. Considering the genome-wide genetic variation among the accessions included in cluster I, a second round of Bayesian (K = 3) and Hierarchical (K = 2) clustering was performed. These analysis showed that genotypes were grouped not only based on geographical origin, but also on fruit-related features. GBS data has proven useful to assess the genetic diversity in a collection of C. annuum accessions. The high number of SNP markers, uniformly distributed on the 12 chromosomes, allowed the accessions to be distinguished according to geographical origin and fruit-related features. SNP markers and information on population structure developed in this study will undoubtedly support genome-wide association mapping studies and marker-assisted selection programs.
Brøndum, R F; Su, G; Janss, L; Sahana, G; Guldbrandtsen, B; Boichard, D; Lund, M S
2015-06-01
This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected with the aim of augmenting the custom low-density Illumina BovineLD SNP chip (San Diego, CA) used in the Nordic countries. The single-marker analysis was done breed-wise on all 16 index traits included in the breeding goals for Nordic Holstein, Danish Jersey, and Nordic Red cattle plus the total merit index itself. Depending on the trait's economic weight, 15, 10, or 5 quantitative trait loci (QTL) were selected per trait per breed and 3 to 5 markers were selected to tag each QTL. After removing duplicate markers (same marker selected for more than one trait or breed) and filtering for high pairwise linkage disequilibrium and assaying performance on the array, a total of 1,623 QTL markers were selected for inclusion on the custom chip. Genomic prediction analyses were performed for Nordic and French Holstein and Nordic Red animals using either a genomic BLUP or a Bayesian variable selection model. When using the genomic BLUP model including the QTL markers in the analysis, reliability was increased by up to 4 percentage points for production traits in Nordic Holstein animals, up to 3 percentage points for Nordic Reds, and up to 5 percentage points for French Holstein. Smaller gains of up to 1 percentage point was observed for mastitis, but only a 0.5 percentage point increase was seen for fertility. When using a Bayesian model accuracies were generally higher with only 54k data compared with the genomic BLUP approach, but increases in reliability were relatively smaller when QTL markers were included. Results from this study indicate that the reliability of genomic prediction can be increased by including markers significant in genome-wide association studies on whole genome sequence data alongside the 54k SNP set. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Bottari, Benedetta; Felis, Giovanna E; Salvetti, Elisa; Castioni, Anna; Campedelli, Ilenia; Torriani, Sandra; Bernini, Valentina; Gatti, Monica
2017-07-01
Lactobacillus casei,Lactobacillus paracasei and Lactobacillusrhamnosus form a closely related taxonomic group (the L. casei group) within the facultatively heterofermentative lactobacilli. Strains of these species have been used for a long time as probiotics in a wide range of products, and they represent the dominant species of nonstarter lactic acid bacteria in ripened cheeses, where they contribute to flavour development. The close genetic relationship among those species, as well as the similarity of biochemical properties of the strains, hinders the development of an adequate selective method to identify these bacteria. Despite this being a hot topic, as demonstrated by the large amount of literature about it, the results of different proposed identification methods are often ambiguous and unsatisfactory. The aim of this study was to develop a more robust species-specific identification assay for differentiating the species of the L. casei group. A taxonomy-driven comparative genomic analysis was carried out to select the potential target genes whose similarity could better reflect genome-wide diversity. The gene mutL appeared to be the most promising one and, therefore, a novel species-specific multiplex PCR assay was developed to rapidly and effectively distinguish L. casei, L. paracasei and L. rhamnosus strains. The analysis of a collection of 76 wild dairy isolates, previously identified as members of the L. casei group combining the results of multiple approaches, revealed that the novel designed primers, especially in combination with already existing ones, were able to improve the discrimination power at the species level and reveal previously undiscovered intraspecific biodiversity.
Kügler, Jonas; Nieswandt, Simone; Gerlach, Gerald F; Meens, Jochen; Schirrmann, Thomas; Hust, Michael
2008-09-01
The identification of immunogenic polypeptides of pathogens is helpful for the development of diagnostic assays and therapeutic applications like vaccines. Routinely, these proteins are identified by two-dimensional polyacrylamide gel electrophoresis and Western blot using convalescent serum, followed by mass spectrometry. This technology, however, is limited, because low or differentially expressed proteins, e.g. dependent on pathogen-host interaction, cannot be identified. In this work, we developed and improved a M13 genomic phage display-based method for the selection of immunogenic polypeptides of Mycoplasma hyopneumoniae, a pathogen causing porcine enzootic pneumonia. The fragmented genome of M. hyopneumoniae was cloned into a phage display vector, and the genomic library was packaged using the helperphage Hyperphage to enrich open reading frames (ORFs). Afterwards, the phage display library was screened by panning using convalescent serum. The analysis of individual phage clones resulted in the identification of five genes encoding immunogenic proteins, only two of which had been previously identified and described as immunogenic. This M13 genomic phage display, directly combining ORF enrichment and the presentation of the corresponding polypeptide on the phage surface, complements proteome-based methods for the identification of immunogenic polypeptides and is particularly well suited for the use in mycoplasma species.
Exploiting Wild Relatives for Genomics-assisted Breeding of Perennial Crops
Migicovsky, Zoë; Myles, Sean
2017-01-01
Perennial crops are vital contributors to global food production and nutrition. However, the breeding of new perennial crops is an expensive and time-consuming process due to the large size and lengthy juvenile phase of many species. Genomics provides a valuable tool for improving the efficiency of breeding by allowing progeny possessing a trait of interest to be selected at the seed or seedling stage through marker-assisted selection (MAS). The benefits of MAS to a breeder are greatest when the targeted species takes a long time to reach maturity and is expensive to grow and maintain. Thus, MAS holds particular promise in perennials since they are often costly and time-consuming to grow to maturity and evaluate. Well-characterized germplasm that breeders can tap into for improving perennials is often limited in genetic diversity. Wild relatives are a largely untapped source of desirable traits including disease resistance, fruit quality, and rootstock characteristics. This review focuses on the use of genomics-assisted breeding in perennials, especially as it relates to the introgression of useful traits from wild relatives. The identification of genetic markers predictive of beneficial phenotypes derived from wild relatives is hampered by genomic tools designed for domesticated species that are often ill-suited for use in wild relatives. There is therefore an urgent need for better genomic resources from wild relatives. A further barrier to exploiting wild diversity through genomics is the phenotyping bottleneck: well-powered genetic mapping requires accurate and cost-effective characterization of large collections of diverse wild germplasm. While genomics will always be used in combination with traditional breeding methods, it is a powerful tool for accelerating the speed and reducing the costs of breeding while harvesting the potential of wild relatives for improving perennial crops. PMID:28421095
Exploiting Wild Relatives for Genomics-assisted Breeding of Perennial Crops.
Migicovsky, Zoë; Myles, Sean
2017-01-01
Perennial crops are vital contributors to global food production and nutrition. However, the breeding of new perennial crops is an expensive and time-consuming process due to the large size and lengthy juvenile phase of many species. Genomics provides a valuable tool for improving the efficiency of breeding by allowing progeny possessing a trait of interest to be selected at the seed or seedling stage through marker-assisted selection (MAS). The benefits of MAS to a breeder are greatest when the targeted species takes a long time to reach maturity and is expensive to grow and maintain. Thus, MAS holds particular promise in perennials since they are often costly and time-consuming to grow to maturity and evaluate. Well-characterized germplasm that breeders can tap into for improving perennials is often limited in genetic diversity. Wild relatives are a largely untapped source of desirable traits including disease resistance, fruit quality, and rootstock characteristics. This review focuses on the use of genomics-assisted breeding in perennials, especially as it relates to the introgression of useful traits from wild relatives. The identification of genetic markers predictive of beneficial phenotypes derived from wild relatives is hampered by genomic tools designed for domesticated species that are often ill-suited for use in wild relatives. There is therefore an urgent need for better genomic resources from wild relatives. A further barrier to exploiting wild diversity through genomics is the phenotyping bottleneck: well-powered genetic mapping requires accurate and cost-effective characterization of large collections of diverse wild germplasm. While genomics will always be used in combination with traditional breeding methods, it is a powerful tool for accelerating the speed and reducing the costs of breeding while harvesting the potential of wild relatives for improving perennial crops.
Epigenomics of Development in Populus
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strauss, Steve; Freitag, Michael; Mockler, Todd
2013-01-10
We conducted research to determine the role of epigenetic modifications during tree development using poplar (Populus trichocarpa), a model woody feedstock species. Using methylated DNA immunoprecipitation (MeDIP) or chromatin immunoprecipitation (ChIP), followed by high-throughput sequencing, we are analyzed DNA and histone methylation patterns in the P. trichocarpa genome in relation to four biological processes: bud dormancy and release, mature organ maintenance, in vitro organogenesis, and methylation suppression. Our project is now completed. We have 1) produced 22 transgenic events for a gene involved in DNA methylation suppression and studied its phenotypic consequences; 2) completed sequencing of methylated DNA from elevenmore » target tissues in wildtype P. trichocarpa; 3) updated our customized poplar genome browser using the open-source software tools (2.13) and (V2.2) of the P. trichocarpa genome; 4) produced summary data for genome methylation in P. trichocarpa, including distribution of methylation across chromosomes and in and around genes; 5) employed bioinformatic and statistical methods to analyze differences in methylation patterns among tissue types; and 6) used bisulfite sequencing of selected target genes to confirm bioinformatics and sequencing results, and gain a higher-resolution view of methylation at selected genes 7) compared methylation patterns to expression using available microarray data. Our main findings of biological significance are the identification of extensive regions of the genome that display developmental variation in DNA methylation; highly distinctive gene-associated methylation profiles in reproductive tissues, particularly male catkins; a strong whole genome/all tissue inverse association of methylation at gene bodies and promoters with gene expression; a lack of evidence that tissue specificity of gene expression is associated with gene methylation; and evidence that genome methylation is a significant impediment to tissue dedifferentiation and redifferentiation in vitro.« less
Precision genome editing using CRISPR-Cas9 and linear repair templates in C. elegans.
Paix, Alexandre; Folkmann, Andrew; Seydoux, Geraldine
2017-05-15
The ability to introduce targeted edits in the genome of model organisms is revolutionizing the field of genetics. State-of-the-art methods for precision genome editing use RNA-guided endonucleases to create double-strand breaks (DSBs) and DNA templates containing the edits to repair the DSBs. Following this strategy, we have developed a protocol to create precise edits in the C. elegans genome. The protocol takes advantage of two innovations to improve editing efficiency: direct injection of CRISPR-Cas9 ribonucleoprotein complexes and use of linear DNAs with short homology arms as repair templates. The protocol requires no cloning or selection, and can be used to generate base and gene-size edits in just 4days. Point mutations, insertions, deletions and gene replacements can all be created using the same experimental pipeline. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Moon, Suyun; Lee, Hwa-Yong; Shim, Donghwan; Kim, Myungkil; Ka, Kang-Hyeon; Ryoo, Rhim; Ko, Han-Gyu; Koo, Chang-Duck; Chung, Jong-Wook; Ryu, Hojin
2017-06-01
Sixteen genomic DNA simple sequence repeat (SSR) markers of Lentinula edodes were developed from 205 SSR motifs present in 46.1-Mb long L. edodes genome sequences. The number of alleles ranged from 3-14 and the major allele frequency was distributed from 0.17-0.96. The values of observed and expected heterozygosity ranged from 0.00-0.76 and 0.07-0.90, respectively. The polymorphic information content value ranged from 0.07-0.89. A dendrogram, based on 16 SSR markers clustered by the paired hierarchical clustering' method, showed that 33 shiitake cultivars could be divided into three major groups and successfully identified. These SSR markers will contribute to the efficient breeding of this species by providing diversity in shiitake varieties. Furthermore, the genomic information covered by the markers can provide a valuable resource for genetic linkage map construction, molecular mapping, and marker-assisted selection in the shiitake mushroom.
Castrillo, Juan I; Lista, Simone; Hampel, Harald; Ritchie, Craig W
2018-01-01
Alzheimer's disease (AD) is a complex multifactorial disease, involving a combination of genomic, interactome, and environmental factors, with essential participation of (a) intrinsic genomic susceptibility and (b) a constant dynamic interplay between impaired pathways and central homeostatic networks of nerve cells. The proper investigation of the complexity of AD requires new holistic systems-level approaches, at both the experimental and computational level. Systems biology methods offer the potential to unveil new fundamental insights, basic mechanisms, and networks and their interplay. These may lead to the characterization of mechanism-based molecular signatures, and AD hallmarks at the earliest molecular and cellular levels (and beyond), for characterization of AD subtypes and stages, toward targeted interventions according to the evolving precision medicine paradigm. In this work, an update on advanced systems biology methods and strategies for holistic studies of multifactorial diseases-particularly AD-is presented. This includes next-generation genomics, neuroimaging and multi-omics methods, experimental and computational approaches, relevant disease models, and latest genome editing and single-cell technologies. Their progressive incorporation into basic research, cohort studies, and trials is beginning to provide novel insights into AD essential mechanisms, molecular signatures, and markers toward mechanism-based classification and staging, and tailored interventions. Selected methods which can be applied in cohort studies and trials, with the European Prevention of Alzheimer's Dementia (EPAD) project as a reference example, are presented and discussed.
Estimation of genomic breeding values for milk yield in UK dairy goats.
Mucha, S; Mrode, R; MacLaren-Lee, I; Coffey, M; Conington, J
2015-11-01
The objective of this study was to estimate genomic breeding values for milk yield in crossbred dairy goats. The research was based on data provided by 2 commercial goat farms in the UK comprising 590,409 milk yield records on 14,453 dairy goats kidding between 1987 and 2013. The population was created by crossing 3 breeds: Alpine, Saanen, and Toggenburg. In each generation the best performing animals were selected for breeding, and as a result, a synthetic breed was created. The pedigree file contained 30,139 individuals, of which 2,799 were founders. The data set contained test-day records of milk yield, lactation number, farm, age at kidding, and year and season of kidding. Data on milk composition was unavailable. In total 1,960 animals were genotyped with the Illumina 50K caprine chip. Two methods for estimation of genomic breeding value were compared-BLUP at the single nucleotide polymorphism level (BLUP-SNP) and single-step BLUP. The highest accuracy of 0.61 was obtained with single-step BLUP, and the lowest (0.36) with BLUP-SNP. Linkage disequilibrium (r(2), the squared correlation of the alleles at 2 loci) at 50 kb (distance between 2 SNP) was 0.18. This is the first attempt to implement genomic selection in UK dairy goats. Results indicate that the single-step method provides the highest accuracy for populations with a small number of genotyped individuals, where the number of genotyped males is low and females are predominant in the reference population. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Several PCR methods have recently been developed to identify fecal contamination in surface waters. In all cases, researchers have relied on one gene or one microorganism for selection of host specific markers. Here, we describe the application of a genome fragment enrichment met...
Several PCR methods have recently been developed to identify fecal contamination in surface waters. In all cases, researchers have relied on one gene or one microorganism for selection of host specific markers. Here, we describe the application of a genome fragment enrichment met...
Hoffmann, Thomas J; Zhan, Yiping; Kvale, Mark N; Hesselson, Stephanie E; Gollub, Jeremy; Iribarren, Carlos; Lu, Yontao; Mei, Gangwu; Purdy, Matthew M; Quesenberry, Charles; Rowell, Sarah; Shapero, Michael H; Smethurst, David; Somkin, Carol P; Van den Eeden, Stephen K; Walter, Larry; Webster, Teresa; Whitmer, Rachel A; Finn, Andrea; Schaefer, Catherine; Kwok, Pui-Yan; Risch, Neil
2011-12-01
Four custom Axiom genotyping arrays were designed for a genome-wide association (GWA) study of 100,000 participants from the Kaiser Permanente Research Program on Genes, Environment and Health. The array optimized for individuals of European race/ethnicity was previously described. Here we detail the development of three additional microarrays optimized for individuals of East Asian, African American, and Latino race/ethnicity. For these arrays, we decreased redundancy of high-performing SNPs to increase SNP capacity. The East Asian array was designed using greedy pairwise SNP selection. However, removing SNPs from the target set based on imputation coverage is more efficient than pairwise tagging. Therefore, we developed a novel hybrid SNP selection method for the African American and Latino arrays utilizing rounds of greedy pairwise SNP selection, followed by removal from the target set of SNPs covered by imputation. The arrays provide excellent genome-wide coverage and are valuable additions for large-scale GWA studies. Copyright © 2011 Elsevier Inc. All rights reserved.
A novel sgRNA selection system for CRISPR-Cas9 in mammalian cells.
Zhang, Haiwei; Zhang, Xixi; Fan, Cunxian; Xie, Qun; Xu, Chengxian; Zhao, Qun; Liu, Yongbo; Wu, Xiaoxia; Zhang, Haibing
2016-03-18
CRISPR-Cas9 mediated genome editing system has been developed as a powerful tool for elucidating the function of genes through genetic engineering in multiple cells and organisms. This system takes advantage of a single guide RNA (sgRNA) to direct the Cas9 endonuclease to a specific DNA site to generate mutant alleles. Since the targeting efficiency of sgRNAs to distinct DNA loci can vary widely, there remains a need for a rapid, simple and efficient sgRNA selection method to overcome this limitation of the CRISPR-Cas9 system. Here we report a novel system to select sgRNA with high efficacy for DNA sequence modification by a luciferase assay. Using this sgRNAs selection system, we further demonstrated successful examples of one sgRNA for generating one gene knockout cell lines where the targeted genes are shown to be functionally defective. This system provides a potential application to optimize the sgRNAs in different species and to generate a powerful CRISPR-Cas9 genome-wide screening system with minimum amounts of sgRNAs. Copyright © 2016 Elsevier Inc. All rights reserved.
Xie, Qingjun; Tzfadia, Oren; Levy, Matan; Weithorn, Efrat; Peled-Zehavi, Hadas; Van Parys, Thomas; Van de Peer, Yves; Galili, Gad
2016-01-01
ABSTRACT Most of the proteins that are specifically turned over by selective autophagy are recognized by the presence of short Atg8 interacting motifs (AIMs) that facilitate their association with the autophagy apparatus. Such AIMs can be identified by bioinformatics methods based on their defined degenerate consensus F/W/Y-X-X-L/I/V sequences in which X represents any amino acid. Achieving reliability and/or fidelity of the prediction of such AIMs on a genome-wide scale represents a major challenge. Here, we present a bioinformatics approach, high fidelity AIM (hfAIM), which uses additional sequence requirements—the presence of acidic amino acids and the absence of positively charged amino acids in certain positions—to reliably identify AIMs in proteins. We demonstrate that the use of the hfAIM method allows for in silico high fidelity prediction of AIMs in AIM-containing proteins (ACPs) on a genome-wide scale in various organisms. Furthermore, by using hfAIM to identify putative AIMs in the Arabidopsis proteome, we illustrate a potential contribution of selective autophagy to various biological processes. More specifically, we identified 9 peroxisomal PEX proteins that contain hfAIM motifs, among which AtPEX1, AtPEX6 and AtPEX10 possess evolutionary-conserved AIMs. Bimolecular fluorescence complementation (BiFC) results verified that AtPEX6 and AtPEX10 indeed interact with Atg8 in planta. In addition, we show that mutations occurring within or nearby hfAIMs in PEX1, PEX6 and PEX10 caused defects in the growth and development of various organisms. Taken together, the above results suggest that the hfAIM tool can be used to effectively perform genome-wide in silico screens of proteins that are potentially regulated by selective autophagy. The hfAIM system is a web tool that can be accessed at link: http://bioinformatics.psb.ugent.be/hfAIM/. PMID:27071037
Genetic signatures of natural selection in a model invasive ascidian
NASA Astrophysics Data System (ADS)
Lin, Yaping; Chen, Yiyong; Yi, Changho; Fong, Jonathan J.; Kim, Won; Rius, Marc; Zhan, Aibin
2017-03-01
Invasive species represent promising models to study species’ responses to rapidly changing environments. Although local adaptation frequently occurs during contemporary range expansion, the associated genetic signatures at both population and genomic levels remain largely unknown. Here, we use genome-wide gene-associated microsatellites to investigate genetic signatures of natural selection in a model invasive ascidian, Ciona robusta. Population genetic analyses of 150 individuals sampled in Korea, New Zealand, South Africa and Spain showed significant genetic differentiation among populations. Based on outlier tests, we found high incidence of signatures of directional selection at 19 loci. Hitchhiking mapping analyses identified 12 directional selective sweep regions, and all selective sweep windows on chromosomes were narrow (~8.9 kb). Further analyses indentified 132 candidate genes under selection. When we compared our genetic data and six crucial environmental variables, 16 putatively selected loci showed significant correlation with these environmental variables. This suggests that the local environmental conditions have left significant signatures of selection at both population and genomic levels. Finally, we identified “plastic” genomic regions and genes that are promising regions to investigate evolutionary responses to rapid environmental change in C. robusta.
Frazier, Courtney L.; San Filippo, Joseph; Lambowitz, Alan M.; Mills, David A.
2003-01-01
Despite their commercial importance, there are relatively few facile methods for genomic manipulation of the lactic acid bacteria. Here, the lactococcal group II intron, Ll.ltrB, was targeted to insert efficiently into genes encoding malate decarboxylase (mleS) and tetracycline resistance (tetM) within the Lactococcus lactis genome. Integrants were readily identified and maintained in the absence of a selectable marker. Since splicing of the Ll.ltrB intron depends on the intron-encoded protein, targeted invasion with an intron lacking the intron open reading frame disrupted TetM and MleS function, and MleS activity could be partially restored by expressing the intron-encoded protein in trans. Restoration of splicing from intron variants lacking the intron-encoded protein illustrates how targeted group II introns could be used for conditional expression of any gene. Furthermore, the modified Ll.ltrB intron was used to separately deliver a phage resistance gene (abiD) and a tetracycline resistance marker (tetM) into mleS, without the need for selection to drive the integration or to maintain the integrant. Our findings demonstrate the utility of targeted group II introns as a potential food-grade mechanism for delivery of industrially important traits into the genomes of lactococci. PMID:12571038
Genomic Selection in Dairy Cattle: The USDA Experience.
Wiggans, George R; Cole, John B; Hubbard, Suzanne M; Sonstegard, Tad S
2017-02-08
Genomic selection has revolutionized dairy cattle breeding. Since 2000, assays have been developed to genotype large numbers of single-nucleotide polymorphisms (SNPs) at relatively low cost. The first commercial SNP genotyping chip was released with a set of 54,001 SNPs in December 2007. Over 15,000 genotypes were used to determine which SNPs should be used in genomic evaluation of US dairy cattle. Official USDA genomic evaluations were first released in January 2009 for Holsteins and Jerseys, in August 2009 for Brown Swiss, in April 2013 for Ayrshires, and in April 2016 for Guernseys. Producers have accepted genomic evaluations as accurate indications of a bull's eventual daughter-based evaluation. The integration of DNA marker technology and genomics into the traditional evaluation system has doubled the rate of genetic progress for traits of economic importance, decreased generation interval, increased selection accuracy, reduced previous costs of progeny testing, and allowed identification of recessive lethals.
USDA-ARS?s Scientific Manuscript database
Bovine Respiratory Disease Complex is a disease that is very costly to the dairy industry. Genomic selection may be an effective tool to improve host resistance to the pathogens that cause this disease. Use of genomic predicted transmitting abilities (GPTA) for selection has had a dramatic effect on...
Accuracy of genomic selection for BCWD resistance in rainbow trout
USDA-ARS?s Scientific Manuscript database
Bacterial cold water disease (BCWD) causes significant economic losses in salmonids. In this study, we aimed to (1) predict genomic breeding values (GEBV) by genotyping training (n=583) and validation samples (n=53) with a SNP50K chip; and (2) assess the accuracy of genomic selection (GS) for BCWD r...
Signatures of positive selection in East African Shorthorn Zebu: a genome-wide SNP analysis
USDA-ARS?s Scientific Manuscript database
The small East African Shorthorn Zebu is the main indigenous cattle across East Africa. A recent genome wide SNPs analysis has revealed their ancient stable African taurine x Asian zebu admixture. Here, we assess the presence of candidate signature of positive selection in their genome, with the aim...
Multi-omics facilitated variable selection in Cox-regression model for cancer prognosis prediction.
Liu, Cong; Wang, Xujun; Genchev, Georgi Z; Lu, Hui
2017-07-15
New developments in high-throughput genomic technologies have enabled the measurement of diverse types of omics biomarkers in a cost-efficient and clinically-feasible manner. Developing computational methods and tools for analysis and translation of such genomic data into clinically-relevant information is an ongoing and active area of investigation. For example, several studies have utilized an unsupervised learning framework to cluster patients by integrating omics data. Despite such recent advances, predicting cancer prognosis using integrated omics biomarkers remains a challenge. There is also a shortage of computational tools for predicting cancer prognosis by using supervised learning methods. The current standard approach is to fit a Cox regression model by concatenating the different types of omics data in a linear manner, while penalty could be added for feature selection. A more powerful approach, however, would be to incorporate data by considering relationships among omics datatypes. Here we developed two methods: a SKI-Cox method and a wLASSO-Cox method to incorporate the association among different types of omics data. Both methods fit the Cox proportional hazards model and predict a risk score based on mRNA expression profiles. SKI-Cox borrows the information generated by these additional types of omics data to guide variable selection, while wLASSO-Cox incorporates this information as a penalty factor during model fitting. We show that SKI-Cox and wLASSO-Cox models select more true variables than a LASSO-Cox model in simulation studies. We assess the performance of SKI-Cox and wLASSO-Cox using TCGA glioblastoma multiforme and lung adenocarcinoma data. In each case, mRNA expression, methylation, and copy number variation data are integrated to predict the overall survival time of cancer patients. Our methods achieve better performance in predicting patients' survival in glioblastoma and lung adenocarcinoma. Copyright © 2017. Published by Elsevier Inc.
WGE: a CRISPR database for genome engineering.
Hodgkins, Alex; Farne, Anna; Perera, Sajith; Grego, Tiago; Parry-Smith, David J; Skarnes, William C; Iyer, Vivek
2015-09-15
The rapid development of CRISPR-Cas9 mediated genome editing techniques has given rise to a number of online and stand-alone tools to find and score CRISPR sites for whole genomes. Here we describe the Wellcome Trust Sanger Institute Genome Editing database (WGE), which uses novel methods to compute, visualize and select optimal CRISPR sites in a genome browser environment. The WGE database currently stores single and paired CRISPR sites and pre-calculated off-target information for CRISPRs located in the mouse and human exomes. Scoring and display of off-target sites is simple, and intuitive, and filters can be applied to identify high-quality CRISPR sites rapidly. WGE also provides a tool for the design and display of gene targeting vectors in the same genome browser, along with gene models, protein translation and variation tracks. WGE is open, extensible and can be set up to compute and present CRISPR sites for any genome. The WGE database is freely available at www.sanger.ac.uk/htgt/wge : vvi@sanger.ac.uk or skarnes@sanger.ac.uk Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Jakupciak, John P; Wells, Jeffrey M; Karalus, Richard J; Pawlowski, David R; Lin, Jeffrey S; Feldman, Andrew B
2013-01-01
Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations.
Jakupciak, John P.; Wells, Jeffrey M.; Karalus, Richard J.; Pawlowski, David R.; Lin, Jeffrey S.; Feldman, Andrew B.
2013-01-01
Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations. PMID:24455204
Pembleton, Luke W; Inch, Courtney; Baillie, Rebecca C; Drayton, Michelle C; Thakur, Preeti; Ogaji, Yvonne O; Spangenberg, German C; Forster, John W; Daetwyler, Hans D; Cogan, Noel O I
2018-06-02
Exploitation of data from a ryegrass breeding program has enabled rapid development and implementation of genomic selection for sward-based biomass yield with a twofold-to-threefold increase in genetic gain. Genomic selection, which uses genome-wide sequence polymorphism data and quantitative genetics techniques to predict plant performance, has large potential for the improvement in pasture plants. Major factors influencing the accuracy of genomic selection include the size of reference populations, trait heritability values and the genetic diversity of breeding populations. Global diversity of the important forage species perennial ryegrass is high and so would require a large reference population in order to achieve moderate accuracies of genomic selection. However, diversity of germplasm within a breeding program is likely to be lower. In addition, de novo construction and characterisation of reference populations are a logistically complex process. Consequently, historical phenotypic records for seasonal biomass yield and heading date over a 18-year period within a commercial perennial ryegrass breeding program have been accessed, and target populations have been characterised with a high-density transcriptome-based genotyping-by-sequencing assay. Ability to predict observed phenotypic performance in each successive year was assessed by using all synthetic populations from previous years as a reference population. Moderate and high accuracies were achieved for the two traits, respectively, consistent with broad-sense heritability values. The present study represents the first demonstration and validation of genomic selection for seasonal biomass yield within a diverse commercial breeding program across multiple years. These results, supported by previous simulation studies, demonstrate the ability to predict sward-based phenotypic performance early in the process of individual plant selection, so shortening the breeding cycle, increasing the rate of genetic gain and allowing rapid adoption in ryegrass improvement programs.
Gorelick, Root; Olson, Krystle
2013-07-01
There are two ways eukaryotes double number of chromosomes: (1) whole genome duplication (polyploidy), in which all nuclear DNA is replicated, and (2) karyotypic fission (pseudopolyploidy), in which all chromosomes are physically bifurcated. We contrast polyploidy with pseudopolyploidy, highlighting when it is crucial to look at genetic vs. genomic levels. We review history of pseudopolyploidy, including recent mechanisms by which chromosomal bifurcation may occur and outline methods for detecting such genomic changes. We then delve into the evolutionary implications, with particular focus on adaptive potential, of these two forms of doubling chromosome numbers. We address the common assertion that polyploidy induces adaptive radiations, which contains three fallacies. First, while polyploidy causes quantum speciation, evolutionary theory implies that these radiations should be non-adaptive. Polyploidy causes reproductive isolation, minute effective population sizes, and increased mutation rates, which all imply a diminished role for selection. Second, due to lack of karyotyping in recent decades and lack of distinction between genomic and genetic effects, it is usually impossible to detect pseudopolyploids. Third, pseudopolyploids lack minority cytotype exclusion because they readily backcross with their progenitors, which thereby means no reproductive isolation for newly formed pseudopolyploids. Pseudopolyploidy will thereby not result in radiations until pseudopolyploid descendants undergo subsequent chromosome rearrangements or grow new centromeres. Pseudopolyploids may have a modest selective advantage over their progenitors due to diminished linkage disequilibrium. Thus, pseudopolyploidy may induce adaptive non-radiations. We encourage a renaissance of karyotyping to distinguish between these two mechanisms and a renaissance in genomic perspectives in evolution. Copyright © 2013 Wiley Periodicals, Inc.
Shedding genomic light on Aristotle's lantern.
Sodergren, Erica; Shen, Yufeng; Song, Xingzhi; Zhang, Lan; Gibbs, Richard A; Weinstock, George M
2006-12-01
Sea urchins have proved fascinating to biologists since the time of Aristotle who compared the appearance of their bony mouth structure to a lantern in The History of Animals. Throughout modern times it has been a model system for research in developmental biology. Now, the genome of the sea urchin Strongylocentrotus purpuratus is the first echinoderm genome to be sequenced. A high quality draft sequence assembly was produced using the Atlas assembler to combine whole genome shotgun sequences with sequences from a collection of BACs selected to form a minimal tiling path along the genome. A formidable challenge was presented by the high degree of heterozygosity between the two haplotypes of the selected male representative of this marine organism. This was overcome by use of the BAC tiling path backbone, in which each BAC represents a single haplotype, as well as by improvements in the Atlas software. Another innovation introduced in this project was the sequencing of pools of tiling path BACs rather than individual BAC sequencing. The Clone-Array Pooled Shotgun Strategy greatly reduced the cost and time devoted to preparing shotgun libraries from BAC clones. The genome sequence was analyzed with several gene prediction methods to produce a comprehensive gene list that was then manually refined and annotated by a volunteer team of sea urchin experts. This latter annotation community edited over 9000 gene models and uncovered many unexpected aspects of the sea urchin genetic content impacting transcriptional regulation, immunology, sensory perception, and an organism's development. Analysis of the basic deuterostome genetic complement supports the sea urchin's role as a model system for deuterostome and, by extension, chordate development.
Li, Mingkun; Rothwell, Rebecca; Vermaat, Martijn; Wachsmuth, Manja; Schröder, Roland; Laros, Jeroen F.J.; van Oven, Mannis; de Bakker, Paul I.W.; Bovenberg, Jasper A.; van Duijn, Cornelia M.; van Ommen, Gert-Jan B.; Slagboom, P. Eline; Swertz, Morris A.; Wijmenga, Cisca; Kayser, Manfred; Boomsma, Dorret I.; Zöllner, Sebastian; de Knijff, Peter; Stoneking, Mark
2016-01-01
Although previous studies have documented a bottleneck in the transmission of mtDNA genomes from mothers to offspring, several aspects remain unclear, including the size and nature of the bottleneck. Here, we analyze the dynamics of mtDNA heteroplasmy transmission in the Genomes of the Netherlands (GoNL) data, which consists of complete mtDNA genome sequences from 228 trios, eight dizygotic (DZ) twin quartets, and 10 monozygotic (MZ) twin quartets. Using a minor allele frequency (MAF) threshold of 2%, we identified 189 heteroplasmies in the trio mothers, of which 59% were transmitted to offspring, and 159 heteroplasmies in the trio offspring, of which 70% were inherited from the mothers. MZ twin pairs exhibited greater similarity in MAF at heteroplasmic sites than DZ twin pairs, suggesting that the heteroplasmy MAF in the oocyte is the major determinant of the heteroplasmy MAF in the offspring. We used a likelihood method to estimate the effective number of mtDNA genomes transmitted to offspring under different bottleneck models; a variable bottleneck size model provided the best fit to the data, with an estimated mean of nine individual mtDNA genomes transmitted. We also found evidence for negative selection during transmission against novel heteroplasmies (in which the minor allele has never been observed in polymorphism data). These novel heteroplasmies are enhanced for tRNA and rRNA genes, and mutations associated with mtDNA diseases frequently occur in these genes. Our results thus suggest that the female germ line is able to recognize and select against deleterious heteroplasmies. PMID:26916109
Soybean (Glycine max) transformation using mature cotyledonary node explants.
Olhoft, Paula M; Donovan, Christopher M; Somers, David A
2006-01-01
Agrobacterium tumefaciens-mediated transformation of soybeans has been steadily improved since its development in 1988. Soybean transformation is now possible in a range of genotypes from different maturity groups using different explants as sources of regenerable cells, various selectable marker genes and selective agents, and different A. tumefaciens strains. The cotyledonary-node method has been extensively investigated and across a number of laboratories yields on average greater than 1% transformation efficiency (one Southern-positive, independent event per 100 cotyledonary-node explants). Continued improvements in the cotyledonary-node method concomitant with further increases in transformation efficiency will enhance broader adoption of this already productive transformation method for use in crop improvement and functional genomics research efforts.
Genome-Wide Association Analysis of Adaptation Using Environmentally Predicted Traits.
van Heerwaarden, Joost; van Zanten, Martijn; Kruijer, Willem
2015-10-01
Current methods for studying the genetic basis of adaptation evaluate genetic associations with ecologically relevant traits or single environmental variables, under the implicit assumption that natural selection imposes correlations between phenotypes, environments and genotypes. In practice, observed trait and environmental data are manifestations of unknown selective forces and are only indirectly associated with adaptive genetic variation. In theory, improved estimation of these forces could enable more powerful detection of loci under selection. Here we present an approach in which we approximate adaptive variation by modeling phenotypes as a function of the environment and using the predicted trait in multivariate and univariate genome-wide association analysis (GWAS). Based on computer simulations and published flowering time data from the model plant Arabidopsis thaliana, we find that environmentally predicted traits lead to higher recovery of functional loci in multivariate GWAS and are more strongly correlated to allele frequencies at adaptive loci than individual environmental variables. Our results provide an example of the use of environmental data to obtain independent and meaningful information on adaptive genetic variation.
Moreb, Eirik Adim; Hoover, Benjamin; Yaseen, Adam; Valyasevi, Nisakorn; Roecker, Zoe; Menacho-Melgar, Romel; Lynch, Michael D
2017-12-15
Phage-derived "recombineering" methods are utilized for bacterial genome editing. Recombineering results in a heterogeneous population of modified and unmodified chromosomes, and therefore selection methods, such as CRISPR-Cas9, are required to select for edited clones. Cells can evade CRISPR-Cas-induced cell death through recA-mediated induction of the SOS response. The SOS response increases RecA dependent repair as well as mutation rates through induction of the umuDC error prone polymerase. As a result, CRISPR-Cas selection is more efficient in recA mutants. We report an approach to inhibiting the SOS response and RecA activity through the expression of a mutant dominant negative form of RecA, which incorporates into wild type RecA filaments and inhibits activity. Using a plasmid-based system in which Cas9 and recA mutants are coexpressed, we can achieve increased efficiency and consistency of CRISPR-Cas9-mediated selection and recombineering in E. coli, while reducing the induction of the SOS response. To date, this approach has been shown to be independent of recA genotype and host strain lineage. Using this system, we demonstrate increased CRISPR-Cas selection efficacy with over 10 000 guides covering the E. coli chromosome. The use of dominant negative RecA or homologues may be of broad use in bacterial CRISPR-Cas-based genome editing where the SOS pathways are present.
Multivariate Methods for Meta-Analysis of Genetic Association Studies.
Dimou, Niki L; Pantavou, Katerina G; Braliou, Georgia G; Bagos, Pantelis G
2018-01-01
Multivariate meta-analysis of genetic association studies and genome-wide association studies has received a remarkable attention as it improves the precision of the analysis. Here, we review, summarize and present in a unified framework methods for multivariate meta-analysis of genetic association studies and genome-wide association studies. Starting with the statistical methods used for robust analysis and genetic model selection, we present in brief univariate methods for meta-analysis and we then scrutinize multivariate methodologies. Multivariate models of meta-analysis for a single gene-disease association studies, including models for haplotype association studies, multiple linked polymorphisms and multiple outcomes are discussed. The popular Mendelian randomization approach and special cases of meta-analysis addressing issues such as the assumption of the mode of inheritance, deviation from Hardy-Weinberg Equilibrium and gene-environment interactions are also presented. All available methods are enriched with practical applications and methodologies that could be developed in the future are discussed. Links for all available software implementing multivariate meta-analysis methods are also provided.
Guide RNA selection for CRISPR-Cas9 transfections in Plasmodium falciparum.
Ribeiro, Jose M; Garriga, Meera; Potchen, Nicole; Crater, Anna K; Gupta, Ankit; Ito, Daisuke; Desai, Sanjay A
2018-06-12
CRISPR-Cas9 mediated genome editing is addressing key limitations in the transfection of malaria parasites. While this method has already simplified the needed molecular cloning and reduced the time required to generate mutants in the human pathogen Plasmodium falciparum, optimal selection of required guide RNAs and guidelines for successful transfections have not been well characterized, leading workers to use time-consuming trial and error approaches. We used a genome-wide computational approach to create a comprehensive and publicly accessible database of possible guide RNA sequences in the P. falciparum genome. For each guide, we report on-target efficiency and specificity scores as well as information about the genomic site relevant to optimal design of CRISPR-Cas9 transfections to modify, disrupt, or conditionally knockdown any gene. As many antimalarial drug and vaccine targets are encoded by multigene families, we also developed a new paralog specificity score that should facilitate modification of either a single family member of interest or multiple paralogs that serve overlapping roles. Finally, we tabulated features of successful transfections in our laboratory, providing broadly useful guidelines for parasite transfections. Molecular studies aimed at understanding parasite biology or characterizing drug and vaccine targets in P. falciparum should be facilitated by this comprehensive database. Published by Elsevier Ltd.
Loss of Heterozygosity Drives Adaptation in Hybrid Yeast.
Smukowski Heil, Caiti S; DeSevo, Christopher G; Pai, Dave A; Tucker, Cheryl M; Hoang, Margaret L; Dunham, Maitreya J
2017-07-01
Hybridization is often considered maladaptive, but sometimes hybrids can invade new ecological niches and adapt to novel or stressful environments better than their parents. The genomic changes that occur following hybridization that facilitate genome resolution and/or adaptation are not well understood. Here, we examine hybrid genome evolution using experimental evolution of de novo interspecific hybrid yeast Saccharomyces cerevisiae × Saccharomyces uvarum and their parentals. We evolved these strains in nutrient-limited conditions for hundreds of generations and sequenced the resulting cultures identifying numerous point mutations, copy number changes, and loss of heterozygosity (LOH) events, including species-biased amplification of nutrient transporters. We focused on a particularly interesting example, in which we saw repeated LOH at the high-affinity phosphate transporter gene PHO84 in both intra- and interspecific hybrids. Using allele replacement methods, we tested the fitness of different alleles in hybrid and S. cerevisiae strain backgrounds and found that the LOH is indeed the result of selection on one allele over the other in both S. cerevisiae and the hybrids. This is an example where hybrid genome resolution is driven by positive selection on existing heterozygosity and demonstrates that even infrequent outcrossing may have lasting impacts on adaptation. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Kjærner-Semb, Erik; Ayllon, Fernando; Furmanek, Tomasz; Wennevik, Vidar; Dahle, Geir; Niemelä, Eero; Ozerov, Mikhail; Vähä, Juha-Pekka; Glover, Kevin A; Rubin, Carl J; Wargelius, Anna; Edvardsen, Rolf B
2016-08-11
Populations of Atlantic salmon display highly significant genetic differences with unresolved molecular basis. These differences may result from separate postglacial colonization patterns, diversifying natural selection and adaptation, or a combination. Adaptation could be influenced or even facilitated by the recent whole genome duplication in the salmonid lineage which resulted in a partly tetraploid species with duplicated genes and regions. In order to elucidate the genes and genomic regions underlying the genetic differences, we conducted a genome wide association study using whole genome resequencing data from eight populations from Northern and Southern Norway. From a total of ~4.5 million sequencing-derived SNPs, more than 10 % showed significant differentiation between populations from these two regions and ten selective sweeps on chromosomes 5, 10, 11, 13-15, 21, 24 and 25 were identified. These comprised 59 genes, of which 15 had one or more differentiated missense mutation. Our analysis showed that most sweeps have paralogous regions in the partially tetraploid genome, each lacking the high number of significant SNPs found in the sweeps. The most significant sweep was found on Chr 25 and carried several missense mutations in the antiviral mx genes, suggesting that these populations have experienced differing viral pressures. Interestingly the second most significant sweep, found on Chr 5, contains two genes involved in the NF-KB pathway (nkap and nkrf), which is also a known pathogen target that controls a large number of processes in animals. Our results show that natural selection acting on immune related genes has contributed to genetic divergence between salmon populations in Norway. The differences between populations may have been facilitated by the plasticity of the salmon genome. The observed signatures of selection in duplicated genomic regions suggest that the recently duplicated genome has provided raw material for evolutionary adaptation.