Evaluation of an ensemble of genetic models for prediction of a quantitative trait.
Milton, Jacqueline N; Steinberg, Martin H; Sebastiani, Paola
2014-01-01
Many genetic markers have been shown to be associated with common quantitative traits in genome-wide association studies. Typically these associated genetic markers have small to modest effect sizes and individually they explain only a small amount of the variability of the phenotype. In order to build a genetic prediction model without fitting a multiple linear regression model with possibly hundreds of genetic markers as predictors, researchers often summarize the joint effect of risk alleles into a genetic score that is used as a covariate in the genetic prediction model. However, the prediction accuracy can be highly variable and selecting the optimal number of markers to be included in the genetic score is challenging. In this manuscript we present a strategy to build an ensemble of genetic prediction models from data and we show that the ensemble-based method makes the challenge of choosing the number of genetic markers more amenable. Using simulated data with varying heritability and number of genetic markers, we compare the predictive accuracy and inclusion of true positive and false positive markers of a single genetic prediction model and our proposed ensemble method. The results show that the ensemble of genetic models tends to include a larger number of genetic variants than a single genetic model and it is more likely to include all of the true genetic markers. This increased sensitivity is obtained at the price of a lower specificity that appears to minimally affect the predictive accuracy of the ensemble.
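The sketch below shows a genetic score of the kind described above and how it changes with the number of markers included. It assumes genotypes are coded as risk-allele counts (0, 1 or 2) and can optionally be weighted by per-marker effect sizes. The helper names `genetic_score` and `top_k_scores` are hypothetical. Ranking markers by association p-value is an assumed selection rule, not the authors' ensemble method.

```python
from typing import Dict, Iterable, Optional, Sequence


def genetic_score(genotypes: Sequence[int],
                  weights: Optional[Sequence[float]] = None) -> float:
    """Sum of risk-allele counts, optionally weighted (e.g. by effect sizes)."""
    if weights is None:
        weights = [1.0] * len(genotypes)  # unweighted allele-count score
    if len(weights) != len(genotypes):
        raise ValueError("need one weight per marker")
    for g in genotypes:
        if g not in (0, 1, 2):
            raise ValueError("genotypes must be risk-allele counts 0, 1 or 2")
    return sum(w * g for w, g in zip(weights, genotypes))


def top_k_scores(genotypes: Sequence[int], pvalues: Sequence[float],
                 ks: Iterable[int]) -> Dict[int, float]:
    """Unweighted score using only the k markers with the smallest p-values.

    Hypothetical helper: comparing several k illustrates why choosing the
    number of markers in the score matters.
    """
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    return {k: genetic_score([genotypes[i] for i in order[:k]]) for k in ks}
```

Computing the score for several values of k shows how sensitive a single-score model can be to the number of markers chosen.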
Morgante, Fabio; Huang, Wen; Maltecca, Christian; Mackay, Trudy F C
2018-06-01
Predicting complex phenotypes from genomic data is a fundamental aim of animal and plant breeding, where we wish to predict genetic merits of selection candidates; and of human genetics, where we wish to predict disease risk. While genomic prediction models work well with populations of related individuals and high linkage disequilibrium (LD) (e.g., livestock), comparable models perform poorly for populations of unrelated individuals and low LD (e.g., humans). We hypothesized that low prediction accuracies in the latter situation may occur when the genetic architecture of the trait departs from the infinitesimal and additive architecture assumed by most prediction models. We used simulated data for 10,000 lines based on sequence data from a population of unrelated, inbred Drosophila melanogaster lines to evaluate this hypothesis. We show that, even in very simplified scenarios meant as a stress test of the commonly used Genomic Best Linear Unbiased Predictor (G-BLUP) method, using all common variants yields low prediction accuracy regardless of the trait genetic architecture. However, prediction accuracy increases when predictions are informed by the genetic architecture inferred from mapping the top variants affecting main effects and interactions in the training data, provided there is sufficient power for mapping. When the true genetic architecture is largely or partially due to epistatic interactions, the additive model may not perform well, while models that account explicitly for interactions generally increase prediction accuracy. Our results indicate that accounting for genetic architecture can improve prediction accuracy for quantitative traits.
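The sketch below illustrates the shrinkage idea behind G-BLUP in pure Python. It assumes a genomic relationship matrix G is already available and that the variance ratio lam = sigma_e^2 / sigma_g^2 is known. To keep it short, it centres phenotypes on their sample mean rather than estimating fixed effects by generalized least squares, so it is not the exact model used in the study. The names `solve` and `gblup` are hypothetical.

```python
from typing import List, Optional, Sequence


def solve(A: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gblup(G: Sequence[Sequence[float]], y: Sequence[float], lam: float,
          G_new: Optional[Sequence[Sequence[float]]] = None) -> List[float]:
    """Genomic predictions G_new (G + lam*I)^-1 (y - mean(y)).

    G_new holds relationships between new and training individuals; when it
    is None, predictions are returned for the training individuals.
    """
    n = len(y)
    mu = sum(y) / n  # simplification: sample mean instead of GLS fixed effect
    A = [[G[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(A, [yi - mu for yi in y])
    rows = G if G_new is None else G_new
    return [sum(g * a for g, a in zip(row, alpha)) for row in rows]
```

A larger `lam` (relatively more residual variance) shrinks the predictions more strongly toward the mean.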
Schrodi, Steven J.; Mukherjee, Shubhabrata; Shan, Ying; Tromp, Gerard; Sninsky, John J.; Callear, Amy P.; Carter, Tonia C.; Ye, Zhan; Haines, Jonathan L.; Brilliant, Murray H.; Crane, Paul K.; Smelser, Diane T.; Elston, Robert C.; Weeks, Daniel E.
2014-01-01
Translation of results from genetic findings to inform medical practice is a highly anticipated goal of human genetics. The aim of this paper is to review and discuss the role of genetics in medically-relevant prediction. Germline genetics presages disease onset and therefore can contribute prognostic signals that augment laboratory tests and clinical features. As such, the impact of genetic-based predictive models on clinical decisions and therapy choice could be profound. However, given that (i) medical traits result from a complex interplay between genetic and environmental factors, (ii) the underlying genetic architectures for susceptibility to common diseases are not well-understood, and (iii) replicable susceptibility alleles, in combination, account for only a moderate amount of disease heritability, there are substantial challenges to constructing and implementing genetic risk prediction models with high utility. In spite of these challenges, concerted progress has continued in this area with an ongoing accumulation of studies that identify disease predisposing genotypes. Several statistical approaches with the aim of predicting disease have been published. Here we summarize the current state of disease susceptibility mapping and pharmacogenetics efforts for risk prediction, describe methods used to construct and evaluate genetic-based predictive models, and discuss applications. PMID:24917882
Su, Guosheng; Christensen, Ole F.; Ostersen, Tage; Henryon, Mark; Lund, Mogens S.
2012-01-01
Non-additive genetic variation is usually ignored when genome-wide markers are used to study the genetic architecture and genomic prediction of complex traits in humans, wildlife, model organisms or farm animals. However, non-additive genetic effects may contribute substantially to the total genetic variation of complex traits. This study presented a genomic BLUP model including additive and non-additive genetic effects, in which additive and non-additive genetic relationship matrices were constructed from dense genome-wide single nucleotide polymorphism (SNP) markers. In addition, this study for the first time proposed a method to construct a dominance relationship matrix using SNP markers and demonstrated it in detail. The proposed model was implemented to investigate the amounts of additive genetic, dominance and epistatic variation, and to assess the accuracy and unbiasedness of genomic predictions for daily gain in pigs. In the analysis of daily gain, four linear models were used: 1) a simple additive genetic model (MA), 2) a model including both additive and additive by additive epistatic genetic effects (MAE), 3) a model including both additive and dominance genetic effects (MAD), and 4) a full model including all three genetic components (MAED). Estimates of narrow-sense heritability were 0.397, 0.373, 0.379 and 0.357 for models MA, MAE, MAD and MAED, respectively. Estimated dominance variance and additive by additive epistatic variance accounted for 5.6% and 9.5% of the total phenotypic variance, respectively. Based on model MAED, the estimate of broad-sense heritability was 0.506. Reliabilities of genomic predicted breeding values for the animals without performance records were 28.5%, 28.8%, 29.2% and 29.5% for models MA, MAE, MAD and MAED, respectively. In addition, models including non-additive genetic effects improved unbiasedness of genomic predictions. PMID:23028912
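The sketch below shows one way to build genetic relationship matrices from SNP markers. It constructs an additive genomic relationship matrix with the widely used VanRaden scaling, G = ZZ' / (2 Σ p_j(1 - p_j)), where Z holds allele counts centred by 2p_j. It then forms an additive-by-additive epistatic matrix as the element-wise (Hadamard) product of G with itself, which is a common construction. Whether these are exactly the matrices used in the study is an assumption. The authors' proposed dominance matrix is not reproduced here, and the function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def allele_freqs(M: Sequence[Sequence[int]]) -> List[float]:
    """Frequency of the counted allele per marker (genotypes coded 0, 1, 2)."""
    n = len(M)
    return [sum(row[j] for row in M) / (2 * n) for j in range(len(M[0]))]


def additive_grm(M: Sequence[Sequence[int]]) -> Matrix:
    """Additive genomic relationship matrix with VanRaden-style scaling."""
    p = allele_freqs(M)
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    # Centre each marker column by twice its allele frequency.
    Z = [[row[j] - 2 * p[j] for j in range(len(p))] for row in M]
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in Z] for zi in Z]


def hadamard(A: Matrix, B: Matrix) -> Matrix:
    """Element-wise product; hadamard(G, G) approximates an AxA epistatic matrix."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
```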
Eco-genetic modeling of contemporary life-history evolution.
Dunlop, Erin S; Heino, Mikko; Dieckmann, Ulf
2009-10-01
We present eco-genetic modeling as a flexible tool for exploring the course and rates of multi-trait life-history evolution in natural populations. We build on existing modeling approaches by combining features that facilitate studying the ecological and evolutionary dynamics of realistically structured populations. In particular, the joint consideration of age and size structure enables the analysis of phenotypically plastic populations with more than a single growth trajectory, and ecological feedback is readily included in the form of density dependence and frequency dependence. Stochasticity and life-history trade-offs can also be implemented. Critically, eco-genetic models permit the incorporation of salient genetic detail such as a population's genetic variances and covariances and the corresponding heritabilities, as well as the probabilistic inheritance and phenotypic expression of quantitative traits. These inclusions are crucial for predicting rates of evolutionary change on both contemporary and longer timescales. An eco-genetic model can be tightly coupled with empirical data and therefore may have considerable practical relevance, in terms of generating testable predictions and evaluating alternative management measures. To illustrate the utility of these models, we present as an example an eco-genetic model used to study harvest-induced evolution of multiple traits in Atlantic cod. The predictions of our model (most notably that harvesting induces a genetic reduction in age and size at maturation, an increase or decrease in growth capacity depending on the minimum-length limit, and an increase in reproductive investment) are corroborated by patterns observed in wild populations. The predicted genetic changes occur together with plastic changes that could phenotypically mask the former. Importantly, our analysis predicts that evolutionary changes show little sign of reversal following a harvest moratorium.
This illustrates how predictions offered by eco-genetic models can enable and guide evolutionarily sustainable resource management.
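The sketch below illustrates "probabilistic inheritance and phenotypic expression of quantitative traits" under the classical infinitesimal model. Each offspring's genetic value is drawn from a normal distribution centred on the midparent value, with segregation variance equal to half the additive genetic variance (ignoring inbreeding). The phenotype then adds independent environmental noise. Whether the cod model uses exactly this scheme is an assumption, and the function names are hypothetical.

```python
import random
from typing import Tuple


def narrow_sense_h2(var_a: float, var_e: float) -> float:
    """Narrow-sense heritability: additive variance over phenotypic variance."""
    return var_a / (var_a + var_e)


def offspring_trait(mother_g: float, father_g: float, var_a: float,
                    var_e: float, rng: random.Random) -> Tuple[float, float]:
    """Return (genetic value, phenotype) for one offspring.

    Assumes the infinitesimal model without inbreeding: the genetic value is
    normal with mean equal to the midparent value and variance var_a / 2; the
    phenotype adds environmental noise with variance var_e.
    """
    midparent = 0.5 * (mother_g + father_g)
    g = rng.gauss(midparent, (var_a / 2) ** 0.5)
    z = g + rng.gauss(0.0, var_e ** 0.5)
    return g, z
```

Simulating many offspring of the same pair should give genetic values averaging the midparent value with variance close to `var_a / 2`.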
Reuning, Gretchen A; Bauerle, William L; Mullen, Jack L; McKay, John K
2015-04-01
Transpiration is controlled by evaporative demand and stomatal conductance (gs), and there can be substantial genetic variation in gs. A key parameter in empirical models of transpiration is minimum stomatal conductance (g0), a trait that can be measured and has a large effect on gs and transpiration. In Arabidopsis thaliana, g0 exhibits both environmental and genetic variation, and quantitative trait loci (QTL) have been mapped. We used this information to create a genetically parameterized empirical model to predict transpiration of genotypes. For the parental lines, this worked well. However, in a recombinant inbred population, the predictions proved less accurate. When based only upon their genotype at a single g0 QTL, genotypes were less distinct than our model predicted. Follow-up experiments indicated that both genotype by environment interaction and polygenic inheritance complicate the application of genetic effects in physiological models. The use of ecophysiological or 'crop' models for predicting transpiration of novel genetic lines will benefit from incorporating further knowledge of the genetic control and degree of independence of core traits/parameters underlying gs variation. © 2014 John Wiley & Sons Ltd.
Iglesias, Adriana I; Mihaescu, Raluca; Ioannidis, John P A; Khoury, Muin J; Little, Julian; van Duijn, Cornelia M; Janssens, A Cecile J W
2014-05-01
Our main objective was to raise awareness of the areas that need improvement in the reporting of genetic risk prediction articles for future publications, based on the Genetic RIsk Prediction Studies (GRIPS) statement. We evaluated studies that developed or validated a prediction model based on multiple DNA variants, using empirical data, and were published in 2010. A data extraction form based on the 25 items of the GRIPS statement was created and piloted. Forty-two studies met our inclusion criteria. Overall, more than half of the evaluated items (34 of 62) were reported in at least 85% of included articles. Seventy-seven percent of the articles were identified as genetic risk prediction studies through title assessment, but only 31% used the keywords recommended by GRIPS in the title or abstract. Seventy-four percent mentioned which allele was the risk variant. Overall, only 10% of the articles reported all essential items needed to perform external validation of the risk model. Completeness of reporting in genetic risk prediction studies is adequate for general elements of study design but is suboptimal for several aspects that characterize genetic risk prediction studies, such as description of the model construction. Improvements in the transparency of reporting of these aspects would facilitate the identification, replication, and application of genetic risk prediction models. Copyright © 2014 Elsevier Inc. All rights reserved.
Genetic Programming as Alternative for Predicting Development Effort of Individual Software Projects
Chavoya, Arturo; Lopez-Martin, Cuauhtemoc; Andalon-Garcia, Irma R.; Meda-Campaña, M. E.
2012-01-01
Statistical and genetic programming techniques have been used to predict the software development effort of large software projects. In this paper, a genetic programming model was used for predicting the effort required in individually developed projects. Accuracy obtained from a genetic programming model was compared against one generated from the application of a statistical regression model. A sample of 219 projects developed by 71 practitioners was used for generating the two models, whereas another sample of 130 projects developed by 38 practitioners was used for validating them. The models used two kinds of lines of code as well as programming language experience as independent variables. Accuracy results from the model obtained with genetic programming suggest that it could be used to predict the software development effort of individual projects when these projects have been developed in a disciplined manner within a development-controlled environment. PMID:23226305
Imaging genetics approach to predict progression of Parkinson's disease.
Mansu Kim; Seong-Jin Son; Hyunjin Park
2017-07-01
Imaging genetics is a tool to extract genetic variants associated with both clinical phenotypes and imaging information. The approach can extract additional genetic variants compared to conventional approaches to better investigate various disease conditions. Here, we applied imaging genetics to study Parkinson's disease (PD). We aimed to extract significant features derived from imaging genetics and neuroimaging. We built a regression model based on extracted significant features combining genetics and neuroimaging to better predict clinical scores of PD progression (i.e. MDS-UPDRS). Our model yielded high correlation (r = 0.697, p < 0.001) and low root mean squared error (8.36) between predicted and actual MDS-UPDRS scores. Neuroimaging predictors of the regression model (from 123I-ioflupane SPECT) were computed using independent component analysis. Genetic features were computed using an imaging genetics approach based on identified neuroimaging features as intermediate phenotypes. Joint modeling of neuroimaging and genetics could provide complementary information and thus has the potential to provide further insight into the pathophysiology of PD. Our model included newly found neuroimaging features and genetic variants which need further investigation.
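The two accuracy measures reported above are Pearson's correlation r and the root mean squared error between predicted and actual scores. Both can be computed as in this short sketch. The function names are illustrative, and no study data are used.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error, in the units of the score itself."""
    n = len(predicted)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)
```

The two measures answer different questions. A model can rank patients perfectly (r = 1) yet be systematically offset, which only the RMSE reveals.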
Pandey, Daya Shankar; Pan, Indranil; Das, Saptarshi; Leahy, James J; Kwapinski, Witold
2015-03-01
A multi-gene genetic programming technique is proposed as a new method to predict syngas yield production and the lower heating value for municipal solid waste gasification in a fluidized bed gasifier. The study shows that the predicted outputs of the municipal solid waste gasification process are in good agreement with the experimental dataset and also generalise well to validation (untrained) data. Published experimental datasets are used for model training and validation purposes. The results show the effectiveness of the genetic programming technique for solving complex nonlinear regression problems. The multi-gene genetic programming models are also compared with a single-gene genetic programming model to show the relative merits and demerits of the technique. This study demonstrates that the genetic programming based data-driven modelling strategy can be a good candidate for developing models for other types of fuels as well. Copyright © 2014 Elsevier Ltd. All rights reserved.
PredictABEL: an R package for the assessment of risk prediction models.
Kundu, Suman; Aulchenko, Yurii S; van Duijn, Cornelia M; Janssens, A Cecile J W
2011-04-01
The rapid identification of genetic markers for multifactorial diseases from genome-wide association studies is fuelling interest in investigating the predictive ability and health care utility of genetic risk models. Various measures are available for the assessment of risk prediction models, each addressing a different aspect of performance and utility. We developed PredictABEL, a package in R that covers descriptive tables, measures and figures that are used in the analysis of risk prediction studies such as measures of model fit, predictive ability and clinical utility, and risk distributions, calibration plot and the receiver operating characteristic plot. Tables and figures are saved as separate files in a user-specified format, which include publication-quality EPS and TIFF formats. All figures are available in a ready-made layout, but they can be customized to the preferences of the user. The package has been developed for the analysis of genetic risk prediction studies, but can also be used for studies that only include non-genetic risk factors. PredictABEL is freely available at the websites of GenABEL ( http://www.genabel.org ) and CRAN ( http://cran.r-project.org/).
2011-01-01
Background Molecular marker information is a common source to draw inferences about the relationship between genetic and phenotypic variation. Genetic effects are often modelled as additively acting marker allele effects. The true mode of biological action can, of course, be different from this plain assumption. One possibility to better understand the genetic architecture of complex traits is to include intra-locus (dominance) and inter-locus (epistasis) interaction of alleles as well as the additive genetic effects when fitting a model to a trait. Several Bayesian MCMC approaches exist for the genome-wide estimation of genetic effects with high accuracy of genetic value prediction. Including pairwise interaction for thousands of loci would probably go beyond the scope of such a sampling algorithm because then millions of effects are to be estimated simultaneously leading to months of computation time. Alternative solving strategies are required when epistasis is studied. Methods We extended a fast Bayesian method (fBayesB), which was previously proposed for a purely additive model, to include non-additive effects. The fBayesB approach was used to estimate genetic effects on the basis of simulated datasets. Different scenarios were simulated to study the loss of accuracy of prediction, if epistatic effects were not simulated but modelled and vice versa. Results If 23 QTL were simulated to cause additive and dominance effects, both fBayesB and a conventional MCMC sampler BayesB yielded similar results in terms of accuracy of genetic value prediction and bias of variance component estimation based on a model including additive and dominance effects. Applying fBayesB to data with epistasis, accuracy could be improved by 5% when all pairwise interactions were modelled as well. The accuracy decreased more than 20% if genetic variation was spread over 230 QTL. 
In this scenario, accuracy based on modelling only additive and dominance effects was generally superior to that of the complex model including epistatic effects. Conclusions This simulation study showed that the fBayesB approach is convenient for genetic value prediction. Jointly estimating additive and non-additive effects (especially dominance) has reasonable impact on the accuracy of prediction and the proportion of genetic variation assigned to the additive genetic source. PMID:21867519
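The sketch below makes the scale problem concrete. It uses one common coding for additive (-1, 0, 1) and dominance (0, 1, 0) covariates and counts how many effects a model would contain once pairwise epistasis is added. The coding, the helper names and the assumption of one epistatic effect per locus pair are illustrative. Also modelling additive-by-dominance and dominance-by-dominance terms would multiply the pairwise count.

```python
from typing import Tuple


def additive_dominance_codes(genotype: int) -> Tuple[int, int]:
    """Map an allele count (0, 1, 2) to (additive, dominance) covariates."""
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2")
    additive = genotype - 1                 # -1, 0, 1
    dominance = 1 if genotype == 1 else 0   # only heterozygotes
    return additive, dominance


def n_effects(n_loci: int, dominance: bool = True, pairwise: bool = True,
              effects_per_pair: int = 1) -> int:
    """Number of effects to estimate for a given model specification."""
    count = n_loci * (2 if dominance else 1)
    if pairwise:
        # n choose 2 locus pairs, each contributing effects_per_pair terms
        count += effects_per_pair * n_loci * (n_loci - 1) // 2
    return count
```

For example, `n_effects(2000)` gives 2,003,000 effects. With four interaction terms per pair, `n_effects(2000, effects_per_pair=4)` gives 8,000,000. This is why sampling-based solvers become impractical.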
Genetic and phylogenetic consequences of island biogeography.
Johnson, K P; Adler, F R; Cherry, J L
2000-04-01
Island biogeography theory predicts that the number of species on an island should increase with island size and decrease with island distance to the mainland. These predictions are generally well supported in comparative and experimental studies. These ecological, equilibrium predictions arise as a result of colonization and extinction processes. Because colonization and extinction are also important processes in evolution, we develop methods to test evolutionary predictions of island biogeography. We derive a population genetic model of island biogeography that incorporates island colonization, migration of individuals from the mainland, and extinction of island populations. The model provides a means of estimating the rates of migration and extinction from population genetic data. This model predicts that within an island population the distribution of genetic divergences with respect to the mainland source population should be bimodal, with much of the divergence dating to the colonization event. Across islands, this model predicts that populations on large islands should be on average more genetically divergent from mainland source populations than those on small islands. Likewise, populations on distant islands should be more divergent than those on close islands. Published observations of a larger proportion of endemic species on large and distant islands support these predictions.
Valente, Bruno D.; Morota, Gota; Peñagaricano, Francisco; Gianola, Daniel; Weigel, Kent; Rosa, Guilherme J. M.
2015-01-01
The term “effect” in additive genetic effect suggests a causal meaning. However, inferences of such quantities for selection purposes are typically viewed and conducted as a prediction task. Predictive ability as tested by cross-validation is currently the most acceptable criterion for comparing models and evaluating new methodologies. Nevertheless, it does not directly indicate if predictors reflect causal effects. Such evaluations would require causal inference methods that are not typical in genomic prediction for selection. This suggests that the usual approach to infer genetic effects contradicts the label of the quantity inferred. Here we investigate if genomic predictors for selection should be treated as standard predictors or if they must reflect a causal effect to be useful, requiring causal inference methods. Conducting the analysis as a prediction or as a causal inference task affects, for example, how covariates of the regression model are chosen, which may heavily affect the magnitude of genomic predictors and therefore selection decisions. We demonstrate that selection requires learning causal genetic effects. However, genomic predictors from some models might capture noncausal signal, providing good predictive ability but poorly representing true genetic effects. Simulated examples are used to show that aiming for predictive ability may lead to poor modeling decisions, while causal inference approaches may guide the construction of regression models that better infer the target genetic effect even when they underperform in cross-validation tests. In conclusion, genomic selection models should be constructed to aim primarily for identifiability of causal genetic effects, not for predictive ability. PMID:25908318
Genetic change and rates of cladogenesis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Avise, J.C.; Ayala, F.J.
1975-12-01
Models are introduced which predict ratios of mean levels of genetic divergence in species-rich versus species-poor phylads under two competing assumptions: (1) genetic differentiation is a function of time, unrelated to the number of cladogenetic events and (2) genetic differentiation is proportional to the number of speciation events in the group. The models are simple, general, and biologically real, but not precise. They lead to qualitatively distinct predictions about levels of genetic divergence depending upon the relationship between rates of speciation and amount of genetic change. When genetic distance between species is a function of time, mean genetic distances in speciose and depauperate phylads of equal evolutionary age are very similar. On the contrary, when genetic distance is a function of the number of speciations in the history of a phylad, the ratio of mean genetic distances separating species in speciose versus depauperate phylads is greater than one, and increases rapidly as the frequency of speciations in one group relative to the other increases. The models may be tested with data from natural populations to assess (1) possible correlations between rates of anagenesis and cladogenesis and (2) the amount of genetic differentiation accompanying the speciation process. The data collected in electrophoretic surveys and other kinds of studies can be used to test the predictions of the models. For this purpose genetic distances need to be measured in speciose and depauperate phylads of equal evolutionary age. The limited information presently available agrees better with the model predicting that genetic change is primarily a function of time, and is not correlated with rates of speciation. Further testing of the models is, however, required before firm conclusions can be drawn. (auth)
Genetically informed ecological niche models improve climate change predictions.
Ikeda, Dana H; Max, Tamara L; Allan, Gerard J; Lau, Matthew K; Shuster, Stephen M; Whitham, Thomas G
2017-01-01
We examined the hypothesis that ecological niche models (ENMs) more accurately predict species distributions when they incorporate information on population genetic structure, and concomitantly, local adaptation. Local adaptation is common in species that span a range of environmental gradients (e.g., soils and climate). Moreover, common garden studies have demonstrated a covariance between neutral markers and functional traits associated with a species' ability to adapt to environmental change. We therefore predicted that genetically distinct populations would respond differently to climate change, resulting in predicted distributions with little overlap. To test whether genetic information improves our ability to predict a species' niche space, we created genetically informed ecological niche models (gENMs) using Populus fremontii (Salicaceae), a widespread tree species in which prior common garden experiments demonstrate strong evidence for local adaptation. Four major findings emerged: (i) gENMs predicted population occurrences with up to 12-fold greater accuracy than models without genetic information; (ii) tests of niche similarity revealed that three ecotypes, identified on the basis of neutral genetic markers and locally adapted populations, are associated with differences in climate; (iii) our forecasts indicate that ongoing climate change will likely shift these ecotypes further apart in geographic space, resulting in greater niche divergence; (iv) ecotypes that currently exhibit the largest geographic distribution and niche breadth appear to be buffered the most from climate change. As diverse agents of selection shape genetic variability and structure within species, we argue that gENMs will lead to more accurate predictions of species distributions under climate change. © 2016 John Wiley & Sons Ltd.
Gim, Jungsoo; Kim, Wonji; Kwak, Soo Heon; Choi, Hosik; Park, Changyi; Park, Kyong Soo; Kwon, Sunghoon; Park, Taesung; Won, Sungho
2017-11-01
Despite the many successes of genome-wide association studies (GWAS), the known susceptibility variants identified by GWAS have modest effect sizes, leading to notable skepticism about the effectiveness of building a risk prediction model from large-scale genetic data. However, in contrast to genetic variants, the family history of diseases has been largely accepted as an important risk factor in clinical diagnosis and risk prediction. Nevertheless, the complicated structures of the family history of diseases have limited their application in clinical practice. Here, we developed a new method that enables incorporation of the general family history of diseases with a liability threshold model, and propose a new analysis strategy for risk prediction with penalized regression analysis that incorporates both large numbers of genetic variants and clinical risk factors. Application of our model to type 2 diabetes in the Korean population (1846 cases and 1846 controls) demonstrated that single-nucleotide polymorphisms accounted for 32.5% of the variation explained by the predicted risk scores in the test data set, and incorporation of family history led to an additional 6.3% improvement in prediction. Our results illustrate that family medical history provides valuable information on the variation of complex diseases and improves prediction performance. Copyright © 2017 by the Genetics Society of America.
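As background for the liability threshold model mentioned above, the textbook version can be sketched as follows. Liability is standard normal, and an individual is affected when liability exceeds a threshold T chosen so that the affected fraction equals the prevalence K. The mean liability of affected individuals is then φ(T)/K. This is a generic sketch with hypothetical function names, not the authors' family-history method.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # liability scale: mean 0, variance 1


def liability_threshold(prevalence: float) -> float:
    """Threshold T with P(liability > T) = prevalence."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    return STD_NORMAL.inv_cdf(1 - prevalence)


def mean_liability_affected(prevalence: float) -> float:
    """Mean liability of affected individuals: pdf(T) / prevalence."""
    t = liability_threshold(prevalence)
    return STD_NORMAL.pdf(t) / prevalence
```

A rarer disease implies a higher threshold, and affected relatives then shift an individual's expected liability further toward it. This is the intuition behind treating family history as risk information.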
Burnside, Elizabeth S.; Liu, Jie; Wu, Yirong; Onitilo, Adedayo A.; McCarty, Catherine; Page, C. David; Peissig, Peggy; Trentham-Dietz, Amy; Kitchner, Terrie; Fan, Jun; Yuan, Ming
2015-01-01
Rationale and Objectives The discovery of germline genetic variants associated with breast cancer has engendered interest in risk stratification for improved, targeted detection and diagnosis. However, there has yet to be a comparison of the predictive ability of these genetic variants with mammography abnormality descriptors. Materials and Methods Our IRB-approved, HIPAA-compliant study utilized a personalized medicine registry in which participants consented to provide a DNA sample and participate in longitudinal follow-up. In our retrospective, age-matched, case-control study of 373 cases and 395 controls who underwent breast biopsy, we collected risk factors selected a priori based on the literature, including: demographic variables based on the Gail model, common germline genetic variants, and diagnostic mammography findings according to BI-RADS. We developed predictive models using logistic regression to determine the predictive ability of: 1) demographic variables, 2) 10 selected genetic variants, or 3) mammography BI-RADS features. We evaluated each model in turn by calculating a risk score for each patient using 10-fold cross validation; used this risk estimate to construct ROC curves; and compared the AUC of each using the DeLong method. Results The performance of the regression model using demographic risk factors was not statistically different from the model using genetic variants (p=0.9). The model using mammography features (AUC = 0.689) was superior to both the demographic model (AUC = .598; p<0.001) and the genetic model (AUC = .601; p<0.001). Conclusion BI-RADS features exceeded the ability of demographic risk factors and 10 selected germline genetic variants to predict breast cancer in women recommended for biopsy. PMID:26514439
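The AUC values compared above can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counting one half. This is the Mann-Whitney form of the area under the ROC curve. The function name is illustrative. The DeLong test used in the study to compare correlated AUCs is not implemented here.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise case-control comparisons."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(case_scores) * len(control_scores))
```

An AUC of 0.5 corresponds to random ranking, so the demographic and genetic models (about 0.60) discriminate only modestly better than chance.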
Prediction of Industrial Electric Energy Consumption in Anhui Province Based on GA-BP Neural Network
NASA Astrophysics Data System (ADS)
Zhang, Jiajing; Yin, Guodong; Ni, Youcong; Chen, Jinlan
2018-01-01
To improve the prediction accuracy of industrial electrical energy consumption, a prediction model was proposed based on a genetic algorithm and a neural network. The model uses a genetic algorithm to optimize the weights and thresholds of a BP neural network and is applied to predict industrial electric energy consumption in Anhui Province. In a comparison experiment between the GA-BP prediction model and a plain BP neural network model, the GA-BP model was more accurate with a smaller number of neurons in the hidden layer.
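The abstract does not detail the GA-BP procedure, so the following is only a hypothetical sketch of the general idea. A genetic algorithm searches the weights and biases ("thresholds") of a tiny one-hidden-layer network to minimize mean squared error. It uses tournament selection, uniform crossover, Gaussian mutation and elitism. In a GA-BP scheme the best individual would commonly seed backpropagation training, which is omitted here. All names and settings (population size, mutation rate, network size) are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Data = Sequence[Tuple[float, float]]


def predict(w: Sequence[float], x: float, hidden: int) -> float:
    """One-input tanh network; w = [in->hid, hid biases, hid->out, out bias]."""
    out = w[3 * hidden]
    for j in range(hidden):
        out += w[2 * hidden + j] * math.tanh(w[j] * x + w[hidden + j])
    return out


def mse(w: Sequence[float], data: Data, hidden: int) -> float:
    return sum((predict(w, x, hidden) - y) ** 2 for x, y in data) / len(data)


def ga_optimize(data: Data, hidden: int = 3, pop_size: int = 30,
                generations: int = 50, mut_sd: float = 0.3,
                seed: int = 0) -> Tuple[List[float], List[float]]:
    """Return the best weight vector and the best MSE per generation."""
    if pop_size < 3:
        raise ValueError("pop_size must be at least 3 for tournaments")
    rng = random.Random(seed)
    n = 3 * hidden + 1
    pop = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(pop_size)]
    history: List[float] = []

    def fitness(w: Sequence[float]) -> float:
        return mse(w, data, hidden)

    for _ in range(generations):
        ranked = sorted(pop, key=fitness)
        history.append(fitness(ranked[0]))
        children = [list(w) for w in ranked[:2]]  # elitism: keep the two best
        while len(children) < pop_size:
            a = min(rng.sample(ranked, 3), key=fitness)  # tournament selection
            b = min(rng.sample(ranked, 3), key=fitness)
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
            child = [c + rng.gauss(0, mut_sd) if rng.random() < 0.2 else c
                     for c in child]  # mutate about 20% of genes
            children.append(child)
        pop = children
    best = min(pop, key=fitness)
    history.append(fitness(best))
    return best, history
```

Because the two best individuals are copied unchanged into each new generation, the best error never increases from one generation to the next.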
Accuracies of univariate and multivariate genomic prediction models in African cassava.
Okeke, Uche Godfrey; Akdemir, Deniz; Rabbi, Ismail; Kulakow, Peter; Jannink, Jean-Luc
2017-12-04
Genomic selection (GS) promises to accelerate genetic gain in plant breeding programs especially for crop species such as cassava that have long breeding cycles. Practically, to implement GS in cassava breeding, it is necessary to evaluate different GS models and to develop suitable models for an optimized breeding pipeline. In this paper, we compared (1) prediction accuracies from a single-trait (uT) and a multi-trait (MT) mixed model for a single-environment genetic evaluation (Scenario 1), and (2) accuracies from a compound symmetric multi-environment model (uE) parameterized as a univariate multi-kernel model to a multivariate (ME) multi-environment mixed model that accounts for genotype-by-environment interaction for multi-environment genetic evaluation (Scenario 2). For these analyses, we used 16 years of public cassava breeding data for six target cassava traits and a fivefold cross-validation scheme with 10-repeat cycles to assess model prediction accuracies. In Scenario 1, the MT models had higher prediction accuracies than the uT models for all traits and locations analyzed, which amounted to on average a 40% improved prediction accuracy. For Scenario 2, we observed that the ME model had on average (across all locations and traits) a 12% improved prediction accuracy compared to the uE model. We recommend the use of multivariate mixed models (MT and ME) for cassava genetic evaluation. These models may be useful for other plant species.
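The fivefold cross-validation scheme repeated 10 times, described above, can be sketched as index splitting. Each repeat shuffles the individuals and partitions them into five folds, and each fold serves once as the validation set. The function name and seed handling are assumptions, and no model fitting is shown.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(n: int, k: int = 5, repeats: int = 10,
                   seed: int = 0) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (repeat, fold, train_indices, test_indices)."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]  # near-equal fold sizes
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
            yield r, f, train, test
```

Averaging prediction accuracy over all 50 train/test splits reduces the dependence of the reported accuracy on any single random partition.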
USDA-ARS's Scientific Manuscript database
Genomic selection (GS) models use genome-wide genetic information to predict genetic values of candidates for selection. Originally these models were developed without considering genotype × environment interaction (GE). Several authors have proposed extensions of the canonical GS model that accomm...
NASA Astrophysics Data System (ADS)
Xie, Yan; Li, Mu; Zhou, Jin; Zheng, Chang-zheng
2009-07-01
Agricultural machinery total power is an important index for reflecting and evaluating the level of agricultural mechanization. It is the power source of agricultural production and one of the main factors in enhancing comprehensive agricultural production capacity, expanding production scale and increasing farmers' income. Its demand is affected by natural, economic, technological, social and other "grey" factors; therefore, grey system theory can be used to analyze the development of agricultural machinery total power. This paper introduces a method in which a genetic algorithm optimizes the grey modeling process. The method combines the advantages of the grey prediction model with the global optimization capability of the genetic algorithm, which makes the prediction model more accurate. Using data from one province, a GM(1,1) model for predicting agricultural machinery total power was built based on grey system theory and the genetic algorithm. The results indicate that the model can serve as an effective tool for predicting agricultural machinery total power.
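For reference, the standard GM(1,1) grey model can be sketched as below: accumulate the series (AGO), form background values z(k) = 0.5(x1(k) + x1(k-1)), estimate the development coefficient a and grey input b by least squares from x0(k) + a z(k) = b, then restore fitted and forecast values by inverse accumulation. The genetic-algorithm step the paper uses to optimize the modeling process is not shown, and the example series is hypothetical.

```python
import math


def gm11_fit(x0):
    """Fit GM(1,1): return development coefficient a and grey input b."""
    n = len(x0)
    x1 = [sum(x0[: k + 1]) for k in range(n)]  # accumulated generating operation (AGO)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]  # background values
    # Grey equation x0(k) = a * (-z(k)) + b, solved by ordinary least squares
    u = [-zk for zk in z]
    y = x0[1:]
    m = len(y)
    su, sy = sum(u), sum(y)
    suu = sum(v * v for v in u)
    suy = sum(v * w for v, w in zip(u, y))
    det = m * suu - su * su
    a = (m * suy - su * sy) / det
    b = (suu * sy - su * suy) / det
    return a, b


def gm11_predict(x0, a, b, steps):
    """Fitted values for the observed points plus `steps` forecasts (inverse AGO)."""
    def x1_hat(k):  # time response of the whitening equation, k = 0, 1, 2, ...
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    total = len(x0) + steps
    return [x0[0]] + [x1_hat(k) - x1_hat(k - 1) for k in range(1, total)]


x0 = [100.0, 110.0, 121.0, 133.1, 146.41]  # hypothetical series growing 10% per step
a, b = gm11_fit(x0)
fitted = gm11_predict(x0, a, b, steps=2)
```

For a growing series the estimated a is negative; here it is close to -0.095, and the restored values track the 10% growth closely.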
Chang, Xuling; Salim, Agus; Dorajoo, Rajkumar; Han, Yi; Khor, Chiea-Chuen; van Dam, Rob M; Yuan, Jian-Min; Koh, Woon-Puay; Liu, Jianjun; Goh, Daniel Yt; Wang, Xu; Teo, Yik-Ying; Friedlander, Yechiel; Heng, Chew-Kiat
2017-01-01
Background Although numerous phenotype-based equations for predicting risk of 'hard' coronary heart disease are available, data on the utility of genetic information for such risk prediction are lacking in Chinese populations. Design Case-control study nested within the Singapore Chinese Health Study. Methods A total of 1306 subjects comprising 836 men (267 incident cases and 569 controls) and 470 women (128 incident cases and 342 controls) were included. A Genetic Risk Score comprising 156 single nucleotide polymorphisms that have been robustly associated with coronary heart disease or its risk factors (p < 5 × 10^-8) in at least two independent cohorts of genome-wide association studies was built. For each gender, three base models were used: the recalibrated Adult Treatment Panel III (ATPIII) model (M1); the ATPIII model fitted using Singapore Chinese Health Study data (M2); and M3: M2 + C-reactive protein + creatinine. Results The Genetic Risk Score was significantly associated with incident 'hard' coronary heart disease (p for men: 1.70 × 10^-10 to 1.73 × 10^-9; p for women: 0.001). The inclusion of the Genetic Risk Score in the prediction models improved discrimination in both genders (c-statistics: 0.706-0.722 vs. 0.663-0.695 from base models for men; 0.788-0.790 vs. 0.765-0.773 for women). In addition, the inclusion of the Genetic Risk Score also improved risk classification, with a net gain of cases being reclassified to higher risk categories (men: 12.4%-16.5%; women: 10.2% (M3)), while not significantly reducing the classification accuracy in controls. Conclusions The Genetic Risk Score is an independent predictor of incident 'hard' coronary heart disease in our ethnic Chinese population. Inclusion of genetic factors into coronary heart disease prediction models could significantly improve risk prediction performance.
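A genetic risk score and the c-statistic used to compare discrimination can be sketched as follows. The score sums risk-allele counts (0, 1 or 2 per SNP); whether and how the study weighted the 156 SNPs is not stated in the abstract, so the optional `weights` argument (for example per-allele log odds ratios) is an assumption, and the genotypes below are hypothetical.

```python
def genetic_risk_score(genotypes, weights=None):
    """Sum of risk-allele counts (0, 1 or 2 per SNP), optionally weighted per SNP."""
    if weights is None:
        weights = [1.0] * len(genotypes)
    return sum(w * g for w, g in zip(weights, genotypes))


def c_statistic(scores, is_case):
    """Probability that a random case scores higher than a random control; ties count 1/2."""
    cases = [s for s, c in zip(scores, is_case) if c]
    controls = [s for s, c in zip(scores, is_case) if not c]
    wins = 0.0
    for sc in cases:
        for sn in controls:
            if sc > sn:
                wins += 1.0
            elif sc == sn:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical genotypes for four subjects at three SNPs
people = [[2, 1, 1], [1, 2, 2], [0, 1, 0], [1, 0, 1]]
status = [True, True, False, False]
grs = [genetic_risk_score(g) for g in people]
auc = c_statistic(grs, status)
```

In practice the score would enter a risk model alongside the clinical covariates, and the c-statistic would be computed on the model's predicted risks rather than on the raw score.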
NASA Astrophysics Data System (ADS)
Mundher Yaseen, Zaher; Abdulmohsin Afan, Haitham; Tran, Minh-Tung
2018-04-01
It is well established that beam-column joints are critical points in reinforced concrete (RC) structures subjected to fluctuating loads. In this study, a novel hybrid data-intelligence model was developed to predict the joint shear behavior of exterior beam-column structural frames. The hybrid model, called the genetic algorithm integrated with a deep learning neural network (GA-DLNN), uses the genetic algorithm as a prior modelling phase for input approximation and the DLNN as the predictive model. To demonstrate this structural problem, experimental data defining the dimensional and specimen properties were collected from the literature. The findings demonstrate the effectiveness of the hybrid GA-DLNN in modelling the beam-column joint shear problem. In addition, accurate predictions were achieved with fewer input variables owing to the evolutionary phase.
Emura, Takeshi; Nakatochi, Masahiro; Matsui, Shigeyuki; Michimae, Hirofumi; Rondeau, Virginie
2017-01-01
Developing a personalized risk prediction model of death is fundamental for improving patient care and touches on the realm of personalized medicine. The increasing availability of genomic information and large-scale meta-analytic data sets for clinicians has motivated the extension of traditional survival prediction based on the Cox proportional hazards model. The aim of our paper is to develop a personalized risk prediction formula for death according to genetic factors and dynamic tumour progression status based on meta-analytic data. To this end, we extend the existing joint frailty-copula model to a model allowing for high-dimensional genetic factors. In addition, we propose a dynamic prediction formula to predict death given tumour progression events possibly occurring after treatment or surgery. For clinical use, we implement the computation software of the prediction formula in the joint.Cox R package. We also develop a tool to validate the performance of the prediction formula by assessing the prediction error. We illustrate the method with the meta-analysis of individual patient data on ovarian cancer patients.
Han, Lide; Yang, Jian; Zhu, Jun
2007-06-01
A genetic model was proposed for simultaneously analyzing genetic effects of nuclear, cytoplasm, and nuclear-cytoplasmic interaction (NCI) as well as their genotype by environment (GE) interaction for quantitative traits of diploid plants. In the model, the NCI effects were further partitioned into additive and dominance nuclear-cytoplasmic interaction components. Mixed linear model approaches were used for statistical analysis. On the basis of diallel cross designs, Monte Carlo simulations showed that the genetic model was robust for estimating variance components under several situations without specific effects. Random genetic effects were predicted by an adjusted unbiased prediction (AUP) method. Data on four quantitative traits (boll number, lint percentage, fiber length, and micronaire) in Upland cotton (Gossypium hirsutum L.) were analyzed as a worked example to show the effectiveness of the model.
Analysis of conditional genetic effects and variance components in developmental genetics.
Zhu, J
1995-12-01
A genetic model with additive-dominance effects and genotype x environment interactions is presented for quantitative traits with time-dependent measures. The genetic model for phenotypic means at time t conditional on phenotypic means measured at previous time (t-1) is defined. Statistical methods are proposed for analyzing conditional genetic effects and conditional genetic variance components. Conditional variances can be estimated by minimum norm quadratic unbiased estimation (MINQUE) method. An adjusted unbiased prediction (AUP) procedure is suggested for predicting conditional genetic effects. A worked example from cotton fruiting data is given for comparison of unconditional and conditional genetic variances and additive effects.
NASA Astrophysics Data System (ADS)
Guruprasad, R.; Behera, B. K.
2015-10-01
Quantitative prediction of fabric mechanical properties is an essential requirement for design engineering of textile and apparel products. In this work, the possibility of predicting the bending rigidity of cotton woven fabrics has been explored with the application of an Artificial Neural Network (ANN) and two hybrid methodologies, namely Neuro-genetic modeling and Adaptive Neuro-Fuzzy Inference System (ANFIS) modeling. For this purpose, a set of cotton woven grey fabrics was desized, scoured and relaxed. The fabrics were then conditioned and tested for bending properties. With the database thus created, a neural network model was first developed using back propagation as the learning algorithm. The second model was developed by applying a hybrid learning strategy, in which a genetic algorithm was first used as a learning algorithm to optimize the number of neurons and connection weights of the neural network. The GA-optimized network structure was then further trained using the back propagation algorithm. In the third model, an ANFIS modeling approach was used to map the input-output data. The prediction performances of the models were compared and a sensitivity analysis was reported. The results show that the predictions of the neuro-genetic and ANFIS models were better than those of the back propagation neural network model.
Mixed model approaches for diallel analysis based on a bio-model.
Zhu, J; Weir, B S
1996-12-01
A MINQUE(1) procedure, which is minimum norm quadratic unbiased estimation (MINQUE) method with 1 for all the prior values, is suggested for estimating variance and covariance components in a bio-model for diallel crosses. Unbiasedness and efficiency of estimation were compared for MINQUE(1), restricted maximum likelihood (REML) and MINQUE theta which has parameter values for the prior values. MINQUE(1) is almost as efficient as MINQUE theta for unbiased estimation of genetic variance and covariance components. The bio-model is efficient and robust for estimating variance and covariance components for maternal and paternal effects as well as for nuclear effects. A procedure of adjusted unbiased prediction (AUP) is proposed for predicting random genetic effects in the bio-model. The jack-knife procedure is suggested for estimation of sampling variances of estimated variance and covariance components and of predicted genetic effects. Worked examples are given for estimation of variance and covariance components and for prediction of genetic merits.
Hernando, Barbara; Ibañez, Maria Victoria; Deserio-Cuesta, Julio Alberto; Soria-Navarro, Raquel; Vilar-Sastre, Inca; Martinez-Cadenas, Conrado
2018-03-01
Prediction of human pigmentation traits, one of the most differentiable externally visible characteristics among individuals, from biological samples represents a useful tool in the field of forensic DNA phenotyping. In spite of freckling being a relatively common pigmentation characteristic in Europeans, little is known about the genetic basis of this largely genetically determined phenotype in southern European populations. In this work, we explored the predictive capacity of eight freckle and sunlight sensitivity-related genes in 458 individuals (266 non-freckled controls and 192 freckled cases) from Spain. Four loci were associated with freckling (MC1R, IRF4, ASIP and BNC2), and female sex was also found to be a predictive factor for having a freckling phenotype in our population. After identifying the most informative genetic variants responsible for human ephelides occurrence in our sample set, we developed a DNA-based freckle prediction model using a multivariate regression approach. Once developed, the capabilities of the prediction model were tested by a repeated 10-fold cross-validation approach. The proportion of correctly predicted individuals using the DNA-based freckle prediction model was 74.13%. The implementation of sex into the DNA-based freckle prediction model slightly improved the overall prediction accuracy by 2.19% (76.32%). Further evaluation of the newly-generated prediction model was performed by assessing the model's performance in a new cohort of 212 Spanish individuals, reaching a classification success rate of 74.61%. Validation of this prediction model may be carried out in larger populations, including samples from different European populations. Further research to validate and improve this newly-generated freckle prediction model will be needed before its forensic application. 
Together with DNA tests already validated for eye and hair colour prediction, this freckle prediction model may lead to a substantially more detailed physical description of unknown individuals from DNA found at the crime scene. Copyright © 2017 Elsevier B.V. All rights reserved.
Vallat, Laurent; Kemper, Corey A; Jung, Nicolas; Maumy-Bertrand, Myriam; Bertrand, Frédéric; Meyer, Nicolas; Pocheville, Arnaud; Fisher, John W; Gribben, John G; Bahram, Seiamak
2013-01-08
Cellular behavior is sustained by genetic programs that are progressively disrupted in pathological conditions--notably, cancer. High-throughput gene expression profiling has been used to infer statistical models describing these cellular programs, and development is now needed to guide orientated modulation of these systems. Here we develop a regression-based model to reverse-engineer a temporal genetic program, based on relevant patterns of gene expression after cell stimulation. This method integrates the temporal dimension of biological rewiring of genetic programs and enables the prediction of the effect of targeted gene disruption at the system level. We tested the performance accuracy of this model on synthetic data before reverse-engineering the response of primary cancer cells to a proliferative (protumorigenic) stimulation in a multistate leukemia biological model (i.e., chronic lymphocytic leukemia). To validate the ability of our method to predict the effects of gene modulation on the global program, we performed an intervention experiment on a targeted gene. Comparison of the predicted and observed gene expression changes demonstrates the possibility of predicting the effects of a perturbation in a gene regulatory network, a first step toward an orientated intervention in a cancer cell genetic program.
USDA-ARS's Scientific Manuscript database
Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait predicti...
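One common way to model all marker effects simultaneously is ridge regression on genotypes coded 0/1/2 (as in RR-BLUP). The truncated abstract does not say which model it uses, so the sketch below is an illustration under that assumption: markers and phenotype are centred, and effects solve (X'X + λI)β = X'(y - ȳ) with a plain Gauss-Jordan solver. The shrinkage value `lam` is arbitrary here; in practice it is tied to the ratio of residual to marker variance.

```python
def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    m = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(m):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * bb for a, bb in zip(M[r], M[c])]
    return [M[i][m] / M[i][i] for i in range(m)]


def ridge_marker_effects(X, y, lam):
    """Jointly estimate all marker effects: (X'X + lam*I) beta = X'(y - mean(y))."""
    n, p = len(X), len(X[0])
    mu = sum(y) / n
    yc = [v - mu for v in y]
    # Centre each marker column so the intercept is absorbed by mu
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    A = [[sum(Xc[i][j] * Xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    rhs = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    return mu, means, solve(A, rhs)


def predict(genotypes, mu, means, beta):
    return mu + sum(b * (g - m) for b, g, m in zip(beta, genotypes, means))


# Hypothetical genotypes (0/1/2 copies of the reference allele) and noiseless trait values
X = [[0, 0], [1, 0], [2, 1], [0, 2], [1, 1], [2, 2]]
y = [1.0 + 0.5 * g1 - 0.3 * g2 for g1, g2 in X]
mu, means, beta = ridge_marker_effects(X, y, lam=1e-9)
```

With a near-zero penalty the true effects are recovered; increasing `lam` shrinks all effects toward zero, which is what keeps the model stable when markers far outnumber samples.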
Ensemble Learning of QTL Models Improves Prediction of Complex Traits
Bian, Yang; Holland, James B.
2015-01-01
Quantitative trait locus (QTL) models can provide useful insights into trait genetic architecture because of their straightforward interpretability but are less useful for genetic prediction because of the difficulty in including the effects of numerous small effect loci without overfitting. Tight linkage between markers introduces near collinearity among marker genotypes, complicating the detection of QTL and estimation of QTL effects in linkage mapping, and this problem is exacerbated by very high density linkage maps. Here we developed a thinning and aggregating (TAGGING) method as a new ensemble learning approach to QTL mapping. TAGGING reduces collinearity problems by thinning dense linkage maps, maintains aspects of marker selection that characterize standard QTL mapping, and by ensembling, incorporates information from many more marker-trait associations than traditional QTL mapping. The objective of TAGGING was to improve prediction power compared with QTL mapping while also providing more specific insights into genetic architecture than genome-wide prediction models. TAGGING was compared with standard QTL mapping using cross validation of empirical data from the maize (Zea mays L.) nested association mapping population. TAGGING-assisted QTL mapping substantially improved prediction ability for both biparental and multifamily populations by reducing both the variance and bias in prediction. Furthermore, an ensemble model combining predictions from TAGGING-assisted QTL and infinitesimal models improved prediction abilities over the component models, indicating some complementarity between model assumptions and suggesting that some trait genetic architectures involve a mixture of a few major QTL and polygenic effects. PMID:26276383
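The thinning-and-aggregating idea can be illustrated with a toy sketch. Several assumptions stand in for the published method: maps are thinned by taking every `step`-th marker from each possible offset, each thinned map gets a hypothetical one-marker "QTL model" (the marker with the lowest residual sum of squares), and the ensemble simply averages the per-map predictions. Real TAGGING runs full QTL mapping on each thinned map.

```python
def thinned_maps(n_markers, step):
    """Offset o keeps markers o, o + step, o + 2*step, ... so adjacent markers are split up."""
    return [list(range(offset, n_markers, step)) for offset in range(step)]


def fit_best_marker(X, y, markers):
    """Hypothetical stand-in for QTL mapping on one thinned map: keep the single marker
    with the smallest residual sum of squares and return a predictor based on it."""
    n = len(y)
    my = sum(y) / n
    best = None
    for j in markers:
        col = [row[j] for row in X]
        mx = sum(col) / n
        sxx = sum((v - mx) ** 2 for v in col)
        if sxx == 0:  # skip monomorphic markers
            continue
        slope = sum((v - mx) * (t - my) for v, t in zip(col, y)) / sxx
        rss = sum((t - my - slope * (v - mx)) ** 2 for v, t in zip(col, y))
        if best is None or rss < best[0]:
            best = (rss, j, mx, slope)
    _, j_best, mx_best, slope_best = best
    return lambda row: my + slope_best * (row[j_best] - mx_best)


def aggregate(predictions):
    """Ensemble step: average the per-map predictions for each individual."""
    n_models = len(predictions)
    return [sum(col) / n_models for col in zip(*predictions)]


def tagging_predict(X_train, y_train, X_new, step):
    models = [fit_best_marker(X_train, y_train, m)
              for m in thinned_maps(len(X_train[0]), step)]
    return aggregate([[model(row) for row in X_new] for model in models])


# Hypothetical training data: 6 individuals x 4 markers; trait depends on markers 0 and 3
X_train = [[0, 1, 2, 0], [1, 1, 0, 1], [2, 0, 1, 2], [0, 2, 1, 1], [1, 0, 2, 0], [2, 1, 0, 2]]
y_train = [r[0] + 0.5 * r[3] for r in X_train]
preds = tagging_predict(X_train, y_train, [[1, 0, 1, 1], [2, 2, 0, 0]], step=2)
```

Because adjacent markers land in different thinned maps, each component model sees less collinearity, while averaging pools associations found on every map.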
van Strien, Maarten J; Keller, Daniela; Holderegger, Rolf; Ghazoul, Jaboury; Kienast, Felix; Bolliger, Janine
2014-03-01
For conservation managers, it is important to know whether landscape changes lead to increasing or decreasing gene flow. Although the discipline of landscape genetics assesses the influence of landscape elements on gene flow, no studies have yet used landscape-genetic models to predict gene flow resulting from landscape change. A species that has already been severely affected by landscape change is the large marsh grasshopper (Stethophyma grossum), which inhabits moist areas in fragmented agricultural landscapes in Switzerland. From transects drawn between all population pairs within maximum dispersal distance (< 3 km), we calculated several measures of landscape composition as well as some measures of habitat configuration. Additionally, a complete sampling of all populations in our study area allowed incorporating measures of population topology. These measures together with the landscape metrics formed the predictor variables in linear models with gene flow as response variable (F(ST) and mean pairwise assignment probability). With a modified leave-one-out cross-validation approach, we selected the model with the highest predictive accuracy. With this model, we predicted gene flow under several landscape-change scenarios, which simulated construction, rezoning or restoration projects, and the establishment of a new population. For some landscape-change scenarios, significant increase or decrease in gene flow was predicted, while for others little change was forecast. Furthermore, we found that the measures of population topology strongly increase model fit in landscape genetic analysis. This study demonstrates the use of predictive landscape-genetic models in conservation and landscape planning.
Genetic models of homosexuality: generating testable predictions
Gavrilets, Sergey; Rice, William R
2006-01-01
Homosexuality is a common occurrence in humans and other species, yet its genetic and evolutionary basis is poorly understood. Here, we formulate and study a series of simple mathematical models for the purpose of predicting empirical patterns that can be used to determine the form of selection that leads to polymorphism of genes influencing homosexuality. Specifically, we develop theory to make contrasting predictions about the genetic characteristics of genes influencing homosexuality including: (i) chromosomal location, (ii) dominance among segregating alleles and (iii) effect sizes that distinguish between the two major models for their polymorphism: the overdominance and sexual antagonism models. We conclude that the measurement of the genetic characteristics of quantitative trait loci (QTLs) found in genomic screens for genes influencing homosexuality can be highly informative in resolving the form of natural selection maintaining their polymorphism. PMID:17015344
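As background for the overdominance model, the textbook single-locus case can be sketched: with genotype fitnesses 1 - s (AA), 1 (Aa) and 1 - t (aa), heterozygote advantage keeps both alleles at the stable frequency p* = t / (s + t). This is a standard illustration, not the authors' own models, which additionally consider chromosomal location, dominance and sex-specific selection; the values of s and t are hypothetical.

```python
def next_p(p, s, t):
    """One generation of viability selection on allele A under overdominance."""
    q = 1.0 - p
    w_AA, w_Aa, w_aa = 1.0 - s, 1.0, 1.0 - t
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa  # mean fitness
    return p * (p * w_AA + q * w_Aa) / w_bar


def iterate(p, s, t, generations):
    for _ in range(generations):
        p = next_p(p, s, t)
    return p


s, t = 0.1, 0.3
p_star = t / (s + t)  # analytic equilibrium, 0.75 here
p_low = iterate(0.05, s, t, 2000)   # start with A rare
p_high = iterate(0.95, s, t, 2000)  # start with A common
```

Both starting frequencies converge to the same interior equilibrium, which is why overdominance predicts a stable polymorphism rather than fixation.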
Weighted Genetic Risk Scores and Prediction of Weight Gain in Solid Organ Transplant Populations
Saigi-Morgui, Núria; Quteineh, Lina; Bochud, Pierre-Yves; Crettol, Severine; Kutalik, Zoltán; Wojtowicz, Agnieszka; Bibert, Stéphanie; Beckmann, Sonja; Mueller, Nicolas J; Binet, Isabelle; van Delden, Christian; Steiger, Jürg; Mohacsi, Paul; Stirnimann, Guido; Soccal, Paola M.; Pascual, Manuel; Eap, Chin B
2016-01-01
Background Polygenic obesity in Solid Organ Transplant (SOT) populations is considered a risk factor for the development of metabolic abnormalities and graft survival. Few studies to date have studied the genetics of weight gain in SOT recipients. We aimed to determine whether weighted genetic risk scores (w-GRS) integrating genetic polymorphisms from GWAS studies (SNP group#1 and SNP group#2) and from Candidate Gene studies (SNP group#3) influence BMI in SOT populations and if they predict ≥10% weight gain (WG) one year after transplantation. To do so, two samples (nA = 995, nB = 156) were obtained from naturalistic studies and three w-GRS were constructed and tested for association with BMI over time. Prediction of 10% WG at one year after transplantation was assessed with models containing genetic and clinical factors. Results w-GRS were associated with BMI in sample A and B combined (BMI increased by 0.14 and 0.11 units per additional risk allele in SNP group#1 and #2, respectively, p-values<0.008). w-GRS of SNP group#3 showed an effect of 0.01 kg/m2 per additional risk allele when combining sample A and B (p-value 0.04). Models with genetic factors performed better than models without in predicting 10% WG at one year after transplantation. Conclusions This is the first study in SOT evaluating extensively the association of w-GRS with BMI and the influence of clinical and genetic factors on 10% of WG one year after transplantation, showing the importance of integrating genetic factors in the final model. Genetics of obesity among SOT recipients remains an important issue and can contribute to treatment personalization and prediction of WG after transplantation. PMID:27788139
Cross-validation analysis for genetic evaluation models for ranking in endurance horses.
García-Ballesteros, S; Varona, L; Valera, M; Gutiérrez, J P; Cervantes, I
2018-01-01
The ranking trait was used as a selection criterion for competition horses to estimate racing performance. In the literature the most common approaches to estimate breeding values are linear or threshold statistical models. However, recent studies have shown that a Thurstonian approach was able to fit the race effect (the competitive level of the horses that participate in the same race), thus suggesting a better prediction accuracy of breeding values for the ranking trait. The aim of this study was to compare the predictability of linear, threshold and Thurstonian approaches for genetic evaluation of ranking in endurance horses. For this purpose, eight genetic models were used for each approach with different combinations of random effects: rider, rider-horse interaction and environmental permanent effect. All genetic models included gender, age and race as systematic effects. The database contained 4065 ranking records from 966 horses, and the pedigree contained 8733 animals (47% Arabian horses), with an estimated heritability of around 0.10 for the ranking trait. The prediction ability of the models for racing performance was evaluated using a cross-validation approach. The average correlation between real and predicted performances across genetic models was around 0.25 for threshold, 0.58 for linear and 0.60 for Thurstonian approaches. Although no significant differences were found between models within approaches, the best genetic model included: the rider and rider-horse random effects for the threshold approach, only rider and environmental permanent effects for the linear approach, and all random effects for the Thurstonian approach. The absolute correlations of predicted breeding values among models were highest between threshold and Thurstonian: 0.90, 0.91 and 0.88 for all animals, the top 20% and the top 5% best animals, respectively. For rank correlations these figures were 0.85, 0.84 and 0.86. The lowest values were those between linear and threshold approaches (0.65, 0.62 and 0.51).
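The model comparisons above rely on Pearson ("absolute") and Spearman (rank) correlations between breeding values predicted by two models, computed for all animals and for top subsets. A minimal sketch follows; defining the top subset as the animals ranked highest by the first model is an assumption, since the abstract does not say how the subsets were formed, and the breeding values are hypothetical.

```python
import statistics


def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def ranks(v):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(a, b):
    return pearson(ranks(a), ranks(b))


def top_fraction(a, b, frac):
    """Keep the animals ranked in the top `frac` by model a (hypothetical selection rule)."""
    k = max(2, round(len(a) * frac))
    idx = sorted(range(len(a)), key=lambda i: a[i], reverse=True)[:k]
    return [a[i] for i in idx], [b[i] for i in idx]


# Hypothetical breeding values from two models for the same ten horses
ebv_m1 = [0.8, -0.2, 0.5, 1.1, -0.7, 0.0, 0.3, -1.0, 0.9, 0.2]
ebv_m2 = [0.7, -0.1, 0.6, 0.9, -0.5, 0.1, 0.1, -0.8, 1.0, 0.3]
r_all, rho_all = pearson(ebv_m1, ebv_m2), spearman(ebv_m1, ebv_m2)
top1, top2 = top_fraction(ebv_m1, ebv_m2, 0.2)
```

Rank correlations matter here because selection decisions depend on the ordering of animals rather than on the breeding values themselves.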
In conclusion, the Thurstonian approach is recommended for the routine genetic evaluations for ranking in endurance horses.
Visscher, H; Ross, C J D; Rassekh, S R; Sandor, G S S; Caron, H N; van Dalen, E C; Kremer, L C; van der Pal, H J; Rogers, P C; Rieder, M J; Carleton, B C; Hayden, M R
2013-08-01
The use of anthracyclines as effective antineoplastic drugs is limited by the occurrence of cardiotoxicity. Multiple genetic variants predictive of anthracycline-induced cardiotoxicity (ACT) in children were recently identified. The current study aimed to assess replication of these findings in an independent cohort of children. Twenty-three variants were tested for association with ACT in an independent cohort of 218 patients. Predictive models including genetic and clinical risk factors were constructed in the original cohort and assessed in the current replication cohort. We confirmed the association of rs17863783 in UGT1A6 and ACT in the replication cohort (P = 0.0062, odds ratio (OR) 7.98). Additional evidence for association of rs7853758 (P = 0.058, OR 0.46) and rs885004 (P = 0.058, OR 0.42) in SLC28A3 was found (combined P = 1.6 × 10^-5 and P = 3.0 × 10^-5, respectively). A previously constructed prediction model did not significantly improve risk prediction in the replication cohort over clinical factors alone. However, an improved prediction model constructed using replicated genetic variants as well as clinical factors discriminated significantly better between cases and controls than clinical factors alone in both the original (AUC 0.77 vs. 0.68, P = 0.0031) and replication cohort (AUC 0.77 vs. 0.69, P = 0.060). We validated genetic variants in two genes predictive of ACT in an independent cohort. A prediction model combining replicated genetic variants as well as clinical risk factors might be able to identify high- and low-risk patients who could benefit from alternative treatment options. Copyright © 2013 Wiley Periodicals, Inc.
Xu, Z C; Zhu, J
2000-01-01
According to the double-cross mating design and using principles of Cockerham's general genetic model, a genetic model with additive, dominance and epistatic effects (ADAA model) was proposed for the analysis of agronomic traits. Components of genetic effects were derived for different generations. Monte Carlo simulation was conducted for analyzing the ADAA model and its reduced AD model by using different generations. It was indicated that genetic variance components could be estimated without bias by MINQUE(1) method and genetic effects could be predicted effectively by AUP method; at least three generations (including parent, F1 of single cross and F1 of double-cross) were necessary for analyzing the ADAA model and only two generations (including parent and F1 of double-cross) were enough for the reduced AD model. When epistatic effects were taken into account, a new approach for predicting the heterosis of agronomic traits of double-crosses was given on the basis of unbiased prediction of genotypic merits of parents and their crosses. In addition, genotype x environment interaction effects and interaction heterosis due to G x E interaction were discussed briefly.
Latent spatial models and sampling design for landscape genetics
Ephraim M. Hanks; Melvin B. Hooten; Steven T. Knick; Sara J. Oyler-McCance; Jennifer A. Fike; Todd B. Cross; Michael K. Schwartz
2016-01-01
We propose a spatially-explicit approach for modeling genetic variation across space and illustrate how this approach can be used to optimize spatial prediction and sampling design for landscape genetic data. We propose a multinomial data model for categorical microsatellite allele data commonly used in landscape genetic studies, and introduce a latent spatial...
Bayesian Genomic Prediction with Genotype × Environment Interaction Kernel Models
Cuevas, Jaime; Crossa, José; Montesinos-López, Osval A.; Burgueño, Juan; Pérez-Rodríguez, Paulino; de los Campos, Gustavo
2016-01-01
The phenomenon of genotype × environment (G × E) interaction in plant breeding decreases selection accuracy, thereby negatively affecting genetic gains. Several genomic prediction models incorporating G × E have been recently developed and used in genomic selection of plant breeding programs. Genomic prediction models for assessing multi-environment G × E interaction are extensions of a single-environment model, and have advantages and limitations. In this study, we propose two multi-environment Bayesian genomic models: the first model considers genetic effects (u) that can be assessed by the Kronecker product of variance–covariance matrices of genetic correlations between environments and genomic kernels through markers under two kernel methods, linear (genomic best linear unbiased predictors, GBLUP) and Gaussian (Gaussian kernel, GK). The other model has the same genetic component as the first model (u) plus an extra component, f, that captures random effects between environments that were not captured by the random effects u. We used five CIMMYT data sets (one maize and four wheat) that were previously used in different studies. Results show that models with G × E always have higher prediction ability than single-environment models, and the higher prediction ability of multi-environment models with u and f over the multi-environment model with only u occurred 85% of the time with GBLUP and 45% of the time with GK across the five data sets. The latter result indicated that including the random effect f is still beneficial for increasing prediction ability after adjusting by the random effect u. PMID:27793970
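The covariance structure of u described above can be sketched as the Kronecker product of an environment correlation matrix with a genomic kernel. The linear kernel is a GBLUP-style G = ZZ'/p on centred marker codes, and the Gaussian kernel is K = exp(-h d²/q) with q taken as the median off-diagonal squared distance; the centring, the median scaling, the bandwidth h, the environment-major ordering and all numbers are illustrative assumptions rather than the paper's exact specification.

```python
import math
import statistics


def linear_kernel(X):
    """GBLUP-style relationship matrix G = Z Z' / p for centred marker codes (assumption)."""
    n, p = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(p)]
    Z = [[r[j] - means[j] for j in range(p)] for r in X]
    return [[sum(a * b for a, b in zip(Z[i], Z[k])) / p for k in range(n)] for i in range(n)]


def gaussian_kernel(X, h=1.0):
    """K_ik = exp(-h * d_ik^2 / q), q = median off-diagonal squared distance (assumption)."""
    n = len(X)
    d2 = [[sum((a - b) ** 2 for a, b in zip(X[i], X[k])) for k in range(n)] for i in range(n)]
    q = statistics.median(d2[i][k] for i in range(n) for k in range(n) if i != k)
    return [[math.exp(-h * d2[i][k] / q) for k in range(n)] for i in range(n)]


def kron(A, B):
    """Kronecker product A ⊗ B: entry (i*len(B) + k, j*len(B[0]) + l) equals A[i][j] * B[k][l]."""
    return [[a * b for a in rowA for b in rowB] for rowA in A for rowB in B]


# Hypothetical: 3 lines x 3 markers (0/1/2), genetic correlation 0.6 between 2 environments
X = [[0, 1, 2], [1, 1, 0], [2, 0, 1]]
sigma_e = [[1.0, 0.6], [0.6, 1.0]]
G = linear_kernel(X)
K = gaussian_kernel(X)
cov_u = kron(sigma_e, K)  # 6 x 6 covariance of u, ordered environment by environment
```

Swapping `K` for `G` in the last line gives the GBLUP version of the same multi-environment structure.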
Bayesian Genomic Prediction with Genotype × Environment Interaction Kernel Models.
Cuevas, Jaime; Crossa, José; Montesinos-López, Osval A; Burgueño, Juan; Pérez-Rodríguez, Paulino; de Los Campos, Gustavo
2017-01-05
The phenomenon of genotype × environment (G × E) interaction in plant breeding decreases selection accuracy, thereby negatively affecting genetic gains. Several genomic prediction models incorporating G × E have recently been developed and used in genomic selection in plant breeding programs. Genomic prediction models for assessing multi-environment G × E interaction are extensions of a single-environment model, and have advantages and limitations. In this study, we propose two multi-environment Bayesian genomic models: the first model considers genetic effects (u) that can be assessed by the Kronecker product of variance-covariance matrices of genetic correlations between environments and genomic kernels through markers under two kernel methods, linear (genomic best linear unbiased predictors, GBLUP) and Gaussian (Gaussian kernel, GK). The other model has the same genetic component as the first model (u) plus an extra component, f, that captures random effects between environments that were not captured by the random effects u. We used five CIMMYT data sets (one maize and four wheat) that were previously used in different studies. Results show that models with G × E always had higher prediction ability than single-environment models, and the multi-environment model with u and f outperformed the multi-environment model with only u 85% of the time with GBLUP and 45% of the time with GK across the five data sets. The latter result indicates that including the random effect f is still beneficial for increasing prediction ability after adjusting for the random effect u. Copyright © 2017 Cuevas et al.
Evolving hard problems: Generating human genetics datasets with a complex etiology.
Himmelstein, Daniel S; Greene, Casey S; Moore, Jason H
2011-07-07
A goal of human genetics is to discover genetic factors that influence individuals' susceptibility to common diseases. Most common diseases are thought to result from the joint failure of two or more interacting components instead of single component failures. This greatly complicates both the task of selecting informative genetic variants and the task of modeling interactions between them. We and others have previously developed algorithms to detect and model the relationships between these genetic factors and disease. Previously these methods have been evaluated with datasets simulated according to pre-defined genetic models. Here we develop and evaluate a model-free evolution strategy to generate datasets that display a complex relationship between individual genotype and disease susceptibility. We show that this model-free approach is capable of generating a diverse array of datasets with distinct gene-disease relationships for an arbitrary interaction order and sample size. We specifically generate eight hundred Pareto fronts, one for each independent run of our algorithm. In each run the predictiveness of single genetic variants and pairs of genetic variants is minimized, while the predictiveness of third-, fourth-, or fifth-order combinations is maximized. Two hundred runs of the algorithm are further dedicated to creating datasets with predictive fourth- or fifth-order interactions and minimized lower-level effects. This method and the resulting datasets will allow the capabilities of novel methods to be tested without pre-specified genetic models, so that researchers can evaluate which methods will succeed on human genetics problems where the model is not known in advance. We further make freely available to the community the entire Pareto-optimal front of datasets from each run so that novel methods may be rigorously evaluated. These 76,600 datasets are available from http://discovery.dartmouth.edu/model_free_data/.
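The notion of a Pareto front with one objective minimized (lower-order predictiveness) and one maximized (higher-order predictiveness) can be illustrated with a small non-dominated filter. This is a generic sketch, not the authors' evolution strategy: the two-number score per candidate dataset and the example values are hypothetical.

```python
from typing import List, Tuple

# (lower-order predictiveness, higher-order predictiveness); values are hypothetical
Point = Tuple[float, float]

def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on both objectives and strictly better
    on one. Objective 0 is minimised, objective 1 is maximised."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better

def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the non-dominated points, in their original order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical scores for five candidate datasets
candidates = [(0.10, 0.80), (0.05, 0.70), (0.20, 0.90), (0.15, 0.60), (0.05, 0.65)]
front = pareto_front(candidates)
```

Here `(0.15, 0.60)` and `(0.05, 0.65)` are dropped because another candidate is no worse on both objectives and strictly better on at least one; the remaining three trade off the two objectives.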
Kumar, Satish; Molloy, Claire; Muñoz, Patricio; Daetwyler, Hans; Chagné, David; Volz, Richard
2015-01-01
Nonadditive genetic effects may contribute substantially to the total genetic variation of phenotypes, so estimates of both additive and nonadditive effects are desirable for breeding and selection purposes. Our main objectives were to estimate additive, dominance and epistatic variances of apple (Malus × domestica Borkh.) phenotypes using relationship matrices constructed from genome-wide dense single nucleotide polymorphism (SNP) markers, and to compare the accuracy of genomic predictions using genomic best linear unbiased prediction models with or without nonadditive genetic effects. A set of 247 clonally replicated individuals was assessed for six fruit quality traits at two sites, and also genotyped using an Illumina 8K SNP array. Across several fruit quality traits, the additive, dominance, and epistatic effects contributed about 30%, 16%, and 19%, respectively, to the total phenotypic variance. Models ignoring nonadditive components yielded upwardly biased estimates of additive variance (heritability) for all traits in this study. The accuracy of genomic predicted genetic values (GEGV) varied from about 0.15 to 0.35 for various traits, and was almost identical for models with or without nonadditive effects. However, models including nonadditive genetic effects further reduced the bias of GEGV. Between-site genotypic correlations were high (>0.85) for all traits, and genotype × site interaction accounted for <10% of the phenotypic variability. The accuracy of prediction, when the validation set was present at only one site, was generally similar for both sites, and varied from about 0.50 to 0.85. The prediction accuracies were strongly influenced by trait heritability and by genetic relatedness between the training and validation families. PMID:26497141
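The abstract above does not say exactly how its additive, dominance and epistatic relationship matrices were built. A minimal sketch of one widely used set of constructions, assumed here for illustration only, is: a VanRaden-style additive matrix, a dominance matrix from a genotypic coding (-2q², 2pq, -2p² for 2, 1, 0 copies of the counted allele), and an additive-by-additive matrix as a Hadamard product of the additive matrix with itself.

```python
import numpy as np

def additive_g(M):
    """Additive relationship matrix (VanRaden-style) from 0/1/2 genotypes."""
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def dominance_d(M):
    """Dominance relationship matrix using an assumed genotypic coding:
    -2q^2, 2pq, -2p^2 for 2, 1, 0 copies of the counted allele."""
    p = M.mean(axis=0) / 2.0
    q = 1.0 - p
    W = np.where(M == 2, -2.0 * q ** 2,
                 np.where(M == 1, 2.0 * p * q, -2.0 * p ** 2))
    return W @ W.T / np.sum((2.0 * p * q) ** 2)

def epistatic_aa(G):
    """Additive-by-additive matrix as a Hadamard (element-wise) product,
    rescaled so that its mean diagonal is 1."""
    H = G * G
    return H / (np.trace(H) / H.shape[0])

rng = np.random.default_rng(3)
M = rng.integers(0, 3, size=(10, 60)).astype(float)   # hypothetical genotypes
A = additive_g(M)
D = dominance_d(M)
AA = epistatic_aa(A)
```

In a mixed model each matrix would serve as the covariance structure of its own random effect, and the variance proportions reported above would come from fitting such a model; that fitting step is not shown here.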
Brown, Jason L; Weber, Jennifer J; Alvarado-Serrano, Diego F; Hickerson, Michael J; Franks, Steven J; Carnaval, Ana C
2016-01-01
Climate change is a widely accepted threat to biodiversity. Species distribution models (SDMs) are used to forecast whether and how species distributions may track these changes. Yet, SDMs generally fail to account for genetic and demographic processes, limiting population-level inferences. We still do not understand how predicted environmental shifts will impact the spatial distribution of genetic diversity within taxa. We propose a novel method that predicts spatially explicit genetic and demographic landscapes of populations under future climatic conditions. We use carefully parameterized SDMs as estimates of the spatial distribution of suitable habitats and landscape dispersal permeability under present-day, past, and future conditions. We use empirical genetic data and approximate Bayesian computation to estimate unknown demographic parameters. Finally, we employ these parameters to simulate realistic and complex models of responses to future environmental shifts. We contrast parameterized models under current and future landscapes to quantify the expected magnitude of change. We implement this framework on neutral genetic data available from Penstemon deustus. Our results predict that future climate change will result in geographically widespread declines in genetic diversity in this species. The extent of reduction will heavily depend on the continuity of population networks and deme sizes. To our knowledge, this is the first study to provide spatially explicit predictions of within-species genetic diversity using climatic, demographic, and genetic data. Our approach accounts for climatic, geographic, and biological complexity. This framework is promising for understanding evolutionary consequences of climate change, and guiding conservation planning. © 2016 Botanical Society of America.
A Population Genetics Model of Marker-Assisted Selection
Luo, Z. W.; Thompson, R.; Woolliams, J. A.
1997-01-01
A deterministic two-locus model was developed to predict genetic response to marker-assisted selection (MAS) in one generation and in multiple generations. Formulas were derived to relate linkage disequilibrium in a population to the proportion of additive genetic variance used by MAS, and in turn to an extra improvement in genetic response over phenotypic selection. Predictions of the response were compared with those from an infinite-loci model, and the factors affecting the efficiency of MAS were examined. Theoretical analyses of the present study revealed a nonlinear relationship between selection intensity and genetic response in MAS. In addition to the heritability of the trait and the proportion of the marker-associated genetic variance, the frequencies of the selectively favorable alleles at the two loci, one marker and one quantitative trait locus, were found to play an important role in determining both the short- and long-term efficiencies of MAS. The evolution of linkage disequilibrium, and thus of the genetic response over several generations, was predicted theoretically and examined by simulation. MAS dissipated the disequilibrium more quickly than drift alone. In some cases studied, the rate of dissipation was as large as would be expected, in the absence of selection, if the true recombination fraction were tripled. PMID:9215918
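The benchmark in the last sentence can be made concrete with the textbook recurrence for linkage disequilibrium under random mating in a large population with no selection, D_t = (1 - r)^t D_0. The sketch below only illustrates that selection-free baseline (it is not the MAS model of the paper), and the starting D and recombination fraction are hypothetical.

```python
def ld_after(d0: float, r: float, generations: int) -> float:
    """Disequilibrium left after `generations` rounds of random mating in an
    infinite population without selection: D_t = (1 - r)**t * D_0."""
    return d0 * (1.0 - r) ** generations

d0, r = 0.20, 0.05          # hypothetical starting D and recombination fraction
for t in (0, 1, 5, 10):
    # compare the true r with a tripled r (the comparison made in the abstract)
    print(t, round(ld_after(d0, r, t), 4), round(ld_after(d0, 3 * r, t), 4))
```

Because D shrinks by the factor (1 - r) each generation, tripling r noticeably speeds up the loss of disequilibrium, which is the scale against which the abstract measures the extra dissipation caused by MAS.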
Genetic and linguistic coevolution in Northern Island Melanesia.
Hunley, Keith; Dunn, Michael; Lindström, Eva; Reesink, Ger; Terrill, Angela; Healy, Meghan E; Koki, George; Friedlaender, Françoise R; Friedlaender, Jonathan S
2008-10-01
Recent studies have detailed a remarkable degree of genetic and linguistic diversity in Northern Island Melanesia. Here we utilize that diversity to examine two models of genetic and linguistic coevolution. The first model predicts that genetic and linguistic correspondences formed following population splits and isolation at the time of early range expansions into the region. The second is analogous to the genetic model of isolation by distance, and it predicts that genetic and linguistic correspondences formed through continuing genetic and linguistic exchange between neighboring populations. We tested the predictions of the two models by comparing observed and simulated patterns of genetic variation, genetic and linguistic trees, and matrices of genetic, linguistic, and geographic distances. The data consist of 751 autosomal microsatellites and 108 structural linguistic features collected from 33 Northern Island Melanesian populations. The results of the tests indicate that linguistic and genetic exchange have erased any evidence of a splitting and isolation process that might have occurred early in the settlement history of the region. The correlation patterns are also inconsistent with the predictions of the isolation by distance coevolutionary process in the larger Northern Island Melanesian region, but there is strong evidence for the process in the rugged interior of the largest island in the region (New Britain). There we found some of the strongest recorded correlations between genetic, linguistic, and geographic distances. We also found that, throughout the region, linguistic features have generally been less likely to diffuse across population boundaries than genes. The results from our study, based on exceptionally fine-grained data, show that local genetic and linguistic exchange are likely to obscure evidence of the early history of a region, and that language barriers do not particularly hinder genetic exchange. 
In contrast, global patterns may emphasize more ancient demographic events, including population splits associated with the early colonization of major world regions. PMID:18974871
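The study compares matrices of genetic, linguistic, and geographic distances. One standard way to test association between two such matrices is a Mantel permutation test; the abstract does not state which test was used, so the sketch below is a generic illustration with simulated (hypothetical) distances, not a reproduction of the analysis.

```python
import numpy as np

def mantel(D1, D2, permutations=999, seed=0):
    """Pearson Mantel test: correlation of the upper-triangle entries of two
    distance matrices, with a one-sided p-value from joint row/column
    permutations of D2."""
    n = D1.shape[0]
    iu = np.triu_indices(n, k=1)
    x = D1[iu]
    r_obs = np.corrcoef(x, D2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(permutations):
        perm = rng.permutation(n)
        Dp = D2[np.ix_(perm, perm)]          # relabel populations in D2
        if np.corrcoef(x, Dp[iu])[0, 1] >= r_obs:
            count += 1
    return r_obs, (count + 1) / (permutations + 1)

# Hypothetical data: 12 populations, "genetic" distance = geography + noise
rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(12, 2))
geo = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
noise = rng.normal(0, 5, size=geo.shape)
gen = geo + (noise + noise.T) / 2            # keep the matrix symmetric
np.fill_diagonal(gen, 0.0)
r, p = mantel(gen, geo)
```

Permuting whole rows and columns together keeps each matrix a valid distance matrix, which is why the test does not treat the n(n-1)/2 pairwise distances as independent observations.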
Edwards, Stefan M.; Sørensen, Izel F.; Sarup, Pernille; Mackay, Trudy F. C.; Sørensen, Peter
2016-01-01
Predicting individual quantitative trait phenotypes from high-resolution genomic polymorphism data is important for personalized medicine in humans, plant and animal breeding, and adaptive evolution. However, this is difficult for populations of unrelated individuals when the number of causal variants is low relative to the total number of polymorphisms and causal variants individually have small effects on the traits. We hypothesized that mapping molecular polymorphisms to genomic features such as genes and their gene ontology categories could increase the accuracy of genomic prediction models. We developed a genomic feature best linear unbiased prediction (GFBLUP) model that implements this strategy and applied it to three quantitative traits (startle response, starvation resistance, and chill coma recovery) in the unrelated, sequenced inbred lines of the Drosophila melanogaster Genetic Reference Panel. Our results indicate that subsetting markers based on genomic features increases the predictive ability relative to the standard genomic best linear unbiased prediction (GBLUP) model. Both models use all markers, but GFBLUP allows differential weighting of the individual genetic marker relationships, whereas GBLUP weights all genetic marker relationships equally. Simulation studies show that it is possible to further increase the accuracy of genomic prediction for complex traits using this model, provided the genomic features are enriched for causal variants. Our GFBLUP model using prior information on genomic features enriched for causal variants can increase the accuracy of genomic predictions in populations of unrelated individuals and provides a formal statistical framework for leveraging and evaluating information across multiple experimental studies to provide novel insights into the genetic architecture of complex traits. PMID:27235308
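The idea that both models use all markers but GFBLUP weights them differently can be sketched by splitting markers into a genomic-feature set and the remainder, building one relationship matrix per set, and giving each its own variance component. This is a minimal sketch under assumptions: the VanRaden-style scaling, the feature mask, and the fixed variance values are hypothetical (in practice the variances would be estimated, e.g. by REML).

```python
import numpy as np

def grm(M):
    """VanRaden-style genomic relationship matrix from 0/1/2 genotypes."""
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def feature_grms(M, in_feature):
    """Split markers by a boolean mask into a feature set and the rest and
    build one relationship matrix per set, so every marker is still used."""
    return grm(M[:, in_feature]), grm(M[:, ~in_feature])

def genetic_covariance(G_f, G_r, var_f, var_r):
    """Var(f + r) when f ~ N(0, var_f * G_f) and r ~ N(0, var_r * G_r)."""
    return var_f * G_f + var_r * G_r

rng = np.random.default_rng(4)
M = rng.integers(0, 3, size=(8, 30)).astype(float)   # hypothetical genotypes
in_feature = np.zeros(30, dtype=bool)
in_feature[:10] = True        # hypothetical: first 10 markers lie in the feature
G_f, G_r = feature_grms(M, in_feature)
V = genetic_covariance(G_f, G_r, var_f=0.7, var_r=0.3)  # hypothetical variances
```

Standard GBLUP instead uses a single relationship matrix built from all markers with one variance component, so it cannot up-weight a feature that is enriched for causal variants.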
Zhang, Zhe; Erbe, Malena; He, Jinlong; Ober, Ulrike; Gao, Ning; Zhang, Hao; Simianer, Henner; Li, Jiaqi
2015-02-09
Obtaining accurate predictions of unobserved genetic or phenotypic values for complex traits in animal, plant, and human populations is possible through whole-genome prediction (WGP), a combined analysis of genotypic and phenotypic data. Because the underlying genetic architecture of the trait of interest is an important factor affecting model selection, we propose a new strategy, termed BLUP|GA (BLUP-given genetic architecture), which can use genetic architecture information within the dataset at hand rather than from public sources. This is achieved by using a trait-specific covariance matrix (T), which is a weighted sum of a genetic architecture part (S matrix) and the realized relationship matrix (G). The BLUP|GA algorithm is provided and illustrated with real and simulated datasets. The predictive ability of BLUP|GA was validated with three model traits in a dairy cattle dataset and 11 traits in three public datasets with a variety of genetic architectures, and compared with GBLUP and other approaches. Results show that BLUP|GA outperformed GBLUP in 20 of 21 scenarios in the dairy cattle dataset and outperformed GBLUP, BayesA, and BayesB in 12 of 13 traits in the analyzed public datasets. Further analyses showed that the difference in accuracy between BLUP|GA and GBLUP correlates significantly with the distance between the T and G matrices. The new strategy applied in BLUP|GA is a favorable and flexible alternative to the standard GBLUP model, allowing the genetic architecture of the quantitative trait under consideration to be accounted for when necessary. This feature is mainly due to the increased similarity between the trait-specific relationship matrix (T matrix) and the genetic relationship matrix at unobserved causal loci. Applying BLUP|GA in WGP would ease the burden of model selection. Copyright © 2015 Zhang et al.
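The trait-specific matrix T described above is a weighted sum of an architecture part S and the realized relationship matrix G. The sketch below assumes, for illustration only, that S is built from a few selected markers weighted by (hypothetical) squared effect estimates and that T = ω·S + (1 - ω)·G; it then plugs T into a plain kernel-BLUP predictor. The exact construction of S and the estimation of ω in BLUP|GA may differ.

```python
import numpy as np

def centred(M):
    """Centre 0/1/2 genotypes by twice the allele frequency."""
    p = M.mean(axis=0) / 2.0
    return M - 2.0 * p, p

def grm(M):
    """Realized (VanRaden-style) relationship matrix G."""
    Z, p = centred(M)
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def architecture_matrix(M, top_idx, weights):
    """Hypothetical S matrix: relationships at selected markers, each weighted
    (here by a squared effect estimate), scaled in the same way as G."""
    Z, p = centred(M[:, top_idx])
    w = np.asarray(weights, dtype=float)
    return (Z * w) @ Z.T / np.sum(w * 2.0 * p * (1.0 - p))

def trait_specific(S, G, omega):
    """Assumed form of T: omega * S + (1 - omega) * G."""
    return omega * S + (1.0 - omega) * G

def kernel_blup(K, y_train, train, test, lam):
    """Predict test values from kernel K (G or T); lam = residual/genetic variance."""
    Ktt = K[np.ix_(train, train)]
    Kot = K[np.ix_(test, train)]
    alpha = np.linalg.solve(Ktt + lam * np.eye(len(train)), y_train - y_train.mean())
    return y_train.mean() + Kot @ alpha

rng = np.random.default_rng(2)
M = rng.integers(0, 3, size=(20, 50)).astype(float)     # hypothetical genotypes
beta = np.zeros(50)
beta[:5] = [1.0, -0.8, 0.6, 0.5, -0.4]                  # hypothetical causal effects
y = (M - M.mean(axis=0)) @ beta + rng.normal(0, 0.5, 20)
G = grm(M)
S = architecture_matrix(M, np.arange(5), beta[:5] ** 2)
T = trait_specific(S, G, omega=0.5)
train, test = np.arange(15), np.arange(15, 20)
pred = kernel_blup(T, y[train], train, test, lam=1.0)
```

Setting `omega=0` recovers ordinary GBLUP with G, so the sketch makes explicit that BLUP|GA changes only the covariance matrix, not the prediction equations.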
Gu, Deqing; Jian, Xingxing; Zhang, Cheng; Hua, Qiang
2017-01-01
Genome-scale metabolic network models (GEMs) have played important roles in the design of genetically engineered strains and have helped biologists to decipher metabolism. However, because of the complex gene-reaction relationships in these models, most algorithms have limited ability to directly predict accurate genetic designs for metabolic engineering. In particular, methods that predict reaction knockout strategies leading to overproduction are often impractical in terms of gene manipulations. Recently, we proposed a method named logical transformation of model (LTM) to simplify gene-reaction associations by introducing intermediate pseudo reactions, which makes it possible to generate genetic designs. Here, we propose an alternative method that relieves researchers from deciphering complex gene-reaction associations by adding pseudo gene-controlling reactions. Compared with LTM, this new method introduces fewer pseudo reactions and generates a much smaller model system, named gModel. We show that gModel enables two seldom-reported applications: identification of minimal genomes and design of minimal cell factories within a modified OptKnock framework. In addition, gModel can be used to integrate expression data directly and improve the performance of the E-Fmin method for predicting fluxes. In conclusion, the model transformation procedure will facilitate genetic research based on GEMs, extending their applications.
The potential of large studies for building genetic risk prediction models
NCI scientists have developed a new paradigm to assess hereditary risk prediction in common diseases, such as prostate cancer. This genetic risk prediction concept is based on polygenic analysis—the study of a group of common DNA sequences, known as singl
Efficient search, mapping, and optimization of multi-protein genetic systems in diverse bacteria
Farasat, Iman; Kushwaha, Manish; Collens, Jason; Easterbrook, Michael; Guido, Matthew; Salis, Howard M
2014-01-01
Developing predictive models of multi-protein genetic systems to understand and optimize their behavior remains a combinatorial challenge, particularly when measurement throughput is limited. We developed a computational approach to build predictive models and identify optimal sequences and expression levels, while circumventing combinatorial explosion. Maximally informative genetic system variants were first designed by the RBS Library Calculator, an algorithm to design sequences for efficiently searching a multi-protein expression space across a >10,000-fold range with tailored search parameters and well-predicted translation rates. We validated the algorithm's predictions by characterizing 646 genetic system variants, encoded in plasmids and genomes, expressed in six gram-positive and gram-negative bacterial hosts. We then combined the search algorithm with system-level kinetic modeling, requiring the construction and characterization of 73 variants to build a sequence-expression-activity map (SEAMAP) for a biosynthesis pathway. Using model predictions, we designed and characterized 47 additional pathway variants to navigate its activity space, find optimal expression regions with desired activity response curves, and relieve rate-limiting steps in metabolism. Creating sequence-expression-activity maps accelerates the optimization of many protein systems and allows previous measurements to quantitatively inform future designs. PMID:24952589
[Analytic methods for seed models with genotype x environment interactions].
Zhu, J
1996-01-01
Genetic models with genotype effect (G) and genotype x environment interaction effect (GE) are proposed for analyzing generation means of seed quantitative traits in crops. The total genetic effect (G) is partitioned into seed direct genetic effect (G0), cytoplasm genetic effect (C), and maternal plant genetic effect (Gm). Seed direct genetic effect (G0) can be further partitioned into direct additive (A) and direct dominance (D) genetic components. Maternal genetic effect (Gm) can also be partitioned into maternal additive (Am) and maternal dominance (Dm) genetic components. The total genotype x environment interaction effect (GE) can also be partitioned into direct genetic by environment interaction effect (G0E), cytoplasm genetic by environment interaction effect (CE), and maternal genetic by environment interaction effect (GmE). G0E can be partitioned into direct additive by environment interaction (AE) and direct dominance by environment interaction (DE) genetic components. GmE can also be partitioned into maternal additive by environment interaction (AmE) and maternal dominance by environment interaction (DmE) genetic components. Partitions of genetic components are listed for parent, F1, F2 and backcrosses. A set of parents with their reciprocal F1 and F2 seeds is suitable for efficient analysis of seed quantitative traits. The MINQUE(0/1) method can be used for estimating variance and covariance components. Unbiased estimates of covariance components between two traits can also be obtained by the MINQUE(0/1) method. Random genetic effects in seed models can be predicted by the Adjusted Unbiased Prediction (AUP) approach with the MINQUE(0/1) method. The jackknife procedure is suggested for estimating the sampling variances of estimated variance and covariance components and of predicted genetic effects, which can further be used in t-tests of the parameters. 
Unbiasedness and efficiency for estimating variance components and predicting genetic effects are tested by Monte Carlo simulations.
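The jackknife step can be illustrated with the generic delete-one jackknife: recompute an estimator with each observation left out, then use the spread of those leave-one-out values to estimate its sampling variance, which in turn gives a t-statistic. This is a sketch of the general procedure applied to an arbitrary estimator; it is not the MINQUE(0/1) or AUP computation itself, and jackknifing over blocks or replicates, as a seed-model analysis might, would delete groups rather than single values.

```python
import numpy as np

def jackknife(data, estimator):
    """Delete-one jackknife: returns (estimate, jackknife variance estimate)."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    theta = estimator(data)
    leave_one_out = np.array([estimator(np.delete(data, i, axis=0)) for i in range(n)])
    var = (n - 1) / n * np.sum((leave_one_out - leave_one_out.mean()) ** 2)
    return theta, var

x = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0])   # hypothetical observations
est, var = jackknife(x, np.mean)
t_stat = est / np.sqrt(var)                      # compare with t on n - 1 df
```

For the sample mean, the jackknife variance reduces exactly to s²/n, which is a convenient sanity check; for variance components or predicted effects, the same function would be called with the corresponding estimator.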
2013-01-01
Background: Arguably, genotypes and phenotypes may be linked in functional forms that are not well addressed by the linear additive models that are standard in quantitative genetics. Therefore, developing statistical learning models for predicting phenotypic values from all available molecular information that are capable of capturing complex genetic network architectures is of great importance. Bayesian kernel ridge regression is a non-parametric prediction model proposed for this purpose. Its essence is to create a spatial distance-based relationship matrix called a kernel. Although the set of all single nucleotide polymorphism genotype configurations on which a model is built is finite, past research has mainly used a Gaussian kernel. Results: We sought to investigate the performance of a diffusion kernel, which was specifically developed to model discrete marker inputs, using Holstein cattle and wheat data. This kernel can be viewed as a discretization of the Gaussian kernel. The predictive ability of the diffusion kernel was similar to that of non-spatial distance-based additive genomic relationship kernels in the Holstein data, but outperformed the latter in the wheat data. However, the difference in performance between the diffusion and Gaussian kernels was negligible. Conclusions: It is concluded that the ability of a diffusion kernel to capture the total genetic variance is not better than that of a Gaussian kernel, at least for these data. Although the diffusion kernel as a choice of basis function may have potential for use in whole-genome prediction, our results imply that embedding genetic markers into a non-Euclidean metric space has very small impact on prediction. Our results suggest that use of the black box Gaussian kernel is justified, given its connection to the diffusion kernel and its similar predictive performance. PMID:23763755
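To show why a diffusion kernel on discrete markers behaves like a discretized Gaussian kernel, here is a sketch of one standard form: the diffusion kernel on a complete graph over the k genotype states at each locus, multiplied across loci and normalized so that K(x, x) = 1. It reduces to ρ raised to the Hamming distance, i.e. exp(-θ·d_H). This is an assumed textbook construction; the exact parameterization used in the study is not given here, and `beta` is a hypothetical bandwidth.

```python
import numpy as np

def diffusion_kernel(X, beta=1.0, k=3):
    """Normalised diffusion kernel for categorical markers with k states each
    (e.g. genotypes 0/1/2): product over loci of complete-graph diffusion
    kernels, which equals rho ** hamming(x, x')."""
    e = np.exp(-k * beta)
    rho = (1.0 - e) / (1.0 + (k - 1) * e)            # off-diagonal / diagonal ratio
    hamming = (X[:, None, :] != X[None, :, :]).sum(axis=2)
    return rho ** hamming

rng = np.random.default_rng(5)
X = rng.integers(0, 3, size=(6, 40))                  # hypothetical genotypes
K = diffusion_kernel(X, beta=1.0)
```

Writing ρ^d as exp(-θ·d) with θ = -ln ρ makes the link explicit: it is a Gaussian-type kernel in which the Hamming distance between genotype vectors replaces the squared Euclidean distance, consistent with the small differences in predictive performance reported above.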
Optimality models in the age of experimental evolution and genomics.
Bull, J J; Wang, I-N
2010-09-01
Optimality models have been used to predict evolution of many properties of organisms. They typically neglect genetic details, whether by necessity or design. This omission is a common source of criticism, and although this limitation of optimality is widely acknowledged, it has mostly been defended rather than evaluated for its impact. Experimental adaptation of model organisms provides a new arena for testing optimality models and for simultaneously integrating genetics. First, an experimental context with a well-researched organism allows dissection of the evolutionary process to identify causes of model failure--whether the model is wrong about genetics or selection. Second, optimality models provide a meaningful context for the process and mechanics of evolution, and thus may be used to elicit realistic genetic bases of adaptation--an especially useful augmentation to well-researched genetic systems. A few studies of microbes have begun to pioneer this new direction. Incompatibility between the assumed and actual genetics has been demonstrated to be the cause of model failure in some cases. More interestingly, evolution at the phenotypic level has sometimes matched prediction even though the adaptive mutations defy mechanisms established by decades of classic genetic studies. Integration of experimental evolutionary tests with genetics heralds a new wave for optimality models and their extensions that does not merely emphasize the forces driving evolution.
Azevedo Peixoto, Leonardo de; Laviola, Bruno Galvêas; Alves, Alexandre Alonso; Rosado, Tatiana Barbosa; Bhering, Leonardo Lopes
2017-01-01
Genome-wide selection (GWS) is a promising approach for improving selection accuracy in plant breeding, particularly in species with long life cycles, such as Jatropha. Therefore, the objectives of this study were to estimate the genetic parameters for grain yield (GY) and the weight of 100 seeds (W100S) using restricted maximum likelihood (REML); to compare the performance of GWS methods in predicting GY and W100S; and to estimate how many markers are needed to train the GWS model to obtain the maximum accuracy. Eight GWS models were compared in terms of predictive ability. The impact of marker density on predictive ability was investigated using a varying number of markers, from 2 to 1,248. Because the genetic variance among the evaluated genotypes was significant, it was possible to obtain selection gain. All of the GWS methods tested in this study can be used to predict GY and W100S in Jatropha. A training model fitted using 1,000 and 800 markers is sufficient to capture the maximum genetic variance and, consequently, the maximum prediction ability of GY and W100S, respectively. This study demonstrated the applicability of genome-wide prediction to identify useful genetic sources of GY and W100S for Jatropha breeding. Further research is needed to confirm the applicability of the proposed approach to other complex traits.
Bao, Wei; Hu, Frank B.; Rong, Shuang; Rong, Ying; Bowers, Katherine; Schisterman, Enrique F.; Liu, Liegang; Zhang, Cuilin
2013-01-01
This study aimed to evaluate the predictive performance of genetic risk models based on risk loci identified and/or confirmed in genome-wide association studies for type 2 diabetes mellitus. A systematic literature search was conducted in the PubMed/MEDLINE and EMBASE databases through April 13, 2012, and published data relevant to the prediction of type 2 diabetes based on genome-wide association marker–based risk models (GRMs) were included. Of the 1,234 potentially relevant articles, 21 articles representing 23 studies were eligible for inclusion. The median area under the receiver operating characteristic curve (AUC) among eligible studies was 0.60 (range, 0.55–0.68), which did not differ appreciably by study design, sample size, participants’ race/ethnicity, or the number of genetic markers included in the GRMs. In addition, the AUCs for type 2 diabetes did not improve appreciably with the addition of genetic markers into conventional risk factor–based models (median AUC, 0.79 (range, 0.63–0.91) vs. median AUC, 0.78 (range, 0.63–0.90), respectively). A limited number of included studies used reclassification measures and yielded inconsistent results. In conclusion, GRMs showed a low predictive performance for risk of type 2 diabetes, irrespective of study design, participants’ race/ethnicity, and the number of genetic markers included. Moreover, the addition of genome-wide association markers into conventional risk models produced little improvement in predictive performance. PMID:24008910
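The median AUC of 0.60 reported above can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control (ties counted as one half), which is the Mann-Whitney form of the AUC. A minimal sketch with hypothetical scores and labels:

```python
from typing import Sequence

def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(case score > control score), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical genetic risk scores (1 = developed type 2 diabetes)
scores = [0.2, 0.4, 0.35, 0.8, 0.1, 0.6]
labels = [0, 0, 1, 1, 0, 1]
example_auc = auc(scores, labels)
```

On this scale 0.5 means no discrimination and 1.0 means perfect separation, so a median of 0.60 indicates only modest discrimination by the genetic markers alone.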
Barber, Grant E; Yajnik, Vijay; Khalili, Hamed; Giallourakis, Cosmas; Garber, John; Xavier, Ramnik; Ananthakrishnan, Ashwin N
2016-12-01
One-fifth of patients with Crohn's disease (CD) are primary non-responders to anti-tumor necrosis factor (anti-TNF) therapy, and an estimated 10-15% will fail therapy annually. Little is known about the genetics of response to anti-TNF therapy. The aim of our study was to identify genetic factors associated with primary non-response (PNR) and loss of response to anti-TNFs in CD. From a prospective registry, we characterized the response of 427 CD patients to their first anti-TNF therapy. Patients were designated as achieving primary response, durable response, or non-durable response based on clinical, endoscopic, and radiologic criteria. Genotyping was performed on the Illumina Immunochip. Separate genetic scores based on the presence of predictive genetic alleles were calculated for PNR and durable response, and the performance of clinical and genetic models was compared. Of 359 patients, 36 were judged to have PNR (10%), 200 had durable response, and 74 had non-durable response. Patients with PNR had longer disease duration and were more likely to be smokers. Fifteen risk alleles were associated with PNR. Patients with PNR had a significantly higher genetic risk score (GRS) (P = 8 × 10^-12). A combined clinical-genetic model predicted PNR more accurately than a clinical-only model (0.93 vs. 0.70, P < 0.001). Sixteen distinct single nucleotide polymorphisms predicted durable response with a higher GRS (P = 7 × 10^-13). The GRSs for PNR and durable response were not mutually correlated, suggesting distinct mechanisms. Genetic risk alleles can predict primary non-response and durable response to anti-TNF therapy in CD.
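A genetic risk score of the kind described above is, in its simplest form, a per-patient sum over selected SNPs, counting either risk-allele copies or just carrier status, optionally with per-SNP weights. The abstract says the scores were based on the "presence of predictive genetic alleles" but does not give the exact formula, so the options and numbers below are hypothetical.

```python
import numpy as np

def grs(risk_allele_counts, weights=None, presence_only=False):
    """Genetic risk score per patient.
    risk_allele_counts: n_patients x n_snps array of risk-allele copies (0/1/2).
    presence_only: count carrier status (>= 1 copy) instead of copies.
    weights: optional per-SNP weights; defaults to an unweighted sum."""
    X = np.asarray(risk_allele_counts, dtype=float)
    if presence_only:
        X = (X > 0).astype(float)
    if weights is None:
        weights = np.ones(X.shape[1])
    return X @ np.asarray(weights, dtype=float)

counts = [[0, 1, 2],
          [1, 1, 0],
          [2, 2, 2]]                        # hypothetical patients x SNPs
unweighted = grs(counts)
carrier_based = grs(counts, presence_only=True)
weighted = grs(counts, weights=[0.5, 1.0, 0.2])   # hypothetical weights
```

The resulting score can then enter a logistic model alongside clinical covariates, which is one way a combined clinical-genetic model like the one compared above could be built.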
Recent development of risk-prediction models for incident hypertension: An updated systematic review
Xiao, Lei; Liu, Ya; Wang, Zuoguang; Li, Chuang; Jin, Yongxin; Zhao, Qiong
2017-01-01
Background Hypertension is a leading global health threat and a major cardiovascular disease. Since clinical interventions are effective in delaying the disease progression from prehypertension to hypertension, diagnostic prediction models to identify patient populations at high risk for hypertension are imperative. Methods Both PubMed and Embase databases were searched for eligible reports of either prediction models or risk scores of hypertension. The study data were collected, including risk factors, statistical methods, characteristics of study design and participants, performance measurement, etc. Results From the searched literature, 26 studies reporting 48 prediction models were selected. Among them, 20 reports studied the established models using traditional risk factors, such as body mass index (BMI), age, smoking, blood pressure (BP) level, parental history of hypertension, and biochemical factors, whereas 6 reports used genetic risk score (GRS) as the prediction factor. AUC ranged from 0.64 to 0.97, and C-statistic ranged from 60% to 90%. Conclusions The traditional models are still the predominant risk prediction models for hypertension, but recently, more models have begun to incorporate genetic factors as part of their model predictors. However, these genetic predictors need to be well selected. The currently reported models have acceptable to good discrimination and calibration ability, but whether the models can be applied in clinical practice still needs more validation and adjustment. PMID:29084293
Prediction of road traffic death rate using neural networks optimised by genetic algorithm.
Jafari, Seyed Ali; Jahandideh, Sepideh; Jahandideh, Mina; Asadabadi, Ebrahim Barzegari
2015-01-01
Road traffic injuries (RTIs) are recognised as a major cause of public health problems at global, regional and national levels. Predicting the road traffic death rate would therefore help in managing it. On this basis, we used an artificial neural network model optimised through a genetic algorithm to predict mortality. In this study, a five-fold cross-validation procedure on a data set covering a total of 178 countries was used to verify the performance of the models. The best-fit model was selected according to the root mean square error (RMSE). The genetic algorithm, a powerful optimiser that had not previously been applied to mortality prediction to this extent, showed high performance. The lowest RMSE obtained was 0.0808. Such satisfactory results could be attributed to the use of the genetic algorithm as a powerful optimiser that selects the best input feature set to be fed into the neural networks. Seven factors were identified, with high accuracy, as the most influential factors for the road traffic mortality rate. The results indicate that our model is very promising and may play a useful role in developing a better method for assessing the influence of road traffic mortality risk factors.
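The five-fold cross-validated RMSE used above to select models can be sketched as follows, for instance as the fitness of a candidate input feature set inside a genetic algorithm. This is an illustration under stated assumptions: ordinary least squares stands in for the neural network, and the data are simulated rather than the 178-country data set.

```python
import numpy as np


def cv_rmse(X: np.ndarray, y: np.ndarray, features, k: int = 5, seed: int = 0) -> float:
    """k-fold cross-validated RMSE of a least-squares model using only `features`.

    Ordinary least squares stands in for the study's neural network (assumption).
    """
    cols = np.asarray(features, dtype=int)
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)    # shuffle rows once
    sq_err = np.empty(n)
    for test in np.array_split(idx, k):                 # k roughly equal folds
        train = np.setdiff1d(idx, test)
        # design matrices with an intercept column
        A_tr = np.column_stack([np.ones(len(train)), X[np.ix_(train, cols)]])
        A_te = np.column_stack([np.ones(len(test)), X[np.ix_(test, cols)]])
        coef, *_ = np.linalg.lstsq(A_tr, y[train], rcond=None)
        sq_err[test] = (y[test] - A_te @ coef) ** 2     # out-of-fold squared errors
    return float(np.sqrt(sq_err.mean()))
```

Inside a genetic algorithm, each candidate subset of input columns would be scored with this function, a lower RMSE meaning a fitter candidate.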
Engoren, Milo; Habib, Robert H; Dooner, John J; Schwann, Thomas A
2013-08-01
As many as 14 % of patients undergoing coronary artery bypass graft (CABG) surgery are readmitted within 30 days. Readmission is usually the result of morbidity and may lead to death. The purpose of this study is to develop and compare statistical and genetic programming models to predict readmission. Patients were divided into separate Construction and Validation populations. Using 88 variables, logistic regression, genetic programs, and artificial neural nets were used to develop predictive models. Models were first constructed and tested on the Construction population, then validated on the Validation population. Areas under the receiver operating characteristic curves (AU ROC) were used to compare the models. Two hundred and two patients (7.6 %) in the 2,644 patient Construction group and 216 (8.0 %) of the 2,711 patient Validation group were re-admitted within 30 days of CABG surgery. Logistic regression predicted readmission with AU ROC = .675 ± .021 in the Construction group. Genetic programs significantly improved the accuracy (AU ROC = .767 ± .001, p < .001). Artificial neural nets were less accurate with AU ROC = 0.597 ± .001 in the Construction group. Predictive accuracy of all three techniques fell in the Validation group. However, the accuracy of genetic programming (AU ROC = .654 ± .001) was still slightly, but not statistically significantly, better than that of logistic regression (AU ROC = .644 ± .020, p = .61). Genetic programming and logistic regression provide alternative methods to predict readmission that are similarly accurate.
Complex Adaptive System Models and the Genetic Analysis of Plasma HDL-Cholesterol Concentration
Rea, Thomas J.; Brown, Christine M.; Sing, Charles F.
2006-01-01
Despite remarkable advances in diagnosis and therapy, ischemic heart disease (IHD) remains a leading cause of morbidity and mortality in industrialized countries. Recent efforts to estimate the influence of genetic variation on IHD risk have focused on predicting individual plasma high-density lipoprotein cholesterol (HDL-C) concentration. Plasma HDL-C concentration (mg/dl), a quantitative risk factor for IHD, has a complex multifactorial etiology that involves the actions of many genes. Single gene variations may be necessary but are not individually sufficient to predict a statistically significant increase in risk of disease. The complexity of phenotype-genotype-environment relationships involved in determining plasma HDL-C concentration has challenged commonly held assumptions about genetic causation and has led to the question of which combination of variations, in which subset of genes, in which environmental strata of a particular population significantly improves our ability to predict high or low risk phenotypes. We document the limitations of inferences from genetic research based on commonly accepted biological models, consider how evidence for real-world dynamical interactions between HDL-C determinants challenges the simplifying assumptions implicit in traditional linear statistical genetic models, and conclude by considering research options for evaluating the utility of genetic information in predicting traits with complex etiologies. PMID:17146134
Effects of complex life cycles on genetic diversity: cyclical parthenogenesis.
Rouger, R; Reichel, K; Malrieu, F; Masson, J P; Stoeckel, S
2016-11-01
Neutral patterns of population genetic diversity in species with complex life cycles are difficult to anticipate. Cyclical parthenogenesis (CP), in which organisms undergo several rounds of clonal reproduction followed by a sexual event, is one such life cycle. Many species, including crop pests (aphids), human parasites (trematodes) or models used in evolutionary science (Daphnia), are cyclical parthenogens. It is therefore crucial to understand the impact of such a life cycle on neutral genetic diversity. In this paper, we describe distributions of genetic diversity under conditions of CP with various clonal phase lengths. Using a Markov chain model of CP for a single locus and individual-based simulations for two loci, our analysis first demonstrates that strong departures from full sexuality are observed after only a few generations of clonality. The convergence towards predictions made under conditions of full clonality during the clonal phase depends on the balance between mutations and genetic drift. Second, the sexual event of CP usually resets the genetic diversity at a single locus towards predictions made under full sexuality. However, this single recombination event is insufficient to reshuffle gametic phases towards full-sexuality predictions. Finally, for similar levels of clonality, CP and acyclic partial clonality (wherein a fixed proportion of individuals are clonally produced within each generation) differentially affect the distribution of genetic diversity. Overall, this work provides solid predictions of neutral genetic diversity that may serve as a null model in detecting the action of common evolutionary or demographic processes in cyclical parthenogens (for example, selection or bottlenecks).
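Why a single sexual event resets single-locus diversity but not gametic phases can be seen in an idealised deterministic setting (infinite population, no selection, drift or mutation), which is a simplification and not the authors' Markov chain model. One round of random mating restores Hardy-Weinberg genotype proportions at a locus, whereas two-locus linkage disequilibrium D only shrinks by a factor (1 - r) per round, where r is the recombination fraction. The function names below are hypothetical.

```python
def sexual_generation(g: dict) -> dict:
    """One round of random mating at a biallelic locus (idealised, infinite population)."""
    p = g["AA"] + 0.5 * g["Aa"]   # frequency of allele A
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def clonal_generation(g: dict) -> dict:
    """Strict clonal reproduction copies genotype frequencies unchanged (no drift or mutation)."""
    return dict(g)


def f_is(g: dict) -> float:
    """F_IS = 1 - observed / expected heterozygosity; negative means heterozygote excess.

    Undefined (division by zero) for a monomorphic locus.
    """
    p = g["AA"] + 0.5 * g["Aa"]
    expected = 2 * p * (1 - p)
    return 1.0 - g["Aa"] / expected


def ld_after_sexual_events(d0: float, r: float, events: int) -> float:
    """Two-locus disequilibrium D decays by a factor (1 - r) per round of random mating."""
    return d0 * (1 - r) ** events
```

Starting from a fully heterozygous clonal population, F_IS stays at -1 through any number of clonal generations and returns to 0 after one sexual generation, while D between even unlinked loci (r = 0.5) is only halved by that same event.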
Genetic basis of between-individual and within-individual variance of docility.
Martin, J G A; Pirotta, E; Petelle, M B; Blumstein, D T
2017-04-01
Between-individual variation in phenotypes within a population is the basis of evolution. However, evolutionary and behavioural ecologists have mainly focused on estimating between-individual variance in mean trait and neglected variation in within-individual variance, or predictability of a trait. In fact, an important assumption of mixed-effects models used to estimate between-individual variance in mean traits is that within-individual residual variance (predictability) is identical across individuals. Individual heterogeneity in the predictability of behaviours is a potentially important effect but rarely estimated and accounted for. We used 11 389 measures of docility behaviour from 1576 yellow-bellied marmots (Marmota flaviventris) to estimate between-individual variation in both mean docility and its predictability. We then implemented a double hierarchical animal model to decompose the variances of both mean trait and predictability into their environmental and genetic components. We found that individuals differed both in their docility and in their predictability of docility with a negative phenotypic covariance. We also found significant genetic variance for both mean docility and its predictability but no genetic covariance between the two. This analysis is one of the first to estimate the genetic basis of both mean trait and within-individual variance in a wild population. Our results indicate that equal within-individual variance should not be assumed. We demonstrate the evolutionary importance of the variation in the predictability of docility and illustrate potential bias in models ignoring variation in predictability. We conclude that the variability in the predictability of a trait should not be ignored, and present a coherent approach for its quantification. © 2017 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2017 European Society For Evolutionary Biology.
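A crude first look at the two kinds of variation described above can be had by summarising each individual's repeated measures. The sketch below (plain Python, hypothetical data layout) is only a descriptive method-of-moments summary: unlike the double hierarchical animal model, it ignores sampling error in the per-individual estimates and uses no pedigree, so it cannot split either component into genetic and environmental parts. Working on the log SD scale mirrors the common practice of modelling residual variance through a log link, which is an assumption here rather than a detail taken from the abstract.

```python
import math
from statistics import mean, stdev, variance


def predictability_summary(measures: dict) -> dict:
    """Descriptive summary of repeated behavioural scores per individual.

    measures maps an individual id to its list of repeated scores (at least 2
    each, not all identical, and at least 2 individuals). Returns the variance
    among individual means and the variance among individual log residual SDs;
    the latter is zero when every individual is equally predictable.
    """
    ind_means = [mean(v) for v in measures.values()]
    ind_log_sd = [math.log(stdev(v)) for v in measures.values()]
    return {
        "var_between_means": variance(ind_means),
        "var_between_log_sd": variance(ind_log_sd),
    }
```

Two individuals scored [1, 3] and [5, 7] differ in mean docility (variance of means 8) but are equally predictable (variance of log SDs 0).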
Increasing Prediction the Original Final Year Project of Student Using Genetic Algorithm
NASA Astrophysics Data System (ADS)
Saragih, Rijois Iboy Erwin; Turnip, Mardi; Sitanggang, Delima; Aritonang, Mendarissan; Harianja, Eva
2018-04-01
The final year project is very important for a student's graduation. Unfortunately, many students do not take their final projects seriously, and many ask someone else to do the work for them. In this paper, an application of genetic algorithms to predict whether a student's final year project is original is proposed. In the simulation, data on final projects from the last 5 years were collected. The genetic algorithm has several operators, namely population, selection, crossover, and mutation. The results suggest that the genetic algorithm can predict better than other comparable models. Experimental results showed a prediction accuracy of 70%, which was more accurate than previous research.
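The four operators listed above (population, selection, crossover, mutation) can be illustrated with a generic bit-string genetic algorithm. This is a minimal sketch, not the authors' implementation: the encoding, binary tournament selection, one-point crossover, bit-flip mutation rate, elitism and the toy fitness (number of 1 bits) are all assumptions chosen for illustration. In a prediction setting the fitness would instead be a measure of predictive accuracy for the candidate encoded by each bit string.

```python
import random
from typing import Callable, List, Tuple

Genome = List[int]


def run_ga(fitness: Callable[[Genome], float], n_bits: int, pop_size: int = 30,
           generations: int = 50, p_mut: float = 0.02,
           seed: int = 0) -> Tuple[Genome, List[float]]:
    """Maximise `fitness` over bit strings; returns best genome and best fitness per generation."""
    if n_bits < 2 or pop_size < 2:
        raise ValueError("need n_bits >= 2 and pop_size >= 2")
    rng = random.Random(seed)
    # Population: random bit strings
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        best = max(pop, key=fitness)
        history.append(fitness(best))
        next_pop = [best[:]]  # elitism: the best genome survives unchanged
        while len(next_pop) < pop_size:
            # Selection: binary tournament for each parent
            parent_a = max(rng.sample(pop, 2), key=fitness)
            parent_b = max(rng.sample(pop, 2), key=fitness)
            # Crossover: one cut point
            cut = rng.randrange(1, n_bits)
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: flip each bit with probability p_mut
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history
```

Because the best genome is copied unchanged into each new generation, the recorded best fitness never decreases; run_ga(sum, 20), for example, evolves 20-bit strings towards all ones.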
NASA Astrophysics Data System (ADS)
Isingizwe Nturambirwe, J. Frédéric; Perold, Willem J.; Opara, Umezuruike L.
2016-02-01
Near infrared (NIR) spectroscopy has gained extensive use in quality evaluation. It is arguably one of the most advanced spectroscopic tools in non-destructive quality testing of foodstuffs, from measurement to data analysis and interpretation. NIR spectral data are interpreted through means often involving multivariate statistical analysis, sometimes associated with optimisation techniques for model improvement. The objective of this research was to explore the extent to which genetic algorithms (GA) can be used to enhance model development for predicting fruit quality. Apple fruits were used, and NIR spectra in the range from 12000 to 4000 cm⁻¹ were acquired on both bruised and healthy tissues, with different degrees of mechanical damage. GAs were used in combination with partial least squares (PLS) regression methods to develop bruise severity prediction models, which were compared to PLS models developed using the full NIR spectrum. A classification model was developed, which clearly separated bruised from unbruised apple tissue. GAs improved the prediction models by over 10% compared with full spectrum-based models, as evaluated in terms of error of prediction (root mean square error of cross-validation). PLS models to predict internal quality, such as sugar content and acidity, were developed and compared to the versions optimized by genetic algorithm. Overall, the results highlighted the potential use of the GA method to improve the speed and accuracy of fruit quality prediction.
Role of genetic variation in docetaxel-induced neutropenia and pharmacokinetics.
Nieuweboer, A J M; Smid, M; de Graan, A-J M; Elbouazzaoui, S; de Bruijn, P; Eskens, F A L M; Hamberg, P; Martens, J W M; Sparreboom, A; de Wit, R; van Schaik, R H N; Mathijssen, R H J
2016-11-01
Docetaxel is used for treatment of several solid malignancies. In this study, we aimed to predict docetaxel clearance and docetaxel-induced neutropenia by developing several genetic models. To this end, pharmacokinetic data and absolute neutrophil counts (ANCs) of 213 docetaxel-treated cancer patients were collected. Next, patients were genotyped for 1936 single nucleotide polymorphisms (SNPs) in 225 genes using the drug-metabolizing enzymes and transporters platform and thereafter split into two cohorts. The combination of SNPs that best predicted severe neutropenia or low clearance was selected in one cohort and validated in the other. Patients with severe neutropenia had lower docetaxel clearance than patients with ANCs in the normal range (P=0.01). Severe neutropenia was predicted with 70% sensitivity. True low clearance (1 s.d.
Dechow, C D; Rogers, G W
2018-05-01
Expectation of genetic merit in commercial dairy herds is routinely estimated using a 4-path genetic selection model that was derived for a closed population, but commercial herds using artificial insemination sires are not closed. The 4-path model also predicts a higher rate of genetic progress in elite herds that provide artificial insemination sires than in commercial herds that use such sires, which counters other theoretical assumptions and observations of realized genetic responses. The aim of this work is to clarify whether genetic merit in commercial herds is more accurately reflected under the assumptions of the 4-path genetic response formula or by a genetic lag formula. We demonstrate by tracing the transmission of genetic merit from parents to offspring that the rate of genetic progress in commercial dairy farms is expected to be the same as that in the genetic nucleus. The lag in genetic merit between the nucleus and commercial farms is a function of sire and dam generation interval, the rate of genetic progress in elite artificial insemination herds, and genetic merit of sires and dams. To predict how strategies such as the use of young versus daughter-proven sires, culling heifers following genomic testing, or selective use of sexed semen will alter genetic merit in commercial herds, genetic merit expectations for commercial herds should be modeled using genetic lag expectations. Copyright © 2018 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
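A deterministic toy recursion helps show why a commercial herd ends up progressing at the nucleus rate, only with a lag. It assumes (hypothetically, and far more simply than the paper) that nucleus merit rises by delta_g per year, that each commercial calf receives half the merit of an AI sire born sire_age years earlier in the nucleus and half the merit of a commercial dam born dam_age years earlier, and that there is no selection among commercial dams. Under these assumptions the yearly gain of the herd converges to delta_g, and the herd settles L = delta_g × (sire_age + dam_age) below the nucleus trend: substituting C(t) = delta_g·t - L into C(t) = ½·delta_g·(t - sire_age) + ½·C(t - dam_age) gives exactly that L.

```python
def commercial_merit(delta_g: float, sire_age: int, dam_age: int, years: int) -> list:
    """Yearly mean genetic merit of calves born in a commercial herd (toy model).

    Nucleus merit at year t is delta_g * t. Each calf averages the merit of a
    nucleus sire born sire_age years earlier and a commercial dam born
    dam_age years earlier. Herd merit before year 0 is set to 0 (the nucleus
    level at year 0); this only affects the transient.
    """
    if dam_age < 1:
        raise ValueError("dam_age must be at least 1 year")
    merit: list = []
    for t in range(years):
        sire = delta_g * (t - sire_age)                   # nucleus bull's merit at its birth
        dam = merit[t - dam_age] if t >= dam_age else 0.0  # commercial dam's merit at her birth
        merit.append(0.5 * sire + 0.5 * dam)
    return merit
```

With delta_g = 1.0, sire_age = 5 and dam_age = 4, the herd gains 1.0 per year after the transient and trails the nucleus by 9 units, matching the formula for L.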
Ayres, D R; Pereira, R J; Boligon, A A; Silva, F F; Schenkel, F S; Roso, V M; Albuquerque, L G
2013-12-01
Cattle resistance to ticks is measured by the number of ticks infesting the animal. The model used for the genetic analysis of cattle resistance to ticks frequently requires logarithmic transformation of the observations. The objective of this study was to evaluate the predictive ability and goodness of fit of different models for the analysis of this trait in cross-bred Hereford × Nellore cattle. Three models were tested: a linear model using logarithmic transformation of the observations (MLOG); a linear model without transformation of the observations (MLIN); and a generalized linear Poisson model with residual term (MPOI). All models included the classificatory effects of contemporary group and genetic group and the covariates age of animal at the time of recording and individual heterozygosis, as well as additive genetic effects as random effects. Heritability estimates were 0.08 ± 0.02, 0.10 ± 0.02 and 0.14 ± 0.04 for MLIN, MLOG and MPOI models, respectively. The model fit quality, verified by deviance information criterion (DIC) and residual mean square, indicated a better fit for the MPOI model. The predictive ability of the models was compared by a validation test in an independent sample. The MPOI model was slightly superior in terms of goodness of fit and predictive ability, whereas the correlations between observed and predicted tick counts were practically the same for all models. A higher rank correlation between breeding values was observed between models MLOG and MPOI. The Poisson model can be used for the selection of tick-resistant animals. © 2013 Blackwell Verlag GmbH.
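The rank correlation between breeding values from two models, mentioned above, is commonly a Spearman correlation: the Pearson correlation of the ranks, with tied values given their average rank. A minimal standard-library sketch follows. The breeding-value lists passed to it would come from the fitted models (for example MLOG and MPOI); whether the authors used exactly this statistic is an assumption.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation (undefined if either sequence is constant)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(va * vb)
```

Because only ranks matter, two models whose breeding values differ in scale (for instance log-scale versus count-scale) but order animals identically give a rank correlation of 1.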
Abraham, Gad; Kowalczyk, Adam; Zobel, Justin; Inouye, Michael
2013-02-01
A central goal of medical genetics is to accurately predict complex disease from genotypes. Here, we present a comprehensive analysis of simulated and real data using lasso and elastic-net penalized support-vector machine models, a mixed-effects linear model, a polygenic score, and unpenalized logistic regression. In simulation, the sparse penalized models achieved lower false-positive rates and higher precision than the other methods for detecting causal SNPs. The common practice of prefiltering SNP lists for subsequent penalized modeling was examined and shown to substantially reduce the ability to recover the causal SNPs. Using genome-wide SNP profiles across eight complex diseases within cross-validation, lasso and elastic-net models achieved substantially better predictive ability in celiac disease, type 1 diabetes, and Crohn's disease, and had equivalent predictive ability in the rest, with the results in celiac disease strongly replicating between independent datasets. We investigated the effect of linkage disequilibrium on the predictive models, showing that the penalized methods leverage this information to their advantage, compared with methods that assume SNP independence. Our findings show that sparse penalized approaches are robust across different disease architectures, producing as good as or better phenotype predictions and variance explained. This has fundamental ramifications for the selection and future development of methods to genetically predict human disease. © 2012 WILEY PERIODICALS, INC.
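To make the idea of a sparse penalised model concrete, the sketch below fits a lasso by cyclic coordinate descent with soft-thresholding. Note the swap: the study used lasso and elastic-net penalised support-vector machines, whereas this example uses the simpler squared-error loss, minimising (1/2n)·||y - Xβ||² + λ·||β||₁ on centred data. The simulated genotypes, effect sizes and λ are hypothetical.

```python
import numpy as np


def soft_threshold(z: float, lam: float) -> float:
    """sign(z) * max(|z| - lam, 0)."""
    return float(np.sign(z) * max(abs(z) - lam, 0.0))


def lasso_cd(X: np.ndarray, y: np.ndarray, lam: float, n_iter: int = 200) -> np.ndarray:
    """Lasso (squared-error loss) by cyclic coordinate descent on centred X and y."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)          # centring removes the need for an intercept
    yc = y - y.mean()
    beta = np.zeros(p)
    resid = yc.copy()                # residual for the current beta (all zero)
    col_sq = (Xc ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:       # monomorphic marker: leave at zero
                continue
            resid += Xc[:, j] * beta[j]          # partial residual without marker j
            rho = Xc[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= Xc[:, j] * beta[j]          # restore full residual
    return beta
```

With two simulated causal SNPs among 50, the causal coefficients are shrunk towards zero by roughly λ divided by the SNP variance, while most null SNPs are set exactly to zero, which is the sparsity that lets these models pick out causal SNPs.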
Budde, Katharina B; Heuertz, Myriam; Hernández-Serrano, Ana; Pausas, Juli G; Vendramin, Giovanni G; Verdú, Miguel; González-Martínez, Santiago C
2014-01-01
Wildfire is a major ecological driver of plant evolution. Understanding the genetic basis of plant adaptation to wildfire is crucial, because impending climate change will involve fire regime changes worldwide. We studied the molecular genetic basis of serotiny, a fire-related trait, in Mediterranean maritime pine using association genetics. A single nucleotide polymorphism (SNP) set was used to identify genotype : phenotype associations in situ in an unstructured natural population of maritime pine (eastern Iberian Peninsula) under a mixed-effects model framework. RR-BLUP was used to build predictive models for serotiny in this region. Model prediction power outside the focal region was tested using independent range-wide serotiny data. Seventeen SNPs were potentially associated with serotiny, explaining approximately 29% of the trait phenotypic variation in the eastern Iberian Peninsula. Similar prediction power was found for nearby geographical regions from the same maternal lineage, but not for other genetic lineages. Association genetics for ecologically relevant traits evaluated in situ is an attractive approach for forest trees provided that traits are under strong genetic control and populations are unstructured, with large phenotypic variability. This will help to extend the research focus to ecological keystone non-model species in their natural environments, where polymorphisms acquired their adaptive value. © 2013 The Authors. New Phytologist © 2013 New Phytologist Trust.
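RR-BLUP treats all marker effects as random with a common variance, so with the variance ratio λ = σ²_e / σ²_β known, the marker-effect estimates reduce to ridge regression: β̂ = (ZᵀZ + λI)⁻¹ Zᵀ(y - ȳ), with Z column-centred. The sketch assumes λ is given (in practice it comes from REML variance components) and that the intercept is the only fixed effect; the function names are hypothetical.

```python
import numpy as np


def rr_blup(Z: np.ndarray, y: np.ndarray, lam: float):
    """Ridge-regression BLUP of marker effects with a known variance ratio `lam`.

    Returns (intercept, marker effects, marker means used for centring).
    """
    centers = Z.mean(axis=0)
    Zc = Z - centers                 # centred markers: intercept estimate is mean(y)
    mu = y.mean()
    p = Z.shape[1]
    beta = np.linalg.solve(Zc.T @ Zc + lam * np.eye(p), Zc.T @ (y - mu))
    return mu, beta, centers


def predict(Z_new: np.ndarray, mu: float, beta: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Predicted phenotypes for new individuals genotyped at the same markers."""
    return mu + (Z_new - centers) @ beta
```

Predictions for trees from another region would use that region's genotypes as Z_new, which is the kind of out-of-region test described in the abstract; larger λ shrinks all marker effects more strongly towards zero.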
Genetic Model Fitting in IQ, Assortative Mating & Components of IQ Variance.
ERIC Educational Resources Information Center
Capron, Christiane; Vetta, Adrian R.; Vetta, Atam
1998-01-01
The biometrical school of scientists who fit models to IQ data traces their intellectual ancestry to R. Fisher (1918), but their genetic models have no predictive value. Fisher himself was critical of the concept of heritability, because assortative mating, such as for IQ, introduces complexities into the study of a genetic trait. (SLD)
Sanjak, Jaleal S.; Long, Anthony D.; Thornton, Kevin R.
2017-01-01
The genetic component of complex disease risk in humans remains largely unexplained. A corollary is that the allelic spectrum of genetic variants contributing to complex disease risk is unknown. Theoretical models that relate population genetic processes to the maintenance of genetic variation for quantitative traits may suggest profitable avenues for future experimental design. Here we use forward simulation to model a genomic region evolving under a balance between recurrent deleterious mutation and Gaussian stabilizing selection. We consider multiple genetic and demographic models, and several different methods for identifying genomic regions harboring variants associated with complex disease risk. We demonstrate that the model of gene action, relating genotype to phenotype, has a qualitative effect on several relevant aspects of the population genetic architecture of a complex trait. In particular, the genetic model impacts genetic variance component partitioning across the allele frequency spectrum and the power of statistical tests. Models with partial recessivity closely match the minor allele frequency distribution of significant hits from empirical genome-wide association studies without requiring homozygous effect sizes to be small. We highlight a particular gene-based model of incomplete recessivity that is appealing from first principles. Under that model, deleterious mutations in a genomic region partially fail to complement one another. This model of gene-based recessivity predicts the empirically observed inconsistency between twin- and SNP-based estimates of dominance heritability. Furthermore, this model predicts considerable levels of unexplained variance associated with intralocus epistasis. Our results suggest a need for improved statistical tools for region-based genetic association and heritability estimation. PMID:28103232
Effects of complex life cycles on genetic diversity: cyclical parthenogenesis
Rouger, R; Reichel, K; Malrieu, F; Masson, J P; Stoeckel, S
2016-01-01
Neutral patterns of population genetic diversity in species with complex life cycles are difficult to anticipate. Cyclical parthenogenesis (CP), in which organisms undergo several rounds of clonal reproduction followed by a sexual event, is one such life cycle. Many species, including crop pests (aphids), human parasites (trematodes) or models used in evolutionary science (Daphnia), are cyclical parthenogens. It is therefore crucial to understand the impact of such a life cycle on neutral genetic diversity. In this paper, we describe distributions of genetic diversity under conditions of CP with various clonal phase lengths. Using a Markov chain model of CP for a single locus and individual-based simulations for two loci, our analysis first demonstrates that strong departures from full sexuality are observed after only a few generations of clonality. The convergence towards predictions made under conditions of full clonality during the clonal phase depends on the balance between mutations and genetic drift. Second, the sexual event of CP usually resets the genetic diversity at a single locus towards predictions made under full sexuality. However, this single recombination event is insufficient to reshuffle gametic phases towards full-sexuality predictions. Finally, for similar levels of clonality, CP and acyclic partial clonality (wherein a fixed proportion of individuals are clonally produced within each generation) differentially affect the distribution of genetic diversity. Overall, this work provides solid predictions of neutral genetic diversity that may serve as a null model in detecting the action of common evolutionary or demographic processes in cyclical parthenogens (for example, selection or bottlenecks). PMID:27436524
An alternative covariance estimator to investigate genetic heterogeneity in populations.
Heslot, Nicolas; Jannink, Jean-Luc
2015-11-26
For genomic prediction and genome-wide association studies (GWAS) using mixed models, covariance between individuals is estimated using molecular markers. Based on the properties of mixed models, using available molecular data for prediction is optimal if this covariance is known. Under this assumption, adding individuals to the analysis should never be detrimental. However, some empirical studies showed that increasing training population size decreased prediction accuracy. Recently, results from theoretical models indicated that even if marker density is high and the genetic architecture of traits is controlled by many loci with small additive effects, the covariance between individuals, which depends on relationships at causal loci, is not always well estimated by the whole-genome kinship. We propose an alternative covariance estimator named K-kernel, to account for potential genetic heterogeneity between populations that is characterized by a lack of genetic correlation, and to limit the information flow between a priori unknown populations in a trait-specific manner. This is similar to a multi-trait model; parameters are estimated by REML and, in extreme cases, it can allow for an independent genetic architecture between populations. As such, K-kernel is useful to study the problem of the design of training populations. K-kernel was compared to other covariance estimators or kernels to examine its fit to the data, cross-validated accuracy and suitability for GWAS on several datasets. It provides a significantly better fit to the data than the genomic best linear unbiased prediction model and, in some cases, it performs better than other kernels such as the Gaussian kernel, as shown by an empirical null distribution. In GWAS simulations, alternative kernels control type I errors as well as or better than the classical whole-genome kinship and increase statistical power. No or small gains were observed in cross-validated prediction accuracy.
This alternative covariance estimator can be used to gain insight into trait-specific genetic heterogeneity by identifying relevant sub-populations that lack genetic correlation between them. Because the genetic correlation between identified sub-populations can be 0, the estimator effectively performs automatic selection of the relevant sets of individuals to be included in the training population. It may also increase statistical power in GWAS.
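K-kernel itself is estimated by REML and is not reproduced here, but the two baselines it is compared with can be sketched. The whole-genome kinship below uses VanRaden's first method (one common choice, assumed here): G = WWᵀ / (2·Σ p_j(1 - p_j)), with W the 0/1/2 marker matrix centred by twice the allele frequencies. The Gaussian kernel is K_ij = exp(-d²_ij / θ), where d_ij is the Euclidean distance between two marker profiles and the bandwidth θ is a user-chosen assumption.

```python
import numpy as np


def vanraden_g(M: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix (VanRaden method 1) from an n x m 0/1/2 marker matrix."""
    p = M.mean(axis=0) / 2.0             # allele frequency per marker
    W = M - 2.0 * p                      # centre each marker column
    denom = 2.0 * np.sum(p * (1.0 - p))  # scales G to be analogous to the pedigree relationship
    return W @ W.T / denom


def gaussian_kernel(M: np.ndarray, theta: float) -> np.ndarray:
    """Gaussian kernel exp(-d^2 / theta) on Euclidean distances between marker profiles."""
    sq = np.sum(M ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * M @ M.T
    d2 = np.maximum(d2, 0.0)             # guard tiny negative values from round-off
    return np.exp(-d2 / theta)
```

Both matrices are n × n covariances between individuals that can be plugged into the same mixed model; because every marker column of W is centred, each row of G sums to zero.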
A deep auto-encoder model for gene expression prediction.
Xie, Rui; Wen, Jia; Quitadamo, Andrew; Cheng, Jianlin; Shi, Xinghua
2017-11-17
Gene expression is a key intermediate level through which genotypes lead to a particular trait. Gene expression is affected by various factors including genotypes of genetic variants. With the aim of delineating the genetic impact on gene expression, we build a deep auto-encoder model to assess how well genetic variants contribute to gene expression changes. This new deep learning model is a regression-based predictive model based on the MultiLayer Perceptron and Stacked Denoising Auto-encoder (MLP-SAE). The model is trained using a stacked denoising auto-encoder for feature selection and a multilayer perceptron framework for backpropagation. We further improve the model by introducing dropout to prevent overfitting and improve performance. To demonstrate the usage of this model, we apply MLP-SAE to a real genomic dataset with genotypes and gene expression profiles measured in yeast. Our results show that the MLP-SAE model with dropout outperforms other models including Lasso, Random Forests and the MLP-SAE model without dropout. Using the MLP-SAE model with dropout, we show that gene expression quantifications predicted by the model solely based on genotypes align well with true gene expression patterns. We provide a deep auto-encoder model for predicting gene expression from SNP genotypes. This study demonstrates that deep learning is appropriate for tackling another genomic problem, i.e., building predictive models to understand genotypes' contribution to gene expression. With the emerging availability of richer genomic data, we anticipate that deep learning models will play a bigger role in modeling and interpreting genomics.
Baudracco, J; Lopez-Villalobos, N; Holmes, C W; Comeron, E A; Macdonald, K A; Barry, T N; Friggens, N C
2012-06-01
This animal simulation model, named e-Cow, represents a single grazing dairy cow. The model integrates algorithms from three previously published models: a model that predicts herbage dry matter (DM) intake by grazing dairy cows, a mammary gland model that predicts potential milk yield and a body lipid model that predicts genetically driven live weight (LW) and body condition score (BCS). Both nutritional and genetic drives are accounted for in the prediction of energy intake and its partitioning. The main inputs are herbage allowance (HA; kg DM offered/cow per day), metabolisable energy and NDF concentrations in herbage and supplements, supplements offered (kg DM/cow per day), type of pasture (ryegrass or lucerne), days in milk, days pregnant, lactation number, BCS and LW at calving, breed or strain of cow and genetic merit, that is, potential yields of milk, fat and protein. Separate equations are used to predict herbage intake, depending on the cutting heights at which HA is expressed. The e-Cow model is written in the Visual Basic programming language within Microsoft Excel®. The model predicts whole-lactation performance of dairy cows on a daily basis, and the main outputs are the daily and annual DM intake, milk yield and changes in BCS and LW. In the e-Cow model, none of herbage DM intake, milk yield or LW change is needed as an input; instead, they are predicted by the e-Cow model. The e-Cow model was validated against experimental data for Holstein-Friesian cows with both North American (NA) and New Zealand (NZ) genetics grazing ryegrass-based pastures, with or without supplementary feeding and for three complete lactations, divided into weekly periods. The model was able to predict animal performance with satisfactory accuracy, with concordance correlation coefficients of 0.81, 0.76 and 0.62 for herbage DM intake, milk yield and LW change, respectively.
Simulations performed with the model showed that it is sensitive to genotype by feeding environment interactions. The e-Cow model tended to overestimate the milk yield of NA genotype cows at low milk yields, while it underestimated the milk yield of NZ genotype cows at high milk yields. The approach used to define the potential milk yield of the cow and equations used to predict herbage DM intake make the model applicable for predictions in countries with temperate pastures.
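The concordance correlation coefficients reported above (Lin's CCC) measure agreement with the line of identity, not just linear association: ρ_c = 2·s_xy / (s²_x + s²_y + (x̄ - ȳ)²). A minimal sketch using population (1/n) moments follows; the observed and predicted series would be, for example, weekly milk yields, and the function name is hypothetical.

```python
from statistics import fmean
from typing import Sequence


def lin_ccc(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient with 1/n (population) moments."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = fmean(observed), fmean(predicted)
    sxx = sum((x - mx) ** 2 for x in observed) / n
    syy = sum((y - my) ** 2 for y in predicted) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(observed, predicted)) / n
    # the (mx - my)^2 term penalises systematic over- or under-prediction
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)
```

A prediction that is perfectly correlated with the observations but offset by a constant is penalised: observed [1, 2, 3, 4] against predicted [2, 3, 4, 5] gives 2.5 / 3.5 ≈ 0.714, whereas the Pearson correlation is 1.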
He, Dan; Kuhn, David; Parida, Laxmi
2016-06-15
Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually formulated as a linear regression problem. In many cases, for the same set of samples and markers, multiple traits are observed. Some of these traits might be correlated with each other. Therefore, modeling all the multiple traits together may improve the prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether different traits share the same genotype matrix or not. We then adapted multitask learning algorithms and multiple output regression algorithms to solve the multitrait prediction problem. We proposed a few strategies to improve the least-squares error of the prediction from these algorithms. Our experiments show that modeling multiple traits together could improve the prediction accuracy for correlated traits. The programs we used are either public or obtained directly from the cited authors, such as the MALSAR (http://www.public.asu.edu/~jye02/Software/MALSAR/) package. The Avocado data set has not been published yet and is available upon request. dhe@us.ibm.com. © The Author 2016. Published by Oxford University Press.
2011-01-01
Background: Genetic risk models could potentially be useful in identifying high-risk groups for the prevention of complex diseases. We investigated the performance of this risk stratification strategy by examining epidemiological parameters that affect the predictive ability of risk models. Methods: We assessed sensitivity, specificity, and positive and negative predictive value for all possible risk thresholds that can define high-risk groups, and investigated how these measures depend on the frequency of disease in the population, the frequency of the high-risk group, and the discriminative accuracy of the risk model, as assessed by the area under the receiver operating characteristic curve (AUC). In a simulation study, we modeled genetic risk scores of 50 genes with equal odds ratios and genotype frequencies, and varied the odds ratios and the disease frequency across scenarios. We also simulated age-related macular degeneration risk prediction based on published odds ratios and frequencies for six genetic risk variants. Results: When the frequency of the high-risk group was lower than the disease frequency, positive predictive value increased with the AUC but sensitivity remained low. When the frequency of the high-risk group was higher than the disease frequency, sensitivity was high but positive predictive value remained low. When both frequencies were equal, both positive predictive value and sensitivity increased with increasing AUC, but a higher AUC was needed to maximize both measures. Conclusions: The performance of risk stratification is strongly determined by the frequency of the high-risk group relative to the frequency of disease in the population. Identifying high-risk groups with appreciable combinations of sensitivity and positive predictive value requires a higher AUC. PMID:21797996
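The trade-off described in the Results can be reproduced on toy data. The sketch below is illustrative only: the risk scores, outcomes and thresholds are hypothetical, and it simply labels everyone at or above a threshold as high-risk before tabulating the four measures and the size of the high-risk group.

```python
def stratification_metrics(risks, outcomes, threshold):
    """Tabulate a risk threshold against observed outcomes (1 = disease, 0 = no disease).

    Individuals with risk >= threshold form the high-risk group.
    """
    tp = fp = tn = fn = 0
    for risk, diseased in zip(risks, outcomes):
        high_risk = risk >= threshold
        if high_risk and diseased:
            tp += 1
        elif high_risk:
            fp += 1
        elif diseased:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined measures (empty denominators) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "high_risk_fraction": ratio(tp + fp, len(risks)),
    }


# Hypothetical cohort: disease frequency is 3/8 = 0.375.
risks = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0]

small_group = stratification_metrics(risks, outcomes, 0.75)  # 25% labelled high-risk
large_group = stratification_metrics(risks, outcomes, 0.25)  # 62.5% labelled high-risk
print(small_group["sensitivity"], small_group["ppv"])  # 0.666..., 1.0
print(large_group["sensitivity"], large_group["ppv"])  # 1.0, 0.6
```

With a high-risk group smaller than the disease frequency, PPV is high but sensitivity is limited; enlarging the group past the disease frequency reverses the pattern, as the abstract describes.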
Giri, Veda N.; Egleston, Brian; Ruth, Karen; Uzzo, Robert G.; Chen, David Y.T.; Buyyounouski, Mark; Raysor, Susan; Hooker, Stanley; Torres, Jada Benn; Ramike, Teniel; Mastalski, Kathleen; Kim, Taylor Y.; Kittles, Rick
2008-01-01
Introduction: “Race-specific” PSA needs to be evaluated in men at high risk for prostate cancer (PCA) to optimize early detection. Baseline PSA and longitudinal prediction for PCA were examined by self-reported race and genetic West African (WA) ancestry in the Prostate Cancer Risk Assessment Program, a prospective high-risk cohort. Materials and Methods: Eligibility criteria are age 35–69 years, family history (FH) of PCA, African American (AA) race, or BRCA1/2 mutations. Biopsies have been performed at low PSA values (<4.0 ng/mL). WA ancestry was estimated by genotyping 100 ancestry informative markers. Cox proportional hazards models evaluated baseline PSA, self-reported race, and genetic WA ancestry, and were used for 3-year predictions for PCA. Results: A total of 646 men (63% AA) were analyzed. Individual WA ancestry estimates varied widely among self-reported AA men. No “race-specific” differences in baseline PSA were found by either self-reported race or genetic WA ancestry. Among men with ≥1 follow-up visit (405 total, 54% AA), the three-year prediction for PCA with a PSA of 1.5–4.0 ng/mL was higher in AA men than in EA men when age was included in the model (p=0.025). Hazard ratios of PSA for PCA were also higher by self-reported race (1.59 for AA vs. 1.32 for EA, p=0.04). There was a trend toward increasing prediction for PCA with increasing genetic WA ancestry. Conclusions: “Race-specific” PSA may need to be redefined as a higher prediction for PCA at any given PSA in AA men. Large-scale studies are needed to confirm whether genetic WA ancestry explains these findings and to make progress in personalizing PCA early detection. PMID:19240249
Clinical-genetic model predicts incident impulse control disorders in Parkinson's disease.
Kraemmer, Julia; Smith, Kara; Weintraub, Daniel; Guillemot, Vincent; Nalls, Mike A; Cormier-Dequaire, Florence; Moszer, Ivan; Brice, Alexis; Singleton, Andrew B; Corvol, Jean-Christophe
2016-10-01
Impulse control disorders (ICD) are commonly associated with dopamine replacement therapy (DRT) in patients with Parkinson's disease (PD). Our aims were to estimate ICD heritability and to predict ICD with a candidate genetic multivariable panel in patients with PD. Data from de novo patients with PD, drug-naïve and free of ICD behaviour at baseline, were obtained from the Parkinson's Progression Markers Initiative cohort. Incident ICD behaviour was defined as a positive score on the Questionnaire for Impulsive-Compulsive Disorders in PD. ICD heritability was estimated by restricted maximum likelihood analysis of whole exome sequencing data. Thirteen candidate variants were selected from the DRD2, DRD3, DAT1, COMT, DDC, GRIN2B, ADRA2C, SERT, TPH2, HTR2A, OPRK1 and OPRM1 genes. ICD prediction was evaluated by the area under the curve (AUC) of receiver operating characteristic (ROC) curves. Among the 276 patients with PD included in the analysis, 86% started DRT, 40% were on dopamine agonists (DA), and 19% reported incident ICD behaviour during follow-up. We estimated the heritability of this symptom at 57%. Adding genotypes from the 13 candidate variants significantly increased ICD predictability (AUC=76%, 95% CI (70% to 83%)) compared with prediction based on clinical variables only (AUC=65%, 95% CI (58% to 73%), p=0.002). The clinical-genetic prediction model reached its highest accuracy in patients initiating DA therapy (AUC=87%, 95% CI (80% to 93%)). OPRK1, HTR2A and DDC genotypes were the strongest genetic predictive factors. Our results show that adding a candidate genetic panel increases ICD predictability, suggesting potential for developing clinical-genetic models to identify patients with PD at increased risk of ICD and to guide DRT management. Published by the BMJ Publishing Group Limited.
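The abstract summarizes discrimination by the ROC AUC. As a reminder of what that number means, the sketch below computes the AUC through its Mann-Whitney interpretation. The predicted probabilities are hypothetical, and the significance test used in the paper to compare the two AUCs is not reproduced.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via its Mann-Whitney interpretation.

    Returns the probability that a randomly chosen case (label 1) scores higher
    than a randomly chosen control (label 0); tied pairs count as one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical predicted probabilities of incident ICD from two models.
labels = [1, 0, 1, 0, 0, 1]
clinical_only = [0.6, 0.5, 0.3, 0.4, 0.2, 0.35]
clinical_genetic = [0.7, 0.55, 0.5, 0.3, 0.2, 0.6]
print(roc_auc(clinical_only, labels), roc_auc(clinical_genetic, labels))
```

The pairwise loop is quadratic in cohort size, which is harmless for a few hundred patients; larger data sets would usually compute the same quantity from ranks.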
Gamal El-Dien, Omnia; Ratcliffe, Blaise; Klápště, Jaroslav; Porth, Ilga; Chen, Charles; El-Kassaby, Yousry A.
2016-01-01
Open-pollinated (OP) family testing combines the simplest known progeny evaluation with quantitative genetic analysis, because the candidates’ offspring are assumed to represent independent half-sib families. The accuracy of genetic parameter estimates is often questioned because the half-sib assumption may often be violated in OP families. We compared pedigree- vs. marker-based genetic models by analysing 22-yr height and 30-yr wood density for 214 white spruce [Picea glauca (Moench) Voss] OP families represented by 1694 individuals growing on one site in Quebec, Canada. Under the half-sib assumption, the pedigree-based model was limited to estimating additive genetic variances which, in turn, were grossly overestimated because they were confounded with very minor dominance and major additive-by-additive epistatic genetic variances. In contrast, the genomic pairwise realized relationship models allowed additive effects to be disentangled from all nonadditive factors through genetic variance decomposition. The marker-based models produced more realistic narrow-sense heritability estimates and, for the first time, allowed the dominance and epistatic genetic variances to be estimated from OP testing. In addition, the genomic models showed better prediction accuracies than pedigree models and were able to predict individual breeding values for new individuals from untested families, which was not possible with the pedigree-based model. The marker-based relationship approach is therefore effective in estimating the quantitative genetic parameters of complex traits even under a simple and shallow pedigree structure. PMID:26801647
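The key difference from the pedigree model is that relationships are realized, i.e. estimated from markers, instead of being fixed at 0.25 for every pair of half-sibs. A minimal sketch, assuming VanRaden's first method for the additive genomic relationship matrix (the abstract does not name the estimator used), with the element-wise product G∘G shown as one common approximation for additive-by-additive relationships; dominance relationships need a different genotype coding and are not shown.

```python
def genomic_relationship(genotypes):
    """Genomic relationship matrix in the style of VanRaden's first method.

    genotypes: one list per individual of allele counts (0, 1 or 2) per marker.
    Returns G = Z Z' / (2 * sum_j p_j (1 - p_j)), where Z holds the genotypes
    centred by twice the observed allele frequency p_j of each marker.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    p = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    z = [[ind[j] - 2 * p[j] for j in range(m)] for ind in genotypes]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]


def hadamard(a, b):
    """Element-wise product, e.g. G∘G as an additive-by-additive relationship matrix."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Three hypothetical trees genotyped at three SNPs.
G = genomic_relationship([[0, 1, 2], [2, 1, 0], [1, 1, 1]])
AA = hadamard(G, G)
print(G)
```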
Automated design of genetic toggle switches with predetermined bistability.
Chen, Shuobing; Zhang, Haoqian; Shi, Handuo; Ji, Weiyue; Feng, Jingchen; Gong, Yan; Yang, Zhenglin; Ouyang, Qi
2012-07-20
Synthetic biology aims to rationally construct biological devices with required functionalities. Methods that automate the design of genetic devices without post-hoc adjustment are therefore highly desired. Here we provide a method to predictably design genetic toggle switches with predetermined bistability. To accomplish this, we first developed a biophysical model that links ribosome binding site (RBS) DNA sequence to toggle switch bistability by integrating a stochastic model with an RBS design method. Then, to parametrize the model, a library of genetic toggle switch mutants was built experimentally, and the equivalence between RBS DNA sequences and switch bistability was established. To test this equivalence, RBS nucleotide sequences for different specified bistabilities were designed in silico and verified experimentally. The results show that the deciphered equivalence is highly predictive for toggle switch design with predetermined bistability. This method can be generalized to the quantitative design of other probabilistic genetic devices in synthetic biology.
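To make "bistability" concrete, the sketch below counts the steady states of the classic deterministic two-repressor toggle switch. This is a swapped-in illustration, not the authors' stochastic, RBS-linked model: the dimensionless synthesis rates alpha1 and alpha2 and the cooperativity n are hypothetical, and in this picture RBS strength would act only through the synthesis rates (an assumption of the sketch).

```python
def count_steady_states(alpha1, alpha2, n, grid=100_000):
    """Count steady states of a deterministic two-repressor toggle switch.

    Model (dimensionless): du/dt = alpha1 / (1 + v**n) - u,
                           dv/dt = alpha2 / (1 + u**n) - v.
    At steady state v = alpha2 / (1 + u**n), so steady states are the roots of
    g(u) = u - alpha1 / (1 + v(u)**n) on (0, alpha1); we count sign changes of g
    on a midpoint grid. Three roots indicate bistability, one indicates monostability.
    """
    h = alpha1 / grid

    def g(u):
        v = alpha2 / (1 + u ** n)
        return u - alpha1 / (1 + v ** n)

    roots = 0
    prev = g(0.5 * h)
    for i in range(1, grid):
        cur = g((i + 0.5) * h)
        if (prev < 0) != (cur < 0):
            roots += 1
        prev = cur
    return roots


# In this deterministic picture, bistability needs cooperative repression (n > 1)
# together with sufficiently large synthesis rates.
print(count_steady_states(10, 10, 2))  # cooperative repression
print(count_steady_states(10, 10, 1))  # non-cooperative repression
```

A grid scan is crude but transparent; it assumes roots are simple and separated by more than the grid spacing, which holds for these parameter values.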
Ovenden, Ben; Milgate, Andrew; Wade, Len J; Rebetzke, Greg J; Holland, James B
2018-05-31
Abiotic stress tolerance traits are often complex and recalcitrant targets for conventional breeding improvement in many crop species. This study evaluated the potential of genomic selection to predict water-soluble carbohydrate concentration (WSCC), an important drought tolerance trait, in wheat under field conditions. A panel of 358 varieties and breeding lines constrained for maturity was evaluated under rainfed and irrigated treatments across two locations and two years. Whole-genome marker profiles and factor analytic mixed models were used to generate genomic estimated breeding values (GEBVs) for specific environments and environment groups. Additive genetic variance was smaller than residual genetic variance for WSCC, such that genotypic values were dominated by residual genetic effects rather than additive breeding values. As a result, GEBVs were not accurate predictors of genotypic values of the extant lines, but GEBVs should be reliable selection criteria to choose parents for intermating to produce new populations. The accuracy of GEBVs for untested lines was sufficient to increase predicted genetic gain from genomic selection per unit time compared with phenotypic selection if the breeding cycle is reduced by half by the use of GEBVs in off-season generations. Further, genomic prediction accuracy depended on having phenotypic data from environments with strong correlations with target production environments to build prediction models. By combining high-density marker genotypes, stress-managed field evaluations, and mixed models that simultaneously model covariances among genotypes and covariances of complex trait performance between pairs of environments, we were able to train models with good accuracy to facilitate genetic gain from genomic selection. Copyright © 2018 Ovenden et al.
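The argument about gain per unit time follows from the breeder's equation divided by cycle length: at equal selection intensity, halving the cycle doubles the gain per year, so a GEBV that is less accurate than a phenotype can still win. All numbers below are hypothetical; the abstract does not report the accuracies or cycle lengths used.

```python
def gain_per_year(intensity, accuracy, sigma_a, cycle_years):
    """Breeder's equation per unit time: delta_G / year = i * r * sigma_A / L."""
    return intensity * accuracy * sigma_a / cycle_years


def breakeven_accuracy(pheno_accuracy, pheno_cycle, genomic_cycle):
    """Accuracy genomic selection must exceed to match phenotypic gain per year."""
    return pheno_accuracy * genomic_cycle / pheno_cycle


# Hypothetical values (not from the study): same intensity and additive SD,
# phenotypic selection on a 4-year cycle, GEBV-based selection on a 2-year cycle.
i, sigma_a = 1.4, 1.0
phenotypic = gain_per_year(i, 0.6, sigma_a, 4.0)
genomic = gain_per_year(i, 0.4, sigma_a, 2.0)
print(f"phenotypic: {phenotypic:.3f}, genomic: {genomic:.3f} sigma_A units/year")
print("break-even GEBV accuracy:", breakeven_accuracy(0.6, 4.0, 2.0))
```

Here "accuracy" means the correlation between the selection criterion and the true breeding value, which is the quantity the break-even comparison depends on.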
Latent spatial models and sampling design for landscape genetics
Hanks, Ephraim M.; Hooten, Mevin B.; Knick, Steven T.; Oyler-McCance, Sara J.; Fike, Jennifer A.; Cross, Todd B.; Schwartz, Michael K.
2016-01-01
We propose a spatially explicit approach for modeling genetic variation across space and illustrate how this approach can be used to optimize spatial prediction and sampling design for landscape genetic data. We propose a multinomial data model for categorical microsatellite allele data commonly used in landscape genetic studies, and introduce a latent spatial random effect to allow for spatial correlation between genetic observations. We illustrate how modern dimension reduction approaches to spatial statistics can allow for efficient computation in landscape genetic statistical models covering large spatial domains. We apply our approach to propose a retrospective spatial sampling design for greater sage-grouse (Centrocercus urophasianus) population genetics in the western United States.
Wavelet-linear genetic programming: A new approach for modeling monthly streamflow
NASA Astrophysics Data System (ADS)
Ravansalar, Masoud; Rajaee, Taher; Kisi, Ozgur
2017-06-01
Streamflow is an important factor in stream ecosystems, and its accurate prediction is an essential issue in water resources and environmental engineering. In this study, a hybrid wavelet-linear genetic programming (WLGP) model, combining a discrete wavelet transform (DWT) with linear genetic programming (LGP), was used to predict monthly streamflow (Q) at two gauging stations, Pataveh and Shahmokhtar, on the Beshar River at Yasuj, Iran. In the proposed WLGP model, the original streamflow time series was decomposed into sub-series of wavelet coefficients, which were then passed to the LGP model. The results were compared with single LGP, artificial neural network (ANN), hybrid wavelet-ANN (WANN) and multiple linear regression (MLR) models using several commonly used performance statistics. The Nash-Sutcliffe coefficients (E) of the WLGP model were 0.877 and 0.817 for the Pataveh and Shahmokhtar stations, respectively. The comparison showed that the WLGP model could significantly increase streamflow prediction accuracy at both stations. Because the WLGP model approximated peak streamflow values more closely, it could be used for one-month-ahead prediction of cumulative streamflow.
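Two ingredients of this study can be sketched briefly: the Nash-Sutcliffe efficiency E used to score the models, and a one-level wavelet decomposition of the kind that yields the sub-series passed to the LGP model. The abstract does not name the mother wavelet or the number of decomposition levels, so the Haar wavelet and the single level below are assumptions, and the flows are hypothetical.

```python
import math


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations from the observed mean.

    E = 1 is a perfect fit; E = 0 means the model is no better than the observed mean.
    """
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def haar_dwt(series):
    """One level of a Haar discrete wavelet transform (even-length input).

    Returns (approximation, detail) sub-series, each half the original length.
    """
    if len(series) % 2:
        raise ValueError("series length must be even")
    root2 = math.sqrt(2.0)
    pairs = list(zip(series[0::2], series[1::2]))
    approx = [(a + b) / root2 for a, b in pairs]
    detail = [(a - b) / root2 for a, b in pairs]
    return approx, detail


# Hypothetical monthly flows (m^3/s) and a model's predictions.
q_obs = [12.0, 30.0, 55.0, 41.0, 20.0, 9.0]
q_sim = [14.0, 27.0, 50.0, 44.0, 18.0, 10.0]
print(round(nash_sutcliffe(q_obs, q_sim), 3))
approx, detail = haar_dwt(q_obs)
print(approx, detail)
```

In a hybrid model of this kind, the approximation and detail sub-series (possibly over several levels) become candidate inputs, letting the regression stage treat slow and fast components of the flow separately.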
Practical implications for genetic modeling in the genomics era
USDA-ARS's Scientific Manuscript database
Genetic models convert data into estimated breeding values and other information useful to breeders. The goal is to provide accurate and timely predictions of the future performance for each animal (or embryo). Modeling involves defining traits, editing raw data, removing environmental effects, incl...
3D Protein structure prediction with genetic tabu search algorithm
2010-01-01
Background: Protein structure prediction (PSP) has important applications in fields such as drug design and disease prediction. Protein structure prediction involves two important issues: the design of the structure model and the design of the optimization technique. Because real protein structures are complex, this paper adopts a simplified structure model called the off-lattice AB model. Once the structure model is fixed, an optimization technique is needed to search for the best conformation of a protein sequence under that model. However, PSP is NP-hard even under the simplest model, so many algorithms have been developed to solve this global optimization problem. In this paper, a hybrid algorithm combining a genetic algorithm (GA) and the tabu search (TS) algorithm is developed for this task. Results: To obtain an efficient optimization algorithm, several improvement strategies are developed for the proposed genetic tabu search algorithm, and their combined use can improve its efficiency. Introducing tabu search into the crossover and mutation operators can improve local search capability, a variable population size strategy can maintain population diversity, and a ranking selection strategy can increase the probability that individuals with low energy values enter the next generation. Experiments are performed with Fibonacci sequences and real protein sequences. The results show that the lowest energies obtained by the proposed GATS algorithm are lower than those obtained by previous methods. Conclusions: The hybrid algorithm combines the advantages of the genetic algorithm and the tabu search algorithm. It exploits the multiple search points of the genetic algorithm and overcomes the poor hill-climbing capability of the conventional genetic algorithm through the flexible memory functions of TS. Compared with some previous algorithms, the GATS algorithm performs better in global optimization and can predict 3D protein structure more effectively. PMID:20522256
Holmes, John B; Dodds, Ken G; Lee, Michael A
2017-03-02
An important issue in genetic evaluation is the comparability of random effects (breeding values), particularly between pairs of animals in different contemporary groups. This is usually referred to as genetic connectedness. While various measures of connectedness have been proposed in the literature, there is general agreement that the most appropriate measure is some function of the prediction error variance-covariance matrix. However, obtaining the prediction error variance-covariance matrix is computationally demanding for large-scale genetic evaluations. Many alternative statistics have been proposed that avoid this computational cost, such as counts of genetic links between contemporary groups, gene flow matrices, and functions of the variance-covariance matrix of estimated contemporary group fixed effects. In this paper, we show that a correction to the variance-covariance matrix of estimated contemporary group fixed effects produces the exact prediction error variance-covariance matrix averaged by contemporary group for univariate models in the presence of single or multiple fixed effects and one random effect. We demonstrate the correction for a series of models and show that approximations to the prediction error matrix based solely on the variance-covariance matrix of estimated contemporary group fixed effects are inappropriate in certain circumstances. Our method allows a connectedness measure based on the prediction error variance-covariance matrix to be calculated from the variance-covariance matrix of estimated fixed effects alone. Since the number of fixed effects in genetic evaluation is usually orders of magnitude smaller than the number of random effect levels, our method should reduce the computational requirements.
Ruiz-López, María José; Monello, Ryan J.; Gompper, Matthew E.; Eggert, Lori S.
2012-01-01
Understanding factors that determine heterogeneity in levels of parasitism across individuals is a major challenge in disease ecology. Genetic makeup is known to play an important role in infection likelihood, but the mechanism remains unclear, as does its importance relative to other factors. We analyzed relationships between genetic diversity and macroparasites in outbred, free-ranging populations of raccoons (Procyon lotor). We measured heterozygosity at 14 microsatellite loci and modeled the effects of both multi-locus and single-locus heterozygosity on parasitism using an information theoretic approach, including non-genetic factors known to influence the likelihood of parasitism. The association between genetic diversity and parasitism, as well as the relative importance of genetic diversity, differed by parasitic group. Endoparasite species richness was better predicted by a model that included genetic diversity, with more heterozygous hosts harboring fewer endoparasite species. Genetic diversity was also important in predicting the abundance of replete ticks (Dermacentor variabilis). This association followed a curvilinear trend, with hosts that had either high or low levels of heterozygosity harboring fewer parasites than those with intermediate levels. In contrast, genetic diversity was not important in predicting the abundance of non-replete ticks and lice (Trichodectes octomaculatus). No strong single-locus effects were observed for either endoparasites or replete ticks. Our results suggest that in outbred populations multi-locus diversity might be important for coping with parasitism. The differences in the relationships between heterozygosity and parasitism for the different parasites suggest that the role of genetic diversity varies with parasite-mediated selective pressures. PMID:23049796
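Multi-locus heterozygosity, the genetic diversity predictor in this study, is commonly computed as the fraction of typed loci at which an individual carries two different alleles. The sketch below assumes that simple definition (the abstract does not say whether a standardized variant was used), and the genotypes are hypothetical.

```python
def multilocus_heterozygosity(genotype):
    """Proportion of typed loci at which an individual is heterozygous.

    genotype: list with one (allele_a, allele_b) tuple per microsatellite locus,
    using allele sizes in base pairs; None marks a locus that failed to type.
    """
    typed = [locus for locus in genotype if locus is not None]
    if not typed:
        raise ValueError("no typed loci")
    return sum(1 for a, b in typed if a != b) / len(typed)


def single_locus_heterozygosity(genotype):
    """Per-locus 0/1 heterozygosity indicators (None where untyped), for single-locus tests."""
    return [None if locus is None else int(locus[0] != locus[1]) for locus in genotype]


# A hypothetical raccoon typed at 4 of 5 loci.
raccoon = [(120, 124), (98, 98), None, (150, 152), (200, 200)]
print(multilocus_heterozygosity(raccoon))
print(single_locus_heterozygosity(raccoon))
```

Dividing by the number of typed loci, not the total, keeps individuals with missing genotypes comparable.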
Peñagaricano, F; Urioste, J I; Naya, H; de los Campos, G; Gianola, D
2011-04-01
Black skin spots are associated with pigmented fibres in wool, an important quality fault. Our objective was to assess alternative models for the genetic analysis of the presence (BINBS) and number (NUMBS) of black spots in Corriedale sheep. During 2002-08, 5624 records from 2839 animals in two flocks, aged 1 through 6 years, were taken at shearing. Four models were considered: linear and probit for BINBS and linear and Poisson for NUMBS. All models included flock-year and age as fixed effects and animal and permanent environmental effects as random effects. Models were fitted to the whole data set and were also compared on their predictive ability in cross-validation. Estimates of heritability ranged from 0.154 to 0.230 for BINBS and from 0.269 to 0.474 for NUMBS. For BINBS, the probit model fitted the data slightly better than the linear model. Predictions of random effects from these two models were highly correlated, and both models exhibited similar predictive ability. For NUMBS, the Poisson model, with a residual term to account for overdispersion, outperformed the linear model in both goodness of fit and predictive ability. Predictions of random effects from the Poisson model were more strongly correlated with those from the BINBS models than were those from the linear model. Overall, probit or linear models for BINBS and a Poisson model with a residual for NUMBS seem a reasonable choice for genetic selection purposes in Corriedale sheep. © 2010 Blackwell Verlag GmbH.
Streamflow prediction using multi-site rainfall obtained from hydroclimatic teleconnection
NASA Astrophysics Data System (ADS)
Kashid, S. S.; Ghosh, Subimal; Maity, Rajib
2010-12-01
Summary: Simultaneous variations in weather and climate over widely separated regions are commonly known as "hydroclimatic teleconnections". Through such teleconnections, rainfall and runoff patterns over continents are significantly linked with large-scale circulation patterns. Although these teleconnections exist in nature, they are very difficult to model because of their inherent complexity. Statistical techniques and Artificial Intelligence (AI) tools have gained popularity in modeling hydroclimatic teleconnections because of their ability to capture the complicated relationship between predictors (e.g., sea surface temperatures) and the predictand (e.g., rainfall). Genetic Programming is one such AI tool; its flexible functional structure allows it to capture nonlinear relationships between predictors and predictand. In the present study, gridded multi-site weekly rainfall is predicted with Genetic Programming from El Niño Southern Oscillation (ENSO) indices, Equatorial Indian Ocean Oscillation (EQUINOO) indices, Outgoing Longwave Radiation (OLR) and lagged rainfall at grid points over the catchment. The predicted rainfall is then used in a further Genetic Programming model to predict streamflow. The model is applied to weekly streamflow forecasting in the Mahanadi River, India, and shows satisfactory performance.
Poveda, Alaitz; Koivula, Robert W; Ahmad, Shafqat; Barroso, Inês; Hallmans, Göran; Johansson, Ingegerd; Renström, Frida; Franks, Paul W
2016-03-01
We compared the ability of genetic (established type 2 diabetes, fasting glucose, 2 h glucose and obesity variants) and modifiable lifestyle (diet, physical activity, smoking, alcohol and education) risk factors to predict incident type 2 diabetes and obesity in a population-based prospective cohort of 3,444 Swedish adults studied sequentially at baseline and 10 years later. Multivariable logistic regression analyses were used to assess the predictive ability of genetic and lifestyle risk factors on incident obesity and type 2 diabetes by calculating the AUC. The predictive accuracy of lifestyle risk factors was similar to that yielded by genetic information for incident type 2 diabetes (AUC 75% and 74%, respectively) and obesity (AUC 68% and 73%, respectively) in models adjusted for age, age² and sex. The addition of genetic information to the lifestyle model significantly improved the prediction of type 2 diabetes (AUC 80%; p = 0.0003) and obesity (AUC 79%; p < 0.0001) and resulted in a net reclassification improvement of 58% for type 2 diabetes and 64% for obesity. These findings illustrate that lifestyle and genetic information separately provide a similarly high degree of long-range predictive accuracy for obesity and type 2 diabetes.
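The abstract reports a net reclassification improvement (NRI) without stating whether risk categories were used. The sketch below assumes the category-free (continuous) NRI, in which any upward or downward change in predicted risk counts as reclassification; all risks are hypothetical.

```python
def continuous_nri(p_old, p_new, outcomes):
    """Category-free net reclassification improvement of a new model over an old one.

    NRI = [P(up | event) - P(down | event)] + [P(down | non-event) - P(up | non-event)],
    where "up"/"down" means the new predicted risk is higher/lower than the old one.
    """
    up_e = down_e = n_e = 0
    up_ne = down_ne = n_ne = 0
    for old, new, event in zip(p_old, p_new, outcomes):
        if event:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_ne += 1
            up_ne += new > old
            down_ne += new < old
    if n_e == 0 or n_ne == 0:
        raise ValueError("need both events and non-events")
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne


# Hypothetical 10-year risks from a lifestyle-only and a lifestyle + genetic model.
lifestyle = [0.20, 0.30, 0.40, 0.10, 0.20, 0.30]
combined = [0.30, 0.50, 0.35, 0.05, 0.10, 0.35]
incident = [1, 1, 1, 0, 0, 0]
print(continuous_nri(lifestyle, combined, incident))
```

Events should move up and non-events down; each component rewards correct moves and penalizes wrong ones, so the total ranges from -2 to 2.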
Burghardt, Liana T; Metcalf, C Jessica E; Wilczek, Amity M; Schmitt, Johanna; Donohue, Kathleen
2015-02-01
Organisms develop through multiple life stages that differ in environmental tolerances. The seasonal timing, or phenology, of life-stage transitions determines the environmental conditions to which each life stage is exposed and the length of time required to complete a generation. Both environmental and genetic factors contribute to phenological variation, yet predicting their combined effect on life cycles across a geographic range remains a challenge. We linked submodels of the plasticity of individual life stages to create an integrated model that predicts life-cycle phenology in complex environments. We parameterized the model for Arabidopsis thaliana and simulated life cycles in four locations. We compared multiple "genotypes" by varying two parameters associated with natural genetic variation in phenology: seed dormancy and floral repression. The model predicted variation in life cycles across locations that qualitatively matches observed natural phenology. Seed dormancy had larger effects on life-cycle length than floral repression, and results suggest that a genetic cline in dormancy maintains a life-cycle length of 1 year across the geographic range of this species. By integrating across life stages, this approach demonstrates how genetic variation in one transition can influence subsequent transitions and the geographic distribution of life cycles more generally.
A test of genetic models for the evolutionary maintenance of same-sex sexual behaviour.
Hoskins, Jessica L; Ritchie, Michael G; Bailey, Nathan W
2015-06-22
The evolutionary maintenance of same-sex sexual behaviour (SSB) has received increasing attention because it is perceived to be an evolutionary paradox. The genetic basis of SSB is almost wholly unknown in non-human animals, though this is key to understanding its persistence. Recent theoretical work has yielded broadly applicable predictions centred on two genetic models for SSB: overdominance and sexual antagonism. Using Drosophila melanogaster, we assayed natural genetic variation for male SSB and empirically tested predictions about the mode of inheritance and fitness consequences of alleles influencing its expression. We screened 50 inbred lines derived from a wild population for male-male courtship and copulation behaviour, and examined crosses between the lines for evidence of overdominance and antagonistic fecundity selection. Consistent variation among lines revealed heritable genetic variation for SSB, but the nature of the genetic variation was complex. Phenotypic and fitness variation was consistent with expectations under overdominance, although predictions of the sexual antagonism model were also supported. We found an unexpected and strong paternal effect on the expression of SSB, suggesting possible Y-linkage of the trait. Our results inform evolutionary genetic mechanisms that might maintain low but persistently observed levels of male SSB in D. melanogaster, but highlight a need for broader taxonomic representation in studies of its evolutionary causes. © 2015 The Author(s) Published by the Royal Society. All rights reserved.
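Overdominance maintains genetic variation because heterozygotes are the fittest genotype. The textbook one-locus sketch below is standard population genetics, not the study's own analysis: it uses hypothetical fitness costs s and t for the two homozygotes, assumes random mating, and shows the allele frequency converging to t / (s + t) from either side. The sexual antagonism model would instead require sex-specific fitnesses, which are not shown.

```python
def next_allele_freq(p, s, t):
    """One generation of viability selection at a single biallelic locus.

    Genotype fitnesses (hypothetical): AA = 1 - s, Aa = 1, aa = 1 - t, with s, t > 0
    (overdominance). p is the frequency of allele A; random mating is assumed.
    """
    q = 1.0 - p
    mean_fitness = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    return (p * p * (1 - s) + p * q) / mean_fitness


def simulate(p0, s, t, generations):
    """Iterate selection for a fixed number of generations from frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_allele_freq(p, s, t)
    return p


s, t = 0.1, 0.3
p_eq = t / (s + t)  # analytical equilibrium under overdominance
print(simulate(0.01, s, t, 1000), p_eq)  # a rare allele rises to p_eq
print(simulate(0.99, s, t, 1000), p_eq)  # a common allele falls to p_eq
```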
Disease Modeling via Large-Scale Network Analysis
2015-05-20
A central goal of genetics is to learn how the genotype of an organism determines its phenotype. We address the implicit problem of predicting the association of genes with ... guarantees for the methods. In the past, we have developed predictive methods general enough to apply to potentially any genetic trait, varying from ...
Genetic interactions for heat stress and production level: predicting foreign from domestic data
USDA-ARS's Scientific Manuscript database
Genotype-by-environment interactions were estimated from U.S. national data by separately adding random regressions for heat stress (HS) and herd production level (HL) to the all-breed animal model to improve predictions of future records and rankings in other climate and production situations. Yie...
Prediction of gene expression with cis-SNPs using mixed models and regularization methods.
Zeng, Ping; Zhou, Xiang; Huang, Shuiping
2017-05-11
It has been shown that gene expression in human tissues is heritable, so it is possible to predict gene expression from SNPs alone. The prediction of gene expression can offer important insights into the genetic architecture of individual functionally associated SNPs and further interpretation of the molecular basis underlying human diseases. We compared three types of methods for predicting gene expression using only cis-SNPs: the polygenic model, i.e. the linear mixed model (LMM); two sparse models, i.e. Lasso and the elastic net (ENET); and a hybrid of LMM and sparse models, i.e. the Bayesian sparse linear mixed model (BSLMM). The three kinds of prediction methods make very different assumptions about the underlying genetic architecture. These methods were evaluated using simulations under various scenarios and applied to the Geuvadis gene expression data. The simulations showed that each of the four prediction methods (i.e. Lasso, ENET, LMM and BSLMM) performed best when its own modeling assumptions were satisfied, but BSLMM performed robustly across a range of scenarios. In terms of the R² of these models in the Geuvadis data, the four methods performed quite similarly. We did not observe any clustering or enrichment of predictive genes (defined as genes with R² ≥ 0.05) across the chromosomes, nor any clear relationship between the proportion of predictive genes and the proportion of genes on each chromosome. However, an interesting finding in the Geuvadis data was that highly predictive genes (e.g. R² ≥ 0.30) may have sparse genetic architectures, since Lasso, ENET and BSLMM outperformed LMM for these genes; this observation was validated in another gene expression data set. We further showed that the predictive genes were enriched in approximately independent LD blocks.
Gene expression can be predicted from cis-SNPs alone using well-developed prediction models, and the predictive genes are enriched in some approximately independent LD blocks. The prediction of gene expression can shed some light on the functional interpretation of SNPs identified in GWASs.
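Of the methods compared, the Lasso is the simplest to write down. The sketch below fits it by cyclic coordinate descent, one common algorithm; it is not the software used in the study, the penalty value and genotypes are hypothetical, and LMM and BSLMM are not shown.

```python
def soft_threshold(x, lam):
    """Shrink x toward zero by lam, returning exactly 0 when |x| <= lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Lasso by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 with no intercept,
    so y and every column of X (e.g. cis-SNP genotypes) should be centred first.
    """
    n, m = len(X), len(X[0])
    beta = [0.0] * m
    resid = list(y)  # y - X @ beta, kept up to date after every coordinate move
    col_ms = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(m)]
    for _ in range(n_iter):
        for j in range(m):
            if col_ms[j] == 0:
                continue  # monomorphic SNP: coefficient stays 0
            # Inner product of SNP j with the partial residual that ignores SNP j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_ms[j]
            delta = new - beta[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Hypothetical centred genotypes for two cis-SNPs; expression driven by SNP 1 only.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]
print(lasso_cd(X, y, lam=0.5))  # SNP 2 is shrunk exactly to zero
```

The elastic net changes only the update: with penalty lam * (alpha * |b| + (1 - alpha) / 2 * b**2), the new coefficient becomes soft_threshold(rho, lam * alpha) / (col_ms[j] + lam * (1 - alpha)).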
Scribner, Kim T.; Lowe, Winsor H.; Landguth, Erin L.; Luikart, Gordon; Infante, Dana M.; Whelan, Gary; Muhlfeld, Clint C.
2015-01-01
Environmental variation and landscape features affect ecological processes in fluvial systems; however, assessing effects at management-relevant temporal and spatial scales is challenging. Genetic data can be used with landscape models and traditional ecological assessment data to identify biodiversity hotspots, predict ecosystem responses to anthropogenic effects, and detect impairments to underlying processes. We show that by combining taxonomic, demographic, and genetic data of species in complex riverscapes, managers can better understand the spatial and temporal scales over which environmental processes and disturbance influence biodiversity. We describe how population genetic models using empirical or simulated genetic data quantify effects of environmental processes affecting species diversity and distribution. Our summary shows that aquatic assessment initiatives that use standardized data sets to direct management actions can benefit from integration of genetic data to improve the predictability of disturbance–response relationships of river fishes and their habitats over a broad range of spatial and temporal scales.
A predictive relationship between population and genetic sex ratios in clonal species
NASA Astrophysics Data System (ADS)
McLetchie, D. Nicholas; García-Ramos, Gisela
2017-04-01
Sexual reproduction depends on mate availability, which is reflected by local sex ratios. In species where both sexes can clonally expand, the population sex ratio describes the proportion of males, including clonally derived individuals (ramets) in addition to sexually produced individuals (genets). In contrast to the population sex ratio, which accounts for the overall abundance of the sexes, the genetic sex ratio reflects the relative abundance of genetically unique mates, which is critical in predicting effective population size but is difficult to estimate in the field. While an intuitive positive relationship between the population (ramet) sex ratio and the genetic (genet) sex ratio is expected, an explicit relationship is unknown. In this study, we derived a mathematical expression in the form of a hyperbola that encompasses linear to nonlinear positive relationships between ramet and genet sex ratios. As expected, when both sexes have an equal number of ramets per genet, the two sex ratios are identical, and the ramet sex ratio is a linear function of the genet sex ratio. Conversely, if the sexes differ in ramet number, the relationship becomes nonlinear, and the discrepancy between the two sex ratios grows from extreme sex-ratio values toward intermediate values. We evaluated our predictions with empirical data that simultaneously quantified ramet and genet sex ratios in populations of several species. We found that the data support the predicted positive nonlinear relationship, indicating sex differences in ramet number across populations. However, some data may also fit the null model, which suggests that sex differences in ramet number were not extensive, or that the number of populations was too small to capture the curvature of the nonlinear relationship. Data showing lack of fit suggest the presence of factors capable of weakening the positive relationship between the sex ratios. 
Advantages of this model include predicting genet sex ratio using population sex ratios given known sex differences in ramet number, and detecting sex differences in ramet number among populations.
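To illustrate why a hyperbola arises, suppose (our assumption, not necessarily the authors' exact parameterization) that male and female genets carry fixed average numbers of ramets, r_m and r_f. If g is the proportion of male genets, the proportion of male ramets is R = g·r_m / (g·r_m + (1 − g)·r_f), a rectangular hyperbola in g that collapses to R = g when r_m = r_f. The sketch below computes R and inverts it to recover g from an observed ramet sex ratio; the function names are hypothetical.

```python
def ramet_sex_ratio(genet_ratio: float, ramets_per_male: float, ramets_per_female: float) -> float:
    """Proportion of male ramets given the proportion of male genets.

    Assumes each sex has a fixed average number of ramets per genet.
    """
    male = genet_ratio * ramets_per_male
    female = (1.0 - genet_ratio) * ramets_per_female
    return male / (male + female)


def genet_sex_ratio(ramet_ratio: float, ramets_per_male: float, ramets_per_female: float) -> float:
    """Inverse relationship: proportion of male genets from the ramet sex ratio.

    Male genets are proportional to R / r_m and female genets to (1 - R) / r_f.
    """
    male = ramet_ratio / ramets_per_male
    female = (1.0 - ramet_ratio) / ramets_per_female
    return male / (male + female)
```

With r_m = 3 and r_f = 1, a genet sex ratio of 0.5 gives a ramet sex ratio of 0.75, while at g = 0 or g = 1 the two ratios coincide, matching the text's observation that the discrepancy is largest at intermediate values.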
Wang, Junbai; Wu, Qianqian; Hu, Xiaohua Tony; Tian, Tianhai
2016-11-01
Investigating the dynamics of genetic regulatory networks through high throughput experimental data, such as microarray gene expression profiles, is a very important but challenging task. One of the major hindrances in building detailed mathematical models for genetic regulation is the large number of unknown model parameters. To tackle this challenge, a new integrated method is proposed by combining a top-down approach and a bottom-up approach. First, the top-down approach uses probabilistic graphical models to predict the network structure of DNA repair pathway that is regulated by the p53 protein. Two networks are predicted, namely a network of eight genes with eight inferred interactions and an extended network of 21 genes with 17 interactions. Then, the bottom-up approach using differential equation models is developed to study the detailed genetic regulations based on either a fully connected regulatory network or a gene network obtained by the top-down approach. Model simulation error, parameter identifiability and robustness property are used as criteria to select the optimal network. Simulation results together with permutation tests of input gene network structures indicate that the prediction accuracy and robustness property of the two predicted networks using the top-down approach are better than those of the corresponding fully connected networks. In particular, the proposed approach reduces computational cost significantly for inferring model parameters. Overall, the new integrated method is a promising approach for investigating the dynamics of genetic regulation. Copyright © 2016 Elsevier Inc. All rights reserved.
Optimization of multi-environment trials for genomic selection based on crop models.
Rincent, R; Kuhn, E; Monod, H; Oury, F-X; Rousset, M; Allard, V; Le Gouis, J
2017-08-01
We propose a statistical criterion to optimize multi-environment trials for predicting genotype × environment interactions more efficiently, by combining crop growth models and genomic selection models. Genotype × environment interactions (GEI) are common in plant multi-environment trials (METs). In this context, models developed for genomic selection (GS), which refers to the use of genome-wide information to predict the breeding values of selection candidates, need to be adapted. One promising way to increase prediction accuracy in various environments is to combine ecophysiological and genetic modelling through crop growth models (CGM) that incorporate genetic parameters. The efficiency of this approach relies on the quality of the parameter estimates, which depends on the environments composing the MET used for calibration. The objective of this study was to determine a method to optimize the set of environments composing the MET for estimating genetic parameters in this context. A criterion called OptiMET was defined for this purpose and was evaluated on simulated and real data, using wheat phenology as an example. The MET defined with OptiMET allowed the genetic parameters to be estimated with lower error, leading to higher QTL detection power and higher prediction accuracies. In terms of the quality of the parameter estimates, an MET defined with OptiMET was on average more efficient than a random MET composed of twice as many environments. OptiMET is thus a valuable tool for determining optimal experimental conditions to best exploit METs and the phenotyping tools currently being developed.
Endelman, Jeffrey B; Carley, Cari A Schmitz; Bethke, Paul C; Coombs, Joseph J; Clough, Mark E; da Silva, Washington L; De Jong, Walter S; Douches, David S; Frederick, Curtis M; Haynes, Kathleen G; Holm, David G; Miller, J Creighton; Muñoz, Patricio R; Navarro, Felix M; Novy, Richard G; Palta, Jiwan P; Porter, Gregory A; Rak, Kyle T; Sathuvalli, Vidyasagar R; Thompson, Asunta L; Yencho, G Craig
2018-05-01
As one of the world's most important food crops, the potato (Solanum tuberosum L.) has spurred innovation in autotetraploid genetics, including in the use of SNP arrays to determine allele dosage at thousands of markers. By combining genotype and pedigree information with phenotype data for economically important traits, the objectives of this study were to (1) partition the genetic variance into additive vs. nonadditive components, and (2) determine the accuracy of genome-wide prediction. Between 2012 and 2017, a training population of 571 clones was evaluated for total yield, specific gravity, and chip fry color. Genomic covariance matrices for additive (G), digenic dominant (D), and additive × additive epistatic (G#G) effects were calculated using 3895 markers, and the numerator relationship matrix (A) was calculated from a 13-generation pedigree. Based on model fit and prediction accuracy, mixed model analysis with G was superior to A for yield and fry color but not specific gravity. The amount of additive genetic variance captured by markers was 20% of the total genetic variance for specific gravity, compared to 45% for yield and fry color. Within the training population, including nonadditive effects improved accuracy and/or bias for all three traits when predicting total genotypic value. When six F1 populations were used for validation, prediction accuracy ranged from 0.06 to 0.63 and was consistently lower (0.13 on average) without allele dosage information. We conclude that genome-wide prediction is feasible in potato and that it will improve selection for breeding value given the substantial amount of nonadditive genetic variance in elite germplasm. Copyright © 2018 by the Genetics Society of America.
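A genomic additive matrix G can be built from allele dosages, and G#G can be read as its elementwise (Hadamard) product. The sketch below uses a VanRaden-style construction generalized to ploidy, with markers centered at their expected dosage and scaled by ploidy × Σ p(1 − p). This scaling is our assumption (it follows from binomial dosage under random chromosome segregation) and is not necessarily the exact one used in the study. The digenic dominance matrix D is not shown.

```python
import numpy as np


def additive_grm(dosage, ploidy: int = 4) -> np.ndarray:
    """VanRaden-style additive genomic relationship matrix from allele dosages.

    dosage: n_individuals x n_markers array with values in 0..ploidy.
    Assumed scaling: ploidy * sum(p * (1 - p)), i.e. the summed dosage
    variances under random chromosome segregation.
    """
    dosage = np.asarray(dosage, dtype=float)
    p = dosage.mean(axis=0) / ploidy          # estimated allele frequency per marker
    W = dosage - ploidy * p                    # center each marker at its expected dosage
    denom = ploidy * np.sum(p * (1.0 - p))
    return (W @ W.T) / denom


def additive_by_additive_grm(G: np.ndarray) -> np.ndarray:
    """Additive x additive epistatic matrix, read as the Hadamard product G#G."""
    return G * G
```

Because markers are centered using frequencies estimated from the same sample, G is relative to that population; with a larger, more representative reference set, estimated frequencies would change the scale slightly.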
Vandenberghe, Frederik; Saigí-Morgui, Núria; Delacrétaz, Aurélie; Quteineh, Lina; Crettol, Séverine; Ansermot, Nicolas; Gholam-Rezaee, Mehdi; von Gunten, Armin; Conus, Philippe; Eap, Chin B
2016-12-01
Psychotropic drugs can induce significant (>5%) weight gain (WG) after only 1 month of treatment, which is a good predictor of major WG at 3 and 12 months. The large interindividual variability of drug-induced WG can be explained in part by genetic and clinical factors. The aim of this study was to determine whether extensive analysis of genes, in addition to clinical factors, can improve prediction of patients at risk of more than 5% WG at 1 month of treatment. Data were obtained from a 1-year naturalistic longitudinal study, with weight monitoring during treatment with psychotropic drugs known to induce weight gain. A total of 248 Caucasian psychiatric patients with at least baseline and 1-month weight measures and with ascertained compliance were included. Results were tested for replication in a second cohort of 32 patients. Age and baseline BMI were significantly associated with strong WG. The area under the curve (AUC) of the final model including genetic (18 genes) and clinical variables was significantly greater than that of the model including clinical variables only (AUCfinal: 0.92, AUCclinical: 0.75, P<0.0001). Prediction accuracy increased by 17% with genetic markers (Accuracyfinal: 87%), indicating that six patients must be genotyped to avoid one misclassified patient. The validity of the final model was confirmed in the replication cohort. Patients predicted before treatment to have more than 5% WG after 1 month of treatment had 4.4% more WG over 1 year than patients predicted to have up to 5% WG (P≤0.0001). These results may help to implement genetic testing before starting psychotropic drug treatment to identify patients at risk of substantial WG.
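The "six patients" figure follows from a number-needed-to-genotype calculation: one over the absolute gain in classification accuracy, rounded up. The sketch assumes the 17% is an absolute gain in percentage points (about 70% without genetics to 87% with genetics); the 70% baseline is implied by the text rather than stated, and the function name is hypothetical.

```python
import math


def number_needed_to_genotype(accuracy_without: float, accuracy_with: float) -> int:
    """Patients to genotype to avoid one misclassification: ceil(1 / absolute accuracy gain).

    Assumes accuracies are proportions and the gain is absolute (percentage points).
    """
    gain = accuracy_with - accuracy_without
    if gain <= 0:
        raise ValueError("genotyping must improve accuracy for the ratio to be meaningful")
    return math.ceil(1.0 / gain)
```

Here 1 / 0.17 ≈ 5.9, which rounds up to 6, so `number_needed_to_genotype(0.70, 0.87)` returns 6.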
Practical implications for genetic modeling in the genomics era for the dairy industry
USDA-ARS's Scientific Manuscript database
Genetic models convert data into estimated breeding values and other information useful to breeders. The goal is to provide accurate and timely predictions of the future performance for each animal (or embryo). Modeling involves defining traits, editing raw data, removing environmental effects, incl...
Vanderick, S; Troch, T; Gillon, A; Glorieux, G; Gengler, N
2014-12-01
Calving ease scores from Holstein dairy cattle in the Walloon Region of Belgium were analysed using univariate linear and threshold animal models. Variance components and derived genetic parameters were estimated from a data set including 33,155 calving records. Included in the models were season, herd and sex of calf × age of dam classes × group of calvings interaction as fixed effects, herd × year of calving, maternal permanent environment and animal direct and maternal additive genetic as random effects. Models were fitted with the genetic correlation between direct and maternal additive genetic effects either estimated or constrained to zero. Direct heritability for calving ease was approximately 8% with linear models and approximately 12% with threshold models. Maternal heritabilities were approximately 2 and 4%, respectively. The genetic correlation between direct and maternal additive effects was not significantly different from zero. Models were compared in terms of goodness of fit and predictive ability. Comparison criteria, such as mean squared error and the correlations between observed and predicted calving ease scores and between estimated breeding values, were estimated from 85,118 calving records. The results showed few differences between linear and threshold models, even though correlations between estimated breeding values from subsets of data for sires with progeny were 17 and 23% greater with the linear model than with the threshold model for direct and maternal genetic effects, respectively. For the purpose of genetic evaluation for calving ease in Walloon Holstein dairy cattle, the linear animal model without covariance between direct and maternal additive effects was found to be the best choice. © 2014 Blackwell Verlag GmbH.
Rohde, Palle Duun; Gaertner, Bryn; Ward, Kirsty; Sørensen, Peter; Mackay, Trudy F C
2017-08-01
Human psychiatric disorders such as schizophrenia, bipolar disorder, and attention-deficit/hyperactivity disorder often include adverse behaviors such as increased aggressiveness. Individuals with psychiatric disorders often exhibit social withdrawal, which can further increase the probability of committing a violent act. Here, we used the inbred, sequenced lines of the Drosophila Genetic Reference Panel (DGRP) to investigate the genetic basis of variation in male aggressive behavior for flies reared in a socialized and a socially isolated environment. We identified genetic variation for aggressive behavior, as well as significant genotype-by-social environmental interaction (GSEI), i.e., variation among DGRP genotypes in the degree to which social isolation affected aggression. We performed genome-wide association (GWA) analyses to identify genetic variants associated with aggression within each environment. We used genomic prediction to partition genetic variants into gene ontology (GO) terms and constituent genes, and identified GO terms and genes with high prediction accuracies in both social environments and for GSEI. The top predictive GO terms significantly increased the proportion of variance explained, compared to prediction models based on all segregating variants. We performed genomic prediction across environments, and identified genes in common between the social environments that turned out to be enriched for genome-wide associated variants. A large proportion of the associated genes have previously been associated with aggressive behavior in Drosophila and mice. Further, many of these genes have human orthologs that have been associated with neurological disorders, indicating partially shared genetic mechanisms underlying aggression in animal models and human psychiatric disorders. Copyright © 2017 by the Genetics Society of America.
Predicting Gene Structure Changes Resulting from Genetic Variants via Exon Definition Features.
Majoros, William H; Holt, Carson; Campbell, Michael S; Ware, Doreen; Yandell, Mark; Reddy, Timothy E
2018-04-25
Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuable for investigating the mechanisms underlying those traits and diseases. While methods have been developed to generate high quality computational predictions of gene structures in reference genomes, the same methods perform poorly when used to predict the potentially deleterious effects of genetic changes that alter gene splicing between individuals. Underlying that discrepancy in predictive ability are the common assumptions by reference gene finding algorithms that genes are conserved, well-formed, and produce functional proteins. We describe a probabilistic approach for predicting recent changes to gene structure that may or may not conserve function. The model is applicable to both coding and noncoding genes, and can be trained on existing gene annotations without requiring curated examples of aberrant splicing. We apply this model to the problem of predicting altered splicing patterns in the genomes of individual humans, and we demonstrate that performing gene-structure prediction without relying on conserved coding features is feasible. The model predicts an unexpected abundance of variants that create de novo splice sites, an observation supported by both simulations and empirical data from RNA-seq experiments. While these de novo splice variants are commonly misinterpreted by other tools as coding or noncoding variants of little or no effect, we find that in some cases they can have large effects on splicing activity and protein products, and we propose that they may commonly act as cryptic factors in disease. The software is available from geneprediction.org/SGRF. bmajoros@duke.edu. Supplementary information is available at Bioinformatics online.
Genomic Model with Correlation Between Additive and Dominance Effects.
Xiang, Tao; Christensen, Ole Fredslund; Vitezica, Zulma Gladis; Legarra, Andres
2018-05-09
Dominance genetic effects are rarely included in pedigree-based genetic evaluation. With the availability of single nucleotide polymorphism markers and the development of genomic evaluation, estimating dominance genetic effects has become feasible using genomic best linear unbiased prediction (GBLUP). Usually, studies involving additive and dominance genetic effects ignore possible relationships between them. It has often been suggested that the magnitudes of functional additive and dominance effects at quantitative trait loci are related, but there is no existing GBLUP-like approach accounting for such a correlation. Wellmann and Bennewitz showed two ways of considering directional relationships between additive and dominance effects, which they estimated in a Bayesian framework. However, these relationships cannot be fitted at the level of individuals instead of loci in a mixed model, and they are not compatible with standard animal or plant breeding software. This comes from a fundamental ambiguity in assigning the reference allele at a given locus. We show that, if there has been selection, assigning the most frequent allele as the reference allele orients the correlation between functional additive and dominance effects. As a consequence, the most frequent reference allele is expected to have a positive value. We also demonstrate that selection creates negative covariance between genotypic additive and dominance genetic values. For parameter estimation, it is possible to use a combined additive and dominance relationship matrix computed from marker genotypes, and to use standard restricted maximum likelihood (REML) algorithms based on an equivalent model. Through a simulation study, we show that such correlations can easily be estimated by mixed model software and that the accuracy of prediction for genetic values is slightly improved if such correlations are used in GBLUP. 
However, a model assuming uncorrelated effects and fitting orthogonal breeding values and dominance deviations performed similarly for prediction. Copyright © 2018, Genetics.
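Two ingredients mentioned above can be sketched directly: orienting each marker so that the counted (reference) allele is the most frequent one, and building a dominance relationship matrix for dominance deviations that are orthogonal to breeding values. The coding used here (2 copies: −2q², 1 copy: 2pq, 0 copies: −2p², scaled by Σ(2pq)²) is the commonly used "breeding values and dominance deviations" parametrization; treating it as the one behind the uncorrelated model in the text is our assumption. The sketch does not estimate the additive-dominance correlation itself, and the function names are hypothetical.

```python
import numpy as np


def orient_to_major_allele(X) -> np.ndarray:
    """Recode 0/1/2 genotypes so the counted (reference) allele is the most frequent."""
    X = np.asarray(X, dtype=float).copy()
    p = X.mean(axis=0) / 2.0
    minor = p < 0.5
    X[:, minor] = 2.0 - X[:, minor]   # flip markers where the counted allele is the minor one
    return X


def dominance_grm(X) -> np.ndarray:
    """Dominance relationship matrix for dominance deviations (assumed coding).

    Genotype = copies of the reference allele with frequency p, q = 1 - p:
    2 copies -> -2q^2, 1 copy -> 2pq, 0 copies -> -2p^2.
    """
    X = np.asarray(X, dtype=float)
    p = X.mean(axis=0) / 2.0
    q = 1.0 - p
    H = np.where(X == 2, -2.0 * q**2, np.where(X == 1, 2.0 * p * q, -2.0 * p**2))
    return (H @ H.T) / np.sum((2.0 * p * q) ** 2)
```

Orientation does not change this dominance matrix (the coding is symmetric under swapping alleles), but it fixes the sign convention of additive effects, which is what the text argues makes a directional additive-dominance correlation estimable.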
Direct and indirect genetic and fine-scale location effects on breeding date in song sparrows.
Germain, Ryan R; Wolak, Matthew E; Arcese, Peter; Losdat, Sylvain; Reid, Jane M
2016-11-01
Quantifying direct and indirect genetic effects of interacting females and males on variation in jointly expressed life-history traits is central to predicting microevolutionary dynamics. However, accurately estimating sex-specific additive genetic variances in such traits remains difficult in wild populations, especially if related individuals inhabit similar fine-scale environments. Breeding date is a key life-history trait that responds to environmental phenology and mediates individual and population responses to environmental change. However, no studies have estimated female (direct) and male (indirect) additive genetic and inbreeding effects on breeding date, and estimated the cross-sex genetic correlation, while simultaneously accounting for fine-scale environmental effects of breeding locations, impeding prediction of microevolutionary dynamics. We fitted animal models to 38 years of song sparrow (Melospiza melodia) phenology and pedigree data to estimate sex-specific additive genetic variances in breeding date, and the cross-sex genetic correlation, thereby estimating the total additive genetic variance while simultaneously estimating sex-specific inbreeding depression. We further fitted three forms of spatial animal model to explicitly estimate variance in breeding date attributable to breeding location, overlap among breeding locations and spatial autocorrelation. We thereby quantified fine-scale location variances in breeding date and quantified the degree to which estimating such variances affected the estimated additive genetic variances. The non-spatial animal model estimated nonzero female and male additive genetic variances in breeding date (sex-specific heritabilities: 0·07 and 0·02, respectively) and a strong, positive cross-sex genetic correlation (0·99), creating substantial total additive genetic variance (0·18). Breeding date varied with female, but not male inbreeding coefficient, revealing direct, but not indirect, inbreeding depression. 
All three spatial animal models estimated small location variance in breeding date, but because relatedness and breeding location were virtually uncorrelated, modelling location variance did not alter the estimated additive genetic variances. Our results show that sex-specific additive genetic effects on breeding date can be strongly positively correlated, which would affect any predicted rates of microevolutionary change in response to sexually antagonistic or congruent selection. Further, we show that inbreeding effects on breeding date can also be sex specific and that genetic effects can exceed phenotypic variation stemming from fine-scale location-based variation within a wild population. © 2016 The Authors. Journal of Animal Ecology © 2016 British Ecological Society.
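For a trait expressed jointly by a female and her mate, the total additive genetic variance combines the female (direct) and male (indirect) additive variances with their covariance, i.e. the variance of a sum of two correlated effects: V_f + V_m + 2·r_fm·√(V_f·V_m). The sketch below assumes this standard form; the abstract reports heritabilities rather than the underlying variances, so the example values are hypothetical and do not reproduce the reported 0.18.

```python
import math


def total_additive_variance(var_female: float, var_male: float, r_fm: float) -> float:
    """Variance of summed female (direct) and male (indirect) additive effects.

    Assumed form: Var(a_f + a_m) = V_f + V_m + 2 * r_fm * sqrt(V_f * V_m).
    """
    if not -1.0 <= r_fm <= 1.0:
        raise ValueError("a correlation must lie in [-1, 1]")
    cov = r_fm * math.sqrt(var_female * var_male)
    return var_female + var_male + 2.0 * cov
```

With hypothetical variances V_f = 0.10 and V_m = 0.05, a cross-sex correlation of 0.99 nearly doubles the total relative to r_fm = 0, which is why a strong positive correlation creates substantial total additive variance even when the male component is small.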
Prediction of body lipid change in pregnancy and lactation.
Friggens, N C; Ingvartsen, K L; Emmans, G C
2004-04-01
A simple method to predict the genetically driven pattern of body lipid change through pregnancy and lactation in dairy cattle is proposed. The rationale and evidence for genetically driven body lipid change are based on evolutionary considerations and on the homeorhetic changes in lipid metabolism through the reproductive cycle. The inputs required to predict body lipid change are body lipid mass at calving (kg) and the date of conception (days in milk). Body lipid mass can be derived from body condition score and live weight. A key assumption is that the rate of body lipid change (dL/dt) changes linearly between calving and a genetically determined time in lactation (T') at which a particular level of body lipid (L') is sought. A second assumption is that dL/dt also changes linearly between T' and the next calving. The resulting model was evaluated using 2 sets of data. The first was from Holstein cows with 3 different levels of body fatness at calving. The second was from Jersey cows in first, second, and third parity. The model was found to reproduce the observed patterns of change in body lipid reserves through lactation in both data sets. The average error of prediction was low, less than the variation normally associated with the recording of condition score, and was similar for the 2 data sets. When the model was applied using the initially suggested parameter values derived from the literature, the average error of prediction was 0.185 units of condition score (± 0.086 SD). After minor adjustments to the parameter values, the average error of prediction was 0.118 units of condition score (± 0.070 SD). The assumptions on which the model is based were sufficient to predict the changes in body lipid of both Holstein and Jersey cows under different nutritional conditions and parities. 
Thus, the model presented here shows that it is possible to predict genetically driven curves of body lipid change through lactation in a simple way that requires few parameters and inputs that can be derived in practice. It is expected that prediction of the cow's energy requirements can be substantially improved, particularly in early lactation, by incorporating a genetically driven body energy mobilization.
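The first phase of the model (calving to T') can be written in closed form if we add one assumption not stated in the abstract: that dL/dt reaches zero at T', so body lipid levels off at the target L'. Then dL/dt = a·(T' − t), and integrating from calving gives L(t) = L0 + a·(T'·t − t²/2), with a = 2·(L' − L0)/T'² so that L(T') = L'. The sketch below implements only this phase; the inputs are hypothetical, and the phase from T' to the next calving is not shown.

```python
def lipid_rate_slope(lipid_calving: float, lipid_target: float, t_target: float) -> float:
    """Slope a of dL/dt = a * (T' - t), chosen so that L(T') equals L'.

    Assumes dL/dt falls linearly to zero at T' (our assumption).
    """
    return 2.0 * (lipid_target - lipid_calving) / t_target**2


def body_lipid(t: float, lipid_calving: float, lipid_target: float, t_target: float) -> float:
    """Body lipid (kg) at day t of lactation, valid for 0 <= t <= T'."""
    if not 0.0 <= t <= t_target:
        raise ValueError("this sketch only covers calving to T'")
    a = lipid_rate_slope(lipid_calving, lipid_target, t_target)
    # Integrating dL/dt = a * (T' - t) from 0 to t gives a * (T' * t - t^2 / 2).
    return lipid_calving + a * (t_target * t - 0.5 * t * t)
```

For a hypothetical cow with 100 kg of body lipid at calving, a target L' of 80 kg and T' = 100 days, the sketch predicts the fastest mobilization (0.4 kg/day) right after calving and about 85 kg of body lipid at day 50.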
An alternative covariance estimator to investigate genetic heterogeneity in populations
USDA-ARS's Scientific Manuscript database
Genomic predictions and GWAS have used mixed models for identification of associations and trait predictions. In both cases, the covariance between individuals for performance is estimated using molecular markers. Mixed model properties indicate that the use of the data for prediction is optimal if ...
Da, Yang
2015-12-18
The amount of functional genomic information has been growing rapidly but remains largely unused in genomic selection. Genomic prediction and estimation using haplotypes in genome regions with functional elements, such as all genes of the genome, can be an approach to integrate functional and structural genomic information for genomic selection. Towards this goal, this article develops a new haplotype approach for genomic prediction and estimation. A multi-allelic haplotype model treating each haplotype as an 'allele' was developed for genomic prediction and estimation based on the partition of a multi-allelic genotypic value into additive and dominance values. Each additive value is expressed as a function of h - 1 additive effects, where h = number of alleles or haplotypes, and each dominance value is expressed as a function of h(h - 1)/2 dominance effects. For a sample of q individuals, the maximum number of effects is 2q - 1 for additive effects and the number of heterozygous genotypes for dominance effects. Additive values are factorized as a product of the additive model matrix and the h - 1 additive effects, and dominance values are factorized as a product of the dominance model matrix and the h(h - 1)/2 dominance effects. The genomic additive relationship matrix is defined as a function of the haplotype model matrix for additive effects, and the genomic dominance relationship matrix is defined as a function of the haplotype model matrix for dominance effects. Based on these results, a mixed model implementation for genomic prediction and variance component estimation that jointly uses haplotypes and single markers is established, including two computing strategies for genomic prediction and variance component estimation with identical results. 
The multi-allelic genetic partition fills a theoretical gap in genetic partition by providing general formulations for partitioning multi-allelic genotypic values and provides a haplotype method based on the quantitative genetics model towards the utilization of functional and structural genomic information for genomic prediction and estimation.
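The effect counts in this haplotype model follow from simple arithmetic on the observed diplotypes: h − 1 additive effects and at most h(h − 1)/2 dominance effects, where the dominance count in a sample is bounded by the distinct heterozygous genotypes actually observed. Since q individuals carry at most 2q haplotypes, the additive limit is 2q − 1. The sketch below counts these quantities for one locus; the input format (one pair of haplotype labels per individual) and the function name are hypothetical.

```python
def haplotype_effect_counts(diplotypes):
    """Effect counts for one multi-allelic haplotype locus.

    diplotypes: one (haplotype, haplotype) pair per individual (hypothetical format).
    Returns (h, additive effects h - 1, possible dominance effects h(h - 1)/2,
    distinct heterozygous genotypes observed in the sample).
    """
    haplotypes = {hap for pair in diplotypes for hap in pair}
    h = len(haplotypes)
    # An unordered pair of different haplotypes is one heterozygous genotype.
    heterozygous = {frozenset(pair) for pair in diplotypes if pair[0] != pair[1]}
    return h, h - 1, h * (h - 1) // 2, len(heterozygous)
```

For example, four individuals carrying A/B, B/A, C/C and A/C have h = 3 haplotypes, 2 additive effects and 3 possible dominance effects, but only 2 distinct heterozygous genotypes (AB and AC), so only 2 dominance effects are estimable in that sample.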
Abe, Makiko; Ito, Hidemi; Oze, Isao; Nomura, Masatoshi; Ogawa, Yoshihiro; Matsuo, Keitaro
2017-12-01
Many genetic variants associated with colorectal cancer (CRC) have been identified; however, little is known about differences in genetic predisposition to CRC between ethnic groups. This study investigated whether adding SNPs identified in GWASs of East Asian populations could improve CRC risk prediction in Japanese individuals, and explored the possible use of genetic risk groups as an instrument for risk communication. A total of 558 patients with histologically verified colorectal cancer and 1116 first-visit outpatients were included in the derivation study, and 547 cases and 547 controls were included in the replication study. In each population, we evaluated a CRC risk prediction model based on a genetic risk group derived from SNPs identified in GWASs of European populations, and a similarly developed model that added SNPs from GWASs of East Asian populations. We examined whether adding East Asian-specific SNPs would improve discrimination. Six SNPs (rs6983267, rs4779584, rs4444235, rs9929218, rs10936599, rs16969681) of 23 SNPs from European-based GWASs and five SNPs (rs704017, rs11196172, rs10774214, rs647161, rs2423279) of ten SNPs from Asian-based GWASs were selected for the CRC risk prediction model. Compared with a 6-SNP model, an 11-SNP model including the Asian GWAS SNPs showed improved discrimination in receiver operating characteristic analysis. The 11-SNP model gave a statistically significant improvement over the 6-SNP model in both the derivation (P = 0.0039) and replication studies (P = 0.0018). Using genetic risk groups based on the 11 SNPs, we estimated that the cumulative risk of CRC at age 80 is approximately 13% in the high-risk group and 6% in the low-risk group. We constructed a more efficient CRC risk prediction model with 11 SNPs, including newly identified East Asian-based GWAS SNPs (rs704017, rs11196172, rs10774214, rs647161, rs2423279). Risk grouping based on the 11 SNPs depicted lifetime differences in CRC risk. 
This might be useful for effective individualized prevention in East Asian populations.
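A genetic risk group is typically derived from a score that summarizes risk alleles across the selected SNPs. The sketch below uses the simplest version, an unweighted count of risk alleles (0, 1 or 2 per SNP) over the 11 SNPs listed above, followed by a single cut-off. The abstract does not state the risk allele at each SNP, whether the score was weighted, or how groups were defined, so the scoring rule and the `high_cutoff` value are hypothetical.

```python
# The 11 SNPs named in the text; risk alleles and any weights are not given there.
SNPS_11 = (
    "rs6983267", "rs4779584", "rs4444235", "rs9929218", "rs10936599", "rs16969681",
    "rs704017", "rs11196172", "rs10774214", "rs647161", "rs2423279",
)


def risk_allele_count(genotype: dict, snps=SNPS_11) -> int:
    """Unweighted genetic risk score: total risk-allele count (0, 1 or 2 per SNP).

    `genotype` maps SNP id -> number of risk alleles carried (assumed input format).
    """
    total = 0
    for snp in snps:
        dose = genotype[snp]
        if dose not in (0, 1, 2):
            raise ValueError(f"{snp}: risk-allele count must be 0, 1 or 2")
        total += dose
    return total


def risk_group(score: int, high_cutoff: int) -> str:
    """Assign a risk group; high_cutoff is a hypothetical threshold, not the study's."""
    return "high" if score >= high_cutoff else "low"
```

A weighted score (for example, summing log odds ratios per allele) is a common alternative; which form suits a given population depends on how reliably the per-SNP effects are estimated there.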
Inferring genetic interactions via a nonlinear model and an optimization algorithm.
Chen, Chung-Ming; Lee, Chih; Chuang, Cheng-Long; Wang, Chia-Chang; Shieh, Grace S
2010-02-26
Biochemical pathways are gradually becoming recognized as central to complex human diseases, and genetic/transcriptional interactions have recently been shown to be able to predict partial pathways. With the abundant information made available by microarray gene expression data (MGED), nonlinear modeling of these interactions is now feasible. Two of the latest advances in nonlinear modeling used sigmoid models to depict the transcriptional interaction of a transcription factor (TF) with a target gene, but they do not model cooperative or competitive interactions of several TFs for a target. An S-shape model and an optimization algorithm (GASA) were developed to infer genetic interactions/transcriptional regulation of several genes simultaneously using MGED. GASA consists of a genetic algorithm (GA) and a simulated annealing (SA) algorithm, which is enhanced by a steepest gradient descent algorithm to avoid being trapped in local minima. Using simulated data with various degrees of noise, we studied how GASA performed with two model selection criteria and two search spaces. Furthermore, GASA was shown to outperform network component analysis, the time series network inference algorithm (TSNI), GA with regular GA (GAGA) and GA with regular SA. Two applications are demonstrated. First, GASA is applied to infer a subnetwork of human T-cell apoptosis. Several of the predicted interactions are supported by the literature. Second, GASA was applied to infer the transcription factors of 34 cell cycle regulated targets in S. cerevisiae, and GASA performed better than one of the latest advances in nonlinear modeling, GAGA and TSNI. Moreover, GASA is able to predict multiple transcription factors for certain targets, and these results agree with experimentally confirmed data in YEASTRACT. GASA is shown to infer both genetic interactions and transcriptional regulatory interactions well. 
In particular, GASA seems able to characterize the nonlinear mechanism of transcriptional regulatory interactions (TIs) in yeast, and may be applied to infer TIs in other organisms. The predicted genetic interactions of a subnetwork of human T-cell apoptosis coincide with existing partial pathways, suggesting the potential of GASA on inferring biochemical pathways.
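To show how a single S-shaped response can capture several TFs acting on one target, the sketch below passes a weighted sum of TF levels through a logistic function. Positive weights act as activators and negative weights as repressors, so TFs with the same sign cooperate and TFs with opposite signs compete. This is a generic logistic form chosen for illustration, not the exact S-shape model of the study; fitting the weights and bias (which GASA does with GA, SA and gradient descent) is not shown, and all names are hypothetical.

```python
import math


def s_shape_target_level(tf_levels, weights, bias, max_level=1.0):
    """Predicted target expression under a generic logistic response to several TFs.

    Positive weights act as activators, negative weights as repressors; because
    inputs are summed, TFs can cooperate (same sign) or compete (opposite signs).
    """
    if len(tf_levels) != len(weights):
        raise ValueError("need one weight per transcription factor")
    drive = bias + sum(w * x for w, x in zip(weights, tf_levels))
    return max_level / (1.0 + math.exp(-drive))
```

With no net input the target sits at half its maximum level; an activator pushes it above one half and a repressor below, and the saturating shape is what makes the model nonlinear in the TF levels.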
Trezza, Alfonso; Bernini, Andrea; Langella, Andrea; Ascher, David B; Pires, Douglas E V; Sodi, Andrea; Passerini, Ilaria; Pelo, Elisabetta; Rizzo, Stanislao; Niccolai, Neri; Spiga, Ottavia
2017-10-01
The aim of this article is to report the investigation of the structural features of ABCA4, a protein associated with a genetic retinal disease. A new database collecting knowledge of ABCA4 structure may facilitate predictions about the possible functional consequences of gene mutations observed in clinical practice. In order to correlate structural and functional effects of the observed mutations, the structure of mouse P-glycoprotein was used as a template for homology modeling. The obtained structural information and genetic data are the basis of our relational database (ABCA4Database). Sequence variability among all ABCA4-deposited entries was calculated and reported as Shannon entropy score at the residue level. The three-dimensional model of ABCA4 structure was used to locate the spatial distribution of the observed variable regions. Our predictions from structural in silico tools were able to accurately link the functional effects of mutations to phenotype. The development of the ABCA4Database gathers all the available genetic and structural information, yielding a global view of the molecular basis of some retinal diseases. ABCA4 modeled structure provides a molecular basis on which to analyze protein sequence mutations related to genetic retinal disease in order to predict the risk of retinal disease across all possible ABCA4 mutations. Additionally, our ABCA4 predicted structure is a good starting point for the creation of a new data analysis model, appropriate for precision medicine, in order to develop a deeper knowledge network of the disease and to improve the management of patients.
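Residue-level sequence variability reported as a Shannon entropy score can be computed column by column over aligned sequences: H = −Σ p_a·log(p_a), where p_a is the frequency of residue a at that position. The sketch below assumes gaps ('-') are ignored and entropy is reported in bits (log base 2); the abstract does not say how gaps or the log base were handled in ABCA4Database, so both choices are assumptions.

```python
import math
from collections import Counter


def column_entropy(residues, base=2.0):
    """Shannon entropy of one alignment column; gaps ('-') are ignored (assumption)."""
    counts = Counter(r for r in residues if r != "-")
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


def per_residue_entropy(aligned_seqs, base=2.0):
    """Entropy score at each aligned position (sequences must have equal length)."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy([s[i] for s in aligned_seqs], base) for i in range(length)]
```

A fully conserved position scores 0, and a position split evenly between two residues scores 1 bit; mapping these scores onto the 3D model is what locates the variable regions.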
Thorwarth, Patrick; Yousef, Eltohamy A A; Schmid, Karl J
2018-02-02
Genetic resources are an important source of genetic variation for plant breeding. Genome-wide association studies (GWAS) and genomic prediction greatly facilitate the analysis and utilization of useful genetic diversity for improving complex phenotypic traits in crop plants. We explored the potential of GWAS and genomic prediction for improving curd-related traits in cauliflower (Brassica oleracea var. botrytis) by combining 174 randomly selected cauliflower gene bank accessions from two different gene banks. The collection was genotyped with genotyping-by-sequencing (GBS) and phenotyped for six curd-related traits at two locations and three growing seasons. A GWAS analysis based on 120,693 single-nucleotide polymorphisms identified a total of 24 significant associations for curd-related traits. The potential for genomic prediction was assessed with a genomic best linear unbiased prediction model and BayesB. Prediction abilities ranged from 0.10 to 0.66 for different traits and did not differ between prediction methods. Imputation of missing genotypes only slightly improved prediction ability. Our results demonstrate that GWAS and genomic prediction in combination with GBS and phenotyping of highly heritable traits can be used to identify useful quantitative trait loci and genotypes among genetically diverse gene bank material for subsequent utilization as genetic resources in cauliflower breeding. Copyright © 2018 Thorwarth et al.
Improved prediction of biochemical recurrence after radical prostatectomy by genetic polymorphisms.
Morote, Juan; Del Amo, Jokin; Borque, Angel; Ars, Elisabet; Hernández, Carlos; Herranz, Felipe; Arruza, Antonio; Llarena, Roberto; Planas, Jacques; Viso, María J; Palou, Joan; Raventós, Carles X; Tejedor, Diego; Artieda, Marta; Simón, Laureano; Martínez, Antonio; Rioja, Luis A
2010-08-01
Single nucleotide polymorphisms are inherited genetic variations that can predispose individuals to, or protect them against, clinical events. We hypothesized that single nucleotide polymorphism profiling may improve the prediction of biochemical recurrence after radical prostatectomy. We performed a retrospective, multi-institutional study of 703 patients treated with radical prostatectomy for clinically localized prostate cancer who had at least 5 years of follow-up after surgery. All patients were genotyped for 83 prostate cancer-related single nucleotide polymorphisms using a low-density oligonucleotide microarray. Baseline clinicopathological variables and single nucleotide polymorphisms were analyzed to predict biochemical recurrence within 5 years using stepwise logistic regression. Discrimination was measured by ROC curve AUC, specificity, sensitivity, predictive values, net reclassification improvement and integrated discrimination index. The overall biochemical recurrence rate was 35%. The model with the best fit combined 8 covariates: the 5 clinicopathological variables prostate specific antigen, Gleason score, pathological stage, lymph node involvement and margin status, and 3 single nucleotide polymorphisms at the KLK2, SULT1A1 and TLR4 genes. Model predictive power was defined by 80% positive predictive value, 74% negative predictive value and an AUC of 0.78. The model based on clinicopathological variables plus single nucleotide polymorphisms showed significant improvement over the model without single nucleotide polymorphisms, as indicated by 23.3% net reclassification improvement (p = 0.003), integrated discrimination index (p < 0.001) and likelihood ratio test (p < 0.001). Internal validation proved model robustness (bootstrap corrected AUC 0.78, range 0.74 to 0.82). The calibration plot showed close agreement between observed and predicted probabilities of biochemical recurrence.
Predicting biochemical recurrence after radical prostatectomy based on clinicopathological data can be significantly improved by including patient genetic information. Copyright (c) 2010 American Urological Association Education and Research, Inc. Published by Elsevier Inc. All rights reserved.
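The ROC curve AUC used above to measure discrimination equals the probability that a randomly chosen patient who recurred is assigned a higher predicted risk than a randomly chosen patient who did not. The sketch below computes it straight from that definition (the Mann-Whitney formulation), counting ties as one half. It is illustrative only: the scores and labels are hypothetical, and in practice the scores would be the predicted probabilities from the fitted logistic regression, computed once for the clinicopathological model and once for the model with the added SNPs.

```python
def auc_mann_whitney(scores, labels):
    """ROC AUC as P(score of a random positive > score of a random negative).

    labels: 1 = biochemical recurrence, 0 = no recurrence; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical predicted recurrence probabilities for four patients.
    risk = [0.9, 0.8, 0.4, 0.3]
    recurred = [1, 0, 1, 0]
    print(auc_mann_whitney(risk, recurred))  # 0.75
```

The pairwise loop costs n_pos * n_neg comparisons, which is fine for a cohort of a few hundred patients; a rank-based formula scales better for larger data.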
Khozani, Zohreh Sheikh; Bonakdari, Hossein; Zaji, Amir Hossein
2016-01-01
Two new soft computing models, namely genetic programming (GP) and a genetic artificial algorithm (GAA) neural network (a combination of modified genetic algorithm and artificial neural network methods), were developed to predict the percentage of shear force in a rectangular channel with non-homogeneous roughness. The ability of these methods to estimate the percentage of shear force was investigated. Moreover, the effectiveness of the independent parameters in predicting the percentage of shear force was determined using sensitivity analysis. According to the results, the GP model demonstrated superior performance to the GAA model. A comparison was also made between the best GP model and five equations obtained in prior research. The GP model, with the lowest error values (root mean square error (RMSE) of 0.0515), performed better than the other equations presented for rough and smooth channels as well as smooth ducts. The equation proposed for rectangular channels with rough boundaries (RMSE of 0.0642) outperformed the prior equations for smooth boundaries.
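The models above are ranked by root mean square error, RMSE = sqrt((1/n) * sum of (observed - predicted)^2). A minimal sketch of that calculation follows; the measured and predicted shear-force fractions are hypothetical values, not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical measured vs. model-predicted fractions of shear force.
    measured = [0.20, 0.40, 0.60]
    model = [0.25, 0.35, 0.60]
    print(round(rmse(measured, model), 4))
```

Because errors are squared before averaging, a few large misses raise RMSE more than many small ones, which is why it is a common yardstick for comparing fitted equations.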
The genetic landscape of a physical interaction
Diss, Guillaume
2018-01-01
A key question in human genetics and evolutionary biology is how mutations in different genes combine to alter phenotypes. Efforts to systematically map genetic interactions have mostly made use of gene deletions. However, most genetic variation consists of point mutations of diverse and difficult to predict effects. Here, by developing a new sequencing-based protein interaction assay – deepPCA – we quantified the effects of >120,000 pairs of point mutations on the formation of the AP-1 transcription factor complex between the products of the FOS and JUN proto-oncogenes. Genetic interactions are abundant both in cis (within one protein) and trans (between the two molecules) and consist of two classes – interactions driven by thermodynamics that can be predicted using a three-parameter global model, and structural interactions between proximally located residues. These results reveal how physical interactions generate quantitatively predictable genetic interactions. PMID:29638215
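How energetically additive mutations can still produce genetic interactions can be illustrated with a generic two-state binding model. In this sketch, which is an assumption and not the three-parameter global model fitted in the study, each mutation shifts the binding free energy by a fixed amount (ddG), the shifts add, and the measured phenotype is the fraction of complex formed. Because that fraction saturates, the double mutant deviates from the sum of the single-mutant effects. The wild-type free energy, the temperature and the ddG values are hypothetical.

```python
import math

RT = 0.593  # kcal/mol at roughly 25 degrees C (assumed temperature)


def fraction_bound(ddg_total, dg_wt=-2.0, rt=RT):
    """Fraction of complex formed in a two-state model.

    dg_wt: hypothetical wild-type binding free energy (kcal/mol; negative = favourable).
    ddg_total: summed, additive free-energy effect of all mutations."""
    return 1.0 / (1.0 + math.exp((dg_wt + ddg_total) / rt))


def epistasis(ddg_a, ddg_b, dg_wt=-2.0):
    """Deviation of the double mutant from additivity on the phenotype scale:
    f(ab) - f(a) - f(b) + f(wt), with free energies combining additively."""
    f_wt = fraction_bound(0.0, dg_wt)
    f_a = fraction_bound(ddg_a, dg_wt)
    f_b = fraction_bound(ddg_b, dg_wt)
    f_ab = fraction_bound(ddg_a + ddg_b, dg_wt)
    return f_ab - f_a - f_b + f_wt


if __name__ == "__main__":
    # Two destabilising mutations of a hypothetical +1.5 kcal/mol each.
    print(epistasis(1.5, 1.5))
```

For these values the result is negative (roughly -0.28): the double mutant loses more binding than the two single-mutant losses would predict additively, even though nothing interacts at the energy level.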
Validity of Models for Predicting BRCA1 and BRCA2 Mutations
Parmigiani, Giovanni; Chen, Sining; Iversen, Edwin S.; Friebel, Tara M.; Finkelstein, Dianne M.; Anton-Culver, Hoda; Ziogas, Argyrios; Weber, Barbara L.; Eisen, Andrea; Malone, Kathleen E.; Daling, Janet R.; Hsu, Li; Ostrander, Elaine A.; Peterson, Leif E.; Schildkraut, Joellen M.; Isaacs, Claudine; Corio, Camille; Leondaridis, Leoni; Tomlinson, Gail; Amos, Christopher I.; Strong, Louise C.; Berry, Donald A.; Weitzel, Jeffrey N.; Sand, Sharon; Dutson, Debra; Kerber, Rich; Peshkin, Beth N.; Euhus, David M.
2008-01-01
Background: Deleterious mutations of the BRCA1 and BRCA2 genes confer susceptibility to breast and ovarian cancer. At least 7 models for estimating the probability of carrying a mutation are widely used in clinical and scientific activities; however, the merits and limitations of these models are not fully understood. Objective: To systematically quantify the accuracy of the following publicly available models in predicting mutation carrier status: BRCAPRO, family history assessment tool, Finnish, Myriad, National Cancer Institute, University of Pennsylvania, and Yale University. Design: Cross-sectional validation study, using model predictions and BRCA1 or BRCA2 mutation status of patients different from those used to develop the models. Setting: Multicenter study across Cancer Genetics Network participating centers. Patients: Three population-based samples of participants in research studies and 8 samples from genetic counseling clinics. Measurements: Discrimination between individuals testing positive for a mutation in BRCA1 or BRCA2 and those testing negative, as measured by the c-statistic, and sensitivity and specificity of model predictions. Results: The 7 models differ in their predictions. The better-performing models have a c-statistic around 80%. BRCAPRO has the largest c-statistic overall and in all but 2 patient subgroups, although the margin over other models is narrow in many strata. Outside of high-risk populations, all models have high false-negative and false-positive rates across a range of probability thresholds used to refer for mutation testing. Limitation: Three recently published models were not included. Conclusions: All models identify women who probably carry a deleterious mutation of BRCA1 or BRCA2 with adequate discrimination to support individualized genetic counseling, although discrimination varies across models and populations. PMID:17909205
Ridge, Lasso and Bayesian additive-dominance genomic models.
Azevedo, Camila Ferreira; de Resende, Marcos Deon Vilela; E Silva, Fabyano Fonseca; Viana, José Marcelo Soriano; Valente, Magno Sávio Ferreira; Resende, Márcio Fernando Ribeiro; Muñoz, Patricio
2015-08-25
A complete approach for genome-wide selection (GWS) involves reliable statistical genetics models and methods. Reports on this topic are common for additive genetic models but not for additive-dominance models. The objective of this paper was (i) to compare the performance of 10 additive-dominance predictive models (including current models and proposed modifications), fitted using Bayesian, Lasso and Ridge regression approaches; and (ii) to decompose genomic heritability and accuracy in terms of three quantitative genetic information sources, namely, linkage disequilibrium (LD), co-segregation (CS) and pedigree relationships or family structure (PR). The simulation study considered two broad sense heritability levels (0.30 and 0.50, associated with narrow sense heritabilities of 0.20 and 0.35, respectively) and two genetic architectures for traits (the first consisting of small gene effects and the second consisting of a mixed inheritance model with five major genes). G-REML/G-BLUP and a modified Bayesian/Lasso (called BayesA*B* or t-BLASSO) method performed best in the prediction of genomic breeding values as well as the total genotypic values of individuals in all four scenarios (two heritabilities x two genetic architectures). The BayesA*B*-type method showed a better ability to recover the dominance variance/additive variance ratio. Decomposition of genomic heritability and accuracy revealed the following descending order of importance of information sources: LD, CS and PR not captured by markers, the last two being very close. Amongst the 10 models/methods evaluated, the G-BLUP, BayesA*B* (-2,8) and BayesA*B* (4,6) methods presented the best results and were found to be adequate for accurately predicting genomic breeding and total genotypic values as well as for estimating additive and dominance variances in additive-dominance genomic models.
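Ridge regression is one of the three fitting approaches compared above. A minimal additive-dominance version is sketched below, assuming genotypes coded as 0, 1 or 2 copies of an allele, an additive covariate x - 1, a dominance covariate equal to 1 for heterozygotes, and a single shrinkage parameter lam shared by additive and dominance effects. These coding and shrinkage choices are illustrative assumptions, not the parametrizations of the 10 models evaluated in the paper; the solver is plain Gaussian elimination to keep the example dependency-free.

```python
def design_row(genotypes):
    """Additive (x - 1) and dominance (1 if heterozygous) covariates per SNP."""
    additive = [g - 1 for g in genotypes]
    dominance = [1 if g == 1 else 0 for g in genotypes]
    return additive + dominance


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(genotype_rows, y, lam):
    """Solve (Xc'Xc + lam*I) b = Xc'(y - mean(y)) with column-centred covariates Xc."""
    X = [design_row(g) for g in genotype_rows]
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    mu = sum(y) / n
    yc = [v - mu for v in y]
    xtx = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)]
           for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    return mu, means, solve(xtx, xty)


def ridge_predict(model, genotypes):
    """Predicted genotypic value (additive + dominance) for one individual."""
    mu, means, b = model
    row = design_row(genotypes)
    return mu + sum((v - m) * bj for v, m, bj in zip(row, means, b))


if __name__ == "__main__":
    # One hypothetical SNP with additive effect 2 and dominance effect 1 (no noise).
    genos = [[0], [1], [2], [0], [1], [2]]
    y = [3.0, 6.0, 7.0, 3.0, 6.0, 7.0]
    print(ridge_fit(genos, y, lam=0.0)[2])   # close to [2.0, 1.0]
    print(ridge_fit(genos, y, lam=10.0)[2])  # both effects shrunk toward 0
```

Using one lam for both effect types is the simplest choice; giving dominance effects their own penalty would mirror models that estimate separate additive and dominance variances.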
A model of litter size distribution in cattle.
Bennett, G L; Echternkamp, S E; Gregory, K E
1998-07-01
Genetic increases in twinning of cattle could result in increased frequency of triplet or higher-order births. There are no estimates of the incidence of triplets in populations with genetic levels of twinning over 40% because these populations either have not existed or have not been documented. A model of the distribution of litter size in cattle is proposed. Empirical estimates of ovulation rate distribution in sheep were combined with biological hypotheses about the fate of embryos in cattle. Two phases of embryo loss were hypothesized. The first phase is considered to be preimplantation. Losses in this phase occur independently (i.e., the loss of one embryo does not affect the loss of the remaining embryos). The second phase occurs after implantation. The loss of one embryo in this stage results in the loss of all embryos. Fewer than 5% triplet births are predicted when 50% of births are twins and triplets. Above 60% multiple births, increased triplets accounted for most of the increase in litter size. Predictions were compared with data from 5,142 calvings by 14 groups of heifers and cows with average litter sizes ranging from 1.14 to 1.36 calves. The predicted number of triplets was not significantly different (χ² = 16.85, df = 14) from the observed number. The model also predicted differences in conception rates. A cow ovulating two ova was predicted to have the highest conception rate in a single breeding cycle. As mean ovulation rate increased, predicted conception to one breeding cycle increased. Conception to two or three breeding cycles decreased as mean ovulation rate increased because late-pregnancy failures increased. An alternative model of the fate of ova in cattle based on embryo and uterine competency predicts very similar proportions of singles, twins, and triplets but different conception rates.
The proposed model of litter size distribution in cattle accurately predicts the proportion of triplets found in cattle with genetically high twinning rates. This model can be used in projecting efficiency changes resulting from genetically increasing the twinning rate in cattle.
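The two-phase embryo-loss model described above can be written as a short calculation. The sketch assumes an ovulation-rate distribution given as probabilities, independent per-embryo survival s1 in the preimplantation phase, and, in the postimplantation phase, per-embryo survival s2 where losing any one embryo loses the whole litter, so a litter of k survives with probability s2**k. That last form is one reading of the abstract, and all parameter values are hypothetical rather than the sheep-derived estimates the authors used.

```python
from math import comb


def litter_size_distribution(ovulation_probs, s1, s2):
    """Return {litter_size: probability}, where 0 means no calf.

    ovulation_probs: {number_of_ova: probability}
    s1: per-embryo survival in phase 1 (losses independent)
    s2: per-embryo survival in phase 2 (any loss ends the whole pregnancy)"""
    dist = {}
    for n_ova, p_n in ovulation_probs.items():
        for k in range(n_ova + 1):
            # Probability that exactly k embryos survive phase 1.
            p_k = comb(n_ova, k) * s1 ** k * (1 - s1) ** (n_ova - k)
            # Probability that a litter of k also survives phase 2.
            survive = s2 ** k if k > 0 else 0.0
            if k > 0:
                dist[k] = dist.get(k, 0.0) + p_n * p_k * survive
            dist[0] = dist.get(0, 0.0) + p_n * p_k * (1.0 - survive)
    return dist


def birth_type_proportions(dist):
    """Share of singles, twins, triplets, ... among cows that calve."""
    calved = sum(p for k, p in dist.items() if k > 0)
    return {k: p / calved for k, p in sorted(dist.items()) if k > 0}


if __name__ == "__main__":
    # Hypothetical ovulation-rate distribution and survival probabilities.
    dist = litter_size_distribution({1: 0.5, 2: 0.4, 3: 0.1}, s1=0.8, s2=0.9)
    print("single-cycle calving rate:", 1.0 - dist[0])
    print("birth types:", birth_type_proportions(dist))
```

The s2**k term is what makes late-pregnancy failure rise with ovulation rate in this sketch, consistent with the qualitative behaviour the abstract describes.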
Genome-Wide Association Analysis of Adaptation Using Environmentally Predicted Traits.
van Heerwaarden, Joost; van Zanten, Martijn; Kruijer, Willem
2015-10-01
Current methods for studying the genetic basis of adaptation evaluate genetic associations with ecologically relevant traits or single environmental variables, under the implicit assumption that natural selection imposes correlations between phenotypes, environments and genotypes. In practice, observed trait and environmental data are manifestations of unknown selective forces and are only indirectly associated with adaptive genetic variation. In theory, improved estimation of these forces could enable more powerful detection of loci under selection. Here we present an approach in which we approximate adaptive variation by modeling phenotypes as a function of the environment and using the predicted trait in multivariate and univariate genome-wide association analysis (GWAS). Based on computer simulations and published flowering time data from the model plant Arabidopsis thaliana, we find that environmentally predicted traits lead to higher recovery of functional loci in multivariate GWAS and are more strongly correlated to allele frequencies at adaptive loci than individual environmental variables. Our results provide an example of the use of environmental data to obtain independent and meaningful information on adaptive genetic variation.
Plasticity of genetic interactions in metabolic networks of yeast.
Harrison, Richard; Papp, Balázs; Pál, Csaba; Oliver, Stephen G; Delneri, Daniela
2007-02-13
Why are most genes dispensable? The impact of gene deletions may depend on the environment (plasticity), the presence of compensatory mechanisms (mutational robustness), or both. Here, we analyze the interaction between these two forces by exploring the condition-dependence of synthetic genetic interactions that define redundant functions and alternative pathways. We performed systems-level flux balance analysis of the yeast (Saccharomyces cerevisiae) metabolic network to identify genetic interactions and then tested the model's predictions with in vivo gene-deletion studies. We found that the majority of synthetic genetic interactions are restricted to certain environmental conditions, partly because of the lack of compensation under some (but not all) nutrient conditions. Moreover, the phylogenetic co-occurrence of synthetically interacting pairs is not significantly different from random expectation. These findings suggest that these gene pairs have at least partially independent functions, and, hence, compensation is only a byproduct of their evolutionary history. Experimental analyses that used multiple gene deletion strains not only confirmed predictions of the model but also showed that investigation of false predictions may both improve functional annotation within the model and lead to the discovery of higher-order genetic interactions. Our work supports the view that functional redundancy may be more apparent than real, and it offers a unified framework for the evolution of environmental adaptation and mutational robustness.
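The logic of a condition-dependent synthetic lethal pair can be shown with a toy network. The study used flux balance analysis, a linear-programming method on a stoichiometric model; this sketch deliberately replaces it with a much simpler reachability check (a reaction fires once all its substrates are available) to illustrate only the interaction logic. The network, gene names and metabolite names are hypothetical.

```python
def producible(medium, reactions, deleted):
    """Metabolites reachable from the medium using reactions whose gene is present.

    A reaction (gene, substrates, products) fires once all substrates are
    available. This is a reachability simplification, not flux balance analysis."""
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for gene, substrates, products in reactions:
            if gene in deleted or not set(substrates) <= available:
                continue
            if not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def viable(medium, reactions, deleted, target="biomass"):
    """A strain is viable if the target metabolite can still be made."""
    return target in producible(medium, reactions, deleted)


def synthetic_lethal(medium, reactions, g1, g2):
    """True when both single deletions are viable but the double deletion is not."""
    return (viable(medium, reactions, {g1})
            and viable(medium, reactions, {g2})
            and not viable(medium, reactions, {g1, g2}))


# Hypothetical network: two alternative routes to a shared precursor.
REACTIONS = [
    ("geneA", ["source_x"], ["precursor"]),
    ("geneB", ["source_y"], ["precursor"]),
    ("geneC", ["precursor"], ["biomass"]),
]

if __name__ == "__main__":
    for medium in ({"source_x", "source_y"}, {"source_x"}):
        print(sorted(medium), synthetic_lethal(medium, REACTIONS, "geneA", "geneB"))
```

With both nutrients present, geneA and geneB are synthetic lethal; with only source_x, deleting geneA alone is already lethal, so no synthetic interaction is observed. This mirrors, in miniature, the environment dependence the abstract reports.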
Zhu, Fan; Panwar, Bharat; Dodge, Hiroko H; Li, Hongdong; Hampstead, Benjamin M; Albin, Roger L; Paulson, Henry L; Guan, Yuanfang
2016-10-05
We present COMPASS, a COmputational Model to Predict the development of Alzheimer's diSease Spectrum, to model Alzheimer's disease (AD) progression. This was the best-performing method in a recent crowdsourcing benchmark study, the DREAM Alzheimer's Disease Big Data challenge, which asked participants to predict changes in Mini-Mental State Examination (MMSE) scores over 24 months using standardized data. In the present study, we conducted three additional analyses beyond the DREAM challenge question to improve the clinical contribution of our approach: (1) adding pre-validated baseline cognitive composite scores of ADNI-MEM and ADNI-EF, (2) identifying subjects with significant declines in MMSE scores, and (3) incorporating SNPs of the top 10 genes connected to APOE identified from a functional-relationship network. For (1), we significantly improved predictive accuracy, especially for the Mild Cognitive Impairment (MCI) group. For (2), we achieved an area under the ROC curve of 0.814 in predicting significant MMSE decline: our model has 100% precision at 5% recall, and 91% accuracy at 10% recall. For (3), the "genetic only" model has a Pearson correlation of 0.15 for predicting progression in the MCI group. Although adding this limited genetic model to COMPASS did not improve prediction of progression in the MCI group, the predictive ability of the SNP information extended beyond the well-known APOE allele.
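Reporting precision at a fixed recall, as in analysis (2), means ranking subjects by predicted risk of significant MMSE decline and reading off precision at the smallest cutoff whose recall reaches the target. A minimal sketch follows; the scores and labels are hypothetical (1 marks a subject with significant decline), ties keep their input order, and this is not the COMPASS code.

```python
def precision_at_recall(scores, labels, target_recall):
    """Precision at the first cutoff, ranking by descending score, whose recall
    reaches target_recall. labels are 1 (decline) or 0 (no decline)."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive case")
    # sorted() is stable, so tied scores keep their input order.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        tp += label
        if tp / total_pos >= target_recall:
            return tp / rank
    raise ValueError("target_recall must be at most 1")


if __name__ == "__main__":
    risk = [0.9, 0.8, 0.7, 0.6, 0.5]  # hypothetical predicted risk of decline
    declined = [1, 0, 1, 0, 1]
    for r in (0.3, 0.6, 1.0):
        print(r, precision_at_recall(risk, declined, r))
```

Precision at low recall describes how trustworthy the very top of the ranking is, which is why a model can reach 100% precision at 5% recall while its overall AUC is well below 1.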
The functional consequences of non-genetic diversity in cellular navigation
NASA Astrophysics Data System (ADS)
Emonet, Thierry; Waite, Adam J.; Frankel, Nicholas W.; Dufour, Yann; Johnston, Jessica F.
Substantial non-genetic diversity in complex behaviors, such as chemotaxis in E. coli, has been observed for decades, but the relevance of this diversity for the population is not well understood. Here, we use microfluidics to show that non-genetic diversity leads to significant structuring of the population in space and time, which confirms predictions made by our detailed mathematical model of chemotaxis. We then use genetic tools to show that altering the expression level of a single chemotaxis protein is sufficient to alter the distribution of swimming behaviors, which directly determines the performance of a population in a gradient of attractant, a result also predicted by our model. Supported by NIH 1R01GM106189, the James S McDonnell Foundation, and the Paul Allen foundation.
A genomic perspective on the generation and maintenance of genetic diversity in herbivorous insects
Gloss, Andrew D.; Groen, Simon C.; Whiteman, Noah K.
2017-01-01
Understanding the processes that generate and maintain genetic variation within populations is a central goal in evolutionary biology. Theory predicts that some of this variation is maintained as a consequence of adapting to variable habitats. Studies in herbivorous insects have played a key role in confirming this prediction. Here, we highlight theoretical and conceptual models for the maintenance of genetic diversity in herbivorous insects, empirical genomic studies testing these models, and pressing questions within the realm of evolutionary and functional genomic studies. To address key gaps, we propose an integrative approach combining population genomic scans for adaptation, genome-wide characterization of targets of selection through experimental manipulations, mapping the genetic architecture of traits influencing fitness, and functional studies. We also stress the importance of studying the maintenance of genetic variation across biological scales—from variation within populations to divergence among populations—to form a comprehensive view of adaptation in herbivorous insects. PMID:28736510
Nazarian, Alireza; Gezan, Salvador A
2016-03-01
The study of the genetic architecture of complex traits has been dramatically influenced by the implementation of genome-wide analytical approaches in recent years. Of particular interest are genomic prediction strategies, which use genomic information to predict phenotypic responses instead of detecting trait-associated loci. In this work, we present the results of a simulation study to improve our understanding of the statistical properties of the estimation of genetic variance components of complex traits, and of additive, dominance, and total genetic effects predicted with best linear unbiased prediction methodology. Simulated dense marker information was used to construct genomic additive and dominance matrices, and multiple alternative pedigree- and marker-based models were compared to determine whether including a dominance term in the analysis may improve the genetic analysis of complex traits. Our results showed that a model containing a pedigree- or marker-based additive relationship matrix along with a pedigree-based dominance matrix provided the best partitioning of genetic variance into its components, especially when some degree of true dominance effects was expected to exist. Also, we noted that the use of a marker-based additive relationship matrix along with a pedigree-based dominance matrix had the best performance in terms of the correlation between true and estimated additive, dominance, and total genetic effects. © The American Genetic Association 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
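The pedigree-based additive relationship matrix used in several of the compared models can be built with the classic tabular method: each off-diagonal entry is a_ij = (a_j,sire(i) + a_j,dam(i)) / 2, and each diagonal entry is a_ii = 1 + a_sire(i),dam(i) / 2. The sketch assumes a pedigree sorted so that parents precede offspring, with unknown parents given as None; animal identifiers are hypothetical. The pedigree-based dominance matrix and the marker-based matrices are not shown.

```python
def additive_relationship_matrix(pedigree):
    """Numerator relationship matrix A via the tabular method.

    pedigree: list of (animal, sire, dam) with parents listed before offspring;
    unknown parents are None (a parent id missing from the list is also
    treated as unknown)."""
    index = {animal: i for i, (animal, _, _) in enumerate(pedigree)}
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (_, sire, dam) in enumerate(pedigree):
        s, d = index.get(sire), index.get(dam)
        for j in range(i):
            a_js = A[j][s] if s is not None else 0.0
            a_jd = A[j][d] if d is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_js + a_jd)
        # Diagonal: 1 plus the inbreeding coefficient (half the parents' relationship).
        A[i][i] = 1.0 + (0.5 * A[s][d] if s is not None and d is not None else 0.0)
    return A


if __name__ == "__main__":
    ped = [("A", None, None), ("B", None, None), ("C", "A", "B"),
           ("D", "A", "B"), ("E", "C", "D")]
    A = additive_relationship_matrix(ped)
    print(A[4][4])  # 1.25: E's parents are full sibs, so E's inbreeding is 0.25
```

The ordering requirement matters: each row reuses entries for its parents that must already have been filled in.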
Wan, Jizhong; Wang, Chunjing; Yu, Jinghua; Nie, Siming; Han, Shijie; Zu, Yuangang; Chen, Changmei; Yuan, Shusheng; Wang, Qinggui
2014-01-01
Climate change affects both habitat suitability and the genetic diversity of wild plants. Therefore, predicting and establishing the most effective and coherent conservation areas is essential for the conservation of genetic diversity in response to climate change. This is because genetic variance is a product not only of habitat suitability in conservation areas but also of efficient protection and management. Phellodendron amurense Rupr. is a tree species (family Rutaceae) that is endangered due to excessive and illegal harvesting for use in Chinese medicine. Here, we test a general computational method for the prediction of priority conservation areas (PCAs) by measuring the genetic diversity of P. amurense across the entirety of northeast China using a single strand repeat analysis of twenty microsatellite markers. Using computational modeling, we evaluated the geographical distribution of the species, both now and in different future climate change scenarios. Different populations were analyzed according to genetic diversity, and PCAs were identified using a spatial conservation prioritization framework. These conservation areas were optimized to account for the geographical distribution of P. amurense both now and in the future, to effectively promote gene flow, and to have a long period of validity. In situ and ex situ conservation strategies for vulnerable populations were proposed. Three populations with low genetic diversity are predicted to be negatively affected by climate change, making conservation of genetic diversity challenging due to decreasing habitat suitability. Habitat suitability was important for the assessment of genetic variability in existing nature reserves, which were found to be much smaller than the proposed PCAs. Finally, a simple set of conservation measures was established through modeling.
This combined molecular and computational ecology approach provides a framework for planning the protection of species endangered by climate change. PMID:25165526
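The abstract does not say which genetic diversity indices were computed from the twenty microsatellites. Expected heterozygosity is a common choice for such markers, so the sketch below computes Nei's unbiased estimate, He = n/(n - 1) * (1 - sum of p_i^2), where n is the number of sampled allele copies and p_i the allele frequencies, and averages it over loci. Treat it as an illustrative assumption; the allele sizes are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Nei's unbiased expected heterozygosity at one locus.

    alleles: every allele copy sampled in a population (two per diploid tree)."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two allele copies")
    freqs = [c / n for c in Counter(alleles).values()]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))


def mean_expected_heterozygosity(loci):
    """Average He over loci; loci is a list of per-locus allele lists."""
    return sum(expected_heterozygosity(a) for a in loci) / len(loci)


if __name__ == "__main__":
    # Hypothetical allele sizes (bp) at two loci for a sample of three trees.
    locus1 = [152, 152, 156, 158, 156, 152]
    locus2 = [201, 201, 201, 201, 201, 201]  # monomorphic locus: He = 0
    print(mean_expected_heterozygosity([locus1, locus2]))
```

Populations with lower mean He, like the three low-diversity populations mentioned above, would be the natural candidates for priority protection in such a framework.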
NASA Technical Reports Server (NTRS)
Rajkumar, T.; Aragon, Cecilia; Bardina, Jorge; Britten, Roy
2002-01-01
A fast, reliable way of predicting aerodynamic coefficients is produced using a neural network optimized by a genetic algorithm. Basic aerodynamic coefficients (e.g. lift, drag, pitching moment) are modelled as functions of angle of attack and Mach number. The neural network is first trained on a relatively rich set of data from wind tunnel tests or numerical simulations to learn an overall model. Most of the aerodynamic parameters can be well fitted using polynomial functions. A new set of data, which can be relatively sparse, is then supplied to the network to produce a new model consistent with the previous model and the new data. Because the new model interpolates realistically between the sparse test data points, it is suitable for use in piloted simulations. The genetic algorithm is used to choose the neural network architecture that gives the best results, avoiding over- and under-fitting of the test data.
Predicting mining activity with parallel genetic algorithms
Talaie, S.; Leigh, R.; Louis, S.J.; Raines, G.L.; Beyer, H.G.; O'Reilly, U.M.; Banzhaf, Arnold D.; Blum, W.; Bonabeau, C.; Cantu-Paz, E.W.
2005-01-01
We explore several different techniques in our quest to improve the overall model performance of a genetic algorithm calibrated probabilistic cellular automata. We use the Kappa statistic to measure correlation between ground truth data and data predicted by the model. Within the genetic algorithm, we introduce a new evaluation function sensitive to spatial correctness and we explore the idea of evolving different rule parameters for different subregions of the land. We reduce the time required to run a simulation from 6 hours to 10 minutes by parallelizing the code and employing a 10-node cluster. Our empirical results suggest that using the spatially sensitive evaluation function does indeed improve the performance of the model and our preliminary results also show that evolving different rule parameters for different regions tends to improve overall model performance. Copyright 2005 ACM.
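The abstract uses the Kappa statistic to compare simulated and ground-truth maps. Cohen's kappa, the most common form, rescales raw agreement p_o by the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below assumes each map has been flattened to an equal-length list of cell labels (for example 1 = mining activity, 0 = none); whether the authors used exactly this variant is not stated.

```python
from collections import Counter


def cohens_kappa(observed, predicted):
    """Cohen's kappa between two equal-length sequences of categorical labels."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    obs, pred = Counter(observed), Counter(predicted)
    # Chance agreement: product of the two maps' marginal label frequencies.
    p_e = sum(obs[c] * pred[c] for c in obs) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    ground_truth = [1, 1, 0, 0]  # hypothetical cells
    simulated = [1, 0, 0, 0]
    print(cohens_kappa(ground_truth, simulated))  # 0.5
```

Kappa is preferred over raw percent agreement here because a map dominated by "no activity" cells would otherwise score well just by predicting no activity everywhere.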
Anand, Vibha; Rosenman, Marc B; Downs, Stephen M
2013-09-01
To develop a map of disease associations exclusively using two publicly available genetic sources: the catalog of single nucleotide polymorphisms (SNPs) from the HapMap, and the catalog of Genome Wide Association Studies (GWAS) from the NHGRI, and to evaluate it with a large, long-standing electronic medical record (EMR). A computational model, In Silico Bayesian Integration of GWAS (IsBIG), was developed to learn associations among diseases using a Bayesian network (BN) framework, using only genetic data. The IsBIG model (I-Model) was re-trained using data from our EMR (M-Model). Separately, another clinical model (C-Model) was learned from this training dataset. The I-Model was compared with both the M-Model and the C-Model for power to discriminate a disease given other diseases using a test dataset from our EMR. The area under the receiver operating characteristic curve was used as a performance measure. Direct associations between diseases in the I-Model were also searched in the PubMed database and in classes of the Human Disease Network (HDN). On the basis of genetic information alone, the I-Model linked a third of diseases from our EMR. When compared to the M-Model, the I-Model predicted diseases given other diseases with 94% specificity, 33% sensitivity, and 80% positive predictive value. The I-Model contained 117 direct associations between diseases. Of those associations, 20 (17%) were absent from the searches of the PubMed database; one of these was present in the C-Model. Of these 20 associations, 7 (35%) were absent from disease classes of HDN. Using only publicly available genetic sources, we have mapped associations in GWAS to a human disease map using an in silico approach. Furthermore, we have validated this disease map using phenotypic data from our EMR. Models predicting disease associations on the basis of known genetic associations alone are specific but not sensitive.
Genetic data, as it currently exists, can only explain a fraction of the risk of a disease. Our approach makes a quantitative statement about disease variation that can be explained in an EMR on the basis of genetic associations described in the GWAS. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
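Specificity, sensitivity and positive predictive value, as reported above for the I-Model, all come from a 2 x 2 table of predicted versus recorded diagnoses. A minimal sketch with hypothetical counts (not the study's data):

```python
def classification_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts.

    Each denominator (tp + fn, tn + fp, tp + fp, tn + fn) must be non-zero."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that are flagged
        "specificity": tn / (tn + fp),  # share of non-cases correctly not flagged
        "ppv": tp / (tp + fp),          # share of flagged cases that are true
        "npv": tn / (tn + fn),          # share of unflagged cases that are non-cases
    }


if __name__ == "__main__":
    print(classification_metrics(tp=8, fp=2, tn=18, fn=12))
```

These toy counts give high specificity with low sensitivity, the same qualitative pattern the abstract describes for models built on genetic associations alone.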
Da, Yang; Wang, Chunkao; Wang, Shengwen; Hu, Guo
2014-01-01
We established a genomic model of a quantitative trait with genomic additive and dominance relationships that parallels the traditional quantitative genetics model, which partitions a genotypic value as breeding value plus dominance deviation and calculates additive and dominance relationships using pedigree information. Based on this genomic model, two sets of computationally complementary but mathematically identical mixed model methods were developed for genomic best linear unbiased prediction (GBLUP) and genomic restricted maximum likelihood estimation (GREML) of additive and dominance effects using SNP markers. These two sets are referred to as the CE and QM sets, where the CE set was designed for large numbers of markers and the QM set was designed for large numbers of individuals. GBLUP and associated accuracy formulations for individuals in training and validation data sets were derived for breeding values, dominance deviations and genotypic values. A simulation study showed that GREML and GBLUP generally were able to capture small additive and dominance effects that each accounted for 0.00005–0.0003 of the phenotypic variance, and that GREML was able to differentiate true additive and dominance heritability levels. GBLUP of the total genetic value as the summation of additive and dominance effects had higher prediction accuracy than either additive or dominance GBLUP, GREML and GBLUP based on the causal variants had the highest accuracy, and predicted accuracies were in agreement with observed accuracies. Genomic additive and dominance relationship matrices using SNP markers were consistent with theoretical expectations. The GREML and GBLUP methods can be an effective tool for assessing the type and magnitude of genetic effects affecting a phenotype and for predicting the total genetic value at the whole genome level. PMID:24498162
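Genomic additive and dominance relationship matrices of the kind described above can be sketched from a 0/1/2 genotype matrix. This sketch is not the CE or QM formulation of the paper. It assumes the widely used VanRaden scaling for the additive matrix, G = WW' / sum(2pq) with W = x - 2p, and a dominance-deviation coding that is centred under Hardy-Weinberg equilibrium (-2q^2, 2pq, -2p^2 for genotypes 2, 1, 0), giving D = HH' / sum((2pq)^2). Here p is the frequency of the counted allele estimated from the sample, and the genotype data are hypothetical.

```python
def allele_frequencies(genotypes):
    """Frequency p of the counted allele for each SNP (genotypes coded 0, 1, 2)."""
    n = len(genotypes)
    return [sum(row[j] for row in genotypes) / (2 * n) for j in range(len(genotypes[0]))]


def gram(rows, scale):
    """rows @ rows.T / scale for a list-of-lists matrix."""
    return [[sum(a * b for a, b in zip(ri, rj)) / scale for rj in rows] for ri in rows]


def genomic_relationship_matrices(genotypes):
    """Return (G, D): additive and dominance genomic relationship matrices.

    Assumes at least one polymorphic SNP (otherwise the scale factors are zero)."""
    p = allele_frequencies(genotypes)
    q = [1.0 - pj for pj in p]
    # Additive coding: centred allele count x - 2p.
    W = [[x - 2 * pj for x, pj in zip(row, p)] for row in genotypes]

    # Dominance-deviation coding, mean zero under Hardy-Weinberg equilibrium.
    def dom_code(x, pj, qj):
        return {2: -2 * qj * qj, 1: 2 * pj * qj, 0: -2 * pj * pj}[x]

    H = [[dom_code(x, pj, qj) for x, pj, qj in zip(row, p, q)] for row in genotypes]
    scale_a = sum(2 * pj * qj for pj, qj in zip(p, q))
    scale_d = sum((2 * pj * qj) ** 2 for pj, qj in zip(p, q))
    return gram(W, scale_a), gram(H, scale_d)


if __name__ == "__main__":
    # Four hypothetical individuals genotyped at three SNPs.
    geno = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]
    G, D = genomic_relationship_matrices(geno)
    for row in G:
        print([round(v, 3) for v in row])
```

Centring the codes makes each row of G sum to zero in the sample, and the two scale factors put G and D on a scale comparable to their pedigree counterparts.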
Genetic Predisposition to Ischemic Stroke
Kamatani, Yoichiro; Takahashi, Atsushi; Hata, Jun; Furukawa, Ryohei; Shiwa, Yuh; Yamaji, Taiki; Hara, Megumi; Tanno, Kozo; Ohmomo, Hideki; Ono, Kanako; Takashima, Naoyuki; Matsuda, Koichi; Wakai, Kenji; Sawada, Norie; Iwasaki, Motoki; Yamagishi, Kazumasa; Ago, Tetsuro; Ninomiya, Toshiharu; Fukushima, Akimune; Hozawa, Atsushi; Minegishi, Naoko; Satoh, Mamoru; Endo, Ryujin; Sasaki, Makoto; Sakata, Kiyomi; Kobayashi, Seiichiro; Ogasawara, Kuniaki; Nakamura, Motoyuki; Hitomi, Jiro; Kita, Yoshikuni; Tanaka, Keitaro; Iso, Hiroyasu; Kitazono, Takanari; Kubo, Michiaki; Tanaka, Hideo; Tsugane, Shoichiro; Kiyohara, Yutaka; Yamamoto, Masayuki; Sobue, Kenji; Shimizu, Atsushi
2017-01-01
Background and Purpose: The prediction of genetic predispositions to ischemic stroke (IS) may allow the identification of individuals at elevated risk and thereby prevent IS in clinical practice. Previously developed weighted multilocus genetic risk scores showed limited predictive ability for IS. Here, we investigated the predictive ability of a newer method, the polygenic risk score (polyGRS), based on the idea that a few strong signals, together with several weaker signals, can be collectively informative in determining IS risk. Methods: We genotyped 13 214 Japanese individuals with IS and 26 470 controls (derivation samples) and generated both multilocus genetic risk scores and polyGRS, using the same derivation data set. The predictive ability of each scoring system was then assessed using 2 independent sets of Japanese samples (KyushuU and JPJM data sets). Results: In both validation data sets, polyGRS was shown to be significantly associated with IS, but weighted multilocus genetic risk scores were not. Comparing the highest with the lowest polyGRS quintile, the odds ratios for IS were 1.75 (95% confidence interval, 1.33–2.31) and 1.99 (95% confidence interval, 1.19–3.33) in the KyushuU and JPJM samples, respectively. Using the KyushuU samples, the addition of polyGRS to a nongenetic risk model resulted in a significant improvement of the predictive ability (net reclassification improvement=0.151; P<0.001). Conclusions: The polyGRS was shown to be superior to weighted multilocus genetic risk scores as an IS prediction model. Thus, together with nongenetic risk factors, polyGRS will provide valuable information for individual risk assessment and management of modifiable risk factors. PMID:28034966
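The two steps behind the quintile comparison above are sketched here: a score built as a weighted sum of risk-allele dosages, followed by an odds ratio for the highest versus the lowest score quintile. This is a minimal illustration under stated assumptions. The weights are taken to be per-SNP log odds ratios from a derivation sample, and the odds ratio is unadjusted. The study's SNP-selection thresholds and covariate adjustments are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def polygenic_score(dosages, log_or):
    """Weighted sum of risk-allele dosages (0-2) using per-SNP log odds ratios."""
    return np.asarray(dosages, float) @ np.asarray(log_or, float)

def quintile_odds_ratio(score, is_case):
    """Unadjusted odds ratio of being a case: top vs bottom score quintile."""
    score = np.asarray(score, float)
    is_case = np.asarray(is_case, bool)
    lo_cut, hi_cut = np.quantile(score, [0.2, 0.8])
    top = is_case[score >= hi_cut]       # case status in the highest quintile
    bottom = is_case[score <= lo_cut]    # case status in the lowest quintile
    odds_top = top.sum() / (~top).sum()
    odds_bottom = bottom.sum() / (~bottom).sum()
    return odds_top / odds_bottom

# Toy data: 20 individuals; 3 of 4 in the top quintile are cases, 1 of 4 in the bottom.
score = np.arange(20, dtype=float)
is_case = np.zeros(20, dtype=bool)
is_case[[0, 16, 17, 18]] = True
print(quintile_odds_ratio(score, is_case))
```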
Aliloo, Hassan; Pryce, Jennie E; González-Recio, Oscar; Cocks, Benjamin G; Hayes, Ben J
2016-02-01
Dominance effects may contribute to genetic variation of complex traits in dairy cattle, especially for traits closely related to fitness such as fertility. However, traditional genetic evaluations generally ignore dominance effects and consider additive genetic effects only. The availability of dense single nucleotide polymorphism (SNP) panels provides the opportunity to investigate the role of dominance in quantitative variation of complex traits at both the SNP and animal levels. Including dominance effects in the genomic evaluation of animals could also help to increase the accuracy of prediction of future phenotypes. In this study, we estimated additive and dominance variance components for fertility and milk production traits of genotyped Holstein and Jersey cows in Australia. The predictive abilities of a model that accounts for additive effects only (additive), and a model that accounts for both additive and dominance effects (additive + dominance) were compared in a fivefold cross-validation. Estimates of the proportion of dominance variation relative to phenotypic variation that is captured by SNPs, for production traits, were up to 3.8% and 7.1% in Holstein and Jersey cows, respectively, whereas, for fertility, they were equal to 1.2% in Holstein and very close to zero in Jersey cows. We found that including dominance in the model was not consistently advantageous. Based on likelihood ratio tests, the additive + dominance model fitted the data better than the additive model for milk, fat and protein yields in both breeds. However, regarding the prediction of phenotypes assessed with fivefold cross-validation, including dominance effects in the model improved accuracy only for fat yield in Holstein cows. Regression coefficients of phenotypes on genetic values and mean squared errors of predictions showed that the predictive ability of the additive + dominance model was superior to that of the additive model for some of the traits.
In both breeds, dominance effects were significant (P < 0.01) for all milk production traits but not for fertility. Accuracy of prediction of phenotypes was slightly increased by including dominance effects in the genomic evaluation model. Thus, including dominance effects can help to better identify high-performing individuals and can be useful for culling decisions.
Jarquin, Diego; Specht, James; Lorenz, Aaron
2016-08-09
The identification and mobilization of useful genetic variation from germplasm banks for use in breeding programs is critical for future genetic gain and protection against crop pests. Plummeting costs of next-generation sequencing and genotyping are revolutionizing the way in which researchers and breeders interface with plant germplasm collections. An example of this is the high-density genotyping of the entire USDA Soybean Germplasm Collection. We assessed the usefulness of 50K single nucleotide polymorphism data collected on 18,480 domesticated soybean (Glycine max) accessions and vast historical phenotypic data for developing genomic prediction models for protein, oil, and yield. The resulting genomic prediction models explained an appreciable amount of the variation in accession performance in independent validation trials, with correlations between predicted and observed values reaching up to 0.92 for oil and protein and 0.79 for yield. The optimization of training set design was explored using a series of cross-validation schemes. First, the target population and environment need to be well represented in the training set. Second, genomic prediction training sets appear to be robust to the presence of data from diverse geographical locations and genetic clusters. This finding, however, depends on the influence of shattering and lodging, and may be specific to soybean with its presence of maturity groups. The distribution of 7608 nonphenotyped accessions was examined through the application of genomic prediction models. The distribution of predictions of phenotyped accessions was representative of the distribution of predictions for nonphenotyped accessions, with no nonphenotyped accessions being predicted to fall far outside the range of predictions of phenotyped accessions. Copyright © 2016 Jarquin et al.
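Predictive ability in cross-validation is typically reported as the correlation between predicted and observed values in the held-out folds. The sketch below illustrates that workflow using ridge regression on marker genotypes (an RR-BLUP-style model with a fixed shrinkage parameter). This is an assumed stand-in for whatever model the authors actually fitted. The fold scheme is simple random k-fold, not the structured training-set designs explored in the study. The names `ridge_fit`, `cv_predictive_ability` and the value `lam=1.0` are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression marker effects with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(A, Xc.T @ (y - y_mean))
    return y_mean - x_mean @ beta, beta

def cv_predictive_ability(X, y, lam=1.0, k=5, seed=0):
    """Mean Pearson correlation between predicted and observed values over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    cors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        b0, beta = ridge_fit(X[train_idx], y[train_idx], lam)
        pred = b0 + X[test_idx] @ beta
        cors.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(cors))

# Hypothetical simulated data: 300 lines, 50 markers coded 0/1/2.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(300, 50)).astype(float)
y = X @ rng.normal(0.0, 1.0, 50) + rng.normal(0.0, 1.0, 300)
r = cv_predictive_ability(X, y, lam=1.0, k=5)
print(r)
```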
DOE Office of Scientific and Technical Information (OSTI.GOV)
Evans, J.S.; Moeller, D.W.; Cooper, D.W.
1985-07-01
Analysis of the radiological health effects of nuclear power plant accidents requires models for predicting early health effects, cancers and benign thyroid nodules, and genetic effects. Since the publication of the Reactor Safety Study, additional information on radiological health effects has become available. This report summarizes the efforts of a program designed to provide revised health effects models for nuclear power plant accident consequence modeling. The new models for early effects address four causes of mortality and nine categories of morbidity. The models for early effects are based upon two-parameter Weibull functions. They permit evaluation of the influence of dose protraction and address the issue of variation in radiosensitivity among the population. The piecewise-linear dose-response models used in the Reactor Safety Study to predict cancers and thyroid nodules have been replaced by linear and linear-quadratic models. The new models reflect the most recently reported results of the follow-up of the survivors of the bombings of Hiroshima and Nagasaki and permit analysis of both morbidity and mortality. The new models for genetic effects allow prediction of genetic risks in each of the first five generations after an accident and include information on the relative severity of various classes of genetic effects. The uncertainty in modeling radiological health risks is addressed by providing central, upper, and lower estimates of risks. An approach is outlined for summarizing the health consequences of nuclear power plant accidents. 298 refs., 9 figs., 49 tabs.
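The two dose-response families named above can be sketched as follows. For the early-effects Weibull model, the sketch assumes the common parameterization risk = 1 - exp(-ln2 · (D/D50)^V), which gives exactly 50% risk at the median effective dose D50. The linear-quadratic late-effects model is written as αD + βD². The parameter values in the example are hypothetical and are not taken from the report.

```python
import math

def weibull_risk(dose, d50, shape):
    """Two-parameter Weibull early-effect risk: 1 - exp(-ln2 * (D / D50)**V).

    d50 (dose giving 50% risk) and shape (V) are the two parameters; this
    parameterization is an assumption, chosen so risk(d50) == 0.5.
    """
    if dose <= 0:
        return 0.0
    hazard = math.log(2.0) * (dose / d50) ** shape
    return 1.0 - math.exp(-hazard)

def linear_quadratic_risk(dose, alpha, beta):
    """Linear-quadratic excess risk for late effects: alpha*D + beta*D**2."""
    return alpha * dose + beta * dose ** 2

# Hypothetical parameters for illustration only.
for d in (1.0, 2.0, 3.0, 4.0):
    print(d, round(weibull_risk(d, d50=3.0, shape=5.0), 3))
print(linear_quadratic_risk(2.0, alpha=0.01, beta=0.005))
```

The shape parameter controls how steeply risk rises around D50, which is one way a model of this kind can represent variation in radiosensitivity across a population.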
Genome-Wide Prediction of the Performance of Three-Way Hybrids in Barley.
Li, Zuo; Philipp, Norman; Spiller, Monika; Stiewe, Gunther; Reif, Jochen C; Zhao, Yusheng
2017-03-01
Predicting the grain yield performance of three-way hybrids is challenging. Three-way crosses are relevant for hybrid breeding in barley (Hordeum vulgare L.) and maize (Zea mays L.) adapted to East Africa. The main goal of our study was to implement and evaluate genome-wide prediction approaches for the performance of three-way hybrids using data from single-cross hybrids, for a scenario in which the parental lines of the three-way hybrids originate from three genetically distinct subpopulations. We extended ridge regression best linear unbiased prediction (RRBLUP) and devised a genomic selection model allowing for subpopulation-specific marker effects (GSA-RRBLUP: general and subpopulation-specific additive RRBLUP). Using an empirical barley data set, we showed that applying GSA-RRBLUP tripled the prediction ability of three-way hybrids, from 0.095 to 0.308, compared with RRBLUP, which models a single additive effect for all three subpopulations. The experimental findings were further substantiated with computer simulations. Our results emphasize the potential of GSA-RRBLUP to improve genome-wide prediction of three-way hybrids for scenarios of genetically diverse parental populations. Because of the advantages of the GSA-RRBLUP model in dealing with hybrids from different parental populations, it may also be a promising approach to boost the prediction ability for hybrid breeding programs based on genetically diverse heterotic groups. Copyright © 2017 Crop Science Society of America.
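The idea of "general plus subpopulation-specific" marker effects can be illustrated by augmenting the marker design matrix. The first block of columns carries effects shared by all subpopulations. Each subsequent block is non-zero only for individuals from one subpopulation, so its ridge coefficients act as subpopulation-specific deviations. This is a deliberately simplified sketch that assumes each individual belongs to a single subpopulation. It does not reproduce the paper's handling of three-way hybrids whose three parents come from different subpopulations, and all names and the shrinkage value are hypothetical.

```python
import numpy as np

def augment_markers(X, pop, n_pops):
    """Return [X | X*1(pop==0) | ... | X*1(pop==n_pops-1)]."""
    X = np.asarray(X, float)
    pop = np.asarray(pop)
    blocks = [X] + [X * (pop == s)[:, None] for s in range(n_pops)]
    return np.hstack(blocks)

def ridge_fit(X, y, lam):
    """Ridge regression (RR-BLUP with a fixed lambda) with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(A, Xc.T @ (y - y_mean))
    return y_mean - x_mean @ beta, beta

# Usage sketch: fit on augmented markers, then predict new individuals by
# augmenting their genotypes with their own subpopulation labels.
rng = np.random.default_rng(2)
X = rng.integers(0, 3, size=(300, 20)).astype(float)
pop = rng.integers(0, 3, size=300)
y = X @ rng.normal(0.0, 0.5, 20) + rng.normal(0.0, 1.0, 300)
b0, beta = ridge_fit(augment_markers(X, pop, 3), y, lam=10.0)
pred = b0 + augment_markers(X[:5], pop[:5], 3) @ beta
print(pred)
```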
Mühlenbruch, Kristin; Jeppesen, Charlotte; Joost, Hans-Georg; Boeing, Heiner; Schulze, Matthias B
2013-01-01
Genome-wide association studies have identified numerous single nucleotide polymorphisms associated with type 2 diabetes in recent years. In previous studies, the usefulness of these genetic markers for prediction of diabetes was found to be limited. However, differences may exist between substrata of the population according to the presence of major diabetes risk factors. This study aimed to investigate the added predictive value of genetic information (42 single nucleotide polymorphisms) in subgroups of sex, age, family history of diabetes, and obesity. A case-cohort study (random subcohort N = 1,968; incident cases: N = 578) within the European Prospective Investigation into Cancer and Nutrition Potsdam study was used. Prediction models without and with genetic information were evaluated in terms of the area under the receiver operating characteristic curve and the integrated discrimination improvement. Stratified analyses included subgroups of sex, age (<50 or ≥50 years), family history (positive if the father, mother or a sibling has/had diabetes), and obesity (BMI <30 or ≥30 kg/m²). A genetic risk score did not improve prediction above classic and metabolic markers, but, compared to a non-invasive prediction model, genetic information slightly improved the area under the receiver operating characteristic curve (difference [95% CI]: 0.007 [0.002 to 0.011]). Stratified analyses showed stronger improvement in the older age group (0.010 [0.002 to 0.018]), the group with a positive family history (0.012 [0.000 to 0.023]) and among obese participants (0.015 [-0.005 to 0.034]) compared to the younger participants (0.005 [-0.004 to 0.014]), participants with a negative family history (0.003 [-0.001 to 0.008]) and non-obese participants (0.007 [0.000 to 0.014]), respectively. No difference was found between men and women. There was no incremental value of genetic information compared to standard non-invasive and metabolic markers.
Our study suggests that inclusion of genetic variants in diabetes risk prediction might be useful for subgroups with already manifest risk factors such as older age, a positive family history and obesity.
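The integrated discrimination improvement (IDI) used above compares how much a new model raises the mean predicted risk in cases against how much it raises the mean predicted risk in non-cases. The sketch below implements that unweighted definition. As a stated simplification, it ignores the sampling weights that a case-cohort design would require, and the function name is hypothetical.

```python
import numpy as np

def integrated_discrimination_improvement(p_old, p_new, is_case):
    """IDI = (mean risk gain among cases) - (mean risk gain among non-cases).

    p_old, p_new: predicted risks from the model without and with genetic
    information. No case-cohort weighting is applied.
    """
    p_old = np.asarray(p_old, float)
    p_new = np.asarray(p_new, float)
    is_case = np.asarray(is_case, bool)
    gain_cases = (p_new[is_case] - p_old[is_case]).mean()
    gain_noncases = (p_new[~is_case] - p_old[~is_case]).mean()
    return gain_cases - gain_noncases

# Toy example: the new model raises risk for both cases by 0.1 and lowers it
# for one non-case by 0.1, so IDI = 0.1 - (-0.05) = 0.15.
idi = integrated_discrimination_improvement(
    p_old=[0.2, 0.4, 0.3, 0.1], p_new=[0.3, 0.5, 0.2, 0.1], is_case=[1, 1, 0, 0])
print(idi)
```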
Genomic selection for crossbred performance accounting for breed-specific effects.
Lopes, Marcos S; Bovenhuis, Henk; Hidalgo, André M; van Arendonk, Johan A M; Knol, Egbert F; Bastiaansen, John W M
2017-06-26
Breed-specific effects are observed when the same allele of a given genetic marker has a different effect depending on its breed origin, which results in different allele substitution effects across breeds. In such a case, single-breed breeding values may not be the most accurate predictors of crossbred performance. Our aim was to estimate the contribution of alleles from each parental breed to the genetic variance of traits that are measured in crossbred offspring, and to compare the prediction accuracies of estimated direct genomic values (DGV) from a traditional genomic selection model (GS) that are trained on purebred or crossbred data, with accuracies of DGV from a model that accounts for breed-specific effects (BS), trained on purebred or crossbred data. The final dataset was composed of 924 Large White, 924 Landrace and 924 two-way cross (F1) genotyped and phenotyped animals. The traits evaluated were litter size (LS) and gestation length (GL) in pigs. The genetic correlation between purebred and crossbred performance was higher than 0.88 for both LS and GL. For both traits, the additive genetic variance was larger for alleles inherited from the Large White breed compared to alleles inherited from the Landrace breed (0.74 and 0.56 for LS, and 0.42 and 0.40 for GL, respectively). The highest prediction accuracies of crossbred performance were obtained when training was done on crossbred data. For LS, prediction accuracies were the same for GS and BS DGV (0.23), while for GL, prediction accuracy for BS DGV was similar to the accuracy of GS DGV (0.53 and 0.52, respectively). In this study, training on crossbred data resulted in higher prediction accuracy than training on purebred data and evidence of breed-specific effects for LS and GL was demonstrated. However, when training was done on crossbred data, both GS and BS models resulted in similar prediction accuracies. 
In future studies, traits with a lower genetic correlation between purebred and crossbred performance should be included to further assess the value of the BS model in genomic predictions.
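The distinction between the traditional (GS) and breed-specific (BS) models comes down to how a crossbred genotype is coded. GS counts copies of an allele regardless of origin (0, 1, 2). BS counts the copy inherited from each parental breed separately, so each breed origin gets its own allele substitution effect. The sketch below assumes that the breed origin of each allele is known (for example, from phasing) and fits both codings by ordinary least squares on hypothetical simulated F1 data, where the true effect differs by origin. It is not the authors' mixed-model implementation.

```python
import numpy as np

def breed_specific_design(allele_from_a, allele_from_b):
    """BS-style coding: one 0/1 column per breed of origin of the counted allele."""
    return np.column_stack([np.asarray(allele_from_a, float),
                            np.asarray(allele_from_b, float)])

def standard_design(allele_from_a, allele_from_b):
    """GS-style coding: origin ignored, genotype is the total count (0, 1, 2)."""
    return (np.asarray(allele_from_a, float) + np.asarray(allele_from_b, float))[:, None]

# Hypothetical F1 data: the allele has effect 0.5 when inherited from breed A
# and 0.1 when inherited from breed B.
rng = np.random.default_rng(0)
n = 2000
a = rng.integers(0, 2, n)
b = rng.integers(0, 2, n)
y = 0.5 * a + 0.1 * b + rng.normal(0.0, 0.1, n)

ones = np.ones((n, 1))
coef_bs, *_ = np.linalg.lstsq(np.hstack([ones, breed_specific_design(a, b)]), y, rcond=None)
coef_gs, *_ = np.linalg.lstsq(np.hstack([ones, standard_design(a, b)]), y, rcond=None)
print(coef_bs)  # intercept, breed-A effect, breed-B effect
print(coef_gs)  # intercept, single pooled effect
```

When the two origin-specific effects differ, the GS coding can only recover a compromise between them. This is why breed-specific effects can matter when purebred and crossbred performance are less strongly correlated.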
Discriminatory power of common genetic variants in personalized breast cancer diagnosis
NASA Astrophysics Data System (ADS)
Wu, Yirong; Abbey, Craig K.; Liu, Jie; Ong, Irene; Peissig, Peggy; Onitilo, Adedayo A.; Fan, Jun; Yuan, Ming; Burnside, Elizabeth S.
2016-03-01
Technological advances in genome-wide association studies (GWAS) have engendered optimism that we have entered a new age of precision medicine, in which the risk of breast cancer can be predicted on the basis of a person's genetic variants. The goal of this study is to evaluate the discriminatory power of common genetic variants in breast cancer risk estimation. We conducted a retrospective case-control study drawing from an existing personalized medicine data repository. We collected variables that predict breast cancer risk: 153 high-frequency/low-penetrance genetic variants, reflecting the state-of-the-art GWAS on breast cancer, together with mammography descriptors and assessment categories in the Breast Imaging Reporting and Data System (BI-RADS) lexicon. We trained and tested naïve Bayes models using these predictive variables. We generated ROC curves and used the area under the ROC curve (AUC) to quantify predictive performance. We found that genetic variants achieved predictive performance comparable to BI-RADS assessment categories in terms of AUC (0.650 vs. 0.659, p-value = 0.742), but significantly lower predictive performance than the combination of BI-RADS assessment categories and mammography descriptors (0.650 vs. 0.751, p-value < 0.001). A better understanding of the relative predictive capability of genetic variants and mammography data may help clinicians and patients make appropriate decisions about breast cancer screening, prevention, and treatment in the era of precision medicine.
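The AUC values compared above can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control. The sketch below computes that Mann-Whitney form directly, counting ties as one half. It only illustrates the metric. The naïve Bayes models and the significance test used to compare AUCs in the study are not reproduced, and the function name is hypothetical.

```python
import numpy as np

def auc(scores, labels):
    """AUC = P(score_case > score_control) + 0.5 * P(tie), over all case-control pairs."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]   # every case-control pair
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # perfect separation
print(auc([0.7, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # partial overlap
```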
Genomic Prediction Accounting for Residual Heteroskedasticity
Ou, Zhining; Tempelman, Robert J.; Steibel, Juan P.; Ernst, Catherine W.; Bates, Ronald O.; Bello, Nora M.
2015-01-01
Whole-genome prediction (WGP) models that use single-nucleotide polymorphism marker information to predict genetic merit of animals and plants typically assume homogeneous residual variance. However, variability is often heterogeneous across agricultural production systems and may subsequently bias WGP-based inferences. This study extends classical WGP models based on normality, heavy-tailed specifications and variable selection to explicitly account for environmentally driven residual heteroskedasticity under a hierarchical Bayesian mixed-models framework. WGP models assuming homogeneous or heterogeneous residual variances were fitted to training data generated under simulation scenarios reflecting a gradient of increasing heteroskedasticity. Model fit was based on pseudo-Bayes factors and also on prediction accuracy of genomic breeding values computed on a validation data subset one generation removed from the simulated training dataset. Homogeneous vs. heterogeneous residual variance WGP models were also fitted to two quantitative traits, namely 45-min postmortem carcass temperature and loin muscle pH, recorded in a swine resource population dataset prescreened for high and mild residual heteroskedasticity, respectively. Fit of competing WGP models was compared using pseudo-Bayes factors. Predictive ability, defined as the correlation between predicted and observed phenotypes in validation sets of a five-fold cross-validation, was also computed. Heteroskedastic error WGP models showed improved model fit and enhanced prediction accuracy compared to homoskedastic error WGP models, although the magnitude of the improvement was small (less than two percentage points net gain in prediction accuracy). Nevertheless, accounting for residual heteroskedasticity did improve accuracy of selection, especially for individuals of extreme genetic merit. PMID:26564950
Global skin colour prediction from DNA.
Walsh, Susan; Chaitanya, Lakshmi; Breslin, Krystal; Muralidharan, Charanya; Bronikowska, Agnieszka; Pospiech, Ewelina; Koller, Julia; Kovatsi, Leda; Wollstein, Andreas; Branicki, Wojciech; Liu, Fan; Kayser, Manfred
2017-07-01
Human skin colour is highly heritable and externally visible, with relevance in medical, forensic, and anthropological genetics. Although eye and hair colour can already be predicted with high accuracies from small sets of carefully selected DNA markers, knowledge about the genetic predictability of skin colour is limited. Here, we investigate the skin colour predictive value of 77 single-nucleotide polymorphisms (SNPs) from 37 genetic loci previously associated with human pigmentation using 2025 individuals from 31 global populations. We identified a minimal set of 36 highly informative skin colour predictive SNPs and developed a statistical prediction model capable of skin colour prediction on a global scale. Average cross-validated prediction accuracies expressed as area under the receiver-operating characteristic curve (AUC) ± standard deviation were 0.97 ± 0.02 for Light, 0.83 ± 0.11 for Dark, and 0.96 ± 0.03 for Dark-Black. When using a 5-category scale, this resulted in 0.74 ± 0.05 for Very Pale, 0.72 ± 0.03 for Pale, 0.73 ± 0.03 for Intermediate, 0.87 ± 0.1 for Dark, and 0.97 ± 0.03 for Dark-Black. A comparative analysis in 194 independent samples from 17 populations demonstrated that our model outperformed a previously proposed 10-SNP-classifier approach, with AUCs rising from 0.79 to 0.82 for White, remaining comparable at the intermediate level (0.63 and 0.62, respectively), and increasing substantially from 0.64 to 0.92 for Black. Overall, this study demonstrates that the chosen DNA markers and prediction model, particularly at the 5-category level, allow skin colour predictions within and between continental regions for the first time, which will serve as a valuable resource for future applications in forensic and anthropological genetics.
Mastrangelo, Giuseppe; Carta, Angela; Arici, Cecilia; Pavanello, Sofia; Porru, Stefano
2017-01-01
No etiological prediction model incorporating biomarkers is available to predict bladder cancer risk associated with occupational exposure to aromatic amines. Cases were 199 bladder cancer patients. Clinical, laboratory and genetic data were predictors in logistic regression models (full and short) in which the dependent variable was 1 for 15 patients with aromatic amine-related bladder cancer and 0 otherwise. The receiver operating characteristics approach was adopted, and the area under the curve was used to evaluate the discriminatory ability of the models. The area under the curve was 0.93 for the full model (including age, smoking and coffee habits, DNA adducts, 12 genotypes) and 0.86 for the short model (including smoking, DNA adducts, 3 genotypes). Using the "best cut-off" of predicted probability of a positive outcome, the percentage of cases correctly classified was 92% (full model) against 75% (short model). Cancers classified as "positive outcome" are those to be referred for evaluation by an occupational physician for etiological diagnosis; these numbered 28 patients (full model) or 60 patients (short model). Using 3 genotypes instead of 12 can double the number of patients with suspected aromatic amine-related cancer, thus increasing the costs of etiologic appraisal. Integrating clinical, laboratory and genetic factors, we developed the first etiologic prediction model for aromatic amine-related bladder cancer. Discriminatory ability was excellent, particularly for the full model, allowing individualized predictions. Validation of our model in external populations is essential for practical use in the clinical setting.
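The abstract does not state how the "best cut-off" was chosen. One common choice, assumed here, is the threshold that maximizes Youden's J = sensitivity + specificity - 1 over all observed predicted probabilities. The sketch below scans those candidate thresholds. The function name and toy data are hypothetical.

```python
import numpy as np

def youden_cutoff(prob, outcome):
    """Return (threshold, J) maximizing sensitivity + specificity - 1.

    A case is classified positive when prob >= threshold; candidate
    thresholds are the distinct observed probabilities.
    """
    prob = np.asarray(prob, float)
    outcome = np.asarray(outcome, bool)
    best_t, best_j = None, -np.inf
    for t in np.unique(prob):
        pred = prob >= t
        sens = (pred & outcome).sum() / outcome.sum()
        spec = (~pred & ~outcome).sum() / (~outcome).sum()
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

t, j = youden_cutoff([0.1, 0.2, 0.3, 0.7, 0.8, 0.9], [0, 0, 0, 1, 1, 1])
print(t, j)
```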
Multivariate Cholesky models of human female fertility patterns in the NLSY.
Rodgers, Joseph Lee; Bard, David E; Miller, Warren B
2007-03-01
Substantial evidence now exists that variables measuring or correlated with human fertility outcomes have a heritable component. In this study, we define a series of age-sequenced fertility variables, and fit multivariate models to account for underlying shared genetic and environmental sources of variance. We make predictions based on a theory developed by Udry [(1996) Biosocial models of low-fertility societies. In: Casterline, JB, Lee RD, Foote KA (eds) Fertility in the United States: new patterns, new theories. The Population Council, New York] suggesting that biological/genetic motivations can be more easily realized and measured in settings in which fertility choices are available. Udry's theory, along with principles from molecular genetics and certain tenets of life history theory, allows us to make specific predictions about biometrical patterns across age. Consistent with predictions, our results suggest that there are different sources of genetic influence on fertility variance at early compared to later ages, but that there is only one source of shared environmental influence that occurs at early ages. These patterns are suggestive of the types of gene-gene and gene-environment interactions for which we must account to better understand individual differences in fertility outcomes.
Improving production efficiency through genetic selection
USDA-ARS's Scientific Manuscript database
The goal of dairy cattle breeding is to increase productivity and efficiency by means of genetic selection. This is possible because related animals share some of their DNA in common, and we can use statistical models to predict the genetic merit of animals based on the performance of their relatives. ...
Strengthening the reporting of genetic risk prediction studies (GRIPS): explanation and elaboration
Janssens, A Cecile JW; Ioannidis, John PA; Bedrosian, Sara; Boffetta, Paolo; Dolan, Siobhan M; Dowling, Nicole; Fortier, Isabel; Freedman, Andrew N; Grimshaw, Jeremy M; Gulcher, Jeffrey; Gwinn, Marta; Hlatky, Mark A; Janes, Holly; Kraft, Peter; Melillo, Stephanie; O'Donnell, Christopher J; Pencina, Michael J; Ransohoff, David; Schully, Sheri D; Seminara, Daniela; Winn, Deborah M; Wright, Caroline F; van Duijn, Cornelia M; Little, Julian; Khoury, Muin J
2011-01-01
The rapid and continuing progress in gene discovery for complex diseases is fueling interest in the potential application of genetic risk models for clinical and public health practice. The number of studies assessing the predictive ability is steadily increasing, but they vary widely in completeness of reporting and apparent quality. Transparent reporting of the strengths and weaknesses of these studies is important to facilitate the accumulation of evidence on genetic risk prediction. A multidisciplinary workshop sponsored by the Human Genome Epidemiology Network developed a checklist of 25 items recommended for strengthening the reporting of Genetic RIsk Prediction Studies (GRIPS), building on the principles established by previous reporting guidelines. These recommendations aim to enhance the transparency, quality and completeness of study reporting, and thereby to improve the synthesis and application of information from multiple studies that might differ in design, conduct or analysis. PMID:21407270
Diallel analysis for sex-linked and maternal effects.
Zhu, J; Weir, B S
1996-01-01
Genetic models including sex-linked and maternal effects as well as autosomal gene effects are described. Monte Carlo simulations were conducted to compare efficiencies of estimation by minimum norm quadratic unbiased estimation (MINQUE) and restricted maximum likelihood (REML) methods. MINQUE(1), which has 1 for all prior values, has a similar efficiency to MINQUE(θ), which requires prior estimates of parameter values. MINQUE(1) has the advantage over REML of unbiased estimation and convenient computation. An adjusted unbiased prediction (AUP) method is developed for predicting random genetic effects. AUP is desirable for its easy computation and unbiasedness of both mean and variance of predictors. The jackknife procedure is appropriate for estimating the sampling variances of estimated variances (or covariances) and of predicted genetic effects. A t-test based on jackknife variances is applicable for detecting significance of variation. Worked examples from mice and silkworm data are given in order to demonstrate variance and covariance estimation and genetic effect prediction.
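The jackknife step described above can be sketched generically. Each sampling unit is deleted in turn, the estimate is recomputed, and the spread of those leave-one-out estimates gives a variance estimate. A t statistic is then the full-sample estimate divided by its jackknife standard error. This sketch assumes that the rows of `data` are the resampling units (for example, blocks or replications). It does not reproduce the MINQUE or AUP computations themselves, and the function names are hypothetical.

```python
import numpy as np

def jackknife_variance(estimator, data):
    """Delete-one jackknife variance of estimator(data); rows are resampling units."""
    data = np.asarray(data, float)
    n = data.shape[0]
    theta = np.array([estimator(np.delete(data, i, axis=0)) for i in range(n)])
    return (n - 1) / n * np.sum((theta - theta.mean()) ** 2)

def jackknife_t(estimator, data):
    """t statistic: full-sample estimate divided by its jackknife standard error."""
    data = np.asarray(data, float)
    return estimator(data) / np.sqrt(jackknife_variance(estimator, data))

# For the sample mean, the jackknife variance equals s^2 / n exactly.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(jackknife_variance(np.mean, x), jackknife_t(np.mean, x))
```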
Assessing non-additive effects in GBLUP model.
Vieira, I C; Dos Santos, J P R; Pires, L P M; Lima, B M; Gonçalves, F M A; Balestre, M
2017-05-10
Understanding non-additive effects in the expression of quantitative traits is very important in genotype selection, especially in species where the commercial products are clones or hybrids. The use of molecular markers has allowed the study of non-additive genetic effects at the genomic level, in addition to a better understanding of their importance in quantitative traits. Thus, the purpose of this study was to evaluate the behavior of the GBLUP model under different genetic models and relationship matrices and their influence on the estimates of genetic parameters. We used real data on circumference at breast height in Eucalyptus spp. and simulated data from an F2 population. Three kinship structures commonly reported in the literature were adopted. The simulation results showed that the inclusion of epistatic kinship improved prediction estimates of genomic breeding values. However, the non-additive effects were not accurately recovered. The Fisher information matrix for the real dataset showed high collinearity in estimates of additive, dominance, and epistatic variance, resulting in no gain in the prediction of the unobserved data and in convergence problems. Estimates of genetic parameters and correlations differed among the kinship structures. Our results show that the inclusion of non-additive effects can improve the predictive ability or even the prediction of additive effects. However, the strong distortions observed in the variance estimates when the Hardy-Weinberg equilibrium assumption is violated, due to selection or inbreeding, can lead to zero gains in models that consider epistasis in genomic kinship.
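Epistatic kinship matrices are often built as element-wise (Hadamard) products of additive and dominance relationship matrices, for example G∘G for additive-by-additive epistasis. The abstract does not say which three kinship structures were used. The sketch below assumes this Hadamard-product approach and rescales the result so that its mean diagonal equals 1. The function name and the small example matrix are hypothetical.

```python
import numpy as np

def hadamard_kinship(K1, K2):
    """Element-wise product of two relationship matrices, scaled to mean diagonal 1.

    hadamard_kinship(G, G) gives an additive-by-additive kinship,
    hadamard_kinship(G, D) additive-by-dominance, and so on.
    """
    H = np.asarray(K1, float) * np.asarray(K2, float)
    return H / (np.trace(H) / H.shape[0])

G = np.array([[1.2, 0.3],
              [0.3, 0.8]])
K_AA = hadamard_kinship(G, G)   # additive-by-additive epistatic kinship
print(K_AA)
```

Because Hadamard products of relationship matrices are often highly correlated with the matrices they are built from, the additive, dominance and epistatic variance components can be hard to separate. This is consistent with the collinearity reported above.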
The mathematical limits of genetic prediction for complex chronic disease.
Keyes, Katherine M; Smith, George Davey; Koenen, Karestan C; Galea, Sandro
2015-06-01
Attempts at predicting individual risk of disease based on common germline genetic variation have largely been disappointing. The present paper formalises why genetic prediction at the individual level has, and will continue to have, limited utility given the aetiological architecture of most common complex diseases. Data were simulated on one million populations with 10 000 individuals in each population, with varying prevalences of a genetic risk factor, an interacting environmental factor and the background rate of disease. The magnitude of the risk ratio and risk difference for the association between a gene variant and disease is a function of the prevalence of the interacting factors that activate the gene, and of the background rate of disease. The risk ratio and total excess cases due to the genetic factor increase as the prevalence of interacting factors increases, and decrease as the background rate of disease increases. Germline genetic variations have high predictive capacity for individual disease only under conditions of high heritability of particular genetic sequences, plausible only under rare variant hypotheses. Under a model of common germline genetic variants that interact with other genes and/or environmental factors in order to cause disease, the predictive capacity of common genetic variants is determined by the prevalence of the factors that interact with the variant and the background rate. A focus on estimating genetic associations for the purpose of prediction without explicitly grounding such work in an understanding of modifiable (including environmentally influenced) factors will be limited in its ability to yield important insights about the risk of disease. Published by the BMJ Publishing Group Limited.
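The dependence of the gene-disease association on the interacting factor and the background rate can be shown with a small calculation. The sketch assumes a hypothetical model that is simpler than the paper's simulation. Everyone has the background risk. Carriers who also have the interacting factor, with prevalence p, have a joint risk (here 1, meaning the joint presence is sufficient to cause disease). Then the risk difference is p·(1 - r0), and the risk ratio is 1 + p·(1 - r0)/r0. Both rise with p and fall as the background rate r0 rises.

```python
def gene_disease_association(p_interacting, background_risk, joint_risk=1.0):
    """Risk ratio and risk difference for carriers vs non-carriers.

    Hypothetical model: non-carriers have background_risk; carriers have
    joint_risk if they also have the interacting factor (prevalence
    p_interacting) and background_risk otherwise.
    """
    risk_carrier = p_interacting * joint_risk + (1 - p_interacting) * background_risk
    risk_noncarrier = background_risk
    return risk_carrier / risk_noncarrier, risk_carrier - risk_noncarrier

for p in (0.2, 0.5, 0.8):
    for r0 in (0.05, 0.1, 0.3):
        rr, rd = gene_disease_association(p, r0)
        # Excess cases among N carriers would be rd * N.
        print(f"p={p:.1f} r0={r0:.2f} RR={rr:.2f} RD={rd:.3f}")
```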
Genomic Selection in Multi-environment Crop Trials.
Oakey, Helena; Cullis, Brian; Thompson, Robin; Comadran, Jordi; Halpin, Claire; Waugh, Robbie
2016-05-03
Genomic selection in crop breeding introduces modeling challenges not found in animal studies. These include the need to accommodate replicate plants for each line, consider spatial variation in field trials, address line-by-environment interactions, and capture nonadditive effects. Here, we propose a flexible single-stage genomic selection approach that resolves these issues. Our linear mixed model incorporates spatial variation through environment-specific terms, as well as randomization-based design terms. It models marker effects and marker-by-environment interactions using ridge regression best linear unbiased prediction to extend genomic selection to multiple environments. Since the approach uses the raw data from line replicates, the line genetic variation is partitioned into marker and nonmarker residual genetic variation (i.e., additive and nonadditive effects). This results in a more precise estimate of marker genetic effects. Using barley height data from trials, in 2 different years, of up to 477 cultivars, we demonstrate that our new genomic selection model improves predictions compared to current models. Analyzing single trials revealed improvements in predictive ability of up to 5.7%. For the multiple environment trial (MET) model, combining both year trials improved predictive ability by up to 11.4% compared to a single environment analysis. Benefits were significant even when fewer markers were used. Compared to a single-year standard model run with 3490 markers, our partitioned MET model achieved the same predictive ability using between 500 and 1000 markers, depending on the trial. Our approach can be used to increase accuracy and confidence in the selection of the best lines for breeding and/or to reduce costs by using fewer markers. Copyright © 2016 Oakey et al.
Evans, Jonathan P; Simmons, Leigh W
2008-09-01
The good-sperm and sexy-sperm (GS-SS) hypotheses predict that female multiple mating (polyandry) can fuel sexual selection for heritable male traits that promote success in sperm competition. A major prediction generated by these models, therefore, is that polyandry will benefit females indirectly via their sons' enhanced fertilization success. Furthermore, like classic 'good genes' and 'sexy son' models for the evolution of female preferences, GS-SS processes predict a genetic correlation between genes for female mating frequency (analogous to the female preference) and those for traits influencing fertilization success (the sexually selected traits). We examine the premise for these predictions by exploring the genetic basis of traits thought to influence fertilization success and female mating frequency. We also highlight recent debates that stress the possible genetic constraints to evolution of traits influencing fertilization success via GS-SS processes, including sex-linked inheritance, nonadditive effects, interacting parental genotypes, and trade-offs between integrated ejaculate components. Despite these possible constraints, the available data suggest that male traits involved in sperm competition typically exhibit substantial additive genetic variance and rapid evolutionary responses to selection. Nevertheless, the limited data on the genetic variation in female mating frequency implicate strong genetic maternal effects, including X-linkage, which is inconsistent with GS-SS processes. Although the relative paucity of studies on the genetic basis of polyandry does not allow us to draw firm conclusions about the evolutionary origins of this trait, the emerging pattern of sex linkage in genes for polyandry is more consistent with an evolutionary history of antagonistic selection over mating frequency. We advocate further development of GS-SS theory to take account of the complex evolutionary dynamics imposed by sexual conflict over mating frequency.
Unraveling additive from nonadditive effects using genomic relationship matrices.
Muñoz, Patricio R; Resende, Marcio F R; Gezan, Salvador A; Resende, Marcos Deon Vilela; de Los Campos, Gustavo; Kirst, Matias; Huber, Dudley; Peter, Gary F
2014-12-01
The application of quantitative genetics in plant and animal breeding has largely focused on additive models, which may also capture dominance and epistatic effects. Partitioning genetic variance into its additive and nonadditive components using pedigree-based models (pedigree-based best linear unbiased prediction, P-BLUP) is difficult with most commonly available family structures. However, the availability of dense panels of molecular markers makes possible the use of additive- and dominance-realized genomic relationships for the estimation of variance components and the prediction of genetic values (G-BLUP). We evaluated height data from a multifamily population of the tree species Pinus taeda with a systematic series of models accounting for additive, dominance, and first-order epistatic interactions (additive by additive, dominance by dominance, and additive by dominance), using either pedigree- or marker-based information. We show that, compared with the pedigree, use of realized genomic relationships in marker-based models yields a substantially more precise separation of additive and nonadditive components of genetic variance. We conclude that the marker-based relationship matrices in a model including additive and nonadditive effects performed better, improving breeding value prediction. Moreover, our results suggest that, for tree height in this population, the additive and nonadditive components of genetic variance are similar in magnitude. This novel result improves our current understanding of the genetic control and architecture of a quantitative trait and should be considered when developing breeding strategies. Copyright © 2014 by the Genetics Society of America.
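Realized relationship matrices of the kind compared above can be built directly from a genotype matrix. The sketch below assumes 0/1/2 genotype coding, a VanRaden-style additive matrix, and one common dominance parameterisation (heterozygote indicator centred at its Hardy-Weinberg expectation 2pq). The first-order epistatic matrices are approximated by Hadamard (element-wise) products. These are illustrative choices, not necessarily the exact matrices used in the study, and `relationship_matrices` is a hypothetical helper.

```python
import numpy as np


def relationship_matrices(M: np.ndarray):
    """Additive (G), dominance (D) and first-order epistatic matrices.

    M : n x m genotype matrix coded 0/1/2 (copies of one allele).
    Assumes every marker is polymorphic in the sample.
    """
    p = M.mean(axis=0) / 2.0                     # allele frequency per marker
    q = 1.0 - p
    # Additive: centred genotypes scaled so G is on a pedigree-like scale.
    Z = M - 2.0 * p
    G = Z @ Z.T / (2.0 * np.sum(p * q))
    # Dominance: heterozygote indicator centred at 2pq (Hardy-Weinberg assumption).
    W = (M == 1).astype(float) - 2.0 * p * q
    D = W @ W.T / np.sum(2.0 * p * q * (1.0 - 2.0 * p * q))
    # Epistasis (A x A, D x D, A x D) via Hadamard products.
    return G, D, G * G, D * D, G * D


rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(30, 200)).astype(float)   # simulated genotypes
G, D, AA, DD, AD = relationship_matrices(M)
print("mean diagonal of G:", G.diagonal().mean())
```

Each matrix would then serve as the covariance structure of one random genetic effect in a G-BLUP model, which is what lets the variance be partitioned into additive and nonadditive parts.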
Combining neural networks and genetic algorithms for hydrological flow forecasting
NASA Astrophysics Data System (ADS)
Neruda, Roman; Srejber, Jan; Neruda, Martin; Pascenko, Petr
2010-05-01
We present a neural network approach to rainfall-runoff modeling for small river basins based on several time series of hourly measured data. Different neural networks are considered for short-term runoff predictions (from one to six hours lead time) based on runoff and rainfall data observed in previous time steps. Correlation analysis shows that runoff data, short-term rainfall history, and aggregated API values are the most significant data for the prediction. Neural models of multilayer perceptron and radial basis function networks with different numbers of units are used and compared with more traditional linear time series predictors. Out of a possible 48 hours of relevant history of all the input variables, the most important ones are selected by means of input filters created by a genetic algorithm. The genetic algorithm works with a population of binary-encoded vectors defining input selection patterns. Standard genetic operators of two-point crossover, random bit-flipping mutation, and tournament selection were used. Evaluating the objective function for each individual consists of several rounds of building and testing a particular neural network model. The whole procedure is rather computationally demanding (taking hours to days on a desktop PC); therefore, a high-performance mainframe computer was used for our experiments. Results based on two years' worth of data from the Ploucnice river in Northern Bohemia suggest that the main problems connected with this approach to modeling are overtraining, which can lead to poor generalization, and the relatively small number of extreme events, which makes it difficult for a model to predict the amplitude of the event. Thus, experiments with both absolute and relative runoff predictions were carried out. In general, it can be concluded that the neural models show about a 5 per cent improvement in terms of efficiency coefficient over linear models.
Multilayer perceptrons with one hidden layer, trained by the backpropagation algorithm and predicting relative runoff, show the best behavior so far. Utilizing the genetically evolved input filter improves performance by yet another 5 per cent. In the future we would like to continue with experiments in on-line prediction using real-time data from the Smeda River with a 6-hour lead-time forecast. Following operational reality, we will focus on classification of runoff into flood alert levels and reformulation of the time series prediction task as a classification problem. The main goal of all this work is to improve the flood warning system operated by the Czech Hydrometeorological Institute.
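The genetic-algorithm input selection described in this abstract (binary masks over 48 hours of history, two-point crossover, bit-flip mutation, tournament selection) can be sketched in a few dozen lines. In the study each fitness evaluation builds and tests a neural network; here that expensive step is replaced by a hypothetical toy fitness that rewards a few assumed-relevant lags and penalises extra inputs. Population size, generation count, mutation rate and the elitism step are assumptions, not the authors' settings.

```python
import random

# Hypothetical stand-in for "build and test a neural network on these inputs":
# reward a few assumed-relevant lagged inputs and penalise every extra input.
RELEVANT = {0, 1, 2, 5, 23}


def toy_fitness(mask):
    hits = sum(mask[i] for i in RELEVANT)
    return hits - 0.1 * (sum(mask) - hits)


def tournament(pop, scores, rng, k=3):
    """Return the fittest of k randomly drawn individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def two_point_crossover(a, b, rng):
    """Exchange the segment between two random cut points."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def bit_flip(mask, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]


def evolve_input_mask(fitness, n_bits=48, pop_size=30, generations=60, seed=0):
    """Evolve input-selection masks; returns (best mask, best score per generation)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [fitness(m) for m in pop]
        best_idx = max(range(pop_size), key=lambda i: scores[i])
        history.append(scores[best_idx])
        children = [pop[best_idx]]            # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            a, b = two_point_crossover(tournament(pop, scores, rng),
                                       tournament(pop, scores, rng), rng)
            children += [bit_flip(a, rng, 1 / n_bits), bit_flip(b, rng, 1 / n_bits)]
        pop = children[:pop_size]
    return max(pop, key=fitness), history


best_mask, history = evolve_input_mask(toy_fitness)
print("selected lags:", [i for i, bit in enumerate(best_mask) if bit])
```

Because the elite mask is copied unchanged into each new generation, the best score never decreases; with a real network-based fitness, the cost of each call to `fitness` dominates the run time, which is why such searches can take hours to days.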
Vrshek-Schallhorn, Suzanne; Stroud, Catherine B.; Mineka, Susan; Zinbarg, Richard E.; Adam, Emma K.; Redei, Eva E.; Hammen, Constance; Craske, Michelle G.
2016-01-01
Behavioral genetic research supports polygenic models of depression in which many genetic variations each contribute a small amount of risk, and prevailing diathesis-stress models suggest gene-environment interactions (GxE). Multilocus profile scores of additive risk offer an approach that is consistent with polygenic models of depression risk. In the first demonstration of this approach in a GxE model predicting depression, we created an additive multilocus profile score from five serotonin system polymorphisms (one each in the genes HTR1A, HTR2A, HTR2C, and two in TPH2). Analyses focused on two forms of interpersonal stress as environmental risk factors. Using five years of longitudinal diagnostic and life stress interviews from 387 emerging young adults in the Youth Emotion Project, survival analyses show that this multilocus profile score interacts with major interpersonal stressful life events to predict major depressive episode onsets (HR = 1.815, p = .007). Simultaneously, there was a significant protective effect of the profile score in the absence of a recent event (HR = 0.83, p = .030). The GxE effect with interpersonal chronic stress was not significant (HR = 1.15, p = .165). Finally, effect sizes for genetic factors examined while ignoring stress suggested that such an approach could lead to overlooking or misinterpreting genetic effects. Both the GxE effect and the protective simple main effect were replicated in a sample of early adolescent girls (N = 105). We discuss potential benefits of the multilocus genetic profile score approach and caveats for future research. PMID:26595467
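An additive multilocus profile score simply counts risk alleles across the selected polymorphisms, and a GxE term is the product of that score with the stress indicator. A minimal sketch, assuming genotypes already coded as 0/1/2 risk-allele counts (the study's exact risk-allele coding is not reproduced) and a binary indicator for a recent major interpersonal event. The resulting columns would enter a survival model alongside both main effects; `profile_score` and `gxe_design` are hypothetical helpers.

```python
import numpy as np


def profile_score(risk_allele_counts: np.ndarray) -> np.ndarray:
    """Additive profile score: sum of risk-allele counts (rows = people, cols = loci)."""
    return risk_allele_counts.sum(axis=1)


def gxe_design(score, stressor):
    """Design columns: centred score, stressor indicator, and their product (GxE)."""
    s = score - score.mean()
    e = np.asarray(stressor, dtype=float)
    return np.column_stack([s, e, s * e])


# Three hypothetical participants genotyped at five loci.
genotypes = np.array([[0, 1, 2, 1, 0],
                      [2, 2, 1, 0, 1],
                      [1, 0, 0, 0, 0]])
scores = profile_score(genotypes)
X = gxe_design(scores, stressor=[1, 0, 1])
print(scores)
print(X)
```

Centring the score before forming the product keeps the main effect of the score interpretable as its effect at the average genetic load.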
Genetic prediction of type 2 diabetes using deep neural network.
Kim, J; Kim, J; Kwak, M J; Bajaj, M
2018-04-01
Type 2 diabetes (T2DM) has strong heritability, but building genetic models that explain this heritability has been challenging. We tested a deep neural network (DNN) to predict T2DM using the nested case-control study of the Nurses' Health Study (3326 females, 45.6% T2DM) and the Health Professionals Follow-up Study (2502 males, 46.5% T2DM). We selected 96, 214, 399, and 678 single-nucleotide polymorphisms (SNPs) through Fisher's exact test and L1-penalized logistic regression. We randomly split each dataset 4:1 to train prediction models and test their performance. DNN and logistic regression showed a better area under the ROC curve (AUC) than the clinical model when 399 or more SNPs were included. DNN was superior to logistic regression in AUC with 399 or more SNPs in males and 678 SNPs in females. Addition of clinical factors consistently increased the AUC of DNN but failed to improve logistic regression with 214 or more SNPs. In conclusion, we show that DNN can be a versatile tool to predict T2DM incorporating large numbers of SNPs and clinical information. Limitations include a relatively small number of subjects, mostly of European ethnicity. Further studies are warranted to confirm and improve the performance of genetic prediction models using DNN in different ethnic groups. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
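The evaluation protocol above (a random 4:1 train/test split, then AUC on the held-out part) can be sketched independently of the model being tested. This minimal sketch assumes nothing about the DNN or the SNP selection; it computes the ROC AUC from ranks (the Mann-Whitney formulation), where ties receive average ranks. `split_4_to_1` and `roc_auc` are hypothetical helpers.

```python
import numpy as np
from scipy.stats import rankdata


def split_4_to_1(n, seed=0):
    """Random 80/20 split of n subjects into training and test indices."""
    idx = np.random.default_rng(seed).permutation(n)
    cut = int(round(0.8 * n))
    return idx[:cut], idx[cut:]


def roc_auc(y_true, score):
    """AUC = probability that a random case outscores a random control (ties count 1/2)."""
    y_true = np.asarray(y_true, dtype=bool)
    ranks = rankdata(score)                    # average ranks handle tied scores
    n_pos, n_neg = y_true.sum(), (~y_true).sum()
    return (ranks[y_true].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


train_idx, test_idx = split_4_to_1(3326)       # e.g. the female dataset size
print(len(train_idx), len(test_idx))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In practice the same held-out indices would be used for every model being compared (clinical, logistic regression, DNN), so that AUC differences reflect the models rather than the split.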
Genetic markers enhance coronary risk prediction in men: the MORGAM prospective cohorts.
Hughes, Maria F; Saarela, Olli; Stritzke, Jan; Kee, Frank; Silander, Kaisa; Klopp, Norman; Kontto, Jukka; Karvanen, Juha; Willenborg, Christina; Salomaa, Veikko; Virtamo, Jarmo; Amouyel, Phillippe; Arveiler, Dominique; Ferrières, Jean; Wiklund, Per-Gunner; Baumert, Jens; Thorand, Barbara; Diemert, Patrick; Trégouët, David-Alexandre; Hengstenberg, Christian; Peters, Annette; Evans, Alun; Koenig, Wolfgang; Erdmann, Jeanette; Samani, Nilesh J; Kuulasmaa, Kari; Schunkert, Heribert
2012-01-01
More accurate coronary heart disease (CHD) prediction, specifically in middle-aged men, is needed to reduce the burden of disease more effectively. We hypothesised that a multilocus genetic risk score could refine CHD prediction beyond classic risk scores and obtain more precise risk estimates using a prospective cohort design. Using data from nine prospective European cohorts, including 26,221 men, we selected in a case-cohort setting 4,818 healthy men at baseline, and used Cox proportional hazards models to examine associations between CHD and risk scores based on genetic variants representing 13 genomic regions. Over follow-up (range: 5-18 years), 1,736 incident CHD events occurred. Genetic risk scores were validated in men with at least 10 years of follow-up (632 cases, 1361 non-cases). Genetic risk score 1 (GRS1) combined 11 SNPs and two haplotypes, with effect estimates from previous genome-wide association studies. GRS2 combined 11 SNPs plus 4 SNPs from the haplotypes, with coefficients estimated from these prospective cohorts using 10-fold cross-validation. Scores were added to a model adjusted for classic risk factors comprising the Framingham risk score, and 10-year risks were derived. Both scores improved net reclassification (NRI) over the Framingham score (7.5%, p = 0.017 for GRS1; 6.5%, p = 0.044 for GRS2), but GRS2 also improved discrimination (c-index improvement 1.11%, p = 0.048). In a subgroup analysis of men aged 50-59 (436 cases, 603 non-cases), net reclassification improved for GRS1 (13.8%) and GRS2 (12.5%). Net reclassification improvement remained significant for both scores when family history of CHD was added to the baseline model for this male subgroup, improving prediction of early-onset CHD events. Genetic risk scores add precision to risk estimates for CHD and improve prediction beyond classic risk factors, particularly for middle-aged men.
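The net reclassification improvement reported above compares risk categories before and after a score is added. A minimal sketch of the categorical NRI follows, assuming fixed risk-category cut points (the abstract does not state the 10-year risk thresholds, so the defaults below are placeholders) and ignoring censoring and case-cohort weighting, which a real analysis of these cohorts would need to handle. `categorical_nri` is a hypothetical helper.

```python
import numpy as np


def categorical_nri(risk_old, risk_new, event, cutoffs=(0.05, 0.10, 0.20)):
    """NRI = [P(up | event) - P(down | event)] + [P(down | non-event) - P(up | non-event)].

    Returns (total NRI, event component, non-event component).
    """
    cat_old = np.digitize(risk_old, cutoffs)   # category index under the old model
    cat_new = np.digitize(risk_new, cutoffs)   # category index under the new model
    event = np.asarray(event, dtype=bool)
    up, down = cat_new > cat_old, cat_new < cat_old
    nri_events = up[event].mean() - down[event].mean()
    nri_nonevents = down[~event].mean() - up[~event].mean()
    return nri_events + nri_nonevents, nri_events, nri_nonevents


# Hypothetical 10-year risks for three cases and three non-cases.
risk_old = [0.04, 0.08, 0.15, 0.12, 0.06, 0.25]
risk_new = [0.07, 0.12, 0.15, 0.08, 0.03, 0.18]
event = [1, 1, 1, 0, 0, 0]
print(categorical_nri(risk_old, risk_new, event))
```

Moving a case into a higher category, or a non-case into a lower one, counts in the model's favour; the reverse moves count against it.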
Toffanin, V; Penasa, M; McParland, S; Berry, D P; Cassandro, M; De Marchi, M
2015-05-01
The aim of the present study was to estimate genetic parameters for calcium (Ca), phosphorus (P) and titratable acidity (TA) in bovine milk predicted by mid-infrared spectroscopy (MIRS). Data consisted of 2458 Italian Holstein-Friesian cows, each sampled once, on 220 farms. Information per sample on protein and fat percentage, pH and somatic cell count, as well as test-day milk yield, was also available. (Co)variance components were estimated using univariate and bivariate animal linear mixed models. Fixed effects considered in the analyses were herd of sampling, parity, lactation stage and a two-way interaction between parity and lactation stage; an additive genetic and a residual term were included in the models as random effects. Estimates of heritability for Ca, P and TA were 0.10, 0.12 and 0.26, respectively. Moderate to strong positive phenotypic correlations (0.33 to 0.82) existed between Ca, P and TA, whereas weak to moderate phenotypic correlations (0.00 to 0.45) existed between these traits and both milk quality and yield. Moderate to strong genetic correlations (0.28 to 0.92) existed between Ca, P and TA, and between these predicted traits and both fat and protein percentage (0.35 to 0.91). The existence of heritable genetic variation for Ca, P and TA, coupled with the potential to predict these components in routine cow milk testing, implies that genetic gain in these traits is indeed possible.
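With only an additive genetic and a residual random effect in the animal model, heritability and genetic correlations follow directly from the estimated (co)variance components. A minimal sketch with hypothetical variance components (not the study's estimates), assuming phenotypic variance is the sum of the additive and residual variances after fixed effects are accounted for:

```python
import math


def heritability(var_additive: float, var_residual: float) -> float:
    """h^2 = sigma_a^2 / (sigma_a^2 + sigma_e^2) for a model with only these random effects."""
    return var_additive / (var_additive + var_residual)


def genetic_correlation(cov_a12: float, var_a1: float, var_a2: float) -> float:
    """r_g = cov_a(1, 2) / sqrt(sigma_a1^2 * sigma_a2^2), from a bivariate model."""
    return cov_a12 / math.sqrt(var_a1 * var_a2)


# Hypothetical components: additive 0.26 and residual 0.74 (arbitrary units).
print(heritability(0.26, 0.74))
# Hypothetical bivariate components for two traits.
print(genetic_correlation(0.03, 0.04, 0.09))
```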
Caraviello, D Z; Weigel, K A; Gianola, D
2004-05-01
Predicted transmitting abilities (PTA) of US Jersey sires for daughter longevity were calculated using a Weibull proportional hazards sire model and compared with predictions from a conventional linear animal model. Culling data from 268,008 Jersey cows with first calving from 1981 to 2000 were used. The proportional hazards model included time-dependent effects of herd-year-season contemporary group and parity by stage of lactation interaction, as well as time-independent effects of sire and age at first calving. Sire variances and parameters of the Weibull distribution were estimated, providing heritability estimates of 4.7% on the log scale and 18.0% on the original scale. The PTA of each sire was expressed as the expected risk of culling relative to daughters of an average sire. Risk ratios (RR) ranged from 0.7 to 1.3, indicating that the risk of culling for daughters of the best sires was 30% lower than for daughters of average sires and nearly 50% lower than for daughters of the poorest sires. Sire PTA from the proportional hazards model were compared with PTA from a linear model similar to that used for routine national genetic evaluation of length of productive life (PL) using cross-validation in independent samples of herds. Models were compared using logistic regression of daughters' stayability to second, third, fourth, or fifth lactation on their sires' PTA values, with alternative approaches for weighting the contribution of each sire. Models were also compared using logistic regression of daughters' stayability to 36, 48, 60, 72, and 84 mo of life. The proportional hazards model generally yielded more accurate predictions according to these criteria, but differences in predictive ability between methods were smaller when using a Kullback-Leibler distance than with other approaches. Results of this study suggest that survival analysis methodology may provide more accurate predictions of genetic merit for longevity than conventional linear models.
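In a proportional hazards model a sire's effect multiplies the baseline hazard, so sires are compared through risk ratios that stay constant over time. The sketch below assumes a Weibull baseline hazard with hypothetical parameters `lam` and `rho` (not the study's estimates) and also shows how the reported RR range of 0.7 to 1.3 yields the 30% and "nearly 50%" figures.

```python
import math


def weibull_hazard(t, lam, rho, log_rr=0.0):
    """Weibull proportional hazard: h(t) = lam * rho * (lam * t)**(rho - 1) * exp(log_rr)."""
    return lam * rho * (lam * t) ** (rho - 1) * math.exp(log_rr)


# Risk ratios relative to daughters of an average sire (range reported above).
best, average, worst = 0.7, 1.0, 1.3
print("best vs average:", 1 - best / average)   # about 0.30, i.e. 30% lower risk
print("best vs worst:", 1 - best / worst)       # about 0.46, i.e. "nearly 50%" lower

# The ratio of hazards for two sires does not depend on time t.
for t in (10.0, 100.0):
    ratio = weibull_hazard(t, 0.01, 1.5, math.log(best)) / weibull_hazard(t, 0.01, 1.5)
    print(t, ratio)
```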
A Pareto-optimal moving average multigene genetic programming model for daily streamflow prediction
NASA Astrophysics Data System (ADS)
Danandeh Mehr, Ali; Kahya, Ercan
2017-06-01
Genetic programming (GP) is able to systematically explore alternative model structures of different accuracy and complexity from observed input and output data. The effectiveness of GP in hydrological system identification has been recognized in recent studies. However, selecting a parsimonious (accurate and simple) model from such alternatives remains an open question. This paper proposes a Pareto-optimal moving average multigene genetic programming (MA-MGGP) approach to develop a parsimonious model for single-station streamflow prediction. The three main components of the approach that take us from observed data to a validated model are: (1) data pre-processing, (2) system identification and (3) system simplification. The data pre-processing ingredient uses a simple moving average filter to diminish the lagged prediction effect of stand-alone data-driven models. The multigene ingredient of the model tends to identify the underlying nonlinear system with expressions simpler than classical monolithic GP, and, finally, the simplification component exploits a Pareto front plot to select a parsimonious model through an interactive complexity-efficiency trade-off. The approach was tested using the daily streamflow records from a station on Senoz Stream, Turkey. Compared with the efficiency results of stand-alone GP, MGGP, and conventional multiple linear regression prediction models as benchmarks, the proposed Pareto-optimal MA-MGGP model yielded a parsimonious solution, which is of notable importance for practical application. In addition, the approach allows the user to bring human insight into the problem to examine evolved models and pick out the best-performing programs for further analysis.
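Two of the three components above, moving-average pre-processing and Pareto-front model selection, are easy to illustrate in isolation. A minimal sketch, assuming a plain window average over the input series and candidate models summarised by hypothetical (name, complexity, error) tuples, for instance node count and RMSE; the multigene GP search itself is not reproduced.

```python
import numpy as np


def moving_average(x, window=3):
    """Average over each full window of `window` consecutive points."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")


def pareto_front(models):
    """Keep (name, complexity, error) tuples not dominated by another model.

    A model is dominated if another is no worse on both criteria and strictly
    better on at least one. The front is returned sorted by complexity.
    """
    front = []
    for name, c, e in models:
        dominated = any(c2 <= c and e2 <= e and (c2 < c or e2 < e)
                        for _, c2, e2 in models)
        if not dominated:
            front.append((name, c, e))
    return sorted(front, key=lambda m: m[1])


print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3))
candidates = [("m1", 5, 0.30), ("m2", 12, 0.22), ("m3", 20, 0.21),
              ("m4", 15, 0.25), ("m5", 40, 0.20)]
print(pareto_front(candidates))
```

The user then picks a model from the front by judging whether the error reduction of a more complex expression is worth its extra terms, which is the interactive trade-off the abstract describes.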
Phuong, H N; Martin, O; de Boer, I J M; Ingvartsen, K L; Schmidely, Ph; Friggens, N C
2015-01-01
This study explored the ability of an existing lifetime nutrient partitioning model to simulate individual variability in the genetic potentials of dairy cows. Generally, the model assumes a universal trajectory of dynamic partitioning of priority between life functions, and genetic scaling parameters are then incorporated to simulate individual differences in performance. Data from 102 cows, comprising 180 lactations of 3 breeds (Danish Red, Danish Holstein, and Jersey), which were completely independent of those used previously for model development, were used. Individual cow performance records through sequential lactations were used to derive genetic scaling parameters for each animal by calibrating the model to achieve the best fit, cow by cow. The model was able to fit individual curves of body weight and of milk fat, milk protein, and milk lactose concentrations with a high degree of accuracy. Daily milk yield and dry matter intake were satisfactorily predicted in early and mid lactation, but underpredictions were found in late lactation. Breeds and parities did not significantly affect the prediction accuracy. The means of genetic scaling parameters for Danish Red and Danish Holstein were similar but significantly different from those of Jersey. The extent of correlations between the genetic scaling parameters was consistent with that reported in the literature. In conclusion, this model is of value as a tool to derive estimates of genetic potentials of milk yield, milk composition, body reserve usage, and growth for different genotypes of cow. Moreover, it can be used to separate genetic variability in performance between individual cows from environmental noise. The model enables simulation of the effects of a genetic selection strategy on the lifetime efficiency of individual cows, with the main advantage of including rearing costs, and thus can be used to explore the impact of future selection on animal performance and efficiency.
Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Gamal El-Dien, Omnia; Ratcliffe, Blaise; Klápště, Jaroslav; Chen, Charles; Porth, Ilga; El-Kassaby, Yousry A
2015-05-09
Genomic selection (GS) in forestry can substantially reduce the length of the breeding cycle and increase gain per unit time through early selection and greater selection intensity, particularly for traits of low heritability and late expression. Affordable next-generation sequencing technologies have made it possible to genotype large numbers of trees at a reasonable cost. Genotyping-by-sequencing was used to genotype 1,126 Interior spruce trees representing 25 open-pollinated families planted over three sites in British Columbia, Canada. Four imputation algorithms were compared: mean value (MI), singular value decomposition (SVD), expectation maximization (EM), and a newly derived, family-based k-nearest neighbor method (kNN-Fam). Trees were phenotyped for several yield and wood attributes. Single- and multi-site GS prediction models were developed using the Ridge Regression Best Linear Unbiased Predictor (RR-BLUP) and Generalized Ridge Regression (GRR) to test different assumptions about trait architecture. Finally, multi-trait GS prediction models were developed using PCA. The EM and kNN-Fam imputation methods were superior for 30% and 60% missing data, respectively. The RR-BLUP GS prediction model produced better accuracies than GRR, indicating that the genetic architecture for these traits is complex. GS prediction accuracies for multi-site models were high and better than those of single-site models, while cross-site predictability produced the lowest accuracies, reflecting type-b genetic correlations, and was deemed unreliable. The incorporation of genomic information in quantitative genetic analyses produced more realistic heritability estimates, as the half-sib pedigree tended to inflate the additive genetic variance and, consequently, both heritability and gain estimates. Principal component scores as representatives of multi-trait GS prediction models produced surprising results, where negatively correlated traits could be concurrently selected for using PCA2 and PCA3.
The application of GS to open-pollinated family testing, the simplest form of tree improvement evaluation methods, proved to be effective. Prediction accuracies obtained for all traits strongly support the integration of GS in tree breeding. While the within-site GS prediction accuracies were high, the results clearly indicate that the ability of single-site GS models to predict other sites is unreliable, supporting the use of a multi-site approach. Principal component scores provided an opportunity for the concurrent selection of traits with different phenotypic optima.
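The two simplest kinds of genotype imputation compared in this study, mean-value and nearest-neighbour imputation, can be sketched as follows. This is a plain kNN over all trees, not the family-based kNN-Fam variant the authors derived; missing calls are assumed to be `NaN` in a trees x markers matrix coded 0/1/2, and `mean_impute` and `knn_impute` are hypothetical helpers.

```python
import numpy as np


def mean_impute(M):
    """Replace each missing call (NaN) with that marker's mean over observed trees."""
    M = np.array(M, dtype=float)
    col_means = np.nanmean(M, axis=0)
    rows, cols = np.where(np.isnan(M))
    M[rows, cols] = col_means[cols]
    return M


def knn_impute(M, k=3):
    """Fill each missing call with the mean of the k nearest trees that observed it.

    Distance is the mean absolute genotype difference over markers observed in
    both trees. Falls back to the marker mean if no donor is available.
    """
    M = np.array(M, dtype=float)
    out = M.copy()
    observed = ~np.isnan(M)
    n = M.shape[0]
    for i in range(n):
        missing = np.flatnonzero(~observed[i])
        if missing.size == 0:
            continue
        dists = np.full(n, np.inf)
        for j in range(n):
            shared = observed[i] & observed[j]
            if j != i and shared.any():
                dists[j] = np.abs(M[i, shared] - M[j, shared]).mean()
        order = np.argsort(dists)
        for m in missing:
            donors = [j for j in order if np.isfinite(dists[j]) and observed[j, m]][:k]
            out[i, m] = M[donors, m].mean() if donors else np.nanmean(M[:, m])
    return out


G = np.array([[0, 1, 2, np.nan],
              [0, 1, 2, 2],
              [0, 1, 2, 2],
              [2, 1, 0, 0]])
print(mean_impute(G)[0, 3], knn_impute(G, k=2)[0, 3])
```

Mean imputation ignores relatedness entirely, whereas the nearest-neighbour version borrows calls from genetically similar trees; restricting donors to the same family, as kNN-Fam does, is a natural refinement for open-pollinated families.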
Improving coeliac disease risk prediction by testing non-HLA variants additional to HLA variants.
Romanos, Jihane; Rosén, Anna; Kumar, Vinod; Trynka, Gosia; Franke, Lude; Szperl, Agata; Gutierrez-Achury, Javier; van Diemen, Cleo C; Kanninga, Roan; Jankipersadsing, Soesma A; Steck, Andrea; Eisenbarth, Georges; van Heel, David A; Cukrowska, Bozena; Bruno, Valentina; Mazzilli, Maria Cristina; Núñez, Concepcion; Bilbao, Jose Ramon; Mearin, M Luisa; Barisani, Donatella; Rewers, Marian; Norris, Jill M; Ivarsson, Anneli; Boezen, H Marieke; Liu, Edwin; Wijmenga, Cisca
2014-03-01
The majority of coeliac disease (CD) patients are not properly diagnosed and therefore remain untreated, leading to a greater risk of developing CD-associated complications. The major genetic risk heterodimers, HLA-DQ2 and HLA-DQ8, are already used clinically to help exclude disease. However, approximately 40% of the population carry these alleles, and the majority never develop CD. We explored whether CD risk prediction can be improved by adding non-HLA susceptibility variants to common HLA testing. We developed an average weighted genetic risk score with 10, 26 and 57 single nucleotide polymorphisms (SNPs) in 2675 cases and 2815 controls and assessed the improvement in risk prediction provided by the non-HLA SNPs. Moreover, we assessed the transferability of the genetic risk model with 26 non-HLA variants to a nested case-control population (n=1709) and a prospective cohort (n=1245) and then tested how well this model predicted CD outcome for 985 independent individuals. Adding 57 non-HLA variants to HLA testing showed a statistically significant improvement compared with scores from models based on HLA only, HLA plus 10 SNPs and HLA plus 26 SNPs. With 57 non-HLA variants, the area under the receiver operating characteristic curve reached 0.854 compared with 0.823 for HLA only, and 11.1% of individuals were reclassified to a more accurate risk group. We show that the risk model with HLA plus 26 SNPs is useful in independent populations. Predicting risk with 57 additional non-HLA variants improved the identification of potential CD patients. This demonstrates a possible role for combined HLA and non-HLA genetic testing in diagnostic work for CD.
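A weighted genetic risk score typically sums each person's risk-allele counts weighted by the log odds ratio of each SNP. A minimal sketch, assuming 0/1/2 risk-allele coding and hypothetical odds ratios; dividing by the number of SNPs is one possible reading of an "average" weighted score (the abstract does not give the exact formula), and the HLA component is not modelled. `weighted_grs` is a hypothetical helper.

```python
import numpy as np


def weighted_grs(dosages, odds_ratios, average=True):
    """Sum over SNPs of risk-allele count x log(OR), optionally averaged per SNP."""
    weights = np.log(np.asarray(odds_ratios, dtype=float))
    score = np.asarray(dosages, dtype=float) @ weights
    return score / len(weights) if average else score


# Two hypothetical individuals genotyped at three non-HLA SNPs.
dosages = [[0, 1, 2],
           [2, 2, 2]]
odds_ratios = [1.2, 1.5, 1.1]
print(weighted_grs(dosages, odds_ratios))
```

Scores like this would then be combined with the HLA risk category in a model whose discrimination is summarised by the area under the ROC curve.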
Ning, Kaida; Chen, Bo; Sun, Fengzhu; Hobel, Zachary; Zhao, Lu; Matloff, Will; Toga, Arthur W
2018-08-01
A long-standing question is how to best use brain morphometric and genetic data to distinguish Alzheimer's disease (AD) patients from cognitively normal (CN) subjects and to predict those who will progress from mild cognitive impairment (MCI) to AD. Here, we use a neural network (NN) framework on both magnetic resonance imaging-derived quantitative structural brain measures and genetic data to address this question. We tested the effectiveness of NN models in classifying and predicting AD. We further performed a novel analysis of the NN model to gain insight into the most predictive imaging and genetics features and to identify possible interactions between features that affect AD risk. Data were obtained from the AD Neuroimaging Initiative cohort and included baseline structural MRI data and single nucleotide polymorphism (SNP) data for 138 AD patients, 225 CN subjects, and 358 MCI patients. We found that NN models with both brain and SNP features as predictors perform significantly better than models with either alone in classifying AD and CN subjects, with an area under the receiver operating characteristic curve (AUC) of 0.992, and in predicting the progression from MCI to AD (AUC=0.835). The most important predictors in the NN model were the left middle temporal gyrus volume, the left hippocampus volume, the right entorhinal cortex volume, and the APOE (a gene that encodes apolipoprotein E) ɛ4 risk allele. Furthermore, we identified interactions between the right parahippocampal gyrus and the right lateral occipital gyrus, the right banks of the superior temporal sulcus and the left posterior cingulate, and SNP rs10838725 and the left lateral occipital gyrus. Our work shows the ability of NN models to not only classify and predict AD occurrence but also to identify important AD risk factors and interactions among them. Copyright © 2018 Elsevier Inc. All rights reserved.
Movement behavior explains genetic differentiation in American black bears
Samuel A Cushman; Jesse S. Lewis
2010-01-01
Individual-based landscape genetic analyses provide empirically based models of gene flow. It would be valuable to verify the predictions of these models using independent data of a different type. Analyses using different data sources that produce consistent results provide strong support for the generality of the findings. Mating and dispersal movements are the...
Ogungbenro, Kayode; Aarons, Leon
2015-01-01
Aims: To extend the physiologically based pharmacokinetic (PBPK) model developed for 6-mercaptopurine to account for intracellular metabolism and to explore the role of genetic polymorphism in the TPMT enzyme on the pharmacokinetics of 6-mercaptopurine. Methods: The developed PBPK model was extended for 6-mercaptopurine to account for intracellular metabolism and genetic polymorphism in TPMT activity. System and drug specific parameters were obtained from the literature or estimated using plasma or intracellular red blood cell concentrations of 6-mercaptopurine and its metabolites. Age-dependent changes in parameters were implemented for scaling, and variability was also introduced for simulation. The model was validated using published data. Results: The model was extended successfully. Parameter estimation and model predictions were satisfactory. Prediction of intracellular red blood cell concentrations of 6-thioguanine nucleotide for different TPMT phenotypes (in a clinical study that compared conventional and individualized dosing) showed results that were consistent with observed values and reported incidence of haematopoietic toxicity. Following conventional dosing, the predicted mean concentrations for homozygous and heterozygous variants, respectively, were about 10 times and two times the levels for wild-type. However, following individualized dosing, the mean concentration was around the same level for the three phenotypes despite different doses. Conclusions: The developed PBPK model has been extended for 6-mercaptopurine and can be used to predict plasma 6-mercaptopurine and tissue concentration of 6-mercaptopurine, 6-thioguanine nucleotide and 6-methylmercaptopurine ribonucleotide in adults and children. Predictions of reported data from clinical studies showed satisfactory results. The model may help to improve 6-mercaptopurine dosing, achieve better clinical outcome and reduce toxicity. PMID:25614061
Buzzetti, R; Prudente, S; Copetti, M; Dauriz, M; Zampetti, S; Garofolo, M; Penno, G; Trischitta, V
2017-02-01
We are currently facing several attempts at marketing genetic data for predicting multifactorial diseases, among which diabetes mellitus is one of the most prevalent. The present document primarily aims to provide practicing physicians with a summary of available data regarding the role of genetic information in predicting diabetes and its chronic complications. First, general information about the characteristics and performance of risk prediction tools will be presented to help clinicians become acquainted with basic methodological information related to the subject at issue. Then, as far as type 1 diabetes is concerned, available data indicate that genetic information and counseling may be useful only in families with many affected individuals. However, since no disease prevention is possible, the utility of predicting this form of diabetes is questionable. In the case of type 2 diabetes, available data seriously question the utility of adding genetic information on top of well-performing, easily available and inexpensive non-genetic markers. Finally, the possibility of using the few available genetic data on diabetic complications to improve our ability to predict them will also be presented and discussed. For cardiovascular complications, the addition of genetic information to models based on clinical features does not translate into a substantial improvement in risk discrimination. For all other diabetic complications, genetic information is currently very limited and cannot, therefore, be used to improve risk stratification. Overall, genetic testing for predicting diabetes and its chronic complications is currently of little value in clinical practice. Copyright © 2016 The Italian Society of Diabetology, the Italian Society for the Study of Atherosclerosis, the Italian Society of Human Nutrition, and the Department of Clinical Medicine and Surgery, Federico II University. Published by Elsevier B.V. All rights reserved.
Levine, Rebecca S; Peterson, A Townsend; Benedict, Mark Q
2004-02-01
The distribution of the Anopheles gambiae complex of malaria vectors in Africa is uncertain due to under-sampling of vast regions. We use ecologic niche modeling to predict the potential distribution of three members of the complex (A. gambiae, A. arabiensis, and A. quadriannulatus) and demonstrate the statistical significance of the models. Predictions correspond well to previous estimates, but provide detail regarding spatial discontinuities in the distribution of A. gambiae s.s. that are consistent with population genetic studies. Our predictions also identify large areas of Africa where the presence of A. arabiensis is predicted, but few specimens have been obtained, suggesting under-sampling of the species. Finally, we project models developed from African distribution data for the late 1900s into the past and to South America to determine retrospectively whether the deadly 1929 introduction of A. gambiae sensu lato into Brazil was more likely that of A. gambiae sensu stricto or A. arabiensis.
Genome-Wide Association Analysis of Adaptation Using Environmentally Predicted Traits
van Zanten, Martijn
2015-01-01
Current methods for studying the genetic basis of adaptation evaluate genetic associations with ecologically relevant traits or single environmental variables, under the implicit assumption that natural selection imposes correlations between phenotypes, environments and genotypes. In practice, observed trait and environmental data are manifestations of unknown selective forces and are only indirectly associated with adaptive genetic variation. In theory, improved estimation of these forces could enable more powerful detection of loci under selection. Here we present an approach in which we approximate adaptive variation by modeling phenotypes as a function of the environment and using the predicted trait in multivariate and univariate genome-wide association analysis (GWAS). Based on computer simulations and published flowering time data from the model plant Arabidopsis thaliana, we find that environmentally predicted traits lead to higher recovery of functional loci in multivariate GWAS and are more strongly correlated with allele frequencies at adaptive loci than individual environmental variables are. Our results provide an example of the use of environmental data to obtain independent and meaningful information on adaptive genetic variation. PMID:26496492
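A minimal sketch of the two-step idea in this abstract is shown below. It rests on our own assumptions, not on the authors' model: the phenotype is predicted from environmental covariates by ordinary least squares, and a per-SNP Pearson correlation stands in for the univariate association test. Only the univariate case is shown. All data are simulated placeholders, and a real GWAS would also account for population structure and kinship.

```python
import numpy as np


def environmentally_predicted_trait(phenotype, env):
    """Fit phenotype ~ intercept + environmental covariates by OLS and return
    the fitted values, used here as the 'environmentally predicted trait'."""
    X = np.column_stack([np.ones(len(phenotype)), env])
    beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    return X @ beta


def univariate_scan(trait, genotypes):
    """Pearson correlation between the trait and each SNP column (0/1/2 coding).
    Stand-in for a per-SNP association test; ignores kinship and structure."""
    t = trait - trait.mean()
    g = genotypes - genotypes.mean(axis=0)
    return (g.T @ t) / np.sqrt((g ** 2).sum(axis=0) * (t ** 2).sum())


# Simulated example: 200 accessions, 3 hypothetical climate variables, 50 SNPs
rng = np.random.default_rng(1)
n, n_snp = 200, 50
env = rng.normal(size=(n, 3))
genotypes = rng.integers(0, 3, size=(n, n_snp)).astype(float)
phenotype = env @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=0.5, size=n)

predicted = environmentally_predicted_trait(phenotype, env)
r = univariate_scan(predicted, genotypes)
top_snps = np.argsort(-np.abs(r))[:5]   # SNPs most correlated with the predicted trait
```

The predicted trait is, by construction, the part of the phenotype that the environment explains. That is why the sketch scans it, rather than the raw phenotype, against genotypes.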
Predictive accuracy of combined genetic and environmental risk scores.
Dudbridge, Frank; Pashayan, Nora; Yang, Jian
2018-02-01
The substantial heritability of most complex diseases suggests that genetic data could provide useful risk prediction. To date, the performance of genetic risk scores has fallen short of the potential implied by heritability, but this can be explained by insufficient sample sizes for estimating highly polygenic models. When risk predictors based on environment or lifestyle already exist, two key questions are to what extent they can be improved by adding genetic information, and what the ultimate potential of combined genetic and environmental risk scores is. Here, we extend previous work on the predictive accuracy of polygenic scores to allow for an environmental score that may be correlated with the polygenic score, for example when the environmental factors mediate the genetic risk. We derive common measures of predictive accuracy and improvement as functions of the training sample size, chip heritabilities of disease and environmental score, and genetic correlation between disease and environmental risk factors. We consider simple addition of the two scores and a weighted sum that accounts for their correlation. Using examples from studies of cardiovascular disease and breast cancer, we show that improvements in discrimination are generally small but reasonable degrees of reclassification could be obtained with current sample sizes. Correlation between genetic and environmental scores has only minor effects on numerical results in realistic scenarios. In the longer term, as the accuracy of polygenic scores improves they will come to dominate the predictive accuracy compared to environmental scores. © 2017 WILEY PERIODICALS, INC.
Predictive accuracy of combined genetic and environmental risk scores
Pashayan, Nora; Yang, Jian
2017-01-01
The substantial heritability of most complex diseases suggests that genetic data could provide useful risk prediction. To date, the performance of genetic risk scores has fallen short of the potential implied by heritability, but this can be explained by insufficient sample sizes for estimating highly polygenic models. When risk predictors based on environment or lifestyle already exist, two key questions are to what extent they can be improved by adding genetic information, and what the ultimate potential of combined genetic and environmental risk scores is. Here, we extend previous work on the predictive accuracy of polygenic scores to allow for an environmental score that may be correlated with the polygenic score, for example when the environmental factors mediate the genetic risk. We derive common measures of predictive accuracy and improvement as functions of the training sample size, chip heritabilities of disease and environmental score, and genetic correlation between disease and environmental risk factors. We consider simple addition of the two scores and a weighted sum that accounts for their correlation. Using examples from studies of cardiovascular disease and breast cancer, we show that improvements in discrimination are generally small but reasonable degrees of reclassification could be obtained with current sample sizes. Correlation between genetic and environmental scores has only minor effects on numerical results in realistic scenarios. In the longer term, as the accuracy of polygenic scores improves they will come to dominate the predictive accuracy compared to environmental scores. PMID:29178508
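The difference between simple addition of the two scores and a correlation-aware weighted sum can be illustrated with standard multiple-regression algebra. This is a sketch, not the authors' derivation, which expresses accuracy as a function of training sample size and chip heritability. It assumes that both scores are standardized and that accuracy is measured as R² against a standardized continuous outcome such as liability. The input correlations are illustrative, not values from the paper.

```python
import math


def sum_score_r2(r_g, r_e, rho):
    """R^2 of the unweighted sum S_g + S_e of two standardized scores.

    r_g, r_e: correlation of each score with the outcome Y (variance 1);
    rho: correlation between the scores.
    Var(S_g + S_e) = 2 + 2*rho and Cov(S_g + S_e, Y) = r_g + r_e.
    """
    r = (r_g + r_e) / math.sqrt(2.0 + 2.0 * rho)
    return r * r


def weighted_score(r_g, r_e, rho):
    """Least-squares weights (b_g, b_e) and R^2 of the best linear combination."""
    det = 1.0 - rho * rho
    if det <= 0.0:
        raise ValueError("the two scores must not be perfectly correlated")
    b_g = (r_g - rho * r_e) / det
    b_e = (r_e - rho * r_g) / det
    r2 = (r_g ** 2 + r_e ** 2 - 2.0 * rho * r_g * r_e) / det
    return b_g, b_e, r2


# Illustrative (hypothetical) inputs: polygenic score r = 0.3, environmental score r = 0.4
for rho in (0.0, 0.2, 0.5):
    _, _, r2_w = weighted_score(0.3, 0.4, rho)
    print(f"rho={rho:.1f}  sum R^2={sum_score_r2(0.3, 0.4, rho):.4f}  weighted R^2={r2_w:.4f}")
```

The weighted sum is the least-squares optimum, so it can never do worse than simple addition. When rho = 0, the two coincide only if r_g equals r_e.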
Integrating environmental and genetic effects to predict responses of tree populations to climate.
Wang, Tongli; O'Neill, Gregory A; Aitken, Sally N
2010-01-01
Climate is a major environmental factor affecting the phenotype of trees and is also a critical agent of natural selection that has molded among-population genetic variation. Population response functions describe the environmental effect of planting site climates on the performance of a single population, whereas transfer functions describe among-population genetic variation molded by natural selection for climate. Although these approaches are widely used to predict the responses of trees to climate change, both have limitations. We present a novel approach that integrates both genetic and environmental effects into a single "universal response function" (URF) to better predict the influence of climate on phenotypes. Using a large lodgepole pine (Pinus contorta Dougl. ex Loud.) field transplant experiment composed of 140 populations planted on 62 sites to demonstrate the methodology, we show that the URF makes full use of data from provenance trials to: (1) improve predictions of climate change impacts on phenotypes; (2) reduce the size and cost of future provenance trials without compromising predictive power; (3) more fully exploit existing, less comprehensive provenance tests; (4) quantify and compare environmental and genetic effects of climate on population performance; and (5) predict the performance of any population growing in any climate. Finally, we discuss how the last attribute allows the URF to be used as a mechanistic model to predict population and species ranges for the future and to guide assisted migration of seed for reforestation, restoration, or afforestation and genetic conservation in a changing climate.
Modelling the effect of structural QSAR parameters on skin penetration using genetic programming
NASA Astrophysics Data System (ADS)
Chung, K. K.; Do, D. Q.
2010-09-01
To model relationships between chemical structures and biological effects in quantitative structure-activity relationship (QSAR) data, an alternative artificial intelligence technique, genetic programming (GP), was investigated and compared with the traditional statistical method. GP, whose primary advantage is that it generates mathematical equations, was employed to model QSAR data and to identify the most important molecular descriptors in the data. The models predicted by GP agreed with the statistical results, and the most predictive GP models were significantly better than the statistical models, as assessed by ANOVA. Artificial intelligence techniques have recently been applied widely to analyse QSAR data. With its capability of generating mathematical equations, GP can be considered an effective and efficient method for modelling QSAR data.
Genomic Prediction Accounting for Residual Heteroskedasticity.
Ou, Zhining; Tempelman, Robert J; Steibel, Juan P; Ernst, Catherine W; Bates, Ronald O; Bello, Nora M
2015-11-12
Whole-genome prediction (WGP) models that use single-nucleotide polymorphism marker information to predict genetic merit of animals and plants typically assume homogeneous residual variance. However, variability is often heterogeneous across agricultural production systems and may subsequently bias WGP-based inferences. This study extends classical WGP models based on normality, heavy-tailed specifications and variable selection to explicitly account for environmentally driven residual heteroskedasticity under a hierarchical Bayesian mixed-models framework. WGP models assuming homogeneous or heterogeneous residual variances were fitted to training data generated under simulation scenarios reflecting a gradient of increasing heteroskedasticity. Model fit was based on pseudo-Bayes factors and also on prediction accuracy of genomic breeding values computed on a validation data subset one generation removed from the simulated training dataset. Homogeneous vs. heterogeneous residual variance WGP models were also fitted to two quantitative traits, namely 45-min postmortem carcass temperature and loin muscle pH, recorded in a swine resource population dataset prescreened for high and mild residual heteroskedasticity, respectively. Fit of competing WGP models was compared using pseudo-Bayes factors. Predictive ability, defined as the correlation between predicted and observed phenotypes in validation sets of a five-fold cross-validation, was also computed. Heteroskedastic error WGP models showed improved model fit and enhanced prediction accuracy compared with homoskedastic error WGP models, although the magnitude of the improvement was small (less than two percentage points net gain in prediction accuracy). Nevertheless, accounting for residual heteroskedasticity did improve accuracy of selection, especially for individuals of extreme genetic merit. Copyright © 2016 Ou et al.
Hill, William G
2014-01-01
Although animal breeding was practiced long before the science of genetics and the relevant disciplines of population and quantitative genetics were known, breeding programs have mainly relied on simply selecting and mating the best individuals on their own or relatives' performance. This is based on sound quantitative genetic principles, developed and expounded by Lush, who attributed much of his understanding to Wright, and formalized in Fisher's infinitesimal model. Analysis at the level of individual loci and gene frequency distributions has had relatively little impact. Now with access to genomic data, a revolution in which molecular information is being used to enhance response with "genomic selection" is occurring. The predictions of breeding value still utilize multiple loci throughout the genome and, indeed, are largely compatible with additive and specifically infinitesimal model assumptions. I discuss some of the history and genetic issues as applied to the science of livestock improvement, which has had and continues to have major spin-offs into ideas and applications in other areas.
The shaping of genetic variation in edge-of-range populations under past and future climate change
Razgour, Orly; Juste, Javier; Ibáñez, Carlos; Kiefer, Andreas; Rebelo, Hugo; Puechmaille, Sébastien J; Arlettaz, Raphael; Burke, Terry; Dawson, Deborah A; Beaumont, Mark; Jones, Gareth; Wiens, John
2013-01-01
With rates of climate change exceeding the rate at which many species are able to shift their range or adapt, it is important to understand how future changes are likely to affect biodiversity at all levels of organisation. Understanding past responses and extent of niche conservatism in climatic tolerance can help predict future consequences. We use an integrated approach to determine the genetic consequences of past and future climate changes on a bat species, Plecotus austriacus. Glacial refugia predicted by palaeo-modelling match those identified from analyses of extant genetic diversity and model-based inference of demographic history. Former refugial populations currently contain disproportionately high genetic diversity, but niche conservatism, shifts in suitable areas and barriers to migration mean that these hotspots of genetic diversity are under threat from future climate change. Evidence of population decline despite recent northward migration highlights the need to conserve leading-edge populations for spearheading future range shifts. PMID:23890483
Fang, Lingzhao; Sahana, Goutam; Ma, Peipei; Su, Guosheng; Yu, Ying; Zhang, Shengli; Lund, Mogens Sandø; Sørensen, Peter
2017-08-10
A better understanding of the genetic architecture underlying complex traits (e.g., the distribution of causal variants and their effects) may aid genomic prediction. Here, we hypothesized that the genomic variants of complex traits might be enriched in a subset of genomic regions defined by genes grouped on the basis of "Gene Ontology" (GO), and that incorporating this independent biological information into genomic prediction models might improve their predictive ability. Four complex traits (i.e., milk, fat and protein yields, and mastitis) together with imputed sequence variants in Holstein (HOL) and Jersey (JER) cattle were analysed. We first carried out a post-GWAS analysis in a HOL training population to assess the degree of enrichment of the association signals in the gene regions defined by each GO term. We then extended the genomic best linear unbiased prediction model (GBLUP) to a genomic feature BLUP (GFBLUP) model, including an additional genomic effect quantifying the joint effect of a group of variants located in a genomic feature. The GBLUP model using a single random effect assumes that all genomic variants contribute to the genomic relationship equally, whereas GFBLUP attributes different weights to the individual genomic relationships in the prediction equation based on the estimated genomic parameters. Our results demonstrate that the immune-relevant GO terms were more associated with mastitis than with milk production, and several biologically meaningful GO terms improved the prediction accuracy with GFBLUP for the four traits, as compared with GBLUP. The improvement in genomic prediction between breeds (the average increase across the four traits was 0.161) was more apparent than that within HOL (the average increase across the four traits was 0.020). 
Our genomic feature modelling approaches provide a framework to simultaneously explore the genetic architecture and genomic prediction of complex traits by taking advantage of independent biological knowledge.
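The GBLUP-to-GFBLUP extension can be sketched with two genomic relationship matrices, one built from SNPs inside a feature and one from the remaining SNPs. This minimal sketch makes several assumptions. It uses VanRaden's first relationship matrix. The variance components are treated as already known, whereas in practice they are estimated, for example by REML. The intercept is the only fixed effect. The 40-SNP set standing in for a GO-term feature, and all simulated data, are hypothetical.

```python
import numpy as np


def vanraden_grm(M):
    """Genomic relationship matrix (VanRaden method 1) from an n x m matrix of
    0/1/2 allele counts; allele frequencies are estimated from M itself."""
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def gfblup_predict(y_train, M, train_idx, feature_mask, var_f, var_r, var_e):
    """BLUP of total genomic values for all n individuals with two random
    effects: SNPs inside the genomic feature (feature_mask True) and the rest.
    Variance components are taken as known. Returns (intercept, g_hat)."""
    G_f = vanraden_grm(M[:, feature_mask])
    G_r = vanraden_grm(M[:, ~feature_mask])
    V_g = var_f * G_f + var_r * G_r                     # genomic covariance, n x n
    V = V_g[np.ix_(train_idx, train_idx)] + var_e * np.eye(len(train_idx))
    ones = np.ones(len(train_idx))
    mu = (ones @ np.linalg.solve(V, y_train)) / (ones @ np.linalg.solve(V, ones))  # GLS intercept
    alpha = np.linalg.solve(V, y_train - mu)
    return mu, V_g[:, train_idx] @ alpha                # predictions for training and validation


# Simulated example: causal SNPs placed inside a hypothetical 40-SNP feature
rng = np.random.default_rng(0)
n, m = 120, 400
M = rng.binomial(2, 0.3, size=(n, m)).astype(float)
feature_mask = np.zeros(m, dtype=bool)
feature_mask[:40] = True
effects = np.zeros(m)
effects[:40] = rng.normal(size=40)
g = M @ effects
y = 10.0 + g + rng.normal(scale=g.std(), size=n)

train_idx = np.arange(100)                              # last 20 individuals act as validation
mu, g_hat = gfblup_predict(y[train_idx], M, train_idx, feature_mask,
                           var_f=0.8, var_r=0.2, var_e=1.0)
accuracy = np.corrcoef(g_hat[100:], g[100:])[0, 1]      # validation accuracy
```

Assigning more genomic variance to the feature (var_f > var_r) is what lets the model weight those relationships more heavily. A single relationship matrix built from all SNPs corresponds to ordinary GBLUP.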
Improvement of Predictive Ability by Uniform Coverage of the Target Genetic Space
Bustos-Korts, Daniela; Malosetti, Marcos; Chapman, Scott; Biddulph, Ben; van Eeuwijk, Fred
2016-01-01
Genome-enabled prediction provides breeders with the means to increase the number of genotypes that can be evaluated for selection. One of the major challenges in genome-enabled prediction is how to construct a training set of genotypes from a calibration set that represents the target population of genotypes, where the calibration set is composed of a training set and a validation set. A random sampling protocol of genotypes from the calibration set will lead to low-quality coverage of the total genetic space by the training set when the calibration set contains population structure. As a consequence, predictive ability will be affected negatively, because some parts of the genotypic diversity in the target population will be under-represented in the training set, whereas other parts will be over-represented. Therefore, we propose a training set construction method that uniformly samples the genetic space spanned by the target population of genotypes, thereby increasing predictive ability. To evaluate our method, we constructed training sets, along with identifying the corresponding genomic prediction models, for four genotype panels that differed in the amount of population structure they contained (maize Flint, maize Dent, wheat, and rice). Training sets were constructed using uniform sampling, stratified-uniform sampling, stratified sampling and random sampling. We compared these methods with a method that maximizes the generalized coefficient of determination (CD). Several training set sizes were considered. We investigated four genomic prediction models: multi-locus QTL models, GBLUP models, combinations of QTL and GBLUPs, and Reproducing Kernel Hilbert Space (RKHS) models. For the maize and wheat panels, construction of the training set under uniform sampling led to a larger predictive ability than under stratified and random sampling. The results of our methods were similar to those of the CD method. 
For the rice panel, all training set construction methods led to similar predictive ability, a reflection of the very strong population structure in this panel. PMID:27672112
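One way to make "uniform coverage of the genetic space" concrete is sketched below. This hypothetical grid-based sampler is not the authors' algorithm. It assumes the genetic space can be summarized by the first two principal components of the centered marker matrix. It partitions that plane into a grid and draws training genotypes round-robin across the occupied cells. As a result, sparsely populated regions are not crowded out by large subpopulations, as they would be under random sampling.

```python
import numpy as np


def uniform_training_set(M, n_train, grid=5, seed=0):
    """Return indices of n_train rows of marker matrix M chosen to cover the
    PC1-PC2 plane evenly (hypothetical grid-based round-robin sampler)."""
    if n_train > len(M):
        raise ValueError("n_train exceeds the number of genotypes")
    rng = np.random.default_rng(seed)
    Z = M - M.mean(axis=0)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    pcs = U[:, :2] * S[:2]                               # PC scores
    lo, hi = pcs.min(axis=0), pcs.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    cells = np.clip(((pcs - lo) / span * grid).astype(int), 0, grid - 1)
    keys = cells[:, 0] * grid + cells[:, 1]              # one key per grid cell
    buckets = [list(rng.permutation(np.flatnonzero(keys == k))) for k in np.unique(keys)]
    chosen = []
    while len(chosen) < n_train:                         # one genotype per occupied cell per pass
        for bucket in buckets:
            if bucket and len(chosen) < n_train:
                chosen.append(bucket.pop())
    return np.array(chosen)


# Simulated panel with strong structure: 150 genotypes from one group, 30 from another
rng = np.random.default_rng(3)
M = np.vstack([rng.binomial(2, 0.2, size=(150, 200)),
               rng.binomial(2, 0.6, size=(30, 200))]).astype(float)
train = uniform_training_set(M, 40)
minority_share = np.mean(train >= 150)                  # fraction drawn from the smaller group
```

The grid size controls the trade-off. A coarse grid behaves like stratified sampling, while a very fine grid approaches one genotype per distinct region of the PC plane.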
Fleming, A; Schenkel, F S; Koeck, A; Malchiodi, F; Ali, R A; Corredig, M; Mallard, B; Sargolzaei, M; Miglior, F
2017-05-01
The objective of this study was to estimate the heritability of milk fat globule (MFG) size and mid-infrared (MIR)-predicted MFG size in Holstein cattle. The genetic correlations between measured and predicted MFG size with milk fat and protein percentage were also investigated. Average MFG size was measured in 1,583 milk samples taken from 254 Holstein cows from 29 herds across Canada. Size was expressed as volume moment mean (D[4,3]) and surface moment mean (D[3,2]). Analyzed milk samples also had average MFG size predicted from their MIR spectral records. Fat and protein percentages were obtained for all test-day milk samples in the cow's lactation. Univariate and bivariate repeatability animal models were used to estimate heritability and genetic correlations. Moderate heritabilities of 0.364 and 0.466 were found for D[4,3] and D[3,2], respectively, and a strong genetic correlation was found between the 2 traits (0.98). The heritabilities for the MIR-predicted MFG size were lower than those estimated for the measured MFG size, at 0.300 for predicted D[4,3] and 0.239 for predicted D[3,2]. The genetic correlation between measured and predicted D[4,3] was 0.685; the correlation was slightly higher between measured and predicted D[3,2] at 0.764, likely due to the better prediction accuracy of D[3,2]. Milk fat percentage had moderate genetic correlations with both D[4,3] and D[3,2] (0.538 and 0.681, respectively). The genetic correlation between predicted MFG size and fat percentage was much stronger (greater than 0.97 for both predicted D[4,3] and D[3,2]). The stronger correlation suggests a limitation for the use of the predicted values of MFG size as indicator traits for true average MFG size in milk in selection programs. Larger sample sizes are required to provide better evidence of the estimated genetic parameters. A genetic component appears to exist for the average MFG size in bovine milk, and the variation could be exploited in selection programs. 
Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Oppositional Defiant Disorder dimensions: genetic influences and risk for later psychopathology
Mikolajewski, Amy J.; Taylor, Jeanette; Iacono, William G.
2016-01-01
Background: This study was undertaken to determine how well two Oppositional Defiant Disorder (ODD) dimensions (irritable and headstrong/hurtful) assessed in childhood predict late adolescent psychopathology and the degree to which these outcomes can be attributed to genetic influences shared with ODD dimensions. Methods: Psychopathology was assessed via diagnostic interviews of 1225 twin pairs at ages 11 and 17. Results: Consistent with hypotheses, the irritable dimension uniquely predicted overall internalizing problems, whereas the headstrong/hurtful dimension uniquely predicted substance use disorder symptoms. Both dimensions were predictive of antisocial behavior and overall externalizing problems. The expected relationships between the irritable dimension and specific internalizing disorders were not found. Twin modeling showed that the irritable and headstrong/hurtful dimensions were related to late adolescent psychopathology symptoms through common genetic influences. Conclusions: Symptoms of ODD in childhood pose a significant risk for various mental health outcomes in late adolescence. Further, common genetic influences underlie the covariance between irritable symptoms in childhood and overall internalizing problems in late adolescence, whereas headstrong/hurtful symptoms share genetic influences with substance use disorder symptoms. Antisocial behavior and overall externalizing share common genetic influences with both the irritable and headstrong/hurtful dimensions. PMID:28059443
Oppositional defiant disorder dimensions: genetic influences and risk for later psychopathology.
Mikolajewski, Amy J; Taylor, Jeanette; Iacono, William G
2017-06-01
This study was undertaken to determine how well two oppositional defiant disorder (ODD) dimensions (irritable and headstrong/hurtful) assessed in childhood predict late adolescent psychopathology and the degree to which these outcomes can be attributed to genetic influences shared with ODD dimensions. Psychopathology was assessed via diagnostic interviews of 1,225 twin pairs at ages 11 and 17. Consistent with hypotheses, the irritable dimension uniquely predicted overall internalizing problems, whereas the headstrong/hurtful dimension uniquely predicted substance use disorder symptoms. Both dimensions were predictive of antisocial behavior and overall externalizing problems. The expected relationships between the irritable dimension and specific internalizing disorders were not found. Twin modeling showed that the irritable and headstrong/hurtful dimensions were related to late adolescent psychopathology symptoms through common genetic influences. Symptoms of ODD in childhood pose a significant risk for various mental health outcomes in late adolescence. Further, common genetic influences underlie the covariance between irritable symptoms in childhood and overall internalizing problems in late adolescence, whereas headstrong/hurtful symptoms share genetic influences with substance use disorder symptoms. Antisocial behavior and overall externalizing share common genetic influences with both the irritable and headstrong/hurtful dimensions. © 2017 Association for Child and Adolescent Mental Health.
Odegård, J; Klemetsdal, G; Heringstad, B
2005-04-01
Several selection criteria for reducing incidence of mastitis were developed from a random regression sire model for test-day somatic cell score (SCS). For comparison, sire transmitting abilities were also predicted based on a cross-sectional model for lactation mean SCS. Only first-crop daughters were used in genetic evaluation of SCS, and the different selection criteria were compared based on their correlation with incidence of clinical mastitis in second-crop daughters (measured as mean daughter deviations). Selection criteria were predicted based on both complete and reduced first-crop daughter groups (261 or 65 daughters per sire, respectively). For complete daughter groups, predicted transmitting abilities at around 30 d in milk showed the best predictive ability for incidence of clinical mastitis, closely followed by average predicted transmitting abilities over the entire lactation. Both of these criteria were derived from the random regression model. These selection criteria improved accuracy of selection by approximately 2% relative to a cross-sectional model. However, for reduced daughter groups, the cross-sectional model yielded increased predictive ability compared with the selection criteria based on the random regression model. This result may be explained by the cross-sectional model being more robust, i.e., less sensitive to precision of (co)variance components estimates and effects of data structure.
Evaluating realized genetic gains from tree improvement.
J.B. St. Clair
1993-01-01
Tree improvement has become an essential part of the management of forest lands for wood production, and predicting yields and realized gains from forests planted with genetically-improved trees will become increasingly important. This paper discusses concepts of tree improvement and genetic gain important to growth and yield modeling, and reviews previous studies of...
Fernández-Cadenas, Israel; Mendióroz, Maite; Giralt, Dolors; Nafria, Cristina; Garcia, Elena; Carrera, Caty; Gallego-Fabrega, Cristina; Domingues-Montanari, Sophie; Delgado, Pilar; Ribó, Marc; Castellanos, Mar; Martínez, Sergi; Freijo, Marimar; Jiménez-Conde, Jordi; Rubiera, Marta; Alvarez-Sabín, José; Molina, Carlos A; Font, Maria Angels; Grau Olivares, Marta; Palomeras, Ernest; Perez de la Ossa, Natalia; Martinez-Zabaleta, Maite; Masjuan, Jaime; Moniche, Francisco; Canovas, David; Piñana, Carlos; Purroy, Francisco; Cocho, Dolores; Navas, Inma; Tejero, Carlos; Aymerich, Nuria; Cullell, Natalia; Muiño, Elena; Serena, Joaquín; Rubio, Francisco; Davalos, Antoni; Roquer, Jaume; Arenillas, Juan Francisco; Martí-Fábregas, Joan; Keene, Keith; Chen, Wei-Min; Worrall, Bradford; Sale, Michele; Arboix, Adrià; Krupinski, Jerzy; Montaner, Joan
2017-05-01
Vascular recurrence occurs in 11% of patients during the first year after ischemic stroke (IS) or transient ischemic attack. Clinical scores do not predict the whole vascular recurrence risk; therefore, we aimed to find genetic variants associated with recurrence that might improve the clinical predictive models in IS. We analyzed 256 polymorphisms from 115 candidate genes in 3 patient cohorts comprising 4482 IS or transient ischemic attack patients. The discovery cohort was prospectively recruited and included 1494 patients, of whom 6.2% developed a new IS during the first year of follow-up. Replication analysis was performed in 2988 patients using SNPlex or HumanOmni1-Quad technology. We generated a predictive model using Cox regression (GRECOS score [Genotyping Recurrence Risk of Stroke]) and generated risk groups using a classification tree method. The analyses revealed that rs1800801 in the MGP gene (hazard ratio, 1.33; P=9×10⁻³), a gene related to artery calcification, was associated with new IS during the first year of follow-up. This polymorphism was replicated in a Spanish cohort (n=1305); however, it was not significantly associated in a North American cohort (n=1683). The GRECOS score predicted new IS (P=3.2×10⁻⁹) and could classify patients from low risk of stroke recurrence (1.9%) to high risk (12.6%). Moreover, the addition of genetic risk factors to the GRECOS score improves the prediction compared with the previous Stroke Prognosis Instrument-II score (P=0.03). The use of genetics could be useful to estimate vascular recurrence risk after IS. Genetic variability in the MGP gene was associated with vascular recurrence in the Spanish population. © 2017 American Heart Association, Inc.
GRECOS project. The use of genetics to predict the vascular recurrence after stroke
Fernández-Cadenas, Israel; Mendióroz, Maite; Giralt, Dolors; Nafria, Cristina; Garcia, Elena; Carrera, Caty; Gallego-Fabrega, Cristina; Domingues-Montanari, Sophie; Delgado, Pilar; Ribó, Marc; Castellanos, Mar; Martínez, Sergi; Freijo, Mari Mar; Jiménez-Conde, Jordi; Rubiera, Marta; Alvarez-Sabín, José; Molina, Carlos A.; Font, Maria Angels; Olivares, Marta Grau; Palomeras, Ernest; de la Ossa, Natalia Perez; Martinez-Zabaleta, Maite; Masjuan, Jaime; Moniche, Francisco; Canovas, David; Piñana, Carlos; Purroy, Francisco; Cocho, Dolores; Navas, Inma; Tejero, Carlos; Aymerich, Nuria; Cullell, Natalia; Muiño, Elena; Serena, Joaquín; Rubio, Francisco; Davalos, Antoni; Roquer, Jaume; Arenillas, Juan Francisco; Martí-Fábregas, Joan; Keene, Keith; Chen, Wei-Min; Worrall, Bradford; Sale, Michele; Arboix, Adrià; Krupinski, Jerzy; Montaner, Joan
2017-01-01
Background and Purpose: Vascular recurrence occurs in 11% of patients during the first year after ischemic stroke (IS) or transient ischemic attack (TIA). Clinical scores do not predict the whole vascular recurrence risk; therefore, we aimed to find genetic variants associated with recurrence that might improve the clinical predictive models in IS. Methods: We analyzed 256 polymorphisms from 115 candidate genes in three patient cohorts comprising 4,482 IS or TIA patients. The discovery cohort was prospectively recruited and included 1,494 patients, of whom 6.2% developed a new IS during the first year of follow-up. Replication analysis was performed in 2,988 patients using SNPlex or HumanOmni1-Quad technology. We generated a predictive model using Cox regression (GRECOS score) and generated risk groups using a classification tree method. Results: The analyses revealed that rs1800801 in the MGP gene (HR: 1.33, p=9×10⁻³), a gene related to artery calcification, was associated with new IS during the first year of follow-up. This polymorphism was replicated in a Spanish cohort (n=1,305); however, it was not significantly associated in a North American cohort (n=1,683). The GRECOS score predicted new IS (p=3.2×10⁻⁹) and could classify patients from low risk of stroke recurrence (1.9%) to high risk (12.6%). Moreover, the addition of genetic risk factors to the GRECOS score improves the prediction compared to the previous SPI-II score (p=0.03). Conclusions: The use of genetics could be useful to estimate vascular recurrence risk after IS. Genetic variability in the MGP gene was associated with vascular recurrence in the Spanish population. PMID:28411264
Williams, C B; Bennett, G L; Jenkins, T G; Cundiff, L V; Ferrell, C L
2006-06-01
The objectives of this study were to evaluate the accuracy of the Decision Evaluator for the Cattle Industry (DECI) and the Cornell Value Discovery System (CVDS) in predicting individual DMI and to assess the feasibility of using predicted DMI data in genetic evaluations of cattle. Observed individual animal data on the average daily DMI (OFI), ADG, and carcass measurements were obtained from postweaning records of 504 steers from 52 sires (502 with complete data). The experimental data and daily temperature and wind speed data were used as inputs to predict average daily feed DMI (kg) required (feed required; FR) for maintenance, cold stress, and ADG; maintenance and cold stress; ADG; maintenance and ADG; and maintenance alone, with CVDS (CFRmcg, CFRmc, CFRg, CFRmg, and CFRm, respectively) and DECI (DFRmcg, DFRmc, DFRg, DFRmg, and DFRm, respectively). Genetic parameters were estimated by REML using an animal model with age on test as a covariate and with genotype, age of dam, and year as fixed effects. Regression equations for observed on predicted DMI were OFI = 1.27 (SE = 0.27) + 0.83 (SE = 0.04) × CFRmcg [R² = 0.44, residual SD (s(y.x)) = 0.669 kg/d] and OFI = 1.32 (SE = 0.22) + 0.8 (SE = 0.03) × DFRmcg (R² = 0.53, s(y.x) = 0.612 kg/d). Heritability of OFI was 0.27 ± 0.12, and heritabilities ranged from 0.33 ± 0.12 to 0.41 ± 0.13 for predicted measures of DMI. Phenotypic and genetic correlations between OFI and CFRmcg, CFRmc, CFRg, CFRmg, CFRm, DFRmcg, DFRmc, DFRg, DFRmg, and DFRm were 0.67, 0.73, 0.41, 0.63, 0.78, 0.73, 0.82, 0.45, 0.77, and 0.86 (P < 0.001 for all phenotypic correlations); and 0.95 ± 0.07, 0.82 ± 0.13, 0.89 ± 0.09, 0.95 ± 0.07, 0.91 ± 0.09, 0.96 ± 0.07, 0.89 ± 0.09, 0.88 ± 0.09, 0.96 ± 0.06, and 0.96 ± 0.07, respectively. 
Phenotypic and genetic correlations between CFRmcg and DFRmcg, CFRmc and DFRmc, CFRg and DFRg, CFRmg and DFRmg, and CFRm and DFRm were 0.98, 0.94, 0.99, 0.98, and 0.95 (P < 0.001 for all phenotypic correlations), and 0.99 ± 0.004, 0.98 ± 0.017, 0.99 ± 0.004, 0.99 ± 0.005, and 0.97 ± 0.021, respectively. The strong genetic relationships between OFI and CFRmcg, CFRmg, DFRmcg, and DFRmg indicate that these predicted measures of DMI may be used in genetic evaluations and that DM requirements for cold stress may not be needed, thus reducing model complexity. However, high genetic correlations for final weight with OFI, CFRmcg, and DFRmcg suggest that the technology needs to be further evaluated in populations with genetic variance in feed efficiency.
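The two calibration equations reported above can be applied directly to convert a model's predicted feed requirement into an expected observed intake. The sketch below does only that arithmetic. The 8.0 kg/d input is a hypothetical value. The ±1.96 residual SD band is a rough approximation that ignores the standard errors of the intercept and slope.

```python
# Calibration equations reported in the abstract: OFI = intercept + slope * predicted requirement
EQUATIONS = {
    "CVDS": {"intercept": 1.27, "slope": 0.83, "resid_sd": 0.669},  # regressed on CFRmcg
    "DECI": {"intercept": 1.32, "slope": 0.80, "resid_sd": 0.612},  # regressed on DFRmcg
}


def expected_ofi(model, feed_required):
    """Expected observed DMI (kg/d) and a rough 95% band from the residual SD only."""
    eq = EQUATIONS[model]
    mean = eq["intercept"] + eq["slope"] * feed_required
    half_width = 1.96 * eq["resid_sd"]
    return mean, (mean - half_width, mean + half_width)


for model in EQUATIONS:
    mean, (low, high) = expected_ofi(model, 8.0)   # hypothetical 8.0 kg/d model prediction
    print(f"{model}: {mean:.2f} kg/d (approx. {low:.2f} to {high:.2f})")
```

Both slopes are below 1, so the two systems' predictions are compressed relative to observed intake. Low predicted requirements map to somewhat higher expected intakes, and high ones map to somewhat lower expected intakes.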
NASA Astrophysics Data System (ADS)
Aksoy, A.; Lee, J. H.; Kitanidis, P. K.
2016-12-01
Heterogeneity in hydraulic conductivity (K) affects the transport and fate of contaminants in the subsurface, as well as the design and operation of managed aquifer recharge (MAR) systems. Recently, improvements in computational resources and the availability of big data through electrical resistivity tomography (ERT) and remote sensing have provided opportunities to better characterize the subsurface. Yet there is a need to improve prediction and evaluation methods in order to extract information from field measurements for better field characterization. In this study, genetic algorithm optimization, which has been widely used in optimal aquifer remediation designs, was used to determine the spatial distribution of K. A hypothetical 2 km by 2 km aquifer was considered. A genetic algorithm library, PGAPack, was linked with a fast Fourier transform-based random field generator as well as a groundwater flow and contaminant transport simulation model (BIO2D-KE). The objective of the optimization model was to minimize the total squared error between measured and predicted field values. Measured K values were assumed to be available through ERT. The performance of the genetic algorithm in predicting the distribution of K was tested for different cases. In the first case, observed K values were evaluated using only the random field generator as the forward model. In the second case, measured head values were incorporated into the evaluation in addition to the K values obtained through ERT, with BIO2D-KE and the random field generator used as the forward models. Lastly, tracer concentrations were used as additional information in the optimization model. Initial results indicated enhanced performance in predicting the spatial distribution of K when the random field generator and BIO2D-KE were used in combination.
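The optimization described above minimizes a total squared error between measured and predicted values. Below is a minimal real-coded genetic algorithm sketch of that idea in plain Python. It is not PGAPack's API. The identity `forward_model` stands in for the first case (K compared directly), and the search bounds, population size, and observed values are hypothetical.

```python
import random

def forward_model(field):
    """Identity forward model: predicted K equals the candidate K field.

    Placeholder only; with head or tracer data this would be a flow and
    transport simulator (the study used BIO2D-KE), which is not reproduced here.
    """
    return field

def objective(candidate, observed):
    """Total squared error between observed and predicted field values."""
    return sum((p - o) ** 2 for p, o in zip(forward_model(candidate), observed))

def genetic_algorithm(observed, pop_size=60, generations=300, mut_sd=0.1, seed=1):
    rng = random.Random(seed)
    n = len(observed)
    # Hypothetical search bounds for log10 K in each cell
    pop = [[rng.uniform(-5.0, 0.0) for _ in range(n)] for _ in range(pop_size)]

    def fitness(ind):
        return objective(ind, observed)

    for _ in range(generations):
        pop.sort(key=fitness)
        next_pop = pop[:2]  # elitism: carry the two best fields over unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals, twice
            p1 = min(rng.sample(pop, 3), key=fitness)
            p2 = min(rng.sample(pop, 3), key=fitness)
            # Arithmetic crossover plus Gaussian mutation in every cell
            w = rng.random()
            child = [w * a + (1 - w) * b + rng.gauss(0.0, mut_sd) for a, b in zip(p1, p2)]
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=fitness)

observed_log_k = [-3.0, -2.5, -2.0, -1.5, -1.0]  # hypothetical ERT-derived values
best = genetic_algorithm(observed_log_k)
print([round(v, 2) for v in best], round(objective(best, observed_log_k), 4))
```

Swapping `forward_model` for a simulator that maps a K field to heads or concentrations is what the second and third cases add; the GA loop itself would not change.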
Fenlon, Caroline; O'Grady, Luke; Butler, Stephen; Doherty, Michael L; Dunnion, John
2017-01-01
Herd fertility in pasture-based dairy farms is a key driver of farm economics. Models for predicting nulliparous reproductive outcomes are rare, but age, genetics, weight, and BCS have been identified as factors influencing heifer conception. The aim of this study was to create a simulation model of heifer conception to service, with thorough evaluation. Artificial insemination service records from two research herds and ten commercial herds were provided to build and evaluate the models. All were managed as spring-calving pasture-based systems. The factors studied were related to age, genetics, and time of service. The data were split into training and testing sets, and bootstrapping was used to train the models. Logistic regression (with and without random effects) and generalised additive modelling were selected as the model-building techniques. Two types of evaluation were used to test the predictive ability of the models: discrimination and calibration. Discrimination, which includes sensitivity, specificity, accuracy and ROC analysis, measures a model's ability to distinguish between positive and negative outcomes. Calibration, which measures the accuracy of the predicted probabilities, was assessed with the Hosmer-Lemeshow goodness-of-fit test, calibration plots and calibration error. After data cleaning and the removal of services with missing values, 1396 services remained to train the models and 597 were left for testing. Age, breed, genetic predicted transmitting ability for calving interval, month and year were significant in the multivariate models. The regression models also included an interaction between age and month. Year within herd was a random effect in the mixed regression model. Overall prediction accuracy was between 77.1% and 78.9%. All three models had very high sensitivity but low specificity. The two regression models were very well calibrated. The mean absolute calibration errors were all below 4%.
Because the models were not adept at identifying unsuccessful services, they are not suggested for use in predicting the outcome of individual heifer services. Instead, they are useful for the comparison of services with different covariate values or as sub-models in whole-farm simulations. The mixed regression model was identified as the best model for prediction, as the random effects can be ignored and the other variables can be easily obtained or simulated.
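The two evaluation families above can be sketched as follows. Discrimination uses a probability threshold of 0.5, and calibration error is computed here as the mean absolute gap between observed rate and mean predicted probability over equal-width bins. Both the threshold and the binning are assumptions; the Hosmer-Lemeshow test itself is not implemented, and the toy data are hypothetical.

```python
def discrimination(y_true, p_pred, threshold=0.5):
    """Sensitivity, specificity and accuracy at a probability threshold."""
    tp = sum(1 for y, p in zip(y_true, p_pred) if y == 1 and p >= threshold)
    tn = sum(1 for y, p in zip(y_true, p_pred) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, p_pred) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, p_pred) if y == 1 and p < threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }

def mean_abs_calibration_error(y_true, p_pred, n_bins=10):
    """Mean |observed rate - mean predicted probability| over non-empty equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    errors = []
    for b in bins:
        if b:
            observed = sum(y for y, _ in b) / len(b)
            predicted = sum(p for _, p in b) / len(b)
            errors.append(abs(observed - predicted))
    return sum(errors) / len(errors)

# Hypothetical services: 1 = conceived, 0 = not conceived
y = [1, 1, 1, 0, 1, 0]
p = [0.95, 0.85, 0.75, 0.65, 0.62, 0.35]
print(discrimination(y, p))
print(round(mean_abs_calibration_error(y, p), 3))
```

The toy data mimic the pattern reported above: every success is caught (high sensitivity), while half of the failures are misclassified (low specificity).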
Mas, Sergi; Gassó, Patricia; Morer, Astrid; Calvo, Anna; Bargalló, Nuria; Lafuente, Amalia; Lázaro, Luisa
2016-01-01
We propose an integrative approach that combines structural magnetic resonance imaging (MRI) data, diffusion tensor imaging (DTI) data, neuropsychological data, and genetic data to predict early-onset obsessive-compulsive disorder (OCD) severity. From a cohort of 87 patients, 56 with complete information were used in the present analysis. First, we performed a multivariate genetic association analysis of OCD severity with 266 genetic polymorphisms. This association analysis was used to select and prioritize the SNPs that would be included in the model. Second, we split the sample into a training set (N = 38) and a validation set (N = 18). Third, entropy-based measures of information gain were used for feature selection with the training subset. Fourth, the selected features were fed into two supervised methods of class prediction based on machine learning, using the leave-one-out procedure with the training set. Finally, the resulting model was validated with the validation set. Nine variables were used for the creation of the OCD severity predictor, including six genetic polymorphisms and three variables from the neuropsychological data. The developed model classified child and adolescent patients with OCD by disease severity with an accuracy of 0.90 in the testing set and 0.70 in the validation sample. Beyond its clinical applicability, the combination of particular neuropsychological, neuroimaging, and genetic characteristics could enhance our understanding of the neurobiological basis of the disorder. PMID:27093171
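Entropy-based information gain, used above for feature selection, measures how much splitting the patients on a discrete feature reduces the entropy of the severity label. A minimal sketch follows; the feature names and genotype values are hypothetical toy data, not variables from the study.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical severity classes and genotypes (0/1/2) at two SNPs
severity = ["high", "high", "low", "low"]
features = {"snp_a": [2, 2, 0, 0], "snp_b": [1, 0, 1, 0]}
ranked = sorted(features, key=lambda f: information_gain(features[f], severity), reverse=True)
print(ranked)
```

Here `snp_a` separates the classes perfectly (gain of 1 bit) and `snp_b` carries no information (gain of 0), so ranking by gain keeps `snp_a` first.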
Integrating paleoecology and genetics of bird populations in two sky island archipelagos.
McCormack, John E; Bowen, Bonnie S; Smith, Thomas B
2008-06-27
Genetic tests of paleoecological hypotheses have been rare, partly because recent genetic divergence is difficult to detect and time. According to fossil plant data, continuous woodland in the southwestern USA and northern Mexico became fragmented during the last 10,000 years, as warming caused cool-adapted species to retreat to high elevations. Most genetic studies of resulting 'sky islands' have either failed to detect recent divergence or have found discordant evidence for ancient divergence. We test this paleoecological hypothesis for the region with intraspecific mitochondrial DNA and microsatellite data from sky-island populations of a sedentary bird, the Mexican jay (Aphelocoma ultramarina). We predicted that populations on different sky islands would share common, ancestral alleles that existed during the last glaciation, but that populations on each sky island, owing to their isolation, would contain unique variants of postglacial origin. We also predicted that divergence times estimated from corrected genetic distance and a coalescence model would post-date the last glacial maximum. Our results provide multiple independent lines of support for postglacial divergence, with the predicted pattern of shared and unique mitochondrial DNA haplotypes appearing in two independent sky-island archipelagos, and most estimates of divergence time based on corrected genetic distance post-dating the last glacial maximum. Likewise, an isolation model based on multilocus gene coalescence indicated postglacial divergence of five pairs of sky islands. In contrast to their similar recent histories, the two archipelagos had dissimilar historical patterns in that sky islands in Arizona showed evidence for older divergence, suggesting different responses to the last glaciation. This study is one of the first to provide explicit support from genetic data for a postglacial divergence scenario predicted by one of the best paleoecological records in the world. 
Our results demonstrate that sky islands act as generators of genetic diversity at both recent and historical timescales and underscore the importance of thorough sampling and the use of loci with fast mutation rates to studies that test hypotheses concerning recent genetic divergence.
Kernel-based whole-genome prediction of complex traits: a review.
Morota, Gota; Gianola, Daniel
2014-01-01
Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.
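One of the simplest kernel methods in this family is kernel ridge regression with a Gaussian kernel, which is closely related to RKHS regression on marker genotypes. The sketch below uses NumPy; the bandwidth, the penalty `lam`, and the simulated epistatic trait are hypothetical choices. It is not a specific method from the review.

```python
import numpy as np

def gaussian_kernel(X1, X2, bandwidth):
    """K[i, j] = exp(-||x_i - x_j||^2 / bandwidth) between rows of marker genotypes."""
    sq = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-np.maximum(sq, 0.0) / bandwidth)

def krr_fit_predict(X_train, y_train, X_test, bandwidth, lam):
    """Kernel ridge regression: alpha = (K + lam*I)^-1 (y - mean), prediction = mean + K_test @ alpha."""
    mu = y_train.mean()
    K = gaussian_kernel(X_train, X_train, bandwidth)
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - mu)
    return mu + gaussian_kernel(X_test, X_train, bandwidth) @ alpha

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(300, 5)).astype(float)     # genotypes coded 0/1/2
y = X[:, 0] * X[:, 1] + rng.normal(0.0, 0.5, size=300)  # purely epistatic toy trait
y_hat = krr_fit_predict(X[:250], y[:250], X[250:], bandwidth=5.0, lam=1.0)
print(np.corrcoef(y[250:], y_hat)[0, 1])
```

The Gaussian kernel lets the model pick up the marker-by-marker interaction without coding it explicitly, which is the motivation for non-parametric kernels given above.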
Amos, J. Nevil; Bennett, Andrew F.; Mac Nally, Ralph; Newell, Graeme; Pavlova, Alexandra; Radford, James Q.; Thomson, James R.; White, Matt; Sunnucks, Paul
2012-01-01
Inference concerning the impact of habitat fragmentation on dispersal and gene flow is a key theme in landscape genetics. Recently, the ability of established approaches to identify reliably the differential effects of landscape structure (e.g. land-cover composition, remnant vegetation configuration and extent) on the mobility of organisms has been questioned. More explicit methods of predicting and testing for such effects must move beyond post hoc explanations for single landscapes and species. Here, we document a process for making a priori predictions, using existing spatial and ecological data and expert opinion, of the effects of landscape structure on genetic structure of multiple species across replicated landscape blocks. We compare the results of two common methods for estimating the influence of landscape structure on effective distance: least-cost path analysis and isolation-by-resistance. We present a series of alternative models of genetic connectivity in the study area, represented by different landscape resistance surfaces for calculating effective distance, and identify appropriate null models. The process is applied to ten species of sympatric woodland-dependent birds. For each species, we rank a priori the expectation of fit of genetic response to the models according to the expected response of birds to loss of structural connectivity and landscape-scale tree-cover. These rankings (our hypotheses) are presented for testing with empirical genetic data in a subsequent contribution. We propose that this replicated landscape, multi-species approach offers a robust method for identifying the likely effects of landscape fragmentation on dispersal. PMID:22363508
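Least-cost path analysis turns a resistance surface into an effective distance. Below is a minimal sketch using Dijkstra's algorithm on a 4-connected grid, assuming that a step between neighbouring cells costs the mean of their resistances (a common convention, assumed here). Isolation-by-resistance, which uses circuit theory, is not shown. The grid values are hypothetical: 1 for tree cover and 10 for cleared land.

```python
import heapq

def least_cost_distance(resistance, start, goal):
    """Least-cost path distance on a 4-connected grid of resistance values.

    Moving between adjacent cells costs the mean resistance of the two cells.
    """
    rows, cols = len(resistance), len(resistance[0])
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        cost, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return cost
        if cost > best[(r, c)]:
            continue  # stale heap entry superseded by a cheaper route
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + (resistance[r][c] + resistance[nr][nc]) / 2.0
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(heap, (new_cost, (nr, nc)))
    return float("inf")

fragmented = [[1, 10, 1],
              [1, 10, 1],
              [1, 1, 1]]
uniform = [[1, 1, 1]] * 3  # null model: effective distance equals grid distance
print(least_cost_distance(fragmented, (0, 0), (0, 2)), least_cost_distance(uniform, (0, 0), (0, 2)))
```

Comparing effective distances under alternative resistance surfaces against a uniform-surface null model mirrors the model-ranking idea described above.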
Vrshek-Schallhorn, Suzanne; Stroud, Catherine B; Mineka, Susan; Zinbarg, Richard E; Adam, Emma K; Redei, Eva E; Hammen, Constance; Craske, Michelle G
2015-11-01
Behavioral genetic research supports polygenic models of depression in which many genetic variations each contribute a small amount of risk, and prevailing diathesis-stress models suggest gene-environment interactions (G×E). Multilocus profile scores of additive risk offer an approach that is consistent with polygenic models of depression risk. In a first demonstration of this approach in a G×E predicting depression, we created an additive multilocus profile score from 5 serotonin system polymorphisms (1 each in the genes HTR1A, HTR2A, HTR2C, and 2 in TPH2). Analyses focused on 2 forms of interpersonal stress as environmental risk factors. Using 5 years of longitudinal diagnostic and life stress interviews from 387 emerging young adults in the Youth Emotion Project, survival analyses show that this multilocus profile score interacts with major interpersonal stressful life events to predict major depressive episode onsets (hazard ratio [HR] = 1.815, p = .007). Simultaneously, there was a significant protective effect of the profile score without a recent event (HR = 0.83, p = .030). The G×E effect with interpersonal chronic stress was not significant (HR = 1.15, p = .165). Finally, effect sizes for genetic factors examined ignoring stress suggested such an approach could lead to overlooking or misinterpreting genetic effects. Both the G×E effect and the protective simple main effect were replicated in a sample of early adolescent girls (N = 105). We discuss potential benefits of the multilocus genetic profile score approach and caveats for future research. © 2015 APA, all rights reserved.
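An additive multilocus profile score sums risk-allele counts across the selected polymorphisms. For a G×E survival model, the score, the stress indicator, and their product are then entered as covariates. The sketch below assumes that each genotype is already coded as a count of risk alleles (0, 1 or 2). Which allele counts as "risk" is not stated here, the locus labels for the two TPH2 variants are hypothetical, and HTR2C is X-linked, so a real coding for males would need an explicit convention that this sketch ignores. Fitting the survival model is left to a dedicated package.

```python
# Hypothetical locus labels; values are risk-allele counts (0, 1 or 2)
LOCI = ("HTR1A", "HTR2A", "HTR2C", "TPH2_1", "TPH2_2")

def profile_score(genotype):
    """Additive multilocus profile score: sum of risk-allele counts across loci."""
    return sum(genotype[locus] for locus in LOCI)

def gxe_covariates(score, recent_event):
    """Covariates for a G x E survival model: both main effects plus their product term."""
    event = 1 if recent_event else 0
    return {"score": score, "event": event, "score_x_event": score * event}

person = {"HTR1A": 1, "HTR2A": 2, "HTR2C": 0, "TPH2_1": 1, "TPH2_2": 1}
s = profile_score(person)
print(s, gxe_covariates(s, recent_event=True))
```

The product term is what carries the G×E effect; without a recent event it is zero, so only the score's main effect applies.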
Austin, Caitlin M.; Stoy, William; Su, Peter; Harber, Marie C.; Bardill, J. Patrick; Hammer, Brian K.; Forest, Craig R.
2014-01-01
Biosensors exploiting communication within genetically engineered bacteria are becoming increasingly important for monitoring environmental changes. Currently, there are a variety of mathematical models for understanding and predicting how genetically engineered bacteria respond to molecular stimuli in these environments, but as sensors have miniaturized towards microfluidics and are subjected to complex time-varying inputs, the shortcomings of these models have become apparent. The effects of microfluidic environments such as low oxygen concentration, increased biofilm encapsulation, diffusion-limited molecular distribution, and higher population densities strongly affect rate constants for gene expression not accounted for in previous models. We report a mathematical model that accurately predicts the biological response of the autoinducer N-acyl homoserine lactone-mediated green fluorescent protein expression in reporter bacteria in microfluidic environments by accommodating these rate constants. This generalized mass action model considers a chain of biomolecular events from input autoinducer chemical to fluorescent protein expression through a series of six chemical species. We have validated this model against experimental data from our own apparatus as well as prior published experimental results. Results indicate accurate prediction of dynamics (e.g., 14% peak time error from a pulse input) and reduced mean-squared error with pulse or step inputs for a range of concentrations (10 μM–30 μM). This model can help advance the design of genetically engineered bacteria sensors and molecular communication devices. PMID:25379076
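A chain of six species linked by generalized mass action (power-law) rate terms can be sketched as a small ODE system integrated with explicit Euler steps. The species are generic (S0 to S5), and the rate constants, kinetic orders, and unit pulse input are hypothetical. This is not the published model or its fitted parameters.

```python
def simulate_chain(rates, x0, t_end, dt=0.001, orders=None):
    """Explicit Euler integration of a chain S0 -> S1 -> ... -> S5.

    The flux S_i -> S_{i+1} is rates[i] * x_i ** orders[i] (power-law, i.e.
    generalized mass action); orders default to 1 (ordinary first-order kinetics).
    The last species (e.g. the fluorescent reporter) only accumulates.
    """
    if orders is None:
        orders = [1.0] * len(rates)
    x = list(x0)
    for _ in range(int(round(t_end / dt))):
        # Fluxes use the state at the start of the step (explicit Euler)
        fluxes = [k * max(x[i], 0.0) ** g * dt for i, (k, g) in enumerate(zip(rates, orders))]
        for i, f in enumerate(fluxes):
            x[i] -= f
            x[i + 1] += f
    return x

# Hypothetical unit pulse of autoinducer at t = 0 and unit rate constants
x_end = simulate_chain([1.0] * 5, [1.0, 0.0, 0.0, 0.0, 0.0, 0.0], t_end=50.0, dt=0.01)
print([round(v, 4) for v in x_end])
```

Because every flux leaves one species and enters the next, total mass is conserved, which is a convenient sanity check when fitting rate constants to microfluidic data.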
Buonomo, Roberto; Assis, Jorge; Fernandes, Francisco; Engelen, Aschwin H; Airoldi, Laura; Serrão, Ester A
2017-02-01
Effective predictive and management approaches for species occurring in a metapopulation structure require good understanding of interpopulation connectivity. In this study, we ask whether population genetic structure of marine species with fragmented distributions can be predicted by stepping-stone oceanographic transport and habitat continuity, using as model an ecosystem-structuring brown alga, Cystoseira amentacea var. stricta. To answer this question, we analysed the genetic structure and estimated the connectivity of populations along discontinuous rocky habitat patches in southern Italy, using microsatellite markers at multiple scales. In addition, we modelled the effect of rocky habitat continuity and ocean circulation on gene flow by simulating Lagrangian particle dispersal based on ocean surface currents allowing multigenerational stepping-stone dynamics. Populations were highly differentiated, at scales from a few metres up to thousands of kilometres. The best possible model fit to explain the genetic results combined current direction, rocky habitat extension and distance along the coast among rocky sites. We conclude that a combination of variable suitable habitat and oceanographic transport is a useful predictor of genetic structure. This relationship provides insight into the mechanisms of dispersal and the role of life-history traits. Our results highlight the importance of spatially explicit modelling of stepping-stone dynamics and oceanographic directional transport coupled with habitat suitability, to better describe and predict marine population structure and differentiation. This study also suggests the appropriate spatial scales for the conservation, restoration and management of species that are increasingly affected by habitat modifications. © 2016 John Wiley & Sons Ltd.
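One common way to extend a one-generation connectivity matrix (for example, one estimated from Lagrangian particle tracking) to multigenerational stepping-stone dynamics is to take its matrix power. The abstract does not state the exact formulation, so this is an assumption. The three-site matrix below is hypothetical, with a one-directional current and no direct transport from site 0 to site 2.

```python
def mat_mul(a, b):
    """Plain-Python matrix product."""
    m, p = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)] for i in range(len(a))]

def stepping_stone(P, generations):
    """P ** generations: probability of an i -> j connection via intermediate populations."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(generations):
        result = mat_mul(result, P)
    return result

# P[i][j]: hypothetical one-generation probability that propagules from site i settle at site j
P = [[0.8, 0.2, 0.0],
     [0.0, 0.8, 0.2],
     [0.0, 0.0, 1.0]]
P2 = stepping_stone(P, 2)
print(P2[0][2], P2[2][0])
```

After two generations site 0 reaches site 2 through site 1, while the reverse connection stays at zero, which captures the directional transport highlighted above.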
The evolution of sexes: A specific test of the disruptive selection theory.
da Silva, Jack
2018-01-01
The disruptive selection theory of the evolution of anisogamy posits that the evolution of a larger body or greater organismal complexity selects for a larger zygote, which in turn selects for larger gametes. This may provide the opportunity for one mating type to produce more numerous, small gametes, forcing the other mating type to produce fewer, large gametes. Predictions common to this and related theories have been partially upheld. Here, a prediction specific to the disruptive selection theory is derived from a previously published game-theoretic model that represents the most complete description of the theory. The prediction, that the ratio of macrogamete to microgamete size should be above three for anisogamous species, is supported for the volvocine algae. A fully population genetic implementation of the model, involving mutation, genetic drift, and selection, is used to verify the game-theoretic approach and accurately simulates the evolution of gamete sizes in anisogamous species. This model was extended to include a locus for gamete motility and shows that oogamy should evolve whenever there is costly motility. The classic twofold cost of sex may be derived from the fitness functions of these models, showing that this cost is ultimately due to genetic conflict.
Crossa, José; Campos, Gustavo de Los; Pérez, Paulino; Gianola, Daniel; Burgueño, Juan; Araus, José Luis; Makumbi, Dan; Singh, Ravi P; Dreisigacker, Susanne; Yan, Jianbing; Arief, Vivi; Banziger, Marianne; Braun, Hans-Joachim
2010-10-01
The availability of dense molecular markers has made possible the use of genomic selection (GS) for plant breeding. However, the evaluation of models for GS in real plant populations is very limited. This article evaluates the performance of parametric and semiparametric models for GS using wheat (Triticum aestivum L.) and maize (Zea mays) data in which different traits were measured in several environmental conditions. The findings, based on extensive cross-validations, indicate that models including marker information had higher predictive ability than pedigree-based models. In the wheat data set, and relative to a pedigree model, gains in predictive ability due to inclusion of markers ranged from 7.7 to 35.7%. Correlation between observed and predicted values in the maize data set achieved values up to 0.79. Estimates of marker effects were different across environmental conditions, indicating that genotype × environment interaction is an important component of genetic variability. These results indicate that GS in plant breeding can be an effective strategy for selecting among lines whose phenotypes have yet to be observed.
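Predictive ability in cross-validation is usually reported as the correlation between observed and predicted values for lines held out of training. The sketch below uses ridge regression on centered marker codes, a simple stand-in chosen for illustration; it is not one of the specific parametric or semiparametric models evaluated in the article. Here the correlation is computed once over pooled out-of-fold predictions, which is one convention among several. The simulated data, `lam`, and `k` are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression of y on centered marker codes; returns centering terms and effects."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return x_mean, y_mean, beta

def cv_predictive_correlation(X, y, lam=1.0, k=5, seed=0):
    """k-fold cross-validation: correlation of observed vs pooled out-of-fold predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    y_hat = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        x_mean, y_mean, beta = ridge_fit(X[train_idx], y[train_idx], lam)
        y_hat[test_idx] = y_mean + (X[test_idx] - x_mean) @ beta
    return np.corrcoef(y, y_hat)[0, 1]

sim = np.random.default_rng(1)
X = sim.integers(0, 3, size=(200, 50)).astype(float)  # 200 hypothetical lines, 50 markers
true_effects = np.zeros(50)
true_effects[:5] = 1.0                                # five causal markers
y = X @ true_effects + sim.normal(0.0, 0.5, size=200)
r = cv_predictive_correlation(X, y)
print(round(r, 3))
```

A pedigree-based comparison would replace the marker matrix with pedigree relationships; the cross-validation loop would stay the same.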
Predicting performance for ecological restoration: A case study using Spartina alterniflora
Travis, S.E.; Grace, J.B.
2010-01-01
The success of population-based ecological restoration relies on the growth and reproductive performance of selected donor materials, whether consisting of whole plants or seed. Accurately predicting performance requires an understanding of a variety of underlying processes, particularly gene flow and selection, which can be measured, at least in part, using surrogates such as neutral marker genetic distances and simple latitudinal effects. Here we apply a structural equation modeling approach to understanding and predicting performance in a widespread salt marsh grass, Spartina alterniflora, commonly used for ecological restoration throughout its native range in North America. We collected source materials from throughout this range, consisting of eight clones each from 23 populations, for transplantation to a common garden site in coastal Louisiana and monitored their performance. We modeled performance as a latent process described by multiple indicator variables (e.g., clone diameter, stem number) and estimated direct and indirect influences of geographic and genetic distances on performance. Genetic distances were determined by comparison of neutral molecular markers with those from a local population at the common garden site. Geographic distance metrics included dispersal distance (the minimum distance over water between donor and experimental sites) and latitude. Model results indicate direct effects of genetic distance and latitude on performance variation among the donor sites. Standardized effect strengths indicate that performance was roughly twice as sensitive to variation in genetic distance as to latitudinal variation. Dispersal distance had an indirect influence on performance through effects on genetic distance, indicating a typical pattern of genetic isolation by distance. Latitude also had an indirect effect on genetic distance through its linear relationship with dispersal distance. 
Three performance indicators had significant loadings on performance alone (mean clone diameter, mean number of stems, mean number of inflorescences), while the performance indicators mean stem height and mean stem width were also influenced by latitude. We suggest that dispersal distance and latitude should provide an adequate means of predicting performance in future S. alterniflora restorations and propose a maximum sampling distance of 300 km (holding latitude constant) to avoid the sampling of inappropriate ecotypes. © 2010 by the Ecological Society of America.
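In a recursive path model, an indirect effect is the product of the standardized coefficients along a directed path, and a total effect is the sum over all directed paths. The sketch below applies that rule to a hypothetical diagram that echoes the structure described above. The coefficient values and signs are invented, chosen only so that the genetic-distance path is twice the latitude path. Unanalyzed correlations are not handled.

```python
def total_effect(paths, source, target):
    """Sum over all directed paths from source to target of the product of path coefficients."""
    if source == target:
        return 1.0
    return sum(coef * total_effect(paths, child, target)
               for child, coef in paths.get(source, {}).items())

# Hypothetical standardized coefficients (not estimates from the study)
paths = {
    "latitude": {"dispersal": 0.5, "performance": -0.25},
    "dispersal": {"genetic_distance": 0.6},
    "genetic_distance": {"performance": -0.5},
}
print(total_effect(paths, "dispersal", "performance"))  # purely indirect, via genetic distance
print(total_effect(paths, "latitude", "performance"))   # direct plus indirect
```

With these made-up numbers, dispersal distance acts on performance only through genetic distance (0.6 × −0.5), mirroring the isolation-by-distance pattern described above.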
Prediction of breast cancer risk by genetic risk factors, overall and by hormone receptor status.
Hüsing, Anika; Canzian, Federico; Beckmann, Lars; Garcia-Closas, Montserrat; Diver, W Ryan; Thun, Michael J; Berg, Christine D; Hoover, Robert N; Ziegler, Regina G; Figueroa, Jonine D; Isaacs, Claudine; Olsen, Anja; Viallon, Vivian; Boeing, Heiner; Masala, Giovanna; Trichopoulos, Dimitrios; Peeters, Petra H M; Lund, Eiliv; Ardanaz, Eva; Khaw, Kay-Tee; Lenner, Per; Kolonel, Laurence N; Stram, Daniel O; Le Marchand, Loïc; McCarty, Catherine A; Buring, Julie E; Lee, I-Min; Zhang, Shumin; Lindström, Sara; Hankinson, Susan E; Riboli, Elio; Hunter, David J; Henderson, Brian E; Chanock, Stephen J; Haiman, Christopher A; Kraft, Peter; Kaaks, Rudolf
2012-09-01
There is increasing interest in adding common genetic variants identified through genome-wide association studies (GWAS) to breast cancer risk prediction models. First results from such models showed modest benefits in terms of risk discrimination. Heterogeneity of breast cancer as defined by hormone receptor status has not been considered in this context. In this study we investigated the predictive capacity of 32 GWAS-detected common variants for breast cancer risk, alone and in combination with classical risk factors, and for tumours with different hormone receptor status. Within the Breast and Prostate Cancer Cohort Consortium, we analysed 6009 invasive breast cancer cases and 7827 matched controls of European ancestry, with data on classical breast cancer risk factors and 32 common gene variants identified through GWAS. Discriminatory ability with respect to breast cancer of specific hormone receptor status was assessed with the age- and cohort-adjusted concordance statistic (AUROC(a)). Absolute risk scores were calculated with external reference data. Integrated discrimination improvement was used to measure improvements in risk prediction. We found a small but steady increase in discriminatory ability with increasing numbers of genetic variants included in the model (difference in AUROC(a) going from 2.7% to 4%). Discriminatory ability for all models varied strongly by hormone receptor status. Adding information on common polymorphisms provides small but statistically significant improvements in the quality of breast cancer risk prediction models. We consistently observed better performance for receptor-positive cases, but the gain in discriminatory quality is not sufficient for clinical application.
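The concordance statistic (AUROC) is the probability that a randomly chosen case receives a higher risk than a randomly chosen control. Integrated discrimination improvement (IDI) is the gain in mean predicted risk among cases minus the gain among controls when moving from an old model to a new one. The sketch below is unadjusted, unlike the age- and cohort-adjusted AUROC(a) above, and the toy risks are hypothetical.

```python
def auroc(y_true, scores):
    """Concordance statistic: P(score_case > score_control), with ties counted as 1/2."""
    cases = [s for y, s in zip(y_true, scores) if y == 1]
    controls = [s for y, s in zip(y_true, scores) if y == 0]
    concordant = 0.0
    for a in cases:
        for b in controls:
            if a > b:
                concordant += 1.0
            elif a == b:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

def idi(y_true, p_old, p_new):
    """Integrated discrimination improvement of a new risk model over an old one."""
    def mean(xs):
        return sum(xs) / len(xs)
    case_gain = mean([n - o for y, o, n in zip(y_true, p_old, p_new) if y == 1])
    control_gain = mean([n - o for y, o, n in zip(y_true, p_old, p_new) if y == 0])
    return case_gain - control_gain

# Hypothetical risks from a classical-factor model (old) and one adding variants (new)
y = [1, 1, 0, 0]
p_old = [0.30, 0.20, 0.25, 0.10]
p_new = [0.35, 0.25, 0.20, 0.10]
print(auroc(y, p_old), auroc(y, p_new), round(idi(y, p_old, p_new), 3))
```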
Langenstein, Christoph; Schork, Diana; Badenhoop, Klaus; Herrmann, Eva
2016-12-01
Graves' disease (GD) is an important and prevalent thyroid autoimmune disorder. Standard therapy for GD consists of antithyroid drugs (ATD) with treatment periods of around 12 months, but relapse is frequent. Since predictors of relapse are difficult to identify, individual decision making about optimal treatment is often arbitrary. After reviewing the literature on this topic, we summarize the important factors involved in GD with respect to their potential for predicting relapse from markers measured before and after treatment. This information was used to design a mathematical model integrating thyroid hormone parameters, thyroid size, antibody titers and a complex algorithm encompassing genetic predisposition, environmental exposures and current immune activity in order to arrive at a prognostic index for relapse risk after treatment. In the search for a tool to analyze and predict relapse in GD, mathematical modeling is a promising approach. By analogy with mathematical modeling approaches in other diseases such as viral infections, we developed a differential equation model on the basis of published clinical trials in patients with GD. Although our model needs further evaluation to be applicable in a clinical context, it provides a perspective for an important contribution to a final statistical prediction model.
Bernardo, R
1996-11-01
Best linear unbiased prediction (BLUP) has been found to be useful in maize (Zea mays L.) breeding. The advantage of including both testcross additive and dominance effects (Intralocus Model) in BLUP, rather than only testcross additive effects (Additive Model), has not been clearly demonstrated. The objective of this study was to compare the usefulness of Intralocus and Additive Models for BLUP of maize single-cross performance. Multilocation data from 1990 to 1995 were obtained from the hybrid testing program of Limagrain Genetics. Grain yield, moisture, stalk lodging, and root lodging of untested single crosses were predicted from (1) the performance of tested single crosses and (2) known genetic relationships among the parental inbreds. Correlations between predicted and observed performance were obtained with a delete-one cross-validation procedure. For the Intralocus Model, the correlations ranged from 0.50 to 0.66 for yield, 0.88 to 0.94 for moisture, 0.47 to 0.69 for stalk lodging, and 0.31 to 0.45 for root lodging. The BLUP procedure was consistently more effective with the Intralocus Model than with the Additive Model. When the Additive Model was used instead of the Intralocus Model, the reductions in the correlation were largest for root lodging (0.06-0.35), smallest for moisture (0.00-0.02), and intermediate for yield (0.02-0.06) and stalk lodging (0.02-0.08). The ratio of dominance variance (VD) to total genetic variance (VG) was highest for root lodging (0.47) and lowest for moisture (0.10). The Additive Model may be used if prior information indicates that VD for a given trait contributes little to VG. Otherwise, the continued use of the Intralocus Model for BLUP of single-cross performance is recommended.
Evaluation of non-additive genetic variation in feed-related traits of broiler chickens.
Li, Y; Hawken, R; Sapp, R; George, A; Lehnert, S A; Henshall, J M; Reverter, A
2017-03-01
Genome-wide association mapping and genomic predictions of phenotype of individuals in livestock are predominantly based on the detection and estimation of additive genetic effects. Non-additive genetic effects are largely ignored. Studies in animals, plants, and humans to assess the impact of non-additive genetic effects in genetic analyses have led to differing conclusions. In this paper, we examined the consequences of including non-additive genetic effects in genome-wide association mapping and genomic prediction of total genetic values in a commercial population of 5,658 broiler chickens genotyped for 45,176 single nucleotide polymorphism (SNP) markers. We employed mixed-model equations and restricted maximum likelihood to analyze 7 feed-related traits (TRT1 to TRT7). Dominance variance accounted for a significant proportion of the total genetic variance in all 7 traits, ranging from 29.5% for TRT1 to 58.4% for TRT7. Using a 5-fold cross-validation scheme, we found that in spite of the large dominance component, including the estimated dominance effects in the prediction of total genetic values did not improve the accuracy of the predictions for any of the phenotypes. We offer some possible explanations for this counterintuitive result, including the possible confounding of dominance deviations with common environmental effects such as hatch, different directional effects of SNP additive and dominance variations, and the failure of gene-gene interactions to contribute to the level of variance. © 2016 Poultry Science Association Inc.
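Including dominance in genomic prediction requires a second genotype coding alongside the additive one. In one common parameterization, assumed here and not necessarily the one used in the paper, the additive covariate is the allele count centred at the heterozygote (−1, 0, 1) and the dominance covariate flags heterozygotes. A total genetic value then sums both contributions. The SNP effects below are hypothetical.

```python
def additive_code(genotypes):
    """Additive coding: allele count (0, 1, 2) centred at the heterozygote -> (-1, 0, 1)."""
    return [g - 1 for g in genotypes]

def dominance_code(genotypes):
    """Dominance coding: 1 for heterozygotes, 0 for either homozygote."""
    return [1 if g == 1 else 0 for g in genotypes]

def total_genetic_value(genotypes, add_effects, dom_effects):
    """Sum of additive and dominance contributions across SNPs."""
    x, h = additive_code(genotypes), dominance_code(genotypes)
    return (sum(a * xi for a, xi in zip(add_effects, x))
            + sum(d * hi for d, hi in zip(dom_effects, h)))

bird = [0, 1, 2]                          # hypothetical genotypes at three SNPs
a_eff, d_eff = [0.2, 0.1, -0.3], [0.05, 0.4, 0.1]
print(additive_code(bird), dominance_code(bird), total_genetic_value(bird, a_eff, d_eff))
```

Dropping the dominance term reduces this to the purely additive prediction that the abstract found to be just as accurate in cross-validation.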
Velo-Antón, G; Parra, J L; Parra-Olea, G; Zamudio, K R
2013-06-01
Tropical montane taxa are often locally adapted to very specific climatic conditions, contributing to their lower dispersal potential across complex landscapes. Climate and landscape features in montane regions affect population genetic structure in predictable ways, yet few empirical studies quantify the effects of both factors in shaping genetic structure of montane-adapted taxa. Here, we considered temporal and spatial variability in climate to explain contemporary genetic differentiation between populations of the montane salamander, Pseudoeurycea leprosa. Specifically, we used ecological niche modelling (ENM) and measured spatial connectivity and gene flow (using both mtDNA and microsatellite markers) across extant populations of P. leprosa in the Trans-Mexican Volcanic Belt (TVB). Our results indicate significant spatial and genetic isolation among populations, but we cannot distinguish between isolation by distance over time or current landscape barriers as mechanisms shaping population genetic divergences. Combining ecological niche modelling, spatial connectivity analyses, and historical and contemporary genetic signatures from different classes of genetic markers allows for inference of historical evolutionary processes and predictions of the impacts future climate change will have on the genetic diversity of montane taxa with low dispersal rates. Pseudoeurycea leprosa is one montane species among many endemic to this region and thus is a case study for the continued persistence of spatially and genetically isolated populations in the highly biodiverse TVB of central Mexico. © 2013 John Wiley & Sons Ltd.
How Complex, Probable, and Predictable is Genetically Driven Red Queen Chaos?
Duarte, Jorge; Rodrigues, Carla; Januário, Cristina; Martins, Nuno; Sardanyés, Josep
2015-12-01
Coevolution between two antagonistic species has been widely studied theoretically for both ecologically and genetically driven Red Queen dynamics. A typical outcome of these systems is oscillatory behavior, producing an endless series of adaptations by one species and counter-adaptations by the other. More recently, a mathematical model combining a three-species food chain system with an adaptive dynamics approach revealed genetically driven chaotic Red Queen coevolution. In the present article, we analyze this mathematical model, focusing mainly on the impact of the species' rates of evolution (mutation rates) on the dynamics. First, we analytically prove the boundedness of the trajectories of the chaotic attractor. The complexity of the coupling between the dynamical variables is quantified using observability indices. Using symbolic dynamics theory, we quantify the complexity of genetically driven Red Queen chaos by computing the topological entropy of existing one-dimensional iterated maps using Markov partitions. Codimension-two bifurcation diagrams are also built from the period ordering of the orbits of the maps. Then, we study the predictability of the Red Queen chaos, found in narrow regions of mutation rates. To extend the previous analyses, we also computed the likelihood of finding chaos in a given region of the parameter space while varying other model parameters simultaneously. Such analyses allowed us to compute a mean predictability measure for the system in the explored region of the parameter space. We found that genetically driven Red Queen chaos, although restricted to small regions of the analyzed parameter space, might be highly unpredictable.
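For a one-dimensional map with a Markov partition, the topological entropy equals the natural logarithm of the spectral radius of the partition's 0/1 transition matrix. This is a standard result of symbolic dynamics. The Python sketch below computes that quantity. The example matrix is a hypothetical two-interval partition, not one of the maps analysed in the paper.

```python
import numpy as np


def topological_entropy(transition):
    """Topological entropy of the symbolic dynamics defined by a 0/1
    transition matrix of a Markov partition: ln of its spectral radius."""
    A = np.asarray(transition, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        raise ValueError("transition matrix must be square")
    spectral_radius = np.max(np.abs(np.linalg.eigvals(A)))
    return float(np.log(spectral_radius))


# Hypothetical 2-interval partition: interval 0 maps over both intervals,
# interval 1 maps over interval 0 only (the "golden mean" shift).
golden = [[1, 1], [1, 0]]
print(topological_entropy(golden))  # ln((1 + sqrt(5)) / 2), about 0.4812
```

A positive entropy signals chaotic symbolic dynamics. An entropy of zero corresponds to, for example, a single periodic orbit.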
NASA Astrophysics Data System (ADS)
Rohman, Muhamad Nur; Hidayat, Mas Irfan P.; Purniawan, Agung
2018-04-01
Neural networks (NN) have been widely used for fatigue life prediction. For polymer-based composites, an NN model must be developed that copes with limited fatigue data and can predict fatigue life under varying stress amplitudes at different stress ratios. In the present paper, a multilayer perceptron (MLP) neural network is developed, and a genetic algorithm is employed to optimize the NN weights for predicting the fatigue life of polymer-based composite materials under variable amplitude loading. Simulation results for two composite systems, E-glass fabrics/epoxy (layup [(±45)/(0)2]S) and E-glass/polyester (layup [90/0/±45/0]S), show that NN models trained with fatigue data from two stress ratios, representing limited fatigue data, can predict fatigue life at another four and seven stress ratios, respectively, with high accuracy. The accuracy of the NN predictions was quantified by a small mean square error (MSE). When 33% of the total fatigue data were used for training, the NN model produced high accuracy for all stress ratios. When less fatigue data were used for training (22% of the total), the NN model still produced a high coefficient of determination between predicted and experimental results.
Zhang, Daqing; Xiao, Jianfeng; Zhou, Nannan; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian
2015-01-01
The blood-brain barrier (BBB) is a highly complex physical barrier that determines which substances are allowed to enter the brain. Support vector machine (SVM) is a kernel-based machine learning method that is widely used in QSAR studies. For a successful SVM model, the kernel parameters and the feature subset selection are the most important factors affecting prediction accuracy. In most studies they are treated as two independent problems, but it has been shown that they can affect each other. We designed and implemented a genetic algorithm (GA) to optimize kernel parameters and feature subset selection for SVM regression, and applied it to BBB penetration prediction. The results show that our GA/SVM model is more accurate than other currently available log BB models. Therefore, optimizing both the SVM parameters and the feature subset simultaneously with a genetic algorithm is a better approach than methods that treat the two problems separately. Analysis of our log BB model suggests that the carboxylic acid group, polar surface area (PSA)/hydrogen-bonding ability, lipophilicity, and molecular charge play important roles in BBB penetration. Among these properties, lipophilicity could enhance BBB penetration, while all the others are negatively correlated with it. PMID:26504797
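The GA loop that searches over candidate solutions can be sketched in Python for bit-string chromosomes, such as a feature-subset mask. This is an illustrative assumption, not the authors' implementation. Their chromosome also encodes the SVM kernel parameters, and their fitness is an SVM model score. The operators chosen here (size-2 tournament selection, one-point crossover, bit-flip mutation and elitism) are common defaults, and `toy_score` is a hypothetical stand-in for a cross-validated model score.

```python
import random


def genetic_algorithm(fitness, n_bits, pop_size=30, generations=60,
                      crossover_rate=0.9, mutation_rate=None, seed=0):
    """Maximize fitness(bits) over fixed-length 0/1 lists.

    Tournament selection (size 2), one-point crossover, bit-flip mutation,
    and elitism: the best individual is always copied to the next generation.
    """
    rng = random.Random(seed)
    if mutation_rate is None:
        mutation_rate = 1.0 / n_bits
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism keeps the best solution found so far
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate and n_bits > 1:
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    # Hypothetical objective: features 0, 2 and 5 are "useful"; every other
    # selected feature costs a small penalty.
    useful = {0, 2, 5}

    def toy_score(mask):
        gain = sum(mask[i] for i in useful)
        penalty = 0.1 * sum(bit for i, bit in enumerate(mask) if i not in useful)
        return gain - penalty

    print(genetic_algorithm(toy_score, n_bits=8))
```

With a real SVM objective, each fitness call is a full cross-validated fit. Caching scores per chromosome would then be worthwhile.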
Belay, T K; Dagnachew, B S; Boison, S A; Ådnøy, T
2018-03-28
Milk infrared spectra are routinely used for phenotyping traits of interest through links developed between the traits and spectra. Predicted individual traits are then used in genetic analyses for estimated breeding value (EBV) or for phenotypic predictions using a single-trait mixed model; this approach is referred to as indirect prediction (IP). An alternative approach [direct prediction (DP)] is a direct genetic analysis of (a reduced dimension of) the spectra using a multitrait model to predict multivariate EBV of the spectral components and, ultimately, also to predict the univariate EBV or phenotype for the traits of interest. We simulated 3 traits under different genetic (low: 0.10 to high: 0.90) and residual (zero to high: ±0.90) correlation scenarios between the 3 traits and assumed the first trait is a linear combination of the other 2 traits. The aim was to compare the IP and DP approaches for predictions of EBV and phenotypes under the different correlation scenarios. We also evaluated relationships between the performance of the 2 approaches and the accuracy of calibration equations. Moreover, we evaluated how using different regression coefficients, estimated from simulated phenotypes (β_p), true breeding values (β_g), and residuals (β_r), affected the performance of the 2 approaches. The simulated data contained 2,100 parents (100 sires and 2,000 cows) and 8,000 offspring (4 offspring per cow). Of the 8,000 observations, 2,000 were randomly selected and used to develop links between the first and the other 2 traits using partial least squares (PLS) regression analysis. The different PLS regression coefficients, namely β_p, β_g, and β_r, were used in subsequent predictions following the IP and DP approaches. We used BLUP analyses for the remaining 6,000 observations using the true (co)variance components that had been used for the simulation.
Accuracy of prediction (of EBV and phenotype) was calculated as the correlation between predicted and true values from the simulations. The results showed that accuracies of EBV prediction were higher in the DP than in the IP approach. The reverse was true for accuracy of phenotypic prediction when using β_p, but not when using β_g and β_r, where accuracy of phenotypic prediction in the DP was slightly higher than in the IP approach. Within the DP approach, accuracies of EBV when using β_g were higher than when using β_p only in the low genetic correlation scenario. However, we found no differences in EBV prediction accuracy between β_p and β_g in the IP approach. Accuracy of the calibration models increased with an increase in genetic and residual correlations between the traits. Performance of both approaches increased with an increase in accuracy of the calibration models. In conclusion, the DP approach is a good strategy for EBV prediction but not for phenotypic prediction, where the classical PLS regression-based equations or the IP approach provided better results. The Authors. Published by FASS Inc. and Elsevier Inc. on behalf of the American Dairy Science Association®. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
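In this study, accuracy is defined as the correlation between predicted values (EBV or phenotype) and the simulated true values. A minimal Python sketch follows, assuming the ordinary Pearson product-moment correlation is meant.

```python
import math


def prediction_accuracy(predicted, true):
    """Accuracy as the Pearson correlation between predicted and true values."""
    n = len(predicted)
    if n != len(true) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(predicted) / n
    mean_t = sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, true))
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    ss_t = sum((t - mean_t) ** 2 for t in true)
    if ss_p == 0 or ss_t == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_p * ss_t)
```

On Python 3.10 or later, `statistics.correlation` computes the same quantity. Because correlation ignores scale and offset, a biased but well-ranked predictor can still score highly.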
DOE Office of Scientific and Technical Information (OSTI.GOV)
De Ruyck, Kim, E-mail: kim.deruyck@UGent.be; Sabbe, Nick; Oberije, Cary
2011-10-01
Purpose: To construct a model for the prediction of acute esophagitis in lung cancer patients receiving chemoradiotherapy by combining clinical data, treatment parameters, and genotyping profile. Patients and Methods: Data were available for 273 lung cancer patients treated with curative chemoradiotherapy. Clinical data included gender, age, World Health Organization performance score, nicotine use, diabetes, chronic disease, tumor type, tumor stage, lymph node stage, tumor location, and medical center. Treatment parameters included chemotherapy, surgery, radiotherapy technique, tumor dose, mean fractionation size, mean and maximal esophageal dose, and overall treatment time. A total of 332 genetic polymorphisms were considered in 112 candidate genes. The prediction model was built using lasso logistic regression for predictor selection, followed by classic logistic regression for unbiased estimation of the coefficients. Performance of the model was expressed as the area under the receiver operating characteristic curve and as the false-negative rate at the optimal point on the receiver operating characteristic curve. Results: A total of 110 patients (40%) developed acute esophagitis Grade ≥2 (Common Terminology Criteria for Adverse Events v3.0). The final model contained chemotherapy treatment, lymph node stage, mean esophageal dose, gender, overall treatment time, radiotherapy technique, rs2302535 (EGFR), rs16930129 (ENG), rs1131877 (TRAF3), and rs2230528 (ITGB2). The area under the curve was 0.87, and the false-negative rate was 16%. Conclusion: Prediction of acute esophagitis can be improved by combining clinical, treatment, and genetic factors. A multicomponent prediction model for acute esophagitis with a sensitivity of 84% was constructed with two clinical parameters, four treatment parameters, and four genetic polymorphisms.
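The two performance measures reported here can be sketched in Python: the ROC AUC and the false-negative rate at an "optimal" ROC point. The abstract does not say how the optimal point was chosen. This sketch assumes Youden's J (sensitivity + specificity − 1), which is only one common choice. Note that sensitivity = 1 − FNR, which is how an FNR of 16% corresponds to the reported 84% sensitivity. The lasso-then-logistic model fitting is not shown.

```python
def auc(scores, labels):
    """ROC AUC as the probability that a random case outscores a random
    control (Mann-Whitney form); ties count one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def fnr_at_youden_point(scores, labels):
    """Return (threshold, false-negative rate) at the cutoff that maximizes
    Youden's J; a subject is predicted positive when score >= threshold."""
    n_cases = sum(1 for y in labels if y == 1)
    n_controls = sum(1 for y in labels if y == 0)
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    best = None  # (J, threshold, FNR); first maximum wins on ties
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sensitivity = tp / n_cases
        specificity = tn / n_controls
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, t, 1 - sensitivity)
    return best[1], best[2]
```

Other criteria, such as the ROC point closest to (0, 1) or a fixed target sensitivity, can select a different threshold and therefore a different FNR.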
Data Based Prediction of Blood Glucose Concentrations Using Evolutionary Methods.
Hidalgo, J Ignacio; Colmenar, J Manuel; Kronberger, Gabriel; Winkler, Stephan M; Garnica, Oscar; Lanchares, Juan
2017-08-08
Predicting glucose values on the basis of insulin and food intake is a difficult task that people with diabetes must perform daily. This is necessary because glucose levels must be kept at appropriate values to avoid both short-term and long-term complications of the illness. Artificial intelligence in general, and machine learning techniques in particular, have already led to promising results in modeling and predicting glucose concentrations. In this work, several machine learning techniques are used to model and predict glucose concentrations, using as inputs the values measured by a continuous glucose monitoring system as well as previous and estimated future carbohydrate intakes and insulin injections. In particular, we use the following four techniques: genetic programming, random forests, k-nearest neighbors, and grammatical evolution. We propose two new enhanced modeling algorithms for glucose prediction, namely (i) a variant of grammatical evolution that uses an optimized grammar, and (ii) a variant of tree-based genetic programming that uses a three-compartment model for carbohydrate and insulin dynamics. The predictors were trained and tested using data from ten patients at a public hospital in Spain. We analyze our experimental results using the Clarke error grid metric and find that 90% of the forecasts are correct (i.e., Clarke error categories A and B), but even the best methods still produce 5 to 10% serious errors (category D) and approximately 0.5% very serious errors (category E). We also propose an enhanced genetic programming algorithm that incorporates a three-compartment model into symbolic regression models to create smoothed time series of the original carbohydrate and insulin time series.
Kershenbaum, Arik; Blank, Lior; Sinai, Iftach; Merilä, Juha; Blaustein, Leon; Templeton, Alan R
2014-06-01
When populations reside within a heterogeneous landscape, isolation by distance may not be a good predictor of genetic divergence if dispersal behaviour and therefore gene flow depend on landscape features. Commonly used approaches linking landscape features to gene flow include the least cost path (LCP), random walk (RW), and isolation by resistance (IBR) models. However, none of these models is likely to be the most appropriate for all species and in all environments. We compared the performance of LCP, RW and IBR models of dispersal with the aid of simulations conducted on artificially generated landscapes. We also applied each model to empirical data on the landscape genetics of the endangered fire salamander, Salamandra infraimmaculata, in northern Israel, where conservation planning requires an understanding of the dispersal corridors. Our simulations demonstrate that wide dispersal corridors of the low-cost environment facilitate dispersal in the IBR model, but inhibit dispersal in the RW model. In our empirical study, IBR explained the genetic divergence better than the LCP and RW models (partial Mantel correlation 0.413 for IBR, compared to 0.212 for LCP, and 0.340 for RW). Overall dispersal cost in salamanders was also well predicted by landscape feature slope steepness (76%), and elevation (24%). We conclude that fire salamander dispersal is well characterised by IBR predictions. Together with our simulation findings, these results indicate that wide dispersal corridors facilitate, rather than hinder, salamander dispersal. Comparison of genetic data to dispersal model outputs can be a useful technique in inferring dispersal behaviour from population genetic data.
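Comparing genetic distances with model-derived (LCP, RW or IBR) distances is typically done with Mantel-type tests. The study reports partial Mantel correlations, which additionally control for a third distance matrix. The Python sketch below shows only the simpler, non-partial Mantel test with a one-sided permutation p-value. The resistance or cost distance matrices are assumed to have been computed elsewhere.

```python
import numpy as np


def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """Simple (non-partial) Mantel test.

    Returns the Pearson correlation between the upper triangles of two square
    distance matrices and a one-sided permutation p-value, obtained by
    permuting the rows and columns of dist_b together.
    """
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    if a.ndim != 2 or a.shape != b.shape or a.shape[0] != a.shape[1]:
        raise ValueError("inputs must be square matrices of the same shape")
    n = a.shape[0]
    iu = np.triu_indices(n, k=1)  # each population pair counted once
    x = a[iu]
    r_obs = np.corrcoef(x, b[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        p = rng.permutation(n)
        r_perm = np.corrcoef(x, b[np.ix_(p, p)][iu])[0, 1]
        if r_perm >= r_obs:
            hits += 1
    return float(r_obs), (hits + 1) / (permutations + 1)
```

Rows and columns are permuted jointly because the individual matrix entries are not independent; only the labels of the sampled populations are exchangeable.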
Are genetically robust regulatory networks dynamically different from random ones?
NASA Astrophysics Data System (ADS)
Sevim, Volkan; Rikvold, Per Arne
We study a genetic regulatory network model developed to demonstrate that genetic robustness can evolve through stabilizing selection for optimal phenotypes. We report preliminary results on whether such selection could result in a reorganization of the state space of the system. For the chosen parameters, the evolution moves the system slightly toward the more ordered part of the phase diagram. We also find that strong memory effects cause the Derrida annealed approximation to give erroneous predictions about the model's phase diagram.
Dunlop, Malcolm G.; Tenesa, Albert; Farrington, Susan M.; Ballereau, Stephane; Brewster, David H.; Pharoah, Paul DP.; Schafmayer, Clemens; Hampe, Jochen; Völzke, Henry; Chang-Claude, Jenny; Hoffmeister, Michael; Brenner, Hermann; von Holst, Susanna; Picelli, Simone; Lindblom, Annika; Jenkins, Mark A.; Hopper, John L.; Casey, Graham; Duggan, David; Newcomb, Polly; Abulí, Anna; Bessa, Xavier; Ruiz-Ponte, Clara; Castellví-Bel, Sergi; Niittymäki, Iina; Tuupanen, Sari; Karhu, Auli; Aaltonen, Lauri; Zanke, Brent W.; Hudson, Thomas J.; Gallinger, Steven; Barclay, Ella; Martin, Lynn; Gorman, Maggie; Carvajal-Carmona, Luis; Walther, Axel; Kerr, David; Lubbe, Steven; Broderick, Peter; Chandler, Ian; Pittman, Alan; Penegar, Steven; Campbell, Harry; Tomlinson, Ian; Houlston, Richard S.
2016-01-01
Objective Colorectal cancer (CRC) has a substantial heritable component. Common genetic variation has been shown to contribute to CRC risk. In a large, multi-population study, we set out to assess the feasibility of CRC risk prediction using common genetic variant data combined with other risk factors. We built a risk prediction model and applied it to the Scottish population using available data. Design Nine populations of European descent were studied to develop and validate colorectal cancer risk prediction models. Binary logistic regression was used to assess the combined effect of age, gender, family history (FH) and genotypes at 10 susceptibility loci that individually only modestly influence colorectal cancer risk. Risk models were generated from case-control data incorporating genotypes alone (n=39,266), and in combination with gender, age and family history (n=11,324). Model discriminatory performance was assessed using 10-fold internal cross-validation and externally using 4,187 independent samples. 10-year absolute risk was estimated by modelling genotype and FH with age- and gender-specific population risks. Results The median number of risk alleles was greater in cases than in controls (10 vs 9, p<2.2×10^−16), confirmed in external validation sets (Sweden p=1.2×10^−6, Finland p=2×10^−5). The mean per-allele increase in risk was 9% (OR 1.09; 95% CI 1.05–1.13). Discriminative performance was poor across the risk spectrum (area under curve (AUC) for genotypes alone = 0.57; AUC for genotype/age/gender/FH = 0.59). However, modelling genotype data, FH, age and gender with Scottish population data shows the practicality of identifying a subgroup with >5% predicted 10-year absolute risk. Conclusion We show that genotype data provide additional information that complements age, gender and FH as risk factors. However, individualized genetic risk prediction is not currently feasible.
Nonetheless, the modelling exercise suggests public health potential, since it is possible to stratify the population into CRC risk categories, thereby informing targeted prevention and surveillance. PMID:22490517
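The reported per-allele odds ratio (OR 1.09) and median risk-allele counts (10 in cases vs 9 in controls) can be turned into risk estimates with a simple log-additive allele-count model. The Python sketch below assumes that each extra risk allele multiplies the odds by the same factor. The reference count of 9 alleles and the baseline risk in the demo are illustrative assumptions. The published model also included age, gender and FH, and used population-specific risks.

```python
def odds_ratio_vs_reference(n_risk_alleles, per_allele_or=1.09, reference_alleles=9):
    """Odds ratio relative to a person carrying `reference_alleles` risk
    alleles, assuming each extra allele multiplies the odds by per_allele_or."""
    return per_allele_or ** (n_risk_alleles - reference_alleles)


def risk_from_odds_ratio(baseline_risk, odds_ratio):
    """Convert a baseline absolute risk into the risk implied by an odds ratio."""
    if not 0 < baseline_risk < 1:
        raise ValueError("baseline_risk must lie strictly between 0 and 1")
    odds = baseline_risk / (1 - baseline_risk) * odds_ratio
    return odds / (1 + odds)


if __name__ == "__main__":
    baseline = 0.02  # hypothetical 10-year risk at the reference allele count
    for n in (7, 9, 11, 13):
        or_n = odds_ratio_vs_reference(n)
        print(n, round(or_n, 3), round(risk_from_odds_ratio(baseline, or_n), 4))
```

The small effect per allele explains the modest AUC: even four extra alleles raise the odds by only about 41% (1.09^4 ≈ 1.41).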
Witkiewicz, Agnieszka K; Balaji, Uthra; Eslinger, Cody; McMillan, Elizabeth; Conway, William; Posner, Bruce; Mills, Gordon B; O'Reilly, Eileen M; Knudsen, Erik S
2016-08-16
Pancreatic ductal adenocarcinoma (PDAC) harbors the worst prognosis of any common solid tumor, and multiple failed clinical trials indicate therapeutic recalcitrance. Here, we use exome sequencing of patient tumors and find multiple conserved genetic alterations. However, the majority of tumors exhibit no clearly defined therapeutic target. High-throughput drug screens using patient-derived cell lines found rare examples of sensitivity to monotherapy, with most models requiring combination therapy. Using PDX models, we confirmed the effectiveness and selectivity of the identified treatment responses. Out of more than 500 single and combination drug regimens tested, no single treatment was effective for the majority of PDAC tumors, and each case had unique sensitivity profiles that could not be predicted using genetic analyses. These data indicate a shortcoming of reliance on genetic analysis to predict efficacy of currently available agents against PDAC and suggest that sensitivity profiling of patient-derived models could inform personalized therapy design for PDAC. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
Eaglen, Sophie A E; Coffey, Mike P; Woolliams, John A; Wall, Eileen
2012-07-28
The focus in dairy cattle breeding is gradually shifting from production to functional traits and genetic parameters of calving traits are estimated more frequently. However, across countries, various statistical models are used to estimate these parameters. This study evaluates different models for calving ease and stillbirth in United Kingdom Holstein-Friesian cattle. Data from first and later parity records were used. Genetic parameters for calving ease, stillbirth and gestation length were estimated using the restricted maximum likelihood method, considering different models i.e. sire (-maternal grandsire), animal, univariate and bivariate models. Gestation length was fitted as a correlated indicator trait and, for all three traits, genetic correlations between first and later parities were estimated. Potential bias in estimates was avoided by acknowledging a possible environmental direct-maternal covariance. The total heritable variance was estimated for each trait to discuss its theoretical importance and practical value. Prediction error variances and accuracies were calculated to compare the models. On average, direct and maternal heritabilities for calving traits were low, except for direct gestation length. Calving ease in first parity had a significant and negative direct-maternal genetic correlation. Gestation length was maternally correlated to stillbirth in first parity and directly correlated to calving ease in later parities. Multi-trait models had a slightly greater predictive ability than univariate models, especially for the lowly heritable traits. The computation time needed for sire (-maternal grandsire) models was much smaller than for animal models with only small differences in accuracy. The sire (-maternal grandsire) model was robust when additional genetic components were estimated, while the equivalent animal model had difficulties reaching convergence. 
For the evaluation of calving traits, multi-trait models show a slight advantage over univariate models. Extended sire models (-maternal grandsire) are more practical and robust than animal models. Estimated genetic parameters for calving traits of UK Holstein cattle are consistent with literature. Calculating an aggregate estimated breeding value including direct and maternal values should encourage breeders to consider both direct and maternal effects in selection decisions. PMID:22839757
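The Eaglen et al. study compares models through prediction error variances (PEV) and accuracies. A standard relation links the two: r = √(1 − PEV/σ²), where σ² is the variance of the effect being predicted. This is the additive (direct or maternal) genetic variance in an animal model, or the sire variance in a sire model. The Python sketch below assumes this conventional formula; the paper's exact computation is not given in the abstract.

```python
import math


def accuracy_from_pev(pev, genetic_variance):
    """Accuracy (correlation between true and estimated breeding values)
    implied by a prediction error variance: sqrt(1 - PEV / sigma^2)."""
    if genetic_variance <= 0:
        raise ValueError("genetic_variance must be positive")
    if not 0 <= pev <= genetic_variance:
        raise ValueError("PEV must lie between 0 and the genetic variance")
    return math.sqrt(1 - pev / genetic_variance)
```

The formula makes the comparison across models direct: a smaller PEV for the same trait variance means a higher accuracy.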
Tian, Tian; Salis, Howard M.
2015-01-01
Natural and engineered genetic systems require the coordinated expression of proteins. In bacteria, translational coupling provides a genetically encoded mechanism to control expression level ratios within multi-cistronic operons. We have developed a sequence-to-function biophysical model of translational coupling to predict expression level ratios in natural operons and to design synthetic operons with desired expression level ratios. To quantitatively measure ribosome re-initiation rates, we designed and characterized 22 bi-cistronic operon variants with systematically modified intergenic distances and upstream translation rates. We then derived a thermodynamic free energy model to calculate de novo initiation rates resulting from ribosome-assisted unfolding of intergenic RNA structures. The complete biophysical model has only five free parameters, but was able to accurately predict downstream translation rates for 120 synthetic bi-cistronic and tri-cistronic operons with rationally designed intergenic regions and systematically increased upstream translation rates. The biophysical model also accurately predicted the translation rates of the nine-protein atp operon, compared with ribosome profiling measurements. Altogether, the biophysical model quantitatively predicts how translational coupling controls protein expression levels in synthetic and natural bacterial operons, providing a deeper understanding of an important post-transcriptional regulatory mechanism and offering the ability to rationally engineer operons with desired behaviors. PMID:26117546
Coates, James; Jeyaseelan, Asha K; Ybarra, Norma; David, Marc; Faria, Sergio; Souhami, Luis; Cury, Fabio; Duclos, Marie; El Naqa, Issam
2015-04-01
We explore analytical and data-driven approaches to investigate the integration of genetic variations (single nucleotide polymorphisms [SNPs] and copy number variations [CNVs]) with dosimetric and clinical variables in modeling radiation-induced rectal bleeding (RB) and erectile dysfunction (ED) in prostate cancer patients. Sixty-two patients who underwent curative hypofractionated radiotherapy (66 Gy in 22 fractions) between 2002 and 2010 were retrospectively genotyped for CNV and SNP rs5489 in the xrcc1 DNA repair gene. Fifty-four patients had full dosimetric profiles. Two parallel modeling approaches were compared to assess the risk of severe RB (Grade⩾3) and ED (Grade⩾1): maximum-likelihood-estimated generalized Lyman-Kutcher-Burman (LKB) models and logistic regression. Statistical resampling based on cross-validation was used to evaluate model predictive power and generalizability to unseen data. Integration of the biological variables xrcc1 CNV and SNP improved the fit of the RB and ED analytical and data-driven models. Cross-validation of the generalized LKB models yielded increases in classification performance of 27.4% for RB and 14.6% for ED when xrcc1 CNV and SNP were included, respectively. Biological variables added to logistic regression modeling improved classification performance over standard dosimetric models by 33.5% for RB and 21.2% for ED models. As a proof of concept, we demonstrated that the combination of genetic and dosimetric variables can provide significant improvement in normal tissue complication probability (NTCP) prediction using analytical and data-driven approaches. The improvement in prediction performance was more pronounced in the data-driven approaches. Moreover, we have shown that CNVs, in addition to SNPs, may be useful structural genetic variants in predicting radiation toxicities. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
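The dosimetric core of the LKB approach can be shown in a short Python sketch of the standard model. NTCP = Φ((gEUD − TD50)/(m·TD50)), where gEUD = (Σᵢ vᵢ Dᵢ^(1/n))^n is computed from a differential dose-volume histogram with fractional volumes vᵢ. The study's generalized LKB adds biological covariates (xrcc1 CNV and SNP), which this sketch does not attempt. TD50, m and n here are placeholder inputs, not fitted values from the paper.

```python
import math


def geud(doses_gy, volumes, n):
    """Generalized equivalent uniform dose from a differential DVH.

    doses_gy: dose in each DVH bin; volumes: volume in each bin (any units,
    normalized here to fractions); n: LKB volume-effect parameter (n > 0).
    """
    total = sum(volumes)
    if n <= 0 or total <= 0:
        raise ValueError("n and the total volume must be positive")
    return sum((v / total) * d ** (1.0 / n) for d, v in zip(doses_gy, volumes)) ** n


def lkb_ntcp(doses_gy, volumes, td50, m, n):
    """Standard LKB NTCP: standard normal CDF of (gEUD - TD50) / (m * TD50)."""
    t = (geud(doses_gy, volumes, n) - td50) / (m * td50)
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
```

With n = 1 the gEUD reduces to the mean dose; as n approaches 0 it approaches the maximum dose, which suits serial organs such as the rectum.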
Peter J. Gould; Constance A. Harrington; Bradley J. St Clair
2011-01-01
Models to predict budburst and other phenological events in plants are needed to forecast how climate change may impact ecosystems and for the development of mitigation strategies. Differences among genotypes are important to predicting phenological events in species that show strong clinal variation in adaptive traits. We present a model that incorporates the effects...
Uemoto, Yoshinobu; Sasaki, Shinji; Kojima, Takatoshi; Sugimoto, Yoshikazu; Watanabe, Toshio
2015-11-19
Genetic variance that is not captured by single nucleotide polymorphisms (SNPs) is due to imperfect linkage disequilibrium (LD) between SNPs and quantitative trait loci (QTLs), and the extent of LD between SNPs and QTLs depends on differences in minor allele frequency (MAF) between them. To evaluate the impact of the MAF of QTLs on genomic evaluation, we performed a simulation study using real cattle genotype data. In total, 1368 Japanese Black cattle and 592,034 SNPs (Illumina BovineHD BeadChip) were used. We simulated phenotypes using real genotypes under different scenarios, varying the MAF categories, QTL heritability, number of QTLs, and distribution of QTL effects. After generating true breeding values and phenotypes, QTL heritability was estimated and the prediction accuracy of the genomic estimated breeding value (GEBV) was assessed under different SNP densities, prediction models, and population sizes using a reference-test validation design. The extent of LD between SNPs and QTLs in this population was higher for QTLs with high MAF than for those with low MAF. The effect of QTL MAF depended on the genetic architecture, the evaluation strategy, and the population size. Regarding genetic architecture, genomic evaluation was affected by the MAF of QTLs in combination with QTL heritability and the distribution of QTL effects. The number of QTLs did not affect genomic evaluation when it exceeded 50. Regarding evaluation strategy, we showed that SNP density and prediction model affect heritability estimation and genomic prediction, and that this effect depends on the MAF of QTLs. In addition, accurate QTL heritability and GEBV were obtained using denser SNP information and a prediction model that accounted for SNPs with low and high MAFs. Regarding population size, a large sample size is needed to increase the accuracy of GEBV. The MAF of QTLs had an impact on heritability estimation and prediction accuracy.
Most genetic variance can be captured using denser SNPs and a prediction model that accounts for MAF, but a large sample size is needed to increase the accuracy of GEBV in all QTL MAF categories.
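Why SNP-QTL LD depends on allele frequencies can be made concrete with the r² measure. Its attainable maximum is bounded by the two allele frequencies (a standard population-genetics result), so a common SNP cannot fully tag a rare QTL. A minimal Python sketch follows. The function names are illustrative, and the bound shown is for coupling-phase LD (D > 0).

```python
def ld_r2(freq_ab, freq_a, freq_b):
    """r^2 between two biallelic loci from the frequency of the A-B haplotype
    and the allele frequencies of A (locus 1) and B (locus 2)."""
    denom = freq_a * (1 - freq_a) * freq_b * (1 - freq_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d = freq_ab - freq_a * freq_b  # linkage disequilibrium coefficient D
    return d * d / denom


def max_r2(freq_a, freq_b):
    """Largest r^2 attainable with D > 0, given the two allele frequencies."""
    denom = freq_a * (1 - freq_a) * freq_b * (1 - freq_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d_max = min(freq_a * (1 - freq_b), (1 - freq_a) * freq_b)
    return d_max * d_max / denom
```

For example, `max_r2(0.5, 0.1)` is 1/9. A SNP with MAF 0.5 can therefore explain at most about 11% of the variance of a QTL with MAF 0.1, whereas loci with equal frequencies can reach r² = 1.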
Race, Genetic Ancestry and Response to Antidepressant Treatment for Major Depression
Murphy, Eleanor; Hou, Liping; Maher, Brion S; Woldehawariat, Girma; Kassem, Layla; Akula, Nirmala; Laje, Gonzalo; McMahon, Francis J
2013-01-01
The Sequenced Treatment Alternatives to Relieve Depression (STAR*D) Study revealed poorer antidepressant treatment response among black compared with white participants. This racial disparity persisted even after socioeconomic and baseline clinical factors were taken into account. Some studies have suggested genetic contributions to this disparity, but none have attempted to disentangle race and genetic ancestry. Here we used genome-wide single-nucleotide polymorphism (SNP) data to examine independent contributions of race and genetic ancestry to citalopram response. Secondary data analyses included 1877 STAR*D participants who completed an average of 10 weeks of citalopram treatment and provided DNA samples. Participants reported their race as White (n=1464), black (n=299) or other/mixed (n=114). Genetic ancestry was estimated by multidimensional scaling (MDS) analyses of about 500 000 SNPs. Ancestry proportions were estimated by STRUCTURE. Structural equation modeling was used to examine the direct and indirect effects of observed and latent predictors of response, defined as change in the Quick Inventory of Depressive Symptomatology (QIDS) score from baseline to exit. Socioeconomic and baseline clinical factors, race, and anxiety significantly predicted response, as previously reported. However, direct effects of race disappeared in all models that included genetic ancestry. Genetic African ancestry predicted lower treatment response in all models. Although socioeconomic and baseline clinical factors drive racial differences in antidepressant response, genetic ancestry, rather than self-reported race, explains a significant fraction of the residual differences. Larger samples would be needed to identify the specific genetic mechanisms that may be involved, but these findings underscore the importance of including more African-American patients in drug trials. PMID:23827886
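Genetic ancestry in this study was estimated by multidimensional scaling (MDS) of genome-wide SNP data. The Python sketch below shows classical (Torgerson) MDS applied to a precomputed pairwise distance matrix. The choice of genetic distance (for example, one minus identity-by-state) is an assumption. Genome-wide analyses of this size are normally run with dedicated genetics software rather than dense matrices like this.

```python
import numpy as np


def classical_mds(dist, k=2):
    """Classical (Torgerson) multidimensional scaling.

    dist: n x n symmetric matrix of pairwise distances.
    Returns an n x k coordinate matrix whose Euclidean distances approximate dist.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    b = -0.5 * j @ (d ** 2) @ j               # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(b)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]     # keep the k largest
    lam = np.clip(eigvals[order], 0, None)    # discard small negative noise
    return eigvecs[:, order] * np.sqrt(lam)
```

The leading MDS dimensions can then enter a regression or structural equation model as continuous ancestry covariates alongside self-reported race.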
Roff, Derek A; Fairbairn, Daphne J
2007-01-01
Predicting evolutionary change is the central goal of evolutionary biology because it is the primary means by which we can test evolutionary hypotheses. In this article, we analyze the pattern of evolutionary change in a laboratory population of the wing-dimorphic sand cricket Gryllus firmus resulting from relaxation of selection favoring the migratory (long-winged) morph. Based on a well-characterized trade-off between fecundity and flight capability, we predict that evolution in the laboratory environment should result in a reduction in the proportion of long-winged morphs. We also predict increased fecundity and reduced functionality and weight of the major flight muscles in long-winged females but little change in short-winged (flightless) females. Based on quantitative genetic theory, we predict that the regression equation describing the trade-off between ovary weight and weight of the major flight muscles will show a change in its intercept but not in its slope. Comparisons across generations verify all of these predictions. Further, using values of genetic parameters estimated from previous studies, we show that a quantitative genetic simulation model can account for not only the qualitative changes but also the evolutionary trajectory. These results demonstrate the power of combining quantitative genetic and physiological approaches for understanding the evolution of complex traits.
A call for tiger management using "reserves" of genetic diversity.
Bay, Rachael A; Ramakrishnan, Uma; Hadly, Elizabeth A
2014-01-01
Tigers (Panthera tigris), like many large carnivores, are threatened by anthropogenic impacts, primarily habitat loss and poaching. Current conservation plans for tigers focus on population expansion, with the goal of doubling census size in the next 10 years. Previous studies have shown that because the demographic decline was recent, tiger populations still retain a large amount of genetic diversity. Although maintaining this diversity is extremely important to avoid deleterious effects of inbreeding, management plans have yet to consider predictive genetic models. We used coalescent simulations based on previously sequenced mitochondrial fragments (n = 125) from 5 of 6 extant subspecies to predict the population growth needed to maintain current genetic diversity over the next 150 years. We found that the level of gene flow between populations has a large effect on the local population growth necessary to maintain genetic diversity, without which tigers may face decreases in fitness. In the absence of gene flow, we demonstrate that maintaining genetic diversity is impossible based on known demographic parameters for the species. Thus, managing for the genetic diversity of the species should be prioritized over the riskier preservation of distinct subspecies. These predictive simulations provide unique management insights, hitherto not possible using existing analytical methods.
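The study used coalescent simulations; the underlying intuition (larger populations lose diversity more slowly) can be illustrated with a much simpler deterministic approximation. The sketch below assumes a haploid, mtDNA-like Wright-Fisher population with no mutation, migration or selection, in which expected diversity is multiplied by (1 - 1/N_t) each generation. The population sizes and growth rates are hypothetical and are not the authors' parameters.

```python
def expected_diversity(h0, sizes):
    """Expected haploid (e.g. mtDNA) gene diversity after drift.

    Assumes a Wright-Fisher population with no mutation, migration or
    selection; sizes[t] is a hypothetical effective number of lineages in
    generation t. Each generation multiplies diversity by (1 - 1/N_t).
    """
    h = h0
    for n in sizes:
        if n < 1:
            raise ValueError("effective size must be >= 1")
        h *= 1.0 - 1.0 / n
    return h


def growth_trajectory(n0, rate, generations):
    """Hypothetical geometric growth: N_t = n0 * (1 + rate) ** t."""
    return [n0 * (1.0 + rate) ** t for t in range(generations)]


if __name__ == "__main__":
    h0 = 0.8  # hypothetical starting diversity
    for rate in (0.0, 0.05, 0.2):
        sizes = growth_trajectory(50, rate, 20)
        print(f"growth {rate:.2f}: H after 20 generations = "
              f"{expected_diversity(h0, sizes):.3f}")
```

Faster growth keeps N_t larger, so less diversity is lost per generation; gene flow, which the study found to be important, is deliberately left out of this toy model.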
Lopes, Fernando B; da Silva, Marcelo C; Marques, Ednira G; McManus, Concepta M
2012-12-01
This study was undertaken to estimate the genetic parameters and trends for asymptotic weight (A) and maturity rate (k) of Nellore cattle from northern Brazil. The data set was made available by the Brazilian Association of Zebu Breeders and collected between 1997 and 2007. The Von Bertalanffy, Brody, Gompertz, and logistic nonlinear models were fitted by the Gauss-Newton method to weight-age data of 45,895 animals, collected quarterly from birth to 750 days of age. The curve parameters were analyzed using the GLM and CORR procedures. The (co)variance components and genetic parameters were estimated using the MTDFREML software. The estimated heritability coefficients were 0.21 ± 0.013 and 0.25 ± 0.014 for asymptotic weight and maturity rate, respectively, indicating that selection for either trait should result in genetic progress in the herd. The genetic correlation between A and k was negative (-0.57 ± 0.03), indicating that selecting animals for a high maturity rate should result in a lower asymptotic weight. The Von Bertalanffy function is adequate to establish the mean growth pattern and to predict the adult weight of Nellore cattle. This model is more accurate in predicting the birth weight of these animals and has a better overall fit. The prediction of adult weight using nonlinear functions can be accurate when growth curve parameters and their (co)variance components are estimated jointly. The model used in this study can be applied to the prediction of mature weight in herds where a portion of the animals is culled before reaching adult age.
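For readers unfamiliar with the four growth functions, a minimal sketch of their usual three-parameter forms follows. The parameterizations are assumptions (conventions differ between studies, and the abstract does not give the exact forms), the parameter values are hypothetical, and the Gauss-Newton fitting step is not shown. In every form the curve approaches A as age increases, which is why A is read as mature weight and k as the rate of approach to it.

```python
import math

# Common three-parameter growth functions (assumed forms, not necessarily the
# exact parameterizations fitted in the study).
# A: asymptotic (mature) weight, B: integration constant tied to initial
# weight, k: maturity rate, t: age in days.

def brody(t, A, B, k):
    return A * (1.0 - B * math.exp(-k * t))

def von_bertalanffy(t, A, B, k):
    return A * (1.0 - B * math.exp(-k * t)) ** 3

def gompertz(t, A, B, k):
    return A * math.exp(-B * math.exp(-k * t))

def logistic(t, A, B, k):
    return A / (1.0 + B * math.exp(-k * t))

if __name__ == "__main__":
    A, B, k = 450.0, 0.6, 0.004  # hypothetical values for illustration
    for age in (0, 205, 365, 550, 750):
        print(age, round(von_bertalanffy(age, A, B, k), 1))
```

Because a larger k makes the curve approach A sooner, the negative genetic correlation between A and k reported above means early-maturing animals tend to reach a lower mature weight.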
USDA-ARS's Scientific Manuscript database
High-throughput phenotyping (HTP) platforms can be used to measure traits that are genetically correlated with wheat (Triticum aestivum L.) grain yield across time. Incorporating such secondary traits in the multivariate pedigree and genomic prediction models would be desirable to improve indirect s...
Hill, William G.
2014-01-01
Although animal breeding was practiced long before the science of genetics and the relevant disciplines of population and quantitative genetics were known, breeding programs have mainly relied on simply selecting and mating the best individuals on their own or relatives’ performance. This is based on sound quantitative genetic principles, developed and expounded by Lush, who attributed much of his understanding to Wright, and formalized in Fisher’s infinitesimal model. Analysis at the level of individual loci and gene frequency distributions has had relatively little impact. Now, with access to genomic data, a revolution is occurring in which molecular information is used to enhance response through “genomic selection”. The predictions of breeding value still utilize multiple loci throughout the genome and, indeed, are largely compatible with additive and specifically infinitesimal model assumptions. I discuss some of the history and genetic issues as applied to the science of livestock improvement, which has had and continues to have major spin-offs into ideas and applications in other areas. PMID:24395822
Chen, Yuhong; Zeng, Jiexi; Zhao, Chao; Wang, Kevin; Trood, Elizabeth; Buehler, Jeanette; Weed, Matthew; Kasuga, Daniel; Bernstein, Paul S.; Hughes, Guy; Fu, Victoria; Chin, Jessica; Lee, Clara; Crocker, Maureen; Bedell, Matthew; Salasar, Francesca; Yang, Zhenglin; Goldbaum, Michael; Ferreyra, Henry; Freeman, William R.; Kozak, Igor; Zhang, Kang
2014-01-01
Objectives: To evaluate the independent and joint effects of genetic factors and environmental variables on advanced forms of age-related macular degeneration (AMD), including geographic atrophy and choroidal neovascularization, and to develop a predictive model that includes genetic and environmental factors. Methods: Demographic information, including age at onset, smoking status, and body mass index, was collected for 1844 participants. Genotypes were evaluated for 8 variants in 5 genes related to AMD. Unconditional logistic regression analyses were performed to generate a risk predictive model. Results: All genetic variants showed a strong association with AMD. Multivariate odds ratios were 3.52 (95% confidence interval, 2.08-5.94) for complement factor H (CFH) rs1061170 CC, 4.21 (2.30-7.70) for CFH rs2274700 CC, 0.46 (0.27-0.80) for C2 rs9332739 CC/CG, 0.44 (0.30-0.66) for CFB rs641153 TT/CT, 10.99 (6.04-19.97) for HTRA1/LOC387715 rs10490924 TT, and 2.66 (1.43-4.96) for C3 rs2230199 GG. Smoking was independently associated with advanced AMD after controlling for age, sex, body mass index, and all genetic variants. Conclusion: CFH confers more risk to the bilaterality of geographic atrophy, whereas HTRA1/LOC387715 contributes more to the bilaterality of choroidal neovascularization. C3 confers more risk for geographic atrophy than for choroidal neovascularization. Risk models combining genetic and environmental factors have notable discrimination power. Clinical Relevance: Early detection and risk prediction of AMD could help to improve the prognosis of AMD and to reduce the incidence of blindness. Targeting high-risk individuals for surveillance and clinical interventions may help reduce disease burden. PMID:21402993
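Reported odds ratios with 95% confidence intervals can be converted back to the log-odds scale used inside a logistic risk model. The sketch below assumes each interval was computed symmetrically on the log scale (a Wald interval), which the abstract does not state; under that assumption the standard error is the width of the log-interval divided by 2 × 1.96. The two input rows reuse figures quoted above.

```python
import math

def log_or_se_from_ci(odds_ratio, lower, upper, z=1.96):
    """Back-calculate log(OR), its standard error and a Wald z statistic
    from an odds ratio and its 95% CI, assuming the CI is symmetric on the
    log scale."""
    log_or = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return log_or, se, log_or / se

if __name__ == "__main__":
    reported = {
        "CFH rs1061170 CC": (3.52, 2.08, 5.94),
        "HTRA1/LOC387715 rs10490924 TT": (10.99, 6.04, 19.97),
    }
    for name, (or_, lo, hi) in reported.items():
        b, se, zstat = log_or_se_from_ci(or_, lo, hi)
        print(f"{name}: log OR = {b:.3f}, SE = {se:.3f}, z = {zstat:.2f}")
```

In a logistic model, these log odds ratios are the coefficients that add up on the logit scale, so individual genotype effects combine multiplicatively on the odds scale.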
Energy Consumption Forecasting Using Semantic-Based Genetic Programming with Local Search Optimizer.
Castelli, Mauro; Trujillo, Leonardo; Vanneschi, Leonardo
2015-01-01
Energy consumption forecasting (ECF) is an important policy issue in today's economies. An accurate ECF has great benefits for electric utilities, and both negative and positive errors lead to increased operating costs. The paper proposes a semantic-based genetic programming framework to address the ECF problem. In particular, we propose a system that finds (quasi-)perfect solutions with high probability and that generates models able to produce near-optimal predictions also on unseen data. The framework blends a recently developed version of genetic programming that integrates semantic genetic operators with a local search method. The main idea in combining semantic genetic programming and a local searcher is to couple the exploration ability of the former with the exploitation ability of the latter. Experimental results confirm the suitability of the proposed method in predicting energy consumption. In particular, the system produces a lower error than the existing state-of-the-art techniques used on the same dataset. More importantly, this case study has shown that including a local searcher in the geometric semantic genetic programming system can speed up the search process and can result in fitter models that are able to produce accurate forecasts also on unseen data.
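Geometric semantic operators act on what a program computes rather than on its syntax. In the standard definitions, crossover blends two parents through a random function R with values in [0, 1], and mutation adds a bounded perturbation step × (R1 − R2). The sketch below represents programs as plain Python callables. The random functions and the least-squares "linear scaling" used as the local-search step are assumptions for illustration; they are not necessarily the local searcher used in the paper.

```python
import math
import random

def random_unit_function(rng):
    """Hypothetical random function with values in (0, 1): a logistic
    squashing of a random affine map."""
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    return lambda x: 1.0 / (1.0 + math.exp(-(a * x + b)))

def gs_crossover(t1, t2, rng):
    """Geometric semantic crossover: child(x) = R(x)*t1(x) + (1-R(x))*t2(x)."""
    r = random_unit_function(rng)
    return lambda x: r(x) * t1(x) + (1.0 - r(x)) * t2(x)

def gs_mutation(t, step, rng):
    """Geometric semantic mutation: t(x) + step * (R1(x) - R2(x))."""
    r1, r2 = random_unit_function(rng), random_unit_function(rng)
    return lambda x: t(x) + step * (r1(x) - r2(x))

def linear_scaling(t, xs, ys):
    """Assumed local-search step: least-squares fit of y ~ alpha + beta * t(x)."""
    s = [t(x) for x in xs]
    n = len(xs)
    ms, my = sum(s) / n, sum(ys) / n
    var = sum((v - ms) ** 2 for v in s)
    beta = sum((v - ms) * (y - my) for v, y in zip(s, ys)) / var if var > 0 else 0.0
    alpha = my - beta * ms
    return lambda x: alpha + beta * t(x)
```

Because crossover output always lies between the parents' outputs and mutation moves it by at most `step`, the semantic search space is explored smoothly; the fitted linear step then exploits each candidate cheaply.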
Symbiont diversity may help coral reefs survive moderate climate change.
Baskett, Marissa L; Gaines, Steven D; Nisbet, Roger M
2009-01-01
Given climate change, thermal stress-related mass coral-bleaching events present one of the greatest anthropogenic threats to coral reefs. While corals and their symbiotic algae may respond to future temperatures through genetic adaptation and shifts in community composition, the climate may change too rapidly for corals to respond. To test this potential for response, here we develop a model of coral and symbiont ecological dynamics and symbiont evolutionary dynamics. Model results without variation in symbiont thermal tolerance predict coral reef collapse within decades under multiple future climate scenarios, consistent with previous threshold-based predictions. However, model results with genetic or community-level variation in symbiont thermal tolerance can predict coral reef persistence into the next century, provided greenhouse gas emissions are low enough. Therefore, the level of greenhouse gas emissions will have a significant effect on the future of coral reefs, and accounting for biodiversity and biological dynamics is vital to estimating the size of this effect.
Bagley, Justin C; Sandel, Michael; Travis, Joseph; Lozano-Vilano, María de Lourdes; Johnson, Jerald B
2013-10-09
Climatic and sea-level fluctuations throughout the last Pleistocene glacial cycle (~130-0 ka) profoundly influenced present-day distributions and genetic diversity of Northern Hemisphere biotas by forcing range contractions in many species during the glacial advance and allowing expansion following glacial retreat ('expansion-contraction' model). Evidence for such range dynamics and refugia in the unglaciated Gulf-Atlantic Coastal Plain stems largely from terrestrial species, and the Pleistocene responses of aquatic species remain relatively uninvestigated. Heterandria formosa, a wide-ranging regional endemic, presents an ideal system to test the expansion-contraction model within this biota. By integrating ecological niche modeling and phylogeography, we infer the Pleistocene history of this livebearing fish (Poeciliidae) and test for several predicted distributional and genetic effects of the last glaciation. Paleoclimatic models predicted range contraction to a single southwest Florida peninsula refugium during the Last Glacial Maximum, followed by northward expansion. We inferred spatial-population subdivision into four groups that reflect genetic barriers outside this refuge. Several other features of the genetic data were consistent with predictions derived from an expansion-contraction model: limited intraspecific divergence (e.g. mean mtDNA p-distance = 0.66%); a pattern of mtDNA diversity (mean Hd = 0.934; mean π = 0.007) consistent with rapid, recent population expansion; a lack of mtDNA isolation-by-distance; and clinal variation in allozyme diversity with higher diversity at lower latitudes near the predicted refugium. Statistical tests of mismatch distributions and coalescent simulations of the gene tree lent greater support to a scenario of post-glacial expansion and diversification from a single refugium than to any other model examined (e.g. multiple-refugia scenarios). Congruent results from diverse data indicate that H. formosa fits the classic Pleistocene expansion-contraction model, even as the genetic data suggest additional ecological influences on population structure. While evidence for Plio-Pleistocene Gulf Coast vicariance is well described for many freshwater species presently codistributed with H. formosa, this species' demography and diversification depart notably from this pattern. Species-specific expansion-contraction dynamics may therefore have figured more prominently in shaping Coastal Plain evolutionary history than previously thought. Our findings bolster growing appreciation for the complexity of phylogeographical structuring within North America's southern refugia, including responses of Coastal Plain freshwater biota to Pleistocene climatic fluctuations.
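The diversity summaries quoted above (p-distance, haplotype diversity Hd, nucleotide diversity π) have standard definitions. A minimal sketch, assuming aligned, gap-free sequences of equal length and no substitution-model correction; the example sequences are made up:

```python
from collections import Counter
from itertools import combinations

def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def haplotype_diversity(seqs):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum p_i^2),
    where p_i are the sample frequencies of distinct haplotypes."""
    n = len(seqs)
    freqs = [c / n for c in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))

def nucleotide_diversity(seqs):
    """Nucleotide diversity (pi): mean pairwise p-distance over all pairs."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)

if __name__ == "__main__":
    sample = ["ACGTACGT", "ACGTACGA", "ACGTACGA", "ACCTACGA"]  # hypothetical
    print("Hd =", round(haplotype_diversity(sample), 3))
    print("pi =", round(nucleotide_diversity(sample), 3))
```

A high Hd combined with a low π, as reported for H. formosa, means many distinct haplotypes that differ at only a few sites, the signature usually read as recent population expansion.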
Genetic risk prediction using a spatial autoregressive model with adaptive lasso.
Wen, Yalu; Shen, Xiaoxi; Lu, Qing
2018-05-31
With rapidly evolving high-throughput technologies, studies are being initiated to accelerate the process toward precision medicine. The collection of vast amounts of sequencing data provides great opportunities to systematically study the role of a deep catalog of sequencing variants in risk prediction. Nevertheless, the massive amount of noise and the low frequencies of rare variants in sequencing data pose great analytical challenges for risk prediction modeling. Motivated by developments in spatial statistics, we propose a spatial autoregressive model with adaptive lasso (SARAL) for risk prediction modeling using high-dimensional sequencing data. SARAL is a set-based approach; thus, it reduces the data dimension and accumulates genetic effects within a single-nucleotide variant (SNV) set. Moreover, it allows different SNV sets to have effects of varying magnitude and direction, which reflects the nature of complex diseases. With the adaptive lasso implemented, SARAL can shrink the effects of noise SNV sets to zero and thus further improve prediction accuracy. Through simulation studies, we demonstrate that, overall, SARAL is comparable to, if not better than, the genomic best linear unbiased prediction method. The method is further illustrated by an application to the sequencing data from the Alzheimer's Disease Neuroimaging Initiative. Copyright © 2018 John Wiley & Sons, Ltd.
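The adaptive lasso ingredient can be illustrated on its own. The sketch below is a generic two-stage adaptive lasso fitted by coordinate descent: an unpenalized first fit supplies weights w_j = 1/|b_j|^gamma, so weak effects receive large penalties and are driven exactly to zero. It is not SARAL itself, which additionally works with SNV sets and a spatial autoregressive structure. The design assumes a centred response, no intercept and no all-zero columns.

```python
def _soft_threshold(z, t):
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def weighted_lasso(X, y, lam, weights, n_iter=500, tol=1e-10):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j|.
    X is a list of n rows with p columns; y is assumed centred."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # residuals y - Xb at b = 0
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # correlation of column j with the partial residual excluding b_j
            rho = sum(X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n)) / n
            new = _soft_threshold(rho, lam * weights[j]) / col_ss[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b

def adaptive_lasso(X, y, lam, gamma=1.0):
    """Stage 1: unpenalized fit (lam = 0). Stage 2: weighted lasso with
    w_j = 1 / |b_j|^gamma from stage 1."""
    p = len(X[0])
    b_init = weighted_lasso(X, y, 0.0, [0.0] * p)
    weights = [1.0 / max(abs(bj), 1e-12) ** gamma for bj in b_init]
    return weighted_lasso(X, y, lam, weights)
```

With two orthogonal predictors where the second has only a small true effect, a plain lasso (all weights 1) keeps a shrunken second coefficient, whereas the adaptive version sets it to zero while penalizing the strong effect only lightly.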
Value of genetic profiling for the prediction of coronary heart disease.
van der Net, Jeroen B; Janssens, A Cecile J W; Sijbrands, Eric J G; Steyerberg, Ewout W
2009-07-01
Advances in high-throughput genomics facilitate the identification of novel genetic susceptibility variants for coronary heart disease (CHD), which may improve CHD risk prediction. The aim of the present simulation study was to investigate to what degree CHD risk can be predicted by testing multiple genetic variants (genetic profiling). We simulated genetic profiles for a population of 100,000 individuals with a 10-year CHD incidence of 10%. For each combination of model parameters (number of variants, genotype frequency and odds ratio [OR]), we calculated the area under the receiver operating characteristic curve (AUC) to quantify the discrimination between individuals who will and will not develop CHD. The AUC of genetic profiles could rise to 0.90 when 100 hypothetical variants with ORs of 1.5 and genotype frequencies of 50% were simulated. The AUC of a genetic profile consisting of 10 established variants, with ORs ranging from 1.13 to 1.42, was 0.59. If 2, 5, and 10 times as many identical variants were identified, the AUCs would be 0.63, 0.69, and 0.76, respectively. To obtain AUCs similar to those of conventional CHD risk predictors, a considerable number of additional common genetic variants, preferably with strong effects, need to be identified.
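A minimal simulation in the spirit of the design described above, not the authors' code: each person carries each binary risk genotype with probability `freq`, risk follows a multiplicative (logistic) model with a common OR, and discrimination is measured by the AUC computed from the Mann-Whitney rank statistic. The intercept calibration (centring the average score at the logit of the target incidence) is an assumption, so the realized incidence will differ from 10% and the AUCs will not match the reported values exactly. A smaller population is used for speed.

```python
import math
import random

def auc(scores, labels):
    """AUC via the Mann-Whitney U statistic, with mid-ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mid = (i + j) / 2.0 + 1.0  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    n1 = sum(labels)
    n0 = len(labels) - n1
    r1 = sum(r for r, lab in zip(ranks, labels) if lab)
    return (r1 - n1 * (n1 + 1) / 2.0) / (n1 * n0)

def simulate_profile_auc(n_people, n_variants, freq, odds_ratio,
                         incidence=0.10, seed=1):
    """Simulate binary risk genotypes under a logistic risk model and return
    the AUC of the genetic risk score (sum of log ORs)."""
    rng = random.Random(seed)
    beta = math.log(odds_ratio)
    # Assumed calibration: a person with the average score has risk = incidence.
    intercept = math.log(incidence / (1 - incidence)) - n_variants * freq * beta
    scores, labels = [], []
    for _ in range(n_people):
        score = beta * sum(rng.random() < freq for _ in range(n_variants))
        risk = 1.0 / (1.0 + math.exp(-(intercept + score)))
        scores.append(score)
        labels.append(1 if rng.random() < risk else 0)
    return auc(scores, labels)

if __name__ == "__main__":
    for n_var in (10, 100):
        print(n_var, "variants:", round(simulate_profile_auc(5000, n_var, 0.5, 1.5), 3))
```

Adding many modest-effect variants spreads the score distribution, and that spread, not any single variant, is what drives the AUC upward.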
Genomic-Enabled Prediction Kernel Models with Random Intercepts for Multi-environment Trials.
Cuevas, Jaime; Granato, Italo; Fritsche-Neto, Roberto; Montesinos-Lopez, Osval A; Burgueño, Juan; Bandeira E Sousa, Massaine; Crossa, José
2018-03-28
In this study, we compared the prediction accuracy of the main genotypic effect model (MM) without G×E interactions, the multi-environment single variance G×E deviation model (MDs), and the multi-environment environment-specific variance G×E deviation model (MDe), where the random genetic effects of the lines are modeled with the markers (or pedigree). With the objective of further modeling the genetic residual of the lines, we incorporated the random intercepts of the lines (l) and generated another three models. Each of these 6 models was fitted with a linear kernel method (Genomic Best Linear Unbiased Predictor, GB) and a Gaussian kernel (GK) method. We compared these 12 model-method combinations with another two multi-environment G×E interaction models with unstructured variance-covariances (MUC) using GB and GK kernels (4 model-method combinations). Thus, we compared the genomic-enabled prediction accuracy of a total of 16 model-method combinations on two maize data sets with positive phenotypic correlations among environments, and on two wheat data sets with complex G×E that includes some negative and near-zero phenotypic correlations among environments. The two models MDs and MDe with the random intercept of the lines and the GK method were computationally efficient and gave high prediction accuracy in the two maize data sets. For the more complex G×E wheat data sets, the G×E model-method combinations MDs and MDe, including the random intercepts of the lines with the GK method, yielded important savings in computing time compared with the multi-environment G×E interaction models with unstructured variance-covariances, but with lower genomic prediction accuracy. Copyright © 2018 Cuevas et al.
Genomic-Enabled Prediction Kernel Models with Random Intercepts for Multi-environment Trials
Cuevas, Jaime; Granato, Italo; Fritsche-Neto, Roberto; Montesinos-Lopez, Osval A.; Burgueño, Juan; Bandeira e Sousa, Massaine; Crossa, José
2018-01-01
In this study, we compared the prediction accuracy of the main genotypic effect model (MM) without G×E interactions, the multi-environment single variance G×E deviation model (MDs), and the multi-environment environment-specific variance G×E deviation model (MDe), where the random genetic effects of the lines are modeled with the markers (or pedigree). With the objective of further modeling the genetic residual of the lines, we incorporated the random intercepts of the lines (l) and generated another three models. Each of these 6 models was fitted with a linear kernel method (Genomic Best Linear Unbiased Predictor, GB) and a Gaussian kernel (GK) method. We compared these 12 model-method combinations with another two multi-environment G×E interaction models with unstructured variance-covariances (MUC) using GB and GK kernels (4 model-method combinations). Thus, we compared the genomic-enabled prediction accuracy of a total of 16 model-method combinations on two maize data sets with positive phenotypic correlations among environments, and on two wheat data sets with complex G×E that includes some negative and near-zero phenotypic correlations among environments. The two models MDs and MDe with the random intercept of the lines and the GK method were computationally efficient and gave high prediction accuracy in the two maize data sets. For the more complex G×E wheat data sets, the G×E model-method combinations MDs and MDe, including the random intercepts of the lines with the GK method, yielded important savings in computing time compared with the multi-environment G×E interaction models with unstructured variance-covariances, but with lower genomic prediction accuracy. PMID:29476023
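The GB and GK methods differ only in the kernel used as the covariance of the genetic effects. A minimal sketch of one common parameterization of each, which is an assumption here since the abstract does not give the formulas: a linear kernel from column-centred markers scaled by the number of markers, and a Gaussian kernel exp(-h·d²/q), where d is the Euclidean distance between marker profiles and q is the median off-diagonal squared distance. The bandwidth `h` is a hypothetical default.

```python
import math
from statistics import median

def linear_kernel(M):
    """GB-style linear kernel: column-centred markers Z, K = Z Z' / p."""
    n, p = len(M), len(M[0])
    means = [sum(row[j] for row in M) / n for j in range(p)]
    Z = [[row[j] - means[j] for j in range(p)] for row in M]
    return [[sum(a * b for a, b in zip(Z[i], Z[k])) / p for k in range(n)]
            for i in range(n)]

def gaussian_kernel(M, h=1.0):
    """GK-style kernel: K_ik = exp(-h * d_ik^2 / q), with q the median of
    the off-diagonal squared Euclidean distances (assumed scaling; q must
    be non-zero, i.e. not all lines identical)."""
    n = len(M)
    d2 = [[sum((a - b) ** 2 for a, b in zip(M[i], M[k])) for k in range(n)]
          for i in range(n)]
    q = median(d2[i][k] for i in range(n) for k in range(i + 1, n))
    return [[math.exp(-h * d2[i][k] / q) for k in range(n)] for i in range(n)]

if __name__ == "__main__":
    markers = [[0, 2, 1], [1, 1, 1], [2, 0, 0], [2, 1, 0]]  # hypothetical 0/1/2 codes
    for row in gaussian_kernel(markers):
        print([round(v, 3) for v in row])
```

The linear kernel captures additive relationships only, while the Gaussian kernel is a nonlinear similarity that can also absorb some non-additive signal; this is the usual motivation for comparing them.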
Auinger, Hans-Jürgen; Schönleben, Manfred; Lehermeier, Christina; Schmidt, Malthe; Korzun, Viktor; Geiger, Hartwig H; Piepho, Hans-Peter; Gordillo, Andres; Wilde, Peer; Bauer, Eva; Schön, Chris-Carolin
2016-11-01
Genomic prediction accuracy can be significantly increased by model calibration across multiple breeding cycles as long as selection cycles are connected by common ancestors. In hybrid rye breeding, application of genome-based prediction is expected to increase selection gain because of long selection cycles in population improvement and development of hybrid components. Essentially two prediction scenarios arise: (1) prediction of the genetic value of lines from the same breeding cycle in which model training is performed and (2) prediction of lines from subsequent cycles. It is the latter from which a reduction in cycle length and consequently the strongest impact on selection gain is expected. We empirically investigated genome-based prediction of grain yield, plant height and thousand kernel weight within and across four selection cycles of a hybrid rye breeding program. Prediction performance was assessed using genomic and pedigree-based best linear unbiased prediction (GBLUP and PBLUP). A total of 1040 S2 lines were genotyped with 16k SNPs, and each year testcrosses of 260 S2 lines were phenotyped in seven or eight locations. The performance gap between GBLUP and PBLUP increased significantly for all traits when model calibration was performed on aggregated data from several cycles. Prediction accuracies obtained from cross-validation were on the order of 0.70 for all traits when data from all cycles (N_CS = 832) were used for model training and exceeded within-cycle accuracies in all cases. As long as selection cycles are connected by a sufficient number of common ancestors and prediction accuracy has not reached a plateau when increasing sample size, aggregating data from several preceding cycles is recommended for predicting genetic values in subsequent cycles despite decreasing relatedness over time.
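A compact sketch of GBLUP under simplifying assumptions: genotypes coded 0/1/2 with no monomorphic markers, a genomic relationship matrix in VanRaden's first form, a variance ratio lam = sigma_e² / sigma_g² treated as known, and the overall mean estimated by the simple average rather than by generalized least squares. Under these assumptions the predicted genomic values are g = G (G + lam·I)⁻¹ (y − ȳ). This illustrates the method named in the study, not the authors' pipeline; PBLUP would replace G by a pedigree-based relationship matrix.

```python
def vanraden_g(M):
    """Genomic relationship matrix from 0/1/2 genotypes:
    G = Z Z' / (2 * sum_j p_j (1 - p_j)), with Z = M - 2p."""
    n, m = len(M), len(M[0])
    p = [sum(row[j] for row in M) / (2.0 * n) for j in range(m)]
    Z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in M]
    denom = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    return [[sum(a * b for a, b in zip(Z[i], Z[k])) / denom for k in range(n)]
            for i in range(n)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def gblup(M, y, lam):
    """Predicted genomic values g = G (G + lam I)^-1 (y - mean(y))."""
    G = vanraden_g(M)
    n = len(y)
    mu = sum(y) / n
    A = [[G[i][k] + (lam if i == k else 0.0) for k in range(n)] for i in range(n)]
    alpha = solve(A, [yi - mu for yi in y])
    return [sum(G[i][k] * alpha[k] for k in range(n)) for i in range(n)]
```

Pooling more training cycles, as recommended above, amounts to adding rows to M and y, so each prediction can borrow information from more relatives through G.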
Applications of information theory, genetic algorithms, and neural models to predict oil flow
NASA Astrophysics Data System (ADS)
Ludwig, Oswaldo; Nunes, Urbano; Araújo, Rui; Schnitman, Leizer; Lepikson, Herman Augusto
2009-07-01
This work introduces a new information-theoretic methodology for choosing variables and their time lags in a prediction setting, particularly when neural networks are used in non-linear modeling. The first contribution of this work is the Cross Entropy Function (XEF), proposed to select input variables and their lags in order to compose the input vector of black-box prediction models. The proposed XEF method is more appropriate than the commonly applied Cross Correlation Function (XCF) when the relationship between the input and output signals comes from a non-linear dynamic system. The second contribution is a method that minimizes the Joint Conditional Entropy (JCE) between the input and output variables by means of a Genetic Algorithm (GA). The aim is to take into account the dependence among the input variables when selecting the most appropriate set of inputs for a prediction problem. In short, these methods can be used to assist the selection of input training data that have the necessary information to predict the target data. The proposed methods are applied to a petroleum engineering problem: predicting oil production. Experimental results obtained with a real-world dataset are presented, demonstrating the feasibility and effectiveness of the method.
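Why an entropy-based criterion can beat cross-correlation for lag selection is easy to demonstrate. The sketch below uses a plug-in mutual information estimate between discrete lagged inputs and outputs as a stand-in for the XEF, whose exact definition is not given here. The test system, y_t = x_{t-2}², is hypothetical: the dependence at lag 2 is perfect, yet Pearson correlation at that lag is near zero because x is symmetric around zero.

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in mutual information (nats) between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def lagged_pairs(x, y, lag):
    """Pair x[t - lag] with y[t]."""
    return x[:len(x) - lag], y[lag:]

def best_lag_by_mi(x, y, max_lag):
    scores = {lag: mutual_information(*lagged_pairs(x, y, lag))
              for lag in range(max_lag + 1)}
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.choice([-2, -1, 0, 1, 2]) for _ in range(2000)]
    y = [0, 0] + [v ** 2 for v in x[:-2]]  # hypothetical nonlinear system
    lag, scores = best_lag_by_mi(x, y, 5)
    xs, ys = lagged_pairs(x, y, 2)
    print("best lag by MI:", lag)
    print("Pearson correlation at lag 2:", round(pearson(xs, ys), 3))
```

For continuous signals the values would first have to be discretized (binned) before this plug-in estimator applies.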
Kooke, Rik; Kruijer, Willem; Bours, Ralph; Becker, Frank; Kuhn, André; van de Geest, Henri; Buntjer, Jaap; Doeswijk, Timo; Guerra, José; Bouwmeester, Harro; Vreugdenhil, Dick; Keurentjes, Joost J B
2016-04-01
Quantitative traits in plants are controlled by a large number of genes and their interaction with the environment. To disentangle the genetic architecture of such traits, natural variation within species can be explored by studying genotype-phenotype relationships. Genome-wide association studies that link phenotypes to thousands of single nucleotide polymorphism markers are nowadays common practice for such analyses. In many cases, however, the identified individual loci cannot fully explain the heritability estimates, suggesting missing heritability. We analyzed 349 Arabidopsis accessions and found extensive variation and high heritabilities for different morphological traits. The number of significant genome-wide associations was, however, very low. The application of genomic prediction models that take into account the effects of all individual loci may greatly enhance the elucidation of the genetic architecture of quantitative traits in plants. Here, genomic prediction models revealed different genetic architectures for the morphological traits. Integrating genomic prediction and association mapping enabled the assignment of many plausible candidate genes explaining the observed variation. These genes were analyzed for functional and sequence diversity, and good indications that natural allelic variation in many of these genes contributes to phenotypic variation were obtained. For ACS11, an ethylene biosynthesis gene, haplotype differences explaining variation in the ratio of petiole and leaf length could be identified. © 2016 American Society of Plant Biologists. All Rights Reserved.
Parasites and deleterious mutations: interactions influencing the evolutionary maintenance of sex.
Park, A W; Jokela, J; Michalakis, Y
2010-05-01
The restrictive assumptions associated with purely genetic and purely ecological mechanisms suggest that neither of the two forces, in isolation, can offer a general explanation for the evolutionary maintenance of sex. Consequently, attention has turned to pluralistic models (i.e. models that apply both ecological and genetic mechanisms). Existing research has shown that combining mutation accumulation and parasitism allows restrictive assumptions about genetic and parasite parameter values to be relaxed while still predicting the maintenance of sex. However, several empirical studies have shown that deleterious mutations and parasitism can reduce fitness to a greater extent than would be expected if the two acted independently. We show how interactions between these genetic and ecological forces can completely reverse predictions about the evolution of reproductive modes. Moreover, we demonstrate that synergistic interactions between infection and deleterious mutations can render sex evolutionarily stable even when there is antagonistic epistasis among deleterious mutations, thereby widening the conditions for the evolutionary maintenance of sex.
Assis, J; Serrão, E A; Claro, B; Perrin, C; Pearson, G A
2014-06-01
The climate-driven dynamics of species ranges is a critical research question in evolutionary ecology. We ask whether present intraspecific diversity is determined by the imprint of past climate. This is an ongoing debate requiring interdisciplinary examination of population genetic pools and persistence patterns across global ranges. Previously, contrasting inferences and predictions have resulted from distinct genomic coverage and/or geographical information. We aim to describe and explain the causes of geographical contrasts in genetic diversity and their consequences for the future baseline of the global genetic pool, by comparing present geographical distribution of genetic diversity and differentiation with predictive species distribution modelling (SDM) during past extremes, present time and future climate scenarios for a brown alga, Fucus vesiculosus. SDM showed that both atmospheric and oceanic variables shape the global distribution of intertidal species, revealing regions of persistence, extinction and expansion during glacial and postglacial periods. These explained the distribution and structure of present genetic diversity, consisting of differentiated genetic pools with maximal diversity in areas of long-term persistence. Most of the present species range comprises postglacial expansion zones and, in contrast to highly dispersive marine organisms, expansions involved only local fronts, leaving distinct genetic pools at rear edges. Besides unravelling a complex phylogeographical history and showing congruence between genetic diversity and persistent distribution zones, supporting the hypothesis of niche conservatism, range shifts and loss of unique genetic diversity at the rear edge were predicted for future climate scenarios, impoverishing the global gene pool. © 2014 John Wiley & Sons Ltd.
Mannering, Anne M.; Harold, Gordon T.; Leve, Leslie D.; Shelton, Katherine H.; Shaw, Daniel S.; Conger, Rand D.; Neiderhiser, Jenae M.; Scaramella, Laura V.; Reiss, David
2009-01-01
This study examined the longitudinal association between marital instability and child sleep problems at ages 9 and 18 months in 357 families with a genetically unrelated infant adopted at birth. This design eliminates shared genes as an explanation for similarities between parent and child. Structural equation modeling indicated that T1 marital instability predicted T2 child sleep problems, but T1 child sleep problems did not predict T2 marital instability. This pattern of results was replicated when models were estimated separately for mothers and children and for fathers and children. Thus, even after controlling for stability in sleep problems and marital instability and eliminating shared genetic influences on associations using a longitudinal adoption design, marital instability prospectively predicts early childhood sleep patterns. PMID:21557740
Privacy-preserving genomic testing in the clinic: a model using HIV treatment
McLaren, Paul J.; Raisaro, Jean Louis; Aouri, Manel; Rotger, Margalida; Ayday, Erman; Bartha, István; Delgado, Maria B.; Vallet, Yannick; Günthard, Huldrych F.; Cavassini, Matthias; Furrer, Hansjakob; Doco-Lecompte, Thanh; Marzolini, Catia; Schmid, Patrick; Di Benedetto, Caroline; Decosterd, Laurent A.; Fellay, Jacques; Hubaux, Jean-Pierre; Telenti, Amalio
2016-01-01
Purpose: The implementation of genomic-based medicine is hindered by unresolved questions regarding data privacy and delivery of interpreted results to health-care practitioners. We used DNA-based prediction of HIV-related outcomes as a model to explore critical issues in clinical genomics. Genet Med 18 8, 814–822. Methods: We genotyped 4,149 markers in HIV-positive individuals. Variants allowed for prediction of 17 traits relevant to HIV medical care, inference of patient ancestry, and imputation of human leukocyte antigen (HLA) types. Genetic data were processed under a privacy-preserving framework using homomorphic encryption, and clinical reports describing potentially actionable results were delivered to health-care providers. Results: A total of 230 patients were included in the study. We demonstrated the feasibility of encrypting a large number of genetic markers, inferring patient ancestry, computing monogenic and polygenic trait risks, and reporting results under privacy-preserving conditions. The average execution time of a multimarker test on encrypted data was 865 ms on a standard computer. The proportion of tests returning potentially actionable genetic results ranged from 0 to 54%. Conclusions: The model of implementation presented herein informs on strategies to deliver genomic test results for clinical care. Data encryption to ensure privacy helps to build patient trust, a key requirement on the road to genomic-based medicine. PMID:26765343
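To make "computing polygenic trait risks on encrypted data" concrete, here is a toy additively homomorphic scheme. The abstract does not say which homomorphic scheme was used; Paillier is chosen here only because it is a well-known additive scheme. Multiplying ciphertexts adds the hidden plaintexts, and raising a ciphertext to an integer power multiplies its plaintext, so a weighted allele count can be computed without decryption. The primes are tiny and the code is NOT secure. Weights must be non-negative integers (real effect sizes would need fixed-point scaling), and the score must stay below n.

```python
import math
import random

# Toy Paillier cryptosystem for illustration only: tiny primes, NOT secure.

def keygen(p=1019, q=1031):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    mu = pow(lam, -1, n)  # valid since g = n + 1 gives L(g^lam mod n^2) = lam mod n
    return (n, g), (lam, mu)

def encrypt(pub, m, rng):
    n, g = pub
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def encrypted_score(pub, enc_genotypes, weights):
    """Weighted sum of encrypted genotypes, computed without decryption:
    product of c_j ** w_j mod n^2 encrypts sum of w_j * g_j."""
    n, _ = pub
    n2 = n * n
    total = 1  # a trivial (non-randomized) encryption of 0
    for c, w in zip(enc_genotypes, weights):
        total = total * pow(c, w, n2) % n2
    return total

if __name__ == "__main__":
    rng = random.Random(42)
    pub, priv = keygen()
    genotypes = [0, 1, 2, 1, 0, 2]  # hypothetical risk-allele counts
    weights = [3, 1, 4, 1, 5, 9]    # hypothetical integer-scaled effect sizes
    enc = [encrypt(pub, g, rng) for g in genotypes]
    print("decrypted score:", decrypt(pub, priv, encrypted_score(pub, enc, weights)))
```

In this arrangement the party holding the ciphertexts computes the score, and only the key holder can read the result.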
Gene-Gene and Gene-Environment Interactions in Ulcerative Colitis
Wang, Ming-Hsi; Fiocchi, Claudio; Zhu, Xiaofeng; Ripke, Stephan; Kamboh, M. Ilyas; Rebert, Nancy; Duerr, Richard H.; Achkar, Jean-Paul
2014-01-01
Genome-wide association studies (GWAS) have identified at least 133 ulcerative colitis (UC)-associated loci. The role of genetic factors in clinical practice is not clearly defined. The relevance of genetic variants to disease pathogenesis remains uncertain because gene-gene and gene-environment interactions have not been characterized. We examined the predictive value of combining the 133 UC risk loci with genetic interactions in an ongoing inflammatory bowel disease (IBD) GWAS. The Wellcome Trust Case-Control Consortium (WTCCC) IBD GWAS was used as a replication cohort. We applied logic regression (LR), a novel adaptive regression methodology, to search for high-order interactions. Exploratory genotype correlations with UC sub-phenotypes (extent of disease, need for surgery, age of onset, extra-intestinal manifestations and primary sclerosing cholangitis (PSC)) were conducted. The combination of the 133 UC loci yielded good UC risk predictability (area under the curve [AUC] of 0.86). A higher cumulative allele score predicted higher UC risk. Through LR, several lines of evidence for genetic interactions were identified and successfully replicated in the WTCCC cohort. Adding the genetic interactions together with the gene-smoking interaction significantly improved predictability in the model (AUC from 0.86 to 0.89, P=3.26E-05). Explained UC variance increased from 37% to 42% after adding the interaction terms. A within-case analysis suggested a genetic association with PSC. Our study demonstrates that the LR methodology allows the identification and replication of high-order genetic interactions in UC GWAS datasets. UC risk can be predicted by the 133 loci, and prediction is improved by adding gene-gene and gene-environment interactions. PMID:24241240
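An unweighted cumulative allele score and its AUC can be computed as below. This is a minimal sketch assuming genotypes coded as 0, 1 or 2 risk alleles per locus; the AUC is computed as the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. It does not reproduce the logic regression or interaction terms described above.

```python
from typing import Sequence


def allele_score(genotypes: Sequence[int]) -> int:
    """Unweighted cumulative risk-allele count (each genotype coded 0, 1 or 2)."""
    return sum(genotypes)


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """P(random case score > random control score), counting ties as 1/2."""
    wins = 0.0
    for s in case_scores:
        for t in control_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    cases = [[2, 1, 1], [1, 1, 2], [2, 2, 1]]     # hypothetical genotypes
    controls = [[0, 1, 0], [1, 0, 1], [2, 1, 1]]
    print(auc([allele_score(g) for g in cases],
              [allele_score(g) for g in controls]))  # 8/9
```

The pairwise loop is quadratic; for large cohorts a rank-based computation gives the same value.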
Sieberts, Solveig K; Zhu, Fan; García-García, Javier; Stahl, Eli; Pratap, Abhishek; Pandey, Gaurav; Pappas, Dimitrios; Aguilar, Daniel; Anton, Bernat; Bonet, Jaume; Eksi, Ridvan; Fornés, Oriol; Guney, Emre; Li, Hongdong; Marín, Manuel Alejandro; Panwar, Bharat; Planas-Iglesias, Joan; Poglayen, Daniel; Cui, Jing; Falcao, Andre O; Suver, Christine; Hoff, Bruce; Balagurusamy, Venkat S K; Dillenberger, Donna; Neto, Elias Chaibub; Norman, Thea; Aittokallio, Tero; Ammad-Ud-Din, Muhammad; Azencott, Chloe-Agathe; Bellón, Víctor; Boeva, Valentina; Bunte, Kerstin; Chheda, Himanshu; Cheng, Lu; Corander, Jukka; Dumontier, Michel; Goldenberg, Anna; Gopalacharyulu, Peddinti; Hajiloo, Mohsen; Hidru, Daniel; Jaiswal, Alok; Kaski, Samuel; Khalfaoui, Beyrem; Khan, Suleiman Ali; Kramer, Eric R; Marttinen, Pekka; Mezlini, Aziz M; Molparia, Bhuvan; Pirinen, Matti; Saarela, Janna; Samwald, Matthias; Stoven, Véronique; Tang, Hao; Tang, Jing; Torkamani, Ali; Vert, Jean-Phillipe; Wang, Bo; Wang, Tao; Wennerberg, Krister; Wineinger, Nathan E; Xiao, Guanghua; Xie, Yang; Yeung, Rae; Zhan, Xiaowei; Zhao, Cheng; Greenberg, Jeff; Kremer, Joel; Michaud, Kaleb; Barton, Anne; Coenen, Marieke; Mariette, Xavier; Miceli, Corinne; Shadick, Nancy; Weinblatt, Michael; de Vries, Niek; Tak, Paul P; Gerlag, Danielle; Huizinga, Tom W J; Kurreeman, Fina; Allaart, Cornelia F; Louis Bridges, S; Criswell, Lindsey; Moreland, Larry; Klareskog, Lars; Saevarsdottir, Saedis; Padyukov, Leonid; Gregersen, Peter K; Friend, Stephen; Plenge, Robert; Stolovitzky, Gustavo; Oliva, Baldo; Guan, Yuanfang; Mangravite, Lara M
2016-08-23
Rheumatoid arthritis (RA) affects millions worldwide. While anti-TNF treatment is widely used to reduce disease progression, treatment fails in approximately one-third of patients. No biomarker currently exists that identifies non-responders before treatment. A rigorous community-based assessment of the utility of SNP data for predicting anti-TNF treatment efficacy in RA patients was performed in the context of a DREAM Challenge (http://www.synapse.org/RA_Challenge). An open challenge framework enabled the comparative evaluation of predictions developed by 73 research groups using the most comprehensive available data and covering a wide range of state-of-the-art modelling methodologies. Despite a significant heritability estimate for the treatment non-response trait (h² = 0.18, P value = 0.02), no significant genetic contribution to prediction accuracy was observed. The results formally confirm the expectation of the rheumatology community that SNP information does not significantly improve predictive performance relative to standard clinical traits, thereby justifying a refocusing of future efforts on the collection of other data.
Expanding a dynamic flux balance model of yeast fermentation to genome-scale
2011-01-01
Background Yeast is considered to be a workhorse of the biotechnology industry for the production of many value-added chemicals, alcoholic beverages and biofuels. Optimization of the fermentation is a challenging task that greatly benefits from dynamic models able to accurately describe and predict the fermentation profile and resulting products under different genetic and environmental conditions. In this article, we developed and validated a genome-scale dynamic flux balance model, using experimentally determined kinetic constraints. Results Appropriate equations for maintenance, biomass composition, anaerobic metabolism and nutrient uptake are key to improving model performance, especially for predicting glycerol and ethanol synthesis. Prediction profiles of synthesis and consumption of the main metabolites involved in alcoholic fermentation closely agreed with experimental data obtained from numerous lab and industrial fermentations under different environmental conditions. Finally, fermentation simulations of genetically engineered yeasts closely reproduced previously reported experimental results regarding final concentrations of the main fermentation products such as ethanol and glycerol. Conclusion A useful tool to describe, understand and predict metabolite production in batch yeast cultures was developed. The resulting model, if used wisely, could help in the search for new metabolic engineering strategies to manage ethanol content in batch fermentations. PMID:21595919
Constraints on decision making: implications from genetics, personality, and addiction.
Baker, Travis E; Stockwell, Tim; Holroyd, Clay B
2013-09-01
An influential neurocomputational theory of the biological mechanisms of decision making, the "basal ganglia go/no-go model," holds that individual variability in decision making is determined by differences in the makeup of a striatal system for approach and avoidance learning. The model has been tested empirically with the probabilistic selection task (PST), which determines whether individuals learn better from positive or negative feedback. In accordance with the model, in the present study we examined whether an individual's ability to learn from positive and negative reinforcement can be predicted by genetic factors related to the midbrain dopamine system. We also asked whether psychiatric and personality factors related to substance dependence and dopamine affect PST performance. Although we found characteristics that predicted individual differences in approach versus avoidance learning, these observations were qualified by additional findings that appear inconsistent with the predictions of the go/no-go model. These results highlight a need for future research to validate the PST as a measure of basal ganglia reward learning.
Predicting plant biomass accumulation from image-derived parameters
Chen, Dijun; Shi, Rongli; Pape, Jean-Michel; Neumann, Kerstin; Graner, Andreas; Chen, Ming; Klukas, Christian
2018-01-01
Background Image-based high-throughput phenotyping technologies have developed rapidly in plant science in recent years, and they offer great potential to gain more information than traditional destructive methods. Predicting plant biomass is regarded as a key goal for plant breeders and ecologists. However, finding a biomass model that remains predictive across experiments is a major challenge. Results In the present study, we constructed 4 predictive models to examine the quantitative relationship between image-based features and plant biomass accumulation. Our methodology was applied to 3 consecutive barley (Hordeum vulgare) experiments with control and stress treatments. The results showed that plant biomass can be accurately predicted from image-based parameters using a random forest model. The high prediction accuracy of this model will help relieve the phenotyping bottleneck in biomass measurement in breeding applications. Prediction performance remains relatively high across experiments under similar conditions. The relative contribution of individual features to biomass prediction was further quantified, revealing new insights into the phenotypic determinants of plant biomass. Furthermore, these methods could also be used to identify the image-based features most strongly related to plant biomass accumulation, which would be promising for subsequent genetic mapping to uncover the genetic basis of biomass. Conclusions We have developed quantitative models to accurately predict plant biomass accumulation from image data. We anticipate that these results will help advance our understanding of the phenotypic determinants of plant biomass, and that the statistical methods can be broadly applied to other plant species. PMID:29346559
The GP problem: quantifying gene-to-phenotype relationships.
Cooper, Mark; Chapman, Scott C; Podlich, Dean W; Hammer, Graeme L
2002-01-01
In this paper we refer to the gene-to-phenotype modeling challenge as the GP problem. Integrating information across levels of organization within a genotype-environment system is a major challenge in computational biology. However, resolving the GP problem is a fundamental requirement if we are to understand and predict phenotypes given knowledge of the genome and model dynamic properties of biological systems. Organisms are consequences of this integration, and it is a major property of biological systems that underlies the responses we observe. We discuss the E(NK) model as a framework for investigation of the GP problem and the prediction of system properties at different levels of organization. We apply this quantitative framework to an investigation of the processes involved in genetic improvement of plants for agriculture. In our analysis, N genes determine the genetic variation for a set of traits that are responsible for plant adaptation to E environment-types within a target population of environments. The N genes can interact in epistatic NK gene-networks through the way that they influence plant growth and development processes within a dynamic crop growth model. We use a sorghum crop growth model, available within the APSIM agricultural production systems simulation model, to integrate the gene-environment interactions that occur during growth and development and to predict genotype-to-phenotype relationships for a given E(NK) model. Directional selection is then applied to the population of genotypes, based on their predicted phenotypes, to simulate the dynamic aspects of genetic improvement by a plant-breeding program. The outcomes of the simulated breeding are evaluated across cycles of selection in terms of the changes in allele frequencies for the N genes and the genotypic and phenotypic values of the populations of genotypes.
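To make the NK part of the E(NK) framework concrete, here is a sketch of Kauffman's NK fitness model: each of N genes contributes a value that depends on its own allele and on the alleles of K other genes, so K = 0 gives a purely additive landscape and larger K adds epistasis. The random contribution tables, the single environment type and the truncation-selection helper are assumptions for illustration; the paper derives phenotypes from the APSIM sorghum crop growth model, which this sketch does not attempt to emulate.

```python
import random


def make_nk_model(n, k, seed=0):
    """Random NK landscape: neighbour lists and one contribution table per gene."""
    rng = random.Random(seed)
    # Each gene i interacts epistatically with k other genes chosen at random.
    neighbours = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    # One random contribution for each of the 2**(k+1) allelic states of (gene, neighbours).
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]
    return neighbours, tables


def nk_fitness(genotype, neighbours, tables):
    """Mean contribution over genes; genotype is a list of 0/1 alleles."""
    total = 0.0
    for i, (nbrs, table) in enumerate(zip(neighbours, tables)):
        index = genotype[i]
        for j in nbrs:  # encode the gene's own allele plus its neighbours as a bit string
            index = (index << 1) | genotype[j]
        total += table[index]
    return total / len(genotype)


def select_top(population, fraction, neighbours, tables):
    """One round of truncation (directional) selection on NK fitness."""
    ranked = sorted(population, key=lambda g: nk_fitness(g, neighbours, tables), reverse=True)
    return ranked[: max(1, int(len(ranked) * fraction))]


if __name__ == "__main__":
    nbrs, tabs = make_nk_model(n=6, k=2, seed=1)
    rng = random.Random(2)
    pop = [[rng.randint(0, 1) for _ in range(6)] for _ in range(20)]
    print([round(nk_fitness(g, nbrs, tabs), 3) for g in select_top(pop, 0.25, nbrs, tabs)])
```

Repeating selection, recombination and re-evaluation over cycles would give a rough analogue of the simulated breeding program, with allele frequencies tracked per gene.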
Developing a clinical utility framework to evaluate prediction models in radiogenomics
NASA Astrophysics Data System (ADS)
Wu, Yirong; Liu, Jie; Munoz del Rio, Alejandro; Page, David C.; Alagoz, Oguzhan; Peissig, Peggy; Onitilo, Adedayo A.; Burnside, Elizabeth S.
2015-03-01
Combining imaging and genetic information to predict disease presence and behavior is being codified into an emerging discipline called "radiogenomics." Optimal evaluation methodologies for radiogenomics techniques have not been established. We aim to develop a clinical decision framework based on utility analysis to assess prediction models for breast cancer. Our data come from a retrospective case-control study, collecting Gail model risk factors, genetic variants (single nucleotide polymorphisms, SNPs), and mammographic features described in the Breast Imaging Reporting and Data System (BI-RADS) lexicon. We first constructed three logistic regression models built on different sets of predictive features: (1) Gail, (2) Gail+SNP, and (3) Gail+SNP+BI-RADS. Then, we generated ROC curves for the three models. After assigning utility values to each category of findings (true negative, false positive, false negative and true positive), we identified the optimal operating points on the ROC curves that achieve the maximum expected utility (MEU) of breast cancer diagnosis. We used McNemar's test to compare the predictive performance of the three models. We found that SNPs and BI-RADS features augmented the baseline Gail model in terms of the area under the ROC curve (AUC) and MEU. SNPs improved the sensitivity of the Gail model (0.276 vs. 0.147) and reduced its specificity (0.855 vs. 0.912). When mammographic features were added, sensitivity increased to 0.457 and specificity to 0.872. SNPs and mammographic features played a significant role in breast cancer risk estimation (p-value < 0.001). Our decision framework, comprising utility analysis and McNemar's test, provides a novel way to evaluate prediction models in the realm of radiogenomics.
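The expected-utility step can be written as EU = p[TPR·U(TP) + (1 − TPR)·U(FN)] + (1 − p)[FPR·U(FP) + (1 − FPR)·U(TN)], where p is disease prevalence; the MEU operating point is the ROC point that maximizes it. The sketch below implements that formula together with McNemar's chi-square on the discordant pairs of two classifiers. The utility values, prevalence and ROC points are hypothetical, and the McNemar statistic is shown without continuity correction, which may differ from the study's choice.

```python
import math


def expected_utility(tpr, fpr, prevalence, u):
    """Expected utility at one ROC point; u maps 'TP', 'FN', 'FP', 'TN' to utilities."""
    return (prevalence * (tpr * u["TP"] + (1 - tpr) * u["FN"])
            + (1 - prevalence) * (fpr * u["FP"] + (1 - fpr) * u["TN"]))


def best_operating_point(roc, prevalence, u):
    """Return the (fpr, tpr) pair on the ROC curve with maximum expected utility."""
    return max(roc, key=lambda pt: expected_utility(pt[1], pt[0], prevalence, u))


def mcnemar(b, c):
    """McNemar chi-square (1 df, no continuity correction) from discordant counts b and c."""
    stat = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df
    return stat, p_value


if __name__ == "__main__":
    roc = [(0.0, 0.0), (0.1, 0.5), (0.3, 0.8), (1.0, 1.0)]  # hypothetical (fpr, tpr)
    utilities = {"TP": 1.0, "FN": 0.0, "FP": 0.9, "TN": 1.0}  # hypothetical
    print(best_operating_point(roc, 0.1, utilities))  # (0.3, 0.8)
    print(mcnemar(10, 2))
```

Because the utilities enter linearly, the MEU point moves toward higher sensitivity as the penalty for a false negative grows relative to that for a false positive.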
Elam, Kit K.; Wang, Frances L.; Bountress, Kaitlin; Chassin, Laurie; Pandika, Danielle; Lemery-Chalfant, Kathryn
2016-01-01
Deviance proneness models propose a multi-level interplay in which transactions among genetic, individual, and family risk factors place children at increased risk for substance use. We examined bidirectional transactions between impulsivity and family conflict from middle childhood to adolescence and their contributions to substance use in adolescence and emerging adulthood (n = 380). Moreover, we examined children’s, mothers’ and fathers’ polygenic risk scores for behavioral undercontrol, and mothers’ and fathers’ interparental conflict and substance disorder diagnoses as predictors of these transactions. Results support a developmental cascade model in which children’s polygenic risk scores predicted greater impulsivity in middle childhood. Impulsivity in middle childhood predicted greater family conflict in late childhood, which in turn predicted greater impulsivity in late adolescence. Adolescent impulsivity subsequently predicted greater substance use in emerging adulthood. Results are discussed with respect to evocative genotype-environment correlations within developmental cascades and applications to prevention efforts. PMID:27427799
Ethical principles and pitfalls of genetic testing for dementia.
Hedera, P
2001-01-01
Progress in the genetics of dementing disorders and the availability of clinical tests for practicing physicians increase the need for a better understanding of multifaceted issues associated with genetic testing. The genetics of dementia is complex, and genetic testing is fraught with many ethical concerns. Genetic testing can be considered for patients with a family history suggestive of a single gene disorder as a cause of dementia. Testing of affected patients should be accompanied by competent genetic counseling that focuses on probabilistic implications for at-risk first-degree relatives. Predictive testing of at-risk asymptomatic patients should be modeled after presymptomatic testing for Huntington's disease. Testing using susceptibility genes has only a limited diagnostic value at present because potential improvement in diagnostic accuracy does not justify potentially negative consequences for first-degree relatives. Predictive testing of unaffected subjects using susceptibility genes is currently not recommended because individual risk cannot be quantified and there are no therapeutic interventions for dementia in presymptomatic patients.
Potential and limits for rapid genetic adaptation to warming in a Great Barrier Reef coral.
Matz, Mikhail V; Treml, Eric A; Aglyamova, Galina V; Bay, Line K
2018-04-01
Can genetic adaptation in reef-building corals keep pace with the current rate of sea surface warming? Here we combine population genomics, biophysical modeling, and evolutionary simulations to predict future adaptation of the common coral Acropora millepora on the Great Barrier Reef (GBR). Genomics-derived migration rates were high (0.1-1% of immigrants per generation across half the latitudinal range of the GBR) and closely matched the biophysical model of larval dispersal. Both genetic and biophysical models indicated the prevalence of southward migration along the GBR that would facilitate the spread of heat-tolerant alleles to higher latitudes as the climate warms. We developed an individual-based metapopulation model of polygenic adaptation and parameterized it with population sizes and migration rates derived from the genomic analysis. We find that high migration rates do not disrupt local thermal adaptation, and that the resulting standing genetic variation should be sufficient to fuel rapid region-wide adaptation of A. millepora populations to gradual warming over the next 20-50 coral generations (100-250 years). Further adaptation based on novel mutations might also be possible, but this depends on the currently unknown genetic parameters underlying coral thermal tolerance and the rate of warming realized. Despite this capacity for adaptation, our model predicts that coral populations would become increasingly sensitive to random thermal fluctuations such as ENSO cycles or heat waves, which corresponds well with the recent increase in frequency of catastrophic coral bleaching events.
Human genetics as a model for target validation: finding new therapies for diabetes.
Thomsen, Soren K; Gloyn, Anna L
2017-06-01
Type 2 diabetes is a global epidemic with major effects on healthcare expenditure and quality of life. Currently available treatments are inadequate for the prevention of comorbidities, yet progress towards new therapies remains slow. A major barrier is the insufficiency of traditional preclinical models for predicting drug efficacy and safety. Human genetics offers a complementary model to assess causal mechanisms for target validation. Genetic perturbations are 'experiments of nature' that provide a uniquely relevant window into the long-term effects of modulating specific targets. Here, we show that genetic discoveries over the past decades have accurately predicted (now known) therapeutic mechanisms for type 2 diabetes. These findings highlight the potential for use of human genetic variation for prospective target validation, and establish a framework for future applications. Studies into rare, monogenic forms of diabetes have also provided proof-of-principle for precision medicine, and the applicability of this paradigm to complex disease is discussed. Finally, we highlight some of the limitations that are relevant to the use of genome-wide association studies (GWAS) in the search for new therapies for diabetes. A key outstanding challenge is the translation of GWAS signals into disease biology and we outline possible solutions for tackling this experimental bottleneck.
Fournier-Level, Alexandre; Perry, Emily O.; Wang, Jonathan A.; Braun, Peter T.; Migneault, Andrew; Cooper, Martha D.; Metcalf, C. Jessica E.; Schmitt, Johanna
2016-05-17
Predicting whether and how populations will adapt to rapid climate change is a critical goal for evolutionary biology. To examine the genetic basis of fitness and predict adaptive evolution in novel climates with seasonal variation, we grew a diverse panel of the annual plant Arabidopsis thaliana (multiparent advanced generation intercross lines) in controlled conditions simulating four climates: a present-day reference climate, an increased-temperature climate, a winter-warming only climate, and a poleward-migration climate with increased photoperiod amplitude. In each climate, four successive seasonal cohorts experienced dynamic daily temperature and photoperiod variation over a year. We measured 12 traits and developed a genomic prediction model for fitness evolution in each seasonal environment. This model was used to simulate evolutionary trajectories of the base population over 50 y in each climate, as well as 100-y scenarios of gradual climate change following adaptation to a reference climate. Patterns of plastic and evolutionary fitness response varied across seasons and climates. The increased-temperature climate promoted genetic divergence of subpopulations across seasons, whereas in the winter-warming and poleward-migration climates, seasonal genetic differentiation was reduced. In silico “resurrection experiments” showed limited evolutionary rescue compared with the plastic response of fitness to seasonal climate change. The genetic basis of adaptation and, consequently, the dynamics of evolutionary change differed qualitatively among scenarios. Populations with fewer founding genotypes and populations with genetic diversity reduced by prior selection adapted less well to novel conditions, demonstrating that adaptation to rapid climate change requires the maintenance of sufficient standing variation. PMID:27140640
Baudracco, J; Lopez-Villalobos, N; Holmes, C W; Comeron, E A; Macdonald, K A; Barry, T N
2013-05-01
A whole-farm, stochastic and dynamic simulation model was developed to predict the biophysical and economic performance of grazing dairy systems. Several whole-farm models simulate grazing dairy systems, but most of them work at the herd level. This model, named e-Dairy, differs from the few models that work at the animal level because it allows stochastic behaviour of the genetic merit of individual cows for several traits, namely yields of milk, fat and protein, live weight (LW) and body condition score (BCS), within a whole-farm model. The model accounts for genetic differences between cows, is sensitive to genotype × environment interactions at the animal level and allows pasture growth, milk price and supplement price to behave stochastically. It includes an energy-based animal module that predicts intake at grazing, mammary gland functioning and body lipid change. This whole-farm model simulates a 365-day period for individual cows within a herd, with cow parameters randomly generated from the mean parameter values, defined as inputs, and from variances and covariances taken from experimental data sets. The main inputs of e-Dairy are farm area, land use, type of pasture, type of crops, monthly pasture growth rate, supplements offered, nutritional quality of feeds, herd description (including herd size, age structure, calving pattern, BCS and LW at calving, probabilities of pregnancy and average genetic merit) and economic values for items of income and costs. The model allows management policies to be set for drying off cows (ceasing lactation), target pre- and post-grazing herbage mass and feed supplementation. The main outputs are herbage dry matter intake, annual pasture utilisation, milk yield, changes in BCS and LW, economic farm profit and return on assets. The model showed satisfactory accuracy of prediction when validated against two data sets from farmlet system experiments.
Relative prediction errors were <10% for all variables, and concordance correlation coefficients were over 0.80 for annual pasture utilisation and yields of milk and milk solids (MS; fat plus protein), and 0.69 and 0.48 for LW and BCS, respectively. A simulation of two contrasting dairy systems is presented to show the practical use of the model. The model can be used to explore the effects of feeding level and genetic merit, and their interactions, in grazing dairy systems, evaluating the trade-offs between profit and the associated risk.
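The two validation statistics reported for e-Dairy can be computed as follows. This sketch assumes the common definitions: relative prediction error as the root mean square prediction error expressed as a percentage of the observed mean, and Lin's concordance correlation coefficient with population (divide-by-n) moments. The paper may define them slightly differently.

```python
import math


def concordance_correlation(obs, pred):
    """Lin's CCC: penalizes both poor correlation and systematic bias."""
    n = len(obs)
    mx = sum(obs) / n
    my = sum(pred) / n
    sxx = sum((x - mx) ** 2 for x in obs) / n
    syy = sum((y - my) ** 2 for y in pred) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(obs, pred)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


def relative_prediction_error(obs, pred):
    """Root mean square prediction error as a percentage of the observed mean."""
    n = len(obs)
    mspe = sum((x - y) ** 2 for x, y in zip(obs, pred)) / n
    return 100 * math.sqrt(mspe) / (sum(obs) / n)


if __name__ == "__main__":
    observed = [1, 2, 3, 4]    # hypothetical values
    predicted = [2, 3, 4, 5]   # perfectly correlated but biased by +1
    print(concordance_correlation(observed, predicted))     # 5/7
    print(relative_prediction_error(observed, predicted))   # 40.0
```

The example shows why CCC is preferred over Pearson's r for validation: the predictions correlate perfectly with the observations, yet the constant bias pulls the CCC down to about 0.71.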
Pussegoda, K; Ross, C J; Visscher, H; Yazdanpanah, M; Brooks, B; Rassekh, S R; Zada, Y F; Dubé, M-P; Carleton, B C; Hayden, M R
2013-08-01
Cisplatin is a widely used chemotherapeutic agent for the treatment of solid tumors. A serious complication of cisplatin treatment is permanent hearing loss. The aim of this study was to replicate previous genetic findings in an independent cohort of 155 pediatric patients. Associations were replicated for genetic variants in TPMT (rs12201199, P = 0.0013, odds ratio (OR) 6.1) and ABCC3 (rs1051640, P = 0.036, OR 1.8). A predictive model combining variants in TPMT, ABCC3, and COMT with clinical variables (patient age, vincristine treatment, germ-cell tumor, and cranial irradiation) significantly improved the prediction of hearing-loss development as compared with using clinical risk factors alone (area under the curve (AUC) 0.786 vs. 0.708, P = 0.00048). The novel combination of genetic and clinical factors predicted the risk of hearing loss with a sensitivity of 50.3% and a specificity of 92.7%. These findings provide evidence to support the importance of TPMT, COMT, and ABCC3 in the prediction of cisplatin-induced hearing loss in children. PMID:23588304
Hou, Tingjun; Xu, Xiaojie
2002-12-01
In this study, we investigated the relationships between the brain-blood concentration ratios of 96 structurally diverse compounds and a large number of structurally derived descriptors. The linear models were based on molecular descriptors that can be calculated for any compound simply from knowledge of its molecular structure. The linear correlation coefficients of the models were optimized by genetic algorithms (GAs), and the descriptors used in the linear models were automatically selected from 27 structurally derived descriptors. The GA optimizations resulted in a group of linear models with three or four molecular descriptors with good statistical significance. Changes in descriptor usage as the evolution proceeded demonstrate that the octanol/water partition coefficient and the partial negative solvent-accessible surface area multiplied by the negative charge are crucial to blood-brain barrier permeability. Moreover, we found that predictions combining multiple QSPR models from GA optimization gave good results despite the structural diversity of the compounds, and were better than predictions from the best single model. The predictions for the two external sets with 37 diverse compounds using multiple QSPR models indicate that the best linear models with four descriptors are sufficiently effective for predictive use. Considering the ease of computation of the descriptors, the linear models may be used as general utilities to screen the blood-brain barrier partitioning of drugs in a high-throughput fashion.
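A genetic algorithm for descriptor selection can be sketched as follows: each individual is a set of k descriptor indices, fitness is the R² of an ordinary least-squares fit on those descriptors, and new subsets are produced by crossover and mutation while the better half of the population is kept. The population size, mutation rate and operators are assumptions; the study optimized correlation coefficients over 27 descriptors, and this pure-Python version is only practical for small data.

```python
import random


def solve(a, b):
    """Solve a small linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(X, y, cols):
    """R^2 of an OLS fit of y on an intercept plus the chosen descriptor columns."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1 - ss_res / ss_tot


def ga_select(X, y, n_desc, k, pop_size=30, generations=40, seed=0):
    """Evolve k-descriptor subsets, keeping the fitter half each generation (elitism)."""
    rng = random.Random(seed)

    def mutate(subset):
        out = set(subset)
        out.discard(rng.choice(sorted(out)))  # drop one descriptor ...
        while len(out) < k:                   # ... and refill with random ones
            out.add(rng.randrange(n_desc))
        return tuple(sorted(out))

    def crossover(a, b):
        pool = sorted(set(a) | set(b))
        return tuple(sorted(rng.sample(pool, k)))

    pop = [tuple(sorted(rng.sample(range(n_desc), k))) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: r_squared(X, y, s), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            child = crossover(*rng.sample(parents, 2))
            if rng.random() < 0.3:
                child = mutate(child)
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda s: r_squared(X, y, s))
```

Averaging the predictions of several high-scoring subsets from the final population, rather than keeping only the best one, would mirror the multi-model prediction the abstract reports as more accurate.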
Integrating paleoecology and genetics of bird populations in two sky island archipelagos
McCormack, John E; Bowen, Bonnie S; Smith, Thomas B
2008-01-01
Background Genetic tests of paleoecological hypotheses have been rare, partly because recent genetic divergence is difficult to detect and time. According to fossil plant data, continuous woodland in the southwestern USA and northern Mexico became fragmented during the last 10,000 years, as warming caused cool-adapted species to retreat to high elevations. Most genetic studies of resulting 'sky islands' have either failed to detect recent divergence or have found discordant evidence for ancient divergence. We test this paleoecological hypothesis for the region with intraspecific mitochondrial DNA and microsatellite data from sky-island populations of a sedentary bird, the Mexican jay (Aphelocoma ultramarina). We predicted that populations on different sky islands would share common, ancestral alleles that existed during the last glaciation, but that populations on each sky island, owing to their isolation, would contain unique variants of postglacial origin. We also predicted that divergence times estimated from corrected genetic distance and a coalescence model would post-date the last glacial maximum. Results Our results provide multiple independent lines of support for postglacial divergence, with the predicted pattern of shared and unique mitochondrial DNA haplotypes appearing in two independent sky-island archipelagos, and most estimates of divergence time based on corrected genetic distance post-dating the last glacial maximum. Likewise, an isolation model based on multilocus gene coalescence indicated postglacial divergence of five pairs of sky islands. In contrast to their similar recent histories, the two archipelagos had dissimilar historical patterns in that sky islands in Arizona showed evidence for older divergence, suggesting different responses to the last glaciation. 
Conclusion This study is one of the first to provide explicit support from genetic data for a postglacial divergence scenario predicted by one of the best paleoecological records in the world. Our results demonstrate that sky islands act as generators of genetic diversity at both recent and historical timescales and underscore the importance of thorough sampling and the use of loci with fast mutation rates to studies that test hypotheses concerning recent genetic divergence. PMID:18588695
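The abstract dates divergence from a "corrected genetic distance" but does not give the correction or the mutation rate. One common correction is Nei and Li's net divergence. It subtracts the mean within-population diversity from the between-population distance, d_A = d_XY - (d_X + d_Y)/2. If d_A accumulates at twice the per-lineage rate, the split time is t = d_A / (2μ).

This is a sketch under those assumptions. The function names and the numbers in the example are hypothetical.

```python
def net_divergence(d_xy, d_x, d_y):
    """Nei and Li's net divergence: between-population distance minus mean within-population diversity."""
    return d_xy - (d_x + d_y) / 2


def divergence_time(d_a, rate_per_lineage):
    """Years since the split, assuming d_a accumulates at 2 * rate (one rate per diverging lineage)."""
    return d_a / (2 * rate_per_lineage)


# Hypothetical inputs: distances in substitutions per site, rate per site per year per lineage.
d_a = net_divergence(d_xy=0.0030, d_x=0.0026, d_y=0.0030)
years = divergence_time(d_a, rate_per_lineage=1e-8)
```

Here d_A is about 0.0002, giving roughly 10,000 years. The estimate is very sensitive to the assumed rate, which is one reason the study also used a coalescent isolation model.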
The relationship between population adaptive potential and extinction risk in a changing environment is not well understood. Although the expectation is that genetic diversity is directly related to the capacity of populations to adapt, the statistical and predictive aspects of ...
Understanding patterns of post-establishment spread by invasive species is critically important for the design of effective management strategies and the development of appropriate theoretical models predicting spatial expansion of introduced populations. Here we explore genetic ...
Transgressive Hybrids as Hopeful Monsters.
Dittrich-Reed, Dylan R; Fitzpatrick, Benjamin M
2013-06-01
The origin of novelty is a critical subject for evolutionary biologists. Early geneticists speculated about the sudden appearance of new species via special macromutations, epitomized by Goldschmidt's infamous "hopeful monster". Although these ideas were easily dismissed by the insights of the Modern Synthesis, a lingering fascination with the possibility of sudden, dramatic change has persisted. Recent work on hybridization and gene exchange suggests an underappreciated mechanism for the sudden appearance of evolutionary novelty that is entirely consistent with the principles of modern population genetics. Genetic recombination in hybrids can produce transgressive phenotypes, "monstrous" phenotypes beyond the range of parental populations. Transgressive phenotypes can be products of epistatic interactions or additive effects of multiple recombined loci. We compare several epistatic and additive models of transgressive segregation in hybrids and find that they are special cases of a general, classic quantitative genetic model. The Dobzhansky-Muller model predicts "hopeless" monsters, sterile and inviable transgressive phenotypes. The Bateson model predicts "hopeful" monsters with fitness greater than either parental population. The complementation model predicts both. Transgressive segregation after hybridization can rapidly produce novel phenotypes by recombining multiple loci simultaneously. Admixed populations will also produce many similar recombinant phenotypes at the same time, increasing the probability that recombinant "hopeful monsters" will establish true-breeding evolutionary lineages. Recombination is not the only (or even most common) process generating evolutionary novelty, but might be the most credible mechanism for sudden appearance of new forms.
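The additive route to transgressive segregation can be made concrete with a minimal example: two homozygous parents carrying '+' alleles at complementary loci. This sketch is not the authors' general model. It assumes:

- unlinked loci with equal effects;
- a phenotype equal to the count of '+' alleles;
- parents given as hypothetical 0/1 vectors (1 = homozygous '+').

It computes the exact fraction of F2 offspring whose score falls outside the parental range.

```python
from fractions import Fraction


def f2_distribution(p1, p2):
    """Exact F2 distribution of the '+' allele count for two homozygous parents (unlinked loci)."""
    dist = {0: Fraction(1)}
    for a, b in zip(p1, p2):
        if a == b:
            locus = {2 * a: Fraction(1)}  # parents agree: the locus is fixed in the F2
        else:
            # F1 is heterozygous, so an F2 carries 0, 1 or 2 '+' alleles with prob 1/4, 1/2, 1/4
            locus = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
        new = {}
        for s, ps in dist.items():
            for g, pg in locus.items():
                new[s + g] = new.get(s + g, 0) + ps * pg
        dist = new
    return dist


def transgressive_fraction(p1, p2):
    """Probability that an F2 score lies outside the range spanned by the two parents."""
    lo, hi = sorted((2 * sum(p1), 2 * sum(p2)))
    return sum(p for s, p in f2_distribution(p1, p2).items() if s < lo or s > hi)
```

For example, `transgressive_fraction([1, 1, 1, 0], [0, 0, 0, 1])` gives 9/128. The parents score 6 and 2, but recombination yields F2 scores of 0, 1, 7 or 8. When one parent carries every '+' allele (`[1, 1]` vs `[0, 0]`), the fraction is 0, because recombination cannot exceed the extreme parent.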
Constance I. Millar; Bohun B. Kinloch; Robert D. Westfall
1992-01-01
Genetic diversity in sugar pine will be severely reduced by the blister rust pandemic predicted within the next 50 to 75 years. We model effects of the epidemic on genetic diversity at the stand and landscape levels for both natural and artificial regeneration. In natural stands, because natural frequencies of the dominant gene (R) for resistance are low, the most...
Malosetti, Marcos; Ribaut, Jean-Marcel; van Eeuwijk, Fred A.
2013-01-01
Genotype-by-environment interaction (GEI) is an important phenomenon in plant breeding. This paper presents a series of models for describing, exploring, understanding, and predicting GEI. All models start from a two-way table of genotype-by-environment means. First, a series of descriptive and exploratory models/approaches are presented: the Finlay–Wilkinson model, the AMMI model, and the GGE biplot. These approaches have in common that they merely try to group genotypes and environments and use no information other than the two-way table of means. Next, factorial regression is introduced as an approach that explicitly adds genotypic and environmental covariates to describe and explain GEI. Finally, QTL modeling is presented as a natural extension of factorial regression, in which marker information is translated into genetic predictors. Tests for the regression coefficients corresponding to these genetic predictors are tests for main-effect QTL expression and QTL-by-environment interaction (QEI). QTL models in which QEI depends on environmental covariables form an interesting model class for predicting GEI for new genotypes and new environments. For realistic modeling of genotypic differences across multiple environments, sophisticated mixed models are necessary to allow for heterogeneity of genetic variances and correlations across environments. The use and interpretation of all models are illustrated with an example data set from the CIMMYT maize breeding program, containing environments differing in drought and nitrogen stress. To help readers carry out the statistical analyses, GenStat® programs (15th Edition and Discovery® version) are presented in an “Appendix.” PMID:23487515
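The first model listed, Finlay–Wilkinson regression, needs only the two-way table of means:

- Each environment gets an index, its mean minus the grand mean.
- Each genotype's means are regressed on that index.
- A slope above 1 marks a genotype that responds more strongly than average to better environments.

This is a plain-Python sketch, not the GenStat code from the paper's appendix. It assumes a complete table with no missing cells.

```python
def finlay_wilkinson(table):
    """Finlay-Wilkinson regression on a complete genotype x environment table of means.

    table[i][j] is the mean of genotype i in environment j. Returns the environmental
    index (environment mean minus grand mean) and, per genotype, (genotype mean, slope).
    """
    n_gen, n_env = len(table), len(table[0])
    grand_mean = sum(sum(row) for row in table) / (n_gen * n_env)
    env_index = [sum(row[j] for row in table) / n_gen - grand_mean for j in range(n_env)]
    sxx = sum(e * e for e in env_index)
    if sxx == 0:
        raise ValueError("environments must differ in mean for the slopes to be defined")
    fits = []
    for row in table:
        # The index sums to zero, so the OLS slope reduces to sum(e_j * y_ij) / sum(e_j^2).
        slope = sum(e * y for e, y in zip(env_index, row)) / sxx
        fits.append((sum(row) / n_env, slope))
    return env_index, fits
```

With the toy table `[[2, 4, 6], [4, 5, 6]]`, the index is `[-1.5, 0.0, 1.5]`. The first genotype has slope 4/3 (responsive) and the second 2/3 (stable), and the slopes average to 1.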
Belay, T K; Svendsen, M; Kowalski, Z M; Ådnøy, T
2017-08-01
The aim of this study was to estimate genetic parameters for blood β-hydroxybutyrate (BHB) predicted from milk spectra and for clinical ketosis (KET), and to examine the genetic association of blood BHB with KET and milk production traits (milk, fat, protein, and lactose yields, and milk fat, protein, and lactose contents). Data on milk traits, KET, and milk spectra were obtained from the Norwegian Dairy Herd Recording System with legal permission from TINE SA (Ås, Norway), the Norwegian Dairy Association that manages the central database. Data recorded up to 120 d after calving were considered. Blood BHB was predicted from milk spectra using a calibration model developed from milk spectra and blood BHB measured in Polish dairy cows. The predicted blood BHB was grouped by days in milk into 4 groups, and each group was considered as a trait. The milk components for test-day milk samples were obtained by Fourier transform mid-infrared spectrometer with previously developed calibration equations from Foss (Hillerød, Denmark). Veterinarian-recorded KET data from 15 d before calving to 120 d after calving were used. Data were analyzed using univariate or bivariate linear animal models. Heritability estimates for predicted blood BHB at different stages of lactation were moderate, ranging from 0.250 to 0.365. The heritability estimate for KET from univariate analysis was 0.078, and the corresponding average estimate from bivariate analysis with BHB or milk production traits was 0.002. Genetic correlations between BHB traits were higher for adjacent lactation intervals and decreased as intervals were further apart. Predicted blood BHB at first test day was moderately genetically correlated with KET (0.469) and milk traits (ranging from -0.367 with protein content to 0.277 with milk yield), except for milk fat content across lactation stages, which had a near-zero genetic correlation with BHB (0.033). 
These genetic correlations indicate that a lower BHB is genetically associated with higher milk protein and lactose contents, but with lower yields of milk, fat, protein, and lactose, and with a lower frequency of KET. Estimates of genetic correlation of KET with milk production traits ranged from -0.333 (with protein content) to 0.178 (with milk yield). Blood BHB can routinely be predicted from milk spectra analyzed from test-day milk samples, and thus provides a practical alternative for selecting cows with lower susceptibility to ketosis, even though the correlations are moderate. The Authors. Published by the Federation of Animal Science Societies and Elsevier Inc. on behalf of the American Dairy Science Association®. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
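The heritabilities and genetic correlations reported above are ratios of (co)variance components estimated by the animal models. This sketch assumes the simplest decomposition: phenotypic variance equals additive genetic plus residual variance. The study's models may include further random effects that would also enter the denominator. The function names and inputs are hypothetical.

```python
import math


def heritability(var_additive, var_residual):
    """Narrow-sense heritability when phenotypic variance = additive + residual variance."""
    return var_additive / (var_additive + var_residual)


def genetic_correlation(cov_additive, var_additive_1, var_additive_2):
    """Additive genetic correlation between two traits from a bivariate model's (co)variances."""
    return cov_additive / math.sqrt(var_additive_1 * var_additive_2)
```

For instance, an additive variance of 0.3 against a residual variance of 0.7 gives h² = 0.3. An additive covariance of 0.3 between traits with additive variances 0.25 and 1.0 gives r_g = 0.6.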
Machine learning derived risk prediction of anorexia nervosa.
Guo, Yiran; Wei, Zhi; Keating, Brendan J; Hakonarson, Hakon
2016-01-20
Anorexia nervosa (AN) is a complex psychiatric disease with a moderate to strong genetic contribution. In addition to conventional genome-wide association (GWA) studies, researchers have been using machine learning methods in conjunction with genomic data to predict the risk of diseases in which genetics plays an important role. In this study, we collected whole-genome genotyping data on 3940 AN cases and 9266 controls from the Genetic Consortium for Anorexia Nervosa (GCAN), the Wellcome Trust Case Control Consortium 3 (WTCCC3), the Price Foundation Collaborative Group and the Children's Hospital of Philadelphia (CHOP), and applied machine learning methods to predict AN disease risk. Prediction performance is measured by the area under the receiver operating characteristic curve (AUC), which indicates how well the model distinguishes cases from unaffected controls. A logistic regression model with a lasso penalty achieved an AUC of 0.693, while Support Vector Machines and Gradient Boosted Trees reached AUCs of 0.691 and 0.623, respectively. Analyses with different sample sizes suggest that larger datasets are required to optimize the machine learning models and achieve higher AUC values. To our knowledge, this is the first attempt to assess AN risk based on genome-wide genotype-level data. Future integration of genomic, environmental and family-based information is likely to improve the AN risk evaluation process, eventually benefitting AN patients and families in the clinical setting.
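The AUC used above to compare the models has a direct probabilistic reading: it is the chance that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. This is the Mann–Whitney formulation. The sketch below computes it by brute force over all case-control pairs. It assumes hypothetical score lists and is not the study's evaluation code.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of case-control pairs ranked correctly (ties count 0.5)."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

`auc([0.9, 0.8, 0.4], [0.3, 0.5])` is 5/6, since only the pair (0.4, 0.5) is misordered. An uninformative score gives about 0.5, which puts the reported 0.62 to 0.69 in context. The quadratic loop is fine for illustration; for cohorts of thousands, a rank-based formula is faster.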
Evaluation of a Genetic Risk Score to Improve Risk Prediction for Alzheimer's Disease.
Chouraki, Vincent; Reitz, Christiane; Maury, Fleur; Bis, Joshua C; Bellenguez, Celine; Yu, Lei; Jakobsdottir, Johanna; Mukherjee, Shubhabrata; Adams, Hieab H; Choi, Seung Hoan; Larson, Eric B; Fitzpatrick, Annette; Uitterlinden, Andre G; de Jager, Philip L; Hofman, Albert; Gudnason, Vilmundur; Vardarajan, Badri; Ibrahim-Verbaas, Carla; van der Lee, Sven J; Lopez, Oscar; Dartigues, Jean-François; Berr, Claudine; Amouyel, Philippe; Bennett, David A; van Duijn, Cornelia; DeStefano, Anita L; Launer, Lenore J; Ikram, M Arfan; Crane, Paul K; Lambert, Jean-Charles; Mayeux, Richard; Seshadri, Sudha
2016-06-18
Effective prevention of Alzheimer's disease (AD) requires the development of risk prediction tools permitting preclinical intervention. We constructed a genetic risk score (GRS) comprising common genetic variants associated with AD, evaluated its association with incident AD and assessed its capacity to improve risk prediction over traditional models based on age, sex, education, and APOEɛ4. In eight prospective cohorts included in the International Genomics of Alzheimer's Project (IGAP), we derived a weighted sum of risk alleles from the 19 top SNPs reported by the IGAP GWAS in participants aged 65 and older without prevalent dementia. Hazard ratios (HR) of incident AD were estimated in Cox models. Improvement in risk prediction was measured by the difference in C-index (ΔC), the integrated discrimination improvement (IDI) and the continuous net reclassification improvement (NRI>0). Overall, 19,687 participants at risk were included, of whom 2,782 developed AD. The GRS was associated with a 17% increase in AD risk (pooled HR = 1.17; 95% CI = [1.13-1.21] per standard deviation increase in GRS; p-value = 2.86×10⁻¹⁶). This association was stronger among persons with at least one APOEɛ4 allele (HR_GRS = 1.24; 95% CI = [1.15-1.34]) than in others (HR_GRS = 1.13; 95% CI = [1.08-1.18]; p_interaction = 3.45×10⁻²). Risk prediction after seven years of follow-up showed a small improvement when adding the GRS to age, sex, APOEɛ4, and education (ΔC-index = 0.0043 [0.0019-0.0067]). Similar patterns were observed for IDI and NRI>0. In conclusion, a risk score incorporating common genetic variation outside the APOEɛ4 locus improved AD risk prediction and may facilitate risk stratification for prevention trials.
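The ΔC-index above is the gain in Harrell's concordance index when the GRS is added to the base model. The C-index works as follows:

- A pair of participants is comparable when the one with the shorter follow-up had an event.
- A comparable pair is concordant when that participant was assigned the higher risk.
- Ties in risk count one half.

This is a minimal sketch of the unadjusted statistic. It assumes distinct follow-up times, ignores pairs tied on time, and omits the seven-year truncation and cohort pooling used in the study. The inputs are hypothetical.

```python
def harrell_c(times, events, risk):
    """Harrell's C: share of comparable pairs in which the earlier event has the higher risk."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:  # j was still event-free when i had the event
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable
```

ΔC would then be `harrell_c(t, e, risk_with_grs) - harrell_c(t, e, risk_base)`, evaluated on the same participants. A difference of 0.0043, as reported, is a small shift on a scale where 0.5 means no discrimination.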
Context-sensitive network-based disease genetics prediction and its implications in drug discovery
Chen, Yang; Xu, Rong
2017-01-01
Abstract Motivation: Disease phenotype networks play an important role in computational approaches to identifying new disease-gene associations. Current disease phenotype networks often model disease relationships based on pairwise similarities and therefore ignore the specific context in which two diseases are connected. In this study, we propose a new strategy to model disease associations using context-sensitive networks (CSNs). We developed a CSN-based phenome-driven approach for disease genetics prediction and investigated the translational potential of the predicted genes in drug discovery. Results: We constructed CSNs by directly connecting diseases with associated phenotypes. We constructed two CSNs using different data sources; the two networks contain 26 790 and 13 822 nodes, respectively. We integrated the CSNs with a genetic functional relationship network and predicted disease genes using a network-based ranking algorithm. For comparison, we built similarity-based disease networks (SBNs) using the same disease phenotype data. In a de novo cross-validation for 3324 diseases, the CSN-based approach significantly improved the average rank from the top 12.6% to the top 8.8% for all tested genes compared with the SBN-based approach (p
Dong, Zhanshan; Danilevskaya, Olga; Abadie, Tabare; Messina, Carlos; Coles, Nathan; Cooper, Mark
2012-01-01
The transition from vegetative to reproductive development is a critical event in the plant life cycle. Accurate prediction of flowering time in elite germplasm is important for decisions in maize breeding programs and for best agronomic practices. Understanding of the genetic control of flowering time in maize has advanced significantly in the past decade. Through comparative genomics, mutant analysis, genetic analysis and QTL cloning, and transgenic approaches, more than 30 flowering time candidate genes in maize have been revealed, and the relationships among these genes have been partially uncovered. Based on knowledge of these candidate genes, a conceptual gene regulatory network model for the genetic control of flowering time in maize is proposed. To demonstrate the potential of the proposed gene regulatory network model, a first attempt was made to develop a dynamic gene network model to predict the flowering time of maize genotypes varying for specific genes. The dynamic gene network model is composed of four genes and was built on the basis of the gene expression dynamics of the two late-flowering mutants id1 and dlf1, the early-flowering landrace Gaspe Flint, and the temperate inbred B73. The model was evaluated against the phenotypic data of the id1 dlf1 double mutant and the ZMM4-overexpressing transgenic lines. The model provides a working example that leverages knowledge from model organisms to use maize genomic information to predict a whole-plant trait phenotype, flowering time, of maize genotypes.
How and how much does RAD-seq bias genetic diversity estimates?
Cariou, Marie; Duret, Laurent; Charlat, Sylvain
2016-11-08
RAD-seq is a powerful tool, increasingly used in population genomics. However, earlier studies have raised red flags regarding possible biases associated with this technique. In particular, polymorphism on restriction sites results in preferential sampling of closely related haplotypes, so that RAD data tend to underestimate genetic diversity. Here we (1) clarify the theoretical basis of this bias, highlighting the potential confounding effects of population structure and selection, (2) test predictions against real data from in silico digestion of full genomes, and (3) provide a proof of concept toward an ABC-based correction of the RAD-seq bias. Under a neutral and panmictic model, we confirm the previously established relationship between the true polymorphism and its RAD-based estimation, showing a more pronounced bias when polymorphism is high. Using more elaborate models, we show that selection, resulting in heterogeneous levels of polymorphism along the genome, exacerbates the bias and leads to a more pronounced underestimation. By contrast, spatial genetic structure tends to reduce the bias. We test the neutral and panmictic model against "ideal" empirical data (in silico RAD-sequencing) using full genomes from natural populations of the fruit fly Drosophila melanogaster and the fungus Schizophyllum commune, harbouring moderate and high genetic diversity, respectively. In D. melanogaster, predictions fit the model, but the small difference between the true and RAD polymorphism makes this comparison insensitive to deviations from the model. In the highly polymorphic fungus, the model captures a large part of the bias but makes inaccurate predictions. Accordingly, ABC corrections based on this model improve the estimates, albeit with some imprecision. The RAD-seq underestimation of genetic diversity associated with polymorphism in restriction sites becomes more pronounced when polymorphism is high. 
In practice, this means that in many systems where polymorphism does not exceed 2%, the bias is of minor importance compared with other sources of uncertainty, such as heterogeneous base composition or technical artefacts. The neutral panmictic model provides a practical means to correct the bias through ABC, albeit with some imprecision. More elaborate ABC methods might integrate additional parameters, such as population structure and selection, but their opposite effects could hinder accurate corrections.
NASA Astrophysics Data System (ADS)
Wu, Jiasheng; Cao, Lin; Zhang, Guoqiang
2018-02-01
Air-conditioning cooling towers are widely used as cooling equipment, and they would have broad application prospects if they could be used in reverse as a heat source under heat-pump heating conditions. In view of the complex non-linear relationships among the parameters governing heat and mass transfer inside the tower, this paper establishes a BP neural network model optimized by a genetic algorithm (GABP neural network model) for the reverse use of a cross-flow cooling tower. The model has 6 inputs, 13 hidden nodes, and 8 outputs. With this model, 8 main performance parameters of the tower were predicted, including the outlet air dry-bulb temperature, wet-bulb temperature, water temperature, heat, sensible heat ratio, heat-absorbing efficiency, and Lewis number. Furthermore, the established network model was used to predict the water temperature and heat absorption of the tower at different inlet temperatures. The mean relative errors (MREs) between the BP-predicted values and the experimental values are 4.47%, 3.63%, 2.38%, 3.71%, 6.35%, 3.14%, 13.95%, and 6.80%, respectively; the MREs between the GABP-predicted values and the experimental values are 2.66%, 3.04%, 2.27%, 3.02%, 6.89%, 3.17%, 11.50%, and 6.57%, respectively. The results show that the GABP network model predicts better overall than the BP network model, and the simulation results are broadly consistent with the actual situation. The GABP network model can predict the heat and mass transfer performance of the cross-flow cooling tower well.
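The comparison above rests on the mean relative error: the average of |predicted − observed| / |observed|, expressed in percent. This sketch assumes that usual definition, since the abstract does not spell it out. The function name is illustrative.

```python
def mean_relative_error(predicted, observed):
    """Mean relative error in percent: average of |pred - obs| / |obs| (obs must be non-zero)."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two equally long, non-empty sequences")
    return 100.0 * sum(abs(p - o) / abs(o) for p, o in zip(predicted, observed)) / len(observed)
```

The reported per-parameter values can be summarized the same way. Averaging the eight MREs gives about 5.55% for BP and 4.89% for GABP. GABP is nonetheless slightly worse on two parameters (6.89% vs 6.35% and 3.17% vs 3.14%), hence "better overall".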
Lobo, Daniel; Morokuma, Junji; Levin, Michael
2016-09-01
Automated computational methods can infer dynamic regulatory network models directly from temporal and spatial experimental data, such as genetic perturbations and their resultant morphologies. Recently, a computational method was able to reverse-engineer the first mechanistic model of planarian regeneration that can recapitulate the main anterior-posterior patterning experiments published in the literature. Validating this comprehensive regulatory model via novel experiments that had not yet been performed would add to our understanding of the remarkable regeneration capacity of planarian worms and demonstrate the power of this automated methodology. Using the Michigan Molecular Interactions and STRING databases and the MoCha software tool, we characterized as hnf4 an unknown regulatory gene predicted to exist by the reverse-engineered dynamic model of planarian regeneration. Then, we used the dynamic model to predict the morphological outcomes under different single and multiple knock-downs (RNA interference) of hnf4 and its predicted gene pathway interactors β-catenin and hh. Interestingly, the model predicted that RNAi of hnf4 would rescue the abnormal regenerated phenotype (tailless) of RNAi of hh in amputated trunk fragments. Finally, we validated these predictions in vivo by performing the same surgical and genetic experiments with planarian worms, obtaining the same phenotypic outcomes predicted by the reverse-engineered model. These results suggest that hnf4 is a regulatory gene in planarian regeneration, validate the computational predictions of the reverse-engineered dynamic model, and demonstrate the value of the automated methodology for discovering novel genes, pathways and experimental phenotypes. michael.levin@tufts.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Wolfe, Marnin D; Kulakow, Peter; Rabbi, Ismail Y; Jannink, Jean-Luc
2016-08-31
In clonally propagated crops, non-additive genetic effects can be effectively exploited by identifying superior genetic individuals as varieties. Cassava (Manihot esculenta Crantz) is a clonally propagated staple food crop that feeds hundreds of millions of people. We quantified the amount and nature of non-additive genetic variation for three key traits in a breeding population of cassava from sub-Saharan Africa using additive and non-additive genome-wide marker-based relationship matrices. We then assessed the accuracy of genomic prediction for total (additive plus non-additive) genetic value. We confirmed previous findings based on diallel populations that non-additive genetic variation is significant for key cassava traits. Specifically, we found that dominance is particularly important for root yield and that epistasis contributes strongly to variation in cassava mosaic disease (CMD) resistance. Further, we showed that total genetic value predicted observed phenotypes more accurately than additive-only models for root yield, but not for dry matter content, which is mostly additive, or for CMD resistance, which has high narrow-sense heritability. We discuss the implications of these results for cassava breeding and place our work in the context of previous results in cassava and other plant and animal species. Copyright © 2016 Author et al.
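The abstract does not state how its additive and non-additive relationship matrices were built. A common construction is:

- VanRaden's additive matrix, G = ZZ′ / (2 Σ p_k(1 − p_k)), where Z is the 0/1/2 marker matrix centred by twice the allele frequencies;
- an additive-by-additive epistatic matrix approximated by the element-wise (Hadamard) product G ∘ G.

This sketch shows only those two matrices, as an assumption about the general approach. A dominance matrix would need a separate heterozygosity coding, which is not shown.

```python
def vanraden_g(M):
    """Additive genomic relationship matrix from genotypes M[i][k] in {0, 1, 2} (VanRaden-style)."""
    n, m = len(M), len(M[0])
    p = [sum(row[k] for row in M) / (2 * n) for k in range(m)]  # allele frequency per marker
    denom = 2 * sum(pk * (1 - pk) for pk in p)
    if denom == 0:
        raise ValueError("all markers are monomorphic")
    Z = [[row[k] - 2 * p[k] for k in range(m)] for row in M]  # centre each marker column
    return [[sum(zi[k] * zj[k] for k in range(m)) / denom for zj in Z] for zi in Z]


def hadamard(A, B):
    """Element-wise product, e.g. G o G as an additive-by-additive epistatic relationship matrix."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
```

For three toy clones genotyped at two markers, `vanraden_g([[0, 2], [2, 0], [1, 1]])` gives `[[2, -2, 0], [-2, 2, 0], [0, 0, 0]]`. The first two clones are opposite homozygotes, and the third sits at the population mean.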
Kuo, Ho-Chang; Wong, Henry Sung-Ching; Chang, Wei-Pin; Chen, Ben-Kuen; Wu, Mei-Shin; Yang, Kuender D; Hsieh, Kai-Sheng; Hsu, Yu-Wen; Liu, Shih-Feng; Liu, Xiao; Chang, Wei-Chiao
2017-10-01
Intravenous immunoglobulin (IVIG) is the treatment of choice in Kawasaki disease (KD). IVIG is used to prevent cardiovascular complications related to KD. However, a proportion of KD patients have persistent fever after IVIG treatment and are defined as IVIG resistant. To develop a risk scoring system based on genetic markers to predict IVIG responsiveness in KD patients, a total of 150 KD patients (126 IVIG responders and 24 IVIG nonresponders) were recruited for this study. A genome-wide association analysis was performed to compare the 2 groups and identified risk alleles for IVIG resistance. A weighted genetic risk score was calculated by summing, over the risk SNPs, the natural log of each SNP's odds ratio multiplied by the number of risk alleles carried. Eleven single-nucleotide polymorphisms were identified by the genome-wide association study. The KD patients were categorized into 3 groups based on their calculated weighted genetic risk score. Results indicated a significant association between the weighted genetic risk score (groups 3 and 4 versus group 1) and the response to IVIG (Fisher's exact P values 4.518×10⁻³ and 8.224×10⁻¹⁰, respectively). This is the first weighted genetic risk score study based on a genome-wide association study in KD. The predictive model integrated the additive effects of all 11 single-nucleotide polymorphisms to provide a prediction of the responsiveness to IVIG. © 2017 The Authors.
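The weighting described above can be written as wGRS = Σ_k ln(OR_k) × n_k, where n_k ∈ {0, 1, 2} is the number of risk alleles a patient carries at SNP k. This is a minimal sketch of that formula. The odds ratios and genotypes in the example are hypothetical, and the study's group cut-offs are not reproduced.

```python
import math


def weighted_grs(odds_ratios, risk_allele_counts):
    """Weighted genetic risk score: sum over SNPs of ln(OR) times the risk-allele count (0, 1 or 2)."""
    if len(odds_ratios) != len(risk_allele_counts):
        raise ValueError("one odds ratio per SNP is required")
    return sum(math.log(or_k) * n_k for or_k, n_k in zip(odds_ratios, risk_allele_counts))
```

For two SNPs with odds ratios 2.0 and 1.5, a patient carrying one and two risk alleles, respectively, scores ln 2 + 2 ln 1.5 = ln 4.5 ≈ 1.504. Using ln(OR) makes the score additive on the log-odds scale, which matches "the additive effects" the model is said to integrate.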
Wang, Zhuo; Danziger, Samuel A; Heavner, Benjamin D; Ma, Shuyi; Smith, Jennifer J; Li, Song; Herricks, Thurston; Simeonidis, Evangelos; Baliga, Nitin S; Aitchison, John D; Price, Nathan D
2017-05-01
Gene regulatory and metabolic network models have been used successfully in many organisms, but inherent differences between them make networks difficult to integrate. Probabilistic Regulation Of Metabolism (PROM) provides a partial solution, but it does not incorporate network inference and underperforms in eukaryotes. We present an Integrated Deduced REgulation And Metabolism (IDREAM) method that combines statistically inferred Environment and Gene Regulatory Influence Network (EGRIN) models with the PROM framework to create enhanced metabolic-regulatory network models. We used IDREAM to predict phenotypes and genetic interactions between transcription factors and genes encoding metabolic activities in the eukaryote Saccharomyces cerevisiae. IDREAM models contain far fewer interactions than PROM models and yet produce significantly more accurate growth predictions. IDREAM consistently outperformed PROM using any of three popular yeast metabolic models and across three experimental growth conditions. Importantly, IDREAM's enhanced accuracy makes it possible to identify subtle synthetic growth defects. With experimental validation, these novel genetic interactions involving the pyruvate dehydrogenase complex suggested a new role for the fatty acid-responsive factor Oaf1 in regulating acetyl-CoA production in glucose-grown cells.
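The PROM framework that IDREAM builds on turns a transcription factor–target interaction into a probability. It estimates, from binarized expression data, how often the target stays ON when the TF is OFF. In a TF knockout, the flux bound of the target gene's reactions is then scaled by that probability.

This sketch covers only that core step, under these assumptions:

- expression is already binarized;
- returning 1.0 when the TF is never observed OFF is a hypothetical default;
- significance testing and flux balance analysis are omitted;
- the EGRIN inference that IDREAM adds is not shown.

```python
def prom_probability(tf_on, target_on):
    """Estimate P(target ON | TF OFF) from paired binarized expression samples."""
    target_when_tf_off = [t for f, t in zip(tf_on, target_on) if not f]
    if not target_when_tf_off:
        return 1.0  # no evidence about the TF-off state: leave the reaction unconstrained
    return sum(target_when_tf_off) / len(target_when_tf_off)


def constrained_bound(v_max, probability):
    """Scale a reaction's maximum flux by the probability that its gene remains ON."""
    return v_max * probability
```

With five hypothetical samples, `tf_on = [1, 0, 0, 0, 1]` and `target_on = [1, 1, 0, 0, 1]`, the target is ON in one of three TF-off samples. A bound of 9 is therefore reduced to 3 in the simulated knockout.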
Effects of social contact and zygosity on 21-y weight change in male twins.
McCaffery, Jeanne M; Franz, Carol E; Jacobson, Kristen; Leahey, Tricia M; Xian, Hong; Wing, Rena R; Lyons, Michael J; Kremen, William S
2011-08-01
Recent evidence indicates that social contact is related to similarities in weight gain over time. However, no studies have examined this effect in a twin design, in which genetic and other environmental effects can also be estimated. We determined whether the frequency of social contact is associated with similarity in weight change from young adulthood (mean age: 20 y) to middle age (mean age: 41 y) in twins and quantified the percentage of variance in weight change attributable to social contact, genetic factors, and other environmental influences. Participants were 1966 monozygotic and 1529 dizygotic male twin pairs from the Vietnam-Era Twin Registry. Regression models tested whether frequency of social contact and zygosity predicted twin pair similarity in body mass index (BMI) change and weight change. Twin modeling was used to partition the percentage variance attributable to social contact, genetic, and other environmental effects. Twins gained an average of 3.99 BMI units, or 13.23 kg (29.11 lb), over 21 y. In regression models, both zygosity (P < 0.001) and degree of social contact (P < 0.02) significantly predicted twin pair similarity in BMI change. In twin modeling, social contact between twins contributed 16% of the variance in BMI change (P < 0.001), whereas genetic factors contributed 42%, with no effect of additional shared environmental factors (1%). Similar results were obtained for weight change. Frequency of social contact significantly predicted twin pair similarity in BMI and weight change over 21 y, independent of zygosity and other shared environmental influences.
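The twin study partitioned variance with structural twin models that include a social-contact term. The classical intuition behind such partitioning is Falconer's approximation. With MZ and DZ twin-pair correlations r_MZ and r_DZ, the shares are:

- additive genetic: a² = 2(r_MZ − r_DZ);
- shared environment: c² = 2r_DZ − r_MZ;
- unique environment: e² = 1 − r_MZ.

This sketch is a simplification, not the study's model. It has no social-contact component, and the correlations in the example are hypothetical.

```python
def falconer(r_mz, r_dz):
    """Falconer's variance shares from MZ and DZ twin correlations (classical ACE assumptions)."""
    a2 = 2 * (r_mz - r_dz)  # additive genetic share
    c2 = 2 * r_dz - r_mz    # shared (common) environment share
    e2 = 1 - r_mz           # unique environment, including measurement error
    return a2, c2, e2
```

For example, r_MZ = 0.6 and r_DZ = 0.4 give shares of about 0.4, 0.2 and 0.4. In the study, frequent social contact is an extra source of twin similarity, and a model that ignored it would partly misattribute it to the other components.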
Laurenson, Yan C S M; Kyriazakis, Ilias; Bishop, Stephen C
2013-10-18
Estimated breeding values (EBV) for faecal egg count (FEC) and genetic markers for host resistance to nematodes may be used to identify resistant animals for selective breeding programmes. Similarly, targeted selective treatment (TST) requires the ability to identify the animals that will benefit most from anthelmintic treatment. A mathematical model was used to combine these concepts and evaluate the potential of using genetic-based methods to identify animals for a TST regime. EBVs obtained by genomic prediction were predicted to be the best determinant criterion for TST in terms of the impact on average empty body weight and average FEC, whereas pedigree-based EBVs for FEC were predicted to be marginally worse than using phenotypic FEC as a determinant criterion. Whilst each method has financial implications, if the identification of host resistance is incorporated into wider genomic selection indices or selective breeding programmes, then genetic or genomic information may plausibly be included in TST regimes. Copyright © 2013 Elsevier B.V. All rights reserved.
[The genetics of thrombosis in cancer].
Soria, José Manuel; López, Sonia
2015-01-01
Venous thromboembolism (VTE) is a multifactorial and complex disease in which the interaction of genetic factors (estimated at 60%) and environmental factors (e.g., the use of oral contraceptives, pregnancy, immobility and cancer) determines the risk of thrombosis for each individual. In particular, the association between thrombosis and cancer is well established. Approximately 20% of patients with cancer develop a thromboembolic event over the course of the natural history of the tumor process, with thrombosis being the second leading cause of death for these patients. One of the greatest challenges currently facing the field of oncology is the identification of patients at high risk of VTE who can benefit from thromboprophylaxis. Currently, there is a VTE risk prediction model for patients with cancer (the Khorana risk score); however, its ability to identify patients at high risk is very low. Notably, this score, which is based on five clinical parameters, ignores the genetic variability associated with VTE risk. In this article, we present the preliminary results of the Oncothromb study, whose objective is to develop an individual VTE risk prediction model for patients with cancer who are treated with outpatient chemotherapy. Our model includes the clinical and genetic data of each patient (Thrombo inCode(®) genetic profile). Only by integrating multiple layers of biological information (clinical, plasmatic and genetic) can we obtain models that accurately identify which patients are at high risk of developing a cancer-associated thromboembolic event, so that appropriate prophylactic measures can be taken. Copyright © 2015 Elsevier España, S.L.U. All rights reserved.
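The five clinical parameters of the Khorana score, as commonly published, are:

- Cancer site: 2 points for stomach or pancreas; 1 point for lung, lymphoma, gynecologic, bladder or testicular cancer.
- Pre-chemotherapy platelets ≥350×10⁹/L: 1 point.
- Haemoglobin <10 g/dL or use of red-cell growth factors: 1 point.
- Leukocytes >11×10⁹/L: 1 point.
- BMI ≥35 kg/m²: 1 point.

A total of 0 is low risk, 1–2 intermediate and ≥3 high. This sketch encodes those published cut-offs for illustration only; the site labels are hypothetical strings. It is not the Oncothromb model, which adds the genetic profile.

```python
VERY_HIGH_RISK_SITES = {"stomach", "pancreas"}
HIGH_RISK_SITES = {"lung", "lymphoma", "gynecologic", "bladder", "testicular"}


def khorana_score(site, platelets, hemoglobin, leukocytes, bmi, uses_esa=False):
    """Khorana score; platelets and leukocytes in 10^9/L, hemoglobin in g/dL, BMI in kg/m^2."""
    score = 2 if site in VERY_HIGH_RISK_SITES else 1 if site in HIGH_RISK_SITES else 0
    score += int(platelets >= 350)
    score += int(hemoglobin < 10 or uses_esa)  # low haemoglobin or red-cell growth factor use
    score += int(leukocytes > 11)
    score += int(bmi >= 35)
    return score


def khorana_category(score):
    """Map the score to the usual risk categories."""
    return "low" if score == 0 else "intermediate" if score <= 2 else "high"
```

The score's coarse, purely clinical inputs help explain the limited discrimination the abstract describes.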
Validation of test-day models for genetic evaluation of dairy goats in Norway.
Andonov, S; Ødegård, J; Boman, I A; Svendsen, M; Holme, I J; Adnøy, T; Vukovic, V; Klemetsdal, G
2007-10-01
Test-day data for daily milk yield and fat, protein, and lactose content were sampled from the years 1988 to 2003 in 17 flocks belonging to 2 genetically well-tied buck circles. In total, records from 2,111 to 2,215 goats for content traits and 2,371 goats for daily milk yield were included in the analysis, averaging 2.6 and 4.8 observations per goat for the 2 groups of traits, respectively. The data were analyzed using 4 test-day models with different modeling of fixed effects. Model [0] (the reference model) contained a fixed effect of year-season of kidding with regression on Ali-Schaeffer polynomials nested within the year-season classes, and a random effect of flock test-day. In model [1], the lactation curve effect from model [0] was replaced by a fixed effect of days in milk (in 3-d periods), the same for all year-seasons of kidding. Models [2] and [3] were obtained from model [1] by removing the fixed year-season of kidding effect and considering the flock test-day effect as either fixed or random, respectively. The models were compared using 2 criteria: mean-squared error of prediction and a test of bias affecting the genetic trend. The first criterion indicated a preference for model [3], whereas the second criterion preferred model [1]. Mean-squared error of prediction is based on model fit, whereas the second criterion tests the ability of the model to produce unbiased genetic evaluation (i.e., its capability of separating environmental and genetic time trends). Thus, a fixed-effect structure including year (year, year-season, or possibly flock-year) was indicated as necessary to separate time trends appropriately. Heritability estimates for daily milk yield and milk content traits were 0.26 and 0.24 to 0.27, respectively.
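The Ali-Schaeffer polynomials used in model [0] model the lactation curve as y = b₀ + b₁γ + b₂γ² + b₃ω + b₄ω². Here γ = t/L and ω = ln(L/t), where t is days in milk. This sketch only builds the covariate vector for one test day. Two assumptions apply:

- The reference length L = 305 d is the dairy-cattle convention; the value used for these goats is not stated in the abstract.
- The regression coefficients would be estimated within each year-season class, which is not shown.

```python
import math


def ali_schaeffer_covariates(days_in_milk, lactation_length=305):
    """Covariates [1, g, g^2, w, w^2] with g = t / L and w = ln(L / t) for one test day."""
    if days_in_milk <= 0:
        raise ValueError("days in milk must be positive")
    g = days_in_milk / lactation_length
    w = math.log(lactation_length / days_in_milk)
    return [1.0, g, g * g, w, w * w]
```

At t = L the vector is `[1.0, 1.0, 1.0, 0.0, 0.0]`. Early in lactation the logarithmic terms dominate, which lets the curve rise steeply before the peak.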
Taheri, Mahboobeh; Mohebbi, Ali
2008-08-30
In this study, a new approach for the auto-design of neural networks, based on a genetic algorithm (GA), has been used to predict collection efficiency in venturi scrubbers. The experimental input data, including particle diameter, throat gas velocity, liquid to gas flow rate ratio, throat hydraulic diameter, pressure drop across the venturi scrubber and collection efficiency as an output, have been used to create a GA-artificial neural network (ANN) model. The testing results from the model are in good agreement with the experimental data. Comparison of the results of the GA optimized ANN model with the results from the trial-and-error calibrated ANN model indicates that the GA-ANN model is more efficient. Finally, the effects of operating parameters such as liquid to gas flow rate ratio, throat gas velocity, and particle diameter on collection efficiency were determined.
In Silico Dynamics: computer simulation in a Virtual Embryo ...
2017-03-12
Abstract: Utilizing cell biological information to predict higher order biological processes is a significant challenge in predictive toxicology. This is especially true for highly dynamical systems such as the embryo, where morphogenesis, growth and differentiation require precisely orchestrated interactions between diverse cell populations. In patterning the embryo, genetic signals set up spatial information that cells then translate into a coordinated biological response. This can be modeled as ‘biowiring diagrams’ representing genetic signals and responses. Because the hallmark of multicellular organization resides in the ability of cells to interact with one another via well-conserved signaling pathways, multiscale computational (in silico) models that enable these interactions provide a platform to translate cellular-molecular lesions and perturbations into higher order predictions. Just as ‘the Cell’ is the fundamental unit of biology, so too should it be the computational unit (‘Agent’) for modeling embryogenesis. As such, we constructed multicellular agent-based models (ABM) with ‘CompuCell3D’ (www.compucell3d.org) to simulate kinematics of complex cell signaling networks and enable critical tissue events for use in predictive toxicology. Seeding the ABMs with HTS/HCS data from ToxCast demonstrated the potential to predict, quantitatively, the higher order impacts of chemical disruption at the cellular or biochemical level. This is demonstrate
Model-driven discovery of underground metabolic functions in Escherichia coli.
Guzmán, Gabriela I; Utrilla, José; Nurk, Sergey; Brunk, Elizabeth; Monk, Jonathan M; Ebrahim, Ali; Palsson, Bernhard O; Feist, Adam M
2015-01-20
Enzyme promiscuity toward substrates has been discussed in evolutionary terms as providing the flexibility to adapt to novel environments. In the present work, we describe an approach toward exploring such enzyme promiscuity in the space of a metabolic network. This approach leverages genome-scale models, which have been widely used for predicting growth phenotypes in various environments or following a genetic perturbation; however, these predictions occasionally fail. Failed predictions of gene essentiality offer an opportunity for targeting biological discovery, suggesting the presence of unknown underground pathways stemming from enzymatic cross-reactivity. We demonstrate a workflow that couples constraint-based modeling and bioinformatic tools with knockout (KO) strain analysis and adaptive laboratory evolution for the purpose of predicting promiscuity at the genome scale. Three genes that are incorrectly predicted as essential in Escherichia coli (aspC, argD, and gltA) are examined, and isozyme functions are uncovered for each to a different extent. Based on genetic and transcriptional evidence, seven isozyme functions are suggested for the gene pairs aspC/tyrB, argD/astC, gabT/puuE, and gltA/prpC. This study demonstrates how a targeted model-driven approach to discovery can systematically fill knowledge gaps, characterize underground metabolism, and elucidate regulatory mechanisms of adaptation in response to gene KO perturbations.
Mannering, Anne M; Harold, Gordon T; Leve, Leslie D; Shelton, Katherine H; Shaw, Daniel S; Conger, Rand D; Neiderhiser, Jenae M; Scaramella, Laura V; Reiss, David
2011-01-01
This study examined the longitudinal association between marital instability and child sleep problems at ages 9 and 18 months in 357 families with a genetically unrelated infant adopted at birth. This design eliminates shared genes as an explanation for similarities between parent and child. Structural equation modeling indicated that marital instability at 9 months (T1) predicted child sleep problems at 18 months (T2), but T1 child sleep problems did not predict T2 marital instability. This result was replicated when models were estimated separately for mothers and fathers. Thus, even after controlling for stability in sleep problems and marital instability and eliminating shared genetic influences on associations using a longitudinal adoption design, marital instability prospectively predicts early childhood sleep patterns. © 2011 The Authors. Child Development © 2011 Society for Research in Child Development, Inc.
Assessing Multivariate Constraints to Evolution across Ten Long-Term Avian Studies
Teplitsky, Celine; Tarka, Maja; Møller, Anders P.; Nakagawa, Shinichi; Balbontín, Javier; Burke, Terry A.; Doutrelant, Claire; Gregoire, Arnaud; Hansson, Bengt; Hasselquist, Dennis; Gustafsson, Lars; de Lope, Florentino; Marzal, Alfonso; Mills, James A.; Wheelwright, Nathaniel T.; Yarrall, John W.; Charmantier, Anne
2014-01-01
Background: In a rapidly changing world, it is of fundamental importance to understand processes constraining or facilitating adaptation through microevolution. As different traits of an organism covary, genetic correlations are expected to affect evolutionary trajectories. However, only limited empirical data are available. Methodology/Principal Findings: We investigate the extent to which multivariate constraints affect the rate of adaptation, focusing on four morphological traits often shown to harbour large amounts of genetic variance and considered to be subject to limited evolutionary constraints. Our data set includes unique long-term data for seven bird species and a total of 10 populations. We estimate population-specific matrices of genetic correlations and multivariate selection coefficients to predict evolutionary responses to selection. Using Bayesian methods that facilitate the propagation of errors in estimates, we compare (1) the rate of adaptation based on predicted response to selection when including genetic correlations with predictions from models where these genetic correlations were set to zero and (2) the multivariate evolvability in the direction of current selection to the average evolvability in random directions of the phenotypic space. We show that genetic correlations on average decrease the predicted rate of adaptation by 28%. Multivariate evolvability in the direction of current selection was systematically lower than average evolvability in random directions of space. These significant reductions in the rate of adaptation and reduced evolvability were due to a general nonalignment of selection and genetic variance, notably orthogonality of directional selection with the size axis along which most (60%) of the genetic variance is found. Conclusions: These results suggest that genetic correlations can impose significant constraints on the evolution of avian morphology in wild populations. This could have important impacts on evolutionary dynamics and hence population persistence in the face of rapid environmental change. PMID:24608111
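Comparisons (1) and (2) can be illustrated with the multivariate breeder's equation, Δz = Gβ, and the evolvability along a selection gradient, e(β) = βᵀGβ / βᵀβ. The sketch below is a hypothetical two-trait toy example, not the study's Bayesian analysis: G is an assumed genetic variance-covariance matrix with a strong shared "size" axis, β points orthogonally to that axis, and evolvability is computed with G as given and with its covariances set to zero. Estimation error, which the study propagates, is ignored here.

```python
def mat_vec(G, b):
    """Matrix-vector product for a square matrix stored as a list of rows."""
    return [sum(g * x for g, x in zip(row, b)) for row in G]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def without_correlations(G):
    """Copy of G with all genetic covariances (off-diagonal entries) set to zero."""
    n = len(G)
    return [[G[i][j] if i == j else 0.0 for j in range(n)] for i in range(n)]


def response(G, beta):
    """Multivariate breeder's equation: predicted response delta_z = G beta."""
    return mat_vec(G, beta)


def evolvability(G, beta):
    """Evolvability along the selection gradient: beta' G beta / beta' beta."""
    return dot(beta, mat_vec(G, beta)) / dot(beta, beta)


# Hypothetical example: two traits sharing a strong "size" axis (covariance 0.8),
# with directional selection orthogonal to that axis.
G = [[1.0, 0.8],
     [0.8, 1.0]]
beta = [0.3, -0.3]

ratio = evolvability(G, beta) / evolvability(without_correlations(G), beta)
print(round(ratio, 3))  # 0.2, i.e. an 80% reduction in this toy case
```

In this toy case almost all genetic variance lies along the size axis, so selection orthogonal to it finds little variance to act on, which is the kind of nonalignment the abstract describes.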
Reduction of a metapopulation genetic model to an effective one-island model
NASA Astrophysics Data System (ADS)
Parra-Rojas, César; McKane, Alan J.
2018-04-01
We explore a model of metapopulation genetics which is based on a more ecologically motivated approach than is frequently used in population genetics. The size of the population is regulated by competition between individuals, rather than by artificially imposing a fixed population size. The increased complexity of the model is managed by employing techniques often used in the physical sciences, namely exploiting time-scale separation to eliminate fast variables and then constructing an effective model from the slow modes. We analyse this effective model and show that the predictions for the probability of fixation of the alleles and the mean time to fixation agree well with those found from numerical simulations of the original model. Contribution to the Focus Issue Evolutionary Modeling and Experimental Evolution edited by José Cuesta, Joachim Krug and Susanna Manrubia.
Genetic grouping strategies in selection efficiency of composite beef cattle ( × ).
Petrini, J; Pertile, S F N; Eler, J P; Ferraz, J B S; Mattos, E C; Figueiredo, L G G; Mourão, G B
2015-02-01
The inclusion of genetic groups in sire evaluation has been widely used to represent genetic differences among animals that cannot be accounted for because parentage data are missing. However, the definition of these groups is still arbitrary, and studies assessing the effects of genetic grouping strategies on selection efficiency are rare. Therefore, the aim of this study was to compare genetic grouping strategies for animals with unknown parentage in the prediction of estimated breeding values (EBV). A total of 179,302 records of weaning weight (WW), 29,825 records of scrotal circumference (SC), and 70,302 records of muscling score (MUSC) from Montana Tropical animals, a Brazilian composite beef cattle population, were used. Genetic grouping strategies involving year of birth, sex of the unknown parent, birth farm, breed composition, and their combinations were evaluated. EBV were predicted under each strategy after simulating a loss of genealogy data. These EBV were then compared with those obtained from an analysis using the real (complete) relationship matrix to estimate selection efficiency and correlations between EBV and between animal rankings. The analysis model included contemporary group and dam age-at-calving class as fixed effects, additive and nonadditive genetic effects and age as covariates, and the additive genetic effect of the animal as a random effect. A second model also included the fixed effect of genetic group. The use of genetic groups resulted in mean selection efficiencies of 70.4 to 97.1% and mean correlations of 0.51 to 0.94 for WW, 85.8 to 98.8% and 0.82 to 0.98 for SC, and 85.1 to 98.6% and 0.74 to 0.97 for MUSC. High selection efficiencies were observed for the year-of-birth and breed-composition strategies. The maximum absolute difference in annual genetic gain estimated using the complete genealogy versus genetic groups was 0.38 kg for WW, 0.02 cm for SC, and 0.01 for MUSC, with smaller differences when year of birth was adopted as the genetic grouping criterion. The choice of grouping strategy must consider selection decisions and the number of genetic groups formed, so that the genetic groups represent the genetic differences in the population and allow adequate prediction of EBV.
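Selection efficiency and rank correlation can be computed by comparing EBV from the complete genealogy with EBV from a grouped analysis. This sketch assumes that selection efficiency is the percentage of the top-k animals under the complete genealogy that are also in the top-k under the grouping strategy. The study's exact definition and selection proportion are not given in the abstract, and the EBV values are made up for illustration.

```python
def ranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def top_k(ebv, k):
    return set(sorted(range(len(ebv)), key=lambda i: ebv[i], reverse=True)[:k])


def selection_efficiency(ebv_full, ebv_grouped, k):
    """Assumed definition: % of the full-pedigree top-k also selected under grouping."""
    return 100.0 * len(top_k(ebv_full, k) & top_k(ebv_grouped, k)) / k


# Hypothetical EBV for five animals.
ebv_full = [5.0, 3.0, 8.0, 1.0, 6.0]     # complete genealogy
ebv_grouped = [4.5, 3.5, 7.0, 2.0, 4.0]  # parentage removed, genetic groups fitted
print(round(spearman(ebv_full, ebv_grouped), 3), selection_efficiency(ebv_full, ebv_grouped, 2))  # 0.9 50.0
```

The example shows why both statistics are reported: the overall ranking is well preserved (ρ = 0.9), yet only half of the top two animals would actually be selected under the grouped evaluation.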
Predicted extinction of unique genetic diversity in marine forests of Cystoseira spp.
Buonomo, Roberto; Chefaoui, Rosa M; Lacida, Ricardo Bermejo; Engelen, Aschwin H; Serrão, Ester A; Airoldi, Laura
2018-07-01
Climate change is inducing shifts in species ranges across the globe. These can affect the genetic pools of species, including loss of genetic variability and evolutionary potential. In particular, geographically enclosed ecosystems, like the Mediterranean Sea, have a higher risk of suffering species loss and genetic erosion due to barriers to further range shifts and to dispersal. In this study, we address these questions for three habitat-forming seaweed species, Cystoseira tamariscifolia, C. amentacea and C. compressa, throughout their entire ranges in the Atlantic and Mediterranean regions. We aim to 1) describe their population genetic structure and diversity, 2) model their present distribution and predict their future distribution, and 3) assess the consequences of predicted future range shifts for their population genetic structure, according to two contrasting future climate change scenarios. A net loss of suitable areas was predicted under both climatic scenarios across the distribution range of the three species. This loss was particularly severe for C. amentacea in the Mediterranean Sea (−90% under the most extreme climatic scenario), suggesting that the species could become at risk of extinction. For all species, genetic data showed highly differentiated populations, indicating low inter-population connectivity, and high and distinct genetic diversity in areas predicted to be lost, which would cause erosion of unique evolutionary lineages. Our results indicate that the Mediterranean Sea is the most threatened region, where future suitable Cystoseira habitats will become more limited. This is likely to have wider ecosystem impacts, as there is a lack of species with the same ecological niche and functional role in the Mediterranean. The projected accelerated loss of already fragmented and disturbed populations and the long-term genetic effects highlight the urgent need for local-scale management strategies that sustain the capacity of these habitat-forming species to persist despite climatic impacts while awaiting global emission reductions. Copyright © 2018 Elsevier Ltd. All rights reserved.
Field, J; Solís, C R; Queller, D C; Strassmann, J E
1998-06-01
Recent models postulate that the members of a social group assess their ecological and social environments and agree on a "social contract" of reproductive partitioning (skew). We tested social contracts theory by using DNA microsatellites to measure skew in 24 cofoundress associations of paper wasps, Polistes bellicosus. In contrast to theoretical predictions, there was little variation in cofoundress relatedness, and relatedness either did not predict skew or was negatively correlated with it; the dominant/subordinate size ratio, assumed to reflect relative fighting ability, did not predict skew; and high skew was associated with decreased aggression by the rank 2 subordinate toward the dominant. High skew was associated with increased group size. A difficulty with measuring skew in real systems is the frequent change in group composition that is common in social animals. In P. bellicosus, 61% of egg layers and an unknown number of non-egg layers were absent by the time nests were collected. The social contracts models provide an attractive general framework linking genetics, ecology, and behavior, but there have been few direct tests of their predictions. We question assumptions underlying the models and suggest directions for future research.
A population genetic interpretation of GWAS findings for human quantitative traits
Bullaughey, Kevin; Hudson, Richard R.; Sella, Guy
2018-01-01
Human genome-wide association studies (GWASs) are revealing the genetic architecture of anthropometric and biomedical traits, i.e., the frequencies and effect sizes of variants that contribute to heritable variation in a trait. To interpret these findings, we need to understand how genetic architecture is shaped by basic population genetics processes, notably mutation, natural selection, and genetic drift. Because many quantitative traits are subject to stabilizing selection and because genetic variation that affects one trait often affects many others, we model the genetic architecture of a focal trait that arises under stabilizing selection in a multidimensional trait space. We solve the model for the phenotypic distribution and allelic dynamics at steady state and derive robust, closed-form solutions for summary statistics of the genetic architecture. Our results provide a simple interpretation for missing heritability and why it varies among traits. They predict that the distribution of variances contributed by loci identified in GWASs is well approximated by a simple functional form that depends on a single parameter: the expected contribution to genetic variance of a strongly selected site affecting the trait. We test this prediction against the results of GWASs for height and body mass index (BMI) and find that it fits the data well, allowing us to make inferences about the degree of pleiotropy and mutational target size for these traits. Our findings help to explain why the GWAS for height explains more of the heritable variance than the similarly sized GWAS for BMI and to predict the increase in explained heritability with study sample size. Considering the demographic history of European populations, in which these GWASs were performed, we further find that most of the associations they identified likely involve mutations that arose shortly before or during the Out-of-Africa bottleneck at sites with selection coefficients around s = 10⁻³. PMID:29547617
An integrated approach to characterize genetic interaction networks in yeast metabolism
Szappanos, Balázs; Kovács, Károly; Szamecz, Béla; Honti, Frantisek; Costanzo, Michael; Baryshnikova, Anastasia; Gelius-Dietrich, Gabriel; Lercher, Martin J.; Jelasity, Márk; Myers, Chad L.; Andrews, Brenda J.; Boone, Charles; Oliver, Stephen G.; Pál, Csaba; Papp, Balázs
2011-01-01
Intense experimental and theoretical efforts have been made to globally map genetic interactions, yet we still do not understand how gene-gene interactions arise from the operation of biomolecular networks. To bridge the gap between empirical and computational studies, we: i) quantitatively measure genetic interactions between ~185,000 metabolic gene pairs in Saccharomyces cerevisiae, ii) superpose the data on a detailed systems biology model of metabolism, and iii) introduce a machine-learning method to reconcile empirical interaction data with model predictions. We systematically investigate the relative impacts of functional modularity and metabolic flux coupling on the distribution of negative and positive genetic interactions. We also provide a mechanistic explanation for the link between the degree of genetic interaction, pleiotropy, and gene dispensability. Last, we demonstrate the feasibility of automated metabolic model refinement by correcting misannotations in NAD biosynthesis and confirming them by in vivo experiments. PMID:21623372
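In large yeast double-mutant screens, negative and positive genetic interactions are commonly scored against a multiplicative null model: the interaction score ε is the observed double-mutant fitness minus the product of the two single-mutant fitnesses. The sketch below assumes that definition; the study's exact scoring and significance criteria are not given in the abstract, and the cutoff used here is a hypothetical value for illustration.

```python
def interaction_score(w_a, w_b, w_ab):
    """Multiplicative-model genetic interaction: epsilon = W_ab - W_a * W_b.

    Fitness values are relative to wild type (= 1.0).
    """
    return w_ab - w_a * w_b


def classify(epsilon, cutoff=0.08):
    """Label an interaction; the cutoff is a hypothetical threshold for illustration."""
    if epsilon <= -cutoff:
        return "negative"  # double mutant sicker than expected (aggravating)
    if epsilon >= cutoff:
        return "positive"  # double mutant healthier than expected (alleviating)
    return "neutral"


if __name__ == "__main__":
    # Single mutants at 0.9 and 0.8 give an expected double-mutant fitness of 0.72.
    for w_ab in (0.50, 0.75, 0.85):
        eps = interaction_score(0.9, 0.8, w_ab)
        print(w_ab, round(eps, 2), classify(eps))
```

A metabolic model can produce the same kind of score by simulating single and double gene deletions, which is what makes the empirical and predicted interaction maps directly comparable.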
Multifactorial disease risk calculator: Risk prediction for multifactorial disease pedigrees.
Campbell, Desmond D; Li, Yiming; Sham, Pak C
2018-03-01
Construction of multifactorial disease models from epidemiological findings and their application to disease pedigrees for risk prediction is nontrivial for all but the simplest of cases. The Multifactorial Disease Risk Calculator is a web tool that facilitates this. It provides a user-friendly interface, extending a reported methodology based on a liability-threshold model. Multifactorial disease models incorporating all the following features in combination are handled: quantitative risk factors (including polygenic scores), categorical risk factors (including major genetic risk loci), stratified age-of-onset curves, and the partition of the population variance in disease liability into genetic, shared, and unique environment effects. It allows the application of such models to disease pedigrees. Pedigree-related outputs are (i) individual disease risk for pedigree members, (ii) n-year risk for unaffected pedigree members, and (iii) the disease pedigree's joint liability distribution. Risk prediction for each pedigree member is based on using the constructed disease model to appropriately weigh evidence on disease risk available from personal attributes and family history. Evidence is used to construct the disease pedigree's joint liability distribution. From this, lifetime and n-year risk can be predicted. Example disease models and pedigrees are provided at the website and are used in accompanying tutorials to illustrate the features available. The website is built on an R package that provides the functionality for pedigree validation, disease model construction, and risk prediction. Website: http://grass.cgs.hku.hk:3838/mdrc/current. © 2017 WILEY PERIODICALS, INC.
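The liability-threshold model at the core of the tool can be illustrated for the simplest case: one unrelated individual and one quantitative risk factor. This minimal sketch assumes a standardized liability, a known lifetime prevalence K, and a known fraction of liability variance explained by a polygenic score. It does not handle family history, age of onset, or categorical risk factors, and the function name is hypothetical, not part of the R package's API.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0, sd 1


def risk_given_score(prevalence: float, var_explained: float, score: float) -> float:
    """Lifetime risk for one individual given the polygenic-score part of liability.

    Liability L = S + E has total variance 1; S ~ N(0, var_explained) is the part
    captured by the score and E ~ N(0, 1 - var_explained) is everything else.
    The individual is affected when L exceeds the threshold T = Phi^-1(1 - K).
    """
    if not 0 < prevalence < 1 or not 0 <= var_explained < 1:
        raise ValueError("need 0 < prevalence < 1 and 0 <= var_explained < 1")
    threshold = STD_NORMAL.inv_cdf(1 - prevalence)
    residual_sd = (1 - var_explained) ** 0.5
    # P(L > T | S = score), with L | S ~ N(score, 1 - var_explained)
    return 1 - STD_NORMAL.cdf((threshold - score) / residual_sd)


if __name__ == "__main__":
    K, v = 0.01, 0.10          # hypothetical prevalence and variance explained by the score
    sd_score = v ** 0.5
    for z in (-2, 0, 2):       # individuals at -2, 0 and +2 SD of the score
        print(z, round(risk_given_score(K, v, z * sd_score), 4))
```

Averaged over the population the risk returns to K, but because the threshold sits far in the upper tail, an individual with an average score has a risk below K, while a high score raises it several-fold. Adding relatives extends this idea to the pedigree's joint liability distribution.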
Rollins, Brent L; Ramakrishnan, Shravanan; Perri, Matthew
2014-01-01
Direct-to-consumer (DTC) advertising of predictive genetic tests (PGTs) has added a new dimension to health advertising. This study used an online survey based on the health belief model framework to examine and better understand how consumers respond to a PGT DTC advertisement, including their behavioral intentions. Overall, consumers reported moderate intentions to talk with their doctor and seek more information about PGTs after advertisement exposure, though consumers did not seem ready to take the advertised test or engage in active information search. Those who perceived greater threat from the disease, however, had significantly greater behavioral intentions and information search behavior.
Marceau, Kristine; Ram, Nilam; Neiderhiser, Jenae M; Laurent, Heidemarie K; Shaw, Daniel S; Fisher, Phil; Natsuaki, Misaki N; Leve, Leslie D
2013-11-01
Developmental plasticity models posit a role for genetic and prenatal environmental influences in the development of the hypothalamic-pituitary-adrenal (HPA) axis and highlight that genes and the prenatal environment may moderate early postnatal environmental influences on HPA functioning. This article examines the interplay of genetic, prenatal and parenting influences across the first 4.5 years of life on a novel index of children's cortisol variability. Repeated-measures data were obtained from 134 adoption-linked families (adopted children, both of their adoptive parents, and their birth mothers) who participated in a longitudinal, prospective US domestic adoption study. Genetic and prenatal influences moderated, differently for mothers and fathers, the associations between inconsistency in overreactive parenting from child age 9 months to 4.5 years and children's cortisol variability at 4.5 years. Among children whose birth mothers had high morning cortisol, adoptive fathers' inconsistent overreactive parenting predicted higher cortisol variability, whereas among children with low birth-mother morning cortisol, adoptive fathers' inconsistent overreactive parenting predicted lower cortisol variability. Among children who experienced high levels of prenatal risk, adoptive mothers' inconsistent overreactive parenting predicted lower cortisol variability and adoptive fathers' inconsistent overreactive parenting predicted higher cortisol variability, whereas among children who experienced low levels of prenatal risk, there were no associations between inconsistent overreactive parenting and children's cortisol variability. Findings supported developmental plasticity models and uncovered novel developmental, gene × environment and prenatal × environment influences on children's cortisol functioning.
Quantitative genetic methods depending on the nature of the phenotypic trait.
de Villemereuil, Pierre
2018-01-24
A consequence of the assumptions of the infinitesimal model, one of the most important theoretical foundations of quantitative genetics, is that phenotypic traits are predicted to be most often normally distributed (so-called Gaussian traits). But phenotypic traits, especially those interesting for evolutionary biology, might be shaped according to very diverse distributions. Here, I show how quantitative genetics tools have been extended to account for a wider diversity of phenotypic traits using first the threshold model and then more recently using generalized linear mixed models. I explore the assumptions behind these models and how they can be used to study the genetics of non-Gaussian complex traits. I also comment on three recent methodological advances in quantitative genetics that widen our ability to study new kinds of traits: the use of "modular" hierarchical modeling (e.g., to study survival in the context of capture-recapture approaches for wild populations); the use of aster models to study a set of traits with conditional relationships (e.g., life-history traits); and, finally, the study of high-dimensional traits, such as gene expression. © 2018 New York Academy of Sciences.
Forest growth modeling and prediction (Volumes 1 & 2).
Alan R. Ek; Stephen R. Shifley; Thomas E. Burk
1988-01-01
Proceedings of the August 23-27 IUFRO Conference, Minneapolis, Minnesota. Includes 143 manuscripts dealing with growth and yield modeling; regeneration; site characterization; effects of fertilization, genetics, and disturbance; density management; evaluation; estimation; inventory; and application.
Predicting human genetic interactions from cancer genome evolution.
Lu, Xiaowen; Megchelenbrink, Wout; Notebaart, Richard A; Huynen, Martijn A
2015-01-01
Synthetic lethal (SL) genetic interactions play a key role in various types of biological research, ranging from understanding genotype-phenotype relationships to identifying drug targets against cancer. Despite recent advances in empirically measuring SL interactions in human cells, the human genetic interaction map is far from complete. Here, we present a novel approach to predict this map by exploiting patterns in cancer genome evolution. First, we show that empirically determined SL interactions are reflected in various gene presence, absence, and duplication patterns in hundreds of cancer genomes. The most evident pattern that we discovered is that when one member of an SL interaction gene pair is lost, the other gene tends not to be lost, i.e., the absence of co-loss. This observation is in line with expectation, because losing both genes of an SL pair would be lethal to the cancer cell. SL interactions are also reflected in gene expression profiles, such as an underrepresentation of cases where both genes in an SL pair are underexpressed, and an overrepresentation of cases where one gene of an SL pair is underexpressed while the other is overexpressed. We integrated these previously unknown cancer genome patterns and the gene expression patterns into a computational model to identify SL pairs. This simple, genome-wide model achieves high predictive power (AUC = 0.75) for known genetic interactions. It allows us to present, for the first time, a comprehensive genome-wide list of SL interactions with a high estimated prediction precision, covering up to 591,000 gene pairs. This unique list can potentially be used in application areas ranging from biotechnology to medical genetics.
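The reported AUC of 0.75 has a concrete reading: it is the probability that a randomly chosen known SL pair receives a higher model score than a randomly chosen non-SL pair. This minimal sketch computes that probability directly (the Mann-Whitney form of the ROC AUC) from made-up scores; it is not the authors' model.

```python
def auc(pos_scores, neg_scores):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as one half. This is O(n * m), fine for illustration; a rank-based
    version is preferable for hundreds of thousands of gene pairs.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    known_sl = [0.9, 0.8, 0.4]       # hypothetical model scores for known SL pairs
    non_sl = [0.7, 0.3, 0.2, 0.1]    # hypothetical scores for non-interacting pairs
    print(round(auc(known_sl, non_sl), 3))  # 0.917
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.75 means a known SL pair outranks a non-SL pair in three out of four random comparisons.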
Gallien, Laure; Thuiller, Wilfried; Fort, Noémie; Boleda, Marti; Alberto, Florian J; Rioux, Delphine; Lainé, Juliette; Lavergne, Sébastien
2016-01-01
Climatic niche shifts have been documented in a number of invasive species by comparing the native and adventive climatic ranges in which they occur. However, these shifts likely represent changes in the realized climatic niches of invasive species and are not necessarily driven by genetic changes in climatic affinities. The role of rapid niche evolution in the spread of invasive species remains a challenging issue, with conflicting results to date. Here, we document a likely genetically based climatic niche expansion of an annual plant invader, the common ragweed (Ambrosia artemisiifolia L.), a highly allergenic invasive species causing substantial public health issues. To do so, we looked for recent evolutionary change at the upward migration front of its adventive range in the French Alps. Based on species climatic niche models estimated at both global and regional scales, we stratified our sampling design to adequately capture the species niche and to locate populations suspected of niche expansion. Using a combination of species niche modeling, landscape genetics models and common garden measurements, we then related the species' genetic structure and its phenotypic architecture across the climatic niche. Our results strongly suggest that the common ragweed is rapidly adapting to local climatic conditions at its invasion front and that it is currently expanding its niche toward colder and formerly unsuitable climates in the French Alps (i.e., in sites where niche models would not predict its occurrence). Such results, showing that species climatic niches can evolve on very short time scales, have important implications for predictive models of biological invasions that do not account for evolutionary processes.
NASA Astrophysics Data System (ADS)
Attia, Khalid A. M.; Nassar, Mohammed W. I.; El-Zeiny, Mohamed B.; Serag, Ahmed
2017-01-01
For the first time, a new variable selection method based on swarm intelligence, the firefly algorithm, is coupled with three multivariate calibration models (concentration residual augmented classical least squares, artificial neural network, and support vector regression) applied to UV spectral data. A comparative study between the firefly algorithm and the well-known genetic algorithm was carried out, and the results showed the firefly algorithm to be superior. Moreover, several statistical tests were performed, and no significant differences in predictive ability were found among the models. This indicates that simpler and faster models were obtained without any deterioration in calibration quality.
Some considerations on the use of ecological models to predict species' geographic distributions
Peterjohn, B.G.
2001-01-01
Peterson (2001) used Genetic Algorithm for Rule-set Prediction (GARP) models to predict distribution patterns from Breeding Bird Survey (BBS) data. Evaluations of these models should consider inherent limitations of BBS data: (1) BBS methods may not sample species and habitats equally; (2) using BBS data for both model development and testing may overlook poor fit of some models; and (3) BBS data may not provide the desired spatial resolution or capture temporal changes in species distributions. The predictive value of GARP models requires additional study, especially comparisons with distribution patterns from independent data sets. When employed at appropriate temporal and geographic scales, GARP models show considerable promise for conservation biology applications but provide limited inferences concerning processes responsible for the observed patterns.
2013-01-01
Background Climatic and sea-level fluctuations throughout the last Pleistocene glacial cycle (~130-0 ka) profoundly influenced present-day distributions and genetic diversity of Northern Hemisphere biotas by forcing range contractions in many species during the glacial advance and allowing expansion following glacial retreat ('expansion-contraction' model). Evidence for such range dynamics and refugia in the unglaciated Gulf-Atlantic Coastal Plain stems largely from terrestrial species, and the Pleistocene responses of aquatic species remain relatively uninvestigated. Heterandria formosa, a wide-ranging regional endemic, presents an ideal system to test the expansion-contraction model within this biota. By integrating ecological niche modeling and phylogeography, we infer the Pleistocene history of this livebearing fish (Poeciliidae) and test for several predicted distributional and genetic effects of the last glaciation. Results Paleoclimatic models predicted range contraction to a single southwest Florida peninsula refugium during the Last Glacial Maximum, followed by northward expansion. We inferred spatial-population subdivision into four groups that reflect genetic barriers outside this refuge. Several other features of the genetic data were consistent with predictions derived from an expansion-contraction model: limited intraspecific divergence (e.g. mean mtDNA p-distance = 0.66%); a pattern of mtDNA diversity (mean Hd = 0.934; mean π = 0.007) consistent with rapid, recent population expansion; a lack of mtDNA isolation-by-distance; and clinal variation in allozyme diversity with higher diversity at lower latitudes near the predicted refugium. Statistical tests of mismatch distributions and coalescent simulations of the gene tree lent greater support to a scenario of post-glacial expansion and diversification from a single refugium than to any other model examined (e.g. multiple-refugia scenarios). Conclusions Congruent results from diverse data indicate that H. formosa fits the classic Pleistocene expansion-contraction model, even as the genetic data suggest additional ecological influences on population structure. While evidence for Plio-Pleistocene Gulf Coast vicariance is well described for many freshwater species presently codistributed with H. formosa, this species' demography and diversification depart notably from this pattern. Species-specific expansion-contraction dynamics may therefore have figured more prominently in shaping Coastal Plain evolutionary history than previously thought. Our findings bolster growing appreciation for the complexity of phylogeographical structuring within North America's southern refugia, including responses of Coastal Plain freshwater biota to Pleistocene climatic fluctuations. PMID:24107245
A probabilistic model to predict clinical phenotypic traits from genome sequencing.
Chen, Yun-Ching; Douville, Christopher; Wang, Cheng; Niknafs, Noushin; Yeo, Grace; Beleva-Guthrie, Violeta; Carter, Hannah; Stenson, Peter D; Cooper, David N; Li, Biao; Mooney, Sean; Karchin, Rachel
2014-09-01
Genetic screening is becoming possible on an unprecedented scale. However, its utility remains controversial. Although most variant genotypes cannot be easily interpreted, many individuals nevertheless attempt to interpret their genetic information. Initiatives such as the Personal Genome Project (PGP) and Illumina's Understand Your Genome are sequencing thousands of adults, collecting phenotypic information and developing computational pipelines to identify the most important variant genotypes harbored by each individual. These pipelines consider database and allele frequency annotations and bioinformatics classifications. We propose that the next step will be to integrate these different sources of information to estimate the probability that a given individual has specific phenotypes of clinical interest. To this end, we have designed a Bayesian probabilistic model to predict the probability of dichotomous phenotypes. When applied to a cohort from PGP, predictions of Gilbert syndrome, Graves' disease, non-Hodgkin lymphoma, and various blood groups were accurate, as individuals manifesting the phenotype in question exhibited the highest, or among the highest, predicted probabilities. Thirty-eight PGP phenotypes (26%) were predicted with area-under-the-ROC curve (AUC)>0.7, and 23 (15.8%) of these were statistically significant, based on permutation tests. Moreover, in a Critical Assessment of Genome Interpretation (CAGI) blinded prediction experiment, the models were used to match 77 PGP genomes to phenotypic profiles, generating the most accurate prediction of 16 submissions, according to an independent assessor. Although the models are currently insufficiently accurate for diagnostic utility, we expect their performance to improve with growth of publicly available genomics data and model refinement by domain experts.
Genetic factors contribute to bleeding after cardiac surgery.
Welsby, I J; Podgoreanu, M V; Phillips-Bute, B; Mathew, J P; Smith, P K; Newman, M F; Schwinn, D A; Stafford-Smith, M
2005-06-01
Postoperative bleeding remains a common, serious problem for cardiac surgery patients, with striking inter-patient variability poorly explained by clinical, procedural, and biological markers. We tested the hypothesis that genetic polymorphisms of coagulation proteins and platelet glycoproteins are associated with bleeding after cardiac surgery. Seven hundred and eighty patients undergoing aortocoronary surgery with cardiopulmonary bypass were studied. Clinical covariates previously associated with bleeding were recorded and DNA was isolated from preoperative blood. Matrix-Assisted Laser Desorption/Ionization Time-of-Flight (MALDI-TOF) mass spectrometry or polymerase chain reaction was used for genotype analysis. Multivariable linear regression modeling, including all genetic main effects and two-way gene-gene interactions, related clinical and genetic predictors to bleeding from the thorax and mediastinum. Nineteen candidate polymorphisms were assessed; seven [GPIaIIa-52C>T and 807C>T, GPIb alpha 524C>T, tissue factor-603A>G, prothrombin 20210G>A, tissue factor pathway inhibitor-399C>T, and angiotensin converting enzyme (ACE) deletion/insertion] demonstrated significant association with bleeding (P < 0.01). Adding genetic to clinical predictors improves the model, doubling the overall ability to predict bleeding (P < 0.01). We identified seven genetic polymorphisms associated with bleeding after cardiac surgery. Genetic factors appear to be primarily independent of clinical covariates and to explain at least as much variation in bleeding as they do; combining genetic and clinical factors doubles our ability to predict bleeding after cardiac surgery. Accounting for genotype may be necessary when stratifying risk of bleeding after cardiac surgery.
Green, Cathryn Gordon; Babineau, Vanessa; Jolicoeur-Martineau, Alexia; Bouvette-Turcot, Andrée-Anne; Minde, Klaus; Sassi, Roberto; St-André, Martin; Carrey, Normand; Atkinson, Leslie; Kennedy, James L; Steiner, Meir; Lydon, John; Gaudreau, Helene; Burack, Jacob A; Levitan, Robert; Meaney, Michael J; Wazana, Ashley
2017-08-01
Prenatal maternal depression and a multilocus genetic profile of two susceptibility genes implicated in the stress response were examined in an interaction model predicting negative emotionality in the first 3 years. In 179 mother-infant dyads from the Maternal Adversity, Vulnerability, and Neurodevelopment cohort, prenatal depression (Center for Epidemiologic Studies Depression Scale) was assessed at 24 to 36 weeks. The multilocus genetic profile score consisted of the number of susceptibility alleles from the serotonin transporter linked polymorphic region gene (5-HTTLPR: no long-rs25531(A) [LA] allele, i.e. short/short, short/long-rs25531(G) [LG], or LG/LG, vs. any LA) and the dopamine receptor D4 gene (six to eight repeats vs. two to five repeats). Negative emotionality was extracted from the Infant Behaviour Questionnaire-Revised at 3 and 6 months and the Early Child Behavior Questionnaire at 18 and 36 months. Mixed and confirmatory regression analyses indicated that prenatal depression and the multilocus genetic profile interacted to predict negative emotionality from 3 to 36 months. The results were characterized by a differential susceptibility model at 3 and 6 months and by a diathesis-stress model at 36 months.
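A multilocus profile score of this kind is simply a count of susceptibility genotypes across loci. The Python sketch below assumes a genotype-level coding (one point for a 5-HTTLPR genotype with no LA allele, one point for carrying at least one DRD4 allele with six to eight repeats), so the score ranges from 0 to 2. The allele labels "S", "LA", "LG" and the function name are hypothetical, and the study's exact coding may differ.

```python
def susceptibility_score(httlpr_alleles, drd4_repeats):
    """Hypothetical two-gene susceptibility score (0-2).

    httlpr_alleles: the two 5-HTTLPR/rs25531 alleles, each "S", "LA" or "LG".
    drd4_repeats: the two DRD4 allele repeat counts.
    """
    # 5-HTTLPR: assumed susceptible when the genotype carries no LA allele
    httlpr_susceptible = "LA" not in httlpr_alleles
    # DRD4: assumed susceptible when any allele has six to eight repeats
    drd4_susceptible = any(6 <= r <= 8 for r in drd4_repeats)
    return int(httlpr_susceptible) + int(drd4_susceptible)


print(susceptibility_score(("S", "LG"), (4, 7)))  # S/LG and a 7-repeat allele
```

Under this coding the example genotype scores 2, while an LA carrier with two short DRD4 alleles would score 0.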
Adapting APSIM to model the physiology and genetics of complex adaptive traits in field crops.
Hammer, Graeme L; van Oosterom, Erik; McLean, Greg; Chapman, Scott C; Broad, Ian; Harland, Peter; Muchow, Russell C
2010-05-01
Progress in molecular plant breeding is limited by the ability to predict plant phenotype based on its genotype, especially for complex adaptive traits. Suitably constructed crop growth and development models have the potential to bridge this predictability gap. A generic cereal crop growth and development model is outlined here. It is designed to exhibit reliable predictive skill at the crop level while also introducing sufficient physiological rigour for complex phenotypic responses to become emergent properties of the model dynamics. The approach quantifies capture and use of radiation, water, and nitrogen within a framework that predicts the realized growth of major organs based on their potential and whether the supply of carbohydrate and nitrogen can satisfy that potential. The model builds on existing approaches within the APSIM software platform. Experiments on diverse genotypes of sorghum that underpin the development and testing of the adapted crop model are detailed. Genotypes differing in height were found to differ in biomass partitioning among organs and a tall hybrid had significantly increased radiation use efficiency: a novel finding in sorghum. Introducing these genetic effects associated with plant height into the model generated emergent simulated phenotypic differences in green leaf area retention during grain filling via effects associated with nitrogen dynamics. The relevance to plant breeding of this capability in complex trait dissection and simulation is discussed.
Can multivariate models based on MOAKS predict OA knee pain? Data from the Osteoarthritis Initiative
NASA Astrophysics Data System (ADS)
Luna-Gómez, Carlos D.; Zanella-Calzada, Laura A.; Galván-Tejada, Jorge I.; Galván-Tejada, Carlos E.; Celaya-Padilla, José M.
2017-03-01
Osteoarthritis is the most common rheumatic disease in the world. Knee pain is the most disabling symptom of the disease, and predicting pain is one of the targets of preventive medicine, with possible applications to new therapies or treatments. Using magnetic resonance imaging and the grading scales, a multivariate model based on genetic algorithms is presented. A predictive model can be useful for associating minor structural changes in the joint with future knee pain. Results suggest that multivariate models can be predictive of future chronic knee pain. All models (T0, T1 and T2) were statistically significant, with all p values < 0.05 and all AUC > 0.60.
USDA-ARS's Scientific Manuscript database
Several organizations have developed prediction models for molecular breeding values (MBV) for quantitative growth and carcass traits in beef cattle using BovineSNP50 genotypes and phenotypic or EBV data. MBV for Angus cattle have been developed by IGENITY, Pfizer Animal Genetics, and a collaboratio...
Sieberts, Solveig K.; Zhu, Fan; García-García, Javier; Stahl, Eli; Pratap, Abhishek; Pandey, Gaurav; Pappas, Dimitrios; Aguilar, Daniel; Anton, Bernat; Bonet, Jaume; Eksi, Ridvan; Fornés, Oriol; Guney, Emre; Li, Hongdong; Marín, Manuel Alejandro; Panwar, Bharat; Planas-Iglesias, Joan; Poglayen, Daniel; Cui, Jing; Falcao, Andre O.; Suver, Christine; Hoff, Bruce; Balagurusamy, Venkat S. K.; Dillenberger, Donna; Neto, Elias Chaibub; Norman, Thea; Aittokallio, Tero; Ammad-ud-din, Muhammad; Azencott, Chloe-Agathe; Bellón, Víctor; Boeva, Valentina; Bunte, Kerstin; Chheda, Himanshu; Cheng, Lu; Corander, Jukka; Dumontier, Michel; Goldenberg, Anna; Gopalacharyulu, Peddinti; Hajiloo, Mohsen; Hidru, Daniel; Jaiswal, Alok; Kaski, Samuel; Khalfaoui, Beyrem; Khan, Suleiman Ali; Kramer, Eric R.; Marttinen, Pekka; Mezlini, Aziz M.; Molparia, Bhuvan; Pirinen, Matti; Saarela, Janna; Samwald, Matthias; Stoven, Véronique; Tang, Hao; Tang, Jing; Torkamani, Ali; Vert, Jean-Phillipe; Wang, Bo; Wang, Tao; Wennerberg, Krister; Wineinger, Nathan E.; Xiao, Guanghua; Xie, Yang; Yeung, Rae; Zhan, Xiaowei; Zhao, Cheng; Calaza, Manuel; Elmarakeby, Haitham; Heath, Lenwood S.; Long, Quan; Moore, Jonathan D.; Opiyo, Stephen Obol; Savage, Richard S.; Zhu, Jun; Greenberg, Jeff; Kremer, Joel; Michaud, Kaleb; Barton, Anne; Coenen, Marieke; Mariette, Xavier; Miceli, Corinne; Shadick, Nancy; Weinblatt, Michael; de Vries, Niek; Tak, Paul P.; Gerlag, Danielle; Huizinga, Tom W. J.; Kurreeman, Fina; Allaart, Cornelia F.; Louis Bridges Jr., S.; Criswell, Lindsey; Moreland, Larry; Klareskog, Lars; Saevarsdottir, Saedis; Padyukov, Leonid; Gregersen, Peter K.; Friend, Stephen; Plenge, Robert; Stolovitzky, Gustavo; Oliva, Baldo; Guan, Yuanfang; Mangravite, Lara M.
2016-01-01
Rheumatoid arthritis (RA) affects millions worldwide. While anti-TNF treatment is widely used to reduce disease progression, treatment fails in approximately one-third of patients. No biomarker currently exists that identifies non-responders before treatment. A rigorous community-based assessment of the utility of SNP data for predicting anti-TNF treatment efficacy in RA patients was performed in the context of a DREAM Challenge (http://www.synapse.org/RA_Challenge). An open challenge framework enabled the comparative evaluation of predictions developed by 73 research groups using the most comprehensive available data and covering a wide range of state-of-the-art modelling methodologies. Despite a significant heritability estimate for the treatment non-response trait (h² = 0.18, P value = 0.02), no significant genetic contribution to prediction accuracy is observed. Results formally confirm the expectations of the rheumatology community that SNP information does not significantly improve predictive performance relative to standard clinical traits, thereby justifying a refocusing of future efforts on collection of other data. PMID:27549343
Fischer, Christine; Kuchenbäcker, Karoline; Engel, Christoph; Zachariae, Silke; Rhiem, Kerstin; Meindl, Alfons; Rahner, Nils; Dikow, Nicola; Plendl, Hansjörg; Debatin, Irmgard; Grimm, Tiemo; Gadzicki, Dorothea; Flöttmann, Ricarda; Horvath, Judit; Schröck, Evelin; Stock, Friedrich; Schäfer, Dieter; Schwaab, Ira; Kartsonaki, Christiana; Mavaddat, Nasim; Schlegelberger, Brigitte; Antoniou, Antonis C; Schmutzler, Rita
2013-06-01
Risk prediction models are widely used in clinical genetic counselling. Despite their frequent use, the genetic risk models BOADICEA, BRCAPRO, IBIS and extended Claus model (eCLAUS), used to estimate BRCA1/2 mutation carrier probabilities, have never been comparatively evaluated in a large sample from central Europe. Additionally, a novel version of BOADICEA that incorporates tumour pathology information has not yet been validated. Using data from 7352 German families we estimated BRCA1/2 carrier probabilities under each model and compared their discrimination and calibration. The incremental value of using pathology information in BOADICEA was assessed in a subsample of 4928 pedigrees with available data on the breast tumour molecular markers oestrogen receptor, progesterone receptor and human epidermal growth factor receptor 2. BRCAPRO (area under receiver operating characteristic curve (AUC)=0.80 (95% CI 0.78 to 0.81)) and BOADICEA (AUC=0.79 (0.78-0.80)) had significantly higher diagnostic accuracy than IBIS and eCLAUS (p<0.001). The AUC increased when pathology information was used in BOADICEA: AUC=0.81 (95% CI 0.80 to 0.83, p<0.001). At carrier thresholds of 10% and 15%, the net reclassification index was +3.9% and +5.4%, respectively, when pathology was included in the model. Overall, calibration was best for BOADICEA and worst for eCLAUS. With eCLAUS, twice as many mutation carriers were predicted as were observed. Our results support the use of BRCAPRO and BOADICEA for decision making regarding genetic testing for BRCA1/2 mutations. However, model calibration has to be improved for this population. eCLAUS should not be used for estimating mutation carrier probabilities in clinical settings. Whenever possible, breast tumour molecular marker information should be taken into account.
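The net reclassification index (NRI) at a single carrier threshold rewards carriers who move above the threshold and non-carriers who move below it when the new model replaces the old one. The Python sketch below shows the standard two-category form; the function and argument names are illustrative, and it assumes both carriers and non-carriers are present (otherwise a denominator is zero).

```python
def two_category_nri(p_old, p_new, is_carrier, threshold):
    """Net reclassification index for one risk threshold (two categories)."""
    up_e = down_e = up_n = down_n = 0
    n_e = n_n = 0
    for po, pn, y in zip(p_old, p_new, is_carrier):
        # +1: moved above the threshold, -1: moved below it, 0: unchanged
        move = int(pn >= threshold) - int(po >= threshold)
        if y:
            n_e += 1
            up_e += move == 1
            down_e += move == -1
        else:
            n_n += 1
            up_n += move == 1
            down_n += move == -1
    # carriers should move up, non-carriers should move down
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


p_old = [0.05, 0.12, 0.20, 0.08]
p_new = [0.12, 0.20, 0.09, 0.05]
print(two_category_nri(p_old, p_new, [1, 1, 0, 0], threshold=0.10))
```

The result is a proportion, so a value of 0.039 would correspond to an NRI of +3.9% at that threshold.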
Jiang, Y; Zhao, Y; Rodemann, B; Plieske, J; Kollers, S; Korzun, V; Ebmeyer, E; Argillier, O; Hinze, M; Ling, J; Röder, M S; Ganal, M W; Mette, M F; Reif, J C
2015-03-01
Genome-wide mapping approaches in diverse populations are powerful tools to unravel the genetic architecture of complex traits. The main goals of our study were to investigate the potential and limits of unravelling the genetic architecture and to identify the factors determining the accuracy of prediction of the genotypic variation of Fusarium head blight (FHB) resistance in wheat (Triticum aestivum L.) based on data collected with a diverse panel of 372 European varieties. The wheat lines were phenotyped in multi-location field trials for FHB resistance and genotyped with 782 simple sequence repeat (SSR) markers, and 9k and 90k single-nucleotide polymorphism (SNP) arrays. We applied genome-wide association mapping in combination with fivefold cross-validations and observed surprisingly high accuracies of prediction for marker-assisted selection based on the detected quantitative trait loci (QTLs). Using a random sample of markers not selected for marker-trait associations revealed only a slight decrease in prediction accuracy compared with marker-based selection exploiting the QTL information. The same picture was confirmed in a simulation study, suggesting that relatedness is a main driver of the accuracy of prediction in marker-assisted selection of FHB resistance. When the accuracy of prediction of three genomic selection models was contrasted for the three marker data sets, no significant differences in accuracies among marker platforms and genomic selection models were observed. Marker density impacted the accuracy of prediction only marginally. Consequently, genomic selection of FHB resistance can be implemented most cost-efficiently based on low- to medium-density SNP arrays.
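Fivefold cross-validation estimates prediction accuracy by fitting the model on four fifths of the lines and predicting the held-out fifth, repeating until every line has an out-of-fold prediction. The Python sketch below uses ridge regression on all markers as a stand-in genomic selection model (an assumption; the abstract does not specify its three models here) and reports accuracy as the correlation between pooled out-of-fold predictions and observed phenotypes. The simulated data in the usage lines are hypothetical.

```python
import numpy as np


def cv_accuracy(X, y, lam=1.0, k=5, seed=0):
    """k-fold cross-validated accuracy: corr(out-of-fold predictions, observations)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    preds = np.empty(len(y))
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        Xc = X[train] - x_mean
        # ridge regression on all markers (RR-BLUP-style stand-in model)
        beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]),
                               Xc.T @ (y[train] - y_mean))
        preds[test] = y_mean + (X[test] - x_mean) @ beta
    return float(np.corrcoef(preds, y)[0, 1])


# Hypothetical data: 500 lines, 100 markers coded 0/1/2, 10 causal markers
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(500, 100)).astype(float)
effects = np.zeros(100)
effects[:10] = 1.0
y = X @ effects + rng.normal(0.0, 1.0, 500)
print(cv_accuracy(X, y, lam=10.0))
```

Averaging the per-fold correlations instead of pooling the predictions is an equally common convention.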
Sachdeva, Neha; Kumar, G Dinesh; Gupta, Ravi Prakash; Mathur, Anshu Shankar; Manikandan, B; Basu, Biswajit; Tuli, Deepak Kumar
2016-10-01
The aim of the present work was to develop a mathematical model describing the biomass and (total) lipid productivity of Chlorella pyrenoidosa NCIM 2738 under heterotrophic conditions. Biomass growth rate was predicted by Droop's cell quota model, while changes in cell quota (utilization) observed under carbon-excess conditions were used to model and predict the lipid accumulation rate. The model was simulated under non-limiting (excess) carbon and limiting nitrate concentrations and validated with experimental data for cultures grown in batch (flask) mode under different nitrate concentrations. The present model incorporates two modes (growth and stressed) for the prediction of endogenous lipid synthesis/induction and aims to predict the response of the microalgae to nutrient starvation (stressed) conditions. MATLAB and a genetic algorithm were employed to predict and validate the model parameters. Copyright © 2016 Elsevier Ltd. All rights reserved.
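For reference, the textbook form of Droop's cell quota model links specific growth rate to the internal nutrient quota rather than to the external concentration. The block below is that standard form only; the study's two-mode extension and its link to lipid accumulation are not given in the abstract. Here $Q$ is the internal nutrient (e.g. nitrogen) quota per unit biomass, $Q_0$ the minimum subsistence quota, $\mu_{\max}$ the growth rate approached as $Q \to \infty$, $\rho$ the nutrient uptake rate and $X$ the biomass.

```latex
\mu = \mu_{\max}\left(1 - \frac{Q_0}{Q}\right), \qquad
\frac{dQ}{dt} = \rho - \mu Q, \qquad
\frac{dX}{dt} = \mu X
```

Once external nitrate is exhausted, $\rho = 0$, so growth dilutes the quota ($dQ/dt = -\mu Q < 0$) and $\mu$ falls toward zero as $Q \to Q_0$. This is the nitrogen-starved regime in which the abstract's stressed mode operates.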
Factors Motivating Individuals to Consider Genetic Testing for Type 2 Diabetes Risk Prediction
Wessel, Jennifer; Gupta, Jyoti; de Groot, Mary
2016-01-01
The purpose of this study was to identify attitudes and perceptions of willingness to participate in genetic testing for type 2 diabetes (T2D) risk prediction in the general population. Adults (n = 598) were surveyed on attitudes about utilizing genetic testing to predict future risk of T2D. Participants were recruited from public libraries (53%), an online registry (37%) and a safety net hospital emergency department (10%). Respondents were 37 ± 11 years old, primarily White (54%), female (69%), college educated (46%), with an annual income ≥$25,000 (56%). Half of participants were interested in genetic testing for T2D (52%) and 81% agreed/strongly agreed that genetic testing should be available to the public. Only 57% of individuals knew T2D is preventable. A multivariate model to predict interest in genetic testing was adjusted for age, gender, recruitment location and BMI; significant predictors were motivation (high perceived personal risk of T2D [OR = 4.38 (1.76, 10.9)]; family history [OR = 2.56 (1.46, 4.48)]; desire to know risk prior to disease onset [OR = 3.25 (1.94, 5.42)]; and knowing T2D is preventable [OR = 2.11 (1.24, 3.60)]), intention (if the cost is free [OR = 10.2 (4.27, 24.6)]; and learning T2D is preventable [OR = 5.18 (1.95, 13.7)]) and trust of genetic testing results [OR = 0.03 (0.003, 0.30)]. Individuals are interested in genetic testing for T2D risk, which offers unique, personalized information. Financial accessibility, validity of the test and availability of diabetes prevention programs were identified as predictors of interest in T2D testing. PMID:26789839
Predictive model for survival in patients with gastric cancer.
Goshayeshi, Ladan; Hoseini, Benyamin; Yousefli, Zahra; Khooie, Alireza; Etminani, Kobra; Esmaeilzadeh, Abbas; Golabpour, Amin
2017-12-01
Gastric cancer is one of the most prevalent cancers in the world. Characterized by poor prognosis, it is a frequent cause of cancer in Iran. The aim of the study was to design a predictive model of survival time for patients suffering from gastric cancer. This was a historical cohort study conducted between 2011 and 2016. The study population comprised 277 patients suffering from gastric cancer. Data were gathered from the Iranian Cancer Registry and the laboratory of Emam Reza Hospital in Mashhad, Iran. Patients or their relatives were interviewed where needed. Missing values were imputed by data mining techniques. Fifteen factors were analyzed, with survival as the dependent variable. The predictive model was then designed by combining a genetic algorithm and logistic regression; Matlab 2014 software was used to combine them. Of the 277 patients, survival data were available for only 80, whose data were used to design the predictive model. The mean ± SD number of missing values per patient was 4.43 ± 0.41. The combined predictive model achieved 72.57% accuracy. Sex, birth year, age at diagnosis, age at diagnosis of patients' family members, family history of gastric cancer, and family history of other gastrointestinal cancers were the six parameters associated with patient survival. The study revealed that imputing missing values by data mining techniques has good accuracy, and that six parameters extracted by the genetic algorithm affect the survival of patients with gastric cancer. Our combined predictive model, with good accuracy, is appropriate for forecasting the survival of patients suffering from gastric cancer. We therefore suggest that policy makers and specialists apply it to predict patients' survival.
Wind power prediction based on genetic neural network
NASA Astrophysics Data System (ADS)
Zhang, Suhan
2017-04-01
The scale of grid-connected wind farms keeps increasing. To ensure stable power system operation, make reasonable scheduling schemes and improve the competitiveness of wind farms in the electricity generation market, it is important to forecast short-term wind power accurately. To reduce the influence of the nonlinear relationship between disturbance factors and wind power, an improved prediction model based on a genetic algorithm and a neural network is established. To overcome the long training time of the back-propagation (BP) neural network and its tendency to fall into local minima, and to improve the accuracy of the network, a genetic algorithm is adopted to optimize the parameters and topology of the neural network. Historical data are used as input to predict short-term wind power. The effectiveness and feasibility of the method are verified using actual data from a wind farm.
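The core idea of using a genetic algorithm (GA) to set neural-network parameters can be sketched in a few dozen lines. The Python example below is a minimal sketch under stated assumptions. It evolves only the weights of a fixed one-hidden-layer tanh network, not the topology, and omits any subsequent BP fine-tuning. It uses elitism, size-2 tournament selection, uniform crossover and Gaussian mutation, and all names, hyperparameters and the toy data are hypothetical.

```python
import numpy as np


def n_weights(n_in, n_hidden):
    # input->hidden weights, hidden biases, hidden->output weights, output bias
    return n_in * n_hidden + 2 * n_hidden + 1


def forward(w, X, n_hidden):
    """One-hidden-layer tanh network with linear output; weights in a flat vector."""
    n_in = X.shape[1]
    k = n_in * n_hidden
    W1 = w[:k].reshape(n_in, n_hidden)
    b1 = w[k:k + n_hidden]
    W2 = w[k + n_hidden:k + 2 * n_hidden]
    b2 = w[-1]
    return np.tanh(X @ W1 + b1) @ W2 + b2


def ga_train(X, y, n_hidden=5, pop_size=40, n_gen=100, mut_sd=0.1, elite=2, seed=0):
    """Evolve network weights with a simple elitist GA that minimises MSE."""
    rng = np.random.default_rng(seed)
    d = n_weights(X.shape[1], n_hidden)
    pop = rng.normal(0.0, 1.0, size=(pop_size, d))

    def mse(w):
        return float(np.mean((forward(w, X, n_hidden) - y) ** 2))

    history = []  # best MSE per generation
    for _ in range(n_gen):
        fitness = np.array([mse(w) for w in pop])
        pop = pop[np.argsort(fitness)]  # row 0 is now the fittest
        history.append(fitness.min())
        children = [pop[i].copy() for i in range(elite)]  # elitism
        while len(children) < pop_size:
            # size-2 tournaments: after sorting, the lower index is fitter
            p1 = pop[rng.integers(0, pop_size, size=2).min()]
            p2 = pop[rng.integers(0, pop_size, size=2).min()]
            mask = rng.random(d) < 0.5  # uniform crossover
            child = np.where(mask, p1, p2) + rng.normal(0.0, mut_sd, size=d)  # mutation
            children.append(child)
        pop = np.array(children)
    fitness = np.array([mse(w) for w in pop])
    history.append(fitness.min())
    return pop[np.argmin(fitness)], history


# Hypothetical toy data: a cubic "power curve" of normalised wind speed
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(100, 1))
y = X[:, 0] ** 3
best, history = ga_train(X, y, pop_size=20, n_gen=30)
print(history[0], history[-1])
```

Because the best individuals are copied unchanged into each new generation, the best training error never increases from one generation to the next. A GA-found weight vector like this is often used as the starting point for gradient-based training.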
Abdel Moniem, H E M; Schemerhorn, B J; DeWoody, J A; Holland, J D
2016-10-01
Landscape connectivity, the degree to which the landscape structure facilitates or impedes organismal movement and gene flow, is increasingly important to conservationists and land managers. Metrics for describing the undulating shape of continuous habitat surfaces can expand the usefulness of continuous gradient surfaces that describe habitat and predict the flow of organisms and genes. We adopted a landscape gradient model of habitat and used surface metrics of connectivity to model the genetic continuity between populations of the banded longhorn beetle [Typocerus v. velutinus (Olivier)] collected at 17 sites across a fragmentation gradient in Indiana, USA. We tested the hypothesis that greater habitat connectivity facilitates gene flow between beetle populations against a null model of isolation by distance (IBD). We used next-generation sequencing to develop 10 polymorphic microsatellite loci and genotype the individual beetles to assess the population genetic structure. Isolation by distance did not explain the population genetic structure. The surface metrics model of habitat connectivity explained the variance in genetic dissimilarities 30 times better than the IBD model. We conclude that surface metrology of habitat maps is a powerful extension of landscape genetics in heterogeneous landscapes. © 2016 John Wiley & Sons Ltd.
Greenberg, Marisa; Smith, Rachel A
2016-01-01
Genetic test results reveal not only personal information about a person's likelihood of certain medical conditions but also information about the person's genetic relatives. Given the familial nature of genetic information, one's obligation to protect family members may be a motive for disclosing genetic test results, but this claim has not been methodically tested. Existing models of disclosure decision making presume self-interested motives, such as seeking social support, instead of other-interested motives, like familial obligation. This study investigated young adults' (N = 173) motives to share a genetic-based health condition, alpha-1 antitrypsin deficiency, after reading a hypothetical vignette. Results show that social support and familial obligation were both reported as motives for disclosure. In fact, some participants reported familial obligation as their primary motivator for disclosure. Finally, stronger familial obligation predicted increased likelihood of disclosing hypothetical genetic test results. Implications of these results were discussed in reference to theories of disclosure decision-making models and the practice of genetic disclosures.
The evolution of life-history variation in fishes, with particular reference to flatfishes
NASA Astrophysics Data System (ADS)
Roff, Derek A.
This paper explores four aspects of the evolution of life-history variation in fish, with particular reference to the flatfishes: 1. genetic variation and evolutionary response; 2. the size and age at first reproduction; 3. adult lifespan and variation in recruitment; 4. the relationship between reproductive effort and age. Evolutionary response may be limited by previous evolutionary pathways (phylogenetic variation) or by lack of genetic variation due to selection for a single trait. Estimates of heritability suggest, as predicted, that selection is stronger on life-history traits than morphological traits; but there is still adequate genetic variation to permit fairly rapid evolutionary changes. Several approaches to the analysis of the optimal age and size at first reproduction are discussed in the light of a general life-history model based on the assumption that natural selection maximizes r or R₀. It is concluded that one of the most important areas of future research is the relationship between reproduction and mortality. Murphy's hypothesis that the reproductive lifespan should increase with variation in spawning success is shown to be incorrect for fish, at least at the level of interspecific comparison. The model of Charlesworth & León predicting the sufficient condition for reproductive effort to increase with age is tested: in 28 of 31 cases the model predicts an increase of reproductive effort with age. These results suggest that, in general, reproductive effort should increase with age in fish. This prediction is confirmed in the 15 species for which adequate data exist.
Karayianni, Katerina N; Grimaldi, Keith A; Nikita, Konstantina S; Valavanis, Ioannis K
2015-01-01
This paper aims to shed light on the complex etiology underlying obesity by analysing data from a large nutrigenetics study, in which nutritional and genetic factors associated with obesity were recorded for around two thousand individuals. In our previous work, these data were analysed using artificial neural network methods, which identified optimised subsets of factors to predict one's obesity status. However, these methods did not reveal how the selected factors interact with each other in the obtained predictive models. For that reason, parallel Multifactor Dimensionality Reduction (pMDR) was used here to further analyse the pre-selected subsets of nutrigenetic factors. Within pMDR, predictive models using up to eight factors were constructed, further reducing the input dimensionality, while rules describing the interactive effects of the selected factors were derived. In this way, it was possible to identify specific genetic variations and their interactive effects with particular nutritional factors, which are now under further study.
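The central step of Multifactor Dimensionality Reduction (MDR) pools each multifactor cell (one combination of factor levels) into a single high-risk or low-risk attribute, by comparing the cell's case:control ratio with the overall ratio. The Python sketch below shows that step for any number of factors; it is a generic illustration, not the pMDR implementation, and the function names are hypothetical. Cross-validation and the search over factor subsets are omitted.

```python
from collections import defaultdict


def mdr_labels(rows, case):
    """Label each multifactor cell 'high' or 'low' risk, MDR style.

    rows: one tuple of factor levels per individual (genotypes, exposures, ...).
    case: 1 for cases, 0 for controls.
    """
    n_cases = sum(case)
    n_controls = len(case) - n_cases
    counts = defaultdict(lambda: [0, 0])  # cell -> [cases, controls]
    for cell, y in zip(rows, case):
        counts[tuple(cell)][0 if y else 1] += 1
    # high risk if the cell's cases/controls exceed the overall cases/controls,
    # compared by cross-multiplication to avoid dividing by zero
    return {cell: "high" if c * n_controls > k * n_cases else "low"
            for cell, (c, k) in counts.items()}


def mdr_predict(labels, rows):
    """Predict case (1) for individuals in high-risk cells; unseen cells count as low."""
    return [1 if labels.get(tuple(cell)) == "high" else 0 for cell in rows]


rows = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 1), (0, 0)]
case = [0, 0, 1, 1, 1, 0]
labels = mdr_labels(rows, case)
print(labels, mdr_predict(labels, rows))
```

The resulting high/low labels are the interpretable "rules" describing how factor levels combine.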
Hoffman, Eric A.; Tye, Matthew R.; Hether, Tyler D.; Savage, Anna E.
2017-01-01
North American amphibians have recently been impacted by two major emerging pathogens, the fungus Batrachochytrium dendrobatidis (Bd) and iridoviruses in the genus Ranavirus (Rv). Environmental factors and host genetics may play important roles in disease dynamics, but few studies incorporate both of these components into their analyses. Here, we investigated the role of environmental and genetic factors in driving Bd and Rv infection prevalence and severity in a biodiversity hot spot, the southeastern United States. We used quantitative PCR to characterize Bd and Rv dynamics in natural populations of three amphibian species: Notophthalmus perstriatus, Hyla squirella and Pseudacris ornata. We combined pathogen data, genetic diversity metrics generated from neutral markers, and environmental variables into general linear models to evaluate how these factors impact infectious disease dynamics. Occurrence, prevalence and intensity of Bd and Rv varied across species and populations, but only one species, Pseudacris ornata, harbored high Bd intensities in the majority of sampled populations. Genetic diversity and climate variables both predicted Bd prevalence, whereas climatic variables alone predicted infection intensity. We conclude that Bd is more abundant in the southeastern United States than previously thought and that genetic and environmental factors are both important for predicting amphibian pathogen dynamics. Incorporating both genetic and environmental information into conservation plans for amphibians is necessary for the development of more effective management strategies to mitigate the impact of emerging infectious diseases. PMID:28448517
A predictive assessment of genetic correlations between traits in chickens using markers.
Momen, Mehdi; Mehrgardi, Ahmad Ayatollahi; Sheikhy, Ayoub; Esmailizadeh, Ali; Fozi, Masood Asadi; Kranis, Andreas; Valente, Bruno D; Rosa, Guilherme J M; Gianola, Daniel
2017-02-01
Genomic selection has been successfully implemented in plant and animal breeding programs to shorten generation intervals and accelerate genetic progress per unit of time. In practice, genomic selection can be used to improve several correlated traits simultaneously via multiple-trait prediction, which exploits correlations between traits. However, few studies have explored multiple-trait genomic selection. Our aim was to infer genetic correlations between three traits measured in broiler chickens by exploring kinship matrices based on a linear combination of measures of pedigree and marker-based relatedness. A predictive assessment was used to gauge genetic correlations. A multivariate genomic best linear unbiased prediction model was designed to combine information from pedigree and genome-wide markers in order to assess genetic correlations between three complex traits in chickens, i.e., body weight at 35 days of age (BW), ultrasound area of breast meat (BM) and hen-house egg production (HHP). A dataset of 1351 birds genotyped with the 600K Affymetrix platform was used. A kinship kernel (K) was constructed as K = λG + (1 − λ)A, where A is the numerator relationship matrix, measuring pedigree-based relatedness, and G is a genomic relationship matrix. The weight (λ) assigned to each source of information varied over the grid λ = (0, 0.2, 0.4, 0.6, 0.8, 1). Maximum likelihood estimates of heritability and genetic correlations were obtained at each λ, and the "optimum" λ was determined using cross-validation. Estimates of genetic correlations were affected by the weight placed on the source of information used to build K. For example, the genetic correlations for the BW-HHP and BM-HHP pairs changed markedly when λ varied from 0 (only A used for measuring relatedness) to 1 (only genomic information used). 
As λ increased, predictive correlations (correlation between observed phenotypes and predicted breeding values) increased and mean-squared predictive error decreased. However, the improvement in predictive ability was not monotonic, with an optimum found at some 0 < λ < 1, i.e., when both sources of information were used together. Our findings indicate that multiple-trait prediction may benefit from combining pedigree and marker information. Also, it appeared that expected correlated responses to selection computed from standard theory may differ from realized responses. The predictive assessment provided a metric for performance evaluation as well as a means for expressing uncertainty of outcomes of multiple-trait selection.
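The kinship kernel K = λG + (1 − λ)A is a convex combination of two relationship matrices. The Python sketch below builds K at each λ on the study's grid for two toy 2 × 2 matrices (hypothetical values). In the study, each K would then enter a multivariate GBLUP model and the "optimum" λ would be chosen by cross-validation; that fitting step is not shown here.

```python
import numpy as np


def kinship_kernel(G, A, lam):
    """Blend genomic (G) and pedigree (A) relationships: K = lam * G + (1 - lam) * A."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * np.asarray(G, dtype=float) + (1.0 - lam) * np.asarray(A, dtype=float)


# Hypothetical 2 x 2 relationship matrices for two birds
A = np.array([[1.0, 0.5], [0.5, 1.0]])   # pedigree: expected sharing of full sibs
G = np.array([[1.1, 0.3], [0.3, 0.9]])   # genomic: realised sharing differs from 0.5
grid = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
kernels = {lam: kinship_kernel(G, A, lam) for lam in grid}
```

At the grid ends K reduces to A (λ = 0) or G (λ = 1); intermediate values shrink the genomic estimate of relatedness toward the pedigree expectation.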
Application of Response Surface Methods To Determine Conditions for Optimal Genomic Prediction
Howard, Réka; Carriquiry, Alicia L.; Beavis, William D.
2017-01-01
An epistatic genetic architecture can have a significant impact on the prediction accuracy of genomic prediction (GP) methods. Machine learning methods predict traits with epistatic genetic architectures more accurately than statistical methods based on additive mixed linear models. The differences between these types of GP methods suggest a diagnostic for revealing the genetic architectures underlying traits of interest. In addition to genetic architecture, the performance of GP methods may be influenced by the sample size of the training population, the number of QTL, and the proportion of phenotypic variability due to genotypic variability (heritability). The possible values for these factors, and the number of combinations of factor levels that influence the performance of GP methods, can be large. Thus, efficient methods are needed for identifying the combinations of factor levels that produce the most accurate GPs. Herein, we employ response surface methods (RSMs) to find the experimental conditions that produce the most accurate GPs. We illustrate RSM with an example of simulated doubled haploid populations and identify the combination of factors that maximizes the difference between the prediction accuracies of the best linear unbiased prediction (BLUP) and support vector machine (SVM) GP methods. The greatest impact on the response is due to the genetic architecture of the population, the heritability of the trait, and the sample size. The advantage of the SVM method over the BLUP method is greatest when epistasis is responsible for all of the genotypic variance, heritability is equal to one, and the training population is large. However, except for values close to the maximum, most of the response surface shows little difference between the methods. We also determined that the conditions resulting in the greatest prediction accuracy for BLUP occurred when the genetic architecture consists solely of additive effects and heritability is equal to one. 
PMID:28720710
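A second-order response surface is the usual workhorse of RSM: fit y = b0 + b1·x1 + b2·x2 + b11·x1² + b22·x2² + b12·x1·x2 by least squares over a designed set of factor levels, then locate the stationary point where the gradient vanishes. The Python sketch below assumes only two coded factors (for example, scaled heritability and training-set size) and a synthetic, noise-free response standing in for the SVM minus BLUP accuracy difference; the study's actual design, factors, and fitted coefficients are not reproduced.

```python
import numpy as np


def fit_quadratic_surface(x1, x2, y):
    """Least-squares fit of a second-order response surface in two coded factors."""
    X = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # b0, b1, b2, b11, b22, b12


def stationary_point(coef):
    """Solve grad = 0 for the fitted surface (the point may be a maximum, minimum or saddle)."""
    _, b1, b2, b11, b22, b12 = coef
    H = np.array([[2 * b11, b12], [b12, 2 * b22]])
    return np.linalg.solve(H, -np.array([b1, b2]))


# 3 x 3 factorial design in coded units (-1, 0, 1) with a synthetic response
g = np.array([-1.0, 0.0, 1.0])
x1, x2 = [a.ravel() for a in np.meshgrid(g, g)]
y = 5.0 - (x1 - 0.3) ** 2 - (x2 + 0.2) ** 2
print(stationary_point(fit_quadratic_surface(x1, x2, y)))  # close to [0.3, -0.2]
```

Because both quadratic coefficients of this toy surface are negative, the stationary point is a maximum; with real data the eigenvalues of H should be checked before interpreting it as the best factor combination.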
Santana, M L; Eler, J P; Bignardi, A B; Ferraz, J B S
2014-03-01
The objectives of the present study were: (1) to evaluate the importance of genotype × production environment interaction for the genetic evaluation of birth weight (BW) and weaning weight (WW) in a population of composite beef cattle in Brazil, and (2) to investigate the importance of sire × contemporary group interaction (S × CG) to model G × E and improve the accuracy of prediction in routine genetic evaluations of this population. Analyses were performed with one, two (favorable and unfavorable), or three (favorable, intermediate, and unfavorable) definitions of production environment. Accordingly, BW and WW records of animals in a favorable environment were assigned to trait 1, those in an intermediate environment to trait 2, and those in an unfavorable environment to trait 3. The (co)variance components were estimated using Gibbs sampling in single-, bi-, or three-trait animal models, according to the number of production environments defined. In general, the estimates of genetic parameters for BW and WW were similar between environments. The additive genetic correlations between production environments were close to unity for BW; however, when examining the highest posterior density intervals, the correlation between favorable and unfavorable environments reached a value of only 0.70, which may lead to changes in the ranking of sires across environments. For WW, the posterior mean genetic correlation for direct effects between favorable and unfavorable environments was 0.63. When S × CG was included in two- or three-trait analyses, all direct genetic correlations were close to unity, suggesting that there was no evidence of a genotype × production environment interaction. 
Furthermore, the model including S × CG helped to prevent overestimation of the accuracy of sires' breeding values and provided a lower prediction error for both direct and maternal breeding values, as well as lower squared bias, residual variance, and deviance information criterion, than the model omitting S × CG. The model that included S × CG can therefore be considered the best model on the basis of these criteria. The genotype × production environment interaction should not be neglected in the genetic evaluation of BW and WW in the present population of beef cattle. The inclusion of S × CG in the model is a feasible and plausible alternative for modeling the effects of G × E in genetic evaluations.
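Posterior summaries such as the highest posterior density (HPD) interval for a genetic correlation are computed directly from the Gibbs draws. The Python sketch below finds the narrowest interval that contains a given share of the sorted draws, a standard estimator for unimodal posteriors. The function name and the simulated draws are illustrative only; this is not claimed to be the software used in the study.

```python
import numpy as np


def hpd_interval(samples, prob=0.95):
    """Narrowest interval holding `prob` of the posterior draws (assumes a unimodal posterior)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    k = int(np.ceil(prob * n))            # number of draws the interval must contain
    if not 1 <= k <= n:
        raise ValueError("prob is incompatible with the number of samples")
    widths = x[k - 1:] - x[: n - k + 1]   # width of every window of k consecutive draws
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k - 1])


# Hypothetical Gibbs draws of a genetic correlation between two environments
rng = np.random.default_rng(1)
draws = rng.normal(0.75, 0.07, size=5000)
low, high = hpd_interval(draws, 0.95)
```

A lower HPD bound well below 1, as reported for BW between favorable and unfavorable environments, is what signals possible re-ranking of sires even when the posterior mean is close to unity.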
Saastamoinen, Marjo; Bocedi, Greta; Cote, Julien; Legrand, Delphine; Guillaume, Frédéric; Wheat, Christopher W; Fronhofer, Emanuel A; Garcia, Cristina; Henry, Roslyn; Husby, Arild; Baguette, Michel; Bonte, Dries; Coulon, Aurélie; Kokko, Hanna; Matthysen, Erik; Niitepõld, Kristjan; Nonaka, Etsuko; Stevens, Virginie M; Travis, Justin M J; Donohue, Kathleen; Bullock, James M; Del Mar Delgado, Maria
2018-02-01
Dispersal is a process of central importance for the ecological and evolutionary dynamics of populations and communities, because of its diverse consequences for gene flow and demography. It is subject to evolutionary change, which begs the question, what is the genetic basis of this potentially complex trait? To address this question, we (i) review the empirical literature on the genetic basis of dispersal, (ii) explore how theoretical investigations of the evolution of dispersal have represented the genetics of dispersal, and (iii) discuss how the genetic basis of dispersal influences theoretical predictions of the evolution of dispersal and potential consequences. Dispersal has a detectable genetic basis in many organisms, from bacteria to plants and animals. Generally, there is evidence for significant genetic variation for dispersal or dispersal-related phenotypes or evidence for the micro-evolution of dispersal in natural populations. Dispersal is typically the outcome of several interacting traits, and this complexity is reflected in its genetic architecture: while some genes of moderate to large effect can influence certain aspects of dispersal, dispersal traits are typically polygenic. Correlations among dispersal traits as well as between dispersal traits and other traits under selection are common, and the genetic basis of dispersal can be highly environment-dependent. By contrast, models have historically considered a highly simplified genetic architecture of dispersal. It is only recently that models have started to consider multiple loci influencing dispersal, as well as non-additive effects such as dominance and epistasis, showing that the genetic basis of dispersal can influence evolutionary rates and outcomes, especially under non-equilibrium conditions. For example, the number of loci controlling dispersal can influence projected rates of dispersal evolution during range shifts and corresponding demographic impacts. 
Incorporating more realism in the genetic architecture of dispersal is thus necessary to enable models to move beyond the purely theoretical towards making more useful predictions of evolutionary and ecological dynamics under current and future environmental conditions. To inform these advances, empirical studies need to answer outstanding questions concerning whether specific genes underlie dispersal variation, the genetic architecture of context-dependent dispersal phenotypes and behaviours, and correlations among dispersal and other traits. © 2017 The Authors. Biological Reviews published by John Wiley & Sons Ltd on behalf of Cambridge Philosophical Society.
Bocedi, Greta; Cote, Julien; Legrand, Delphine; Guillaume, Frédéric; Wheat, Christopher W.; Fronhofer, Emanuel A.; Garcia, Cristina; Henry, Roslyn; Husby, Arild; Baguette, Michel; Bonte, Dries; Coulon, Aurélie; Kokko, Hanna; Matthysen, Erik; Niitepõld, Kristjan; Nonaka, Etsuko; Stevens, Virginie M.; Travis, Justin M. J.; Donohue, Kathleen; Bullock, James M.; del Mar Delgado, Maria
2017-01-01
Dispersal is a process of central importance for the ecological and evolutionary dynamics of populations and communities, because of its diverse consequences for gene flow and demography. It is subject to evolutionary change, which begs the question, what is the genetic basis of this potentially complex trait? To address this question, we (i) review the empirical literature on the genetic basis of dispersal, (ii) explore how theoretical investigations of the evolution of dispersal have represented the genetics of dispersal, and (iii) discuss how the genetic basis of dispersal influences theoretical predictions of the evolution of dispersal and potential consequences. Dispersal has a detectable genetic basis in many organisms, from bacteria to plants and animals. Generally, there is evidence for significant genetic variation for dispersal or dispersal‐related phenotypes or evidence for the micro‐evolution of dispersal in natural populations. Dispersal is typically the outcome of several interacting traits, and this complexity is reflected in its genetic architecture: while some genes of moderate to large effect can influence certain aspects of dispersal, dispersal traits are typically polygenic. Correlations among dispersal traits as well as between dispersal traits and other traits under selection are common, and the genetic basis of dispersal can be highly environment‐dependent. By contrast, models have historically considered a highly simplified genetic architecture of dispersal. It is only recently that models have started to consider multiple loci influencing dispersal, as well as non‐additive effects such as dominance and epistasis, showing that the genetic basis of dispersal can influence evolutionary rates and outcomes, especially under non‐equilibrium conditions. For example, the number of loci controlling dispersal can influence projected rates of dispersal evolution during range shifts and corresponding demographic impacts. 
Incorporating more realism in the genetic architecture of dispersal is thus necessary to enable models to move beyond the purely theoretical towards making more useful predictions of evolutionary and ecological dynamics under current and future environmental conditions. To inform these advances, empirical studies need to answer outstanding questions concerning whether specific genes underlie dispersal variation, the genetic architecture of context‐dependent dispersal phenotypes and behaviours, and correlations among dispersal and other traits. PMID:28776950
Comparison of the theoretical and real-world evolutionary potential of a genetic circuit
NASA Astrophysics Data System (ADS)
Razo-Mejia, M.; Boedicker, J. Q.; Jones, D.; DeLuna, A.; Kinney, J. B.; Phillips, R.
2014-04-01
With the development of next-generation sequencing technologies, many large-scale experimental efforts aim to map genotypic variability among individuals. This natural variability in populations fuels many fundamental biological processes, ranging from evolutionary adaptation and speciation to the spread of genetic diseases and drug resistance. An interesting and important component of this variability is present within the regulatory regions of genes. As these regions evolve, accumulated mutations lead to modulation of gene expression, which may have consequences for the phenotype. However, a simple model system in which the link between genetic variability, gene regulation and function can be studied in detail has been lacking. In this article we develop a model to explore how the sequence of the wild-type lac promoter dictates the fold-change in gene expression. The model combines single-base-pair resolution maps of transcription factor and RNA polymerase binding energies with a comprehensive thermodynamic model of gene regulation. The model was validated by predicting and then measuring the variability of lac operon regulation in a collection of natural isolates. We then use the model to analyze the sensitivity of the promoter sequence to the regulatory output, and predict the potential for regulation to evolve due to point mutations in the promoter region.
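In the simplest thermodynamic picture of repression, the repressor-operator binding energy can be read off an energy matrix by summing position-specific contributions, and the fold-change for a weak promoter is 1/(1 + (R/N_NS)·exp(−Δε/k_BT)). The Python sketch below assumes that simplified form, a hypothetical 3-bp energy matrix in k_BT units, and N_NS ≈ 4.6 × 10^6 non-specific sites. The model in the study is more detailed (it also uses RNA polymerase binding energies), so this only illustrates how a single point mutation in the operator shifts the predicted fold-change.

```python
import math

N_NS = 4.6e6  # non-specific binding sites, taken as roughly the E. coli genome length (assumption)


def binding_energy(seq, energy_matrix):
    """Sum position-specific contributions (in k_BT) over the operator sequence."""
    return sum(energy_matrix[i][base] for i, base in enumerate(seq))


def fold_change(repressors, delta_eps):
    """Weak-promoter simple repression; delta_eps is in k_BT, so exp(-delta_eps) needs no k_BT."""
    return 1.0 / (1.0 + (repressors / N_NS) * math.exp(-delta_eps))


# Hypothetical 3-bp energy matrix (k_BT); the best-binding site here is "ACG"
energy_matrix = [
    {"A": -5.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.0, "C": -5.0, "G": 0.0, "T": -1.0},
    {"A": 0.0, "C": 0.0, "G": -5.0, "T": 0.0},
]
wild_type = fold_change(10, binding_energy("ACG", energy_matrix))   # strong repression
mutant = fold_change(10, binding_energy("ATG", energy_matrix))      # C->T point mutation
```

With these toy numbers the mutation weakens binding by 4 k_BT, and the fold-change rises from about 0.12 to about 0.89, i.e. repression is largely lost.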
Dobata, Shigeto
2012-12-01
Policing against selfishness is now regarded as the main force maintaining cooperation, by reducing costly conflict in complex social systems. Although policing has been studied extensively in social insect colonies, its coevolution against selfishness has not been fully captured by previous theories. In this study, I developed a two-trait quantitative genetic model of the conflict between selfish immature females (usually larvae) and policing workers in eusocial Hymenoptera over the immatures' propensity to develop into new queens. This model allows for the analysis of coevolution between genomes expressed in immatures and workers that collectively determine the immatures' queen caste fate. The main prediction of the model is that a higher level of polyandry leads to a smaller fraction of queens produced among new females through caste fate policing. The other main prediction of the present model is that, as a result of an arms race, caste fate policing by workers coevolves with exaggerated selfishness of the immatures, which achieve their maximum potential to develop into queens. Moreover, the model can incorporate genetic correlation between traits, which has been largely unexplored in social evolution theory. This study highlights the importance of understanding social traits as influenced by the coevolution of conflicting genomes. © 2012 The Author. Evolution © 2012 The Society for the Study of Evolution.
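Two-trait quantitative genetic models typically build on the multivariate breeder's (Lande) equation, Δz̄ = Gβ, where G holds the additive genetic variances and covariances and β the selection gradients. The Python sketch below uses hypothetical values to show why a genetic correlation between traits matters: selection acting only on immature selfishness still produces a correlated response in worker policing. It is a generic illustration, not the specific fitness functions of the model described above.

```python
import numpy as np


def response_to_selection(G, beta):
    """Multivariate breeder's (Lande) equation: change in trait means = G @ beta."""
    return np.asarray(G, dtype=float) @ np.asarray(beta, dtype=float)


# Hypothetical values: trait 1 = immature selfishness, trait 2 = worker policing
G = np.array([[0.5, 0.2],
              [0.2, 0.4]])    # additive genetic (co)variances
beta = np.array([0.3, 0.0])   # direct selection on trait 1 only
print(response_to_selection(G, beta))  # approximately [0.15, 0.06]
```

Setting the off-diagonal covariance to zero removes the response in trait 2 entirely, which is why ignoring genetic correlations can change the predicted course of a coevolutionary conflict.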
Barría, Agustín; Christensen, Kris A.; Yoshida, Grazyella M.; Correa, Katharina; Jedlicki, Ana; Lhorente, Jean P.; Davidson, William S.; Yáñez, José M.
2018-01-01
Piscirickettsia salmonis causes one of the main infectious diseases affecting coho salmon (Oncorhynchus kisutch) farming, and current treatments have been ineffective in controlling this disease. Genetic improvement for P. salmonis resistance has been proposed as a feasible alternative for the control of this infectious disease in farmed fish. Genotyping by sequencing (GBS) strategies allow hundreds of individuals to be genotyped at thousands of single nucleotide polymorphisms (SNPs), which can be used to perform genome-wide association studies (GWAS) and to predict genetic values using genome-wide information. We used double-digest restriction-site associated DNA (ddRAD) sequencing to dissect the genetic architecture of resistance against P. salmonis in a farmed coho salmon population and to identify molecular markers associated with the trait. We also evaluated genomic selection (GS) models in order to determine the potential to accelerate the genetic improvement of this trait using genome-wide molecular information. A total of 764 individuals from 33 full-sib families (17 highly resistant and 16 highly susceptible) were experimentally challenged with P. salmonis, and their genotypes were assayed using ddRAD sequencing. A total of 9,389 SNP markers were identified in the population. These markers were used to test genomic selection models and to compare different GWAS methodologies for resistance measured as day of death (DD) and binary survival (BIN). Genomic selection models showed higher accuracies than the traditional pedigree-based best linear unbiased prediction (PBLUP) method for both DD and BIN, with improvements of up to 95% and 155%, respectively, over PBLUP. One SNP related to B-cell development was identified as a potential functional candidate associated with resistance to P. salmonis defined as DD. PMID:29440129
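Genomic best linear unbiased prediction (GBLUP) replaces the pedigree relationship matrix of PBLUP with a genomic one built from SNP genotypes. The Python sketch below uses VanRaden's first method for the genomic relationship matrix and assumes a known heritability and an overall mean as the only fixed effect. The study's actual models, software, and variance components are not reproduced, and the genotypes and phenotypes are toy values.

```python
import numpy as np


def vanraden_g(M):
    """Genomic relationship matrix (VanRaden method 1) from 0/1/2 genotype codes."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0              # allele frequencies estimated from the sample
    Z = M - 2.0 * p                       # centre each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup(G, y, h2):
    """Predict genomic values with an overall mean as the only fixed effect (simplification)."""
    y = np.asarray(y, dtype=float)
    lam = (1.0 - h2) / h2                 # residual-to-genetic variance ratio
    n = len(y)
    return G @ np.linalg.solve(G + lam * np.eye(n), y - y.mean())


M = np.array([[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 1]])  # 4 fish x 3 SNPs (toy data)
y = np.array([1.0, 2.0, 0.5, 1.5])                          # toy resistance phenotypes
G = vanraden_g(M)
u_hat = gblup(G, y, h2=0.3)
```

Because the SNP columns are centred, the predicted values sum to zero, i.e. they are deviations from the population mean. Replacing G with the pedigree matrix A in the same solve gives the PBLUP baseline the study compared against.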
Camarinha-Silva, Amelia; Maushammer, Maria; Wellmann, Robin; Vital, Marius; Preuss, Siegfried; Bennewitz, Jörn
2017-07-01
The aim of the present study was to analyze the interplay between gastrointestinal tract (GIT) microbiota, host genetics, and complex traits in pigs using extended quantitative-genetic methods. The study design consisted of 207 pigs that were housed and slaughtered under standardized conditions, and phenotyped for daily gain, feed intake, and feed conversion rate. The pigs were genotyped with a standard 60K SNP chip. The GIT microbiota composition was analyzed by 16S rRNA gene amplicon sequencing. Eight of the 49 investigated bacterial genera showed a significant narrow-sense host heritability, ranging from 0.32 to 0.57. Microbial mixed linear models were applied to estimate the microbiota variance for each complex trait. The fraction of phenotypic variance explained by the microbial variance was 0.28, 0.21, and 0.16 for daily gain, feed conversion, and feed intake, respectively. The SNP data and the microbiota composition were used to predict the complex traits using genomic best linear unbiased prediction (G-BLUP) and microbial best linear unbiased prediction (M-BLUP) methods, respectively. The prediction accuracies of G-BLUP were 0.35, 0.23, and 0.20 for daily gain, feed conversion, and feed intake, respectively. The corresponding prediction accuracies of M-BLUP were 0.41, 0.33, and 0.33. Thus, in addition to SNP data, microbiota abundances are an informative source for complex trait predictions. Since the pig is a well-suited animal for modeling the human digestive tract, M-BLUP, in addition to G-BLUP, might be beneficial for predicting human predispositions to some diseases and, consequently, for preventive and personalized medicine. Copyright © 2017 by the Genetics Society of America.
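M-BLUP follows the same logic as G-BLUP, with a microbial relationship matrix built from taxon abundances in place of the genomic one. The Python sketch below assumes log-transformed counts with a pseudocount of 1, standardised per genus, and a known microbiability (the share of phenotypic variance explained by the microbiota, for example 0.28 for daily gain). The exact transformation and model used in the study may differ, and the counts and phenotypes are toy values.

```python
import numpy as np


def microbial_relationship(counts, pseudocount=1.0):
    """Relationship matrix from taxon counts (animals x genera): log, standardise, then X X' / q."""
    X = np.log(np.asarray(counts, dtype=float) + pseudocount)
    sd = X.std(axis=0)
    X = X[:, sd > 0]                         # drop genera with no variation
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    return X @ X.T / X.shape[1]


def mblup(M, y, microbiability):
    """BLUP of microbial effects, given the share of phenotypic variance due to the microbiota."""
    y = np.asarray(y, dtype=float)
    lam = (1.0 - microbiability) / microbiability
    return M @ np.linalg.solve(M + lam * np.eye(len(y)), y - y.mean())


# Toy data: 4 pigs x 3 genera, and daily gain (kg/day)
counts = np.array([[10, 0, 5], [3, 8, 5], [0, 2, 9], [7, 7, 1]])
daily_gain = np.array([0.85, 0.92, 0.78, 0.88])
M = microbial_relationship(counts)
pred = mblup(M, daily_gain, microbiability=0.28)
```

Standardising each genus gives the matrix an average diagonal of 1, mirroring the scaling of a genomic relationship matrix, so the same BLUP machinery applies unchanged.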
Weather prediction using a genetic memory
NASA Technical Reports Server (NTRS)
Rogers, David
1990-01-01
Kanerva's sparse distributed memory (SDM) is an associative memory model based on the mathematical properties of high-dimensional binary address spaces. Holland's genetic algorithms are a search technique for high-dimensional spaces inspired by the evolutionary processes of DNA. Genetic Memory is a hybrid of these two systems, in which the memory uses a genetic algorithm to dynamically reconfigure its physical storage locations to reflect correlations between the stored addresses and data. This architecture is designed to maximize the ability of the system to scale up to real-world problems.
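The sparse distributed memory half of Genetic Memory can be sketched in a few lines: a fixed set of random binary "hard locations", each with a vector of counters; a write adds ±1 to the counters of every location within a Hamming radius of the address, and a read takes a majority vote over the same locations. The Python sketch below assumes that classic formulation with illustrative parameter values; the genetic-algorithm step that Genetic Memory uses to reconfigure the hard-location addresses is not shown.

```python
import numpy as np


class SparseDistributedMemory:
    """Minimal Kanerva-style SDM with random binary hard locations and integer counters."""

    def __init__(self, n_locations, dim, radius, seed=0):
        rng = np.random.default_rng(seed)
        self.addresses = rng.integers(0, 2, size=(n_locations, dim))
        self.counters = np.zeros((n_locations, dim), dtype=int)
        self.radius = radius

    def _active(self, address):
        # Hard locations within Hamming distance `radius` of the query address
        dist = np.sum(self.addresses != np.asarray(address), axis=1)
        return dist <= self.radius

    def write(self, address, data):
        bipolar = 2 * np.asarray(data) - 1          # map bits {0, 1} -> {-1, +1}
        self.counters[self._active(address)] += bipolar

    def read(self, address):
        total = self.counters[self._active(address)].sum(axis=0)
        return (total > 0).astype(int)              # majority vote per bit


sdm = SparseDistributedMemory(n_locations=100, dim=16, radius=5, seed=1)
address = sdm.addresses[0].copy()   # guaranteed to activate at least location 0
pattern = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0])
sdm.write(address, pattern)
recalled = sdm.read(address)
```

Because the hard-location addresses are fixed at random, only a small fraction of the address space is ever used; moving those locations toward regions where data actually arrive is the job the genetic algorithm takes on in Genetic Memory.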
Identifying future models for delivering genetic services: a nominal group study in primary care
Elwyn, Glyn; Edwards, Adrian; Iredale, Rachel; Davies, Peter; Gray, Jonathon
2005-01-01
Background The aim was to enable primary care medical practitioners to generate a range of possible service delivery models for genetic counselling services and to critically assess their suitability. Methods A modified nominal group technique was used in primary care professional development workshops. Results 37 general practitioners in Wales, United Kingdom, took part in the nominal group process. The practitioners who attended did not believe current systems were sufficient to meet anticipated demand for genetic services. A wide range of different service models was proposed, although no single option emerged as a clear preference. No argument was put forward for genetic assessment and counselling being central to family practice, nor was there a voice for the view that the family doctor should become skilled at advising patients about predictive genetic testing and be able to counsel patients about the wider implications of genetic testing for patients and their family members, even for areas such as common cancers. Nevertheless, all the preferred models put a high priority on providing the service in the community, often co-located in primary care, by clinicians who had developed expertise. Conclusion There is a need for a wider debate about how healthcare systems address individual concerns about genetic conditions and risk, especially given the increasing commercial marketing of genetic tests. PMID:15831099
Dikmen, S; Cole, J B; Null, D J; Hansen, P J
2012-06-01
Genetic selection for body temperature during heat stress might be a useful approach to reduce the magnitude of heat stress effects on production and reproduction. The objectives of the study were to estimate the genetic parameters of rectal temperature (RT) in dairy cows in freestall barns under heat stress conditions and to determine the genetic and phenotypic correlations of rectal temperature with other traits. Afternoon RT was measured in a total of 1,695 lactating Holstein cows sired by 509 bulls during the summer in North Florida. Genetic parameters were estimated with Gibbs sampling, and breeding values were predicted by best linear unbiased prediction using an animal model. The heritability of RT was estimated to be 0.17 ± 0.13. Predicted transmitting abilities for rectal temperature changed by 0.0068 ± 0.0020°C/yr from 2002 to 2008 (birth year). Approximate genetic correlations between RT and 305-d milk, fat, and protein yields, productive life, and net merit were significant and positive, whereas approximate genetic correlations between RT and somatic cell count score and daughter pregnancy rate were significant and negative. Rectal temperature during heat stress has moderate heritability, but its genetic correlations with economically important traits mean that selection for RT could lead to lower productivity unless methods are used to identify genes affecting RT that do not adversely affect other traits of economic importance. Copyright © 2012 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Ghoochani, Omid M; Ghanian, Mansour; Baradaran, Masoud; Azadi, Hossein
2017-03-01
Organisms that have been genetically engineered and modified (GM) are referred to as genetically modified organisms (GMOs). Bt crops are plants that have been genetically modified to produce certain proteins from the soil bacterium Bacillus thuringiensis (Bt), which makes these plants resistant to certain lepidopteran and coleopteran species. GM rice was developed in 2006 by Iranian researchers from the cultivar Tarom Mowla'ii and has since been called 'Bt rice'. As rice is an important source of food for over 3 billion people, this study uses a correlational survey to shed light on the factors predicting stakeholders' behavioral intentions towards Bt rice. We hypothesized, and the results confirm, that "attitudes toward GM crops" can serve as a bridge between the Attitude Model and the Behavioral Intention Model to establish an integrated model. To this end, a case study was conducted in southwestern Iran to verify this research model. In the Behavioral Intention Model component of the integrated framework, both the attitude and the subjective norm of the respondents served as predictors of stakeholders' intentions to work with Bt rice. In addition, the Attitude Model, the other component of the integrated framework, showed that stakeholders' attitudes toward Bt rice were determined only by the perceived benefits (e.g., positive outcomes) of Bt rice.
Shields, B M; McDonald, T J; Ellard, S; Campbell, M J; Hyde, C; Hattersley, A T
2012-05-01
Diagnosing MODY is difficult. To date, selection for molecular genetic testing for MODY has used discrete cut-offs of limited clinical characteristics with varying sensitivity and specificity. We aimed to use multiple, weighted, clinical criteria to determine an individual's probability of having MODY, as a crucial tool for rational genetic testing. We developed prediction models using logistic regression on data from 1,191 patients with MODY (n = 594), type 1 diabetes (n = 278) and type 2 diabetes (n = 319). Model performance was assessed by receiver operating characteristic (ROC) curves, cross-validation and validation in a further 350 patients. The models defined an overall probability of MODY using a weighted combination of the most discriminative characteristics. For MODY, compared with type 1 diabetes, these were: lower HbA1c, parent with diabetes, female sex and older age at diagnosis. MODY was discriminated from type 2 diabetes by: lower BMI, younger age at diagnosis, female sex, lower HbA1c, parent with diabetes, and not being treated with oral hypoglycaemic agents or insulin. Both models showed excellent discrimination (c-statistic = 0.95 and 0.98, respectively), low rates of cross-validated misclassification (9.2% and 5.3%), and good performance on the external test dataset (c-statistic = 0.95 and 0.94). Using the optimal cut-offs, the probability models improved the sensitivity (91% vs 72%) and specificity (94% vs 91%) for identifying MODY compared with standard criteria of diagnosis <25 years and an affected parent. The models are now available online at www.diabetesgenes.org. We have developed clinical prediction models that calculate an individual's probability of having MODY. This allows an improved and more rational approach to determine who should have molecular genetic testing.
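The c-statistic reported for both models is the area under the ROC curve, which equals the probability that a randomly chosen patient with MODY receives a higher predicted probability than a randomly chosen patient without it. The Python sketch below computes it by direct pairwise comparison, counting ties as one half. The fitted logistic-regression coefficients are not given in the abstract, so the function takes predicted probabilities as input and the example values are toy data.

```python
def c_statistic(probs, labels):
    """Concordance (ROC AUC): share of case/non-case pairs where the case scores higher; ties = 1/2."""
    cases = [p for p, y in zip(probs, labels) if y == 1]
    others = [p for p, y in zip(probs, labels) if y == 0]
    if not cases or not others:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for pc in cases:
        for po in others:
            score += 1.0 if pc > po else 0.5 if pc == po else 0.0
    return score / (len(cases) * len(others))


# Toy predicted MODY probabilities (label 1 = MODY confirmed by genetic testing)
print(c_statistic([0.9, 0.4, 0.5, 0.2], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is quadratic in sample size, which is fine for cohorts of a few hundred patients; a rank-based formula gives the same value faster for large datasets.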
Family conflict interacts with genetic liability in predicting childhood and adolescent depression.
Rice, Frances; Harold, Gordon T; Shelton, Katherine H; Thapar, Anita
2006-07-01
The objective was to test for gene-environment interaction with depressive symptoms and family conflict. Specifically, the first aim was to examine whether the influence of family conflict in predicting depressive symptoms is increased in individuals at genetic risk of depression; the second was to test whether the genetic component of variance in depressive symptoms increases as levels of family conflict increase. A longitudinal twin design was used. Children aged 5 to 16 were reassessed approximately 3 years later to test whether the influence of family conflict in predicting depressive symptoms varied according to genetic liability. The conflict subscale of the Family Environment Scale was used to assess family conflict, and the Mood and Feelings Questionnaire was used to assess depressive symptoms. The response rate to the questionnaire was 73% at time 1 and 65% at time 2. Controlling for initial symptom levels (i.e., internalizing at time 1), primary analyses were conducted using ordinary least-squares multiple regression. Structural equation models, using raw-score maximum likelihood estimation, were also fitted to the data for model fit comparison. Results suggested significant gene-environment interaction specifically with depressive symptoms and family conflict. Genetic factors were of greater importance in the etiology of depressive symptoms where levels of family conflict were high. The effects of family conflict on depressive symptoms were greater in children and adolescents at genetic risk of depression. The present results suggest that children with a family history of depression may be at an increased risk of developing depressive symptoms in response to family conflict. Intervention programs that incorporate one or more family systems may be of benefit in alleviating the adverse effect of negative family factors on children.
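A gene-environment interaction of this kind is usually tested in OLS regression by adding a product term between the environmental measure and an index of genetic risk, while adjusting for baseline symptoms. The Python sketch below assumes numeric predictors on comparable scales; how genetic risk was indexed in the twin design is not described in the abstract, so the genetic_risk input is hypothetical, and the data are simulated.

```python
import numpy as np


def fit_gxe_regression(y_t2, y_t1, conflict, genetic_risk):
    """OLS of time-2 symptoms on time-1 symptoms, conflict, genetic risk and their product."""
    y_t2, y_t1, conflict, genetic_risk = (
        np.asarray(v, dtype=float) for v in (y_t2, y_t1, conflict, genetic_risk)
    )
    X = np.column_stack([
        np.ones(len(y_t2)),
        y_t1,                        # control for initial symptom levels
        conflict,
        genetic_risk,
        conflict * genetic_risk,     # gene-environment interaction term
    ])
    coef, *_ = np.linalg.lstsq(X, y_t2, rcond=None)
    return dict(zip(["intercept", "t1", "conflict", "risk", "conflict_x_risk"], coef))


# Simulated data with a true interaction coefficient of 0.3
rng = np.random.default_rng(0)
t1, conflict, risk = rng.normal(size=(3, 200))
t2 = 0.5 * t1 + 0.2 * conflict + 0.1 * risk + 0.3 * conflict * risk + rng.normal(0, 0.1, 200)
est = fit_gxe_regression(t2, t1, conflict, risk)
```

A positive conflict_x_risk coefficient corresponds to the reported pattern: the effect of family conflict on later symptoms grows with genetic risk.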
Silva, F G; Torres, R A; Brito, L F; Euclydes, R F; Melo, A L P; Souza, N O; Ribeiro, J I; Rodrigues, M T
2013-12-11
The objective of this study was to identify the best random regression model using Legendre orthogonal polynomials for the genetic evaluation of Alpine goats and to estimate genetic parameters for test-day milk yield. We analyzed 20,710 test-day milk yield records from 667 goats of the Goat Sector of the Universidade Federal de Viçosa. The evaluated models combined different fitting orders of the polynomials for the fixed curve (2-5), the random additive genetic curve (1-7), and the permanent environmental curve (1-7), with different numbers of classes for residual variance (2, 4, 5, and 6). WOMBAT software was used for all genetic analyses. The best random regression model for genetic evaluation of test-day milk yield in Alpine goats used a fixed curve of order 4, an additive genetic curve of order 2, a permanent environmental curve of order 7, and a minimum of 5 classes of residual variance, because it was the most parsimonious model among those that were equivalent to the complete model by the likelihood ratio test. Phenotypic variance and heritability were higher at the end of the lactation period, indicating that the length of lactation has a larger genetic component than the production peak and persistency. It is very important that the evaluation use the best combination of fixed, additive genetic, and permanent environmental regressions, and the best number of classes of heterogeneous residual variance, for genetic evaluation using random regression models, thereby enhancing the precision and accuracy of parameter estimates and the prediction of genetic values.
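Random regression models describe each goat's lactation curve through Legendre polynomial covariates evaluated on days in milk rescaled to [−1, 1]. The Python sketch below builds the normalised covariates φ_j(x) = √((2j + 1)/2)·P_j(x) with NumPy. The days-in-milk range is hypothetical, and because papers differ on whether "order k" means degree k or k coefficients, n_coef is used here as the explicit number of terms. The analyses in the study were run in WOMBAT, not with this code.

```python
import numpy as np
from numpy.polynomial import legendre


def legendre_covariates(dim, n_coef, dim_min, dim_max):
    """Normalised Legendre covariates: rows = test-day records, columns = polynomial terms."""
    x = 2.0 * (np.asarray(dim, dtype=float) - dim_min) / (dim_max - dim_min) - 1.0
    cols = []
    for j in range(n_coef):
        c = np.zeros(j + 1)
        c[j] = 1.0                                    # coefficient vector selecting P_j
        cols.append(np.sqrt((2 * j + 1) / 2.0) * legendre.legval(x, c))
    return np.column_stack(cols)


# Hypothetical lactation from day 5 to day 305, evaluated at start, middle and end
days_in_milk = np.array([5, 155, 305])
Phi = legendre_covariates(days_in_milk, n_coef=3, dim_min=5, dim_max=305)
```

Each row of Phi multiplies an animal's random regression coefficients to give its genetic or permanent environmental deviation on that test day; raising n_coef adds flexibility at the cost of more (co)variance parameters, which is the trade-off the likelihood ratio tests in the study weighed.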
Life-history and habitat features influence the within-river genetic structure of Atlantic salmon.
Vähä, Juha-Pekka; Erkinaro, Jaakko; Niemelä, Eero; Primmer, Craig R
2007-07-01
Defining populations and identifying ecological and life-history characteristics affecting genetic structure is important for understanding species biology and hence for managing threatened or endangered species or populations. In this study, populations of the world's largest indigenous Atlantic salmon (Salmo salar) stock were first inferred using model-based clustering methods, after which the life-history and habitat variables that best predicted the genetic diversity of populations were identified. This study revealed that natal homing of Atlantic salmon within the Teno River system is accurate at least to the tributary level. Generally, defining populations by main tributaries was a reasonable approach in this large river system, whereas in the mainstem of the river the number of inferred populations was smaller than the number of distinct sampling sites. Mainstem and headwater populations were genetically more diverse and less diverged, while each tributary fostered a distinct population with high genetic differentiation and lower genetic diversity. Population structure and variation in genetic diversity among populations were poorly explained by geographical distance. In contrast, age structure, as estimated by the proportion of multisea-winter spawners, was the most predictive variable in explaining the variation in the genetic diversity of the populations. This observation, which agrees with theoretical predictions, emphasizes the importance of large multisea-winter females in maintaining the genetic diversity of populations. In addition, the unique genetic diversity of populations, as estimated by private allele richness, was affected by the ease of access to a site, with less accessible sites having lower unique genetic diversity. 
Our results show that despite this species' high capacity for migration, tributaries foster relatively closed populations with little gene flow which will be important to consider when developing management strategies for the system.
Altmann, Vivian; Schumacher-Schuh, Artur F; Rieck, Mariana; Callegari-Jacques, Sidia M; Rieder, Carlos R M; Hutz, Mara H
2016-04-01
Levodopa is the first-line treatment for the motor symptoms of Parkinson's disease, but dose response is highly variable. The aim of this study was therefore to determine how much of the variation in levodopa dose could be explained by biological, pharmacological and genetic factors. A total of 224 Parkinson's disease patients were genotyped for SV2C and SLC6A3 polymorphisms by allelic discrimination assays. Comedication, demographic and clinical data were also assessed. All variables with p < 0.20 were included in a multiple regression analysis for dose prediction. The final model explained 23% of dose variation (F = 11.54; p < 0.000001). Although a good prediction model was obtained, it still needs to be validated in an independent sample.
Privacy-preserving genomic testing in the clinic: a model using HIV treatment.
McLaren, Paul J; Raisaro, Jean Louis; Aouri, Manel; Rotger, Margalida; Ayday, Erman; Bartha, István; Delgado, Maria B; Vallet, Yannick; Günthard, Huldrych F; Cavassini, Matthias; Furrer, Hansjakob; Doco-Lecompte, Thanh; Marzolini, Catia; Schmid, Patrick; Di Benedetto, Caroline; Decosterd, Laurent A; Fellay, Jacques; Hubaux, Jean-Pierre; Telenti, Amalio
2016-08-01
The implementation of genomic-based medicine is hindered by unresolved questions regarding data privacy and delivery of interpreted results to health-care practitioners. We used DNA-based prediction of HIV-related outcomes as a model to explore critical issues in clinical genomics. We genotyped 4,149 markers in HIV-positive individuals. Variants allowed for prediction of 17 traits relevant to HIV medical care, inference of patient ancestry, and imputation of human leukocyte antigen (HLA) types. Genetic data were processed under a privacy-preserving framework using homomorphic encryption, and clinical reports describing potentially actionable results were delivered to health-care providers. A total of 230 patients were included in the study. We demonstrated the feasibility of encrypting a large number of genetic markers, inferring patient ancestry, computing monogenic and polygenic trait risks, and reporting results under privacy-preserving conditions. The average execution time of a multimarker test on encrypted data was 865 ms on a standard computer. The proportion of tests returning potentially actionable genetic results ranged from 0 to 54%. The model of implementation presented herein informs strategies for delivering genomic test results for clinical care. Data encryption to ensure privacy helps to build patient trust, a key requirement on the road to genomic-based medicine. Genet Med 18(8), 814-822.
A statistical framework for genetic association studies of power curves in bird flight
Lin, Min; Zhao, Wei
2006-01-01
How the power required for bird flight varies as a function of forward speed can be used to predict the flight style and behavioral strategy of a bird for feeding and migration. A U-shaped relationship between power and flight velocity has been observed in many birds, consistent with the theoretical predictions of aerodynamic models. In this article, we present a general genetic model for fine mapping of quantitative trait loci (QTL) responsible for power curves in a sample of birds drawn from a natural population. This model is developed within the maximum likelihood context, implemented with the EM algorithm for estimating the population genetic parameters of QTL and the simplex algorithm for estimating the QTL genotype-specific parameters of power curves. Using Monte Carlo simulation derived from empirical observations of power curves in the European starling (Sturnus vulgaris), we demonstrate how the underlying QTL for power curves can be detected from molecular markers and how the QTL detected affect the most appropriate flight speeds used to design an optimal migration strategy. The results from our model can be directly integrated into a conceptual framework for understanding flight origin and evolution. PMID:17066123
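To make the "U-shaped curve" and the "most appropriate flight speeds" concrete, the sketch below uses a common textbook simplification of aerodynamic power, P(v) = a/v + b·v³ (an induced-power term that falls with speed plus a parasite-power term that rises with it). This two-term form and the coefficients `a` and `b` are assumptions for illustration, not the curve model fitted in the article. Setting dP/dv = 0 gives the minimum-power speed, v⁴ = a/(3b); minimizing energy per distance, P/v, gives the maximum-range speed, v⁴ = a/b, which is the speed usually associated with migration.

```python
def power(v, a, b):
    """Simplified U-shaped power curve: induced term a/v plus parasite term b*v**3."""
    return a / v + b * v ** 3

def min_power_speed(a, b):
    """Speed that minimizes power: dP/dv = -a/v**2 + 3*b*v**2 = 0 gives v**4 = a / (3b)."""
    return (a / (3.0 * b)) ** 0.25

def max_range_speed(a, b):
    """Speed that minimizes energy per unit distance, P/v: v**4 = a / b."""
    return (a / b) ** 0.25

# Example with hypothetical coefficients a = 3, b = 1.
v_mp = min_power_speed(3.0, 1.0)
v_mr = max_range_speed(3.0, 1.0)
```

Because a/b exceeds a/(3b), the maximum-range speed is always faster than the minimum-power speed under this model, so a QTL that shifts `a` or `b` would shift both optimal speeds.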
Effect of genetic polymorphisms on development of gout.
Urano, Wako; Taniguchi, Atsuo; Inoue, Eisuke; Sekita, Chieko; Ichikawa, Naomi; Koseki, Yumi; Kamatani, Naoyuki; Yamanaka, Hisashi
2013-08-01
To validate the association between genetic polymorphisms and gout in Japanese patients, and to investigate the cumulative effects of multiple genetic factors on the development of gout. Subjects were 153 Japanese male patients with gout and 532 male controls. The genotypes of 11 polymorphisms in the 10 genes that have been indicated to be associated with serum uric acid levels or gout were determined. The cumulative effects of the genetic polymorphisms were investigated using a weighted genotype risk score (wGRS) based on the number of risk alleles and the OR for gout. A model to discriminate between patients with gout and controls was constructed by incorporating the wGRS and clinical factors. The C statistic was used to evaluate the capability of the model to discriminate gout patients from controls. Seven polymorphisms were shown to be associated with gout. The mean wGRS was significantly higher in patients with gout (15.2 ± 2.01) compared to controls (13.4 ± 2.10; p < 0.0001). The C statistic for the model using genetic information alone was 0.72, while the C statistic was 0.81 for the full model that incorporated all genetic and clinical factors. Accumulation of multiple genetic factors is associated with the development of gout. A prediction model for gout that incorporates genetic and clinical factors may be useful for identifying individuals who are at risk of gout.
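The two quantities in this abstract, a weighted genotype risk score and the C statistic, can be sketched in a few lines. The abstract does not state the exact weighting, so the sketch assumes the common convention of weighting each risk-allele count by ln(OR); the study's own scale (mean wGRS around 13 to 15) suggests it may have used a different weight. The C statistic is computed as the fraction of case-control pairs in which the case scores higher, with ties counted as one half, which equals the area under the ROC curve. Function names are hypothetical.

```python
import math

def weighted_grs(risk_allele_counts, odds_ratios):
    """Weighted genotype risk score: sum over SNPs of (risk-allele count x weight).

    Assumption: weight = ln(OR); the abstract does not give the study's weighting.
    """
    return sum(n * math.log(or_) for n, or_ in zip(risk_allele_counts, odds_ratios))

def c_statistic(case_scores, control_scores):
    """C statistic (ROC AUC) by exhaustive comparison of every case-control pair."""
    concordant = 0.0
    for s_case in case_scores:
        for s_ctrl in control_scores:
            if s_case > s_ctrl:
                concordant += 1.0   # case correctly ranked above control
            elif s_case == s_ctrl:
                concordant += 0.5   # ties count as half
    return concordant / (len(case_scores) * len(control_scores))

# Example: two cases and two controls; one tied pair.
auc = c_statistic([3, 2], [1, 2])
```

A C statistic of 0.5 means no discrimination and 1.0 perfect discrimination, which is the scale on which the reported 0.72 and 0.81 should be read.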
NASA Astrophysics Data System (ADS)
Ji, Liang-Bo; Chen, Fang
2017-07-01
Numerical simulation and intelligent optimization techniques were adopted for the rolling and extrusion of zincked sheet. Using response surface methodology (RSM), a genetic algorithm (GA) and data processing techniques, an efficient optimization of the process parameters for rolling zincked sheet was investigated. The effects of roller gap, rolling speed and friction factor on reduction rate and plate shortening rate were first analyzed. A predictive response surface model for the comprehensive quality index of the part was then created using RSM, and simulated and predicted values were compared. The optimal process parameters for rolling were solved with the genetic algorithm and then verified. The approach proved feasible and effective.
Radiogenomics to characterize regional genetic heterogeneity in glioblastoma
Hu, Leland S.; Ning, Shuluo; Eschbacher, Jennifer M.; Baxter, Leslie C.; Gaw, Nathan; Ranjbar, Sara; Plasencia, Jonathan; Dueck, Amylou C.; Peng, Sen; Smith, Kris A.; Nakaji, Peter; Karis, John P.; Quarles, C. Chad; Wu, Teresa; Loftus, Joseph C.; Jenkins, Robert B.; Sicotte, Hugues; Kollmeyer, Thomas M.; O'Neill, Brian P.; Elmquist, William; Hoxworth, Joseph M.; Frakes, David; Sarkaria, Jann; Swanson, Kristin R.; Tran, Nhan L.; Li, Jing; Mitchell, J. Ross
2017-01-01
Background Glioblastoma (GBM) exhibits profound intratumoral genetic heterogeneity. Each tumor comprises multiple genetically distinct clonal populations with different therapeutic sensitivities. This has implications for targeted therapy and genetically informed paradigms. Contrast-enhanced (CE)-MRI and conventional sampling techniques have failed to resolve this heterogeneity, particularly for nonenhancing tumor populations. This study explores the feasibility of using multiparametric MRI and texture analysis to characterize regional genetic heterogeneity throughout MRI-enhancing and nonenhancing tumor segments. Methods We collected multiple image-guided biopsies from primary GBM patients throughout regions of enhancement (ENH) and nonenhancing parenchyma (so-called brain-around-tumor [BAT]). For each biopsy, we analyzed DNA copy number variants for core GBM driver genes reported by The Cancer Genome Atlas. We co-registered biopsy locations with MRI and texture maps to correlate regional genetic status with spatially matched imaging measurements. We also built multivariate predictive decision-tree models for each GBM driver gene and validated accuracies using leave-one-out cross-validation (LOOCV). Results We collected 48 biopsies (13 tumors) and identified significant imaging correlations (univariate analysis) for 6 driver genes: EGFR, PDGFRA, PTEN, CDKN2A, RB1, and TP53. Predictive model accuracies (on LOOCV) varied by driver gene of interest. The highest accuracies were observed for CDKN2A (87.5%), RB1 (87.5%), PDGFRA (77.1%), and EGFR (75%), while the lowest accuracy was observed for TP53 (37.5%). Models for 4 driver genes (EGFR, RB1, CDKN2A, and PTEN) showed higher accuracy in BAT samples (n = 16) compared with those from ENH segments (n = 32). Conclusion MRI and texture analysis can help characterize regional genetic heterogeneity, which offers potential diagnostic value under the paradigm of individualized oncology. PMID:27502248
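Leave-one-out cross-validation, used here to estimate the accuracy of each driver-gene model, trains on all samples but one, predicts the held-out sample, and repeats for every sample. The sketch below shows the procedure in plain Python. To keep it short and dependency-free, it substitutes a 1-nearest-neighbour classifier for the decision trees used in the study; `loocv_accuracy`, `fit_1nn`, and `predict_1nn` are hypothetical names, and any `fit`/`predict` pair could be plugged in.

```python
def loocv_accuracy(X, y, fit, predict):
    """Leave-one-out cross-validation: train on n-1 samples, score the held-out one."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        if predict(model, X[i]) == y[i]:
            correct += 1
    return correct / len(X)

def fit_1nn(X, y):
    # Stand-in classifier (not a decision tree): a 1-NN "model" just stores the training data.
    return list(zip(X, y))

def predict_1nn(model, x):
    # Return the label of the training sample with the smallest squared Euclidean distance.
    return min(model, key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)))[1]

# Example: one imaging feature per biopsy, binary copy-number status.
acc = loocv_accuracy([[0.0], [0.1], [1.0], [1.1]], [0, 0, 1, 1], fit_1nn, predict_1nn)
```

With only 48 biopsies, each held-out prediction shifts accuracy by about 2 percentage points (and by about 6 points within the 16 BAT samples), which is worth keeping in mind when comparing the reported per-gene accuracies.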
Abdollahi-Arpanahi, Rostam; Morota, Gota; Valente, Bruno D; Kranis, Andreas; Rosa, Guilherme J M; Gianola, Daniel
2016-02-03
Genome-wide association studies in humans have found enrichment of trait-associated single nucleotide polymorphisms (SNPs) in coding regions of the genome and depletion of these in intergenic regions. However, a recent release of the ENCyclopedia Of DNA Elements (ENCODE) showed that ~80% of the human genome has a biochemical function. Similar studies on the chicken genome are lacking; thus, assessing the relative contribution of its genic and non-genic regions to variation is relevant for biological studies and genetic improvement of chicken populations. A dataset including 1351 birds that were genotyped with the 600K Affymetrix platform was used. We partitioned SNPs according to genome annotation data into six classes to characterize the relative contribution of genic and non-genic regions to genetic variation as well as their predictive power using all available quality-filtered SNPs. Target traits were body weight, ultrasound measurement of breast muscle and hen house egg production in broiler chickens. Six genomic regions were considered: intergenic regions, introns, missense, synonymous, 5' and 3' untranslated regions, and regions that are located 5 kb upstream and downstream of coding genes. Genomic relationship matrices were constructed for each genomic region and fitted in the models, separately or simultaneously. Kernel-based ridge regression was used to estimate variance components and assess predictive ability. Contribution of each class of genomic regions to dominance variance was also considered. Variance component estimates indicated that all genomic regions contributed to marked additive genetic variation and that the class of synonymous regions tended to have the greatest contribution. The marked dominance genetic variation explained by each class of genomic regions was similar and negligible (~0.05). In terms of prediction mean-square error, the whole-genome approach showed the best predictive ability.
All genic and non-genic regions contributed to phenotypic variation for the three traits studied. Overall, the contribution of additive genetic variance to the total genetic variance was much greater than that of dominance variance. Our results show that all genomic regions are important for the prediction of the targeted traits, and the whole-genome approach was reaffirmed as the best tool for genome-enabled prediction of quantitative traits.
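The sketch below illustrates the two building blocks this study describes: a genomic relationship matrix built from one class of SNPs, and kernel ridge regression that uses that matrix as its kernel. It assumes VanRaden's first method for the relationship matrix and genotypes coded 0/1/2, and treats the ridge parameter `lam` as a user-chosen constant (in a mixed-model view it corresponds to the ratio of residual to genetic variance). The study's exact kernel construction and variance-component estimation are not reproduced; function names are hypothetical.

```python
import numpy as np

def vanraden_grm(M):
    """Genomic relationship matrix (VanRaden method 1) from an n x m genotype matrix coded 0/1/2."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0      # allele frequency of each SNP
    Z = M - 2.0 * p               # center each SNP column by its expected count
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def kernel_ridge_fit_predict(K, y, train_idx, test_idx, lam):
    """Kernel ridge regression: prediction = mean + K_vt (K_tt + lam*I)^-1 (y_t - mean)."""
    y = np.asarray(y, dtype=float)
    K_tt = K[np.ix_(train_idx, train_idx)]   # kernel among training birds
    K_vt = K[np.ix_(test_idx, train_idx)]    # kernel between test and training birds
    mu = y[train_idx].mean()
    alpha = np.linalg.solve(K_tt + lam * np.eye(len(train_idx)), y[train_idx] - mu)
    return mu + K_vt @ alpha

# Example: 4 birds x 3 SNPs from one (hypothetical) genomic region; predict bird 4 from birds 1-3.
G = vanraden_grm([[0, 1, 2], [2, 1, 0], [1, 1, 1], [0, 2, 2]])
pred = kernel_ridge_fit_predict(G, [1.0, 2.0, 1.5, 1.2], [0, 1, 2], [3], lam=1.0)
```

Fitting regions "simultaneously" could be approximated by summing region-specific matrices, each scaled by its own variance component, before calling the ridge step.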
Attia, Khalid A M; Nassar, Mohammed W I; El-Zeiny, Mohamed B; Serag, Ahmed
2017-01-05
For the first time, a new variable selection method based on swarm intelligence, namely the firefly algorithm, is coupled with three different multivariate calibration models, namely concentration residual augmented classical least squares, artificial neural network and support vector regression, applied to UV spectral data. A comparative study between the firefly algorithm and the well-known genetic algorithm was performed. The results revealed the superiority of this new, powerful algorithm over the genetic algorithm. Moreover, different statistical tests were performed, and no significant differences in predictive ability were found among the models. This indicates that simpler and faster models were obtained without any deterioration in calibration quality. Copyright © 2016 Elsevier B.V. All rights reserved.
Putting mechanisms into crop production models.
Boote, Kenneth J; Jones, James W; White, Jeffrey W; Asseng, Senthold; Lizaso, Jon I
2013-09-01
Crop growth models dynamically simulate processes of C, N and water balance on daily or hourly time-steps to predict crop growth and development and, at season-end, final yield. Their ability to integrate effects of genetics, environment and crop management has led to applications ranging from understanding gene function to predicting potential impacts of climate change. The history of crop models is reviewed briefly, and their level of mechanistic detail for assimilation and respiration, ranging from hourly leaf-to-canopy assimilation to daily radiation-use efficiency, is discussed. Crop models have improved steadily over the past 30-40 years, but much work remains. Improvements are needed for the prediction of transpiration response to elevated CO₂ and high temperature effects on phenology and reproductive fertility, and simulation of root growth and nutrient uptake under stressful edaphic conditions. Mechanistic improvements are needed to better connect crop growth to genetics and to soil fertility, soil waterlogging and pest damage. Because crop models integrate multiple processes and consider impacts of environment and management, they have excellent potential for linking research from genomics and allied disciplines to crop responses at the field scale, thus providing a valuable tool for deciphering genotype by environment by management effects. © 2013 John Wiley & Sons Ltd.
The Cancer Cell Line Encyclopedia enables predictive modeling of anticancer drug sensitivity
Barretina, Jordi; Caponigro, Giordano; Stransky, Nicolas; Venkatesan, Kavitha; Margolin, Adam A.; Kim, Sungjoon; Wilson, Christopher J.; Lehár, Joseph; Kryukov, Gregory V.; Sonkin, Dmitriy; Reddy, Anupama; Liu, Manway; Murray, Lauren; Berger, Michael F.; Monahan, John E.; Morais, Paula; Meltzer, Jodi; Korejwa, Adam; Jané-Valbuena, Judit; Mapa, Felipa A.; Thibault, Joseph; Bric-Furlong, Eva; Raman, Pichai; Shipway, Aaron; Engels, Ingo H.; Cheng, Jill; Yu, Guoying K.; Yu, Jianjun; Aspesi, Peter; de Silva, Melanie; Jagtap, Kalpana; Jones, Michael D.; Wang, Li; Hatton, Charles; Palescandolo, Emanuele; Gupta, Supriya; Mahan, Scott; Sougnez, Carrie; Onofrio, Robert C.; Liefeld, Ted; MacConaill, Laura; Winckler, Wendy; Reich, Michael; Li, Nanxin; Mesirov, Jill P.; Gabriel, Stacey B.; Getz, Gad; Ardlie, Kristin; Chan, Vivien; Myer, Vic E.; Weber, Barbara L.; Porter, Jeff; Warmuth, Markus; Finan, Peter; Harris, Jennifer L.; Meyerson, Matthew; Golub, Todd R.; Morrissey, Michael P.; Sellers, William R.; Schlegel, Robert; Garraway, Levi A.
2012-01-01
The systematic translation of cancer genomic data into knowledge of tumor biology and therapeutic avenues remains challenging. Such efforts should be greatly aided by robust preclinical model systems that reflect the genomic diversity of human cancers and for which detailed genetic and pharmacologic annotation is available. Here we describe the Cancer Cell Line Encyclopedia (CCLE): a compilation of gene expression, chromosomal copy number, and massively parallel sequencing data from 947 human cancer cell lines. When coupled with pharmacologic profiles for 24 anticancer drugs across 479 of the lines, this collection allowed identification of genetic, lineage, and gene expression-based predictors of drug sensitivity. In addition to known predictors, we found that plasma cell lineage correlated with sensitivity to IGF1 receptor inhibitors; AHR expression was associated with MEK inhibitor efficacy in NRAS-mutant lines; and SLFN11 expression predicted sensitivity to topoisomerase inhibitors. Altogether, our results suggest that large, annotated cell line collections may help to enable preclinical stratification schemata for anticancer agents. The generation of genetic predictions of drug response in the preclinical setting and their incorporation into cancer clinical trial design could speed the emergence of “personalized” therapeutic regimens. PMID:22460905
The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity.
Barretina, Jordi; Caponigro, Giordano; Stransky, Nicolas; Venkatesan, Kavitha; Margolin, Adam A; Kim, Sungjoon; Wilson, Christopher J; Lehár, Joseph; Kryukov, Gregory V; Sonkin, Dmitriy; Reddy, Anupama; Liu, Manway; Murray, Lauren; Berger, Michael F; Monahan, John E; Morais, Paula; Meltzer, Jodi; Korejwa, Adam; Jané-Valbuena, Judit; Mapa, Felipa A; Thibault, Joseph; Bric-Furlong, Eva; Raman, Pichai; Shipway, Aaron; Engels, Ingo H; Cheng, Jill; Yu, Guoying K; Yu, Jianjun; Aspesi, Peter; de Silva, Melanie; Jagtap, Kalpana; Jones, Michael D; Wang, Li; Hatton, Charles; Palescandolo, Emanuele; Gupta, Supriya; Mahan, Scott; Sougnez, Carrie; Onofrio, Robert C; Liefeld, Ted; MacConaill, Laura; Winckler, Wendy; Reich, Michael; Li, Nanxin; Mesirov, Jill P; Gabriel, Stacey B; Getz, Gad; Ardlie, Kristin; Chan, Vivien; Myer, Vic E; Weber, Barbara L; Porter, Jeff; Warmuth, Markus; Finan, Peter; Harris, Jennifer L; Meyerson, Matthew; Golub, Todd R; Morrissey, Michael P; Sellers, William R; Schlegel, Robert; Garraway, Levi A
2012-03-28
The systematic translation of cancer genomic data into knowledge of tumour biology and therapeutic possibilities remains challenging. Such efforts should be greatly aided by robust preclinical model systems that reflect the genomic diversity of human cancers and for which detailed genetic and pharmacological annotation is available. Here we describe the Cancer Cell Line Encyclopedia (CCLE): a compilation of gene expression, chromosomal copy number and massively parallel sequencing data from 947 human cancer cell lines. When coupled with pharmacological profiles for 24 anticancer drugs across 479 of the cell lines, this collection allowed identification of genetic, lineage, and gene-expression-based predictors of drug sensitivity. In addition to known predictors, we found that plasma cell lineage correlated with sensitivity to IGF1 receptor inhibitors; AHR expression was associated with MEK inhibitor efficacy in NRAS-mutant lines; and SLFN11 expression predicted sensitivity to topoisomerase inhibitors. Together, our results indicate that large, annotated cell-line collections may help to enable preclinical stratification schemata for anticancer agents. The generation of genetic predictions of drug response in the preclinical setting and their incorporation into cancer clinical trial design could speed the emergence of 'personalized' therapeutic regimens.
Zhu, Jinning; Xu, Xuan; Tao, Qing; Yi, Panpan; Yu, Dan; Xu, Xinwei
2017-07-01
Ecological niche modeling is an effective tool to characterize the spatial distribution of suitable areas for species, and it is especially useful for predicting the potential distribution of invasive species. The widespread submerged plant Hydrilla verticillata (hydrilla) has an obvious phylogeographical pattern: four genetic lineages occupy distinct regions in its native range, and only one lineage has invaded the Americas. Here, we aimed to evaluate climatic niche conservatism of hydrilla in North America at the intraspecific level and explore its invasion potential in the Americas by comparing climatic niches in a phylogenetic context. A niche shift was found during the invasion of North America by hydrilla, probably attributable mainly to high levels of somatic mutation. Dramatic range expansion in the Americas, especially in South America, was predicted if all four genetic lineages were to invade the Americas or under future climate change; this suggests that hydrilla has a high invasion potential in the Americas. Our findings provide useful information for the management of hydrilla in the Americas and give an example of exploring intraspecific climatic niche to better understand species invasion.
Cryptic biodiversity loss linked to global climate change
NASA Astrophysics Data System (ADS)
Bálint, M.; Domisch, S.; Engelhardt, C. H. M.; Haase, P.; Lehrian, S.; Sauer, J.; Theissinger, K.; Pauls, S. U.; Nowak, C.
2011-09-01
Global climate change (GCC) significantly affects distributional patterns of organisms, and considerable impacts on biodiversity are predicted for the next decades. Inferred effects include large-scale range shifts towards higher altitudes and latitudes, facilitation of biological invasions and species extinctions. Alterations of biotic patterns caused by GCC have usually been predicted on the scale of taxonomically recognized morphospecies. However, the effects of climate change at the most fundamental level of biodiversity, intraspecific genetic diversity, remain elusive. Here we show that the use of morphospecies-based assessments of GCC effects will result in underestimations of the true scale of biodiversity loss. Species distribution modelling and assessments of mitochondrial DNA variability in nine montane aquatic insect species in Europe indicate that future range contractions will be accompanied by severe losses of cryptic evolutionary lineages and genetic diversity within these lineages. These losses greatly exceed those at the scale of morphospecies. We also document that the extent of range reduction may be a useful proxy when predicting losses of genetic diversity. Our results demonstrate that intraspecific patterns of genetic diversity should be considered when estimating the effects of climate change on biodiversity.
Isocost Lines Describe the Cellular Economy of Genetic Circuits
Gyorgy, Andras; Jiménez, José I.; Yazbek, John; Huang, Hsin-Ho; Chung, Hattie; Weiss, Ron; Del Vecchio, Domitilla
2015-01-01
Genetic circuits in living cells share transcriptional and translational resources that are available in limited amounts. This leads to unexpected couplings among seemingly unconnected modules, which result in poorly predictable circuit behavior. In this study, we determine these interdependencies between products of different genes by characterizing the economy of how transcriptional and translational resources are allocated to the production of proteins in genetic circuits. We discover that, when expressed from the same plasmid, the combinations of attainable protein concentrations are constrained by a linear relationship, which can be interpreted as an isocost line, a concept used in microeconomics. We created a library of circuits with two reporter genes, one constitutive and the other inducible in the same plasmid, without a regulatory path between them. In agreement with the model predictions, experiments reveal that the isocost line rotates when changing the ribosome binding site strength of the inducible gene and shifts when modifying the plasmid copy number. These results demonstrate that isocost lines can be employed to predict how genetic circuits become coupled when sharing resources and provide design guidelines for minimizing the effects of such couplings. PMID:26244745
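The isocost line borrows the microeconomic idea of a budget line: with a fixed budget, buying more of one good forces buying less of the other along a straight line. The toy sketch below assumes a single shared resource with budget `budget` and per-unit protein costs `cost1` and `cost2`, so attainable concentrations satisfy cost1·x1 + cost2·x2 = budget. This is an illustrative assumption, not the authors' mechanistic model; in it, changing one cost rotates the line about the other axis intercept, while scaling the budget shifts the line without changing its slope, mirroring the rotation and shift described above. Which biological parameter maps onto which model parameter is left as an assumption.

```python
def isocost_x2(x1, budget, cost1, cost2):
    """Attainable x2 for a given x1 on the budget line cost1*x1 + cost2*x2 = budget."""
    return (budget - cost1 * x1) / cost2

def intercepts(budget, cost1, cost2):
    """Axis intercepts of the isocost line: (x1 when x2 = 0, x2 when x1 = 0)."""
    return budget / cost1, budget / cost2

base = intercepts(10.0, 2.0, 5.0)      # reference line
rotated = intercepts(10.0, 2.0, 2.5)   # cheaper second protein: line rotates about the x1 intercept
shifted = intercepts(20.0, 2.0, 5.0)   # larger budget: parallel shift outward
```

Reading a measured pair of concentrations against such a line shows immediately how much inducing one gene must depress the other, even without any regulatory connection between them.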
Ergon, T; Ergon, R
2017-03-01
Genetic assimilation emerges from selection on phenotypic plasticity. Yet, commonly used quantitative genetics models of linear reaction norms considering intercept and slope as traits do not mimic the full process of genetic assimilation. We argue that intercept-slope reaction norm models are insufficient representations of genetic effects on linear reaction norms and that considering reaction norm intercept as a trait is unfortunate because the definition of this trait relates to a specific environmental value (zero) and confounds genetic effects on reaction norm elevation with genetic effects on environmental perception. Instead, we suggest a model with three traits representing genetic effects that, respectively, (i) are independent of the environment, (ii) alter the sensitivity of the phenotype to the environment and (iii) determine how the organism perceives the environment. The model predicts that, given sufficient additive genetic variation in environmental perception, the environmental value at which reaction norms tend to cross will respond rapidly to selection after an abrupt environmental change and will eventually become equal to the new mean environment. This readjustment of the zone of canalization is completed without changes in genetic correlations or genetic drift, and without imposing any fitness costs of maintaining plasticity. The asymptotic evolutionary outcome of this three-trait linear reaction norm generally entails a lower degree of phenotypic plasticity than the two-trait model, and maximum expected fitness does not occur at the mean trait values in the population. © 2016 The Authors. Journal of Evolutionary Biology published by John Wiley & Sons Ltd on behalf of European Society for Evolutionary Biology.
Comparison of Family History and SNPs for Predicting Risk of Complex Disease
Do, Chuong B.; Hinds, David A.; Francke, Uta; Eriksson, Nicholas
2012-01-01
The clinical utility of family history and genetic tests is generally well understood for simple Mendelian disorders and rare subforms of complex diseases that are directly attributable to highly penetrant genetic variants. However, little is presently known regarding the performance of these methods in situations where disease susceptibility depends on the cumulative contribution of multiple genetic factors of moderate or low penetrance. Using quantitative genetic theory, we develop a model for studying the predictive ability of family history and single nucleotide polymorphism (SNP)–based methods for assessing risk of polygenic disorders. We show that family history is most useful for highly common, heritable conditions (e.g., coronary artery disease), where it explains roughly 20%–30% of disease heritability, on par with the most successful SNP models based on associations discovered to date. In contrast, we find that for diseases of moderate or low frequency (e.g., Crohn disease) family history accounts for less than 4% of disease heritability, substantially lagging behind SNPs in almost all cases. These results indicate that, for a broad range of diseases, already identified SNP associations may be better predictors of risk than their family history–based counterparts, despite the large fraction of missing heritability that remains to be explained. Our model illustrates the difficulty of using either family history or SNPs for standalone disease prediction. On the other hand, we show that, unlike family history, SNP–based tests can reveal extreme likelihood ratios for a relatively large percentage of individuals, thus providing potentially valuable adjunctive evidence in a differential diagnosis. PMID:23071447
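The abstract's point about "extreme likelihood ratios" is easiest to see through Bayes' rule in odds form: posterior odds = prior odds × likelihood ratio. The sketch below converts a prior risk (for example, population prevalence) into a posterior risk after a test result. It assumes the per-SNP likelihood ratios come from independent markers so that they can be multiplied; this independence assumption and the function name are illustrative, not the authors' model.

```python
import math

def posterior_risk(prior_risk, likelihood_ratios):
    """Update a prior disease risk with test likelihood ratios (Bayes' rule, odds form).

    Assumes independent markers, so the per-marker likelihood ratios multiply.
    """
    prior_odds = prior_risk / (1.0 - prior_risk)
    post_odds = prior_odds * math.prod(likelihood_ratios)
    return post_odds / (1.0 + post_odds)

# Example: a 20% prior risk and two markers each carrying a likelihood ratio of 2.
risk = posterior_risk(0.2, [2.0, 2.0])
```

A likelihood ratio near 1 leaves the prior essentially unchanged, which is why a test is most informative for the individuals whose combined ratio is far from 1 in either direction.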
Predicting Hydrologic Function With Aquatic Gene Fragments
NASA Astrophysics Data System (ADS)
Good, S. P.; URycki, D. R.; Crump, B. C.
2018-03-01
Recent advances in microbiology techniques, such as genetic sequencing, allow for rapid and cost-effective collection of large quantities of genetic information carried within water samples. Here we posit that the unique composition of aquatic DNA material within a water sample contains relevant information about hydrologic function at multiple temporal scales. In this study, machine learning was used to develop discharge prediction models trained on the relative abundance of bacterial taxa classified into operational taxonomic units (OTUs) based on 16S rRNA gene sequences from six large arctic rivers. We term this approach "genohydrology," and show that OTU relative abundances can be used to predict river discharge at monthly and longer timescales. Based on a single DNA sample from each river, the average Nash-Sutcliffe efficiency (NSE) for predicted mean monthly discharge values throughout the year was 0.84, while the NSE for predicted discharge values across different return intervals was 0.67. These are considerable improvements over predictions based only on the area-scaled mean specific discharge of five similar rivers, which had average NSE values of 0.64 and -0.32 for seasonal and recurrence interval discharge values, respectively. The genohydrology approach demonstrates that genetic diversity within the aquatic microbiome is a large and underutilized data resource with benefits for prediction of hydrologic function.
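The Nash-Sutcliffe efficiency used to score these discharge predictions is defined as NSE = 1 − Σ(observed − predicted)² / Σ(observed − mean of observed)². A value of 1 is a perfect fit, 0 means the predictions are no better than the observed mean, and negative values (such as the −0.32 of the area-scaled baseline) mean the predictions are worse than simply using the mean. A minimal sketch of the standard formula follows; the function name is illustrative.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / total sum of squares about the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))  # prediction error
    sst = sum((o - mean_obs) ** 2 for o in observed)              # variability of observations
    return 1.0 - sse / sst

# Example: predictions that reverse the observed seasonal pattern score below zero.
nse = nash_sutcliffe([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```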
The genetic architecture of maize height.
Peiffer, Jason A; Romay, Maria C; Gore, Michael A; Flint-Garcia, Sherry A; Zhang, Zhiwu; Millard, Mark J; Gardner, Candice A C; McMullen, Michael D; Holland, James B; Bradbury, Peter J; Buckler, Edward S
2014-04-01
Height is one of the most heritable and easily measured traits in maize (Zea mays L.). Given a pedigree or estimates of the genomic identity-by-state among related plants, height is also accurately predictable. However, mapping alleles explaining natural variation in maize height remains a formidable challenge. To address this challenge, we measured the plant height, ear height, flowering time, and node counts of plants grown in >64,500 plots across 13 environments. These plots contained >7300 inbreds representing most publicly available maize inbreds in the United States and families of the maize Nested Association Mapping (NAM) panel. Joint-linkage mapping of quantitative trait loci (QTL), fine mapping in near isogenic lines (NILs), genome-wide association studies (GWAS), and genomic best linear unbiased prediction (GBLUP) were performed. The heritability of maize height was estimated to be >90%. Mapping NAM family-nested QTL revealed that the largest explained 2.1 ± 0.9% of height variation. The effects of two tropical alleles at this QTL were independently validated by fine mapping in NIL families. Several significant associations found by GWAS colocalized with established height loci, including brassinosteroid-deficient dwarf1, dwarf plant1, and semi-dwarf2. GBLUP explained >80% of height variation in the panels and outperformed bootstrap aggregation of family-nested QTL models in evaluations of prediction accuracy. These results revealed that maize height was under strong genetic control and had a highly polygenic genetic architecture. They also showed that multiple models of genetic architecture differing in polygenicity and effect sizes can plausibly explain a population's variation in maize height, but they may vary in predictive efficacy.
Dinov, Ivo D; Heavner, Ben; Tang, Ming; Glusman, Gustavo; Chard, Kyle; Darcy, Mike; Madduri, Ravi; Pa, Judy; Spino, Cathie; Kesselman, Carl; Foster, Ian; Deutsch, Eric W; Price, Nathan D; Van Horn, John D; Ames, Joseph; Clark, Kristi; Hood, Leroy; Hampstead, Benjamin M; Dauer, William; Toga, Arthur W
2016-01-01
A unique archive of Big Data on Parkinson's Disease is collected, managed and disseminated by the Parkinson's Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson's disease (PD) risk to trauma, genetics, environment, co-morbidities, or lifestyle. The defining characteristics of Big Data (large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources) all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data. Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning based classification that enables the reporting of model parameters and diagnostic forecasting based on new data.
We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning based classification methods indicated significant power to predict Parkinson's disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson's Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetics (e.g., rs34637584, chr12), and derived neuroimaging biomarker (e.g., cerebellum shape index) data all contributed to the predictive analytics and diagnostic forecasting. Model-free Big Data machine learning-based classification methods (e.g., adaptive boosting, support vector machines) can outperform model-based techniques in terms of predictive precision and reliability (e.g., forecasting patient diagnosis). We observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heterogeneous and incomplete PPMI data. UPDRS scores play a critical role in predicting diagnosis, which is expected based on the clinical definition of Parkinson's disease. Even without longitudinal UPDRS data, however, the accuracy of model-free machine learning based classification is over 80%. The methods, software and protocols developed here are openly shared and can be employed to study other neurodegenerative disorders (e.g., Alzheimer's, Huntington's, amyotrophic lateral sclerosis), as well as for other predictive Big Data analytics applications.
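The abstract combines cohort rebalancing with n-fold cross-validation. The sketch below shows one common way to fit the two together, under stated assumptions. It uses plain random oversampling of the smaller class, applied only to each training split so the held-out fold keeps its original class proportions. The abstract does not say which rebalancing method was used, and random oversampling is simply an assumed stand-in. The toy nearest-mean classifier (`nearest_mean_fit`, `nearest_mean_predict`) and all function names are hypothetical placeholders for the classifiers the authors used (e.g., adaptive boosting, SVMs), which are not implemented here.

```python
import random
from collections import Counter, defaultdict


def oversample(X, y, rng):
    """Random oversampling: duplicate records of the smaller classes (drawn with
    replacement) until every class has as many records as the largest one."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    target = max(len(g) for g in groups.values())
    X_bal, y_bal = [], []
    for label, g in groups.items():
        extra = [rng.choice(g) for _ in range(target - len(g))]
        for xi in g + extra:
            X_bal.append(xi)
            y_bal.append(label)
    return X_bal, y_bal


def nearest_mean_fit(X, y):
    """Hypothetical toy classifier: store the mean (1-D) feature value per class."""
    sums, counts = defaultdict(float), defaultdict(int)
    for xi, yi in zip(X, y):
        sums[yi] += xi
        counts[yi] += 1
    return {label: sums[label] / counts[label] for label in sums}


def nearest_mean_predict(means, x):
    """Assign x to the class whose stored mean is closest."""
    return min(means, key=lambda label: abs(x - means[label]))


def cross_validate(X, y, k=5, seed=0):
    """Shuffled k-fold cross-validation. Rebalancing is applied to each training
    split only, so every held-out fold keeps the original class proportions."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        X_bal, y_bal = oversample([X[i] for i in train], [y[i] for i in train], rng)
        means = nearest_mean_fit(X_bal, y_bal)
        correct += sum(nearest_mean_predict(means, X[i]) == y[i] for i in test)
    return correct / len(y)


# Toy imbalanced cohort: 40 "PD" records and 10 "control" records (1-D feature).
X = [10 + 0.1 * i for i in range(40)] + [0.1 * i for i in range(10)]
y = ["PD"] * 40 + ["control"] * 10
print(Counter(oversample(X, y, random.Random(0))[1]))  # 40 of each class
print(cross_validate(X, y, k=5))
```

Rebalancing inside each training fold, rather than before splitting, keeps duplicated minority records from leaking into the test folds. Leakage of that kind would inflate the cross-validated accuracy.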
High school students' understanding and problem solving in population genetics
NASA Astrophysics Data System (ADS)
Soderberg, Patti D.
This study is an investigation of student understanding of population genetics and of how students developed, used and revised conceptual models to solve problems. The students in this study participated in three rounds of problem solving. The first round involved the use of a population genetics model to predict the number of carriers in a population. The second round required them to revise their model of simple dominance population genetics to make inferences about populations containing three phenotype variations. The third round of problem solving required the students to revise their model of population genetics to explain anomalous data where the proportions of males and females with a trait varied significantly. While solving problems, the students engaged in basic scientific processes: they observed population phenomena, constructed explanatory models to explain the data they observed, and attempted to persuade their peers of the adequacy of their models. In this study, the students produced new knowledge about the genetics of a trait in a population through the revision and use of explanatory population genetics models, using reasoning similar to that of scientists. The students learned, used and revised a model of Hardy-Weinberg equilibrium to generate and test hypotheses about the genetics of phenotypes given only population data. Students were also interviewed before and after instruction. This study suggests that a commonly held intuitive belief about the predominance of a dominant variation in populations is resistant to change despite instruction and interferes with students' ability to understand Hardy-Weinberg equilibrium and microevolution.
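The first problem-solving round used a population genetics model to predict the number of carriers. A worked version of that calculation follows, assuming a single autosomal locus with two alleles, simple dominance, and Hardy-Weinberg equilibrium. The function name and the example numbers are hypothetical. Under these assumptions the affected (aa) frequency is q², so q is its square root, p = 1 - q, and the carrier (Aa) frequency is 2pq.

```python
import math


def expected_carriers(population_size, affected_count):
    """Estimate the number of heterozygous carriers (Aa) of a recessive allele
    under Hardy-Weinberg equilibrium, given the number of affected (aa) people."""
    q_squared = affected_count / population_size  # frequency of the aa genotype
    q = math.sqrt(q_squared)                       # recessive allele frequency
    p = 1.0 - q                                    # dominant allele frequency
    return 2 * p * q * population_size             # expected Aa count


# Hypothetical population: 100 affected individuals out of 10,000.
print(round(expected_carriers(10_000, 100)))  # 1800
```

Here q² = 0.01 gives q = 0.1 and p = 0.9, so 18% of the population (1,800 people) are expected carriers. That is 18 times the number of affected individuals, which runs against the intuition that dominant variants must predominate.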
Devaux, C; Lavigne, C; Austerlitz, F; Klein, E K
2007-02-01
Understanding patterns of pollen movement at the landscape scale is important for establishing management rules following the release of genetically modified (GM) crops. Here we use a mating model adapted to cultivated species to estimate dispersal kernels from the genotypes of the progenies of male-sterile plants positioned at different sampling sites within a 10 × 10 km oilseed rape production area. Half of the pollen clouds sampled by the male-sterile plants originated from uncharacterized pollen sources that could consist of both large volunteer and feral populations and fields within and outside the study area. The geometric dispersal kernel was the most appropriate for predicting pollen movement in the study area. It predicted a much larger proportion of long-distance pollination than previously fitted dispersal kernels. This best-fitting mating model underestimated the level of differentiation among pollen clouds but could predict its spatial structure. The estimation method was validated on simulated genotypic data and provided good estimates of both the shape of the dispersal kernel and the rate and composition of pollen from uncharacterized pollen sources. The best dispersal kernel fitted here, the geometric kernel, should now be integrated into models that aim to predict gene flow at the landscape level, in particular between GM and non-GM crops.
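The abstract does not give the kernel's formula. A commonly used two-dimensional parameterization of the geometric kernel is k(r) = (b-2)(b-1)/(2πa²) · (1 + r/a)^(-b), with scale a > 0 and shape b > 2, and the sketch below assumes that form. Its tail decays as a power of distance rather than exponentially, which is consistent with the larger share of long-distance pollination reported above. `prob_beyond` is a closed-form tail probability derived from this form. `prob_within_numeric` checks it by midpoint-rule integration. The parameter values are hypothetical.

```python
import math


def geometric_kernel(r, a, b):
    """Assumed 2-D geometric dispersal kernel: density per unit area at distance r.
    a > 0 is a scale parameter, b > 2 a shape parameter (heavier tail as b -> 2)."""
    if a <= 0 or b <= 2:
        raise ValueError("require a > 0 and b > 2")
    return (b - 2) * (b - 1) / (2 * math.pi * a * a) * (1 + r / a) ** (-b)


def prob_beyond(d, a, b):
    """Closed-form probability that a pollen grain travels farther than d,
    i.e. the integral of 2*pi*r*k(r) from d to infinity."""
    x = d / a
    return (1 + x) ** (1 - b) * (1 + (b - 1) * x)


def prob_within_numeric(d, a, b, steps=100_000):
    """Midpoint-rule integral of 2*pi*r*k(r) from 0 to d, as a numerical check."""
    h = d / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        total += 2 * math.pi * r * geometric_kernel(r, a, b) * h
    return total


# Hypothetical parameters: scale a = 10 m, shape b = 3.
print(prob_beyond(100, 10, 3))          # 21/121, about 0.174
print(prob_within_numeric(100, 10, 3))  # about 1 - 0.174
```

With these values, roughly 17% of pollen is predicted to travel more than 100 m. An exponential kernel with a comparable scale would put far less mass that far out.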
NASA Astrophysics Data System (ADS)
Yasick, A. L.; Wolin, J. A.; Krebs, R. A.
2005-05-01
This study investigates two species of stoneflies with potentially opposing dispersal capabilities and genetic structure within four watersheds in the Lake Erie drainage system of Northeast Ohio. This research is twofold: it provides information on genetic variation in two understudied aquatic invertebrate species and on the impact of human land-use practices on this variation. Populations of Allocapnia recta, a winter-emerging stonefly, are predicted to have the least genetic variation within the four watersheds and the most differences among sites because of its rudimentary wing structure and winter emergence. Leuctra tenuis is predicted to have greater genetic variability within sites and fewer differences among sites because of its higher migration potential. In both species, models of isolation by distance will be tested. Distinct polymorphisms exist within the 16S rRNA region of A. recta, suggesting that this fragment has sufficient variation to address these questions.
Ehret, A; Hochstuhl, D; Krattenmacher, N; Tetens, J; Klein, M S; Gronwald, W; Thaller, G
2015-01-01
Subclinical ketosis is one of the most prevalent metabolic disorders in high-producing dairy cows during early lactation. This makes its early detection and prevention important for both economic and animal-welfare reasons. Construction of reliable predictive models is challenging, because traits like ketosis are commonly affected by multiple factors. In this context, machine learning methods offer great advantages because of their universal learning ability and flexibility in integrating various sorts of data. Here, an artificial-neural-network approach was applied to investigate the utility of metabolic, genetic, and milk performance data for the prediction of milk levels of β-hydroxybutyrate within and across consecutive weeks postpartum. Data were collected from 218 dairy cows during their first 5 wk in milk. All animals were genotyped with a 50,000 SNP panel, and weekly information on the concentrations of the milk metabolites glycerophosphocholine and phosphocholine as well as milk composition data (milk yield, fat and protein percentage) was available. The concentration of β-hydroxybutyric acid in milk was used as the target variable in all prediction models. Average correlations between observed and predicted target values of up to 0.643 were obtained when milk metabolite and routine milk recording data were combined for prediction on the same day within weeks. Predictive performance of metabolic as well as milk performance-based models was higher than that of models based on genetic information. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
TayyebTaher, M.; Esmaeilzadeh, S. Majid
2017-07-01
This article presents an application of a Model Predictive Controller (MPC) to the attitude control of a geostationary flexible satellite. A SIMO model of the geostationary satellite was derived using the Lagrange equations, with flexibility included in the modeling equations. The equations are expressed in state-space form to simplify the controller design. There is no specific tuning rule for finding the MPC parameters that yield the desired controller behavior. A genetic algorithm, an intelligent optimization method, was therefore used to tune the MPC parameters so as to minimize the rise time, settling time, and overshoot of the target point of the flexible structure and its mode-shape amplitudes, making large attitude maneuvers possible. The model included the geosynchronous orbit environment and geostationary satellite parameters. Simulation results for the flexible satellite performing an attitude maneuver show the efficiency of the proposed optimization method in comparison with an LQR optimal controller.
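The abstract tunes MPC parameters with a genetic algorithm against rise time, settling time, and overshoot. The sketch below is a generic real-coded genetic algorithm of that kind. The operator choices are assumptions, not the authors' settings: truncation selection, elitism, blend crossover, and Gaussian mutation whose step size shrinks each generation. `toy_cost` is a hypothetical stand-in. A real application would run a closed-loop simulation of the flexible satellite and score its response.

```python
import random


def genetic_minimize(cost, bounds, pop_size=40, generations=100,
                     mutation_frac=0.1, decay=0.95, seed=1):
    """Minimal real-coded genetic algorithm: truncation selection, elitism,
    blend crossover, and Gaussian mutation with a shrinking step size."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    sd_scale = mutation_frac  # mutation SD as a fraction of each parameter range
    for _ in range(generations):
        pop.sort(key=cost)
        elite = pop[: max(2, pop_size // 4)]   # best quarter survives unchanged
        children = [list(ind) for ind in elite]
        while len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            w = rng.random()
            child = []
            for xa, xb, (lo, hi) in zip(a, b, bounds):
                v = w * xa + (1 - w) * xb                  # blend crossover
                v += rng.gauss(0.0, sd_scale * (hi - lo))  # Gaussian mutation
                child.append(min(max(v, lo), hi))          # clip to bounds
            children.append(child)
        pop = children
        sd_scale *= decay
    return min(pop, key=cost)


def toy_cost(params):
    """Hypothetical stand-in for a weighted sum of rise time, settling time and
    overshoot from a closed-loop simulation; its minimum is at (2.0, 0.5)."""
    p1, p2 = params
    return (p1 - 2.0) ** 2 + (p2 - 0.5) ** 2


best = genetic_minimize(toy_cost, bounds=[(0.0, 10.0), (0.0, 10.0)])
print(best)  # close to [2.0, 0.5]
```

Because the elite are copied unchanged into the next generation, the best cost never gets worse. The shrinking mutation step moves the search from broad exploration to local refinement.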
NASA Astrophysics Data System (ADS)
Hafner, Robert; Stewart, Jim
Past problem-solving research has provided a basis for helping students structure their knowledge and apply appropriate problem-solving strategies to solve problems for which their knowledge (or mental models) of scientific phenomena is adequate (model-using problem solving). This research examines how problem solving in the domain of Mendelian genetics proceeds in situations where solvers' mental models are insufficient to solve the problems at hand (model-revising problem solving). Such situations require solvers to use existing models to recognize anomalous data and to revise those models to accommodate the data. The study was conducted in the context of a 9-week high school genetics course and addressed: the heuristics characteristic of successful model-revising problem solving; the nature of the model revisions made by students, as well as the nature of model development across problem types; and the basis upon which solvers decide that a revised model is sufficient (that it has both predictive and explanatory power).
Context-sensitive network-based disease genetics prediction and its implications in drug discovery.
Chen, Yang; Xu, Rong
2017-04-01
Disease phenotype networks play an important role in computational approaches to identifying new disease-gene associations. Current disease phenotype networks often model disease relationships based on pairwise similarities and therefore ignore the specific context of how two diseases are connected. In this study, we propose a new strategy to model disease associations using context-sensitive networks (CSNs). We developed a CSN-based phenome-driven approach for disease genetics prediction and investigated the translational potential of the predicted genes in drug discovery. We constructed CSNs by directly connecting diseases with associated phenotypes. Here, we constructed two CSNs using different data sources; the two networks contain 26,790 and 13,822 nodes, respectively. We integrated the CSNs with a genetic functional relationship network and predicted disease genes using a network-based ranking algorithm. For comparison, we built similarity-based disease networks (SBNs) using the same disease phenotype data. In a de novo cross-validation for 3324 diseases, the CSN-based approach significantly improved the average rank from the top 12.6% to the top 8.8% for all tested genes compared with the SBN-based approach ( p
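The abstract does not name its network-based ranking algorithm. Random walk with restart (RWR) is one widely used method of this kind, and the sketch below uses it purely as an assumed stand-in. It ranks nodes by how often a walker that keeps jumping back to the seed disease visits them. The toy graph links a disease node to phenotype nodes and then to gene nodes, loosely mirroring the CSN idea. All node names, the restart probability, and the function name are hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Rank nodes by the steady-state probability that a random walker, which
    jumps back to the seed nodes with probability `restart` at each step,
    visits them. `adj` maps each node to its neighbours (undirected graph,
    every node assumed to have at least one neighbour)."""
    nodes = list(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adj[n])  # spread mass evenly
            for m in adj[n]:
                nxt[m] += share
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    ranking = sorted(nodes, key=lambda n: p[n], reverse=True)
    return ranking, p


# Hypothetical toy network: disease D -> phenotypes P1, P2 -> genes G1, G2, G3.
adj = {
    "D": ["P1", "P2"],
    "P1": ["D", "G1"],
    "P2": ["D", "G1", "G2"],
    "G1": ["P1", "P2"],
    "G2": ["P2", "G3"],
    "G3": ["G2"],
}
ranking, scores = random_walk_with_restart(adj, seeds={"D"})
print(ranking)  # ['D', 'P2', 'P1', 'G1', 'G2', 'G3']
```

G1 outranks G2 and G3 because it is reachable from the disease through two phenotypes. Context-aware networks aim to capture this kind of signal.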
Xu, Wumei; Liu, Lu; He, Tianhua; Cao, Min; Sha, Liqing; Hu, Yuehua; Li, Qiaoming; Li, Jie
2016-01-01
A negative species-genetic diversity correlation (SGDC) could be predicted by the niche variation hypothesis, whereby an increase in species diversity within a community reduces the genetic diversity of the co-occurring species because of the reduction in average niche breadth; alternatively, competition could reduce effective population size and therefore the genetic diversity of species within a community. We tested these predictions within a 20-ha tropical forest dynamics plot (FDP) in the Xishuangbanna tropical seasonal rainforest. We established 15 plots within the FDP and investigated the soil properties, tree diversity, and genetic diversity of a common tree species, Beilschmiedia roxburghiana, within each plot. We observed a significant negative correlation between tree diversity and the genetic diversity of B. roxburghiana within the communities. Using structural equation modeling, we further determined that inter-plot environmental characteristics (soil pH and phosphorus availability) directly affected tree diversity and that tree diversity within the community determined the genetic diversity of B. roxburghiana. Increased soil pH and phosphorus availability might promote the coexistence of more tree species within a community and reduce the genetic diversity of B. roxburghiana through reduced average niche breadth; alternatively, competition could reduce effective population size and therefore the genetic diversity of B. roxburghiana within the community. PMID:26860815
Tiezzi, F; de Los Campos, G; Parker Gaddis, K L; Maltecca, C
2017-03-01
Genotype by environment interaction (G × E) in dairy cattle productive traits has been shown to exist, but current genetic evaluation methods do not take this component into account. As several environmental descriptors (e.g., climate, farming system) are known to vary within the United States, not accounting for G × E could lead to reranking of bulls and loss in genetic gain. Using test-day records on milk yield, somatic cell score, fat, and protein percentage from all over the United States, we computed within herd-year-season daughter yield deviations for 1,087 Holstein bulls and regressed them on genetic and environmental information to estimate variance components and to assess prediction accuracy. Genomic information was obtained from a 50k SNP marker panel. Environmental effect inputs included herd (160 levels), geographical region (7 levels), geographical location (2 variables), climate information (7 variables), and management conditions of the herds (16 total variables divided into 4 subgroups). For each set of environmental descriptors, environmental, genomic, and G × E components were sequentially fitted. Variance component estimates confirmed the presence of G × E on milk yield, with its effect being larger than the main genetic effect and the environmental effect for some models. Conversely, G × E was moderate for somatic cell score and small for milk composition. Genotype by environment interaction, when included, partially eroded the genomic effect (as compared with the models where G × E was not included), suggesting that the genomic variance could at least in part be attributed to G × E not appropriately accounted for. Model predictive ability was assessed using 3 cross-validation schemes (new bulls, incomplete progeny test, and new environmental conditions), and performance was compared with a reference model including only the main genomic effect.
In each scenario, at least 1 of the models including G × E performed better than the reference model, although no single set of environmental descriptors gave the best-performing model across all scenarios. In general, the methodology used is promising for accounting for G × E in genomic predictions, but challenges remain in identifying a unique set of covariates capable of describing the entire variety of environments. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Migault, Vincent; Pallas, Benoît; Costes, Evelyne
2016-01-01
In crops, optimizing target traits in breeding programs can be fostered by selecting appropriate combinations of architectural traits, which determine light interception and carbon acquisition. In apple tree, architectural traits were observed to be under genetic control. However, architectural traits also result from many organogenetic and morphological processes interacting with the environment. The present study aimed at combining a functional-structural plant model (FSPM) built for apple tree, MAppleT, with genetic determinisms of architectural traits previously described in a bi-parental population. We focused on parameters related to organogenesis (phyllochron and immediate branching) and morphogenesis processes (internode length and leaf area) during the first year of tree growth. Two independent datasets collected in 2004 and 2007 on 116 genotypes, derived from a 'Starkrimson' × 'Granny Smith' cross, were used. The phyllochron was estimated as a function of thermal time, and sylleptic branching was then modeled as a function of the phyllochron. From a genetic map built with SNPs, marker effects were estimated on four MAppleT parameters with rrBLUP, using 2007 data. These effects were then considered in MAppleT to simulate tree development in the two climatic conditions. The genome-wide prediction model gave consistent estimations of parameter values, with correlation coefficients between observed values and values estimated from SNP markers ranging from 0.79 to 0.96. However, the accuracy of the prediction model under cross-validation schemes was lower. Three integrative traits (the number of leaves, trunk length, and number of sylleptic laterals) were considered for validating MAppleT simulations. Under 2007 climatic conditions, simulated values were close to observations, highlighting the correct simulation of genetic variability. However, under 2004 conditions, which were not used for model calibration, the simulations differed from observations.
This study demonstrates the possibility of integrating genome-based information in an FSPM for a perennial fruit tree. It also shows that further improvements are required to increase prediction ability. In particular, the temperature effect should be extended and other factors taken into account for modeling G×E interactions. Improvements could also be expected by considering larger populations and by testing other genome-wide prediction models. Despite these limitations, this study opens new possibilities for supporting plant breeding through in silico evaluation of the impact of genotypic polymorphisms on integrative plant phenotypes.
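The study estimated SNP marker effects on model parameters with rrBLUP. At its core, RR-BLUP is ridge regression of the phenotype on marker codes. The sketch below shows that core calculation under assumptions: marker columns are centered so the intercept is unpenalized, and the shrinkage parameter `lam` is fixed by hand. In RR-BLUP, `lam` corresponds to the ratio of residual variance to marker-effect variance and is normally estimated from the data. The function names and the toy genotypes are hypothetical. This is not the rrBLUP package's interface, and the small Gaussian-elimination solver just keeps the example dependency-free.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * m
    for i in range(m - 1, -1, -1):
        x[i] = (M[i][m] - sum(M[i][j] * x[j] for j in range(i + 1, m))) / M[i][i]
    return x


def ridge_marker_effects(Z, y, lam):
    """Ridge-regression marker effects with an unpenalized intercept.
    Z: rows = individuals, columns = marker codes (e.g. 0/1/2); y: phenotypes;
    lam: shrinkage parameter. Solves (Zc'Zc + lam*I) beta = Zc'(y - mean(y))."""
    n, m = len(Z), len(Z[0])
    mu = sum(y) / n
    zbar = [sum(row[j] for row in Z) / n for j in range(m)]
    Zc = [[row[j] - zbar[j] for j in range(m)] for row in Z]  # centred markers
    yc = [v - mu for v in y]
    A = [[sum(Zc[k][i] * Zc[k][j] for k in range(n)) + (lam if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    rhs = [sum(Zc[k][i] * yc[k] for k in range(n)) for i in range(m)]
    return mu, zbar, solve(A, rhs)


def predict(z, mu, zbar, beta):
    """Genomic prediction for one individual's marker codes z."""
    return mu + sum((zj - mj) * bj for zj, mj, bj in zip(z, zbar, beta))


# Hypothetical toy data: 6 individuals, 2 SNPs, true effects +0.5 and -0.3.
Z = [[0, 0], [1, 2], [2, 1], [2, 2], [0, 1], [1, 0]]
y = [1 + 0.5 * a - 0.3 * b for a, b in Z]
mu, zbar, beta = ridge_marker_effects(Z, y, lam=1e-9)
print([round(b, 3) for b in beta])  # [0.5, -0.3]
```

With a near-zero `lam` the estimates match ordinary least squares. Larger values shrink all marker effects toward zero, which stabilizes estimation when markers far outnumber individuals.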
Lee, Andrew J; Cunningham, Alex P; Tischkowitz, Marc; Simard, Jacques; Pharoah, Paul D; Easton, Douglas F; Antoniou, Antonis C
2016-12-01
The proliferation of gene panel testing creates the need for a breast cancer (BC) risk model that incorporates the effects of mutations in several genes and family history (FH). We extended the BOADICEA model to incorporate the effects of truncating variants in PALB2, CHEK2, and ATM. BC incidence was modeled via the explicit effects of truncating variants in BRCA1/2, PALB2, CHEK2, and ATM and other unobserved genetic effects using segregation analysis methods. The predicted average BC risk by age 80 is 28% for an ATM mutation carrier, 30% for CHEK2, 50% for PALB2, and 74% for BRCA1 and BRCA2. However, BC risks are predicted to increase with FH burden. In families with mutations, predicted risks for mutation-negative members depend on both FH and the specific mutation. The reduction in BC risk after negative predictive testing is greatest when a BRCA1 mutation is identified in the family, whereas for women whose relatives carry a CHEK2 or ATM mutation, the risks decrease only slightly. The model may be a valuable tool for counseling women who have undergone gene panel testing, providing consistent risk estimates and harmonizing their clinical management. A Web application can be used to obtain BC risks in clinical practice (http://ccge.medschl.cam.ac.uk/boadicea/). Genet Med 18(12), 1190-1198.
Dennis, N A; Stachowicz, K; Visser, B; Hely, F S; Berg, D K; Friggens, N C; Amer, P R; Meier, S; Burke, C R
2018-04-01
Fertility of the dairy cow relies on complex interactions between genetics, physiology, and management. Mathematical modeling can combine a range of information sources to facilitate informed predictions of cow fertility in scenarios that are difficult to evaluate empirically. We have developed a stochastic model that incorporates genetic and physiological data from more than 70 published reports on a wide range of fertility-related traits in dairy cattle. The model simulates pedigree, random mating, genetically correlated traits (in the form of breeding values for traits such as hours in estrus, estrous cycle length, age at puberty, milk yield, and so on), and interacting environmental variables. This model was used to generate a large simulated data set (200,000 cows replicated 100 times) of herd records within a seasonal dairy production system (based on an average New Zealand system). Using these simulated data, we investigated the genetic component of lifetime reproductive success (LRS), which, in reality, would be impractical to assess empirically. We defined LRS as the total number of times, during her lifetime, a cow calved within the first 42 d of the calving season. Sire estimated breeding values for LRS and other traits were calculated using simulated daughter records. Daughter pregnancy rate in the first lactation (PD_1) was the strongest single predictor of a sire's genetic merit for LRS (R² = 0.81). A simple predictive model containing PD_1, calving date for the second season and calving rate in the first season provided a good estimate of sire LRS (R² = 0.97). Daughters from sires with extremely high (n = 99,995 daughters, sire LRS = +0.70) or low (n = 99,635 daughters, sire LRS = -0.73) LRS estimated breeding values were compared over a single generation. Of the 14 underlying component traits of fertility, 12 were divergent between the 2 lines. This suggests that genetic variation in female fertility has a complex and multifactorial genetic basis.
When simulated phenotypes were compared, daughters of the high LRS sires (HiFERT) reached puberty 44.5 d younger and calved ∼14 d younger at each parity than daughters from low LRS sires (LoFERT). Despite having a much lower genetic potential for milk production (-400 L/lactation) than LoFERT cows, HiFERT cows produced 33% more milk over their lifetime due to additional lactations before culling. In summary, this simulation model suggests that LRS contributes substantially to cow productivity, and novel selection criteria would facilitate a more accurate prediction at a younger age. The Authors. Published by FASS Inc. and Elsevier Inc. on behalf of the American Dairy Science Association®. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
Vallejo, Roger L; Leeds, Timothy D; Gao, Guangtu; Parsons, James E; Martin, Kyle E; Evenhuis, Jason P; Fragomeni, Breno O; Wiens, Gregory D; Palti, Yniv
2017-02-01
We previously showed that bacterial cold water disease (BCWD) resistance in rainbow trout can be improved using traditional family-based selection, but progress has been limited to exploiting only between-family genetic variation. Genomic selection (GS) is a new alternative that enables exploitation of within-family genetic variation. We compared three GS models [single-step genomic best linear unbiased prediction (ssGBLUP), weighted ssGBLUP (wssGBLUP), and BayesB] to predict genomic-enabled breeding values (GEBV) for BCWD resistance in a commercial rainbow trout population, and compared the accuracy of GEBV to traditional estimates of breeding values (EBV) from a pedigree-based BLUP (P-BLUP) model. We also assessed the impact of sampling design on the accuracy of GEBV predictions. For these comparisons, we used BCWD survival phenotypes recorded on 7893 fish from 102 families, of which 1473 fish from 50 families had genotypes [57 K single nucleotide polymorphism (SNP) array]. Naïve siblings of the training fish (n = 930 testing fish) were genotyped to predict their GEBV and mated to produce 138 progeny testing families. In the following generation, 9968 progeny were phenotyped to empirically assess the accuracy of GEBV predictions made on their non-phenotyped parents. The accuracy of GEBV from all tested GS models was substantially higher than that of the P-BLUP model EBV. The highest increase in accuracy relative to the P-BLUP model was achieved with BayesB (97.2 to 108.8%), followed by wssGBLUP at iteration 2 (94.4 to 97.1%) and 3 (88.9 to 91.2%) and ssGBLUP (83.3 to 85.3%). Reducing the training sample size to n = ~1000 had no negative impact on the accuracy (0.67 to 0.72), but with n = ~500 the accuracy dropped to 0.53 to 0.61 if the training and testing fish were full-sibs, and substantially further, to 0.22 to 0.25, when they were not full-sibs.
Using progeny performance data, we showed that the accuracy of genomic predictions is substantially higher than that of estimates obtained from the traditional pedigree-based BLUP model for BCWD resistance. Overall, we found that, even with a much smaller training sample than in similar studies in livestock, GS can substantially improve the selection accuracy and genetic gains for this trait in a commercial rainbow trout breeding population.
Predicting evolutionary rescue via evolving plasticity in stochastic environments
Baskett, Marissa L.
2016-01-01
Phenotypic plasticity and its evolution may help evolutionary rescue in a novel and stressful environment, especially if environmental novelty reveals cryptic genetic variation that enables the evolution of increased plasticity. However, the environmental stochasticity ubiquitous in natural systems may alter these predictions, because high plasticity may amplify phenotype–environment mismatches. Although previous studies have highlighted this potential detrimental effect of plasticity in stochastic environments, they have not investigated how it affects extinction risk in the context of evolutionary rescue and with evolving plasticity. We investigate this question here by integrating stochastic demography with quantitative genetic theory in a model with simultaneous change in the mean and predictability (temporal autocorrelation) of the environment. We develop an approximate prediction of long-term persistence under the new pattern of environmental fluctuations, and compare it with numerical simulations for short- and long-term extinction risk. We find that reduced predictability increases extinction risk and reduces persistence because it increases stochastic load during rescue. This understanding of how stochastic demography, phenotypic plasticity, and evolution interact when evolution acts on cryptic genetic variation revealed in a novel environment can inform expectations for invasions, extinctions, or the emergence of chemical resistance in pests. PMID:27655762
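In the abstract, environmental predictability is the temporal autocorrelation of the environment, and the model includes a simultaneous shift in the environmental mean. One standard way to generate such an environment is a first-order autoregressive (AR(1)) process. The paper's exact formulation is not given here, so the sketch below is an assumption. It scales the innovations so that the stationary standard deviation stays at `sd` while `rho` alone sets predictability. The function names and parameter values are hypothetical.

```python
import math
import random


def ar1_environment(n, mean, sd, rho, seed=0):
    """Stationary AR(1) environment:
    theta_t = mean + rho*(theta_{t-1} - mean) + sqrt(1 - rho^2)*sd*eps_t,
    so the stationary SD is `sd` and the lag-1 autocorrelation is `rho`."""
    rng = random.Random(seed)
    theta = [mean + sd * rng.gauss(0.0, 1.0)]  # start in the stationary distribution
    innovation_sd = math.sqrt(1.0 - rho * rho) * sd
    for _ in range(n - 1):
        theta.append(mean + rho * (theta[-1] - mean)
                     + innovation_sd * rng.gauss(0.0, 1.0))
    return theta


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a series."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den


# Hypothetical novel environment: mean shifted to 2.0, high vs low predictability.
for rho in (0.8, 0.2):
    series = ar1_environment(200_000, mean=2.0, sd=1.0, rho=rho)
    print(rho, round(lag1_autocorrelation(series), 2))
```

With `sd` held fixed, lowering `rho` leaves the total environmental variance unchanged but makes each generation's conditions less predictable from the last. That is the setting in which high plasticity can amplify phenotype-environment mismatches.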
Wildenhain, Jan; Spitzer, Michaela; Dolma, Sonam; Jarvik, Nick; White, Rachel; Roy, Marcia; Griffiths, Emma; Bellows, David S.; Wright, Gerard D.; Tyers, Mike
2016-01-01
The network structure of biological systems suggests that effective therapeutic intervention may require combinations of agents that act synergistically. However, a dearth of systematic chemical combination datasets has limited the development of predictive algorithms for chemical synergism. Here, we report two large datasets of linked chemical-genetic and chemical-chemical interactions in the budding yeast Saccharomyces cerevisiae. We screened 5,518 unique compounds against 242 diverse yeast gene deletion strains to generate an extended chemical-genetic matrix (CGM) of 492,126 chemical-gene interaction measurements. This CGM dataset contained 1,434 genotype-specific inhibitors, termed cryptagens. We selected 128 structurally diverse cryptagens and tested all pairwise combinations to generate a benchmark dataset of 8,128 pairwise chemical-chemical interaction tests for synergy prediction, termed the cryptagen matrix (CM). An accompanying database resource called ChemGRID was developed to enable analysis, visualisation and downloads of all data. The CGM and CM datasets will facilitate the benchmarking of computational approaches for synergy prediction, as well as chemical structure-activity relationship models for anti-fungal drug discovery. PMID:27874849
Is pigment patterning in fish skin determined by the Turing mechanism?
Watanabe, Masakatsu; Kondo, Shigeru
2015-02-01
More than half a century ago, Alan Turing postulated that pigment patterns may arise from a mechanism that could be mathematically modeled based on the diffusion of two substances that interact with each other. Over the past 15 years, the molecular and genetic tools to verify this prediction have become available. Here, we review experimental studies aimed at identifying the mechanism underlying pigment pattern formation in zebrafish. Extensive molecular genetic studies in this model organism have revealed the interactions between the pigment cells that are responsible for the patterns. The mechanism discovered is substantially different from that predicted by the mathematical model, but it retains the property of 'local activation and long-range inhibition', a necessary condition for Turing pattern formation. Although some of the molecular details of pattern formation remain to be elucidated, current evidence confirms that the underlying mechanism is mathematically equivalent to the Turing mechanism. Copyright © 2014 Elsevier Ltd. All rights reserved.
McGrath, Lauren M.; Braaten, Ellen B.; Doty, Nathan D.; Willoughby, Brian L.; Wilson, H. Kent; O’Donnell, Ellen H.; Colvin, Mary K.; Ditmars, Hillary L.; Blais, Jessica E.; Hill, Erin N.; Metzger, Aaron; Perlis, Roy H.; Willcutt, Erik G.; Smoller, Jordan W.; Waldman, Irwin D.; Faraone, Stephen V.; Seidman, Larry J.; Doyle, Alysa E.
2016-01-01
Background: Evidence that different neuropsychiatric conditions share genetic liability has increased interest in phenotypes with ‘cross-disorder’ relevance, as they may contribute to revised models of psychopathology. Cognition is a promising construct for study; yet, evidence that the same cognitive functions are impaired across different forms of psychopathology comes primarily from separate studies of individual categorical diagnoses versus controls. Given growing support for dimensional models that cut across traditional diagnostic boundaries, we aimed to determine, within a single cohort, whether performance on measures of executive functions (EFs) predicted dimensions of different psychopathological conditions known to share genetic liability. Methods: Data are from 393 participants, ages 8 to 17, consecutively enrolled in the Longitudinal Study of Genetic Influences on Cognition (LOGIC). This project is conducting deep phenotyping and genomic analyses in youth referred for neuropsychiatric evaluation. Using structural equation modeling, we examined whether EFs predicted variation in core dimensions of autism spectrum disorder, bipolar illness and schizophrenia, including social responsiveness, mania/emotion regulation, and positive symptoms of psychosis, respectively. Results: We modeled three cognitive factors (working memory, shifting, and executive processing speed) that loaded on a second-order EF factor. The EF factor predicted variation in our three target traits but not in a negative control (somatization). Moreover, this EF factor was primarily associated with the overlapping (rather than unique) variance across the three outcome measures, suggesting it related to a general increase in psychopathology symptoms across those dimensions. Conclusions: Findings extend support for the relevance of cognition to neuropsychiatric conditions that share underlying genetic risk.
They suggest that higher-order cognition, including EFs, relates to the dimensional spectrum of each of these disorders and not just to the clinical diagnoses. Moreover, results have implications for bottom-up models linking genes, cognition, and a general psychopathology liability. PMID:26411927
Genetic drift and selection in many-allele range expansions.
Weinstein, Bryan T; Lavrentovich, Maxim O; Möbius, Wolfram; Murray, Andrew W; Nelson, David R
2017-12-01
We experimentally and numerically investigate the evolutionary dynamics of four competing strains of E. coli with differing expansion velocities in radially expanding colonies. We compare experimental measurements of the average fraction, correlation functions between strains, and the relative rates of genetic domain wall annihilations and coalescences to simulations modeling the population as a one-dimensional ring of annihilating and coalescing random walkers with deterministic biases due to selection. The simulations reveal that the evolutionary dynamics can be collapsed onto master curves governed by three essential parameters: (1) an expansion length beyond which selection dominates over genetic drift; (2) a characteristic angular correlation describing the size of genetic domains; and (3) a dimensionless constant quantifying the interplay between a colony's curvature at the frontier and its selection length scale. We measure these parameters with a new technique that precisely measures small selective differences between spatially competing strains and show that our simulations accurately predict the dynamics without additional fitting. Our results suggest that the random walk model can act as a useful predictive tool for describing the evolutionary dynamics of range expansions composed of an arbitrary number of genotypes with different fitnesses.
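The picture of domain walls as annihilating and coalescing random walkers can be illustrated with a small lattice sketch. This is not the authors' simulation. It assumes a flat ring of sites standing in for the colony front (no radial inflation), and the function names and the `growth_rates` acceptance rule are hypothetical. At each update a random neighbour tries to invade a site and succeeds with probability proportional to its relative speed. Walls therefore wander, annihilate when a domain disappears between two copies of the same strain, and coalesce when it disappears between two different strains, so their number never increases.

```python
import random


def count_domain_walls(ring):
    """Count adjacent pairs (periodic boundary) that carry different strains."""
    n = len(ring)
    return sum(ring[i] != ring[(i + 1) % n] for i in range(n))


def biased_voter_ring(n_sites, growth_rates, n_steps, seed=0):
    """Hypothetical multi-strain biased voter model on a ring of sites.

    growth_rates[k] is an assumed relative expansion velocity for strain k.
    Returns the final ring and the number of domain walls after every update.
    """
    rng = random.Random(seed)
    n_strains = len(growth_rates)
    ring = [rng.randrange(n_strains) for _ in range(n_sites)]
    fastest = max(growth_rates)
    walls = [count_domain_walls(ring)]
    for _ in range(n_steps):
        i = rng.randrange(n_sites)
        j = (i + rng.choice((-1, 1))) % n_sites
        invader = ring[j]
        # Selection bias: faster strains succeed in invading more often.
        if rng.random() < growth_rates[invader] / fastest:
            ring[i] = invader
        walls.append(count_domain_walls(ring))
    return ring, walls
```

For example, `biased_voter_ring(100, [1.0, 0.98, 0.96, 0.94], 5000)` models four strains with slightly different speeds. Setting all rates equal gives the neutral case, in which only genetic drift acts.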
Genetic and physiological bases for phenological responses to current and predicted climates
Wilczek, A. M.; Burghardt, L. T.; Cobb, A. R.; Cooper, M. D.; Welch, S. M.; Schmitt, J.
2010-01-01
We are now reaching the stage at which specific genetic factors with known physiological effects can be tied directly and quantitatively to variation in phenology. With such a mechanistic understanding, scientists can better predict phenological responses to novel seasonal climates. Using the widespread model species Arabidopsis thaliana, we explore how variation in different genetic pathways can be linked to phenology and life-history variation across geographical regions and seasons. We show that the expression of phenological traits including flowering depends critically on the growth season, and we outline an integrated life-history approach to phenology in which the timing of later life-history events can be contingent on the environmental cues regulating earlier life stages. As flowering time in many plants is determined by the integration of multiple environmentally sensitive gene pathways, the novel combinations of important seasonal cues in projected future climates will alter how phenology responds to variation in the flowering time gene network with important consequences for plant life history. We discuss how phenology models in other systems—both natural and agricultural—could employ a similar framework to explore the potential contribution of genetic variation to the physiological integration of cues determining phenology. PMID:20819808
Leve, Leslie D.; DeGarmo, David S.; Bridgett, David J.; Neiderhiser, Jenae M.; Shaw, Daniel S.; Harold, Gordon T.; Natsuaki, Misaki N.; Reiss, David
2012-01-01
Poor executive functioning has been implicated in children’s concurrent and future behavioral difficulties, making work aimed at understanding processes related to the development of early executive function (EF) critical for models of developmental psychopathology. Deficits in EF have been associated with adverse prenatal experiences, genetic influences, and temperament characteristics. However, our ability to disentangle the predictive and independent effects of these influences has been limited by a dearth of genetically-informed research designs that also consider prenatal influences. The present study examined EF and language development in a sample of 361 toddlers who were adopted at birth and reared in non-relative adoptive families. Predictors included genetic influences (as inherited from birth mothers), prenatal risk, and growth in child negative emotionality. Structural equation modeling indicated that the effect of prenatal risk on toddler effortful attention at age 27 months became nonsignificant once genetic influences were considered in the model. In addition, genetic influences had unique effects on toddler effortful attention. Latent growth modeling indicated that increases in toddler negative emotionality from 9 to 27 months were associated with poorer delay of gratification and poorer language development. Similar results were obtained in models incorporating birth father data. Mechanisms of intergenerational transmission of EF deficits are discussed. PMID:22799580
Leve, Leslie D; DeGarmo, David S; Bridgett, David J; Neiderhiser, Jenae M; Shaw, Daniel S; Harold, Gordon T; Natsuaki, Misaki N; Reiss, David
2013-06-01
Poor executive functioning has been implicated in children's concurrent and future behavioral difficulties, making work aimed at understanding processes related to the development of early executive function (EF) critical for models of developmental psychopathology. Deficits in EF have been associated with adverse prenatal experiences, genetic influences, and temperament characteristics. However, our ability to disentangle the predictive and independent effects of these influences has been limited by a dearth of genetically informed research designs that also consider prenatal influences. The present study examined EF and language development in a sample of 361 toddlers who were adopted at birth and reared in nonrelative adoptive families. Predictors included genetic influences (as inherited from birth mothers), prenatal risk, and growth in child negative emotionality. Structural equation modeling indicated that the effect of prenatal risk on toddler effortful attention at age 27 months became nonsignificant once genetic influences were considered in the model. In addition, genetic influences had unique effects on toddler effortful attention. Latent growth modeling indicated that increases in toddler negative emotionality from 9 to 27 months were associated with poorer delay of gratification and poorer language development. Similar results were obtained in models incorporating birth father data. Mechanisms of intergenerational transmission of EF deficits are discussed. PsycINFO Database Record (c) 2013 APA, all rights reserved
Identification of landscape features influencing gene flow: How useful are habitat selection models?
Roffler, Gretchen H.; Schwartz, Michael K.; Pilgrim, Kristy L.; Talbot, Sandra L.; Sage, Kevin; Adams, Layne G.; Luikart, Gordon
2016-01-01
Understanding how dispersal patterns are influenced by landscape heterogeneity is critical for modeling species connectivity. Resource selection function (RSF) models are increasingly used in landscape genetics approaches. However, because the ecological factors that drive habitat selection may be different from those influencing dispersal and gene flow, it is important to consider explicit assumptions and spatial scales of measurement. We calculated pairwise genetic distance among 301 Dall's sheep (Ovis dalli dalli) in southcentral Alaska using an intensive noninvasive sampling effort and 15 microsatellite loci. We used multiple regression of distance matrices to assess the correlation of pairwise genetic distance and landscape resistance derived from an RSF, and combinations of landscape features hypothesized to influence dispersal. Dall's sheep gene flow was positively correlated with steep slopes, moderate peak normalized difference vegetation indices (NDVI), and open land cover. Whereas RSF covariates were significant in predicting genetic distance, the RSF model itself was not significantly correlated with Dall's sheep gene flow, suggesting that certain habitat features important during summer (rugged terrain, mid-range elevation) did not influence effective dispersal. This work underscores that considering both habitat selection and landscape genetics models may be useful in developing management strategies that both support the immediate survival of a species and allow for long-term genetic connectivity.
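The study uses multiple regression of distance matrices (MRM). The sketch below substitutes a simpler single-predictor Mantel permutation test that shares MRM's key idea. Because pairwise distances are not independent, significance is assessed by permuting individuals, which means permuting the rows and columns of the response matrix together. Matrix and function names are illustrative, and this is not the authors' analysis code.

```python
import numpy as np


def upper_triangle(m):
    """Return the entries above the diagonal of a square distance matrix as a vector."""
    rows, cols = np.triu_indices(m.shape[0], k=1)
    return m[rows, cols]


def mantel_test(genetic, resistance, n_perm=999, seed=0):
    """One-sided Mantel test for a positive association between two distance matrices."""
    rng = np.random.default_rng(seed)
    x = upper_triangle(resistance)
    r_obs = np.corrcoef(x, upper_triangle(genetic))[0, 1]
    n = genetic.shape[0]
    exceed = 0
    for _ in range(n_perm):
        order = rng.permutation(n)
        # Relabel individuals: rows and columns are permuted together.
        permuted = genetic[np.ix_(order, order)]
        if np.corrcoef(x, upper_triangle(permuted))[0, 1] >= r_obs:
            exceed += 1
    # The observed labelling is counted as one of the permutations.
    return r_obs, (exceed + 1) / (n_perm + 1)
```

MRM extends this by regressing the unfolded genetic-distance vector on several resistance or landscape-feature vectors at once. It uses the same row-and-column permutation scheme for inference.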
The long-term evolution of multilocus traits under frequency-dependent disruptive selection.
van Doorn, G Sander; Dieckmann, Ulf
2006-11-01
Frequency-dependent disruptive selection is widely recognized as an important source of genetic variation. Its evolutionary consequences have been extensively studied using phenotypic evolutionary models, based on quantitative genetics, game theory, or adaptive dynamics. However, the genetic assumptions underlying these approaches are highly idealized and, even worse, predict different consequences of frequency-dependent disruptive selection. Population genetic models, by contrast, enable genotypic evolutionary models, but traditionally assume constant fitness values. Only a minority of these models thus addresses frequency-dependent selection, and only a few of these do so in a multilocus context. An inherent limitation of these remaining studies is that they only investigate the short-term maintenance of genetic variation. Consequently, the long-term evolution of multilocus characters under frequency-dependent disruptive selection remains poorly understood. We aim to bridge this gap between phenotypic and genotypic models by studying a multilocus version of Levene's soft-selection model. Individual-based simulations and deterministic approximations based on adaptive dynamics theory provide insights into the underlying evolutionary dynamics. Our analysis uncovers a general pattern of polymorphism formation and collapse, likely to apply to a wide variety of genetic systems: after convergence to a fitness minimum and the subsequent establishment of genetic polymorphism at multiple loci, genetic variation becomes increasingly concentrated on a few loci, until eventually only a single polymorphic locus remains. This evolutionary process combines features observed in quantitative genetics and adaptive dynamics models, and it can be explained as a consequence of changes in the selection regime that are inherent to frequency-dependent disruptive selection. 
Our findings demonstrate that the potential of frequency-dependent disruptive selection to maintain polygenic variation is considerably smaller than previously expected.
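Levene's soft-selection model, which the study extends to many loci, can be sketched for the textbook case of one diploid locus with two alleles under random mating. This single-locus version is only an illustration, not the authors' multilocus model. The inputs `niche_weights` (the fixed share c_i of offspring each niche contributes) and the per-niche genotype fitnesses are assumptions. The defining feature of soft selection is that each niche contributes a fixed fraction of the next generation, whatever its mean fitness.

```python
def levene_step(p, niche_weights, fitnesses):
    """Advance the frequency p of allele A by one generation in Levene's model.

    niche_weights[i] is the fixed share c_i of offspring produced by niche i
    (shares sum to 1); fitnesses[i] = (w_AA, w_Aa, w_aa) within niche i.
    """
    q = 1.0 - p
    p_next = 0.0
    for c, (w_AA, w_Aa, w_aa) in zip(niche_weights, fitnesses):
        mean_w = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
        # Frequency of A among adults surviving selection in this niche.
        p_niche = (p * p * w_AA + p * q * w_Aa) / mean_w
        # Soft selection: the niche's contribution is fixed at c, not scaled by mean_w.
        p_next += c * p_niche
    return p_next


def iterate_levene(p0, niche_weights, fitnesses, generations):
    """Apply levene_step repeatedly and return the final allele frequency."""
    p = p0
    for _ in range(generations):
        p = levene_step(p, niche_weights, fitnesses)
    return p
```

With two equally weighted niches that favour opposite homozygotes, for example `iterate_levene(0.1, [0.5, 0.5], [(1.5, 1.25, 1.0), (1.0, 1.25, 1.5)], 3000)`, the rare allele increases and the frequency settles at the symmetric polymorphism p = 0.5. This illustrates how frequency-dependent disruptive selection can maintain variation at a single locus.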
Young, Emma F; Belchier, Mark; Hauser, Lorenz; Horsburgh, Gavin J; Meredith, Michael P; Murphy, Eugene J; Pascoal, Sonia; Rock, Jennifer; Tysklind, Niklas; Carvalho, Gary R
2015-06-01
Understanding the key drivers of population connectivity in the marine environment is essential for the effective management of natural resources. Although several different approaches to evaluating connectivity have been used, they are rarely integrated quantitatively. Here, we use a 'seascape genetics' approach, by combining oceanographic modelling and microsatellite analyses, to understand the dominant influences on the population genetic structure of two Antarctic fishes with contrasting life histories, Champsocephalus gunnari and Notothenia rossii. The close accord between the model projections and empirical genetic structure demonstrated that passive dispersal during the planktonic early life stages is the dominant influence on patterns and extent of genetic structuring in both species. The shorter planktonic phase of C. gunnari restricts direct transport of larvae between distant populations, leading to stronger regional differentiation. By contrast, geographic distance did not affect differentiation in N. rossii, whose longer larval period promotes long-distance dispersal. Interannual variability in oceanographic flows strongly influenced the projected genetic structure, suggesting that shifts in circulation patterns due to climate change are likely to impact future genetic connectivity and opportunities for local adaptation, resilience and recovery from perturbations. Further development of realistic climate models is required to fully assess such potential impacts.
NASA Astrophysics Data System (ADS)
Kashid, Satishkumar S.; Maity, Rajib
2012-08-01
Summary Prediction of Indian Summer Monsoon Rainfall (ISMR) is of vital importance for the Indian economy, and it has remained a great challenge for hydro-meteorologists due to the inherent complexity of climatic systems. The large-scale atmospheric circulation patterns from the tropical Pacific Ocean (ENSO) and from the tropical Indian Ocean (EQUINOO) are established influences on Indian Summer Monsoon Rainfall. Information on these two large-scale atmospheric circulation patterns, expressed as indices, is used to model the complex relationship between Indian Summer Monsoon Rainfall and the ENSO and EQUINOO indices. However, extracting the signal from such large-scale indices to model such complex systems is very difficult. Rainfall predictions have been made for 'All India' as one unit, as well as for the five 'homogeneous monsoon regions of India' defined by the Indian Institute of Tropical Meteorology. Genetic Programming (GP), a recent 'Artificial Intelligence' tool, has been employed to model this problem. The GP approach is found to capture the complex relationship between monthly Indian Summer Monsoon Rainfall and the large-scale atmospheric circulation pattern indices ENSO and EQUINOO. The findings of this study indicate that GP-derived monthly rainfall forecasting models that use large-scale atmospheric circulation information are successful in predicting All India Summer Monsoon Rainfall, with a correlation coefficient as high as 0.866, which may appear attractive for such a complex system. A separate analysis is carried out for All India Summer Monsoon Rainfall, with India as one unit, and for the five homogeneous monsoon regions, based only on the ENSO and EQUINOO indices for March, April and May, with the prediction made at the end of May. In this case, All India Summer Monsoon Rainfall could be predicted with a correlation coefficient of 0.70, with somewhat lower correlation coefficients for the different 'homogeneous monsoon regions'.
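Genetic programming evolves a population of candidate formulas (expression trees) through selection, crossover and mutation. The sketch below is a generic, minimal symbolic-regression GP, not the configuration used in the study. The operator set, truncation selection, elitism, size limit, population size and the use of mean squared error as fitness are all assumptions. A Pearson correlation helper is included because the study reports skill as a correlation coefficient. The rows of `X` would stand for predictor values such as monthly ENSO and EQUINOO indices.

```python
import math
import operator
import random

OPS = [operator.add, operator.sub, operator.mul]  # binary function set
MAX_NODES = 50  # assumed limit to keep trees from bloating


def random_tree(n_vars, depth, rng):
    """Grow a random tree: ("x", i), ("c", value) or ("op", k, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_vars))
        return ("c", rng.uniform(-1.0, 1.0))
    return ("op", rng.randrange(len(OPS)),
            random_tree(n_vars, depth - 1, rng), random_tree(n_vars, depth - 1, rng))


def evaluate(tree, row):
    if tree[0] == "x":
        return row[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return OPS[tree[1]](evaluate(tree[2], row), evaluate(tree[3], row))


def size(tree):
    return 1 if tree[0] != "op" else 1 + size(tree[2]) + size(tree[3])


def mse(tree, X, y):
    """Mean squared error; non-finite values map to +inf so such trees rank last."""
    total = 0.0
    for row, target in zip(X, y):
        d = evaluate(tree, row) - target
        total += d * d
    err = total / len(y)
    return err if math.isfinite(err) else math.inf


def subtrees(tree):
    yield tree
    if tree[0] == "op":
        yield from subtrees(tree[2])
        yield from subtrees(tree[3])


def graft(tree, donor, rng):
    """Replace one randomly chosen subtree of `tree` with `donor`."""
    if tree[0] != "op" or rng.random() < 0.3:
        return donor
    _, k, left, right = tree
    if rng.random() < 0.5:
        return ("op", k, graft(left, donor, rng), right)
    return ("op", k, left, graft(right, donor, rng))


def evolve(X, y, pop_size=100, generations=20, seed=0):
    rng = random.Random(seed)
    n_vars = len(X[0])
    pop = [random_tree(n_vars, 3, rng) for _ in range(pop_size)]
    history = []  # best training MSE in each generation
    for _ in range(generations):
        ranked = sorted(((mse(t, X, y), t) for t in pop), key=lambda s: s[0])
        history.append(ranked[0][0])
        parents = [t for _, t in ranked[: pop_size // 4]]  # truncation selection
        pop = [ranked[0][1]]  # elitism: the best tree always survives
        while len(pop) < pop_size:
            a = rng.choice(parents)
            if rng.random() < 0.7:  # crossover with a subtree of a second parent
                child = graft(a, rng.choice(list(subtrees(rng.choice(parents)))), rng)
            else:  # mutation with a fresh random subtree
                child = graft(a, random_tree(n_vars, 2, rng), rng)
            pop.append(child if size(child) <= MAX_NODES else a)
    return min(pop, key=lambda t: mse(t, X, y)), history


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var = sum((u - ma) * (u - ma) for u in a) * sum((v - mb) * (v - mb) for v in b)
    return cov / math.sqrt(var) if var > 0 else 0.0
```

Elitism makes the best training error non-increasing across generations. In a forecasting setting, skill would be judged with `pearson` on years held out from training.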
Cheng, Jun-Hu; Sun, Da-Wen; Pu, Hongbin
2016-04-15
The potential use of feature wavelengths for predicting drip loss in grass carp fish, as affected by being frozen at -20°C for 24 h and thawed at 4°C for 1, 2, 4, and 6 days, was investigated. Hyperspectral images of frozen-thawed fish were obtained and their corresponding spectra were extracted. Least-squares support vector machine and multiple linear regression (MLR) models were established using five key wavelengths, selected by combining a genetic algorithm and the successive projections algorithm, and this approach showed satisfactory performance in drip loss prediction. The MLR model, with a coefficient of determination for prediction (R²P) of 0.9258 and a lower root mean square error of prediction (RMSEP) of 1.12%, was applied to each pixel of the image to generate distribution maps of exudation changes. The results confirmed that it is feasible to identify feature wavelengths using variable selection methods and chemometric analysis for developing on-line multispectral imaging. Copyright © 2015 Elsevier Ltd. All rights reserved.
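Fitting an MLR model on a few selected wavelengths, scoring it with R²P and RMSEP, and applying it pixel by pixel to produce a distribution map can be sketched with NumPy. The function names, the image layout (height × width × wavelengths) and the definition R²P = 1 − SS_res/SS_tot on the prediction set are assumptions for illustration. The wavelength-selection step (genetic algorithm plus successive projections algorithm) is not reproduced.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares MLR with an intercept; X has shape (samples, wavelengths)."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Intercept plus weighted sum of the selected-wavelength reflectances."""
    return coef[0] + X @ coef[1:]


def prediction_metrics(y_true, y_pred):
    """Return (R2P, RMSEP) computed on an independent prediction set."""
    resid = y_true - y_pred
    rmsep = np.sqrt(np.mean(resid ** 2))
    r2p = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return r2p, rmsep


def drip_loss_map(coef, cube):
    """Apply the MLR model to every pixel of a (height, width, n_wavelengths) cube."""
    h, w, bands = cube.shape
    return predict_mlr(coef, cube.reshape(h * w, bands)).reshape(h, w)
```

Because MLR is a single dot product per pixel, it is cheap to apply across whole images. This is one practical reason to favour it for generating maps over a kernel-based model.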
A synthetic genetic edge detection program.
Tabor, Jeffrey J; Salis, Howard M; Simpson, Zachary Booth; Chevalier, Aaron A; Levskaya, Anselm; Marcotte, Edward M; Voigt, Christopher A; Ellington, Andrew D
2009-06-26
Edge detection is a signal processing algorithm common in artificial intelligence and image recognition programs. We have constructed a genetically encoded edge detection algorithm that programs an isogenic community of E. coli to sense an image of light, communicate to identify the light-dark edges, and visually present the result of the computation. The algorithm is implemented using multiple genetic circuits. An engineered light sensor enables cells to distinguish between light and dark regions. In the dark, cells produce a diffusible chemical signal that diffuses into light regions. Genetic logic gates are used so that only cells that sense light and the diffusible signal produce a positive output. A mathematical model constructed from first principles and parameterized with experimental measurements of the component circuits predicts the performance of the complete program. Quantitatively accurate models will facilitate the engineering of more complex biological behaviors and inform bottom-up studies of natural genetic regulatory networks.
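The circuit logic described above can be mimicked with a toy reaction-diffusion grid. Dark cells emit a diffusible signal, and light-sensing cells produce output only when they also detect that signal. This is an illustrative sketch, not the authors' parameterized model. The diffusion, production and decay constants, the explicit finite-difference scheme and the output threshold are all assumed values, chosen so that the update is numerically stable.

```python
import numpy as np


def signal_field(dark, steps=300, diffusion=0.1, production=1.0, decay=0.1):
    """Explicit finite-difference simulation of a signal made in dark cells that diffuses and decays.

    Edge padding gives approximately no-flux borders. The defaults satisfy
    1 - decay - 8 * diffusion >= 0, which keeps the explicit update stable.
    """
    f = np.zeros(dark.shape)
    for _ in range(steps):
        p = np.pad(f, 1, mode="edge")
        laplacian = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * f
        f = f + diffusion * laplacian + production * dark - decay * f
    return f


def edge_detect(light, threshold=0.5, **kwargs):
    """AND gate: output is on where a cell senses light AND enough diffused signal."""
    signal = signal_field(~light, **kwargs)
    return light & (signal > threshold)
```

For a boolean image whose left half is dark, `edge_detect` switches on only a narrow band of light cells next to the light-dark boundary. The width of that band is set by how far the signal spreads before it decays.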
Defining the consequences of genetic variation on a proteome–wide scale
Chick, Joel M.; Munger, Steven C.; Simecek, Petr; Huttlin, Edward L.; Choi, Kwangbom; Gatti, Daniel M.; Raghupathy, Narayanan; Svenson, Karen L.; Churchill, Gary A.; Gygi, Steven P.
2016-01-01
Genetic variation modulates protein expression through both transcriptional and post-transcriptional mechanisms. To characterize the consequences of natural genetic diversity on the proteome, here we combine a multiplexed, mass spectrometry-based method for protein quantification with an emerging outbred mouse model containing extensive genetic variation from eight inbred founder strains. By measuring genome-wide transcript and protein expression in livers from 192 Diversity outbred mice, we identify 2,866 protein quantitative trait loci (pQTL) with twice as many local as distant genetic variants. These data support distinct transcriptional and post-transcriptional models underlying the observed pQTL effects. Using a sensitive approach to mediation analysis, we often identified a second protein or transcript as the causal mediator of distant pQTL. Our analysis reveals an extensive network of direct protein–protein interactions. Finally, we show that local genotype can provide accurate predictions of protein abundance in an independent cohort of collaborative cross mice. PMID:27309819
A Synthetic Genetic Edge Detection Program
Tabor, Jeffrey J.; Salis, Howard; Simpson, Zachary B.; Chevalier, Aaron A.; Levskaya, Anselm; Marcotte, Edward M.; Voigt, Christopher A.; Ellington, Andrew D.
2009-01-01
Summary Edge detection is a signal processing algorithm common in artificial intelligence and image recognition programs. We have constructed a genetically encoded edge detection algorithm that programs an isogenic community of E. coli to sense an image of light, communicate to identify the light-dark edges, and visually present the result of the computation. The algorithm is implemented using multiple genetic circuits. An engineered light sensor enables cells to distinguish between light and dark regions. In the dark, cells produce a diffusible chemical signal that diffuses into light regions. Genetic logic gates are used so that only cells that sense light and the diffusible signal produce a positive output. A mathematical model constructed from first principles and parameterized with experimental measurements of the component circuits predicts the performance of the complete program. Quantitatively accurate models will facilitate the engineering of more complex biological behaviors and inform bottom-up studies of natural genetic regulatory networks. PMID:19563759
Quantifying and predicting Drosophila larvae crawling phenotypes
NASA Astrophysics Data System (ADS)
Günther, Maximilian N.; Nettesheim, Guilherme; Shubeita, George T.
2016-06-01
The fruit fly Drosophila melanogaster is a widely used model for cell biology, development, disease, and neuroscience. The fly’s power as a genetic model for disease and neuroscience can be augmented by a quantitative description of its behavior. Here we show that we can accurately account for the complex and unique crawling patterns exhibited by individual Drosophila larvae using a small set of four parameters obtained from the trajectories of a few crawling larvae. The values of these parameters change for larvae from different genetic mutants, as we demonstrate for fly models of Alzheimer’s disease and the Fragile X syndrome, allowing applications such as genetic or drug screens. With the quantitative model of larval crawling developed here, we use the mutant-specific parameters to robustly simulate larval crawling, which makes it possible to estimate the feasibility of laborious experimental assays and aids in their design.
A Computational Workflow for the Automated Generation of Models of Genetic Designs.
Misirli, Göksel; Nguyen, Tramy; McLaughlin, James Alastair; Vaidyanathan, Prashant; Jones, Timothy S; Densmore, Douglas; Myers, Chris; Wipat, Anil
2018-06-05
Computational models are essential to engineer predictable biological systems and to scale up this process for complex systems. Computational modeling often requires expert knowledge and data to build models. Clearly, manual creation of models is not scalable for large designs. Despite several automated model construction approaches, computational methodologies to bridge knowledge in design repositories and the process of creating computational models have still not been established. This paper describes a workflow for automatic generation of computational models of genetic circuits from data stored in design repositories using existing standards. This workflow leverages the software tool SBOLDesigner to build structural models that are then enriched by the Virtual Parts Repository API using Systems Biology Open Language (SBOL) data fetched from the SynBioHub design repository. The iBioSim software tool is then utilized to convert this SBOL description into a computational model encoded using the Systems Biology Markup Language (SBML). Finally, this SBML model can be simulated using a variety of methods. This workflow provides synthetic biologists with easy-to-use tools to create predictable biological systems, hiding away the complexity of building computational models. This approach can further be incorporated into other computational workflows for design automation.
A quantitative test of population genetics using spatiogenetic patterns in bacterial colonies.
Korolev, Kirill S; Xavier, João B; Nelson, David R; Foster, Kevin R
2011-10-01
It is widely accepted that population-genetics theory is the cornerstone of evolutionary analyses. Empirical tests of the theory, however, are challenging because of the complex relationships between space, dispersal, and evolution. Critically, we lack quantitative validation of the spatial models of population genetics. Here we combine analytics, on- and off-lattice simulations, and experiments with bacteria to perform quantitative tests of the theory. We study two bacterial species, the gut microbe Escherichia coli and the opportunistic pathogen Pseudomonas aeruginosa, and show that spatiogenetic patterns in colony biofilms of both species are accurately described by an extension of the one-dimensional stepping-stone model. We use one empirical measure, genetic diversity at the colony periphery, to parameterize our models and show that we can then accurately predict another key variable: the degree of short-range cell migration along an edge. Moreover, the model allows us to estimate other key parameters, including effective population size (density) at the expansion frontier. While our experimental system is a simplification of natural microbial communities, we argue that it constitutes proof of principle that the spatial models of population genetics can quantitatively capture organismal evolution.
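One way to quantify genetic diversity at the colony periphery is spatial heterozygosity: the probability that two cells a given distance apart along the front carry different alleles. The sketch below computes it for a front stored as a circular list of allele labels. Representing the front as a 1-D list and measuring separation in lattice sites rather than angle are simplifying assumptions, and this is not the authors' image-analysis pipeline.

```python
def spatial_heterozygosity(front, max_lag):
    """H(lag): fraction of site pairs `lag` apart on a circular front with different alleles."""
    n = len(front)
    return [
        sum(front[i] != front[(i + lag) % n] for i in range(n)) / n
        for lag in range(1, max_lag + 1)
    ]


def global_heterozygosity(front):
    """Probability that two sites drawn at random (with replacement) carry different alleles."""
    n = len(front)
    counts = {}
    for allele in front:
        counts[allele] = counts.get(allele, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())
```

Computing these curves at successive radii of a growing colony gives the kind of diversity summary that a stepping-stone model can be parameterized against.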
A Geographically Explicit Genetic Model of Worldwide Human-Settlement History
Liu, Hua; Prugnolle, Franck; Manica, Andrea; Balloux, François
2006-01-01
Currently available genetic and archaeological evidence is generally interpreted as supportive of a recent single origin of modern humans in East Africa. However, this is where the near consensus on human settlement history ends, and considerable uncertainty clouds any more detailed aspect of human colonization history. Here, we present a dynamic genetic model of human settlement history coupled with explicit geographical distances from East Africa, the likely origin of modern humans. We search for the best-supported parameter space by fitting our analytical prediction to genetic data that are based on 52 human populations analyzed at 783 autosomal microsatellite markers. This framework allows us to jointly estimate the key parameters of the expansion of modern humans. Our best estimates suggest an initial expansion of modern humans ∼56,000 years ago from a small founding population of ∼1,000 effective individuals. Our model further points to high growth rates in newly colonized habitats. The general fit of the model with the data is excellent. This suggests that coupling analytical genetic models with explicit demography and geography provides a powerful tool for making inferences on human-settlement history. PMID:16826514
Simulation, prediction, and genetic analyses of daily methane emissions in dairy cattle.
Yin, T; Pinent, T; Brügemann, K; Simianer, H; König, S
2015-08-01
This study presents an approach combining phenotypes from novel traits, deterministic equations from cattle nutrition, and stochastic simulation techniques from animal breeding to generate test-day methane emissions (MEm) of dairy cows. Data included test-day production traits (milk yield, fat percentage, protein percentage, milk urea nitrogen), conformation traits (wither height, hip width, body condition score), female fertility traits (days open, calving interval, stillbirth), and health traits (clinical mastitis) from 961 first lactation Brown Swiss cows kept on 41 low-input farms in Switzerland. Test-day MEm were predicted based on the traits from the current data set and 2 deterministic prediction equations, resulting in the traits labeled MEm1 and MEm2. Stochastic simulations were used to assign individual concentrate intake depending on farm-type specifications (required when calculating MEm2). Genetic parameters for MEm1 and MEm2 were estimated using random regression models. Predicted MEm had moderate heritabilities over lactation, ranging from 0.15 to 0.37, with the highest heritabilities around DIM 100. Genetic correlations between MEm1 and MEm2 ranged between 0.91 and 0.94. Antagonistic genetic correlations in the range from 0.70 to 0.92 were found for the associations between MEm2 and milk yield. Genetic correlations of MEm with days open and with calving interval increased from 0.10 at the beginning to 0.90 at the end of lactation. Genetic relationships between MEm2 and stillbirth were negative (0 to -0.24) from the beginning to the peak phase of lactation. Positive genetic relationships in the range from 0.02 to 0.49 were found between MEm2 and clinical mastitis. Interpretation of genetic (co)variance components should also consider the limitations of using data generated by prediction equations. Prediction functions only describe the part of MEm that depends on the factors and effects included in the function. 
With high probability, there are more important effects contributing to variation in MEm that are not explained by, or are independent of, these functions. Furthermore, autocorrelations exist between indicator traits and predicted MEm. Nevertheless, this integrative approach, combining information from dairy cattle nutrition with dairy cattle genetics, generated novel traits that are difficult to record on a large scale. The simulated data basis for MEm was used to determine the size of a cow calibration group for genomic selection. A calibration group including 2,581 cows with MEm phenotypes was competitive with conventional breeding strategies. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Greenberg, Marisa; Smith, Rachel A.
2016-01-01
Genetic test results reveal not only personal information about a person’s likelihood of certain medical conditions but also information about their genetic relatives (Annas, Glantz, & Roche, 1995). Given the familial nature of genetic information, one’s obligation to protect family members may be a motive for disclosing genetic test results, but this claim has not been methodically tested. Existing models of disclosure decision-making presume self-interested motives, such as seeking social support, instead of other-interested motives, like familial obligation. This study investigated young adults’ (N = 173) motives to share a genetic-based health condition, alpha-1 antitrypsin deficiency, after reading a hypothetical vignette. Results show that social support and familial obligation were both reported as motives for disclosure. In fact, some participants reported familial obligation as their primary motivator for disclosure. Finally, stronger familial obligation predicted increased likelihood of disclosing hypothetical genetic test results. Implications of these results were discussed in reference to theories of disclosure decision-making models and the practice of genetic disclosures. PMID:26507777
Characterizing the genetic structure of a forensic DNA database using a latent variable approach.
Kruijver, Maarten
2016-07-01
Several problems in forensic genetics require a representative model of a forensic DNA database. Obtaining an accurate representation of the offender database can be difficult, since databases typically contain groups of persons with unregistered ethnic origins in unknown proportions. We propose to estimate the allele frequencies of the subpopulations comprising the offender database and their proportions from the database itself using a latent variable approach. We present a model for which parameters can be estimated using the expectation maximization (EM) algorithm. This approach does not rely on relatively small and possibly unrepresentative population surveys, but is driven by the actual genetic composition of the database only. We fit the model to a snapshot of the Dutch offender database (2014), which contains close to 180,000 profiles, and find that three subpopulations suffice to describe a large fraction of the heterogeneity in the database. We demonstrate the utility and reliability of the approach with three applications. First, we use the model to predict the number of false leads obtained in database searches. We assess how well the model predicts the number of false leads obtained in mock searches in the Dutch offender database, both for the case of familial searching for first degree relatives of a donor and searching for contributors to three-person mixtures. Second, we study the degree of partial matching between all pairs of profiles in the Dutch database and compare this to what is predicted using the latent variable approach. Third, we use the model to provide evidence that the Dutch practice of estimating match probabilities using the Balding-Nichols formula with a native Dutch reference database and θ=0.03 is conservative. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
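The Balding-Nichols correction mentioned above adjusts a genotype's match probability for shared ancestry through the coancestry coefficient θ. The sketch uses the widely cited single-locus Balding-Nichols formulas (also given in the 1996 NRC II report) and multiplies the results across loci. The function names and the tuple encoding of genotypes are hypothetical, and this is not the paper's code. Setting θ = 0 recovers the Hardy-Weinberg values p² and 2pq. A positive θ raises the match probability, which is the sense in which a larger θ is the more conservative choice.

```python
def balding_nichols(p_a, p_b=None, theta=0.03):
    """Single-locus match probability with coancestry coefficient theta.

    Homozygote a/a when p_b is None, heterozygote a/b otherwise.
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if p_b is None:
        return (2 * theta + (1 - theta) * p_a) * (3 * theta + (1 - theta) * p_a) / denom
    return 2 * (theta + (1 - theta) * p_a) * (theta + (1 - theta) * p_b) / denom


def profile_match_probability(loci, theta=0.03):
    """Multiply per-locus values; each locus is (p_a,) for a homozygote or (p_a, p_b)."""
    prob = 1.0
    for freqs in loci:
        prob *= balding_nichols(*freqs, theta=theta)
    return prob
```

For example, `profile_match_probability([(0.1,), (0.1, 0.2)])` combines one homozygous and one heterozygous locus with θ = 0.03.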
Qi, Haishan; Lv, Mengmeng; Song, Kejing; Wen, Jianping
2017-05-01
Herein, a hyper-producing strain for ascomycin was engineered based on ¹³C-labeling experiments and elementary flux modes analysis (EFMA). First, the metabolism of the non-model organism Streptomyces hygroscopicus var. ascomyceticus SA68 was investigated, and an updated network model was reconstructed using ¹³C-metabolic flux analysis. Based on this updated model, EFMA was then employed to predict genetic targets for higher ascomycin production. Chorismatase (FkbO) and pyruvate carboxylase (Pyc) were predicted as promising overexpression and deletion targets, respectively. The corresponding mutants TD-FkbO and TD-ΔPyc showed effects consistent with the model predictions. Finally, the genetic manipulations were combined, yielding a high-yield ascomycin engineering strain, TD-ΔPyc-FkbO, with production up to 610 mg/L, an 84.8% improvement over the parent strain SA68. These results demonstrate that integrating ¹³C-labeling experiments with in silico pathway analysis could serve as a promising strategy to enhance the production of ascomycin, as well as other valuable products. Biotechnol. Bioeng. 2017;114: 1036-1044. © 2016 Wiley Periodicals, Inc.
Stochastic many-body problems in ecology, evolution, neuroscience, and systems biology
NASA Astrophysics Data System (ADS)
Butler, Thomas C.
Using the tools of many-body theory, I analyze problems in four different areas of biology dominated by strong fluctuations: the evolutionary history of the genetic code, spatiotemporal pattern formation in ecology, spatiotemporal pattern formation in neuroscience and the robustness of a model circadian rhythm circuit in systems biology. In the first two research chapters, I demonstrate that the genetic code is extremely optimal (in the sense that it manages the effects of point mutations or mistranslations efficiently), more than an order of magnitude beyond what was previously thought. I further show that the structure of the genetic code implies that early proteins were probably only loosely defined. Both the nature of early proteins and the extreme optimality of the genetic code are interpreted in light of recent theory [1] as evidence that the evolution of the genetic code was driven by evolutionary dynamics dominated by horizontal gene transfer. I then explore the optimality of a proposed precursor to the genetic code. The results show that the precursor code has only limited optimality, which is interpreted as evidence that the precursor emerged prior to translation, or else never existed. In the next part of the dissertation, I introduce a many-body formalism for reaction-diffusion systems described at the mesoscopic scale with master equations. I first apply this formalism to spatially extended predator-prey ecosystems, resulting in the prediction that many-body correlations and fluctuations drive population cycles in time, called quasi-cycles. Most of these results were previously known but had been derived using the system size expansion [2, 3]. I next apply the analytical techniques developed in the study of quasi-cycles to a simple model of Turing patterns in a predator-prey ecosystem. This analysis shows that fluctuations drive a new kind of spatiotemporal pattern formation that I name "quasi-patterns." 
These quasi-patterns exist over a much larger range of physically accessible parameters than the patterns predicted in mean field theory and therefore account for apparent ecological observations of patterns in regimes where Turing patterns do not occur. I further show that quasi-patterns have statistical properties that allow them to be distinguished empirically from mean field Turing patterns. I next analyze a model of the visual cortex that has striking similarities to the activator-inhibitor model of ecosystem quasi-pattern formation. Through analysis of the resulting phase diagram, I show that the architecture of the neural network in the visual cortex is configured to make the visual cortex robust to unwanted internally generated spatial structure that interferes with normal visual function. I also predict that some geometric visual hallucinations are quasi-patterns and that the visual cortex supports a new phase of spatially scale-invariant behavior present far from criticality. In the final chapter, I explore the effects of fluctuations on cycles in systems biology, specifically the pervasive phenomenon of circadian rhythms. By exploring the behavior of a generic stochastic model of circadian rhythms, I show that the circadian rhythm circuit exploits leaky mRNA production to safeguard the cycle from failure. I also show that this safeguard mechanism is highly robust to changes in the rate of leaky mRNA production. Finally, I explore the failure of the deterministic model in two different contexts: one in which the deterministic model predicts cycles where they do not exist, and another in which it does not predict cycles.
Prospects for Genomic Selection in Cassava Breeding.
Wolfe, Marnin D; Del Carpio, Dunia Pino; Alabi, Olumide; Ezenwaka, Lydia C; Ikeogu, Ugochukwu N; Kayondo, Ismail S; Lozano, Roberto; Okeke, Uche G; Ozimati, Alfred A; Williams, Esuma; Egesi, Chiedozie; Kawuki, Robert S; Kulakow, Peter; Rabbi, Ismail Y; Jannink, Jean-Luc
2017-11-01
Cassava (Manihot esculenta Crantz) is a clonally propagated staple food crop in the tropics. Genomic selection (GS) has been implemented at three breeding institutions in Africa to reduce cycle times. Initial studies provided promising estimates of predictive abilities. Here, we expand on previous analyses by assessing the accuracy of seven prediction models for seven traits in three prediction scenarios: cross-validation within populations, cross-population prediction and cross-generation prediction. We also evaluated the impact of increasing the training population (TP) size by phenotyping progenies selected either at random or with a genetic algorithm. Cross-validation results were mostly consistent across programs, with nonadditive models predicting about 10% better on average. Cross-population accuracy was generally low (mean = 0.18), but prediction accuracy for cassava mosaic disease increased by up to 57% in one Nigerian population when data from another related population were combined. Accuracy across generations was poorer than within-generation accuracy, as expected, but accuracy for dry matter content and mosaic disease severity should be sufficient for rapid-cycling GS. The choice of prediction model made some difference across generations, but increasing TP size was more important. With a genetic algorithm, phenotyping a selected one-third of the progeny could achieve an accuracy equivalent to phenotyping all progeny. We are in the early stages of GS for this crop, but the results are promising for some traits. The emerging guidelines are that TPs need to continue to grow, but phenotyping can be done on a cleverly selected subset of individuals, reducing the overall phenotyping burden. Copyright © 2017 Crop Science Society of America.
Prediction of first episode of panic attack among white-collar workers.
Watanabe, Akira; Nakao, Kazuhisa; Tokuyama, Madoka; Takeda, Masatoshi
2005-04-01
The purpose of the present study was to elucidate the longitudinal etiology of first-episode panic attack among white-collar workers. A path model was designed for this purpose. A 5-year, open-cohort study was carried out in a Japanese company. To evaluate the risk factors associated with the onset of a first episode of panic attack, the odds ratios of a new episode of panic attack were calculated by logistic regression. The path model contained five predictor variables: gender difference, overprotection, neuroticism, lifetime history of major depression, and recent stressful life events. The logistic regression analysis indicated that a lifetime history of major depression and recent stressful life events were associated with a fivefold and a threefold higher risk of panic attack at follow-up, respectively. The path model for predicting a first episode of panic attack fitted the data well. However, the model explained little of the variance in the ultimate dependent variable, the first episode of panic attack. Three predictors (neuroticism, lifetime history of major depression, and recent stressful life events) had a direct effect on the risk of a first episode of panic attack, whereas gender difference and overprotection had no direct effect. The present model could not fully predict first episodes of panic attack in white-collar workers. A path model for predicting the first episode of panic attack will need other strong predictor variables that were not surveyed in the present study, and genetic variables are suggested to be among them. A new path model containing genetic variables (e.g., family history) will be needed to predict the first episode of panic attack.
Walling, Craig A; Morrissey, Michael B; Foerster, Katharina; Clutton-Brock, Tim H; Pemberton, Josephine M; Kruuk, Loeske E B
2014-12-01
Evolutionary theory predicts that genetic constraints should be widespread, but empirical support for their existence is surprisingly rare. Commonly applied univariate and bivariate approaches to detecting genetic constraints can underestimate their prevalence, with important aspects potentially tractable only within a multivariate framework. However, multivariate genetic analyses of data from natural populations are challenging because of modest sample sizes, incomplete pedigrees, and missing data. Here we present results from a study of a comprehensive set of life history traits (juvenile survival, age at first breeding, annual fecundity, and longevity) for both males and females in a wild, pedigreed, population of red deer (Cervus elaphus). We use factor analytic modeling of the genetic variance-covariance matrix (G) to reduce the dimensionality of the problem and take a multivariate approach to estimating genetic constraints. We consider a range of metrics designed to assess the effect of G on the deflection of a predicted response to selection away from the direction of fastest adaptation and on the evolvability of the traits. We found limited support for genetic constraint through genetic covariances between traits, both within sex and between sexes. We discuss these results with respect to other recent findings and to the problems of estimating these parameters for natural populations. Copyright © 2014 Walling et al.
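Two of the metrics mentioned above, deflection of the predicted response and evolvability, can be sketched with standard quantitative-genetic formulas: the Lande equation Δz = Gβ, the angle between Δz and the selection gradient β, and evolvability e(β) = β'Gβ / |β|². This is an illustrative sketch assuming those textbook definitions; the helper names and the toy 2 × 2 G are hypothetical, and the exact metrics used in the study may differ.

```python
import math


def mat_vec(G, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(g * x for g, x in zip(row, v)) for row in G]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def selection_metrics(G, beta):
    """Predicted response, its deflection from beta (degrees) and evolvability.

    G    -- additive genetic variance-covariance matrix (list of rows)
    beta -- selection gradient, i.e. the direction of fastest adaptation
    """
    dz = mat_vec(G, beta)                       # Lande equation: delta z = G beta
    b_norm = math.sqrt(dot(beta, beta))
    dz_norm = math.sqrt(dot(dz, dz))
    cos_theta = dot(beta, dz) / (b_norm * dz_norm)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    deflection = math.degrees(math.acos(cos_theta))
    evolvability = dot(beta, dz) / b_norm ** 2  # e(beta) = beta' G beta / |beta|^2
    return dz, deflection, evolvability
```

With a strong positive genetic covariance (off-diagonal 0.8), selection on the first trait alone is deflected by about 38.7 degrees, while selection along [1, -1] is not deflected but has a low evolvability of 0.2, which is the kind of constraint these metrics are designed to expose.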
Walling, Craig A.; Morrissey, Michael B.; Foerster, Katharina; Clutton-Brock, Tim H.; Pemberton, Josephine M.; Kruuk, Loeske E. B.
2014-01-01
Evolutionary theory predicts that genetic constraints should be widespread, but empirical support for their existence is surprisingly rare. Commonly applied univariate and bivariate approaches to detecting genetic constraints can underestimate their prevalence, with important aspects potentially tractable only within a multivariate framework. However, multivariate genetic analyses of data from natural populations are challenging because of modest sample sizes, incomplete pedigrees, and missing data. Here we present results from a study of a comprehensive set of life history traits (juvenile survival, age at first breeding, annual fecundity, and longevity) for both males and females in a wild, pedigreed, population of red deer (Cervus elaphus). We use factor analytic modeling of the genetic variance–covariance matrix (G) to reduce the dimensionality of the problem and take a multivariate approach to estimating genetic constraints. We consider a range of metrics designed to assess the effect of G on the deflection of a predicted response to selection away from the direction of fastest adaptation and on the evolvability of the traits. We found limited support for genetic constraint through genetic covariances between traits, both within sex and between sexes. We discuss these results with respect to other recent findings and to the problems of estimating these parameters for natural populations. PMID:25278555
Heinrich, Angela; Müller, Kathrin U; Banaschewski, Tobias; Barker, Gareth J; Bokde, Arun L W; Bromberg, Uli; Büchel, Christian; Conrod, Patricia; Fauth-Bühler, Mira; Papadopoulos, Dimitri; Gallinat, Jürgen; Garavan, Hugh; Gowland, Penny; Heinz, Andreas; Ittermann, Bernd; Mann, Karl; Martinot, Jean-Luc; Paus, Tomáš; Pausova, Zdenka; Smolka, Michael; Ströhle, Andreas; Rietschel, Marcella; Flor, Herta; Schumann, Gunter; Nees, Frauke
2016-07-01
Adolescence is a time that can set the course of alcohol abuse later in life. Sensitivity to reward on multiple levels is a major factor in this development. We examined 736 adolescents from the IMAGEN longitudinal study for alcohol drinking during early (mean age = 14.37 years) and again during later (mean age = 16.45 years) adolescence. Using structural equation modeling, we evaluated the contribution of reward-related personality traits, behavior, brain responses and candidate genes. Personality appears to be most important in explaining alcohol drinking in early adolescence. However, genetic variations in ANKK1 (rs1800497) and HOMER1 (rs7713917) play an equal role in predicting alcohol drinking two years later and are most important in predicting the increase in alcohol consumption. We hypothesize that the initiation of alcohol use may be driven more strongly by personality, while the transition to increased alcohol use is more genetically influenced. Copyright © 2016 Elsevier B.V. All rights reserved.
A test of the facultative calibration/reactive heritability model of extraversion
Haysom, Hannah J.; Mitchem, Dorian G.; Lee, Anthony J.; Wright, Margaret J.; Martin, Nicholas G.; Keller, Matthew C.; Zietsch, Brendan P.
2015-01-01
A model proposed by Lukaszewski and Roney (2011) suggests that each individual’s level of extraversion is calibrated to other traits that predict the success of an extraverted behavioural strategy. Under ‘facultative calibration’, extraversion is not directly heritable, but rather exhibits heritability through its calibration to directly heritable traits (“reactive heritability”). The current study uses biometrical modelling of 1659 identical and non-identical twins and their siblings to assess whether the genetic variation in extraversion is calibrated to variation in facial attractiveness, intelligence, height in men and body mass index (BMI) in women. Extraversion was significantly positively correlated with facial attractiveness in both males (r=.11) and females (r=.18), but correlations between extraversion and the other variables were not consistent with predictions. Further, twin modelling revealed that the genetic variation in facial attractiveness did not account for a substantial proportion of the variation in extraversion in either males (2.4%) or females (0.5%). PMID:26880866
Genotype-phenotype association study via new multi-task learning model
Huo, Zhouyuan; Shen, Dinggang
2018-01-01
Research on the associations between genetic variations and imaging phenotypes has grown with advances in high-throughput genotyping and brain imaging techniques. Regression analysis of single nucleotide polymorphisms (SNPs) and imaging measures as quantitative traits (QTs) has been proposed to identify quantitative trait loci (QTL) via multi-task learning models. Recent studies consider the interlinked structures within SNPs and imaging QTs through group lasso, e.g. the ℓ2,1-norm, leading to better predictive results and insights into SNPs. However, group sparsity alone is not enough to represent the correlation between multiple tasks, and ℓ2,1-norm regularization is not robust either. In this paper, we propose a new multi-task learning model to analyze the associations between SNPs and QTs. We suppose that a low-rank structure is also beneficial for uncovering the correlation between genetic variations and imaging phenotypes. Finally, we conduct regression analysis of SNPs and QTs. Experimental results show that our model predicts more accurately than the compared methods and provides new insights into SNPs. PMID:29218896
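The ℓ2,1-norm penalty referred to above can be illustrated with a short sketch. Rows of the weight matrix W are taken to be SNPs and columns imaging QTs (an assumed layout). The proximal operator shown is the standard row-wise group soft-thresholding used by many group-lasso solvers. This sketch does not reproduce the authors' low-rank model, and the function names are hypothetical.

```python
import math


def l21_norm(W):
    """||W||_{2,1}: sum over rows (SNPs) of the Euclidean norm across columns (QTs)."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in W)


def prox_l21(W, lam):
    """Proximal operator of lam * ||W||_{2,1} (row-wise group soft-thresholding).

    Rows whose norm is at most lam are set to zero, so a SNP is dropped for all
    tasks at once; the remaining rows are shrunk toward zero.
    """
    out = []
    for row in W:
        norm = math.sqrt(sum(x * x for x in row))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * x for x in row])
    return out
```

For W = [[3, 4], [0.3, 0.4], [0, 0]], the norm is 5 + 0.5 + 0 = 5.5. With lam = 1, the first SNP's effects shrink to [2.4, 3.2], and the second SNP is removed from every QT jointly. That all-or-nothing behaviour across tasks is the "group sparsity" the abstract describes.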
The reality and importance of founder speciation in evolution.
Templeton, Alan R
2008-05-01
A founder event occurs when a new population is established from a small number of individuals drawn from a large ancestral population. Mayr proposed that genetic drift in an isolated founder population could alter the selective forces in an epistatic system, an observation supported by recent studies. Carson argued that a period of relaxed selection could occur when a founder population is in an open ecological niche, allowing rapid population growth after the founder event. Selectable genetic variation can actually increase during this founder-flush phase due to recombination, enhanced survival of advantageous mutations, and the conversion of non-additive genetic variance into additive variance in an epistatic system, another empirically confirmed prediction. Templeton combined the theories of Mayr and Carson with population genetic models to predict the conditions under which founder events can contribute to speciation, and these predictions are strongly confirmed by the empirical literature. Much of the criticism of founder speciation is based upon equating founder speciation to an adaptive peak shift opposed by selection. However, Mayr, Carson and Templeton all modeled a positive interaction of selection and drift, and Templeton showed that founder speciation is incompatible with peak-shift conditions. Although rare, founder speciation can have a disproportionate importance in adaptive innovation and radiation, and examples are given to show that "rare" does not mean "unimportant" in evolution. Founder speciation also interacts with other speciation mechanisms such that a speciation event is not a one-dimensional process due to either selection alone or drift alone. (c) 2008 Wiley Periodicals, Inc.
Assessing Predictive Properties of Genome-Wide Selection in Soybeans
Xavier, Alencar; Muir, William M.; Rainey, Katy Martin
2016-01-01
Many economically important traits in plant breeding have low heritability or are difficult to measure. For these traits, genomic selection has attractive features and may boost genetic gains. Our goal was to evaluate alternative scenarios for implementing genomic selection for yield components in soybean (Glycine max (L.) Merr.). We used a nested association panel with cross-validation to evaluate the impacts of training population size, genotyping density, and prediction model on the accuracy of genomic prediction. Our results indicate that training population size was the factor most relevant to improving genome-wide prediction, with the greatest improvement observed for training sets of up to 2000 individuals. We discuss assumptions that influence the choice of the prediction model. Although alternative models had minor impacts on prediction accuracy, the most robust prediction model was the combination of reproducing kernel Hilbert space regression and BayesB. Higher genotyping density marginally improved accuracy. Our study finds that breeding programs seeking efficient genomic selection in soybeans would best allocate resources by investing in a representative training set. PMID:27317786
Gómez Fernández, María Jimena; Boston, Emma S M; Gaggiotti, Oscar E; Kittlein, Marcelo J; Mirol, Patricia M
2016-12-01
In this study we combine information from landscape characteristics, demographic inference and species distribution modelling to identify environmental factors that shape the genetic distribution of the fossorial rodent Ctenomys. We sequenced the mtDNA control region and amplified 12 microsatellites from 27 populations distributed across the Iberá wetland ecosystem. Hierarchical Bayesian modelling was used to construct phylogenies and estimate divergence times. We developed species distribution models to determine what climatic variables and soil parameters predicted species presence by comparing the current to the historic and predicted future distribution of the species. Finally, we explore the impact of environmental variables on the genetic structure of Ctenomys based on current and past species distributions. The variables that consistently correlated with the predicted distribution of the species and explained the observed genetic differentiation among populations included the distribution of well-drained sandy soils and temperature seasonality. A core region of stable suitable habitat was identified from the Last Interglacial, which is projected to remain stable into the future. This region is also the most genetically diverse and is currently under strong anthropogenic pressure. Results reveal complex demographic dynamics, which have been in constant change in both time and space, and are likely linked to the evolution of the Paraná River. We suggest that any alteration of soil properties (climatic or anthropic) may significantly impact the availability of suitable habitat and consequently the ability of individuals to disperse. The protection of this core stable habitat is of prime importance given the increasing levels of human disturbance across this wetland system and the threat of climate change.
Cecchinato, A; De Marchi, M; Gallo, L; Bittante, G; Carnier, P
2009-10-01
The aims of this study were to investigate variation of milk coagulation property (MCP) measures and their predictions obtained by mid-infrared spectroscopy (MIR), to investigate the genetic relationship between measures of MCP and MIR predictions, and to estimate the expected response from a breeding program focusing on the enhancement of MCP using MIR predictions as indicator traits. Individual milk samples were collected from 1,200 Brown Swiss cows (progeny of 50 artificial insemination sires) reared in 30 herds located in northern Italy. Rennet coagulation time (RCT, min) and curd firmness (a₃₀, mm) were measured using a computerized renneting meter. The MIR data were recorded over the spectral range of 4,000 to 900 cm⁻¹. Prediction models for RCT and a₃₀ based on MIR spectra were developed using partial least squares regression. A cross-validation procedure was carried out. The procedure involved the partition of available data into 2 subsets: a calibration subset and a test subset. The calibration subset was used to develop a calibration equation able to predict individual MCP phenotypes using MIR spectra. The test subset was used to validate the calibration equation and to estimate heritabilities and genetic correlations for measured MCP and their predictions obtained from MIR spectra and the calibration equation. Point estimates of heritability ranged from 0.30 to 0.34 and from 0.22 to 0.24 for RCT and a₃₀, respectively. Heritability estimates for MCP predictions were larger than those obtained for measured MCP. Estimated genetic correlations between measures and predictions of RCT were very high and ranged from 0.91 to 0.96. Estimates of the genetic correlation between measures and predictions of a₃₀ were large and ranged from 0.71 to 0.87. Predictions of MCP provided by MIR techniques can be proposed as indicator traits for the genetic enhancement of MCP. 
The expected response of RCT and a₃₀ ensured by the selection using MIR predictions as indicator traits was equal to or slightly less than the response achievable through a single measurement of these traits. Breeding strategies for the enhancement of MCP based on MIR predictions as indicator traits could be easily and immediately implemented for dairy cattle populations where routine acquisition of spectra from individual milk samples is already performed.
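The comparison above, between selecting on an MIR prediction and selecting on the measured trait, can be sketched with the classical correlated-response formulas. These are not necessarily the formulas used in this study. Under them, direct response is R = i·h²_Y·σ_P(Y), correlated response through an indicator X is CR = i·h_X·r_A·h_Y·σ_P(Y), and their ratio is r_A·h_X / h_Y when the selection intensity i is equal. The function names and the example inputs are hypothetical illustrations.

```python
import math


def direct_response(i, h2_y, sigma_py):
    """Response to selection on the measured trait Y: R = i * h2_Y * sigma_P(Y)."""
    return i * h2_y * sigma_py


def correlated_response(i, h2_x, h2_y, r_a, sigma_py):
    """Response in Y when selecting on an indicator X: CR = i * h_X * r_A * h_Y * sigma_P(Y)."""
    return i * math.sqrt(h2_x) * r_a * math.sqrt(h2_y) * sigma_py


def relative_efficiency(h2_x, h2_y, r_a):
    """CR / R = r_A * h_X / h_Y, assuming equal selection intensity i."""
    return r_a * math.sqrt(h2_x) / math.sqrt(h2_y)
```

As a purely illustrative example, suppose h²_Y = 0.30 for measured RCT, a genetic correlation of 0.93, and an assumed h²_X = 0.34 for the prediction (the abstract only says prediction heritabilities were larger). The ratio is then about 0.99, which matches the qualitative finding of a response equal to or slightly below that of direct measurement.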
Memory Resilience to Alzheimer's Genetic Risk: Sex Effects in Predictor Profiles.
McDermott, Kirstie L; McFall, G Peggy; Andrews, Shea J; Anstey, Kaarin J; Dixon, Roger A
2017-10-01
Apolipoprotein E (APOE) ɛ4 and Clusterin (CLU) C alleles are risk factors for Alzheimer's disease (AD) and episodic memory (EM) decline. Memory resilience occurs when genetically at-risk adults perform at high and sustained levels. We investigated whether (a) memory resilience to AD genetic risk is predicted by biological and other risk markers and (b) the prediction profiles vary by sex and AD risk variant. Using a longitudinal sample of nondemented adults (n = 642, aged 53-95), we focused on memory resilience (over 9 years) to 2 AD risk variants (APOE, CLU). Growth mixture models classified resilience. Random forest analysis, stratified by sex, tested the predictive importance of 22 nongenetic risk factors from 5 domains (n = 24-112). For both sexes, younger age, higher education, stronger grip, and everyday novel cognitive activity predicted memory resilience. For women, 9 factors from functional, health, mobility, and lifestyle domains were also predictive. For men, fewer depressive symptoms was the only additional important predictor. The prediction profiles were similar for APOE and CLU. Although several factors predicted resilience in both sexes, a greater number applied only to women. Sex-specific mechanisms and intervention targets are implied. © The Author 2016. Published by Oxford University Press on behalf of The Gerontological Society of America. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Flow discharge prediction in compound channels using linear genetic programming
NASA Astrophysics Data System (ADS)
Azamathulla, H. Md.; Zahiri, A.
2012-08-01
Flow discharge determination in rivers is one of the key elements of mathematical modelling in the design of river engineering projects. Because of floodplain inundation and sudden changes in river geometry, flow resistance equations are not applicable to compound channels. Therefore, many approaches have been developed to modify flow discharge computations, but most of them give satisfactory results only in laboratory flumes. Owing to their ability to model complex phenomena, artificial intelligence methods have recently been widely applied in various fields of water engineering. Linear genetic programming (LGP), a branch of artificial intelligence methods, can optimise the model structure and its components and derive an explicit equation based on the variables of the phenomenon. In this paper, a precise dimensionless equation has been derived for predicting flood discharge using LGP. The proposed model was developed using published stage-discharge data compiled from 394 laboratory and field data sets for 30 compound channels. The results indicate that the LGP model performs better than the existing models.
Docherty, A R; Moscati, A; Peterson, R; Edwards, A C; Adkins, D E; Bacanu, S A; Bigdeli, T B; Webb, B T; Flint, J; Kendler, K S
2016-10-25
Biometrical genetic studies suggest that personality dimensions, including neuroticism, are moderately heritable (~0.4 to 0.6). Quantitative analyses that aggregate the effects of many common variants have recently further informed genetic research on European samples. However, there has been limited research to date on non-European populations. This study examined personality dimensions in a large sample of Han Chinese descent (N = 10 064) from the China, Oxford, and VCU Experimental Research on Genetic Epidemiology study, which aimed to identify genetic risk factors for recurrent major depression in a rigorously ascertained cohort. Heritability of neuroticism as measured by the Eysenck Personality Questionnaire (EPQ) was estimated to be low but statistically significant at 10% (s.e. = 0.03, P = 0.0001). In addition to EPQ neuroticism, which is based on a three-factor model, data for the Big Five (BF) personality dimensions (neuroticism, openness, conscientiousness, extraversion and agreeableness) measured by the Big Five Inventory were available for controls (n = 5596). Heritability estimates of the BF were not statistically significant despite high power (>0.85) to detect heritabilities of 0.10. Polygenic risk scores constructed with best linear unbiased prediction weights applied to split-half samples failed to significantly predict any of the personality traits, but polygenic risk for neuroticism, calculated with LDpred and based on predictive variants previously identified in European populations (N = 171 911), significantly predicted major depressive disorder case-control status (P = 0.0004) after false discovery rate correction. The scores also significantly predicted EPQ neuroticism (P = 6.3 × 10⁻⁶). Factor analytic results of the measures indicated that any differences in heritabilities across samples may be due to genetic variation or variation in haplotype structure between samples, rather than to measurement non-invariance. 
Findings demonstrate that neuroticism can be significantly predicted across ancestry, and highlight the importance of studying polygenic contributions to personality in non-European populations.
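The polygenic scores described above are, at their core, weighted allele counts. Below is a minimal sketch that assumes allele dosages coded 0, 1 or 2 and per-variant weights already estimated elsewhere (for example by BLUP or LDpred, neither of which is reimplemented here). The function names and toy weights are hypothetical.

```python
import statistics


def polygenic_score(dosages, weights):
    """Weighted allele count: sum_j w_j * x_j, with x_j in {0, 1, 2}."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same variants")
    return sum(w * x for w, x in zip(weights, dosages))


def standardized_scores(genotypes, weights):
    """Scores for a cohort, scaled to mean 0 and unit (population) standard deviation."""
    scores = [polygenic_score(g, weights) for g in genotypes]
    mu = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mu) / sd for s in scores]
```

Standardizing within the target sample makes score effects comparable across cohorts. That matters when, as here, weights derived from European populations are applied to a Han Chinese sample, although allele-frequency and LD differences between ancestries are not corrected by this step.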
Bittante, G; Ferragina, A; Cipolat-Gotet, C; Cecchinato, A
2014-10-01
Cheese yield is an important technological trait in the dairy industry. The aim of this study was to infer the genetic parameters of some cheese yield-related traits predicted using Fourier-transform infrared (FTIR) spectral analysis and compare the results with those obtained using an individual model cheese-producing procedure. A total of 1,264 model cheeses were produced using 1,500-mL milk samples collected from individual Brown Swiss cows, and individual measurements were taken for 10 traits: 3 cheese yield traits (fresh curd, curd total solids, and curd water as a percent of the weight of the processed milk), 4 milk nutrient recovery traits (fat, protein, total solids, and energy of the curd as a percent of the same nutrient in the processed milk), and 3 daily cheese production traits per cow (fresh curd, total solids, and water weight of the curd). Each unprocessed milk sample was analyzed using a MilkoScan FT6000 (Foss, Hillerød, Denmark) over the spectral range from 5,000 to 900 cm⁻¹. The FTIR spectrum-based prediction models for the previously mentioned traits were developed using modified partial least-square regression. Cross-validation of the whole data set yielded coefficients of determination between the predicted and measured values in cross-validation of 0.65 to 0.95 for all traits, except for the recovery of fat (0.41). A 3-fold external validation was also used, in which the available data were partitioned into 2 subsets: a training set (one-third of the herds) and a testing set (two-thirds). The training set was used to develop calibration equations, whereas the testing subsets were used for external validation of the calibration equations and to estimate the heritabilities and genetic correlations of the measured and FTIR-predicted phenotypes. 
The coefficients of determination between predicted and measured values in cross-validation obtained from the training sets were very similar to those obtained from the whole data set, but the coefficients of determination in the external validation sets were much lower than those for the full data set for all traits (0.30 to 0.73), and particularly for fat recovery (0.05 to 0.18). For each testing subset, the (co)variance components for the measured and FTIR-predicted phenotypes were estimated using bivariate Bayesian analyses and linear models. The intraherd heritabilities for the predicted traits obtained from our internal cross-validation using the whole data set ranged from 0.085 for daily yield of curd solids to 0.576 for protein recovery, and were similar to those obtained from the measured traits (0.079 to 0.586, respectively). The heritabilities estimated from the testing data set used for external validation were more variable but similar (on average) to the corresponding values obtained from the whole data set. Moreover, the genetic correlations between the predicted and measured traits were generally high (0.791 to 0.996), and they were always higher than the corresponding phenotypic correlations (0.383 to 0.995), especially for the external validation subset. In conclusion, we report that applying cross-validation to the whole data set tended to overestimate the predictive ability of FTIR spectra, give more precise phenotypic predictions than calibrations obtained using smaller data sets, and yield genetic correlations similar to those obtained from the measured traits. Collectively, our findings indicate that FTIR predictions have the potential to be used as indicator traits for the rapid and inexpensive selection of dairy populations for improvement of cheese yield, milk nutrient recovery in curd, and daily cheese production per cow. 
Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
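The 3-fold herd-wise external validation described above can be sketched as follows. Herds are dealt into three groups, and each group in turn serves as the one-third training set while the other two-thirds are used for testing. Agreement between predicted and measured values is scored as a coefficient of determination. Taking that coefficient to be the squared Pearson correlation is an assumption, since the study's exact definition may differ. The function names and the random herd assignment are also hypothetical.

```python
import random


def herd_folds(herds, n_folds=3, seed=0):
    """Yield (training_herds, testing_herds) pairs.

    Herds are shuffled and dealt into n_folds groups; each group in turn is the
    training set (about one-third of herds) and the remaining herds form the testing set.
    """
    unique = sorted(set(herds))
    random.Random(seed).shuffle(unique)
    groups = [unique[i::n_folds] for i in range(n_folds)]
    for group in groups:
        train = set(group)
        yield train, set(unique) - train


def r_squared(predicted, measured):
    """Squared Pearson correlation between predicted and measured values."""
    n = len(predicted)
    mp = sum(predicted) / n
    mm = sum(measured) / n
    cov = sum((p - mp) * (m - mm) for p, m in zip(predicted, measured))
    vp = sum((p - mp) ** 2 for p in predicted)
    vm = sum((m - mm) ** 2 for m in measured)
    return cov * cov / (vp * vm)
```

Splitting by herd rather than by cow keeps animals from the same herd out of both sets at once. That is one plausible reason why external validation gives lower coefficients of determination than cross-validation on the whole data set.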
Barker, Brittany S.; Rodríguez-Robles, Javier A.; Cook, Joseph A.
2014-01-01
The effects of late Quaternary climate on distributions and evolutionary dynamics of insular species are poorly understood in most tropical archipelagoes. We used ecological niche models under past and current climate to derive hypotheses regarding how stable climatic conditions shaped genetic diversity in two ecologically distinctive frogs in Puerto Rico. Whereas the Mountain Coquí, Eleutherodactylus portoricensis, is restricted to montane forest in the Cayey and Luquillo Mountains, the Red-eyed Coquí, E. antillensis, is a habitat generalist distributed across the entire Puerto Rican Bank (Puerto Rico and the Virgin Islands, excluding St. Croix). To test our hypotheses, we conducted phylogeographic and population genetic analyses based on mitochondrial and nuclear loci of each species across their range in Puerto Rico. Patterns of population differentiation in E. portoricensis, but not in E. antillensis, supported our hypotheses. For E. portoricensis, these patterns include: individuals isolated by long-term unsuitable climate in the Río Grande de Loíza Basin in eastern Puerto Rico belong to different genetic clusters; past and current climate strongly predicted genetic differentiation; and Cayey and Luquillo Mountains populations split prior to the last interglacial. For E. antillensis, these patterns include: genetic clusters did not fully correspond to predicted long-term unsuitable climate; and past and current climate weakly predicted patterns of genetic differentiation. Genetic signatures in E. antillensis are consistent with a recent range expansion into western Puerto Rico, possibly resulting from climate change and anthropogenic influences. As predicted, regions with a large area of long-term suitable climate were associated with higher genetic diversity in both species, suggesting larger and more stable populations. Finally, we discussed the implications of our findings for developing evidence-based management decisions for E. portoricensis, a taxon of special concern. Our findings illustrate the role of persistent suitable climatic conditions in promoting the persistence and diversification of tropical island organisms. PMID:26508809
Barker, Brittany S; Rodríguez-Robles, Javier A; Cook, Joseph A
2015-08-01
The effects of late Quaternary climate on distributions and evolutionary dynamics of insular species are poorly understood in most tropical archipelagoes. We used ecological niche models under past and current climate to derive hypotheses regarding how stable climatic conditions shaped genetic diversity in two ecologically distinctive frogs in Puerto Rico. Whereas the Mountain Coquí, Eleutherodactylus portoricensis, is restricted to montane forest in the Cayey and Luquillo Mountains, the Red-eyed Coquí, E. antillensis, is a habitat generalist distributed across the entire Puerto Rican Bank (Puerto Rico and the Virgin Islands, excluding St. Croix). To test our hypotheses, we conducted phylogeographic and population genetic analyses based on mitochondrial and nuclear loci of each species across their range in Puerto Rico. Patterns of population differentiation in E. portoricensis, but not in E. antillensis, supported our hypotheses. For E. portoricensis, these patterns include: individuals isolated by long-term unsuitable climate in the Río Grande de Loíza Basin in eastern Puerto Rico belong to different genetic clusters; past and current climate strongly predicted genetic differentiation; and Cayey and Luquillo Mountains populations split prior to the last interglacial. For E. antillensis, these patterns include: genetic clusters did not fully correspond to predicted long-term unsuitable climate; and past and current climate weakly predicted patterns of genetic differentiation. Genetic signatures in E. antillensis are consistent with a recent range expansion into western Puerto Rico, possibly resulting from climate change and anthropogenic influences. As predicted, regions with a large area of long-term suitable climate were associated with higher genetic diversity in both species, suggesting larger and more stable populations. Finally, we discussed the implications of our findings for developing evidence-based management decisions for E. portoricensis, a taxon of special concern. Our findings illustrate the role of persistent suitable climatic conditions in promoting the persistence and diversification of tropical island organisms.
Goel, Purva; Bapat, Sanket; Vyas, Renu; Tambe, Amruta; Tambe, Sanjeev S
2015-11-13
The development of quantitative structure-retention relationships (QSRRs) aims at constructing an appropriate linear/nonlinear model for predicting the retention behavior (such as the Kovats retention index) of a solute on a chromatographic column. Multi-linear regression and artificial neural networks are commonly used for QSRR development in gas chromatography (GC). In this study, an artificial-intelligence-based data-driven modeling formalism, namely genetic programming (GP), has been introduced for developing quantitative structure-based models that predict Kovats retention indices (KRIs). The novelty of the GP formalism is that, given an example dataset, it searches for and optimizes both the form (structure) and the parameters of an appropriate linear/nonlinear data-fitting model. Thus, the form of the data-fitting model need not be pre-specified in GP-based modeling. These models are also less complex, easy to understand, and easy to deploy. The effectiveness of GP in constructing QSRRs has been demonstrated by developing models predicting KRIs of light hydrocarbons (case study I) and adamantane derivatives (case study II). In each case study, two-, three- and four-descriptor models have been developed using KRI data available in the literature. The results indicate that the GP-based models possess excellent KRI prediction accuracy and generalization capability. Specifically, the best-performing four-descriptor models in both case studies yielded high (>0.9) values of the coefficient of determination (R²) and low values of root mean squared error (RMSE) and mean absolute percent error (MAPE) for training, test and validation set data. The characteristic feature of this study is that it introduces a practical and effective GP-based method for developing QSRRs in gas chromatography that can also be used to develop other types of data-driven models in chromatography science.
Copyright © 2015 Elsevier B.V. All rights reserved.
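The three accuracy measures reported for the QSRR models have standard definitions. The sketch below assumes the common forms R² = 1 − SS_res/SS_tot, RMSE = sqrt(mean squared residual) and MAPE = mean of |residual/observed| × 100, and applies them to hypothetical retention indices. The helper name and the data are illustrative only; the study may have computed R² differently (for example, as a squared correlation coefficient).

```python
import math
from typing import Sequence


def regression_metrics(observed: Sequence[float], predicted: Sequence[float]) -> dict:
    """Return R^2, RMSE and MAPE (in %) for paired observed/predicted values.

    Hypothetical helper for illustration; R^2 uses the 1 - SS_res/SS_tot form.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    # MAPE divides by the observed value, so observed values must be non-zero
    mape = 100.0 / n * sum(abs(r / o) for r, o in zip(residuals, observed))
    return {"R2": r2, "RMSE": rmse, "MAPE": mape}


# Hypothetical Kovats retention indices, not data from the study
obs = [100.0, 200.0, 300.0, 400.0]
pred = [110.0, 190.0, 310.0, 390.0]
print(regression_metrics(obs, pred))
```

For these made-up values the residual sum of squares is 400 against a total sum of squares of 50,000, so R² is 0.992 and RMSE is 10 index units.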
VTE Risk assessment - a prognostic Model: BATER Cohort Study of young women.
Heinemann, Lothar Aj; Dominh, Thai; Assmann, Anita; Schramm, Wolfgang; Schürmann, Rolf; Hilpert, Jan; Spannagl, Michael
2005-04-18
BACKGROUND: No community-based cohort studies are available that have evaluated the predictive power of both clinical and genetic risk factors for venous thromboembolism (VTE). There is, however, a clinical need to forecast the likelihood of future VTE, at least qualitatively, to support decisions about the intensity of diagnostic or preventive measures. MATERIALS AND METHODS: A 10-year observation period of the Bavarian Thromboembolic Risk (BATER) study, a cohort study of 4337 women (18-55 years), was used to develop a predictive model of VTE based on clinical and genetic variables at baseline (1993). The objective was to prepare a probabilistic scheme that discriminates women with virtually no VTE risk from those at higher levels of absolute VTE risk in the foreseeable future. A multivariate analysis determined which baseline variables were the best predictors of a future VTE event, ranked them by predictive power, and allowed the design of a simple graphic scheme to assess individual VTE risk using five predictor variables. RESULTS: Thirty-four new confirmed VTEs occurred during an observation period of over 32,000 women-years (WYs). A model was developed based mainly on clinical information (personal history of previous VTE, family history of VTE, age, BMI) and one composite genetic risk marker (combining Factor V Leiden and the prothrombin G20210A mutation). Four levels of increasing VTE risk were arbitrarily defined to map the prevalence in the study population: no/low risk (61.3%), moderate risk (21.1%), high risk (6.0%), and very high risk of future VTE (0.9%). In 10.6% of the population, risk could not be assessed because of a lack of VTE cases. The average VTE incidence rates in these four levels were 4.1, 12.3, 47.2, and 170.5 per 10⁴ WYs for no/low, moderate, high, and very high risk, respectively.
CONCLUSION: Our prognostic tool, containing clinical information (and, if available, genetic data), seems worth testing in medical practice to confirm or refute the positive findings of this study. Our cohort study will be continued to include more VTE cases and to increase the predictive value of the model.
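The BATER incidence figures are events per unit of person-time, scaled to 10⁴ women-years. A small sketch of that arithmetic, using only numbers stated in the abstract: the crude overall rate from 34 events in about 32,000 WYs (an upper bound, since follow-up was "over" 32,000 WYs) and the ratio between the highest and lowest reported risk levels. The helper name is hypothetical.

```python
def incidence_per_10k(events: int, person_years: float) -> float:
    """Incidence rate per 10,000 person-years (here, women-years)."""
    return events / person_years * 10_000


# 34 VTEs over "over 32,000" WYs: using 32,000 gives an upper bound on the crude rate
crude_rate = incidence_per_10k(34, 32_000)

# Level-specific rates reported in the abstract (per 10^4 WYs)
level_rates = {"no/low": 4.1, "moderate": 12.3, "high": 47.2, "very high": 170.5}
ratio_top_vs_bottom = level_rates["very high"] / level_rates["no/low"]

print(f"crude rate <= {crude_rate:.1f} per 10^4 WYs")
print(f"very high vs no/low rate ratio ~ {ratio_top_vs_bottom:.0f}")
```

This gives a crude rate of at most about 10.6 per 10⁴ WYs, and women in the very-high-risk level have roughly 42 times the incidence of those in the no/low level.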
Sexing adult black-legged kittiwakes by DNA, behavior, and morphology
Jodice, P.G.R.; Lanctot, Richard B.; Gill, V.A.; Roby, D.D.; Hatch, Shyla A.
2000-01-01
We sexed adult Black-legged Kittiwakes (Rissa tridactyla) using DNA-based genetic techniques, behavior, and morphology, and compared the results of these techniques. Genetic and morphology data were collected on 605 breeding kittiwakes, and sex-specific behaviors were recorded for a sub-sample of 285 of these individuals. We compared sex classifications based on genetic and behavioral techniques for this sub-sample to assess the accuracy of the genetic technique. DNA-based techniques correctly sexed 97.2% of this sub-sample, and sex-specific behaviors correctly sexed 96.5%. We used the corrected genetic classifications from this sub-sample, and the genetic classifications for the remaining birds (assumed to be correct), to develop predictive morphometric discriminant function models for all 605 birds. These models accurately predicted the sex of 73-96% of individuals examined, depending on the sample of birds used and the characters included. The most accurate single measurement for determining sex was length of head plus bill, which correctly classified 88% of individuals tested. When both members of a pair were measured, classification levels improved and approached the accuracy of both behavioral observations and genetic analyses. Morphometric techniques were only slightly less accurate than genetic techniques but were easier to implement in the field and less costly. Behavioral observations, while highly accurate, required that birds be easily observable during the breeding season and identifiable. As such, sex-specific behaviors may best be applied to confirm the sex of previously marked birds. All three techniques thus have the potential to be highly accurate, and the choice of one or more will depend on the circumstances of any particular field study.
Scliar, Marilia O; Soares-Souza, Giordano B; Chevitarese, Juliana; Lemos, Livia; Magalhães, Wagner C S; Fagundes, Nelson J; Bonatto, Sandro L; Yeager, Meredith; Chanock, Stephen J; Tarazona-Santos, Eduardo
2012-03-01
Elucidating the pattern of genetic diversity for non-European populations is necessary to make the benefits of human genetics research available to individuals from these groups. In the era of large human genomic initiatives, Native American populations have been neglected, in particular the Quechua, the largest South Amerindian group settled along the Andes. We characterized the genetic diversity of a Quechua population in a global setting, using autosomal noncoding sequences (nine unlinked loci for a total of 16 kb), 351 unlinked SNPs and 678 microsatellites, and tested predictions of the model of Native American evolution proposed by Tarazona-Santos et al. (Am J Hum Genet 68 (2001) 1485-1496). European admixture is <5% and African ancestry is barely detectable in the studied population. The largest genetic distances were between African and Quechua or Melanesian populations, which is concordant with the African origin of modern humans and with South America being the last part of the world to be peopled. The diversity in the Quechua population is comparable with that of Eurasian populations, and the allele frequency spectrum based on resequencing data does not reflect a reduction in the proportion of rare alleles. Thus, the Quechua population is a large reservoir of common and rare genetic variants of South Amerindians. These results are consistent with and complement our evolutionary model of South Amerindians (Tarazona-Santos et al.: Am J Hum Genet 68 (2001) 1485-1496), proposed based on Y-chromosome data, which predicts high genomic diversity due to the high level of gene flow between Andean populations and their long-term effective population size. Copyright © 2012 Wiley Periodicals, Inc.
Held, Elizabeth; Cape, Joshua; Tintle, Nathan
2016-01-01
Machine learning methods continue to show promise in the analysis of data from genetic association studies, because such data have a high number of variables relative to the number of observations. However, few best practices exist for applying these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk from genotypes so that it can incorporate gene expression data and rare variants. We then apply two versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare their performance with that of logistic regression. Performance did not differ substantially across the three methods, although the linear support vector machine tended to show small gains in predictive ability relative to the radial support vector machine and logistic regression. Importantly, as the number of genes in the models increased, even when those genes contained causal rare variants, predictive ability decreased significantly for both the radial support vector machine and logistic regression. The linear support vector machine was more robust to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from incorporating gene expression data.
Dinov, Ivo D.; Heavner, Ben; Tang, Ming; Glusman, Gustavo; Chard, Kyle; Darcy, Mike; Madduri, Ravi; Pa, Judy; Spino, Cathie; Kesselman, Carl; Foster, Ian; Deutsch, Eric W.; Price, Nathan D.; Van Horn, John D.; Ames, Joseph; Clark, Kristi; Hood, Leroy; Hampstead, Benjamin M.; Dauer, William; Toga, Arthur W.
2016-01-01
Background A unique archive of Big Data on Parkinson’s Disease is collected, managed and disseminated by the Parkinson’s Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson’s disease (PD) risk to trauma, genetics, environment, co-morbidities, or lifestyle. The defining characteristics of Big Data (large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources) all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data. Methods and Findings Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning based classification that enables the reporting of model parameters and diagnostic forecasting based on new data.
We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning based classification methods indicated significant power to predict Parkinson’s disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson's Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetics (e.g., rs34637584, chr12), and derived neuroimaging biomarker (e.g., cerebellum shape index) data all contributed to the predictive analytics and diagnostic forecasting. Conclusions Model-free Big Data machine learning-based classification methods (e.g., adaptive boosting, support vector machines) can outperform model-based techniques in terms of predictive precision and reliability (e.g., forecasting patient diagnosis). We observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heterogeneous and incomplete PPMI data. UPDRS scores play a critical role in predicting diagnosis, which is expected based on the clinical definition of Parkinson’s disease. Even without longitudinal UPDRS data, however, the accuracy of model-free machine learning based classification is over 80%. The methods, software and protocols developed here are openly shared and can be employed to study other neurodegenerative disorders (e.g., Alzheimer’s, Huntington’s, amyotrophic lateral sclerosis), as well as for other predictive Big Data analytics applications. PMID:27494614
Adjemian, Jennifer C Z; Girvetz, Evan H; Beckett, Laurel; Foley, Janet E
2006-01-01
More than 20 species of fleas in California are implicated as potential vectors of Yersinia pestis. Extremely limited spatial data exist for plague vectors, a key component to understanding where the greatest risks to human, domestic animal, and wildlife health exist. This study increases the spatial data available for 13 potential plague vectors by using the ecological niche modeling system Genetic Algorithm for Rule-Set Production (GARP) to predict their respective distributions. Because the available sample sizes in our data set varied greatly from one species to another, we also analyzed the robustness of GARP, using the data available for the flea Oropsylla montana (Baker) to quantify the effects that sample size and the chosen explanatory variables have on the final species distribution map. GARP effectively modeled the distributions of 13 vector species. Furthermore, our analyses show that all of these modeled ranges are robust: with a sample size of six or more fleas, sample size did not significantly affect the percentage of the in-state area where the flea was predicted to be found, or the testing accuracy of the model. The results of this study will help guide the sampling efforts of future studies focusing on plague vectors.
How Obstacles Perturb Population Fronts and Alter Their Genetic Structure.
Möbius, Wolfram; Murray, Andrew W; Nelson, David R
2015-12-01
As populations spread into new territory, environmental heterogeneities can shape the population front and genetic composition. We focus here on the effects of an important building block of heterogeneous environments, isolated obstacles. With a combination of experiments, theory, and simulation, we show how isolated obstacles both create long-lived distortions of the front shape and amplify the effect of genetic drift. A system of bacteriophage T7 spreading on a spatially heterogeneous Escherichia coli lawn serves as an experimental model system to study population expansions. Using an inkjet printer, we create well-defined replicates of the lawn and quantitatively study the population expansion of phage T7. The transient perturbations of the population front found in the experiments are well described by a model in which the front moves with constant speed. Independent of the precise details of the expansion, we show that obstacles create a kink in the front that persists over large distances and is insensitive to the details of the obstacle's shape. The small deviations between experimental findings and the predictions of the constant speed model can be understood with a more general reaction-diffusion model, which reduces to the constant speed model when the obstacle size is large compared to the front width. Using this framework, we demonstrate that frontier genotypes just grazing the side of an isolated obstacle increase in abundance, a phenomenon we call 'geometry-enhanced genetic drift', complementary to the founder effect associated with spatial bottlenecks. Bacterial range expansions around nutrient-poor barriers and stochastic simulations confirm this prediction. The effect of the obstacle on the genealogy of individuals at the front is characterized by simulations and rationalized using the constant speed model. 
Lastly, we consider the effect of two obstacles on front shape and genetic composition of the population, illuminating the effects expected from complex environments with many obstacles.
How Obstacles Perturb Population Fronts and Alter Their Genetic Structure
Möbius, Wolfram; Murray, Andrew W.; Nelson, David R.
2015-01-01
As populations spread into new territory, environmental heterogeneities can shape the population front and genetic composition. We focus here on the effects of an important building block of heterogeneous environments, isolated obstacles. With a combination of experiments, theory, and simulation, we show how isolated obstacles both create long-lived distortions of the front shape and amplify the effect of genetic drift. A system of bacteriophage T7 spreading on a spatially heterogeneous Escherichia coli lawn serves as an experimental model system to study population expansions. Using an inkjet printer, we create well-defined replicates of the lawn and quantitatively study the population expansion of phage T7. The transient perturbations of the population front found in the experiments are well described by a model in which the front moves with constant speed. Independent of the precise details of the expansion, we show that obstacles create a kink in the front that persists over large distances and is insensitive to the details of the obstacle’s shape. The small deviations between experimental findings and the predictions of the constant speed model can be understood with a more general reaction-diffusion model, which reduces to the constant speed model when the obstacle size is large compared to the front width. Using this framework, we demonstrate that frontier genotypes just grazing the side of an isolated obstacle increase in abundance, a phenomenon we call ‘geometry-enhanced genetic drift’, complementary to the founder effect associated with spatial bottlenecks. Bacterial range expansions around nutrient-poor barriers and stochastic simulations confirm this prediction. The effect of the obstacle on the genealogy of individuals at the front is characterized by simulations and rationalized using the constant speed model. 
Lastly, we consider the effect of two obstacles on front shape and genetic composition of the population, illuminating the effects expected from complex environments with many obstacles. PMID:26696601
General Methods for Evolutionary Quantitative Genetic Inference from Generalized Mixed Models.
de Villemereuil, Pierre; Schielzeth, Holger; Nakagawa, Shinichi; Morrissey, Michael
2016-11-01
Methods for inference and interpretation of evolutionary quantitative genetic parameters, and for prediction of the response to selection, are best developed for traits with normal distributions. Many traits of evolutionary interest, including many life history and behavioral traits, have inherently nonnormal distributions. The generalized linear mixed model (GLMM) framework has become a widely used tool for estimating quantitative genetic parameters for nonnormal traits. However, whereas GLMMs provide inference on a statistically convenient latent scale, it is often desirable to express quantitative genetic parameters on the scale upon which traits are measured. The parameters of fitted GLMMs, despite being on a latent scale, fully determine all quantities of potential interest on the scale on which traits are expressed. We provide expressions for deriving each of these quantities, including population means, phenotypic (co)variances, variance components including additive genetic (co)variances, and parameters such as heritability. We demonstrate that fixed effects have a strong impact on those parameters and show how to deal with this by averaging or integrating over fixed effects. The expressions require integration of quantities determined by the link function over distributions of latent values. In general, the required integrals must be solved numerically, but efficient methods are available, and we provide an implementation in an R package, QGglmm. We show that known formulas for quantities such as heritability of traits with binomial and Poisson distributions are special cases of our expressions. Additionally, we show how fitted GLMMs can be incorporated into existing methods for predicting evolutionary trajectories. We demonstrate the accuracy of the resulting method for evolutionary prediction by simulation and apply our approach to data from a wild pedigreed vertebrate population. Copyright © 2016 de Villemereuil et al.
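To illustrate the kind of latent-to-observed-scale integral the abstract describes, here is a sketch for the simplest case, under stated assumptions: a Poisson GLMM with log link, an intercept μ and no other fixed effects, and latent variance split into an additive genetic part V_A and all other latent components V_other. The observed-scale mean is then E[exp(ℓ)], the observed variance is that mean plus Var(exp(ℓ)), and additive variance maps to the observed scale as Ψ²V_A with Ψ = E[exp(ℓ)] for the log link. Because closed forms exist for this case, they are used to check the numerical integration. This is an independent Python sketch, not the QGglmm package or its API; function names and parameter values are hypothetical.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def normal_expectation(f, mean: float, var: float, n_nodes: int = 64) -> float:
    """E[f(X)] for X ~ N(mean, var) by probabilists' Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(n_nodes)  # weight function exp(-x^2 / 2)
    values = f(mean + math.sqrt(var) * nodes)
    return float(np.sum(weights * values) / math.sqrt(2.0 * math.pi))


def poisson_log_observed_scale(mu: float, va: float, v_other: float) -> dict:
    """Observed-scale mean, variance and heritability for a Poisson/log GLMM.

    Assumes latent values l ~ N(mu, va + v_other) and no fixed effects
    other than the intercept mu (hypothetical helper, not QGglmm).
    """
    v_latent = va + v_other
    mean_obs = normal_expectation(np.exp, mu, v_latent)
    second_moment = normal_expectation(lambda l: np.exp(2.0 * l), mu, v_latent)
    var_expected = second_moment - mean_obs ** 2  # variance of exp(l)
    var_obs = mean_obs + var_expected             # plus Poisson sampling variance
    psi = mean_obs                                # E[d exp(l)/dl] = E[exp(l)]
    va_obs = psi ** 2 * va
    return {"mean": mean_obs, "var": var_obs, "h2": va_obs / var_obs}


def poisson_log_closed_form(mu: float, va: float, v_other: float) -> dict:
    """Closed-form counterpart, used to check the quadrature."""
    v = va + v_other
    mean_obs = math.exp(mu + v / 2.0)
    var_obs = mean_obs + mean_obs ** 2 * (math.exp(v) - 1.0)
    return {"mean": mean_obs, "var": var_obs, "h2": mean_obs ** 2 * va / var_obs}


numeric = poisson_log_observed_scale(mu=0.5, va=0.3, v_other=0.2)
exact = poisson_log_closed_form(mu=0.5, va=0.3, v_other=0.2)
print(numeric, exact)
```

With these made-up values the latent-scale ratio V_A/(V_A + V_other) is 0.6, while the observed-scale heritability is only about 0.27, because Poisson sampling noise adds variance on the observed scale. With nonzero fixed effects, the abstract's point applies: these quantities must additionally be averaged or integrated over the fixed-effect predictions.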
Isocost Lines Describe the Cellular Economy of Genetic Circuits.
Gyorgy, Andras; Jiménez, José I; Yazbek, John; Huang, Hsin-Ho; Chung, Hattie; Weiss, Ron; Del Vecchio, Domitilla
2015-08-04
Genetic circuits in living cells share transcriptional and translational resources that are available in limited amounts. This leads to unexpected couplings among seemingly unconnected modules, which result in poorly predictable circuit behavior. In this study, we determine these interdependencies between products of different genes by characterizing the economy of how transcriptional and translational resources are allocated to the production of proteins in genetic circuits. We discover that, when expressed from the same plasmid, the combinations of attainable protein concentrations are constrained by a linear relationship, which can be interpreted as an isocost line, a concept used in microeconomics. We created a library of circuits with two reporter genes, one constitutive and the other inducible in the same plasmid, without a regulatory path between them. In agreement with the model predictions, experiments reveal that the isocost line rotates when changing the ribosome binding site strength of the inducible gene and shifts when modifying the plasmid copy number. These results demonstrate that isocost lines can be employed to predict how genetic circuits become coupled when sharing resources and provide design guidelines for minimizing the effects of such couplings. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Kim, Joanne Soo-Min; Coyte, Peter C.; Cotterchio, Michelle; Keogh, Louise A.; Flander, Louisa B.; Gaff, Clara; Laporte, Audrey
2016-01-01
Background This study investigated whether receiving the results of predictive genetic testing for Lynch syndrome—indicating the presence or absence of an inherited predisposition to various cancers, including colorectal cancer—was associated with change in individual colonoscopy and smoking behaviours, which could prevent colorectal cancer. Methods The study population included individuals with no previous diagnosis of colorectal cancer, whose families had already-identified deleterious mutations in the mismatch repair or EPCAM genes. Hypotheses were generated from a simple health economics model and tested against individual-level panel data from the Australasian Colorectal Cancer Family Registry. Results The empirical analysis revealed evidence consistent with some of the hypotheses, with a higher likelihood of undergoing colonoscopy in those who discovered their genetic predisposition to colorectal cancer and a lower likelihood of quitting smoking in those who discovered their lack thereof. Conclusion Predictive genetic information about Lynch syndrome was associated with change in individual colonoscopy and smoking behaviours but not necessarily in ways to improve population health. Impact The study findings suggest that the impact of personalized medicine on disease prevention is intricate, warranting further analyses to determine the net benefits and costs. PMID:27528600
Linear genetic programming application for successive-station monthly streamflow prediction
NASA Astrophysics Data System (ADS)
Danandeh Mehr, Ali; Kahya, Ercan; Yerdelen, Cahit
2014-09-01
In recent decades, artificial intelligence (AI) techniques have emerged as a branch of computer science for modeling a wide range of hydrological phenomena. A number of studies are still comparing these techniques to find more effective approaches in terms of accuracy and applicability. In this study, we examined the ability of the linear genetic programming (LGP) technique to model the successive-station monthly streamflow process, as an applied alternative for streamflow prediction. A comparative efficiency study between LGP and three artificial neural network algorithms, namely feed forward back propagation (FFBP), generalized regression neural networks (GRNN), and radial basis function (RBF), is also presented. To this end, we first put forward six successive-station monthly streamflow prediction scenarios, trained by LGP and FFBP using field data recorded at two gauging stations on the Çoruh River, Turkey. Based on the Nash-Sutcliffe efficiency and root mean squared error, we then compared the efficiency of these techniques and selected the best prediction scenario. Finally, GRNN and RBF algorithms were used to restructure the selected scenario for comparison with the corresponding FFBP and LGP models. Our results indicate that LGP is promising for successive-station monthly streamflow prediction, providing more accurate results than all of the ANN algorithms. The best prediction model for the river was an explicit LGP-based expression, evolved using only basic arithmetic functions, that uses the records of both the target and upstream stations.
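The two criteria used to compare LGP with the ANN models have standard definitions: NSE = 1 − Σ(Q_obs − Q_sim)² / Σ(Q_obs − mean(Q_obs))², where 1 is a perfect fit and 0 means the model does no better than the observed mean, and RMSE = sqrt(mean squared error) in flow units. The sketch below computes both for hypothetical monthly flows; the numbers and function names are illustrative and are not taken from the Çoruh River data.

```python
import math
from typing import Sequence


def nash_sutcliffe(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of observations."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / ss_obs


def rmse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the flows."""
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(sse / len(observed))


# Hypothetical monthly flows (m^3/s) at a target station
q_obs = [10.0, 20.0, 30.0, 40.0]
q_sim = [12.0, 18.0, 33.0, 37.0]
print(nash_sutcliffe(q_obs, q_sim), rmse(q_obs, q_sim))

# A "model" that always predicts the observed mean scores NSE = 0
print(nash_sutcliffe(q_obs, [25.0] * 4))
```

Because NSE is normalized by the variance of the observations while RMSE keeps the flow units, the two are complementary: NSE makes stations with different flow magnitudes comparable, and RMSE conveys the typical error size.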
Breast cancer risks and risk prediction models.
Engel, Christoph; Fischer, Christine
2015-02-01
BRCA1/2 mutation carriers have a considerably increased risk of developing breast and ovarian cancer. The personalized clinical management of carriers and other at-risk individuals depends on precise knowledge of the cancer risks. In this report, we give an overview of the present literature on empirical cancer risks, and we describe risk prediction models that are currently used for individual risk assessment in clinical practice. Cancer risks show large variability between studies. Breast cancer risks range from 40% to 87% for BRCA1 mutation carriers and from 18% to 88% for BRCA2 mutation carriers. For ovarian cancer, risk estimates range from 22% to 65% for BRCA1 and from 10% to 35% for BRCA2. The contralateral breast cancer risk is high (10-year risk after a first cancer: 27% for BRCA1 and 19% for BRCA2). Risk prediction models have been proposed to provide more individualized risk prediction, using additional knowledge on family history, the mode of inheritance of major genes, and other genetic and non-genetic risk factors. User-friendly software tools have been developed that serve as a basis for decision-making in family counseling units. In conclusion, further assessment of cancer risks and model validation is needed, ideally based on prospective cohort studies. To obtain such data, clinical management of carriers and other at-risk individuals should always be accompanied by standardized scientific documentation.
A Crowdsourcing Approach to Developing and Assessing Prediction Algorithms for AML Prognosis
Noren, David P.; Long, Byron L.; Norel, Raquel; Rrhissorrakrai, Kahn; Hess, Kenneth; Hu, Chenyue Wendy; Bisberg, Alex J.; Schultz, Andre; Engquist, Erik; Liu, Li; Lin, Xihui; Chen, Gregory M.; Xie, Honglei; Hunter, Geoffrey A. M.; Norman, Thea; Friend, Stephen H.; Stolovitzky, Gustavo; Kornblau, Steven; Qutub, Amina A.
2016-01-01
Acute Myeloid Leukemia (AML) is a fatal hematological cancer. The genetic abnormalities underlying AML are extremely heterogeneous among patients, making prognosis and treatment selection very difficult. While clinical proteomics data have the potential to improve prognostic accuracy, quantitative means to do so have yet to be developed. Here we report the results and insights gained from the DREAM 9 Acute Myeloid Leukemia Outcome Prediction Challenge (AML-OPC), a crowdsourcing effort designed to promote the development of quantitative methods for AML prognosis prediction. We identify the most accurate and robust models for predicting patient response to therapy, remission duration, and overall survival. We further investigate patient response to therapy, a clinically actionable prediction, and find that, across the 31 models submitted to the challenge, patients classified as resistant to therapy are harder to predict than responsive patients. The top two performing models, which showed high sensitivity for these patients, made substantial use of the proteomics data. Using these models, we also identify which signaling proteins were useful in predicting patient therapeutic response. PMID:27351836
NASA Astrophysics Data System (ADS)
Jiao, Peng; Yang, Er; Ni, Yong Xin
2018-06-01
The overland flow resistance on a grassland slope of 20° was studied using simulated rainfall experiments. A model of the overland flow resistance coefficient was established based on a back-propagation (BP) neural network. The input variables of the model were rainfall intensity, flow velocity, water depth, and slope surface roughness, and the output variable was the overland flow resistance coefficient. The model was optimized with a genetic algorithm. The results show that the model can be used to calculate the overland flow resistance coefficient with high simulation accuracy. On the test set, the optimized model had an average prediction error of 8.02% and a maximum prediction error of 18.34%.
Single Nucleotide Polymorphisms Predict Symptom Severity of Autism Spectrum Disorder
ERIC Educational Resources Information Center
Jiao, Yun; Chen, Rong; Ke, Xiaoyan; Cheng, Lu; Chu, Kangkang; Lu, Zuhong; Herskovits, Edward H.
2012-01-01
Autism is widely believed to be a heterogeneous disorder; diagnosis is currently based solely on clinical criteria, although genetic, as well as environmental, influences are thought to be prominent factors in the etiology of most forms of autism. Our goal is to determine whether a predictive model based on single-nucleotide polymorphisms (SNPs)…
Stream Flow Prediction by Remote Sensing and Genetic Programming
NASA Technical Reports Server (NTRS)
Chang, Ni-Bin
2009-01-01
A genetic programming (GP)-based, nonlinear modeling structure relates soil moisture to synthetic-aperture-radar (SAR) images to produce representative soil moisture estimates at the watershed scale. Surface soil moisture measurements are difficult to obtain over large areas because soil permeability and soil texture vary widely. Point measurements can be used over small areas, but it is impossible to acquire such information effectively in large-scale watersheds. This model is able to assimilate SAR images and relevant geoenvironmental parameters to estimate soil moisture.
Genetic interaction networks: better understand to better predict
Boucher, Benjamin; Jenna, Sarah
2013-01-01
A genetic interaction (GI) between two genes generally indicates that the phenotype of a double mutant differs from what is expected from each individual mutant. In the last decade, genome-scale studies of quantitative GIs were completed, mainly using synthetic genetic array technology and RNA interference in yeast and Caenorhabditis elegans. These studies raised questions regarding the functional interpretation of GIs, the relationship between genetic and molecular interaction networks, the usefulness of GI networks for inferring gene function and co-functionality, the evolutionary conservation of GIs, and more. While GIs have been used for decades to dissect signaling pathways in genetic models, their functional interpretation is still not trivial. The existence of a GI between two genes does not necessarily imply that these two genes code for interacting proteins, or even that the two genes are expressed in the same cell. In fact, a GI only implies that the two genes share a functional relationship. These two genes may be involved in the same biological process or pathway, or they may be involved in compensatory pathways with apparently unrelated functions. Because genome-wide mapping of GIs offers a powerful opportunity to better understand gene function, genetic relationships, robustness and evolution, several in silico approaches have been employed to predict GIs in unicellular and multicellular organisms. Most of these methods used weighted data integration. In this article, we review the latest knowledge acquired on GI networks in metazoans. We look closely at their relationship with pathways, biological processes and molecular complexes, and also at their modularity and organization. We also review the different in silico methods developed to predict GIs and discuss how the knowledge acquired on GI networks can be used to design predictive tools with higher performance. PMID:24381582
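One common way to make "differs from what is expected" concrete is to score a GI as the deviation of the observed double-mutant fitness from a null expectation built from the single mutants. The sketch below assumes a multiplicative null model on relative fitness, epsilon = W_ab - W_a * W_b. This is only one convention; additive and minimum models are also used. The function names and the ±0.05 threshold are illustrative assumptions and are not taken from the review.

```python
def expected_double_mutant(w_a: float, w_b: float) -> float:
    """Multiplicative null model: expected relative fitness of the double mutant."""
    return w_a * w_b


def gi_score(w_a: float, w_b: float, w_ab: float) -> float:
    """Epsilon score: observed double-mutant fitness minus the null expectation."""
    return w_ab - expected_double_mutant(w_a, w_b)


def classify_gi(eps: float, threshold: float = 0.05) -> str:
    """Label an interaction; the threshold is a hypothetical cut-off."""
    if eps < -threshold:
        return "negative (aggravating)"
    if eps > threshold:
        return "positive (alleviating)"
    return "no interaction"


if __name__ == "__main__":
    # Illustrative single-mutant fitnesses 0.8 and 0.9 give an expectation of 0.72
    for w_ab in (0.50, 0.72, 0.85):
        eps = gi_score(0.8, 0.9, w_ab)
        print(f"W_ab={w_ab:.2f}  epsilon={eps:+.2f}  {classify_gi(eps)}")
```

A negative score means the double mutant is sicker than the single mutants predict, and a positive score means it is healthier. As the abstract stresses, neither outcome by itself implies that the two gene products physically interact.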
Bayesian Networks Predict Neuronal Transdifferentiation.
Ainsworth, Richard I; Ai, Rizi; Ding, Bo; Li, Nan; Zhang, Kai; Wang, Wei
2018-05-30
We employ the language of Bayesian networks to systematically construct gene-regulation topologies from deep-sequencing single-nucleus RNA-Seq data for human neurons. From the perspective of the cell-state potential landscape, we identify attractors that correspond closely to different neuron subtypes. Attractors are also recovered for cell states from an independent data set (not included in the training data), confirming that our model accurately describes global genetic regulation across differing cell types of the neocortex. Our model recovers experimentally confirmed genetic regulations, and community analysis reveals genetic associations in common pathways. Via a comprehensive scan of all theoretical three-gene perturbations of gene knockout and overexpression, we discover novel neuronal transdifferentiation recipes (including perturbations of SATB2, GAD1, POU6F2 and ADARB2) for excitatory projection neuron and inhibitory interneuron subtypes. Copyright © 2018, G3: Genes, Genomes, Genetics.
Smith, Rachel A.; Greenberg, Marisa; Parrott, Roxanne L.
2014-01-01
With a growing interest in using genetic information to motivate young adults’ health behaviors, audience segmentation is needed for effective campaign design. Using latent class analysis, this study identifies segments based on young adults’ (N = 327) beliefs about genetic threats to their health and personal efficacy over genetic influences on their health. A four-class model was identified. The model indicators fit the risk perception attitude framework (Rimal & Real, 2003), but the covariates (e.g., current health behaviors) did not. In addition, opinion leader qualities covaried with one profile. Those in this profile engaged in fewer preventative behaviors and more dangerous treatment options, and they also liked to persuade others, making them a particularly salient group for campaign efforts. The implications for adult-onset disorders, such as alpha-1 antitrypsin deficiency, are discussed. PMID:24111749
Neuro-genetic non-invasive temperature estimation: intensity and spatial prediction.
Teixeira, César A; Ruano, M Graça; Ruano, António E; Pereira, Wagner C A
2008-06-01
Reliable non-invasive temperature estimators are essential for thermal therapy applications. These estimators must be good predictors that enable temperature estimation in different operating situations, which provides better control of the therapeutic instrumentation. In this work, radial basis function (RBF) artificial neural networks were constructed to assess temperature evolution in an ultrasound-insonated medium. The models were RBF neural networks with external dynamics induced by their inputs. Both the most suitable set of model inputs and the number of neurons in the network were found using a multi-objective genetic algorithm. The neural models were validated in two settings: the operating situations used to construct the networks, and 11 unseen situations. The new data covered two new spatial locations and a new intensity level, which tested the model's capacity to predict across intensity and space. Good performance was obtained during validation, both for the spatial points considered and whenever the new intensity level was within the range of applied intensities. A maximum absolute error of 0.5 °C ± 10% (0.5 °C is the gold-standard threshold in hyperthermia/diathermia) was attained with models of low computational complexity. The results confirm that the proposed neuro-genetic approach can predict temperature propagation as a function of intensity and spatial parameters. This enables different operating situations to be assessed with proper temperature resolution.
Saavedra-Sotelo, Nancy C; Calderon-Aguilera, Luis E; Reyes-Bonilla, Héctor; Paz-García, David A; López-Pérez, Ramón A; Cupul-Magaña, Amilcar; Cruz-Barraza, José A; Rocha-Olivares, Axayácatl
2013-01-01
The coral fauna of the Eastern Tropical Pacific (ETP) is depauperate and peripheral; hence, the factors allowing its survival have drawn attention. Here, we use a genetic seascape approach and ecological niche modeling to unravel the environmental factors correlating with the genetic variation of Porites panamensis, a hermatypic coral endemic to the ETP. Specifically, we test whether levels of diversity and connectivity are higher among abundant than among depauperate populations, as expected under a geographically relaxed version of the Abundant Center Hypothesis (rel-ACH). The original ACH refers to a geographical center of distribution with maximal abundance. The rel-ACH, by contrast, refers only to a center of maximum abundance, irrespective of its geographic position. The patterns of relative abundance of P. panamensis in the Mexican Pacific revealed that northern populations from Baja California represent its center of abundance, and that southern depauperate populations along the continental margin are peripheral relative to it. Genetic patterns of diversity and structure of nuclear DNA sequences (ribosomal DNA and a single-copy open reading frame) and five alloenzymatic loci partially agreed with rel-ACH predictions. We found higher diversity levels in peninsular populations and significant differentiation between peninsular and continental colonies. In addition, continental populations showed higher levels of differentiation and lower connectivity than peninsular populations, in the absence of isolation by distance in each region. Some discrepancies with model expectations may relate to the influence of significant habitat discontinuities in the face of limited dispersal potential. Environmental data analyses and niche modeling allowed us to identify temperature, water clarity, and substrate availability as the main factors correlating with patterns of abundance, genetic diversity, and structure. These factors may hold the key to the survival of P. panamensis in the face of widespread environmental degradation. PMID:24324860
Tanaka, Keiko; Sekijima, Yoshiki; Yoshida, Kunihiro; Mizuuchi, Asako; Yamashita, Hiromi; Tamai, Mariko; Ikeda, Shu-ichi; Fukushima, Yoshimitsu
2013-01-01
The current status of predictive genetic testing for late-onset hereditary neurological diseases in Japan is largely unknown. In this study, we analyzed data from 73 clients who visited the Division of Clinical and Molecular Genetics, Shinshu University Hospital, for predictive genetic testing. The clients had family histories of familial amyloid polyneuropathy (FAP; n=30), Huntington's disease (HD; n=16), spinocerebellar degeneration (SCD; n=14), myotonic dystrophy type 1 (DM1; n=9), familial amyotrophic lateral sclerosis type 1 (ALS1; n=3), and Alzheimer's disease (AD; n=1). Forty-nine of the 73 (67.1%) clients were in their twenties or thirties. Twenty-seven of the 73 (37.0%) clients visited a medical institution within 3 months after becoming aware of predictive genetic testing. The most common reason for requesting predictive genetic testing was a need for certainty or a wish to reduce uncertainty and anxiety. Decisions about marriage and having children were also major reasons for clients in their twenties and thirties. The number of clients who actually underwent predictive genetic testing was 22 of 30 (73.3%) in FAP, 3 of 16 (18.8%) in HD, 6 of 10 (60.0%) in SCD, 7 of 9 (77.8%) in DM1, and 0 of 3 (0%) in ALS1. The responsible gene was unknown in 4 SCD patients and in the AD patient. Test uptake was lower for untreatable diseases such as HD and SCD than for FAP. This suggests that many clients changed their view of the significance of testing over multiple genetic counseling sessions. In addition, the existence of disease-modifying therapy evidently promoted the use of predictive genetic testing in FAP. Genetic counseling systems for managing predictive genetic testing need to be improved, because consultation about predictive genetic testing is the main motivation for many at-risk clients to visit a genetic counseling clinic.
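The reported proportions can be rechecked directly from the counts given in the abstract. The sketch below recomputes them; the helper name `percent` and the dictionary layout are illustrative choices, not part of the study. The SCD denominator is 10 rather than 14 because the responsible gene was unknown in 4 SCD cases.

```python
# Counts taken from the abstract: (underwent testing, clients eligible for testing)
uptake = {
    "FAP": (22, 30),
    "HD": (3, 16),
    "SCD": (6, 10),   # 14 SCD clients, but the responsible gene was unknown in 4
    "DM1": (7, 9),
    "ALS1": (0, 3),
}

# Clients by family history of disease
clients = {"FAP": 30, "HD": 16, "SCD": 14, "DM1": 9, "ALS1": 3, "AD": 1}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


if __name__ == "__main__":
    total = sum(clients.values())
    print(f"total clients: {total}")
    print(f"in their 20s or 30s: {percent(49, total):.1f}%")
    print(f"visited within 3 months: {percent(27, total):.1f}%")
    for disease, (tested, eligible) in uptake.items():
        print(f"{disease}: {tested}/{eligible} = {percent(tested, eligible):.1f}%")
```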
Model-based analysis of N-glycosylation in Chinese hamster ovary cells
Krambeck, Frederick J.; Bennun, Sandra V.; Betenbaugh, Michael J.
2017-01-01
The Chinese hamster ovary (CHO) cell is the gold standard for manufacturing glycosylated recombinant proteins for the production of biotherapeutics. Owing to the similarity of its glycosylation patterns to human patterns, the products of this cell line have favorable pharmacokinetic properties and a lower likelihood of causing immunogenic responses. Because glycan structures are the product of the concerted action of intracellular enzymes, it is difficult to predict a priori how genetic manipulations will alter cellular glycan structures and therapeutic properties. For that reason, quantitative models able to predict glycosylation have emerged as promising tools for dealing with the complexity of glycosylation processing. For example, others used an earlier version of the model in this study to successfully predict changes in enzyme activities that could produce a desired change in glycan structure. In this study, we use an updated version of this model to provide a comprehensive analysis of N-glycosylation in ten CHO cell lines, a wild-type parent and nine mutants, through interpretation of previously published mass spectrometry data. The updated N-glycosylation mathematical model contains up to 50,605 glycan structures. Adjusting the enzyme activities in this model to match N-glycan mass spectra produces detailed predictions of the glycosylation process, enzyme activity profiles and complete glycosylation profiles of each of the cell lines. These profiles are consistent with previously reported biochemical and genetic data. The model-based results also predict previously unpublished glycosylation features of the cell lines, which indicates more complex changes in glycosylation enzyme activities than those resulting directly from gene mutations. The model predicts that the CHO cell lines possess regulatory mechanisms that allow them to adjust glycosylation enzyme activities. These mechanisms would mitigate side effects of the primary loss or gain of glycosylation function known to exist in these mutant cell lines. Quantitative models of CHO cell glycosylation could predict how glycoengineering manipulations might affect glycoform distributions, and so help improve the therapeutic performance of glycoprotein products. PMID:28486471
Predicting paclitaxel-induced neutropenia using the DMET platform.
Nieuweboer, Annemieke J M; Smid, Marcel; de Graan, Anne-Joy M; Elbouazzaoui, Samira; de Bruijn, Peter; Martens, John W; Mathijssen, Ron H J; van Schaik, Ron H N
2015-01-01
The use of paclitaxel in cancer treatment is limited by paclitaxel-induced neutropenia. We investigated the ability of genetic variation in drug-metabolizing enzymes and transporters to predict hematological toxicity. Using a discovery and validation approach, we identified a pharmacogenetic predictive model for neutropenia. For this, we used the Drug-Metabolizing Enzymes and Transporters (DMET) Plus DNA chip, which contains 1936 SNPs in 225 metabolic enzyme and drug-transporter genes. Our 10-SNP model in 279 paclitaxel-dosed patients reached 43% sensitivity in the validation cohort. Restricting the analysis to patients treated every 3 weeks improved sensitivity to 79%, with a specificity of 33%. None of our models reached statistical significance. Our DMET-based SNP models are currently of limited value for predicting paclitaxel-induced neutropenia in clinical practice. Original submitted 9 March 2015; Revision submitted 20 May 2015.
Mirkhani, Seyyed Alireza; Gharagheizi, Farhad; Sattari, Mehdi
2012-03-01
Evaluation of the diffusion coefficients of pure compounds in air is of great interest for many diverse industrial and air quality control applications. In this communication, a quantitative structure-property relationship (QSPR) method is applied to predict the molecular diffusivity of chemical compounds in air at 298.15 K and atmospheric pressure. A total of 4579 organic compounds from a broad spectrum of chemical families were investigated to propose a comprehensive and predictive model. The final model is derived by Genetic Function Approximation (GFA) and contains five descriptors. The model reproduces existing experimental values satisfactorily, with the following statistics for the predicted properties: Squared Correlation Coefficient = 0.9723, Standard Deviation Error = 0.003, and Average Absolute Relative Deviation = 0.3%. Copyright © 2011 Elsevier Ltd. All rights reserved.
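The abstract reports its fit statistics without defining them. The sketch below assumes the common definitions:

- AARD (%) = (100/N) Σ |y_pred - y_exp| / |y_exp|.
- The squared correlation coefficient is the square of Pearson's r between predicted and experimental values.

It is an assumption that the authors used exactly these formulas. The diffusivity values in the usage lines are made up and are not data from the study.

```python
from typing import Sequence


def aard_percent(predicted: Sequence[float], experimental: Sequence[float]) -> float:
    """Average absolute relative deviation of predictions from experiment, in percent."""
    if len(predicted) != len(experimental) or not experimental:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(p - e) / abs(e) for p, e in zip(predicted, experimental))
    return 100.0 * total / len(experimental)


def squared_correlation(predicted: Sequence[float], experimental: Sequence[float]) -> float:
    """Square of Pearson's correlation coefficient between predictions and data."""
    n = len(experimental)
    if len(predicted) != n or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    return cov ** 2 / (var_p * var_e)


if __name__ == "__main__":
    # Hypothetical diffusivities in air (cm^2/s), for illustration only
    exp_vals = [0.080, 0.065, 0.072, 0.051]
    pred_vals = [0.081, 0.064, 0.072, 0.052]
    print(f"AARD = {aard_percent(pred_vals, exp_vals):.2f}%")
    print(f"R^2  = {squared_correlation(pred_vals, exp_vals):.4f}")
```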
Choisy, Marc; de Roode, Jacobus C
2014-08-01
Animal medication against parasites can occur either as a genetically fixed (constitutive) or phenotypically plastic (induced) behavior. Taking the tritrophic interaction between the monarch butterfly Danaus plexippus, its protozoan parasite Ophryocystis elektroscirrha, and its food plant Asclepias spp. as a test case, we develop a game-theory model to identify the epidemiological (parasite prevalence and virulence) and environmental (plant toxicity and abundance) conditions that predict the evolution of genetically fixed versus phenotypically plastic forms of medication. Our model shows that the relative benefits (the antiparasitic properties of medicinal food) and costs (side effects of medicine, the costs of searching for medicine, and the costs of plasticity itself) crucially determine whether medication is genetically fixed or phenotypically plastic. Our model suggests that animals evolve phenotypic plasticity when parasite risk (a combination of virulence and prevalence and thus a measure of the strength of parasite-mediated selection) is relatively low to moderately high and genetically fixed medication when parasite risk becomes very high. The latter occurs because at high parasite risk, the costs of plasticity are outweighed by the benefits of medication. Our model provides a simple and general framework to study the conditions that drive the evolution of alternative forms of animal medication.
LaBuda, M C; DeFries, J C; Plomin, R; Fulker, D W
1986-10-01
A path model of genetic and shared family environmental transmission was fitted to general cognitive ability data from 1-, 2-, 3-, and 4-year-old adopted and nonadopted children and their parents in order to assess the etiology of longitudinal stability from infancy to early childhood. Stability across years is moderate and is due mainly to influences not predicted by parental IQ. Results of the present study, in conjunction with those of previous twin studies, suggest substantial genetic stability from infancy and early childhood to adulthood.
Bouvet, J-M; Makouanzi, G; Cros, D; Vigneron, Ph
2016-01-01
Hybrids are broadly used in plant breeding, and accurate estimation of variance components is crucial for optimizing genetic gain. Genome-wide information may be used to explore models designed to assess the extent of additive and non-additive variance and to test their prediction accuracy for genomic selection. Ten linear mixed models, involving pedigree- and marker-based relationship matrices among parents, were developed to estimate additive (A), dominance (D) and epistatic (AA, AD and DD) effects. Five complementary models, involving the gametic phase to estimate marker-based relationships among hybrid progenies, were developed to assess the same effects. The models were compared using tree height and 3303 single-nucleotide polymorphism markers from 1130 cloned individuals obtained via controlled crosses of 13 Eucalyptus urophylla females with 9 Eucalyptus grandis males. Akaike information criterion (AIC), variance ratios, asymptotic correlation matrices of estimates, goodness-of-fit, prediction accuracy and mean square error (MSE) were used for the comparisons. The variance components and variance ratios differed according to the model. Models with a parent marker-based relationship matrix performed better than pedigree-based models. They showed an absence of singularities, lower AIC, higher goodness-of-fit and accuracy, and smaller MSE. However, AD and DD variances were estimated with high standard errors. Using the same criteria, progeny gametic phase-based models performed better in fitting the observations and predicting genetic values. However, DD variance could not be separated from the dominance variance, and null estimates were obtained for AA and AD effects. This study highlighted the advantages of progeny models using genome-wide information. PMID:26328760
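Among the criteria listed, AIC trades goodness of fit against the number of estimated parameters. It is computed as AIC = 2k - 2 ln L, where L is the maximized likelihood, and lower values are preferred. The sketch below ranks models by that formula.

- The model labels mirror the A, D and AA effects named above.
- The log-likelihoods and parameter counts are invented for illustration and are not results from the study.
- The sketch assumes the likelihoods are comparable across models, for example the same data and the same fixed effects when REML is used.

```python
from typing import Dict, List, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def rank_by_aic(fits: Dict[str, Tuple[float, int]]) -> List[Tuple[str, float]]:
    """Return (model, AIC) pairs sorted from best (lowest AIC) to worst."""
    scores = [(name, aic(ll, k)) for name, (ll, k) in fits.items()]
    return sorted(scores, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical fits: (maximized log-likelihood, number of variance parameters)
    fits = {
        "A": (-120.0, 2),
        "A+D": (-115.0, 3),
        "A+D+AA": (-114.5, 4),
    }
    for name, score in rank_by_aic(fits):
        print(f"{name:8s} AIC = {score:.1f}")
```

With these made-up numbers, adding the AA term raises the log-likelihood by only 0.5. That gain does not offset the penalty for one extra parameter, so A+D ranks first.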
Conomos, Matthew P; Miller, Michael B; Thornton, Timothy A
2015-05-01
Population structure inference with genetic data has been motivated by a variety of applications in population genetics and genetic association studies. Several approaches have been proposed for the identification of genetic ancestry differences in samples where study participants are assumed to be unrelated, including principal components analysis (PCA), multidimensional scaling (MDS), and model-based methods for proportional ancestry estimation. Many genetic studies, however, include individuals with some degree of relatedness, and existing methods for inferring genetic ancestry fail in related samples. We present a method, PC-AiR, for robust population structure inference in the presence of known or cryptic relatedness. PC-AiR utilizes genome-screen data and an efficient algorithm to identify a diverse subset of unrelated individuals that is representative of all ancestries in the sample. The PC-AiR method directly performs PCA on the identified ancestry representative subset and then predicts components of variation for all remaining individuals based on genetic similarities. In simulation studies and in applications to real data from Phase III of the HapMap Project, we demonstrate that PC-AiR provides a substantial improvement over existing approaches for population structure inference in related samples. We also demonstrate significant efficiency gains, where a single axis of variation from PC-AiR provides better prediction of ancestry in a variety of structure settings than using 10 (or more) components of variation from widely used PCA and MDS approaches. Finally, we illustrate that PC-AiR can provide improved population stratification correction over existing methods in genetic association studies with population structure and relatedness. © 2015 WILEY PERIODICALS, INC.
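The projection step described above can be sketched with NumPy: PCA is performed on an ancestry-representative unrelated subset, and the remaining individuals are then placed on the same axes. This is a simplified illustration, not the PC-AiR implementation.

- It assumes the unrelated subset is already known, whereas PC-AiR selects it from kinship and ancestry-divergence estimates.
- It uses a simple allele-frequency standardization.
- The function name `project_pcs` is hypothetical.
- In the toy data all individuals are simulated independently, so the "remaining" rows only stand in for relatives.

```python
import numpy as np


def project_pcs(genotypes, unrelated_idx, n_pcs=2):
    """PCA on an unrelated subset, then projection of every individual onto its axes.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts.
    unrelated_idx: row indices of the ancestry-representative unrelated subset.
    """
    G = np.asarray(genotypes, dtype=float)
    ref = G[unrelated_idx]
    p = ref.mean(axis=0) / 2.0          # allele frequencies estimated in the subset only
    keep = (p > 0.0) & (p < 1.0)        # drop SNPs that are monomorphic in the subset
    p = p[keep]
    Z = (G[:, keep] - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # Principal axes are computed from the unrelated subset alone
    _, _, vt = np.linalg.svd(Z[unrelated_idx], full_matrices=False)
    axes = vt[:n_pcs].T                 # shape (n_kept_snps, n_pcs)
    # All individuals, including the "relatives", are projected onto those axes
    return Z @ axes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pop_a = rng.binomial(2, 0.1, size=(20, 200))  # simulated population A
    pop_b = rng.binomial(2, 0.9, size=(20, 200))  # simulated population B
    G = np.vstack([pop_a, pop_b])
    unrelated = list(range(10)) + list(range(20, 30))
    pcs = project_pcs(G, unrelated)
    print(pcs[:, 0].round(1))
```

Because the axes come only from the unrelated subset, close relatives cannot pull a component toward their own family structure. They are positioned afterwards, which mirrors the motivation given in the abstract.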
A genetic algorithm approach in interface and surface structure optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jian
The thesis is divided into two parts. In the first part, a global optimization method is developed for optimizing interface and surface structures. Two prototype systems are studied: Si[001] symmetric tilted grain boundaries and the Ag/Au-induced Si(111) surface. The Genetic Algorithm is found to be very efficient in finding the lowest-energy structures in both cases. It can reproduce structures observed in experiments, and it can also predict many new structures. It is thus shown that the Genetic Algorithm is an extremely powerful tool for material structure prediction. The second part of the thesis is devoted to explaining an experimental observation of thermal radiation from three-dimensional tungsten photonic crystal structures. The experimental results seem astounding and confusing. However, the theoretical models in the paper reveal the physical insight behind the phenomena and reproduce the experimental results well.
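The Genetic Algorithm mentioned above can be illustrated with a minimal real-valued version, in which a population of candidates evolves by selection, crossover and mutation toward lower energy. This is a generic toy sketch, not the method used in the thesis.

- The energy function is an assumption chosen for illustration: a simple quadratic with its minimum at (1, 1, 1).
- All parameter values (population size, number of generations, bounds, mutation spread) are also illustrative assumptions.
- A real structure search would instead encode atomic coordinates and evaluate an interatomic potential or first-principles energy.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def genetic_minimize(
    energy: Callable[[Vector], float],
    dim: int,
    pop_size: int = 40,
    generations: int = 200,
    bounds: Tuple[float, float] = (-5.0, 5.0),
    mutation_sd: float = 0.3,
    seed: int = 0,
) -> Vector:
    """Return the lowest-energy candidate found by a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]

    def tournament() -> Vector:
        # Binary tournament selection: the lower-energy of two random candidates wins
        a, b = rng.sample(pop, 2)
        return a if energy(a) < energy(b) else b

    for _ in range(generations):
        pop.sort(key=energy)
        next_pop = pop[:2]  # elitism: the two best candidates survive unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover: each gene is copied from one parent at random
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            # Gaussian mutation on every gene, clamped to the search bounds
            child = [min(hi, max(lo, c + rng.gauss(0.0, mutation_sd))) for c in child]
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=energy)


if __name__ == "__main__":
    toy_energy = lambda v: sum((x - 1.0) ** 2 for x in v)  # hypothetical energy surface
    best = genetic_minimize(toy_energy, dim=3, seed=1)
    print([round(x, 2) for x in best], round(toy_energy(best), 4))
```

Elitism keeps the best energy from ever getting worse between generations. Crossover and mutation supply the global exploration that makes this family of methods attractive for rugged energy landscapes.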
He, Jie; Zhao, Yunfeng; Zhao, Jingli; Gao, Jin; Xu, Pao; Yang, Runqing
2018-02-01
To genetically analyse growth traits in genetically improved farmed tilapia (GIFT), body weight (BWE) and the main morphological traits were measured six times during the growth period on 1451 fish from 45 mixed families of full and half sibs. The morphological traits were body length (BL), body depth (BD), body width (BWI), head length (HL) and length of the caudal peduncle (CPL). A random regression model (RRM) was used to model genetic changes in the growth traits with days of age. It was also used to estimate the heritability at any growth point and the genetic correlations between pairs of growth points. Using the covariance function based on the optimal RRMs, heritabilities between 60 and 140 days of age were estimated as follows:

- 0.102 to 0.662 for BWE
- 0.157 to 0.591 for BL
- 0.047 to 0.621 for BD
- 0.018 to 0.577 for BWI
- 0.075 to 0.597 for HL
- 0.032 to 0.610 for CPL

All genetic correlations between pairs of growth points exceeded 0.5. Moreover, traits measured at early ages were less correlated with those measured at later ages. With phenotypes observed repeatedly, model comparison showed that the optimal RRMs predicted breeding values at a specific growth time more precisely than repeatability models or multiple-trait animal models. This enhanced the efficiency of selection for BWE and the main morphological traits.
Combest, Austin J.; Roberts, Patrick J.; Dillon, Patrick M.; Sandison, Katie; Hanna, Suzan K.; Ross, Charlene; Habibi, Sohrab; Zamboni, Beth; Müller, Markus; Brunner, Martin; Sharpless, Norman E.
2012-01-01
Background. Rodent studies are a vital step in the development of novel anticancer therapeutics and are used in pharmacokinetic (PK), toxicology, and efficacy studies. Traditionally, anticancer drug development has relied on xenograft implantation of human cancer cell lines in immunocompromised mice for efficacy screening of a candidate compound. The usefulness of xenograft models for efficacy testing, however, has been questioned, whereas genetically engineered mouse models (GEMMs) and orthotopic syngeneic transplants (OSTs) may offer some advantages for efficacy assessment. A critical factor influencing the predictive value of rodent tumor models is drug PK. However, a comprehensive comparison of plasma and tumor PK parameters among xenograft models, OSTs, GEMMs, and human patients has not been performed. Methods. In this work, we evaluated the plasma and tumor disposition of an antimelanoma agent, carboplatin, in patients with cutaneous melanoma and compared it with four different murine melanoma models (one GEMM, one human cell line xenograft, and two OSTs). Results. Using microdialysis to sample carboplatin tumor disposition, we found that OSTs and xenografts were poor predictors of drug exposure in human tumors, whereas the GEMM exhibited PK parameters similar to those seen in human tumors. Conclusions. The tumor PK of carboplatin in a GEMM of melanoma more closely resembles the tumor disposition in patients with melanoma than that in transplanted tumor models. GEMMs show promise as improved prediction models for intratumoral PK and response in patients with solid tumors. PMID:22993143
Critical Buckling Pressure in Mouse Carotid Arteries with Altered Elastic Fibers
Luetkemeyer, Callan M.; James, Rhys H.; Devarakonda, Siva Teja; Le, Victoria P.; Liu, Qin; Han, Hai-Chao; Wagenseil, Jessica E.
2015-01-01
Arteries can buckle axially under applied critical buckling pressure due to a mechanical instability. Buckling can cause arterial tortuosity leading to flow irregularities and stroke. Genetic mutations in elastic fiber proteins are associated with arterial tortuosity in humans and mice, and may be the result of alterations in critical buckling pressure. Hence, the objective of this study is to investigate how genetic defects in elastic fibers affect buckling pressure. We use mouse models of human disease with reduced amounts of elastin (Eln+/−) and with defects in elastic fiber assembly due to the absence of fibulin-5 (Fbln5−/−). We find that Eln+/− arteries have reduced buckling pressure compared to their wild-type controls. Fbln5−/− arteries have similar buckling pressure to wild-type at low axial stretch, but increased buckling pressure at high stretch. We fit material parameters to mechanical test data for Eln+/−, Fbln5−/− and wild-type arteries using Fung and four-fiber strain energy functions. Fitted parameters are used to predict theoretical buckling pressure based on equilibrium of an inflated, buckled, thick-walled cylinder. In general, the theoretical predictions underestimate the buckling pressure at low axial stretch and overestimate the buckling pressure at high stretch. The theoretical predictions with both models replicate the increased buckling pressure at high stretch for Fbln5−/− arteries, but the four-fiber model predictions best match the experimental trends in buckling pressure changes with axial stretch. This study provides experimental and theoretical methods for further investigating the influence of genetic mutations in elastic fibers on buckling behavior and the development of arterial tortuosity. PMID:25771258
NASA Astrophysics Data System (ADS)
Friedel, Michael; Buscema, Massimo
2016-04-01
Aquatic ecosystem models can potentially be used to understand the influence of stresses on catchment resource quality. Given that catchment responses are functions of natural and anthropogenic stresses reflected in sparse and spatiotemporal biological, physical, and chemical measurements, an ecosystem is difficult to model using statistical or numerical methods. We propose an artificial adaptive systems approach to model ecosystems. First, an unsupervised machine-learning (ML) network is trained using the set of available sparse and disparate data variables. Second, an evolutionary algorithm with genetic doping is applied to reduce the number of ecosystem variables to an optimal set. Third, the optimal set of ecosystem variables is used to retrain the ML network. Fourth, a stochastic cross-validation approach is applied to quantify and compare the nonlinear uncertainty in selected predictions of the original and reduced models. Results are presented for aquatic ecosystems (tens of thousands of square kilometers) undergoing landscape change in the USA: Upper Illinois River Basin and Central Colorado Assessment Project Area, and Southland region, NZ.
Carnahan, Brian; Meyer, Gérard; Kuntz, Lois-Ann
2003-01-01
Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.
Schmitt, Thomas; Habel, Jan Christian; Rödder, Dennis; Louy, Dirk
2014-07-01
Past climatic fluctuations have driven important genetic differentiation in mountain species. The genetic uniqueness of many of these lineages is now at risk due to global warming. Here, we analyse allozyme polymorphisms of 1306 individuals (36 populations) of the mountain butterfly Erebia manto and perform Species Distribution Models (SDMs). As a consensus across analyses, we obtained six most likely genetic clusters: (i) Pyrenees with Massif Central; (ii) Vosges; (iii-v) Alps including the Slovakian Carpathians; (vi) southern Carpathians. The Vosges population showed the strongest genetic split from all other populations, almost as strong as the split between E. manto and its sister species Erebia eriphyle. The distinctiveness of the Pyrenees-Massif Central group and of the southern Carpathians group from all other groups is also quite high. All three groups are assumed to have survived more than one full glacial-interglacial cycle close to their current distributions, with uphill and downslope shifts tracking climatic conditions. In contrast with these well-differentiated groups, the three groups present in the Alps and the Slovakian Carpathians show a much shallower genetic structure and are therefore likely of more recent origin. As predicted by our SDM projections, rising temperatures will strongly impact the distribution of E. manto. While the populations in the Alps are predicted to shrink, the survival of the three lineages present there should not be at risk. The situation of the three other lineages is quite different. All models predict the extinction of the Vosges lineage in the wake of global warming. The southern Carpathians and Pyrenees-Massif Central lineages might also be at high risk of disappearing. Thus, although global warming is unlikely to threaten E. manto as a species, an important proportion of the species' intraspecific differentiation, and thus its uniqueness, might be lost. © 2014 John Wiley & Sons Ltd.
The ethics of disclosing genetic diagnosis for Alzheimer's disease: do we need a new paradigm?
Arribas-Ayllon, Michael
2011-01-01
Genetic testing for rare Mendelian disorders represents the dominant ethical paradigm in clinical and professional practice. Predictive testing for Huntington's disease is the model against which other kinds of genetic testing are evaluated, including testing for Alzheimer's disease. This paper retraces the historical development of ethical reasoning in relation to predictive genetic testing and reviews a range of ethical, sociological and psychological literature from the 1970s to the present. In the past, ethical reasoning has embodied a distinct style whereby normative principles are developed from a dominant disease exemplar. This reductionist approach to formulating ethical frameworks breaks down in the case of disease susceptibility. Recent developments in the genetics of Alzheimer's disease present a significant case for reconsidering the ethics of disclosing risk for common complex diseases. Disclosing the results of susceptibility testing for Alzheimer's disease has different social, psychological and behavioural consequences from those of predictive testing for Huntington's disease. Furthermore, what genetic susceptibility means to individuals and their families is diffuse and often mitigated by other factors and concerns. The ethics of disclosing a genetic diagnosis of susceptibility is contingent on whether professionals accept that probabilistic risk information is in fact 'diagnostic', and will rely substantially on empirical evidence of how people actually perceive, recall and communicate complex risk information.
Evolutionary genetics of host shifts in herbivorous insects: insights from the age of genomics.
Vertacnik, Kim L; Linnen, Catherine R
2017-02-01
Adaptation to different host taxa is a key driver of insect diversification. Herbivorous insects are classic models for ecological and evolutionary research, but it is recent advances in sequencing, statistics, and molecular technologies that have cleared the way for investigations into the proximate genetic mechanisms underlying host shifts. In this review, we discuss how genome-scale data are revealing, at resolutions previously unimaginable, the genetic architecture of host-use traits, the causal loci underlying host shifts, and the predictability of host-use evolution. Collectively, these studies are providing novel insights into longstanding questions about host-use evolution. On the basis of this synthesis, we suggest that different host-use traits are likely to differ in their genetic architecture (number of causal loci and the nature of their genetic correlations) and genetic predictability (extent of gene or mutation reuse), indicating that any conclusions about the causes and consequences of host-use evolution will depend heavily on which host-use traits are investigated. To draw robust conclusions and identify general patterns in host-use evolution, we argue that investigation of diverse host-use traits and identification of causal genes and mutations should be the top priorities for future studies on the evolutionary genetics of host shifts. © 2017 New York Academy of Sciences.
Wu, Lang; Shi, Wei; Long, Jirong; Guo, Xingyi; Michailidou, Kyriaki; Beesley, Jonathan; Bolla, Manjeet K; Shu, Xiao-Ou; Lu, Yingchang; Cai, Qiuyin; Al-Ejeh, Fares; Rozali, Esdy; Wang, Qin; Dennis, Joe; Li, Bingshan; Zeng, Chenjie; Feng, Helian; Gusev, Alexander; Barfield, Richard T; Andrulis, Irene L; Anton-Culver, Hoda; Arndt, Volker; Aronson, Kristan J; Auer, Paul L; Barrdahl, Myrto; Baynes, Caroline; Beckmann, Matthias W; Benitez, Javier; Bermisheva, Marina; Blomqvist, Carl; Bogdanova, Natalia V; Bojesen, Stig E; Brauch, Hiltrud; Brenner, Hermann; Brinton, Louise; Broberg, Per; Brucker, Sara Y; Burwinkel, Barbara; Caldés, Trinidad; Canzian, Federico; Carter, Brian D; Castelao, J Esteban; Chang-Claude, Jenny; Chen, Xiaoqing; Cheng, Ting-Yuan David; Christiansen, Hans; Clarke, Christine L; Collée, Margriet; Cornelissen, Sten; Couch, Fergus J; Cox, David; Cox, Angela; Cross, Simon S; Cunningham, Julie M; Czene, Kamila; Daly, Mary B; Devilee, Peter; Doheny, Kimberly F; Dörk, Thilo; Dos-Santos-Silva, Isabel; Dumont, Martine; Dwek, Miriam; Eccles, Diana M; Eilber, Ursula; Eliassen, A Heather; Engel, Christoph; Eriksson, Mikael; Fachal, Laura; Fasching, Peter A; Figueroa, Jonine; Flesch-Janys, Dieter; Fletcher, Olivia; Flyger, Henrik; Fritschi, Lin; Gabrielson, Marike; Gago-Dominguez, Manuela; Gapstur, Susan M; García-Closas, Montserrat; Gaudet, Mia M; Ghoussaini, Maya; Giles, Graham G; Goldberg, Mark S; Goldgar, David E; González-Neira, Anna; Guénel, Pascal; Hahnen, Eric; Haiman, Christopher A; Håkansson, Niclas; Hall, Per; Hallberg, Emily; Hamann, Ute; Harrington, Patricia; Hein, Alexander; Hicks, Belynda; Hillemanns, Peter; Hollestelle, Antoinette; Hoover, Robert N; Hopper, John L; Huang, Guanmengqian; Humphreys, Keith; Hunter, David J; Jakubowska, Anna; Janni, Wolfgang; John, Esther M; Johnson, Nichola; Jones, Kristine; Jones, Michael E; Jung, Audrey; Kaaks, Rudolf; Kerin, Michael J; Khusnutdinova, Elza; Kosma, Veli-Matti; Kristensen, Vessela N; Lambrechts, 
Diether; Le Marchand, Loic; Li, Jingmei; Lindström, Sara; Lissowska, Jolanta; Lo, Wing-Yee; Loibl, Sibylle; Lubinski, Jan; Luccarini, Craig; Lux, Michael P; MacInnis, Robert J; Maishman, Tom; Kostovska, Ivana Maleva; Mannermaa, Arto; Manson, JoAnn E; Margolin, Sara; Mavroudis, Dimitrios; Meijers-Heijboer, Hanne; Meindl, Alfons; Menon, Usha; Meyer, Jeffery; Mulligan, Anna Marie; Neuhausen, Susan L; Nevanlinna, Heli; Neven, Patrick; Nielsen, Sune F; Nordestgaard, Børge G; Olopade, Olufunmilayo I; Olson, Janet E; Olsson, Håkan; Peterlongo, Paolo; Peto, Julian; Plaseska-Karanfilska, Dijana; Prentice, Ross; Presneau, Nadege; Pylkäs, Katri; Rack, Brigitte; Radice, Paolo; Rahman, Nazneen; Rennert, Gad; Rennert, Hedy S; Rhenius, Valerie; Romero, Atocha; Romm, Jane; Rudolph, Anja; Saloustros, Emmanouil; Sandler, Dale P; Sawyer, Elinor J; Schmidt, Marjanka K; Schmutzler, Rita K; Schneeweiss, Andreas; Scott, Rodney J; Scott, Christopher G; Seal, Sheila; Shah, Mitul; Shrubsole, Martha J; Smeets, Ann; Southey, Melissa C; Spinelli, John J; Stone, Jennifer; Surowy, Harald; Swerdlow, Anthony J; Tamimi, Rulla M; Tapper, William; Taylor, Jack A; Terry, Mary Beth; Tessier, Daniel C; Thomas, Abigail; Thöne, Kathrin; Tollenaar, Rob A E M; Torres, Diana; Truong, Thérèse; Untch, Michael; Vachon, Celine; Van Den Berg, David; Vincent, Daniel; Waisfisz, Quinten; Weinberg, Clarice R; Wendt, Camilla; Whittemore, Alice S; Wildiers, Hans; Willett, Walter C; Winqvist, Robert; Wolk, Alicja; Xia, Lucy; Yang, Xiaohong R; Ziogas, Argyrios; Ziv, Elad; Dunning, Alison M; Pharoah, Paul D P; Simard, Jacques; Milne, Roger L; Edwards, Stacey L; Kraft, Peter; Easton, Douglas F; Chenevix-Trench, Georgia; Zheng, Wei
2018-06-18
The breast cancer risk variants identified in genome-wide association studies explain only a small fraction of the familial relative risk, and the genes responsible for these associations remain largely unknown. To identify novel risk loci and likely causal genes, we performed a transcriptome-wide association study evaluating associations of genetically predicted gene expression with breast cancer risk in 122,977 cases and 105,974 controls of European ancestry. We used data from the Genotype-Tissue Expression Project to establish genetic models to predict gene expression in breast tissue and evaluated model performance using data from The Cancer Genome Atlas. Of the 8,597 genes evaluated, significant associations were identified for 48 at a Bonferroni-corrected threshold of P < 5.82 × 10⁻⁶, including 14 genes at loci not yet reported for breast cancer. We silenced 13 genes and showed an effect for 11 on cell proliferation and/or colony-forming efficiency. Our study provides new insights into breast cancer genetics and biology.
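The Bonferroni threshold quoted above can be reproduced by dividing a family-wise significance level by the number of genes tested. This is a minimal sketch, assuming the conventional α = 0.05 (the abstract does not state α explicitly); the function name is illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value cut-off that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumed alpha = 0.05, with the 8,597 genes evaluated in the study.
threshold = bonferroni_threshold(0.05, 8597)
print(f"{threshold:.2e}")  # prints 5.82e-06
```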
Accuracy of Predicted Genomic Breeding Values in Purebred and Crossbred Pigs.
Hidalgo, André M; Bastiaansen, John W M; Lopes, Marcos S; Harlizius, Barbara; Groenen, Martien A M; de Koning, Dirk-Jan
2015-05-26
Genomic selection has been widely implemented in dairy cattle breeding, where the aim is to improve the performance of purebred animals. In pigs, however, the final product is a crossbred animal, which may affect the efficiency of the methods currently implemented for dairy cattle. Therefore, the objective of this study was to determine the accuracy of predicted breeding values in crossbred pigs using purebred genomic and phenotypic data. A second objective was to compare the predictive ability of SNPs when training is done in either single or multiple populations for four traits: age at first insemination (AFI); total number of piglets born (TNB); litter birth weight (LBW); and litter variation (LVR). We performed marker-based and pedigree-based predictions. Within-population prediction accuracies for the four traits ranged from 0.21 to 0.72. Multi-population prediction yielded accuracies ranging from 0.18 to 0.67. For AFI, predictions across purebred populations, as well as predictions of the genetic merit of crossbreds from their purebred parental lines, performed poorly (not significantly different from zero). In contrast, accuracies of across-population predictions and of purebred-to-crossbred predictions for LBW and LVR ranged from 0.08 to 0.31 and from 0.11 to 0.31, respectively. Accuracy for TNB was zero for across-population prediction, whereas for purebred-to-crossbred prediction it ranged from 0.08 to 0.22. In general, marker-based prediction outperformed pedigree-based prediction across populations and traits. However, in some cases pedigree-based prediction performed similarly to or better than marker-based prediction. Under an additive model, purebred populations had predictive ability for crossbred genetic merit in the populations studied. AFI was the only exception, indicating that predictive ability depends largely on the genetic correlation between purebred (PB) and crossbred (CB) performance, which was 0.31 for AFI. 
Multi-population prediction was no better than within-population prediction for the purebred validation set. Accuracy of prediction was very trait-dependent. Copyright © 2015 Hidalgo et al.
Norén, Karin; Angerbjörn, Anders
2014-05-01
Many key species in northern ecosystems are characterised by high-amplitude cyclic population demography. In 1924, Charles Elton described the ecology and evolution of cyclic populations in a classic paper and, since then, a major focus has been the underlying causes of population cycles. Elton hypothesised that fluctuations reduced population genetic variation and influenced the direction of selection pressures. Consistent with Elton, present theories concern the direct consequences of population cycles for genetic structure due to the processes of genetic drift and selection, but also include feedback models of the effect of genetic composition on population dynamics. Most of these theories gained mathematical support from the 1970s onwards, but due to methodological drawbacks, difficulties in long-term sampling and a complex interplay between microevolutionary processes, clear empirical data allowing these predictions to be tested are still scarce. Current genetic tools allow for estimates of genetic variation and identification of adaptive genomic regions, making this an ideal time to revisit this subject. Herein, we attempt to contribute towards a consensus regarding the enigma described by Elton almost 90 years ago. We present nine predictions covering the direct and genetic feedback consequences of population cycles on genetic variation and population structure, and review the empirical evidence. Generally, empirical support for the predictions was low and scattered, with obvious gaps in the understanding of basic population processes. We conclude that genetic variation in northern cyclic populations is generally high and that the geographic distribution and amount of diversity are usually suggested to be determined by various forms of context- and density-dependent dispersal, whose impact exceeds that of genetic drift. Furthermore, we found few clear signatures of selection determining genetic composition in cyclic populations. 
Dispersal is assumed to have a strong impact on genetic structuring, and we suggest that the signatures of other microevolutionary processes, such as genetic drift and selection, are weaker and have been overshadowed by density-dependent dispersal. We emphasise that basic biological and demographic questions still need to be answered and stress the importance of extensive sampling, appropriate choice of tools and the value of standardised protocols. © 2013 The Authors. Biological Reviews © 2013 Cambridge Philosophical Society.
Radiogenomics to characterize regional genetic heterogeneity in glioblastoma.
Hu, Leland S; Ning, Shuluo; Eschbacher, Jennifer M; Baxter, Leslie C; Gaw, Nathan; Ranjbar, Sara; Plasencia, Jonathan; Dueck, Amylou C; Peng, Sen; Smith, Kris A; Nakaji, Peter; Karis, John P; Quarles, C Chad; Wu, Teresa; Loftus, Joseph C; Jenkins, Robert B; Sicotte, Hugues; Kollmeyer, Thomas M; O'Neill, Brian P; Elmquist, William; Hoxworth, Joseph M; Frakes, David; Sarkaria, Jann; Swanson, Kristin R; Tran, Nhan L; Li, Jing; Mitchell, J Ross
2017-01-01
Glioblastoma (GBM) exhibits profound intratumoral genetic heterogeneity. Each tumor comprises multiple genetically distinct clonal populations with different therapeutic sensitivities. This has implications for targeted therapy and genetically informed paradigms. Contrast-enhanced (CE)-MRI and conventional sampling techniques have failed to resolve this heterogeneity, particularly for nonenhancing tumor populations. This study explores the feasibility of using multiparametric MRI and texture analysis to characterize regional genetic heterogeneity throughout MRI-enhancing and nonenhancing tumor segments. We collected multiple image-guided biopsies from primary GBM patients throughout regions of enhancement (ENH) and nonenhancing parenchyma (so-called brain-around-tumor [BAT]). For each biopsy, we analyzed DNA copy number variants for core GBM driver genes reported by The Cancer Genome Atlas. We co-registered biopsy locations with MRI and texture maps to correlate regional genetic status with spatially matched imaging measurements. We also built multivariate predictive decision-tree models for each GBM driver gene and validated accuracies using leave-one-out cross-validation (LOOCV). We collected 48 biopsies (13 tumors) and identified significant imaging correlations (univariate analysis) for 6 driver genes: EGFR, PDGFRA, PTEN, CDKN2A, RB1, and TP53. Predictive model accuracies (on LOOCV) varied by driver gene of interest. The highest accuracies were observed for CDKN2A (87.5%), RB1 (87.5%), PDGFRA (77.1%), and EGFR (75%), while the lowest accuracy was observed for TP53 (37.5%). Models for 4 driver genes (EGFR, RB1, CDKN2A, and PTEN) showed higher accuracy in BAT samples (n = 16) than in samples from ENH segments (n = 32). MRI and texture analysis can help characterize regional genetic heterogeneity, which offers potential diagnostic value under the paradigm of individualized oncology. © The Author(s) 2016. 
Published by Oxford University Press on behalf of the Society for Neuro-Oncology. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
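Leave-one-out cross-validation, as used above to validate the per-gene models, trains on all samples but one, predicts the held-out sample, and repeats this for every sample. The sketch below illustrates the procedure under stated assumptions: it swaps in a one-level decision tree (a "stump") for the study's multivariate decision trees, and the toy data and the names `fit_stump` and `loocv_accuracy` are hypothetical.

```python
from collections import Counter
from typing import Callable

Features = list[float]
Classifier = Callable[[Features], int]


def majority(labels: list[int]) -> int:
    """Most frequent label (ties resolved by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X: list[Features], y: list[int]) -> Classifier:
    """Fit a one-level decision tree: the single feature/threshold split with fewest training errors."""
    overall = majority(y)
    best_errors = sum(lab != overall for lab in y)
    best_rule = (None, 0.0, overall, overall)  # feature None means "always predict the majority label"
    for j in range(len(X[0])):
        values = sorted({x[j] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between neighbouring observed values
            left = [lab for x, lab in zip(X, y) if x[j] <= t]
            right = [lab for x, lab in zip(X, y) if x[j] > t]
            left_label, right_label = majority(left), majority(right)
            errors = sum(lab != left_label for lab in left) + sum(lab != right_label for lab in right)
            if errors < best_errors:
                best_errors, best_rule = errors, (j, t, left_label, right_label)
    feature, t, left_label, right_label = best_rule

    def predict(x: Features) -> int:
        if feature is None:
            return left_label
        return left_label if x[feature] <= t else right_label

    return predict


def loocv_accuracy(X: list[Features], y: list[int],
                   fit: Callable[[list[Features], list[int]], Classifier]) -> float:
    """Train on n-1 samples, test on the held-out one, repeat n times; return the hit rate."""
    correct = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += int(model(X[i]) == y[i])
    return correct / len(X)


# Hypothetical toy data: one imaging feature per biopsy, label 1 = driver gene altered.
X = [[0.1], [0.2], [0.3], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 1, 1, 1]
print(loocv_accuracy(X, y, fit_stump))
```

Because every sample is held out exactly once, an LOOCV accuracy over n samples is always a multiple of 1/n; with 48 biopsies the reported figures are consistent with this (for example, 37/48 ≈ 77.1% and 42/48 = 87.5%).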
Ramachandran, Sohini; Deshpande, Omkar; Roseman, Charles C.; Rosenberg, Noah A.; Feldman, Marcus W.; Cavalli-Sforza, L. Luca
2005-01-01
Equilibrium models of isolation by distance predict an increase in genetic differentiation with geographic distance. Here we find a linear relationship between genetic and geographic distance in a worldwide sample of human populations, with major deviations from the fitted line explicable by admixture or extreme isolation. A close relationship is shown to exist between the correlation of geographic distance and genetic differentiation (as measured by FST) and the geographic pattern of heterozygosity across populations. Considering a worldwide set of geographic locations as possible sources of the human expansion, we find that heterozygosities in the globally distributed populations of the data set are best explained by an expansion originating in Africa and that no geographic origin outside of Africa accounts as well for the observed patterns of genetic diversity. Although the relationship between FST and geographic distance has been interpreted in the past as the result of an equilibrium model of drift and dispersal, simulation shows that the geographic pattern of heterozygosities in this data set is consistent with a model of a serial founder effect starting at a single origin. Given this serial-founder scenario, the relationship between genetic and geographic distance allows us to derive bounds for the effects of drift and natural selection on human genetic variation. PMID:16243969
Meseret, S.; Tamir, B.; Gebreyohannes, G.; Lidauer, M.; Negussie, E.
2015-01-01
The development of effective genetic evaluations and selection of sires requires accurate estimates of genetic parameters for all economically important traits in the breeding goal. The main objective of this study was to assess the relative performance of the traditional lactation average model (LAM) against the random regression test-day model (RRM) in the estimation of genetic parameters and prediction of breeding values for Holstein Friesian herds in Ethiopia. The data consisted of 6,500 test-day (TD) records from 800 first-lactation Holstein Friesian cows that calved between 1997 and 2013. Covariance components were estimated using the average information restricted maximum likelihood method under a single-trait animal model. The heritability estimate for first-lactation milk yield was 0.30 from LAM, whereas estimates from RRM ranged from 0.17 to 0.29 for the different stages of lactation. Genetic correlations between different TDs in first-lactation Holstein Friesian cows ranged from 0.37 to 0.99. Because the genetic correlation between milk yields at different TDs was less than unity, the assumption underlying LAM may not be optimal for accurate evaluation of the genetic merit of animals. Estimated breeding values from RRM had a higher standard deviation than those from LAM, indicating that the TD model makes more efficient use of TD information. Correlations of breeding values between models ranged from 0.90 to 0.96 for different groups of sires and cows, and marked re-rankings of top sires and cows were observed in moving from the traditional LAM to RRM evaluations. PMID:26194217
Haghighattalab, Atena
Wheat breeders are in a race for genetic gain to secure the future nutritional needs of a growing population. Multiple barriers stand in the way of accelerating crop improvement, and emerging technologies are reducing these obstacles. Advances in genotyping technologies have significantly decreased the cost of characterizing the genetic make-up of candidate breeding lines. However, this is just part of the equation. Field-based phenotyping informs a breeder's decision as to which lines move forward in the breeding cycle. This has long been the most expensive and time-consuming, though most critical, aspect of breeding. The grand challenge remains connecting genetic variants to observed phenotypes and then predicting phenotypes from the genetic composition of lines or cultivars. In this context, the current study was undertaken to investigate the utility of unmanned aerial systems (UAS) for assessing field trials in wheat breeding programs. The major objective was to integrate remotely sensed data with geospatial analysis for high-throughput phenotyping (HTP) of large wheat breeding nurseries. The initial step was to develop and validate a semi-automated HTP pipeline using a low-cost UAS and a near-infrared (NIR) camera, image processing, and radiometric calibration to build orthomosaic imagery and 3D models. The relationship between plot-level data (vegetation indices and height) extracted from UAS imagery and manual measurements was examined, and the two were found to be highly correlated. Data derived from UAS imagery performed as well as manual measurements while greatly increasing the amount of data available. The high-resolution, high-temporal-resolution HTP data extracted from this pipeline offered the opportunity to develop a within-season grain yield prediction model. Due to the variety in genotypes and environmental conditions, breeding trials are inherently spatial in nature and vary non-randomly across the field. 
This makes geographically weighted regression a good choice of geospatial prediction model. Finally, by incorporating the georeferenced spatial data integral to HTP imagery, we were able to reduce environmental effects in the data and increase the accuracy of UAS plot-level data. The models developed through this research, when combined with genotyping technologies, increase the volume, accuracy, and reliability of phenotypic data to better inform breeder selections. This increased accuracy in evaluating and predicting grain yield will help breeders to rapidly identify and advance the most promising candidate wheat varieties.
Process-time Optimization of Vacuum Degassing Using a Genetic Alloy Design Approach
Dilner, David; Lu, Qi; Mao, Huahai; Xu, Wei; van der Zwaag, Sybrand; Selleby, Malin
2014-01-01
This paper demonstrates the use of a new model consisting of a genetic algorithm in combination with thermodynamic calculations and analytical process models to minimize the processing time during a vacuum degassing treatment of liquid steel. The model sets multiple simultaneous targets for final S, N, O, Si and Al levels and uses the total slag mass, the slag composition, the steel composition and the start temperature as optimization variables. The predicted optimal conditions agree well with industrial practice. For those conditions leading to the shortest process time the target compositions for S, N and O are reached almost simultaneously. PMID:28788286
A discrete mathematical model applied to genetic regulation and metabolic networks.
Asenjo, A J; Ramirez, P; Rapaport, I; Aracena, J; Goles, E; Andrews, B A
2007-03-01
This paper describes the use of a discrete mathematical model to represent the basic regulatory mechanisms of the bacterium E. coli in batch fermentation. The specific phenomena studied were the changes in metabolism and genetic regulation when the bacteria use three different carbon substrates (glucose, glycerol, and acetate). The model correctly predicts the behavior of E. coli vis-à-vis substrate mixtures: in a mixture of glucose, glycerol, and acetate, the bacterium prefers glucose, then glycerol, and finally acetate. The model included 67 nodes; 28 were genes, 20 enzymes, and 19 regulators/biochemical compounds. The model represents both the genetic regulation and metabolic networks in an integrated form, which is how they function biologically. This is one of the first attempts to include both of these networks in one model. Previously, discrete mathematical models were used only to describe genetic regulation networks. The study of the network dynamics generated 8 (2³) fixed points, one for each nutrient configuration (substrate mixture) in the medium. The fixed points of the discrete model reflect the phenotypes described. Gene expression and the patterns of the metabolic fluxes generated are described accurately. The activation of the gene regulation network depends mainly on the presence of glucose and glycerol. The model predicts the behavior when mixed carbon sources are utilized as well as when there is no carbon source present. Fictitious "joker" nodes (Joker1, Joker2, and Repressor SdhC) had to be created to control 12 genes whose regulation mechanism is unknown, since glycerol and glucose do not act directly on the genes. 
The approach presented in this paper is particularly useful for investigating potential unknown gene regulation mechanisms; it can also be used to describe other gene regulation situations, such as the comparison between non-recombinant and recombinant yeast strains producing recombinant proteins, presently under investigation in our group.
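The count of "8 (2³) fixed points" follows from having three binary inputs (each substrate present or absent), which gives 2³ = 8 nutrient configurations. The sketch below illustrates how a synchronous Boolean network is iterated to a fixed point for each configuration. Its three internal nodes and their update rules are hypothetical stand-ins chosen to mimic the glucose > glycerol > acetate preference; they are not the 67-node model described above.

```python
from itertools import product

Inputs = tuple[int, int, int]  # (glucose, glycerol, acetate); 1 = present in the medium
State = tuple[int, int, int]   # (crp, glp_genes, acs_genes); 1 = active (hypothetical nodes)


def step(state: State, inputs: Inputs) -> State:
    """One synchronous update of a hypothetical 3-node network (not the paper's model)."""
    glucose, glycerol, acetate = inputs
    crp, glp, _acs = state
    return (
        int(not glucose),                  # CRP-like activator is on only when glucose is absent
        int(glycerol and crp),             # glycerol genes need glycerol plus the activator
        int(acetate and crp and not glp),  # acetate genes are repressed while glycerol genes are on
    )


def fixed_point(inputs: Inputs, start: State = (0, 0, 0), max_steps: int = 100) -> State:
    """Iterate synchronous updates from `start` until the state stops changing."""
    state = start
    for _ in range(max_steps):
        nxt = step(state, inputs)
        if nxt == state:
            return state
        state = nxt
    raise RuntimeError("no fixed point reached; the trajectory may be on a cycle")


# Enumerate all 2**3 nutrient configurations and the internal state each one settles into.
for inputs in product((0, 1), repeat=3):
    print(inputs, fixed_point(inputs))
```

Counting the input nodes as part of the network state, each configuration yields its own fixed point. In glucose-free medium containing glycerol and acetate, the toy network passes through a state with the acetate genes on before settling with only the glycerol genes active, a crude analogue of the preference order reported above.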
Strategies for Selecting Crosses Using Genomic Prediction in Two Wheat Breeding Programs.
Lado, Bettina; Battenfield, Sarah; Guzmán, Carlos; Quincke, Martín; Singh, Ravi P; Dreisigacker, Susanne; Peña, R Javier; Fritz, Allan; Silva, Paula; Poland, Jesse; Gutiérrez, Lucía
2017-07-01
The single most important decision in plant breeding programs is the selection of appropriate crosses. The ideal cross would provide superior predicted progeny performance and enough diversity to maintain genetic gain. The aim of this study was to compare the best crosses predicted using combinations of mid-parent value and variance prediction accounting for linkage disequilibrium (V_LD) or assuming linkage equilibrium (V_LE). After predicting the mean and the variance of each cross, we selected crosses based on mid-parent value, the top 10% of the progeny, and weighted mean and variance within progenies for grain yield, grain protein content, mixing time, and loaf volume in two applied wheat (Triticum aestivum L.) breeding programs: Instituto Nacional de Investigación Agropecuaria (INIA) Uruguay and CIMMYT Mexico. Although the variance of the progeny is important to increase the chances of finding superior individuals from transgressive segregation, we observed that the mid-parent values of the crosses drove the genetic gain, whereas the variance of the progeny had a small impact on genetic gain for grain yield. However, the relative importance of the variance of the progeny was larger for quality traits. Overall, the genomic resources and the statistical models are now available to plant breeders to predict both the performance of breeding lines per se and the value of progeny from any potential cross. Copyright © 2017 Crop Science Society of America.
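The cross-prediction quantities above can be sketched for the simplest case. This sketch assumes fully homozygous parents coded 0/2 at each marker, hypothetical additive per-allele marker effects (for example, from a ridge-regression genomic model), fully inbred progeny, and linkage equilibrium, so it corresponds only to the V_LE variant, not V_LD. The expected mean of the top 10% of progeny further assumes normally distributed progeny values. All names and numbers are illustrative.

```python
import math
from statistics import NormalDist


def check_inbred(genotype: list[int]) -> None:
    """Reject heterozygous codes: the variance formula below assumes 0/2 allele counts."""
    if any(g not in (0, 2) for g in genotype):
        raise ValueError("parents are assumed fully homozygous (allele counts 0 or 2)")


def genetic_value(genotype: list[int], effects: list[float]) -> float:
    """Additive genomic value: allele counts times per-allele marker effects."""
    return sum(g * a for g, a in zip(genotype, effects))


def mid_parent_value(p1: list[int], p2: list[int], effects: list[float]) -> float:
    """Expected progeny mean under additivity: the average of the parents' genomic values."""
    return (genetic_value(p1, effects) + genetic_value(p2, effects)) / 2


def progeny_variance_le(p1: list[int], p2: list[int], effects: list[float]) -> float:
    """Genetic variance among fully inbred progeny, assuming linkage equilibrium.

    At a locus where the parents differ, a progeny line carries 0 or 2 copies with
    probability 1/2 each, so the locus contributes (2a)**2 / 4 = a**2.
    """
    check_inbred(p1)
    check_inbred(p2)
    return sum(a * a for g1, g2, a in zip(p1, p2, effects) if g1 != g2)


def selection_intensity(p: float) -> float:
    """Standardised selection differential for truncation selection of the top fraction p."""
    z = NormalDist().inv_cdf(1 - p)
    return NormalDist().pdf(z) / p


def mean_of_top_fraction(p1, p2, effects, p: float = 0.10) -> float:
    """Expected mean of the best fraction p of progeny (a usefulness-style criterion)."""
    mu = mid_parent_value(p1, p2, effects)
    sd = math.sqrt(progeny_variance_le(p1, p2, effects))
    return mu + selection_intensity(p) * sd


# Hypothetical parents at four markers and hypothetical per-allele effects.
p1 = [2, 0, 2, 0]
p2 = [0, 0, 2, 2]
effects = [0.5, -0.2, 0.1, 0.3]
print(mid_parent_value(p1, p2, effects), progeny_variance_le(p1, p2, effects))
print(round(mean_of_top_fraction(p1, p2, effects), 3))
```

Only loci where the parents differ add variance. This is why two parents with similar mid-parent values can still differ in their chance of producing transgressive segregants.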
Genetic determination of height-mediated mate choice.
Tenesa, Albert; Rawlik, Konrad; Navarro, Pau; Canela-Xandri, Oriol
2016-01-19
Numerous studies have reported positive correlations among couples for height. This suggests that humans find individuals of similar height attractive. However, the answer to whether the choice of a mate with a similar phenotype is genetically or environmentally determined has been elusive. Here we provide an estimate of the genetic contribution to height choice in mates in 13,068 genotyped couples. Using a mixed linear model we show that 4.1% of the variation in the mate height choice is determined by a person's own genotype, as expected in a model where one's height determines the choice of mate height. Furthermore, the genotype of an individual predicts their partner's height in an independent dataset of 15,437 individuals with 13% accuracy, which is 64% of the theoretical maximum achievable with a heritability of 0.041. Theoretical predictions suggest that approximately 5% of the heritability of height is due to the positive covariance between allelic effects at different loci, which is caused by assortative mating. Hence, the coupling of alleles with similar effects could substantially contribute to the missing heritability of height. These estimates provide new insight into the mechanisms that govern mate choice in humans and warrant the search for the genetic causes of choice of mate height. They have important methodological implications and contribute to the missing heritability debate.
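The "64% of the theoretical maximum" figure can be checked with one line of arithmetic, assuming the maximum refers to the square root of the heritability. That value is the usual upper bound on the correlation between a genotype-based predictor and the trait. The variable names are illustrative.

```python
import math

heritability = 0.041      # share of mate-height-choice variation explained by own genotype (reported above)
observed_accuracy = 0.13  # reported correlation between predicted and observed partner height

max_accuracy = math.sqrt(heritability)  # assumed ceiling on prediction accuracy: sqrt(h^2)
fraction_of_max = observed_accuracy / max_accuracy
print(f"ceiling = {max_accuracy:.3f}, fraction achieved = {fraction_of_max:.0%}")
# prints: ceiling = 0.202, fraction achieved = 64%
```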
Centromere-associated meiotic drive and female fitness variation in Mimulus.
Fishman, Lila; Kelly, John K
2015-05-01
Female meiotic drive, in which chromosomal variants preferentially segregate to the egg pole during asymmetric female meiosis, is a theoretically pervasive but still mysterious form of selfish evolution. Like other selfish genetic elements, driving chromosomes may be maintained as balanced polymorphisms by pleiotropic or linked fitness costs. A centromere-associated driver (D) with a ∼58:42 female-specific transmission advantage occurs at intermediate frequency (32-40%) in the Iron Mountain population of the yellow monkeyflower, Mimulus guttatus. Previously determined male fertility costs are sufficient to prevent the fixation of D, but predict a higher equilibrium frequency. To better understand the dynamics and effects of D, we developed a new population genetic model and measured genotype-specific lifetime female fitness in the wild. In three of four years, and across all years, D imposed significant recessive seedset costs, most likely due to hitchhiking by deleterious mutations. With both male and female costs as measured, and 58:42 drive, our model predicts an equilibrium frequency of D (38%) very close to the observed value. Thus, D represents a rare selfish genetic element whose local population genetic dynamics have been fully parameterized, and the observation of equilibrium sets the stage for investigations of coevolution with suppressors. © 2015 The Author(s).
[Diabetes and predictive medicine--parallax of the present time].
Rybka, J
2010-04-01
Predictive genetics uses genetic testing to estimate risk in asymptomatic persons. Because predictive genetic analysis of multifactorial diseases yields findings that allow wider interpretation, it has a higher predictive value in clearly defined monogenic diseases with high penetrance than in multifactorial (polygenic) diseases with a large contribution from environmental factors. In most "civilisation" (multifactorial) diseases, including diabetes, heredity and environmental factors do not play two separate, independent roles; instead, their interactions play the principal role. The new classification of diabetes is based not only on etiopathogenetic but also on genetic research. Diabetes mellitus type 1 (DM1T) is a polygenic multifactorial disease in which the genetic component carries about one half of the risk and the non-genetic component the other half. The study of the autoimmune nature of DM1T in connection with genetic analysis is expected to bring new insights into DM1T prediction. The author presents new findings in molecular genetics concerning certain specific types of diabetes. Issues relating to heredity in diabetes mellitus type 2 (DM2T) are even more complex. The disease is polygenic, and the phenotype of a patient with DM2T involves, in addition to environmental factors, at least three and perhaps tens of different genetic variants. At present, genome-wide results appear to be the most promising. The current concept of prediabetes is a realistic foundation for the prediction and prevention of DM2T. A multifactorial, multimarker approach based on our understanding of new pathophysiological factors of DM2T tries to outline a "map" of prediabetes physiology, and if these tests are combined with sophisticated methods of genetic forecasting of DM2T, this may represent a significant step in our methodology of diabetes prediction. 
So far, however, predictive genetics is limited by the interpretation of genetic predisposition and the individualisation of the level of risk. There is no doubt that interpretation calls for co-operation with clinicians, and at present the results of genetic analyses should not be uncritically overestimated. Predictive medicine nevertheless fulfils the preventive focus of modern medicine, and genetic analysis is a promising diagnostic method.
Dogan, Meeshanthini V; Grumbach, Isabella M; Michaelson, Jacob J; Philibert, Robert A
2018-01-01
An improved method for detecting coronary heart disease (CHD) could have substantial clinical impact. Building on the idea that systemic effects of CHD risk factors are a conglomeration of genetic and environmental factors, we use machine learning techniques and integrate genetic, epigenetic and phenotype data from the Framingham Heart Study to build and test a Random Forest classification model for symptomatic CHD. Our classifier was trained on n = 1,545 individuals and consisted of four DNA methylation sites, two SNPs, age and gender. The methylation sites and SNPs were selected during the training phase. The final trained model was then tested on n = 142 individuals. The test set comprised individuals who had been excluded from the training dataset because of their relatedness to individuals in it. This integrated classifier was capable of classifying symptomatic CHD status of those in the test set with an accuracy, sensitivity and specificity of 78%, 0.75 and 0.80, respectively. In contrast, a model using only conventional CHD risk factors as predictors had an accuracy and sensitivity of only 65% and 0.42, respectively, but a specificity of 0.89 in the test set. Regression analyses of the methylation signatures illustrate our ability to map these signatures to known risk factors in CHD pathogenesis. These results demonstrate the capability of an integrated approach to effectively model symptomatic CHD status. These results also suggest that future studies of biomaterial collected from longitudinally informative cohorts that are specifically characterized for cardiac disease at follow-up could lead to the introduction of sensitive, readily employable integrated genetic-epigenetic algorithms for predicting onset of future symptomatic CHD.
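The three metrics reported above all come from the test-set confusion matrix. The sketch below shows the definitions. The counts are illustrative only, not the study's, and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # CHD cases correctly classified as CHD
    fn: int  # CHD cases missed by the classifier
    tn: int  # non-CHD individuals correctly classified as non-CHD
    fp: int  # non-CHD individuals wrongly classified as CHD

    def accuracy(self) -> float:
        """Share of all individuals classified correctly."""
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    def sensitivity(self) -> float:
        """Share of true CHD cases that are detected."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Share of non-CHD individuals correctly ruled out."""
        return self.tn / (self.tn + self.fp)


cm = ConfusionMatrix(tp=3, fn=1, tn=4, fp=1)  # illustrative counts, not the study's
print(cm.accuracy(), cm.sensitivity(), cm.specificity())
```

Because accuracy pools both classes, it equals sensitivity × prevalence + specificity × (1 − prevalence). This is why a model with high specificity but low sensitivity, like the conventional-risk-factor model above, can still reach moderate accuracy.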
Bommert, Andrea; Rahnenführer, Jörg; Lang, Michel
2017-01-01
Finding a good predictive model for a high-dimensional data set can be challenging. For genetic data, it is important not only to find a model with high predictive accuracy, but also that this model uses only a few features and that the selection of these features is stable. This is because, in bioinformatics, models are used not only for prediction but also for drawing biological conclusions, which makes the interpretability and reliability of the model crucial. We suggest using three target criteria when fitting a predictive model to a high-dimensional data set: the classification accuracy, the stability of the feature selection, and the number of chosen features. As it is unclear which measure is best for evaluating stability, we first compare a variety of stability measures. We conclude that the Pearson correlation has the best theoretical and empirical properties. We also find that the most important factor in how a stability measure behaves is whether it contains a correction for chance or for large numbers of chosen features. We then analyse Pareto fronts and conclude that it is possible to find models with a stable selection of few features without losing much predictive accuracy.
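One way to apply the Pearson correlation as a stability measure is to encode each selection run (for example, each resampling iteration) as a 0/1 indicator vector over all p features and average the correlation over all pairs of runs. A minimal sketch under that assumption; the function names and feature sets are hypothetical, and the exact estimator used by the authors may differ. Two random selections give a correlation near 0, which is the sense in which this measure corrects for chance.

```python
from itertools import combinations
from math import sqrt

def pearson_binary(a, b):
    """Pearson correlation of two 0/1 feature-selection indicator vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        # undefined when a run selects no features or all p features
        raise ValueError("selection must contain some, but not all, features")
    return cov / sqrt(va * vb)

def selection_stability(selected_sets, p):
    """Mean pairwise Pearson correlation over runs; each run is a set of feature indices in range(p)."""
    vectors = [[1 if j in s else 0 for j in range(p)] for s in selected_sets]
    pairs = list(combinations(vectors, 2))
    return sum(pearson_binary(a, b) for a, b in pairs) / len(pairs)

# Hypothetical example: two runs over 6 features sharing 2 of 3 selected features
print(selection_stability([{0, 1, 2}, {0, 1, 3}], p=6))
```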
Weigel, K A; VanRaden, P M; Norman, H D; Grosu, H
2017-12-01
In the early 1900s, breed society herdbooks had been established and milk-recording programs were in their infancy. Farmers wanted to improve the productivity of their cattle, but the foundations of population genetics, quantitative genetics, and animal breeding had not been laid. Early animal breeders struggled to identify genetically superior families using performance records that were influenced by local environmental conditions and herd-specific management practices. Daughter-dam comparisons were used for more than 30 yr and, although genetic progress was minimal, the attention given to performance recording, genetic theory, and statistical methods paid off in future years. Contemporary (herdmate) comparison methods allowed more accurate accounting for environmental factors and genetic progress began to accelerate when these methods were coupled with artificial insemination and progeny testing. Advances in computing facilitated the implementation of mixed linear models that used pedigree and performance data optimally and enabled accurate selection decisions. Sequencing of the bovine genome led to a revolution in dairy cattle breeding, and the pace of scientific discovery and genetic progress accelerated rapidly. Pedigree-based models have given way to whole-genome prediction, and Bayesian regression models and machine learning algorithms have joined mixed linear models in the toolbox of modern animal breeders. Future developments will likely include elucidation of the mechanisms of genetic inheritance and epigenetic modification in key biological pathways, and genomic data will be used with data from on-farm sensors to facilitate precision management on modern dairy farms. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Huynh-Tran, V H; Gilbert, H; David, I
2017-11-01
The objective of the present study was to compare a random regression model, usually used in genetic analyses of longitudinal data, with the structured antedependence (SAD) model to study the longitudinal feed conversion ratio (FCR) in growing Large White pigs and to propose criteria for animal selection when used for genetic evaluation. The study was based on data from 11,790 weekly FCR measures collected on 1,186 Large White male growing pigs. Random regression (RR) models using Legendre orthogonal polynomials and SAD models were used to estimate genetic parameters and to predict FCR-based EBV for each of the 10 wk of the test. The results demonstrated that the best SAD model (1 order of antedependence of degree 2 and a polynomial of degree 2 for the innovation variance for the genetic and permanent environmental effects, i.e., 12 parameters) provided a better fit for the data than RR with a quadratic function for the genetic and permanent environmental effects (13 parameters), with Bayesian information criteria values of -10,060 and -9,838, respectively. Heritabilities with the SAD model were higher than those of RR over the first 7 wk of the test. Genetic correlations between weeks were higher than 0.68 for short intervals between weeks and decreased to 0.08 for the SAD model and -0.39 for RR for the longest intervals. These differences in genetic parameters showed that, contrary to the RR approach, the SAD model does not suffer from border effect problems and can handle genetic correlations that tend to 0. Summarized breeding values were proposed for each approach as linear combinations of the individual weekly EBV weighted by the coefficients of the first or second eigenvector computed from the genetic covariance matrix of the additive genetic effects. These summarized breeding values isolated EBV trajectories over time, capturing either the average general value or the slope of the trajectory.
Finally, applying the SAD model over a reduced period of time suggested that similar selection choices would result from the use of the records from the first 8 wk of the test. To conclude, the SAD model performed well for the genetic evaluation of longitudinal phenotypes.
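The summarized breeding value described above is a projection of an animal's vector of weekly EBV onto an eigenvector of the additive genetic covariance matrix. A minimal pure-Python sketch, assuming power iteration to obtain the leading (first) eigenvector and a small hypothetical 3-week covariance matrix (the study used 10 weeks); the second-eigenvector (slope) variant would need deflation and is not shown. The helper names are hypothetical.

```python
from math import sqrt

def leading_eigenvector(G, iters=500):
    """Power iteration for the dominant eigenvector of a symmetric positive semi-definite matrix G."""
    k = len(G)
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(G[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalize to unit length each step
    # eigenvector sign is arbitrary; flip so the weights sum to a positive value
    if sum(v) < 0:
        v = [-x for x in v]
    return v

def summarized_ebv(weekly_ebv, weights):
    """Linear combination of one animal's weekly EBV using eigenvector weights."""
    return sum(w * e for w, e in zip(weights, weekly_ebv))

# Hypothetical 3-week genetic covariance matrix (correlation decays with interval)
G = [[1.0, 0.8, 0.6],
     [0.8, 1.0, 0.8],
     [0.6, 0.8, 1.0]]
w1 = leading_eigenvector(G)
print(summarized_ebv([0.5, 0.7, 0.9], w1))
```

With strongly positive correlations between weeks, the leading eigenvector has weights of the same sign, so the resulting index summarizes the average level of the trajectory, as the abstract describes for the first eigenvector.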
Ritchie, Marylyn D; White, Bill C; Parker, Joel S; Hahn, Lance W; Moore, Jason H
2003-01-01
Background: Appropriate definition of neural network architecture prior to data analysis is crucial for successful data mining. This can be challenging when the underlying model of the data is unknown. The goal of this study was to determine whether optimizing neural network architecture using genetic programming as a machine learning strategy would improve the ability of neural networks to model and detect nonlinear interactions among genes in studies of common human diseases. Results: Using simulated data, we show that a genetic programming optimized neural network approach is able to model gene-gene interactions as well as a traditional back propagation neural network. Furthermore, the genetic programming optimized neural network is better than the traditional back propagation neural network approach in terms of predictive ability and power to detect gene-gene interactions when non-functional polymorphisms are present. Conclusion: This study suggests that a machine learning strategy for optimizing neural network architecture may be preferable to traditional trial-and-error approaches for the identification and characterization of gene-gene interactions in common, complex human diseases. PMID:12846935
Predicting prolonged dose titration in patients starting warfarin.
Finkelman, Brian S; French, Benjamin; Bershaw, Luanne; Brensinger, Colleen M; Streiff, Michael B; Epstein, Andrew E; Kimmel, Stephen E
2016-11-01
Patients initiating warfarin therapy generally experience a dose-titration period of weeks to months, during which time they are at higher risk of both thromboembolic and bleeding events. Accurate prediction of prolonged dose titration could help clinicians determine which patients might be better treated by alternative anticoagulants that, while more costly, do not require dose titration. A prediction model was derived in a prospective cohort of patients starting warfarin (n = 390), using Cox regression, and validated in an external cohort (n = 663) from a later time period. Prolonged dose titration was defined as a dose-titration period >12 weeks. Predictor variables were selected using a modified best subsets algorithm, using leave-one-out cross-validation to reduce overfitting. The final model had five variables: warfarin indication, insurance status, number of doctor's visits in the previous year, smoking status, and heart failure. The area under the ROC curve (AUC) in the derivation cohort was 0.66 (95%CI 0.60, 0.74) using leave-one-out cross-validation, but only 0.59 (95%CI 0.54, 0.64) in the external validation cohort, and varied across clinics. Including genetic factors in the model did not improve the area under the ROC curve (0.59; 95%CI 0.54, 0.65). Relative utility curves indicated that the model was unlikely to provide a clinically meaningful benefit compared with no prediction. Our results suggest that prolonged dose titration cannot be accurately predicted in warfarin patients using traditional clinical, social, and genetic predictors, and that accurate prediction will need to accommodate heterogeneities across clinical sites and over time. Copyright © 2016 John Wiley & Sons, Ltd.
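The AUC values above can be read as the probability that a randomly chosen patient with prolonged titration receives a higher risk score than a randomly chosen patient without it, so 0.5 means no discrimination and 0.59 is only modestly better. A minimal sketch of that pairwise (Mann-Whitney) estimator for a binary outcome, with ties counted as one half; it does not reproduce the authors' Cox-model or leave-one-out pipeline, and the scores are hypothetical.

```python
def auc(scores, labels):
    """Pairwise AUC: share of (case, non-case) pairs where the case scores higher; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical risk scores; label 1 = prolonged dose titration
print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```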
Will Big Data Close the Missing Heritability Gap?
Kim, Hwasoon; Grueneberg, Alexander; Vazquez, Ana I; Hsu, Stephen; de Los Campos, Gustavo
2017-11-01
Despite the important discoveries reported by genome-wide association (GWA) studies, for most traits and diseases the prediction R-squared (R-sq.) achieved with genetic scores remains considerably lower than the trait heritability. Modern biobanks will soon deliver unprecedentedly large biomedical data sets: Will the advent of big data close the gap between the trait heritability and the proportion of variance that can be explained by a genomic predictor? We addressed this question using Bayesian methods and a data analysis approach that produces a response surface relating prediction R-sq. with sample size and model complexity (e.g., number of SNPs). We applied the methodology to data from the interim release of the UK Biobank. Focusing on human height as a model trait and using 80,000 records for model training, we achieved a prediction R-sq. in testing (n = 22,221) of 0.24 (95% C.I.: 0.23-0.25). Our estimates show that prediction R-sq. increases with sample size, reaching an estimated plateau at values that ranged from 0.1 to 0.37 for models using 500 and 50,000 (GWA-selected) SNPs, respectively. Soon much larger data sets will become available. Using the estimated response surface, we forecast that larger sample sizes will lead to further improvements in prediction R-sq. We conclude that big data will lead to a substantial reduction of the gap between trait heritability and the proportion of interindividual differences that can be explained with a genomic predictor. However, even with the power of big data, for complex traits we anticipate that the gap between prediction R-sq. and trait heritability will not be fully closed. Copyright © 2017 by the Genetics Society of America.
Cooperation in the dark: signalling and collective action in quorum-sensing bacteria.
Brown, S P; Johnstone, R A
2001-05-07
The study of quorum-sensing bacteria has revealed a widespread mechanism of coordinating bacterial gene expression with cell density. By monitoring a constitutively produced signal molecule, individual bacteria can limit their expression of group-beneficial phenotypes to cell densities that guarantee an effective group outcome. In this paper, we attempt to move away from a commonly expressed view that these impressive feats of coordination are examples of multicellularity in prokaryotic populations. Here, we look more closely at the individual conflict underlying this cooperation, illustrating that, even under significant levels of genetic conflict, signalling and resultant cooperative behaviour can stably exist. A predictive two-trait model of signal strength and of the extent of cooperation is developed as a function of relatedness (reflecting multiplicity of infection) and basic population demographic parameters. The model predicts that the strength of quorum signalling will increase as conflict (multiplicity of infecting strains) increases, as individuals attempt to coax more cooperative contributions from their competitors, leading to a devaluation of the signal as an indicator of density. Conversely, as genetic conflict increases, the model predicts that the threshold density for cooperation will increase and the subsequent strength of group cooperation will be depressed.
Multi-objective optimization to predict muscle tensions in a pinch function using genetic algorithm
NASA Astrophysics Data System (ADS)
Bensghaier, Amani; Romdhane, Lotfi; Benouezdou, Fethi
2012-03-01
This work is focused on the determination of the thumb and index finger muscle tensions in a tip pinch task. A biomechanical model of the musculoskeletal system of the thumb and the index finger is developed. Due to the assumptions made in building the biomechanical model, the formulated force analysis problem is indeterminate, leading to an infinite number of solutions. Thus, constrained single- and multi-objective optimization methodologies are used to explore the muscular redundancy and to predict optimal muscle tension distributions. Various models are investigated using the optimization process. The basic criteria to minimize are the sum of the muscle stresses, the sum of individual muscle tensions and the maximum muscle stress. The multi-objective optimization is solved using a Pareto genetic algorithm to obtain non-dominated solutions, defined as the set of optimal distributions of muscle tensions. The results show the advantage of the multi-objective formulation over the single-objective one. The obtained solutions are compared to those available in the literature, demonstrating the effectiveness of our approach in the analysis of the fingers' musculoskeletal systems when predicting muscle tensions.
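The non-dominated solutions that a Pareto genetic algorithm returns can be characterized directly: a candidate is kept if no other candidate is at least as good on every criterion and strictly better on at least one. A minimal sketch of that dominance filter for criteria that are all minimized (such as the sum of muscle stresses and the maximum muscle stress); it shows only the Pareto test, not the genetic algorithm itself, and the candidate objective values are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b, with every objective minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]

# Hypothetical (sum of stresses, max stress) pairs for four tension distributions
print(pareto_front([(1, 5), (2, 3), (3, 4), (4, 1)]))
```

Here (3, 4) is dropped because (2, 3) is better on both criteria; the other three represent different trade-offs and are all kept.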
Role of dopamine D2 receptors in human reinforcement learning.
Eisenegger, Christoph; Naef, Michael; Linssen, Anke; Clark, Luke; Gandamaneni, Praveen K; Müller, Ulrich; Robbins, Trevor W
2014-09-01
Influential neurocomputational models emphasize dopamine (DA) as an electrophysiological and neurochemical correlate of reinforcement learning. However, evidence of a specific causal role of DA receptors in learning has been less forthcoming, especially in humans. Here we combine, in a between-subjects design, administration of a high dose of the selective DA D2/3-receptor antagonist sulpiride with genetic analysis of the DA D2 receptor in a behavioral study of reinforcement learning in a sample of 78 healthy male volunteers. In contrast to predictions of prevailing models emphasizing DA's pivotal role in learning via prediction errors, we found that sulpiride did not disrupt learning, but rather induced profound impairments in choice performance. The disruption was selective for stimuli indicating reward, whereas loss avoidance performance was unaffected. Effects were driven by volunteers with higher serum levels of the drug, and in those with genetically determined lower density of striatal DA D2 receptors. This is the clearest demonstration to date for a causal modulatory role of the DA D2 receptor in choice performance that might be distinct from learning. Our findings challenge current reward prediction error models of reinforcement learning, and suggest that classical animal models emphasizing a role of postsynaptic DA D2 receptors in motivational aspects of reinforcement learning may apply to humans as well.
Role of Dopamine D2 Receptors in Human Reinforcement Learning
Eisenegger, Christoph; Naef, Michael; Linssen, Anke; Clark, Luke; Gandamaneni, Praveen K; Müller, Ulrich; Robbins, Trevor W
2014-01-01
Influential neurocomputational models emphasize dopamine (DA) as an electrophysiological and neurochemical correlate of reinforcement learning. However, evidence of a specific causal role of DA receptors in learning has been less forthcoming, especially in humans. Here we combine, in a between-subjects design, administration of a high dose of the selective DA D2/3-receptor antagonist sulpiride with genetic analysis of the DA D2 receptor in a behavioral study of reinforcement learning in a sample of 78 healthy male volunteers. In contrast to predictions of prevailing models emphasizing DA's pivotal role in learning via prediction errors, we found that sulpiride did not disrupt learning, but rather induced profound impairments in choice performance. The disruption was selective for stimuli indicating reward, whereas loss avoidance performance was unaffected. Effects were driven by volunteers with higher serum levels of the drug, and in those with genetically determined lower density of striatal DA D2 receptors. This is the clearest demonstration to date for a causal modulatory role of the DA D2 receptor in choice performance that might be distinct from learning. Our findings challenge current reward prediction error models of reinforcement learning, and suggest that classical animal models emphasizing a role of postsynaptic DA D2 receptors in motivational aspects of reinforcement learning may apply to humans as well. PMID:24713613
Sasaki, Satoshi; Comber, Alexis J; Suzuki, Hiroshi; Brunsdon, Chris
2010-01-28
Ambulance response time is a crucial factor in patient survival. The number of emergency cases (EMS cases) requiring an ambulance is increasing due to changes in population demographics, and this is lengthening ambulance response times to the emergency scene. This paper predicts EMS cases at 5-year intervals from 2020 to 2050 by correlating current EMS cases with demographic factors at the census-area level and with predicted population changes. It then applies a modified grouping genetic algorithm to compare current and future optimal locations and numbers of ambulances. Sets of potential locations were evaluated in terms of the distances from (current and predicted) EMS cases to those locations. The model (R² = 0.71) predicted that EMS demand would increase by 2030. The optimal ambulance locations based on future EMS cases were compared with current locations and with optimal locations modelled on current EMS case data. Optimising the locations of ambulance stations reduced the average response time by 57 seconds. Current and predicted future EMS demand at modelled locations were calculated and compared. The reallocation of ambulances to optimal locations improved response times and could contribute to higher survival rates from life-threatening medical events. Modelling EMS case 'demand' over census areas allows the data to be correlated to population characteristics and optimal 'supply' locations to be identified. Comparing current and future optimal scenarios allows more nuanced planning decisions to be made. This is a generic methodology that could be used to provide evidence in support of public health planning and decision making.
Genotype × environment interaction: a case study for Douglas-fir in western Oregon.
Robert K. Campbell
1992-01-01
Unrecognized genotype × environment interactions (g × e) can bias genetic-gain predictions and models for predicting growth dynamics or species perturbations by global climate change. This study tested six sets of families in 10 plantation sites in a 78-thousand-hectare breeding zone. Plantation differences accounted for 71 percent of sums of squares (15-year heights),...
The fatigue life prediction of aluminium alloy using genetic algorithm and neural network
NASA Astrophysics Data System (ADS)
Susmikanti, Mike
2013-09-01
The fatigue-life behavior of industrial materials is very important. In many cases, fatigue in a material cannot be avoided; however, there are many ways to control its behavior. Many investigations of the fatigue-life phenomena of alloys have been carried out, but they involve costly and time-consuming computation. This paper reports modeling and simulation approaches to predict the fatigue-life behavior of aluminum alloys and addresses some of these computational problems. First, a genetic algorithm was used to optimize the load to obtain the stress values. These results can be used to estimate the N-cycle fatigue life of the material. Next, the experimental data were used as input for training the neural network, while sample data were used to test the trained network. Finally, the multilayer perceptron algorithm was applied to predict whether the given data sets are consistent with the fatigue life of the alloy. To achieve rapid convergence, the Levenberg-Marquardt algorithm was also employed. The simulation results show that the fatigue behavior of aluminum under pressure can be predicted. In addition, the neural network implementation successfully identified a model for material fatigue life.
Almendro, Vanessa; Cheng, Yu-Kang; Randles, Amanda; Itzkovitz, Shalev; Marusyk, Andriy; Ametller, Elisabet; Gonzalez-Farre, Xavier; Muñoz, Montse; Russnes, Hege G; Helland, Aslaug; Rye, Inga H; Borresen-Dale, Anne-Lise; Maruyama, Reo; van Oudenaarden, Alexander; Dowsett, Mitchell; Jones, Robin L; Reis-Filho, Jorge; Gascon, Pere; Gönen, Mithat; Michor, Franziska; Polyak, Kornelia
2014-02-13
Cancer therapy exerts a strong selection pressure that shapes tumor evolution, yet our knowledge of how tumors change during treatment is limited. Here, we report the analysis of cellular heterogeneity for genetic and phenotypic features and their spatial distribution in breast tumors pre- and post-neoadjuvant chemotherapy. We found that intratumor genetic diversity was tumor-subtype specific, and it did not change during treatment in tumors with partial or no response. However, lower pretreatment genetic diversity was significantly associated with pathologic complete response. In contrast, phenotypic diversity was different between pre- and posttreatment samples. We also observed significant changes in the spatial distribution of cells with distinct genetic and phenotypic features. We used these experimental data to develop a stochastic computational model to infer tumor growth patterns and evolutionary dynamics. Our results highlight the importance of integrated analysis of genotypes and phenotypes of single cells in intact tissues to predict tumor evolution. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
Almendro, Vanessa; Cheng, Yu-Kang; Randles, Amanda; Itzkovitz, Shalev; Marusyk, Andriy; Ametller, Elisabet; Gonzalez-Farre, Xavier; Muñoz, Montse; Russnes, Hege G.; Helland, Åslaug; Rye, Inga H.; Borresen-Dale, Anne-Lise; Maruyama, Reo; van Oudenaarden, Alexander; Dowsett, Mitchell; Jones, Robin L.; Reis-Filho, Jorge; Gascon, Pere; Gönen, Mithat; Michor, Franziska; Polyak, Kornelia
2014-01-01
SUMMARY Cancer therapy exerts a strong selection pressure that shapes tumor evolution, yet our knowledge of how tumors change during treatment is limited. Here we report the analysis of cellular heterogeneity for genetic and phenotypic features and their spatial distribution in breast tumors pre- and post-neoadjuvant chemotherapy. We found that intratumor genetic diversity was tumor subtype-specific and it did not change during treatment in tumors with partial or no response. However, lower pre-treatment genetic diversity was significantly associated with complete pathologic response. In contrast, phenotypic diversity was different between pre- and post-treatment samples. We also observed significant changes in the spatial distribution of cells with distinct genetic and phenotypic features. We used these experimental data to develop a stochastic computational model to infer tumor growth patterns and evolutionary dynamics. Our results highlight the importance of integrated analysis of genotypes and phenotypes of single cells in intact tissues to predict tumor evolution. PMID:24462293
Almendro, Vanessa; Cheng, Yu-Kang; Randles, Amanda; ...
2014-02-01
Cancer therapy exerts a strong selection pressure that shapes tumor evolution, yet our knowledge of how tumors change during treatment is limited. Here, we report the analysis of cellular heterogeneity for genetic and phenotypic features and their spatial distribution in breast tumors pre- and post-neoadjuvant chemotherapy. We found that intratumor genetic diversity was tumor-subtype specific, and it did not change during treatment in tumors with partial or no response. However, lower pretreatment genetic diversity was significantly associated with pathologic complete response. In contrast, phenotypic diversity was different between pre- and post-treatment samples. We also observed significant changes in the spatial distribution of cells with distinct genetic and phenotypic features. We used these experimental data to develop a stochastic computational model to infer tumor growth patterns and evolutionary dynamics. Our results highlight the importance of integrated analysis of genotypes and phenotypes of single cells in intact tissues to predict tumor evolution.
Ebtehaj, Isa; Bonakdari, Hossein
2014-01-01
The presence of sediments in wastewater greatly affects the performance of sewer and wastewater transmission systems. Increased sedimentation in wastewater collection systems causes problems such as reduced transmission capacity and early combined sewer overflow. The article reviews the performance of the genetic algorithm (GA) and the imperialist competitive algorithm (ICA) in minimizing the target function (the mean square error between observed and predicted Froude numbers). To study the impact of bed load transport parameters, six different models were built from four non-dimensional groups. The roulette wheel selection method is used to select the parents. For the selected model, the ICA (root mean square error (RMSE) = 0.007, mean absolute percentage error (MAPE) = 3.5%) gave better results than the GA (RMSE = 0.007, MAPE = 5.6%). For all six models, the ICA returned better results than the GA. The results of these two algorithms were also compared with a multi-layer perceptron and with existing equations.
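Two pieces of the comparison above are easy to state concretely: roulette wheel selection, which picks a parent with probability proportional to its fitness, and the RMSE and MAPE scores used to rank the algorithms. A minimal sketch, assuming positive fitness values; because the GA here minimizes an error, the fitness would have to be a transformed score (for example, an inverse error), which is an assumption rather than the authors' stated choice. The optional `u` argument is a hypothetical hook for reproducible draws.

```python
import random
from math import sqrt

def roulette_wheel_select(fitness, u=None):
    """Index chosen with probability proportional to fitness (all values assumed positive).
    u in [0, 1) may be supplied for a reproducible draw; otherwise random.random() is used."""
    total = sum(fitness)
    if u is None:
        u = random.random()
    threshold = u * total
    cumulative = 0.0
    for i, f in enumerate(fitness):
        cumulative += f
        if threshold < cumulative:
            return i
    return len(fitness) - 1  # guard against floating-point round-off at the upper end

def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mape(obs, pred):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)

# Hypothetical fitness values for three candidate parents; the third is twice as likely as each other
print(roulette_wheel_select([1.0, 1.0, 2.0]))
```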
Genetic risk scores and family history as predictors of schizophrenia in Nordic registers.
Lu, Y; Pouget, J G; Andreassen, O A; Djurovic, S; Esko, T; Hultman, C M; Metspalu, A; Milani, L; Werge, T; Sullivan, P F
2018-05-01
Family history is a long-standing and readily obtainable risk factor for schizophrenia (SCZ). Low-cost genotyping technologies have enabled large genetic studies of SCZ, and the results suggest the utility of genetic risk scores (GRS, direct assessments of inherited common variant risk). Few studies have evaluated family history and GRS simultaneously to ask whether one can explain away the other. We studied 5959 SCZ cases and 8717 controls from four Nordic countries. All subjects had family history data from national registers and genome-wide genotypes that were processed through the quality control procedures used by the Psychiatric Genomics Consortium. Using external training data, GRS were estimated for SCZ, bipolar disorder (BIP), major depression, autism, educational attainment, and body mass index. Multivariable modeling was used to estimate effect sizes. Using harmonized genomic and national register data from Denmark, Estonia, Norway, and Sweden, we confirmed that family history of SCZ and GRS for SCZ and BIP were risk factors for SCZ. In a joint model, the effects of GRS for SCZ and BIP were essentially unchanged, and the effect of family history was attenuated but remained significant. The predictive capacity of a model including GRS and family history neared the minimum for clinical utility. Combining national register data with measured genetic risk factors represents an important investigative approach for psychotic disorders. Our findings suggest the potential clinical utility of combining GRS and family history for early prediction and diagnostic improvements.
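A genetic risk score of the kind described above is commonly computed as a weighted sum of risk-allele dosages, with per-allele weights (for example, log odds ratios) taken from an external training GWAS. A minimal sketch under that assumption; the SNP names and weights are hypothetical, and real pipelines also handle allele alignment, LD clumping and p-value thresholds, which are omitted here.

```python
def genetic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies) over SNPs present in both inputs."""
    return sum(weights[snp] * d for snp, d in dosages.items() if snp in weights)

# Hypothetical per-allele log odds ratios from an external training GWAS
weights = {"snp_a": 0.10, "snp_b": -0.05, "snp_c": 0.20}
# Hypothetical genotype dosages for one person; snp_x has no weight and is ignored
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0, "snp_x": 1}
print(genetic_risk_score(person, weights))
```

In a joint model such as the one described, this score would enter alongside family history as a covariate.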
Widespread covariation of early environmental exposures and trait-associated polygenic variation.
Krapohl, E; Hannigan, L J; Pingault, J-B; Patel, H; Kadeva, N; Curtis, C; Breen, G; Newhouse, S J; Eley, T C; O'Reilly, P F; Plomin, R
2017-10-31
Although gene-environment correlation is recognized and investigated by family studies and recently by SNP-heritability studies, the possibility that genetic effects on traits capture environmental risk factors or protective factors has been neglected by polygenic prediction models. We investigated covariation between trait-associated polygenic variation identified by genome-wide association studies (GWASs) and specific environmental exposures, controlling for overall genetic relatedness using a genomic relatedness matrix restricted maximum-likelihood model. In a UK-representative sample (n = 6,710), we find widespread covariation between offspring trait-associated polygenic variation and parental behavior and characteristics relevant to children's developmental outcomes, independently of population stratification. For instance, offspring genetic risk for schizophrenia was associated with paternal age (R² = 0.002; P = 1e-04), and offspring education-associated variation was associated with variance in breastfeeding (R² = 0.021; P = 7e-30), maternal smoking during pregnancy (R² = 0.008; P = 5e-13), parental smacking (R² = 0.01; P = 4e-15), household income (R² = 0.032; P = 1e-22), watching television (R² = 0.034; P = 5e-47), and maternal education (R² = 0.065; P = 3e-96). Education-associated polygenic variation also captured covariation between environmental exposures and children's inattention/hyperactivity, conduct problems, and educational achievement. The finding that genetic variation identified by trait GWASs partially captures environmental risk factors or protective factors has direct implications for risk prediction models and the interpretation of GWAS findings.
Cardoso, F F; Tempelman, R J
2012-07-01
The objectives of this work were to assess alternative linear reaction norm (RN) models for genetic evaluation of Angus cattle in Brazil. That is, we investigated the interaction between genotypes and continuous descriptors of the environmental variation to examine evidence of genotype by environment interaction (G×E) in post-weaning BW gain (PWG) and to compare the environmental sensitivity of national and imported Angus sires. Data were collected by the Brazilian Angus Improvement Program from 1974 to 2005 and consisted of 63,098 records and a pedigree file with 95,896 animals. Six models were implemented using Bayesian inference and compared using the Deviance Information Criterion (DIC). The simplest model was M(1), a traditional animal model, which showed the largest DIC and hence the poorest fit when compared with the 4 alternative RN specifications accounting for G×E. In M(2), a 2-step procedure was implemented using the contemporary group posterior means of M(1) as the environmental gradient, ranging from -92.6 to +265.5 kg. Moreover, the benefits of jointly estimating all parameters in a 1-step approach were demonstrated by M(3). Additionally, we extended M(3) to allow for residual heteroskedasticity using an exponential function (M(4)) and the best fitting (smallest DIC) environmental classification model (M(5)) specification. Finally, M(6) added just heteroskedastic residual variance to M(1). For all RN models, heritabilities were lower in harsh environments and increased as production conditions improved. Rank correlations among genetic merit predictions obtained by M(1) and by the best fitting RN models M(3) (homoskedastic) and M(5) (heteroskedastic) at different environmental levels ranged from 0.79 to 0.81, suggesting biological importance of G×E in Brazilian Angus PWG. These results suggest that selection progress could be optimized by adopting environment-specific genetic merit predictions.
The PWG environmental sensitivity of imported North American origin bulls (0.046 ± 0.009) was significantly larger (P < 0.05) than that of local sires (0.012 ± 0.013). Moreover, PWG of progeny of imported sires exceeded that of native sires at medium and superior production levels. On the other hand, Angus cattle locally selected in Brazil tended to be more robust to environmental changes and hence to be more suitable when production environments for potential progeny are uncertain.
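In a linear reaction norm model, each sire's genetic merit is a line over the environmental gradient: predicted merit at environment level x is an intercept plus a slope times x, and the slope is the sire's environmental sensitivity. A minimal sketch with hypothetical sires, evaluated at two gradient values inside the range reported above, showing how differing slopes can re-rank sires between harsh and favourable environments; the helper names are hypothetical and the Spearman correlation here assumes no tied values.

```python
def merit_at(intercept, slope, x):
    """Reaction-norm prediction of genetic merit at environmental gradient value x."""
    return intercept + slope * x

def ranks(values):
    """Rank positions 1..n in ascending order (no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(a, b):
    """Spearman rank correlation, assuming no ties."""
    n = len(a)
    ra, rb = ranks(a), ranks(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical sires: (intercept in kg, slope = environmental sensitivity)
sires = [(10.0, 0.010), (12.0, 0.002), (8.0, 0.030)]
harsh = [merit_at(a, b, -90.0) for a, b in sires]
good = [merit_at(a, b, 260.0) for a, b in sires]
print(spearman(harsh, good))
```

With these made-up values the most sensitive sire ranks last in the harsh environment and first in the favourable one. That is an extreme version of the re-ranking behind rank correlations below 1 between environmental levels.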
Modeling Self-Healing of Concrete Using Hybrid Genetic Algorithm–Artificial Neural Network
Ramadan Suleiman, Ahmed; Nehdi, Moncef L.
2017-01-01
This paper presents an approach to predicting intrinsic self-healing in concrete using a hybrid genetic algorithm–artificial neural network (GA–ANN). A genetic algorithm was implemented in the network as a stochastic optimization tool for selecting the initial weights and biases. This approach can help the network reach a global optimum and avoid becoming trapped in local optima. The proposed model was trained and validated using a purpose-built database compiled from various experimental studies retrieved from the open literature. The model inputs include the cement content, water-to-cement ratio (w/c), type and dosage of supplementary cementitious materials, bio-healing materials, and both expansive and crystalline additives. The model output is self-healing, expressed as crack width. The results showed that the proposed GA–ANN model is capable of capturing the complex effects of various self-healing agents (e.g., biochemical material, silica-based additive, expansive and crystalline components) on the self-healing performance of cement-based materials. PMID:28772495
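The hybrid idea can be sketched as follows, assuming a tiny one-hidden-layer tanh network and synthetic data in place of the paper's database and architecture: a genetic algorithm searches for good initial weights and biases, and ordinary backpropagation then refines them. The GA operators (truncation selection, uniform crossover, Gaussian mutation, elitism) and every hyperparameter are illustrative choices, not those reported by the authors.

```python
import math
import random

# Hypothetical toy data: two scaled inputs (standing in for, e.g., w/c ratio and
# additive dosage) and a scaled "crack width" target. NOT data from the paper.
def make_data(n=40, seed=0):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append(((x1, x2), 0.5 * math.sin(2 * x1) + 0.3 * x1 * x2))
    return data

N_IN, N_HID = 2, 4
OFF = N_IN * N_HID + N_HID          # index where output-layer weights start
N_W = OFF + N_HID + 1               # total number of weights and biases

def forward(w, x):
    """One-hidden-layer tanh network; w is a flat list of N_W parameters."""
    hidden = []
    for j in range(N_HID):
        s = w[N_IN * N_HID + j] + sum(w[j * N_IN + i] * x[i] for i in range(N_IN))
        hidden.append(math.tanh(s))
    out = w[OFF + N_HID] + sum(w[OFF + j] * hidden[j] for j in range(N_HID))
    return out, hidden

def mse(w, data):
    return sum((forward(w, x)[0] - y) ** 2 for x, y in data) / len(data)

def ga_init(data, pop_size=30, gens=40, sigma=0.3, seed=1):
    """Evolve initial weights with truncation selection, uniform crossover,
    Gaussian mutation and elitism (so the best MSE never increases)."""
    rng = random.Random(seed)
    pop = [[rng.gauss(0, 1) for _ in range(N_W)] for _ in range(pop_size)]
    history = []
    for _ in range(gens):
        pop.sort(key=lambda w: mse(w, data))
        history.append(mse(pop[0], data))
        parents = pop[: pop_size // 2]
        children = [pop[0][:]]  # elite copied unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
            children.append([c + rng.gauss(0, sigma) for c in child])
        pop = children
    pop.sort(key=lambda w: mse(w, data))
    history.append(mse(pop[0], data))
    return pop[0], history

def fine_tune(w, data, lr=0.01, epochs=300):
    """Batch gradient descent (backpropagation) starting from the GA weights;
    returns the best weights seen, so the MSE cannot get worse."""
    w = w[:]
    best_w, best_loss = w[:], mse(w, data)
    for _ in range(epochs):
        grad = [0.0] * N_W
        for x, y in data:
            out, h = forward(w, x)
            err = 2 * (out - y) / len(data)              # dMSE/d(out)
            grad[OFF + N_HID] += err                     # output bias
            for j in range(N_HID):
                grad[OFF + j] += err * h[j]              # hidden-to-output weights
                dh = err * w[OFF + j] * (1 - h[j] ** 2)  # back through tanh
                grad[N_IN * N_HID + j] += dh             # hidden bias
                for i in range(N_IN):
                    grad[j * N_IN + i] += dh * x[i]      # input-to-hidden weights
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
        loss = mse(w, data)
        if loss < best_loss:
            best_w, best_loss = w[:], loss
    return best_w

if __name__ == "__main__":
    data = make_data()
    w0, hist = ga_init(data)
    w1 = fine_tune(w0, data)
    print(f"GA best MSE {hist[0]:.4f} -> {hist[-1]:.4f}; after backprop {mse(w1, data):.4f}")
```

Elitism guarantees that the GA's best fitness is monotone, and keeping the best checkpoint during fine-tuning means the gradient stage can only improve on the GA's starting point.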
Additive Genetic Variability and the Bayesian Alphabet
Gianola, Daniel; de los Campos, Gustavo; Hill, William G.; Manfredi, Eduardo; Fernando, Rohan
2009-01-01
The use of all available molecular markers in statistical models for prediction of quantitative traits has led to what could be termed a genomic-assisted selection paradigm in animal and plant breeding. This article provides a critical review of some theoretical and statistical concepts in the context of genomic-assisted genetic evaluation of animals and crops. First, relationships between the (Bayesian) variance of marker effects in some regression models and additive genetic variance are examined under standard assumptions. Second, the connection between marker genotypes and resemblance between relatives is explored, and links between a marker-based model and the infinitesimal model are reviewed. Third, issues associated with the use of Bayesian models for marker-assisted selection, with a focus on the role of the priors, are examined from a theoretical angle. The sensitivity of one proposed Bayesian specification, called “Bayes A”, to the choice of priors is illustrated with a simulation. Methods that can address potential shortcomings of some of these Bayesian regression procedures are discussed briefly. PMID:19620397
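Under Hardy-Weinberg and linkage equilibrium, and with marker effects treated as independent draws with common variance σ²_β, a commonly cited form of the relationship between marker-effect variance and additive variance is V_A = σ²_β Σ_j 2p_j(1 - p_j). The sketch below computes this quantity and checks it by simulation; the marker count, allele frequencies, and effect variance are hypothetical, and the review itself discusses the assumptions behind this relation in more depth than shown here.

```python
import random
import statistics

def additive_variance(freqs, var_beta):
    """V_A = var_beta * sum_j 2 p_j (1 - p_j): expected additive variance when
    marker effects are i.i.d. with variance var_beta, under Hardy-Weinberg
    and linkage equilibrium."""
    return var_beta * sum(2 * p * (1 - p) for p in freqs)

def realized_variance(freqs, betas):
    """Genetic variance for one fixed draw of effects: sum_j 2 p_j q_j beta_j^2."""
    return sum(2 * p * (1 - p) * b * b for p, b in zip(freqs, betas))

def simulate_genetic_values(freqs, betas, n_ind, rng):
    """g = sum_j x_j * beta_j with x_j ~ Binomial(2, p_j), loci independent."""
    values = []
    for _ in range(n_ind):
        g = 0.0
        for p, b in zip(freqs, betas):
            x = (rng.random() < p) + (rng.random() < p)  # allele count 0, 1 or 2
            g += x * b
        values.append(g)
    return values

if __name__ == "__main__":
    rng = random.Random(42)
    m, var_beta = 200, 0.01  # hypothetical marker number and effect variance
    freqs = [rng.uniform(0.05, 0.95) for _ in range(m)]
    betas = [rng.gauss(0.0, var_beta ** 0.5) for _ in range(m)]
    g = simulate_genetic_values(freqs, betas, 2000, rng)
    print("prior-expected V_A:", round(additive_variance(freqs, var_beta), 4))
    print("realized V_A      :", round(realized_variance(freqs, betas), 4))
    print("empirical var(g)  :", round(statistics.pvariance(g), 4))
```

The prior-expected value averages over effect draws, whereas the realized value conditions on one draw; the empirical variance of simulated genetic values should track the latter closely.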
Model Organisms Facilitate Rare Disease Diagnosis and Therapeutic Research
Wangler, Michael F.; Yamamoto, Shinya; Chao, Hsiao-Tuan; Posey, Jennifer E.; Westerfield, Monte; Postlethwait, John; Hieter, Philip; Boycott, Kym M.; Campeau, Philippe M.; Bellen, Hugo J.
2017-01-01
Efforts to identify the genetic underpinnings of rare undiagnosed diseases increasingly involve the use of next-generation sequencing and comparative genomic hybridization methods. These efforts are limited by a lack of knowledge regarding gene function, and an inability to predict the impact of genetic variation on the encoded protein function. Diagnostic challenges posed by undiagnosed diseases can often be addressed through model organism research, which provides a wealth of detailed biological information. Model organism geneticists are by necessity experts in particular genes, gene families, specific organs, and biological functions. Here, we review the current state of research into undiagnosed diseases, highlighting large efforts in North America and internationally, including the Undiagnosed Diseases Network (UDN) (Supplemental Material, File S1) and UDN International (UDNI), the Centers for Mendelian Genomics (CMG), and the Canadian Rare Diseases Models and Mechanisms Network (RDMM). We discuss how merging human genetics with model organism research guides experimental studies to solve these medical mysteries, gain new insights into disease pathogenesis, and uncover new therapeutic strategies. PMID:28874452
Contributions of Genes and Environment to Developmental Change in Alcohol Use.
Long, E C; Verhulst, B; Aggen, S H; Kendler, K S; Gillespie, N A
2017-09-01
The precise nature of how genetic and environmental risk factors influence changes in alcohol use (AU) over time has not yet been investigated. Therefore, the aim of the present study is to examine the nature of longitudinal changes in these risk factors for AU from mid-adolescence through young adulthood. Using a large sample of male twins, we compared five developmental models, each of which makes different predictions regarding longitudinal changes in genetic and environmental risks for AU. The best-fitting model indicated that genetic influences were consistent with a gradual growth in the liability to AU, whereas unique environmental risk factors were consistent with an accumulation of risks across time. These results imply that two distinct processes influence adolescent AU between the ages of 15 and 25. Genetic effects influence baseline levels of AU and rates of change across time, while unique environmental effects are more cumulative.
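The two processes in the best-fitting model can be contrasted through the covariance structures they imply across occasions. The sketch below is a generic illustration, not the authors' twin model: it assumes a latent growth form y_t = I + S·t and a pure accumulation (random-walk) form y_t = u_1 + ... + u_t, with hypothetical variances and measurement occasions indexed 1 to 5.

```python
import math

def growth_cov(s, t, var_i=1.0, var_s=0.1, cov_is=0.0):
    """Latent growth y_t = I + S*t: Cov(y_s, y_t) = Var(I) + (s+t)Cov(I,S) + s*t*Var(S)."""
    return var_i + (s + t) * cov_is + s * t * var_s

def accumulation_cov(s, t, var_u=1.0):
    """Accumulation y_t = u_1 + ... + u_t (i.i.d. shocks): Cov(y_s, y_t) = min(s, t)*Var(u)."""
    return min(s, t) * var_u

def corr(cov, s, t, **params):
    """Correlation between occasions s and t implied by a covariance function."""
    return cov(s, t, **params) / math.sqrt(cov(s, s, **params) * cov(t, t, **params))

if __name__ == "__main__":
    for t in [1, 2, 3, 4, 5]:  # hypothetical measurement waves
        print(f"t={t}  var_growth={growth_cov(t, t):.2f}  var_accum={accumulation_cov(t, t):.2f}  "
              f"r(1,t)_growth={corr(growth_cov, 1, t):.3f}  r(1,t)_accum={corr(accumulation_cov, 1, t):.3f}")
```

Under accumulation the correlation with the first occasion falls as 1/sqrt(t), whereas with the default growth parameters here it stays comparatively high because a shared intercept and slope drive every occasion.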
Hidalgo, A M; Bastiaansen, J W M; Lopes, M S; Veroneze, R; Groenen, M A M; de Koning, D-J
2015-07-01
Genomic selection is applied in dairy cattle breeding to improve the genetic progress of purebred (PB) animals, whereas in pigs and poultry the target is a crossbred (CB) animal, for which a different strategy appears to be needed. The source of information used to estimate the breeding values, i.e., phenotypes of CB or PB animals, may affect the accuracy of prediction. The objective of our study was to assess the direct genomic value (DGV) accuracy of CB and PB pigs using different sources of phenotypic information. Data were from 3 populations: 2,078 Dutch Landrace-based, 2,301 Large White-based, and 497 crossbreds from an F1 cross between the 2 lines. Two female reproduction traits were analyzed: gestation length (GLE) and total number of piglets born (TNB). Phenotypes used in the analyses originated from offspring of genotyped individuals. Phenotypes collected on CB and PB animals were analyzed as separate traits using a single-trait model. Breeding values were estimated separately for each trait in a pedigree BLUP analysis and subsequently deregressed. Deregressed EBV (DEBV) for each trait, originating from different sources (CB or PB offspring), were used to study the accuracy of genomic prediction. Accuracy of prediction was computed as the correlation between DGV and the DEBV of the validation population. Accuracy of prediction within PB populations ranged from 0.43 to 0.62 across GLE and TNB. Accuracies for predicting the genetic merit of CB animals with one PB population in the training set ranged from 0.12 to 0.28, with the exception of using the CB offspring phenotype of the Dutch Landrace, which resulted in an accuracy estimate of around 0 for both traits. Accuracies for predicting the genetic merit of CB animals with both parental PB populations in the training set ranged from 0.17 to 0.30. 
We conclude that prediction within population and trait had good predictive ability regardless of whether the trait was PB or CB performance, whereas using PB population(s) to predict the genetic merit of CB animals had zero to moderate predictive ability. We observed that the DGV accuracy of CB animals when training on PB data was greater than or equal to that obtained when training on CB data. However, when results were corrected for the different levels of reliability in the PB and CB training data, training on CB data outperformed training on PB data for predicting CB genetic merit, indicating that more CB animals should be phenotyped to increase the reliability and, consequently, the accuracy of DGV for CB genetic merit.
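Validation accuracy in this setting is a Pearson correlation between DGV and DEBV. Because DEBV is itself a noisy proxy for the true breeding value, one common adjustment divides that correlation by the square root of the mean DEBV reliability. The abstract does not state its exact correction, so the sketch below is an assumed, generic version, and all numbers in it are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def dgv_accuracy(dgv, debv, debv_reliability=None):
    """Validation accuracy r(DGV, DEBV); if reliabilities are given, divide by
    sqrt(mean DEBV reliability) to approximate r(DGV, true breeding value)."""
    r = pearson(dgv, debv)
    if debv_reliability is None:
        return r
    mean_rel = sum(debv_reliability) / len(debv_reliability)
    return r / math.sqrt(mean_rel)

if __name__ == "__main__":
    # Hypothetical validation animals: DGV, DEBV and DEBV reliabilities.
    dgv = [0.8, -0.3, 1.1, 0.2, -0.9, 0.5, -0.1, 0.7]
    debv = [1.2, -0.8, 0.6, 0.9, -1.1, 0.1, 0.4, 0.3]
    rel = [0.82, 0.88, 0.79, 0.90, 0.86, 0.84, 0.81, 0.87]
    print("raw accuracy      :", round(dgv_accuracy(dgv, debv), 3))
    print("corrected accuracy:", round(dgv_accuracy(dgv, debv, rel), 3))
```

The correction matters when comparing training sources: two sources with the same raw correlation but different DEBV reliabilities imply different accuracies against the true breeding value.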
Fire alters patterns of genetic diversity among 3 lizard species in Florida Scrub habitat.
Schrey, Aaron W; Ashton, Kyle G; Heath, Stacy; McCoy, Earl D; Mushinsky, Henry R
2011-01-01
The Florida Sand Skink (Plestiodon reynoldsi), the Florida Scrub Lizard (Sceloporus woodi), and the Six-lined Racerunner (Aspidoscelis sexlineata) occur in the threatened, fire-maintained Florida scrub habitat. Fire may have different consequences for the local genetic diversity of these species because each has a different microhabitat preference. We collected tissue samples of each species from 3 sites with different times since fire: Florida Sand Skink n = 73, Florida Scrub Lizard n = 70, and Six-lined Racerunner n = 66. We compared the effect of fire on genetic diversity at microsatellite loci for each species, screening 8 loci for the Florida Sand Skink, 6 loci for the Florida Scrub Lizard, and 6 loci for the Six-lined Racerunner. We also tested 2 potential mechanisms driving the observed change in genetic diversity: a metapopulation source/sink model and a local demographic model. Genetic diversity varied with fire history, and significant genetic differentiation occurred among sites. The Florida Scrub Lizard had the highest genetic variation at more recently burned sites, whereas the Florida Sand Skink and the Six-lined Racerunner had the highest genetic variation at less recently burned sites. Habitat preferences of the Florida Sand Skink and the Florida Scrub Lizard may explain their discordant results, whereas the Six-lined Racerunner may have a more complicated genetic response to fire or may be affected at a different geographic scale than the one we investigated. Our results indicate that these species may respond to fire in a more complicated manner than predicted by our metapopulation model or local demographic model. They also show that population-level responses in genetic diversity to fire are species-specific, which calls for conservation management that maintains habitat diversity through a mosaic of burn frequencies.
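The abstract does not name its genetic diversity statistic; expected heterozygosity is one standard measure for microsatellite loci, so the sketch below assumes it. It computes Nei's He = 1 - Σ p_i² per locus, optionally with the small-sample factor 2N/(2N - 1) for N diploid individuals, and averages over loci. The genotypes for the two sites are hypothetical.

```python
from collections import Counter

def expected_heterozygosity(genotypes, unbiased=True):
    """He = 1 - sum(p_i^2) at one microsatellite locus; genotypes are
    (allele, allele) tuples for diploid individuals."""
    alleles = [a for g in genotypes for a in g]
    n_alleles = len(alleles)  # 2N gene copies
    counts = Counter(alleles)
    he = 1.0 - sum((c / n_alleles) ** 2 for c in counts.values())
    if unbiased:
        he *= n_alleles / (n_alleles - 1)  # Nei's small-sample correction 2N/(2N-1)
    return he

def mean_he(loci, unbiased=True):
    """Average He across loci; loci is a list of per-locus genotype lists."""
    return sum(expected_heterozygosity(g, unbiased) for g in loci) / len(loci)

if __name__ == "__main__":
    # Hypothetical microsatellite genotypes (allele sizes in bp) at two loci.
    recently_burned = [
        [(150, 152), (150, 154), (152, 156), (154, 154)],
        [(201, 203), (203, 205), (201, 201), (205, 207)],
    ]
    long_unburned = [
        [(150, 150), (150, 152), (150, 150), (152, 152)],
        [(201, 203), (201, 201), (203, 203), (201, 203)],
    ]
    print("mean He, recently burned:", round(mean_he(recently_burned), 3))
    print("mean He, long unburned  :", round(mean_he(long_unburned), 3))
```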
Pérez-Garrido, Alfonso; Morales Helguera, Aliuska; Abellán Guillén, Adela; Cordeiro, M Natália D S; Garrido Escudero, Amalio
2009-01-15
This paper reports a QSAR study for predicting the complexation of a large and heterogeneous set of substances (233 organic compounds) with beta-cyclodextrins (beta-CDs). Several theoretical molecular descriptors, calculated solely from the molecular structure of the compounds under investigation, combined with an efficient variable selection procedure such as the Genetic Algorithm, led to models with satisfactory global accuracy and predictivity. The best final QSAR model is based on topological descriptors and also offers a reasonable interpretation. This QSAR model was able to explain approximately 84% of the variance in the experimental activity and displayed very good internal cross-validation statistics and predictivity on external data. It shows that the driving forces for CD complexation are mainly hydrophobic and steric (van der Waals) interactions. Thus, the results of our study provide a valuable tool for future screening and priority testing of beta-CD guest molecules.
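The share of variance explained (R²) and the internal leave-one-out cross-validation statistic can be illustrated with ordinary least squares. This sketch assumes a plain multiple linear regression with an intercept and defines Q²_LOO = 1 - PRESS/SS_tot; it does not reproduce the paper's descriptors, its genetic-algorithm selection step, or its data.

```python
def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(X, y):
    """Least squares with intercept via the normal equations (Z'Z) b = Z'y."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    ztz = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(ztz, zty)

def predict(b, row):
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], row))

def r2_and_q2_loo(X, y):
    """R^2 of the full fit and Q^2 from leave-one-out refits (PRESS)."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    b = fit_ols(X, y)
    rss = sum((yi - predict(b, row)) ** 2 for row, yi in zip(X, y))
    press = 0.0
    for i in range(len(y)):
        b_i = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])  # refit without compound i
        press += (y[i] - predict(b_i, X[i])) ** 2
    return 1 - rss / ss_tot, 1 - press / ss_tot

if __name__ == "__main__":
    # Hypothetical descriptor matrix (two descriptors) and activities.
    X = [[0, 1], [1, 0], [2, 3], [3, 1], [4, 4], [5, 2], [1, 5], [2, 2]]
    y = [0.3, 2.8, 2.1, 5.6, 5.25, 8.9, -1.65, 2.7]
    r2, q2 = r2_and_q2_loo(X, y)
    print(f"R2 = {r2:.3f}, Q2_LOO = {q2:.3f}")
```

For least squares each leave-one-out residual is at least as large as the corresponding fitted residual, so Q²_LOO never exceeds R²; a small gap between the two is what "good internal cross-validation" usually signals.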
Evaluation of a whole-farm model for pasture-based dairy systems.
Beukes, P C; Palliser, C C; Macdonald, K A; Lancaster, J A S; Levy, G; Thorrold, B S; Wastney, M E
2008-06-01
In the temperate climate of New Zealand, animals can be grazed outdoors all year round. The pasture is supplemented with conserved feed, the amount of which is determined by seasonal pasture growth, genetics of the herd, and stocking rate. The large number of factors that affect production makes it impractical and expensive to use field trials to explore all the farm system options. A model of an in situ-grazed pasture system has been developed to provide a tool for developing and testing novel farm systems, for example, systems with different levels of bought-in supplements and different levels of nitrogen fertilizer application, to maintain sustainability or environmental integrity and profitability. It consists of a software framework that links climate information, on a daily basis, with dynamic, mechanistic component models for pasture growth and animal metabolism, as well as management policies. A unique feature is that the component models were developed and published by other groups and are retained in their original software language. The aim of this study was to compare the model, called the whole-farm model (WFM), with a farm trial that was conducted over 3 yr and in which data were collected specifically for evaluating the WFM. Data from the first year were used to develop the WFM, and data from the second and third years were used to evaluate it. The model predicted annual pasture production, end-of-season cow liveweight, cow body condition score, and pasture cover across the season with relative prediction error <20%. Milk yield and milksolids (fat + protein) were overpredicted by approximately 30% even though both annual and monthly pasture and supplement intake were predicted with acceptable accuracy, suggesting that the metabolic conversion of feed to fat, protein, and lactose in the mammary gland needs to be refined. 
Because feed growth and intake predictions were acceptable, economic predictions can be made using the WFM, with an adjustment for milk yield, to test different management policies, alterations in climate, or the use of genetically improved animals, pastures, or crops.
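Relative prediction error is commonly computed as the root mean squared prediction error divided by the observed mean, expressed in percent. The sketch below assumes that definition (the abstract does not spell it out) and uses hypothetical observations; a signed mean bias is added to show how a systematic over-prediction, like the roughly 30% reported for milk yield, would appear.

```python
import math

def relative_prediction_error(observed, predicted):
    """RPE (%) = 100 * sqrt(MSPE) / mean(observed), MSPE = mean squared prediction error."""
    n = len(observed)
    mspe = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n
    return 100.0 * math.sqrt(mspe) / (sum(observed) / n)

def mean_bias_pct(observed, predicted):
    """Signed mean bias as % of the observed mean (positive = over-prediction)."""
    mo = sum(observed) / len(observed)
    mp = sum(predicted) / len(predicted)
    return 100.0 * (mp - mo) / mo

if __name__ == "__main__":
    # Hypothetical monthly values (e.g. pasture cover, kg DM/ha), NOT trial data.
    obs = [2100, 2250, 2400, 2300, 2050, 1950]
    pred = [2000, 2350, 2500, 2150, 2100, 2050]
    rpe = relative_prediction_error(obs, pred)
    print(f"RPE = {rpe:.1f}% ({'below' if rpe < 20 else 'above'} the 20% threshold)")
    print(f"mean bias = {mean_bias_pct(obs, pred):+.1f}%")
```

RPE alone cannot distinguish random scatter from a consistent offset, which is why a signed bias term is useful when a model, like the WFM for milk yield, over-predicts systematically.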