Sample records for statistical genetic analysis

  1. Do-it-yourself statistics: A computer-assisted likelihood approach to analysis of data from genetic crosses.

    PubMed Central

    Robbins, L G

    2000-01-01

    Graduate school programs in genetics have become so full that courses in statistics have often been eliminated. In addition, typical introductory statistics courses for the "statistics user" rather than the nascent statistician are laden with methods for analysis of measured variables while genetic data are most often discrete numbers. These courses are often seen by students and genetics professors alike as largely irrelevant cookbook courses. The powerful methods of likelihood analysis, although commonly employed in human genetics, are much less often used in other areas of genetics, even though current computational tools make this approach readily accessible. This article introduces the MLIKELY.PAS computer program and the logic of do-it-yourself maximum-likelihood statistics. The program itself, course materials, and expanded discussions of some examples that are only summarized here are available at http://www.unisi.it/ricerca/dip/bio_evol/sitomlikely/mlikely.html. PMID:10628965
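
    The likelihood approach named in this abstract can be illustrated with a minimal sketch. It is not MLIKELY.PAS and does not reproduce its interface; it assumes a simple two-locus testcross in which offspring are scored as recombinant or parental, so the likelihood of the recombination fraction r is binomial. The function names and the counts are hypothetical.

```python
import math

def _xlogy(x, y):
    """Return x * log(y), treating 0 * log(0) as 0."""
    return 0.0 if x == 0 else x * math.log(y)

def log_likelihood(r, recombinants, parentals):
    """Binomial log-likelihood of recombination fraction r (constant term dropped)."""
    return _xlogy(recombinants, r) + _xlogy(parentals, 1.0 - r)

def fit_recombination_fraction(recombinants, parentals):
    """Return (MLE of r, likelihood-ratio statistic G against free recombination, r = 0.5)."""
    n = recombinants + parentals
    r_hat = recombinants / n  # closed-form maximum of the binomial likelihood
    g = 2.0 * (log_likelihood(r_hat, recombinants, parentals)
               - log_likelihood(0.5, recombinants, parentals))
    return r_hat, g

# Hypothetical testcross: 20 recombinant and 80 parental offspring.
r_hat, g = fit_recombination_fraction(20, 80)
# Upper tail of a chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(g / 2.0))
print(f"r_hat = {r_hat:.3f}, G = {g:.2f}, p = {p_value:.2e}")
```

    The same pattern (write the log-likelihood, maximize it, compare nested hypotheses with twice the log-likelihood difference) extends to other discrete cross data, although models without a closed-form maximum would need a numerical optimizer.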

  2. SimHap GUI: an intuitive graphical user interface for genetic association analysis.

    PubMed

    Carter, Kim W; McCaskie, Pamela A; Palmer, Lyle J

    2008-12-25

    Researchers wishing to conduct genetic association analysis involving single nucleotide polymorphisms (SNPs) or haplotypes are often confronted with the lack of user-friendly graphical analysis tools, requiring sophisticated statistical and informatics expertise to perform relatively straightforward tasks. Tools such as the SimHap package for the R statistics language provide the necessary statistical operations to conduct sophisticated genetic analysis, but lack a graphical user interface that would allow anyone other than a professional statistician to utilise the tool effectively. We have developed SimHap GUI, a cross-platform integrated graphical analysis tool for conducting epidemiological, single SNP and haplotype-based association analysis. SimHap GUI features a novel workflow interface that guides the user through each logical step of the analysis process, making it accessible to both novice and advanced users. This tool provides a seamless interface to the SimHap R package, while providing enhanced functionality such as sophisticated data checking, automated data conversion, and real-time estimations of haplotype simulation progress. SimHap GUI provides a novel, easy-to-use, cross-platform solution for conducting a range of genetic and non-genetic association analyses. This provides a free alternative to commercial statistics packages that is specifically designed for genetic association analysis.

  3. SimHap GUI: An intuitive graphical user interface for genetic association analysis

    PubMed Central

    Carter, Kim W; McCaskie, Pamela A; Palmer, Lyle J

    2008-01-01

    Background: Researchers wishing to conduct genetic association analysis involving single nucleotide polymorphisms (SNPs) or haplotypes are often confronted with the lack of user-friendly graphical analysis tools, requiring sophisticated statistical and informatics expertise to perform relatively straightforward tasks. Tools such as the SimHap package for the R statistics language provide the necessary statistical operations to conduct sophisticated genetic analysis, but lack a graphical user interface that would allow anyone other than a professional statistician to utilise the tool effectively. Results: We have developed SimHap GUI, a cross-platform integrated graphical analysis tool for conducting epidemiological, single SNP and haplotype-based association analysis. SimHap GUI features a novel workflow interface that guides the user through each logical step of the analysis process, making it accessible to both novice and advanced users. This tool provides a seamless interface to the SimHap R package, while providing enhanced functionality such as sophisticated data checking, automated data conversion, and real-time estimations of haplotype simulation progress. Conclusion: SimHap GUI provides a novel, easy-to-use, cross-platform solution for conducting a range of genetic and non-genetic association analyses. This provides a free alternative to commercial statistics packages that is specifically designed for genetic association analysis. PMID:19109877

  4. A quadratically regularized functional canonical correlation analysis for identifying the global structure of pleiotropy with NGS data

    PubMed Central

    Zhu, Yun; Fan, Ruzong; Xiong, Momiao

    2017-01-01

    Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information to achieve deep understanding of the complex genetic structures of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple phenotype association analysis paradigm lacks breadth (number of phenotypes and genetic variants jointly analyzed at the same time) and depth (hierarchical structure of phenotype and genotypes). A key issue for high dimensional pleiotropic analysis is to effectively extract informative internal representation and features from high dimensional genotype and phenotype data. To explore correlation information of genetic variants, effectively reduce data dimensions, and overcome critical barriers in advancing the development of novel statistical methods and computational algorithms for genetic pleiotropic analysis, we proposed a new statistical method, referred to as quadratically regularized functional CCA (QRFCCA), for association analysis. It combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that the QRFCCA has a much higher power than that of the ten competing statistics while retaining the appropriate type 1 errors. To further evaluate performance, the QRFCCA and ten other statistics are applied to the whole genome sequencing dataset from the TwinsUK study. We identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits using QRFCCA. The results show that the QRFCCA substantially outperforms the ten other statistics. PMID:29040274

  5. A simulations approach for meta-analysis of genetic association studies based on additive genetic model.

    PubMed

    John, Majnu; Lencz, Todd; Malhotra, Anil K; Correll, Christoph U; Zhang, Jian-Ping

    2018-06-01

    Meta-analysis of genetic association studies is being increasingly used to assess phenotypic differences between genotype groups. When the underlying genetic model is assumed to be dominant or recessive, assessing the phenotype differences based on summary statistics, reported for individual studies in a meta-analysis, is a valid strategy. However, when the genetic model is additive, a similar strategy based on summary statistics will lead to biased results. This fact about the additive model is one of the things that we establish in this paper, using simulations. The main goal of this paper is to present an alternate strategy for the additive model based on simulating data for the individual studies. We show that the alternate strategy is far superior to the strategy based on summary statistics.

  6. mvMapper: statistical and geographical data exploration and visualization of multivariate analysis of population structure

    USDA-ARS's Scientific Manuscript database

    Characterizing population genetic structure across geographic space is a fundamental challenge in population genetics. Multivariate statistical analyses are powerful tools for summarizing genetic variability, but geographic information and accompanying metadata is not always easily integrated into t...

  7. MetaGenyo: a web tool for meta-analysis of genetic association studies.

    PubMed

    Martorell-Marugan, Jordi; Toro-Dominguez, Daniel; Alarcon-Riquelme, Marta E; Carmona-Saez, Pedro

    2017-12-16

    Genetic association studies (GAS) aim to evaluate the association between genetic variants and phenotypes. In the last few years, the number of studies of this type has increased exponentially, but the results are not always reproducible due to experimental designs, low sample sizes and other methodological errors. In this field, meta-analysis techniques are becoming very popular tools to combine results across studies to increase statistical power and to resolve discrepancies in genetic association studies. A meta-analysis summarizes research findings, increases statistical power and enables the identification of genuine associations between genotypes and phenotypes. Meta-analysis techniques are increasingly used in GAS, but the number of published meta-analyses containing errors is also increasing. Although there are several software packages that implement meta-analysis, none of them are specifically designed for genetic association studies and in most cases their use requires advanced programming or scripting expertise. We have developed MetaGenyo, a web tool for meta-analysis in GAS. MetaGenyo implements a complete and comprehensive workflow that can be executed in an easy-to-use environment without programming knowledge. MetaGenyo has been developed to guide users through the main steps of a GAS meta-analysis, covering Hardy-Weinberg test, statistical association for different genetic models, analysis of heterogeneity, testing for publication bias, subgroup analysis and robustness testing of the results. MetaGenyo is a useful tool to conduct comprehensive genetic association meta-analysis. The application is freely available at http://bioinfo.genyo.es/metagenyo/ .
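
    The first workflow step listed in this abstract, the Hardy-Weinberg test, can be sketched as a chi-square goodness-of-fit test on genotype counts at a biallelic locus. This is a generic textbook version written for illustration; whether MetaGenyo uses this exact variant of the test is not stated in the abstract, and the function name and counts below are hypothetical.

```python
import math

def hardy_weinberg_test(n_aa, n_ab, n_bb):
    """Chi-square test (1 degree of freedom) of Hardy-Weinberg proportions.

    Arguments are observed counts of the three genotypes at a biallelic locus.
    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of the first allele
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic locus) to avoid division by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical control-group genotype counts from one study.
chi2, p_value = hardy_weinberg_test(298, 489, 213)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

    In a meta-analysis setting the test would typically be applied to the control group of each included study, with studies that deviate strongly from Hardy-Weinberg proportions flagged for closer inspection.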

  8. Gene Level Meta-Analysis of Quantitative Traits by Functional Linear Models.

    PubMed

    Fan, Ruzong; Wang, Yifan; Boehnke, Michael; Chen, Wei; Li, Yun; Ren, Haobo; Lobach, Iryna; Xiong, Momiao

    2015-08-01

    Meta-analysis of genetic data must account for differences among studies including study designs, markers genotyped, and covariates. The effects of genetic variants may differ from population to population, i.e., heterogeneity. Thus, meta-analysis that combines data from multiple studies is difficult. Novel statistical methods for meta-analysis are needed. In this article, functional linear models are developed for meta-analyses that connect genetic data to quantitative traits, adjusting for covariates. The models can be used to analyze rare variants, common variants, or a combination of the two. Both likelihood-ratio test (LRT) and F-distributed statistics are introduced to test association between quantitative traits and multiple variants in one genetic region. Extensive simulations are performed to evaluate empirical type I error rates and power performance of the proposed tests. The proposed LRT and F-distributed statistics control the type I error very well and have higher power than the existing methods of the meta-analysis sequence kernel association test (MetaSKAT). We analyze four blood lipid levels in data from a meta-analysis of eight European studies. The proposed methods detect more significant associations than MetaSKAT and the P-values of the proposed LRT and F-distributed statistics are usually much smaller than those of MetaSKAT. The functional linear models and related test statistics can be useful in whole-genome and whole-exome association studies. Copyright © 2015 by the Genetics Society of America.

  9. Genetic structure of populations and differentiation in forest trees

    Treesearch

    Raymond P. Guries; F. Thomas Ledig

    1981-01-01

    Electrophoretic techniques permit population biologists to analyze genetic structure of natural populations by using large numbers of allozyme loci. Several methods of analysis have been applied to allozyme data, including chi-square contingency tests, F-statistics, and genetic distance. This paper compares such statistics for pitch pine (Pinus rigida...

  10. Some Conceptual Deficiencies in "Developmental" Behavior Genetics.

    ERIC Educational Resources Information Center

    Gottlieb, Gilbert

    1995-01-01

    Criticizes the application of the statistical procedures of the population-genetic approach within evolutionary biology to the study of psychological development. Argues that the application of the statistical methods of population genetics--primarily the analysis of variance--to the causes of psychological development is bound to result in a…

  11. Meta-analysis of correlated traits via summary statistics from GWASs with an application in hypertension.

    PubMed

    Zhu, Xiaofeng; Feng, Tao; Tayo, Bamidele O; Liang, Jingjing; Young, J Hunter; Franceschini, Nora; Smith, Jennifer A; Yanek, Lisa R; Sun, Yan V; Edwards, Todd L; Chen, Wei; Nalls, Mike; Fox, Ervin; Sale, Michele; Bottinger, Erwin; Rotimi, Charles; Liu, Yongmei; McKnight, Barbara; Liu, Kiang; Arnett, Donna K; Chakravati, Aravinda; Cooper, Richard S; Redline, Susan

    2015-01-08

    Genome-wide association studies (GWASs) have identified many genetic variants underlying complex traits. Many detected genetic loci harbor variants that associate with multiple, even distinct, traits. Most current analysis approaches focus on single traits, even though the final results from multiple traits are evaluated together. Such approaches miss the opportunity to systemically integrate the phenome-wide data available for genetic association analysis. In this study, we propose a general approach that can integrate association evidence from summary statistics of multiple traits, either correlated, independent, continuous, or binary traits, which might come from the same or different studies. We allow for trait heterogeneity effects. Population structure and cryptic relatedness can also be controlled. Our simulations suggest that the proposed method has improved statistical power over single-trait analysis in most of the cases we studied. We applied our method to the Continental Origins and Genetic Epidemiology Network (COGENT) African ancestry samples for three blood pressure traits and identified four loci (CHIC2, HOXA-EVX1, IGFBP1/IGFBP3, and CDH17; p < 5.0 × 10^-8) associated with hypertension-related traits that were missed by a single-trait analysis in the original report. Six additional loci with suggestive association evidence (p < 5.0 × 10^-7) were also observed, including CACNA1D and WNT3. Our study strongly suggests that analyzing multiple phenotypes can improve statistical power and that such analysis can be executed with the summary statistics from GWASs. Our method also provides a way to study a cross phenotype (CP) association by using summary statistics from GWASs of multiple phenotypes. Copyright © 2015 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  12. Gene-Based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions.

    PubMed

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y; Chen, Wei

    2016-02-01

    Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, here we develop Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have power higher than or similar to that of the Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in the whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example. © 2016 WILEY PERIODICALS, INC.

  13. Gene-based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions

    PubMed Central

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E.; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y.; Chen, Wei

    2015-01-01

    Summary: Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, we develop here Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have power higher than or similar to that of the Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in the whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example. PMID:26782979

  14. Methods of analysis and resources available for genetic trait mapping.

    PubMed

    Ott, J

    1999-01-01

    Methods of genetic linkage analysis are reviewed and put in context with other mapping techniques. Sources of information are outlined (books, web sites, computer programs). Special consideration is given to statistical problems in canine genetic mapping (heterozygosity, inbreeding, marker maps).

  15. Multiple Phenotype Association Tests Using Summary Statistics in Genome-Wide Association Studies

    PubMed Central

    Liu, Zhonghua; Lin, Xihong

    2017-01-01

    Summary We study in this paper jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. PMID:28653391
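
    The step of estimating the between-phenotype correlation matrix from summary statistics can be sketched as follows. This is an illustration under stated assumptions, not the authors' estimator: it takes per-variant z-scores for each phenotype, assumes most variants are null so that the correlation of z-scores across variants reflects the correlation among phenotypes, and uses the plain sample Pearson correlation. All names and the simulated data are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def estimate_phenotype_correlation(z_by_phenotype):
    """Return the K x K correlation matrix of z-scores.

    z_by_phenotype is a list of K lists; each inner list holds the z-scores
    of the same M variants for one phenotype.
    """
    k = len(z_by_phenotype)
    return [[1.0 if i == j else pearson(z_by_phenotype[i], z_by_phenotype[j])
             for j in range(k)] for i in range(k)]

# Simulated null z-scores for two phenotypes with true correlation 0.6.
random.seed(0)
rho = 0.6
z1, z2 = [], []
for _ in range(5000):
    a, b = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    z1.append(a)
    z2.append(rho * a + math.sqrt(1.0 - rho ** 2) * b)

R = estimate_phenotype_correlation([z1, z2])
print(f"estimated correlation: {R[0][1]:.3f}")
```

    In practice, variants with strong association signals are often excluded before computing such a correlation so that true effects do not distort the estimate; that filtering step is omitted here.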

  16. Multiple phenotype association tests using summary statistics in genome-wide association studies.

    PubMed

    Liu, Zhonghua; Lin, Xihong

    2018-03-01

    We study in this article jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. © 2017, The International Biometric Society.

  17. Teaching Principles of Linkage and Gene Mapping with the Tomato.

    ERIC Educational Resources Information Center

    Hawk, James A.; And Others

    1980-01-01

    A three-point linkage system in tomatoes is used to explain concepts of gene mapping, linking and statistical analysis. The system is designed for teaching the effective use of statistics, and the power of genetic analysis from statistical analysis of phenotypic ratios. (Author/SA)

  18. Genome-wide association analysis of secondary imaging phenotypes from the Alzheimer's disease neuroimaging initiative study.

    PubMed

    Zhu, Wensheng; Yuan, Ying; Zhang, Jingwen; Zhou, Fan; Knickmeyer, Rebecca C; Zhu, Hongtu

    2017-02-01

    The aim of this paper is to systematically evaluate a biased sampling issue associated with genome-wide association analysis (GWAS) of imaging phenotypes for most imaging genetic studies, including the Alzheimer's Disease Neuroimaging Initiative (ADNI). Specifically, the original sampling scheme of these imaging genetic studies is primarily the retrospective case-control design, whereas most existing statistical analyses of these studies ignore such sampling scheme by directly correlating imaging phenotypes (called the secondary traits) with genotype. Although it has been well documented in genetic epidemiology that ignoring the case-control sampling scheme can produce highly biased estimates, and subsequently lead to misleading results and suspicious associations, such findings are not well documented in imaging genetics. We use extensive simulations and a large-scale imaging genetic data analysis of the Alzheimer's Disease Neuroimaging Initiative (ADNI) data to evaluate the effects of the case-control sampling scheme on GWAS results based on some standard statistical methods, such as linear regression methods, while comparing it with several advanced statistical methods that appropriately adjust for the case-control sampling scheme. Copyright © 2016 Elsevier Inc. All rights reserved.

  19. Genome-wide scans of genetic variants for psychophysiological endophenotypes: a methodological overview.

    PubMed

    Iacono, William G; Malone, Stephen M; Vaidyanathan, Uma; Vrieze, Scott I

    2014-12-01

    This article provides an introductory overview of the investigative strategy employed to evaluate the genetic basis of 17 endophenotypes examined as part of a 20-year data collection effort from the Minnesota Center for Twin and Family Research. Included are characterization of the study samples, descriptive statistics for key properties of the psychophysiological measures, and rationale behind the steps taken in the molecular genetic study design. The statistical approach included (a) biometric analysis of twin and family data, (b) heritability analysis using 527,829 single nucleotide polymorphisms (SNPs), (c) genome-wide association analysis of these SNPs and 17,601 autosomal genes, (d) follow-up analyses of candidate SNPs and genes hypothesized to have an association with each endophenotype, (e) rare variant analysis of nonsynonymous SNPs in the exome, and (f) whole genome sequencing association analysis using 27 million genetic variants. These methods were used in the accompanying empirical articles comprising this special issue, Genome-Wide Scans of Genetic Variants for Psychophysiological Endophenotypes. Copyright © 2014 Society for Psychophysiological Research.

  20. Markov Logic Networks in the Analysis of Genetic Data

    PubMed Central

    Sakhanenko, Nikita A.

    2010-01-01

    Abstract Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for genotype-phenotype associations. These statistical methods view the compound effects of multiple genes on a phenotype as a sum of influences of each gene and often miss a substantial part of the heritable effect. Such methods do not use any biological knowledge about underlying mechanisms. Modeling approaches from the artificial intelligence (AI) field that incorporate deterministic knowledge into models to perform statistical analysis can be applied to include prior knowledge in genetic analysis. We chose to use the most general such approach, Markov Logic Networks (MLNs), for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we can replicate the results of traditional statistical methods, but we also show that we are able to go beyond finding independent markers linked to a phenotype by using joint inference without an independence assumption. The method is applied to genetic data on yeast sporulation, a complex phenotype with gene interactions. In addition to detecting all of the previously identified loci associated with sporulation, our method identifies four loci with smaller effects. Since their effect on sporulation is small, these four loci were not detected with methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which can be used as a general framework for incorporating systems biology with genetics. PMID:20958249

  1. Model-Based Linkage Analysis of a Quantitative Trait.

    PubMed

    Song, Yeunjoo E; Song, Sunah; Schnell, Audrey H

    2017-01-01

    Linkage Analysis is a family-based method of analysis to examine whether any typed genetic markers cosegregate with a given trait, in this case a quantitative trait. If linkage exists, this is taken as evidence in support of a genetic basis for the trait. Historically, linkage analysis was performed using a binary disease trait, but has been extended to include quantitative disease measures. Quantitative traits are desirable as they provide more information than binary traits. Linkage analysis can be performed using single-marker methods (one marker at a time) or multipoint (using multiple markers simultaneously). In model-based linkage analysis the genetic model for the trait of interest is specified. There are many software options for performing linkage analysis. Here, we use the program package Statistical Analysis for Genetic Epidemiology (S.A.G.E.). S.A.G.E. was chosen because it also includes programs to perform data cleaning procedures and to generate and test genetic models for a quantitative trait, in addition to performing linkage analysis. We demonstrate in detail the process of running the program LODLINK to perform single-marker analysis, and MLOD to perform multipoint analysis using output from SEGREG, where SEGREG was used to determine the best fitting statistical model for the trait.

  2. Application of the Linux cluster for exhaustive window haplotype analysis using the FBAT and Unphased programs.

    PubMed

    Mishima, Hiroyuki; Lidral, Andrew C; Ni, Jun

    2008-05-28

    Genetic association studies have been used to map disease-causing genes. A newly introduced statistical method, called exhaustive haplotype association study, analyzes genetic information consisting of different numbers and combinations of DNA sequence variations along a chromosome. Such studies involve a large number of statistical calculations and consequently require high computing power. It is possible to develop parallel algorithms and codes to perform the calculations on a high performance computing (HPC) system. However, most existing commonly-used statistical packages for genetic studies are non-parallel versions. Alternatively, one may use the cutting-edge technology of grid computing and its packages to run non-parallel genetic statistical packages on a centralized HPC system or distributed computing systems. In this paper, we report the utilization of a queuing scheduler built on the Grid Engine and run on a Rocks Linux cluster for our genetic statistical studies. Analysis of both consecutive and combinational window haplotypes was conducted by the FBAT (Laird et al., 2000) and Unphased (Dudbridge, 2003) programs. The dataset consisted of 26 loci from 277 extended families (1484 persons). Using the Rocks Linux cluster with 22 compute-nodes, FBAT jobs performed about 14.4-15.9 times faster, while Unphased jobs performed 1.1-18.6 times faster compared to the accumulated computation duration. Execution of exhaustive haplotype analysis using non-parallel software packages on a Linux-based system is an effective and efficient approach in terms of cost and performance.
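
    The speedup figures quoted in this abstract can be read as a ratio of accumulated job time to elapsed time. The sketch below assumes that "accumulated computation duration" means the sum of the individual job durations, which is an interpretation rather than something the abstract states; the function names and the job timings are hypothetical.

```python
def speedup(job_durations_s, wall_clock_s):
    """Ratio of accumulated (summed) job time to elapsed wall-clock time."""
    return sum(job_durations_s) / wall_clock_s

def parallel_efficiency(speedup_value, compute_nodes):
    """Fraction of the ideal speedup achieved on the given number of nodes."""
    return speedup_value / compute_nodes

# Hypothetical batch: 44 independent jobs of 100 s each, finished in 300 s
# of wall-clock time on a 22-node cluster.
durations = [100.0] * 44
s = speedup(durations, 300.0)
e = parallel_efficiency(s, 22)
print(f"speedup = {s:.1f}x, efficiency = {e:.0%}")
```

    Because each haplotype window is an independent job, a scheduler can approach the ideal speedup only when job lengths are well balanced across nodes, which may explain the wide range reported for the Unphased jobs.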

  3. Application of the Linux cluster for exhaustive window haplotype analysis using the FBAT and Unphased programs

    PubMed Central

    Mishima, Hiroyuki; Lidral, Andrew C; Ni, Jun

    2008-01-01

    Background: Genetic association studies have been used to map disease-causing genes. A newly introduced statistical method, called exhaustive haplotype association study, analyzes genetic information consisting of different numbers and combinations of DNA sequence variations along a chromosome. Such studies involve a large number of statistical calculations and consequently require high computing power. It is possible to develop parallel algorithms and codes to perform the calculations on a high performance computing (HPC) system. However, most existing commonly-used statistical packages for genetic studies are non-parallel versions. Alternatively, one may use the cutting-edge technology of grid computing and its packages to run non-parallel genetic statistical packages on a centralized HPC system or distributed computing systems. In this paper, we report the utilization of a queuing scheduler built on the Grid Engine and run on a Rocks Linux cluster for our genetic statistical studies. Results: Analysis of both consecutive and combinational window haplotypes was conducted by the FBAT (Laird et al., 2000) and Unphased (Dudbridge, 2003) programs. The dataset consisted of 26 loci from 277 extended families (1484 persons). Using the Rocks Linux cluster with 22 compute-nodes, FBAT jobs performed about 14.4–15.9 times faster, while Unphased jobs performed 1.1–18.6 times faster compared to the accumulated computation duration. Conclusion: Execution of exhaustive haplotype analysis using non-parallel software packages on a Linux-based system is an effective and efficient approach in terms of cost and performance. PMID:18541045

  4. Quantitative trait nucleotide analysis using Bayesian model selection.

    PubMed

    Blangero, John; Goring, Harald H H; Kent, Jack W; Williams, Jeff T; Peterson, Charles P; Almasy, Laura; Dyer, Thomas D

    2005-10-01

    Although much attention has been given to statistical genetic methods for the initial localization and fine mapping of quantitative trait loci (QTLs), little methodological work has been done to date on the problem of statistically identifying the most likely functional polymorphisms using sequence data. In this paper we provide a general statistical genetic framework, called Bayesian quantitative trait nucleotide (BQTN) analysis, for assessing the likely functional status of genetic variants. The approach requires the initial enumeration of all genetic variants in a set of resequenced individuals. These polymorphisms are then typed in a large number of individuals (potentially in families), and marker variation is related to quantitative phenotypic variation using Bayesian model selection and averaging. For each sequence variant a posterior probability of effect is obtained and can be used to prioritize additional molecular functional experiments. An example of this quantitative nucleotide analysis is provided using the GAW12 simulated data. The results show that the BQTN method may be useful for choosing the most likely functional variants within a gene (or set of genes). We also include instructions on how to use our computer program, SOLAR, for association analysis and BQTN analysis.

  5. A Powerful Approach to Estimating Annotation-Stratified Genetic Covariance via GWAS Summary Statistics.

    PubMed

    Lu, Qiongshi; Li, Boyang; Ou, Derek; Erlendsdottir, Margret; Powles, Ryan L; Jiang, Tony; Hu, Yiming; Chang, David; Jin, Chentian; Dai, Wei; He, Qidu; Liu, Zefeng; Mukherjee, Shubhabrata; Crane, Paul K; Zhao, Hongyu

    2017-12-07

    Despite the success of large-scale genome-wide association studies (GWASs) on complex traits, our understanding of their genetic architecture is far from complete. Jointly modeling multiple traits' genetic profiles has provided insights into the shared genetic basis of many complex traits. However, large-scale inference sets a high bar for both statistical power and biological interpretability. Here we introduce a principled framework to estimate annotation-stratified genetic covariance between traits using GWAS summary statistics. Through theoretical and numerical analyses, we demonstrate that our method provides accurate covariance estimates, thereby enabling researchers to dissect both the shared and distinct genetic architecture across traits to better understand their etiologies. Among 50 complex traits with publicly accessible GWAS summary statistics (N total ≈ 4.5 million), we identified more than 170 pairs with statistically significant genetic covariance. In particular, we found strong genetic covariance between late-onset Alzheimer disease (LOAD) and amyotrophic lateral sclerosis (ALS), two major neurodegenerative diseases, in single-nucleotide polymorphisms (SNPs) with high minor allele frequencies and in SNPs located in the predicted functional genome. Joint analysis of LOAD, ALS, and other traits highlights LOAD's correlation with cognitive traits and hints at an autoimmune component for ALS. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  6. Heuristic Identification of Biological Architectures for Simulating Complex Hierarchical Genetic Interactions

    PubMed Central

    Moore, Jason H; Amos, Ryan; Kiralis, Jeff; Andrews, Peter C

    2015-01-01

    Simulation plays an essential role in the development of new computational and statistical methods for the genetic analysis of complex traits. Most simulations start with a statistical model using methods such as linear or logistic regression that specify the relationship between genotype and phenotype. This is appealing due to its simplicity and because these statistical methods are commonly used in genetic analysis. It is our working hypothesis that simulations need to move beyond simple statistical models to more realistically represent the biological complexity of genetic architecture. The goal of the present study was to develop a prototype genotype–phenotype simulation method and software that are capable of simulating complex genetic effects within the context of a hierarchical biology-based framework. Specifically, our goal is to simulate multilocus epistasis or gene–gene interaction where the genetic variants are organized within the framework of one or more genes, their regulatory regions and other regulatory loci. We introduce here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating data in this manner. This approach combines a biological hierarchy, a flexible mathematical framework, a liability threshold model for defining disease endpoints, and a heuristic search strategy for identifying high-order epistatic models of disease susceptibility. We provide several simulation examples using genetic models exhibiting independent main effects and three-way epistatic effects. PMID:25395175

  7. PopSc: Computing Toolkit for Basic Statistics of Molecular Population Genetics Simultaneously Implemented in Web-Based Calculator, Python and R

    PubMed Central

    Huang, Ying; Li, Cao; Liu, Linhai; Jia, Xianbo; Lai, Song-Jia

    2016-01-01

    Although various computer tools have been elaborately developed to calculate a series of statistics in molecular population genetics for both small- and large-scale DNA data, there is no efficient and easy-to-use toolkit available yet for exclusively focusing on the steps of mathematical calculation. Here, we present PopSc, a bioinformatic toolkit for calculating 45 basic statistics in molecular population genetics, which could be categorized into three classes, including (i) genetic diversity of DNA sequences, (ii) statistical tests for neutral evolution, and (iii) measures of genetic differentiation among populations. In contrast to the existing computer tools, PopSc was designed to directly accept the intermediate metadata, such as allele frequencies, rather than the raw DNA sequences or genotyping results. PopSc is first implemented as the web-based calculator with user-friendly interface, which greatly facilitates the teaching of population genetics in class and also promotes the convenient and straightforward calculation of statistics in research. Additionally, we also provide the Python library and R package of PopSc, which can be flexibly integrated into other advanced bioinformatic packages of population genetics analysis. PMID:27792763

  8. PopSc: Computing Toolkit for Basic Statistics of Molecular Population Genetics Simultaneously Implemented in Web-Based Calculator, Python and R.

    PubMed

    Chen, Shi-Yi; Deng, Feilong; Huang, Ying; Li, Cao; Liu, Linhai; Jia, Xianbo; Lai, Song-Jia

    2016-01-01

    Although various computer tools have been elaborately developed to calculate a series of statistics in molecular population genetics for both small- and large-scale DNA data, there is no efficient and easy-to-use toolkit available yet for exclusively focusing on the steps of mathematical calculation. Here, we present PopSc, a bioinformatic toolkit for calculating 45 basic statistics in molecular population genetics, which could be categorized into three classes, including (i) genetic diversity of DNA sequences, (ii) statistical tests for neutral evolution, and (iii) measures of genetic differentiation among populations. In contrast to the existing computer tools, PopSc was designed to directly accept the intermediate metadata, such as allele frequencies, rather than the raw DNA sequences or genotyping results. PopSc is first implemented as the web-based calculator with user-friendly interface, which greatly facilitates the teaching of population genetics in class and also promotes the convenient and straightforward calculation of statistics in research. Additionally, we also provide the Python library and R package of PopSc, which can be flexibly integrated into other advanced bioinformatic packages of population genetics analysis.
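
As an illustration of computing a diversity statistic directly from intermediate data such as allele frequencies, the sketch below calculates expected heterozygosity (gene diversity) for one locus. This is not PopSc's API; the function name and the optional sample-size correction argument are assumptions made for the example.

```python
def expected_heterozygosity(freqs, n_chromosomes=None):
    """Expected heterozygosity H = 1 - sum(p_i^2) for one locus.

    freqs: allele frequencies that sum to 1.
    n_chromosomes: if given, apply the small-sample correction n / (n - 1).
    """
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    h = 1.0 - sum(p * p for p in freqs)
    if n_chromosomes is not None:
        h *= n_chromosomes / (n_chromosomes - 1)
    return h


print(expected_heterozygosity([0.7, 0.2, 0.1]))
print(expected_heterozygosity([0.5, 0.5], n_chromosomes=20))
```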

  9. Guidelines for collecting and maintaining archives for genetic monitoring

    Treesearch

    Jennifer A. Jackson; Linda Laikre; C. Scott Baker; Katherine C. Kendall; F. W. Allendorf; M. K. Schwartz

    2011-01-01

    Rapid advances in molecular genetic techniques and the statistical analysis of genetic data have revolutionized the way that populations of animals, plants and microorganisms can be monitored. Genetic monitoring is the practice of using molecular genetic markers to track changes in the abundance, diversity or distribution of populations, species or ecosystems over time...

  10. Functional Regression Models for Epistasis Analysis of Multiple Quantitative Traits.

    PubMed

    Zhang, Futao; Xie, Dan; Liang, Meimei; Xiong, Momiao

    2016-04-01

    To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of the complex diseases. Despite their importance in uncovering the genetic structure of complex traits, the statistical methods for identifying epistasis in multiple phenotypes remain fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI's Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that the joint interaction analysis of multiple phenotypes has a much higher power to detect interaction than the interaction analysis of a single trait and may open a new direction to fully uncovering the genetic structure of multiple phenotypes.

  11. Using volcano plots and regularized-chi statistics in genetic association studies.

    PubMed

    Li, Wentian; Freudenberg, Jan; Suh, Young Ju; Yang, Yaning

    2014-02-01

    Labor-intensive experiments are typically required to identify the causal disease variants from a list of disease-associated variants in the genome. For designing such experiments, candidate variants are ranked by their strength of genetic association with the disease. However, the two commonly used measures of genetic association, the odds-ratio (OR) and p-value, may rank variants in different order. To integrate these two measures into a single analysis, here we transfer the volcano plot methodology from gene expression analysis to genetic association studies. In its original setting, volcano plots are scatter plots of fold-change and t-test statistic (or -log of the p-value), with the latter being more sensitive to sample size. In genetic association studies, the OR and Pearson's chi-square statistic (or equivalently its square root, chi; or the standardized log(OR)) can be analogously used in a volcano plot, allowing for their visual inspection. Moreover, the geometric interpretation of these plots leads to an intuitive method for filtering results by a combination of both OR and chi-square statistic, which we term "regularized-chi". This method selects associated markers by a smooth curve in the volcano plot instead of the right-angled lines which correspond to independent cutoffs for OR and chi-square statistic. The regularized-chi incorporates relatively more signals from variants with lower minor-allele-frequencies than the chi-square test statistic. As rare variants tend to have stronger functional effects, regularized-chi is better suited to the task of prioritization of candidate genes. Copyright © 2013 Elsevier Ltd. All rights reserved.
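
The two quantities placed on the axes of such a volcano plot can be computed from a 2x2 allele-count table, as in the minimal sketch below. The function name and table layout are assumptions for illustration, and the regularized-chi filter itself is not reproduced here because its formula is not given in the abstract.

```python
import math


def odds_ratio_and_chi(a, b, c, d):
    """Return (OR, Pearson chi-square, signed chi) for a 2x2 table.

    Layout (assumed): cases carry a risk alleles and b other alleles;
    controls carry c risk alleles and d other alleles. All cells must be > 0.
    """
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Give chi the sign of the association so it lines up with log(OR).
    signed_chi = math.copysign(math.sqrt(chi_square), a * d - b * c)
    return odds_ratio, chi_square, signed_chi


or_, chi2, chi = odds_ratio_and_chi(30, 70, 20, 80)
print(math.log(or_), chi)  # one point on the plot: x = log(OR), y = chi
```

Plotting log(OR) against chi for every variant gives the scatter described above; variants with large effect but modest chi are typically the lower-frequency ones.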

  12. Distribution of lod scores in oligogenic linkage analysis.

    PubMed

    Williams, J T; North, K E; Martin, L J; Comuzzie, A G; Göring, H H; Blangero, J

    2001-01-01

    In variance component oligogenic linkage analysis it can happen that the residual additive genetic variance bounds to zero when estimating the effect of the ith quantitative trait locus. Using quantitative trait Q1 from the Genetic Analysis Workshop 12 simulated general population data, we compare the observed lod scores from oligogenic linkage analysis with the empirical lod score distribution under a null model of no linkage. We find that zero residual additive genetic variance in the null model alters the usual distribution of the likelihood-ratio statistic.

  13. A Genome-Wide Association Analysis Reveals Epistatic Cancellation of Additive Genetic Variance for Root Length in Arabidopsis thaliana.

    PubMed

    Lachowiec, Jennifer; Shen, Xia; Queitsch, Christine; Carlborg, Örjan

    2015-01-01

    Efforts to identify loci underlying complex traits generally assume that most genetic variance is additive. Here, we examined the genetics of Arabidopsis thaliana root length and found that the genomic narrow-sense heritability for this trait in the examined population was statistically zero. The low amount of additive genetic variance that could be captured by the genome-wide genotypes likely explains why no associations to root length could be found using standard additive-model-based genome-wide association (GWA) approaches. However, as the broad-sense heritability for root length was significantly larger, and primarily due to epistasis, we also performed an epistatic GWA analysis to map loci contributing to the epistatic genetic variance. Four interacting pairs of loci were revealed, involving seven chromosomal loci that passed a standard multiple-testing corrected significance threshold. The genotype-phenotype maps for these pairs revealed epistasis that cancelled out the additive genetic variance, explaining why these loci were not detected in the additive GWA analysis. Small population sizes, such as in our experiment, increase the risk of identifying false epistatic interactions due to testing for associations with very large numbers of multi-marker genotypes in few phenotyped individuals. Therefore, we estimated the false-positive risk using a new statistical approach that suggested half of the associated pairs to be true positive associations. Our experimental evaluation of candidate genes within the seven associated loci suggests that this estimate is conservative; we identified functional candidate genes that affected root development in four loci that were part of three of the pairs. The statistical epistatic analyses were thus indispensable for confirming known, and identifying new, candidate genes for root length in this population of wild-collected A. thaliana accessions. We also illustrate how epistatic cancellation of the additive genetic variance explains the insignificant narrow-sense and significant broad-sense heritability by using a combination of careful statistical epistatic analyses and functional genetic experiments.

  14. Current genetic methodologies in the identification of disaster victims and in forensic analysis.

    PubMed

    Ziętkiewicz, Ewa; Witt, Magdalena; Daca, Patrycja; Zebracka-Gala, Jadwiga; Goniewicz, Mariusz; Jarząb, Barbara; Witt, Michał

    2012-02-01

    This review presents the basic problems and currently available molecular techniques used for genetic profiling in disaster victim identification (DVI). The environmental conditions of a mass disaster often result in severe fragmentation, decomposition and intermixing of the remains of victims. In such cases, traditional identification based on the anthropological and physical characteristics of the victims is frequently inconclusive. This is the reason why DNA profiling became the gold standard for victim identification in mass-casualty incidents (MCIs) or any forensic cases where human remains are highly fragmented and/or degraded beyond recognition. The review provides general information about the sources of genetic material for DNA profiling, the genetic markers routinely used during genetic profiling (STR markers, mtDNA and single-nucleotide polymorphisms [SNP]) and the basic statistical approaches used in DNA-based disaster victim identification. Automated technological platforms that allow the simultaneous analysis of a multitude of genetic markers used in genetic identification (oligonucleotide microarray techniques and next-generation sequencing) are also presented. Forensic and population databases containing information on human variability, routinely used for statistical analyses, are discussed. The final part of this review is focused on recent developments, which offer particularly promising tools for forensic applications (mRNA analysis, transcriptome variation in individuals/populations and genetic profiling of specific cells separated from mixtures).

  15. Analysis of a genetically structured variance heterogeneity model using the Box-Cox transformation.

    PubMed

    Yang, Ye; Christensen, Ole F; Sorensen, Daniel

    2011-02-01

    Over recent years, statistical support for the presence of genetic factors operating at the level of the environmental variance has come from fitting a genetically structured heterogeneous variance model to field or experimental data in various species. Misleading results may arise due to skewness of the marginal distribution of the data. To investigate how the scale of measurement affects inferences, the genetically structured heterogeneous variance model is extended to accommodate the family of Box-Cox transformations. Litter size data in rabbits and pigs that had previously been analysed in the untransformed scale were reanalysed in a scale equal to the mode of the marginal posterior distribution of the Box-Cox parameter. In the rabbit data, the statistical evidence for a genetic component at the level of the environmental variance is considerably weaker than that resulting from an analysis in the original metric. In the pig data, the statistical evidence is stronger, but the coefficient of correlation between additive genetic effects affecting mean and variance changes sign, compared to the results in the untransformed scale. The study confirms that inferences on variances can be strongly affected by the presence of asymmetry in the distribution of data. We recommend that to avoid one important source of spurious inferences, future work seeking support for a genetic component acting on environmental variation using a parametric approach based on normality assumptions confirms that these are met.
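
For reference, the Box-Cox family mentioned above maps a positive observation y to (y^λ - 1)/λ, with log(y) as the limiting case at λ = 0. The sketch below implements only this transformation; the function name is an assumption, and the genetically structured variance model and the posterior inference on λ described in the abstract are not reproduced.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if abs(lam) < 1e-12:
        return math.log(y)  # limiting case as lam -> 0
    return (y ** lam - 1.0) / lam


# lam = 1 only shifts the data; lam = 0 is the log scale.
litter_sizes = [6, 9, 11, 14]  # made-up values for illustration
print([box_cox(y, 1.0) for y in litter_sizes])
print([round(box_cox(y, 0.0), 3) for y in litter_sizes])
```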

  16. ON MODEL SELECTION STRATEGIES TO IDENTIFY GENES UNDERLYING BINARY TRAITS USING GENOME-WIDE ASSOCIATION DATA.

    PubMed

    Wu, Zheyang; Zhao, Hongyu

    2012-01-01

    For more fruitful discoveries of genetic variants associated with diseases in genome-wide association studies, it is important to know whether joint analysis of multiple markers is more powerful than the commonly used single-marker analysis, especially in the presence of gene-gene interactions. This article provides a statistical framework to rigorously address this question through analytical power calculations for common model search strategies to detect binary trait loci: marginal search, exhaustive search, forward search, and two-stage screening search. Our approach incorporates linkage disequilibrium, random genotypes, and correlations among score test statistics of logistic regressions. We derive analytical results under two power definitions: the power of finding all the associated markers and the power of finding at least one associated marker. We also consider two types of error controls: the discovery number control and the Bonferroni type I error rate control. After demonstrating the accuracy of our analytical results by simulations, we apply them to consider a broad genetic model space to investigate the relative performances of different model search strategies. Our analytical study provides rapid computation as well as insights into the statistical mechanism of capturing genetic signals under different genetic models including gene-gene interactions. Even though we focus on genetic association analysis, our results on the power of model selection procedures are clearly very general and applicable to other studies.

  17. ON MODEL SELECTION STRATEGIES TO IDENTIFY GENES UNDERLYING BINARY TRAITS USING GENOME-WIDE ASSOCIATION DATA

    PubMed Central

    Wu, Zheyang; Zhao, Hongyu

    2013-01-01

    For more fruitful discoveries of genetic variants associated with diseases in genome-wide association studies, it is important to know whether joint analysis of multiple markers is more powerful than the commonly used single-marker analysis, especially in the presence of gene-gene interactions. This article provides a statistical framework to rigorously address this question through analytical power calculations for common model search strategies to detect binary trait loci: marginal search, exhaustive search, forward search, and two-stage screening search. Our approach incorporates linkage disequilibrium, random genotypes, and correlations among score test statistics of logistic regressions. We derive analytical results under two power definitions: the power of finding all the associated markers and the power of finding at least one associated marker. We also consider two types of error controls: the discovery number control and the Bonferroni type I error rate control. After demonstrating the accuracy of our analytical results by simulations, we apply them to consider a broad genetic model space to investigate the relative performances of different model search strategies. Our analytical study provides rapid computation as well as insights into the statistical mechanism of capturing genetic signals under different genetic models including gene-gene interactions. Even though we focus on genetic association analysis, our results on the power of model selection procedures are clearly very general and applicable to other studies. PMID:23956610

  18. General Framework for Meta-analysis of Rare Variants in Sequencing Association Studies

    PubMed Central

    Lee, Seunggeun; Teslovich, Tanya M.; Boehnke, Michael; Lin, Xihong

    2013-01-01

    We propose a general statistical framework for meta-analysis of gene- or region-based multimarker rare variant association tests in sequencing association studies. In genome-wide association studies, single-marker meta-analysis has been widely used to increase statistical power by combining results via regression coefficients and standard errors from different studies. In analysis of rare variants in sequencing studies, region-based multimarker tests are often used to increase power. We propose meta-analysis methods for commonly used gene- or region-based rare variants tests, such as burden tests and variance component tests. Because estimation of regression coefficients of individual rare variants is often unstable or not feasible, the proposed method avoids this difficulty by calculating score statistics instead that only require fitting the null model for each study and then aggregating these score statistics across studies. Our proposed meta-analysis rare variant association tests are conducted based on study-specific summary statistics, specifically score statistics for each variant and between-variant covariance-type (linkage disequilibrium) relationship statistics for each gene or region. The proposed methods are able to incorporate different levels of heterogeneity of genetic effects across studies and are applicable to meta-analysis of multiple ancestry groups. We show that the proposed methods are essentially as powerful as joint analysis by directly pooling individual level genotype data. We conduct extensive simulations to evaluate the performance of our methods by varying levels of heterogeneity across studies, and we apply the proposed methods to meta-analysis of rare variant effects in a multicohort study of the genetics of blood lipid levels. PMID:23768515
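
The aggregation idea can be sketched for the simplest case, a burden test under homogeneous effects: per-variant score statistics and their covariance matrices are summed across studies and then collapsed with variant weights. This is a minimal illustration under that assumption, with hypothetical names and made-up numbers; the variance component tests and heterogeneity models mentioned in the abstract are not covered.

```python
def meta_burden_statistic(scores, covariances, weights):
    """Burden-type meta-analysis statistic from study-level summaries.

    scores: one list of per-variant score statistics per study.
    covariances: one m x m covariance matrix (nested lists) per study.
    weights: m variant weights.
    Under the null, the result is referred to a chi-square with 1 df.
    """
    m = len(weights)
    # Sum scores and covariance matrices over studies (homogeneity assumed).
    s = [sum(study[j] for study in scores) for j in range(m)]
    phi = [[sum(cov[i][j] for cov in covariances) for j in range(m)]
           for i in range(m)]
    numerator = sum(w * sj for w, sj in zip(weights, s)) ** 2
    denominator = sum(weights[i] * phi[i][j] * weights[j]
                      for i in range(m) for j in range(m))
    return numerator / denominator


scores = [[1.0, 2.0], [0.5, -0.5]]            # two studies, two variants
covariances = [[[2.0, 0.5], [0.5, 1.0]],
               [[1.0, 0.0], [0.0, 1.0]]]
print(meta_burden_statistic(scores, covariances, [1.0, 1.0]))  # 1.5
```

Only the null model needs to be fitted in each study to obtain these summaries, which is why no individual-level genotype data has to be pooled.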

  19. Visual analysis of geocoded twin data puts nature and nurture on the map.

    PubMed

    Davis, O S P; Haworth, C M A; Lewis, C M; Plomin, R

    2012-09-01

    Twin studies allow us to estimate the relative contributions of nature and nurture to human phenotypes by comparing the resemblance of identical and fraternal twins. Variation in complex traits is a balance of genetic and environmental influences; these influences are typically estimated at a population level. However, what if the balance of nature and nurture varies depending on where we grow up? Here we use statistical and visual analysis of geocoded data from over 6700 families to show that genetic and environmental contributions to 45 childhood cognitive and behavioral phenotypes vary geographically in the United Kingdom. This has implications for detecting environmental exposures that may interact with the genetic influences on complex traits, and for the statistical power of samples recruited for genetic association studies. More broadly, our experience demonstrates the potential for collaborative exploratory visualization to act as a lingua franca for large-scale interdisciplinary research.
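
One classical way to turn twin resemblance into variance components is Falconer's approximation, sketched below. The study itself may well have used a different (model-fitting) approach, so this is offered only to illustrate the logic of comparing identical (MZ) and fraternal (DZ) twin correlations; the function name and example correlations are assumptions.

```python
def falconer_ace(r_mz, r_dz):
    """Falconer estimates from MZ and DZ twin correlations.

    Returns (a2, c2, e2): additive genetic, shared-environment and
    non-shared-environment proportions of phenotypic variance.
    """
    a2 = 2.0 * (r_mz - r_dz)
    c2 = 2.0 * r_dz - r_mz
    e2 = 1.0 - r_mz
    return a2, c2, e2


print(falconer_ace(0.8, 0.5))  # roughly (0.6, 0.2, 0.2)
```

Repeating such an estimate within geographic neighbourhoods, rather than once for the whole sample, is the kind of analysis that lets the balance of genetic and environmental influences vary on a map.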

  20. Statistical genetics concepts and approaches in schizophrenia and related neuropsychiatric research.

    PubMed

    Schork, Nicholas J; Greenwood, Tiffany A; Braff, David L

    2007-01-01

    Statistical genetics is a research field that focuses on mathematical models and statistical inference methodologies that relate genetic variations (ie, naturally occurring human DNA sequence variations or "polymorphisms") to particular traits or diseases (phenotypes) usually from data collected on large samples of families or individuals. The ultimate goal of such analysis is the identification of genes and genetic variations that influence disease susceptibility. Although of extreme interest and importance, the fact that many genes and environmental factors contribute to neuropsychiatric diseases of public health importance (eg, schizophrenia, bipolar disorder, and depression) complicates relevant studies and suggests that very sophisticated mathematical and statistical modeling may be required. In addition, large-scale contemporary human DNA sequencing and related projects, such as the Human Genome Project and the International HapMap Project, as well as the development of high-throughput DNA sequencing and genotyping technologies have provided statistical geneticists with a great deal of very relevant and appropriate information and resources. Unfortunately, the use of these resources and their interpretation are not straightforward when applied to complex, multifactorial diseases such as schizophrenia. In this brief and largely nonmathematical review of the field of statistical genetics, we describe many of the main concepts, definitions, and issues that motivate contemporary research. We also provide a discussion of the most pressing contemporary problems that demand further research if progress is to be made in the identification of genes and genetic variations that predispose to complex neuropsychiatric diseases.

  1. The sumLINK statistic for genetic linkage analysis in the presence of heterogeneity.

    PubMed

    Christensen, G B; Knight, S; Camp, N J

    2009-11-01

    We present the "sumLINK" statistic--the sum of multipoint LOD scores for the subset of pedigrees with nominally significant linkage evidence at a given locus--as an alternative to common methods to identify susceptibility loci in the presence of heterogeneity. We also suggest the "sumLOD" statistic (the sum of positive multipoint LOD scores) as a companion to the sumLINK. sumLINK analysis identifies genetic regions of extreme consistency across pedigrees without regard to negative evidence from unlinked or uninformative pedigrees. Significance is determined by an innovative permutation procedure based on genome shuffling that randomizes linkage information across pedigrees. This procedure for generating the empirical null distribution may be useful for other linkage-based statistics as well. Using 500 genome-wide analyses of simulated null data, we show that the genome shuffling procedure results in the correct type 1 error rates for both the sumLINK and sumLOD. The power of the statistics was tested using 100 sets of simulated genome-wide data from the alternative hypothesis from GAW13. Finally, we illustrate the statistics in an analysis of 190 aggressive prostate cancer pedigrees from the International Consortium for Prostate Cancer Genetics, where we identified a new susceptibility locus. We propose that the sumLINK and sumLOD are ideal for collaborative projects and meta-analyses, as they do not require any sharing of identifiable data between contributing institutions. Further, loci identified with the sumLINK have good potential for gene localization via statistical recombinant mapping, as, by definition, several linked pedigrees contribute to each peak.
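
The two statistics defined above reduce to simple sums over per-pedigree multipoint LOD scores at a locus, as in this minimal sketch. The function names are illustrative, and the cutoff of 0.588 for "nominally significant" is an assumption (it is the LOD equivalent of a one-sided p of 0.05); the genome-shuffling permutation procedure used to assess significance is not reproduced.

```python
def sum_lod(lods):
    """sumLOD: sum of the positive per-pedigree LOD scores at a locus."""
    return sum(lod for lod in lods if lod > 0)


def sum_link(lods, threshold=0.588):
    """sumLINK: sum of LOD scores for pedigrees at or above the threshold.

    threshold=0.588 is an assumed nominal-significance cutoff.
    """
    return sum(lod for lod in lods if lod >= threshold)


pedigree_lods = [1.2, 0.3, -0.8, 0.6, 0.0]  # made-up scores at one locus
print(sum_lod(pedigree_lods), sum_link(pedigree_lods))
```

Negative scores from unlinked or uninformative pedigrees drop out of both sums, which is what makes the statistics tolerant of heterogeneity.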

  2. Genetics and epidemiology, congenital anomalies and cancer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Friedman, J.M.

    1997-03-01

    Many of the basic statistical methods used in epidemiology - regression, analysis of variance, and estimation of relative risk, for example - originally were developed for the genetic analysis of biometric data. The familiarity that many geneticists have with this methodology has helped geneticists to understand and accept genetic epidemiology as a scientific discipline. It is worth noting, however, that most of the work in genetic epidemiology during the past decade has been devoted to linkage and other family studies, rather than to population-based investigations of the type that characterize much of mainstream epidemiology. 30 refs., 2 tabs.

  3. A weighted U statistic for association analyses considering genetic heterogeneity.

    PubMed

    Wei, Changshuai; Elston, Robert C; Lu, Qing

    2016-07-20

    Converging evidence suggests that common complex diseases with the same or similar clinical manifestations could have different underlying genetic etiologies. While current research interests have shifted toward uncovering rare variants and structural variations predisposing to human diseases, the impact of heterogeneity in genetic studies of complex diseases has been largely overlooked. Most of the existing statistical methods assume the disease under investigation has a homogeneous genetic effect and could, therefore, have low power if the disease undergoes heterogeneous pathophysiological and etiological processes. In this paper, we propose a heterogeneity-weighted U (HWU) method for association analyses considering genetic heterogeneity. HWU can be applied to various types of phenotypes (e.g., binary and continuous) and is computationally efficient for high-dimensional genetic data. Through simulations, we showed the advantage of HWU when the underlying genetic etiology of a disease was heterogeneous, as well as the robustness of HWU against different model assumptions (e.g., phenotype distributions). Using HWU, we conducted a genome-wide analysis of nicotine dependence from the Study of Addiction: Genetics and Environments dataset. The genome-wide analysis of nearly one million genetic markers took 7h, identifying heterogeneous effects of two new genes (i.e., CYP3A5 and IKBKB) on nicotine dependence. Copyright © 2016 John Wiley & Sons, Ltd.

  4. A statistical simulation model for field testing of non-target organisms in environmental risk assessment of genetically modified plants.

    PubMed

    Goedhart, Paul W; van der Voet, Hilko; Baldacchino, Ferdinando; Arpaia, Salvatore

    2014-04-01

    Genetic modification of plants may result in unintended effects causing potentially adverse effects on the environment. A comparative safety assessment is therefore required by authorities, such as the European Food Safety Authority, in which the genetically modified plant is compared with its conventional counterpart. Part of the environmental risk assessment is a comparative field experiment in which the effect on non-target organisms is compared. Statistical analysis of such trials come in two flavors: difference testing and equivalence testing. It is important to know the statistical properties of these, for example, the power to detect environmental change of a given magnitude, before the start of an experiment. Such prospective power analysis can best be studied by means of a statistical simulation model. This paper describes a general framework for simulating data typically encountered in environmental risk assessment of genetically modified plants. The simulation model, available as Supplementary Material, can be used to generate count data having different statistical distributions possibly with excess-zeros. In addition the model employs completely randomized or randomized block experiments, can be used to simulate single or multiple trials across environments, enables genotype by environment interaction by adding random variety effects, and finally includes repeated measures in time following a constant, linear or quadratic pattern in time possibly with some form of autocorrelation. The model also allows to add a set of reference varieties to the GM plants and its comparator to assess the natural variation which can then be used to set limits of concern for equivalence testing. The different count distributions are described in some detail and some examples of how to use the simulation model to study various aspects, including a prospective power analysis, are provided.
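
    As a rough illustration of the kind of data generation described above, the sketch below draws counts from a zero-inflated Poisson distribution, one common way to model count data with excess zeros. It is not the authors' simulation model (which is supplied as Supplementary Material); the function name `zero_inflated_poisson` and its parameters are hypothetical, and the simple Poisson sampler is only suitable for small means.

```python
import math
import random


def zero_inflated_poisson(mean, zero_prob, size, rng):
    """Draw `size` counts: a structural zero with probability `zero_prob`,
    otherwise a Poisson(mean) count."""

    def poisson(lam):
        # Knuth's multiplication method; adequate for small means only.
        limit = math.exp(-lam)
        k, prod = 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        return k

    return [0 if rng.random() < zero_prob else poisson(mean) for _ in range(size)]


# Hypothetical usage: 10 plots, mean count 3, 30% excess zeros.
counts = zero_inflated_poisson(mean=3.0, zero_prob=0.3, size=10, rng=random.Random(42))
```

    Repeating such draws for a GM plant and its comparator, and applying the planned difference or equivalence test to each simulated trial, gives an estimate of power as the fraction of trials in which the test reaches the intended conclusion.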

  5. A statistical simulation model for field testing of non-target organisms in environmental risk assessment of genetically modified plants

    PubMed Central

    Goedhart, Paul W; van der Voet, Hilko; Baldacchino, Ferdinando; Arpaia, Salvatore

    2014-01-01

    Genetic modification of plants may result in unintended effects causing potentially adverse effects on the environment. A comparative safety assessment is therefore required by authorities, such as the European Food Safety Authority, in which the genetically modified plant is compared with its conventional counterpart. Part of the environmental risk assessment is a comparative field experiment in which the effect on non-target organisms is compared. Statistical analysis of such trials come in two flavors: difference testing and equivalence testing. It is important to know the statistical properties of these, for example, the power to detect environmental change of a given magnitude, before the start of an experiment. Such prospective power analysis can best be studied by means of a statistical simulation model. This paper describes a general framework for simulating data typically encountered in environmental risk assessment of genetically modified plants. The simulation model, available as Supplementary Material, can be used to generate count data having different statistical distributions possibly with excess-zeros. In addition the model employs completely randomized or randomized block experiments, can be used to simulate single or multiple trials across environments, enables genotype by environment interaction by adding random variety effects, and finally includes repeated measures in time following a constant, linear or quadratic pattern in time possibly with some form of autocorrelation. The model also allows to add a set of reference varieties to the GM plants and its comparator to assess the natural variation which can then be used to set limits of concern for equivalence testing. The different count distributions are described in some detail and some examples of how to use the simulation model to study various aspects, including a prospective power analysis, are provided. PMID:24834325

  6. EvolQG - An R package for evolutionary quantitative genetics

    PubMed Central

    Melo, Diogo; Garcia, Guilherme; Hubbe, Alex; Assis, Ana Paula; Marroig, Gabriel

    2016-01-01

    We present an open source package for performing evolutionary quantitative genetics analyses in the R environment for statistical computing. Evolutionary theory shows that evolution depends critically on the available variation in a given population. When dealing with many quantitative traits this variation is expressed in the form of a covariance matrix, particularly the additive genetic covariance matrix or sometimes the phenotypic matrix, when the genetic matrix is unavailable and there is evidence the phenotypic matrix is sufficiently similar to the genetic matrix. Given this mathematical representation of available variation, the EvolQG package provides functions for calculation of relevant evolutionary statistics; estimation of sampling error; corrections for this error; matrix comparison via correlations, distances and matrix decomposition; analysis of modularity patterns; and functions for testing evolutionary hypotheses on taxa diversification. PMID:27785352

  7. An ecological genetic delineation of local seed-source provenance for ecological restoration

    PubMed Central

    Krauss, Siegfried L; Sinclair, Elizabeth A; Bussell, John D; Hobbs, Richard J

    2013-01-01

    An increasingly important practical application of the analysis of spatial genetic structure within plant species is to help define the extent of local provenance seed collection zones that minimize negative impacts in ecological restoration programs. Here, we derive seed sourcing guidelines from a novel range-wide assessment of spatial genetic structure of 24 populations of Banksia menziesii (Proteaceae), a widely distributed Western Australian tree of significance in local ecological restoration programs. An analysis of molecular variance (AMOVA) of 100 amplified fragment length polymorphism (AFLP) markers revealed significant genetic differentiation among populations (ΦPT = 0.18). Pairwise population genetic dissimilarity was correlated with geographic distance, but not environmental distance derived from 15 climate variables, suggesting overall neutrality of these markers with regard to these climate variables. Nevertheless, Bayesian outlier analysis identified four markers potentially under selection, although these were not correlated with the climate variables. We calculated a global R-statistic using analysis of similarities (ANOSIM) to test the statistical significance of population differentiation and to infer a threshold seed collection zone distance of ∼60 km (all markers) and 100 km (outlier markers) when genetic distance was regressed against geographic distance. Population pairs separated by >60 km were, on average, twice as likely to be significantly genetically differentiated than population pairs separated by <60 km, suggesting that habitat-matched sites within a 30-km radius around a restoration site genetically defines a local provenance seed collection zone for B. menziesii. Our approach is a novel probability-based practical solution for the delineation of a local seed collection zone to minimize negative genetic impacts in ecological restoration. PMID:23919158

  8. A Simple Test of Class-Level Genetic Association Can Reveal Novel Cardiometabolic Trait Loci.

    PubMed

    Qian, Jing; Nunez, Sara; Reed, Eric; Reilly, Muredach P; Foulkes, Andrea S

    2016-01-01

    Characterizing the genetic determinants of complex diseases can be further augmented by incorporating knowledge of underlying structure or classifications of the genome, such as newly developed mappings of protein-coding genes, epigenetic marks, enhancer elements and non-coding RNAs. We apply a simple class-level testing framework, termed Genetic Class Association Testing (GenCAT), to identify protein-coding gene association with 14 cardiometabolic (CMD) related traits across 6 publicly available genome wide association (GWA) meta-analysis data resources. GenCAT uses SNP-level meta-analysis test statistics across all SNPs within a class of elements, as well as the size of the class and its unique correlation structure, to determine if the class is statistically meaningful. The novelty of findings is evaluated through investigation of regional signals. A subset of findings are validated using recently updated, larger meta-analysis resources. A simulation study is presented to characterize overall performance with respect to power, control of family-wise error and computational efficiency. All analysis is performed using the GenCAT package, R version 3.2.1. We demonstrate that class-level testing complements the common first stage minP approach that involves individual SNP-level testing followed by post-hoc ascribing of statistically significant SNPs to genes and loci. GenCAT suggests 54 protein-coding genes at 41 distinct loci for the 13 CMD traits investigated in the discovery analysis, that are beyond the discoveries of minP alone. An additional application to biological pathways demonstrates flexibility in defining genetic classes. We conclude that it would be prudent to include class-level testing as standard practice in GWA analysis. 
    GenCAT, for example, can be used as a simple, complementary and efficient strategy for class-level testing that leverages existing data resources, requires only summary level data in the form of test statistics, and adds significant value with respect to its potential for identifying multiple novel and clinically relevant trait associations.

  9. Coherent spectroscopic methods for monitoring pathogens, genetically modified products and nanostructured materials in colloidal solution

    NASA Astrophysics Data System (ADS)

    Moguilnaya, T.; Suminov, Y.; Botikov, A.; Ignatov, S.; Kononenko, A.; Agibalov, A.

    2017-01-01

    We developed a new automatic method that combines forced luminescence and stimulated Brillouin scattering. The method is used for monitoring pathogens, genetically modified products and nanostructured materials in colloidal solution. We carried out a statistical spectral analysis of pathogens, genetically modified soy and silver nanoparticles in water from different regions in order to determine the statistical errors of the method. We studied the spectral characteristics of these objects in water to perform initial identification with 95% probability. These results were used to create a model of a device for monitoring pathogenic organisms and a working model of a device for detecting genetically modified soy in meat.

  10. Joint multi-population analysis for genetic linkage of bipolar disorder or "wellness" to chromosome 4p.

    PubMed

    Visscher, P M; Haley, C S; Ewald, H; Mors, O; Egeland, J; Thiel, B; Ginns, E; Muir, W; Blackwood, D H

    2005-02-05

    To test the hypothesis that the same genetic loci confer susceptibility to, or protection from, disease in different populations, and that a combined analysis would improve the map resolution of a common susceptibility locus, we analyzed data from three studies that had reported linkage to bipolar disorder in a small region on chromosome 4p. Data sets comprised phenotypic information and genetic marker data on Scottish, Danish, and USA extended pedigrees. Across the three data sets, 913 individuals appeared in the pedigrees, 462 were classified, either as unaffected (323) or affected (139) with unipolar or bipolar disorder. A consensus linkage map was created from 14 microsatellite markers in a 33 cM region. Phenotypic and genetic data were analyzed using a variance component (VC) and allele sharing method. All previously reported elevated test statistics in the region were confirmed with one or both analysis methods, indicating the presence of one or more susceptibility genes to bipolar disorder in the three populations in the studied chromosome segment. When the results from both the VC and allele sharing method were considered, there was strong evidence for a susceptibility locus in the data from Scotland, some evidence in the data from Denmark and relatively less evidence in the data from the USA. The test statistics from the Scottish data set dominated the test statistics from the other studies, and no improved map resolution for a putative genetic locus underlying susceptibility in all three studies was obtained. Studies reporting linkage to the same region require careful scrutiny and preferably joint or meta analysis on the same basis in order to ensure that the results are truly comparable. (c) 2004 Wiley-Liss, Inc.

  11. From sexless to sexy: Why it is time for human genetics to consider and report analyses of sex.

    PubMed

    Powers, Matthew S; Smith, Phillip H; McKee, Sherry A; Ehringer, Marissa A

    2017-01-01

    Science has come a long way with regard to the consideration of sex differences in clinical and preclinical research, but one field remains behind the curve: human statistical genetics. The goal of this commentary is to raise awareness and discussion about how to best consider and evaluate possible sex effects in the context of large-scale human genetic studies. Over the course of this commentary, we reinforce the importance of interpreting genetic results in the context of biological sex, establish evidence that sex differences are not being considered in human statistical genetics, and discuss how best to conduct and report such analyses. Our recommendation is to run stratified analyses by sex no matter the sample size or the result and report the findings. Summary statistics from stratified analyses are helpful for meta-analyses, and patterns of sex-dependent associations may be hidden in a combined dataset. In the age of declining sequencing costs, large consortia efforts, and a number of useful control samples, it is now time for the field of human genetics to appropriately include sex in the design, analysis, and reporting of results.

  12. DNA Damage and Genetic Instability as Harbingers of Prostate Cancer

    DTIC Science & Technology

    2013-01-01

    incidence of prostate cancer as compared to placebo. Primary analysis of this trial indicated no statistically significant effect of selenium...Identification, isolation, staining, processing, and statistical analysis of slides for ERG and PTEN markers (aim 1) and interpretation of these results...participating in this study being conducted under Investigational New Drug #29829 from the Food and Drug Administration. STANDARD TREATMENT Patients

  13. Analysis of half diallel mating designs I: a practical analysis procedure for ANOVA approximation.

    Treesearch

    G.R. Johnson; J.N. King

    1998-01-01

    Procedures to analyze half-diallel mating designs using the SAS statistical package are presented. The procedure requires two runs of PROC VARCOMP and results in estimates of additive and non-additive genetic variation. The procedures described can be modified to work on most statistical software packages which can compute variance component estimates. The...

  14. Genetic Code Analysis Toolkit: A novel tool to explore the coding properties of the genetic code and DNA sequences

    NASA Astrophysics Data System (ADS)

    Kraljić, K.; Strüngmann, L.; Fimmel, E.; Gumbel, M.

    2018-01-01

    The genetic code is degenerated and it is assumed that redundancy provides error detection and correction mechanisms in the translation process. However, the biological meaning of the code's structure is still under current research. This paper presents a Genetic Code Analysis Toolkit (GCAT) which provides workflows and algorithms for the analysis of the structure of nucleotide sequences. In particular, sets or sequences of codons can be transformed and tested for circularity, comma-freeness, dichotomic partitions and others. GCAT comes with a fertile editor custom-built to work with the genetic code and a batch mode for multi-sequence processing. With the ability to read FASTA files or load sequences from GenBank, the tool can be used for the mathematical and statistical analysis of existing sequence data. GCAT is Java-based and provides a plug-in concept for extensibility. Availability: Open source Homepage:http://www.gcat.bio/
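
    One of the properties named above, comma-freeness, can be checked directly from its usual definition: a set of codons X is comma-free if, for every pair x, y in X, neither of the two shifted-frame trinucleotides inside the concatenation xy belongs to X. The sketch below illustrates that definition only. It is not GCAT code, and the function name `is_comma_free` and the example codon sets are hypothetical.

```python
from itertools import product


def is_comma_free(codons):
    """Return True if no codon of the set can be read in a shifted frame
    of any concatenation of two codons from the set."""
    code = set(codons)
    for x, y in product(code, repeat=2):
        pair = x + y
        # Reading frames shifted by one and by two positions in the 6-letter word.
        if pair[1:4] in code or pair[2:5] in code:
            return False
    return True


# Hypothetical examples: the first set is comma-free, the second is not,
# because CGA appears in frame 1 of ACG + ACG.
print(is_comma_free({"ACG", "TCG"}))
print(is_comma_free({"ACG", "CGA"}))
```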

  15. graph-GPA: A graphical model for prioritizing GWAS results and investigating pleiotropic architecture.

    PubMed

    Chung, Dongjun; Kim, Hang J; Zhao, Hongyu

    2017-02-01

    Genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with hundreds of phenotypes and diseases, which have provided clinical and medical benefits to patients with novel biomarkers and therapeutic targets. However, identification of risk variants associated with complex diseases remains challenging as they are often affected by many genetic variants with small or moderate effects. There has been accumulating evidence suggesting that different complex traits share common risk basis, namely pleiotropy. Recently, several statistical methods have been developed to improve statistical power to identify risk variants for complex traits through a joint analysis of multiple GWAS datasets by leveraging pleiotropy. While these methods were shown to improve statistical power for association mapping compared to separate analyses, they are still limited in the number of phenotypes that can be integrated. In order to address this challenge, in this paper, we propose a novel statistical framework, graph-GPA, to integrate a large number of GWAS datasets for multiple phenotypes using a hidden Markov random field approach. Application of graph-GPA to a joint analysis of GWAS datasets for 12 phenotypes shows that graph-GPA improves statistical power to identify risk variants compared to statistical methods based on smaller number of GWAS datasets. In addition, graph-GPA also promotes better understanding of genetic mechanisms shared among phenotypes, which can potentially be useful for the development of improved diagnosis and therapeutics. The R implementation of graph-GPA is currently available at https://dongjunchung.github.io/GGPA/.

  16. Multispecies, Integrative GWAS for Focal Segmental Glomerulosclerosis

    DTIC Science & Technology

    2017-09-01

    is a frequent cause of end-stage renal disease (ESRD). We investigated the genetic basis of FSGS and recruited a heterogeneous population of...understanding the complex genetic mechanisms of FSGS. Subject terms: FSGS, MCD, GWAS, CNV...disease (MCD). Using a variety of statistical and genetic approaches, including genome wide association analysis and rare copy number variations (CNVs

  17. Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis

    PubMed Central

    Steele, Joe; Bastola, Dhundy

    2014-01-01

    Modern sequencing and genome assembly technologies have provided a wealth of data, which will soon require an analysis by comparison for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used but with some caveats. Seminal techniques and methods from dynamic programming are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools, which stem from alignment-free methods based on statistical analysis from word frequencies. We provide several clear examples to demonstrate applications and the interpretations over several different areas of alignment-free analysis such as base–base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic metric. Additionally, we provide detailed discussion and an example of analysis by Lempel–Ziv techniques from data compression. PMID:23904502
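
    To make the word-frequency idea concrete, the sketch below computes the basic D2 statistic in its simplest form: the sum, over all words of length k, of the product of that word's counts in the two sequences. It is an illustration only, not code from the review; the function names `kmer_counts` and `d2_statistic` are hypothetical, and normalized variants of D2 are not covered.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_statistic(seq_a, seq_b, k):
    """Basic D2: sum over shared words of count_in_a * count_in_b."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for words absent from seq_b, so they contribute nothing.
    return sum(n * counts_b[word] for word, n in counts_a.items())


# Hypothetical usage: identical sequences share the words AC, CG and GT once each.
print(d2_statistic("ACGT", "ACGT", 2))  # 3
```

    No alignment is computed at any point, which is why this family of measures scales to large collections of sequences.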

  18. Analysis of conditional genetic effects and variance components in developmental genetics.

    PubMed

    Zhu, J

    1995-12-01

    A genetic model with additive-dominance effects and genotype x environment interactions is presented for quantitative traits with time-dependent measures. The genetic model for phenotypic means at time t conditional on phenotypic means measured at previous time (t-1) is defined. Statistical methods are proposed for analyzing conditional genetic effects and conditional genetic variance components. Conditional variances can be estimated by minimum norm quadratic unbiased estimation (MINQUE) method. An adjusted unbiased prediction (AUP) procedure is suggested for predicting conditional genetic effects. A worked example from cotton fruiting data is given for comparison of unconditional and conditional genetic variances and additive effects.

  19. Analysis of Conditional Genetic Effects and Variance Components in Developmental Genetics

    PubMed Central

    Zhu, J.

    1995-01-01

    A genetic model with additive-dominance effects and genotype X environment interactions is presented for quantitative traits with time-dependent measures. The genetic model for phenotypic means at time t conditional on phenotypic means measured at previous time (t - 1) is defined. Statistical methods are proposed for analyzing conditional genetic effects and conditional genetic variance components. Conditional variances can be estimated by minimum norm quadratic unbiased estimation (MINQUE) method. An adjusted unbiased prediction (AUP) procedure is suggested for predicting conditional genetic effects. A worked example from cotton fruiting data is given for comparison of unconditional and conditional genetic variances and additive effects. PMID:8601500

  20. Forensic-paternity effectiveness and genetics population analysis of six non-CODIS mini-STR loci (D1S1656, D2S441, D6S1043, D10S1248, D12S391, D22S1045) and SE33 in Mestizo and Amerindian populations from Mexico.

    PubMed

    Burguete-Argueta, Nelsi; Martínez De la Cruz, Braulio; Camacho-Mejorado, Rafael; Santana, Carla; Noris, Gino; López-Bayghen, Esther; Arellano-Galindo, José; Majluf-Cruz, Abraham; Antonio Meraz-Ríos, Marco; Gómez, Rocío

    2016-11-01

    STRs are powerful tools intensively used in forensic and kinship studies. In order to assess the effectiveness of non-CODIS genetic markers in forensic and paternity tests, the genetic composition of six mini short tandem repeats-mini-STRs-(D1S1656, D2S441, D6S1043, D10S1248, D12S391, D22S1045) and the microsatellite SE33 in Mestizo and Amerindian populations from Mexico were studied. Using multiplex polymerase chain reactions and capillary electrophoresis, this study genotyped all loci from 870 chromosomes and evaluated the statistical genetic parameters. All mini-STRs studied were in agreement with HW and linkage equilibrium; however, an important HW departure for SE33 was found in the Mestizo population (p ≤ 0.0001). Regarding paternity and forensic statistical parameters, high values of combined power discrimination and mean power of exclusion were found using these seven markers. The principal co-ordinate analysis based on allele frequencies of three mini-STRs showed the complex genetic architecture of the Mestizo population. The results indicate that this set of loci is suitable to genetically identify individuals in the Mexican population, supporting its effectiveness in human identification casework. In addition, these findings add new statistical values and emphasise the importance of the use of non-CODIS markers in complex populations in order to avoid erroneous assumptions.

  1. A weighted U-statistic for genetic association analyses of sequencing data.

    PubMed

    Wei, Changshuai; Li, Ming; He, Zihuai; Vsevolozhskaya, Olga; Schaid, Daniel J; Lu, Qing

    2014-12-01

    With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL 4 and very low density lipoprotein cholesterol. © 2014 WILEY PERIODICALS, INC.

  2. Statistical methods to detect novel genetic variants using publicly available GWAS summary data.

    PubMed

    Guo, Bin; Wu, Baolin

    2018-03-01

    We propose statistical methods to detect novel genetic variants using only genome-wide association studies (GWAS) summary data without access to raw genotype and phenotype data. With more and more summary data being posted for public access in the post GWAS era, the proposed methods are practically very useful to identify additional interesting genetic variants and shed light on the underlying disease mechanism. We illustrate the utility of our proposed methods with application to GWAS meta-analysis results of fasting glucose from the international MAGIC consortium. We found several novel genome-wide significant loci that are worth further study. Copyright © 2018 Elsevier Ltd. All rights reserved.

  3. Multivariate Methods for Meta-Analysis of Genetic Association Studies.

    PubMed

    Dimou, Niki L; Pantavou, Katerina G; Braliou, Georgia G; Bagos, Pantelis G

    2018-01-01

    Multivariate meta-analysis of genetic association studies and genome-wide association studies has received remarkable attention as it improves the precision of the analysis. Here, we review, summarize and present in a unified framework methods for multivariate meta-analysis of genetic association studies and genome-wide association studies. Starting with the statistical methods used for robust analysis and genetic model selection, we present in brief univariate methods for meta-analysis and we then scrutinize multivariate methodologies. Multivariate models of meta-analysis for single gene-disease association studies, including models for haplotype association studies, multiple linked polymorphisms and multiple outcomes are discussed. The popular Mendelian randomization approach and special cases of meta-analysis addressing issues such as the assumption of the mode of inheritance, deviation from Hardy-Weinberg Equilibrium and gene-environment interactions are also presented. All available methods are enriched with practical applications and methodologies that could be developed in the future are discussed. Links for all available software implementing multivariate meta-analysis methods are also provided.

  4. [Population genetics of the inhabitants of Northern European USSR. II. Blood group distribution and anthropogenetic characteristics in 6 villages in Archangel Oblast].

    PubMed

    Revazov, A A; Pasekov, V P; Lukasheva, I D

    1975-01-01

    The paper deals with the distribution of genetic markers (systems ABO, MN, Rh (D), Hp, PTC) and a number of demographic characters (folding of arms, hand clasping, tongue rolling, right- and left-handedness, type of ear lobe, types of dermatoglyphic patterns) in the inhabitants of 6 villages in the Mezen District of the Archangelsk Region of the RSFSR (river Peosa basin). The data presented in this work were obtained in the course of examination of over 800 persons. Differences in the interpretation of the results of generally adopted methods of statistical analysis of samples from small populations are discussed. Among the systems analysed, one third of all cases showed a statistically significant deviation from Hardy-Weinberg ratios. For the MN blood groups and haptoglobins this was caused by an excess of heterozygotes. The test of Hardy-Weinberg ratios at the level of two-locus phenotypes revealed no statistically significant deviations either in separate villages or in all the villages taken together. The analysis of heterogeneity with respect to markers inherited according to Mendel's law revealed statistically significant differences between villages in all the systems except haptoglobins. There was considerable heterogeneity in the distribution of family names, with the frequencies of some of them varying from village to village from 0 to 90%. Statistically significant differences between villages were shown for all the anthropogenetic characters except arm folding, hand clasping and right-left-handedness. Considering the uniformity of the environmental pressure in the region examined, the heterogeneity of the population studied is apparently associated with a random genetic differentiation (genetic drift) and, possibly, with the effect of the progenitor.
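
    The Hardy-Weinberg comparison mentioned above can be sketched for a codominant two-allele system such as MN, where all three genotypes are observed directly. The example is a generic textbook chi-square test with 1 degree of freedom, not a reconstruction of the authors' analysis; the function name `hwe_chi_square` and the genotype counts are hypothetical, and both alleles are assumed to be present in the sample.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions for one biallelic locus.

    Returns (chi2, p_value, heterozygote_excess). Assumes both alleles occur,
    so that no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, n_ab > expected[1]


# Made-up counts with a strong heterozygote excess: chi2 = 36.
chi2, p_value, het_excess = hwe_chi_square(10, 80, 10)
```

    In small samples, such as single villages, the chi-square approximation can be poor, and an exact test is often preferred.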

  5. Meta-analysis of gene-level associations for rare variants based on single-variant statistics.

    PubMed

    Hu, Yi-Juan; Berndt, Sonja I; Gustafsson, Stefan; Ganna, Andrea; Hirschhorn, Joel; North, Kari E; Ingelsson, Erik; Lin, Dan-Yu

    2013-08-08

    Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying "causal" rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
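
    The central idea above, that a gene-level statistic can be rebuilt from single-variant results plus their correlation matrix, can be illustrated with a weighted burden test on the z-score scale. With single-variant z-scores z, weights w and correlation matrix R, the statistic (w'z)^2 / (w'Rw) is compared with a chi-square distribution with 1 degree of freedom. This is a generic sketch under those assumptions, not the authors' software; the function name `burden_from_summary` and all numbers are hypothetical.

```python
import math


def burden_from_summary(z, weights, corr):
    """Weighted burden test from single-variant z-scores.

    z       : list of single-variant z-scores
    weights : list of variant weights (same length as z)
    corr    : correlation matrix of the z-scores, as a list of lists
    Returns (statistic, p_value) using a chi-square reference with 1 df.
    """
    m = len(z)
    numerator = sum(w * zi for w, zi in zip(weights, z)) ** 2
    variance = sum(
        weights[i] * corr[i][j] * weights[j] for i in range(m) for j in range(m)
    )
    stat = numerator / variance
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Two perfectly correlated variants carry the information of a single z-score of 2.
stat, p_value = burden_from_summary([2.0, 2.0], [1.0, 1.0], [[1.0, 1.0], [1.0, 1.0]])
```

    The correlation matrix is the only input beyond per-variant summary results, which is why it can be taken from one participating study or from a public reference panel.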

  6. Fine-scale landscape genetics of the American badger (Taxidea taxus): disentangling landscape effects and sampling artifacts in a poorly understood species

    PubMed Central

    Kierepka, E M; Latch, E K

    2016-01-01

    Landscape genetics is a powerful tool for conservation because it identifies landscape features that are important for maintaining genetic connectivity between populations within heterogeneous landscapes. However, using landscape genetics in poorly understood species presents a number of challenges, namely, limited life history information for the focal population and spatially biased sampling. Both obstacles can reduce power in statistics, particularly in individual-based studies. In this study, we genotyped 233 American badgers in Wisconsin at 12 microsatellite loci to identify alternative statistical approaches that can be applied to poorly understood species in an individual-based framework. Badgers are protected in Wisconsin owing to an overall lack in life history information, so our study utilized partial redundancy analysis (RDA) and spatially lagged regressions to quantify how three landscape factors (Wisconsin River, Ecoregions and land cover) impacted gene flow. We also performed simulations to quantify errors created by spatially biased sampling. Statistical analyses first found that geographic distance was an important influence on gene flow, mainly driven by fine-scale positive spatial autocorrelations. After controlling for geographic distance, both RDA and regressions found that Wisconsin River and Agriculture were correlated with genetic differentiation. However, only Agriculture had an acceptable type I error rate (3–5%) to be considered biologically relevant. Collectively, this study highlights the benefits of combining robust statistics and error assessment via simulations and provides a method for hypothesis testing in individual-based landscape genetics. PMID:26243136

  7. Testing Genetic Pleiotropy with GWAS Summary Statistics for Marginal and Conditional Analyses.

    PubMed

    Deng, Yangqing; Pan, Wei

    2017-12-01

    There is growing interest in testing genetic pleiotropy, which is when a single genetic variant influences multiple traits. Several methods have been proposed; however, these methods have some limitations. First, all the proposed methods are based on the use of individual-level genotype and phenotype data; in contrast, for logistical, and other, reasons, summary statistics of univariate SNP-trait associations are typically only available based on meta- or mega-analyzed large genome-wide association study (GWAS) data. Second, existing tests are based on marginal pleiotropy, which cannot distinguish between direct and indirect associations of a single genetic variant with multiple traits due to correlations among the traits. Hence, it is useful to consider conditional analysis, in which a subset of traits is adjusted for another subset of traits. For example, in spite of substantial lowering of low-density lipoprotein cholesterol (LDL) with statin therapy, some patients still maintain high residual cardiovascular risk, and, for these patients, it might be helpful to reduce their triglyceride (TG) level. For this purpose, in order to identify new therapeutic targets, it would be useful to identify genetic variants with pleiotropic effects on LDL and TG after adjusting the latter for LDL; otherwise, a pleiotropic effect of a genetic variant detected by a marginal model could simply be due to its association with LDL only, given the well-known correlation between the two types of lipids. Here, we develop a new pleiotropy testing procedure based only on GWAS summary statistics that can be applied for both marginal analysis and conditional analysis. Although the main technical development is based on published union-intersection testing methods, care is needed in specifying conditional models to avoid invalid statistical estimation and inference. 
    In addition to the previously used likelihood ratio test, we also propose using generalized estimating equations under the working independence model for robust inference. We provide numerical examples based on both simulated and real data, including two large lipid GWAS summary association datasets based on ∼100,000 and ∼189,000 samples, respectively, to demonstrate the difference between marginal and conditional analyses, as well as the effectiveness of our new approach. Copyright © 2017 by the Genetics Society of America.

  8. Analysis of genetic effects of nuclear-cytoplasmic interaction on quantitative traits: genetic model for diploid plants.

    PubMed

    Han, Lide; Yang, Jian; Zhu, Jun

    2007-06-01

    A genetic model was proposed for simultaneously analyzing genetic effects of nuclear, cytoplasm, and nuclear-cytoplasmic interaction (NCI) as well as their genotype by environment (GE) interaction for quantitative traits of diploid plants. In the model, the NCI effects were further partitioned into additive and dominance nuclear-cytoplasmic interaction components. Mixed linear model approaches were used for statistical analysis. On the basis of diallel cross designs, Monte Carlo simulations showed that the genetic model was robust for estimating variance components under several situations without specific effects. Random genetic effects were predicted by an adjusted unbiased prediction (AUP) method. Data on four quantitative traits (boll number, lint percentage, fiber length, and micronaire) in Upland cotton (Gossypium hirsutum L.) were analyzed as a worked example to show the effectiveness of the model.

  9. OPATs: Omnibus P-value association tests.

    PubMed

    Chen, Chia-Wei; Yang, Hsin-Chou

    2017-07-10

    Combining statistical significances (P-values) from a set of single-locus association tests in genome-wide association studies is a proof-of-principle method for identifying disease-associated genomic segments, functional genes and biological pathways. We review P-value combinations for genome-wide association studies and introduce an integrated analysis tool, Omnibus P-value Association Tests (OPATs), which provides popular analysis methods of P-value combinations. The software OPATs programmed in R and R graphical user interface features a user-friendly interface. In addition to analysis modules for data quality control and single-locus association tests, OPATs provides three types of set-based association test: window-, gene- and biopathway-based association tests. P-value combinations with or without threshold and rank truncation are provided. The significance of a set-based association test is evaluated by using resampling procedures. Performance of the set-based association tests in OPATs has been evaluated by simulation studies and real data analyses. These set-based association tests help boost the statistical power, alleviate the multiple-testing problem, reduce the impact of genetic heterogeneity, increase the replication efficiency of association tests and facilitate the interpretation of association signals by streamlining the testing procedures and integrating the genetic effects of multiple variants in genomic regions of biological relevance. In summary, P-value combinations facilitate the identification of marker sets associated with disease susceptibility and uncover missing heritability in association studies, thereby establishing a foundation for the genetic dissection of complex diseases and traits. OPATs provides an easy-to-use and statistically powerful analysis tool for P-value combinations. OPATs, examples, and user guide can be downloaded from http://www.stat.sinica.edu.tw/hsinchou/genetics/association/OPATs.htm. © The Author 2017. 
    Published by Oxford University Press.
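
    As a rough illustration of what a P-value combination is, the sketch below implements Fisher's method, one classical combination rule. It is not OPATs code, and the function name `fisher_combined_p` is hypothetical. It assumes the input P-values are independent, which neighbouring SNPs generally are not; that is presumably why the abstract reports evaluating significance by resampling rather than by a closed-form reference distribution.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent P-values with Fisher's method.

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p_i)
    follows a chi-square distribution with 2k degrees of freedom under the null.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")

    half_stat = -sum(math.log(p) for p in p_values)  # statistic / 2

    # For even degrees of freedom 2k the chi-square survival function is
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, so no external library is needed.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return 2.0 * half_stat, math.exp(-half_stat) * total
```

    With a single input the combined P-value equals the input, which is a quick sanity check on the implementation.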

  10. The effect of rare variants on inflation of the test statistics in case-control analyses.

    PubMed

    Pirie, Ailith; Wood, Angela; Lush, Michael; Tyrer, Jonathan; Pharoah, Paul D P

    2015-02-20

    The detection of bias due to cryptic population structure is an important step in the evaluation of findings of genetic association studies. The standard method of measuring this bias in a genetic association study is to compare the observed median association test statistic to the expected median test statistic. This ratio is inflated in the presence of cryptic population structure. However, inflation may also be caused by the properties of the association test itself particularly in the analysis of rare variants. We compared the properties of the three most commonly used association tests: the likelihood ratio test, the Wald test and the score test when testing rare variants for association using simulated data. We found evidence of inflation in the median test statistics of the likelihood ratio and score tests for tests of variants with less than 20 heterozygotes across the sample, regardless of the total sample size. The test statistics for the Wald test were under-inflated at the median for variants below the same minor allele frequency. In a genetic association study, if a substantial proportion of the genetic variants tested have rare minor allele frequencies, the properties of the association test may mask the presence or absence of bias due to population structure. The use of either the likelihood ratio test or the score test is likely to lead to inflation in the median test statistic in the absence of population structure. In contrast, the use of the Wald test is likely to result in under-inflation of the median test statistic which may mask the presence of population structure.
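
    The ratio described in this abstract, observed median test statistic over expected median, can be computed in a few lines. The sketch below assumes the test statistics are chi-square distributed with 1 degree of freedom under the null, so the expected median is about 0.4549. The function name `inflation_factor` is hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def inflation_factor(chi2_stats):
    """Ratio of the observed median test statistic to its expected null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
```

    A value near 1 indicates no inflation at the median. As the abstract notes, with many rare variants a departure from 1 may reflect the choice of test rather than population structure.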

  11. Machine learning patterns for neuroimaging-genetic studies in the cloud.

    PubMed

    Da Mota, Benoit; Tudoran, Radu; Costan, Alexandru; Varoquaux, Gaël; Brasche, Goetz; Conrod, Patricia; Lemaitre, Herve; Paus, Tomas; Rietschel, Marcella; Frouin, Vincent; Poline, Jean-Baptiste; Antoniu, Gabriel; Thirion, Bertrand

    2014-01-01

    Brain imaging is a natural intermediate phenotype to understand the link between genetic information and behavior or brain pathologies risk factors. Massive efforts have been made in the last few years to acquire high-dimensional neuroimaging and genetic data on large cohorts of subjects. The statistical analysis of such data is carried out with increasingly sophisticated techniques and represents a great computational challenge. Fortunately, increasing computational power in distributed architectures can be harnessed, if new neuroinformatics infrastructures are designed and training to use these new tools is provided. Combining a MapReduce framework (TomusBLOB) with machine learning algorithms (Scikit-learn library), we design a scalable analysis tool that can deal with non-parametric statistics on high-dimensional data. End-users describe the statistical procedure to perform and can then test the model on their own computers before running the very same code in the cloud at a larger scale. We illustrate the potential of our approach on real data with an experiment showing how the functional signal in subcortical brain regions can be significantly fit with genome-wide genotypes. This experiment demonstrates the scalability and the reliability of our framework in the cloud with a 2 weeks deployment on hundreds of virtual machines.

  12. GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research--an update.

    PubMed

    Peakall, Rod; Smouse, Peter E

    2012-10-01

    GenAlEx: Genetic Analysis in Excel is a cross-platform package for population genetic analyses that runs within Microsoft Excel. GenAlEx offers analysis of diploid codominant, haploid and binary genetic loci and DNA sequences. Both frequency-based (F-statistics, heterozygosity, HWE, population assignment, relatedness) and distance-based (AMOVA, PCoA, Mantel tests, multivariate spatial autocorrelation) analyses are provided. New features include calculation of new estimators of population structure: G'(ST), G''(ST), Jost's D(est) and F'(ST) through AMOVA, Shannon Information analysis, linkage disequilibrium analysis for biallelic data and novel heterogeneity tests for spatial autocorrelation analysis. Export to more than 30 other data formats is provided. Teaching tutorials and expanded step-by-step output options are included. The comprehensive guide has been fully revised. GenAlEx is written in VBA and provided as a Microsoft Excel Add-in (compatible with Excel 2003, 2007, 2010 on PC; Excel 2004, 2011 on Macintosh). GenAlEx, and supporting documentation and tutorials are freely available at: http://biology.anu.edu.au/GenAlEx. Contact: rod.peakall@anu.edu.au

  13. Sieve analysis in HIV-1 vaccine efficacy trials

    PubMed Central

    Edlefsen, Paul T.; Gilbert, Peter B.; Rolland, Morgane

    2013-01-01

    Purpose of review: The genetic characterization of HIV-1 breakthrough infections in vaccine and placebo recipients offers new ways to assess vaccine efficacy trials. Statistical and sequence analysis methods provide opportunities to mine the mechanisms behind the effect of an HIV vaccine. Recent findings: The release of results from two HIV-1 vaccine efficacy trials, Step/HVTN-502 and RV144, led to numerous studies in the last five years, including efforts to sequence HIV-1 breakthrough infections and compare viral characteristics between the vaccine and placebo groups. Novel genetic and statistical analysis methods uncovered features that distinguished founder viruses isolated from vaccinees from those isolated from placebo recipients, and identified HIV-1 genetic targets of vaccine-induced immune responses. Summary: Studies of HIV-1 breakthrough infections in vaccine efficacy trials can provide an independent confirmation to correlates of risk studies, as they take advantage of vaccine/placebo comparisons while correlates of risk analyses are limited to vaccine recipients. Through the identification of viral determinants impacted by vaccine-mediated host immune responses, sieve analyses can shed light on potential mechanisms of vaccine protection. PMID:23719202

  14. Sieve analysis in HIV-1 vaccine efficacy trials.

    PubMed

    Edlefsen, Paul T; Gilbert, Peter B; Rolland, Morgane

    2013-09-01

    The genetic characterization of HIV-1 breakthrough infections in vaccine and placebo recipients offers new ways to assess vaccine efficacy trials. Statistical and sequence analysis methods provide opportunities to mine the mechanisms behind the effect of an HIV vaccine. The release of results from two HIV-1 vaccine efficacy trials, Step/HVTN-502 (HIV Vaccine Trials Network-502) and RV144, led to numerous studies in the last 5 years, including efforts to sequence HIV-1 breakthrough infections and compare viral characteristics between the vaccine and placebo groups. Novel genetic and statistical analysis methods uncovered features that distinguished founder viruses isolated from vaccinees from those isolated from placebo recipients, and identified HIV-1 genetic targets of vaccine-induced immune responses. Studies of HIV-1 breakthrough infections in vaccine efficacy trials can provide an independent confirmation to correlates of risk studies, as they take advantage of vaccine/placebo comparisons, whereas correlates of risk analyses are limited to vaccine recipients. Through the identification of viral determinants impacted by vaccine-mediated host immune responses, sieve analyses can shed light on potential mechanisms of vaccine protection.

  15. Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis.

    PubMed

    Bonham-Carter, Oliver; Steele, Joe; Bastola, Dhundy

    2014-11-01

    Modern sequencing and genome assembly technologies have provided a wealth of data, which will soon require an analysis by comparison for discovery. Sequence alignment, a fundamental task in bioinformatics research, may be used but with some caveats. Seminal techniques and methods from dynamic programming are proving ineffective for this work owing to their inherent computational expense when processing large amounts of sequence data. These methods are prone to giving misleading information because of genetic recombination, genetic shuffling and other inherent biological events. New approaches from information theory, frequency analysis and data compression are available and provide powerful alternatives to dynamic programming. These new methods are often preferred, as their algorithms are simpler and are not affected by synteny-related problems. In this review, we provide a detailed discussion of computational tools, which stem from alignment-free methods based on statistical analysis from word frequencies. We provide several clear examples to demonstrate applications and the interpretations over several different areas of alignment-free analysis such as base-base correlations, feature frequency profiles, compositional vectors, an improved string composition and the D2 statistic metric. Additionally, we provide detailed discussion and an example of analysis by Lempel-Ziv techniques from data compression. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
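
    To make the word-frequency idea concrete, the sketch below computes the basic D2 statistic, the sum over all words of length k of the product of their counts in the two sequences. It is a minimal illustration rather than code from the review. The function names `kmer_counts` and `d2_statistic` are hypothetical, and the variants and normalizations discussed in the literature are not covered.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def d2_statistic(seq_a, seq_b, k):
    """Basic D2: sum over shared words of count_in_a * count_in_b."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for words absent from seq_b, so unshared words add nothing.
    return sum(count * counts_b[word] for word, count in counts_a.items())
```

    Because no alignment is computed, the cost grows linearly with sequence length, and rearranged segments still contribute their shared words to the score.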

  16. Evidence, temperature, and the laws of thermodynamics.

    PubMed

    Vieland, Veronica J

    2014-01-01

    A primary purpose of statistical analysis in genetics is the measurement of the strength of evidence for or against hypotheses. As with any type of measurement, a properly calibrated measurement scale is necessary if we want to be able to meaningfully compare degrees of evidence across genetic data sets, across different types of genetic studies and/or across distinct experimental modalities. In previous papers in this journal and elsewhere, my colleagues and I have argued that geneticists ought to care about the scale on which statistical evidence is measured, and we have proposed the Kelvin temperature scale as a template for a context-independent measurement scale for statistical evidence. Moreover, we have claimed that, mathematically speaking, evidence and temperature may be one and the same thing. On first blush, this might seem absurd. Temperature is a property of systems following certain laws of nature (in particular, the 1st and 2nd Law of Thermodynamics) involving very physical quantities (e.g., energy) and processes (e.g., mechanical work). But what do the laws of thermodynamics have to do with statistical systems? Here I address that question. © 2014 S. Karger AG, Basel.

  17. From genes to ecosystems: Measuring evolutionary diversity and community structure with Forest Inventory and Analysis (FIA) data

    Treesearch

    Kevin M. Potter

    2009-01-01

    Forest genetic sustainability is an important component of forest health because genetic diversity and evolutionary processes allow for the adaptation of species and for the maintenance of ecosystem functionality and resilience. Phylogenetic community analyses, a set of new statistical methods for describing the evolutionary relationships among species, offer an...

  18. New application of intelligent agents in sporadic amyotrophic lateral sclerosis identifies unexpected specific genetic background.

    PubMed

    Penco, Silvana; Buscema, Massimo; Patrosso, Maria Cristina; Marocchi, Alessandro; Grossi, Enzo

    2008-05-30

    Few genetic factors predisposing to the sporadic form of amyotrophic lateral sclerosis (ALS) have been identified, but the pathology itself seems to be a true multifactorial disease in which complex interactions between environmental and genetic susceptibility factors take place. The purpose of this study was to approach genetic data with an innovative statistical method such as artificial neural networks to identify a possible genetic background predisposing to the disease. A DNA multiarray panel was applied to genotype more than 60 polymorphisms within 35 genes selected from pathways of lipid and homocysteine metabolism, regulation of blood pressure, coagulation, inflammation, cellular adhesion and matrix integrity, in 54 sporadic ALS patients and 208 controls. Advanced intelligent systems based on novel coupling of artificial neural networks and evolutionary algorithms have been applied. The results obtained have been compared with those derived from the use of standard neural networks and classical statistical analysis. An unexpected discovery of a strong genetic background in sporadic ALS using a DNA multiarray panel and analytical processing of the data with advanced artificial neural networks was found. The predictive accuracy obtained with Linear Discriminant Analysis and Standard Artificial Neural Networks ranged from 70% to 79% (average 75.31%) and from 69.1 to 86.2% (average 76.6%) respectively. The corresponding value obtained with Advanced Intelligent Systems reached an average of 96.0% (range 94.4 to 97.6%). This latter approach allowed the identification of seven genetic variants essential to differentiate cases from controls: apolipoprotein E arg158cys; hepatic lipase -480 C/T; endothelial nitric oxide synthase 690 C/T and glu298asp; vitamin K-dependent coagulation factor seven arg353glu; glycoprotein Ia/IIa 873 G/A and E-selectin ser128arg. This study provides an alternative and reliable method to approach complex diseases. Indeed, the application of a novel artificial intelligence-based method offers a new insight into genetic markers of sporadic ALS pointing out the existence of a strong genetic background.

  19. NETWORK ASSISTED ANALYSIS TO REVEAL THE GENETIC BASIS OF AUTISM

    PubMed Central

    Liu, Li; Lei, Jing; Roeder, Kathryn

    2016-01-01

    While studies show that autism is highly heritable, the nature of the genetic basis of this disorder remains elusive. Based on the idea that highly correlated genes are functionally interrelated and more likely to affect risk, we develop a novel statistical tool to find more potential autism risk genes by combining the genetic association scores with gene co-expression in specific brain regions and periods of development. The gene dependence network is estimated using a novel partial neighborhood selection (PNS) algorithm, where node specific properties are incorporated into network estimation for improved statistical and computational efficiency. Then we adopt a hidden Markov random field (HMRF) model to combine the estimated network and the genetic association scores in a systematic manner. The proposed modeling framework can be naturally extended to incorporate additional structural information concerning the dependence between genes. Using currently available genetic association data from whole exome sequencing studies and brain gene expression levels, the proposed algorithm successfully identified 333 genes that plausibly affect autism risk. PMID:27134692

  20. The relative effects of habitat loss and fragmentation on population genetic variation in the red-cockaded woodpecker (Picoides borealis).

    PubMed

    Bruggeman, Douglas J; Wiegand, Thorsten; Fernández, Néstor

    2010-09-01

    The relative influence of habitat loss, fragmentation and matrix heterogeneity on the viability of populations is a critical area of conservation research that remains unresolved. Using simulation modelling, we provide an analysis of the influence both patch size and patch isolation have on abundance, effective population size (N(e)) and F(ST). An individual-based, spatially explicit population model based on 15 years of field work on the red-cockaded woodpecker (Picoides borealis) was applied to different landscape configurations. The variation in landscape patterns was summarized using spatial statistics based on O-ring statistics. By regressing demographic and genetics attributes that emerged across the landscape treatments against proportion of total habitat and O-ring statistics, we show that O-ring statistics provide an explicit link between population processes, habitat area, and critical thresholds of fragmentation that affect those processes. Spatial distances among land cover classes that affect biological processes translated into critical scales at which the measures of landscape structure correlated best with genetic indices. Therefore our study infers pattern from process, which contrasts with past studies of landscape genetics. We found that population genetic structure was more strongly affected by fragmentation than population size, which suggests that examining only population size may limit recognition of fragmentation effects that erode genetic variation. If effective population size is used to set recovery goals for endangered species, then habitat fragmentation effects may be sufficiently strong to prevent evaluation of recovery based on the ratio of census:effective population size alone.

  1. Evaluation of psychiatric and genetic risk factors among primary relatives of suicide completers in Delhi NCR region, India.

    PubMed

    Pasi, Shivani; Singh, Piyoosh Kumar; Pandey, Rajeev Kumar; Dikshit, P C; Jiloha, R C; Rao, V R

    2015-10-30

    Suicide as a public health problem is studied worldwide, and the association of psychiatric and genetic risk factors with suicidal behavior is a point of discussion in studies across different ethnic groups. The present study is aimed at evaluating psychiatric and genetic traits among primary relatives of suicide completer families in an urban Indian population. Bi-variate analysis shows a significant increase in major depression (PHQ and Hamilton), stress, panic disorder, somatoform disorder and suicide attempt among primary compared to other relatives. Sib pair correlations also reveal significant results for major depression (Hamilton), stress, suicide attempt, intensity of suicide ideation and other anxiety syndrome. 5-HTTLPR, 5-HTT (Stin2) and COMT risk alleles are higher among primary relatives, though statistically insignificant. Backward conditional logistic regression analysis shows that only one independent variable, depression (Hamilton), made a unique statistically significant contribution to the model in primary relatives. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  2. Effect of environment and genotype on commercial maize hybrids using LC/MS-based metabolomics.

    PubMed

    Baniasadi, Hamid; Vlahakis, Chris; Hazebroek, Jan; Zhong, Cathy; Asiago, Vincent

    2014-02-12

    We recently applied gas chromatography coupled to time-of-flight mass spectrometry (GC/TOF-MS) and multivariate statistical analysis to measure biological variation of many metabolites due to environment and genotype in forage and grain samples collected from 50 genetically diverse nongenetically modified (non-GM) DuPont Pioneer commercial maize hybrids grown at six North American locations. In the present study, the metabolome coverage was extended using a core subset of these grain and forage samples employing ultra high pressure liquid chromatography (uHPLC) mass spectrometry (LC/MS). A total of 286 and 857 metabolites were detected in grain and forage samples, respectively, using LC/MS. Multivariate statistical analysis was utilized to compare and correlate the metabolite profiles. Environment had a greater effect on the metabolome than genetic background. The results of this study support and extend previously published insights into the environmental and genetic associated perturbations to the metabolome that are not associated with transgenic modification.

  3. Guidelines for collecting and maintaining archives for genetic monitoring

    USGS Publications Warehouse

    Jackson, Jennifer A.; Laikre, Linda; Baker, C. Scott; Kendall, Katherine C.

    2012-01-01

    Rapid advances in molecular genetic techniques and the statistical analysis of genetic data have revolutionized the way that populations of animals, plants and microorganisms can be monitored. Genetic monitoring is the practice of using molecular genetic markers to track changes in the abundance, diversity or distribution of populations, species or ecosystems over time, and to follow adaptive and non-adaptive genetic responses to changing external conditions. In recent years, genetic monitoring has become a valuable tool in conservation management of biological diversity and ecological analysis, helping to illuminate and define cryptic and poorly understood species and populations. Many of the detected biodiversity declines, changes in distribution and hybridization events have helped to drive changes in policy and management. Because a time series of samples is necessary to detect trends of change in genetic diversity and species composition, archiving is a critical component of genetic monitoring. Here we discuss the collection, development, maintenance, and use of archives for genetic monitoring. This includes an overview of the genetic markers that facilitate effective monitoring, describes how tissue and DNA can be stored, and provides guidelines for proper practice.

  4. Genetic Simulation Resources: a website for the registration and discovery of genetic data simulators

    PubMed Central

    Peng, Bo; Chen, Huann-Sheng; Mechanic, Leah E.; Racine, Ben; Clarke, John; Clarke, Lauren; Gillanders, Elizabeth; Feuer, Eric J.

    2013-01-01

    Summary: Many simulation methods and programs have been developed to simulate genetic data of the human genome. These data have been widely used, for example, to predict properties of populations retrospectively or prospectively according to mathematically intractable genetic models, and to assist the validation, statistical inference and power analysis of a variety of statistical models. However, owing to the differences in type of genetic data of interest, simulation methods, evolutionary features, input and output formats, terminologies and assumptions for different applications, choosing the right tool for a particular study can be a resource-intensive process that usually involves searching, downloading and testing many different simulation programs. Genetic Simulation Resources (GSR) is a website provided by the National Cancer Institute (NCI) that aims to help researchers compare and choose the appropriate simulation tools for their studies. This website allows authors of simulation software to register their applications and describe them with well-defined attributes, thus allowing site users to search and compare simulators according to specified features. Availability: http://popmodels.cancercontrol.cancer.gov/gsr. Contact: gsr@mail.nih.gov PMID:23435068

  5. A discriminative test among the different theories proposed to explain the origin of the genetic code: the coevolution theory finds additional support.

    PubMed

    Giulio, Massimo Di

    2018-05-19

    A discriminative statistical test among the different theories proposed to explain the origin of the genetic code is presented. Gathering the amino acids into polarity and biosynthetic classes that are the first expression of the physicochemical theory of the origin of the genetic code and the second expression of the coevolution theory, these classes are utilized in the Fisher's exact test to establish their significance within the genetic code table. Linking to the rows and columns of the genetic code of probabilities that express the statistical significance of these classes, I have finally been in the condition to be able to calculate a χ value to link to both the physicochemical theory and to the coevolution theory that would express the corroboration level referred to these theories. The comparison between these two χ values showed that the coevolution theory is able to explain - in this strictly empirical analysis - the origin of the genetic code better than that of the physicochemical theory. Copyright © 2018 Elsevier B.V. All rights reserved.

  6. Descent graphs in pedigree analysis: applications to haplotyping, location scores, and marker-sharing statistics.

    PubMed Central

    Sobel, E.; Lange, K.

    1996-01-01

    The introduction of stochastic methods in pedigree analysis has enabled geneticists to tackle computations intractable by standard deterministic methods. Until now these stochastic techniques have worked by running a Markov chain on the set of genetic descent states of a pedigree. Each descent state specifies the paths of gene flow in the pedigree and the founder alleles dropped down each path. The current paper follows up on a suggestion by Elizabeth Thompson that genetic descent graphs offer a more appropriate space for executing a Markov chain. A descent graph specifies the paths of gene flow but not the particular founder alleles traveling down the paths. This paper explores algorithms for implementing Thompson's suggestion for codominant markers in the context of automatic haplotyping, estimating location scores, and computing gene-clustering statistics for robust linkage analysis. Realistic numerical examples demonstrate the feasibility of the algorithms. PMID:8651310

  7. Improved score statistics for meta-analysis in single-variant and gene-level association studies.

    PubMed

    Yang, Jingjing; Chen, Sai; Abecasis, Gonçalo

    2018-06-01

    Meta-analysis is now an essential tool for genetic association studies, allowing them to combine large studies and greatly accelerating the pace of genetic discovery. Although the standard meta-analysis methods perform equivalently to the more cumbersome joint analysis under ideal settings, they result in substantial power loss under unbalanced settings with various case-control ratios. Here, we investigate the power loss problem of the standard meta-analysis methods for unbalanced studies, and further propose novel meta-analysis methods performing equivalently to the joint analysis under both balanced and unbalanced settings. We derive improved meta-score-statistics that can accurately approximate the joint-score-statistics with combined individual-level data, for both linear and logistic regression models, with and without covariates. In addition, we propose a novel approach to adjust for population stratification by correcting for known population structures through minor allele frequencies. In the simulated gene-level association studies under unbalanced settings, our method recovered up to 85% of the power loss caused by the standard methods. We further showed the power gain of our methods in gene-level tests with 26 unbalanced studies of age-related macular degeneration. In addition, we took the meta-analysis of three unbalanced studies of type 2 diabetes as an example to discuss the challenges of meta-analyzing multi-ethnic samples. In summary, our improved meta-score-statistics with corrections for population stratification can be used to construct both single-variant and gene-level association studies, providing a useful framework for ensuring well-powered, convenient, cross-study analyses. © 2018 WILEY PERIODICALS, INC.
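
    For orientation, a minimal sketch of the conventional single-variant score-statistic meta-analysis, in which per-study score statistics and their variances are summed. This is the standard approach that the abstract describes as losing power in unbalanced settings, not the authors' improved statistic; the function name and the numbers are illustrative assumptions.

```python
from math import erfc, sqrt


def meta_score_test(scores, variances):
    """Combine per-study score statistics U_k and variances V_k.

    Returns the combined Z statistic and a two-sided normal p-value.
    Hypothetical helper: standard method only, without the corrections
    for case-control imbalance or population stratification.
    """
    if len(scores) != len(variances):
        raise ValueError("scores and variances must have the same length")
    total_score = sum(scores)
    total_variance = sum(variances)
    z = total_score / sqrt(total_variance)
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return z, p_value


if __name__ == "__main__":
    # Made-up summary statistics from three studies of one variant.
    z, p = meta_score_test([3.0, 4.0, 5.0], [4.0, 5.0, 7.0])
    print(f"Z = {z:.3f}, p = {p:.4g}")
```

    Only summary statistics are shared between studies in this scheme, which is what makes it more convenient than a joint analysis of individual-level data.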

  8. Genetic polymorphisms of pharmacogenomic VIP variants in the Yi population from China.

    PubMed

    Yan, Mengdan; Li, Dianzhen; Zhao, Guige; Li, Jing; Niu, Fanglin; Li, Bin; Chen, Peng; Jin, Tianbo

    2018-03-30

    Drug response and target therapeutic dosage differ among individuals, and this variability is largely genetically determined. With the development of pharmacogenetics and pharmacogenomics, widespread research has provided a wealth of information on drug-related genetic polymorphisms, and the very important pharmacogenetic (VIP) variants have been identified for the major populations around the world, whereas less is known regarding minorities in China, including the Yi ethnic group. Our research aims to screen the potential pharmacogenomic variants in the Yi population and provide a theoretical basis for future medication guidance. In the present study, 80 VIP variants (selected from the PharmGKB database) were genotyped in 100 unrelated and healthy Yi adults recruited for our research. Through statistical analysis, we compared the Yi with 11 other populations listed in the HapMap database to detect significant SNPs. Two specific SNPs were subsequently examined for global allele distribution using frequencies downloaded from the ALlele FREquency Database. Moreover, F-statistics (Fst), genetic structure and phylogenetic tree analyses were conducted to determine the genetic similarity between the 12 ethnic groups. Using χ² tests, rs1128503 (ABCB1), rs7294 (VKORC1), rs9934438 (VKORC1), rs1540339 (VDR) and rs689466 (PTGS2) were identified as significantly different loci for further analysis. The global allele distribution revealed that the allele "A" of rs1540339 and rs9934438 was more frequent in Yi people, consistent with most populations in East Asia. F-statistics (Fst), genetic structure and phylogenetic tree analyses demonstrated that the Yi and CHD shared the closest relationship in their genetic backgrounds. Additionally, the Yi were considered similar to the Han people from Shaanxi province among the domestic ethnic populations in China. Our results demonstrated significant differences at several polymorphic SNPs and supplement the pharmacogenomic information for the Yi population, which could provide new strategies for optimizing clinical medication in accordance with the genetic determinants of drug toxicity and efficacy. Copyright © 2018 Elsevier B.V. All rights reserved.

  9. Mapping of epistatic quantitative trait loci in four-way crosses.

    PubMed

    He, Xiao-Hong; Qin, Hongde; Hu, Zhongli; Zhang, Tianzhen; Zhang, Yuan-Ming

    2011-01-01

    Four-way crosses (4WC) involving four different inbred lines often appear in plant and animal commercial breeding programs. Direct mapping of quantitative trait loci (QTL) in these commercial populations is both economical and practical. However, the existing statistical methods for mapping QTL in a 4WC population are built on the single-QTL genetic model. This simple genetic model fails to take into account QTL interactions, which play an important role in the genetic architecture of complex traits. In this paper, therefore, we attempted to develop a statistical method to detect epistatic QTL in a 4WC population. Conditional probabilities of QTL genotypes, computed by the multi-point single locus method, were used to sample the genotypes of all putative QTL in the entire genome. The sampled genotypes were used to construct the design matrix for QTL effects. All QTL effects, including main and epistatic effects, were simultaneously estimated by the penalized maximum likelihood method. The proposed method was confirmed by a series of Monte Carlo simulation studies and real data analysis of cotton. The new method will provide novel tools for the genetic dissection of complex traits, construction of QTL networks, and analysis of heterosis.

  10. Charles E. Land, Ph.D., acclaimed statistical expert on radiation risk assessment, died January 2018

    Cancer.gov

    Charles E. Land, Ph.D., an internationally acclaimed statistical expert on radiation risk assessment, died January 25, 2018. He retired in 2009 from the NCI Division of Cancer Epidemiology and Genetics. Dr. Land performed pioneering work in modern radiation dose-response analysis and modeling of low-dose cancer risk.

  11. Functional genomics annotation of a statistical epistasis network associated with bladder cancer susceptibility.

    PubMed

    Hu, Ting; Pan, Qinxin; Andrew, Angeline S; Langer, Jillian M; Cole, Michael D; Tomlinson, Craig R; Karagas, Margaret R; Moore, Jason H

    2014-04-11

    Several different genetic and environmental factors have been identified as independent risk factors for bladder cancer in population-based studies. Recent studies have turned to understanding the role of gene-gene and gene-environment interactions in determining risk. We previously developed the bioinformatics framework of statistical epistasis networks (SEN) to characterize the global structure of interacting genetic factors associated with a particular disease or clinical outcome. By applying SEN to a population-based study of bladder cancer among Caucasians in New Hampshire, we were able to identify a set of connected genetic factors with strong and significant interaction effects on bladder cancer susceptibility. To support our statistical findings using networks, in the present study, we performed pathway enrichment analyses on the set of genes identified using SEN, and found that they are associated with the carcinogen benzo[a]pyrene, a component of tobacco smoke. We further carried out an mRNA expression microarray experiment to validate statistical genetic interactions, and to determine if the set of genes identified in the SEN were differentially expressed in a normal bladder cell line and a bladder cancer cell line in the presence or absence of benzo[a]pyrene. Significant nonrandom sets of genes from the SEN were found to be differentially expressed in response to benzo[a]pyrene in both the normal bladder cells and the bladder cancer cells. In addition, the patterns of gene expression were significantly different between these two cell types. The enrichment analyses and the gene expression microarray results support the idea that SEN analysis of bladder cancer in population-based studies is able to identify biologically meaningful statistical patterns. These results bring us a step closer to a systems genetic approach to understanding cancer susceptibility that integrates population and laboratory-based studies.

  12. Effective Population Size, Genetic Variation, and Their Relevance for Conservation: The Bighorn Sheep in Tiburon Island and Comparisons with Managed Artiodactyls

    PubMed Central

    Gasca-Pineda, Jaime; Cassaigne, Ivonne; Alonso, Rogelio A.; Eguiarte, Luis E.

    2013-01-01

    The amount of genetic diversity in a finite biological population mostly depends on the interactions among evolutionary forces and the effective population size (Ne) as well as the time since population establishment. Because the Ne estimation helps to explore population demographic history, and allows one to predict the behavior of genetic diversity through time, Ne is a key parameter for the genetic management of small and isolated populations. Here, we explored an Ne-based approach using a bighorn sheep population on Tiburon Island, Mexico (TI) as a model. We estimated the current (Ncrnt) and ancestral stable (Nstbl) inbreeding effective population sizes as well as summary statistics to assess genetic diversity and the demographic scenarios that could explain such diversity. Then, we evaluated the feasibility of using TI as a source population for reintroduction programs. We also included data from other bighorn sheep and artiodactyl populations in the analysis to compare their inbreeding effective size estimates. The TI population showed high levels of genetic diversity with respect to other managed populations. However, our analysis suggested that TI has been under a genetic bottleneck, indicating that using individuals from this population as the only source for reintroduction could lead to a severe genetic diversity reduction. Analyses of the published data did not show a strict correlation between HE and Ncrnt estimates. Moreover, we detected that ancient anthropogenic and climatic pressures affected all studied populations. We conclude that the estimates of Ncrnt and Nstbl are informative genetic diversity estimators and should be used in addition to summary statistics for conservation and population management planning. PMID:24147115
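
    To make concrete how an effective population size lets one predict genetic diversity through time, here is a minimal sketch of the textbook expectation for an idealized, isolated population, H_t = H_0 (1 - 1/(2 Ne))^t. It assumes no mutation, migration or selection, is not the estimation procedure used in the study, and uses made-up values.

```python
def expected_heterozygosity(h0, ne, generations):
    """Expected heterozygosity after a number of generations of drift.

    Idealized model: H_t = H_0 * (1 - 1 / (2 * Ne)) ** t.
    Hypothetical helper for illustration only.
    """
    if ne <= 0:
        raise ValueError("effective population size must be positive")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** generations


if __name__ == "__main__":
    # Made-up values: initial heterozygosity 0.5, Ne of 50, followed for 20 generations.
    for t in (0, 1, 10, 20):
        print(t, round(expected_heterozygosity(0.5, 50, t), 4))
```

    The smaller Ne is, the faster the expected heterozygosity decays, which is why a bottlenecked population used as the sole source for reintroduction is a concern.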

  13. Statistical framework for detection of genetically modified organisms based on Next Generation Sequencing.

    PubMed

    Willems, Sander; Fraiture, Marie-Alice; Deforce, Dieter; De Keersmaecker, Sigrid C J; De Loose, Marc; Ruttink, Tom; Herman, Philippe; Van Nieuwerburgh, Filip; Roosens, Nancy

    2016-02-01

    Because the number and diversity of genetically modified (GM) crops has significantly increased, their analysis based on real-time PCR (qPCR) methods is becoming increasingly complex and laborious. While several pioneers already investigated Next Generation Sequencing (NGS) as an alternative to qPCR, its practical use has not been assessed for routine analysis. In this study a statistical framework was developed to predict the number of NGS reads needed to detect transgene sequences, to prove their integration into the host genome and to identify the specific transgene event in a sample with known composition. This framework was validated by applying it to experimental data from food matrices composed of pure GM rice, processed GM rice (noodles) or a 10% GM/non-GM rice mixture, revealing some influential factors. Finally, feasibility of NGS for routine analysis of GM crops was investigated by applying the framework to samples commonly encountered in routine analysis of GM crops. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
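
    The abstract does not give the framework's formulas. As a rough illustration of the underlying question only, the sketch below assumes the simplest possible model, in which each read independently derives from the target sequence with a fixed probability, and asks how many reads are needed to observe at least one target read with a given confidence. The function name, the model and the numbers are all assumptions, not the published framework.

```python
from math import ceil, log


def reads_needed(target_fraction, confidence=0.95):
    """Smallest n with P(at least one target read among n reads) >= confidence.

    Assumes independent reads, each hitting the target with probability
    `target_fraction`, so that P(no hit) = (1 - target_fraction) ** n.
    Hypothetical helper for illustration only.
    """
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return ceil(log(1.0 - confidence) / log(1.0 - target_fraction))


if __name__ == "__main__":
    # Made-up example: the transgene junction accounts for 1 read in 100.
    print(reads_needed(0.01, confidence=0.95))
```

    A realistic calculation would also have to account for genome size, the GM fraction of the sample, read length and the number of reads required to call an event, which are the kinds of influential factors the abstract alludes to.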

  14. MutAIT: an online genetic toxicology data portal and analysis tools.

    PubMed

    Avancini, Daniele; Menzies, Georgina E; Morgan, Claire; Wills, John; Johnson, George E; White, Paul A; Lewis, Paul D

    2016-05-01

    Assessment of genetic toxicity and/or carcinogenic activity is an essential element of chemical screening programs employed to protect human health. Dose-response and gene mutation data are frequently analysed by industry, academia and governmental agencies for regulatory evaluations and decision making. Over the years, a number of efforts at different institutions have led to the creation and curation of databases to house genetic toxicology data, largely, with the aim of providing public access to facilitate research and regulatory assessments. This article provides a brief introduction to a new genetic toxicology portal called Mutation Analysis Informatics Tools (MutAIT) (www.mutait.org) that provides easy access to two of the largest genetic toxicology databases, the Mammalian Gene Mutation Database (MGMD) and TransgenicDB. TransgenicDB is a comprehensive collection of transgenic rodent mutation data initially compiled and collated by Health Canada. The updated MGMD contains approximately 50 000 individual mutation spectral records from the published literature. The portal not only gives access to an enormous quantity of genetic toxicology data, but also provides statistical tools for dose-response analysis and calculation of benchmark dose. Two important R packages for dose-response analysis are provided as web-distributed applications with user-friendly graphical interfaces. The 'drsmooth' package performs dose-response shape analysis and determines various points of departure (PoD) metrics and the 'PROAST' package provides algorithms for dose-response modelling. The MutAIT statistical tools, which are currently being enhanced, provide users with an efficient and comprehensive platform to conduct quantitative dose-response analyses and determine PoD values that can then be used to calculate human exposure limits or margins of exposure. © The Author 2015. Published by Oxford University Press on behalf of the UK Environmental Mutagen Society. 

  15. Population differentiation in the red-legged kittiwake (Rissa brevirostris) as revealed by mitochondrial DNA

    USGS Publications Warehouse

    Patirana, A.; Hatcher, S.A.; Friesen, Vicki L.

    2002-01-01

    Population decline in red-legged kittiwakes (Rissa brevirostris) over recent decades has necessitated the collection of information on the distribution of genetic variation within and among colonies for implementation of suitable management policies. Here we present a preliminary study of the extent of genetic structuring and gene flow among the three principal breeding locations of red-legged kittiwakes using the hypervariable Domain I of the mitochondrial control region. Genetic variation was high relative to other species of seabirds, and was similar among locations. Analysis of molecular variance indicated that population genetic structure was statistically significant, and nested clade analysis suggested that kittiwakes breeding on Bering Island may be genetically isolated from those elsewhere. However, phylogeographic structure was weak. Although this analysis involved only a single locus and a small number of samples, it suggests that red-legged kittiwakes probably constitute a single evolutionarily significant unit; the possibility that they constitute two management units requires further investigation.

  16. Statistical Analysis of Big Data on Pharmacogenomics

    PubMed Central

    Fan, Jianqing; Liu, Han

    2013-01-01

    This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrices for understanding correlation structure, inverse covariance matrices for network modeling, large-scale simultaneous tests for selecting significantly differentially expressed genes and proteins and genetic markers for complex diseases, and high-dimensional variable selection for identifying important molecules for understanding molecular mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big Data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905

  17. GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research—an update

    PubMed Central

    Peakall, Rod; Smouse, Peter E.

    2012-01-01

    Summary: GenAlEx: Genetic Analysis in Excel is a cross-platform package for population genetic analyses that runs within Microsoft Excel. GenAlEx offers analysis of diploid codominant, haploid and binary genetic loci and DNA sequences. Both frequency-based (F-statistics, heterozygosity, HWE, population assignment, relatedness) and distance-based (AMOVA, PCoA, Mantel tests, multivariate spatial autocorrelation) analyses are provided. New features include calculation of new estimators of population structure: G′ST, G′′ST, Jost’s Dest and F′ST through AMOVA, Shannon Information analysis, linkage disequilibrium analysis for biallelic data and novel heterogeneity tests for spatial autocorrelation analysis. Export to more than 30 other data formats is provided. Teaching tutorials and expanded step-by-step output options are included. The comprehensive guide has been fully revised. Availability and implementation: GenAlEx is written in VBA and provided as a Microsoft Excel Add-in (compatible with Excel 2003, 2007, 2010 on PC; Excel 2004, 2011 on Macintosh). GenAlEx, and supporting documentation and tutorials are freely available at: http://biology.anu.edu.au/GenAlEx. Contact: rod.peakall@anu.edu.au PMID:22820204

  18. Comparison of Genetic Diversity between Chinese and American Soybean (Glycine max (L.)) Accessions Revealed by High-Density SNPs

    PubMed Central

    Liu, Zhangxiong; Li, Huihui; Wen, Zixiang; Fan, Xuhong; Li, Yinghui; Guan, Rongxia; Guo, Yong; Wang, Shuming; Wang, Dechun; Qiu, Lijuan

    2017-01-01

    Soybean is one of the most important economic crops for both China and the United States (US). The exchange of germplasm between these two countries has long been active. In order to investigate genetic relationships between Chinese and US soybean germplasm, 277 Chinese soybean accessions and 300 US soybean accessions from geographically diverse regions were analyzed using 5,361 SNP markers. The genetic diversity and the polymorphism information content (PIC) of the Chinese accessions was higher than that of the US accessions. Population structure analysis, principal component analysis, and cluster analysis all showed that the genetic basis of Chinese soybeans is distinct from that of the USA. The groupings observed in clustering analysis reflected the geographical origins of the accessions; this conclusion was validated with both genetic distance analysis and relative kinship analysis. FST-based and EigenGWAS statistical analysis revealed high genetic variation between the two subpopulations. Analysis of the 10 loci with the strongest selection signals showed that many loci were located in chromosome regions that have previously been identified as quantitative trait loci (QTL) associated with environmental-adaptation-related and yield-related traits. The pattern of diversity among the American and Chinese accessions should help breeders to select appropriate parental accessions to enhance the performance of future soybean cultivars. PMID:29250088

  19. Assessment of pleiotropic transcriptome perturbations in Arabidopsis engineered for indirect insect defence.

    PubMed

    Houshyani, Benyamin; van der Krol, Alexander R; Bino, Raoul J; Bouwmeester, Harro J

    2014-06-19

    Molecular characterization is an essential step of risk/safety assessment of genetically modified (GM) crops. Holistic approaches for molecular characterization using omics platforms can be used to confirm the intended impact of the genetic engineering, but can also reveal the unintended changes at the omics level as a first assessment of potential risks. The potential of omics platforms for risk assessment of GM crops has rarely been used for this purpose because of the lack of a consensus reference and statistical methods to judge the significance or importance of the pleiotropic changes in GM plants. Here we propose a meta data analysis approach to the analysis of GM plants, by measuring the transcriptome distance to untransformed wild-types. In the statistical analysis of the transcriptome distance between GM and wild-type plants, values are compared with naturally occurring transcriptome distances in non-GM counterparts obtained from a database. Using this approach we show that the pleiotropic effect of genes involved in indirect insect defence traits is substantially equivalent to the variation in gene expression occurring naturally in Arabidopsis. Transcriptome distance is a useful screening method to obtain insight in the pleiotropic effects of genetic modification.

  20. Published GMO studies find no evidence of harm when corrected for multiple comparisons.

    PubMed

    Panchin, Alexander Y; Tuzhikov, Alexander I

    2017-03-01

    A number of widely debated research articles claiming possible technology-related health concerns have influenced the public opinion on genetically modified food safety. We performed a statistical reanalysis and review of experimental data presented in some of these studies and found that quite often in contradiction with the authors' conclusions the data actually provides weak evidence of harm that cannot be differentiated from chance. In our opinion the problem of statistically unaccounted multiple comparisons has led to some of the most cited anti-genetically modified organism health claims in history. We hope this analysis puts the original results of these studies into proper context.
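
    To illustrate why unaccounted multiple comparisons matter, here is a minimal sketch of two standard corrections, Bonferroni and Holm, applied to a set of p-values. The abstract does not state which correction the authors used; the function names and the example p-values are assumptions for illustration.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: each p multiplied by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1);
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * pvalues[index])
        adjusted[index] = min(1.0, running_max)
    return adjusted


if __name__ == "__main__":
    # Made-up p-values from three comparisons within one hypothetical study.
    raw = [0.01, 0.04, 0.03]
    print(bonferroni(raw))
    print(holm(raw))
```

    With many endpoints measured per study, several raw p-values below 0.05 are expected by chance alone, and they often stop being significant once adjusted in this way.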

  1. More powerful haplotype sharing by accounting for the mode of inheritance.

    PubMed

    Ziegler, Andreas; Ewhida, Adel; Brendel, Michael; Kleensang, André

    2009-04-01

    The concept of haplotype sharing (HS) has received considerable attention recently, and several haplotype association methods have been proposed. Here, we extend the work of Beckmann and colleagues [2005 Hum. Hered. 59:67-78] who derived an HS statistic (BHS) as a special case of Mantel's space-time clustering approach. The Mantel-type HS statistic correlates genetic similarity with phenotypic similarity across pairs of individuals. While phenotypic similarity is measured as the mean-corrected cross product of phenotypes, we propose to incorporate information on the underlying genetic model in the measurement of the genetic similarity. Specifically, for the recessive and dominant modes of inheritance we suggest the use of the minimum and maximum of the shared length of haplotypes around a marker locus for pairs of individuals. If the underlying genetic model is unknown, we propose a model-free HS Mantel statistic using the max-test approach. We compare our novel HS statistics to BHS using simulated case-control data and illustrate their use by re-analyzing data from a candidate region of chromosome 18q from the Rheumatoid Arthritis (RA) Consortium. We demonstrate that our approach is point-wise valid and superior to BHS. In the re-analysis of the RA data, we identified three regions with point-wise P-values < 0.005 containing six known genes (PMIP1, MC4R, PIGN, KIAA1468, TNFRSF11A and ZCCHC2) which might be worth follow-up.
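
    A minimal sketch of a Mantel-type statistic of the kind described above: a pairwise genetic similarity is multiplied by the mean-corrected cross product of phenotypes and summed over pairs, and significance is assessed by permuting phenotypes. The similarity matrix here is a made-up stand-in for shared haplotype lengths, and the permutation test is an assumption; the published statistics and their minimum, maximum and max-test variants are not reproduced.

```python
import random


def mantel_statistic(similarity, phenotypes):
    """Sum over pairs i < j of S[i][j] * (y_i - mean) * (y_j - mean)."""
    n = len(phenotypes)
    mean = sum(phenotypes) / n
    deviations = [y - mean for y in phenotypes]
    return sum(
        similarity[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(i + 1, n)
    )


def permutation_pvalue(similarity, phenotypes, n_perm=999, seed=0):
    """One-sided permutation p-value obtained by shuffling the phenotypes."""
    rng = random.Random(seed)
    observed = mantel_statistic(similarity, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mantel_statistic(similarity, shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical shared-length matrix for four individuals (symmetric, zero diagonal).
    shared = [
        [0, 10, 0, 0],
        [10, 0, 0, 0],
        [0, 0, 0, 10],
        [0, 0, 10, 0],
    ]
    status = [1, 1, 0, 0]  # 1 = case, 0 = control
    print(mantel_statistic(shared, status), permutation_pvalue(shared, status))
```

    Large positive values arise when individuals with similar phenotypes also share long haplotype segments around the marker.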

  2. Estimating the age of Hb G-Coushatta [β22(B4)Glu→Ala] mutation by haplotypes of β-globin gene cluster in Denizli, Turkey.

    PubMed

    Ozturk, Onur; Arikan, Sanem; Atalay, Ayfer; Atalay, Erol O

    2018-05-01

    The Hb G-Coushatta variant has been reported from populations in various parts of the world, such as Thailand, Korea, Algeria, China, Japan and Turkey. In our study, we aimed to discuss the possible historical relationships of the Hb G-Coushatta mutation with the possible migration routes of the world. For this purpose, associated haplotypes were determined using polymorphic loci in the beta globin gene cluster of hemoglobin G-Coushatta and normal populations in Denizli, Turkey. We performed statistical analyses such as haplotype analysis, Hardy-Weinberg equilibrium, measurement of genetic diversity and population differentiation parameters, analysis of molecular variance using F-statistics, historical-demographic analyses and mismatch distribution analysis of both populations, and applied the test statistics in the Arlequin ver. 3.5 software program. The diversity of haplotypes has been shown to indicate different genetic origins for the two populations. However, AMOVA results, molecular diversity parameters and population demographic expansion times showed that the Hb G-Coushatta mutation developed on the normal population gene pool. Our estimated τ values showed that the average time since the demographic expansion for the normal and Hb G-Coushatta populations was approximately 42,000 and 38,000 ybp, respectively. Our data suggest that the Hb G-Coushatta population originated in the normal population in Denizli, Turkey. These results support the hypothesis of multiple origins of Hb G-Coushatta and indicate that the mutation may have triggered the formation of new variants on beta globin haplotypes. © 2018 The Authors. Molecular Genetics & Genomic Medicine published by Wiley Periodicals, Inc.

  3. DHLAS: A web-based information system for statistical genetic analysis of HLA population data.

    PubMed

    Thriskos, P; Zintzaras, E; Germenis, A

    2007-03-01

    DHLAS (database HLA system) is a user-friendly, web-based information system for the analysis of human leukocyte antigens (HLA) data from population studies. DHLAS has been developed using JAVA and the R system; it runs on a Java Virtual Machine and its web-based user interface is powered by the servlet engine TOMCAT. It utilizes STRUTS, a Model-View-Controller framework, and uses several GNU packages to perform several of its tasks. The database engine it relies upon for fast access is MySQL, but others can be used as well. The system estimates metrics, performs statistical testing and produces graphs required for HLA population studies: (i) Hardy-Weinberg equilibrium (calculated using both asymptotic and exact tests), (ii) genetic distances (Euclidean or Nei), (iii) phylogenetic trees using the unweighted pair group method with averages and the neighbor-joining method, (iv) linkage disequilibrium (pairwise and overall, including variance estimations), (v) haplotype frequencies (estimated using the expectation-maximization algorithm) and (vi) discriminant analysis. The main merit of DHLAS is the incorporation of a database; thus, the data can be stored and manipulated along with integrated genetic data analysis procedures. In addition, it has an open architecture allowing the inclusion of other functions and procedures.
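
    As an illustration of item (i), the following is a minimal sketch of the asymptotic (chi-square) Hardy-Weinberg test for a single biallelic locus. HLA loci are highly multiallelic and DHLAS itself is implemented in Java and R, so this is a simplified stand-in rather than the system's code; the function name and genotype counts are assumptions.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions at a biallelic locus.

    Returns the statistic and its p-value on 1 degree of freedom.
    Requires both alleles to be present (expected counts must be non-zero).
    Hypothetical helper for illustration only.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("locus is monomorphic; the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up genotype counts AA, AB, BB.
    print(hwe_chi_square(30, 40, 30))
```

    For small samples or rare alleles the asymptotic approximation is unreliable, which is presumably why the system also offers an exact test.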

  4. Dealing with AFLP genotyping errors to reveal genetic structure in Plukenetia volubilis (Euphorbiaceae) in the Peruvian Amazon

    PubMed Central

    Vašek, Jakub; Viehmannová, Iva; Ocelák, Martin; Cachique Huansi, Danter; Vejl, Pavel

    2017-01-01

    An analysis of the population structure and genetic diversity for any organism often depends on one or more molecular marker techniques. Nonetheless, these techniques are not absolutely reliable because of various sources of errors arising during the genotyping process. Thus, a complex analysis of genotyping error was carried out with the AFLP method in 169 samples of the oil seed plant Plukenetia volubilis L. from small isolated subpopulations in the Peruvian Amazon. Samples were collected in nine localities from the region of San Martin. Analysis was done in eight datasets with a genotyping error from 0 to 5%. Using eleven primer combinations, 102 to 275 markers were obtained according to the dataset. It was found that it is only possible to obtain the most reliable and robust results through a multiple-level filtering process. Genotyping error and software set up influence both the estimation of population structure and genetic diversity, where in our case population number (K) varied between 2–9 depending on the dataset and statistical method used. Surprisingly, discrepancies in K number were caused more by statistical approaches than by genotyping errors themselves. However, for estimation of genetic diversity, the degree of genotyping error was critical because descriptive parameters (He, FST, PLP 5%) varied substantially (by at least 25%). Due to low gene flow, P. volubilis mostly consists of small isolated subpopulations (ΦPT = 0.252–0.323) with some degree of admixture given by socio-economic connectivity among the sites; a direct link between the genetic and geographic distances was not confirmed. The study illustrates the successful application of AFLP to infer genetic structure in non-model plants. PMID:28910307

  5. A robust and efficient statistical method for genetic association studies using case and control samples from multiple cohorts

    PubMed Central

    2013-01-01

    Background: The theoretical basis of genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between any polymorphic marker and a putative disease locus. Most methods widely implemented for such analyses are vulnerable to several key demographic factors, deliver poor statistical power for detecting genuine associations, and have a high false positive rate. Here, we present a likelihood-based statistical approach that properly accounts for the non-random nature of case–control samples with regard to the genotypic distribution at the loci in the populations under study, and that confers flexibility to test for genetic association in the presence of different confounding factors such as population structure and non-randomness of samples. Results: We implemented this novel method together with several popular methods in the GWAS literature to re-analyze recently published Parkinson’s disease (PD) case–control samples. The real data analysis and computer simulation show that the new method confers not only significantly improved statistical power for detecting associations but also robustness to the difficulties stemming from non-random sampling and genetic structure when compared to its rivals. In particular, the new method detected 44 significant SNPs within 25 chromosomal regions of size < 1 Mb, but only 6 SNPs in two of these regions were previously detected by the trend-test-based methods. It discovered two SNPs located 1.18 Mb and 0.18 Mb from the PD candidates, FGF20 and PARK8, without invoking false positive risk. Conclusions: We developed a novel likelihood-based method that provides adequate estimation of LD and other population model parameters by using case and control samples, eases the integration of these samples from multiple genetically divergent populations, and thus confers statistically robust and powerful analyses of GWAS. On the basis of simulation studies and analysis of real datasets, we demonstrated a significant improvement of the new method over the non-parametric trend test, which is the most widely implemented test in the GWAS literature. PMID:23394771
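
    The non-parametric trend test used as the baseline in this record is commonly computed as a Cochran-Armitage test on a 2 x 3 genotype table. The sketch below is a textbook version under an additive score (0, 1, 2); it is not the authors' likelihood method, and the function name and the genotype counts in the usage note are hypothetical.

```python
import math


def trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x k genotype table.

    cases/controls hold counts per genotype (e.g. AA, Aa, aa).
    Returns the 1-df chi-square statistic and its p-value.
    """
    n_cases = sum(cases)
    n_controls = sum(controls)
    n_total = n_cases + n_controls
    col_totals = [r + s for r, s in zip(cases, controls)]

    # Score-weighted excess of cases over their null expectation.
    u = sum(t * (r - n * n_cases / n_total)
            for t, r, n in zip(scores, cases, col_totals))

    # Null variance of u (the usual large-sample form with N^3).
    sum_t2n = sum(t * t * n for t, n in zip(scores, col_totals))
    sum_tn = sum(t * n for t, n in zip(scores, col_totals))
    var = (n_cases * n_controls / n_total ** 3) * (n_total * sum_t2n - sum_tn ** 2)
    if var == 0:
        return 0.0, 1.0

    chi2 = u * u / var
    # Upper tail of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

    For example, `trend_test([10, 20, 30], [30, 20, 10])` tests whether the proportion of cases rises with the number of risk alleles in a made-up sample of 60 cases and 60 controls.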

  6. Are Interactions between cis-Regulatory Variants Evidence for Biological Epistasis or Statistical Artifacts?

    PubMed

    Fish, Alexandra E; Capra, John A; Bush, William S

    2016-10-06

    The importance of epistasis, or statistical interactions between genetic variants, to the development of complex disease in humans has been controversial. Genome-wide association studies of statistical interactions influencing human traits have recently become computationally feasible and have identified many putative interactions. However, statistical models used to detect interactions can be confounded, which makes it difficult to be certain that observed statistical interactions are evidence for true molecular epistasis. In this study, we investigate whether there is evidence for epistatic interactions between genetic variants within the cis-regulatory region that influence gene expression after accounting for technical, statistical, and biological confounding factors. We identified 1,119 (FDR = 5%) interactions that appear to regulate gene expression in human lymphoblastoid cell lines, a tightly controlled, largely genetically determined phenotype. Many of these interactions replicated in an independent dataset (90 of 803 tested, Bonferroni threshold). We then performed an exhaustive analysis of both known and novel confounders, including ceiling/floor effects, missing genotype combinations, haplotype effects, single variants tagged through linkage disequilibrium, and population stratification. Every interaction could be explained by at least one of these confounders, and replication in independent datasets did not protect against some confounders. Assuming that the confounding factors provide a more parsimonious explanation for each interaction, we find it unlikely that cis-regulatory interactions contribute strongly to human gene expression, which calls into question the relevance of cis-regulatory interactions for other human phenotypes. We additionally propose several best practices for epistasis testing to protect future studies from confounding. Copyright © 2016 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  7. Comparisons of non-Gaussian statistical models in DNA methylation analysis.

    PubMed

    Ma, Zhanyu; Teschendorff, Andrew E; Yu, Hong; Taghia, Jalil; Guo, Jun

    2014-06-16

    As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance.

  8. Comparisons of Non-Gaussian Statistical Models in DNA Methylation Analysis

    PubMed Central

    Ma, Zhanyu; Teschendorff, Andrew E.; Yu, Hong; Taghia, Jalil; Guo, Jun

    2014-01-01

    As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance. PMID:24937687

  9. Inferring Causalities in Landscape Genetics: An Extension of Wright's Causal Modeling to Distance Matrices.

    PubMed

    Fourtune, Lisa; Prunier, Jérôme G; Paz-Vinas, Ivan; Loot, Géraldine; Veyssière, Charlotte; Blanchet, Simon

    2018-04-01

    Identifying landscape features that affect functional connectivity among populations is a major challenge in fundamental and applied sciences. Landscape genetics combines landscape and genetic data to address this issue, with the main objective of disentangling direct and indirect relationships among an intricate set of variables. Causal modeling has strong potential to address the complex nature of landscape genetic data sets. However, this statistical approach was not initially developed to address the pairwise distance matrices commonly used in landscape genetics. Here, we aimed to extend the applicability of two causal modeling methods (maximum-likelihood path analysis and the directional separation test) by developing statistical approaches aimed at handling distance matrices and improving functional connectivity inference. Using simulations, we showed that these approaches greatly improved the robustness of the absolute (using a frequentist approach) and relative (using an information-theoretic approach) fits of the tested models. We used an empirical data set combining genetic information on a freshwater fish species (Gobio occitaniae) and detailed landscape descriptors to demonstrate the usefulness of causal modeling to identify functional connectivity in wild populations. Specifically, we demonstrated how direct and indirect relationships involving altitude, temperature, and oxygen concentration influenced within- and between-population genetic diversity of G. occitaniae.

  10. Genetic Contributions to The Association Between Adult Height and Head and Neck Cancer: A Mendelian Randomization Analysis.

    PubMed

    Pastorino, Roberta; Puggina, Anna; Carreras-Torres, Robert; Lagiou, Pagona; Holcátová, Ivana; Richiardi, Lorenzo; Kjaerheim, Kristina; Agudo, Antonio; Castellsagué, Xavier; Macfarlane, Tatiana V; Barzan, Luigi; Canova, Cristina; Thakker, Nalin S; Conway, David I; Znaor, Ariana; Healy, Claire M; Ahrens, Wolfgang; Zaridze, David; Szeszenia-Dabrowska, Neonilia; Lissowska, Jolanta; Fabianova, Eleonora; Mates, Ioan Nicolae; Bencko, Vladimir; Foretova, Lenka; Janout, Vladimir; Brennan, Paul; Gaborieau, Valérie; McKay, James D; Boccia, Stefania

    2018-03-14

    With the aim to dissect the effect of adult height on head and neck cancer (HNC), we use the Mendelian randomization (MR) approach to test the association between genetic instruments for height and the risk of HNC. 599 single nucleotide polymorphisms (SNPs) were identified as genetic instruments for height, accounting for 16% of the phenotypic variation. Genetic data concerning HNC cases and controls were obtained from a genome-wide association study. Summary statistics for genetic association were used in complementary MR approaches: the weighted genetic risk score (GRS) and the inverse-variance weighted (IVW). MR-Egger regression was used for sensitivity analysis and pleiotropy evaluation. From the GRS analysis, one standard deviation (SD) higher height (6.9 cm; due to genetic predisposition across 599 SNPs) raised the risk for HNC (Odds ratio (OR), 1.14; 95% Confidence Interval (95%CI), 0.99-1.32). The association analyses with potential confounders revealed that the GRS was associated with tobacco smoking (OR = 0.80, 95% CI (0.69-0.93)). MR-Egger regression did not provide evidence of overall directional pleiotropy. Our study indicates that height is potentially associated with HNC risk. However, the reported risk could be underestimated since, at the genetic level, height emerged to be inversely associated with smoking.
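
    The inverse-variance weighted (IVW) approach named in this record can be sketched from per-SNP summary statistics. This is the standard fixed-effect IVW formula, offered as an illustration only; the function name and the three-SNP inputs in the usage note are hypothetical and are not taken from the study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted Mendelian randomization estimate.

    beta_exposure: per-SNP effects on the exposure (e.g. height, in SD units)
    beta_outcome:  per-SNP effects on the outcome (e.g. log odds of disease)
    se_outcome:    standard errors of beta_outcome
    Returns the causal estimate and its standard error.
    """
    # Equivalent to a weighted mean of the per-SNP Wald ratios
    # (beta_outcome / beta_exposure) with weights beta_exposure^2 / se_outcome^2.
    numerator = sum(bx * by / se ** 2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2
                      for bx, se in zip(beta_exposure, se_outcome))
    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error
```

    Calling `ivw_estimate([0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.01, 0.02, 0.03])` returns a log odds ratio per SD of exposure; exponentiating it gives an odds ratio on the scale reported in the abstract.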

  11. Integrative Approaches to Understanding the Pathogenic Role of Genetic Variation in Rheumatic Diseases.

    PubMed

    Laufer, Vincent A; Chen, Jake Y; Langefeld, Carl D; Bridges, S Louis

    2017-08-01

    The use of high-throughput omics may help to understand the contribution of genetic variants to the pathogenesis of rheumatic diseases. We discuss the concept of missing heritability: that genetic variants do not explain the heritability of rheumatoid arthritis and related rheumatologic conditions. In addition to an overview of how integrative data analysis can lead to novel insights into mechanisms of rheumatic diseases, we describe statistical approaches to prioritizing genetic variants for future functional analyses. We illustrate how analyses of large datasets provide hope for improved approaches to the diagnosis, treatment, and prevention of rheumatic diseases. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. In defence of model-based inference in phylogeography

    PubMed Central

    Beaumont, Mark A.; Nielsen, Rasmus; Robert, Christian; Hey, Jody; Gaggiotti, Oscar; Knowles, Lacey; Estoup, Arnaud; Panchal, Mahesh; Corander, Jukka; Hickerson, Mike; Sisson, Scott A.; Fagundes, Nelson; Chikhi, Lounès; Beerli, Peter; Vitalis, Renaud; Cornuet, Jean-Marie; Huelsenbeck, John; Foll, Matthieu; Yang, Ziheng; Rousset, Francois; Balding, David; Excoffier, Laurent

    2017-01-01

    Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies when it is used with either single or multiple loci. We further show that the ages of clades are carelessly used to infer ages of demographic events, and that these ages are estimated under a simple model of panmixia and population stationarity but are then used under different and unspecified models to test hypotheses, a usage that invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics. PMID:29284924

  13. Adaptation to local ultraviolet radiation conditions among neighbouring Daphnia populations

    PubMed Central

    Miner, Brooks E.; Kerr, Benjamin

    2011-01-01

    Understanding the historical processes that generated current patterns of phenotypic diversity in nature is particularly challenging in subdivided populations. Populations often exhibit heritable genetic differences that correlate with environmental variables, but the non-independence among neighbouring populations complicates statistical inference of adaptation. To understand the relative influence of adaptive and non-adaptive processes in generating phenotypes requires joint evaluation of genetic and phenotypic divergence in an integrated and statistically appropriate analysis. We investigated phenotypic divergence, population-genetic structure and potential fitness trade-offs in populations of Daphnia melanica inhabiting neighbouring subalpine ponds of widely differing transparency to ultraviolet radiation (UVR). Using a combination of experimental, population-genetic and statistical techniques, we separated the effects of shared population ancestry and environmental variables in predicting phenotypic divergence among populations. We found that native water transparency significantly predicted divergence in phenotypes among populations even after accounting for significant population structure. This result demonstrates that environmental factors such as UVR can at least partially account for phenotypic divergence. However, a lack of evidence for a hypothesized trade-off between UVR tolerance and growth rates in the absence of UVR prevents us from ruling out the possibility that non-adaptive processes are partially responsible for phenotypic differentiation in this system. PMID:20943691

  14. [Characterization of patients with skeletal genetic diseases in a Colombian referral center].

    PubMed

    Velasco, Harvy Mauricio; Buelvas, Lina Patricia

    2017-06-01

    Short height in Colombia has an estimated prevalence of 10%. The 2009 Nosology and Classification of Skeletal Genetic Diseases described 456 clinical conditions using biochemical, molecular and radiological criteria for diagnosis. To analyze demographic, epidemiological and clinical variables in a group of patients with skeletal genetic diseases referred to the Instituto de Ortopedia Infantil Roosevelt. Patients referred between 2008 and 2014 were analyzed filtering 167 diagnoses of the International Classification of Diseases, 10th revision (ICD 10), related to skeletal genetic diseases. Demographic, epidemiological and clinical variables were explored using descriptive statistics. An intervention score was generated contemplating different combinations of treatments. An inferential statistical analysis using Student's t test was performed on such variables. The most frequent reason for consultation was suspicion of a genetic skeletal disorder. The types of treatments considered included support, surgical, pharmacological and orthotics, and it was established that genetic skeletal disorders were associated with higher intervention scores while tall and short height showed a lower score. Most referred patients were classified with genetic bone diseases, short stature and other monogenic genetic diseases. Significant differences were found between the age at symptoms onset and the age of diagnosis. Diversity was found in the therapeutic approach among different groups of pathologies. Patients with tall and short height showed lower intervention scores, which may warn on the need to reassess the therapeutic requirements of these groups.

  15. Statistics for Learning Genetics

    NASA Astrophysics Data System (ADS)

    Charles, Abigail Sheena

    This study investigated the knowledge and skills that biology students may need to help them understand statistics/mathematics as it applies to genetics. The data are based on analyses of current representative genetics texts, practicing genetics professors' perspectives, and more directly, students' perceptions of, and performance in, doing statistically-based genetics problems. This issue is at the emerging edge of modern college-level genetics instruction, and this study attempts to identify key theoretical components for creating a specialized biological statistics curriculum. The goal of this curriculum will be to prepare biology students with the skills for assimilating quantitatively-based genetic processes, increasingly at the forefront of modern genetics. To fulfill this, two college level classes at two universities were surveyed. One university was located in the northeastern US and the other in the West Indies. There was a sample size of 42 students and a supplementary interview was administered to a select 9 students. Interviews were also administered to professors in the field in order to gain insight into the teaching of statistics in genetics. Key findings indicated that students had very little to no background in statistics (55%). Although students did perform well on exams with 60% of the population receiving an A or B grade, 77% of them did not offer good explanations on a probability question associated with the normal distribution provided in the survey. The scope and presentation of the applicable statistics/mathematics in some of the most used textbooks in genetics teaching, as well as genetics syllabi used by instructors do not help the issue. It was found that the textbooks often either did not give effective explanations for students, or completely left out certain topics. The omission of certain statistically/mathematically oriented topics was also seen in the genetics syllabi reviewed for this study. Nonetheless, although the necessity for infusing these quantitative subjects with genetics and, overall, the biological sciences is growing (topics including synthetic biology, molecular systems biology and phylogenetics) there remains little time in the semester to be dedicated to the consolidation of learning and understanding.

  16. CTLA-4 gene polymorphisms and their influence on predisposition to autoimmune thyroid diseases (Graves’ disease and Hashimoto's thyroiditis)

    PubMed Central

    Pastuszak-Lewandoska, Dorota; Sewerynek, Ewa; Domańska, Daria; Gładyś, Aleksandra; Skrzypczak, Renata

    2012-01-01

    Introduction: Autoimmune thyroid disease (AITD) is associated with both genetic and environmental factors which lead to the overactivity of the immune system. Cytotoxic T-Lymphocyte Antigen 4 (CTLA-4) gene polymorphisms belong to the main genetic factors determining the susceptibility to AITD (Hashimoto's thyroiditis, HT and Graves' disease, GD) development. The aim of the study was to evaluate the relationship between CTLA-4 polymorphisms (A49G, 1822 C/T and CT60 A/G) and HT and/or GD in Polish patients. Material and methods: Molecular analysis involved AITD group, consisting of HT (n=28) and GD (n=14) patients, and a control group of healthy persons (n=20). Genomic DNA was isolated from peripheral blood and CTLA-4 polymorphisms were assessed by polymerase chain reaction-restriction fragment length polymorphism method, using three restriction enzymes: Fnu4HI (A49G), BsmAI (1822 C/T) and BsaAI (CT60 A/G). Results: Statistical analysis (χ² test) confirmed significant differences between the studied groups concerning CTLA-4 A49G genotypes. CTLA-4 A/G genotype was significantly more frequent in AITD group and OR analysis suggested that it might increase the susceptibility to HT. In GD patients, OR analysis revealed statistically significant relationship with the presence of G allele. In controls, CTLA-4 A/A genotype frequency was significantly increased suggesting a protective effect. There were no statistically significant differences regarding frequencies of other genotypes and polymorphic alleles of the CTLA-4 gene (1822 C/T and CT60 A/G) between the studied groups. Conclusions: CTLA-4 A49G polymorphism seems to be an important genetic determinant of the risk of HT and GD in Polish patients. PMID:22851994

  17. CTLA-4 gene polymorphisms and their influence on predisposition to autoimmune thyroid diseases (Graves' disease and Hashimoto's thyroiditis).

    PubMed

    Pastuszak-Lewandoska, Dorota; Sewerynek, Ewa; Domańska, Daria; Gładyś, Aleksandra; Skrzypczak, Renata; Brzeziańska, Ewa

    2012-07-04

    Autoimmune thyroid disease (AITD) is associated with both genetic and environmental factors which lead to the overactivity of the immune system. Cytotoxic T-Lymphocyte Antigen 4 (CTLA-4) gene polymorphisms belong to the main genetic factors determining the susceptibility to AITD (Hashimoto's thyroiditis, HT and Graves' disease, GD) development. The aim of the study was to evaluate the relationship between CTLA-4 polymorphisms (A49G, 1822 C/T and CT60 A/G) and HT and/or GD in Polish patients. Molecular analysis involved AITD group, consisting of HT (n=28) and GD (n=14) patients, and a control group of healthy persons (n=20). Genomic DNA was isolated from peripheral blood and CTLA-4 polymorphisms were assessed by polymerase chain reaction-restriction fragment length polymorphism method, using three restriction enzymes: Fnu4HI (A49G), BsmAI (1822 C/T) and BsaAI (CT60 A/G). Statistical analysis (χ² test) confirmed significant differences between the studied groups concerning CTLA-4 A49G genotypes. CTLA-4 A/G genotype was significantly more frequent in AITD group and OR analysis suggested that it might increase the susceptibility to HT. In GD patients, OR analysis revealed statistically significant relationship with the presence of G allele. In controls, CTLA-4 A/A genotype frequency was significantly increased suggesting a protective effect. There were no statistically significant differences regarding frequencies of other genotypes and polymorphic alleles of the CTLA-4 gene (1822 C/T and CT60 A/G) between the studied groups. CTLA-4 A49G polymorphism seems to be an important genetic determinant of the risk of HT and GD in Polish patients.
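
    The χ² test and odds ratio (OR) analysis mentioned in this record can be illustrated on a single 2 x 2 table (genotype carrier status by case/control status). The sketch below uses the Pearson χ² statistic and a Woolf-type 95% confidence interval; the abstract does not report genotype counts, so the counts in the usage note are hypothetical, as is the function name.

```python
import math


def odds_ratio_2x2(a, b, c, d, z=1.96):
    """Odds ratio, Woolf confidence interval and Pearson chi-square.

    Table layout (all counts must be > 0 for the interval to be defined):
                      cases   controls
        carriers        a        b
        non-carriers    c        d
    """
    odds_ratio = (a * d) / (b * c)

    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)

    # Pearson chi-square for a 2 x 2 table, without continuity correction.
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return odds_ratio, lower, upper, chi2
```

    With made-up counts, `odds_ratio_2x2(20, 8, 5, 15)` gives an OR of 7.5. In samples as small as those in this study, an exact test may be preferable to the large-sample χ² approximation.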

  18. Path analysis of the genetic integration of traits in the sand cricket: a novel use of BLUPs.

    PubMed

    Roff, D A; Fairbairn, D J

    2011-09-01

    This study combines path analysis with quantitative genetics to analyse a key life history trade-off in the cricket, Gryllus firmus. We develop a path model connecting five traits associated with the trade-off between flight capability and reproduction and test this model using phenotypic data and estimates of breeding values (best linear unbiased predictors) from a half-sibling experiment. Strong support by both types of data validates our causal model and indicates concordance between the phenotypic and genetic expression of the trade-off. Comparisons of the trade-off between sexes and wing morphs reveal that these discrete phenotypes are not genetically independent and that the evolutionary trajectories of the two wing morphs are more tightly constrained to covary than those of the two sexes. Our results illustrate the benefits of combining a quantitative genetic analysis, which examines statistical correlations between traits, with a path model that focuses upon the causal components of variation. © 2011 The Authors. Journal of Evolutionary Biology © 2011 European Society For Evolutionary Biology.

  19. Genetic differentiation of the stingless bee Tetragonula pagdeni in Thailand using SSCP analysis of a large subunit of mitochondrial ribosomal DNA.

    PubMed

    Thummajitsakul, Sirikul; Klinbunga, Sirawut; Sittipraneed, Siriporn

    2011-08-01

    Genetic diversity and population differentiation of the stingless bee Tetragonula pagdeni (Schwarz) was assessed using single-strand conformational polymorphism (SSCP) analysis of a large subunit of the ribosomal RNA gene (16S rRNA). High levels of genetic variation among individuals within each population (North, Northeast, Central, Prachuap Khiri Khan, Chumphon, and Peninsular Thailand) of T. pagdeni were observed. Analysis of molecular variance indicated significant genetic differentiation among the six geographic populations (ΦPT = 0.28, P < 0.001) and between samples collected from north and south of the Isthmus of Kra (ΦPT = 0.18, P < 0.001). In addition, ΦPT values between all pairwise comparisons were statistically significant (P < 0.01), indicating strong degrees of intraspecific population differentiation. Therefore, PCR-SSCP is a simple and cost-effective technique applicable for routine population genetic analyses in T. pagdeni and other stingless bees. The results also provide an important baseline for the conservation and management of this ecologically important species.

  20. Genetic association between the dopamine D1-receptor gene and paranoid schizophrenia in a northern Han Chinese population.

    PubMed

    Yao, Jun; Ding, Mei; Xing, Jiaxin; Xuan, Jinfeng; Pang, Hao; Pan, Yuqing; Wang, Baojie

    2014-01-01

    Dysregulation of dopaminergic neurotransmission at the D1 receptor in the prefrontal cortex has been implicated in the pathogenesis of schizophrenia. Genetic polymorphisms of the dopamine D1-receptor gene have a plausible role in modulating the risk of schizophrenia. To determine the role of DRD1 genetic polymorphisms as a risk factor for schizophrenia, we undertook a case-control study to look for an association between the DRD1 gene and schizophrenia. We genotyped eleven single-nucleotide polymorphisms within the DRD1 gene by deoxyribonucleic acid sequencing involving 173 paranoid schizophrenia patients and 213 unrelated healthy individuals. Statistical analysis was performed to identify the difference of genotype, allele, or haplotype distribution between cases and controls. A significantly lower risk of paranoid schizophrenia was associated with the AG + GG genotype of rs5326 and the AG + GG genotype of rs4532 compared to the AA genotype and the AA genotype, respectively. Distribution of haplotypes was no different between controls and paranoid schizophrenia patients. In the males, the genotype distribution of rs5326 was statistically different between cases and controls. In the females, the genotype distribution of rs4532 was statistically different between cases and controls. However, the aforementioned statistical significances were lost after Bonferroni correction. It is unlikely that DRD1 accounts for a substantial proportion of the genetic risk for schizophrenia. As an important dopaminergic gene, DRD1 may contribute to schizophrenia by interacting with other genes, and further relevant studies are warranted.
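
    The loss of significance after Bonferroni correction described in this record follows directly from the arithmetic of the correction. A minimal sketch, assuming one test per genotyped SNP (eleven here); the p-values in the usage note are hypothetical, since the abstract reports none.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests
```

    With eleven tests, `bonferroni_threshold(0.05, 11)` is about 0.0045, so a nominal p-value of 0.02 would no longer count as significant.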

  1. Across-cohort QC analyses of GWAS summary statistics from complex traits.

    PubMed

    Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M

    2016-01-01

    Genome-wide association studies (GWASs) have been successful in discovering SNP-trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small and this requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information and summary statistics, and methods to investigate sample overlap. (I) We use the population genetics Fst statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy.

  2. Across-cohort QC analyses of GWAS summary statistics from complex traits

    PubMed Central

    Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M

    2017-01-01

    Genome-wide association studies (GWASs) have been successful in discovering SNP-trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small and this requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information and summary statistics, and methods to investigate sample overlap. (I) We use the population genetics Fst statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy. PMID:27552965
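
    Metric (I) relies on Fst computed from the allele frequencies each cohort reports. The sketch below uses the textbook heterozygosity-based definition, Fst = (H_T - H_S) / H_T, for one biallelic SNP and two equally weighted cohorts. It is an illustration only: the abstract does not state which Fst estimator the authors used, and the function name and frequencies are hypothetical.

```python
def fst_two_cohorts(p1, p2):
    """Wright's Fst for one biallelic SNP from two cohorts' allele frequencies.

    p1, p2: frequency of the same reference allele in each cohort.
    Cohorts are weighted equally; sample-size corrections are ignored.
    """
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity in the pooled population.
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within cohorts.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        return 0.0  # allele fixed in both cohorts: no differentiation measurable
    return (h_total - h_within) / h_total
```

    In practice a genome-wide value would be obtained by combining many SNPs, for example as a ratio of summed numerators to summed denominators, rather than from a single marker.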

  3. Incidence, prevalence and genetic determinants of neonatal diabetes mellitus: a systematic review and meta-analysis protocol.

    PubMed

    Nansseu, Jobert Richie N; Ngo-Um, Suzanne S; Balti, Eric V

    2016-11-10

    In the absence of existing data, the present review intends to determine the incidence, prevalence and/or genetic determinants of neonatal diabetes mellitus (NDM), with expected contribution to disease characterization. We will include cross-sectional, cohort or case-control studies which have reported the incidence, prevalence and/or genetic determinants of NDM between January 01, 2000 and May 31, 2016, published in English or French and without any geographical limitation. PubMed and EMBASE will be extensively screened to identify potentially eligible studies, complemented by a manual search. Two authors will independently screen, select studies, extract data, and assess the risk of bias; disagreements will be resolved by consensus. Clinical heterogeneity will be investigated by examining the design and setting (including geographic region), procedure used for genetic testing, calculation of incidence or prevalence, and outcomes in each study. Studies found to be clinically homogeneous will be pooled together through a random effects meta-analysis. Statistical heterogeneity will be assessed using the chi-square test of homogeneity and quantified using the I² statistic. In case of substantial heterogeneity, subgroup analyses will be undertaken. Publication bias will be assessed with funnel plots, complemented with the use of Egger's test of bias. This systematic review and meta-analysis is expected to draw a clear picture of phenotypic and genotypic presentations of NDM in order to better understand the condition and adequately address challenges with respect to its management. PROSPERO CRD42016039765.
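
    The pooling and heterogeneity steps in this protocol can be sketched as follows. The protocol names a random effects meta-analysis, the chi-square test of homogeneity (Cochran's Q) and I²; the DerSimonian-Laird estimator of the between-study variance used here is an assumption, since the protocol abstract does not name one. The function name and inputs are hypothetical.

```python
def random_effects_meta(estimates, variances):
    """DerSimonian-Laird random effects pooling with Cochran's Q and I^2.

    estimates: per-study effect estimates (e.g. transformed prevalences)
    variances: their within-study variances
    Returns (pooled estimate, Q, I^2 in percent, tau^2).
    """
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1

    # I^2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Between-study variance (method of moments), truncated at zero.
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    return pooled, q, i2, tau2
```

    Under the null of homogeneity, Q is compared with a chi-square distribution on (number of studies - 1) degrees of freedom.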

  4. Application of Multivariate Statistical Analysis to Biomarkers in SE-Turkey Crude Oils

    NASA Astrophysics Data System (ADS)

    Gürgey, K.; Canbolat, S.

    2017-11-01

    Twenty-four crude oil samples were collected from 24 oil fields distributed across different districts of SE-Turkey. API gravity and sulphur content (%), stable carbon isotope, gas chromatography (GC), and gas chromatography-mass spectrometry (GC-MS) data were used to construct a geochemical data matrix. The aim of this study is to examine the genetic grouping or correlations in the crude oil samples, and hence the number of source rocks present in SE-Turkey. To achieve this aim, two multivariate statistical analysis techniques (Principal Component Analysis [PCA] and Cluster Analysis) were applied to a data matrix of 24 samples and 8 source-specific biomarker variables/parameters. The results showed that there are 3 genetically different oil groups: Batman-Nusaybin Oils, Adıyaman-Kozluk Oils and Diyarbakir Oils, in addition to one mixed group. These groupings imply that at least three different source rocks are present in South-Eastern (SE) Turkey. The grouping of the crude oil samples appears to be consistent with the geographic locations of the oil fields, the subsurface stratigraphy, and the geology of the area.

  5. Multi-Genetic Marker Approach and Spatio-Temporal Analysis Suggest There Is a Single Panmictic Population of Swordfish Xiphias gladius in the Indian Ocean

    PubMed Central

    Muths, Delphine; Le Couls, Sarah; Evano, Hugues; Grewe, Peter; Bourjea, Jerome

    2013-01-01

    Genetic population structure of swordfish Xiphias gladius was examined based on 2231 individual samples, collected mainly between 2009 and 2010, among three major sampling areas within the Indian Ocean (IO; twelve distinct sites), Atlantic (two sites) and Pacific (one site) Oceans using analysis of nineteen microsatellite loci (n = 2146) and mitochondrial ND2 sequences (n = 2001) data. Sample collection was stratified in time and space in order to investigate the stability of the genetic structure observed, with a special focus on the South West Indian Ocean. Significant AMOVA variance was observed for both markers, indicating that genetic population subdivision was present between oceans. The overall value of F-statistics for ND2 sequences confirmed that Atlantic and Indian Ocean swordfish represent two distinct genetic stocks. Indo-Pacific differentiation was also significant but lower than that observed between the Atlantic and Indian Oceans. However, microsatellite F-statistics failed to reveal structure even at the inter-oceanic scale, indicating that the resolving power of our microsatellite loci was insufficient for detecting population subdivision. At the scale of the Indian Ocean, results obtained from both markers are consistent with swordfish belonging to a single panmictic population. Analyses partitioned by sampling area, season, or sex also failed to identify any clear structure within this ocean. Such large spatial and temporal homogeneity of genetic structure, observed for such a large, highly mobile pelagic species, suggests that it is satisfactory to consider swordfish as a single panmictic population in the Indian Ocean. PMID:23717447

  6. Statistical power and utility of meta-analysis methods for cross-phenotype genome-wide association studies.

    PubMed

    Zhu, Zhaozhong; Anttila, Verneri; Smoller, Jordan W; Lee, Phil H

    2018-01-01

    Recent advances in genome-wide association studies (GWAS) suggest that pleiotropic effects on human complex traits are widespread. A number of classic and recent meta-analysis methods have been used to identify genetic loci with pleiotropic effects, but the overall performance of these methods is not well understood. In this work, we use extensive simulations and case studies of GWAS datasets to investigate the power and type-I error rates of ten meta-analysis methods. We specifically focus on three conditions commonly encountered in studies of multiple traits: (1) extensive heterogeneity of genetic effects; (2) characterization of trait-specific association; and (3) inflated correlation of GWAS due to overlapping samples. Although statistical power is highly variable under distinct study conditions, we found that several methods have superior power under diverse heterogeneity. In particular, the classic fixed-effects model showed surprisingly good performance when a variant is associated with more than half of the study traits. As the number of traits with null effects increases, ASSET performed the best, along with competitive specificity and sensitivity. With opposite directional effects, CPASSOC featured first-rate power. However, caution is advised when using CPASSOC for studying genetically correlated traits with overlapping samples. We conclude with a discussion of unresolved issues and directions for future research.

  7. A set of autosomal multiple InDel markers for forensic application and population genetic analysis in the Chinese Xinjiang Hui group.

    PubMed

    Xie, Tong; Guo, Yuxin; Chen, Ling; Fang, Yating; Tai, Yunchun; Zhou, Yongsong; Qiu, Pingming; Zhu, Bofeng

    2018-07-01

    In recent years, insertion/deletion (InDel) markers have become a promising and useful supporting tool in forensic identification cases and biogeographic research. In this study, 30 InDel loci were explored to reveal the genetic diversity of the Chinese Xinjiang Hui group and its genetic relationships with 25 previously reported populations using various biostatistical methods such as forensic statistical parameter analysis, phylogenetic reconstruction, multi-dimensional scaling, principal component analysis, and STRUCTURE analysis. No deviations from Hardy-Weinberg equilibrium were found at any of the 30 loci in the Chinese Xinjiang Hui group. The observed heterozygosity ranged from 0.1971 (HLD118) to 0.5092 (HLD92), and the expected heterozygosity from 0.2222 (HLD118) to 0.5000 (HLD6). The cumulative probability of exclusion and combined power of discrimination were 0.988849 and 0.99999999999378, respectively, which indicated that these 30 loci qualify for personal identification and can be used as complementary genetic markers for paternity tests in forensic cases. The results of the present research, based on different methods of population genetic analysis, revealed that the Chinese Xinjiang Hui group had close relationships with most Chinese groups, especially Han populations. Nevertheless, for a better understanding of the genetic background of the Chinese Xinjiang Hui group, more molecular genetic markers such as ancestry informative markers, single nucleotide polymorphisms (SNPs), and copy number variations will be examined in future studies. Copyright © 2018 Elsevier B.V. All rights reserved.
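
    The forensic parameters quoted above follow from standard textbook formulas, sketched below. The function names and any inputs are hypothetical; the abstract reports only the final values, not the per-locus frequencies, so nothing here reproduces the study's data.

```python
import math


def expected_heterozygosity(allele_freq):
    """Expected heterozygosity 2pq of a biallelic locus under Hardy-Weinberg."""
    return 2.0 * allele_freq * (1.0 - allele_freq)


def power_of_discrimination(genotype_freqs):
    """Per-locus PD: one minus the chance two random people share a genotype."""
    return 1.0 - sum(g * g for g in genotype_freqs)


def combine_across_loci(per_locus_values):
    """Combine per-locus PD (or probability of exclusion) over independent loci."""
    return 1.0 - math.prod(1.0 - v for v in per_locus_values)
```

    Because 2pq peaks at p = 0.5, the expected heterozygosity of a biallelic InDel cannot exceed 0.5, which is consistent with the reported maximum of 0.5000. The combination step assumes the loci are independent.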

  8. A model for family-based case-control studies of genetic imprinting and epistasis.

    PubMed

    Li, Xin; Sui, Yihan; Liu, Tian; Wang, Jianxin; Li, Yongci; Lin, Zhenwu; Hegarty, John; Koltun, Walter A; Wang, Zuoheng; Wu, Rongling

    2014-11-01

    Genetic imprinting, also called the parent-of-origin effect, has been recognized to play an important role in the formation and pathogenesis of human diseases. Although the epigenetic mechanisms that establish genetic imprinting have been a focus of many genetic studies, our knowledge about the number of imprinted genes and their chromosomal locations and interactions with other genes is still scarce, limiting precise inference of the genetic architecture of complex diseases. In this article, we present a statistical model for testing and estimating the effects of genetic imprinting on complex diseases using a commonly used case-control design with family structure. For each subject sampled from a case and control population, we not only genotype its own single nucleotide polymorphisms (SNPs) but also collect its parents' genotypes. By tracing the transmission pattern of SNP alleles from the parental to the offspring generation, the model allows the characterization of genetic imprinting effects based on Pearson tests of a 2 × 2 contingency table. The model is expanded to test the interactions between imprinting effects and additive, dominant and epistatic effects in a complex web of genetic interactions. Statistical properties of the model are investigated, and its practical usefulness is validated by a real data analysis. The model will provide a useful tool for genome-wide association studies that aim to elucidate the genetic control of complex human diseases. © The Author 2013. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
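
    The Pearson test on a 2 × 2 table that the model relies on can be sketched as below. How the authors fill the table from transmission patterns is not described in the abstract, so the layout suggested in the docstring (parental origin of an allele by case/control status) is only an assumed illustration, and the function name is hypothetical.

```python
import math


def pearson_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) on [[a, b], [c, d]].

    One assumed layout: rows = paternal vs. maternal origin of the
    transmitted allele, columns = cases vs. controls.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column needs a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```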

  9. Statistics for Learning Genetics

    ERIC Educational Resources Information Center

    Charles, Abigail Sheena

    2012-01-01

    This study investigated the knowledge and skills that biology students may need to help them understand statistics/mathematics as it applies to genetics. The data are based on analyses of current representative genetics texts, practicing genetics professors' perspectives, and more directly, students' perceptions of, and performance in, doing…

  10. DISTMIX: direct imputation of summary statistics for unmeasured SNPs from mixed ethnicity cohorts.

    PubMed

    Lee, Donghyung; Bigdeli, T Bernard; Williamson, Vernell S; Vladimirov, Vladimir I; Riley, Brien P; Fanous, Ayman H; Bacanu, Silviu-Alin

    2015-10-01

    To increase the signal resolution for large-scale meta-analyses of genome-wide association studies, genotypes at unmeasured single nucleotide polymorphisms (SNPs) are commonly imputed using large multi-ethnic reference panels. However, the ever-increasing size and ethnic diversity of both reference panels and cohorts make genotype imputation computationally challenging for moderately sized computer clusters. Moreover, genotype imputation requires subject-level genetic data, which, unlike the summary statistics provided by virtually all studies, are not publicly available. Although there are much less demanding methods that avoid the genotype imputation step by directly imputing SNP statistics, e.g. Directly Imputing summary STatistics (DIST) proposed by our group, their implicit assumptions make them applicable only to ethnically homogeneous cohorts. To decrease computational and access requirements for the analysis of cosmopolitan cohorts, we propose DISTMIX, which extends DIST capabilities to the analysis of mixed ethnicity cohorts. The method uses a relevant reference panel to directly impute unmeasured SNP statistics based only on statistics at measured SNPs and estimated/user-specified ethnic proportions. Simulations show that the proposed method adequately controls the Type I error rates. The 1000 Genomes panel imputation of summary statistics from the ethnically diverse Psychiatric Genetic Consortium Schizophrenia Phase 2 suggests that, when compared to genotype imputation methods, DISTMIX offers comparable imputation accuracy for only a fraction of the computational resources. DISTMIX software, its reference population data, and usage examples are publicly available at http://code.google.com/p/distmix. Contact: dlee4@vcu.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
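
    Direct imputation of a summary statistic is commonly formulated as a conditional expectation under a multivariate normal model: the Z-score at an unmeasured SNP is a weighted sum of the Z-scores at measured SNPs, with weights derived from linkage disequilibrium (LD) correlations in a reference panel. The sketch below shows that generic formulation only. It is not the DISTMIX code; the function names, the optional `ridge` term, and the idea that the caller supplies LD values already matched to the cohort's ethnic mixture are assumptions.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("LD matrix is singular; try a larger ridge")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def impute_z(z_measured, ld_measured, ld_to_target, ridge=0.0):
    """Impute the Z-score at one unmeasured SNP from measured neighbours.

    ld_measured  : correlation matrix among measured SNPs (reference panel)
    ld_to_target : correlations between each measured SNP and the target
    ridge        : hypothetical regularisation added to the diagonal
    Returns (imputed Z, information), where information is the share of the
    target's variance explained by the measured SNPs under the model.
    """
    n = len(z_measured)
    regularised = [
        [ld_measured[i][j] + (ridge if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    weights = solve_linear(regularised, ld_to_target)
    z_imputed = sum(w * z for w, z in zip(weights, z_measured))
    info = sum(w * r for w, r in zip(weights, ld_to_target))
    return z_imputed, info
```

    With a single measured SNP the result reduces to the LD correlation times the measured Z-score. The imputed value is shrunk toward zero when information is low, which is one reason imputed statistics are often rescaled or filtered before use.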

  11. Complex Adaptive System Models and the Genetic Analysis of Plasma HDL-Cholesterol Concentration

    PubMed Central

    Rea, Thomas J.; Brown, Christine M.; Sing, Charles F.

    2006-01-01

    Despite remarkable advances in diagnosis and therapy, ischemic heart disease (IHD) remains a leading cause of morbidity and mortality in industrialized countries. Recent efforts to estimate the influence of genetic variation on IHD risk have focused on predicting individual plasma high-density lipoprotein cholesterol (HDL-C) concentration. Plasma HDL-C concentration (mg/dl), a quantitative risk factor for IHD, has a complex multifactorial etiology that involves the actions of many genes. Single gene variations may be necessary but are not individually sufficient to predict a statistically significant increase in risk of disease. The complexity of phenotype-genotype-environment relationships involved in determining plasma HDL-C concentration has challenged commonly held assumptions about genetic causation and has led to the question of which combination of variations, in which subset of genes, in which environmental strata of a particular population significantly improves our ability to predict high or low risk phenotypes. We document the limitations of inferences from genetic research based on commonly accepted biological models, consider how evidence for real-world dynamical interactions between HDL-C determinants challenges the simplifying assumptions implicit in traditional linear statistical genetic models, and conclude by considering research options for evaluating the utility of genetic information in predicting traits with complex etiologies. PMID:17146134

  12. A Genome-Wide Association Meta-Analysis of Attention-Deficit/Hyperactivity Disorder Symptoms in Population-Based Paediatric Cohorts

    PubMed Central

    Groen-Blokhuis, Maria M.; Pourcain, Beate St.; Greven, Corina U.; Pappa, Irene; Tiesler, Carla M.T.; Ang, Wei; Nolte, Ilja M.; Vilor-Tejedor, Natalia; Bacelis, Jonas; Ebejer, Jane L.; Zhao, Huiying; Davies, Gareth E.; Ehli, Erik A.; Evans, David M.; Fedko, Iryna O.; Guxens, Mònica; Hottenga, Jouke-Jan; Hudziak, James J.; Jugessur, Astanand; Kemp, John P.; Krapohl, Eva; Martin, Nicholas G.; Murcia, Mario; Myhre, Ronny; Ormel, Johan; Ring, Susan M.; Standl, Marie; Stergiakouli, Evie; Stoltenberg, Camilla; Thiering, Elisabeth; Timpson, Nicholas J.; Trzaskowski, Maciej; van der Most, Peter J.; Wang, Carol; Nyholt, Dale R.; Medland, Sarah E.; Neale, Benjamin; Jacobsson, Bo; Sunyer, Jordi; Hartman, Catharina A.; Whitehouse, Andrew J.O.; Pennell, Craig E.; Heinrich, Joachim; Plomin, Robert; Smith, George Davey; Tiemeier, Henning; Posthuma, Danielle; Boomsma, Dorret I.

    2016-01-01

    Objective: To elucidate the influence of common genetic variants on childhood attention-deficit/hyperactivity disorder (ADHD) symptoms, to identify genetic variants that explain its high heritability, and to investigate the genetic overlap of ADHD symptom scores with ADHD diagnosis. Method: Within the EArly Genetics and Lifecourse Epidemiology (EAGLE) consortium, genome-wide single nucleotide polymorphisms (SNPs) and ADHD symptom scores were available for 17,666 children (< 13 years) from nine population-based cohorts. SNP-based heritability was estimated in data from the three largest cohorts. Meta-analysis based on genome-wide association (GWA) analyses with SNPs was followed by gene-based association tests, and the overlap in results with a meta-analysis in the Psychiatric Genomics Consortium (PGC) case-control ADHD study was investigated. Results: SNP-based heritability ranged from 5% to 34%, indicating that variation in common genetic variants influences ADHD symptom scores. The meta-analysis did not detect genome-wide significant SNPs, but three genes, lying close to each other with SNPs in high linkage disequilibrium (LD), showed a gene-wide significant association (p values between 1.46×10⁻⁶ and 2.66×10⁻⁶). One gene, WASL, is involved in neuronal development. Both SNP- and gene-based analyses indicated overlap with the PGC meta-analysis results with the genetic correlation estimated at 0.96. Conclusion: The SNP-based heritability for ADHD symptom scores indicates a polygenic architecture and genes involved in neurite outgrowth are possibly involved. Continuous and dichotomous measures of ADHD appear to assess a genetically common phenotype. A next step is to combine data from population-based and case-control cohorts in genetic association studies to increase sample size and improve statistical power for identifying genetic variants. PMID:27663945

  13. Genetics of human body size and shape: body proportions and indices.

    PubMed

    Livshits, Gregory; Roset, A; Yakovenko, K; Trofimov, S; Kobyliansky, E

    2002-01-01

    The study of the genetic component in morphological variables such as body height and weight, head and chest circumference, etc. has a rather long history. However, only a few studies have investigated body proportions and configuration. The major aim of the present study was to evaluate the extent of the possible genetic effects on the inter-individual variation of a number of body configuration indices amenable to clear functional interpretation. Two ethnically different pedigree samples were used in the study: (1) Turkmenians (805 individuals) from Central Asia, and (2) Chuvasha (732 individuals) from the Volga riverside, Russian Federation. To achieve the aim of the present study we proposed three new indices, which were subjected to a statistical-genetic analysis using a modified version of the "FISHER" software. The proposed indices were: (1) an integral index of torso volume (IND#1), (2) an index reflecting a predisposition of body proportions to maintain balance in a vertical position (IND#2), and (3) an index of skeletal extremities volume (IND#3). Additionally, the first two principal factors (PF1 and PF2) obtained from 19 measurements of body length and breadth were subjected to genetic analysis. Variance decomposition analysis, which simultaneously assesses the contribution of gender, age, additive genetic effects and effects of environment shared by nuclear family members, was applied to fit the variation of the above three indices and of PF1 and PF2. The raw familial correlations of all study traits in both samples showed that (1) no marital correlation differed significantly from zero, and (2) parent-offspring and sibling correlations were all positive and statistically significant. The parameter estimates obtained in the variance analyses showed that from 40% to 75% of the inter-individual variation of the studied traits (adjusted for age and sex) was attributable to genetic effects. For PF1 and PF2 in both samples, and for IND#2 (in Chuvasha pedigrees), significant common sib environmental effects were also detectable. Genetic factors substantially influence inter-individual differences in body shape and configuration in the two studied samples. However, further studies are needed to clarify the extent of pleiotropy and epigenetic effects on various facets of the human physique.

  14. Increased Risk of the APOB rs11279109 Polymorphism for CHD among the Kuwaiti Population

    PubMed Central

    Ismael, Fatma G.; Al-Serri, Ahmad; Al-Rashdan, Ibrahim

    2017-01-01

    Background: Coronary heart disease (CHD) is among the leading causes of death in Kuwait. This case-control study investigated the genetic association of APOB rs11279109 with CHD in Kuwaitis. Methods: The polymorphism was genotyped in 734 Kuwaiti samples by direct amplification. Statistical analysis with genetic modeling was used to assess its association with CHD. Results: A statistically significant association (P < 0.001) between the rs11279109 DD genotype (OR: 2.43, CI: 1.34–4.41) and CHD was observed. A codominant genetic model revealed a 2.69-fold increase in risk (CI: 1.57–4.61) for the DD genotype (P = 0.009) independent of age, sex, BMI, smoking, hypercholesterolemia, and ethnicity, suggesting APOB rs11279109 as an indicator of increased risk of CHD. Conclusion: The DD genotype may explain molecular mechanisms that underlie increased LDL oxidation leading to atherosclerosis. The findings emphasize the need to identify genetic markers specific to the ethnic group of the CHD patient in order to improve prognosis and help in early diagnosis and prevention. PMID:29362515
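
    An odds ratio with its confidence interval, as reported in the Results above, can be obtained from a 2 × 2 genotype-by-status table. The sketch below uses the unadjusted Wald interval on the log scale; the study's figures come from adjusted genetic models, so this is a simplified illustration, and the function name and any counts passed to it are hypothetical.

```python
import math


def odds_ratio_wald_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval (95% for z=1.96).

    'Exposed' here would mean carrying the risk genotype (e.g. DD).
    All four counts must be positive.
    """
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

    An interval that excludes 1, such as the reported 1.34–4.41, corresponds to a significant association at the matching alpha level.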

  15. Design and analysis issues in gene and environment studies

    PubMed Central

    2012-01-01

    Both nurture (environmental factors) and nature (genetic factors) play an important role in human disease etiology. Traditionally, these effects have been thought of as independent. This perspective is ill-informed for non-Mendelian complex disorders, which result from an interaction between genetics and environment. To understand health and disease we must study how nature and nurture interact. Recent advances in human genomics and high-throughput biotechnology make it possible to study large numbers of genetic markers and gene products simultaneously to explore their interactions with environment. The purpose of this review is to discuss design and analytic issues for gene-environment interaction studies in the “-omics” era, with a focus on environmental and genetic epidemiological studies. We present an expanded environmental genomic disease paradigm. We discuss several study design issues for gene-environment interaction studies, including confounding and selection bias, and measurement of exposures and genotypes. We discuss statistical issues in studying gene-environment interactions in different study designs, such as choices of statistical models, assumptions regarding biological factors, and power and sample size considerations, especially in genome-wide gene-environment studies. Future research directions are also discussed. PMID:23253229

  16. Design and analysis issues in gene and environment studies.

    PubMed

    Liu, Chen-yu; Maity, Arnab; Lin, Xihong; Wright, Robert O; Christiani, David C

    2012-12-19

    Both nurture (environmental factors) and nature (genetic factors) play an important role in human disease etiology. Traditionally, these effects have been thought of as independent. This perspective is ill-informed for non-Mendelian complex disorders, which result from an interaction between genetics and environment. To understand health and disease we must study how nature and nurture interact. Recent advances in human genomics and high-throughput biotechnology make it possible to study large numbers of genetic markers and gene products simultaneously to explore their interactions with environment. The purpose of this review is to discuss design and analytic issues for gene-environment interaction studies in the "-omics" era, with a focus on environmental and genetic epidemiological studies. We present an expanded environmental genomic disease paradigm. We discuss several study design issues for gene-environment interaction studies, including confounding and selection bias, and measurement of exposures and genotypes. We discuss statistical issues in studying gene-environment interactions in different study designs, such as choices of statistical models, assumptions regarding biological factors, and power and sample size considerations, especially in genome-wide gene-environment studies. Future research directions are also discussed.

  17. Applications of modern statistical methods to analysis of data in physical science

    NASA Astrophysics Data System (ADS)

    Wicker, James Eric

    Modern methods of statistical and computational analysis offer solutions to dilemmas confronting researchers in physical science. Although the ideas behind modern statistical and computational analysis methods were originally introduced in the 1970's, most scientists still rely on methods written during the early era of computing. These researchers, who analyze increasingly voluminous and multivariate data sets, need modern analysis methods to extract the best results from their studies. The first section of this work showcases applications of modern linear regression. Since the 1960's, many researchers in spectroscopy have used classical stepwise regression techniques to derive molecular constants. However, problems with thresholds of entry and exit for model variables plague this analysis method. Other criticisms of this kind of stepwise procedure include its inefficient searching method, the order in which variables enter or leave the model, and problems with overfitting data. We implement an information scoring technique that overcomes the assumptions inherent in the stepwise regression process to calculate molecular model parameters. We believe that this kind of information-based model evaluation can be applied to more general analysis situations in physical science. The second section proposes new methods of multivariate cluster analysis. The K-means algorithm and the EM algorithm, introduced in the 1960's and 1970's respectively, formed the basis of multivariate cluster analysis methodology for many years. However, shortcomings of these methods include strong dependence on initial seed values and inaccurate results when the data seriously depart from hypersphericity. We propose new cluster analysis methods based on genetic algorithms that overcome the strong dependence on initial seed values. In addition, we propose a generalization of the Genetic K-means algorithm which can accurately identify clusters with complex hyperellipsoidal covariance structures. We then use this new algorithm in a genetic algorithm-based Expectation-Maximization process that can accurately calculate parameters describing complex clusters in a mixture model routine. Using the accuracy of this GEM algorithm, we assign information scores to cluster calculations in order to best identify the number of mixture components in a multivariate data set. We will showcase how these algorithms can be used to process multivariate data from astronomical observations.

  18. FARVATX: FAmily-based Rare Variant Association Test for X-linked genes

    PubMed Central

    Choi, Sungkyoung; Lee, Sungyoung; Qiao, Dandi; Hardin, Megan; Cho, Michael H.; Silverman, Edwin K; Park, Taesung; Won, Sungho

    2016-01-01

    Although the X chromosome has many genes that are functionally related to human diseases, the complicated biological properties of the X chromosome have prevented efficient genetic association analyses, and only a few significantly associated X-linked variants have been reported for complex traits. For instance, dosage compensation of X-linked genes is often achieved via the inactivation of one allele in each X-linked variant in females; however, some X-linked variants can escape this X chromosome inactivation. Efficient genetic analyses cannot be conducted without prior knowledge about the gene expression process of X-linked variants, and misspecified information can lead to power loss. In this report, we propose new statistical methods for rare X-linked variant genetic association analysis of dichotomous phenotypes with family-based samples. The proposed methods are computationally efficient and can complete X-linked analyses within a few hours. Simulation studies demonstrate the statistical efficiency of the proposed methods, which were then applied to rare-variant association analysis of the X chromosome in chronic obstructive pulmonary disease (COPD). Some promising significant X-linked genes were identified, illustrating the practical importance of the proposed methods. PMID:27325607

  19. FARVATX: Family-Based Rare Variant Association Test for X-Linked Genes.

    PubMed

    Choi, Sungkyoung; Lee, Sungyoung; Qiao, Dandi; Hardin, Megan; Cho, Michael H; Silverman, Edwin K; Park, Taesung; Won, Sungho

    2016-09-01

    Although the X chromosome has many genes that are functionally related to human diseases, the complicated biological properties of the X chromosome have prevented efficient genetic association analyses, and only a few significantly associated X-linked variants have been reported for complex traits. For instance, dosage compensation of X-linked genes is often achieved via the inactivation of one allele in each X-linked variant in females; however, some X-linked variants can escape this X chromosome inactivation. Efficient genetic analyses cannot be conducted without prior knowledge about the gene expression process of X-linked variants, and misspecified information can lead to power loss. In this report, we propose new statistical methods for rare X-linked variant genetic association analysis of dichotomous phenotypes with family-based samples. The proposed methods are computationally efficient and can complete X-linked analyses within a few hours. Simulation studies demonstrate the statistical efficiency of the proposed methods, which were then applied to rare-variant association analysis of the X chromosome in chronic obstructive pulmonary disease. Some promising significant X-linked genes were identified, illustrating the practical importance of the proposed methods. © 2016 WILEY PERIODICALS, INC.

  20. Things fall apart: biological species form unconnected parsimony networks.

    PubMed

    Hart, Michael W; Sunday, Jennifer

    2007-10-22

    The generality of operational species definitions is limited by problematic definitions of between-species divergence. A recent phylogenetic species concept based on a simple objective measure of statistically significant genetic differentiation uses between-species application of statistical parsimony networks that are typically used for population genetic analysis within species. Here we review recent phylogeographic studies and reanalyse several mtDNA barcoding studies using this method. We found that (i) alignments of DNA sequences typically fall apart into a separate subnetwork for each Linnean species (but with a higher rate of true positives for mtDNA data) and (ii) DNA sequences from single species typically stick together in a single haplotype network. Departures from these patterns are usually consistent with hybridization or cryptic species diversity.

  1. Dissecting the genetics of complex traits using summary association statistics.

    PubMed

    Pasaniuc, Bogdan; Price, Alkes L

    2017-02-01

    During the past decade, genome-wide association studies (GWAS) have been used to successfully identify tens of thousands of genetic variants associated with complex traits and diseases. These studies have produced extensive repositories of genetic variation and trait measurements across large numbers of individuals, providing tremendous opportunities for further analyses. However, privacy concerns and other logistical considerations often limit access to individual-level genetic data, motivating the development of methods that analyse summary association statistics. Here, we review recent progress on statistical methods that leverage summary association data to gain insights into the genetic basis of complex traits and diseases.

  2. Dissecting the genetics of complex traits using summary association statistics

    PubMed Central

    Pasaniuc, Bogdan; Price, Alkes L.

    2017-01-01

    During the past decade, genome-wide association studies (GWAS) have successfully identified tens of thousands of genetic variants associated with complex traits and diseases. These studies have produced extensive repositories of genetic variation and trait measurements across large numbers of individuals, providing tremendous opportunities for further analyses. However, privacy concerns and other logistical considerations often limit access to individual-level genetic data, motivating the development of methods that analyze summary association statistics. Here we review recent progress on statistical methods that leverage summary association data to gain insights into the genetic basis of complex traits and diseases. PMID:27840428

  3. Evidence of a genetic link between endometriosis and ovarian cancer.

    PubMed

    Lee, Alice W; Templeman, Claire; Stram, Douglas A; Beesley, Jonathan; Tyrer, Jonathan; Berchuck, Andrew; Pharoah, Paul P; Chenevix-Trench, Georgia; Pearce, Celeste Leigh

    2016-01-01

    Objective: To evaluate whether endometriosis-associated genetic variation affects risk of ovarian cancer. Design: Pooled genetic analysis. Setting: University hospital. Patients: Genetic data from 46,176 participants (15,361 ovarian cancer cases and 30,815 controls) from 41 ovarian cancer studies. Interventions: None. Main outcome measures: Endometriosis-associated genetic variation and ovarian cancer. Results: There was significant evidence of an association between endometriosis-related genetic variation and ovarian cancer risk, especially for the high-grade serous and clear cell histotypes. Overall we observed 15 significant burden statistics, which was three times more than expected. Conclusions: By focusing on candidate regions from a phenotype associated with ovarian cancer, we have shown a clear genetic link between endometriosis and ovarian cancer that warrants further follow-up. The functional significance of the identified regions and SNPs is presently uncertain, though future fine mapping and histotype-specific functional analyses may shed light on the etiologies of both gynecologic conditions. Copyright © 2016. Published by Elsevier Inc.

  4. Evaluation and application of summary statistic imputation to discover new height-associated loci.

    PubMed

    Rüeger, Sina; McDaid, Aaron; Kutalik, Zoltán

    2018-05-01

    As most of the heritability of complex traits is attributed to common and low frequency genetic variants, imputing them by combining genotyping chips and large sequenced reference panels is the most cost-effective approach to discover the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-increasing reference panels is very cumbersome as it requires reimputation of the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient method is to directly impute the summary statistics, termed summary statistics imputation, which we improved to accommodate variable sample size across SNVs. Its performance relative to genotype imputation and practical utility has not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that genotype imputation boasts a 3- to 5-fold lower root-mean-square error and better distinguishes true associations from null ones. We observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false positive rates of 0.001, 0.01, 0.05, using summary statistics imputation yielded a decrease in statistical power by 9, 43 and 35%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants, and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 out of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci. Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian randomisation or LD-score regression.
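
    The abstract above describes imputing association summary statistics directly from a reference panel. A minimal sketch of the commonly used conditional-expectation form of that idea follows. It is not the authors' implementation: it ignores their extension for variable sample size across SNVs, and the function name `impute_z`, its arguments and the `ridge` value are illustrative assumptions.

```python
import numpy as np

def impute_z(z_typed, ld_typed, ld_cross, ridge=0.1):
    """Impute z-scores at untyped variants from z-scores at typed variants.

    z_typed  : (t,)   z-scores observed at typed variants
    ld_typed : (t, t) LD correlation matrix among typed variants (reference panel)
    ld_cross : (u, t) LD correlations between untyped and typed variants
    ridge    : value added to the diagonal for numerical stability (assumed)
    """
    z_typed = np.asarray(z_typed, dtype=float)
    ld_typed = np.asarray(ld_typed, dtype=float)
    ld_cross = np.asarray(ld_cross, dtype=float)

    regularised = ld_typed + ridge * np.eye(len(z_typed))
    # weights = ld_cross @ inv(regularised), computed without an explicit inverse
    weights = np.linalg.solve(regularised, ld_cross.T).T
    z_imputed = weights @ z_typed
    # Rough per-variant quality score: c A^-1 c^T for each untyped variant
    quality = np.einsum("ut,ut->u", weights, ld_cross)
    return z_imputed, quality
```

    With two uncorrelated typed variants and one untyped variant in LD 0.5 with the first, the imputed z-score is half the first typed z-score and the quality score is 0.25 (when `ridge=0`).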

  5. Evaluation and application of summary statistic imputation to discover new height-associated loci

    PubMed Central

    2018-01-01

    As most of the heritability of complex traits is attributed to common and low frequency genetic variants, imputing them by combining genotyping chips and large sequenced reference panels is the most cost-effective approach to discover the genetic basis of these traits. Association summary statistics from genome-wide meta-analyses are available for hundreds of traits. Updating these to ever-increasing reference panels is very cumbersome as it requires reimputation of the genetic data, rerunning the association scan, and meta-analysing the results. A much more efficient method is to directly impute the summary statistics, termed summary statistics imputation, which we improved to accommodate variable sample size across SNVs. Its performance relative to genotype imputation and practical utility has not yet been fully investigated. To this end, we compared the two approaches on real (genotyped and imputed) data from 120K samples from the UK Biobank and show that genotype imputation boasts a 3- to 5-fold lower root-mean-square error and better distinguishes true associations from null ones. We observed the largest differences in power for variants with low minor allele frequency and low imputation quality. For fixed false positive rates of 0.001, 0.01, 0.05, using summary statistics imputation yielded a decrease in statistical power by 9, 43 and 35%, respectively. To test its capacity to discover novel associations, we applied summary statistics imputation to the GIANT height meta-analysis summary statistics covering HapMap variants, and identified 34 novel loci, 19 of which replicated using data in the UK Biobank. Additionally, we successfully replicated 55 out of the 111 variants published in an exome chip study. Our study demonstrates that summary statistics imputation is a very efficient and cost-effective way to identify and fine-map trait-associated loci. Moreover, the ability to impute summary statistics is important for follow-up analyses, such as Mendelian randomisation or LD-score regression. PMID:29782485

  6. Genetic polymorphisms in the ESR1 gene and cerebral infarction risk: a meta-analysis.

    PubMed

    Gao, Hong-Hua; Gao, Lian-Bo; Wen, Jia-Mei

    2014-09-01

    A number of studies have documented that estrogen receptor α (ESR1) may play an important role in the development and progression of cerebral infarction, but many existing studies have yielded inconclusive results. This meta-analysis was performed to evaluate the relationships between ESR1 genetic polymorphisms and cerebral infarction risk. The PubMed, CISCOM, CINAHL, Web of Science, Google Scholar, EBSCO, Cochrane Library, and CBM databases were searched for relevant articles published before October 1, 2013, without any language restrictions. Meta-analysis was conducted using the STATA 12.0 software. Seven case-control studies were included with a total of 1471 patients with cerebral infarction and 4688 healthy control subjects. Two common single-nucleotide polymorphisms (SNPs) in the ESR1 gene (rs2234693 T>C and rs9340799 A>G) were assessed. Our meta-analysis results revealed that ESR1 genetic polymorphisms might increase the risk of cerebral infarction. Subgroup analysis by SNP type indicated that both rs2234693 and rs9340799 polymorphisms in the ESR1 gene were strongly associated with an increased risk of cerebral infarction. Further subgroup analysis by ethnicity showed significant associations between ESR1 genetic polymorphisms and increased risk of cerebral infarction among both Asians and Caucasians. In the stratified subgroup analysis by gender, the results suggested that ESR1 genetic polymorphisms were associated with an increased risk of cerebral infarction in the female population. However, there were no statistically significant associations between ESR1 genetic polymorphisms and cerebral infarction risk in the male population. Meta-regression analyses also confirmed that gender might be a main source of heterogeneity. Our findings indicate that ESR1 genetic polymorphisms may contribute to the development of cerebral infarction, especially in the female population.
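
    The abstract above pools case-control studies into a single risk estimate. The sketch below shows one standard way to do this, fixed-effect inverse-variance pooling of log odds ratios. The abstract does not say which model was fitted in STATA, so this is an assumption, and the function name and the 2x2 table layout are illustrative.

```python
import math

def pooled_odds_ratio(tables):
    """Fixed-effect inverse-variance pooled odds ratio with a 95% CI.

    tables: iterable of (a, b, c, d) counts per study, where
            a = carrier cases,    b = non-carrier cases,
            c = carrier controls, d = non-carrier controls.
    All counts are assumed to be non-zero.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for a, b, c, d in tables:
        log_or = math.log((a * d) / (b * c))
        variance = 1 / a + 1 / b + 1 / c + 1 / d  # Woolf variance of the log OR
        weight = 1 / variance
        weighted_sum += weight * log_or
        weight_total += weight
    pooled_log_or = weighted_sum / weight_total
    se = math.sqrt(1 / weight_total)
    return (
        math.exp(pooled_log_or),
        math.exp(pooled_log_or - 1.96 * se),
        math.exp(pooled_log_or + 1.96 * se),
    )
```

    A random-effects model would be the usual alternative when heterogeneity between studies is substantial, as the meta-regression result in the abstract suggests it may be.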

  7. New insights into the endophenotypic status of cognition in bipolar disorder: genetic modelling study of twins and siblings.

    PubMed

    Georgiades, Anna; Rijsdijk, Fruhling; Kane, Fergus; Rebollo-Mesa, Irene; Kalidindi, Sridevi; Schulze, Katja K; Stahl, Daniel; Walshe, Muriel; Sahakian, Barbara J; McDonald, Colm; Hall, Mei-Hua; Murray, Robin M; Kravariti, Eugenia

    2016-06-01

    Twin studies have lacked statistical power to apply advanced genetic modelling techniques to the search for cognitive endophenotypes for bipolar disorder. To quantify the shared genetic variability between bipolar disorder and cognitive measures. Structural equation modelling was performed on cognitive data collected from 331 twins/siblings of varying genetic relatedness, disease status and concordance for bipolar disorder. Using a parsimonious AE model, verbal episodic and spatial working memory showed statistically significant genetic correlations with bipolar disorder (rg = |0.23|-|0.27|), which lost statistical significance after covarying for affective symptoms. Using an ACE model, IQ and visual-spatial learning showed statistically significant genetic correlations with bipolar disorder (rg = |0.51|-|1.00|), which remained significant after covarying for affective symptoms. Verbal episodic and spatial working memory capture a modest fraction of the bipolar diathesis. IQ and visual-spatial learning may tap into genetic substrates of non-affective symptomatology in bipolar disorder. © The Royal College of Psychiatrists 2016.

  8. Evaluation of genetic diversity of Panicum turgidum Forssk from Saudi Arabia.

    PubMed

    Assaeed, Abdulaziz M; Al-Faifi, Sulieman A; Migdadi, Hussein M; El-Bana, Magdy I; Al Qarawi, Abdulaziz A; Khan, Mohammad Altaf

    2018-01-01

    The genetic diversity of 177 accessions of Panicum turgidum Forssk, representing ten populations collected from four geographical regions in Saudi Arabia, was analyzed using amplified fragment length polymorphism (AFLP) markers. A set of four primer-pairs with two/three selective nucleotides scored 836 AFLP amplified fragments (putative loci/genome landmarks), all of which were polymorphic. Populations collected from the southern region of the country showed the highest genetic diversity parameters, whereas those collected from the central regions showed the lowest values. Analysis of molecular variance (AMOVA) revealed that 78% of the genetic variability was attributable to differences within populations. Pairwise values for population differentiation and genetic structure were statistically significant for all variances. The UPGMA dendrogram, validated by principal coordinate analysis-grouped accessions, corresponded to the geographical origin of the accessions. Mantel's test showed that there was a significant correlation between the genetic and geographical distances (r = 0.35, P < 0.04). In summary, the AFLP assay demonstrated the existence of substantial genetic variation in P. turgidum. The relationship between the genetic diversity and geographical source of P. turgidum populations of Saudi Arabia, as revealed through this comprehensive study, will enable effective resource management and restoration of new areas without compromising adaptation and genetic diversity.
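
    The Mantel test mentioned above correlates two distance matrices and assesses significance by permutation. The following is a minimal sketch of the standard procedure, not the software used in the study. The function name, the one-sided alternative and the default number of permutations are assumptions.

```python
import numpy as np

def mantel_test(dist_a, dist_b, n_perm=999, seed=0):
    """Mantel test between two square, symmetric distance matrices.

    Returns the observed Pearson correlation between the upper triangles
    and a one-sided permutation p-value for positive association.
    """
    dist_a = np.asarray(dist_a, dtype=float)
    dist_b = np.asarray(dist_b, dtype=float)
    n = dist_a.shape[0]
    upper = np.triu_indices(n, k=1)
    a_flat = dist_a[upper]
    r_obs = np.corrcoef(a_flat, dist_b[upper])[0, 1]

    rng = np.random.default_rng(seed)
    at_least_as_large = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Permute rows and columns of one matrix together
        permuted = dist_b[np.ix_(perm, perm)]
        if np.corrcoef(a_flat, permuted[upper])[0, 1] >= r_obs:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (n_perm + 1)
    return r_obs, p_value
```

    Rows and columns are permuted jointly because the entries of a distance matrix are not independent observations.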

  9. A hybrid correlation analysis with application to imaging genetics

    NASA Astrophysics Data System (ADS)

    Hu, Wenxing; Fang, Jian; Calhoun, Vince D.; Wang, Yu-Ping

    2018-03-01

    Investigating the association between brain regions and genes continues to be a challenging topic in imaging genetics. Current brain region of interest (ROI)-gene association studies normally reduce data dimension by averaging the value of voxels in each ROI. This averaging may lead to a loss of information due to the existence of functional sub-regions. Pearson correlation is widely used for association analysis. However, it only detects linear correlation whereas nonlinear correlation may exist among ROIs. In this work, we introduced distance correlation to ROI-gene association analysis, which can detect both linear and nonlinear correlations and overcome the limitation of averaging operations by taking advantage of the information at each voxel. Nevertheless, distance correlation usually has a much lower value than Pearson correlation. To address this problem, we proposed a hybrid correlation analysis approach, by applying canonical correlation analysis (CCA) to the distance covariance matrix instead of directly computing distance correlation. Incorporating CCA into distance correlation approach may be more suitable for complex disease study because it can detect highly associated pairs of ROI and gene groups, and may improve the distance correlation level and statistical power. In addition, we developed a novel nonlinear CCA, called distance kernel CCA, which seeks the optimal combination of features with the most significant dependence. This approach was applied to imaging genetic data from the Philadelphia Neurodevelopmental Cohort (PNC). Experiments showed that our hybrid approach produced more consistent results than conventional CCA across resampling and both the correlation and statistical significance were increased compared to distance correlation analysis. Further gene enrichment analysis and region of interest (ROI) analysis confirmed the associations of the identified genes with brain ROIs. Therefore, our approach provides a powerful tool for finding the correlation between brain imaging and genomic data.
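
    The abstract above contrasts Pearson correlation with distance correlation. The sketch below computes the standard sample distance correlation from double-centred distance matrices. It illustrates only that building block, not the hybrid CCA method or distance kernel CCA, and the function names are illustrative.

```python
import numpy as np

def _double_centred_distances(x):
    """Pairwise Euclidean distances with row, column and grand means removed."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation between two samples of equal length."""
    a = _double_centred_distances(x)
    b = _double_centred_distances(y)
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))
```

    For `x` evenly spaced on [-1, 1] and `y = x**2`, the Pearson correlation is essentially zero while the distance correlation is clearly positive, which is the nonlinear dependence the abstract refers to.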

  10. Determinism and mass-media portrayals of genetics.

    PubMed Central

    Condit, C M; Ofulue, N; Sheedy, K M

    1998-01-01

    Scholars have expressed concern that the introduction of substantial coverage of "medical genetics" in the mass media during the past 2 decades represents an increase in biological determinism in public discourse. To test this contention, we analyzed the contents of a randomly selected, structured sample of American public newspapers (n=250) and magazines (n=722) published during 1919-95. Three coders, using three measures, all with intercoder reliability >85%, were employed. Results indicate that the introduction of the discourse of medical genetics is correlated with both a statistically significant decrease in the degree to which articles attribute human characteristics to genetic causes (P<.001) and a statistically significant increase in the differentiation of attributions to genetic and other causes among various conditions or outcomes (P<.016). There has been no statistically significant change in the relative proportions of physical phenomena attributed to genetic causes, but there has been a statistically significant decrease in the number of articles assigning genetic causes to mental (P<.002) and behavioral (P<.000) characteristics. These results suggest that the current discourse of medical genetics is not accurately described as more biologically deterministic than its antecedents. PMID:9529342

  11. Analysis of biochemical genetic data on Jewish populations: II. Results and interpretations of heterogeneity indices and distance measures with respect to standards.

    PubMed

    Karlin, S; Kenett, R; Bonné-Tamir, B

    1979-05-01

    A nonparametric statistical methodology is used for the analysis of biochemical frequency data observed on a series of nine Jewish and six non-Jewish populations. Two categories of statistics are used: heterogeneity indices and various distance measures with respect to a standard. The latter are more discriminating in exploiting historical, geographical and culturally relevant information. A number of partial orderings and distance relationships among the populations are determined. Our concern in this study is to analyze similarities and differences among the Jewish populations, in terms of the gene frequency distributions for a number of genetic markers. Typical questions discussed are as follows: These Jewish populations differ in certain morphological and anthropometric traits. Are there corresponding differences in biochemical genetic constitution? How can we assess the extent of heterogeneity between and within groupings? Which class of markers (blood typings or protein loci) discriminates better among the separate populations? The results are quite surprising. For example, we found the Ashkenazi, Sephardi and Iraqi Jewish populations to be consistently close in genetic constitution and distant from all the other populations, namely the Yemenite and Cochin Jews, the Arabs, and the non-Jewish German and Russian populations. We found the Polish Jewish community the most heterogeneous among all Jewish populations. The blood loci discriminate better than the protein loci. A number of possible interpretations and hypotheses for these and other results are offered. The method devised for this analysis should prove useful in studying similarities and differences for other groups of populations for which substantial biochemical polymorphic data are available.

  12. Genetic Variation in the Raptor Gene Is Associated With Overweight But Not Hypertension in American Men of Japanese Ancestry

    PubMed Central

    Carnes, Bruce A.; Chen, Randi; Donlon, Timothy A.; He, Qimei; Grove, John S.; Masaki, Kamal H.; Elliott, Ayako; Willcox, Donald C.; Allsopp, Richard; Willcox, Bradley J.

    2015-01-01

    BACKGROUND The mechanistic target of rapamycin (mTOR) pathway is pivotal for cell growth. Regulatory associated protein of mTOR complex I (Raptor) is a unique component of this pro-growth complex. The present study tested whether variation across the raptor gene (RPTOR) is associated with overweight and hypertension. METHODS We tested 61 common (allele frequency ≥ 0.1) tagging single nucleotide polymorphisms (SNPs) that captured most of the genetic variation across RPTOR in 374 subjects of normal lifespan and 439 subjects with a lifespan exceeding 95 years for association with overweight/obesity, essential hypertension, and isolated systolic hypertension. Subjects were drawn from the Honolulu Heart Program, a homogeneous population of American men of Japanese ancestry, well characterized for phenotypes relevant to conditions of aging. Hypertension status was ascertained when subjects were 45–68 years old. Statistical evaluation involved contingency table analysis, logistic regression, and the powerful method of recursive partitioning. RESULTS After analysis of RPTOR genotypes by each statistical approach, we found no significant association between genetic variation in RPTOR and either essential hypertension or isolated systolic hypertension. Models generated by recursive partitioning analysis showed that RPTOR SNPs significantly enhanced the ability of the model to accurately assign individuals to either the overweight/obese or the non-overweight/obese groups (P = 0.008 by 1-tailed Z test). CONCLUSION Common genetic variation in RPTOR is associated with overweight/obesity but does not discernibly contribute to either essential hypertension or isolated systolic hypertension in the population studied. PMID:25249372

  13. Data mining and computationally intensive methods: summary of Group 7 contributions to Genetic Analysis Workshop 13.

    PubMed

    Costello, Tracy J; Falk, Catherine T; Ye, Kenny Q

    2003-01-01

    The Framingham Heart Study data, as well as a related simulated data set, were generously provided to the participants of the Genetic Analysis Workshop 13 in order that newly developed and emerging statistical methodologies could be tested on that well-characterized data set. The impetus driving the development of novel methods is to elucidate the contributions of genes, environment, and interactions between and among them, as well as to allow comparison between and validation of methods. The seven papers that comprise this group used data-mining methodologies (tree-based methods, neural networks, discriminant analysis, and Bayesian variable selection) in an attempt to identify the underlying genetics of cardiovascular disease and related traits in the presence of environmental and genetic covariates. Data-mining strategies are gaining popularity because they are extremely flexible and may have greater efficiency and potential in identifying the factors involved in complex disorders. While the methods grouped together here constitute a diverse collection, some papers asked similar questions with very different methods, while others used the same underlying methodology to ask very different questions. This paper briefly describes the data-mining methodologies applied to the Genetic Analysis Workshop 13 data sets and the results of those investigations. Copyright 2003 Wiley-Liss, Inc.

  14. The power and robustness of maximum LOD score statistics.

    PubMed

    Yoo, Y J; Mendell, N R

    2008-07-01

    The maximum LOD score statistic is extremely powerful for gene mapping when calculated using the correct genetic parameter value. When the mode of genetic transmission is unknown, the maximum of the LOD scores obtained using several genetic parameter values is reported. This latter statistic requires a higher critical value than the maximum LOD score statistic calculated from a single genetic parameter value. In this paper, we compare the power of maximum LOD scores based on three fixed sets of genetic parameter values with the power of the LOD score obtained after maximizing over the entire range of genetic parameter values. We simulate family data under nine generating models. For generating models with non-zero phenocopy rates, LOD scores maximized over the entire range of genetic parameters yielded greater power than maximum LOD scores for fixed sets of parameter values with zero phenocopy rates. No maximum LOD score was consistently more powerful than the others for generating models with a zero phenocopy rate. The power loss of the LOD score maximized over the entire range of genetic parameters, relative to the maximum LOD score calculated using the correct genetic parameter value, appeared to be robust to the generating models.
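
    To make the idea of a LOD score maximised over a parameter concrete, here is a deliberately simplified sketch. It assumes phase-known, fully informative meioses and maximises over the recombination fraction only, whereas the paper maximises over genetic model parameters such as penetrances and phenocopy rates. The function names and the grid are illustrative.

```python
import math

def lod_score(theta, recombinants, meioses):
    """Two-point LOD score for phase-known meioses, with 0 < theta <= 0.5.

    LOD(theta) = log10( L(theta) / L(0.5) ), where
    L(theta) = theta**k * (1 - theta)**(n - k).
    """
    k, n = recombinants, meioses
    return (
        k * math.log10(theta)
        + (n - k) * math.log10(1 - theta)
        - n * math.log10(0.5)
    )

def max_lod(recombinants, meioses, grid=None):
    """Return (theta, LOD) at the grid point that maximises the LOD score."""
    if grid is None:
        grid = [i / 100 for i in range(1, 51)]  # 0.01, 0.02, ..., 0.50
    best_theta = max(grid, key=lambda t: lod_score(t, recombinants, meioses))
    return best_theta, lod_score(best_theta, recombinants, meioses)
```

    Because the reported value is a maximum over many candidate parameter values, it needs a higher critical value than a LOD score evaluated at a single prespecified value, which is the trade-off the abstract examines.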

  15. The relationship between the number of loci and the statistical support for the topology of UPGMA trees obtained from genetic distance data.

    PubMed

    Highton, R

    1993-12-01

    An analysis of the relationship between the number of loci utilized in an electrophoretic study of genetic relationships and the statistical support for the topology of UPGMA trees is reported for two published data sets. These are Highton and Larson (Syst. Zool. 28:579-599, 1979), an analysis of the relationships of 28 species of plethodonine salamanders, and Hedges (Syst. Zool. 35:1-21, 1986), a similar study of 30 taxa of Holarctic hylid frogs. The statistical support for the topology at each node in UPGMA trees was determined by both the bootstrap and jackknife methods as the number of loci was increased. The results show that the bootstrap and jackknife probabilities supporting the topology at some nodes of UPGMA trees increase as the number of loci utilized in a study is increased, as expected for nodes that have groupings that reflect phylogenetic relationships. The pattern of increase varies and is especially rapid in the case of groups with no close relatives. At nodes that likely do not represent correct phylogenetic relationships, the bootstrap probabilities do not increase and often decline with the addition of more loci.

  16. DnaSAM: Software to perform neutrality testing for large datasets with complex null models.

    PubMed

    Eckert, Andrew J; Liechty, John D; Tearse, Brandon R; Pande, Barnaly; Neale, David B

    2010-05-01

    Patterns of DNA sequence polymorphisms can be used to understand the processes of demography and adaptation within natural populations. High-throughput generation of DNA sequence data has historically been the bottleneck with respect to data processing and experimental inference. Advances in marker technologies have largely solved this problem. Currently, the limiting step is computational, with most molecular population genetic software allowing a gene-by-gene analysis through a graphical user interface. An easy-to-use analysis program that allows both high-throughput processing of multiple sequence alignments and the flexibility to simulate data under complex demographic scenarios is currently lacking. We introduce a new program, named DnaSAM, which allows high-throughput estimation of DNA sequence diversity and neutrality statistics from experimental data along with the ability to test those statistics via Monte Carlo coalescent simulations. These simulations are conducted using the ms program, which is able to incorporate several genetic parameters (e.g. recombination) and demographic scenarios (e.g. population bottlenecks). The output is a set of diversity and neutrality statistics with associated probability values under a user-specified null model, stored in an easy-to-manipulate text file. © 2009 Blackwell Publishing Ltd.

  17. A Statistical Approach for Testing Cross-Phenotype Effects of Rare Variants

    PubMed Central

    Broadaway, K. Alaine; Cutler, David J.; Duncan, Richard; Moore, Jacob L.; Ware, Erin B.; Jhun, Min A.; Bielak, Lawrence F.; Zhao, Wei; Smith, Jennifer A.; Peyser, Patricia A.; Kardia, Sharon L.R.; Ghosh, Debashis; Epstein, Michael P.

    2016-01-01

    Increasing empirical evidence suggests that many genetic variants influence multiple distinct phenotypes. When cross-phenotype effects exist, multivariate association methods that consider pleiotropy are often more powerful than univariate methods that model each phenotype separately. Although several statistical approaches exist for testing cross-phenotype effects for common variants, there is a lack of similar tests for gene-based analysis of rare variants. In order to fill this important gap, we introduce a statistical method for cross-phenotype analysis of rare variants using a nonparametric distance-covariance approach that compares similarity in multivariate phenotypes to similarity in rare-variant genotypes across a gene. The approach can accommodate both binary and continuous phenotypes and further can adjust for covariates. Our approach yields a closed-form test whose significance can be evaluated analytically, thereby improving computational efficiency and permitting application on a genome-wide scale. We use simulated data to demonstrate that our method, which we refer to as the Gene Association with Multiple Traits (GAMuT) test, provides increased power over competing approaches. We also illustrate our approach using exome-chip data from the Genetic Epidemiology Network of Arteriopathy. PMID:26942286

  18. Genetic polymorphisms, forensic efficiency and phylogenetic analysis of 15 autosomal STR loci in the Kazak population of Ili Kazak Autonomous Prefecture, northwestern China.

    PubMed

    Feng, Chunmei; Wang, Xin; Wang, Xiaolong; Yu, Hao; Zhang, Guohua

    2018-03-01

    We investigated the frequencies of 15 autosomal STR loci in the Kazak population of the Ili Kazak Autonomous Prefecture with the aim of expanding the available population information in human genetic databases and for forensic DNA analysis. Genetic polymorphisms of 15 autosomal short tandem repeat (STR) loci were analysed in 456 individuals of the Kazak population from Ili Kazak Autonomous Prefecture, northwestern China. A total of 173 alleles at 15 autosomal STR loci were found; the allele frequencies ranged from 0.0011 to 0.5022. The combined power of discrimination and the combined power of exclusion for the 15 STR loci were 0.999 999 999 85 and 0.999 998 800 65, respectively. In addition, phylogenetic analysis involving the Ili Uygur population and other relevant populations was carried out. A neighbour-joining tree and multidimensional scaling plot were generated based on Nei's standard genetic distance. Results of the population comparison indicated that the Ili Uygur population was most closely related genetically to the Uygur populations from other regions in China. These findings are consistent with the historical and geographic backgrounds of these populations.
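
    The combined statistics quoted above follow from per-locus values under the usual assumption that loci are independent. The sketch below uses the standard definitions (per-locus power of discrimination as one minus the sum of squared genotype frequencies, and a combined value as one minus the product of per-locus complements). The function names and the genotype string encoding are illustrative, and the study's own software may differ.

```python
import math
from collections import Counter

def power_of_discrimination(genotypes):
    """Per-locus power of discrimination: 1 - sum of squared genotype frequencies."""
    counts = Counter(genotypes)
    n = len(genotypes)
    return 1 - sum((count / n) ** 2 for count in counts.values())

def combined_power(per_locus_values):
    """Combine per-locus powers (discrimination or exclusion), assuming independent loci."""
    return 1 - math.prod(1 - value for value in per_locus_values)
```

    For example, two independent loci that each discriminate with probability 0.5 give a combined power of 0.75, which is why 15 moderately informative loci can reach values so close to 1.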

  19. Senior Computational Scientist | Center for Cancer Research

    Cancer.gov

    The Basic Science Program (BSP) pursues independent, multidisciplinary research in basic and applied molecular biology, immunology, retrovirology, cancer biology, and human genetics. Research efforts and support are an integral part of the Center for Cancer Research (CCR) at the Frederick National Laboratory for Cancer Research (FNLCR). The Cancer & Inflammation Program (CIP), Basic Science Program, HLA Immunogenetics Section, under the leadership of Dr. Mary Carrington, studies the influence of human leukocyte antigens (HLA) and specific KIR/HLA genotypes on risk of and outcomes to infection, cancer, autoimmune disease, and maternal-fetal disease. Recent studies have focused on the impact of HLA gene expression in disease, the molecular mechanism regulating expression levels, and the functional basis for the effect of differential expression on disease outcome. The lab's further focus is on the genetic basis for resistance/susceptibility to disease conferred by immunogenetic variation. KEY ROLES/RESPONSIBILITIES: The Senior Computational Scientist will provide research support to the CIP-BSP-HLA Immunogenetics Section performing bio-statistical design, analysis and reporting of research projects conducted in the lab. This individual will be involved in the implementation of statistical models and data preparation. Successful candidate should have 5 or more years of competent, innovative biostatistics/bioinformatics research experience, beyond doctoral training. Considerable experience with statistical software, such as SAS, R and S-Plus. Sound knowledge and demonstrated experience of theoretical and applied statistics. Write program code to analyze data using statistical analysis software. Contribute to the interpretation and publication of research results.

  20. Genetic Alterations in Familial Breast Cancer: Mapping and Cloning Genes Other Than BRCA1

    DTIC Science & Technology

    1997-09-01

    predisposition to breast cancer in families. The gene PTEN was successfully cloned by this project, and simultaneously by others (for a different ...with germline translocations and breast cancer for the identification of tumor suppressor genes. ...would limit the statistical power of linkage analysis. Therefore, we decided to integrate linkage analysis with the analysis of germline chromosomal

  1. Spatio-temporal Genetic Structuring of Leishmania major in Tunisia by Microsatellite Analysis

    PubMed Central

    Harrabi, Myriam; Bettaieb, Jihène; Ghawar, Wissem; Toumi, Amine; Zaâtour, Amor; Yazidi, Rihab; Chaâbane, Sana; Chalghaf, Bilel; Hide, Mallorie; Bañuls, Anne-Laure; Ben Salah, Afif

    2015-01-01

    In Tunisia, cases of zoonotic cutaneous leishmaniasis caused by Leishmania major are increasing and spreading from the south-west to new areas in the center. To improve the current knowledge on L. major evolution and population dynamics, we performed multi-locus microsatellite typing of human isolates from Tunisian governorates where the disease is endemic (Gafsa, Kairouan and Sidi Bouzid governorates) and collected during two periods: 1991–1992 and 2008–2012. Analysis (F-statistics and Bayesian model-based approach) of the genotyping results of isolates collected in Sidi Bouzid in 1991–1992 and 2008–2012 shows that, over two decades, in the same area, Leishmania parasites evolved by generating genetically differentiated populations. The genetic patterns of 2008–2012 isolates from the three governorates indicate that L. major populations did not spread gradually from the south to the center of Tunisia, according to a geographical gradient, suggesting that human activities might be the source of the disease expansion. The genotype analysis also suggests previous (Bayesian model-based approach) and current (F-statistics) flows of genotypes between governorates and districts. Human activities as well as reservoir dynamics and the effects of environmental changes could explain how the disease progresses. This study provides new insights into the evolution and spread of L. major in Tunisia that might improve our understanding of the parasite flow between geographically and temporally distinct populations. PMID:26302440

  2. Genetic overlap between type 2 diabetes and major depressive disorder identified by bioinformatics analysis.

    PubMed

    Ji, Hong-Fang; Zhuang, Qi-Shuai; Shen, Liang

    2016-04-05

    Our study investigated the shared genetic etiology underlying type 2 diabetes (T2D) and major depressive disorder (MDD) by analyzing summary statistics from large-scale genome-wide association studies. A total of 496 shared SNPs associated with both T2D and MDD were identified at p-value ≤ 1.0E-07. Functional enrichment analysis showed that the enriched pathways pertained to immune responses (Fc gamma R-mediated phagocytosis, T cell and B cell receptor signaling), cell signaling (MAPK, Wnt signaling), lipid metabolism, and cancer-associated pathways. The findings will have potential implications for future interventional studies of the two diseases.
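
    The overlap step described above, selecting SNPs that pass the same p-value threshold in both GWAS, can be sketched as a simple filter over two summary-statistic tables. The dictionary layout, the function name and the SNP identifiers in the usage note are hypothetical placeholders, not data from the study.

```python
def shared_snps(pvalues_a, pvalues_b, threshold=1.0e-7):
    """Return SNP identifiers with p-value <= threshold in both studies.

    pvalues_a, pvalues_b: dicts mapping SNP identifier -> association p-value.
    """
    common_ids = pvalues_a.keys() & pvalues_b.keys()
    return sorted(
        snp
        for snp in common_ids
        if pvalues_a[snp] <= threshold and pvalues_b[snp] <= threshold
    )
```

    With `{"snp_1": 1e-9, "snp_2": 1e-3}` for one trait and `{"snp_1": 5e-8, "snp_2": 1e-10, "snp_3": 1e-12}` for the other, only `snp_1` passes in both.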

  3. 6C.04: INTEGRATED SNP ANALYSIS AND METABOLOMIC PROFILES OF METABOLIC SYNDROME.

    PubMed

    Marrachelli, V; Monleon, D; Morales, J M; Rentero, P; Martínez, F; Chaves, F J; Martin-Escudero, J C; Redon, J

    2015-06-01

    Metabolic syndrome (MS) has become a health and financial burden worldwide. Susceptibility of genetically determined metabotype of MS has not yet been investigated. We aimed to identify a distinctive metabolic profile of blood serum that might correlate with early detection of the development of MS associated with genetic polymorphism. We applied high resolution NMR spectroscopy to profile blood serum from patients without MS (n = 945) or with MS (n = 291). Principal component analysis (PCA) and projection to latent structures for discriminant analysis (PLS-DA) were applied to NMR spectral datasets. Results were cross-validated using the Venetian Blinds approach. Additionally, five SNPs previously associated with MS were genotyped with SNPlex and tested for associations between the metabolic profiles and the genetic variants. Statistical analysis was performed using in-house MATLAB scripts and the PLS Toolbox statistical multivariate analysis library. Our analysis provided a PLS-DA Metabolic Syndrome discrimination model based on NMR metabolic profile (AUC = 0.86) with 84% sensitivity and 72% specificity. The model identified 11 metabolites differentially regulated in patients with MS. Among others, fatty acids, glucose, alanine, hydroxyisovalerate, acetone, trimethylamine, 2-phenylpropionate, isobutyrate and valine significantly contributed to the model. The combined analysis of metabolomics and SNP data revealed an association between the metabolic profile of MS and polymorphisms in genes involved in adiposity regulation and fatty acid metabolism: rs2272903_TT (TFAP2B), rs3803_TT (GATA2), rs174589_CC (FADS2) and rs174577_AA (FADS2). In addition, individuals with the rs2272903-TT genotype seem to develop MS earlier than the general population. Our study provides new insights on the metabolic alterations associated with a MS high-risk genotype. These results could help in future development of risk assessment and predictive models for subclinical cardiovascular disease.
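
    The abstract mentions Venetian Blinds cross-validation without describing it. As a rough sketch of how such folds are commonly built (the function name and the number of splits are assumptions; the study's actual settings are not stated), fold j holds out every k-th sample starting at sample j:

```python
def venetian_blind_folds(n_samples, n_splits):
    """Return a list of (train_indices, test_indices) pairs.

    Fold j holds out samples j, j + n_splits, j + 2 * n_splits, ...
    so each test set is interleaved through the data like the slats of a blind.
    """
    if not 1 < n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    folds = []
    for j in range(n_splits):
        test = list(range(j, n_samples, n_splits))
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        folds.append((train, test))
    return folds


# Hypothetical example: 10 samples, 3 splits.
folds = venetian_blind_folds(10, 3)
```

    Because the held-out samples are interleaved, this scheme assumes the sample order carries no structure that would leak between neighbouring rows.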

  4. Meta-analysis of quantitative pleiotropic traits for next-generation sequencing with multivariate functional linear models

    PubMed Central

    Chiu, Chi-yang; Jung, Jeesun; Chen, Wei; Weeks, Daniel E; Ren, Haobo; Boehnke, Michael; Amos, Christopher I; Liu, Aiyi; Mills, James L; Ting Lee, Mei-ling; Xiong, Momiao; Fan, Ruzong

    2017-01-01

    To analyze next-generation sequencing data, multivariate functional linear models are developed for a meta-analysis of multiple studies to connect genetic variant data to multiple quantitative traits adjusting for covariates. The goal is to take advantage of both meta-analysis and pleiotropic analysis in order to improve power and to carry out a unified association analysis of multiple studies and multiple traits of complex disorders. Three types of approximate F-distributions based on Pillai–Bartlett trace, Hotelling–Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants. Simulation analysis is performed to evaluate false-positive rates and power of the proposed tests. The proposed methods are applied to analyze lipid traits in eight European cohorts. It is shown that it is more advantageous to perform multivariate analysis than univariate analysis in general, and it is more advantageous to perform meta-analysis of multiple studies instead of analyzing the individual studies separately. The proposed models require individual observations. The value of the current paper can be seen at least for two reasons: (a) the proposed methods can be applied to studies that have individual genotype data; (b) the proposed methods can be used as a criterion for future work that uses summary statistics to build test statistics to meta-analyze the data. PMID:28000696
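
    The three multivariate statistics named in the abstract have standard textbook definitions in terms of the eigenvalues of inv(E) @ H, where H is the hypothesis and E the error sums-of-squares-and-cross-products matrix. The sketch below only illustrates those definitions; it is not the authors' functional linear model, and the function name and example eigenvalues are hypothetical.

```python
import math


def manova_statistics(eigenvalues):
    """Compute three classical MANOVA statistics.

    `eigenvalues` are the non-negative eigenvalues of inv(E) @ H.
    """
    if any(lam < 0 for lam in eigenvalues):
        raise ValueError("eigenvalues must be non-negative")
    wilks_lambda = math.prod(1.0 / (1.0 + lam) for lam in eigenvalues)
    pillai_bartlett = sum(lam / (1.0 + lam) for lam in eigenvalues)
    hotelling_lawley = sum(eigenvalues)
    return {
        "wilks_lambda": wilks_lambda,
        "pillai_bartlett": pillai_bartlett,
        "hotelling_lawley": hotelling_lawley,
    }


# Hypothetical eigenvalues, chosen only so the arithmetic is easy to follow.
stats = manova_statistics([1.0, 0.25])
```

    Each statistic is then converted to an approximate F value; those conversion formulas depend on the design dimensions and are not reproduced here.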

  5. Meta-analysis of quantitative pleiotropic traits for next-generation sequencing with multivariate functional linear models.

    PubMed

    Chiu, Chi-Yang; Jung, Jeesun; Chen, Wei; Weeks, Daniel E; Ren, Haobo; Boehnke, Michael; Amos, Christopher I; Liu, Aiyi; Mills, James L; Ting Lee, Mei-Ling; Xiong, Momiao; Fan, Ruzong

    2017-02-01

    To analyze next-generation sequencing data, multivariate functional linear models are developed for a meta-analysis of multiple studies to connect genetic variant data to multiple quantitative traits adjusting for covariates. The goal is to take advantage of both meta-analysis and pleiotropic analysis in order to improve power and to carry out a unified association analysis of multiple studies and multiple traits of complex disorders. Three types of approximate F-distributions based on Pillai-Bartlett trace, Hotelling-Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants. Simulation analysis is performed to evaluate false-positive rates and power of the proposed tests. The proposed methods are applied to analyze lipid traits in eight European cohorts. It is shown that it is more advantageous to perform multivariate analysis than univariate analysis in general, and it is more advantageous to perform meta-analysis of multiple studies instead of analyzing the individual studies separately. The proposed models require individual observations. The value of the current paper can be seen at least for two reasons: (a) the proposed methods can be applied to studies that have individual genotype data; (b) the proposed methods can be used as a criterion for future work that uses summary statistics to build test statistics to meta-analyze the data.

  6. LSAT Dimensionality Analysis for the December 1991, June 1992, and October 1992 Administrations. Statistical Report. LSAC Research Report Series.

    ERIC Educational Resources Information Center

    Douglas, Jeff; Kim, Hae-Rim; Roussos, Louis; Stout, William; Zhang, Jinming

    An extensive nonparametric dimensionality analysis of latent structure was conducted on three forms of the Law School Admission Test (LSAT) (December 1991, June 1992, and October 1992) using the DIMTEST model in confirmatory analyses and using DIMTEST, FAC, DETECT, HCA, PROX, and a genetic algorithm in exploratory analyses. Results indicate that…

  7. Untargeted Metabolic Quantitative Trait Loci Analyses Reveal a Relationship between Primary Metabolism and Potato Tuber Quality

    PubMed Central

    Carreno-Quintero, Natalia; Acharjee, Animesh; Maliepaard, Chris; Bachem, Christian W.B.; Mumm, Roland; Bouwmeester, Harro; Visser, Richard G.F.; Keurentjes, Joost J.B.

    2012-01-01

    Recent advances in -omics technologies such as transcriptomics, metabolomics, and proteomics along with genotypic profiling have permitted dissection of the genetics of complex traits represented by molecular phenotypes in nonmodel species. To identify the genetic factors underlying variation in primary metabolism in potato (Solanum tuberosum), we have profiled primary metabolite content in a diploid potato mapping population, derived from crosses between S. tuberosum and wild relatives, using gas chromatography-time of flight-mass spectrometry. In total, 139 polar metabolites were detected, of which we identified metabolite quantitative trait loci for approximately 72% of the detected compounds. In order to obtain an insight into the relationships between metabolic traits and classical phenotypic traits, we also analyzed statistical associations between them. The combined analysis of genetic information through quantitative trait locus coincidence and the application of statistical learning methods provide information on putative indicators associated with the alterations in metabolic networks that affect complex phenotypic traits. PMID:22223596

  8. Forensic molecular genetic diversity analysis of Chinese Hui ethnic group based on a novel STR panel.

    PubMed

    Fang, Yating; Guo, Yuxin; Xie, Tong; Jin, Xiaoye; Lan, Qiong; Zhou, Yongsong; Zhu, Bofeng

    2018-03-26

    In the present study, the genetic polymorphisms of 22 autosomal short tandem repeat (STR) loci were analyzed in 496 unrelated Chinese Xinjiang Hui individuals. These autosomal STR loci were multiplex amplified and genotyped based on a novel STR panel. There were 246 observed alleles, with allele frequencies ranging from 0.0010 to 0.3609. All polymorphic information content values were higher than 0.7. The combined power of discrimination and the combined probability of exclusion were 0.999999999999999999999999999426766 and 0.999999999860491, respectively. Based on the analysis of molecular variance method, genetic differentiation analyses between the Xinjiang Hui and other reported groups were conducted at these 22 loci. The results indicated that there were no significant differences between the Hui group and the Northern Han group (including Han groups from Hebei, Henan, and Shaanxi provinces), but significant deviations from the Southern Han group (including those from Guangdong and Guangxi provinces) at 7 loci and from the Uygur group at 10 loci. To sum up, these 22 autosomal STR loci were highly polymorphic in the Xinjiang Hui group.
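
    Combined forensic parameters such as those quoted above are conventionally obtained as 1 minus the product of (1 - per-locus value), assuming the loci are independent. The sketch below illustrates that convention only; the per-locus values are hypothetical, since the abstract reports just the combined figures.

```python
from fractions import Fraction


def combined_probability(per_locus_values):
    """Return 1 - prod(1 - v_i) as an exact fraction.

    Works for the combined power of discrimination or the combined
    probability of exclusion, assuming independent loci.
    """
    remaining = Fraction(1)
    for value in per_locus_values:
        # Fraction(str(...)) keeps the decimal exactly as written.
        remaining *= 1 - Fraction(str(value))
    return 1 - remaining


# Hypothetical per-locus powers of discrimination for two loci.
combined = combined_probability([0.9, 0.9])
```

    Exact fractions are used because a combined value this close to 1 carries more significant digits than a double-precision float can represent.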

  9. Genetic variation and structure in remnant population of critically endangered Melicope zahlbruckneri

    USGS Publications Warehouse

    Raji, J. A.; Atkinson, Carter T.

    2016-01-01

    The distribution and amount of genetic variation within and between populations of plant species are important for their adaptability to future habitat changes and also critical for their restoration and overall management. This study was initiated to assess the genetic status of the remnant population of Melicope zahlbruckneri, a critically endangered species in Hawaii, and determine the extent of genetic variation and diversity in order to propose valuable conservation approaches. Estimated genetic structure of individuals based on molecular marker allele frequencies identified genetic groups with low overall differentiation but identified the most genetically diverse individuals within the population. Analysis of Amplified Fragment Length Polymorphic (AFLP) marker loci in the population based on Bayesian model and multivariate statistics classified the population into four subgroups. We inferred a mixed species population structure based on Bayesian clustering and frequency of unique alleles. The percentage of Polymorphic Fragment (PPF) ranged from 18.8 to 64.6% for all marker loci with an average of 54.9% within the population. Inclusion of all surviving M. zahlbruckneri trees in future restorative planting at new sites is suggested, and approaches for longer term maintenance of genetic variability are discussed. To our knowledge, this study represents the first report of molecular genetic analysis of the remaining population of M. zahlbruckneri and also illustrates the importance of genetic variability for conservation of a small endangered population.

  10. Safety Assessment of Food and Feed from GM Crops in Europe: Evaluating EFSA's Alternative Framework for the Rat 90-day Feeding Study.

    PubMed

    Hong, Bonnie; Du, Yingzhou; Mukerji, Pushkor; Roper, Jason M; Appenzeller, Laura M

    2017-07-12

    Regulatory-compliant rodent subchronic feeding studies are compulsory regardless of a hypothesis to test, according to recent EU legislation for the safety assessment of whole food/feed produced from genetically modified (GM) crops containing a single genetic transformation event (European Union Commission Implementing Regulation No. 503/2013). The Implementing Regulation refers to guidelines set forth by the European Food Safety Authority (EFSA) for the design, conduct, and analysis of rodent subchronic feeding studies. The set of EFSA recommendations was rigorously applied to a 90-day feeding study in Sprague-Dawley rats. After study completion, the appropriateness and applicability of these recommendations were assessed using a battery of statistical analysis approaches including both retrospective and prospective statistical power analyses as well as variance-covariance decomposition. In the interest of animal welfare considerations, alternative experimental designs were investigated and evaluated in the context of informing the health risk assessment of food/feed from GM crops.

  11. The genetics of East African populations: a Nilo-Saharan component in the African genetic landscape

    PubMed Central

    Dobon, Begoña; Hassan, Hisham Y.; Laayouni, Hafid; Luisi, Pierre; Ricaño-Ponce, Isis; Zhernakova, Alexandra; Wijmenga, Cisca; Tahir, Hanan; Comas, David; Netea, Mihai G.; Bertranpetit, Jaume

    2015-01-01

    East Africa is a strategic region to study human genetic diversity due to the presence of ethnically, linguistically, and geographically diverse populations. Here, we provide new insight into the genetic history of populations living in the Sudanese region of East Africa by analysing nine ethnic groups belonging to three African linguistic families: Niger-Kordofanian, Nilo-Saharan and Afro-Asiatic. A total of 500 individuals were genotyped for 200,000 single-nucleotide polymorphisms. Principal component analysis, clustering analysis using ADMIXTURE, FST statistics, and the three-population test were used to investigate the underlying genetic structure and ancestry of the different ethno-linguistic groups. Our analyses revealed a genetic component for Sudanese Nilo-Saharan speaking groups (Darfurians and part of Nuba populations) related to Nilotes of South Sudan, but not to other Sudanese populations or other sub-Saharan populations. Populations inhabiting the North of the region showed close genetic affinities with North Africa, with a component that could be remnant of North Africans before the migrations of Arabs from Arabia. In addition, we found very low genetic distances between populations in genes important for anti-malarial and anti-bacterial host defence, suggesting similar selective pressures on these genes and stressing the importance of considering functional pathways to understand the evolutionary history of populations. PMID:26017457
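
    For readers unfamiliar with FST, the sketch below computes Wright's basic definition, (H_T - H_S) / H_T, for a single biallelic SNP across populations of equal weight. This is an illustration only: the abstract does not say which FST estimator was used, and published analyses usually apply sample-size-corrected estimators. The function name and frequencies are hypothetical.

```python
def wright_fst(allele_freqs):
    """Wright's F_ST for one biallelic locus.

    `allele_freqs` holds the frequency of the same allele in each
    population; populations are weighted equally.
    """
    if len(allele_freqs) < 2:
        raise ValueError("need at least two populations")
    mean_p = sum(allele_freqs) / len(allele_freqs)
    h_total = 2.0 * mean_p * (1.0 - mean_p)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # locus is monomorphic everywhere, no differentiation
    h_within = sum(2.0 * p * (1.0 - p) for p in allele_freqs) / len(allele_freqs)
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in two populations.
fst = wright_fst([0.2, 0.8])
```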

  12. An analysis of the metabolic theory of the origin of the genetic code

    NASA Technical Reports Server (NTRS)

    Amirnovin, R.; Bada, J. L. (Principal Investigator)

    1997-01-01

    A computer program was used to test Wong's coevolution theory of the genetic code. The codon correlations between the codons of biosynthetically related amino acids in the universal genetic code and in randomly generated genetic codes were compared. It was determined that many codon correlations are also present within random genetic codes and that among the random codes there are always several which have many more correlations than that found in the universal code. Although the number of correlations depends on the choice of biosynthetically related amino acids, the probability of choosing a random genetic code with the same or greater number of codon correlations as the universal genetic code was found to vary from 0.1% to 34% (with respect to a fairly complete listing of related amino acids). Thus, Wong's theory that the genetic code arose by coevolution with the biosynthetic pathways of amino acids, based on codon correlations between biosynthetically related amino acids, is statistical in nature.

  13. Terminology, concepts, and models in genetic epidemiology.

    PubMed

    Teare, M Dawn; Koref, Mauro F Santibàñez

    2011-01-01

    Genetic epidemiology brings together approaches and techniques developed in mathematical genetics and statistics, medical genetics, quantitative genetics, and epidemiology. In the 1980s, the focus was on the mapping and identification of genes where defects had large effects at the individual level. More recently, statistical and experimental advances have made it possible to identify and characterise genes associated with small effects at the individual level. In this chapter, we provide a brief outline of the models, concepts, and terminology used in genetic epidemiology.

  14. Higher criticism approach to detect rare variants using whole genome sequencing data

    PubMed Central

    2014-01-01

    Because of low statistical power of single-variant tests for whole genome sequencing (WGS) data, the association test for variant groups is a key approach for genetic mapping. To address the features of sparse and weak genetic effects to be detected, the higher criticism (HC) approach has been proposed and theoretically has proven optimal for detecting sparse and weak genetic effects. Here we develop a strategy to apply the HC approach to WGS data that contains rare variants as the majority. By using Genetic Analysis Workshop 18 "dose" genetic data with simulated phenotypes, we assess the performance of HC under a variety of strategies for grouping variants and collapsing rare variants. The HC approach is compared with the minimal p-value method and the sequence kernel association test. The results show that the HC approach is preferred for detecting weak genetic effects. PMID:25519367
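
    The higher criticism statistic referred to above is usually written, for n sorted p-values p_(1) <= ... <= p_(n), as the maximum over i of sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))). The sketch below implements that general form; the search fraction alpha0 is an assumption, and the variant grouping and rare-variant collapsing strategies studied in the paper are not represented.

```python
import math


def higher_criticism(p_values, alpha0=0.5):
    """Higher criticism statistic over the smallest alpha0 fraction of p-values."""
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    sorted_p = sorted(p_values)
    limit = max(1, int(alpha0 * n))
    best = -math.inf
    for i, p in enumerate(sorted_p[:limit], start=1):
        if 0.0 < p < 1.0:  # skip degenerate values that would divide by zero
            score = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
            best = max(best, score)
    return best


# Hypothetical p-values: an evenly spaced "null" set, and the same set
# with one very small p-value standing in for a sparse signal.
null_p = [(i + 0.5) / 10 for i in range(10)]
signal_p = [0.0001] + null_p[1:]
hc_null = higher_criticism(null_p)
hc_signal = higher_criticism(signal_p)
```

    A single very small p-value is enough to raise the statistic sharply, which is why the approach suits sparse signals.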

  15. The role of AMH and its receptor SNP in the pathogenesis of PCOS.

    PubMed

    Wang, Fang; Niu, Wen-Bin; Kong, Hui-Juan; Guo, Yi-Hong; Sun, Ying-Pu

    2017-01-05

    The etiology of polycystic ovaries syndrome (PCOS) is unknown. Studies probing the role of genetic variants of anti-Mullerian hormone (AMH) and its type II receptor (AMHR2) in the pathogenesis of PCOS have yielded inconsistent results. Thus, we performed a systematic review and meta-analysis to determine the role of genetic variants of AMH/AMHR2 in the pathogenesis of PCOS. A systematic search of electronic databases was performed. Statistical analysis was performed using the Comprehensive Meta-Analysis software (Version 3). Pooled Odds Ratios (OR) (95% confidence intervals) were determined to assess the association between genetic variants of AMH/AMHR2 and PCOS. Five studies, involving a total of 2042 PCOS cases and 1071 controls, were included in the meta-analysis. Single nucleotide polymorphisms of AMH and AMHR2 did not appear to confer a heightened risk for PCOS (OR: 0.954, 95% CI: 0.848-1.073; P = 0.435; and OR: 1.074, 95% CI: 0.875-1.318; P = 0.494, respectively). In this study, genetic variants of AMH or AMHR2 were not found to be associated with a higher risk for PCOS.
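
    The reported P values can be checked approximately from the odds ratios and confidence intervals alone. The sketch below assumes the intervals are symmetric on the log scale and that a normal (Wald) test was used; the function name is hypothetical, and the software used in the study may compute P differently.

```python
import math


def z_and_p_from_or(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the Wald z statistic and two-sided P from an OR and its 95% CI."""
    log_or = math.log(odds_ratio)
    # The CI spans 2 * z_crit standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return z, p


# Values quoted in the abstract for AMH and AMHR2.
z_amh, p_amh = z_and_p_from_or(0.954, 0.848, 1.073)
z_amhr2, p_amhr2 = z_and_p_from_or(1.074, 0.875, 1.318)
```

    Under these assumptions both recovered P values land within rounding distance of the published 0.435 and 0.494.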

  16. Near-exact distributions for the block equicorrelation and equivariance likelihood ratio test statistic

    NASA Astrophysics Data System (ADS)

    Coelho, Carlos A.; Marques, Filipe J.

    2013-09-01

    In this paper the authors combine the equicorrelation and equivariance test introduced by Wilks [13] with the likelihood ratio test (l.r.t.) for independence of groups of variables to obtain the l.r.t. of block equicorrelation and equivariance. This test or its single block version may find applications in many areas as in psychology, education, medicine, genetics and they are important "in many tests of multivariate analysis, e.g. in MANOVA, Profile Analysis, Growth Curve analysis, etc" [12, 9]. By decomposing the overall hypothesis into the hypotheses of independence of groups of variables and the hypothesis of equicorrelation and equivariance we are able to obtain the expressions for the overall l.r.t. statistic and its moments. From these we obtain a suitable factorization of the characteristic function (c.f.) of the logarithm of the l.r.t. statistic, which enables us to develop highly manageable and precise near-exact distributions for the test statistic.

  17. The Genetics Concept Assessment: a new concept inventory for gauging student understanding of genetics.

    PubMed

    Smith, Michelle K; Wood, William B; Knight, Jennifer K

    2008-01-01

    We have designed, developed, and validated a 25-question Genetics Concept Assessment (GCA) to test achievement of nine broad learning goals in majors and nonmajors undergraduate genetics courses. Written in everyday language with minimal jargon, the GCA is intended for use as a pre- and posttest to measure student learning gains. The assessment was reviewed by genetics experts, validated by student interviews, and taken by >600 students at three institutions. Normalized learning gains on the GCA were positively correlated with averaged exam scores, suggesting that the GCA measures understanding of topics relevant to instructors. Statistical analysis of our results shows that differences in the item difficulty and item discrimination index values between different questions on pre- and posttests can be used to distinguish between concepts that are well or poorly learned during a course.
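
    The quantities named in the abstract have common textbook definitions, sketched below: normalized learning gain as (post - pre) / (100 - pre), item difficulty as the fraction of students answering correctly, and item discrimination as the difference in that fraction between the top- and bottom-scoring groups. The use of thirds for the groups and all names and data here are assumptions, not details taken from the abstract.

```python
def normalized_gain(pre_percent, post_percent):
    """Fraction of the possible improvement that was actually achieved."""
    if pre_percent >= 100:
        raise ValueError("pre-test score leaves no room for gain")
    return (post_percent - pre_percent) / (100.0 - pre_percent)


def item_difficulty(responses):
    """Fraction of students answering the item correctly (1 = correct, 0 = not)."""
    return sum(responses) / len(responses)


def discrimination_index(responses, total_scores, groups=3):
    """Item difficulty in the top group minus that in the bottom group."""
    size = len(responses) // groups
    if size == 0:
        raise ValueError("too few students for this many groups")
    # Rank students by total test score, highest first.
    order = sorted(range(len(responses)), key=lambda i: total_scores[i], reverse=True)
    top = [responses[i] for i in order[:size]]
    bottom = [responses[i] for i in order[-size:]]
    return item_difficulty(top) - item_difficulty(bottom)


# Hypothetical data for six students on one item.
item_responses = [1, 1, 1, 0, 0, 0]
test_totals = [20, 18, 15, 10, 8, 5]
gain = normalized_gain(40, 70)
difficulty = item_difficulty(item_responses)
discrimination = discrimination_index(item_responses, test_totals)
```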

  18. The Genetics Concept Assessment: A New Concept Inventory for Gauging Student Understanding of Genetics

    PubMed Central

    Wood, William B.; Knight, Jennifer K.

    2008-01-01

    We have designed, developed, and validated a 25-question Genetics Concept Assessment (GCA) to test achievement of nine broad learning goals in majors and nonmajors undergraduate genetics courses. Written in everyday language with minimal jargon, the GCA is intended for use as a pre- and posttest to measure student learning gains. The assessment was reviewed by genetics experts, validated by student interviews, and taken by >600 students at three institutions. Normalized learning gains on the GCA were positively correlated with averaged exam scores, suggesting that the GCA measures understanding of topics relevant to instructors. Statistical analysis of our results shows that differences in the item difficulty and item discrimination index values between different questions on pre- and posttests can be used to distinguish between concepts that are well or poorly learned during a course. PMID:19047428

  19. PTPN22 1858C > T polymorphism and susceptibility to systemic lupus erythematosus: a meta-analysis update.

    PubMed

    de Lima, Suelen Cristina; Adelino, José Eduardo; Crovella, Sergio; de Azevedo Silva, Jaqueline; Sandrin-Garcia, Paula

    2017-11-01

    Studies performed in the past years showed PTPN22 1858 C > T (rs2476601) polymorphism as associated with systemic lupus erythematosus susceptibility, although conflicting findings are still found. In this context, a powerful statistical study, such as meta-analysis, is necessary to establish a consensus. The aim of this study was to evaluate association studies between the PTPN22 1858 C > T polymorphism and SLE by a meta-analysis update, including three studies published in the last three years. A total of 3868 SLE patients and 7458 healthy individuals were considered herein, encompassing 19 studies from Asian, American, European and Latin ethnic groups. Odds ratios (OR) were calculated for allelic, dominant and recessive genetic models. Statistically significant association was found between the PTPN22 1858 C > T polymorphism and susceptibility to SLE in all inheritance models. Allelic genetic model data (OR = 1.54, 95% confidence interval (CI) = 1.38-1.72, p value = .000) show that the T allele confers increased SLE susceptibility, as does the recessive genetic model (OR = 2.04, 95% CI = 1.09-3.82, p value = .030) for the T/T genotype. In contrast, the dominant genetic model shows that the C/C genotype confers lower susceptibility for SLE development (OR = 0.62, 95% CI = 0.54-0.72, p value = .000). In addition, we provided an ethnicity-derived meta-analysis. The results showed association in Caucasian (OR = 1.47, p value = .000) and Latin (OR = 2.41, p value = .000) ethnic groups. However, the rs2476601 polymorphism is not associated with SLE in either Asian (OR = 1.31; p value = .54) or African (OR = 2.04; p value = .22) populations. In conclusion, the present meta-analysis update confirms that the T allele and T/T genotype in PTPN22 1858 C > T polymorphism confer SLE susceptibility, particularly in Caucasian and Latin groups, suggesting PTPN22 1858 C > T as a potential genetic marker in SLE susceptibility.
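
    To make the three genetic models concrete, the sketch below computes single-study odds ratios from genotype counts for a C > T polymorphism. The counts are hypothetical, the dominant model is coded here as T carriers versus C/C (the abstract reports it from the C/C side), and the pooling across 19 studies performed in the meta-analysis is not shown.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Cross-product odds ratio for a 2 x 2 table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def genetic_model_odds_ratios(cases, controls):
    """Odds ratios for the T allele under allelic, dominant and recessive models.

    `cases` and `controls` map genotype labels "CC", "CT", "TT" to counts.
    """
    def alleles(g):
        # Each person carries two alleles: returns (T count, C count).
        return 2 * g["TT"] + g["CT"], 2 * g["CC"] + g["CT"]

    t_case, c_case = alleles(cases)
    t_ctrl, c_ctrl = alleles(controls)
    return {
        "allelic": odds_ratio(t_case, c_case, t_ctrl, c_ctrl),
        "dominant": odds_ratio(cases["TT"] + cases["CT"], cases["CC"],
                               controls["TT"] + controls["CT"], controls["CC"]),
        "recessive": odds_ratio(cases["TT"], cases["CT"] + cases["CC"],
                                controls["TT"], controls["CT"] + controls["CC"]),
    }


# Hypothetical genotype counts for one study.
ors = genetic_model_odds_ratios(
    cases={"CC": 70, "CT": 25, "TT": 5},
    controls={"CC": 80, "CT": 18, "TT": 2},
)
```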

  20. A fast boosting-based screening method for large-scale association study in complex traits with genetic heterogeneity.

    PubMed

    Wang, Lu-Yong; Fasulo, D

    2006-01-01

    Genome-wide association studies for complex diseases generate massive amounts of single nucleotide polymorphism (SNP) data. Univariate statistical tests (e.g., the Fisher exact test) have been used to filter out non-associated SNPs. However, disease-susceptible SNPs may have little marginal effect in the population and are unlikely to be retained after the univariate tests. Also, model-based methods are impractical for large-scale datasets. Moreover, genetic heterogeneity makes it harder for traditional methods to identify the genetic causes of diseases. A more recent random forest method provides a more robust way to screen SNPs at the scale of thousands. However, for larger-scale data, e.g., Affymetrix Human Mapping 100K GeneChip data, a faster method is required for screening SNPs in whole-genome large-scale association analysis with genetic heterogeneity. We propose a boosting-based method for rapid screening in large-scale analysis of complex traits in the presence of genetic heterogeneity. It provides a relatively fast and fairly good tool for screening and limiting the candidate SNPs for further, more complex computational modeling tasks.
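
    The univariate baseline mentioned in the abstract, the Fisher exact test, can be sketched in a few lines for a 2 x 2 table (for example, allele present or absent in cases versus controls). This illustrates only that baseline and not the boosting method the authors propose; the function name and the table layout are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)

    def weight(k):
        # Number of ways to obtain k in the top-left cell (exact integer).
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    extreme = sum(weight(k) for k in range(lowest, highest + 1) if weight(k) <= observed)
    return extreme / comb(total, col1)


# Hypothetical table: rows are cases/controls, columns are allele present/absent.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

    Comparing integer weights rather than floating-point probabilities avoids tie-breaking errors when two tables are equally likely.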

  1. Prioritizing GWAS Results: A Review of Statistical Methods and Recommendations for Their Application

    PubMed Central

    Cantor, Rita M.; Lange, Kenneth; Sinsheimer, Janet S.

    2010-01-01

    Genome-wide association studies (GWAS) have rapidly become a standard method for disease gene discovery. A substantial number of recent GWAS indicate that for most disorders, only a few common variants are implicated and the associated SNPs explain only a small fraction of the genetic risk. This review is written from the viewpoint that findings from the GWAS provide preliminary genetic information that is available for additional analysis by statistical procedures that accumulate evidence, and that these secondary analyses are very likely to provide valuable information that will help prioritize the strongest constellations of results. We review and discuss three analytic methods to combine preliminary GWAS statistics to identify genes, alleles, and pathways for deeper investigations. Meta-analysis seeks to pool information from multiple GWAS to increase the chances of finding true positives among the false positives and provides a way to combine associations across GWAS, even when the original data are unavailable. Testing for epistasis within a single GWAS study can identify the stronger results that are revealed when genes interact. Pathway analysis of GWAS results is used to prioritize genes and pathways within a biological context. Following a GWAS, association results can be assigned to pathways and tested in aggregate with computational tools and pathway databases. Reviews of published methods with recommendations for their application are provided within the framework for each approach. PMID:20074509
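
    One common way to pool a SNP's association across studies when only summary statistics are available is fixed-effect inverse-variance weighting. The sketch below shows that calculation as an illustration; it is one standard option and not necessarily the procedure the review recommends, and the effect sizes and standard errors are hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted pooled effect, its standard error and z score."""
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Hypothetical per-study log odds ratios and standard errors for one SNP.
pooled_beta, pooled_se, pooled_z = fixed_effect_meta([0.1, 0.3], [0.1, 0.1])
```

    More precise studies (smaller standard errors) receive proportionally more weight, and the pooled standard error shrinks as studies are added.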

  2. Capturing Positive Transgressive Variation From Wild And Exotic Germplasm Resources

    USDA-ARS's Scientific Manuscript database

    Only a small fraction of the naturally occurring genetic diversity available in rice germplasm repositories around the world has been explored to date. This is beginning to change with the advent of affordable, high throughput genotyping approaches coupled with robust statistical analysis methods th...

  3. Analysis of Heritability and Shared Heritability Based on Genome-Wide Association Studies for 13 Cancer Types

    PubMed Central

    Wheeler, William A.; Yeager, Meredith; Panagiotou, Orestis; Wang, Zhaoming; Berndt, Sonja I.; Lan, Qing; Abnet, Christian C.; Amundadottir, Laufey T.; Figueroa, Jonine D.; Landi, Maria Teresa; Mirabello, Lisa; Savage, Sharon A.; Taylor, Philip R.; Vivo, Immaculata De; McGlynn, Katherine A.; Purdue, Mark P.; Rajaraman, Preetha; Adami, Hans-Olov; Ahlbom, Anders; Albanes, Demetrius; Amary, Maria Fernanda; An, She-Juan; Andersson, Ulrika; Andriole, Gerald; Andrulis, Irene L.; Angelucci, Emanuele; Ansell, Stephen M.; Arici, Cecilia; Armstrong, Bruce K.; Arslan, Alan A.; Austin, Melissa A.; Baris, Dalsu; Barkauskas, Donald A.; Bassig, Bryan A.; Becker, Nikolaus; Benavente, Yolanda; Benhamou, Simone; Berg, Christine; Van Den Berg, David; Bernstein, Leslie; Bertrand, Kimberly A.; Birmann, Brenda M.; Black, Amanda; Boeing, Heiner; Boffetta, Paolo; Boutron-Ruault, Marie-Christine; Bracci, Paige M.; Brinton, Louise; Brooks-Wilson, Angela R.; Bueno-de-Mesquita, H. Bas; Burdett, Laurie; Buring, Julie; Butler, Mary Ann; Cai, Qiuyin; Cancel-Tassin, Geraldine; Canzian, Federico; Carrato, Alfredo; Carreon, Tania; Carta, Angela; Chan, John K. C.; Chang, Ellen T.; Chang, Gee-Chen; Chang, I-Shou; Chang, Jiang; Chang-Claude, Jenny; Chen, Chien-Jen; Chen, Chih-Yi; Chen, Chu; Chen, Chung-Hsing; Chen, Constance; Chen, Hongyan; Chen, Kexin; Chen, Kuan-Yu; Chen, Kun-Chieh; Chen, Ying; Chen, Ying-Hsiang; Chen, Yi-Song; Chen, Yuh-Min; Chien, Li-Hsin; Chirlaque, María-Dolores; Choi, Jin Eun; Choi, Yi Young; Chow, Wong-Ho; Chung, Charles C.; Clavel, Jacqueline; Clavel-Chapelon, Françoise; Cocco, Pierluigi; Colt, Joanne S.; Comperat, Eva; Conde, Lucia; Connors, Joseph M.; Conti, David; Cortessis, Victoria K.; Cotterchio, Michelle; Cozen, Wendy; Crouch, Simon; Crous-Bou, Marta; Cussenot, Olivier; Davis, Faith G.; Ding, Ti; Diver, W. Ryan; Dorronsoro, Miren; Dossus, Laure; Duell, Eric J.; Ennas, Maria Grazia; Erickson, Ralph L.; Feychting, Maria; Flanagan, Adrienne M.; Foretova, Lenka; Fraumeni, Joseph F.; Freedman, Neal D.; Beane Freeman, Laura E.; Fuchs, Charles; Gago-Dominguez, Manuela; Gallinger, Steven; Gao, Yu-Tang; Gapstur, Susan M.; Garcia-Closas, Montserrat; García-Closas, Reina; Gascoyne, Randy D.; Gastier-Foster, Julie; Gaudet, Mia M.; Gaziano, J. Michael; Giffen, Carol; Giles, Graham G.; Giovannucci, Edward; Glimelius, Bengt; Goggins, Michael; Gokgoz, Nalan; Goldstein, Alisa M.; Gorlick, Richard; Gross, Myron; Grubb, Robert; Gu, Jian; Guan, Peng; Gunter, Marc; Guo, Huan; Habermann, Thomas M.; Haiman, Christopher A.; Halai, Dina; Hallmans, Goran; Hassan, Manal; Hattinger, Claudia; He, Qincheng; He, Xingzhou; Helzlsouer, Kathy; Henderson, Brian; Henriksson, Roger; Hjalgrim, Henrik; Hoffman-Bolton, Judith; Hohensee, Chancellor; Holford, Theodore R.; Holly, Elizabeth A.; Hong, Yun-Chul; Hoover, Robert N.; Horn-Ross, Pamela L.; Hosain, G. M. Monawar; Hosgood, H. Dean; Hsiao, Chin-Fu; Hu, Nan; Hu, Wei; Hu, Zhibin; Huang, Ming-Shyan; Huerta, Jose-Maria; Hung, Jen-Yu; Hutchinson, Amy; Inskip, Peter D.; Jackson, Rebecca D.; Jacobs, Eric J.; Jenab, Mazda; Jeon, Hyo-Sung; Ji, Bu-Tian; Jin, Guangfu; Jin, Li; Johansen, Christoffer; Johnson, Alison; Jung, Yoo Jin; Kaaks, Rudolph; Kamineni, Aruna; Kane, Eleanor; Kang, Chang Hyun; Karagas, Margaret R.; Kelly, Rachel S.; Khaw, Kay-Tee; Kim, Christopher; Kim, Hee Nam; Kim, Jin Hee; Kim, Jun Suk; Kim, Yeul Hong; Kim, Young Tae; Kim, Young-Chul; Kitahara, Cari M.; Klein, Alison P.; Klein, Robert J.; Kogevinas, Manolis; Kohno, Takashi; Kolonel, Laurence N.; Kooperberg, Charles; Kricker, Anne; Krogh, Vittorio; Kunitoh, Hideo; Kurtz, Robert C.; Kweon, Sun-Seog; LaCroix, Andrea; Lawrence, Charles; Lecanda, Fernando; Lee, Victor Ho Fun; Li, Donghui; Li, Haixin; Li, Jihua; Li, Yao-Jen; Li, Yuqing; Liao, Linda M.; Liebow, Mark; Lightfoot, Tracy; Lim, Wei-Yen; Lin, Chien-Chung; Lin, Dongxin; Lindstrom, Sara; Linet, Martha S.; Link, Brian K.; Liu, Chenwei; Liu, Jianjun; Liu, Li; Ljungberg, Börje; Lloreta, Josep; Lollo, Simonetta Di; Lu, Daru; Lund, Eiluv; Malats, Nuria; Mannisto, Satu; Marchand, Loic Le; Marina, Neyssa; Masala, Giovanna; Mastrangelo, Giuseppe; Matsuo, Keitaro; Maynadie, Marc; McKay, James; McKean-Cowdin, Roberta; Melbye, Mads; Melin, Beatrice S.; Michaud, Dominique S.; Mitsudomi, Tetsuya; Monnereau, Alain; Montalvan, Rebecca; Moore, Lee E.; Mortensen, Lotte Maxild; Nieters, Alexandra; North, Kari E.; Novak, Anne J.; Oberg, Ann L.; Offit, Kenneth; Oh, In-Jae; Olson, Sara H.; Palli, Domenico; Pao, William; Park, In Kyu; Park, Jae Yong; Park, Kyong Hwa; Patiño-Garcia, Ana; Pavanello, Sofia; Peeters, Petra H. M.; Perng, Reury-Perng; Peters, Ulrike; Petersen, Gloria M.; Picci, Piero; Pike, Malcolm C.; Porru, Stefano; Prescott, Jennifer; Prokunina-Olsson, Ludmila; Qian, Biyun; Qiao, You-Lin; Rais, Marco; Riboli, Elio; Riby, Jacques; Risch, Harvey A.; Rizzato, Cosmeri; Rodabough, Rebecca; Roman, Eve; Roupret, Morgan; Ruder, Avima M.; de Sanjose, Silvia; Scelo, Ghislaine; Schned, Alan; Schumacher, Fredrick; Schwartz, Kendra; Schwenn, Molly; Scotlandi, Katia; Seow, Adeline; Serra, Consol; Serra, Massimo; Sesso, Howard D.; Setiawan, Veronica Wendy; Severi, Gianluca; Severson, Richard K.; Shanafelt, Tait D.; Shen, Hongbing; Shen, Wei; Shin, Min-Ho; Shiraishi, Kouya; Shu, Xiao-Ou; Siddiq, Afshan; Sierrasesúmaga, Luis; Sihoe, Alan Dart Loon; Skibola, Christine F.; Smith, Alex; Smith, Martyn T.; Southey, Melissa C.; Spinelli, John J.; Staines, Anthony; Stampfer, Meir; Stern, Marianna C.; Stevens, Victoria L.; Stolzenberg-Solomon, Rachael S.; Su, Jian; Su, Wu-Chou; Sund, Malin; Sung, Jae Sook; Sung, Sook Whan; Tan, Wen; Tang, Wei; Tardón, Adonina; Thomas, David; Thompson, Carrie A.; Tinker, Lesley F.; Tirabosco, Roberto; Tjønneland, Anne; Travis, Ruth C.; Trichopoulos, Dimitrios; Tsai, Fang-Yu; Tsai, Ying-Huang; Tucker, Margaret; Turner, Jenny; Vajdic, Claire M.; Vermeulen, Roel C. H.; Villano, Danylo J.; Vineis, Paolo; Virtamo, Jarmo; Visvanathan, Kala; Wactawski-Wende, Jean; Wang, Chaoyu; Wang, Chih-Liang; Wang, Jiu-Cun; Wang, Junwen; Wei, Fusheng; Weiderpass, Elisabete; Weiner, George J.; Weinstein, Stephanie; Wentzensen, Nicolas; White, Emily; Witzig, Thomas E.; Wolpin, Brian M.; Wong, Maria Pik; Wu, Chen; Wu, Guoping; Wu, Junjie; Wu, Tangchun; Wu, Wei; Wu, Xifeng; Wu, Yi-Long; Wunder, Jay S.; Xiang, Yong-Bing; Xu, Jun; Xu, Ping; Yang, Pan-Chyr; Yang, Tsung-Ying; Ye, Yuanqing; Yin, Zhihua; Yokota, Jun; Yoon, Ho-Il; Yu, Chong-Jen; Yu, Herbert; Yu, Kai; Yuan, Jian-Min; Zelenetz, Andrew; Zeleniuch-Jacquotte, Anne; Zhang, Xu-Chao; Zhang, Yawei; Zhao, Xueying; Zhao, Zhenhong; Zheng, Hong; Zheng, Tongzhang; Zheng, Wei; Zhou, Baosen; Zhu, Meng; Zucca, Mariagrazia; Boca, Simina M.; Cerhan, James R.; Ferri, Giovanni M.; Hartge, Patricia; Hsiung, Chao Agnes; Magnani, Corrado; Miligi, Lucia; Morton, Lindsay M.; Smedby, Karin E.; Teras, Lauren R.; Vijai, Joseph; Wang, Sophia S.; Brennan, Paul; Caporaso, Neil E.; Hunter, David J.; Kraft, Peter; Rothman, Nathaniel; Silverman, Debra T.; Slager, Susan L.; Chanock, Stephen J.; Chatterjee, Nilanjan

    2015-01-01

    Background: Studies of related individuals have consistently demonstrated notable familial aggregation of cancer. We aim to estimate the heritability and genetic correlation attributable to the additive effects of common single-nucleotide polymorphisms (SNPs) for cancer at 13 anatomical sites. Methods: Between 2007 and 2014, the US National Cancer Institute generated data from genome-wide association studies (GWAS) for 49 492 cancer case patients and 34 131 control patients. We apply novel mixed model methodology (GCTA) to this GWAS data to estimate the heritability of individual cancers, as well as the proportion of heritability attributable to cigarette smoking in smoking-related cancers, and the genetic correlation between pairs of cancers. Results: GWAS heritability was statistically significant at nearly all sites, with the estimates of array-based heritability, h_l^2, on the liability threshold (LT) scale ranging from 0.05 to 0.38. Estimating the combined heritability of multiple smoking characteristics, we calculate that at least 24% (95% confidence interval [CI] = 14% to 37%) and 7% (95% CI = 4% to 11%) of the heritability for lung and bladder cancer, respectively, can be attributed to genetic determinants of smoking. Most pairs of cancers studied did not show evidence of strong genetic correlation. We found only four pairs of cancers with marginally statistically significant correlations, specifically kidney and testes (ρ = 0.73, SE = 0.28), diffuse large B-cell lymphoma (DLBCL) and pediatric osteosarcoma (ρ = 0.53, SE = 0.21), DLBCL and chronic lymphocytic leukemia (CLL) (ρ = 0.51, SE = 0.18), and bladder and lung (ρ = 0.35, SE = 0.14). Correlation analysis also indicates that the genetic architecture of lung cancer differs between a smoking population of European ancestry and a nonsmoking Asian population, allowing for the possibility that the genetic etiology for the same disease can vary by population and environmental exposures. Conclusion: Our results provide important insights into the genetic architecture of cancers and suggest new avenues for investigation. PMID:26464424
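
    To illustrate why the four genetic correlations in the record above are described as only marginally significant, the following sketch turns each quoted estimate and standard error into a two-sided p-value. It assumes a Wald-type z-test with a normal sampling distribution; the abstract does not say which test the authors used, so their own p-values may differ. The function name wald_p is hypothetical.

```python
import math

# Genetic correlation estimates (rho, SE) as quoted in the abstract above.
PAIRS = {
    "kidney / testes": (0.73, 0.28),
    "DLBCL / pediatric osteosarcoma": (0.53, 0.21),
    "DLBCL / CLL": (0.51, 0.18),
    "bladder / lung": (0.35, 0.14),
}


def wald_p(estimate, se):
    """Return (z, two-sided p) for H0: estimate = 0.

    Assumption: the estimate is approximately normally distributed,
    so z = estimate / se and p = erfc(|z| / sqrt(2)).
    """
    z = estimate / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))


for name, (rho, se) in PAIRS.items():
    z, p = wald_p(rho, se)
    print(f"{name}: z = {z:.2f}, p = {p:.3g}")
```

    Under this assumption every pair is nominally significant at the 0.05 level, but none would be expected to survive a stringent correction for the number of cancer pairs compared.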

  4. Genetic admixture and lineage separation in a southern Andean plant

    PubMed Central

    Morello, Santiago; Sede, Silvana M.

    2016-01-01

    Mountain uplifts have generated new ecologic opportunities for plants, and triggered evolutionary processes, favouring an increase in the speciation rate in all continents. Moreover, mountain ranges may act as corridors or barriers for plant lineages and populations. In South America a high rate of diversification has been linked to Andean orogeny during Pliocene/Miocene. More recently, Pleistocene glacial cycles have also shaped species distribution and demography. The endemic genus Escallonia is known to have diversified in the Andes. Species with similar morphology obscure species delimitation and plants with intermediate characters occur naturally. The aim of this study is to characterize genetic variation and structure of two widespread species of Escallonia: E. alpina and E. rubra. We analyzed the genetic variation of populations of the entire distribution range of the species and we also included those with intermediate morphological characters; a total of 94 accessions from 14 populations were used for the Amplified Fragment Length Polymorphism (AFLP) analysis. Plastid DNA sequences (trnS-trnG, 3′trnV-ndhC intergenic spacers and the ndhF gene) from sixteen accessions of Escallonia species were used to construct a Statistical Parsimony network. Additionally, we performed a geometric morphometrics analysis on 88 leaves from 35 individuals of the two E. alpina varieties to further study their differences. Wright’s Fst and analysis of molecular variance tests performed on AFLP data showed a significant level of genetic structure at the species and population levels. Intermediate morphology populations showed a mixed genetic contribution from E. alpina var. alpina and E. rubra both in the Principal Coordinates Analysis (PCoA) and STRUCTURE. On the other hand, E. rubra and the two varieties of E. alpina are well differentiated and assigned to different genetic clusters. Moreover, the Statistical Parsimony network showed a high degree of divergence between the varieties of E. alpina: var. alpina is more closely related to E. rubra and other species than to its own counterpart E. alpina var. carmelitana. Geometric morphometrics analysis (Elliptic Fourier descriptors) revealed significant differences in leaf shape between varieties. We found that diversity in Escallonia species analyzed here is geographically structured and deep divergence between varieties of E. alpina could be associated with ancient evolutionary events like orogeny. Admixture in southern populations could be the result of hybridization at the margins of the parental species’ distribution range. PMID:27179539

  5. [Landscape and ecological genomics].

    PubMed

    Tetushkin, E Ia

    2013-10-01

    Landscape genomics is the modern version of landscape genetics, a discipline that arose approximately 10 years ago as a combination of population genetics, landscape ecology, and spatial statistics. It studies the effects of environmental variables on gene flow and other microevolutionary processes that determine genetic connectivity and variations in populations. In contrast to population genetics, it operates at the level of individual specimens rather than at the level of population samples. Another important difference between landscape genetics and genomics and population genetics is that, in the former, the analysis of gene flow and local adaptations takes quantitative account of landforms and features of the matrix, i.e., hostile spaces that separate species habitats. Landscape genomics is a part of population ecogenomics, which, along with community genomics, is a major part of ecological genomics. One of the principal purposes of landscape genomics is the identification and differentiation of various genome-wide and locus-specific effects. The approaches and computation tools developed for combined analysis of genomic and landscape variables make it possible to detect adaptation-related genome fragments, which facilitates the planning of conservation efforts and the prediction of species' fate in response to expected changes in the environment.

  6. Genetic differentiation and origin of the Jordanian population: an analysis of Alu insertion polymorphisms.

    PubMed

    Bahri, Raoudha; El Moncer, Wifak; Al-Batayneh, Khalid; Sadiq, May; Esteban, Esther; Moral, Pedro; Chaabani, Hassen

    2012-05-01

    Although much of Jordan is covered by desert, its north-western region forms part of the Fertile Crescent region that had given a rich past to Jordanians. This past, scarcely described by historians, is not yet clarified by sufficient genetic data. Thus in this paper we aim to determine the genetic differentiation of the Jordanian population and to discuss its origin. A total of 150 unrelated healthy Jordanians were investigated for ten Alu insertion polymorphisms. Genetic relationships among populations were estimated by a principal component (PC) plot based on the analyses of the R-matrix software. Statistical analysis showed that the Jordanian population is not significantly different from the United Arab Emirates population or the North Africans. This observation, well represented in PC plot, suggests a common origin of these populations belonging respectively to ancient Mesopotamia, Arabia, and North Africa. Our results are compatible with ancient peoples' movements from Arabia to ancient Mesopotamia and North Africa as proposed by historians and supported by previous genetic results. The original genetic profile of the Jordanian population, very likely Arabian Semitic, has not been subject to significant change despite the succession of several civilizations.

  7. Efficient strategy for detecting gene × gene joint action and its application in schizophrenia.

    PubMed

    Won, Sungho; Kwon, Min-Seok; Mattheisen, Manuel; Park, Suyeon; Park, Changsoon; Kihara, Daisuke; Cichon, Sven; Ophoff, Roel; Nöthen, Markus M; Rietschel, Marcella; Baur, Max; Uitterlinden, Andre G; Hofmann, A; Lange, Christoph

    2014-01-01

    We propose a new approach to detect gene × gene joint action in genome-wide association studies (GWASs) for case-control designs. This approach offers an exhaustive search for all two-way joint action (including, as a special case, single gene action) that is computationally feasible at the genome-wide level and has reasonable statistical power under most genetic models. We found that the presence of any gene × gene joint action may imply differences in three types of genetic components: the minor allele frequencies and the amounts of Hardy-Weinberg disequilibrium may differ between cases and controls, and between the two genetic loci the degree of linkage disequilibrium may differ between cases and controls. Using Fisher's method, it is possible to combine the different sources of genetic information in an overall test for detecting gene × gene joint action. The proposed statistical analysis is efficient and its simplicity makes it applicable to GWASs. In the current study, we applied the proposed approach to a GWAS on schizophrenia and found several potential gene × gene interactions. Our application illustrates the practical advantage of the proposed method. © 2013 WILEY PERIODICALS, INC.
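
    The record above combines several sources of evidence with Fisher's method. As a minimal sketch of that step only, the code below combines independent p-values into one overall p-value. The three input p-values in the usage example are hypothetical placeholders for the allele-frequency, Hardy-Weinberg disequilibrium and linkage disequilibrium components; they are not taken from the paper, and the function name fisher_combined is an assumption.

```python
import math


def fisher_combined(p_values):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined p-value). Under the null hypothesis the
    statistic -2 * sum(log p_i) follows a chi-square distribution with
    2k degrees of freedom, where k is the number of p-values.
    """
    k = len(p_values)
    if k == 0 or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in p_values)
    # For even degrees of freedom (2k) the chi-square survival function
    # has a closed form: P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    half = stat / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total


# Hypothetical component p-values for one SNP pair (illustration only).
stat, p = fisher_combined([0.05, 0.05, 0.05])
print(f"chi-square = {stat:.2f} on 6 df, combined p = {p:.4f}")
```

    The combination is only valid when the component tests are independent under the null hypothesis, which is an assumption of this sketch.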

  8. Multiple testing and power calculations in genetic association studies.

    PubMed

    So, Hon-Cheong; Sham, Pak C

    2011-01-01

    Modern genetic association studies typically involve multiple single-nucleotide polymorphisms (SNPs) and/or multiple genes. With the development of high-throughput genotyping technologies and the reduction in genotyping cost, investigators can now assay up to a million SNPs for direct or indirect association with disease phenotypes. In addition, some studies involve multiple disease or related phenotypes and use multiple methods of statistical analysis. The combination of multiple genetic loci, multiple phenotypes, and multiple methods of evaluating associations between genotype and phenotype means that modern genetic studies often involve the testing of an enormous number of hypotheses. When multiple hypothesis tests are performed in a study, there is a risk of inflation of the type I error rate (i.e., the chance of falsely claiming an association when there is none). Several methods for multiple-testing correction are in popular use, and they all have strengths and weaknesses. Because no single method is universally adopted or always appropriate, it is important to understand the principles, strengths, and weaknesses of the methods so that they can be applied appropriately in practice. In this article, we review the three principal methods for multiple-testing correction and provide guidance for calculating statistical power.
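
    The abstract above does not name the correction methods it reviews. As an illustration of the general idea only, this sketch shows two widely used corrections, a Bonferroni threshold and the Benjamini-Hochberg step-up procedure; whether these match the methods covered in the article is an assumption. The example p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking hypotheses rejected at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Step-up rule: find the largest rank whose p-value is at or below q * rank / m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# One million SNPs tested at a family-wise alpha of 0.05.
print(bonferroni_threshold(0.05, 1_000_000))

# Hypothetical p-values, for illustration only.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.2]))
```

    With a million tests, the Bonferroni threshold works out to 5 × 10^-8, the level commonly quoted for genome-wide significance.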

  9. A functional U-statistic method for association analysis of sequencing data.

    PubMed

    Jadhav, Sneha; Tong, Xiaoran; Lu, Qing

    2017-11-01

    Although sequencing studies hold great promise for uncovering novel variants predisposing to human diseases, the high dimensionality of the sequencing data brings tremendous challenges to data analysis. Moreover, for many complex diseases (e.g., psychiatric disorders) multiple related phenotypes are collected. These phenotypes can be different measurements of an underlying disease, or measurements characterizing multiple related diseases for studying common genetic mechanism. Although jointly analyzing these phenotypes could potentially increase the power of identifying disease-associated genes, the different types of phenotypes pose challenges for association analysis. To address these challenges, we propose a nonparametric method, functional U-statistic method (FU), for multivariate analysis of sequencing data. It first constructs smooth functions from individuals' sequencing data, and then tests the association of these functions with multiple phenotypes by using a U-statistic. The method provides a general framework for analyzing various types of phenotypes (e.g., binary and continuous phenotypes) with unknown distributions. Fitting the genetic variants within a gene using a smoothing function also allows us to capture complexities of gene structure (e.g., linkage disequilibrium, LD), which could potentially increase the power of association analysis. Through simulations, we compared our method to the multivariate outcome score test (MOST), and found that our test attained better performance than MOST. In a real data application, we apply our method to the sequencing data from Minnesota Twin Study (MTS) and found potential associations of several nicotine receptor subunit (CHRN) genes, including CHRNB3, associated with nicotine dependence and/or alcohol dependence. © 2017 WILEY PERIODICALS, INC.

  10. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics.

    PubMed

    Giambartolomei, Claudia; Vukcevic, Damjan; Schadt, Eric E; Franke, Lude; Hingorani, Aroon D; Wallace, Chris; Plagnol, Vincent

    2014-05-01

    Genetic association studies, in particular the genome-wide association study (GWAS) design, have provided a wealth of novel insights into the aetiology of a wide range of human diseases and traits, in particular cardiovascular diseases and lipid biomarkers. The next challenge consists of understanding the molecular basis of these associations. The integration of multiple association datasets, including gene expression datasets, can contribute to this goal. We have developed a novel statistical methodology to assess whether two association signals are consistent with a shared causal variant. An application is the integration of disease scans with expression quantitative trait locus (eQTL) studies, but any pair of GWAS datasets can be integrated in this framework. We demonstrate the value of the approach by re-analysing a gene expression dataset in 966 liver samples with a published meta-analysis of lipid traits including >100,000 individuals of European ancestry. Combining all lipid biomarkers, our re-analysis supported 26 out of 38 reported colocalisation results with eQTLs and identified 14 new colocalisation results, hence highlighting the value of a formal statistical test. In three cases of reported eQTL-lipid pairs (SYPL2, IFT172, TBKBP1) for which our analysis suggests that the eQTL pattern is not consistent with the lipid association, we identify alternative colocalisation results with SORT1, GCKR, and KPNB1, indicating that these genes are more likely to be causal in these genomic intervals. A key feature of the method is the ability to derive the output statistics from single SNP summary statistics, hence making it possible to perform systematic meta-analysis type comparisons across multiple GWAS datasets (implemented online at http://coloc.cs.ucl.ac.uk/coloc/). 
Our methodology provides information about candidate causal genes in associated intervals and has direct implications for the understanding of complex diseases as well as the design of drugs to target disease pathways.
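
    The abstract above states that the colocalisation test can be computed from single-SNP summary statistics. The sketch below shows one way such a calculation can be organised. Everything specific in it is an assumption not given in the abstract: the Wakefield-style approximate Bayes factor, the prior standard deviation of 0.15, the prior probabilities p1, p2 and p12, the restriction to at most one causal variant per trait in the region, and all function names. It is not the authors' implementation.

```python
import math


def log_sum(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_abf(z, se, prior_sd=0.15):
    """Approximate log Bayes factor for one SNP from its z-score and standard error."""
    v = se * se
    r = prior_sd ** 2 / (prior_sd ** 2 + v)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of five hypotheses for one region.

    H0: no association; H1: trait 1 only; H2: trait 2 only;
    H3: both traits, different causal SNPs; H4: both traits, shared causal SNP.
    Inputs are per-SNP log Bayes factors for the same SNPs in the same order.
    """
    if len(labf1) != len(labf2) or len(labf1) < 2:
        raise ValueError("need at least two SNPs, in matching order")
    n = len(labf1)
    shared = log_sum([labf1[i] + labf2[i] for i in range(n)])
    # Pairs of distinct SNPs; O(n^2), acceptable for a small illustrative region.
    distinct = log_sum([labf1[i] + labf2[j]
                        for i in range(n) for j in range(n) if i != j])
    log_h = [
        0.0,
        math.log(p1) + log_sum(labf1),
        math.log(p2) + log_sum(labf2),
        math.log(p1) + math.log(p2) + distinct,
        math.log(p12) + shared,
    ]
    norm = log_sum(log_h)
    return [math.exp(h - norm) for h in log_h]


# Hypothetical z-scores for three SNPs; both traits peak at the second SNP.
se = 0.05
l1 = [log_abf(z, se) for z in (0.5, 6.0, 0.3)]
l2 = [log_abf(z, se) for z in (0.2, 5.5, 0.1)]
print([round(p, 4) for p in coloc_posteriors(l1, l2)])
```

    When both traits peak at the same SNP, most of the posterior mass falls on H4; when the peaks sit on different SNPs, it shifts to H3.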

  11. msap: a tool for the statistical analysis of methylation-sensitive amplified polymorphism data.

    PubMed

    Pérez-Figueroa, A

    2013-05-01

    In this study msap, an R package which analyses methylation-sensitive amplified polymorphism (MSAP or MS-AFLP) data is presented. The program provides a deep analysis of epigenetic variation starting from a binary data matrix indicating the banding pattern between the isoschizomeric endonucleases HpaII and MspI, with differential sensitivity to cytosine methylation. After comparing the restriction fragments, the program determines if each fragment is susceptible to methylation (representative of epigenetic variation) or if there is no evidence of methylation (representative of genetic variation). The package provides, in a user-friendly command line interface, a pipeline of different analyses of the variation (genetic and epigenetic) among user-defined groups of samples, as well as the classification of the methylation occurrences in those groups. Statistical testing provides support to the analyses. A comprehensive report of the analyses and several useful plots could help researchers to assess the epigenetic and genetic variation in their MSAP experiments. msap is downloadable from CRAN (http://cran.r-project.org/) and its own webpage (http://msap.r-forge.R-project.org/). The package is intended to be easy to use even for those people unfamiliar with the R command line environment. Advanced users may take advantage of the available source code to adapt msap to more complex analyses. © 2013 Blackwell Publishing Ltd.

  12. Face shape differs in phylogenetically related populations.

    PubMed

    Hopman, Saskia M J; Merks, Johannes H M; Suttie, Michael; Hennekam, Raoul C M; Hammond, Peter

    2014-11-01

    3D analysis of facial morphology has delineated facial phenotypes in many medical conditions and detected fine grained differences between typical and atypical patients to inform genotype-phenotype studies. Next-generation sequencing techniques have enabled extremely detailed genotype-phenotype correlative analysis. Such comparisons typically employ control groups matched for age, sex and ethnicity and the distinction between ethnic categories in genotype-phenotype studies has been widely debated. The phylogenetic tree based on genetic polymorphism studies divides the world population into nine subpopulations. Here we show statistically significant face shape differences between two European Caucasian populations of close phylogenetic and geographic proximity from the UK and The Netherlands. The average face shape differences between the Dutch and UK cohorts were visualised in dynamic morphs and signature heat maps, and quantified for their statistical significance using both conventional anthropometry and state of the art dense surface modelling techniques. Our results demonstrate significant differences between Dutch and UK face shape. Other studies have shown that genetic variants influence normal facial variation. Thus, face shape difference between populations could reflect underlying genetic difference. This should be taken into account in genotype-phenotype studies and we recommend that in those studies reference groups be established in the same population as the individuals who form the subject of the study.

  13. A combined analysis of genetically correlated traits identifies 187 loci and a role for neurogenesis and myelination in intelligence.

    PubMed

    Hill, W D; Marioni, R E; Maghzian, O; Ritchie, S J; Hagenaars, S P; McIntosh, A M; Gale, C R; Davies, G; Deary, I J

    2018-01-11

    Intelligence, or general cognitive function, is phenotypically and genetically correlated with many traits, including a wide range of physical and mental health variables. Education is strongly genetically correlated with intelligence (r_g = 0.70). We used these findings as foundations for our use of a novel approach, multi-trait analysis of genome-wide association studies (MTAG; Turley et al. 2017), to combine two large genome-wide association studies (GWASs) of education and intelligence, increasing statistical power and resulting in the largest GWAS of intelligence yet reported. Our study had four goals: first, to facilitate the discovery of new genetic loci associated with intelligence; second, to add to our understanding of the biology of intelligence differences; third, to examine whether combining genetically correlated traits in this way produces results consistent with the primary phenotype of intelligence; and, finally, to test how well this new meta-analytic data sample on intelligence predicts phenotypic intelligence in an independent sample. By combining datasets using MTAG, our functional sample size increased from 199,242 participants to 248,482. We found 187 independent loci associated with intelligence, implicating 538 genes, using both SNP-based and gene-based GWAS. We found evidence that neurogenesis and myelination (as well as genes expressed in the synapse, and those involved in the regulation of the nervous system) may explain some of the biological differences in intelligence. The results of our combined analysis demonstrated the same pattern of genetic correlations as those from previous GWASs of intelligence, providing support for the meta-analysis of these genetically-related phenotypes.

  14. Genetic overlap between endometriosis and endometrial cancer: evidence from cross-disease genetic correlation and GWAS meta-analyses.

    PubMed

    Painter, Jodie N; O'Mara, Tracy A; Morris, Andrew P; Cheng, Timothy H T; Gorman, Maggie; Martin, Lynn; Hodson, Shirley; Jones, Angela; Martin, Nicholas G; Gordon, Scott; Henders, Anjali K; Attia, John; McEvoy, Mark; Holliday, Elizabeth G; Scott, Rodney J; Webb, Penelope M; Fasching, Peter A; Beckmann, Matthias W; Ekici, Arif B; Hein, Alexander; Rübner, Matthias; Hall, Per; Czene, Kamila; Dörk, Thilo; Dürst, Matthias; Hillemanns, Peter; Runnebaum, Ingo; Lambrechts, Diether; Amant, Frederic; Annibali, Daniela; Depreeuw, Jeroen; Vanderstichele, Adriaan; Goode, Ellen L; Cunningham, Julie M; Dowdy, Sean C; Winham, Stacey J; Trovik, Jone; Hoivik, Erling; Werner, Henrica M J; Krakstad, Camilla; Ashton, Katie; Otton, Geoffrey; Proietto, Tony; Tham, Emma; Mints, Miriam; Ahmed, Shahana; Healey, Catherine S; Shah, Mitul; Pharoah, Paul D P; Dunning, Alison M; Dennis, Joe; Bolla, Manjeet K; Michailidou, Kyriaki; Wang, Qin; Tyrer, Jonathan P; Hopper, John L; Peto, Julian; Swerdlow, Anthony J; Burwinkel, Barbara; Brenner, Hermann; Meindl, Alfons; Brauch, Hiltrud; Lindblom, Annika; Chang-Claude, Jenny; Couch, Fergus J; Giles, Graham G; Kristensen, Vessela N; Cox, Angela; Zondervan, Krina T; Nyholt, Dale R; MacGregor, Stuart; Montgomery, Grant W; Tomlinson, Ian; Easton, Douglas F; Thompson, Deborah J; Spurdle, Amanda B

    2018-05-01

    Epidemiological, biological, and molecular data suggest links between endometriosis and endometrial cancer, with recent epidemiological studies providing evidence for an association between a previous diagnosis of endometriosis and risk of endometrial cancer. We used genetic data as an alternative approach to investigate shared biological etiology of these two diseases. Genetic correlation analysis of summary level statistics from genomewide association studies (GWAS) using LD Score regression revealed moderate but significant genetic correlation (r_g = 0.23, P = 9.3 × 10^-3), and SNP effect concordance analysis provided evidence for significant SNP pleiotropy (P = 6.0 × 10^-3) and concordance in effect direction (P = 2.0 × 10^-3) between the two diseases. Cross-disease GWAS meta-analysis highlighted 13 distinct loci associated at P ≤ 10^-5 with both endometriosis and endometrial cancer, with one locus (SNP rs2475335) located within PTPRD associated at a genomewide significant level (P = 4.9 × 10^-8, OR = 1.11, 95% CI = 1.07-1.15). PTPRD acts in the STAT3 pathway, which has been implicated in both endometriosis and endometrial cancer. This study demonstrates the value of cross-disease genetic analysis to support epidemiological observations and to identify biological pathways of relevance to multiple diseases. © 2018 The Authors. Cancer Medicine published by John Wiley & Sons Ltd.

  15. Spurious correlations and inference in landscape genetics

    Treesearch

    Samuel A. Cushman; Erin L. Landguth

    2010-01-01

    Reliable interpretation of landscape genetic analyses depends on statistical methods that have high power to identify the correct process driving gene flow while rejecting incorrect alternative hypotheses. Little is known about statistical power and inference in individual-based landscape genetics. Our objective was to evaluate the power of causal modelling with partial...

  16. The use of statistical tools in field testing of putative effects of genetically modified plants on nontarget organisms

    PubMed Central

    Semenov, Alexander V; Elsas, Jan Dirk; Glandorf, Debora C M; Schilthuizen, Menno; Boer, Willem F

    2013-01-01

    To fulfill existing guidelines, applicants that aim to place their genetically modified (GM) insect-resistant crop plants on the market are required to provide data from field experiments that address the potential impacts of the GM plants on nontarget organisms (NTO's). Such data may be based on varied experimental designs. The recent EFSA guidance document for environmental risk assessment (2010) does not provide clear and structured suggestions that address the statistics of field trials on effects on NTO's. This review examines existing practices in GM plant field testing such as the way of randomization, replication, and pseudoreplication. Emphasis is placed on the importance of design features used for the field trials in which effects on NTO's are assessed. The importance of statistical power and the positive and negative aspects of various statistical models are discussed. Equivalence and difference testing are compared, and the importance of checking the distribution of experimental data is stressed to decide on the selection of the proper statistical model. While for continuous data (e.g., pH and temperature) classical statistical approaches – for example, analysis of variance (ANOVA) – are appropriate, for discontinuous data (counts) only generalized linear models (GLM) are shown to be efficient. There is no golden rule as to which statistical test is the most appropriate for any experimental situation. In particular, in experiments in which block designs are used and covariates play a role GLMs should be used. Generic advice is offered that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in this testing. The combination of decision trees and a checklist for field trials, which are provided, will help in the interpretation of the statistical analyses of field trials and to assess whether such analyses were correctly applied. We offer generic advice to risk assessors and applicants that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in field testing. PMID:24567836

  17. The use of statistical tools in field testing of putative effects of genetically modified plants on nontarget organisms.

    PubMed

    Semenov, Alexander V; Elsas, Jan Dirk; Glandorf, Debora C M; Schilthuizen, Menno; Boer, Willem F

    2013-08-01

    To fulfill existing guidelines, applicants that aim to place their genetically modified (GM) insect-resistant crop plants on the market are required to provide data from field experiments that address the potential impacts of the GM plants on nontarget organisms (NTO's). Such data may be based on varied experimental designs. The recent EFSA guidance document for environmental risk assessment (2010) does not provide clear and structured suggestions that address the statistics of field trials on effects on NTO's. This review examines existing practices in GM plant field testing such as the way of randomization, replication, and pseudoreplication. Emphasis is placed on the importance of design features used for the field trials in which effects on NTO's are assessed. The importance of statistical power and the positive and negative aspects of various statistical models are discussed. Equivalence and difference testing are compared, and the importance of checking the distribution of experimental data is stressed to decide on the selection of the proper statistical model. While for continuous data (e.g., pH and temperature) classical statistical approaches - for example, analysis of variance (ANOVA) - are appropriate, for discontinuous data (counts) only generalized linear models (GLM) are shown to be efficient. There is no golden rule as to which statistical test is the most appropriate for any experimental situation. In particular, in experiments in which block designs are used and covariates play a role GLMs should be used. Generic advice is offered that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in this testing. The combination of decision trees and a checklist for field trials, which are provided, will help in the interpretation of the statistical analyses of field trials and to assess whether such analyses were correctly applied. 
We offer generic advice to risk assessors and applicants that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in field testing.

  18. GENOME-WIDE GENETIC INTERACTION ANALYSIS OF GLAUCOMA USING EXPERT KNOWLEDGE DERIVED FROM HUMAN PHENOTYPE NETWORKS

    PubMed Central

    HU, TING; DARABOS, CHRISTIAN; CRICCO, MARIA E.; KONG, EMILY; MOORE, JASON H.

    2014-01-01

    The large volume of GWAS data poses great computational challenges for analyzing genetic interactions associated with common human diseases. We propose a computational framework for characterizing epistatic interactions among large sets of genetic attributes in GWAS data. We build the human phenotype network (HPN) and focus around a disease of interest. In this study, we use the GLAUGEN glaucoma GWAS dataset and apply the HPN as a biological knowledge-based filter to prioritize genetic variants. Then, we use the statistical epistasis network (SEN) to identify a significant connected network of pairwise epistatic interactions among the prioritized SNPs. These clearly highlight the complex genetic basis of glaucoma. Furthermore, we identify key SNPs by quantifying structural network characteristics. Through functional annotation of these key SNPs using Biofilter, a software accessing multiple publicly available human genetic data sources, we find supporting biomedical evidence linking glaucoma to an array of genetic diseases, proving our concept. We conclude by suggesting hypotheses for a better understanding of the disease. PMID:25592582

  19. Controlling the joint local false discovery rate is more powerful than meta-analysis methods in joint analysis of summary statistics from multiple genome-wide association studies.

    PubMed

    Jiang, Wei; Yu, Weichuan

    2017-02-15

    In genome-wide association studies (GWASs) of common diseases/traits, we often analyze multiple GWASs with the same phenotype together to discover associated genetic variants with higher power. Since it is difficult to access data with detailed individual measurements, summary-statistics-based meta-analysis methods have become popular to jointly analyze datasets from multiple GWASs. In this paper, we propose a novel summary-statistics-based joint analysis method based on controlling the joint local false discovery rate (Jlfdr). We prove that our method is the most powerful summary-statistics-based joint analysis method when controlling the false discovery rate at a certain level. In particular, the Jlfdr-based method achieves higher power than commonly used meta-analysis methods when analyzing heterogeneous datasets from multiple GWASs. Simulation experiments demonstrate the superior power of our method over meta-analysis methods. Also, our method discovers more associations than meta-analysis methods from empirical datasets of four phenotypes. The R-package is available at: http://bioinformatics.ust.hk/Jlfdr.html. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
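    As a rough illustration of the quantity being controlled, a local false discovery rate under a two-group model is the posterior probability that a variant is null given its summary statistics. The sketch below is not the Jlfdr package's API. It assumes independent studies, standard normal z-scores under the null, and a single normal alternative; the names `joint_local_fdr`, `pi0` and `mu` and their default values are hypothetical.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def joint_local_fdr(z_scores, pi0=0.99, mu=(3.0, 3.0)):
    """Posterior probability of the null given one z-score per study.

    Assumptions made for this sketch only: studies are independent,
    z-scores are N(0, 1) under the null and N(mu_i, 1) under the
    alternative, and a fraction pi0 of variants is null.
    """
    f0 = 1.0  # joint density under the null
    f1 = 1.0  # joint density under the alternative
    for z, m in zip(z_scores, mu):
        f0 *= normal_pdf(z)
        f1 *= normal_pdf(z, mean=m)
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)


weak_signal = joint_local_fdr((0.0, 0.0))
strong_signal = joint_local_fdr((4.0, 4.0))
```

    Variants would then be ranked by this posterior probability, with small values indicating stronger joint evidence across studies.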

  20. Analysis of biochemical genetic data on Jewish populations: II. Results and interpretations of heterogeneity indices and distance measures with respect to standards.

    PubMed Central

    Karlin, S; Kenett, R; Bonné-Tamir, B

    1979-01-01

    A nonparametric statistical methodology is used for the analysis of biochemical frequency data observed on a series of nine Jewish and six non-Jewish populations. Two categories of statistics are used: heterogeneity indices and various distance measures with respect to a standard. The latter are more discriminating in exploiting historical, geographical and culturally relevant information. A number of partial orderings and distance relationships among the populations are determined. Our concern in this study is to analyze similarities and differences among the Jewish populations, in terms of the gene frequency distributions for a number of genetic markers. Typical questions discussed are as follows: These Jewish populations differ in certain morphological and anthropometric traits. Are there corresponding differences in biochemical genetic constitution? How can we assess the extent of heterogeneity between and within groupings? Which class of markers (blood typings or protein loci) discriminates better among the separate populations? The results are quite surprising. For example, we found the Ashkenazi, Sephardi and Iraqi Jewish populations to be consistently close in genetic constitution and distant from all the other populations, namely the Yemenite and Cochin Jews, the Arabs, and the non-Jewish German and Russian populations. We found the Polish Jewish community the most heterogeneous among all Jewish populations. The blood loci discriminate better than the protein loci. A number of possible interpretations and hypotheses for these and other results are offered. The method devised for this analysis should prove useful in studying similarities and differences for other groups of populations for which substantial biochemical polymorphic data are available. PMID:380330

  1. Genetic variation in the raptor gene is associated with overweight but not hypertension in American men of Japanese ancestry.

    PubMed

    Morris, Brian J; Carnes, Bruce A; Chen, Randi; Donlon, Timothy A; He, Qimei; Grove, John S; Masaki, Kamal H; Elliott, Ayako; Willcox, Donald C; Allsopp, Richard; Willcox, Bradley J

    2015-04-01

    The mechanistic target of rapamycin (mTOR) pathway is pivotal for cell growth. Regulatory associated protein of mTOR complex I (Raptor) is a unique component of this pro-growth complex. The present study tested whether variation across the raptor gene (RPTOR) is associated with overweight and hypertension. We tested 61 common (allele frequency ≥ 0.1) tagging single nucleotide polymorphisms (SNPs) that captured most of the genetic variation across RPTOR in 374 subjects of normal lifespan and 439 subjects with a lifespan exceeding 95 years for association with overweight/obesity, essential hypertension, and isolated systolic hypertension. Subjects were drawn from the Honolulu Heart Program, a homogeneous population of American men of Japanese ancestry, well characterized for phenotypes relevant to conditions of aging. Hypertension status was ascertained when subjects were 45-68 years old. Statistical evaluation involved contingency table analysis, logistic regression, and the powerful method of recursive partitioning. After analysis of RPTOR genotypes by each statistical approach, we found no significant association between genetic variation in RPTOR and either essential hypertension or isolated systolic hypertension. Models generated by recursive partitioning analysis showed that RPTOR SNPs significantly enhanced the ability of the model to accurately assign individuals to either the overweight/obese or the non-overweight/obese groups (P = 0.008 by 1-tailed Z test). Common genetic variation in RPTOR is associated with overweight/obesity but does not discernibly contribute to either essential hypertension or isolated systolic hypertension in the population studied. © American Journal of Hypertension, Ltd 2014. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  2. Joint QTL linkage mapping for multiple-cross mating design sharing one common parent

    USDA-ARS's Scientific Manuscript database

    Nested association mapping (NAM) is a novel genetic mating design that combines the advantages of linkage analysis and association mapping. This design provides opportunities to study the inheritance of complex traits, but also requires more advanced statistical methods. In this paper, we present th...

  3. WISARD: workbench for integrated superfast association studies for related datasets.

    PubMed

    Lee, Sungyoung; Choi, Sungkyoung; Qiao, Dandi; Cho, Michael; Silverman, Edwin K; Park, Taesung; Won, Sungho

    2018-04-20

    Mendelian transmission produces phenotypic and genetic relatedness between family members, giving family-based analytical methods an important role in genetic epidemiological studies, from heritability estimations to genetic association analyses. With the advance in genotyping technologies, whole-genome sequence data can be utilized for genetic epidemiological studies, and family-based samples may become more useful for detecting de novo mutations. However, genetic analyses employing family-based samples usually suffer from the complexity of the computational/statistical algorithms, and certain types of family designs, such as incorporating data from extended families, have rarely been used. We present a Workbench for Integrated Superfast Association studies for Related Data (WISARD) programmed in C/C++. WISARD enables fast and comprehensive analysis of SNP-chip and next-generation sequencing data on extended families, with applications from designing genetic studies to summarizing analysis results. In addition, WISARD can automatically be run in a fully multithreaded manner, and the integration of R software for visualization makes it more accessible to non-experts. Comparison with existing toolsets showed that WISARD is computationally suitable for integrated analysis of related subjects, and demonstrated that WISARD outperforms existing toolsets. WISARD has also been successfully utilized to analyze a large-scale sequencing dataset for chronic obstructive pulmonary disease (COPD), and we identified multiple genes associated with COPD, which demonstrates its practical value.

  4. A new u-statistic with superior design sensitivity in matched observational studies.

    PubMed

    Rosenbaum, Paul R

    2011-09-01

    In an observational or nonrandomized study of treatment effects, a sensitivity analysis indicates the magnitude of bias from unmeasured covariates that would need to be present to alter the conclusions of a naïve analysis that presumes adjustments for observed covariates suffice to remove all bias. The power of sensitivity analysis is the probability that it will reject a false hypothesis about treatment effects allowing for a departure from random assignment of a specified magnitude; in particular, if this specified magnitude is "no departure" then this is the same as the power of a randomization test in a randomized experiment. A new family of u-statistics is proposed that includes Wilcoxon's signed rank statistic but also includes other statistics with substantially higher power when a sensitivity analysis is performed in an observational study. Wilcoxon's statistic has high power to detect small effects in large randomized experiments (that is, it often has good Pitman efficiency), but small effects are invariably sensitive to small unobserved biases. Members of this family of u-statistics that emphasize medium to large effects can have substantially higher power in a sensitivity analysis. For example, in one situation with 250 pair differences that are Normal with expectation 1/2 and variance 1, the power of a sensitivity analysis that uses Wilcoxon's statistic is 0.08 while the power of another member of the family of u-statistics is 0.66. The topic is examined by performing a sensitivity analysis in three observational studies, using an asymptotic measure called the design sensitivity, and by simulating power in finite samples. The three examples are drawn from epidemiology, clinical medicine, and genetic toxicology. © 2010, The International Biometric Society.
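    For readers unfamiliar with the baseline statistic, the sketch below computes Wilcoxon's signed rank statistic on simulated pair differences that are Normal with expectation 1/2 and variance 1, the situation quoted in the abstract. It shows only the familiar member of the family, not the new u-statistics or the sensitivity analysis itself; the function name and the random seed are illustrative choices.

```python
import random


def signed_rank_statistic(differences):
    """Wilcoxon's signed rank statistic: sum of ranks of |d| over positive d.

    Zero differences are dropped and tied absolute values share their
    average rank.
    """
    ordered = sorted((d for d in differences if d != 0), key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    return sum(r for r, d in zip(ranks, ordered) if d > 0)


# 250 simulated pair differences, Normal with mean 1/2 and variance 1.
rng = random.Random(0)
pair_differences = [rng.gauss(0.5, 1.0) for _ in range(250)]
statistic = signed_rank_statistic(pair_differences)
```

    The statistic lies between 0 and n(n + 1)/2, which is 31375 for 250 pairs; values near the upper end indicate mostly positive differences.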

  5. Review: domestic animal forensic genetics - biological evidence, genetic markers, analytical approaches and challenges.

    PubMed

    Kanthaswamy, S

    2015-10-01

    This review highlights the importance of domestic animal genetic evidence sources, genetic testing, markers and analytical approaches as well as the challenges this field is facing in view of the de facto 'gold standard' human DNA identification. Because of the genetic similarity between humans and domestic animals, genetic analysis of domestic animal hair, saliva, urine, blood and other biological material has generated vital investigative leads that have been admitted into a variety of court proceedings, including criminal and civil litigation. Information on validated short tandem repeat, single nucleotide polymorphism and mitochondrial DNA markers and public access to genetic databases for forensic DNA analysis is becoming readily available. Although the fundamental aspects of animal forensic genetic testing may be reliable and acceptable, animal forensic testing still lacks the standardized testing protocols that human genetic profiling requires, probably because of the absence of monetary support from government agencies and the difficulty in promoting cooperation among competing laboratories. Moreover, there is a lack of consensus about how best to present the results and expert opinion to comply with court standards and bear judicial scrutiny. This has been the single most persistent challenge ever since the earliest use of domestic animal forensic genetic testing in a criminal case in the mid-1990s. Crime laboratory accreditation ensures that genetic test results have the courts' confidence. Because accreditation requires significant commitments of effort, time and resources, the vast majority of animal forensic genetic laboratories are not accredited, nor are their analysts certified forensic examiners. The relevance of domestic animal forensic genetics in the criminal justice system is undeniable. However, further improvements are needed in a wide range of supporting resources, including standardized quality assurance and control protocols for sample handling, evidence testing, statistical analysis and reporting that meet the rules of scientific acceptance, reliability and human forensic identification standards. © 2015 Stichting International Foundation for Animal Genetics.

  6. Viability of in-house datamarting approaches for population genetics analysis of SNP genotypes

    PubMed Central

    Amigo, Jorge; Phillips, Christopher; Salas, Antonio; Carracedo, Ángel

    2009-01-01

    Background: Databases containing very large amounts of SNP (Single Nucleotide Polymorphism) data are now freely available for researchers interested in medical and/or population genetics applications. While many of these SNP repositories have implemented data retrieval tools for general-purpose mining, these alone cannot cover the broad spectrum of needs of most medical and population genetics studies. Results: To address this limitation, we have built in-house customized data marts from the raw data provided by the largest public databases. In particular, for population genetics analysis based on genotypes we have built a set of data processing scripts that deal with raw data coming from the major SNP variation databases (e.g. HapMap, Perlegen), stripping them into single genotypes and then grouping them into populations, then merged with additional complementary descriptive information extracted from dbSNP. This allows not only in-house standardization and normalization of the genotyping data retrieved from different repositories, but also the calculation of statistical indices from simple allele frequency estimates to more elaborate genetic differentiation tests within populations, together with the ability to combine population samples from different databases. Conclusion: The present study demonstrates the viability of implementing scripts for handling extensive datasets of SNP genotypes with low computational costs, dealing with certain complex issues that arise from the divergent nature and configuration of the most popular SNP repositories. The information contained in these databases can also be enriched with additional information obtained from other complementary databases, in order to build a dedicated data mart. Updating the data structure is straightforward, as well as permitting easy implementation of new external data and the computation of supplementary statistical indices of interest. PMID:19344481
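    The grouping of single genotypes into populations and the simplest statistical index mentioned above, an allele frequency estimate, can be sketched in a few lines. This is a minimal illustration, not the authors' scripts; the record layout, function names and population labels are assumptions.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies from diploid genotypes such as "AG".

    Missing genotypes (None or empty strings) are skipped.
    """
    counts = Counter()
    for genotype in genotypes:
        if not genotype:
            continue
        counts.update(genotype)  # each character is one allele
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


def frequencies_by_population(records):
    """Group (population, genotype) records, then estimate frequencies."""
    grouped = {}
    for population, genotype in records:
        grouped.setdefault(population, []).append(genotype)
    return {pop: allele_frequencies(g) for pop, g in grouped.items()}


# Hypothetical records for one SNP in two population samples.
records = [("CEU", "AA"), ("CEU", "AG"), ("YRI", "GG"), ("YRI", None)]
by_population = frequencies_by_population(records)
```

    Keeping the per-population grouping separate from the frequency estimate makes it straightforward to pool samples that come from different source databases.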

  7. Viability of in-house datamarting approaches for population genetics analysis of SNP genotypes.

    PubMed

    Amigo, Jorge; Phillips, Christopher; Salas, Antonio; Carracedo, Angel

    2009-03-19

    Databases containing very large amounts of SNP (Single Nucleotide Polymorphism) data are now freely available for researchers interested in medical and/or population genetics applications. While many of these SNP repositories have implemented data retrieval tools for general-purpose mining, these alone cannot cover the broad spectrum of needs of most medical and population genetics studies. To address this limitation, we have built in-house customized data marts from the raw data provided by the largest public databases. In particular, for population genetics analysis based on genotypes we have built a set of data processing scripts that deal with raw data coming from the major SNP variation databases (e.g. HapMap, Perlegen), stripping them into single genotypes and then grouping them into populations, then merged with additional complementary descriptive information extracted from dbSNP. This allows not only in-house standardization and normalization of the genotyping data retrieved from different repositories, but also the calculation of statistical indices from simple allele frequency estimates to more elaborate genetic differentiation tests within populations, together with the ability to combine population samples from different databases. The present study demonstrates the viability of implementing scripts for handling extensive datasets of SNP genotypes with low computational costs, dealing with certain complex issues that arise from the divergent nature and configuration of the most popular SNP repositories. The information contained in these databases can also be enriched with additional information obtained from other complementary databases, in order to build a dedicated data mart. Updating the data structure is straightforward, as well as permitting easy implementation of new external data and the computation of supplementary statistical indices of interest.

  8. Transformation of Summary Statistics from Linear Mixed Model Association on All-or-None Traits to Odds Ratio.

    PubMed

    Lloyd-Jones, Luke R; Robinson, Matthew R; Yang, Jian; Visscher, Peter M

    2018-04-01

    Genome-wide association studies (GWAS) have identified thousands of loci that are robustly associated with complex diseases. The use of linear mixed model (LMM) methodology for GWAS is becoming more prevalent due to its ability to control for population structure and cryptic relatedness and to increase power. The odds ratio (OR) is a common measure of the association of a disease with an exposure (e.g., a genetic variant) and is readily available from logistic regression. However, when the LMM is applied to all-or-none traits it provides estimates of genetic effects on the observed 0-1 scale, a different scale to that in logistic regression. This limits the comparability of results across studies, for example in a meta-analysis, and makes the interpretation of the magnitude of an effect from an LMM GWAS difficult. In this study, we derived transformations from the genetic effects estimated under the LMM to the OR that only rely on summary statistics. To test the proposed transformations, we used real genotypes from two large, publicly available data sets to simulate all-or-none phenotypes for a set of scenarios that differ in underlying model, disease prevalence, and heritability. Furthermore, we applied these transformations to GWAS summary statistics for type 2 diabetes generated from 108,042 individuals in the UK Biobank. In both simulation and real-data application, we observed very high concordance between the transformed OR from the LMM and either the simulated truth or estimates from logistic regression. The transformations derived and validated in this study improve the comparability of results from prospective and already performed LMM GWAS on complex diseases by providing a reliable transformation to a common comparative scale for the genetic effects. Copyright © 2018 by the Genetics Society of America.

  9. Rare-Variant Association Analysis: Study Designs and Statistical Tests

    PubMed Central

    Lee, Seunggeung; Abecasis, Gonçalo R.; Boehnke, Michael; Lin, Xihong

    2014-01-01

    Despite the extensive discovery of trait- and disease-associated common variants, much of the genetic contribution to complex traits remains unexplained. Rare variants can explain additional disease risk or trait variability. An increasing number of studies are underway to identify trait- and disease-associated rare variants. In this review, we provide an overview of statistical issues in rare-variant association studies with a focus on study designs and statistical tests. We present the design and analysis pipeline of rare-variant studies and review cost-effective sequencing designs and genotyping platforms. We compare various gene- or region-based association tests, including burden tests, variance-component tests, and combined omnibus tests, in terms of their assumptions and performance. Also discussed are the related topics of meta-analysis, population-stratification adjustment, genotype imputation, follow-up studies, and heritability due to rare variants. We provide guidelines for analysis and discuss some of the challenges inherent in these studies and future research directions. PMID:24995866
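    The idea behind a burden test, collapsing the rare variants in a gene or region into one score per individual before testing for association, can be sketched as follows. This is a simplified illustration under stated assumptions: the frequency threshold, function names and toy genotypes are hypothetical, and a real analysis would follow the collapsing step with a regression or score test.

```python
def burden_scores(genotype_matrix, maf_threshold=0.01):
    """Sum minor-allele counts over rare variants for each individual.

    genotype_matrix[i][j] is the minor-allele count (0, 1 or 2) of
    individual i at variant j. The threshold is an illustrative choice.
    """
    n = len(genotype_matrix)
    n_variants = len(genotype_matrix[0])
    rare = []
    for j in range(n_variants):
        maf = sum(row[j] for row in genotype_matrix) / (2.0 * n)
        if 0.0 < maf <= maf_threshold:
            rare.append(j)
    return [sum(row[j] for j in rare) for row in genotype_matrix]


def mean_burden_difference(scores, is_case):
    """Mean burden in cases minus mean burden in controls."""
    cases = [s for s, c in zip(scores, is_case) if c]
    controls = [s for s, c in zip(scores, is_case) if not c]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


# Toy data: 4 individuals, 3 variants; a loose threshold suits the tiny sample.
genotypes = [[1, 0, 2], [0, 0, 2], [0, 1, 1], [0, 0, 1]]
scores = burden_scores(genotypes, maf_threshold=0.25)
difference = mean_burden_difference(scores, [True, False, True, False])
```

    Summing counts assumes that the rare alleles act in the same direction; variance-component tests relax that assumption.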

  10. Identifying currents in the gene pool for bacterial populations using an integrative approach.

    PubMed

    Tang, Jing; Hanage, William P; Fraser, Christophe; Corander, Jukka

    2009-08-01

    The evolution of bacterial populations has recently become considerably better understood due to large-scale sequencing of population samples. It has become clear that DNA sequences from a multitude of genes, as well as a broad sample coverage of a target population, are needed to obtain a relatively unbiased view of its genetic structure and the patterns of ancestry connected to the strains. However, the traditional statistical methods for evolutionary inference, such as phylogenetic analysis, are associated with several difficulties under such an extensive sampling scenario, in particular when a considerable amount of recombination is anticipated to have taken place. To meet the needs of large-scale analyses of population structure for bacteria, we introduce here several statistical tools for the detection and representation of recombination between populations. Also, we introduce a model-based description of the shape of a population in sequence space, in terms of its molecular variability and affinity towards other populations. Extensive real data from the genus Neisseria are utilized to demonstrate the potential of an approach where these population genetic tools are combined with a phylogenetic analysis. The statistical tools introduced here are freely available in BAPS 5.2 software, which can be downloaded from http://web.abo.fi/fak/mnf/mate/jc/software/baps.html.

  11. A Genealogical Interpretation of Principal Components Analysis

    PubMed Central

    McVean, Gil

    2009-01-01

    Principal components analysis, PCA, is a statistical method commonly used in population genetics to identify structure in the distribution of genetic variation across geographical location and ethnic background. However, while the method is often used to inform about historical demographic processes, little is known about the relationship between fundamental demographic parameters and the projection of samples onto the primary axes. Here I show that for SNP data the projection of samples onto the principal components can be obtained directly from considering the average coalescent times between pairs of haploid genomes. The result provides a framework for interpreting PCA projections in terms of underlying processes, including migration, geographical isolation, and admixture. I also demonstrate a link between PCA and Wright's FST and show that SNP ascertainment has a largely simple and predictable effect on the projection of samples. Using examples from human genetics, I discuss the application of these results to empirical data and the implications for inference. PMID:19834557

  12. Diverse types of genetic variation converge on functional gene networks involved in schizophrenia.

    PubMed

    Gilman, Sarah R; Chang, Jonathan; Xu, Bin; Bawa, Tejdeep S; Gogos, Joseph A; Karayiorgou, Maria; Vitkup, Dennis

    2012-12-01

    Despite the successful identification of several relevant genomic loci, the underlying molecular mechanisms of schizophrenia remain largely unclear. We developed a computational approach (NETBAG+) that allows an integrated analysis of diverse disease-related genetic data using a unified statistical framework. The application of this approach to schizophrenia-associated genetic variations, obtained using unbiased whole-genome methods, allowed us to identify several cohesive gene networks related to axon guidance, neuronal cell mobility, synaptic function and chromosomal remodeling. The genes forming the networks are highly expressed in the brain, with higher brain expression during prenatal development. The identified networks are functionally related to genes previously implicated in schizophrenia, autism and intellectual disability. A comparative analysis of copy number variants associated with autism and schizophrenia suggests that although the molecular networks implicated in these distinct disorders may be related, the mutations associated with each disease are likely to lead, at least on average, to different functional consequences.

  13. Neural-genetic synthesis for state-space controllers based on linear quadratic regulator design for eigenstructure assignment.

    PubMed

    da Fonseca Neto, João Viana; Abreu, Ivanildo Silva; da Silva, Fábio Nogueira

    2010-04-01

    Toward the synthesis of state-space controllers, a neural-genetic model based on the linear quadratic regulator design for the eigenstructure assignment of multivariable dynamic systems is presented. The neural-genetic model represents a fusion of a genetic algorithm and a recurrent neural network (RNN) to perform the selection of the weighting matrices and the algebraic Riccati equation solution, respectively. A fourth-order electric circuit model is used to evaluate the convergence of the computational intelligence paradigms and the control design method performance. The genetic search convergence evaluation is performed in terms of the fitness function statistics and the RNN convergence, which is evaluated by landscapes of the energy and norm, as a function of the parameter deviations. The control problem solution is evaluated in the time and frequency domains by the impulse response, singular values, and modal analysis.

  14. Readiness of adolescents to use genetically modified organisms according to their knowledge and emotional attitude towards GMOs.

    PubMed

    Lachowski, Stanisław; Jurkiewicz, Anna; Choina, Piotr; Florek-Łuszczki, Magdalena; Buczaj, Agnieszka; Goździewska, Małgorzata

    2017-06-07

    Agriculture based on genetically modified organisms plays an increasingly important role in feeding the world population, which is evidenced by a considerable growth in the size of land under genetically modified (GM) crops. Uncertainty and controversy around GM products are mainly due to the lack of accurate and reliable information, and lack of knowledge concerning the essence of genetic modifications, and the effect of GM food on the human organism, and consequently, a negative emotional attitude towards what is unknown. The objective of the presented study was to discover to what extent knowledge and the emotional attitude of adolescents towards genetically modified organisms are related to acceptance of growing genetically modified plants or breeding GM animals on their own farm or allotment garden, and the purchase and consumption of GM food, as well as the use of GMOs in medicine. The study was conducted by the method of a diagnostic survey using a questionnaire designed by the author, which covered a group of 500 adolescents completing secondary school on the level of maturity examination. The collected material was subjected to statistical analysis. Research hypotheses were verified using the chi-square test (χ²), Student's t-test, and stepwise regression analysis. Stepwise regression analysis showed that the readiness of adolescents to use genetically modified organisms as food or for the production of pharmaceuticals, or to produce GM plants or animals on their own farm, depends on an emotional-evaluative attitude towards GMOs, and the level of knowledge concerning the essence of genetic modifications.
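    A chi-square test of the kind named above compares observed counts in a contingency table with the counts expected if the two classifications were independent. The sketch below computes Pearson's statistic and its degrees of freedom; the table is hypothetical and is not data from the study.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical 2 x 2 table: rows are knowledge level (low, high),
# columns are willingness to buy GM food (no, yes).
statistic, dof = chi_square_statistic([[10, 20], [20, 10]])
```

    The statistic is then compared with a chi-square distribution with `dof` degrees of freedom to obtain a p-value.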

  15. Integrating Nonadditive Genomic Relationship Matrices into the Study of Genetic Architecture of Complex Traits.

    PubMed

    Nazarian, Alireza; Gezan, Salvador A

    2016-03-01

    The study of genetic architecture of complex traits has been dramatically influenced by implementing genome-wide analytical approaches during recent years. Of particular interest are genomic prediction strategies which make use of genomic information for predicting phenotypic responses instead of detecting trait-associated loci. In this work, we present the results of a simulation study to improve our understanding of the statistical properties of estimation of genetic variance components of complex traits, and of additive, dominance, and genetic effects through best linear unbiased prediction methodology. Simulated dense marker information was used to construct genomic additive and dominance matrices, and multiple alternative pedigree- and marker-based models were compared to determine if including a dominance term into the analysis may improve the genetic analysis of complex traits. Our results showed that a model containing a pedigree- or marker-based additive relationship matrix along with a pedigree-based dominance matrix provided the best partitioning of genetic variance into its components, especially when some degree of true dominance effects was expected to exist. Also, we noted that the use of a marker-based additive relationship matrix along with a pedigree-based dominance matrix had the best performance in terms of accuracy of correlations between true and estimated additive, dominance, and genetic effects. © The American Genetic Association 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
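    To make the notion of a marker-based additive relationship matrix concrete, the sketch below uses one common construction, G = ZZ' / (2 Σ p(1 − p)), where Z holds allele counts centred by twice the allele frequency. The abstract does not state which construction the authors used, so this choice, the function name and the toy genotypes are assumptions.

```python
def additive_relationship_matrix(genotypes):
    """Marker-based additive relationship matrix G = Z Z' / (2 * sum p(1 - p)).

    genotypes[i][j] is the count (0, 1 or 2) of a reference allele for
    individual i at marker j. At least one marker must be polymorphic,
    otherwise the scaling term is zero.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    # Centre each marker by twice its allele frequency.
    centred = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(centred[i], centred[k])) / scale
         for k in range(n)]
        for i in range(n)
    ]


# Toy example: three individuals genotyped at two markers.
G = additive_relationship_matrix([[0, 2], [2, 0], [1, 1]])
```

    A dominance matrix is built in a similar way from a heterozygosity coding of the same genotypes, which is what allows the two variance components to be separated.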

  16. The power to detect linkage in complex disease by means of simple LOD-score analyses.

    PubMed Central

    Greenberg, D A; Abreu, P; Hodge, S E

    1998-01-01

    Maximum-likelihood analysis (via LOD score) provides the most powerful method for finding linkage when the mode of inheritance (MOI) is known. However, because one must assume an MOI, the application of LOD-score analysis to complex disease has been questioned. Although it is known that one can legitimately maximize the maximum LOD score with respect to genetic parameters, this approach raises three concerns: (1) multiple testing, (2) effect on power to detect linkage, and (3) adequacy of the approximate MOI for the true MOI. We evaluated the power of LOD scores to detect linkage when the true MOI was complex but a LOD score analysis assumed simple models. We simulated data from 14 different genetic models, including dominant and recessive at high (80%) and low (20%) penetrances, intermediate models, and several additive two-locus models. We calculated LOD scores by assuming two simple models, dominant and recessive, each with 50% penetrance, then took the higher of the two LOD scores as the raw test statistic and corrected for multiple tests. We call this test statistic "MMLS-C." We found that the ELODs for MMLS-C are >=80% of the ELOD under the true model when the ELOD for the true model is >=3. Similarly, the power to reach a given LOD score was usually >=80% that of the true model, when the power under the true model was >=60%. These results underscore that a critical factor in LOD-score analysis is the MOI at the linked locus, not that of the disease or trait per se. Thus, a limited set of simple genetic models in LOD-score analysis can work well in testing for linkage. PMID:9718328
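    The MMLS-C statistic described above takes the larger of two LOD scores and then corrects for having tested two models. The sketch below illustrates that step together with the textbook LOD score for phase-known meioses; it does not reproduce the pedigree likelihoods or penetrance models of the study, and the size of the correction is left as a caller-supplied parameter because the abstract does not state it.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD score for phase-known meioses at recombination fraction theta.

    Compares the likelihood at theta with the likelihood at theta = 0.5
    (no linkage). theta must lie in [0, 0.5]; theta = 0 is only valid
    when there are no recombinants.
    """
    nonrecombinants = meioses - recombinants
    log_likelihood = 0.0
    if recombinants:
        log_likelihood += recombinants * math.log10(theta)
    if nonrecombinants:
        log_likelihood += nonrecombinants * math.log10(1.0 - theta)
    return log_likelihood - meioses * math.log10(0.5)


def mmls_c(lod_dominant, lod_recessive, *, correction):
    """Larger of two model-specific LOD scores minus a correction.

    The correction (in LOD units) accounts for testing two models and
    must be chosen by the caller; no value is taken from the abstract.
    """
    return max(lod_dominant, lod_recessive) - correction


# Hypothetical LOD scores under the two assumed models and a hypothetical
# correction of 0.3 LOD units.
corrected = mmls_c(2.1, 3.4, correction=0.3)
```

    Keeping the correction explicit makes it clear that the reported statistic depends on how the multiple-testing penalty was chosen.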

  17. The power to detect linkage in complex disease by means of simple LOD-score analyses.

    PubMed

    Greenberg, D A; Abreu, P; Hodge, S E

    1998-09-01

    Maximum-likelihood analysis (via LOD score) provides the most powerful method for finding linkage when the mode of inheritance (MOI) is known. However, because one must assume an MOI, the application of LOD-score analysis to complex disease has been questioned. Although it is known that one can legitimately maximize the maximum LOD score with respect to genetic parameters, this approach raises three concerns: (1) multiple testing, (2) effect on power to detect linkage, and (3) adequacy of the approximate MOI for the true MOI. We evaluated the power of LOD scores to detect linkage when the true MOI was complex but a LOD score analysis assumed simple models. We simulated data from 14 different genetic models, including dominant and recessive at high (80%) and low (20%) penetrances, intermediate models, and several additive two-locus models. We calculated LOD scores by assuming two simple models, dominant and recessive, each with 50% penetrance, then took the higher of the two LOD scores as the raw test statistic and corrected for multiple tests. We call this test statistic "MMLS-C." We found that the ELODs for MMLS-C are >=80% of the ELOD under the true model when the ELOD for the true model is >=3. Similarly, the power to reach a given LOD score was usually >=80% that of the true model, when the power under the true model was >=60%. These results underscore that a critical factor in LOD-score analysis is the MOI at the linked locus, not that of the disease or trait per se. Thus, a limited set of simple genetic models in LOD-score analysis can work well in testing for linkage.

  18. Cryptic or pseudocryptic: can morphological methods inform copepod taxonomy? An analysis of publications and a case study of the Eurytemora affinis species complex

    PubMed Central

    Lajus, Dmitry; Sukhikh, Natalia; Alekseev, Victor

    2015-01-01

    Interest in cryptic species has increased significantly with current progress in genetic methods. The large number of cryptic species suggests that the resolution of traditional morphological techniques may be insufficient for taxonomical research. However, some species now considered to be cryptic may, in fact, be designated pseudocryptic after close morphological examination. Thus the “cryptic or pseudocryptic” dilemma speaks to the resolution of morphological analysis and its utility for identifying species. We address this dilemma first by systematically reviewing data published from 1980 to 2013 on cryptic species of Copepoda and then by performing an in-depth morphological study of the former Eurytemora affinis complex of cryptic species. Analyzing the published data showed that, in 5 of 24 revisions eligible for systematic review, cryptic species assignment was based solely on the genetic variation of forms without detailed morphological analysis to confirm the assignment. Therefore, some newly described cryptic species might be designated pseudocryptic under more detailed morphological analysis, as happened with the Eurytemora affinis complex. Recent genetic analyses of the complex found high levels of heterogeneity without morphological differences, and the complex was therefore argued to be cryptic. However, subsequent detailed morphological analyses made it possible to describe a number of valid species. Our study of this species complex, using in-depth statistical analyses not usually applied when describing new species, confirmed considerable differences between the former cryptic species. In particular, fluctuating asymmetry (FA), the random variation of left and right structures, was significantly different between forms and provided independent information about their status. Our work showed that multivariate statistical approaches, such as principal component analysis, can be powerful techniques for the morphological discrimination of cryptic taxa. Despite increasing cryptic species designations, morphological techniques have great potential in determining copepod taxonomy. PMID:26120427

  19. An evidence-based approach to globally assess the covariate-dependent effect of the MTHFR single nucleotide polymorphism rs1801133 on blood homocysteine: a systematic review and meta-analysis.

    PubMed

    Jin, Huifeng; Cheng, Haojie; Chen, Wei; Sheng, Xiaoming; Levy, Mark A; Brown, Mark J; Tian, Junqiang

    2018-05-01

    The single nucleotide polymorphism of the gene 5,10-methylenetetrahydrofolate reductase (MTHFR) C677T (or rs1801133) is the most established genetic factor that increases plasma total homocysteine (tHcy) and consequently results in hyperhomocysteinemia. Yet, given the limited penetrance of this genetic variant, it is necessary to individually predict the risk of hyperhomocysteinemia for an rs1801133 carrier. We hypothesized that variability in this genetic risk is largely due to the presence of factors (covariates) that serve as effect modifiers, confounders, or both, such as folic acid (FA) intake, and aimed to assess this risk in the complex context of these covariates. We systematically extracted from published studies the data on tHcy, rs1801133, and any previously reported rs1801133 covariates. The resulting metadata set was first used to analyze the covariates' modifying effect by meta-regression and other statistical means. Subsequently, we controlled for this modifying effect by genotype-stratifying tHcy data and analyzed the variability in the risk resulting from the confounding of covariates. The data set contains data on 36 rs1801133 covariates that were collected from 114,799 participants and 256 qualified studies, among which 6 covariates (sex, age, race, FA intake, smoking, and alcohol consumption) are the most frequently informed and therefore included for statistical analysis. The effect of rs1801133 on tHcy exhibits significant variability that can be attributed to effect modification as well as confounding by these covariates. Via statistical modeling, we predicted the covariate-dependent risk of tHcy elevation and hyperhomocysteinemia in a systematic manner. We showed an evidence-based approach that globally assesses the covariate-dependent effect of rs1801133 on tHcy. The results should assist clinicians in interpreting the rs1801133 data from genetic testing for their patients. 
    Such information is also important for the public, who increasingly receive genetic data from commercial services without interpretation of its clinical relevance. This study was registered at Research Registry with the registration number reviewregistry328.

  20. Identification of crop cultivars with consistently high lignocellulosic sugar release requires the use of appropriate statistical design and modelling

    PubMed Central

    2013-01-01

    Background: In this study, a multi-parent population of barley cultivars was grown in the field for two consecutive years and then straw saccharification (sugar release by enzymes) was subsequently analysed in the laboratory to identify the cultivars with the highest consistent sugar yield. This experiment was used to assess the benefit of accounting for both the multi-phase and multi-environment aspects of large-scale phenotyping experiments with field-grown germplasm through sound statistical design and analysis. Results: Complementary designs at both the field and laboratory phases of the experiment ensured that non-genetic sources of variation could be separated from the genetic variation of cultivars, which was the main target of the study. The field phase included biological replication and plot randomisation. The laboratory phase employed re-randomisation and technical replication of samples within a batch, with a subset of cultivars chosen as duplicates that were randomly allocated across batches. The resulting data was analysed using a linear mixed model that incorporated field and laboratory variation and a cultivar by trial interaction, and ensured that the cultivar means were more accurately represented than if the non-genetic variation was ignored. The heritability detected was more than doubled in each year of the trial by accounting for the non-genetic variation in the analysis, clearly showing the benefit of this design and approach. Conclusions: The importance of accounting for both field and laboratory variation, as well as the cultivar by trial interaction, by fitting a single statistical model (multi-environment trial, MET, model), was evidenced by the changes in list of the top 40 cultivars showing the highest sugar yields. Failure to account for this interaction resulted in only eight cultivars that were consistently in the top 40 in different years. The correspondence between the rankings of cultivars was much higher at 25 in the MET model. This approach is suited to any multi-phase and multi-environment population-based genetic experiment. PMID:24359577

  1. Methods for meta-analysis of multiple traits using GWAS summary statistics.

    PubMed

    Ray, Debashree; Boehnke, Michael

    2018-03-01

    Genome-wide association studies (GWAS) for complex diseases have focused primarily on single-trait analyses for disease status and disease-related quantitative traits. For example, GWAS on risk factors for coronary artery disease analyze genetic associations of plasma lipids such as total cholesterol, LDL-cholesterol, HDL-cholesterol, and triglycerides (TGs) separately. However, traits are often correlated and a joint analysis may yield increased statistical power for association over multiple univariate analyses. Recently several multivariate methods have been proposed that require individual-level data. Here, we develop metaUSAT (where USAT is unified score-based association test), a novel unified association test of a single genetic variant with multiple traits that uses only summary statistics from existing GWAS. Although the existing methods either perform well when most correlated traits are affected by the genetic variant in the same direction or are powerful when only a few of the correlated traits are associated, metaUSAT is designed to be robust to the association structure of correlated traits. metaUSAT does not require individual-level data and can test genetic associations of categorical and/or continuous traits. One can also use metaUSAT to analyze a single trait over multiple studies, appropriately accounting for overlapping samples, if any. metaUSAT provides an approximate asymptotic P-value for association and is computationally efficient for implementation at a genome-wide level. Simulation experiments show that metaUSAT maintains proper type-I error at low error levels. It has similar and sometimes greater power to detect association across a wide array of scenarios compared to existing methods, which are usually powerful for some specific association scenarios only. 
    When applied to plasma lipids summary data from the METSIM and the T2D-GENES studies, metaUSAT detected genome-wide significant loci beyond the ones identified by univariate analyses. Evidence from larger studies suggests that the variants additionally detected by our test are, indeed, associated with lipid levels in humans. In summary, metaUSAT can provide novel insights into the genetic architecture of a common disease or traits. © 2017 WILEY PERIODICALS, INC.

  2. HLA-linked rheumatoid arthritis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hasstedt, S.J.; Clegg, D.O.; Ingles, L.

    Twenty-eight pedigrees were ascertained through pairs of first-degree relatives diagnosed with rheumatoid arthritis (RA). RA was confirmed in 77 pedigree members including probands; the absence of disease was verified in an additional 261 pedigree members. Pedigree members were serologically typed for HLA. We used likelihood analysis to statistically characterize the HLA-linked RA susceptibility locus. The genetic model assumed tight linkage to HLA. The analysis supported the existence of an HLA-linked RA susceptibility locus, estimated the lifetime penetrance as 41% in male homozygotes and as 48% in female homozygotes. Inheritance was recessive in males and was nearly recessive in females. In addition, the analysis attributed 78% of the variance within genotypes to genetic or environmental effects shared by siblings. The genetic model inferred in this analysis is consistent with previous association, linkage, and familial aggregation studies of RA. The inferred HLA-linked RA susceptibility locus accounts for approximately one-fifth of the RA in the population. Although other genes may account for the remaining familial RA, a large portion of RA cases may occur sporadically. 79 refs., 9 tabs.

  3. Bilirubin and Stroke Risk Using a Mendelian Randomization Design.

    PubMed

    Lee, Sun Ju; Jee, Yon Ho; Jung, Keum Ji; Hong, Seri; Shin, Eun Soon; Jee, Sun Ha

    2017-05-01

    Circulating bilirubin, a natural antioxidant, is associated with decreased risk of stroke. However, the nature of the relationship between the two remains unknown. We used a Mendelian randomization analysis to assess the causal effect of serum bilirubin on stroke risk in Koreans. The 14 single-nucleotide polymorphisms (SNPs) (<10^-7) including rs6742078 of uridine diphosphoglucuronyl-transferase were selected from genome-wide association study of bilirubin level in the KCPS-II (Korean Cancer Prevention Study-II) Biobank subcohort consisting of 4793 healthy Korean and 806 stroke cases. Weighted genetic risk score was calculated using 14 SNPs selected from the top SNPs. Both rs6742078 (F statistics=138) and weighted genetic risk score with 14 SNPs (F statistics=187) were strongly associated with bilirubin levels. Simultaneously, serum bilirubin level was associated with decreased risk of stroke in an ordinary least-squares analysis. However, in 2-stage least-squares Mendelian randomization analysis, no causal relationship between serum bilirubin and stroke risk was found. There is no evidence that bilirubin level is causally associated with risk of stroke in Koreans. Therefore, bilirubin level is not a risk determinant of stroke. © 2017 American Heart Association, Inc.
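
    The contrast this abstract draws between ordinary least squares and two-stage least squares can be sketched on simulated data. The example below is an illustration under stated assumptions, not the study's analysis: it uses one instrument (a SNP dosage), a continuous outcome rather than stroke status, and an unobserved confounder that raises the exposure and lowers the outcome. All function names, effect sizes, and the allele frequency are hypothetical. With a single instrument the 2SLS estimate reduces to the ratio cov(Z, Y) / cov(Z, X).

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Observational estimate: slope from regressing y on x."""
    return cov(x, y) / cov(x, x)


def iv_estimate(z, x, y):
    """Single-instrument 2SLS estimate, equal to cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)


def first_stage_f(z, x):
    """F statistic of the first-stage regression of x on z (instrument strength)."""
    r2 = cov(z, x) ** 2 / (cov(z, z) * cov(x, x))
    return r2 * (len(z) - 2) / (1.0 - r2)


def simulate(n, causal_effect, seed=0):
    """Simulate instrument z, exposure x, outcome y with a hidden confounder u."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        dosage = sum(rng.random() < 0.3 for _ in range(2))  # 0, 1 or 2 alleles
        u = rng.gauss(0.0, 1.0)                             # unobserved confounder
        exposure = 0.5 * dosage + u + rng.gauss(0.0, 1.0)
        outcome = causal_effect * exposure - u + rng.gauss(0.0, 1.0)
        z.append(dosage)
        x.append(exposure)
        y.append(outcome)
    return z, x, y


if __name__ == "__main__":
    z, x, y = simulate(n=20000, causal_effect=0.0, seed=1)
    print(f"OLS slope:     {ols_slope(x, y):.3f}")
    print(f"IV estimate:   {iv_estimate(z, x, y):.3f}")
    print(f"First-stage F: {first_stage_f(z, x):.0f}")
```

    In this simulation the true causal effect is zero, yet confounding produces a clearly negative OLS slope, while the instrumental-variable estimate stays close to zero. That is the same qualitative pattern the abstract reports.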

  4. The aminoacyl-tRNA synthetases had only a marginal role in the origin of the organization of the genetic code: Evidence in favor of the coevolution theory.

    PubMed

    Di Giulio, Massimo

    2017-11-07

    The coevolution theory of the origin of the genetic code suggests that the organization of the genetic code coevolved with the biosynthetic relationships between amino acids. The mechanism that allowed this coevolution was based on tRNA-like molecules on which, according to this theory, the biosynthetic transformations between amino acids would have occurred. This mechanism makes a prediction about the role that the aminoacyl-tRNA synthetases (ARSs) should have played in the origin of the genetic code. Indeed, if the biosynthetic transformations between amino acids occurred on tRNA-like molecules, then there was no need to link amino acids to these molecules because amino acids were already charged on tRNA-like molecules, as the coevolution theory suggests. In spite of the fact that ARSs make the genetic code responsible for the first interaction between a component of nucleic acids and that of proteins, for the coevolution theory the role of ARSs should have been entirely marginal in the genetic code origin. Therefore, I have conducted a further analysis of the distribution of the two classes of ARSs and of their subclasses in the genetic code table, in order to perform a falsification test of the coevolution theory. Indeed, if the distribution of ARSs within the genetic code were highly significant, then the coevolution theory would be falsified, since the mechanism on which it is based would not predict a fundamental role of ARSs in the origin of the genetic code. I found that the statistical significance of the distribution of the two classes of ARSs in the table of the genetic code is low or marginal, whereas that of the subclasses of ARSs is statistically significant. However, this is in perfect agreement with the postulates of the coevolution theory. Indeed, the only case of statistical significance regarding the classes of ARSs is appreciable for the CAG code, whereas for its complement, the UNN/NUN code, only a marginal significance is measurable. These two codes codify roughly for the two ARS classes: the CAG code for class II and the UNN/NUN code for class I. Furthermore, the subclasses of ARSs show a statistical significance of their distribution in the genetic code table. Nevertheless, the more sensible explanation for these observations would be the following. The observation that would link the two classes of ARSs to the CAG and UNN/NUN codes, and the statistical significance of the distribution of the subclasses of ARSs in the genetic code table, would be only a secondary effect due to the highly significant distribution of the polarity of amino acids and their biosynthetic relationships in the genetic code. That is to say, the polarity of amino acids and their biosynthetic relationships would have conditioned the evolution of ARSs so that their presence in the genetic code would have been detectable, even if the ARSs would not have, on their own, directly influenced the evolutionary organization of the genetic code. In other words, the role that ARSs had in the origin of the genetic code would have been entirely marginal. This conclusion would be in perfect accord with the predictions of the coevolution theory. Conversely, this conclusion would be in contrast, at least partially, with the physicochemical theories of the origin of the genetic code because they would foresee an absolutely more active role of ARSs in the origin of the organization of the genetic code. Copyright © 2017 Elsevier Ltd. All rights reserved.

  5. The fine-scale genetic structure and evolution of the Japanese population.

    PubMed

    Takeuchi, Fumihiko; Katsuya, Tomohiro; Kimura, Ryosuke; Nabika, Toru; Isomura, Minoru; Ohkubo, Takayoshi; Tabara, Yasuharu; Yamamoto, Ken; Yokota, Mitsuhiro; Liu, Xuanyao; Saw, Woei-Yuh; Mamatyusupu, Dolikun; Yang, Wenjun; Xu, Shuhua; Teo, Yik-Ying; Kato, Norihiro

    2017-01-01

    The contemporary Japanese populations largely consist of three genetically distinct groups: Hondo, Ryukyu and Ainu. By principal-component analysis, while the three groups can be clearly separated, the Hondo people, comprising 99% of the Japanese, form one almost indistinguishable cluster. To understand fine-scale genetic structure, we applied powerful haplotype-based statistical methods to genome-wide single nucleotide polymorphism data from 1600 Japanese individuals, sampled from eight distinct regions in Japan. We then combined the Japanese data with 26 other Asian populations data to analyze the shared ancestry and genetic differentiation. We found that the Japanese could be separated into nine genetic clusters in our dataset, showing a marked concordance with geography; and that major components of ancestry profile of Japanese were from the Korean and Han Chinese clusters. We also detected and dated admixture in the Japanese. While genetic differentiation between Ryukyu and Hondo was suggested to be caused in part by positive selection, genetic differentiation among the Hondo clusters appeared to result principally from genetic drift. Notably, in Asians, we found the possibility that positive selection accentuated genetic differentiation among distant populations but attenuated genetic differentiation among close populations. These findings are significant for studies of human evolution and medical genetics.

  6. metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis.

    PubMed

    Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti

    2016-07-01

    A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Code is available at https://github.com/aalto-ics-kepaco. Contacts: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  7. metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis

    PubMed Central

    Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J.; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T.; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti

    2016-01-01

    Motivation: A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. Results: We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Availability and implementation: Code is available at https://github.com/aalto-ics-kepaco Contacts: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153689

  8. Short communication: Genetic association between schizophrenia and cannabis use.

    PubMed

    Verweij, Karin J H; Abdellaoui, Abdel; Nivard, Michel G; Sainz Cort, Alberto; Ligthart, Lannie; Draisma, Harmen H M; Minică, Camelia C; Gillespie, Nathan A; Willemsen, Gonneke; Hottenga, Jouke-Jan; Boomsma, Dorret I; Vink, Jacqueline M

    2017-02-01

    Previous studies have shown a relationship between schizophrenia and cannabis use. As both traits are substantially heritable, a shared genetic liability could explain the association. We use two recently developed genomics methods to investigate the genetic overlap between schizophrenia and cannabis use. Firstly, polygenic risk scores for schizophrenia were created based on summary statistics from the largest schizophrenia genome-wide association (GWA) meta-analysis to date. We analysed the association between these schizophrenia polygenic scores and multiple cannabis use phenotypes (lifetime use, regular use, age at initiation, and quantity and frequency of use) in a sample of 6,931 individuals. Secondly, we applied LD-score regression to the GWA summary statistics of schizophrenia and lifetime cannabis use to calculate the genome-wide genetic correlation. Polygenic risk scores for schizophrenia were significantly (α<0.05) associated with five of the eight cannabis use phenotypes, including lifetime use, regular use, and quantity of use, with risk scores explaining up to 0.5% of the variance. Associations were not significant for age at initiation of use and two measures of frequency of use analyzed in lifetime users only, potentially because of reduced power due to a smaller sample size. The LD-score regression revealed a significant genetic correlation of r_g=0.22 (SE=0.07, p=0.003) between schizophrenia and lifetime cannabis use. Common genetic variants underlying schizophrenia and lifetime cannabis use are partly overlapping. Individuals with a stronger genetic predisposition to schizophrenia are more likely to initiate cannabis use, use cannabis more regularly, and consume more cannabis over their lifetime. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
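
    A polygenic risk score of the kind described here is, in its simplest form, a weighted sum of risk-allele dosages, with weights taken from GWA summary statistics. The sketch below shows that sum and the share of phenotypic variance a score explains (the squared Pearson correlation). It is a minimal illustration, not the authors' pipeline: the function names and all numbers are hypothetical, and steps such as SNP selection by p-value threshold and LD pruning are omitted.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2) for one individual.

    `weights` are per-SNP effect sizes, e.g. log odds ratios from summary statistics.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


def variance_explained(scores, phenotypes):
    """Squared Pearson correlation between scores and a quantitative phenotype."""
    n = len(scores)
    if n != len(phenotypes) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    ms = sum(scores) / n
    mp = sum(phenotypes) / n
    sxy = sum((s - ms) * (p - mp) for s, p in zip(scores, phenotypes))
    sxx = sum((s - ms) ** 2 for s in scores)
    syy = sum((p - mp) ** 2 for p in phenotypes)
    return sxy * sxy / (sxx * syy)


if __name__ == "__main__":
    weights = [0.10, -0.05, 0.20]                # hypothetical effect sizes
    genotypes = [[2, 1, 0], [0, 0, 2], [1, 2, 1], [2, 0, 2]]
    phenotype = [1.2, 2.9, 1.8, 3.5]             # hypothetical trait values
    scores = [polygenic_score(g, weights) for g in genotypes]
    print("scores:", [round(s, 3) for s in scores])
    print("variance explained:", round(variance_explained(scores, phenotype), 3))
```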

  9. A Powerful Procedure for Pathway-Based Meta-analysis Using Summary Statistics Identifies 43 Pathways Associated with Type II Diabetes in European Populations.

    PubMed

    Zhang, Han; Wheeler, William; Hyland, Paula L; Yang, Yifan; Shi, Jianxin; Chatterjee, Nilanjan; Yu, Kai

    2016-06-01

    Meta-analysis of multiple genome-wide association studies (GWAS) has become an effective approach for detecting single nucleotide polymorphism (SNP) associations with complex traits. However, it is difficult to integrate the readily accessible SNP-level summary statistics from a meta-analysis into more powerful multi-marker testing procedures, which generally require individual-level genetic data. We developed a general procedure called Summary based Adaptive Rank Truncated Product (sARTP) for conducting gene and pathway meta-analysis that uses only SNP-level summary statistics in combination with genotype correlation estimated from a panel of individual-level genetic data. We demonstrated the validity and power advantage of sARTP through empirical and simulated data. We conducted a comprehensive pathway-based meta-analysis with sARTP on type 2 diabetes (T2D) by integrating SNP-level summary statistics from two large studies consisting of 19,809 T2D cases and 111,181 controls with European ancestry. Among 4,713 candidate pathways from which genes in neighborhoods of 170 GWAS established T2D loci were excluded, we detected 43 T2D globally significant pathways (with Bonferroni corrected p-values < 0.05), which included the insulin signaling pathway and T2D pathway defined by KEGG, as well as the pathways defined according to specific gene expression patterns on pancreatic adenocarcinoma, hepatocellular carcinoma, and bladder carcinoma. Using summary data from 8 eastern Asian T2D GWAS with 6,952 cases and 11,865 controls, we showed 7 out of the 43 pathways identified in European populations remained to be significant in eastern Asians at the false discovery rate of 0.1. We created an R package and a web-based tool for sARTP with the capability to analyze pathways with thousands of genes and tens of thousands of SNPs.
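
    This abstract relies on two multiple-testing criteria: Bonferroni-corrected p-values below 0.05 for the discovery set and a false discovery rate of 0.1 for the replication set. The sketch below shows both adjustments for a list of p-values. It does not implement sARTP itself, the function names are hypothetical, and the Benjamini-Hochberg procedure is assumed here as the FDR method because the abstract does not name one.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending by p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.20]  # hypothetical pathway-level p-values
    print("Bonferroni:", bonferroni(pvals))
    print("BH:        ", [round(q, 4) for q in benjamini_hochberg(pvals)])
    # With 4,713 candidate pathways, a Bonferroni-corrected 0.05 level
    # corresponds to a raw p-value threshold of 0.05 / 4713.
    print("raw threshold:", 0.05 / 4713)
```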

  10. A Powerful Procedure for Pathway-Based Meta-analysis Using Summary Statistics Identifies 43 Pathways Associated with Type II Diabetes in European Populations

    PubMed Central

    Zhang, Han; Wheeler, William; Hyland, Paula L.; Yang, Yifan; Shi, Jianxin; Chatterjee, Nilanjan; Yu, Kai

    2016-01-01

    Meta-analysis of multiple genome-wide association studies (GWAS) has become an effective approach for detecting single nucleotide polymorphism (SNP) associations with complex traits. However, it is difficult to integrate the readily accessible SNP-level summary statistics from a meta-analysis into more powerful multi-marker testing procedures, which generally require individual-level genetic data. We developed a general procedure called Summary based Adaptive Rank Truncated Product (sARTP) for conducting gene and pathway meta-analysis that uses only SNP-level summary statistics in combination with genotype correlation estimated from a panel of individual-level genetic data. We demonstrated the validity and power advantage of sARTP through empirical and simulated data. We conducted a comprehensive pathway-based meta-analysis with sARTP on type 2 diabetes (T2D) by integrating SNP-level summary statistics from two large studies consisting of 19,809 T2D cases and 111,181 controls with European ancestry. Among 4,713 candidate pathways from which genes in neighborhoods of 170 GWAS established T2D loci were excluded, we detected 43 T2D globally significant pathways (with Bonferroni corrected p-values < 0.05), which included the insulin signaling pathway and T2D pathway defined by KEGG, as well as the pathways defined according to specific gene expression patterns on pancreatic adenocarcinoma, hepatocellular carcinoma, and bladder carcinoma. Using summary data from 8 eastern Asian T2D GWAS with 6,952 cases and 11,865 controls, we showed 7 out of the 43 pathways identified in European populations remained to be significant in eastern Asians at the false discovery rate of 0.1. We created an R package and a web-based tool for sARTP with the capability to analyze pathways with thousands of genes and tens of thousands of SNPs. PMID:27362418

  11. FGWAS: Functional genome wide association analysis.

    PubMed

    Huang, Chao; Thompson, Paul; Wang, Yalin; Yu, Yang; Zhang, Jingwen; Kong, Dehan; Colen, Rivka R; Knickmeyer, Rebecca C; Zhu, Hongtu

    2017-10-01

    Functional phenotypes (e.g., subcortical surface representation), which commonly arise in imaging genetic studies, have been used to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. However, existing statistical methods largely ignore the functional features (e.g., functional smoothness and correlation). The aim of this paper is to develop a functional genome-wide association analysis (FGWAS) framework to efficiently carry out whole-genome analyses of functional phenotypes. FGWAS consists of three components: a multivariate varying coefficient model, a global sure independence screening procedure, and a test procedure. Compared with the standard multivariate regression model, the multivariate varying coefficient model explicitly models the functional features of functional phenotypes through the integration of smooth coefficient functions and functional principal component analysis. Statistically, compared with existing methods for genome-wide association studies (GWAS), FGWAS can substantially boost the detection power for discovering important genetic variants influencing brain structure and function. Simulation studies show that FGWAS outperforms existing GWAS methods for searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. We have successfully applied FGWAS to large-scale analysis of data from the Alzheimer's Disease Neuroimaging Initiative for 708 subjects, 30,000 vertices on the left and right hippocampal surfaces, and 501,584 SNPs. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. The human chromosomal fragile sites more often involved in constitutional deletions and duplications - A genetic and statistical assessment

    NASA Astrophysics Data System (ADS)

    Gomes, Dora Prata; Sequeira, Inês J.; Figueiredo, Carlos; Rueff, José; Brás, Aldina

    2016-12-01

    Human chromosomal fragile sites (CFSs) are heritable loci or regions of the human chromosomes prone to exhibit gaps, breaks and rearrangements. Determining the frequency of deletions and duplications in CFSs may contribute to explain the occurrence of human disease due to those rearrangements. In this study we analyzed the frequency of deletions and duplications in each human CFS. Statistical methods, namely data display, descriptive statistics and linear regression analysis were applied to analyze this dataset. We found that FRA15C, FRA16A and FRAXB are the most frequently involved CFSs in deletions and duplications occurring in the human genome.

  13. Admixture, Population Structure, and F-Statistics.

    PubMed

    Peter, Benjamin M

    2016-04-01

    Many questions about human genetic history can be addressed by examining the patterns of shared genetic variation between sets of populations. A useful methodological framework for this purpose is F-statistics that measure shared genetic drift between sets of two, three, and four populations and can be used to test simple and complex hypotheses about admixture between populations. This article provides context from phylogenetic and population genetic theory. I review how F-statistics can be interpreted as branch lengths or paths and derive new interpretations, using coalescent theory. I further show that the admixture tests can be interpreted as testing general properties of phylogenies, allowing extension of some ideas applications to arbitrary phylogenetic trees. The new results are used to investigate the behavior of the statistics under different models of population structure and show how population substructure complicates inference. The results lead to simplified estimators in many cases, and I recommend to replace F3 with the average number of pairwise differences for estimating population divergence. Copyright © 2016 by the Genetics Society of America.
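
    The two-, three-, and four-population statistics mentioned in this abstract have simple definitions in terms of allele frequencies: F2(A, B) averages (a - b)^2 over loci, F3(C; A, B) averages (c - a)(c - b), and F4(A, B; C, D) averages (a - b)(c - d). The sketch below computes these directly from per-locus frequencies. It assumes the frequencies are known exactly and therefore omits the finite-sample bias corrections used with real data. The function names and example frequencies are hypothetical.

```python
def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def f2(a, b):
    """F2(A, B): mean squared allele-frequency difference across loci."""
    return _mean((ai - bi) ** 2 for ai, bi in zip(a, b))


def f3(c, a, b):
    """F3(C; A, B): mean of (c - a)(c - b); negative values suggest C is admixed."""
    return _mean((ci - ai) * (ci - bi) for ci, ai, bi in zip(c, a, b))


def f4(a, b, c, d):
    """F4(A, B; C, D): mean of (a - b)(c - d), the drift shared by the two pairs."""
    return _mean((ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d))


if __name__ == "__main__":
    pop_a = [0.1, 0.9]   # hypothetical allele frequencies at two loci
    pop_b = [0.9, 0.1]
    pop_c = [0.5, 0.5]   # an equal mixture of A and B
    print("F2(A, B)    =", round(f2(pop_a, pop_b), 4))
    print("F3(C; A, B) =", round(f3(pop_c, pop_a, pop_b), 4))
    print("F4(A, B; A, B) =", round(f4(pop_a, pop_b, pop_a, pop_b), 4))
```

    In this toy example F3(C; A, B) is negative because C sits midway between A and B at every locus. The value also agrees with the identity F3(C; A, B) = (F2(C, A) + F2(C, B) - F2(A, B)) / 2.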

  14. Impacts of early viability selection on management of inbreeding and genetic diversity in conservation.

    PubMed

    Grueber, Catherine E; Hogg, Carolyn J; Ivy, Jamie A; Belov, Katherine

    2015-04-01

    Maintaining genetic diversity is a crucial goal of intensive management of threatened species, particularly for those populations that act as sources for translocation or re-introduction programmes. Most captive genetic management is based on pedigrees and a neutral theory of inheritance, an assumption that may be violated by selective forces operating in captivity. Here, we explore the conservation consequences of early viability selection: differential offspring survival that occurs prior to management or research observations, such as embryo deaths in utero. If early viability selection produces genotypic deviations from Mendelian predictions, it may undermine management strategies intended to minimize inbreeding and maintain genetic diversity. We use empirical examples to demonstrate that straightforward approaches, such as comparing litter sizes of inbred vs. noninbred breeding pairs, can be used to test whether early viability selection likely impacts estimates of inbreeding depression. We also show that comparing multilocus genotype data to pedigree predictions can reveal whether early viability selection drives systematic biases in genetic diversity, patterns that would not be detected using pedigree-based statistics alone. More sophisticated analysis combining genomewide molecular data with pedigree information will enable conservation scientists to test whether early viability selection drives deviations from neutrality across wide stretches of the genome, revealing whether this form of selection biases the pedigree-based statistics and inference upon which intensive management is based. © 2015 John Wiley & Sons Ltd.

  15. Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics.

    PubMed

    Dutheil, Julien; Gaillard, Sylvain; Bazin, Eric; Glémin, Sylvain; Ranwez, Vincent; Galtier, Nicolas; Belkhir, Khalid

    2006-04-04

    A large number of bioinformatics applications in the fields of bio-sequence analysis, molecular evolution and population genetics typically share input/output methods, data storage requirements and data analysis algorithms. Such common features may be conveniently bundled into re-usable libraries, which enable the rapid development of new methods and robust applications. We present Bio++, a set of object-oriented libraries written in C++. Available components include classes for data storage and handling (nucleotide/amino-acid/codon sequences, trees, distance matrices, population genetics datasets), various input/output formats, basic sequence manipulation (concatenation, transcription, translation, etc.), phylogenetic analysis (maximum parsimony, Markov models, distance methods, likelihood computation and maximization), population genetics/genomics (diversity statistics, neutrality tests, various multi-locus analyses) and various algorithms for numerical calculus. Implementation of methods aims at being both efficient and user-friendly. Special care was given to the library design to enable easy extension and new methods development. We defined a general hierarchy of classes that allows the developer to implement their own algorithms while remaining compatible with the rest of the libraries. Bio++ source code is distributed free of charge under the CeCILL general public licence from its website http://kimura.univ-montp2.fr/BioPP.

  16. Multi-site study of additive genetic effects on fractional anisotropy of cerebral white matter: Comparing meta and megaanalytical approaches for data pooling.

    PubMed

    Kochunov, Peter; Jahanshad, Neda; Sprooten, Emma; Nichols, Thomas E; Mandl, René C; Almasy, Laura; Booth, Tom; Brouwer, Rachel M; Curran, Joanne E; de Zubicaray, Greig I; Dimitrova, Rali; Duggirala, Ravi; Fox, Peter T; Hong, L Elliot; Landman, Bennett A; Lemaitre, Hervé; Lopez, Lorna M; Martin, Nicholas G; McMahon, Katie L; Mitchell, Braxton D; Olvera, Rene L; Peterson, Charles P; Starr, John M; Sussmann, Jessika E; Toga, Arthur W; Wardlaw, Joanna M; Wright, Margaret J; Wright, Susan N; Bastin, Mark E; McIntosh, Andrew M; Boomsma, Dorret I; Kahn, René S; den Braber, Anouk; de Geus, Eco J C; Deary, Ian J; Hulshoff Pol, Hilleke E; Williamson, Douglas E; Blangero, John; van 't Ent, Dennis; Thompson, Paul M; Glahn, David C

    2014-07-15

    Combining datasets across independent studies can boost statistical power by increasing the numbers of observations and can achieve more accurate estimates of effect sizes. This is especially important for genetic studies where a large number of observations are required to obtain sufficient power to detect and replicate genetic effects. There is a need to develop and evaluate methods for joint-analytical analyses of rich datasets collected in imaging genetics studies. The ENIGMA-DTI consortium is developing and evaluating approaches for obtaining pooled estimates of heritability through meta- and mega-genetic analytical approaches, to estimate the general additive genetic contributions to the intersubject variance in fractional anisotropy (FA) measured from diffusion tensor imaging (DTI). We used the ENIGMA-DTI data harmonization protocol for uniform processing of DTI data from multiple sites. We evaluated this protocol in five family-based cohorts providing data from a total of 2248 children and adults (ages: 9-85) collected with various imaging protocols. We used the imaging genetics analysis tool, SOLAR-Eclipse, to combine twin and family data from Dutch, Australian and Mexican-American cohorts into one large "mega-family". We showed that heritability estimates may vary from one cohort to another. We used two meta-analytical (the sample-size and standard-error weighted) approaches and a mega-genetic analysis to calculate heritability estimates across populations. We performed leave-one-out analysis of the joint estimates of heritability, removing a different cohort each time to understand the estimate variability. Overall, meta- and mega-genetic analyses of heritability produced robust estimates of heritability. Copyright © 2014 Elsevier Inc. All rights reserved.
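
    The two meta-analytical weightings mentioned in this abstract can be sketched as follows. This is only an illustration: the abstract does not give the exact formulas, so the standard-error weighting is assumed here to be ordinary inverse-variance weighting, and the cohort values and function names are hypothetical.

```python
def pool_by_sample_size(estimates, sizes):
    """Weighted mean of per-cohort estimates, weights proportional to sample size."""
    return sum(h * n for h, n in zip(estimates, sizes)) / sum(sizes)

def pool_by_standard_error(estimates, std_errors):
    """Inverse-variance weighted mean (weight = 1 / SE^2) and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(h * w for h, w in zip(estimates, weights)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    return pooled, pooled_se

def leave_one_out(estimates, sizes):
    """Sample-size weighted pooled estimate with each cohort removed in turn."""
    return [
        pool_by_sample_size(estimates[:i] + estimates[i + 1:], sizes[:i] + sizes[i + 1:])
        for i in range(len(estimates))
    ]

# Hypothetical heritability estimates from three cohorts (illustration only).
h2 = [0.50, 0.70, 0.60]
n = [100, 300, 200]
se = [0.10, 0.20, 0.10]

print(pool_by_sample_size(h2, n))
print(pool_by_standard_error(h2, se))
print(leave_one_out(h2, n))
```

    Comparing the leave-one-out values with the full pooled estimate gives a quick sense of how much any single cohort drives the result.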

  17. The genetic basis of female multiple mating in a polyandrous livebearing fish

    PubMed Central

    Evans, Jonathan P; Gasparini, Clelia

    2013-01-01

    The widespread occurrence of female multiple mating (FMM) demands evolutionary explanation, particularly in the light of the costs of mating. One explanation encapsulated by “good sperm” and “sexy-sperm” (GS-SS) theoretical models is that FMM facilitates sperm competition, thus ensuring paternity by males that pass on genes for elevated sperm competitiveness to their male offspring. While support for this component of GS-SS theory is accumulating, a second but poorly tested assumption of these models is that there should be corresponding heritable genetic variation in FMM – the proposed mechanism of postcopulatory preferences underlying GS-SS models. Here, we conduct quantitative genetic analyses on paternal half-siblings to test this component of GS-SS theory in the guppy (Poecilia reticulata), a freshwater fish with some of the highest known rates of FMM in vertebrates. As with most previous quantitative genetic analyses of FMM in other species, our results reveal high levels of phenotypic variation in this trait and a correspondingly low narrow-sense heritability (h2 = 0.11). Furthermore, although our analysis of additive genetic variance in FMM was not statistically significant (probably owing to limited statistical power), the ensuing estimate of mean-standardized additive genetic variance (IA = 0.7) was nevertheless relatively low compared with estimates published for life-history traits across a broad range of taxa. Our results therefore add to a growing body of evidence that FMM is characterized by relatively low additive genetic variation, thus apparently contradicting GS-SS theory. However, we qualify this conclusion by drawing attention to potential deficiencies in most designs (including ours) that have tested for genetic variation in FMM, particularly those that fail to account for intersexual interactions that underlie FMM in many systems. PMID:23403856
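
    For readers unfamiliar with how a narrow-sense heritability and a mean-standardized additive variance are obtained from paternal half-siblings, here is a deliberately simplified sketch. It assumes a balanced one-way design (equal numbers of offspring per sire, dams ignored), takes additive variance as four times the among-sire variance component, and defines I_A as additive variance divided by the squared trait mean. The study's own model is likely more elaborate; the data and function name are hypothetical.

```python
from statistics import mean

def half_sib_estimates(families):
    """families: one list of offspring trait values per sire, all the same length."""
    s = len(families)        # number of sires
    n = len(families[0])     # offspring per sire (balanced design assumed)
    grand_mean = mean(v for fam in families for v in fam)
    # One-way ANOVA mean squares among and within sire families.
    ms_between = n * sum((mean(f) - grand_mean) ** 2 for f in families) / (s - 1)
    ms_within = sum((v - mean(f)) ** 2 for f in families for v in f) / (s * (n - 1))
    var_sire = max((ms_between - ms_within) / n, 0.0)  # truncate negative estimates
    v_a = 4.0 * var_sire              # half-sibs share 1/4 of additive variance
    v_p = var_sire + ms_within        # total phenotypic variance
    return {"h2": v_a / v_p, "I_A": v_a / grand_mean ** 2}

# Hypothetical trait values for three sires with four offspring each.
data = [[3, 5, 5, 7], [4, 6, 6, 8], [5, 7, 7, 9]]
print(half_sib_estimates(data))
```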

  18. SETI in vivo: testing the we-are-them hypothesis

    NASA Astrophysics Data System (ADS)

    Makukov, Maxim A.; Shcherbak, Vladimir I.

    2018-04-01

    After it was proposed that life on Earth might descend from seeding by an earlier extraterrestrial civilization motivated to secure and spread life, some authors noted that this alternative offers a testable implication: microbial seeds could be intentionally supplied with a durable signature that might be found in extant organisms. In particular, it was suggested that the optimal location for such an artefact is the genetic code, as the least evolving part of cells. However, as the mainstream view goes, this scenario is too speculative and cannot be meaningfully tested because encoding/decoding a signature within the genetic code is something ill-defined, so any retrieval attempt is doomed to guesswork. Here we refresh the seeded-Earth hypothesis in light of recent observations, and discuss the motivation for inserting a signature. We then show that 'biological SETI' involves even weaker assumptions than traditional SETI and admits a well-defined methodological framework. After assessing the possibility in terms of molecular and evolutionary biology, we formalize the approach and, adopting the standard guideline of SETI that encoding/decoding should follow from first principles and be convention-free, develop a universal retrieval strategy. Applied to the canonical genetic code, it reveals a non-trivial precision structure of interlocked logical and numerical attributes of systematic character (previously we found these heuristically). To assess this result in view of the initial assumption, we perform statistical, comparison, interdependence and semiotic analyses. Statistical analysis reveals no causal connection of the result to evolutionary models of the genetic code, interdependence analysis precludes overinterpretation, and comparison analysis shows that known variations of the code lack any precision-logic structures, in agreement with these variations being post-LUCA (i.e. post-seeding) evolutionary deviations from the canonical code. Finally, semiotic analysis shows that the found attributes are not only consistent with the initial assumption but also make perfect sense from a SETI perspective, as they ultimately maintain some of the most universal codes of culture.

  19. Methods for identifying SNP interactions: a review on variations of Logic Regression, Random Forest and Bayesian logistic regression.

    PubMed

    Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula

    2011-01-01

    Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets come the inherent challenges of new methods of statistical analysis and modeling. Considering that a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.

  20. Replication of a gene-environment interaction via multimodel inference: additive-genetic variance in adolescents' general cognitive ability increases with family-of-origin socioeconomic status.

    PubMed

    Kirkpatrick, Robert M; McGue, Matt; Iacono, William G

    2015-03-01

    The present study of general cognitive ability attempts to replicate and extend previous investigations of a biometric moderator, family-of-origin socioeconomic status (SES), in a sample of 2,494 pairs of adolescent twins, non-twin biological siblings, and adoptive siblings assessed with individually administered IQ tests. We hypothesized that SES would covary positively with additive-genetic variance and negatively with shared-environmental variance. Important potential confounds unaddressed in some past studies, such as twin-specific effects, assortative mating, and differential heritability by trait level, were found to be negligible. In our main analysis, we compared models by their sample-size corrected AIC, and base our statistical inference on model-averaged point estimates and standard errors. Additive-genetic variance increased with SES, an effect that was statistically significant and robust to model specification. We found no evidence that SES moderated shared-environmental influence. We attempt to explain the inconsistent replication record of these effects, and provide suggestions for future research.
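
    The multimodel inference described in this abstract rests on the sample-size corrected AIC and on model averaging. A minimal sketch of those calculations follows, using the usual AICc correction and Akaike weights; the unconditional standard error shown is one commonly used formula and is an assumption, since the abstract does not say which one the authors applied. All input values and function names are hypothetical.

```python
import math

def aicc(log_lik, k, n):
    """Sample-size corrected AIC for a model with k parameters fitted to n observations."""
    aic = -2.0 * log_lik + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)

def akaike_weights(scores):
    """Convert AICc scores into model weights that sum to 1."""
    best = min(scores)
    rel = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(rel)
    return [r / total for r in rel]

def model_average(estimates, std_errors, weights):
    """Weighted point estimate plus an unconditional SE (assumed formula)."""
    avg = sum(w * e for w, e in zip(weights, estimates))
    se = sum(
        w * math.sqrt(s ** 2 + (e - avg) ** 2)
        for w, e, s in zip(weights, estimates, std_errors)
    )
    return avg, se

# Hypothetical log-likelihoods, parameter counts and estimates for three models.
scores = [aicc(-1200.0, 5, 2494), aicc(-1198.5, 7, 2494), aicc(-1203.0, 4, 2494)]
weights = akaike_weights(scores)
print(weights)
print(model_average([0.40, 0.45, 0.35], [0.05, 0.06, 0.05], weights))
```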

  1. Replication of a Gene-Environment Interaction via Multimodel Inference: Additive-Genetic Variance in Adolescents’ General Cognitive Ability Increases with Family-of-Origin Socioeconomic Status

    PubMed Central

    Kirkpatrick, Robert M.; McGue, Matt; Iacono, William G.

    2015-01-01

    The present study of general cognitive ability attempts to replicate and extend previous investigations of a biometric moderator, family-of-origin socioeconomic status (SES), in a sample of 2,494 pairs of adolescent twins, non-twin biological siblings, and adoptive siblings assessed with individually administered IQ tests. We hypothesized that SES would covary positively with additive-genetic variance and negatively with shared-environmental variance. Important potential confounds unaddressed in some past studies, such as twin-specific effects, assortative mating, and differential heritability by trait level, were found to be negligible. In our main analysis, we compared models by their sample-size corrected AIC, and base our statistical inference on model-averaged point estimates and standard errors. Additive-genetic variance increased with SES—an effect that was statistically significant and robust to model specification. We found no evidence that SES moderated shared-environmental influence. We attempt to explain the inconsistent replication record of these effects, and provide suggestions for future research. PMID:25539975

  2. Systematic meta-analyses and field synopsis of genetic association studies in colorectal adenomas

    PubMed Central

    Montazeri, Zahra; Theodoratou, Evropi; Nyiraneza, Christine; Timofeeva, Maria; Chen, Wanjing; Svinti, Victoria; Sivakumaran, Shanya; Gresham, Gillian; Cubitt, Laura; Carvajal-Carmona, Luis; Bertagnolli, Monica M; Zauber, Ann G; Tomlinson, Ian; Farrington, Susan M; Dunlop, Malcolm G; Campbell, Harry; Little, Julian

    2018-01-01

    Background Low penetrance genetic variants, primarily single nucleotide polymorphisms, have substantial influence on colorectal cancer (CRC) susceptibility. Most CRCs develop from colorectal adenomas (CRA). Here, we report the first comprehensive field synopsis that catalogues all genetic association studies on CRA, with a parallel online database (http://www.chs.med.ed.ac.uk/CRAgene/). Methods We performed a systematic review, reviewing 9750 titles and then extracted data from 130 publications reporting on 181 polymorphisms in 74 genes. We conducted meta-analyses to derive summary effect estimates for 37 polymorphisms in 26 genes. We applied the Venice criteria and Bayesian False Discovery Probability (BFDP) to assess the levels of the credibility of associations. Results We considered the association with the rs6983267 variant at 8q24 as “highly credible”, reaching genome wide statistical significance in at least one meta-analysis model. We identified “less credible” associations (higher heterogeneity, lower statistical power, BFDP>0.02) with a further four variants of four independent genes: MTHFR c.677C>T p.A222V (rs1801133), TP53 c.215C>G p.R72P (rs1042522), NQO1 c.559C>T p.P187S (rs1800566), and NAT1 alleles imputed as fast acetylator genotypes. For the remaining 32 variants of 22 genes for which positive associations with CRA risk have been previously reported, the meta-analyses revealed no credible evidence to support these as true associations. Conclusions The limited number of credible associations between low penetrance genetic variants and CRA reflects the lower volume of evidence and associated lack of statistical power to detect associations of the magnitude typically observed for genetic variants and chronic diseases. The CRAgene database provides context for CRA genetic association data and will help inform future research directions. PMID:26451011

  3. Association between Apolipoprotein C-III Gene Polymorphisms and Coronary Heart Disease: A Meta-analysis.

    PubMed

    Zhang, Jing-Zhan; Xie, Xiang; Ma, Yi-Tong; Zheng, Ying-Ying; Yang, Yi-Ning; Li, Xiao-Mei; Fu, Zhen-Yan; Dai, Chuan-Fang; Zhang, Ming-Ming; Yin, Guo-Ting; Liu, Fen; Chen, Bang-Dang; Gai, Min-Tao

    2016-01-01

    Polymorphisms in the apolipoprotein C-III (APOC3) gene have been reported to be associated with coronary heart disease (CHD), but the data so far have been conflicting. To derive a more precise estimation of these associations, we performed a meta-analysis to investigate the three main polymorphisms (SstI, T-455C, C-482T) of APOC3 in all published studies. Databases including PubMed, Web of Science, Wanfang, SinoMed and CNKI were systematically searched. The association was assessed using odds ratios (ORs) with 95% confidence intervals (CIs). The statistical analysis was performed using Review Manager 5.3.3 and Stata 12.0. A total of 31 studies have been identified. The pooled odds ratio (OR) for the association between the APOC3 gene polymorphisms and CHD and its corresponding 95% confidence interval (95% CI) were evaluated by random or fixed effect models. A statistical association between APOC3 SstI polymorphism and CHD susceptibility was observed under an allelic contrast model (P= 0.003, OR = 1.14, 95% CI = 1.05-1.24), dominant genetic model (P= 0.01, OR = 1.14, 95% CI = 1.03-1.26), and recessive genetic model (P= 0.02, OR = 1.35, 95% CI = 1.06-1.71), respectively. A significant association between the APOC3 T-455C polymorphism and CHD was also detected under an allelic contrast (P < 0.0001, OR = 1.19, 95% CI = 1.10-1.29), dominant genetic model (P= 0.0003, OR = 1.24, 95% CI = 1.11-1.39) and recessive genetic model (P= 0.04, OR = 1.30, 95% CI = 1.01-1.67). No significant association between the APOC3 C-482T polymorphism and CHD was found under an allelic model (P= 0.94, OR = 1.00, 95% CI = 0.93-1.08), dominant genetic model (P= 0.20, OR = 1.07, 95% CI = 0.97-1.18) or recessive genetic model (P= 0.13, OR = 0.90, 95% CI = 0.79-1.03). This meta-analysis revealed that the APOC3 SstI and T-455C polymorphisms significantly increase CHD susceptibility. No significant association was observed between the APOC3 C-482T polymorphism and CHD susceptibility.
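
    To make the three genetic models in this abstract concrete, the sketch below shows how a single study's genotype counts collapse into a 2x2 table under the allelic, dominant and recessive contrasts, and how an odds ratio with a 95% confidence interval (Woolf's log method) follows. The counts and function names are hypothetical, and the sketch covers one study only; it does not reproduce the pooling across studies performed in the meta-analysis.

```python
import math

def odds_ratio(a, b, c, d):
    """OR and 95% CI for a 2x2 table: cases (a exposed, b not), controls (c exposed, d not).

    Assumes all four cells are non-zero.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se)

def genetic_models(cases, controls):
    """cases/controls: (n_AA, n_Aa, n_aa) genotype counts, with 'a' the candidate risk allele."""
    (k_AA, k_Aa, k_aa), (c_AA, c_Aa, c_aa) = cases, controls
    return {
        # Allelic contrast: count alleles, a versus A.
        "allelic": odds_ratio(2 * k_aa + k_Aa, 2 * k_AA + k_Aa,
                              2 * c_aa + c_Aa, 2 * c_AA + c_Aa),
        # Dominant model: carriers (Aa + aa) versus AA.
        "dominant": odds_ratio(k_Aa + k_aa, k_AA, c_Aa + c_aa, c_AA),
        # Recessive model: aa versus (AA + Aa).
        "recessive": odds_ratio(k_aa, k_AA + k_Aa, c_aa, c_AA + c_Aa),
    }

# Hypothetical genotype counts (illustration only).
results = genetic_models(cases=(50, 30, 20), controls=(60, 30, 10))
for model, (or_, low, high) in results.items():
    print(f"{model}: OR = {or_:.2f}, 95% CI = {low:.2f}-{high:.2f}")
```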

  4. Association between Apolipoprotein C-III Gene Polymorphisms and Coronary Heart Disease: A Meta-analysis

    PubMed Central

    Zhang, Jing-Zhan; Xie, Xiang; Ma, Yi-Tong; Zheng, Ying-Ying; Yang, Yi-Ning; Li, Xiao-Mei; Fu, Zhen-Yan; Dai, Chuan-Fang; Zhang, Ming-Ming; Yin, Guo-Ting; Liu, Fen; Chen, Bang-Dang; Gai, Min-Tao

    2016-01-01

    Polymorphisms in the apolipoprotein C-III (APOC3) gene have been reported to be associated with coronary heart disease (CHD), but the data so far have been conflicting. To derive a more precise estimation of these associations, we performed a meta-analysis to investigate the three main polymorphisms (SstI, T-455C, C-482T) of APOC3 in all published studies. Databases including PubMed, Web of Science, Wanfang, SinoMed and CNKI were systematically searched. The association was assessed using odds ratios (ORs) with 95% confidence intervals (CIs). The statistical analysis was performed using Review Manager 5.3.3 and Stata 12.0. A total of 31 studies have been identified. The pooled odds ratio (OR) for the association between the APOC3 gene polymorphisms and CHD and its corresponding 95% confidence interval (95% CI) were evaluated by random or fixed effect models. A statistical association between APOC3 SstI polymorphism and CHD susceptibility was observed under an allelic contrast model (P= 0.003, OR = 1.14, 95% CI = 1.05-1.24), dominant genetic model (P= 0.01, OR = 1.14, 95% CI = 1.03-1.26), and recessive genetic model (P= 0.02, OR = 1.35, 95% CI = 1.06-1.71), respectively. A significant association between the APOC3 T-455C polymorphism and CHD was also detected under an allelic contrast (P < 0.0001, OR = 1.19, 95% CI = 1.10-1.29), dominant genetic model (P= 0.0003, OR = 1.24, 95% CI = 1.11-1.39) and recessive genetic model (P= 0.04, OR = 1.30, 95% CI = 1.01-1.67). No significant association between the APOC3 C-482T polymorphism and CHD was found under an allelic model (P= 0.94, OR = 1.00, 95% CI = 0.93-1.08), dominant genetic model (P= 0.20, OR = 1.07, 95% CI = 0.97-1.18) or recessive genetic model (P= 0.13, OR = 0.90, 95% CI = 0.79-1.03). This meta-analysis revealed that the APOC3 SstI and T-455C polymorphisms significantly increase CHD susceptibility. No significant association was observed between the APOC3 C-482T polymorphism and CHD susceptibility. 
    PMID:26816662

  5. Improved Statistics for Genome-Wide Interaction Analysis

    PubMed Central

    Ueki, Masao; Cordell, Heather J.

    2012-01-01

    Recently, Wu and colleagues [1] proposed two novel statistics for genome-wide interaction analysis using case/control or case-only data. In computer simulations, their proposed case/control statistic outperformed competing approaches, including the fast-epistasis option in PLINK and logistic regression analysis under the correct model; however, reasons for its superior performance were not fully explored. Here we investigate the theoretical properties and performance of Wu et al.'s proposed statistics and explain why, in some circumstances, they outperform competing approaches. Unfortunately, we find minor errors in the formulae for their statistics, resulting in tests that have higher than nominal type 1 error. We also find minor errors in PLINK's fast-epistasis and case-only statistics, although theory and simulations suggest that these errors have only negligible effect on type 1 error. We propose adjusted versions of all four statistics that, both theoretically and in computer simulations, maintain correct type 1 error rates under the null hypothesis. We also investigate statistics based on correlation coefficients that maintain similar control of type 1 error. Although designed to test specifically for interaction, we show that some of these previously-proposed statistics can, in fact, be sensitive to main effects at one or both loci, particularly in the presence of linkage disequilibrium. We propose two new “joint effects” statistics that, provided the disease is rare, are sensitive only to genuine interaction effects. In computer simulations we find, in most situations considered, that highest power is achieved by analysis under the correct genetic model. Such an analysis is unachievable in practice, as we do not know this model. However, generally high power over a wide range of scenarios is exhibited by our joint effects and adjusted Wu statistics. 
    We recommend use of these alternative or adjusted statistics and urge caution when using Wu et al.'s originally-proposed statistics, on account of the inflated error rate that can result. PMID:22496670

  6. Mapping Quantitative Traits in Unselected Families: Algorithms and Examples

    PubMed Central

    Dupuis, Josée; Shi, Jianxin; Manning, Alisa K.; Benjamin, Emelia J.; Meigs, James B.; Cupples, L. Adrienne; Siegmund, David

    2009-01-01

    Linkage analysis has been widely used to identify from family data genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departure from normality assumptions. Regression-based approaches are more robust but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic which in contrast to the likelihood ratio statistic, can use nonparametric estimators of variability to achieve robustness of the false positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity-by-descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study. PMID:19278016

  7. AGE-RELATED DIFFERENCES IN SUSCEPTIBILITY TO CARCINOGENESIS II, APPROACHES FOR APPLICATION AND UNCERTAINTY ANALYSES FOR INDIVIDUAL GENETICALLY ACTING CARCINOGENS

    EPA Science Inventory

    An earlier paper (Hattis et al., 2003) developed a quantitative likelihood-based statistical analysis of the differences in apparent sensitivity of rodents to mutagenic carcinogens across three life stages (fetal, birth-weaning, and weaning-60 days) relative to exposures in adult...

  8. Similarity in temperament between mother and offspring rhesus monkeys: Sex differences and the role of monoamine oxidase-A and serotonin transporter promoter polymorphism genotypes

    PubMed Central

    Sullivan, Erin C.; Mendoza, Sally P.; Capitanio, John P.

    2011-01-01

    Temperament is usually considered biologically based and largely inherited; however, the environment can shape the development of temperament. Allelic variation may confer differential sensitivity to early environment resulting in variations in temperament. Here we explore the relationship between measures of temperament in mothers and their first-born offspring and the role of genetic sensitivity in establishing the strength of these associations. Temperament ratings were conducted on 3- to 4-month-old rhesus monkeys after a 25-hour biobehavioral assessment. Factor analysis revealed a four-factor structure of temperament. Females assessed as infants have reproduced and their offspring have also been evaluated through the standardized testing paradigm. Canonical correlation analysis revealed statistically significant associations between factor scores of mothers and sons, but not mothers and daughters. Further, offspring possessing the high activity, “low risk”, alleles of the rhMAOA-LPR or rh5-HTTLPR showed statistically significant canonical correlations, whereas those possessing other alleles did not, suggesting differential genetic sensitivity to the normative early experience of maternal temperament. PMID:21866539

  9. Genetic programming based models in plant tissue culture: An addendum to traditional statistical approach.

    PubMed

    Mridula, Meenu R; Nair, Ashalatha S; Kumar, K Satheesh

    2018-02-01

    In this paper, we compared the efficacy of observation based modeling approach using a genetic algorithm with the regular statistical analysis as an alternative methodology in plant research. Preliminary experimental data on in vitro rooting was taken for this study with an aim to understand the effect of charcoal and naphthalene acetic acid (NAA) on successful rooting and also to optimize the two variables for maximum result. Observation-based modelling, as well as traditional approach, could identify NAA as a critical factor in rooting of the plantlets under the experimental conditions employed. Symbolic regression analysis using the software deployed here optimised the treatments studied and was successful in identifying the complex non-linear interaction among the variables, with minimalistic preliminary data. The presence of charcoal in the culture medium has a significant impact on root generation by reducing basal callus mass formation. Such an approach is advantageous for establishing in vitro culture protocols as these models will have significant potential for saving time and expenditure in plant tissue culture laboratories, and it further reduces the need for specialised background.

  10. Statistical analysis for validating ACO-KNN algorithm as feature selection in sentiment analysis

    NASA Astrophysics Data System (ADS)

    Ahmad, Siti Rohaidah; Yusop, Nurhafizah Moziyana Mohd; Bakar, Azuraliza Abu; Yaakub, Mohd Ridzwan

    2017-10-01

    This research paper aims to propose a hybrid of ant colony optimization (ACO) and k-nearest neighbor (KNN) algorithms as feature selections for selecting and choosing relevant features from customer review datasets. Information gain (IG), genetic algorithm (GA), and rough set attribute reduction (RSAR) were used as baseline algorithms in a performance comparison with the proposed algorithm. This paper will also discuss the significance test, which was used to evaluate the performance differences between the ACO-KNN, IG-GA, and IG-RSAR algorithms. This study evaluated the performance of the ACO-KNN algorithm using precision, recall, and F-score, which were validated using the parametric statistical significance tests. The evaluation process has statistically proven that this ACO-KNN algorithm has been significantly improved compared to the baseline algorithms. In addition, the experimental results have proven that the ACO-KNN can be used as a feature selection technique in sentiment analysis to obtain quality, optimal feature subset that can represent the actual data in customer review data.
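
    The three evaluation measures named in this abstract can be computed from predicted and true labels as in the short sketch below. The labels are hypothetical and the function name is an illustrative choice; the sketch does not reproduce the ACO-KNN algorithm or the significance tests.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F-score (F1) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical sentiment labels: 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))
```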

  11. Latent spatial models and sampling design for landscape genetics

    USGS Publications Warehouse

    Hanks, Ephraim M.; Hooten, Mevin B.; Knick, Steven T.; Oyler-McCance, Sara J.; Fike, Jennifer A.; Cross, Todd B.; Schwartz, Michael K.

    2016-01-01

    We propose a spatially-explicit approach for modeling genetic variation across space and illustrate how this approach can be used to optimize spatial prediction and sampling design for landscape genetic data. We propose a multinomial data model for categorical microsatellite allele data commonly used in landscape genetic studies, and introduce a latent spatial random effect to allow for spatial correlation between genetic observations. We illustrate how modern dimension reduction approaches to spatial statistics can allow for efficient computation in landscape genetic statistical models covering large spatial domains. We apply our approach to propose a retrospective spatial sampling design for greater sage-grouse (Centrocercus urophasianus) population genetics in the western United States.

  12. Statistical Optimization of Pharmacogenomics Association Studies: Key Considerations from Study Design to Analysis

    PubMed Central

    Grady, Benjamin J.; Ritchie, Marylyn D.

    2011-01-01

    Research in human genetics and genetic epidemiology has grown significantly over the previous decade, particularly in the field of pharmacogenomics. Pharmacogenomics presents an opportunity for rapid translation of associated genetic polymorphisms into diagnostic measures or tests to guide therapy as part of a move towards personalized medicine. Expansion in genotyping technology has cleared the way for widespread use of whole-genome genotyping in the effort to identify novel biology and new genetic markers associated with pharmacokinetic and pharmacodynamic endpoints. With new technology and methodology regularly becoming available for use in genetic studies, a discussion on the application of such tools becomes necessary. In particular, quality control criteria have evolved with the use of GWAS as we have come to understand potential systematic errors which can be introduced into the data during genotyping. There have been several replicated pharmacogenomic associations, some of which have moved to the clinic to enact change in treatment decisions. These examples of translation illustrate the strength of evidence necessary to successfully and effectively translate a genetic discovery. In this review, the design of pharmacogenomic association studies is examined with the goal of optimizing the impact and utility of this research. Issues of ascertainment, genotyping, quality control, analysis and interpretation are considered. PMID:21887206

  13. Progressive erosion of genetic and epigenetic variation in callus-derived cocoa (Theobroma cacao) plants.

    PubMed

    Rodríguez López, Carlos M; Wetten, Andrew C; Wilkinson, Michael J

    2010-06-01

    Relatively little is known about the timing of genetic and epigenetic forms of somaclonal variation arising from callus growth. We surveyed for both types of change in cocoa (Theobroma cacao) plants regenerated from calli of various ages, and also between tissues from the source trees. For genetic change, we used 15 single sequence repeat (SSR) markers from four source trees and from 233 regenerated plants. For epigenetic change, we used 386 methylation-sensitive amplified polymorphism (MSAP) markers on leaf and explant (staminode) DNA from two source trees and on leaf DNA from 114 regenerants. Genetic variation within source trees was limited to one slippage mutation in one leaf. Regenerants were far more variable, with 35% exhibiting at least one mutation. Genetic variation initially accumulated with culture age but subsequently declined. MSAP (epigenetic) profiles diverged between leaf and staminode samples from source trees. Multivariate analysis revealed that leaves from regenerants occupied intermediate eigenspace between leaves and staminodes of source plants but became progressively more similar to source tree leaves with culture age. Statistical analysis confirmed this rather counterintuitive finding that leaves of 'late regenerants' exhibited significantly less genetic and epigenetic divergence from source leaves than those exposed to short periods of callus growth.

  14. General quantitative genetic methods for comparative biology: phylogenies, taxonomies and multi-trait models for continuous and categorical characters.

    PubMed

    Hadfield, J D; Nakagawa, S

    2010-03-01

    Although many of the statistical techniques used in comparative biology were originally developed in quantitative genetics, subsequent development of comparative techniques has progressed in relative isolation. Consequently, many of the new and planned developments in comparative analysis already have well-tested solutions in quantitative genetics. In this paper, we take three recent publications that develop phylogenetic meta-analysis, either implicitly or explicitly, and show how they can be considered as quantitative genetic models. We highlight some of the difficulties with the proposed solutions, and demonstrate that standard quantitative genetic theory and software offer solutions. We also show how results from Bayesian quantitative genetics can be used to create efficient Markov chain Monte Carlo algorithms for phylogenetic mixed models, thereby extending their generality to non-Gaussian data. Of particular utility is the development of multinomial models for analysing the evolution of discrete traits, and the development of multi-trait models in which traits can follow different distributions. Meta-analyses often include a nonrandom collection of species for which the full phylogenetic tree has only been partly resolved. Using missing data theory, we show how the presented models can be used to correct for nonrandom sampling and show how taxonomies and phylogenies can be combined to give a flexible framework with which to model dependence.

  15. Do polymorphisms of 5,10-methylenetetrahydrofolate reductase (MTHFR) gene affect the risk of childhood acute lymphoblastic leukemia?

    PubMed

    Pereira, Tiago Veiga; Rudnicki, Martina; Pereira, Alexandre Costa; Pombo-de-Oliveira, Maria S; Franco, Rendrik França

    2006-01-01

    Meta-analysis has become an important statistical tool in genetic association studies, since it may provide more powerful and precise estimates. However, meta-analytic studies are prone to several potential biases, not only because of the preferential publication of "positive" studies but also due to difficulties in obtaining all relevant information during the study selection process. In this letter, we point out major problems in meta-analysis that may lead to biased conclusions, illustrating an empirical example of two recent meta-analyses on the relation between MTHFR polymorphisms and risk of acute lymphoblastic leukemia that, despite the similarity in statistical methods and period of study selection, provided partially conflicting results.

  16. Identification of contemporary selection signatures using composite log likelihood and their associations with marbling score in Korean cattle.

    PubMed

    Ryu, Jihye; Lee, Chaeyoung

    2014-12-01

    Positive selection not only increases beneficial allele frequency but also causes augmentation of allele frequencies of sequence variants in close proximity. Signals for positive selection were detected by the statistical differences in subsequent allele frequencies. To identify selection signatures in Korean cattle, we applied a composite log-likelihood (CLL)-based method, which calculates a composite likelihood of the allelic frequencies observed across sliding windows of five adjacent loci and compares the value with the critical statistic estimated by 50,000 permutations. Data for a total of 11,799 nucleotide polymorphisms were used with 71 Korean cattle and 209 foreign beef cattle. As a result, 147 signals were identified for Korean cattle based on CLL estimates (P < 0.01). The signals might be candidate genetic factors for meat quality by which the Korean cattle have been selected. Further genetic association analysis with 41 intragenic variants in the selection signatures with the greatest CLL for each chromosome revealed that marbling score was associated with five variants. Intensive association studies with all the selection signatures identified in this study are required to exclude signals associated with other phenotypes or signals falsely detected and thus to identify genetic markers for meat quality. © 2014 Stichting International Foundation for Animal Genetics.

  17. The impact of a scheduling change on ninth grade high school performance on biology benchmark exams and the California Standards Test

    NASA Astrophysics Data System (ADS)

    Leonardi, Marcelo

    The primary purpose of this study was to examine the impact of a scheduling change from a trimester 4x4 block schedule to a modified hybrid schedule on student achievement in ninth grade biology courses. This study examined the impact of the scheduling change on student achievement through teacher-created benchmark assessments in Genetics, DNA, and Evolution and on the California Standardized Test in Biology. The secondary purpose of this study was to examine ninth grade biology teachers' perceptions of ninth grade biology student achievement. Using a mixed methods research approach, data were collected both quantitatively and qualitatively as aligned to research questions. Quantitative methods included gathering data from departmental benchmark exams and the California Standardized Test in Biology and conducting multiple analysis of covariance and analysis of covariance to determine significant differences. Qualitative methods included journal entry questions and focus group interviews. The results revealed a statistically significant increase in scores on both the DNA and Evolution benchmark exams. DNA and Evolution benchmark exams showed significant improvements from a change in scheduling format. The scheduling change was responsible for 1.5% of the increase in DNA benchmark scores and 2% of the increase in Evolution benchmark scores. The results revealed a statistically significant decrease in scores on the Genetics Benchmark exam as a result of the scheduling change. The scheduling change was responsible for 1% of the decrease in Genetics benchmark scores. The results also revealed a statistically significant increase in scores on the CST Biology exam. The scheduling change was responsible for 0.7% of the increase in CST Biology scores. Results of the focus group discussions indicated that all teachers preferred the modified hybrid schedule over the trimester schedule and that it improved student achievement.

  18. Analysis of the HLA population data (AHPD) submitted to the 15th International Histocompatibility/Immunogenetics Workshop by using the Gene[rate] computer tools accommodating ambiguous data (AHPD project report).

    PubMed

    Nunes, J M; Riccio, M E; Buhler, S; Di, D; Currat, M; Ries, F; Almada, A J; Benhamamouch, S; Benitez, O; Canossi, A; Fadhlaoui-Zid, K; Fischer, G; Kervaire, B; Loiseau, P; de Oliveira, D C M; Papasteriades, C; Piancatelli, D; Rahal, M; Richard, L; Romero, M; Rousseau, J; Spiroski, M; Sulcebe, G; Middleton, D; Tiercy, J-M; Sanchez-Mazas, A

    2010-07-01

    During the 15th International Histocompatibility and Immunogenetics Workshop (IHIWS), 14 human leukocyte antigen (HLA) laboratories participated in the Analysis of HLA Population Data (AHPD) project where 18 new population samples were analyzed statistically and compared with data available from previous workshops. To that aim, an original methodology was developed and used (i) to estimate frequencies by taking into account ambiguous genotypic data, (ii) to test for Hardy-Weinberg equilibrium (HWE) by using a nested likelihood ratio test involving a parameter accounting for HWE deviations, (iii) to test for selective neutrality by using a resampling algorithm, and (iv) to provide explicit graphical representations including allele frequencies and basic statistics for each series of data. A total of 66 data series (1-7 loci per population) were analyzed with this standard approach. Frequency estimates were compliant with HWE in all but one population of mixed stem cell donors. Neutrality testing confirmed the observation of heterozygote excess at all HLA loci, although a significant deviation was established in only a few cases. Population comparisons showed that HLA genetic patterns were mostly shaped by geographic and/or linguistic differentiations in Africa and Europe, but not in America where both genetic drift in isolated populations and gene flow in admixed populations led to a more complex genetic structure. Overall, a fruitful collaboration between HLA typing laboratories and population geneticists allowed finding useful solutions to the problem of estimating gene frequencies and testing basic population diversity statistics on highly complex HLA data (high numbers of alleles and ambiguities), with promising applications in either anthropological, epidemiological, or transplantation studies.

  19. Mendelian randomization analysis associates increased serum urate, due to genetic variation in uric acid transporters, with improved renal function.

    PubMed

    Hughes, Kim; Flynn, Tanya; de Zoysa, Janak; Dalbeth, Nicola; Merriman, Tony R

    2014-02-01

    Increased serum urate predicts chronic kidney disease independent of other risk factors. The use of xanthine oxidase inhibitors coincides with improved renal function. Whether this is due to reduced serum urate or reduced production of oxidants by xanthine oxidase or another physiological mechanism remains unresolved. Here we applied Mendelian randomization, a statistical genetics approach allowing disentangling of cause and effect in the presence of potential confounding, to determine whether lowering of serum urate by genetic modulation of renal excretion benefits renal function using data from 7979 patients of the Atherosclerosis Risk in Communities and Framingham Heart studies. Mendelian randomization by the two-stage least squares method was done with serum urate as the exposure, a uric acid transporter genetic risk score as instrumental variable, and estimated glomerular filtration rate and serum creatinine as the outcomes. Increased genetic risk score was associated with significantly improved renal function in men but not in women. Analysis of individual genetic variants showed the effect size associated with serum urate did not correlate with that associated with renal function in the Mendelian randomization model. This is consistent with the possibility that the physiological action of these genetic variants in raising serum urate correlates directly with improved renal function. Further studies are required to understand the mechanism of the potential renal function protection mediated by xanthine oxidase inhibitors.
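
    The two-stage least squares step named in this abstract can be illustrated with a minimal sketch. It is not the authors' analysis: the data are a hypothetical toy set built so that the confounder is uncorrelated with the instrument, and the names `ols`, `two_stage_least_squares`, `score`, `confounder`, `urate` and `egfr` are assumptions made for illustration. Stage 1 regresses the exposure on the instrument; stage 2 regresses the outcome on the stage-1 fitted values.

```python
def ols(x, y):
    """Simple linear regression of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(instrument, exposure, outcome):
    """Point estimate of the exposure effect using a single instrument."""
    # Stage 1: predict the exposure from the instrument.
    intercept, slope = ols(instrument, exposure)
    fitted = [intercept + slope * z for z in instrument]
    # Stage 2: regress the outcome on the predicted exposure.
    return ols(fitted, outcome)[1]


# Hypothetical toy data (not from the study): the confounder affects both
# exposure and outcome but is uncorrelated with the instrument.
score = [0, 1, 2, 0, 1, 2]              # genetic risk score (instrument)
confounder = [1, 1, 1, -1, -1, -1]      # unmeasured in a real analysis
true_effect = 0.3                       # arbitrary value chosen for the toy set
urate = [5 + 0.5 * z + u for z, u in zip(score, confounder)]
egfr = [2 + true_effect * x - 2 * u for x, u in zip(urate, confounder)]

naive_estimate = ols(urate, egfr)[1]    # biased by the confounder
iv_estimate = two_stage_least_squares(score, urate, egfr)
print(naive_estimate, iv_estimate)
```

    In this toy set the naive regression slope is pulled away from the true effect by the confounder, while the instrumental estimate recovers it. The sketch returns only a point estimate; standard errors taken directly from the second-stage regression would not be valid.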

  20. Association Testing of Previously Reported Variants in a Large Case-Control Meta-analysis of Diabetic Nephropathy

    PubMed Central

    Williams, Winfred W.; Salem, Rany M.; McKnight, Amy Jayne; Sandholm, Niina; Forsblom, Carol; Taylor, Andrew; Guiducci, Candace; McAteer, Jarred B.; McKay, Gareth J.; Isakova, Tamara; Brennan, Eoin P.; Sadlier, Denise M.; Palmer, Cameron; Söderlund, Jenny; Fagerholm, Emma; Harjutsalo, Valma; Lithovius, Raija; Gordin, Daniel; Hietala, Kustaa; Kytö, Janne; Parkkonen, Maija; Rosengård-Bärlund, Milla; Thorn, Lena; Syreeni, Anna; Tolonen, Nina; Saraheimo, Markku; Wadén, Johan; Pitkäniemi, Janne; Sarti, Cinzia; Tuomilehto, Jaakko; Tryggvason, Karl; Österholm, Anne-May; He, Bing; Bain, Steve; Martin, Finian; Godson, Catherine; Hirschhorn, Joel N.; Maxwell, Alexander P.; Groop, Per-Henrik; Florez, Jose C.

    2012-01-01

    We formed the GEnetics of Nephropathy–an International Effort (GENIE) consortium to examine previously reported genetic associations with diabetic nephropathy (DN) in type 1 diabetes. GENIE consists of 6,366 similarly ascertained participants of European ancestry with type 1 diabetes, with and without DN, from the All Ireland-Warren 3-Genetics of Kidneys in Diabetes U.K. and Republic of Ireland (U.K.-R.O.I.) collection and the Finnish Diabetic Nephropathy Study (FinnDiane), combined with reanalyzed data from the Genetics of Kidneys in Diabetes U.S. Study (U.S. GoKinD). We found little evidence for the association of the EPO promoter polymorphism, rs161740, with the combined phenotype of proliferative retinopathy and end-stage renal disease in U.K.-R.O.I. (odds ratio [OR] 1.14, P = 0.19) or FinnDiane (OR 1.06, P = 0.60). However, a fixed-effects meta-analysis that included the previously reported cohorts retained a genome-wide significant association with that phenotype (OR 1.31, P = 2 × 10^−9). An expanded investigation of the ELMO1 locus and genetic regions reported to be associated with DN in the U.S. GoKinD yielded only nominal statistical significance for these loci. Finally, top candidates identified in a recent meta-analysis failed to reach genome-wide significance. In conclusion, we were unable to replicate most of the previously reported genetic associations for DN, and significance for the EPO promoter association was attenuated. PMID:22721967
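
    A fixed-effects meta-analysis of odds ratios, as mentioned in this abstract, is commonly done by inverse-variance weighting on the log scale. The sketch below shows that generic calculation only; it is an assumption that it resembles the authors' procedure. The two odds ratios are those quoted above, but the standard errors are hypothetical placeholders because the abstract does not report them, and the function name `fixed_effects_meta` is likewise an assumption.

```python
import math


def fixed_effects_meta(odds_ratios, standard_errors):
    """Inverse-variance pooled odds ratio.

    standard_errors are the standard errors of log(OR) for each cohort.
    Returns (pooled OR, SE of pooled log OR, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    log_ors = [math.log(o) for o in odds_ratios]
    pooled_log = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_log / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return math.exp(pooled_log), pooled_se, p_value


# ORs from the abstract; the standard errors are hypothetical placeholders.
pooled_or, pooled_se, p_value = fixed_effects_meta([1.14, 1.06], [0.10, 0.11])
print(pooled_or, pooled_se, p_value)
```

    Because cohorts with smaller standard errors receive larger weights, the pooled estimate always lies between the smallest and largest cohort estimates.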

  1. Big Data, Big Opportunities, and Big Challenges.

    PubMed

    Frelinger, Jeffrey A

    2015-11-01

    High-throughput assays have begun to revolutionize modern biology and medicine. The advent of cheap next-generation sequencing (NGS) has made it possible to interrogate cells and human populations as never before. Although this has allowed us to investigate the genetics, gene expression, and impacts of the microbiome, there remain both practical and conceptual challenges. These include data handling, storage, and statistical analysis, as well as an inherent problem of the analysis of heterogeneous cell populations.

  2. Quantitative Analysis of Repertoire Scale Immunoglobulin properties in Vaccine Induced B cell Responses

    DTIC Science & Technology

    Immunosequencing now readily generates 10^3-10^5 sequences per sample; however, statistical analysis of these repertoires is challenging because of the high genetic...diversity of BCRs and the elaborate clonal relationships among them. To date, most immunosequencing analyses have focused on reporting qualitative ...repertoire differences, (2) identifying how two repertoires differ, and (3) determining appropriate confidence intervals for assessing the size of the differences and their potential biological relevance.

  3. An efficient genome-wide association test for mixed binary and continuous phenotypes with applications to substance abuse research.

    PubMed

    Buu, Anne; Williams, L Keoki; Yang, James J

    2018-03-01

    We propose a new genome-wide association test for mixed binary and continuous phenotypes that uses an efficient numerical method to estimate the empirical distribution of the Fisher's combination statistic under the null hypothesis. Our simulation study shows that the proposed method controls the type I error rate and also maintains its power at the level of the permutation method. More importantly, the computational efficiency of the proposed method is much higher than the one of the permutation method. The simulation results also indicate that the power of the test increases when the genetic effect increases, the minor allele frequency increases, and the correlation between responses decreases. The statistical analysis on the database of the Study of Addiction: Genetics and Environment demonstrates that the proposed method combining multiple phenotypes can increase the power of identifying markers that may not be, otherwise, chosen using marginal tests.
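
    For reference, Fisher's combination statistic is T = -2 * sum(ln p_i). The sketch below computes it together with its classical chi-square p-value (2k degrees of freedom for k p-values), which holds only when the combined tests are independent. The abstract's method instead estimates the null distribution numerically because the phenotypes are correlated; that step is not reproduced here. The function names and the two input p-values are hypothetical.

```python
import math


def fisher_combination(p_values):
    """Fisher's combination statistic T = -2 * sum(ln p_i)."""
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    return -2.0 * sum(math.log(p) for p in p_values)


def chi2_sf_even_df(x, k):
    """P(X > x) for a chi-square variable with 2k degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!, valid for even df.
    """
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical marginal p-values for one marker: one from the binary
# phenotype and one from the continuous phenotype.
p_binary, p_continuous = 0.04, 0.03
stat = fisher_combination([p_binary, p_continuous])
p_combined = chi2_sf_even_df(stat, k=2)  # assumes independent tests
print(stat, p_combined)
```

    With correlated phenotypes the chi-square reference distribution is no longer exact, which is why a permutation or numerically estimated null distribution is used in practice.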

  4. Genetic variation in steelhead of Oregon and northern California

    USGS Publications Warehouse

    Reisenbichler, R.R.; McIntyre, J.D.; Solazzi, M.F.; Landino, S.W.

    1992-01-01

    Steelhead Oncorhynchus mykiss from various sites between the Columbia River and the Mad River, California, were genetically characterized at 10 protein-coding loci or pairs of loci by starch gel electrophoresis. Fish from coastal streams differed from fish east of the Cascade Mountains and from fish of the Willamette River (a tributary of the Columbia River, west of the Cascade Mountains). Coastal steelhead from the northern part of the study area differed from those in the southern part. Genetic differentiation within and among drainages was not statistically significant; however, gene diversity analysis and the life history of steelhead suggested that fish from different drainages should be considered as separate populations. Genetic variation among fish in separate drainages was similar to that reported in northwestern Washington and less than that reported in British Columbia. Allele frequencies varied significantly among year-classes. Genetic variation within samples accounted for 98.3% of the total genetic variation observed in this study. Most hatchery populations differed from wild populations, suggesting that conservation of genetic diversity among and within wild populations could be facilitated by altering hatchery programs.

  5. On the Optimization of Aerospace Plane Ascent Trajectory

    NASA Astrophysics Data System (ADS)

    Al-Garni, Ahmed; Kassem, Ayman Hamdy

    A hybrid heuristic optimization technique based on genetic algorithms and particle swarm optimization has been developed and tested for trajectory optimization problems with multi-constraints and a multi-objective cost function. The technique is used to calculate control settings for two types of ascending trajectories (constant dynamic pressure and minimum-fuel-minimum-heat) for a two-dimensional model of an aerospace plane. A thorough statistical analysis is done on the hybrid technique to make comparisons with both basic genetic algorithms and particle swarm optimization techniques with respect to convergence and execution time. Genetic algorithm optimization showed better execution time performance while particle swarm optimization showed better convergence performance. The hybrid optimization technique, benefiting from both techniques, showed superior robust performance compromising convergence trends and execution time.

  6. Pathway-based discovery of genetic interactions in breast cancer

    PubMed Central

    Xu, Zack Z.; Boone, Charles; Lange, Carol A.

    2017-01-01

    Breast cancer is the second largest cause of cancer death among U.S. women and the leading cause of cancer death among women worldwide. Genome-wide association studies (GWAS) have identified several genetic variants associated with susceptibility to breast cancer, but these still explain less than half of the estimated genetic contribution to the disease. Combinations of variants (i.e. genetic interactions) may play an important role in breast cancer susceptibility. However, due to a lack of statistical power, the current tests for genetic interactions from GWAS data mainly leverage prior knowledge to focus on small sets of genes or SNPs that are known to have an association with breast cancer. Thus, many genetic interactions, particularly among novel variants, remain understudied. Reverse-genetic interaction screens in model organisms have shown that genetic interactions frequently cluster into highly structured motifs, where members of the same pathway share similar patterns of genetic interactions. Based on this key observation, we recently developed a method called BridGE to search for such structured motifs in genetic networks derived from GWAS studies and identify pathway-level genetic interactions in human populations. We applied BridGE to six independent breast cancer cohorts and identified significant pathway-level interactions in five cohorts. Joint analysis across all five cohorts revealed a high confidence consensus set of genetic interactions with support in multiple cohorts. The discovered interactions implicated the glutathione conjugation, vitamin D receptor, purine metabolism, mitotic prometaphase, and steroid hormone biosynthesis pathways as major modifiers of breast cancer risk. Notably, while many of the pathways identified by BridGE show clear relevance to breast cancer, variants in these pathways had not been previously discovered by traditional single variant association tests, or single pathway enrichment analysis that does not consider SNP-SNP interactions. PMID:28957314

  7. Genetic architecture of wood properties based on association analysis and co-expression networks in white spruce.

    PubMed

    Lamara, Mebarek; Raherison, Elie; Lenz, Patrick; Beaulieu, Jean; Bousquet, Jean; MacKay, John

    2016-04-01

    Association studies are widely utilized to analyze complex traits but their ability to disclose genetic architectures is often limited by statistical constraints, and functional insights are usually minimal in nonmodel organisms like forest trees. We developed an approach to integrate association mapping results with co-expression networks. We tested single nucleotide polymorphisms (SNPs) in 2652 candidate genes for statistical associations with wood density, stiffness, microfibril angle and ring width in a population of 1694 white spruce trees (Picea glauca). Association mapping identified 229-292 genes per wood trait using a statistical significance level of P < 0.05 to maximize discovery. Over-representation of genes associated for nearly all traits was found in a xylem preferential co-expression group developed in independent experiments. A xylem co-expression network was reconstructed with 180 wood associated genes and several known MYB and NAC regulators were identified as network hubs. The network revealed a link between the gene PgNAC8, wood stiffness and microfibril angle, as well as considerable within-season variation for both genetic control of wood traits and gene expression. Trait associations were distributed throughout the network suggesting complex interactions and pleiotropic effects. Our findings indicate that integration of association mapping and co-expression networks enhances our understanding of complex wood traits. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.

  8. Predicting Flowering Behavior and Exploring Its Genetic Determinism in an Apple Multi-family Population Based on Statistical Indices and Simplified Phenotyping.

    PubMed

    Durand, Jean-Baptiste; Allard, Alix; Guitton, Baptiste; van de Weg, Eric; Bink, Marco C A M; Costes, Evelyne

    2017-01-01

    Irregular flowering over years is commonly observed in fruit trees. The early prediction of tree behavior is highly desirable in breeding programmes. This study aims at performing such predictions, combining simplified phenotyping and statistics methods. Sequences of vegetative vs. floral annual shoots (AS) were observed along axes in trees belonging to five apple related full-sib families. Sequences were analyzed using Markovian and linear mixed models including year and site effects. Indices of flowering irregularity, periodicity and synchronicity were estimated, at tree and axis scales. They were used to predict tree behavior and detect QTL with a Bayesian pedigree-based analysis, using an integrated genetic map containing 6,849 SNPs. The combination of a Biennial Bearing Index (BBI) with an autoregressive coefficient (γ_g) efficiently predicted and classified the genotype behaviors, despite few misclassifications. Four QTLs common to BBIs and γ_g and one for synchronicity were highlighted and revealed the complex genetic architecture of the traits. Irregularity resulted from high AS synchronism, whereas regularity resulted from either asynchronous locally alternating or continual regular AS flowering. A relevant and time-saving method, based on a posteriori sampling of axes and statistical indices is proposed, which is efficient to evaluate the tree breeding values for flowering regularity and could be transferred to other species.

  9. Cross-Population Joint Analysis of eQTLs: Fine Mapping and Functional Annotation

    PubMed Central

    Wen, Xiaoquan; Luca, Francesca; Pique-Regi, Roger

    2015-01-01

    Mapping expression quantitative trait loci (eQTLs) has been shown to be a powerful tool to uncover the genetic underpinnings of many complex traits at the molecular level. In this paper, we present an integrative analysis approach that leverages eQTL data collected from multiple population groups. In particular, our approach effectively identifies multiple independent cis-eQTL signals that are consistent across populations, accounting for population heterogeneity in allele frequencies and linkage disequilibrium patterns. Furthermore, by integrating genomic annotations, our analysis framework enables high-resolution functional analysis of eQTLs. We applied our statistical approach to analyze the GEUVADIS data consisting of samples from five population groups. From this analysis, we concluded that i) joint analysis across population groups greatly improves the power of eQTL discovery and the resolution of fine mapping of causal eQTLs, ii) many genes harbor multiple independent eQTLs in their cis regions, and iii) genetic variants that disrupt transcription factor binding are significantly enriched in eQTLs (p-value = 4.93 × 10^-22). PMID:25906321

  10. Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models.

    PubMed

    Sul, Jae Hoon; Bilow, Michael; Yang, Wen-Yun; Kostem, Emrah; Furlotte, Nick; He, Dan; Eskin, Eleazar

    2016-03-01

    Although genome-wide association studies (GWASs) have discovered numerous novel genetic variants associated with many complex traits and diseases, those genetic variants typically explain only a small fraction of phenotypic variance. Factors that account for phenotypic variance include environmental factors and gene-by-environment interactions (GEIs). Recently, several studies have conducted genome-wide gene-by-environment association analyses and demonstrated important roles of GEIs in complex traits. One of the main challenges in these association studies is to control effects of population structure that may cause spurious associations. Many studies have analyzed how population structure influences statistics of genetic variants and developed several statistical approaches to correct for population structure. However, the impact of population structure on GEI statistics in GWASs has not been extensively studied, nor have there been methods designed to correct for population structure on GEI statistics. In this paper, we show both analytically and empirically that population structure may cause spurious GEIs and use both simulation and two GWAS datasets to support our finding. We propose a statistical approach based on mixed models to account for population structure on GEI statistics. We find that our approach effectively controls population structure on statistics for GEIs as well as for genetic variants.

  11. Population data of five genetic markers in the Turkish population: comparison with four American population groups.

    PubMed

    Kurtuluş-Ulküer, M; Ulküer, U; Kesici, T; Menevşe, S

    2002-09-01

    In this study, the phenotype and allele frequencies of five enzyme systems were determined in a total of 611 unrelated Turkish individuals and analyzed by using the exact and the chi-square tests. The following five red cell enzymes were identified by cellulose acetate electrophoresis: phosphoglucomutase (PGM), adenosine deaminase (ADA), phosphoglucose isomerase (PGI), adenylate kinase (AK), and 6-phosphogluconate dehydrogenase (6-PGD). The ADA, PGM and AK enzymes were found to be polymorphic in the Turkish population. The results of the statistical analysis showed that the phenotype frequencies of the five enzymes under study are in Hardy-Weinberg equilibrium. Statistical analysis was performed in order to examine whether there are significant differences in the phenotype frequencies between the Turkish population and four American population groups. This analysis showed that there are some statistically significant differences between the Turkish and the other groups. Moreover, the observed phenotype and allele frequencies were compared with those obtained in other population groups of Turkey.
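
    A chi-square test of Hardy-Weinberg equilibrium, one of the two tests named in this abstract, can be sketched for a two-allele locus as follows. This is a generic textbook calculation, not the authors' code; the function name `hwe_chi_square` and the phenotype counts are hypothetical (only the total of 611 individuals comes from the abstract). Expected counts are n*p^2, 2*n*p*q and n*q^2, and the statistic is referred to a chi-square distribution with one degree of freedom.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg proportions.

    Takes observed counts of the three genotypes at a two-allele locus and
    returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("locus is monomorphic; the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts for one enzyme system; they sum to 611 individuals.
chi2, p_value = hwe_chi_square(380, 200, 31)
print(chi2, p_value)
```

    When some expected counts are small, as happens with rare alleles, an exact test is generally preferred over the chi-square approximation.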

  12. An empirical evaluation of genetic distance statistics using microsatellite data from bear (Ursidae) populations.

    PubMed

    Paetkau, D; Waits, L P; Clarkson, P L; Craighead, L; Strobeck, C

    1997-12-01

    A large microsatellite data set from three species of bear (Ursidae) was used to empirically test the performance of six genetic distance measures in resolving relationships at a variety of scales ranging from adjacent areas in a continuous distribution to species that diverged several million years ago. At the finest scale, while some distance measures performed extremely well, statistics developed specifically to accommodate the mutational processes of microsatellites performed relatively poorly, presumably because of the relatively higher variance of these statistics. At the other extreme, no statistic was able to resolve the close sister relationship of polar bears and brown bears from more distantly related pairs of species. This failure is most likely due to constraints on allele distributions at microsatellite loci. At intermediate scales, both within continuous distributions and in comparisons to insular populations of late Pleistocene origin, it was not possible to define the point where linearity was lost for each of the statistics, except that it is clearly lost after relatively short periods of independent evolution. All of the statistics were affected by the amount of genetic diversity within the populations being compared, significantly complicating the interpretation of genetic distance data.

  13. An Empirical Evaluation of Genetic Distance Statistics Using Microsatellite Data from Bear (Ursidae) Populations

    PubMed Central

    Paetkau, D.; Waits, L. P.; Clarkson, P. L.; Craighead, L.; Strobeck, C.

    1997-01-01

    A large microsatellite data set from three species of bear (Ursidae) was used to empirically test the performance of six genetic distance measures in resolving relationships at a variety of scales ranging from adjacent areas in a continuous distribution to species that diverged several million years ago. At the finest scale, while some distance measures performed extremely well, statistics developed specifically to accommodate the mutational processes of microsatellites performed relatively poorly, presumably because of the relatively higher variance of these statistics. At the other extreme, no statistic was able to resolve the close sister relationship of polar bears and brown bears from more distantly related pairs of species. This failure is most likely due to constraints on allele distributions at microsatellite loci. At intermediate scales, both within continuous distributions and in comparisons to insular populations of late Pleistocene origin, it was not possible to define the point where linearity was lost for each of the statistics, except that it is clearly lost after relatively short periods of independent evolution. All of the statistics were affected by the amount of genetic diversity within the populations being compared, significantly complicating the interpretation of genetic distance data. PMID:9409849

  14. A method to estimate the contribution of regional genetic associations to complex traits from summary association statistics.

    PubMed

    Pare, Guillaume; Mao, Shihong; Deng, Wei Q

    2016-06-08

    Despite considerable efforts, known genetic associations only explain a small fraction of predicted heritability. Regional associations combine information from multiple contiguous genetic variants and can improve variance explained at established association loci. However, regional associations are not easily amenable to estimation using summary association statistics because of sensitivity to linkage disequilibrium (LD). We now propose a novel method, LD Adjusted Regional Genetic Variance (LARGV), to estimate phenotypic variance explained by regional associations using summary statistics while accounting for LD. Our method is asymptotically equivalent to a multiple linear regression model when no interaction or haplotype effects are present. It has several applications, such as ranking of genetic regions according to variance explained or comparison of variance explained by two or more regions. Using height and BMI data from the Health Retirement Study (N = 7,776), we show that most genetic variance lies in a small proportion of the genome and that previously identified linkage peaks have higher than expected regional variance.

  15. A method to estimate the contribution of regional genetic associations to complex traits from summary association statistics

    PubMed Central

    Pare, Guillaume; Mao, Shihong; Deng, Wei Q.

    2016-01-01

    Despite considerable efforts, known genetic associations only explain a small fraction of predicted heritability. Regional associations combine information from multiple contiguous genetic variants and can improve variance explained at established association loci. However, regional associations are not easily amenable to estimation using summary association statistics because of sensitivity to linkage disequilibrium (LD). We now propose a novel method, LD Adjusted Regional Genetic Variance (LARGV), to estimate phenotypic variance explained by regional associations using summary statistics while accounting for LD. Our method is asymptotically equivalent to a multiple linear regression model when no interaction or haplotype effects are present. It has several applications, such as ranking of genetic regions according to variance explained or comparison of variance explained by two or more regions. Using height and BMI data from the Health Retirement Study (N = 7,776), we show that most genetic variance lies in a small proportion of the genome and that previously identified linkage peaks have higher than expected regional variance. PMID:27273519

  16. Philopatry drives genetic differentiation in an island archipelago: comparative population genetics of Galapagos Nazca boobies (Sula granti) and great frigatebirds (Fregata minor)

    PubMed Central

    Levin, Iris I; Parker, Patricia G

    2012-01-01

    Seabirds are considered highly mobile, able to fly great distances with few apparent barriers to dispersal. However, it is often the case that seabird populations exhibit strong population genetic structure despite their potential vagility. Here we show that Galapagos Nazca booby (Sula granti) populations are substantially differentiated, even within the small geographic scale of this archipelago. On the other hand, Galapagos great frigatebird (Fregata minor) populations do not show any genetic structure. We characterized the genetic differentiation by sampling five colonies of both species in the Galapagos archipelago and analyzing eight microsatellite loci and three mitochondrial genes. Using an F-statistic approach on the multilocus data, we found significant differentiation between nearly all island pairs of Nazca booby populations and a Bayesian clustering analysis provided support for three distinct genetic clusters. Mitochondrial DNA showed less differentiation of Nazca booby colonies; only Nazca boobies from the island of Darwin were significantly differentiated from individuals throughout the rest of the archipelago. Great frigatebird populations showed little to no evidence for genetic differentiation at the same scale. Only two island pairs (Darwin – Wolf, N. Seymour – Wolf) were significantly differentiated using the multilocus data, and only two island pairs had statistically significant φST values (N. Seymour – Darwin, N. Seymour – Wolf) according to the mitochondrial data. There was no significant pattern of isolation by distance for either species calculated using both markers. Seven of the ten Nazca booby migration rates calculated between island pairs were in the south or southeast to north or northwest direction. The population differentiation found among Galapagos Nazca booby colonies, but not great frigatebird colonies, is most likely due to differences in natal and breeding philopatry. PMID:23170212
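
As a rough illustration of what an F-statistic measures, the sketch below computes the textbook Wright's FST = (HT - HS) / HT for one biallelic locus from per-population allele frequencies. This is an assumption-laden simplification: the abstract does not say which estimator the authors applied to their multilocus microsatellite data, and the function name and example frequencies are hypothetical.

```python
def fst_from_frequencies(freqs, sizes=None):
    """Wright's F_ST = (H_T - H_S) / H_T for a single biallelic locus.

    freqs: frequency of one allele in each population.
    sizes: optional population weights (defaults to equal weights).
    """
    if sizes is None:
        sizes = [1.0] * len(freqs)
    total = sum(sizes)
    weights = [s / total for s in sizes]
    # Expected heterozygosity in the pooled population.
    p_bar = sum(w * p for w, p in zip(weights, freqs))
    h_t = 2 * p_bar * (1 - p_bar)
    # Weighted mean expected heterozygosity within populations.
    h_s = sum(w * 2 * p * (1 - p) for w, p in zip(weights, freqs))
    if h_t == 0:
        raise ValueError("locus is monomorphic across all populations")
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical allele frequencies on two islands.
    print(fst_from_frequencies([0.2, 0.8]))
```

Values near 0 indicate little differentiation between populations and values near 1 indicate nearly fixed differences. Published analyses typically use sample-size-corrected, multilocus estimators rather than this direct formula.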

  17. Analysis of Heritability and Shared Heritability Based on Genome-Wide Association Studies for Thirteen Cancer Types.

    PubMed

    Sampson, Joshua N; Wheeler, William A; Yeager, Meredith; Panagiotou, Orestis; Wang, Zhaoming; Berndt, Sonja I; Lan, Qing; Abnet, Christian C; Amundadottir, Laufey T; Figueroa, Jonine D; Landi, Maria Teresa; Mirabello, Lisa; Savage, Sharon A; Taylor, Philip R; De Vivo, Immaculata; McGlynn, Katherine A; Purdue, Mark P; Rajaraman, Preetha; Adami, Hans-Olov; Ahlbom, Anders; Albanes, Demetrius; Amary, Maria Fernanda; An, She-Juan; Andersson, Ulrika; Andriole, Gerald; Andrulis, Irene L; Angelucci, Emanuele; Ansell, Stephen M; Arici, Cecilia; Armstrong, Bruce K; Arslan, Alan A; Austin, Melissa A; Baris, Dalsu; Barkauskas, Donald A; Bassig, Bryan A; Becker, Nikolaus; Benavente, Yolanda; Benhamou, Simone; Berg, Christine; Van Den Berg, David; Bernstein, Leslie; Bertrand, Kimberly A; Birmann, Brenda M; Black, Amanda; Boeing, Heiner; Boffetta, Paolo; Boutron-Ruault, Marie-Christine; Bracci, Paige M; Brinton, Louise; Brooks-Wilson, Angela R; Bueno-de-Mesquita, H Bas; Burdett, Laurie; Buring, Julie; Butler, Mary Ann; Cai, Qiuyin; Cancel-Tassin, Geraldine; Canzian, Federico; Carrato, Alfredo; Carreon, Tania; Carta, Angela; Chan, John K C; Chang, Ellen T; Chang, Gee-Chen; Chang, I-Shou; Chang, Jiang; Chang-Claude, Jenny; Chen, Chien-Jen; Chen, Chih-Yi; Chen, Chu; Chen, Chung-Hsing; Chen, Constance; Chen, Hongyan; Chen, Kexin; Chen, Kuan-Yu; Chen, Kun-Chieh; Chen, Ying; Chen, Ying-Hsiang; Chen, Yi-Song; Chen, Yuh-Min; Chien, Li-Hsin; Chirlaque, María-Dolores; Choi, Jin Eun; Choi, Yi Young; Chow, Wong-Ho; Chung, Charles C; Clavel, Jacqueline; Clavel-Chapelon, Françoise; Cocco, Pierluigi; Colt, Joanne S; Comperat, Eva; Conde, Lucia; Connors, Joseph M; Conti, David; Cortessis, Victoria K; Cotterchio, Michelle; Cozen, Wendy; Crouch, Simon; Crous-Bou, Marta; Cussenot, Olivier; Davis, Faith G; Ding, Ti; Diver, W Ryan; Dorronsoro, Miren; Dossus, Laure; Duell, Eric J; Ennas, Maria Grazia; Erickson, Ralph L; Feychting, Maria; Flanagan, Adrienne M; Foretova, Lenka; Fraumeni, Joseph F; 
Freedman, Neal D; Beane Freeman, Laura E; Fuchs, Charles; Gago-Dominguez, Manuela; Gallinger, Steven; Gao, Yu-Tang; Gapstur, Susan M; Garcia-Closas, Montserrat; García-Closas, Reina; Gascoyne, Randy D; Gastier-Foster, Julie; Gaudet, Mia M; Gaziano, J Michael; Giffen, Carol; Giles, Graham G; Giovannucci, Edward; Glimelius, Bengt; Goggins, Michael; Gokgoz, Nalan; Goldstein, Alisa M; Gorlick, Richard; Gross, Myron; Grubb, Robert; Gu, Jian; Guan, Peng; Gunter, Marc; Guo, Huan; Habermann, Thomas M; Haiman, Christopher A; Halai, Dina; Hallmans, Goran; Hassan, Manal; Hattinger, Claudia; He, Qincheng; He, Xingzhou; Helzlsouer, Kathy; Henderson, Brian; Henriksson, Roger; Hjalgrim, Henrik; Hoffman-Bolton, Judith; Hohensee, Chancellor; Holford, Theodore R; Holly, Elizabeth A; Hong, Yun-Chul; Hoover, Robert N; Horn-Ross, Pamela L; Hosain, G M Monawar; Hosgood, H Dean; Hsiao, Chin-Fu; Hu, Nan; Hu, Wei; Hu, Zhibin; Huang, Ming-Shyan; Huerta, Jose-Maria; Hung, Jen-Yu; Hutchinson, Amy; Inskip, Peter D; Jackson, Rebecca D; Jacobs, Eric J; Jenab, Mazda; Jeon, Hyo-Sung; Ji, Bu-Tian; Jin, Guangfu; Jin, Li; Johansen, Christoffer; Johnson, Alison; Jung, Yoo Jin; Kaaks, Rudolph; Kamineni, Aruna; Kane, Eleanor; Kang, Chang Hyun; Karagas, Margaret R; Kelly, Rachel S; Khaw, Kay-Tee; Kim, Christopher; Kim, Hee Nam; Kim, Jin Hee; Kim, Jun Suk; Kim, Yeul Hong; Kim, Young Tae; Kim, Young-Chul; Kitahara, Cari M; Klein, Alison P; Klein, Robert J; Kogevinas, Manolis; Kohno, Takashi; Kolonel, Laurence N; Kooperberg, Charles; Kricker, Anne; Krogh, Vittorio; Kunitoh, Hideo; Kurtz, Robert C; Kweon, Sun-Seog; LaCroix, Andrea; Lawrence, Charles; Lecanda, Fernando; Lee, Victor Ho Fun; Li, Donghui; Li, Haixin; Li, Jihua; Li, Yao-Jen; Li, Yuqing; Liao, Linda M; Liebow, Mark; Lightfoot, Tracy; Lim, Wei-Yen; Lin, Chien-Chung; Lin, Dongxin; Lindstrom, Sara; Linet, Martha S; Link, Brian K; Liu, Chenwei; Liu, Jianjun; Liu, Li; Ljungberg, Börje; Lloreta, Josep; Di Lollo, Simonetta; Lu, Daru; Lund, Eiluv; Malats, 
Nuria; Mannisto, Satu; Le Marchand, Loic; Marina, Neyssa; Masala, Giovanna; Mastrangelo, Giuseppe; Matsuo, Keitaro; Maynadie, Marc; McKay, James; McKean-Cowdin, Roberta; Melbye, Mads; Melin, Beatrice S; Michaud, Dominique S; Mitsudomi, Tetsuya; Monnereau, Alain; Montalvan, Rebecca; Moore, Lee E; Mortensen, Lotte Maxild; Nieters, Alexandra; North, Kari E; Novak, Anne J; Oberg, Ann L; Offit, Kenneth; Oh, In-Jae; Olson, Sara H; Palli, Domenico; Pao, William; Park, In Kyu; Park, Jae Yong; Park, Kyong Hwa; Patiño-Garcia, Ana; Pavanello, Sofia; Peeters, Petra H M; Perng, Reury-Perng; Peters, Ulrike; Petersen, Gloria M; Picci, Piero; Pike, Malcolm C; Porru, Stefano; Prescott, Jennifer; Prokunina-Olsson, Ludmila; Qian, Biyun; Qiao, You-Lin; Rais, Marco; Riboli, Elio; Riby, Jacques; Risch, Harvey A; Rizzato, Cosmeri; Rodabough, Rebecca; Roman, Eve; Roupret, Morgan; Ruder, Avima M; Sanjose, Silvia de; Scelo, Ghislaine; Schned, Alan; Schumacher, Fredrick; Schwartz, Kendra; Schwenn, Molly; Scotlandi, Katia; Seow, Adeline; Serra, Consol; Serra, Massimo; Sesso, Howard D; Setiawan, Veronica Wendy; Severi, Gianluca; Severson, Richard K; Shanafelt, Tait D; Shen, Hongbing; Shen, Wei; Shin, Min-Ho; Shiraishi, Kouya; Shu, Xiao-Ou; Siddiq, Afshan; Sierrasesúmaga, Luis; Sihoe, Alan Dart Loon; Skibola, Christine F; Smith, Alex; Smith, Martyn T; Southey, Melissa C; Spinelli, John J; Staines, Anthony; Stampfer, Meir; Stern, Marianna C; Stevens, Victoria L; Stolzenberg-Solomon, Rachael S; Su, Jian; Su, Wu-Chou; Sund, Malin; Sung, Jae Sook; Sung, Sook Whan; Tan, Wen; Tang, Wei; Tardón, Adonina; Thomas, David; Thompson, Carrie A; Tinker, Lesley F; Tirabosco, Roberto; Tjønneland, Anne; Travis, Ruth C; Trichopoulos, Dimitrios; Tsai, Fang-Yu; Tsai, Ying-Huang; Tucker, Margaret; Turner, Jenny; Vajdic, Claire M; Vermeulen, Roel C H; Villano, Danylo J; Vineis, Paolo; Virtamo, Jarmo; Visvanathan, Kala; Wactawski-Wende, Jean; Wang, Chaoyu; Wang, Chih-Liang; Wang, Jiu-Cun; Wang, Junwen; Wei, Fusheng; 
Weiderpass, Elisabete; Weiner, George J; Weinstein, Stephanie; Wentzensen, Nicolas; White, Emily; Witzig, Thomas E; Wolpin, Brian M; Wong, Maria Pik; Wu, Chen; Wu, Guoping; Wu, Junjie; Wu, Tangchun; Wu, Wei; Wu, Xifeng; Wu, Yi-Long; Wunder, Jay S; Xiang, Yong-Bing; Xu, Jun; Xu, Ping; Yang, Pan-Chyr; Yang, Tsung-Ying; Ye, Yuanqing; Yin, Zhihua; Yokota, Jun; Yoon, Ho-Il; Yu, Chong-Jen; Yu, Herbert; Yu, Kai; Yuan, Jian-Min; Zelenetz, Andrew; Zeleniuch-Jacquotte, Anne; Zhang, Xu-Chao; Zhang, Yawei; Zhao, Xueying; Zhao, Zhenhong; Zheng, Hong; Zheng, Tongzhang; Zheng, Wei; Zhou, Baosen; Zhu, Meng; Zucca, Mariagrazia; Boca, Simina M; Cerhan, James R; Ferri, Giovanni M; Hartge, Patricia; Hsiung, Chao Agnes; Magnani, Corrado; Miligi, Lucia; Morton, Lindsay M; Smedby, Karin E; Teras, Lauren R; Vijai, Joseph; Wang, Sophia S; Brennan, Paul; Caporaso, Neil E; Hunter, David J; Kraft, Peter; Rothman, Nathaniel; Silverman, Debra T; Slager, Susan L; Chanock, Stephen J; Chatterjee, Nilanjan

    2015-12-01

    Studies of related individuals have consistently demonstrated notable familial aggregation of cancer. We aim to estimate the heritability and genetic correlation attributable to the additive effects of common single-nucleotide polymorphisms (SNPs) for cancer at 13 anatomical sites. Between 2007 and 2014, the US National Cancer Institute has generated data from genome-wide association studies (GWAS) for 49 492 cancer case patients and 34 131 control patients. We apply novel mixed model methodology (GCTA) to this GWAS data to estimate the heritability of individual cancers, as well as the proportion of heritability attributable to cigarette smoking in smoking-related cancers, and the genetic correlation between pairs of cancers. GWAS heritability was statistically significant at nearly all sites, with the estimates of array-based heritability, h_l^2, on the liability threshold (LT) scale ranging from 0.05 to 0.38. Estimating the combined heritability of multiple smoking characteristics, we calculate that at least 24% (95% confidence interval [CI] = 14% to 37%) and 7% (95% CI = 4% to 11%) of the heritability for lung and bladder cancer, respectively, can be attributed to genetic determinants of smoking. Most pairs of cancers studied did not show evidence of strong genetic correlation. We found only four pairs of cancers with marginally statistically significant correlations, specifically kidney and testes (ρ = 0.73, SE = 0.28), diffuse large B-cell lymphoma (DLBCL) and pediatric osteosarcoma (ρ = 0.53, SE = 0.21), DLBCL and chronic lymphocytic leukemia (CLL) (ρ = 0.51, SE = 0.18), and bladder and lung (ρ = 0.35, SE = 0.14). Correlation analysis also indicates that the genetic architecture of lung cancer differs between a smoking population of European ancestry and a nonsmoking Asian population, allowing for the possibility that the genetic etiology for the same disease can vary by population and environmental exposures. Our results provide important insights into the genetic architecture of cancers and suggest new avenues for investigation. Published by Oxford University Press 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  18. The fine-scale genetic structure and evolution of the Japanese population

    PubMed Central

    Katsuya, Tomohiro; Kimura, Ryosuke; Nabika, Toru; Isomura, Minoru; Ohkubo, Takayoshi; Tabara, Yasuharu; Yamamoto, Ken; Yokota, Mitsuhiro; Liu, Xuanyao; Saw, Woei-Yuh; Mamatyusupu, Dolikun; Yang, Wenjun; Xu, Shuhua

    2017-01-01

    The contemporary Japanese populations largely consist of three genetically distinct groups—Hondo, Ryukyu and Ainu. By principal-component analysis, while the three groups can be clearly separated, the Hondo people, comprising 99% of the Japanese, form one almost indistinguishable cluster. To understand fine-scale genetic structure, we applied powerful haplotype-based statistical methods to genome-wide single nucleotide polymorphism data from 1600 Japanese individuals, sampled from eight distinct regions in Japan. We then combined the Japanese data with 26 other Asian populations data to analyze the shared ancestry and genetic differentiation. We found that the Japanese could be separated into nine genetic clusters in our dataset, showing a marked concordance with geography; and that major components of ancestry profile of Japanese were from the Korean and Han Chinese clusters. We also detected and dated admixture in the Japanese. While genetic differentiation between Ryukyu and Hondo was suggested to be caused in part by positive selection, genetic differentiation among the Hondo clusters appeared to result principally from genetic drift. Notably, in Asians, we found the possibility that positive selection accentuated genetic differentiation among distant populations but attenuated genetic differentiation among close populations. These findings are significant for studies of human evolution and medical genetics. PMID:29091727

  19. Spatial genetic structure of the cyprinid fish Onychostoma lepturum on Hainan Island.

    PubMed

    Zhou, Tian-Qi; Lin, Hung-Du; Hsu, Kui-Ching; Kuo, Po-Hsun; Wang, Wei-Kuang; Tang, Wen-Qiao; Liu, Dong; Yang, Jin-Quan

    2017-11-01

    Population genetic structure of Onychostoma lepturum on Hainan Island was investigated based on the mitochondrial CR + cyt b region in 63 specimens collected from four populations. Population analyses indicated significant genetic structure (FST = 0.749) and displayed a significant relationship between phylogeny and geography (NST = 0.750 and GST = 0.140). Thirty-one mtDNA haplotypes were classified into four lineages, and these lineages had an almost allopatric distribution. The results of a statistical dispersal-vicariance analysis suggest that the ancestral populations were distributed widely on Hainan Island, and the rising of the central mountainous area of Hainan Island, the Wuzhi and Yinggeling Mountain Range, separated these four drainages into independent lineages. According to a spatial analysis of molecular variance, we divided these populations into three units: ND, CH and WQ + LS, running into Qiongzhou Strait, the Gulf of Tonkin and the South China Sea, respectively. According to our study, the exposure of straits and shelf under water retreat gave chances for population dispersion during the glaciations.

  20. Genetic diversity analysis of Blastocystis subtypes from both symptomatic and asymptomatic subjects using a barcoding region from the 18S rRNA gene.

    PubMed

    Rezaei Riabi, Tahereh; Mirjalali, Hamed; Haghighi, Ali; Rostami Nejad, Mohammad; Pourhoseingholi, Mohammad Amin; Poirier, Philippe; Delbac, Frederic; Wawrzyniak, Ivan; Zali, Mohammad Reza

    2018-07-01

    Blastocystis is the most prevalent protozoan found in human stool samples. This study aimed to evaluate genetic diversity among Blastocystis subtypes isolated from both symptomatic and asymptomatic subjects as well as the potential correlation between subtypes and symptoms. A total of 55 Blastocystis-positive isolates were included in this study. A barcoding region of the small subunit rDNA was amplified and genetically assessed using MEGA6 and DnaSP regarding the presence of symptoms. BLAST analyses revealed the presence of 5 different subtypes (ST1, ST2, ST3, ST6 and ST7) among the samples. ST3 was the most prevalent subtype (25/55, 45%) while only one ST7 isolate was detected. Moreover, alleles 4 and 86 for ST1; alleles 9, 11 and 12 for ST2; alleles 31, 34, 36, 37 and 52 for ST3; allele 122 for ST6 and allele 137 for ST7 were detected. No statistically significant association was found between gender and symptoms with certain subtypes. Analysis of the intra-subtype variability in both symptomatic and asymptomatic subjects revealed highest similarity among ST1 isolates while lowest similarity was seen among ST3 isolates. Neutrality indices, Tajima's D and Fu's Fs, were negative but only statistically significant for ST3. Furthermore, highest values of Hd, π and S were observed among ST1, ST2 and ST3 isolated from symptomatic patients, indicating a high level of diversity among isolates obtained from these subjects. In addition, inter-subtype analysis showed the highest similarity between ST1 and ST2 isolates and the lowest similarity between ST2 and ST7 isolates. This is the first study revealing the presence of both ST6 and ST7 isolates in humans from Iran. Phylogenetic analysis did not suggest any significant correlation between clinical manifestations and certain subtypes although genetic analysis showed highest value of diversity and significant neutrality indices among ST3 isolates obtained from symptomatic patients. Copyright © 2018 Elsevier B.V. All rights reserved.

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Park, J.F.

    Research in the biomedical sciences at PNL is described. Activities reported include: inhaled plutonium in dogs; national radiobiology archives; statistical analysis of data from animal studies; genotoxicity of inhaled energy effluents; molecular events during tumor initiation; biochemistry of free radical induced DNA damage; radon hazards in homes; mechanisms of radon injury; genetics of radon induced lung cancer; and in vivo/in vitro radon induced cellular damage.

  2. Utilization of Lymphoblastoid Cell Lines as a System for the Molecular Modeling of Autism

    ERIC Educational Resources Information Center

    Baron, Colin A.; Liu, Stephenie Y.; Hicks, Chindo; Gregg, Jeffrey P.

    2006-01-01

    In order to provide an alternative approach for understanding the biology and genetics of autism, we performed statistical analysis of gene expression profiles of lymphoblastoid cell lines derived from children with autism and their families. The goal was to assess the feasibility of using this model in identifying autism-associated genes.…

  3. Inferring Demographic History Using Two-Locus Statistics.

    PubMed

    Ragsdale, Aaron P; Gutenkunst, Ryan N

    2017-06-01

    Population demographic history may be learned from contemporary genetic variation data. Methods based on aggregating the statistics of many single loci into an allele frequency spectrum (AFS) have proven powerful, but such methods ignore potentially informative patterns of linkage disequilibrium (LD) between neighboring loci. To leverage such patterns, we developed a composite-likelihood framework for inferring demographic history from aggregated statistics of pairs of loci. Using this framework, we show that two-locus statistics are more sensitive to demographic history than single-locus statistics such as the AFS. In particular, two-locus statistics escape the notorious confounding of depth and duration of a bottleneck, and they provide a means to estimate effective population size based on the recombination rather than mutation rate. We applied our approach to a Zambian population of Drosophila melanogaster. Notably, using both single- and two-locus statistics, we inferred a substantially lower ancestral effective population size than previous works and did not infer a bottleneck history. Together, our results demonstrate the broad potential for two-locus statistics to enable powerful population genetic inference. Copyright © 2017 by the Genetics Society of America.

  4. An entropy-based statistic for genomewide association studies.

    PubMed

    Zhao, Jinying; Boerwinkle, Eric; Xiong, Momiao

    2005-07-01

    Efficient genotyping methods and the availability of a large collection of single-nucleotide polymorphisms provide valuable tools for genetic studies of human disease. The standard chi-squared statistic for case-control studies, which uses a linear function of allele frequencies, has limited power when the number of marker loci is large. We introduce a novel test statistic for genetic association studies that uses Shannon entropy and a nonlinear function of allele frequencies to amplify the differences in allele and haplotype frequencies to maintain statistical power with large numbers of marker loci. We investigate the relationship between the entropy-based test statistic and the standard chi-squared statistic and show that, in most cases, the power of the entropy-based statistic is greater than that of the standard chi-squared statistic. The distribution of the entropy-based statistic and the type I error rates are validated using simulation studies. Finally, we apply the new entropy-based test statistic to two real data sets, one for the COMT gene and schizophrenia and one for the MMP-2 gene and esophageal carcinoma, to evaluate the performance of the new method for genetic association studies. The results show that the entropy-based statistic obtained smaller P values than did the standard chi-squared statistic.
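
The two ingredients named in this abstract can be sketched separately: the standard Pearson chi-squared statistic on a 2 x k table of allele counts, and the Shannon entropy of an allele-frequency vector. This sketch does not reproduce the authors' entropy-based test statistic, whose exact form is not given in the abstract. The function names and counts are hypothetical.

```python
import math


def shannon_entropy(freqs):
    """Shannon entropy (natural log) of a frequency vector; zero entries are skipped."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-squared statistic for a 2 x k table of allele counts.

    Alleles that are absent from both groups are ignored.
    """
    n_case = sum(case_counts)
    n_ctrl = sum(control_counts)
    n = n_case + n_ctrl
    chi2 = 0.0
    for a, b in zip(case_counts, control_counts):
        col = a + b
        if col == 0:
            continue
        e_case = n_case * col / n  # expected count in cases under no association
        e_ctrl = n_ctrl * col / n  # expected count in controls under no association
        chi2 += (a - e_case) ** 2 / e_case + (b - e_ctrl) ** 2 / e_ctrl
    return chi2


if __name__ == "__main__":
    cases, controls = [60, 40], [40, 60]  # hypothetical allele counts
    print(allelic_chi_square(cases, controls))
    print(shannon_entropy([c / sum(cases) for c in cases]))
```

The chi-squared statistic compares allele counts linearly, whereas the entropy terms -p log p are nonlinear in the frequencies, which is the contrast the abstract draws.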

  5. Discovering genetic variants in Crohn's disease by exploring genomic regions enriched of weak association signals.

    PubMed

    D'Addabbo, Annarita; Palmieri, Orazio; Maglietta, Rosalia; Latiano, Anna; Mukherjee, Sayan; Annese, Vito; Ancona, Nicola

    2011-08-01

    A meta-analysis has re-analysed previous genome-wide association scans, definitively confirming eleven genes and further identifying 21 new loci. However, the identified genes/loci still explain only a minority of the genetic predisposition to Crohn's disease. The aim was to identify genes weakly involved in disease predisposition by analysing chromosomal regions enriched in single nucleotide polymorphisms with modest statistical association. We utilized the WTCCC data set, evaluating 1748 CD cases and 2938 controls. The identification of candidate genes/loci was performed by a two-step procedure: first, chromosomal regions enriched in weak association signals were localized; subsequently, weak signals clustered in gene regions were identified. The statistical significance was assessed by non-parametric permutation tests. The cytoband enrichment analysis highlighted 44 regions (P≤0.05) enriched with single nucleotide polymorphisms significantly associated with the trait, including 23 out of 31 previously confirmed and replicated genes. Importantly, we highlight a further 20 novel chromosomal regions carrying approximately one hundred genes/loci with modest association. Amongst these we find compelling functional candidate genes such as MAPT, GRB2 and CREM, LCT, and IL12RB2. Our study suggests a different statistical perspective to discover genes weakly associated with a given trait, although further confirmatory functional studies are needed. Copyright © 2011 Editrice Gastroenterologica Italiana S.r.l. All rights reserved.

  6. Bridging ImmunoGenomic Data Analysis Workflow Gaps (BIGDAWG): An integrated case-control analysis pipeline.

    PubMed

    Pappas, Derek J; Marin, Wesley; Hollenbach, Jill A; Mack, Steven J

    2016-03-01

    Bridging ImmunoGenomic Data-Analysis Workflow Gaps (BIGDAWG) is an integrated data-analysis pipeline designed for the standardized analysis of highly-polymorphic genetic data, specifically for the HLA and KIR genetic systems. Most modern genetic analysis programs are designed for the analysis of single nucleotide polymorphisms, but the highly polymorphic nature of HLA and KIR data requires specialized methods of data analysis. BIGDAWG performs case-control data analyses of highly polymorphic genotype data characteristic of the HLA and KIR loci. BIGDAWG performs tests for Hardy-Weinberg equilibrium, calculates allele frequencies and bins low-frequency alleles for k×2 and 2×2 chi-squared tests, and calculates odds ratios, confidence intervals and p-values for each allele. When multi-locus genotype data are available, BIGDAWG estimates user-specified haplotypes and performs the same binning and statistical calculations for each haplotype. For the HLA loci, BIGDAWG performs the same analyses at the individual amino-acid level. Finally, BIGDAWG generates figures and tables for each of these comparisons. BIGDAWG obviates the error-prone reformatting needed to traffic data between multiple programs, and streamlines and standardizes the data-analysis process for case-control studies of highly polymorphic data. BIGDAWG has been implemented as the bigdawg R package and as a free web application at bigdawg.immunogenomics.org. Copyright © 2015 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.
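
The per-allele odds ratio and confidence interval described here can be sketched for a single 2×2 table as follows. This is a stand-alone illustration using the common log-odds (Woolf) interval and a 0.5 continuity correction for empty cells. It is not the API or the exact procedure of the bigdawg package, and the function name and counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% confidence interval for a 2x2 table.

    a: case alleles of the tested type      b: other case alleles
    c: control alleles of the tested type   d: other control alleles
    """
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction: add 0.5 to every cell when any cell is empty.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: allele seen 20 times in cases and 10 times in controls.
    print(odds_ratio_ci(20, 80, 10, 90))
```

An interval that includes 1 means the data do not exclude equal odds in cases and controls at the chosen level. For highly polymorphic loci this calculation would be repeated for each allele, or each bin of rare alleles, against all others.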

  7. Chemical fingerprints encode mother–offspring similarity, colony membership, relatedness, and genetic quality in fur seals

    PubMed Central

    Stoffel, Martin A.; Caspers, Barbara A.; Forcada, Jaume; Giannakara, Athina; Baier, Markus; Eberhart-Phillips, Luke; Müller, Caroline; Hoffman, Joseph I.

    2015-01-01

    Chemical communication underpins virtually all aspects of vertebrate social life, yet remains poorly understood because of its highly complex mechanistic basis. We therefore used chemical fingerprinting of skin swabs and genetic analysis to explore the chemical cues that may underlie mother–offspring recognition in colonially breeding Antarctic fur seals. By sampling mother–offspring pairs from two different colonies, using a variety of statistical approaches and genotyping a large panel of microsatellite loci, we show that colony membership, mother–offspring similarity, heterozygosity, and genetic relatedness are all chemically encoded. Moreover, chemical similarity between mothers and offspring reflects a combination of genetic and environmental influences, the former partly encoded by substances resembling known pheromones. Our findings reveal the diversity of information contained within chemical fingerprints and have implications for understanding mother–offspring communication, kin recognition, and mate choice. PMID:26261311

  8. Genetic mixed linear models for twin survival data.

    PubMed

    Ha, Il Do; Lee, Youngjo; Pawitan, Yudi

    2007-07-01

    Twin studies are useful for assessing the importance of the genetic, or heritable, component relative to the environmental component. In this paper we develop a methodology to study the heritability of age-at-onset or lifespan traits, with application to analysis of twin survival data. Due to the limited period of observation, the data can be left truncated and right censored (LTRC). Under the LTRC setting we propose a genetic mixed linear model, which allows general fixed predictors and random components to capture genetic and environmental effects. Inferences are based upon the hierarchical-likelihood (h-likelihood), which provides a statistically efficient and unified framework for various mixed-effect models. We also propose a simple and fast computation method for dealing with large data sets. The method is illustrated by the survival data from the Swedish Twin Registry. Finally, a simulation study is carried out to evaluate its performance.

  9. Genetic control of residual variance of yearling weight in Nellore beef cattle.

    PubMed

    Iung, L H S; Neves, H H R; Mulder, H A; Carvalheiro, R

    2017-04-01

    There is evidence for genetic variability in residual variance of livestock traits, which offers the potential for selection for increased uniformity of production. Different statistical approaches have been employed to study this topic; however, little is known about the concordance between them. The aim of our study was to investigate the genetic heterogeneity of residual variance on yearling weight (YW; 291.15 ± 46.67) in a Nellore beef cattle population; to compare the results of the statistical approaches, the two-step approach and the double hierarchical generalized linear model (DHGLM); and to evaluate the effectiveness of power transformation to accommodate scale differences. The comparison was based on genetic parameters, accuracy of EBV for residual variance, and cross-validation to assess predictive performance of both approaches. A total of 194,628 yearling weight records from 625 sires were used in the analysis. The results supported the hypothesis of genetic heterogeneity of residual variance on YW in Nellore beef cattle and the opportunity of selection, measured through the genetic coefficient of variation of residual variance (0.10 to 0.12 for the two-step approach and 0.17 for DHGLM, using an untransformed data set). However, low estimates of genetic variance associated with positive genetic correlations between mean and residual variance (about 0.20 for two-step and 0.76 for DHGLM for an untransformed data set) limit the genetic response to selection for uniformity of production while simultaneously increasing YW itself. Moreover, large sire families are needed to obtain accurate estimates of genetic merit for residual variance, as indicated by the low heritability estimates (<0.007). Box-Cox transformation was able to decrease the dependence of the variance on the mean and decreased the estimates of genetic parameters for residual variance. The transformation reduced but did not eliminate all the genetic heterogeneity of residual variance, highlighting its presence beyond the scale effect. The DHGLM showed higher predictive ability of EBV for residual variance and therefore should be preferred over the two-step approach.

  10. Expression quantitative trait loci and genetic regulatory network analysis reveals that Gabra2 is involved in stress responses in the mouse.

    PubMed

    Dai, Jiajuan; Wang, Xusheng; Chen, Ying; Wang, Xiaodong; Zhu, Jun; Lu, Lu

    2009-11-01

    Previous studies have revealed that the subunit alpha 2 (Gabra2) of the gamma-aminobutyric acid receptor plays a critical role in the stress response. However, little is known about the genetic regulatory network for Gabra2 and the stress response. We combined gene expression microarray analysis and quantitative trait loci (QTL) mapping to characterize the genetic regulatory network for Gabra2 expression in the hippocampus of BXD recombinant inbred (RI) mice. Our analysis found that the expression level of Gabra2 exhibited much variation in the hippocampus across the BXD RI strains and between the parental strains, C57BL/6J and DBA/2J. Expression QTL (eQTL) mapping showed three microarray probe sets of Gabra2 to have highly significant linkage likelihood ratio statistic (LRS) scores. Gene co-regulatory network analysis showed that 10 genes, including Gria3, Chka, Drd3, Homer1, Grik2, Odz4, Prkag2, Grm5, Gabrb1, and Nlgn1, are directly or indirectly associated with stress responses. Eleven genes were implicated as Gabra2 downstream genes through mapping joint modulation. The genetical genomics approach demonstrates the importance and the potential power of the eQTL studies in identifying genetic regulatory networks that contribute to complex traits, such as stress responses.

  11. Population Structure, Genetic Diversity and Molecular Marker-Trait Association Analysis for High Temperature Stress Tolerance in Rice

    PubMed Central

    Barik, Saumya Ranjan; Sahoo, Ambika; Mohapatra, Sudipti; Nayak, Deepak Kumar; Mahender, Anumalla; Meher, Jitandriya; Anandan, Annamalai

    2016-01-01

    Rice exhibits enormous genetic diversity, population structure and molecular marker-traits associated with abiotic stress tolerance to high temperature stress. A set of breeding lines and landraces representing 240 germplasm lines were studied. Based on spikelet fertility percent under high temperature, tolerant genotypes were broadly classified into four classes. Genetic diversity indicated a moderate level of genetic base of the population for the trait studied. Wright’s F statistic estimates showed a deviation of Hardy-Weinberg expectation in the population. The analysis of molecular variance revealed 25 percent variation between population, 61 percent among individuals and 14 percent within individuals in the set. The STRUCTURE analysis categorized the entire population into three sub-populations and suggested that most of the landraces in each sub-population had a common primary ancestor with few admix individuals. The composition of materials in the panel showed the presence of many QTLs representing the entire genome for the expression of tolerance. The strongly associated marker RM547 tagged with spikelet fertility under stress and the markers like RM228, RM205, RM247, RM242, INDEL3 and RM314 indirectly controlling the high temperature stress tolerance were detected through both mixed linear model and general linear model TASSEL analysis. These markers can be deployed as a resource for marker-assisted breeding program of high temperature stress tolerance. PMID:27494320

  12. Population Structure, Genetic Diversity and Molecular Marker-Trait Association Analysis for High Temperature Stress Tolerance in Rice.

    PubMed

    Pradhan, Sharat Kumar; Barik, Saumya Ranjan; Sahoo, Ambika; Mohapatra, Sudipti; Nayak, Deepak Kumar; Mahender, Anumalla; Meher, Jitandriya; Anandan, Annamalai; Pandit, Elssa

    2016-01-01

    Rice exhibits enormous genetic diversity, population structure and molecular marker-traits associated with abiotic stress tolerance to high temperature stress. A set of breeding lines and landraces representing 240 germplasm lines were studied. Based on spikelet fertility percent under high temperature, tolerant genotypes were broadly classified into four classes. Genetic diversity indicated a moderate level of genetic base of the population for the trait studied. Wright's F statistic estimates showed a deviation of Hardy-Weinberg expectation in the population. The analysis of molecular variance revealed 25 percent variation between population, 61 percent among individuals and 14 percent within individuals in the set. The STRUCTURE analysis categorized the entire population into three sub-populations and suggested that most of the landraces in each sub-population had a common primary ancestor with few admix individuals. The composition of materials in the panel showed the presence of many QTLs representing the entire genome for the expression of tolerance. The strongly associated marker RM547 tagged with spikelet fertility under stress and the markers like RM228, RM205, RM247, RM242, INDEL3 and RM314 indirectly controlling the high temperature stress tolerance were detected through both mixed linear model and general linear model TASSEL analysis. These markers can be deployed as a resource for marker-assisted breeding program of high temperature stress tolerance.

  13. Quantitative analysis of fetal facial morphology using 3D ultrasound and statistical shape modeling: a feasibility study.

    PubMed

    Dall'Asta, Andrea; Schievano, Silvia; Bruse, Jan L; Paramasivam, Gowrishankar; Kaihura, Christine Tita; Dunaway, David; Lees, Christoph C

    2017-07-01

    The antenatal detection of facial dysmorphism using 3-dimensional ultrasound may raise the suspicion of an underlying genetic condition but infrequently leads to a definitive antenatal diagnosis. Despite advances in array and noninvasive prenatal testing, not all genetic conditions can be ascertained from such testing. The aim of this study was to investigate the feasibility of quantitative assessment of fetal face features using prenatal 3-dimensional ultrasound volumes and statistical shape modeling. STUDY DESIGN: Thirteen normal and 7 abnormal stored 3-dimensional ultrasound fetal face volumes were analyzed, at a median gestation of 29+4 weeks (25+0 to 36+1). The 20 3-dimensional surface meshes generated were aligned and served as input for a statistical shape model, which computed the mean 3-dimensional face shape and 3-dimensional shape variations using principal component analysis. Ten shape modes explained more than 90% of the total shape variability in the population. While the first mode accounted for overall size differences, the second highlighted shape feature changes from an overall proportionate toward a more asymmetric face shape with a wide prominent forehead and an undersized, posteriorly positioned chin. Analysis of the Mahalanobis distance in principal component analysis shape space suggested differences between normal and abnormal fetuses (median and interquartile range distance values, 7.31 ± 5.54 for the normal group vs 13.27 ± 9.82 for the abnormal group) (P = .056). This feasibility study demonstrates that objective characterization and quantification of fetal facial morphology is possible from 3-dimensional ultrasound. This technique has the potential to assist in utero diagnosis, particularly of rare conditions in which facial dysmorphology is a feature. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Genealogical analyses of rabies virus strains from Brazil based on N gene alleles.

    PubMed Central

    Heinemann, M. B.; Fernandes-Matioli, F. M. C.; Cortez, A.; Soares, R. M.; Sakamoto, S. M.; Bernardi, F.; Ito, F. H.; Madeira, A. M. B. N.; Richtzenhain, L. J.

    2002-01-01

    Thirty rabies virus isolates from cows and vampire bats from different regions of São Paulo State, Southeastern Brazil and three rabies vaccines were studied genetically. The analysis was based on direct sequencing of PCR-amplified products of 600 nucleotides coding for the amino terminus of nucleoprotein gene. The sequences were checked to verify their genealogical and evolutionary relationships and possible implication for health programmes. Statistical data indicated that there were no significant genetic differences between samples isolated from distinct hosts, from different geographical regions and between samples collected in the last two decades. According to the HKA test, the variability observed in the sequences is probably due to genetic drift. Since changes in genetic material may produce modifications in the protein responsible for immunogenicity of virus, which may eventually cause vaccine failure in herds, we suggest that continuous efforts in monitoring genetic diversity in rabies virus field strains, in relation to vaccine strains, must be conducted. PMID:12113496

  15. Assessment of diversity among populations of Rauvolfia serpentina Benth. Ex. Kurtz. from Southern Western Ghats of India, based on chemical profiling, horticultural traits and RAPD analysis.

    PubMed

    Nair, Vadakkemuriyil Divya; Raj, Rajan Pillai Dinesh; Panneerselvam, Rajaram; Gopi, Ragupathi

    2014-01-01

    Genetic, morphological and chemical variations of ten natural populations of Rauvolfia serpentina Benth. Ex. Kurtz. from Southern Western Ghats of India were assessed using RAPD markers, reserpine content and morphological traits. An estimate of genetic diversity and differentiation between genotypes of breeding germplasm is of key importance for its improvement. Populations were collected from different geographical regions. Data obtained through three different methods were compared and the correlation among them was estimated. Statistical analysis showed significant differences for all horticultural characteristics among the accessions, suggesting that selection for relevant characteristics could be possible. Reserpine content ranged from 0.192 g/100 g (population from Tusharagiri) to 1.312 g/100 g (population from Aryankavu). RAPDs revealed high diversity within populations and high genetic differentiation among them, attributed both to habitat fragmentation with the small size of most populations and to the low level of gene flow among them. The UPGMA dendrogram and PCA based on reserpine content yielded higher separation among populations, indicating specific adaptation of populations into clusters, each including populations close to their geographical origin. Genetic, chemical and morphological data were correlated based on the Mantel test. Given the high differentiation among populations, conservation strategies should take into account genetic diversity and chemical variation levels in relation to the bioclimatic and geographic location of populations. Our results also indicate that the RAPD approach along with horticultural analysis seemed to be best suited for assessing with high accuracy the genetic relationships among distinct R. serpentina accessions. © 2013.

  16. GWAMA: software for genome-wide association meta-analysis.

    PubMed

    Mägi, Reedik; Morris, Andrew P

    2010-05-28

    Despite the recent success of genome-wide association studies in identifying novel loci contributing effects to complex human traits, such as type 2 diabetes and obesity, much of the genetic component of variation in these phenotypes remains unexplained. One way to improve power to detect further novel loci is through meta-analysis of studies from the same population, increasing the sample size over any individual study. Although statistical software analysis packages incorporate routines for meta-analysis, they are ill equipped to meet the challenges of the scale and complexity of data generated in genome-wide association studies. We have developed flexible, open-source software for the meta-analysis of genome-wide association studies. The software incorporates a variety of error trapping facilities, and provides a range of meta-analysis summary statistics. The software is distributed with scripts that allow simple formatting of files containing the results of each association study and generate graphical summaries of genome-wide meta-analysis results. The GWAMA (Genome-Wide Association Meta-Analysis) software has been developed to perform meta-analysis of summary statistics generated from genome-wide association studies of dichotomous phenotypes or quantitative traits. Software with source files, documentation and example data files is freely available online at http://www.well.ox.ac.uk/GWAMA.
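
    Illustrative note: the abstract above does not say which estimators GWAMA uses, so the following is only a minimal sketch of a textbook inverse-variance fixed-effect meta-analysis for a single variant, not GWAMA's own code. The function name and the input layout (per-study effect sizes and standard errors) are assumptions made for this example.

```python
import math


def fixed_effect_meta(betas, ses):
    """Combine per-study effect sizes for one variant (hypothetical helper).

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   matching standard errors
    Returns (pooled beta, pooled SE, z score, two-sided p, Cochran's Q).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and equally long")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q measures between-study heterogeneity.
    q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    return pooled_beta, pooled_se, z, p_value, q


if __name__ == "__main__":
    print(fixed_effect_meta([0.10, 0.30], [0.10, 0.10]))
```

    With equal standard errors the pooled estimate reduces to the plain mean of the study effects; unequal standard errors shift it toward the more precise studies.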

  17. [Genetic polymorphism of Tulipa gesneriana L. evaluated on the basis of the ISSR marking data].

    PubMed

    Kashin, A S; Kritskaya, T A; Schanzer, I A

    2016-10-01

    Using the method of ISSR analysis, the genetic diversity of 18 natural populations of Tulipa gesneriana L. from the north of the Lower Volga region was examined. The ten ISSR primers used in the study provided identification of 102 PCR fragments, of which 50 were polymorphic (49.0%). According to the proportion of polymorphic markers, two population groups were distinguished: (1) the populations in which the proportion of polymorphic markers ranged from 0.35 to 0.41; (2) the populations in which the proportion of polymorphic markers ranged from 0.64 to 0.85. UPGMA clustering analysis provided subdivision of the sample into two large clusters. The unrooted tree constructed using the Neighbor Joining algorithm had similar topology. The first cluster included slightly variable populations and the second cluster included highly variable populations. The AMOVA analysis showed statistically significant differences (F(CT) = 0.430; p = 0.000) between the two groups. Local populations are considerably genetically differentiated from each other (F(ST) = 0.632) and have almost no links via modern gene flow, as evidenced by the results of the Mantel test (r = -0.118; p = 0.819). It is suggested that the degree of genetic similarities and differences between the populations depends on the time and the species dispersal patterns on these territories.
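
    Illustrative note: the Mantel test cited in the abstract above correlates two distance matrices (for example genetic and geographic distances) and judges significance by permutation. The sketch below is a generic one-sided version in plain Python, not the procedure the authors ran; the function names, the default of 999 permutations and the fixed seed are assumptions chosen for this example.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def upper_triangle(matrix, order):
    """Off-diagonal upper-triangle entries after relabelling rows/columns."""
    n = len(order)
    return [matrix[order[i]][order[j]]
            for i in range(n) for j in range(i + 1, n)]


def mantel_test(d1, d2, permutations=999, seed=0):
    """One-sided Mantel test: is d1 positively correlated with d2?

    d1, d2: square symmetric distance matrices as nested lists.
    Returns (observed r, permutation p-value).
    """
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)
        # Rows and columns of d2 are permuted together to keep it a valid matrix.
        if pearson(x, upper_triangle(d2, perm)) >= r_obs:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)


if __name__ == "__main__":
    sites = [0, 1, 3, 7, 15, 31]
    dist = [[abs(a - b) for b in sites] for a in sites]
    print(mantel_test(dist, dist))
```

    A result such as r = -0.118 with p = 0.819, as reported in the abstract, corresponds to an observed correlation that most random relabellings match or exceed.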

  18. Cross-ethnic meta-analysis identifies association of the GPX3-TNIP1 locus with amyotrophic lateral sclerosis.

    PubMed

    Benyamin, Beben; He, Ji; Zhao, Qiongyi; Gratten, Jacob; Garton, Fleur; Leo, Paul J; Liu, Zhijun; Mangelsdorf, Marie; Al-Chalabi, Ammar; Anderson, Lisa; Butler, Timothy J; Chen, Lu; Chen, Xiang-Ding; Cremin, Katie; Deng, Hong-Weng; Devine, Matthew; Edson, Janette; Fifita, Jennifer A; Furlong, Sarah; Han, Ying-Ying; Harris, Jessica; Henders, Anjali K; Jeffree, Rosalind L; Jin, Zi-Bing; Li, Zhongshan; Li, Ting; Li, Mengmeng; Lin, Yong; Liu, Xiaolu; Marshall, Mhairi; McCann, Emily P; Mowry, Bryan J; Ngo, Shyuan T; Pamphlett, Roger; Ran, Shu; Reutens, David C; Rowe, Dominic B; Sachdev, Perminder; Shah, Sonia; Song, Sharon; Tan, Li-Jun; Tang, Lu; van den Berg, Leonard H; van Rheenen, Wouter; Veldink, Jan H; Wallace, Robyn H; Wheeler, Lawrie; Williams, Kelly L; Wu, Jinyu; Wu, Xin; Yang, Jian; Yue, Weihua; Zhang, Zong-Hong; Zhang, Dai; Noakes, Peter G; Blair, Ian P; Henderson, Robert D; McCombe, Pamela A; Visscher, Peter M; Xu, Huji; Bartlett, Perry F; Brown, Matthew A; Wray, Naomi R; Fan, Dongsheng

    2017-09-20

    Cross-ethnic genetic studies can leverage power from differences in disease epidemiology and population-specific genetic architecture. In particular, the differences in linkage disequilibrium and allele frequency patterns across ethnic groups may increase gene-mapping resolution. Here we use cross-ethnic genetic data in sporadic amyotrophic lateral sclerosis (ALS), an adult-onset, rapidly progressing neurodegenerative disease. We report analyses of novel genome-wide association study data of 1,234 ALS cases and 2,850 controls. We find a significant association of rs10463311 spanning GPX3-TNIP1 with ALS (p = 1.3 × 10^-8), with replication support from two independent Australian samples (combined 576 cases and 683 controls, p = 1.7 × 10^-3). Both GPX3 and TNIP1 interact with other known ALS genes (SOD1 and OPTN, respectively). In addition, GGNBP2 was identified using gene-based analysis and summary statistics-based Mendelian randomization analysis, although further replication is needed to confirm this result. Our results increase our understanding of genetic aetiology of ALS. Amyotrophic lateral sclerosis (ALS) is a rapidly progressing neurodegenerative disease. Here, Wray and colleagues identify association of the GPX3-TNIP1 locus with ALS using cross-ethnic meta-analyses.

  19. Heritabilities of Facial Measurements and Their Latent Factors in Korean Families

    PubMed Central

    Kim, Hyun-Jin; Im, Sun-Wha; Jargal, Ganchimeg; Lee, Siwoo; Yi, Jae-Hyuk; Park, Jeong-Yeon; Sung, Joohon; Cho, Sung-Il; Kim, Jong-Yeol; Kim, Jong-Il; Seo, Jeong-Sun

    2013-01-01

    Genetic studies on facial morphology targeting healthy populations are fundamental in understanding the specific genetic influences involved; yet, most studies to date, if not all, have been focused on congenital diseases accompanied by facial anomalies. To study the specific genetic cues determining facial morphology, we estimated familial correlations and heritabilities of 14 facial measurements and 3 latent factors inferred from a factor analysis in a subset of the Korean population. The study included a total of 229 individuals from 38 families. We evaluated a total of 14 facial measurements using 2D digital photographs. We performed factor analysis to infer common latent variables. The heritabilities of 13 facial measurements were statistically significant (p < 0.05) and ranged from 0.25 to 0.61. Of these, the heritability of intercanthal width in the orbital region was found to be the highest (h² = 0.61, SE = 0.14). Three factors (lower face portion, orbital region, and vertical length) were obtained through factor analysis, where the heritability values ranged from 0.45 to 0.55. The heritability values for each factor were higher than the mean heritability value of individual original measurements. We have confirmed the genetic influence on facial anthropometric traits and suggest a potential way to categorize and analyze the facial portions into different groups. PMID:23843774

  20. Transcriptome profile and unique genetic evolution of positively selected genes in yak lungs.

    PubMed

    Lan, DaoLiang; Xiong, XianRong; Ji, WenHui; Li, Jian; Mipam, Tserang-Donko; Ai, Yi; Chai, ZhiXin

    2018-04-01

    The yak (Bos grunniens), which is a unique bovine breed that is distributed mainly in the Qinghai-Tibetan Plateau, is considered a good model for studying plateau adaptability in mammals. The lungs are important functional organs that enable animals to adapt to their external environment. However, the genetic mechanism underlying the adaptability of yak lungs to harsh plateau environments remains unknown. To explore the unique evolutionary process and genetic mechanism of yak adaptation to plateau environments, we performed transcriptome sequencing of yak and cattle (Bos taurus) lungs using RNA-Seq technology and a subsequent comparison analysis to identify the positively selected genes in the yak. After deep sequencing, a normal transcriptome profile of yak lung containing a total of 16,815 expressed genes was obtained, and the characteristics of the yak lung transcriptome were described by functional analysis. Furthermore, Ka/Ks comparison statistics showed that 39 strongly positively selected genes were identified in yak lungs. Further GO and KEGG analysis was conducted for the functional annotation of these genes. The results of this study provide valuable data for further explorations of the unique evolutionary process of high-altitude hypoxia adaptation in yaks in the Tibetan Plateau and the genetic mechanism at the molecular level.

  1. Nonlinear Analysis of Time Series in Genome-Wide Linkage Disequilibrium Data

    NASA Astrophysics Data System (ADS)

    Hernández-Lemus, Enrique; Estrada-Gil, Jesús K.; Silva-Zolezzi, Irma; Fernández-López, J. Carlos; Hidalgo-Miranda, Alfredo; Jiménez-Sánchez, Gerardo

    2008-02-01

    The statistical study of large scale genomic data has turned out to be a very important tool in population genetics. Quantitative methods are essential to understand and implement association studies in the biomedical and health sciences. Nevertheless, the characterization of recently admixed populations has been an elusive problem due to the presence of a number of complex phenomena. For example, linkage disequilibrium structures are thought to be more complex than their non-recently admixed population counterparts, presenting the so-called ancestry blocks, admixed regions that are not yet smoothed by the effect of genetic recombination. In order to distinguish characteristic features for various populations we have implemented several methods, some of them borrowed or adapted from the analysis of nonlinear time series in statistical physics and quantitative physiology. We calculate the main fractal dimensions (Kolmogorov's capacity, information dimension and correlation dimension, usually named, D0, D1 and D2). We also have made detrended fluctuation analysis and information based similarity index calculations for the probability distribution of correlations of linkage disequilibrium coefficient of six recently admixed (mestizo) populations within the Mexican Genome Diversity Project [1] and for the non-recently admixed populations in the International HapMap Project [2]. Nonlinear correlations showed up as a consequence of internal structure within the haplotype distributions. The analysis of these correlations as well as the scope and limitations of these procedures within the biomedical sciences are discussed.

  2. SNPassoc: an R package to perform whole genome association studies.

    PubMed

    González, Juan R; Armengol, Lluís; Solé, Xavier; Guinó, Elisabet; Mercader, Josep M; Estivill, Xavier; Moreno, Víctor

    2007-03-01

    The popularization of large-scale genotyping projects has led to the widespread adoption of genetic association studies as the tool of choice in the search for single nucleotide polymorphisms (SNPs) underlying susceptibility to complex diseases. Although the analysis of individual SNPs is a relatively trivial task, when the number is large and multiple genetic models need to be explored, a tool to automate the analyses becomes necessary. In order to address this issue, we developed SNPassoc, an R package to carry out most common analyses in whole genome association studies. These analyses include descriptive statistics and exploratory analysis of missing values, calculation of Hardy-Weinberg equilibrium, analysis of association based on generalized linear models (either for quantitative or binary traits), and analysis of multiple SNPs (haplotype and epistasis analysis). Package SNPassoc is available at CRAN from http://cran.r-project.org. A tutorial is available on Bioinformatics online and in http://davinci.crg.es/estivill_lab/snpassoc.
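
    Illustrative note: the Hardy-Weinberg equilibrium check listed in the abstract above can be sketched for one biallelic SNP as a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. This is a generic Python illustration, not SNPassoc's own routine (the package is written in R and may use a different test); the function name and argument order are assumptions made for this example.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions for one biallelic SNP.

    n_aa, n_ab, n_bb: observed counts of the AA, AB and BB genotypes.
    Returns (frequency of allele A, chi-square statistic, p-value with 1 df).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    # Allele frequency estimated by counting alleles across all genotypes.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Classes with zero expectation (monomorphic SNP) contribute nothing.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return p, chi2, p_value


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))   # genotype counts in exact HWE proportions
    print(hwe_chi_square(50, 0, 50))    # no heterozygotes at all
```

    For small samples or rare alleles the chi-square approximation is poor, and an exact test is usually preferred.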

  3. Parallel processing of genomics data

    NASA Astrophysics Data System (ADS)

    Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario

    2016-10-01

    The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, has made it possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per single experiment, thus the analysis of this enormous flow of data poses several challenges in terms of data storage, preprocessing, and analysis. To face those issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the preprocessing and statistical analysis of genomics data, able to handle high-dimensional data and resulting in good response time. The proposed system is able to find statistically significant biological markers able to discriminate classes of patients that respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.

  4. Cross-generational transmission from drug abuse in parents to attention-deficit/hyperactivity disorder in children

    PubMed Central

    Kendler, K. S.; Ohlsson, H.; Sundquist, K.; Sundquist, J.

    2016-01-01

    Background: Attention-deficit/hyperactivity disorder (ADHD) predisposes to drug abuse (DA) and twin studies suggest shared genetic effects. We here seek to determine, using adoption and adoption-like samples, the magnitude of the cross-generational transmission from DA in parents to ADHD in their children and clarify the degree to which this arises from genetic v. rearing effects. Method: We ascertained ADHD and DA from multiple Swedish registries. Statistical analysis was performed by Cox and path models. Results: Risk for ADHD was significantly and similarly increased in the offspring of biological mothers and fathers with DA who did v. did not rear their offspring. Risk for ADHD was not elevated in the offspring of adoptive or step-parents with DA. Conclusions: Cross-generational transmission was observed from DA in parents to ADHD in their children. An analysis of adoptive and adoptive-like parent–offspring relationships suggested that this transmission results from genetic and not from rearing effects. PMID:26928631

  5. Genetic diversity of currently circulating rubella viruses: a need to define more precise viral groups.

    PubMed

    Rivailler, P; Abernathy, E; Icenogle, J

    2017-03-01

    Recent studies have shown that the currently circulating rubella viruses are mostly members of two genotypes, 1E and 2B. Also, genetically distinct viruses of genotype 1G have been found in East and West Africa. This study used a Mantel test to objectively include both genetic diversity and geographic location in the definition of lineages, and identified statistically justified lineages (n=13) and sub-lineages (n=9) of viruses within genotypes 1G, 1E and 2B. Genotype 2B viruses were widely distributed, while viruses of genotype 1E as well as 1G and 1J were much more geographically restricted. This analysis showed that more precise groupings for rubella viruses are possible, which should improve the ability to track rubella viruses worldwide. A year-by-year analysis revealed gaps in surveillance that need to be resolved in order to support the surveillance needed for enhanced control and elimination goals for rubella.

  6. Genetic diversity of currently circulating rubella viruses: a need to define more precise viral groups

    PubMed Central

    Rivailler, P

    2017-01-01

    Recent studies have shown that the currently circulating rubella viruses are mostly members of two genotypes, 1E and 2B. Also, genetically distinct viruses of genotype 1G have been found in East and West Africa. This study used a Mantel test to objectively include both genetic diversity and geographic location in the definition of lineages, and identified statistically justified lineages (n=13) and sub-lineages (n=9) of viruses within genotypes 1G, 1E and 2B. Genotype 2B viruses were widely distributed, while viruses of genotype 1E as well as 1G and 1J were much more geographically restricted. This analysis showed that more precise groupings for rubella viruses are possible, which should improve the ability to track rubella viruses worldwide. A year-by-year analysis revealed gaps in surveillance that need to be resolved in order to support the surveillance needed for enhanced control and elimination goals for rubella. PMID:27959771

  7. Phylogeography, intraspecific structure and sex-biased dispersal of Dall's porpoise, Phocoenoides dalli, revealed by mitochondrial and microsatellite DNA analyses.

    PubMed

    Escorza-Treviño, S; Dizon, A E

    2000-08-01

    Mitochondrial DNA (mtDNA) control-region sequences and microsatellite loci length polymorphisms were used to estimate phylogeographical patterns (historical patterns underlying contemporary distribution), intraspecific population structure and gender-biased dispersal of Phocoenoides dalli dalli across its entire range. One-hundred and thirteen animals from several geographical strata were sequenced over 379 bp of mtDNA, resulting in 58 mtDNA haplotypes. Analysis using F(ST) values (based on haplotype frequencies) and phi(ST) values (based on frequencies and genetic distances between haplotypes) yielded statistically significant separation (bootstrap values P < 0.05) among most of the stocks currently used for management purposes. A minimum spanning network of haplotypes showed two very distinctive clusters, differentially occupied by western and eastern populations, with some common widespread haplotypes. This suggests some degree of phyletic radiation from west to east, superimposed on gene flow. Highly male-biased migration was detected for several population comparisons. Nuclear microsatellite DNA markers (119 individuals and six loci) provided additional support for population subdivision and gender-biased dispersal detected in the mtDNA sequences. Analysis using F(ST) values (based on allelic frequencies) yielded statistically significant separation between some, but not all, populations distinguished by mtDNA analysis. R(ST) values (based on frequencies of and genetic distance between alleles) showed no statistically significant subdivision. Again, highly male-biased dispersal was detected for all population comparisons, suggesting, together with morphological and reproductive data, the existence of sexual selection. Our molecular results argue for nine distinct dalli-type populations that should be treated as separate units for management purposes.

  8. A practical guide to environmental association analysis in landscape genomics.

    PubMed

    Rellstab, Christian; Gugerli, Felix; Eckert, Andrew J; Hancock, Angela M; Holderegger, Rolf

    2015-09-01

    Landscape genomics is an emerging research field that aims to identify the environmental factors that shape adaptive genetic variation and the gene variants that drive local adaptation. Its development has been facilitated by next-generation sequencing, which allows for screening thousands to millions of single nucleotide polymorphisms in many individuals and populations at reasonable costs. In parallel, data sets describing environmental factors have greatly improved and increasingly become publicly accessible. Accordingly, numerous analytical methods for environmental association studies have been developed. Environmental association analysis identifies genetic variants associated with particular environmental factors and has the potential to uncover adaptive patterns that are not discovered by traditional tests for the detection of outlier loci based on population genetic differentiation. We review methods for conducting environmental association analysis including categorical tests, logistic regressions, matrix correlations, general linear models and mixed effects models. We discuss the advantages and disadvantages of different approaches, provide a list of dedicated software packages and their specific properties, and stress the importance of incorporating neutral genetic structure in the analysis. We also touch on additional important aspects such as sampling design, environmental data preparation, pooled and reduced-representation sequencing, candidate-gene approaches, linearity of allele-environment associations and the combination of environmental association analyses with traditional outlier detection tests. We conclude by summarizing expected future directions in the field, such as the extension of statistical approaches, environmental association analysis for ecological gene annotation, and the need for replication and post hoc validation studies. © 2015 John Wiley & Sons Ltd.

  9. An efficient Bayesian meta-analysis approach for studying cross-phenotype genetic associations

    PubMed Central

    Majumdar, Arunabha; Haldar, Tanushree; Bhattacharya, Sourabh; Witte, John S.

    2018-01-01

    Simultaneous analysis of genetic associations with multiple phenotypes may reveal shared genetic susceptibility across traits (pleiotropy). For a locus exhibiting overall pleiotropy, it is important to identify which specific traits underlie this association. We propose a Bayesian meta-analysis approach (termed CPBayes) that uses summary-level data across multiple phenotypes to simultaneously measure the evidence of aggregate-level pleiotropic association and estimate an optimal subset of traits associated with the risk locus. This method uses a unified Bayesian statistical framework based on a spike and slab prior. CPBayes performs a fully Bayesian analysis by employing the Markov Chain Monte Carlo (MCMC) technique Gibbs sampling. It takes into account heterogeneity in the size and direction of the genetic effects across traits. It can be applied to both cohort data and separate studies of multiple traits having overlapping or non-overlapping subjects. Simulations show that CPBayes can produce higher accuracy in the selection of associated traits underlying a pleiotropic signal than the subset-based meta-analysis ASSET. We used CPBayes to undertake a genome-wide pleiotropic association study of 22 traits in the large Kaiser GERA cohort and detected six independent pleiotropic loci associated with at least two phenotypes. This includes a locus at chromosomal region 1q24.2 which exhibits an association simultaneously with the risk of five different diseases: Dermatophytosis, Hemorrhoids, Iron Deficiency, Osteoporosis and Peripheral Vascular Disease. We provide an R-package ‘CPBayes’ implementing the proposed method. PMID:29432419

  10. A strategy to apply quantitative epistasis analysis on developmental traits.

    PubMed

    Labocha, Marta K; Yuan, Wang; Aleman-Meza, Boanerges; Zhong, Weiwei

    2017-05-15

    Genetic interactions are key to understanding complex traits and evolution. Epistasis analysis is an effective method to map genetic interactions. Large-scale quantitative epistasis analysis has been well established for single cells. However, there is a substantial lack of such studies in multicellular organisms and their complex phenotypes such as development. Here we present a method to extend quantitative epistasis analysis to developmental traits. In the nematode Caenorhabditis elegans, we applied RNA interference on mutants to inactivate two genes, used an imaging system to quantitatively measure phenotypes, and developed a set of statistical methods to extract genetic interactions from phenotypic measurement. Using two different C. elegans developmental phenotypes, body length and sex ratio, as examples, we showed that this method could accommodate various metazoan phenotypes with performance comparable to that of methods used in single-cell growth studies. Compared with qualitative observations, this method of quantitative epistasis enabled detection of new interactions involving subtle phenotypes. For example, several sex-ratio genes were found to interact with brc-1 and brd-1, the orthologs of the human breast cancer genes BRCA1 and BARD1, respectively. We confirmed the brc-1 interactions with the following genes in DNA damage response: C34F6.1, him-3 (ortholog of HORMAD1, HORMAD2), sdc-1, and set-2 (ortholog of SETD1A, SETD1B, KMT2C, KMT2D), validating the effectiveness of our method in detecting genetic interactions. We developed a reliable, high-throughput method for quantitative epistasis analysis of developmental phenotypes.

  11. The use of imputed sibling genotypes in sibship-based association analysis: on modeling alternatives, power and model misspecification.

    PubMed

    Minică, Camelia C; Dolan, Conor V; Hottenga, Jouke-Jan; Willemsen, Gonneke; Vink, Jacqueline M; Boomsma, Dorret I

    2013-05-01

    When phenotypic, but no genotypic data are available for relatives of participants in genetic association studies, previous research has shown that family-based imputed genotypes can boost the statistical power when included in such studies. Here, using simulations, we compared the performance of two statistical approaches suitable to model imputed genotype data: the mixture approach, which involves the full distribution of the imputed genotypes and the dosage approach, where the mean of the conditional distribution features as the imputed genotype. Simulations were run by varying sibship size, size of the phenotypic correlations among siblings, imputation accuracy and minor allele frequency of the causal SNP. Furthermore, as imputing sibling data and extending the model to include sibships of size two or greater requires modeling the familial covariance matrix, we inquired whether model misspecification affects power. Finally, the results obtained via simulations were empirically verified in two datasets with continuous phenotype data (height) and with a dichotomous phenotype (smoking initiation). Across the settings considered, the mixture and the dosage approach are equally powerful and both produce unbiased parameter estimates. In addition, the likelihood-ratio test in the linear mixed model appears to be robust to the considered misspecification in the background covariance structure, given low to moderate phenotypic correlations among siblings. Empirical results show that the inclusion in association analysis of imputed sibling genotypes does not always result in a larger test statistic. The actual test statistic may drop in value due to small effect sizes. That is, if the power benefit is small, such that the change in distribution of the test statistic under the alternative is relatively small, the probability of obtaining a smaller test statistic is greater. As the genetic effects are typically hypothesized to be small, in practice, the decision on whether family-based imputation could be used as a means to increase power should be informed by prior power calculations and by the consideration of the background correlation.
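
    The contrast between the dosage and mixture approaches described in this abstract can be illustrated with a minimal sketch. It assumes a single ungenotyped sibling, a normally distributed phenotype and a simple linear model (phenotype = mu + beta * genotype + noise); the function names, parameter values and genotype probabilities are hypothetical and are not taken from the paper, which additionally models the familial covariance.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def expected_dosage(probs):
    """Mean of the conditional genotype distribution: 0*P(0) + 1*P(1) + 2*P(2)."""
    if abs(sum(probs) - 1.0) > 1e-8:
        raise ValueError("genotype probabilities must sum to 1")
    return sum(g * p for g, p in enumerate(probs))

def dosage_likelihood(y, probs, mu, beta, sd):
    """Dosage approach: plug the expected genotype into the regression model."""
    return normal_pdf(y, mu + beta * expected_dosage(probs), sd)

def mixture_likelihood(y, probs, mu, beta, sd):
    """Mixture approach: weight the density under each possible genotype."""
    return sum(p * normal_pdf(y, mu + beta * g, sd) for g, p in enumerate(probs))

# Hypothetical imputed genotype probabilities P(0), P(1), P(2) for one sibling.
probs = (0.25, 0.50, 0.25)
print(expected_dosage(probs))
print(dosage_likelihood(1.2, probs, mu=0.0, beta=0.3, sd=1.0))
print(mixture_likelihood(1.2, probs, mu=0.0, beta=0.3, sd=1.0))
```

    When the effect size beta is small, the two likelihoods are nearly identical, which is consistent with the abstract's finding that the two approaches are equally powerful in the settings considered.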

  12. Association of Plasminogen Activator Inhibitor-Type 1 (-675 4G/5G) Polymorphism with Pre-Eclampsia: Systematic Review

    PubMed Central

    Morgan, Jessie A.; Bombell, Sarah; McGuire, William

    2013-01-01

    Background and Aims: Excessive generation of plasminogen activator inhibitor-type 1 (PAI-1) is implicated in the pathogenesis of pre-eclampsia and related conditions. The PAI-1 (−675 4G/5G) promoter polymorphism (rs1799889) affects transcriptional activity and is a putative genetic risk factor for pre-eclampsia. The aim of this study was to identify, appraise and synthesise the available evidence for the association of the PAI-1 (−675 4G/5G) polymorphism with pre-eclampsia. Methods: Systematic review and random effects meta-analysis of genetic association studies. Results: We found 12 eligible genetic association studies in which a total of 1511 women with pre-eclampsia, eclampsia or HELLP syndrome and 3492 controls participated. The studies were generally small (median number of cases 102, range 24 to 403) and underpowered to detect plausible association sizes. Meta-analysis of all of the studies detected statistically significant gene-disease associations in the recessive [pooled odds ratio 1.28 (95% confidence interval 1.09, 1.50); population attributable risk 7.7%] and dominant [pooled odds ratio 1.21 (95% confidence interval 1.01, 1.44); population attributable risk 13.7%] models. We did not find evidence of statistical heterogeneity, funnel plot asymmetry or small study bias. Conclusions: These data suggest that the fibrinolytic pathway regulated by the PAI-1 gene may contribute to the pathogenesis of pre-eclampsia and related conditions. This association, if confirmed in larger genetic association studies, may inform research efforts to develop novel interventions or help to prioritise therapeutic targets that merit evaluation in randomised clinical trials. PMID:23457639
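
    A random-effects pooled odds ratio of the kind reported in this abstract can be sketched as follows. The abstract does not say which estimator the authors used; this sketch assumes the common DerSimonian-Laird method, and the 2x2 counts are hypothetical placeholders, not data from the 12 included studies.

```python
import math

def dersimonian_laird(tables, z=1.96):
    """Pool odds ratios from 2x2 tables (a, b, c, d) with a random-effects model.

    a, b = cases with / without the risk genotype;
    c, d = controls with / without the risk genotype.
    Returns (pooled OR, lower CI bound, upper CI bound, tau^2).
    """
    # Per-study log odds ratios and their approximate variances.
    y = [math.log((a * d) / (b * c)) for a, b, c, d in tables]
    v = [1 / a + 1 / b + 1 / c + 1 / d for a, b, c, d in tables]

    # Fixed-effect (inverse-variance) weights and Cochran's Q.
    w = [1 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))

    # Method-of-moments estimate of the between-study variance tau^2.
    c_term = sw - sum(wi ** 2 for wi in w) / sw
    k = len(tables)
    tau2 = max(0.0, (q - (k - 1)) / c_term) if c_term > 0 else 0.0

    # Random-effects weights, pooled estimate and confidence interval.
    w_re = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return (math.exp(pooled), math.exp(pooled - z * se),
            math.exp(pooled + z * se), tau2)

# Hypothetical counts for three studies (not taken from the review).
studies = [(30, 70, 40, 160), (25, 80, 50, 200), (60, 140, 90, 310)]
pooled_or, ci_low, ci_high, tau2 = dersimonian_laird(studies)
print(pooled_or, ci_low, ci_high, tau2)
```

    When the studies are homogeneous, the estimated tau^2 is zero and the result coincides with the fixed-effect inverse-variance estimate.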

  13. Comparison of linear, skewed-linear, and proportional hazard models for the analysis of lambing interval in Ripollesa ewes.

    PubMed

    Casellas, J; Bach, R

    2012-06-01

    Lambing interval is a relevant reproductive indicator for sheep populations under continuous mating systems, although there is a shortage of selection programs accounting for this trait in the sheep industry. Both the historical assumption of small genetic background and its unorthodox distribution pattern have limited its implementation as a breeding objective. In this manuscript, statistical performances of 3 alternative parametrizations [i.e., symmetric Gaussian mixed linear (GML) model, skew-Gaussian mixed linear (SGML) model, and piecewise Weibull proportional hazard (PWPH) model] have been compared to elucidate the preferred methodology to handle lambing interval data. More specifically, flock-by-flock analyses were performed on 31,986 lambing interval records (257.3 ± 0.2 d) from 6 purebred Ripollesa flocks. Model performances were compared in terms of deviance information criterion (DIC) and Bayes factor (BF). For all flocks, PWPH models were clearly preferred; they generated a reduction of 1,900 or more DIC units and provided BF estimates larger than 100 (i.e., PWPH models against linear models). These differences were reduced when comparing PWPH models with different number of change points for the baseline hazard function. In 4 flocks, only 2 change points were required to minimize the DIC, whereas 4 and 6 change points were needed for the 2 remaining flocks. These differences demonstrated a remarkable degree of heterogeneity across sheep flocks that must be properly accounted for in genetic evaluation models to avoid statistical biases and suboptimal genetic trends. Within this context, all 6 Ripollesa flocks revealed substantial genetic background for lambing interval with heritabilities ranging between 0.13 and 0.19. This study provides the first evidence of the suitability of PWPH models for lambing interval analysis, clearly discarding previous parametrizations focused on mixed linear models.

  14. Impact of a chromosome X STR Decaplex in deficiency paternity cases.

    PubMed

    Trindade-Filho, Aluisio; Ferreira, Samuel; Oliveira, Silviene F

    2013-12-01

    Deficiency paternity cases, characterized by the absence of the alleged father, are a challenge for forensic genetics. Here we present four cases with a female child and a deceased alleged father in which the analysis of a set of 21 or 22 autosomal STRs (AS STRs) produced results within a range of doubt when genotyping relatives of the alleged father. Aiming to increase the Paternity Index (PI) and obtain more reliable results, a set of 10 X-linked STR markers, developed by the Spanish and Portuguese Group of the International Society for Forensic Genetics (ISFG), was then added. Statistical analysis substantially shifted the results towards the alleged fatherhood in all four cases, with more dramatic changes when the supposed half-sister and respective mother were the relatives tested.

  15. Impact of a chromosome X STR Decaplex in deficiency paternity cases

    PubMed Central

    Trindade-Filho, Aluisio; Ferreira, Samuel; Oliveira, Silviene F.

    2013-01-01

    Deficiency paternity cases, characterized by the absence of the alleged father, are a challenge for forensic genetics. Here we present four cases with a female child and a deceased alleged father in which the analysis of a set of 21 or 22 autosomal STRs (AS STRs) produced results within a range of doubt when genotyping relatives of the alleged father. Aiming to increase the Paternity Index (PI) and obtain more reliable results, a set of 10 X-linked STR markers, developed by the Spanish and Portuguese Group of the International Society for Forensic Genetics (ISFG), was then added. Statistical analysis substantially shifted the results towards the alleged fatherhood in all four cases, with more dramatic changes when the supposed half-sister and respective mother were the relatives tested. PMID:24385853

  16. Integration of statistical and physiological analyses of adaptation of near-isogenic barley lines.

    PubMed

    Romagosa, I; Fox, P N; García Del Moral, L F; Ramos, J M; García Del Moral, B; Roca de Togores, F; Molina-Cano, J L

    1993-08-01

    Seven near-isogenic barley lines, differing for three independent mutant genes, were grown in 15 environments in Spain. Genotype x environment interaction (G x E) for grain yield was examined with the Additive Main Effects and Multiplicative interaction (AMMI) model. The results of this statistical analysis of multilocation yield-data were compared with a morpho-physiological characterization of the lines at two sites (Molina-Cano et al. 1990). The first two principal component axes from the AMMI analysis were strongly associated with the morpho-physiological characters. The independent but parallel discrimination among genotypes reflects genetic differences and highlights the power of the AMMI analysis as a tool to investigate G x E. Characters which appear to be positively associated with yield in the germplasm under study could be identified for some environments.

  17. Genetic Parameters and the Impact of Off-Types for Theobroma cacao L. in a Breeding Program in Brazil

    PubMed Central

    DuVal, Ashley; Gezan, Salvador A.; Mustiga, Guiliana; Stack, Conrad; Marelli, Jean-Philippe; Chaparro, José; Livingstone, Donald; Royaert, Stefan; Motamayor, Juan C.

    2017-01-01

    Breeding programs of cacao (Theobroma cacao L.) trees share the many challenges of breeding long-living perennial crops, and genetic progress is further constrained by both the limited understanding of the inheritance of complex traits and the prevalence of technical issues, such as mislabeled individuals (off-types). To better understand the genetic architecture of cacao, in this study, 13 years of phenotypic data collected from four progeny trials in Bahia, Brazil were analyzed jointly in a multisite analysis. Three separate analyses (multisite, single site with and without off-types) were performed to estimate genetic parameters from statistical models fitted on nine important agronomic traits (yield, seed index, pod index, % healthy pods, % pods infected with witches broom, % of pods other loss, vegetative brooms, diameter, and tree height). Genetic parameters were estimated along with variance components and heritabilities from the multisite analysis, and a trial was fingerprinted with low-density SNP markers to determine the impact of off-types on estimations. Heritabilities ranged from 0.37 to 0.64 for yield and its components and from 0.03 to 0.16 for disease resistance traits. A weighted index was used to make selections for clonal evaluation, and breeding values estimated for the parental selection and estimation of genetic gain. The impact of off-types to breeding progress in cacao was assessed for the first time. Even when present at <5% of the total population, off-types altered selections by 48%, and impacted heritability estimations for all nine of the traits analyzed, including a 41% difference in estimated heritability for yield. These results show that in a mixed model analysis, even a low level of pedigree error can significantly alter estimations of genetic parameters and selections in a breeding program. PMID:29250097

  18. Smoking and caffeine consumption: a genetic analysis of their association.

    PubMed

    Treur, Jorien L; Taylor, Amy E; Ware, Jennifer J; Nivard, Michel G; Neale, Michael C; McMahon, George; Hottenga, Jouke-Jan; Baselmans, Bart M L; Boomsma, Dorret I; Munafò, Marcus R; Vink, Jacqueline M

    2017-07-01

    Smoking and caffeine consumption show a strong positive correlation, but the mechanism underlying this association is unclear. Explanations include shared genetic/environmental factors or causal effects. This study employed three methods to investigate the association between smoking and caffeine. First, bivariate genetic models were applied to data of 10 368 twins from the Netherlands Twin Register in order to estimate genetic and environmental correlations between smoking and caffeine use. Second, from the summary statistics of meta-analyses of genome-wide association studies on smoking and caffeine, the genetic correlation was calculated by LD-score regression. Third, causal effects were tested using Mendelian randomization analysis in 6605 Netherlands Twin Register participants and 5714 women from the Avon Longitudinal Study of Parents and Children. Through twin modelling, a genetic correlation of r = 0.47 and an environmental correlation of r = 0.30 were estimated between current smoking (yes/no) and coffee use (high/low). Between current smoking and total caffeine use, this was r = 0.44 and r = 0.00, respectively. LD-score regression also indicated sizeable genetic correlations between smoking and coffee use (r = 0.44 between smoking heaviness and cups of coffee per day, r = 0.28 between smoking initiation and coffee use and r = 0.25 between smoking persistence and coffee use). Consistent with the relatively high genetic correlations and lower environmental correlations, Mendelian randomization provided no evidence for causal effects of smoking on caffeine or vice versa. Genetic factors thus explain most of the association between smoking and caffeine consumption. These findings suggest that quitting smoking may be more difficult for heavy caffeine consumers, given their genetic susceptibility. © 2016 The Authors. Addiction Biology published by John Wiley & Sons Ltd on behalf of Society for the Study of Addiction.

  19. Smoking and caffeine consumption: a genetic analysis of their association

    PubMed Central

    Treur, Jorien L.; Taylor, Amy E.; Ware, Jennifer J.; Nivard, Michel G.; Neale, Michael C.; McMahon, George; Hottenga, Jouke‐Jan; Baselmans, Bart M. L.; Boomsma, Dorret I.; Munafò, Marcus R.; Vink, Jacqueline M.

    2016-01-01

    Smoking and caffeine consumption show a strong positive correlation, but the mechanism underlying this association is unclear. Explanations include shared genetic/environmental factors or causal effects. This study employed three methods to investigate the association between smoking and caffeine. First, bivariate genetic models were applied to data of 10 368 twins from the Netherlands Twin Register in order to estimate genetic and environmental correlations between smoking and caffeine use. Second, from the summary statistics of meta‐analyses of genome‐wide association studies on smoking and caffeine, the genetic correlation was calculated by LD‐score regression. Third, causal effects were tested using Mendelian randomization analysis in 6605 Netherlands Twin Register participants and 5714 women from the Avon Longitudinal Study of Parents and Children. Through twin modelling, a genetic correlation of r = 0.47 and an environmental correlation of r = 0.30 were estimated between current smoking (yes/no) and coffee use (high/low). Between current smoking and total caffeine use, this was r = 0.44 and r = 0.00, respectively. LD‐score regression also indicated sizeable genetic correlations between smoking and coffee use (r = 0.44 between smoking heaviness and cups of coffee per day, r = 0.28 between smoking initiation and coffee use and r = 0.25 between smoking persistence and coffee use). Consistent with the relatively high genetic correlations and lower environmental correlations, Mendelian randomization provided no evidence for causal effects of smoking on caffeine or vice versa. Genetic factors thus explain most of the association between smoking and caffeine consumption. These findings suggest that quitting smoking may be more difficult for heavy caffeine consumers, given their genetic susceptibility. PMID:27027469

  20. Genetic consequences of sequential founder events by an island-colonizing bird.

    PubMed

    Clegg, Sonya M; Degnan, Sandie M; Kikkawa, Jiro; Moritz, Craig; Estoup, Arnaud; Owens, Ian P F

    2002-06-11

    The importance of founder events in promoting evolutionary changes on islands has been a subject of long-running controversy. Resolution of this debate has been hindered by a lack of empirical evidence from naturally founded island populations. Here we undertake a genetic analysis of a series of historically documented, natural colonization events by the silvereye species-complex (Zosterops lateralis), a group used to illustrate the process of island colonization in the original founder effect model. Our results indicate that single founder events do not affect levels of heterozygosity or allelic diversity, nor do they result in immediate genetic differentiation between populations. Instead, four to five successive founder events are required before indices of diversity and divergence approach those seen in evolutionarily old forms. A Bayesian analysis based on computer simulation allows inferences to be made on the number of effective founders and indicates that founder effects are weak because island populations are established from relatively large flocks. Indeed, statistical support for a founder event model was not significantly higher than for a gradual-drift model for all recently colonized islands. Taken together, these results suggest that single colonization events in this species complex are rarely accompanied by severe founder effects, and multiple founder events and/or long-term genetic drift have been of greater consequence for neutral genetic diversity.

  1. Genome-Wide Analysis in Brazilian Xavante Indians Reveals Low Degree of Admixture

    PubMed Central

    Kuhn, Patricia C.; Horimoto, Andréa R. V. Russo.; Sanches, José Maurício; Vieira Filho, João Paulo B.; Franco, Luciana; Fabbro, Amaury Dal; Franco, Laercio Joel; Pereira, Alexandre C.; Moises, Regina S

    2012-01-01

    Characterization of population genetic variation and structure can be used as a tool for research in human genetics, and population isolates are of great interest. The aim of the present study was to characterize the genetic structure of Xavante Indians and compare it with other populations. The Xavante, an indigenous population living in the Brazilian Central Plateau, is one of the largest native groups in Brazil. A subset of 53 unrelated subjects was selected from the initial sample of 300 Xavante Indians. Using 86,197 markers, Xavante were compared with all populations of the HapMap Phase III and HGDP-CEPH projects and with a Southeast Brazilian population sample to establish their population structure. Principal Components Analysis showed that the Xavante Indians are concentrated in the Amerindian axis near other populations of known Amerindian ancestry such as Karitiana, Pima, Surui and Maya, and a low degree of genetic admixture was observed. This is consistent with the historical records of bottlenecks and cultural isolation. By calculating pair-wise Fst statistics we characterized the genetic differentiation between Xavante Indians and representative populations of the HapMap and HGDP-CEPH projects. We found that the genetic differentiation between Xavante Indians and populations of Amerindian, Asian, European, and African ancestry increased progressively. Our results indicate that the Xavante is a population that remained genetically isolated over the past decades and can offer advantages for genome-wide mapping studies of inherited disorders. PMID:22900041
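
    The pair-wise Fst comparison mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state which Fst estimator was used; the version below assumes a simple heterozygosity-based definition, Fst = (H_T - H_S) / H_T, computed as a ratio of sums over biallelic markers and ignoring sample-size corrections. The allele frequencies are hypothetical.

```python
def pairwise_fst(freqs_pop1, freqs_pop2):
    """Heterozygosity-based Fst between two populations.

    freqs_pop1, freqs_pop2: allele frequencies of the same reference allele
    at the same biallelic markers, one value per marker.
    """
    if len(freqs_pop1) != len(freqs_pop2):
        raise ValueError("both populations need the same set of markers")

    h_s_sum = 0.0  # mean within-population expected heterozygosity, summed
    h_t_sum = 0.0  # expected heterozygosity of the pooled population, summed
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        h_s_sum += (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
        p_bar = (p1 + p2) / 2
        h_t_sum += 2 * p_bar * (1 - p_bar)

    if h_t_sum == 0:
        return 0.0  # every marker is monomorphic in the pooled sample
    return (h_t_sum - h_s_sum) / h_t_sum

# Hypothetical allele frequencies at four markers in two populations.
pop_a = [0.10, 0.50, 0.80, 0.30]
pop_b = [0.40, 0.55, 0.20, 0.35]
print(pairwise_fst(pop_a, pop_b))
```

    Under this definition, populations with identical allele frequencies give Fst = 0, and populations fixed for alternative alleles at every marker give Fst = 1.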

  2. Genome-wide analysis in Brazilian Xavante Indians reveals low degree of admixture.

    PubMed

    Kuhn, Patricia C; Horimoto, Andréa R V Russo; Sanches, José Maurício; Vieira Filho, João Paulo B; Franco, Luciana; Fabbro, Amaury Dal; Franco, Laercio Joel; Pereira, Alexandre C; Moises, Regina S

    2012-01-01

    Characterization of population genetic variation and structure can be used as a tool for research in human genetics, and population isolates are of great interest. The aim of the present study was to characterize the genetic structure of Xavante Indians and compare it with other populations. The Xavante, an indigenous population living in the Brazilian Central Plateau, is one of the largest native groups in Brazil. A subset of 53 unrelated subjects was selected from the initial sample of 300 Xavante Indians. Using 86,197 markers, Xavante were compared with all populations of the HapMap Phase III and HGDP-CEPH projects and with a Southeast Brazilian population sample to establish their population structure. Principal Components Analysis showed that the Xavante Indians are concentrated in the Amerindian axis near other populations of known Amerindian ancestry such as Karitiana, Pima, Surui and Maya, and a low degree of genetic admixture was observed. This is consistent with the historical records of bottlenecks and cultural isolation. By calculating pair-wise F(st) statistics we characterized the genetic differentiation between Xavante Indians and representative populations of the HapMap and HGDP-CEPH projects. We found that the genetic differentiation between Xavante Indians and populations of Amerindian, Asian, European, and African ancestry increased progressively. Our results indicate that the Xavante is a population that remained genetically isolated over the past decades and can offer advantages for genome-wide mapping studies of inherited disorders.

  3. Analysis of the quantitative dermatoglyphics of the digito-palmar complex in patients with multiple sclerosis.

    PubMed

    Supe, S; Milicić, J; Pavićević, R

    1997-06-01

    Recent studies on the etiopathogenesis of multiple sclerosis (MS) all indicate that there is a polygenic predisposition for this illness. The so-called "MS Trait" determines the reactivity of the immunological system to ecological factors. The development of the glyphological science and the study of the characteristics of the digito-palmar dermatoglyphic complex (which have been established to be polygenically determined) enable a better insight into genetic development during early embryogenesis. The aim of this study was to estimate certain differences in the dermatoglyphics of digito-palmar complexes between the group with multiple sclerosis and comparable, phenotypically healthy groups of both sexes. This study is based on the analysis of 18 quantitative characteristics of the digito-palmar complex in 125 patients with multiple sclerosis (41 males and 84 females) in comparison to a group of 400 phenotypically healthy subjects (200 males and 200 females). The analysis pointed towards a statistically significant decrease in the number of digital and palmar ridges, as well as lower values of atd angles, in the group of MS patients of both sexes. The main discriminators were the characteristic palmar dermatoglyphics, with the discriminant analysis classifying over 80% of the examinees, which exceeds the threshold of statistical significance. The results of this study suggest a possible discrimination between patients with MS and the phenotypically healthy population through the analysis of the dermatoglyphic status, and therefore the possibility that multiple sclerosis is a genetically predisposed disease.

  4. Holo-analysis.

    PubMed

    Rosen, G D

    2006-06-01

    Meta-analysis is a vague descriptor used to encompass very diverse methods of data collection analysis, ranging from simple averages to more complex statistical methods. Holo-analysis is a fully comprehensive statistical analysis of all available data and all available variables in a specified topic, with results expressed in a holistic factual empirical model. The objectives and applications of holo-analysis include software production for prediction of responses with confidence limits, translation of research conditions to praxis (field) circumstances, exposure of key missing variables, discovery of theoretically unpredictable variables and interactions, and planning future research. Holo-analyses are cited as examples of the effects on broiler feed intake and live weight gain of exogenous phytases, which account for 70% of variation in responses in terms of 20 highly significant chronological, dietary, environmental, genetic, managemental, and nutrient variables. Even better future accountancy of variation will be facilitated if and when authors of papers routinely provide key data for currently neglected variables, such as temperatures, complete feed formulations, and mortalities.

  5. EPS-LASSO: Test for High-Dimensional Regression Under Extreme Phenotype Sampling of Continuous Traits.

    PubMed

    Xu, Chao; Fang, Jian; Shen, Hui; Wang, Yu-Ping; Deng, Hong-Wen

    2018-01-25

    Extreme phenotype sampling (EPS) is a broadly-used design to identify candidate genetic factors contributing to the variation of quantitative traits. By enriching the signals in extreme phenotypic samples, EPS can boost the association power compared to random sampling. Most existing statistical methods for EPS examine the genetic factors individually, although many quantitative traits have multiple genetic factors underlying their variation. It is desirable to model the joint effects of genetic factors, which may increase the power and identify novel quantitative trait loci under EPS. The joint analysis of genetic data in high-dimensional situations requires specialized techniques, e.g., the least absolute shrinkage and selection operator (LASSO). Although there is extensive research and application related to LASSO, the statistical inference and testing for the sparse model under EPS remain unknown. We propose a novel sparse model (EPS-LASSO) with a hypothesis test for high-dimensional regression under EPS based on a decorrelated score function. The comprehensive simulation shows that EPS-LASSO outperforms existing methods with stable type I error and FDR control. EPS-LASSO can provide a consistent power for both low- and high-dimensional situations compared with the other methods dealing with high-dimensional situations. The power of EPS-LASSO is close to other low-dimensional methods when the causal effect sizes are small and is superior when the effects are large. Applying EPS-LASSO to a transcriptome-wide gene expression study for obesity reveals 10 significant body mass index associated genes. Our results indicate that EPS-LASSO is an effective method for EPS data analysis, which can account for correlated predictors. The source code is available at https://github.com/xu1912/EPSLASSO. Supplementary data are available at Bioinformatics online. © The Author (2018). Published by Oxford University Press. All rights reserved.

  6. Detecting Genomic Clustering of Risk Variants from Sequence Data: Cases vs. Controls

    PubMed Central

    Schaid, Daniel J.; Sinnwell, Jason P.; McDonnell, Shannon K.; Thibodeau, Stephen N.

    2013-01-01

    As the ability to measure dense genetic markers approaches the limit of the DNA sequence itself, taking advantage of possible clustering of genetic variants in, and around, a gene would benefit genetic association analyses, and likely provide biological insights. The greatest benefit might be realized when multiple rare variants cluster in a functional region. Several statistical tests have been developed, one of which is based on the popular Kulldorff scan statistic for spatial clustering of disease. We extended another popular spatial clustering method (Tango’s statistic) to genomic sequence data. An advantage of Tango’s method is that it is rapid to compute, and when a single test statistic is computed, its distribution is well approximated by a scaled chi-square distribution, making computation of p-values very rapid. We compared the Type-I error rates and power of several clustering statistics, as well as the omnibus sequence kernel association test (SKAT). Although our version of Tango’s statistic, which we call the “Kernel Distance” statistic, took approximately half as long to compute as the Kulldorff scan statistic, it had slightly less power than the scan statistic. Our results showed that the Ionita-Laza version of Kulldorff’s scan statistic had the greatest power over a range of clustering scenarios. PMID:23842950

  7. Supporting Regularized Logistic Regression Privately and Efficiently.

    PubMed

    Li, Wenfa; Liu, Hongzhe; Yang, Peng; Xie, Wei

    2016-01-01

    As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and so on. These domains often involve data of human subjects that are subject to strict privacy regulations. Concerns over data privacy make it increasingly difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a widely used statistical model that has nevertheless not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as in the form of research consortia or networks as widely seen in genetics, epidemiology, social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributed computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluations on several studies validate the privacy guarantee, efficiency and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, network analysis, etc.

  8. Supporting Regularized Logistic Regression Privately and Efficiently

    PubMed Central

    Li, Wenfa; Liu, Hongzhe; Yang, Peng; Xie, Wei

    2016-01-01

    As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, social sciences, information technology, and so on. These domains often involve data of human subjects that are subject to strict privacy regulations. Concerns over data privacy make it increasingly difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work here focuses on safeguarding regularized logistic regression, a widely used statistical model that has nevertheless not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as in the form of research consortia or networks as widely seen in genetics, epidemiology, social sciences, etc. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method leveraging distributed computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluations on several studies validate the privacy guarantee, efficiency and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, network analysis, etc. PMID:27271738

  9. Genetic toxicology at the crossroads-from qualitative hazard evaluation to quantitative risk assessment.

    PubMed

    White, Paul A; Johnson, George E

    2016-05-01

    Applied genetic toxicology is undergoing a transition from qualitative hazard identification to quantitative dose-response analysis and risk assessment. To facilitate this change, the Health and Environmental Sciences Institute (HESI) Genetic Toxicology Technical Committee (GTTC) sponsored a workshop held in Lancaster, UK on July 10-11, 2014. The event included invited speakers from several institutions, and the content was divided into three themes: (1) Point-of-departure Metrics for Quantitative Dose-Response Analysis in Genetic Toxicology; (2) Measurement and Estimation of Exposures for Better Extrapolation to Humans; and (3) The Use of Quantitative Approaches in Genetic Toxicology for human health risk assessment (HHRA). A host of pertinent issues were discussed relating to the use of in vitro and in vivo dose-response data, the development of methods for in vitro to in vivo extrapolation and approaches to use in vivo dose-response data to determine human exposure limits for regulatory evaluations and decision-making. This Special Issue, which was inspired by the workshop, contains a series of papers that collectively address topics related to the aforementioned themes. The Issue includes contributions that collectively evaluate, describe and discuss in silico, in vitro, in vivo and statistical approaches that are facilitating the shift from qualitative hazard evaluation to quantitative risk assessment. The use and application of the benchmark dose approach was a central theme in many of the workshop presentations and discussions, and the Special Issue includes several contributions that outline novel applications for the analysis and interpretation of genetic toxicity data. Although the contents of the Special Issue constitute an important step towards the adoption of quantitative methods for regulatory assessment of genetic toxicity, formal acceptance of quantitative methods for HHRA and regulatory decision-making will require consensus regarding the relationships between genetic damage and disease, and the concomitant ability to use genetic toxicity results per se. © Her Majesty the Queen in Right of Canada 2016. Reproduced with the permission of the Minister of Health.

  10. Adapt-Mix: learning local genetic correlation structure improves summary statistics-based analyses

    PubMed Central

    Park, Danny S.; Brown, Brielin; Eng, Celeste; Huntsman, Scott; Hu, Donglei; Torgerson, Dara G.; Burchard, Esteban G.; Zaitlen, Noah

    2015-01-01

    Motivation: Approaches to identifying new risk loci, training risk prediction models, imputing untyped variants and fine-mapping causal variants from summary statistics of genome-wide association studies are playing an increasingly important role in the human genetics community. Current summary statistics-based methods rely on global ‘best guess’ reference panels to model the genetic correlation structure of the dataset being studied. This approach, especially in admixed populations, has the potential to produce misleading results, ignores variation in local structure and is not feasible when appropriate reference panels are missing or small. Here, we develop a method, Adapt-Mix, that combines information across all available reference panels to produce estimates of local genetic correlation structure for summary statistics-based methods in arbitrary populations. Results: We applied Adapt-Mix to estimate the genetic correlation structure of both admixed and non-admixed individuals using simulated and real data. We evaluated our method by measuring the performance of two summary statistics-based methods: imputation and joint-testing. When using our method as opposed to the current standard of ‘best guess’ reference panels, we observed a 28% decrease in mean-squared error for imputation and a 73.7% decrease in mean-squared error for joint-testing. Availability and implementation: Our method is publicly available in a software package called ADAPT-Mix available at https://github.com/dpark27/adapt_mix. Contact: noah.zaitlen@ucsf.edu PMID:26072481

  11. Intra-articular decorin influences the fibrosis genetic expression profile in a rabbit model of joint contracture.

    PubMed

    Abdel, M P; Morrey, M E; Barlow, J D; Grill, D E; Kolbert, C P; An, K N; Steinmann, S P; Morrey, B F; Sanchez-Sotelo, J

    2014-01-01

    The goal of this study was to determine whether intra-articular administration of the potentially anti-fibrotic agent decorin influences the expression of genes involved in the fibrotic cascade, and ultimately leads to less contracture, in an animal model. A total of 18 rabbits underwent an operation on their right knees to form contractures. Six limbs in group 1 received four intra-articular injections of decorin; six limbs in group 2 received four intra-articular injections of bovine serum albumin (BSA) over eight days; six limbs in group 3 received no injections. The contracted limbs of rabbits in group 1 were biomechanically and genetically compared with the contracted limbs of rabbits in groups 2 and 3, with the use of a calibrated joint measuring device and custom microarray, respectively. There was no statistical difference in the flexion contracture angles between those limbs that received intra-articular decorin versus those that received intra-articular BSA (66° vs 69°; p = 0.41). Likewise, there was no statistical difference between those limbs that received intra-articular decorin versus those who had no injection (66° vs 72°; p = 0.27). When compared with BSA, decorin led to a statistically significant increase in the mRNA expression of 12 genes (p < 0.01). In addition, there was a statistical change in the mRNA expression of three genes, when compared with those without injection. In this model, when administered intra-articularly at eight weeks, 2 mg of decorin had no significant effect on joint contractures. However, our genetic analysis revealed a significant alteration in several fibrotic genes. Cite this article: Bone Joint Res 2014;3:82-8.

  12. A System-Level Pathway-Phenotype Association Analysis Using Synthetic Feature Random Forest

    PubMed Central

    Pan, Qinxin; Hu, Ting; Malley, James D.; Andrew, Angeline S.; Karagas, Margaret R.; Moore, Jason H.

    2015-01-01

    As the cost of genome-wide genotyping decreases, the number of genome-wide association studies (GWAS) has increased considerably. However, the transition from GWAS findings to the underlying biology of various phenotypes remains challenging. As a result, due to its system-level interpretability, pathway analysis has become a popular tool for gaining insights on the underlying biology from high-throughput genetic association data. In pathway analyses, gene sets representing particular biological processes are tested for significant associations with a given phenotype. Most existing pathway analysis approaches rely on single-marker statistics and assume that pathways are independent of each other. As biological systems are driven by complex biomolecular interactions, embracing the complex relationships between single-nucleotide polymorphisms (SNPs) and pathways needs to be addressed. To incorporate the complexity of gene-gene interactions and pathway-pathway relationships, we propose a system-level pathway analysis approach, synthetic feature random forest (SF-RF), which is designed to detect pathway-phenotype associations without making assumptions about the relationships among SNPs or pathways. In our approach, the genotypes of SNPs in a particular pathway are aggregated into a synthetic feature representing that pathway via Random Forest (RF). Multiple synthetic features are analyzed using RF simultaneously and the significance of a synthetic feature indicates the significance of the corresponding pathway. We further complement SF-RF with pathway-based Statistical Epistasis Network (SEN) analysis that evaluates interactions among pathways. By investigating the pathway SEN, we hope to gain additional insights into the genetic mechanisms contributing to the pathway-phenotype association. We apply SF-RF to a population-based genetic study of bladder cancer and further investigate the mechanisms that help explain the pathway-phenotype associations using SEN. 
The bladder cancer associated pathways we found are both consistent with existing biological knowledge and reveal novel and plausible hypotheses for future biological validations. PMID:24535726

  13. Genetic dissection of main and epistatic effects of QTL based on augmented triple test cross design

    PubMed Central

    Zhang, Zheng; Dai, Zhijun; Chen, Yuan; Yuan, Xiong; Yuan, Zheming; Tang, Wenbang; Li, Lanzhi; Hu, Zhongli

    2017-01-01

    The use of heterosis has considerably increased the productivity of many crops; however, the biological mechanism underpinning the technique remains elusive. The North Carolina design III (NCIII) and the triple test cross (TTC) are powerful and popular genetic mating designs that can be used to decipher the genetic basis of heterosis. However, when using the NCIII design with the present quantitative trait locus (QTL) mapping method, if epistasis exists, the estimated additive or dominant effects are confounded with epistatic effects. Here, we propose a two-step approach to dissect all genetic effects of QTL and digenic interactions on a whole genome without sacrificing statistical power based on an augmented TTC (aTTC) design. Because the aTTC design has more transformation combinations than do the NCIII and TTC designs, it greatly enriches the QTL mapping for studying heterosis. When the basic population comprises recombinant inbred lines (RIL), we can use the same materials in the NCIII design for aTTC-design QTL mapping with transformation combinations Z1, Z2, and Z4 to obtain the genetic effects of QTL and digenic interactions. Compared with the RIL-based TTC design, the RIL-based aTTC design saves time, money, and labor for the basic population crossed with F1. Several Monte Carlo simulation studies were carried out to confirm the proposed approach; the present genetic parameters could be identified with high statistical power, precision, and calculation speed, even at small sample size or low heritability. Additionally, two elite rice hybrid datasets for nine agronomic traits were used for real data analysis. We dissected the genetic effects and calculated the dominance degree of each QTL and digenic interaction. Real mapping results suggested that the dominance degree in Z2, which mainly characterizes heterosis, showed overdominance and dominance for QTL and digenic interactions. Dominance and overdominance were the major genetic foundations of heterosis in rice. PMID:29240818

  14. Inference and Analysis of Population Structure Using Genetic Data and Network Theory.

    PubMed

    Greenbaum, Gili; Templeton, Alan R; Bar-David, Shirli

    2016-04-01

    Clustering individuals to subpopulations based on genetic data has become commonplace in many genetic studies. Inference about population structure is most often done by applying model-based approaches, aided by visualization using distance-based approaches such as multidimensional scaling. While existing distance-based approaches suffer from a lack of statistical rigor, model-based approaches entail assumptions of prior conditions such as that the subpopulations are at Hardy-Weinberg equilibria. Here we present a distance-based approach for inference about population structure using genetic data by defining population structure using network theory terminology and methods. A network is constructed from a pairwise genetic-similarity matrix of all sampled individuals. The community partition, a partition of a network to dense subgraphs, is equated with population structure, a partition of the population to genetically related groups. Community-detection algorithms are used to partition the network into communities, interpreted as a partition of the population to subpopulations. The statistical significance of the structure can be estimated by using permutation tests to evaluate the significance of the partition's modularity, a network theory measure indicating the quality of community partitions. To further characterize population structure, a new measure of the strength of association (SA) for an individual to its assigned community is presented. The strength of association distribution (SAD) of the communities is analyzed to provide additional population structure characteristics, such as the relative amount of gene flow experienced by the different subpopulations and identification of hybrid individuals. Human genetic data and simulations are used to demonstrate the applicability of the analyses. 
The approach presented here provides a novel, computationally efficient model-free method for inference about population structure that does not entail assumption of prior conditions. The method is implemented in the software NetStruct (available at https://giligreenbaum.wordpress.com/software/). Copyright © 2016 by the Genetics Society of America.
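
    The modularity-based significance test described in this abstract can be illustrated with a minimal Python sketch. It assumes a toy similarity network (two tightly linked triplets joined by a single edge) and uses a plain shuffle of community labels as the null model. Both choices are illustrative assumptions, and the permutation scheme implemented in NetStruct may well differ. The function names `modularity` and `permutation_p_value` are hypothetical.

```python
import random

def modularity(weights, labels):
    """Newman modularity Q of a partition of a weighted, undirected network.

    weights: symmetric matrix (list of lists) of edge weights, zero diagonal.
    labels:  labels[i] is the community assigned to node i.
    """
    n = len(weights)
    strength = [sum(row) for row in weights]  # weighted degree of each node
    two_m = sum(strength)                     # twice the total edge weight
    if two_m == 0:
        raise ValueError("network has no edges")
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += weights[i][j] - strength[i] * strength[j] / two_m
    return q / two_m

def permutation_p_value(weights, labels, n_perm=999, seed=0):
    """Share of label shuffles whose modularity reaches the observed value."""
    rng = random.Random(seed)
    observed = modularity(weights, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if modularity(weights, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy network (made-up data): nodes 0-2 and 3-5 form two triangles,
# connected by the single edge 2-3. All edge weights are 1.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
weights = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    weights[a][b] = weights[b][a] = 1.0

labels = [0, 0, 0, 1, 1, 1]
q_obs, p_value = permutation_p_value(weights, labels)
print(f"Q = {q_obs:.4f}, permutation p = {p_value:.3f}")
```

    With only six nodes there are few distinct relabelings, so the p-value from this toy example is coarse; the sketch is meant to show the mechanics rather than a realistic analysis.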

  15. Generalizing Terwilliger's likelihood approach: a new score statistic to test for genetic association.

    PubMed

    el Galta, Rachid; Uitte de Willige, Shirley; de Visser, Marieke C H; Helmer, Quinta; Hsu, Li; Houwing-Duistermaat, Jeanine J

    2007-09-24

    In this paper, we propose a one degree of freedom test for association between a candidate gene and a binary trait. This method is a generalization of Terwilliger's likelihood ratio statistic and is especially powerful for the situation of one associated haplotype. As an alternative to the likelihood ratio statistic, we derive a score statistic, which has a tractable expression. For haplotype analysis, we assume that phase is known. By means of a simulation study, we compare the performance of the score statistic to Pearson's chi-square statistic and the likelihood ratio statistic proposed by Terwilliger. We illustrate the method on three candidate genes studied in the Leiden Thrombophilia Study. We conclude that the statistic follows a chi square distribution under the null hypothesis and that the score statistic is more powerful than Terwilliger's likelihood ratio statistic when the associated haplotype has frequency between 0.1 and 0.4 and has a small impact on the studied disorder. With regard to Pearson's chi-square statistic, the score statistic has more power when the associated haplotype has frequency above 0.2 and the number of variants is above five.

  16. DNA analysis in Disaster Victim Identification.

    PubMed

    Montelius, Kerstin; Lindblom, Bertil

    2012-06-01

    DNA profiling and matching is one of the primary methods to identify missing persons in a disaster, as defined by the Interpol Disaster Victim Identification Guide. The process to identify a victim by DNA includes: the collection of the best possible ante-mortem (AM) samples, the choice of post-mortem (PM) samples, DNA-analysis, matching and statistical weighting of the genetic relationship or match. Each disaster has its own scenario, and each scenario defines its own methods for identification of the deceased.

  17. Genetic diversity and population structure of the Guinea pig (Cavia porcellus, Rodentia, Caviidae) in Colombia.

    PubMed

    Burgos-Paz, William; Cerón-Muñoz, Mario; Solarte-Portilla, Carlos

    2011-10-01

    The aim was to establish the genetic diversity and population structure of three guinea pig lines, from seven production zones located in Nariño, southwest Colombia. A total of 384 individuals were genotyped with six microsatellite markers. The measurement of intrapopulation diversity revealed allelic richness ranging from 3.0 to 6.56, and observed heterozygosity (Ho) from 0.33 to 0.60, with a deficit in heterozygous individuals. Although statistically significant (p < 0.05), genetic differentiation between population pairs was found to be low. Genetic distance, as well as clustering of guinea-pig lines and populations, coincided with the historical and geographical distribution of the populations. Likewise, high genetic identity between improved and native lines was established. An analysis of group probabilistic assignment revealed that each line should not be considered as a genetically homogeneous group. The findings corroborate the absorption of native genetic material into the improved line introduced into Colombia from Peru. It is necessary to establish conservation programs for native-line individuals in Nariño, and control genealogical and production records in order to reduce the inbreeding values in the populations.
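
    The heterozygote deficit reported in this abstract refers to observed heterozygosity (Ho) falling below the heterozygosity expected from allele frequencies (He). A minimal Python sketch of both quantities for a single microsatellite locus follows. The genotypes are made-up allele sizes, not data from the study, and the function name `heterozygosity` is hypothetical. He is computed here as 1 minus the sum of squared allele frequencies, without a small-sample correction.

```python
from collections import Counter

def heterozygosity(genotypes):
    """Return (Ho, He) for one locus.

    genotypes: list of (allele_a, allele_b) pairs, one pair per diploid individual.
    Ho is the fraction of individuals carrying two different alleles.
    He is 1 - sum(p_i^2) over allele frequencies p_i (no sample-size correction).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    ho = sum(1 for a, b in genotypes if a != b) / n
    counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n
    he = 1.0 - sum((c / total_alleles) ** 2 for c in counts.values())
    return ho, he

# Hypothetical allele sizes (base pairs) for eight individuals at one locus.
genotypes = [
    (150, 150), (150, 154), (154, 154), (150, 150),
    (154, 158), (150, 150), (158, 158), (150, 154),
]
ho, he = heterozygosity(genotypes)
print(f"Ho = {ho:.3f}, He = {he:.3f}, deficit = {ho < he}")
```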

  18. Genetic diversity and population structure of the Guinea pig (Cavia porcellus, Rodentia, Caviidae) in Colombia

    PubMed Central

    Burgos-Paz, William; Cerón-Muñoz, Mario; Solarte-Portilla, Carlos

    2011-01-01

    The aim was to establish the genetic diversity and population structure of three guinea pig lines, from seven production zones located in Nariño, southwest Colombia. A total of 384 individuals were genotyped with six microsatellite markers. The measurement of intrapopulation diversity revealed allelic richness ranging from 3.0 to 6.56, and observed heterozygosity (Ho) from 0.33 to 0.60, with a deficit in heterozygous individuals. Although statistically significant (p < 0.05), genetic differentiation between population pairs was found to be low. Genetic distance, as well as clustering of guinea-pig lines and populations, coincided with the historical and geographical distribution of the populations. Likewise, high genetic identity between improved and native lines was established. An analysis of group probabilistic assignment revealed that each line should not be considered as a genetically homogeneous group. The findings corroborate the absorption of native genetic material into the improved line introduced into Colombia from Peru. It is necessary to establish conservation programs for native-line individuals in Nariño, and control genealogical and production records in order to reduce the inbreeding values in the populations. PMID:22215979

  19. Evaluation of the genetic overlap between osteoarthritis with body mass index and height using genome-wide association scan data.

    PubMed

    Elliott, Katherine S; Chapman, Kay; Day-Williams, Aaron; Panoutsopoulou, Kalliope; Southam, Lorraine; Lindgren, Cecilia M; Arden, Nigel; Aslam, Nadim; Birrell, Fraser; Carluke, Ian; Carr, Andrew; Deloukas, Panos; Doherty, Michael; Loughlin, John; McCaskie, Andrew; Ollier, William E R; Rai, Ashok; Ralston, Stuart; Reed, Mike R; Spector, Timothy D; Valdes, Ana M; Wallis, Gillian A; Wilkinson, Mark; Zeggini, Eleftheria

    2013-06-01

    Obesity as measured by body mass index (BMI) is one of the major risk factors for osteoarthritis. In addition, genetic overlap has been reported between osteoarthritis and normal adult height variation. We investigated whether this relationship is due to a shared genetic aetiology on a genome-wide scale. We compared genetic association summary statistics (effect size, p value) for BMI and height from the GIANT consortium genome-wide association study (GWAS) with genetic association summary statistics from the arcOGEN consortium osteoarthritis GWAS. Significance was evaluated by permutation. Replication of osteoarthritis association of the highlighted signals was investigated in an independent dataset. Phenotypic information of height and BMI was accounted for in a separate analysis using osteoarthritis-free controls. We found significant overlap between osteoarthritis and height (p=3.3×10^-5 for signals with p≤0.05) when the GIANT and arcOGEN GWAS were compared. For signals with p≤0.001 we found 17 shared signals between osteoarthritis and height and four between osteoarthritis and BMI. However, only one of the height or BMI signals that had shown evidence of association with osteoarthritis in the arcOGEN GWAS was also associated with osteoarthritis in the independent dataset: rs12149832, within the FTO gene (combined p=2.3×10^-5). As expected, this signal was attenuated when we adjusted for BMI. We found a significant excess of shared signals between both osteoarthritis and height and osteoarthritis and BMI, suggestive of a common genetic aetiology. However, only one signal showed association with osteoarthritis when followed up in a new dataset.

  20. Evaluation of the genetic overlap between osteoarthritis with body mass index and height using genome-wide association scan data

    PubMed Central

    Elliott, Katherine S; Chapman, Kay; Day-Williams, Aaron; Panoutsopoulou, Kalliope; Southam, Lorraine; Lindgren, Cecilia M; Arden, Nigel; Aslam, Nadim; Birrell, Fraser; Carluke, Ian; Carr, Andrew; Deloukas, Panos; Doherty, Michael; Loughlin, John; McCaskie, Andrew; Ollier, William E R; Rai, Ashok; Ralston, Stuart; Reed, Mike R; Spector, Timothy D; Valdes, Ana M; Wallis, Gillian A; Wilkinson, Mark; Zeggini, Eleftheria

    2013-01-01

    Objectives: Obesity as measured by body mass index (BMI) is one of the major risk factors for osteoarthritis. In addition, genetic overlap has been reported between osteoarthritis and normal adult height variation. We investigated whether this relationship is due to a shared genetic aetiology on a genome-wide scale. Methods: We compared genetic association summary statistics (effect size, p value) for BMI and height from the GIANT consortium genome-wide association study (GWAS) with genetic association summary statistics from the arcOGEN consortium osteoarthritis GWAS. Significance was evaluated by permutation. Replication of osteoarthritis association of the highlighted signals was investigated in an independent dataset. Phenotypic information of height and BMI was accounted for in a separate analysis using osteoarthritis-free controls. Results: We found significant overlap between osteoarthritis and height (p=3.3×10^-5 for signals with p≤0.05) when the GIANT and arcOGEN GWAS were compared. For signals with p≤0.001 we found 17 shared signals between osteoarthritis and height and four between osteoarthritis and BMI. However, only one of the height or BMI signals that had shown evidence of association with osteoarthritis in the arcOGEN GWAS was also associated with osteoarthritis in the independent dataset: rs12149832, within the FTO gene (combined p=2.3×10^-5). As expected, this signal was attenuated when we adjusted for BMI. Conclusions: We found a significant excess of shared signals between both osteoarthritis and height and osteoarthritis and BMI, suggestive of a common genetic aetiology. However, only one signal showed association with osteoarthritis when followed up in a new dataset. PMID:22956599

  1. Recent progress and challenges in population genetics of polyploid organisms: an overview of current state-of-the-art molecular and statistical tools.

    PubMed

    Dufresne, France; Stift, Marc; Vergilino, Roland; Mable, Barbara K

    2014-01-01

    Despite the importance of polyploidy and the increasing availability of new genomic data, there remain important gaps in our knowledge of polyploid population genetics. These gaps arise from the complex nature of polyploid data (e.g. multiple alleles and loci, mixed inheritance patterns, association between ploidy and mating system variation). Furthermore, many of the standard tools for population genetics that have been developed for diploids are often not feasible for polyploids. This review aims to provide an overview of the state-of-the-art in polyploid population genetics and to identify the main areas where further development of molecular techniques and statistical theory is required. We review commonly used molecular tools (amplified fragment length polymorphism, microsatellites, Sanger sequencing, next-generation sequencing and derived technologies) and the challenges associated with their use in polyploid populations: that is, allele dosage determination, null alleles, difficulty of distinguishing orthologues from paralogues and copy number variation. In addition, we review the approaches that have been used for population genetic analysis in polyploids and their specific problems. These problems are in most cases directly associated with dosage uncertainty and the problem of inferring allele frequencies and assumptions regarding inheritance. This leads us to conclude that for advancing the field of polyploid population genetics, highest priority should be given to development of new molecular approaches that allow efficient dosage determination, and to further development of analytical approaches to circumvent dosage uncertainty and to accommodate 'flexible' modes of inheritance. In addition, there is a need for more simulation-based studies that test what kinds of biases could result from both existing and novel approaches. © 2013 John Wiley & Sons Ltd.

  2. Short communication: Estimates of heritabilities and genetic correlations among milk fatty acid unsaturation indices in Canadian Holsteins.

    PubMed

    Bilal, G; Cue, R I; Mustafa, A F; Hayes, J F

    2012-12-01

    The objectives of the present study were to estimate genetic parameters of milk fatty acid unsaturation indices in Canadian Holsteins. Data were available on milk fatty acid composition of 2,573 Canadian Holstein cows from 46 commercial herds enrolled in the Québec Dairy Production Centre of Expertise, Valacta (Sainte-Anne-de-Bellevue, Quebec, Canada). Individual fatty acid percentages (g/100 g of total fatty acids) were determined for each milk sample by gas chromatography. The unsaturation indices were calculated as the ratio of an unsaturated fatty acid to the sum of that unsaturated fatty acid and its corresponding substrate fatty acid, multiplied by 100. A mixed linear model was fitted under REML for the statistical analysis of milk fatty acid unsaturation indices. The statistical model included the fixed effects of parity, age at calving, and stage of lactation, each nested within parity, and the random effects of herd-year-season of calving, animal, and residual. Estimates of heritabilities for the C14, C16, C18, conjugated linoleic acid, and total unsaturation indices were 0.48, 0.25, 0.29, 0.14, and 0.19, respectively. Phenotypic and genetic correlation estimates among unsaturation indices were all positive and ranged from 0.20 to 0.65 and 0.23 to 0.81, respectively. The estimates of heritabilities and genetic correlations for milk fatty acid unsaturation indices suggest that genetic variation exists among cows in milk fatty acid unsaturation, and the proportions of desirable unsaturated fatty acids from a human health point of view may be increased in bovine milk through genetic selection. Copyright © 2012 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
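
    The unsaturation index defined in this abstract (an unsaturated fatty acid divided by the sum of that fatty acid and its substrate, multiplied by 100) can be written as a short Python sketch. The pairing of C14:1 with C14:0 and the percentages used are illustrative assumptions, not values from the study, and the function name `unsaturation_index` is hypothetical.

```python
def unsaturation_index(product_pct, substrate_pct):
    """Unsaturation index = product / (product + substrate) * 100.

    product_pct:   percentage of the unsaturated fatty acid (g/100 g total fatty acids).
    substrate_pct: percentage of its corresponding substrate fatty acid.
    """
    total = product_pct + substrate_pct
    if total <= 0:
        raise ValueError("product and substrate must sum to a positive value")
    return 100.0 * product_pct / total

# Made-up percentages for illustration only.
c14_1 = 1.0    # assumed unsaturated product (C14:1)
c14_0 = 11.5   # assumed saturated substrate (C14:0)
c14_index = unsaturation_index(c14_1, c14_0)
print(f"C14 unsaturation index = {c14_index:.1f}")
```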

  3. EDENetworks: a user-friendly software to build and analyse networks in biogeography, ecology and population genetics.

    PubMed

    Kivelä, Mikko; Arnaud-Haond, Sophie; Saramäki, Jari

    2015-01-01

    The recent application of graph-based network theory analysis to biogeography, community ecology and population genetics has created a need for user-friendly software, which would allow a wider accessibility to and adaptation of these methods. EDENetworks aims to fill this void by providing an easy-to-use interface for the whole analysis pipeline of ecological and evolutionary networks starting from matrices of species distributions, genotypes, bacterial OTUs or populations characterized genetically. The user can choose between several different ecological distance metrics, such as Bray-Curtis or Sorensen distance, or population genetic metrics such as FST or Goldstein distances, to turn the raw data into a distance/dissimilarity matrix. This matrix is then transformed into a network by manual or automatic thresholding based on percolation theory or by building the minimum spanning tree. The networks can be visualized along with auxiliary data and analysed with various metrics such as degree, clustering coefficient, assortativity and betweenness centrality. The statistical significance of the results can be estimated either by resampling the original biological data or by null models based on permutations of the data. © 2014 John Wiley & Sons Ltd.
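
    The step from raw data to a thresholded network described in this abstract can be sketched in Python as follows. The sketch assumes a Bray-Curtis dissimilarity and a fixed, manually chosen threshold; it does not reproduce the percolation-based automatic thresholding or the minimum spanning tree option, and it is not EDENetworks code. The abundance vectors are made up, and the function names `bray_curtis` and `threshold_network` are hypothetical.

```python
from itertools import combinations

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(x, y)) / total

def threshold_network(samples, threshold):
    """Link two samples when their dissimilarity is at or below the threshold.

    samples: dict mapping a sample name to its abundance vector.
    Returns a dict mapping (name_u, name_v) to the dissimilarity of that edge.
    """
    edges = {}
    for u, v in combinations(sorted(samples), 2):
        d = bray_curtis(samples[u], samples[v])
        if d <= threshold:
            edges[(u, v)] = d
    return edges

# Made-up species counts at three hypothetical sites.
samples = {
    "site_a": [10, 0, 5],
    "site_b": [8, 2, 5],
    "site_c": [0, 10, 0],
}
network = threshold_network(samples, threshold=0.5)
for (u, v), d in network.items():
    print(f"{u} -- {v}: {d:.3f}")
```

    Lowering the threshold removes edges and fragments the network, which is why the choice of threshold (manual or automatic) matters for any downstream metric such as degree or clustering coefficient.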

  4. Characterization of genome-wide microsatellite markers in rabbitfishes, an important resource for artisanal fisheries in the Indo-West Pacific.

    PubMed

    Kiper, Ilkser Erdem; Bloomer, Paulette; Borsa, Philippe; Hoareau, Thierry Bernard

    2018-02-01

    Rabbitfishes are reef-associated fishes that support local fisheries throughout the Indo-West Pacific region. Sound management of the resource requires the development of molecular tools for appropriate stock delimitation of the different species in the family. Microsatellite markers were developed for the cordonnier, Siganus sutor, and their potential for cross-amplification was investigated in 12 congeneric species. A library of 792 repeat-containing sequences was built. Nineteen sets of newly developed primers, and 14 universal finfish microsatellites were tested in S. sutor. Amplification success of the 19 Siganus-specific markers ranged from 32 to 79% in the 12 other Siganus species, slightly decreasing when the genetic distance of the target species to S. sutor increased. Seventeen of these markers were polymorphic in S. sutor and were further assayed in S. luridus, S. rivulatus, and S. spinus, of which respectively 9, 10 and 8 were polymorphic. Statistical power analysis and an analysis of molecular variance showed that subtle genetic differentiation can be detected using these markers, highlighting their utility for the study of genetic diversity and population genetic structure in rabbitfishes.

  5. Highly Pathogenic H5N1 Avian Influenza Viruses Exhibit Few Barriers to Gene Flow in Vietnam

    PubMed Central

    Carrel, Margaret; Wan, Xiu-Feng; Nguyen, Tung; Emch, Michael

    2013-01-01

    Locating areas where genetic change is inhibited can illuminate underlying processes that drive evolution of pathogens. The persistence of highly pathogenic H5N1 avian influenza in Vietnam since 2003, and the continuous molecular evolution of Vietnamese avian influenza viruses, indicate that local environmental factors are supportive not only of incidence but also of viral adaptation. This article explores whether gene flow is constant across Vietnam, or whether there exist boundary areas where gene flow exhibits discontinuity. Using a dataset of 125 highly pathogenic H5N1 avian influenza viruses, principal components analysis and wombling analysis are used to indicate the location, magnitude, and statistical significance of genetic boundaries. Results show that a small number of geographically minor boundaries to gene flow in highly pathogenic H5N1 avian influenza viruses exist in Vietnam, but that overall there is little division in genetic exchange. This suggests that differences in genetic characteristics of viruses from one region to another are not the result of barriers to H5N1 viral exchange in Vietnam, and that H5N1 avian influenza is able to spread relatively unimpeded across the country. PMID:22350419

  6. Influenza virus sequence feature variant type analysis: evidence of a role for NS1 in influenza virus host range restriction.

    PubMed

    Noronha, Jyothi M; Liu, Mengya; Squires, R Burke; Pickett, Brett E; Hale, Benjamin G; Air, Gillian M; Galloway, Summer E; Takimoto, Toru; Schmolke, Mirco; Hunt, Victoria; Klem, Edward; García-Sastre, Adolfo; McGee, Monnie; Scheuermann, Richard H

    2012-05-01

    Genetic drift of influenza virus genomic sequences occurs through the combined effects of sequence alterations introduced by a low-fidelity polymerase and the varying selective pressures experienced as the virus migrates through different host environments. While traditional phylogenetic analysis is useful in tracking the evolutionary heritage of these viruses, the specific genetic determinants that dictate important phenotypic characteristics are often difficult to discern within the complex genetic background arising through evolution. Here we describe a novel influenza virus sequence feature variant type (Flu-SFVT) approach, made available through the public Influenza Research Database resource (www.fludb.org), in which variant types (VTs) identified in defined influenza virus protein sequence features (SFs) are used for genotype-phenotype association studies. Since SFs have been defined for all influenza virus proteins based on known structural, functional, and immune epitope recognition properties, the Flu-SFVT approach allows the rapid identification of the molecular genetic determinants of important influenza virus characteristics and their connection to underlying biological functions. We demonstrate the use of the SFVT approach to obtain statistical evidence for effects of NS1 protein sequence variations in dictating influenza virus host range restriction.

  7. A Novel Genome-Information Content-Based Statistic for Genome-Wide Association Analysis Designed for Next-Generation Sequencing Data

    PubMed Central

    Luo, Li; Zhu, Yun

    2012-01-01

    The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze the collective frequency differences of variants between cases and controls have shifted the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistic, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T², collapsing method, multivariate and collapsing (CMC) method, individual χ² test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to a published resequencing dataset from the ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets. PMID:22651812

  8. A novel genome-information content-based statistic for genome-wide association analysis designed for next-generation sequencing data.

    PubMed

    Luo, Li; Zhu, Yun; Xiong, Momiao

    2012-06-01

    The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze the collective frequency differences of variants between cases and controls have shifted the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistic, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T², collapsing method, multivariate and collapsing (CMC) method, individual χ² test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to a published resequencing dataset from the ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.

  9. The geography of malaria genetics in the Democratic Republic of Congo: A complex and fragmented landscape

    PubMed Central

    Carrel, Margaret; Patel, Jaymin; Taylor, Steve M.; Janko, Mark; Mwandagalirwa, Melchior Kashamuka; Tshefu, Antoinette K.; Escalante, Ananias A.; McCollum, Andrea; Alam, Md Tauqeer; Udhayakumar, Venkatachalam; Meshnick, Steven; Emch, Michael

    2014-01-01

    Understanding how malaria parasites move between populations is important, particularly given the potential for malaria to be reintroduced into areas where it was previously eliminated. We examine the distribution of malaria genetics across seven sites within the Democratic Republic of Congo (DRC) and two nearby countries, Ghana and Kenya, in order to understand how the relatedness of malaria parasites varies across space, and whether there are barriers to the flow of malaria parasites within the DRC or across borders. Parasite DNA was retrieved from dried blood spots from 7 Demographic and Health Survey sample clusters in the DRC. Malaria genetic characteristics of parasites from Ghana and Kenya were also obtained. For each of 9 geographic sites (7 DRC, 1 Ghana and 1 Kenya), a pair-wise RST statistic was calculated, indicating the genetic distance between malaria parasites found in those locations. Mapping genetics across the spatial extent of the study area indicates a complex genetic landscape, where relatedness between two proximal sites may be relatively high (RST > 0.64) or low (RST < 0.05), and where distal sites also exhibit both high and low genetic similarity. Mantel’s tests suggest that malaria genetics differ as geographic distances increase. Principal Coordinate Analysis suggests that genetically related samples are not co-located. Barrier analysis reveals no significant barriers to gene flow between locations. Malaria genetics in the DRC have a complex and fragmented landscape. Limited exchange of genes across space is reflected in greater genetic distance between malaria parasites isolated at greater geographic distances. There is, however, evidence for close genetic ties between distally located sample locations, indicating that movement of malaria parasites and flow of genes is being driven by factors other than distance decay. 
    This research demonstrates the contributions that spatial disease ecology and landscape genetics can make to understanding the evolutionary dynamics of infectious diseases. PMID:25459204

  10. Meta-analysis reveals PTPN22 1858C/T polymorphism confers susceptibility to rheumatoid arthritis in Caucasian but not in Asian population.

    PubMed

    Nabi, Gowher; Akhter, Naseem; Wahid, Mohd; Bhatia, Kanchan; Mandal, Raju Kumar; Dar, Sajad Ahmad; Jawed, Arshad; Haque, Shafiul

    2016-01-01

    The PTPN22 1858C/T polymorphism is associated with rheumatoid arthritis (RA). However, reports from Asian populations are conflicting and lack consensus. The aim of our study was to evaluate the association between the PTPN22 1858C/T polymorphism and RA in Asian and Caucasian subjects by carrying out a meta-analysis of Asian and Caucasian data. A total of 27 205 RA cases and 27 677 controls were considered in the present meta-analysis involving eight Asian and 35 Caucasian studies. Pooled odds ratios (ORs) were calculated for the allele, dominant, and recessive genetic models. No statistically significant association was found between the PTPN22 1858C/T polymorphism and risk of RA in the Asian population (allele genetic model: OR = 1.217, 95% confidence interval (CI) = 0.99-1.496, p value 0.061; dominant genetic model: OR = 1.238, 95% CI = 0.982-1.562, p value 0.071; recessive genetic model: OR = 1.964, 95% CI = 0.678-5.693, p value 0.213). A significant association with risk of RA was observed in the Caucasian population, suggesting that the T allele does confer susceptibility to RA in this subgroup (allele genetic model: OR = 1.638, 95% CI = 1.574-1.705, p value < 0.0001; dominant genetic model: OR = 1.67, 95% CI = 1.598-1.745, p value < 0.0001; recessive genetic model: OR = 2.65, 95% CI = 2.273-3.089, p value < 0.0001). The PTPN22 1858C/T polymorphism is not associated with RA risk in Asian populations. However, our meta-analysis confirms that the PTPN22 1858C/T polymorphism is associated with RA susceptibility in Caucasians.

  11. ACCELERATED FAILURE TIME MODELS PROVIDE A USEFUL STATISTICAL FRAMEWORK FOR AGING RESEARCH

    PubMed Central

    Swindell, William R.

    2009-01-01

    Survivorship experiments play a central role in aging research and are performed to evaluate whether interventions alter the rate of aging and increase lifespan. The accelerated failure time (AFT) model is seldom used to analyze survivorship data, but offers a potentially useful statistical approach that is based upon the survival curve rather than the hazard function. In this study, AFT models were used to analyze data from 16 survivorship experiments that evaluated the effects of one or more genetic manipulations on mouse lifespan. Most genetic manipulations were found to have a multiplicative effect on survivorship that is independent of age and well-characterized by the AFT model “deceleration factor”. AFT model deceleration factors also provided a more intuitive measure of treatment effect than the hazard ratio, and were robust to departures from modeling assumptions. Age-dependent treatment effects, when present, were investigated using quantile regression modeling. These results provide an informative and quantitative summary of survivorship data associated with currently known long-lived mouse models. In addition, from the standpoint of aging research, these statistical approaches have appealing properties and provide valuable tools for the analysis of survivorship data. PMID:19007875

  12. Accelerated failure time models provide a useful statistical framework for aging research.

    PubMed

    Swindell, William R

    2009-03-01

    Survivorship experiments play a central role in aging research and are performed to evaluate whether interventions alter the rate of aging and increase lifespan. The accelerated failure time (AFT) model is seldom used to analyze survivorship data, but offers a potentially useful statistical approach that is based upon the survival curve rather than the hazard function. In this study, AFT models were used to analyze data from 16 survivorship experiments that evaluated the effects of one or more genetic manipulations on mouse lifespan. Most genetic manipulations were found to have a multiplicative effect on survivorship that is independent of age and well-characterized by the AFT model "deceleration factor". AFT model deceleration factors also provided a more intuitive measure of treatment effect than the hazard ratio, and were robust to departures from modeling assumptions. Age-dependent treatment effects, when present, were investigated using quantile regression modeling. These results provide an informative and quantitative summary of survivorship data associated with currently known long-lived mouse models. In addition, from the standpoint of aging research, these statistical approaches have appealing properties and provide valuable tools for the analysis of survivorship data.

  13. Evaluation of androgen receptor gene as a candidate gene in female androgenetic alopecia.

    PubMed

    el-Samahy, May H; Shaheen, Maha A; Saddik, Dina E B; Abdel-Fattah, Nermeen S A; el-Sawi, Mohammad A; Mahran, Manal Z; Shehab, Abeer A A

    2009-06-01

    Genetic polymorphisms of the androgen receptor (AR) gene have been studied in male androgenetic alopecia (AGA); however, little is known about gene polymorphism and female AGA. To evaluate the AR gene as a candidate gene for female AGA. Thirty premenopausal Egyptian female patients with AGA (mean age, 32.3 ± 7 years) and 11 age- and sex-matched controls were included. All subjects underwent laboratory and pelvic ultrasound evaluation to exclude other precipitating cause(s) of hair loss. Scalp biopsy was taken and the AR gene was evaluated using polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP). According to Ludwig's classification, all patients had type II AGA. Statistical analysis showed no statistically significant difference in genotype (χ² = 5.513, P ≥ 0.05) or allele frequency (χ² = 1.312, P ≥ 0.05) between patients and controls. There was also no statistically significant difference between the genotype and allele frequency with disease duration. In contrast with male AGA, no association was found between type II AGA in Egyptian women and the AR gene. Therefore, the genetic study of this gene does not serve as a biomarker for the identification of women with a predisposition to AGA.

  14. Neurosphere and adherent culture conditions are equivalent for malignant glioma stem cell lines.

    PubMed

    Rahman, Maryam; Reyner, Karina; Deleyrolle, Loic; Millette, Sebastien; Azari, Hassan; Day, Bryan W; Stringer, Brett W; Boyd, Andrew W; Johns, Terrance G; Blot, Vincent; Duggal, Rohit; Reynolds, Brent A

    2015-03-01

    Certain limitations of the neurosphere assay (NSA) have resulted in a search for alternative culture techniques for brain tumor-initiating cells (TICs). Recently, reports have described growing glioblastoma (GBM) TICs as a monolayer using laminin. We performed a side-by-side analysis of the NSA and laminin (adherent) culture conditions to compare the growth and expansion of GBM TICs. GBM cells were grown using the NSA and adherent culture conditions. Comparisons were made using growth in culture, apoptosis assays, protein expression, limiting dilution clonal frequency assay, genetic affymetrix analysis, and tumorigenicity in vivo. In vitro expansion curves for the NSA and adherent culture conditions were virtually identical (P=0.24) and the clonogenic frequencies (5.2% for NSA vs. 5.0% for laminin, P=0.9) were similar as well. Likewise, markers of differentiation (glial fibrillary acidic protein and beta tubulin III) and proliferation (Ki67 and MCM2) revealed no statistical difference between the sphere and attachment methods. Several different methods were used to determine the numbers of dead or dying cells (trypan blue, DiIC, caspase-3, and annexin V) with none of the assays noting a meaningful variance between the two methods. In addition, genetic expression analysis with microarrays revealed no significant differences between the two groups. Finally, glioma cells derived from both methods of expansion formed large invasive tumors exhibiting GBM features when implanted in immune-compromised animals. A detailed functional, protein and genetic characterization of human GBM cells cultured in serum-free defined conditions demonstrated no statistically meaningful differences when grown using sphere (NSA) or adherent conditions. Hence, both methods are functionally equivalent and remain suitable options for expanding primary high-grade gliomas in tissue culture.

  15. Neurosphere and adherent culture conditions are equivalent for malignant glioma stem cell lines

    PubMed Central

    Reyner, Karina; Deleyrolle, Loic; Millette, Sebastien; Azari, Hassan; Day, Bryan W.; Stringer, Brett W.; Boyd, Andrew W.; Johns, Terrance G.; Blot, Vincent; Duggal, Rohit; Reynolds, Brent A.

    2015-01-01

    Certain limitations of the neurosphere assay (NSA) have resulted in a search for alternative culture techniques for brain tumor-initiating cells (TICs). Recently, reports have described growing glioblastoma (GBM) TICs as a monolayer using laminin. We performed a side-by-side analysis of the NSA and laminin (adherent) culture conditions to compare the growth and expansion of GBM TICs. GBM cells were grown using the NSA and adherent culture conditions. Comparisons were made using growth in culture, apoptosis assays, protein expression, limiting dilution clonal frequency assay, genetic affymetrix analysis, and tumorigenicity in vivo. In vitro expansion curves for the NSA and adherent culture conditions were virtually identical (P=0.24) and the clonogenic frequencies (5.2% for NSA vs. 5.0% for laminin, P=0.9) were similar as well. Likewise, markers of differentiation (glial fibrillary acidic protein and beta tubulin III) and proliferation (Ki67 and MCM2) revealed no statistical difference between the sphere and attachment methods. Several different methods were used to determine the numbers of dead or dying cells (trypan blue, DiIC, caspase-3, and annexin V) with none of the assays noting a meaningful variance between the two methods. In addition, genetic expression analysis with microarrays revealed no significant differences between the two groups. Finally, glioma cells derived from both methods of expansion formed large invasive tumors exhibiting GBM features when implanted in immune-compromised animals. A detailed functional, protein and genetic characterization of human GBM cells cultured in serum-free defined conditions demonstrated no statistically meaningful differences when grown using sphere (NSA) or adherent conditions. Hence, both methods are functionally equivalent and remain suitable options for expanding primary high-grade gliomas in tissue culture. PMID:25806119

  16. Genetics Home Reference: JAK3-deficient severe combined immunodeficiency

    MedlinePlus

    ... of a genetic condition? Genetic and Rare Diseases Information Center Frequency JAK3 -deficient SCID accounts for an estimated 7 to 14 percent of cases of SCID. The prevalence of SCID from all genetic causes combined is approximately 1 in ... Information What information about a genetic condition can statistics ...

  17. Spatial analyses for nonoverlapping objects with size variations and their application to coral communities.

    PubMed

    Muko, Soyoka; Shimatani, Ichiro K; Nozawa, Yoko

    2014-07-01

    Spatial distributions of individuals are conventionally analysed by representing objects as dimensionless points, in which spatial statistics are based on centre-to-centre distances. However, if organisms expand without overlapping and show size variations, such as is the case for encrusting corals, interobject spacing is crucial for spatial associations where interactions occur. We introduced new pairwise statistics using minimum distances between objects and demonstrated their utility when examining encrusting coral community data. We also calculated the conventional point process statistics and the grid-based statistics to clarify the advantages and limitations of each spatial statistical method. For simplicity, coral colonies were approximated by disks in these demonstrations. Focusing on short-distance effects, the use of minimum distances revealed that almost all coral genera were aggregated at a scale of 1-25 cm. However, when fragmented colonies (ramets) were treated as a genet, a genet-level analysis indicated weak or no aggregation, suggesting that most corals were randomly distributed and that fragmentation was the primary cause of colony aggregations. In contrast, point process statistics showed larger aggregation scales, presumably because centre-to-centre distances included both intercolony spacing and colony sizes (radius). The grid-based statistics were able to quantify the patch (aggregation) scale of colonies, but the scale was strongly affected by the colony size. Our approach quantitatively showed repulsive effects between an aggressive genus and a competitively weak genus, while the grid-based statistics (covariance function) also showed repulsion although the spatial scale indicated from the statistics was not directly interpretable in terms of ecological meaning. 
    The use of minimum distances together with previously proposed spatial statistics helped us to extend our understanding of the spatial patterns of nonoverlapping objects that vary in size and the associated specific scales. © 2013 The Authors. Journal of Animal Ecology © 2013 British Ecological Society.
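
    Illustrative note: the contrast this abstract draws between centre-to-centre distances and minimum distances between objects can be sketched in Python for colonies approximated by disks. The tuple layout, function names, and example coordinates are hypothetical. Computing the gap as the centre distance minus both radii is the standard geometry for two disks and is assumed here to correspond to the minimum distance used by the authors.

```python
import math
from typing import Tuple

Disk = Tuple[float, float, float]  # (x, y, radius), all in the same unit


def centre_distance(a: Disk, b: Disk) -> float:
    """Distance between disk centres, as used in point process statistics."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def minimum_distance(a: Disk, b: Disk) -> float:
    """Edge-to-edge gap between two disks (zero if they touch or overlap)."""
    return max(0.0, centre_distance(a, b) - a[2] - b[2])


# Hypothetical colonies, in cm: a large one at the origin, a smaller one nearby.
large = (0.0, 0.0, 5.0)
small = (13.0, 0.0, 3.0)
print(centre_distance(large, small), minimum_distance(large, small))
```

    For these two hypothetical disks the centres are 13 cm apart but the gap between the edges is only 5 cm, which illustrates why centre-based statistics can indicate larger aggregation scales than minimum distances.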

  18. Robustness of meta-analyses in finding gene × environment interactions

    PubMed Central

    Shi, Gang; Nehorai, Arye

    2017-01-01

    Meta-analyses that synthesize statistical evidence across studies have become important analytical tools for genetic studies. Inspired by the success of genome-wide association studies of the genetic main effect, researchers are searching for gene × environment interactions. Confounders are routinely included in the genome-wide gene × environment interaction analysis as covariates; however, this does not control for any confounding effects on the results if covariate × environment interactions are present. We carried out simulation studies to evaluate the robustness to the covariate × environment confounder for meta-regression and joint meta-analysis, which are two commonly used meta-analysis methods for testing the gene × environment interaction or the genetic main effect and interaction jointly. Here we show that meta-regression is robust to the covariate × environment confounder while joint meta-analysis is subject to the confounding effect with inflated type I error rates. Given vast sample sizes employed in genome-wide gene × environment interaction studies, non-significant covariate × environment interactions at the study level could substantially elevate the type I error rate at the consortium level. When covariate × environment confounders are present, type I errors can be controlled in joint meta-analysis by including the covariate × environment terms in the analysis at the study level. Alternatively, meta-regression can be applied, which is robust to potential covariate × environment confounders. PMID:28362796

  19. Methodological approaches in analysing observational data: A practical example on how to address clustering and selection bias.

    PubMed

    Trutschel, Diana; Palm, Rebecca; Holle, Bernhard; Simon, Michael

    2017-11-01

    Because not every scientific question on effectiveness can be answered with randomised controlled trials, research methods that minimise bias in observational studies are required. Two major concerns influence the internal validity of effect estimates: selection bias and clustering. Hence, to reduce the bias of the effect estimates, more sophisticated statistical methods are needed. The aim is to introduce statistical approaches such as propensity score matching and mixed models into a representative real-world analysis; the implementation in the statistical software R is presented to allow the results to be reproduced. We perform a two-level analytic strategy to address the problems of bias and clustering: (i) generalised models with different abilities to adjust for dependencies are used to analyse binary data and (ii) the genetic matching and covariate adjustment methods are used to adjust for selection bias. Hence, we analyse the data from two population samples, the sample produced by the matching method and the full sample. The different analysis methods in this article present different results but still point in the same direction. In our example, the estimate of the probability of receiving a case conference is higher in the treatment group than in the control group. Both strategies, genetic matching and covariate adjustment, have their limitations but complement each other to provide the whole picture. The statistical approaches were feasible for reducing bias but were nevertheless limited by the sample used. For each study and obtained sample, the pros and cons of the different methods have to be weighed. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.

  20. Applications of statistical physics and information theory to the analysis of DNA sequences

    NASA Astrophysics Data System (ADS)

    Grosse, Ivo

    2000-10-01

    DNA carries the genetic information of most living organisms, and the goal of genome projects is to uncover that genetic information. One basic task in the analysis of DNA sequences is the recognition of protein coding genes. Powerful computer programs for gene recognition have been developed, but most of them are based on statistical patterns that vary from species to species. In this thesis I address the question of whether there exist universal statistical patterns that are different in coding and noncoding DNA of all living species, regardless of their phylogenetic origin. In search of such species-independent patterns I study the mutual information function of genomic DNA sequences, and find that it shows persistent period-three oscillations. To understand the biological origin of the observed period-three oscillations, I compare the mutual information function of genomic DNA sequences to the mutual information function of stochastic model sequences. I find that the pseudo-exon model is able to reproduce the mutual information function of genomic DNA sequences. Moreover, I find that a generalization of the pseudo-exon model can connect the existence and the functional form of long-range correlations to the presence and the length distributions of coding and noncoding regions. Based on these theoretical studies I am able to find an information-theoretical quantity, the average mutual information (AMI), whose probability distributions are significantly different in coding and noncoding DNA, while they are almost identical in all studied species. These findings show that there exist universal statistical patterns that are different in coding and noncoding DNA of all studied species, and they suggest that the AMI may be used to identify genes in different living species, irrespective of their taxonomic origin.
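
    Illustrative note: a mutual information function of a DNA sequence, of the kind studied in this thesis, can be sketched in Python as below. The sketch uses the textbook definition I(k) = Σ p_ab(k) log2[p_ab(k) / (p_a p_b)] for symbols k positions apart, with probabilities estimated from observed counts. The abstract does not give the exact estimator used in the thesis, so the function name, the estimator, and the toy sequence are assumptions.

```python
import math
from collections import Counter


def mutual_information(seq: str, k: int) -> float:
    """Mutual information (in bits) between symbols k positions apart."""
    if k < 1 or k >= len(seq):
        raise ValueError("k must satisfy 1 <= k < len(seq)")
    pairs = list(zip(seq, seq[k:]))  # (symbol at i, symbol at i + k)
    n = len(pairs)
    joint = Counter(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        # p_ab / (p_a * p_b), written with raw counts.
        ratio = count * n / (first[a] * second[b])
        mi += (count / n) * math.log2(ratio)
    return mi


# Hypothetical toy sequence; real input would be a genomic DNA string.
toy = "ACG" * 50
profile = [mutual_information(toy, k) for k in range(1, 10)]
```

    Plotting such a profile against k for real coding and noncoding sequences is one way to look for the period-three oscillations described in the abstract.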

  1. Melanocortin-1 receptor, skin cancer and phenotypic characteristics (M-SKIP) project: study design and methods for pooling results of genetic epidemiological studies

    PubMed Central

    2012-01-01

    Background For complex diseases like cancer, pooled-analysis of individual data represents a powerful tool to investigate the joint contribution of genetic, phenotypic and environmental factors to the development of a disease. Pooled-analysis of epidemiological studies has many advantages over meta-analysis, and preliminary results may be obtained faster and with lower costs than with prospective consortia. Design and methods Based on our experience with the study design of the Melanocortin-1 receptor (MC1R) gene, SKin cancer and Phenotypic characteristics (M-SKIP) project, we describe the most important steps in planning and conducting a pooled-analysis of genetic epidemiological studies. We then present the statistical analysis plan that we are going to apply, giving particular attention to methods of analysis recently proposed to account for between-study heterogeneity and to explore the joint contribution of genetic, phenotypic and environmental factors in the development of a disease. Within the M-SKIP project, data on 10,959 skin cancer cases and 14,785 controls from 31 international investigators were checked for quality and recoded for standardization. We first proposed to fit the aggregated data with random-effects logistic regression models. However, for the M-SKIP project, a two-stage analysis will be preferred to overcome the problem regarding the availability of different study covariates. The joint contribution of MC1R variants and phenotypic characteristics to skin cancer development will be studied via logic regression modeling. Discussion Methodological guidelines to correctly design and conduct pooled-analyses are needed to facilitate application of such methods, thus providing a better summary of the actual findings on specific fields. PMID:22862891

  2. Perceived knowledge and clinical comfort with genetics among Taiwanese nurses enrolled in a RN-to-BSN program.

    PubMed

    Hsiao, Chiu-Yueh; Lee, Shu-Hsin; Chen, Suh-Jen; Lin, Shu-Chin

    2013-08-01

    Advances in genetics have had a profound impact on health care. Yet, many nurses, as well as other health care providers, have limited genetic knowledge and feel uncomfortable integrating genetics into their practice. Very little is known about perceived genetic knowledge and clinical comfort among Taiwanese nurses enrolled in a Registered Nurse to Bachelor of Science in Nursing program. This study aimed to examine perceived knowledge and clinical comfort with genetics among Taiwanese nurses enrolled in a Registered Nurse to Bachelor of Science in Nursing program and to assess how genetics has been integrated into their past and current nursing programs. The study also sought to examine correlations among perceived knowledge, integration of genetics into the nursing curriculum, and clinical comfort with genetics. A descriptive, cross-sectional design was used. Taiwanese nurses enrolled in a Registered Nurse to Bachelor of Science in Nursing program were recruited. A total of 190 of 220 nurses returned the completed survey (86.36% response rate). Descriptive statistics and the Pearson product-moment correlation were used for data analysis. Most nurses indicated limited perceived knowledge and clinical comfort with genetics. Curricular hours focused on genetics in a current nursing program were greater than those in past nursing programs. The use of genetic materials, attendance at genetic workshops and conferences, and clinically relevant genetics in nursing practice were significantly related to perceived knowledge and clinical comfort with genetics. However, there were no correlations between prior genetic-based health care, perceived knowledge, and clinical comfort with genetics. This study demonstrated the need for emphasizing genetic education and practice to ensure health-related professionals become knowledgeable about genetic information.
Given the rapidly developing genetic revolution, nurses and other health care providers need to utilize genetic discoveries to optimize health outcomes. Copyright © 2012 Elsevier Ltd. All rights reserved.

  3. Multivariate Analysis of Genotype-Phenotype Association.

    PubMed

    Mitteroecker, Philipp; Cheverud, James M; Pavlicev, Mihaela

    2016-04-01

    With the advent of modern imaging and measurement technology, complex phenotypes are increasingly represented by large numbers of measurements, which may not bear biological meaning one by one. For such multivariate phenotypes, studying the pairwise associations between all measurements and all alleles is highly inefficient and prevents insight into the genetic pattern underlying the observed phenotypes. We present a new method for identifying patterns of allelic variation (genetic latent variables) that are maximally associated (in terms of effect size) with patterns of phenotypic variation (phenotypic latent variables). This multivariate genotype-phenotype mapping (MGP) separates phenotypic features under strong genetic control from less genetically determined features and thus permits an analysis of the multivariate structure of genotype-phenotype association, including its dimensionality and the clustering of genetic and phenotypic variables within this association. Different variants of MGP maximize different measures of genotype-phenotype association: genetic effect, genetic variance, or heritability. In an application to a mouse sample, scored for 353 SNPs and 11 phenotypic traits, the first dimension of genetic and phenotypic latent variables accounted for >70% of genetic variation present in all 11 measurements; 43% of variation in this phenotypic pattern was explained by the corresponding genetic latent variable. The first three dimensions together sufficed to account for almost 90% of genetic variation in the measurements and for all the interpretable genotype-phenotype association. Each dimension can be tested as a whole against the hypothesis of no association, thereby reducing the number of statistical tests from 7766 to 3, the maximal number of meaningful independent tests. Important alleles can be selected based on their effect size (additive or nonadditive effect on the phenotypic latent variable).
This low dimensionality of the genotype-phenotype map has important consequences for gene identification and may shed light on the evolvability of organisms. Copyright © 2016 by the Genetics Society of America.

  4. Genes and gene networks implicated in aggression related behaviour.

    PubMed

    Malki, Karim; Pain, Oliver; Du Rietz, Ebba; Tosto, Maria Grazia; Paya-Cano, Jose; Sandnabba, Kenneth N; de Boer, Sietse; Schalkwyk, Leonard C; Sluyter, Frans

    2014-10-01

    Aggressive behaviour is a major cause of mortality and morbidity. Despite moderate heritability estimates, progress in identifying the genetic factors underlying aggressive behaviour has been limited. There are currently three genetic mouse models of high and low aggression created using selective breeding. This is the first study to offer a global transcriptomic characterization of the prefrontal cortex across all three genetic mouse models of aggression. A systems biology approach has been applied to transcriptomic data across the three pairs of selected inbred mouse strains (Turku Aggressive (TA) and Turku Non-Aggressive (TNA), Short Attack Latency (SAL) and Long Attack Latency (LAL) mice and North Carolina Aggressive (NC900) and North Carolina Non-Aggressive (NC100)), providing novel insight into the neurobiological mechanisms and genetics underlying aggression. First, weighted gene co-expression network analysis (WGCNA) was performed to identify modules of highly correlated genes associated with aggression. Probe sets belonging to gene modules uncovered by WGCNA were carried forward for network analysis using ingenuity pathway analysis (IPA). The RankProd non-parametric algorithm was then used to statistically evaluate expression differences across the genes belonging to modules significantly associated with aggression. IPA uncovered two pathways, involving NF-kB and MAPKs. The secondary RankProd analysis yielded 14 differentially expressed genes, some of which have previously been implicated in pathways associated with aggressive behaviour, such as Adrbk2. The results highlighted plausible candidate genes and gene networks implicated in aggression-related behaviour.

  5. Quantitative genetic analysis of the body composition and blood pressure association in two ethnically diverse populations.

    PubMed

    Ghosh, Sudipta; Dosaev, Tasbulat; Prakash, Jai; Livshits, Gregory

    2017-04-01

    The major aim of this study was to conduct a comparative quantitative-genetic analysis of body composition (BCP) and somatotype (STP) variation, as well as their correlations with blood pressure (BP), in two ethnically, culturally and geographically different populations: the Santhal, an indigenous ethnic group from India, and the Chuvash, an indigenous population from Russia. Correspondingly, two pedigree-based samples were collected, from 1,262 Santhal and 1,558 Chuvash individuals, respectively. At the first stage of the study, descriptive statistics and a series of univariate regression analyses were calculated. Finally, multiple and multivariate regression (MMR) analyses, with BP measurements as dependent variables and age, sex, BCP and STP as independent variables, were carried out in each sample separately. The significant and independent covariates of BP were identified and used for re-examination in pedigree-based variance decomposition analysis. Despite clear and significant differences between the populations in BCP/STP, both Santhal and Chuvash were found to be predominantly mesomorphic irrespective of their sex. According to MMR analyses, variation of BP significantly depended on age and the mesomorphic component in both samples, and in addition on sex, ectomorphy and fat mass index in the Santhal sample and on fat free mass index in the Chuvash sample. The additive genetic component contributes to a substantial proportion of blood pressure and body composition variance. In addition to the above-mentioned results, variance component analysis suggests that additive genetic factors influence BP and BCP/STP associations significantly. © 2017 Wiley Periodicals, Inc.

  6. Logical and Methodological Issues Affecting Genetic Studies of Humans Reported in Top Neuroscience Journals.

    PubMed

    Grabitz, Clara R; Button, Katherine S; Munafò, Marcus R; Newbury, Dianne F; Pernet, Cyril R; Thompson, Paul A; Bishop, Dorothy V M

    2018-01-01

    Genetics and neuroscience are two areas of science that pose particular methodological problems because they involve detecting weak signals (i.e., small effects) in noisy data. In recent years, increasing numbers of studies have attempted to bridge these disciplines by looking for genetic factors associated with individual differences in behavior, cognition, and brain structure or function. However, different methodological approaches to guarding against false positives have evolved in the two disciplines. To explore methodological issues affecting neurogenetic studies, we conducted an in-depth analysis of 30 consecutive articles in 12 top neuroscience journals that reported on genetic associations in nonclinical human samples. It was often difficult to estimate effect sizes in neuroimaging paradigms. Where effect sizes could be calculated, the studies reporting the largest effect sizes tended to have two features: (i) they had the smallest samples and were generally underpowered to detect genetic effects, and (ii) they did not fully correct for multiple comparisons. Furthermore, only a minority of studies used statistical methods for multiple comparisons that took into account correlations between phenotypes or genotypes, and only nine studies included a replication sample or explicitly set out to replicate a prior finding. Finally, presentation of methodological information was not standardized and was often distributed across Methods sections and Supplementary Material, making it challenging to assemble basic information from many studies. Space limits imposed by journals could mean that highly complex statistical methods were described in only a superficial fashion. 
In summary, methods that have become standard in the genetics literature (stringent statistical standards, use of large samples, and replication of findings) are not always adopted when behavioral, cognitive, or neuroimaging phenotypes are used, leading to an increased risk of false-positive findings. Studies need to correct not just for the number of phenotypes collected but also for the number of genotypes examined, genetic models tested, and subsamples investigated. The field would benefit from more widespread use of methods that take into account correlations between the factors corrected for, such as spectral decomposition or permutation approaches. Replication should become standard practice; this, together with the need for larger sample sizes, will entail greater emphasis on collaboration between research groups. We conclude with some specific suggestions for standardized reporting in this area.

  7. Comparative inference of duplicated genes produced by polyploidization in soybean genome.

    PubMed

    Yang, Yanmei; Wang, Jinpeng; Di, Jianyong

    2013-01-01

    Soybean (Glycine max) is one of the most important crop plants for providing protein and oil. It is important to investigate the soybean genome for its economic and scientific value. Polyploidy is a widespread and recursive phenomenon during plant evolution, and it can generate massive numbers of duplicated genes, which are an important resource for genetic innovation. Improved sequence alignment criteria and statistical analysis are used to identify and characterize duplicated genes produced by polyploidization in soybean. Based on the collinearity method, duplicated genes by whole genome duplication account for 70.3% in soybean. From the statistical analysis of the molecular distances between duplicated genes, our study indicates that the whole genome duplication event occurred more than once in the genome evolution of soybean, which is often distributed near the ends of chromosomes.

  8. Pathway Analysis in Attention Deficit Hyperactivity Disorder: An Ensemble Approach

    PubMed Central

    Mooney, Michael A.; McWeeney, Shannon K.; Faraone, Stephen V.; Hinney, Anke; Hebebrand, Johannes; Nigg, Joel T.; Wilmot, Beth

    2016-01-01

    Despite a wealth of evidence for the role of genetics in attention deficit hyperactivity disorder (ADHD), specific and definitive genetic mechanisms have not been identified. Pathway analyses, a subset of gene-set analyses, extend the knowledge gained from genome-wide association studies (GWAS) by providing functional context for genetic associations. However, there are numerous methods for association testing of gene sets and no real consensus regarding the best approach. The present study applied six pathway analysis methods to identify pathways associated with ADHD in two GWAS datasets from the Psychiatric Genomics Consortium. Methods that utilize genotypes to model pathway-level effects identified more replicable pathway associations than methods using summary statistics. In addition, pathways implicated by more than one method were significantly more likely to replicate. A number of brain-relevant pathways, such as RhoA signaling, glycosaminoglycan biosynthesis, fibroblast growth factor receptor activity, and pathways containing potassium channel genes, were nominally significant by multiple methods in both datasets. These results support previous hypotheses about the role of regulation of neurotransmitter release, neurite outgrowth and axon guidance in contributing to the ADHD phenotype and suggest the value of cross-method convergence in evaluating pathway analysis results. PMID:27004716

  9. Introduction to bioinformatics.

    PubMed

    Can, Tolga

    2014-01-01

    Bioinformatics is an interdisciplinary field mainly involving molecular biology and genetics, computer science, mathematics, and statistics. Data-intensive, large-scale biological problems are addressed from a computational point of view. The most common problems are modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution usually involves the following steps: (1) collect statistics from biological data; (2) build a computational model; (3) solve a computational modeling problem; and (4) test and evaluate a computational algorithm. This chapter gives a brief introduction to bioinformatics by first providing an introduction to biological terminology and then discussing some classical bioinformatics problems organized by the types of data sources. Sequence analysis is the analysis of DNA and protein sequences for clues regarding function and includes subproblems such as identification of homologs, multiple sequence alignment, searching sequence patterns, and evolutionary analyses. Protein structures are three-dimensional data and the associated problems are structure prediction (secondary and tertiary), analysis of protein structures for clues regarding function, and structural alignment. Gene expression data are usually represented as matrices, and analysis of microarray data mostly involves statistical analysis, classification, and clustering approaches. Biological networks such as gene regulatory networks, metabolic pathways, and protein-protein interaction networks are usually modeled as graphs, and graph theoretic approaches are used to solve associated problems such as construction and analysis of large-scale networks.

  10. Statistical tests for detecting associations with groups of genetic variants: generalization, evaluation, and implementation

    PubMed Central

    Ferguson, John; Wheeler, William; Fu, YiPing; Prokunina-Olsson, Ludmila; Zhao, Hongyu; Sampson, Joshua

    2013-01-01

    With recent advances in sequencing, genotyping arrays, and imputation, GWAS now aim to identify associations with rare and uncommon genetic variants. Here, we describe and evaluate a class of statistics, generalized score statistics (GSS), that can test for an association between a group of genetic variants and a phenotype. GSS are a simple weighted sum of single-variant statistics and their cross-products. We show that the majority of statistics currently used to detect associations with rare variants are equivalent to choosing a specific set of weights within this framework. We then evaluate the power of various weighting schemes as a function of variant characteristics, such as MAF, the proportion associated with the phenotype, and the direction of effect. Ultimately, we find that two classical tests are robust and powerful, but details are provided as to when other GSS may perform favorably. The software package CRaVe is available at our website (http://dceg.cancer.gov/bb/tools/crave). PMID:23092956

  11. Novel canine circovirus strains from Thailand: Evidence for genetic recombination.

    PubMed

    Piewbang, Chutchai; Jo, Wendy K; Puff, Christina; van der Vries, Erhard; Kesdangsakonwut, Sawang; Rungsipipat, Anudep; Kruppa, Jochen; Jung, Klaus; Baumgärtner, Wolfgang; Techangamsuwan, Somporn; Ludlow, Martin; Osterhaus, Albert D M E

    2018-05-14

    Canine circoviruses (CanineCVs), belonging to the genus Circovirus of the Circoviridae family, were detected by next generation sequencing in samples from Thai dogs with respiratory symptoms. Genetic characterization and phylogenetic analysis of nearly complete CanineCV genomes suggested that natural recombination had occurred among different lineages of CanineCVs. Similarity plot and bootscanning analyses indicated that American and Chinese viruses had served as major and minor parental viruses, respectively. Positions of recombination breakpoints were estimated using maximum-likelihood frameworks with statistical significance testing. The putative recombination event was located in the Replicase gene, intersecting with open reading frame-3. Analysis of nucleotide changes confirmed the origin of the recombination event. This is the first description of naturally occurring recombinant CanineCVs that have resulted in the circulation of newly emerging CanineCV lineages.

  12. Statistical considerations for plot design, sampling procedures, analysis, and quality assurance of ozone injury studies

    Treesearch

    Michael Arbaugh; Larry Bednar

    1996-01-01

    The sampling methods used to monitor ozone injury to ponderosa and Jeffrey pines depend on the objectives of the study, geographic and genetic composition of the forest, and the source and composition of air pollutant emissions. By using a standardized sampling methodology, it may be possible to compare conditions within local areas more accurately, and to apply the...

  13. Genetic variability and resistance of cultivars of cowpea [Vigna unguiculata (L.) Walp] to cowpea weevil (Callosobruchus maculatus Fabr.).

    PubMed

    Vila Nova, M X; Leite, N G A; Houllou, L M; Medeiros, L V; Lira Neto, A C; Hsie, B S; Borges-Paluch, L R; Santos, B S; Araujo, C S F; Rocha, A A; Costa, A F

    2014-03-31

    The cowpea weevil (Callosobruchus maculatus Fabr.) is the most destructive pest of the cowpea bean; it reduces seed quality. To control this pest, resistance testing combined with genetic analysis using molecular markers has been widely applied in research. Among the markers that show reliable results, the inter-simple sequence repeats (ISSRs) (microsatellites) are noteworthy. This study was performed to evaluate the resistance of 27 cultivars of cowpea bean to cowpea weevil. We tested the resistance related to the genetic variability of these cultivars using ISSR markers. To analyze the resistance of cultivars to weevil, a completely randomized test design with 4 replicates and 27 treatments was adopted. Five pairs of the insect were placed in 30 grains per replicate. Analysis of variance showed that the number of eggs and emerged insects were significantly different in the treatments, and the means were compared by statistical tests. The analysis of the large genetic variability in all cultivars resulted in the formation of different groups. The test of resistance showed that the cultivar Inhuma was the most sensitive to both number of eggs and number of emerged adults, while the TE96-290-12-G and MNC99-537-F4 (BRS Tumucumaque) cultivars were the least sensitive to the number of eggs and the number of emerged insects, respectively.

  14. Populational genetic structure of free-living maned wolves (Chrysocyon brachyurus) determined by proteic markers.

    PubMed

    De Mattos, P S R; Del Lama, M A; Toppa, R H; Schwantes, A R

    2004-08-01

    Electrophoretic analysis of the products of twenty presumptive gene loci was conducted in hemolysate and plasma samples of twenty-eight maned wolves (Chrysocyon brachyurus) from an area in northeastern São Paulo State, Brazil. The area sampled was divided into three sub-areas, with the Mogi-Guaçu and Pardo rivers regarded as barriers to gene flow. The degree of polymorphism and the heterozygosity level (intralocus and average) estimated in this study were similar to those detected by other authors for maned wolves and other species of wild free-living canids. The samples of each sub-area and the total sample exhibited genotype frequencies consistent with the genetic equilibrium model. The values of the F-statistics indicated the absence of inbreeding and population subdivision and, consequently, low genetic distances were found among the samples of each area.
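
    The equilibrium check and the inbreeding coefficient mentioned above can be illustrated for a single biallelic locus. This is a minimal sketch, assuming a standard chi-square goodness-of-fit comparison against Hardy-Weinberg expectations and F_IS = 1 - H_obs / H_exp; the function name and the genotype counts are hypothetical and are not taken from the study.

```python
def hardy_weinberg_summary(n_aa, n_ab, n_bb):
    """Return (allele frequency p, chi-square vs. Hardy-Weinberg, F_IS) for one biallelic locus."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)       # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    h_obs = n_ab / n                      # observed heterozygosity
    h_exp = 2 * p * q                     # expected heterozygosity
    f_is = 1.0 - h_obs / h_exp if h_exp > 0 else 0.0
    return p, chi2, f_is

# Hypothetical counts for 28 individuals in exact Hardy-Weinberg proportions.
p, chi2, f_is = hardy_weinberg_summary(7, 14, 7)
```

    An F_IS near zero, as in this example, corresponds to no excess or deficit of heterozygotes; values approaching 1 indicate a heterozygote deficit such as that produced by inbreeding.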

  15. Trans-Ethnic Meta-Analysis Identifies Common and Rare Variants Associated with Hepatocyte Growth Factor Levels in the Multi-Ethnic Study of Atherosclerosis (MESA)

    PubMed Central

    Larson, Nicholas B.; Berardi, Cecilia; Decker, Paul A.; Wassel, Christina L.; Kirsch, Phillip S.; Pankow, James S.; Sale, Michele M.; de Andrade, Mariza; Sicotte, Hugues; Tang, Weihong; Hanson, Naomi Q.; Tsai, Michael Y.; Taylor, Kent D.; Bielinski, Suzette J.

    2015-01-01

    Hepatocyte growth factor (HGF) is a mesenchyme-derived pleiotropic factor that regulates cell growth, motility, mitogenesis, and morphogenesis in a variety of cells, and increased serum levels of HGF have been linked to a number of clinical and subclinical cardiovascular disease phenotypes. However, little is currently known regarding what genetic factors influence HGF levels, despite evidence of substantial genetic contributions to HGF variation. Based upon ethnicity-stratified single-variant association analysis and trans-ethnic meta-analysis of 6201 participants of the Multi-Ethnic Study of Atherosclerosis (MESA), we discovered five statistically significant common and low-frequency variants: HGF missense polymorphism rs5745687 (p.E299K) as well as four variants (rs16844364, rs4690098, rs114303452, rs3748034) within or in proximity to HGFAC. We also identified two significant ethnicity-specific gene-level associations (A1BG in African Americans; FASN in Chinese Americans) based upon low-frequency/rare variants, while meta-analysis of gene-level results identified a significant association for HGFAC. However, identified single-variant associations explained modest proportions of the total trait variation and were not significantly associated with coronary artery calcium or coronary heart disease. Our findings indicate genetic factors influencing circulating HGF levels may be complex and ethnically diverse. PMID:25998175

  16. Using a higher criticism statistic to detect modest effects in a genome-wide study of rheumatoid arthritis

    PubMed Central

    2009-01-01

    In high-dimensional studies such as genome-wide association studies, the correction for multiple testing in order to control total type I error results in decreased power to detect modest effects. We present a new analytical approach based on the higher criticism statistic that allows identification of the presence of modest effects. We apply our method to the genome-wide study of rheumatoid arthritis provided in the Genetic Analysis Workshop 16 Problem 1 data set. There is evidence for unknown bias in this study that could be explained by the presence of undetected modest effects. We compared the asymptotic and empirical thresholds for the higher criticism statistic. Using the asymptotic threshold, we detected the presence of modest effects genome-wide. We also detected modest effects using the 90th percentile of the empirical null distribution as a threshold; however, there was no such evidence when the 95th and 99th percentiles were used. While the higher criticism method suggests that there is some evidence for modest effects, interpreting individual single-nucleotide polymorphisms with significant higher criticism statistics is of undetermined value. The goal of higher criticism is to alert the researcher that genetic effects remain to be discovered and to promote the use of more targeted and powerful studies to detect the remaining effects. PMID:20018032
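
    For orientation, the higher criticism statistic can be computed from a list of P values as sketched below. The sketch assumes the commonly cited Donoho-Jin form, the maximum over the smallest fraction alpha0 of ordered P values of sqrt(n) (i/n - p_(i)) / sqrt(p_(i) (1 - p_(i))); the abstract does not state which variant the authors used, and the function name and the alpha0 default are hypothetical.

```python
import math

def higher_criticism(p_values, alpha0=0.5):
    """Higher criticism statistic over the smallest alpha0 fraction of the P values."""
    p_sorted = sorted(p_values)
    n = len(p_sorted)
    limit = max(1, int(alpha0 * n))
    best = float("-inf")
    for i, p in enumerate(p_sorted[:limit], start=1):
        if p <= 0.0 or p >= 1.0:
            continue  # the standardized deviation is undefined at the boundaries
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best

# Evenly spread P values (no signal) versus the same set with five very small P values.
null_ps = [i / 101 for i in range(1, 101)]
signal_ps = [1e-6] * 5 + null_ps[5:]
hc_null = higher_criticism(null_ps)
hc_signal = higher_criticism(signal_ps)
```

    An empirical threshold such as the percentiles described above would be obtained by recomputing the statistic on P values from permuted or simulated null data and taking the chosen percentile of those values.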

  17. Association between DAOA gene polymorphisms and the risk of schizophrenia, bipolar disorder and depressive disorder.

    PubMed

    Tan, Jinjing; Lin, Yu; Su, Li; Yan, Yan; Chen, Qing; Jiang, Haiyun; Wei, Qiugui; Gu, Lian

    2014-06-03

    Schizophrenia (SCZ), bipolar disorder (BD) and depressive disorder (DD) are common psychiatric disorders, which show common genetic vulnerability. Previous gene-disease association studies have reported correlations between d-amino acid oxidase activator (DAOA) gene polymorphisms and the three psychiatric disorders. However, the findings were contradictory. A meta-analysis was therefore conducted to provide more robust investigations into DAOA polymorphisms and the risk of SCZ, BD and DD. This meta-analysis included 46 studies published up to July 2013, comprising 17,515 cases and 25,189 controls. Odds ratios (ORs) with 95% confidence intervals (CIs) were used to evaluate the association between three specific DAOA SNPs and SCZ, BD and DD. Publication bias was tested by Begg's test and funnel plot, and heterogeneity was assessed by Cochran's chi-square-based Q statistic and the inconsistency index (I²). Moreover, the robustness of the findings was estimated by cumulative meta-analysis. DAOA genetic polymorphisms (M15, M18 and M23) were not found to confer a statistically significant increased risk of SCZ, BD or DD in the overall sample, or in Caucasians and Asians following subgroup analysis. The current study indicated that M15, M18 and M23 might not be risk factors for SCZ, BD or DD. However, further studies are required to provide robust evidence to estimate the association between DAOA polymorphisms and psychiatric disorders. Copyright © 2014 Elsevier Inc. All rights reserved.
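
    The two heterogeneity measures named above have simple closed forms. The following sketch computes Cochran's Q around the inverse-variance pooled log odds ratio and the inconsistency index I² = max(0, (Q - df) / Q) × 100; the function name and the inputs are hypothetical illustrations, not data from the meta-analysis.

```python
def cochran_q_i2(log_ors, std_errs):
    """Return (fixed-effect pooled log OR, Cochran's Q, I^2 in percent)."""
    weights = [1.0 / se ** 2 for se in std_errs]   # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2

# Two hypothetical studies that disagree strongly.
pooled, q, i2 = cochran_q_i2([0.0, 1.0], [0.1, 0.1])
```

    Under homogeneity, Q is referred to a chi-square distribution with k - 1 degrees of freedom for k studies, while I² expresses the share of total variation attributable to between-study differences rather than chance.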

  18. dartr: An r package to facilitate analysis of SNP data generated from reduced representation genome sequencing.

    PubMed

    Gruber, Bernd; Unmack, Peter J; Berry, Oliver F; Georges, Arthur

    2018-05-01

    Although vast technological advances have been made and genetic software packages are growing in number, it is not a trivial task to analyse SNP data. We announce a new r package, dartr, enabling the analysis of single nucleotide polymorphism data for population genomic and phylogenomic applications. dartr provides user-friendly functions for data quality control and marker selection, and permits rigorous evaluations of conformation to Hardy-Weinberg equilibrium, gametic-phase disequilibrium and neutrality. The package reports standard descriptive statistics, permits exploration of patterns in the data through principal components analysis and conducts standard F-statistics, as well as basic phylogenetic analyses, population assignment, isolation by distance and exports data to a variety of commonly used downstream applications (e.g., newhybrids, faststructure and phylogeny applications) outside of the r environment. The package serves two main purposes: first, a user-friendly approach to lower the hurdle to analyse such data; therefore, the package comes with a detailed tutorial targeted to the r beginner to allow data analysis without requiring deep knowledge of r. Second, we use a single, well-established format (genlight from the adegenet package) as input for all our functions to avoid data reformatting. By strictly using the genlight format, we hope to facilitate this format as the de facto standard of future software developments and hence reduce the format jungle of genetic data sets. The dartr package is available via the r CRAN network and GitHub. © 2017 John Wiley & Sons Ltd.

  19. Effect of genetic polymorphisms on development of gout.

    PubMed

    Urano, Wako; Taniguchi, Atsuo; Inoue, Eisuke; Sekita, Chieko; Ichikawa, Naomi; Koseki, Yumi; Kamatani, Naoyuki; Yamanaka, Hisashi

    2013-08-01

    This study aimed to validate the association between genetic polymorphisms and gout in Japanese patients, and to investigate the cumulative effects of multiple genetic factors on the development of gout. Subjects were 153 Japanese male patients with gout and 532 male controls. The genotypes of 11 polymorphisms in the 10 genes that have been indicated to be associated with serum uric acid levels or gout were determined. The cumulative effects of the genetic polymorphisms were investigated using a weighted genotype risk score (wGRS) based on the number of risk alleles and the OR for gout. A model to discriminate between patients with gout and controls was constructed by incorporating the wGRS and clinical factors. The C statistic was used to evaluate the capability of the model to discriminate gout patients from controls. Seven polymorphisms were shown to be associated with gout. The mean wGRS was significantly higher in patients with gout (15.2 ± 2.01) compared to controls (13.4 ± 2.10; p < 0.0001). The C statistic for the model using genetic information alone was 0.72, while the C statistic was 0.81 for the full model that incorporated all genetic and clinical factors. Accumulation of multiple genetic factors is associated with the development of gout. A prediction model for gout that incorporates genetic and clinical factors may be useful for identifying individuals who are at risk of gout.
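
    A weighted genotype risk score and the C statistic can be sketched in a few lines. The weighting below, risk-allele count times the natural log of the odds ratio, is a common convention and an assumption here, since the abstract does not give the exact weights or scaling; the C statistic is computed as the proportion of case-control pairs in which the case scores higher, with ties counted as one half. All names and numbers are hypothetical.

```python
import math

def weighted_grs(risk_allele_counts, odds_ratios):
    """Sum of risk-allele counts (0, 1 or 2 per variant) weighted by ln(OR)."""
    return sum(c * math.log(o) for c, o in zip(risk_allele_counts, odds_ratios))

def c_statistic(case_scores, control_scores):
    """Probability that a random case scores higher than a random control (ties = 0.5)."""
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))

# Hypothetical genotypes at three variants with hypothetical odds ratios.
odds_ratios = [2.0, 1.5, 1.2]
cases = [weighted_grs(g, odds_ratios) for g in ([2, 1, 1], [1, 2, 0])]
controls = [weighted_grs(g, odds_ratios) for g in ([0, 1, 0], [1, 0, 1])]
auc = c_statistic(cases, controls)
```

    A C statistic of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which gives a scale for reading the 0.72 and 0.81 reported above.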

  20. Gene variations in sex hormone pathways and the risk of testicular germ cell tumour: a case-parent triad study in a Norwegian-Swedish population.

    PubMed

    Kristiansen, W; Andreassen, K E; Karlsson, R; Aschim, E L; Bremnes, R M; Dahl, O; Fosså, S D; Klepp, O; Langberg, C W; Solberg, A; Tretli, S; Adami, H-O; Wiklund, F; Grotmol, T; Haugen, T B

    2012-05-01

    Testicular germ cell tumour (TGCT) is the most common cancer in young men, and an imbalance between the estrogen and androgen levels in utero is hypothesized to influence TGCT risk. Thus, polymorphisms in genes involved in the action of sex hormones may contribute to variability in an individual's susceptibility to TGCT. We conducted a Norwegian-Swedish case-parent study. A total of 105 single-nucleotide polymorphisms (SNPs) in 20 sex hormone pathway genes were genotyped using Sequenom MassArray iPLEX Gold, in 831 complete triads and 474 dyads. To increase the statistical power, the analysis was expanded to include 712 case singletons and 3922 Swedish controls, thus including triads, dyads and the case-control samples in a single test for association. Analysis for allelic associations was performed with the UNPHASED program, using a likelihood-based association test for nuclear families with missing data, and odds ratios (ORs) and 95% confidence intervals (CIs) were calculated. False discovery rate (FDR) was used to adjust for multiple testing. Five genetic variants across the ESR2 gene [encoding estrogen receptor beta (ERβ)] were statistically significantly associated with the risk of TGCT. In the case-parent analysis, the markers rs12434245 and rs10137185 were associated with a reduced risk of TGCT (OR = 0.66 and 0.72, respectively; both FDRs <5%), whereas rs2978381 and rs12435857 were associated with an increased risk of TGCT (OR = 1.21 and 1.19, respectively; both FDRs <5%). In the combined case-parent/case-control analysis, rs12435857 and rs10146204 were associated with an increased risk of TGCT (OR = 1.15 and 1.13, respectively; both FDRs <5%), whereas rs10137185 was associated with a reduced risk of TGCT (OR = 0.79, FDR <5%). In addition, we found that three genetic variants in CYP19A1 (encoding aromatase) were statistically significantly associated with the risk of TGCT in the case-parent analysis. The T alleles of the rs2414099, rs8025374 and rs3751592 SNPs were associated with an increased risk of TGCT (OR = 1.30, 1.30 and 1.21, respectively; all FDRs <5%). We found no statistically significant differences in allelic effect estimates between parental inherited genetic variation in the sex hormone pathways and TGCT risk in the offspring, and no evidence of heterogeneity between seminomas and non-seminomas, or between the Norwegian and the Swedish population, in any of the SNPs examined. Our findings provide support for ERβ and aromatase being implicated in the aetiology of TGCT. Exploring the functional role of the TGCT risk-associated SNPs will further elucidate the biological mechanisms involved.
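
    The abstract above reports FDR-adjusted results without naming the adjustment procedure. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure (an assumption, not something the record confirms), adjusted p-values can be computed as follows. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-SNP p-values, for illustration only.
snp_p = [0.001, 0.008, 0.039, 0.041, 0.60]
q_values = benjamini_hochberg(snp_p)
# SNPs whose adjusted p-value falls below a 5% FDR.
significant = [i for i, q in enumerate(q_values) if q < 0.05]
```

    With this procedure a SNP is called significant at "FDR <5%" when its adjusted p-value is below 0.05.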

  1. Association analysis of multiple traits by an approach of combining P values.

    PubMed

    Chen, Lili; Wang, Yong; Zhou, Yajing

    2018-03-01

    Increasing evidence shows that one variant can affect multiple traits, which is a widespread phenomenon in complex diseases. Joint analysis of multiple traits can increase the statistical power of association analysis and uncover the underlying genetic mechanism. Although there are many statistical methods to analyse multiple traits, most of these methods are suited to detecting common variants associated with multiple traits. Because of the low minor allele frequency of rare variants, these methods are not optimal for rare variant association analysis. In this paper, we extend an adaptive combination of P values method (termed ADA) for a single trait to test association between multiple traits and rare variants in a given region. For a given region, we use a reverse regression model to test each rare variant for association with multiple traits and obtain the P value of the single-variant test. Further, we take the weighted combination of these P values as the test statistic. Extensive simulation studies show that our approach is more powerful than several other comparison methods in most cases and is robust to the inclusion of a high proportion of neutral variants and to different directions of effects of causal variants.
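
    To make the idea of combining P values concrete, here is a minimal sketch of Fisher's classical method. It is a simpler, swapped-in technique, not the ADA extension described in the abstract, and it assumes the single-variant tests are independent. The function name is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    Returns the test statistic and the combined p-value.
    """
    k = len(p_values)
    # Under the null, -2 * sum(log p) follows a chi-square with 2k df.
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # For even degrees of freedom the chi-square survival function
    # has a closed form: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical single-variant p-values from one region.
stat, combined = fisher_combined_p([0.04, 0.20, 0.01])
```

    Methods such as ADA additionally weight or truncate the P values and typically assess significance by permutation, which this sketch does not attempt.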

  2. Impact and quantification of the sources of error in DNA pooling designs.

    PubMed

    Jawaid, A; Sham, P

    2009-01-01

    The analysis of genome wide variation offers the possibility of unravelling the genes involved in the pathogenesis of disease. Genome wide association studies are also particularly useful for identifying and validating targets for therapeutic intervention as well as for detecting markers for drug efficacy and side effects. The cost of such large-scale genetic association studies may be reduced substantially by the analysis of pooled DNA from multiple individuals. However, experimental errors inherent in pooling studies lead to a potential increase in the false positive rate and a loss in power compared to individual genotyping. Here we quantify various sources of experimental error using empirical data from typical pooling experiments and corresponding individual genotyping counts using two statistical methods. We provide analytical formulas for calculating these different errors in the absence of complete information, such as replicate pool formation, and for adjusting for the errors in the statistical analysis. We demonstrate that DNA pooling has the potential of estimating allele frequencies accurately, and adjusting the pooled allele frequency estimates for differential allelic amplification considerably improves accuracy. Estimates of the components of error show that differential allelic amplification is the most important contributor to the error variance in absolute allele frequency estimation, followed by allele frequency measurement and pool formation errors. Our results emphasise the importance of minimising experimental errors and obtaining correct error estimates in genetic association studies.

  3. Genetic Epidemiology of Glucose-6-Phosphate Dehydrogenase Deficiency in the Arab World.

    PubMed

    Doss, C George Priya; Alasmar, Dima R; Bux, Reem I; Sneha, P; Bakhsh, Fadheela Dad; Al-Azwani, Iman; Bekay, Rajaa El; Zayed, Hatem

    2016-11-17

    A systematic search was implemented using four literature databases (PubMed, Embase, Science Direct and Web of Science) to capture all the causative mutations of Glucose-6-phosphate dehydrogenase (G6PD) deficiency (G6PDD) in the 22 Arab countries. Our search yielded 43 studies that captured 33 mutations (23 missense, one silent, two deletions, and seven intronic mutations) in 3,430 Arab patients with G6PDD. The 23 missense mutations were then subjected to phenotypic classification using in silico prediction tools, which were compared to the WHO pathogenicity scale as a reference. These in silico tools were tested for their predictive efficiency using rigorous statistical analyses. Of the 23 missense mutations, p.S188F, p.I48T, p.N126D, and p.V68M were identified as the most common mutations among Arab populations, but were not unique to the Arab world. Interestingly, our search strategy found four other mutations (p.N135T, p.S179N, p.R246L, and p.Q307P) that are unique to Arabs. These mutations were subjected to structural analysis and molecular dynamics simulation analysis (MDSA), which predicted that these mutant forms potentially affect enzyme function. The combination of MDSA, structural analysis, in silico predictions, and the statistical tools we used will provide a platform for improving the accuracy of future pathogenicity predictions for genetic mutations.

  4. Identifying future research needs in landscape genetics: Where to from here?

    Treesearch

    Niko Balkenhol; Felix Gugerli; Sam A. Cushman; Lisette P. Waits; Aurelie Coulon; J. W. Arntzen; Rolf Holderegger; Helene H. Wagner

    2009-01-01

    Landscape genetics is an emerging interdisciplinary field that combines methods and concepts from population genetics, landscape ecology, and spatial statistics. The interest in landscape genetics is steadily increasing, and the field is evolving rapidly. We here outline four major challenges for future landscape genetic research that were identified during an...

  5. Genetic Background and Climatic Droplet Keratopathy Incidence in a Mapuche Population from Argentina

    PubMed Central

    Schurr, Theodore G.; Dulik, Matthew C.; Cafaro, Thamara A.; Suarez, María F.

    2013-01-01

    Purpose: To determine whether the incidence of and susceptibility to climatic droplet keratopathy (CDK), an acquired, often bilateral degenerative corneal disease, is influenced by the genetic background of the individuals who exhibit the disorder. Methods: To determine whether the disease expression was influenced by the genetic ancestry of CDK cases in native Mapuche of the northwest area of Patagonia in Argentina, we examined mitochondrial DNA and Y-chromosome variation in 53 unrelated individuals. Twenty-nine of them were part of the CDK (patient) population, while 24 were part of the control group. The analysis revealed the maternal and paternal lineages that were present in the two study groups. Results: This analysis demonstrated that nearly all persons had a Native American mtDNA background, whereas 50% of the CDK group and 37% of the control group had Native American paternal ancestry, respectively. There was no significant difference in the frequencies of mtDNA haplogroups between the CDK patient and control groups. Although the Y-chromosome data revealed differences in specific haplogroup frequencies between these two groups, there was no statistically significant relationship between individual paternal genetic backgrounds and the incidence or stage of disease. Conclusions: These results indicate a lack of correlation between genetic ancestry as represented by haploid genetic systems and the incidence of CDK in Mapuche populations. In addition, the mtDNA appears to play less of a role in CDK expression than for other complex diseases linked to bioenergetic processes. However, further analysis of the mtDNA genome sequence and other genes involved in corneal function may reveal the more precise role that mitochondria play in the expression of CDK. PMID:24040292

  6. Genetic background and climatic droplet keratopathy incidence in a Mapuche population from Argentina.

    PubMed

    Schurr, Theodore G; Dulik, Matthew C; Cafaro, Thamara A; Suarez, María F; Urrets-Zavalia, Julio A; Serra, Horacio M

    2013-01-01

    To determine whether the incidence of and susceptibility to climatic droplet keratopathy (CDK), an acquired, often bilateral degenerative corneal disease, is influenced by the genetic background of the individuals who exhibit the disorder. To determine whether the disease expression was influenced by the genetic ancestry of CDK cases in native Mapuche of the northwest area of Patagonia in Argentina, we examined mitochondrial DNA and Y-chromosome variation in 53 unrelated individuals. Twenty-nine of them were part of the CDK (patient) population, while 24 were part of the control group. The analysis revealed the maternal and paternal lineages that were present in the two study groups. This analysis demonstrated that nearly all persons had a Native American mtDNA background, whereas 50% of the CDK group and 37% of the control group had Native American paternal ancestry, respectively. There was no significant difference in the frequencies of mtDNA haplogroups between the CDK patient and control groups. Although the Y-chromosome data revealed differences in specific haplogroup frequencies between these two groups, there was no statistically significant relationship between individual paternal genetic backgrounds and the incidence or stage of disease. These results indicate a lack of correlation between genetic ancestry as represented by haploid genetic systems and the incidence of CDK in Mapuche populations. In addition, the mtDNA appears to play less of a role in CDK expression than for other complex diseases linked to bioenergetic processes. However, further analysis of the mtDNA genome sequence and other genes involved in corneal function may reveal the more precise role that mitochondria play in the expression of CDK.

  7. Population genetic structure of clinical and environmental isolates of Blastomyces dermatitidis, Based on 27 Polymorphic Microsatellite Markers

    USGS Publications Warehouse

    Meece, J.K.; Anderson, J.L.; Fisher, M.C.; Henk, D.A.; Sloss, Brian L.; Reed, K.D.

    2011-01-01

    Blastomyces dermatitidis, a thermally dimorphic fungus, is the etiologic agent of North American blastomycosis. Clinical presentation is varied, ranging from silent infections to fulminant respiratory disease and dissemination to skin and other sites. Exploration of the population genetic structure of B. dermatitidis would improve our knowledge regarding variation in virulence phenotypes, geographic distribution, and difference in host specificity. The objective of this study was to develop and test a panel of microsatellite markers to delineate the population genetic structure within a group of clinical and environmental isolates of B. dermatitidis. We developed 27 microsatellite markers and genotyped B. dermatitidis isolates from various hosts and environmental sources (n = 112). Assembly of a neighbor-joining tree of allele-sharing distance revealed two genetically distinct groups, separated by a deep node. Bayesian admixture analysis showed that two populations were statistically supported. Principal coordinate analysis also reinforced support for two genetic groups, with the primary axis explaining 61.41% of the genetic variability. Group 1 isolates average 1.8 alleles/locus, whereas group 2 isolates are highly polymorphic, averaging 8.2 alleles/locus. In this data set, alleles at three loci are unshared between the two groups and appear diagnostic. The mating type of individual isolates was determined by PCR. Both mating type-specific genes, the HMG and α-box domains, were represented in each of the genetic groups, with slightly more isolates having the HMG allele. One interpretation of this study is that the species currently designated B. dermatitidis includes a cryptic subspecies or perhaps a separate species. © 2011, American Society for Microbiology.

  8. Population genetic structure of clinical and environmental isolates of Blastomyces dermatitidis based on 27 polymorphic microsatellite markers

    USGS Publications Warehouse

    Meece, Jennifer K.; Anderson, Jennifer L.; Fisher, Matthew C.; Henk, Daniel A.; Sloss, Brian L.; Reed, Kurt D.

    2011-01-01

    Blastomyces dermatitidis, a thermally dimorphic fungus, is the etiologic agent of North American blastomycosis. Clinical presentation is varied, ranging from silent infections to fulminant respiratory disease and dissemination to skin and other sites. Exploration of the population genetic structure of B. dermatitidis would improve our knowledge regarding variation in virulence phenotypes, geographic distribution, and difference in host specificity. The objective of this study was to develop and test a panel of microsatellite markers to delineate the population genetic structure within a group of clinical and environmental isolates of B. dermatitidis. We developed 27 microsatellite markers and genotyped B. dermatitidis isolates from various hosts and environmental sources (n=112). Assembly of a neighbor-joining tree of allele-sharing distance revealed two genetically distinct groups, separated by a deep node. Bayesian admixture analysis showed that two populations were statistically supported. Principal coordinate analysis also reinforced support for two genetic groups, with the primary axis explaining 61.41% of the genetic variability. Group 1 isolates average 1.8 alleles/locus, whereas group 2 isolates are highly polymorphic, averaging 8.2 alleles/locus. In this data set, alleles at three loci are unshared between the two groups and appear diagnostic. The mating type of individual isolates was determined by PCR. Both mating type-specific genes, the HMG and α-box domains, were represented in each of the genetic groups, with slightly more isolates having the HMG allele. One interpretation of this study is that the species currently designated B. dermatitidis includes a cryptic subspecies or perhaps a separate species.

  9. Characterizing the genetic influences on risk aversion.

    PubMed

    Harrati, Amal

    2014-01-01

    Risk aversion has long been cited as an important factor in retirement decisions, investment behavior, and health. Some of the heterogeneity in individual risk tolerance is well understood, reflecting age gradients, wealth gradients, and similar effects, but much remains unexplained. This study explores genetic contributions to heterogeneity in risk aversion among older Americans. Using over 2 million genetic markers per individual from the U.S. Health and Retirement Study, I report results from a genome-wide association study (GWAS) on risk preferences using a sample of 10,455 adults. None of the single-nucleotide polymorphisms (SNPs) are found to be statistically significant determinants of risk preferences at levels stricter than 5 × 10^-8. These results suggest that risk aversion is a complex trait that is highly polygenic. The analysis leads to upper bounds on the number of genetic effects that could exceed certain thresholds of significance and still remain undetected at the current sample size. The findings suggest that the known heritability in risk aversion is likely to be driven by large numbers of genetic variants, each with a small effect size.
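
    The significance threshold quoted above matches the conventional genome-wide level, which is usually motivated as a Bonferroni correction of a 0.05 family-wise error rate over roughly one million independent tests. The short sketch below reproduces that arithmetic. The test counts are round-number assumptions and the function name is hypothetical; neither is taken from the study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumption: about one million independent common-variant tests,
# the usual rationale for the 5e-8 convention (not a figure from the study).
genome_wide = bonferroni_threshold(0.05, 1_000_000)

# Correcting for every genotyped marker (a round 2 million here,
# also an assumption) would be stricter still.
per_marker = bonferroni_threshold(0.05, 2_000_000)
```

    Because neighbouring markers are correlated, the number of effectively independent tests is smaller than the marker count, which is why the conventional level is generally used in place of a per-marker correction.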

  10. Genetics Home Reference: sick sinus syndrome

    MedlinePlus

    ... of a genetic condition? Genetic and Rare Diseases Information Center Frequency Sick sinus syndrome accounts for 1 in 600 patients with heart disease who are over age 65. The incidence of this condition increases with age. Related Information What information about a genetic condition can statistics ...

  11. Genetic dissimilarity of putative gamma-ray-induced 'Preciosa-AAAB-Pome type' banana (Musa sp) mutants based on multivariate statistical analysis.

    PubMed

    Pestana, R K N; Amorim, E P; Ferreira, C F; Amorim, V B O; Oliveira, L S; Ledo, C A S; Silva, S O

    2011-10-25

    Bananas are among the most important fruit crops worldwide, being cultivated in more than 120 countries, mainly by small-scale producers. However, short-stature high-yielding bananas presenting good agronomic characteristics are hard to find. Consequently, wind continues to damage a great number of plantations each year, leading to lodging of plants and bunch loss. Development of new cultivars through conventional genetic breeding methods is hindered by female sterility and the low number of seeds. Mutation induction seems to have great potential for the development of new cultivars. We evaluated genetic dissimilarity among putative 'Preciosa' banana mutants generated by gamma-ray irradiation, using morphoagronomic characteristics and ISSR markers. The genetic distances between the putative 'Preciosa' mutants varied from 0.21 to 0.66, with a cophenetic correlation coefficient of 0.8064. We found good variability after irradiation of 'Preciosa' bananas; this procedure could be useful for banana breeding programs aimed at developing short-stature varieties with good agronomic characteristics.

  12. Developing single nucleotide polymorphism (SNP) markers from transcriptome sequences for identification of longan (Dimocarpus longan) germplasm

    PubMed Central

    Wang, Boyi; Tan, Hua-Wei; Fang, Wanping; Meinhardt, Lyndel W; Mischke, Sue; Matsumoto, Tracie; Zhang, Dapeng

    2015-01-01

    Longan (Dimocarpus longan Lour.) is an important tropical fruit tree crop. Accurate varietal identification is essential for germplasm management and breeding. Using longan transcriptome sequences from public databases, we developed single nucleotide polymorphism (SNP) markers; validated 60 SNPs in 50 longan germplasm accessions, including cultivated varieties and wild germplasm; and designated 25 SNP markers that unambiguously identified all tested longan varieties with high statistical rigor (P<0.0001). Multiple trees from the same clone were verified and off-type trees were identified. Diversity analysis revealed genetic relationships among analyzed accessions. Cultivated varieties differed significantly from wild populations (Fst=0.300; P<0.001), demonstrating untapped genetic diversity for germplasm conservation and utilization. Within cultivated varieties, apparent differences between varieties from China and those from Thailand and Hawaii indicated geographic patterns of genetic differentiation. These SNP markers provide a powerful tool to manage longan genetic resources and breeding, with accurate and efficient genotype identification. PMID:26504559

  13. A Conceptual Framework for Pharmacodynamic Genome-wide Association Studies in Pharmacogenomics

    PubMed Central

    Wu, Rongling; Tong, Chunfa; Wang, Zhong; Mauger, David; Tantisira, Kelan; Szefler, Stanley J.; Chinchilli, Vernon M.; Israel, Elliot

    2013-01-01

    Summary: Genome-wide association studies (GWAS) have emerged as a powerful tool to identify loci that affect drug response or susceptibility to adverse drug reactions. However, current GWAS, based on a simple analysis of associations between genotype and phenotype, ignore the biochemical reactions of drug response, thus limiting the scope of inference about its genetic architecture. To facilitate the inference of GWAS in pharmacogenomics, we sought to undertake the mathematical integration of the pharmacodynamic process of drug reactions through computational models. By estimating and testing the genetic control of pharmacodynamic and pharmacokinetic parameters, this mechanistic approach not only enhances the biological and clinical relevance of significant genetic associations, but also improves the statistical power and robustness of gene detection. This report discusses the general principle and development of pharmacodynamics-based GWAS, highlights the practical use of this approach in addressing various pharmacogenomic problems, and suggests that this approach will be an important method to study the genetic architecture of drug responses or reactions. PMID:21920452

  14. Inference and Analysis of Population Structure Using Genetic Data and Network Theory

    PubMed Central

    Greenbaum, Gili; Templeton, Alan R.; Bar-David, Shirli

    2016-01-01

    Clustering individuals to subpopulations based on genetic data has become commonplace in many genetic studies. Inference about population structure is most often done by applying model-based approaches, aided by visualization using distance-based approaches such as multidimensional scaling. While existing distance-based approaches suffer from a lack of statistical rigor, model-based approaches entail assumptions of prior conditions such as that the subpopulations are at Hardy-Weinberg equilibria. Here we present a distance-based approach for inference about population structure using genetic data by defining population structure using network theory terminology and methods. A network is constructed from a pairwise genetic-similarity matrix of all sampled individuals. The community partition, a partition of a network to dense subgraphs, is equated with population structure, a partition of the population to genetically related groups. Community-detection algorithms are used to partition the network into communities, interpreted as a partition of the population to subpopulations. The statistical significance of the structure can be estimated by using permutation tests to evaluate the significance of the partition’s modularity, a network theory measure indicating the quality of community partitions. To further characterize population structure, a new measure of the strength of association (SA) for an individual to its assigned community is presented. The strength of association distribution (SAD) of the communities is analyzed to provide additional population structure characteristics, such as the relative amount of gene flow experienced by the different subpopulations and identification of hybrid individuals. Human genetic data and simulations are used to demonstrate the applicability of the analyses. The approach presented here provides a novel, computationally efficient model-free method for inference about population structure that does not entail assumption of prior conditions. The method is implemented in the software NetStruct (available at https://giligreenbaum.wordpress.com/software/). PMID:26888080
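
    The modularity-based permutation test mentioned above can be sketched in a few lines. This is an illustration under stated assumptions, not the NetStruct implementation: it uses Newman's standard modularity for an undirected weighted network, and it builds the null distribution by shuffling community labels, which may differ from the permutation scheme the authors use. The function names and the toy network are hypothetical.

```python
import random


def modularity(weights, labels):
    """Newman modularity Q of a partition of an undirected weighted network.

    weights: symmetric matrix (list of lists) with a zero diagonal.
    labels:  community label for each node.
    """
    n = len(weights)
    strength = [sum(row) for row in weights]  # weighted degree of each node
    two_m = sum(strength)                     # twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += weights[i][j] - strength[i] * strength[j] / two_m
    return q / two_m


def permutation_p_value(weights, labels, n_perm=999, seed=0):
    """Share of label shuffles whose modularity is at least the observed value."""
    rng = random.Random(seed)
    observed = modularity(weights, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if modularity(weights, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy network: two triangles joined by a single edge (nodes 2 and 3).
toy = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
partition = [0, 0, 0, 1, 1, 1]
q_observed = modularity(toy, partition)
p_value = permutation_p_value(toy, partition)
```

    In a genetic application the weight matrix would hold pairwise genetic similarities between individuals in place of the 0/1 entries of this toy example.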

  15. Genetic structure of duckweed population of Spirodela, Landoltia and Lemna from Lake Tai, China.

    PubMed

    Tang, Jie; Zhang, Fei; Cui, Weihua; Ma, Jiong

    2014-06-01

    Duckweed is widely used in environmental biotechnology and has recently emerged as a potential feedstock for biofuels due to its high growth rate and starch content. The genetic diversity and composition of a natural duckweed population in the genera Spirodela, Landoltia and Lemna from Lake Tai, China, were investigated using probabilistic analysis of multilocus sequence typing (MLST). The 78 strains were categorized into five lineages, among which strains representing L. aequinoctialis and S. polyrhiza were predominant. Among the five lineages, interlineage transfers of markers were infrequent and no recombination was statistically detected. Tajima's D tests determined that all loci are subject to population bottlenecks, which is likely one of the main reasons for the low genetic diversity observed within the lineages. Interestingly, strains of L. turionifera are found to contain a small admixture from L. minor, providing rare evidence of transfer of genetic material in duckweed. This was discussed with respect to the hypothesis that a cross of these two gave rise to L. japonica. Moreover, the conventional maximum-likelihood phylogenetic analysis clearly recognized all the species in the three genera with high bootstrap support. In conclusion, this work offers a basic framework for using MLST to characterize Spirodela, Landoltia and in particular Lemna strains at the species level, and to study the population genetics and evolutionary history of natural duckweed populations.

  16. A rapid generalized least squares model for a genome-wide quantitative trait association analysis in families.

    PubMed

    Li, Xiang; Basu, Saonli; Miller, Michael B; Iacono, William G; McGue, Matt

    2011-01-01

    Genome-wide association studies (GWAS) using family data involve association analyses between hundreds of thousands of markers and a trait for a large number of related individuals. The correlations among relatives bring statistical and computational challenges when performing these large-scale association analyses. Recently, several rapid methods accounting for both within- and between-family variation have been proposed. However, these techniques mostly model the phenotypic similarities in terms of genetic relatedness. The familial resemblances in many family-based studies such as twin studies are not only due to the genetic relatedness, but also derive from shared environmental effects and assortative mating. In this paper, we propose 2 generalized least squares (GLS) models for rapid association analysis of family-based GWAS, which accommodate both genetic and environmental contributions to familial resemblance. In our first model, we estimated the joint genetic and environmental variations. In our second model, we estimated the genetic and environmental components separately. Through simulation studies, we demonstrated that our proposed approaches are more powerful and computationally efficient than a number of existing methods are. We show that estimating the residual variance-covariance matrix in the GLS models without SNP effects does not lead to an appreciable bias in the p values as long as the SNP effect is small (i.e. accounting for no more than 1% of trait variance). Copyright © 2011 S. Karger AG, Basel.
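
    For orientation, the textbook generalized least squares estimator that such models build on can be written as below. The symbols are generic assumptions and not the authors' notation: y is the trait vector, X holds the SNP genotype and covariates, and V is the covariance matrix of the residuals among relatives. The last line shows one illustrative way to split V into genetic (kinship matrix K), shared-environment (indicator matrix C) and individual components; the abstract does not state the decomposition the authors actually use.

```latex
\begin{aligned}
y &= X\beta + \varepsilon, \qquad \operatorname{Var}(\varepsilon) = V, \\
\hat{\beta}_{\mathrm{GLS}} &= \left(X^{\top} V^{-1} X\right)^{-1} X^{\top} V^{-1} y, \\
\operatorname{Var}\!\left(\hat{\beta}_{\mathrm{GLS}}\right) &= \left(X^{\top} V^{-1} X\right)^{-1}, \\
V &= \sigma_g^{2} K + \sigma_c^{2} C + \sigma_e^{2} I
\quad \text{(illustrative decomposition, an assumption)}.
\end{aligned}
```

    Estimating V once under the model without SNP effects, as the abstract describes, means the same V can be reused for every marker, which is where the speed gain comes from.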

  17. The genetic assimilation in language borrowing inferred from Jing People.

    PubMed

    Huang, Xiufeng; Zhou, Qinghui; Bin, Xiaoyun; Lai, Shu; Lin, Chaowen; Hu, Rong; Xiao, Jiashun; Luo, Dajun; Li, Yingxiang; Wei, Lan-Hai; Yeh, Hui-Yuan; Chen, Gang; Wang, Chuan-Chao

    2018-02-28

    The Jing people are a recognized ethnic group in Guangxi, southwest China, who immigrated from Vietnam during the 16th century. They speak Vietnamese, but with many borrowings from Cantonese, Zhuang, and Mandarin. However, it is unclear whether there was large-scale gene flow from surrounding populations into the Jing people during their language change, because genetic information on this population is very limited. We collected blood samples from 37 Jing and 3 Han Chinese individuals from Wanwei, Shanxin, and Wutou islands in Guangxi and genotyped about 600,000 genome-wide single nucleotide polymorphisms (SNPs). We used Principal Component Analysis (PCA), ADMIXTURE analysis, f statistics, qpWave and qpAdm to infer the population genetic structure and admixture. Our data revealed that the Jing people are genetically similar to the populations in southwest China and mainland Southeast Asia. Compared with Vietnamese, however, they show significant evidence of gene flow from surrounding East Asians. The admixture proportion is estimated to be around 35-42% in different Jing groups using southern Han Chinese as a proxy. The majority of the paternal lineages of Jing people are most likely from surrounding East Asians. We conclude that the formation and language change of present-day Jing people have involved genetic assimilation of surrounding East Asian populations. The language borrowing, in this case, is not only a cultural phenomenon but has involved demic diffusion. © 2018 Wiley Periodicals, Inc.

  18. spads 1.0: a toolbox to perform spatial analyses on DNA sequence data sets.

    PubMed

    Dellicour, Simon; Mardulyn, Patrick

    2014-05-01

    SPADS 1.0 (for 'Spatial and Population Analysis of DNA Sequences') is a population genetic toolbox for characterizing genetic variability within and among populations from DNA sequences. In view of the drastic increase in genetic information available through sequencing methods, spads was specifically designed to deal with multilocus data sets of DNA sequences. It computes several summary statistics from populations or groups of populations, performs input file conversions for other population genetic programs and implements locus-by-locus and multilocus versions of two clustering algorithms to study the genetic structure of populations. The toolbox also includes two MATLAB and R functions, GDISPAL and GDIVPAL, to display differentiation and diversity patterns across landscapes. These functions aim to generate interpolating surfaces based on multilocus distance and diversity indices. In the case of multiple loci, such surfaces can represent a useful alternative to the multiple pie-chart maps traditionally used in phylogeography to represent the spatial distribution of genetic diversity. These coloured surfaces can also be used to compare different data sets or different diversity and/or distance measures estimated on the same data set. © 2013 John Wiley & Sons Ltd.

  19. Genetic variance of tolerance and the toxicant threshold model.

    PubMed

    Tanaka, Yoshinari; Mano, Hiroyuki; Tatsuta, Haruki

    2012-04-01

    A statistical genetics method is presented for estimating the genetic variance (heritability) of tolerance to pollutants on the basis of a standard acute toxicity test conducted on several isofemale lines of cladoceran species. To analyze the genetic variance of tolerance in the case when the response is measured as a few discrete states (quantal endpoints), the authors attempted to apply the threshold character model in quantitative genetics to the threshold model separately developed in ecotoxicology. The integrated threshold model (toxicant threshold model) assumes that the response of a particular individual occurs at a threshold toxicant concentration and that the individual tolerance characterized by the individual's threshold value is determined by genetic and environmental factors. As a case study, the heritability of tolerance to p-nonylphenol in the cladoceran species Daphnia galeata was estimated by using the maximum likelihood method and nested analysis of variance (ANOVA). Broad-sense heritability was estimated to be 0.199 ± 0.112 by the maximum likelihood method and 0.184 ± 0.089 by ANOVA; both results implied that the species examined had the potential to acquire tolerance to this substance by evolutionary change. Copyright © 2012 SETAC.
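
    As a simplified illustration of the ANOVA route to heritability mentioned above, the sketch below estimates broad-sense heritability from a balanced one-way ANOVA across clonal lines. It assumes a continuous trait and equal replication per line, so it does not reproduce the threshold model for quantal endpoints or the nested design used in the study. The function name and data are hypothetical.

```python
def broad_sense_heritability(lines):
    """Estimate H^2 from a balanced one-way ANOVA across clonal lines.

    lines: list of lists, one inner list of trait values per line,
           all of equal length n.
    """
    k = len(lines)
    n = len(lines[0])
    grand_mean = sum(sum(group) for group in lines) / (k * n)
    line_means = [sum(group) / n for group in lines]
    # Mean squares among lines and within lines.
    ms_among = n * sum((m - grand_mean) ** 2 for m in line_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for group, m in zip(lines, line_means) for x in group
    ) / (k * (n - 1))
    # Variance components; the among-line component is truncated at zero.
    var_among = max((ms_among - ms_within) / n, 0.0)
    return var_among / (var_among + ms_within)


# Hypothetical tolerance measurements for three lines, two replicates each.
h2 = broad_sense_heritability([[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]])
```

    For clonally reproducing lines the among-line variance component estimates the total genetic variance, which is why the ratio is read as broad-sense heritability.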

  20. Experience with multiple control groups in a large population-based case-control study on genetic and environmental risk factors.

    PubMed

    Pomp, E R; Van Stralen, K J; Le Cessie, S; Vandenbroucke, J P; Rosendaal, F R; Doggen, C J M

    2010-07-01

    We discuss the analytic and practical considerations in a large case-control study that had two control groups; the first control group consisting of partners of patients and the second obtained by random digit dialling (RDD). As an example of the evaluation of a general lifestyle factor, we present body mass index (BMI). Both control groups had lower BMIs than the patients. The distribution in the partner controls was closer to that of the patients, likely due to similar lifestyles. A statistical approach was used to pool the results of both analyses, wherein partners were analyzed with a matched analysis, while RDDs were analyzed without matching. Even with a matched analysis, the odds ratio with partner controls remained closer to unity than with RDD controls, which is probably due to unmeasured confounders in the comparison with the random controls as well as intermediary factors. However, when studying injuries as a risk factor, the odds ratio remained higher with partner control subjects than with RDD control subjects, even after taking the matching into account. Finally we used factor V Leiden as an example of a genetic risk factor. The frequencies of factor V Leiden were identical in both control groups, indicating that for the analyses of this genetic risk factor the two control groups could be combined in a single unmatched analysis. In conclusion, the effect measures with the two control groups were in the same direction, and of the same order of magnitude. Moreover, it was not always the same control group that produced the higher or lower estimates, and a matched analysis did not remedy the differences. Our experience with the intricacies of dealing with two control groups may be useful to others when thinking about an optimal research design or the best statistical approach.
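
    The difference between the matched and unmatched analyses described above can be illustrated with the two standard odds ratio estimators. This is a minimal sketch with hypothetical counts, not the pooling approach the authors used: the unmatched estimate uses the full 2x2 table with a Woolf (log-based) confidence interval, while the 1:1 matched estimate uses only the discordant pairs.

```python
import math


def unmatched_odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95% CI for a 2x2 table.

    a: exposed cases,    b: unexposed cases,
    c: exposed controls, d: unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def matched_pair_odds_ratio(case_only_exposed, control_only_exposed):
    """Conditional odds ratio for 1:1 matched pairs: ratio of discordant pairs."""
    return case_only_exposed / control_only_exposed


# Hypothetical counts, for illustration only.
or_rdd, low, high = unmatched_odds_ratio(20, 80, 10, 90)
or_partner = matched_pair_odds_ratio(30, 15)
```

    Pairs in which case and partner share the same exposure status contribute nothing to the matched estimate, which is one reason partner controls with similar lifestyles tend to pull the odds ratio towards unity.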

  1. Genetics Home Reference: polycystic lipomembranous osteodysplasia with sclerosing leukoencephalopathy

    MedlinePlus

    ... feelings of intense happiness (euphoria), a loss of inhibition, and poor concentration. These neurologic changes cause significant ... Information What information about a genetic condition can statistics provide? Why are some genetic conditions more common ...

  2. Genetics Home Reference: complement component 8 deficiency

    MedlinePlus

    ... in people with Hispanic, Japanese, or African Caribbean heritage, whereas type II primarily occurs in people of Northern European descent. Related Information What information about a genetic condition can statistics provide? Why are some genetic ...

  3. Identifying Genotype-by-Environment Interactions in the Metabolism of Germinating Arabidopsis Seeds Using Generalized Genetical Genomics

    PubMed Central

    Joosen, Ronny Viktor Louis; Arends, Danny; Li, Yang; Willems, Leo A.J.; Keurentjes, Joost J.B.; Ligterink, Wilco; Jansen, Ritsert C.; Hilhorst, Henk W.M.

    2013-01-01

    A complex phenotype such as seed germination is the result of several genetic and environmental cues and requires the concerted action of many genes. The use of well-structured recombinant inbred lines in combination with “omics” analysis can help to disentangle the genetic basis of such quantitative traits. This so-called genetical genomics approach can effectively capture both genetic and epistatic interactions. However, to understand how the environment interacts with genomic-encoded information, a better understanding of the perception and processing of environmental signals is needed. In a classical genetical genomics setup, this requires replication of the whole experiment in different environmental conditions. A novel generalized setup overcomes this limitation and includes environmental perturbation within a single experimental design. We developed a dedicated quantitative trait loci mapping procedure to implement this approach and used existing phenotypical data to demonstrate its power. In addition, we studied the genetic regulation of primary metabolism in dry and imbibed Arabidopsis (Arabidopsis thaliana) seeds. In the metabolome, many changes were observed that were under both environmental and genetic controls and their interaction. This concept offers unique reduction of experimental load with minimal compromise of statistical power and is of great potential in the field of systems genetics, which requires a broad understanding of both plasticity and dynamic regulation. PMID:23606598

  4. Collaborative ring trial of the papaya endogenous reference gene and its polymerase chain reaction assays for genetically modified organism analysis.

    PubMed

    Wei, Jiaojun; Li, Feiwu; Guo, Jinchao; Li, Xiang; Xu, Junfeng; Wu, Gang; Zhang, Dabing; Yang, Litao

    2013-11-27

    The papaya (Carica papaya L.) Chymopapain (CHY) gene has been reported as a suitable endogenous reference gene for genetically modified (GM) papaya detection in previous studies. Herein, we further validated the use of the CHY gene and its qualitative and quantitative polymerase chain reaction (PCR) assays through an interlaboratory collaborative ring trial. A total of 12 laboratories working on detection of genetically modified organisms participated in the ring trial and returned test results. Statistical analysis of the returned results confirmed the species specificity, low heterogeneity, and single-copy number of the CHY gene among different papaya varieties. The limit of detection of the CHY qualitative PCR assay was 0.1%, while the limit of quantification of the quantitative PCR assay was ∼25 copies of haploid papaya genome with acceptable PCR efficiency and linearity. The differences between the tested and true values of papaya content in 10 blind samples ranged from 0.84 to 6.58%. These results indicated that the CHY gene was suitable as an endogenous reference gene for the identification and quantification of GM papaya.

  5. Sub-sampling genetic data to estimate black bear population size: A case study

    USGS Publications Warehouse

    Tredick, C.A.; Vaughan, M.R.; Stauffer, D.F.; Simek, S.L.; Eason, T.

    2007-01-01

    Costs for genetic analysis of hair samples collected for individual identification of bears average approximately US$50 [2004] per sample. This can easily exceed budgetary allowances for large-scale studies or studies of high-density bear populations. We used 2 genetic datasets from 2 areas in the southeastern United States to explore how reducing costs of analysis by sub-sampling affected precision and accuracy of resulting population estimates. We used several sub-sampling scenarios to create subsets of the full datasets and compared summary statistics, population estimates, and precision of estimates generated from these subsets to estimates generated from the complete datasets. Our results suggested that bias and precision of estimates improved as the proportion of total samples used increased, and heterogeneity models (e.g., Mh[CHAO]) were more robust to reduced sample sizes than other models (e.g., behavior models). We recommend that only high-quality samples (>5 hair follicles) be used when budgets are constrained, and efforts should be made to maximize capture and recapture rates in the field.

  6. Prediction/discussion-based learning cycle versus conceptual change text: comparative effects on students' understanding of genetics

    NASA Astrophysics Data System (ADS)

    khawaldeh, Salem A. Al

    2013-07-01

    Background and purpose: The purpose of this study was to investigate the comparative effects of a prediction/discussion-based learning cycle (HPD-LC), conceptual change text (CCT) and traditional instruction on 10th grade students' understanding of genetics concepts. Sample: Participants were 112 10th basic grade male students in three classes of the same school located in an urban area. The three classes taught by the same biology teacher were randomly assigned as a prediction/discussion-based learning cycle class (n = 39), conceptual change text class (n = 37) and traditional class (n = 36). Design and method: A quasi-experimental research design of pre-test-post-test non-equivalent control group was adopted. Participants completed the Genetics Concept Test as pre-test-post-test, to examine the effects of instructional strategies on their genetics understanding. Pre-test scores and Test of Logical Thinking scores were used as covariates. Results: The analysis of covariance showed a statistically significant difference between the experimental and control groups in the favor of experimental groups after treatment. However, no statistically significant difference between the experimental groups (HPD-LC versus CCT instruction) was found. Conclusions: Overall, the findings of this study support the use of the prediction/discussion-based learning cycle and conceptual change text in both research and teaching. The findings may be useful for improving classroom practices in teaching science concepts and for the development of suitable materials promoting students' understanding of science.

  7. Filtering genetic variants and placing informative priors based on putative biological function.

    PubMed

    Friedrichs, Stefanie; Malzahn, Dörthe; Pugh, Elizabeth W; Almeida, Marcio; Liu, Xiao Qing; Bailey, Julia N

    2016-02-03

    High-density genetic marker data, especially sequence data, imply an immense multiple testing burden. This can be ameliorated by filtering genetic variants, exploiting or accounting for correlations between variants, jointly testing variants, and by incorporating informative priors. Priors can be based on biological knowledge or predicted variant function, or even be used to integrate gene expression or other omics data. Based on Genetic Analysis Workshop (GAW) 19 data, this article discusses diversity and usefulness of functional variant scores provided, for example, by PolyPhen2, SIFT, or RegulomeDB annotations. Incorporating functional scores into variant filters or weights and adjusting the significance level for correlations between variants yielded significant associations with blood pressure traits in a large family study of Mexican Americans (GAW19 data set). Marker rs218966 in gene PHF14 and rs9836027 in MAP4 significantly associated with hypertension; additionally, rare variants in SNUPN significantly associated with systolic blood pressure. Variant weights strongly influenced the power of kernel methods and burden tests. Apart from variant weights in test statistics, prior weights may also be used when combining test statistics or to informatively weight p values while controlling false discovery rate (FDR). Indeed, power improved when gene expression data for FDR-controlled informative weighting of association test p values of genes was used. Finally, approaches exploiting variant correlations included identity-by-descent mapping and the optimal strategy for joint testing rare and common variants, which was observed to depend on linkage disequilibrium structure.
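
    The idea of informatively weighting p values while controlling FDR can be illustrated with a weighted Benjamini-Hochberg procedure. The sketch below is an assumption about one common way to do this, not necessarily the procedure used by the GAW19 contributors; the function name `weighted_bh` and the example p values and weights are hypothetical.

```python
def weighted_bh(pvalues, weights, alpha=0.05):
    """Weighted Benjamini-Hochberg sketch (illustrative, not the GAW19 code).

    Each p value is divided by its prior weight after the weights are
    rescaled to average 1, then the usual step-up rule is applied.
    Returns the sorted indices of the rejected hypotheses.
    """
    m = len(pvalues)
    if len(weights) != m or any(w <= 0 for w in weights):
        raise ValueError("need one positive weight per p value")
    mean_w = sum(weights) / m
    adjusted = [min(1.0, p / (w / mean_w)) for p, w in zip(pvalues, weights)]
    order = sorted(range(m), key=lambda i: adjusted[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: keep the largest rank that passes its threshold.
        if adjusted[i] <= alpha * rank / m:
            k_max = rank
    return sorted(order[:k_max])


# Hypothetical p values; the second variant gets a larger prior weight,
# for example because of a favourable functional annotation score.
pvals = [0.01, 0.04, 0.04, 0.6]
uniform = weighted_bh(pvals, [1, 1, 1, 1])
informed = weighted_bh(pvals, [1, 2, 0.5, 0.5])
```

    With uniform weights only the first variant is rejected; with the informed weights the second variant is rejected as well, which is the sense in which a well-chosen prior can improve power.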

  8. Risk assessment model for development of advanced age-related macular degeneration.

    PubMed

    Klein, Michael L; Francis, Peter J; Ferris, Frederick L; Hamon, Sara C; Clemons, Traci E

    2011-12-01

    To design a risk assessment model for development of advanced age-related macular degeneration (AMD) incorporating phenotypic, demographic, environmental, and genetic risk factors. We evaluated longitudinal data from 2846 participants in the Age-Related Eye Disease Study. At baseline, these individuals had all levels of AMD, ranging from none to unilateral advanced AMD (neovascular or geographic atrophy). Follow-up averaged 9.3 years. We performed a Cox proportional hazards analysis with demographic, environmental, phenotypic, and genetic covariates and constructed a risk assessment model for development of advanced AMD. Performance of the model was evaluated using the C statistic and the Brier score and externally validated in participants in the Complications of Age-Related Macular Degeneration Prevention Trial. The final model included the following independent variables: age, smoking history, family history of AMD (first-degree member), phenotype based on a modified Age-Related Eye Disease Study simple scale score, and genetic variants CFH Y402H and ARMS2 A69S. The model did well on performance measures, with very good discrimination (C statistic = 0.872) and excellent calibration and overall performance (Brier score at 5 years = 0.08). Successful external validation was performed, and a risk assessment tool was designed for use with or without the genetic component. We constructed a risk assessment model for development of advanced AMD. The model performed well on measures of discrimination, calibration, and overall performance and was successfully externally validated. This risk assessment tool is available for online use.
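
    The two performance measures named above can be sketched for a binary outcome at a fixed horizon. This is a simplified illustration that ignores censoring (the study itself used a Cox model on follow-up data); the function names and the five example predictions are hypothetical.

```python
from itertools import product


def c_statistic(risks, events):
    """Share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [r for r, e in zip(risks, events) if e == 1]
    neg = [r for r, e in zip(risks, events) if e == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p, n in product(pos, neg):
        if p > n:
            score += 1.0
        elif p == n:
            score += 0.5
    return score / (len(pos) * len(neg))


def brier_score(risks, events):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((r - e) ** 2 for r, e in zip(risks, events)) / len(risks)


# Hypothetical predicted risks and observed outcomes (1 = progressed).
risks = [0.9, 0.8, 0.3, 0.2, 0.1]
events = [1, 1, 0, 1, 0]
c_value = c_statistic(risks, events)
brier = brier_score(risks, events)
```

    A C statistic near 1 indicates good discrimination, while a Brier score near 0 indicates that predicted risks are close to observed outcomes.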

  9. Study design and statistical analysis of data in human population studies with the micronucleus assay.

    PubMed

    Ceppi, Marcello; Gallo, Fabio; Bonassi, Stefano

    2011-01-01

    The most common study design performed in population studies based on the micronucleus (MN) assay is the cross-sectional study, which is largely performed to evaluate the DNA damaging effects of exposure to genotoxic agents in the workplace, in the environment, as well as from diet or lifestyle factors. Sample size is still a critical issue in the design of MN studies since the most recent studies considering gene-environment interaction often require a sample size of several hundred subjects, which is in many cases difficult to achieve. The control of confounding is another major threat to the validity of causal inference. The most popular confounders considered in population studies using MN are age, gender and smoking habit. Extensive attention is given to the assessment of effect modification, given the increasing inclusion of biomarkers of genetic susceptibility in the study design. Selected issues concerning the statistical treatment of data have been addressed in this mini-review, starting from data description, which is a critical step of statistical analysis, since it allows possible errors in the dataset to be detected and the validity of assumptions required for more complex analyses to be checked. Basic issues dealing with statistical analysis of biomarkers are extensively evaluated, including methods to explore the dose-response relationship between two continuous variables and inferential analysis. A critical approach to the use of parametric and non-parametric methods is presented, before addressing the issue of the most suitable multivariate models to fit MN data. In the last decade, the quality of statistical analysis of MN data has certainly evolved, although even nowadays only a small number of studies apply the Poisson model, which is the most suitable method for the analysis of MN data.

  10. Genetic Testing and Post-Testing Decision Making among BRCA-Positive Mutation Women: A Psychosocial Approach.

    PubMed

    Hesse-Biber, Sharlene; An, Chen

    2016-10-01

    Through an analysis of an online survey of women who tested positive for the BRCA genetic mutation for breast cancer, this research uses a social constructionist and feminist standpoint lens to understand the decision-making process that leads BRCA-positive women to choose genetic testing. Additionally, this research examines how they socially construct and understand their risk for developing breast cancer, as well as which treatment options they undergo post-testing. BRCA-positive women re-frame their statistical medical risk for developing cancer and their post-testing treatment choices through a broad psychosocial context of engagement that also includes their social networks. Important psychosocial factors drive women's medical decisions, such as individual feelings of guilt and vulnerability, and the degree of perceived social support. Women who felt guilty and fearful that they might pass the BRCA gene to their children were more likely to undergo risk reducing surgery. Women with at least one daughter and women without children were more inclined toward the risk reducing surgery compared to those with only sons. These psychosocial factors and social network engagements serve as a "nexus of decision making" that does not, for the most part, mirror the medical assessments of statistical odds for hereditary cancer development, nor the specific treatment protocols outlined by the medical establishment.

  11. A Quantitative Comparison of the Similarity between Genes and Geography in Worldwide Human Populations

    PubMed Central

    Wang, Chaolong; Zöllner, Sebastian; Rosenberg, Noah A.

    2012-01-01

    Multivariate statistical techniques such as principal components analysis (PCA) and multidimensional scaling (MDS) have been widely used to summarize the structure of human genetic variation, often in easily visualized two-dimensional maps. Many recent studies have reported similarity between geographic maps of population locations and MDS or PCA maps of genetic variation inferred from single-nucleotide polymorphisms (SNPs). However, this similarity has been evident primarily in a qualitative sense; and, because different multivariate techniques and marker sets have been used in different studies, it has not been possible to formally compare genetic variation datasets in terms of their levels of similarity with geography. In this study, using genome-wide SNP data from 128 populations worldwide, we perform a systematic analysis to quantitatively evaluate the similarity of genes and geography in different geographic regions. For each of a series of regions, we apply a Procrustes analysis approach to find an optimal transformation that maximizes the similarity between PCA maps of genetic variation and geographic maps of population locations. We consider examples in Europe, Sub-Saharan Africa, Asia, East Asia, and Central/South Asia, as well as in a worldwide sample, finding that significant similarity between genes and geography exists in general at different geographic levels. The similarity is highest in our examples for Asia and, once highly distinctive populations have been removed, Sub-Saharan Africa. Our results provide a quantitative assessment of the geographic structure of human genetic variation worldwide, supporting the view that geography plays a strong role in giving rise to human population structure. PMID:22927824
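
    The Procrustes step described above can be sketched for two-dimensional maps without any external library. The code is an illustrative assumption, not the authors' implementation: both point sets are centred and scaled to unit size, the best rotation (with or without a reflection) is found in closed form, and the returned score lies between 0 and 1, with 1 meaning the maps match exactly after transformation. All names and coordinates are hypothetical.

```python
import math


def _center_and_scale(points):
    """Translate to the centroid and scale to unit sum of squares."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    norm = math.sqrt(sum(x * x + y * y for x, y in centered))
    if norm == 0:
        raise ValueError("all points coincide")
    return [(x / norm, y / norm) for x, y in centered]


def procrustes_similarity(target, other):
    """Best achievable correlation-like score after rotation/reflection."""
    X = _center_and_scale(target)
    best = 0.0
    for flip in (1.0, -1.0):  # try without and with a reflection
        Y = [(x, flip * y) for x, y in _center_and_scale(other)]
        a = sum(x1 * y1 + x2 * y2 for (x1, x2), (y1, y2) in zip(X, Y))
        b = sum(x2 * y1 - x1 * y2 for (x1, x2), (y1, y2) in zip(X, Y))
        # The optimal rotation angle is atan2(b, a); the score is hypot(a, b).
        best = max(best, math.hypot(a, b))
    return best


# Hypothetical "geographic" coordinates and a rotated, scaled, shifted copy
# standing in for a PCA map of the same populations.
geo = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (0.0, 2.0), (3.0, 1.0)]
angle = math.radians(30)
pca = [(2.5 * (x * math.cos(angle) - y * math.sin(angle)) + 4.0,
        2.5 * (x * math.sin(angle) + y * math.cos(angle)) - 1.0)
       for x, y in geo]
similarity = procrustes_similarity(geo, pca)
```

    Because the second map is an exact similarity transformation of the first, the score here is 1 up to rounding; real genetic and geographic maps would give a lower value, whose significance could be assessed by permuting population labels.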

  12. Genetic and environmental-genetic interaction rules for the myopia based on a family exposed to risk from a myopic environment.

    PubMed

    Wenbo, Li; Congxia, Bai; Hui, Liu

    2017-08-30

    To quantitatively assess the role of heredity and environmental factors in myopia, based on families with sufficient exposure to risk from a myopic environment, in order to establish an environmental and genetic index (EGI). A pedigree analysis unit was defined as one child (university student), father, and mother. Information pertaining to visual acuity, experience in participating in the college entrance examination in mainland China (regarded as a strong environmental risk for myopia), and occupation was obtained for the pedigree analysis units. The difference between the effect of both genetic and environmental factors (myopia prevalence in children with two myopic parents) and the effect of environmental factors alone (myopia prevalence in children of whom neither parent was myopic) was defined as the EGI. Multiple regression analysis was performed for 114 pedigrees using diopters of father, mother, average diopters in parents, and maximum and minimum diopters in father and mother as variables. A total of 353 farmers and 162 farmer families were used as a control group. A distinct difference in myopia rate (96.2% versus 57.7%) was observed between children of parents with myopia and children of parents without myopia (EGI=0.385). The maximum diopter was included in the regression equation, which was statistically significant. The prevalence of myopia was 9.9% in the farmers. The prevalence in children was similar between the farmer families and other families. A new genetic rule that myopia in children is directly related to the maximum diopters in father and mother may be suggested. Environmental factors may play a leading role in the formation of myopia. Copyright © 2017 Elsevier B.V. All rights reserved.
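
    The EGI reported above is a simple difference of two prevalences, which can be checked directly. The function name is hypothetical; the two input values are the myopia rates quoted in the abstract.

```python
def environmental_genetic_index(prev_both_parents_myopic, prev_neither_parent_myopic):
    """EGI as defined in the abstract: difference between the two prevalences."""
    return prev_both_parents_myopic - prev_neither_parent_myopic


egi = environmental_genetic_index(0.962, 0.577)
print(round(egi, 3))  # 0.385
```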

  13. A quantitative comparison of the similarity between genes and geography in worldwide human populations.

    PubMed

    Wang, Chaolong; Zöllner, Sebastian; Rosenberg, Noah A

    2012-08-01

    Multivariate statistical techniques such as principal components analysis (PCA) and multidimensional scaling (MDS) have been widely used to summarize the structure of human genetic variation, often in easily visualized two-dimensional maps. Many recent studies have reported similarity between geographic maps of population locations and MDS or PCA maps of genetic variation inferred from single-nucleotide polymorphisms (SNPs). However, this similarity has been evident primarily in a qualitative sense; and, because different multivariate techniques and marker sets have been used in different studies, it has not been possible to formally compare genetic variation datasets in terms of their levels of similarity with geography. In this study, using genome-wide SNP data from 128 populations worldwide, we perform a systematic analysis to quantitatively evaluate the similarity of genes and geography in different geographic regions. For each of a series of regions, we apply a Procrustes analysis approach to find an optimal transformation that maximizes the similarity between PCA maps of genetic variation and geographic maps of population locations. We consider examples in Europe, Sub-Saharan Africa, Asia, East Asia, and Central/South Asia, as well as in a worldwide sample, finding that significant similarity between genes and geography exists in general at different geographic levels. The similarity is highest in our examples for Asia and, once highly distinctive populations have been removed, Sub-Saharan Africa. Our results provide a quantitative assessment of the geographic structure of human genetic variation worldwide, supporting the view that geography plays a strong role in giving rise to human population structure.

  14. Vitamin D receptor gene Alw I, Fok I, Apa I, and Taq I polymorphisms in patients with urinary stone.

    PubMed

    Seo, Ill Young; Kang, In-Hong; Chae, Soo-Cheon; Park, Seung Chol; Lee, Young-Jin; Yang, Yun Sik; Ryu, Soo Bang; Rim, Joung Sik

    2010-04-01

    To evaluate vitamin D receptor (VDR) gene polymorphisms in Korean patients so as to identify the candidate genes associated with urinary stones. Urinary stones are a multifactorial disease that includes various genetic factors. A normal control group of 535 healthy subjects and 278 patients with urinary stones was evaluated. Of 125 patients who presented stone samples, 102 had calcium stones on chemical analysis. The VDR gene Alw I, Fok I, Apa I, and Taq I polymorphisms were evaluated using the polymerase chain reaction-restriction fragment length polymorphism analysis. Allelic and genotypic frequencies were calculated to identify associations in both groups. The haplotype frequencies of the VDR gene polymorphisms for multiple loci were also determined. For the VDR gene Alw I, Fok I, Apa I, and Taq I polymorphisms, there was no statistically significant difference between the patients with urinary stones and the healthy controls. There was also no statistically significant difference between the patients with calcium stones and the healthy controls. A novel haplotype (Ht 4; CTTT) was identified in 13.5% of the patients with urinary stones and in 8.3% of the controls (P = .001). The haplotype frequencies were significantly different between the patients with calcium stones and the controls (P = .004). The VDR gene Alw I, Fok I, Apa I, and Taq I polymorphisms do not seem to be candidate genetic markers for urinary stones in Korean patients. However, 1 novel haplotype of the VDR gene polymorphisms for multiple loci might be a candidate genetic marker. Copyright 2010 Elsevier Inc. All rights reserved.

  15. Illustrating, Quantifying, and Correcting for Bias in Post-hoc Analysis of Gene-Based Rare Variant Tests of Association

    PubMed Central

    Grinde, Kelsey E.; Arbet, Jaron; Green, Alden; O'Connell, Michael; Valcarcel, Alessandra; Westra, Jason; Tintle, Nathan

    2017-01-01

    To date, gene-based rare variant testing approaches have focused on aggregating information across sets of variants to maximize statistical power in identifying genes showing significant association with diseases. Beyond identifying genes that are associated with diseases, the identification of causal variant(s) in those genes and estimation of their effect is crucial for planning replication studies and characterizing the genetic architecture of the locus. However, we illustrate that straightforward single-marker association statistics can suffer from substantial bias introduced by conditioning on gene-based test significance, due to the phenomenon often referred to as “winner's curse.” We illustrate the ramifications of this bias on variant effect size estimation and variant prioritization/ranking approaches, outline parameters of genetic architecture that affect this bias, and propose a bootstrap resampling method to correct for this bias. We find that our correction method significantly reduces the bias due to winner's curse (average two-fold decrease in bias, p < 2.2 × 10^-6) and, consequently, substantially improves mean squared error and variant prioritization/ranking. The method is particularly helpful in adjustment for winner's curse effects when the initial gene-based test has low power and for relatively more common, non-causal variants. Adjustment for winner's curse is recommended for all post-hoc estimation and ranking of variants after a gene-based test. Further work is necessary to continue seeking ways to reduce bias and improve inference in post-hoc analysis of gene-based tests under a wide variety of genetic architectures. PMID:28959274
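
    The selection bias behind winner's curse can be illustrated with a small simulation. This is a deliberately simplified sketch, not the authors' setting or their bootstrap correction: it conditions on a single variant's own one-sided test rather than on a gene-based test, and the true effect, standard error, and threshold are hypothetical values.

```python
import random
import statistics


def winners_curse_demo(true_effect=0.2, se=0.1, n_studies=20000,
                       z_crit=1.96, seed=0):
    """Compare the mean estimate over all studies with the mean over
    only those studies that reached significance."""
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    # Keep only estimates whose z score exceeds the threshold.
    selected = [b for b in estimates if b / se > z_crit]
    return statistics.fmean(estimates), statistics.fmean(selected)


mean_all, mean_selected = winners_curse_demo()
```

    The unconditional mean stays close to the true effect, whereas the mean among "significant" results is inflated, which is the bias that post-hoc estimates inherit when they are reported only after a significant test.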

  16. Stochastic sampling effects in STR typing: Implications for analysis and interpretation.

    PubMed

    Timken, Mark D; Klein, Sonja B; Buoncristiani, Martin R

    2014-07-01

    The analysis and interpretation of forensic STR typing results can become more complicated when reduced template amounts are used for PCR amplification due to increased stochastic effects. These effects are typically observed as reduced heterozygous peak-height balance and increased frequency of undetected alleles (allelic "dropout"). To investigate the origins of these effects, a study was performed using the AmpFlSTR® Identifiler Plus® and MiniFiler® kits to amplify replicates from a dilution series of NIST Human DNA Quantitation Standard (SRM® 2372A). The resulting amplicons were resolved and detected on two different genetic analyzer platforms, the Applied Biosystems 3130xL and 3500 analyzers. Results from our study show that the four different STR/genetic analyzer combinations exhibited very similar peak-height ratio statistics when normalized for the amount of template DNA in the PCR. Peak-height ratio statistics were successfully modeled using the Poisson distribution to simulate pre-PCR stochastic sampling of the alleles, confirming earlier explanations that sampling is the primary source for peak-height imbalance in reduced template dilutions. In addition, template-based pre-PCR sampling simulations also successfully predicted allelic dropout frequencies, as modeled by logistic regression methods, for the low-template DNA dilutions. We discuss the possibility that an accurately quantified DNA template might be used to characterize the linear signal response for data collected using different STR kits or genetic analyzer platforms, so as to provide a standardized approach for comparing results obtained from different STR/CE combinations and to aid in validation studies. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
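
    Pre-PCR stochastic sampling of a heterozygous locus can be sketched with a Poisson model. The following is an illustrative assumption, not the authors' simulation: each allele's sampled copy number is drawn independently from a Poisson distribution with the same mean, "dropout" means that at least one allele was sampled zero times (detection thresholds are ignored), and the copy-number ratio stands in for the peak-height ratio. All function names are hypothetical.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson variate (Knuth's method; fine for small means)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_heterozygote(mean_copies_per_allele, n_reps, seed=0):
    """Return (dropout frequency, mean min/max copy ratio when both present)."""
    rng = random.Random(seed)
    dropouts, ratios = 0, []
    for _ in range(n_reps):
        a = poisson_sample(mean_copies_per_allele, rng)
        b = poisson_sample(mean_copies_per_allele, rng)
        if a == 0 or b == 0:
            dropouts += 1  # at least one allele was never sampled
        else:
            ratios.append(min(a, b) / max(a, b))
    mean_ratio = sum(ratios) / len(ratios) if ratios else float("nan")
    return dropouts / n_reps, mean_ratio


def expected_dropout(mean_copies_per_allele):
    """Analytic probability that at least one of two alleles has zero copies."""
    p_zero = math.exp(-mean_copies_per_allele)
    return 1.0 - (1.0 - p_zero) ** 2


dropout_rate, balance = simulate_heterozygote(2.0, 20000, seed=1)
```

    Lowering the mean copy number raises the dropout frequency and lowers the average ratio, which mirrors the qualitative behaviour the abstract attributes to reduced template amounts.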

  17. Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data.

    PubMed

    Excoffier, L; Smouse, P E; Quattro, J M

    1992-06-01

    We present here a framework for the study of molecular variation within a single species. Information on DNA haplotype divergence is incorporated into an analysis of variance format, derived from a matrix of squared-distances among all pairs of haplotypes. This analysis of molecular variance (AMOVA) produces estimates of variance components and F-statistic analogs, designated here as phi-statistics, reflecting the correlation of haplotypic diversity at different levels of hierarchical subdivision. The method is flexible enough to accommodate several alternative input matrices, corresponding to different types of molecular data, as well as different types of evolutionary assumptions, without modifying the basic structure of the analysis. The significance of the variance components and phi-statistics is tested using a permutational approach, eliminating the normality assumption that is conventional for analysis of variance but inappropriate for molecular data. Application of AMOVA to human mitochondrial DNA haplotype data shows that population subdivisions are better resolved when some measure of molecular differences among haplotypes is introduced into the analysis. At the intraspecific level, however, the additional information provided by knowing the exact phylogenetic relations among haplotypes or by a nonlinear translation of restriction-site change into nucleotide diversity does not significantly modify the inferred population genetic structure. Monte Carlo studies show that site sampling does not fundamentally affect the significance of the molecular variance components. The AMOVA treatment is easily extended in several different directions and it constitutes a coherent and flexible framework for the statistical analysis of molecular data.

  18. Studies on the nature and management of psoriasis.

    PubMed

    Farber, E M

    1971-06-01

    Prevalence of psoriasis in Caucasians is estimated as 2 to 3 percent. Sound epidemiologic studies on a worldwide basis are needed to secure accurate prevalence rates for comparative purposes. Utilizing Stanford's psoriasis life histories records, the genetics of psoriasis has been explored by various means: statistical census data, pedigree analysis, and twin studies. This research suggests a multifactorial pattern of inheritance for psoriasis, implying that both genetic and environmental components are responsible for the manifestation of the disease. At present it is not possible to point to any single causative factor. Some of the suggested areas for research include study of uninvolved skin, growth control in the psoriatic lesion, viral causes, immunological aspects, and lipid metabolism.

  19. [Comparative analysis of STR and SNP polymorphism in the populations of sockeye salmon (Oncorhynchus nerka) from Eastern and Western Kamchatka].

    PubMed

    Khrustaleva, A M; Volkov, A A; Stoklitskaia, D S; Miuge, N S; Zelenina, D A

    2010-11-01

    Sockeye salmon samples from five largest lacustrine-riverine systems of Kamchatka Peninsula were tested for polymorphism at six microsatellite (STR) and five single nucleotide polymorphism (SNP) loci. Statistically significant genetic differentiation among local populations from this part of the species range examined was demonstrated. The data presented point to pronounced genetic divergence of the populations from two geographical regions, Eastern and Western Kamchatka. For sockeye salmon, the individual identification test accuracy was higher for microsatellites compared to similar number of SNP markers. Pooling of the STR and SNP allele frequency data sets provided the highest accuracy of the individual fish population assignment.

  20. Male-specific contributions to the Brazilian population of Espirito Santo.

    PubMed

    de F Figueiredo, Raquel; Ambrosio, Isabela B; Braganholi, Danilo F; Chemale, Gustavo; Martins, Joyce A; Gomes, Veronica; Gusmão, Leonor; Cicarelli, Regina M B

    2016-05-01

    Y chromosome markers have been widely studied due to their various applications in the fields of forensic and evolutionary genetics. In this study, 35 Y-SNPs and 17 Y-STRs were genotyped in 253 males from the State of Espirito Santo, Brazil. A total of 18 haplogroups and 243 haplotypes were detected; the haplogroup and haplotype diversities were 0.7794 and 0.9997, respectively. Genetic distance analysis using the Y-STR data showed no statistically significant differences between Espirito Santo and other admixed populations from Brazil. The classification of paternal lineages based on haplogroups showed a predominant European contribution (85.88%), followed by African (11.37%) and Amerindian (2.75%) contributions.
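
    Diversity values such as those reported above are commonly computed with Nei's unbiased estimator, h = n/(n-1) × (1 − Σ p_i²), where p_i is the frequency of the i-th haplotype among n sampled chromosomes. The abstract does not state which estimator was used or how the haplotypes were distributed, so both the formula choice and the counts below are assumptions: the example uses one hypothetical distribution (233 haplotypes seen once and 10 seen twice) that is consistent with 243 haplotypes in 253 males.

```python
def haplotype_diversity(counts):
    """Nei's unbiased diversity from a list of per-haplotype counts."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two sampled chromosomes")
    sum_sq_freq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq_freq)


# Hypothetical distribution: 233 singletons plus 10 haplotypes shared by
# two males each, giving 243 haplotypes among 253 males.
counts = [1] * 233 + [2] * 10
h = haplotype_diversity(counts)
print(round(h, 4))  # 0.9997
```

    Under this assumed distribution the estimator happens to reproduce the reported haplotype diversity after rounding; other distributions with the same totals would give somewhat different values.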

  1. Integrative Analysis of Genetic, Genomic, and Phenotypic Data for Ethanol Behaviors: A Network-Based Pipeline for Identifying Mechanisms and Potential Drug Targets.

    PubMed

    Bogenpohl, James W; Mignogna, Kristin M; Smith, Maren L; Miles, Michael F

    2017-01-01

    Complex behavioral traits, such as alcohol abuse, are caused by an interplay of genetic and environmental factors, producing deleterious functional adaptations in the central nervous system. The long-term behavioral consequences of such changes are of substantial cost to both the individual and society. Substantial progress has been made in the last two decades in understanding elements of brain mechanisms underlying responses to ethanol in animal models and risk factors for alcohol use disorder (AUD) in humans. However, treatments for AUD remain largely ineffective and few medications for this disease state have been licensed. Genome-wide genetic polymorphism analysis (GWAS) in humans, behavioral genetic studies in animal models and brain gene expression studies produced by microarrays or RNA-seq have the potential to produce nonbiased and novel insight into the underlying neurobiology of AUD. However, the complexity of such information, both statistical and informational, has slowed progress toward identifying new targets for intervention in AUD. This chapter describes one approach for integrating behavioral, genetic, and genomic information across animal model and human studies. The goal of this approach is to identify networks of genes functioning in the brain that are most relevant to the underlying mechanisms of a complex disease such as AUD. We illustrate an example of how genomic studies in animal models can be used to produce robust gene networks that have functional implications, and to integrate such animal model genomic data with human genetic studies such as GWAS for AUD. We describe several useful analysis tools for such studies: ComBAT, WGCNA, and EW_dmGWAS. The end result of this analysis is a ranking of gene networks and identification of their cognate hub genes, which might provide eventual targets for future therapeutic development. 
Furthermore, this combined approach may also improve our understanding of basic mechanisms underlying gene x environmental interactions affecting brain functioning in health and disease.
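
The network step named in this abstract (WGCNA) can be illustrated with a small sketch. This is not the WGCNA R package API; it is a NumPy approximation of the soft-thresholding idea, in which the adjacency between two genes is the absolute correlation raised to a power beta, and hub genes are those with the highest summed adjacency. The function names, the simulated expression matrix, and beta = 6 are assumptions made for illustration only.

```python
import numpy as np

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency |cor|^beta for a samples x genes matrix (zero diagonal)."""
    cor = np.corrcoef(expr, rowvar=False)      # genes x genes correlation
    adj = np.abs(cor) ** beta                  # soft threshold: weak links shrink toward 0
    np.fill_diagonal(adj, 0.0)                 # ignore self-connections
    return adj

def rank_hub_genes(adj, gene_names):
    """Sort genes by connectivity (row sum of adjacency), highest first."""
    connectivity = adj.sum(axis=1)
    order = np.argsort(-connectivity)
    return [(gene_names[i], float(connectivity[i])) for i in order]

# Hypothetical data: genes g0..g3 follow one shared driver, g4 and g5 are noise.
rng = np.random.default_rng(0)
n_samples = 40
driver = rng.normal(size=n_samples)
expr = np.column_stack(
    [driver + 0.3 * rng.normal(size=n_samples) for _ in range(4)]
    + [rng.normal(size=n_samples) for _ in range(2)]
)
genes = [f"g{i}" for i in range(6)]

adj = soft_threshold_adjacency(expr, beta=6)
ranking = rank_hub_genes(adj, genes)
for name, k in ranking:
    print(f"{name}\tconnectivity={k:.3f}")
```

In a real analysis the power would be chosen from the data, and modules would be detected before hubs are ranked within each module; the sketch skips both steps.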

  2. INTEGRATIVE ANALYSIS OF GENETIC, GENOMIC AND PHENOTYPIC DATA FOR ETHANOL BEHAVIORS: A NETWORK-BASED PIPELINE FOR IDENTIFYING MECHANISMS AND POTENTIAL DRUG TARGETS

    PubMed Central

    Bogenpohl, James W.; Mignogna, Kristin M.; Smith, Maren L.; Miles, Michael F.

    2016-01-01

    Complex behavioral traits, such as alcohol abuse, are caused by an interplay of genetic and environmental factors, producing deleterious functional adaptations in the central nervous system. The long-term behavioral consequences of such changes are of substantial cost to both the individual and society. Substantial progress has been made in the last two decades in understanding elements of brain mechanisms underlying responses to ethanol in animal models and risk factors for alcohol use disorder (AUD) in humans. However, treatments for AUD remain largely ineffective and few medications for this disease state have been licensed. Genome-wide genetic polymorphism analysis (GWAS) in humans, behavioral genetic studies in animal models and brain gene expression studies produced by microarrays or RNA-seq have the potential to produce non-biased and novel insight into the underlying neurobiology of AUD. However, the complexity of such information, both statistical and informational, has slowed progress toward identifying new targets for intervention in AUD. This chapter describes one approach for integrating behavioral, genetic, and genomic information across animal model and human studies. The goal of this approach is to identify networks of genes functioning in the brain that are most relevant to the underlying mechanisms of a complex disease such as AUD. We illustrate an example of how genomic studies in animal models can be used to produce robust gene networks that have functional implications, and to integrate such animal model genomic data with human genetic studies such as GWAS for AUD. We describe several useful analysis tools for such studies: ComBAT, WGCNA and EW_dmGWAS. The end result of this analysis is a ranking of gene networks and identification of their cognate hub genes, which might provide eventual targets for future therapeutic development. 
Furthermore, this combined approach may also improve our understanding of basic mechanisms underlying gene x environmental interactions affecting brain functioning in health and disease. PMID:27933543

  3. Correlates of genetic monogamy in socially monogamous mammals: insights from Azara's owl monkeys

    PubMed Central

    Huck, Maren; Fernandez-Duque, Eduardo; Babb, Paul; Schurr, Theodore

    2014-01-01

    Understanding the evolution of mating systems, a central topic in evolutionary biology for more than 50 years, requires examining the genetic consequences of mating and the relationships between social systems and mating systems. Among pair-living mammals, where genetic monogamy is extremely rare, the extent of extra-group paternity rates has been associated with male participation in infant care, strength of the pair bond and length of the breeding season. This study evaluated the relationship between two of those factors and the genetic mating system of socially monogamous mammals, testing predictions that male care and strength of pair bond would be negatively correlated with rates of extra-pair paternity (EPP). Autosomal microsatellite analyses provide evidence for genetic monogamy in a pair-living primate with bi-parental care, the Azara's owl monkey (Aotus azarae). A phylogenetically corrected generalized least square analysis was used to relate male care and strength of the pair bond to their genetic mating system (i.e. proportions of EPP) in 15 socially monogamous mammalian species. The intensity of male care was correlated with EPP rates in mammals, while strength of pair bond failed to reach statistical significance. Our analyses show that, once social monogamy has evolved, paternal care, and potentially also close bonds, may facilitate the evolution of genetic monogamy. PMID:24648230

  4. Correlates of genetic monogamy in socially monogamous mammals: insights from Azara's owl monkeys.

    PubMed

    Huck, Maren; Fernandez-Duque, Eduardo; Babb, Paul; Schurr, Theodore

    2014-05-07

    Understanding the evolution of mating systems, a central topic in evolutionary biology for more than 50 years, requires examining the genetic consequences of mating and the relationships between social systems and mating systems. Among pair-living mammals, where genetic monogamy is extremely rare, the extent of extra-group paternity rates has been associated with male participation in infant care, strength of the pair bond and length of the breeding season. This study evaluated the relationship between two of those factors and the genetic mating system of socially monogamous mammals, testing predictions that male care and strength of pair bond would be negatively correlated with rates of extra-pair paternity (EPP). Autosomal microsatellite analyses provide evidence for genetic monogamy in a pair-living primate with bi-parental care, the Azara's owl monkey (Aotus azarae). A phylogenetically corrected generalized least square analysis was used to relate male care and strength of the pair bond to their genetic mating system (i.e. proportions of EPP) in 15 socially monogamous mammalian species. The intensity of male care was correlated with EPP rates in mammals, while strength of pair bond failed to reach statistical significance. Our analyses show that, once social monogamy has evolved, paternal care, and potentially also close bonds, may facilitate the evolution of genetic monogamy.

  5. 78 FR 9060 - Request for Nominations for Voting Members on Public Advisory Panels or Committees

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-02-07

    ... diagnostic assays, e.g., hepatologists; molecular biologists. Molecular and Clinical 2 June 1, 2013. Genetics.... Individuals with training in inborn errors of metabolism, biochemical and/or molecular genetics, population genetics, epidemiology and related statistical training, and clinical molecular genetics testing (e.g...

  6. 76 FR 80949 - Request for Nominations for Voting Members on Public Advisory Panels or Committees

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-12-27

    .... Molecular and Clinical 1 June 1, 2012. Genetics Devices Panel of the Medical Devices Advisory Committee..., biochemical and/or molecular genetics, population genetics, epidemiology and related statistical training, and clinical molecular genetics testing (e.g., genotyping, array CGH, etc.). Individuals with experience in...

  7. Social stratification in the Sikh population of Punjab (India) has a genetic basis: evidence from serological and biochemical markers.

    PubMed

    Chahal, Sukh Mohinder Singh; Virk, Rupinder Kaur; Kaur, Sukhvir; Bansal, Rupinder

    2011-01-01

    The present study was planned to assess whether social stratification in the Sikh population inhabiting the northwest border Indian state of Punjab has any genetic basis. Blood samples were collected randomly from a total of 2851 unrelated subjects belonging to 21 groups of two low-ranking Sikh scheduled caste populations, viz. Mazhabi and Ramdasi, and a high-ranking Jat Sikh caste population of Punjab. The genetic profile of Sikh groups was investigated using a total of nine serobiochemical genetic markers, comprising two blood groups (ABO, RH(D)) and a battery of seven red cell enzyme polymorphisms (ADA, AK1, ESD, PGM1, GLO1, ACP1, GPI), following standard serological and biochemical laboratory protocols. Genetic structure was studied using original allele frequency data and statistical measures of heterozygosity, genic differentiation, genetic distance, and genetic admixture. Great heterogeneity was observed between Sikh scheduled caste and Jat Sikh populations, especially in the RH(D) blood group system, and distribution of ESD, ACP1, and PGM1 enzyme markers was also found to be significantly different between many of their groups. Genetic distance trees demonstrated little or no genetic affinities between Sikh scheduled caste and Jat Sikh populations; the Mazhabi and Ramdasi also showed little genetic relationship. Genetic admixture analysis suggested a higher element of autochthonous tribal extraction in the Ramdasi. The present study revealed much genetic heterogeneity in differently ranking Sikh caste populations of Punjab, mainly attributable to their different ethnic backgrounds, and provided a genetic basis to social stratification present in this religious community of Punjab, India.

  8. Association of Genetic Variants Related to Serum Calcium Levels With Coronary Artery Disease and Myocardial Infarction.

    PubMed

    Larsson, Susanna C; Burgess, Stephen; Michaëlsson, Karl

    2017-07-25

    Serum calcium has been associated with cardiovascular disease in observational studies, and evidence from randomized clinical trials indicates that calcium supplementation, which raises serum calcium levels, may increase the risk of cardiovascular events, particularly myocardial infarction. The objective was to evaluate the potential causal association between genetic variants related to elevated serum calcium levels and risk of coronary artery disease (CAD) and myocardial infarction using mendelian randomization. The analyses were performed using summary statistics obtained for single-nucleotide polymorphisms (SNPs) identified from a genome-wide association meta-analysis of serum calcium levels (N = up to 61 079 individuals) and from the Coronary Artery Disease Genome-wide Replication and Meta-analysis Plus the Coronary Artery Disease Genetics (CARDIoGRAMplusC4D) consortium's 1000 Genomes-based genome-wide association meta-analysis (N = up to 184 305 individuals) that included cases (individuals with CAD and myocardial infarction) and noncases, with baseline data collected from 1948 and populations derived from across the globe. The association of each SNP with CAD and myocardial infarction was weighted by its association with serum calcium, and estimates were combined using an inverse-variance weighted meta-analysis. The exposure was a genetic risk score based on genetic variants related to elevated serum calcium levels. Co-primary outcomes were the odds of CAD and myocardial infarction. Among the mendelian randomized analytic sample of 184 305 individuals (60 801 CAD cases [approximately 70% with myocardial infarction] and 123 504 noncases), the 6 SNPs related to serum calcium levels and without pleiotropic associations with potential confounders were estimated to explain about 0.8% of the variation in serum calcium levels. 
In the inverse-variance weighted meta-analysis (combining the estimates of the 6 SNPs), the odds ratios per 0.5-mg/dL increase (about 1 SD) in genetically predicted serum calcium levels were 1.25 (95% CI, 1.08-1.45; P = .003) for CAD and 1.24 (95% CI, 1.05-1.46; P = .009) for myocardial infarction. A genetic predisposition to higher serum calcium levels was associated with increased risk of CAD and myocardial infarction. Whether the risk of CAD associated with lifelong genetic exposure to increased serum calcium levels can be translated to a risk associated with short-term to medium-term calcium supplementation is unknown.
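
The combination step described in this abstract can be sketched as follows. Each SNP gives a ratio estimate (SNP-outcome effect divided by SNP-exposure effect), and the ratios are pooled with inverse-variance weights. The standard error of each ratio uses the common first-order approximation se_outcome / |beta_exposure|, which is an assumption of this sketch and not necessarily the authors' exact procedure. All numbers below are invented for illustration and are not the study's data.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted pooling of per-SNP ratio estimates."""
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # First-order standard error of each ratio (ignores error in beta_exposure).
    ratio_ses = [se / abs(bx) for bx, se in zip(beta_exposure, se_outcome)]
    weights = [1.0 / s ** 2 for s in ratio_ses]
    pooled = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical summary statistics for 6 SNPs (made-up values).
beta_calcium = [0.071, 0.018, 0.017, 0.015, 0.022, 0.018]  # mg/dL per allele
beta_cad = [0.030, 0.010, 0.004, 0.009, 0.012, 0.006]      # log odds per allele
se_cad = [0.011, 0.010, 0.010, 0.011, 0.012, 0.010]

estimate, se = ivw_estimate(beta_calcium, beta_cad, se_cad)

# Express the result as an odds ratio per 0.5 mg/dL of predicted calcium.
per_unit = 0.5
odds_ratio = math.exp(per_unit * estimate)
ci_low = math.exp(per_unit * (estimate - 1.96 * se))
ci_high = math.exp(per_unit * (estimate + 1.96 * se))
print(f"OR per {per_unit} mg/dL: {odds_ratio:.2f} ({ci_low:.2f}-{ci_high:.2f})")
```

The pooled estimate is on the log-odds scale per unit of exposure, so it is rescaled before exponentiating.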

  9. An Analysis Pipeline with Statistical and Visualization-Guided Knowledge Discovery for Michigan-Style Learning Classifier Systems

    PubMed Central

    Urbanowicz, Ryan J.; Granizo-Mackenzie, Ambrose; Moore, Jason H.

    2014-01-01

    Michigan-style learning classifier systems (M-LCSs) represent an adaptive and powerful class of evolutionary algorithms which distribute the learned solution over a sizable population of rules. However their application to complex real world data mining problems, such as genetic association studies, has been limited. Traditional knowledge discovery strategies for M-LCS rule populations involve sorting and manual rule inspection. While this approach may be sufficient for simpler problems, the confounding influence of noise and the need to discriminate between predictive and non-predictive attributes calls for additional strategies. Additionally, tests of significance must be adapted to M-LCS analyses in order to make them a viable option within fields that require such analyses to assess confidence. In this work we introduce an M-LCS analysis pipeline that combines uniquely applied visualizations with objective statistical evaluation for the identification of predictive attributes, and reliable rule generalizations in noisy single-step data mining problems. This work considers an alternative paradigm for knowledge discovery in M-LCSs, shifting the focus from individual rules to a global, population-wide perspective. We demonstrate the efficacy of this pipeline applied to the identification of epistasis (i.e., attribute interaction) and heterogeneity in noisy simulated genetic association data. PMID:25431544

  10. A meta-analysis of interleukin-10-1082 promoter genetic polymorphism associated with atherosclerotic risk.

    PubMed

    Chao, Li; Lei, Huang; Fei, Jin

    2014-01-01

    This meta-analysis was conducted to assess the relationship between the interleukin-10-1082 G/A single nucleotide polymorphism and atherosclerosis (AS) risk. The databases of PubMed, EMBASE, Chinese National Knowledge Infrastructure and Wan-Fang were searched from January 2000 to January 2014. Sixteen studies (involving 7779 cases and 7271 controls) were finally included. Each eligible study was scored for quality assessment. We adopted the genetic model most likely to be appropriate (the recessive model) after careful calculation. Between-study heterogeneity was explored by subgroup analysis, and publication bias was estimated by Begg's funnel plot and Egger's regression test. A statistically significant association was observed between the AA genotype and overall AS risk, mainly in the coronary heart disease and stroke subgroups among Asian populations and in the peripheral artery disease (PAD) subgroup among Caucasians. The interleukin-10-1082 AA genotype is associated with increased overall AS risk. Asian AA carriers seem to be more susceptible to coronary artery disease and stroke, and Caucasian AA carriers to PAD.

  11. SNP_tools: A compact tool package for analysis and conversion of genotype data for MS-Excel

    PubMed Central

    Chen, Bowang; Wilkening, Stefan; Drechsel, Marion; Hemminki, Kari

    2009-01-01

    Background Single nucleotide polymorphism (SNP) genotyping is a major activity in biomedical research. Scientists prefer to have easy access to the results, which may require conversions between data formats. First-hand SNP data are often entered or saved in the MS-Excel format, but this software lacks genetic and epidemiological functions. A general tool for basic genetic and epidemiological analysis and data conversion in MS-Excel is needed. Findings The SNP_tools package is prepared as an add-in for MS-Excel. The code is written in Visual Basic for Applications, embedded in the Microsoft Office package. This add-in is an easy-to-use tool for users with basic computer knowledge (and requirements for basic statistical analysis). Conclusion Our implementation for Microsoft Excel 2000-2007 in Microsoft Windows 2000, XP, Vista and Windows 7 beta can handle files in different formats and convert them into other formats. It is free software. PMID:19852806

  12. SNP_tools: A compact tool package for analysis and conversion of genotype data for MS-Excel.

    PubMed

    Chen, Bowang; Wilkening, Stefan; Drechsel, Marion; Hemminki, Kari

    2009-10-23

    Single nucleotide polymorphism (SNP) genotyping is a major activity in biomedical research. Scientists prefer to have easy access to the results, which may require conversions between data formats. First-hand SNP data are often entered or saved in the MS-Excel format, but this software lacks genetic and epidemiological functions. A general tool for basic genetic and epidemiological analysis and data conversion in MS-Excel is needed. The SNP_tools package is prepared as an add-in for MS-Excel. The code is written in Visual Basic for Applications, embedded in the Microsoft Office package. This add-in is an easy-to-use tool for users with basic computer knowledge (and requirements for basic statistical analysis). Our implementation for Microsoft Excel 2000-2007 in Microsoft Windows 2000, XP, Vista and Windows 7 beta can handle files in different formats and convert them into other formats. It is free software.

  13. Using genetic data to strengthen causal inference in observational research.

    PubMed

    Pingault, Jean-Baptiste; O'Reilly, Paul F; Schoeler, Tabea; Ploubidis, George B; Rijsdijk, Frühling; Dudbridge, Frank

    2018-06-05

    Causal inference is essential across the biomedical, behavioural and social sciences. By progressing from confounded statistical associations to evidence of causal relationships, causal inference can reveal complex pathways underlying traits and diseases and help to prioritize targets for intervention. Recent progress in genetic epidemiology - including statistical innovation, massive genotyped data sets and novel computational tools for deep data mining - has fostered the intense development of methods exploiting genetic data and relatedness to strengthen causal inference in observational research. In this Review, we describe how such genetically informed methods differ in their rationale, applicability and inherent limitations and outline how they should be integrated in the future to offer a rich causal inference toolbox.

  14. Polygenic risk score and heritability estimates reveals a genetic relationship between ASD and OCD.

    PubMed

    Guo, W; Samuels, J F; Wang, Y; Cao, H; Ritter, M; Nestadt, P S; Krasnow, J; Greenberg, B D; Fyer, A J; McCracken, J T; Geller, D A; Murphy, D L; Knowles, J A; Grados, M A; Riddle, M A; Rasmussen, S A; McLaughlin, N C; Nurmi, E L; Askland, K D; Cullen, B A; Piacentini, J; Pauls, D L; Bienvenu, O J; Stewart, S E; Goes, F S; Maher, B; Pulver, A E; Valle, D; Mattheisen, M; Qian, J; Nestadt, G; Shugart, Y Y

    2017-07-01

    Obsessive-compulsive disorder (OCD) and Autism spectrum disorder (ASD) are both highly heritable neurodevelopmental disorders that conceivably share genetic risk factors. However, the underlying genetic determinants remain largely unknown. In this work, the authors describe a combined genome-wide association study (GWAS) of ASD and OCD. The OCD dataset includes 2998 individuals in nuclear families. The ASD dataset includes 6898 individuals in case-parents trios. GWAS summary statistics were examined for potential enrichment of functional variants associated with gene expression levels in brain regions. The top ranked SNP is rs4785741 (chromosome 16) with P value = 6.9×10⁻⁷ in our re-analysis. Polygenic risk score analyses were conducted to investigate the genetic relationship within and across the two disorders. These analyses identified a significant polygenic component of ASD, predicting 0.11% of the phenotypic variance in an independent OCD data set. In addition, we examined the genomic architecture of ASD and OCD by estimating heritability on different chromosomes and different allele frequencies, analyzing genome-wide common variant data by using the Genome-wide Complex Trait Analysis (GCTA) program. The estimated global heritability of OCD is 0.427 (se=0.093) and 0.174 (se=0.053) for ASD in these imputed data. Published by Elsevier B.V.
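
A polygenic risk score of the kind mentioned in this abstract is, in its simplest form, a weighted sum of risk-allele dosages, with weights taken from a discovery GWAS. The sketch below shows that sum and a crude "variance explained" measure (squared correlation between score and phenotype). The function names and simulated data are hypothetical, and published analyses of case-control traits typically report a pseudo-R² from logistic regression instead of the simple measure used here.

```python
import numpy as np

def polygenic_scores(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2) for each individual."""
    dosages = np.asarray(dosages, dtype=float)   # shape: individuals x SNPs
    weights = np.asarray(weights, dtype=float)   # shape: SNPs
    return dosages @ weights

def variance_explained(scores, phenotype):
    """Squared Pearson correlation between the score and the phenotype."""
    r = np.corrcoef(scores, phenotype)[0, 1]
    return r ** 2

# Hypothetical toy data: 500 people, 50 SNPs, small per-SNP effects.
rng = np.random.default_rng(1)
n_people, n_snps = 500, 50
dosages = rng.binomial(2, 0.3, size=(n_people, n_snps))
weights = rng.normal(0.0, 0.05, size=n_snps)

scores = polygenic_scores(dosages, weights)
phenotype = scores + rng.normal(0.0, 1.0, size=n_people)  # mostly noise
print(f"variance explained: {variance_explained(scores, phenotype):.4f}")
```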

  15. Two-locus maximum lod score analysis of a multifactorial trait: joint consideration of IDDM2 and IDDM4 with IDDM1 in type 1 diabetes.

    PubMed

    Cordell, H J; Todd, J A; Bennett, S T; Kawaguchi, Y; Farrall, M

    1995-10-01

    To investigate the genetic component of multifactorial diseases such as type 1 (insulin-dependent) diabetes mellitus (IDDM), models involving the joint action of several disease loci are important. These models can give increased power to detect an effect and a greater understanding of etiological mechanisms. Here, we present an extension of the maximum lod score method of N. Risch, which allows the simultaneous detection and modeling of two unlinked disease loci. Genetic constraints on the identical-by-descent sharing probabilities, analogous to the "triangle" restrictions in the single-locus method, are derived, and the size and power of the test statistics are investigated. The method is applied to affected-sib-pair data, and the joint effects of IDDM1 (HLA) and IDDM2 (the INS VNTR) and of IDDM1 and IDDM4 (FGF3-linked) are assessed with relation to the development of IDDM. In the presence of genetic heterogeneity, there is seen to be a significant advantage in analyzing more than one locus simultaneously. Analysis of these families indicates that the effects at IDDM1 and IDDM2 are well described by a multiplicative genetic model, while those at IDDM1 and IDDM4 follow a heterogeneity model.

  16. Two-locus maximum lod score analysis of a multifactorial trait: joint consideration of IDDM2 and IDDM4 with IDDM1 in type 1 diabetes.

    PubMed Central

    Cordell, H J; Todd, J A; Bennett, S T; Kawaguchi, Y; Farrall, M

    1995-01-01

    To investigate the genetic component of multifactorial diseases such as type 1 (insulin-dependent) diabetes mellitus (IDDM), models involving the joint action of several disease loci are important. These models can give increased power to detect an effect and a greater understanding of etiological mechanisms. Here, we present an extension of the maximum lod score method of N. Risch, which allows the simultaneous detection and modeling of two unlinked disease loci. Genetic constraints on the identical-by-descent sharing probabilities, analogous to the "triangle" restrictions in the single-locus method, are derived, and the size and power of the test statistics are investigated. The method is applied to affected-sib-pair data, and the joint effects of IDDM1 (HLA) and IDDM2 (the INS VNTR) and of IDDM1 and IDDM4 (FGF3-linked) are assessed with relation to the development of IDDM. In the presence of genetic heterogeneity, there is seen to be a significant advantage in analyzing more than one locus simultaneously. Analysis of these families indicates that the effects at IDDM1 and IDDM2 are well described by a multiplicative genetic model, while those at IDDM1 and IDDM4 follow a heterogeneity model. PMID:7573054

  17. Integrative Functional Genomics for Systems Genetics in GeneWeaver.org.

    PubMed

    Bubier, Jason A; Langston, Michael A; Baker, Erich J; Chesler, Elissa J

    2017-01-01

    The abundance of existing functional genomics studies permits an integrative approach to interpreting and resolving the results of diverse systems genetics studies. However, a major challenge lies in assembling and harmonizing heterogeneous data sets across species for facile comparison to the positional candidate genes and coexpression networks that come from systems genetic studies. GeneWeaver is an online database and suite of tools at www.geneweaver.org that allows for fast aggregation and analysis of gene set-centric data. GeneWeaver contains curated experimental data together with resource-level data such as GO annotations, MP annotations, and KEGG pathways, along with persistent stores of user-entered data sets. These can be entered directly into GeneWeaver or transferred from widely used resources such as GeneNetwork.org. Data are analyzed using statistical tools and advanced graph algorithms to discover new relations, prioritize candidate genes, and generate function hypotheses. Here we use GeneWeaver to find genes common to multiple gene sets, prioritize candidate genes from a quantitative trait locus, and characterize a set of differentially expressed genes. Coupling a large multispecies repository of curated and empirical functional genomics data to fast computational tools allows for the rapid integrative analysis of heterogeneous data for interpreting and extrapolating systems genetics results.
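
The first task listed in this abstract, finding genes common to multiple gene sets, reduces to set operations. The sketch below is not the GeneWeaver interface; it is a plain Python illustration with hypothetical set names and placeholder gene identifiers. It counts how many sets each gene appears in and computes the Jaccard similarity for each pair of sets.

```python
from collections import Counter
from itertools import combinations

def genes_in_at_least(gene_sets, k):
    """Genes that occur in at least k of the named gene sets."""
    counts = Counter(g for members in gene_sets.values() for g in set(members))
    return sorted(g for g, c in counts.items() if c >= k)

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical gene sets (placeholder identifiers, not real results).
gene_sets = {
    "qtl_candidates": {"GeneA", "GeneB", "GeneC"},
    "differential_expression": {"GeneB", "GeneC", "GeneD"},
    "literature": {"GeneC", "GeneE"},
}

print("in >= 2 sets:", genes_in_at_least(gene_sets, 2))
for (name_a, set_a), (name_b, set_b) in combinations(gene_sets.items(), 2):
    print(f"{name_a} vs {name_b}: Jaccard = {jaccard(set_a, set_b):.2f}")
```

Genes that recur across independent sets are natural candidates to prioritize, which is the intuition behind the overlap-based ranking the abstract describes.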

  18. Genetic Affinity of the Bhil, Kol and Gond Mentioned in Epic Ramayana

    PubMed Central

    Chaubey, Gyaneshwer; Kadian, Anurag; Bala, Saroj; Rao, Vadlamudi Raghavendra

    2015-01-01

    Kol, Bhil and Gond are some of the ancient tribal populations known from the Ramayana, one of the great epics of India. Though there have been studies about their affinity based on classical and haploid genetic markers, molecular insight into their relationship with other tribal and caste populations of extant India is expected to give more clarity about the question of continuity vs. discontinuity. In this study, we scanned >97,000 single nucleotide polymorphisms among three major ancient tribes mentioned in the Ramayana, namely Bhil, Kol and Gond. The results obtained were then compared at inter- and intra-population levels with neighboring and other world populations. Using various statistical methods, our analysis suggested that the genetic architecture of these tribes (Kol and Gond) was largely similar to their surrounding tribal and caste populations, while Bhil showed closer affinity with Dravidian and Austroasiatic (Munda) speaking tribes. The haplotype based analysis revealed a massive amount of genome sharing among Bhil, Kol, Gond and with other ethnic groups of South Asian descent. On the basis of genetic component sharing among different populations, we anticipate their primary founding over the indigenous Ancestral South Indian (ASI) component has prevailed in the genepool over the last several thousand years. PMID:26061398

  19. GWAR: robust analysis and meta-analysis of genome-wide association studies.

    PubMed

    Dimou, Niki L; Tsirigos, Konstantinos D; Elofsson, Arne; Bagos, Pantelis G

    2017-05-15

    In the context of genome-wide association studies (GWAS), a variety of statistical techniques are available to conduct the analysis, but, in most cases, the underlying genetic model is unknown. Under these circumstances, the classical Cochran-Armitage trend test (CATT) is suboptimal. Robust procedures that maximize the power and preserve the nominal type I error rate are preferable. Moreover, performing a meta-analysis using robust procedures is of great interest and has never been addressed in the past. The primary goal of this work is to implement several robust methods for analysis and meta-analysis in the statistical package Stata and subsequently to make the software available to the scientific community. The CATT under a recessive, additive and dominant model of inheritance as well as robust methods based on the Maximum Efficiency Robust Test statistic, the MAX statistic and the MIN2 were implemented in Stata. Concerning MAX and MIN2, we calculated their asymptotic null distributions relying on numerical integration resulting in a great gain in computational time without losing accuracy. All the aforementioned approaches were employed in a fixed or a random effects meta-analysis setting using summary data with weights equal to the reciprocal of the combined cases and controls. Overall, this is the first complete effort to implement procedures for analysis and meta-analysis in GWAS using Stata. A Stata program and a web-server are freely available for academic users at http://www.compgen.org/tools/GWAR. Contact: pbagos@compgen.org. Supplementary data are available at Bioinformatics online.
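
The Cochran-Armitage trend test named in this abstract can be sketched for a single SNP as below. This is a plain Python illustration, not the GWAR Stata program. Genotype counts are ordered by number of risk alleles (0, 1, 2), and each genetic model corresponds to a set of scores: (0, 0, 1) recessive, (0, 0.5, 1) additive, (0, 1, 1) dominant. The MAX statistic is the largest absolute Z across the three models. Its p-value needs the joint null distribution (the abstract mentions numerical integration), so the sketch reports only the statistic. The example counts are invented.

```python
import math

MODEL_SCORES = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}

def catt_z(cases, controls, scores):
    """Cochran-Armitage trend Z statistic for a 2 x 3 genotype table."""
    R, S = sum(cases), sum(controls)
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]
    numerator = sum(x * (S * r - R * s) for x, r, s in zip(scores, cases, controls))
    spread = N * sum(x * x * n for x, n in zip(scores, totals)) \
        - sum(x * n for x, n in zip(scores, totals)) ** 2
    return math.sqrt(N) * numerator / math.sqrt(R * S * spread)

def two_sided_p(z):
    """Two-sided normal p-value for a single pre-specified model."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical genotype counts (0, 1, 2 copies of the risk allele).
cases = (20, 30, 50)
controls = (40, 40, 20)

z_by_model = {m: catt_z(cases, controls, s) for m, s in MODEL_SCORES.items()}
max_stat = max(abs(z) for z in z_by_model.values())

for model, z in z_by_model.items():
    print(f"{model:9s} Z = {z:.3f}  p = {two_sided_p(z):.2e}")
print(f"MAX statistic = {max_stat:.3f} (p-value needs the joint null distribution)")
```

Under the dominant or recessive scores the squared Z equals the Pearson chi-square of the corresponding collapsed 2 x 2 table, which is a convenient sanity check.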

  20. Microsatellite analysis of chloroquine resistance associated alleles and neutral loci reveal genetic structure of Indian Plasmodium falciparum

    PubMed Central

    Mallick, Prashant K.; Sutton, Patrick L.; Singh, Ruchi; Singh, Om P.; Dash, Aditya P.; Singh, Ashok K.; Carlton, Jane M.; Bhasin, Virendra K.

    2013-01-01

    Efforts to control malignant malaria caused by Plasmodium falciparum are hampered by the parasite’s acquisition of resistance to antimalarial drugs, e.g., chloroquine. This necessitates evaluating the spread of chloroquine resistance in any malaria-endemic area. India displays highly variable malaria epidemiology and also shares porous international borders with malaria-endemic Southeast Asian countries having multi-drug resistant malaria. Malaria epidemiology in India is believed to be affected by two major factors: high genetic diversity and evolving drug resistance in P. falciparum. How transmission intensity of malaria can influence the genetic structure of chloroquine-resistant P. falciparum population in India is unknown. Here, genetic diversity within and among P. falciparum populations is analyzed with respect to their prevalence and chloroquine resistance observed in 13 different locations in India. Microsatellites developed for P. falciparum, including three putatively neutral and seven microsatellites thought to be under a hitchhiking effect due to chloroquine selection were used. Genetic hitchhiking is observed in five of seven microsatellites flanking the gene responsible for chloroquine resistance. Genetic admixture analysis and F-statistics detected genetically distinct groups in accordance with transmission intensity of different locations and the probable use of chloroquine. A large genetic break between the chloroquine-resistant parasite of the Northeast-East-Island group and Southwest group (FST = 0.253, P<0.001) suggests a long period of isolation or a possibility of different origin between them. A pattern of significant isolation by distance was observed in low transmission areas (r = 0.49, P=0.003, N = 83, Mantel test). An unanticipated pattern of spread of hitchhiking suggests genetic structure for Indian P. falciparum population. 
Overall, the study suggests that transmission intensity can be an efficient driver for genetic differentiation at both neutral and adaptive loci across India. PMID:23871774
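
The isolation-by-distance result in this abstract rests on a Mantel test, which correlates two distance matrices and assesses significance by permuting the rows and columns of one of them. The sketch below is a minimal one-sided permutation version in NumPy. The site coordinates and "genetic" distances are simulated, and the function name and defaults are assumptions, not the software used in the study.

```python
import numpy as np

def mantel_test(dist_a, dist_b, n_perm=999, seed=0):
    """One-sided Mantel test: correlation of two distance matrices by permutation."""
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    upper = np.triu_indices_from(a, k=1)           # each pair of sites once
    r_obs = np.corrcoef(a[upper], b[upper])[0, 1]
    rng = np.random.default_rng(seed)
    at_least_as_large = 0
    for _ in range(n_perm):
        perm = rng.permutation(a.shape[0])
        a_perm = a[perm][:, perm]                  # relabel sites in matrix a
        r_perm = np.corrcoef(a_perm[upper], b[upper])[0, 1]
        if r_perm >= r_obs:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (n_perm + 1)
    return r_obs, p_value

# Hypothetical data: 12 sites; genetic distance = geographic distance + noise.
rng = np.random.default_rng(42)
coords = rng.uniform(0, 100, size=(12, 2))
geo = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
noise = rng.normal(0, 10, size=geo.shape)
gen = geo + (noise + noise.T) / 2                  # keep the matrix symmetric
np.fill_diagonal(gen, 0.0)

r, p = mantel_test(geo, gen)
print(f"Mantel r = {r:.2f}, permutation p = {p:.3f}")
```

Permuting site labels, not individual matrix entries, preserves the dependence among distances that share a site, which is why an ordinary correlation test is not appropriate here.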

  1. Microsatellite analysis of chloroquine resistance associated alleles and neutral loci reveal genetic structure of Indian Plasmodium falciparum.

    PubMed

    Mallick, Prashant K; Sutton, Patrick L; Singh, Ruchi; Singh, Om P; Dash, Aditya P; Singh, Ashok K; Carlton, Jane M; Bhasin, Virendra K

    2013-10-01

    Efforts to control malignant malaria caused by Plasmodium falciparum are hampered by the parasite's acquisition of resistance to antimalarial drugs, e.g., chloroquine. This necessitates evaluating the spread of chloroquine resistance in any malaria-endemic area. India displays highly variable malaria epidemiology and also shares porous international borders with malaria-endemic Southeast Asian countries having multi-drug resistant malaria. Malaria epidemiology in India is believed to be affected by two major factors: high genetic diversity and evolving drug resistance in P. falciparum. How transmission intensity of malaria can influence the genetic structure of chloroquine-resistant P. falciparum population in India is unknown. Here, genetic diversity within and among P. falciparum populations is analyzed with respect to their prevalence and chloroquine resistance observed in 13 different locations in India. Microsatellites developed for P. falciparum, including three putatively neutral and seven microsatellites thought to be under a hitchhiking effect due to chloroquine selection were used. Genetic hitchhiking is observed in five of seven microsatellites flanking the gene responsible for chloroquine resistance. Genetic admixture analysis and F-statistics detected genetically distinct groups in accordance with transmission intensity of different locations and the probable use of chloroquine. A large genetic break between the chloroquine-resistant parasite of the Northeast-East-Island group and Southwest group (FST=0.253, P<0.001) suggests a long period of isolation or a possibility of different origin between them. A pattern of significant isolation by distance was observed in low transmission areas (r=0.49, P=0.003, N=83, Mantel test). An unanticipated pattern of spread of hitchhiking suggests genetic structure for Indian P. falciparum population. 
Overall, the study suggests that transmission intensity can be an efficient driver for genetic differentiation at both neutral and adaptive loci across India. Copyright © 2013 Elsevier B.V. All rights reserved.

  2. On an Additive Semigraphoid Model for Statistical Networks With Application to Pathway Analysis.

    PubMed

    Li, Bing; Chun, Hyonho; Zhao, Hongyu

    2014-09-01

    We introduce a nonparametric method for estimating non-gaussian graphical models based on a new statistical relation called additive conditional independence, which is a three-way relation among random vectors that resembles the logical structure of conditional independence. Additive conditional independence allows us to use one-dimensional kernel regardless of the dimension of the graph, which not only avoids the curse of dimensionality but also simplifies computation. It also gives rise to a parallel structure to the gaussian graphical model that replaces the precision matrix by an additive precision operator. The estimators derived from additive conditional independence cover the recently introduced nonparanormal graphical model as a special case, but outperform it when the gaussian copula assumption is violated. We compare the new method with existing ones by simulations and in genetic pathway analysis.

  3. Patterns of genetic and morphometric diversity in baobab (Adansonia digitata) populations across different climatic zones of Benin (West Africa).

    PubMed

    Assogbadjo, A E; Kyndt, T; Sinsin, B; Gheysen, G; van Damme, P

    2006-05-01

    Baobab (Adansonia digitata) is a multi-purpose tree used daily by rural African communities. The present study aimed at investigating the level of morphometric and genetic variation and spatial genetic structure within and between threatened baobab populations from the three climatic zones of Benin. A total of 137 individuals from six populations were analysed using morphometric data as well as molecular marker data generated using the AFLP technique. Five primer pairs resulted in a total of 217 scored bands with 78.34 % of them being polymorphic. A two-level AMOVA of 137 individuals from six baobab populations revealed 82.37 % of the total variation within populations and 17.63 % among populations (P < 0.001). Analysis of population structure with allele-frequency based F-statistics revealed a global F(ST) of 0.127 +/- 0.072 (P < 0.001). The mean gene diversity within populations (H(S)) and the average gene diversity between populations (D(ST)) were estimated at 0.309 +/- 0.000 and 0.045 +/- 0.072, respectively. Baobabs in the Sudanian and Sudan-Guinean zones of Benin were short and produced the highest yields of pulp, seeds and kernels, in contrast to the ones in the Guinean zone, which were tall and produced only a small number of fruits with a low pulp, seed and kernel productivity. A statistically significant correlation with the observed patterns of genetic diversity was observed for three morphological characteristics: height of the trees, number of branches and thickness of the capsules. The results indicate some degree of physical isolation of the populations collected in the different climatic zones and suggest a substantial amount of genetic structuring between the analysed populations of baobab. Sampling options of the natural populations are suggested for in or ex situ conservation.
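
    The diversity figures reported above can be cross-checked with Nei's partition of gene diversity, H(T) = H(S) + D(ST) and G(ST) = D(ST) / H(T). The sketch below is only an arithmetic check; the function name is hypothetical, and it assumes that the reported global F(ST) is comparable to G(ST), which the abstract does not state.

```python
def gst_from_diversities(h_s: float, d_st: float) -> float:
    """Return G_ST = D_ST / (H_S + D_ST) under Nei's diversity partition."""
    h_t = h_s + d_st  # total gene diversity H_T
    if h_t <= 0:
        raise ValueError("total gene diversity must be positive")
    return d_st / h_t


# Values quoted in the abstract: H(S) = 0.309, D(ST) = 0.045.
gst = gst_from_diversities(0.309, 0.045)
print(round(gst, 3))
```

    With the quoted values the ratio rounds to 0.127, in line with the global F(ST) given in the abstract.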

  4. TATES: Efficient Multivariate Genotype-Phenotype Analysis for Genome-Wide Association Studies

    PubMed Central

    van der Sluis, Sophie; Posthuma, Danielle; Dolan, Conor V.

    2013-01-01

    To date, the genome-wide association study (GWAS) is the primary tool to identify genetic variants that cause phenotypic variation. As GWAS analyses are generally univariate in nature, multivariate phenotypic information is usually reduced to a single composite score. This practice often results in loss of statistical power to detect causal variants. Multivariate genotype–phenotype methods do exist but attain maximal power only in special circumstances. Here, we present a new multivariate method that we refer to as TATES (Trait-based Association Test that uses Extended Simes procedure), inspired by the GATES procedure proposed by Li et al (2011). For each component of a multivariate trait, TATES combines p-values obtained in standard univariate GWAS to acquire one trait-based p-value, while correcting for correlations between components. Extensive simulations, probing a wide variety of genotype–phenotype models, show that TATES's false positive rate is correct, and that TATES's statistical power to detect causal variants explaining 0.5% of the variance can be 2.5–9 times higher than the power of univariate tests based on composite scores and 1.5–2 times higher than the power of the standard MANOVA. Unlike other multivariate methods, TATES detects both genetic variants that are common to multiple phenotypes and genetic variants that are specific to a single phenotype, i.e. TATES provides a more complete view of the genetic architecture of complex traits. As the actual causal genotype–phenotype model is usually unknown and probably phenotypically and genetically complex, TATES, available as an open source program, constitutes a powerful new multivariate strategy that allows researchers to identify novel causal variants, while the complexity of traits is no longer a limiting factor. PMID:23359524
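
    As a rough illustration of how per-phenotype p-values can be combined into one trait-based p-value, the sketch below implements the classical Simes procedure. It is not the TATES implementation: the extended procedure described in the abstract also corrects for correlations between the phenotype components, which this simplified version omits. The function name and the example p-values are hypothetical.

```python
def simes_p(pvalues):
    """Classical Simes combined p-value: min over ranks j of m * p_(j) / j."""
    ordered = sorted(pvalues)
    m = len(ordered)
    if m == 0:
        raise ValueError("at least one p-value is required")
    if any(p < 0.0 or p > 1.0 for p in ordered):
        raise ValueError("p-values must lie in [0, 1]")
    combined = min(m * p / rank for rank, p in enumerate(ordered, start=1))
    return min(1.0, combined)


# Hypothetical univariate p-values for one SNP across three phenotypes.
trait_p = simes_p([0.01, 0.04, 0.03])
print(trait_p)
```

    Because the smallest p-value is scaled by the full number of tests while larger ones are scaled less, a variant that affects only one phenotype and a variant that affects several can both yield a small combined value.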

  5. Identification of genetic loci shared between schizophrenia and the Big Five personality traits.

    PubMed

    Smeland, Olav B; Wang, Yunpeng; Lo, Min-Tzu; Li, Wen; Frei, Oleksandr; Witoelar, Aree; Tesli, Martin; Hinds, David A; Tung, Joyce Y; Djurovic, Srdjan; Chen, Chi-Hua; Dale, Anders M; Andreassen, Ole A

    2017-05-22

    Schizophrenia is associated with differences in personality traits, and recent studies suggest that personality traits and schizophrenia share a genetic basis. Here we aimed to identify specific genetic loci shared between schizophrenia and the Big Five personality traits using a Bayesian statistical framework. Using summary statistics from genome-wide association studies (GWAS) on personality traits in the 23andMe cohort (n = 59,225) and schizophrenia in the Psychiatric Genomics Consortium cohort (n = 82,315), we evaluated overlap in common genetic variants. The Big Five personality traits neuroticism, extraversion, openness, agreeableness and conscientiousness were measured using a web implementation of the Big Five Inventory. Applying the conditional false discovery rate approach, we increased discovery of genetic loci and identified two loci shared between neuroticism and schizophrenia and six loci shared between openness and schizophrenia. The study provides new insights into the relationship between personality traits and schizophrenia by highlighting genetic loci involved in their common genetic etiology.

  6. Sparse models for correlative and integrative analysis of imaging and genetic data

    PubMed Central

    Lin, Dongdong; Cao, Hongbao; Calhoun, Vince D.

    2014-01-01

    The development of advanced medical imaging technologies and high-throughput genomic measurements has enhanced our ability to understand their interplay as well as their relationship with human behavior by integrating these two types of datasets. However, the high dimensionality and heterogeneity of these datasets presents a challenge to conventional statistical methods; there is a high demand for the development of both correlative and integrative analysis approaches. Here, we review our recent work on developing sparse representation based approaches to address this challenge. We show how sparse models are applied to the correlation and integration of imaging and genetic data for biomarker identification. We present examples on how these approaches are used for the detection of risk genes and classification of complex diseases such as schizophrenia. Finally, we discuss future directions on the integration of multiple imaging and genomic datasets including their interactions such as epistasis. PMID:25218561

  7. Asthma phenotypes in childhood.

    PubMed

    Reddy, Monica B; Covar, Ronina A

    2016-04-01

    This review describes the literature over the past 18 months that evaluated childhood asthma phenotypes, highlighting the key aspects of these studies, and comparing these studies to previous ones in this area. Recent studies on asthma phenotypes have identified new phenotypes on the basis of statistical analyses (using cluster analysis and latent class analysis methodology) and have evaluated the outcomes and associated risk factors of previously established early childhood asthma phenotypes that are based on asthma onset and patterns of wheezing illness. There have also been investigations focusing on immunologic, physiologic, and genetic correlates of various phenotypes, as well as identification of subphenotypes of severe childhood asthma. Childhood asthma remains a heterogeneous condition, and investigations into these various presentations, risk factors, and outcomes are important since they can offer therapeutic and prognostic relevance. Further investigation into the immunopathology and genetic basis underlying childhood phenotypes is important so therapy can be tailored accordingly.

  8. Genetic overlap between Alzheimer’s disease and Parkinson’s disease at the MAPT locus

    PubMed Central

    Desikan, Rahul S.; Schork, Andrew J.; Wang, Yunpeng; Witoelar, Aree; Sharma, Manu; McEvoy, Linda K.; Holland, Dominic; Brewer, James B.; Chen, Chi-Hua; Thompson, Wesley K.; Harold, Denise; Williams, Julie; Owen, Michael J.; O’Donovan, Michael C.; Pericak-Vance, Margaret A.; Mayeux, Richard; Haines, Jonathan L.; Farrer, Lindsay A.; Schellenberg, Gerard D.; Heutink, Peter; Singleton, Andrew B.; Brice, Alexis; Wood, Nicolas W.; Hardy, John; Martinez, Maria; Choi, Seung Hoi; DeStefano, Anita; Ikram, M. Arfan; Bis, Joshua C.; Smith, Albert; Fitzpatrick, Annette L.; Launer, Lenore; van Duijn, Cornelia; Seshadri, Sudha; Ulstein, Ingun Dina; Aarsland, Dag; Fladby, Tormod; Djurovic, Srdjan; Hyman, Bradley T.; Snaedal, Jon; Stefansson, Hreinn; Stefansson, Kari; Gasser, Thomas; Andreassen, Ole A.; Dale, Anders M.

    2015-01-01

    We investigated genetic overlap between Alzheimer’s disease (AD) and Parkinson’s disease (PD). Using summary statistics (p-values) from large recent genomewide association studies (GWAS) (total n = 89,904 individuals), we sought to identify single nucleotide polymorphisms (SNPs) associating with both AD and PD. We found and replicated association of both AD and PD with the A allele of rs393152 within the extended MAPT region on chromosome 17 (meta analysis p-value across 5 independent AD cohorts = 1.65 × 10⁻⁷). In independent datasets, we found a dose-dependent effect of the A allele of rs393152 on intra-cerebral MAPT transcript levels and volume loss within the entorhinal cortex and hippocampus. Our findings identify the tau-associated MAPT locus as a site of genetic overlap between AD and PD and, extending prior work, show that the MAPT region increases risk of Alzheimer’s neurodegeneration. PMID:25687773

  9. Genetic analysis of 20 autosomal STR loci in the Miao ethnic group from Yunnan Province, Southwest China.

    PubMed

    Zhang, Xiufeng; Hu, Liping; Du, Lei; Nie, Aiting; Rao, Min; Pang, Jing Bo; Xiran, Zeng; Nie, Shengjie

    2017-05-01

    The genetic polymorphisms of 20 autosomal short tandem repeat (STR) loci included in the PowerPlex® 21 kit were evaluated from 748 unrelated healthy individuals of the Miao ethnic minority living in the Yunnan province in southwestern China. All of the loci reached Hardy-Weinberg equilibrium. These loci were examined to determine allele frequencies and forensic statistical parameters. The genetic relationships between the Miao population and other Chinese populations were also estimated. The combined discrimination power and probability of excluding paternity of the 20 STR loci were 0.999 999 999 999 999 999 999 991 26 and 0.999 999 975, respectively. The results suggested that the 20 STR loci were highly polymorphic, which makes them suitable for forensic personal identification and paternity testing. Copyright © 2017 Elsevier B.V. All rights reserved.
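
    Combined forensic parameters of this kind are conventionally obtained from per-locus values as 1 minus the product of the per-locus complements, assuming the loci are independent. The sketch below follows that convention; the function name and the per-locus values are hypothetical and are not taken from the study. It returns the complement of the combined statistic, since a value with as many nines as the one quoted above cannot be represented in double-precision floating point.

```python
from math import prod


def combined_complement(per_locus_values):
    """Return prod(1 - x_i), i.e. 1 minus the combined statistic.

    Assumes independent loci; each x_i is a per-locus power of
    discrimination or probability of exclusion in [0, 1].
    """
    values = list(per_locus_values)
    if not values:
        raise ValueError("at least one locus is required")
    if any(x < 0.0 or x > 1.0 for x in values):
        raise ValueError("per-locus values must lie in [0, 1]")
    return prod(1.0 - x for x in values)


# Hypothetical per-locus powers of discrimination for three loci.
residual = combined_complement([0.95, 0.90, 0.97])
combined_pd = 1.0 - residual
print(residual, combined_pd)
```

    Reporting the residual (the chance that two unrelated individuals are not distinguished) keeps the informative digits that would otherwise be lost when the combined value is rounded to 1.0.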

  10. Organizational Benchmarks for Test Utilization Performance: An Example Based on Positivity Rates for Genetic Tests.

    PubMed

    Rudolf, Joseph; Jackson, Brian R; Wilson, Andrew R; Smock, Kristi J; Schmidt, Robert L

    2017-04-01

    Health care organizations are under increasing pressure to deliver value by improving test utilization management. Many factors, including organizational factors, could affect utilization performance. Past research has focused on the impact of specific interventions in single organizations. The impact of organizational factors is unknown. The objective of this study is to determine whether testing patterns are subject to organizational effects, ie, are utilization patterns for individual tests correlated within organizations. Comparative analysis of ordering patterns (positivity rates for three genetic tests) across 659 organizations. Hierarchical regression was used to assess the impact of organizational factors after controlling for test-level factors (mutation prevalence) and hospital bed size. Test positivity rates were correlated within organizations. Organizations have a statistically significant impact on the positivity rate of three genetic tests. © American Society for Clinical Pathology, 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com

  11. Patient Electronic Health Records as a Means to Approach Genetic Research in Gastroenterology

    PubMed Central

    Ananthakrishnan, Ashwin N; Lieberman, David

    2015-01-01

    Electronic health records (EHR) are being increasingly utilized and form a unique source of extensive data gathered during routine clinical care. Through use of codified and free text concepts identified using clinical informatics tools, disease labels can be assigned with a high degree of accuracy. Analysis linking such EHR-assigned disease labels to a biospecimen repository has demonstrated that genetic associations identified in prospective cohorts can be replicated with adequate statistical power, and novel phenotypic associations identified. In addition, genetic discovery research can be performed utilizing clinical, laboratory, and procedure data obtained during care. Challenges with such research include the need to tackle variability in quality and quantity of EHR data and importance of maintaining patient privacy and data security. With appropriate safeguards, this novel and emerging field of research offers considerable promise and potential to further scientific research in gastroenterology efficiently, cost-effectively, and with engagement of patients and communities. PMID:26073373

  12. Genetic determinants of common epilepsies: a meta-analysis of genome-wide association studies

    PubMed Central

    2014-01-01

    Summary. Background: The epilepsies are a clinically heterogeneous group of neurological disorders. Despite strong evidence for heritability, genome-wide association studies have had little success in identification of risk loci associated with epilepsy, probably because of relatively small sample sizes and insufficient power. We aimed to identify risk loci through meta-analyses of genome-wide association studies for all epilepsy and the two largest clinical subtypes (genetic generalised epilepsy and focal epilepsy). Methods: We combined genome-wide association data from 12 cohorts of individuals with epilepsy and controls from population-based datasets. Controls were ethnically matched with cases. We phenotyped individuals with epilepsy into categories of genetic generalised epilepsy, focal epilepsy, or unclassified epilepsy. After standardised filtering for quality control and imputation to account for different genotyping platforms across sites, investigators at each site conducted a linear mixed-model association analysis for each dataset. Combining summary statistics, we conducted fixed-effects meta-analyses of all epilepsy, focal epilepsy, and genetic generalised epilepsy. We set the genome-wide significance threshold at p<1·66 × 10⁻⁸. Findings: We included 8696 cases and 26 157 controls in our analysis. Meta-analysis of the all-epilepsy cohort identified loci at 2q24.3 (p=8·71 × 10⁻¹⁰), implicating SCN1A, and at 4p15.1 (p=5·44 × 10⁻⁹), harbouring PCDH7, which encodes a protocadherin molecule not previously implicated in epilepsy. For the cohort of genetic generalised epilepsy, we noted a single signal at 2p16.1 (p=9·99 × 10⁻⁹), implicating VRK2 or FANCL. No single nucleotide polymorphism achieved genome-wide significance for focal epilepsy. 
Interpretation: This meta-analysis describes a new locus not previously implicated in epilepsy and provides further evidence about the genetic architecture of these disorders, with the ultimate aim of assisting in disease classification and prognosis. The data suggest that specific loci can act pleiotropically raising risk for epilepsy broadly, or can have effects limited to a specific epilepsy subtype. Future genetic analyses might benefit from both lumping (ie, grouping of epilepsy types together) or splitting (ie, analysis of specific clinical subtypes). Funding: International League Against Epilepsy and multiple governmental and philanthropic agencies. PMID:25087078
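
    A fixed-effects meta-analysis of per-site summary statistics is commonly computed by inverse-variance weighting. The sketch below shows that calculation for a single variant. It assumes each site reports an effect estimate and its standard error; the abstract does not specify the weighting scheme, so this is an assumption, and the function name and input values are hypothetical. Only the significance threshold is taken from the abstract.

```python
from math import erfc, sqrt

GENOME_WIDE_ALPHA = 1.66e-8  # threshold quoted in the abstract


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance fixed-effects estimate for one variant.

    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect estimate")
    if any(se <= 0 for se in standard_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p = erfc(abs(z) / sqrt(2.0))  # two-sided normal tail probability
    return pooled_beta, pooled_se, z, p


# Hypothetical per-site estimates (log odds ratios) for one SNP.
beta, se, z, p = fixed_effects_meta([0.20, 0.20], [0.10, 0.10])
print(beta, se, z, p, p < GENOME_WIDE_ALPHA)
```

    Pooling shrinks the standard error as sites are added, which is why combining cohorts can reveal loci that no single cohort has the power to detect.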

  13. Valuing the benefits of genetic testing for retinitis pigmentosa: a pilot application of the contingent valuation method.

    PubMed

    Eden, Martin; Payne, Katherine; Combs, Ryan M; Hall, Georgina; McAllister, Marion; Black, Graeme C M

    2013-08-01

    Technological advances present an opportunity for more people with, or at risk of, developing retinitis pigmentosa (RP) to be offered genetic testing. Valuation of these tests using current evaluative frameworks is problematic since benefits may be derived from diagnostic information rather than improvements in health. This pilot study aimed to explore if contingent valuation method (CVM) can be used to value the benefits of genetic testing for RP. CVM was used to elicit willingness-to-pay (WTP) values for (1) genetic counselling and (2) genetic counselling with genetic testing. Telephone and face-to-face interviews with a purposive sample of individuals with (n=25), and without (n=27), prior experience of RP were used to explore the feasibility and validity of CVM in this context. Faced with a hypothetical scenario, the majority of participants stated that they would seek genetic counselling and testing in the context of RP. Between participant groups, respondents offered similar justifications for stated WTP values. Overall stated WTP was higher for genetic counselling plus testing (median=£524.00) compared with counselling alone (median=£224.50). Between-group differences in stated WTP were statistically significant; participants with prior knowledge of the condition were willing to pay more for genetic ophthalmology services. Participants were able to attach a monetary value to the perceived potential benefit that genetic testing offered regardless of prior experience of the condition. This exploratory work represents an important step towards evaluating these services using formal cost-benefit analysis.

  14. The effect of genetic bottlenecks and inbreeding on the incidence of two major autoimmune diseases in standard poodles, sebaceous adenitis and Addison's disease.

    PubMed

    Pedersen, Niels C; Brucker, Lynn; Tessier, Natalie Green; Liu, Hongwei; Penedo, Maria Cecilia T; Hughes, Shayne; Oberbauer, Anita; Sacks, Ben

    2015-01-01

    Sebaceous adenitis (SA) and Addison's disease (AD) increased rapidly in incidence among Standard Poodles after the mid-twentieth century. Previous attempts to identify specific genetic causes using genome wide association studies and interrogation of the dog leukocyte antigen (DLA) region have been non-productive. However, such studies led us to hypothesize that positive selection for desired phenotypic traits that arose in the mid-twentieth century led to intense inbreeding and the inadvertent amplification of AD and SA associated traits. This hypothesis was tested with genetic studies of 761 Standard, Miniature, and Miniature/Standard Poodle crosses from the USA, Canada and Europe, coupled with extensive pedigree analysis of thousands more dogs. Genome-wide diversity across the world-wide population was measured using a panel of 33 short tandem repeat (STR) loci. Allele frequency data were also used to determine the internal relatedness of individual dogs within the population as a whole. Assays based on linkage between STR genomic loci and DLA genes were used to identify class I and II haplotypes and disease associations. Genetic diversity statistics based on genomic STR markers indicated that Standard Poodles from North America and Europe were closely related and reasonably diverse across the breed. However, genetic diversity statistics, internal relatedness, principal coordinate analysis, and DLA haplotype frequencies showed a marked imbalance with 30 % of the diversity in 70 % of the dogs. Standard Poodles with SA and AD were strongly linked to this inbred population, with dogs suffering with SA being the most inbred. No single strong association was found between STR defined DLA class I or II haplotypes and SA or AD in the breed as a whole, although certain haplotypes present in a minority of the population appeared to confer moderate degrees of risk or protection against either or both diseases. 
Dogs possessing minor DLA class I haplotypes were half as likely to develop SA or AD as dogs with common haplotypes. Miniature/Standard Poodle crosses being used for outcrossing were more genetically diverse than Standard Poodles and genetically distinguishable across the genome and in the DLA class I and II region. Ancestral genetic polymorphisms responsible for SA and AD entered Standard Poodles through separate lineages, AD earlier and SA later, and were increasingly fixed by a period of close linebreeding that was related to popular bloodlines from the mid-twentieth century. This event has become known as the midcentury bottleneck or MCB. Sustained positive selection resulted in a marked imbalance in genetic diversity across the genome and in the DLA class I and II region. Both SA and AD were concentrated among the most inbred dogs, with genetic outliers being relatively disease free. No specific genetic markers other than those reflecting the degree of inbreeding were consistently associated with either disease. Standard Poodles as a whole remain genetically diverse, but steps should be taken to rebalance diversity using genetic outliers and if necessary, outcrosses to phenotypically similar but genetically distinct breeds.

  15. SNiPlay: a web-based tool for detection, management and analysis of SNPs. Application to grapevine diversity projects.

    PubMed

    Dereeper, Alexis; Nicolas, Stéphane; Le Cunff, Loïc; Bacilieri, Roberto; Doligez, Agnès; Peros, Jean-Pierre; Ruiz, Manuel; This, Patrice

    2011-05-05

    High-throughput re-sequencing, new genotyping technologies and the availability of reference genomes allow the extensive characterization of Single Nucleotide Polymorphisms (SNPs) and insertion/deletion events (indels) in many plant species. The rapidly increasing amount of re-sequencing and genotyping data generated by large-scale genetic diversity projects requires the development of integrated bioinformatics tools able to efficiently manage, analyze, and combine these genetic data with genome structure and external data. In this context, we developed SNiPlay, a flexible, user-friendly and integrative web-based tool dedicated to polymorphism discovery and analysis. It integrates: 1) a pipeline, freely accessible through the internet, combining existing software with new tools to detect SNPs and to compute different types of statistical indices and graphical layouts for SNP data. From standard sequence alignments, genotyping data or Sanger sequencing traces given as input, SNiPlay detects SNP and indel events and outputs submission files for the design of Illumina's SNP chips. Subsequently, it sends sequences and genotyping data into a series of modules in charge of various processes: physical mapping to a reference genome, annotation (genomic position, intron/exon location, synonymous/non-synonymous substitutions), SNP frequency determination in user-defined groups, haplotype reconstruction and network, linkage disequilibrium evaluation, and diversity analysis (Pi, Watterson's Theta, Tajima's D). Furthermore, the pipeline allows the use of external data (such as phenotype, geographic origin, taxa, stratification) to define groups and compare statistical indices. 2) a database storing polymorphisms, genotyping data and grapevine sequences released by public and private projects. 
It allows the user to retrieve SNPs using various filters (such as genomic position, missing data, polymorphism type, allele frequency), to compare SNP patterns between populations, and to export genotyping data or sequences in various formats. Our experiments on grapevine genetic projects showed that SNiPlay allows geneticists to rapidly obtain advanced results in several key research areas of plant genetic diversity. Both the management and treatment of large amounts of SNP data are rendered considerably easier for end-users through automation and integration. Current developments are taking into account new advances in high-throughput technologies. SNiPlay is available at: http://sniplay.cirad.fr/.
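
    The diversity indices named above (Pi, Watterson's Theta, Tajima's D) can be computed from an alignment with the textbook formulas. The sketch below is a minimal version under stated assumptions: equal-length sequences with no gaps or missing data, Pi and Theta expressed per sequence rather than per site, and Tajima's D returned only for four or more sequences with at least one segregating site. It is not SNiPlay's code, and the function name and toy alignment are hypothetical.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(sequences):
    """Return (pi, theta_w, tajima_d) for aligned, gap-free sequences."""
    n = len(sequences)
    if n < 2 or len({len(s) for s in sequences}) != 1:
        raise ValueError("need at least two sequences of equal length")
    # S: number of alignment columns carrying more than one allele.
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    # Pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    theta_w = seg_sites / a1
    if n < 4 or seg_sites == 0:
        return pi, theta_w, None  # variance term is zero or undefined
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return pi, theta_w, (pi - theta_w) / sqrt(variance)


# Toy alignment of four hypothetical haplotypes.
pi, theta_w, tajima_d = diversity_stats(["AAAA", "AAAT", "AATT", "ATTT"])
print(pi, theta_w, tajima_d)
```

    Tajima's D contrasts the two estimators: values near zero are expected under neutrality and constant population size, while strongly negative or positive values point to an excess of rare or intermediate-frequency variants respectively.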

  16. Inclusion probability for DNA mixtures is a subjective one-sided match statistic unrelated to identification information

    PubMed Central

    Perlin, Mark William

    2015-01-01

    Background: DNA mixtures of two or more people are a common type of forensic crime scene evidence. A match statistic that connects the evidence to a criminal defendant is usually needed for court. Jurors rely on this strength of match to help decide guilt or innocence. However, the reliability of unsophisticated match statistics for DNA mixtures has been questioned. Materials and Methods: The most prevalent match statistic for DNA mixtures is the combined probability of inclusion (CPI), used by crime labs for over 15 years. When testing 13 short tandem repeat (STR) genetic loci, the CPI⁻¹ value is typically around a million, regardless of DNA mixture composition. However, actual identification information, as measured by a likelihood ratio (LR), spans a much broader range. This study examined probability of inclusion (PI) mixture statistics for 517 locus experiments drawn from 16 reported cases and compared them with LR locus information calculated independently on the same data. The log(PI⁻¹) values were examined and compared with corresponding log(LR) values. Results: The LR and CPI methods were compared in case examples of false inclusion, false exclusion, a homicide, and criminal justice outcomes. Statistical analysis of crime laboratory STR data shows that inclusion match statistics exhibit a truncated normal distribution having zero center, with little correlation to actual identification information. By the law of large numbers (LLN), CPI⁻¹ increases with the number of tested genetic loci, regardless of DNA mixture composition or match information. These statistical findings explain why CPI is relatively constant, with implications for DNA policy, criminal justice, cost of crime, and crime prevention. Conclusions: Forensic crime laboratories have generated CPI statistics on hundreds of thousands of DNA mixture evidence items. 
However, this commonly used match statistic behaves like a random generator of inclusionary values, following the LLN rather than measuring identification information. A quantitative CPI number adds little meaningful information beyond the analyst's initial qualitative assessment that a person's DNA is included in a mixture. Statistical methods for reporting on DNA mixture evidence should be scientifically validated before they are relied upon by criminal justice. PMID:26605124
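
    For readers unfamiliar with the statistic, the combined probability of inclusion is conventionally computed per locus as the squared sum of the population frequencies of the alleles observed in the mixture, then multiplied across loci. The sketch below follows that convention; the function names and allele frequencies are hypothetical and do not come from the study's casework data.

```python
from math import prod


def locus_pi(mixture_allele_freqs):
    """Probability of inclusion at one locus: (sum of allele frequencies)^2."""
    total = sum(mixture_allele_freqs)
    if total <= 0.0 or total > 1.0 + 1e-9:
        raise ValueError("allele frequencies must sum to a value in (0, 1]")
    return total ** 2


def combined_pi(loci):
    """CPI: product of per-locus PI values, assuming independent loci."""
    return prod(locus_pi(freqs) for freqs in loci)


# Hypothetical mixture: 13 loci whose observed alleles each sum to 0.6.
loci = [[0.25, 0.20, 0.15]] * 13
cpi = combined_pi(loci)
inverse_cpi = 1.0 / cpi
print(cpi, inverse_cpi)
```

    With these made-up inputs the inverse CPI lands between 10^5 and 10^6. Note that the calculation uses only which alleles are present, never peak heights or the suspect's genotype, which is consistent with the abstract's point that the number grows with the count of loci rather than with identification information.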

  17. Inclusion probability for DNA mixtures is a subjective one-sided match statistic unrelated to identification information.

    PubMed

    Perlin, Mark William

    2015-01-01

    DNA mixtures of two or more people are a common type of forensic crime scene evidence. A match statistic that connects the evidence to a criminal defendant is usually needed for court. Jurors rely on this strength of match to help decide guilt or innocence. However, the reliability of unsophisticated match statistics for DNA mixtures has been questioned. The most prevalent match statistic for DNA mixtures is the combined probability of inclusion (CPI), used by crime labs for over 15 years. When testing 13 short tandem repeat (STR) genetic loci, the CPI(-1) value is typically around a million, regardless of DNA mixture composition. However, actual identification information, as measured by a likelihood ratio (LR), spans a much broader range. This study examined probability of inclusion (PI) mixture statistics for 517 locus experiments drawn from 16 reported cases and compared them with LR locus information calculated independently on the same data. The log(PI(-1)) values were examined and compared with corresponding log(LR) values. The LR and CPI methods were compared in case examples of false inclusion, false exclusion, a homicide, and criminal justice outcomes. Statistical analysis of crime laboratory STR data shows that inclusion match statistics exhibit a truncated normal distribution having zero center, with little correlation to actual identification information. By the law of large numbers (LLN), CPI(-1) increases with the number of tested genetic loci, regardless of DNA mixture composition or match information. These statistical findings explain why CPI is relatively constant, with implications for DNA policy, criminal justice, cost of crime, and crime prevention. Forensic crime laboratories have generated CPI statistics on hundreds of thousands of DNA mixture evidence items. However, this commonly used match statistic behaves like a random generator of inclusionary values, following the LLN rather than measuring identification information. 
A quantitative CPI number adds little meaningful information beyond the analyst's initial qualitative assessment that a person's DNA is included in a mixture. Statistical methods for reporting on DNA mixture evidence should be scientifically validated before they are relied upon by criminal justice.

  18. The score statistic of the LD-lod analysis: detecting linkage adaptive to linkage disequilibrium.

    PubMed

    Huang, J; Jiang, Y

    2001-01-01

    We study the properties of a modified lod score method for testing linkage that incorporates linkage disequilibrium (LD-lod). By examination of its score statistic, we show that the LD-lod score method adaptively combines two sources of information: (a) the IBD sharing score which is informative for linkage regardless of the existence of LD and (b) the contrast between allele-specific IBD sharing scores which is informative for linkage only in the presence of LD. We also consider the connection between the LD-lod score method and the transmission-disequilibrium test (TDT) for triad data and the mean test for affected sib pair (ASP) data. We show that, for triad data, the recessive LD-lod test is asymptotically equivalent to the TDT; and for ASP data, it is an adaptive combination of the TDT and the ASP mean test. We demonstrate that the LD-lod score method has relatively good statistical efficiency in comparison with the ASP mean test and the TDT for a broad range of LD and the genetic models considered in this report. Therefore, the LD-lod score method is an interesting approach for detecting linkage when the extent of LD is unknown, such as in a genome-wide screen with a dense set of genetic markers. Copyright 2001 S. Karger AG, Basel

  19. Genetic diversity analysis of cyanogenic potential (CNp) of root among improved genotypes of cassava using simple sequence repeat markers.

    PubMed

    Moyib, O K; Mkumbira, J; Odunola, O A; Dixon, A G

    2012-12-01

    Cyanogenic potential (CNp) of cassava constitutes a serious problem for over 500 million people who rely on the crop as their main source of calories. Genetic diversity is a key to successful crop improvement for breeding new improved variability for target traits. Forty-three improved genotypes of cassava developed by the International Institute of Tropical Agriculture (IITA), Ibadan, were characterized for the CNp trait using 35 Simple Sequence Repeat (SSR) markers. An essential colorimetric picric test was used for evaluation of CNp on a color scale of 1 to 14. The CNp scores obtained ranged from 3 to 9, with a mean score of 5.48 (+/- 0.09) based on the Statistical Analysis System (SAS) package. TMS M98/0068 (4.0 +/- 0.25) was identified as the best genotype with low CNp while TMS M98/0028 (7.75 +/- 0.25) was the worst. The 43 genotypes were assigned into 7 phenotypic groups based on rank-sum analysis in SAS. Dissimilarity Analysis and Representation for Windows generated a phylogenetic tree with 5 clusters which represented hybridizing groups. Each of the clusters (except 4) contained low CNp genotypes that could be used for improving the high CNp genotypes in the same or near cluster. The scatter plot of the genotypes showed that there was little or no demarcation for phenotypic CNp groupings in the molecular groupings. The result of this study demonstrated that SSR markers are powerful tools for the assessment of genetic variability, and proper identification and selection of parents for genetic improvement of low CNp trait among the IITA cassava collection.

  20. The genetic consequences of selection in natural populations.

    PubMed

    Thurman, Timothy J; Barrett, Rowan D H

    2016-04-01

    The selection coefficient, s, quantifies the strength of selection acting on a genetic variant. Despite this parameter's central importance to population genetic models, until recently we have known relatively little about the value of s in natural populations. With the development of molecular genetic techniques in the late 20th century and the sequencing technologies that followed, biologists are now able to identify genetic variants and directly relate them to organismal fitness. We reviewed the literature for published estimates of natural selection acting at the genetic level and found over 3000 estimates of selection coefficients from 79 studies. Selection coefficients were roughly exponentially distributed, suggesting that the impact of selection at the genetic level is generally weak but can occasionally be quite strong. We used both nonparametric statistics and formal random-effects meta-analysis to determine how selection varies across biological and methodological categories. Selection was stronger when measured over shorter timescales, with the mean magnitude of s greatest for studies that measured selection within a single generation. Our analyses found conflicting trends when considering how selection varies with the genetic scale (e.g., SNPs or haplotypes) at which it is measured, suggesting a need for further research. Besides these quantitative conclusions, we highlight key issues in the calculation, interpretation, and reporting of selection coefficients and provide recommendations for future research. © 2016 John Wiley & Sons Ltd.

  1. Genome-Wide Analysis in Brazilians Reveals Highly Differentiated Native American Genome Regions

    PubMed Central

    Havt, Alexandre; Nayak, Uma; Pinkerton, Relana; Farber, Emily; Concannon, Patrick; Lima, Aldo A.; Guerrant, Richard L.

    2017-01-01

    Despite its population, geographic size, and emerging economic importance, disproportionately little genome-scale research exists into genetic factors that predispose Brazilians to disease, or the population genetics of risk. After identification of suitable proxy populations and careful analysis of tri-continental admixture in 1,538 North-Eastern Brazilians to estimate individual ancestry and ancestral allele frequencies, we computed 400,000 genome-wide locus-specific branch length (LSBL) Fst statistics of Brazilian Amerindian ancestry compared to European and African; and a similar set of differentiation statistics for their Amerindian component compared with the closest Asian 1000 Genomes population (surprisingly, Bengalis in Bangladesh). After ranking SNPs by these statistics, we identified the top 10 highly differentiated SNPs in five genome regions in the LSBL tests of Brazilian Amerindian ancestry compared to European and African; and the top 10 SNPs in eight regions comparing their Amerindian component to the closest Asian 1000 Genomes population. We found SNPs within or proximal to the genes CIITA (rs6498115), SMC6 (rs1834619), and KLHL29 (rs2288697) were most differentiated in the Amerindian-specific branch, while SNPs in the genes ADAMTS9 (rs7631391), DOCK2 (rs77594147), SLC28A1 (rs28649017), ARHGAP5 (rs7151991), and CIITA (rs45601437) were most highly differentiated in the Asian comparison. These genes are known to influence immune function, metabolic and anthropometry traits, and embryonic development. These analyses have identified candidate genes for selection within Amerindian ancestry, and by comparison of the two analyses, those for which the differentiation may have arisen during the migration from Asia to the Americas. PMID:28100790
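
    The locus-specific branch length (LSBL) statistic named in this abstract can be sketched as follows. The LSBL for one population in a trio is conventionally built from the three pairwise Fst values as (F_AB + F_AC - F_BC) / 2. The per-SNP Fst estimator used here (a simple heterozygosity-based form with equal population weights) is an assumption made for illustration; the abstract does not say which estimator the authors used, and the allele frequencies are hypothetical.

```python
def pairwise_fst(p1, p2):
    """Per-SNP Fst for a biallelic locus from two allele frequencies.

    Assumed estimator: (HT - HS) / HT with equal population weights.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:  # locus fixed for the same allele in both populations
        return 0.0
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total

def lsbl(p_focal, p_other1, p_other2):
    """Branch length leading to the focal population in a three-population tree."""
    return (pairwise_fst(p_focal, p_other1)
            + pairwise_fst(p_focal, p_other2)
            - pairwise_fst(p_other1, p_other2)) / 2

# Hypothetical frequencies of one SNP allele in the Amerindian, European
# and African ancestral components.
print(lsbl(0.85, 0.10, 0.15))
```

    Ranking SNPs by this value highlights loci where the focal population differs from both of the others, which is the ranking step the abstract describes.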

  2. Temporal stability of parasite distribution and genetic variability values of Contracaecum osculatum sp. D and C. osculatum sp. E (Nematoda: Anisakidae) from fish of the Ross Sea (Antarctica)

    PubMed Central

    Mattiucci, Simonetta; Cipriani, Paolo; Paoletti, Michela; Nardi, Valentina; Santoro, Mario; Bellisario, Bruno; Nascetti, Giuseppe

    2015-01-01

    The Ross Sea, Eastern Antarctica, is considered a “pristine ecosystem” and a biodiversity “hotspot” scarcely impacted by humans. The sibling species Contracaecum osculatum sp. D and C. osculatum sp. E are anisakid parasites embedded in the natural Antarctic marine ecosystem. The aims of this study were to: identify the larvae of C. osculatum (s.l.) recovered in fish hosts during the XXVII Italian Expedition to Antarctica (2011–2012); perform a comparative analysis of the contemporary parasitic load and genetic variability estimates of C. osculatum sp. D and C. osculatum sp. E with respect to samples collected during the expedition of 1993–1994; and provide ecological data on these parasites. A total of 200 fish specimens (Chionodraco hamatus, Trematomus bernacchii, Trematomus hansoni, Trematomus newnesi) were analysed for Contracaecum sp. larvae, identified at species level by allozyme diagnostic markers and sequence analysis of the mtDNA cox2 gene. Statistically significant differences were found between the occurrence of C. osculatum sp. D and C. osculatum sp. E in different fish species. C. osculatum sp. E was more prevalent in T. bernacchii, while a higher percentage of C. osculatum sp. D occurred in Ch. hamatus and T. hansoni. The two species also showed differences in the host infection site: C. osculatum sp. D showed a higher percentage of infection in the fish liver. High genetic variability values at both nuclear and mitochondrial level were found in the two species in both sampling periods. The parasitic infection levels by C. osculatum sp. D and sp. E and their estimates of genetic variability showed no statistically significant variation over a temporal scale (2012 versus 1994). This suggests that the low habitat disturbance of the Antarctic region permits the maintenance of stable ecosystem trophic webs, which contributes to the maintenance of large populations of anisakid nematodes with high genetic variability. PMID:26767164

  3. Re-evaluating causal modeling with mantel tests in landscape genetics

    Treesearch

    Samuel A. Cushman; Tzeidle N. Wasserman; Erin L. Landguth; Andrew J. Shirk

    2013-01-01

    The predominant analytical approach to associate landscape patterns with gene flow processes is based on the association of cost distances with genetic distances between individuals. Mantel and partial Mantel tests have been the dominant statistical tools used to correlate cost distances and genetic distances in landscape genetics. However, the inherent high...

  4. Genetic structure of four socio-culturally diversified caste populations of southwest India and their affinity with related Indian and global groups.

    PubMed

    Rajkumar, Revathi; Kashyap, V K

    2004-08-19

    A large number of microsatellites have been extensively used to comprehend the genetic diversity of different global groups. This paper entails polymorphism at 15 STR in four predominant and endogamous populations representing Karnataka, located on the southwest coast of India. The populations residing in this region are believed to have received gene flow from south Indian populations and world migrants, hence, we carried out a detailed study on populations inhabiting this region to understand their genetic structure, diversity related to geography and linguistic affiliation and relatedness to other Indian and global migrant populations. Various statistical analyses were performed on the microsatellite data to accomplish the objectives of the paper. The heterozygosity was moderately high and similar across the loci, with low average GST value. Iyengar and Lyngayat were placed above the regression line in the R-matrix analysis as opposed to the Gowda and Muslim. AMOVA indicated that the majority of variation was confined to individuals within a population, with geographic grouping demonstrating lesser genetic differentiation as compared to linguistic clustering. DA distances show the genetic affinity among the southern populations, with Iyengar, Lyngayat and Vanniyar displaying some affinity with northern Brahmins and global migrant groups from East Asia and Europe. The microsatellite study divulges a common ancestry for the four diverse populations of Karnataka, with the overall genetic differentiation among them being largely confined to intra-population variation. The practice of consanguineous marriages might have contributed to the relatively lower gene flow displayed by Gowda and Muslim as compared to Iyengar and Lyngayat.
The various statistical analyses strongly suggest that the studied populations could not be differentiated on the basis of caste or spatial location, although, linguistic affinity was reflected among the southern populations, distinguishing them from the northern groups. Our study also indicates a heterogeneous origin for Lyngayat and Iyengar owing to their genetic proximity with southern populations and northern Brahmins. The high-ranking communities, in particular, Iyengar, Lyngayat, Vanniyar and northern Brahmins might have experienced genetic admixture from East Asian and European ethnic groups.

  5. Genetic structure of four socio-culturally diversified caste populations of southwest India and their affinity with related Indian and global groups

    PubMed Central

    Rajkumar, Revathi; Kashyap, VK

    2004-01-01

    Background A large number of microsatellites have been extensively used to comprehend the genetic diversity of different global groups. This paper entails polymorphism at 15 STR in four predominant and endogamous populations representing Karnataka, located on the southwest coast of India. The populations residing in this region are believed to have received gene flow from south Indian populations and world migrants, hence, we carried out a detailed study on populations inhabiting this region to understand their genetic structure, diversity related to geography and linguistic affiliation and relatedness to other Indian and global migrant populations. Results Various statistical analyses were performed on the microsatellite data to accomplish the objectives of the paper. The heterozygosity was moderately high and similar across the loci, with low average GST value. Iyengar and Lyngayat were placed above the regression line in the R-matrix analysis as opposed to the Gowda and Muslim. AMOVA indicated that the majority of variation was confined to individuals within a population, with geographic grouping demonstrating lesser genetic differentiation as compared to linguistic clustering. DA distances show the genetic affinity among the southern populations, with Iyengar, Lyngayat and Vanniyar displaying some affinity with northern Brahmins and global migrant groups from East Asia and Europe. Conclusion The microsatellite study divulges a common ancestry for the four diverse populations of Karnataka, with the overall genetic differentiation among them being largely confined to intra-population variation. The practice of consanguineous marriages might have contributed to the relatively lower gene flow displayed by Gowda and Muslim as compared to Iyengar and Lyngayat.
The various statistical analyses strongly suggest that the studied populations could not be differentiated on the basis of caste or spatial location, although, linguistic affinity was reflected among the southern populations, distinguishing them from the northern groups. Our study also indicates a heterogeneous origin for Lyngayat and Iyengar owing to their genetic proximity with southern populations and northern Brahmins. The high-ranking communities, in particular, Iyengar, Lyngayat, Vanniyar and northern Brahmins might have experienced genetic admixture from East Asian and European ethnic groups. PMID:15317657

  6. An alternative covariance estimator to investigate genetic heterogeneity in populations.

    PubMed

    Heslot, Nicolas; Jannink, Jean-Luc

    2015-11-26

    For genomic prediction and genome-wide association studies (GWAS) using mixed models, covariance between individuals is estimated using molecular markers. Based on the properties of mixed models, using available molecular data for prediction is optimal if this covariance is known. Under this assumption, adding individuals to the analysis should never be detrimental. However, some empirical studies showed that increasing training population size decreased prediction accuracy. Recently, results from theoretical models indicated that even if marker density is high and the genetic architecture of traits is controlled by many loci with small additive effects, the covariance between individuals, which depends on relationships at causal loci, is not always well estimated by the whole-genome kinship. We propose an alternative covariance estimator named K-kernel, to account for potential genetic heterogeneity between populations that is characterized by a lack of genetic correlation, and to limit the information flow between a priori unknown populations in a trait-specific manner. This is similar to a multi-trait model and parameters are estimated by REML and, in extreme cases, it can allow for an independent genetic architecture between populations. As such, K-kernel is useful to study the problem of the design of training populations. K-kernel was compared to other covariance estimators or kernels to examine its fit to the data, cross-validated accuracy and suitability for GWAS on several datasets. It provides a significantly better fit to the data than the genomic best linear unbiased prediction model and, in some cases it performs better than other kernels such as the Gaussian kernel, as shown by an empirical null distribution. In GWAS simulations, alternative kernels control type I errors as well as or better than the classical whole-genome kinship and increase statistical power. No or small gains were observed in cross-validated prediction accuracy. 
This alternative covariance estimator can be used to gain insight into trait-specific genetic heterogeneity by identifying relevant sub-populations that lack genetic correlation between them. Genetic correlation can be 0 between identified sub-populations by performing automatic selection of relevant sets of individuals to be included in the training population. It may also increase statistical power in GWAS.

  7. ENGINES: exploring single nucleotide variation in entire human genomes.

    PubMed

    Amigo, Jorge; Salas, Antonio; Phillips, Christopher

    2011-04-19

    Next generation ultra-sequencing technologies are starting to produce extensive quantities of data from entire human genome or exome sequences, and therefore new software is needed to present and analyse this vast amount of information. The 1000 Genomes project has recently released raw data for 629 complete genomes representing several human populations through their Phase I interim analysis and, although there are certain public tools available that allow exploration of these genomes, to date there is no tool that permits comprehensive population analysis of the variation catalogued by such data. We have developed a genetic variant site explorer able to retrieve data for Single Nucleotide Variation (SNVs), population by population, from entire genomes without compromising future scalability and agility. ENGINES (ENtire Genome INterface for Exploring SNVs) uses data from the 1000 Genomes Phase I to demonstrate its capacity to handle large amounts of genetic variation (>7.3 billion genotypes and 28 million SNVs), as well as deriving summary statistics of interest for medical and population genetics applications. The whole dataset is pre-processed and summarized into a data mart accessible through a web interface. The query system allows the combination and comparison of each available population sample, while searching by rs-number list, chromosome region, or genes of interest. Frequency and FST filters are available to further refine queries, while results can be visually compared with other large-scale Single Nucleotide Polymorphism (SNP) repositories such as HapMap or Perlegen. ENGINES is capable of accessing large-scale variation data repositories in a fast and comprehensive manner. It allows quick browsing of whole genome variation, while providing statistical information for each variant site such as allele frequency, heterozygosity or FST values for genetic differentiation. 
Access to the data mart generating scripts and to the web interface is granted from http://spsmart.cesga.es/engines.php. © 2011 Amigo et al; licensee BioMed Central Ltd.

  8. Application of the lymphocyte Cytokinesis-Block Micronucleus Assay to populations exposed to petroleum and its derivatives: Results from a systematic review and meta-analysis.

    PubMed

    Angelini, Sabrina; Bermejo, Justo Lorenzo; Ravegnini, Gloria; Sammarini, Giulia; Hrelia, Patrizia

    The lymphocyte cytokinesis-block micronucleus (CBMN) assay is applied in many different in vivo biomonitoring studies of human exposure to genotoxic chemicals. Among the chemicals extensively investigated, we identified petroleum and its derivatives, in particular benzene and the most common mixture of benzene, toluene, and xylene. Although conflicting results have been reported on the effects of benzene exposure, the number of positive findings in independent studies suggests that occupational exposure to benzene causes DNA damage in peripheral blood lymphocytes. To assess current evidence on this hypothesis, we conducted a meta-analysis. Our aim was to evaluate the effect of benzene exposure on genetic damage, quantified using the CBMN assay on individuals occupationally exposed to petroleum and its derivatives. Statistical analyses were conducted using the rmeta package from the free Software Environment for Statistical Computing R. Combined study results indicated that benzene exposure is associated with an increased level of genetic damage in peripheral blood lymphocytes, as reflected by an increased MN frequency. The summary mean difference in MN frequency between exposed and unexposed individuals was 1.64 (95% CI: 0.80-2.47). Overall, this finding points to MN frequency as a sensitive biomarker which could be used to evaluate genetic damage induced by occupational (industrial or environmental) exposure to benzene. This review also identified some important knowledge gaps as well as the need for large, well-designed studies. In particular, it is fundamental to accurately characterize the investigated population, including dietary habits and genetic variability which could modulate MN frequency in both exposed individuals and unexposed controls.
In conclusion, according to present findings the use of the CBMN assay in biomonitoring studies could provide objective evidence to guide prioritization of preventive interventions in subjects occupationally exposed to petroleum derivatives, and in particular benzene. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. 16th IHIW: analysis of HLA population data, with updated results for 1996 to 2012 workshop data (AHPD project report).

    PubMed

    Riccio, M E; Buhler, S; Nunes, J M; Vangenot, C; Cuénod, M; Currat, M; Di, D; Andreani, M; Boldyreva, M; Chambers, G; Chernova, M; Chiaroni, J; Darke, C; Di Cristofaro, J; Dubois, V; Dunn, P; Edinur, H A; Elamin, N; Eliaou, J-F; Grubic, Z; Jaatinen, T; Kanga, U; Kervaire, B; Kolesar, L; Kunachiwa, W; Lokki, M L; Mehra, N; Nicoloso, G; Paakkanen, R; Voniatis, D Papaioannou; Papasteriades, C; Poli, F; Richard, L; Romón Alonso, I; Slavčev, A; Sulcebe, G; Suslova, T; Testi, M; Tiercy, J-M; Varnavidou, A; Vidan-Jeras, B; Wennerström, A; Sanchez-Mazas, A

    2013-02-01

    We present here the results of the Analysis of HLA Population Data (AHPD) project of the 16th International HLA and Immunogenetics Workshop (16IHIW) held in Liverpool in May-June 2012. Thanks to the collaboration of 25 laboratories from 18 different countries, HLA genotypic data for 59 new population samples (either well-defined populations or donor registry samples) were gathered and 55 were analysed statistically following HLA-NET recommendations. The new data included, among others, large sets of well-defined populations from north-east Europe and West Asia, as well as many donor registry data from European countries. The Gene[rate] computer tools were combined to create a Gene[rate] computer pipeline to automatically (i) estimate allele frequencies by an expectation-maximization algorithm accommodating ambiguities, (ii) estimate heterozygosity, (iii) test for Hardy-Weinberg equilibrium (HWE), (iv) test for selective neutrality, (v) generate frequency graphs and summary statistics for each sample at each locus and (vi) plot multidimensional scaling (MDS) analyses comparing the new samples with previous IHIW data. Intrapopulation analyses show that HWE is rarely rejected, while neutrality tests often indicate a significant excess of heterozygotes compared with neutral expectations. The comparison of the 16IHIW AHPD data with data collected during previous workshops (12th-15th) shows that geography is an excellent predictor of HLA genetic differentiations for HLA-A, -B and -DRB1 loci but not for HLA-DQ, whose patterns are probably more influenced by natural selection. In Europe, HLA genetic variation clearly follows a north to south-east axis despite a low level of differentiation between European, North African and West Asian populations. Pacific populations are genetically close to Austronesian-speaking South-East Asian and Taiwanese populations, in agreement with current theories on the peopling of Oceania. 
Thanks to this project, HLA genetic variation is more clearly defined worldwide and better interpreted in relation to human peopling history and HLA molecular evolution. © 2012 Blackwell Publishing Ltd.
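
    Step (iii) of the pipeline described above, the test for Hardy-Weinberg equilibrium, can be illustrated with a simplified sketch. It is not the Gene[rate] implementation: allele frequencies are obtained here by direct counting from unambiguous genotypes rather than by the expectation-maximization algorithm the abstract mentions, and a plain chi-square goodness-of-fit statistic stands in for whatever test the pipeline applies. The genotype data and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement

def allele_frequencies(genotypes):
    """Estimate allele frequencies by counting (assumes no typing ambiguities)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def hwe_chi_square(genotypes):
    """Chi-square statistic comparing observed genotype counts with HWE expectations."""
    n = len(genotypes)
    freqs = allele_frequencies(genotypes)
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        # Expected count: n * p^2 for homozygotes, n * 2pq for heterozygotes.
        proportion = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
        expected = n * proportion
        chi2 += (observed[(a, b)] - expected) ** 2 / expected
    return chi2

# Hypothetical sample at one locus with two alleles, labelled "A" and "B".
sample = [("A", "A")] * 25 + [("A", "B")] * 50 + [("B", "B")] * 25
print(hwe_chi_square(sample))
```

    For highly polymorphic loci such as HLA, many expected genotype counts are small, so an exact or resampling-based test is usually preferred over the chi-square approximation shown here.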

  10. Studying the Genetics of Complex Disease With Ancestry-Specific Human Phenotype Networks: The Case of Type 2 Diabetes in East Asian Populations.

    PubMed

    Qiu, Jingya; Moore, Jason H; Darabos, Christian

    2016-05-01

    Genome-wide association studies (GWAS) have led to the discovery of over 200 single nucleotide polymorphisms (SNPs) associated with type 2 diabetes mellitus (T2DM). Additionally, East Asians develop T2DM at a higher rate, younger age, and lower body mass index than their European ancestry counterparts. The reason behind this occurrence remains elusive. With comprehensive searches through the National Human Genome Research Institute (NHGRI) GWAS catalog literature, we compiled a database of 2,800 ancestry-specific SNPs associated with T2DM and 70 other related traits. Manual data extraction was necessary because the GWAS catalog reports statistics such as odds ratio and P-value, but does not consistently include ancestry information. Currently, many statistics are derived by combining initial and replication samples from study populations of mixed ancestry. Analysis of all-inclusive data can be misleading, as not all SNPs are transferable across diverse populations. We used ancestry data to construct ancestry-specific human phenotype networks (HPN) centered on T2DM. Quantitative and visual analysis of network models reveals the genetic disparities between ancestry groups. Of the 27 phenotypes in the East Asian HPN, six phenotypes were unique to the network, revealing the underlying ancestry-specific nature of some SNPs associated with T2DM. We studied the relationship between T2DM and five phenotypes unique to the East Asian HPN to generate new interaction hypotheses in a clinical context. The genetic differences found in our ancestry-specific HPNs suggest different pathways are involved in the pathogenesis of T2DM among different populations. Our study underlines the importance of ancestry in the development of T2DM and its implications in pharmacogenetics and personalized medicine. © 2016 The Authors. Genetic Epidemiology published by Wiley Periodicals, Inc.

  11. Improving information retrieval in functional analysis.

    PubMed

    Rodriguez, Juan C; González, Germán A; Fresno, Cristóbal; Llera, Andrea S; Fernández, Elmer A

    2016-12-01

    Transcriptome analysis is essential to understand the mechanisms regulating key biological processes and functions. The first step usually consists of identifying candidate genes; to find out which pathways are affected by those genes, however, functional analysis (FA) is mandatory. The most frequently used strategies for this purpose are Gene Set and Singular Enrichment Analysis (GSEA and SEA) over Gene Ontology. Several statistical methods have been developed and compared in terms of computational efficiency and/or statistical appropriateness. However, whether their results are similar or complementary, the sensitivity to parameter settings, or possible bias in the analyzed terms has not been addressed so far. Here, two GSEA and four SEA methods and their parameter combinations were evaluated in six datasets by comparing two breast cancer subtypes with well-known differences in genetic background and patient outcomes. We show that GSEA and SEA lead to different results depending on the chosen statistic, model and/or parameters. Both approaches provide complementary results from a biological perspective. Hence, an Integrative Functional Analysis (IFA) tool is proposed to improve information retrieval in FA. It provides a common gene expression analytic framework that grants a comprehensive and coherent analysis. Only a minimal user parameter setting is required, since the best SEA/GSEA alternatives are integrated. IFA utility was demonstrated by evaluating four prostate cancer and the TCGA breast cancer microarray datasets, which showed its biological generalization capabilities. Copyright © 2016 Elsevier Ltd. All rights reserved.

  12. From Mendel's discovery on pea to today's plant genetics and breeding : Commemorating the 150th anniversary of the reading of Mendel's discovery.

    PubMed

    Smýkal, Petr; K Varshney, Rajeev; K Singh, Vikas; Coyne, Clarice J; Domoney, Claire; Kejnovský, Eduard; Warkentin, Thomas

    2016-12-01

    This work discusses several selected topics of plant genetics and breeding in relation to the 150th anniversary of the seminal work of Gregor Johann Mendel. In 2015, we celebrated the 150th anniversary of the presentation of the seminal work of Gregor Johann Mendel. While Darwin's theory of evolution was based on differential survival and differential reproductive success, Mendel's theory of heredity relies on equality and stability throughout all stages of the life cycle. Darwin's concepts were continuous variation and "soft" heredity; Mendel espoused discontinuous variation and "hard" heredity. Thus, the combination of Mendelian genetics with Darwin's theory of natural selection was the process that resulted in the modern synthesis of evolutionary biology. Although biology, genetics, and genomics have been revolutionized in recent years, modern genetics will forever rely on simple principles founded on pea breeding using seven single gene characters. Purposeful use of mutants to study gene function is one of the essential tools of modern genetics. Today, over 100 plant species genomes have been sequenced. Mapping populations, and their use in segregation of molecular markers and marker-trait association to map and isolate genes, were developed on the basis of Mendel's work. Genome-wide or genomic selection is a recent approach for the development of improved breeding lines. The analysis of complex traits has been enhanced by high-throughput phenotyping and developments in statistical and modeling methods for the analysis of phenotypic data. Introgression of novel alleles from landraces and wild relatives widens genetic diversity and improves traits; transgenic methodologies allow for the introduction of novel genes from diverse sources, and gene editing approaches offer possibilities to manipulate genes in a precise manner.

  13. PCR/RFLP-based analysis of genetically distinct Plasmodium vivax population of Pvmsp-3α and Pvmsp-3β genes in Pakistan.

    PubMed

    Khan, Shahid Niaz; Khan, Asif; Khan, Sanaullah; Ayaz, Sultan; Attaullah, Sobia; Khan, Jabbar; Khan, Muhammad Asim; Ali, Ijaz; Shah, Abdul Haleem

    2014-09-09

    Plasmodium vivax is one of the widespread human malarial parasites accounting for 75% of malaria epidemics. However, there is no baseline information about the status and nature of genetic variation of Plasmodium species circulating in various parts of Pakistan. The present study was aimed at observing the molecular epidemiology and genetic variation of Plasmodium vivax by analysing its merozoite surface protein-3α (msp-3α) and merozoite surface protein-3β (msp-3β) genes, by using suballele, species-specific, combined nested PCR/RFLP detection techniques. A total of 230 blood samples from suspected subjects tested slide positive for vivax malaria were collected from Punjab, Sindh, Khyber Pakhtunkhwa, and Balochistan during the period May 2012 to December 2013. Combined nested PCR/RFLP technique was conducted using Pvmsp-3α and Pvmsp-3β genetic markers to detect extent of genetic variation in clinical isolates of P. vivax in the studied areas of Pakistan. By PCR, P. vivax, 202/230 (87.82%), was found to be widely distributed in the studied areas. PCR/RFLP analysis showed a high range of allelic variations for both msp-3α and msp-3β genetic markers of P. vivax, i.e., 21 alleles for msp-3α and 19 for msp-3β. Statistically a significant difference (p ≤ 0.05) was observed in the genetic diversity of the suballelic variants of msp-3α and msp-3β genes of P. vivax. It is concluded that P. vivax populations are highly polymorphic and diverse allelic variants of Pvmsp-3α and Pvmsp-3β are present in Pakistan.

  14. High genetic diversity in the offshore island populations of the tephritid fruit fly Bactrocera dorsalis.

    PubMed

    Yi, Chunyan; Zheng, Chunyan; Zeng, Ling; Xu, Yijuan

    2016-10-13

    Geographic isolation is an important factor that limits species dispersal and thereby affects genetic diversity. Because islands are often small and surrounded by a natural water barrier to dispersal, they generally form discrete isolated habitats. Therefore, islands may play a key role in the distribution of the genetic diversity of insects, including flies. To characterize the genetic structure of island populations of Bactrocera dorsalis, we analyzed a dataset containing both microsatellite and mtDNA loci of B. dorsalis samples collected from six offshore islands in Southern China. The microsatellite data revealed a high level of genetic diversity among these six island populations based on observed heterozygosity (Ho), expected heterozygosity (HE), Nei's standard genetic distance (D), genetic identity (I) and the percentage of polymorphic loci (PIC). These island populations had low FST values (FST = 0.04161), and only 4.16% of the total genetic variation in the species was found on these islands, as determined by an analysis of molecular variance. Based on the mtDNA COI data, high nucleotide diversity (0.9655) and haplotype diversity (0.00680) were observed in all six island populations. F-statistics showed that the six island populations exhibited low or medium levels of genetic differentiation among some island populations. To investigate the population differentiation between the sampled locations, a factorial correspondence analysis and both the unweighted pair-group method with arithmetic mean and Bayesian clustering methods were used to analyze the microsatellite data. The results showed that Hebao Island, Weizhou Island and Dong'ao Island were grouped together in one clade. Another clade consisted of Shangchuan Island and Naozhou Island, and a final, separate clade contained only the Wailingding Island population.
Phylogenetic analysis of the mtDNA COI sequences revealed that the populations on each of these six islands were closely related to different populations on mainland China. Our study suggests that these island populations have high genetic diversity, experience frequent gene flow and exhibit low or medium levels of genetic differentiation among some island populations. Therefore, the geographic isolation of the six islands does not appear to be a major dispersal barrier to B. dorsalis. Such knowledge is helpful for a better understanding of evolutionary processes of the species of island populations.
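
    The abstract above reports expected heterozygosity and FST without showing how such quantities are obtained. As a minimal sketch, assuming a single locus, equally weighted subpopulations, and Nei's definition FST = (HT - HS) / HT, the calculation could look like the following. The function names and the allele frequencies are hypothetical and are not taken from the study.

```python
from typing import Dict, List


def expected_heterozygosity(freqs: Dict[str, float]) -> float:
    """Expected heterozygosity at one locus: He = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst(subpops: List[Dict[str, float]]) -> float:
    """Nei-style FST = (HT - HS) / HT for one locus.

    Assumption: every subpopulation gets equal weight.
    """
    alleles = set().union(*subpops)
    # HS: mean within-subpopulation expected heterozygosity.
    hs = sum(expected_heterozygosity(s) for s in subpops) / len(subpops)
    # HT: expected heterozygosity of the pooled (mean) allele frequencies.
    mean_freqs = {
        a: sum(s.get(a, 0.0) for s in subpops) / len(subpops) for a in alleles
    }
    ht = expected_heterozygosity(mean_freqs)
    return 0.0 if ht == 0 else (ht - hs) / ht


# Hypothetical allele frequencies for two islands (illustration only).
island_a = {"A": 0.6, "B": 0.4}
island_b = {"A": 0.4, "B": 0.6}
print(round(fst([island_a, island_b]), 4))  # 0.04
```

    A value near 0.04, as in this toy input, means that about 4% of the total expected heterozygosity is attributable to differences between subpopulations.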

  15. Genetic diversity analysis in Malaysian giant prawns using expressed sequence tag microsatellite markers for stock improvement program.

    PubMed

    Atin, K H; Christianus, A; Fatin, N; Lutas, A C; Shabanimofrad, M; Subha, B

    2017-08-17

    The Malaysian giant prawn is among the most commonly cultured species of the genus Macrobrachium. Stocks of giant prawns from four rivers in Peninsular Malaysia have been used for aquaculture over the past 25 years, which has led to repeated harvesting, restocking, and transplantation between rivers. Consequently, a stock improvement program is now important to avoid the depletion of wild stocks and the loss of genetic diversity. However, the success of such an improvement program depends on our knowledge of the genetic variation of these base populations. The aim of the current study was to estimate genetic variation and differentiation of these riverine sources using novel expressed sequence tag-microsatellite (EST-SSR) markers, which not only are informative on genetic diversity but also provide information on immune and metabolic traits. Our findings indicated that the tested stocks have inbreeding depression due to a significant deficiency in heterozygotes, and FIS was estimated as 0.15538 to 0.31938. An F-statistics analysis suggested that the stocks are composed of one large panmictic population. Among the four locations, stocks from Johor, in the southern region of the peninsula, showed higher allelic and genetic diversity than the other stocks. To overcome inbreeding problems, the Johor population could be used as a base population in a stock improvement program by crossing to the other populations. The study demonstrated that EST-SSR markers can be incorporated in future marker-assisted breeding to aid the proper management of the stocks by breeders and stakeholders in Malaysia.
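
    The heterozygote deficiency mentioned above is usually summarized by the inbreeding coefficient FIS = 1 - Ho / He. The sketch below shows that relationship for one locus, assuming diploid genotypes given as allele pairs. The function name and the genotype counts are hypothetical and do not come from the study.

```python
from collections import Counter
from typing import List, Tuple


def inbreeding_coefficient(genotypes: List[Tuple[str, str]]) -> float:
    """FIS = 1 - Ho / He for one locus from diploid genotypes."""
    n = len(genotypes)
    # Ho: observed fraction of heterozygous individuals.
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    # He: expected heterozygosity from the sample allele frequencies.
    counts = Counter(allele for g in genotypes for allele in g)
    h_exp = 1.0 - sum((c / (2 * n)) ** 2 for c in counts.values())
    if h_exp == 0:
        raise ValueError("locus is monomorphic; FIS is undefined")
    return 1.0 - h_obs / h_exp


# Hypothetical sample: 4 AA, 2 AB, 4 BB (illustration only).
sample = [("A", "A")] * 4 + [("A", "B")] * 2 + [("B", "B")] * 4
print(round(inbreeding_coefficient(sample), 3))  # 0.6
```

    A positive value indicates fewer heterozygotes than expected under random mating; a value near zero is consistent with Hardy-Weinberg proportions.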

  16. A genetic variant in SLC28A3, rs56350726, is associated with progression to castration-resistant prostate cancer in a Korean population with metastatic prostate cancer.

    PubMed

    Jo, Jung Ku; Oh, Jong Jin; Kim, Yong Tae; Moon, Hong Sang; Choi, Hong Yong; Park, Seunghyun; Ho, Jin-Nyoung; Yoon, Sungroh; Park, Hae Young; Byun, Seok-Soo

    2017-11-14

    Genetic variation related to progression to castration-resistant prostate cancer (CRPC) during androgen-deprivation therapy (ADT) has not been elucidated in patients with metastatic prostate cancer (mPCa). Therefore, we assessed the association between genetic variants in mPCa and progression to CRPC. Analysis of exome genotypes revealed that 42 SNPs were significantly associated with mPCa. The top five polymorphisms were statistically significantly associated with metastatic disease. In addition, one of these SNPs, rs56350726, was significantly associated with time to CRPC in Kaplan-Meier analysis (log-rank test, p = 0.011). In multivariable Cox regression, rs56350726 was strongly associated with progression to CRPC (HR = 4.172, 95% CI = 1.223-14.239, p = 0.023). We assessed genetic variation among 1000 patients with PCa with or without metastasis, using 242,221 single nucleotide polymorphisms (SNPs) on the custom HumanExome BeadChip v1.0 (Illumina Inc.). We analyzed the time to CRPC in 110 of the 1000 patients who were treated with ADT. Genetic data were analyzed using unconditional logistic regression, and odds ratios were calculated as estimates of the relative risk of metastasis. We identified SNPs associated with metastasis and analyzed the relationship between these SNPs and time to CRPC in mPCa. The five top SNPs were associated with mPCa, and one of the five (SLC28A3, rs56350726) was associated with progression to CRPC in patients with mPCa.

  17. Mapping heritability and molecular genetic associations with cortical features using probabilistic brain atlases: methods and applications to schizophrenia.

    PubMed

    Cannon, Tyrone D; Thompson, Paul M; van Erp, Theo G M; Huttunen, Matti; Lonnqvist, Jouko; Kaprio, Jaakko; Toga, Arthur W

    2006-01-01

    There is an urgent need to decipher the complex nature of genotype-phenotype relationships within the multiple dimensions of brain structure and function that are compromised in neuropsychiatric syndromes such as schizophrenia. Doing so requires sophisticated methodologies to represent population variability in neural traits and to probe their heritable and molecular genetic bases. We have recently developed and applied computational algorithms to map the heritability of, as well as genetic linkage and association to, neural features encoded using brain imaging in the context of three-dimensional (3D), population-based, statistical brain atlases. One set of algorithms builds on our prior work using classical twin study methods to estimate heritability by fitting biometrical models for additive genetic, unique, and common environmental influences. Another set of algorithms performs regression-based (Haseman-Elston) identical-by-descent linkage analysis and genetic association analysis of DNA polymorphisms in relation to neural traits of interest in the same 3D population-based brain atlas format. We demonstrate these approaches using samples of healthy monozygotic (MZ) and dizygotic (DZ) twin pairs, as well as MZ and DZ twin pairs discordant for schizophrenia, but the methods can be generalized to other classes of relatives and to other diseases. The results confirm prior evidence of genetic influences on gray matter density in frontal brain regions. They also provide converging evidence that the chromosome 1q42 region is relevant to schizophrenia by demonstrating linkage and association of markers of the Translin-Associated-Factor-X and Disrupted-In-Schizophrenia-1 genes with prefrontal cortical gray matter deficits in twins discordant for schizophrenia.
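
    The regression-based (Haseman-Elston) linkage analysis mentioned above can be illustrated in its classical form: the squared trait difference of each relative pair is regressed on the proportion of alleles the pair shares identical by descent (IBD) at a marker, and a negative slope is taken as evidence for linkage. The sketch below is a minimal ordinary-least-squares version with hypothetical names and toy data. It is not the authors' atlas-based implementation, which applies such analyses across a 3D brain atlas.

```python
from typing import Sequence, Tuple


def haseman_elston(
    trait_pairs: Sequence[Tuple[float, float]],
    ibd_sharing: Sequence[float],
) -> Tuple[float, float]:
    """Regress squared pair differences on IBD sharing.

    Returns (slope, intercept) of the least-squares line.
    """
    y = [(a - b) ** 2 for a, b in trait_pairs]
    n = len(y)
    mean_x = sum(ibd_sharing) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in ibd_sharing)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(ibd_sharing, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy data (illustration only): pairs sharing more alleles IBD
# have more similar trait values.
pairs = [(1.0, 3.0), (2.0, 3.0), (2.5, 2.5)]
sharing = [0.0, 0.5, 1.0]
slope, intercept = haseman_elston(pairs, sharing)
print(slope < 0)  # True
```

    In practice the slope would be tested against zero with a one-sided test; the sketch only computes the point estimate.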

  18. Comprehensive detection of genes causing a phenotype using phenotype sequencing and pathway analysis.

    PubMed

    Harper, Marc; Gronenberg, Luisa; Liao, James; Lee, Christopher

    2014-01-01

    Discovering all the genetic causes of a phenotype is an important goal in functional genomics. We combine an experimental design for detecting independent genetic causes of a phenotype with a high-throughput sequencing analysis that maximizes sensitivity for comprehensively identifying them. Testing this approach on a set of 24 mutant strains generated for a metabolic phenotype with many known genetic causes, we show that this pathway-based phenotype sequencing analysis greatly improves sensitivity of detection compared with previous methods, and reveals a wide range of pathways that can cause this phenotype. We demonstrate our approach on a metabolic re-engineering phenotype, the PEP/OAA metabolic node in E. coli, which is crucial to a substantial number of metabolic pathways and under renewed interest for biofuel research. Out of 2157 mutations in these strains, pathway-phenoseq discriminated just five gene groups (12 genes) as statistically significant causes of the phenotype. Experimentally, these five gene groups, and the next two high-scoring pathway-phenoseq groups, either have a clear connection to the PEP metabolite level or offer an alternative path of producing oxaloacetate (OAA), and thus clearly explain the phenotype. These high-scoring gene groups also show strong evidence of positive selection pressure, compared with strictly neutral selection in the rest of the genome.

  19. Identification of expression quantitative trait loci by the interaction analysis using genetic algorithm.

    PubMed

    Namkung, Junghyun; Nam, Jin-Wu; Park, Taesung

    2007-01-01

    Many genes with major effects on quantitative traits have been reported to interact with other genes. However, finding a group of interacting genes from thousands of SNPs is challenging. Hence, an efficient and robust algorithm is needed. The genetic algorithm (GA) is useful in searching for the optimal solution from a very large searchable space. In this study, we show that genome-wide interaction analysis using GA and a statistical interaction model can provide a practical method to detect biologically interacting loci. We focus our search on transcriptional regulators by analyzing gene x gene interactions for cancer-related genes. The expression values of three cancer-related genes were selected from the expression data of the Genetic Analysis Workshop 15 Problem 1 data set. We implemented a GA to identify the expression quantitative trait loci that are significantly associated with expression levels of the cancer-related genes. The time complexity of the GA was compared with that of an exhaustive search algorithm. As a result, our GA, which included heuristic methods, such as archive, elitism, and local search, has greatly reduced computational time in a genome-wide search for gene x gene interactions. In general, the GA took one-fifth the computation time of an exhaustive search for the most significant pair of single-nucleotide polymorphisms.
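
    The abstract above names three heuristics in its genetic algorithm: archive, elitism, and local search. As a rough sketch of how the first two fit into a search for a high-scoring SNP pair, the code below keeps an archive (a cache of already scored pairs, so no pair is evaluated twice) and copies the best individuals unchanged into each new generation. Local search is omitted. All names, parameters, and the toy scoring function are hypothetical; the authors' statistical interaction model is not reproduced here.

```python
import random
from typing import Callable, Dict, Tuple

Pair = Tuple[int, int]


def ga_pair_search(
    n_snps: int,
    score: Callable[[Pair], float],
    pop_size: int = 20,
    generations: int = 30,
    seed: int = 0,
) -> Tuple[Pair, float, int]:
    """Return (best pair, its score, number of distinct pairs scored)."""
    rng = random.Random(seed)
    archive: Dict[Pair, float] = {}

    def evaluate(pair: Pair) -> float:
        # Archive: score each distinct pair only once.
        if pair not in archive:
            archive[pair] = score(pair)
        return archive[pair]

    def random_pair() -> Pair:
        i, j = rng.sample(range(n_snps), 2)
        return (min(i, j), max(i, j))

    def mutate(pair: Pair) -> Pair:
        # Replace one member of the pair with a random SNP index.
        i, j = pair
        if rng.random() < 0.5:
            i = rng.randrange(n_snps)
        else:
            j = rng.randrange(n_snps)
        return random_pair() if i == j else (min(i, j), max(i, j))

    def crossover(a: Pair, b: Pair) -> Pair:
        # Take one SNP from each parent.
        i, j = a[0], b[1]
        return a if i == j else (min(i, j), max(i, j))

    population = [random_pair() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=evaluate, reverse=True)
        # Elitism: the top fifth survives unchanged.
        children = ranked[: max(1, pop_size // 5)]
        parents = ranked[: pop_size // 2]
        while len(children) < pop_size:
            child = crossover(rng.choice(parents), rng.choice(parents))
            children.append(mutate(child))
        population = children
    best = max(population, key=evaluate)
    return best, evaluate(best), len(archive)


def toy_score(pair: Pair) -> float:
    """Hypothetical score that peaks at the pair (3, 7)."""
    return -(abs(pair[0] - 3) + abs(pair[1] - 7))


best, best_score, n_scored = ga_pair_search(200, toy_score)
print(best, best_score, n_scored)
```

    With these defaults the search scores at most 500 distinct pairs, whereas an exhaustive scan of 200 SNPs would score all 19,900 pairs. The sketch makes no claim that the global optimum is found.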

  20. Identification of expression quantitative trait loci by the interaction analysis using genetic algorithm

    PubMed Central

    Namkung, Junghyun; Nam, Jin-Wu; Park, Taesung

    2007-01-01

    Many genes with major effects on quantitative traits have been reported to interact with other genes. However, finding a group of interacting genes from thousands of SNPs is challenging. Hence, an efficient and robust algorithm is needed. The genetic algorithm (GA) is useful in searching for the optimal solution from a very large searchable space. In this study, we show that genome-wide interaction analysis using GA and a statistical interaction model can provide a practical method to detect biologically interacting loci. We focus our search on transcriptional regulators by analyzing gene × gene interactions for cancer-related genes. The expression values of three cancer-related genes were selected from the expression data of the Genetic Analysis Workshop 15 Problem 1 data set. We implemented a GA to identify the expression quantitative trait loci that are significantly associated with expression levels of the cancer-related genes. The time complexity of the GA was compared with that of an exhaustive search algorithm. As a result, our GA, which included heuristic methods, such as archive, elitism, and local search, has greatly reduced computational time in a genome-wide search for gene × gene interactions. In general, the GA took one-fifth the computation time of an exhaustive search for the most significant pair of single-nucleotide polymorphisms. PMID:18466570

  1. Methods for Genome-Wide Analysis of Gene Expression Changes in Polyploids

    PubMed Central

    Wang, Jianlin; Lee, Jinsuk J.; Tian, Lu; Lee, Hyeon-Se; Chen, Meng; Rao, Sheetal; Wei, Edward N.; Doerge, R. W.; Comai, Luca; Jeffrey Chen, Z.

    2007-01-01

    Polyploidy is an evolutionary innovation, providing extra sets of genetic material for phenotypic variation and adaptation. It is predicted that changes of gene expression by genetic and epigenetic mechanisms are responsible for novel variation in nascent and established polyploids (Liu and Wendel, 2002; Osborn et al., 2003; Pikaard, 2001). Studying gene expression changes in allopolyploids is more complicated than in autopolyploids, because allopolyploids contain more than two sets of genomes originating from divergent, but related, species. Here we describe two methods that are applicable to the genome-wide analysis of gene expression differences resulting from genome duplication in autopolyploids or interactions between homoeologous genomes in allopolyploids. First, we describe an amplified fragment length polymorphism (AFLP)–complementary DNA (cDNA) display method that allows the discrimination of homoeologous loci based on restriction polymorphisms between the progenitors. Second, we describe microarray analyses that can be used to compare gene expression differences between the allopolyploids and respective progenitors using appropriate experimental design and statistical analysis. We demonstrate the utility of these two complementary methods and discuss the pros and cons of using the methods to analyze gene expression changes in autopolyploids and allopolyploids. Furthermore, we describe these methods in general terms to be of wider applicability for comparative gene expression in a variety of evolutionary, genetic, biological, and physiological contexts. PMID:15865985

  2. Identifying biologically relevant differences between metagenomic communities.

    PubMed

    Parks, Donovan H; Beiko, Robert G

    2010-03-15

    Metagenomics is the study of genetic material recovered directly from environmental samples. Taxonomic and functional differences between metagenomic samples can highlight the influence of ecological factors on patterns of microbial life in a wide range of habitats. Statistical hypothesis tests can help us distinguish ecological influences from sampling artifacts, but knowledge of only the P-value from a statistical hypothesis test is insufficient to make inferences about biological relevance. Current reporting practices for pairwise comparative metagenomics are inadequate, and better tools are needed for comparative metagenomic analysis. We have developed a new software package, STAMP, for comparative metagenomics that supports best practices in analysis and reporting. Examination of a pair of iron mine metagenomes demonstrates that deeper biological insights can be gained using statistical techniques available in our software. An analysis of the functional potential of 'Candidatus Accumulibacter phosphatis' in two enhanced biological phosphorus removal metagenomes identified several subsystems that differ between the A. phosphatis strains in these related communities, including phosphate metabolism, secretion and metal transport. Python source code and binaries are freely available from our website at http://kiwi.cs.dal.ca/Software/STAMP. Contact: beiko@cs.dal.ca. Supplementary data are available at Bioinformatics online.

  3. The contribution of statistical physics to evolutionary biology.

    PubMed

    de Vladar, Harold P; Barton, Nicholas H

    2011-08-01

    Evolutionary biology shares many concepts with statistical physics: both deal with populations, whether of molecules or organisms, and both seek to simplify evolution in very many dimensions. Often, methodologies have undergone parallel and independent development, as with stochastic methods in population genetics. Here, we discuss aspects of population genetics that have embraced methods from physics: non-equilibrium statistical mechanics, travelling waves and Monte-Carlo methods, among others, have been used to study polygenic evolution, rates of adaptation and range expansions. These applications indicate that evolutionary biology can further benefit from interactions with other areas of statistical physics; for example, by following the distribution of paths taken by a population through time. Copyright © 2011 Elsevier Ltd. All rights reserved.

  4. Filipino-American Nurses' Knowledge, Perceptions, Beliefs and Practice of Genetics and Genomics.

    PubMed

    Saligan, Leorey N; Rivera, Reynaldo R

    2014-01-01

    There is limited information on the knowledge, perceptions, beliefs, and practice of genetics and genomics among Filipino-American nurses. The National Coalition of Ethnic Minority Nurse Associations (NCEMNA), in which the Philippine Nurses Association of America (PNAA) is a member organization, conducted an online survey to describe the genomic knowledge, perceptions, beliefs, and practice of minority nurses. This study reports on responses from Filipino-American survey participants, which is a subset analysis of the larger NCEMNA survey. The purpose of this study was to explore the knowledge, perceptions, beliefs, practice and genomic education of Filipino-American nurses. An online survey of 112 Filipino-American nurses was conducted to describe the knowledge, perceptions, beliefs, and practice of genetics/genomics. Survey responses were analyzed using descriptive statistics. Most (94%) Filipino-American nurses wanted to learn more about genetics. Although 41% of the respondents indicated good understanding of genetics of common diseases, 60% had not attended any related continuing education courses since RN licensure, and 73% reported unavailability of genetic courses to take. The majority (83%) of PNAA respondents indicated that they would attend genetics/genomics awareness training if it was offered by their national organization during their annual conference, and 86% reported that the national organization should have a visible role in genetics/genomics initiatives in their community. Filipino-American nurses wanted to learn more about genetics and were willing to attend genetics/genomics trainings if offered by PNAA. The study findings can assist PNAA in planning future educational programs that incorporate genetics and genomics information.

  5. Population genetic structure of the rock outcrop species Encholirium spectabile (Bromeliaceae): The role of pollination vs. seed dispersal and evolutionary implications.

    PubMed

    Gonçalves-Oliveira, Rodrigo C; Wöhrmann, Tina; Benko-Iseppon, Ana M; Krapp, Florian; Alves, Marccus; Wanderley, Maria das Graças L; Weising, Kurt

    2017-06-01

    Inselbergs are terrestrial, island-like rock outcrop environments that present a highly adapted flora. The epilithic bromeliad Encholirium spectabile is a dominant species on inselbergs in the Caatinga of northeastern Brazil. We conducted a population genetic analysis to test whether the substantial phenotypic diversity of E. spectabile could be explained by limited gene flow among populations and to assess the relative impact of pollen vs. seed dispersal on the genetic structure of the species. Nuclear and chloroplast microsatellite markers were used to genotype E. spectabile individuals from 20 rock outcrop locations, representing four geographic regions: northern Espinhaço Range, Borborema Plateau, southwestern Caatinga and southeastern Caatinga. F-statistics, STRUCTURE, and other tools were applied to evaluate the genetic makeup of populations. Considerable levels of genetic diversity were revealed. Genetic structuring among populations was stronger on the plastid as compared with the nuclear level, indicating higher gene flow via bat pollination as compared with seed dispersal by wind. STRUCTURE and AMOVA analyses of the nuclear data suggested a high genetic differentiation between two groups, one containing all populations from the southeastern Caatinga and the other one comprising all remaining samples. The strong genetic differentiation between southeastern Caatinga and the remaining regions may indicate the occurrence of a cryptic species in E. spectabile. The unique genetic composition of each inselberg population suggests in situ conservation as the most appropriate protection measure for this plant lineage. © 2017 Botanical Society of America.

  6. Evaluation of body condition score measured throughout lactation as an indicator of fertility in dairy cattle.

    PubMed

    Banos, G; Brotherstone, S; Coffey, M P

    2004-08-01

    Body condition score (BCS) records of primiparous Holstein cows were analyzed both as a single measure per animal and as repeated measures per sire of cow. The former resulted in a single, average, genetic evaluation for each sire, and the latter resulted in separate genetic evaluations per day of lactation. Repeated measure analysis yielded genetic correlations of less than unity between days of lactation, suggesting that BCS may not be the same trait across lactation. Differences between daily genetic evaluations on d 10 or 30 and subsequent daily evaluations were used to assess BCS change at different stages of lactation. Genetic evaluations for BCS level or change were used to estimate genetic correlations between BCS measures and fertility traits in order to assess the capacity of BCS to predict fertility. Genetic correlation estimates with calving interval and non-return rate were consistently higher for daily BCS than single measure BCS evaluations, but results were not always statistically different. Genetic correlations between BCS change and fertility traits were not significantly different from zero. The product of the accuracy of BCS evaluations with their genetic correlation with the UK fertility index, comprising calving interval and non-return rate, was consistently higher for daily than for single BCS evaluations, by 28 to 53%. This product is associated with the conceptual correlated response in fertility from BCS selection and was highest for early (d 10 to 75) evaluations.

  7. High genetic diversity of Vibrio cholerae in the European lake Neusiedler See is associated with intensive recombination in the reed habitat and the long-distance transfer of strains.

    PubMed

    Pretzer, Carina; Druzhinina, Irina S; Amaro, Carmen; Benediktsdóttir, Eva; Hedenström, Ingela; Hervio-Heath, Dominique; Huhulescu, Steliana; Schets, Franciska M; Farnleitner, Andreas H; Kirschner, Alexander K T

    2017-01-01

    Coastal marine Vibrio cholerae populations usually exhibit high genetic diversity. To assess the genetic diversity of abundant V. cholerae non-O1/non-O139 populations in the Central European lake Neusiedler See, we performed a phylogenetic analysis based on recA, toxR, gyrB and pyrH loci sequenced for 472 strains. The strains were isolated from three ecologically different habitats in a lake that is a hot-spot of migrating birds and an important bathing water. We also analyzed 76 environmental and human V. cholerae non-O1/non-O139 isolates from Austria and other European countries and added sequences of seven genome-sequenced strains. Phylogenetic analysis showed that the lake supports a unique endemic diversity of V. cholerae that is particularly rich in the reed stand. Phylogenetic trees revealed that many V. cholerae isolates from European countries were genetically related to the strains present in the lake belonging to statistically supported monophyletic clades. We hypothesize that the observed phenomena can be explained by the high degree of genetic recombination that is particularly intensive in the reed stand, acting along with the long distance transfer of strains most probably via birds and/or humans. Thus, the Neusiedler See may serve as a bioreactor for the appearance of new strains with new (pathogenic) properties. © 2016 The Authors. Environmental Microbiology published by Society for Applied Microbiology and John Wiley & Sons Ltd.

  8. Evidence of panmixia between sympatric life history forms of coastal cutthroat trout in two lower Columbia River tributaries

    USGS Publications Warehouse

    Johnson, Jeffrey R.; Baumsteiger, Jason; Zydlewski, Joseph D.; Hudson, J. Michael; Ardren, William R.

    2010-01-01

    Coastal cutthroat trout Oncorhynchus clarkii clarkii exhibit resident and migratory life history strategies that often occur sympatrically, but the relationship between these forms within a population is poorly characterized. Through use of passive integrated transponder technology, migratory and resident coastal cutthroat trout were identified in two lower Columbia River tributaries (Abernathy Creek and the Chinook River) separated by more than 80 km. Genetic data from 17 highly variable microsatellite loci were used to ascertain the genetic population structure of these life history forms within and between streams. No distinct genetic separation was observed between the life history forms within a stream, as assessed by four different statistical approaches: permutation tests based on the genetic differentiation index FST, principal components analysis of individuals, analysis of molecular variance, and contingency tests of allele frequency heterogeneity. Genetic differences were an order of magnitude higher between stream samples (FST > 0.03) than between life history forms within a stream (FST < 0.003). The contingency test detected allele frequency differences between migratory and resident life history forms in Abernathy Creek (P = 0.001), but this result was influenced more by age-class structure than by reproductive isolation between life history forms. Results are consistent with a single, randomly mating population in each stream producing both migratory and resident life history forms. These data suggest that individual life history strategy in coastal cutthroat trout is predominantly determined by phenotypic plasticity rather than genotype.

  9. Mitochondrial genetic variations in natural house fly (Musca domestica L.) populations from the western and southern parts of Turkey.

    PubMed

    Doğaç, Ersin

    2016-09-01

    The house fly Musca domestica Linnaeus (Diptera) is one of the most studied species that is globally distributed and well known to everyone. In order to ensure baseline knowledge for the genetic resources of the species, genetic variation in M. domestica populations from western and southern parts of Turkey was investigated using nucleotide sequence analysis of 348 base pairs (bp) in the mitochondrial cytochrome oxidase subunit I gene (COI). Samples of 192 individuals were collected from 16 localities of Turkey. There were 10 variable sites defining two haplotypes of COI in this species. There was no difference in geographical distribution frequency between the two regions of Turkey. Overall, haplotype diversity (h) was low, ranging from 0 to 0.5606 with an overall average of 0.178 ± 0.04, and nucleotide diversity (π) ranged from 0 to 0.0056 with an overall mean of 0.0016. Analysis of molecular variance (AMOVA) indicated that genetic differentiation within individuals and populations was low and significant (p < 0.05). Except for the Afyon population, the conventional population statistic FST showed no significant genetic structure along the range of M. domestica populations. Sixteen populations clustered under six haplotypes and two of them are unique to Turkey. Haplotype networks suggested that house fly populations in Turkey are grouped with the Palearctic region, which is the most probable place for the origin of this species.
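
    Haplotype diversity (h) and nucleotide diversity (π), both reported above, can be computed directly from aligned sequences. The sketch below assumes Nei's estimators, h = n/(n-1) * (1 - sum(p_i^2)) and π as the mean number of pairwise differences per site, and assumes sequences of equal length with no gaps. The function names and the four toy sequences are hypothetical and are not data from the study.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def haplotype_diversity(seqs: Sequence[str]) -> float:
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Mean pairwise differences per site across all sequence pairs."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


# Toy alignment (illustration only): two haplotypes, one variable site.
alignment = ["ACGT", "ACGT", "ACGT", "ACTT"]
print(haplotype_diversity(alignment))   # 0.5
print(nucleotide_diversity(alignment))  # 0.125
```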

  10. Socioeconomic Status Is Not Related with Facial Fluctuating Asymmetry: Evidence from Latin-American Populations

    PubMed Central

    Quinto-Sánchez, Mirsha; Cintas, Celia; Silva de Cerqueira, Caio Cesar; Ramallo, Virginia; Acuña-Alonzo, Victor; Adhikari, Kaustubh; Castillo, Lucía; Gomez-Valdés, Jorge; Everardo, Paola; De Avila, Francisco; Hünemeier, Tábita; Jaramillo, Claudia; Arias, Williams; Fuentes, Macarena; Gallo, Carla; Poletti, Giovani; Schuler-Faccini, Lavinia; Bortolini, Maria Cátira; Canizales-Quinteros, Samuel; Rothhammer, Francisco; Bedoya, Gabriel; Rosique, Javier; Ruiz-Linares, Andrés; González-José, Rolando

    2017-01-01

    The expression of facial asymmetries has been recurrently related to poverty and/or disadvantaged socioeconomic status. Departing from the developmental instability theory, previous approaches attempted to test the statistical relationship between the stress experienced by individuals raised in poor conditions and an increase in facial and corporal asymmetry. Here we aim to further evaluate this hypothesis on a large sample of admixed Latin American individuals by exploring whether low socioeconomic status individuals tend to exhibit greater facial fluctuating asymmetry values. To do so, we implement Procrustes analysis of variance and Hierarchical Linear Modelling (HLM) to estimate potential associations between facial fluctuating asymmetry values and socioeconomic status. We report significant relationships between facial fluctuating asymmetry values and age, sex, and genetic ancestry, while socioeconomic status failed to exhibit any strong statistical relationship with facial asymmetry. These results persist after the effect of heterozygosity (a proxy for genetic ancestry) is controlled in the model. Our results indicate that, at least in the studied sample, there is no relationship between socioeconomic stress (understood as low socioeconomic status) and facial asymmetries. PMID:28060876

  11. Using high-resolution variant frequencies to empower clinical genome interpretation.

    PubMed

    Whiffin, Nicola; Minikel, Eric; Walsh, Roddy; O'Donnell-Luria, Anne H; Karczewski, Konrad; Ing, Alexander Y; Barton, Paul J R; Funke, Birgit; Cook, Stuart A; MacArthur, Daniel; Ware, James S

    2017-10-01

    Purpose: Whole-exome and whole-genome sequencing have transformed the discovery of genetic variants that cause human Mendelian disease, but discriminating pathogenic from benign variants remains a daunting challenge. Rarity is recognized as a necessary, although not sufficient, criterion for pathogenicity, but frequency cutoffs used in Mendelian analysis are often arbitrary and overly lenient. Recent very large reference datasets, such as the Exome Aggregation Consortium (ExAC), provide an unprecedented opportunity to obtain robust frequency estimates even for very rare variants. Methods: We present a statistical framework for the frequency-based filtering of candidate disease-causing variants, accounting for disease prevalence, genetic and allelic heterogeneity, inheritance mode, penetrance, and sampling variance in reference datasets. Results: Using the example of cardiomyopathy, we show that our approach reduces by two-thirds the number of candidate variants under consideration in the average exome, without removing true pathogenic variants (false-positive rate < 0.001). Conclusion: We outline a statistically robust framework for assessing whether a variant is "too common" to be causative for a Mendelian disorder of interest. We present precomputed allele frequency cutoffs for all variants in the ExAC dataset.
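
    The abstract above lists the inputs to the frequency filter (prevalence, genetic and allelic heterogeneity, inheritance mode, penetrance, sampling variance) but not the formula. The sketch below shows one plausible way to combine them and should be read as an assumption rather than the authors' exact method: for a dominant disorder, the maximum credible allele frequency is taken as prevalence × genetic contribution × allelic contribution / penetrance / 2, and sampling variance is handled with a one-sided Poisson bound on the allele count in a reference sample. The function names and all input values are hypothetical.

```python
import math


def max_credible_af(
    prevalence: float,
    allelic_contribution: float,
    genetic_contribution: float,
    penetrance: float,
) -> float:
    """Assumed dominant-model cutoff; the factor 2 converts people to chromosomes."""
    return (
        prevalence * allelic_contribution * genetic_contribution / penetrance / 2.0
    )


def max_allele_count(af: float, n_chromosomes: int, confidence: float = 0.95) -> int:
    """Smallest count k with Poisson CDF(k) >= confidence, mean = af * n."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    lam = af * n_chromosomes
    k = 0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    while cdf < confidence:
        k += 1
        term *= lam / k  # P(X = k) from P(X = k - 1)
        cdf += term
    return k


# Hypothetical inputs (illustration only).
af = max_credible_af(
    prevalence=1 / 500,
    allelic_contribution=0.02,
    genetic_contribution=1.0,
    penetrance=0.5,
)
print(af)
print(max_allele_count(1e-5, 100_000))  # 3
```

    Under these assumptions, a variant observed more often in the reference sample than the returned allele count would be treated as too common to be causative.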

  12. Preliminary evidence for linkage to chromosome 1q31-32, 10q23.3, and 16p13.3 in a South African cohort with bipolar disorder.

    PubMed

    Savitz, Jonathan; Cupido, Cinda-Lee; Ramesar, Raj Kumar

    2007-04-05

    Although the genetic variants predisposing to the development of bipolar disorder (BPD) have yet to be conclusively identified, replicated reports of linkage to particular chromosomal regions have been encouraging. Here we carried out a non-parametric linkage analysis of nine of these candidate loci in a unique South African sample of 47 BPD pedigrees (N = 350). Three polymorphic markers per region of interest (3 x 9) were typed in a Caucasian cohort of Afrikaner and British origin. Statistically significant evidence for linkage was obtained at 1q31-32, 10q23.3, and 16p13.3 with maximum NPL scores of 2.52, 2.01, and 1.84, respectively. Our results add to the growing evidence that these chromosomal regions harbor genetic variants that play a role in the development of bipolar spectrum illness. Negative results were obtained for the remaining six candidate loci, possibly due to limited statistical power. (c) 2006 Wiley-Liss, Inc.

  13. Socioeconomic Status Is Not Related with Facial Fluctuating Asymmetry: Evidence from Latin-American Populations.

    PubMed

    Quinto-Sánchez, Mirsha; Cintas, Celia; Silva de Cerqueira, Caio Cesar; Ramallo, Virginia; Acuña-Alonzo, Victor; Adhikari, Kaustubh; Castillo, Lucía; Gomez-Valdés, Jorge; Everardo, Paola; De Avila, Francisco; Hünemeier, Tábita; Jaramillo, Claudia; Arias, Williams; Fuentes, Macarena; Gallo, Carla; Poletti, Giovani; Schuler-Faccini, Lavinia; Bortolini, Maria Cátira; Canizales-Quinteros, Samuel; Rothhammer, Francisco; Bedoya, Gabriel; Rosique, Javier; Ruiz-Linares, Andrés; González-José, Rolando

    2017-01-01

    The expression of facial asymmetries has been recurrently related to poverty and/or disadvantaged socioeconomic status. Departing from the developmental instability theory, previous approaches attempted to test the statistical relationship between the stress experienced by individuals raised in poor conditions and an increase in facial and corporal asymmetry. Here we aim to further evaluate this hypothesis on a large sample of admixed Latin American individuals by exploring whether individuals of low socioeconomic status tend to exhibit greater facial fluctuating asymmetry values. To do so, we implement Procrustes analysis of variance and Hierarchical Linear Modelling (HLM) to estimate potential associations between facial fluctuating asymmetry values and socioeconomic status. We report significant relationships between facial fluctuating asymmetry values and age, sex, and genetic ancestry, while socioeconomic status failed to exhibit any strong statistical relationship with facial asymmetry. These results persist after the effect of heterozygosity (a proxy for genetic ancestry) is controlled for in the model. Our results indicate that, at least in the studied sample, there is no relationship between socioeconomic stress (understood as low socioeconomic status) and facial asymmetries.

  14. Evaluation of redundancy analysis to identify signatures of local adaptation.

    PubMed

    Capblancq, Thibaut; Luu, Keurcien; Blum, Michael G B; Bazin, Eric

    2018-05-26

    Ordination is a common tool in ecology that aims at representing complex biological information in a reduced space. In landscape genetics, ordination methods such as principal component analysis (PCA) have been used to detect adaptive variation based on genomic data. Taking advantage of environmental data in addition to genotype data, redundancy analysis (RDA) is another ordination approach that is useful to detect adaptive variation. This paper proposes a test statistic based on RDA to search for loci under selection. We compare redundancy analysis to pcadapt, which is a nonconstrained ordination method, and to a latent factor mixed model (LFMM), which is a univariate genotype-environment association method. Individual-based simulations identify evolutionary scenarios where RDA genome scans have greater statistical power than genome scans based on PCA. By constraining the analysis with environmental variables, RDA performs better than PCA in identifying adaptive variation when selection gradients are weakly correlated with population structure. Additionally, we show that although RDA and LFMM have similar power to identify genetic markers associated with environmental variables, the RDA-based procedure has the advantage of identifying the main selective gradients as a combination of environmental variables. To give a concrete illustration of RDA in population genomics, we apply this method to the detection of outliers and selective gradients on an SNP data set of Populus trichocarpa (Geraldes et al., 2013). The RDA-based approach identifies the main selective gradient contrasting southern and coastal populations to northern and continental populations in the northwestern American coast. This article is protected by copyright. All rights reserved.

  15. Tightly Regulated Expression of Autographa californica Multicapsid Nucleopolyhedrovirus Immediate Early Genes Emerges from Their Interactions and Possible Collective Behaviors

    PubMed Central

    Taka, Hitomi; Asano, Shin-ichiro; Matsuura, Yoshiharu; Bando, Hisanori

    2015-01-01

    To infect their hosts, DNA viruses must successfully initiate the expression of viral genes that control subsequent viral gene expression and manipulate the host environment. Viral genes that are immediately expressed upon infection play critical roles in the early infection process. In this study, we investigated the expression and regulation of five canonical regulatory immediate-early (IE) genes of Autographa californica multicapsid nucleopolyhedrovirus: ie0, ie1, ie2, me53, and pe38. A systematic transient gene-expression analysis revealed that these IE genes are generally transactivators, suggesting the existence of a highly interactive regulatory network. A genetic analysis using gene knockout viruses demonstrated that the expression of these IE genes was tolerant to the single deletions of activator IE genes in the early stage of infection. A network graph analysis on the regulatory relationships observed in the transient expression analysis suggested that the robustness of IE gene expression is due to the organization of the IE gene regulatory network and how each IE gene is activated. However, some regulatory relationships detected by the genetic analysis were contradictory to those observed in the transient expression analysis, especially for IE0-mediated regulation. Statistical modeling, combined with genetic analysis using knockout alleles for ie0 and ie1, showed that the repressor function of ie0 was due to the interaction between ie0 and ie1, not ie0 itself. Taken together, these systematic approaches provided insight into the topology and nature of the IE gene regulatory network. PMID:25816136

  16. A possible genetic association with chronic fatigue in primary Sjögren's syndrome: a candidate gene study.

    PubMed

    Norheim, Katrine Brække; Le Hellard, Stephanie; Nordmark, Gunnel; Harboe, Erna; Gøransson, Lasse; Brun, Johan G; Wahren-Herlenius, Marie; Jonsson, Roland; Omdal, Roald

    2014-02-01

    Fatigue is prevalent and disabling in primary Sjögren's syndrome (pSS). Results from studies in chronic fatigue syndrome (CFS) indicate that genetic variation may influence fatigue. The aim of this study was to investigate single nucleotide polymorphism (SNP) variations in pSS patients with high and low fatigue. A panel of 85 SNPs in 12 genes was selected based on previous studies in CFS. A total of 207 pSS patients and 376 healthy controls were genotyped. One-hundred and ninety-three patients and 70 SNPs in 11 genes were available for analysis after quality control. Patients were dichotomized based on fatigue visual analogue scale (VAS) scores, with VAS <50 denominated "low fatigue" (n = 53) and VAS ≥50 denominated "high fatigue" (n = 140). We detected signals of association with pSS for one SNP in SLC25A40 (unadjusted p = 0.007) and two SNPs in PKN1 (both p = 0.03) in our pSS case versus control analysis. The association with SLC25A40 was stronger when only pSS high fatigue patients were analysed versus controls (p = 0.002). One SNP in PKN1 displayed an association in the case-only analysis of pSS high fatigue versus pSS low fatigue (p = 0.005). This candidate gene study in pSS did reveal a trend for associations between genetic variation in candidate genes and fatigue. The results will need to be replicated. More research on genetic associations with fatigue is warranted, and future trials should include larger cohorts and multicentre collaborations with sharing of genetic material to increase the statistical power.
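
    The p-values above are reported as unadjusted, and 70 SNPs passed quality control. As a rough illustration of why the authors speak of a trend that needs replication, the sketch below applies a Bonferroni correction for 70 tests. The choice of Bonferroni and of alpha = 0.05 is an assumption made here; the abstract does not say which correction, if any, was used, and the dictionary labels are made up for readability.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Unadjusted p-values as quoted in the abstract; the keys are
# hypothetical labels, not identifiers from the study.
reported = {
    "SLC25A40, pSS vs controls": 0.007,
    "PKN1 SNP 1, pSS vs controls": 0.03,
    "PKN1 SNP 2, pSS vs controls": 0.03,
    "SLC25A40, high fatigue vs controls": 0.002,
    "PKN1, high vs low fatigue": 0.005,
}

threshold = bonferroni_threshold(alpha=0.05, n_tests=70)
survivors = [name for name, p in reported.items() if p < threshold]

print(f"Bonferroni threshold for 70 tests: {threshold:.5f}")
print("Signals below the threshold:", survivors or "none")
```

    Bonferroni is conservative when SNPs within a gene are correlated, so this is a lower bound on what might survive a less strict procedure.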

  17. Molecular Diversity Analysis and Genetic Mapping of Pod Shatter Resistance Loci in Brassica carinata L.

    PubMed Central

    Raman, Rosy; Qiu, Yu; Coombes, Neil; Song, Jie; Kilian, Andrzej; Raman, Harsh

    2017-01-01

    Seed lost due to easy pod dehiscence at maturity (pod shatter) is a major problem in several members of the Brassicaceae family. We investigated the level of pod shatter resistance in Ethiopian mustard (Brassica carinata) and identified quantitative trait loci (QTL) for targeted introgression of this trait in Ethiopian mustard and its close relatives of the genus Brassica. A set of 83 accessions of B. carinata, collected from the Australian Grains Genebank, was evaluated for pod shatter resistance based on pod rupture energy (RE). In comparison to B. napus (RE = 2.16 mJ), B. carinata accessions had higher RE values (2.53 to 20.82 mJ). A genetic linkage map of an F2 population from two contrasting B. carinata selections, BC73526 (shatter resistant with high RE) and BC73524 (shatter prone with low RE), comprising 300 individuals, was constructed using a set of 6,464 high quality DArTseq markers and subsequently used for QTL analysis. Genetic analysis of the F2 and F2:3 derived lines revealed five statistically significant QTL (LOD ≥ 3) that are linked with pod shatter resistance on chromosomes B1, B3, B8, and C5. Herein, we report for the first time the identification of genetic loci associated with pod shatter resistance in B. carinata. These characterized accessions would be useful in Brassica breeding programs for introgression of pod shatter resistance alleles into elite breeding lines. Molecular markers would assist marker-assisted selection for tracing the introgression of resistant alleles. Our results suggest that the value of the germplasm collections can be harnessed through genetic and genomics tools. PMID:29250080
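
    For readers unfamiliar with the LOD ≥ 3 threshold used above: a LOD score is the base-10 logarithm of the likelihood ratio between a model with a QTL at a position and a model without one, so LOD = 3 corresponds to odds of 1000 to 1 in favour of the QTL model. The sketch below shows only this conversion; the function names are hypothetical and nothing here reproduces the study's QTL software or its data.

```python
import math


def lod_from_log_likelihoods(loglik_qtl, loglik_null):
    """LOD score from two natural-log likelihoods."""
    return (loglik_qtl - loglik_null) / math.log(10)


def likelihood_odds(lod):
    """Likelihood ratio (QTL model : null model) implied by a LOD score."""
    return 10.0 ** lod


def lrt_statistic(lod):
    """Equivalent likelihood-ratio test statistic, 2 * ln(10) * LOD."""
    return 2.0 * math.log(10) * lod


print(f"LOD 3 -> odds {likelihood_odds(3):.0f}:1, "
      f"LRT statistic {lrt_statistic(3):.2f}")
```

    The degrees of freedom needed to turn the likelihood-ratio statistic into a p-value depend on the QTL model fitted (for example additive only, or additive plus dominance in an F2), which the abstract does not specify.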

  18. Cancer type-dependent genetic interactions between cancer driver alterations indicate plasticity of epistasis across cell types

    PubMed Central

    Park, Solip; Lehner, Ben

    2015-01-01

    Cancers, like many diseases, are normally caused by combinations of genetic alterations rather than by changes affecting single genes. It is well established that the genetic alterations that drive cancer often interact epistatically, having greater or weaker consequences in combination than expected from their individual effects. In a stringent statistical analysis of data from > 3,000 tumors, we find that the co-occurrence and mutual exclusivity relationships between cancer driver alterations change quite extensively in different types of cancer. This cannot be accounted for by variation in tumor heterogeneity or unrecognized cancer subtypes. Rather, it suggests that how genomic alterations interact cooperatively or partially redundantly to drive cancer changes in different types of cancers. This re-wiring of epistasis across cell types is likely to be a basic feature of genetic architecture, with important implications for understanding the evolution of multicellularity and human genetic diseases. In addition, if this plasticity of epistasis across cell types is also true for synthetic lethal interactions, a synthetic lethal strategy to kill cancer cells may frequently work in one type of cancer but prove ineffective in another. PMID:26227665
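
    One common way to test a pair of alterations for co-occurrence or mutual exclusivity is a hypergeometric (Fisher-type) tail probability on the number of tumors carrying both. The abstract does not say which test the authors used, so the sketch below is a generic illustration under that assumption, with hypothetical function and parameter names and made-up counts.

```python
from math import comb


def overlap_tail_probabilities(n_tumors, n_a, n_b, n_both):
    """One-sided hypergeometric tails for the overlap of two alterations.

    Returns (p_cooccurrence, p_exclusivity): the probability of seeing
    at least, respectively at most, `n_both` doubly altered tumors if
    alterations A and B were assigned to tumors independently.
    """
    total = comb(n_tumors, n_b)
    lowest = max(0, n_a + n_b - n_tumors)   # smallest possible overlap
    highest = min(n_a, n_b)                 # largest possible overlap

    def pmf(k):
        return comb(n_a, k) * comb(n_tumors - n_a, n_b - k) / total

    p_exclusivity = sum(pmf(k) for k in range(lowest, n_both + 1))
    p_cooccurrence = sum(pmf(k) for k in range(n_both, highest + 1))
    return p_cooccurrence, p_exclusivity


# Made-up counts: 200 tumors, 40 with A, 50 with B, 2 with both.
p_co, p_ex = overlap_tail_probabilities(200, 40, 50, 2)
print(f"co-occurrence p = {p_co:.3g}, mutual exclusivity p = {p_ex:.3g}")
```

    Running such a test separately within each cancer type, rather than on pooled tumors, is what allows a pair to look co-occurring in one tissue and exclusive in another.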

  19. Genetic Diversity in Nannotrigona testaceicornis (Hymenoptera: Apidae) Aggregations in Southeastern Brazil

    PubMed Central

    Fonseca, A. S.; Oliveira, E.J.F.; Freitas, G.S.; Assis, A.F.; Souza, C.C.M.; Contel, E.P.B.; Soares, A.E.E.

    2017-01-01

    The Meliponini, also known as stingless bees, are distributed in tropical and subtropical areas of the world and play an essential role in pollinating many wild plants and crops. These bees can build nests in cavities of trees or walls, underground or in associations with ants or termites; interestingly, these nests are sometimes found in aggregations. In order to assess the genetic diversity and structure in aggregates of Nannotrigona testaceicornis (Lepeletier), samples of this species were collected from six aggregations and genetically analyzed for eight specific microsatellite loci. We observed in this analysis that the mean genetic diversity value among aggregations was 0.354, and the mean expected and observed heterozygosity values were 0.414 and 0.283, respectively. The statistically significant Fis value indicated an observed heterozygosity lower than the expected heterozygosity at all loci studied, resulting in a high level of homozygosity in these populations. In addition, the low number of private alleles observed reinforces the absence of structuring that is seen in the aggregates. These results can provide relevant information about genetic diversity in aggregations of N. testaceicornis and contribute to the management and conservation of these bee species, which are critical for the pollination process. PMID:28130454
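
    The link between the heterozygosity values and Fis can be made concrete with the textbook definition Fis = 1 - Ho/He. The sketch below applies it to the two mean values quoted above. This is an approximation made here for illustration: studies usually estimate Fis per locus with dedicated estimators, and the abstract does not report the Fis value itself. Function names are hypothetical.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity at one locus: 1 minus the sum of squared
    allele frequencies (Hardy-Weinberg proportions assumed)."""
    return 1.0 - sum(p * p for p in allele_freqs)


def inbreeding_coefficient(h_obs, h_exp):
    """Fis = 1 - Ho / He; positive values mean a heterozygote deficit."""
    if h_exp <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - h_obs / h_exp


# Mean values quoted in the abstract.
fis = inbreeding_coefficient(h_obs=0.283, h_exp=0.414)
print(f"Approximate Fis from mean heterozygosities: {fis:.3f}")
```

    A positive value of this size reflects the heterozygote deficit described in the abstract.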

  20. Genetic Heterogeneity of Self-Reported Ancestry Groups in an Admixed Brazilian Population

    PubMed Central

    Lins, Tulio C; Vieira, Rodrigo G; Abreu, Breno S; Gentil, Paulo; Moreno-Lima, Ricardo; Oliveira, Ricardo J; Pereira, Rinaldo W

    2011-01-01

    Background: Population stratification is the main source of spurious results and poor reproducibility in genetic association findings. Population heterogeneity can be controlled for by grouping individuals in ethnic clusters; however, in admixed populations, there is evidence that such proxies do not provide efficient stratification control. The aim of this study was to evaluate the relationship between self-reported and genetic ancestry and the statistical risk of grouping an admixed sample based on self-reported ancestry. Methods: A questionnaire that included an item on self-reported ancestry was completed by 189 female volunteers from an admixed Brazilian population. Individual genetic ancestry was then determined by genotyping ancestry informative markers. Results: Self-reported ancestry was classified as white, intermediate, and black. The mean difference among self-reported groups was significant for European and African, but not Amerindian, genetic ancestry. Pairwise fixation index analysis revealed a significant difference among groups. However, the increase in the chance of type 1 error was estimated to be 14%. Conclusions: Self-reporting of ancestry was not an appropriate methodology to cluster groups in a Brazilian population, due to high variance at the individual level. Ancestry informative markers are more useful for quantitative measurement of biological ancestry. PMID:21498954
