Sample records for assembly predicted SNPs

  1. SNP Discovery in the Transcriptome of White Pacific Shrimp Litopenaeus vannamei by Next Generation Sequencing

    PubMed Central

    Yu, Yang; Wei, Jiankai; Zhang, Xiaojun; Liu, Jingwen; Liu, Chengzhang; Li, Fuhua; Xiang, Jianhai

    2014-01-01

    The application of next generation sequencing technology has greatly facilitated high throughput single nucleotide polymorphism (SNP) discovery and genotyping in genetic research. In the present study, SNPs were discovered based on two transcriptomes of Litopenaeus vannamei (L. vannamei) generated from Illumina sequencing platform HiSeq 2000. One transcriptome of L. vannamei was obtained through sequencing on the RNA from larvae at mysis stage and its reference sequence was de novo assembled. The data from another transcriptome were downloaded from NCBI and the reads of the two transcriptomes were mapped separately to the assembled reference by BWA. SNP calling was performed using SAMtools. A total of 58,717 and 36,277 SNPs with high quality were predicted from the two transcriptomes, respectively. SNP calling was also performed using the reads of two transcriptomes together, and a total of 96,040 SNPs with high quality were predicted. Among these 96,040 SNPs, 5,242 and 29,129 were predicted as non-synonymous and synonymous SNPs respectively. Characterization analysis of the predicted SNPs in L. vannamei showed that the estimated SNP frequency was 0.21% (one SNP per 476 bp) and the estimated ratio for transition to transversion was 2.0. Fifty SNPs were randomly selected for validation by Sanger sequencing after PCR amplification and 76% of SNPs were confirmed, which indicated that the SNPs predicted in this study were reliable. These SNPs will be very useful for genetic study in L. vannamei, especially for the high density linkage map construction and genome-wide association studies. PMID:24498047
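
    The two summary statistics quoted in this abstract (SNP frequency and the transition-to-transversion ratio) can be illustrated with a minimal Python sketch. The function names and toy inputs below are hypothetical and are not taken from the study; the authors' actual pipeline (BWA plus SAMtools) is not reproduced here.

```python
# Toy helpers for two summary statistics quoted in the abstract.
# All names and inputs are illustrative assumptions, not the authors' code.

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snps):
    """Return transitions / transversions for an iterable of (ref, alt) bases."""
    ts = tv = 0
    for ref, alt in snps:
        pair = (ref.upper(), alt.upper())
        if pair[0] == pair[1]:
            continue  # not a substitution, so it is ignored
        if pair in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions; ratio is undefined")
    return ts / tv


def snp_frequency_percent(n_snps, total_bp):
    """SNP frequency as a percentage of the surveyed sequence length."""
    return 100.0 * n_snps / total_bp


if __name__ == "__main__":
    toy_snps = [("A", "G"), ("C", "T"), ("A", "C")]  # 2 transitions, 1 transversion
    print(ts_tv_ratio(toy_snps))
    # One SNP per 476 bp, as reported in the abstract, is about 0.21%.
    print(round(snp_frequency_percent(1, 476), 2))
```

    The second call simply checks the arithmetic in the abstract: one SNP per 476 bp corresponds to a frequency of about 0.21%.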

  2. Transcriptomic SNP discovery for custom genotyping arrays: impacts of sequence data, SNP calling method and genotyping technology on the probability of validation success.

    PubMed

    Humble, Emily; Thorne, Michael A S; Forcada, Jaume; Hoffman, Joseph I

    2016-08-26

    Single nucleotide polymorphism (SNP) discovery is an important goal of many studies. However, the number of 'putative' SNPs discovered from a sequence resource may not provide a reliable indication of the number that will successfully validate with a given genotyping technology. For this it may be necessary to account for factors such as the method used for SNP discovery and the type of sequence data from which it originates, suitability of the SNP flanking sequences for probe design, and genomic context. To explore the relative importance of these and other factors, we used Illumina sequencing to augment an existing Roche 454 transcriptome assembly for the Antarctic fur seal (Arctocephalus gazella). We then mapped the raw Illumina reads to the new hybrid transcriptome using BWA and BOWTIE2 before calling SNPs with GATK. The resulting markers were pooled with two existing sets of SNPs called from the original 454 assembly using NEWBLER and SWAP454. Finally, we explored the extent to which SNPs discovered using these four methods overlapped and predicted the corresponding validation outcomes for both Illumina Infinium iSelect HD and Affymetrix Axiom arrays. Collating markers across all discovery methods resulted in a global list of 34,718 SNPs. However, concordance between the methods was surprisingly poor, with only 51.0 % of SNPs being discovered by more than one method and 13.5 % being called from both the 454 and Illumina datasets. Using a predictive modeling approach, we could also show that SNPs called from the Illumina data were on average more likely to successfully validate, as were SNPs called by more than one method. Above and beyond this pattern, predicted validation outcomes were also consistently better for Affymetrix Axiom arrays. Our results suggest that focusing on SNPs called by more than one method could potentially improve validation outcomes. 
    They also highlight possible differences between alternative genotyping technologies that could be explored in future studies of non-model organisms.

  3. Prediction of phenotypes of missense mutations in human proteins from biological assemblies.

    PubMed

    Wei, Qiong; Xu, Qifang; Dunbrack, Roland L

    2013-02-01

    Single nucleotide polymorphisms (SNPs) are the most frequent variation in the human genome. Nonsynonymous SNPs that lead to missense mutations can be neutral or deleterious, and several computational methods have been presented that predict the phenotype of human missense mutations. These methods use sequence-based and structure-based features in various combinations, relying on different statistical distributions of these features for deleterious and neutral mutations. One structure-based feature that has not been studied significantly is the accessible surface area within biologically relevant oligomeric assemblies. These assemblies are different from the crystallographic asymmetric unit for more than half of X-ray crystal structures. We find that mutations in the core of proteins or in the interfaces in biological assemblies are significantly more likely to be disease-associated than those on the surface of the biological assemblies. For structures with more than one protein in the biological assembly (whether the same sequence or different), we find the accessible surface area from biological assemblies provides a statistically significant improvement in prediction over the accessible surface area of monomers from protein crystal structures (P = 6e-5). When adding this information to sequence-based features such as the difference between wildtype and mutant position-specific profile scores, the improvement from biological assemblies is statistically significant but much smaller (P = 0.018). Combining this information with sequence-based features in a support vector machine leads to 82% accuracy on a balanced dataset of 50% disease-associated mutations from SwissVar and 50% neutral mutations from human/primate sequence differences in orthologous proteins. Copyright © 2012 Wiley Periodicals, Inc.

  4. De novo assembly of the pepper transcriptome (Capsicum annuum): a benchmark for in silico discovery of SNPs, SSRs and candidate genes.

    PubMed

    Ashrafi, Hamid; Hill, Theresa; Stoffel, Kevin; Kozik, Alexander; Yao, Jiqiang; Chin-Wo, Sebastian Reyes; Van Deynze, Allen

    2012-10-30

    Molecular breeding of pepper (Capsicum spp.) can be accelerated by developing DNA markers associated with transcriptomes in breeding germplasm. Before the advent of next generation sequencing (NGS) technologies, the majority of sequencing data were generated by the Sanger sequencing method. By leveraging Sanger EST data, we have generated a wealth of genetic information for pepper including thousands of SNPs and Single Position Polymorphic (SPP) markers. To complement and enhance these resources, we applied NGS to three pepper genotypes: Maor, Early Jalapeño and Criollo de Morelos-334 (CM334) to identify SNPs and SSRs in the assembly of these three genotypes. Two pepper transcriptome assemblies were developed with different purposes. The first reference sequence, assembled by CAP3 software, comprises 31,196 contigs from >125,000 Sanger-EST sequences that were mainly derived from a Korean F1-hybrid line, Bukang. Overlapping probes were designed for 30,815 unigenes to construct a pepper Affymetrix GeneChip® microarray for whole genome analyses. In addition, custom Python scripts were used to identify 4,236 SNPs in contigs of the assembly. A total of 2,489 simple sequence repeats (SSRs) were identified from the assembly, and primers were designed for the SSRs. Annotation of contigs using Blast2GO software resulted in information for 60% of the unigenes in the assembly. The second transcriptome assembly was constructed from more than 200 million Illumina Genome Analyzer II reads (80-120 nt) using a combination of Velvet, CLC workbench and CAP3 software packages. BWA, SAMtools and in-house Perl scripts were used to identify SNPs among three pepper genotypes. The SNPs were filtered to be at least 50 bp from any intron-exon junctions as well as flanking SNPs. More than 22,000 high-quality putative SNPs were identified. 
    Using the MISA software, 10,398 SSR markers were also identified within the Illumina transcriptome assembly and primers were designed for the identified markers. The assembly was annotated by Blast2GO and 14,740 (12%) of annotated contigs were associated with functional proteins. Before the availability of a pepper genome sequence, assembling transcriptomes of this economically important crop was required to generate thousands of high-quality molecular markers that could be used in breeding programs. To better understand the assembled sequences and to identify candidate genes underlying QTLs, we annotated the contigs of the Sanger-EST and Illumina transcriptome assemblies. This and other information has been curated in a database dedicated to the pepper project.

  5. De novo assembly of the pepper transcriptome (Capsicum annuum): a benchmark for in silico discovery of SNPs, SSRs and candidate genes

    PubMed Central

    2012-01-01

    Background: Molecular breeding of pepper (Capsicum spp.) can be accelerated by developing DNA markers associated with transcriptomes in breeding germplasm. Before the advent of next generation sequencing (NGS) technologies, the majority of sequencing data were generated by the Sanger sequencing method. By leveraging Sanger EST data, we have generated a wealth of genetic information for pepper including thousands of SNPs and Single Position Polymorphic (SPP) markers. To complement and enhance these resources, we applied NGS to three pepper genotypes: Maor, Early Jalapeño and Criollo de Morelos-334 (CM334) to identify SNPs and SSRs in the assembly of these three genotypes. Results: Two pepper transcriptome assemblies were developed with different purposes. The first reference sequence, assembled by CAP3 software, comprises 31,196 contigs from >125,000 Sanger-EST sequences that were mainly derived from a Korean F1-hybrid line, Bukang. Overlapping probes were designed for 30,815 unigenes to construct a pepper Affymetrix GeneChip® microarray for whole genome analyses. In addition, custom Python scripts were used to identify 4,236 SNPs in contigs of the assembly. A total of 2,489 simple sequence repeats (SSRs) were identified from the assembly, and primers were designed for the SSRs. Annotation of contigs using Blast2GO software resulted in information for 60% of the unigenes in the assembly. The second transcriptome assembly was constructed from more than 200 million Illumina Genome Analyzer II reads (80–120 nt) using a combination of Velvet, CLC workbench and CAP3 software packages. BWA, SAMtools and in-house Perl scripts were used to identify SNPs among three pepper genotypes. The SNPs were filtered to be at least 50 bp from any intron-exon junctions as well as flanking SNPs. More than 22,000 high-quality putative SNPs were identified.
    Using the MISA software, 10,398 SSR markers were also identified within the Illumina transcriptome assembly and primers were designed for the identified markers. The assembly was annotated by Blast2GO and 14,740 (12%) of annotated contigs were associated with functional proteins. Conclusions: Before the availability of a pepper genome sequence, assembling transcriptomes of this economically important crop was required to generate thousands of high-quality molecular markers that could be used in breeding programs. To better understand the assembled sequences and to identify candidate genes underlying QTLs, we annotated the contigs of the Sanger-EST and Illumina transcriptome assemblies. This and other information has been curated in a database dedicated to the pepper project. PMID:23110314

  6. Separating homeologs by phasing in the tetraploid wheat transcriptome.

    PubMed

    Krasileva, Ksenia V; Buffalo, Vince; Bailey, Paul; Pearce, Stephen; Ayling, Sarah; Tabbita, Facundo; Soria, Marcelo; Wang, Shichen; Akhunov, Eduard; Uauy, Cristobal; Dubcovsky, Jorge

    2013-06-25

    The high level of identity among duplicated homoeologous genomes in tetraploid pasta wheat presents substantial challenges for de novo transcriptome assembly. To solve this problem, we develop a specialized bioinformatics workflow that optimizes transcriptome assembly and separation of merged homoeologs. To evaluate our strategy, we sequence and assemble the transcriptome of one of the diploid ancestors of pasta wheat, and compare both assemblies with a benchmark set of 13,472 full-length, non-redundant bread wheat cDNAs. A total of 489 million 100 bp paired-end reads from tetraploid wheat assemble in 140,118 contigs, including 96% of the benchmark cDNAs. We used a comparative genomics approach to annotate 66,633 open reading frames. The multiple k-mer assembly strategy increases the proportion of cDNAs assembled full-length in a single contig by 22% relative to the best single k-mer size. Homoeologs are separated using a post-assembly pipeline that includes polymorphism identification, phasing of SNPs, read sorting, and re-assembly of phased reads. Using a reference set of genes, we determine that 98.7% of SNPs analyzed are correctly separated by phasing. Our study shows that de novo transcriptome assembly of tetraploid wheat benefits from multiple k-mer assembly strategies more than that of diploid wheat. Our results also demonstrate that phasing approaches originally designed for heterozygous diploid organisms can be used to separate the close homoeologous genomes of tetraploid wheat. The predicted tetraploid wheat proteome and gene models provide a valuable tool for the wheat research community and for those interested in comparative genomic studies.

  7. Separating homeologs by phasing in the tetraploid wheat transcriptome

    PubMed Central

    2013-01-01

    Background: The high level of identity among duplicated homoeologous genomes in tetraploid pasta wheat presents substantial challenges for de novo transcriptome assembly. To solve this problem, we develop a specialized bioinformatics workflow that optimizes transcriptome assembly and separation of merged homoeologs. To evaluate our strategy, we sequence and assemble the transcriptome of one of the diploid ancestors of pasta wheat, and compare both assemblies with a benchmark set of 13,472 full-length, non-redundant bread wheat cDNAs. Results: A total of 489 million 100 bp paired-end reads from tetraploid wheat assemble in 140,118 contigs, including 96% of the benchmark cDNAs. We used a comparative genomics approach to annotate 66,633 open reading frames. The multiple k-mer assembly strategy increases the proportion of cDNAs assembled full-length in a single contig by 22% relative to the best single k-mer size. Homoeologs are separated using a post-assembly pipeline that includes polymorphism identification, phasing of SNPs, read sorting, and re-assembly of phased reads. Using a reference set of genes, we determine that 98.7% of SNPs analyzed are correctly separated by phasing. Conclusions: Our study shows that de novo transcriptome assembly of tetraploid wheat benefits from multiple k-mer assembly strategies more than that of diploid wheat. Our results also demonstrate that phasing approaches originally designed for heterozygous diploid organisms can be used to separate the close homoeologous genomes of tetraploid wheat. The predicted tetraploid wheat proteome and gene models provide a valuable tool for the wheat research community and for those interested in comparative genomic studies. PMID:23800085

  8. A draft fur seal genome provides insights into factors affecting SNP validation and how to mitigate them.

    PubMed

    Humble, E; Martinez-Barrio, A; Forcada, J; Trathan, P N; Thorne, M A S; Hoffmann, M; Wolf, J B W; Hoffman, J I

    2016-07-01

    Custom genotyping arrays provide a flexible and accurate means of genotyping single nucleotide polymorphisms (SNPs) in a large number of individuals of essentially any organism. However, validation rates, defined as the proportion of putative SNPs that are verified to be polymorphic in a population, are often very low. A number of potential causes of assay failure have been identified, but none have been explored systematically. In particular, as SNPs are often developed from transcriptomes, parameters relating to the genomic context are rarely taken into account. Here, we assembled a draft Antarctic fur seal (Arctocephalus gazella) genome (assembly size: 2.41 Gb; scaffold/contig N50 : 3.1 Mb/27.5 kb). We then used this resource to map the probe sequences of 144 putative SNPs genotyped in 480 individuals. The number of probe-to-genome mappings and alignment length together explained almost a third of the variation in validation success, indicating that sequence uniqueness and proximity to intron-exon boundaries play an important role. The same pattern was found after mapping the probe sequences to the Walrus and Weddell seal genomes, suggesting that the genomes of species divergent by as much as 23 million years can hold information relevant to SNP validation outcomes. Additionally, reanalysis of genotyping data from seven previous studies found the same two variables to be significantly associated with SNP validation success across a variety of taxa. Finally, our study reveals considerable scope for validation rates to be improved, either by simply filtering for SNPs whose flanking sequences align uniquely and completely to a reference genome, or through predictive modelling. © 2015 John Wiley & Sons Ltd.
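
    The simple filter suggested at the end of this abstract (keep only SNPs whose flanking or probe sequences align uniquely and completely to a reference genome) could be sketched as follows. This is a minimal Python illustration under stated assumptions: the `ProbeAlignment` record and its field names are hypothetical, and in practice the mapping counts and alignment lengths would come from whatever aligner was used.

```python
from dataclasses import dataclass


@dataclass
class ProbeAlignment:
    # Hypothetical record; the field names are assumptions for illustration.
    snp_id: str
    n_mappings: int       # number of places the probe aligned in the genome
    aligned_length: int   # length of the best alignment, in bp
    probe_length: int     # full length of the probe sequence, in bp


def keep_unique_full_length(alignments):
    """Return IDs of SNPs whose probe maps exactly once and over its full length."""
    return [
        a.snp_id
        for a in alignments
        if a.n_mappings == 1 and a.aligned_length == a.probe_length
    ]


if __name__ == "__main__":
    toy = [
        ProbeAlignment("snp_a", 1, 121, 121),  # unique and complete: kept
        ProbeAlignment("snp_b", 3, 121, 121),  # multi-mapping: dropped
        ProbeAlignment("snp_c", 1, 64, 121),   # partial alignment: dropped
    ]
    print(keep_unique_full_length(toy))
```

    A partial alignment is treated here as a rough proxy for a probe that spans an intron-exon boundary, which is the interpretation the abstract gives for alignment length.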

  9. Construction and Annotation of a High Density SNP Linkage Map of the Atlantic Salmon (Salmo salar) Genome.

    PubMed

    Tsai, Hsin Y; Robledo, Diego; Lowe, Natalie R; Bekaert, Michael; Taggart, John B; Bron, James E; Houston, Ross D

    2016-07-07

    High density linkage maps are useful tools for fine-scale mapping of quantitative trait loci, and characterization of the recombination landscape of a species' genome. Genomic resources for Atlantic salmon (Salmo salar) include a well-assembled reference genome, and high density single nucleotide polymorphism (SNP) arrays. Our aim was to create a high density linkage map, and to align it with the reference genome assembly. Over 96,000 SNPs were mapped and ordered on the 29 salmon linkage groups using a pedigreed population comprising 622 fish from 60 nuclear families, all genotyped with the 'ssalar01' high density SNP array. The number of SNPs per group showed a high positive correlation with physical chromosome length (r = 0.95). While the order of markers on the genetic and physical maps was generally consistent, areas of discrepancy were identified. Approximately 6.5% of the previously unmapped reference genome sequence was assigned to chromosomes using the linkage map. Male recombination rate was lower than females across the vast majority of the genome, but with a notable peak in subtelomeric regions. Finally, using RNA-Seq data to annotate the reference genome, the mapped SNPs were categorized according to their predicted function, including annotation of ∼2500 putative nonsynonymous variants. The highest density SNP linkage map for any salmonid species has been created, annotated, and integrated with the Atlantic salmon reference genome assembly. This map highlights the marked heterochiasmy of salmon, and provides a useful resource for salmonid genetics and genomics research. Copyright © 2016 Tsai et al.
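
    The reported relationship between SNP count per linkage group and physical chromosome length (r = 0.95) is a Pearson correlation. A minimal Python sketch of that statistic is given below; the function name and the toy numbers are assumptions for illustration only and are not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up values: chromosome length (Mb) and SNPs mapped per linkage group.
    lengths_mb = [40.0, 75.0, 90.0, 150.0]
    snps_per_group = [1500, 2800, 3100, 5600]
    print(pearson_r(lengths_mb, snps_per_group))
```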

  10. Single nucleotide polymorphism analysis of Korean native chickens using next generation sequencing data.

    PubMed

    Seo, Dong-Won; Oh, Jae-Don; Jin, Shil; Song, Ki-Duk; Park, Hee-Bok; Heo, Kang-Nyeong; Shin, Younhee; Jung, Myunghee; Park, Junhyung; Jo, Cheorun; Lee, Hak-Kyo; Lee, Jun-Heon

    2015-02-01

    There are five native chicken lines in Korea, which are mainly classified by plumage colors (black, white, red, yellow, gray). These five lines are very important genetic resources in the Korean poultry industry. Based on a next generation sequencing technology, whole genome sequence and reference assemblies were performed using Gallus_gallus_4.0 (NCBI) with whole genome sequences from these lines to identify common and novel single nucleotide polymorphisms (SNPs). We obtained 36,660,731,136 ± 1,257,159,120 bp of raw sequence and average 26.6-fold of 25-29 billion reference assembly sequences representing 97.288 % coverage. Also, 4,006,068 ± 97,534 SNPs were observed from 29 autosomes and the Z chromosome and, of these, 752,309 SNPs are the common SNPs across lines. Among the identified SNPs, the number of novel- and known-location assigned SNPs was 1,047,951 ± 14,956 and 2,948,648 ± 81,414, respectively. The number of unassigned known SNPs was 1,181 ± 150 and unassigned novel SNPs was 8,238 ± 1,019. Synonymous SNPs, non-synonymous SNPs, and SNPs having character changes were 26,266 ± 1,456, 11,467 ± 604, 8,180 ± 458, respectively. Overall, 443,048 ± 26,389 SNPs in each bird were identified by comparing with dbSNP in NCBI. The presently obtained genome sequence and SNP information in Korean native chickens have wide applications for further genome studies such as genetic diversity studies to detect causative mutations for economic and disease related traits.

  11. SNP Identification from RNA Sequencing and Linkage Map Construction of Rubber Tree for Anchoring the Draft Genome

    PubMed Central

    Shearman, Jeremy R.; Sangsrakru, Duangjai; Jomchai, Nukoon; Ruang-areerate, Panthita; Sonthirod, Chutima; Naktang, Chaiwat; Theerawattanasuk, Kanikar; Tragoonrung, Somvong; Tangphatsornruang, Sithichoke

    2015-01-01

    Hevea brasiliensis, or rubber tree, is an important crop species that accounts for the majority of natural latex production. The rubber tree nuclear genome consists of 18 chromosomes and is roughly 2.15 Gb. The current rubber tree reference genome assembly consists of 1,150,326 scaffolds ranging from 200 to 531,465 bp and totalling 1.1 Gb. Only 143 scaffolds, totalling 7.6 Mb, have been placed into linkage groups. We have performed RNA-seq on 6 varieties of rubber tree to identify SNPs and InDels and used this information to perform target sequence enrichment and high throughput sequencing to genotype a set of SNPs in 149 rubber tree offspring from a cross between RRIM 600 and RRII 105 rubber tree varieties. We used this information to generate a linkage map allowing for the anchoring of 24,424 contigs from 3,009 scaffolds, totalling 115 Mb or 10.4% of the published sequence, into 18 linkage groups. Each linkage group contains between 319 and 1367 SNPs, or 60 to 194 non-redundant marker positions, and ranges from 156 to 336 cM in length. This linkage map includes 20,143 of the 69,300 predicted genes from rubber tree and will be useful for mapping studies and improving the reference genome assembly. PMID:25831195

  12. SNP identification from RNA sequencing and linkage map construction of rubber tree for anchoring the draft genome.

    PubMed

    Shearman, Jeremy R; Sangsrakru, Duangjai; Jomchai, Nukoon; Ruang-Areerate, Panthita; Sonthirod, Chutima; Naktang, Chaiwat; Theerawattanasuk, Kanikar; Tragoonrung, Somvong; Tangphatsornruang, Sithichoke

    2015-01-01

    Hevea brasiliensis, or rubber tree, is an important crop species that accounts for the majority of natural latex production. The rubber tree nuclear genome consists of 18 chromosomes and is roughly 2.15 Gb. The current rubber tree reference genome assembly consists of 1,150,326 scaffolds ranging from 200 to 531,465 bp and totalling 1.1 Gb. Only 143 scaffolds, totalling 7.6 Mb, have been placed into linkage groups. We have performed RNA-seq on 6 varieties of rubber tree to identify SNPs and InDels and used this information to perform target sequence enrichment and high throughput sequencing to genotype a set of SNPs in 149 rubber tree offspring from a cross between RRIM 600 and RRII 105 rubber tree varieties. We used this information to generate a linkage map allowing for the anchoring of 24,424 contigs from 3,009 scaffolds, totalling 115 Mb or 10.4% of the published sequence, into 18 linkage groups. Each linkage group contains between 319 and 1367 SNPs, or 60 to 194 non-redundant marker positions, and ranges from 156 to 336 cM in length. This linkage map includes 20,143 of the 69,300 predicted genes from rubber tree and will be useful for mapping studies and improving the reference genome assembly.

  13. A Primary Assembly of a Bovine Haplotype Block Map Based on a 15,036-Single-Nucleotide Polymorphism Panel Genotyped in Holstein–Friesian Cattle

    PubMed Central

    Khatkar, Mehar S.; Zenger, Kyall R.; Hobbs, Matthew; Hawken, Rachel J.; Cavanagh, Julie A. L.; Barris, Wes; McClintock, Alexander E.; McClintock, Sara; Thomson, Peter C.; Tier, Bruce; Nicholas, Frank W.; Raadsma, Herman W.

    2007-01-01

    Analysis of data on 1000 Holstein–Friesian bulls genotyped for 15,036 single-nucleotide polymorphisms (SNPs) has enabled genomewide identification of haplotype blocks and tag SNPs. A final subset of 9195 SNPs in Hardy–Weinberg equilibrium and mapped on autosomes on the bovine sequence assembly (release Btau 3.1) was used in this study. The average intermarker spacing was 251.8 kb. The average minor allele frequency (MAF) was 0.29 (0.05–0.5). Following recent precedents in human HapMap studies, a haplotype block was defined where 95% of combinations of SNPs within a region are in very high linkage disequilibrium. A total of 727 haplotype blocks consisting of ≥3 SNPs were identified. The average block length was 69.7 ± 7.7 kb, which is ∼5–10 times larger than in humans. These blocks comprised a total of 2964 SNPs and covered 50,638 kb of the sequence map, which constitutes 2.18% of the length of all autosomes. A set of tag SNPs, which will be useful for further fine-mapping studies, has been identified. Overall, the results suggest that as many as 75,000–100,000 tag SNPs would be needed to track all important haplotype blocks in the bovine genome. This would require ∼250,000 SNPs in the discovery phase. PMID:17435229
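
    Two of the per-SNP quantities used in this abstract, minor allele frequency (MAF) and conformity to Hardy–Weinberg equilibrium, can be computed from genotype counts. The Python sketch below is illustrative only: the abstract does not say which test the authors used, so the chi-square goodness-of-fit statistic here is an assumption (it is one common choice), and all function names are hypothetical.

```python
def allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of allele A given counts of AA, AB and BB genotypes."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    return (2 * n_aa + n_ab) / (2 * n)


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of the rarer of the two alleles."""
    p = allele_frequency(n_aa, n_ab, n_bb)
    return min(p, 1 - p)


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic comparing observed genotype counts with
    Hardy-Weinberg expectations (p^2, 2pq, q^2). Assumed test, 1 d.f."""
    n = n_aa + n_ab + n_bb
    p = allele_frequency(n_aa, n_ab, n_bb)
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Classes with zero expectation (monomorphic markers) are skipped.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    print(minor_allele_frequency(25, 50, 25))  # perfectly balanced toy marker
    print(hwe_chi_square(25, 50, 25))          # matches HWE expectations exactly
    print(hwe_chi_square(90, 0, 10))           # no heterozygotes: strong deviation
```

    A marker would then be retained when its statistic falls below a chosen critical value; the threshold is a study-specific choice that the abstract does not report.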

  14. Supramolecular Nanoparticles for Molecular Diagnostics and Therapeutics

    NASA Astrophysics Data System (ADS)

    Chen, Kuan-Ju

    Over the past decades, significant efforts have been devoted to exploring the use of various nanoparticle-based systems in the field of nanomedicine, including molecular imaging and therapy. Supramolecular synthetic approaches have attracted considerable attention due to their flexibility, convenience, and modularity for producing nanoparticles. In this dissertation, the developmental story of our size-controllable supramolecular nanoparticles (SNPs) will be discussed, as well as their use in specific biomedical applications. To achieve the self-assembly of SNPs, the well-characterized molecular recognition system (i.e., cyclodextrin/adamantane recognition) was employed. The resulting SNPs, which were assembled from three molecular building blocks, possess incredible stability in various physiological conditions, reversible size-controllability and dynamic disassembly that were exploited for various in vitro and in vivo applications. An advantage of using the supramolecular approach is that it enables the convenient incorporation of functional ligands onto the SNP surface, which confers functionality (e.g., targeting, cell penetration) to SNPs. We utilized SNPs for molecular imaging such as magnetic resonance imaging (MRI) and positron emission tomography (PET) by introducing reporter systems (i.e., radio-isotopes, MR contrast agents, and fluorophores) into SNPs. On the other hand, the incorporation of various payloads, including drugs, genes and proteins, into SNPs showed improved delivery performance and enhanced therapeutic efficacy for these therapeutic agents. Leveraging the powers of (i) a combinatorial synthetic approach based on supramolecular assembly and (ii) a digital microreactor, a rapid developmental pathway was developed that is capable of screening SNP candidates for the ideal structural and functional properties that deliver optimal performance.
    Moreover, SNP-based theranostic delivery systems that combine reporter systems and therapeutic payloads into a single SNP for both diagnosis and therapy were generated. The results show that this type of theranostic SNP may contribute greatly to the optimization of therapeutic efficacy for individual patients in clinical translation in the near future. It is anticipated that our supramolecular synthetic approach could be adopted to assemble various SNP-based delivery agents for molecular diagnostics and therapeutics that pave the way toward personalized medicine.

  15. Choice of Reference Sequence and Assembler for Alignment of Listeria monocytogenes Short-Read Sequence Data Greatly Influences Rates of Error in SNP Analyses

    PubMed Central

    Pightling, Arthur W.; Petronella, Nicholas; Pagotto, Franco

    2014-01-01

    The wide availability of whole-genome sequencing (WGS) and an abundance of open-source software have made detection of single-nucleotide polymorphisms (SNPs) in bacterial genomes an increasingly accessible and effective tool for comparative analyses. Thus, ensuring that real nucleotide differences between genomes (i.e., true SNPs) are detected at high rates and that the influences of errors (such as false positive SNPs, ambiguously called sites, and gaps) are mitigated is of utmost importance. The choices researchers make regarding the generation and analysis of WGS data can greatly influence the accuracy of short-read sequence alignments and, therefore, the efficacy of such experiments. We studied the effects of some of these choices, including: i) depth of sequencing coverage, ii) choice of reference-guided short-read sequence assembler, iii) choice of reference genome, and iv) whether to perform read-quality filtering and trimming, on our ability to detect true SNPs and on the frequencies of errors. We performed benchmarking experiments, during which we assembled simulated and real Listeria monocytogenes strain 08-5578 short-read sequence datasets of varying quality with four commonly used assemblers (BWA, MOSAIK, Novoalign, and SMALT), using reference genomes of varying genetic distances, and with or without read pre-processing (i.e., quality filtering and trimming). We found that assemblies of at least 50-fold coverage provided the most accurate results. In addition, MOSAIK yielded the fewest errors when reads were aligned to a nearly identical reference genome, while using SMALT to align reads against a reference sequence that is ∼0.82% distant from 08-5578 at the nucleotide level resulted in the detection of the greatest numbers of true SNPs and the fewest errors. Finally, we show that whether read pre-processing improves SNP detection depends upon the choice of reference sequence and assembler. 
    In total, this study demonstrates that researchers should test a variety of conditions to achieve optimal results. PMID:25144537
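
    Benchmarking of the kind described in this abstract comes down to comparing a set of called SNP positions against a known truth set and counting true SNPs detected, false positives, and misses. A minimal Python sketch is shown below. It assumes SNPs can be represented simply as genome positions; the function name and toy data are hypothetical, and the study's additional error classes (ambiguously called sites and gaps) are not modelled.

```python
def evaluate_calls(called_positions, true_positions):
    """Compare called SNP positions with a truth set of SNP positions."""
    called = set(called_positions)
    truth = set(true_positions)
    true_pos = len(called & truth)    # real SNPs that were detected
    false_pos = len(called - truth)   # calls with no real SNP behind them
    false_neg = len(truth - called)   # real SNPs that were missed
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        # Fraction of real SNPs recovered (0.0 if the truth set is empty).
        "recall": true_pos / len(truth) if truth else 0.0,
        # Fraction of calls that are real (0.0 if nothing was called).
        "precision": true_pos / len(called) if called else 0.0,
    }


if __name__ == "__main__":
    called = [100, 250, 400, 900]
    truth = [100, 250, 400, 777, 888]
    print(evaluate_calls(called, truth))
```

    Running the same comparison for each combination of aligner, reference genome, coverage depth, and pre-processing setting would give a table of detection and error rates similar in spirit to the one the study describes.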

  16. Re-Assembly and Analysis of an Ancient Variola Virus Genome.

    PubMed

    Smithson, Chad; Imbery, Jacob; Upton, Chris

    2017-09-08

    We report a major improvement to the assembly of published short read sequencing data from an ancient variola virus (VARV) genome by the removal of contig-capping sequencing tags and manual searches for gap-spanning reads. The new assembly, together with camelpox and taterapox genomes, permitted new dates to be calculated for the last common ancestor of all VARV genomes. The analysis of recently sequenced VARV-like cowpox virus genomes showed that single nucleotide polymorphisms (SNPs) and amino acid changes in the vaccinia virus (VACV)-Cop-O1L ortholog, predicted to be associated with VARV host specificity and virulence, were introduced into the lineage before the divergence of these viruses. A comparison of the ancient and modern VARV genome sequences also revealed a measurable drift towards adenine + thymine (A + T) richness.

  17. Genome-wide SNP identification for the construction of a high-resolution genetic map of Japanese flounder (Paralichthys olivaceus): applications to QTL mapping of Vibrio anguillarum disease resistance and comparative genomic analysis

    PubMed Central

    Shao, Changwei; Niu, Yongchao; Rastas, Pasi; Liu, Yang; Xie, Zhiyuan; Li, Hengde; Wang, Lei; Jiang, Yong; Tai, Shuaishuai; Tian, Yongsheng; Sakamoto, Takashi; Chen, Songlin

    2015-01-01

    High-resolution genetic maps are essential for fine mapping of complex traits, genome assembly, and comparative genomic analysis. Single-nucleotide polymorphisms (SNPs) are the primary molecular markers used for genetic map construction. In this study, we identified 13,362 SNPs evenly distributed across the Japanese flounder (Paralichthys olivaceus) genome. Of these SNPs, 12,712 high-confidence SNPs were subjected to high-throughput genotyping and assigned to 24 consensus linkage groups (LGs). The total length of the genetic linkage map was 3,497.29 cM with an average distance of 0.47 cM between loci, thereby representing the densest genetic map currently reported for Japanese flounder. Nine positive quantitative trait loci (QTLs) forming two main clusters for Vibrio anguillarum disease resistance were detected. All QTLs could explain 5.1–8.38% of the total phenotypic variation. Synteny analysis of the QTL regions on the genome assembly revealed 12 immune-related genes, among them 4 genes strongly associated with V. anguillarum disease resistance. In addition, 246 genome assembly scaffolds with an average size of 21.79 Mb were anchored onto the LGs; these scaffolds, comprising 522.99 Mb, represented 95.78% of assembled genomic sequences. The mapped assembly scaffolds in Japanese flounder were used for genome synteny analyses against zebrafish (Danio rerio) and medaka (Oryzias latipes). Flounder and medaka were found to possess almost one-to-one synteny, whereas flounder and zebrafish exhibited a multi-syntenic correspondence. The newly developed high-resolution genetic map, which will facilitate QTL mapping, scaffold assembly, and genome synteny analysis of Japanese flounder, marks a milestone in the ongoing genome project for this species. PMID:25762582

  18. Linear reduction method for predictive and informative tag SNP selection.

    PubMed

    He, Jingwu; Westbrooks, Kelly; Zelikovsky, Alexander

    2005-01-01

    Constructing a complete human haplotype map is helpful when associating complex diseases with their related SNPs. Unfortunately, the number of SNPs is very large and it is costly to sequence many individuals. Therefore, it is desirable to reduce the number of SNPs that should be sequenced to a small number of informative representatives called tag SNPs. In this paper, we propose a new linear algebra-based method for selecting and using tag SNPs. We measure the quality of our tag SNP selection algorithm by comparing actual SNPs with SNPs predicted from selected linearly independent tag SNPs. Our experiments show that for sufficiently long haplotypes, knowing only 0.4% of all SNPs the proposed linear reduction method predicts an unknown haplotype with the error rate below 2% based on 10% of the population.
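The idea of predicting untyped SNPs from linearly independent tag SNPs can be illustrated with a small sketch. This is not the authors' algorithm: it assumes haplotypes are coded as 0/1 rows, picks tag SNPs as the pivot columns of a row-reduced matrix, and rounds the resulting linear combination to 0 or 1. The function names `select_tag_snps` and `predict_from_tags` and the toy haplotypes are hypothetical.

```python
from fractions import Fraction


def select_tag_snps(haplotypes):
    """Row-reduce a haplotype matrix (rows = haplotypes, columns = SNPs).

    Returns (tags, coeffs): tags are the pivot column indices, and
    coeffs[i][j] is the weight of tag i when SNP j is written as a
    linear combination of the tag SNP columns.
    """
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    rows = [[Fraction(x) for x in row] for row in haplotypes]
    n_cols = len(rows[0])
    tags = []
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # column depends on earlier columns: not a tag SNP
        rows[r], rows[pivot] = rows[pivot], rows[r]
        scale = rows[r][c]
        rows[r] = [x / scale for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        tags.append(c)
        r += 1
        if r == len(rows):
            break
    return tags, rows[:len(tags)]


def predict_from_tags(coeffs, tag_values):
    """Predict every SNP of a new haplotype from its tag SNP alleles only."""
    n_snps = len(coeffs[0])
    predicted = []
    for j in range(n_snps):
        value = sum(coeffs[i][j] * tag_values[i] for i in range(len(coeffs)))
        predicted.append(1 if value >= Fraction(1, 2) else 0)  # round to an allele
    return predicted


# Toy data (made up): SNP 2 copies SNP 0 and SNP 3 copies SNP 1.
training = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
tags, coeffs = select_tag_snps(training)
print(tags)
print(predict_from_tags(coeffs, [1, 0]))
```

In this toy case only two of the four SNPs need to be typed. A real data set would need a strategy for noisy or nearly dependent columns, which this sketch does not attempt.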

  19. Facile control of silica nanoparticles using a novel solvent varying method for the fabrication of artificial opal photonic crystals

    NASA Astrophysics Data System (ADS)

    Gao, Weihong; Rigout, Muriel; Owens, Huw

    2016-12-01

    In this work, the Stöber process was applied to produce uniform silica nanoparticles (SNPs) in the meso-scale size range. The novel aspect of this work was to control the produced silica particle size by only varying the volume of the solvent ethanol used, whilst fixing the other reaction conditions. Using this one-step Stöber-based solvent varying (SV) method, seven batches of SNPs with target diameters ranging from 70 to 400 nm were repeatedly reproduced, and the size distribution in terms of the polydispersity index (PDI) was well maintained (within 0.1). An exponential equation was used to fit the relationship between the particle diameter and ethanol volume. This equation allows the prediction of the amount of ethanol required in order to produce particles of any target diameter within this size range. In addition, it was found that the reaction was completed in approximately 2 h for all batches regardless of the volume of ethanol. Structurally coloured artificial opal photonic crystals (PCs) were fabricated from the prepared SNPs by self-assembly under gravity sedimentation.

  20. Capturing haplotypes in germplasm core collections

    USDA-ARS's Scientific Manuscript database

    Genomewide data sets of single nucleotide polymorphisms (SNPs) offer great potential to improve ex situ conservation. Two factors impede their use for producing core collections. First, due to the large number of SNPs, the assembly of collections that maximize diversity may be intractable using ex...

  1. Use of a draft genome of coffee (Coffea arabica) to identify SNPs associated with caffeine content.

    PubMed

    Tran, Hue T M; Ramaraj, Thiruvarangan; Furtado, Agnelo; Lee, Leonard Slade; Henry, Robert J

    2018-03-07

    Arabica coffee (Coffea arabica) has a small gene pool limiting genetic improvement. Selection for caffeine content within this gene pool would be assisted by identification of the genes controlling this important trait. Sequencing of DNA bulks from 18 genotypes with extreme high- or low-caffeine content from a population of 232 genotypes was used to identify linked polymorphisms. To obtain a reference genome, a whole genome assembly of arabica coffee (variety K7) was achieved by sequencing using short read (Illumina) and long-read (PacBio) technology. Assembly was performed using a range of assembly tools resulting in 76 409 scaffolds with a scaffold N50 of 54 544 bp and a total scaffold length of 1448 Mb. Validation of the genome assembly using different tools showed high completeness of the genome. More than 99% of transcriptome sequences mapped to the C. arabica draft genome, and 89% of BUSCOs were present. The assembled genome annotated using AUGUSTUS yielded 99 829 gene models. Using the draft arabica genome as reference in mapping and variant calling allowed the detection of 1444 nonsynonymous single nucleotide polymorphisms (SNPs) associated with caffeine content. Based on Kyoto Encyclopaedia of Genes and Genomes pathway-based analysis, 65 caffeine-associated SNPs were discovered, among which 11 SNPs were associated with genes encoding enzymes involved in the conversion of substrates, which participate in the caffeine biosynthesis pathways. This analysis demonstrated the complex genetic control of this key trait in coffee. © 2018 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
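For readers unfamiliar with the scaffold N50 statistic quoted above, here is a minimal sketch of the usual definition: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy lengths are illustrative and are not taken from the coffee assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (made up): total 300, so half is 150.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 80 + 70 reaches 150, so the N50 is 70
```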

  2. Size-controlled and redox-responsive supramolecular nanoparticles

    PubMed Central

    2015-01-01

    Control over the assembly and disassembly of nanoparticles is pivotal for their use as drug delivery vehicles. Here, we aim to form supramolecular nanoparticles (SNPs) by combining advantages of the reversible assembly properties of SNPs using host–guest interactions and of a stimulus-responsive moiety. The SNPs are composed of a core of positively charged poly(ethylene imine) grafted with β-cyclodextrin (CD) and a positively charged ferrocene (Fc)-terminated poly(amidoamine) dendrimer, with a monovalent stabilizer at the surface. Fc was chosen for its loss of CD-binding properties when oxidizing it to the ferrocenium cation. The ionic strength was shown to play an important role in controlling the aggregate growth. The attractive supramolecular and repulsive electrostatic interactions constitute a balance of forces in this system at low ionic strengths. At higher ionic strengths, the increased charge screening led to a loss of electrostatic repulsion and therefore to faster aggregate growth. A Job plot showed that a 1:1 stoichiometry of host and guest moieties gave the most efficient aggregate growth. Different stabilizers were used to find the optimal stopper to limit the growth. A weaker guest moiety was shown to be less efficient in stabilizing the SNPs. Also steric repulsion is important for achieving SNP stability. SNPs of controlled particle size and good stability (up to seven days) were prepared by fine-tuning the ratio of multivalent and monovalent interactions. Finally, reversibility of the SNPs was confirmed by oxidizing the Fc guest moieties in the core of the SNPs. PMID:26733345

  3. Genome-wide SNP identification for the construction of a high-resolution genetic map of Japanese flounder (Paralichthys olivaceus): applications to QTL mapping of Vibrio anguillarum disease resistance and comparative genomic analysis.

    PubMed

    Shao, Changwei; Niu, Yongchao; Rastas, Pasi; Liu, Yang; Xie, Zhiyuan; Li, Hengde; Wang, Lei; Jiang, Yong; Tai, Shuaishuai; Tian, Yongsheng; Sakamoto, Takashi; Chen, Songlin

    2015-04-01

    High-resolution genetic maps are essential for fine mapping of complex traits, genome assembly, and comparative genomic analysis. Single-nucleotide polymorphisms (SNPs) are the primary molecular markers used for genetic map construction. In this study, we identified 13,362 SNPs evenly distributed across the Japanese flounder (Paralichthys olivaceus) genome. Of these SNPs, 12,712 high-confidence SNPs were subjected to high-throughput genotyping and assigned to 24 consensus linkage groups (LGs). The total length of the genetic linkage map was 3,497.29 cM with an average distance of 0.47 cM between loci, thereby representing the densest genetic map currently reported for Japanese flounder. Nine positive quantitative trait loci (QTLs) forming two main clusters for Vibrio anguillarum disease resistance were detected. All QTLs could explain 5.1-8.38% of the total phenotypic variation. Synteny analysis of the QTL regions on the genome assembly revealed 12 immune-related genes, among them 4 genes strongly associated with V. anguillarum disease resistance. In addition, 246 genome assembly scaffolds with an average size of 21.79 Mb were anchored onto the LGs; these scaffolds, comprising 522.99 Mb, represented 95.78% of assembled genomic sequences. The mapped assembly scaffolds in Japanese flounder were used for genome synteny analyses against zebrafish (Danio rerio) and medaka (Oryzias latipes). Flounder and medaka were found to possess almost one-to-one synteny, whereas flounder and zebrafish exhibited a multi-syntenic correspondence. The newly developed high-resolution genetic map, which will facilitate QTL mapping, scaffold assembly, and genome synteny analysis of Japanese flounder, marks a milestone in the ongoing genome project for this species. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  4. SNP-associations and phenotype predictions from hundreds of microbial genomes without genome alignments.

    PubMed

    Hall, Barry G

    2014-01-01

    SNP-association studies are a starting point for identifying genes that may be responsible for specific phenotypes, such as disease traits. The vast bulk of tools for SNP-association studies are directed toward SNPs in the human genome, and I am unaware of any tools designed specifically for such studies in bacterial or viral genomes. The PPFS (Predict Phenotypes From SNPs) package described here is an add-on to kSNP, a program that can identify SNPs in a data set of hundreds of microbial genomes. PPFS identifies those SNPs that are non-randomly associated with a phenotype based on the χ² probability, then uses those diagnostic SNPs for two distinct, but related, purposes: (1) to predict the phenotypes of strains whose phenotypes are unknown, and (2) to identify those diagnostic SNPs that are most likely to be causally related to the phenotype. In the example illustrated here, from a set of 68 E. coli genomes, for 67 of which the pathogenicity phenotype was known, there were 418,500 SNPs. Using the phenotypes of 36 of those strains, PPFS identified 207 diagnostic SNPs. The diagnostic SNPs predicted the phenotypes of all of the genomes with 97% accuracy. It then identified 97 SNPs whose probability of being causally related to the pathogenic phenotype was >0.999. In a second example, from a set of 116 E. coli genome sequences, using the phenotypes of 65 strains PPFS identified 101 SNPs that predicted the source host (human or non-human) with 90% accuracy.
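The χ² test of association mentioned above can be sketched for the simplest case, a 2×2 table of allele by phenotype. This is a generic Pearson χ² test with one degree of freedom and no continuity correction, not the PPFS implementation. The function name `chi_square_2x2` and the strain counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Rows are the two alleles at a SNP; columns are the two phenotypes.
    Returns (statistic, p_value) for one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Made-up counts: allele A in 15 pathogenic and 3 non-pathogenic strains,
# allele G in 2 pathogenic and 16 non-pathogenic strains.
stat, p = chi_square_2x2(15, 3, 2, 16)
print(f"chi2 = {stat:.2f}, p = {p:.2e}")
```

A SNP with a small p-value would be a candidate diagnostic SNP. With hundreds of thousands of SNPs tested, some correction for multiple testing would also be needed.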

  5. Discovery, Validation and Characterization of 1039 Cattle Single Nucleotide Polymorphisms

    USDA-ARS's Scientific Manuscript database

    We identified approximately 13000 putative single nucleotide polymorphisms (SNPs) by comparison of repeat-masked BAC-end sequences from the cattle RPCI-42 BAC library with whole-genome shotgun contigs of cattle genome assembly Btau 1.0. Genotyping of a subset of these SNPs was performed on a panel ...

  6. Genetic polymorphisms to predict gains in maximal O2 uptake and knee peak torque after a high intensity training program in humans.

    PubMed

    Yoo, Jinho; Kim, Bo-Hyung; Kim, Soo-Hwan; Kim, Yangseok; Yim, Sung-Vin

    2016-05-01

    The study aimed to identify single nucleotide polymorphisms (SNPs) that significantly influenced the level of improvement in two training responses, maximal O2 uptake (V'O2max) and knee peak torque, in healthy adults participating in a high intensity training (HIT) program. The study also aimed to use these SNPs to develop prediction models for individual training responses. Seventy-nine healthy volunteers participated in the HIT program. A genome-wide association study, based on 2,391,739 SNPs, was performed to identify SNPs that were significantly associated with gains in V'O2max and knee peak torque following 9 weeks of the HIT program. To predict the two training responses, two independent SNP sets were determined using linear regression and iterative binary logistic regression analysis. False discovery rate analysis and permutation tests were performed to avoid false-positive findings. To predict gains in V'O2max, 7 SNPs were identified. These SNPs accounted for 26.0% of the variance in the increment of V'O2max, and discriminated the subjects into three subgroups (non-responders, medium responders, and high responders) with a prediction accuracy of 86.1%. For knee peak torque, 6 SNPs were identified, which accounted for 27.5% of the variance in the increment of knee peak torque. The prediction accuracy for discriminating the subjects into the three subgroups was estimated as 77.2%. Novel SNPs found in this study could explain and predict inter-individual variability in gains of V'O2max and knee peak torque. Furthermore, with these genetic markers, the methodology suggested in this study provides a sound approach for personalized training programs.

  7. Prediction and analysis of three gene families related to leaf rust (Puccinia triticina) resistance in wheat (Triticum aestivum L.).

    PubMed

    Peng, Fred Y; Yang, Rong-Cai

    2017-06-20

    The resistance to leaf rust (Lr) caused by Puccinia triticina in wheat (Triticum aestivum L.) has been well studied over the past decades with over 70 Lr genes being mapped on different chromosomes and numerous QTLs (quantitative trait loci) being detected or mapped using DNA markers. Such resistance is often divided into race-specific and race-nonspecific resistance. The race-nonspecific resistance can be further divided into resistance to most or all races of the same pathogen and resistance to multiple pathogens. At the molecular level, these three types of resistance may cover across the whole spectrum of pathogen specificities that are controlled by genes encoding different protein families in wheat. The objective of this study is to predict and analyze genes in three such families: NBS-LRR (nucleotide-binding sites and leucine-rich repeats or NLR), START (Steroidogenic Acute Regulatory protein [STaR] related lipid-transfer) and ABC (ATP-Binding Cassette) transporter. The focus of the analysis is on the patterns of relationships between these protein-coding genes within the gene families and QTLs detected for leaf rust resistance. We predicted 526 ABC, 1117 NLR and 144 START genes in the hexaploid wheat genome through a domain analysis of wheat proteome. Of the 1809 SNPs from leaf rust resistance QTLs in seedling and adult stages of wheat, 126 SNPs were found within coding regions of these genes or their neighborhood (5 Kb upstream from transcription start site [TSS] or downstream from transcription termination site [TTS] of the genes). Forty-three of these SNPs for adult resistance and 18 SNPs for seedling resistance reside within coding or neighboring regions of the ABC genes whereas 14 SNPs for adult resistance and 29 SNPs for seedling resistance reside within coding or neighboring regions of the NLR gene. 
Moreover, we found 17 nonsynonymous SNPs for adult resistance and five SNPs for seedling resistance in the ABC genes, and five nonsynonymous SNPs for adult resistance and six SNPs for seedling resistance in the NLR genes. Most of these coding SNPs were predicted to alter encoded amino acids and such information may serve as a starting point towards more thorough molecular and functional characterization of the designated Lr genes. Using the primer sequences of 99 known non-SNP markers from leaf rust resistance QTLs, we found candidate genes closely linked to these markers, including Lr34 with distances to its two gene-specific markers being 1212 bases (to cssfr1) and 2189 bases (to cssfr2). This study represents a comprehensive analysis of ABC, NLR and START genes in the hexaploid wheat genome and their physical relationships with QTLs for leaf rust resistance at seedling and adult stages. Our analysis suggests that the ABC (and START) genes are more likely to be co-located with QTLs for race-nonspecific, adult resistance whereas the NLR genes are more likely to be co-located with QTLs for race-specific resistance that would be often expressed at the seedling stage. Though our analysis was hampered by inaccurate or unknown physical positions of numerous QTLs due to the incomplete assembly of the complex hexaploid wheat genome that is currently available, the observed associations between (i) QTLs for race-specific resistance and NLR genes and (ii) QTLs for nonspecific resistance and ABC genes will help discover SNP variants for leaf rust resistance at seedling and adult stages. The genes containing nonsynonymous SNPs are promising candidates that can be investigated in future studies as potential new sources of leaf rust resistance in wheat breeding.

  8. The more from East-Asian, the better: risk prediction of colorectal cancer risk by GWAS-identified SNPs among Japanese.

    PubMed

    Abe, Makiko; Ito, Hidemi; Oze, Isao; Nomura, Masatoshi; Ogawa, Yoshihiro; Matsuo, Keitaro

    2017-12-01

    Little is known about differences in genetic predisposition to colorectal cancer (CRC) between ethnicities; however, many genetic traits common to CRC have been identified. This study investigated whether adding SNPs identified in GWAS of East Asian populations could improve risk prediction in Japanese individuals, and explored the possible application of genetic risk groups as an instrument of risk communication. A total of 558 patients with histologically verified colorectal cancer and 1116 first-visit outpatients were included in the derivation study, and 547 cases and 547 controls in the replication study. In each population, we evaluated prediction models for the risk of CRC that combined the genetic risk group based on SNPs from GWASs in European populations, and a similarly developed model adding SNPs from GWASs in East Asian populations. We examined whether adding East Asian-specific SNPs would improve the discrimination. Six SNPs (rs6983267, rs4779584, rs4444235, rs9929218, rs10936599, rs16969681) from 23 SNPs identified by European-based GWAS and five SNPs (rs704017, rs11196172, rs10774214, rs647161, rs2423279) among ten SNPs identified by Asian-based GWAS were selected for the CRC risk prediction model. Compared with a 6-SNP-based model, an 11-SNP model including Asian GWAS-SNPs showed improved discrimination capacity in receiver operating characteristic analysis. The model with 11 SNPs resulted in statistically significant improvement in both the derivation (P = 0.0039) and replication studies (P = 0.0018) compared with the six-SNP model. We estimated the cumulative risk of CRC by using the genetic risk group based on 11 SNPs and found that the cumulative risk at age 80 is approximately 13% in the high-risk group and 6% in the low-risk group. We constructed a more efficient CRC risk prediction model with 11 SNPs including newly identified East Asian-based GWAS SNPs (rs704017, rs11196172, rs10774214, rs647161, rs2423279). Risk grouping based on 11 SNPs depicted the lifetime difference in CRC risk. This might be useful for effective individualized prevention for East Asians.

  9. Whole-genome sequence-based genomic prediction in laying chickens with different genomic relationship matrices to account for genetic architecture.

    PubMed

    Ni, Guiyan; Cavero, David; Fangmann, Anna; Erbe, Malena; Simianer, Henner

    2017-01-16

    With the availability of next-generation sequencing technologies, genomic prediction based on whole-genome sequencing (WGS) data is now feasible in animal breeding schemes and was expected to lead to higher predictive ability, since such data may contain all genomic variants including causal mutations. Our objective was to compare prediction ability with high-density (HD) array data and WGS data in a commercial brown layer line with genomic best linear unbiased prediction (GBLUP) models using various approaches to weight single nucleotide polymorphisms (SNPs). A total of 892 chickens from a commercial brown layer line were genotyped with 336 K segregating SNPs (array data) that included 157 K genic SNPs (i.e. SNPs in or around a gene). For these individuals, genome-wide sequence information was imputed based on data from re-sequencing runs of 25 individuals, leading to 5.2 million (M) imputed SNPs (WGS data), including 2.6 M genic SNPs. De-regressed proofs (DRP) for eggshell strength, feed intake and laying rate were used as quasi-phenotypic data in genomic prediction analyses. Four weighting factors for building a trait-specific genomic relationship matrix were investigated: identical weights, -(log 10 P) from genome-wide association study results, squares of SNP effects from random regression BLUP, and variable selection based weights (known as BLUP|GA). Predictive ability was measured as the correlation between DRP and direct genomic breeding values in five replications of a fivefold cross-validation. Averaged over the three traits, the highest predictive ability (0.366 ± 0.075) was obtained when only genic SNPs from WGS data were used. Predictive abilities with genic SNPs and all SNPs from HD array data were 0.361 ± 0.072 and 0.353 ± 0.074, respectively. 
Prediction with -(log 10 P) or squares of SNP effects as weighting factors for building a genomic relationship matrix or BLUP|GA did not increase accuracy, compared to that with identical weights, regardless of the SNP set used. Our results show that little or no benefit was gained when using all imputed WGS data to perform genomic prediction compared to using HD array data regardless of the weighting factors tested. However, using only genic SNPs from WGS data had a positive effect on prediction ability.
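The evaluation scheme described above (the correlation between DRP and predicted breeding values in five replications of a fivefold cross-validation) can be sketched as follows. This only shows the bookkeeping. The `predict` callback stands in for whatever GBLUP model is fitted and is an assumption, as are the function names and the toy data.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def predictive_ability(drp, predict, k=5, replications=5, seed=0):
    """Mean correlation between held-out DRP and predictions.

    predict(train_idx, test_idx) must return one prediction per test index;
    it is a placeholder for the real model fit (hypothetical interface).
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(replications):
        order = list(range(len(drp)))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]  # k disjoint test sets
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            predicted = predict(train_idx, test_idx)
            observed = [drp[i] for i in test_idx]
            scores.append(pearson(observed, predicted))
    return sum(scores) / len(scores)


# Toy stand-in: a made-up covariate that is perfectly proportional to DRP,
# so the "model" below trivially reaches a predictive ability of 1.
covariate = [float(i) for i in range(20)]
toy_drp = [3.0 * v for v in covariate]
ability = predictive_ability(toy_drp, lambda train, test: [covariate[i] for i in test])
print(round(ability, 3))
```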

  10. IL1RN Variation Influences both Disease Susceptibility and Response to Human Recombinant IL-1RA Therapy in Systemic Juvenile Idiopathic Arthritis.

    PubMed

    Arthur, Victoria L; Shuldiner, Emily; Remmers, Elaine F; Hinks, Anne; Grom, Alexei A; Foell, Dirk; Martini, Alberto; Gattorno, Marco; Özen, Seza; Prahalad, Sampath; Zeft, Andrew S; Bohnsack, John F; Ilowite, Norman T; Mellins, Elizabeth D; Russo, Ricardo; Len, Claudio; Oliveira, Sheila; Yeung, Rae S M; Rosenberg, Alan M; Wedderburn, Lucy R; Anton, Jordi; Haas, Johannes-Peter; Rösen-Wolff, Angela; Minden, Kirsten; Szymanski, Ann Marie; Thomson, Wendy; Kastner, Daniel L; Woo, Patricia; Ombrello, Michael J

    2018-04-02

    To determine whether systemic juvenile idiopathic arthritis (sJIA) susceptibility loci identified by candidate gene studies demonstrated association with sJIA in the largest study population assembled to date. Single nucleotide polymorphisms (SNPs) from 11 previously reported sJIA risk loci were examined for association in 9 populations, including 770 sJIA cases and 6947 control subjects. The effect of sJIA-associated SNPs on gene expression was evaluated in silico in paired whole genome and RNA sequencing data from lymphoblastoid cell lines (LCL) of 373 European 1000 Genomes Project subjects. The relationship between sJIA-associated SNPs and response to anakinra treatment was evaluated in 38 US patients for whom treatment response data were available. We found no association of the 26 SNPs previously reported as sJIA-associated. Expanded analysis of the regions containing the 26 SNPs revealed only one significant association, the promoter region of IL1RN (p<1E-4). sJIA-associated SNPs correlated with IL1RN expression in LCLs, with an inverse correlation between sJIA risk and IL1RN expression. The presence of homozygous IL1RN high expression alleles correlated strongly with non-response to anakinra therapy (OR 28.7 [3.2, 255.8]). IL1RN was the only candidate locus associated with sJIA in our study. The implicated SNPs are among the strongest known determinants of IL1RN and IL1RA levels, linking low expression with increased sJIA risk. Homozygous high expression alleles predicted non-response to anakinra therapy, nominating them as candidate biomarkers to guide sJIA treatment. This is an important first step towards the personalized treatment of sJIA. This article is protected by copyright. All rights reserved.

  11. The structural coloration of textile materials using self-assembled silica nanoparticles

    NASA Astrophysics Data System (ADS)

    Gao, Weihong; Rigout, Muriel; Owens, Huw

    2017-09-01

    The work presented investigates how to produce structural colours on textile materials by applying a surface coating of silica nanoparticles (SNPs). Uniform SNPs with particle diameters in a controlled micron size range (207-350 nm) were synthesized using a Stöber-based solvent varying (SV) method which has been reported previously. Photonic crystals (PCs) were formed on the surface of a piece of textile fabric through a process of natural sedimentation self-assembly of the colloidal suspension containing uniform SNPs. Due to the uniformity and a particular diameter range of the prepared SNPs, structural colours were observed from the fabric surface due to the Bragg diffraction of white light with the ordered structure of the silica PCs. By varying the mean particle diameter, a wide range of spectral colours from red to blue were obtained. The comparison of structural colours on fabrics and on glasses suggests that a smooth substrate is critical when producing materials with high colour intensity and spatial uniformity. This work suggested a promising approach to colour textile materials without the need for traditional dyes and/or pigments.
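The link between particle diameter and reflected colour can be made concrete with the Bragg–Snell relation commonly used for opal films. The abstract does not state the packing or refractive indices, so the following is only a sketch under assumed values: face-centred cubic close packing with reflection from the (111) planes, a silica filling fraction f = 0.74, n_silica ≈ 1.45, air in the voids, and normal incidence (θ = 0).

```latex
% Bragg-Snell estimate of the reflected wavelength for a silica opal.
% Assumptions (not from the abstract): fcc packing, (111) planes,
% f = 0.74, n_silica = 1.45, n_air = 1, normal incidence.
\begin{align}
  \lambda &= 2\, d_{111} \sqrt{n_{\mathrm{eff}}^{2} - \sin^{2}\theta},
  \qquad d_{111} = \sqrt{\tfrac{2}{3}}\, D, \\
  n_{\mathrm{eff}}^{2} &= f\, n_{\mathrm{silica}}^{2} + (1 - f)\, n_{\mathrm{air}}^{2}
  \approx 0.74 \times 1.45^{2} + 0.26 \approx 1.82 .
\end{align}
% At theta = 0:
%   D = 207 nm  ->  lambda ~ 2 (0.8165)(207)(1.348) ~ 455 nm  (blue)
%   D = 350 nm  ->  lambda ~ 2 (0.8165)(350)(1.348) ~ 770 nm  (deep red)
```

Under these assumptions the quoted diameter range of 207-350 nm spans roughly the visible spectrum, which is consistent with the red-to-blue colours reported.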

  12. The structural coloration of textile materials using self-assembled silica nanoparticles.

    PubMed

    Gao, Weihong; Rigout, Muriel; Owens, Huw

    2017-01-01

    The work presented investigates how to produce structural colours on textile materials by applying a surface coating of silica nanoparticles (SNPs). Uniform SNPs with particle diameters in a controlled micron size range (207-350 nm) were synthesized using a Stöber-based solvent varying (SV) method which has been reported previously. Photonic crystals (PCs) were formed on the surface of a piece of textile fabric through a process of natural sedimentation self-assembly of the colloidal suspension containing uniform SNPs. Due to the uniformity and a particular diameter range of the prepared SNPs, structural colours were observed from the fabric surface due to the Bragg diffraction of white light with the ordered structure of the silica PCs. By varying the mean particle diameter, a wide range of spectral colours from red to blue were obtained. The comparison of structural colours on fabrics and on glasses suggests that a smooth substrate is critical when producing materials with high colour intensity and spatial uniformity. This work suggested a promising approach to colour textile materials without the need for traditional dyes and/or pigments.

  13. Integrating Milk Metabolite Profile Information for the Prediction of Traditional Milk Traits Based on SNP Information for Holstein Cows

    PubMed Central

    Melzer, Nina; Wittenburg, Dörte; Repsilber, Dirk

    2013-01-01

    In this study the benefit of metabolome level analysis for the prediction of genetic value of three traditional milk traits was investigated. Our proposed approach consists of three steps: First, milk metabolite profiles are used to predict three traditional milk traits of 1,305 Holstein cows. Two regression methods, both enabling variable selection, are applied to identify important milk metabolites in this step. Second, the prediction of these important milk metabolites from single nucleotide polymorphisms (SNPs) enables the detection of SNPs with significant genetic effects. Finally, these SNPs are used to predict milk traits. The observed precision of predicted genetic values was compared to the results observed for the classical genotype-phenotype prediction using all SNPs or a reduced SNP subset (reduced classical approach). To enable a comparison between SNP subsets, a special invariable evaluation design was implemented. SNPs close to or within known quantitative trait loci (QTL) were determined. This enabled us to determine if detected important SNP subsets were enriched in these regions. The results show that our approach can lead to genetic value prediction, but requires less than 1% of the total number of SNPs (40,317). Moreover, significantly more important SNPs in known QTL regions were detected using our approach than with the reduced classical approach. In conclusion, our approach allows a deeper insight into the associations between the different levels of the genotype-phenotype map (genotype-metabolome, metabolome-phenotype, genotype-phenotype). PMID:23990900

  14. SNPchiMp: a database to disentangle the SNPchip jungle in bovine livestock.

    PubMed

    Nicolazzi, Ezequiel Luis; Picciolini, Matteo; Strozzi, Francesco; Schnabel, Robert David; Lawley, Cindy; Pirani, Ali; Brew, Fiona; Stella, Alessandra

    2014-02-11

    Currently, six commercial whole-genome SNP chips are available for cattle genotyping, produced by two different genotyping platforms. Technical issues need to be addressed to combine data that originates from the different platforms, or different versions of the same array generated by the manufacturer. For example: i) genome coordinates for SNPs may refer to different genome assemblies; ii) reference genome sequences are updated over time, changing positions or even removing sequences which contain SNPs; iii) not all commercial SNP IDs are searchable within public databases; iv) SNPs can be coded using different formats and referencing different strands (e.g. A/B or A/C/T/G alleles, referencing forward/reverse, top/bottom or plus/minus strand); v) due to new information being discovered, higher density chips do not necessarily include all the SNPs present in the lower density chips; and vi) SNP IDs may not be consistent across chips and platforms. Most researchers and breed associations manage SNP data in real-time and thus require tools to standardise data in a user-friendly manner. Here we present SNPchiMp, a MySQL database linked to an open access web-based interface. Features of this interface include, but are not limited to, the following functions: 1) referencing the SNP mapping information to the latest genome assembly, 2) extraction of information contained in dbSNP for SNPs present in all commercially available bovine chips, and 3) identification of SNPs in common between two or more bovine chips (e.g. for SNP imputation from lower to higher density). In addition, SNPchiMp can retrieve this information on subsets of SNPs, accessing such data either via physical position on a supported assembly, or by a list of SNP IDs, rs or ss identifiers. This tool combines many different sources of information that are otherwise time-consuming to obtain and difficult to integrate. SNPchiMp not only provides the information in a user-friendly format, but also enables researchers to perform a large number of operations with a few clicks of the mouse. This significantly reduces the time needed to execute the large number of operations required to manage SNP data.

  15. Comparison of three assembly strategies for a heterozygous seedless grapevine genome assembly.

    PubMed

    Patel, Sagar; Lu, Zhixiu; Jin, Xiaozhu; Swaminathan, Padmapriya; Zeng, Erliang; Fennell, Anne Y

    2018-01-17

    De novo heterozygous assembly is an ongoing challenge requiring improved assembly approaches. In this study, three strategies were used to develop de novo Vitis vinifera 'Sultanina' genome assemblies for comparison with the inbred V. vinifera (PN40024 12X.v2) reference genome and a published Sultanina ALLPATHS-LG assembly (AP). The strategies were: 1) a default PLATANUS assembly (PLAT_d) for direct comparison with the AP assembly, 2) an iterative merging strategy using METASSEMBLER to combine the PLAT_d and AP assemblies (MERGE) and 3) PLATANUS parameter modifications plus GapCloser (PLAT*_GC). The three new assemblies were greater in size than the AP assembly. PLAT*_GC had the greatest number of scaffolds aligning with a minimum of 95% identity and ≥1000 bp alignment length to the V. vinifera (PN40024 12X.v2) reference genome. SNP analysis also identified additional high quality SNPs. A greater number of sequence reads mapped back with zero mismatches to PLAT_d, MERGE, and PLAT*_GC (>94%) than to the AP assembly (87%), indicating a greater fidelity to the original sequence data in the new assemblies than in the AP assembly. A de novo gene prediction conducted using seedless RNA-seq data predicted >30,000 coding sequences for the three new de novo assemblies, with the greatest number (30,544) in PLAT*_GC and only 26,515 for the AP assembly. Transcription factor analysis indicated good family coverage, but some genes found in the VCOST.v3 annotation were not identified in any of the de novo assemblies, particularly some from the MYB and ERF families. PLAT_d and PLAT*_GC had a greater number of synteny blocks with the V. vinifera (PN40024 12X.v2) reference genome than AP or MERGE. PLAT*_GC provided the most contiguous assembly with only 1.2% scaffold N, in contrast to AP (10.7% N), PLAT_d (6.6% N) and MERGE (6.4% N). A PLAT*_GC pseudo-chromosome assembly with chromosome alignment to the V. vinifera (PN40024 12X.v2) reference genome provides new information for use in seedless grape genetic mapping studies. An annotated de novo gene prediction for the PLAT*_GC assembly, aligned with VitisNet pathways, provides a new seedless grapevine-specific transcriptomic resource that has excellent fidelity with the seedless short read sequence data.
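
    The scaffold N percentages quoted in this abstract (for example 1.2% N for PLAT*_GC) are the share of ambiguous bases in the assembled scaffolds. A minimal Python sketch of that calculation follows; it is not the authors' pipeline, and the function name `percent_n` and the toy sequences are illustrative assumptions.

```python
def percent_n(scaffolds):
    """Return the percentage of bases that are N (unknown/gap) across all scaffolds."""
    total = sum(len(s) for s in scaffolds)
    if total == 0:
        return 0.0
    # Count N in either case, since soft-masked assemblies use lowercase.
    n_count = sum(s.upper().count("N") for s in scaffolds)
    return 100.0 * n_count / total


# Toy input (hypothetical): 6 N bases out of 20 in total.
toy_scaffolds = ["ACGTNNNNAC", "ggnnACGTAC"]
print(percent_n(toy_scaffolds))  # 30.0
```

    In practice the sequences would be read from the assembly FASTA file; a lower value indicates fewer unresolved gaps inside scaffolds.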

  16. snpTree--a web-server to identify and construct SNP trees from whole genome sequence data.

    PubMed

    Leekitcharoenphon, Pimlapas; Kaas, Rolf S; Thomsen, Martin Christen Frølund; Friis, Carsten; Rasmussen, Simon; Aarestrup, Frank M

    2012-01-01

    Advances in, and the decreasing cost of, whole genome sequencing (WGS) will soon make this technology available for routine infectious disease epidemiology. In epidemiological studies, outbreak isolates have very little diversity and require extensive genomic analysis to differentiate and classify isolates. One of the successfully and broadly used methods is analysis of single nucleotide polymorphisms (SNPs). Currently, there are different tools and methods to identify SNPs including various options and cut-off values. Furthermore, all current methods require bioinformatic skills. Thus, we lack a standard and simple automatic tool to determine SNPs and construct a phylogenetic tree from WGS data. Here we introduce snpTree, a server for automatic online SNP analysis. This tool is composed of different SNP analysis suites and Perl and Python scripts. snpTree can identify SNPs and construct phylogenetic trees from WGS as well as from assembled genomes or contigs. WGS data in fastq format are aligned to reference genomes by BWA while contigs in fasta format are processed by Nucmer. SNPs are concatenated based on position on the reference genome and a tree is constructed from the concatenated SNPs using FastTree and a Perl script. The online server was implemented with HTML, Java and Python scripts. The server was evaluated using four published bacterial WGS data sets (V. cholerae, S. aureus CC398, S. Typhimurium and M. tuberculosis). The evaluation results for the first three cases were consistent and concordant for both raw reads and assembled genomes. In the last case the original publication involved extensive filtering of SNPs, which could not be repeated using snpTree. The snpTree server is an easy to use option for rapid standardised and automatic SNP analysis in epidemiological studies, also for users with limited bioinformatic experience. The web server is freely accessible at http://www.cbs.dtu.dk/services/snpTree-1.0/.
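
    The step in which SNPs are concatenated by reference position before tree building can be sketched as follows. This is a hedged illustration in Python rather than snpTree's own scripts: the function name `concatenate_snps`, the input layout, and the rule that an isolate without a call at a position carries the reference base are all assumptions.

```python
def concatenate_snps(calls, reference):
    """Build one SNP string per isolate, ordered by reference position.

    calls:     {isolate: {position: base}} SNP calls per isolate
    reference: {position: base} reference base at each variable position
    Returns (sorted positions, {isolate: concatenated SNP string}).
    """
    # Union of every position where at least one isolate has a SNP.
    positions = sorted({pos for c in calls.values() for pos in c})
    alignment = {}
    for isolate, c in calls.items():
        # Assumption: no call at a position means the isolate matches the reference.
        alignment[isolate] = "".join(c.get(pos, reference[pos]) for pos in positions)
    return positions, alignment


# Toy data (hypothetical isolates and positions).
calls = {"iso1": {10: "A", 42: "T"}, "iso2": {42: "T"}, "iso3": {77: "G"}}
reference = {10: "C", 42: "G", 77: "A"}
positions, alignment = concatenate_snps(calls, reference)
print(positions)   # [10, 42, 77]
print(alignment)   # {'iso1': 'ATA', 'iso2': 'CTA', 'iso3': 'CGG'}
```

    The resulting equal-length strings could be written out as a FASTA alignment and passed to a tree builder such as FastTree, which the abstract names.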

  17. SNP Assay Development for Linkage Map Construction, Anchoring Whole-Genome Sequence, and Other Genetic and Genomic Applications in Common Bean

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Song, Qijian; Jia, Gaofeng; Hyten, David L.

    A total of 992,682 single-nucleotide polymorphisms (SNPs) was identified as ideal for Illumina Infinium II BeadChip design after sequencing a diverse set of 17 common bean (Phaseolus vulgaris L) varieties with the aid of next-generation sequencing technology. From these, two BeadChips each with >5000 SNPs were designed. The BARCBean6K_1 BeadChip was selected for the purpose of optimizing polymorphism among market classes and, when possible, SNPs were targeted to sequence scaffolds in the Phaseolus vulgaris 14× genome assembly with sequence lengths >10 kb. The BARCBean6K_2 BeadChip was designed with the objective of anchoring additional scaffolds and to facilitate orientation of large scaffolds. Analysis of 267 F2 plants from a cross of varieties Stampede × Red Hawk with the two BeadChips resulted in linkage maps with a total of 7040 markers including 7015 SNPs. With the linkage map, a total of 432.3 Mb of sequence from 2766 scaffolds was anchored to create the Phaseolus vulgaris v1.0 assembly, which accounted for approximately 89% of the 487 Mb of available sequence scaffolds of the Phaseolus vulgaris v0.9 assembly. A core set of 6000 SNPs (BARCBean6K_3 BeadChip) with high genotyping quality and polymorphism was selected based on the genotyping of 365 dry bean and 134 snap bean accessions with the BARCBean6K_1 and BARCBean6K_2 BeadChips. The BARCBean6K_3 BeadChip is a useful tool for genetics and genomics research and it is widely used by breeders and geneticists in the United States and abroad.

  18. SNP Assay Development for Linkage Map Construction, Anchoring Whole-Genome Sequence, and Other Genetic and Genomic Applications in Common Bean

    DOE PAGES

    Song, Qijian; Jia, Gaofeng; Hyten, David L.; ...

    2015-08-28

    A total of 992,682 single-nucleotide polymorphisms (SNPs) was identified as ideal for Illumina Infinium II BeadChip design after sequencing a diverse set of 17 common bean (Phaseolus vulgaris L) varieties with the aid of next-generation sequencing technology. From these, two BeadChips each with >5000 SNPs were designed. The BARCBean6K_1 BeadChip was selected for the purpose of optimizing polymorphism among market classes and, when possible, SNPs were targeted to sequence scaffolds in the Phaseolus vulgaris 14× genome assembly with sequence lengths >10 kb. The BARCBean6K_2 BeadChip was designed with the objective of anchoring additional scaffolds and to facilitate orientation of large scaffolds. Analysis of 267 F2 plants from a cross of varieties Stampede × Red Hawk with the two BeadChips resulted in linkage maps with a total of 7040 markers including 7015 SNPs. With the linkage map, a total of 432.3 Mb of sequence from 2766 scaffolds was anchored to create the Phaseolus vulgaris v1.0 assembly, which accounted for approximately 89% of the 487 Mb of available sequence scaffolds of the Phaseolus vulgaris v0.9 assembly. A core set of 6000 SNPs (BARCBean6K_3 BeadChip) with high genotyping quality and polymorphism was selected based on the genotyping of 365 dry bean and 134 snap bean accessions with the BARCBean6K_1 and BARCBean6K_2 BeadChips. The BARCBean6K_3 BeadChip is a useful tool for genetics and genomics research and it is widely used by breeders and geneticists in the United States and abroad.

  19. SNP Assay Development for Linkage Map Construction, Anchoring Whole-Genome Sequence, and Other Genetic and Genomic Applications in Common Bean.

    PubMed

    Song, Qijian; Jia, Gaofeng; Hyten, David L; Jenkins, Jerry; Hwang, Eun-Young; Schroeder, Steven G; Osorno, Juan M; Schmutz, Jeremy; Jackson, Scott A; McClean, Phillip E; Cregan, Perry B

    2015-08-28

    A total of 992,682 single-nucleotide polymorphisms (SNPs) was identified as ideal for Illumina Infinium II BeadChip design after sequencing a diverse set of 17 common bean (Phaseolus vulgaris L) varieties with the aid of next-generation sequencing technology. From these, two BeadChips each with >5000 SNPs were designed. The BARCBean6K_1 BeadChip was selected for the purpose of optimizing polymorphism among market classes and, when possible, SNPs were targeted to sequence scaffolds in the Phaseolus vulgaris 14× genome assembly with sequence lengths >10 kb. The BARCBean6K_2 BeadChip was designed with the objective of anchoring additional scaffolds and to facilitate orientation of large scaffolds. Analysis of 267 F2 plants from a cross of varieties Stampede × Red Hawk with the two BeadChips resulted in linkage maps with a total of 7040 markers including 7015 SNPs. With the linkage map, a total of 432.3 Mb of sequence from 2766 scaffolds was anchored to create the Phaseolus vulgaris v1.0 assembly, which accounted for approximately 89% of the 487 Mb of available sequence scaffolds of the Phaseolus vulgaris v0.9 assembly. A core set of 6000 SNPs (BARCBean6K_3 BeadChip) with high genotyping quality and polymorphism was selected based on the genotyping of 365 dry bean and 134 snap bean accessions with the BARCBean6K_1 and BARCBean6K_2 BeadChips. The BARCBean6K_3 BeadChip is a useful tool for genetics and genomics research and it is widely used by breeders and geneticists in the United States and abroad. Copyright © 2015 Song et al.

  20. In silico identification of genetic variants in glucocerebrosidase (GBA) gene involved in Gaucher's disease using multiple software tools.

    PubMed

    Manickam, Madhumathi; Ravanan, Palaniyandi; Singh, Pratibha; Talwar, Priti

    2014-01-01

    Gaucher's disease (GD) is an autosomal recessive disorder caused by the deficiency of glucocerebrosidase, a lysosomal enzyme that catalyses the hydrolysis of the glycolipid glucocerebroside to ceramide and glucose. Polymorphisms in the GBA gene have been associated with the development of Gaucher's disease. We hypothesize that predicting SNPs with multiple state-of-the-art software tools will increase confidence in the identification of SNPs involved in GD. Enzyme replacement therapy is the only option for GD. Our goal is to use several state-of-the-art SNP algorithms to predict harmful SNPs using comparative studies. In this study, seven different algorithms (SIFT, MutPred, nsSNP Analyzer, PANTHER, PMUT, PROVEAN, and SNPs&GO) were used to predict the harmful polymorphisms. Among the seven programs, SIFT found 47 nsSNPs to be deleterious and MutPred found 46 nsSNPs to be harmful. nsSNP Analyzer found 43 of the 47 nsSNPs to be disease-causing, PANTHER found 32 of 47 to be highly deleterious, PMUT classified 22 of 47 as pathological mutations, PROVEAN predicted 44 of 47 to be deleterious, and SNPs&GO predicted all 47 to be disease-related. Twenty-two nsSNPs were predicted by all seven algorithms. These 22 mutations are F251L, C342G, W312C, P415R, R463C, D127V, A309V, G46E, G202E, P391L, Y363C, Y205C, W378C, I402T, S366R, F397S, Y418C, P401L, G195E, W184R, R48W, and T43R.

  1. Explaining the disease phenotype of intergenic SNP through predicted long range regulation

    PubMed Central

    Chen, Jingqi; Tian, Weidong

    2016-01-01

    Thousands of disease-associated SNPs (daSNPs) are located in intergenic regions (IGR), making it difficult to understand their association with disease phenotypes. Recent analysis found that non-coding daSNPs were frequently located in or approximate to regulatory elements, inspiring us to try to explain the disease phenotypes of IGR daSNPs through nearby regulatory sequences. Hence, after locating the nearest distal regulatory element (DRE) to a given IGR daSNP, we applied a computational method named INTREPID to predict the target genes regulated by the DRE, and then investigated their functional relevance to the IGR daSNP's disease phenotypes. 36.8% of all IGR daSNP-disease phenotype associations investigated were possibly explainable through the predicted target genes, which were enriched with, were functionally relevant to, or consisted of the corresponding disease genes. This proportion could be further increased to 60.5% if the LD SNPs of daSNPs were also considered. Furthermore, the predicted SNP-target gene pairs were enriched with known eQTL/mQTL SNP-gene relationships. Overall, it's likely that IGR daSNPs may contribute to disease phenotypes by interfering with the regulatory function of their nearby DREs and causing abnormal expression of disease genes. PMID:27280978

  2. A SNP resource for Douglas-fir: de novo transcriptome assembly and SNP detection and validation.

    PubMed

    Howe, Glenn T; Yu, Jianbin; Knaus, Brian; Cronn, Richard; Kolpak, Scott; Dolan, Peter; Lorenz, W Walter; Dean, Jeffrey F D

    2013-02-28

    Douglas-fir (Pseudotsuga menziesii), one of the most economically and ecologically important tree species in the world, also has one of the largest tree breeding programs. Although the coastal and interior varieties of Douglas-fir (vars. menziesii and glauca) are native to North America, the coastal variety is also widely planted for timber production in Europe, New Zealand, Australia, and Chile. Our main goal was to develop a SNP resource large enough to facilitate genomic selection in Douglas-fir breeding programs. To accomplish this, we developed a 454-based reference transcriptome for coastal Douglas-fir, annotated and evaluated the quality of the reference, identified putative SNPs, and then validated a sample of those SNPs using the Illumina Infinium genotyping platform. We assembled a reference transcriptome consisting of 25,002 isogroups (unique gene models) and 102,623 singletons from 2.76 million 454 and Sanger cDNA sequences from coastal Douglas-fir. We identified 278,979 unique SNPs by mapping the 454 and Sanger sequences to the reference, and by mapping four datasets of Illumina cDNA sequences from multiple seed sources, genotypes, and tissues. The Illumina datasets represented coastal Douglas-fir (64.00 and 13.41 million reads), interior Douglas-fir (80.45 million reads), and a Yakima population similar to interior Douglas-fir (8.99 million reads). We assayed 8067 SNPs on 260 trees using an Illumina Infinium SNP genotyping array. Of these SNPs, 5847 (72.5%) were called successfully and were polymorphic. Based on our validation efficiency, our SNP database may contain as many as ~200,000 true SNPs, and as many as ~69,000 SNPs that could be genotyped at ~20,000 gene loci using an Infinium II array, more SNPs than are needed to use genomic selection in tree breeding programs. Ultimately, these genomic resources will enhance Douglas-fir breeding and allow us to better understand landscape-scale patterns of genetic variation and potential responses to climate change.
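
    The validation figures in this abstract can be checked with simple arithmetic. The abstract does not say how the ~200,000 estimate was derived; the proportional extrapolation below is an assumption that happens to give a figure in the same range, and the function names are hypothetical.

```python
def validation_rate(validated, assayed):
    """Fraction of assayed SNPs that were called successfully and were polymorphic."""
    return validated / assayed


def extrapolate_true_snps(candidates, rate):
    """Assumed proportional scaling of the validation rate to all candidate SNPs."""
    return candidates * rate


# Numbers taken from the abstract.
rate = validation_rate(validated=5847, assayed=8067)
estimate = extrapolate_true_snps(candidates=278_979, rate=rate)

print(f"validation rate: {rate:.1%}")          # 72.5%
print(f"extrapolated true SNPs: {estimate:,.0f}")
```

    The extrapolated count comes out at roughly 202,000, consistent with the "as many as ~200,000 true SNPs" stated above.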

  3. A SNP resource for Douglas-fir: de novo transcriptome assembly and SNP detection and validation

    PubMed Central

    2013-01-01

    Background: Douglas-fir (Pseudotsuga menziesii), one of the most economically and ecologically important tree species in the world, also has one of the largest tree breeding programs. Although the coastal and interior varieties of Douglas-fir (vars. menziesii and glauca) are native to North America, the coastal variety is also widely planted for timber production in Europe, New Zealand, Australia, and Chile. Our main goal was to develop a SNP resource large enough to facilitate genomic selection in Douglas-fir breeding programs. To accomplish this, we developed a 454-based reference transcriptome for coastal Douglas-fir, annotated and evaluated the quality of the reference, identified putative SNPs, and then validated a sample of those SNPs using the Illumina Infinium genotyping platform. Results: We assembled a reference transcriptome consisting of 25,002 isogroups (unique gene models) and 102,623 singletons from 2.76 million 454 and Sanger cDNA sequences from coastal Douglas-fir. We identified 278,979 unique SNPs by mapping the 454 and Sanger sequences to the reference, and by mapping four datasets of Illumina cDNA sequences from multiple seed sources, genotypes, and tissues. The Illumina datasets represented coastal Douglas-fir (64.00 and 13.41 million reads), interior Douglas-fir (80.45 million reads), and a Yakima population similar to interior Douglas-fir (8.99 million reads). We assayed 8067 SNPs on 260 trees using an Illumina Infinium SNP genotyping array. Of these SNPs, 5847 (72.5%) were called successfully and were polymorphic. Conclusions: Based on our validation efficiency, our SNP database may contain as many as ~200,000 true SNPs, and as many as ~69,000 SNPs that could be genotyped at ~20,000 gene loci using an Infinium II array, more SNPs than are needed to use genomic selection in tree breeding programs. Ultimately, these genomic resources will enhance Douglas-fir breeding and allow us to better understand landscape-scale patterns of genetic variation and potential responses to climate change. PMID:23445355

  4. Significant SNPs have limited prediction ability for thyroid cancer

    PubMed Central

    Guo, Shicheng; Wang, Yu-Long; Li, Yi; Jin, Li; Xiong, Momiao; Ji, Qing-Hai; Wang, Jiucun

    2014-01-01

    Recently, five thyroid cancer significantly associated genetic variants (rs965513, rs944289, rs116909374, rs966423, and rs2439302) have been discovered and validated in two independent GWAS and numerous case–control studies, which were conducted in different populations. We genotyped the above five single nucleotide polymorphisms (SNPs) in Han Chinese populations and performed thyroid cancer-risk predictions with nine machine learning methods. We found that four SNPs were significantly associated with thyroid cancer in Han Chinese population, while no polymorphism was observed for rs116909374. Small familial relative risks (1.02–1.05) and limited power to predict thyroid cancer (AUCs: 0.54–0.60) indicate limited clinical potential. Four significant SNPs have limited prediction ability for thyroid cancer. PMID:24591304

  5. Explaining the disease phenotype of intergenic SNP through predicted long range regulation.

    PubMed

    Chen, Jingqi; Tian, Weidong

    2016-10-14

    Thousands of disease-associated SNPs (daSNPs) are located in intergenic regions (IGR), making it difficult to understand their association with disease phenotypes. Recent analysis found that non-coding daSNPs were frequently located in or approximate to regulatory elements, inspiring us to try to explain the disease phenotypes of IGR daSNPs through nearby regulatory sequences. Hence, after locating the nearest distal regulatory element (DRE) to a given IGR daSNP, we applied a computational method named INTREPID to predict the target genes regulated by the DRE, and then investigated their functional relevance to the IGR daSNP's disease phenotypes. 36.8% of all IGR daSNP-disease phenotype associations investigated were possibly explainable through the predicted target genes, which were enriched with, were functionally relevant to, or consisted of the corresponding disease genes. This proportion could be further increased to 60.5% if the LD SNPs of daSNPs were also considered. Furthermore, the predicted SNP-target gene pairs were enriched with known eQTL/mQTL SNP-gene relationships. Overall, it's likely that IGR daSNPs may contribute to disease phenotypes by interfering with the regulatory function of their nearby DREs and causing abnormal expression of disease genes. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Study on the introgression of beef breeds in Canchim cattle using single nucleotide polymorphism markers

    PubMed Central

    Buzanskas, Marcos Eli; Ventura, Ricardo Vieira; Seleguim Chud, Tatiane Cristina; Bernardes, Priscila Arrigucci; Santos, Daniel Jordan de Abreu; Regitano, Luciana Correia de Almeida; de Alencar, Maurício Mello; Mudadu, Maurício de Alvarenga; Zanella, Ricardo; da Silva, Marcos Vinícius Gualberto Barbosa; Li, Changxi; Schenkel, Flavio Schramm; Munari, Danísio Prado

    2017-01-01

    The aim of this study was to evaluate the level of introgression of breeds in the Canchim (CA: 62.5% Charolais-37.5% Zebu) and MA genetic group (MA: 65.6% Charolais-34.4% Zebu) cattle using genomic information on Charolais (CH), Nelore (NE), and Indubrasil (IB) breeds. The number of animals used was 395 (CA and MA), 763 (NE), 338 (CH), and 37 (IB). The Bovine50SNP BeadChip panel from Illumina was used to estimate the levels of introgression of breeds considering the Maximum likelihood, Bayesian, and Single Regression methods. After genotype quality control, 32,308 SNPs were considered in the analysis. Furthermore, three thresholds to prune out SNPs in linkage disequilibrium higher than 0.10, 0.05, and 0.01 were considered, resulting in 15,286, 7,652, and 1,582 SNPs, respectively. For k = 2, the proportion of taurine and indicine varied from the expected proportion based on pedigree for all methods studied. For k = 3, the Regression method was able to differentiate the animals into three main clusters assigned to each purebred breed, giving results that were more reasonable from a biological viewpoint. Analyzing the data considering k = 2 seems to be more appropriate for Canchim-MA animals due to its biological interpretation. The usage of 32,308 SNPs in the analyses resulted in similar findings between the estimated and expected breed proportions. Using the Regression approach, a contribution of Indubrasil was observed in Canchim-MA when k = 3 was considered. Genetic parameter estimation could account for this breed composition information as a source of variation in order to improve the accuracy of genetic models. Our findings may help assemble appropriate reference populations for genomic prediction for Canchim-MA in order to improve prediction accuracy. Using the information on the level of introgression in each individual could also be useful in breeding or crossing design to improve individual heterosis in crossbred cattle. PMID:28182737

  7. Study on the introgression of beef breeds in Canchim cattle using single nucleotide polymorphism markers.

    PubMed

    Buzanskas, Marcos Eli; Ventura, Ricardo Vieira; Seleguim Chud, Tatiane Cristina; Bernardes, Priscila Arrigucci; Santos, Daniel Jordan de Abreu; Regitano, Luciana Correia de Almeida; Alencar, Maurício Mello de; Mudadu, Maurício de Alvarenga; Zanella, Ricardo; da Silva, Marcos Vinícius Gualberto Barbosa; Li, Changxi; Schenkel, Flavio Schramm; Munari, Danísio Prado

    2017-01-01

    The aim of this study was to evaluate the level of introgression of breeds in the Canchim (CA: 62.5% Charolais-37.5% Zebu) and MA genetic group (MA: 65.6% Charolais-34.4% Zebu) cattle using genomic information on Charolais (CH), Nelore (NE), and Indubrasil (IB) breeds. The number of animals used was 395 (CA and MA), 763 (NE), 338 (CH), and 37 (IB). The Bovine50SNP BeadChip panel from Illumina was used to estimate the levels of introgression of breeds considering the Maximum likelihood, Bayesian, and Single Regression methods. After genotype quality control, 32,308 SNPs were considered in the analysis. Furthermore, three thresholds to prune out SNPs in linkage disequilibrium higher than 0.10, 0.05, and 0.01 were considered, resulting in 15,286, 7,652, and 1,582 SNPs, respectively. For k = 2, the proportion of taurine and indicine varied from the expected proportion based on pedigree for all methods studied. For k = 3, the Regression method was able to differentiate the animals into three main clusters assigned to each purebred breed, giving results that were more reasonable from a biological viewpoint. Analyzing the data considering k = 2 seems to be more appropriate for Canchim-MA animals due to its biological interpretation. The usage of 32,308 SNPs in the analyses resulted in similar findings between the estimated and expected breed proportions. Using the Regression approach, a contribution of Indubrasil was observed in Canchim-MA when k = 3 was considered. Genetic parameter estimation could account for this breed composition information as a source of variation in order to improve the accuracy of genetic models. Our findings may help assemble appropriate reference populations for genomic prediction for Canchim-MA in order to improve prediction accuracy. Using the information on the level of introgression in each individual could also be useful in breeding or crossing design to improve individual heterosis in crossbred cattle.

  8. Genomic prediction of piglet response to infection with one of two porcine reproductive and respiratory syndrome virus isolates.

    PubMed

    Waide, Emily H; Tuggle, Christopher K; Serão, Nick V L; Schroyen, Martine; Hess, Andrew; Rowland, Raymond R R; Lunney, Joan K; Plastow, Graham; Dekkers, Jack C M

    2018-02-01

    Genomic prediction of the pig's response to the porcine reproductive and respiratory syndrome (PRRS) virus (PRRSV) would be a useful tool in the swine industry. This study investigated the accuracy of genomic prediction based on porcine SNP60 Beadchip data using training and validation datasets from populations with different genetic backgrounds that were challenged with different PRRSV isolates. Genomic prediction accuracy averaged 0.34 for viral load (VL) and 0.23 for weight gain (WG) following experimental PRRSV challenge, which demonstrates that genomic selection could be used to improve response to PRRSV infection. Training on WG data during infection with a less virulent PRRSV, KS06, resulted in poor accuracy of prediction for WG during infection with a more virulent PRRSV, NVSL. Inclusion of single nucleotide polymorphisms (SNPs) that are in linkage disequilibrium with a major quantitative trait locus (QTL) on chromosome 4 was vital for accurate prediction of VL. Overall, SNPs that were significantly associated with either trait in single SNP genome-wide association analysis were unable to predict the phenotypes with an accuracy as high as that obtained by using all genotyped SNPs across the genome. Inclusion of data from close relatives into the training population increased whole genome prediction accuracy by 33% for VL and by 37% for WG but did not affect the accuracy of prediction when using only SNPs in the major QTL region. Results show that genomic prediction of response to PRRSV infection is moderately accurate and, when using all SNPs on the porcine SNP60 Beadchip, is not very sensitive to differences in virulence of the PRRSV in training and validation populations. Including close relatives in the training population increased prediction accuracy when using the whole genome or SNPs other than those near a major QTL.

  9. GESPA: classifying nsSNPs to predict disease association.

    PubMed

    Khurana, Jay K; Reeder, Jay E; Shrimpton, Antony E; Thakar, Juilee

    2015-07-25

    Non-synonymous single nucleotide polymorphisms (nsSNPs) are the most common DNA sequence variation associated with disease in humans. Thus determining the clinical significance of each nsSNP is of great importance. Potential detrimental nsSNPs may be identified by genetic association studies or by functional analysis in the laboratory, both of which are expensive and time consuming. Existing computational methods lack accuracy and features to facilitate nsSNP classification for clinical use. We developed the GESPA (GEnomic Single nucleotide Polymorphism Analyzer) program to predict the pathogenicity and disease phenotype of nsSNPs. GESPA is a user-friendly software package for classifying disease association of nsSNPs. It allows flexibility in acceptable input formats and predicts the pathogenicity of a given nsSNP by assessing the conservation of amino acids in orthologs and paralogs and supplementing this information with data from medical literature. The development and testing of GESPA was performed using the humsavar, ClinVar and humvar datasets. Additionally, GESPA also predicts the disease phenotype associated with a nsSNP with high accuracy, a feature unavailable in existing software. GESPA's overall accuracy exceeds existing computational methods for predicting nsSNP pathogenicity. The usability of GESPA is enhanced by fast SQL-based cloud storage and retrieval of data. GESPA is a novel bioinformatics tool to determine the pathogenicity and phenotypes of nsSNPs. We anticipate that GESPA will become a useful clinical framework for predicting the disease association of nsSNPs. The program, executable jar file, source code, GPL 3.0 license, user guide, and test data with instructions are available at http://sourceforge.net/projects/gespa.

  10. Single Nucleotide Polymorphisms of Stemness Genes Predicted to Regulate RNA Splicing, microRNA and Oncogenic Signaling are Associated with Prostate Cancer Survival.

    PubMed

    Freedman, Jennifer A; Wang, Yanru; Li, Xuechan; Liu, Hongliang; Moorman, Patricia G; George, Daniel J; Lee, Norman H; Hyslop, Terry; Wei, Qingyi; Patierno, Steven R

    2018-05-03

    Prostate cancer is a clinically and molecularly heterogeneous disease, with variation in outcomes only partially predicted by grade and stage. Additional tools to distinguish indolent from aggressive disease are needed. Phenotypic characteristics of stemness correlate with poor cancer prognosis. Given this correlation, we identified single nucleotide polymorphisms (SNPs) of stemness-related genes and examined their associations with prostate cancer survival. SNPs within stemness-related genes were analyzed for association with overall survival of prostate cancer in the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial. Significant SNPs predicted to be functional were selected for linkage disequilibrium analysis and combined and stratified analyses. Identified SNPs were evaluated for association with gene expression. SNPs of CD44 (rs9666607), ABCC1 (rs35605 and rs212091) and GDF15 (rs1058587) were associated with prostate cancer survival and predicted to be functional. A role for rs9666607 of CD44 and rs35605 of ABCC1 in RNA splicing regulation, rs212091 of ABCC1 in miRNA binding site activity and rs1058587 of GDF15 in causing an amino acid change was predicted. These SNPs represent potential novel prognostic markers for overall survival of prostate cancer and support a contribution of the stemness pathway to prostate cancer patient outcome.

  11. Development of genetic markers in abalone through construction of a SNP database.

    PubMed

    Kang, J-H; Appleyard, S A; Elliott, N G; Jee, Y-J; Lee, J B; Kang, S W; Baek, M K; Han, Y S; Choi, T-J; Lee, Y S

    2011-06-01

    In the absence of a reference genome, single-nucleotide polymorphism (SNP) discovery in a group of abalone species was undertaken by random sequence assembly. A web-based interface was constructed, and 11 932 DNA sequences from the genus Haliotis were assembled, with 1321 contigs built. Of these, 118 contigs that consisted of at least ten annotation groups were selected. A total of 1577 putative SNPs were identified from the 118 contigs, with SNPs in several HSP70 gene contigs confirmed by PCR amplification of an 809-bp DNA fragment. SNPs in the HSP70 gene were compared across eight abalone species. A total of 129 polymorphic sites, including heterozygote sites within and among species, were observed. Phylogenetic analysis of the partial HSP70 gene region showed separation of the tested abalone into two groups, one reflecting the southern hemisphere species and the other the northern hemisphere species. Interestingly, Haliotis iris from New Zealand showed a closer relationship to species distributed in the northern Pacific region. Although HSP genes are known to be highly conserved among taxa, the validation of polymorphic SNPs from HSP70 in this mollusc demonstrates the applicability of cross-species SNP markers in abalone and the first step towards universal nuclear markers in Haliotis. © 2010 NFRDI, Animal Genetics © 2010 Stichting International Foundation for Animal Genetics.

  12. SNPServer: a real-time SNP discovery tool.

    PubMed

    Savage, David; Batley, Jacqueline; Erwin, Tim; Logan, Erica; Love, Christopher G; Lim, Geraldine A C; Mongin, Emmanuel; Barker, Gary; Spangenberg, German C; Edwards, David

    2005-07-01

    SNPServer is a real-time flexible tool for the discovery of SNPs (single nucleotide polymorphisms) within DNA sequence data. The program uses BLAST to identify related sequences and CAP3 to cluster and align these sequences. The alignments are passed to the SNP discovery software autoSNP, a program that detects SNPs and insertion/deletion polymorphisms (indels). Alternatively, lists of related sequences or pre-assembled sequences may be entered for SNP discovery. SNPServer and autoSNP use redundancy to differentiate between candidate SNPs and sequence errors. For each candidate SNP, two measures of confidence are calculated: the redundancy of the polymorphism at a SNP locus and the co-segregation of the candidate SNP with other SNPs in the alignment. SNPServer is available at http://hornbill.cspp.latrobe.edu.au/snpdiscovery.html.
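
    The redundancy criterion described in this abstract can be illustrated with a minimal Python sketch. It assumes reads that are already aligned to equal length, and it treats a column as a candidate SNP only when at least two different bases are each supported by several reads. The function name `candidate_snps` and the threshold `min_redundancy=2` are hypothetical choices for illustration, not autoSNP's actual interface or defaults, and the co-segregation score is not modelled here.

```python
from collections import Counter

def candidate_snps(aligned_reads, min_redundancy=2):
    """Return (column, allele_counts) for every alignment column in which
    at least two different bases are each seen in >= min_redundancy reads.
    Assumption: reads are pre-aligned and padded to the same length."""
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    hits = []
    for col in range(length):
        # Count only unambiguous bases; gaps and N are ignored.
        counts = Counter(read[col] for read in aligned_reads if read[col] in "ACGT")
        supported = {base: n for base, n in counts.items() if n >= min_redundancy}
        if len(supported) >= 2:
            hits.append((col, supported))
    return hits

# Hypothetical toy alignment: column 2 is a G/C polymorphism seen in
# several reads; column 7 has a single A that looks like a read error.
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACCTACGT",
    "ACCTACGA",
    "ACGTACGT",
]
snps = candidate_snps(reads)
```

    In this toy input only column 2 is reported, because the lone A in the last column is not seen in a second read and is therefore treated as a likely sequencing error.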

  13. SEAN: SNP prediction and display program utilizing EST sequence clusters.

    PubMed

    Huntley, Derek; Baldo, Angela; Johri, Saurabh; Sergot, Marek

    2006-02-15

    SEAN is an application that predicts single nucleotide polymorphisms (SNPs) using multiple sequence alignments produced from expressed sequence tag (EST) clusters. The algorithm uses rules of sequence identity and SNP abundance to determine the quality of the prediction. A Java viewer is provided to display the EST alignments and predicted SNPs.

  14. Genetic prediction of type 2 diabetes using deep neural network.

    PubMed

    Kim, J; Kim, J; Kwak, M J; Bajaj, M

    2018-04-01

    Type 2 diabetes (T2DM) has strong heritability but genetic models to explain heritability have been challenging. We tested a deep neural network (DNN) to predict T2DM using the nested case-control studies of the Nurses' Health Study (3326 females, 45.6% T2DM) and the Health Professionals Follow-up Study (2502 males, 46.5% T2DM). We selected 96, 214, 399, and 678 single-nucleotide polymorphisms (SNPs) through Fisher's exact test and L1-penalized logistic regression. We split each dataset randomly 4:1 to train prediction models and test their performance. DNN and logistic regressions showed better area under the ROC curve (AUC) than the clinical model when 399 or more SNPs were included. DNN was superior to logistic regressions in AUC with 399 or more SNPs in males and 678 SNPs in females. Addition of clinical factors consistently increased AUC of DNN but failed to improve logistic regressions with 214 or more SNPs. In conclusion, we show that DNN can be a versatile tool to predict T2DM incorporating large numbers of SNPs and clinical information. Limitations include a relatively small number of subjects, mostly of European ethnicity. Further studies are warranted to confirm and improve performance of genetic prediction models using DNN in different ethnic groups. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
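
    The evaluation protocol in this abstract (a random 4:1 split, then comparison by AUC) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: `split_4_to_1` and `roc_auc` are hypothetical helper names, and the labels and scores below are invented toy values.

```python
import random

def split_4_to_1(items, seed=0):
    """Shuffle the items and return (train, test) in a 4:1 ratio."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = (len(shuffled) * 4) // 5
    return shuffled[:cut], shuffled[cut:]

def roc_auc(labels, scores):
    """AUC computed as the probability that a randomly chosen case (label 1)
    receives a higher score than a randomly chosen control (label 0);
    ties count as one half."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Toy data: ten sample indices split 8/2, and four scored test subjects.
train, test = split_4_to_1(range(10))
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

    Three of the four case/control pairs are ranked correctly in the toy scores, so the AUC is 0.75; an AUC of 0.5 would correspond to a model that ranks no better than chance.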

  15. The versican gene and the risk of intracranial aneurysms.

    PubMed

    Ruigrok, Ynte M; Rinkel, Gabriël J E; Wijmenga, Cisca

    2006-09-01

    The proteoglycan versican is an excellent candidate gene for intracranial aneurysms (IAs) because it plays an important role in extracellular matrix assembly and is localized in a previously implicated locus for IAs on chromosome 5q. We analyzed all the common variations using 16 tag single nucleotide polymorphisms (SNPs) and haplotypes in the versican gene using a 2-stage genotyping approach. For stage 1, 16 SNPs were genotyped in 307 cases and 639 controls. For stage 2, the two SNPs yielding the most significant associations (P<0.01) were genotyped in a second independent cohort of 310 cases for confirmation of the associations. In stage 1, we found several SNPs in strong linkage disequilibrium and haplotypes constituting these SNPs associated with IAs in the Dutch population (strongest SNP association for rs173686 with odds ratio=1.34, 95% CI=1.09 to 1.65, P=0.004). In stage 2, we confirmed association for the 2 SNPs with the most significant associations (strongest SNP association for rs173686 with odds ratio=1.36, 95% CI=1.11 to 1.67, P=0.003). SNPs in strong linkage disequilibrium and haplotypes constituting these SNPs in the versican gene are associated with IAs, suggesting that variation in or near the versican gene plays a role in susceptibility to IAs.

  16. Associations between Potentially Modifiable Risk Factors and Alzheimer Disease: A Mendelian Randomization Study

    PubMed Central

    Østergaard, Søren D.; Mukherjee, Shubhabrata; Sharp, Stephen J.; Proitsi, Petroula; Lotta, Luca A.; Day, Felix; Perry, John R. B.; Boehme, Kevin L.; Walter, Stefan; Kauwe, John S.; Gibbons, Laura E.; Larson, Eric B.; Powell, John F.; Langenberg, Claudia; Crane, Paul K.; Wareham, Nicholas J.; Scott, Robert A.

    2015-01-01

    Background: Potentially modifiable risk factors including obesity, diabetes, hypertension, and smoking are associated with Alzheimer disease (AD) and represent promising targets for intervention. However, the causality of these associations is unclear. We sought to assess the causal nature of these associations using Mendelian randomization (MR). Methods and Findings: We used SNPs associated with each risk factor as instrumental variables in MR analyses. We considered type 2 diabetes (T2D, N SNPs = 49), fasting glucose (N SNPs = 36), insulin resistance (N SNPs = 10), body mass index (BMI, N SNPs = 32), total cholesterol (N SNPs = 73), HDL-cholesterol (N SNPs = 71), LDL-cholesterol (N SNPs = 57), triglycerides (N SNPs = 39), systolic blood pressure (SBP, N SNPs = 24), smoking initiation (N SNPs = 1), smoking quantity (N SNPs = 3), university completion (N SNPs = 2), and years of education (N SNPs = 1). We calculated MR estimates of associations between each exposure and AD risk using an inverse-variance weighted approach, with summary statistics of SNP–AD associations from the International Genomics of Alzheimer’s Project, comprising a total of 17,008 individuals with AD and 37,154 cognitively normal elderly controls. We found that genetically predicted higher SBP was associated with lower AD risk (odds ratio [OR] per standard deviation [15.4 mm Hg] of SBP [95% CI]: 0.75 [0.62–0.91]; p = 3.4 × 10^−3). Genetically predicted higher SBP was also associated with a higher probability of taking antihypertensive medication (p = 6.7 × 10^−8). Genetically predicted smoking quantity was associated with lower AD risk (OR per ten cigarettes per day [95% CI]: 0.67 [0.51–0.89]; p = 6.5 × 10^−3), although we were unable to stratify by smoking history; genetically predicted smoking initiation was not associated with AD risk (OR = 0.70 [0.37, 1.33]; p = 0.28). We saw no evidence of causal associations between glycemic traits, T2D, BMI, or educational attainment and risk of AD (all p > 0.1). Potential limitations of this study include the small proportion of intermediate trait variance explained by genetic variants and other implicit limitations of MR analyses. Conclusions: Inherited lifetime exposure to higher SBP is associated with lower AD risk. These findings suggest that higher blood pressure, or some environmental exposure associated with higher blood pressure such as use of antihypertensive medications, may reduce AD risk. PMID:26079503
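
    The inverse-variance weighted (IVW) approach named in this abstract combines per-SNP ratio estimates into one causal estimate. The sketch below shows the standard fixed-effect IVW formula on summary statistics. It is an illustration only: the function name `ivw_estimate` and all effect sizes are hypothetical, and the abstract does not state which IVW variant or software the authors used.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW Mendelian randomization estimate.

    Each SNP j gives a ratio estimate beta_outcome[j] / beta_exposure[j],
    weighted by beta_exposure[j]**2 / se_outcome[j]**2.
    Returns (estimate, standard_error) on the log-odds scale."""
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have one entry per SNP")
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    std_error = math.sqrt(1.0 / total)
    return estimate, std_error

# Hypothetical summary statistics for three instruments (not from the study).
bx = [0.1, 0.2, 0.4]          # SNP effect on the exposure
by = [-0.02, -0.04, -0.08]    # SNP effect on the outcome (log odds)
se = [0.01, 0.01, 0.02]       # standard error of the outcome effect

estimate, std_error = ivw_estimate(bx, by, se)
odds_ratio = math.exp(estimate)
```

    With these toy inputs every SNP implies the same ratio (about -0.2), so the pooled estimate is also about -0.2 and the corresponding odds ratio is below 1. In real data the per-SNP ratios differ, and the weights let the most precisely estimated instruments dominate.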

  17. Using RNA-Seq to assemble a rose transcriptome with more than 13,000 full-length expressed genes and to develop the WagRhSNP 68k Axiom SNP array for rose (Rosa L.).

    PubMed

    Koning-Boucoiran, Carole F S; Esselink, G Danny; Vukosavljev, Mirjana; van 't Westende, Wendy P C; Gitonga, Virginia W; Krens, Frans A; Voorrips, Roeland E; van de Weg, W Eric; Schulz, Dietmar; Debener, Thomas; Maliepaard, Chris; Arens, Paul; Smulders, Marinus J M

    2015-01-01

    In order to develop a versatile and large SNP array for rose, we set out to mine ESTs from diverse sets of rose germplasm. For this, RNA-Seq libraries containing about 700 million reads were generated from tetraploid cut and garden roses using Illumina paired-end sequencing, and from diploid Rosa multiflora using 454 sequencing. Separate de novo assemblies were performed in order to identify single nucleotide polymorphisms (SNPs) within and between rose varieties. SNPs among tetraploid roses were selected for constructing a genotyping array that can be employed for genetic mapping and marker-trait association discovery in breeding programs based on tetraploid germplasm, both from cut roses and from garden roses. In total, 68,893 SNPs were included on the WagRhSNP Axiom array. Next, an orthology-guided assembly was performed for the construction of a non-redundant rose transcriptome database. A total of 21,740 transcripts had significant hits with orthologous genes in the strawberry (Fragaria vesca L.) genome. Of these, 13,390 appeared to contain the full-length coding regions. This newly established transcriptome resource adds considerably to the currently available sequence resources for the Rosaceae family in general and the genus Rosa in particular.

  18. Comparing strategies for selection of low-density SNPs for imputation-mediated genomic prediction in U. S. Holsteins.

    PubMed

    He, Jun; Xu, Jiaqi; Wu, Xiao-Lin; Bauck, Stewart; Lee, Jungjae; Morota, Gota; Kachman, Stephen D; Spangler, Matthew L

    2018-04-01

    SNP chips are commonly used for genotyping animals in genomic selection but strategies for selecting low-density (LD) SNPs for imputation-mediated genomic selection have not been addressed adequately. The main purpose of the present study was to compare the performance of eight LD (6K) SNP panels, each selected by a different strategy exploiting a combination of three major factors: evenly-spaced SNPs, increased minor allele frequencies, and SNP-trait associations either for single traits independently or for all the three traits jointly. The imputation accuracies from 6K to 80K SNP genotypes were between 96.2 and 98.2%. Genomic prediction accuracies obtained using imputed 80K genotypes were between 0.817 and 0.821 for daughter pregnancy rate, between 0.838 and 0.844 for fat yield, and between 0.850 and 0.863 for milk yield. The two SNP panels optimized on the three major factors had the highest genomic prediction accuracy (0.821-0.863), and these accuracies were very close to those obtained using observed 80K genotypes (0.825-0.868). Further exploration of the underlying relationships showed that genomic prediction accuracies did not respond linearly to imputation accuracies, but were significantly affected by genotype (imputation) errors of SNPs in association with the traits to be predicted. SNPs optimal for map coverage and MAF were favorable for obtaining accurate imputation of genotypes whereas trait-associated SNPs improved genomic prediction accuracies. Thus, optimal LD SNP panels were the ones that combined both strengths. The present results have practical implications on the design of LD SNP chips for imputation-enabled genomic prediction.

  19. Single-Nucleotide Polymorphisms Within the Thrombomodulin Gene (THBD) Predict Mortality in Patients With Graft-Versus-Host Disease.

    PubMed

    Rachakonda, Sivaramakrishna P; Penack, Olaf; Dietrich, Sascha; Blau, Olga; Blau, Igor Wolfgang; Radujkovic, Aleksandar; Isermann, Berend; Ho, Anthony D; Uharek, Lutz; Dreger, Peter; Kumar, Rajiv; Luft, Thomas

    2014-10-20

    Steroid-refractory graft-versus-host disease (GVHD) is a major and often fatal complication after allogeneic stem-cell transplantation (alloSCT). Although the pathophysiology of steroid refractoriness is not fully understood, evidence is accumulating that endothelial cell stress is involved, and endothelial thrombomodulin (THBD) plays a role in this process. Here we assess whether single-nucleotide polymorphisms (SNPs) within the THBD gene predict outcome after alloSCT. Seven SNPs within the THBD gene were studied (rs1962, rs1042579, rs1042580, rs3176123, rs3176124, rs3176126, and rs3176134) in a training cohort of 306 patients. The relevant genotypes were then validated in an independent cohort (n = 321). In the training cohort, an increased risk of nonrelapse mortality (NRM) was associated with three of seven SNPs tested: rs1962, rs1042579 (in linkage disequilibrium with rs3176123), and rs1042580. When patients were divided into risk groups (one v no high-risk SNP), a strong correlation with NRM was observed (hazard ratio [HR], 2.31; 95% CI, 1.36 to 3.95; P = .002). More specifically, NRM was predicted by THBD SNPs in patients who later developed GVHD (HR, 3.03; 95% CI, 1.61 to 5.68; P < .001) but not in patients without GVHD. In contrast, THBD SNPs did not predict incidence of acute GVHD. Multivariable analyses adjusting for clinical variables confirmed the independent effect of THBD SNPs on NRM. All findings could be reproduced in the validation cohort. THBD SNPs predict mortality of manifest GVHD but not the risk of acquiring GVHD, supporting the hypothesis that endothelial vulnerability contributes to GVHD refractoriness. © 2014 by American Society of Clinical Oncology.

  20. Combination Testing Using a Single MSH5 Variant alongside HLA Haplotypes Improves the Sensitivity of Predicting Coeliac Disease Risk in the Polish Population.

    PubMed

    Paziewska, Agnieszka; Cukrowska, Bozena; Dabrowska, Michalina; Goryca, Krzysztof; Piatkowska, Magdalena; Kluska, Anna; Mikula, Michal; Karczmarski, Jakub; Oralewska, Beata; Rybak, Anna; Socha, Jerzy; Balabas, Aneta; Zeber-Lubecka, Natalia; Ambrozkiewicz, Filip; Konopka, Ewa; Trojanowska, Ilona; Zagroba, Malgorzata; Szperl, Malgorzata; Ostrowski, Jerzy

    2015-01-01

    Assessment of non-HLA variants alongside standard HLA testing was previously shown to improve the identification of potential coeliac disease (CD) patients. We intended to identify new genetic variants associated with CD in the Polish population that would improve CD risk prediction when used alongside HLA haplotype analysis. DNA samples of 336 CD and 264 unrelated healthy controls were used to create DNA pools for a genome wide association study (GWAS). GWAS findings were validated with individual HLA tag single nucleotide polymorphism (SNP) typing of 473 patients and 714 healthy controls. Association analysis using four HLA-tagging SNPs showed that, as was found in other populations, positive predicting genotypes (HLA-DQ2.5/DQ2.5, HLA-DQ2.5/DQ2.2, and HLA-DQ2.5/DQ8) were found at higher frequencies in CD patients than in healthy control individuals in the Polish population. Both CD-associated SNPs discovered by GWAS were found in the CD susceptibility region, confirming the previously-determined association of the major histocompatibility (MHC) region with CD pathogenesis. The two most significant SNPs from the GWAS were rs9272346 (HLA-dependent; localized within 1 Kb of DQA1) and rs3130484 (HLA-independent; mapped to MSH5). Specificity of CD prediction using the four HLA-tagging SNPs achieved 92.9%, but sensitivity was only 45.5%. However, when a testing combination of the HLA-tagging SNPs and the MSH5 SNP was used, specificity decreased to 80%, and sensitivity increased to 74%. This study confirmed that improvement of CD risk prediction sensitivity could be achieved by including non-HLA SNPs alongside HLA SNPs in genetic testing.
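
    The trade-off reported in this abstract, where adding a second marker raises sensitivity at the cost of specificity, can be reproduced on toy data. The sketch below assumes an "either test positive" combination rule; the abstract does not state how the HLA and MSH5 results were combined, so that rule, the function name `sensitivity_specificity` and all genotype calls are hypothetical.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for parallel lists of 0/1 values."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    return tp / (tp + fn), tn / (tn + fp)

# Invented example: four patients followed by five healthy controls.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0]
hla    = [1, 1, 0, 0, 0, 0, 0, 0, 0]   # HLA-based test alone
msh5   = [0, 1, 1, 0, 1, 0, 0, 0, 0]   # second, HLA-independent marker

# Assumed combination rule: positive if either test is positive.
combined = [int(h or m) for h, m in zip(hla, msh5)]

hla_only = sensitivity_specificity(hla, actual)
both = sensitivity_specificity(combined, actual)
```

    On the toy data the HLA test alone finds half of the patients with no false positives, while the combined rule finds three of four patients but also flags one control, mirroring the direction of the change described above.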

  1. Prediction of gene expression with cis-SNPs using mixed models and regularization methods.

    PubMed

    Zeng, Ping; Zhou, Xiang; Huang, Shuiping

    2017-05-11

    It has been shown that gene expression in human tissues is heritable, thus predicting gene expression using only SNPs becomes possible. The prediction of gene expression can offer important implications on the genetic architecture of individual functional associated SNPs and further interpretations of the molecular basis underlying human diseases. We compared three types of methods for predicting gene expression using only cis-SNPs, including the polygenic model, i.e. linear mixed model (LMM), two sparse models, i.e. Lasso and elastic net (ENET), and the hybrid of LMM and sparse model, i.e. Bayesian sparse linear mixed model (BSLMM). The three kinds of prediction methods have very different assumptions of underlying genetic architectures. These methods were evaluated using simulations under various scenarios, and were applied to the Geuvadis gene expression data. The simulations showed that these four prediction methods (i.e. Lasso, ENET, LMM and BSLMM) behaved best when their respective modeling assumptions were satisfied, but BSLMM had a robust performance across a range of scenarios. According to the R^2 of these models in the Geuvadis data, the four methods performed quite similarly. We did not observe any clustering or enrichment of predictive genes (defined as genes with R^2 ≥ 0.05) across the chromosomes, and also did not see any clear relationship between the proportion of the predictive genes and the proportion of genes in each chromosome. However, an interesting finding in the Geuvadis data was that highly predictive genes (e.g. R^2 ≥ 0.30) may have sparse genetic architectures, since Lasso, ENET and BSLMM outperformed LMM for these genes; this observation was validated in another gene expression data set. We further showed that the predictive genes were enriched in approximately independent LD blocks. Gene expression can be predicted with only cis-SNPs using well-developed prediction models and these predictive genes were enriched in some approximately independent LD blocks. The prediction of gene expression can shed some light on the functional interpretation for identified SNPs in GWASs.

  2. Consensus generation and variant detection by Celera Assembler.

    PubMed

    Denisov, Gennady; Walenz, Brian; Halpern, Aaron L; Miller, Jason; Axelrod, Nelson; Levy, Samuel; Sutton, Granger

    2008-04-15

    We present an algorithm to identify allelic variation given a Whole Genome Shotgun (WGS) assembly of haploid sequences, and to produce a set of haploid consensus sequences rather than a single consensus sequence. Existing WGS assemblers take a column-by-column approach to consensus generation, and produce a single consensus sequence which can be inconsistent with the underlying haploid alleles, and inconsistent with any of the aligned sequence reads. Our new algorithm uses a dynamic windowing approach. It detects alleles by simultaneously processing the portions of aligned reads spanning a region of sequence variation, assigns reads to their respective alleles, phases adjacent variant alleles and generates a consensus sequence corresponding to each confirmed allele. This algorithm was used to produce the first diploid genome sequence of an individual human. It can also be applied to assemblies of multiple diploid individuals and hybrid assemblies of multiple haploid organisms. Applied to the individual human genome assembly, the new algorithm detects exactly two confirmed alleles and reports two consensus sequences in 98.98% of the 2,033,311 detected regions of sequence variation. In 33,269 out of 460,373 detected regions of size >1 bp, it fixes the constructed errors of a mosaic haploid representation of a diploid locus as produced by the original Celera Assembler consensus algorithm. Using an optimized procedure calibrated against 1 506 344 known SNPs, it detects 438 814 new heterozygous SNPs with a false positive rate of 12%. The open source code is available at: http://wgs-assembler.cvs.sourceforge.net/wgs-assembler/
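
    The difference between column-by-column consensus and window-based allele detection can be shown on a toy alignment. The sketch below is a deliberately simplified illustration, not the Celera Assembler algorithm: the window boundaries are supplied by hand rather than found dynamically, there is no phasing or allele confirmation step, and the names `column_consensus` and `window_alleles` are hypothetical.

```python
from collections import Counter, defaultdict

GAP = "."  # marks alignment positions that a read does not cover

def column_consensus(reads):
    """Majority base per column, taken independently for each column."""
    consensus = []
    for col in range(len(reads[0])):
        counts = Counter(read[col] for read in reads if read[col] != GAP)
        consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)

def window_alleles(reads, start, end):
    """Group the reads that fully span columns [start, end) by the sequence
    they carry there; each distinct sequence is one candidate allele."""
    groups = defaultdict(list)
    for index, read in enumerate(reads):
        segment = read[start:end]
        if GAP not in segment:
            groups[segment].append(index)
    return dict(groups)

# Toy diploid locus: haplotype A = ACGTTGCA, haplotype B = ACCTTACA.
# They differ at columns 2 and 5; most reads cover only one of the two sites.
reads = [
    "ACGTT...",  # A, left part
    "ACGTTGCA",  # A, full length
    "ACCTTACA",  # B, full length
    "...TTACA",  # B, right part
    "ACGT....",  # A, left part
    "....TACA",  # B, right part
]

mosaic = column_consensus(reads)
alleles = window_alleles(reads, 2, 6)
```

    Here the per-column majority yields ACGTTACA, which takes the G from haplotype A and the A from haplotype B and so matches neither full-length read. Grouping the spanning reads over columns 2 to 5 instead recovers the two segments GTTG and CTTA, one per haplotype.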

  3. Population structure of pigs determined by single nucleotide polymorphisms observed in assembled expressed sequence tags.

    PubMed

    Matsumoto, Toshimi; Okumura, Naohiko; Uenishi, Hirohide; Hayashi, Takeshi; Hamasima, Noriyuki; Awata, Takashi

    2012-01-01

    We have collected more than 190000 porcine expressed sequence tags (ESTs) from full-length complementary DNA (cDNA) libraries and identified more than 2800 single nucleotide polymorphisms (SNPs). In this study, we tentatively chose 222 SNPs observed in assembled ESTs to study pigs of different breeds; 104 were selected by comparing the cDNA sequences of a Meishan pig and samples of three-way cross pigs (Landrace, Large White, and Duroc: LWD), and 118 were selected from LWD samples. To evaluate the genetic variation between the chosen SNPs from pig breeds, we determined the genotypes for 192 pig samples (11 pig groups) from our DNA reference panel with matrix-assisted laser desorption ionization time-of-flight mass spectrometry. Of the 222 reference SNPs, 186 were successfully genotyped. A neighbor-joining tree showed that the pig groups were classified into two large clusters, namely, Euro-American and East Asian pig populations. F-statistics and the analysis of molecular variance of Euro-American pig groups revealed that approximately 25% of the genetic variations occurred because of intergroup differences. As the F(IS) values were less than the F(ST) values, the clustering, based on the Bayesian inference, implied that there was strong genetic differentiation among pig groups and less divergence within the groups in our samples. © 2011 The Authors. Animal Science Journal © 2011 Japanese Society of Animal Science.

  4. Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications.

    PubMed

    Wu, Xiao-Lin; Xu, Jiaqi; Feng, Guofei; Wiggans, George R; Taylor, Jeremy F; He, Jun; Qian, Changsong; Qiu, Jiansheng; Simpson, Barry; Walker, Jeremy; Bauck, Stewart

    2016-01-01

    Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE with ≤1,000 SNPs, but required considerably more computing time. Nevertheless, the differences diminished when >5,000 SNPs were selected. Optimization was accomplished conditionally on the presence of SNPs that were obligated to each chromosome. The frame location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the latter design, a tunable empirical Beta distribution was used to guide location distribution of frame SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized through the objective function that was locally and empirically maximized. This MOLO algorithm was capable of selecting a set of approximately evenly-spaced and highly-informative SNPs, which in turn led to increased imputation accuracy compared with selection solely of evenly-spaced SNPs. Imputation accuracy increased with LD chip size, and imputation error rate was extremely low for chips with ≥3,000 SNPs. Assuming that genotyping or imputation error occurs at random, imputation error rate can be viewed as the upper limit for genomic prediction error. Our results show that about 25% of imputation error rate was propagated to genomic prediction in an Angus population. 
    The utility of this MOLO algorithm was also demonstrated in a real application, in which a 6K SNP panel was optimized conditional on 5,260 obligatory SNPs selected based on SNP-trait association in U.S. Holstein animals. With this MOLO algorithm, both imputation error rate and genomic prediction error rate were minimal.
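
    The "system information" term in this abstract is based on Shannon entropy averaged over loci. The sketch below computes a locus-averaged entropy for biallelic SNPs from their minor allele frequencies. It is an assumption-laden illustration: the abstract does not give the exact definition (log base, allele versus genotype frequencies, or the uniformity adjustment), so base-2 allele entropy is used here, and the names `allele_entropy` and `locus_averaged_entropy` are hypothetical.

```python
import math

def allele_entropy(maf):
    """Shannon entropy (in bits) of a biallelic locus with minor allele
    frequency maf, i.e. -sum(p * log2(p)) over the two alleles."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    entropy = 0.0
    for p in (maf, 1.0 - maf):
        if p > 0.0:  # the limit of p*log(p) as p -> 0 is 0
            entropy -= p * math.log2(p)
    return entropy

def locus_averaged_entropy(mafs):
    """Mean per-locus entropy over a candidate set of SNPs."""
    if not mafs:
        raise ValueError("at least one SNP is required")
    return sum(allele_entropy(m) for m in mafs) / len(mafs)

# Hypothetical panels: common variants versus rare variants.
informative = locus_averaged_entropy([0.5, 0.45, 0.4])
uninformative = locus_averaged_entropy([0.05, 0.02, 0.01])
```

    Entropy peaks at 1 bit when both alleles are equally frequent and falls toward 0 for rare variants, which is why an entropy-based objective favours SNPs with higher minor allele frequency.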

  5. Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications

    PubMed Central

    Wu, Xiao-Lin; Xu, Jiaqi; Feng, Guofei; Wiggans, George R.; Taylor, Jeremy F.; He, Jun; Qian, Changsong; Qiu, Jiansheng; Simpson, Barry; Walker, Jeremy; Bauck, Stewart

    2016-01-01

    Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE with ≤1,000 SNPs, but required considerably more computing time. Nevertheless, the differences diminished when >5,000 SNPs were selected. Optimization was accomplished conditionally on the presence of SNPs that were obligated to each chromosome. The frame location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the latter design, a tunable empirical Beta distribution was used to guide location distribution of frame SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized through the objective function that was locally and empirically maximized. This MOLO algorithm was capable of selecting a set of approximately evenly-spaced and highly-informative SNPs, which in turn led to increased imputation accuracy compared with selection solely of evenly-spaced SNPs. Imputation accuracy increased with LD chip size, and imputation error rate was extremely low for chips with ≥3,000 SNPs. Assuming that genotyping or imputation error occurs at random, imputation error rate can be viewed as the upper limit for genomic prediction error. Our results show that about 25% of imputation error rate was propagated to genomic prediction in an Angus population. 
    The utility of this MOLO algorithm was also demonstrated in a real application, in which a 6K SNP panel was optimized conditional on 5,260 obligatory SNPs selected based on SNP-trait association in U.S. Holstein animals. With this MOLO algorithm, both imputation error rate and genomic prediction error rate were minimal. PMID:27583971

  6. Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests.

    PubMed

    Nguyen, Thanh-Tung; Huang, Joshua; Wu, Qingyao; Nguyen, Thuy; Li, Mark

    2015-01-01

    Single-nucleotide polymorphism (SNP) selection and identification are the most important tasks in genome-wide association data analysis. The problem is difficult because genome-wide association data are very high dimensional and a large portion of the SNPs in the data is irrelevant to the disease. Advanced machine learning methods have been successfully used in genome-wide association studies (GWAS) for identification of genetic variants that have relatively big effects in some common, complex diseases. Among them, the most successful one is Random Forests (RF). Despite performing well in terms of prediction accuracy in some data sets of moderate size, RF still struggles in GWAS when selecting informative SNPs and building accurate prediction models. In this paper, we propose to use a new two-stage quality-based sampling method in random forests, named ts-RF, for SNP subspace selection for GWAS. The method first applies p-value assessment to find a cut-off point that separates informative and irrelevant SNPs into two groups. The informative SNPs group is further divided into two sub-groups: highly informative and weakly informative SNPs. When sampling the SNP subspace for building trees for the forest, only those SNPs from the two sub-groups are taken into account. The feature subspaces always contain highly informative SNPs when used to split a node at a tree. This approach enables one to generate more accurate trees with a lower prediction error, while possibly avoiding overfitting. It allows one to detect interactions of multiple SNPs with the diseases, and to reduce the dimensionality and the amount of genome-wide association data needed for learning the RF model. Extensive experiments on two genome-wide SNP data sets (Parkinson case-control data comprised of 408,803 SNPs and Alzheimer case-control data comprised of 380,157 SNPs) and 10 gene data sets have demonstrated that the proposed model significantly reduced prediction errors and outperformed most existing state-of-the-art random forests. The top 25 SNPs in the Parkinson data set were identified by the proposed model, including four interesting genes associated with neurological disorders. The presented approach has been shown to be effective in selecting informative sub-groups of SNPs potentially associated with diseases that traditional statistical approaches might fail to detect. The new RF works well for data where the number of case-control objects is much smaller than the number of SNPs, which is a typical problem in gene data and GWAS. Experiment results demonstrated the effectiveness of the proposed RF model that outperformed the state-of-the-art RFs, including Breiman's RF, GRRF and wsRF methods.
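
    The two-stage subspace sampling described in this abstract can be sketched as follows. This is a rough illustration, not the ts-RF implementation: the abstract does not say how the cut-off points are found or how many SNPs are drawn from each sub-group, so the fixed thresholds, the proportional allocation with at least one highly informative SNP, the function names and the p-values are all assumptions.

```python
import random

def partition_snps(p_values, cutoff, strong_cutoff):
    """Stage 1: drop SNPs with p > cutoff, then split the remainder into
    highly informative (p <= strong_cutoff) and weakly informative SNPs."""
    strong = [s for s, p in p_values.items() if p <= strong_cutoff]
    weak = [s for s, p in p_values.items() if strong_cutoff < p <= cutoff]
    return strong, weak

def sample_subspace(strong, weak, size, rng):
    """Stage 2: draw a feature subspace for one tree node from both
    sub-groups, always including at least one highly informative SNP."""
    if not strong:
        raise ValueError("no highly informative SNPs to sample from")
    share = size * len(strong) / (len(strong) + len(weak))
    n_strong = min(max(1, round(share)), len(strong), size)
    n_weak = min(size - n_strong, len(weak))
    return rng.sample(strong, n_strong) + rng.sample(weak, n_weak)

# Hypothetical association p-values for eight SNPs.
p_values = {
    "snp1": 1e-6, "snp2": 5e-4, "snp3": 0.01, "snp4": 0.03,
    "snp5": 0.04, "snp6": 0.2, "snp7": 0.6, "snp8": 0.9,
}
strong, weak = partition_snps(p_values, cutoff=0.05, strong_cutoff=0.001)
subspace = sample_subspace(strong, weak, size=3, rng=random.Random(42))
```

    Every sampled subspace excludes the SNPs judged irrelevant in stage 1 and contains at least one highly informative SNP, which is the property the abstract credits for producing more accurate trees.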

  7. Non-additive genetic variation in growth, carcass and fertility traits of beef cattle.

    PubMed

    Bolormaa, Sunduimijid; Pryce, Jennie E; Zhang, Yuandan; Reverter, Antonio; Barendse, William; Hayes, Ben J; Goddard, Michael E

    2015-04-02

    A better understanding of non-additive variance could lead to increased knowledge on the genetic control and physiology of quantitative traits, and to improved prediction of the genetic value and phenotype of individuals. Genome-wide panels of single nucleotide polymorphisms (SNPs) have been mainly used to map additive effects for quantitative traits, but they can also be used to investigate non-additive effects. We estimated dominance and epistatic effects of SNPs on various traits in beef cattle and the variance explained by dominance, and quantified the increase in accuracy of phenotype prediction by including dominance deviations in its estimation. Genotype data (729 068 real or imputed SNPs) and phenotypes on up to 16 traits of 10 191 individuals from Bos taurus, Bos indicus and composite breeds were used. A genome-wide association study was performed by fitting the additive and dominance effects of single SNPs. The dominance variance was estimated by fitting a dominance relationship matrix constructed from the 729 068 SNPs. The accuracy of predicted phenotypic values was evaluated by best linear unbiased prediction using the additive and dominance relationship matrices. Epistatic interactions (additive × additive) were tested between each of the 28 SNPs that are known to have additive effects on multiple traits, and each of the other remaining 729 067 SNPs. The number of significant dominance effects was greater than expected by chance and most of them were in the direction that is presumed to increase fitness and in the opposite direction to inbreeding depression. Estimates of dominance variance explained by SNPs varied widely between traits, but had large standard errors. The median dominance variance across the 16 traits was equal to 5% of the phenotypic variance. Including a dominance deviation in the prediction did not significantly increase its accuracy for any of the phenotypes. 
    The number of additive × additive epistatic effects that were statistically significant was greater than expected by chance. Significant dominance and epistatic effects occur for growth, carcass and fertility traits in beef cattle but they are difficult to estimate precisely and including them in phenotype prediction does not increase its accuracy.

  8. Partition dataset according to amino acid type improves the prediction of deleterious non-synonymous SNPs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Jing; Li, Yuan-Yuan; Shanghai Center for Bioinformation Technology, Shanghai 200235

    2012-03-02

    Highlights: Proper dataset partition can improve the prediction of deleterious nsSNPs. Partition according to original residue type at nsSNP is a good criterion. Similar strategy is supposed promising in other machine learning problems. Abstract: Many non-synonymous SNPs (nsSNPs) are associated with diseases, and numerous machine learning methods have been applied to train classifiers for sorting disease-associated nsSNPs from neutral ones. The continuously accumulated nsSNP data allows us to further explore better prediction approaches. In this work, we partitioned the training data into 20 subsets according to either original or substituted amino acid type at the nsSNP site. Using support vector machine (SVM), training classification models on each subset resulted in an overall accuracy of 76.3% or 74.9% depending on the two different partition criteria, while training on the whole dataset obtained an accuracy of only 72.6%. Moreover, the dataset was also randomly divided into 20 subsets, but the corresponding accuracy was only 73.2%. Our results demonstrated that partitioning the whole training dataset into subsets properly, i.e., according to the residue type at the nsSNP site, will improve the performance of the trained classifiers significantly, which should be valuable in developing better tools for predicting the disease-association of nsSNPs.

  9. Genome-wide association study and accuracy of genomic prediction for teat number in Duroc pigs using genotyping-by-sequencing.

    PubMed

    Tan, Cheng; Wu, Zhenfang; Ren, Jiangli; Huang, Zhuolin; Liu, Dewu; He, Xiaoyan; Prakapenka, Dzianis; Zhang, Ran; Li, Ning; Da, Yang; Hu, Xiaoxiang

    2017-03-29

    The number of teats in pigs is related to a sow's ability to rear piglets to weaning age. Several studies have identified genes and genomic regions that affect teat number in swine but few common results were reported. The objective of this study was to identify genetic factors that affect teat number in pigs, evaluate the accuracy of genomic prediction, and evaluate the contribution of significant genes and genomic regions to genomic broad-sense heritability and prediction accuracy using 41,108 autosomal single nucleotide polymorphisms (SNPs) from genotyping-by-sequencing on 2936 Duroc boars. Narrow-sense heritability and dominance heritability of teat number estimated by genomic restricted maximum likelihood were 0.365 ± 0.030 and 0.035 ± 0.019, respectively. The accuracy of genomic predictions, calculated as the average correlation between the genomic best linear unbiased prediction and phenotype in a tenfold validation study, was 0.437 ± 0.064 for the model with additive and dominance effects and 0.435 ± 0.064 for the model with additive effects only. Genome-wide association studies (GWAS) using three methods of analysis identified 85 significant SNP effects for teat number on chromosomes 1, 6, 7, 10, 11, 12 and 14. The region between 102.9 and 106.0 Mb on chromosome 7, which was reported in several studies, had the most significant SNP effects in or near the PTGR2, FAM161B, LIN52, VRTN, FCF1, AREL1 and LRRC74A genes. This region accounted for 10.0% of the genomic additive heritability and 8.0% of the accuracy of prediction. The second most significant chromosome region not reported by previous GWAS was the region between 77.7 and 79.7 Mb on chromosome 11, where SNPs in the FGF14 gene had the most significant effect and accounted for 5.1% of the genomic additive heritability and 5.2% of the accuracy of prediction. The 85 significant SNPs accounted for 28.5 to 28.8% of the genomic additive heritability and 35.8 to 36.8% of the accuracy of prediction. 
The three methods used for the GWAS identified 85 significant SNPs with additive effects on teat number, including SNPs in a previously reported chromosomal region and SNPs in novel chromosomal regions. Most significant SNPs with larger estimated effects also had larger contributions to the total genomic heritability and accuracy of prediction than other SNPs.

  10. Using RNA-Seq for gene identification, polymorphism detection and transcript profiling in two alfalfa genotypes with divergent cell wall composition in stems

    PubMed Central

    2011-01-01

    Background Alfalfa, [Medicago sativa (L.) sativa], a widely-grown perennial forage has potential for development as a cellulosic ethanol feedstock. However, the genomics of alfalfa, a non-model species, is still in its infancy. The recent advent of RNA-Seq, a massively parallel sequencing method for transcriptome analysis, provides an opportunity to expand the identification of alfalfa genes and polymorphisms, and conduct in-depth transcript profiling. Results Cell walls in stems of alfalfa genotype 708 have higher cellulose and lower lignin concentrations compared to cell walls in stems of genotype 773. Using the Illumina GA-II platform, a total of 198,861,304 expression sequence tags (ESTs, 76 bp in length) were generated from cDNA libraries derived from elongating stem (ES) and post-elongation stem (PES) internodes of 708 and 773. In addition, 341,984 ESTs were generated from ES and PES internodes of genotype 773 using the GS FLX Titanium platform. The first alfalfa (Medicago sativa) gene index (MSGI 1.0) was assembled using the Sanger ESTs available from GenBank, the GS FLX Titanium EST sequences, and the de novo assembled Illumina sequences. MSGI 1.0 contains 124,025 unique sequences including 22,729 tentative consensus sequences (TCs), 22,315 singletons and 78,981 pseudo-singletons. We identified a total of 1,294 simple sequence repeats (SSR) among the sequences in MSGI 1.0. In addition, a total of 10,826 single nucleotide polymorphisms (SNPs) were predicted between the two genotypes. Out of 55 SNPs randomly selected for experimental validation, 47 (85%) were polymorphic between the two genotypes. We also identified numerous allelic variations within each genotype. Digital gene expression analysis identified numerous candidate genes that may play a role in stem development as well as candidate genes that may contribute to the differences in cell wall composition in stems of the two genotypes. 
Conclusions Our results demonstrate that RNA-Seq can be successfully used for gene identification, polymorphism detection and transcript profiling in alfalfa, a non-model, allogamous, autotetraploid species. The alfalfa gene index assembled in this study, and the SNPs, SSRs and candidate genes identified can be used to improve alfalfa as a forage crop and cellulosic feedstock. PMID:21504589

  11. Transcriptome Sequencing, and Rapid Development and Application of SNP Markers for the Legume Pod Borer Maruca vitrata (Lepidoptera: Crambidae)

    PubMed Central

    Margam, Venu M.; Coates, Brad S.; Bayles, Darrell O.; Hellmich, Richard L.; Agunbiade, Tolulope; Seufferheld, Manfredo J.; Sun, Weilin; Kroemer, Jeremy A.; Ba, Malick N.; Binso-Dabire, Clementine L.; Baoua, Ibrahim; Ishiyaku, Mohammad F.; Covas, Fernando G.; Srinivasan, Ramasamy; Armstrong, Joel; Murdock, Larry L.; Pittendrigh, Barry R.

    2011-01-01

    The legume pod borer, Maruca vitrata (Lepidoptera: Crambidae), is an insect pest species of crops grown by subsistence farmers in tropical regions of Africa. We present the de novo assembly of 3729 contigs from 454- and Sanger-derived sequencing reads for midgut, salivary, and whole adult tissues of this non-model species. Functional annotation predicted that 1320 M. vitrata protein coding genes are present, of which 631 have orthologs within the Bombyx mori gene model. A homology-based analysis assigned M. vitrata genes into a group of paralogs, but these were subsequently partitioned into putative orthologs following phylogenetic analyses. Following sequence quality filtering, a total of 1542 putative single nucleotide polymorphisms (SNPs) were predicted within M. vitrata contig assemblies. Seventy one of 1078 designed molecular genetic markers were used to screen M. vitrata samples from five collection sites in West Africa. Population substructure may be present with significant implications in the insect resistance management recommendations pertaining to the release of biological control agents or transgenic cowpea that express Bacillus thuringiensis crystal toxins. Mutation data derived from transcriptome sequencing is an expeditious and economical source for genetic markers that allow evaluation of ecological differentiation. PMID:21754987

  12. Transcriptome analysis of Capsicum annuum varieties Mandarin and Blackcluster: assembly, annotation and molecular marker discovery.

    PubMed

    Ahn, Yul-Kyun; Tripathi, Swati; Kim, Jeong-Ho; Cho, Young-Il; Lee, Hye-Eun; Kim, Do-Sun; Woo, Jong-Gyu; Cho, Myeong-Cheoul

    2014-01-10

    Next generation sequencing technologies have proven to be a rapid and cost-effective means to assemble and characterize gene content and identify molecular markers in various organisms. Pepper (Capsicum annuum L., Solanaceae) is a major staple vegetable crop, which is economically important and has worldwide distribution. High-throughput transcriptome profiling of two pepper cultivars, Mandarin and Blackcluster, using 454 GS-FLX pyrosequencing yielded 279,221 and 316,357 sequenced reads with a total 120.44 and 142.54Mb of sequence data (average read length of 431 and 450 nucleotides). These reads resulted from 17,525 and 16,341 'isogroups' and were assembled into 19,388 and 18,057 isotigs, and 22,217 and 13,153 singletons for both the cultivars, respectively. Assembled sequences were annotated functionally based on homology to genes in multiple public databases. Detailed sequence variant analysis identified a total of 9701 and 12,741 potential SNPs which eventually resulted in 1025 and 1059 genotype specific SNPs, for both the varieties, respectively, after examining SNP frequency distribution for each mapped unigenes. These markers for pepper will be highly valuable for marker-assisted breeding and other genetic studies. © 2013 Elsevier B.V. All rights reserved.

  13. Transcriptome characterization and polymorphism detection between subspecies of big sagebrush (Artemisia tridentata)

    PubMed Central

    2011-01-01

    Background Big sagebrush (Artemisia tridentata) is one of the most widely distributed and ecologically important shrub species in western North America. This species serves as a critical habitat and food resource for many animals and invertebrates. Habitat loss due to a combination of disturbances followed by establishment of invasive plant species is a serious threat to big sagebrush ecosystem sustainability. Lack of genomic data has limited our understanding of the evolutionary history and ecological adaptation in this species. Here, we report on the sequencing of expressed sequence tags (ESTs) and detection of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers in subspecies of big sagebrush. Results cDNA of A. tridentata sspp. tridentata and vaseyana were normalized and sequenced using the 454 GS FLX Titanium pyrosequencing technology. Assembly of the reads resulted in 20,357 contig consensus sequences in ssp. tridentata and 20,250 contigs in ssp. vaseyana. A BLASTx search against the non-redundant (NR) protein database using 29,541 consensus sequences obtained from a combined assembly resulted in 21,436 sequences with significant blast alignments (≤ 1e-15). A total of 20,952 SNPs and 119 polymorphic SSRs were detected between the two subspecies. SNPs were validated through various methods including sequence capture. Validation of SNPs in different individuals uncovered a high level of nucleotide variation in EST sequences. EST sequences of a third, tetraploid subspecies (ssp. wyomingensis) obtained by Illumina sequencing were mapped to the consensus sequences of the combined 454 EST assembly. Approximately one-third of the SNPs between sspp. tridentata and vaseyana identified in the combined assembly were also polymorphic within the two geographically distant ssp. wyomingensis samples. Conclusion We have produced a large EST dataset for Artemisia tridentata, which contains a large sample of the big sagebrush leaf transcriptome. 
SNP mapping among the three subspecies suggests the origin of ssp. wyomingensis via mixed ancestry. A large number of SNP and SSR markers provide the foundation for future research to address questions in big sagebrush evolution, ecological genetics, and conservation using genomic approaches. PMID:21767398

  14. Genotyping by sequencing for genomic prediction in a soybean breeding population.

    PubMed

    Jarquín, Diego; Kocak, Kyle; Posadas, Luis; Hyma, Katie; Jedlicka, Joseph; Graef, George; Lorenz, Aaron

    2014-08-29

    Advances in genotyping technology, such as genotyping by sequencing (GBS), are making genomic prediction more attractive to reduce breeding cycle times and costs associated with phenotyping. Genomic prediction and selection has been studied in several crop species, but no reports exist in soybean. The objectives of this study were to (i) evaluate prospects for genomic selection using GBS in a typical soybean breeding program and (ii) evaluate the effect of GBS marker selection and imputation on genomic prediction accuracy. To achieve these objectives, a set of soybean lines sampled from the University of Nebraska Soybean Breeding Program were genotyped using GBS and evaluated for yield and other agronomic traits at multiple Nebraska locations. Genotyping by sequencing scored 16,502 single nucleotide polymorphisms (SNPs) with minor-allele frequency (MAF) > 0.05 and percentage of missing values ≤ 5% on 301 elite soybean breeding lines. When SNPs with up to 80% missing values were included, 52,349 SNPs were scored. Prediction accuracy for grain yield, assessed using cross validation, was estimated to be 0.64, indicating good potential for using genomic selection for grain yield in soybean. Filtering SNPs based on missing data percentage had little to no effect on prediction accuracy, especially when random forest imputation was used to impute missing values. The highest accuracies were observed when random forest imputation was used on all SNPs, but differences were not significant. A standard additive G-BLUP model was robust; modeling additive-by-additive epistasis did not provide any improvement in prediction accuracy. The effect of training population size on accuracy began to plateau around 100, but accuracy steadily climbed until the largest possible size was used in this analysis. Including only SNPs with MAF > 0.30 provided higher accuracies when training populations were smaller. 
Using GBS for genomic prediction in soybean holds good potential to expedite genetic gain. Our results suggest that standard additive G-BLUP models can be used on unfiltered, imputed GBS data without loss in accuracy.
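The marker filter described in this record (MAF > 0.05 and at most 5% missing values, relaxed to 80% missing in the second scenario) can be sketched as follows. This is a minimal illustration, assuming genotypes are coded as 0/1/2 allele counts with `None` for a missing call; the function name and defaults are hypothetical and are not taken from the authors' pipeline.

```python
def snp_passes(calls, min_maf=0.05, max_missing=0.05):
    """Return True if one SNP passes the MAF and missing-data thresholds.

    calls: per-line allele counts (0, 1 or 2), with None for a missing call.
    The SNP is kept when its missing rate is <= max_missing and its
    minor-allele frequency is strictly greater than min_maf.
    """
    n = len(calls)
    if n == 0:
        return False
    observed = [c for c in calls if c is not None]
    missing_rate = (n - len(observed)) / n
    if not observed or missing_rate > max_missing:
        return False
    # Frequency of the counted allele among observed (diploid) calls.
    freq = sum(observed) / (2 * len(observed))
    maf = min(freq, 1 - freq)
    return maf > min_maf
```

Calling `snp_passes(calls, max_missing=0.80)` corresponds to the relaxed scenario in which SNPs with up to 80% missing values are retained before imputation.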

  15. Striped, Ellipsoidal Particles by Controlled Assembly of Diblock Copolymers

    DTIC Science & Technology

    2013-04-17

    morphology to a disordered bicontinuous morphology can be achieved.15,16,26−28 For poly(styrene- b -2-vinylpyridine) ( PS - b - P2VP ) materials, precise control of an...of SNPs, slow evaporation of chloroform from emulsion droplets containing PS - b - P2VP diblock copolymers resulted in solid particles with a spherical...lamellae of PS - b - P2VP and SNP necklaces decorating the outer surface could be obtained. The role of interfacially active SNPs in the morphology

  16. Genotype-phenotype association study via new multi-task learning model

    PubMed Central

    Huo, Zhouyuan; Shen, Dinggang

    2018-01-01

    Research on the associations between genetic variations and imaging phenotypes is developing with the advance in high-throughput genotype and brain image techniques. Regression analysis of single nucleotide polymorphisms (SNPs) and imaging measures as quantitative traits (QTs) has been proposed to identify the quantitative trait loci (QTL) via multi-task learning models. Recent studies consider the interlinked structures within SNPs and imaging QTs through group lasso, e.g. ℓ2,1-norm, leading to better predictive results and insights of SNPs. However, group sparsity is not enough for representing the correlation between multiple tasks and ℓ2,1-norm regularization is not robust either. In this paper, we propose a new multi-task learning model to analyze the associations between SNPs and QTs. We suppose that low-rank structure is also beneficial to uncover the correlation between genetic variations and imaging phenotypes. Finally, we conduct regression analysis of SNPs and QTs. Experimental results show that our model is more accurate in prediction than compared methods and presents new insights of SNPs. PMID:29218896

  17. Impact of QTL minor allele frequency on genomic evaluation using real genotype data and simulated phenotypes in Japanese Black cattle.

    PubMed

    Uemoto, Yoshinobu; Sasaki, Shinji; Kojima, Takatoshi; Sugimoto, Yoshikazu; Watanabe, Toshio

    2015-11-19

    Genetic variance that is not captured by single nucleotide polymorphisms (SNPs) is due to imperfect linkage disequilibrium (LD) between SNPs and quantitative trait loci (QTLs), and the extent of LD between SNPs and QTLs depends on different minor allele frequencies (MAF) between them. To evaluate the impact of MAF of QTLs on genomic evaluation, we performed a simulation study using real cattle genotype data. In total, 1368 Japanese Black cattle and 592,034 SNPs (Illumina BovineHD BeadChip) were used. We simulated phenotypes using real genotypes under different scenarios, varying the MAF categories, QTL heritability, number of QTLs, and distribution of QTL effect. After generating true breeding values and phenotypes, QTL heritability was estimated and the prediction accuracy of genomic estimated breeding value (GEBV) was assessed under different SNP densities, prediction models, and population size by a reference-test validation design. The extent of LD between SNPs and QTLs in this population was higher in the QTLs with high MAF than in those with low MAF. The effect of MAF of QTLs depended on the genetic architecture, evaluation strategy, and population size in genomic evaluation. In genetic architecture, genomic evaluation was affected by the MAF of QTLs combined with the QTL heritability and the distribution of QTL effect. Genomic evaluation was not affected by the number of QTLs when that number was more than 50. In the evaluation strategy, we showed that different SNP densities and prediction models affect the heritability estimation and genomic prediction and that this depends on the MAF of QTLs. In addition, accurate QTL heritability and GEBV were obtained using denser SNP information and a prediction model that accounted for SNPs with low and high MAFs. In population size, a large sample size is needed to increase the accuracy of GEBV. The MAF of QTL had an impact on heritability estimation and prediction accuracy. 
Most genetic variance can be captured using denser SNPs and a prediction model that accounts for MAF, but a large sample size is needed to increase the accuracy of GEBV under all QTL MAF categories.
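The dependence of LD on allele-frequency differences noted in this record can be illustrated with the standard pairwise r² statistic, r² = D² / (pA(1 − pA) pB(1 − pB)), where D = pAB − pA·pB. The sketch below assumes phased haplotypes coded 0/1 at a SNP and at a QTL; the function name and input layout are hypothetical, and the authors' own LD computation is not described in the abstract.

```python
def ld_r2(haplotypes):
    """Pairwise LD (r squared) between a SNP and a QTL.

    haplotypes: list of (snp_allele, qtl_allele) pairs, each allele coded 0 or 1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(s for s, _ in haplotypes) / n    # allele-1 frequency at the SNP
    p_b = sum(q for _, q in haplotypes) / n    # allele-1 frequency at the QTL
    p_ab = sum(1 for s, q in haplotypes if s == 1 and q == 1) / n
    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom
```

With matched frequencies (both 0.5) and complete association, r² equals 1. If the QTL allele has frequency 0.1 and always sits on a SNP allele with frequency 0.5, r² drops to about 0.11 even though the association is as tight as the frequencies allow, which is why a low-MAF QTL is tagged poorly by a common SNP.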

  18. WS-SNPs&GO: a web server for predicting the deleterious effect of human protein variants using functional annotation

    PubMed Central

    2013-01-01

    Background SNPs&GO is a method for the prediction of deleterious Single Amino acid Polymorphisms (SAPs) using protein functional annotation. In this work, we present the web server implementation of SNPs&GO (WS-SNPs&GO). The server is based on Support Vector Machines (SVM) and for a given protein, its input comprises: the sequence and/or its three-dimensional structure (when available), a set of target variations and its functional Gene Ontology (GO) terms. The output of the server provides, for each protein variation, the probabilities to be associated to human diseases. Results The server consists of two main components, including updated versions of the sequence-based SNPs&GO (recently scored as one of the best algorithms for predicting deleterious SAPs) and of the structure-based SNPs&GO3d programs. Sequence and structure based algorithms are extensively tested on a large set of annotated variations extracted from the SwissVar database. Selecting a balanced dataset with more than 38,000 SAPs, the sequence-based approach achieves 81% overall accuracy, 0.61 correlation coefficient and an Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve of 0.88. For the subset of ~6,600 variations mapped on protein structures available at the Protein Data Bank (PDB), the structure-based method scores with 84% overall accuracy, 0.68 correlation coefficient, and 0.91 AUC. When tested on a new blind set of variations, the results of the server are 79% and 83% overall accuracy for the sequence-based and structure-based inputs, respectively. Conclusions WS-SNPs&GO is a valuable tool that includes in a unique framework information derived from protein sequence, structure, evolutionary profile, and protein function. WS-SNPs&GO is freely available at http://snps.biofold.org/snps-and-go. PMID:23819482

  19. Oxytocin receptor gene variations predict neural and behavioral response to oxytocin in autism

    PubMed Central

    Watanabe, Takamitsu; Otowa, Takeshi; Abe, Osamu; Kuwabara, Hitoshi; Aoki, Yuta; Natsubori, Tatsunobu; Takao, Hidemasa; Kakiuchi, Chihiro; Kondo, Kenji; Ikeda, Masashi; Iwata, Nakao; Kasai, Kiyoto; Sasaki, Tsukasa

    2017-01-01

    Abstract Oxytocin appears beneficial for autism spectrum disorder (ASD), and more than 20 single-nucleotide polymorphisms (SNPs) in oxytocin receptor (OXTR) are relevant to ASD. However, neither biological functions of OXTR SNPs in ASD nor critical OXTR SNPs that determine oxytocin’s effects on ASD remains known. Here, using a machine-learning algorithm that was designed to evaluate collective effects of multiple SNPs and automatically identify most informative SNPs, we examined relationships between 27 representative OXTR SNPs and six types of behavioral/neural response to oxytocin in ASD individuals. The oxytocin effects were extracted from our previous placebo-controlled within-participant clinical trial administering single-dose intranasal oxytocin to 38 high-functioning adult Japanese ASD males. Consequently, we identified six different SNP sets that could accurately predict the six different oxytocin efficacies, and confirmed the robustness of these SNP selections against variations of the datasets and analysis parameters. Moreover, major alleles of several prominent OXTR SNPs—including rs53576 and rs2254298—were found to have dissociable effects on the oxytocin efficacies. These findings suggest biological functions of the OXTR SNP variants on autistic oxytocin responses, and implied that clinical oxytocin efficacy may be genetically predicted before its actual administration, which would contribute to establishment of future precision medicines for ASD. PMID:27798253

  20. The diploid genome sequence of an Asian individual

    PubMed Central

    Wang, Jun; Wang, Wei; Li, Ruiqiang; Li, Yingrui; Tian, Geng; Goodman, Laurie; Fan, Wei; Zhang, Junqing; Li, Jun; Zhang, Juanbin; Guo, Yiran; Feng, Binxiao; Li, Heng; Lu, Yao; Fang, Xiaodong; Liang, Huiqing; Du, Zhenglin; Li, Dong; Zhao, Yiqing; Hu, Yujie; Yang, Zhenzhen; Zheng, Hancheng; Hellmann, Ines; Inouye, Michael; Pool, John; Yi, Xin; Zhao, Jing; Duan, Jinjie; Zhou, Yan; Qin, Junjie; Ma, Lijia; Li, Guoqing; Yang, Zhentao; Zhang, Guojie; Yang, Bin; Yu, Chang; Liang, Fang; Li, Wenjie; Li, Shaochuan; Li, Dawei; Ni, Peixiang; Ruan, Jue; Li, Qibin; Zhu, Hongmei; Liu, Dongyuan; Lu, Zhike; Li, Ning; Guo, Guangwu; Zhang, Jianguo; Ye, Jia; Fang, Lin; Hao, Qin; Chen, Quan; Liang, Yu; Su, Yeyang; san, A.; Ping, Cuo; Yang, Shuang; Chen, Fang; Li, Li; Zhou, Ke; Zheng, Hongkun; Ren, Yuanyuan; Yang, Ling; Gao, Yang; Yang, Guohua; Li, Zhuo; Feng, Xiaoli; Kristiansen, Karsten; Wong, Gane Ka-Shu; Nielsen, Rasmus; Durbin, Richard; Bolund, Lars; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian

    2009-01-01

    Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics. PMID:18987735

  1. SU-D-204-06: Integration of Machine Learning and Bioinformatics Methods to Analyze Genome-Wide Association Study Data for Rectal Bleeding and Erectile Dysfunction Following Radiotherapy in Prostate Cancer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Oh, J; Deasy, J; Kerns, S

    Purpose: We investigated whether integration of machine learning and bioinformatics techniques on genome-wide association study (GWAS) data can improve the performance of predictive models in predicting the risk of developing radiation-induced late rectal bleeding and erectile dysfunction in prostate cancer patients. Methods: We analyzed a GWAS dataset generated from 385 prostate cancer patients treated with radiotherapy. Using genotype information from these patients, we designed a machine learning-based predictive model of late radiation-induced toxicities: rectal bleeding and erectile dysfunction. The model building process was performed using 2/3 of samples (training) and the predictive model was tested with 1/3 of samples (validation). To identify important single nucleotide polymorphisms (SNPs), we computed the SNP importance score, resulting from our random forest regression model. We performed gene ontology (GO) enrichment analysis for nearby genes of the important SNPs. Results: After univariate analysis on the training dataset, we filtered out many SNPs with p>0.001, resulting in 749 and 367 SNPs that were used in the model building process for rectal bleeding and erectile dysfunction, respectively. On the validation dataset, our random forest regression model achieved the area under the curve (AUC)=0.70 and 0.62 for rectal bleeding and erectile dysfunction, respectively. We performed GO enrichment analysis for the top 25%, 50%, 75%, and 100% SNPs out of the select SNPs in the univariate analysis. When we used the top 50% SNPs, more plausible biological processes were obtained for both toxicities. An additional test with the top 50% SNPs improved predictive power with AUC=0.71 and 0.65 for rectal bleeding and erectile dysfunction. A better performance was achieved with AUC=0.67 when age and androgen deprivation therapy were added to the model for erectile dysfunction. 
Conclusion: Our approach that combines machine learning and bioinformatics techniques enabled designing better models and identifying more plausible biological processes associated with the outcomes.

  2. A Prediction Algorithm for Drug Response in Patients with Mesial Temporal Lobe Epilepsy Based on Clinical and Genetic Information

    PubMed Central

    Carvalho, Benilton S.; Bilevicius, Elizabeth; Alvim, Marina K. M.; Lopes-Cendes, Iscia

    2017-01-01

    Mesial temporal lobe epilepsy is the most common form of adult epilepsy in surgical series. Currently, the only characteristic used to predict poor response to clinical treatment in this syndrome is the presence of hippocampal sclerosis. Single nucleotide polymorphisms (SNPs) located in genes encoding drug transporter and metabolism proteins could influence response to therapy. Therefore, we aimed to evaluate whether combining information from clinical variables as well as SNPs in candidate genes could improve the accuracy of predicting response to drug therapy in patients with mesial temporal lobe epilepsy. For this, we divided 237 patients into two groups: 75 responsive and 162 refractory to antiepileptic drug therapy. We genotyped 119 SNPs in ABCB1, ABCC2, CYP1A1, CYP1A2, CYP1B1, CYP2C9, CYP2C19, CYP2D6, CYP2E1, CYP3A4, and CYP3A5 genes. We used 98 additional SNPs to evaluate population stratification. We assessed a first scenario using only clinical variables and a second one including SNP information. The random forests algorithm combined with leave-one-out cross-validation was used to identify the best predictive model in each scenario and compared their accuracies using the area under the curve statistic. Additionally, we built a variable importance plot to present the set of most relevant predictors on the best model. The selected best model included the presence of hippocampal sclerosis and 56 SNPs. Furthermore, including SNPs in the model improved accuracy from 0.4568 to 0.8177. Our findings suggest that adding genetic information provided by SNPs, located on drug transport and metabolism genes, can improve the accuracy for predicting which patients with mesial temporal lobe epilepsy are likely to be refractory to drug treatment, making it possible to identify patients who may benefit from epilepsy surgery sooner. PMID:28052106

  3. Automated SNP detection from a large collection of white spruce expressed sequences: contributing factors and approaches for the categorization of SNPs

    PubMed Central

    Pavy, Nathalie; Parsons, Lee S; Paule, Charles; MacKay, John; Bousquet, Jean

    2006-01-01

    Background High-throughput genotyping technologies represent a highly efficient way to accelerate genetic mapping and enable association studies. As a first step toward this goal, we aimed to develop a resource of candidate Single Nucleotide Polymorphisms (SNP) in white spruce (Picea glauca [Moench] Voss), a softwood tree of major economic importance. Results A white spruce SNP resource encompassing 12,264 SNPs was constructed from a set of 6,459 contigs derived from Expressed Sequence Tags (EST) and by using the bayesian-based statistical software PolyBayes. Several parameters influencing the SNP prediction were analysed including the a priori expected polymorphism, the probability score (PSNP), and the contig depth and length. SNP detection in 3' and 5' reads from the same clones revealed a level of inconsistency between overlapping sequences as low as 1%. A subset of 245 predicted SNPs were verified through the independent resequencing of genomic DNA of a genotype also used to prepare cDNA libraries. The validation rate reached a maximum of 85% for SNPs predicted with either PSNP ≥ 0.95 or ≥ 0.99. A total of 9,310 SNPs were detected by using PSNP ≥ 0.95 as a criterion. The SNPs were distributed among 3,590 contigs encompassing an array of broad functional categories, with an overall frequency of 1 SNP per 700 nucleotide sites. Experimental and statistical approaches were used to evaluate the proportion of paralogous SNPs, with estimates in the range of 8 to 12%. The 3,789 coding SNPs identified through coding region annotation and ORF prediction, were distributed into 39% nonsynonymous and 61% synonymous substitutions. Overall, there were 0.9 SNP per 1,000 nonsynonymous sites and 5.2 SNPs per 1,000 synonymous sites, for a genome-wide nonsynonymous to synonymous substitution rate ratio (Ka/Ks) of 0.17. 
    Conclusion We integrated the SNP data in the ForestTreeDB database along with functional annotations to provide a tool facilitating the choice of candidate genes for mapping purposes or association studies. PMID:16824208
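
    The ratio reported at the end of the Results can be reproduced from the two SNP densities quoted in the abstract. The sketch below is only that arithmetic; the function name is hypothetical, and a full Ka/Ks estimate would normally rely on codon-aware site counting that the abstract does not describe.

```python
def rate_ratio(nonsyn_snps_per_1000_sites, syn_snps_per_1000_sites):
    """Ratio of nonsynonymous to synonymous SNP densities (hypothetical helper)."""
    if syn_snps_per_1000_sites <= 0:
        raise ValueError("synonymous density must be positive")
    return nonsyn_snps_per_1000_sites / syn_snps_per_1000_sites


# Densities quoted in the abstract: 0.9 and 5.2 SNPs per 1,000 sites.
ratio = rate_ratio(0.9, 5.2)
print(f"{ratio:.2f}")  # prints 0.17
```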

  4. Next-generation sequencing for identification of candidate genes for Fusarium wilt and sterility mosaic disease in pigeonpea (Cajanus cajan).

    PubMed

    Singh, Vikas K; Khan, Aamir W; Saxena, Rachit K; Kumar, Vinay; Kale, Sandip M; Sinha, Pallavi; Chitikineni, Annapurna; Pazhamala, Lekha T; Garg, Vanika; Sharma, Mamta; Sameer Kumar, Chanda Venkata; Parupalli, Swathi; Vechalapu, Suryanarayana; Patil, Suyash; Muniswamy, Sonnappa; Ghanta, Anuradha; Yamini, Kalinati Narasimhan; Dharmaraj, Pallavi Subbanna; Varshney, Rajeev K

    2016-05-01

    To map resistance genes for Fusarium wilt (FW) and sterility mosaic disease (SMD) in pigeonpea, sequencing-based bulked segregant analysis (Seq-BSA) was used. Resistant (R) and susceptible (S) bulks from the extreme recombinant inbred lines of ICPL 20096 × ICPL 332 were sequenced. Subsequently, the SNP index was calculated between R- and S-bulks with the help of the draft genome sequence and a reference-guided assembly of ICPL 20096 (resistant parent). Seq-BSA provided seven candidate SNPs for FW and SMD resistance in pigeonpea. In parallel, four additional genotypes were re-sequenced, and their combined analysis with R- and S-bulks provided a total of 8362 nonsynonymous (ns) SNPs. Of the 8362 nsSNPs, 60 were found within the 2-Mb flanking regions of the seven candidate SNPs identified through Seq-BSA. Haplotype analysis narrowed these down to eight nsSNPs in seven genes. These eight nsSNPs were further validated by re-sequencing 11 genotypes that are resistant and susceptible to FW and SMD. This analysis revealed association of four candidate nsSNPs in four genes with FW resistance and four candidate nsSNPs in three genes with SMD resistance. Further, in silico protein analysis and expression profiling identified the two most promising candidate genes, namely C.cajan_01839 for SMD resistance and C.cajan_03203 for FW resistance. The identified candidate genomic regions/SNPs will be useful for genomics-assisted breeding in pigeonpea. © 2015 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
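
    The abstract does not define the SNP index. A minimal sketch, assuming the definition commonly used in bulked segregant sequencing (reads carrying the non-reference allele divided by all reads covering the site), with the difference taken as S-bulk minus R-bulk. The function names, the subtraction order and the read counts are all hypothetical illustrations, not values from the study.

```python
def snp_index(alt_reads, total_reads):
    """Fraction of reads at a site that carry the non-reference allele."""
    if total_reads <= 0:
        raise ValueError("site has no read coverage")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    return alt_reads / total_reads


def delta_snp_index(r_alt, r_total, s_alt, s_total):
    """S-bulk index minus R-bulk index (order is an assumption)."""
    return snp_index(s_alt, s_total) - snp_index(r_alt, r_total)


# Hypothetical site: the R-bulk matches the resistant-parent reference,
# while every S-bulk read carries the alternative allele.
print(delta_snp_index(r_alt=0, r_total=40, s_alt=38, s_total=38))  # prints 1.0
```

    With a resistant-parent reference, a value near 1 would mark a site where the two bulks carry different parental alleles, which is the pattern expected near a resistance locus.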

  5. Genome-wide association study of sporadic brain arteriovenous malformations.

    PubMed

    Weinsheimer, Shantel; Bendjilali, Nasrine; Nelson, Jeffrey; Guo, Diana E; Zaroff, Jonathan G; Sidney, Stephen; McCulloch, Charles E; Al-Shahi Salman, Rustam; Berg, Jonathan N; Koeleman, Bobby P C; Simon, Matthias; Bostroem, Azize; Fontanella, Marco; Sturiale, Carmelo L; Pola, Roberto; Puca, Alfredo; Lawton, Michael T; Young, William L; Pawlikowska, Ludmila; Klijn, Catharina J M; Kim, Helen

    2016-09-01

    The pathogenesis of sporadic brain arteriovenous malformations (BAVMs) remains unknown, but studies suggest a genetic component. We estimated the heritability of sporadic BAVM and performed a genome-wide association study (GWAS) to investigate association of common single nucleotide polymorphisms (SNPs) with risk of sporadic BAVM in the international, multicentre Genetics of Arteriovenous Malformation (GEN-AVM) consortium. The Caucasian discovery cohort included 515 BAVM cases and 1191 controls genotyped using Affymetrix genome-wide SNP arrays. Genotype data were imputed to 1000 Genomes Project data, and well-imputed SNPs (>0.01 minor allele frequency) were analysed for association with BAVM. 57 top BAVM-associated SNPs (51 SNPs with p<10^-5 or p<10^-4 in candidate pathway genes, and 6 candidate BAVM SNPs) were tested in a replication cohort including 608 BAVM cases and 744 controls. The estimated heritability of BAVM was 17.6% (SE 8.9%, age and sex-adjusted p=0.015). None of the SNPs were significantly associated with BAVM in the replication cohort after correction for multiple testing. 6 SNPs had a nominal p<0.1 in the replication cohort and map to introns in EGFEM1P, SP4 and CDKAL1 or near JAG1 and BNC2. Of the 6 candidate SNPs, 2 in ACVRL1 and MMP3 had a nominal p<0.05 in the replication cohort. We performed the first GWAS of sporadic BAVM in the largest BAVM cohort assembled to date. No GWAS SNPs were replicated, suggesting that common SNPs do not contribute strongly to BAVM susceptibility. However, heritability estimates suggest a modest but significant genetic contribution. Published by the BMJ Publishing Group Limited.

  6. BAC-End Sequence-Based SNP Mining in Allotetraploid Cotton (Gossypium) Utilizing Resequencing Data, Phylogenetic Inferences, and Perspectives for Genetic Mapping

    PubMed Central

    Hulse-Kemp, Amanda M.; Ashrafi, Hamid; Stoffel, Kevin; Zheng, Xiuting; Saski, Christopher A.; Scheffler, Brian E.; Fang, David D.; Chen, Z. Jeffrey; Van Deynze, Allen; Stelly, David M.

    2015-01-01

    A bacterial artificial chromosome library and BAC-end sequences for cultivated cotton (Gossypium hirsutum L.) have recently been developed. This report presents genome-wide single nucleotide polymorphism (SNP) mining utilizing resequencing data with BAC-end sequences as a reference by alignment of 12 G. hirsutum L. lines, one G. barbadense L. line, and one G. longicalyx Hutch and Lee line. A total of 132,262 intraspecific SNPs have been developed for G. hirsutum, whereas 223,138 and 470,631 interspecific SNPs have been developed for G. barbadense and G. longicalyx, respectively. Using a set of interspecific SNPs, 11 randomly selected and 77 SNPs that are putatively associated with the homeologous chromosome pair 12 and 26, we mapped 77 SNPs into two linkage groups representing these chromosomes, spanning a total of 236.2 cM in an interspecific F2 population (G. barbadense 3-79 × G. hirsutum TM-1). The mapping results validated the approach for reliably producing large numbers of both intraspecific and interspecific SNPs aligned to BAC-ends. This will allow for future construction of high-density integrated physical and genetic maps for cotton and other complex polyploid genomes. The methods developed will allow for future Gossypium resequencing data to be automatically genotyped for identified SNPs along the BAC-end sequence reference for anchoring sequence assemblies and comparative studies. PMID:25858960

  7. Regulatory element-based prediction identifies new susceptibility regulatory variants for osteoporosis.

    PubMed

    Yao, Shi; Guo, Yan; Dong, Shan-Shan; Hao, Ruo-Han; Chen, Xiao-Feng; Chen, Yi-Xiao; Chen, Jia-Bin; Tian, Qing; Deng, Hong-Wen; Yang, Tie-Lin

    2017-08-01

    Although genome-wide association studies (GWASs) have identified many susceptibility genes for osteoporosis, a large part of the missing heritability remains to be discovered. Integrating regulatory information and GWASs could offer new insights into the biological link between the susceptibility SNPs and osteoporosis. We generated five machine learning classifiers with osteoporosis-associated variants and regulatory features data. We obtained the optimal classifier and used it to predict genome-wide SNPs in order to discover susceptibility regulatory variants. We further utilized Genetic Factors for Osteoporosis Consortium (GEFOS) and three in-house GWAS samples to validate the associations for predicted positive SNPs. The random forest classifier performed best among all machine learning methods, with an F1 score of 0.8871. Using the optimized model, we predicted 37,584 candidate SNPs for osteoporosis. According to the meta-analysis results, a list of regulatory variants was significantly associated with osteoporosis after multiple testing corrections and contributed to the expression of known osteoporosis-associated protein-coding genes. In summary, combining GWASs and regulatory elements through machine learning could provide additional information for understanding the mechanism of osteoporosis. The regulatory variants we predicted will provide novel targets for etiology research and treatment of osteoporosis.
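
    The classifiers in this record are compared by F1 score, the harmonic mean of precision and recall. A minimal sketch of that metric computed from confusion-matrix counts; the function name and the counts are hypothetical and are not taken from the study.

```python
def f1_from_counts(true_pos, false_pos, false_neg):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        raise ValueError("F1 is undefined when there are no positives at all")
    return 2 * true_pos / denominator


# Hypothetical classifier output: 8 true positives, 2 false positives,
# 2 false negatives, so precision = recall = 0.8.
print(f1_from_counts(true_pos=8, false_pos=2, false_neg=2))  # prints 0.8
```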

  8. Profiling deleterious non-synonymous SNPs of smoker's gene CYP1A1.

    PubMed

    Ramesh, A Sai; Khan, Imran; Farhan, Md; Thiagarajan, Padma

    2013-01-01

    The CYP1A1 gene belongs to the cytochrome P450 family and is better known as the smokers' gene because of its hyperactivation as a consequence of long-term smoking. The expression of CYP1A1 induces polycyclic aromatic hydrocarbon production in the lungs, which, when overexpressed, is known to cause smoking-related diseases such as cardiovascular pathologies, cancer, and diabetes. Single nucleotide polymorphisms (SNPs) are the simplest form of genetic variation, occur at a high frequency, and are classed as synonymous or non-synonymous on the basis of their effects on the amino acids. This study adopts a systematic in silico approach to predict the deleterious SNPs that are associated with disease conditions. It is inferred that four SNPs are highly deleterious, among which the SNP rs17861094 is predicted to be harmful by all tools. A hydrophobic (isoleucine) to hydrophilic (serine) amino acid variation was observed in the candidate gene. Hence, this investigation aims to characterize a candidate gene from 159 SNPs of CYP1A1.

  9. Genetic variation predicting cisplatin cytotoxicity associated with overall survival in lung cancer patients receiving platinum-based chemotherapy

    PubMed Central

    Tan, Xiang-Lin; Moyer, Ann M.; Fridley, Brooke L.; Schaid, Daniel J.; Niu, Nifang; Batzler, Anthony J.; Jenkins, Gregory D.; Abo, Ryan P.; Li, Liang; Cunningham, Julie M.; Sun, Zhifu; Yang, Ping; Wang, Liewei

    2011-01-01

    Purpose Inherited variability in the prognosis of lung cancer patients treated with platinum-based chemotherapy has been widely investigated. However, the overall contribution of genetic variation to platinum response is not well established. To identify novel candidate SNPs/genes, we performed a genome-wide association study (GWAS) for cisplatin cytotoxicity using lymphoblastoid cell lines (LCLs), followed by an association study of selected SNPs from the GWAS with overall survival (OS) in lung cancer patients. Experimental Design GWAS for cisplatin were performed with 283 ethnically diverse LCLs. 168 top SNPs were genotyped in 222 small cell and 961 non-small cell lung cancer (SCLC, NSCLC) patients treated with platinum-based therapy. Association of the SNPs with OS was determined using the Cox regression model. Selected candidate genes were functionally validated by siRNA knockdown in human lung cancer cells. Results Among 157 successfully genotyped SNPs, 9 and 10 SNPs were top SNPs associated with OS for patients with NSCLC and SCLC, respectively, although they were not significant after adjusting for multiple testing. Fifteen genes, including 7 located within 200 kb up or downstream of the four top SNPs and 8 genes for which expression was correlated with three SNPs in LCLs were selected for siRNA screening. Knockdown of DAPK3 and METTL6, for which expression levels were correlated with the rs11169748 and rs2440915 SNPs, significantly decreased cisplatin sensitivity in lung cancer cells. Conclusions This series of clinical and complementary laboratory-based functional studies identified several candidate genes/SNPs that might help predict treatment outcomes for platinum-based therapy of lung cancer. PMID:21775533

  10. Evaluation and identification of damaged single nucleotide polymorphisms in COL1A1 gene involved in osteoporosis

    PubMed Central

    Alsaif, Mohammed A.; Al Shammari, Sulaiman A.; Alhamdan, Adel A.

    2012-01-01

    Introduction Single-nucleotide polymorphisms (SNPs) are biomarkers for exploring the genetic basis of many complex human diseases. The prediction of SNPs is promising in modern genetic analysis but it is still a great challenge to identify the functional SNPs in a disease-related gene. The computational approach has overcome this challenge and an increase in the successful rate of genetic association studies and reduced cost of genotyping have been achieved. The objective of this study is to identify deleterious non-synonymous SNPs (nsSNPs) associated with the COL1A1 gene. Material and methods The SNPs were retrieved from the Single Nucleotide Polymorphism Database (dbSNP). Using I-Mutant, protein stability change was calculated. The potentially functional nsSNPs and their effect on proteins were predicted by PolyPhen and SIFT respectively. FASTSNP was used for estimation of risk score. Results Our analysis revealed 247 SNPs as non-synonymous, out of which 5 nsSNPs were found to be least stable by I-Mutant 2.0 with a DDG value of > –1.0. Four nsSNPs, namely rs17853657, rs17857117, rs57377812 and rs1059454, showed a highly deleterious tolerance index score of 0.00 with a change in their physicochemical properties by the SIFT server. Seven nsSNPs, namely rs1059454, rs8179178, rs17853657, rs17857117, rs72656340, rs72656344 and rs72656351, were found to be probably damaging with a PSIC score difference between 2.0 and 3.5 by the PolyPhen server. Three nsSNPs, namely rs1059454, rs17853657 and rs17857117, were found to be highly polymorphic with a risk score of 3-4 with a possible effect of non-conservative change and splicing regulation by FASTSNP. Conclusions Three nsSNPs, namely rs1059454, rs17853657 and rs17857117, are potential functional polymorphisms that are likely to have a functional impact on the COL1A1 gene. PMID:24273577

  11. MultiBLUP: improved SNP-based prediction for complex traits.

    PubMed

    Speed, Doug; Balding, David J

    2014-09-01

    BLUP (best linear unbiased prediction) is widely used to predict complex traits in plant and animal breeding, and increasingly in human genetics. The BLUP mathematical model, which consists of a single random effect term, was adequate when kinships were measured from pedigrees. However, when genome-wide SNPs are used to measure kinships, the BLUP model implicitly assumes that all SNPs have the same effect-size distribution, which is a severe and unnecessary limitation. We propose MultiBLUP, which extends the BLUP model to include multiple random effects, allowing greatly improved prediction when the random effects correspond to classes of SNPs with distinct effect-size variances. The SNP classes can be specified in advance, for example, based on SNP functional annotations, and we also provide an adaptive procedure for determining a suitable partition of SNPs. We apply MultiBLUP to genome-wide association data from the Wellcome Trust Case Control Consortium (seven diseases), and from much larger studies of celiac disease and inflammatory bowel disease, finding that it consistently provides better prediction than alternative methods. Moreover, MultiBLUP is computationally very efficient; for the largest data set, which includes 12,678 individuals and 1.5 M SNPs, the total analysis can be run on a single desktop PC in less than a day and can be parallelized to run even faster. Tools to perform MultiBLUP are freely available in our software LDAK. © 2014 Speed and Balding; Published by Cold Spring Harbor Laboratory Press.

  12. Cost–Effective Prediction of Gender-Labeling Errors and Estimation of Gender-Labeling Error Rates in Candidate-Gene Association Studies

    PubMed Central

    Qu, Conghui; Schuetz, Johanna M.; Min, Jeong Eun; Leach, Stephen; Daley, Denise; Spinelli, John J.; Brooks-Wilson, Angela; Graham, Jinko

    2011-01-01

    We describe a statistical approach to predict gender-labeling errors in candidate-gene association studies, when Y-chromosome markers have not been included in the genotyping set. The approach adds value to methods that consider only the heterozygosity of X-chromosome SNPs, by incorporating available information about the intensity of X-chromosome SNPs in candidate genes relative to autosomal SNPs from the same individual. To our knowledge, no published methods formalize a framework in which heterozygosity and relative intensity are simultaneously taken into account. Our method offers the advantage that, in the genotyping set, no additional space is required beyond that already assigned to X-chromosome SNPs in the candidate genes. We also show how the predictions can be used in a two-phase sampling design to estimate the gender-labeling error rates for an entire study, at a fraction of the cost of a conventional design. PMID:22303327

  13. Computational Methods to Work as First-Pass Filter in Deleterious SNP Analysis of Alkaptonuria

    PubMed Central

    Magesh, R.; George Priya Doss, C.

    2012-01-01

    A major challenge in the analysis of human genetic variation is to distinguish functional from nonfunctional SNPs. Discovering these functional SNPs is one of the main goals of modern genetics and genomics studies. There is a need to effectively and efficiently identify functionally important nsSNPs which may be deleterious or disease causing and to identify their molecular effects. Predicting the phenotype of nsSNPs by computational analysis may provide a good way to explore the function of nsSNPs and their relationship with susceptibility to disease. In this context, we surveyed and compared variation databases along with in silico prediction programs to assess the effects of deleterious functional variants on protein functions. In addition, we tested whether these methods can work as a first-pass filter to identify the deleterious substitutions worth pursuing in further experimental research. In this analysis, we used the existing computational methods to explore the mutation-structure-function relationship in the HGD gene causing alkaptonuria. PMID:22606059

  14. DRDB: An Online Date Palm Genomic Resource Database.

    PubMed

    He, Zilong; Zhang, Chengwei; Liu, Wanfei; Lin, Qiang; Wei, Ting; Aljohi, Hasan A; Chen, Wei-Hua; Hu, Songnian

    2017-01-01

    Background: Date palm ( Phoenix dactylifera L.) is a cultivated woody plant with agricultural and economic importance in many countries around the world. With the advantages of next generation sequencing technologies, genome sequences for many date palm cultivars have been released recently. Short sequence repeat (SSR) and single nucleotide polymorphism (SNP) can be identified from these genomic data, and have been proven to be very useful biomarkers in plant genome analysis and breeding. Results: Here, we first improved the date palm genome assembly using 130X of HiSeq data generated in our lab. Then 246,445 SSRs (214,901 SSRs and 31,544 compound SSRs) were annotated in this genome assembly; among the SSRs, mononucleotide SSRs (58.92%) were the most abundant, followed by di- (29.92%), tri- (8.14%), tetra- (2.47%), penta- (0.36%), and hexa-nucleotide SSRs (0.19%). The high-quality PCR primer pairs were designed for most (174,497; 70.81% out of total) SSRs. We also annotated 6,375,806 SNPs with raw read depth≥3 in 90% cultivars. To further reduce false positive SNPs, we only kept 5,572,650 (87.40% out of total) SNPs with at least 20% cultivars support for downstream analyses. The high-quality PCR primer pairs were also obtained for 4,177,778 (65.53%) SNPs. We reconstructed the phylogenetic relationships among the 62 cultivars using these variants and found that they can be divided into three clusters, namely North Africa, Egypt - Sudan, and Middle East - South Asian, with Egypt - Sudan being the admixture of North Africa and Middle East - South Asian cultivars; we further confirmed these clusters using principal component analysis. Moreover, 34,346 SSRs and 4,177,778 SNPs with PCR primers were assigned to shared cultivars for cultivar classification and diversity analysis. All these SSRs, SNPs and their classification are available in our database, and can be used for cultivar identification, comparison, and molecular breeding. 
    Conclusion: DRDB is a comprehensive genomic resource database of date palm. It can serve as a bioinformatics platform for date palm genomics, genetics, and molecular breeding. DRDB is freely available at http://drdb.big.ac.cn/home.

  15. DRDB: An Online Date Palm Genomic Resource Database

    PubMed Central

    He, Zilong; Zhang, Chengwei; Liu, Wanfei; Lin, Qiang; Wei, Ting; Aljohi, Hasan A.; Chen, Wei-Hua; Hu, Songnian

    2017-01-01

    Background: Date palm (Phoenix dactylifera L.) is a cultivated woody plant with agricultural and economic importance in many countries around the world. With the advantages of next generation sequencing technologies, genome sequences for many date palm cultivars have been released recently. Short sequence repeat (SSR) and single nucleotide polymorphism (SNP) can be identified from these genomic data, and have been proven to be very useful biomarkers in plant genome analysis and breeding. Results: Here, we first improved the date palm genome assembly using 130X of HiSeq data generated in our lab. Then 246,445 SSRs (214,901 SSRs and 31,544 compound SSRs) were annotated in this genome assembly; among the SSRs, mononucleotide SSRs (58.92%) were the most abundant, followed by di- (29.92%), tri- (8.14%), tetra- (2.47%), penta- (0.36%), and hexa-nucleotide SSRs (0.19%). The high-quality PCR primer pairs were designed for most (174,497; 70.81% out of total) SSRs. We also annotated 6,375,806 SNPs with raw read depth≥3 in 90% cultivars. To further reduce false positive SNPs, we only kept 5,572,650 (87.40% out of total) SNPs with at least 20% cultivars support for downstream analyses. The high-quality PCR primer pairs were also obtained for 4,177,778 (65.53%) SNPs. We reconstructed the phylogenetic relationships among the 62 cultivars using these variants and found that they can be divided into three clusters, namely North Africa, Egypt – Sudan, and Middle East – South Asian, with Egypt – Sudan being the admixture of North Africa and Middle East – South Asian cultivars; we further confirmed these clusters using principal component analysis. Moreover, 34,346 SSRs and 4,177,778 SNPs with PCR primers were assigned to shared cultivars for cultivar classification and diversity analysis. All these SSRs, SNPs and their classification are available in our database, and can be used for cultivar identification, comparison, and molecular breeding. 
    Conclusion: DRDB is a comprehensive genomic resource database of date palm. It can serve as a bioinformatics platform for date palm genomics, genetics, and molecular breeding. DRDB is freely available at http://drdb.big.ac.cn/home. PMID:29209336

  16. Identification of single nucleotide polymorphism in ginger using expressed sequence tags

    PubMed Central

    Chandrasekar, Arumugam; Riju, Aikkal; Sithara, Kandiyl; Anoop, Sahadevan; Eapen, Santhosh J

    2009-01-01

    Ginger (Zingiber officinale Rosc) (Family: Zingiberaceae) is a herbaceous perennial, the rhizomes of which are used as a spice. Ginger is also well known for its medicinal applications. EST-derived SNPs are a free by-product of the currently expanding EST (Expressed Sequence Tag) databases. The development of high-throughput methods for the detection of SNPs (Single Nucleotide Polymorphisms) and small indels (insertions/deletions) has led to a revolution in their use as molecular markers. The available ginger EST sequences (38,139) were mined from dbEST of NCBI. The CAP3 program was used to assemble the EST sequences into contigs. Candidate SNPs and indel polymorphisms were detected using the Perl script AutoSNP version 1.0, which used 31,905 ESTs for detecting SNP and indel sites. We found 64,026 SNP sites and 7,034 indel polymorphisms, with a frequency of 0.84 SNPs per 100 bp. Among the three tissues from which the EST libraries had been generated, rhizomes had the highest frequency (1.08 SNPs/indels per 100 bp), leaves had the lowest (0.63 per 100 bp), and roots were intermediate (0.82 per 100 bp). The transition to transversion ratio was 0.90; that is, transversions outnumbered transitions among the detected SNPs. These detected SNPs can be used as markers for genetic studies. Availability The results of the present study are hosted on our webserver www.spices.res.in/spicesnip PMID:20198184
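
    The transition to transversion ratio quoted in this record rests on a simple classification: a substitution between two purines (A, G) or between two pyrimidines (C, T) is a transition, and any purine-pyrimidine exchange is a transversion. A minimal sketch of that counting follows; the function names and the four example SNPs are hypothetical and are not drawn from the ginger data.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for A<->G or C<->T; False for any purine<->pyrimidine change."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a valid SNP: {ref}>{alt}")
    pair = {ref, alt}
    return pair == PURINES or pair == PYRIMIDINES


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions over (ref, alt) pairs."""
    transitions = sum(1 for ref, alt in snps if is_transition(ref, alt))
    transversions = len(snps) - transitions
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions / transversions


# Hypothetical SNPs: two transitions and two transversions.
example = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(example))  # prints 1.0
```

    A ratio below 1, such as the 0.90 reported here, means that transversions outnumber transitions in the call set.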

  17. BAC-End Sequence-Based SNP Mining in Allotetraploid Cotton (Gossypium) Utilizing Resequencing Data, Phylogenetic Inferences, and Perspectives for Genetic Mapping.

    PubMed

    Hulse-Kemp, Amanda M; Ashrafi, Hamid; Stoffel, Kevin; Zheng, Xiuting; Saski, Christopher A; Scheffler, Brian E; Fang, David D; Chen, Z Jeffrey; Van Deynze, Allen; Stelly, David M

    2015-04-09

    A bacterial artificial chromosome library and BAC-end sequences for cultivated cotton (Gossypium hirsutum L.) have recently been developed. This report presents genome-wide single nucleotide polymorphism (SNP) mining utilizing resequencing data with BAC-end sequences as a reference by alignment of 12 G. hirsutum L. lines, one G. barbadense L. line, and one G. longicalyx Hutch and Lee line. A total of 132,262 intraspecific SNPs have been developed for G. hirsutum, whereas 223,138 and 470,631 interspecific SNPs have been developed for G. barbadense and G. longicalyx, respectively. Using a set of interspecific SNPs, 11 randomly selected and 77 SNPs that are putatively associated with the homeologous chromosome pair 12 and 26, we mapped 77 SNPs into two linkage groups representing these chromosomes, spanning a total of 236.2 cM in an interspecific F2 population (G. barbadense 3-79 × G. hirsutum TM-1). The mapping results validated the approach for reliably producing large numbers of both intraspecific and interspecific SNPs aligned to BAC-ends. This will allow for future construction of high-density integrated physical and genetic maps for cotton and other complex polyploid genomes. The methods developed will allow for future Gossypium resequencing data to be automatically genotyped for identified SNPs along the BAC-end sequence reference for anchoring sequence assemblies and comparative studies. Copyright © 2015 Hulse-Kemp et al.

  18. Ancestry prediction in Singapore population samples using the Illumina ForenSeq kit.

    PubMed

    Ramani, Anantharaman; Wong, Yongxun; Tan, Si Zhen; Shue, Bing Hong; Syn, Christopher

    2017-11-01

    The ability to predict bio-geographic ancestry can be valuable to generate investigative leads towards solving crimes. Ancestry informative marker (AIM) sets include large numbers of SNPs to predict an ancestral population. Massively parallel sequencing has enabled forensic laboratories to genotype a large number of such markers in a single assay. Illumina's ForenSeq DNA Signature Kit includes the ancestry informative SNPs reported by Kidd et al. In this study, the ancestry prediction capabilities of the ForenSeq kit through sequencing on the MiSeq FGx were evaluated in 1030 unrelated Singapore population samples of Chinese, Malay and Indian origin. A total of 59 ancestry SNPs and phenotypic SNPs with AIM properties were selected. The bio-geographic ancestry of the 1030 samples, as predicted by Illumina's ForenSeq Universal Analysis Software (UAS), was determined. 712 of the genotyped samples were used as a training sample set for the generation of an ancestry prediction model using STRUCTURE and Snipper. The performance of the prediction model was tested by both methods with the remaining 318 samples. Ancestry prediction in UAS was able to correctly classify the Singapore Chinese as part of the East Asian cluster, while Indians clustered with Admixed Americans and Malays clustered in-between these two reference populations. Principal component analyses showed that the 59 SNPs were only able to account for 26% of the variation between the Singapore sub-populations. Their discriminatory potential was also found to be lower (GST = 0.085) than that reported in ALFRED (FST = 0.357). The Snipper algorithm was able to correctly predict bio-geographic ancestry in 91% of Chinese and Indian, and 88% of Malay individuals, while the success rates for the STRUCTURE algorithm were 94% in Chinese, 80% in Malay, and 91% in Indian individuals. Both these algorithms were able to provide admixture proportions when present. Ancestry prediction accuracy (in terms of likelihood ratio) was generally high in the absence of admixture. Misclassification occurred in admixed individuals, who were likely offspring of inter-ethnic marriages, and hence whose self-reported bio-geographic ancestries were dependent on that of their fathers, and in individuals of minority sub-populations with inter-ethnic beliefs. The ancestry prediction capabilities of the 59 SNPs on the ForenSeq kit were reasonably effective in differentiating the Singapore Chinese, Malay and Indian sub-populations, and will be of use for investigative purposes. However, there is potential for more accurate prediction through the evaluation of other AIM sets. Copyright © 2017 Elsevier B.V. All rights reserved.

  19. Statistical modelling of growth using a mixed model with orthogonal polynomials.

    PubMed

    Suchocki, T; Szyda, J

    2011-02-01

    In statistical modelling, the effects of single-nucleotide polymorphisms (SNPs) are often regarded as time-independent. However, for traits recorded repeatedly, it is very interesting to investigate the behaviour of gene effects over time. In the analysis, simulated data from the 13th QTL-MAS Workshop (Wageningen, The Netherlands, April 2009) was used and the major goal was the modelling of genetic effects as time-dependent. For this purpose, a mixed model which describes each effect using the third-order Legendre orthogonal polynomials, in order to account for the correlation between consecutive measurements, is fitted. In this model, SNPs are modelled as fixed, while the environment is modelled as random effects. The maximum likelihood estimates of model parameters are obtained by the expectation-maximisation (EM) algorithm and the significance of the additive SNP effects is based on the likelihood ratio test, with p-values corrected for multiple testing. For each significant SNP, the percentage of the total variance contributed by this SNP is calculated. Moreover, by using a model which simultaneously incorporates effects of all of the SNPs, the prediction of future yields is conducted. As a result, 179 from the total of 453 SNPs covering 16 out of 18 true quantitative trait loci (QTL) were selected. The correlation between predicted and true breeding values was 0.73 for the data set with all SNPs and 0.84 for the data set with selected SNPs. In conclusion, we showed that a longitudinal approach allows for estimating changes of the variance contributed by each SNP over time and demonstrated that, for prediction, the pre-selection of SNPs plays an important role.
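
    The model in this record describes each effect with third-order Legendre orthogonal polynomials. A minimal sketch of how such time covariates could be built, assuming measurement times are first rescaled to the interval [-1, 1] and the polynomials are evaluated with Bonnet's recurrence. The function names and the example time range are hypothetical; the abstract does not state how the authors constructed their design matrix.

```python
def scale_time(t, t_min, t_max):
    """Map a measurement time linearly onto [-1, 1]."""
    if t_max <= t_min:
        raise ValueError("t_max must exceed t_min")
    return 2.0 * (t - t_min) / (t_max - t_min) - 1.0


def legendre_basis(x, degree=3):
    """Values P_0(x)..P_degree(x) of the Legendre polynomials.

    Uses Bonnet's recurrence:
    (n + 1) * P_{n+1}(x) = (2n + 1) * x * P_n(x) - n * P_{n-1}(x).
    """
    if degree < 0:
        raise ValueError("degree must be non-negative")
    if not -1.0 <= x <= 1.0:
        raise ValueError("x must lie in [-1, 1]")
    values = [1.0, x]
    for n in range(1, degree):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: degree + 1]


# Hypothetical record taken halfway through a 0..10 time range.
x = scale_time(5, t_min=0, t_max=10)
print(legendre_basis(x))
```

    Each returned value would multiply one regression coefficient, so a time-dependent SNP effect is described by four coefficients in place of a single constant.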

  20. Design and characterization of a 52K SNP chip for goats.

    PubMed

    Tosser-Klopp, Gwenola; Bardou, Philippe; Bouchez, Olivier; Cabau, Cédric; Crooijmans, Richard; Dong, Yang; Donnadieu-Tonon, Cécile; Eggen, André; Heuven, Henri C M; Jamli, Saadiah; Jiken, Abdullah Johari; Klopp, Christophe; Lawley, Cynthia T; McEwan, John; Martin, Patrice; Moreno, Carole R; Mulsant, Philippe; Nabihoudine, Ibouniyamine; Pailhoux, Eric; Palhière, Isabelle; Rupp, Rachel; Sarry, Julien; Sayre, Brian L; Tircazes, Aurélie; Jun Wang; Wang, Wen; Zhang, Wenguang

    2014-01-01

    The success of Genome Wide Association Studies in the discovery of sequence variation linked to complex traits in humans has increased interest in high throughput SNP genotyping assays in livestock species. Primary goals are QTL detection and genomic selection. The purpose here was design of a 50-60,000 SNP chip for goats. The success of a moderate density SNP assay depends on reliable bioinformatic SNP detection procedures, the technological success rate of the SNP design, even spacing of SNPs on the genome and selection of Minor Allele Frequencies (MAF) suitable to use in diverse breeds. Through the federation of three SNP discovery projects consolidated as the International Goat Genome Consortium, we have identified approximately twelve million high quality SNP variants in the goat genome stored in a database together with their biological and technical characteristics. These SNPs were identified within and between six breeds (meat, milk and mixed): Alpine, Boer, Creole, Katjang, Saanen and Savanna, comprising a total of 97 animals. Whole genome and Reduced Representation Library sequences were aligned on >10 kb scaffolds of the de novo goat genome assembly. The 60,000 selected SNPs, evenly spaced on the goat genome, were submitted for oligo manufacturing (Illumina, Inc) and published in dbSNP along with flanking sequences and map position on goat assemblies (i.e. scaffolds and pseudo-chromosomes), sheep genome V2 and cattle UMD3.1 assembly. Ten breeds were then used to validate the SNP content and 52,295 loci could be successfully genotyped and used to generate a final cluster file. The combined strategy of using mainly whole genome Next Generation Sequencing and mapping on a contig genome assembly, complemented with Illumina design tools proved to be efficient in producing this GoatSNP50 chip. Advances in use of molecular markers are expected to accelerate goat genomic studies in coming years.

  1. Design and Characterization of a 52K SNP Chip for Goats

    PubMed Central

    Tosser-Klopp, Gwenola; Bardou, Philippe; Bouchez, Olivier; Cabau, Cédric; Crooijmans, Richard; Dong, Yang; Donnadieu-Tonon, Cécile; Eggen, André; Heuven, Henri C. M.; Jamli, Saadiah; Jiken, Abdullah Johari; Klopp, Christophe; Lawley, Cynthia T.; McEwan, John; Martin, Patrice; Moreno, Carole R.; Mulsant, Philippe; Nabihoudine, Ibouniyamine; Pailhoux, Eric; Palhière, Isabelle; Rupp, Rachel; Sarry, Julien; Sayre, Brian L.; Tircazes, Aurélie; Jun Wang; Wang, Wen; Zhang, Wenguang

    2014-01-01

    The success of Genome Wide Association Studies in the discovery of sequence variation linked to complex traits in humans has increased interest in high throughput SNP genotyping assays in livestock species. Primary goals are QTL detection and genomic selection. The purpose here was design of a 50–60,000 SNP chip for goats. The success of a moderate density SNP assay depends on reliable bioinformatic SNP detection procedures, the technological success rate of the SNP design, even spacing of SNPs on the genome and selection of Minor Allele Frequencies (MAF) suitable to use in diverse breeds. Through the federation of three SNP discovery projects consolidated as the International Goat Genome Consortium, we have identified approximately twelve million high quality SNP variants in the goat genome stored in a database together with their biological and technical characteristics. These SNPs were identified within and between six breeds (meat, milk and mixed): Alpine, Boer, Creole, Katjang, Saanen and Savanna, comprising a total of 97 animals. Whole genome and Reduced Representation Library sequences were aligned on >10 kb scaffolds of the de novo goat genome assembly. The 60,000 selected SNPs, evenly spaced on the goat genome, were submitted for oligo manufacturing (Illumina, Inc) and published in dbSNP along with flanking sequences and map position on goat assemblies (i.e. scaffolds and pseudo-chromosomes), sheep genome V2 and cattle UMD3.1 assembly. Ten breeds were then used to validate the SNP content and 52,295 loci could be successfully genotyped and used to generate a final cluster file. The combined strategy of using mainly whole genome Next Generation Sequencing and mapping on a contig genome assembly, complemented with Illumina design tools proved to be efficient in producing this GoatSNP50 chip. Advances in use of molecular markers are expected to accelerate goat genomic studies in coming years. PMID:24465974
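
    The abstract names even spacing and a suitable Minor Allele Frequency as selection criteria but does not describe the selection procedure itself, which relied on Illumina design tools. The sketch below shows only one plausible way to combine the two criteria on a single scaffold. The function name, the window-based strategy and the 0.05 MAF threshold are assumptions made for illustration, not the consortium's method.

```python
def pick_evenly_spaced(snps, spacing, min_maf=0.05):
    """Pick at most one SNP per window of `spacing` bp (hypothetical helper).

    snps: iterable of (position, maf) tuples on one scaffold or chromosome.
    Within each window, the eligible SNP closest to the window centre wins.
    """
    best_in_window = {}
    for pos, maf in snps:
        if maf < min_maf:
            continue  # drop SNPs too rare to be informative across breeds
        window = pos // spacing
        centre = window * spacing + spacing / 2
        current = best_in_window.get(window)
        if current is None or abs(pos - centre) < abs(current - centre):
            best_in_window[window] = pos
    return [best_in_window[w] for w in sorted(best_in_window)]


candidates = [(100, 0.30), (480, 0.20), (520, 0.01),
              (1500, 0.40), (1900, 0.10), (2600, 0.25)]
selected = pick_evenly_spaced(candidates, spacing=1000)
print(selected)
```

    With these made-up candidates, the SNP at position 520 is rejected on MAF, and each 1 kb window contributes the eligible SNP nearest its centre.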

  2. When Whole-Genome Alignments Just Won't Work: kSNP v2 Software for Alignment-Free SNP Discovery and Phylogenetics of Hundreds of Microbial Genomes

    PubMed Central

    Gardner, Shea N.; Hall, Barry G.

    2013-01-01

    Effective use of rapid and inexpensive whole genome sequencing for microbes requires fast, memory efficient bioinformatics tools for sequence comparison. The kSNP v2 software finds single nucleotide polymorphisms (SNPs) in whole genome data. kSNP v2 has numerous improvements over kSNP v1 including SNP gene annotation; better scaling for draft genomes available as assembled contigs or raw, unassembled reads; a tool to identify the optimal value of k; distribution of packages of executables for Linux and Mac OS X for ease of installation and user-friendly use; and a detailed User Guide. SNP discovery is based on k-mer analysis, and requires no multiple sequence alignment or the selection of a single reference genome. Most target sets with hundreds of genomes complete in minutes to hours. SNP phylogenies are built by maximum likelihood, parsimony, and distance, based on all SNPs, only core SNPs, or SNPs present in some intermediate user-specified fraction of targets. The SNP-based trees that result are consistent with known taxonomy. kSNP v2 can handle many gigabases of sequence in a single run, and if one or more annotated genomes are included in the target set, SNPs are annotated with protein coding and other information (UTRs, etc.) from GenBank file(s). We demonstrate application of kSNP v2 on sets of viral and bacterial genomes, and discuss in detail analysis of a set of 68 finished E. coli and Shigella genomes and a set of the same genomes to which have been added 47 assemblies and four “raw read” genomes of O104:H4 strains from the recent European E. coli outbreak that resulted in both bloody diarrhea and hemolytic uremic syndrome (HUS), and caused at least 50 deaths. PMID:24349125
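
    The idea of alignment-free, k-mer-based SNP discovery can be sketched in a few lines: for an odd k, a candidate SNP is a pair of flanking sequences that occurs with more than one central base across the input genomes. The code below is a minimal illustration of that idea only. It is not kSNP's implementation, it ignores reverse complements and repeated k-mers within a genome, and the function name is hypothetical.

```python
from collections import defaultdict


def kmer_snps(genomes, k=5):
    """Return candidate SNP loci from a dict {genome name: sequence}.

    A locus is keyed by (left flank, right flank); its value maps each
    observed central base to the sorted list of genomes that carry it.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so each k-mer has one central base")
    half = k // 2
    loci = defaultdict(lambda: defaultdict(set))
    for name, seq in genomes.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            flanks = (kmer[:half], kmer[half + 1:])
            loci[flanks][kmer[half]].add(name)
    # A locus is variable only if more than one central base was observed.
    return {
        flanks: {base: sorted(names) for base, names in alleles.items()}
        for flanks, alleles in loci.items()
        if len(alleles) > 1
    }


toy = {"g1": "ACGTACCTGA", "g2": "ACGTAGCTGA"}
snps = kmer_snps(toy, k=5)
print(snps)
```

    In this toy input the two sequences differ at a single position, so exactly one flank pair is reported, with one allele per genome.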

  4. Sasquatch: predicting the impact of regulatory SNPs on transcription factor binding from cell- and tissue-specific DNase footprints

    PubMed Central

    Suciu, Maria C.; Telenius, Jelena

    2017-01-01

    In the era of genome-wide association studies (GWAS) and personalized medicine, predicting the impact of single nucleotide polymorphisms (SNPs) in regulatory elements is an important goal. Current approaches to determine the potential of regulatory SNPs depend on inadequate knowledge of cell-specific DNA binding motifs. Here, we present Sasquatch, a new computational approach that uses DNase footprint data to estimate and visualize the effects of noncoding variants on transcription factor binding. Sasquatch performs a comprehensive k-mer-based analysis of DNase footprints to determine any k-mer's potential for protein binding in a specific cell type and how this may be changed by sequence variants. Therefore, Sasquatch uses an unbiased approach, independent of known transcription factor binding sites and motifs. Sasquatch only requires a single DNase-seq data set per cell type, from any genotype, and produces consistent predictions from data generated by different experimental procedures and at different sequence depths. Here we demonstrate the effectiveness of Sasquatch using previously validated functional SNPs and benchmark its performance against existing approaches. Sasquatch is available as a versatile webtool incorporating publicly available data, including the human ENCODE collection. Thus, Sasquatch provides a powerful tool and repository for prioritizing likely regulatory SNPs in the noncoding genome. PMID:28904015

  5. Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones

    PubMed Central

    Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O'Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O; Barrero, Roberto A; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; Bonaldo, Maria de Fatima; Bono, Hidemasa; Bromberg, Susan K; Brookes, Anthony J; Bruford, Elspeth; Carninci, Piero; Chelala, Claude; Couillault, Christine; de Souza, Sandro J.; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; R. Gopinath, Gopal; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Paul; Kikuno, Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino, Takashi; Mano, Shuhei; Mariage-Samson, Regine; Mashima, Jun; Matsuda, Hideo; Mewes, Hans-Werner; Minoshima, Shinsei; Nagai, Keiichi; Nagasaki, Hideki; Nagata, Naoki; Nigam, Rajni; Ogasawara, Osamu; Ohara, Osamu; Ohtsubo, Masafumi; Okada, Norihiro; Okido, Toshihisa; Oota, Satoshi; Ota, Motonori; Ota, Toshio; Otsuki, Tetsuji; Piatier-Tonneau, Dominique; Poustka, Annemarie; Ren, Shuang-Xi; Saitou, Naruya; Sakai, Katsunaga; Sakamoto, Shigetaka; Sakate, Ryuichi; Schupp, Ingo; Servant, Florence; Sherry, Stephen; Shiba, Rie; Shimizu, Nobuyoshi; Shimoyama, Mary; Simpson, Andrew J; Soares, Bento; Steward, Charles; Suwa, Makiko; Suzuki, Mami; Takahashi, Aiko; Tamiya, Gen; Tanaka, Hiroshi; Taylor, Todd; Terwilliger, Joseph D; Unneberg, Per; Veeramachaneni, Vamsi; Watanabe, Shinya; Wilming, Laurens; Yasuda, Norikazu; Yoo, Hyang-Sook; Stodolsky, Marvin; Makalowski, Wojciech; Go, Mitiko; Nakai, Kenta; Takagi, Toshihisa; Kanehisa, Minoru; Sakaki, Yoshiyuki; Quackenbush, John; Okazaki, Yasushi; Hayashizaki, Yoshihide; Hide, Winston; Chakraborty, Ranajit; Nishikawa, Ken; Sugawara, Hideaki; Tateno, Yoshio; Chen, Zhu; Oishi, Michio; Tonellato, Peter; Apweiler, Rolf; Okubo, Kousaku; Wagner, Lukas; Wiemann, Stefan; Strausberg, Robert L; Isogai, Takao; Auffray, Charles; Nomura, Nobuo; Sugano, Sumio

    2004-01-01

    The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
    In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology. PMID:15103394

  6. LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources.

    PubMed

    Karchin, Rachel; Diekhans, Mark; Kelly, Libusha; Thomas, Daryl J; Pieper, Ursula; Eswar, Narayanan; Haussler, David; Sali, Andrej

    2005-06-15

    The NCBI dbSNP database lists over 9 million single nucleotide polymorphisms (SNPs) in the human genome, but currently contains limited annotation information. SNPs that result in amino acid residue changes (nsSNPs) are of critical importance in variation between individuals, including disease and drug sensitivity. We have developed LS-SNP, a genomic scale software pipeline to annotate nsSNPs. LS-SNP comprehensively maps nsSNPs onto protein sequences, functional pathways and comparative protein structure models, and predicts positions where nsSNPs destabilize proteins, interfere with the formation of domain-domain interfaces, have an effect on protein-ligand binding or severely impact human health. It currently annotates 28,043 validated SNPs that produce amino acid residue substitutions in human proteins from the SwissProt/TrEMBL database. Annotations can be viewed via a web interface either in the context of a genomic region or by selecting sets of SNPs, genes, proteins or pathways. These results are useful for identifying candidate functional SNPs within a gene, haplotype or pathway and in probing molecular mechanisms responsible for functional impacts of nsSNPs. Availability: http://www.salilab.org/LS-SNP. Contact: rachelk@salilab.org. Supplementary information: http://salilab.org/LS-SNP/supp-info.pdf.

  7. The HIrisPlex-S system for eye, hair and skin colour prediction from DNA: Introduction and forensic developmental validation.

    PubMed

    Chaitanya, Lakshmi; Breslin, Krystal; Zuñiga, Sofia; Wirken, Laura; Pośpiech, Ewelina; Kukla-Bartoszek, Magdalena; Sijen, Titia; Knijff, Peter de; Liu, Fan; Branicki, Wojciech; Kayser, Manfred; Walsh, Susan

    2018-07-01

    Forensic DNA Phenotyping (FDP), i.e. the prediction of human externally visible traits from DNA, has become a fast growing subfield within forensic genetics due to the intelligence information it can provide from DNA traces. FDP outcomes can help focus police investigations in search of unknown perpetrators, who are generally unidentifiable with standard DNA profiling. Therefore, we previously developed and forensically validated the IrisPlex DNA test system for eye colour prediction and the HIrisPlex system for combined eye and hair colour prediction from DNA traces. Here we introduce and forensically validate the HIrisPlex-S DNA test system (S for skin) for the simultaneous prediction of eye, hair, and skin colour from trace DNA. This FDP system consists of two SNaPshot-based multiplex assays targeting a total of 41 SNPs via a novel multiplex assay for 17 skin colour predictive SNPs and the previous HIrisPlex assay for 24 eye and hair colour predictive SNPs, 19 of which also contribute to skin colour prediction. The HIrisPlex-S system further comprises three statistical prediction models, the previously developed IrisPlex model for eye colour prediction based on 6 SNPs, the previous HIrisPlex model for hair colour prediction based on 22 SNPs, and the recently introduced HIrisPlex-S model for skin colour prediction based on 36 SNPs. In the forensic developmental validation testing, the novel 17-plex assay performed in full agreement with the Scientific Working Group on DNA Analysis Methods (SWGDAM) guidelines, as previously shown for the 24-plex assay. Sensitivity testing of the 17-plex assay revealed complete SNP profiles from as little as 63 pg of input DNA, equalling the previously demonstrated sensitivity threshold of the 24-plex HIrisPlex assay. 
    Testing of simulated forensic casework samples such as blood, semen, saliva stains, of inhibited DNA samples, of low quantity touch (trace) DNA samples, and of artificially degraded DNA samples as well as concordance testing, demonstrated the robustness, efficiency, and forensic suitability of the new 17-plex assay, as previously shown for the 24-plex assay. Finally, we provide an update to the publicly available HIrisPlex website https://hirisplex.erasmusmc.nl/, now allowing the estimation of individual probabilities for 3 eye, 4 hair, and 5 skin colour categories from HIrisPlex-S input genotypes. The HIrisPlex-S DNA test represents the first forensically validated tool for skin colour prediction, and reflects the first forensically validated tool for simultaneous eye, hair and skin colour prediction from DNA. Copyright © 2018 Elsevier B.V. All rights reserved.

  8. Genetic polymorphisms associated with breast cancer in Malaysian cohort.

    PubMed

    Chahil, Jagdish Kaur; Munretnam, Khamsigan; Samsudin, Nurulhafizah; Lye, Say Hean; Hashim, Nikman Adli Nor; Ramzi, Nurul Hanis; Velapasamy, Sharmila; Wee, Ler Lian; Alex, Livy

    2015-04-01

    Genome-wide association studies have discovered multiple single nucleotide polymorphisms (SNPs) associated with the risk of common diseases. The objective of this study was to demonstrate the replication of previously published SNPs that showed statistical significance for breast cancer in the Malaysian population. In this case-control study, 80 subjects for each group were recruited from various hospitals in Malaysia. A total of 768 SNPs were genotyped and analyzed to distinguish risk and protective alleles. Three SNPs were found to be associated with increased risk of breast cancer, while six SNPs showed a protective effect. All nine SNPs were statistically significant (p ≤ 0.01), and five SNPs from previous studies were successfully replicated in our study. Significant modifiable (diet) and non-modifiable (family history of breast cancer in a first-degree relative) risk factors were also observed. We identified nine SNPs from this study as conferring either susceptibility or protection to breast cancer, which may serve as potential markers in risk prediction.

  9. SNP discovery in the bovine milk transcriptome using RNA-Seq technology.

    PubMed

    Cánovas, Angela; Rincon, Gonzalo; Islas-Trejo, Alma; Wickramasinghe, Saumya; Medrano, Juan F

    2010-12-01

    High-throughput sequencing of RNA (RNA-Seq) was developed primarily to analyze global gene expression in different tissues. However, it also is an efficient way to discover coding SNPs. The objective of this study was to perform a SNP discovery analysis in the milk transcriptome using RNA-Seq. Seven milk samples from Holstein cows were analyzed by sequencing cDNAs using the Illumina Genome Analyzer system. We detected 19,175 genes expressed in milk samples corresponding to approximately 70% of the total number of genes analyzed. The SNP detection analysis revealed 100,734 SNPs in Holstein samples, and a large number of those corresponded to differences between the Holstein breed and the Hereford bovine genome assembly Btau4.0. The number of polymorphic SNPs within Holstein cows was 33,045. The accuracy of RNA-Seq SNP discovery was tested by comparing SNPs detected in a set of 42 candidate genes expressed in milk that had been resequenced earlier using Sanger sequencing technology. Seventy of 86 SNPs were detected using both RNA-Seq and Sanger sequencing technologies. The KASPar Genotyping System was used to validate unique SNPs found by RNA-Seq but not observed by Sanger technology. Our results confirm that analyzing the transcriptome using RNA-Seq technology is an efficient and cost-effective method to identify SNPs in transcribed regions. This study creates guidelines to maximize the accuracy of SNP discovery and prevention of false-positive SNP detection, and provides more than 33,000 SNPs located in coding regions of genes expressed during lactation that can be used to develop genotyping platforms to perform marker-trait association studies in Holstein cattle.

  10. Ultra-low-density genotype panels for breed assignment of Angus and Hereford cattle.

    PubMed

    Judge, M M; Kelleher, M M; Kearney, J F; Sleator, R D; Berry, D P

    2017-06-01

    Angus and Hereford beef is marketed internationally for apparent superior meat quality attributes; DNA-based breed authenticity could be a useful instrument to ensure consumer confidence on premium meat products. The objective of this study was to develop an ultra-low-density genotype panel to accurately quantify the Angus and Hereford breed proportion in biological samples. Medium-density genotypes (13 306 single nucleotide polymorphisms (SNPs)) were available on 54 703 commercial and 4042 purebred animals. The breed proportion of the commercial animals was generated from the medium-density genotypes and this estimate was regarded as the gold-standard breed composition. Ten genotype panels (100 to 1000 SNPs) were developed from the medium-density genotypes; five methods were used to identify the most informative SNPs and these included the Delta statistic, the fixation (Fst) statistic and an index of both. Breed assignment analyses were undertaken for each breed, panel density and SNP selection method separately with a programme to infer population structure using the entire 13 306 SNP panel (representing the gold-standard measure). Breed assignment was undertaken for all commercial animals (n=54 703), animals deemed to contain some proportion of Angus based on pedigree (n=5740) and animals deemed to contain some proportion of Hereford based on pedigree (n=5187). The predicted breed proportion of all animals from the lower density panels was then compared with the gold-standard breed prediction. Panel density, SNP selection method and breed all had a significant effect on the correlation of predicted and actual breed proportion. Regardless of breed, the Index method of SNP selection numerically (but not significantly) outperformed all other selection methods in accuracy (i.e. correlation and root mean square of prediction) when panel density was ≥300 SNPs. The correlation between actual and predicted breed proportion increased as panel density increased.
    Using 300 SNPs (selected using the global index method), the correlation between predicted and actual breed proportion was 0.993 and 0.995 in the Angus and Hereford validation populations, respectively. When SNP panels optimised for breed prediction in one population were used to predict the breed proportion of a separate population, the correlation between predicted and actual breed proportion was 0.034 and 0.044 weaker in the Hereford and Angus populations, respectively (using the 300 SNP panel). It is necessary to include at least 300 to 400 SNPs (per breed) on genotype panels to accurately predict breed proportion from biological samples.
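
    The Delta and fixation statistics named in the abstract can be illustrated for a biallelic SNP with allele frequencies p1 in the target breed and p2 in the comparison population. The sketch below uses the textbook definitions (Delta as the absolute frequency difference, Fst as (H_T - H_S) / H_T for two equally weighted populations). The abstract does not say how the study combined them, so the "index" here, a simple mean of the two, is purely an assumption, as are all function names.

```python
def delta(p1, p2):
    """Absolute allele-frequency difference between two populations."""
    return abs(p1 - p2)


def fst(p1, p2):
    """Fst = (H_T - H_S) / H_T for two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # SNP is fixed for the same allele in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def rank_snps(freqs, n):
    """Return the n SNP ids with the highest (hypothetical) combined index."""
    score = {snp: (delta(a, b) + fst(a, b)) / 2 for snp, (a, b) in freqs.items()}
    return sorted(score, key=score.get, reverse=True)[:n]


freqs = {"snp1": (1.0, 0.0), "snp2": (0.5, 0.5), "snp3": (0.8, 0.2)}
panel = rank_snps(freqs, 2)
print(panel)
```

    A SNP fixed for different alleles in the two populations scores 1 on both statistics, while one with equal frequencies scores 0 and would not be chosen for a breed-assignment panel.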

  11. Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Catfish Genome Consortium; Wang, Shaolin; Peatman, Eric

    2010-03-23

    Background: Through the Community Sequencing Program, a catfish EST sequencing project was carried out through a collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited EST resource from catfish was available for the purpose of SNP identification. Results: A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35% of the unique sequences had significant similarities to known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs have been identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and the minor allele presence of at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis. Conclusions: This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related Ictalurid catfishes should also provide powerful means for the evaluation of ancient and recent gene duplications, and for the development of high-density microarrays in catfish. The inter- and intra-specific SNPs identified from all catfish EST dataset assembly will greatly benefit the catfish introgression breeding program and whole genome association studies.
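
    The high-quality criterion stated in the abstract (a contig with at least four sequences, and the minor allele present in at least two of them) translates directly into a filter. The sketch below is an illustration only. It assumes per-site allele counts are available and uses the number of ESTs covering the site as a stand-in for contig depth. The function and parameter names are hypothetical.

```python
def is_high_quality_snp(allele_counts, min_sequences=4, min_minor=2):
    """Apply the two thresholds to one putative SNP site.

    allele_counts: dict mapping base -> number of ESTs showing that base.
    """
    counts = sorted((c for c in allele_counts.values() if c > 0), reverse=True)
    if len(counts) < 2:
        return False  # only one allele observed, so not a SNP
    depth = sum(counts)
    minor = counts[1]  # second most frequent allele
    return depth >= min_sequences and minor >= min_minor


sites = {
    "site_a": {"A": 3, "G": 2},  # deep enough, minor allele seen twice
    "site_b": {"A": 3, "G": 1},  # minor allele seen only once
    "site_c": {"C": 2, "T": 1},  # too few sequences
}
kept = [name for name, counts in sites.items() if is_high_quality_snp(counts)]
print(kept)
```

    Requiring the minor allele in two independent ESTs is what guards against single-read sequencing errors being called as SNPs.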

  12. Bioinformatic analyses to select phenotype affecting polymorphisms in HTR2C gene.

    PubMed

    Piva, Francesco; Giulietti, Matteo; Baldelli, Luisa; Nardi, Bernardo; Bellantuono, Cesario; Armeni, Tatiana; Saccucci, Franca; Principato, Giovanni

    2011-08-01

    Single nucleotide polymorphisms (SNPs) in serotonin related genes influence mental disorders, responses to pharmacological and psychotherapeutic treatments. In planning association studies, researchers that want to investigate new SNPs have to select some among a large number of candidates. Our aim is to guide researchers in the selection of the most likely phenotype affecting polymorphisms. Here, we studied serotonin receptor 2C (HTR2C) SNPs because, till now, only relatively few of about 2000 are investigated. We used the most updated and assessed bioinformatic tools to predict which variations can give rise to biological effects among 2450 HTR2C SNPs. We suggest 48 SNPs that are worth considering in future association studies in the field of psychiatry, psychology and pharmacogenomics. Moreover, our analyses point out the biological level probably affected, such as transcription, splicing, miRNA regulation and protein structure, thus allowing to suggest future molecular investigations. Although few association studies are available in literature, their results are in agreement with our predictions, showing that our selection methods can help to guide future association studies. Copyright © 2011 John Wiley & Sons, Ltd.

  13. In silico SNP analysis of the breast cancer antigen NY-BR-1.

    PubMed

    Kosaloglu, Zeynep; Bitzer, Julia; Halama, Niels; Huang, Zhiqin; Zapatka, Marc; Schneeweiss, Andreas; Jäger, Dirk; Zörnig, Inka

    2016-11-18

    Breast cancer is one of the most common malignancies with increasing incidences every year and a leading cause of death among women. Although early stage breast cancer can be effectively treated, there are limited numbers of treatment options available for patients with advanced and metastatic disease. The novel breast cancer associated antigen NY-BR-1 was identified by SEREX analysis and is expressed in the majority (>70%) of breast tumors as well as metastases, in normal breast tissue, in testis and occasionally in prostate tissue. The biological function and regulation of NY-BR-1 are to date unknown. We performed an in silico analysis on the genetic variations of the NY-BR-1 gene using data available in public SNP databases and the tools SIFT, Polyphen and Provean to find possible functional SNPs. Additionally, we considered the allele frequency of the found damaging SNPs and also analyzed data from an in-house sequencing project of 55 breast cancer samples for recurring SNPs, recorded in dbSNP. Over 2800 SNPs are recorded in the dbSNP and NHLBI ESP databases for the NY-BR-1 gene. Of these, 65 (2.07%) are synonymous SNPs, 191 (6.09%) are non-synonymous SNPs, and 2430 (77.48%) are noncoding intronic SNPs. As a result, 69 non-synonymous SNPs were predicted to be damaging by at least two, and 16 SNPs were predicted as damaging by all three of the used tools. The SNPs rs200639888, rs367841401 and rs377750885 were categorized as highly damaging by all three tools. Eight damaging SNPs are located in the ankyrin repeat domain (ANK), a domain known for its frequent involvement in protein-protein interactions. No distinctive features could be observed in the allele frequency of the analyzed SNPs. Considering these results we expect to gain more insights into the variations of the NY-BR-1 gene and their possible impact on giving rise to splice variants and therefore influence the function of NY-BR-1 in healthy tissue as well as in breast cancer.

  14. Application of whole genome sequence data in analyzing the molecular epidemiology of Shiga toxin-producing Escherichia coli O157:H7/H-.

    PubMed

    Yokoyama, Eiji; Hirai, Shinichiro; Ishige, Taichiro; Murakami, Satoshi

    2018-01-02

    Seventeen clusters of Shiga toxin-producing Escherichia coli O157:H7/- (O157) strains, determined by cluster analysis of pulsed-field gel electrophoresis patterns, were analyzed using whole genome sequence (WGS) data to investigate this pathogen's molecular epidemiology. The 17 clusters included 136 strains containing strains from nine outbreaks, with each outbreak caused by a single source contaminated with the organism, as shown by epidemiological contact surveys. WGS data of these strains were used to identify single nucleotide polymorphisms (SNPs) by two methods: short read data were directly mapped to a reference genome (mapping derived SNPs) and common SNPs between the mapping derived SNPs and SNPs in assembled data of short read data (common SNPs). Among both SNPs, those that were detected in genes with a gap were excluded to remove ambiguous SNPs from further analysis. The effectiveness of both SNPs was investigated among all the concatenated SNPs that were detected (whole SNP set); SNPs were divided into three categories based on the genes in which they were located (i.e., backbone SNP set, O-island SNP set, and mobile element SNP set); and SNPs in non-coding regions (intergenic region SNP set). When SNPs from strains isolated from the nine single source derived outbreaks were analyzed using an unweighted pair group method with arithmetic mean tree (UPGMA) and a minimum spanning tree (MST), the maximum pair-wise distances of the backbone SNP set of the mapping derived SNPs were significantly smaller than those of the whole and intergenic region SNP set on both UPGMAs and MSTs. This significant difference was also observed when the backbone SNP set of the common SNPs were examined (Steel-Dwass test, P≤0.01). When the maximum pair-wise distances were compared between the mapping derived and common SNPs, significant differences were observed in those of the whole, mobile element, and intergenic region SNP set (Wilcoxon signed rank test, P≤0.01). 
    When all the strains included in one complex on an MST or one cluster on a UPGMA were designated as the same genotype, the values of the Hunter-Gaston Discriminatory Power Index for the backbone SNP set of the mapping derived and common SNPs were higher than those of other SNP sets. In contrast, the mobile element SNP set could not robustly subdivide lineage I strains of the tested O157 strains using either the mapping derived or the common SNPs. These results suggested that the backbone SNP set was the most effective for analysis of WGS data for O157 in enabling an appropriation of its molecular epidemiology. Copyright © 2017 Elsevier B.V. All rights reserved.
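
    The Hunter-Gaston index mentioned in this abstract can be illustrated with a minimal sketch. It assumes each strain has already been assigned a genotype label (for example, its MST complex or UPGMA cluster); the function name and the example labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def hunter_gaston_index(genotype_labels):
    """Hunter-Gaston discriminatory index.

    D = 1 - sum_j n_j * (n_j - 1) / (N * (N - 1)),
    where n_j is the number of strains with genotype j and N is the
    total number of strains. D is the probability that two strains
    drawn without replacement have different genotypes.
    """
    n_total = len(genotype_labels)
    if n_total < 2:
        raise ValueError("need at least two strains")
    counts = Counter(genotype_labels)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))


# Hypothetical genotype assignments for six strains (illustration only).
example_labels = ["A", "A", "A", "B", "B", "C"]
example_index = hunter_gaston_index(example_labels)
```

    A value closer to 1 means the typing scheme separates more pairs of strains; comparing the index across SNP sets is one way to read the comparison reported above.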

  15. Improved Genetic Profiling of Anthropometric Traits Using a Big Data Approach.

    PubMed

    Canela-Xandri, Oriol; Rawlik, Konrad; Woolliams, John A; Tenesa, Albert

    2016-01-01

    Genome-wide association studies (GWAS) promised to translate their findings into clinically beneficial improvements of patient management by tailoring disease management to the individual through the prediction of disease risk. However, the ability to translate genetic findings from GWAS into predictive tools that are of clinical utility and which may inform clinical practice has, so far, been encouraging but limited. Here we propose to use a more powerful statistical approach, the use of which has traditionally been limited due to computational requirements and lack of sufficiently large individual level genotyped cohorts, but which improve the prediction of multiple medically relevant phenotypes using the same panel of SNPs. As a proof of principle, we used a shared panel of 319,038 common SNPs with MAF > 0.05 to train the prediction models in 114,264 unrelated White-British individuals for height and four obesity related traits (body mass index, basal metabolic rate, body fat percentage, and waist-to-hip ratio). We obtained prediction accuracies that ranged between 46% and 75% of the maximum achievable given the captured heritable component. For height, this represents an improvement in prediction accuracy of up to 68% (184% more phenotypic variance explained) over SNPs reported to be robustly associated with height in a previous GWAS meta-analysis of similar size. Across-population predictions in White non-British individuals were similar to those in White-British whilst those in Asian and Black individuals were informative but less accurate. We estimate that the genotyping of circa 500,000 unrelated individuals will yield predictions between 66% and 82% of the SNP-heritability captured by common variants in our array. Prediction accuracies did not improve when including rarer SNPs or when fitting multiple traits jointly in multivariate models.

  16. SNPs in putative regulatory regions identified by human mouse comparative sequencing and transcription factor binding site data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Banerjee, Poulabi; Bahlo, Melanie; Schwartz, Jody R.

    2002-01-01

    Genome wide disease association analysis using SNPs is being explored as a method for dissecting complex genetic traits and a vast number of SNPs have been generated for this purpose. As there are cost and throughput limitations of genotyping large numbers of SNPs and statistical issues regarding the large number of dependent tests on the same data set, to make association analysis practical it has been proposed that SNPs should be prioritized based on likely functional importance. The most easily identifiable functional SNPs are coding SNPs (cSNPs) and accordingly cSNPs have been screened in a number of studies. SNPs in gene regulatory sequences embedded in noncoding DNA are another class of SNPs suggested for prioritization due to their predicted quantitative impact on gene expression. The main challenge in evaluating these SNPs, in contrast to cSNPs is a lack of robust algorithms and databases for recognizing regulatory sequences in noncoding DNA. Approaches that have been previously used to delineate noncoding sequences with gene regulatory activity include cross-species sequence comparisons and the search for sequences recognized by transcription factors. We combined these two methods to sift through mouse human genomic sequences to identify putative gene regulatory elements and subsequently localized SNPs within these sequences in a 1 Megabase (Mb) region of human chromosome 5q31, orthologous to mouse chromosome 11 containing the Interleukin cluster.

  17. Sasquatch: predicting the impact of regulatory SNPs on transcription factor binding from cell- and tissue-specific DNase footprints.

    PubMed

    Schwessinger, Ron; Suciu, Maria C; McGowan, Simon J; Telenius, Jelena; Taylor, Stephen; Higgs, Doug R; Hughes, Jim R

    2017-10-01

    In the era of genome-wide association studies (GWAS) and personalized medicine, predicting the impact of single nucleotide polymorphisms (SNPs) in regulatory elements is an important goal. Current approaches to determine the potential of regulatory SNPs depend on inadequate knowledge of cell-specific DNA binding motifs. Here, we present Sasquatch, a new computational approach that uses DNase footprint data to estimate and visualize the effects of noncoding variants on transcription factor binding. Sasquatch performs a comprehensive k-mer-based analysis of DNase footprints to determine any k-mer's potential for protein binding in a specific cell type and how this may be changed by sequence variants. Therefore, Sasquatch uses an unbiased approach, independent of known transcription factor binding sites and motifs. Sasquatch only requires a single DNase-seq data set per cell type, from any genotype, and produces consistent predictions from data generated by different experimental procedures and at different sequence depths. Here we demonstrate the effectiveness of Sasquatch using previously validated functional SNPs and benchmark its performance against existing approaches. Sasquatch is available as a versatile webtool incorporating publicly available data, including the human ENCODE collection. Thus, Sasquatch provides a powerful tool and repository for prioritizing likely regulatory SNPs in the noncoding genome. © 2017 Schwessinger et al.; Published by Cold Spring Harbor Laboratory Press.

  18. Multiple SNPs Within and Surrounding the Apolipoprotein E Gene Influence Cerebrospinal Fluid Apolipoprotein E Protein Levels

    PubMed Central

    Bekris, Lynn M.; Millard, Steven P.; Galloway, Nichole M.; Vuletic, Simona; Albers, John J.; Li, Ge; Galasko, Douglas R.; DeCarli, Charles; Farlow, Martin R.; Clark, Chris M.; Quinn, Joseph F.; Kaye, Jeffrey A.; Schellenberg, Gerard D.; Tsuang, Debby; Peskind, Elaine R.; Yu, Chang-En

    2010-01-01

    The ε4 allele of the apolipoprotein E gene (APOE) is associated with increased risk and earlier age at onset in late onset Alzheimer’s disease (AD). Other factors, such as expression level of apolipoprotein E protein (apoE), have been postulated to modify the APOE related risk of developing AD. Multiple loci in and outside of APOE are associated with a high risk of AD. The aim of this exploratory hypothesis generating investigation was to determine if some of these loci predict cerebrospinal fluid (CSF) apoE levels in healthy non-demented subjects. CSF apoE levels were measured from healthy non-demented subjects 21–87 years of age (n = 134). Backward regression models were used to evaluate the influence of 21 SNPs, within and surrounding APOE, on CSF apoE levels while taking into account age, gender, APOE ε4 and correlation between SNPs (linkage disequilibrium). APOE ε4 genotype does not predict CSF apoE levels. Three SNPs within the TOMM40 gene, one APOE promoter SNP and two SNPs within distal APOE enhancer elements (ME1 and BCR) predict CSF apoE levels. Further investigation of the genetic influence of these loci on apoE expression levels in the central nervous system is likely to provide new insight into apoE regulation as well as AD pathogenesis. PMID:18430993

  19. Discovery of Pod Shatter-Resistant Associated SNPs by Deep Sequencing of a Representative Library Followed by Bulk Segregant Analysis in Rapeseed

    PubMed Central

    Huang, Shunmou; Yang, Hongli; Zhan, Gaomiao; Wang, Xinfa; Liu, Guihua; Wang, Hanzhong

    2012-01-01

    Background Single nucleotide polymorphisms (SNPs) are an important class of genetic marker for target gene mapping. As of yet, there is no rapid and effective method to identify SNPs linked with agronomic traits in rapeseed and other crop species. Methodology/Principal Findings We demonstrate a novel method for identifying SNP markers in rapeseed by deep sequencing a representative library and performing bulk segregant analysis. With this method, SNPs associated with rapeseed pod shatter-resistance were discovered. Firstly, a reduced representation of the rapeseed genome was used. Genomic fragments ranging from 450–550 bp were prepared from the susceptible bulk (ten F2 plants with the silique shattering resistance index, SSRI <0.10) and the resistance bulk (ten F2 plants with SSRI >0.90), and Solexa sequencing produced 90 bp reads. Approximately 50 million of these sequence reads were assembled into contigs to a depth of 20-fold coverage. Secondly, 60,396 ‘simple SNPs’ were identified, and the statistical significance was evaluated using Fisher's exact test. The 70 associated SNPs with a –log10 p value over 16 were selected for further analysis. These SNPs formed a tight cluster, which consisted of 14 associated SNPs within a 396 kb region on chromosome A09. Our evidence indicates that this region contains a major quantitative trait locus (QTL). Finally, two associated SNPs from this region were mapped on a major QTL region. Conclusions/Significance 70 associated SNPs were discovered and a major QTL for rapeseed pod shatter-resistance was found on chromosome A09 using our novel method. The associated SNP markers were used for mapping of the QTL, and may be useful for improving pod shatter-resistance in rapeseed through marker-assisted selection and map-based cloning. This approach will accelerate the discovery of major QTLs and the cloning of functional genes for important agronomic traits in rapeseed and other crop species. PMID:22529909
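
    The significance filter described in this abstract (Fisher's exact test, then a –log10 p cutoff) can be sketched as follows. The abstract does not say how the contingency table was built; this sketch assumes a 2x2 table of reference and alternate allele read counts in the resistant and susceptible bulks, and the read counts shown are invented for illustration.

```python
from math import comb, log10


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, total)


# Hypothetical read counts at one SNP (assumption, not study data):
# rows = resistant bulk, susceptible bulk; columns = ref reads, alt reads.
p_value = fisher_exact_two_sided(30, 0, 0, 30)
neg_log10_p = -log10(p_value)
passes_cutoff = neg_log10_p > 16  # cutoff of 16 is taken from the abstract
```

    With complete allele separation at 30 reads per bulk, the hypothetical SNP clears the cutoff of 16; partial separation or shallower coverage would not, which shows how strict this threshold is.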

  20. Combined sequence and sequence-structure-based methods for analyzing RAAS gene SNPs: a computational approach.

    PubMed

    Singh, Kh Dhanachandra; Karthikeyan, Muthusamy

    2014-12-01

    The renin-angiotensin-aldosterone system (RAAS) plays a key role in the regulation of blood pressure (BP). Mutations in the genes that encode components of the RAAS have played a significant role in genetic susceptibility to hypertension and have been intensively scrutinized. The identification of such probably causal mutations not only provides insight into the RAAS but may also yield antihypertensive therapeutic targets and diagnostic markers. Analyzing a huge dataset of SNPs, containing both functional and neutral SNPs, is challenging because it is difficult to apply an experimental approach to every SNP to determine its biological significance. To explore the functional significance of genetic mutations (SNPs), we adopted a combined sequence and sequence-structure-based SNP analysis algorithm. Out of 3864 SNPs reported in dbSNP, we found 108 missense SNPs in the coding region and the remainder in the non-coding region. In this study, we report a coding-region SNP as deleterious only when three or more tools predict it to be deleterious and it has a high RMSD from the native structure. Based on these analyses, we identified two SNPs of the REN gene, eight SNPs of the AGT gene, three SNPs of the ACE gene, two SNPs of the AT1R gene, three SNPs of the CYP11B2 gene and three SNPs of the CMA1 gene in the coding region as deleterious. This type of study will help reduce the cost and time of identifying potential SNPs and will also help in selecting potential SNPs for experimental study from the SNP pool.

  1. Educational Attainment: A Genome Wide Association Study in 9538 Australians

    PubMed Central

    Martin, Nicolas W.; Medland, Sarah E.; Verweij, Karin J. H.; Lee, S. Hong; Nyholt, Dale R.; Madden, Pamela A.; Heath, Andrew C.; Montgomery, Grant W.; Wright, Margaret J.; Martin, Nicholas G.

    2011-01-01

    Background Correlations between Educational Attainment (EA) and measures of cognitive performance are as high as 0.8. This makes EA an attractive alternative phenotype for studies wishing to map genes affecting cognition due to the ease of collecting EA data compared to other cognitive phenotypes such as IQ. Methodology In an Australian family sample of 9538 individuals we performed a genome-wide association scan (GWAS) using the imputed genotypes of ∼2.4 million single nucleotide polymorphisms (SNPs) for a 6-point scale measure of EA. Top hits were checked for replication in an independent sample of 968 individuals. A gene-based test of association was then applied to the GWAS results. Additionally we performed prediction analyses using the GWAS results from our discovery sample to assess the percentage of EA and full scale IQ variance explained by the predicted scores. Results The best SNP fell short of having a genome-wide significant p-value (p = 9.77×10⁻⁷). In our independent replication sample six SNPs among the top 50 hits pruned for linkage disequilibrium (r²<0.8) had a p-value<0.05 but only one of these SNPs survived correction for multiple testing - rs7106258 (p = 9.7×10⁻⁴) located in an intergenic region of chromosome 11q14.1. The gene based test results were non-significant and our prediction analyses show that the predicted scores explained little variance in EA in our replication sample. Conclusion While we have identified a polymorphism on chromosome 11q14.1 associated with EA, further replication is warranted. Overall, the absence of genome-wide significant p-values in our large discovery sample confirmed the high polygenic architecture of EA. Only the assembly of large samples or meta-analytic efforts will be able to assess the implication of common DNA polymorphisms in the etiology of EA. PMID:21694764

  2. A survey of genome-wide single nucleotide polymorphisms through genome resequencing in the Périgord black truffle (Tuber melanosporum Vittad.).

    PubMed

    Payen, Thibaut; Murat, Claude; Gigant, Anaïs; Morin, Emmanuelle; De Mita, Stéphane; Martin, Francis

    2015-09-01

    The Périgord black truffle (Tuber melanosporum Vittad.), considered a gastronomic delicacy worldwide, is an ectomycorrhizal filamentous fungus that is ecologically important in Mediterranean French, Italian and Spanish woodlands. In this study, we developed a novel resource of single nucleotide polymorphisms (SNPs) for T. melanosporum using Illumina high-throughput resequencing. The genome from six T. melanosporum geographical accessions was sequenced to a depth of approximately 20×. These geographical accessions were selected from different populations within the northern and southern regions of the geographical species distribution. Approximately 80% of the reads for each of the six resequenced geographical accessions mapped against the reference T. melanosporum genome assembly, estimating the core genome size of this organism to be approximately 110 Mbp. A total of 442 326 SNPs corresponding to 3540 SNPs/Mbps were identified as being included in all seven genomes. The SNPs occurred more frequently in repeated sequences (85%), although 4501 SNPs were also identified in the coding regions of 2587 genes. Using the ratio of nonsynonymous mutations per nonsynonymous site (pN) to synonymous mutations per synonymous site (pS) and Tajima's D index scanning the whole genome, we were able to identify genomic regions and genes potentially subjected to positive or purifying selection. The SNPs identified represent a valuable resource for future population genetics and genomics studies. © 2015 John Wiley & Sons Ltd.

  3. In silico prediction of a disease-associated STIL mutant and its affect on the recruitment of centromere protein J (CENPJ).

    PubMed

    Kumar, Ambuj; Rajendran, Vidya; Sethumadhavan, Rao; Purohit, Rituraj

    2012-01-01

    Human STIL (SCL/TAL1 interrupting locus) protein maintains centriole stability and spindle pole localisation. It helps in recruitment of CENPJ (Centromere protein J)/CPAP (centrosomal P4.1-associated protein) and other centrosomal proteins. Mutations in STIL protein are reported in several disorders, especially in deregulation of cell cycle cascades. In this work, we examined the non-synonymous single nucleotide polymorphisms (nsSNPs) reported in STIL protein for their disease association. Different SNP prediction tools were used to predict disease-associated nsSNPs. Our evaluation technique predicted rs147744459 (R242C) as a highly deleterious disease-associated nsSNP and its interaction behaviour with CENPJ protein. Molecular modelling, docking and molecular dynamics simulation were conducted to examine the structural consequences of the predicted disease-associated mutation. By molecular dynamic simulation we observed structural consequences of R242C mutation which affects interaction of STIL and CENPJ functional domains. The result obtained in this study will provide a biophysical insight into future investigations of pathological nsSNPs using a computational platform.

  4. Genome-environment associations in sorghum landraces predict adaptive traits

    PubMed Central

    Lasky, Jesse R.; Upadhyaya, Hari D.; Ramu, Punna; Deshpande, Santosh; Hash, C. Tom; Bonnette, Jason; Juenger, Thomas E.; Hyma, Katie; Acharya, Charlotte; Mitchell, Sharon E.; Buckler, Edward S.; Brenton, Zachary; Kresovich, Stephen; Morris, Geoffrey P.

    2015-01-01

    Improving environmental adaptation in crops is essential for food security under global change, but phenotyping adaptive traits remains a major bottleneck. If associations between single-nucleotide polymorphism (SNP) alleles and environment of origin in crop landraces reflect adaptation, then these could be used to predict phenotypic variation for adaptive traits. We tested this proposition in the global food crop Sorghum bicolor, characterizing 1943 georeferenced landraces at 404,627 SNPs and quantifying allelic associations with bioclimatic and soil gradients. Environment explained a substantial portion of SNP variation, independent of geographical distance, and genic SNPs were enriched for environmental associations. Further, environment-associated SNPs predicted genotype-by-environment interactions under experimental drought stress and aluminum toxicity. Our results suggest that genomic signatures of environmental adaptation may be useful for crop improvement, enhancing germplasm identification and marker-assisted selection. Together, genome-environment associations and phenotypic analyses may reveal the basis of environmental adaptation. PMID:26601206

  5. Development and implementation of a highly-multiplexed SNP array for genetic mapping in maritime pine and comparative mapping with loblolly pine

    PubMed Central

    2011-01-01

    Background Single nucleotide polymorphisms (SNPs) are the most abundant source of genetic variation among individuals of a species. New genotyping technologies allow examining hundreds to thousands of SNPs in a single reaction for a wide range of applications such as genetic diversity analysis, linkage mapping, fine QTL mapping, association studies, marker-assisted or genome-wide selection. In this paper, we evaluated the potential of highly-multiplexed SNP genotyping for genetic mapping in maritime pine (Pinus pinaster Ait.), the main conifer used for commercial plantation in southwestern Europe. Results We designed a custom GoldenGate assay for 1,536 SNPs detected through the resequencing of gene fragments (707 in vitro SNPs/Indels) and from Sanger-derived Expressed Sequenced Tags assembled into a unigene set (829 in silico SNPs/Indels). Offspring from three-generation outbred (G2) and inbred (F2) pedigrees were genotyped. The success rate of the assay was 63.6% and 74.8% for in silico and in vitro SNPs, respectively. A genotyping error rate of 0.4% was further estimated from segregating data of SNPs belonging to the same gene. Overall, 394 SNPs were available for mapping. A total of 287 SNPs were integrated with previously mapped markers in the G2 parental maps, while 179 SNPs were localized on the map generated from the analysis of the F2 progeny. Based on 98 markers segregating in both pedigrees, we were able to generate a consensus map comprising 357 SNPs from 292 different loci. Finally, the analysis of sequence homology between mapped markers and their orthologs in a Pinus taeda linkage map, made it possible to align the 12 linkage groups of both species. Conclusions Our results show that the GoldenGate assay can be used successfully for high-throughput SNP genotyping in maritime pine, a conifer species that has a genome seven times the size of the human genome. 
This SNP-array will be extended thanks to recent sequencing effort using new generation sequencing technologies and will include SNPs from comparative orthologous sequences that were identified in the present study, providing a wider collection of anchor points for comparative genomics among the conifers. PMID:21767361

  6. Identification of type 2 diabetes-associated combination of SNPs using support vector machine.

    PubMed

    Ban, Hyo-Jeong; Heo, Jee Yeon; Oh, Kyung-Soo; Park, Keun-Joon

    2010-04-23

    Type 2 diabetes mellitus (T2D), a metabolic disorder characterized by insulin resistance and relative insulin deficiency, is a complex disease of major public health importance. Its incidence is rapidly increasing in the developed countries. Complex diseases are caused by interactions between multiple genes and environmental factors. Most association studies aim to identify individual susceptibility single markers using a simple disease model. Recent studies are trying to estimate the effects of multiple genes and multi-locus in genome-wide association. However, estimating the effects of association is very difficult. We aim to assess the rules for classifying diseased and normal subjects by evaluating potential gene-gene interactions in the same or distinct biological pathways. We analyzed the importance of gene-gene interactions in T2D susceptibility by investigating 408 single nucleotide polymorphisms (SNPs) in 87 genes involved in major T2D-related pathways in 462 T2D patients and 456 healthy controls from the Korean cohort studies. We evaluated the support vector machine (SVM) method to differentiate between cases and controls using SNP information in a 10-fold cross-validation test. We achieved a 65.3% prediction rate with a combination of 14 SNPs in 12 genes by using the radial basis function (RBF)-kernel SVM. Similarly, we investigated subpopulation data sets of men and women and identified different SNP combinations with the prediction rates of 70.9% and 70.6%, respectively. As the high-throughput technology for genome-wide SNPs improves, it is likely that a much higher prediction rate with biologically more interesting combination of SNPs can be acquired by using this method. Support Vector Machine based feature selection method in this research found novel association between combinations of SNPs and T2D in a Korean population.

  7. De novo assembly of the transcriptome of Aegiceras corniculatum, a mangrove species in the Indo-West Pacific region.

    PubMed

    Fang, Lu; Yang, Yuchen; Guo, Wuxia; Li, Jianfang; Zhong, Cairong; Huang, Yelin; Zhou, Renchao; Shi, Suhua

    2016-08-01

    Aegiceras corniculatum (L.) Blanco is one of the most salt tolerant mangrove species and can thrive in 3% salinity at the seaward edge of mangrove forests. Here we sequenced the transcriptome of A. corniculatum using the Illumina GA platform to develop its genomic resources for ecological and evolutionary studies. We obtained about 50 million high-quality paired-end reads 75 bp in length. Using the short read assembler Velvet, we obtained 49,437 contigs with an average length of 625 bp. A total of 32,744 (66.23%) contigs showed significant similarity to the GenBank non-redundant (NR) protein database. 30,911 and 18,004 of these sequences were assigned to Gene Ontology and eukaryotic orthologous groups of proteins (KOG), respectively. A total of 4942 transcripts from our assemblies had significant similarity with KEGG Orthologs and were involved in 144 KEGG pathways, while 9899 unigenes had enzyme commission (EC) numbers. In addition, 9792 transcriptome-derived SSRs were identified from 7342 sequences. With our strict criteria, 4165 candidate SNPs were also identified from 2058 contigs. Some of these SNPs were further validated by Sanger sequencing. Genomic resources generated in this study should be valuable in ecological, evolutionary, and functional genomics studies for this mangrove species. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Further Improvements to Linear Mixed Models for Genome-Wide Association Studies

    PubMed Central

    Widmer, Christian; Lippert, Christoph; Weissbrod, Omer; Fusi, Nicolo; Kadie, Carl; Davidson, Robert; Listgarten, Jennifer; Heckerman, David

    2014-01-01

    We examine improvements to the linear mixed model (LMM) that better correct for population structure and family relatedness in genome-wide association studies (GWAS). LMMs rely on the estimation of a genetic similarity matrix (GSM), which encodes the pairwise similarity between every two individuals in a cohort. These similarities are estimated from single nucleotide polymorphisms (SNPs) or other genetic variants. Traditionally, all available SNPs are used to estimate the GSM. In empirical studies across a wide range of synthetic and real data, we find that modifications to this approach improve GWAS performance as measured by type I error control and power. Specifically, when only population structure is present, a GSM constructed from SNPs that well predict the phenotype in combination with principal components as covariates controls type I error and yields more power than the traditional LMM. In any setting, with or without population structure or family relatedness, a GSM consisting of a mixture of two component GSMs, one constructed from all SNPs and another constructed from SNPs that well predict the phenotype again controls type I error and yields more power than the traditional LMM. Software implementing these improvements and the experimental comparisons are available at http://microsoft.com/science. PMID:25387525
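
    The genetic similarity matrix (GSM) described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' software: it assumes genotypes coded as 0/1/2 minor-allele counts, uses the common standardized-genotype estimator (per-SNP mean 0 and variance 1, averaged over SNPs), and represents the "mixture of two component GSMs" as a weighted sum with a hypothetical mixing weight.

```python
def genetic_similarity_matrix(genotypes):
    """Estimate a GSM from a list of individuals x SNP allele counts (0/1/2).

    Each SNP column is standardized to mean 0 and variance 1, and the GSM
    is the average over SNPs of the outer product of standardized columns.
    Monomorphic SNPs (zero variance) carry no information and are skipped.
    """
    n_individuals = len(genotypes)
    n_snps = len(genotypes[0])
    standardized = []
    for j in range(n_snps):
        column = [row[j] for row in genotypes]
        mean = sum(column) / n_individuals
        variance = sum((g - mean) ** 2 for g in column) / n_individuals
        if variance == 0:
            continue
        sd = variance ** 0.5
        standardized.append([(g - mean) / sd for g in column])
    if not standardized:
        raise ValueError("no polymorphic SNPs")
    used = len(standardized)
    return [
        [sum(col[i] * col[k] for col in standardized) / used
         for k in range(n_individuals)]
        for i in range(n_individuals)
    ]


def mix_gsms(gsm_all, gsm_selected, weight_selected):
    """Weighted sum of two GSMs; weight_selected in [0, 1] is an assumption."""
    n = len(gsm_all)
    return [
        [(1 - weight_selected) * gsm_all[i][k]
         + weight_selected * gsm_selected[i][k]
         for k in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: 4 individuals, 3 SNPs.
toy_genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 1]]
gsm_all_snps = genetic_similarity_matrix(toy_genotypes)
# Pretend only the first two SNPs were selected as phenotype-predictive.
gsm_selected_snps = genetic_similarity_matrix([row[:2] for row in toy_genotypes])
mixed_gsm = mix_gsms(gsm_all_snps, gsm_selected_snps, 0.5)
```

    How the phenotype-predictive SNPs are chosen and how the mixing weight is fitted are not stated in the abstract, so both are left as inputs here.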

  9. SNP2TFBS - a database of regulatory SNPs affecting predicted transcription factor binding site affinity.

    PubMed

    Kumar, Sunil; Ambrosini, Giovanna; Bucher, Philipp

    2017-01-04

    SNP2TFBS is a computational resource intended to support researchers investigating the molecular mechanisms underlying regulatory variation in the human genome. The database essentially consists of a collection of text files providing specific annotations for human single nucleotide polymorphisms (SNPs), namely whether they are predicted to abolish, create or change the affinity of one or several transcription factor (TF) binding sites. A SNP's effect on TF binding is estimated based on a position weight matrix (PWM) model for the binding specificity of the corresponding factor. These data files are regenerated at regular intervals by an automatic procedure that takes as input a reference genome, a comprehensive SNP catalogue and a collection of PWMs. SNP2TFBS is also accessible over a web interface, enabling users to view the information provided for an individual SNP, to extract SNPs based on various search criteria, to annotate uploaded sets of SNPs or to display statistics about the frequencies of binding sites affected by selected SNPs. Homepage: http://ccg.vital-it.ch/snp2tfbs/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
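
    The PWM-based estimate of a SNP's effect on binding can be illustrated with a small sketch. It is not the SNP2TFBS pipeline: the count matrix, pseudocount, uniform background and single-strand scan are all simplifying assumptions, and the motif and flanking sequences are invented for illustration.

```python
from math import log2

BASES = "ACGT"


def log_odds_pwm(count_matrix, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log-odds position weight matrix."""
    pwm = []
    for counts in count_matrix:
        total = sum(counts[b] for b in BASES) + 4 * pseudocount
        pwm.append(
            {b: log2(((counts[b] + pseudocount) / total) / background)
             for b in BASES}
        )
    return pwm


def best_window_score(pwm, sequence):
    """Highest PWM score over all windows of the sequence (forward strand only)."""
    width = len(pwm)
    return max(
        sum(pwm[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )


def snp_score_change(pwm, left_flank, ref_allele, alt_allele, right_flank):
    """Alt-minus-ref change in the best score among windows overlapping the SNP."""
    k = len(pwm) - 1
    # Trim flanks so that every scanned window contains the SNP position.
    left = left_flank[-k:] if k else ""
    right = right_flank[:k]
    ref_score = best_window_score(pwm, left + ref_allele + right)
    alt_score = best_window_score(pwm, left + alt_allele + right)
    return alt_score - ref_score


# Hypothetical 4-bp motif "TGAC" observed in 10 aligned sites.
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]
toy_pwm = log_odds_pwm(toy_counts)
# A G>C change at the second motif position weakens the best match.
score_change = snp_score_change(toy_pwm, "CCT", "G", "C", "ACC")
```

    A negative change suggests a weakened or abolished site and a positive change a strengthened or created one; the cutoffs that separate "abolish", "create" and "change" are not given in the abstract and are left out here.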

  10. Further Improvements to Linear Mixed Models for Genome-Wide Association Studies

    NASA Astrophysics Data System (ADS)

    Widmer, Christian; Lippert, Christoph; Weissbrod, Omer; Fusi, Nicolo; Kadie, Carl; Davidson, Robert; Listgarten, Jennifer; Heckerman, David

    2014-11-01

    We examine improvements to the linear mixed model (LMM) that better correct for population structure and family relatedness in genome-wide association studies (GWAS). LMMs rely on the estimation of a genetic similarity matrix (GSM), which encodes the pairwise similarity between every two individuals in a cohort. These similarities are estimated from single nucleotide polymorphisms (SNPs) or other genetic variants. Traditionally, all available SNPs are used to estimate the GSM. In empirical studies across a wide range of synthetic and real data, we find that modifications to this approach improve GWAS performance as measured by type I error control and power. Specifically, when only population structure is present, a GSM constructed from SNPs that well predict the phenotype in combination with principal components as covariates controls type I error and yields more power than the traditional LMM. In any setting, with or without population structure or family relatedness, a GSM consisting of a mixture of two component GSMs, one constructed from all SNPs and another constructed from SNPs that well predict the phenotype again controls type I error and yields more power than the traditional LMM. Software implementing these improvements and the experimental comparisons are available at http://microsoft.com/science.

  11. Further improvements to linear mixed models for genome-wide association studies.

    PubMed

    Widmer, Christian; Lippert, Christoph; Weissbrod, Omer; Fusi, Nicolo; Kadie, Carl; Davidson, Robert; Listgarten, Jennifer; Heckerman, David

    2014-11-12

    We examine improvements to the linear mixed model (LMM) that better correct for population structure and family relatedness in genome-wide association studies (GWAS). LMMs rely on the estimation of a genetic similarity matrix (GSM), which encodes the pairwise similarity between every two individuals in a cohort. These similarities are estimated from single nucleotide polymorphisms (SNPs) or other genetic variants. Traditionally, all available SNPs are used to estimate the GSM. In empirical studies across a wide range of synthetic and real data, we find that modifications to this approach improve GWAS performance as measured by type I error control and power. Specifically, when only population structure is present, a GSM constructed from SNPs that well predict the phenotype in combination with principal components as covariates controls type I error and yields more power than the traditional LMM. In any setting, with or without population structure or family relatedness, a GSM consisting of a mixture of two component GSMs, one constructed from all SNPs and another constructed from SNPs that well predict the phenotype again controls type I error and yields more power than the traditional LMM. Software implementing these improvements and the experimental comparisons are available at http://microsoft.com/science.

  12. Developing a clinical utility framework to evaluate prediction models in radiogenomics

    NASA Astrophysics Data System (ADS)

    Wu, Yirong; Liu, Jie; Munoz del Rio, Alejandro; Page, David C.; Alagoz, Oguzhan; Peissig, Peggy; Onitilo, Adedayo A.; Burnside, Elizabeth S.

    2015-03-01

    Combining imaging and genetic information to predict disease presence and behavior is being codified into an emerging discipline called "radiogenomics." Optimal evaluation methodologies for radiogenomics techniques have not been established. We aim to develop a clinical decision framework based on utility analysis to assess prediction models for breast cancer. Our data comes from a retrospective case-control study, collecting Gail model risk factors, genetic variants (single nucleotide polymorphisms-SNPs), and mammographic features in Breast Imaging Reporting and Data System (BI-RADS) lexicon. We first constructed three logistic regression models built on different sets of predictive features: (1) Gail, (2) Gail+SNP, and (3) Gail+SNP+BI-RADS. Then, we generated ROC curves for three models. After we assigned utility values for each category of findings (true negative, false positive, false negative and true positive), we pursued optimal operating points on ROC curves to achieve maximum expected utility (MEU) of breast cancer diagnosis. We used McNemar's test to compare the predictive performance of the three models. We found that SNPs and BI-RADS features augmented the baseline Gail model in terms of the area under ROC curve (AUC) and MEU. SNPs improved sensitivity of the Gail model (0.276 vs. 0.147) and reduced specificity (0.855 vs. 0.912). When additional mammographic features were added, sensitivity increased to 0.457 and specificity to 0.872. SNPs and mammographic features played a significant role in breast cancer risk estimation (p-value < 0.001). Our decision framework comprising utility analysis and McNemar's test provides a novel framework to evaluate prediction models in the realm of radiogenomics.
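
    The utility analysis described in this abstract can be sketched as choosing the operating point with the highest expected utility. In the sketch below the sensitivity and specificity pairs are the values quoted in the abstract (one per model, rather than points along a single ROC curve), while the prevalence and the four utility values are hypothetical placeholders, not the values used by the authors.

```python
def expected_utility(sensitivity, specificity, prevalence, utilities):
    """Expected utility of a classifier at one operating point.

    Each outcome's probability (TP, FN, TN, FP) is weighted by its utility.
    """
    p_tp = prevalence * sensitivity
    p_fn = prevalence * (1 - sensitivity)
    p_tn = (1 - prevalence) * specificity
    p_fp = (1 - prevalence) * (1 - specificity)
    return (
        p_tp * utilities["TP"]
        + p_fn * utilities["FN"]
        + p_tn * utilities["TN"]
        + p_fp * utilities["FP"]
    )


def best_operating_point(points, prevalence, utilities):
    """Return the (sensitivity, specificity) pair with maximum expected utility."""
    return max(
        points,
        key=lambda pt: expected_utility(pt[0], pt[1], prevalence, utilities),
    )


# (sensitivity, specificity) as quoted in the abstract for the three models.
reported_points = {
    "Gail": (0.147, 0.912),
    "Gail+SNP": (0.276, 0.855),
    "Gail+SNP+BI-RADS": (0.457, 0.872),
}
# Hypothetical inputs (assumptions): balanced case-control sample and
# utilities that reward correct calls equally and give errors zero utility.
assumed_prevalence = 0.5
assumed_utilities = {"TP": 1.0, "TN": 1.0, "FP": 0.0, "FN": 0.0}

best_point = best_operating_point(
    reported_points.values(), assumed_prevalence, assumed_utilities
)
```

    Under these placeholder utilities the combined model's point wins; with different utilities, for example a much larger penalty for false negatives, the ranking of operating points could change, which is why the utility assignment matters in such a framework.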

  13. Sequence Analysis of APOA5 Among the Kuwaiti Population Identifies Association of rs2072560, rs2266788, and rs662799 With TG and VLDL Levels

    PubMed Central

    Jasim, Anfal A.; Al-Bustan, Suzanne A.; Al-Kandari, Wafa; Al-Serri, Ahmad; AlAskar, Huda

    2018-01-01

    Common variants of Apolipoprotein A5 (APOA5) have been associated with lipid levels yet very few studies have reported full sequence data from various ethnic groups. The purpose of this study was to analyse the full APOA5 gene sequence to identify variants in 100 healthy Kuwaitis of Arab ethnicities and assess their association with variation in lipid levels in a cohort of 733 samples. Sanger method was used in the direct sequencing of the full 3.7 Kb APOA5 and multiple sequence alignment was used to identify variants. The complete APOA5 sequence in Kuwaiti Arabs has been deposited in GenBank (KJ401315). A total of 20 reported single nucleotide polymorphisms (SNPs) were identified. Two novel SNPs were also identified: a synonymous 2197G>A polymorphism at genomic position 116661525 and a 3′ UTR 3222 C>T polymorphism at genomic position 116660500 based on human genome assembly GRCh37/hg:19. Five SNPs along with the two novel SNPs were selected for validation in the cohort. Association of those SNPs with lipid levels was tested and minor alleles of three SNPs (rs2072560, rs2266788, and rs662799) were found significantly associated with TG and VLDL levels. This is the first study to report the full APOA5 sequence and SNPs in an Arab ethnic group. Analysis of the variants identified and comparison to other populations suggests a distinctive genetic component in Arabs. The positive association observed for rs2072560 and rs2266788 with TG and VLDL levels confirms their role in lipid metabolism. PMID:29686695

  14. Sequence Analysis of APOA5 Among the Kuwaiti Population Identifies Association of rs2072560, rs2266788, and rs662799 With TG and VLDL Levels.

    PubMed

    Jasim, Anfal A; Al-Bustan, Suzanne A; Al-Kandari, Wafa; Al-Serri, Ahmad; AlAskar, Huda

    2018-01-01

    Common variants of Apolipoprotein A5 (APOA5) have been associated with lipid levels yet very few studies have reported full sequence data from various ethnic groups. The purpose of this study was to analyse the full APOA5 gene sequence to identify variants in 100 healthy Kuwaitis of Arab ethnicities and assess their association with variation in lipid levels in a cohort of 733 samples. Sanger method was used in the direct sequencing of the full 3.7 Kb APOA5 and multiple sequence alignment was used to identify variants. The complete APOA5 sequence in Kuwaiti Arabs has been deposited in GenBank (KJ401315). A total of 20 reported single nucleotide polymorphisms (SNPs) were identified. Two novel SNPs were also identified: a synonymous 2197G>A polymorphism at genomic position 116661525 and a 3' UTR 3222 C>T polymorphism at genomic position 116660500 based on human genome assembly GRCh37/hg:19. Five SNPs along with the two novel SNPs were selected for validation in the cohort. Association of those SNPs with lipid levels was tested and minor alleles of three SNPs (rs2072560, rs2266788, and rs662799) were found significantly associated with TG and VLDL levels. This is the first study to report the full APOA5 sequence and SNPs in an Arab ethnic group. Analysis of the variants identified and comparison to other populations suggests a distinctive genetic component in Arabs. The positive association observed for rs2072560 and rs2266788 with TG and VLDL levels confirms their role in lipid metabolism.

  15. Association of single-nucleotide polymorphisms of the tau gene with late-onset Parkinson disease.

    PubMed

    Martin, E R; Scott, W K; Nance, M A; Watts, R L; Hubble, J P; Koller, W C; Lyons, K; Pahwa, R; Stern, M B; Colcher, A; Hiner, B C; Jankovic, J; Ondo, W G; Allen, F H; Goetz, C G; Small, G W; Masterman, D; Mastaglia, F; Laing, N G; Stajich, J M; Ribble, R C; Booze, M W; Rogala, A; Hauser, M A; Zhang, F; Gibson, R A; Middleton, L T; Roses, A D; Haines, J L; Scott, B L; Pericak-Vance, M A; Vance, J M

    2001-11-14

    The human tau gene, which promotes assembly of neuronal microtubules, has been associated with several rare neurologic diseases that clinically include parkinsonian features. We recently observed linkage in idiopathic Parkinson disease (PD) to a region on chromosome 17q21 that contains the tau gene. These factors make tau a good candidate for investigation as a susceptibility gene for idiopathic PD, the most common form of the disease. To investigate whether the tau gene is involved in idiopathic PD. Among a sample of 1056 individuals from 235 families selected from 13 clinical centers in the United States and Australia and from a family ascertainment core center, we tested 5 single-nucleotide polymorphisms (SNPs) within the tau gene for association with PD, using family-based tests of association. Both affected (n = 426) and unaffected (n = 579) family members were included; 51 individuals had unclear PD status. Analyses were conducted to test individual SNPs and SNP haplotypes within the tau gene. Family-based tests of association, calculated using asymptotic distributions. Analysis of association between the SNPs and PD yielded significant evidence of association for 3 of the 5 SNPs tested: SNP 3, P =.03; SNP 9i, P =.04; and SNP 11, P =.04. The 2 other SNPs did not show evidence of significant association (SNP 9ii, P =.11, and SNP 9iii, P =.87). Strong evidence of association was found with haplotype analysis, with a positive association with one haplotype (P =.009) and a negative association with another haplotype (P =.007). Substantial linkage disequilibrium (P<.001) was detected between 4 of the 5 SNPs (SNPs 3, 9i, 9ii, and 11). This integrated approach of genetic linkage and positional association analyses implicates tau as a susceptibility gene for idiopathic PD.

  16. A small MRI contrast agent library of gadolinium(III)-encapsulated supramolecular nanoparticles for improved relaxivity and sensitivity

    PubMed Central

    Chen, Kuan-Ju; Wolahan, Stephanie M.; Wang, Hao; Hsu, Chao-Hsiung; Chang, Hsing-Wei; Durazo, Armando; Hwang, Lian-Pin; Garcia, Mitch A.; Jiang, Ziyue Karen; Wu, Lily

    2010-01-01

    We introduce a new category of nanoparticle-based T1 MRI contrast agents (CAs) by encapsulating paramagnetic chelated gadolinium(III), i.e., Gd3+·DOTA, through supramolecular assembly of molecular building blocks that carry complementary molecular recognition motifs, including adamantane (Ad) and β-cyclodextrin (CD). A small library of Gd3+·DOTA-encapsulated supramolecular nanoparticles (Gd3+·DOTA⊂SNPs) was produced by systematically altering the molecular building block mixing ratios. A broad spectrum of relaxation rates was correlated to the resulting Gd3+·DOTA⊂SNP library. Consequently, an optimal synthetic formulation of Gd3+·DOTA⊂SNPs with an r1 of 17.3 s−1mM−1 (ca. 4-fold higher than clinical Gd3+ chelated complexes at high field strengths) was identified. T1-weighted imaging of Gd3+·DOTA⊂SNPs exhibits an enhanced sensitivity with a contrast-to-noise ratio (C/N ratio) ca. 3.6 times greater than that observed for free Gd3+·DTPA. A Gd3+·DOTA⊂SNPs solution was injected into foot pads of mice, and MRI was employed to monitor dynamic lymphatic drainage of the Gd3+·DOTA⊂SNPs-based CA. We observe an increase in signal intensity of the brachial lymph node in T1-weighted imaging after injecting Gd3+·DOTA⊂SNPs but not after injecting Gd3+·DTPA. The MRI results are supported by ICP-MS analysis ex vivo. These results show that Gd3+·DOTA⊂SNPs not only exhibits enhanced relaxivity and high sensitivity but also can serve as a potential tool for diagnosis of cancer metastasis. PMID:21167594

  17. Prediction of Disease Causing Non-Synonymous SNPs by the Artificial Neural Network Predictor NetDiseaseSNP

    PubMed Central

    Johansen, Morten Bo; Izarzugaza, Jose M. G.; Brunak, Søren; Petersen, Thomas Nordahl; Gupta, Ramneek

    2013-01-01

    We have developed a sequence conservation-based artificial neural network predictor called NetDiseaseSNP which classifies nsSNPs as disease-causing or neutral. Our method uses the excellent alignment generation algorithm of SIFT to identify related sequences and a combination of 31 features assessing sequence conservation and the predicted surface accessibility to produce a single score which can be used to rank nsSNPs based on their potential to cause disease. NetDiseaseSNP classifies successfully disease-causing and neutral mutations. In addition, we show that NetDiseaseSNP discriminates cancer driver and passenger mutations satisfactorily. Our method outperforms other state-of-the-art methods on several disease/neutral datasets as well as on cancer driver/passenger mutation datasets and can thus be used to pinpoint and prioritize plausible disease candidates among nsSNPs for further investigation. NetDiseaseSNP is publicly available as an online tool as well as a web service: http://www.cbs.dtu.dk/services/NetDiseaseSNP PMID:23935863

  18. Exploring the deleterious SNPs in XRCC4 gene using computational approach and studying their association with breast cancer in the population of West India.

    PubMed

    Singh, Preety K; Mistry, Kinnari N; Chiramana, Haritha; Rank, Dharamshi N; Joshi, Chaitanya G

    2018-05-20

    Non-homologous end joining (NHEJ) pathway has pivotal role in repair of double-strand DNA breaks that may lead to carcinogenesis. XRCC4 is one of the essential proteins of this pathway and single-nucleotide polymorphisms (SNPs) of this gene are reported to be associated with cancer risks. In our study, we first used computational approaches to predict the damaging variants of XRCC4 gene. Tools predicted rs79561451 (S110P) nsSNP as the most deleterious SNP. Along with this SNP, we analysed other two SNPs (rs3734091 and rs6869366) to study their association with breast cancer in population of West India. Variant rs3734091 was found to be significantly associated with breast cancer while rs6869366 variant did not show any association. These SNPs may influence the susceptibility of individuals to breast cancer in this population. Copyright © 2018 Elsevier B.V. All rights reserved.

  19. Mutations that Cause Human Disease: A Computational/Experimental Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Beernink, P; Barsky, D; Pesavento, B

    International genome sequencing projects have produced billions of nucleotides (letters) of DNA sequence data, including the complete genome sequences of 74 organisms. These genome sequences have created many new scientific opportunities, including the ability to identify sequence variations among individuals within a species. These genetic differences, which are known as single nucleotide polymorphisms (SNPs), are particularly important in understanding the genetic basis for disease susceptibility. Since the report of the complete human genome sequence, over two million human SNPs have been identified, including a large-scale comparison of an entire chromosome from twenty individuals. Of the protein coding SNPs (cSNPs), approximately half leads to a single amino acid change in the encoded protein (non-synonymous coding SNPs). Most of these changes are functionally silent, while the remainder negatively impact the protein and sometimes cause human disease. To date, over 550 SNPs have been found to cause single locus (monogenic) diseases and many others have been associated with polygenic diseases. SNPs have been linked to specific human diseases, including late-onset Parkinson disease, autism, rheumatoid arthritis and cancer. The ability to predict accurately the effects of these SNPs on protein function would represent a major advance toward understanding these diseases. To date several attempts have been made toward predicting the effects of such mutations. The most successful of these is a computational approach called "Sorting Intolerant From Tolerant" (SIFT). This method uses sequence conservation among many similar proteins to predict which residues in a protein are functionally important. However, this method suffers from several limitations. First, a query sequence must have a sufficient number of relatives to infer sequence conservation. Second, this method does not make use of or provide any information on protein structure, which can be used to understand how an amino acid change affects the protein. The experimental methods that provide the most detailed structural information on proteins are X-ray crystallography and NMR spectroscopy. However, these methods are labor intensive and currently cannot be carried out on a genomic scale. Nonetheless, Structural Genomics projects are being pursued by more than a dozen groups and consortia worldwide and as a result the number of experimentally determined structures is rising exponentially. Based on the expectation that protein structures will continue to be determined at an ever-increasing rate, reliable structure prediction schemes will become increasingly valuable, leading to information on protein function and disease for many different proteins. Given known genetic variability and experimentally determined protein structures, can we accurately predict the effects of single amino acid substitutions? An objective assessment of this question would involve comparing predicted and experimentally determined structures, which thus far has not been rigorously performed. The completed research leveraged existing expertise at LLNL in computational and structural biology, as well as significant computing resources, to address this question.

  20. Incorporating Single-nucleotide Polymorphisms Into the Lyman Model to Improve Prediction of Radiation Pneumonitis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tucker, Susan L.; Li Minghuan; Xu Ting

    2013-01-01

    Purpose: To determine whether single-nucleotide polymorphisms (SNPs) in genes associated with DNA repair, cell cycle, transforming growth factor-β, tumor necrosis factor and receptor, folic acid metabolism, and angiogenesis can significantly improve the fit of the Lyman-Kutcher-Burman (LKB) normal-tissue complication probability (NTCP) model of radiation pneumonitis (RP) risk among patients with non-small cell lung cancer (NSCLC). Methods and Materials: Sixteen SNPs from 10 different genes (XRCC1, XRCC3, APEX1, MDM2, TGFβ, TNFα, TNFR, MTHFR, MTRR, and VEGF) were genotyped in 141 NSCLC patients treated with definitive radiation therapy, with or without chemotherapy. The LKB model was used to estimate the risk of severe (grade ≥3) RP as a function of mean lung dose (MLD), with SNPs and patient smoking status incorporated into the model as dose-modifying factors. Multivariate analyses were performed by adding significant factors to the MLD model in a forward stepwise procedure, with significance assessed using the likelihood-ratio test. Bootstrap analyses were used to assess the reproducibility of results under variations in the data. Results: Five SNPs were selected for inclusion in the multivariate NTCP model based on MLD alone. SNPs associated with an increased risk of severe RP were in genes for TGFβ, VEGF, TNFα, XRCC1 and APEX1. With smoking status included in the multivariate model, the SNPs significantly associated with increased risk of RP were in genes for TGFβ, VEGF, and XRCC3. Bootstrap analyses selected a median of 4 SNPs per model fit, with the 6 genes listed above selected most often. Conclusions: This study provides evidence that SNPs can significantly improve the predictive ability of the Lyman MLD model. With a small number of SNPs, it was possible to distinguish cohorts with >50% risk vs <10% risk of RP when they were exposed to high MLDs.
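
    To make the idea of a dose-modifying factor concrete, here is a minimal sketch of a probit-form Lyman model driven by mean lung dose. It assumes, as one common parameterization, that each risk factor multiplies the dose by its factor; the abstract does not give the authors' exact parameterization, and the TD50, m, and factor values below are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lkb_ntcp(mean_lung_dose, td50, m, dose_modifying_factors=()):
    """NTCP from the probit Lyman model using mean lung dose (Gy).

    td50 is the dose giving 50% complication probability and m sets the slope.
    Each dose-modifying factor present for the patient scales the dose
    (an assumed parameterization, for illustration only).
    """
    effective_dose = mean_lung_dose
    for factor in dose_modifying_factors:
        effective_dose *= factor
    t = (effective_dose - td50) / (m * td50)
    return normal_cdf(t)


# Hypothetical parameters: TD50 = 30 Gy, m = 0.4, one risk genotype with factor 1.5.
baseline_risk = lkb_ntcp(20.0, td50=30.0, m=0.4)
carrier_risk = lkb_ntcp(20.0, td50=30.0, m=0.4, dose_modifying_factors=[1.5])
```

    A factor above 1 raises the predicted risk at a given MLD and a factor below 1 lowers it, which is how a genotype or smoking status can shift a patient between risk cohorts.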

  1. FTO Polymorphisms Moderate the Association of Food Reinforcement with Energy Intake

    PubMed Central

    Scheid, Jennifer L.; Carr, Katelyn A.; Lin, Henry; Fletcher, Kelly D.; Sucheston, Lara; Singh, Prashant K.; Salis, Robbert; Erbe, Richard; Faith, Myles S.; Allison, David B.; Epstein, Leonard H.

    2015-01-01

    Food reinforcement (RRVfood) is related to increased energy intake, cross-sectionally related to obesity, and prospectively related to weight gain. The fat mass and obesity-associated (FTO) gene is related to elevated body mass index and increased energy intake. The primary purpose of the current study was to determine whether any of 68 FTO single nucleotide polymorphisms (SNPs) or a FTO risk score moderate the association between food reinforcement and energy or macronutrient intake. Energy and macronutrient intake was measured using a laboratory ad libitum snack food consumption task in 237 adults of varying BMI. Controlling for BMI, the relative reinforcing value of reading (RRVreading) and proportion of African ancestry, RRVfood predicted 14.2% of the variance in energy intake, as well as predicted carbohydrate, fat, protein and sugar intake. In individual analyses, six FTO SNPs (rs12921970, rs9936768, rs12446047, rs7199716, rs8049933 and rs11076022, spanning approximately 251K bp) moderated the relationship between RRVfood and energy intake to predict an additional 4.9 - 7.4% of variance in energy intake. We created an FTO risk score based on 5 FTO SNPs (rs9939609, rs8050136, rs3751812, rs1421085, and rs1121980) that are related to BMI in multiple studies. The FTO risk score did not increase variance accounted for beyond individual FTO SNPs. Rs12921970 and rs12446047 served as moderators of the relationship between RRVfood and carbohydrate, fat, protein, and sugar intake. This study shows for the first time that the relationship between RRVfood and energy intake is moderated by FTO SNPs. Research is needed to understand how these processes interact to predict energy and macronutrient intake. PMID:24768648

  2. Prediction of functionally significant single nucleotide polymorphisms in PTEN tumor suppressor gene: An in silico approach.

    PubMed

    Khan, Imran; Ansari, Irfan A; Singh, Pratichi; Dass J, Febin Prabhu

    2017-09-01

    The phosphatase and tensin homolog (PTEN) gene plays a crucial role in signal transduction by negatively regulating the PI3K signaling pathway. It is the most frequently mutated gene in many human cancers. Considering its critical role, a functional analysis of missense mutations of the PTEN gene was undertaken in this study. Thirty-five nonsynonymous single nucleotide polymorphisms (nsSNPs) within the coding region of the PTEN gene were selected for our in silico investigation, and five nsSNPs (G129E, C124R, D252G, H61D, and R130G) were found to be deleterious based on combinatorial predictions of different computational tools. Moreover, molecular dynamics (MD) simulation was performed to investigate the conformational variation between the native protein and all five mutant PTEN proteins carrying predicted deleterious nsSNPs. The MD simulations of all mutant models showed variation in structural attributes such as root-mean-square deviation, root-mean-square fluctuation, radius of gyration, and total energy, which reflect the structural stability of the PTEN protein. Furthermore, the mutant PTEN protein structures also showed a significant variation in solvent accessible surface area and hydrogen bond frequencies relative to the native PTEN structure. In conclusion, the results of this study have established the deleterious effect of all five predicted nsSNPs on the PTEN protein structure. Thus, the results of the current study can provide a platform for shortlisting nsSNPs whose phenotype and correlation with disease status can then be confirmed in case-control studies. © 2016 International Union of Biochemistry and Molecular Biology, Inc.

  3. A function accounting for training set size and marker density to model the average accuracy of genomic prediction.

    PubMed

    Erbe, Malena; Gredler, Birgit; Seefried, Franz Reinhold; Bapst, Beat; Simianer, Henner

    2013-01-01

    Prediction of genomic breeding values is of major practical relevance in dairy cattle breeding. Deterministic equations have been suggested to predict the accuracy of genomic breeding values in a given design which are based on training set size, reliability of phenotypes, and the number of independent chromosome segments ([Formula: see text]). The aim of our study was to find a general deterministic equation for the average accuracy of genomic breeding values that also accounts for marker density and can be fitted empirically. Two data sets of 5'698 Holstein Friesian bulls genotyped with 50 K SNPs and 1'332 Brown Swiss bulls genotyped with 50 K SNPs and imputed to ∼600 K SNPs were available. Different k-fold (k = 2-10, 15, 20) cross-validation scenarios (50 replicates, random assignment) were performed using a genomic BLUP approach. A maximum likelihood approach was used to estimate the parameters of different prediction equations. The highest likelihood was obtained when using a modified form of the deterministic equation of Daetwyler et al. (2010), augmented by a weighting factor (w) based on the assumption that the maximum achievable accuracy is [Formula: see text]. The proportion of genetic variance captured by the complete SNP sets ([Formula: see text]) was 0.76 to 0.82 for Holstein Friesian and 0.72 to 0.75 for Brown Swiss. When modifying the number of SNPs, w was found to be proportional to the log of the marker density up to a limit which is population and trait specific and was found to be reached with ∼20'000 SNPs in the Brown Swiss population studied.
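
    The abstract's formulas were lost in extraction, so the exact equation fitted by the authors is not shown here. As a hedged illustration only, the sketch below uses the widely cited Daetwyler-type form, accuracy = sqrt(N r² / (N r² + Me)), and treats the weighting factor w as a simple multiplier; both the multiplicative placement of w and the numeric values of Me and w are assumptions.

```python
import math


def expected_accuracy(n_train, reliability, m_e, w=1.0):
    """Deterministic expected accuracy of genomic prediction.

    n_train     -- number of animals in the training set
    reliability -- reliability of the phenotypes (between 0 and 1)
    m_e         -- number of independent chromosome segments
    w           -- weighting factor capping the achievable accuracy (assumed multiplicative)
    """
    signal = n_train * reliability
    return w * math.sqrt(signal / (signal + m_e))


# Training set size taken from the abstract (5,698 Holstein Friesian bulls);
# reliability, m_e and w are hypothetical values for illustration.
acc_full = expected_accuracy(5698, reliability=0.9, m_e=1000, w=0.9)
acc_half = expected_accuracy(2849, reliability=0.9, m_e=1000, w=0.9)
```

    With w below 1 the curve saturates at w rather than at 1 as the training set grows, which matches the abstract's description of a maximum achievable accuracy.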

  4. FTO polymorphisms moderate the association of food reinforcement with energy intake.

    PubMed

    Scheid, Jennifer L; Carr, Katelyn A; Lin, Henry; Fletcher, Kelly D; Sucheston, Lara; Singh, Prashant K; Salis, Robbert; Erbe, Richard W; Faith, Myles S; Allison, David B; Epstein, Leonard H

    2014-06-10

    Food reinforcement (RRVfood) is related to increased energy intake, cross-sectionally related to obesity, and prospectively related to weight gain. The fat mass and obesity-associated (FTO) gene is related to elevated body mass index and increased energy intake. The primary purpose of the current study was to determine whether any of 68 FTO single nucleotide polymorphisms (SNPs) or a FTO risk score moderate the association between food reinforcement and energy or macronutrient intake. Energy and macronutrient intake was measured using a laboratory ad libitum snack food consumption task in 237 adults of varying BMI. Controlling for BMI, the relative reinforcing value of reading (RRVreading) and proportion of African ancestry, RRVfood predicted 14.2% of the variance in energy intake, as well as predicted carbohydrate, fat, protein and sugar intake. In individual analyses, six FTO SNPs (rs12921970, rs9936768, rs12446047, rs7199716, rs8049933 and rs11076022, spanning approximately 251kbp) moderated the relationship between RRVfood and energy intake to predict an additional 4.9-7.4% of variance in energy intake. We created an FTO risk score based on 5 FTO SNPs (rs9939609, rs8050136, rs3751812, rs1421085, and rs1121980) that are related to BMI in multiple studies. The FTO risk score did not increase variance accounted for beyond individual FTO SNPs. rs12921970 and rs12446047 served as moderators of the relationship between RRVfood and carbohydrate, fat, protein, and sugar intake. This study shows for the first time that the relationship between RRVfood and energy intake is moderated by FTO SNPs. Research is needed to understand how these processes interact to predict energy and macronutrient intake. Copyright © 2014 Elsevier Inc. All rights reserved.

  5. Resampling procedures to identify important SNPs using a consensus approach.

    PubMed

    Pardy, Christopher; Motyer, Allan; Wilson, Susan

    2011-11-29

    Our goal is to identify common single-nucleotide polymorphisms (SNPs) (minor allele frequency > 1%) that add predictive accuracy above that gained by knowledge of easily measured clinical variables. We take an algorithmic approach to predict each phenotypic variable using a combination of phenotypic and genotypic predictors. We perform our procedure on the first simulated replicate and then validate against the others. Our procedure performs well when predicting Q1 but is less successful for the other outcomes. We use resampling procedures where possible to guard against false positives and to improve generalizability. The approach is based on finding a consensus regarding important SNPs by applying random forests and the least absolute shrinkage and selection operator (LASSO) on multiple subsamples. Random forests are used first to discard unimportant predictors, narrowing our focus to roughly 100 important SNPs. A cross-validation LASSO is then used to further select variables. We combine these procedures to guarantee that cross-validation can be used to choose a shrinkage parameter for the LASSO. If the clinical variables were unavailable, this prefiltering step would be essential. We perform the SNP-based analyses simultaneously rather than one at a time to estimate SNP effects in the presence of other causal variants. We analyzed the first simulated replicate of Genetic Analysis Workshop 17 without knowledge of the true model. Post-conference knowledge of the simulation parameters allowed us to investigate the limitations of our approach. We found that many of the false positives we identified were substantially correlated with genuine causal SNPs.

  6. Japan PGx Data Science Consortium Database: SNPs and HLA genotype data from 2994 Japanese healthy individuals for pharmacogenomics studies.

    PubMed

    Kamitsuji, Shigeo; Matsuda, Takashi; Nishimura, Koichi; Endo, Seiko; Wada, Chisa; Watanabe, Kenji; Hasegawa, Koichi; Hishigaki, Haretsugu; Masuda, Masatoshi; Kuwahara, Yusuke; Tsuritani, Katsuki; Sugiura, Kenkichi; Kubota, Tomoko; Miyoshi, Shinji; Okada, Kinya; Nakazono, Kazuyuki; Sugaya, Yuki; Yang, Woosung; Sawamoto, Taiji; Uchida, Wataru; Shinagawa, Akira; Fujiwara, Tsutomu; Yamada, Hisaharu; Suematsu, Koji; Tsutsui, Naohisa; Kamatani, Naoyuki; Liou, Shyh-Yuh

    2015-06-01

    Japan Pharmacogenomics Data Science Consortium (JPDSC) has assembled a database for conducting pharmacogenomics (PGx) studies in Japanese subjects. The database contains the genotypes of 2.5 million single-nucleotide polymorphisms (SNPs) and 5 human leukocyte antigen loci from 2994 Japanese healthy volunteers, as well as 121 kinds of clinical information, including self-reports, physiological data, hematological data and biochemical data. In this article, the reliability of our data was evaluated by principal component analysis (PCA) and association analysis for hematological and biochemical traits by using genome-wide SNP data. PCA of the SNPs showed that all the samples were collected from the Japanese population and that the samples were separated into two major clusters by birthplace, Okinawa and other than Okinawa, as had been previously reported. Among 87 SNPs that have been reported to be associated with 18 hematological and biochemical traits in genome-wide association studies (GWAS), the associations of 56 SNPs were replicated using our data base. Statistical power simulations showed that the sample size of the JPDSC control database is large enough to detect genetic markers having a relatively strong association even when the case sample size is small. The JPDSC database will be useful as control data for conducting PGx studies to explore genetic markers to improve the safety and efficacy of drugs either during clinical development or in post-marketing.

  7. Inferring Alcoholism SNPs and Regulatory Chemical Compounds Based on Ensemble Bayesian Network.

    PubMed

    Chen, Huan; Sun, Jiatong; Jiang, Hong; Wang, Xianyue; Wu, Lingxiang; Wu, Wei; Wang, Qh

    2017-01-01

    Disturbance of consciousness is one of the most common symptoms in people with alcoholism and may cause disability and mortality. Previous studies indicated that several single nucleotide polymorphisms (SNPs) increase susceptibility to alcoholism. In this study, we used the Ensemble Bayesian Network (EBN) method to identify causal SNPs of alcoholism based on the verified Genetic Analysis Workshop 14 (GAW14) data. We built a Bayesian network combining a random process and greedy search on the GAW14 dataset to establish an EBN of SNPs. We then predicted the association between SNPs and alcoholism by determining Bayes' prior probability. Thirteen of the eighteen SNPs directly connected with alcoholism were found to be concordant with potential risk regions of alcoholism in the OMIM database. As many SNPs were found to contribute to alterations in gene expression, known as expression quantitative trait loci (eQTLs), we further sought to identify chemical compounds acting as regulators of alcoholism genes captured by causal SNPs. Chloroprene and valproic acid were identified as the expression regulators for genes C11orf66 and SALL3, respectively, which were captured by alcoholism SNPs. Copyright © Bentham Science Publishers.

  8. Impact of SNPs on Protein Phosphorylation Status in Rice (Oryza sativa L.).

    PubMed

    Lin, Shoukai; Chen, Lijuan; Tao, Huan; Huang, Jian; Xu, Chaoqun; Li, Lin; Ma, Shiwei; Tian, Tian; Liu, Wei; Xue, Lichun; Ai, Yufang; He, Huaqin

    2016-11-11

    Single nucleotide polymorphisms (SNPs) are widely used in functional genomics and genetics research. The high-quality sequence of the rice genome has provided a genome-wide SNP and proteome resource. However, the impact of SNPs on protein phosphorylation status in rice is not fully understood. In this paper, we first updated the rice SNP resource based on the new rice genome Ver. 7.0, then systematically analyzed the potential impact of non-synonymous SNPs (nsSNPs) on protein phosphorylation status. There were 3,897,312 SNPs in the Ver. 7.0 rice genome, of which 9.9% were nsSNPs. In addition, a total of 2,508,261 phosphorylated sites were predicted in the rice proteome. Interestingly, we observed that 150,197 (39.1%) nsSNPs could influence protein phosphorylation status, among which 52.2% might induce changes of protein kinase (PK) types for adjacent phosphorylation sites. We constructed a database, SNP_rice, to deposit the updated rice SNP resource and phosSNPs information. It is freely available to academic researchers at http://bioinformatics.fafu.edu.cn. As a case study, we detected five nsSNPs that potentially influence the phosphorylation status of heterotrimeric G proteins in rice, indicating that genetic polymorphisms affect signal transduction by influencing the phosphorylation status of heterotrimeric G proteins. The results of this work could be a useful resource for future experimental identification and provide useful information for rice breeding.

  9. Replication and predictive value of SNPs associated with melanoma and pigmentation traits in a Southern European case-control study.

    PubMed

    Stefanaki, Irene; Panagiotou, Orestis A; Kodela, Elisavet; Gogas, Helen; Kypreou, Katerina P; Chatzinasiou, Foteini; Nikolaou, Vasiliki; Plaka, Michaela; Kalfa, Iro; Antoniou, Christina; Ioannidis, John P A; Evangelou, Evangelos; Stratigos, Alexander J

    2013-01-01

    Genetic association studies have revealed numerous polymorphisms conferring susceptibility to melanoma. We aimed to replicate previously discovered melanoma-associated single-nucleotide polymorphisms (SNPs) in a Greek case-control population, and examine their predictive value. Based on a field synopsis of genetic variants of melanoma (MelGene), we genotyped 284 patients and 284 controls at 34 melanoma-associated SNPs of which 19 derived from GWAS. We tested each one of the 33 SNPs passing quality control for association with melanoma both with and without accounting for the presence of well-established phenotypic risk factors. We compared the risk allele frequencies between the Greek population and the HapMap CEU sample. Finally, we evaluated the predictive ability of the replicated SNPs. Risk allele frequencies were significantly lower compared to the HapMap CEU for eight SNPs (rs16891982--SLC45A2, rs12203592--IRF4, rs258322--CDK10, rs1805007--MC1R, rs1805008--MC1R, rs910873--PIGU, rs17305573--PIGU, and rs1885120--MTAP) and higher for one SNP (rs6001027--PLA2G6) indicating a different profile of genetic susceptibility in the studied population. Previously identified effect estimates modestly correlated with those found in our population (r = 0.72, P<0.0001). The strongest associations were observed for rs401681-T in CLPTM1L (odds ratio [OR] 1.60, 95% CI 1.22-2.10; P = 0.001), rs16891982-C in SLC45A2 (OR 0.51, 95% CI 0.34-0.76; P = 0.001), and rs1805007-T in MC1R (OR 4.38, 95% CI 2.03-9.43; P = 2×10⁻⁵). Nominally statistically significant associations were seen also for another 5 variants (rs258322-T in CDK10, rs1805005-T in MC1R, rs1885120-C in MYH7B, rs2218220-T in MTAP and rs4911442-G in the ASIP region). The addition of all SNPs with nominal significance to a clinical non-genetic model did not substantially improve melanoma risk prediction (AUC for clinical model 83.3% versus 83.9%, p = 0.66). Overall, our study has validated genetic variants that are likely to contribute to melanoma susceptibility in the Greek population.

  10. Genomics and introgression: discovery and mapping of thousands of species-diagnostic SNPs using RAD sequencing

    USGS Publications Warehouse

    Hand, Brian K.; Hether, Tyler D; Kovach, Ryan P.; Muhlfeld, Clint C.; Amish, Stephen J.; Boyer, Matthew C.; O’Rourke, Sean M.; Miller, Michael R.; Lowe, Winsor H.; Hohenlohe, Paul A.; Luikart, Gordon

    2015-01-01

    Invasive hybridization and introgression pose a serious threat to the persistence of many native species. Understanding the effects of hybridization on native populations (e.g., fitness consequences) requires numerous species-diagnostic loci distributed genome-wide. Here we used RAD sequencing to discover thousands of single-nucleotide polymorphisms (SNPs) that are diagnostic between rainbow trout (RBT, Oncorhynchus mykiss), the world’s most widely introduced fish, and native westslope cutthroat trout (WCT, O. clarkii lewisi) in the northern Rocky Mountains, USA. We advanced previous work that identified 4,914 species-diagnostic loci by using longer sequence reads (100 bp vs. 60 bp) and a larger set of individuals (n = 84). We sequenced RAD libraries for individuals from diverse sampling sources, including native populations of WCT and hatchery broodstocks of WCT and RBT. We also took advantage of a newly released reference genome assembly for RBT to align our RAD loci. In total, we discovered 16,788 putatively diagnostic SNPs, 10,267 of which we mapped to anchored chromosome locations on the RBT genome. A small portion of previously discovered putative diagnostic loci (325 of 4,914) were no longer diagnostic (i.e., fixed between species) based on our wider survey of non-hybridized RBT and WCT individuals. Our study suggests that RAD loci mapped to a draft genome assembly could provide the marker density required to identify genes and chromosomal regions influencing selection in admixed populations of conservation concern and evolutionary interest.
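
    The notion of a species-diagnostic SNP in the record above (a locus fixed for alternate alleles in the two species) can be illustrated with a minimal Python sketch. The function name, the frequency encoding, and the `tol` parameter are assumptions made for illustration; they are not taken from the study, which worked from RAD sequence data rather than precomputed frequencies.

```python
def diagnostic_loci(freq_a, freq_b, tol=0.0):
    """Return indices of loci fixed for alternate alleles in two groups.

    freq_a, freq_b: per-locus frequency of the same reference allele in
    group A and group B (hypothetical inputs for illustration).
    tol: allowed deviation from complete fixation (0.0 = strictly fixed).
    """
    hits = []
    for i, (pa, pb) in enumerate(zip(freq_a, freq_b)):
        a_fixed_ref = pa >= 1 - tol and pb <= tol
        a_fixed_alt = pa <= tol and pb >= 1 - tol
        if a_fixed_ref or a_fixed_alt:
            hits.append(i)
    return hits


# Toy frequencies for four loci in two groups (made-up numbers).
group_a = [1.0, 0.0, 0.95, 1.0]
group_b = [0.0, 1.0, 0.0, 0.9]
print(diagnostic_loci(group_a, group_b))
```

    Sampling more individuals can only reveal additional alleles, so a locus that looks diagnostic in a small sample may stop qualifying in a wider survey, as the abstract reports for 325 of the 4,914 earlier loci.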

  11. Genomics-assisted characterization of a breeding collection of Apios americana, an edible tuberous legume

    PubMed Central

    Belamkar, Vikas; Farmer, Andrew D.; Weeks, Nathan T.; Kalberer, Scott R.; Blackmon, William J.; Cannon, Steven B.

    2016-01-01

    For species with potential as new crops, rapid improvement may be facilitated by new genomic methods. Apios (Apios americana Medik.), once a staple food source of Native American Indians, produces protein-rich tubers, tolerates a wide range of soils, and symbiotically fixes nitrogen. We report the first high-quality de novo transcriptome assembly, an expression atlas, and a set of 58,154 SNP and 39,609 gene expression markers (GEMs) for characterization of a breeding collection. Both SNPs and GEMs identify six genotypic clusters in the collection. Transcripts mapped to the Phaseolus vulgaris genome–another phaseoloid legume with the same chromosome number–provide provisional genetic locations for 46,852 SNPs. Linkage disequilibrium decays within 10 kb (based on the provisional genetic locations), consistent with outcrossing reproduction. SNPs and GEMs identify more than 21 marker-trait associations for at least 11 traits. This study demonstrates a holistic approach for mining plant collections to accelerate crop improvement. PMID:27721469

  12. Effect of P450 Oxidoreductase Polymorphisms on the Metabolic Activities of Ten Cytochrome P450s Varied by Polymorphic CYP Genotypes in Human Liver Microsomes.

    PubMed

    Fang, Yan; Gao, Na; Tian, Xin; Zhou, Jun; Zhang, Hai-Feng; Gao, Jie; He, Xiao-Pei; Wen, Qiang; Jia, Lin-Jing; Jin, Han; Qiao, Hai-Ling

    2018-06-27

    Background/Aims: Little is known about the effect of P450 oxidoreductase (POR) gene polymorphisms on the activities of CYPs with multiple genotypes. We genotyped 102 human livers for 18 known POR single nucleotide polymorphisms (SNPs) with allelic frequencies greater than 1% as well as for 27 known SNPs in 10 CYPs. CYP enzyme activities in microsomes prepared from these livers were determined by measuring probe substrate metabolism by high performance liquid chromatography. We found that the effects of the 18 POR SNPs on 10 CYP activities were CYP genotype-dependent. The POR mutations were significantly associated with decreased overall Km for CYP2B6 and 2E1, and specific genotypes within CYP1A2, 2A6, 2B6, 2C8, 2D6 and 2E1 were identified as being affected by these POR SNPs. Notably, the effect of a specific POR mutation on the activity of a CYP genotype could not be predicted from other CYP genotypes of even the same CYP. When combining one POR SNP with other POR SNPs, a hitherto unrecognized effect of multiple-site POR gene polymorphisms (MSGP) on CYP activity was uncovered, which was not necessarily consistent with the effect of either single POR SNP. The effects of POR SNPs on CYP activities were not only CYP-dependent, but more importantly, CYP genotype-dependent. Moreover, the effect of a POR SNP alone and in combination with other POR SNPs (MSGP) was not always consistent, nor predictable. Understanding the impact of POR gene polymorphisms on drug metabolism necessitates knowing the complete SNP complement of POR and the genotype of the relevant CYPs. © 2018 The Author(s). Published by S. Karger AG, Basel.

  13. Translating natural genetic variation to gene expression in a computational model of the Drosophila gap gene regulatory network

    PubMed Central

    Kozlov, Konstantin N.; Kulakovskiy, Ivan V.; Zubair, Asif; Marjoram, Paul; Lawrie, David S.; Nuzhdin, Sergey V.; Samsonova, Maria G.

    2017-01-01

    Annotating the genotype-phenotype relationship, and developing a proper quantitative description of the relationship, requires understanding the impact of natural genomic variation on gene expression. We apply a sequence-level model of gap gene expression in the early development of Drosophila to analyze single nucleotide polymorphisms (SNPs) in a panel of natural sequenced D. melanogaster lines. Using a thermodynamic modeling framework, we provide both analytical and computational descriptions of how single-nucleotide variants affect gene expression. The analysis reveals that the sequence variants increase (decrease) gene expression if located within binding sites of repressors (activators). We show that the sign of SNP influence (activation or repression) may change in time and space and elucidate the origin of this change in specific examples. The thermodynamic modeling approach predicts non-local and non-linear effects arising from SNPs, and combinations of SNPs, in individual fly genotypes. Simulation of individual fly genotypes using our model reveals that this non-linearity reduces to almost additive inputs from multiple SNPs. Further, we see signatures of the action of purifying selection in the gap gene regulatory regions. To infer the specific targets of purifying selection, we analyze the patterns of polymorphism in the data at two phenotypic levels: the strengths of binding and expression. We find that combinations of SNPs show evidence of being under selective pressure, while individual SNPs do not. The model predicts that SNPs appear to accumulate in the genotypes of the natural population in a way biased towards small increases in activating action on the expression pattern. Taken together, these results provide a systems-level view of how genetic variation translates to the level of gene regulatory networks via combinatorial SNP effects. PMID:28898266

  14. Genetic variants in VEGF pathway genes in neoadjuvant breast cancer patients receiving bevacizumab: Results from the randomized phase III GeparQuinto study.

    PubMed

    Hein, Alexander; Lambrechts, Diether; von Minckwitz, Gunter; Häberle, Lothar; Eidtmann, Holger; Tesch, Hans; Untch, Michael; Hilfrich, Jörn; Schem, Christian; Rezai, Mahdi; Gerber, Bernd; Dan Costa, Serban; Blohmer, Jens-Uwe; Schwedler, Kathrin; Kittel, Kornelia; Fehm, Tanja; Kunz, Georg; Beckmann, Matthias W; Ekici, Arif B; Hanusch, Claus; Huober, Jens; Liedtke, Cornelia; Mau, Christine; Moisse, Matthieu; Müller, Volkmar; Nekljudova, Valentina; Peuteman, Gilian; Rack, Brigitte; Rübner, Matthias; Van Brussel, Thomas; Wang, Liewei; Weinshilboum, Richard M; Loibl, Sibylle; Fasching, Peter A

    2015-12-15

    Studies assessing the effect of bevacizumab (BEV) on breast cancer (BC) outcome have shown different effects on progression-free and overall survival, suggesting that a subgroup of patients may benefit from this treatment. Unfortunately, no biomarkers exist to identify these patients. Here, we investigate whether single nucleotide polymorphisms (SNPs) in VEGF pathway genes correlate with pathological complete response (pCR) in the neoadjuvant GeparQuinto trial. HER2-negative patients were randomized into treatment arms receiving either BEV combined with standard chemotherapy or chemotherapy alone. In a pre-planned biomarker study, DNA was collected from 729 and 724 patients, respectively from both treatment arms, and genotyped for 125 SNPs. Logistic regression assessed interaction between individual SNPs and both treatment arms to predict pCR. Five SNPs may be associated with a better response to BEV, but none of them remained significant after correction for multiple testing. The two SNPs most strongly associated, rs833058 and rs699947, were located upstream of the VEGF-A promoter. Odds ratios for the homozygous common, heterozygous and homozygous rare rs833058 genotypes were 2.36 (95% CI, 1.49-3.75), 1.20 (95% CI, 0.88-1.64) and 0.61 (95% CI, 0.34-1.12). Notably, some SNPs in VEGF-A exhibited a more pronounced effect in the triple-negative subgroup. Several SNPs in VEGF-A may be associated with improved pCR when receiving BEV in the neoadjuvant setting. Although none of the observed effects survived correction for multiple testing, our observations are consistent with previous studies on BEV efficacy in BC. Further research is warranted to clarify the predictive value of these markers. © 2015 UICC.

  15. Genetic tests for estimating dairy breed proportion and parentage assignment in East African crossbred cattle.

    PubMed

    Strucken, Eva M; Al-Mamun, Hawlader A; Esquivelzeta-Rabell, Cecilia; Gondro, Cedric; Mwai, Okeyo A; Gibson, John P

    2017-09-12

    Smallholder dairy farming in much of the developing world is based on the use of crossbred cows that combine local adaptation traits of indigenous breeds with high milk yield potential of exotic dairy breeds. Pedigree recording is rare in such systems which means that it is impossible to make informed breeding decisions. High-density single nucleotide polymorphism (SNP) assays allow accurate estimation of breed composition and parentage assignment but are too expensive for routine application. Our aim was to determine the level of accuracy achieved with low-density SNP assays. We constructed subsets of 100 to 1500 SNPs from the 735k-SNP Illumina panel by selecting: (a) on high minor allele frequencies (MAF) in a crossbred population; (b) on large differences in allele frequency between ancestral breeds; (c) at random; or (d) with a differential evolution algorithm. These panels were tested on a dataset of 1933 crossbred dairy cattle from Kenya/Uganda and on crossbred populations from Ethiopia (N = 545) and Tanzania (N = 462). Dairy breed proportions were estimated by using the ADMIXTURE program, a regression approach, and SNP-best linear unbiased prediction, and tested against estimates obtained by ADMIXTURE based on the 735k-SNP panel. Performance for parentage assignment was based on opposing homozygotes which were used to calculate the separation value (sv) between true and false assignments. Panels of SNPs based on the largest differences in allele frequency between European dairy breeds and a combined Nelore/N'Dama population gave the best predictions of dairy breed proportion (r² = 0.962 to 0.994 for 100 to 1500 SNPs) with an average absolute bias of 0.026. Panels of SNPs based on the highest MAF in the crossbred population (Kenya/Uganda) gave the most accurate parentage assignments (sv = -1 to 15 for 100 to 1500 SNPs). Due to the different required properties of SNPs, panels that did well for breed composition did poorly for parentage assignment and vice versa. A combined panel of 400 SNPs was not able to assign parentages correctly, thus we recommend the use of 200 SNPs either for breed proportion prediction or parentage assignment, independently.
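
    The opposing-homozygote criterion mentioned in the record above can be sketched in a few lines of Python. The 0/1/2 genotype coding, the use of `None` for missing calls, and the function name are assumptions for illustration. The abstract does not define how the separation value (sv) is derived from these counts, so the sketch stops at the count itself.

```python
def opposing_homozygotes(parent, offspring):
    """Count loci where parent and offspring are opposite homozygotes.

    Genotypes are coded as copies of one allele: 0, 1 or 2 (assumed coding).
    A true parent shares at least one allele with its offspring at every
    locus, so a 0-versus-2 pair is a Mendelian inconsistency. Missing
    genotypes (None) are skipped.
    """
    return sum(
        1
        for p, o in zip(parent, offspring)
        if p is not None and o is not None and {p, o} == {0, 2}
    )


# Toy genotypes at five loci (made-up values).
candidate_parent = [0, 2, 1, 0, 2]
calf = [2, 2, 1, 1, 0]
print(opposing_homozygotes(candidate_parent, calf))  # 2 conflicts: loci 0 and 4
```

    Only loci where both animals are homozygous can produce a conflict, which is consistent with the abstract's finding that panels of high-MAF SNPs worked best for parentage assignment.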

  16. Association of variants in innate immune genes with asthma and eczema

    PubMed Central

    Sharma, Sunita; Poon, Audrey; Himes, Blanca E.; Lasky-Su, Jessica; Sordillo, Joanne E.; Belanger, Kathleen; Milton, Donald K.; Bracken, Michael B.; Triche, Elizabeth W.; Leaderer, Brian P.; Gold, Diane R.; Litonjua, Augusto A.

    2012-01-01

    Background: The innate immune pathway is important in the pathogenesis of asthma and eczema. However, only a few variants in these genes have been associated with either disease. We investigate the association between polymorphisms of genes in the innate immune pathway with childhood asthma and eczema. In addition, we compare individual associations with those discovered using a multivariate approach. Methods: Using a novel method, case control based association testing (C2BAT), 569 single nucleotide polymorphisms (SNPs) in 44 innate immune genes were tested for association with asthma and eczema in children from the Boston Home Allergens and Asthma Study and the Connecticut Childhood Asthma Study. The screening algorithm was used to identify the top SNPs associated with asthma and eczema. We next investigated the interaction of innate immune variants with asthma and eczema risk using Bayesian networks. Results: After correction for multiple comparisons, 7 SNPs in 6 genes (CARD25, TGFB1, LY96, ACAA1, DEFB1, and IFNG) were associated with asthma (adjusted p-value < 0.02), while 5 SNPs in 3 different genes (CD80, STAT4, and IRAKI) were significantly associated with eczema (adjusted p-value < 0.02). None of these SNPs were associated with both asthma and eczema. Bayesian network analysis identified 4 SNPs that were predictive of asthma and 10 SNPs that predicted eczema. Of the genes identified using Bayesian networks, only CD80 was associated with eczema in the single-SNP study. Using novel methodology that allows for screening and replication in the same population, we have identified associations of innate immune genes with asthma and eczema. Bayesian network analysis suggests that additional SNPs influence disease susceptibility via SNP interactions. Conclusion: Our findings suggest that innate immune genes contribute to the pathogenesis of asthma and eczema, and that these diseases likely have different genetic determinants. PMID:22192168

  17. Single nucleotide polymorphisms in candidate genes related to daughter pregnancy rate in Holstein cows

    USDA-ARS's Scientific Manuscript database

    ABSTRACT: Previously, a candidate gene approach identified 40 SNPs associated with daughter pregnancy rate (DPR) in dairy bulls. We evaluated 39 of these SNPs for relationship to DPR in a separate population of Holstein cows grouped on their predicted transmitting ability for DPR: <= -1 (n=1266) a...

  18. Screening and Evaluation of Deleterious SNPs in APOE Gene of Alzheimer's Disease.

    PubMed

    Masoodi, Tariq Ahmad; Al Shammari, Sulaiman A; Al-Muammar, May N; Alhamdan, Adel A

    2012-01-01

    Introduction. Apolipoprotein E (APOE) is an important risk factor for Alzheimer's disease (AD) and is present in 30-50% of patients who develop late-onset AD. Several single-nucleotide polymorphisms (SNPs) are present in the APOE gene, which act as biomarkers for exploring the genetic basis of this disease. The objective of this study is to identify deleterious nsSNPs associated with the APOE gene. Methods. The SNPs were retrieved from dbSNP. Using I-Mutant, protein stability change was calculated. The potentially functional nonsynonymous (ns) SNPs and their effect on protein were predicted by PolyPhen and SIFT, respectively. FASTSNP was used for functional analysis and estimation of risk score. The functional impact on the APOE protein was evaluated by using Swiss PDB viewer and NOMAD-Ref server. Results. Six nsSNPs were found to be least stable by I-Mutant 2.0 with DDG value of >-1.0. Four nsSNPs showed a highly deleterious tolerance index score of 0.00. Nine nsSNPs were found to be probably damaging with position-specific independent counts (PSICs) score of ≥2.0. Seven nsSNPs were found to be highly polymorphic with a risk score of 3-4. The total energies and root-mean-square deviation (RMSD) values were higher for three mutant-type structures compared to the native modeled structure. Conclusion. We concluded that three nsSNPs, namely rs11542041, rs11542040, and rs11542034, are potentially functional polymorphisms.

  19. Microarray-based SNP genotyping to identify genetic risk factors of triple-negative breast cancer (TNBC) in South Indian population.

    PubMed

    Aravind Kumar, M; Singh, Vineeta; Naushad, Shaik Mohammad; Shanker, Uday; Lakshmi Narasu, M

    2018-05-01

    In view of the aggressive nature of triple-negative breast cancer (TNBC), owing to the lack of receptors (ER, PR, HER2) and the high incidence of drug resistance associated with it, a case-control association study was conducted to identify the contributing genetic risk factors for TNBC. A total of 30 TNBC patients and 50 age- and gender-matched controls of Indian origin were screened for 9,00,000 SNP markers using a microarray-based SNP genotyping approach. The initial PLINK association analysis (p < 0.01, MAF 0.14-0.44, OR 10-24) identified 28 non-synonymous SNPs and one stop-gain mutation in the exonic region as possible determinants of TNBC risk. All 29 SNPs were annotated using ANNOVAR. The interactions between these markers were evaluated using multifactor dimensionality reduction (MDR) analysis. The interactions were in the following order: exm408776 > exm1278309 > rs316389 > rs1651654 > rs635538 > exm1292477. Recursive partitioning analysis (RPA) was performed to construct a decision tree useful in predicting TNBC risk. In this analysis, the rs1651654 and exm585172 SNPs were found to be determinants of TNBC risk. An artificial neural network model was used to generate receiver operating characteristic (ROC) curves, which showed high sensitivity and specificity (AUC = 0.94) for these markers. To conclude, among the 9,00,000 SNPs tested, CCDC42 exm1292477, ANXA3 exm408776, and SASH1 exm585172 were found to be the most significant genetic predictors of TNBC. The interactions among the exm408776, exm1278309, rs316389, rs1651654, rs635538, and exm1292477 SNPs inflate the risk for TNBC further. Targeted analysis of these SNPs and genes alone would have similar clinical utility in predicting TNBC.

  20. regSNPs: a strategy for prioritizing regulatory single nucleotide substitutions

    PubMed Central

    Teng, Mingxiang; Ichikawa, Shoji; Padgett, Leah R.; Wang, Yadong; Mort, Matthew; Cooper, David N.; Koller, Daniel L.; Foroud, Tatiana; Edenberg, Howard J.; Econs, Michael J.; Liu, Yunlong

    2012-01-01

    Motivation: One of the fundamental questions in genetic studies is to identify functional DNA variants that are responsible for a disease or phenotype of interest. Results from large-scale genetics studies, such as genome-wide association studies (GWAS), and the availability of high-throughput sequencing technologies provide opportunities for identifying causal variants. Despite the technical advances, informatics methodologies need to be developed to prioritize thousands of variants for potential causative effects. Results: We present regSNPs, an informatics strategy that integrates several established bioinformatics tools, for prioritizing regulatory SNPs, i.e. the SNPs in the promoter regions that potentially affect phenotype through changing transcription of downstream genes. Compared with existing tools, regSNPs has two distinct features. It considers degenerative features of binding motifs by calculating the differences in binding affinity caused by the candidate variants, and it integrates potential phenotypic effects of various transcription factors. When tested using the disease-causing variants documented in the Human Gene Mutation Database, regSNPs showed mixed performance on various diseases. regSNPs predicted three SNPs that can potentially affect bone density in a region detected in an earlier linkage study. Potential effects of one of the variants were validated using a luciferase reporter assay. Contact: yunliu@iupui.edu Supplementary information: Supplementary data are available at Bioinformatics online PMID:22611130

  1. Sequencing of transcriptomes from two Miscanthus species reveals functional specificity in rhizomes, and clarifies evolutionary relationships

    PubMed Central

    2014-01-01

    Background: Miscanthus is a promising biomass crop for temperate regions. Despite the increasing interest in this plant, limited sequence information has constrained research into its biology, physiology, and breeding. The whole genome transcriptomes of M. sinensis and M. sacchariflorus presented in this study may provide good resources to understand functional compositions of two important Miscanthus genomes and their evolutionary relationships. Results: For M. sinensis, a total of 457,891 and 512,950 expressed sequence tags (ESTs) were produced from leaf and rhizome tissues, respectively, which were assembled into 12,166 contigs and 89,648 singletons for leaf, and 13,170 contigs and 112,138 singletons for rhizome. For M. sacchariflorus, a total of 288,806 and 267,952 ESTs from leaf and rhizome tissues, respectively, were assembled into 8,732 contigs and 66,881 singletons for leaf, and 8,104 contigs and 63,212 singletons for rhizome. Based on the distributions of synonymous nucleotide substitution (Ks), sorghum and Miscanthus diverged about 6.2 million years ago (MYA), Saccharum and Miscanthus diverged 4.6 MYA, and M. sinensis and M. sacchariflorus diverged 1.5 MYA. The pairwise alignment of predicted protein sequences from sorghum-Miscanthus and two Miscanthus species found a total of 43,770 and 35,818 nsSNPs, respectively. The impacts of striking mutations found by nsSNPs were much lower between sorghum and Miscanthus than those between the two Miscanthus species, perhaps as a consequence of the much higher level of gene duplication in Miscanthus and resulting ability to buffer essential functions against disturbance. Conclusions: The ESTs generated in the present study represent a significant addition to Miscanthus functional genomics resources, permitting us to discover some candidate genes associated with enhanced biomass production. Ks distributions based on orthologous ESTs may serve as a guideline for future research into the evolution of Miscanthus species as well as its close relatives sorghum and Saccharum. PMID:24884969

  2. Sequencing of transcriptomes from two Miscanthus species reveals functional specificity in rhizomes, and clarifies evolutionary relationships.

    PubMed

    Kim, Changsoo; Lee, Tae-Ho; Guo, Hui; Chung, Sung Jin; Paterson, Andrew H; Kim, Do-Soon; Lee, Geung-Joo

    2014-05-18

    Miscanthus is a promising biomass crop for temperate regions. Despite the increasing interest in this plant, limited sequence information has constrained research into its biology, physiology, and breeding. The whole genome transcriptomes of M. sinensis and M. sacchariflorus presented in this study may provide good resources to understand functional compositions of two important Miscanthus genomes and their evolutionary relationships. For M. sinensis, a total of 457,891 and 512,950 expressed sequence tags (ESTs) were produced from leaf and rhizome tissues, respectively, which were assembled into 12,166 contigs and 89,648 singletons for leaf, and 13,170 contigs and 112,138 singletons for rhizome. For M. sacchariflorus, a total of 288,806 and 267,952 ESTs from leaf and rhizome tissues, respectively, were assembled into 8,732 contigs and 66,881 singletons for leaf, and 8,104 contigs and 63,212 singletons for rhizome. Based on the distributions of synonymous nucleotide substitution (Ks), sorghum and Miscanthus diverged about 6.2 million years ago (MYA), Saccharum and Miscanthus diverged 4.6 MYA, and M. sinensis and M. sacchariflorus diverged 1.5 MYA. The pairwise alignment of predicted protein sequences from sorghum-Miscanthus and two Miscanthus species found a total of 43,770 and 35,818 nsSNPs, respectively. The impacts of striking mutations found by nsSNPs were much lower between sorghum and Miscanthus than those between the two Miscanthus species, perhaps as a consequence of the much higher level of gene duplication in Miscanthus and resulting ability to buffer essential functions against disturbance. The ESTs generated in the present study represent a significant addition to Miscanthus functional genomics resources, permitting us to discover some candidate genes associated with enhanced biomass production. Ks distributions based on orthologous ESTs may serve as a guideline for future research into the evolution of Miscanthus species as well as its close relatives sorghum and Saccharum.

  3. Single-nucleotide polymorphism discovery in Leptographium longiclavatum, a mountain pine beetle-associated symbiotic fungus, using whole-genome resequencing.

    PubMed

    Ojeda, Dario I; Dhillon, Braham; Tsui, Clement K M; Hamelin, Richard C

    2014-03-01

    Single-nucleotide polymorphisms (SNPs) are rapidly becoming the standard markers in population genomics studies; however, their use in nonmodel organisms is limited due to the lack of cost-effective approaches to uncover genome-wide variation, and the large number of individuals needed in the screening process to reduce ascertainment bias. To discover SNPs for population genomics studies in the fungal symbionts of the mountain pine beetle (MPB), we developed a road map to discover SNPs and to produce a genotyping platform. We undertook a whole-genome sequencing approach of Leptographium longiclavatum in combination with available genomics resources of another MPB symbiont, Grosmannia clavigera. We sequenced 71 individuals pooled into four groups using the Illumina sequencing technology. We generated between 27 and 30 million reads of 75 bp that resulted in a total of 1,181 contigs longer than 2 kb and an assembled genome size of 28.9 Mb (N50 = 48 kb, average depth = 125x). A total of 9052 proteins were annotated, and between 9531 and 17,266 SNPs were identified in the four pools. A subset of 206 genes (containing 574 SNPs, 11% false positives) was used to develop a genotyping platform for this species. Using this roadmap, we developed a genotyping assay with a total of 147 SNPs located in 121 genes using the Illumina® Sequenom iPLEX Gold. Our preliminary genotyping (success rate = 85%) of 304 individuals from 36 populations supports the utility of this approach for population genomics studies in other MPB fungal symbionts and other fungal nonmodel species. © 2013 John Wiley & Sons Ltd.
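
    The N50 statistic quoted in the record above is a standard assembly summary: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the assembly size. A minimal Python sketch follows; the function name and the toy contig lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the cumulative length first covers at least half of
    the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")


# Toy assembly of five contigs (made-up lengths, total 30).
print(n50([10, 8, 6, 4, 2]))  # 10 + 8 = 18 covers half of 30, so N50 is 8
```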

  4. Genotype-driven identification of a molecular network predictive of advanced coronary calcium in ClinSeq® and Framingham Heart Study cohorts.

    PubMed

    Oguz, Cihan; Sen, Shurjo K; Davis, Adam R; Fu, Yi-Ping; O'Donnell, Christopher J; Gibbons, Gary H

    2017-10-26

    One goal of personalized medicine is leveraging the emerging tools of data science to guide medical decision-making. Achieving this using disparate data sources is most daunting for polygenic traits. To this end, we employed random forests (RFs) and neural networks (NNs) for predictive modeling of coronary artery calcium (CAC), which is an intermediate endo-phenotype of coronary artery disease (CAD). Model inputs were derived from advanced cases in the ClinSeq® discovery cohort (n=16) and the FHS replication cohort (n=36) from 89th-99th CAC score percentile range, and age-matched controls (ClinSeq® n=16, FHS n=36) with no detectable CAC (all subjects were Caucasian males). These inputs included clinical variables and genotypes of 56 single nucleotide polymorphisms (SNPs) ranked highest in terms of their nominal correlation with the advanced CAC state in the discovery cohort. Predictive performance was assessed by computing the areas under receiver operating characteristic curves (ROC-AUC). RF models trained and tested with clinical variables generated ROC-AUC values of 0.69 and 0.61 in the discovery and replication cohorts, respectively. In contrast, in both cohorts, the set of SNPs derived from the discovery cohort were highly predictive (ROC-AUC ≥0.85) with no significant change in predictive performance upon integration of clinical and genotype variables. Using the 21 SNPs that produced optimal predictive performance in both cohorts, we developed NN models trained with ClinSeq® data and tested with FHS data and obtained high predictive accuracy (ROC-AUC=0.80-0.85) with several topologies. Several CAD and "vascular aging" related biological processes were enriched in the network of genes constructed from the predictive SNPs. We identified a molecular network predictive of advanced coronary calcium using genotype data from ClinSeq® and FHS cohorts. Our results illustrate that machine learning tools, which utilize complex interactions between disease predictors intrinsic to the pathogenesis of polygenic disorders, hold promise for deriving predictive disease models and networks.
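
    The ROC-AUC used as the performance measure in the record above has a simple probabilistic reading: it is the chance that a randomly chosen case receives a higher model score than a randomly chosen control, with ties counted as one half. The Python sketch below computes it by direct pairwise comparison. The function name and the toy scores are assumptions for illustration, and this is not the study's own evaluation code.

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Each (case, control) pair contributes 1 if the case scores higher,
    0.5 on a tie and 0 otherwise; the AUC is the mean contribution.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy model scores (made-up numbers): 7.5 favourable pairs out of 9.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.4]
print(round(roc_auc(cases, controls), 3))
```

    The pairwise form is quadratic in sample size, which is fine for cohorts of a few dozen subjects; a rank-based formulation gives the same value more efficiently for large samples.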

  5. Association of Single-Nucleotide Polymorphisms of the Tau Gene With Late-Onset Parkinson Disease

    PubMed Central

    Martin, Eden R.; Scott, William K.; Nance, Martha A.; Watts, Ray L.; Hubble, Jean P.; Koller, William C.; Lyons, Kelly; Pahwa, Rajesh; Stern, Matthew B.; Colcher, Amy; Hiner, Bradley C.; Jankovic, Joseph; Ondo, William G.; Allen, Fred H.; Goetz, Christopher G.; Small, Gary W.; Masterman, Donna; Mastaglia, Frank; Laing, Nigel G.; Stajich, Jeffrey M.; Ribble, Robert C.; Booze, Michael W.; Rogala, Allison; Hauser, Michael A.; Zhang, Fengyu; Gibson, Rachel A.; Middleton, Lefkos T.; Roses, Allen D.; Haines, Jonathan L.; Scott, Burton L.; Pericak-Vance, Margaret A.; Vance, Jeffery M.

    2013-01-01

    Context: The human tau gene, which promotes assembly of neuronal microtubules, has been associated with several rare neurologic diseases that clinically include parkinsonian features. We recently observed linkage in idiopathic Parkinson disease (PD) to a region on chromosome 17q21 that contains the tau gene. These factors make tau a good candidate for investigation as a susceptibility gene for idiopathic PD, the most common form of the disease. Objective: To investigate whether the tau gene is involved in idiopathic PD. Design, Setting, and Participants: Among a sample of 1056 individuals from 235 families selected from 13 clinical centers in the United States and Australia and from a family ascertainment core center, we tested 5 single-nucleotide polymorphisms (SNPs) within the tau gene for association with PD, using family-based tests of association. Both affected (n = 426) and unaffected (n = 579) family members were included; 51 individuals had unclear PD status. Analyses were conducted to test individual SNPs and SNP haplotypes within the tau gene. Main Outcome Measure: Family-based tests of association, calculated using asymptotic distributions. Results: Analysis of association between the SNPs and PD yielded significant evidence of association for 3 of the 5 SNPs tested: SNP 3, P = .03; SNP 9i, P = .04; and SNP 11, P = .04. The 2 other SNPs did not show evidence of significant association (SNP 9ii, P = .11, and SNP 9iii, P = .87). Strong evidence of association was found with haplotype analysis, with a positive association with one haplotype (P = .009) and a negative association with another haplotype (P = .007). Substantial linkage disequilibrium (P<.001) was detected between 4 of the 5 SNPs (SNPs 3, 9i, 9ii, and 11). Conclusions: This integrated approach of genetic linkage and positional association analyses implicates tau as a susceptibility gene for idiopathic PD. PMID:11710889

  6. Genetically-Predicted Adult Height and Alzheimer's Disease.

    PubMed

    Larsson, Susanna C; Traylor, Matthew; Burgess, Stephen; Markus, Hugh S

    2017-01-01

    Observational studies have linked increased adult height with better cognitive performance and reduced risk of Alzheimer's disease (AD). It is unclear whether the associations are due to shared biological processes that influence height and AD or due to confounding by early life exposures or environmental factors. Our objective was to use a genetic approach to investigate the association between adult height and AD. We selected 682 single nucleotide polymorphisms (SNPs) associated with height at genome-wide significance (p < 5×10^-8) in the Genetic Investigation of ANthropometric Traits (GIANT) consortium. Summary statistics for each of these SNPs on AD were obtained from the International Genomics of Alzheimer's Project (IGAP) of 17,008 individuals with AD and 37,154 controls. The estimate of the association between genetically predicted height and AD was calculated using the inverse-variance weighted method. The odds ratio of AD was 0.91 (95% confidence interval, 0.86-0.95; p = 9.8×10^-5) per one standard deviation increase (about 6.5 cm) in genetically predicted height based on 682 SNPs, which were clustered in 419 loci. In an analysis restricted to one SNP from each height-associated locus (n = 419 SNPs), the corresponding OR was 0.92 (95% confidence interval, 0.86-0.97; p = 4.8×10^-3). This finding suggests that biological processes that influence adult height may have a role in the etiology of AD.
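
    The inverse-variance weighted method named in this abstract can be sketched as follows. This is a minimal fixed-effect version under stated assumptions: the function name ivw_estimate and the toy per-SNP values are hypothetical and are not taken from the study, and the authors' actual implementation may differ (for example, by using random effects).

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-SNP summary statistics.

    beta_exposure: SNP effects on the exposure (e.g. height, in SD units)
    beta_outcome:  SNP effects on the outcome (e.g. log odds of disease)
    se_outcome:    standard errors of the outcome effects
    Returns (pooled effect, standard error) on the log-odds scale.
    """
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)

# Hypothetical toy values for three SNPs (not data from the study).
bx = [0.10, 0.20, 0.15]
by = [-0.010, -0.018, -0.016]
se = [0.005, 0.006, 0.004]

beta, se_ivw = ivw_estimate(bx, by, se)
odds_ratio = math.exp(beta)
ci = (math.exp(beta - 1.96 * se_ivw), math.exp(beta + 1.96 * se_ivw))
print(odds_ratio, ci)
```

    Exponentiating the pooled log-odds effect gives an odds ratio per unit increase in the exposure, which is the form in which the abstract reports its result.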

  7. Evaluation and Integration of Genetic Signature for Prediction Risk of Nasopharyngeal Carcinoma in Southern China

    PubMed Central

    Winkler, Cheryl A.; Li, Ji; Guan, Li; Tang, Minzhong; Liao, Jian; Deng, Hong; de Thé, Guy; Zeng, Yi; O'Brien, Stephen J.

    2014-01-01

    Genetic factors, as well as environmental factors, play a role in development of nasopharyngeal carcinoma (NPC). A number of single nucleotide polymorphisms (SNPs) have been reported to be associated with NPC. To confirm these genetic associations with NPC, two independent case-control studies from Southern China comprising 1166 NPC cases and 2340 controls were conducted. Seven SNPs in ITGA9 at 3p21.3 and 9 SNPs within the 6p21.3 HLA region were genotyped. To explore the potential clinical application of these genetic markers in NPC, we further evaluated the predictive/diagnostic role of significant SNPs by calculating the area under the curve (AUC). The reported associations between ITGA9 variants and NPC were not replicated. Multiple loci of GABBR1, HLA-F, HLA-A, and HCG9 were statistically significant in both cohorts (combined P values ranged from 5.96 × 10^-17 to 0.02). We show for the first time that these factors influence NPC development independent of environmental risk factors. This study also indicated that the SNP alone cannot serve as a predictive/diagnostic marker for NPC. Integrating the most significant SNP with IgA antibodies status to EBV, which is presently used as screening/diagnostic marker for NPC in Chinese populations, did not improve the AUC estimate for diagnosis of NPC. PMID:25180181

  8. Evaluation and integration of genetic signature for prediction risk of nasopharyngeal carcinoma in Southern China.

    PubMed

    Guo, Xiuchan; Winkler, Cheryl A; Li, Ji; Guan, Li; Tang, Minzhong; Liao, Jian; Deng, Hong; de Thé, Guy; Zeng, Yi; O'Brien, Stephen J

    2014-01-01

    Genetic factors, as well as environmental factors, play a role in development of nasopharyngeal carcinoma (NPC). A number of single nucleotide polymorphisms (SNPs) have been reported to be associated with NPC. To confirm these genetic associations with NPC, two independent case-control studies from Southern China comprising 1166 NPC cases and 2340 controls were conducted. Seven SNPs in ITGA9 at 3p21.3 and 9 SNPs within the 6p21.3 HLA region were genotyped. To explore the potential clinical application of these genetic markers in NPC, we further evaluated the predictive/diagnostic role of significant SNPs by calculating the area under the curve (AUC). The reported associations between ITGA9 variants and NPC were not replicated. Multiple loci of GABBR1, HLA-F, HLA-A, and HCG9 were statistically significant in both cohorts (combined P values ranged from 5.96 × 10^-17 to 0.02). We show for the first time that these factors influence NPC development independent of environmental risk factors. This study also indicated that the SNP alone cannot serve as a predictive/diagnostic marker for NPC. Integrating the most significant SNP with IgA antibodies status to EBV, which is presently used as screening/diagnostic marker for NPC in Chinese populations, did not improve the AUC estimate for diagnosis of NPC.
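
    The area under the curve (AUC) used above to judge predictive value can be illustrated with a small sketch. It assumes the common rank-based definition (the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half). The function name auc and the scores are hypothetical; the abstract does not say how the authors computed their estimates.

```python
def auc(case_scores, control_scores):
    """Rank-based AUC: P(case score > control score), ties count 0.5."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical marker scores (not data from the study).
print(auc([0.8, 0.4], [0.6, 0.2]))
```

    An AUC near 0.5 means the marker separates cases from controls no better than chance, which is the sense in which a single SNP "cannot serve as a predictive/diagnostic marker".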

  9. Nucleotide polymorphisms in a pine ortholog of the Arabidopsis degrading enzyme cellulase KORRIGAN are associated with early growth performance in Pinus pinaster.

    PubMed

    Cabezas, José Antonio; González-Martínez, Santiago C; Collada, Carmen; Guevara, María Angeles; Boury, Christophe; de María, Nuria; Eveno, Emmanuelle; Aranda, Ismael; Garnier-Géré, Pauline H; Brach, Jean; Alía, Ricardo; Plomion, Christophe; Cervera, María Teresa

    2015-09-01

    We have carried out a candidate-gene-based association genetic study in Pinus pinaster Aiton and evaluated the predictive performance for genetic merit gain of the most significantly associated genes and single nucleotide polymorphisms (SNPs). We used a second generation 384-SNP array enriched with candidate genes for growth and wood properties to genotype mother trees collected in 20 natural populations covering most of the European distribution of the species. Phenotypic data for total height, polycyclism, root-collar diameter and biomass were obtained from a replicated provenance-progeny trial located in two sites with contrasting environments (Atlantic vs Mediterranean climate). General linear models identified strong associations between growth traits (total height and polycyclism) and four SNPs from the korrigan candidate gene, after multiple testing corrections using false discovery rate. The combined genomic breeding value predictions assessed for the four associated korrigan SNPs by ridge regression-best linear unbiased prediction (RR-BLUP) and cross-validation accounted for up to 8 and 15% of the phenotypic variance for height and polycyclic growth, respectively, and did not improve when SNPs from other growth-related candidate genes were added. For root-collar diameter and total biomass, they accounted for 1.6 and 1.1% of the phenotypic variance, respectively, but increased to 15 and 4.1% when other SNPs from lp3.1, lp3.3 and cad were included in RR-BLUP models. These results point towards a desirable integration of candidate-gene studies as a means to pre-select relevant markers, and aid genomic selection in maritime pine breeding programs.

  10. Relationship of ZNF423 and CTSO with breast cancer risk in two randomised tamoxifen prevention trials.

    PubMed

    Brentnall, Adam R; Cuzick, Jack; Byers, Helen; Segal, Corrinne; Reuter, Caroline; Detre, Simone; Sestak, Ivana; Howell, Anthony; Powles, Trevor J; Newman, William G; Dowsett, Mitchell

    2016-08-01

    A case-control study from two randomised breast cancer prevention trials of tamoxifen and raloxifene (P-1 and P-2) identified single-nucleotide polymorphisms (SNPs) in or near genes ZNF423 and CTSO as factors which predict which women will derive most anti-cancer benefit from selective oestrogen receptor modulator (SERM) therapy. In this article, we further examine this question using blood samples from two randomised tamoxifen prevention trials: the International Breast Cancer Intervention Study I (IBIS-I) and the Royal Marsden trial (Marsden). A nested case-control study was designed with 2:1 matching in IBIS-I and 1:1 matching in Marsden. The OncoArray was used for genotyping and included two SNPs previously identified (rs8060157 in ZNF423 and rs10030044 near CTSO), and 102 further SNPs within the same regions. Overall, there were 369 cases and 662 controls, with 148 cases and 268 controls from the tamoxifen arms. Odds ratios were estimated by conditional logistic regression, with Wald 95 % confidence intervals. In the tamoxifen arms, the per-allele odds ratio for rs8060157 was 0.99 (95 %CI 0.73-1.34) and 1.00 (95 %CI 0.76-1.33) for rs10030044. In the placebo arm, the odds ratio was 1.10 (95 %CI 0.87-1.40) for rs8060157 and 1.01 (95 %CI 0.79-1.29) for rs10030044. There was no evidence to suggest that other SNPs in the surrounding regions of these SNPs might predict response to tamoxifen. Results from these two prevention trials do not support the earlier findings. rs8060157 in ZNF423 and rs10030044 near CTSO do not appear to predict response to tamoxifen.
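
    The Wald 95% confidence intervals mentioned above are symmetric on the log-odds scale, which can be checked against the reported figures. The sketch below assumes only that definition; the helper names wald_ci and recover_from_ci are hypothetical, and the conditional logistic regression itself is not reproduced.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def wald_ci(odds_ratio, se_log_or, z=Z_95):
    """Wald interval: exp(log(OR) -/+ z * SE), with SE on the log-odds scale."""
    log_or = math.log(odds_ratio)
    return math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)

def recover_from_ci(lower, upper, z=Z_95):
    """Back out the point estimate and SE implied by a reported Wald interval."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    point = math.exp((log_lo + log_hi) / 2.0)
    se = (log_hi - log_lo) / (2.0 * z)
    return point, se

# Interval reported for rs8060157 in the tamoxifen arms: 0.73-1.34.
point, se = recover_from_ci(0.73, 1.34)
print(round(point, 2), round(se, 3))
```

    The midpoint recovered from the reported interval rounds to the reported odds ratio of 0.99, as expected for an interval built this way.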

  11. Identification of genetic variants predictive of early onset pancreatic cancer through a population science analysis of functional genomic datasets

    PubMed Central

    Chen, Jinyun; Wu, Xifeng; Huang, Yujing; Chen, Wei; Brand, Randall E.; Killary, Ann M.; Sen, Subrata; Frazier, Marsha L.

    2016-01-01

    Biomarkers are urgently needed for the early detection of pancreatic cancer (PC). Our purpose was to identify a panel of genetic variants that, combined, can predict increased risk for early-onset PC and thereby identify individuals who should begin screening at an early age. Previously, we identified genes using a functional genomic approach that were aberrantly expressed in early pathways to PC tumorigenesis. We now report the discovery of single nucleotide polymorphisms (SNPs) in these genes associated with early age at diagnosis of PC using a two-phase study design. In silico and bioinformatics tools were used to examine functional relevance of the identified SNPs. Eight SNPs were consistently associated with age at diagnosis in the discovery phase, validation phase and pooled analysis. Further analysis of the joint effects of these 8 SNPs showed that, compared to participants carrying none of these unfavorable genotypes (median age at PC diagnosis 70 years), those carrying 1–2, 3–4, or 5 or more unfavorable genotypes had median ages at diagnosis of 64, 63, and 62 years, respectively (P = 3.0E–04). A gene-dosage effect was observed, with age at diagnosis inversely related to number of unfavorable genotypes (Ptrend = 1.0E–04). Using bioinformatics tools, we found that all of the 8 SNPs were predicted to play functional roles in the disruption of transcription factor and/or enhancer binding sites and most of them were expression quantitative trait loci (eQTL) of the target genes. The panel of genetic markers identified may serve as susceptibility markers for earlier PC diagnosis. PMID:27486767

  12. Genetic diversity of tyrosine hydroxylase (TH) and dopamine β-hydroxylase (DBH) genes in cattle breeds

    PubMed Central

    Lourenco-Jaramillo, Diana Lelidett; Sifuentes-Rincón, Ana María; Parra-Bracamonte, Gaspar Manuel; de la Rosa-Reyna, Xochitl Fabiola; Segura-Cabrera, Aldo; Arellano-Vera, Williams

    2012-01-01

    DNA from four cattle breeds was used to re-sequence all of the exons and 56% of the introns of the bovine tyrosine hydroxylase (TH) gene and 97% and 13% of the bovine dopamine β-hydroxylase (DBH) coding and non-coding sequences, respectively. Two novel single nucleotide polymorphisms (SNPs) and a microsatellite motif were found in the TH sequences. The DBH sequences contained 62 nucleotide changes, including eight non-synonymous SNPs (nsSNPs) that are of particular interest because they may alter protein function and therefore affect the phenotype. These DBH nsSNPs resulted in amino acid substitutions that were predicted to destabilize the protein structure. Six SNPs (one from TH and five from DBH non-synonymous SNPs) were genotyped in 140 animals; all of them were polymorphic and had a minor allele frequency of > 9%. There were significant differences in the intra- and inter-population haplotype distributions. The haplotype differences between Brahman cattle and the three B. t. taurus breeds (Charolais, Holstein and Lidia) were interesting from a behavioural point of view because of the differences in temperament between these breeds. PMID:22888292

  13. Pharmacogenetics Biomarkers and Their Specific Role in Neoadjuvant Chemoradiotherapy Treatments: An Exploratory Study on Rectal Cancer Patients

    PubMed Central

    Dreussi, Eva; Cecchin, Erika; Polesel, Jerry; Canzonieri, Vincenzo; Agostini, Marco; Boso, Caterina; Belluco, Claudio; Buonadonna, Angela; Lonardi, Sara; Bergamo, Francesca; Gagno, Sara; De Mattia, Elena; Pucciarelli, Salvatore; De Paoli, Antonino; Toffoli, Giuseppe

    2016-01-01

    Background: Pathological complete response (pCR) to neoadjuvant chemoradiotherapy (CRT) in locally advanced rectal cancer (LARC) is still ascribed to a minority of patients. A pathway based-approach could highlight the predictive role of germline single nucleotide polymorphisms (SNPs). The primary aim of this study was to define new predictive biomarkers considering treatment specificities. Secondary aim was to determine new potential predictive biomarkers independent from radiotherapy (RT) dosage and cotreatment with oxaliplatin. Methods: Thirty germ-line SNPs in twenty-one genes were selected according to a pathway-based approach. Genetic analyses were performed on 280 LARC patients who underwent fluoropyrimidine-based CRT. The potential predictive role of these SNPs in determining pathological tumor response was tested in Group 1 (94 patients undergoing also oxaliplatin), Group 2 (73 patients treated with high RT dosage), Group 3 (113 patients treated with standard RT dosage), and in the pooled population (280 patients). Results: Nine new predictive biomarkers were identified in the three groups. The most promising one was rs3136228-MSH6 (p = 0.004) arising from Group 3. In the pooled population, rs1801133-MTHFR showed only a trend (p = 0.073). Conclusion: This exploratory study highlighted new potential predictive biomarkers of neoadjuvant CRT and underlined the importance to strictly define treatment peculiarities in pharmacogenetic analyses. PMID:27608007

  14. Development of a dense SNP-based linkage map of an apple rootstock progeny using the Malus Infinium whole genome genotyping array.

    PubMed

    Antanaviciute, Laima; Fernández-Fernández, Felicidad; Jansen, Johannes; Banchi, Elisa; Evans, Katherine M; Viola, Roberto; Velasco, Riccardo; Dunwell, Jim M; Troggio, Michela; Sargent, Daniel J

    2012-05-25

    A whole-genome genotyping array has previously been developed for Malus using SNP data from 28 Malus genotypes. This array offers the prospect of high throughput genotyping and linkage map development for any given Malus progeny. To test the applicability of the array for mapping in diverse Malus genotypes, we applied the array to the construction of a SNP-based linkage map of an apple rootstock progeny. Of the 7,867 Malus SNP markers on the array, 1,823 (23.2%) were heterozygous in one of the two parents of the progeny, 1,007 (12.8%) were heterozygous in both parental genotypes, whilst just 2.8% of the 921 Pyrus SNPs were heterozygous. A linkage map spanning 1,282.2 cM was produced comprising 2,272 SNP markers, 306 SSR markers and the S-locus. The length of the M432 linkage map was increased by 52.7 cM with the addition of the SNP markers, whilst marker density increased from 3.8 cM/marker to 0.5 cM/marker. Just three regions in excess of 10 cM remain where no markers were mapped. We compared the positions of the mapped SNP markers on the M432 map with their predicted positions on the 'Golden Delicious' genome sequence. A total of 311 markers (13.7% of all mapped markers) mapped to positions that conflicted with their predicted positions on the 'Golden Delicious' pseudo-chromosomes, indicating the presence of paralogous genomic regions or mis-assignments of genome sequence contigs during the assembly and anchoring of the genome sequence. We incorporated data for the 2,272 SNP markers onto the map of the M432 progeny and have presented the most complete and saturated map of the full 17 linkage groups of M. pumila to date. The data were generated rapidly in a high-throughput semi-automated pipeline, permitting significant savings in time and cost over linkage map construction using microsatellites. 
    The application of the array will permit linkage maps to be developed for QTL analyses in a cost-effective manner, and the identification of SNPs that have been assigned erroneous positions on the 'Golden Delicious' reference sequence will assist in the continued improvement of the genome sequence assembly for that variety.
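
    The percentages and marker density quoted in this abstract can be re-derived from its own counts. The sketch below is only an arithmetic check; it assumes that the final density is the map length divided by all mapped loci (SNPs, SSRs and the S-locus), which the abstract does not state explicitly, and the variable names are illustrative.

```python
# Counts quoted in the abstract.
malus_snps_on_array = 7867
het_in_one_parent = 1823
het_in_both_parents = 1007
mapped_snps = 2272
mapped_ssrs = 306
s_locus = 1
map_length_cm = 1282.2
conflicting_markers = 311

pct_one_parent = 100.0 * het_in_one_parent / malus_snps_on_array
pct_both_parents = 100.0 * het_in_both_parents / malus_snps_on_array
pct_conflicting = 100.0 * conflicting_markers / mapped_snps

# Assumption: density is computed over every mapped locus.
total_loci = mapped_snps + mapped_ssrs + s_locus
density_cm_per_marker = map_length_cm / total_loci

print(round(pct_one_parent, 1), round(pct_both_parents, 1),
      round(pct_conflicting, 1), round(density_cm_per_marker, 1))
```

    Under that assumption the computed values agree with the rounded figures reported in the abstract.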

  15. Predicting the disease of Alzheimer with SNP biomarkers and clinical data using data mining classification approach: decision tree.

    PubMed

    Erdoğan, Onur; Aydin Son, Yeşim

    2014-01-01

    Single Nucleotide Polymorphisms (SNPs) are the most common genomic variations, in which only a single nucleotide differs between individuals. Individual SNPs and SNP profiles associated with diseases can be utilized as biological markers. However, there is a need to determine which SNP subsets and which patient clinical data are informative for diagnosis. Data mining approaches have the highest potential for extracting knowledge from genomic datasets and for selecting representative SNPs as well as the most effective and informative clinical features for the clinical diagnosis of diseases. In this study, we applied one of the widely used data mining classification methodologies, the decision tree, to associate SNP biomarkers and significant clinical data with Alzheimer's disease (AD), which is the most common form of dementia. Different tree construction parameters were compared for optimization, and the most accurate tree for predicting AD is presented.

  16. SNPs in DNA repair or oxidative stress genes and late subcutaneous fibrosis in patients following single shot partial breast irradiation

    PubMed Central

    2012-01-01

    Background The aim of this study was to evaluate the potential role of single nucleotide polymorphisms in genes related to the response to radiotherapy injury, such as genes related to DNA repair or enzymes involved in anti-oxidative activities. The paper aims to identify marker genes able to predict an increased risk of late toxicity by studying our group of patients who underwent a Single Shot 3D-CRT PBI (SSPBI) after BCS (breast conserving surgery). Methods A total of 57 breast cancer patients who underwent SSPBI were genotyped for SNPs (single nucleotide polymorphisms) in XRCC1, XRCC3, GST and RAD51 by Pyrosequencing technology. Univariate analysis (ORs and 95% CI) was performed to correlate SNPs with the risk of developing ≥ G2 fibrosis or fat necrosis. Results A significantly higher risk of developing ≥ G2 fibrosis or fat necrosis was found in patients with the polymorphic variant GSTP1 (Ile105Val) (OR = 2.9; 95%CI, 0.88-10.14, p = 0.047). Conclusions The presence of some SNPs involved in DNA repair or response to oxidative stress seems to be able to predict late toxicity. Trial Registration ClinicalTrials.gov: NCT01316328 PMID:22272830

  17. Genome-Wide SNP Genotyping to Infer the Effects on Gene Functions in Tomato

    PubMed Central

    Hirakawa, Hideki; Shirasawa, Kenta; Ohyama, Akio; Fukuoka, Hiroyuki; Aoki, Koh; Rothan, Christophe; Sato, Shusei; Isobe, Sachiko; Tabata, Satoshi

    2013-01-01

    The genotype data of 7054 single nucleotide polymorphism (SNP) loci in 40 tomato lines, including inbred lines, F1 hybrids, and wild relatives, were collected using Illumina's Infinium and GoldenGate assay platforms, the latter of which was utilized in our previous study. The dendrogram based on the genotype data corresponded well to the breeding types of tomato and wild relatives. The SNPs were classified into six categories according to their positions in the genes predicted on the tomato genome sequence. The genes with SNPs were annotated by homology searches against the nucleotide and protein databases, as well as by domain searches, and they were classified into the functional categories defined by the NCBI's eukaryotic orthologous groups (KOG). To infer the SNPs' effects on the gene functions, the three-dimensional structures of the 843 proteins that were encoded by the genes with SNPs causing missense mutations were constructed by homology modelling, and 200 of these proteins were considered to carry non-synonymous amino acid substitutions in the predicted functional sites. The SNP information obtained in this study is available at the Kazusa Tomato Genomics Database (http://plant1.kazusa.or.jp/tomato/). PMID:23482505

  18. Transcriptome sequencing and marker development in winged bean (Psophocarpus tetragonolobus; Leguminosae).

    PubMed

    Vatanparast, Mohammad; Shetty, Prateek; Chopra, Ratan; Doyle, Jeff J; Sathyanarayana, N; Egan, Ashley N

    2016-06-30

    Winged bean, Psophocarpus tetragonolobus (L.) DC., is similar to soybean in yield and nutritional value but more viable in tropical conditions. Here, we strengthen genetic resources for this orphan crop by producing a de novo transcriptome assembly and annotation of two Sri Lankan accessions (denoted herein as CPP34 [PI 491423] and CPP37 [PI 639033]), developing simple sequence repeat (SSR) markers, and identifying single nucleotide polymorphisms (SNPs) between geographically separated genotypes. A combined assembly based on 804,757 reads from two accessions produced 16,115 contigs with an N50 of 889 bp, over 90% of which has significant sequence similarity to other legumes. Combining contigs with singletons produced 97,241 transcripts. We identified 12,956 SSRs, including 2,594 repeats for which primers were designed and 5,190 high-confidence SNPs between Sri Lankan and Nigerian genotypes. The transcriptomic data sets generated here provide new resources for gene discovery and marker development in this orphan crop, and will be vital for future plant breeding efforts. We also analyzed the soybean trypsin inhibitor (STI) gene family, important plant defense genes, in the context of related legumes and found evidence for radiation of the Kunitz trypsin inhibitor (KTI) gene family within winged bean.
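
    The N50 statistic quoted for the combined assembly can be computed from a list of contig lengths as sketched below. This uses the usual definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size). The function name n50 and the example lengths are hypothetical and are not from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50 is undefined for an empty list of lengths")

# Hypothetical contig lengths in bp.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

    Because N50 is weighted by length, many short contigs can leave it nearly unchanged, so it is usually read alongside the contig count and total assembly size.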

  19. Genomic relationships based on X chromosome markers and accuracy of genomic predictions with and without X chromosome markers

    PubMed Central

    2014-01-01

    Background Although the X chromosome is the second largest bovine chromosome, markers on the X chromosome are not used for genomic prediction in some countries and populations. In this study, we presented a method for computing genomic relationships using X chromosome markers, investigated the accuracy of imputation from a low density (7K) to the 54K SNP (single nucleotide polymorphism) panel, and compared the accuracy of genomic prediction with and without using X chromosome markers. Methods The impact of considering X chromosome markers on prediction accuracy was assessed using data from Nordic Holstein bulls and different sets of SNPs: (a) the 54K SNPs for reference and test animals, (b) SNPs imputed from the 7K to the 54K SNP panel for test animals, (c) SNPs imputed from the 7K to the 54K panel for half of the reference animals, and (d) the 7K SNP panel for all animals. Beagle and Findhap were used for imputation. GBLUP (genomic best linear unbiased prediction) models with or without X chromosome markers and with or without a residual polygenic effect were used to predict genomic breeding values for 15 traits. Results Averaged over the two imputation datasets, correlation coefficients between imputed and true genotypes for autosomal markers, pseudo-autosomal markers, and X-specific markers were 0.971, 0.831 and 0.935 when using Findhap, and 0.983, 0.856 and 0.937 when using Beagle. Estimated reliabilities of genomic predictions based on the imputed datasets using Findhap or Beagle were very close to those using the real 54K data. Genomic prediction using all markers gave slightly higher reliabilities than predictions without X chromosome markers. Based on our data which included only bulls, using a G matrix that accounted for sex-linked relationships did not improve prediction, compared with a G matrix that did not account for sex-linked relationships. 
    A model that included a polygenic effect did not recover the loss of prediction accuracy from exclusion of X chromosome markers. Conclusions The results from this study suggest that markers on the X chromosome contribute to accuracy of genomic predictions and should be used for routine genomic evaluation. PMID:25080199

  20. An investigation of causes of false positive single nucleotide polymorphisms using simulated reads from a small eukaryote genome.

    PubMed

    Ribeiro, Antonio; Golicz, Agnieszka; Hackett, Christine Anne; Milne, Iain; Stephen, Gordon; Marshall, David; Flavell, Andrew J; Bayer, Micha

    2015-11-11

    Single Nucleotide Polymorphisms (SNPs) are widely used molecular markers, and their use has increased massively since the inception of Next Generation Sequencing (NGS) technologies, which allow detection of large numbers of SNPs at low cost. However, both NGS data and their analysis are error-prone, which can lead to the generation of false positive (FP) SNPs. We explored the relationship between FP SNPs and seven factors involved in mapping-based variant calling - quality of the reference sequence, read length, choice of mapper and variant caller, mapping stringency and filtering of SNPs by read mapping quality and read depth. This resulted in 576 possible factor level combinations. We used error- and variant-free simulated reads to ensure that every SNP found was indeed a false positive. The variation in the number of FP SNPs generated ranged from 0 to 36,621 for the 120 million base pairs (Mbp) genome. All of the experimental factors tested had statistically significant effects on the number of FP SNPs generated and there was a considerable amount of interaction between the different factors. Using a fragmented reference sequence led to a dramatic increase in the number of FP SNPs generated, as did relaxed read mapping and a lack of SNP filtering. The choice of reference assembler, mapper and variant caller also significantly affected the outcome. The effect of read length was more complex and suggests a possible interaction between mapping specificity and the potential for contributing more false positives as read length increases. The choice of tools and parameters involved in variant calling can have a dramatic effect on the number of FP SNPs produced, with particularly poor combinations of software and/or parameter settings yielding tens of thousands in this experiment. Between-factor interactions make simple recommendations difficult for a SNP discovery pipeline but the quality of the reference sequence is clearly of paramount importance. 
    Our findings are also a stark reminder that it can be unwise to use the relaxed mismatch settings provided as defaults by some read mappers when reads are being mapped to a relatively unfinished reference sequence from e.g. a non-model organism in its early stages of genomic exploration.

  1. Validating genetic markers of response to recombinant human growth hormone in children with growth hormone deficiency and Turner syndrome: the PREDICT validation study

    PubMed Central

    Stevens, Adam; Murray, Philip; Wojcik, Jerome; Raelson, John; Koledova, Ekaterina; Chatelain, Pierre

    2016-01-01

    Objective Single-nucleotide polymorphisms (SNPs) associated with the response to recombinant human growth hormone (r-hGH) have previously been identified in growth hormone deficiency (GHD) and Turner syndrome (TS) children in the PREDICT long-term follow-up (LTFU) study (NCT00699855). Here, we describe the PREDICT validation (VAL) study (NCT01419249), which aimed to confirm these genetic associations. Design and methods Children with GHD (n = 293) or TS (n = 132) were recruited retrospectively from 29 sites in nine countries. All children had completed 1 year of r-hGH therapy. 48 SNPs previously identified as associated with first year growth response to r-hGH were genotyped. Regression analysis was used to assess the association between genotype and growth response using clinical/auxological variables as covariates. Further analysis was undertaken using random forest classification. Results The children were younger, and the growth response was higher in VAL study. Direct genotype analysis did not replicate what was found in the LTFU study. However, using exploratory regression models with covariates, a consistent relationship with growth response in both VAL and LTFU was shown for four genes – SOS1 and INPPL1 in GHD and ESR1 and PTPN1 in TS. The random forest analysis demonstrated that only clinical covariates were important in the prediction of growth response in mild GHD (>4 to <10 μg/L on GH stimulation test), however, in severe GHD (≤4 μg/L) several SNPs contributed (in IGF2, GRB10, FOS, IGFBP3 and GHRHR). Conclusions The PREDICT validation study supports, in an independent cohort, the association of four of 48 genetic markers with growth response to r-hGH treatment in both pre-pubertal GHD and TS children after controlling for clinical/auxological covariates. However, the contribution of these SNPs in a prediction model of first-year response is not sufficient for routine clinical use. PMID:27651465

  2. Validating genetic markers of response to recombinant human growth hormone in children with growth hormone deficiency and Turner syndrome: the PREDICT validation study.

    PubMed

    Stevens, Adam; Murray, Philip; Wojcik, Jerome; Raelson, John; Koledova, Ekaterina; Chatelain, Pierre; Clayton, Peter

    2016-12-01

    Single-nucleotide polymorphisms (SNPs) associated with the response to recombinant human growth hormone (r-hGH) have previously been identified in growth hormone deficiency (GHD) and Turner syndrome (TS) children in the PREDICT long-term follow-up (LTFU) study (NCT00699855). Here, we describe the PREDICT validation (VAL) study (NCT01419249), which aimed to confirm these genetic associations. Children with GHD (n = 293) or TS (n = 132) were recruited retrospectively from 29 sites in nine countries. All children had completed 1 year of r-hGH therapy. 48 SNPs previously identified as associated with first year growth response to r-hGH were genotyped. Regression analysis was used to assess the association between genotype and growth response using clinical/auxological variables as covariates. Further analysis was undertaken using random forest classification. The children were younger, and the growth response was higher in VAL study. Direct genotype analysis did not replicate what was found in the LTFU study. However, using exploratory regression models with covariates, a consistent relationship with growth response in both VAL and LTFU was shown for four genes - SOS1 and INPPL1 in GHD and ESR1 and PTPN1 in TS. The random forest analysis demonstrated that only clinical covariates were important in the prediction of growth response in mild GHD (>4 to <10 μg/L on GH stimulation test), however, in severe GHD (≤4 μg/L) several SNPs contributed (in IGF2, GRB10, FOS, IGFBP3 and GHRHR). The PREDICT validation study supports, in an independent cohort, the association of four of 48 genetic markers with growth response to r-hGH treatment in both pre-pubertal GHD and TS children after controlling for clinical/auxological covariates. However, the contribution of these SNPs in a prediction model of first-year response is not sufficient for routine clinical use.

  3. Polygenic influences on dyslipidemias.

    PubMed

    Dron, Jacqueline S; Hegele, Robert A

    2018-04-01

    Rare large-effect genetic variants underlie monogenic dyslipidemias, whereas common small-effect genetic variants - single nucleotide polymorphisms (SNPs) - have modest influences on lipid traits. Over the past decade, these small-effect SNPs have been shown to cumulatively exert consistent effects on lipid phenotypes under a polygenic framework, which is the focus of this review. Several groups have reported polygenic risk scores assembled from lipid-associated SNPs, and have applied them to their respective phenotypes. For lipid traits in the normal population distribution, polygenic effects quantified by a score that integrates several common polymorphisms account for about 20-30% of genetic variation. Among individuals at the extremes of the distribution, that is, those with clinical dyslipidemia, the polygenic component includes both rare variants with large effects and common polymorphisms: depending on the trait, 20-50% of susceptibility can be accounted for by this assortment of genetic variants. Accounting for polygenic effects increases the numbers of dyslipidemic individuals who can be explained genetically, but a substantial proportion of susceptibility remains unexplained. Whether documenting the polygenic basis of dyslipidemia will affect outcomes in clinical trials or prospective observational studies remains to be determined.
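
    The polygenic risk scores described above are commonly computed as a weighted sum of risk-allele dosages. The sketch below illustrates that general idea only; the function name, the weights and the genotype vector are hypothetical and are not taken from the review.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per SNP).

    Both arguments are illustrative: `weights` stands in for per-SNP
    effect sizes, which a real score would take from published studies.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical example: three SNPs with made-up effect sizes.
example_weights = [0.10, 0.05, 0.20]
example_dosages = [2, 0, 1]
example_score = polygenic_score(example_dosages, example_weights)
```

    A score built this way treats each SNP as contributing independently and additively, which is the usual simplifying assumption under a polygenic framework.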

  4. Making a chocolate chip: development and evaluation of a 6K SNP array for Theobroma cacao

    PubMed Central

    Livingstone, Donald; Royaert, Stefan; Stack, Conrad; Mockaitis, Keithanne; May, Greg; Farmer, Andrew; Saski, Christopher; Schnell, Ray; Kuhn, David; Motamayor, Juan Carlos

    2015-01-01

    Theobroma cacao, the key ingredient in chocolate production, is one of the world's most important tree fruit crops, with ∼4,000,000 metric tons produced across 50 countries. To move towards gene discovery and marker-assisted breeding in cacao, a single-nucleotide polymorphism (SNP) identification project was undertaken using RNAseq data from 16 diverse cacao cultivars. RNA sequences were aligned to the assembled transcriptome of the cultivar Matina 1-6, and 330,000 SNPs within coding regions were identified. From these SNPs, a subset of 6,000 high-quality SNPs were selected for inclusion on an Illumina Infinium SNP array: the Cacao6kSNP array. Using Cacao6KSNP array data from over 1,000 cacao samples, we demonstrate that our custom array produces a saturated genetic map and can be used to distinguish among even closely related genotypes. Our study enhances and expands the genetic resources available to the cacao research community, and provides the genome-scale set of tools that are critical for advancing breeding with molecular markers in an agricultural species with high genetic diversity. PMID:26070980

  5. Variant discovery in the sheepmeat odour and flavour in javanese fat tailed sheep using RNA sequencing

    NASA Astrophysics Data System (ADS)

    Abuzahra, M. A. M.; Jakaria; Listyarini, K.; Furqon, A.; Sumantri, C.; Uddin, M. J.; Gunawan, A.

    2018-05-01

    High-throughput RNA sequencing (RNA-Seq) reveals new challenges for the detection of transcriptome variants (SNPs) in different tissues and species. The aim of this study was to characterize a SNP discovery analysis in the sheep meat odour and flavour transcriptome using RNA-Seq. Six liver samples from sheep with divergent meat odour and flavour were analyzed using the Illumina Genome Hiseq 2500 Analyzer. The SNP detection analysis revealed 142 SNPs in sheep meat samples, and a large number of those corresponded to differences between the high and low sheep meat odour and flavour groups relative to the ovine genome assembly OAR v4.0. Among them, about 90.4% of genes had multiple polymorphisms within 12 genes (JAML, ANGPTL8, LOC101103463, SEPW1, SCN5A, LOC101113036, DOCK6, GTSE1, KIF12, KCTD17, KANK2, CYP2A6). Several of the SNPs (in JAML, CYP2A6, SEPW1, and KIF12) found in this study could be included as suitable markers in genotyping platforms to perform association analyses in commercial populations and apply genomic selection protocols in sheep meat production.

  6. GrigoraSNPs: Optimized Analysis of SNPs for DNA Forensics.

    PubMed

    Ricke, Darrell O; Shcherbina, Anna; Michaleas, Adam; Fremont-Smith, Philip

    2018-04-16

    High-throughput sequencing (HTS) of single nucleotide polymorphisms (SNPs) enables additional DNA forensic capabilities not attainable using traditional STR panels. However, the inclusion of sets of loci selected for mixture analysis, extended kinship, phenotype, biogeographic ancestry prediction, etc., can result in large panel sizes that are difficult to analyze in a rapid fashion. GrigoraSNPs was developed to address the allele-calling bottleneck that was encountered when analyzing SNP panels with more than 5000 loci using HTS. GrigoraSNPs uses MapReduce parallel data processing on multiple computational threads plus a novel locus-identification hashing strategy leveraging target sequence tags. This tool optimizes the SNP calling module of the DNA analysis pipeline with runtimes that scale linearly with the number of HTS reads. Results are compared with SNP analysis pipelines implemented with SAMtools and GATK. GrigoraSNPs removes a computational bottleneck for processing forensic samples with large HTS SNP panels. Published 2018. This article is a U.S. Government work and is in the public domain in the USA.
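
    The general idea of identifying loci by hashing short sequence tags can be sketched as follows. This is not the GrigoraSNPs implementation: the tag layout (a fixed-length tag immediately upstream of the SNP), the function name and the sample data are all assumptions made for illustration, and the parallel MapReduce step is omitted.

```python
from collections import Counter, defaultdict


def call_alleles(reads, tag_to_locus):
    """Count observed alleles per locus using a hash lookup of sequence tags.

    Assumption (hypothetical layout): every tag has the same length and the
    SNP base is the base immediately following the tag in the read.
    """
    counts = defaultdict(Counter)
    if not tag_to_locus:
        return counts
    tag_len = len(next(iter(tag_to_locus)))
    for read in reads:
        # Slide a window over the read; each window is one dictionary lookup.
        for start in range(len(read) - tag_len):
            locus = tag_to_locus.get(read[start:start + tag_len])
            if locus is not None:
                counts[locus][read[start + tag_len]] += 1
    return counts


# Hypothetical tag table and reads.
demo_tags = {"ACGT": "demo_locus_1"}
demo_reads = ["TTACGTA", "GGACGTC", "ACGTA"]
demo_counts = call_alleles(demo_reads, demo_tags)
```

    Because each window costs one constant-time dictionary lookup, the work per read does not depend on the number of loci in the panel, which is consistent with the linear scaling in read count reported in the abstract. Reads are processed independently, so the loop over reads is the natural unit to distribute across threads.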

  7. SNPs in PTGS2 and LTA Predict Pain and Quality of Life in Long Term Lung Cancer Survivors

    PubMed Central

    Rausch, Sarah M.; Gonzalez, Brian D.; Clark, Matthew M.; Patten, Christi; Felten, Sara; Liu, Heshan; Li, Yafei; Sloan, Jeff; Yang, Ping

    2015-01-01

    PURPOSE Lung cancer survivors report the lowest quality of life relative to other cancer survivors. Pain is one of the most devastating, persistent, and incapacitating symptoms for lung cancer survivors. Prevalence rates vary with 80–100% of survivors experiencing cancer pain and healthcare costs are five times higher in cancer survivors with uncontrolled pain. Cancer pain often has a considerable impact on quality of life among cancer patients and cancer survivors. Therefore, early identification and treatment are important. Although recent studies have suggested a relationship between single nucleotide polymorphisms (SNPs) in several cytokine and inflammation genes with cancer prognosis, associations with cancer pain are not clear. Therefore, the primary aim of this study was to identify SNPs related to pain in long term lung cancer survivors. PATIENTS AND METHODS Participants were enrolled in the Mayo Clinic Lung Cancer Cohort upon diagnosis of their lung cancer. 1149 Caucasian lung cancer survivors (440 surviving < 3 years; 354 surviving 3–5 years; and 355 surviving > 5 years) completed study questionnaires and had genetic samples available. Ten SNPs from PTGS2 and LTA genes were selected based on the serum literature. Outcomes included pain and quality of life as measured by the SF-8. RESULTS Of the 10 SNPs evaluated in LTA and PTGS2 genes, 3 were associated with pain severity (rs5277; rs1799964), social function (rs5277) and mental health (rs5275). These results suggested both specificity and consistency of these inflammatory gene SNPs in predicting pain severity in long term lung cancer survivors. CONCLUSION These results provide support for genetic predisposition to pain severity and may aid in identification of lung cancer survivors at high risk for morbidity and poor QOL. PMID:22464751

  8. Optimizing de novo transcriptome assembly and extending genomic resources for striped catfish (Pangasianodon hypophthalmus).

    PubMed

    Thanh, Nguyen Minh; Jung, Hyungtaek; Lyons, Russell E; Njaci, Isaac; Yoon, Byoung-Ha; Chand, Vincent; Tuan, Nguyen Viet; Thu, Vo Thi Minh; Mather, Peter

    2015-10-01

    Striped catfish (Pangasianodon hypophthalmus) is a commercially important freshwater fish used in inland aquaculture in the Mekong Delta, Vietnam. The culture industry is facing a significant challenge however from saltwater intrusion into many low topographical coastal provinces across the Mekong Delta as a result of predicted climate change impacts. Developing genomic resources for this species can facilitate the production of improved culture lines that can withstand raised salinity conditions, and so we have applied high-throughput Ion Torrent sequencing of transcriptome libraries from six target osmoregulatory organs from striped catfish as a genomic resource for use in future selection strategies. We obtained 12,177,770 reads after trimming and processing with an average length of 97bp. De novo assemblies were generated using CLC Genomic Workbench, Trinity and Velvet/Oases with the best overall contig performance resulting from the CLC assembly. De novo assembly using CLC yielded 66,451 contigs with an average length of 478bp and N50 length of 506bp. A total of 37,969 contigs (57%) possessed significant similarity with proteins in the non-redundant database. Comparative analyses revealed that a significant number of contigs matched sequences reported in other teleost fishes, ranging in similarity from 45.2% with Atlantic cod to 52% with zebrafish. In addition, 28,879 simple sequence repeats (SSRs) and 55,721 single nucleotide polymorphisms (SNPs) were detected in the striped catfish transcriptome. The sequence collection generated in the current study represents the most comprehensive genomic resource for P. hypophthalmus available to date. Our results illustrate the utility of next-generation sequencing as an efficient tool for constructing a large genomic database for marker development in non-model species. Copyright © 2015 Elsevier B.V. All rights reserved.
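
    The N50 statistic quoted for the CLC assembly is the contig length at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that standard definition follows; the function name and the example lengths are illustrative and are not data from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total.
        if running >= half_total:
            return length


# Illustrative lengths only: total 1500 bp, so half is 750 bp.
example_n50 = n50([100, 200, 300, 400, 500])
```

    An N50 close to the mean contig length, as reported here (506 bp versus 478 bp), generally indicates that the contig length distribution is not dominated by a few long contigs.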

  9. Development of a Genetic Map for Onion (Allium cepa L.) Using Reference-Free Genotyping-by-Sequencing and SNP Assays

    PubMed Central

    Jo, Jinkwan; Purushotham, Preethi M.; Han, Koeun; Lee, Heung-Ryul; Nah, Gyoungju; Kang, Byoung-Cheorl

    2017-01-01

    Single nucleotide polymorphisms (SNPs) play important roles as molecular markers in plant genomics and breeding studies. Although onion (Allium cepa L.) is an important crop globally, relatively few molecular marker resources have been reported due to its large genome and high heterozygosity. Genotyping-by-sequencing (GBS) offers a greater degree of complexity reduction followed by concurrent SNP discovery and genotyping for species with complex genomes. In this study, GBS was employed for SNP mining in onion, which currently lacks a reference genome. A segregating F2 population, derived from a cross between ‘NW-001’ and ‘NW-002,’ as well as multiple parental lines were used for GBS analysis. A total of 56.15 Gbp of raw sequence data were generated and 1,851,428 SNPs were identified from the de novo assembled contigs. Stringent filtering resulted in 10,091 high-fidelity SNP markers. Robust SNPs that satisfied the segregation ratio criteria and with even distribution in the mapping population were used to construct an onion genetic map. The final map contained eight linkage groups and spanned a genetic length of 1,383 centiMorgans (cM), with an average marker interval of 8.08 cM. These robust SNPs were further analyzed using the high-throughput Fluidigm platform for marker validation. This is the first study in onion to develop genome-wide SNPs using GBS. The resulting SNP markers and developed linkage map will be valuable tools for genetic mapping of important agronomic traits and marker-assisted selection in onion breeding programs. PMID:28959273
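
    The abstract does not state which segregation ratio criteria were applied. One common check for a codominant marker in an F2 population is a chi-square goodness-of-fit test against the expected 1:2:1 genotype ratio; the sketch below assumes that test and a 5% significance level (critical value 5.991 for two degrees of freedom). The function names and counts are hypothetical.

```python
CHI2_CRITICAL_DF2_ALPHA05 = 5.991  # chi-square critical value, df = 2, alpha = 0.05


def f2_segregation_chi2(n_aa, n_ab, n_bb):
    """Chi-square statistic for observed F2 genotype counts against 1:2:1."""
    total = n_aa + n_ab + n_bb
    if total == 0:
        raise ValueError("no genotyped individuals")
    expected = (0.25 * total, 0.50 * total, 0.25 * total)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_1_2_1(n_aa, n_ab, n_bb):
    """True when the marker shows no significant segregation distortion."""
    return f2_segregation_chi2(n_aa, n_ab, n_bb) < CHI2_CRITICAL_DF2_ALPHA05


# Hypothetical counts for two markers genotyped in 100 F2 plants.
marker_ok = fits_1_2_1(25, 50, 25)
marker_distorted = fits_1_2_1(40, 40, 20)
```

    Markers that fail such a test are usually dropped before map construction because distorted segregation can bias recombination estimates.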

  10. QTL-seq approach identified genomic regions and diagnostic markers for rust and late leaf spot resistance in groundnut (Arachis hypogaea L.).

    PubMed

    Pandey, Manish K; Khan, Aamir W; Singh, Vikas K; Vishwakarma, Manish K; Shasidhar, Yaduru; Kumar, Vinay; Garg, Vanika; Bhat, Ramesh S; Chitikineni, Annapurna; Janila, Pasupuleti; Guo, Baozhu; Varshney, Rajeev K

    2017-08-01

    Rust and late leaf spot (LLS) are the two major foliar fungal diseases in groundnut, and their co-occurrence leads to significant yield loss in addition to the deterioration of fodder quality. To identify candidate genomic regions controlling resistance to rust and LLS, a whole-genome resequencing (WGRS)-based approach referred to as 'QTL-seq' was deployed. A total of 231.67 Gb raw and 192.10 Gb of clean sequence data were generated through WGRS of the resistant parent and the resistant and susceptible bulks for rust and LLS. Sequence analysis of bulks for rust and LLS with reference-guided resistant parent assembly identified 3136 single-nucleotide polymorphisms (SNPs) for rust and 66 SNPs for LLS with the read depth of ≥7 in the identified genomic region on pseudomolecule A03. Detailed analysis identified 30 nonsynonymous SNPs affecting 25 candidate genes for rust resistance, and 14 intronic and three synonymous SNPs affecting nine candidate genes for LLS resistance. Subsequently, allele-specific diagnostic markers were identified for three SNPs for rust resistance and one SNP for LLS resistance. Genotyping of one RIL population (TAG 24 × GPBD 4) with these four diagnostic markers revealed higher phenotypic variation for these two diseases. These results suggest the usefulness of the QTL-seq approach in precise and rapid identification of candidate genomic regions and development of diagnostic markers for breeding applications. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
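
    QTL-seq analyses typically summarise each SNP by a SNP-index (the fraction of reads carrying the non-reference allele) and compare the two bulks through the difference of their indices. The abstract does not give these formulas, so the sketch below follows the general method rather than this study's pipeline; only the depth threshold of 7 is taken from the abstract, and the function names and read counts are hypothetical.

```python
def snp_index(alt_reads, depth, min_depth=7):
    """Fraction of reads supporting the alternate allele, or None if depth is too low."""
    if depth < min_depth:
        return None
    return alt_reads / depth


def delta_snp_index(bulk_a, bulk_b, min_depth=7):
    """Difference in SNP-index between two bulks, each given as (alt_reads, depth)."""
    index_a = snp_index(bulk_a[0], bulk_a[1], min_depth)
    index_b = snp_index(bulk_b[0], bulk_b[1], min_depth)
    if index_a is None or index_b is None:
        return None  # position fails the depth filter in at least one bulk
    return index_a - index_b


# Hypothetical position: 18 of 20 reads in one bulk, 4 of 20 in the other.
example_delta = delta_snp_index((18, 20), (4, 20))
```

    Positions where the difference departs strongly from zero across a contiguous window are the usual candidates for a trait-linked genomic region.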

  11. Genomic prediction in contrast to a genome-wide association study in explaining heritable variation of complex growth traits in breeding populations of Eucalyptus.

    PubMed

    Müller, Bárbara S F; Neves, Leandro G; de Almeida Filho, Janeo E; Resende, Márcio F R; Muñoz, Patricio R; Dos Santos, Paulo E T; Filho, Estefano Paludzyszyn; Kirst, Matias; Grattapaglia, Dario

    2017-07-11

    The advent of high-throughput genotyping technologies coupled to genomic prediction methods established a new paradigm to integrate genomics and breeding. We carried out whole-genome prediction and contrasted it to a genome-wide association study (GWAS) for growth traits in breeding populations of Eucalyptus benthamii (n =505) and Eucalyptus pellita (n =732). Both species are of increasing commercial interest for the development of germplasm adapted to environmental stresses. Predictive ability reached 0.16 in E. benthamii and 0.44 in E. pellita for diameter growth. Predictive abilities using either Genomic BLUP or different Bayesian methods were similar, suggesting that growth adequately fits the infinitesimal model. Genomic prediction models using ~5000-10,000 SNPs provided predictive abilities equivalent to using all 13,787 and 19,506 SNPs genotyped in the E. benthamii and E. pellita populations, respectively. No difference was detected in predictive ability when different sets of SNPs were utilized, based on position (equidistantly genome-wide, inside genes, linkage disequilibrium pruned or on single chromosomes), as long as the total number of SNPs used was above ~5000. Predictive abilities obtained by removing relatedness between training and validation sets fell near zero for E. benthamii and were halved for E. pellita. These results corroborate the current view that relatedness is the main driver of genomic prediction, although some short-range historical linkage disequilibrium (LD) was likely captured for E. pellita. A GWAS identified only one significant association for volume growth in E. pellita, illustrating the fact that while genome-wide regression is able to account for large proportions of the heritability, very little or none of it is captured into significant associations using GWAS in breeding populations of the size evaluated in this study. 
    This study provides further experimental data supporting positive prospects of using genome-wide data to capture large proportions of trait heritability and predict growth traits in trees with accuracies equal or better than those attainable by phenotypic selection. Additionally, our results document the superiority of the whole-genome regression approach in accounting for large proportions of the heritability of complex traits such as growth in contrast to the limited value of the local GWAS approach toward breeding applications in forest trees.

  12. Functional and Structural Consequence of Rare Exonic Single Nucleotide Polymorphisms: One Story, Two Tales

    PubMed Central

    Gu, Wanjun; Gurguis, Christopher I.; Zhou, Jin J.; Zhu, Yihua; Ko, Eun-A.; Ko, Jae-Hong; Wang, Ting; Zhou, Tong

    2015-01-01

    Genetic variation arising from single nucleotide polymorphisms (SNPs) is ubiquitously found among human populations. While disease-causing variants are known in some cases, identifying functional or causative variants for most human diseases remains a challenging task. Rare SNPs, rather than common ones, are thought to be more important in the pathology of most human diseases. We propose that rare SNPs should be divided into two categories dependent on whether the minor alleles are derived or ancestral. Derived alleles are less likely to have been purified by evolutionary processes and may be more likely to induce deleterious effects. We therefore hypothesized that the rare SNPs with derived minor alleles would be more important for human diseases and predicted that these variants would have larger functional or structural consequences relative to the rare variants for which the minor alleles are ancestral. We systematically investigated the consequences of the exonic SNPs on protein function, mRNA structure, and translation. We found that the functional and structural consequences are more significant for the rare exonic variants for which the minor alleles are derived. However, this pattern is reversed when the minor alleles are ancestral. Thus, the rare exonic SNPs with derived minor alleles are more likely to be deleterious. Age estimation of rare SNPs confirms that these potentially deleterious SNPs are recently evolved in the human population. These results have important implications for understanding the function of genetic variations in human exonic regions and for prioritizing functional SNPs in genome-wide association studies of human diseases. PMID:26454016

  13. BayesPI-BAR: a new biophysical model for characterization of regulatory sequence variations

    PubMed Central

    Wang, Junbai; Batmanov, Kirill

    2015-01-01

    Sequence variations in regulatory DNA regions are known to cause functionally important consequences for gene expression. DNA sequence variations may have an essential role in determining phenotypes and may be linked to disease; however, their identification through analysis of massive genome-wide sequencing data is a great challenge. In this work, a new computational pipeline, a Bayesian method for protein–DNA interaction with binding affinity ranking (BayesPI-BAR), is proposed for quantifying the effect of sequence variations on protein binding. BayesPI-BAR uses biophysical modeling of protein–DNA interactions to predict single nucleotide polymorphisms (SNPs) that cause significant changes in the binding affinity of a regulatory region for transcription factors (TFs). The method includes two new parameters (TF chemical potentials or protein concentrations and direct TF binding targets) that are neglected by previous methods. The new method is verified on 67 known human regulatory SNPs, of which 47 (70%) have predicted true TFs ranked in the top 10. Importantly, the performance of BayesPI-BAR, which uses principal component analysis to integrate multiple predictions from various TF chemical potentials, is found to be better than that of existing programs, such as sTRAP and is-rSNP, when evaluated on the same SNPs. BayesPI-BAR is a publicly available tool and is able to carry out parallelized computation, which helps to investigate a large number of TFs or SNPs and to detect disease-associated regulatory sequence variations in the sea of genome-wide noncoding regions. PMID:26202972

  14. 1-CMDb: A Curated Database of Genomic Variations of the One-Carbon Metabolism Pathway.

    PubMed

    Bhat, Manoj K; Gadekar, Veerendra P; Jain, Aditya; Paul, Bobby; Rai, Padmalatha S; Satyamoorthy, Kapaettu

    2017-01-01

    The one-carbon metabolism pathway is vital in maintaining tissue homeostasis by driving the critical reactions of folate and methionine cycles. A myriad of genetic and epigenetic events mark the rate of reactions in a tissue-specific manner. Integration of these to predict and provide personalized health management requires robust computational tools that can process multiomics data. The DNA sequences that may determine the chain of biological events and the endpoint reactions within one-carbon metabolism genes remain to be comprehensively recorded. Hence, we designed the one-carbon metabolism database (1-CMDb) as a platform to interrogate its association with a host of human disorders. DNA sequence and network information of a total of 48 genes were extracted from a literature survey and KEGG pathway that are involved in the one-carbon folate-mediated pathway. The information generated, collected, and compiled for all these genes from the UCSC genome browser included the single nucleotide polymorphisms (SNPs), CpGs, copy number variations (CNVs), and miRNAs, and a comprehensive database was created. Furthermore, a significant correlation analysis was performed for SNPs in the pathway genes. Detailed data of SNPs, CNVs, CpG islands, and miRNAs for 48 folate pathway genes were compiled. The SNPs in CNVs (9670), CpGs (984), and miRNAs (14) were also compiled for all pathway genes. The SIFT score, the prediction and PolyPhen score, as well as the prediction for each of the SNPs were tabulated and represented for folate pathway genes. Also included in the database for folate pathway genes were the links to 124 various phenotypes and disease associations as reported in the literature and from publicly available information. 
    A comprehensive database was generated consisting of genomic elements within and among SNPs, CNVs, CpGs, and miRNAs of one-carbon metabolism pathways to facilitate (a) single source of information and (b) integration into large-genome scale network analysis to be developed in the future by the scientific community. The database can be accessed at http://slsdb.manipal.edu/ocm/. © 2017 S. Karger AG, Basel.

  15. The Association of CD81 Polymorphisms with Alloimmunization in Sickle Cell Disease

    PubMed Central

    Tatari-Calderone, Zohreh; Tamouza, Ryad; Le Bouder, Gama P.; Dewan, Ramita; Luban, Naomi L. C.; Lasserre, Jacqueline; Maury, Jacqueline; Lionnet, François; Krishnamoorthy, Rajagopal; Girot, Robert

    2013-01-01

    The goal of the present work was to identify the candidate genetic markers predictive of alloimmunization in sickle cell disease (SCD). Red blood cell (RBC) transfusion is indicated for acute treatment, prevention, and abrogation of some complications of SCD. A well-known consequence of multiple RBC transfusions is alloimmunization. Given that a subset of SCD patients develop multiple RBC allo-/autoantibodies, while others do not in a similar multiple transfusional setting, we investigated a possible genetic basis for alloimmunization. Biomarkers that predict susceptibility to alloimmunization could identify patients at risk before the onset of a transfusion program and thus may have important implications for clinical management. In addition, such markers could shed light on the mechanism(s) underlying alloimmunization. We genotyped 27 single nucleotide polymorphisms (SNPs) in the CD81, CHRNA10, and ARHG genes in two groups of SCD patients. One group (35) of patients developed alloantibodies, and another (40) had no alloantibodies despite having received multiple transfusions. Two SNPs in the CD81 gene, which encodes a molecule involved in the signal modulation of B lymphocytes, show a strong association with alloimmunization. If confirmed in prospective studies with larger cohorts, the two SNPs identified in this retrospective study could serve as predictive biomarkers for alloimmunization. PMID:23762099

  16. An empirical comparison of SNPs and microsatellites for parentage and kinship assignment in a wild sockeye salmon (Oncorhynchus nerka) population.

    PubMed

    Hauser, Lorenz; Baird, Melissa; Hilborn, Ray; Seeb, Lisa W; Seeb, James E

    2011-03-01

    Because of their high variability, microsatellites are still considered the marker of choice for studies on parentage and kinship in wild populations. Nevertheless, single nucleotide polymorphisms (SNPs) are becoming increasingly popular in many areas of molecular ecology, owing to their high throughput, easy transferability between laboratories and low genotyping error. An ongoing discussion concerns the relative power of SNPs compared to microsatellites, that is, how many SNP loci are needed to replace a panel of microsatellites? Here, we evaluate the assignment power of 80 SNPs (H(E) = 0.30, 80 independent alleles) and 11 microsatellites (H(E) = 0.85, 192 independent alleles) in a wild population of about 400 sockeye salmon with two commonly used software packages (Cervus3, Colony2) and, for SNPs only, a newly developed software (SNPPIT). Assignment success was higher for SNPs than for microsatellites, especially for parent pairs, irrespective of the method used. Colony2 assigned a larger proportion of offspring to at least one parent than the other methods, although Cervus and SNPPIT detected more parent pairs. Identification of full-sib groups without parental information from relatedness measures was possible using both marker systems, although explicit reconstruction of such groups in Colony2 was impossible for SNPs because of computation time. Our results confirm the applicability of SNPs for parentage analyses and refute the predictability of assignment success from the number of independent alleles. © 2011 Blackwell Publishing Ltd.
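
    The two marker summaries quoted above can be illustrated with a short sketch. Expected heterozygosity at a locus is one minus the sum of squared allele frequencies, and the number of independent alleles is conventionally the number of alleles minus one, summed over loci (which gives 80 for 80 biallelic SNPs, matching the abstract). The function names and the allele frequencies below are illustrative, not data from the study.

```python
def expected_heterozygosity(allele_freqs):
    """H_E = 1 - sum(p_i^2) for one locus, given allele frequencies summing to 1."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


def independent_alleles(alleles_per_locus):
    """Sum of (number of alleles - 1) over all loci."""
    return sum(k - 1 for k in alleles_per_locus)


# A biallelic SNP cannot exceed H_E = 0.5, reached at equal allele frequencies.
max_snp_he = expected_heterozygosity([0.5, 0.5])
# 80 biallelic SNPs contribute 80 independent alleles.
snp_panel_alleles = independent_alleles([2] * 80)
```

    The ceiling of 0.5 for a biallelic locus is why many more SNPs than microsatellites are needed to reach a comparable number of independent alleles.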

  17. Functional evaluation of genetic variants associated with endometriosis near GREB1.

    PubMed

    Fung, Jenny N; Holdsworth-Carson, Sarah J; Sapkota, Yadav; Zhao, Zhen Zhen; Jones, Lincoln; Girling, Jane E; Paiva, Premila; Healey, Martin; Nyholt, Dale R; Rogers, Peter A W; Montgomery, Grant W

    2015-05-01

    Do DNA variants in the growth regulation by estrogen in breast cancer 1 (GREB1) region regulate endometrial GREB1 expression and increase the risk of developing endometriosis in women? We identified new single nucleotide polymorphisms (SNPs) with strong association with endometriosis at the GREB1 locus although we did not detect altered GREB1 expression in endometriosis patients with defined genotypes. Genome-wide association studies have identified the GREB1 region on chromosome 2p25.1 for increasing endometriosis risk. The differential expression of GREB1 has also been reported by others in association with endometriosis disease phenotype. Fine mapping studies comprehensively evaluated SNPs within the GREB1 region in a large-scale data set (>2500 cases and >4000 controls). Publicly available bioinformatics tools were employed to functionally annotate SNPs showing the strongest association signal with endometriosis risk. Endometrial GREB1 mRNA and protein expression was studied with respect to phases of the menstrual cycle (n = 2-45 per cycle stage) and expression quantitative trait loci (eQTL) analysis for significant SNPs were undertaken for GREB1 [mRNA (n = 94) and protein (n = 44) in endometrium]. Participants in this study are females who provided blood and/or endometrial tissue samples in a hospital setting. The key SNPs were genotyped using Sequenom MassARRAY. The functional roles and regulatory annotations for identified SNPs are predicted by various publicly available bioinformatics tools. Endometrial GREB1 expression work employed qRT-PCR, western blotting and immunohistochemistry studies. Fine mapping results identified a number of SNPs showing stronger association (0.004 < P < 0.032) with endometriosis risk than the original GWAS SNP (rs13394619) (P = 0.034). Some of these SNPs were predicted to have functional roles, for example, interaction with transcription factor motifs. 
    The haplotype (a combination of alleles) formed by the risk alleles from two common SNPs showed significant association (P = 0.026) with endometriosis and epistasis analysis showed no evidence for interaction between the two SNPs, suggesting an additive effect of SNPs on endometriosis risk. In normal human endometrium, GREB1 protein expression was altered depending on the cycle stage (significantly different in late proliferative versus late secretory, P < 0.05) and cell type (glandular epithelium, not stromal cells). However, GREB1 expression in endometriosis cases versus controls and eQTL analyses did not reveal any significant changes. In silico prediction tools are generally based on cell lines different to our tissue and disease of interest. Functional annotations drawn from these analyses should be considered with this limitation in mind. We identified cell-specific and hormone-specific changes in GREB1 protein expression. The lack of a significant difference observed following our GREB1 expression studies may be the result of moderate power on mixed cell populations in the endometrial tissue samples. This study further implicates the GREB1 region on chromosome 2p25.1 and the GREB1 gene with involvement in endometriosis risk. More detailed functional studies are required to determine the role of the novel GREB1 transcripts in endometriosis pathophysiology. Funding for this work was provided by NHMRC Project Grants APP1012245, APP1026033, APP1049472 and APP1046880. There are no competing interests. © The Author 2015. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  18. Use of Multiple Metabolic and Genetic Markers to Improve the Prediction of Type 2 Diabetes: the EPIC-Potsdam Study

    PubMed Central

    Schulze, Matthias B.; Weikert, Cornelia; Pischon, Tobias; Bergmann, Manuela M.; Al-Hasani, Hadi; Schleicher, Erwin; Fritsche, Andreas; Häring, Hans-Ulrich; Boeing, Heiner; Joost, Hans-Georg

    2009-01-01

    OBJECTIVE We investigated whether metabolic biomarkers and single nucleotide polymorphisms (SNPs) improve diabetes prediction beyond age, anthropometry, and lifestyle risk factors. RESEARCH DESIGN AND METHODS A case-cohort study within a prospective study was designed. We randomly selected a subcohort (n = 2,500) from 26,444 participants, of whom 1,962 were diabetes free at baseline. Of the 801 incident type 2 diabetes cases identified in the cohort during 7 years of follow-up, 579 remained for analyses after exclusions. Prediction models were compared by receiver operating characteristic (ROC) curve and integrated discrimination improvement. RESULTS Case-control discrimination by the lifestyle characteristics (ROC-AUC: 0.8465) improved with plasma glucose (ROC-AUC: 0.8672, P < 0.001) and A1C (ROC-AUC: 0.8859, P < 0.001). ROC-AUC further improved with HDL cholesterol, triglycerides, γ-glutamyltransferase, and alanine aminotransferase (0.9000, P = 0.002). Twenty SNPs did not improve discrimination beyond these characteristics (P = 0.69). CONCLUSIONS Metabolic markers, but not genotyping for 20 diabetogenic SNPs, improve discrimination of incident type 2 diabetes beyond lifestyle risk factors. PMID:19720844
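
    The ROC-AUC values compared above can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. The sketch below computes the statistic directly from that pairwise definition; it is not the study's analysis code (which also had to account for the case-cohort design), and the function name and scores are hypothetical.

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of risk scores."""
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0   # case correctly ranked above control
            elif case == control:
                wins += 0.5   # ties count as half
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks: 8 of 9 case-control pairs are ranked correctly.
example_auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

    Seen this way, adding a set of markers improves the AUC only if it reorders case-control pairs that the existing predictors ranked wrongly, which is a demanding bar once the baseline AUC is already near 0.90.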

  19. Effective detection of human leukocyte antigen risk alleles in celiac disease using tag single nucleotide polymorphisms.

    PubMed

    Monsuur, Alienke J; de Bakker, Paul I W; Zhernakova, Alexandra; Pinto, Dalila; Verduijn, Willem; Romanos, Jihane; Auricchio, Renata; Lopez, Ana; van Heel, David A; Crusius, J Bart A; Wijmenga, Cisca

    2008-05-28

    The HLA genes, located in the MHC region on chromosome 6p21.3, play an important role in many autoimmune disorders, such as celiac disease (CD), type 1 diabetes (T1D), rheumatoid arthritis, multiple sclerosis, psoriasis and others. Known HLA variants that confer risk to CD, for example, include DQA1*05/DQB1*02 (DQ2.5) and DQA1*03/DQB1*0302 (DQ8). To diagnose the majority of CD patients and to study disease susceptibility and progression, typing these strongly associated HLA risk factors is of utmost importance. However, current genotyping methods for HLA risk factors involve many reactions, and are complicated and expensive. We sought a simple experimental approach using tagging SNPs that predict the CD-associated HLA risk factors. Our tagging approach exploits linkage disequilibrium between single nucleotide polymorphisms (SNPs) and the CD-associated HLA risk factors DQ2.5 and DQ8 that indicate direct risk, and DQA1*0201/DQB1*0202 (DQ2.2) and DQA1*0505/DQB1*0301 (DQ7) that contribute to the risk of DQ2.5 to CD. To evaluate the predictive power of this approach, we performed an empirical comparison of the predicted DQ types, based on these six tag SNPs, with those obtained with current validated laboratory typing methods of the HLA-DQA1 and -DQB1 genes in three large cohorts. The results were validated in three European celiac populations. Using this method, only six SNPs were needed to predict the risk types carried by >95% of CD patients. We determined that for this tagging approach the sensitivity was >0.991, specificity >0.996 and the predictive value >0.948. Our results show that this tag SNP method is very accurate and provides an excellent basis for population screening for CD. This method is broadly applicable in European populations.
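
The sensitivity, specificity and predictive value quoted above all derive from a 2×2 comparison of predicted versus laboratory-typed DQ status. The sketch below shows the standard definitions; it assumes "predictive value" means positive predictive value, and the counts are hypothetical rather than taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, positive predictive value)
    from the four cells of a 2x2 confusion table."""
    sensitivity = tp / (tp + fn)   # true carriers correctly predicted
    specificity = tn / (tn + fp)   # non-carriers correctly predicted
    ppv = tp / (tp + fp)           # predicted carriers that are true carriers
    return sensitivity, specificity, ppv


# Hypothetical counts (assumption): tag-SNP prediction vs. reference typing.
sens, spec, ppv = diagnostic_metrics(tp=90, fp=20, tn=180, fn=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} PPV={ppv:.3f}")
```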

  20. Characterization and machine learning prediction of allele-specific DNA methylation.

    PubMed

    He, Jianlin; Sun, Ming-an; Wang, Zhong; Wang, Qianfei; Li, Qing; Xie, Hehuang

    2015-12-01

    A large collection of Single Nucleotide Polymorphisms (SNPs) has been identified in the human genome. Currently, the epigenetic influences of SNPs on their neighboring CpG sites remain elusive. A growing body of evidence suggests that locus-specific information, including genomic features and local epigenetic state, may play important roles in the epigenetic readout of SNPs. In this study, we made use of mouse methylomes with known SNPs to develop statistical models for the prediction of SNP associated allele-specific DNA methylation (ASM). ASM has been classified into parent-of-origin dependent ASM (P-ASM) and sequence-dependent ASM (S-ASM), which comprises scattered-S-ASM (sS-ASM) and clustered-S-ASM (cS-ASM). We found that P-ASM and cS-ASM CpG sites are both enriched in CpG rich regions, promoters and exons, while sS-ASM CpG sites are enriched in simple repeat and regions with high frequent SNP occurrence. Using Lasso-grouped Logistic Regression (LGLR), we selected 21 out of 282 genomic and methylation related features that are powerful in distinguishing cS-ASM CpG sites and trained the classifiers with machine learning techniques. Based on 5-fold cross-validation, the logistic regression classifier was found to be the best for cS-ASM prediction with an ACC of 0.77, an AUC of 0.84 and an MCC of 0.54. Lastly, we applied the logistic regression classifier on human brain methylome and predicted 608 genes associated with cS-ASM. Gene ontology term enrichment analysis indicated that these cS-ASM associated genes are significantly enriched in the category coding for transcripts with alternative splicing forms. In summary, this study provided an analytical procedure for cS-ASM prediction and shed new light on the understanding of different types of ASM events. Published by Elsevier Inc.
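
For readers unfamiliar with the ACC and MCC figures reported for the classifier, the following sketch computes both from a confusion table. The function name and the counts are hypothetical; they are not the study's cross-validation results.

```python
import math


def acc_and_mcc(tp, fp, tn, fn):
    """Accuracy and Matthews correlation coefficient for a binary classifier."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


# Hypothetical confusion table (assumption, for illustration only).
acc, mcc = acc_and_mcc(tp=40, fp=10, tn=40, fn=10)
print(f"ACC={acc:.2f} MCC={mcc:.2f}")
```

Unlike accuracy, MCC stays informative when the two classes are very unequal in size, which is one reason it is often reported alongside ACC and AUC.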

  1. Performance of genetic risk factors in prediction of trichloroethylene induced hypersensitivity syndrome.

    PubMed

    Dai, Yufei; Chen, Ying; Huang, Hanlin; Zhou, Wei; Niu, Yong; Zhang, Mingrong; Bin, Ping; Dong, Haiyan; Jia, Qiang; Huang, Jianxun; Yi, Juan; Liao, Qijun; Li, Haishan; Teng, Yanxia; Zang, Dan; Zhai, Qingfeng; Duan, Huawei; Shen, Juan; He, Jiaxi; Meng, Tao; Sha, Yan; Shen, Meili; Ye, Meng; Jia, Xiaowei; Xiang, Yingping; Huang, Huiping; Wu, Qifeng; Shi, Mingming; Huang, Xianqing; Yang, Huanming; Luo, Longhai; Li, Sai; Li, Lin; Zhao, Jinyang; Li, Laiyu; Wang, Jun; Zheng, Yuxin

    2015-07-20

    Trichloroethylene induced hypersensitivity syndrome is a dose-independent and potentially life-threatening disease, which has become one of the serious occupational health issues and requires intensive treatment. To discover the genetic risk factors and evaluate the performance of a risk prediction model for the disease, we conducted a genome-wide association study and a replication study with a total of 174 cases and 1761 trichloroethylene-tolerant controls. Fifty-seven SNPs that exceeded the threshold for genome-wide significance (P < 5 × 10^-8) were found to be related to the disease, among which two independent SNPs were identified: rs2857281 at MICA (odds ratio, 11.92; P meta = 1.33 × 10^-37) and rs2523557 between HLA-B and MICA (odds ratio, 7.33; P meta = 8.79 × 10^-35). The genetic risk score with these two SNPs explains at least 20.9% of the disease variance and up to 32.5-fold variation in inter-individual risk. Combining the two SNPs as predictors for the disease would have an accuracy of 80.73%; the area under the receiver operating characteristic curve (AUC) was 0.82, with sensitivity of 74% and specificity of 85%, which was considered to be excellent discrimination for the disease and could be considered for translational application in screening employees before exposure.
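
One common way to build a two-SNP genetic risk score is a weighted sum of risk-allele counts, with each weight equal to the natural log of the reported odds ratio. The sketch below uses the odds ratios quoted in the abstract, but the additive per-allele weighting is an assumption made here for illustration; the abstract does not state how the authors constructed their score.

```python
import math

# Weights = ln(odds ratio) as reported in the abstract.
# Treating them as per-allele additive effects is an assumption.
WEIGHTS = {
    "rs2857281": math.log(11.92),
    "rs2523557": math.log(7.33),
}


def genetic_risk_score(risk_allele_counts):
    """Sum of ln(OR) weights times the number of risk alleles (0, 1 or 2)."""
    return sum(WEIGHTS[snp] * count for snp, count in risk_allele_counts.items())


# Hypothetical genotypes (assumption): non-carrier vs. one risk allele at each SNP.
non_carrier = genetic_risk_score({"rs2857281": 0, "rs2523557": 0})
carrier = genetic_risk_score({"rs2857281": 1, "rs2523557": 1})
print(f"score(non-carrier)={non_carrier:.3f} score(carrier)={carrier:.3f}")
```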

  2. SNP-markers in Allium species to facilitate introgression breeding in onion.

    PubMed

    Scholten, Olga E; van Kaauwen, Martijn P W; Shahin, Arwa; Hendrickx, Patrick M; Keizer, L C Paul; Burger, Karin; van Heusden, Adriaan W; van der Linden, C Gerard; Vosman, Ben

    2016-08-31

    Within onion, Allium cepa L., the availability of disease resistance is limited. The identification of sources of resistance in related species, such as Allium roylei and Allium fistulosum, was a first step towards the improvement of onion cultivars by breeding. SNP markers linked to resistance and polymorphic between these related species and onion cultivars are a valuable tool to efficiently introgress disease resistance genes. In this paper we describe the identification and validation of SNP markers valuable for onion breeding. Transcriptome sequencing resulted in 192 million RNA seq reads from the interspecific F1 hybrid between A. roylei and A. fistulosum (RF) and nine onion cultivars. After assembly, reliable SNPs were discovered in about 36 % of the contigs. For genotyping of the interspecific three-way cross population, derived from a cross between an onion cultivar and the RF (CCxRF), 1100 SNPs that are polymorphic in RF and monomorphic in the onion cultivars (RF SNPs) were selected for the development of KASP assays. A molecular linkage map based on 667 RF-SNP markers was constructed for CCxRF. In addition, KASP assays were developed for 1600 onion-SNPs (SNPs polymorphic among onion cultivars). A second linkage map was constructed for an F2 of onion x A. roylei (F2(CxR)) that consisted of 182 onion-SNPs and 119 RF-SNPs, and 76 previously mapped markers. Markers co-segregating in both the F2(CxR) and the CCxRF population were used to assign the linkage groups of RF to onion chromosomes. To validate usefulness of these SNP markers, QTL mapping was applied in the CCxRF population that segregates for resistance to Botrytis squamosa and resulted in a QTL for resistance on chromosome 6 of A. roylei. Our research has more than doubled the publicly available marker sequences of expressed onion genes and two onion-related species. It resulted in a detailed genetic map for the interspecific CCxRF population. 
    This is the first paper that reports the detection of a QTL for resistance to B. squamosa in A. roylei.

  3. Single nucleotide polymorphism discovery in rainbow trout by deep sequencing of a reduced representation library.

    PubMed

    Sánchez, Cecilia Castaño; Smith, Timothy P L; Wiedmann, Ralph T; Vallejo, Roger L; Salem, Mohamed; Yao, Jianbo; Rexroad, Caird E

    2009-11-25

    To enhance capabilities for genomic analyses in rainbow trout, such as genomic selection, a large suite of polymorphic markers that are amenable to high-throughput genotyping protocols must be identified. Expressed Sequence Tags (ESTs) have been used for single nucleotide polymorphism (SNP) discovery in salmonids. In those strategies, the salmonid semi-tetraploid genomes often led to assemblies of paralogous sequences and therefore resulted in a high rate of false positive SNP identification. Sequencing genomic DNA using primers identified from ESTs proved to be an effective but time consuming methodology of SNP identification in rainbow trout, therefore not suitable for high throughput SNP discovery. In this study, we employed a high-throughput strategy that used pyrosequencing technology to generate data from a reduced representation library constructed with genomic DNA pooled from 96 unrelated rainbow trout that represent the National Center for Cool and Cold Water Aquaculture (NCCCWA) broodstock population. The reduced representation library consisted of 440 bp fragments resulting from complete digestion with the restriction enzyme HaeIII; sequencing produced 2,000,000 reads providing an average 6 fold coverage of the estimated 150,000 unique genomic restriction fragments (300,000 fragment ends). Three independent data analyses identified 22,022 to 47,128 putative SNPs on 13,140 to 24,627 independent contigs. A set of 384 putative SNPs, randomly selected from the sets produced by the three analyses were genotyped on individual fish to determine the validation rate of putative SNPs among analyses, distinguish apparent SNPs that actually represent paralogous loci in the tetraploid genome, examine Mendelian segregation, and place the validated SNPs on the rainbow trout linkage map. Approximately 48% (183) of the putative SNPs were validated; 167 markers were successfully incorporated into the rainbow trout linkage map. 
    In addition, 2% of the sequences from the validated markers were associated with rainbow trout transcripts. The use of reduced representation libraries and pyrosequencing technology proved to be an effective strategy for the discovery of a high number of putative SNPs in rainbow trout; however, modifications to the technique to decrease the false discovery rate resulting from the evolutionarily recent genome duplication would be desirable.
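
The coverage and validation figures in this abstract can be checked with back-of-the-envelope arithmetic. The sketch below uses only numbers quoted in the abstract; the coverage value is a naive upper bound that assumes every read maps to exactly one fragment end, which is an assumption and consistent with the reported average of about 6-fold.

```python
# Figures quoted in the abstract.
reads = 2_000_000
unique_fragments = 150_000
fragment_ends = 2 * unique_fragments          # 300,000 sequenced ends

# Naive coverage, assuming every read lands on one fragment end.
naive_coverage = reads / fragment_ends

putative_snps_tested = 384
validated = 183
mapped = 167

validation_rate = validated / putative_snps_tested   # reported as ~48%
mapped_fraction = mapped / validated                 # validated SNPs placed on the map

print(f"naive coverage  = {naive_coverage:.2f}x")
print(f"validation rate = {validation_rate:.1%}")
print(f"mapped fraction = {mapped_fraction:.1%}")
```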

  4. Associations between RNA splicing regulatory variants of stemness-related genes and racial disparities in susceptibility to prostate cancer.

    PubMed

    Wang, Yanru; Freedman, Jennifer A; Liu, Hongliang; Moorman, Patricia G; Hyslop, Terry; George, Daniel J; Lee, Norman H; Patierno, Steven R; Wei, Qingyi

    2017-08-15

    Evidence suggests that cells with a stemness phenotype play a pivotal role in oncogenesis, and prostate cells exhibiting this phenotype have been identified. We used two genome-wide association study (GWAS) datasets of African descendants, from the Multiethnic/Minority Cohort Study of Diet and Cancer (MEC) and the Ghana Prostate Study, and two GWAS datasets of non-Hispanic whites, from the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial and the Breast and Prostate Cancer Cohort Consortium (BPC3), to analyze the associations between genetic variants of stemness-related genes and racial disparities in susceptibility to prostate cancer. We evaluated associations of single-nucleotide polymorphisms (SNPs) in 25 stemness-related genes with prostate cancer risk in 1,609 cases and 2,550 controls of non-Hispanic whites (4,934 SNPs) and 1,144 cases and 1,116 controls of African descendants (5,448 SNPs) with correction by false discovery rate ≤0.2. We identified 32 SNPs in five genes (TP63, ALDH1A1, WNT1, MET and EGFR) that were significantly associated with prostate cancer risk, of which six SNPs in three genes (TP63, ALDH1A1 and WNT1) and eight EGFR SNPs showed heterogeneity in susceptibility between these two racial groups. In addition, 13 SNPs in MET and one in ALDH1A1 were found only in African descendants. The in silico bioinformatics analyses revealed that EGFR rs2072454 and SNPs in linkage with the identified SNPs in MET and ALDH1A1 (r² > 0.6) were predicted to regulate RNA splicing. These variants may serve as novel biomarkers for racial disparities in prostate cancer risk. © 2017 UICC.

  5. Genetic variants in the vitamin D pathway and breast cancer disease-free survival

    PubMed Central

    Brewster, Abenaa M.

    2013-01-01

    Epidemiological studies have investigated the association between vitamin D pathway genes and breast cancer risk; however, little is known about the association between vitamin D pathway genes and breast cancer prognosis. In a retrospective cohort of 1029 patients with early-stage breast cancer, we analyzed the association between 106 tagging single nucleotide polymorphisms (SNPs) in eight vitamin D pathway genes and breast cancer disease-free survival (DFS) using Cox regression analysis adjusted for known prognostic variables. Using a false discovery rate of 10%, six intronic SNPs were significantly associated with poorer DFS: retinoid-X receptor alpha (RXRA) SNPs (rs881658, rs11185659, rs10881583, rs881657 and rs7864987) and plasminogen activator and urokinase receptor (PLAUR) SNP (rs4251864). Treatment received (no systemic therapy, hormone therapy alone or chemotherapy) was an effect modifier of the RXRA SNPs association with DFS (P < 0.05); therefore, we stratified further analysis by treatment group. Among patients who did not receive systemic therapy, RXRA SNP [rs10881583 (P = 0.02)] was associated with poorer DFS, and among patients who received chemotherapy, RXRA SNPs (rs881658, rs11185659, rs10881583, rs881657 and rs7864987) were associated with poorer DFS (P < 0.001 for all SNPs). However, RXRA SNPs: rs10881583 (P < 0.001) and rs881657 (P = 0.02) were associated with improved DFS in patients treated with hormone therapy alone. Our results suggest that SNPs in the RXRA and PLAUR genes in the vitamin D pathway may contribute to breast cancer DFS. In particular, SNPs in RXRA may predict for poorer or improved DFS in patients, according to type of systemic treatment received. If validated, these markers could be used for risk stratification of breast cancer patients. PMID:23180655
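
Selecting SNPs at a false discovery rate of 10% is commonly done with the Benjamini-Hochberg step-up procedure. The abstract does not name the method used, so the sketch below is an assumption about one standard approach, with hypothetical p-values.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return the indices of hypotheses rejected by the Benjamini-Hochberg
    step-up procedure at the given false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its threshold.
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical per-SNP p-values (assumption, for illustration only).
p_vals = [0.001, 0.04, 0.03, 0.5, 0.012]
significant = benjamini_hochberg(p_vals, fdr=0.10)
print(significant)
```

Because the procedure is step-up, a p-value above its own threshold can still be rejected when a larger p-value further down the sorted list passes.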

  6. Optimizing Training Population Size and Genotyping Strategy for Genomic Prediction Using Association Study Results and Pedigree Information. A Case of Study in Advanced Wheat Breeding Lines.

    PubMed

    Cericola, Fabio; Jahoor, Ahmed; Orabi, Jihad; Andersen, Jeppe R; Janss, Luc L; Jensen, Just

    2017-01-01

    Wheat breeding programs generate a large amount of variation which cannot be completely explored because of limited phenotyping throughput. Genomic prediction (GP) has been proposed as a new tool which provides breeding value estimates by phenotyping only a subset of the material produced, named the training population (TP), rather than all of it. However, genotyping of all the accessions under analysis is needed and, therefore, optimizing TP size and genotyping strategy is pivotal to implementing GP in commercial breeding schemes. Here, we explored the optimum TP size and we integrated pedigree records and genome-wide association study (GWAS) results to optimize the genotyping strategy. A total of 988 advanced wheat breeding lines were genotyped with the Illumina 15K SNP wheat chip and phenotyped across several years and locations for yield, lodging, and starch content. Cross-validation using the largest possible TP size and all the SNPs available after editing (~11k) yielded predictive abilities (rGP) ranging between 0.5 and 0.6. To explore the effect of TP size, rGP were computed using progressively smaller TP. These exercises showed that a TP of around 700 lines was enough to yield the highest observed rGP. Moreover, rGP were calculated after randomly reducing the number of SNPs. This showed that around 1K markers were enough to reach the highest observed rGP. GWAS was used to identify markers associated with the traits analyzed. A GWAS-based selection of SNPs resulted in increased rGP when compared with random selection, and a few hundred SNPs were sufficient to obtain the highest observed rGP. For each of these scenarios, advantages of adding the pedigree information were shown. Our results indicate that moderate TP sizes were enough to yield high rGP and that pedigree information and GWAS results can be used to greatly optimize the genotyping strategy.

  7. Simple Decision-Analytic Functions of the AUC for Ruling Out a Risk Prediction Model and an Added Predictor.

    PubMed

    Baker, Stuart G

    2018-02-01

    When using risk prediction models, an important consideration is weighing performance against the cost (monetary and harms) of ascertaining predictors. The minimum test tradeoff (MTT) for ruling out a model is the minimum number of all-predictor ascertainments per correct prediction to yield a positive overall expected utility. The MTT for ruling out an added predictor is the minimum number of added-predictor ascertainments per correct prediction to yield a positive overall expected utility. An approximation to the MTT for ruling out a model is $1/[P \cdot H(\mathrm{AUC}_{\mathrm{model}})]$, where $H(\mathrm{AUC}) = \mathrm{AUC} - \{\tfrac{1}{2}(1-\mathrm{AUC})\}^{1/2}$, AUC is the area under the receiver operating characteristic (ROC) curve, and P is the probability of the predicted event in the target population. An approximation to the MTT for ruling out an added predictor is $1/[P \{H(\mathrm{AUC}_{\mathrm{Model:2}}) - H(\mathrm{AUC}_{\mathrm{Model:1}})\}]$, where Model 2 includes an added predictor relative to Model 1. The latter approximation requires the Tangent Condition that the true positive rate at the point on the ROC curve with a slope of 1 is larger for Model 2 than Model 1. These approximations are suitable for back-of-the-envelope calculations. For example, in a study predicting the risk of invasive breast cancer, Model 2 adds to the predictors in Model 1 a set of 7 single nucleotide polymorphisms (SNPs). Based on the AUCs and the Tangent Condition, an MTT of 7200 was computed, which indicates that 7200 sets of SNPs are needed for every correct prediction of breast cancer to yield a positive overall expected utility. If ascertaining the SNPs costs $500, this MTT suggests that SNP ascertainment is not likely worthwhile for this risk prediction.
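
The two approximations in this abstract are simple enough to evaluate directly. The sketch below implements H(AUC) = AUC - sqrt((1 - AUC)/2) and both MTT expressions as read from the abstract; the function names are hypothetical, and the event probability and AUC values in the example are illustrative assumptions, not the figures behind the reported MTT of 7200.

```python
import math


def h(auc):
    """H(AUC) = AUC - sqrt(0.5 * (1 - AUC)), as given in the abstract."""
    return auc - math.sqrt(0.5 * (1.0 - auc))


def mtt_model(p, auc):
    """Approximate MTT for ruling out a model: 1 / (P * H(AUC))."""
    return 1.0 / (p * h(auc))


def mtt_added_predictor(p, auc_model1, auc_model2):
    """Approximate MTT for ruling out an added predictor:
    1 / (P * (H(AUC_2) - H(AUC_1))). Assumes the Tangent Condition holds."""
    return 1.0 / (p * (h(auc_model2) - h(auc_model1)))


# Hypothetical inputs (assumption): 2% event probability, AUC 0.60 -> 0.62.
mtt = mtt_added_predictor(p=0.02, auc_model1=0.60, auc_model2=0.62)
print(f"MTT for the added predictor is about {mtt:.0f}")
```

Note that H(0.5) = 0 and H(1) = 1, so the model-level MTT grows without bound as the AUC approaches 0.5 and shrinks to 1/P for a perfect classifier.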

  8. Transcriptome assembly and identification of genes and SNPs associated with growth traits in largemouth bass (Micropterus salmoides).

    PubMed

    Li, Shengjie; Liu, Hao; Bai, Junjie; Zhu, Xinping

    2017-04-01

    Growth is one of the most crucial economic traits of all aquaculture species, but the molecular mechanisms involved in growth of largemouth bass (Micropterus salmoides) are poorly understood. The objective of this study was to screen growth-related genes of M. salmoides by RNA sequencing and identify growth-related single-nucleotide polymorphism (SNP) markers through a growth association study. The muscle transcriptomes of fast- and slow-growing largemouth bass were obtained using the RNA-Seq technique. A total of 54,058,178 and 54,742,444 qualified Illumina read pairs were obtained for the fast-growing and slow-growing groups, respectively, giving rise to 4,865,236,020 and 4,926,819,960 total clean bases, respectively. Gene expression profiling showed that 3,530 unigenes were differentially expressed between the fast-growing and slow-growing phenotypes (false discovery rate ≤0.001, the absolute value of log2(fold change) ≥1), including 1,441 up-regulated and 2,889 down-regulated unigenes in the fast-growing largemouth bass. Analysis of these genes revealed that several signalling pathways, including the growth hormone-insulin-like growth factor 1 axis and signalling pathway, the glycolysis pathway, and the myostatin/transforming growth factor beta signalling pathway, as well as heat shock protein, cytoskeleton, and myofibril component genes might be associated with muscle growth. From these genes, 10 genes with putative SNPs were selected, and 17 SNPs were genotyped successfully. Marker-trait analysis in 340 individuals of Youlu No. 1 largemouth bass revealed three SNPs associated with growth in key genes (phosphoenolpyruvate carboxykinase 1, FOXO3b, and heat shock protein beta-1). This research provides information about key genes and SNPs related to growth, providing new clues to understanding the molecular basis of largemouth bass growth.
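
The differential-expression criterion quoted above (false discovery rate ≤ 0.001 and |log2(fold change)| ≥ 1) amounts to a two-condition filter. A minimal sketch follows; the function name, unigene identifiers and values are hypothetical and do not come from the study.

```python
import math


def is_differentially_expressed(fold_change, fdr, max_fdr=0.001, min_abs_log2fc=1.0):
    """Apply the two thresholds from the abstract: FDR <= 0.001 and
    |log2(fold change)| >= 1 (i.e. at least a 2-fold change either way)."""
    return fdr <= max_fdr and abs(math.log2(fold_change)) >= min_abs_log2fc


# Hypothetical unigenes: (fold change fast/slow, FDR). Values are assumptions.
unigenes = {
    "unigene_A": (2.0, 0.0005),   # 2-fold up, passes
    "unigene_B": (0.5, 0.0001),   # 2-fold down, passes
    "unigene_C": (1.5, 0.0001),   # change too small
    "unigene_D": (4.0, 0.01),     # FDR too high
}
selected = sorted(
    name for name, (fc, fdr) in unigenes.items()
    if is_differentially_expressed(fc, fdr)
)
print(selected)
```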

  9. Genetic Diversity and Population Structure of Whitebark Pine (Pinus albicaulis Engelm.) in Western North America

    PubMed Central

    Liu, Jun-Jun; Sniezko, Richard; Murray, Michael; Wang, Ning; Chen, Hao; Zamany, Arezoo; Sturrock, Rona N.; Savin, Douglas; Kegley, Angelia

    2016-01-01

    Whitebark pine (WBP, Pinus albicaulis Engelm.) is an endangered conifer species due to heavy mortality from white pine blister rust (WPBR, caused by Cronartium ribicola) and mountain pine beetle (Dendroctonus ponderosae). Information about genetic diversity and population structure is of fundamental importance for its conservation and restoration. However, current knowledge on the genetic constitution and genomic variation is still limited for WBP. In this study, an integrated genomics approach was applied to characterize seed collections from WBP breeding programs in western North America. RNA-seq analysis was used for de novo assembly of the WBP needle transcriptome, which contains 97,447 protein-coding transcripts. Within the transcriptome, single nucleotide polymorphisms (SNPs) were discovered, and more than 22,000 of them were non-synonymous SNPs (ns-SNPs). Following the annotation of genes with ns-SNPs, 216 ns-SNPs within candidate genes with putative functions in disease resistance and plant defense were selected to design SNP arrays for high-throughput genotyping. Among these SNP loci, 71 were highly polymorphic, with sufficient variation to identify a unique genotype for each of the 371 individuals originating from British Columbia (Canada), Oregon and Washington (USA). A clear genetic differentiation was evident among seed families. Analyses of genetic spatial patterns revealed varying degrees of diversity and the existence of several genetic subgroups in the WBP breeding populations. Genetic components were associated with geographic variables and phenotypic rating of WPBR disease severity across landscapes, which may facilitate further identification of WBP genotypes and gene alleles contributing to local adaptation and quantitative resistance to WPBR. 
    The WBP genomic resources developed here provide an invaluable tool for further studies and for exploitation and utilization of the genetic diversity preserved within this endangered conifer and other five-needle pines. PMID:27992468

  10. Structural impact analysis of missense SNPs present in the uroguanylin gene by long-term molecular dynamics simulations.

    PubMed

    Marcolino, Antonio C S; Porto, William F; Pires, Állan S; Franco, Octavio L; Alencar, Sérgio A

    2016-12-07

    The guanylate cyclase activator 2B, also known as uroguanylin, is part of the guanylin peptide family, which includes peptides such as guanylin and lymphoguanylin. The guanylin peptides could be related to sodium absorption inhibition and water secretion induction and their dysfunction may be related to various pathologies such as chronic renal failure, congestive heart failure and nephrotic syndrome. In addition, uroguanylin point mutations have been associated with essential hypertension. However, currently there are no studies on the impact of missense SNPs on uroguanylin structure. This study applied in silico SNP impact prediction tools to evaluate the impact of uroguanylin missense SNPs and to filter those considered as convergent deleterious, which were then further analyzed through long-term molecular dynamics simulations of 1 μs duration. The simulations suggested that all missense SNPs considered as convergent deleterious caused some kind of structural change to the uroguanylin peptide. Additionally, four of these SNPs were also shown to cause modifications in peptide flexibility, possibly resulting in functional changes. Copyright © 2016 Elsevier Ltd. All rights reserved.

  11. Role of genetic variation in docetaxel-induced neutropenia and pharmacokinetics.

    PubMed

    Nieuweboer, A J M; Smid, M; de Graan, A-J M; Elbouazzaoui, S; de Bruijn, P; Eskens, F A L M; Hamberg, P; Martens, J W M; Sparreboom, A; de Wit, R; van Schaik, R H N; Mathijssen, R H J

    2016-11-01

    Docetaxel is used for treatment of several solid malignancies. In this study, we aimed to predict docetaxel clearance and docetaxel-induced neutropenia by developing several genetic models. To this end, pharmacokinetic data and absolute neutrophil counts (ANCs) of 213 docetaxel-treated cancer patients were collected. Next, patients were genotyped for 1936 single nucleotide polymorphisms (SNPs) in 225 genes using the drug-metabolizing enzymes and transporters platform and thereafter split into two cohorts. The combination of SNPs that best predicted severe neutropenia or low clearance was selected in one cohort and validated in the other. Patients with severe neutropenia had lower docetaxel clearance than patients with ANCs in the normal range (P=0.01). Severe neutropenia was predicted with 70% sensitivity. True low clearance (1 s.d.

  12. A Larger Chocolate Chip-Development of a 15K Theobroma cacao L. SNP Array to Create High-Density Linkage Maps.

    PubMed

    Livingstone, Donald; Stack, Conrad; Mustiga, Guiliana M; Rodezno, Dayana C; Suarez, Carmen; Amores, Freddy; Feltus, Frank A; Mockaitis, Keithanne; Cornejo, Omar E; Motamayor, Juan C

    2017-01-01

    Cacao (Theobroma cacao L.) is an important cash crop in tropical regions around the world and has a rich agronomic history in South America. As a key component in the cosmetic and confectionery industries, millions of people worldwide use products made from cacao, ranging from shampoo to chocolate. An Illumina Infinium II array was created using 13,530 SNPs identified within a small diversity panel of cacao. Of these SNPs, 12,643 derive from variation within annotated cacao genes. The genotypes of 3,072 trees were obtained, including two mapping populations from Ecuador. High-density linkage maps for these two populations were generated and compared to the cacao genome assembly. Phenotypic data from these populations were combined with the linkage maps to identify the QTLs for yield and disease resistance.

  13. Folate network genetic variation, plasma homocysteine, and global genomic methylation content: a genetic association study

    PubMed Central

    2011-01-01

    Background Sequence variants in genes functioning in folate-mediated one-carbon metabolism are hypothesized to lead to changes in levels of homocysteine and DNA methylation, which, in turn, are associated with risk of cardiovascular disease. Methods 330 SNPs in 52 genes were studied in relation to plasma homocysteine and global genomic DNA methylation. SNPs were selected based on functional effects and gene coverage, and assays were completed on the Illumina Goldengate platform. Age-, smoking-, and nutrient-adjusted genotype-phenotype associations were estimated in regression models. Results Using a nominal P ≤ 0.005 threshold for statistical significance, 20 SNPs were associated with plasma homocysteine, 8 with Alu methylation, and 1 with LINE-1 methylation. Using a more stringent false discovery rate threshold, SNPs in FTCD, SLC19A1, and SLC19A3 genes remained associated with plasma homocysteine. Gene by vitamin B-6 interactions were identified for both Alu and LINE-1 methylation, and epistatic interactions with the MTHFR rs1801133 SNP were identified for the plasma homocysteine phenotype. Pleiotropy involving the MTHFD1L and SARDH genes for both plasma homocysteine and Alu methylation phenotypes was identified. Conclusions No single gene was associated with all three phenotypes, and the set of the most statistically significant SNPs predictive of homocysteine or Alu or LINE-1 methylation was unique to each phenotype. Genetic variation in folate-mediated one-carbon metabolism, other than the well-known effects of the MTHFR c.665C>T (known as c.677 C>T, rs1801133, p.Ala222Val), is predictive of cardiovascular disease biomarkers. PMID:22103680

  14. Association of Genetic Variants with Self-Assessed Color Categories in Brazilians

    PubMed Central

    Durso, Danielle Fernandes; Bydlowski, Sergio Paulo; Hutz, Mara Helena; Suarez-Kurtz, Guilherme; Magalhães, Tiago R.; Junho Pena, Sérgio Danilo

    2014-01-01

    The Brazilian population was formed by extensive admixture of three different ancestral roots: Amerindians, Europeans and Africans. Our previous work has shown that at an individual level, ancestry, as estimated using molecular markers, was a poor predictor of color in Brazilians. We now investigate whether SNPs known to be associated with human skin pigmentation can be used to predict color in Brazilians. To that end, we studied the association of fifteen SNPs, previously known to be linked with skin color, in 243 unrelated Brazilian individuals self-identified as White, Browns or Blacks from Rio de Janeiro and 212 unrelated Brazilian individuals self-identified as White or Blacks from São Paulo. The significance of association of SNP genotypes with self-assessed color was evaluated using partial regression analysis. After controlling for ancestry estimates as covariates, only four SNPs remained significantly associated with skin pigmentation: rs1426654 and rs2555364 within SLC24A5, rs16891982 at SLC45A2 and rs1042602 at TYR. These loci are known to be involved in melanin synthesis or transport of melanosomes. We found that neither genotypes of these SNPs, nor their combination with biogeographical ancestry in principal component analysis, could predict self-assessed color in Brazilians at an individual level. However, significant correlations did emerge at group level, demonstrating that even though elements other than skin, eye and hair pigmentation do influence self-assessed color in Brazilians, the sociological act of self-classification is still substantially dependent on genotype at these four SNPs. PMID:24416183

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Pei-Chun; Chen, Yen-Ching; Research Center for Gene, Environment, and Human Health, College of Public Health, National Taiwan University, Taiwan

    Purpose: To identify germline polymorphisms to predict concurrent chemoradiation therapy (CCRT) response in esophageal cancer patients. Materials and Methods: A total of 139 esophageal cancer patients treated with CCRT (cisplatin-based chemotherapy combined with 40 Gy of irradiation) and subsequent esophagectomy were recruited at the National Taiwan University Hospital between 1997 and 2008. After excluding confounding factors (i.e., females and patients aged ≥70 years), 116 patients were enrolled to identify single nucleotide polymorphisms (SNPs) associated with specific CCRT responses. Genotyping arrays and mass spectrometry were used sequentially to determine germline polymorphisms from blood samples. These polymorphisms remain stable throughout disease progression, unlike somatic mutations from tumor tissues. Two-stage design and additive genetic models were adopted in this study. Results: From the 26 SNPs identified in the first stage, 2 SNPs were found to be significantly associated with CCRT response in the second stage. Single nucleotide polymorphism rs16863886, located between SGPP2 and FARSB on chromosome 2q36.1, was significantly associated with a 3.93-fold increase in pathologic complete response to CCRT (95% confidence interval 1.62-10.30) under additive models. Single nucleotide polymorphism rs4954256, located in ZRANB3 on chromosome 2q21.3, was associated with a 3.93-fold increase in pathologic complete response to CCRT (95% confidence interval 1.57-10.87). The predictive accuracy for CCRT response was 71.59% with these two SNPs combined. Conclusions: This is the first study to identify germline polymorphisms with a high accuracy for predicting CCRT response in the treatment of esophageal cancer.

  16. Risk Prediction for Epithelial Ovarian Cancer in 11 United States–Based Case-Control Studies: Incorporation of Epidemiologic Risk Factors and 17 Confirmed Genetic Loci

    PubMed Central

    Clyde, Merlise A.; Palmieri Weber, Rachel; Iversen, Edwin S.; Poole, Elizabeth M.; Doherty, Jennifer A.; Goodman, Marc T.; Ness, Roberta B.; Risch, Harvey A.; Rossing, Mary Anne; Terry, Kathryn L.; Wentzensen, Nicolas; Whittemore, Alice S.; Anton-Culver, Hoda; Bandera, Elisa V.; Berchuck, Andrew; Carney, Michael E.; Cramer, Daniel W.; Cunningham, Julie M.; Cushing-Haugen, Kara L.; Edwards, Robert P.; Fridley, Brooke L.; Goode, Ellen L.; Lurie, Galina; McGuire, Valerie; Modugno, Francesmary; Moysich, Kirsten B.; Olson, Sara H.; Pearce, Celeste Leigh; Pike, Malcolm C.; Rothstein, Joseph H.; Sellers, Thomas A.; Sieh, Weiva; Stram, Daniel; Thompson, Pamela J.; Vierkant, Robert A.; Wicklund, Kristine G.; Wu, Anna H.; Ziogas, Argyrios; Tworoger, Shelley S.; Schildkraut, Joellen M.

    2016-01-01

    Previously developed models for predicting absolute risk of invasive epithelial ovarian cancer have included a limited number of risk factors and have had low discriminatory power (area under the receiver operating characteristic curve (AUC) < 0.60). Because of this, we developed and internally validated a relative risk prediction model that incorporates 17 established epidemiologic risk factors and 17 genome-wide significant single nucleotide polymorphisms (SNPs) using data from 11 case-control studies in the United States (5,793 cases; 9,512 controls) from the Ovarian Cancer Association Consortium (data accrued from 1992 to 2010). We developed a hierarchical logistic regression model for predicting case-control status that included imputation of missing data. We randomly divided the data into an 80% training sample and used the remaining 20% for model evaluation. The AUC for the full model was 0.664. A reduced model without SNPs performed similarly (AUC = 0.649). Both models performed better than a baseline model that included age and study site only (AUC = 0.563). The best predictive power was obtained in the full model among women younger than 50 years of age (AUC = 0.714); however, the addition of SNPs increased the AUC the most for women older than 50 years of age (AUC = 0.638 vs. 0.616). Adapting this improved model to estimate absolute risk and evaluating it in prospective data sets is warranted. PMID:27698005
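
    The AUC values quoted in this abstract (for example 0.664 for the full model against 0.563 for the baseline) can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The following is a minimal sketch of that interpretation only; the function name and the example scores are hypothetical and are not taken from the study.

```python
def auc(case_scores, control_scores):
    """Estimate AUC as the fraction of case/control pairs ranked correctly.

    A pair counts 1 when the case score exceeds the control score and 0.5
    when the two scores tie. All inputs here are illustrative assumptions.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks, not data from the study.
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

    An AUC of 0.5 corresponds to ranking cases and controls at random, which is why values near 0.56 to 0.66 indicate modest discrimination.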

  17. Transcriptome sequencing and marker development in winged bean (Psophocarpus tetragonolobus; Leguminosae)

    PubMed Central

    Vatanparast, Mohammad; Shetty, Prateek; Chopra, Ratan; Doyle, Jeff J.; Sathyanarayana, N.; Egan, Ashley N.

    2016-01-01

    Winged bean, Psophocarpus tetragonolobus (L.) DC., is similar to soybean in yield and nutritional value but more viable in tropical conditions. Here, we strengthen genetic resources for this orphan crop by producing a de novo transcriptome assembly and annotation of two Sri Lankan accessions (denoted herein as CPP34 [PI 491423] and CPP37 [PI 639033]), developing simple sequence repeat (SSR) markers, and identifying single nucleotide polymorphisms (SNPs) between geographically separated genotypes. A combined assembly based on 804,757 reads from two accessions produced 16,115 contigs with an N50 of 889 bp, over 90% of which has significant sequence similarity to other legumes. Combining contigs with singletons produced 97,241 transcripts. We identified 12,956 SSRs, including 2,594 repeats for which primers were designed and 5,190 high-confidence SNPs between Sri Lankan and Nigerian genotypes. The transcriptomic data sets generated here provide new resources for gene discovery and marker development in this orphan crop, and will be vital for future plant breeding efforts. We also analyzed the soybean trypsin inhibitor (STI) gene family, important plant defense genes, in the context of related legumes and found evidence for radiation of the Kunitz trypsin inhibitor (KTI) gene family within winged bean. PMID:27356763
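
    The abstract reports a contig N50 of 889 bp. As a reminder of what that statistic measures, here is a minimal sketch of the usual N50 definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly length. The function name and the example lengths are hypothetical and unrelated to the winged bean data.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the cumulative length first reaches half the total.
    """
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths for illustration only.
example_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```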

  18. Whole Genome Sequencing of Greater Amberjack (Seriola dumerili) for SNP Identification on Aligned Scaffolds and Genome Structural Variation Analysis Using Parallel Resequencing

    PubMed Central

    Aokic, Jun-ya; Kawase, Junya; Hamada, Kazuhisa; Fujimoto, Hiroshi; Yamamoto, Ikki; Usuki, Hironori

    2018-01-01

    Greater amberjack (Seriola dumerili) is distributed in tropical and temperate waters worldwide and is an important aquaculture fish. We carried out de novo sequencing of the greater amberjack genome to construct a reference genome sequence to identify single nucleotide polymorphisms (SNPs) for breeding amberjack by marker-assisted or gene-assisted selection as well as to identify functional genes for biological traits. We obtained 200 times coverage and constructed a high-quality genome assembly using next generation sequencing technology. The assembled sequences were aligned onto a yellowtail (Seriola quinqueradiata) radiation hybrid (RH) physical map by sequence homology. A total of 215 of the longest amberjack sequences, with a total length of 622.8 Mbp (92% of the total length of the genome scaffolds), were lined up on the yellowtail RH map. We resequenced the whole genomes of 20 greater amberjacks and mapped the resulting sequences onto the reference genome sequence. About 186,000 nonredundant SNPs were successfully ordered on the reference genome. Further, we found differences in the genome structural variations between two greater amberjack populations using BreakDancer. We also analyzed the greater amberjack transcriptome and mapped the annotated sequences onto the reference genome sequence. PMID:29785397

  19. Transfer of genetic therapy across human populations: molecular targets for increasing patient coverage in repeat expansion diseases

    PubMed Central

    Varela, Miguel A; Curtis, Helen J; Douglas, Andrew GL; Hammond, Suzan M; O'Loughlin, Aisling J; Sobrido, Maria J; Scholefield, Janine; Wood, Matthew JA

    2016-01-01

    Allele-specific gene therapy aims to silence expression of mutant alleles through targeting of disease-linked single-nucleotide polymorphisms (SNPs). However, SNP linkage to disease varies between populations, making such molecular therapies applicable only to a subset of patients. Moreover, not all SNPs have the molecular features necessary for potent gene silencing. Here we provide knowledge to allow the maximisation of patient coverage by building a comprehensive understanding of SNPs ranked according to their predicted suitability toward allele-specific silencing in 14 repeat expansion diseases: amyotrophic lateral sclerosis and frontotemporal dementia, dentatorubral-pallidoluysian atrophy, myotonic dystrophy 1, myotonic dystrophy 2, Huntington's disease and several spinocerebellar ataxias. Our systematic analysis of DNA sequence variation shows that most annotated SNPs are not suitable for potent allele-specific silencing across populations because of suboptimal sequence features and low variability (>97% in HD). We suggest maximising patient coverage by selecting SNPs with high heterozygosity across populations, and preferentially targeting SNPs that lead to purine:purine mismatches in wild-type alleles to obtain potent allele-specific silencing. We therefore provide fundamental knowledge on strategies for optimising patient coverage of therapeutics for microsatellite expansion disorders by linking analysis of population genetic variation to the selection of molecular targets. PMID:25990798

  20. Transfer of genetic therapy across human populations: molecular targets for increasing patient coverage in repeat expansion diseases.

    PubMed

    Varela, Miguel A; Curtis, Helen J; Douglas, Andrew G L; Hammond, Suzan M; O'Loughlin, Aisling J; Sobrido, Maria J; Scholefield, Janine; Wood, Matthew J A

    2016-02-01

    Allele-specific gene therapy aims to silence expression of mutant alleles through targeting of disease-linked single-nucleotide polymorphisms (SNPs). However, SNP linkage to disease varies between populations, making such molecular therapies applicable only to a subset of patients. Moreover, not all SNPs have the molecular features necessary for potent gene silencing. Here we provide knowledge to allow the maximisation of patient coverage by building a comprehensive understanding of SNPs ranked according to their predicted suitability toward allele-specific silencing in 14 repeat expansion diseases: amyotrophic lateral sclerosis and frontotemporal dementia, dentatorubral-pallidoluysian atrophy, myotonic dystrophy 1, myotonic dystrophy 2, Huntington's disease and several spinocerebellar ataxias. Our systematic analysis of DNA sequence variation shows that most annotated SNPs are not suitable for potent allele-specific silencing across populations because of suboptimal sequence features and low variability (>97% in HD). We suggest maximising patient coverage by selecting SNPs with high heterozygosity across populations, and preferentially targeting SNPs that lead to purine:purine mismatches in wild-type alleles to obtain potent allele-specific silencing. We therefore provide fundamental knowledge on strategies for optimising patient coverage of therapeutics for microsatellite expansion disorders by linking analysis of population genetic variation to the selection of molecular targets.

  1. Determining Effects of Non-synonymous SNPs on Protein-Protein Interactions using Supervised and Semi-supervised Learning

    PubMed Central

    Zhao, Nan; Han, Jing Ginger; Shyu, Chi-Ren; Korkin, Dmitry

    2014-01-01

    Single nucleotide polymorphisms (SNPs) are among the most common types of genetic variation in complex genetic disorders. A growing number of studies link the functional role of SNPs with the networks and pathways mediated by the disease-associated genes. For example, many non-synonymous missense SNPs (nsSNPs) have been found near or inside the protein-protein interaction (PPI) interfaces. Determining whether such an nsSNP will disrupt or preserve a PPI is a challenging task to address, both experimentally and computationally. Here, we present this task as three related classification problems, and develop a new computational method, called the SNP-IN tool (non-synonymous SNP INteraction effect predictor). Our method predicts the effects of nsSNPs on PPIs, given the interaction's structure. It leverages supervised and semi-supervised feature-based classifiers, including our new Random Forest self-learning protocol. The classifiers are trained based on a dataset of comprehensive mutagenesis studies for 151 PPI complexes, with experimentally determined binding affinities of the mutant and wild-type interactions. Three classification problems were considered: (1) a 2-class problem (strengthening/weakening PPI mutations), (2) another 2-class problem (mutations that disrupt/preserve a PPI), and (3) a 3-class classification (detrimental/neutral/beneficial mutation effects). In total, 11 different supervised and semi-supervised classifiers were trained and assessed resulting in a promising performance, with the weighted f-measure ranging from 0.87 for Problem 1 to 0.70 for the most challenging Problem 3. By integrating prediction results of the 2-class classifiers into the 3-class classifier, we further improved its performance for Problem 3. To demonstrate the utility of SNP-IN tool, it was applied to study the nsSNP-induced rewiring of two disease-centered networks. The accurate and balanced performance of SNP-IN tool makes it readily available to study the rewiring of large-scale protein-protein interaction networks, and can be useful for functional annotation of disease-associated SNPs. SNP-IN tool is freely accessible as a web-server at http://korkinlab.org/snpintool/. PMID:24784581
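
    The abstract summarizes classifier quality with a weighted f-measure. The sketch below assumes the common definition, in which the per-class F1 scores are averaged with weights proportional to the number of true examples in each class; the abstract does not spell out the formula, so this is an assumption. The function name and labels are hypothetical and this is not the SNP-IN tool's code.

```python
from collections import Counter


def weighted_f_measure(true_labels, predicted_labels):
    """Support-weighted mean of per-class F1 scores (assumed definition)."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    support = Counter(true_labels)
    predicted_counts = Counter(predicted_labels)
    total = len(true_labels)
    score = 0.0
    for label, n_true in support.items():
        # True positives: examples of this class that were predicted as this class.
        true_pos = sum(
            1 for t, p in zip(true_labels, predicted_labels) if t == label and p == label
        )
        n_pred = predicted_counts[label]
        precision = true_pos / n_pred if n_pred else 0.0
        recall = true_pos / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n_true / total) * f1
    return score


# Hypothetical 3-class labels (detrimental / neutral / beneficial).
truth = ["detrimental", "detrimental", "detrimental", "neutral", "beneficial"]
guess = ["detrimental", "detrimental", "neutral", "neutral", "beneficial"]
example_score = weighted_f_measure(truth, guess)
```

    Weighting by class size keeps a large class from being masked by a small one, which matters when the mutation classes are unevenly represented.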

  2. Making a chocolate chip: development and evaluation of a 6K SNP array for Theobroma cacao.

    PubMed

    Livingstone, Donald; Royaert, Stefan; Stack, Conrad; Mockaitis, Keithanne; May, Greg; Farmer, Andrew; Saski, Christopher; Schnell, Ray; Kuhn, David; Motamayor, Juan Carlos

    2015-08-01

    Theobroma cacao, the key ingredient in chocolate production, is one of the world's most important tree fruit crops, with ∼4,000,000 metric tons produced across 50 countries. To move towards gene discovery and marker-assisted breeding in cacao, a single-nucleotide polymorphism (SNP) identification project was undertaken using RNAseq data from 16 diverse cacao cultivars. RNA sequences were aligned to the assembled transcriptome of the cultivar Matina 1-6, and 330,000 SNPs within coding regions were identified. From these SNPs, a subset of 6,000 high-quality SNPs were selected for inclusion on an Illumina Infinium SNP array: the Cacao6kSNP array. Using Cacao6KSNP array data from over 1,000 cacao samples, we demonstrate that our custom array produces a saturated genetic map and can be used to distinguish among even closely related genotypes. Our study enhances and expands the genetic resources available to the cacao research community, and provides the genome-scale set of tools that are critical for advancing breeding with molecular markers in an agricultural species with high genetic diversity. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  3. Structure-based activity prediction of CYP21A2 stability variants: A survey of available gene variations.

    PubMed

    Bruque, Carlos D; Delea, Marisol; Fernández, Cecilia S; Orza, Juan V; Taboas, Melisa; Buzzalino, Noemí; Espeche, Lucía D; Solari, Andrea; Luccerini, Verónica; Alba, Liliana; Nadra, Alejandro D; Dain, Liliana

    2016-12-14

    Congenital adrenal hyperplasia due to 21-hydroxylase deficiency accounts for 90-95% of CAH cases. In this work we performed an extensive survey of mutations and SNPs modifying the coding sequence of the CYP21A2 gene. Using bioinformatic tools and two plausible CYP21A2 structures as templates, we initially classified all known mutants (n = 343) according to their putative functional impacts, which were either reported in the literature or inferred from structural models. We then performed a detailed analysis on the subset of mutations believed to exclusively impact protein stability. For those mutants, the predicted stability was calculated and correlated with the variant's expected activity. A high concordance was obtained when comparing our predictions with available in vitro residual activities and/or the patient's phenotype. The predicted stability and derived activity of all reported mutations and SNPs lacking functional assays (n = 108) were assessed. As expected, most of the SNPs (52/76) showed no biological implications. Moreover, this approach was applied to evaluate the putative synergy that could emerge when two mutations occurred in cis. In addition, we propose a putative pathogenic effect of five novel mutations, p.L107Q, p.L122R, p.R132H, p.P335L and p.H466fs, found in 21-hydroxylase deficient patients of our cohort.

  4. Structure-based activity prediction of CYP21A2 stability variants: A survey of available gene variations

    PubMed Central

    Bruque, Carlos D.; Delea, Marisol; Fernández, Cecilia S.; Orza, Juan V.; Taboas, Melisa; Buzzalino, Noemí; Espeche, Lucía D.; Solari, Andrea; Luccerini, Verónica; Alba, Liliana; Nadra, Alejandro D.; Dain, Liliana

    2016-01-01

    Congenital adrenal hyperplasia due to 21-hydroxylase deficiency accounts for 90–95% of CAH cases. In this work we performed an extensive survey of mutations and SNPs modifying the coding sequence of the CYP21A2 gene. Using bioinformatic tools and two plausible CYP21A2 structures as templates, we initially classified all known mutants (n = 343) according to their putative functional impacts, which were either reported in the literature or inferred from structural models. We then performed a detailed analysis on the subset of mutations believed to exclusively impact protein stability. For those mutants, the predicted stability was calculated and correlated with the variant’s expected activity. A high concordance was obtained when comparing our predictions with available in vitro residual activities and/or the patient’s phenotype. The predicted stability and derived activity of all reported mutations and SNPs lacking functional assays (n = 108) were assessed. As expected, most of the SNPs (52/76) showed no biological implications. Moreover, this approach was applied to evaluate the putative synergy that could emerge when two mutations occurred in cis. In addition, we propose a putative pathogenic effect of five novel mutations, p.L107Q, p.L122R, p.R132H, p.P335L and p.H466fs, found in 21-hydroxylase deficient patients of our cohort. PMID:27966633

  5. Genetic predictors of antipsychotic response to lurasidone identified in a genome wide association study and by schizophrenia risk genes.

    PubMed

    Li, Jiang; Yoshikawa, Akane; Brennan, Mark D; Ramsey, Timothy L; Meltzer, Herbert Y

    2018-02-01

    Biomarkers which predict response to atypical antipsychotic drugs (AAPDs) increase their benefit/risk ratio. We sought to identify common variants in genes which predict response to lurasidone, an AAPD, by associating genome-wide association study (GWAS) data and changes (Δ) in Positive And Negative Syndrome Scale (PANSS) scores from two 6-week randomized, placebo-controlled trials of lurasidone in schizophrenia (SCZ) patients. We also included SCZ risk SNPs identified by the Psychiatric Genomics Consortium using a polygenic risk analysis. The top genomic loci, with uncorrected p < 10⁻⁴, include: 1) synaptic adhesion (PTPRD, LRRC4C, NRXN1, IL1RAPL1, SLITRK1) and scaffolding (MAGI1, MAGI2, NBEA) genes, both essential for synaptic function; 2) other synaptic plasticity-related genes (NRG1/3 and KALRN); 3) the neuron-specific RNA splicing regulator, RBFOX1; and 4) ion channel genes (e.g. KCNA10, KCNAB1, KCNK9 and CACNA2D3). Some genes predicted response for patients with both European and African Ancestries. We replicated some SNPs reported to predict response to other atypical APDs in other GWAS. Although none of the biomarkers reached genome-wide significance, many of the genes and associated pathways have previously been linked to SCZ. Two polygenic modeling approaches, GCTA-GREML and PLINK-Polygenic Risk Score, demonstrated that some risk genes related to neurodevelopment, synaptic biology, immune response, and histones, also contributed to prediction of response. The top hits predicting response to lurasidone did not predict improvement with placebo. This is the first evidence from clinical trials that SCZ risk SNPs are related to clinical response to an AAPD. These results need to be replicated in an independent sample. Copyright © 2017. Published by Elsevier B.V.

  6. Tissue expression and predicted protein structures of the bovine ANGPTL3 and association of novel SNPs with growth and meat quality traits.

    PubMed

    Chen, N B; Ma, Y; Yang, T; Lin, F; Fu, W W; Xu, Y J; Li, F; Li, J Y; Gao, S X

    2015-08-01

    Angiopoietin-like protein 3 (ANGPTL3) is a secreted protein that regulates lipid, glucose and energy metabolism. This study was conducted to better understand the effect of ANGPTL3 on important economic traits in cattle. First, transcript profiles for ANGPTL3 were measured in nine different Jiaxian cattle tissues. Second, polymorphisms were identified in the complete coding region and promoter region of the bovine ANGPTL3 gene in 707 cattle samples. Finally, an association study was carried out utilizing these single nucleotide polymorphisms (SNPs) to determine the effect of these SNPs on the growth and meat quality traits. Quantitative real-time PCR analysis showed that ANGPTL3 was mainly expressed in the liver. The promoter of the bovine ANGPTL3 contained several putative transcription factor binding sites (SF1, HNF-1, LXRα, NFκβ, HNF-3 and C/EBP). In total, four SNPs of the bovine ANGPTL3 gene were identified by direct sequencing. SNP1 (rs469906272: g.-38T>C) was identified in the promoter, SNP2 (rs451104723:g.104A>T) and SNP3 (rs482516226: g.509A>G) were identified in exon 1, and SNP4 (rs477165942: g.8661T>C) was identified in exon 6. Changes in predicted protein structures due to non-synonymous SNPs were analyzed. Haplotype frequencies and linkage disequilibrium were also investigated. Analysis of four SNPs in cattle from different native Chinese breeds (Nanyang (NY) and Jiaxian (JX)) and commercial breeds (Angus (AG), Hereford (HF), Limousin (LM), Luxi (LX), Simmental (ST) and Jinnan (JN)) revealed a significant association with growth traits (including: BW and hipbone width) and meat quality traits (including: Warner-Bratzler shear force and ribeye area). Therefore, implementation of these four mutations in selection indices in the beef industry may be beneficial in selecting individuals with superior growth and meat quality traits.

  7. Shared polymorphisms and modifiable behavior factors for myocardial infarction and high cholesterol in a retrospective population study

    PubMed Central

    Liang, Yulan; Kelemen, Arpad

    2017-01-01

    Genetic and environmental (behavior, clinical, and demographic) factors are associated with increased risks of both myocardial infarction (MI) and high cholesterol (HC). It is known that HC is a major risk factor that may cause MI. However, whether there are common single nucleotide polymorphisms (SNPs) associated with both MI and HC is not firmly established, and whether there are modulate and modified effects (interactions of genetic and known environmental factors) on either HC or MI, and whether these joint effects improve the predictions of MI, is understudied. The purpose of this study is to identify novel shared SNPs and modifiable environmental factors on MI and HC. We assess whether SNPs from a metabolic pathway related to MI may relate to HC; whether there are moderate effects among SNPs, lifestyle (smoke and drinking), HC, and MI after controlling other factors [gender, body mass index (BMI), and hypertension (HTN)]; and evaluate prediction power of the joint and modulate genetic and environmental factors influencing the MI and HC. This is a retrospective study with residents of Erie and Niagara counties in New York with a history of MI or with no history of MI. The data set includes environmental variables (demographic, clinical, lifestyle). Thirty-one tagSNPs from a metabolic pathway related to MI are genotyped. Generalized linear models (GLMs) with imputation-based analysis are conducted for examining the common effects of tagSNPs and environmental exposures and their interactions on having a history of HC or MI. MI, BMI, and HTN are significant risk factors for HC. HC shows the strongest effect on risk of MI, in addition to HTN, gender and smoking status, while drinking status shows a protective effect on MI. rs16944 (gene IL-1β) and rs17222772 (gene ALOX) increase the risks of HC, while rs17231896 (gene CETP) has protective effects on HC either with or without the clinical, behavioral, demographic factors with different effect sizes that may indicate the existence of moderate or modifiable effects. Further analysis with the inclusions of gene–gene and gene–environmental interactions shows interactions between rs17231896 (CETP) and rs17222772 (ALOX); rs17231896 (CETP) and gender. rs17237890 (CETP) and rs2070744 (NOS3) are found to be significantly associated with risks of MI adjusted by both SNPs and environmental factors. After multiple testing adjustments, these effects diminished as expected. In addition, an interaction between drinking and smoking status is significant. Overall, the prediction power in successfully classifying MI status is increased to 80% with inclusions of all significant tagSNPs and environmental factors and their interactions compared with environmental factors only (72%). Having a history of either HC or MI has significant effects on each other in both directions, in addition to HTN and gender. Genes/SNPs identified from this analysis that are associated with HC may be potentially linked to MI, which could be further examined and validated through haplotype-pairs analysis with appropriate population stratification corrections, and function/pathway regulation analysis to eliminate the limitations of the current analysis. PMID:28906356

  8. Shared polymorphisms and modifiable behavior factors for myocardial infarction and high cholesterol in a retrospective population study.

    PubMed

    Liang, Yulan; Kelemen, Arpad

    2017-09-01

    Genetic and environmental (behavior, clinical, and demographic) factors are associated with increased risks of both myocardial infarction (MI) and high cholesterol (HC). It is known that HC is a major risk factor that may cause MI. However, whether there are common single nucleotide polymorphisms (SNPs) associated with both MI and HC is not firmly established, and whether there are modulate and modified effects (interactions of genetic and known environmental factors) on either HC or MI, and whether these joint effects improve the predictions of MI, is understudied. The purpose of this study is to identify novel shared SNPs and modifiable environmental factors on MI and HC. We assess whether SNPs from a metabolic pathway related to MI may relate to HC; whether there are moderate effects among SNPs, lifestyle (smoke and drinking), HC, and MI after controlling other factors [gender, body mass index (BMI), and hypertension (HTN)]; and evaluate prediction power of the joint and modulate genetic and environmental factors influencing the MI and HC. This is a retrospective study with residents of Erie and Niagara counties in New York with a history of MI or with no history of MI. The data set includes environmental variables (demographic, clinical, lifestyle). Thirty-one tagSNPs from a metabolic pathway related to MI are genotyped. Generalized linear models (GLMs) with imputation-based analysis are conducted for examining the common effects of tagSNPs and environmental exposures and their interactions on having a history of HC or MI. MI, BMI, and HTN are significant risk factors for HC. HC shows the strongest effect on risk of MI, in addition to HTN, gender and smoking status, while drinking status shows a protective effect on MI. rs16944 (gene IL-1β) and rs17222772 (gene ALOX) increase the risks of HC, while rs17231896 (gene CETP) has protective effects on HC either with or without the clinical, behavioral, demographic factors with different effect sizes that may indicate the existence of moderate or modifiable effects. Further analysis with the inclusions of gene-gene and gene-environmental interactions shows interactions between rs17231896 (CETP) and rs17222772 (ALOX); rs17231896 (CETP) and gender. rs17237890 (CETP) and rs2070744 (NOS3) are found to be significantly associated with risks of MI adjusted by both SNPs and environmental factors. After multiple testing adjustments, these effects diminished as expected. In addition, an interaction between drinking and smoking status is significant. Overall, the prediction power in successfully classifying MI status is increased to 80% with inclusions of all significant tagSNPs and environmental factors and their interactions compared with environmental factors only (72%). Having a history of either HC or MI has significant effects on each other in both directions, in addition to HTN and gender. Genes/SNPs identified from this analysis that are associated with HC may be potentially linked to MI, which could be further examined and validated through haplotype-pairs analysis with appropriate population stratification corrections, and function/pathway regulation analysis to eliminate the limitations of the current analysis.

  9. Whole-genome sequencing reveals clonal expansion of multiresistant Staphylococcus haemolyticus in European hospitals.

    PubMed

    Cavanagh, Jorunn Pauline; Hjerde, Erik; Holden, Matthew T G; Kahlke, Tim; Klingenberg, Claus; Flægstad, Trond; Parkhill, Julian; Bentley, Stephen D; Sollid, Johanna U Ericson

    2014-11-01

    Staphylococcus haemolyticus is an emerging cause of nosocomial infections, primarily affecting immunocompromised patients. A comparative genomic analysis was performed on clinical S. haemolyticus isolates to investigate their genetic relationship and explore the coding sequences with respect to antimicrobial resistance determinants and putative hospital adaptation. Whole-genome sequencing was performed on 134 isolates of S. haemolyticus from geographically diverse origins (Belgium, 2; Germany, 10; Japan, 13; Norway, 54; Spain, 2; Switzerland, 43; UK, 9; USA, 1). Each genome was individually assembled. Protein coding sequences (CDSs) were predicted and homologous genes were categorized into three types: Type I, core genes, homologues present in all strains; Type II, unique core genes, homologues shared by only a subgroup of strains; and Type III, unique genes, strain-specific CDSs. The phylogenetic relationship between the isolates was built from variable sites in the form of single nucleotide polymorphisms (SNPs) in the core genome and used to construct a maximum likelihood phylogeny. SNPs in the genome core regions divided the isolates into one major group of 126 isolates and one minor group of isolates with highly diverse genomes. The major group was further subdivided into seven clades (A-G), of which four (A-D) encompassed isolates only from Europe. Antimicrobial multiresistance was observed in 77.7% of the collection. High levels of homologous recombination were detected in genes involved in adherence, staphylococcal host adaptation and bacterial cell communication. The presence of several successful and highly resistant clones underlines the adaptive potential of this opportunistic pathogen. © The Author 2014. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy.
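
    The three gene categories described in this abstract (Type I, homologues present in all strains; Type II, homologues shared by only a subgroup; Type III, strain-specific CDSs) amount to a simple rule on how many strains carry each homologue group. The sketch below illustrates that rule only; the function name, the input layout and the example identifiers are hypothetical, and the study's actual homology clustering is not shown.

```python
def classify_gene_families(presence, n_strains):
    """Assign each homologue group to Type I, II or III by strain count.

    `presence` maps a (hypothetical) group identifier to the set of strains
    in which a homologue was found; `n_strains` is the number of genomes.
    """
    classes = {}
    for family, strains in presence.items():
        count = len(strains)
        if count == 0 or count > n_strains:
            raise ValueError(f"invalid strain count for {family}")
        if count == n_strains:
            classes[family] = "Type I"    # present in all strains
        elif count == 1:
            classes[family] = "Type III"  # strain-specific
        else:
            classes[family] = "Type II"   # shared by a subgroup
    return classes


# Hypothetical presence data for three strains.
example = classify_gene_families(
    {"groupA": {"s1", "s2", "s3"}, "groupB": {"s1", "s3"}, "groupC": {"s2"}},
    n_strains=3,
)
```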

  10. Whole-genome sequencing reveals clonal expansion of multiresistant Staphylococcus haemolyticus in European hospitals

    PubMed Central

    Cavanagh, Jorunn Pauline; Hjerde, Erik; Holden, Matthew T. G.; Kahlke, Tim; Klingenberg, Claus; Flægstad, Trond; Parkhill, Julian; Bentley, Stephen D.; Sollid, Johanna U. Ericson

    2014-01-01

    Objectives: Staphylococcus haemolyticus is an emerging cause of nosocomial infections, primarily affecting immunocompromised patients. A comparative genomic analysis was performed on clinical S. haemolyticus isolates to investigate their genetic relationship and explore the coding sequences with respect to antimicrobial resistance determinants and putative hospital adaptation. Methods: Whole-genome sequencing was performed on 134 isolates of S. haemolyticus from geographically diverse origins (Belgium, 2; Germany, 10; Japan, 13; Norway, 54; Spain, 2; Switzerland, 43; UK, 9; USA, 1). Each genome was individually assembled. Protein coding sequences (CDSs) were predicted and homologous genes were categorized into three types: Type I, core genes, homologues present in all strains; Type II, unique core genes, homologues shared by only a subgroup of strains; and Type III, unique genes, strain-specific CDSs. The phylogenetic relationship between the isolates was built from variable sites in the form of single nucleotide polymorphisms (SNPs) in the core genome and used to construct a maximum likelihood phylogeny. Results: SNPs in the genome core regions divided the isolates into one major group of 126 isolates and one minor group of isolates with highly diverse genomes. The major group was further subdivided into seven clades (A–G), of which four (A–D) encompassed isolates only from Europe. Antimicrobial multiresistance was observed in 77.7% of the collection. High levels of homologous recombination were detected in genes involved in adherence, staphylococcal host adaptation and bacterial cell communication. Conclusions: The presence of several successful and highly resistant clones underlines the adaptive potential of this opportunistic pathogen. PMID:25038069

  11. Dissimilarity based Partial Least Squares (DPLS) for genomic prediction from SNPs.

    PubMed

    Singh, Priyanka; Engel, Jasper; Jansen, Jeroen; de Haan, Jorn; Buydens, Lutgarde Maria Celina

    2016-05-04

    Genomic prediction (GP) allows breeders to select plants and animals based on their breeding potential for desirable traits, without lengthy and expensive field trials or progeny testing. We have proposed to use Dissimilarity-based Partial Least Squares (DPLS) for GP. As a case study, we use the DPLS approach to predict Bacterial wilt (BW) in tomatoes using SNPs as predictors. The DPLS approach was compared with the Genomic Best-Linear Unbiased Prediction (GBLUP) and single-SNP regression with SNP as a fixed effect to assess the performance of DPLS. Eight genomic distance measures were used to quantify relationships between the tomato accessions from the SNPs. Subsequently, each of these distance measures was used to predict the BW using the DPLS prediction model. The DPLS model was found to be robust to the choice of distance measures; similar prediction performances were obtained for each distance measure. DPLS greatly outperformed the single-SNP regression approach, showing that BW is a comprehensive trait dependent on several loci. Next, the performance of the DPLS model was compared to that of GBLUP. Although GBLUP and DPLS are conceptually very different, the prediction quality (PQ) of the DPLS models was similar to the prediction statistics obtained from GBLUP. A considerable advantage of DPLS is that the genotype-phenotype relationship can easily be visualized in a 2-D scatter plot. This so-called score-plot gives breeders insight for selecting candidates for their future breeding program. DPLS is a highly appropriate method for GP. The model prediction performance was similar to the GBLUP and far better than the single-SNP approach. The proposed method can be used in combination with a wide range of genomic dissimilarity measures and genotype representations such as allele-count, haplotypes or allele-intensity values. Additionally, the data can be insightfully visualized by the DPLS model, allowing for selection of desirable candidates from the breeding experiments. In this study, we have assessed the DPLS performance on a single trait.
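
    The first step of a dissimilarity-based approach is a pairwise distance matrix between accessions. The abstract does not name its eight distance measures, so the sketch below assumes a Manhattan distance on allele-count genotypes (0, 1 or 2 copies of the alternative allele) purely for illustration; the function name is hypothetical.

```python
from typing import List


def manhattan_dissimilarity(genotypes: List[List[int]]) -> List[List[int]]:
    """Pairwise Manhattan distance between allele-count genotype vectors.

    genotypes[i] holds the allele counts (0/1/2) of accession i across
    the same ordered set of SNPs. Manhattan distance is an assumption
    here, not necessarily one of the measures used in the study.
    """
    n = len(genotypes)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = sum(abs(a - b) for a, b in zip(genotypes[i], genotypes[j]))
            dist[i][j] = dist[j][i] = d  # matrix is symmetric
    return dist


# Three toy accessions genotyped at three SNPs.
toy = [[0, 1, 2],
       [0, 1, 2],
       [2, 1, 0]]
D = manhattan_dissimilarity(toy)
```

    Each row of such a matrix would then serve as the predictor vector for one accession in the downstream regression.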

  12. Development of a dense SNP-based linkage map of an apple rootstock progeny using the Malus Infinium whole genome genotyping array

    PubMed Central

    2012-01-01

    Background A whole-genome genotyping array has previously been developed for Malus using SNP data from 28 Malus genotypes. This array offers the prospect of high throughput genotyping and linkage map development for any given Malus progeny. To test the applicability of the array for mapping in diverse Malus genotypes, we applied the array to the construction of a SNP-based linkage map of an apple rootstock progeny. Results Of the 7,867 Malus SNP markers on the array, 1,823 (23.2%) were heterozygous in one of the two parents of the progeny, 1,007 (12.8%) were heterozygous in both parental genotypes, whilst just 2.8% of the 921 Pyrus SNPs were heterozygous. A linkage map spanning 1,282.2 cM was produced comprising 2,272 SNP markers, 306 SSR markers and the S-locus. The length of the M432 linkage map was increased by 52.7 cM with the addition of the SNP markers, whilst marker density increased from 3.8 cM/marker to 0.5 cM/marker. Just three regions in excess of 10 cM remain where no markers were mapped. We compared the positions of the mapped SNP markers on the M432 map with their predicted positions on the ‘Golden Delicious’ genome sequence. A total of 311 markers (13.7% of all mapped markers) mapped to positions that conflicted with their predicted positions on the ‘Golden Delicious’ pseudo-chromosomes, indicating the presence of paralogous genomic regions or mis-assignments of genome sequence contigs during the assembly and anchoring of the genome sequence. Conclusions We incorporated data for the 2,272 SNP markers onto the map of the M432 progeny and have presented the most complete and saturated map of the full 17 linkage groups of M. pumila to date. The data were generated rapidly in a high-throughput semi-automated pipeline, permitting significant savings in time and cost over linkage map construction using microsatellites. 
    The application of the array will permit linkage maps to be developed for QTL analyses in a cost-effective manner, and the identification of SNPs that have been assigned erroneous positions on the ‘Golden Delicious’ reference sequence will assist in the continued improvement of the genome sequence assembly for that variety. PMID:22631220
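
    The percentages and the final marker density quoted in this abstract can be reproduced with a few lines of arithmetic. The sketch below uses only the counts given in the abstract; counting the S-locus as one marker is an assumption made here, and the variable names are illustrative.

```python
def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * part / whole, 1)


array_snps = 7867        # Malus SNP markers on the array
het_one_parent = 1823    # heterozygous in one parent
het_both_parents = 1007  # heterozygous in both parents
mapped_snps = 2272       # SNP markers placed on the map
ssr_markers = 306        # SSR markers on the map
s_locus = 1              # assumption: the S-locus counts as one marker
map_length_cm = 1282.2   # total map length in cM
conflicting = 311        # mapped SNPs conflicting with the genome sequence

total_markers = mapped_snps + ssr_markers + s_locus
density = map_length_cm / total_markers  # cM per marker

print(percent(het_one_parent, array_snps))    # share heterozygous in one parent
print(percent(het_both_parents, array_snps))  # share heterozygous in both parents
print(percent(conflicting, mapped_snps))      # share with conflicting positions
print(round(density, 1))                      # cM per marker
```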

  13. Integrative Bayesian variable selection with gene-based informative priors for genome-wide association studies.

    PubMed

    Zhang, Xiaoshuai; Xue, Fuzhong; Liu, Hong; Zhu, Dianwen; Peng, Bin; Wiemels, Joseph L; Yang, Xiaowei

    2014-12-10

    Genome-wide Association Studies (GWAS) are typically designed to identify phenotype-associated single nucleotide polymorphisms (SNPs) individually using univariate analysis methods. Though providing valuable insights into genetic risks of common diseases, the genetic variants identified by GWAS generally account for only a small proportion of the total heritability for complex diseases. To solve this "missing heritability" problem, we implemented a strategy called integrative Bayesian Variable Selection (iBVS), which is based on a hierarchical model that incorporates an informative prior by considering the gene interrelationship as a network. It was applied here to both simulated and real data sets. Simulation studies indicated that the iBVS method was advantageous in its performance with highest AUC in both variable selection and outcome prediction, when compared to Stepwise and LASSO based strategies. In an analysis of a leprosy case-control study, iBVS selected 94 SNPs as predictors, while LASSO selected 100 SNPs. The Stepwise regression yielded a more parsimonious model with only 3 SNPs. The prediction results demonstrated that the iBVS method had comparable performance with that of LASSO, but better than Stepwise strategies. The proposed iBVS strategy is a novel and valid method for Genome-wide Association Studies, with the additional advantage in that it produces more interpretable posterior probabilities for each variable unlike LASSO and other penalized regression methods.

  14. Predicting adaptive phenotypes from multilocus genotypes in Sitka spruce (Picea sitchensis) using random forest.

    PubMed

    Holliday, Jason A; Wang, Tongli; Aitken, Sally

    2012-09-01

    Climate is the primary driver of the distribution of tree species worldwide, and the potential for adaptive evolution will be an important factor determining the response of forests to anthropogenic climate change. Although association mapping has the potential to improve our understanding of the genomic underpinnings of climatically relevant traits, the utility of adaptive polymorphisms uncovered by such studies would be greatly enhanced by the development of integrated models that account for the phenotypic effects of multiple single-nucleotide polymorphisms (SNPs) and their interactions simultaneously. We previously reported the results of association mapping in the widespread conifer Sitka spruce (Picea sitchensis). In the current study we used the recursive partitioning algorithm 'Random Forest' to identify optimized combinations of SNPs to predict adaptive phenotypes. After adjusting for population structure, we were able to explain 37% and 30% of the phenotypic variation, respectively, in two locally adaptive traits--autumn budset timing and cold hardiness. For each trait, the leading five SNPs captured much of the phenotypic variation. To determine the role of epistasis in shaping these phenotypes, we also used a novel approach to quantify the strength and direction of pairwise interactions between SNPs and found such interactions to be common. Our results demonstrate the power of Random Forest to identify subsets of markers that are most important to climatic adaptation, and suggest that interactions among these loci may be widespread.

  15. Performance of genetic risk factors in prediction of trichloroethylene induced hypersensitivity syndrome

    PubMed Central

    Dai, Yufei; Chen, Ying; Huang, Hanlin; Zhou, Wei; Niu, Yong; Zhang, Mingrong; Bin, Ping; Dong, Haiyan; Jia, Qiang; Huang, Jianxun; Yi, Juan; Liao, Qijun; Li, Haishan; Teng, Yanxia; Zang, Dan; Zhai, Qingfeng; Duan, Huawei; Shen, Juan; He, Jiaxi; Meng, Tao; Sha, Yan; Shen, Meili; Ye, Meng; Jia, Xiaowei; Xiang, Yingping; Huang, Huiping; Wu, Qifeng; Shi, Mingming; Huang, Xianqing; Yang, Huanming; Luo, Longhai; Li, Sai; Li, Lin; Zhao, Jinyang; Li, Laiyu; Wang, Jun; Zheng, Yuxin

    2015-01-01

    Trichloroethylene-induced hypersensitivity syndrome is a dose-independent and potentially life-threatening disease that has become a serious occupational health issue and requires intensive treatment. To discover genetic risk factors and evaluate the performance of a risk prediction model for the disease, we conducted a genome-wide association study and a replication study with a total of 174 cases and 1761 trichloroethylene-tolerant controls. Fifty-seven SNPs that exceeded the threshold for genome-wide significance (P < 5 × 10^−8) were associated with the disease, among which two independent SNPs were identified: rs2857281 at MICA (odds ratio, 11.92; Pmeta = 1.33 × 10^−37) and rs2523557 between HLA-B and MICA (odds ratio, 7.33; Pmeta = 8.79 × 10^−35). The genetic risk score with these two SNPs explains at least 20.9% of the disease variance and up to 32.5-fold variation in inter-individual risk. Combining the two SNPs as predictors for the disease gave an accuracy of 80.73% and an area under the receiver operating characteristic curve (AUC) of 0.82, with sensitivity of 74% and specificity of 85%, which was considered excellent discrimination for the disease and could be considered for translational application in screening employees before exposure. PMID:26190474

  16. De-Novo Assembly and Analysis of the Heterozygous Triploid Genome of the Wine Spoilage Yeast Dekkera bruxellensis AWRI1499

    PubMed Central

    Chambers, Paul J.; Pretorius, Isak S.

    2012-01-01

    Despite its industrial importance, the yeast species Dekkera (Brettanomyces) bruxellensis has remained poorly understood at the genetic level. In this study we describe whole genome sequencing and analysis for a prevalent wine spoilage strain, AWRI1499. The 12.7 Mb assembly, consisting of 324 contigs in 99 scaffolds (super-contigs) at 26-fold coverage, exhibits a relatively high density of single nucleotide polymorphisms (SNPs). Haplotype sampling for 1.2% of open reading frames suggested that the D. bruxellensis AWRI1499 genome is comprised of a moderately heterozygous diploid genome, in combination with a divergent haploid genome. Gene content analysis revealed enrichment in membrane proteins, particularly transporters, along with oxidoreductase enzymes. Availability of this assembly and annotation provides a resource for further investigation of genomic organization in this species, and functional characterization of genes that may confer important phenotypic traits. PMID:22470482

  17. Integrating in silico prediction methods, molecular docking, and molecular dynamics simulation to predict the impact of ALK missense mutations in structural perspective.

    PubMed

    Doss, C George Priya; Chakraborty, Chiranjib; Chen, Luonan; Zhu, Hailong

    2014-01-01

    Over the past decade, advancements in next generation sequencing technology have placed personalized genomic medicine on the horizon. Classifying disease-causing mutations in complex diseases as pathogenic or neutral remains a major task, and is often impractical in the structural context because the experiments are time-consuming and expensive. Among the various disease-causing mutations, single nucleotide polymorphisms (SNPs) play a vital role in defining an individual's susceptibility to disease and drug response. Understanding the genotype-phenotype relationship through SNPs is the first and most important step in drug research and development. Detailed understanding of the effect of SNPs on patient drug response is a key factor in the establishment of personalized medicine. In this paper, we present a computational pipeline for SNP-centred study of anaplastic lymphoma kinase (ALK) that applies in silico prediction methods, molecular docking, and molecular dynamics simulation approaches. Combining computational methods provides a way to understand the impact of deleterious mutations in altering protein drug targets, eventually leading to variable patient drug response. We hope this rapid and cost-effective pipeline will also serve as a bridge connecting clinicians and in silico resources in tailoring treatments to the patients' specific genotype.

  18. Exploration of structural stability in deleterious nsSNPs of the XPA gene: A molecular dynamics approach.

    PubMed

    Nagasundaram, N; Priya Doss, C George

    2011-01-01

    Distinguishing the deleterious from the massive number of non-functional nsSNPs that occur within a single genome is a considerable challenge in mutation research. In this study, we used existing in silico methods to explore the mutation-structure-function relationship in the XPA gene. We used the Sorting Intolerant From Tolerant (SIFT), Polymorphism Phenotyping (PolyPhen), I-Mutant 2.0, and the Protein Analysis THrough Evolutionary Relationships methods to predict the effects of deleterious nsSNPs on protein function and evaluated the impact of mutation on protein stability by Molecular Dynamics simulations. By comparing the scores of all four in silico methods, the nsSNP with ID rs104894131 at position C108F was predicted to be highly deleterious. We extended our Molecular Dynamics approach to gain insight into the impact of this non-synonymous polymorphism on structural changes that may affect the activity of the XPA gene. Based on the in silico method scores, potential energy, root-mean-square deviation, and root-mean-square fluctuation, we predict that the deleterious nsSNP at position C108F would play a significant role in causing disease through the XPA gene. Our approach illustrates the application of in silico tools in understanding functional variation from the perspective of structure, evolution, and phenotype.

  19. TP53 and MDM2 single nucleotide polymorphisms influence survival in non-del(5q) myelodysplastic syndromes

    PubMed Central

    Sallman, David A.; Basiorka, Ashley A.; Irvine, Brittany A.; Zhang, Ling; Epling-Burnette, P.K.; Rollison, Dana E.; Mallo, Mar; Sokol, Lubomir; Solé, Francesc; Maciejewski, Jaroslaw; List, Alan F.

    2015-01-01

    P53 is a key regulator of many cellular processes and is negatively regulated by the human homolog of murine double minute-2 (MDM2) E3 ubiquitin ligase. Single nucleotide polymorphisms (SNPs) of either gene alone, and in combination, are linked to cancer susceptibility, disease progression, and therapy response. We analyzed the interaction of TP53 R72P and MDM2 SNP309 SNPs in relationship to outcome in patients with myelodysplastic syndromes (MDS). Sanger sequencing was performed on DNA isolated from 208 MDS cases. Utilizing a novel functional SNP scoring system ranging from +2 to −2 based on predicted p53 activity, we found statistically significant differences in overall survival (OS) (p = 0.02) and progression-free survival (PFS) (p = 0.02) in non-del(5q) MDS patients with low functional scores. In univariate analysis, only IPSS and the functional SNP score predicted OS and PFS in non-del(5q) patients. In multivariate analysis, the functional SNP score was independent of IPSS for OS and PFS. These data underscore the importance of TP53 R72P and MDM2 SNP309 SNPs in MDS, and provide a novel scoring system independent of IPSS that is predictive for disease outcome. PMID:26416416

  20. Common variants in genes encoding adiponectin (ADIPOQ) and its receptors (ADIPOR1/2), adiponectin concentrations, and diabetes incidence in the Diabetes Prevention Program

    PubMed Central

    Mather, K. J.; Christophi, C. A.; Jablonski, K. A.; Knowler, W. C.; Goldberg, R. B.; Kahn, S. E.; Spector, T.; Dastani, Z.; Waterworth, D.; Richards, J. B.; Funahashi, T.; Pi-Sunyer, F. X.; Pollin, T. I.; Florez, J. C.; Franks, P. W.

    2012-01-01

    Aims Baseline adiponectin concentrations predict incident Type 2 diabetes mellitus in the Diabetes Prevention Program. We tested the hypothesis that common variants in the genes encoding adiponectin (ADIPOQ) and its receptors (ADIPOR1, ADIPOR2) would associate with circulating adiponectin concentrations and/or with diabetes incidence in the Diabetes Prevention Program population. Methods Seventy-seven tagging single-nucleotide polymorphisms (SNPs) in ADIPOQ (24), ADIPOR1 (22) and ADIPOR2 (31) were genotyped. Associations of SNPs with baseline adiponectin concentrations were evaluated using linear modelling. Associations of SNPs with diabetes incidence were evaluated using Cox proportional hazards modelling. Results Thirteen of 24 ADIPOQ SNPs were significantly associated with baseline adiponectin concentrations. Multivariable analysis including these 13 SNPs revealed strong independent contributions from rs17366568, rs1648707, rs17373414 and rs1403696 with adiponectin concentrations. However, no ADIPOQ SNPs were directly associated with diabetes incidence. Two ADIPOR1 SNPs (rs1342387 and rs12733285) were associated with ~18% increased diabetes incidence for carriers of the minor allele without differences across treatment groups, and without any relationship with adiponectin concentrations. Conclusions ADIPOQ SNPs are significantly associated with adiponectin concentrations in the Diabetes Prevention Program cohort. This observation extends prior observations from unselected populations of European descent into a broader multi-ethnic population, and confirms the relevance of these variants in an obese/dysglycaemic population. Despite the robust relationship between adiponectin concentrations and diabetes risk in this cohort, variants in ADIPOQ that relate to adiponectin concentrations do not relate to diabetes risk in this population. ADIPOR1 variants exerted significant effects on diabetes risk distinct from any effect of adiponectin concentrations. 
    [Clinical Trials Registry Nos; NCT 00004992 (Diabetes Prevention Program) and NCT 00038727 (Diabetes Prevention Program Outcomes Study)] PMID:22443353

  1. Novel Applications of Multi-task Learning and Multiple Output Regression to Multiple Genetic Trait Prediction

    USDA-ARS's Scientific Manuscript database

    Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait predicti...

  2. A Larger Chocolate Chip—Development of a 15K Theobroma cacao L. SNP Array to Create High-Density Linkage Maps

    PubMed Central

    Livingstone, Donald; Stack, Conrad; Mustiga, Guiliana M.; Rodezno, Dayana C.; Suarez, Carmen; Amores, Freddy; Feltus, Frank A.; Mockaitis, Keithanne; Cornejo, Omar E.; Motamayor, Juan C.

    2017-01-01

    Cacao (Theobroma cacao L.) is an important cash crop in tropical regions around the world and has a rich agronomic history in South America. Cacao is a key component in the cosmetic and confectionery industries, and millions of people worldwide use products made from it, ranging from shampoo to chocolate. An Illumina Infinium II array was created using 13,530 SNPs identified within a small diversity panel of cacao. Of these SNPs, 12,643 derive from variation within annotated cacao genes. The genotypes of 3,072 trees were obtained, including two mapping populations from Ecuador. High-density linkage maps for these two populations were generated and compared to the cacao genome assembly. Phenotypic data from these populations were combined with the linkage maps to identify the QTLs for yield and disease resistance. PMID:29259608

  3. A SNP panel and online tool for checking genotype concordance through comparing QR codes.

    PubMed

    Du, Yonghong; Martin, Joshua S; McGee, John; Yang, Yuchen; Liu, Eric Yi; Sun, Yingrui; Geihs, Matthias; Kong, Xuejun; Zhou, Eric Lingfeng; Li, Yun; Huang, Jie

    2017-01-01

    In the current precision medicine era, more and more samples get genotyped and sequenced. Both researchers and commercial companies expend significant time and resources to reduce the error rate. However, it has been reported that there is a sample mix-up rate of between 0.1% and 1%, not to mention the possibly higher mix-up rate during the downstream genetic reporting processes. Even on the low end of this estimate, this translates to a significant number of mislabeled samples, especially over the projected one billion people that will be sequenced within the next decade. Here, we first describe a method to identify a small set of single nucleotide polymorphisms (SNPs) that can uniquely identify a personal genome, which utilizes allele frequencies of five major continental populations reported in the 1000 Genomes Project and the ExAC Consortium. To make this panel more informative, we added four SNPs that are commonly used to predict ABO blood type, and another two SNPs that are capable of predicting sex. We then implement a web interface (http://qrcme.tech), nicknamed QRC (for QR code based Concordance check), which is capable of extracting the relevant ID SNPs from raw genetic data, coding the genotype as a quick response (QR) code, and comparing QR codes to report the concordance of underlying genetic datasets. The resulting 80 fingerprinting SNPs represent a significant decrease in complexity and the number of markers used for genetic data labelling and tracking. Our method and web tool are easily accessible to both researchers and the general public who consider the accuracy of complex genetic data as a prerequisite towards precision medicine.

  4. A SNP panel and online tool for checking genotype concordance through comparing QR codes

    PubMed Central

    Du, Yonghong; Martin, Joshua S.; McGee, John; Yang, Yuchen; Liu, Eric Yi; Sun, Yingrui; Geihs, Matthias; Kong, Xuejun; Zhou, Eric Lingfeng; Li, Yun

    2017-01-01

    In the current precision medicine era, more and more samples get genotyped and sequenced. Both researchers and commercial companies expend significant time and resources to reduce the error rate. However, it has been reported that there is a sample mix-up rate of between 0.1% and 1%, not to mention the possibly higher mix-up rate during the downstream genetic reporting processes. Even on the low end of this estimate, this translates to a significant number of mislabeled samples, especially over the projected one billion people that will be sequenced within the next decade. Here, we first describe a method to identify a small set of single nucleotide polymorphisms (SNPs) that can uniquely identify a personal genome, which utilizes allele frequencies of five major continental populations reported in the 1000 Genomes Project and the ExAC Consortium. To make this panel more informative, we added four SNPs that are commonly used to predict ABO blood type, and another two SNPs that are capable of predicting sex. We then implement a web interface (http://qrcme.tech), nicknamed QRC (for QR code based Concordance check), which is capable of extracting the relevant ID SNPs from raw genetic data, coding the genotype as a quick response (QR) code, and comparing QR codes to report the concordance of underlying genetic datasets. The resulting 80 fingerprinting SNPs represent a significant decrease in complexity and the number of markers used for genetic data labelling and tracking. Our method and web tool are easily accessible to both researchers and the general public who consider the accuracy of complex genetic data as a prerequisite towards precision medicine. PMID:28926565
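
    The concordance check described above can be sketched without any QR library by treating the fingerprint as a short string, which stands in for the payload a QR code would carry. This is an illustration only: the encoding (one character per panel SNP, alternative-allele count 0/1/2, '-' for a missing call), the function names and the SNP identifiers are all hypothetical and are not taken from the QRC tool.

```python
from typing import Dict, List, Optional


def encode_fingerprint(genotypes: Dict[str, Optional[int]],
                       panel: List[str]) -> str:
    """Serialize allele counts (0/1/2) in fixed panel order; '-' = no call."""
    chars = []
    for snp in panel:
        call = genotypes.get(snp)
        chars.append("-" if call is None else str(call))
    return "".join(chars)


def concordance(fp_a: str, fp_b: str) -> float:
    """Fraction of panel SNPs with identical calls, skipping missing calls."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints come from different panels")
    compared = [(a, b) for a, b in zip(fp_a, fp_b) if a != "-" and b != "-"]
    if not compared:
        return float("nan")  # nothing to compare
    return sum(a == b for a, b in compared) / len(compared)


# Hypothetical four-SNP panel and two datasets claimed to be the same sample.
panel = ["snp1", "snp2", "snp3", "snp4"]
sample_a = {"snp1": 0, "snp2": 1, "snp3": 2, "snp4": 1}
sample_b = {"snp1": 0, "snp2": 1, "snp3": 1}  # snp4 not genotyped

fp_a = encode_fingerprint(sample_a, panel)
fp_b = encode_fingerprint(sample_b, panel)
score = concordance(fp_a, fp_b)
```

    A low score between two files that are supposed to describe the same person would flag a possible sample mix-up.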

  5. Fine-Scale Variation and Genetic Determinants of Alternative Splicing across Individuals

    PubMed Central

    Coulombe-Huntington, Jasmin; Lam, Kevin C. L.; Dias, Christel; Majewski, Jacek

    2009-01-01

    Recently, thanks to the increasing throughput of new technologies, we have begun to explore the full extent of alternative pre–mRNA splicing (AS) in the human transcriptome. This is unveiling a vast layer of complexity in isoform-level expression differences between individuals. We used previously published splicing sensitive microarray data from lymphoblastoid cell lines to conduct an in-depth analysis on splicing efficiency of known and predicted exons. By combining publicly available AS annotation with a novel algorithm designed to search for AS, we show that many real AS events can be detected within the usually unexploited, speculative majority of the array and at significance levels much below standard multiple-testing thresholds, demonstrating that the extent of cis-regulated differential splicing between individuals is potentially far greater than previously reported. Specifically, many genes show subtle but significant genetically controlled differences in splice-site usage. PCR validation shows that 42 out of 58 (72%) candidate gene regions undergo detectable AS, amounting to the largest scale validation of isoform eQTLs to date. Targeted sequencing revealed a likely causative SNP in most validated cases. In all 17 incidences where a SNP affected a splice-site region, in silico splice-site strength modeling correctly predicted the direction of the micro-array and PCR results. In 13 other cases, we identified likely causative SNPs disrupting predicted splicing enhancers. Using Fst and REHH analysis, we uncovered significant evidence that 2 putative causative SNPs have undergone recent positive selection. We verified the effect of five SNPs using in vivo minigene assays. This study shows that splicing differences between individuals, including quantitative differences in isoform ratios, are frequent in human populations and that causative SNPs can be identified using in silico predictions. 
    Several cases affected disease-relevant genes and it is likely some of these differences are involved in phenotypic diversity and susceptibility to complex diseases. PMID:20011102

  6. SNPdbe: constructing an nsSNP functional impacts database.

    PubMed

    Schaefer, Christian; Meier, Alice; Rost, Burkhard; Bromberg, Yana

    2012-02-15

    Many existing databases annotate experimentally characterized single nucleotide polymorphisms (SNPs). Each non-synonymous SNP (nsSNP) changes one amino acid in the gene product (single amino acid substitution; SAAS). This change can either affect protein function or be neutral in that respect. Most polymorphisms lack experimental annotation of their functional impact. Here, we introduce SNPdbe (SNP database of effects), with computationally annotated predictions of the functional impacts of SNPs. Database entries represent nsSNPs in dbSNP and the 1000 Genomes collection, as well as variants from UniProt and PMD. SAASs come from >2600 organisms, 'human' being the most prevalent. The impact of each SAAS on protein function is predicted using the SNAP and SIFT algorithms and augmented with experimentally derived function/structure information and disease associations from PMD, OMIM and UniProt. SNPdbe is consistently updated and easily augmented with new sources of information. The database is available as a MySQL dump and via a web front end that allows searches with any combination of organism names, sequences and mutation IDs. http://www.rostlab.org/services/snpdbe.

  7. Racial/Ethnic Disparities in Inflammatory Gene SNPs as Predictors of High Risk for Symptom Burden in Patients with Multiple Myeloma 1 Year Post-Diagnosis

    PubMed Central

    Shi, Qiuling; Wang, Xin Shelley; Li, Guojun; Shah, Nina D.; Orlowski, Robert Z.; Williams, Loretta A.; Mendoza, Tito R.; Cleeland, Charles S.

    2015-01-01

    Background We conducted a study to determine whether any regulatory single-nucleotide polymorphism (SNP) in an inflammatory gene was associated with high symptom burden in patients 1 year after diagnosis with multiple myeloma (MM). Methods MM patients rated symptoms using the MD Anderson Symptom Inventory (MDASI)-MM module and provided buccal-swab DNA samples. SNPs for 4 cytokine genes (IL6 –174G>C, IL1β –511C>T, TNFα –308G>A, IL10 –1082G>A) were tested. Logistic regression models were used to identify SNPs that might predict moderate/severe symptoms (rated ≥4 on the MDASI-MM’s 0–10 scale). To evaluate the relationship between SNPs and overall symptom burden, we used 2-step cluster analysis to divide patients into subgroups with high or low symptom levels. Results Of the 344 patients enrolled, 41% had high overall symptom burden. The most prevalent moderate/severe symptoms were fatigue (47%), pain (42%), numbness (38%), and bone aches (32%). For non-Hispanic whites, the IL1β –511 CC genotype was associated with high overall symptom burden (OR, 2.35; 95% CI, 1.25–4.72; P = .004), while IL6 –174 GG genotype predicted less moderate/severe fatigue (OR, 0.53; 95% CI, 0.29–0.88; P = .013). For other patients, IL6 –174 GG genotype predicted moderate/severe pain (OR, 3.36; 95% CI 1.23–13.64; P = .010). Conclusions Our results support growing evidence that inflammation is associated with cancer-related symptoms and suggest that racial/ethnic factors contribute to this association. PMID:25469832

  8. Genetic polymorphisms associated with smoking behaviour predict the risk of surgery in patients with Crohn's disease.

    PubMed

    Lang, B M; Biedermann, L; van Haaften, W T; de Vallière, C; Schuurmans, M; Begré, S; Zeitz, J; Scharl, M; Turina, M; Greuter, T; Schreiner, P; Heinrich, H; Kuntzen, T; Vavricka, S R; Rogler, G; Beerenwinkel, N; Misselwitz, B

    2018-01-01

    Smoking is a strong environmental factor leading to adverse outcomes in Crohn's disease, but a more benign course in ulcerative colitis. Several single nucleotide polymorphisms (SNPs) are associated with smoking quantity and behaviour. We aimed to assess whether smoking-associated SNPs interact with smoking to influence the clinical course of inflammatory bowel diseases. Genetic and prospectively obtained clinical data from 1434 Swiss inflammatory bowel disease cohort patients (821 Crohn's disease and 613 ulcerative colitis) were analysed. Six SNPs associated with smoking quantity and behaviour (rs588765, rs1051730, rs1329650, rs4105144, rs6474412 and rs3733829) were combined to form a risk score (range: 0-12) by adding the number of risk alleles. We calculated multivariate models for smoking, risk of surgery, fistula, Crohn's disease location and ulcerative colitis disease extent. In Crohn's disease patients who smoke, the number of surgeries was associated with the genetic risk score. This translates to a predicted 3.5-fold (95% confidence interval: 2.4- to 5.7-fold, P<.0001) higher number of surgical procedures in smokers with 12 risk alleles than in individuals with the lowest risk. Patients with a risk score >7 had a significantly shorter time to first intestinal surgery. The genetic risk score did not predict surgery in ulcerative colitis or occurrence of fistulae in Crohn's disease. SNP rs6265 was associated with ileal disease in Crohn's disease (P<.05) and proctitis in ulcerative colitis (P<.05). SNPs associated with smoking quantity are associated with an increased risk for surgery in Crohn's disease patients who smoke. Our data provide an example of genetics interacting with the environment to influence the disease course of inflammatory bowel disease. © 2017 John Wiley & Sons Ltd.
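
    The risk score described above is an unweighted count of risk alleles over six SNPs, which gives the stated range of 0 to 12. A minimal sketch follows; the rsIDs are those listed in the abstract, while the function name and the input format (risk-allele count per SNP, already determined for each patient) are assumptions, since the abstract does not say which allele is the risk allele.

```python
from typing import Dict

# The six smoking-associated SNPs named in the abstract.
SMOKING_SNPS = ("rs588765", "rs1051730", "rs1329650",
                "rs4105144", "rs6474412", "rs3733829")


def genetic_risk_score(risk_allele_counts: Dict[str, int]) -> int:
    """Sum the number of risk alleles (0, 1 or 2 per SNP) over the panel."""
    score = 0
    for snp in SMOKING_SNPS:
        count = risk_allele_counts[snp]
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: allele count must be 0, 1 or 2")
        score += count
    return score


# Hypothetical patient: homozygous for the risk allele at two SNPs,
# heterozygous at three, and carrying no risk allele at one.
patient = {"rs588765": 2, "rs1051730": 2, "rs1329650": 1,
           "rs4105144": 1, "rs6474412": 1, "rs3733829": 0}
score = genetic_risk_score(patient)
```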

  9. Genome-wide prediction of childhood asthma and related phenotypes in a longitudinal birth cohort

    PubMed Central

    Spycher, Ben D.; Henderson, John; Granell, Raquel; Evans, David M.; Smith, George Davey; Timpson, Nicholas J.; Sterne, Jonathan A. C.

    2016-01-01

    Background Childhood wheezing and asthma vary greatly in clinical presentation and time course. The extent to which phenotypic variation reflects heterogeneity in disease pathways is unclear. Objective To assess the extent to which single nucleotide polymorphisms (SNPs) associated with childhood asthma in a genome-wide association study are predictive of asthma-related phenotypes. Methods In 8365 children from a population based birth cohort, the Avon Longitudinal Study of Parents and Children, allelic scores were derived based on between 10 and 215,443 SNPs ranked according to the inverse of the p-value for their association with physician diagnosed asthma in an independent genome-wide association study (6176 cases and 7111 controls). We assessed the predictive value of allelic scores for asthma-related outcomes at age 7-9 years (physician’s diagnosis, longitudinal wheezing phenotypes, and measurements of pulmonary function, bronchial responsiveness and atopy). Results Scores based on the 46 highest-ranked SNPs were associated with the symptom-based phenotypes persistent (P<10⁻¹¹, area under ROC curve (AUC)=0.59) and intermediate onset (P<10⁻³, AUC=0.58) wheeze. Among lower-ranked SNPs (ranks 21,545-46,416), there was evidence for associations with diagnosed asthma (P<10⁻⁴, AUC=0.54) and atopy (P<10⁻⁵, AUC=0.55). We found little evidence of associations with transient early wheezing, reduced pulmonary function or non-asthma phenotypes. Conclusion The genetic origins of asthma are diverse: some pathways are specific to wheezing syndromes while others are shared with atopy and bronchial hyper-responsiveness. Our study also provides evidence of aetiological differences among wheezing syndromes. PMID:22846752
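
    The two quantities this abstract relies on, an allelic score built from the top-ranked SNPs and the area under the ROC curve, can be illustrated with a small Python sketch. It makes assumptions the abstract does not confirm: the score is taken to be an unweighted count of risk alleles, and the AUC is computed by direct pairwise comparison of cases and controls. All names and the toy inputs are hypothetical.

```python
from typing import Mapping, Sequence


def allelic_score(genotypes: Mapping[str, int],
                  pvalues: Mapping[str, float],
                  top_k: int) -> int:
    """Unweighted score: risk-allele counts summed over the top_k SNPs,
    ranked by ascending p-value (assumed scoring scheme)."""
    ranked = sorted(pvalues, key=pvalues.get)[:top_k]
    return sum(genotypes[snp] for snp in ranked)


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Probability that a random case scores higher than a random control,
    counting ties as one half."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy inputs, for illustration only.
toy_pvalues = {"snpA": 0.001, "snpB": 0.5, "snpC": 0.01}
toy_genotypes = {"snpA": 2, "snpB": 1, "snpC": 0}
print(allelic_score(toy_genotypes, toy_pvalues, top_k=2))  # snpA + snpC
print(auc([2, 3], [1, 2]))
```

    An AUC of 0.5 means the score does not separate cases from controls at all, which puts the reported values of 0.54 to 0.59 in context: the associations are statistically strong but the discrimination is modest.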

  10. Further evidence for population specific differences in the effect of DNA markers and gender on eye colour prediction in forensics.

    PubMed

    Pośpiech, Ewelina; Karłowska-Pik, Joanna; Ziemkiewicz, Bartosz; Kukla, Magdalena; Skowron, Małgorzata; Wojas-Pelc, Anna; Branicki, Wojciech

    2016-07-01

    The genetics of eye colour has been extensively studied over the past few years, and the identified polymorphisms have been applied with marked success in the field of Forensic DNA Phenotyping. A picture that arises from evaluation of the currently available eye colour prediction markers shows that only the analysis of the HERC2-OCA2 complex has similar effectiveness in different populations, while the predictive potential of other loci may vary significantly. Moreover, the role of gender in the explanation of human eye colour variation should not be neglected in some populations. In the present study, we re-investigated the data for 1020 Polish individuals and, using neural networks and logistic regression methods, explored the predictive capacity of IrisPlex SNPs and gender in this population sample. In general, neural networks provided higher prediction accuracy compared with logistic regression (AUC increase by 0.02-0.06). Four out of six IrisPlex SNPs were associated with eye colour in the studied population. HERC2 rs12913832, OCA2 rs1800407 and SLC24A4 rs12896399 were found to be the most important eye colour predictors (p < 0.007) while the effect of rs16891982 in SLC45A2 was less significant. Gender was found to be significantly associated with eye colour, with males having ~1.5 times higher odds for blue eye colour compared with females (p = 0.002), and was ranked as the third most important factor in blue/non-blue eye colour determination. However, the implementation of gender into the developed prediction models had a marginal and ambiguous impact on the overall accuracy of prediction, confirming that the effect of gender on eye colour in this population is small. Our study indicated the advantage of neural networks in prediction modeling in forensics and provided additional evidence for population specific differences in the predictive importance of the IrisPlex SNPs and gender.

  11. Replicability and Robustness of GWAS for Behavioral Traits

    PubMed Central

    Rietveld, Cornelius A.; Conley, Dalton; Eriksson, Nicholas; Esko, Tõnu; Medland, Sarah E.; Vinkhuyzen, Anna A.E.; Yang, Jian; Boardman, Jason D.; Chabris, Christopher F.; Dawes, Christopher T.; Domingue, Benjamin W.; Hinds, David A.; Johannesson, Magnus; Kiefer, Amy K.; Laibson, David; Magnusson, Patrik K. E.; Mountain, Joanna L.; Oskarsson, Sven; Rostapshova, Olga; Teumer, Alexander; Tung, Joyce Y.; Visscher, Peter M.; Benjamin, Daniel J.; Cesarini, David; Koellinger, Philipp D.

    2015-01-01

    A recent genome-wide association study (GWAS) of educational attainment identified three single-nucleotide polymorphisms (SNPs) that, despite their small effect sizes (each R² ≈ 0.02%), reached genome-wide significance (p < 5×10⁻⁸) in a large discovery sample and replicated in an independent sample (p < 0.05). The study also reported associations between educational attainment and indices of SNPs called “polygenic scores.” We evaluate the robustness of these findings. Study 1 finds that all three SNPs replicate in another large (N = 34,428) independent sample. We also find that the scores remain predictive (R² ≈ 2%) with stringent controls for stratification (Study 2) and in new within-family analyses (Study 3). Our results show that large and therefore well-powered GWASs can identify replicable genetic associations with behavioral traits. The small effect sizes of individual SNPs are likely to be a major contributing explanation for the striking contrast between our results and the disappointing replication record of most candidate gene studies. PMID:25287667

  12. Identification of SNPs associated with variola virus virulence.

    PubMed

    Hoen, Anne Gatewood; Gardner, Shea N; Moore, Jason H

    2013-02-14

    Decades after the eradication of smallpox, its etiological agent, variola virus (VARV), remains a threat as a potential bioweapon. Outbreaks of smallpox around the time of the global eradication effort exhibited variable case fatality rates (CFRs), likely attributable in part to complex viral genetic determinants of smallpox virulence. We aimed to identify genome-wide single nucleotide polymorphisms associated with CFR. We evaluated unadjusted and outbreak geographic location-adjusted models of single SNPs and two- and three-way interactions between SNPs. Using the data mining approach multifactor dimensionality reduction (MDR), we identified five VARV SNPs in models significantly associated with CFR. The top performing unadjusted model and adjusted models both revealed the same two-way gene-gene interaction. We discuss the biological plausibility of the influence of the SNPs identified in these and other significant models on the strain-specific virulence of VARV. We have identified genetic loci in the VARV genome that are statistically associated with VARV virulence as measured by CFR. While our ability to infer a causal relationship between the specific SNPs identified in our analysis and VARV virulence is limited, our results suggest that smallpox severity is in part associated with VARV strain variation and that VARV virulence may be determined by multiple genetic loci. This study represents the first application of MDR to the identification of pathogen gene-gene interactions for predicting infectious disease outbreak severity.

  13. A functional SNP catalog of overlapping miRNA-binding sites in genes implicated in prion disease and other neurodegenerative disorders.

    PubMed

    Saba, Reuben; Medina, Sarah J; Booth, Stephanie A

    2014-10-01

    The involvement of SNPs in miRNA target sites remains poorly investigated in neurodegenerative disease. In addition to associations with disease risk, such genetic variations can also provide novel insight into mechanistic pathways that may be responsible for disease etiology and/or pathobiology. To identify SNPs associated specifically with degenerating neurons, we restricted our analysis to genes that are dysregulated in CA1 hippocampal neurons of mice during the early, preclinical phase of prion disease. The 125 genes chosen are also implicated in numerous other degenerative and neurological diseases and disorders and are therefore likely to be of fundamental importance. We predicted those SNPs that could increase, decrease, or have neutral effects on miRNA binding. This group of genes was more likely to possess DNA variants than were genes chosen at random. Furthermore, many of the SNPs are common within the human population, and could contribute to the growing awareness that miRNAs and associated SNPs could account for detrimental neurological states. Interestingly, SNPs that overlapped miRNA-binding sites in the 3'-UTR of GABA-receptor subunit coding genes were particularly enriched. Moreover, we demonstrated that SNP rs9291296 would strengthen miR-26a-5p binding to a highly conserved site in the 3'-UTR of gamma-aminobutyric acid receptor subunit alpha-4. © 2014 WILEY PERIODICALS, INC.

  14. A small number of candidate gene SNPs reveal continental ancestry in African Americans

    PubMed Central

    KODAMAN, NURI; ALDRICH, MELINDA C.; SMITH, JEFFREY R.; SIGNORELLO, LISA B.; BRADLEY, KEVIN; BREYER, JOAN; COHEN, SARAH S.; LONG, JIRONG; CAI, QIUYIN; GILES, JUSTIN; BUSH, WILLIAM S.; BLOT, WILLIAM J.; MATTHEWS, CHARLES E.; WILLIAMS, SCOTT M.

    2013-01-01

    SUMMARY Using genetic data from an obesity candidate gene study of self-reported African Americans and European Americans, we investigated the number of Ancestry Informative Markers (AIMs) and candidate gene SNPs necessary to infer continental ancestry. Proportions of African and European ancestry were assessed with STRUCTURE (K=2), using 276 AIMs. These reference values were compared to estimates derived using 120, 60, 30, and 15 SNP subsets randomly chosen from the 276 AIMs and from 1144 SNPs in 44 candidate genes. All subsets generated estimates of ancestry consistent with the reference estimates, with mean correlations greater than 0.99 for all subsets of AIMs, and mean correlations of 0.99±0.003; 0.98±0.01; 0.93±0.03; and 0.81±0.11 for subsets of 120, 60, 30, and 15 candidate gene SNPs, respectively. Among African Americans, the median absolute difference from reference African ancestry values ranged from 0.01 to 0.03 for the four AIMs subsets and from 0.03 to 0.09 for the four candidate gene SNP subsets. Furthermore, YRI/CEU Fst values provided a metric to predict the performance of candidate gene SNPs. Our results demonstrate that a small number of SNPs randomly selected from candidate genes can be used to estimate admixture proportions in African Americans reliably. PMID:23278390

  15. Effect of polymorphisms in the CSN3 (κ-casein) gene on milk production traits in Chinese Holstein Cattle.

    PubMed

    Alim, M A; Dong, T; Xie, Y; Wu, X P; Zhang, Yi; Zhang, Shengli; Sun, D X

    2014-11-01

    This study was designed to evaluate significant associations between single nucleotide polymorphisms (SNPs) and milk composition and milk production traits in Chinese Holstein cows. Six SNPs were identified in the κ-casein gene using pooled DNA sequencing. The identified SNPs were genotyped in 507 individuals by matrix-assisted laser desorption/ionization time of flight mass spectrometry (MALDI-TOF MS). Of the six, three were non-synonymous SNPs (g.10888T>C, g.10924C>A and g.10944A>G) that changed the protein product. The SIFT (Sorting Intolerant From Tolerant) prediction score (0.01) indicated that the isoleucine-to-threonine substitution (g.10888T>C) will affect the phenotypes. Significant associations between the identified SNPs and three yield traits (milk, protein and fat) and two composition traits (fat and protein percentages) were found, whereas the haplotype association did not reach significance for fat percentage. Importantly, the significant SNPs in our results accounted for a large proportion of the phenotypic variation of milk protein yield and concentration. Our results suggest that CSN3 is an important candidate gene that influences milk production traits, and the identified polymorphisms and haplotypes could be used as genetic markers in programs of marker-assisted selection for the genetic improvement of milk production traits in dairy cattle.

  16. Cumulative Genetic Risk Predicts Platinum/Taxane-Induced Neurotoxicity

    PubMed Central

    McWhinney-Glass, Sarah; Winham, Stacey J.; Hertz, Daniel L.; Revollo, Jane Yen; Paul, Jim; He, Yijing; Brown, Robert; Motsinger-Reif, Alison A.; McLeod, Howard L.

    2013-01-01

    Purpose The combination of a platinum and a taxane is the standard of care for many cancers, but its utility is often limited by debilitating neurotoxicity. We examined whether single nucleotide polymorphisms (SNPs) from annotated candidate genes will identify genetic risk for chemotherapy-induced neurotoxicity. Patients and Methods A candidate-gene association study was conducted to validate the relevance of 1261 SNPs within 60 candidate genes in 404 ovarian cancer patients receiving platinum/taxane chemotherapy on the SCOTROC1 trial. Statistically significant variants were then assessed for replication in a separate 404-patient replication cohort from SCOTROC1. Results Significant associations with chemotherapy-induced neurotoxicity were identified and replicated for four SNPs in SOX10, BCL2, OPRM1, and TRPV1. The Population Attributable Risk for each of the four SNPs ranged from 5–35%, with a cumulative risk of 62%. According to the multiplicative model, the odds of developing neurotoxicity increase by a factor of 1.64 for every risk genotype. Patients possessing 3 risk variants have an estimated odds ratio of 4.49 (2.36–8.54) compared to individuals with 0 risk variants. Neither the four SNPs nor the risk score were associated with progression-free survival or overall survival. Conclusions This study demonstrates that SNPs in four genes have a significant cumulative association with increased risk for the development of chemotherapy-induced neurotoxicity, independent of patient survival. PMID:23963862
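
    The multiplicative model mentioned in this abstract can be made concrete with a short calculation. The Python sketch below assumes that each additional risk genotype multiplies the odds by the same factor, 1.64; the function name is hypothetical. Using that rounded factor, three risk genotypes give about 4.41, close to but not identical to the reported estimate of 4.49. The abstract does not explain the gap; it may reflect rounding of the per-genotype factor or a separately fitted estimate.

```python
def odds_ratio(n_risk_genotypes: int, per_genotype_factor: float = 1.64) -> float:
    """Odds ratio versus 0 risk genotypes under a multiplicative model:
    every additional risk genotype multiplies the odds by the same factor."""
    if n_risk_genotypes < 0:
        raise ValueError("number of risk genotypes cannot be negative")
    return per_genotype_factor ** n_risk_genotypes


# Odds ratios for 0 to 4 risk genotypes, using the factor quoted in the abstract.
for k in range(5):
    print(k, round(odds_ratio(k), 2))
# 0 1.0
# 1 1.64
# 2 2.69
# 3 4.41
# 4 7.23
```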

  17. Identification of SNPs associated with variola virus virulence

    PubMed Central

    2013-01-01

    Background Decades after the eradication of smallpox, its etiological agent, variola virus (VARV), remains a threat as a potential bioweapon. Outbreaks of smallpox around the time of the global eradication effort exhibited variable case fatality rates (CFRs), likely attributable in part to complex viral genetic determinants of smallpox virulence. We aimed to identify genome-wide single nucleotide polymorphisms associated with CFR. We evaluated unadjusted and outbreak geographic location-adjusted models of single SNPs and two- and three-way interactions between SNPs. Findings Using the data mining approach multifactor dimensionality reduction (MDR), we identified five VARV SNPs in models significantly associated with CFR. The top performing unadjusted model and adjusted models both revealed the same two-way gene-gene interaction. We discuss the biological plausibility of the influence of the SNPs identified in these and other significant models on the strain-specific virulence of VARV. Conclusions We have identified genetic loci in the VARV genome that are statistically associated with VARV virulence as measured by CFR. While our ability to infer a causal relationship between the specific SNPs identified in our analysis and VARV virulence is limited, our results suggest that smallpox severity is in part associated with VARV strain variation and that VARV virulence may be determined by multiple genetic loci. This study represents the first application of MDR to the identification of pathogen gene-gene interactions for predicting infectious disease outbreak severity. PMID:23410064

  18. Development and Evaluation of a Genome-Wide 6K SNP Array for Diploid Sweet Cherry and Tetraploid Sour Cherry

    PubMed Central

    Peace, Cameron; Bassil, Nahla; Main, Dorrie; Ficklin, Stephen; Rosyara, Umesh R.; Stegmeir, Travis; Sebolt, Audrey; Gilmore, Barbara; Lawley, Cindy; Mockler, Todd C.; Bryant, Douglas W.; Wilhelm, Larry; Iezzoni, Amy

    2012-01-01

    High-throughput genome scans are important tools for genetic studies and breeding applications. Here, a 6K SNP array for use with the Illumina Infinium® system was developed for diploid sweet cherry (Prunus avium) and allotetraploid sour cherry (P. cerasus). This effort was led by RosBREED, a community initiative to enable marker-assisted breeding for rosaceous crops. Next-generation sequencing in diverse breeding germplasm provided 25 billion basepairs (Gb) of cherry DNA sequence from which were identified genome-wide SNPs for sweet cherry and for the two sour cherry subgenomes derived from sweet cherry (avium subgenome) and P. fruticosa (fruticosa subgenome). Anchoring to the peach genome sequence, recently released by the International Peach Genome Initiative, predicted relative physical locations of the 1.9 million putative SNPs detected, preliminarily filtered to 368,943 SNPs. Further filtering was guided by results of a 144-SNP subset examined with the Illumina GoldenGate® assay on 160 accessions. A 6K Infinium® II array was designed with SNPs evenly spaced genetically across the sweet and sour cherry genomes. SNPs were developed for each sour cherry subgenome by using minor allele frequency in the sour cherry detection panel to enrich for subgenome-specific SNPs followed by targeting to either subgenome according to alleles observed in sweet cherry. The array was evaluated using panels of sweet (n = 269) and sour (n = 330) cherry breeding germplasm. Approximately one third of array SNPs were informative for each crop. A total of 1825 polymorphic SNPs were verified in sweet cherry, 13% of these originally developed for sour cherry. Allele dosage was resolved for 2058 polymorphic SNPs in sour cherry, one third of these being originally developed for sweet cherry. This publicly available genomics resource represents a significant advance in cherry genome-scanning capability that will accelerate marker-locus-trait association discovery, genome structure investigation, and genetic diversity assessment in this diploid-tetraploid crop group. PMID:23284615

  19. Re-sequencing and genetic variation identification of a rice line with ideal plant architecture.

    PubMed

    Li, Shuangcheng; Xie, Kailong; Li, Wenbo; Zou, Ting; Ren, Yun; Wang, Shiquan; Deng, Qiming; Zheng, Aiping; Zhu, Jun; Liu, Huainian; Wang, Lingxia; Ai, Peng; Gao, Fengyan; Huang, Bin; Cao, Xuemei; Li, Ping

    2012-12-01

    The ideal plant architecture (IPA) includes several important characteristics such as low tiller numbers, few or no unproductive tillers, more grains per panicle, and thick and sturdy stems. We have developed an indica restorer line 7302R that displays the IPA phenotype in terms of tiller number, grain number, and stem strength. However, the underlying mechanism remained to be clarified. We performed re-sequencing and genome-wide variation analysis of 7302R using the Solexa sequencing technology. With the genomic sequence of the indica cultivar 9311 as reference, 307 627 SNPs, 57 372 InDels, and 3 096 SVs were identified in the 7302R genome. The 7302R-specific variations were investigated via the synteny analysis of all the SNPs of 7302R with those of the previously sequenced non-IPA-type lines IR24, MH63, and SH527. Moreover, we found 178 168 7302R-specific SNPs across the whole genome and 30 239 SNPs in the predicted mRNA regions, among which 8 517 were Non-syn CDS. In addition, 263 large-effect SNPs that were expected to affect the integrity of encoded proteins were identified from the 7302R-specific SNPs. SNPs of several important previously cloned rice genes were also identified by aligning the 7302R sequence with other sequence lines. Our results provide several candidates that may account for the IPA phenotype of 7302R. These results therefore lay the groundwork for long-term efforts to uncover important genes and alleles for rice plant architecture construction, and also offer useful data resources for future genetic and genomic studies in rice.

  20. Genomic prediction using imputed whole-genome sequence data in Holstein Friesian cattle.

    PubMed

    van Binsbergen, Rianne; Calus, Mario P L; Bink, Marco C A M; van Eeuwijk, Fred A; Schrooten, Chris; Veerkamp, Roel F

    2015-09-17

    In contrast to currently used single nucleotide polymorphism (SNP) panels, the use of whole-genome sequence data is expected to enable the direct estimation of the effects of causal mutations on a given trait. This could lead to higher reliabilities of genomic predictions compared to those based on SNP genotypes. Also, at each generation of selection, recombination events between a SNP and a mutation can cause decay in reliability of genomic predictions based on markers rather than on the causal variants. Our objective was to investigate the use of imputed whole-genome sequence genotypes versus high-density SNP genotypes on (the persistency of) the reliability of genomic predictions using real cattle data. Highly accurate phenotypes based on daughter performance and Illumina BovineHD Beadchip genotypes were available for 5503 Holstein Friesian bulls. The BovineHD genotypes (631,428 SNPs) of each bull were used to impute whole-genome sequence genotypes (12,590,056 SNPs) using the Beagle software. Imputation was done using a multi-breed reference panel of 429 sequenced individuals. Genomic estimated breeding values for three traits were predicted using a Bayesian stochastic search variable selection (BSSVS) model and a genome-enabled best linear unbiased prediction model (GBLUP). Reliabilities of predictions were based on 2087 validation bulls, while the other 3416 bulls were used for training. Prediction reliabilities ranged from 0.37 to 0.52. BSSVS performed better than GBLUP in all cases. Reliabilities of genomic predictions were slightly lower with imputed sequence data than with BovineHD chip data. Also, the reliabilities tended to be lower for both sequence data and BovineHD chip data when relationships between training animals were low. No increase in persistency of prediction reliability using imputed sequence data was observed. Compared to BovineHD genotype data, using imputed sequence data for genomic prediction produced no advantage. To investigate the putative advantage of genomic prediction using (imputed) sequence data, a training set with a larger number of individuals that are distantly related to each other and genomic prediction models that incorporate biological information on the SNPs or that apply stricter SNP pre-selection should be considered.

  1. Cell Specific eQTL Analysis without Sorting Cells

    PubMed Central

    Esko, Tõnu; Peters, Marjolein J.; Schurmann, Claudia; Schramm, Katharina; Kettunen, Johannes; Yaghootkar, Hanieh; Fairfax, Benjamin P.; Andiappan, Anand Kumar; Li, Yang; Fu, Jingyuan; Karjalainen, Juha; Platteel, Mathieu; Visschedijk, Marijn; Weersma, Rinse K.; Kasela, Silva; Milani, Lili; Tserel, Liina; Peterson, Pärt; Reinmaa, Eva; Hofman, Albert; Uitterlinden, André G.; Rivadeneira, Fernando; Homuth, Georg; Petersmann, Astrid; Lorbeer, Roberto; Prokisch, Holger; Meitinger, Thomas; Herder, Christian; Roden, Michael; Grallert, Harald; Ripatti, Samuli; Perola, Markus; Wood, Andrew R.; Melzer, David; Ferrucci, Luigi; Singleton, Andrew B.; Hernandez, Dena G.; Knight, Julian C.; Melchiotti, Rossella; Lee, Bernett; Poidinger, Michael; Zolezzi, Francesca; Larbi, Anis; Wang, De Yun; van den Berg, Leonard H.; Veldink, Jan H.; Rotzschke, Olaf; Makino, Seiko; Salomaa, Veikko; Strauch, Konstantin; Völker, Uwe; van Meurs, Joyce B. J.; Metspalu, Andres; Wijmenga, Cisca; Jansen, Ritsert C.; Franke, Lude

    2015-01-01

    The functional consequences of trait associated SNPs are often investigated using expression quantitative trait locus (eQTL) mapping. While trait-associated variants may operate in a cell-type specific manner, eQTL datasets for such cell-types may not always be available. We performed a genome-environment interaction (GxE) meta-analysis on data from 5,683 samples to infer the cell type specificity of whole blood cis-eQTLs. We demonstrate that this method is able to predict neutrophil and lymphocyte specific cis-eQTLs and replicate these predictions in independent cell-type specific datasets. Finally, we show that SNPs associated with Crohn’s disease preferentially affect gene expression within neutrophils, including the archetypal NOD2 locus. PMID:25955312

  2. The Medicago sativa gene index 1.2: a web-accessible gene expression atlas for investigating expression differences between Medicago sativa subspecies.

    PubMed

    O'Rourke, Jamie A; Fu, Fengli; Bucciarelli, Bruna; Yang, S Sam; Samac, Deborah A; Lamb, JoAnn F S; Monteros, Maria J; Graham, Michelle A; Gronwald, John W; Krom, Nick; Li, Jun; Dai, Xinbin; Zhao, Patrick X; Vance, Carroll P

    2015-07-07

    Alfalfa (Medicago sativa L.) is the primary forage legume crop species in the United States and plays essential economic and ecological roles in agricultural systems across the country. Modern alfalfa is the result of hybridization between tetraploid M. sativa ssp. sativa and M. sativa ssp. falcata. Due to its large and complex genome, there are few genomic resources available for alfalfa improvement. A de novo transcriptome assembly from two alfalfa subspecies, M. sativa ssp. sativa (B47) and M. sativa ssp. falcata (F56), was developed using Illumina RNA-seq technology. Transcripts from roots, nitrogen-fixing root nodules, leaves, flowers, elongating stem internodes, and post-elongation stem internodes were assembled into the Medicago sativa Gene Index 1.2 (MSGI 1.2) representing 112,626 unique transcript sequences. Nodule-specific transcripts and transcripts involved in cell wall biosynthesis were identified. Statistical analyses identified 20,447 transcripts differentially expressed between the two subspecies. Pair-wise comparisons of each tissue combination identified 58,932 sequences differentially expressed in B47 and 69,143 sequences differentially expressed in F56. Comparing transcript abundance in floral tissues of B47 and F56 identified expression differences in sequences involved in anthocyanin and carotenoid synthesis, which determine flower pigmentation. Single nucleotide polymorphisms (SNPs) unique to each M. sativa subspecies (110,241) were identified. The Medicago sativa Gene Index 1.2 increases the expressed sequence data available for alfalfa by ninefold and can be expanded as additional experiments are performed. The MSGI 1.2 transcriptome sequences, annotations, expression profiles, and SNPs were assembled into the Alfalfa Gene Index and Expression Database (AGED) at http://plantgrn.noble.org/AGED/ , a publicly available genomic resource for alfalfa improvement and legume research.

  3. Massive sequencing of Ulmus minor’s transcriptome provides new molecular tools for a genus under the constant threat of Dutch elm disease

    PubMed Central

    Perdiguero, Pedro; Venturas, Martin; Cervera, María Teresa; Gil, Luis; Collada, Carmen

    2015-01-01

    Elms, especially Ulmus minor and U. americana, are fighting a hard battle against Dutch elm disease (DED). This vascular wilt disease, caused by Ophiostoma ulmi and O. novo-ulmi, appeared in the twentieth century and killed millions of elms across North America and Europe. Elm breeding and conservation programmes have identified a reduced number of DED tolerant genotypes. In this study, three U. minor genotypes with contrasting levels of tolerance to DED were exposed to several biotic and abiotic stresses in order to (i) obtain a de novo assembled transcriptome of U. minor using 454 pyrosequencing, (ii) perform a functional annotation of the assembled transcriptome, (iii) identify genes potentially involved in the molecular response to environmental stress, and (iv) develop gene-based markers to support breeding programmes. A total of 58,429 putative unigenes were identified after assembly and filtering of the transcriptome. Of these unigenes, 32,152 showed homology with proteins identified in the genomes of the most common plant model species. Well-known protein families and transcription factors involved in abiotic, biotic or both stresses were identified after functional annotation. A total of 30,693 polymorphisms were identified in 7,125 isotigs, a large number of them corresponding to single nucleotide polymorphisms (SNPs; 27,359). In a subset randomly selected for validation, 87% of the SNPs were confirmed. The material generated may be valuable for future Ulmus gene expression, population genomics and association genetics studies, especially taking into account the scarce molecular information available for this genus and the great impact that DED has on elm populations. PMID:26257751

  4. Genome Sequence, Assembly and Characterization of Two Metschnikowia fructicola Strains Used as Biocontrol Agents of Postharvest Diseases

    PubMed Central

    Piombo, Edoardo; Sela, Noa; Wisniewski, Michael; Hoffmann, Maria; Gullino, Maria L.; Allard, Marc W.; Levin, Elena; Spadaro, Davide; Droby, Samir

    2018-01-01

    The yeast Metschnikowia fructicola was reported as an efficient biological control agent of postharvest diseases of fruits and vegetables, and it is the basis of the commercial formulated product “Shemer.” Several mechanisms of action by which M. fructicola inhibits postharvest pathogens were suggested including iron-binding compounds, induction of defense signaling genes, production of fungal cell wall degrading enzymes and relatively high amounts of superoxide anions. We assembled the whole genome sequence of two strains of M. fructicola using PacBio and Illumina shotgun sequencing technologies. Using PacBio sequencing, a high-quality draft genome consisting of 93 contigs, with an estimated genome size of approximately 26 Mb, was obtained. Comparative analysis of M. fructicola proteins with the other three available closely related genomes revealed a shared core of homologous proteins coded by 5,776 genes. Comparing the genomes of the two M. fructicola strains using a SNP calling approach resulted in the identification of 564,302 homologous SNPs with 2,004 predicted high impact mutations. The size of the genome is exceptionally high when compared with those of available closely related organisms, and the high rate of homology among M. fructicola genes points toward a recent whole-genome duplication event as the cause of this large genome. Based on the assembled genome, sequences were annotated with a gene description and gene ontology (GO term) and clustered in functional groups. Analysis of CAZymes family genes revealed 1,145 putative genes, and transcriptomic analysis of CAZyme expression levels in M. fructicola during its interaction with either grapefruit peel tissue or Penicillium digitatum revealed a high level of CAZyme gene expression when the yeast was placed in wounded fruit tissue. PMID:29666611

  5. dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms.

    PubMed

    Puritz, Jonathan B; Hollenbeck, Christopher M; Gold, John R

    2014-01-01

    Restriction-site associated DNA sequencing (RADseq) has become a powerful and useful approach for population genomics. Currently, no software exists that utilizes both paired-end reads from RADseq data to efficiently produce population-informative variant calls, especially for non-model organisms with large effective population sizes and high levels of genetic polymorphism. dDocent is an analysis pipeline with a user-friendly, command-line interface designed to process individually barcoded RADseq data (with double cut sites) into informative SNPs/Indels for population-level analyses. The pipeline, written in BASH, uses data reduction techniques and other stand-alone software packages to perform quality trimming and adapter removal, de novo assembly of RAD loci, read mapping, SNP and Indel calling, and baseline data filtering. Double-digest RAD data from population pairings of three different marine fishes were used to compare dDocent with Stacks, the first generally available, widely used pipeline for analysis of RADseq data. dDocent consistently identified more SNPs shared across greater numbers of individuals and with higher levels of coverage. This is due to the fact that dDocent quality trims instead of filtering, incorporates both forward and reverse reads (including reads with INDEL polymorphisms) in assembly, mapping, and SNP calling. The pipeline and a comprehensive user guide can be found at http://dDocent.wordpress.com.

  6. dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms

    PubMed Central

    Hollenbeck, Christopher M.; Gold, John R.

    2014-01-01

    Restriction-site associated DNA sequencing (RADseq) has become a powerful and useful approach for population genomics. Currently, no software exists that utilizes both paired-end reads from RADseq data to efficiently produce population-informative variant calls, especially for non-model organisms with large effective population sizes and high levels of genetic polymorphism. dDocent is an analysis pipeline with a user-friendly, command-line interface designed to process individually barcoded RADseq data (with double cut sites) into informative SNPs/Indels for population-level analyses. The pipeline, written in BASH, uses data reduction techniques and other stand-alone software packages to perform quality trimming and adapter removal, de novo assembly of RAD loci, read mapping, SNP and Indel calling, and baseline data filtering. Double-digest RAD data from population pairings of three different marine fishes were used to compare dDocent with Stacks, the first generally available, widely used pipeline for analysis of RADseq data. dDocent consistently identified more SNPs shared across greater numbers of individuals and with higher levels of coverage. This is due to the fact that dDocent quality trims instead of filtering, incorporates both forward and reverse reads (including reads with INDEL polymorphisms) in assembly, mapping, and SNP calling. The pipeline and a comprehensive user guide can be found at http://dDocent.wordpress.com. PMID:24949246

  7. Estimating the Population Mutation Rate from a de novo Assembled Bactrian Camel Genome and Cross-Species Comparison with Dromedary ESTs

    PubMed Central

    2014-01-01

    The Bactrian camel (Camelus bactrianus) and the dromedary (Camelus dromedarius) are among the last species that have been domesticated around 3000–6000 years ago. During domestication, strong artificial (anthropogenic) selection has shaped the livestock, creating a huge amount of phenotypes and breeds. Hence, domestic animals represent a unique resource to understand the genetic basis of phenotypic variation and adaptation. Similar to its late domestication history, the Bactrian camel is also among the last livestock animals to have its genome sequenced and deciphered. As no genomic data have been available until recently, we generated a de novo assembly by shotgun sequencing of a single male Bactrian camel. We obtained 1.6 Gb genomic sequences, which correspond to more than half of the Bactrian camel’s genome. The aim of this study was to identify heterozygous single-nucleotide polymorphisms (SNPs) and to estimate population parameters and nucleotide diversity based on an individual camel. With an average 6.6-fold coverage, we detected over 116 000 heterozygous SNPs and recorded a genome-wide nucleotide diversity similar to that of other domesticated ungulates. More than 20 000 (85%) dromedary expressed sequence tags successfully aligned to our genomic draft. Our results provide a template for future association studies targeting economically relevant traits and to identify changes underlying the process of camel domestication and environmental adaptation. PMID:23454912

  8. Development of cleaved amplified polymorphic sequence markers and a CAPS-based genetic linkage map in watermelon (Citrullus lanatus [Thunb.] Matsum. and Nakai) constructed using whole-genome re-sequencing data

    PubMed Central

    Liu, Shi; Gao, Peng; Zhu, Qianglong; Luan, Feishi; Davis, Angela R.; Wang, Xiaolu

    2016-01-01

    Cleaved amplified polymorphic sequence (CAPS) markers are useful tools for detecting single nucleotide polymorphisms (SNPs). This study detected and converted SNP sites into CAPS markers based on high-throughput re-sequencing data in watermelon, for linkage map construction and quantitative trait locus (QTL) analysis. Two inbred lines, Cream of Saskatchewan (COS) and LSW-177, were re-sequenced and analyzed with a self-written Perl script for CAPS marker development. 88.7% and 78.5% of the assembled sequences of the two parental materials could map to the reference watermelon genome, respectively. Comparative assembled genome data analysis provided 225,693 and 19,268 SNPs and indels between the two materials. 532 pairs of CAPS markers were designed with 16 restriction enzymes, among which 271 pairs of primers gave distinct bands of the expected length and polymorphic bands, via PCR and enzyme digestion, with a polymorphic rate of 50.94%. Using the new CAPS markers, an initial CAPS-based genetic linkage map was constructed with the F2 population, spanning 1836.51 cM with 11 linkage groups and 301 markers. 12 QTLs were detected related to fruit flesh color, length, width, shape index, and brix content. These new CAPS markers will be a valuable resource for breeding programs and genetic studies of watermelon. PMID:27162496
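
    The polymorphic rate quoted in this abstract can be rechecked from the two primer-pair counts it reports. The short sketch below does only that arithmetic; the function name is illustrative and is not taken from the study.

```python
def polymorphic_rate(polymorphic_pairs: int, designed_pairs: int) -> float:
    """Return the percentage of designed primer pairs that were polymorphic."""
    if designed_pairs <= 0:
        raise ValueError("designed_pairs must be positive")
    return 100.0 * polymorphic_pairs / designed_pairs


# Counts taken from the abstract: 271 of 532 CAPS primer pairs were polymorphic.
rate = polymorphic_rate(271, 532)
print(f"{rate:.2f}%")
```

    Rounded to two decimals, the result agrees with the 50.94% stated in the abstract.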

  9. Single nucleotide polymorphisms and haplotypes associated with feed efficiency in beef cattle

    PubMed Central

    2013-01-01

    Background General, breed- and diet-dependent associations between feed efficiency in beef cattle and single nucleotide polymorphisms (SNPs) or haplotypes were identified on a population of 1321 steers using a 50 K SNP panel. Genomic associations with traditional two-step indicators of feed efficiency – residual feed intake (RFI), residual average daily gain (RADG), and residual intake gain (RIG) – were compared to associations with two complementary one-step indicators of feed efficiency: efficiency of intake (EI) and efficiency of gain (EG). Associations uncovered in a training data set were evaluated on an independent validation data set. A multi-SNP model was developed to predict feed efficiency. Functional analysis of genes harboring SNPs significantly associated with feed efficiency and network visualization aided in the interpretation of the results. Results For the five feed efficiency indicators, the numbers of general, breed-dependent, and diet-dependent associations with SNPs (P-value < 0.0001) were 31, 40, and 25, and with haplotypes were six, ten, and nine, respectively. Of these, 20 SNP and six haplotype associations overlapped between RFI and EI, and five SNP and one haplotype associations overlapped between RADG and EG. This result confirms the complementary value of the one and two-step indicators. The multi-SNP models included 89 SNPs and offered a precise prediction of the five feed efficiency indicators. The associations of 17 SNPs and 7 haplotypes with feed efficiency were confirmed on the validation data set. Nine clusters of Gene Ontology and KEGG pathway categories (mean P-value < 0.001), including nucleotide binding, ion transport, phosphorus metabolic process, and the MAPK signaling pathway, were overrepresented among the genes harboring the SNPs associated with feed efficiency. Conclusions The general SNP associations suggest that a single panel of genomic variants can be used regardless of breed and diet. The breed- and diet-dependent associations between SNPs and feed efficiency suggest that further refinement of variant panels requires the consideration of the breed and management practices. The unique genomic variants associated with the one- and two-step indicators suggest that both types of indicators offer complementary descriptions of feed efficiency that can be exploited for genome-enabled selection purposes. PMID:24066663

  10. Characterization of SNPs Associated with Prostate Cancer in Men of Ashkenazic Descent from the Set of GWAS Identified SNPs: Impact of Cancer Family History and Cumulative SNP Risk Prediction

    PubMed Central

    Agalliu, Ilir; Wang, Zhaoming; Wang, Tao; Dunn, Anne; Parikh, Hemang; Myers, Timothy

    2013-01-01

    Background Genome-wide association studies (GWAS) have identified multiple SNPs associated with prostate cancer (PrCa). Population isolates may have different sets of risk alleles for PrCa constituting unique population and individual risk profiles. Methods To test this hypothesis, associations between 31 GWAS SNPs of PrCa were examined among 979 PrCa cases and 1,251 controls of Ashkenazic descent using logistic regression. We also investigated risks by age at diagnosis, pathological features of PrCa, and family history of cancer. Moreover, we examined associations between cumulative number of risk alleles and PrCa and assessed the utility of risk alleles in PrCa risk prediction by comparing the area under the curve (AUC) for different logistic models. Results Of the 31 genotyped SNPs, 8 were associated with PrCa at p≤0.002 (corrected p-value threshold) with odds ratios (ORs) ranging from 1.22 to 1.42 per risk allele. Four SNPs were associated with aggressive PrCa, while three other SNPs showed potential interactions for PrCa by family history of PrCa (rs8102476; 19q13), lung cancer (rs17021918; 4q22), and breast cancer (rs10896449; 11q13). Men in the highest vs. lowest quartile of cumulative number of risk alleles had ORs of 3.70 (95% CI 2.76–4.97); 3.76 (95% CI 2.57–5.50), and 5.20 (95% CI 2.94–9.19) for overall PrCa, aggressive cancer and younger age at diagnosis, respectively. The addition of cumulative risk alleles to the model containing age at diagnosis and family history of PrCa yielded a slightly higher AUC (0.69 vs. 0.64). Conclusion These data define a set of risk alleles associated with PrCa in men of Ashkenazic descent and indicate possible genetic differences for PrCa between populations of European and Ashkenazic ancestry. Use of genetic markers might provide an opportunity to identify men at highest risk for younger age of onset PrCa; however, their clinical utility in identifying men at highest risk for aggressive cancer remains limited. PMID:23573233
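
    The abstract above compares models by area under the curve (AUC). As a rough illustration of what that statistic measures, the sketch below computes AUC directly as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The scores are made-up cumulative risk-allele counts, not data from the study, and the study itself derived AUC from logistic models rather than from raw counts.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as P(case score > control score) + 0.5 * P(tie), over all pairs."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical cumulative risk-allele counts (illustrative only).
cases = [3, 4, 5]
controls = [1, 2, 3]
print(auc(cases, controls))
```

    An AUC of 0.5 means the score separates cases from controls no better than chance, which gives some context for the modest 0.64 to 0.69 improvement reported.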

  11. SNPnexus: assessing the functional relevance of genetic variation to facilitate the promise of precision medicine.

    PubMed

    Dayem Ullah, Abu Z; Oscanoa, Jorge; Wang, Jun; Nagano, Ai; Lemoine, Nicholas R; Chelala, Claude

    2018-05-11

    Broader functional annotation of genetic variation is a valuable means for prioritising phenotypically-important variants in further disease studies and large-scale genotyping projects. We developed SNPnexus to meet this need by assessing the potential significance of known and novel SNPs on the major transcriptome, proteome, regulatory and structural variation models. Since its previous release in 2012, we have made significant improvements to the annotation categories and updated the query and data viewing systems. The most notable changes include broader functional annotation of noncoding variants and expanding annotations to the most recent human genome assembly GRCh38/hg38. SNPnexus has now integrated rich resources from ENCODE and Roadmap Epigenomics Consortium to map and annotate the noncoding variants onto different classes of regulatory regions and noncoding RNAs as well as providing their predicted functional impact from eight popular non-coding variant scoring algorithms and computational methods. A novel functionality offered now is the support for neo-epitope predictions from leading tools to facilitate its use in immunotherapeutic applications. These updates to SNPnexus are in preparation for its future expansion towards a fully comprehensive computational workflow for disease-associated variant prioritization from sequencing data, placing its users at the forefront of translational research. SNPnexus is freely available at http://www.snp-nexus.org.

  12. Transcriptome characterization of three wild Chinese Vitis uncovers a large number of distinct disease related genes.

    PubMed

    Jiao, Chen; Gao, Min; Wang, Xiping; Fei, Zhangjun

    2015-03-21

    Grape is one of the most valuable fruit crops and can serve for both fresh consumption and wine production. Grape cultivars have been selected and evolved to produce high-quality fruits during their domestication over thousands of years. However, current widely planted grape cultivars suffer extensive loss to many diseases while most wild species show resistance to various pathogens. Therefore, a comprehensive evaluation of wild grapes would contribute to the improvement of disease resistance in grape breeding programs. We performed deep transcriptome sequencing of three Chinese wild grapes using the Illumina strand-specific RNA-Seq technology. High quality transcriptomes were assembled de novo and more than 93% of transcripts were shared with the reference PN40024 genome. Over 1,600 distinct transcripts, which were absent or highly divergent from sequences in the reference PN40024 genome, were identified in each of the three wild grapes, among which more than 1,000 were potential protein-coding genes. Gene Ontology (GO) and pathway annotations of these distinct genes showed those involved in defense responses and plant secondary metabolisms were highly enriched. More than 87,000 single nucleotide polymorphisms (SNPs) and 2,000 small insertions or deletions (indels) were identified between each genotype and PN40024, and approximately 20% of the SNPs caused nonsynonymous mutations. Finally, we discovered 100 to 200 highly confident cis-natural antisense transcript (cis-NAT) pairs in each genotype. These transcripts were significantly enriched with genes involved in secondary metabolisms and plant responses to abiotic stresses. The three de novo assembled transcriptomes provide a comprehensive sequence resource for molecular genetic research in grape. The newly discovered genes from wild Vitis, as well as SNPs and small indels we identified, may facilitate future studies on the molecular mechanisms related to valuable traits possessed by these wild Vitis and contribute to the grape breeding programs. Furthermore, we identified hundreds of cis-NAT pairs which showed their potential regulatory roles in secondary metabolism and abiotic stress responses.

  13. An integrated pipeline of open source software adapted for multi-CPU architectures: use in the large-scale identification of single nucleotide polymorphisms.

    PubMed

    Jayashree, B; Hanspal, Manindra S; Srinivasan, Rajgopal; Vigneshwaran, R; Varshney, Rajeev K; Spurthi, N; Eshwar, K; Ramesh, N; Chandra, S; Hoisington, David A

    2007-01-01

    The large amounts of EST sequence data available from a single species of an organism as well as for several species within a genus provide an easy source of identification of intra- and interspecies single nucleotide polymorphisms (SNPs). In the case of model organisms, the data available are numerous, given the degree of redundancy in the deposited EST data. There are several available bioinformatics tools that can be used to mine this data; however, using them requires a certain level of expertise: the tools have to be used sequentially with accompanying format conversion and steps like clustering and assembly of sequences become time-intensive jobs even for moderately sized datasets. We report here a pipeline of open source software extended to run on multiple CPU architectures that can be used to mine large EST datasets for SNPs and identify restriction sites for assaying the SNPs so that cost-effective CAPS assays can be developed for SNP genotyping in genetics and breeding applications. At the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), the pipeline has been implemented to run on a Paracel high-performance system consisting of four dual AMD Opteron processors running Linux with MPICH. The pipeline can be accessed through user-friendly web interfaces at http://hpc.icrisat.cgiar.org/PBSWeb and is available on request for academic use. We have validated the developed pipeline by mining chickpea ESTs for interspecies SNPs, development of CAPS assays for SNP genotyping, and confirmation of restriction digestion pattern at the sequence level.

  14. Whole genome comparison between table and wine grapes reveals a comprehensive catalog of structural variants

    PubMed Central

    2014-01-01

    Background Grapevine (Vitis vinifera L.) is the most important Mediterranean fruit crop, used to produce both wine and spirits as well as table grape and raisins. Wine and table grape cultivars represent two divergent germplasm pools with different origins and domestication history, as well as differential characteristics for berry size, cluster architecture and berry chemical profile, among others. ‘Sultanina’ plays a pivotal role in modern table grape breeding providing the main source of seedlessness. This cultivar is also one of the most planted for fresh consumption and raisins production. Given its importance, we sequenced it and implemented a novel strategy for the de novo assembly of its highly heterozygous genome. Results Our approach produced a draft genome of 466 Mb, recovering 82% of the genes present in the grapevine reference genome; in addition, we identified 240 novel genes. A large number of structural variants and SNPs were identified. Among them, 45 (21 SNPs and 24 INDELs) were experimentally confirmed in ‘Sultanina’ and six SNPs in other 23 table grape varieties. Transposable elements corresponded to ca. 80% of the repetitive sequences involved in structural variants and more than 2,000 genes were affected in their structure by these variants. Some of these genes are likely involved in embryo development, suggesting that they may contribute to seedlessness, a key trait for table grapes. Conclusions This work produced the first structural variants and SNPs catalog for grapevine, constituting a novel and very powerful tool for genomic studies in this key fruit crop, particularly useful to support marker assisted breeding in table grapes. PMID:24397443

  15. Single-Nucleotide Polymorphism Markers from De-Novo Assembly of the Pomegranate Transcriptome Reveal Germplasm Genetic Diversity

    PubMed Central

    Ophir, Ron; Sherman, Amir; Rubinstein, Mor; Eshed, Ravit; Sharabi Schwager, Michal; Harel-Beja, Rotem; Bar-Ya'akov, Irit; Holland, Doron

    2014-01-01

    Pomegranate is a valuable crop that is grown commercially in many parts of the world. Wild species have been reported from India, Turkmenistan and Socotra. Pomegranate fruit has a variety of health-beneficial qualities. However, despite this crop's importance, only moderate effort has been invested in studying its biochemical or physiological properties or in establishing genomic and genetic infrastructures. In this study, we reconstructed a transcriptome from two phenotypically different accessions using 454-GS-FLX Titanium technology. These data were used to explore the functional annotation of 45,187 fully annotated contigs. We further compiled a genetic-variation resource of 7,155 simple-sequence repeats (SSRs) and 6,500 single-nucleotide polymorphisms (SNPs). A subset of 480 SNPs was sampled to investigate the genetic structure of the broad pomegranate germplasm collection at the Agricultural Research Organization (ARO), which includes accessions from different geographical areas worldwide. This subset of SNPs was found to be polymorphic, with 10.7% loci with minor allele frequencies of (MAF<0.05). These SNPs were successfully used to classify the ARO pomegranate collection into two major groups of accessions: one from India, China and Iran, composed of mainly unknown country origin and which was more of an admixture than the other major group, composed of accessions mainly from the Mediterranean basin, Central Asia and California. This study establishes a high-throughput transcriptome and genetic-marker infrastructure. Moreover, it sheds new light on the genetic interrelations between pomegranate species worldwide and more accurately defines their genetic nature. PMID:24558460
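
    The abstract reports the share of loci with a minor allele frequency (MAF) below 0.05. The sketch below shows one common way such a figure could be computed, assuming biallelic SNPs in diploid samples with genotypes coded as 0, 1, or 2 copies of the alternate allele and None for a missing call. The coding scheme, function names, and toy genotypes are assumptions for illustration and do not come from the study.

```python
from typing import Optional, Sequence


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """MAF for one biallelic locus; genotypes are alt-allele counts (0/1/2) or None."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this locus")
    # Each diploid sample contributes two alleles.
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def fraction_below(loci: Sequence[Sequence[Optional[int]]], threshold: float = 0.05) -> float:
    """Fraction of loci whose MAF is strictly below the threshold."""
    mafs = [minor_allele_frequency(g) for g in loci]
    return sum(m < threshold for m in mafs) / len(mafs)


# Toy genotype matrix: one inner list per locus (hypothetical data).
toy_loci = [
    [0, 0, 1, 2, None, 1],                 # alt frequency 4/10
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],  # alt frequency 1/24, a rare allele
    [2, 2, 2, 1],                          # alt frequency 7/8, so MAF is 1/8
]
print(fraction_below(toy_loci))
```

    Taking the minimum of the alternate and reference frequencies keeps the result independent of which allele happens to be labeled as the reference.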

  16. Single-nucleotide polymorphism markers from de-novo assembly of the pomegranate transcriptome reveal germplasm genetic diversity.

    PubMed

    Ophir, Ron; Sherman, Amir; Rubinstein, Mor; Eshed, Ravit; Sharabi Schwager, Michal; Harel-Beja, Rotem; Bar-Ya'akov, Irit; Holland, Doron

    2014-01-01

    Pomegranate is a valuable crop that is grown commercially in many parts of the world. Wild species have been reported from India, Turkmenistan and Socotra. Pomegranate fruit has a variety of health-beneficial qualities. However, despite this crop's importance, only moderate effort has been invested in studying its biochemical or physiological properties or in establishing genomic and genetic infrastructures. In this study, we reconstructed a transcriptome from two phenotypically different accessions using 454-GS-FLX Titanium technology. These data were used to explore the functional annotation of 45,187 fully annotated contigs. We further compiled a genetic-variation resource of 7,155 simple-sequence repeats (SSRs) and 6,500 single-nucleotide polymorphisms (SNPs). A subset of 480 SNPs was sampled to investigate the genetic structure of the broad pomegranate germplasm collection at the Agricultural Research Organization (ARO), which includes accessions from different geographical areas worldwide. This subset of SNPs was found to be polymorphic, with 10.7% loci with minor allele frequencies of (MAF<0.05). These SNPs were successfully used to classify the ARO pomegranate collection into two major groups of accessions: one from India, China and Iran, composed of mainly unknown country origin and which was more of an admixture than the other major group, composed of accessions mainly from the Mediterranean basin, Central Asia and California. This study establishes a high-throughput transcriptome and genetic-marker infrastructure. Moreover, it sheds new light on the genetic interrelations between pomegranate species worldwide and more accurately defines their genetic nature.

  17. Characterization of the canine desmin (DES) gene and evaluation as a candidate gene for dilated cardiomyopathy in the Dobermann.

    PubMed

    Stabej, Polona; Imholz, Sandra; Versteeg, Serge A; Zijlstra, Carla; Stokhof, Arnold A; Domanjko-Petric, Aleksandra; Leegwater, Peter A J; van Oost, Bernard A

    2004-10-13

    Dilated cardiomyopathy (DCM) in dogs is a disease of the myocardium associated with dilatation and impaired contraction of the ventricles and is suspected to have a genetic cause. A missense mutation in the desmin gene (DES) causes DCM in a human family. Human DCM closely resembles the canine disease. In the present study, we evaluated whether DES gene mutations are responsible for DCM in Dobermann dogs. We have isolated bacterial artificial chromosome clones (BACs) containing the canine DES gene and determined the chromosomal location by fluorescence in situ hybridization (FISH). Using data deposited in the NCBI trace archive and GenBank, the canine DES gene DNA sequence was assembled and seven single nucleotide polymorphisms (SNPs) were identified. From the canine DES gene BAC clones, a polymorphic microsatellite marker was isolated. The microsatellite marker and four informative desmin SNPs were typed in a Dobermann family with frequent DCM occurrence, but the disease phenotype did not associate with a desmin haplotype. We concluded that mutations in the DES gene do not play a role in Dobermann DCM. Availability of the microsatellite marker, SNPs and DNA sequence reported in this study enables fast evaluation of the DES gene as a DCM candidate gene in other dog breeds with DCM occurrence.

  18. Comprehensive Search for Alzheimer Disease Susceptibility Loci in the APOE Region

    PubMed Central

    Jun, Gyungah; Vardarajan, Badri N.; Buros, Jacqueline; Yu, Chang-En; Hawk, Michele V.; Dombroski, Beth A.; Crane, Paul K.; Larson, Eric B.; Mayeux, Richard; Haines, Jonathan L.; Lunetta, Kathryn L.; Pericak-Vance, Margaret A.; Schellenberg, Gerard D.; Farrer, Lindsay A.

    2013-01-01

    Objective To evaluate the association of risk and age at onset (AAO) of Alzheimer disease (AD) with single-nucleotide polymorphisms (SNPs) in the chromosome 19 region including apolipoprotein E (APOE) and a repeat-length polymorphism in TOMM40 (poly-T, rs10524523). Design Conditional logistic regression models and survival analysis. Setting Fifteen genome-wide association study data sets assembled by the Alzheimer's Disease Genetics Consortium. Participants Eleven thousand eight hundred forty AD cases and 10 931 cognitively normal elderly controls. Main Outcome Measures Association of AD risk and AAO with genotyped and imputed SNPs located in an 800-Mb region including APOE in the entire Alzheimer's Disease Genetics Consortium data set and with the TOMM40 poly-T marker genotyped in a subset of 1256 cases and 1605 controls. Results In models adjusting for APOE ε4, no SNPs in the entire region were significantly associated with AAO at P<.001. Rs10524523 was not significantly associated with AD or AAO in models adjusting for APOE genotype or within the subset of ε3/ε3 subjects. Conclusions APOE alleles ε2, ε3, and ε4 account for essentially all the inherited risk of AD associated with this region. Other variants including a poly-T track in TOMM40 are not independent risk or AAO loci. PMID:22869155

  19. No Association between Variation in Longevity Candidate Genes and Aging-related Phenotypes in Oldest-old Danes.

    PubMed

    Soerensen, Mette; Nygaard, Marianne; Debrabant, Birgit; Mengel-From, Jonas; Dato, Serena; Thinggaard, Mikael; Christensen, Kaare; Christiansen, Lene

    2016-06-01

    In this study we explored the association between aging-related phenotypes previously reported to predict survival in old age and variation in 77 genes from the DNA repair pathway, 32 genes from the growth hormone 1/ insulin-like growth factor 1/insulin (GH/IGF-1/INS) signalling pathway and 16 additional genes repeatedly considered as candidates for human longevity: APOE, APOA4, APOC3, ACE, CETP, HFE, IL6, IL6R, MTHFR, TGFB1, SIRTs 1, 3, 6; and HSPAs 1A, 1L, 14. Altogether, 1,049 single nucleotide polymorphisms (SNPs) were genotyped in 1,088 oldest-old (age 92-93 years) Danes and analysed with phenotype data on physical functioning (hand grip strength), cognitive functioning (mini mental state examination and a cognitive composite score), activity of daily living and self-rated health. Five SNPs showed association to one of the phenotypes; however, none of these SNPs were associated with a change in the relevant phenotype over time (7 years of follow-up) and none of the SNPs could be confirmed in a replication sample of 1,281 oldest-old Danes (age 94-100). Hence, our study does not support association between common variation in the investigated longevity candidate genes and aging-related phenotypes consistently shown to predict survival. It is possible that larger sample sizes are needed to robustly reveal associations with small effect sizes. Copyright © 2016 Elsevier Inc. All rights reserved.

  20. Exploration of structural stability in deleterious nsSNPs of the XPA gene: A molecular dynamics approach

    PubMed Central

    NagaSundaram, N; Priya Doss, C George

    2011-01-01

    Background: Distinguishing the deleterious from the massive number of non-functional nsSNPs that occur within a single genome is a considerable challenge in mutation research. In this approach, we have used the existing in silico methods to explore the mutation-structure-function relationship in the XPA gene. Materials and Methods: We used the Sorting Intolerant From Tolerant (SIFT), Polymorphism Phenotyping (PolyPhen), I-Mutant 2.0, and the Protein Analysis THrough Evolutionary Relationships methods to predict the effects of deleterious nsSNPs on protein function and evaluated the impact of mutation on protein stability by Molecular Dynamics simulations. Results: By comparing the scores of all the four in silico methods, nsSNP with an ID rs104894131 at position C108F was predicted to be highly deleterious. We extended our Molecular dynamics approach to gain insight into the impact of this non-synonymous polymorphism on structural changes that may affect the activity of the XPA gene. Conclusion: Based on the in silico methods score, potential energy, root-mean-square deviation, and root-mean-square fluctuation, we predict that deleterious nsSNP at position C108F would play a significant role in causing disease by the XPA gene. Our approach would present the application of in silico tools in understanding the functional variation from the perspective of structure, evolution, and phenotype. PMID:22190868

  1. Pharmacogenomics of Methotrexate Membrane Transport Pathway: Can Clinical Response to Methotrexate in Rheumatoid Arthritis Be Predicted?

    PubMed Central

    Lima, Aurea; Bernardes, Miguel; Azevedo, Rita; Medeiros, Rui; Seabra, Vitor

    2015-01-01

    Background: Methotrexate (MTX) is widely used for rheumatoid arthritis (RA) treatment. Single nucleotide polymorphisms (SNPs) could be used as predictors of patients’ therapeutic outcome variability. Therefore, this study aims to evaluate the influence of SNPs in genes encoding for MTX membrane transport proteins in order to predict clinical response to MTX. Methods: Clinicopathological data from 233 RA patients treated with MTX were collected, clinical response defined, and patients genotyped for 23 SNPs. Genotype and haplotype analyses were performed using multivariate methods and a genetic risk index (GRI) for non-response was created. Results: Increased risk for non-response was associated to SLC22A11 rs11231809 T carriers; ABCC1 rs246240 G carriers; ABCC1 rs3784864 G carriers; CGG haplotype for ABCC1 rs35592, rs2074087 and rs3784864; and CGG haplotype for ABCC1 rs35592, rs246240 and rs3784864. GRI demonstrated that patients with Index 3 were 16-fold more likely to be non-responders than those with Index 1. Conclusions: This study revealed that SLC22A11 and ABCC1 may be important to identify those patients who will not benefit from MTX treatment, highlighting the relevance in translating these results to clinical practice. However, further validation by independent studies is needed to develop the field of personalized medicine to predict clinical response to MTX treatment. PMID:26086825

  2. Pharmacogenomics of Methotrexate Membrane Transport Pathway: Can Clinical Response to Methotrexate in Rheumatoid Arthritis Be Predicted?

    PubMed

    Lima, Aurea; Bernardes, Miguel; Azevedo, Rita; Medeiros, Rui; Seabra, Vítor

    2015-06-16

    Methotrexate (MTX) is widely used for rheumatoid arthritis (RA) treatment. Single nucleotide polymorphisms (SNPs) could be used as predictors of patients' therapeutic outcome variability. Therefore, this study aims to evaluate the influence of SNPs in genes encoding for MTX membrane transport proteins in order to predict clinical response to MTX. Clinicopathological data from 233 RA patients treated with MTX were collected, clinical response defined, and patients genotyped for 23 SNPs. Genotype and haplotype analyses were performed using multivariate methods and a genetic risk index (GRI) for non-response was created. Increased risk for non-response was associated to SLC22A11 rs11231809 T carriers; ABCC1 rs246240 G carriers; ABCC1 rs3784864 G carriers; CGG haplotype for ABCC1 rs35592, rs2074087 and rs3784864; and CGG haplotype for ABCC1 rs35592, rs246240 and rs3784864. GRI demonstrated that patients with Index 3 were 16-fold more likely to be non-responders than those with Index 1. This study revealed that SLC22A11 and ABCC1 may be important to identify those patients who will not benefit from MTX treatment, highlighting the relevance in translating these results to clinical practice. However, further validation by independent studies is needed to develop the field of personalized medicine to predict clinical response to MTX treatment.

  3. Impact of obesity-related genes in Spanish population

    PubMed Central

    2013-01-01

    Background: The objective was to investigate the association between BMI and previously identified single nucleotide polymorphisms (SNPs) in obesity-related genes in two Spanish populations. Forty SNPs in 23 obesity-related genes were evaluated in a rural population characterized by a high prevalence of obesity (869 subjects, mean age 46 yr, 62% women, 36% obese) and in an urban population (1425 subjects, mean age 54 yr, 50% women, 19% obese). Genotyping was performed with SNPlex, and PLINK was used for the association analysis. Results: Polymorphisms of FTO were significantly associated with BMI in the rural population (beta 0.87, p-value <0.001). None of the other SNPs showed significant association after Bonferroni correction in the two populations or in the pooled analysis. A weighted genetic risk score (wGRS) was constructed using the risk alleles of the Tag-SNPs with a positive beta parameter in both populations. From the first to the fifth quintile of the score, the BMI increased 0.45 kg/m² in Hortega and 2.0 kg/m² in Pizarra. Overall, the obesity predictive value was low (less than 1%). Conclusion: The risk associated with polymorphisms is low and the overall effect on BMI or obesity prediction is minimal. A weighted genetic risk score based on genes mainly acting through central nervous system mechanisms was associated with BMI, but it yields minimal clinical prediction of obesity risk in the general population. PMID:24267414

  4. Impact of obesity-related genes in Spanish population.

    PubMed

    Martínez-García, Fernando; Mansego, María L; Rojo-Martínez, Gemma; De Marco-Solar, Griselda; Morcillo, Sonsoles; Soriguer, Federico; Redón, Josep; Pineda Alonso, Monica; Martín-Escudero, Juan C; Cooper, Richard S; Chaves, Felipe J

    2013-11-23

    The objective was to investigate the association between BMI and previously identified single nucleotide polymorphisms (SNPs) in obesity-related genes in two Spanish populations. Forty SNPs in 23 obesity-related genes were evaluated in a rural population characterized by a high prevalence of obesity (869 subjects, mean age 46 yr, 62% women, 36% obese) and in an urban population (1425 subjects, mean age 54 yr, 50% women, 19% obese). Genotyping was performed with SNPlex, and PLINK was used for the association analysis. Polymorphisms of FTO were significantly associated with BMI in the rural population (beta 0.87, p-value <0.001). None of the other SNPs showed significant association after Bonferroni correction in the two populations or in the pooled analysis. A weighted genetic risk score (wGRS) was constructed using the risk alleles of the Tag-SNPs with a positive beta parameter in both populations. From the first to the fifth quintile of the score, the BMI increased 0.45 kg/m² in Hortega and 2.0 kg/m² in Pizarra. Overall, the obesity predictive value was low (less than 1%). The risk associated with polymorphisms is low and the overall effect on BMI or obesity prediction is minimal. A weighted genetic risk score based on genes mainly acting through central nervous system mechanisms was associated with BMI, but it yields minimal clinical prediction of obesity risk in the general population.
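
The weighted genetic risk score (wGRS) mentioned in this abstract is commonly computed as a weighted sum of risk-allele counts. The following is a minimal sketch, assuming additive coding (0, 1, or 2 risk alleles per SNP) and one beta weight per SNP. The abstract does not state the authors' exact weighting scheme, and the function name, SNP labels, and weights below are hypothetical placeholders rather than values from the study.

```python
def weighted_grs(allele_counts, betas):
    """Return the weighted sum of risk-allele counts.

    allele_counts maps a SNP label to 0, 1 or 2 risk alleles;
    betas maps the same labels to per-allele effect sizes.
    """
    score = 0.0
    for snp, count in allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: risk-allele count must be 0, 1 or 2")
        score += betas[snp] * count
    return score


# Hypothetical weights and genotypes, for illustration only.
betas = {"snp_a": 0.5, "snp_b": 0.25}
person = {"snp_a": 2, "snp_b": 1}
print(weighted_grs(person, betas))
```

Scores computed this way for a whole cohort could then be split into quintiles, which is the comparison the abstract reports.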

  5. Mango (Mangifera indica L.) germplasm diversity based on single nucleotide polymorphisms derived from the transcriptome.

    PubMed

    Sherman, Amir; Rubinstein, Mor; Eshed, Ravit; Benita, Miri; Ish-Shalom, Mazal; Sharabi-Schwager, Michal; Rozen, Ada; Saada, David; Cohen, Yuval; Ophir, Ron

    2015-11-14

    Germplasm collections are an important source for plant breeding, especially in fruit trees, which have a long juvenile period. Thus, efforts have been made to study the diversity of fruit tree collections. Even though mango is an economically important crop, most of the studies on diversity in mango collections have been conducted with a small number of genetic markers. We describe a de novo transcriptome assembly from mango cultivar 'Keitt'. Variation discovery using Illumina resequencing of the 'Keitt' and 'Tommy Atkins' cultivars identified 332,016 single-nucleotide polymorphisms (SNPs) and 1903 simple-sequence repeats (SSRs). Most of the SSRs (70.1%) were trinucleotide, with a preponderance of the (GGA/AAG)n motif, and only 23.5% were di-nucleotide SSRs, mostly with the (AT/AT)n motif. Further investigation of the diversity in the Israeli mango collection was performed based on a subset of 293 SNPs. Those markers divided the Israeli mango collection into two major groups: one group included mostly mango accessions from Southeast Asia (Malaysia, Thailand, Indonesia) and India, and the other consisted mainly of Floridian and Israeli mango cultivars. The latter group was more polymorphic (FS=-0.1 on the average) and was more of an admixture than the former group. A slight population differentiation was detected (FST=0.03), suggesting that if the mango accessions of the western world originated from Southeast Asia, as has been previously suggested, the duration of cultivation was not long enough to develop a distinct genetic background. Whole-transcriptome reconstruction was used to significantly broaden the mango's genetic variation resources, i.e., SNPs and SSRs. The set of SNP markers described in this study is novel. A subset of SNPs was sampled to explore the Israeli mango collection and most of them were polymorphic in many mango accessions. Therefore, we believe that these SNPs will be valuable as they recapitulate and strengthen the history of mango diversity.

  6. Fast Screening Technology for Drug Emergency Management: Predicting Suspicious SNPs for ADR with Information Theory-based Models.

    PubMed

    Liang, Zhaohui; Liu, Jun; Huang, Jimmy X; Zeng, Xing

    2018-01-01

    The genetic polymorphism of cytochrome P450 (CYP450) is considered one of the main causes of adverse drug reactions (ADRs). To explore the latent correlations between ADRs and potentially corresponding single-nucleotide polymorphisms (SNPs) in CYP450, three algorithms based on information theory are used as the main method to predict the possible relation. The study uses a retrospective case-control design to explore the potential relation of ADRs to specific genomic locations and SNPs. The genomic data collected from 53 healthy volunteers are used for the analysis; genomic data collected from another 30 healthy volunteers excluded from the study are used as the control group. The SNPs at five loci, CYP2D6*2, *10, *14 and CYP1A2*1C, *1F, are detected with the Applied Biosystems 3130xl. The raw data are processed by ChromasPro to detect the specific alleles at the above loci in each sample. The secondary data are reorganized and processed by R, combined with the reports of ADRs from clinical reports. Three information theory based algorithms are implemented for the screening task: JMI, CMIM, and mRMR. If a SNP is selected by more than two algorithms, we conclude with confidence that it is related to the corresponding ADR. The selection results are compared with the control decision tree + LASSO regression model. In the study group where ADRs occur, 10 SNPs are considered relevant to the occurrence of a specific ADR by the combined information theory model. In comparison, only 5 SNPs are considered relevant to a specific ADR by the decision tree + LASSO regression model. In addition, the new method detects more relevant pairs of SNP and ADR which are affected by both SNP and dosage. This implies that the new information theory based model is effective in discovering correlations between ADRs and CYP450 SNPs and is helpful in predicting the potentially vulnerable genotype for some ADRs. The newly proposed information theory based model performs better in detecting the relation between SNP and ADR than the decision tree + LASSO regression model. The new model is more sensitive in detecting ADRs than the old method, while the old method is more reliable. Therefore, the criteria for selecting algorithms should depend on pragmatic needs. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
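
The consensus rule in this abstract (keep a SNP when enough of the JMI, CMIM, and mRMR selections agree) can be sketched as a simple vote count. This is an illustration only: the function name, the `min_votes` parameter, and the SNP labels are assumptions, and the abstract's wording ("more than two algorithms") leaves the exact cutoff ambiguous when there are three selectors, so the threshold is left configurable.

```python
from collections import Counter


def consensus_snps(selections, min_votes=2):
    """Return SNPs chosen by at least min_votes feature-selection methods.

    selections maps a method name to the SNP labels it selected.
    """
    votes = Counter()
    for selected in selections.values():
        votes.update(set(selected))  # one vote per method per SNP
    return sorted(snp for snp, n in votes.items() if n >= min_votes)


# Hypothetical selections, for illustration only.
selections = {
    "JMI": ["snp1", "snp2", "snp3"],
    "CMIM": ["snp2", "snp3"],
    "mRMR": ["snp3", "snp4"],
}
print(consensus_snps(selections))               # at least two methods agree
print(consensus_snps(selections, min_votes=3))  # all three methods agree
```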

  7. Identification of an Interaction between VWF rs7965413 and Platelet Count as a Novel Risk Marker for Metabolic Syndrome: An Extensive Search of Candidate Polymorphisms in a Case-Control Study

    PubMed Central

    Nakatochi, Masahiro; Ushida, Yasunori; Yasuda, Yoshinari; Yoshida, Yasuko; Kawai, Shun; Kato, Ryuji; Nakashima, Toru; Iwata, Masamitsu; Kuwatsuka, Yachiyo; Ando, Masahiko; Hamajima, Nobuyuki; Kondo, Takaaki; Oda, Hiroaki; Hayashi, Mutsuharu; Kato, Sawako; Yamaguchi, Makoto; Maruyama, Shoichi; Matsuo, Seiichi; Honda, Hiroyuki

    2015-01-01

    Although many single nucleotide polymorphisms (SNPs) have been identified as associated with metabolic syndrome (MetS), there was only a slight improvement in the ability to predict future MetS by the simple addition of SNPs to clinical risk markers. To improve the ability to predict future MetS, combinational effects, such as SNP-SNP interaction, SNP-environment interaction, and SNP-clinical parameter (SNP × CP) interaction, should also be considered. We performed a case-control study to explore novel SNP × CP interactions as risk markers for MetS based on health check-up data of Japanese male employees. We selected 99 SNPs that were previously reported to be associated with MetS and components of MetS; subsequently, we genotyped these SNPs from 360 cases and 1983 control subjects. First, we performed logistic regression analyses to assess the association of each SNP with MetS. Of these SNPs, five were significantly associated with MetS (P < 0.05): LRP2 rs2544390, rs1800592 between UCP1 and TBC1D9, APOA5 rs662799, VWF rs7965413, and rs1411766 between MYO16 and IRS2. Furthermore, we performed multiple logistic regression analyses, including an SNP term, a CP term, and an SNP × CP interaction term for each CP and SNP that was significantly associated with MetS. We identified a novel SNP × CP interaction between rs7965413 and platelet count that was significantly associated with MetS [SNP term: odds ratio (OR) = 0.78, P = 0.004; SNP × CP interaction term: OR = 1.33, P = 0.001]. This association of the SNP × CP interaction with MetS remained nominally significant in multiple logistic regression analysis after adjustment for either the number of MetS components or MetS components excluding obesity. Our results reveal new insight into platelet count as a risk marker for MetS. PMID:25646961

  8. Identification of an interaction between VWF rs7965413 and platelet count as a novel risk marker for metabolic syndrome: an extensive search of candidate polymorphisms in a case-control study.

    PubMed

    Nakatochi, Masahiro; Ushida, Yasunori; Yasuda, Yoshinari; Yoshida, Yasuko; Kawai, Shun; Kato, Ryuji; Nakashima, Toru; Iwata, Masamitsu; Kuwatsuka, Yachiyo; Ando, Masahiko; Hamajima, Nobuyuki; Kondo, Takaaki; Oda, Hiroaki; Hayashi, Mutsuharu; Kato, Sawako; Yamaguchi, Makoto; Maruyama, Shoichi; Matsuo, Seiichi; Honda, Hiroyuki

    2015-01-01

    Although many single nucleotide polymorphisms (SNPs) have been identified as associated with metabolic syndrome (MetS), there was only a slight improvement in the ability to predict future MetS by the simple addition of SNPs to clinical risk markers. To improve the ability to predict future MetS, combinational effects, such as SNP-SNP interaction, SNP-environment interaction, and SNP-clinical parameter (SNP × CP) interaction, should also be considered. We performed a case-control study to explore novel SNP × CP interactions as risk markers for MetS based on health check-up data of Japanese male employees. We selected 99 SNPs that were previously reported to be associated with MetS and components of MetS; subsequently, we genotyped these SNPs from 360 cases and 1983 control subjects. First, we performed logistic regression analyses to assess the association of each SNP with MetS. Of these SNPs, five were significantly associated with MetS (P < 0.05): LRP2 rs2544390, rs1800592 between UCP1 and TBC1D9, APOA5 rs662799, VWF rs7965413, and rs1411766 between MYO16 and IRS2. Furthermore, we performed multiple logistic regression analyses, including an SNP term, a CP term, and an SNP × CP interaction term for each CP and SNP that was significantly associated with MetS. We identified a novel SNP × CP interaction between rs7965413 and platelet count that was significantly associated with MetS [SNP term: odds ratio (OR) = 0.78, P = 0.004; SNP × CP interaction term: OR = 1.33, P = 0.001]. This association of the SNP × CP interaction with MetS remained nominally significant in multiple logistic regression analysis after adjustment for either the number of MetS components or MetS components excluding obesity. Our results reveal new insight into platelet count as a risk marker for MetS.
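
The multiple logistic regression with an SNP term, a CP term, and an SNP × CP interaction term can be written out as below. This is a generic sketch of such a model, not the authors' exact specification: the coding of the SNP, the scaling of the clinical parameter, and any additional covariates are not given in the abstract and are omitted here.

```latex
\[
\log\frac{P(\mathrm{MetS})}{1 - P(\mathrm{MetS})}
  = \beta_0 + \beta_1\,\mathrm{SNP} + \beta_2\,\mathrm{CP}
  + \beta_3\,(\mathrm{SNP}\times\mathrm{CP}),
\qquad \mathrm{OR}_k = e^{\beta_k}
\]
\[
\mathrm{OR}_{\mathrm{SNP}\,\mid\,\mathrm{CP}=c} = e^{\beta_1 + \beta_3 c}
\]
```

Under this parameterization, the reported odds ratios of 0.78 (SNP term) and 1.33 (interaction term) would give an SNP odds ratio of 0.78 × 1.33^c at CP = c, where the units of c depend on how platelet count was scaled in the original analysis.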

  9. Genetically deprived vitamin D exposure predisposes to atrial fibrillation.

    PubMed

    Chan, Yap-Hang; Yiu, Kai-Hang; Hai, Jo Jo; Chan, Pak-Hei; Lam, Tai-Hing; Cowling, Ben J; Sham, Pak-Chung; Lau, Chu-Pak; Lam, Karen Siu-Ling; Siu, Chung-Wah; Tse, Hung-Fat

    2017-12-01

    Low vitamin D level is associated with atrial fibrillation (AF) and may be implicated in its pathogenesis. We studied single nucleotide polymorphisms (SNPs) of vitamin D mechanistic pathways and serum 25-hydroxyvitamin D [25(OH)D] levels in an age- and gender-matched case-control study (controls without AF: mean age 68.6 ± 8.7 years, female 25%; n = 1019; with AF: mean age 69.7 ± 9.5 years, female 30%; n = 156) recruited from a Chinese clinical cohort of patients with stable coronary artery disease. Twelve SNPs involved in the vitamin D mechanistic pathways were studied [biosynthetic: rs4646536, rs10877012, rs3829251, rs1790349; activation: rs2060793, rs1993116; vitamin D-binding protein (VBP)/group-specific component (GC): rs4588, rs7041, rs2282679, rs1155563; and vitamin D receptor: rs1544410, rs10735810]. A genetic risk score (GRS) (0-8) was constructed from SNPs associated with serum 25(OH)D as a proxy to lifelong vitamin D-deficient state. All 4 SNPs involved in the VBP/GC were significantly associated with serum 25(OH)D (rs4588, P < 0.001; rs2282679, P < 0.001; rs7041, P = 0.011; rs1155563, P < 0.001; all other SNPs, P > 0.05). Vitamin D GRS (points 0-8) generated from these 4 SNPs was independently predictive of serum 25(OH)D [B = 0.54, 95% confidence interval (CI) 0.30-0.79; P < 0.001]. Genetically deprived vitamin D status as denoted by a low GRS (0-3) independently predicted an increased risk of AF, compared to a high GRS (4-8) (odds ratio = 1.848, 95% CI 1.217-2.805; P = 0.004). Genetically deprived vitamin D exposure predisposes to increased AF among patients with coronary artery disease. Whether VBP/GC may alter the risk of AF via alternative mechanisms warrants further studies. Published on behalf of the European Society of Cardiology. All rights reserved. © The Author 2017. For permissions, please email: journals.permissions@oup.com.

  10. SNP discovery by high-throughput sequencing in soybean

    PubMed Central

    2010-01-01

    Background: With the advance of new massively parallel genotyping technologies, quantitative trait loci (QTL) fine mapping and map-based cloning become more achievable in identifying genes for important and complex traits. Development of high-density genetic markers in the QTL regions of specific mapping populations is essential for fine-mapping and map-based cloning of economically important genes. Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation existing between any diverse genotypes that are usually used for QTL mapping studies. The massively parallel sequencing technologies (Roche GS/454, Illumina GA/Solexa, and ABI/SOLiD) have been widely applied to identify genome-wide sequence variations. However, it remains unclear whether sequence data at a low sequencing depth are enough to detect the variations existing in any QTL regions of interest in a crop genome, and how to prepare sequencing samples for a complex genome such as soybean. Therefore, with the aims of identifying SNP markers in a cost-effective way for fine-mapping several QTL regions, and testing the validation rate of the putative SNPs predicted with Solexa short sequence reads at a low sequencing depth, we evaluated a pooled DNA fragment reduced representation library and SNP detection methods applied to short read sequences generated by Solexa high-throughput sequencing technology. Results: A total of 39,022 putative SNPs were identified by the Illumina/Solexa sequencing system using a reduced representation DNA library of two parental lines of a mapping population. The validation rates of these putative SNPs predicted with low and high stringency were 72% and 85%, respectively. One hundred sixty-four SNP markers resulted from the validation of putative SNPs and were selectively chosen to target a known QTL, thereby increasing the marker density of the targeted region to one marker per 42 kbp. Conclusions: We have demonstrated how to quickly identify large numbers of SNPs for fine mapping of QTL regions by applying massively parallel sequencing combined with genome complexity reduction techniques. This SNP discovery approach is more efficient for targeting multiple QTL regions in the same genetic population, and can be applied to other crops. PMID:20701770

  11. Association of genetic polymorphisms in SLCO1B3 and ABCC2 with docetaxel-induced leukopenia.

    PubMed

    Kiyotani, Kazuma; Mushiroda, Taisei; Kubo, Michiaki; Zembutsu, Hitoshi; Sugiyama, Yuichi; Nakamura, Yusuke

    2008-05-01

    Despite long-term clinical experience with docetaxel, unpredictable severe adverse reactions remain an important determinant for limiting the use of the drug. To identify a genetic factor(s) determining the risk of docetaxel-induced leukopenia/neutropenia, we selected subjects who received docetaxel chemotherapy from samples recruited at BioBank Japan, and conducted a case-control association study. We genotyped 84 patients, 28 patients with grade 3 or 4 leukopenia/neutropenia, and 56 with no toxicity (patients with grade 1 or 2 were excluded), for a total of 79 single nucleotide polymorphisms (SNPs) in seven genes possibly involved in the metabolism or transport of this drug: CYP3A4, CYP3A5, ABCB1, ABCC2, SLCO1B3, NR1I2, and NR1I3. Since one SNP in ABCB1, four SNPs in ABCC2, four SNPs in SLCO1B3, and one SNP in NR1I2 showed a possible association with the grade 3 leukopenia/neutropenia (P-value of <0.05), we further examined these 10 SNPs using 29 additionally obtained patients, 11 patients with grade 3/4 leukopenia/neutropenia, and 18 with no toxicity. The combined analysis indicated a significant association of rs12762549 in ABCC2 (P = 0.00022) and rs11045585 in SLCO1B3 (P = 0.00017) with docetaxel-induced leukopenia/neutropenia. When patients were classified into three groups by the scoring system based on the genotypes of these two SNPs, patients with a score of 1 or 2 were shown to have a significantly higher risk of docetaxel-induced leukopenia/neutropenia as compared to those with a score of 0 (P = 0.0000057; odds ratio [OR], 7.00; 95% CI [confidence interval], 2.95-16.59). This prediction system correctly classified 69.2% of severe leukopenia/neutropenia and 75.7% of non-leukopenia/neutropenia into the respective categories, indicating that SNPs in ABCC2 and SLCO1B3 may predict the risk of leukopenia/neutropenia induced by docetaxel chemotherapy.
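
The odds ratio, confidence interval, and classification rates in this abstract can be checked with a 2×2 table. The counts below are not stated in the abstract; they are inferred from its figures (28 + 11 = 39 patients with severe toxicity and 56 + 18 = 74 without, of whom 69.2% and 75.7% were correctly classified) and should be read as an assumption. The interval uses the standard Woolf (log odds ratio) approximation, which may not be the method the authors used.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a: cases with score >= 1, b: cases with score 0,
    c: controls with score >= 1, d: controls with score 0.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts inferred from the reported percentages (an assumption).
cases_high, cases_low = 27, 12        # 27/39 = 69.2% of severe cases
controls_high, controls_low = 18, 56  # 56/74 = 75.7% of non-cases

odds_ratio, lower, upper = odds_ratio_ci(
    cases_high, cases_low, controls_high, controls_low
)
sensitivity = cases_high / (cases_high + cases_low)
specificity = controls_low / (controls_high + controls_low)

print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
print(f"sensitivity = {sensitivity:.1%}, specificity = {specificity:.1%}")
```

With these inferred counts the sketch gives an odds ratio of 7.00 with a 95% CI of roughly 2.95 to 16.59, which agrees with the values quoted in the abstract.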

  12. Nonsynonymous Polymorphism in Guanine Monophosphate Synthetase Is a Risk Factor for Unfavorable Thiopurine Metabolite Ratios in Patients With Inflammatory Bowel Disease.

    PubMed

    Roberts, Rebecca L; Wallace, Mary C; Seinen, Margien L; van Bodegraven, Adriaan A; Krishnaprasad, Krupa; Jones, Gregory T; van Rij, Andre M; Baird, Angela; Lawrance, Ian C; Prosser, Ruth; Bampton, Peter; Grafton, Rachel; Simms, Lisa A; Studd, Corrie; Bell, Sally J; Kennedy, Martin A; Halliwell, Jacob; Gearry, Richard B; Radford-Smith, Graham; Andrews, Jane M; McHugh, Patrick C; Barclay, Murray L

    2018-05-16

    Up to 20% of patients with inflammatory bowel disease (IBD) who are refractory to thiopurine therapy preferentially produce 6-methylmercaptopurine (6-MMP) at the expense of 6-thioguanine nucleotides (6-TGN), resulting in a high 6-MMP:6-TGN ratio (>20). The objective of this study was to evaluate whether genetic variability in guanine monophosphate synthetase (GMPS) contributes to preferential 6-MMP metabolizer phenotype. Exome sequencing was performed in a cohort of IBD patients with 6-MMP:6-TGN ratios of >100 to identify nonsynonymous single nucleotide polymorphisms (nsSNPs). In vitro assays were performed to measure GMPS activity associated with these nsSNPs. Frequency of the nsSNPs was measured in a cohort of 530 Caucasian IBD patients. Two nsSNPs in GMPS (rs747629729, rs61750370) were detected in 11 patients with very high 6-MMP:6-TGN ratios. The 2 nsSNPs were predicted to be damaging by in silico analysis. In vitro assays demonstrated that both nsSNPs resulted in a significant reduction in GMPS activity (P < 0.05). The SNP rs61750370 was significantly associated with 6-MMP:6-TGN ratios ≥100 (odds ratio, 5.64; 95% confidence interval, 1.01-25.12; P < 0.031) in a subset of 264 Caucasian IBD patients. The GMPS SNP rs61750370 may be a reliable risk factor for extreme 6MMP preferential metabolism.

  13. Genomic and transcriptomic predictors of triglyceride response to regular exercise

    PubMed Central

    Sarzynski, Mark A; Davidsen, Peter K; Sung, Yun Ju; Hesselink, Matthijs K C; Schrauwen, Patrick; Rice, Treva K; Rao, D C; Falciani, Francesco; Bouchard, Claude

    2015-01-01

    Aim: We performed genome-wide and transcriptome-wide profiling to identify genes and single nucleotide polymorphisms (SNPs) associated with the response of triglycerides (TG) to exercise training. Methods: Plasma TG levels were measured before and after a 20-week endurance training programme in 478 white participants from the HERITAGE Family Study. Illumina HumanCNV370-Quad v3.0 BeadChips were genotyped using the Illumina BeadStation 500GX platform. Affymetrix HG-U133+2 arrays were used to quantitate gene expression levels from baseline muscle biopsies of a subset of participants (N=52). Genome-wide association study (GWAS) analysis was performed using MERLIN, while transcriptomic predictor models were developed using the R-package GALGO. Results: The GWAS results showed that eight SNPs were associated with TG training-response (ΔTG) at p<9.9×10^-6, while another 31 SNPs showed p values <1×10^-4. In multivariate regression models, the top 10 SNPs explained 32.0% of the variance in ΔTG, while conditional heritability analysis showed that four SNPs statistically accounted for all of the heritability of ΔTG. A molecular signature based on the baseline expression of 11 genes predicted 27% of ΔTG in HERITAGE, which was validated in an independent study. A composite SNP score based on the top four SNPs, each from the genomic and transcriptomic analyses, was the strongest predictor of ΔTG (R^2=0.14, p=3.0×10^-68). Conclusions: Our results indicate that skeletal muscle transcript abundance at 11 genes and SNPs at a number of loci contribute to TG response to exercise training. Combining data from genomics and transcriptomics analyses identified a SNP-based gene signature that should be further tested in independent samples. PMID:26491034

  14. Machine learning shows association between genetic variability in PPARG and cerebral connectivity in preterm infants

    PubMed Central

    Krishnan, Michelle L.; Wang, Zi; Aljabar, Paul; Ball, Gareth; Mirza, Ghazala; Saxena, Alka; Counsell, Serena J.; Hajnal, Joseph V.; Montana, Giovanni

    2017-01-01

    Preterm infants show abnormal structural and functional brain development, and have a high risk of long-term neurocognitive problems. The molecular and cellular mechanisms involved are poorly understood, but novel methods now make it possible to address them by examining the relationship between common genetic variability and brain endophenotype. We addressed the hypothesis that variability in the Peroxisome Proliferator Activated Receptor (PPAR) pathway would be related to brain development. We employed machine learning in an unsupervised, unbiased, combined analysis of whole-brain diffusion tractography together with genomewide, single-nucleotide polymorphism (SNP)-based genotypes from a cohort of 272 preterm infants, using Sparse Reduced Rank Regression (sRRR) and correcting for ethnicity and age at birth and imaging. Empirical selection frequencies for SNPs associated with cerebral connectivity ranged from 0.663 to zero, with multiple highly selected SNPs mapping to genes for PPARG (six SNPs), ITGA6 (four SNPs), and FXR1 (two SNPs). SNPs in PPARG were significantly overrepresented (ranked 7–11 and 67 of 556,000 SNPs; P < 2.2 × 10^-7), and were mostly in introns or regulatory regions with predicted effects including protein coding and nonsense-mediated decay. Edge-centric graph-theoretic analysis showed that highly selected white-matter tracts were consistent across the group and important for information transfer (P < 2.2 × 10^-17); they most often connected to the insula (P < 6 × 10^-17). These results suggest that the inhibited brain development seen in humans exposed to the stress of a premature extrauterine environment is modulated by genetic factors, and that PPARG signaling has a previously unrecognized role in cerebral development. PMID:29229843

  15. Drug Metabolizing Enzyme and Transporter Gene Variation, Nicotine Metabolism, Prospective Abstinence, and Cigarette Consumption.

    PubMed

    Bergen, Andrew W; Michel, Martha; Nishita, Denise; Krasnow, Ruth; Javitz, Harold S; Conneely, Karen N; Lessov-Schlaggar, Christina N; Hops, Hyman; Zhu, Andy Z X; Baurley, James W; McClure, Jennifer B; Hall, Sharon M; Baker, Timothy B; Conti, David V; Benowitz, Neal L; Lerman, Caryn; Tyndale, Rachel F; Swan, Gary E

    2015-01-01

    The Nicotine Metabolite Ratio (NMR, ratio of trans-3'-hydroxycotinine and cotinine), has previously been associated with CYP2A6 activity, response to smoking cessation treatments, and cigarette consumption. We searched for drug metabolizing enzyme and transporter (DMET) gene variation associated with the NMR and prospective abstinence in 2,946 participants of laboratory studies of nicotine metabolism and of clinical trials of smoking cessation therapies. Stage I was a meta-analysis of the association of 507 common single nucleotide polymorphisms (SNPs) at 173 DMET genes with the NMR in 449 participants of two laboratory studies. Nominally significant associations were identified in ten genes after adjustment for intragenic SNPs; CYP2A6 and two CYP2A6 SNPs attained experiment-wide significance adjusted for correlated SNPs (CYP2A6 PACT=4.1E-7, rs4803381 PACT=4.5E-5, rs1137115, PACT=1.2E-3). Stage II was mega-regression analyses of 10 DMET SNPs with pretreatment NMR and prospective abstinence in up to 2,497 participants from eight trials. rs4803381 and rs1137115 SNPs were associated with pretreatment NMR at genome-wide significance. In post-hoc analyses of CYP2A6 SNPs, we observed nominally significant association with: abstinence in one pharmacotherapy arm; cigarette consumption among all trial participants; and lung cancer in four case:control studies. CYP2A6 minor alleles were associated with reduced NMR, CPD, and lung cancer risk. We confirmed the major role that CYP2A6 plays in nicotine metabolism, and made novel findings with respect to genome-wide significance and associations with CPD, abstinence and lung cancer risk. Additional multivariate analyses with patient variables and genetic modeling will improve prediction of nicotine metabolism, disease risk and smoking cessation treatment prognosis.

  16. The contribution of individual and pairwise combinations of SNPs in the APOA1 and APOC3 genes to interindividual HDL-C variability.

    PubMed

    Brown, C M; Rea, T J; Hamon, S C; Hixson, J E; Boerwinkle, E; Clark, A G; Sing, C F

    2006-07-01

    Apolipoproteins (apo) A-I and C-III are components of high-density lipoprotein-cholesterol (HDL-C), a quantitative trait negatively correlated with risk of cardiovascular disease (CVD). We analyzed the contribution of individual and pairwise combinations of single nucleotide polymorphisms (SNPs) in the APOA1/APOC3 genes to HDL-C variability to evaluate (1) consistency of published single-SNP studies with our single-SNP analyses; (2) consistency of single-SNP and two-SNP phenotype-genotype relationships across race-, gender-, and geographical location-dependent contexts; and (3) the contribution of single SNPs and pairs of SNPs to variability beyond that explained by plasma apo A-I concentration. We analyzed 45 SNPs in 3,831 young African-American (N=1,858) and European-American (N=1,973) females and males ascertained by the Coronary Artery Risk Development in Young Adults (CARDIA) study. We found three SNPs that significantly impact HDL-C variability in both the literature and the CARDIA sample. Single-SNP analyses identified only one of five significant HDL-C SNP genotype relationships in the CARDIA study that was consistent across all race-, gender-, and geographical location-dependent contexts. The other four were consistent across geographical locations for a particular race-gender context. The portion of total phenotypic variance explained by single-SNP genotypes and genotypes defined by pairs of SNPs was less than 3%, an amount that is minuscule compared to the contribution explained by variability in plasma apo A-I concentration. Our findings illustrate the impact of context-dependence on SNP selection for prediction of CVD risk factor variability.

  17. Haplotype Detection from Next-Generation Sequencing in High-Ploidy-Level Species: 45S rDNA Gene Copies in the Hexaploid Spartina maritima

    PubMed Central

    Boutte, Julien; Aliaga, Benoît; Lima, Oscar; Ferreira de Carvalho, Julie; Ainouche, Abdelkader; Macas, Jiri; Rousseau-Gueutin, Mathieu; Coriton, Olivier; Ainouche, Malika; Salmon, Armel

    2015-01-01

    Gene and whole-genome duplications are widespread in plant nuclear genomes, resulting in sequence heterogeneity. Identification of duplicated genes may be particularly challenging in highly redundant genomes, especially when there are no diploid parents as a reference. Here, we developed a pipeline to detect the different copies in the ribosomal RNA gene family in the hexaploid grass Spartina maritima from next-generation sequencing (Roche-454) reads. The heterogeneity of the different domains of the highly repeated 45S unit was explored by identifying single nucleotide polymorphisms (SNPs) and assembling reads based on shared polymorphisms. SNPs were validated using comparisons with Illumina sequence data sets and by cloning and Sanger (re)sequencing. Using this approach, 29 validated polymorphisms and 11 validated haplotypes were reported (out of 34 and 20, respectively, that were initially predicted by our program). The rDNA domains of S. maritima have similar lengths as those found in other Poaceae, apart from the 5′-ETS, which is approximately two-times longer in S. maritima. Sequence homogeneity was encountered in coding regions and both internal transcribed spacers (ITS), whereas high intragenomic variability was detected in the intergenic spacer (IGS) and the external transcribed spacer (ETS). Molecular cytogenetic analysis by fluorescent in situ hybridization (FISH) revealed the presence of one pair of 45S rDNA signals on the chromosomes of S. maritima instead of three expected pairs for a hexaploid genome, indicating loss of duplicated homeologous loci through the diploidization process. The procedure developed here may be used at any ploidy level and using different sequencing technologies. PMID:26530424

  18. Outcomes of methotrexate therapy for psoriasis and relationship to genetic polymorphisms.

    PubMed

    Warren, R B; Smith, R L; Campalani, E; Eyre, S; Smith, C H; Barker, J N W N; Worthington, J; Griffiths, C E M

    2009-02-01

    The use of methotrexate is limited by interindividual variability in response. Previous studies in patients with either rheumatoid arthritis or psoriasis suggest that genetic variation across the methotrexate metabolic pathway might enable prediction of both efficacy and toxicity of the drug. To assess if single nucleotide polymorphisms (SNPs) across four genes that are relevant to methotrexate metabolism [folypolyglutamate synthase (FPGS), gamma-glutamyl hydrolase (GGH), methylenetetrahydrofolate reductase (MTHFR) and 5-aminoimidazole-4-carboxamide ribonucleotide transformylase (ATIC)] are related to treatment outcomes in patients with psoriasis. DNA was collected from 374 patients with psoriasis who had been treated with methotrexate. Data were available on individual outcomes to therapy, namely efficacy and toxicity. Haplotype-tagging SNPs (r(2) > 0.8) for the four genes with a minor allele frequency of > 5% were selected from the HAPMAP phase II data. Genotyping was undertaken using the MassARRAY spectrometric method (Sequenom). There were no significant associations detected between clinical outcomes in patients with psoriasis treated with methotrexate and SNPs in the four genes investigated. Genetic variation in four key genes relevant to the intracellular metabolism of methotrexate does not appear to predict response to methotrexate therapy in patients with psoriasis.
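
    The tag-SNP selection criteria quoted in this abstract (r2 > 0.8, minor allele frequency > 5%) can be illustrated with a minimal sketch. It assumes genotypes are coded as 0/1/2 copies of one allele and uses the squared Pearson correlation of those dosages as a stand-in for haplotype-based r2, a common approximation when phase is unknown. The function names and default thresholds are hypothetical and are not taken from the study or from HapMap tooling.

```python
from statistics import mean


def minor_allele_frequency(dosages):
    """Frequency of the less common allele, given 0/1/2 allele counts per person."""
    allele_freq = sum(dosages) / (2 * len(dosages))
    return min(allele_freq, 1 - allele_freq)


def genotype_r2(a, b):
    """Squared Pearson correlation between two SNPs' dosage vectors."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_a * var_b)


def is_captured_by_tag(candidate, tag, r2_threshold=0.8, maf_threshold=0.05):
    """True if the candidate SNP is common enough and well proxied by the tag SNP."""
    return (minor_allele_frequency(candidate) > maf_threshold
            and genotype_r2(candidate, tag) > r2_threshold)
```

    Under these assumptions, a candidate SNP that passes `is_captured_by_tag` would not need to be genotyped separately, which is the rationale for haplotype tagging.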

  19. A pharmacogenetics study to predict outcome in patients receiving anti-VEGF therapy in age related macular degeneration

    PubMed Central

    Kitchens, John W; Kassem, Nawal; Wood, William; Stone, Thomas W; Isernhagen, Rick; Wood, Edward; Hancock, Brad A; Radovich, Milan; Waymire, Josh; Li, Lang; Schneider, Bryan P

    2013-01-01

    Purpose: To ascertain whether single nucleotide polymorphisms (SNPs) in the Vascular Endothelial Growth Factor (VEGFA), Complement Factor H (CFH), and LOC387715 genes could predict outcome to anti-VEGF therapy for patients with age-related macular degeneration (AMD). Methods: Patients with “wet” AMD were identified by chart review. Baseline optical coherence tomography (OCT) and visual acuity (VA) data, and at least 6 months of clinical follow-up after 3 initial monthly injections of bevacizumab or ranibizumab were required for inclusion. Based on OCT and VA, patients were categorized into two possible clinical outcomes: (a) responders and (b) non-responders. DNA was extracted from saliva and genotyped for candidate SNPs in the VEGFA, LOC387715, and CFH genes. Clinical outcomes were statistically compared to patient genotypes. Results: 101 patients were recruited, and one eye from each patient was included in the analysis. 97% of samples were successfully genotyped for all SNPs. We found a statistically significant association between the LOC387715 A69S TT genotype and outcome based on OCT. Conclusion: Genetic variation may be associated with outcome in patients receiving anti-VEGF therapy. PMID:24143065

  20. Development and Evaluation of a 9K SNP Array for Peach by Internationally Coordinated SNP Detection and Validation in Breeding Germplasm

    PubMed Central

    Scalabrin, Simone; Gilmore, Barbara; Lawley, Cynthia T.; Gasic, Ksenija; Micheletti, Diego; Rosyara, Umesh R.; Cattonaro, Federica; Vendramin, Elisa; Main, Dorrie; Aramini, Valeria; Blas, Andrea L.; Mockler, Todd C.; Bryant, Douglas W.; Wilhelm, Larry; Troggio, Michela; Sosinski, Bryon; Aranzana, Maria José; Arús, Pere; Iezzoni, Amy; Morgante, Michele; Peace, Cameron

    2012-01-01

    Although a large number of single nucleotide polymorphism (SNP) markers covering the entire genome are needed to enable molecular breeding efforts such as genome wide association studies, fine mapping, genomic selection and marker-assisted selection in peach [Prunus persica (L.) Batsch] and related Prunus species, only a limited number of genetic markers, including simple sequence repeats (SSRs), have been available to date. To address this need, an international consortium (The International Peach SNP Consortium; IPSC) has pursued a coordinated effort to perform genome-scale SNP discovery in peach using next generation sequencing platforms to develop and characterize a high-throughput Illumina Infinium® SNP genotyping array platform. We performed whole genome re-sequencing of 56 peach breeding accessions using the Illumina and Roche/454 sequencing technologies. Polymorphism detection algorithms identified a total of 1,022,354 SNPs. Validation with the Illumina GoldenGate® assay was performed on a subset of the predicted SNPs, verifying ∼75% of genic (exonic and intronic) SNPs, whereas only about a third of intergenic SNPs were verified. Conservative filtering was applied to arrive at a set of 8,144 SNPs that were included on the IPSC peach SNP array v1, distributed over all eight peach chromosomes with an average spacing of 26.7 kb between SNPs. Use of this platform to screen a total of 709 accessions of peach in two separate evaluation panels identified a total of 6,869 (84.3%) polymorphic SNPs. The almost 7,000 SNPs verified as polymorphic through extensive empirical evaluation represent an excellent source of markers for future studies in genetic relatedness, genetic mapping, and dissecting the genetic architecture of complex agricultural traits. The IPSC peach SNP array v1 is commercially available and we expect that it will be used worldwide for genetic studies in peach and related stone fruit and nut species. PMID:22536421

  1. Predicting stroke through genetic risk functions: The CHARGE risk score project

    PubMed Central

    Ibrahim-Verbaas, Carla A; Fornage, Myriam; Bis, Joshua C; Choi, Seung Hoan; Psaty, Bruce M; Meigs, James B; Rao, Madhu; Nalls, Mike; Fontes, Joao D; O’Donnell, Christopher J.; Kathiresan, Sekar; Ehret, Georg B.; Fox, Caroline S; Malik, Rainer; Dichgans, Martin; Schmidt, Helena; Lahti, Jari; Heckbert, Susan R; Lumley, Thomas; Rice, Kenneth; Rotter, Jerome I; Taylor, Kent D; Folsom, Aaron R; Boerwinkle, Eric; Rosamond, Wayne D; Shahar, Eyal; Gottesman, Rebecca F.; Koudstaal, Peter J; Amin, Najaf; Wieberdink, Renske G.; Dehghan, Abbas; Hofman, Albert; Uitterlinden, André G; DeStefano, Anita L.; Debette, Stephanie; Xue, Luting; Beiser, Alexa; Wolf, Philip A.; DeCarli, Charles; Ikram, M. Arfan; Seshadri, Sudha; Mosley, Thomas H; Longstreth, WT; van Duijn, Cornelia M; Launer, Lenore J

    2014-01-01

    Background and Purpose: Beyond the Framingham Stroke Risk Score (FSRS), prediction of future stroke may improve with a genetic risk score (GRS) based on single nucleotide polymorphisms (SNPs) associated with stroke and its risk factors. Methods: The study includes four population-based cohorts with 2,047 first incident strokes from 22,720 initially stroke-free European origin participants aged 55 years and older, who were followed for up to 20 years. GRS were constructed with 324 SNPs implicated in stroke and 9 risk factors. The association of the GRS to first incident stroke was tested using Cox regression; the GRS predictive properties were assessed with area under the curve (AUC) statistics comparing the GRS to age, sex, and FSRS models, and with reclassification statistics. These analyses were performed per cohort and in a meta-analysis of pooled data. Replication was sought in a case-control study of ischemic stroke (IS). Results: In the meta-analysis, adding the GRS to the FSRS, age and sex model resulted in a significant improvement in discrimination (all stroke: Δ joint AUC = 0.016, p-value = 2.3 × 10^-6; IS: Δ joint AUC = 0.021, p-value = 3.7 × 10^-7), although the overall AUC remained low. In all studies there was a highly significant improvement in the net reclassification index (p-values < 10^-4). Conclusions: The SNPs associated with stroke and its risk factors result only in a small improvement in prediction of future stroke compared to the classical epidemiological risk factors for stroke. PMID:24436238
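
    The two quantities this abstract relies on, a genetic risk score and the AUC used to judge discrimination, can be sketched as follows. This is an illustration only: it assumes the GRS is a weighted sum of risk-allele counts (the abstract does not state the weighting scheme) and computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control. All names and inputs are hypothetical.

```python
def genetic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts (0/1/2) across SNPs for one person."""
    return sum(d * w for d, w in zip(dosages, weights))


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count 1/2)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))
```

    An AUC of 0.5 corresponds to chance-level discrimination, so a change of about 0.02, as reported above, is a small gain even when it is statistically significant.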

  2. Genome wide analysis of flowering time trait in multiple environments via high-throughput genotyping technique in Brassica napus L.

    PubMed

    Li, Lun; Long, Yan; Zhang, Libin; Dalton-Morgan, Jessica; Batley, Jacqueline; Yu, Longjiang; Meng, Jinling; Li, Maoteng

    2015-01-01

    The prediction of the flowering time (FT) trait in Brassica napus based on genome-wide markers and the detection of underlying genetic factors is important not only for oilseed producers around the world but also for other crop industries in the rotation system in China. In previous studies, the low density and mixture of biomarkers used obstructed genomic selection in B. napus and comprehensive mapping of FT-related loci. In this study, a high-density genome-wide SNP set was genotyped from a double-haploid population of B. napus. We first performed genomic prediction of FT traits in B. napus using SNPs across the genome under ten environments of three geographic regions via eight existing genomic predictive models. The results showed that all the models achieved comparably high accuracies, verifying the feasibility of genomic prediction in B. napus. Next, we performed a large-scale mapping of FT-related loci among three regions, and found 437 associated SNPs, some of which represented known FT genes, such as AP1 and PHYE. The genes tagged by the associated SNPs were enriched in biological processes involved in the formation of flowers. Epistasis analysis showed that significant interactions were found between detected loci, even among some known FT-related genes. All the results showed that our large-scale and high-density genotype data are of great practical and scientific value for B. napus. To the best of our knowledge, this is the first evaluation of genomic selection models in B. napus based on a high-density SNP dataset and large-scale mapping of FT loci.

  3. Computational insights of K1444N substitution in GAP-related domain of NF1 gene associated with neurofibromatosis type 1 disease: a molecular modeling and dynamics approach.

    PubMed

    Agrahari, Ashish Kumar; Muskan, Meghana; George Priya Doss, C; Siva, R; Zayed, Hatem

    2018-05-27

    The NF1 gene encodes the neurofibromin protein, which is ubiquitously expressed, but most highly in the central nervous system. Non-synonymous SNPs (nsSNPs) in the NF1 gene were found to be associated with Neurofibromatosis Type 1 disease, which is characterized by the growth of tumors along nerves in the skin, brain, and other parts of the body. In this study, we used several in silico prediction tools to analyze 16 nsSNPs in the RAS-GAP domain of neurofibromin; the K1444N (K1423N) mutation was predicted as the most pathogenic. The comparative molecular dynamic simulation (MDS; 50 ns) between the wild type and the K1444N (K1423N) mutant suggested a significant change in the electrostatic potential. In addition, the RMSD, RMSF, Rg, hydrogen bonds, and PCA analysis confirmed the loss of flexibility and increase in compactness of the mutant protein. Further, SASA analysis revealed exchange between hydrophobic and hydrophilic residues from the core of the RAS-GAP domain to the surface of the mutant domain, consistent with the secondary structure analysis that showed significant alteration in the mutant protein conformation. Our data indicate that the K1444N (K1423N) mutation increases the rigidity and compactness of the protein. This study provides evidence of the benefits of the computational tools in predicting the pathogenicity of genetic mutations and suggests the application of MDS and different in silico prediction tools for variant assessment and classification in genetic clinics.

  4. The Effect on Melanoma Risk of Genes Previously Associated With Telomere Length

    PubMed Central

    Bishop, D. Timothy; Taylor, John C.; Hayward, Nicholas K.; Brossard, Myriam; Cust, Anne E.; Dunning, Alison M.; Lee, Jeffrey E.; Moses, Eric K.; Akslen, Lars A.; Andresen, Per A.; Avril, Marie-Françoise; Azizi, Esther; Scarrà, Giovanna Bianchi; Brown, Kevin M.; Dębniak, Tadeusz; Elder, David E.; Friedman, Eitan; Ghiorzo, Paola; Gillanders, Elizabeth M.; Goldstein, Alisa M.; Gruis, Nelleke A.; Hansson, Johan; Harland, Mark; Helsing, Per; Hočevar, Marko; Höiom, Veronica; Ingvar, Christian; Kanetsky, Peter A.; Landi, Maria Teresa; Lang, Julie; Lathrop, G. Mark; Lubiński, Jan; Mackie, Rona M.; Martin, Nicholas G.; Molven, Anders; Montgomery, Grant W.; Novaković, Srdjan; Olsson, Håkan; Puig, Susana; Puig-Butille, Joan Anton; Radford-Smith, Graham L.; Randerson-Moor, Juliette; van der Stoep, Nienke; van Doorn, Remco; Whiteman, David C.; MacGregor, Stuart; Pooley, Karen A.; Ward, Sarah V.; Mann, Graham J.; Amos, Christopher I.; Pharoah, Paul D. P.; Demenais, Florence; Law, Matthew H.; Newton Bishop, Julia A.; Barrett, Jennifer H.

    2014-01-01

    Telomere length has been associated with risk of many cancers, but results are inconsistent. Seven single nucleotide polymorphisms (SNPs) previously associated with mean leukocyte telomere length were either genotyped or well-imputed in 11108 case patients and 13933 control patients from Europe, Israel, the United States and Australia; four of the seven SNPs reached a P value under .05 (two-sided). A genetic score that predicts telomere length, derived from these seven SNPs, is strongly associated (P = 8.92 × 10^-9, two-sided) with melanoma risk. This demonstrates that the previously observed association between longer telomere length and increased melanoma risk is not attributable to confounding via shared environmental effects (such as ultraviolet exposure) or reverse causality. We provide the first proof that multiple germline genetic determinants of telomere length influence cancer risk. PMID:25231748

  5. Optical droplet vaporization of nanoparticle-loaded stimuli-responsive microbubbles

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Si, Ting; Department of Biomedical Engineering, The Ohio State University, Columbus, Ohio 43210; Li, Guangbin

    2016-03-14

    A capillary co-flow focusing process is developed to generate stimuli-responsive microbubbles (SRMs) that comprise perfluorocarbon (PFC) suspension of silver nanoparticles (SNPs) in a lipid shell. Upon continuous laser irradiation at around their surface plasmon resonance band, the SNPs effectively absorb electromagnetic energy, induce heat accumulation in SRMs, trigger PFC vaporization, and eventually lead to thermal expansion and fragmentation of the SRMs. This optical droplet vaporization (ODV) process is further simulated by a theoretical model that combines heat generation of SNPs, phase change of PFC, and thermal expansion of SRMs. The model is validated by benchtop experiments, where the ODV process is monitored by microscopic imaging. The effects of primary process parameters on behaviors of ODV are predicted by the theoretical model, indicating the technical feasibility for process control and optimization in future drug delivery applications.

  6. Polymorphisms in Host Immunity-Modulating Genes and Risk of Invasive Aspergillosis: Results from the AspBIOmics Consortium

    PubMed Central

    Lupiañez, C. B.; Canet, L. M.; Carvalho, A.; Alcazar-Fuoli, L.; Springer, J.; Lackner, M.; Segura-Catena, J.; Comino, A.; Olmedo, C.; Ríos, R.; Fernández-Montoya, A.; Cuenca-Estrella, M.; Solano, C.; López-Nevot, M. Á.; Cunha, C.; Oliveira-Coelho, A.; Villaescusa, T.; Fianchi, L.; Aguado, J. M.; Pagano, L.; López-Fernández, E.; Potenza, L.; Luppi, M.; Lass-Flörl, C.; Loeffler, J.; Einsele, H.; Vazquez, L.; Jurado, M.

    2015-01-01

    Recent studies suggest that immune-modulating single-nucleotide polymorphisms (SNPs) influence the risk of developing cancer-related infections. Here, we evaluated whether 36 SNPs within 14 immune-related genes are associated with the risk of invasive aspergillosis (IA) and whether genotyping of these variants might improve disease risk prediction. We conducted a case-control association study of 781 immunocompromised patients, 149 of whom were diagnosed with IA. Association analysis showed that the IL4R rs2107356 and IL8 rs2227307 SNPs (using dbSNP numbering) were associated with an increased risk of IA (IL4R rs2107356 odds ratio [OR], 1.92; 95% confidence interval [CI], 1.20 to 3.09; IL8 rs2227307 OR, 1.73; 95% CI, 1.06 to 2.81), whereas the IL12B rs3212227 and IFNγ rs2069705 variants were significantly associated with a decreased risk of developing the infection (IL12B rs3212227 OR, 0.60; 95% CI, 0.38 to 0.96; IFNγ rs2069705 OR, 0.63; 95% CI, 0.41 to 0.97). An allogeneic hematopoietic stem cell transplantation (allo-HSCT)-stratified analysis revealed that the effect observed for the IL4R rs2107356 and IFNγ rs2069705 SNPs was stronger in allo-HSCT (IL4R rs2107356 OR, 5.63; 95% CI, 1.20 to 3.09; IFNγ rs2069705 OR, 0.24; 95% CI, 0.10 to 0.59) than in non-HSCT patients, suggesting that the presence of these SNPs renders patients more vulnerable to infection, especially under severe and prolonged immunosuppressive conditions. Importantly, in vitro studies revealed that carriers of the IFNγ rs2069705 C allele showed a significantly increased macrophage-mediated neutralization of fungal conidia (P = 0.0003) and, under stimulation conditions, produced higher levels of gamma interferon (IFNγ) mRNA (P = 0.049) and IFNγ and tumor necrosis factor alpha (TNF-α) cytokines (P value for 96 h of treatment with lipopolysaccharide [P(LPS-96 h)], 0.057; P value for 96 h of treatment with phytohemagglutinin [P(PHA-96 h)], 0.036; P(LPS+PHA-96 h) = 0.030; P(PHA-72 h) = 0.045; P(LPS+PHA-72 h) = 0.018; P(LPS-96 h) = 0.058; P(LPS+PHA-96 h) = 0.0058). Finally, we also observed that the addition of SNPs significantly associated with IA to a model including clinical variables led to a substantial improvement in the discriminatory ability to predict disease (area under the receiver operating characteristic curve [AUC] of 0.659 versus AUC of 0.564; P(−2 log likelihood ratio test) = 5.2 × 10^-4 and P(50,000-permutation test) = 9.34 × 10^-5). These findings suggest that the IFNγ rs2069705 SNP influences the risk of IA and that predictive models built with IFNγ, IL8, IL12p70, and VEGFA variants can be used to predict disease risk and to implement risk-adapted prophylaxis or diagnostic strategies. PMID:26667837
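
    The odds ratios and 95% confidence intervals reported in this abstract follow a standard form that a short sketch can make concrete. The version below assumes an unadjusted 2x2 carrier-by-status table and a Wald interval on the log-odds scale; the study itself may have used adjusted regression models, and the function name and any counts passed to it are hypothetical rather than study data.

```python
import math


def odds_ratio_ci(case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2x2 table of counts."""
    odds_ratio = ((case_carriers * control_noncarriers)
                  / (case_noncarriers * control_carriers))
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(1 / case_carriers + 1 / case_noncarriers
                          + 1 / control_carriers + 1 / control_noncarriers)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

    Because the interval is symmetric on the log scale, the point estimate always lies between the two bounds, which is a quick sanity check to apply when reading reported OR and CI pairs.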

  7. Genetic regulation of gene expression in the lung identifies CST3 and CD22 as potential causal genes for airflow obstruction.

    PubMed

    Lamontagne, Maxime; Timens, Wim; Hao, Ke; Bossé, Yohan; Laviolette, Michel; Steiling, Katrina; Campbell, Joshua D; Couture, Christian; Conti, Massimo; Sherwood, Karen; Hogg, James C; Brandsma, Corry-Anke; van den Berge, Maarten; Sandford, Andrew; Lam, Stephen; Lenburg, Marc E; Spira, Avrum; Paré, Peter D; Nickle, David; Sin, Don D; Postma, Dirkje S

    2014-11-01

    COPD is a complex chronic disease with poorly understood pathogenesis. Integrative genomic approaches have the potential to elucidate the biological networks underlying COPD and lung function. We recently combined genome-wide genotyping and gene expression in 1111 human lung specimens to map expression quantitative trait loci (eQTL). To determine causal associations between COPD and lung function-associated single nucleotide polymorphisms (SNPs) and lung tissue gene expression changes in our lung eQTL dataset. We evaluated causality between SNPs and gene expression for three COPD phenotypes: FEV(1)% predicted, FEV(1)/FVC and COPD as a categorical variable. Different models were assessed in the three cohorts independently and in a meta-analysis. SNPs associated with a COPD phenotype and gene expression were subjected to causal pathway modelling and manual curation. In silico analyses evaluated functional enrichment of biological pathways among newly identified causal genes. Biologically relevant causal genes were validated in two separate gene expression datasets of lung tissues and bronchial airway brushings. High reliability causal relations were found in SNP-mRNA-phenotype triplets for FEV(1)% predicted (n=169) and FEV(1)/FVC (n=80). Several genes of potential biological relevance for COPD were revealed. eQTL-SNPs upregulating cystatin C (CST3) and CD22 were associated with worse lung function. Signalling pathways enriched with causal genes included xenobiotic metabolism, apoptosis, protease-antiprotease and oxidant-antioxidant balance. By using integrative genomics and analysing the relationships of COPD phenotypes with SNPs and gene expression in lung tissue, we identified CST3 and CD22 as potential causal genes for airflow obstruction. This study also augmented the understanding of previously described COPD pathways. Published by the BMJ Publishing Group Limited. 

  8. Evolution of the Bovine TLR Gene Family and Member Associations with Mycobacterium avium Subspecies paratuberculosis Infection

    PubMed Central

    Fisher, Colleen A.; Bhattarai, Eric K.; Osterstock, Jason B.; Dowd, Scot E.; Seabury, Paul M.; Vikram, Meenu; Whitlock, Robert H.; Schukken, Ynte H.; Schnabel, Robert D.; Taylor, Jeremy F.; Womack, James E.; Seabury, Christopher M.

    2011-01-01

    Members of the Toll-like receptor (TLR) gene family occupy key roles in the mammalian innate immune system by functioning as sentries for the detection of invading pathogens, thereafter provoking host innate immune responses. We utilized a custom next-generation sequencing approach and allele-specific genotyping assays to detect and validate 280 biallelic variants across all 10 bovine TLR genes, including 71 nonsynonymous single nucleotide polymorphisms (SNPs) and one putative nonsense SNP. Bayesian haplotype reconstructions and median joining networks revealed haplotype sharing between Bos taurus taurus and Bos taurus indicus breeds at every locus, and specialized beef and dairy breeds could not be differentiated despite an average polymorphism density of 1 marker/158 bp. Collectively, 160 tagSNPs and two tag insertion-deletion mutations (indels) were sufficient to predict 100% of the variation at 280 variable sites for both Bos subspecies and their hybrids, whereas 118 tagSNPs and 1 tagIndel predictively captured 100% of the variation at 235 variable sites for B. t. taurus. Polyphen and SIFT analyses of amino acid (AA) replacements encoded by bovine TLR SNPs indicated that up to 32% of the AA substitutions were expected to impact protein function. Classical and newly developed tests of diversity provide strong support for balancing selection operating on TLR3 and TLR8, and purifying selection acting on TLR10. An investigation of the persistence and continuity of linkage disequilibrium (r2 ≥ 0.50) between adjacent variable sites also supported the presence of selection acting on TLR3 and TLR8. A case-control study employing validated variants from bovine TLR genes recognizing bacterial ligands revealed six SNPs potentially eliciting small effects on susceptibility to Mycobacterium avium subsp. paratuberculosis infection in dairy cattle. The results of this study will broadly impact domestic cattle research by providing the necessary foundation to explore several avenues of bovine translational genomics, and the potential for marker-assisted vaccination. PMID:22164200

  9. A haplotype map of genomic variations and genome-wide association studies of agronomic traits in foxtail millet (Setaria italica).

    PubMed

    Jia, Guanqing; Huang, Xuehui; Zhi, Hui; Zhao, Yan; Zhao, Qiang; Li, Wenjun; Chai, Yang; Yang, Lifang; Liu, Kunyan; Lu, Hengyun; Zhu, Chuanrang; Lu, Yiqi; Zhou, Congcong; Fan, Danlin; Weng, Qijun; Guo, Yunli; Huang, Tao; Zhang, Lei; Lu, Tingting; Feng, Qi; Hao, Hangfei; Liu, Hongkuan; Lu, Ping; Zhang, Ning; Li, Yuhui; Guo, Erhu; Wang, Shujun; Wang, Suying; Liu, Jinrong; Zhang, Wenfei; Chen, Guoqiu; Zhang, Baojin; Li, Wei; Wang, Yongfang; Li, Haiquan; Zhao, Baohua; Li, Jiayang; Diao, Xianmin; Han, Bin

    2013-08-01

    Foxtail millet (Setaria italica) is an important grain crop that is grown in arid regions. Here we sequenced 916 diverse foxtail millet varieties, identified 2.58 million SNPs and used 0.8 million common SNPs to construct a haplotype map of the foxtail millet genome. We classified the foxtail millet varieties into two divergent groups that are strongly correlated with early and late flowering times. We phenotyped the 916 varieties under five different environments and identified 512 loci associated with 47 agronomic traits by genome-wide association studies. We performed a de novo assembly of deeply sequenced genomes of a Setaria viridis accession (the wild progenitor of S. italica) and an S. italica variety and identified complex interspecies and intraspecies variants. We also identified 36 selective sweeps that seem to have occurred during modern breeding. This study provides fundamental resources for genetics research and genetic improvement in foxtail millet.

  10. Potential role of polymorphisms in the transporter genes ENT1 and MATE1/OCT2 in predicting TAS-102 efficacy and toxicity in patients with refractory metastatic colorectal cancer.

    PubMed

    Suenaga, Mitsukuni; Schirripa, Marta; Cao, Shu; Zhang, Wu; Yang, Dongyun; Dadduzio, Vincenzo; Salvatore, Lisa; Borelli, Beatrice; Pietrantonio, Filippo; Ning, Yan; Okazaki, Satoshi; Berger, Martin D; Miyamoto, Yuji; Gopez, Roel; Barzi, Afsaneh; Yamaguchi, Toshiharu; Loupakis, Fotios; Lenz, Heinz-Josef

    2017-11-01

    Trifluridine (FTD) is an active cytotoxic component of the metastatic colorectal cancer (mCRC) drug TAS-102, and thymidine phosphorylase inhibitor (TPI) inhibits the rapid degradation of FTD. We tested whether single nucleotide polymorphisms (SNPs) in genes involved in FTD metabolism and TPI excretion could predict outcome in patients with mCRC treated with TAS-102. We investigated three different cohorts: a training cohort (n = 52) and a testing cohort (n = 129) both receiving TAS-102 and a control cohort (n = 52) receiving regorafenib. SNPs of TK1, ENT1, CNT1, MATE1, MATE2 and OCT2 were analysed by polymerase chain reaction-based direct DNA sequencing. In the training cohort, patients with any ENT1 rs760370 G allele had a significantly longer progression-free survival (PFS; 3.5 versus 2.1 months, respectively, hazard ratio [HR] 0.44, P = 0.004) and overall survival (OS; 8.7 versus 5.3 months, respectively, HR 0.27, P = 0.003) than the A/A genotype. These findings were validated in the testing cohort (P = 0.021 and 0.009 for PFS and OS, respectively). In addition, the combination of ENT1 rs760370, MATE1 rs2289669 and OCT2 rs316019 SNPs significantly stratified patients with the risk of PFS and OS in both cohorts (P < 0.001 for PFS and OS in the training cohort; P = 0.053 and 0.025 for PFS and OS, respectively, in the testing cohort). No significant differences were observed in the control group. The combination of ENT1, MATE1 and OCT2 SNPs may serve as a predictive and prognostic marker in mCRC patients treated with TAS-102.

  11. Covariance Between Genotypic Effects and its Use for Genomic Inference in Half-Sib Families

    PubMed Central

    Wittenburg, Dörte; Teuscher, Friedrich; Klosa, Jan; Reinsch, Norbert

    2016-01-01

    In livestock, current statistical approaches utilize extensive molecular data, e.g., single nucleotide polymorphisms (SNPs), to improve the genetic evaluation of individuals. The number of model parameters increases with the number of SNPs, so the multicollinearity between covariates can affect the results obtained using whole genome regression methods. In this study, dependencies between SNPs due to linkage and linkage disequilibrium among the chromosome segments were explicitly considered in methods used to estimate the effects of SNPs. The population structure affects the extent of such dependencies, so the covariance among SNP genotypes was derived for half-sib families, which are typical in livestock populations. Conditional on the SNP haplotypes of the common parent (sire), the theoretical covariance was determined using the haplotype frequencies of the population from which the individual parent (dam) was derived. The resulting covariance matrix was included in a statistical model for a trait of interest, and this covariance matrix was then used to specify prior assumptions for SNP effects in a Bayesian framework. The approach was applied to one family in simulated scenarios (few and many quantitative trait loci) and using semireal data obtained from dairy cattle to identify genome segments that affect performance traits, as well as to investigate the impact on predictive ability. Compared with a method that does not explicitly consider any of the relationship among predictor variables, the accuracy of genetic value prediction was improved by 10–22%. The results show that the inclusion of dependence is particularly important for genomic inference based on small sample sizes. PMID:27402363

  12. Inferring Geographic Coordinates of Origin for Europeans Using Small Panels of Ancestry Informative Markers

    PubMed Central

    Paschou, Peristera

    2010-01-01

    Recent large-scale studies of European populations have demonstrated the existence of population genetic structure within Europe and the potential to accurately infer individual ancestry when information from hundreds of thousands of genetic markers is used. In fact, when genomewide genetic variation of European populations is projected down to a two-dimensional Principal Components Analysis plot, a surprising correlation with actual geographic coordinates of self-reported ancestry has been reported. This substructure can hamper the search of susceptibility genes for common complex disorders leading to spurious correlations. The identification of genetic markers that can correct for population stratification becomes therefore of paramount importance. Analyzing 1,200 individuals from 11 populations genotyped for more than 500,000 SNPs (Population Reference Sample), we present a systematic exploration of the extent to which geographic coordinates of origin within Europe can be predicted, with small panels of SNPs. Markers are selected to correlate with the top principal components of the dataset, as we have previously demonstrated. Performing thorough cross-validation experiments we show that it is indeed possible to predict individual ancestry within Europe down to a few hundred kilometers from actual individual origin, using information from carefully selected panels of 500 or 1,000 SNPs. Furthermore, we show that these panels can be used to correctly assign the HapMap Phase 3 European populations to their geographic origin. The SNPs that we propose can prove extremely useful in a variety of different settings, such as stratification correction or genetic ancestry testing, and the study of the history of European populations. PMID:20805874

  13. Oceanographic variation influences spatial genomic structure in the sea scallop, Placopecten magellanicus.

    PubMed

    Van Wyngaarden, Mallory; Snelgrove, Paul V R; DiBacco, Claudio; Hamilton, Lorraine C; Rodríguez-Ezpeleta, Naiara; Zhan, Luyao; Beiko, Robert G; Bradbury, Ian R

    2018-03-01

    Environmental factors can influence diversity and population structure in marine species and accurate understanding of this influence can both improve fisheries management and help predict responses to environmental change. We used 7163 SNPs derived from restriction site-associated DNA sequencing genotyped in 245 individuals of the economically important sea scallop, Placopecten magellanicus, to evaluate the correlations between oceanographic variation and a previously identified latitudinal genomic cline. Sea scallops span a broad latitudinal area (>10 degrees), and we hypothesized that climatic variation significantly drives clinal trends in allele frequency. Using a large environmental dataset, including temperature, salinity, chlorophyll a, and nutrient concentrations, we identified a suite of SNPs (285-621, depending on analysis and environmental dataset) potentially under selection through correlations with environmental variation. Principal components analysis of different outlier SNPs and environmental datasets revealed similar northern and southern clusters, with significant associations between the first axes of each (adjusted R2 = .66-.79). Multivariate redundancy analysis of outlier SNPs and the environmental principal components indicated that environmental factors explained more than 32% of the variance. Similarly, multiple linear regressions and random-forest analysis identified winter average and minimum ocean temperatures as significant parameters in the link between genetic and environmental variation. This work indicates that oceanographic variation is associated with the observed genomic cline in this species and that seasonal periods of extreme cold may restrict gene flow along a latitudinal gradient in this marine benthic bivalve. Incorporating this finding into management may improve accuracy of management strategies and future predictions.

  14. Common Genetic Variation in CYP17A1 and Response to Abiraterone Acetate in Patients with Metastatic Castration-Resistant Prostate Cancer

    PubMed Central

    Binder, Moritz; Zhang, Ben Y.; Hillman, David W.; Kohli, Rhea; Kohli, Tanvi; Lee, Adam; Kohli, Manish

    2016-01-01

    Treatment with abiraterone acetate and prednisone (AA/P) prolongs survival in metastatic castration-resistant prostate cancer (mCRPC) patients. We evaluated the genetic variation in CYP17A1 as predictive of response to AA/P. A prospective collection of germline DNA prior to AA/P initiation and follow-up of a mCRPC cohort was performed. Five common single-nucleotide polymorphisms (SNPs) in CYP17A1 identified using a haplotype-based tagging algorithm were genotyped. Clinical outcomes included biochemical response and time to biochemical progression on AA/P. Logistic regression was used to assess the association between tag SNPs and biochemical response. Proportional hazards regression was used to assess the association between tag SNPs and time to biochemical progression. Odds or hazard ratio per minor allele were estimated and p-values below 0.05 were considered statistically significant. Germline DNA was successfully genotyped for four tag SNPs in 87 patients. The median age was 73 years (54–90); the median prostate-specific antigen was 66 ng/dL (0.1–99.9). A single SNP, rs2486758, was associated with lower odds of experiencing a biochemical response (Odds ratio 0.22, 95% confidence interval 0.07–0.63, p = 0.005) and a shorter time to biochemical progression (Hazard ratio 2.23, 95% confidence interval 1.39–3.56, p < 0.001). This tag SNP located in the promoter region of CYP17A1 will need further validation as a predictive biomarker for AA/P therapy. PMID:27409606

  15. Regionally clustered ABCC8 polymorphisms in a prospective cohort predict cerebral oedema and outcome in severe traumatic brain injury.

    PubMed

    Jha, Ruchira Menka; Koleck, Theresa A; Puccio, Ava M; Okonkwo, David O; Park, Seo-Young; Zusman, Benjamin E; Clark, Robert S B; Shutter, Lori A; Wallisch, Jessica S; Empey, Philip E; Kochanek, Patrick M; Conley, Yvette P

    2018-04-19

    ABCC8 encodes sulfonylurea receptor 1, a key regulatory protein of cerebral oedema in many neurological disorders including traumatic brain injury (TBI). Sulfonylurea-receptor-1 inhibition has been promising in ameliorating cerebral oedema in clinical trials. We evaluated whether ABCC8 tag single-nucleotide polymorphisms predicted oedema and outcome in TBI. DNA was extracted from 485 prospectively enrolled patients with severe TBI. 410 were analysed after quality control. ABCC8 tag single-nucleotide polymorphisms (SNPs) were identified (HapMap, r² >0.8, minor-allele frequency >0.20) and sequenced (iPlex-Gold, MassArray). Outcomes included radiographic oedema, intracranial pressure (ICP) and 3-month Glasgow Outcome Scale (GOS) score. Proxy SNPs, spatial modelling, amino acid topology and functional predictions were determined using established software programs. Wild-type rs7105832 and rs2237982 alleles and genotypes were associated with lower average ICP (β=-2.91, p=0.001; β=-2.28, p=0.003) and decreased radiographic oedema (OR 0.42, p=0.012; OR 0.52, p=0.017). Wild-type rs2237982 also increased favourable 3-month GOS (OR 2.45, p=0.006); this was partially mediated by oedema (p=0.03). Different polymorphisms predicted 3-month outcome: variant rs11024286 increased (OR 1.84, p=0.006) and wild-type rs4148622 decreased (OR 0.40, p=0.01) the odds of favourable outcome. Significant tag and concordant proxy SNPs regionally span introns/exons 2-15 of the 39-exon gene. This study identifies four ABCC8 tag SNPs associated with cerebral oedema and/or outcome in TBI, tagging a region including 33 polymorphisms. In polymorphisms predictive of oedema, variant alleles/genotypes confer increased risk. Different variant polymorphisms were associated with favourable outcome, potentially suggesting distinct mechanisms.
Significant polymorphisms spatially clustered flanking exons encoding the sulfonylurea receptor site and transmembrane domain 0/loop 0 (juxtaposing the channel pore/binding site). This, if validated, may help build a foundation for developing future strategies that may guide individualised care, treatment response, prognosis and patient selection for clinical trials.
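
    The tag-SNP thresholds quoted in this abstract (r² > 0.8, minor-allele frequency > 0.20) can be illustrated with a minimal sketch. The genotype vectors and function names below are hypothetical and are not taken from the study. As a simplifying assumption, r² is approximated here as the squared Pearson correlation of 0/1/2 genotype dosages, not computed from phased haplotypes.

```python
from statistics import mean


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0, 1 or 2 copies of one allele."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1 - freq)


def genotype_r2(a, b):
    """Squared Pearson correlation between two dosage vectors (an LD proxy)."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_a * var_b)


def is_tagged(tag, other, r2_min=0.8, maf_min=0.20):
    """True if `tag` is common enough and in strong LD with `other`."""
    return (minor_allele_frequency(tag) > maf_min
            and genotype_r2(tag, other) > r2_min)


# Hypothetical dosages for eight individuals at three SNPs.
snp_a = [0, 1, 2, 1, 0, 2, 1, 0]
snp_b = [0, 1, 2, 1, 0, 2, 1, 1]
snp_c = [2, 0, 1, 0, 2, 1, 0, 1]

print(is_tagged(snp_a, snp_b), is_tagged(snp_a, snp_c))
```

    In this toy data, snp_a would serve as a tag for snp_b but not for snp_c; a real analysis would rely on reference-panel haplotypes and dedicated LD software.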

  16. SNP Discovery for mapping alien introgressions in wheat

    PubMed Central

    2014-01-01

    Background Monitoring alien introgressions in crop plants is difficult due to the lack of genetic and molecular mapping information on the wild crop relatives. The tertiary gene pool of wheat is a very important source of genetic variability for wheat improvement against biotic and abiotic stresses. By exploring the 5Mg short arm (5MgS) of Aegilops geniculata, we can apply chromosome genomics for the discovery of SNP markers and their use for monitoring alien introgressions in wheat (Triticum aestivum L.). Results The short arm of chromosome 5Mg of Ae. geniculata Roth (syn. Ae. ovata L.; 2n = 4x = 28, UgUgMgMg) was flow-sorted from a wheat line in which it is maintained as a telocentric chromosome. DNA of the sorted arm was amplified and sequenced using an Illumina HiSeq 2000 with ~45x coverage. The sequence data was used for SNP discovery against wheat homoeologous group-5 assemblies. A total of 2,178 unique, 5MgS-specific SNPs were discovered. Randomly selected samples of 59 5MgS-specific SNPs were tested (44 by KASPar assay and 15 by Sanger sequencing) and 84% were validated. Of the selected SNPs, 97% mapped to a chromosome 5Mg addition to wheat (the source of t5MgS), and 94% to 5Mg introgressed from a different accession of Ae. geniculata substituting for chromosome 5D of wheat. The validated SNPs also identified chromosome segments of 5MgS origin in a set of T5D-5Mg translocation lines; eight SNPs (25%) mapped to TA5601 [T5DL · 5DS-5MgS(0.75)] and three (8%) to TA5602 [T5DL · 5DS-5MgS(0.95)]. SNPs (gsnp_5ms83 and gsnp_5ms94), tagging chromosome T5DL · 5DS-5MgS(0.95) with the smallest introgression carrying resistance to leaf rust (Lr57) and stripe rust (Yr40), were validated in two released germplasm lines with Lr57 and Yr40 genes. Conclusion This approach should be widely applicable for the identification of species/genome-specific SNPs.
The development of a large number of SNP markers will facilitate the precise introgression and monitoring of alien segments in crop breeding programs and further enable mapping and cloning novel genes from the wild relatives of crop plants. PMID:24716476

  17. Observational study to calculate addictive risk to opioids: a validation study of a predictive algorithm to evaluate opioid use disorder.

    PubMed

    Brenton, Ashley; Richeimer, Steven; Sharma, Maneesh; Lee, Chee; Kantorovich, Svetlana; Blanchard, John; Meshkin, Brian

    2017-01-01

    Opioid abuse in chronic pain patients is a major public health issue, with rapidly increasing addiction rates and deaths from unintentional overdose more than quadrupling since 1999. This study seeks to determine the predictability of aberrant behavior to opioids using the Proove Opioid Risk (POR) algorithm, a comprehensive scoring algorithm incorporating phenotypic risk factors and neuroscience-associated single-nucleotide polymorphisms (SNPs). In a validation study with 258 subjects with diagnosed opioid use disorder (OUD) and 650 controls who reported using opioids, the POR successfully categorized patients at high and moderate risks of opioid misuse or abuse with 95.7% sensitivity. Regardless of changes in the prevalence of opioid misuse or abuse, the sensitivity of POR remained >95%. The POR correctly stratifies patients into low-, moderate-, and high-risk categories to appropriately identify patients in need of additional guidance, monitoring, or treatment changes.

  18. SNPranker 2.0: a gene-centric data mining tool for diseases associated SNP prioritization in GWAS.

    PubMed

    Merelli, Ivan; Calabria, Andrea; Cozzi, Paolo; Viti, Federica; Mosca, Ettore; Milanesi, Luciano

    2013-01-01

    Correlating specific genotypes with human diseases remains a complex issue despite the advantages of high-throughput technologies, such as Genome Wide Association Studies (GWAS). New tools for interpreting genetic variants and for prioritizing Single Nucleotide Polymorphisms (SNPs) are currently needed. Given a list of the most relevant SNPs statistically associated with a specific pathology as the result of a genotype study, a critical issue is the identification of genes that are effectively related to the disease by re-scoring the importance of the identified genetic variations. Vice versa, given a list of genes, it can be of great importance to predict which SNPs can be involved in the onset of a particular disease, in order to focus the research on their effects. We propose a new bioinformatics approach to support biological data mining in the analysis and interpretation of SNPs associated with pathologies. This system can be employed to design custom genotyping chips for disease-oriented studies and to re-score GWAS results. The proposed method relies (1) on the data integration of public resources using a gene-centric database design, (2) on the evaluation of a set of static biomolecular annotations, defined as features, and (3) on the SNP scoring function, which computes SNP scores using parameters and weights set by users. We employed a machine learning classifier to set default feature weights and an ontological annotation layer to enable the enrichment of the input gene set. We implemented our method as a web tool called SNPranker 2.0 (http://www.itb.cnr.it/snpranker), improving our first published release of this system. A user-friendly interface allows users to input a list of genes, SNPs or a biological process, and to customize the feature set with relative weights. As a result, SNPranker 2.0 returns a list of SNPs, localized within input and ontologically enriched genes, combined with their prioritization scores.
Different databases and resources are already available for SNPs annotation, but they do not prioritize or re-score SNPs relying on a-priori biomolecular knowledge. SNPranker 2.0 attempts to fill this gap through a user-friendly integrated web resource. End users, such as researchers in medical genetics and epidemiology, may find in SNPranker 2.0 a new tool for data mining and interpretation able to support SNPs analysis. Possible scenarios are GWAS data re-scoring, SNPs selection for custom genotyping arrays and SNPs/diseases association studies.
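
    The idea of a scoring function that combines feature values with user-set weights can be sketched as a weighted sum followed by a sort. This is only an illustration of the general technique: the abstract does not give SNPranker 2.0's actual features, weights, or formula, so every feature name, weight, and SNP identifier below is hypothetical.

```python
def snp_score(features, weights):
    """Weighted sum of feature values; features without a weight count as 0."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())


def rank_snps(snp_features, weights):
    """Return (snp_id, score) pairs sorted from highest to lowest score."""
    scored = [(snp_id, snp_score(feats, weights))
              for snp_id, feats in snp_features.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical user-set weights and per-SNP annotations.
weights = {"in_coding_region": 2.0, "conservation": 1.0,
           "near_regulatory_site": 0.5}
snp_features = {
    "snp_A": {"in_coding_region": 1, "conservation": 0.9,
              "near_regulatory_site": 0},
    "snp_B": {"in_coding_region": 0, "conservation": 0.4,
              "near_regulatory_site": 1},
    "snp_C": {"in_coding_region": 0, "conservation": 0.1,
              "near_regulatory_site": 0},
}

ranking = rank_snps(snp_features, weights)
for snp_id, score in ranking:
    print(f"{snp_id}\t{score:.2f}")
```

    Changing the weights re-orders the list without touching the annotations, which is the practical appeal of keeping features and weights separate.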

  19. Functional polymorphisms of circadian negative feedback regulation genes are associated with clinical outcome in hepatocellular carcinoma patients receiving radical resection.

    PubMed

    Zhang, Zhaohui; Ma, Fei; Zhou, Feng; Chen, Yibing; Wang, Xiaoyan; Zhang, Hongxin; Zhu, Yong; Bi, Jianwei; Zhang, Yiguan

    2014-12-01

    Previous studies have demonstrated that circadian negative feedback loop genes play an important role in the development and progression of many cancers. However, the associations between single-nucleotide polymorphisms (SNPs) in these genes and the clinical outcomes of hepatocellular carcinoma (HCC) after surgical resection have not been studied so far. Thirteen functional SNPs in circadian genes were genotyped using the Sequenom iPLEX genotyping system in a cohort of 489 Chinese HCC patients who received radical resection. Multivariate Cox proportional hazards model and Kaplan-Meier curve were used for the prognosis analysis. Cumulative effect analysis and survival tree analysis were used for the multiple SNPs analysis. Four individual SNPs, including rs3027178 in PER1, rs228669 and rs2640908 in PER3 and rs3809236 in CRY1, were significantly associated with overall survival (OS) of HCC patients, and three SNPs, including rs3027178 in PER1, rs228729 in PER3 and rs3809236 in CRY1, were significantly associated with recurrence-free survival (RFS). Moreover, we observed a cumulative effect of significant SNPs on OS and RFS (P for trend < 0.001 for both). Survival tree analysis indicated that wild genotype of rs228729 in PER3 was the primary risk factor contributing to HCC patients' RFS. Our study suggests that the polymorphisms in circadian negative feedback loop genes may serve as independent prognostic biomarkers in predicting clinical outcomes for HCC patients who received radical resection. Further studies with different ethnicities are needed to validate our findings and generalize its clinical utility.

  20. Individual and cumulative effect of prostate cancer risk-associated variants on clinicopathologic variables in 5,895 prostate cancer patients.

    PubMed

    Kader, A Karim; Sun, Jielin; Isaacs, Sarah D; Wiley, Kathleen E; Yan, Guifang; Kim, Seong-Tae; Fedor, Helen; DeMarzo, Angelo M; Epstein, Jonathan I; Walsh, Patrick C; Partin, Alan W; Trock, Bruce; Zheng, S Lilly; Xu, Jianfeng; Isaacs, William

    2009-08-01

    More than a dozen single nucleotide polymorphisms (SNPs) have been associated with prostate cancer (PCa) risk from genome-wide association studies (GWAS). Their association with PCa aggressiveness and clinicopathologic variables is inconclusive. Twenty PCa risk SNPs implicated in GWAS and fine mapping studies were evaluated in 5,895 PCa cases treated by radical prostatectomy at Johns Hopkins Hospital, where each tumor was uniformly graded and staged using the same protocol. For 18 of the 20 SNPs examined, no statistically significant differences (P > 0.05) were observed in risk allele frequencies between patients with more aggressive (Gleason scores ≥4 + 3, or stage ≥T3b, or N+) or less aggressive disease (Gleason scores ≤3 + 4, and stage ≤T2, and N0). For the two SNPs that had significant differences between more and less aggressive disease, rs2735839 in KLK3 (P = 8.4 × 10⁻⁷) and rs10993994 in MSMB (P = 0.046), the alleles that are associated with increased risk for PCa were more frequent in patients with less aggressive disease. Since these SNPs are known to be associated with PSA levels in men without PCa diagnoses, these latter associations may reflect the enrichment of low grade, low stage cases diagnosed by contemporary disease screening with PSA. The vast majority of PCa risk-associated SNPs are not associated with aggressiveness and clinicopathologic variables of PCa. Correspondingly, they have minimal utility in predicting the risk for developing more or less aggressive forms of PCa.

  1. Effects of GWAS-Associated Genetic Variants on lncRNAs within IBD and T1D Candidate Loci

    PubMed Central

    Brorsson, Caroline A.; Pociot, Flemming

    2014-01-01

    Long non-coding RNAs are a new class of non-coding RNAs that are at the crosshairs in many human diseases such as cancers, cardiovascular disorders, and inflammatory and autoimmune diseases such as Inflammatory Bowel Disease (IBD) and Type 1 Diabetes (T1D). Nearly 90% of the phenotype-associated single-nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) lie outside of the protein coding regions, and map to the non-coding intervals. However, the relationship between phenotype-associated loci and the non-coding regions including the long non-coding RNAs (lncRNAs) is poorly understood. Here, we systematically identified all annotated IBD and T1D loci-associated lncRNAs, and mapped nominally significant GWAS/ImmunoChip SNPs for IBD and T1D within these lncRNAs. Additionally, we identified tissue-specific cis-eQTLs, and strong linkage disequilibrium (LD) signals associated with these SNPs. We explored sequence and structure based attributes of these lncRNAs, and also predicted the structural effects of mapped SNPs within them. We also identified lncRNAs in IBD and T1D that are under recent positive selection. Our analysis identified putative lncRNA secondary structure-disruptive SNPs within and in close proximity (+/−5 kb flanking regions) of IBD and T1D loci-associated candidate genes, suggesting that these RNA conformation-altering polymorphisms might be associated with the disease phenotype. Disruption of lncRNA secondary structure due to the presence of GWAS SNPs provides valuable information that could be potentially useful for future structure-function studies on lncRNAs. PMID:25144376

  2. Association between long non-coding RNA polymorphisms and cancer risk: a meta-analysis.

    PubMed

    Huang, Xin; Zhang, Weiyue; Shao, Zengwu

    2018-05-25

    Several studies have suggested that long non-coding RNA (lncRNA) gene polymorphisms are associated with cancer risk. In the present study, we conducted a meta-analysis of studies on the association between lncRNA single-nucleotide polymorphisms (SNPs) and the overall risk of cancer. A total of 12 SNPs in five common lncRNA genes were finally included in the meta-analysis. In the lncRNA antisense noncoding RNA in the INK4 locus (ANRIL), the rs1333048 A/C, rs4977574 A/G, and rs10757278 A/G polymorphisms, but not rs1333045 C/T, were correlated with overall cancer risk. Our study also demonstrated that other SNPs were correlated with overall cancer risk, namely, metastasis-associated lung adenocarcinoma transcript 1 (MALAT1, rs619586 A/G), HOXA distal transcript antisense RNA (HOTTIP, rs1859168 A/C) and highly up-regulated in liver cancer (HULC, rs7763881 A/C). Moreover, four prostate cancer-associated non-coding RNA 1 (PRNCR1, rs16901946 G/A, rs13252298 G/A, rs1016343 T/C, and rs1456315 G/A) SNPs were associated with cancer risk. No association was found between the PRNCR1 (rs7007694 C/T) SNP and the risk of cancer. In conclusion, our results suggest that several studied lncRNA SNPs are associated with overall cancer risk. Therefore, they might be potential predictive biomarkers for the risk of cancer. More studies based on larger sample sizes and more lncRNA SNPs are warranted to confirm these findings.

  3. Single-nucleotide polymorphisms g.151435C>T and g.173057T>C in PRLR gene regulated by bta-miR-302a are associated with litter size in goats.

    PubMed

    An, Xiaopeng; Hou, Jinxing; Gao, Teyang; Lei, Yingnan; Li, Guang; Song, Yuxuan; Wang, Jiangang; Cao, Binyun

    2015-06-01

    Single-nucleotide polymorphisms (SNPs) located at microRNA-binding sites (miR-SNPs) can affect the expression of genes. This study aimed to identify the miR-SNPs associated with litter size. Guanzhong (n = 321) and Boer (n = 191) goat breeds were used to detect SNPs in the caprine prolactin receptor (PRLR) gene by DNA sequencing, primer-introduced restriction analysis-polymerase chain reaction, and polymerase chain reaction-restriction fragment length polymorphism. Three novel SNPs (g.151435C>T, g.151454A>G, and g.173057T>C) were identified in the caprine PRLR gene. Statistical results indicated that the g.151435C>T and g.173057T>C SNPs were significantly associated with litter size in Guanzhong and Boer goat breeds. Further analysis revealed that combinative genotype C6 (TTAACC) was better than the others for litter size in both goat breeds. Furthermore, the PRLR g.173057T>C polymorphism was predicted to regulate the binding activity of bta-miR-302a. Luciferase reporter gene assay confirmed that 173057C to T substitution disrupted the binding site for bta-miR-302a, resulting in reduced levels of luciferase. Taken together, these findings suggested that bta-miR-302a can influence the expression of PRLR protein by binding with the 3' untranslated region, so that the g.173057T>C SNP had significant effects on litter size.

  4. Individual and cumulative effect of prostate cancer risk-associated variants on clinicopathologic variables in 5,895 prostate cancer patients

    PubMed Central

    Kader, A. Karim; Sun, Jielin; Isaacs, Sarah D.; Wiley, Kathleen E.; Yan, Guifang; Kim, Seong-Tae; Fedor, Helen; DeMarzo, Angelo M.; Epstein, Jonathan I.; Walsh, Patrick C.; Partin, Alan W.; Trock, Bruce; Zheng, S. Lilly; Xu, Jianfeng; Isaacs, William

    2009-01-01

    Background More than a dozen single nucleotide polymorphisms (SNPs) have been associated with prostate cancer (PCa) risk from genome-wide association studies (GWAS). Their association with PCa aggressiveness and clinicopathologic variables is inconclusive. Methods Twenty PCa risk SNPs implicated in GWAS and fine mapping studies were evaluated in 5,895 PCa cases treated by radical prostatectomy at Johns Hopkins Hospital, where each tumor was uniformly graded and staged using the same protocol. Results For 18 of the 20 SNPs examined, no statistically significant differences (P > 0.05) were observed in risk allele frequencies between patients with more aggressive (Gleason Scores ≥ 4+3, or stage ≥ T3b, or N+) or less aggressive disease (Gleason Scores ≤ 3+4, and stage ≤ T2, and N0). For the two SNPs that had significant differences between more and less aggressive disease, rs2735839 in KLK3 (P = 8.4 × 10⁻⁷) and rs10993994 in MSMB (P = 0.046), the alleles that are associated with increased risk for PCa were more frequent in patients with less aggressive disease. Since these SNPs are known to be associated with PSA levels in men without PCa diagnoses, these latter associations may reflect the enrichment of low grade, low stage cases diagnosed by contemporary disease screening with PSA. Conclusions The vast majority of PCa risk-associated SNPs are not associated with aggressiveness and clinicopathologic variables of PCa. Correspondingly, they have minimal utility in predicting the risk for developing more or less aggressive forms of PCa. PMID:19434657

  5. Bioinformatic analysis of ESTs collected by Sanger and pyrosequencing methods for a keystone forest tree species: oak

    PubMed Central

    2010-01-01

    Background The Fagaceae family comprises about 1,000 woody species worldwide. About half belong to the genus Quercus. These oaks are often a source of raw material for biomass wood and fiber. Pedunculate and sessile oaks are among the most important deciduous forest tree species in Europe. Despite their ecological and economical importance, very few genomic resources have yet been generated for these species. Here, we describe the development of an EST catalogue that will support ecosystem genomics studies, where geneticists, ecophysiologists, molecular biologists and ecologists join their efforts for understanding, monitoring and predicting functional genetic diversity. Results We generated 145,827 sequence reads from 20 cDNA libraries using the Sanger method. Unexploitable chromatograms and quality checking led us to eliminate 19,941 sequences. Finally a total of 125,925 ESTs were retained from 111,361 cDNA clones. Pyrosequencing was also conducted for 14 libraries, generating 1,948,579 reads, from which 370,566 sequences (19.0%) were eliminated, resulting in 1,578,192 sequences. Following clustering and assembly using the TGICL pipeline, 1,704,117 EST sequences collapsed into 69,154 tentative contigs and 153,517 singletons, providing 222,671 non-redundant sequences (including alternative transcripts). We also assembled the sequences using MIRA and PartiGene software and compared the three unigene sets. Gene ontology annotation was then assigned to 29,303 unigene elements. Blast search against the SWISS-PROT database revealed putative homologs for 32,810 (14.7%) unigene elements, but more extensive search with Pfam, Refseq_protein, Refseq_RNA and eight gene indices revealed homology for 67.4% of them. The EST catalogue was examined for putative homologs of candidate genes involved in bud phenology, cuticle formation, phenylpropanoids biosynthesis and cell wall formation. Our results suggest a good coverage of genes involved in these traits.
Comparative orthologous sequences (COS) with other plant gene models were identified and allow the oak paleo-history to be unraveled. Simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) were searched, resulting in 52,834 SSRs and 36,411 SNPs. All of these are available through the Oak Contig Browser http://genotoul-contigbrowser.toulouse.inra.fr:9092/Quercus_robur/index.html. Conclusions This genomic resource provides a unique tool to discover genes of interest, study the oak transcriptome, and develop new markers to investigate functional diversity in natural populations. PMID:21092232

  6. Modeling heterogeneous (co)variances from adjacent-SNP groups improves genomic prediction for milk protein composition traits.

    PubMed

    Gebreyesus, Grum; Lund, Mogens S; Buitenhuis, Bart; Bovenhuis, Henk; Poulsen, Nina A; Janss, Luc G

    2017-12-05

    Accurate genomic prediction requires a large reference population, which is problematic for traits that are expensive to measure. Traits related to milk protein composition are not routinely recorded due to costly procedures and are considered to be controlled by a few quantitative trait loci of large effect. The amount of variation explained may vary between regions leading to heterogeneous (co)variance patterns across the genome. Genomic prediction models that can efficiently take such heterogeneity of (co)variances into account can result in improved prediction reliability. In this study, we developed and implemented novel univariate and bivariate Bayesian prediction models, based on estimates of heterogeneous (co)variances for genome segments (BayesAS). Available data consisted of milk protein composition traits measured on cows and de-regressed proofs of total protein yield derived for bulls. Single-nucleotide polymorphisms (SNPs), from 50K SNP arrays, were grouped into non-overlapping genome segments. A segment was defined as one SNP, or a group of 50, 100, or 200 adjacent SNPs, or one chromosome, or the whole genome. Traditional univariate and bivariate genomic best linear unbiased prediction (GBLUP) models were also run for comparison. Reliabilities were calculated through a resampling strategy and using deterministic formula. BayesAS models improved prediction reliability for most of the traits compared to GBLUP models and this gain depended on segment size and genetic architecture of the traits. The gain in prediction reliability was especially marked for the protein composition traits β-CN, κ-CN and β-LG, for which prediction reliabilities were improved by 49 percentage points on average using the MT-BayesAS model with a 100-SNP segment size compared to the bivariate GBLUP. Prediction reliabilities were highest with the BayesAS model that uses a 100-SNP segment size. 
The bivariate versions of our BayesAS models resulted in extra gains of up to 6% in prediction reliability compared to the univariate versions. Substantial improvement in prediction reliability was possible for most of the traits related to milk protein composition using our novel BayesAS models. Grouping adjacent SNPs into segments provided enhanced information to estimate parameters and allowing the segments to have different (co)variances helped disentangle heterogeneous (co)variances across the genome.
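
    The grouping of ordered SNPs into non-overlapping segments of a fixed number of adjacent markers can be sketched as follows. This illustrates only the partitioning step, not the BayesAS model itself. The SNP identifiers are hypothetical, and allowing the final segment to be shorter than the others is an assumption, since the abstract does not say how leftover markers are handled.

```python
def segment_snps(snp_ids, segment_size):
    """Split an ordered list of SNPs into consecutive, non-overlapping groups."""
    if segment_size < 1:
        raise ValueError("segment_size must be at least 1")
    return [snp_ids[start:start + segment_size]
            for start in range(0, len(snp_ids), segment_size)]


# Hypothetical map-ordered SNPs on one chromosome.
snps = [f"snp_{i}" for i in range(1, 251)]

# The segment sizes named in the abstract: single SNPs, then 50, 100 and 200.
for size in (1, 50, 100, 200):
    segments = segment_snps(snps, size)
    print(size, len(segments), len(segments[-1]))
```

    A segment size equal to the chromosome length reproduces the "one chromosome" case from the abstract, and applying the function to the concatenated marker list corresponds to the "whole genome" case.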

  7. LincSNP 2.0: an updated database for linking disease-associated SNPs to human long non-coding RNAs and their TFBSs.

    PubMed

    Ning, Shangwei; Yue, Ming; Wang, Peng; Liu, Yue; Zhi, Hui; Zhang, Yan; Zhang, Jizhou; Gao, Yue; Guo, Maoni; Zhou, Dianshuang; Li, Xin; Li, Xia

    2017-01-04

    We describe LincSNP 2.0 (http://bioinfo.hrbmu.edu.cn/LincSNP), an updated database that is used specifically to store and annotate disease-associated single nucleotide polymorphisms (SNPs) in human long non-coding RNAs (lncRNAs) and their transcription factor binding sites (TFBSs). In LincSNP 2.0, we have updated the database with more data and several new features, including (i) expanding disease-associated SNPs in human lncRNAs; (ii) identifying disease-associated SNPs in lncRNA TFBSs; (iii) updating LD-SNPs from the 1000 Genomes Project; and (iv) collecting more experimentally supported SNP-lncRNA-disease associations. Furthermore, we developed three flexible online tools to retrieve and analyze the data. Linc-Mart is a convenient way for users to customize their own data. Linc-Browse is a tool for all data visualization. Linc-Score predicts the associations between lncRNA and disease. In addition, we provided users a newly designed, user-friendly interface to search and download all the data in LincSNP 2.0 and we also provided an interface to submit novel data into the database. LincSNP 2.0 is a continually updated database and will serve as an important resource for investigating the functions and mechanisms of lncRNAs in human diseases.

  8. Association of interactions between dietary salt consumption and hypertension-susceptibility genetic polymorphisms with blood pressure among Japanese male workers.

    PubMed

    Imaizumi, Takahiro; Ando, Masahiko; Nakatochi, Masahiro; Maruyama, Shoichi; Yasuda, Yoshinari; Honda, Hiroyuki; Kuwatsuka, Yachiyo; Kato, Sawako; Kondo, Takaaki; Iwata, Masamitsu; Nakashima, Toru; Yasui, Hiroshi; Takamatsu, Hideki; Okajima, Hiroshi; Yoshida, Yasuko; Matsuo, Seiichi

    2017-06-01

    Blood pressure is influenced by hereditary factors and dietary habits. The objective of this study was to examine the effect of dietary salt consumption and single-nucleotide polymorphisms (SNPs) on blood pressure (BP). This was a cross-sectional analysis of 2728 male participants who participated in a health examination in 2009. Average dietary salt consumption was estimated using electronically collected meal purchase data from a cafeteria. A multivariate analysis, adjusting for clinically relevant factors, was conducted to examine the effect on BP of salt consumption, SNPs, and the interaction between salt consumption and each SNP. This study examined the SNPs AGT rs699 (Met235Thr), ADD1 rs4961 (Gly460Trp), NPPA rs5063 (Val32Met), GPX1 rs1050450 (Pro198Leu), and AGTR1 rs5186 (A1166C) in relation to hypertension and salt sensitivity. BP was not significantly associated with SNPs or salt consumption. The interaction between salt consumption and SNPs with systolic BP showed a significant association in NPPA rs5063 (Val32Met) (P = 0.023) and a marginal trend toward significance in rs4961 and rs1050450 (P = 0.060 and 0.067, respectively). The effect of salt consumption on BP differed by genotype. Dietary salt consumption and genetic variation can predict a high risk of hypertension.

  9. Efficient SNP Discovery by Combining Microarray and Lab-on-a-Chip Data for Animal Breeding and Selection

    PubMed Central

    Huang, Chao-Wei; Lin, Yu-Tsung; Ding, Shih-Torng; Lo, Ling-Ling; Wang, Pei-Hwa; Lin, En-Chung; Liu, Fang-Wei; Lu, Yen-Wen

    2015-01-01

    The genetic markers associated with economic traits have been widely explored for animal breeding. Among these markers, single-nucleotide polymorphisms (SNPs) are gradually becoming a prevalent and effective evaluation tool. Because SNP analysis focuses only on the genetic sequences of interest, it reduces evaluation time and cost. Compared to traditional approaches, SNP genotyping techniques incorporate informative genetic background and improve breeding prediction accuracy and breeding quality on the farm. This article therefore reviews the typical procedures of animal breeding using SNPs and the current status of related techniques. The associated SNP information and genotyping techniques, including microarray and Lab-on-a-Chip based platforms, along with their potential are highlighted. Examples in pig and poultry with different SNP loci linked to high economic trait values are given. The recommendations for utilizing SNP genotyping in animal breeding are summarized. PMID:27600241

  10. The impact of low-frequency and rare variants on lipid levels

    PubMed Central

    Surakka, Ida; Horikoshi, Momoko; Mägi, Reedik; Sarin, Antti-Pekka; Mahajan, Anubha; Lagou, Vasiliki; Marullo, Letizia; Ferreira, Teresa; Miraglio, Benjamin; Timonen, Sanna; Kettunen, Johannes; Pirinen, Matti; Karjalainen, Juha; Thorleifsson, Gudmar; Hägg, Sara; Hottenga, Jouke-Jan; Isaacs, Aaron; Ladenvall, Claes; Beekman, Marian; Esko, Tõnu; Ried, Janina S; Nelson, Christopher P; Willenborg, Christina; Gustafsson, Stefan; Westra, Harm-Jan; Blades, Matthew; de Craen, Anton JM; de Geus, Eco J; Deelen, Joris; Grallert, Harald; Hamsten, Anders; Havulinna, Aki S.; Hengstenberg, Christian; Houwing-Duistermaat, Jeanine J; Hyppönen, Elina; Karssen, Lennart C; Lehtimäki, Terho; Lyssenko, Valeriya; Magnusson, Patrik KE; Mihailov, Evelin; Müller-Nurasyid, Martina; Mpindi, John-Patrick; Pedersen, Nancy L; Penninx, Brenda WJH; Perola, Markus; Pers, Tune H; Peters, Annette; Rung, Johan; Smit, Johannes H; Steinthorsdottir, Valgerdur; Tobin, Martin D; Tsernikova, Natalia; van Leeuwen, Elisabeth M; Viikari, Jorma S; Willems, Sara M; Willemsen, Gonneke; Schunkert, Heribert; Erdmann, Jeanette; Samani, Nilesh J; Kaprio, Jaakko; Lind, Lars; Gieger, Christian; Metspalu, Andres; Slagboom, P Eline; Groop, Leif; van Duijn, Cornelia M; Eriksson, Johan G; Jula, Antti; Salomaa, Veikko; Boomsma, Dorret I; Power, Christine; Raitakari, Olli T; Ingelsson, Erik; Järvelin, Marjo-Riitta; Stefansson, Kari; Franke, Lude; Ikonen, Elina; Kallioniemi, Olli; Pietiäinen, Vilja; Lindgren, Cecilia M; Thorsteinsdottir, Unnur; Palotie, Aarno; McCarthy, Mark I; Morris, Andrew P; Prokopenko, Inga; Ripatti, Samuli

    2016-01-01

    Using a genome-wide screen of 9.6 million genetic variants achieved through 1000 Genomes imputation in 62,166 samples, we identify association to lipids in 93 loci including 79 previously identified loci with new lead-SNPs, 10 new loci, 15 loci with a low-frequency lead-SNP, 10 loci with missense lead-SNPs, and 2 loci with an accumulation of rare variants. In six loci, SNPs with established function in lipid genetics (CELSR2, GCKR, LIPC, and APOE), or candidate missense mutations with predicted damaging function (CD300LG and TM6SF2), explained the locus associations. The low-frequency variants increased the proportion of variance explained, particularly for LDL-C and TC. Altogether, our results highlight the impact of low-frequency variants in complex traits and show that imputation offers a cost-effective alternative to re-sequencing. PMID:25961943

  11. Single nucleotide polymorphisms associated with coronary heart disease predict incident ischemic stroke in the atherosclerosis risk in communities study.

    PubMed

    Morrison, Alanna C; Bare, Lance A; Luke, May M; Pankow, James S; Mosley, Thomas H; Devlin, James J; Willerson, James T; Boerwinkle, Eric

    2008-01-01

    Ischemic stroke and coronary heart disease (CHD) may share genetic factors contributing to a common etiology. This study investigates whether 51 single nucleotide polymorphisms (SNPs) associated with CHD in multiple antecedent studies are associated with incident ischemic stroke in the Atherosclerosis Risk in Communities (ARIC) study. From the multiethnic ARIC cohort of 14,215 individuals, 495 validated ischemic strokes were identified. Cox proportional hazards models, adjusted for age and gender, identified three SNPs in Whites and two SNPs in Blacks associated with incident stroke (p

  12. Genetic polymorphism in ATG16L1 gene influences the response to adalimumab in Crohn's disease patients.

    PubMed

    Koder, Silvo; Repnik, Katja; Ferkolj, Ivan; Pernat, Cvetka; Skok, Pavel; Weersma, Rinse K; Potočnik, Uroš

    2015-01-01

    This study examined whether SNPs could help predict response to biological therapy with adalimumab (ADA) in Crohn's disease (CD). IBDQ index and CRP levels were used to monitor therapy response. We genotyped 31 CD-associated genes in 102 Slovenian CD patients. The strongest association with treatment response, defined as a decrease in CRP levels, was found for ATG16L1 SNP rs10210302. Additional SNPs in 7 out of 31 tested CD-associated genes (PTGER4, CASP9, IL27, C11orf30, CCNY, IL13, NR1I2) showed suggestive association with ADA response. Our results suggest that ADA response in CD patients is influenced by SNPs in CD risk genes and point to ATG16L1 as the most promising candidate gene for drug response in ADA treatment. Original submitted 24 September 2014; Revision submitted 1 December 2014.

  13. Physical mapping of QTL for tuber yield, starch content and starch yield in tetraploid potato (Solanum tuberosum L.) by means of genome wide genotyping by sequencing and the 8.3 K SolCAP SNP array.

    PubMed

    Schönhals, Elske Maria; Ding, Jia; Ritter, Enrique; Paulo, Maria João; Cara, Nicolás; Tacke, Ekhard; Hofferbert, Hans-Reinhard; Lübeck, Jens; Strahwald, Josef; Gebhardt, Christiane

    2017-08-22

    Tuber yield and starch content of the cultivated potato are complex traits of decisive importance for breeding improved varieties. Natural variation of tuber yield and starch content depends on the environment and on multiple, mostly unknown genetic factors. Dissection and molecular identification of the genes and their natural allelic variants controlling these complex traits will lead to the development of diagnostic DNA-based markers, by which precision and efficiency of selection can be increased (precision breeding). Three case-control populations were assembled from tetraploid potato cultivars based on maximizing the differences between high and low tuber yield (TY), starch content (TSC) and starch yield (TSY, arithmetic product of TY and TSC). The case-control populations were genotyped by restriction-site associated DNA sequencing (RADseq) and the 8.3 k SolCAP SNP genotyping array. The allele frequencies of single nucleotide polymorphisms (SNPs) were compared between cases and controls. RADseq identified, depending on data filtering criteria, between 6664 and 450 genes with one or more differential SNPs for one, two or all three traits. Differential SNPs in 275 genes were detected using the SolCAP array. A genome wide association study using the SolCAP array on an independent, unselected population identified SNPs associated with tuber starch content in 117 genes. Physical mapping of the genes containing differential or associated SNPs, and comparisons between the two genome wide genotyping methods and two different populations identified genome segments on all twelve potato chromosomes harboring one or more quantitative trait loci (QTL) for TY, TSC and TSY. Several hundred genes control tuber yield and starch content in potato. They are unequally distributed on all potato chromosomes, forming clusters between 0.5-4 Mbp width. The largest fraction of these genes had unknown function, followed by genes with putative signalling and regulatory functions. 
    The genetic control of tuber yield and starch content is interlinked. Most differential SNPs affecting both traits had antagonistic effects: The allele increasing TY decreased TSC and vice versa. Exceptions were 89 SNP alleles which had synergistic effects on TY, TSC and TSY. These and the corresponding genes are primary targets for developing diagnostic markers.
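
    The case-control comparison of allele frequencies described in this record can be illustrated with a minimal sketch. It assumes each tetraploid cultivar is summarised by an allele dosage between 0 and 4 for one SNP; the function names and the dosage values are hypothetical, and the abstract does not state the thresholds the authors used to call a SNP "differential".

```python
PLOIDY = 4  # cultivated potato is tetraploid


def allele_frequency(dosages, ploidy=PLOIDY):
    """Frequency of the alternate allele, given per-individual dosages (0..ploidy)."""
    if not dosages:
        raise ValueError("need at least one genotyped individual")
    if any(d < 0 or d > ploidy for d in dosages):
        raise ValueError("dosage outside 0..ploidy")
    return sum(dosages) / (ploidy * len(dosages))


def frequency_difference(case_dosages, control_dosages, ploidy=PLOIDY):
    """Case frequency minus control frequency for one SNP."""
    return (allele_frequency(case_dosages, ploidy)
            - allele_frequency(control_dosages, ploidy))


# Hypothetical dosages for one SNP (not data from the study).
cases = [4, 3, 3, 2, 4]      # e.g. high-starch cultivars
controls = [1, 0, 2, 1, 1]   # e.g. low-starch cultivars

print(f"case frequency:    {allele_frequency(cases):.2f}")
print(f"control frequency: {allele_frequency(controls):.2f}")
print(f"difference:        {frequency_difference(cases, controls):+.2f}")
```

    A real analysis would add a significance test and a correction for the number of SNPs examined; the sketch only shows how a frequency is derived from tetraploid dosages.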

  14. A New Single Nucleotide Polymorphism Database for Rainbow Trout Generated Through Whole Genome Resequencing.

    PubMed

    Gao, Guangtu; Nome, Torfinn; Pearse, Devon E; Moen, Thomas; Naish, Kerry A; Thorgaard, Gary H; Lien, Sigbjørn; Palti, Yniv

    2018-01-01

    Single-nucleotide polymorphisms (SNPs) are highly abundant markers, which are broadly distributed in animal genomes. For rainbow trout (Oncorhynchus mykiss), SNP discovery has been previously done through sequencing of restriction-site associated DNA (RAD) libraries, reduced representation libraries (RRL) and RNA sequencing. Recently we have performed high coverage whole genome resequencing with 61 unrelated samples, representing a wide range of rainbow trout and steelhead populations, with 49 new samples added to 12 aquaculture samples from AquaGen (Norway) that we previously used for SNP discovery. Of the 49 new samples, 11 were doubled haploid lines from Washington State University (WSU) and 38 represented wild and hatchery populations from a wide range of geographic distribution and with divergent migratory phenotypes. We then mapped the sequences to the new rainbow trout reference genome assembly (GCA_002163495.1) which is based on the Swanson YY doubled haploid line. Variant calling was conducted with FreeBayes and SAMtools mpileup, followed by filtering of SNPs based on quality score, sequence complexity, read depth on the locus, and number of genotyped samples. Results from the two variant calling programs were compared and genotypes of the doubled haploid samples were used for detecting and filtering putative paralogous sequence variants (PSVs) and multi-sequence variants (MSVs). Overall, 30,302,087 SNPs were identified on the 29 chromosomes of the rainbow trout genome and 1,139,018 on unplaced scaffolds, with 4,042,723 SNPs having high minor allele frequency (MAF > 0.25). The average SNP density on the chromosomes was one SNP per 64 bp, or 15.6 SNPs per 1 kb. Results from the phylogenetic analysis that we conducted indicate that the SNP markers contain enough population-specific polymorphisms for recovering population relationships despite the small sample size used.
    Intra-population polymorphism assessment revealed a high level of polymorphism and heterozygosity within each population. We also provide functional annotation based on the genome position of each SNP and evaluate the use of clonal lines for filtering of PSVs and MSVs. These SNPs form a new database, which provides an important resource for a new high density SNP array design and for other SNP genotyping platforms used for genetic and genomics studies of this iconic salmonid fish species.
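
    The two density figures quoted in this record (one SNP per 64 bp, 15.6 SNPs per 1 kb) are the same quantity in different units. The short sketch below checks the conversion; the function names are illustrative, and the "implied span" is only what the rounded figures give, not a reported assembly size.

```python
# Figures quoted in the abstract above.
SNPS_ON_CHROMOSOMES = 30_302_087
BP_PER_SNP = 64  # "one SNP per 64 bp" (a rounded value)


def snps_per_kb(bp_per_snp: float) -> float:
    """Convert 'one SNP every N bp' into SNPs per 1,000 bp."""
    return 1000.0 / bp_per_snp


def implied_span_bp(n_snps: int, bp_per_snp: float) -> float:
    """Sequence length implied by a SNP count and an average spacing."""
    return n_snps * bp_per_snp


print(f"{snps_per_kb(BP_PER_SNP):.1f} SNPs per kb")  # 15.6
span_gb = implied_span_bp(SNPS_ON_CHROMOSOMES, BP_PER_SNP) / 1e9
print(f"about {span_gb:.2f} Gb of chromosome sequence implied")
```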

  15. Preliminary evidence of an interaction between the FOXP2 gene and childhood emotional abuse predicting likelihood of auditory verbal hallucinations in schizophrenia.

    PubMed

    McCarthy-Jones, Simon; Green, Melissa J; Scott, Rodney J; Tooney, Paul A; Cairns, Murray J; Wu, Jing Qin; Oldmeadow, Christopher; Carr, Vaughan

    2014-03-01

    The FOXP2 gene is involved in the development of speech and language. As some single nucleotide polymorphisms (SNPs) of FOXP2 have been found to be associated with auditory verbal hallucinations (AVHs) at trend levels, this study set out to undertake the first examination into whether interactions between candidate FOXP2 SNPs and environmental factors (specifically, child abuse) predict the likelihood of AVHs. Data on parental child abuse and FOXP2 SNPs previously linked to AVHs (rs1456031, rs2396753, rs2253478) were obtained from the Australian Schizophrenia Research Bank for people with schizophrenia-spectrum disorders, both with (n = 211) and without (n = 122) a lifetime history of AVHs. Genotypic frequencies did not differ between the two groups; however, logistic regression found that childhood parental emotional abuse (CPEA) interacted with rs1456031 to predict lifetime experience of AVH. CPEA was only associated with significantly higher levels of AVHs in people with CC genotypes (odds ratio = 4.25), yet in the absence of CPEA, people with TT genotypes had significantly higher levels of AVHs than people with CC genotypes (odds ratio = 4.90). This interaction was specific to auditory verbal hallucinations, and did not predict the likelihood of non-verbal auditory hallucinations. Our findings offer tentative evidence that FOXP2 may be a susceptibility gene for AVHs, influencing the probability people experience AVHs in the presence and absence of CPEA. However, these findings are in need of replication in a larger study that addresses the methodological limitations of the present investigation. Copyright © 2013 Elsevier Ltd. All rights reserved.

  16. Clinical Utility of Five Genetic Variants for Predicting Prostate Cancer Risk and Mortality

    PubMed Central

    Salinas, Claudia A.; Koopmeiners, Joseph S.; Kwon, Erika M.; FitzGerald, Liesel; Lin, Daniel W.; Ostrander, Elaine A.; Feng, Ziding; Stanford, Janet L.

    2009-01-01

    Background A recent report suggests that the combination of five single-nucleotide polymorphisms (SNPs) at 8q24, 17q12, 17q24.3 and a family history of the disease may predict risk of prostate cancer. The present study tests the performance of these factors in prediction models for prostate cancer risk and prostate cancer-specific mortality. Methods SNPs were genotyped in population-based samples from Caucasians in King County, Washington. Incident cases (n=1308), aged 35–74, were compared to age-matched controls (n=1266) using logistic regression to estimate odds ratios (OR) associated with genotypes and family history. Cox proportional hazards models estimated hazard ratios for prostate cancer-specific mortality according to genotypes. Results The combination of SNP genotypes and family history was significantly associated with prostate cancer risk (p-trend = 1.5 × 10^-20). Men with ≥ five risk factors had an OR of 4.9 (95% CI 1.6 to 18.5) compared to men with none. However, this combination of factors did not improve the ROC curve after accounting for known risk predictors (i.e., age, serum PSA, family history). Neither the individual nor combined risk factors was associated with prostate cancer-specific mortality. Conclusion Genotypes for five SNPs plus family history are associated with a significant elevation in risk for prostate cancer and may explain up to 45% of prostate cancer in our population. However, they do not improve prediction models for assessing who is at risk of getting or dying from the disease, once known risk or prognostic factors are taken into account. Thus, this SNP panel may have limited clinical utility. PMID:19058137
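
    For readers unfamiliar with how an odds ratio and its 95% confidence interval relate to case and control counts, the sketch below computes an unadjusted odds ratio from a 2×2 table with the standard log-scale (Woolf) interval. This is not the age-matched logistic regression used in the study, and the counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    Returns (odds_ratio, lower, upper). All four counts must be positive.
    """
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (not from the study): carriers of many risk factors
# versus carriers of none, among cases and controls.
or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {or_:.1f}, 95% CI {lo:.2f} to {hi:.2f}")
```

    Small cell counts widen the interval sharply, which is one reason an odds ratio can be large and still have a broad confidence interval.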

  17. Single Nucleotide Polymorphisms Predict Symptom Severity of Autism Spectrum Disorder

    ERIC Educational Resources Information Center

    Jiao, Yun; Chen, Rong; Ke, Xiaoyan; Cheng, Lu; Chu, Kangkang; Lu, Zuhong; Herskovits, Edward H.

    2012-01-01

    Autism is widely believed to be a heterogeneous disorder; diagnosis is currently based solely on clinical criteria, although genetic, as well as environmental, influences are thought to be prominent factors in the etiology of most forms of autism. Our goal is to determine whether a predictive model based on single-nucleotide polymorphisms (SNPs)…

  18. Genomic prediction in bi-parental tropical maize populations in water-stressed and well-watered environments using low density and GBS SNPs

    USDA-ARS's Scientific Manuscript database

    One of the most important applications of genomic selection in maize breeding is to predict and identify the best-untested individuals from bi-parental populations, when the training and validation sets are derived from the same cross. Nineteen tropical maize bi-parental populations evaluated in mul...

  19. PARK2 polymorphisms predict disease progression in patients infected with hepatitis C virus.

    PubMed

    Al-Qahtani, Ahmed A; Al-Anazi, Mashael R; Al-Zoghaibi, Fahad A; Abdo, Ayman A; Sanai, Faisal M; Al-Hamoudi, Waleed K; Alswat, Khalid A; Al-Ashgar, Hamad I; Khan, Mohammed Q; Albenmousa, Ali; Khalak, Hanif; Al-Ahdal, Mohammed N

    Background. The protein encoded by the PARK2 gene is a component of the ubiquitin-proteasome system that mediates targeting of proteins for the degradation pathway. Genetic variations in the PARK2 gene have been linked to various diseases including leprosy, typhoid and cancer. The present study investigated the association of single nucleotide polymorphisms (SNPs) in the PARK2 gene with the development of hepatitis C virus (HCV) infection and its progression to severe liver diseases. A total of 800 subjects, including 400 normal healthy subjects and 400 HCV-infected patients, were analyzed in this study. The patients were classified as chronic HCV patients (group I), patients with cirrhosis (group II) and patients with hepatocellular carcinoma (HCC) in the context of cirrhosis (group III). DNA was extracted and was genotyped for the SNPs rs10945859, rs2803085, rs2276201 and rs1931223. Among these SNPs, the CT genotype of rs10945859 was significantly associated with the clinical progression of chronic HCV infection to cirrhosis alone (OR = 1.850; 95% CI 1.115-3.069; p = 0.016) or cirrhosis and HCC (OR = 1.768; 95% CI 1.090-2.867; p = 0.020). SNP rs10945859 in the PARK2 gene could prove useful in predicting the clinical outcome in HCV-infected patients.

  20. The genomic architecture and association genetics of adaptive characters using a candidate SNP approach in boreal black spruce

    PubMed Central

    2013-01-01

    Background The genomic architecture of adaptive traits remains poorly understood in non-model plants. Various approaches can be used to bridge this gap, including the mapping of quantitative trait loci (QTL) in pedigrees, and genetic association studies in non-structured populations. Here we present results on the genomic architecture of adaptive traits in black spruce, which is a widely distributed conifer of the North American boreal forest. As an alternative to the usual candidate gene approach, a candidate SNP approach was developed for association testing. Results A genetic map containing 231 gene loci was used to identify QTL that were related to budset timing and to tree height assessed over multiple years and sites. Twenty-two unique genomic regions were identified, including 20 that were related to budset timing and 6 that were related to tree height. From results of outlier detection and bulk segregant analysis for adaptive traits using DNA pool sequencing of 434 genes, 52 candidate SNPs were identified and subsequently tested in genetic association studies for budset timing and tree height assessed over multiple years and sites. A total of 34 (65%) SNPs were significantly associated with budset timing, or tree height, or both. Although the percentages of explained variance (PVE) by individual SNPs were small, several significant SNPs were shared between sites and among years. Conclusions The sharing of genomic regions and significant SNPs between budset timing and tree height indicates pleiotropic effects. Significant QTLs and SNPs differed quite greatly among years, suggesting that different sets of genes for the same characters are involved at different stages in the tree’s life history. The functional diversity of genes carrying significant SNPs and low observed PVE further indicated that a large number of polymorphisms are involved in adaptive genetic variation. 
    Accordingly, for undomesticated species such as black spruce with natural populations of large effective size and low linkage disequilibrium, efficient marker systems that are predictive of adaptation should require the survey of large numbers of SNPs. Candidate SNP approaches like the one developed in the present study could contribute to reducing these numbers. PMID:23724860

  1. Drug Metabolizing Enzyme and Transporter Gene Variation, Nicotine Metabolism, Prospective Abstinence, and Cigarette Consumption

    PubMed Central

    Bergen, Andrew W.; Michel, Martha; Nishita, Denise; Krasnow, Ruth; Javitz, Harold S.; Conneely, Karen N.; Lessov-Schlaggar, Christina N.; Hops, Hyman; Zhu, Andy Z. X.; Baurley, James W.; McClure, Jennifer B.; Hall, Sharon M.; Baker, Timothy B.; Conti, David V.; Benowitz, Neal L.; Lerman, Caryn; Tyndale, Rachel F.; Swan, Gary E.

    2015-01-01

    The Nicotine Metabolite Ratio (NMR, ratio of trans-3’-hydroxycotinine and cotinine), has previously been associated with CYP2A6 activity, response to smoking cessation treatments, and cigarette consumption. We searched for drug metabolizing enzyme and transporter (DMET) gene variation associated with the NMR and prospective abstinence in 2,946 participants of laboratory studies of nicotine metabolism and of clinical trials of smoking cessation therapies. Stage I was a meta-analysis of the association of 507 common single nucleotide polymorphisms (SNPs) at 173 DMET genes with the NMR in 449 participants of two laboratory studies. Nominally significant associations were identified in ten genes after adjustment for intragenic SNPs; CYP2A6 and two CYP2A6 SNPs attained experiment-wide significance adjusted for correlated SNPs (CYP2A6 P_ACT = 4.1E-7; rs4803381 P_ACT = 4.5E-5; rs1137115 P_ACT = 1.2E-3). Stage II was mega-regression analyses of 10 DMET SNPs with pretreatment NMR and prospective abstinence in up to 2,497 participants from eight trials. rs4803381 and rs1137115 SNPs were associated with pretreatment NMR at genome-wide significance. In post-hoc analyses of CYP2A6 SNPs, we observed nominally significant association with: abstinence in one pharmacotherapy arm; cigarette consumption among all trial participants; and lung cancer in four case:control studies. CYP2A6 minor alleles were associated with reduced NMR, CPD, and lung cancer risk. We confirmed the major role that CYP2A6 plays in nicotine metabolism, and made novel findings with respect to genome-wide significance and associations with CPD, abstinence and lung cancer risk. Additional multivariate analyses with patient variables and genetic modeling will improve prediction of nicotine metabolism, disease risk and smoking cessation treatment prognosis. PMID:26132489

  2. The Impact of Polymorphic Variations in the 5p15, 6p12, 6p21 and 15q25 Loci on the Risk and Prognosis of Portuguese Patients with Non-Small Cell Lung Cancer

    PubMed Central

    de Mello, Ramon Andrade; Ferreira, Mónica; Soares-Pires, Filipa; Costa, Sandra; Cunha, João; Oliveira, Pedro; Hespanhol, Venceslau; Reis, Rui Manuel

    2013-01-01

    Introduction Polymorphic variants in the 5p15, 6p12, 6p21, and 15q25 loci were demonstrated to potentially contribute to lung cancer carcinogenesis. Therefore, this study was performed to assess the role of those variants in non-small cell lung cancer (NSCLC) risk and prognosis in a Portuguese population. Materials and Methods Blood from patients with NSCLC was prospectively collected. To perform an association study, DNA from these patients and healthy controls were genotyped for a panel of 19 SNPs using a Sequenom® MassARRAY platform. Kaplan-Meier curves were used to assess the overall survival (OS) and progression-free survival (PFS). Results One hundred and forty-four patients with NSCLC were successfully consecutively genotyped for the 19 SNPs. One SNP was associated with NSCLC risk: rs9295740 G/A. Two SNPs were associated with non-squamous histology: rs3024994 (VEGF intron 2) T/C and rs401681 C/T. Three SNPs were associated with response rate: rs3025035 (VEGF intron 7) C/T, rs833061 (VEGF –460) C/T and rs9295740 G/A. One SNP demonstrated an influence on PFS: rs401681 C/T at 5p15, p = 0.021. Four SNPs demonstrated an influence on OS: rs2010963 (VEGF +405 G/C), p = 0.042; rs3025010 (VEGF intron 5 C/T), p = 0.047; rs401681 C/T at 5p15, p = 0.046; and rs31489 C/A at 5p15, p = 0.029. Conclusions Our study suggests that SNPs in the 6p12, 6p21, and 5p15 loci may serve as risk, predictive and prognostic NSCLC biomarkers. In the future, SNPs identified in the genomes of patients may improve NSCLC screening strategies and therapeutic management as well. PMID:24039754

  3. Influence of promoter/enhancer region haplotypes on MGMT transcriptional regulation: a potential biomarker for human sensitivity to alkylating agents.

    PubMed

    Xu, Meixiang; Nekhayeva, Ilona; Cross, Courtney E; Rondelli, Catherine M; Wickliffe, Jeffrey K; Abdel-Rahman, Sherif Z

    2014-03-01

    The O6-methylguanine-DNA methyltransferase gene (MGMT) encodes the direct reversal DNA repair protein that removes alkyl adducts from the O6 position of guanine. Several single-nucleotide polymorphisms (SNPs) exist in the MGMT promoter/enhancer (P/E) region. However, the haplotype structure encompassing these SNPs and their functional/biological significance are currently unknown. We hypothesized that MGMT P/E haplotypes, rather than individual SNPs, alter MGMT transcription and can thus alter human sensitivity to alkylating agents. To identify the haplotype structure encompassing the MGMT P/E region SNPs, we sequenced 104 DNA samples from healthy individuals and inferred the haplotypes using the data generated. We identified eight SNPs in this region, namely T7C (rs180989103), T135G (rs1711646), G290A (rs61859810), C485A (rs1625649), C575A (rs113813075), G666A (rs34180180), C777A (rs34138162) and C1099T (rs16906252). Phylogenetics and Sequence Evolution analysis predicted 21 potential haplotypes that encompass these SNPs ranging in frequencies from 0.000048 to 0.39. Of these, 10 were identified in our study population as 20 paired haplotype combinations. To determine the functional significance of these haplotypes, luciferase reporter constructs representing these haplotypes were transfected into glioblastoma cells and their effect on MGMT promoter activity was determined. Compared with the most common (reference) haplotype 1, seven haplotypes significantly upregulated MGMT promoter activity (18-119% increase; P < 0.05), six significantly downregulated MGMT promoter activity (29-97% decrease; P < 0.05) and one haplotype had no effect. Mechanistic studies conducted support the conclusion that MGMT P/E haplotypes, rather than individual SNPs, differentially regulate MGMT transcription and could thus play a significant role in human sensitivity to environmental and therapeutic alkylating agents.

  4. Performance and robustness of penalized and unpenalized methods for genetic prediction of complex human disease.

    PubMed

    Abraham, Gad; Kowalczyk, Adam; Zobel, Justin; Inouye, Michael

    2013-02-01

    A central goal of medical genetics is to accurately predict complex disease from genotypes. Here, we present a comprehensive analysis of simulated and real data using lasso and elastic-net penalized support-vector machine models, a mixed-effects linear model, a polygenic score, and unpenalized logistic regression. In simulation, the sparse penalized models achieved lower false-positive rates and higher precision than the other methods for detecting causal SNPs. The common practice of prefiltering SNP lists for subsequent penalized modeling was examined and shown to substantially reduce the ability to recover the causal SNPs. Using genome-wide SNP profiles across eight complex diseases within cross-validation, lasso and elastic-net models achieved substantially better predictive ability in celiac disease, type 1 diabetes, and Crohn's disease, and had equivalent predictive ability in the rest, with the results in celiac disease strongly replicating between independent datasets. We investigated the effect of linkage disequilibrium on the predictive models, showing that the penalized methods leverage this information to their advantage, compared with methods that assume SNP independence. Our findings show that sparse penalized approaches are robust across different disease architectures, producing as good as or better phenotype predictions and variance explained. This has fundamental ramifications for the selection and future development of methods to genetically predict human disease. © 2012 WILEY PERIODICALS, INC.

  5. Characterization of the Gray Whale Eschrichtius robustus Genome and a Genotyping Array Based on Single-Nucleotide Polymorphisms in Candidate Genes.

    PubMed

    DeWoody, J Andrew; Fernandez, Nadia B; Brüniche-Olsen, Anna; Antonides, Jennifer D; Doyle, Jacqueline M; San Miguel, Phillip; Westerman, Rick; Vertyankin, Vladimir V; Godard-Codding, Céline A J; Bickham, John W

    2017-06-01

    Genetic and genomic approaches have much to offer in terms of ecology, evolution, and conservation. To better understand the biology of the gray whale Eschrichtius robustus (Lilljeborg, 1861), we sequenced the genome and produced an assembly that contains ∼95% of the genes known to be highly conserved among eukaryotes. From this assembly, we annotated 22,711 genes and identified 2,057,254 single-nucleotide polymorphisms (SNPs). Using this assembly, we generated a curated list of candidate genes potentially subject to strong natural selection, including genes associated with osmoregulation, oxygen binding and delivery, and other aspects of marine life. From these candidate genes, we queried 92 autosomal protein-coding markers with a panel of 96 SNPs that also included 2 sexing and 2 mitochondrial markers. Genotyping error rates, calculated across loci and across 69 intentional replicate samples, were low (0.021%), and observed heterozygosity was 0.33 averaged over all autosomal markers. This level of variability provides substantial discriminatory power across loci (mean probability of identity of 1.6 × 10^-25 and mean probability of exclusion >0.999 with neither parent known), indicating that these markers provide a powerful means to assess parentage and relatedness in gray whales. We found 29 unique multilocus genotypes represented among our 36 biopsies (indicating that we inadvertently sampled 7 whales twice). In total, we compiled an individual data set of 28 western gray whales (WGWs) and 1 presumptive eastern gray whale (EGW). The lone EGW we sampled was no more or less related to the WGWs than expected by chance alone. The gray whale genomes reported here will enable comparative studies of natural selection in cetaceans, and the SNP markers should be highly informative for future studies of gray whale evolution, population structure, demography, and relatedness.
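
    The step from 36 biopsies to 29 unique multilocus genotypes (36 − 29 = 7 repeat samples) amounts to grouping samples that share an identical genotype across all loci. The sketch below shows that grouping on a toy data set; the sample IDs, genotypes and function name are hypothetical, and exact matching is an assumption that ignores genotyping error and missing calls, which a real workflow would need to tolerate.

```python
from collections import defaultdict


def group_by_genotype(samples):
    """Map each multilocus genotype to the list of sample IDs that carry it."""
    groups = defaultdict(list)
    for sample_id, genotype in samples.items():
        groups[tuple(genotype)].append(sample_id)
    return dict(groups)


# Hypothetical toy data: 5 biopsies scored at 3 loci (not data from the study).
biopsies = {
    "B1": ("AA", "AG", "CT"),
    "B2": ("AG", "GG", "CC"),
    "B3": ("AA", "AG", "CT"),  # identical to B1: same animal sampled again
    "B4": ("GG", "AG", "TT"),
    "B5": ("AG", "GG", "CC"),  # identical to B2
}

groups = group_by_genotype(biopsies)
n_unique = len(groups)
n_repeats = len(biopsies) - n_unique

print(f"{len(biopsies)} biopsies, {n_unique} unique genotypes, {n_repeats} repeat samples")
for genotype, ids in groups.items():
    if len(ids) > 1:
        print("same individual:", ", ".join(ids))
```

    With a probability of identity as low as the one reported, two different whales sharing a full genotype by chance is very unlikely, which is what justifies treating identical genotypes as the same individual.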

  6. EuroPineDB: a high-coverage web database for maritime pine transcriptome

    PubMed Central

    2011-01-01

    Background Pinus pinaster is an economically and ecologically important species that is becoming a woody gymnosperm model. Its enormous genome size makes whole-genome sequencing approaches hard to apply. Therefore, the expressed portion of the genome has to be characterised and the results and annotations have to be stored in dedicated databases. Description EuroPineDB is the largest sequence collection available for a single pine species, Pinus pinaster (maritime pine), since it comprises 951 641 raw sequence reads obtained from non-normalised cDNA libraries and high-throughput sequencing from adult (xylem, phloem, roots, stem, needles, cones, strobili) and embryonic (germinated embryos, buds, callus) maritime pine tissues. Using open-source tools, sequences were optimally pre-processed, assembled, and extensively annotated (GO, EC and KEGG terms, descriptions, SNPs, SSRs, ORFs and InterPro codes). As a result, a 10.5× P. pinaster genome was covered and assembled in 55 322 UniGenes. A total of 32 919 (59.5%) of P. pinaster UniGenes were annotated with at least one description, revealing at least 18 466 different genes. The complete database, which is designed to be scalable, maintainable, and expandable, is freely available at: http://www.scbi.uma.es/pindb/. It can be retrieved by gene libraries, pine species, annotations, UniGenes and microarrays (i.e., the sequences are distributed in two-colour microarrays; this is the only conifer database that provides this information) and will be periodically updated. Small assemblies can be viewed using a dedicated visualisation tool that connects them with SNPs. Any sequence or annotation set shown on-screen can be downloaded. Retrieval mechanisms for sequences and gene annotations are provided.
    Conclusions The EuroPineDB with its integrated information can be used to reveal new knowledge, offers an easy-to-use collection of information to directly support experimental work (including microarray hybridisation), and provides deeper knowledge on the maritime pine transcriptome. PMID:21762488

  7. Replication and validation of genetic polymorphisms associated with survival after allogeneic blood or marrow transplant

    PubMed Central

    Karaesmen, Ezgi; Rizvi, Abbas A.; Preus, Leah M.; McCarthy, Philip L.; Pasquini, Marcelo C.; Onel, Kenan; Zhu, Xiaochun; Spellman, Stephen; Haiman, Christopher A.; Stram, Daniel O.; Pooler, Loreall; Sheng, Xin; Zhu, Qianqian; Yan, Li; Liu, Qian; Hu, Qiang; Webb, Amy; Brock, Guy; Clay-Gilmour, Alyssa I.; Battaglia, Sebastiano; Tritchler, David; Liu, Song; Hahn, Theresa

    2017-01-01

    Multiple candidate gene-association studies of non-HLA single-nucleotide polymorphisms (SNPs) and outcomes after blood or marrow transplant (BMT) have been conducted. We identified 70 publications reporting 45 SNPs in 36 genes significantly associated with disease-related mortality, progression-free survival, transplant-related mortality, and/or overall survival after BMT. Replication and validation of these SNP associations were performed using DISCOVeRY-BMT (Determining the Influence of Susceptibility COnveying Variants Related to one-Year mortality after BMT), a well-powered genome-wide association study consisting of 2 cohorts, totaling 2888 BMT recipients with acute myeloid leukemia, acute lymphoblastic leukemia, or myelodysplastic syndrome, and their HLA-matched unrelated donors, reported to the Center for International Blood and Marrow Transplant Research. Gene-based tests were used to assess the aggregate effect of SNPs on outcome. None of the previously reported significant SNPs replicated at P < .05 in DISCOVeRY-BMT. Validation analyses showed association with one previously reported donor SNP at P < .05 and survival; more associations would be anticipated by chance alone. No gene-based tests were significant at P < .05. Functional annotation with publicly available data shows these candidate SNPs most likely do not have biochemical function; only 13% of candidate SNPs correlate with gene expression or are predicted to impact transcription factor binding. Of these, half do not impact the candidate gene of interest; the other half correlate with expression of multiple genes. These findings emphasize the peril of pursuing candidate approaches and the importance of adequately powered tests of unbiased genome-wide associations with BMT clinical outcomes given the ultimate goal of improving patient outcomes. PMID:28811306
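
    The remark that more associations would be anticipated by chance alone can be illustrated with a short calculation. This is a minimal sketch, assuming the 45 candidate SNPs are treated as 45 independent tests at the 0.05 level; the abstract does not state how many tests the validation analysis actually ran, so the counts below are only indicative.

```python
from math import comb

n_tests = 45   # candidate SNPs reported in the abstract (assumed: one independent test each)
alpha = 0.05   # nominal significance threshold used in the abstract

# Expected number of nominally significant results if every null hypothesis is true
expected_false_positives = n_tests * alpha


def prob_at_most(k, n, p):
    """Binomial probability of k or fewer successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Probability of seeing one hit or none when all nulls are true
p_one_or_fewer = prob_at_most(1, n_tests, alpha)

print(f"expected by chance: {expected_false_positives:.2f}")
print(f"P(at most 1 hit | all nulls true): {p_one_or_fewer:.3f}")
```

    Under these assumptions a single nominally significant SNP is fewer than the number expected from chance, which is consistent with the abstract's reading of the validation result.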

  8. Functional Polymorphisms of Base Excision Repair Genes XRCC1 and APEX1 Predict Risk of Radiation Pneumonitis in Patients With Non-Small Cell Lung Cancer Treated With Definitive Radiation Therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yin Ming; Liao Zhongxing; Liu Zhensheng

    2011-11-01

    Purpose: To explore whether functional single nucleotide polymorphisms (SNPs) of base-excision repair genes are predictors of radiation treatment-related pneumonitis (RP), we investigated associations between functional SNPs of ADPRT, APEX1, and XRCC1 and RP development. Methods and Materials: We genotyped SNPs of ADPRT (rs1136410 [V762A]), XRCC1 (rs1799782 [R194W], rs25489 [R280H], and rs25487 [Q399R]), and APEX1 (rs1130409 [D148E]) in 165 patients with non-small cell lung cancer (NSCLC) who received definitive chemoradiation therapy. Results were assessed by both logistic and Cox regression models for RP risk. Kaplan-Meier curves were generated for the cumulative RP probability by the genotypes. Results: We found that SNPs of XRCC1 Q399R and APEX1 D148E each had a significant effect on the development of Grade ≥2 RP (XRCC1: AA vs. GG, adjusted hazard ratio [HR] = 0.48, 95% confidence interval [CI], 0.24-0.97; APEX1: GG vs. TT, adjusted HR = 3.61, 95% CI, 1.64-7.93) in an allele-dose response manner (Trend tests: p = 0.040 and 0.001, respectively). The number of the combined protective XRCC1 A and APEX1 T alleles (from 0 to 4) also showed a significant trend of predicting RP risk (p = 0.001). Conclusions: SNPs of the base-excision repair genes may be biomarkers for susceptibility to RP. Larger prospective studies are needed to validate our findings.

  9. The CRHR1 Gene, Trauma Exposure, and Alcoholism Risk: A Test of G × E Effects

    PubMed Central

    Ray, Lara A.; Sehl, Mary; Bujarski, Spencer; Hutchison, Kent; Blaine, Sara; Enoch, Mary-Anne

    2014-01-01

    The corticotropin-releasing hormone type I receptor (CRHR1) gene has been implicated in the liability for neuropsychiatric disorders, particularly under conditions of stress. Based on the hypothesized effects of CRHR1 variation on stress reactivity, measures of adulthood traumatic stress exposure were analyzed for their interaction with CRHR1 haplotypes and SNPs in predicting the risk for alcoholism. Phenotypic data on 2,533 non-related Caucasian individuals (1167 alcoholics and 1366 controls) were culled from the publicly available Study of Addiction: Genetics and Environment (SAGE) genome-wide association study (GWAS). Genotypes were available for 19 tag SNPs. Logistic regression models examined the interaction between CRHR1 haplotypes / SNPs and adulthood traumatic stress exposure in predicting alcoholism risk. Two haplotype blocks spanned CRHR1. Haplotype analyses identified one haplotype in the proximal block 1 (p = 0.029) and two haplotypes in the distal block 2 (p = 0.026, 0.042) that showed nominally significant (corrected p < .025) genotype × traumatic stress interactive effects on the likelihood of developing alcoholism. The block 1 haplotype effect was driven by SNPs rs110402 (p = 0.019) and rs242924 (p = 0.019). In block 2, rs17689966 (p = 0.018) showed significant, and rs173365 (p = 0.026) showed nominally significant, gene × environment (G × E) effects on alcoholism status. This study extends the literature on the interplay between CRHR1 variation and alcoholism, in the context of exposure to traumatic stress. These findings are consistent with the hypothesized role of extrahypothalamic CRF system dysregulation in the initiation and maintenance of alcoholism. Molecular and experimental studies are needed to more fully understand the mechanisms of risk and protection conferred by genetic variation at the identified loci. PMID:23473364

  10. Association of ITPA polymorphisms rs6051702/rs1127354 instead of rs7270101/rs1127354 as predictor of ribavirin-associated anemia in chronic hepatitis C treated patients.

    PubMed

    D'Avolio, Antonio; De Nicolò, Amedeo; Cusato, Jessica; Ciancio, Alessia; Boglione, Lucio; Strona, Silvia; Cariti, Giuseppe; Troshina, Giulia; Caviglia, Gian Paolo; Smedile, Antonina; Rizzetto, Mario; Di Perri, Giovanni

    2013-10-01

    Functional variants rs7270101 and rs1127354 of inosine triphosphatase (ITPA) were recently found to protect against ribavirin (RBV)-induced hemolytic anemia. However, no definitive data are yet available on the role of the non-functional rs6051702 polymorphism. Since a simultaneous evaluation of the three ITPA SNPs for hemolytic anemia has not yet been investigated, we aimed to understand the contribution of each SNP and its potential clinical use to predict anemia in HCV treated patients. A retrospective analysis included 379 HCV treated patients. The ITPA variants rs6051702, rs7270101 and rs1127354 were genotyped and tested for association with achieving anemia at week 4. We also investigated, using multivariate logistic regression, the impact of each single and paired associated polymorphism on anemia onset. All SNPs were associated with Hb decrease. The carrier of at least one variant allele in the functional ITPA SNPs was associated with a lower decrement of Hb, as compared to patients without a variant allele. In multivariate logistic regression analyses the carrier of a variant allele in the rs6051702/rs1127354 association (OR=0.11, p=1.75×10(-5)) and Hb at baseline (OR=1.51, p=1.21×10(-4)) were independently associated with protection against clinically significant anemia at week 4. All ITPA polymorphisms considered were shown to be significantly associated with anemia onset. A multivariate regression model based on ITPA genetic polymorphisms was developed for predicting the risk of anemia. Considering the characterization of pre-therapy anemia predictors, rs6051702 SNP in association with rs1127354 is more informative in order to avoid this relevant adverse event. Copyright © 2013 Elsevier B.V. All rights reserved.

  11. A genetic risk score based on direct associations with coronary heart disease improves coronary heart disease risk prediction in the Atherosclerosis Risk in Communities (ARIC), but not in the Rotterdam and Framingham Offspring, Studies

    PubMed Central

    Brautbar, Ariel; Pompeii, Lisa A.; Dehghan, Abbas; Ngwa, Julius S.; Nambi, Vijay; Virani, Salim S.; Rivadeneira, Fernando; Uitterlinden, André G.; Hofman, Albert; Witteman, Jacqueline C.M.; Pencina, Michael J.; Folsom, Aaron R.; Cupples, L. Adrienne; Ballantyne, Christie M.; Boerwinkle, Eric

    2013-01-01

    Objective Multiple studies have identified single-nucleotide polymorphisms (SNPs) that are associated with coronary heart disease (CHD). We examined whether SNPs selected based on predefined criteria will improve CHD risk prediction when added to traditional risk factors (TRFs). Methods SNPs were selected from the literature based on association with CHD, lack of association with a known CHD risk factor, and successful replication. A genetic risk score (GRS) was constructed based on these SNPs. Cox proportional hazards model was used to calculate CHD risk based on the Atherosclerosis Risk in Communities (ARIC) and Framingham CHD risk scores with and without the GRS. Results The GRS was associated with risk for CHD (hazard ratio [HR] = 1.10; 95% confidence interval [CI]: 1.07–1.13). Addition of the GRS to the ARIC risk score significantly improved discrimination, reclassification, and calibration beyond that afforded by TRFs alone in non-Hispanic whites in the ARIC study. The area under the receiver operating characteristic curve (AUC) increased from 0.742 to 0.749 (Δ= 0.007; 95% CI, 0.004–0.013), and the net reclassification index (NRI) was 6.3%. Although the risk estimates for CHD in the Framingham Offspring (HR = 1.12; 95% CI: 1.10–1.14) and Rotterdam (HR = 1.08; 95% CI: 1.02–1.14) Studies were significantly improved by adding the GRS to TRFs, improvements in AUC and NRI were modest. Conclusion Addition of a GRS based on direct associations with CHD to TRFs significantly improved discrimination and reclassification in white participants of the ARIC Study, with no significant improvement in the Rotterdam and Framingham Offspring Studies. PMID:22789513
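
    A genetic risk score of the kind described above is commonly built as a weighted sum of risk-allele counts. The sketch below is illustrative only: the SNP names and odds ratios are hypothetical placeholders, and the abstract does not say whether the study's GRS was weighted or unweighted.

```python
from math import log

# Hypothetical per-SNP odds ratios; the abstract does not list the SNPs or their weights.
HYPOTHETICAL_ODDS_RATIOS = {"snp_a": 1.20, "snp_b": 1.10, "snp_c": 1.05}


def genetic_risk_score(risk_allele_counts, odds_ratios=HYPOTHETICAL_ODDS_RATIOS):
    """Sum of risk-allele counts (0, 1 or 2 per SNP), each weighted by its log odds ratio."""
    score = 0.0
    for snp, count in risk_allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: allele count must be 0, 1 or 2")
        score += count * log(odds_ratios[snp])
    return score


# One hypothetical individual: two risk alleles at snp_a, one at snp_b, none at snp_c
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
print(round(genetic_risk_score(person), 4))
```

    The resulting score would then enter a Cox model alongside the traditional risk factors; setting every weight to 1 gives the simpler unweighted allele count.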

  12. Prospective assessment of XRCC3, XPD and Aurora kinase A single-nucleotide polymorphisms in advanced lung cancer.

    PubMed

    Provencio, M; Camps, C; Cobo, M; De las Peñas, R; Massuti, B; Blanco, R; Alberola, V; Jimenez, U; Delgado, J R; Cardenal, F; Tarón, M; Ramírez, J L; Sanchez, A; Rosell, R

    2012-12-01

    New therapeutic approaches are being developed based on findings that several genetic abnormalities underlying non-small-cell lung cancer (NSCLC) can influence chemosensitivity. The identification of molecular markers, useful for therapeutic decisions in lung cancer, is thus crucial for disease management. The present study evaluated single-nucleotide polymorphisms (SNPs) in XRCC3, XPD and Aurora kinase A in NSCLC patients in order to assess whether these biomarkers were able to predict the outcomes of the patients. The Spanish Lung Cancer Group prospectively assessed this clinical study. Eligible patients had histologically confirmed stage IV or IIIB (with malignant pleural effusion) NSCLC, which had not previously been treated with chemotherapy, and a World Health Organization performance status (PS) of 0-1. Patients received intravenous doses of vinorelbine 25 mg/m(2) on days 1 and 8, and cisplatin 75 mg/m(2) on day 1, every 21 days for a maximum of 6 cycles. Venous blood was collected from each patient, and genomic DNA was isolated. SNPs in XRCC3 T241M, XPD K751Q, XPD D312N, AURORA 91, AURORA 169 were assessed. The study included 180 patients. Median age was 62 years; 87 % were male; 34 % had PS 0; and 83 % had stage IV disease. The median number of cycles was 4. Time to progression was 5.1 months (95 % CI, 4.2-5.9). Overall median survival was 8.6 months (95 % CI, 7.1-10.1). There was no significant association between SNPs in XRCC3 T241M, XPD K751Q, XPD D312N, AURORA 91, AURORA 169 and outcome or toxicity. Our findings indicate that SNPs in XRCC3, XPD or Aurora kinase A cannot predict outcomes in advanced NSCLC patients treated with platinum-based chemotherapy.

  13. NATRIURETIC PEPTIDE SYSTEM GENE VARIANTS ARE ASSOCIATED WITH VENTRICULAR DYSFUNCTION AFTER CORONARY ARTERY BYPASS GRAFTING

    PubMed Central

    Fox, Amanda A.; Collard, Charles D.; Shernan, Stanton K.; Seidman, Christine E.; Seidman, Jonathan G.; Liu, Kuang-Yu; Muehlschlegel, Jochen D.; Perry, Tjorvi E.; Aranki, Sary F.; Lange, Christoph; Herman, Daniel S.; Meitinger, Thomas; Lichtner, Peter; Body, Simon C.

    2009-01-01

    Background Ventricular dysfunction (VnD) after primary coronary artery bypass grafting is associated with increased hospital stay and mortality. Natriuretic peptides have compensatory vasodilatory, natriuretic and paracrine influences on myocardial failure and ischemia. We hypothesized that natriuretic peptide system gene variants independently predict risk of VnD after primary coronary artery bypass grafting. Methods 1164 patients undergoing primary coronary artery bypass grafting with cardiopulmonary bypass at two institutions were prospectively enrolled. After prospectively defined exclusions, 697 Caucasian patients (76 with VnD) were analyzed. VnD was defined as need for ≥ 2 new inotropes and/or new mechanical ventricular support after coronary artery bypass grafting. 139 haplotype-tagging SNPs within 7 genes (NPPA; NPPB; NPPC; NPR1; NPR2; NPR3; CORIN) were genotyped. SNPs univariately associated with VnD were entered into logistic regression models adjusting for clinical covariates predictive of VnD. To control for multiple comparisons, permutation analyses were conducted for all SNP associations. Results After adjusting for clinical covariates and multiple comparisons within each gene, seven NPPA/NPPB SNPs (rs632793, rs6668352, rs549596, rs198388, rs198389, rs6676300, rs1009592) were associated with decreased risk of postoperative VnD (additive model; odds ratios 0.44–0.55; P = 0.010–0.036), and four NPR3 SNPs (rs700923, rs16890196, rs765199, rs700926) were associated with increased risk of postoperative VnD (recessive model; odds ratios 3.89–4.28; P = 0.007–0.034). Conclusions Genetic variation within the NPPA/NPPB and NPR3 genes is associated with risk of VnD after primary coronary artery bypass grafting. Knowledge of such genotypic predictors may result in better understanding of the molecular mechanisms underlying postoperative VnD. PMID:19326473

  14. Effect prediction of identified SNPs linked to fruit quality and chilling injury in peach [Prunus persica (L.) Batsch].

    PubMed

    Martínez-García, Pedro J; Fresnedo-Ramírez, Jonathan; Parfitt, Dan E; Gradziel, Thomas M; Crisosto, Carlos H

    2013-01-01

    Single nucleotide polymorphisms (SNPs) are a fundamental source of genomic variation. Large SNP panels have been developed for Prunus species. Fruit quality traits are essential peach breeding program objectives since they determine consumer acceptance, fruit consumption, industry trends and cultivar adoption. For many cultivars, these traits are negatively impacted by cold storage, used to extend fruit market life. The major symptoms of chilling injury are lack of flavor, off flavor, mealiness, flesh browning, and flesh bleeding. A set of 1,109 SNPs was mapped previously and 67 were linked with these complex traits. The prediction of the effects associated with these SNPs on downstream products from the 'peach v1.0' genome sequence was carried out. A total of 2,163 effects were detected, 282 effects (non-synonymous, synonymous or stop codon gained) were located in exonic regions (13.04 %) and 294 placed in intronic regions (13.59 %). An extended list of genes and proteins that could be related to these traits was developed. Two SNP markers that explain a high percentage of the observed phenotypic variance, UCD_SNP_1084 and UCD_SNP_46, are associated with zinc finger (C3HC4-type RING finger) family protein and AOX1A (alternative oxidase 1a) protein groups, respectively. In addition, phenotypic variation suggests that the observed polymorphism for SNP UCD_SNP_1084 [A/G] mutation could be a candidate quantitative trait nucleotide affecting quantitative trait loci for mealiness. The interaction and expression of affected proteins could explain the variation observed in each individual and facilitate understanding of gene regulatory networks for fruit quality traits in peach.
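
    The percentages quoted for exonic and intronic effects follow directly from the counts in the abstract. A minimal check, using only the figures given above (the variable names are ours):

```python
total_effects = 2163   # predicted effects reported in the abstract
exonic = 282           # effects located in exonic regions
intronic = 294         # effects located in intronic regions

exonic_pct = round(100 * exonic / total_effects, 2)
intronic_pct = round(100 * intronic / total_effects, 2)

# Effects outside exons and introns; the abstract does not break these down further
remaining = total_effects - exonic - intronic

print(exonic_pct, intronic_pct, remaining)
```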

  15. Genome-wide association uncovers shared genetic effects among personality traits and mood states.

    PubMed

    Luciano, Michelle; Huffman, Jennifer E; Arias-Vásquez, Alejandro; Vinkhuyzen, Anna A E; Middeldorp, Christel M; Giegling, Ina; Payton, Antony; Davies, Gail; Zgaga, Lina; Janzing, Joost; Ke, Xiayi; Galesloot, Tessel; Hartmann, Annette M; Ollier, William; Tenesa, Albert; Hayward, Caroline; Verhagen, Maaike; Montgomery, Grant W; Hottenga, Jouke-Jan; Konte, Bettina; Starr, John M; Vitart, Veronique; Vos, Pieter E; Madden, Pamela A F; Willemsen, Gonneke; Konnerth, Heike; Horan, Michael A; Porteous, David J; Campbell, Harry; Vermeulen, Sita H; Heath, Andrew C; Wright, Alan; Polasek, Ozren; Kovacevic, Sanja B; Hastie, Nicholas D; Franke, Barbara; Boomsma, Dorret I; Martin, Nicholas G; Rujescu, Dan; Wilson, James F; Buitelaar, Jan; Pendleton, Neil; Rudan, Igor; Deary, Ian J

    2012-09-01

    Measures of personality and psychological distress are correlated and exhibit genetic covariance. We conducted univariate genome-wide SNP (~2.5 million) and gene-based association analyses of these traits and examined the overlap in results across traits, including a prediction analysis of mood states using genetic polygenic scores for personality. Measures of neuroticism, extraversion, and symptoms of anxiety, depression, and general psychological distress were collected in eight European cohorts (n ranged 546-1,338; maximum total n = 6,268) whose mean age ranged from 55 to 79 years. Meta-analysis of the cohort results was performed, with follow-up associations of the top SNPs and genes investigated in independent cohorts (n = 527-6,032). Suggestive association (P = 8 × 10(-8)) of rs1079196 in the FHIT gene was observed with symptoms of anxiety. Other notable associations (P < 6.09 × 10(-6)) included SNPs in five genes for neuroticism (LCE3C, POLR3A, LMAN1L, ULK3, SCAMP2), KIAA0802 for extraversion, and NOS1 for general psychological distress. An association between symptoms of depression and rs7582472 (near to MGAT5 and NCKAP5) was replicated in two independent samples, but other replication findings were less consistent. Gene-based tests identified a significant locus on chromosome 15 (spanning five genes) associated with neuroticism which replicated (P < 0.05) in an independent cohort. Support for common genetic effects among personality and mood (particularly neuroticism and depressive symptoms) was found in terms of SNP association overlap and polygenic score prediction. The variance explained by individual SNPs was very small (up to 1%) confirming that there are no moderate/large effects of common SNPs on personality and related traits. Copyright © 2012 Wiley Periodicals, Inc.

  16. SNP Discovery by Illumina-Based Transcriptome Sequencing of the Olive and the Genetic Characterization of Turkish Olive Genotypes Revealed by AFLP, SSR and SNP Markers

    PubMed Central

    Kaya, Hilal Betul; Cetin, Oznur; Kaya, Hulya; Sahin, Mustafa; Sefer, Filiz; Kahraman, Abdullah; Tanyolac, Bahattin

    2013-01-01

    Background The olive tree (Olea europaea L.) is a diploid (2n = 2x = 46) outcrossing species mainly grown in the Mediterranean area, where it is the most important oil-producing crop. Because of its economic, cultural and ecological importance, various DNA markers have been used in the olive to characterize and elucidate homonyms, synonyms and unknown accessions. However, a comprehensive characterization and a full sequence of its transcriptome are unavailable, leading to the importance of an efficient large-scale single nucleotide polymorphism (SNP) discovery in olive. The objectives of this study were (1) to discover olive SNPs using next-generation sequencing and to identify SNP primers for cultivar identification and (2) to characterize 96 olive genotypes originating from different regions of Turkey. Methodology/Principal Findings Next-generation sequencing technology was used with five distinct olive genotypes and generated cDNA, producing 126,542,413 reads using an Illumina Genome Analyzer IIx. Following quality and size trimming, the high-quality reads were assembled into 22,052 contigs with an average length of 1,321 bases and 45 singletons. The SNPs were filtered and 2,987 high-quality putative SNP primers were identified. The assembled sequences and singletons were subjected to BLAST similarity searches and annotated with a Gene Ontology identifier. To identify the 96 olive genotypes, these SNP primers were applied to the genotypes in combination with amplified fragment length polymorphism (AFLP) and simple sequence repeats (SSR) markers. Conclusions/Significance This study marks the highest number of SNP markers discovered to date from olive genotypes using transcriptome sequencing. The developed SNP markers will provide a useful source for molecular genetic studies, such as genetic diversity and characterization, high density quantitative trait locus (QTL) analysis, association mapping and map-based gene cloning in the olive. 
High levels of genetic variation among Turkish olive genotypes revealed by SNPs, AFLPs and SSRs allowed us to characterize the Turkish olive genotype. PMID:24058483

  17. Differential contribution of genomic regions to marked genetic variation and prediction of quantitative traits in broiler chickens.

    PubMed

    Abdollahi-Arpanahi, Rostam; Morota, Gota; Valente, Bruno D; Kranis, Andreas; Rosa, Guilherme J M; Gianola, Daniel

    2016-02-03

    Genome-wide association studies in humans have found enrichment of trait-associated single nucleotide polymorphisms (SNPs) in coding regions of the genome and depletion of these in intergenic regions. However, a recent release of the ENCyclopedia of DNA elements showed that ~80 % of the human genome has a biochemical function. Similar studies on the chicken genome are lacking, thus assessing the relative contribution of its genic and non-genic regions to variation is relevant for biological studies and genetic improvement of chicken populations. A dataset including 1351 birds that were genotyped with the 600K Affymetrix platform was used. We partitioned SNPs according to genome annotation data into six classes to characterize the relative contribution of genic and non-genic regions to genetic variation as well as their predictive power using all available quality-filtered SNPs. Target traits were body weight, ultrasound measurement of breast muscle and hen house egg production in broiler chickens. Six genomic regions were considered: intergenic regions, introns, missense, synonymous, 5' and 3' untranslated regions, and regions that are located 5 kb upstream and downstream of coding genes. Genomic relationship matrices were constructed for each genomic region and fitted in the models, separately or simultaneously. Kernel-based ridge regression was used to estimate variance components and assess predictive ability. Contribution of each class of genomic regions to dominance variance was also considered. Variance component estimates indicated that all genomic regions contributed to marked additive genetic variation and that the class of synonymous regions tended to have the greatest contribution. The marked dominance genetic variation explained by each class of genomic regions was similar and negligible (~0.05). In terms of prediction mean-square error, the whole-genome approach showed the best predictive ability. 
All genic and non-genic regions contributed to phenotypic variation for the three traits studied. Overall, the contribution of additive genetic variance to the total genetic variance was much greater than that of dominance variance. Our results show that all genomic regions are important for the prediction of the targeted traits, and the whole-genome approach was reaffirmed as the best tool for genome-enabled prediction of quantitative traits.
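
    Building one genomic relationship matrix per annotation class can be sketched as follows. This is an illustration under stated assumptions: it uses the widely used form G = ZZ'/(2 Σ p(1−p)) with genotypes centred by twice the allele frequency, which the abstract does not confirm as the authors' choice, and the genotypes and class labels are toy values.

```python
def genomic_relationship_matrix(genotypes):
    """genotypes: one list of 0/1/2 allele counts per individual.

    Returns G = Z Z' / (2 * sum_j p_j (1 - p_j)), where Z holds genotypes
    centred by 2 * p_j and p_j is the allele frequency at marker j.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    centred = [[ind[j] - 2 * freqs[j] for j in range(m)] for ind in genotypes]
    return [
        [sum(a * b for a, b in zip(centred[i], centred[k])) / scale for k in range(n)]
        for i in range(n)
    ]


# Toy data: 3 birds x 4 SNPs, with a hypothetical annotation class for each SNP
genotypes = [[0, 1, 2, 1], [1, 1, 0, 2], [2, 0, 1, 0]]
snp_class = ["intron", "intergenic", "intron", "intergenic"]


def matrix_for_class(label):
    """Relationship matrix built only from the SNPs annotated with `label`."""
    cols = [j for j, c in enumerate(snp_class) if c == label]
    subset = [[ind[j] for j in cols] for ind in genotypes]
    return genomic_relationship_matrix(subset)


G_intron = matrix_for_class("intron")
for row in G_intron:
    print(row)
```

    Fitting the class-specific matrices separately or jointly, as the abstract describes, would be done in a mixed-model or kernel regression package; that step is outside this sketch.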

  18. Association of FKBP5 Polymorphisms and Childhood Abuse With Risk of Posttraumatic Stress Disorder Symptoms in Adults

    PubMed Central

    Binder, Elisabeth B.; Bradley, Rebekah G.; Liu, Wei; Epstein, Michael P.; Deveau, Todd C.; Mercer, Kristina B.; Tang, Yilang; Gillespie, Charles F.; Heim, Christine M.; Nemeroff, Charles B.; Schwartz, Ann C.; Cubells, Joseph F.; Ressler, Kerry J.

    2008-01-01

    Context In addition to trauma exposure, other factors contribute to risk for development of posttraumatic stress disorder (PTSD) in adulthood. Both genetic and environmental factors are contributory, with child abuse providing significant risk liability. Objective To increase understanding of genetic and environmental risk factors as well as their interaction in the development of PTSD by gene × environment interactions of child abuse, level of non–child abuse trauma exposure, and genetic polymorphisms at the stress-related gene FKBP5. Design, Setting, and Participants A cross-sectional study examining genetic and psychological risk factors in 900 non psychiatric clinic patients (762 included for all genotype studies) with significant levels of childhood abuse as well as non–child abuse trauma using a verbally presented survey combined with single-nucleotide polymorphism (SNP) genotyping. Participants were primarily urban, low-income, black (>95%) men and women seeking care in the general medical care and obstetrics-gynecology clinics of an urban public hospital in Atlanta, Georgia, between 2005 and 2007. Main Outcome Measures Severity of adult PTSD symptomatology, measured with the modified PTSD Symptom Scale, non–child abuse (primarily adult) trauma exposure and child abuse measured using the traumatic events inventory and 8 SNPs spanning the FKBP5 locus. Results Level of child abuse and non–child abuse trauma each separately predicted level of adult PTSD symptomatology (mean [SD], PTSD Symptom Scale for no child abuse, 8.03 [10.48] vs ≥2 types of abuse, 20.93 [14.32]; and for no non–child abuse trauma, 3.58 [6.27] vs ≥4 types, 16.74 [12.90]; P<.001). 
Although FKBP5 SNPs did not directly predict PTSD symptom outcome or interact with level of non–child abuse trauma to predict PTSD symptom severity, 4 SNPs in the FKBP5 locus significantly interacted (rs9296158, rs3800373, rs1360780, and rs9470080; minimum P=.0004) with the severity of child abuse to predict level of adult PTSD symptoms after correcting for multiple testing. This gene × environment interaction remained significant when controlling for depression severity scores, age, sex, levels of non–child abuse trauma exposure, and genetic ancestry. This genetic interaction was also paralleled by FKBP5 genotype-dependent and PTSD-dependent effects on glucocorticoid receptor sensitivity, measured by the dexamethasone suppression test. Conclusions Four SNPs of the FKBP5 gene interacted with severity of child abuse as a predictor of adult PTSD symptoms. There were no main effects of the SNPs on PTSD symptoms and no significant genetic interactions with level of non–child abuse trauma as predictor of adult PTSD symptoms, suggesting a potential gene-childhood environment interaction for adult PTSD. PMID:18349090

  19. Genetic markers enhance coronary risk prediction in men: the MORGAM prospective cohorts.

    PubMed

    Hughes, Maria F; Saarela, Olli; Stritzke, Jan; Kee, Frank; Silander, Kaisa; Klopp, Norman; Kontto, Jukka; Karvanen, Juha; Willenborg, Christina; Salomaa, Veikko; Virtamo, Jarmo; Amouyel, Phillippe; Arveiler, Dominique; Ferrières, Jean; Wiklund, Per-Gunner; Baumert, Jens; Thorand, Barbara; Diemert, Patrick; Trégouët, David-Alexandre; Hengstenberg, Christian; Peters, Annette; Evans, Alun; Koenig, Wolfgang; Erdmann, Jeanette; Samani, Nilesh J; Kuulasmaa, Kari; Schunkert, Heribert

    2012-01-01

    More accurate coronary heart disease (CHD) prediction, specifically in middle-aged men, is needed to reduce the burden of disease more effectively. We hypothesised that a multilocus genetic risk score could refine CHD prediction beyond classic risk scores and obtain more precise risk estimates using a prospective cohort design. Using data from nine prospective European cohorts, including 26,221 men, we selected in a case-cohort setting 4,818 healthy men at baseline, and used Cox proportional hazards models to examine associations between CHD and risk scores based on genetic variants representing 13 genomic regions. Over follow-up (range: 5-18 years), 1,736 incident CHD events occurred. Genetic risk scores were validated in men with at least 10 years of follow-up (632 cases, 1361 non-cases). Genetic risk score 1 (GRS1) combined 11 SNPs and two haplotypes, with effect estimates from previous genome-wide association studies. GRS2 combined 11 SNPs plus 4 SNPs from the haplotypes with coefficients estimated from these prospective cohorts using 10-fold cross-validation. Scores were added to a model adjusted for classic risk factors comprising the Framingham risk score and 10-year risks were derived. Both scores improved net reclassification (NRI) over the Framingham score (7.5%, p = 0.017 for GRS1, 6.5%, p = 0.044 for GRS2) but GRS2 also improved discrimination (c-index improvement 1.11%, p = 0.048). Subgroup analysis on men aged 50-59 (436 cases, 603 non-cases) improved net reclassification for GRS1 (13.8%) and GRS2 (12.5%). Net reclassification improvement remained significant for both scores when family history of CHD was added to the baseline model for this male subgroup improving prediction of early onset CHD events. Genetic risk scores add precision to risk estimates for CHD and improve prediction beyond classic risk factors, particularly for middle aged men.
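
    The net reclassification improvement reported above compares how cases and non-cases move between risk categories when a new score is added. The sketch below shows the standard categorical NRI on invented toy data; it ignores the case-cohort sampling weights that an analysis of this design would need, and none of the numbers come from the study.

```python
def net_reclassification_improvement(events, nonevents):
    """Categorical NRI.

    Each argument is a list of (old_category, new_category) pairs, where a
    higher integer means a higher risk category. NRI is the net proportion of
    events moving up minus the net proportion of non-events moving up.
    """

    def net_up(pairs):
        up = sum(1 for old, new in pairs if new > old)
        down = sum(1 for old, new in pairs if new < old)
        return (up - down) / len(pairs)

    return net_up(events) - net_up(nonevents)


# Toy reclassification data (hypothetical, not taken from the study)
toy_events = [(0, 1), (1, 1), (1, 2), (2, 1)]
toy_nonevents = [(1, 0), (0, 0), (2, 1), (0, 1), (1, 1)]

nri = net_reclassification_improvement(toy_events, toy_nonevents)
print(f"NRI = {nri:.2f}")
```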

  20. Use of single nucleotide polymorphisms in candidate genes associated with daughter pregnancy rate for prediction of genetic merit for reproduction in Holstein cows

    USDA-ARS's Scientific Manuscript database

    We evaluated 69 SNPs in genes previously related to fertility and production traits for relationship to daughter pregnancy rate (DPR), cow conception rate (CCR) and heifer conception rate (HCR) in a separate population of Holstein cows grouped according to their predicted transmitting ability for DP...

  1. Development of a novel forensic STR multiplex for ancestry analysis and extended identity testing.

    PubMed

    Phillips, Chris; Fernandez-Formoso, Luis; Gelabert-Besada, Miguel; Garcia-Magariños, Manuel; Santos, Carla; Fondevila, Manuel; Carracedo, Angel; Lareu, Maria Victoria

    2013-04-01

    There is growing interest in developing additional DNA typing techniques to provide better investigative leads in forensic analysis. These include inference of genetic ancestry and prediction of common physical characteristics of DNA donors. To date, forensic ancestry analysis has centered on population-divergent SNPs but these binary loci cannot reliably detect DNA mixtures, common in forensic samples. Furthermore, STR genotypes, forming the principal DNA profiling system, are not routinely combined with forensic SNPs to strengthen frequency data available for ancestry inference. We report development of a 12-STR multiplex composed of ancestry informative marker STRs (AIM-STRs) selected from 434 tetranucleotide repeat loci. We adapted our online Bayesian classifier for AIM-SNPs: Snipper, to handle multiallele STR data using frequency-based training sets. We assessed the ability of the 12-plex AIM-STRs to differentiate CEPH Human Genome Diversity Panel populations, plus their informativeness combined with established forensic STRs and AIM-SNPs. We found combining STRs and SNPs improves the success rate of ancestry assignments while providing a reliable mixture detection system lacking from SNP analysis alone. As the 12 STRs generally show a broad range of alleles in all populations, they provide highly informative supplementary STRs for extended relationship testing and identification of missing persons with incomplete reference pedigrees. Lastly, mixed marker approaches (combining STRs with binary loci) for simple ancestry inference tests beyond forensic analysis bring advantages and we discuss the genotyping options available. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
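
    A frequency-based Bayesian ancestry classifier for multiallelic markers can be sketched in a few lines. This is not Snipper's implementation, which the abstract does not describe in detail; it is a generic naive Bayes illustration that assumes Hardy-Weinberg genotype probabilities, independent loci and equal priors, with invented allele frequencies and population names.

```python
from math import prod

# Hypothetical allele frequencies per population at two STR loci (illustrative values only)
FREQS = {
    "pop_A": {"locus1": {"10": 0.6, "11": 0.4}, "locus2": {"7": 0.2, "8": 0.8}},
    "pop_B": {"locus1": {"10": 0.1, "11": 0.9}, "locus2": {"7": 0.7, "8": 0.3}},
}


def genotype_likelihood(allele_freqs, genotype):
    """Hardy-Weinberg genotype probability: p^2 for homozygotes, 2pq for heterozygotes."""
    a, b = genotype
    p, q = allele_freqs.get(a, 0.0), allele_freqs.get(b, 0.0)
    return p * p if a == b else 2 * p * q


def posterior(profile, freqs=FREQS):
    """Posterior probability of each population given a multi-locus profile (equal priors)."""
    likelihoods = {
        pop: prod(genotype_likelihood(loci[name], gt) for name, gt in profile.items())
        for pop, loci in freqs.items()
    }
    total = sum(likelihoods.values())
    if total == 0:
        raise ValueError("profile has zero likelihood in every population")
    return {pop: like / total for pop, like in likelihoods.items()}


profile = {"locus1": ("10", "10"), "locus2": ("8", "8")}
post = posterior(profile)
print(max(post, key=post.get), post)
```

    Because the likelihood is computed from allele frequencies rather than from binary states, the same code handles STR loci with many alleles and SNP loci with two.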

  2. Genome-wide Association Study Implicates PARD3B-based AIDS Restriction

    PubMed Central

    Nelson, George W.; Lautenberger, James A.; Chinn, Leslie; McIntosh, Carl; Johnson, Randall C.; Sezgin, Efe; Kessing, Bailey; Malasky, Michael; Hendrickson, Sher L.; Pontius, Joan; Tang, Minzhong; An, Ping; Winkler, Cheryl A.; Limou, Sophie; Le Clerc, Sigrid; Delaneau, Olivier; Zagury, Jean-François; Schuitemaker, Hanneke; van Manen, Daniëlle; Bream, Jay H.; Gomperts, Edward D.; Buchbinder, Susan; Goedert, James J.; Kirk, Gregory D.; O'Brien, Stephen J.

    2011-01-01

    Background. Host genetic variation influences human immunodeficiency virus (HIV) infection and progression to AIDS. Here we used clinically well-characterized subjects from 5 pretreatment HIV/AIDS cohorts for a genome-wide association study to identify gene associations with rate of AIDS progression. Methods. European American HIV seroconverters (n = 755) were interrogated for single-nucleotide polymorphisms (SNPs) (n = 700,022) associated with progression to AIDS 1987 (Cox proportional hazards regression analysis, co-dominant model). Results. Association with slower progression was observed for SNPs in the gene PARD3B. One of these, rs11884476, reached genome-wide significance (relative hazard = 0.3; P = 3.370 × 10(-9)) after statistical correction for 700,022 SNPs and contributes 4.52% of the overall variance in AIDS progression in this study. Nine of the top-ranked SNPs define a PARD3B haplotype that also displays significant association with progression to AIDS (hazard ratio, 0.3; P = 3.220 × 10(-8)). One of these SNPs, rs10185378, is a predicted exonic splicing enhancer; significant alteration in the expression profile of PARD3B splicing transcripts was observed in B cell lines with alternate rs10185378 genotypes. This SNP was typed in European cohorts of rapid progressors and was found to be protective for AIDS 1993 definition (odds ratio, 0.43, P = .025). Conclusions. These observations suggest a potential unsuspected pathway of host genetic influence on the dynamics of AIDS progression. PMID:21502085

  3. The Influence of AHI1 Variants on the Diagnosis and Treatment Outcome in Schizophrenia

    PubMed Central

    Porcelli, Stefano; Pae, Chi-Un; Han, Changsu; Lee, Soo-Jung; Patkar, Ashwin A.; Masand, Prakash S.; Balzarro, Beatrice; Alberti, Siegfried; De Ronchi, Diana; Serretti, Alessandro

    2015-01-01

    The present study aimed to explore whether four single nucleotide polymorphisms (SNPs) within the AHI1 gene could be associated with schizophrenia (SCZ) and whether they could predict the clinical outcomes in SCZ patients treated with antipsychotics. Four hundred twenty-six (426) in-patients with SCZ and 345 controls were genotyped for four AHI1 SNPs (rs11154801, rs7750586, rs9647635 and rs9321501). Baseline and clinical measures for SCZ patients were assessed through the Positive and Negative Syndrome Scale (PANSS). Allelic and genotypic frequencies in SCZ subjects were compared with those of controls using the χ² statistic. Repeated-measures ANOVA was used for the assessment of treatment outcomes measured by PANSS changes. The case-control analysis did not show any difference in the genotypic distribution of the SNPs, while in the allelic analysis, a weak association was found between the rs9647635 A allele and SCZ. Furthermore, in the haplotype analysis, three haplotypes were found to be associated with SCZ. On the other hand, two SNPs (rs7750586 and rs9647635) were associated with clinical improvement of negative symptoms in the allelic analysis, although in the genotypic analysis, only trends of association were found for the same SNPs. Our findings suggest a possible influence of AHI1 variants on SCZ susceptibility and antipsychotic response, particularly concerning negative symptomatology. Subsequent well-designed studies are needed to confirm our results, given the methodological shortcomings of the present study. PMID:25622261
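    The allelic case-control comparison described in the abstract above can be illustrated with a Pearson χ² statistic on a 2×2 table of allele counts. This is a minimal sketch only: the function name and the counts are hypothetical, not data from the study, and the study's own software and any corrections it applied are not stated in the abstract.

```python
def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square statistic (1 df, no continuity correction)
    for a 2x2 table of allele counts: rows = cases/controls,
    columns = allele A / allele B."""
    n = case_a + case_b + control_a + control_b
    row_case = case_a + case_b
    row_control = control_a + control_b
    col_a = case_a + control_a
    col_b = case_b + control_b
    denom = row_case * row_control * col_a * col_b
    if denom == 0:
        raise ValueError("every row and column must contain at least one allele")
    # Shortcut formula for 2x2 tables: N * (ad - bc)^2 / (product of margins)
    return n * (case_a * control_b - case_b * control_a) ** 2 / denom


# Hypothetical allele counts, chosen only to exercise the function.
statistic = allelic_chi_square(10, 20, 20, 10)
print(round(statistic, 2))
```

    The statistic would then be compared against a χ² distribution with one degree of freedom to obtain a P value.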

  4. Lack of Association for Reported Endocrine Pancreatic Cancer Risk Loci in the PANDoRA Consortium.

    PubMed

    Campa, Daniele; Obazee, Ofure; Pastore, Manuela; Panzuto, Francesco; Liço, Valbona; Greenhalf, William; Katzke, Verena; Tavano, Francesca; Costello, Eithne; Corbo, Vincenzo; Talar-Wojnarowska, Renata; Strobel, Oliver; Zambon, Carlo Federico; Neoptolemos, John P; Zerboni, Giulia; Kaaks, Rudolf; Key, Timothy J; Lombardo, Carlo; Jamroziak, Krzysztof; Gioffreda, Domenica; Hackert, Thilo; Khaw, Kay-Tee; Landi, Stefano; Milanetto, Anna Caterina; Landoni, Luca; Lawlor, Rita T; Bambi, Franco; Pirozzi, Felice; Basso, Daniela; Pasquali, Claudio; Capurso, Gabriele; Canzian, Federico

    2017-08-01

    Background: Pancreatic neuroendocrine tumors (PNETs) are rare neoplasms for which very little is known about either environmental or genetic risk factors. Only a handful of association studies have been performed so far, suggesting a small number of risk loci. Methods: To replicate the best findings, we have selected 16 SNPs suggested in previous studies to be relevant in PNET etiogenesis. We genotyped the selected SNPs (rs16944, rs1052536, rs1059293, rs1136410, rs1143634, rs2069762, rs2236302, rs2387632, rs3212961, rs3734299, rs3803258, rs4962081, rs7234941, rs7243091, rs12957119, and rs1800629) in 344 PNET sporadic cases and 2,721 controls in the context of the PANcreatic Disease ReseArch (PANDoRA) consortium. Results: After correction for multiple testing, we did not observe any statistically significant association between the SNPs and PNET risk. We also used three online bioinformatic tools (HaploReg, RegulomeDB, and GTEx) to predict a possible functional role of the SNPs, but we did not observe any clear indication. Conclusions: None of the selected SNPs were convincingly associated with PNET risk in the PANDoRA consortium. Impact: We can exclude a major role of the selected polymorphisms in PNET etiology, and this highlights the need for replication of epidemiologic findings in independent populations, especially in rare diseases such as PNETs. Cancer Epidemiol Biomarkers Prev; 26(8); 1349-51. ©2017 American Association for Cancer Research.

  5. Association of Toll-Like Receptor 4 Polymorphisms with Diabetic Foot Ulcers and Application of Artificial Neural Network in DFU Risk Assessment in Type 2 Diabetes Patients

    PubMed Central

    Singh, Kanhaiya; Agrawal, Neeraj K.; Gupta, Sanjeev K.

    2013-01-01

    The Toll-Like receptor 4 (TLR4) plays an important role in immunity, tissue repair, and regeneration. The objective of the present work was to evaluate the association of TLR4 single nucleotide polymorphisms (SNPs) rs4986790, rs4986791, rs11536858 (merged into rs10759931), rs1927911, and rs1927914 with increased diabetic foot ulcer (DFU) risk in patients with type 2 diabetes mellitus (T2DM). PCR-RFLP was used for genotyping TLR4 SNPs in 125 T2DM patients with DFU and 130 controls. The haplotypes and linkage disequilibrium between the SNPs were determined using Haploview software. Multivariate linear regression (MLR) and artificial neural network (ANN) modeling were used to assess their ability to predict the risk of DFU in T2DM patients. Risk genotypes of all SNPs except rs1927914 were significantly associated with DFU. Haplotype ACATC (P value = 9.3E−5) showed strong association with DFU risk. Two haplotypes, ATATC (P value = 0.0119) and ATGTT (P value = 0.0087), were found to be protective against DFU. In conclusion, TLR4 SNPs and their haplotypes may increase the risk of impairment of wound healing in T2DM patients. The ANN model (83%) was found to be better than the MLR model (76%) and can be used as a tool for DFU risk assessment in T2DM patients. PMID:23936790

  6. Using imputed genotype data in the joint score tests for genetic association and gene-environment interactions in case-control studies.

    PubMed

    Song, Minsun; Wheeler, William; Caporaso, Neil E; Landi, Maria Teresa; Chatterjee, Nilanjan

    2018-03-01

    Genome-wide association studies (GWAS) are now routinely imputed for untyped single nucleotide polymorphisms (SNPs) based on various powerful statistical algorithms for imputation trained on reference datasets. The use of predicted allele counts for imputed SNPs as the dosage variable is known to produce a valid score test for genetic association. In this paper, we investigate how best to handle imputed SNPs in various modern complex tests for genetic associations incorporating gene-environment interactions. We focus on case-control association studies where inference for an underlying logistic regression model can be performed using alternative methods that rely to varying degrees on an assumption of gene-environment independence in the underlying population. As increasingly large-scale GWAS are being performed through consortium efforts where it is preferable to share only summary-level information across studies, we also describe simple mechanisms for implementing score tests based on standard meta-analysis of "one-step" maximum-likelihood estimates across studies. Applications of the methods in simulation studies and a dataset from a GWAS of lung cancer illustrate the ability of the proposed methods to maintain type-I error rates for the underlying testing procedures. For analysis of imputed SNPs, similar to typed SNPs, the retrospective methods can lead to considerable efficiency gain for modeling of gene-environment interactions under the assumption of gene-environment independence. Methods are made available for public use through the CGEN R software package. © 2017 WILEY PERIODICALS, INC.
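    The "predicted allele count" used as the dosage variable in the abstract above is commonly taken to be the expected number of copies of the coded allele under the imputation posterior. The sketch below illustrates that calculation only. It assumes genotype probabilities are supplied as P(0 copies), P(1 copy), P(2 copies); the function name and the example values are hypothetical and are not part of the CGEN package.

```python
def expected_dosage(p0, p1, p2):
    """Expected count of the coded allele, given posterior probabilities
    of carrying 0, 1 or 2 copies (assumed input format)."""
    total = p0 + p1 + p2
    if abs(total - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    # E[count] = 0 * p0 + 1 * p1 + 2 * p2
    return p1 + 2.0 * p2


# Hypothetical imputation output for one individual at one SNP.
print(expected_dosage(0.1, 0.3, 0.6))
```

    A typed SNP is the special case in which one of the three probabilities equals 1, so the dosage reduces to the observed allele count.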

  7. Genome-Wide Association Studies for Taxane-Induced Peripheral Neuropathy in ECOG-5103 and ECOG-1199.

    PubMed

    Schneider, Bryan P; Li, Lang; Radovich, Milan; Shen, Fei; Miller, Kathy D; Flockhart, David A; Jiang, Guanglong; Vance, Gail; Gardner, Laura; Vatta, Matteo; Bai, Shaochun; Lai, Dongbing; Koller, Daniel; Zhao, Fengmin; O'Neill, Anne; Smith, Mary Lou; Railey, Elda; White, Carol; Partridge, Ann; Sparano, Joseph; Davidson, Nancy E; Foroud, Tatiana; Sledge, George W

    2015-11-15

    Taxane-induced peripheral neuropathy (TIPN) is an important survivorship issue for many cancer patients. Currently, there are no clinically implemented biomarkers to predict which patients might be at increased risk for TIPN. We present a comprehensive approach to identification of genetic variants to predict TIPN. We performed a genome-wide association study (GWAS) in 3,431 patients from the phase III adjuvant breast cancer trial, ECOG-5103 to compare genotypes with TIPN. We performed candidate validation of top SNPs for TIPN in another phase III adjuvant breast cancer trial, ECOG-1199. When evaluating for grade 3-4 TIPN, 120 SNPs had a P value of <10(-4) from patients of European descent (EA) in ECOG-5103. Thirty candidate SNPs were subsequently tested in ECOG-1199 and SNP rs3125923 was found to be significantly associated with grade 3-4 TIPN (P = 1.7 × 10(-3); OR, 1.8). Race was also a major predictor of TIPN, with patients of African descent (AA) experiencing increased risk of grade 2-4 TIPN (HR, 2.1; P = 5.6 × 10(-16)) and grade 3-4 TIPN (HR, 2.6; P = 1.1 × 10(-11)) compared with others. An SNP in FCAMR, rs1856746, had a trend toward an association with grade 2-4 TIPN in AA patients from the GWAS in ECOG-5103 (OR, 5.5; P = 1.6 × 10(-7)). rs3125923 represents a validated SNP to predict grade 3-4 TIPN. Genetically determined AA race represents the most significant predictor of TIPN. ©2015 American Association for Cancer Research.

  8. Comparison of Family History and SNPs for Predicting Risk of Complex Disease

    PubMed Central

    Do, Chuong B.; Hinds, David A.; Francke, Uta; Eriksson, Nicholas

    2012-01-01

    The clinical utility of family history and genetic tests is generally well understood for simple Mendelian disorders and rare subforms of complex diseases that are directly attributable to highly penetrant genetic variants. However, little is presently known regarding the performance of these methods in situations where disease susceptibility depends on the cumulative contribution of multiple genetic factors of moderate or low penetrance. Using quantitative genetic theory, we develop a model for studying the predictive ability of family history and single nucleotide polymorphism (SNP)–based methods for assessing risk of polygenic disorders. We show that family history is most useful for highly common, heritable conditions (e.g., coronary artery disease), where it explains roughly 20%–30% of disease heritability, on par with the most successful SNP models based on associations discovered to date. In contrast, we find that for diseases of moderate or low frequency (e.g., Crohn disease) family history accounts for less than 4% of disease heritability, substantially lagging behind SNPs in almost all cases. These results indicate that, for a broad range of diseases, already identified SNP associations may be better predictors of risk than their family history–based counterparts, despite the large fraction of missing heritability that remains to be explained. Our model illustrates the difficulty of using either family history or SNPs for standalone disease prediction. On the other hand, we show that, unlike family history, SNP–based tests can reveal extreme likelihood ratios for a relatively large percentage of individuals, thus providing potentially valuable adjunctive evidence in a differential diagnosis. PMID:23071447

  9. Observational study to calculate addictive risk to opioids: a validation study of a predictive algorithm to evaluate opioid use disorder

    PubMed Central

    Brenton, Ashley; Richeimer, Steven; Sharma, Maneesh; Lee, Chee; Kantorovich, Svetlana; Blanchard, John; Meshkin, Brian

    2017-01-01

    Background: Opioid abuse in chronic pain patients is a major public health issue, with rapidly increasing addiction rates and deaths from unintentional overdose more than quadrupling since 1999. Purpose: This study seeks to determine the predictability of aberrant behavior to opioids using a comprehensive scoring algorithm incorporating phenotypic risk factors and neuroscience-associated single-nucleotide polymorphisms (SNPs). Patients and methods: The Proove Opioid Risk (POR) algorithm determines the predictability of aberrant behavior to opioids using a comprehensive scoring algorithm incorporating phenotypic risk factors and neuroscience-associated SNPs. In a validation study with 258 subjects with diagnosed opioid use disorder (OUD) and 650 controls who reported using opioids, the POR successfully categorized patients at high and moderate risks of opioid misuse or abuse with 95.7% sensitivity. Regardless of changes in the prevalence of opioid misuse or abuse, the sensitivity of POR remained >95%. Conclusion: The POR correctly stratifies patients into low-, moderate-, and high-risk categories to appropriately identify patients in need of additional guidance, monitoring, or treatment changes. PMID:28572737

  10. [The joint applications of DNA chips and single nucleotide polymorphisms in forensic science].

    PubMed

    Bai, Peng; Tian, Li; Zhou, Xue-ping

    2005-05-01

    DNA chip technology, a new high technology, is developing rapidly. Single nucleotide polymorphisms (SNPs) are the most common form of variation in the human genome. They provide suitable genetic markers that play a key role in disease linkage studies, pharmacogenomics, forensic medicine, population evolution and migration studies. Because they can be analyzed with DNA chip technology, SNPs are predicted to play an important role in forensic medicine, especially in paternity testing and individual identification. This report mainly reviews the characteristics of DNA chips and SNPs, and their joint applications in the practice of forensic medicine.

  11. Whole-Genome Sequences of DA and F344 Rats with Different Susceptibilities to Arthritis, Autoimmunity, Inflammation and Cancer

    PubMed Central

    Guo, Xiaosen; Brenner, Max; Zhang, Xuemei; Laragione, Teresina; Tai, Shuaishuai; Li, Yanhong; Bu, Junjie; Yin, Ye; Shah, Anish A.; Kwan, Kevin; Li, Yingrui; Jun, Wang; Gulko, Pércio S.

    2013-01-01

    DA (D-blood group of Palm and Agouti, also known as Dark Agouti) and F344 (Fischer) are two inbred rat strains with differences in several phenotypes, including susceptibility to autoimmune disease models and inflammatory responses. While these strains have been extensively studied, little information is available about the DA and F344 genomes, as only the Brown Norway (BN) and spontaneously hypertensive rat strains have been sequenced to date. Here we report the sequencing of the DA and F344 genomes using next-generation Illumina paired-end read technology and the first de novo assembly of a rat genome. DA and F344 were sequenced with an average depth of 32-fold, covered 98.9% of the BN reference genome, and included 97.97% of known rat ESTs. New sequences could be assigned to 59 million positions with previously unknown data in the BN reference genome. Differences between DA, F344, and BN included 19 million positions in novel scaffolds, 4.09 million single nucleotide polymorphisms (SNPs) (including 1.37 million new SNPs), 458,224 short insertions and deletions, and 58,174 structural variants. Genetic differences between DA, F344, and BN, including high-impact SNPs and short insertions and deletions affecting >2500 genes, are likely to account for most of the phenotypic variation between these strains. The new DA and F344 genome sequencing data should facilitate gene discovery efforts in rat models of human disease. PMID:23695301

  12. Whole-genome sequences of DA and F344 rats with different susceptibilities to arthritis, autoimmunity, inflammation and cancer.

    PubMed

    Guo, Xiaosen; Brenner, Max; Zhang, Xuemei; Laragione, Teresina; Tai, Shuaishuai; Li, Yanhong; Bu, Junjie; Yin, Ye; Shah, Anish A; Kwan, Kevin; Li, Yingrui; Jun, Wang; Gulko, Pércio S

    2013-08-01

    DA (D-blood group of Palm and Agouti, also known as Dark Agouti) and F344 (Fischer) are two inbred rat strains with differences in several phenotypes, including susceptibility to autoimmune disease models and inflammatory responses. While these strains have been extensively studied, little information is available about the DA and F344 genomes, as only the Brown Norway (BN) and spontaneously hypertensive rat strains have been sequenced to date. Here we report the sequencing of the DA and F344 genomes using next-generation Illumina paired-end read technology and the first de novo assembly of a rat genome. DA and F344 were sequenced with an average depth of 32-fold, covered 98.9% of the BN reference genome, and included 97.97% of known rat ESTs. New sequences could be assigned to 59 million positions with previously unknown data in the BN reference genome. Differences between DA, F344, and BN included 19 million positions in novel scaffolds, 4.09 million single nucleotide polymorphisms (SNPs) (including 1.37 million new SNPs), 458,224 short insertions and deletions, and 58,174 structural variants. Genetic differences between DA, F344, and BN, including high-impact SNPs and short insertions and deletions affecting >2500 genes, are likely to account for most of the phenotypic variation between these strains. The new DA and F344 genome sequencing data should facilitate gene discovery efforts in rat models of human disease.

  13. A High Quality Draft Consensus Sequence of the Genome of a Heterozygous Grapevine Variety

    PubMed Central

    Cartwright, Dustin A.; Cestaro, Alessandro; Pruss, Dmitry; Pindo, Massimo; FitzGerald, Lisa M.; Vezzulli, Silvia; Reid, Julia; Malacarne, Giulia; Iliev, Diana; Coppola, Giuseppina; Wardell, Bryan; Micheletti, Diego; Macalma, Teresita; Facci, Marco; Mitchell, Jeff T.; Perazzolli, Michele; Eldredge, Glenn; Gatto, Pamela; Oyzerski, Rozan; Moretto, Marco; Gutin, Natalia; Stefanini, Marco; Chen, Yang; Segala, Cinzia; Davenport, Christine; Demattè, Lorenzo; Mraz, Amy; Battilana, Juri; Stormo, Keith; Costa, Fabrizio; Tao, Quanzhou; Si-Ammour, Azeddine; Harkins, Tim; Lackey, Angie; Perbost, Clotilde; Taillon, Bruce; Stella, Alessandra; Solovyev, Victor; Fawcett, Jeffrey A.; Sterck, Lieven; Vandepoele, Klaas; Grando, Stella M.; Toppo, Stefano; Moser, Claudio; Lanchbury, Jerry; Bogden, Robert; Skolnick, Mark; Sgaramella, Vittorio; Bhatnagar, Satish K.; Fontana, Paolo; Gutin, Alexander; Van de Peer, Yves; Salamini, Francesco; Viola, Roberto

    2007-01-01

    Background: Worldwide, grapes and their derived products have a large market. The cultivated grape species Vitis vinifera has potential to become a model for fruit tree genetics. Like many plant species, it is highly heterozygous, which is an additional challenge to modern whole genome shotgun sequencing. In this paper a high quality draft genome sequence of a cultivated clone of V. vinifera Pinot Noir is presented. Principal Findings: We estimate the genome size of V. vinifera to be 504.6 Mb. Genomic sequences corresponding to 477.1 Mb were assembled in 2,093 metacontigs and 435.1 Mb were anchored to the 19 linkage groups (LGs). The number of predicted genes is 29,585, of which 96.1% were assigned to LGs. This assembly of the grape genome provides candidate genes implicated in traits relevant to grapevine cultivation, such as those influencing wine quality, via secondary metabolites, and those connected with the extreme susceptibility of grape to pathogens. Single nucleotide polymorphism (SNP) distribution was consistent with a diffuse haplotype structure across the genome. Of around 2,000,000 SNPs, 1,751,176 were mapped to chromosomes and one or more of them were identified in 86.7% of anchored genes. The relative age of grape duplicated genes was estimated, and this made it possible to reveal a relatively recent Vitis-specific large scale duplication event concerning at least 10 chromosomes (duplication not reported before). Conclusions: Sanger shotgun sequencing and highly efficient sequencing by synthesis (SBS), together with dedicated assembly programs, resolved a complex heterozygous genome. A consensus sequence of the genome and a set of mapped marker loci were generated. Homologous chromosomes of Pinot Noir differ by 11.2% of their DNA (hemizygous DNA plus chromosomal gaps). SNP markers are offered as a tool with the potential of introducing a new era in the molecular breeding of grape. PMID:18094749

  14. Structural and functional effects of nucleotide variation on the human TB drug metabolizing enzyme arylamine N-acetyltransferase 1.

    PubMed

    Cloete, Ruben; Akurugu, Wisdom A; Werely, Cedric J; van Helden, Paul D; Christoffels, Alan

    2017-08-01

    The human arylamine N-acetyltransferase 1 (NAT1) enzyme plays a vital role in determining the duration of action of amine-containing drugs such as para-aminobenzoic acid (PABA) by influencing the balance between detoxification and metabolic activation of these drugs. Recently, four novel single nucleotide polymorphisms (SNPs) were identified within a South African mixed ancestry population. Modeling the effects of these SNPs within the structural protein was done to assess possible structure and function changes in the enzyme. The use of molecular dynamics simulations and stability predictions indicated less thermodynamically stable protein structures containing E264K and V231G, while the N245I change showed a stabilizing effect. Coincidentally, the N245I change displayed a similar free energy landscape profile to the known R64W amino acid substitution (slow acetylator), while the R242M displayed a similar profile to the published variant, I263V (proposed fast acetylator), and the wild type protein structure. Similarly, principal component analysis indicated that two amino acid substitutions (E264K and V231G) occupied fewer conformational clusters of folded states as compared to the WT and were found to be destabilizing (may affect protein function). However, two of the four novel SNPs that result in amino acid changes (V231G and N245I) were predicted by both SIFT and POLYPHEN-2 algorithms to affect NAT1 protein function, while two other SNPs that result in R242M and E264K substitutions showed contradictory results based on SIFT and POLYPHEN-2 analysis. In conclusion, the structural methods were able to verify that two non-synonymous substitutions (E264K and V231G) can destabilize the protein structure, and are in agreement with mCSM predictions, and should therefore be experimentally tested for NAT1 activity. These findings could inform a strategy of incorporating genotypic data (i.e., functional SNP alleles) with phenotypic information (slow or fast acetylator) to better prescribe effective treatment using drugs metabolized by NAT1. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  15. Clinical Utility of a Coronary Heart Disease Risk Prediction Gene Score in UK Healthy Middle Aged Men and in the Pakistani Population

    PubMed Central

    Beaney, Katherine E.; Cooper, Jackie A.; Ullah Shahid, Saleem; Ahmed, Waqas; Qamar, Raheel; Drenos, Fotios; Crockard, Martin A.; Humphries, Steve E.

    2015-01-01

    Background: Numerous risk prediction algorithms based on conventional risk factors for Coronary Heart Disease (CHD) are available but provide only modest discrimination. The inclusion of genetic information may improve clinical utility. Methods: We tested the use of two gene scores (GS) in the prospective second Northwick Park Heart Study (NPHSII) of 2775 healthy UK men (284 cases), and Pakistani case-control studies from Islamabad/Rawalpindi (321 cases/228 controls) and Lahore (414 cases/219 controls). The 19-SNP GS included SNPs in loci identified by GWAS and candidate gene studies, while the 13-SNP GS only included SNPs in loci identified by the CARDIoGRAMplusC4D consortium. Results: In NPHSII, the mean of both gene scores was higher in those who went on to develop CHD over 13.5 years of follow-up (19-SNP p=0.01, 13-SNP p=7 × 10(-3)). In combination with the Framingham algorithm the GSs appeared to show improvement in discrimination (increase in area under the ROC curve, 19-SNP p=0.48, 13-SNP p=0.82) and risk classification (net reclassification improvement (NRI), 19-SNP p=0.28, 13-SNP p=0.42) compared to the Framingham algorithm alone, but these were not statistically significant. When considering only individuals who moved up a risk category with inclusion of the GS, the improvement in risk classification was statistically significant (19-SNP p=0.01, 13-SNP p=0.04). In the Pakistani samples, risk allele frequencies were significantly lower compared to NPHSII for 13/19 SNPs. In the Islamabad study, the mean gene score was higher in cases than controls only for the 13-SNP GS (2.24 vs. 2.34, p=0.04). There was no association between CHD and either score in the Lahore study. Conclusion: The performance of both GSs showed potential clinical utility in European men but much less utility in subjects from Pakistan, suggesting that a different set of risk loci or SNPs may be required for risk prediction in the South Asian population. PMID:26133560

  16. Genetic information and the prediction of incident type 2 diabetes in a high-risk multiethnic population: the EpiDREAM genetic study.

    PubMed

    Anand, Sonia S; Meyre, David; Pare, Guillaume; Bailey, Swneke D; Xie, Changchun; Zhang, Xiaohe; Montpetit, Alexandre; Desai, Dipika; Bosch, Jackie; Mohan, Viswanathan; Diaz, Rafael; McQueen, Matthew J; Cordell, Heather J; Keavney, Bernard; Yusuf, Salim; Gaudet, Daniel; Gerstein, Hertzel; Engert, James C

    2013-09-01

    To determine if 16 single nucleotide polymorphisms (SNPs) associated with type 2 diabetes (T2DM) in Europeans are also associated with T2DM in South Asians and Latinos and if they can add to the prediction of incident T2DM in a high-risk population. In the EpiDREAM prospective cohort study, physical measures, questionnaires, and blood samples were collected from 25,063 individuals at risk for dysglycemia. Sixteen SNPs that have been robustly associated with T2DM in Europeans were genotyped. Among 15,466 European, South Asian, and Latino subjects, we examined the association of these 16 SNPs alone and combined in a gene score with incident cases of T2DM (n = 1,016) that developed during 3.3 years of follow-up. Nine of the 16 SNPs were significantly associated with T2DM, and their direction of effect was consistent across the three ethnic groups. The gene score was significantly higher among subjects who developed incident T2DM (cases vs. noncases: 16.47 [2.50] vs. 15.99 [2.56]; P = 0.00001). The gene score remained an independent predictor of incident T2DM, with an odds ratio of 1.08 (95% CI 1.05-1.11) per additional risk allele after adjustment for T2DM risk factors. The gene score in those with no family history of T2DM was 16.02, whereas it was 16.19 in those with one parent with T2DM and it was 16.32 in those with two parents with T2DM (P trend = 0.0004). The C statistic of T2DM risk factors was 0.708 (0.691-0.725) and increased only marginally to 0.714 (0.698-0.731) with the addition of the gene score (P for C statistic change = 0.0052). T2DM genetic associations are generally consistent across ethnic groups, and a gene score only adds marginal information to clinical factors for T2DM prediction.
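    The per-allele odds ratio reported in the abstract above can be read as a multiplicative effect on the odds. The sketch below assumes a log-additive (logistic) model, under which k additional risk alleles multiply the odds by the per-allele odds ratio raised to the power k; the function name is hypothetical, and the calculation only illustrates the arithmetic rather than reproducing the study's analysis.

```python
def odds_multiplier(or_per_allele, extra_alleles):
    """Factor by which the odds change for a given difference in risk-allele
    count, assuming a log-additive model (an assumption of this sketch)."""
    if or_per_allele <= 0:
        raise ValueError("odds ratio must be positive")
    return or_per_allele ** extra_alleles


# Odds ratio per additional risk allele, as reported in the abstract.
per_allele = 1.08

# Two extra risk alleles.
print(round(odds_multiplier(per_allele, 2), 4))

# Difference in mean gene score between cases and noncases (16.47 - 15.99).
print(round(odds_multiplier(per_allele, 16.47 - 15.99), 4))
```

    Because each allele contributes only a small multiplicative change, even several extra risk alleles shift the odds modestly, which is in line with the marginal change in the C statistic that the abstract reports.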

  17. Estimated allele substitution effects underlying genomic evaluation models depend on the scaling of allele counts.

    PubMed

    Bouwman, Aniek C; Hayes, Ben J; Calus, Mario P L

    2017-10-30

    Genomic evaluation is used to predict direct genomic values (DGV) for selection candidates in breeding programs, but also to estimate allele substitution effects (ASE) of single nucleotide polymorphisms (SNPs). Scaling of allele counts influences the estimated ASE, because scaling of allele counts results in less shrinkage towards the mean for low minor allele frequency (MAF) variants. Scaling may become relevant for estimating ASE as more low MAF variants will be used in genomic evaluations. We show the impact of scaling on estimates of ASE using real data and a theoretical framework, and in terms of power, model fit and predictive performance. In a dairy cattle dataset with 630 K SNP genotypes, the correlation between DGV for stature from a random regression model using centered allele counts (RRc) and centered and scaled allele counts (RRcs) was 0.9988, whereas the overall correlation between ASE using RRc and RRcs was 0.27. The main difference in ASE between both methods was found for SNPs with a MAF lower than 0.01. Both the ratio (ASE from RRcs/ASE from RRc) and the regression coefficient (regression of ASE from RRcs on ASE from RRc) were much higher than 1 for low MAF SNPs. Derived equations showed that scenarios with a high heritability, a large number of individuals and a small number of variants have lower ratios between ASE from RRc and RRcs. We also investigated the optimal scaling parameter [from -1 (RRcs) to 0 (RRc) in steps of 0.1] in the bovine stature dataset. We found that the log-likelihood was maximized with a scaling parameter of -0.8, while the mean squared error of prediction was minimized with a scaling parameter of -1, i.e., RRcs. Large differences in estimated ASE were observed for low MAF SNPs when allele counts were scaled or not scaled because there is less shrinkage towards the mean for scaled allele counts. We derived a theoretical framework that shows that the difference in ASE due to shrinkage is heavily influenced by the power of the data. Increasing the power results in smaller differences in ASE whether allele counts are scaled or not.
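    The difference between centered (RRc) and centered and scaled (RRcs) allele counts in the abstract above can be sketched for a single SNP. The parameterization used here is an assumption, since the abstract does not give the formula: each count x is transformed to (x - 2p) * (2p(1 - p))^(s/2), where p is the allele frequency and s is the scaling parameter, so that s = 0 gives centering only and s = -1 divides by the expected standard deviation. The function name and the genotype data are hypothetical.

```python
def scale_allele_counts(counts, s=0.0):
    """Center allele counts (0, 1 or 2 per individual) at 2p and multiply by
    (2p(1-p))**(s/2). Assumed form: s=0 -> centered only, s=-1 -> centered
    and scaled by the expected standard deviation."""
    n = len(counts)
    if n == 0:
        raise ValueError("need at least one genotype")
    p = sum(counts) / (2.0 * n)          # allele frequency of the counted allele
    variance = 2.0 * p * (1.0 - p)       # expected genotype variance
    if variance == 0.0:
        raise ValueError("monomorphic SNP cannot be scaled")
    factor = variance ** (s / 2.0)
    return [(x - 2.0 * p) * factor for x in counts]


# Hypothetical low-frequency SNP: one heterozygote among four individuals.
genotypes = [0, 0, 0, 1]
centered = scale_allele_counts(genotypes, s=0.0)
scaled = scale_allele_counts(genotypes, s=-1.0)
print(centered)
print(scaled)
```

    For a low-frequency SNP the expected variance is below 1, so a negative scaling parameter enlarges the covariate relative to centering alone. Intermediate values such as -0.8 fall between the two cases.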

  18. Association between genetic variants and esophageal cancer risk.

    PubMed

    Yue, Chenli; Li, Miao; Da, Chenxing; Meng, Hongtao; Lv, Shaomin; Zhao, Xinhan

    2017-07-18

    We investigated whether single nucleotide polymorphisms (SNPs) in the nuclear assembly factor 1 (NAF1) and TNFAIP3-interacting protein 1 (TNIP1) genes were associated with susceptibility to esophageal cancer in a Chinese Han population. Five SNPs were genotyped and their relationship with esophageal cancer risk was analyzed in a sample of 386 esophageal cancer patients and 495 unrelated healthy controls recruited from the First Affiliated Hospital of Xi'an Jiaotong University. Patients with the AG genotype of rs2320615 were at lower risk of developing esophageal cancer than those with the GG genotype (adjusted odds ratio [OR] = 0.64, 95% confidence interval [CI] = 0.46-0.90, P = 0.009). The rs2320615 SNP was also associated with a decreased risk of esophageal cancer in the dominant model (adjusted OR = 0.70, 95% CI = 0.51-0.96, P = 0.026). These results provide the first evidence that rs2320615 in NAF1 is associated with a reduced risk of esophageal cancer. Further studies with larger samples are warranted to confirm our findings.
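
    For readers unfamiliar with the statistic, an odds ratio and its 95% confidence interval can be computed from a 2x2 table as sketched below. This is an unadjusted estimate using the standard log-odds (Woolf) interval, whereas the abstract reports covariate-adjusted values. The function name and the counts are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    All four cell counts must be positive. z = 1.96 gives roughly 95%.
    """
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log = math.sqrt(1.0 / exposed_cases + 1.0 / unexposed_cases
                       + 1.0 / exposed_controls + 1.0 / unexposed_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: carriers vs. non-carriers among cases and controls.
print(odds_ratio_ci(20, 80, 40, 60))
```

    An interval that lies entirely below 1, as in the abstract's reported results, indicates a lower odds of disease in the carrier group at the chosen confidence level.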

  19. HRM and SNaPshot as alternative forensic SNP genotyping methods.

    PubMed

    Mehta, Bhavik; Daniel, Runa; McNevin, Dennis

    2017-09-01

    Single nucleotide polymorphisms (SNPs) have been widely used in forensics for prediction of identity, biogeographical ancestry (BGA) and externally visible characteristics (EVCs). Single base extension (SBE) assays, most notably SNaPshot® (Thermo Fisher Scientific), are commonly used for forensic SNP genotyping as they can be employed on standard instrumentation in forensic laboratories (e.g. capillary electrophoresis). High resolution melt (HRM) analysis is an alternative method and is a simple, fast, single tube assay for low throughput SNP typing. This study compares HRM and SNaPshot®. HRM produced reproducible and concordant genotypes at 500 pg; however, difficulties were encountered when genotyping SNPs with high GC content in flanking regions and when differentiating variants of symmetrical SNPs. SNaPshot® was reproducible at 100 pg and is less dependent on SNP choice. HRM has a shorter processing time than SNaPshot®, avoids post-PCR contamination risk and has potential as a screening tool for many forensic applications.

  20. Integrating multiple molecular sources into a clinical risk prediction signature by extracting complementary information.

    PubMed

    Hieke, Stefanie; Benner, Axel; Schlenk, Richard F; Schumacher, Martin; Bullinger, Lars; Binder, Harald

    2016-08-30

    High-throughput technology allows for genome-wide measurements at different molecular levels for the same patient, e.g. single nucleotide polymorphisms (SNPs) and gene expression. Correspondingly, it might be beneficial to also integrate complementary information from different molecular levels when building multivariable risk prediction models for a clinical endpoint, such as treatment response or survival. Unfortunately, such a high-dimensional modeling task will often be complicated by a limited overlap of molecular measurements at different levels between patients, i.e. measurements from all molecular levels are available only for a smaller proportion of patients. We propose a sequential strategy for building clinical risk prediction models that integrate genome-wide measurements from two molecular levels in a complementary way. To deal with partial overlap, we develop an imputation approach that allows us to use all available data. This approach is investigated in two acute myeloid leukemia applications combining gene expression with either SNP or DNA methylation data. After obtaining a sparse risk prediction signature e.g. from SNP data, an automatically selected set of prognostic SNPs, by componentwise likelihood-based boosting, imputation is performed for the corresponding linear predictor by a linking model that incorporates e.g. gene expression measurements. The imputed linear predictor is then used for adjustment when building a prognostic signature from the gene expression data. For evaluation, we consider stability, as quantified by inclusion frequencies across resampling data sets. Despite an extremely small overlap in the application example with gene expression and SNPs, several genes are seen to be more stably identified when taking the (imputed) linear predictor from the SNP data into account. 
    In the application with gene expression and DNA methylation, prediction performance with respect to survival also indicates that the proposed approach might work well. We consider imputation of linear predictor values to be a feasible and sensible approach for dealing with partial overlap in complementary integrative analysis of molecular measurements at different levels. More generally, these results indicate that a complementary strategy for integrating different molecular levels can result in more stable risk prediction signatures, potentially providing a more reliable insight into the underlying biology.

  1. On the distance of genetic relationships and the accuracy of genomic prediction in pig breeding.

    PubMed

    Meuwissen, Theo H E; Odegard, Jorgen; Andersen-Ranberg, Ina; Grindflek, Eli

    2014-08-01

    With the advent of genomic selection, alternative relationship matrices are used in animal breeding, which vary in their coverage of distant relationships due to old common ancestors. Relationships based on pedigree (A) and linkage analysis (GLA) cover only recent relationships because of the limited depth of the known pedigree. Relationships based on identity-by-state (G) include relationships up to the age of the SNP (single nucleotide polymorphism) mutations. We hypothesised that the latter relationships were too old, since QTL (quantitative trait locus) mutations for traits under selection were probably more recent than the SNPs on a chip, which are typically selected for high minor allele frequency. In addition, A and GLA relationships are too recent to cover genetic differences accurately. Thus, we devised a relationship matrix that considered intermediate-aged relationships and compared all these relationship matrices for their accuracy of genomic prediction in a pig breeding situation. Haplotypes were constructed and used to build a haplotype-based relationship matrix (GH), which considers more intermediate-aged relationships, since haplotypes recombine more quickly than SNPs mutate. Dense genotypes (38 453 SNPs) on 3250 elite breeding pigs were combined with phenotypes for growth rate (2668 records), lean meat percentage (2618), weight at three weeks of age (7387) and number of teats (5851) to estimate breeding values for all animals in the pedigree (8187 animals) using the aforementioned relationship matrices. Phenotypes on the youngest 424 to 486 animals were masked and predicted in order to assess the accuracy of the alternative genomic predictions. Correlations between the relationships and regressions of older on younger relationships revealed that the age of the relationships increased in the order A, GLA, GH and G. Use of genomic relationship matrices yielded significantly higher prediction accuracies than A. 
    GH and G did not differ significantly from each other, but both were significantly more accurate than GLA. Our hypothesis that intermediate-aged relationships yield more accurate genomic predictions than G was confirmed for two of four traits, but these results were not statistically significant. Use of estimated genotype probabilities for ungenotyped animals proved to be an efficient method to include the phenotypes of ungenotyped animals.
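
    A SNP-based relationship matrix of the kind denoted G above can be sketched with one widely used construction, VanRaden's first method: G = ZZ' / (2 Σ p(1-p)), where Z holds allele counts centered by twice the allele frequency. Whether the study used exactly this formula is an assumption; the function name and toy genotypes are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Build G = Z Z' / (2 * sum_j p_j (1 - p_j)) from 0/1/2 allele counts.

    genotypes: list of rows (animals), each a list of allele counts per SNP.
    Allele frequencies are estimated from the genotyped sample itself.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    denom = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if denom == 0.0:
        raise ValueError("all SNPs are monomorphic")
    # Center each allele count by its expected value 2p.
    z = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    return [[sum(z[a][j] * z[b][j] for j in range(m)) / denom
             for b in range(n)]
            for a in range(n)]


# Hypothetical toy genotypes: 4 animals, 3 SNPs.
toy = [[0, 1, 2], [1, 1, 0], [2, 1, 1], [1, 1, 1]]
for row in genomic_relationship_matrix(toy):
    print([round(value, 3) for value in row])
```

    The haplotype-based matrix (GH) and the linkage-analysis matrix (GLA) in the abstract require phased data and pedigree information and are not covered by this sketch.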

  2. A Novel Multiplex HRM Assay to Detect Clopidogrel Resistance.

    PubMed

    Zhang, Lichen; Ma, Xiaowei; You, Guoling; Zhang, Xiaoqing; Fu, Qihua

    2017-11-22

    Clopidogrel is an antiplatelet medicine used to prevent blood clots in patients who have had a heart attack, stroke, or other symptoms. Variability in the clinical response to clopidogrel treatment has been attributed to genetic factors. In particular, five SNPs (rs4244285, rs4986893, rs12248560, rs662 and rs1045642) have been associated with resistance to clopidogrel therapy in the Chinese population. This work involves the development of a multiplex high-resolution melting (HRM) assay to genotype all five of these loci in 2 tubes. Amplicons corresponding to distinct SNPs in a common tube were designed with the aid of uMelt prediction software to have different melting temperatures (Tm) by addition of a GC-rich tail to the 5' end of certain primers. Two commercial methods, Digital Fluorescence Molecular Hybridization (DFMH) and Sanger sequencing, were used as controls. Three hundred sixteen DFMH pretested samples from consecutive acute coronary syndrome patients were used for a blinded study of multiplex HRM. The sensitivity of HRM was 100% and the specificity was 99.93%, reflecting detection of variants other than the known resistance SNPs. Multiplex HRM is an effective closed-tube, highly accurate, fast, and inexpensive method for genotyping the 5 clopidogrel resistance associated SNPs.

  3. Validation of Clinical Testing for Warfarin Sensitivity

    PubMed Central

    Langley, Michael R.; Booker, Jessica K.; Evans, James P.; McLeod, Howard L.; Weck, Karen E.

    2009-01-01

    Responses to warfarin (Coumadin) anticoagulation therapy are affected by genetic variability in both the CYP2C9 and VKORC1 genes. Validation of pharmacogenetic testing for warfarin responses includes demonstration of analytical validity of testing platforms and of the clinical validity of testing. We compared four platforms for determining the relevant single nucleotide polymorphisms (SNPs) in both CYP2C9 and VKORC1 that are associated with warfarin sensitivity (Third Wave Invader Plus, ParagonDx/Cepheid Smart Cycler, Idaho Technology LightCycler, and AutoGenomics Infiniti). Each method was examined for accuracy, cost, and turnaround time. All genotyping methods demonstrated greater than 95% accuracy for identifying the relevant SNPs (CYP2C9 *2 and *3; VKORC1 −1639 or 1173). The ParagonDx and Idaho Technology assays had the shortest turnaround and hands-on times. The Third Wave assay was readily scalable to higher test volumes but had the longest hands-on time. The AutoGenomics assay interrogated the largest number of SNPs but had the longest turnaround time. Four published warfarin-dosing algorithms (Washington University, UCSF, Louisville, and Newcastle) were compared for accuracy for predicting warfarin dose in a retrospective analysis of a local patient population on long-term, stable warfarin therapy. The predicted doses from both the Washington University and UCSF algorithms demonstrated the best correlation with actual warfarin doses. PMID:19324988

  4. Validation of clinical testing for warfarin sensitivity: comparison of CYP2C9-VKORC1 genotyping assays and warfarin-dosing algorithms.

    PubMed

    Langley, Michael R; Booker, Jessica K; Evans, James P; McLeod, Howard L; Weck, Karen E

    2009-05-01

    Responses to warfarin (Coumadin) anticoagulation therapy are affected by genetic variability in both the CYP2C9 and VKORC1 genes. Validation of pharmacogenetic testing for warfarin responses includes demonstration of analytical validity of testing platforms and of the clinical validity of testing. We compared four platforms for determining the relevant single nucleotide polymorphisms (SNPs) in both CYP2C9 and VKORC1 that are associated with warfarin sensitivity (Third Wave Invader Plus, ParagonDx/Cepheid Smart Cycler, Idaho Technology LightCycler, and AutoGenomics Infiniti). Each method was examined for accuracy, cost, and turnaround time. All genotyping methods demonstrated greater than 95% accuracy for identifying the relevant SNPs (CYP2C9 *2 and *3; VKORC1 -1639 or 1173). The ParagonDx and Idaho Technology assays had the shortest turnaround and hands-on times. The Third Wave assay was readily scalable to higher test volumes but had the longest hands-on time. The AutoGenomics assay interrogated the largest number of SNPs but had the longest turnaround time. Four published warfarin-dosing algorithms (Washington University, UCSF, Louisville, and Newcastle) were compared for accuracy for predicting warfarin dose in a retrospective analysis of a local patient population on long-term, stable warfarin therapy. The predicted doses from both the Washington University and UCSF algorithms demonstrated the best correlation with actual warfarin doses.

  5. Functional relevance for type 1 diabetes mellitus-associated genetic variants by using integrative analyses.

    PubMed

    Qiu, Ying-Hua; Deng, Fei-Yan; Tang, Zai-Xiang; Jiang, Zhen-Huan; Lei, Shu-Feng

    2015-10-01

    Type 1 diabetes mellitus (type 1 DM) is an autoimmune disease. Although genome-wide association studies (GWAS) and meta-analyses have successfully identified numerous type 1 DM-associated susceptibility loci, the underlying mechanisms for these susceptibility loci are currently largely unclear. Based on publicly available datasets, we performed integrative analyses (i.e., integrated gene relationships among implicated loci, differential gene expression analysis, functional prediction and functional annotation clustering analysis) and combined them with expression quantitative trait loci (eQTL) results to further explore the functional mechanisms underlying the associations between genetic variants and type 1 DM. Among a total of 183 type 1 DM-associated SNPs, eQTL analysis showed that 17 SNPs had cis-regulated eQTL effects on 9 genes. All 9 eQTL genes were enriched in immune-related pathways or Gene Ontology (GO) terms. Functional prediction analysis identified 5 SNPs located in transcription factor (TF) binding sites. Of the 9 eQTL genes, 6 (TAP2, HLA-DOB, HLA-DQB1, HLA-DQA1, HLA-DRB5 and CTSH) were differentially expressed in type 1 DM-related cells. In particular, rs3825932 in CTSH has integrative functional evidence supporting the association with type 1 DM. These findings indicated that integrative analyses can yield important functional information to link genetic variants and type 1 DM. Copyright © 2015 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.

  6. Quantitative self-assembly prediction yields targeted nanomedicines

    NASA Astrophysics Data System (ADS)

    Shamay, Yosi; Shah, Janki; Işık, Mehtap; Mizrachi, Aviram; Leibold, Josef; Tschaharganeh, Darjus F.; Roxbury, Daniel; Budhathoki-Uprety, Januka; Nawaly, Karla; Sugarman, James L.; Baut, Emily; Neiman, Michelle R.; Dacek, Megan; Ganesh, Kripa S.; Johnson, Darren C.; Sridharan, Ramya; Chu, Karen L.; Rajasekhar, Vinagolu K.; Lowe, Scott W.; Chodera, John D.; Heller, Daniel A.

    2018-02-01

    Development of targeted nanoparticle drug carriers often requires complex synthetic schemes involving both supramolecular self-assembly and chemical modification. These processes are generally difficult to predict, execute, and control. We describe herein a targeted drug delivery system that is accurately and quantitatively predicted to self-assemble into nanoparticles based on the molecular structures of precursor molecules, which are the drugs themselves. The drugs assemble with the aid of sulfated indocyanines into particles with ultrahigh drug loadings of up to 90%. We devised quantitative structure-nanoparticle assembly prediction (QSNAP) models to identify and validate electrotopological molecular descriptors as highly predictive indicators of nano-assembly and nanoparticle size. The resulting nanoparticles selectively targeted kinase inhibitors to caveolin-1-expressing human colon cancer and autochthonous liver cancer models to yield striking therapeutic effects while avoiding pERK inhibition in healthy skin. This finding enables the computational design of nanomedicines based on quantitative models for drug payload selection.

  7. Assembly of the Genome of the Disease Vector Aedes aegypti onto a Genetic Linkage Map Allows Mapping of Genes Affecting Disease Transmission

    PubMed Central

    Juneja, Punita; Osei-Poku, Jewelna; Ho, Yung S.; Ariani, Cristina V.; Palmer, William J.; Pain, Arnab; Jiggins, Francis M.

    2014-01-01

    The mosquito Aedes aegypti transmits some of the most important human arboviruses, including dengue, yellow fever and chikungunya viruses. It has a large genome containing many repetitive sequences, which has resulted in the genome being poorly assembled — there are 4,758 scaffolds, few of which have been assigned to a chromosome. To allow the mapping of genes affecting disease transmission, we have improved the genome assembly by scoring a large number of SNPs in recombinant progeny from a cross between two strains of Ae. aegypti, and used these to generate a genetic map. This revealed a high rate of misassemblies in the current genome, where, for example, sequences from different chromosomes were found on the same scaffold. Once these were corrected, we were able to assign 60% of the genome sequence to chromosomes and approximately order the scaffolds along the chromosome. We found that there are very large regions of suppressed recombination around the centromeres, which can extend to as much as 47% of the chromosome. To illustrate the utility of this new genome assembly, we mapped a gene that makes Ae. aegypti resistant to the human parasite Brugia malayi, and generated a list of candidate genes that could be affecting the trait. PMID:24498447

  8. Common Genetic Polymorphisms Influence Blood Biomarker Measurements in COPD.

    PubMed

    Sun, Wei; Kechris, Katerina; Jacobson, Sean; Drummond, M Bradley; Hawkins, Gregory A; Yang, Jenny; Chen, Ting-Huei; Quibrera, Pedro Miguel; Anderson, Wayne; Barr, R Graham; Basta, Patricia V; Bleecker, Eugene R; Beaty, Terri; Casaburi, Richard; Castaldi, Peter; Cho, Michael H; Comellas, Alejandro; Crapo, James D; Criner, Gerard; Demeo, Dawn; Christenson, Stephanie A; Couper, David J; Curtis, Jeffrey L; Doerschuk, Claire M; Freeman, Christine M; Gouskova, Natalia A; Han, MeiLan K; Hanania, Nicola A; Hansel, Nadia N; Hersh, Craig P; Hoffman, Eric A; Kaner, Robert J; Kanner, Richard E; Kleerup, Eric C; Lutz, Sharon; Martinez, Fernando J; Meyers, Deborah A; Peters, Stephen P; Regan, Elizabeth A; Rennard, Stephen I; Scholand, Mary Beth; Silverman, Edwin K; Woodruff, Prescott G; O'Neal, Wanda K; Bowler, Russell P

    2016-08-01

    Implementing precision medicine for complex diseases such as chronic obstructive lung disease (COPD) will require extensive use of biomarkers and an in-depth understanding of how genetic, epigenetic, and environmental variations contribute to phenotypic diversity and disease progression. A meta-analysis from two large cohorts of current and former smokers with and without COPD [SPIROMICS (N = 750); COPDGene (N = 590)] was used to identify single nucleotide polymorphisms (SNPs) associated with measurement of 88 blood proteins (protein quantitative trait loci; pQTLs). PQTLs consistently replicated between the two cohorts. Features of pQTLs were compared to previously reported expression QTLs (eQTLs). Inference of causal relations of pQTL genotypes, biomarker measurements, and four clinical COPD phenotypes (airflow obstruction, emphysema, exacerbation history, and chronic bronchitis) were explored using conditional independence tests. We identified 527 highly significant (p < 8 × 10^-10) pQTLs in 38 (43%) of blood proteins tested. Most pQTL SNPs were novel with low overlap to eQTL SNPs. The pQTL SNPs explained >10% of measured variation in 13 protein biomarkers, with a single SNP (rs7041; p = 10^-392) explaining 71%-75% of the measured variation in vitamin D binding protein (gene = GC). Some of these pQTLs [e.g., pQTLs for VDBP, sRAGE (gene = AGER), surfactant protein D (gene = SFTPD), and TNFRSF10C] have been previously associated with COPD phenotypes. Most pQTLs were local (cis), but distant (trans) pQTL SNPs in the ABO blood group locus were the top pQTL SNPs for five proteins. The inclusion of pQTL SNPs improved the clinical predictive value for the established association of sRAGE and emphysema, and the explanation of variance (R^2) for emphysema improved from 0.3 to 0.4 when the pQTL SNP was included in the model along with clinical covariates. Causal modeling provided insight into specific pQTL-disease relationships for airflow obstruction and emphysema.
    In conclusion, given the frequency of highly significant local pQTLs, the large amount of variance potentially explained by pQTL, and the differences observed between pQTL and eQTL SNPs, we recommend that protein biomarker-disease association studies take into account the potential effect of common local SNPs and that pQTLs be integrated along with eQTLs to uncover disease mechanisms. Large-scale blood biomarker studies would also benefit from close attention to the ABO blood group.

  9. Common Genetic Polymorphisms Influence Blood Biomarker Measurements in COPD

    PubMed Central

    Drummond, M. Bradley; Hawkins, Gregory A.; Yang, Jenny; Chen, Ting-huei; Quibrera, Pedro Miguel; Anderson, Wayne; Barr, R. Graham; Bleecker, Eugene R.; Beaty, Terri; Casaburi, Richard; Castaldi, Peter; Cho, Michael H.; Comellas, Alejandro; Crapo, James D.; Criner, Gerard; Demeo, Dawn; Christenson, Stephanie A.; Couper, David J.; Doerschuk, Claire M.; Freeman, Christine M.; Gouskova, Natalia A.; Han, MeiLan K.; Hanania, Nicola A.; Hansel, Nadia N.; Hersh, Craig P.; Hoffman, Eric A.; Kaner, Robert J.; Kanner, Richard E.; Kleerup, Eric C.; Lutz, Sharon; Martinez, Fernando J.; Meyers, Deborah A.; Peters, Stephen P.; Regan, Elizabeth A.; Rennard, Stephen I.; Scholand, Mary Beth; Silverman, Edwin K.; Woodruff, Prescott G.; O’Neal, Wanda K.; Bowler, Russell P.

    2016-01-01

    Implementing precision medicine for complex diseases such as chronic obstructive lung disease (COPD) will require extensive use of biomarkers and an in-depth understanding of how genetic, epigenetic, and environmental variations contribute to phenotypic diversity and disease progression. A meta-analysis from two large cohorts of current and former smokers with and without COPD [SPIROMICS (N = 750); COPDGene (N = 590)] was used to identify single nucleotide polymorphisms (SNPs) associated with measurement of 88 blood proteins (protein quantitative trait loci; pQTLs). PQTLs consistently replicated between the two cohorts. Features of pQTLs were compared to previously reported expression QTLs (eQTLs). Inference of causal relations of pQTL genotypes, biomarker measurements, and four clinical COPD phenotypes (airflow obstruction, emphysema, exacerbation history, and chronic bronchitis) were explored using conditional independence tests. We identified 527 highly significant (p < 8 × 10^-10) pQTLs in 38 (43%) of blood proteins tested. Most pQTL SNPs were novel with low overlap to eQTL SNPs. The pQTL SNPs explained >10% of measured variation in 13 protein biomarkers, with a single SNP (rs7041; p = 10^-392) explaining 71%-75% of the measured variation in vitamin D binding protein (gene = GC). Some of these pQTLs [e.g., pQTLs for VDBP, sRAGE (gene = AGER), surfactant protein D (gene = SFTPD), and TNFRSF10C] have been previously associated with COPD phenotypes. Most pQTLs were local (cis), but distant (trans) pQTL SNPs in the ABO blood group locus were the top pQTL SNPs for five proteins. The inclusion of pQTL SNPs improved the clinical predictive value for the established association of sRAGE and emphysema, and the explanation of variance (R^2) for emphysema improved from 0.3 to 0.4 when the pQTL SNP was included in the model along with clinical covariates. Causal modeling provided insight into specific pQTL-disease relationships for airflow obstruction and emphysema.
    In conclusion, given the frequency of highly significant local pQTLs, the large amount of variance potentially explained by pQTL, and the differences observed between pQTL and eQTL SNPs, we recommend that protein biomarker-disease association studies take into account the potential effect of common local SNPs and that pQTLs be integrated along with eQTLs to uncover disease mechanisms. Large-scale blood biomarker studies would also benefit from close attention to the ABO blood group. PMID:27532455

  10. Transcriptome Analysis of an Insecticide Resistant Housefly Strain: Insights about SNPs and Regulatory Elements in Cytochrome P450 Genes.

    PubMed

    Mahmood, Khalid; Højland, Dorte H; Asp, Torben; Kristensen, Michael

    2016-01-01

    Insecticide resistance in the housefly, Musca domestica, has been investigated for more than 60 years. It will enter a new era after the recent publication of the housefly genome and the development of multiple next generation sequencing technologies. The genetic background of the xenobiotic response can now be investigated in greater detail. Here, we investigate the 454-pyrosequencing transcriptome of the spinosad-resistant 791spin strain in relation to the housefly genome with focus on P450 genes. The de novo assembly of clean reads gave 35,834 contigs consisting of 21,780 sequences of the spinosad resistant strain. A total of 3,648 sequences were annotated with an enzyme code (EC) number and were mapped to 124 KEGG pathways, with metabolic processes as the most highly represented pathway. One hundred and twenty contigs were annotated as P450s covering 44 different P450 genes of housefly. Eight differentially expressed P450 genes were identified and investigated for SNPs, CpG islands and common regulatory motifs in promoter and coding regions. Functional annotation clustering of metabolic related genes and motif analysis of P450s revealed their association with epigenetic, transcription and gene expression related functions. The sequence variation analysis resulted in 12 SNPs, eight of which were found in cyp6d1. There is variation in location, size and frequency of CpG islands, and specific motifs were also identified in these P450s. Moreover, identified motifs were associated with GO terms and transcription factors using bioinformatic tools. Transcriptome data of a spinosad resistant strain provide, together with genome data, fundamental support for future research to understand evolution of resistance in houseflies. Here, we report for the first time the SNPs, CpG islands and common regulatory motifs in differentially expressed P450s.
    Taken together, our findings will serve as a stepping stone to advance understanding of the mechanism and role of P450s in xenobiotic detoxification.

  11. The Development of a High Density Linkage Map for Black Tiger Shrimp (Penaeus monodon) Based on cSNPs

    PubMed Central

    Baranski, Matthew; Gopikrishna, Gopalapillay; Robinson, Nicholas A.; Katneni, Vinaya Kumar; Shekhar, Mudagandur S.; Shanmugakarthik, Jayakani; Jothivel, Sarangapani; Gopal, Chavali; Ravichandran, Pitchaiyappan; Kent, Matthew; Arnyasi, Mariann; Ponniah, Alphis G.

    2014-01-01

    Transcriptome sequencing using Illumina RNA-seq was performed on populations of black tiger shrimp from India. Samples were collected from (i) four landing centres around the east coastline (EC) of India, (ii) survivors of a severe WSSV infection during pond culture (SUR) and (iii) the Andaman Islands (AI) in the Bay of Bengal. Equal quantities of purified total RNA from homogenates of hepatopancreas, muscle, nervous tissue, intestinal tract, heart, gonad, gills, pleopod and lymphoid organs were combined to create AI, EC and SUR pools for RNA sequencing. De novo transcriptome assembly resulted in 136,223 contigs (minimum size 100 base pairs, bp) with a total length 61 Mb, an average length of 446 bp and an average coverage of 163× across all pools. Approximately 16% of contigs were annotated with BLAST hit information and gene ontology annotations. A total of 473,620 putative SNPs/indels were identified. An Illumina iSelect genotyping array containing 6,000 SNPs was developed and used to genotype 1024 offspring belonging to seven full-sibling families. A total of 3959 SNPs were mapped to 44 linkage groups. The linkage groups consisted of between 16–129 and 13–130 markers, of length between 139–10.8 and 109.1–10.5 cM and with intervals averaging between 1.2 and 0.9 cM for the female and male maps respectively. The female map was 28% longer than the male map (4060 and 2917 cM respectively) with a 1.6 higher recombination rate observed for female compared to male meioses. This approach has substantially increased expressed sequence and DNA marker resources for tiger shrimp and is a useful resource for QTL mapping and association studies for evolutionarily and commercially important traits. PMID:24465553

  12. An SNP resource for rice genetics and breeding based on subspecies indica and japonica genome alignments.

    PubMed

    Feltus, F Alex; Wan, Jun; Schulze, Stefan R; Estill, James C; Jiang, Ning; Paterson, Andrew H

    2004-09-01

    Dense coverage of the rice genome with polymorphic DNA markers is an invaluable tool for DNA marker-assisted breeding, positional cloning, and a wide range of evolutionary studies. We have aligned drafts of two rice subspecies, indica and japonica, and analyzed levels and patterns of genetic diversity. After filtering multiple copy and low quality sequence, 408,898 candidate DNA polymorphisms (SNPs/INDELs) were discerned between the two subspecies. These filters have the consequence that our data set includes only a subset of the available SNPs (in particular excluding large numbers of SNPs that may occur between repetitive DNA alleles) but increase the likelihood that this subset is useful: Direct sequencing suggests that 79.8% +/- 7.5% of the in silico SNPs are real. The SNP sample in our database is not randomly distributed across the genome. In fact, 566 rice genomic regions had unusually high (328 contigs/48.6 Mb/13.6% of genome) or low (237 contigs/64.7 Mb/18.1% of genome) polymorphism rates. Many SNP-poor regions were substantially longer than most SNP-rich regions, covering up to 4 Mb, and possibly reflecting introgression between the respective gene pools that may have occurred hundreds of years ago. Although 46.2% +/- 8.3% of the SNPs differentiate other pairs of japonica and indica genotypes, SNP rates in rice were not predictive of evolutionary rates for corresponding genes in another grass species, sorghum. The data set is freely available at http://www.plantgenome.uga.edu/snp.

  13. An SNP Resource for Rice Genetics and Breeding Based on Subspecies Indica and Japonica Genome Alignments

    PubMed Central

    Feltus, F. Alex; Wan, Jun; Schulze, Stefan R.; Estill, James C.; Jiang, Ning; Paterson, Andrew H.

    2004-01-01

    Dense coverage of the rice genome with polymorphic DNA markers is an invaluable tool for DNA marker-assisted breeding, positional cloning, and a wide range of evolutionary studies. We have aligned drafts of two rice subspecies, indica and japonica, and analyzed levels and patterns of genetic diversity. After filtering multiple copy and low quality sequence, 408,898 candidate DNA polymorphisms (SNPs/INDELs) were discerned between the two subspecies. These filters have the consequence that our data set includes only a subset of the available SNPs (in particular excluding large numbers of SNPs that may occur between repetitive DNA alleles) but increase the likelihood that this subset is useful: Direct sequencing suggests that 79.8% ± 7.5% of the in silico SNPs are real. The SNP sample in our database is not randomly distributed across the genome. In fact, 566 rice genomic regions had unusually high (328 contigs/48.6 Mb/13.6% of genome) or low (237 contigs/64.7 Mb/18.1% of genome) polymorphism rates. Many SNP-poor regions were substantially longer than most SNP-rich regions, covering up to 4 Mb, and possibly reflecting introgression between the respective gene pools that may have occurred hundreds of years ago. Although 46.2% ± 8.3% of the SNPs differentiate other pairs of japonica and indica genotypes, SNP rates in rice were not predictive of evolutionary rates for corresponding genes in another grass species, sorghum. The data set is freely available at http://www.plantgenome.uga.edu/snp. PMID:15342564

  14. Chromatin remodeling gene EZH2 involved in the genetic etiology of autism in Chinese Han population.

    PubMed

    Li, Jun; You, Yang; Yue, Weihua; Yu, Hao; Lu, Tianlan; Wu, Zhiliu; Jia, Meixiang; Ruan, Yanyan; Liu, Jing; Zhang, Dai; Wang, Lifang

    2016-01-01

    Autism spectrum disorder (ASD) is a group of severe neurodevelopmental disorders. Epigenetic factors play a critical role in the etiology of ASD. Enhancer of zeste homolog 2 (EZH2), which encodes a histone methyltransferase, plays an important role in the process of chromatin remodeling during neurodevelopment. Further, EZH2 is located on chromosome 7q35-36, which is one of the linkage regions for autism. However, the genetic relationship between autism and EZH2 remains unclear. To investigate the association between EZH2 and autism in the Chinese Han population, we performed a family-based association study between autism and three tagged single nucleotide polymorphisms (SNPs) that covered 95.4% of the whole region of EZH2. In the discovery cohort of 239 trios, two SNPs (rs740949 and rs6464926) showed a significant association with autism. To decrease false positive results, we expanded the sample size to 427 trios. One SNP (rs6464926) was significantly associated with autism even after Bonferroni correction (p=0.008). Haplotype G-T (rs740949 and rs6464926) was a risk factor for autism (Z=2.655, p=0.008, Global p=0.024). In silico function prediction for SNPs indicated that these two SNPs might be regulatory SNPs. The expression pattern of EZH2 showed that it is highly expressed in human embryonic brains. In conclusion, our findings demonstrate that EZH2 might contribute to the genetic etiology of autism in the Chinese Han population. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  15. Application of Multi-SNP Approaches Bayesian LASSO and AUC-RF to Detect Main Effects of Inflammatory-Gene Variants Associated with Bladder Cancer Risk

    PubMed Central

    Calle, M. Luz; Rothman, Nathaniel; Urrea, Víctor; Kogevinas, Manolis; Petrus, Sandra; Chanock, Stephen J.; Tardón, Adonina; García-Closas, Montserrat; González-Neira, Anna; Vellalta, Gemma; Carrato, Alfredo; Navarro, Arcadi; Lorente-Galdós, Belén; Silverman, Debra T.; Real, Francisco X.; Wu, Xifeng; Malats, Núria

    2013-01-01

    The relationship between inflammation and cancer is well established in several tumor types, including bladder cancer. We performed an association study between 886 inflammatory-gene variants and bladder cancer risk in 1,047 cases and 988 controls from the Spanish Bladder Cancer (SBC)/EPICURO Study. A preliminary exploration with the widely used univariate logistic regression approach did not identify any significant SNP after correcting for multiple testing. We further applied two more comprehensive methods to capture the complexity of bladder cancer genetic susceptibility: Bayesian Threshold LASSO (BTL), a regularized regression method, and AUC-Random Forest (AUC-RF), a machine-learning algorithm. Both approaches explore the joint effect of markers. BTL analysis identified a signature of 37 SNPs in 34 genes showing an association with bladder cancer. AUC-RF detected an optimal predictive subset of 56 SNPs. Thirteen SNPs were identified by both methods in the total population. Using resources from the Texas Bladder Cancer study, we were able to replicate 30% of the SNPs assessed. The associations between inflammatory SNPs and bladder cancer were reexamined among non-smokers to eliminate the effect of tobacco, one of the strongest and most prevalent environmental risk factors for this tumor. A 9-SNP signature was detected by BTL. Here we report, for the first time, a set of SNPs in inflammatory genes jointly associated with bladder cancer risk. These results highlight the importance of the complex structure of genetic susceptibility associated with cancer risk. PMID:24391818

  16. SNPs selection using support vector regression and genetic algorithms in GWAS

    PubMed Central

    2014-01-01

    Introduction: This paper proposes a new methodology to simultaneously select the most relevant SNP markers for the characterization of any measurable phenotype described by a continuous variable, using Support Vector Regression with the Pearson Universal kernel (PUK) as the fitness function of a binary genetic algorithm. The proposed methodology is multi-attribute, considering several markers simultaneously to explain the phenotype, and is based jointly on statistical tools, machine learning and computational intelligence. Results: The suggested method has shown potential in simulated database 1, with additive effects only, and in the real database. In this simulated database, with a total of 1,000 markers, of which 7 have a major effect on the phenotype and the other 993 SNPs represent noise, the method identified 21 markers. Of this total, 5 are among the 7 relevant SNPs and 16 are false positives. In the real database, initially with 50,752 SNPs, we reduced the set to 3,073 markers, increasing the accuracy of the model. In simulated database 2, with additive effects and interactions (epistasis), the proposed method matched the methodology most commonly used in GWAS. Conclusions: The method suggested in this paper demonstrates its effectiveness in explaining the real phenotype (PTA for milk), because with the application of the wrapper based on a genetic algorithm and Support Vector Regression with the Pearson Universal kernel, many redundant markers were eliminated, increasing the prediction and accuracy of the model on the real database without quality control filters. The PUK demonstrated that it can replicate the performance of linear and RBF kernels. PMID:25573332
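
    The counts reported for simulated database 1 (21 markers selected, 5 of the 7 major-effect SNPs recovered, 16 false positives) can be restated as precision and recall. The Python sketch below is illustrative only; the function name `precision_recall` is hypothetical and does not come from the paper.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) computed from confusion counts."""
    selected = true_positives + false_positives   # markers the method picked
    relevant = true_positives + false_negatives   # markers that truly matter
    return true_positives / selected, true_positives / relevant

# Counts taken from the abstract for simulated database 1.
tp = 5        # major-effect SNPs that were selected
fp = 16       # selected markers that were noise
fn = 7 - tp   # major-effect SNPs that were missed

precision, recall = precision_recall(tp, fp, fn)
print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```

    Under these counts, roughly one in four selected markers is a true major-effect SNP, while five of the seven are recovered.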

  17. IL28B polymorphisms of both recipient and donor cooperate to influence IFN treatment response in HCV recurrence after liver transplantation, but IL28B SNPs of the recipient play a major role in IFN-induced blocking of HCV replication.

    PubMed

    Barbera, Floriana; Russelli, Giovanna; Pipitone, Loredana; Pietrosi, Giada; Corsale, Sveva; Vizzini, Giovanni; Gridelli, Bruno; Conaldi, Pier Giulio

    2015-04-01

    Single nucleotide polymorphisms (SNPs) of the IL28B locus are associated with a positive response to pegylated interferon-alpha and ribavirin (pegIFN-alpha/RBV) treatment of HCV-infected patients. This study evaluated the association between SNPs rs12980275, rs12979860 and rs8099917 and treatment outcome of HCV recurrent infection in HCV-positive patients who underwent liver transplant. We aimed to assess to what extent recipient and/or graft donor IL28B polymorphisms contribute to HCV clearance after transplantation influencing the response to the antiviral treatment. We found that the allele frequencies in donors were in agreement with the pattern expected in the European population. The frequency of favourable genotypes was significantly lower in recipients than in donors, reasonably because the recipients represented a group of patients affected by chronic Hepatitis C. Our study demonstrated that the positive outcome of the pegIFN-alpha/RBV treatment of HCV recurrence is associated with the co-presence of favourable genotypes of both donors and recipients. However, IL28B SNPs of the recipient seem to play a major role in this clinical setting. In particular, homozygosis of rs12979860 favourable genotype in recipients was associated with sustained virological response independently from the donor's genotype. Thus, identification of these SNPs may be useful to predict the response to IFN-based therapy of HCV recurrent infection in liver-transplanted patients.

  18. Landscape genomic analysis of candidate genes for climate adaptation in a California endemic oak, Quercus lobata.

    PubMed

    Sork, Victoria L; Squire, Kevin; Gugger, Paul F; Steele, Stephanie E; Levy, Eric D; Eckert, Andrew J

    2016-01-01

    The ability of California tree populations to survive anthropogenic climate change will be shaped by the geographic structure of adaptive genetic variation. Our goal is to test whether climate-associated candidate genes show evidence of spatially divergent selection in natural populations of valley oak, Quercus lobata, as preliminary indication of local adaptation. Using DNA from 45 individuals from 13 localities across the species' range, we sequenced portions of 40 candidate genes related to budburst/flowering, growth, osmotic stress, and temperature stress. Using 195 single nucleotide polymorphisms (SNPs), we estimated genetic differentiation across populations and correlated allele frequencies with climate gradients using single-locus and multivariate models. The top 5% of FST estimates ranged from 0.25 to 0.68, yielding loci potentially under spatially divergent selection. Environmental analyses of SNP frequencies with climate gradients revealed three significantly correlated SNPs within budburst/flowering genes and two SNPs within temperature stress genes with mean annual precipitation, after controlling for multiple testing. A redundancy model showed a significant association between SNPs and climate variables and revealed a similar set of SNPs with high loadings on the first axis. In the RDA, climate accounted for 67% of the explained variation, when holding climate constant, in contrast to a putatively neutral SSR data set where climate accounted for only 33%. Population differentiation and geographic gradients of allele frequencies in climate-associated functional genes in Q. lobata provide initial evidence of adaptive genetic variation and background for predicting population response to climate change. © 2016 Botanical Society of America.

  19. Insights Into Upland Cotton (Gossypium hirsutum L.) Genetic Recombination Based on 3 High-Density Single-Nucleotide Polymorphism and a Consensus Map Developed Independently With Common Parents.

    PubMed

    Ulloa, Mauricio; Hulse-Kemp, Amanda M; De Santiago, Luis M; Stelly, David M; Burke, John J

    2017-01-01

    High-density linkage maps are vital to supporting the correct placement of scaffolds and gene sequences on chromosomes and fundamental to contemporary organismal research and scientific approaches to genetic improvement, especially in paleopolyploids with exceptionally complex genomes, eg, upland cotton (Gossypium hirsutum L., "2n = 52"). Three independently developed intraspecific upland mapping populations were analyzed to generate 3 high-density genetic linkage single-nucleotide polymorphism (SNP) maps and a consensus map using the CottonSNP63K array. The populations consisted of a previously reported F2, a recombinant inbred line (RIL), and reciprocal RIL population, from "Phytogen 72" and "Stoneville 474" cultivars. The cluster file provided 7417 genotyped SNP markers, resulting in 26 linkage groups corresponding to the 26 chromosomes (c) of the allotetraploid upland cotton (AD)1 arisen from the merging of 2 genomes ("A" Old World and "D" New World). Patterns of chromosome-specific recombination were largely consistent across mapping populations. The high-density genetic consensus map included 7244 SNP markers that spanned 3538 cM and comprised 3824 SNP bins, of which 1783 and 2041 were in the At and Dt subgenomes with 1825 and 1713 cM map lengths, respectively. Subgenome average distances were nearly identical, indicating that subgenomic differences in bin number arose due to the high numbers of SNPs on the Dt subgenome. Examination of expected recombination frequency or crossovers (COs) on the chromosomes within each population of the 2 subgenomes revealed that COs were also not affected by the SNPs or SNP bin number in these subgenomes. Comparative alignment analyses identified historical ancestral At-subgenomic translocations of c02 and c03, as well as of c04 and c05. The consensus map SNP sequences aligned with high congruency to the NBI assembly of Gossypium hirsutum. However, the genomic comparisons revealed evidence of additional unconfirmed possible duplications, inversions and translocations, and unbalance SNP sequence homology or SNP sequence/loci genomic dominance, or homeolog loci bias of the upland tetraploid At and Dt subgenomes. The alignments indicated that 364 SNP-associated previously unintegrated scaffolds can be placed in pseudochromosomes of the NBI G hirsutum assembly. This is the first intraspecific SNP genetic linkage consensus map assembled in G hirsutum with a core of reproducible mendelian SNP markers assayed on different populations and it provides further knowledge of chromosome arrangement of genic and nongenic SNPs. Together, the consensus map and RIL populations provide a synergistically useful platform for localizing and identifying agronomically important loci for improvement of the cotton crop.

  20. Identification of IDUA and WNT16 Phosphorylation-Related Non-Synonymous Polymorphisms for Bone Mineral Density in Meta-Analyses of Genome-Wide Association Studies

    PubMed Central

    Niu, Tianhua; Liu, Ning; Yu, Xun; Zhao, Ming; Choi, Hyung Jin; Leo, Paul J.; Brown, Matthew A.; Zhang, Lei; Pei, Yu-Fang; Shen, Hui; He, Hao; Fu, Xiaoying; Lu, Shan; Chen, Xiang-Ding; Tan, Li-Jun; Yang, Tie-Lin; Guo, Yan; Cho, Nam H.; Shen, Jie; Guo, Yan-Fang; Nicholson, Geoffrey C.; Prince, Richard L.; Eisman, John A.; Jones, Graeme; Sambrook, Philip N.; Tian, Qing; Zhu, Xue-Zhen; Papasian, Christopher J.; Duncan, Emma L.; Uitterlinden, André G.; Shin, Chan Soo; Xiang, Shuanglin; Deng, Hong-Wen

    2016-01-01

    Protein phosphorylation regulates a wide variety of cellular processes. Thus, we hypothesize that single nucleotide polymorphisms (SNPs) that may modulate protein phosphorylation could affect osteoporosis risk. Based on a previous conventional genome-wide association (GWA) study, we conducted a three-stage meta-analysis targeting phosphorylation-related SNPs (phosSNPs) for femoral neck (FN)-, total hip (HIP)-, and Lumbar Spine (LS)-BMD phenotypes. In stage 1, 9,593 phosSNPs were meta-analyzed in 11,140 individuals of various ancestries. Genome-wide significance (GWS) and suggestive significance were defined by α = 5.21×10−6 (0.05/9,593) and 1.00×10−4, respectively. In stage 2, 9 stage 1-discovered phosSNPs (based on α = 1.00×10−4) were in silico meta-analyzed in Dutch, Korean, and Australian cohorts. In stage 3, four phosSNPs that replicated in stage 2 (based on α = 5.56×10−3, 0.05/9) were de novo genotyped in two independent cohorts. IDUA rs3755955 and rs6831280, and WNT16 rs2707466 were associated with BMD phenotypes in each respective stage, and in 3 stages combined, achieving GWS for both FN-BMD (P-value = 8.36×10−10, 5.26×10−10, and 3.01×10−10, respectively) and HIP-BMD (P-value = 3.26×10−6, 1.97×10−6, and 1.63×10−12, respectively). Although in vitro studies demonstrated no differences in expressions of wild-type and mutant forms of IDUA and WNT16B proteins, in silico analysis predicts that WNT16 rs2707466 directly abolishes a phosphorylation site, which could cause a deleterious effect on WNT16 protein, and that IDUA phosSNPs rs3755955 and rs6831280 could exert indirect effects on nearby phosphorylation sites. Further studies will be required to determine the detailed and specific molecular effects of these BMD-associated non-synonymous variants. PMID:26256109
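
    The significance thresholds quoted above are Bonferroni corrections: a family-wise alpha of 0.05 divided by the number of tests in each stage. A minimal Python sketch of that arithmetic follows; the helper name `bonferroni_alpha` is hypothetical, and the p-values are the combined-stage values given in the abstract.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return family_alpha / n_tests

# Stage 1: 9,593 phosSNPs tested; stage 3 replication threshold: 9 phosSNPs.
stage1_alpha = bonferroni_alpha(0.05, 9593)
stage3_alpha = bonferroni_alpha(0.05, 9)
print(f"stage 1 threshold: {stage1_alpha:.2e}")      # 5.21e-06
print(f"replication threshold: {stage3_alpha:.2e}")  # 5.56e-03

# Combined-stage p-values reported for rs3755955, rs6831280 and rs2707466.
fn_bmd_p = [8.36e-10, 5.26e-10, 3.01e-10]
hip_bmd_p = [3.26e-6, 1.97e-6, 1.63e-12]

# Every reported p-value falls below the stage 1 threshold.
all_significant = all(p < stage1_alpha for p in fn_bmd_p + hip_bmd_p)
print(all_significant)  # True
```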

  1. A qualitative study on Singaporean women's views towards breast cancer screening and Single Nucleotide Polymorphisms (SNPs) gene testing to guide personalised screening strategies.

    PubMed

    Wong, Xin Yi; Chong, Kok Joon; van Til, Janine A; Wee, Hwee Lin

    2017-11-21

    Breast cancer is the top cancer by incidence and mortality in Singaporean women. Mammography is by far its best screening tool, but the currently recommended age and interval may not yield the most benefit. Recent studies have demonstrated the potential of single nucleotide polymorphisms (SNPs) to improve the discriminatory accuracy of breast cancer risk assessment models. This study was conducted to understand Singaporean women's views towards breast cancer screening and SNPs gene testing to guide personalised screening strategies. Focus group discussions were conducted among English-speaking women (n = 27) between 40 and 65 years old, both current and lapsed mammogram users. Women were divided into four groups based on age and mammogram usage. Discussions about breast cancer and screening experience, as well as perception and attitude towards SNPs gene testing, were conducted by an experienced moderator. Women were also asked about factors that would influence their uptake of the test. Transcripts were analysed using thematic analysis to capture similarities and differences in the views expressed. Barriers to repeat mammogram attendance include laziness about making an appointment and a painful and uncomfortable screening process. However, the underlying reason may be low perceived susceptibility to breast cancer. Facilitators to repeat mammogram attendance include ease of making an appointment and timely reminders. Women were generally receptive towards SNPs gene testing, but required information on accuracy, cost, invasiveness, and side effects before deciding whether to go for it. Other factors include waiting time for results and frequency interval. On average, women gave a rating of 7.5 (range 5 to 10) when asked how likely they were to go for the test. Addressing concerns such as pain and discomfort during mammograms, providing timely reminders and debunking breast cancer myths can help to improve screening uptake. Women demonstrated a spectrum of responses towards a novel test like SNPs gene testing, but need more information to make an informed decision. Future public health education on predictive genetic testing should adequately address both benefits and risks. Findings from this study are used to inform a discrete choice experiment to empirically quantify women's preferences and willingness-to-pay for SNPs gene testing.

  2. Improvement of marker-based predictability of Apparent Amylose Content in japonica rice through GBSSI allele mining

    PubMed Central

    2014-01-01

    Background: Apparent Amylose Content (AAC), regulated by the Waxy gene, represents the key determinant of rice cooking properties. In occidental countries high AAC rice represents the most requested market class but the availability of molecular markers allowing specific selection of high AAC varieties is limited. Results: In this study, the effectiveness of available molecular markers in predicting AAC was evaluated in a collection of 127 rice accessions (125 japonica ssp. and 2 indica ssp.) characterized by AAC values from glutinous to 26%. The analyses highlighted the presence of several different allelic patterns identifiable by a few molecular markers, and two of them, i.e., the SNPs at intron 1 and exon 6, were able to explain a maximum of 79.5% of AAC variation. However, the available molecular markers haplotypes did not provide tools for predicting accessions with AAC higher than 24.5%. To identify additional polymorphisms, the re-sequencing of the Waxy gene and 1 kbp of the putative upstream regulatory region was performed in 21 genotypes representing all the AAC classes identified. Several previously un-characterized SNPs were identified and four of them were used to develop dCAPS markers. Conclusions: The addition of the SNPs newly identified slightly increased the AAC explained variation and allowed the identification of a haplotype almost unequivocally associated to AAC higher than 24.5%. Haplotypes at the waxy locus were also associated to grain length and length/width (L/W) ratio. In particular, the SNP at the first intron, which identifies the Wxa and Wxb alleles, was associated with differences in the width of the grain, the L/W ratio and the length of the kernel, most likely as a result of human selection. PMID:24383761

  3. Steroid Sex Hormones, Sex Hormone-Binding Globulin, and Diabetes Incidence in the Diabetes Prevention Program.

    PubMed

    Mather, K J; Kim, C; Christophi, C A; Aroda, V R; Knowler, W C; Edelstein, S E; Florez, J C; Labrie, F; Kahn, S E; Goldberg, R B; Barrett-Connor, E

    2015-10-01

    Steroid sex hormones and SHBG may modify metabolism and diabetes risk, with implications for sex-specific diabetes risk and effects of prevention interventions. This study aimed to evaluate the relationships of steroid sex hormones, SHBG and SHBG single-nucleotide polymorphisms (SNPs) with diabetes risk factors and with progression to diabetes in the Diabetes Prevention Program (DPP). This was a secondary analysis of a multicenter randomized clinical trial involving 27 U.S. academic institutions. The study included 2898 DPP participants: 969 men, 948 premenopausal women not taking exogenous sex hormones, 550 postmenopausal women not taking exogenous sex hormones, and 431 postmenopausal women taking exogenous sex hormones. Participants were randomized to receive intensive lifestyle intervention, metformin, or placebo. The outcomes assessed were associations of steroid sex hormones, SHBG, and SHBG SNPs with glycemia and diabetes risk factors, and with incident diabetes over a median of 3.0 years (maximum, 5.0 y). T and DHT were inversely associated with fasting glucose in men, and estrone sulfate was directly associated with 2-hour post-challenge glucose in men and premenopausal women. SHBG was associated with fasting glucose in premenopausal women not taking exogenous sex hormones, and in postmenopausal women taking exogenous sex hormones, but not in the other groups. Diabetes incidence was directly associated with estrone and estradiol and inversely with T in men; the association with T was lost after adjustment for waist circumference. Sex steroids were not associated with diabetes outcomes in women. SHBG and SHBG SNPs did not predict incident diabetes in the DPP population. Estrogens and T predicted diabetes risk in men but not in women. SHBG and its polymorphisms did not predict risk in men or women. Diabetes risk is more potently determined by obesity and glycemia than by sex hormones.

  4. Deep Sequencing-Based Analysis of the Cymbidium ensifolium Floral Transcriptome

    PubMed Central

    Li, Xiaobai; Luo, Jie; Yan, Tianlian; Xiang, Lin; Jin, Feng; Qin, Dehui; Sun, Chongbo; Xie, Ming

    2013-01-01

    Cymbidium ensifolium is a Chinese Cymbidium with an elegant shape, beautiful appearance, and a fragrant aroma. C. ensifolium has a long history of cultivation in China and it has excellent commercial value as a potted plant and cut flower. The development of C. ensifolium genomic resources has been delayed because of its large genome size. Taking advantage of technical and cost improvement of RNA-Seq, we extracted total mRNA from flower buds and mature flowers and obtained a total of 9.52 Gb of filtered nucleotides comprising 98,819,349 filtered reads. The filtered reads were assembled into 101,423 isotigs, representing 51,696 genes. Of the 101,423 isotigs, 41,873 were putative homologs of annotated sequences in the public databases, of which 158 were associated with floral development and 119 were associated with flowering. The isotigs were categorized according to their putative functions. In total, 10,212 of the isotigs were assigned into 25 eukaryotic orthologous groups (KOGs), 41,690 into 58 gene ontology (GO) terms, and 9,830 into 126 Arabidopsis Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, and 9,539 isotigs into 123 rice pathways. Comparison of the isotigs with those of the two related orchid species P. equestris and C. sinense showed that 17,906 isotigs are unique to C. ensifolium. In addition, a total of 7,936 SSRs and 16,676 putative SNPs were identified. To our knowledge, this transcriptome database is the first major genomic resource for C. ensifolium and the most comprehensive transcriptomic resource for genus Cymbidium. These sequences provide valuable information for understanding the molecular mechanisms of floral development and flowering. Sequences predicted to be unique to C. ensifolium would provide more insights into C. ensifolium gene diversity. The numerous SNPs and SSRs identified in the present study will contribute to marker development for C. ensifolium. PMID:24392013

  5. Experimental evaluation of models for predicting Cherenkov light intensities from short-cooled nuclear fuel assemblies

    NASA Astrophysics Data System (ADS)

    Branger, E.; Grape, S.; Jansson, P.; Jacobsson Svärd, S.

    2018-02-01

    The Digital Cherenkov Viewing Device (DCVD) is a tool used by nuclear safeguards inspectors to verify irradiated nuclear fuel assemblies in wet storage based on the recording of Cherenkov light produced by the assemblies. One type of verification involves comparing the measured light intensity from an assembly with a predicted intensity, based on assembly declarations. Crucial for such analyses is the performance of the prediction model used, and recently new modelling methods have been introduced to allow for enhanced prediction capabilities by taking the irradiation history into account, and by including the cross-talk radiation from neighbouring assemblies in the predictions. In this work, the performance of three models for Cherenkov-light intensity prediction is evaluated by applying them to a set of short-cooled PWR 17x17 assemblies for which experimental DCVD measurements and operator-declared irradiation data were available: (1) a two-parameter model, based on total burnup and cooling time, previously used by the safeguards inspectors, (2) a newly introduced gamma-spectrum-based model, which incorporates cycle-wise burnup histories, and (3) the latter gamma-spectrum-based model extended to account for contributions from neighbouring assemblies. The results show that the two gamma-spectrum-based models provide significantly higher precision for the measured inventory compared to the two-parameter model, lowering the standard deviation between relative measured and predicted intensities from 15.2 % to 8.1 % and 7.8 %, respectively. The results show some systematic differences between assemblies of different designs (produced by different manufacturers) in spite of their similar PWR 17x17 geometries, and possible ways are discussed to address such differences, which may allow for even higher prediction capabilities. Still, it is concluded that the gamma-spectrum-based models enable confident verification of the fuel assembly inventory at the currently used detection limit for partial defects, being a 30 % discrepancy between measured and predicted intensities, while some false detection occurs with the two-parameter model. The results also indicate that the gamma-spectrum-based prediction methods are accurate enough that the 30 % discrepancy limit could potentially be lowered.
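
    The verification step described above compares a measured intensity with a predicted one and flags an assembly when the two differ by more than the 30 % detection limit. The Python sketch below shows one way such a check might look. It assumes the discrepancy is taken relative to the predicted intensity, which the abstract does not specify; the function names and intensity values are made up for illustration and are not from the study.

```python
DISCREPANCY_LIMIT = 0.30  # the 30 % partial-defect detection limit

def relative_discrepancy(measured, predicted):
    """Signed deviation of measured from predicted, relative to predicted (assumed definition)."""
    return (measured - predicted) / predicted

def flag_partial_defects(measured, predicted, limit=DISCREPANCY_LIMIT):
    """Return one boolean per assembly: True if it exceeds the discrepancy limit."""
    return [
        abs(relative_discrepancy(m, p)) > limit
        for m, p in zip(measured, predicted)
    ]

# Hypothetical intensities in arbitrary units, not data from the study.
measured = [100.0, 92.0, 65.0]
predicted = [100.0, 100.0, 100.0]

flags = flag_partial_defects(measured, predicted)
print(flags)  # [False, False, True]
```

    With a model whose relative deviations scatter by about 8 %, a 30 % limit sits well outside the normal spread, which is consistent with the abstract's remark that the limit could potentially be lowered.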

  6. Novel quantitative pigmentation phenotyping enhances genetic association, epistasis, and prediction of human eye colour.

    PubMed

    Wollstein, Andreas; Walsh, Susan; Liu, Fan; Chakravarthy, Usha; Rahu, Mati; Seland, Johan H; Soubrane, Gisèle; Tomazzoli, Laura; Topouzis, Fotis; Vingerling, Johannes R; Vioque, Jesus; Böhringer, Stefan; Fletcher, Astrid E; Kayser, Manfred

    2017-02-27

    Success of genetic association and the prediction of phenotypic traits from DNA are known to depend on the accuracy of phenotype characterization, amongst other parameters. To overcome limitations in the characterization of human iris pigmentation, we introduce a fully automated approach that specifies the areal proportions proposed to represent differing pigmentation types, such as pheomelanin, eumelanin, and non-pigmented areas within the iris. We demonstrate the utility of this approach using high-resolution digital eye imagery and genotype data from 12 selected SNPs from over 3000 European samples of seven populations that are part of the EUREYE study. In comparison to previous quantification approaches, (1) we achieved an overall improvement in eye colour phenotyping, which provides a better separation of manually defined eye colour categories. (2) Single nucleotide polymorphisms (SNPs) known to be involved in human eye colour variation showed stronger associations with our approach. (3) We found new and confirmed previously noted SNP-SNP interactions. (4) We increased SNP-based prediction accuracy of quantitative eye colour. Our findings exemplify that precise quantification using the perceived biological basis of pigmentation leads to enhanced genetic association and prediction of eye colour. We expect our approach to deliver new pigmentation genes when applied to genome-wide association testing.

  7. Computational methods using genome-wide association studies to predict radiotherapy complications and to identify correlative molecular processes

    NASA Astrophysics Data System (ADS)

    Oh, Jung Hun; Kerns, Sarah; Ostrer, Harry; Powell, Simon N.; Rosenstein, Barry; Deasy, Joseph O.

    2017-02-01

    The biological cause of clinically observed variability of normal tissue damage following radiotherapy is poorly understood. We hypothesized that machine/statistical learning methods using single nucleotide polymorphism (SNP)-based genome-wide association studies (GWAS) would identify groups of patients of differing complication risk, and furthermore could be used to identify key biological sources of variability. We developed a novel learning algorithm, called pre-conditioned random forest regression (PRFR), to construct polygenic risk models using hundreds of SNPs, thereby capturing genomic features that confer small differential risk. Predictive models were trained and validated on a cohort of 368 prostate cancer patients for two post-radiotherapy clinical endpoints: late rectal bleeding and erectile dysfunction. The proposed method results in better predictive performance compared with existing computational methods. Gene ontology enrichment analysis and protein-protein interaction network analysis are used to identify key biological processes and proteins that were plausible based on other published studies. In conclusion, we confirm that novel machine learning methods can produce large predictive models (hundreds of SNPs), yielding clinically useful risk stratification models, as well as identifying important underlying biological processes in the radiation damage and tissue repair process. The methods are generally applicable to GWAS data and are not specific to radiotherapy endpoints.

  8. Development of a forensic skin colour predictive test.

    PubMed

    Maroñas, Olalla; Phillips, Chris; Söchtig, Jens; Gomez-Tato, Antonio; Cruz, Raquel; Alvarez-Dios, José; de Cal, María Casares; Ruiz, Yarimar; Fondevila, Manuel; Carracedo, Ángel; Lareu, María V

    2014-11-01

    There is growing interest in skin colour prediction in the forensic field. However, a lack of consensus approaches for recording skin colour phenotype plus the complicating factors of epistatic effects, environmental influences such as exposure to the sun and unidentified genetic variants, present difficulties for the development of a forensic skin colour predictive test centred on the most strongly associated SNPs. Previous studies have analysed skin colour variation in single unadmixed population groups, including South Asians (Stokowski et al., 2007, Am. J. Hum. Genet, 81: 1119-32) and Europeans (Jacobs et al., 2013, Hum Genet. 132: 147-58). Nevertheless, a major challenge lies in the analysis of skin colour in admixed individuals, where co-ancestry proportions do not necessarily dictate any one person's skin colour. Our study sought to analyse genetic differences between African, European and admixed African-European subjects where direct spectrometric measurements and photographs of skin colour were made in parallel. We identified strong associations to skin colour variation in the subjects studied from a pigmentation SNP discovery panel of 59 markers and developed a forensic online classifier based on naïve Bayes analysis of the SNP profiles made. A skin colour predictive test is described using the ten most strongly associated SNPs in 8 genes linked to skin pigmentation variation. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  9. Novel quantitative pigmentation phenotyping enhances genetic association, epistasis, and prediction of human eye colour

    PubMed Central

    Wollstein, Andreas; Walsh, Susan; Liu, Fan; Chakravarthy, Usha; Rahu, Mati; Seland, Johan H.; Soubrane, Gisèle; Tomazzoli, Laura; Topouzis, Fotis; Vingerling, Johannes R.; Vioque, Jesus; Böhringer, Stefan; Fletcher, Astrid E.; Kayser, Manfred

    2017-01-01

    Success of genetic association and the prediction of phenotypic traits from DNA are known to depend on the accuracy of phenotype characterization, amongst other parameters. To overcome limitations in the characterization of human iris pigmentation, we introduce a fully automated approach that specifies the areal proportions proposed to represent differing pigmentation types, such as pheomelanin, eumelanin, and non-pigmented areas within the iris. We demonstrate the utility of this approach using high-resolution digital eye imagery and genotype data from 12 selected SNPs from over 3000 European samples of seven populations that are part of the EUREYE study. In comparison to previous quantification approaches, (1) we achieved an overall improvement in eye colour phenotyping, which provides a better separation of manually defined eye colour categories. (2) Single nucleotide polymorphisms (SNPs) known to be involved in human eye colour variation showed stronger associations with our approach. (3) We found new and confirmed previously noted SNP-SNP interactions. (4) We increased SNP-based prediction accuracy of quantitative eye colour. Our findings exemplify that precise quantification using the perceived biological basis of pigmentation leads to enhanced genetic association and prediction of eye colour. We expect our approach to deliver new pigmentation genes when applied to genome-wide association testing. PMID:28240252

  10. A SNP-based blood test for predicting breast cancer survival and determining treatment strategies | NCI Technology Transfer Center | TTC

    Cancer.gov

    The NCI seeks licensing of methods that provide significant improvements in examining additional SNPs for improved prognostics and to evaluate whether the SNP signature is associated with overall cancer incidence or effective treatment strategies.

  11. Role of pharmacogenetics on deferasirox AUC and efficacy.

    PubMed

    Cusato, Jessica; Allegra, Sarah; De Francia, Silvia; Massano, Davide; Piga, Antonio; D'Avolio, Antonio

    2016-04-01

    We evaluated deferasirox pharmacokinetics according to SNPs in genes involved in its metabolism and elimination. Moreover, we defined a plasma area under the curve cut-off value predicting therapy response. Allelic discrimination was performed by real-time PCR. Drug plasma concentrations were measured by a high performance liquid chromatography system coupled with an ultraviolet method. Pharmacokinetic parameters were significantly influenced by UGT1A1 rs887829C>T, UGT1A3 rs1983023C>T and rs3806596A>G SNPs. An area under the curve cut-off value of 360 μg/ml/h for efficacy was defined here, and a value of 250 μg/ml/h for nonresponse was reported. UGT1A3 rs3806596GG and ABCG2 rs13120400CC genotypes were factors able to predict efficacy, whereas UGT1A3 rs3806596GG was a nonresponse predictor. These data show how screening a patient's genetic profile may help clinicians to optimize iron chelation therapy with deferasirox.

  12. A genome-wide association study reveals novel genomic regions and positional candidate genes for fat deposition in broiler chickens.

    PubMed

    Moreira, Gabriel Costa Monteiro; Boschiero, Clarissa; Cesar, Aline Silva Mello; Reecy, James M; Godoy, Thaís Fernanda; Trevisoli, Priscila Anchieta; Cantão, Maurício E; Ledur, Mônica Corrêa; Ibelli, Adriana Mércia Guaratini; Peixoto, Jane de Oliveira; Moura, Ana Silvia Alves Meira Tavares; Garrick, Dorian; Coutinho, Luiz Lehmann

    2018-05-21

    Excess fat content in chickens has a negative impact on poultry production. The discovery of QTLs associated with fat deposition in the carcass allows the identification of positional candidate genes (PCGs) that might regulate fat deposition and be useful for selection against excess fat content in chicken carcasses. This study aimed to estimate genomic heritability coefficients and to identify QTLs and PCGs for abdominal fat (ABF) and skin (SKIN) traits in a broiler chicken population, originated from the White Plymouth Rock and White Cornish breeds. ABF and SKIN are moderately heritable traits in our broiler population with estimates ranging from 0.23 to 0.33. Using a high density SNP panel (355,027 informative SNPs), we detected nine unique QTLs that were associated with these fat traits. Among these, four QTLs were novel, while five have been previously reported in the literature. Thirteen PCGs were identified that might regulate fat deposition in these QTL regions: JDP2, PLCG1, HNF4A, FITM2, ADIPOR1, PTPN11, MVK, APOA1, APOA4, APOA5, ENSGALG00000000477, ENSGALG00000000483, and ENSGALG00000005043. We used sequence information from founder animals to detect 4843 SNPs in the 13 PCGs. Among those, two were classified as potentially deleterious and two as high impact SNPs. This study generated novel results that can contribute to a better understanding of fat deposition in chickens. The use of a high density SNP array increases genome coverage and improves QTL resolution beyond what would have been achieved with a low density array. The identified PCGs were involved in many biological processes that regulate lipid storage. The SNPs identified in the PCGs, especially those predicted as potentially deleterious and high impact, may affect fat deposition. Validation should be undertaken before using these SNPs for selection against carcass fat accumulation and to improve feed efficiency in broiler chicken production.

  13. A 2-Stage Genome-Wide Association Study to Identify Single Nucleotide Polymorphisms Associated With Development of Erectile Dysfunction Following Radiation Therapy for Prostate Cancer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kerns, Sarah L.; Departments of Pathology and Genetics, Albert Einstein College of Medicine, Bronx, New York; Stock, Richard

    2013-01-01

    Purpose: To identify single nucleotide polymorphisms (SNPs) associated with development of erectile dysfunction (ED) among prostate cancer patients treated with radiation therapy. Methods and Materials: A 2-stage genome-wide association study was performed. Patients were split randomly into a stage I discovery cohort (132 cases, 103 controls) and a stage II replication cohort (128 cases, 102 controls). The discovery cohort was genotyped using Affymetrix 6.0 genome-wide arrays. The 940 top ranking SNPs selected from the discovery cohort were genotyped in the replication cohort using Illumina iSelect custom SNP arrays. Results: Twelve SNPs identified in the discovery cohort and validated in the replication cohort were associated with development of ED following radiation therapy (Fisher combined P values 2.1 × 10^-5 to 6.2 × 10^-4). Notably, these 12 SNPs lie in or near genes involved in erectile function or other normal cellular functions (adhesion and signaling) rather than DNA damage repair. In a multivariable model including nongenetic risk factors, the odds ratios for these SNPs ranged from 1.6 to 5.6 in the pooled cohort. There was a striking relationship between the cumulative number of SNP risk alleles an individual possessed and ED status (Somers' D P value = 1.7 × 10^-29). A 1-allele increase in cumulative SNP score increased the odds for developing ED by a factor of 2.2 (P value = 2.1 × 10^-19). The cumulative SNP score model had a sensitivity of 84% and specificity of 75% for prediction of developing ED at the radiation therapy planning stage. Conclusions: This genome-wide association study identified a set of SNPs that are associated with development of ED following radiation therapy. These candidate genetic predictors warrant more definitive validation in an independent cohort.
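
As a rough illustration of how a per-allele odds factor and the sensitivity/specificity figures in this abstract are used, the sketch below assumes a logistic-type model in which each additional risk allele multiplies the odds by the same factor (2.2 in the abstract). The baseline probability and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def odds_multiplier(extra_alleles, per_allele_or=2.2):
    """Multiplicative change in odds for a given number of additional risk alleles."""
    return per_allele_or ** extra_alleles

def shift_probability(baseline_prob, extra_alleles, per_allele_or=2.2):
    """Convert a baseline probability to odds, scale the odds, convert back."""
    odds = baseline_prob / (1.0 - baseline_prob)
    odds *= odds_multiplier(extra_alleles, per_allele_or)
    return odds / (1.0 + odds)

def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical inputs: a 30% baseline risk and two additional risk alleles.
prob_two_more = shift_probability(0.30, 2)

# Hypothetical counts chosen only to reproduce 84% sensitivity and 75% specificity.
sens, spec = sensitivity_specificity(tp=84, fn=16, tn=75, fp=25)
```

Because the scaling acts on odds rather than on probabilities, two additional alleles multiply the odds by 2.2 × 2.2 = 4.84, while the resulting probability always stays below 1.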

  14. The role of genetic polymorphisms in cytochrome P450 and effects of tuberculosis co-treatment on the predictive value of CYP2B6 SNPs and on efavirenz plasma levels in adult HIV patients.

    PubMed

    Bienvenu, Emile; Swart, Marelize; Dandara, Collet; Ashton, Michael

    2014-02-01

    Efavirenz (EFV) exhibits interindividual pharmacokinetic variability caused by differences in cytochrome P450 (CYP) expression. Most tuberculosis (TB) drugs interact with the CYP metabolizing enzymes, while the clinical validity of genotyping in predicting EFV plasma levels in Rwandan subjects is not known. We investigated in patients co-infected with human immunodeficiency virus (HIV) and TB recruited in Rwanda the effects of 10 SNPs in five drug-metabolizing enzymes on EFV plasma levels and treatment response when patients are treated with EFV-containing therapy alone (n=28) and when combined with rifampicin-based TB treatment (n=62), and the validity of genotyping for CYP2B6 single nucleotide polymorphisms in predicting supra-therapeutic EFV levels. There was a significant difference between CYP1A2 -739T/G and T/T genotypes when patients were treated with EFV-containing therapy combined with rifampicin-based TB treatment, but not when EFV-containing therapy was alone. CYP2B6 516T/T genotype was associated with high EFV levels compared to other CYP2B6 516G>T genotypes in the presence and in the absence of rifampicin-based TB treatment. Predictive factors of EFV plasma levels in the presence of rifampicin-based TB treatment were CYP2A6 1093G>A, CYP2B6 516G>T, and CYP2B6 983T>C accounting for 27%, 43%, and 29% of the total variance in EFV levels, respectively. There was a high positive predictive value (PPV) (100%) for CYP2B6 516T/T and 983T/T genotypes in predicting EFV plasma levels above the therapeutic range, but this PPV decreased in the presence of rifampicin-based TB treatment. Rifampicin-based TB treatment was also shown to affect EFV plasma levels significantly, but did not affect the significant reduction of HIV-RNA copies. These results indicate that genotyping for CYP2B6 SNPs could be used as a tool in predicting supra-therapeutic EFV plasma levels, which could minimize adverse drug events. Copyright © 2013 Elsevier B.V. All rights reserved.
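
The positive predictive value (PPV) reported in this abstract is the fraction of patients carrying the flagged genotype whose EFV plasma level is in fact above the therapeutic range. A minimal sketch of the calculation follows; the patient counts are hypothetical and are not the study's data.

```python
def positive_predictive_value(true_positives, false_positives):
    """PPV = TP / (TP + FP), where 'positive' means carrying the flagged genotype."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no patients carry the flagged genotype; PPV is undefined")
    return true_positives / flagged

# Hypothetical counts: every flagged carrier is above the therapeutic range.
ppv_without_rifampicin = positive_predictive_value(true_positives=6, false_positives=0)

# Hypothetical counts: some flagged carriers fall within the therapeutic range.
ppv_with_rifampicin = positive_predictive_value(true_positives=5, false_positives=3)
```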

  15. A hybrid expectation maximisation and MCMC sampling algorithm to implement Bayesian mixture model based genomic prediction and QTL mapping.

    PubMed

    Wang, Tingting; Chen, Yi-Ping Phoebe; Bowman, Phil J; Goddard, Michael E; Hayes, Ben J

    2016-09-21

    Bayesian mixture models in which the effects of SNPs are assumed to come from normal distributions with different variances are attractive for simultaneous genomic prediction and QTL mapping. These models are usually implemented with Markov chain Monte Carlo (MCMC) sampling, which requires long compute times with large genomic data sets. Here, we present an efficient approach (termed HyB_BR), which is a hybrid of an Expectation-Maximisation algorithm, followed by a limited number of MCMC iterations without the requirement for burn-in. To test prediction accuracy from HyB_BR, dairy cattle and human disease trait data were used. In the dairy cattle data, there were four quantitative traits (milk volume, protein kg, fat% in milk and fertility) measured in 16,214 cattle from two breeds genotyped for 632,002 SNPs. Validation of genomic predictions was in a subset of cattle either from the reference set or in animals from a third breed that were not in the reference set. In all cases, HyB_BR gave almost identical accuracies to Bayesian mixture models implemented with full MCMC; however, computational time was reduced to as little as 1/17 of that required by full MCMC. The SNPs with high posterior probability of a non-zero effect were also very similar between full MCMC and HyB_BR, with several known genes affecting milk production in this category, as well as some novel genes. HyB_BR was also applied to seven human diseases with 4890 individuals genotyped for around 300 K SNPs in a case/control design, from the Wellcome Trust Case Control Consortium (WTCCC). In this data set, the results demonstrated again that HyB_BR performed as well as Bayesian mixture models with full MCMC for genomic predictions and genetic architecture inference while reducing the computational time from 45 h with full MCMC to 3 h with HyB_BR. The results for quantitative traits in cattle and disease in humans demonstrate that HyB_BR can perform as well as Bayesian mixture models implemented with full MCMC in terms of prediction accuracy, while running up to 17 times faster than the full MCMC implementations. The HyB_BR algorithm makes simultaneous genomic prediction, QTL mapping and inference of genetic architecture feasible in large genomic data sets.
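
The mixture prior described in this abstract, in which SNP effects come from zero-mean normal distributions with different variances, can be illustrated with a small sketch. This is not the HyB_BR algorithm: it only computes, for one effect value treated as known, the posterior probability of each mixture component. The mixing proportions and variances are hypothetical, and estimation error in the effect is ignored.

```python
import math

def normal_pdf(x, variance):
    """Density of a zero-mean normal distribution with the given variance."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def component_posteriors(effect, mix_props, variances):
    """P(component k | effect) is proportional to mix_props[k] * N(effect; 0, variances[k])."""
    weighted = [p * normal_pdf(effect, v) for p, v in zip(mix_props, variances)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical prior: most SNPs have near-zero effects, a few have large effects.
MIX_PROPS = [0.90, 0.09, 0.01]
VARIANCES = [1e-6, 1e-4, 1e-2]

small_effect = component_posteriors(0.0, MIX_PROPS, VARIANCES)
large_effect = component_posteriors(0.2, MIX_PROPS, VARIANCES)
```

With these assumed values, an effect of 0.0 is attributed almost entirely to the smallest-variance component and an effect of 0.2 to the largest-variance one, which is the behaviour that lets such models separate putative QTL from background SNPs.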

  16. IMHOTEP—a composite score integrating popular tools for predicting the functional consequences of non-synonymous sequence variants

    PubMed Central

    Knecht, Carolin; Mort, Matthew; Junge, Olaf; Cooper, David N.; Krawczak, Michael

    2017-01-01

    The in silico prediction of the functional consequences of mutations is an important goal of human pathogenetics. However, bioinformatic tools that classify mutations according to their functionality employ different algorithms so that predictions may vary markedly between tools. We therefore integrated nine popular prediction tools (PolyPhen-2, SNPs&GO, MutPred, SIFT, MutationTaster2, Mutation Assessor and FATHMM as well as conservation-based Grantham Score and PhyloP) into a single predictor. The optimal combination of these tools was selected by means of a wide range of statistical modeling techniques, drawing upon 10 029 disease-causing single nucleotide variants (SNVs) from Human Gene Mutation Database and 10 002 putatively ‘benign’ non-synonymous SNVs from UCSC. Predictive performance was found to be markedly improved by model-based integration, whilst maximum predictive capability was obtained with either random forest, decision tree or logistic regression analysis. A combination of PolyPhen-2, SNPs&GO, MutPred, MutationTaster2 and FATHMM was found to perform as well as all tools combined. Comparison of our approach with other integrative approaches such as Condel, CoVEC, CAROL, CADD, MetaSVM and MetaLR using an independent validation dataset, revealed the superiority of our newly proposed integrative approach. An online implementation of this approach, IMHOTEP (‘Integrating Molecular Heuristics and Other Tools for Effect Prediction’), is provided at http://www.uni-kiel.de/medinfo/cgi-bin/predictor/. PMID:28180317

  17. Capturing chloroplast variation for molecular ecology studies: a simple next generation sequencing approach applied to a rainforest tree

    PubMed Central

    2013-01-01

    Background: With high quantity and quality data production and low cost, next generation sequencing has the potential to provide new opportunities for plant phylogeographic studies on single and multiple species. Here we present an approach for in silico chloroplast DNA assembly and single nucleotide polymorphism detection from short-read shotgun sequencing. The approach is simple and effective and can be implemented using standard bioinformatic tools. Results: The chloroplast genome of Toona ciliata (Meliaceae), 159,514 base pairs long, was assembled from shotgun sequencing on the Illumina platform using de novo assembly of contigs. To evaluate its practicality, value and quality, we compared the short read assembly with an assembly completed using 454 data obtained after chloroplast DNA isolation. Sanger sequence verifications indicated that the Illumina dataset outperformed the longer read 454 data. Pooling of several individuals during preparation of the shotgun library enabled detection of informative chloroplast SNP markers. Following validation, we used the identified SNPs for a preliminary phylogeographic study of T. ciliata in Australia and to confirm low diversity across the distribution. Conclusions: Our approach provides a simple method for construction of whole chloroplast genomes from shotgun sequencing of whole genomic DNA using short-read data and no available closely related reference genome (e.g. from the same species or genus). The high coverage of Illumina sequence data also renders this method appropriate for multiplexing and SNP discovery and therefore a useful approach for landscape level studies of evolutionary ecology. PMID:23497206

  18. Analysing the Effect of Mutation on Protein Function and Discovering Potential Inhibitors of CDK4: Molecular Modelling and Dynamics Studies

    PubMed Central

    N, Nagasundaram; Zhu, Hailong; Liu, Jiming; V, Karthick; C, George Priya Doss; Chakraborty, Chiranjib; Chen, Luonan

    2015-01-01

    The cyclin-dependent kinase 4 (CDK4)-cyclin D1 complex plays a crucial role in the transition from the G1 phase to S phase of the cell cycle. Among the CDKs, CDK4 is one of the genes most frequently affected by somatic genetic variations that are associated with various forms of cancer. Thus, because the abnormal function of the CDK4-cyclin D1 protein complex might play a vital role in causing cancer, CDK4 can be considered a genetically validated therapeutic target. In this study, we used a systematic, integrated computational approach to identify deleterious nsSNPs and predict their effects on protein-protein (CDK4-cyclin D1) and protein-ligand (CDK4-flavopiridol) interactions. This analysis resulted in the identification of possible inhibitors of mutant CDK4 proteins that bind the conformations induced by deleterious nsSNPs. Using computational prediction methods, we identified five nsSNPs as highly deleterious: R24C, Y180H, A205T, R210P, and R246C. From molecular docking and molecular dynamic studies, we observed that these deleterious nsSNPs affected CDK4-cyclin D1 and CDK4-flavopiridol interactions. Furthermore, in a virtual screening approach, the drug 5_7_DIHYDROXY_ 2_ (3_4_5_TRI HYDROXYPHENYL) _4H_CHROMEN_ 4_ONE displayed good binding affinity for proteins with the mutations R24C or R246C, the drug diosmin displayed good binding affinity for the protein with the mutation Y180H, and the drug rutin displayed good binding affinity for proteins with the mutations A205T and R210P. Overall, this computational investigation of the CDK4 gene highlights the link between genetic variation and biological phenomena in human cancer and aids in the discovery of molecularly targeted therapies for personalized treatment. PMID:26252490

  19. A genetic risk score based on direct associations with coronary heart disease improves coronary heart disease risk prediction in the Atherosclerosis Risk in Communities (ARIC), but not in the Rotterdam and Framingham Offspring, Studies.

    PubMed

    Brautbar, Ariel; Pompeii, Lisa A; Dehghan, Abbas; Ngwa, Julius S; Nambi, Vijay; Virani, Salim S; Rivadeneira, Fernando; Uitterlinden, André G; Hofman, Albert; Witteman, Jacqueline C M; Pencina, Michael J; Folsom, Aaron R; Cupples, L Adrienne; Ballantyne, Christie M; Boerwinkle, Eric

    2012-08-01

    Multiple studies have identified single-nucleotide polymorphisms (SNPs) that are associated with coronary heart disease (CHD). We examined whether SNPs selected based on predefined criteria will improve CHD risk prediction when added to traditional risk factors (TRFs). SNPs were selected from the literature based on association with CHD, lack of association with a known CHD risk factor, and successful replication. A genetic risk score (GRS) was constructed based on these SNPs. Cox proportional hazards model was used to calculate CHD risk based on the Atherosclerosis Risk in Communities (ARIC) and Framingham CHD risk scores with and without the GRS. The GRS was associated with risk for CHD (hazard ratio [HR] = 1.10; 95% confidence interval [CI]: 1.07-1.13). Addition of the GRS to the ARIC risk score significantly improved discrimination, reclassification, and calibration beyond that afforded by TRFs alone in non-Hispanic whites in the ARIC study. The area under the receiver operating characteristic curve (AUC) increased from 0.742 to 0.749 (Δ = 0.007; 95% CI, 0.004-0.013), and the net reclassification index (NRI) was 6.3%. Although the risk estimates for CHD in the Framingham Offspring (HR = 1.12; 95% CI: 1.10-1.14) and Rotterdam (HR = 1.08; 95% CI: 1.02-1.14) Studies were significantly improved by adding the GRS to TRFs, improvements in AUC and NRI were modest. Addition of a GRS based on direct associations with CHD to TRFs significantly improved discrimination and reclassification in white participants of the ARIC Study, with no significant improvement in the Rotterdam and Framingham Offspring Studies. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
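
A genetic risk score of the kind described in this abstract is commonly computed as a (possibly weighted) sum of risk-allele counts, and under a Cox proportional hazards model a hazard ratio per score unit scales multiplicatively. The sketch below assumes that the reported hazard ratio of 1.10 applies per one-unit increase in the score, which the abstract does not state explicitly. The allele counts and function names are hypothetical.

```python
def genetic_risk_score(risk_allele_counts, weights=None):
    """Weighted sum of risk-allele counts (0, 1 or 2 per SNP); unweighted by default."""
    if weights is None:
        weights = [1.0] * len(risk_allele_counts)
    if len(weights) != len(risk_allele_counts):
        raise ValueError("one weight is required per SNP")
    return sum(w * g for w, g in zip(weights, risk_allele_counts))

def relative_hazard(score, reference_score, hr_per_unit=1.10):
    """Hazard relative to a reference score, assuming proportional hazards."""
    return hr_per_unit ** (score - reference_score)

# Hypothetical individuals genotyped at the same five SNPs.
person_a = genetic_risk_score([0, 1, 2, 1, 0])   # 4 risk alleles
person_b = genetic_risk_score([1, 1, 2, 1, 1])   # 6 risk alleles
hazard_b_vs_a = relative_hazard(person_b, person_a)
```

A small per-unit hazard ratio compounds slowly, which is consistent with the modest changes in AUC and NRI reported in the abstract.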

  20. Increased fire frequency promotes stronger spatial genetic structure and natural selection at regional and local scales in Pinus halepensis Mill

    PubMed Central

    González-Martínez, Santiago C.; Navascués, Miguel; Burgarella, Concetta; Mosca, Elena; Lorenzo, Zaida; Zabal-Aguirre, Mario; Vendramin, Giovanni G.; Verdú, Miguel; Pausas, Juli G.

    2017-01-01

    Background and Aims: The recurrence of wildfires is predicted to increase due to global climate change, resulting in severe impacts on biodiversity and ecosystem functioning. Recurrent fires can drive plant adaptation and reduce genetic diversity; however, the underlying population genetic processes have not been studied in detail. In this study, the neutral and adaptive evolutionary effects of contrasting fire regimes were examined in the keystone tree species Pinus halepensis Mill. (Aleppo pine), a fire-adapted conifer. The genetic diversity, demographic history and spatial genetic structure were assessed at local (within-population) and regional scales for populations exposed to different crown fire frequencies. Methods: Eight natural P. halepensis stands were sampled in the east of the Iberian Peninsula, five of them in a region exposed to frequent crown fires (HiFi) and three of them in an adjacent region with a low frequency of crown fires (LoFi). Samples were genotyped at nine neutral simple sequence repeats (SSRs) and at 251 single nucleotide polymorphisms (SNPs) from coding regions, some of them potentially important for fire adaptation. Key Results: Fire regime had no effects on genetic diversity or demographic history. Three high-differentiation outlier SNPs were identified between HiFi and LoFi stands, suggesting fire-related selection at the regional scale. At the local scale, fine-scale spatial genetic structure (SGS) was overall weak as expected for a wind-pollinated and wind-dispersed tree species. HiFi stands displayed a stronger SGS than LoFi stands at SNPs, which probably reflected the simultaneous post-fire recruitment of co-dispersed related seeds. SNPs with exceptionally strong SGS, a proxy for microenvironmental selection, were only reliably identified under the HiFi regime. Conclusions: An increasing fire frequency as predicted due to global change can promote increased SGS with stronger family structures and alter natural selection in P. halepensis and in plants with similar life history traits. PMID:28159988

  1. In silico analysis of miRNA-mediated gene regulation in OCA and OA genes.

    PubMed

    Kamaraj, Balu; Gopalakrishnan, Chandrasekhar; Purohit, Rituraj

    2014-12-01

    Albinism is an autosomal recessive genetic disorder due to low secretion of melanin. The oculocutaneous albinism (OCA) and ocular albinism (OA) genes are responsible for melanin production and also act as potential targets for miRNAs. The role of miRNA is to inhibit protein synthesis partially or completely by binding to the 3'UTR of the mRNA, thus regulating gene expression. In this analysis, we predicted the genetic variation in the 3'UTR of the transcript that can be a reason for low melanin production, thus causing albinism. Single nucleotide polymorphisms (SNPs) in the 3'UTR can create new binding sites for miRNA, which binds to the mRNA and inhibits the translation process either partially or completely. The SNPs in the mRNA of OCA and OA genes can create new binding sites for miRNA, which may control gene expression and lead to hypopigmentation. We have developed a computational procedure to determine the SNPs in the 3'UTR region of mRNA of OCA (TYR, OCA2, TYRP1 and SLC45A2) and OA (GPR143) genes that may be a potential cause of albinism. We identified 37 SNPs in five genes that are predicted to create 87 new binding sites on mRNA, which may lead to abrogation of the translation process. Expression analysis confirms that these genes are highly expressed in skin and eye regions. It is well supported by enrichment analysis that these genes are mainly involved in eye pigmentation and the melanin biosynthesis process. The network analysis also shows how the genes are interacting and expressing in a complex network. This insight provides clues for wet-lab research to understand the expression pattern of OCA and OA genes and the binding phenomenon of mRNA and miRNA upon mutation, which is responsible for inhibition of the translation process at genomic levels.

  2. Structural insight of dopamine β-hydroxylase, a drug target for complex traits, and functional significance of exonic single nucleotide polymorphisms.

    PubMed

    Kapoor, Abhijeet; Shandilya, Manish; Kundu, Suman

    2011-01-01

    Human dopamine β-hydroxylase (DBH) is an important therapeutic target for complex traits. Several single nucleotide polymorphisms (SNPs) have also been identified in DBH with potential adverse physiological effect. However, difficulty in obtaining diffractable crystals and lack of a suitable template for modeling the protein has ensured that neither crystallographic three-dimensional structure nor computational model for the enzyme is available to aid rational drug design, prediction of functional significance of SNPs or analytical protein engineering. Adequate biochemical information regarding human DBH, structural coordinates for peptidylglycine alpha-hydroxylating monooxygenase and computational data from a partial model of rat DBH were used along with logical manual intervention in a novel way to build an in silico model of human DBH. The model provides structural insight into the active site, metal coordination, subunit interface, substrate recognition and inhibitor binding. It reveals that DOMON domain potentially promotes tetramerization, while substrate dopamine and a potential therapeutic inhibitor nepicastat are stabilized in the active site through multiple hydrogen bonding. Functional significance of several exonic SNPs could be described from a structural analysis of the model. The model confirms that SNP resulting in Ala318Ser or Leu317Pro mutation may not influence enzyme activity, while Gly482Arg might actually do so being in the proximity of the active site. Arg549Cys may cause abnormal oligomerization through non-native disulfide bond formation. Other SNPs like Glu181, Glu250, Lys239 and Asp290 could potentially inhibit tetramerization thus affecting function. The first three-dimensional model of full-length human DBH protein was obtained in a novel manner with a set of experimental data as guideline for consistency of in silico prediction. Preliminary physicochemical tests validated the model. The model confirms, rationalizes and provides structural basis for several biochemical data and claims testable hypotheses regarding function. It provides a reasonable template for drug design as well.

  3. The CRHR1 gene, trauma exposure, and alcoholism risk: a test of G × E effects.

    PubMed

    Ray, L A; Sehl, M; Bujarski, S; Hutchison, K; Blaine, S; Enoch, M-A

    2013-06-01

    The corticotropin-releasing hormone type I receptor (CRHR1) gene has been implicated in the liability for neuropsychiatric disorders, particularly under conditions of stress. On the basis of the hypothesized effects of CRHR1 variation on stress reactivity, measures of adulthood traumatic stress exposure were analyzed for their interaction with CRHR1 haplotypes and single-nucleotide polymorphisms (SNPs) in predicting the risk for alcoholism. Phenotypic data on 2533 non-related Caucasian individuals (1167 alcoholics and 1366 controls) were culled from the publicly available Study of Addiction: Genetics and Environment genome-wide association study. Genotypes were available for 19 tag SNPs. Logistic regression models examined the interaction between CRHR1 haplotypes/SNPs and adulthood traumatic stress exposure in predicting alcoholism risk. Two haplotype blocks spanned CRHR1. Haplotype analyses identified one haplotype in the proximal block 1 (P = 0.029) and two haplotypes in the distal block 2 (P = 0.026, 0.042) that showed nominally significant (corrected P < 0.025) genotype × traumatic stress interactive effects on the likelihood of developing alcoholism. The block 1 haplotype effect was driven by SNPs rs110402 (P = 0.019) and rs242924 (P = 0.019). In block 2, rs17689966 (P = 0.018) showed significant and rs173365 (P = 0.026) showed nominally significant, gene × environment (G × E) effects on alcoholism status. This study extends the literature on the interplay between CRHR1 variation and alcoholism, in the context of exposure to traumatic stress. These findings are consistent with the hypothesized role of the extra hypothalamic corticotropin-releasing factor system dysregulation in the initiation and maintenance of alcoholism. Molecular and experimental studies are needed to more fully understand the mechanisms of risk and protection conferred by genetic variation at the identified loci. © 2013 John Wiley & Sons Ltd and International Behavioural and Neural Genetics Society.

  4. De novo assembly and transcriptome analysis of the rubber tree (Hevea brasiliensis) and SNP markers development for rubber biosynthesis pathways.

    PubMed

    Mantello, Camila Campos; Cardoso-Silva, Claudio Benicio; da Silva, Carla Cristina; de Souza, Livia Moura; Scaloppi Junior, Erivaldo José; de Souza Gonçalves, Paulo; Vicentini, Renato; de Souza, Anete Pereira

    2014-01-01

    Hevea brasiliensis (Willd. Ex Adr. Juss.) Muell.-Arg. is the primary source of natural rubber that is native to the Amazon rainforest. The singular properties of natural rubber make it superior to and competitive with synthetic rubber for use in several applications. Here, we performed RNA sequencing (RNA-seq) of H. brasiliensis bark on the Illumina GAIIx platform, which generated 179,326,804 raw reads. A total of 50,384 contigs that were over 400 bp in size were obtained and subjected to further analyses. A similarity search against the non-redundant (nr) protein database returned 32,018 (63%) positive BLASTx hits. The transcriptome was annotated using the clusters of orthologous groups (COG), gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Pfam databases. A search for putative molecular markers was performed to identify simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs). In total, 17,927 SSRs and 404,114 SNPs were detected. Finally, we selected sequences that were identified as belonging to the mevalonate (MVA) and 2-C-methyl-D-erythritol 4-phosphate (MEP) pathways, which are involved in rubber biosynthesis, to validate the SNP markers. A total of 78 SNPs were validated in 36 genotypes of H. brasiliensis. This new dataset represents a powerful information source for rubber tree bark genes and will be an important tool for the development of microsatellites and SNP markers for use in future genetic analyses such as genetic linkage mapping, quantitative trait loci identification, investigations of linkage disequilibrium and marker-assisted selection.

  5. Comparative analysis of genes encoding key steroid core oxidation enzymes in fast-growing Mycobacterium spp. strains.

    PubMed

    Bragin, E Yu; Shtratnikova, V Yu; Dovbnya, D V; Schelkunov, M I; Pekov, Yu A; Malakho, S G; Egorova, O V; Ivashina, T V; Sokolov, S L; Ashapkin, V V; Donova, M V

    2013-11-01

    A comparative genome analysis of Mycobacterium spp. VKM Ac-1815D, 1816D and 1817D strains used for efficient production of key steroid intermediates (androst-4-ene-3,17-dione, AD, androsta-1,4-diene-3,17-dione, ADD, 9α-hydroxy androst-4-ene-3,17-dione, 9-OH-AD) from phytosterol has been carried out by deep sequencing. The assembled contig sequences were analyzed for the presence of putative genes of steroid catabolism pathways. Since 3-ketosteroid-9α-hydroxylases (KSH) and 3-ketosteroid-Δ(1)-dehydrogenase (Δ(1) KSTD) play a key role in steroid core oxidation, special attention was paid to the genes encoding these enzymes. At least three genes of Δ(1) KSTD (kstD), five genes of KSH subunit A (kshA), and one gene of KSH subunit B of 3-ketosteroid-9α-hydroxylases (kshB) have been found in Mycobacterium sp. VKM Ac-1817D. Strains of Mycobacterium spp. VKM Ac-1815D and 1816D were found to possess at least one kstD, one kshB and two kshA genes. The assembled genome sequence of Mycobacterium sp. VKM Ac-1817D differs from those of 1815D and 1816D strains, whereas these last two are nearly identical, differing by 13 single nucleotide substitutions (SNPs). One of these SNPs is located in the coding region of a kstD gene and corresponds to an amino acid substitution Lys (135) in 1816D for Ser (135) in 1815D. The findings may be useful for targeted genetic engineering of the biocatalysts for biotechnological application. Copyright © 2013. Published by Elsevier Ltd.

  6. Genome resources for climate-resilient cowpea, an essential crop for food security.

    PubMed

    Muñoz-Amatriaín, María; Mirebrahim, Hamid; Xu, Pei; Wanamaker, Steve I; Luo, MingCheng; Alhakami, Hind; Alpert, Matthew; Atokple, Ibrahim; Batieno, Benoit J; Boukar, Ousmane; Bozdag, Serdar; Cisse, Ndiaga; Drabo, Issa; Ehlers, Jeffrey D; Farmer, Andrew; Fatokun, Christian; Gu, Yong Q; Guo, Yi-Ning; Huynh, Bao-Lam; Jackson, Scott A; Kusi, Francis; Lawley, Cynthia T; Lucas, Mitchell R; Ma, Yaqin; Timko, Michael P; Wu, Jiajie; You, Frank; Barkley, Noelle A; Roberts, Philip A; Lonardi, Stefano; Close, Timothy J

    2017-03-01

    Cowpea (Vigna unguiculata L. Walp.) is a legume crop that is resilient to hot and drought-prone climates, and a primary source of protein in sub-Saharan Africa and other parts of the developing world. However, genome resources for cowpea have lagged behind most other major crops. Here we describe foundational genome resources and their application to the analysis of germplasm currently in use in West African breeding programs. Resources developed from the African cultivar IT97K-499-35 include a whole-genome shotgun (WGS) assembly, a bacterial artificial chromosome (BAC) physical map, and assembled sequences from 4355 BACs. These resources and WGS sequences of an additional 36 diverse cowpea accessions supported the development of a genotyping assay for 51 128 SNPs, which was then applied to five bi-parental RIL populations to produce a consensus genetic map containing 37 372 SNPs. This genetic map enabled the anchoring of 100 Mb of WGS and 420 Mb of BAC sequences, an exploration of genetic diversity along each linkage group, and clarification of macrosynteny between cowpea and common bean. The SNP assay enabled a diversity analysis of materials from West African breeding programs. Two major subpopulations exist within those materials, one of which has significant parentage from South and East Africa and more diversity. There are genomic regions of high differentiation between subpopulations, one of which coincides with a cluster of nodulin genes. The new resources and knowledge help to define goals and accelerate the breeding of improved varieties to address food security issues related to limited-input small-holder farming and climate stress. © 2016 The Authors. The Plant Journal published by John Wiley & Sons Ltd and Society for Experimental Biology.

  7. Large-scale transcriptome characterization and mass discovery of SNPs in globe artichoke and its related taxa.

    PubMed

    Scaglione, Davide; Lanteri, Sergio; Acquadro, Alberto; Lai, Zhao; Knapp, Steven J; Rieseberg, Loren; Portis, Ezio

    2012-10-01

    Cynara cardunculus (2n = 2× = 34) is a member of the Asteraceae family that contributes significantly to the agricultural economy of the Mediterranean basin. The species includes two cultivated varieties, globe artichoke and cardoon, which are grown mainly for food. Cynara cardunculus is an orphan crop species whose genome/transcriptome has been relatively unexplored, especially in comparison to other Asteraceae crops. Hence, there is a significant need to improve its genomic resources through the identification of novel genes and sequence-based markers, to design new breeding schemes aimed at increasing quality and crop productivity. We report the outcome of cDNA sequencing and assembly for eleven accessions of C. cardunculus. Sequencing of three mapping parental genotypes using Roche 454-Titanium technology generated 1.7 × 10⁶ reads, which were assembled into 38,726 reference transcripts covering 32 Mbp. Putative enzyme-encoding genes were annotated using the KEGG-database. Transcription factors and candidate resistance genes were surveyed as well. Paired-end sequencing was done for cDNA libraries of eight other representative C. cardunculus accessions on an Illumina Genome Analyzer IIx, generating 46 × 10⁶ reads. Alignment of the IGA and 454 reads to reference transcripts led to the identification of 195,400 SNPs with a Bayesian probability exceeding 95%; a validation rate of 90% was obtained by Sanger-sequencing of a subset of contigs. These results demonstrate that the integration of data from different NGS platforms enables large-scale transcriptome characterization, along with massive SNP discovery. This information will contribute to the dissection of key agricultural traits in C. cardunculus and facilitate the implementation of marker-assisted selection programs. © 2012 The Authors. Plant Biotechnology Journal © 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd.

  8. A Transcriptomic Analysis of Cave, Surface, and Hybrid Isopod Crustaceans of the Species Asellus aquaticus

    PubMed Central

    Stahl, Bethany A.; Gross, Joshua B.; Speiser, Daniel I.; Oakley, Todd H.; Patel, Nipam H.; Gould, Douglas B.; Protas, Meredith E.

    2015-01-01

    Cave animals, compared to surface-dwelling relatives, tend to have reduced eyes and pigment, longer appendages, and enhanced mechanosensory structures. Pressing questions include how certain cave-related traits are gained and lost, and if they originate through the same or different genetic programs in independent lineages. An excellent system for exploring these questions is the isopod, Asellus aquaticus. This species includes multiple cave and surface populations that have numerous morphological differences between them. A key feature is that hybrids between cave and surface individuals are viable, which enables genetic crosses and linkage analyses. Here, we advance this system by analyzing single animal transcriptomes of Asellus aquaticus. We use high throughput sequencing of non-normalized cDNA derived from the head of a surface-dwelling male, the head of a cave-dwelling male, the head of a hybrid male (produced by crossing a surface individual with a cave individual), and a pooled sample of surface embryos and hatchlings. Assembling reads from surface and cave head RNA pools yielded an integrated transcriptome comprised of 23,984 contigs. Using this integrated assembly as a reference transcriptome, we aligned reads from surface, cave and hybrid head tissue and pooled surface embryos and hatchlings. Our approach identified 742 SNPs and placed four new candidate genes on an existing linkage map for A. aquaticus. In addition, we examined SNPs for allele-specific expression differences in the hybrid individual. All of these resources will facilitate identification of genes and associated changes responsible for cave adaptation in A. aquaticus and, in concert with analyses of other species, will inform our understanding of the evolutionary processes accompanying adaptation to the subterranean environment. PMID:26462237

  9. Lattice-free prediction of three-dimensional structure of programmed DNA assemblies

    PubMed Central

    Pan, Keyao; Kim, Do-Nyun; Zhang, Fei; Adendorff, Matthew R.; Yan, Hao; Bathe, Mark

    2014-01-01

    DNA can be programmed to self-assemble into high molecular weight 3D assemblies with precise nanometer-scale structural features. Although numerous sequence design strategies exist to realize these assemblies in solution, there is currently no computational framework to predict their 3D structures on the basis of programmed underlying multi-way junction topologies constrained by DNA duplexes. Here, we introduce such an approach and apply it to assemblies designed using the canonical immobile four-way junction. The procedure is used to predict the 3D structure of high molecular weight planar and spherical ring-like origami objects, a tile-based sheet-like ribbon, and a 3D crystalline tensegrity motif, in quantitative agreement with experiments. Our framework provides a new approach to predict programmed nucleic acid 3D structure on the basis of prescribed secondary structure motifs, with possible application to the design of such assemblies for use in biomolecular and materials science. PMID:25470497

  10. Population and performance analyses of four major populations with Illumina's FGx Forensic Genomics System.

    PubMed

    Churchill, Jennifer D; Novroski, Nicole M M; King, Jonathan L; Seah, Lay Hong; Budowle, Bruce

    2017-09-01

    The MiSeq FGx Forensic Genomics System (Illumina) enables amplification and massively parallel sequencing of 59 STRs, 94 identity informative SNPs, 54 ancestry informative SNPs, and 24 phenotypic informative SNPs. Allele frequency and population statistics data were generated for the 172 SNP loci included in this panel on four major population groups (Chinese, African Americans, US Caucasians, and Southwest Hispanics). Single-locus and combined random match probability values were generated for the identity informative SNPs. The average combined STR and identity informative SNP random match probabilities (assuming independence) across all four populations were 1.75E-67 and 2.30E-71 with length-based and sequence-based STR alleles, respectively. Ancestry and phenotype predictions were obtained using the ForenSeq™ Universal Analysis System (UAS; Illumina) based on the ancestry informative and phenotype informative SNP profiles generated for each sample. Additionally, performance metrics, including profile completeness, read depth, relative locus performance, and allele coverage ratios, were evaluated and detailed for the 725 samples included in this study. While some genetic markers included in this panel performed notably better than others, performance across populations was generally consistent. The performance and population data included in this study support that accurate and reliable profiles were generated and provide valuable background information for laboratories considering internal validation studies and implementation. Copyright © 2017 Elsevier B.V. All rights reserved.
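
    The abstract reports combined random match probabilities "assuming independence" but does not describe the calculation. A minimal sketch of the usual product rule follows, assuming each locus contributes the sum of its squared genotype frequencies; the function names and the example frequencies are hypothetical and are not taken from the study.

```python
import math


def locus_match_probability(genotype_freqs):
    """Random match probability at one locus: sum of squared genotype frequencies."""
    return sum(f * f for f in genotype_freqs)


def combined_log10_rmp(per_locus_rmp):
    """log10 of the combined RMP, assuming independent loci (product rule).

    Working in log10 avoids floating-point underflow when many loci are combined.
    """
    if not per_locus_rmp:
        raise ValueError("need at least one locus")
    return sum(math.log10(p) for p in per_locus_rmp)


# Hypothetical biallelic SNP with genotype frequencies 0.25 / 0.50 / 0.25.
snp_rmp = locus_match_probability([0.25, 0.50, 0.25])
# Combine three such (hypothetical) loci.
total_log10 = combined_log10_rmp([snp_rmp] * 3)
```

    On this scale, the reported value of 1.75E-67 corresponds to a log10 of roughly -66.8.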

  11. Psychological impact of providing women with personalised 10-year breast cancer risk estimates.

    PubMed

    French, David P; Southworth, Jake; Howell, Anthony; Harvie, Michelle; Stavrinos, Paula; Watterson, Donna; Sampson, Sarah; Evans, D Gareth; Donnelly, Louise S

    2018-05-08

    The Predicting Risk of Cancer at Screening (PROCAS) study estimated 10-year breast cancer risk for 53,596 women attending the NHS Breast Screening Programme. The present study, nested within the PROCAS study, aimed to assess the psychological impact of receiving breast cancer risk estimates, based on: (a) the Tyrer-Cuzick (T-C) algorithm including breast density or (b) T-C including breast density plus single-nucleotide polymorphisms (SNPs), versus (c) comparison women awaiting results. A sample of 2138 women from the PROCAS study was stratified by testing groups: T-C only, T-C(+SNPs) and comparison women; and by 10-year risk estimates received: 'moderate' (5-7.99%), 'average' (2-4.99%) or 'below average' (<1.99%) risk. Postal questionnaires were returned by 765 (36%) women. Overall state anxiety and cancer worry were low, and similar for women in T-C only and T-C(+SNPs) groups. Women in both T-C only and T-C(+SNPs) groups showed lower state anxiety but slightly higher cancer worry than comparison women awaiting results. Risk information had no consistent effects on intentions to change behaviour. Most women were satisfied with information provided. There was considerable variation in understanding. No major harms of providing women with 10-year breast cancer risk estimates were detected. Research to establish the feasibility of risk-stratified breast screening is warranted.

  12. Multiple thrombophilic single nucleotide polymorphisms lack a significant effect on outcomes in fresh IVF cycles: an analysis of 1717 patients.

    PubMed

    Patounakis, George; Bergh, Eric; Forman, Eric J; Tao, Xin; Lonczak, Agnieszka; Franasiak, Jason M; Treff, Nathan; Scott, Richard T

    2016-01-01

    The aim of the study is to determine if thrombophilic single nucleotide polymorphisms (SNPs) affect outcomes in fresh in vitro fertilization (IVF) cycles in a large general infertility population. A prospective cohort analysis was performed at a university-affiliated private IVF center of female patients undergoing fresh non-donor IVF cycles. The effects of the following thrombophilic SNPs on IVF outcomes were explored: factor V (Leiden and H1299R), prothrombin (G20210A), factor XIII (V34L), β-fibrinogen (-455G → A), plasminogen activator inhibitor-1 (4G/5G), human platelet antigen-1 (a/b9L33P), and methylenetetrahydrofolate reductase (C677T and A1298C). The main outcome measures included positive pregnancy test, clinical pregnancy, embryo implantation, live birth, and pregnancy loss. Patients (1717) were enrolled in the study, and a total of 4169 embryos were transferred. There were no statistically significant differences in positive pregnancy test, clinical pregnancy, embryo implantation, live birth, or pregnancy loss in the analysis of 1717 patients attempting their first cycle of IVF. Receiver operator characteristics and logistic regression analyses showed that outcomes cannot be predicted by the cumulative number of thrombophilic mutations present in the patient. Individual and cumulative thrombophilic SNPs do not affect IVF outcomes. Therefore, initial screening for these SNPs is not indicated.

  13. Molecular characterization of heat shock protein 70 (HSP 70) promoter in Japanese flounder (Paralichthys olivaceus), and the association of Pohsp70 SNPs with heat-resistant trait.

    PubMed

    Qi, Jie; Liu, Xudong; Liu, Jinxiang; Yu, Haiyang; Wang, Wenji; Wang, Zhigang; Zhang, Quanqi

    2014-08-01

    Ambient temperature is one of the major abiotic environmental factors determining the main parameters of fish vital activity. HSP70 plays an essential role in heat response. In this investigation, the promoter and structure of the Paralichthys olivaceus hsp70 (Pohsp70) gene were cloned and predicted. The 2558 bp upstream regulatory region of Pohsp70 was annotated with four potential promoter elements and four putative binding sites of transcription factors, heat shock elements (HSE, nGAAn), upstream of the transcription start site. In addition, one intron of 454 bp was found in the 5'-noncoding region. Quantitative real-time PCR analysis indicated that the transcript level of Pohsp70 was raised markedly after 1 h of heat shock. Furthermore, 25 SNPs were identified in Pohsp70 by resequencing, seven of which were associated with heat resistance. In addition, two of the seven SNPs, namely SNP14 and SNP16, were observed in strong linkage disequilibrium. The haplotype association analysis showed that the TAGGAG haplotype was more represented in the heat-susceptible group while the (DEL/T) GAATA haplotype was more frequent in the heat-resistant group. The heat-resistant SNPs and haplotype could be candidate markers potentially serving selective breeding programs of Japanese flounder aimed at improving stress resistance and production. Copyright © 2014 Elsevier Ltd. All rights reserved.

  14. Genome-wide association study of rust traits in orchardgrass using SLAF-seq technology.

    PubMed

    Zeng, Bing; Yan, Haidong; Liu, Xinchun; Zang, Wenjing; Zhang, Ailing; Zhou, Sifan; Huang, Linkai; Liu, Jinping

    2017-01-01

    While orchardgrass (Dactylis glomerata L.) is a well-known perennial forage species, rust diseases cause serious reductions in the yield and quality of orchardgrass; however, genetic mechanisms of rust resistance are not well understood in orchardgrass. In this study, a genome-wide association study (GWAS) was performed using specific-locus amplified fragment sequencing (SLAF-seq) technology in orchardgrass. A total of 2,334,889 SLAF tags were generated to produce 2,309,777 SNPs. ADMIXTURE analysis revealed unstructured subpopulations for 33 accessions, indicating that this orchardgrass population could be used for association analysis. Linkage disequilibrium (LD) analysis revealed an average r² of 0.4 across all SNP pairs, indicating a high extent of LD in these samples. Through GWAS, a total of 4,604 SNPs were found to be significantly (P < 0.01) associated with the rust trait. The bulk analysis identified 5,211 SNPs related to the rust trait. Two candidate genes, cytochrome P450 and prolamin, were implicated in disease resistance through prediction of functional genes surrounding each high-quality SNP (P < 0.01) associated with rust traits based on GWAS analysis and bulk analysis. The large number of SNPs associated with rust traits and these two candidate genes may provide the basis for further research on rust resistance mechanisms and marker-assisted selection (MAS) for rust-resistant lineages.
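
    The abstract summarizes LD as an average r² across SNP pairs without naming the estimator. As an illustration only, the sketch below uses one common choice, the squared Pearson correlation between genotype dosages (0, 1 or 2 copies of an allele) at two SNPs; the function name and the toy genotypes are hypothetical, and the study may have computed r² differently (for example from phased haplotypes).

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between allele dosages at two SNPs.

    x and y are equal-length sequences of dosages (0, 1 or 2), one per sample.
    Returns NaN when either SNP is monomorphic in the sample.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length dosage vectors with n >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return float("nan")
    return (sxy * sxy) / (sxx * syy)


# Toy data: two SNPs in complete LD, and two with no correlation.
r2_complete = genotype_r2([0, 1, 2, 1], [0, 1, 2, 1])
r2_none = genotype_r2([0, 0, 2, 2], [0, 2, 0, 2])
```

    Averaging this quantity over all SNP pairs would give a summary comparable in spirit to the reported mean of 0.4.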

  15. Genetic analysis of ancestry, admixture and selection in Bolivian and Totonac populations of the New World.

    PubMed

    Watkins, W Scott; Xing, Jinchuan; Huff, Chad; Witherspoon, David J; Zhang, Yuhua; Perego, Ugo A; Woodward, Scott R; Jorde, Lynn B

    2012-05-20

    Populations of the Americas were founded by early migrants from Asia, and some have experienced recent genetic admixture. To better characterize the native and non-native ancestry components in populations from the Americas, we analyzed 815,377 autosomal SNPs, mitochondrial hypervariable segments I and II, and 36 Y-chromosome STRs from 24 Mesoamerican Totonacs and 23 South American Bolivians. We analyzed common genomic regions from native Bolivian and Totonac populations to identify 324 highly predictive Native American ancestry informative markers (AIMs). As few as 40-50 of these AIMs perform nearly as well as large panels of random genome-wide SNPs for predicting and estimating Native American ancestry and admixture levels. These AIMs have greater New World vs. Old World specificity than previous AIMs sets. We identify highly-divergent New World SNPs that coincide with high-frequency haplotypes found at similar frequencies in all populations examined, including the HGDP Pima, Maya, Colombian, Karitiana, and Surui American populations. Some of these regions are potential candidates for positive selection. European admixture in the Bolivian sample is approximately 12%, though individual estimates range from 0-48%. We estimate that the admixture occurred ~360-384 years ago. Little evidence of European or African admixture was found in Totonac individuals. Bolivians with pre-Columbian mtDNA and Y-chromosome haplogroups had 5-30% autosomal European ancestry, demonstrating the limitations of Y-chromosome and mtDNA haplogroups and the need for autosomal ancestry informative markers for assessing ancestry in admixed populations.

  16. Genetic Variants in the Apoptosis Gene BCL2L1 Improve Response to Interferon-Based Treatment of Hepatitis C Virus Genotype 3 Infection

    PubMed Central

    Clausen, Louise Nygaard; Weis, Nina; Ladelund, Steen; Madsen, Lone; Lunding, Suzanne; Tarp, Britta; Christensen, Peer Brehm; Krarup, Henrik Bygum; Møller, Axel; Gerstoft, Jan; Clausen, Mette Rye; Benfield, Thomas

    2015-01-01

    Genetic variation upstream of the apoptosis pathway has been associated with outcome of hepatitis C virus (HCV) infection. We investigated genetic polymorphisms in the intrinsic apoptosis pathway to assess their influence on sustained virological response (SVR) to pegylated interferon-α and ribavirin (pegIFN/RBV) treatment of HCV genotypes 1 and 3 infections. We conducted a candidate gene association study in a prospective cohort of 201 chronic HCV-infected individuals undergoing treatment with pegIFN/RBV. Differences between groups were compared in logistic regression adjusted for age, HCV viral load and interleukin 28B genotypes. Four single nucleotide polymorphisms (SNPs) located in the B-cell lymphoma 2-like 1 (BCL2L1) gene were significantly associated with SVR. SVR rates were significantly higher for carriers of the beneficial rs1484994 CC genotypes. In multivariate logistic regression, the rs1484994 SNP combined CC + TC genotypes were associated with a 3.4 higher odds ratio (OR) in SVR for the HCV genotype 3 (p = 0.02). The effect estimate was similar for genotype 1, but the association did not reach statistical significance. In conclusion, anti-apoptotic SNPs in the BCL2L1 gene were predictive of SVR to pegIFN/RBV treatment in HCV genotypes 1 and 3 infected individuals. These SNPs may be used in prediction of SVR, but further studies are needed. PMID:25648321

  17. Docking-based modeling of protein-protein interfaces for extensive structural and functional characterization of missense mutations.

    PubMed

    Barradas-Bautista, Didier; Fernández-Recio, Juan

    2017-01-01

    Next-generation sequencing (NGS) technologies are providing genomic information for an increasing number of healthy individuals and patient populations. In the context of the large amount of genomic data that is being generated, understanding the effect of disease-related mutations at the molecular level can help to close the gap between genotype and phenotype and thus improve prevention, diagnosis or treatment of a pathological condition. In order to fully characterize the effect of a pathological mutation and have useful information for prediction purposes, it is important first to identify whether the mutation is located at a protein-binding interface, and second to understand the effect on the binding affinity of the affected interaction/s. Computational methods, such as protein docking, are currently used to complement experimental efforts and could help to build the human structural interactome. Here we have extended the original pyDockNIP method to predict the location of disease-associated nsSNPs at protein-protein interfaces, when there is no available structure for the protein-protein complex. We have applied this approach to the pathological interaction networks of six diseases with low structural data on PPIs. This approach can almost double the number of nsSNPs that can be characterized and identify edgetic effects in many nsSNPs that were previously unknown. This can help to annotate and interpret genomic data from large-scale population studies, and to achieve a better understanding of disease at the molecular level.

  18. A powerful approach reveals numerous expression quantitative trait haplotypes in multiple tissues.

    PubMed

    Ying, Dingge; Li, Mulin Jun; Sham, Pak Chung; Li, Miaoxin

    2018-04-26

    Recently, many studies have shown that single nucleotide polymorphisms (SNPs) affect gene expression and contribute to the development of complex traits/diseases in a tissue context-dependent manner. However, little is known about the influence of haplotypes, which reflect the interaction effect between SNPs, on gene expression and complex traits. In the present study, we first proposed a regulatory region guided eQTL haplotype association analysis approach, and then used it to systematically investigate expression quantitative trait loci (eQTL) haplotypes in 20 different tissues. The approach reduces the computational burden by using regulatory predictions for candidate SNP selection and by applying multiple testing corrections on non-independent haplotypes. The application results in multiple tissues showed that haplotype-based eQTLs not only increased the number of eQTL genes in a tissue-specific manner, but were also enriched in loci associated with complex traits in a tissue-matched manner. In addition, we found that tag SNPs of eQTL haplotypes from whole blood were selectively enriched in certain combinations of regulatory elements (e.g. promoters and enhancers) according to predicted chromatin states. In summary, this eQTL haplotype detection approach, together with the application results, sheds light on the synergistic effect of sequence variants on gene expression and their susceptibility to complex diseases. The executable application "eHaplo" is implemented in Java and is publicly available at http://grass.cgs.hku.hk/limx/ehaplo/. Contact: jonsonfox@gmail.com, limiaoxin@mail.sysu.edu.cn. Supplementary data are available at Bioinformatics online.

  19. Melanoma risk prediction using a multilocus genetic risk score in the Women's Health Initiative cohort.

    PubMed

    Cho, Hyunje G; Ransohoff, Katherine J; Yang, Lingyao; Hedlin, Haley; Assimes, Themistocles; Han, Jiali; Stefanick, Marcia; Tang, Jean Y; Sarin, Kavita Y

    2018-07-01

    Single-nucleotide polymorphisms (SNPs) associated with melanoma have been identified through genome-wide association studies. However, the combined impact of these SNPs on melanoma development remains unclear, particularly in postmenopausal women who carry a lower melanoma risk. We examine the contribution of a combined polygenic risk score on melanoma development in postmenopausal women. Genetic risk scores were calculated using 21 genome-wide association study-significant SNPs. Their combined effect on melanoma development was evaluated in 19,102 postmenopausal white women in the clinical trial and observational study arms of the Women's Health Initiative dataset. Compared to the tertile of weighted genetic risk score with the lowest genetic risk, the women in the tertile with the highest genetic risk were 1.9 times more likely to develop melanoma (95% confidence interval 1.50-2.42). The incremental change in c-index from adding genetic risk scores to age was 0.075 (95% confidence interval 0.041-0.109) for incident melanoma. Limitations include a lack of information on nevi count, Fitzpatrick skin type, family history of melanoma, and potential reporting and selection bias in the Women's Health Initiative cohort. Higher genetic risk is associated with increased melanoma prevalence and incidence in postmenopausal women, but current genetic information may have a limited role in risk prediction when phenotypic information is available. Copyright © 2018 American Academy of Dermatology, Inc. Published by Elsevier Inc. All rights reserved.
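
    The abstract refers to a weighted genetic risk score split into tertiles but gives no formula. A minimal sketch under the common assumption that the score is the sum of risk-allele dosages weighted by per-SNP effect sizes (typically log odds ratios) is shown below; the function names, the weights and the tertile rule are hypothetical illustrations, not the study's method.

```python
def weighted_grs(dosages, weights):
    """Weighted genetic risk score: sum over SNPs of weight * risk-allele dosage."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


def assign_tertiles(scores):
    """Label each score 0 (lowest third), 1 or 2 (highest third) by rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = min(2, (3 * rank) // n)
    return labels


# Hypothetical three-SNP example: dosages are 0, 1 or 2 risk alleles per SNP.
weights = [0.1, 0.2, 0.3]
cohort = [[0, 0, 0], [0, 1, 2], [2, 2, 2]]
scores = [weighted_grs(person, weights) for person in cohort]
tertiles = assign_tertiles(scores)
```

    Comparing outcome rates between the label-2 and label-0 groups mirrors the highest-versus-lowest tertile contrast described in the abstract.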

  20. Docking-based modeling of protein-protein interfaces for extensive structural and functional characterization of missense mutations

    PubMed Central

    2017-01-01

    Next-generation sequencing (NGS) technologies are providing genomic information for an increasing number of healthy individuals and patient populations. In the context of the large amount of genomic data that is being generated, understanding the effect of disease-related mutations at the molecular level can help to close the gap between genotype and phenotype and thus improve prevention, diagnosis or treatment of a pathological condition. In order to fully characterize the effect of a pathological mutation and have useful information for prediction purposes, it is important first to identify whether the mutation is located at a protein-binding interface, and second to understand the effect on the binding affinity of the affected interaction/s. Computational methods, such as protein docking, are currently used to complement experimental efforts and could help to build the human structural interactome. Here we have extended the original pyDockNIP method to predict the location of disease-associated nsSNPs at protein-protein interfaces, when there is no available structure for the protein-protein complex. We have applied this approach to the pathological interaction networks of six diseases with low structural data on PPIs. This approach can almost double the number of nsSNPs that can be characterized and identify edgetic effects in many nsSNPs that were previously unknown. This can help to annotate and interpret genomic data from large-scale population studies, and to achieve a better understanding of disease at the molecular level. PMID:28841721

  1. Cost-effective HLA typing with tagging SNPs predicts celiac disease risk haplotypes in the Finnish, Hungarian, and Italian populations.

    PubMed

    Koskinen, Lotta; Romanos, Jihane; Kaukinen, Katri; Mustalahti, Kirsi; Korponay-Szabo, Ilma; Barisani, Donatella; Bardella, Maria Teresa; Ziberna, Fabiana; Vatta, Serena; Széles, György; Pocsai, Zsuzsa; Karell, Kati; Haimila, Katri; Adány, Róza; Not, Tarcisio; Ventura, Alessandro; Mäki, Markku; Partanen, Jukka; Wijmenga, Cisca; Saavalainen, Päivi

    2009-04-01

    Human leukocyte antigen (HLA) genes, located on chromosome 6p21.3, have a crucial role in susceptibility to various autoimmune and inflammatory diseases, such as celiac disease and type 1 diabetes. Certain HLA heterodimers, namely DQ2 (encoded by the DQA1*05 and DQB1*02 alleles) and DQ8 (DQA1*03 and DQB1*0302), are necessary for the development of celiac disease. Traditional genotyping of HLA genes is laborious, time-consuming, and expensive. A novel HLA-genotyping method, using six HLA-tagging single-nucleotide polymorphisms (SNPs) and suitable for high-throughput approaches, was described recently. Our aim was to validate this method in the Finnish, Hungarian, and Italian populations. The six previously reported HLA-tagging SNPs were genotyped in patients with celiac disease and in healthy individuals from Finland, Hungary, and two distinct regions of Italy. The potential of this method was evaluated in analyzing how well the tag SNP results correlate with the HLA genotypes previously determined using traditional HLA-typing methods. Using the tagging SNP method, it is possible to determine the celiac disease risk haplotypes accurately in Finnish, Hungarian, and Italian populations, with specificity and sensitivity ranging from 95% to 100%. In addition, it predicts homozygosity and heterozygosity for a risk haplotype, allowing studies on genotypic risk effects. The method is transferable between populations and therefore suited for large-scale research studies and screening of celiac disease among high-risk individuals or at the population level.
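
    The validation described above compares tag-SNP predictions with HLA genotypes from traditional typing and reports sensitivity and specificity. A minimal sketch of that comparison for a single risk haplotype follows; the function name and the toy calls are hypothetical, and the abstract does not describe the exact tabulation used.

```python
def sensitivity_specificity(predicted, truth):
    """Sensitivity and specificity of carrier predictions against reference typing.

    predicted and truth are equal-length sequences of booleans
    (True = carries the risk haplotype).
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one carrier and one non-carrier")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: tag-SNP calls versus traditional HLA typing for four samples.
tag_snp_calls = [True, True, False, False]
reference_typing = [True, False, False, False]
sens, spec = sensitivity_specificity(tag_snp_calls, reference_typing)
```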

  2. Relevance of genetic relationship in GWAS and genomic prediction.

    PubMed

    Pereira, Helcio Duarte; Soriano Viana, José Marcelo; Andrade, Andréa Carla Bastos; Fonseca E Silva, Fabyano; Paes, Geísa Pinheiro

    2018-02-01

    The objective of this study was to analyze the relevance of relationship information on the identification of low heritability quantitative trait loci (QTLs) from a genome-wide association study (GWAS) and on the genomic prediction of complex traits in human, animal and cross-pollinating populations. The simulation-based data sets included 50 samples of 1000 individuals of seven populations derived from a common population with linkage disequilibrium. The populations had non-inbred and inbred progeny structure (50 to 200) with varying number of members (5 to 20). The individuals were genotyped for 10,000 single nucleotide polymorphisms (SNPs) and phenotyped for a quantitative trait controlled by 10 QTLs and 90 minor genes showing dominance. The SNP density was 0.1 cM and the narrow sense heritability was 25%. The QTL heritabilities ranged from 1.1 to 2.9%. We applied mixed model approaches for both GWAS and genomic prediction using pedigree-based and genomic relationship matrices. For GWAS, the observed false discovery rate was kept below the significance level of 5%, the power of detection for the low heritability QTLs ranged from 14 to 50%, and the average bias between significant SNPs and a QTL ranged from less than 0.01 to 0.23 cM. The QTL detection power was consistently higher using genomic relationship matrix. Regardless of population and training set size, genomic prediction provided higher prediction accuracy of complex trait when compared to pedigree-based prediction. The accuracy of genomic prediction when there is relatedness between individuals in the training set and the reference population is much higher than the value for unrelated individuals.
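
    The abstract mentions genomic relationship matrices without stating how they were built. One widely used construction is VanRaden's first method, G = ZZ' / (2 Σ p(1 - p)), where Z holds genotype dosages centered by twice the allele frequency; the sketch below assumes that form purely for illustration, and the function name and toy genotypes are hypothetical.

```python
def vanraden_grm(genotypes):
    """Genomic relationship matrix (VanRaden method 1) from 0/1/2 dosages.

    genotypes is a list of rows, one per individual, each with one dosage per SNP.
    Allele frequencies are estimated from the supplied sample.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    if any(len(row) != m for row in genotypes):
        raise ValueError("all individuals need the same number of SNPs")
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    denom = 2 * sum(p * (1 - p) for p in freqs)
    if denom == 0:
        raise ValueError("all SNPs are monomorphic")
    centered = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(centered[a][j] * centered[b][j] for j in range(m)) / denom
         for b in range(n)]
        for a in range(n)
    ]


# Toy data: three individuals genotyped at two SNPs.
grm = vanraden_grm([[0, 2], [2, 0], [1, 1]])
```

    In a mixed model, a matrix of this kind would take the place of the pedigree-based relationship matrix when comparing the two approaches.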

  3. The Echinococcus canadensis (G7) genome: a key knowledge of parasitic platyhelminth human diseases.

    PubMed

    Maldonado, Lucas L; Assis, Juliana; Araújo, Flávio M Gomes; Salim, Anna C M; Macchiaroli, Natalia; Cucher, Marcela; Camicia, Federico; Fox, Adolfo; Rosenzvit, Mara; Oliveira, Guilherme; Kamenetzky, Laura

    2017-02-27

    The parasite Echinococcus canadensis (G7) (phylum Platyhelminthes, class Cestoda) is one of the causative agents of echinococcosis. Echinococcosis is a worldwide chronic zoonosis affecting humans as well as domestic and wild mammals, which has been reported as a prioritized neglected disease by the World Health Organisation. No genomic data, comparative genomic analyses or efficient therapeutic and diagnostic tools are available for this severe disease. The information presented in this study will help to understand the peculiar biological characters and to design species-specific control tools. We sequenced, assembled and annotated the 115-Mb genome of E. canadensis (G7). Comparative genomic analyses using whole genome data of three Echinococcus species not only confirmed the status of E. canadensis (G7) as a separate species but also demonstrated a high nucleotide sequences divergence in relation to E. granulosus (G1). The E. canadensis (G7) genome contains 11,449 genes with a core set of 881 orthologs shared among five cestode species. Comparative genomics revealed that there are more single nucleotide polymorphisms (SNPs) between E. canadensis (G7) and E. granulosus (G1) than between E. canadensis (G7) and E. multilocularis. This result was unexpected since E. canadensis (G7) and E. granulosus (G1) were considered to belong to the species complex E. granulosus sensu lato. We described SNPs in known drug targets and metabolism genes in the E. canadensis (G7) genome. Regarding gene regulation, we analysed three particular features: CpG island distribution along the three Echinococcus genomes, DNA methylation system and small RNA pathway. The results suggest the occurrence of yet unknown gene regulation mechanisms in Echinococcus. This is the first work that addresses Echinococcus comparative genomics. The resources presented here will promote the study of mechanisms of parasite development as well as new tools for drug discovery. 
    The availability of a high-quality genome assembly is critical for fully exploring the biology of a pathogenic organism. The E. canadensis (G7) genome presented in this study provides a unique opportunity to address the genetic diversity among the genus Echinococcus and its particular developmental features. At present, there is no unequivocal taxonomic classification of Echinococcus species; however, the genome-wide SNPs analysis performed here revealed the phylogenetic distance among these three Echinococcus species. Additional cestode genomes need to be sequenced to be able to resolve their phylogeny.

  4. Rice SNP-seek database update: new SNPs, indels, and queries.

    PubMed

    Mansueto, Locedie; Fuentes, Roven Rommel; Borja, Frances Nikki; Detras, Jeffery; Abriol-Santos, Juan Miguel; Chebotarov, Dmytro; Sanciangco, Millicent; Palis, Kevin; Copetti, Dario; Poliakov, Alexandre; Dubchak, Inna; Solovyev, Victor; Wing, Rod A; Hamilton, Ruaraidh Sackville; Mauleon, Ramil; McNally, Kenneth L; Alexandrov, Nickolai

    2017-01-04

    We describe updates to the Rice SNP-Seek Database since its first release. We ran a new SNP-calling pipeline followed by filtering that resulted in complete, base, filtered and core SNP datasets. Besides the Nipponbare reference genome, the pipeline was run on genome assemblies of IR 64, 93-11, DJ 123 and Kasalath. New genotype query and display features are added for reference assemblies, SNP datasets and indels. JBrowse now displays BAM, VCF and other annotation tracks, the additional genome assemblies and an embedded VISTA genome comparison viewer. Middleware is redesigned for improved performance by using a hybrid of HDF5 and RDMS for genotype storage. Query modules for genotypes, varieties and genes are improved to handle various constraints. An integrated list manager allows the user to pass query parameters for further analysis. The SNP Annotator adds traits, ontology terms, effects and interactions to markers in a list. Web-service calls were implemented to access most data. These features enable seamless querying of SNP-Seek across various biological entities, a step toward semi-automated gene-trait association discovery. URL: http://snp-seek.irri.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. Detection of single-nucleotide polymorphisms using gold nanoparticles and single-strand-specific nucleases.

    PubMed

    Chen, Yen-Ting; Hsu, Chiao-Ling; Hou, Shao-Yi

    2008-04-15

    The current study reports an assay approach that can detect single-nucleotide polymorphisms (SNPs) and identify the position of the point mutation through a single-strand-specific nuclease reaction and a gold nanoparticle assembly. The assay can be implemented via three steps: a single-strand-specific nuclease reaction that allows the enzyme to truncate the mutant DNA; a purification step that uses capture probe-gold nanoparticles and centrifugation; and a hybridization reaction that induces detector probe-gold nanoparticles, capture probe-gold nanoparticles, and the target DNA to form large DNA-linked three-dimensional aggregates of gold nanoparticles. At high temperature (63 degrees C in the current case), the purple color of the perfect match solution would not change to red, whereas a mismatched solution becomes red as the assembled gold nanoparticles separate. Using melting analysis, the position of the point mutation could be identified. This assay provides a convenient colorimetric detection that enables point mutation identification without the need for expensive mass spectrometry. To our knowledge, this is the first report concerning SNP detection based on a single-strand-specific nuclease reaction and a gold nanoparticle assembly.

  6. SNPs in NRXN1 and CHRNA5 are associated to smoking and regulation of GABAergic and glutamatergic pathways.

    PubMed

    Pérez-Rubio, Gloria; Pérez-Rodríguez, Martha E; Fernández-López, Juan Carlos; Ramírez-Venegas, Alejandra; García-Colunga, Jesús; Ávila-Moreno, Federico; Camarena, Angel; Sansores, Raúl H; Falfán-Valencia, Ramcés

    2016-07-01

    To identify genetic variants associated with greater tobacco consumption in a Mexican population. Daily smokers were classified as light smokers (LS; n = 742), heavy smokers (HS; n = 601) and nonsmokers (NS; n = 606). In the first stage, a genotyping microarray that included 347 SNPs in CHRNA2-CHRNA7/CHRNA10, CHRNB2-CHRNB4 and NRXN1 genes and 37 ancestry-informative markers was used to analyze 707 samples (187 HS, 328 LS and 192 NS). In the second stage, 14 SNPs from stage 1 were validated in the remaining samples (HS, LS and NS; n = 414 in each group) using real-time PCR. To predict the role of the associated SNPs, an in silico analysis was performed. Two SNPs in NRXN1 and two in CHRNA5 were associated with cigarette consumption, while rs10865246/C (NRXN1) was associated with high nicotine addiction. The in silico analysis revealed that rs1882296/T had a high level of homology with Hsa-miR-6740-5p, which encodes a putative miRNA that targets glutamate receptor subunits (GRIA2, GRID2) and GABA receptor subunits (GABRG1, GABRA4, GABRB2), while rs1882296/C had a high level of homology with Hsa-miR-6866-5p, which encodes a different miRNA that targets GRID2 and GABRB2. In a Mexican Mestizo population, greater consumption of cigarettes was influenced by polymorphisms in the NRXN1 and CHRNA5 genes. We proposed new hypotheses regarding the putative roles of miRNAs that influence the GABAergic and glutamatergic pathways in smoking addiction.

  7. Resequencing of Capsicum annuum parental lines (YCM334 and Taean) for the genetic analysis of bacterial wilt resistance.

    PubMed

    Kang, Yang Jae; Ahn, Yul-Kyun; Kim, Ki-Taek; Jun, Tae-Hwan

    2016-10-28

    Bacterial wilt (BW) is a widespread plant disease that affects a broad range of dicot and monocot hosts and is particularly harmful for solanaceous plants, such as pepper, tomato, and eggplant. The pathogen responsible for BW is the soil-borne bacterium, Ralstonia solanacearum, which can adapt to diverse temperature conditions and is found in climates ranging from tropical to temperate. Resistance to BW has been detected in some pepper plant lines; however, the genomic loci and alleles that mediate this are poorly studied in this species. We resequenced the pepper cultivars YCM334 and Taean, which are parental recombinant inbred lines (RIL) that display differential resistance phenotypes against BW, with YCM334 being highly resistant to infection with this pathogen. We identified novel single nucleotide polymorphisms (SNPs) and insertions/deletions (Indels) that are only present in both parental lines, as compared to the reference genome and further determined variations that distinguish these two cultivars from one another. We then identified potentially informative SNPs that were found in genes related to those that have been previously associated with disease resistance, such as the R genes and stress response genes. Moreover, via comparative analysis, we identified SNPs located in genomic regions that have homology to known resistance genes in the tomato genomes. From our SNP profiling in both parental lines, we could identify SNPs that are potentially responsible for BW resistance, and practically, these may be used as markers for assisted breeding schemes using these populations. We predict that our analyses will be valuable for both better understanding the YCM334/Taean-derived populations, as well as for enhancing our knowledge of critical SNPs present in the pepper genome.

  8. Evaluation of a SNP map of 6q24-27 confirms diabetic nephropathy loci and identifies novel associations in type 2 diabetes patients enriched with nephropathy from an African American population

    PubMed Central

    Leak, Tennille S.; Mychaleckyj, Josyf C.; Smith, Shelly G.; Keene, Keith L.; Gordon, Candace J.; Hicks, Pamela J.; Freedman, Barry I.; Bowden, Donald W.; Sale, Michèle M.

    2009-01-01

    Previously we performed a genome scan for type 2 diabetes (T2DM) using 638 African-American (AA) affected sibling pairs from 247 families; non-parametric linkage analysis suggested evidence of linkage at 6q24-27 (LOD 2.26). To comprehensively evaluate this region we performed a 2-stage association study by first constructing a SNP map of 754 SNPs selected from HapMap on the basis of linkage disequilibrium (LD) in 300 AA T2DM-ESRD subjects, 311 AA controls, 43 European American controls and 45 Yoruba Nigerian samples (Set 1). Replication analyses were conducted in an independent population of 283 AA T2DM-ESRD subjects and 282 AA controls (Set 2). In addition, we adjusted for the impact of admixture on association results by using ancestry informative markers (AIMs). In Stage 1, 137 (18.2%) SNPs showed nominal evidence of association (P<0.05) in one or more of tests of association: allelic (n=33), dominant (n=36), additive (n=29), or recessive (n=34) genotypic models, and 2- (n=47) and 3-SNP (n=43) haplotypic analyses. These SNPs were selected for follow-up genotyping. Stage 2 analyses confirmed association with a predicted 2-SNP “risk” haplotype in the PARK2 gene. Also, two intergenic SNPs showed consistent genotypic association with T2DM-ESRD: rs12197043 and rs4897081. Combined analysis of all subjects from both stages revealed nominal associations with 17 SNPs within genes, including suggestive associations in ESR1 and PARK2. This study confirms known diabetic nephropathy loci and identifies potentially novel susceptibility variants located within 6q24-27 in AA. PMID:18560894

  9. Advances in Exercise, Fitness, and Performance Genomics in 2015.

    PubMed

    Sarzynski, Mark A; Loos, Ruth J F; Lucia, Alejandro; Pérusse, Louis; Roth, Stephen M; Wolfarth, Bernd; Rankinen, Tuomo; Bouchard, Claude

    2016-10-01

    This review of the exercise genomics literature encompasses the highest-quality articles published in 2015 across seven broad topics: physical activity behavior, muscular strength and power, cardiorespiratory fitness and endurance performance, body weight and adiposity, insulin and glucose metabolism, lipid and lipoprotein metabolism, and hemodynamic traits. One study used a quantitative trait locus for wheel running in mice to identify single nucleotide polymorphisms (SNPs) in humans associated with physical activity levels. Two studies examined the association of candidate gene ACTN3 R577X genotype on muscular performance. Several studies examined gene-physical activity interactions on cardiometabolic traits. One study showed that physical inactivity exacerbated the body mass index (BMI)-increasing effect of an FTO SNP but only in individuals of European ancestry, whereas another showed that high-density lipoprotein cholesterol (HDL-C) SNPs from genome-wide association studies exerted a smaller effect in active individuals. Increased levels of moderate-to-vigorous-intensity physical activity were associated with higher Matsuda insulin sensitivity index in PPARG Ala12 carriers but not Pro12 homozygotes. One study combined genome-wide and transcriptome-wide profiling to identify genes and SNPs associated with the response of triglycerides (TG) to exercise training. The genome-wide association study results showed that four SNPs accounted for all of the heritability of ΔTG, whereas the baseline expression of 11 genes predicted 27% of ΔTG. A composite SNP score based on the top eight SNPs derived from the genomic and transcriptomic analyses was the strongest predictor of ΔTG, explaining 14% of the variance. The review concludes with a discussion of a conceptual framework defining some of the critical conditions for exercise genomics studies and highlights the importance of the recently launched National Institutes of Health Common Fund program titled "Molecular Transducers of Physical Activity in Humans."

  10. Strong influence of dietary intake and physical activity on body fatness in elderly Japanese men: age-associated loss of polygenic resistance against obesity.

    PubMed

    Tanisawa, Kumpei; Ito, Tomoko; Sun, Xiaomin; Ise, Ryuken; Oshima, Satomi; Cao, Zhen-Bo; Sakamoto, Shizuo; Tanaka, Masashi; Higuchi, Mitsuru

    2014-09-01

    Genome-wide association studies identified single nucleotide polymorphisms (SNPs) associated with body mass index (BMI) in middle-aged populations; however, it is unclear whether these SNPs are associated with body fatness in elderly people. We examined the association between genetic risk score (GRS) from BMI-associated SNPs and body fatness in elderly Japanese men. We also examined the contribution of GRS, dietary macronutrient intake, and physical activity to body fatness by different age groups. GRS was calculated from 10 BMI-associated SNPs in 84 middle-aged (30-64 years) and 97 elderly (65-79 years) Japanese men; subjects were divided into low, middle, and high GRS groups. Dietary macronutrient intake was assessed using a questionnaire, and physical activity was evaluated using both a questionnaire and an accelerometer. The middle-aged individuals with a high GRS had greater BMI; waist circumference; and total abdominal fat, visceral fat, and subcutaneous fat areas than the middle-aged individuals with low GRS, whereas the indicators were not different between the GRS groups in elderly individuals. Multiple linear regression analysis showed that GRS was the strongest predictor of BMI, total abdominal fat, and visceral fat in the middle-aged group, whereas fat, alcohol, and protein intakes or vigorous-intensity physical activity were more strongly associated with these indicators than was GRS in the elderly group. These results suggest that GRS from BMI-associated SNPs is not predictive of body fatness in elderly Japanese men. The stronger contribution of dietary macronutrient intake and physical activity to body fatness may attenuate the genetic predisposition in elderly men.
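    The abstract above computes a genetic risk score (GRS) from 10 BMI-associated SNPs and splits subjects into low, middle, and high groups. A minimal sketch of one common way to do this follows. It assumes an unweighted score (a simple count of risk alleles) and rank-based thirds; the abstract does not state either choice, and all names and genotypes here are hypothetical.

```python
def genetic_risk_score(risk_allele_counts):
    """Unweighted GRS: total number of risk alleles across all SNPs.

    Each entry is the number of risk alleles carried at one SNP (0, 1 or 2).
    """
    for count in risk_allele_counts:
        if count not in (0, 1, 2):
            raise ValueError("each genotype must be 0, 1 or 2 risk alleles")
    return sum(risk_allele_counts)


def tertile_groups(scores):
    """Label each score 'low', 'middle' or 'high' by rank thirds (assumed rule)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [None] * n
    for rank, index in enumerate(order):
        if rank < n / 3:
            labels[index] = "low"
        elif rank < 2 * n / 3:
            labels[index] = "middle"
        else:
            labels[index] = "high"
    return labels


# Hypothetical genotypes: six subjects, ten SNPs each.
genotypes = [
    [0, 1, 0, 1, 0, 0, 1, 0, 0, 1],
    [2, 1, 1, 1, 0, 2, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1, 1, 0, 1, 0],
    [2, 2, 1, 1, 1, 2, 1, 1, 0, 1],
    [0, 1, 0, 1, 1, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
]
scores = [genetic_risk_score(g) for g in genotypes]
groups = tertile_groups(scores)
```

    A weighted score, where each SNP is multiplied by its reported effect size, is a frequent alternative; it would only change the body of `genetic_risk_score`.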

  11. Genetic signatures in choline and 1-carbon metabolism are associated with the severity of hepatic steatosis

    PubMed Central

    Corbin, Karen D.; Abdelmalek, Manal F.; Spencer, Melanie D.; da Costa, Kerry-Ann; Galanko, Joseph A.; Sha, Wei; Suzuki, Ayako; Guy, Cynthia D.; Cardona, Diana M.; Torquati, Alfonso; Diehl, Anna Mae; Zeisel, Steven H.

    2013-01-01

    Choline metabolism is important for very low-density lipoprotein secretion, making this nutritional pathway an important contributor to hepatic lipid balance. The purpose of this study was to assess whether the cumulative effects of multiple single nucleotide polymorphisms (SNPs) across genes of choline/1-carbon metabolism and functionally related pathways increase susceptibility to developing hepatic steatosis. In biopsy-characterized cases of nonalcoholic fatty liver disease and controls, we assessed 260 SNPs across 21 genes in choline/1-carbon metabolism. When SNPs were examined individually, using logistic regression, we only identified a single SNP (PNPLA3 rs738409) that was significantly associated with severity of hepatic steatosis after adjusting for confounders and multiple comparisons (P=0.02). However, when groupings of SNPs in similar metabolic pathways were defined using unsupervised hierarchical clustering, we identified groups of subjects with shared SNP signatures that were significantly correlated with steatosis burden (P=0.0002). The lowest and highest steatosis clusters could also be differentiated by ethnicity. However, unique SNP patterns defined steatosis burden irrespective of ethnicity. Our results suggest that analysis of SNP patterns in genes of choline/1-carbon metabolism may be useful for prediction of severity of steatosis in specific subsets of people, and the metabolic inefficiencies caused by these SNPs should be examined further. PMID:23292069

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Du, Zhongli; Department of Etiology and Carcinogenesis; Zhang, Wencheng

    Purpose: To investigate whether single nucleotide polymorphisms (SNPs) in the ataxia telangiectasia mutated (ATM) gene are associated with survival in patients with esophageal squamous cell carcinoma (ESCC) receiving radiation therapy or chemoradiation therapy or surgery only. Methods and Materials: Four tagSNPs of ATM were genotyped in 412 individuals with clinical stage III or IV ESCC receiving radiation therapy or chemoradiation therapy, and in 388 individuals with stage I, II, or III ESCC treated with surgery only. Overall survival time of ESCC among different genotypes was estimated by Kaplan-Meier plot, and the significance was examined by log-rank test. The hazard ratios (HRs) and 95% confidence intervals (CIs) for death from ESCC among different genotypes were computed by a Cox proportional regression model. Results: We found 2 SNPs, rs664143 and rs664677, associated with survival time of ESCC patients receiving radiation therapy. Individuals with the rs664143A allele had poorer median survival time compared with the rs664143G allele (14.0 vs 20.0 months), with the HR for death being 1.45 (95% CI 1.12-1.89). Individuals with the rs664677C allele also had worse median survival time than those with the rs664677T allele (14.0 vs 23.5 months), with the HR of 1.57 (95% CI 1.18-2.08). Stratified analysis showed that these associations were present in both stage III and IV cancer and different radiation therapy techniques. Significant associations were also found between the SNPs and locoregional progression or progression-free survival. No association between these SNPs and survival time was detected in ESCC patients treated with surgery only. Conclusion: These results suggest that the ATM polymorphisms might serve as independent biomarkers for predicting prognosis in ESCC patients receiving radiation therapy.
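    The median survival times in this record come from Kaplan-Meier estimates. As an illustration only, the sketch below implements the product-limit estimator in plain Python and reads off the median as the first time the curve reaches 0.5 or lower. The function names and follow-up data are hypothetical, and this is not the authors' analysis code.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient (e.g. months).
    events: 1 if death was observed at that time, 0 if censored.
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which estimated survival is 0.5 or lower, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up data (months) for one genotype group.
months = [6, 10, 14, 14, 20, 25]
died = [1, 0, 1, 1, 1, 0]
curve = kaplan_meier(months, died)
median = median_survival(curve)
```

    Comparing two genotype groups would mean running this once per group; the log-rank test and Cox model mentioned in the abstract are separate steps not shown here.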

  13. Genome-Wide Association Study of Seed Dormancy and the Genomic Consequences of Improvement Footprints in Rice (Oryza sativa L.)

    PubMed Central

    Lu, Qing; Niu, Xiaojun; Zhang, Mengchen; Wang, Caihong; Xu, Qun; Feng, Yue; Yang, Yaolong; Wang, Shan; Yuan, Xiaoping; Yu, Hanyong; Wang, Yiping; Chen, Xiaoping; Liang, Xuanqiang; Wei, Xinghua

    2018-01-01

    Seed dormancy is an important agronomic trait affecting grain yield and quality because of pre-harvest germination and is influenced by both environmental and genetic factors. However, our knowledge of the factors controlling seed dormancy remains limited. To better reveal the molecular mechanism underlying this trait, a genome-wide association study was conducted in an indica-only population consisting of 453 accessions genotyped using 5,291 SNPs. Nine known and new significant SNPs were identified on eight chromosomes. These lead SNPs explained 34.9% of the phenotypic variation, and four of them were designed as dCAPS markers in the hope of accelerating molecular breeding. Moreover, a total of 212 candidate genes was predicted and eight candidate genes showed plant tissue-specific expression in expression profile data from different public bioinformatics databases. In particular, LOC_Os03g10110, which had a maize homolog involved in embryo development, was identified as a candidate regulator for further biological function investigations. Additionally, a polymorphism information content ratio method was used to screen improvement footprints and 27 selective sweeps were identified, most of which harbored domestication-related genes. Further studies suggested that three significant SNPs were adjacent to the candidate selection signals, supporting the accuracy of our genome-wide association study (GWAS) results. These findings show that genome-wide screening for selective sweeps can be used to identify new improvement-related DNA regions, although the phenotypes are unknown. This study enhances our knowledge of the genetic variation in seed dormancy, and the new dormancy-associated SNPs will provide real benefits in molecular breeding. PMID:29354150

  14. Variation in Telangiectasia Predisposing Genes Is Associated With Overall Radiation Toxicity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tanteles, George A.; Department of Cancer Studies and Molecular Medicine, University Hospitals of Leicester, Leicester Royal Infirmary, Leicester; Murray, Robert J.S.

    2012-11-15

    Purpose: In patients receiving radiotherapy for breast cancer where the heart is within the radiation field, cutaneous telangiectasiae could be a marker of potential radiation-induced heart disease. We hypothesized that single nucleotide polymorphisms (SNPs) in genes known to cause heritable telangiectasia-associated disorders could predispose to such late, normal tissue vascular damage. Methods and Materials: The relationship between cutaneous telangiectasia as a late normal tissue radiation injury phenotype in 633 breast cancer patients treated with radiotherapy was examined. Patients were clinically assessed for the presence of cutaneous telangiectasia and genotyped at nine SNPs in three candidate genes. Candidate SNPs were within the endoglin (ENG) and activin A receptor, type II-like 1 (ACVRL1) genes, mutations in which cause hereditary hemorrhagic telangiectasia and the ataxia-telangiectasia mutated (ATM) gene associated with ataxia-telangiectasia. Results: A total of 121 (19.1%) patients exhibited a degree of cutaneous telangiectasiae on clinical examination. Regression was used to examine the associations between the presence of telangiectasiae in patients who underwent breast-conserving surgery, controlling for the effects of boost and known brassiere size (n=388), and individual geno- or haplotypes. Inheritance of ACVRL1 SNPs marginally contributed to the risk of cutaneous telangiectasiae. Haplotypic analysis revealed a stronger association between inheritance of an ATM haplotype and the presence of cutaneous telangiectasiae, fibrosis and overall toxicity. No significant association was observed between telangiectasiae and the coinheritance of the candidate ENG SNPs. Conclusions: Genetic variation in the ATM gene influences reaction to radiotherapy through both vascular damage and increased fibrosis. The predisposing variation in the ATM gene will need to be better defined to optimize it as a predictive marker for assessing radiotherapy late effects.

  15. Incorporation of causative quantitative trait nucleotides in single-step GBLUP.

    PubMed

    Fragomeni, Breno O; Lourenco, Daniela A L; Masuda, Yutaka; Legarra, Andres; Misztal, Ignacy

    2017-07-26

    Much effort is put into identifying causative quantitative trait nucleotides (QTN) in animal breeding, empowered by the availability of dense single nucleotide polymorphism (SNP) information. Genomic selection using traditional SNP information is easily implemented for any number of genotyped individuals using single-step genomic best linear unbiased predictor (ssGBLUP) with the algorithm for proven and young (APY). Our aim was to investigate whether ssGBLUP is useful for genomic prediction when some or all QTN are known. Simulations included 180,000 animals across 11 generations. Phenotypes were available for all animals in generations 6 to 10. Genotypes for 60,000 SNPs across 10 chromosomes were available for 29,000 individuals. The genetic variance was fully accounted for by 100 or 1000 biallelic QTN. Raw genomic relationship matrices (GRM) were computed from (a) unweighted SNPs, (b) unweighted SNPs and causative QTN, (c) SNPs and causative QTN weighted with results obtained with genome-wide association studies, (d) unweighted SNPs and causative QTN with simulated weights, (e) only unweighted causative QTN, (f-h) as in (b-d) but using only the top 10% causative QTN, and (i) using only causative QTN with simulated weight. Predictions were computed by pedigree-based BLUP (PBLUP) and ssGBLUP. Raw GRM were blended with 1 or 5% of the numerator relationship matrix, or 1% of the identity matrix. Inverses of GRM were obtained directly or with APY. Accuracy of breeding values for 5000 genotyped animals in the last generation with PBLUP was 0.32, and for ssGBLUP it increased to 0.49 with an unweighted GRM, 0.53 after adding unweighted QTN, 0.63 when QTN weights were estimated, and 0.89 when QTN weights were based on true effects known from the simulation. When the GRM was constructed from causative QTN only, accuracy was 0.95 and 0.99 with blending at 5 and 1%, respectively. Accuracies simulating 1000 QTN were generally lower, with a similar trend. 
    Accuracies using the APY inverse were equal to or higher than those with a regular inverse. Single-step GBLUP can account for causative QTN via a weighted GRM. Accuracy gains are maximum when variances of causative QTN are known and blending is at 1%.
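    To make the "raw GRM" and "blending" steps in this record concrete, the sketch below builds an unweighted genomic relationship matrix and blends it with 1% of the identity matrix. It assumes the widely used VanRaden-style formula G = ZZ' / (2 Σ p(1 − p)), which the abstract does not name, and uses hypothetical function names and toy genotypes.

```python
def genomic_relationship_matrix(genotypes):
    """Unweighted GRM from 0/1/2 allele counts (rows: animals, columns: SNPs).

    Assumed formula: G = Z Z' / (2 * sum(p * (1 - p))), with Z = M - 2p.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic; GRM is undefined")
    centered = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / scale
         for k in range(n)]
        for i in range(n)
    ]


def blend(grm, other, weight):
    """Blended matrix (1 - weight) * G + weight * other."""
    return [
        [(1 - weight) * g + weight * o for g, o in zip(g_row, o_row)]
        for g_row, o_row in zip(grm, other)
    ]


# Hypothetical toy data: three animals, three SNPs.
toy_genotypes = [[0, 1, 2], [1, 1, 0], [2, 1, 1]]
raw_grm = genomic_relationship_matrix(toy_genotypes)
identity = [[1.0 if i == k else 0.0 for k in range(3)] for i in range(3)]
blended_grm = blend(raw_grm, identity, 0.01)
```

    Blending pulls the matrix slightly toward a better-conditioned one so that it can be inverted; passing a pedigree-based relationship matrix as `other` would correspond to the 1% or 5% numerator-relationship blending described in the abstract.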

  16. ETHNOPRED: a novel machine learning method for accurate continental and sub-continental ancestry identification and population stratification correction.

    PubMed

    Hajiloo, Mohsen; Sapkota, Yadav; Mackey, John R; Robson, Paula; Greiner, Russell; Damaraju, Sambasivarao

    2013-02-22

    Population stratification is a systematic difference in allele frequencies between subpopulations. This can lead to spurious association findings in the case-control genome wide association studies (GWASs) used to identify single nucleotide polymorphisms (SNPs) associated with disease-linked phenotypes. Methods such as self-declared ancestry, ancestry informative markers, genomic control, structured association, and principal component analysis are used to assess and correct population stratification but each has limitations. We provide an alternative technique to address population stratification. We propose a novel machine learning method, ETHNOPRED, which uses the genotype and ethnicity data from the HapMap project to learn ensembles of disjoint decision trees, capable of accurately predicting an individual's continental and sub-continental ancestry. To predict an individual's continental ancestry, ETHNOPRED produced an ensemble of 3 decision trees involving a total of 10 SNPs, with 10-fold cross validation accuracy of 100% using HapMap II dataset. We extended this model to involve 29 disjoint decision trees over 149 SNPs, and showed that this ensemble has an accuracy of ≥ 99.9%, even if some of those 149 SNP values were missing. On an independent dataset, predominantly of Caucasian origin, our continental classifier showed 96.8% accuracy and improved genomic control's λ from 1.22 to 1.11. We next used the HapMap III dataset to learn classifiers to distinguish European subpopulations (North-Western vs. Southern), East Asian subpopulations (Chinese vs. Japanese), African subpopulations (Eastern vs. Western), North American subpopulations (European vs. Chinese vs. African vs. Mexican vs. Indian), and Kenyan subpopulations (Luhya vs. Maasai). 
    In these cases, ETHNOPRED produced ensembles of 3, 39, 21, 11, and 25 disjoint decision trees, respectively involving 31, 502, 526, 242 and 271 SNPs, with 10-fold cross validation accuracy of 86.5% ± 2.4%, 95.6% ± 3.9%, 95.6% ± 2.1%, 98.3% ± 2.0%, and 95.9% ± 1.5%. However, ETHNOPRED was unable to produce a classifier that can accurately distinguish Chinese in Beijing vs. Chinese in Denver. ETHNOPRED is a novel technique for producing classifiers that can identify an individual's continental and sub-continental heritage, based on a small number of SNPs. We show that its learned classifiers are simple, cost-efficient, accurate, transparent, flexible, fast, applicable to large scale GWASs, and robust to missing values.
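    The record above reports genomic control's λ falling from 1.22 to 1.11. For orientation, λ is conventionally estimated as the median of the observed 1-df chi-square association statistics divided by the median of the chi-square distribution with one degree of freedom (about 0.4549). The sketch below applies that standard definition to hypothetical statistics; it is not ETHNOPRED itself.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Genomic control lambda: observed median statistic over expected median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control_adjust(chi2_stats):
    """Divide each statistic by lambda; leave the data unchanged if lambda <= 1."""
    lam = genomic_inflation_factor(chi2_stats)
    if lam <= 1:
        return list(chi2_stats)
    return [stat / lam for stat in chi2_stats]


# Hypothetical association statistics for seven SNPs.
toy_stats = [0.1, 0.3, 0.5, 0.6, 1.2, 2.0, 4.5]
lam = genomic_inflation_factor(toy_stats)
adjusted = genomic_control_adjust(toy_stats)
```

    A λ near 1 suggests little inflation from stratification; values well above 1 indicate that test statistics are systematically too large.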

  17. Usage of mitochondrial D-loop variation to predict risk for Huntington disease.

    PubMed

    Mousavizadeh, Kazem; Rajabi, Peyman; Alaee, Mahsa; Dadgar, Sepideh; Houshmand, Massoud

    2015-08-01

    Huntington's disease (HD) is an inherited autosomal neurodegenerative disease caused by the abnormal expansion of the CAG repeats in the Huntingtin (Htt) gene. It has been proven that mitochondrial dysfunction contributes to the pathogenesis of Huntington's disease. The mitochondrial displacement loop (D-loop) is proven to accumulate mutations at a higher rate than other regions of mtDNA. Thus, we hypothesized that specific SNPs in the D-loop may contribute to the pathogenesis of Huntington's disease. In the present study, 30 patients with Huntington's disease and 463 healthy controls were evaluated for mitochondrial mutation sites within the D-loop region using the PCR-sequencing method. Sequence analysis revealed 35 variations in the HD group from Cambridge Mitochondrial Sequences. A significant difference (p < 0.05) was seen between patients and the control group in eight SNPs. Polymorphisms at C16069T, T16126C, T16189C, T16519C and C16223T were correlated with an increased risk of HD while SNPs at C16150T, T16086C and T16195C were associated with a decreased risk of Huntington's disease.

  18. The UCSC genome browser and associated tools

    PubMed Central

    Haussler, David; Kent, W. James

    2013-01-01

    The UCSC Genome Browser (http://genome.ucsc.edu) is a graphical viewer for genomic data now in its 13th year. Since the early days of the Human Genome Project, it has presented an integrated view of genomic data of many kinds. Now home to assemblies for 58 organisms, the Browser presents visualization of annotations mapped to genomic coordinates. The ability to juxtapose annotations of many types facilitates inquiry-driven data mining. Gene predictions, mRNA alignments, epigenomic data from the ENCODE project, conservation scores from vertebrate whole-genome alignments and variation data may be viewed at any scale from a single base to an entire chromosome. The Browser also includes many other widely used tools, including BLAT, which is useful for alignments from high-throughput sequencing experiments. Private data uploaded as Custom Tracks and Data Hubs in many formats may be displayed alongside the rich compendium of precomputed data in the UCSC database. The Table Browser is a full-featured graphical interface, which allows querying, filtering and intersection of data tables. The Saved Session feature allows users to store and share customized views, enhancing the utility of the system for organizing multiple trains of thought. Binary Alignment/Map (BAM), Variant Call Format and the Personal Genome Single Nucleotide Polymorphisms (SNPs) data formats are useful for visualizing a large sequencing experiment (whole-genome or whole-exome), where the differences between the data set and the reference assembly may be displayed graphically. Support for high-throughput sequencing extends to compact, indexed data formats, such as BAM, bigBed and bigWig, allowing rapid visualization of large datasets from RNA-seq and ChIP-seq experiments via local hosting. PMID:22908213

  19. The UCSC genome browser and associated tools.

    PubMed

    Kuhn, Robert M; Haussler, David; Kent, W James

    2013-03-01

    The UCSC Genome Browser (http://genome.ucsc.edu) is a graphical viewer for genomic data now in its 13th year. Since the early days of the Human Genome Project, it has presented an integrated view of genomic data of many kinds. Now home to assemblies for 58 organisms, the Browser presents visualization of annotations mapped to genomic coordinates. The ability to juxtapose annotations of many types facilitates inquiry-driven data mining. Gene predictions, mRNA alignments, epigenomic data from the ENCODE project, conservation scores from vertebrate whole-genome alignments and variation data may be viewed at any scale from a single base to an entire chromosome. The Browser also includes many other widely used tools, including BLAT, which is useful for alignments from high-throughput sequencing experiments. Private data uploaded as Custom Tracks and Data Hubs in many formats may be displayed alongside the rich compendium of precomputed data in the UCSC database. The Table Browser is a full-featured graphical interface, which allows querying, filtering and intersection of data tables. The Saved Session feature allows users to store and share customized views, enhancing the utility of the system for organizing multiple trains of thought. Binary Alignment/Map (BAM), Variant Call Format and the Personal Genome Single Nucleotide Polymorphisms (SNPs) data formats are useful for visualizing a large sequencing experiment (whole-genome or whole-exome), where the differences between the data set and the reference assembly may be displayed graphically. Support for high-throughput sequencing extends to compact, indexed data formats, such as BAM, bigBed and bigWig, allowing rapid visualization of large datasets from RNA-seq and ChIP-seq experiments via local hosting.

  20. Analyses of Hypomethylated Oil Palm Gene Space

    PubMed Central

    Jayanthi, Nagappan; Mohd-Amin, Ab Halim; Azizi, Norazah; Chan, Kuang-Lim; Maqbool, Nauman J.; Maclean, Paul; Brauning, Rudi; McCulloch, Alan; Moraga, Roger; Ong-Abdullah, Meilina; Singh, Rajinder

    2014-01-01

    Demand for palm oil has been increasing by an average of ∼8% the past decade and currently accounts for about 59% of the world's vegetable oil market. This drives the need to increase palm oil production. Nevertheless, due to the increasing need for sustainable production, it is imperative to increase productivity rather than the area cultivated. Studies on the oil palm genome are essential to help identify genes or markers that are associated with important processes or traits, such as flowering, yield and disease resistance. To achieve this, 294,115 and 150,744 sequences from the hypomethylated or gene-rich regions of Elaeis guineensis and E. oleifera genome were sequenced and assembled into contigs. An additional 16,427 shot-gun sequences and 176 bacterial artificial chromosomes (BAC) were also generated to check the quality of libraries constructed. Comparison of these sequences revealed that although the methylation-filtered libraries were sequenced at low coverage, they still tagged at least 66% of the RefSeq supported genes in the BAC and had a filtration power of at least 2.0. A total 33,752 microsatellites and 40,820 high-quality single nucleotide polymorphism (SNP) markers were identified. These represent the most comprehensive collection of microsatellites and SNPs to date and would be an important resource for genetic mapping and association studies. The gene models predicted from the assembled contigs were mined for genes of interest, and 242, 65 and 14 oil palm transcription factors, resistance genes and miRNAs were identified respectively. Examples of the transcriptional factors tagged include those associated with floral development and tissue culture, such as homeodomain proteins, MADS, Squamosa and Apetala2. The E. guineensis and E. oleifera hypomethylated sequences provide an important resource to understand the molecular mechanisms associated with important agronomic traits in oil palm. PMID:24497974

  1. Transcriptome-enabled marker discovery and mapping of plastochron-related genes in Petunia spp.

    PubMed

    Guo, Yufang; Wiegert-Rininger, Krystle E; Vallejo, Veronica A; Barry, Cornelius S; Warner, Ryan M

    2015-09-24

    Petunia (Petunia × hybrida), derived from a hybrid between P. axillaris and P. integrifolia, is one of the most economically important bedding plant crops and Petunia spp. serve as model systems for investigating the mechanisms underlying diverse mating systems and pollination syndromes. In addition, we have previously described genetic variation and quantitative trait loci (QTL) related to petunia development rate and morphology, which represent important breeding targets for the floriculture industry to improve crop production and performance. Despite the importance of petunia as a crop, the floriculture industry has been slow to adopt marker assisted selection to facilitate breeding strategies and there remains a limited availability of sequences and molecular markers from the genus compared to other economically important members of the Solanaceae family such as tomato, potato and pepper. Here we report the de novo assembly, annotation and characterization of transcriptomes from P. axillaris, P. exserta and P. integrifolia. Each transcriptome assembly was derived from five tissue libraries (callus, 3-week old seedlings, shoot apices, flowers of mixed developmental stages, and trichomes). A total of 74,573, 54,913, and 104,739 assembled transcripts were recovered from P. axillaris, P. exserta and P. integrifolia, respectively and following removal of multiple isoforms, 32,994 P. axillaris, 30,225 P. exserta, and 33,540 P. integrifolia high quality representative transcripts were extracted for annotation and expression analysis. The transcriptome data was mined for single nucleotide polymorphisms (SNP) and simple sequence repeat (SSR) markers, yielding 89,007 high quality SNPs and 2949 SSRs, respectively. 15,701 SNPs were computationally converted into user-friendly cleaved amplified polymorphic sequence (CAPS) markers and a subset of SNP and CAPS markers were experimentally verified. 
CAPS markers developed from plastochron-related homologous transcripts from P. axillaris were mapped in an interspecific Petunia population and evaluated for co-localization with QTL for development rate. The high quality of the three Petunia spp. transcriptomes coupled with the utility of the SNP data will serve as a resource for further exploration of genetic diversity within the genus and will facilitate efforts to develop genetic and physical maps to aid the identification of QTL associated with traits of interest.

  2. Development of Molecular Markers Linked to Powdery Mildew Resistance Gene Pm4b by Combining SNP Discovery from Transcriptome Sequencing Data with Bulked Segregant Analysis (BSR-Seq) in Wheat.

    PubMed

    Wu, Peipei; Xie, Jingzhong; Hu, Jinghuang; Qiu, Dan; Liu, Zhiyong; Li, Jingting; Li, Miaomiao; Zhang, Hongjun; Yang, Li; Liu, Hongwei; Zhou, Yang; Zhang, Zhongjun; Li, Hongjie

    2018-01-01

    Powdery mildew resistance gene Pm4b, originating from Triticum persicum, is effective against the prevalent Blumeria graminis f. sp. tritici (Bgt) isolates from certain regions of wheat production in China. The lack of tightly linked molecular markers with the target gene prevents the precise identification of Pm4b during the application of molecular marker-assisted selection (MAS). The strategy that combines the RNA-Seq technique and the bulked segregant analysis (BSR-Seq) was applied in an F2:3 mapping population (237 families) derived from a pair of isogenic lines VPM1/7∗Bainong 3217 F4 (carrying Pm4b) and Bainong 3217 to develop more closely linked molecular markers. RNA-Seq analysis of the two phenotypically contrasting RNA bulks prepared from the representative F2:3 families generated 20,745,939 and 25,867,480 high-quality read pairs, and 82.8 and 80.2% of them were uniquely mapped to the wheat whole genome draft assembly for the resistant and susceptible RNA bulks, respectively. Variant calling identified 283,866 raw single nucleotide polymorphisms (SNPs) and InDels between the two bulks. The SNPs that were closely associated with the powdery mildew resistance were concentrated on chromosome 2AL. Among the 84 variants that were potentially associated with the disease resistance trait, 46 variants were enriched in an about 25 Mb region at the distal end of chromosome arm 2AL. Four Pm4b-linked SNP markers were developed from these variants. Based on the sequences of Chinese Spring where these polymorphic SNPs were located, 98 SSR primer pairs were designed to develop distal markers flanking the Pm4b gene. Three SSR markers, Xics13, Xics43, and Xics76, were incorporated in the new genetic linkage map, which located Pm4b in a 3.0 cM genetic interval spanning a 6.7 Mb physical genomic region. This region had a collinear relationship with Brachypodium distachyon chromosome 5, rice chromosome 4, and sorghum chromosome 6. 
Seven genes associated with disease resistance were predicted in this collinear genomic region, which included C2 domain protein, peroxidase activity protein, protein kinases of PKc_like super family, Mlo family protein, and catalytic domain of the serine/threonine kinases (STKc_IRAK like super family). The markers developed in the present study facilitate identification of Pm4b during its MAS practice.

  3. Development of Molecular Markers Linked to Powdery Mildew Resistance Gene Pm4b by Combining SNP Discovery from Transcriptome Sequencing Data with Bulked Segregant Analysis (BSR-Seq) in Wheat

    PubMed Central

    Wu, Peipei; Xie, Jingzhong; Hu, Jinghuang; Qiu, Dan; Liu, Zhiyong; Li, Jingting; Li, Miaomiao; Zhang, Hongjun; Yang, Li; Liu, Hongwei; Zhou, Yang; Zhang, Zhongjun; Li, Hongjie

    2018-01-01

    Powdery mildew resistance gene Pm4b, originating from Triticum persicum, is effective against the prevalent Blumeria graminis f. sp. tritici (Bgt) isolates from certain regions of wheat production in China. The lack of tightly linked molecular markers with the target gene prevents the precise identification of Pm4b during the application of molecular marker-assisted selection (MAS). The strategy that combines the RNA-Seq technique and the bulked segregant analysis (BSR-Seq) was applied in an F2:3 mapping population (237 families) derived from a pair of isogenic lines VPM1/7∗Bainong 3217 F4 (carrying Pm4b) and Bainong 3217 to develop more closely linked molecular markers. RNA-Seq analysis of the two phenotypically contrasting RNA bulks prepared from the representative F2:3 families generated 20,745,939 and 25,867,480 high-quality read pairs, and 82.8 and 80.2% of them were uniquely mapped to the wheat whole genome draft assembly for the resistant and susceptible RNA bulks, respectively. Variant calling identified 283,866 raw single nucleotide polymorphisms (SNPs) and InDels between the two bulks. The SNPs that were closely associated with the powdery mildew resistance were concentrated on chromosome 2AL. Among the 84 variants that were potentially associated with the disease resistance trait, 46 variants were enriched in an about 25 Mb region at the distal end of chromosome arm 2AL. Four Pm4b-linked SNP markers were developed from these variants. Based on the sequences of Chinese Spring where these polymorphic SNPs were located, 98 SSR primer pairs were designed to develop distal markers flanking the Pm4b gene. Three SSR markers, Xics13, Xics43, and Xics76, were incorporated in the new genetic linkage map, which located Pm4b in a 3.0 cM genetic interval spanning a 6.7 Mb physical genomic region. This region had a collinear relationship with Brachypodium distachyon chromosome 5, rice chromosome 4, and sorghum chromosome 6. 
Seven genes associated with disease resistance were predicted in this collinear genomic region, which included C2 domain protein, peroxidase activity protein, protein kinases of PKc_like super family, Mlo family protein, and catalytic domain of the serine/threonine kinases (STKc_IRAK like super family). The markers developed in the present study facilitate identification of Pm4b during its MAS practice. PMID:29491869

  4. Genome-wide admixture and association study of subclinical atherosclerosis in the Women’s Interagency HIV Study (WIHS)

    PubMed Central

    Shendre, Aditi; Wiener, Howard W.; Irvin, Marguerite R.; Aouizerat, Bradley E.; Overton, Edgar T.; Lazar, Jason; Liu, Chenglong; Hodis, Howard N.; Limdi, Nita A.; Weber, Kathleen M.; Zhi, Degui; Floris-Moore, Michelle A.; Ofotokun, Ighovwerha; Qi, Qibin; Hanna, David B.; Kaplan, Robert C.

    2017-01-01

    Cardiovascular disease (CVD) is a major comorbidity among HIV-infected individuals. Common carotid artery intima-media thickness (cCIMT) is a valid and reliable subclinical measure of atherosclerosis and is known to predict CVD. We performed genome-wide association (GWA) and admixture analysis among 682 HIV-positive and 288 HIV-negative Black, non-Hispanic women from the Women’s Interagency HIV study (WIHS) cohort using a combined and stratified analysis approach. We found some suggestive associations but none of the SNPs reached genome-wide statistical significance in our GWAS analysis. The top GWAS SNPs were rs2280828 in the region intergenic to mediator complex subunit 30 and exostosin glycosyltransferase 1 (MED30 | EXT1) among all women, rs2907092 in the catenin delta 2 (CTNND2) gene among HIV-positive women, and rs7529733 in the region intergenic to family with sequence similarity 5, member C and regulator of G-protein signaling 18 (FAM5C | RGS18) genes among HIV-negative women. The most significant local European ancestry associations were in the region intergenic to the zinc finger and SCAN domain containing 5D gene and NADH: ubiquinone oxidoreductase complex assembly factor 1 (ZSCAN5D | NDUF1) pseudogene on chromosome 19 among all women, in the region intergenic to vomeronasal 1 receptor 6 pseudogene and zinc finger protein 845 (VN1R6P | ZNF845) gene on chromosome 19 among HIV-positive women, and in the region intergenic to the SEC23-interacting protein and phosphatidic acid phosphatase type 2 domain containing 1A (SEC23IP | PPAPDC1A) genes located on chromosome 10 among HIV-negative women. A number of previously identified SNP associations with cCIMT were also observed and included rs2572204 in the ryanodine receptor 3 (RYR3) and an admixture region in the secretion-regulating guanine nucleotide exchange factor (SERGEF) gene. 
We report several SNPs and gene regions in the GWAS and admixture analysis, some of which are common across HIV-positive and HIV-negative women as demonstrated using meta-analysis, and also across the two analytic approaches (i.e., GWA and admixture). These findings suggest that local European ancestry plays an important role in genetic associations of cCIMT among black women from WIHS along with other environmental factors that are related to CVD and may also be triggered by HIV. These findings warrant confirmation in independent samples. PMID:29206233

  5. Association of Functional SNPs in Pig Calpastatin Regulatory Regions with Tenderness

    USDA-ARS's Scientific Manuscript database

    The identification of predictive DNA markers for pork quality would allow U.S. pork producers and breeders to more quickly and efficiently select genetically superior animals for production of consistent, high quality meat. Genome scans have identified QTL for tenderness on pig chromosome 2 which ha...

  6. Genetic risk prediction and neurobiological understanding of alcoholism.

    PubMed

    Levey, D F; Le-Niculescu, H; Frank, J; Ayalew, M; Jain, N; Kirlin, B; Learman, R; Winiger, E; Rodd, Z; Shekhar, A; Schork, N; Kiefer, F; Kiefe, F; Wodarz, N; Müller-Myhsok, B; Dahmen, N; Nöthen, M; Sherva, R; Farrer, L; Smith, A H; Kranzler, H R; Rietschel, M; Gelernter, J; Niculescu, A B

    2014-05-20

    We have used a translational Convergent Functional Genomics (CFG) approach to discover genes involved in alcoholism, by gene-level integration of genome-wide association study (GWAS) data from a German alcohol dependence cohort with other genetic and gene expression data, from human and animal model studies, similar to our previous work in bipolar disorder and schizophrenia. A panel of all the nominally significant P-value single-nucleotide polymorphisms (SNPs) in the top candidate genes discovered by CFG (n=135 genes, 713 SNPs) was used to generate a genetic risk prediction score (GRPS), which showed a trend towards significance (P=0.053) in separating alcohol-dependent individuals from controls in an independent German test cohort. We then validated and prioritized our top findings from this discovery work, and subsequently tested them in three independent cohorts, from two continents. In order to validate and prioritize the key genes that drive behavior without some of the pleiotropic environmental confounds present in humans, we used a stress-reactive animal model of alcoholism developed by our group, the D-box binding protein (DBP) knockout mouse, consistent with the surfeit of stress theory of addiction proposed by Koob and colleagues. A much smaller panel (n=11 genes, 66 SNPs) of the top CFG-discovered genes for alcoholism, cross-validated and prioritized by this stress-reactive animal model showed better predictive ability in the independent German test cohort (P=0.041). 
The top CFG scoring gene for alcoholism from the initial discovery step, synuclein alpha (SNCA) remained the top gene after the stress-reactive animal model cross-validation. We also tested this small panel of genes in two other independent test cohorts from the United States, one with alcohol dependence (P=0.00012) and one with alcohol abuse (a less severe form of alcoholism; P=0.0094). SNCA by itself was able to separate alcoholics from controls in the alcohol-dependent cohort (P=0.000013) and the alcohol abuse cohort (P=0.023). So did eight other genes from the panel of 11 genes taken individually, albeit to a lesser extent and/or less broadly across cohorts. SNCA, GRM3 and MBP survived strict Bonferroni correction for multiple comparisons. Taken together, these results suggest that our stress-reactive DBP animal model helped to validate and prioritize from the CFG-discovered genes some of the key behaviorally relevant genes for alcoholism. These genes fall into a series of biological pathways involved in signal transduction, transmission of nerve impulse (including myelination) and cocaine addiction. Overall, our work provides leads towards a better understanding of illness, diagnostics and therapeutics, including treatment with omega-3 fatty acids. We also examined the overlap between the top candidate genes for alcoholism from this work and the top candidate genes for bipolar disorder, schizophrenia, anxiety from previous CFG analyses conducted by us, as well as cross-tested genetic risk predictions. This revealed the significant genetic overlap with other major psychiatric disorder domains, providing a basis for comorbidity and dual diagnosis, and placing alcohol use in the broader context of modulating the mental landscape.

  7. Small RNA-based prediction of hybrid performance in maize.

    PubMed

    Seifert, Felix; Thiemann, Alexander; Schrag, Tobias A; Rybka, Dominika; Melchinger, Albrecht E; Frisch, Matthias; Scholten, Stefan

    2018-05-21

    Small RNA (sRNA) sequences are known to have a broad impact on gene regulation by various mechanisms. Their performance for the prediction of hybrid traits has not yet been analyzed. Our objective was to analyze the relation of parental sRNA expression with the performance of their hybrids, to develop an sRNA-based prediction approach, and to compare it to more common SNP and mRNA transcript based predictions using a factorial mating scheme of a maize hybrid breeding program. Correlation of genomic differences and messenger RNA (mRNA) or sRNA expression differences between parental lines with the performance of their hybrids revealed that sRNAs showed an inverse relationship in contrast to the other two data types. We associated differences for SNPs, mRNA and sRNA expression between parental inbred lines with the performance of their hybrid combinations and developed two prediction approaches using distance measures based on associated markers. Cross-validations revealed parental differences in sRNA expression to be strong predictors for hybrid performance for grain yield in maize, comparable to genomic and mRNA data. The integration of both positively and negatively associated markers in the prediction approaches enhanced the prediction accuracy. The associated sRNAs belong predominantly to the canonical size classes of 22- and 24-nt that show specific genomic mapping characteristics. Expression profiles of sRNA are a promising alternative to SNPs or mRNA expression profiles for hybrid prediction, especially for plant species without reference genome or transcriptome information. The characteristics of the sRNAs we identified suggest that association studies based on breeding populations facilitate the identification of sRNAs involved in hybrid performance.

  8. Genetic Variation in the Transforming Growth Factor-β Signaling Pathway and Survival After Diagnosis With Colon and Rectal Cancer

    PubMed Central

    Slattery, Martha L.; Lundgreen, Abbie; Herrick, Jennifer S.; Wolff, Roger K.; Caan, Bette J.

    2012-01-01

    BACKGROUND The transforming growth factor-β (TGF-β) signaling pathway is involved in many aspects of tumorigenesis, including angiogenesis and metastasis. The authors evaluated this pathway in association with survival after a diagnosis of colon or rectal cancer. METHODS The study included 1553 patients with colon cancer and 754 patients with rectal cancer who had incident first primary disease and were followed for a minimum of 7 years after diagnosis. Genetic variations were evaluated in the genes TGF-β1 (2 single nucleotide polymorphisms [SNPs]), TGF-β receptor 1 (TGF-βR1) (3 SNPs), smooth muscle actin/mothers against decapentaplegic homolog 1 (Smad1) (5 SNPs), Smad2 (4 SNPs), Smad3 (37 SNPs), Smad4 (2 SNPs), Smad7 (11 SNPs), bone morphogenetic protein 1 (BMP1) (11 SNPs), BMP2 (5 SNPs), BMP4 (3 SNPs), bone morphogenetic protein receptor 1A (BMPR1A) (9 SNPs), BMPR1B (21 SNPs), BMPR2 (11 SNPs), growth differentiation factor 10 (GDF10) (7 SNPs), Runt-related transcription factor 1 (RUNX1) (40 SNPs), RUNX2 (19 SNPs), RUNX3 (9 SNPs), eukaryotic translation initiation factor 4E (eiF4E) (3 SNPs), eukaryotic translation initiation factor 4E-binding protein 3 (eiF4EBP2) (2 SNPs), eiF4EBP3 (2 SNPs), and mitogen-activated protein kinase 1 (MAPK1) (6 SNPs). RESULTS After adjusting for American Joint Committee on Cancer stage and tumor molecular phenotype, 12 genes and 18 SNPs were associated with survival in patients with colon cancer, and 7 genes and 15 tagSNPs were associated with survival after a diagnosis of rectal cancer. A summary score based on “at-risk” genotypes revealed a hazard rate ratio of 5.10 (95% confidence interval, 2.56-10.15) for the group with the greatest number of “at-risk” genotypes; for rectal cancer, the hazard rate ratio was 6.03 (95% confidence interval, 2.83-12.75). 
CONCLUSIONS The current findings suggest that the presence of several higher risk alleles in the TGF-β signaling pathway increases the likelihood of dying after a diagnosis of colon or rectal cancer. PMID:21365634

  9. Warfarin pharmacogenetics: a single VKORC1 polymorphism is predictive of dose across 3 racial groups.

    PubMed

    Limdi, Nita A; Wadelius, Mia; Cavallari, Larisa; Eriksson, Niclas; Crawford, Dana C; Lee, Ming-Ta M; Chen, Chien-Hsiun; Motsinger-Reif, Alison; Sagreiya, Hersh; Liu, Nianjun; Wu, Alan H B; Gage, Brian F; Jorgensen, Andrea; Pirmohamed, Munir; Shin, Jae-Gook; Suarez-Kurtz, Guilherme; Kimmel, Stephen E; Johnson, Julie A; Klein, Teri E; Wagner, Michael J

    2010-05-06

    Warfarin-dosing algorithms incorporating CYP2C9 and VKORC1 -1639G>A improve dose prediction compared with algorithms based solely on clinical and demographic factors. However, these algorithms better capture dose variability among whites than Asians or blacks. Herein, we evaluate whether other VKORC1 polymorphisms and haplotypes explain additional variation in warfarin dose beyond that explained by VKORC1 -1639G>A among Asians (n = 1103), blacks (n = 670), and whites (n = 3113). Participants were recruited from 11 countries as part of the International Warfarin Pharmacogenetics Consortium effort. Evaluation of the effects of individual VKORC1 single nucleotide polymorphisms (SNPs) and haplotypes on warfarin dose used both univariate and multivariable linear regression. VKORC1 -1639G>A and 1173C>T individually explained the greatest variance in dose in all 3 racial groups. Incorporation of additional VKORC1 SNPs or haplotypes did not further improve dose prediction. VKORC1 explained greater variability in dose among whites than blacks and Asians. Differences in the percentage of variance in dose explained by VKORC1 across race were largely accounted for by the frequency of the -1639A (or 1173T) allele. Thus, clinicians should recognize that, although at a population level the contribution of VKORC1 toward dose requirements is higher in whites than in nonwhites, genotype predicts similar dose requirements across racial groups.

  10. Candidate SNP markers of aggressiveness-related complications and comorbidities of genetic diseases are predicted by a significant change in the affinity of TATA-binding protein for human gene promoters.

    PubMed

    Chadaeva, Irina V; Ponomarenko, Mikhail P; Rasskazov, Dmitry A; Sharypova, Ekaterina B; Kashina, Elena V; Matveeva, Marina Yu; Arshinova, Tatjana V; Ponomarenko, Petr M; Arkova, Olga V; Bondar, Natalia P; Savinkova, Ludmila K; Kolchanov, Nikolay A

    2016-12-28

    Aggressiveness in humans is a hereditary behavioral trait that mobilizes all systems of the body (first of all, the nervous and endocrine systems, and then the respiratory, vascular, muscular, and others), e.g., for the defense of oneself, children, family, shelter, territory, and other possessions as well as personal interests. The level of aggressiveness of a person determines many other characteristics of quality of life and lifespan, acting as a stress factor. Aggressive behavior depends on many parameters such as age, gender, diseases and treatment, diet, and environmental conditions. Among them, genetic factors are believed to be the main parameters that are well-studied at the factual level, but in actuality, genome-wide studies of aggressive behavior appeared relatively recently. One of the biggest projects of modern science, 1000 Genomes, involves identification of single nucleotide polymorphisms (SNPs), i.e., differences of individual genomes from the reference genome. SNPs can be associated with hereditary diseases, their complications, comorbidities, and responses to stress or a drug. Clinical comparisons between cohorts of patients and healthy volunteers (as a control) allow for identifying SNPs whose allele frequencies significantly separate them from one another as markers of the above conditions. Computer-based preliminary analysis of millions of SNPs detected by the 1000 Genomes project can accelerate clinical search for SNP markers due to preliminary whole-genome search for the most meaningful candidate SNP markers and discarding of neutral and poorly substantiated SNPs. Here, we combine two computer-based search methods for SNPs (that alter gene expression): (i) Web service SNP_TATA_Comparator (DNA sequence analysis) and (ii) PubMed-based manual search for articles on aggressiveness using heuristic keywords. 
Near the known binding sites for TATA-binding protein (TBP) in human gene promoters, we found aggressiveness-related candidate SNP markers, including rs1143627 (associated with higher aggressiveness in patients undergoing cytokine immunotherapy), rs544850971 (higher aggressiveness in old women taking lipid-lowering medication), and rs10895068 (childhood aggressiveness-related obesity in adolescence with cardiovascular complications in adulthood). After validation of these candidate markers by clinical protocols, these SNPs may become useful for physicians (may help to improve treatment of patients) and for the general population (a lifestyle choice preventing aggressiveness-related complications).

  11. Transcriptome analysis of tube foot and large scale marker discovery in sea cucumber, Apostichopus japonicus.

    PubMed

    Zhou, Xiaoxu; Wang, Hongdi; Cui, Jun; Qiu, Xuemei; Chang, Yaqing; Wang, Xiuli

    2016-12-01

    The tube foot, one of the ambulacral appendage types in aspidochirote holothurioids, is known for its functions in locomotion, feeding, chemoreception, light sensitivity and respiration. In this study, we explored the characteristics of the transcriptome in the tube foot of the sea cucumber (Apostichopus japonicus). Our results showed that among 390 unigenes that were specifically expressed in the tube foot, 190 were annotated. Based on the assembled transcriptome, we found 219,860 SNPs from 34,749 unigenes, of which 97,683, 53,624, 27,767 and 40,786 were located in CDSs, 5'-UTRs, 3'-UTRs and non-CDS regions, respectively. Furthermore, 12,114 SSRs were detected from 7394 unigenes. Target genes of four specifically expressed miRNAs (miR-29a, miR-29b, miR-278-3p and miR-2005) in the tube foot were also predicted based on the transcriptome; these include immune-related factors (MBL, VLRA, AjC3, MyD88, CFB), a skin pigmentation factor (MITF), a candidate regeneration factor (TRP) and a holothurian autolysis-related factor (CL). These results provide a relatively large number of molecular markers and transcriptome resources, and will provide a foundation for further analyses of the function and molecular mechanisms underlying the A. japonicus tube foot. Copyright © 2016 Elsevier Inc. All rights reserved.

  12. Harnessing NGS and Big Data Optimally: Comparison of miRNA Prediction from Assembled versus Non-assembled Sequencing Data--The Case of the Grass Aegilops tauschii Complex Genome.

    PubMed

    Budak, Hikmet; Kantar, Melda

    2015-07-01

    MicroRNAs (miRNAs) are small, endogenous, non-coding RNA molecules that regulate gene expression at the post-transcriptional level. As high-throughput next generation sequencing (NGS) and Big Data rapidly accumulate for various species, efforts for in silico identification of miRNAs intensify. Surprisingly, the effect of the input genomics sequence on the robustness of miRNA prediction was not evaluated in detail to date. In the present study, we performed a homology-based miRNA and isomiRNA prediction of the 5D chromosome of bread wheat progenitor, Aegilops tauschii, using two distinct sequence data sets as input: (1) raw sequence reads obtained from 454-GS FLX Titanium sequencing platform and (2) an assembly constructed from these reads. We also compared this method with a number of available plant sequence datasets. We report here the identification of 62 and 22 miRNAs from raw reads and the assembly, respectively, of which 16 were predicted with high confidence from both datasets. While raw reads promoted sensitivity with the high number of miRNAs predicted, 55% (12 out of 22) of the assembly-based predictions were supported by previous observations, bringing specificity forward compared to the read-based predictions, of which only 37% were supported. Importantly, raw reads could identify several repeat-related miRNAs that could not be detected with the assembly. However, raw reads could not capture 6 miRNAs, for which the stem-loops could only be covered by the relatively longer sequences from the assembly. In summary, the comparison of miRNA datasets obtained by these two strategies revealed that utilization of raw reads, as well as assemblies for in silico prediction, have distinct advantages and disadvantages. Consideration of these important nuances can benefit future miRNA identification efforts in the current age of NGS and Big Data driven life sciences innovation.
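
    The counts reported in this abstract can be cross-checked with simple set arithmetic. The sketch below is only an illustration: the function name `overlap_summary` is hypothetical, and it assumes that the 16 miRNAs "predicted with high confidence from both datasets" form the full intersection of the two prediction sets.

```python
def overlap_summary(reads_total, assembly_total, shared):
    """Split two prediction sets into exclusive and shared parts.

    Assumes `shared` is the complete intersection of both sets
    (an assumption made for this illustration).
    """
    reads_only = reads_total - shared
    assembly_only = assembly_total - shared
    return {
        "reads_only": reads_only,
        "assembly_only": assembly_only,
        "shared": shared,
        "union": reads_only + assembly_only + shared,
    }


# Counts taken from the abstract: 62 (raw reads), 22 (assembly), 16 (both).
summary = overlap_summary(reads_total=62, assembly_total=22, shared=16)

# Share of assembly-based predictions supported by previous observations.
supported_pct = round(100 * 12 / 22)

print(summary)
print(f"{supported_pct}% of assembly-based predictions supported")
```

    Under that assumption, the assembly-only count equals the 6 miRNAs that the abstract says raw reads could not capture, and 12 of 22 rounds to the reported 55%.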

  13. SNP identification in FBXO32 gene and their associations with growth traits in cattle.

    PubMed

    Wang, Ailan; Zhang, Ya; Li, Mijie; Lan, Xianyong; Wang, Juqiang; Chen, Hong

    2013-02-15

    The F-box protein 32 (FBXO32), also known as Atrogin-1, is one of the four subunits of the ubiquitin protein ligase complex. FBXO32 has been previously shown to be involved in regulation of initiation and development of muscle mass. In the present study, we investigated the polymorphism of FBXO32 gene in 1313 cattle from seven bovine breeds using DNA sequencing, polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) and PCR-based amplification-created restriction site (PCR-ACRS) methods. Four novel single nucleotide polymorphisms (SNPs) were identified within bovine FBXO32, and were deposited in the GenBank database. The association studies between these four SNPs and growth traits were performed in NanYang cattle. Notably, the SNPs ss411628932 and ss411628936 were shown to be significantly associated with body length of 24-month-old NanYang cattle. Based on the above four SNPs, 16 haplotypes were identified. The main haplotype was AATA, which occurred at a frequency of more than 40%. Additionally, phylogenetic analysis showed that geographical distance was essential to gene flow among seven cattle breeds. Indigenous bovine breeds displayed genetic difference in comparison to hybrid bovine breeds that have foreign origins. We herein describe for the first time a comprehensive study on the variability of bovine FBXO32 gene that is predictive of genetic potential for body length phenotype. Copyright © 2012 Elsevier B.V. All rights reserved.
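    The figure of 16 haplotypes is what four two-allele SNPs would yield if every combination occurred (2^4 = 16). The sketch below enumerates that space. The allele pairs are hypothetical: the abstract gives only the main haplotype AATA and does not state that each SNP is biallelic, so both points are assumptions of this example.

```python
from itertools import product

# Hypothetical allele pairs for four biallelic SNPs. Only the first allele
# of each pair (A, A, T, A) is taken from the reported main haplotype;
# the alternative alleles are placeholders.
snp_alleles = [("A", "G"), ("A", "C"), ("T", "C"), ("A", "G")]

# Every combination of one allele per SNP is a candidate haplotype.
haplotypes = ["".join(combo) for combo in product(*snp_alleles)]

print(len(haplotypes))        # number of possible haplotypes
print("AATA" in haplotypes)   # the reported main haplotype is among them
```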

  14. A Comparison Between Genotyping-by-sequencing and Array-based Scoring of SNPs for Genomic Prediction Accuracy in Winter Wheat

    USDA-ARS's Scientific Manuscript database

    The utilization of DNA molecular markers in plant breeding to maximize selection response via marker assisted selection (MAS) and genomic selection (GS) has the potential to revolutionize plant breeding. A key factor affecting GS applicability is the choice of molecular marker platform. Genotypying-...

  15. Assembly of a phased diploid Candida albicans genome facilitates allele-specific measurements and provides a simple model for repeat and indel structure

    PubMed Central

    2013-01-01

    Background Candida albicans is a ubiquitous opportunistic fungal pathogen that afflicts immunocompromised human hosts. With rare and transient exceptions the yeast is diploid, yet despite its clinical relevance the respective sequences of its two homologous chromosomes have not been completely resolved. Results We construct a phased diploid genome assembly by deep sequencing a standard laboratory wild-type strain and a panel of strains homozygous for particular chromosomes. The assembly has 700-fold coverage on average, allowing extensive revision and expansion of the number of known SNPs and indels. This phased genome significantly enhances the sensitivity and specificity of allele-specific expression measurements by enabling pooling and cross-validation of signal across multiple polymorphic sites. Additionally, the diploid assembly reveals pervasive and unexpected patterns in allelic differences between homologous chromosomes. Firstly, we see striking clustering of indels, concentrated primarily in the repeat sequences in promoters. Secondly, both indels and their repeat-sequence substrate are enriched near replication origins. Finally, we reveal an intimate link between repeat sequences and indels, which argues that repeat length is under selective pressure for most eukaryotes. This connection is described by a concise one-parameter model that explains repeat-sequence abundance in C. albicans as a function of the indel rate, and provides a general framework to interpret repeat abundance in species ranging from bacteria to humans. Conclusions The phased genome assembly and insights into repeat plasticity will be valuable for better understanding allele-specific phenomena and genome evolution. PMID:24025428

  16. Genetic analysis of ancestry, admixture and selection in Bolivian and Totonac populations of the New World

    PubMed Central

    2012-01-01

    Background Populations of the Americas were founded by early migrants from Asia, and some have experienced recent genetic admixture. To better characterize the native and non-native ancestry components in populations from the Americas, we analyzed 815,377 autosomal SNPs, mitochondrial hypervariable segments I and II, and 36 Y-chromosome STRs from 24 Mesoamerican Totonacs and 23 South American Bolivians. Results and Conclusions We analyzed common genomic regions from native Bolivian and Totonac populations to identify 324 highly predictive Native American ancestry informative markers (AIMs). As few as 40–50 of these AIMs perform nearly as well as large panels of random genome-wide SNPs for predicting and estimating Native American ancestry and admixture levels. These AIMs have greater New World vs. Old World specificity than previous AIMs sets. We identify highly-divergent New World SNPs that coincide with high-frequency haplotypes found at similar frequencies in all populations examined, including the HGDP Pima, Maya, Colombian, Karitiana, and Surui American populations. Some of these regions are potential candidates for positive selection. European admixture in the Bolivian sample is approximately 12%, though individual estimates range from 0–48%. We estimate that the admixture occurred ~360–384 years ago. Little evidence of European or African admixture was found in Totonac individuals. Bolivians with pre-Columbian mtDNA and Y-chromosome haplogroups had 5–30% autosomal European ancestry, demonstrating the limitations of Y-chromosome and mtDNA haplogroups and the need for autosomal ancestry informative markers for assessing ancestry in admixed populations. PMID:22606979

  17. Molecular proxies for climate maladaptation in a long-lived tree (Pinus pinaster Aiton, Pinaceae).

    PubMed

    Jaramillo-Correa, Juan-Pablo; Rodríguez-Quilón, Isabel; Grivet, Delphine; Lepoittevin, Camille; Sebastiani, Federico; Heuertz, Myriam; Garnier-Géré, Pauline H; Alía, Ricardo; Plomion, Christophe; Vendramin, Giovanni G; González-Martínez, Santiago C

    2015-03-01

    Understanding adaptive genetic responses to climate change is a main challenge for preserving biological diversity. Successful predictive models for climate-driven range shifts of species depend on the integration of information on adaptation, including that derived from genomic studies. Long-lived forest trees can experience substantial environmental change across generations, which results in a much more prominent adaptation lag than in annual species. Here, we show that candidate-gene SNPs (single nucleotide polymorphisms) can be used as predictors of maladaptation to climate in maritime pine (Pinus pinaster Aiton), an outcrossing long-lived keystone tree. A set of 18 SNPs potentially associated with climate, 5 of them involving amino acid-changing variants, were retained after performing logistic regression, latent factor mixed models, and Bayesian analyses of SNP-climate correlations. These relationships identified temperature as an important adaptive driver in maritime pine and highlighted that selective forces are operating differentially in geographically discrete gene pools. The frequency of the locally advantageous alleles at these selected loci was strongly correlated with survival in a common garden under extreme (hot and dry) climate conditions, which suggests that candidate-gene SNPs can be used to forecast the likely destiny of natural forest ecosystems under climate change scenarios. Differential levels of forest decline are anticipated for distinct maritime pine gene pools. Geographically defined molecular proxies for climate adaptation will thus critically enhance the predictive power of range-shift models and help establish mitigation measures for long-lived keystone forest trees in the face of impending climate change. Copyright © 2015 by the Genetics Society of America.

  18. Evidence of genomic adaptation to climate in Eucalyptus microcarpa: Implications for adaptive potential to projected climate change.

    PubMed

    Jordan, Rebecca; Hoffmann, Ary A; Dillon, Shannon K; Prober, Suzanne M

    2017-11-01

    Understanding whether populations can adapt in situ or whether interventions are required is of key importance for biodiversity management under climate change. Landscape genomics is becoming an increasingly important and powerful tool for rapid assessments of climate adaptation, especially in long-lived species such as trees. We investigated climate adaptation in Eucalyptus microcarpa using the DArTseq genomic approach. A combination of FST outlier and environmental association analyses were performed using >4200 genomewide single nucleotide polymorphisms (SNPs) from 26 populations spanning climate gradients in southeastern Australia. Eighty-one SNPs were identified as putatively adaptive, based on significance in FST outlier tests and significant associations with one or more climate variables related to temperature (70/81), aridity (37/81) or precipitation (35/81). Adaptive SNPs were located on all 11 chromosomes, with no particular region associated with individual climate variables. Climate adaptation appeared to be characterized by subtle shifts in allele frequencies, with no consistent fixed differences identified. Based on these associations, we predict adaptation under projected changes in climate will include a suite of shifts in allele frequencies. Whether this can occur sufficiently rapidly through natural selection within populations, or would benefit from assisted gene migration, requires further evaluation. In some populations, the absence or predicted increases to near fixation of particular adaptive alleles hint at potential limits to adaptive capacity. Together, these results reinforce the importance of standing genetic variation at the geographic level for maintaining species' evolutionary potential. © 2017 John Wiley & Sons Ltd.

  19. Molecular Proxies for Climate Maladaptation in a Long-Lived Tree (Pinus pinaster Aiton, Pinaceae)

    PubMed Central

    Jaramillo-Correa, Juan-Pablo; Rodríguez-Quilón, Isabel; Grivet, Delphine; Lepoittevin, Camille; Sebastiani, Federico; Heuertz, Myriam; Garnier-Géré, Pauline H.; Alía, Ricardo; Plomion, Christophe; Vendramin, Giovanni G.; González-Martínez, Santiago C.

    2015-01-01

    Understanding adaptive genetic responses to climate change is a main challenge for preserving biological diversity. Successful predictive models for climate-driven range shifts of species depend on the integration of information on adaptation, including that derived from genomic studies. Long-lived forest trees can experience substantial environmental change across generations, which results in a much more prominent adaptation lag than in annual species. Here, we show that candidate-gene SNPs (single nucleotide polymorphisms) can be used as predictors of maladaptation to climate in maritime pine (Pinus pinaster Aiton), an outcrossing long-lived keystone tree. A set of 18 SNPs potentially associated with climate, 5 of them involving amino acid-changing variants, were retained after performing logistic regression, latent factor mixed models, and Bayesian analyses of SNP–climate correlations. These relationships identified temperature as an important adaptive driver in maritime pine and highlighted that selective forces are operating differentially in geographically discrete gene pools. The frequency of the locally advantageous alleles at these selected loci was strongly correlated with survival in a common garden under extreme (hot and dry) climate conditions, which suggests that candidate-gene SNPs can be used to forecast the likely destiny of natural forest ecosystems under climate change scenarios. Differential levels of forest decline are anticipated for distinct maritime pine gene pools. Geographically defined molecular proxies for climate adaptation will thus critically enhance the predictive power of range-shift models and help establish mitigation measures for long-lived keystone forest trees in the face of impending climate change. PMID:25549630

  20. Germline Polymorphisms of the VEGF Pathway Predict Recurrence in Nonadvanced Differentiated Thyroid Cancer.

    PubMed

    Marotta, Vincenzo; Sciammarella, Concetta; Capasso, Mario; Testori, Alessandro; Pivonello, Claudia; Chiofalo, Maria Grazia; Gambardella, Claudio; Grasso, Marica; Antonino, Antonio; Annunziata, Annamaria; Macchia, Paolo Emidio; Pivonello, Rosario; Santini, Luigi; Botti, Gerardo; Losito, Simona; Pezzullo, Luciano; Colao, Annamaria; Faggiano, Antongiulio

    2017-02-01

    Tumor angiogenesis is determined by host genetic background rather than environment. Germline single nucleotide polymorphisms (SNPs) of the vascular endothelial growth factor (VEGF) pathway have demonstrated prognostic value in different tumors. Our main objective was to test the prognostic value of germline SNPs of the VEGF pathway in nonadvanced differentiated thyroid cancer (DTC). Secondarily, we sought to correlate analyzed SNPs with microvessel density (MVD). Multicenter, retrospective, observational study. Four referral centers. Blood samples were obtained from consecutive DTC patients. Genotyping was performed according to the TaqMan protocol, including 4 VEGF-A (-2578C>A, -460T>C, +405G>C, and +936C>T) and 2 VEGFR-2 (+1192 C>T and +1719 T>A) SNPs. MVD was estimated by means of CD34 staining. Rate of recurrent structural disease/disease-free survival (DFS). Difference in MVD between tumors from patients with different genotype. Two hundred four patients with stage I-II DTC (mean follow-up, 73 ± 64 months) and 240 patients with low- to intermediate-risk DTC (mean follow-up, 70 ± 60 months) were enrolled. Two "risk" genotypes were identified by combining VEGF-A SNPs -2578 C>A, -460 T>C, and +405 G>C. The ACG homozygous genotype was protective in both stage I-II (odds ratio [OR], 0.08; 95% confidence interval [CI], 0.01 to 1.43; P = 0.018) and low- to intermediate-risk (OR, 0.14; 95% CI, 0.01 to 1.13; P = 0.035) patients. The CTG homozygous genotype was significantly associated with recurrence in stage I-II (OR, 5.47; 95% CI, 1.15 to 26.04; P = 0.018) and was slightly deleterious in low- to intermediate-risk (OR, 3.39; 95% CI, 0.8 to 14.33; P = 0.079) patients. MVD of primary tumors from patients harboring a protective genotype was significantly lower (median MVD, 76.5 ± 12.7 and 86.7 ± 27.9, respectively; P = 0.024). Analysis of germline VEGF-A SNPs could empower a prognostic approach to DTC. Copyright © 2017 by the Endocrine Society

  1. Genetic Risk Score Mendelian Randomization Shows that Obesity Measured as Body Mass Index, but not Waist:Hip Ratio, Is Causal for Endometrial Cancer.

    PubMed

    Painter, Jodie N; O'Mara, Tracy A; Marquart, Louise; Webb, Penelope M; Attia, John; Medland, Sarah E; Cheng, Timothy; Dennis, Joe; Holliday, Elizabeth G; McEvoy, Mark; Scott, Rodney J; Ahmed, Shahana; Healey, Catherine S; Shah, Mitul; Gorman, Maggie; Martin, Lynn; Hodgson, Shirley V; Beckmann, Matthias W; Ekici, Arif B; Fasching, Peter A; Hein, Alexander; Rübner, Matthias; Czene, Kamila; Darabi, Hatef; Hall, Per; Li, Jingmei; Dörk, Thilo; Dürst, Matthias; Hillemanns, Peter; Runnebaum, Ingo B; Amant, Frederic; Annibali, Daniela; Depreeuw, Jeroen; Lambrechts, Diether; Neven, Patrick; Cunningham, Julie M; Dowdy, Sean C; Goode, Ellen L; Fridley, Brooke L; Winham, Stacey J; Njølstad, Tormund S; Salvesen, Helga B; Trovik, Jone; Werner, Henrica M J; Ashton, Katie A; Otton, Geoffrey; Proietto, Anthony; Mints, Miriam; Tham, Emma; Bolla, Manjeet K; Michailidou, Kyriaki; Wang, Qin; Tyrer, Jonathan P; Hopper, John L; Peto, Julian; Swerdlow, Anthony J; Burwinkel, Barbara; Brenner, Hermann; Meindl, Alfons; Brauch, Hiltrud; Lindblom, Annika; Chang-Claude, Jenny; Couch, Fergus J; Giles, Graham G; Kristensen, Vessela N; Cox, Angela; Pharoah, Paul D P; Tomlinson, Ian; Dunning, Alison M; Easton, Douglas F; Thompson, Deborah J; Spurdle, Amanda B

    2016-11-01

    The strongest known risk factor for endometrial cancer is obesity. To determine whether SNPs associated with increased body mass index (BMI) or waist-hip ratio (WHR) are associated with endometrial cancer risk, independent of measured BMI, we investigated relationships between 77 BMI and 47 WHR SNPs and endometrial cancer in 6,609 cases and 37,926 country-matched controls. Logistic regression analysis and fixed effects meta-analysis were used to test for associations between endometrial cancer risk and (i) individual BMI or WHR SNPs, (ii) a combined weighted genetic risk score (wGRS) for BMI or WHR. Causality of BMI for endometrial cancer was assessed using Mendelian randomization, with BMIwGRS as instrumental variable. The BMIwGRS was significantly associated with endometrial cancer risk (P = 3.4 × 10^-17). Scaling the effect of the BMIwGRS on endometrial cancer risk by its effect on BMI, the endometrial cancer OR per 5 kg/m^2 of genetically predicted BMI was 2.06 [95% confidence interval (CI), 1.89-2.21], larger than the observed effect of BMI on endometrial cancer risk (OR = 1.55; 95% CI, 1.44-1.68, per 5 kg/m^2). The association attenuated but remained significant after adjusting for BMI (OR = 1.22; 95% CI, 1.10-1.39; P = 5.3 × 10^-4). There was evidence of directional pleiotropy (P = 1.5 × 10^-4). BMI SNP rs2075650 was associated with endometrial cancer at study-wide significance (P < 4.0 × 10^-4), independent of BMI. Endometrial cancer was not significantly associated with individual WHR SNPs or the WHRwGRS. BMI, but not WHR, is causally associated with endometrial cancer risk, with evidence that some BMI-associated SNPs alter endometrial cancer risk via mechanisms other than measurable BMI. The causal association between BMI SNPs and endometrial cancer has possible implications for endometrial cancer risk modeling. Cancer Epidemiol Biomarkers Prev; 25(11); 1503-10. ©2016 American Association for Cancer Research.
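    The scaling step described in this abstract resembles the ratio (Wald) estimator commonly used in Mendelian randomization: the score's effect on the outcome (a log odds ratio) is divided by its effect on the exposure, then rescaled to the reporting unit. The sketch below illustrates that calculation. Whether the authors used exactly this form is an assumption, the function name is hypothetical, and the input values are placeholders rather than results from the study.

```python
import math


def ratio_estimate_or(beta_score_outcome, beta_score_exposure, units=5.0):
    """Odds ratio per `units` of genetically predicted exposure.

    beta_score_outcome: log odds ratio of the outcome per unit of the score.
    beta_score_exposure: change in exposure (e.g. BMI, kg/m^2) per unit of
        the score.
    """
    if beta_score_exposure == 0:
        raise ValueError("the score must be associated with the exposure")
    # Ratio estimator: log odds ratio per one unit of exposure.
    log_or_per_unit = beta_score_outcome / beta_score_exposure
    return math.exp(units * log_or_per_unit)


# Placeholder inputs, chosen only to show the arithmetic.
example_or = ratio_estimate_or(beta_score_outcome=0.05,
                               beta_score_exposure=0.5,
                               units=5.0)
print(round(example_or, 3))
```

    A full analysis would also propagate the uncertainty in both effect estimates into the confidence interval, which this sketch omits.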

  2. Clinical and genetic predictors of weight gain in patients diagnosed with breast cancer

    PubMed Central

    Reddy, S M; Sadim, M; Li, J; Yi, N; Agarwal, S; Mantzoros, C S; Kaklamani, V G

    2013-01-01

    Background: Post-diagnosis weight gain in breast cancer patients has been associated with increased cancer recurrence and mortality. Our study was designed to identify risk factors for this weight gain and create a predictive model to identify a high-risk population for targeted interventions. Methods: Chart review was conducted on 459 breast cancer patients from Northwestern Robert H. Lurie Cancer Centre to obtain weights and body mass indices (BMIs) over an 18-month period from diagnosis. We also recorded tumour characteristics, demographics, clinical factors, and treatment regimens. Blood samples were genotyped for 14 single-nucleotide polymorphisms (SNPs) in fat mass and obesity-associated protein (FTO) and adiponectin pathway genes (ADIPOQ and ADIPOR1). Results: In all, 56% of patients had >0.5 kg m–2 increase in BMI from diagnosis to 18 months, with average BMI and weight gain of 1.9 kg m–2 and 5.1 kg, respectively. Our best predictive model was a primarily SNP-based model incorporating all 14 FTO and adiponectin pathway SNPs studied, their epistatic interactions, and age and BMI at diagnosis, with area under receiver operating characteristic curve of 0.85 for 18-month weight gain. Conclusion: We created a powerful risk prediction model that can identify breast cancer patients at high risk for weight gain. PMID:23922112

  3. Receptor for advanced glycation end-products and ARDS prediction: a multicentre observational study.

    PubMed

    Jabaudon, Matthieu; Berthelin, Pauline; Pranal, Thibaut; Roszyk, Laurence; Godet, Thomas; Faure, Jean-Sébastien; Chabanne, Russell; Eisenmann, Nathanael; Lautrette, Alexandre; Belville, Corinne; Blondonnet, Raiko; Cayot, Sophie; Gillart, Thierry; Pascal, Julien; Skrzypczak, Yvan; Souweine, Bertrand; Blanchon, Loic; Sapin, Vincent; Pereira, Bruno; Constantin, Jean-Michel

    2018-02-08

    Acute respiratory distress syndrome (ARDS) prediction remains challenging despite available clinical scores. To assess soluble receptor for advanced glycation end-products (sRAGE), a marker of lung epithelial injury, as a predictor of ARDS in a high-risk population, adult patients with at least one ARDS risk factor upon admission to participating intensive care units (ICUs) were enrolled in a multicentre, prospective study between June 2014 and January 2015. Plasma sRAGE and endogenous secretory RAGE (esRAGE) were measured at baseline (ICU admission) and 24 hours later (day one). Four AGER candidate single nucleotide polymorphisms (SNPs) were also assayed because of previous reports of functionality (rs1800625, rs1800624, rs3134940, and rs2070600). The primary outcome was ARDS development within seven days. Of 500 patients enrolled, 464 patients were analysed, and 59 developed ARDS by day seven. Higher baseline and day one plasma sRAGE, but not esRAGE, were independently associated with increased ARDS risk. AGER SNP rs2070600 (Ser/Ser) was associated with increased ARDS risk and higher plasma sRAGE in this cohort, although confirmatory studies are needed to assess the role of AGER SNPs in ARDS prediction. These findings suggest that among at-risk ICU patients, higher plasma sRAGE may identify those who are more likely to develop ARDS.

  4. Toward DNA-based facial composites: preliminary results and validation.

    PubMed

    Claes, Peter; Hill, Harold; Shriver, Mark D

    2014-11-01

    The potential of constructing useful DNA-based facial composites is forensically of great interest. Given the significant identity information coded in the human face these predictions could help investigations out of an impasse. Although there is substantial evidence that much of the total variation in facial features is genetically mediated, the discovery of which genes and gene variants underlie normal facial variation has been hampered primarily by the multipartite nature of facial variation. Traditionally, such physical complexity is simplified by simple scalar measurements defined a priori, such as nose or mouth width or alternatively using dimensionality reduction techniques such as principal component analysis where each principal coordinate is then treated as a scalar trait. However, as shown in previous and related work, a more impartial and systematic approach to modeling facial morphology is available and can facilitate both the gene discovery steps, as we recently showed, and DNA-based facial composite construction, as we show here. We first use genomic ancestry and sex to create a base-face, which is simply an average sex and ancestry matched face. Subsequently, the effects of 24 individual SNPs that have been shown to have significant effects on facial variation are overlaid on the base-face forming the predicted-face in a process akin to a photomontage or image blending. We next evaluate the accuracy of predicted faces using cross-validation. Physical accuracy of the facial predictions either locally in particular parts of the face or in terms of overall similarity is mainly determined by sex and genomic ancestry. The SNP-effects maintain the physical accuracy while significantly increasing the distinctiveness of the facial predictions, which would be expected to reduce false positives in perceptual identification tasks. To the best of our knowledge this is the first effort at generating facial composites from DNA and the results are preliminary but certainly promising, especially considering the limited amount of genetic information about the face contained in these 24 SNPs. This approach can incorporate additional SNPs as these are discovered and their effects documented. In this context we discuss three main avenues of research: expanding our knowledge of the genetic architecture of facial morphology, improving the predictive modeling of facial morphology by exploring and incorporating alternative prediction models, and increasing the value of the results through the weighted encoding of physical measurements in terms of human perception of faces. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  5. De Novo Assembly and Transcriptome Analysis of the Rubber Tree (Hevea brasiliensis) and SNP Markers Development for Rubber Biosynthesis Pathways

    PubMed Central

    Mantello, Camila Campos; Cardoso-Silva, Claudio Benicio; da Silva, Carla Cristina; de Souza, Livia Moura; Scaloppi Junior, Erivaldo José; de Souza Gonçalves, Paulo; Vicentini, Renato; de Souza, Anete Pereira

    2014-01-01

    Hevea brasiliensis (Willd. Ex Adr. Juss.) Muell.-Arg. is the primary source of natural rubber that is native to the Amazon rainforest. The singular properties of natural rubber make it superior to and competitive with synthetic rubber for use in several applications. Here, we performed RNA sequencing (RNA-seq) of H. brasiliensis bark on the Illumina GAIIx platform, which generated 179,326,804 raw reads. A total of 50,384 contigs that were over 400 bp in size were obtained and subjected to further analyses. A similarity search against the non-redundant (nr) protein database returned 32,018 (63%) positive BLASTx hits. The transcriptome analysis was annotated using the clusters of orthologous groups (COG), gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Pfam databases. A search for putative molecular markers was performed to identify simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs). In total, 17,927 SSRs and 404,114 SNPs were detected. Finally, we selected sequences that were identified as belonging to the mevalonate (MVA) and 2-C-methyl-D-erythritol 4-phosphate (MEP) pathways, which are involved in rubber biosynthesis, to validate the SNP markers. A total of 78 SNPs were validated in 36 genotypes of H. brasiliensis. This new dataset represents a powerful information source for rubber tree bark genes and will be an important tool for the development of microsatellites and SNP markers for use in future genetic analyses such as genetic linkage mapping, quantitative trait loci identification, investigations of linkage disequilibrium and marker-assisted selection. PMID:25048025

  6. New generation pharmacogenomic tools: a SNP linkage disequilibrium Map, validated SNP assay resource, and high-throughput instrumentation system for large-scale genetic studies.

    PubMed

    De La Vega, Francisco M; Dailey, David; Ziegle, Janet; Williams, Julie; Madden, Dawn; Gilbert, Dennis A

    2002-06-01

    Since public and private efforts announced the first draft of the human genome last year, researchers have reported great numbers of single nucleotide polymorphisms (SNPs). We believe that the availability of well-mapped, quality SNP markers constitutes the gateway to a revolution in genetics and personalized medicine that will lead to better diagnosis and treatment of common complex disorders. A new generation of tools and public SNP resources for pharmacogenomic and genetic studies--specifically for candidate-gene, candidate-region, and whole-genome association studies--will form part of the new scientific landscape. This will only be possible through the greater accessibility of SNP resources and superior high-throughput instrumentation-assay systems that enable affordable, highly productive large-scale genetic studies. We are contributing to this effort by developing a high-quality linkage disequilibrium SNP marker map and an accompanying set of ready-to-use, validated SNP assays across every gene in the human genome. This effort incorporates both the public sequence and SNP data sources, and Celera Genomics' human genome assembly and enormous resource of physically mapped SNPs (approximately 4,000,000 unique records). This article discusses our approach and methodology for designing the map, choosing quality SNPs, designing and validating these assays, and obtaining population frequency of the polymorphisms. We also discuss an advanced, high-performance SNP assay chemistry--a new generation of the TaqMan probe-based, 5' nuclease assay--and high-throughput instrumentation-software system for large-scale genotyping. We provide the new SNP map and validation information, validated SNP assays and reagents, and instrumentation systems as a novel resource for genetic discoveries.

  7. Genome-wide Association Study Identifies Candidate Genes for Male Fertility Traits in Humans

    PubMed Central

    Kosova, Gülüm; Scott, Nicole M.; Niederberger, Craig; Prins, Gail S.; Ober, Carole

    2012-01-01

    Despite the fact that hundreds of genes are known to affect fertility in animal models, relatively little is known about genes that influence natural fertility in humans. To broadly survey genes contributing to variation in male fertility, we conducted a genome-wide association study (GWAS) of two fertility traits (family size and birth rate) in 269 married men who are members of a founder population of European descent that proscribes contraception and has large family sizes. Associations between ∼250,000 autosomal SNPs and the fertility traits were examined. A total of 41 SNPs with p ≤ 1 × 10^-4 for either trait were taken forward to a validation study of 123 ethnically diverse men from Chicago who had previously undergone semen analyses. Nine (22%) of the SNPs associated with reduced fertility in the GWAS were also associated with one or more of the ten measures of reduced sperm quantity and/or function, yielding 27 associations with p values < 0.05 and seven with p values < 0.01 in the validation study. On the basis of 5,000 permutations of our data, the probabilities of observing this many or more small p values were 0.0014 and 5.6 × 10^-4, respectively. Among the nine associated loci, outstanding candidates for male fertility genes include USP8, an essential deubiquitinating enzyme that has a role in acrosome assembly; UBD and EPSTI1, which have potential roles in innate immunity; and LRRC32, which encodes a latent transforming growth factor β (TGF-β) receptor on regulatory T cells. We suggest that mutations in these genes that are more severe may account for some of the unexplained infertility (or subfertility) in the general population. PMID:22633400

  8. Genome-wide resequencing of KRICE_CORE reveals their potential for future breeding, as well as functional and evolutionary studies in the post-genomic era.

    PubMed

    Kim, Tae-Sung; He, Qiang; Kim, Kyu-Won; Yoon, Min-Young; Ra, Won-Hee; Li, Feng Peng; Tong, Wei; Yu, Jie; Oo, Win Htet; Choi, Buung; Heo, Eun-Beom; Yun, Byoung-Kook; Kwon, Soon-Jae; Kwon, Soon-Wook; Cho, Yoo-Hyun; Lee, Chang-Yong; Park, Beom-Seok; Park, Yong-Jin

    2016-05-26

    Rice germplasm collections continue to grow in number and size around the world. Since maintaining and screening such massive resources remains challenging, it is important to establish practical methods to manage them. A core collection, by definition, refers to a subset of the entire population that preserves the majority of genetic diversity, enhancing the efficiency of germplasm utilization. Here, we report whole-genome resequencing of the 137 rice mini core collection or Korean rice core set (KRICE_CORE) that represents 25,604 rice germplasms deposited in the Korean genebank of the Rural Development Administration (RDA). We used the Illumina HiSeq 2000 and 2500 platforms to produce short reads and then assembled those at 9.8× depth using Nipponbare as a reference. Comparisons of the sequences with the reference genome yielded more than 15 million (M) single nucleotide polymorphisms (SNPs) and 1.3 M INDELs. Phylogenetic and population analyses using 2,046,529 high-quality SNPs successfully assigned rice accessions to the relevant rice subgroups, suggesting that these SNPs capture evolutionary signatures that have accumulated in rice subpopulations. Furthermore, genome-wide association studies (GWAS) for four exemplary agronomic traits in the KRICE_CORE manifest the utility of KRICE_CORE; that is, identifying previously defined genes or novel genetic factors that potentially regulate important phenotypes. This study provides strong evidence that the size of KRICE_CORE is small but contains high genetic and functional diversity across the genome. Thus, our resequencing results will be useful for future breeding, as well as functional and evolutionary studies, in the post-genomic era.

  9. Whole-genome sequencing and genetic variant analysis of a Quarter Horse mare.

    PubMed

    Doan, Ryan; Cohen, Noah D; Sawyer, Jason; Ghaffari, Noushin; Johnson, Charlie D; Dindot, Scott V

    2012-02-17

    The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.
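
    As a rough sanity check on the figures above, sequencing depth is commonly estimated as total sequenced bases divided by genome size. The sketch below inverts that relation using the abstract's numbers (59.6 Gb at 24.7X). The function name is hypothetical, and the simple formula is an assumption: the authors' exact coverage calculation is not stated in this record.

```python
def implied_genome_size_gb(total_bases_gb: float, mean_coverage: float) -> float:
    """Genome size (Gb) implied by: coverage = total bases / genome size."""
    if mean_coverage <= 0:
        raise ValueError("mean_coverage must be positive")
    return total_bases_gb / mean_coverage


# Figures quoted in the abstract: 59.6 Gb of sequence, 24.7X average coverage.
size_gb = implied_genome_size_gb(59.6, 24.7)
print(f"Implied genome size: {size_gb:.2f} Gb")
```

    Under that assumption the two quoted figures imply a genome of roughly 2.4 Gb.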

  10. Transcriptome Analysis of an Insecticide Resistant Housefly Strain: Insights about SNPs and Regulatory Elements in Cytochrome P450 Genes

    PubMed Central

    Asp, Torben; Kristensen, Michael

    2016-01-01

    Background Insecticide resistance in the housefly, Musca domestica, has been investigated for more than 60 years. It will enter a new era after the recent publication of the housefly genome and the development of multiple next generation sequencing technologies. The genetic background of the xenobiotic response can now be investigated in greater detail. Here, we investigate the 454-pyrosequencing transcriptome of the spinosad-resistant 791spin strain in relation to the housefly genome with focus on P450 genes. Results The de novo assembly of clean reads gave 35,834 contigs consisting of 21,780 sequences of the spinosad resistant strain. A total of 3,648 sequences were annotated with an enzyme code (EC) number and were mapped to 124 KEGG pathways, with metabolic processes as the most highly represented pathway. One hundred and twenty contigs were annotated as P450s covering 44 different P450 genes of housefly. Eight differentially expressed P450 genes were identified and investigated for SNPs, CpG islands and common regulatory motifs in promoter and coding regions. Functional annotation clustering of metabolic related genes and motif analysis of P450s revealed their association with epigenetic, transcription and gene expression related functions. The sequence variation analysis resulted in 12 SNPs, eight of which were found in cyp6d1. There is variation in location, size and frequency of CpG islands, and specific motifs were also identified in these P450s. Moreover, identified motifs were associated with GO terms and transcription factors using bioinformatic tools. Conclusion Transcriptome data of a spinosad resistant strain, together with genome data, provide fundamental support for future research to understand the evolution of resistance in houseflies. Here, we report for the first time the SNPs, CpG islands and common regulatory motifs in differentially expressed P450s. Taken together, our findings will serve as a stepping stone to advance understanding of the mechanism and role of P450s in xenobiotic detoxification. PMID:27019205

  11. Implication of common and disease specific variants in CLU, CR1, and PICALM.

    PubMed

    Ferrari, Raffaele; Moreno, Jorge H; Minhajuddin, Abu T; O'Bryant, Sid E; Reisch, Joan S; Barber, Robert C; Momeni, Parastoo

    2012-08-01

    Two recent genome-wide association studies (GWAS) for late onset Alzheimer's disease (LOAD) revealed 3 new genes: clusterin (CLU), phosphatidylinositol binding clathrin assembly protein (PICALM), and complement receptor 1 (CR1). In order to evaluate association with these genome-wide association study-identified genes and to isolate the variants contributing to the pathogenesis of LOAD, we genotyped the top single nucleotide polymorphisms (SNPs), rs11136000 (CLU), rs3818361 (CR1), and rs3851179 (PICALM), and sequenced the entire coding regions of these genes in our cohort of 342 LOAD patients and 277 control subjects. We confirmed the association of rs3851179 (PICALM) (p = 7.4 × 10^-3) with the disease status. Through sequencing we identified 18 variants in CLU, 3 of which were found exclusively in patients; 8 variants (out of 65) in CR1 gene were only found in patients and the 16 variants identified in PICALM gene were present in both patients and controls. In silico analysis of the variants in PICALM did not predict any damaging effect on the protein. The haplotype analysis of the variants in each gene predicted a common haplotype when the 3 single nucleotide polymorphisms rs11136000 (CLU), rs3818361 (CR1), and rs3851179 (PICALM), respectively, were included. For each gene the haplotype structure and size differed between patients and controls. In conclusion, we confirmed association of CLU, CR1, and PICALM genes with the disease status in our cohort through identification of a number of disease-specific variants among patients through the sequencing of the coding region of these genes. Published by Elsevier Inc.

  12. HLA-DQA1 and PLA2R1 polymorphisms and risk of idiopathic membranous nephropathy.

    PubMed

    Bullich, Gemma; Ballarín, José; Oliver, Artur; Ayasreh, Nadia; Silva, Irene; Santín, Sheila; Díaz-Encarnación, Montserrat M; Torra, Roser; Ars, Elisabet

    2014-02-01

    Single nucleotide polymorphisms (SNPs) within HLA complex class II HLA-DQ α-chain 1 (HLA-DQA1) and M-type phospholipase A2 receptor (PLA2R1) genes were identified as strong risk factors for idiopathic membranous nephropathy (IMN) development in a recent genome-wide association study. Copy number variants (CNVs) within the Fc gamma receptor III (FCGR3) locus have been associated with several autoimmune diseases, but their role in IMN has not been studied. This study aimed to validate the association of HLA-DQA1 and PLA2R1 risk alleles with IMN in a Spanish cohort, test the putative association of FCGR3A and FCGR3B CNVs with IMN, and assess the use of these genetic factors to predict the clinical outcome of the disease. A Spanish cohort of 89 IMN patients and 286 matched controls without nephropathy was recruited between October of 2009 and July of 2012. Case-control studies for SNPs within HLA-DQA1 (rs2187668) and PLA2R1 (rs4664308) genes and CNVs for FCGR3A and FCGR3B genes were performed. The contribution of these polymorphisms to predict clinical outcome and renal function decline was analyzed. This study validated the association of these HLA-DQA1 and PLA2R1 SNPs with IMN in a Spanish cohort and its increased risk when combining both risk genotypes. No significant association was found between FCGR3 CNVs and IMN. These results revealed that HLA-DQA1 and PLA2R1 genotype combination adjusted for baseline proteinuria strongly predicted response to immunosuppressive therapy. HLA-DQA1 genotype adjusted for proteinuria was also linked with renal function decline. This study confirms that HLA-DQA1 and PLA2R1 genotypes are risk factors for IMN, whereas no association was identified for FCGR3 CNVs. This study provides, for the first time, evidence of the contribution of these HLA-DQA1 and PLA2R1 polymorphisms in predicting IMN response to immunosuppressors and disease progression. Future studies are needed to validate and identify prognostic markers.

  13. Genomic Prediction of Resistance to Pasteurellosis in Gilthead Sea Bream (Sparus aurata) Using 2b-RAD Sequencing

    PubMed Central

    Palaiokostas, Christos; Ferraresso, Serena; Franch, Rafaella; Houston, Ross D.; Bargelloni, Luca

    2016-01-01

    Gilthead sea bream (Sparus aurata) is a species of paramount importance to the Mediterranean aquaculture industry, with an annual production exceeding 140,000 metric tons. Pasteurellosis due to the Gram-negative bacterium Photobacterium damselae subsp. piscicida (Phdp) causes significant mortality, especially during larval and juvenile stages, and poses a serious threat to bream production. Selective breeding for improved resistance to pasteurellosis is a promising avenue for disease control, and the use of genetic markers to predict breeding values can improve the accuracy of selection, and allow accurate calculation of estimated breeding values of nonchallenged animals. In the current study, a population of 825 sea bream juveniles, originating from a factorial cross between 67 broodfish (32 sires, 35 dams), were challenged by 30 min immersion with 1 × 10^5 CFU virulent Phdp. Mortalities and survivors were recorded and sampled for genotyping by sequencing. The restriction-site associated DNA sequencing approach, 2b-RAD, was used to generate genome-wide single nucleotide polymorphism (SNP) genotypes for all samples. A high-density linkage map containing 12,085 SNPs grouped into 24 linkage groups (consistent with the karyotype) was constructed. The heritability of surviving days (censored data) was 0.22 (95% highest density interval: 0.11–0.36) and 0.28 (95% highest density interval: 0.17–0.4) using the pedigree and the genomic relationship matrix respectively. A genome-wide association study did not reveal individual SNPs significantly associated with resistance at a genome-wide significance level. Genomic prediction approaches were tested to investigate the potential of the SNPs obtained by 2b-RAD for estimating breeding values for resistance. The accuracy of the genomic prediction models (r = 0.38–0.46) outperformed the traditional BLUP approach based on pedigree records (r = 0.30). Overall results suggest that major quantitative trait loci affecting resistance to pasteurellosis were not present in this population, but highlight the effectiveness of 2b-RAD genotyping by sequencing for genomic selection in a mass spawning fish species. PMID:27652890

  14. Genetic influences on right ventricular systolic pressure (RVSP) in chronic obstructive pulmonary disease (COPD).

    PubMed

    Shaw, Janet G; Dent, Annette G; Passmore, Linda H; Burstow, Darryl J; Bowman, Rayleen V; Zimmerman, Paul V; Fong, Kwun M; Yang, Ian A

    2012-06-13

    Pulmonary hypertension (PH) is a complication of chronic obstructive pulmonary disease (COPD). This study examined genetic variations in mediators of vascular remodelling and their association with PH in patients with COPD. In patients with COPD, we genotyped 7 SNPs in 6 candidate PH genes (NOS3, ACE, EDN1, PTGIS, SLC6A4, VEGFA). We tested for association with right ventricular systolic pressure (RVSP), spirometry and gas transfer, and hypoxemia. 580 COPD patients were recruited, 341 patients had a transthoracic echocardiogram, with RVSP measurable in 278 patients (mean age 69 years, mean FEV1 50% predicted, mean RVSP 44 mmHg, median history of 50 pack-years). Of the 7 tested SNPs, the NOS3-VNTR polymorphism was significantly associated with RVSP in a dose-dependent fashion for the risk allele: mean RVSP for a/a and a/b genotypes were 52.0 and 46.6 mmHg respectively, compared to 43.2 mmHg for b/b genotypes (P = 0.032). No associations were found between RVSP and other polymorphisms. ACE II or ID genotypes were associated with a lower FEV1% predicted than the ACE DD genotype (P = 0.028). The NOS3-298 TT genotype was associated with lower KCO % predicted than the NOS3-298 GG or GT genotype (P = 0.031). The NOS3-VNTR polymorphism was associated with RVSP in patients with COPD, supporting its involvement in the pathogenesis of PH in COPD. ACE and NOS3 genotypes were associated with COPD disease severity, but not with the presence of PH. Further study of these genes could lead to the development of prognostic and screening tools for PH in COPD.

  15. EnsembleGASVR: a novel ensemble method for classifying missense single nucleotide polymorphisms.

    PubMed

    Rapakoulia, Trisevgeni; Theofilatos, Konstantinos; Kleftogiannis, Dimitrios; Likothanasis, Spiros; Tsakalidis, Athanasios; Mavroudi, Seferina

    2014-08-15

    Single nucleotide polymorphisms (SNPs) are considered the most frequently occurring DNA sequence variations. Several computational methods have been proposed for the classification of missense SNPs as neutral or disease associated. However, existing computational approaches fail to select relevant features, choosing them arbitrarily without sufficient documentation. Moreover, they are limited by the problem of missing values and by imbalance between the learning datasets, and most of them do not support their predictions with confidence scores. To overcome these limitations, a novel ensemble computational methodology is proposed. EnsembleGASVR facilitates a two-step algorithm, which in its first step applies a novel evolutionary embedded algorithm to locate close to optimal Support Vector Regression models. In its second step, these models are combined to extract a universal predictor, which is less prone to overfitting issues, systematizes the rebalancing of the learning sets and uses an internal approach for solving the missing values problem without loss of information. Confidence scores support all the predictions and the model becomes tunable by modifying the classification thresholds. An extensive study was performed for collecting the most relevant features for the problem of classifying SNPs, and a superset of 88 features was constructed. Experimental results show that the proposed framework outperforms well-known algorithms in terms of classification performance in the examined datasets. Finally, the proposed algorithmic framework was able to uncover the significant role of certain features such as the solvent accessibility feature, and the top-scored predictions were further validated by linking them with disease phenotypes. Datasets and codes are freely available on the Web at http://prlab.ceid.upatras.gr/EnsembleGASVR/dataset-codes.zip. All the required information about the article is available through http://prlab.ceid.upatras.gr/EnsembleGASVR/site.html. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  16. Dosing algorithm for warfarin using CYP2C9 and VKORC1 genotyping from a multi-ethnic population: comparison with other equations.

    PubMed

    Wu, Alan H B; Wang, Ping; Smith, Andrew; Haller, Christine; Drake, Katherine; Linder, Mark; Valdes, Roland

    2008-02-01

    Polymorphisms in the genes for cytochrome (CYP)2C9 and the vitamin K epoxide reductase complex subunit 1 (VKORC1) affect the pharmacokinetics and pharmacodynamics of warfarin. We developed and validated a warfarin-dosing algorithm for a multi-ethnic population that predicts the best dose for stable anticoagulation, and compared its performance against other regression equations. We determined the allele and haplotype frequencies of genes for CYP2C9 and VKORC1 in 167 Caucasian, African-American, Asian and Hispanic patients on warfarin. On a subset where complete data were available (n=92), we developed a dosing equation that predicts the actual dose needed to maintain target anticoagulation using demographic variables and genotypes. This regression was validated against an independent group of subjects. We also applied our data to five other published warfarin-dosing equations. The allele frequencies for CYP2C9*2 and *3 and the A allele for VKORC1 3673 were similar to previously published reports. For Caucasians and Asians, VKORC1 SNPs were in Hardy-Weinberg linkage equilibrium. Some VKORC1 SNPs among the African-American population and one SNP among Hispanics were not in equilibrium. The linear regression of predicted versus actual warfarin dose produced r-values of 0.71 for the training set and 0.67 for the validation set. The regression coefficient improved (to r=0.78 and 0.75, respectively) when rare genotypes were eliminated or when the 7566 VKORC1 genotype was added to the model. All of the regression models tested produced a similar degree of correlation. The exclusion of rare genotypes that are more associated with certain ethnicities improved the model. Minor improvements in algorithms can be observed with the inclusion of ethnicity and more CYP2C9 and VKORC1 SNPs as variables. Major improvements will likely require the identification of new gene associations with warfarin dosing.
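
    The equilibrium check mentioned in this abstract can be illustrated with a standard chi-square test of Hardy-Weinberg proportions for a single biallelic SNP. This is a minimal sketch, not the authors' procedure: the function name and the genotype counts are hypothetical, and the abstract does not say which test was used.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Return (chi-square statistic, p-value) for a biallelic HWE test with 1 df."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic SNPs) to avoid dividing by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts (AA, AB, BB); not data from the study.
chi2, p_value = hwe_chi_square(30, 45, 25)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

    A small p-value would suggest departure from Hardy-Weinberg proportions at that SNP; with small samples an exact test is often preferred over the chi-square approximation.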

  17. GACT: a Genome build and Allele definition Conversion Tool for SNP imputation and meta-analysis in genetic association studies.

    PubMed

    Sulovari, Arvis; Li, Dawei

    2014-07-19

    Genome-wide association studies (GWAS) have successfully identified genes associated with complex human diseases. Although much of the heritability remains unexplained, combining single nucleotide polymorphism (SNP) genotypes from multiple studies for meta-analysis will increase the statistical power to identify new disease-associated variants. Meta-analysis requires the same allele definition (nomenclature) and genome build among individual studies. Similarly, imputation, commonly used prior to meta-analysis, requires the same consistency. However, the genotypes from various GWAS are generated using different genotyping platforms, arrays or SNP-calling approaches, resulting in use of different genome builds and allele definitions. Incorrect assumptions of identical allele definition among combined GWAS lead to a large portion of discarded genotypes or incorrect association findings. There is no published tool that predicts and converts among all major allele definitions. In this study, we have developed a tool, GACT, which stands for Genome build and Allele definition Conversion Tool, that predicts and inter-converts between any of the common SNP allele definitions and between the major genome builds. In addition, we assessed several factors that may affect imputation quality, and our results indicated that inclusion of singletons in the reference had detrimental effects while ambiguous SNPs had no measurable effect. Unexpectedly, exclusion of genotypes with missing rate > 0.001 (40% of study SNPs) showed no significant decrease of imputation quality (even significantly higher when compared to the imputation with singletons in the reference), especially for rare SNPs. GACT is a new, powerful, and user-friendly tool with both command-line and interactive online versions that can accurately predict, and convert between any of the common allele definitions and between genome builds for genome-wide meta-analysis and imputation of genotypes from SNP-arrays or deep-sequencing, particularly for data from the dbGaP and other public databases. http://www.uvm.edu/genomics/software/gact.
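
    To make the allele-definition problem concrete, the sketch below shows the general idea of strand flipping and why A/T and C/G SNPs are strand-ambiguous. It is not GACT's algorithm or interface; every function name here is hypothetical, and the logic only illustrates the kind of harmonization a conversion tool has to perform.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip_strand(allele1: str, allele2: str) -> tuple[str, str]:
    """Express an allele pair on the opposite DNA strand."""
    return COMPLEMENT[allele1], COMPLEMENT[allele2]


def is_strand_ambiguous(allele1: str, allele2: str) -> bool:
    """A/T and C/G SNPs look identical on both strands, so strand cannot be inferred."""
    return COMPLEMENT[allele1] == allele2


def harmonize(study: tuple[str, str], reference: tuple[str, str]):
    """Return the study alleles on the reference strand, or None if unresolvable."""
    if is_strand_ambiguous(*study):
        return None  # would need allele frequencies or annotation to resolve
    if set(study) == set(reference):
        return study  # already on the reference strand
    flipped = flip_strand(*study)
    if set(flipped) == set(reference):
        return flipped  # opposite strand; report the flipped alleles
    return None  # alleles do not match the reference on either strand


print(harmonize(("A", "G"), ("T", "C")))  # strand flip needed
print(harmonize(("A", "T"), ("A", "T")))  # ambiguous, left unresolved
```

    Treating ambiguous SNPs as unresolved is one conservative choice; real tools may instead use allele frequencies or platform annotation to assign a strand.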

  18. Painful Temporomandibular Disorder: Decade of Discovery from OPPERA Studies.

    PubMed

    Slade, G D; Ohrbach, R; Greenspan, J D; Fillingim, R B; Bair, E; Sanders, A E; Dubner, R; Diatchenko, L; Meloto, C B; Smith, S; Maixner, W

    2016-09-01

    In 2006, the OPPERA project (Orofacial Pain: Prospective Evaluation and Risk Assessment) set out to identify risk factors for development of painful temporomandibular disorder (TMD). A decade later, this review summarizes its key findings. At 4 US study sites, OPPERA recruited and examined 3,258 community-based TMD-free adults assessing genetic and phenotypic measures of biological, psychosocial, clinical, and health status characteristics. During follow-up, 4% of participants per annum developed clinically verified TMD, although that was a "symptom iceberg" when compared with the 19% annual rate of facial pain symptoms. The most influential predictors of clinical TMD were simple checklists of comorbid health conditions and nonpainful orofacial symptoms. Self-reports of jaw parafunction were markedly stronger predictors than corresponding examiner assessments. The strongest psychosocial predictor was frequency of somatic symptoms, although not somatic reactivity. Pressure pain thresholds measured at cranial sites only weakly predicted incident TMD yet were strongly associated with chronic TMD, cross-sectionally, in OPPERA's separate case-control study. The puzzle was resolved in OPPERA's nested case-control study where repeated measures of pressure pain thresholds revealed fluctuation that coincided with TMD's onset, persistence, and recovery but did not predict its incidence. The nested case-control study likewise furnished novel evidence that deteriorating sleep quality predicted TMD incidence. Three hundred genes were investigated, implicating 6 single-nucleotide polymorphisms (SNPs) as risk factors for chronic TMD, while another 6 SNPs were associated with intermediate phenotypes for TMD. One study identified a serotonergic pathway in which multiple SNPs influenced risk of chronic TMD. Two other studies investigating gene-environment interactions found that effects of stress on pain were modified by variation in the gene encoding catechol O-methyltransferase. Lessons learned from OPPERA have verified some implicated risk factors for TMD and refuted others, redirecting our thinking. Now it is time to apply those lessons to studies investigating treatment and prevention of TMD. © International & American Associations for Dental Research 2016.

  19. De novo assembly and characterization of leaf transcriptome for the development of functional molecular markers of the extremophile multipurpose tree species Prosopis alba

    PubMed Central

    2013-01-01

    Background Prosopis alba (Fabaceae) is an important native tree adapted to arid and semiarid regions of north-western Argentina which is of great value as multipurpose species. Despite its importance, the genomic resources currently available for the entire Prosopis genus are still limited. Here we describe the development of a leaf transcriptome and the identification of new molecular markers that could support functional genetic studies in natural and domesticated populations of this genus. Results Next generation DNA pyrosequencing technology applied to P. alba transcripts produced a total of 1,103,231 raw reads with an average length of 421 bp. De novo assembling generated a set of 15,814 isotigs and 71,101 non-assembled sequences (singletons) with an average of 991 bp and 288 bp respectively. A total of 39,000 unique singletons were identified after clustering natural and artificial duplicates from pyrosequencing reads. Regarding the non-redundant sequences or unigenes, 22,095 out of 54,814 were successfully annotated with Gene Ontology terms. Moreover, simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) were searched, resulting in 5,992 and 6,236 markers, respectively, throughout the genome. For the validation of the predicted SSR markers, a subset of 87 SSRs selected through functional annotation evidence was successfully amplified from six DNA samples of seedlings. From this analysis, 11 of these 87 SSRs were identified as polymorphic. Additionally, another set of 123 nuclear polymorphic SSRs were determined in silico, of which 50% have the probability of being effectively polymorphic. Conclusions This study generated a successful global analysis of the P. alba leaf transcriptome after bioinformatic and wet laboratory validations of RNA-Seq data. The limited set of molecular markers currently available will be significantly increased with the thousands of new markers that were identified in this study. This information will strongly contribute to genomics resources for P. alba functional analysis and genetics. Finally, it will also potentially contribute to the development of population-based genome studies in the genera. PMID:24125525

  20. Global skin colour prediction from DNA.

    PubMed

    Walsh, Susan; Chaitanya, Lakshmi; Breslin, Krystal; Muralidharan, Charanya; Bronikowska, Agnieszka; Pospiech, Ewelina; Koller, Julia; Kovatsi, Leda; Wollstein, Andreas; Branicki, Wojciech; Liu, Fan; Kayser, Manfred

    2017-07-01

    Human skin colour is highly heritable and externally visible with relevance in medical, forensic, and anthropological genetics. Although eye and hair colour can already be predicted with high accuracies from small sets of carefully selected DNA markers, knowledge about the genetic predictability of skin colour is limited. Here, we investigate the skin colour predictive value of 77 single-nucleotide polymorphisms (SNPs) from 37 genetic loci previously associated with human pigmentation using 2025 individuals from 31 global populations. We identified a minimal set of 36 highly informative skin colour predictive SNPs and developed a statistical prediction model capable of skin colour prediction on a global scale. Average cross-validated prediction accuracies expressed as area under the receiver-operating characteristic curve (AUC) ± standard deviation were 0.97 ± 0.02 for Light, 0.83 ± 0.11 for Dark, and 0.96 ± 0.03 for Dark-Black. When using a 5-category scale, this resulted in 0.74 ± 0.05 for Very Pale, 0.72 ± 0.03 for Pale, 0.73 ± 0.03 for Intermediate, 0.87 ± 0.1 for Dark, and 0.97 ± 0.03 for Dark-Black. A comparative analysis in 194 independent samples from 17 populations demonstrated that our model outperformed a previously proposed 10-SNP-classifier approach with AUCs rising from 0.79 to 0.82 for White, comparable at the intermediate level of 0.63 and 0.62, respectively, and a large increase from 0.64 to 0.92 for Black. Overall, this study demonstrates that the chosen DNA markers and prediction model, particularly the 5-category level, allow skin colour predictions within and between continental regions for the first time, which will serve as a valuable resource for future applications in forensic and anthropologic genetics.
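
    The AUC values reported here can be read as the probability that a randomly chosen individual from the target category receives a higher predicted score than a randomly chosen individual outside it. The sketch below computes AUC directly from that pairwise definition; the function name and the scores are hypothetical and are not taken from the study or its model.

```python
def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair, matching the Mann-Whitney formulation.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical predicted probabilities for one category versus all others.
in_category = [0.91, 0.78, 0.66, 0.40]
out_of_category = [0.55, 0.30, 0.22, 0.10]
print(f"AUC = {roc_auc(in_category, out_of_category):.2f}")
```

    The pairwise loop is quadratic in sample size, which is fine for illustration; rank-based implementations give the same value more efficiently on large datasets.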
