Combined array CGH plus SNP genome analyses in a single assay for optimized clinical testing
Wiszniewska, Joanna; Bi, Weimin; Shaw, Chad; Stankiewicz, Pawel; Kang, Sung-Hae L; Pursley, Amber N; Lalani, Seema; Hixson, Patricia; Gambin, Tomasz; Tsai, Chun-hui; Bock, Hans-Georg; Descartes, Maria; Probst, Frank J; Scaglia, Fernando; Beaudet, Arthur L; Lupski, James R; Eng, Christine; Wai Cheung, Sau; Bacino, Carlos; Patel, Ankita
2014-01-01
In clinical diagnostics, both array comparative genomic hybridization (array CGH) and single nucleotide polymorphism (SNP) genotyping have proven to be powerful genomic technologies utilized for the evaluation of developmental delay, multiple congenital anomalies, and neuropsychiatric disorders. Differences in the ability to resolve genomic changes between these arrays may constitute an implementation challenge for clinicians: which platform (SNP vs array CGH) might best detect the underlying genetic cause for the disease in the patient? While only SNP arrays enable the detection of copy number neutral regions of absence of heterozygosity (AOH), they have limited ability to detect single-exon copy number variants (CNVs) due to the distribution of SNPs across the genome. To provide comprehensive clinical testing for both CNVs and copy-neutral AOH, we enhanced our custom-designed high-resolution oligonucleotide array that has exon-targeted coverage of 1860 genes with 60 000 SNP probes, referred to as Chromosomal Microarray Analysis – Comprehensive (CMA-COMP). Of the 3240 cases evaluated by this array, clinically significant CNVs were detected in 445 cases including 21 cases with exonic events. In addition, 162 cases (5.0%) showed at least one AOH region >10 Mb. We demonstrate that even though this array has a lower density of SNP probes than other commercially available SNP arrays, it reliably detected AOH events >10 Mb as well as exonic CNVs beyond the detection limitations of SNP genotyping. Thus, combining SNP probes and exon-targeted array CGH into one platform provides clinically useful genetic screening in an efficient manner. PMID:23695279
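The AOH criterion in the abstract above (a copy-neutral region longer than 10 Mb with no heterozygous SNP calls) can be illustrated with a minimal sketch. This is not the CMA-COMP calling procedure, which the abstract does not describe. The function name `find_aoh_regions`, the input layout, and the choice to measure a run from its first to its last non-heterozygous SNP are all assumptions made for illustration. Clinical callers would typically also tolerate occasional heterozygous calls caused by genotyping error.

```python
def find_aoh_regions(calls, min_length_bp=10_000_000):
    """Return (start, end) spans of consecutive non-heterozygous SNP calls.

    calls: list of (position_bp, is_heterozygous) tuples, sorted by position,
    all from one chromosome (hypothetical input layout).
    A run is reported only if it spans more than min_length_bp.
    """
    regions = []
    run_start = None
    run_end = None
    for pos, is_het in calls:
        if is_het:
            # A heterozygous call ends the current run, if any.
            if run_start is not None and run_end - run_start > min_length_bp:
                regions.append((run_start, run_end))
            run_start = None
        else:
            if run_start is None:
                run_start = pos
            run_end = pos
    # Close a run that reaches the last SNP on the chromosome.
    if run_start is not None and run_end - run_start > min_length_bp:
        regions.append((run_start, run_end))
    return regions


example_calls = [
    (1_000_000, True),
    (2_000_000, False),
    (8_000_000, False),
    (15_000_000, False),
    (16_000_000, True),
    (20_000_000, False),
    (25_000_000, False),
]
aoh = find_aoh_regions(example_calls)
```

In this toy input only the 13 Mb run from 2 Mb to 15 Mb passes the threshold; the 5 Mb run at the end does not.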
... array, and oligo/SNP combination array. Related terms: comparative genomic hybridization ; copy number variant ; SNP array chromosome ... for example, the AB blood groups in humans comparative genomic hybridization Method in which two DNA samples ( ...
Bianco, Luca; Cestaro, Alessandro; Sargent, Daniel James; Banchi, Elisa; Derdak, Sophia; Di Guardo, Mario; Salvi, Silvio; Jansen, Johannes; Viola, Roberto; Gut, Ivo; Laurens, Francois; Chagné, David; Velasco, Riccardo; van de Weg, Eric; Troggio, Michela
2014-01-01
High-density SNP arrays for genome-wide assessment of allelic variation have made high resolution genetic characterization of crop germplasm feasible. A medium density array for apple, the IRSC 8K SNP array, has been successfully developed and used for screens of bi-parental populations. However, the number of robust and well-distributed markers contained on this array was not sufficient to perform genome-wide association analyses in wider germplasm sets, or Pedigree-Based Analysis at high precision, because of rapid decay of linkage disequilibrium. We describe the development of an Illumina Infinium array targeting 20K SNPs. The SNPs were predicted from re-sequencing data derived from the genomes of 13 Malus × domestica apple cultivars and one accession belonging to a crab apple species (M. micromalus). A pipeline for SNP selection was devised that avoided the pitfalls associated with the inclusion of paralogous sequence variants, supported the construction of robust multi-allelic SNP haploblocks and selected up to 11 entries within narrow genomic regions of ±5 kb, termed focal points (FPs). Broad genome coverage was attained by placing FPs at 1 cM intervals on a consensus genetic map, complementing them with FPs to enrich the ends of each of the chromosomes, and by bridging physical intervals greater than 400 Kbps. The selection also included ∼3.7K validated SNPs from the IRSC 8K array. The array has already been used in other studies where ∼15.8K SNP markers were mapped with an average of ∼6.8K SNPs per full-sib family. The newly developed array with its high density of polymorphic validated SNPs is expected to be of great utility for Pedigree-Based Analysis and Genomic Selection. It will also be a valuable tool to help dissect the genetic mechanisms controlling important fruit quality traits, and to aid the identification of marker-trait associations suitable for the application of Marker Assisted Selection in apple breeding programs. PMID:25303088
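The focal-point idea described above (up to 11 SNPs chosen within ±5 kb of a focal point) can be sketched as follows. The abstract does not say how candidates inside a window were ranked, so the closest-first ordering here is purely an assumption, as are the function name `select_focal_point_snps` and the use of plain base-pair positions as input.

```python
def select_focal_point_snps(candidates, focal_point, half_window=5_000, max_snps=11):
    """Pick up to max_snps candidate positions within +/- half_window of a focal point.

    candidates: iterable of SNP positions in bp on one chromosome (hypothetical input).
    Ranking by distance to the focal point is an illustrative assumption.
    """
    in_window = [p for p in candidates if abs(p - focal_point) <= half_window]
    # Closest candidates first; ties broken by position for determinism.
    in_window.sort(key=lambda p: (abs(p - focal_point), p))
    return sorted(in_window[:max_snps])


candidate_positions = range(0, 20_001, 500)
selected = select_focal_point_snps(candidate_positions, focal_point=10_000)
```

With candidates every 500 bp, 21 positions fall inside the window and the 11 nearest to the focal point are kept.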
Haraksingh, Rajini R.; Abyzov, Alexej; Gerstein, Mark; Urban, Alexander E.; Snyder, Michael
2011-01-01
Accurate and efficient genome-wide detection of copy number variants (CNVs) is essential for understanding human genomic variation, genome-wide CNV association type studies, cytogenetics research and diagnostics, and independent validation of CNVs identified from sequencing-based technologies. Numerous array-based platforms for CNV detection exist utilizing array Comparative Genome Hybridization (aCGH), Single Nucleotide Polymorphism (SNP) genotyping or both. We have quantitatively assessed the abilities of twelve leading genome-wide CNV detection platforms to accurately detect Gold Standard sets of CNVs in the genome of HapMap CEU sample NA12878, and found significant differences in performance. The technologies analyzed were the NimbleGen 4.2 M, 2.1 M and 3×720 K Whole Genome and CNV focused arrays, the Agilent 1×1 M CGH and High Resolution and 2×400 K CNV and SNP+CGH arrays, the Illumina Human Omni1Quad array and the Affymetrix SNP 6.0 array. The Gold Standards used were a 1000 Genomes Project sequencing-based set of 3997 validated CNVs and an ultra high-resolution aCGH-based set of 756 validated CNVs. We found that sensitivity, total number, size range and breakpoint resolution of CNV calls were highest for CNV focused arrays. Our results are important for cost effective CNV detection and validation for both basic and clinical applications. PMID:22140474
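Sensitivity against a gold-standard CNV set, as assessed in the abstract above, depends on a rule for deciding when a call matches a gold-standard CNV. The abstract does not state the rule used, so the sketch below assumes a 50% reciprocal-overlap criterion, which is a common convention rather than the authors' documented method. The function names and interval layout are hypothetical, and all intervals are assumed to lie on the same chromosome.

```python
def reciprocal_overlap(a, b):
    """Overlap length divided by the longer of two (start, end) intervals.

    A value >= t means the overlap covers at least fraction t of BOTH intervals.
    """
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    return overlap / max(a[1] - a[0], b[1] - b[0])


def sensitivity(gold, calls, min_overlap=0.5):
    """Fraction of gold-standard CNVs matched by at least one platform call."""
    detected = sum(
        1
        for g in gold
        if any(reciprocal_overlap(g, c) >= min_overlap for c in calls)
    )
    return detected / len(gold)


gold_standard = [(100, 200), (500, 700), (1000, 1100)]
platform_calls = [(110, 210), (650, 900)]
sens = sensitivity(gold_standard, platform_calls)
```

Here only the first gold-standard interval is matched: the second overlaps a call by 50 bp out of 250, which fails the 50% threshold, and the third has no overlapping call.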
Optimal design of low-density SNP arrays for genomic prediction: algorithm and applications
USDA-ARS's Scientific Manuscript database
Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for their optimal design. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optim...
Antanaviciute, Laima; Fernández-Fernández, Felicidad; Jansen, Johannes; Banchi, Elisa; Evans, Katherine M; Viola, Roberto; Velasco, Riccardo; Dunwell, Jim M; Troggio, Michela; Sargent, Daniel J
2012-05-25
A whole-genome genotyping array has previously been developed for Malus using SNP data from 28 Malus genotypes. This array offers the prospect of high throughput genotyping and linkage map development for any given Malus progeny. To test the applicability of the array for mapping in diverse Malus genotypes, we applied the array to the construction of a SNP-based linkage map of an apple rootstock progeny. Of the 7,867 Malus SNP markers on the array, 1,823 (23.2%) were heterozygous in one of the two parents of the progeny, 1,007 (12.8%) were heterozygous in both parental genotypes, whilst just 2.8% of the 921 Pyrus SNPs were heterozygous. A linkage map spanning 1,282.2 cM was produced comprising 2,272 SNP markers, 306 SSR markers and the S-locus. The length of the M432 linkage map was increased by 52.7 cM with the addition of the SNP markers, whilst marker density increased from 3.8 cM/marker to 0.5 cM/marker. Just three regions in excess of 10 cM remain where no markers were mapped. We compared the positions of the mapped SNP markers on the M432 map with their predicted positions on the 'Golden Delicious' genome sequence. A total of 311 markers (13.7% of all mapped markers) mapped to positions that conflicted with their predicted positions on the 'Golden Delicious' pseudo-chromosomes, indicating the presence of paralogous genomic regions or mis-assignments of genome sequence contigs during the assembly and anchoring of the genome sequence. We incorporated data for the 2,272 SNP markers onto the map of the M432 progeny and have presented the most complete and saturated map of the full 17 linkage groups of M. pumila to date. The data were generated rapidly in a high-throughput semi-automated pipeline, permitting significant savings in time and cost over linkage map construction using microsatellites. 
The application of the array will permit linkage maps to be developed for QTL analyses in a cost-effective manner, and the identification of SNPs that have been assigned erroneous positions on the 'Golden Delicious' reference sequence will assist in the continued improvement of the genome sequence assembly for that variety.
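The percentages and marker density quoted in the abstract above can be re-derived from the counts it reports. The short check below uses only those figures; the one assumption is that the S-locus is counted as a single marker alongside the 2,272 SNPs and 306 SSRs when computing density.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


het_one_parent = percent(1823, 7867)    # heterozygous in one parent
het_both_parents = percent(1007, 7867)  # heterozygous in both parents
conflicting = percent(311, 2272)        # mapped SNPs conflicting with predicted positions

# Marker density: map length divided by the number of mapped markers.
# Counting the S-locus as one marker is an assumption.
n_markers = 2272 + 306 + 1
density_cm_per_marker = round(1282.2 / n_markers, 1)
```

These reproduce the reported 23.2%, 12.8%, 13.7% and 0.5 cM/marker.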
Tumor Touch Imprints as Source for Whole Genome Analysis of Neuroblastoma Tumors
Brunner, Clemens; Brunner-Herglotz, Bettina; Ziegler, Andrea; Frech, Christian; Amann, Gabriele; Ladenstein, Ruth; Ambros, Inge M.; Ambros, Peter F.
2016-01-01
Introduction Tumor touch imprints (TTIs) are routinely used for the molecular diagnosis of neuroblastomas by interphase fluorescence in-situ hybridization (I-FISH). However, in order to facilitate a comprehensive, up-to-date molecular diagnosis of neuroblastomas and to identify new markers to refine risk and therapy stratification methods, whole genome approaches are needed. We examined the applicability of an ultra-high density SNP array platform that identifies copy number changes of varying sizes down to a few exons for the detection of genomic changes in tumor DNA extracted from TTIs. Material and Methods DNAs were extracted from TTIs of 46 neuroblastoma and 4 other pediatric tumors. The DNAs were analyzed on the Cytoscan HD SNP array platform to evaluate numerical and structural genomic aberrations. The quality of the data obtained from TTIs was compared to that from randomly chosen fresh or fresh frozen solid tumors (n = 212) and I-FISH validation was performed. Results SNP array profiles were obtained from 48 (out of 50) TTI DNAs of which 47 showed genomic aberrations. The high marker density allowed for single gene analysis, e.g. loss of nine exons in the ATRX gene and the visualization of chromothripsis. Data quality was comparable to fresh or fresh frozen tumor SNP profiles. SNP array results were confirmed by I-FISH. Conclusion TTIs are an excellent source for SNP array processing with the advantage of simple handling, distribution and storage of tumor tissue on glass slides. The minimal amount of tumor tissue needed to analyze whole genomes makes TTIs an economic surrogate source in the molecular diagnostic work up of tumor samples. PMID:27560999
2011-01-01
Background Integration of genomic variation with phenotypic information is an effective approach for uncovering genotype-phenotype associations. This requires an accurate identification of the different types of variation in individual genomes. Results We report the integration of the whole genome sequence of a single Holstein Friesian bull with data from single nucleotide polymorphism (SNP) and comparative genomic hybridization (CGH) array technologies to determine a comprehensive spectrum of genomic variation. The performance of resequencing SNP detection was assessed by combining SNPs that were identified to be either in identity by descent (IBD) or in copy number variation (CNV) with results from SNP array genotyping. Coding insertions and deletions (indels) were found to be enriched for size in multiples of 3 and were located near the N- and C-termini of proteins. For larger indels, a combination of split-read and read-pair approaches proved to be complementary in finding different signatures. CNVs were identified on the basis of the depth of sequenced reads, and by using SNP and CGH arrays. Conclusions Our results provide high resolution mapping of diverse classes of genomic variation in an individual bovine genome and demonstrate that structural variation surpasses sequence variation as the main component of genomic variability. Better accuracy of SNP detection was achieved with little loss of sensitivity when algorithms that implemented mapping quality were used. IBD regions were found to be instrumental for calculating resequencing SNP accuracy, while SNP detection within CNVs tended to be less reliable. CNV discovery was affected dramatically by platform resolution and coverage biases. The combined data for this study showed that at a moderate level of sequencing coverage, an ensemble of platforms and tools can be applied together to maximize the accurate detection of sequence and structural variants. PMID:22082336
Hoffmann, Thomas J; Zhan, Yiping; Kvale, Mark N; Hesselson, Stephanie E; Gollub, Jeremy; Iribarren, Carlos; Lu, Yontao; Mei, Gangwu; Purdy, Matthew M; Quesenberry, Charles; Rowell, Sarah; Shapero, Michael H; Smethurst, David; Somkin, Carol P; Van den Eeden, Stephen K; Walter, Larry; Webster, Teresa; Whitmer, Rachel A; Finn, Andrea; Schaefer, Catherine; Kwok, Pui-Yan; Risch, Neil
2011-12-01
Four custom Axiom genotyping arrays were designed for a genome-wide association (GWA) study of 100,000 participants from the Kaiser Permanente Research Program on Genes, Environment and Health. The array optimized for individuals of European race/ethnicity was previously described. Here we detail the development of three additional microarrays optimized for individuals of East Asian, African American, and Latino race/ethnicity. For these arrays, we decreased redundancy of high-performing SNPs to increase SNP capacity. The East Asian array was designed using greedy pairwise SNP selection. However, removing SNPs from the target set based on imputation coverage is more efficient than pairwise tagging. Therefore, we developed a novel hybrid SNP selection method for the African American and Latino arrays utilizing rounds of greedy pairwise SNP selection, followed by removal from the target set of SNPs covered by imputation. The arrays provide excellent genome-wide coverage and are valuable additions for large-scale GWA studies. Copyright © 2011 Elsevier Inc. All rights reserved.
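Greedy pairwise SNP selection, named in the abstract above, can be sketched as a set-cover loop: repeatedly choose the SNP that tags the most still-uncovered targets. This is a generic illustration, not the authors' implementation. The function name `greedy_tag_selection` and the input layout (a precomputed mapping from each candidate SNP to the set of SNPs it tags above some pairwise LD threshold) are assumptions.

```python
def greedy_tag_selection(tags, targets):
    """Greedy pairwise tagging.

    tags: dict mapping each candidate SNP to the set of SNPs it tags
          (including itself), assumed precomputed from pairwise LD.
    targets: iterable of SNPs that should be covered.
    Returns (chosen SNPs in selection order, targets left uncovered).
    """
    uncovered = set(targets)
    chosen = []
    while uncovered:
        # Iterate candidates in sorted order so ties are broken deterministically.
        best = max(sorted(tags), key=lambda s: len(tags[s] & uncovered))
        gained = tags[best] & uncovered
        if not gained:
            break  # remaining targets cannot be tagged by any candidate
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


example_tags = {
    "rs1": {"rs1", "rs2", "rs3"},
    "rs2": {"rs1", "rs2"},
    "rs3": {"rs1", "rs3"},
    "rs4": {"rs4", "rs5"},
    "rs5": {"rs4", "rs5"},
}
example_targets = ["rs1", "rs2", "rs3", "rs4", "rs5", "rs6"]
chosen, uncovered = greedy_tag_selection(example_tags, example_targets)
```

In terms of this sketch, the hybrid method described in the abstract would correspond to subtracting the SNPs covered by imputation from `targets` between rounds of greedy selection; how that coverage was computed is not given in the abstract.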
2012-01-01
Background A whole-genome genotyping array has previously been developed for Malus using SNP data from 28 Malus genotypes. This array offers the prospect of high throughput genotyping and linkage map development for any given Malus progeny. To test the applicability of the array for mapping in diverse Malus genotypes, we applied the array to the construction of a SNP-based linkage map of an apple rootstock progeny. Results Of the 7,867 Malus SNP markers on the array, 1,823 (23.2%) were heterozygous in one of the two parents of the progeny, 1,007 (12.8%) were heterozygous in both parental genotypes, whilst just 2.8% of the 921 Pyrus SNPs were heterozygous. A linkage map spanning 1,282.2 cM was produced comprising 2,272 SNP markers, 306 SSR markers and the S-locus. The length of the M432 linkage map was increased by 52.7 cM with the addition of the SNP markers, whilst marker density increased from 3.8 cM/marker to 0.5 cM/marker. Just three regions in excess of 10 cM remain where no markers were mapped. We compared the positions of the mapped SNP markers on the M432 map with their predicted positions on the ‘Golden Delicious’ genome sequence. A total of 311 markers (13.7% of all mapped markers) mapped to positions that conflicted with their predicted positions on the ‘Golden Delicious’ pseudo-chromosomes, indicating the presence of paralogous genomic regions or mis-assignments of genome sequence contigs during the assembly and anchoring of the genome sequence. Conclusions We incorporated data for the 2,272 SNP markers onto the map of the M432 progeny and have presented the most complete and saturated map of the full 17 linkage groups of M. pumila to date. The data were generated rapidly in a high-throughput semi-automated pipeline, permitting significant savings in time and cost over linkage map construction using microsatellites. 
The application of the array will permit linkage maps to be developed for QTL analyses in a cost-effective manner, and the identification of SNPs that have been assigned erroneous positions on the ‘Golden Delicious’ reference sequence will assist in the continued improvement of the genome sequence assembly for that variety. PMID:22631220
Development and validation of a high density SNP genotyping array for Atlantic salmon (Salmo salar).
Houston, Ross D; Taggart, John B; Cézard, Timothé; Bekaert, Michaël; Lowe, Natalie R; Downing, Alison; Talbot, Richard; Bishop, Stephen C; Archibald, Alan L; Bron, James E; Penman, David J; Davassi, Alessandro; Brew, Fiona; Tinch, Alan E; Gharbi, Karim; Hamilton, Alastair
2014-02-06
Dense single nucleotide polymorphism (SNP) genotyping arrays provide extensive information on polymorphic variation across the genome of species of interest. Such information can be used in studies of the genetic architecture of quantitative traits and to improve the accuracy of selection in breeding programs. In Atlantic salmon (Salmo salar), these goals are currently hampered by the lack of a high-density SNP genotyping platform. Therefore, the aim of the study was to develop and test a dense Atlantic salmon SNP array. SNP discovery was performed using extensive deep sequencing of Reduced Representation (RR-Seq), Restriction site-Associated DNA (RAD-Seq) and mRNA (RNA-Seq) libraries derived from farmed and wild Atlantic salmon samples (n = 283) resulting in the discovery of > 400 K putative SNPs. An Affymetrix Axiom® myDesign Custom Array was created and tested on samples of animals of wild and farmed origin (n = 96) revealing a total of 132,033 polymorphic SNPs with high call rate, good cluster separation on the array and stable Mendelian inheritance in our sample. At least 38% of these SNPs are from transcribed genomic regions and therefore more likely to include functional variants. Linkage analysis utilising the lack of male recombination in salmonids allowed the mapping of 40,214 SNPs distributed across all 29 pairs of chromosomes, highlighting the extensive genome-wide coverage of the SNPs. An identity-by-state clustering analysis revealed that the array can clearly distinguish between fish of different origins, within and between farmed and wild populations. Finally, Y-chromosome-specific probes included on the array provide an accurate molecular genetic test for sex. This manuscript describes the first high-density SNP genotyping array for Atlantic salmon. 
This array will be publicly available and is likely to be used as a platform for high-resolution genetics research into traits of evolutionary and economic importance in salmonids and in aquaculture breeding programs via genomic selection.
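The identity-by-state (IBS) clustering mentioned in the abstract above starts from a pairwise similarity between samples. A minimal sketch of one common definition follows, assuming biallelic genotypes coded as allele counts (0, 1 or 2) with `None` for missing calls. The coding, the function name `ibs_similarity`, and the averaging scheme are assumptions; the abstract does not specify the software or the metric used.

```python
def ibs_similarity(g1, g2):
    """Mean proportion of alleles shared identical-by-state between two samples.

    g1, g2: equal-length sequences of genotypes coded 0/1/2 (count of one allele),
    with None for a missing call. Loci missing in either sample are skipped.
    """
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        shared += 2 - abs(a - b)  # 2, 1 or 0 alleles shared at this locus
        compared += 1
    if compared == 0:
        raise ValueError("no loci with calls in both samples")
    return shared / (2 * compared)


fish_a = [0, 1, 2, 2, None]
fish_b = [0, 2, 0, 2, 1]
similarity = ibs_similarity(fish_a, fish_b)
```

A matrix of such pairwise values (or one minus them, as distances) is what a clustering step would take as input.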
Development and validation of a high density SNP genotyping array for Atlantic salmon (Salmo salar)
2014-01-01
Background Dense single nucleotide polymorphism (SNP) genotyping arrays provide extensive information on polymorphic variation across the genome of species of interest. Such information can be used in studies of the genetic architecture of quantitative traits and to improve the accuracy of selection in breeding programs. In Atlantic salmon (Salmo salar), these goals are currently hampered by the lack of a high-density SNP genotyping platform. Therefore, the aim of the study was to develop and test a dense Atlantic salmon SNP array. Results SNP discovery was performed using extensive deep sequencing of Reduced Representation (RR-Seq), Restriction site-Associated DNA (RAD-Seq) and mRNA (RNA-Seq) libraries derived from farmed and wild Atlantic salmon samples (n = 283) resulting in the discovery of > 400 K putative SNPs. An Affymetrix Axiom® myDesign Custom Array was created and tested on samples of animals of wild and farmed origin (n = 96) revealing a total of 132,033 polymorphic SNPs with high call rate, good cluster separation on the array and stable Mendelian inheritance in our sample. At least 38% of these SNPs are from transcribed genomic regions and therefore more likely to include functional variants. Linkage analysis utilising the lack of male recombination in salmonids allowed the mapping of 40,214 SNPs distributed across all 29 pairs of chromosomes, highlighting the extensive genome-wide coverage of the SNPs. An identity-by-state clustering analysis revealed that the array can clearly distinguish between fish of different origins, within and between farmed and wild populations. Finally, Y-chromosome-specific probes included on the array provide an accurate molecular genetic test for sex. Conclusions This manuscript describes the first high-density SNP genotyping array for Atlantic salmon. 
This array will be publicly available and is likely to be used as a platform for high-resolution genetics research into traits of evolutionary and economic importance in salmonids and in aquaculture breeding programs via genomic selection. PMID:24524230
Development and Validation of a High-Density SNP Genotyping Array for African Oil Palm.
Kwong, Qi Bin; Teh, Chee Keng; Ong, Ai Ling; Heng, Huey Ying; Lee, Heng Leng; Mohamed, Mohaimi; Low, Joel Zi-Bin; Apparow, Sukganah; Chew, Fook Tim; Mayes, Sean; Kulaveerasingam, Harikrishna; Tammi, Martti; Appleton, David Ross
2016-08-01
High-density single nucleotide polymorphism (SNP) genotyping arrays are powerful tools that can measure the level of genetic polymorphism within a population. To develop a whole-genome SNP array for oil palms, SNP discovery was performed using deep resequencing of eight libraries derived from 132 Elaeis guineensis and Elaeis oleifera palms belonging to 59 origins, resulting in the discovery of >3 million putative SNPs. After SNP filtering, the Illumina OP200K custom array was built with 170 860 successful probes. Phenetic clustering analysis revealed that the array could distinguish between palms of different origins in a way consistent with pedigree records. Genome-wide linkage disequilibrium declined more slowly for the commercial populations (ranging from 120 kb at r² = 0.43 to 146 kb at r² = 0.50) when compared with the semi-wild populations (19.5 kb at r² = 0.22). Genetic fixation mapping comparing the semi-wild and commercial population identified 321 selective sweeps. A genome-wide association study (GWAS) detected a significant peak on chromosome 2 associated with the polygenic component of the shell thickness trait (based on the trait shell-to-fruit; S/F %) in tenera palms. Testing of a genomic selection model on the same trait resulted in good prediction accuracy (r = 0.65) with 42% of the S/F % variation explained. The first high-density SNP genotyping array for oil palm has been developed and shown to be robust for use in genetic studies and with potential for developing early trait prediction to shorten the oil palm breeding cycle. Copyright © 2016 The Author. Published by Elsevier Inc. All rights reserved.
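The linkage disequilibrium values quoted above are squared allelic correlations between pairs of SNPs. As a hedged illustration of that statistic only, the sketch below computes it from phased two-locus haplotypes with alleles coded 0/1. Array genotypes are unphased, so a real analysis would estimate haplotype frequencies or use genotype correlations; the abstract does not say which tool was used, and the function name `ld_r2` is hypothetical.

```python
def ld_r2(haplotypes):
    """Squared correlation between alleles at two loci.

    haplotypes: list of (allele_a, allele_b) pairs, alleles coded 0 or 1
    (phased data is an assumption made for this illustration).
    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), with D = pAB - pA * pB.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("at least one locus is monomorphic")
    d = p_ab - p_a * p_b
    return d * d / denom


complete_ld = ld_r2([(1, 1), (1, 1), (0, 0), (0, 0)])
no_ld = ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)])
```

Plotting this value against the physical distance between SNP pairs gives the decay curves from which distances such as those in the abstract are read.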
Hulse-Kemp, Amanda M.; Lemm, Jana; Plieske, Joerg; Ashrafi, Hamid; Buyyarapu, Ramesh; Fang, David D.; Frelichowski, James; Giband, Marc; Hague, Steve; Hinze, Lori L.; Kochan, Kelli J.; Riggs, Penny K.; Scheffler, Jodi A.; Udall, Joshua A.; Ulloa, Mauricio; Wang, Shirley S.; Zhu, Qian-Hao; Bag, Sumit K.; Bhardwaj, Archana; Burke, John J.; Byers, Robert L.; Claverie, Michel; Gore, Michael A.; Harker, David B.; Islam, Md S.; Jenkins, Johnie N.; Jones, Don C.; Lacape, Jean-Marc; Llewellyn, Danny J.; Percy, Richard G.; Pepper, Alan E.; Poland, Jesse A.; Mohan Rai, Krishan; Sawant, Samir V.; Singh, Sunil Kumar; Spriggs, Andrew; Taylor, Jen M.; Wang, Fei; Yourstone, Scott M.; Zheng, Xiuting; Lawley, Cindy T.; Ganal, Martin W.; Van Deynze, Allen; Wilson, Iain W.; Stelly, David M.
2015-01-01
High-throughput genotyping arrays provide a standardized resource for plant breeding communities that is useful for a breadth of applications including high-density genetic mapping, genome-wide association studies (GWAS), genomic selection (GS), complex trait dissection, and studying patterns of genomic diversity among cultivars and wild accessions. We have developed the CottonSNP63K, an Illumina Infinium array containing assays for 45,104 putative intraspecific single nucleotide polymorphism (SNP) markers for use within the cultivated cotton species Gossypium hirsutum L. and 17,954 putative interspecific SNP markers for use with crosses of other cotton species with G. hirsutum. The SNPs on the array were developed from 13 different discovery sets that represent a diverse range of G. hirsutum germplasm and five other species: G. barbadense L., G. tomentosum Nuttal ex Seemann, G. mustelinum Miers ex Watt, G. armourianum Kearny, and G. longicalyx J.B. Hutchinson and Lee. The array was validated with 1,156 samples to generate cluster positions to facilitate automated analysis of 38,822 polymorphic markers. Two high-density genetic maps containing a total of 22,829 SNPs were generated for two F2 mapping populations, one intraspecific and one interspecific, and 3,533 SNP markers were co-occurring in both maps. The produced intraspecific genetic map is the first saturated map that associates into 26 linkage groups corresponding to the number of cotton chromosomes for a cross between two G. hirsutum lines. The linkage maps were shown to have high levels of collinearity to the JGI G. raimondii Ulbrich reference genome sequence. The CottonSNP63K array, cluster file and associated marker sequences constitute a major new resource for the global cotton research community. PMID:25908569
Tang, Shaohua; Lv, Jiaojiao; Chen, Xiangnan; Bai, Lili; Li, Huanzheng; Chen, Chong; Wang, Ping; Xu, Xueqin; Lu, Jianxin
2016-01-01
To evaluate the usefulness of single-nucleotide polymorphism (SNP) array for prenatal genetic diagnosis of congenital heart defect (CHD), we used this approach to detect clinically significant copy number variants (CNVs) in fetuses with CHDs. A HumanCytoSNP-12 array was used to analyze genomic samples obtained from 39 fetuses that exhibited cardiovascular abnormalities on ultrasound and had a normal karyotype. The relationship between CNVs and CHDs was identified by using genotype-phenotype comparisons and searching of chromosomal databases. All clinically significant CNVs were confirmed by real-time PCR. CNVs were detected in 38/39 (97.4%) fetuses: variants of unknown significance were detected in 2/39 (5.1%), and clinically significant CNVs were identified in 7/39 (17.9%). In 3 of the 7 fetuses with clinically significant CNVs, 3 rare and previously undescribed CNVs were detected, and these CNVs encompassed the CHD candidate genes FLNA (Xq28 dup), BCOR (Xp11.4 dup), and RBL2 (16q12.2 del). Compared with conventional cytogenetic genomics, SNP array analysis provides significantly improved detection of submicroscopic genomic aberrations in pregnancies with CHDs. Based on these results, we propose that genomic SNP array is an effective method which could be used in the prenatal diagnostic test to assist genetic counseling for pregnancies with CHDs. © 2015 S. Karger AG, Basel.
Scalabrin, Simone; Gilmore, Barbara; Lawley, Cynthia T.; Gasic, Ksenija; Micheletti, Diego; Rosyara, Umesh R.; Cattonaro, Federica; Vendramin, Elisa; Main, Dorrie; Aramini, Valeria; Blas, Andrea L.; Mockler, Todd C.; Bryant, Douglas W.; Wilhelm, Larry; Troggio, Michela; Sosinski, Bryon; Aranzana, Maria José; Arús, Pere; Iezzoni, Amy; Morgante, Michele; Peace, Cameron
2012-01-01
Although a large number of single nucleotide polymorphism (SNP) markers covering the entire genome are needed to enable molecular breeding efforts such as genome wide association studies, fine mapping, genomic selection and marker-assisted selection in peach [Prunus persica (L.) Batsch] and related Prunus species, only a limited number of genetic markers, including simple sequence repeats (SSRs), have been available to date. To address this need, an international consortium (The International Peach SNP Consortium; IPSC) has pursued a coordinated effort to perform genome-scale SNP discovery in peach using next generation sequencing platforms to develop and characterize a high-throughput Illumina Infinium® SNP genotyping array platform. We performed whole genome re-sequencing of 56 peach breeding accessions using the Illumina and Roche/454 sequencing technologies. Polymorphism detection algorithms identified a total of 1,022,354 SNPs. Validation with the Illumina GoldenGate® assay was performed on a subset of the predicted SNPs, verifying ∼75% of genic (exonic and intronic) SNPs, whereas only about a third of intergenic SNPs were verified. Conservative filtering was applied to arrive at a set of 8,144 SNPs that were included on the IPSC peach SNP array v1, distributed over all eight peach chromosomes with an average spacing of 26.7 kb between SNPs. Use of this platform to screen a total of 709 accessions of peach in two separate evaluation panels identified a total of 6,869 (84.3%) polymorphic SNPs. The almost 7,000 SNPs verified as polymorphic through extensive empirical evaluation represent an excellent source of markers for future studies in genetic relatedness, genetic mapping, and dissecting the genetic architecture of complex agricultural traits. The IPSC peach SNP array v1 is commercially available and we expect that it will be used worldwide for genetic studies in peach and related stone fruit and nut species. PMID:22536421
SNPConvert: SNP Array Standardization and Integration in Livestock Species.
Nicolazzi, Ezequiel Luis; Marras, Gabriele; Stella, Alessandra
2016-06-09
One of the main advantages of single nucleotide polymorphism (SNP) array technology is providing genotype calls for a specific number of SNP markers at a relatively low cost. Since its first application in animal genetics, the number of available SNP arrays for each species has been constantly increasing. However, in contrast to what is observed in whole genome sequence data analysis, SNP array data does not have a common set of file formats or coding conventions for allele calling. Therefore, the standardization and integration of SNP array data from multiple sources have become an obstacle, especially for users with basic or no programming skills. Here, we describe the difficulties related to handling SNP array data, focusing on file formats, SNP allele coding, and mapping. We also present SNPConvert suite, a multi-platform, open-source, and user-friendly set of tools to overcome these issues. This tool, which can be integrated with open-source and open-access tools already available, is a first step towards an integrated system to standardize and integrate any type of raw SNP array data. The tool is available at: https://github.com/nicolazzie/SNPConvert.git.
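To make the allele-coding problem described above concrete, the sketch below converts a two-letter nucleotide genotype call into a count of one chosen allele, a numeric coding that many downstream analyses expect. It does not reproduce SNPConvert's interface or behavior, which the abstract does not detail. The function name `to_allele_count`, the `"--"` missing-call symbol, and the two-character genotype layout are all assumptions for illustration.

```python
def to_allele_count(genotype, counted_allele, missing="--"):
    """Convert a two-letter genotype (e.g. "AG") to a 0/1/2 allele count.

    genotype: string of two allele letters (hypothetical input format).
    counted_allele: the allele whose copies are counted.
    Returns None for a missing call.
    """
    if genotype == missing:
        return None
    if len(genotype) != 2:
        raise ValueError(f"expected a two-letter genotype, got {genotype!r}")
    return sum(1 for allele in genotype if allele == counted_allele)


raw_calls = ["AA", "AG", "GG", "--"]
coded = [to_allele_count(g, counted_allele="A") for g in raw_calls]
```

The choice of which allele to count, and which strand the letters refer to, is exactly the kind of convention that differs between array sources and has to be harmonized before data sets are merged.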
Nicolazzi, Ezequiel L; Caprera, Andrea; Nazzicari, Nelson; Cozzi, Paolo; Strozzi, Francesco; Lawley, Cindy; Pirani, Ali; Soans, Chandrasen; Brew, Fiona; Jorjani, Hossein; Evans, Gary; Simpson, Barry; Tosser-Klopp, Gwenola; Brauning, Rudiger; Williams, John L; Stella, Alessandra
2015-04-10
In recent years, the use of genomic information in livestock species for genetic improvement, association studies and many other fields has become routine. In order to accommodate different market requirements in terms of genotyping cost, manufacturers of single nucleotide polymorphism (SNP) arrays, private companies and international consortia have developed a large number of arrays with different content and different SNP density. The number of currently available SNP arrays differs among species, ranging from one for goats to more than ten for cattle, and the number of arrays available is increasing rapidly. However, there has been limited or no effort to standardize and integrate array-specific (e.g. SNP IDs, allele coding) and species-specific (i.e. past and current assemblies) SNP information. Here we present SNPchiMp v.3, a solution to these issues for the six major livestock species (cow, pig, horse, sheep, goat and chicken). Original data were collected directly from SNP array producers and specific international genome consortia, and stored in a MySQL database. The database was then linked to an open-access web tool and to public databases. SNPchiMp v.3 ensures fast access to the database (retrieving within/across SNP array data) and the possibility of annotating SNP array data in a user-friendly fashion. This platform allows easy integration and standardization, and it is aimed at both industry and research. It also enables users to easily link the information available from the array producer with data in public databases, without the need for additional bioinformatics tools or pipelines. In recognition of the open-access use of Ensembl resources, SNPchiMp v.3 was officially credited as an Ensembl E!mpowered tool. Available at http://bioinformatics.tecnoparco.org/SNPchimp.
Fine mapping of copy number variations on two cattle genome assemblies using high density SNP array
USDA-ARS's Scientific Manuscript database
Btau_4.0 and UMD3.1 are two distinct cattle reference genome assemblies. In our previous study using the low density BovineSNP50 array, we reported a copy number variation (CNV) analysis on Btau_4.0 with 521 animals of 21 cattle breeds, yielding 682 CNV regions with a total length of 139.8 megabases...
Novel applications of array comparative genomic hybridization in molecular diagnostics.
Cheung, Sau W; Bi, Weimin
2018-05-31
In 2004, the implementation of array comparative genomic hybridization (array CGH) into clinical practice marked a new milestone for genetic diagnosis. Array CGH and single-nucleotide polymorphism (SNP) arrays enable genome-wide detection of copy number changes at high resolution, and therefore microarray analysis has been recognized as the first-tier test for patients with intellectual disability or multiple congenital anomalies, and has also been applied prenatally for detection of clinically relevant copy number variations in the fetus. Areas covered: In this review, the authors summarize the evolution of array CGH technology from their diagnostic laboratory, highlighting exonic SNP arrays developed in the past decade which detect small intragenic copy number changes as well as large DNA segments for the region of heterozygosity. The applications of array CGH to human diseases with different modes of inheritance, with emphasis on autosomal recessive disorders, are discussed. Expert commentary: An exonic array is a powerful and highly efficient clinical tool for detecting genome-wide small copy number variants in both dominant and recessive disorders. However, whole-genome sequencing may become the single integrated platform for detection of copy number changes, single-nucleotide changes as well as balanced chromosomal rearrangements in the near future.
Discovery of 100K SNP array and its utilization in sugarcane
USDA-ARS's Scientific Manuscript database
Next generation sequencing (NGS) enables us to identify thousands of single nucleotide polymorphism (SNP) markers for genotyping and fingerprinting. However, the process requires very precise bioinformatics analysis and filtering. High throughput SNP array with predefined genomic location co...
Genome-Wide SNP Detection, Validation, and Development of an 8K SNP Array for Apple
Chagné, David; Crowhurst, Ross N.; Troggio, Michela; Davey, Mark W.; Gilmore, Barbara; Lawley, Cindy; Vanderzande, Stijn; Hellens, Roger P.; Kumar, Satish; Cestaro, Alessandro; Velasco, Riccardo; Main, Dorrie; Rees, Jasper D.; Iezzoni, Amy; Mockler, Todd; Wilhelm, Larry; Van de Weg, Eric; Gardiner, Susan E.; Bassil, Nahla; Peace, Cameron
2012-01-01
As high-throughput genetic marker screening systems are essential for a range of genetics studies and plant breeding applications, the International RosBREED SNP Consortium (IRSC) has utilized the Illumina Infinium® II system to develop a medium- to high-throughput SNP screening tool for genome-wide evaluation of allelic variation in apple (Malus×domestica) breeding germplasm. For genome-wide SNP discovery, 27 apple cultivars were chosen to represent worldwide breeding germplasm and re-sequenced at low coverage with the Illumina Genome Analyzer II. Following alignment of these sequences to the whole genome sequence of ‘Golden Delicious’, SNPs were identified using SoapSNP. A total of 2,113,120 SNPs were detected, corresponding to one SNP to every 288 bp of the genome. The Illumina GoldenGate® assay was then used to validate a subset of 144 SNPs with a range of characteristics, using a set of 160 apple accessions. This validation assay enabled fine-tuning of the final subset of SNPs for the Illumina Infinium® II system. The set of stringent filtering criteria developed allowed choice of a set of SNPs that not only exhibited an even distribution across the apple genome and a range of minor allele frequencies to ensure utility across germplasm, but also were located in putative exonic regions to maximize genotyping success rate. A total of 7867 apple SNPs was established for the IRSC apple 8K SNP array v1, of which 5554 were polymorphic after evaluation in segregating families and a germplasm collection. This publicly available genomics resource will provide an unprecedented resolution of SNP haplotypes, which will enable marker-locus-trait association discovery, description of the genetic architecture of quantitative traits, investigation of genetic variation (neutral and functional), and genomic selection in apple. PMID:22363718
Brown, Allan F; Yousef, Gad G; Chebrolu, Kranthi K; Byrd, Robert W; Everhart, Koyt W; Thomas, Aswathy; Reid, Robert W; Parkin, Isobel A P; Sharpe, Andrew G; Oliver, Rebekah; Guzman, Ivette; Jackson, Eric W
2014-09-01
A high-resolution genetic linkage map of B. oleracea was developed from a B. napus SNP array. The work will facilitate genetic and evolutionary studies in Brassicaceae. A broccoli population, VI-158 × BNC, consisting of 150 F2:3 families was used to create a saturated Brassica oleracea (diploid: CC) linkage map using a recently developed rapeseed (Brassica napus) (tetraploid: AACC) Illumina Infinium single nucleotide polymorphism (SNP) array. The map consisted of 547 non-redundant SNP markers spanning 948.1 cM across nine chromosomes with an average interval size of 1.7 cM. As the SNPs are anchored to the genomic reference sequence of the rapid cycling B. oleracea TO1000, we were able to estimate that the map provides 96% coverage of the diploid genome. Carotenoid analysis of 2 years of data identified 3 QTLs on two chromosomes that are associated with up to half of the phenotypic variation in the accumulation of total or individual compounds. By searching the genome sequences of the two related diploid species (B. oleracea and B. rapa), we further identified putative carotenoid candidate genes in the region of these QTLs. This is the first description of the use of a B. napus SNP array to rapidly construct high-density genetic linkage maps of one of the constituent diploid species. The unambiguous nature of these markers with regard to genomic sequences provides evidence as to the nature of the genes underlying the QTL, and demonstrates the value and impact this resource will have on Brassica research.
Gutierrez, Alejandro P; Turner, Frances; Gharbi, Karim; Talbot, Richard; Lowe, Natalie R; Peñaloza, Carolina; McCullough, Mark; Prodöhl, Paulo A; Bean, Tim P; Houston, Ross D
2017-07-05
SNP arrays are enabling tools for high-resolution studies of the genetic basis of complex traits in farmed and wild animals. Oysters are of critical importance in many regions from both an ecological and economic perspective, and oyster aquaculture forms a key component of global food security. The aim of our study was to design a combined-species, medium density SNP array for Pacific oyster (Crassostrea gigas) and European flat oyster (Ostrea edulis), and to test the performance of this array on farmed and wild populations from multiple locations, with a focus on European populations. SNP discovery was carried out by whole-genome sequencing (WGS) of pooled genomic DNA samples from eight C. gigas populations, and restriction site-associated DNA sequencing (RAD-Seq) of 11 geographically diverse O. edulis populations. Nearly 12 million candidate SNPs were discovered and filtered based on several criteria, including preference for SNPs segregating in multiple populations and SNPs with monomorphic flanking regions. An Affymetrix Axiom Custom Array was created and tested on a diverse set of samples (n = 219) showing ∼27 K high quality SNPs for C. gigas and ∼11 K high quality SNPs for O. edulis segregating in these populations. A high proportion of SNPs were segregating in each of the populations, and the array was used to detect population structure and levels of linkage disequilibrium (LD). Further testing of the array on three C. gigas nuclear families (n = 165) revealed that the array can be used to clearly distinguish between both families based on identity-by-state (IBS) clustering parental assignment software. This medium density, combined-species array will be publicly available through Affymetrix, and will be applied for genome-wide association and evolutionary genetic studies, and for genomic selection in oyster breeding programs. Copyright © 2017 Gutierrez et al.
Construction of a versatile SNP array for pyramiding useful genes of rice.
Kurokawa, Yusuke; Noda, Tomonori; Yamagata, Yoshiyuki; Angeles-Shim, Rosalyn; Sunohara, Hidehiko; Uehara, Kanako; Furuta, Tomoyuki; Nagai, Keisuke; Jena, Kshirod Kumar; Yasui, Hideshi; Yoshimura, Atsushi; Ashikari, Motoyuki; Doi, Kazuyuki
2016-01-01
DNA marker-assisted selection (MAS) has become an indispensable component of breeding. Single nucleotide polymorphisms (SNPs) are the most frequent polymorphism in the rice genome. However, SNP markers are not readily employed in MAS because of limitations in genotyping platforms. Here the authors report a Golden Gate SNP array that targets specific genes controlling yield-related traits and biotic stress resistance in rice. As a first step, the SNP genotypes were surveyed in 31 parental varieties using the Affymetrix Rice 44K SNP microarray. The haplotype information for 16 target genes was then converted to the Golden Gate platform with 143-plex markers. Haplotypes for the 14 useful alleles are unique and can discriminate among all other varieties. The genotyping consistency between the Affymetrix microarray and the Golden Gate array was 92.8%, and the accuracy of the Golden Gate array was confirmed in 3 F2 segregating populations. The concept of haplotype-based selection using the constructed SNP array was demonstrated. Copyright © 2015 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
Comparison between genotyping by sequencing and SNP-chip genotyping in QTL mapping in wheat
USDA-ARS's Scientific Manuscript database
Array- or chip-based single nucleotide polymorphism (SNP) markers are widely used in genomic studies because of their abundance in a genome and lower cost per data point compared to older marker technologies. Genotyping by sequencing (GBS), a relatively newer approach of genotyping, suggests equal or...
Unterseer, Sandra; Bauer, Eva; Haberer, Georg; Seidel, Michael; Knaak, Carsten; Ouzunova, Milena; Meitinger, Thomas; Strom, Tim M; Fries, Ruedi; Pausch, Hubert; Bertani, Christofer; Davassi, Alessandro; Mayer, Klaus Fx; Schön, Chris-Carolin
2014-09-29
High density genotyping data are indispensable for genomic analyses of complex traits in animal and crop species. Maize is one of the most important crop plants worldwide; however, a high density SNP genotyping array for analysis of its large and highly dynamic genome has not been available so far. We developed a high density maize SNP array composed of 616,201 variants (SNPs and small indels). Initially, 57 M variants were discovered by sequencing 30 representative temperate maize lines and then stringently filtered for sequence quality scores and predicted conversion performance on the array resulting in the selection of 1.2 M polymorphic variants assayed on two screening arrays. To identify high-confidence variants, 285 DNA samples from a broad genetic diversity panel of worldwide maize lines including the samples used for sequencing, important founder lines for European maize breeding, hybrids, and proprietary samples with European, US, semi-tropical, and tropical origin were used for experimental validation. We selected 616 k variants according to their performance during validation, support of genotype calls through sequencing data, and physical distribution for further analysis and for the design of the commercially available Affymetrix® Axiom® Maize Genotyping Array. This array is composed of 609,442 SNPs and 6,759 indels. Among these are 116,224 variants in coding regions and 45,655 SNPs of the Illumina® MaizeSNP50 BeadChip for study comparison. In a subset of 45,974 variants, apart from the target SNP additional off-target variants are detected, which show only a minor bias towards intermediate allele frequencies. We performed principal coordinate and admixture analyses to determine the ability of the array to detect and resolve population structure and investigated the extent of LD within a worldwide validation panel.
The high density Affymetrix® Axiom® Maize Genotyping Array is optimized for European and American temperate maize and was developed based on a diverse sample panel by applying stringent quality filter criteria to ensure its suitability for a broad range of applications. With 600 k variants it is the largest currently publicly available genotyping array in crop species.
Interim report on updated microarray probes for the LLNL Burkholderia pseudomallei SNP array
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gardner, S; Jaing, C
2012-03-27
The overall goal of this project is to forensically characterize 100 unknown Burkholderia isolates in the US-Australia collaboration. We will identify genome-wide single nucleotide polymorphisms (SNPs) from B. pseudomallei and near neighbor species including B. mallei, B. thailandensis and B. oklahomensis. We will design microarray probes to detect these SNP markers and analyze 100 Burkholderia genomic DNAs extracted from environmental, clinical and near neighbor isolates from Australian collaborators on the Burkholderia SNP microarray. We will analyze the microarray genotyping results to characterize the genetic diversity of these new isolates and triage the samples for whole genome sequencing. In this interim report, we described the SNP analysis and the microarray probe design for the Burkholderia SNP microarray.
USDA-ARS's Scientific Manuscript database
High-density single nucleotide polymorphism (SNP) genotyping chips are a powerful tool for studying genomic patterns of diversity, inferring ancestral relationships among individuals in populations and studying marker-trait associations in mapping experiments. We developed a genotyping array includ...
Nie, Bei; Yang, Min; Fu, Weiling; Liang, Zhiqing
2015-07-07
The surface invasive cleavage assay, because of its innate accuracy and ability for self-signal amplification, provides a potential route for the mapping of hundreds of thousands of human SNP sites. However, its performance on a high density DNA array has not yet been established, due to the unusual "hairpin" probe design on the microarray and the lack of chemical stability of commercially available substrates. Here we present an applicable method to implement a nanocrystalline diamond thin film as an alternative substrate for fabricating an addressable DNA array using maskless light-directed photochemistry, producing the most chemically stable and biocompatible system for genetic analysis and enzymatic reactions. The surface invasive cleavage reaction, followed by degenerated primer ligation and post-rolling circle amplification is consecutively performed on the addressable diamond DNA array, accurately mapping SNP sites from PCR-amplified human genomic target DNA. Furthermore, a specially-designed DNA array containing dual probes in the same pixel is fabricated by following a reverse light-directed DNA synthesis protocol. This essentially enables us to decipher thousands of SNP alleles in a single-pot reaction by the simple addition of enzyme, target and reaction buffers.
Delaneau, Olivier; Marchini, Jonathan
2014-06-13
A major use of the 1000 Genomes Project (1000 GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000 GP haplotype reference set for use by the human genetic community. Using a set of validation genotypes at SNP and bi-allelic indels we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants.
Evaluation of Genomic Instability in the Abnormal Prostate
2006-12-01
array CGH maps copy number aberrations relative to the genome sequence by using arrays of BAC or cDNA clones as the hybridization target instead of...data produced from these analyses complicate the interpretation of results. For these reasons, and as outlined by Davies et al., 22 it is desirable...There have been numerous studies of these abnormalities and several techniques, including 9 chromosome painting, array CGH and SNP arrays, have
Hernandez-Ferrer, Carles; Quintela Garcia, Ines; Danielski, Katharina; Carracedo, Ángel; Pérez-Jurado, Luis A; González, Juan R
2015-05-20
The well-known Genome-Wide Association Studies (GWAS) have led to many scientific discoveries using SNP data. Even so, they have not been able to explain the full heritability of complex diseases. Now, other structural variants such as copy number variants or DNA inversions, either germ-line or in mosaicism events, are being studied. We present the R package affy2sv to pre-process Affymetrix CytoScan HD/750k arrays (also Genome-Wide SNP 5.0/6.0 and Axiom) in structural variant studies. We illustrate the capabilities of affy2sv using two different complete pipelines on real data: the first performs a GWAS and a mosaic alteration detection study, and the other detects CNVs and performs inversion calling. Both examples presented in the article show how affy2sv can be used as part of more complex pipelines aimed at analyzing Affymetrix SNP array data in genetic association studies, where different types of structural variants are considered.
Peace, Cameron; Bassil, Nahla; Main, Dorrie; Ficklin, Stephen; Rosyara, Umesh R.; Stegmeir, Travis; Sebolt, Audrey; Gilmore, Barbara; Lawley, Cindy; Mockler, Todd C.; Bryant, Douglas W.; Wilhelm, Larry; Iezzoni, Amy
2012-01-01
High-throughput genome scans are important tools for genetic studies and breeding applications. Here, a 6K SNP array for use with the Illumina Infinium® system was developed for diploid sweet cherry (Prunus avium) and allotetraploid sour cherry (P. cerasus). This effort was led by RosBREED, a community initiative to enable marker-assisted breeding for rosaceous crops. Next-generation sequencing in diverse breeding germplasm provided 25 billion basepairs (Gb) of cherry DNA sequence from which were identified genome-wide SNPs for sweet cherry and for the two sour cherry subgenomes derived from sweet cherry (avium subgenome) and P. fruticosa (fruticosa subgenome). Anchoring to the peach genome sequence, recently released by the International Peach Genome Initiative, predicted relative physical locations of the 1.9 million putative SNPs detected, preliminarily filtered to 368,943 SNPs. Further filtering was guided by results of a 144-SNP subset examined with the Illumina GoldenGate® assay on 160 accessions. A 6K Infinium® II array was designed with SNPs evenly spaced genetically across the sweet and sour cherry genomes. SNPs were developed for each sour cherry subgenome by using minor allele frequency in the sour cherry detection panel to enrich for subgenome-specific SNPs followed by targeting to either subgenome according to alleles observed in sweet cherry. The array was evaluated using panels of sweet (n = 269) and sour (n = 330) cherry breeding germplasm. Approximately one third of array SNPs were informative for each crop. A total of 1825 polymorphic SNPs were verified in sweet cherry, 13% of these originally developed for sour cherry. Allele dosage was resolved for 2058 polymorphic SNPs in sour cherry, one third of these being originally developed for sweet cherry. 
This publicly available genomics resource represents a significant advance in cherry genome-scanning capability that will accelerate marker-locus-trait association discovery, genome structure investigation, and genetic diversity assessment in this diploid-tetraploid crop group. PMID:23284615
Tsai, Hsin Y; Robledo, Diego; Lowe, Natalie R; Bekaert, Michael; Taggart, John B; Bron, James E; Houston, Ross D
2016-07-07
High density linkage maps are useful tools for fine-scale mapping of quantitative trait loci, and characterization of the recombination landscape of a species' genome. Genomic resources for Atlantic salmon (Salmo salar) include a well-assembled reference genome, and high density single nucleotide polymorphism (SNP) arrays. Our aim was to create a high density linkage map, and to align it with the reference genome assembly. Over 96,000 SNPs were mapped and ordered on the 29 salmon linkage groups using a pedigreed population comprising 622 fish from 60 nuclear families, all genotyped with the 'ssalar01' high density SNP array. The number of SNPs per group showed a high positive correlation with physical chromosome length (r = 0.95). While the order of markers on the genetic and physical maps was generally consistent, areas of discrepancy were identified. Approximately 6.5% of the previously unmapped reference genome sequence was assigned to chromosomes using the linkage map. Male recombination rate was lower than females across the vast majority of the genome, but with a notable peak in subtelomeric regions. Finally, using RNA-Seq data to annotate the reference genome, the mapped SNPs were categorized according to their predicted function, including annotation of ∼2500 putative nonsynonymous variants. The highest density SNP linkage map for any salmonid species has been created, annotated, and integrated with the Atlantic salmon reference genome assembly. This map highlights the marked heterochiasmy of salmon, and provides a useful resource for salmonid genetics and genomics research. Copyright © 2016 Tsai et al.
van Geest, Geert; Voorrips, Roeland E; Esselink, Danny; Post, Aike; Visser, Richard Gf; Arens, Paul
2017-08-07
Cultivated chrysanthemum is an outcrossing hexaploid (2n = 6× = 54) with a disputed mode of inheritance. In this paper, we present a single nucleotide polymorphism (SNP) selection pipeline that was used to design an Affymetrix Axiom array with 183 k SNPs from RNA sequencing data (1). With this array, we genotyped four bi-parental populations (with sizes of 405, 53, 76 and 37 offspring plants respectively), and a cultivar panel of 63 genotypes. Further, we present a method for dosage scoring in hexaploids from signal intensities of the array based on mixture models (2) and validation of selection steps in the SNP selection pipeline (3). The resulting genotypic data is used to draw conclusions on the mode of inheritance in chrysanthemum (4), and to make an inference on allelic expression bias (5). With use of the mixture model approach, we successfully called the dosage of 73,936 out of 183,130 SNPs (40.4%) that segregated in any of the bi-parental populations. To investigate the mode of inheritance, we analysed markers that segregated in the large bi-parental population (n = 405). Analysis of segregation of duplex x nulliplex SNPs resulted in evidence for genome-wide hexasomic inheritance. This evidence was substantiated by the absence of strong linkage between markers in repulsion, which indicated absence of full disomic inheritance. We present the success rate of SNP discovery out of RNA sequencing data as affected by different selection steps, among which SNP coverage over genotypes and use of different types of sequence read mapping software. Genomic dosage highly correlated with relative allele coverage from the RNA sequencing data, indicating that most alleles are expressed according to their genomic dosage. The large population, genotyped with a very large number of markers, is a unique framework for extensive genetic analyses in hexaploid chrysanthemum. As starting point, we show conclusive evidence for genome-wide hexasomic inheritance.
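The segregation test mentioned in this abstract can be made concrete with a small worked calculation. The sketch below computes the expected gamete dosage distribution of a parent under polysomic inheritance, assuming random sampling of half the chromosomes and no double reduction; these simplifying assumptions and the function name are ours, not the authors' pipeline. For a duplex x nulliplex cross the nulliplex parent contributes no copies of the allele, so the offspring dosage distribution equals the gamete distribution of the duplex parent.

```python
from fractions import Fraction
from math import comb


def gamete_dosage_probs(parent_dosage, ploidy=6):
    """Expected gamete dosage probabilities under polysomic inheritance.

    Assumes a gamete receives a random half of the parent's homologous
    chromosomes (hypergeometric sampling) and ignores double reduction.
    Index k of the returned list is the probability of dosage k.
    """
    half = ploidy // 2
    total = comb(ploidy, half)
    return [
        Fraction(comb(parent_dosage, k) * comb(ploidy - parent_dosage, half - k), total)
        for k in range(half + 1)
    ]


# Duplex hexaploid parent: two copies of the allele among six homologues.
duplex = gamete_dosage_probs(2)
```

Under these assumptions a duplex parent gives dosages 0, 1 and 2 in a 1:3:1 ratio, which is the kind of expectation that observed offspring counts can be compared against.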
Xiao, Shijun; Wang, Panpan; Dong, Linsong; Zhang, Yaguang; Han, Zhaofang; Wang, Qiurong
2016-01-01
Whole-genome single-nucleotide polymorphism (SNP) markers are valuable genetic resources for association and conservation studies. Genome-wide SNP development in many teleost species is still challenging because of genome complexity and the cost of re-sequencing. Genotyping-by-sequencing (GBS) provides an efficient reduced-representation method to lower the cost of SNP detection; however, most recent GBS applications have been reported in plants. In this work, we applied an EcoRI-NlaIII based GBS protocol to the teleost large yellow croaker, an important commercial fish in China and East Asia, and report the first whole-genome SNP development for the species. A total of 69,845 high quality SNP markers that were evenly distributed along the genome were detected in at least 80% of 500 individuals. Nearly 95% of randomly selected genotypes were successfully validated by Sequenom MassARRAY assay. Association studies with muscle eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) content discovered 39 significant SNP markers, which contributed up to ∼63% of the genetic variance explained by all markers. Functional genes involved in the fat digestion and absorption pathway were identified, such as APOB, CRAT and OSBPL10. Notably, the PPT2 gene, previously identified in an association study of plasma n-3 and n-6 polyunsaturated fatty acid levels in humans, was re-discovered in large yellow croaker. Our study verified that EcoRI-NlaIII based GBS can produce quality SNP markers in a cost-efficient manner in a teleost genome. The developed SNP markers and the EPA- and DHA-associated SNP loci provide invaluable resources for studies of population structure, conservation genetics and genomic selection in large yellow croaker and other fish. PMID:28028455
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gardner, Shea N.; McLoughlin, Kevin; Be, Nicholas A.
Venezuelan equine encephalitis virus (VEEV) is a mosquito-borne alphavirus that has caused large outbreaks of severe illness in both horses and humans. New approaches are needed to rapidly infer the origin of a newly discovered VEEV strain, estimate its equine amplification and resultant epidemic potential, and predict human virulence phenotype. We performed whole genome single nucleotide polymorphism (SNP) analysis of all available VEE antigenic complex genomes, verified that a SNP-based phylogeny accurately captured the features of a phylogenetic tree based on multiple sequence alignment, and developed a high resolution genome-wide SNP microarray. We used the microarray to analyze a broad panel of VEEV isolates, found excellent concordance between array- and sequence-based SNP calls, genotyped unsequenced isolates, and placed them on a phylogeny with sequenced genomes. The microarray successfully genotyped VEEV directly from tissue samples of an infected mouse, bypassing the need for viral isolation, culture and genomic sequencing. Lastly, we identified genomic variants associated with serotypes and host species, revealing a complex relationship between genotype and phenotype.
Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications.
Wu, Xiao-Lin; Xu, Jiaqi; Feng, Guofei; Wiggans, George R; Taylor, Jeremy F; He, Jun; Qian, Changsong; Qiu, Jiansheng; Simpson, Barry; Walker, Jeremy; Bauck, Stewart
2016-01-01
Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE with ≤1,000 SNPs, but required considerably more computing time. Nevertheless, the differences diminished when >5,000 SNPs were selected. Optimization was accomplished conditionally on the presence of SNPs that were obligated to each chromosome. The frame location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the latter design, a tunable empirical Beta distribution was used to guide location distribution of frame SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized through the objective function that was locally and empirically maximized. This MOLO algorithm was capable of selecting a set of approximately evenly-spaced and highly-informative SNPs, which in turn led to increased imputation accuracy compared with selection solely of evenly-spaced SNPs. Imputation accuracy increased with LD chip size, and imputation error rate was extremely low for chips with ≥3,000 SNPs. Assuming that genotyping or imputation error occurs at random, imputation error rate can be viewed as the upper limit for genomic prediction error. Our results show that about 25% of imputation error rate was propagated to genomic prediction in an Angus population. 
The utility of this MOLO algorithm was also demonstrated in a real application, in which a 6K SNP panel was optimized conditional on 5,260 obligatory SNP selected based on SNP-trait association in U.S. Holstein animals. With this MOLO algorithm, both imputation error rate and genomic prediction error rate were minimal.
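To make the "system information" term in this abstract more tangible, here is a minimal sketch of a locus-averaged Shannon entropy for biallelic SNPs, computed from allele frequencies. It is an assumption-laden illustration: the function names are hypothetical, and the published LASE may be defined or adjusted differently (for example, for uniformity of the SNP distribution, which is not modelled here).

```python
import math


def locus_entropy(allele_freq):
    """Shannon entropy in bits of a biallelic locus with allele frequency p."""
    entropy = 0.0
    for p in (allele_freq, 1.0 - allele_freq):
        if p > 0.0:  # the limit of p*log(p) as p -> 0 is 0
            entropy -= p * math.log2(p)
    return entropy


def locus_averaged_entropy(allele_freqs):
    """Mean per-locus entropy over a candidate SNP set (a simplified LASE)."""
    if not allele_freqs:
        raise ValueError("at least one SNP is required")
    return sum(locus_entropy(p) for p in allele_freqs) / len(allele_freqs)
```

A SNP with allele frequency 0.5 contributes the maximum of one bit and a monomorphic SNP contributes nothing, so maximizing this quantity favours informative markers.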
Optimal Design of Low-Density SNP Arrays for Genomic Prediction: Algorithm and Applications
Wu, Xiao-Lin; Xu, Jiaqi; Feng, Guofei; Wiggans, George R.; Taylor, Jeremy F.; He, Jun; Qian, Changsong; Qiu, Jiansheng; Simpson, Barry; Walker, Jeremy; Bauck, Stewart
2016-01-01
Low-density (LD) single nucleotide polymorphism (SNP) arrays provide a cost-effective solution for genomic prediction and selection, but algorithms and computational tools are needed for the optimal design of LD SNP chips. A multiple-objective, local optimization (MOLO) algorithm was developed for design of optimal LD SNP chips that can be imputed accurately to medium-density (MD) or high-density (HD) SNP genotypes for genomic prediction. The objective function facilitates maximization of non-gap map length and system information for the SNP chip, and the latter is computed either as locus-averaged (LASE) or haplotype-averaged Shannon entropy (HASE) and adjusted for uniformity of the SNP distribution. HASE performed better than LASE with ≤1,000 SNPs, but required considerably more computing time. Nevertheless, the differences diminished when >5,000 SNPs were selected. Optimization was accomplished conditionally on the presence of SNPs that were obligated to each chromosome. The frame location of SNPs on a chip can be either uniform (evenly spaced) or non-uniform. For the latter design, a tunable empirical Beta distribution was used to guide location distribution of frame SNPs such that both ends of each chromosome were enriched with SNPs. The SNP distribution on each chromosome was finalized through the objective function that was locally and empirically maximized. This MOLO algorithm was capable of selecting a set of approximately evenly-spaced and highly-informative SNPs, which in turn led to increased imputation accuracy compared with selection solely of evenly-spaced SNPs. Imputation accuracy increased with LD chip size, and imputation error rate was extremely low for chips with ≥3,000 SNPs. Assuming that genotyping or imputation error occurs at random, imputation error rate can be viewed as the upper limit for genomic prediction error. Our results show that about 25% of imputation error rate was propagated to genomic prediction in an Angus population. 
The utility of this MOLO algorithm was also demonstrated in a real application, in which a 6K SNP panel was optimized conditional on 5,260 obligatory SNPs selected based on SNP-trait association in U.S. Holstein animals. With this MOLO algorithm, both imputation error rate and genomic prediction error rate were minimal. PMID:27583971
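The locus-averaged Shannon entropy (LASE) mentioned in the abstract above can be illustrated with a minimal sketch. It assumes biallelic SNPs and an entropy computed from allele frequencies, which may differ from the exact definition used by the MOLO authors; the function names `locus_entropy` and `locus_averaged_entropy` are hypothetical, and the uniformity adjustment described in the abstract is not included.

```python
import math

def locus_entropy(p):
    """Shannon entropy (in bits) of a biallelic locus with allele frequency p.

    Assumption: entropy is taken over the two allele frequencies p and 1 - p.
    """
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a monomorphic locus carries no information
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))

def locus_averaged_entropy(freqs):
    """Mean per-locus entropy over a candidate set of SNPs (hypothetical LASE)."""
    if not freqs:
        return 0.0
    return sum(locus_entropy(p) for p in freqs) / len(freqs)

# Example: SNPs with intermediate allele frequencies are the most informative.
panel = [0.5, 0.3, 0.05]
print(locus_averaged_entropy(panel))
```

Under this definition a SNP with allele frequency 0.5 contributes the maximum of 1 bit, so a panel that maximizes the average favours common variants.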
Chen, Guo-Bo; Lee, Sang Hong; Brion, Marie-Jo A; Montgomery, Grant W; Wray, Naomi R; Radford-Smith, Graham L; Visscher, Peter M
2014-09-01
As custom arrays are cheaper than generic GWAS arrays, a larger sample size is achievable for gene discovery. Custom arrays can tag more variants through denser genotyping of SNPs at associated loci, but at the cost of losing genome-wide coverage. Balancing this trade-off is important for maximizing experimental designs. We quantified both the gain in captured SNP-heritability at known candidate regions and the loss due to imperfect genome-wide coverage for inflammatory bowel disease using immunochip (iChip) and imputed GWAS data on 61,251 and 38,550 samples, respectively. For Crohn's disease (CD), the iChip and GWAS data explained 19 and 26% of variation in liability, respectively, and SNPs in the densely genotyped iChip regions explained 13% of the SNP-heritability for both the iChip and GWAS data. For ulcerative colitis (UC), the iChip and GWAS data explained 15 and 19% of variation in liability, respectively, and the dense iChip regions explained 10 and 9% of the SNP-heritability in the iChip and the GWAS data. From bivariate analyses, estimates of the genetic correlation in risk between CD and UC were 0.75 (SE 0.017) and 0.62 (SE 0.042) for the iChip and GWAS data, respectively. We also quantified the SNP-heritability of genomic regions that did or did not contain the previous 163 GWAS hits for CD and UC, and SNP-heritability of the overlapping loci between the densely genotyped iChip regions and the 163 GWAS hits. For both diseases, over different genomic partitioning, the densely genotyped regions on the iChip tagged at least as much variation in liability as in the corresponding regions in the GWAS data; however, a certain amount of tagged SNP-heritability in the GWAS data was lost using the iChip due to the low coverage at unselected regions. These results imply that custom arrays with a GWAS backbone will facilitate more gene discovery, both at associated and novel loci. © The Author 2014. Published by Oxford University Press. All rights reserved.
Li, Ao; Liu, Zongzhi; Lezon-Geyda, Kimberly; Sarkar, Sudipa; Lannin, Donald; Schulz, Vincent; Krop, Ian; Winer, Eric; Harris, Lyndsay; Tuck, David
2011-01-01
There is an increasing interest in using single nucleotide polymorphism (SNP) genotyping arrays for profiling chromosomal rearrangements in tumors, as they allow simultaneous detection of copy number and loss of heterozygosity with high resolution. Critical issues such as signal baseline shift due to aneuploidy, normal cell contamination, and the presence of GC content bias have been reported to dramatically alter SNP array signals and complicate accurate identification of aberrations in cancer genomes. To address these issues, we propose a novel Global Parameter Hidden Markov Model (GPHMM) to unravel tangled genotyping data generated from tumor samples. In contrast to other HMM methods, a distinct feature of GPHMM is that the issues mentioned above are quantitatively modeled by global parameters and integrated within the statistical framework. We developed an efficient EM algorithm for parameter estimation. We evaluated performance on three data sets and show that GPHMM can correctly identify chromosomal aberrations in tumor samples containing as few as 10% cancer cells. Furthermore, we demonstrated that the estimation of global parameters in GPHMM provides information about the biological characteristics of tumor samples and the quality of genotyping signal from SNP array experiments, which is helpful for data quality control and outlier detection in cohort studies. PMID:21398628
A ddRAD Based Linkage Map of the Cultivated Strawberry, Fragaria ×ananassa
Davik, Jahn; Sargent, Daniel James; Brurberg, May Bente; Lien, Sigbjørn; Kent, Matthew; Alsheikh, Muath
2015-01-01
The cultivated strawberry (Fragaria ×ananassa Duch.) is an allo-octoploid considered difficult to disentangle genetically due to its four relatively similar sub-genomic chromosome sets. This has been alleviated by the recent release of the strawberry IStraw90 whole genome genotyping array. However, array resolution relies on the genotypes used in the array construction and may be of limited general use. SNP detection based on reduced genomic sequencing approaches has the potential of providing better coverage in cases where the studied genotypes are only distantly related to the SNP array's construction foundation. Here we have used double digest restriction-associated DNA sequencing (ddRAD) to identify SNPs in a 145 seedling F1 hybrid population raised from the cross between the cultivars Sonata (♀) and Babette (♂). A linkage map was constructed containing 907 markers, which spanned 1,581.5 cM across 31 linkage groups representing the 28 chromosomes of the species. Comparing the physical span of the SNP markers with the F. vesca genome sequence, the linkage groups resolved covered 79% of the estimated 830 Mb of the F. ×ananassa genome. Here, we have developed the first linkage map for F. ×ananassa using ddRAD and show that this technique and other related techniques are useful tools for linkage map development and downstream genetic studies in the octoploid strawberry. PMID:26398886
Haraksingh, Rajini R; Abyzov, Alexej; Urban, Alexander Eckehart
2017-04-24
High-resolution microarray technology is routinely used in basic research and clinical practice to efficiently detect copy number variants (CNVs) across the entire human genome. A new generation of arrays combining high probe densities with optimized designs will comprise essential tools for genome analysis in the coming years. We systematically compared the genome-wide CNV detection power of all 17 available array designs from the Affymetrix, Agilent, and Illumina platforms by hybridizing the well-characterized genome of 1000 Genomes Project subject NA12878 to all arrays, and performing data analysis using both manufacturer-recommended and platform-independent software. We benchmarked the resulting CNV call sets from each array using a gold standard set of CNVs for this genome derived from 1000 Genomes Project whole genome sequencing data. The arrays tested comprise both SNP and aCGH platforms with varying designs and contain between ~0.5 to ~4.6 million probes. Across the arrays CNV detection varied widely in number of CNV calls (4-489), CNV size range (~40 bp to ~8 Mbp), and percentage of non-validated CNVs (0-86%). We discovered strikingly strong effects of specific array design principles on performance. For example, some SNP array designs with the largest numbers of probes and extensive exonic coverage produced a considerable number of CNV calls that could not be validated, compared to designs with probe numbers that are sometimes an order of magnitude smaller. This effect was only partially ameliorated using different analysis software and optimizing data analysis parameters. High-resolution microarrays will continue to be used as reliable, cost- and time-efficient tools for CNV analysis. However, different applications tolerate different limitations in CNV detection. Our study quantified how these arrays differ in total number and size range of detected CNVs as well as sensitivity, and determined how each array balances these attributes. 
This analysis will inform appropriate array selection for future CNV studies, and allow better assessment of the CNV-analytical power of both published and ongoing array-based genomics studies. Furthermore, our findings emphasize the importance of concurrent use of multiple analysis algorithms and independent experimental validation in array-based CNV detection studies.
A Discovery Resource of Rare Copy Number Variations in Individuals with Autism Spectrum Disorder
Prasad, Aparna; Merico, Daniele; Thiruvahindrapuram, Bhooma; Wei, John; Lionel, Anath C.; Sato, Daisuke; Rickaby, Jessica; Lu, Chao; Szatmari, Peter; Roberts, Wendy; Fernandez, Bridget A.; Marshall, Christian R.; Hatchwell, Eli; Eis, Peggy S.; Scherer, Stephen W.
2012-01-01
The identification of rare inherited and de novo copy number variations (CNVs) in human subjects has proven a productive approach to highlight risk genes for autism spectrum disorder (ASD). A variety of microarrays are available to detect CNVs, including single-nucleotide polymorphism (SNP) arrays and comparative genomic hybridization (CGH) arrays. Here, we examine a cohort of 696 unrelated ASD cases using a high-resolution one-million feature CGH microarray, the majority of which were previously genotyped with SNP arrays. Our objective was to discover new CNVs in ASD cases that were not detected by SNP microarray analysis and to delineate novel ASD risk loci via combined analysis of CGH and SNP array data sets on the ASD cohort and CGH data on an additional 1000 control samples. Of the 615 ASD cases analyzed on both SNP and CGH arrays, we found that 13,572 of 21,346 (64%) of the CNVs were exclusively detected by the CGH array. Several of the CGH-specific CNVs are rare in population frequency and impact previously reported ASD genes (e.g., NRXN1, GRM8, DPYD), as well as novel ASD candidate genes (e.g., CIB2, DAPP1, SAE1), and all were inherited except for a de novo CNV in the GPHN gene. A functional enrichment test of gene-sets in ASD cases over controls revealed nucleotide metabolism as a potential novel pathway involved in ASD, which includes several candidate genes for follow-up (e.g., DPYD, UPB1, UPP1, TYMP). Finally, this extensively phenotyped and genotyped ASD clinical cohort serves as an invaluable resource for the next step of genome sequencing for complete genetic variation detection. PMID:23275889
Xu, Li-Xin; Holland, Heidrun; Kirsten, Holger; Ahnert, Peter; Krupp, Wolfgang; Bauer, Manfred; Schober, Ralf; Mueller, Wolf; Fritzsch, Dominik; Meixensberger, Jürgen; Koschny, Ronald
2015-04-01
According to the World Health Organization gangliogliomas are classified as well-differentiated and slowly growing neuroepithelial tumors, composed of neoplastic mature ganglion and glial cells. It is the most frequent tumor entity observed in patients with long-term epilepsy. Comprehensive cytogenetic and molecular cytogenetic data including high-resolution genomic profiling (single nucleotide polymorphism (SNP)-array) of gangliogliomas are scarce but necessary for a better oncological understanding of this tumor entity. For a detailed characterization at the single cell and cell population levels, we analyzed genomic alterations of three gangliogliomas using trypsin-Giemsa banding (GTG-banding) and by spectral karyotyping (SKY) in combination with SNP-array and gene expression array experiments. By GTG and SKY, we could confirm frequently detected chromosomal aberrations (losses within chromosomes 10, 13 and 22; gains within chromosomes 5, 7, 8 and 12), and identify so far unknown genetic aberrations like the unbalanced non-reciprocal translocation t(1;18)(q21;q21). Interestingly, we report on the second so far detected ganglioglioma with ring chromosome 1. Analyses of SNP-array data from two of the tumors and respective germline DNA (peripheral blood) identified few small gains and losses and a number of copy-neutral regions with loss of heterozygosity (LOH) in germline and in tumor tissue. In comparison to germline DNA, tumor tissues did not show substantial regions with significant loss or gain or with newly developed LOH. Gene expression analyses of tumor-specific genes revealed similarities in the profile of the analyzed samples regarding different relevant pathways. Taken together, we describe overlapping but also distinct and novel genetic aberrations of three gangliogliomas. © 2014 Japanese Society of Neuropathology.
Bourret, Vincent; Kent, Matthew P; Primmer, Craig R; Vasemägi, Anti; Karlsson, Sten; Hindar, Kjetil; McGinnity, Philip; Verspoor, Eric; Bernatchez, Louis; Lien, Sigbjørn
2013-02-01
Atlantic salmon (Salmo salar) is one of the most extensively studied fish species in the world due to its significance in aquaculture, fisheries and ongoing conservation efforts to protect declining populations. Yet, limited genomic resources have hampered our understanding of genetic architecture in the species and the genetic basis of adaptation to the wide range of natural and artificial environments it occupies. In this study, we describe the development of a medium-density Atlantic salmon single nucleotide polymorphism (SNP) array based on expressed sequence tags (ESTs) and genomic sequencing. The array was used in the most extensive assessment of population genetic structure performed to date in this species. A total of 6176 informative SNPs were successfully genotyped in 38 anadromous and freshwater wild populations distributed across the species natural range. Principal component analysis clearly differentiated European and North American populations, and within Europe, three major regional genetic groups were identified for the first time in a single analysis. We assessed the potential for the array to disentangle neutral and putative adaptive divergence of SNP allele frequencies across populations and among regional groups. In Europe, secondary contact zones were identified between major clusters where endogenous and exogenous barriers could be associated, rendering the interpretation of environmental influence on potentially adaptive divergence equivocal. A small number of markers highly divergent in allele frequencies (outliers) were observed between (multiple) freshwater and anadromous populations, between northern and southern latitudes, and when comparing Baltic populations to all others. We also discuss the potential future applications of the SNP array for conservation, management and aquaculture. © 2012 Blackwell Publishing Ltd.
USDA-ARS's Scientific Manuscript database
High-throughput genotyping arrays provide a standardized resource for crop research communities that are useful for a breadth of applications including high-density genetic mapping, genome-wide association studies (GWAS), genomic selection (GS), candidate marker and quantitative trait loci (QTL) ide...
USDA-ARS's Scientific Manuscript database
Single nucleotide polymorphisms (SNPs) are the most abundant DNA sequence variation in the genomes which can be used to associate genotypic variation to the phenotype. Therefore, availability of a high-density SNP array with uniform genome coverage can advance genetic studies and breeding applicatio...
KinSNP software for homozygosity mapping of disease genes using SNP microarrays
2010-01-01
Consanguineous families affected with a recessive genetic disease caused by homozygotisation of a mutation offer a unique advantage for positional cloning of rare diseases. Homozygosity mapping of patient genotypes is a powerful technique for the identification of the genomic locus harbouring the causing mutation. This strategy relies on the observation that in these patients a large region spanning the disease locus is also homozygous with high probability. The high marker density in single nucleotide polymorphism (SNP) arrays is extremely advantageous for homozygosity mapping. We present KinSNP, a user-friendly software tool for homozygosity mapping using SNP arrays. The software searches for stretches of SNPs which are homozygous to the same allele in all ascertained sick individuals. User-specified parameters control the number of allowed genotyping 'errors' within homozygous blocks. Candidate disease regions are then reported in a detailed, coloured Excel file, along with genotypes of family members and healthy controls. An interactive genome browser has been included which shows homozygous blocks, individual genotypes, genes and further annotations along the chromosomes, with zooming and scrolling capabilities. The software has been used to identify the location of a mutated gene causing insensitivity to pain in a large Bedouin family. KinSNP is freely available from http://bioinfo.bgu.ac.il/bsu/software/kinSNP. PMID:20846928
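The block search described in the KinSNP abstract above can be sketched as follows. This is a simplified illustration, not the KinSNP implementation: the genotype coding ("AA", "AB", "BB"), the greedy scan, and the names `shared_homozygous`, `homozygous_blocks`, `min_snps` and `max_errors` are all assumptions made for the example.

```python
def shared_homozygous(calls):
    """True if every affected individual is homozygous for the same allele.

    calls: non-empty tuple of genotype calls at one SNP, one per affected
    individual, coded "AA", "AB" or "BB" (an assumed coding).
    """
    first = calls[0]
    return first in ("AA", "BB") and all(c == first for c in calls)

def homozygous_blocks(genotypes, min_snps=3, max_errors=0):
    """Greedy scan for runs of SNPs that are homozygous in all affected individuals.

    genotypes: list of per-SNP call tuples in map order.
    Returns (start, end) index pairs, inclusive. A block may contain up to
    max_errors non-conforming SNPs, mimicking tolerated genotyping errors.
    """
    flags = [shared_homozygous(calls) for calls in genotypes]
    blocks = []
    i, n = 0, len(flags)
    while i < n:
        if not flags[i]:
            i += 1
            continue
        start, end, errors = i, i, 0
        j = i + 1
        while j < n:
            if flags[j]:
                end = j  # extend the block to the last conforming SNP
            else:
                errors += 1
                if errors > max_errors:
                    break
            j += 1
        if end - start + 1 >= min_snps:
            blocks.append((start, end))
        i = end + 1
    return blocks

# Two affected individuals genotyped at six consecutive SNPs.
calls = [("AA", "AA"), ("BB", "BB"), ("AB", "BB"),
         ("AA", "AA"), ("AB", "AB"), ("AA", "BB")]
print(homozygous_blocks(calls, min_snps=3, max_errors=1))
```

With one tolerated error the first four SNPs form a single candidate block; with `max_errors=0` no run reaches the minimum length, which shows how the error parameter trades sensitivity against specificity.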
Sulovari, Arvis; Li, Dawei
2014-07-19
Genome-wide association studies (GWAS) have successfully identified genes associated with complex human diseases. Although much of the heritability remains unexplained, combining single nucleotide polymorphism (SNP) genotypes from multiple studies for meta-analysis will increase the statistical power to identify new disease-associated variants. Meta-analysis requires the same allele definition (nomenclature) and genome build among individual studies. Similarly, imputation, commonly used prior to meta-analysis, requires the same consistency. However, the genotypes from various GWAS are generated using different genotyping platforms, arrays or SNP-calling approaches, resulting in use of different genome builds and allele definitions. Incorrect assumptions of identical allele definition among combined GWAS lead to a large portion of discarded genotypes or incorrect association findings. There is no published tool that predicts and converts among all major allele definitions. In this study, we have developed a tool, GACT, which stands for Genome build and Allele definition Conversion Tool, that predicts and inter-converts between any of the common SNP allele definitions and between the major genome builds. In addition, we assessed several factors that may affect imputation quality, and our results indicated that inclusion of singletons in the reference had detrimental effects while ambiguous SNPs had no measurable effect. Unexpectedly, exclusion of genotypes with missing rate > 0.001 (40% of study SNPs) showed no significant decrease of imputation quality (even significantly higher when compared to the imputation with singletons in the reference), especially for rare SNPs. 
GACT is a new, powerful, and user-friendly tool with both command-line and interactive online versions that can accurately predict, and convert between any of the common allele definitions and between genome builds for genome-wide meta-analysis and imputation of genotypes from SNP-arrays or deep-sequencing, particularly for data from the dbGaP and other public databases. http://www.uvm.edu/genomics/software/gact.
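The allele-definition problem described in the GACT abstract above can be illustrated with a small sketch that compares the alleles reported for one SNP in a study against a reference. This is not GACT's algorithm; it only shows the general idea of strand flips and strand-ambiguous (A/T, C/G) SNPs, and the function names and return labels are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def is_ambiguous(a1, a2):
    """A/T and C/G SNPs read the same on both strands, so strand cannot be inferred."""
    return COMPLEMENT[a1] == a2

def compare_alleles(study, reference):
    """Classify how a study's allele pair relates to the reference allele pair.

    study, reference: (allele1, allele2) tuples of single nucleotides.
    """
    if is_ambiguous(*study):
        return "ambiguous"
    flipped = (COMPLEMENT[study[0]], COMPLEMENT[study[1]])
    if study == reference:
        return "same"
    if study == reference[::-1]:
        return "swapped"              # same strand, allele order reversed
    if flipped == reference:
        return "strand_flip"          # reported on the opposite strand
    if flipped == reference[::-1]:
        return "strand_flip_swapped"
    return "mismatch"

print(compare_alleles(("A", "G"), ("T", "C")))
```

A merge pipeline built on this kind of check would typically convert the flipped and swapped cases and set aside the ambiguous and mismatched ones for separate handling.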
Analysis of genetic diversity using SNP markers in oat
USDA-ARS's Scientific Manuscript database
A large-scale single nucleotide polymorphism (SNP) discovery was carried out in cultivated oat using Roche 454 sequencing methods. DNA sequences were generated from cDNAs originating from a panel of 20 diverse oat cultivars, and from Diversity Array Technology (DArT) genomic complexity reductions fr...
Wang, Jiao; Chu, Shanshan; Zhang, Huairen; Zhu, Ying; Cheng, Hao; Yu, Deyue
2016-01-01
Domestication of soybeans occurred under the intense human-directed selections aimed at developing high-yielding lines. Tracing the domestication history and identifying the genes underlying soybean domestication require further exploration. Here, we developed a high-throughput NJAU 355 K SoySNP array and used this array to study the genetic variation patterns in 367 soybean accessions, including 105 wild soybeans and 262 cultivated soybeans. The population genetic analysis suggests that cultivated soybeans have tended to originate from northern and central China, from where they spread to other regions, accompanied with a gradual increase in seed weight. Genome-wide scanning for evidence of artificial selection revealed signs of selective sweeps involving genes controlling domestication-related agronomic traits including seed weight. To further identify genomic regions related to seed weight, a genome-wide association study (GWAS) was conducted across multiple environments in wild and cultivated soybeans. As a result, a strong linkage disequilibrium region on chromosome 20 was found to be significantly correlated with seed weight in cultivated soybeans. Collectively, these findings should provide an important basis for genomic-enabled breeding and advance the study of functional genomics in soybean. PMID:26856884
Wang, Jiao; Chu, Shanshan; Zhang, Huairen; Zhu, Ying; Cheng, Hao; Yu, Deyue
2016-02-09
Domestication of soybeans occurred under the intense human-directed selections aimed at developing high-yielding lines. Tracing the domestication history and identifying the genes underlying soybean domestication require further exploration. Here, we developed a high-throughput NJAU 355 K SoySNP array and used this array to study the genetic variation patterns in 367 soybean accessions, including 105 wild soybeans and 262 cultivated soybeans. The population genetic analysis suggests that cultivated soybeans have tended to originate from northern and central China, from where they spread to other regions, accompanied with a gradual increase in seed weight. Genome-wide scanning for evidence of artificial selection revealed signs of selective sweeps involving genes controlling domestication-related agronomic traits including seed weight. To further identify genomic regions related to seed weight, a genome-wide association study (GWAS) was conducted across multiple environments in wild and cultivated soybeans. As a result, a strong linkage disequilibrium region on chromosome 20 was found to be significantly correlated with seed weight in cultivated soybeans. Collectively, these findings should provide an important basis for genomic-enabled breeding and advance the study of functional genomics in soybean.
Yamamoto, Toshio; Nagasaki, Hideki; Yonemaru, Jun-ichi; Ebana, Kaworu; Nakajima, Maiko; Shibaya, Taeko; Yano, Masahiro
2010-04-27
To create useful gene combinations in crop breeding, it is necessary to clarify the dynamics of the genome composition created by breeding practices. A large quantity of single-nucleotide polymorphism (SNP) data is required to permit discrimination of chromosome segments among modern cultivars, which are genetically related. Here, we used a high-throughput sequencer to conduct whole-genome sequencing of an elite Japanese rice cultivar, Koshihikari, which is closely related to Nipponbare, whose genome sequencing has been completed. Then we designed a high-throughput typing array based on the SNP information by comparison of the two sequences. Finally, we applied this array to analyze historical representative rice cultivars to understand the dynamics of their genome composition. The total 5.89-Gb sequence for Koshihikari, equivalent to 15.7× the entire rice genome, was mapped using the Pseudomolecules 4.0 database for Nipponbare. The resultant Koshihikari genome sequence corresponded to 80.1% of the Nipponbare sequence and led to the identification of 67,051 SNPs. A high-throughput typing array consisting of 1917 SNP sites distributed throughout the genome was designed to genotype 151 representative Japanese cultivars that have been grown during the past 150 years. We could identify the ancestral origin of the pedigree haplotypes in 60.9% of the Koshihikari genome and 18 consensus haplotype blocks which are inherited from traditional landraces to current improved varieties. Moreover, it was predicted that modern breeding practices have generally decreased genetic diversity. Detection of genome-wide SNPs by both high-throughput sequencer and typing array made it possible to evaluate genomic composition of genetically related rice varieties. With the aid of their pedigree information, we clarified the dynamics of chromosome recombination during the historical rice breeding process. 
We also found several genomic regions decreasing genetic diversity which might be caused by a recent human selection in rice breeding. The definition of pedigree haplotypes by means of genome-wide SNPs will facilitate next-generation breeding of rice and other crops.
Schweizer, Rena M; Robinson, Jacqueline; Harrigan, Ryan; Silva, Pedro; Galverni, Marco; Musiani, Marco; Green, Richard E; Novembre, John; Wayne, Robert K
2016-01-01
In an era of ever-increasing amounts of whole-genome sequence data for individuals and populations, the utility of traditional single nucleotide polymorphisms (SNPs) array-based genome scans is uncertain. We previously performed a SNP array-based genome scan to identify candidate genes under selection in six distinct grey wolf (Canis lupus) ecotypes. Using this information, we designed a targeted capture array for 1040 genes, including all exons and flanking regions, as well as 5000 1-kb nongenic neutral regions, and resequenced these regions in 107 wolves. Selection tests revealed striking patterns of variation within candidate genes relative to noncandidate regions and identified potentially functional variants related to local adaptation. We found 27% and 47% of candidate genes from the previous SNP array study had functional changes that were outliers in sweed and bayenv analyses, respectively. This result verifies the use of genomewide SNP surveys to tag genes that contain functional variants between populations. We highlight nonsynonymous variants in APOB, LIPG and USH2A that occur in functional domains of these proteins, and that demonstrate high correlation with precipitation seasonality and vegetation. We find Arctic and High Arctic wolf ecotypes have higher numbers of genes under selection, which highlight their conservation value and heightened threat due to climate change. This study demonstrates that combining genomewide genotyping arrays with large-scale resequencing and environmental data provides a powerful approach to discern candidate functional variants in natural populations. © 2015 John Wiley & Sons Ltd.
Characterization of genetic variability of Venezuelan equine encephalitis viruses
Gardner, Shea N.; McLoughlin, Kevin; Be, Nicholas A.; ...
2016-04-07
Venezuelan equine encephalitis virus (VEEV) is a mosquito-borne alphavirus that has caused large outbreaks of severe illness in both horses and humans. New approaches are needed to rapidly infer the origin of a newly discovered VEEV strain, estimate its equine amplification and resultant epidemic potential, and predict human virulence phenotype. We performed whole genome single nucleotide polymorphism (SNP) analysis of all available VEE antigenic complex genomes, verified that a SNP-based phylogeny accurately captured the features of a phylogenetic tree based on multiple sequence alignment, and developed a high resolution genome-wide SNP microarray. We used the microarray to analyze a broad panel of VEEV isolates, found excellent concordance between array- and sequence-based SNP calls, genotyped unsequenced isolates, and placed them on a phylogeny with sequenced genomes. The microarray successfully genotyped VEEV directly from tissue samples of an infected mouse, bypassing the need for viral isolation, culture and genomic sequencing. Lastly, we identified genomic variants associated with serotypes and host species, revealing a complex relationship between genotype and phenotype.
Yáñez, J M; Naswa, S; López, M E; Bassini, L; Correa, K; Gilbey, J; Bernatchez, L; Norris, A; Neira, R; Lhorente, J P; Schnable, P S; Newman, S; Mileham, A; Deeb, N; Di Genova, A; Maass, A
2016-07-01
A considerable number of single nucleotide polymorphisms (SNPs) are required to elucidate genotype-phenotype associations and determine the molecular basis of important traits. In this work, we carried out de novo SNP discovery accounting for both genome duplication and genetic variation from American and European salmon populations. A total of 9 736 473 nonredundant SNPs were identified across a set of 20 fish by whole-genome sequencing. After applying six bioinformatic filtering steps, 200 K SNPs were selected to develop an Affymetrix Axiom® myDesign Custom Array. This array was used to genotype 480 fish representing wild and farmed salmon from Europe, North America and Chile. A total of 159 099 (79.6%) SNPs were validated as high quality based on clustering properties. A total of 151 509 validated SNPs showed a unique position in the genome. When comparing these SNPs against 238 572 markers currently available in two other Atlantic salmon arrays, only 4.6% of the SNPs overlapped with the panel developed in this study. This novel high-density SNP panel will be very useful for the dissection of economically and ecologically relevant traits, enhancing breeding programmes through genomic selection as well as supporting genetic studies in both wild and farmed populations of Atlantic salmon using high-resolution genomewide information. © 2016 John Wiley & Sons Ltd.
Making a chocolate chip: development and evaluation of a 6K SNP array for Theobroma cacao
Livingstone, Donald; Royaert, Stefan; Stack, Conrad; Mockaitis, Keithanne; May, Greg; Farmer, Andrew; Saski, Christopher; Schnell, Ray; Kuhn, David; Motamayor, Juan Carlos
2015-01-01
Theobroma cacao, the key ingredient in chocolate production, is one of the world's most important tree fruit crops, with ∼4,000,000 metric tons produced across 50 countries. To move towards gene discovery and marker-assisted breeding in cacao, a single-nucleotide polymorphism (SNP) identification project was undertaken using RNAseq data from 16 diverse cacao cultivars. RNA sequences were aligned to the assembled transcriptome of the cultivar Matina 1-6, and 330,000 SNPs within coding regions were identified. From these SNPs, a subset of 6,000 high-quality SNPs were selected for inclusion on an Illumina Infinium SNP array: the Cacao6kSNP array. Using Cacao6KSNP array data from over 1,000 cacao samples, we demonstrate that our custom array produces a saturated genetic map and can be used to distinguish among even closely related genotypes. Our study enhances and expands the genetic resources available to the cacao research community, and provides the genome-scale set of tools that are critical for advancing breeding with molecular markers in an agricultural species with high genetic diversity. PMID:26070980
Vallejo, Roger L; Silva, Rafael M O; Evenhuis, Jason P; Gao, Guangtu; Liu, Sixin; Parsons, James E; Martin, Kyle E; Wiens, Gregory D; Lourenco, Daniela A L; Leeds, Timothy D; Palti, Yniv
2018-06-05
Previously, accurate genomic predictions for bacterial cold water disease (BCWD) resistance in rainbow trout were obtained using a medium-density single nucleotide polymorphism (SNP) array. Here, the impact of lower-density SNP panels on the accuracy of genomic predictions was investigated in a commercial rainbow trout breeding population. Using progeny performance data, the accuracy of genomic breeding values (GEBV) using 35K, 10K, 3K, 1K, 500, 300 and 200 SNP panels as well as a panel with 70 quantitative trait loci (QTL)-flanking SNPs was compared. The GEBVs were estimated using the Bayesian method BayesB, single-step GBLUP (ssGBLUP) and weighted ssGBLUP (wssGBLUP). The accuracy of GEBVs remained high despite the sharp reductions in SNP density, and even with 500 SNPs accuracy was higher than the pedigree-based prediction (0.50-0.56 versus 0.36). Furthermore, the prediction accuracy with the 70 QTL-flanking SNPs (0.65-0.72) was similar to the panel with 35K SNPs (0.65-0.71). Genomewide linkage disequilibrium (LD) analysis revealed strong LD (r² ≥ 0.25) spanning on average over 1 Mb across the rainbow trout genome. This long-range LD likely contributed to the accurate genomic predictions with the low-density SNP panels. Population structure analysis supported the hypothesis that long-range LD in this population may be caused by admixture. Results suggest that lower-cost, low-density SNP panels can be used for implementing genomic selection for BCWD resistance in rainbow trout breeding programs. © 2018 The Authors. This article is a U.S. Government work and is in the public domain in the USA. Journal of Animal Breeding and Genetics published by Blackwell Verlag GmbH.
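The LD statistic quoted in this abstract (r², with 0.25 as the "strong LD" cut-off) can be illustrated with a small worked example. The sketch below is not the authors' pipeline; it assumes phased haplotypes coded 0/1 at two biallelic loci, and the function name `ld_r2` is hypothetical. It uses the textbook definition r² = D² / (pA(1-pA) pB(1-pB)), where D = pAB - pA·pB.

```python
from typing import List, Tuple


def ld_r2(haplotypes: List[Tuple[int, int]]) -> float:
    """Squared correlation (r^2) between two biallelic loci.

    haplotypes: one (a, b) tuple per phased haplotype, where a and b are
    0/1 allele codes at locus A and locus B (an illustrative encoding).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n            # frequency of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n            # frequency of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                               # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d * d / denom


# Two loci that always travel together, and two that are independent.
linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r2(linked), ld_r2(unlinked))
```

Under the threshold used in the abstract, a marker pair would count as being in strong LD when `ld_r2(...) >= 0.25`.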
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gardner, Shea; Slezak, Tom
With the flood of whole genome finished and draft microbial sequences, we need faster, more scalable bioinformatics tools for sequence comparison. An algorithm is described to find single nucleotide polymorphisms (SNPs) in whole genome data. It scales to hundreds of bacterial or viral genomes, and can be used for finished and/or draft genomes available as unassembled contigs. The method is fast to compute, finding SNPs and building a SNP phylogeny in seconds to hours. We use it to identify thousands of putative SNPs from all publicly available Filoviridae, Poxviridae, foot-and-mouth disease virus, Bacillus, and Escherichia coli genomes and plasmids. The SNP-based trees that result are consistent with known taxonomy and trees determined in other studies. The approach we describe can handle as input hundreds of gigabases of sequence in a single run. The algorithm is based on k-mer analysis using a suffix array, so we call it saSNP.
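The idea of finding SNPs by k-mer analysis can be sketched in a few lines. This is an illustration only, not saSNP itself: it swaps the suffix array for a plain hash map, and it assumes a candidate SNP is a single base whose k-base left and right flanks are identical across genomes while the base itself differs. The function name `kmer_snp_candidates` and the default `k` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def kmer_snp_candidates(
    genomes: Dict[str, str], k: int = 4
) -> Dict[Tuple[str, str], Dict[str, List[str]]]:
    """Map (left flank, right flank) -> {central base: [genome names]}.

    Only contexts where at least two different central bases were seen
    are returned; these are the candidate SNPs.
    """
    contexts: Dict[Tuple[str, str], Dict[str, set]] = defaultdict(
        lambda: defaultdict(set)
    )
    for name, seq in genomes.items():
        # Slide over every base that has k bases available on both sides.
        for i in range(k, len(seq) - k):
            left, mid, right = seq[i - k:i], seq[i], seq[i + 1:i + 1 + k]
            contexts[(left, right)][mid].add(name)
    return {
        ctx: {base: sorted(names) for base, names in alleles.items()}
        for ctx, alleles in contexts.items()
        if len(alleles) > 1
    }


# Two toy sequences that differ at a single position.
toy = {"g1": "GATTACAGGCTA", "g2": "GATTACTGGCTA"}
print(kmer_snp_candidates(toy, k=4))
```

A hash map keeps the sketch short but holds every context in memory; a suffix array, as named in the abstract, is one way to make the same grouping scale to much larger inputs.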
The Role of Constitutional Copy Number Variants in Breast Cancer
Walker, Logan C.; Wiggins, George A.R.; Pearson, John F.
2015-01-01
Constitutional copy number variants (CNVs) include inherited and de novo deviations from a diploid state at a defined genomic region. These variants contribute significantly to genetic variation and disease in humans, including breast cancer susceptibility. Identification of genetic risk factors for breast cancer in recent years has been dominated by the use of genome-wide technologies, such as single nucleotide polymorphism (SNP)-arrays, with a significant focus on single nucleotide variants. To date, these large datasets have been underutilised for generating genome-wide CNV profiles despite offering a massive resource for assessing the contribution of these structural variants to breast cancer risk. Technical challenges remain in determining the location and distribution of CNVs across the human genome due to the accuracy of computational prediction algorithms and resolution of the array data. Moreover, better methods are required for interpreting the functional effect of newly discovered CNVs. In this review, we explore current and future application of SNP array technology to assess rare and common CNVs in association with breast cancer risk in humans. PMID:27600231
KinSNP software for homozygosity mapping of disease genes using SNP microarrays.
Amir, El-Ad David; Bartal, Ofer; Morad, Efrat; Nagar, Tal; Sheynin, Jony; Parvari, Ruti; Chalifa-Caspi, Vered
2010-08-01
Consanguineous families affected with a recessive genetic disease caused by homozygotisation of a mutation offer a unique advantage for positional cloning of rare diseases. Homozygosity mapping of patient genotypes is a powerful technique for the identification of the genomic locus harbouring the causing mutation. This strategy relies on the observation that in these patients a large region spanning the disease locus is also homozygous with high probability. The high marker density in single nucleotide polymorphism (SNP) arrays is extremely advantageous for homozygosity mapping. We present KinSNP, a user-friendly software tool for homozygosity mapping using SNP arrays. The software searches for stretches of SNPs which are homozygous to the same allele in all ascertained sick individuals. User-specified parameters control the number of allowed genotyping 'errors' within homozygous blocks. Candidate disease regions are then reported in a detailed, coloured Excel file, along with genotypes of family members and healthy controls. An interactive genome browser has been included which shows homozygous blocks, individual genotypes, genes and further annotations along the chromosomes, with zooming and scrolling capabilities. The software has been used to identify the location of a mutated gene causing insensitivity to pain in a large Bedouin family. KinSNP is freely available from.
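The search described in this abstract (stretches of SNPs homozygous for the same allele in all affected individuals, with a user-set tolerance for genotyping errors) can be sketched as follows. This is a minimal illustration, not KinSNP's code: the genotype encoding ("AA", "AB", "BB"), the function name `shared_homozygous_blocks`, and the parameters `min_snps` and `max_errors` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def shared_homozygous_blocks(
    genotypes: Dict[str, List[str]],
    min_snps: int = 3,
    max_errors: int = 0,
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) SNP index ranges of shared homozygosity.

    genotypes maps each affected individual to genotype calls in
    chromosomal order. A SNP conforms when every individual carries the
    same homozygous call. A block starts and ends on a conforming SNP and
    may contain at most max_errors non-conforming SNPs in between.
    """
    n = len(next(iter(genotypes.values())))
    conforms = []
    for i in range(n):
        calls = {calls_for_person[i] for calls_for_person in genotypes.values()}
        conforms.append(len(calls) == 1 and next(iter(calls)) in ("AA", "BB"))

    blocks = []
    i = 0
    while i < n:
        if not conforms[i]:
            i += 1
            continue
        errors, end = 0, i
        for j in range(i + 1, n):
            if conforms[j]:
                end = j                # extend the block to this SNP
            else:
                errors += 1
                if errors > max_errors:
                    break              # tolerance exhausted, close the block
        if end - i + 1 >= min_snps:
            blocks.append((i, end))
        i = end + 1
    return blocks


patients = {
    "p1": ["AA", "AA", "AB", "AA", "BB", "AB", "AB", "BB"],
    "p2": ["AA", "AA", "AA", "AA", "BB", "AB", "BB", "BB"],
}
print(shared_homozygous_blocks(patients, min_snps=2, max_errors=0))
print(shared_homozygous_blocks(patients, min_snps=2, max_errors=1))
```

Raising `max_errors` merges neighbouring blocks separated by an isolated discordant call, which is the trade-off the user-specified error parameter in the abstract controls.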
Haplotype-Based Genotyping in Polyploids.
Clevenger, Josh P; Korani, Walid; Ozias-Akins, Peggy; Jackson, Scott
2018-01-01
Accurate identification of polymorphisms from sequence data is crucial to unlocking the potential of high throughput sequencing for genomics. Single nucleotide polymorphisms (SNPs) are difficult to accurately identify in polyploid crops due to the duplicative nature of polyploid genomes leading to low confidence in the true alignment of short reads. Implementing a haplotype-based method in contrasting subgenome-specific sequences leads to higher accuracy of SNP identification in polyploids. To test this method, a large-scale 48K SNP array (Axiom Arachis2) was developed for Arachis hypogaea (peanut), an allotetraploid, in which 1,674 haplotype-based SNPs were included. Results of the array show that 74% of the haplotype-based SNP markers could be validated, which is considerably higher than previous methods used for peanut. The haplotype method has been implemented in a standalone program, HAPLOSWEEP, which takes as input bam files and a vcf file and identifies haplotype-based markers. Haplotype discovery can be made within single reads or span paired reads, and can leverage long read technology by targeting any length of haplotype. Haplotype-based genotyping is applicable in all allopolyploid genomes and provides confidence in marker identification and in silico-based genotyping for polyploid genomics.
ITALICS: an algorithm for normalization and DNA copy number calling for Affymetrix SNP arrays.
Rigaill, Guillem; Hupé, Philippe; Almeida, Anna; La Rosa, Philippe; Meyniel, Jean-Philippe; Decraene, Charles; Barillot, Emmanuel
2008-03-15
Affymetrix SNP arrays can be used to determine the DNA copy number measurement of 11 000-500 000 SNPs along the genome. Their high density facilitates the precise localization of genomic alterations and makes them a powerful tool for studies of cancers and copy number polymorphism. Like other microarray technologies, they are influenced by non-relevant sources of variation, requiring correction. Moreover, the amplitude of variation induced by non-relevant effects is similar to or greater than the biologically relevant effect (i.e. true copy number), making it difficult to estimate non-relevant effects accurately without including the biologically relevant effect. We addressed this problem by developing ITALICS, a normalization method that estimates both biological and non-relevant effects in an alternate, iterative manner, accurately eliminating irrelevant effects. We compared our normalization method with other existing and available methods, and found that ITALICS outperformed these methods for several in-house datasets and one public dataset. These results were validated biologically by quantitative PCR. The R package ITALICS (ITerative and Alternative normaLIzation and Copy number calling for affymetrix Snp arrays) has been submitted to Bioconductor.
McCue, Molly E.; Bannasch, Danika L.; Petersen, Jessica L.; Gurr, Jessica; Bailey, Ernie; Binns, Matthew M.; Distl, Ottmar; Guérin, Gérard; Hasegawa, Telhisa; Hill, Emmeline W.; Leeb, Tosso; Lindgren, Gabriella; Penedo, M. Cecilia T.; Røed, Knut H.; Ryder, Oliver A.; Swinburne, June E.; Tozaki, Teruaki; Valberg, Stephanie J.; Vaudin, Mark; Lindblad-Toh, Kerstin
2012-01-01
An equine SNP genotyping array was developed and evaluated on a panel of samples representing 14 domestic horse breeds and 18 evolutionarily related species. More than 54,000 polymorphic SNPs provided an average inter-SNP spacing of ∼43 kb. The mean minor allele frequency across domestic horse breeds was 0.23, and the number of polymorphic SNPs within breeds ranged from 43,287 to 52,085. Genome-wide linkage disequilibrium (LD) in most breeds declined rapidly over the first 50–100 kb and reached background levels within 1–2 Mb. The extent of LD and the level of inbreeding were highest in the Thoroughbred and lowest in the Mongolian and Quarter Horse. Multidimensional scaling (MDS) analyses demonstrated the tight grouping of individuals within most breeds, close proximity of related breeds, and less tight grouping in admixed breeds. The close relationship between the Przewalski's Horse and the domestic horse was demonstrated by pair-wise genetic distance and MDS. Genotyping of other Perissodactyla (zebras, asses, tapirs, and rhinoceros) was variably successful, with call rates and the number of polymorphic loci varying across taxa. Parsimony analysis placed the modern horse as sister taxa to Equus przewalski. The utility of the SNP array in genome-wide association was confirmed by mapping the known recessive chestnut coat color locus (MC1R) and defining a conserved haplotype of ∼750 kb across all breeds. These results demonstrate the high quality of this SNP genotyping resource, its usefulness in diverse genome analyses of the horse, and potential use in related species. PMID:22253606
Wong, Gerard; Leckie, Christopher; Gorringe, Kylie L; Haviv, Izhak; Campbell, Ian G; Kowalczyk, Adam
2010-04-15
High-density single nucleotide polymorphism (SNP) genotyping arrays are efficient and cost effective platforms for the detection of copy number variation (CNV). To ensure accuracy in probe synthesis and to minimize production costs, short oligonucleotide probe sequences are used. The use of short probe sequences limits the specificity of binding targets in the human genome. The specificity of these short probeset sequences has yet to be fully analysed against a normal reference human genome. Sequence similarity can artificially elevate or suppress copy number measurements, and hence reduce the reliability of affected probe readings. For the purpose of detecting narrow CNVs reliably down to the width of a single probeset, sequence similarity is an important issue that needs to be addressed. We surveyed the Affymetrix Human Mapping SNP arrays for probeset sequence similarity against the reference human genome. Utilizing sequence similarity results, we identified a collection of fine-scaled putative CNVs between gender from autosomal probesets whose sequence matches various loci on the sex chromosomes. To detect these variations, we utilized our statistical approach, Detecting REcurrent Copy number change using rank-order Statistics (DRECS), and showed that its performance was superior and more stable than the t-test in detecting CNVs. Through the application of DRECS on the HapMap population datasets with multi-matching probesets filtered, we identified biologically relevant SNPs in aberrant regions across populations with known association to physical traits, such as height, covered by the span of a single probe. This provided empirical confirmation of the existence of naturally occurring narrow CNVs as well as the sensitivity of the Affymetrix SNP array technology in detecting them. The MATLAB implementation of DRECS is available at http://ww2.cs.mu.oz.au/~gwong/DRECS/index.html.
Diversity analysis of cotton (Gossypium hirsutum L.) germplasm using the CottonSNP63K Array.
Hinze, Lori L; Hulse-Kemp, Amanda M; Wilson, Iain W; Zhu, Qian-Hao; Llewellyn, Danny J; Taylor, Jen M; Spriggs, Andrew; Fang, David D; Ulloa, Mauricio; Burke, John J; Giband, Marc; Lacape, Jean-Marc; Van Deynze, Allen; Udall, Joshua A; Scheffler, Jodi A; Hague, Steve; Wendel, Jonathan F; Pepper, Alan E; Frelichowski, James; Lawley, Cindy T; Jones, Don C; Percy, Richard G; Stelly, David M
2017-02-03
Cotton germplasm resources contain beneficial alleles that can be exploited to develop germplasm adapted to emerging environmental and climate conditions. Accessions and lines have traditionally been characterized based on phenotypes, but phenotypic profiles are limited by the cost, time, and space required to make visual observations and measurements. With advances in molecular genetic methods, genotypic profiles are increasingly able to identify differences among accessions due to the larger number of genetic markers that can be measured. A combination of both methods would greatly enhance our ability to characterize germplasm resources. Recent efforts have culminated in the identification of sufficient SNP markers to establish high-throughput genotyping systems, such as the CottonSNP63K array, which enables a researcher to efficiently analyze large numbers of SNP markers and obtain highly repeatable results. In the current investigation, we have utilized the SNP array for analyzing genetic diversity primarily among cotton cultivars, making comparisons to SSR-based phylogenetic analyses, and identifying loci associated with seed nutritional traits. The SNP markers distinctly separated G. hirsutum from other Gossypium species and distinguished the wild from cultivated types of G. hirsutum. The markers also efficiently discerned differences among cultivars, which was the primary goal when designing the CottonSNP63K array. Population structure within the genus compared favorably with previous results obtained using SSR markers, and an association study identified loci linked to factors that affect cottonseed protein content. Our results provide a large genome-wide variation data set for primarily cultivated cotton. Thousands of SNPs in representative cotton genotypes provide an opportunity to finely discriminate among cultivated cotton from around the world. 
The SNPs will be relevant as dense markers of genome variation for association mapping approaches aimed at correlating molecular polymorphisms with variation in phenotypic traits, as well as for molecular breeding approaches in cotton.
Troggio, Michela; Surbanovski, Nada; Bianco, Luca; Moretto, Marco; Giongo, Lara; Banchi, Elisa; Viola, Roberto; Fernández, Felicdad Fernández; Costa, Fabrizio; Velasco, Riccardo; Cestaro, Alessandro; Sargent, Daniel James
2013-01-01
High throughput arrays for the simultaneous genotyping of thousands of single-nucleotide polymorphisms (SNPs) have made the rapid genetic characterisation of plant genomes and the development of saturated linkage maps a realistic prospect for many plant species of agronomic importance. However, the correct calling of SNP genotypes in divergent polyploid genomes using array technology can be problematic due to paralogy, and to divergence in probe sequences causing changes in probe binding efficiencies. An Illumina Infinium II whole-genome genotyping array was recently developed for the cultivated apple and used to develop a molecular linkage map for an apple rootstock progeny (M432), but a large proportion of segregating SNPs were not mapped in the progeny, due to unexpected genotype clustering patterns. To investigate the causes of this unexpected clustering we performed BLAST analysis of all probe sequences against the 'Golden Delicious' genome sequence and discovered evidence for paralogous annealing sites and probe sequence divergence for a high proportion of probes contained on the array. Following visual re-evaluation of the genotyping data generated for 8,788 SNPs for the M432 progeny using the array, we manually re-scored genotypes at 818 loci and mapped a further 797 markers to the M432 linkage map. The newly mapped markers included the majority of those that could not be mapped previously, as well as loci that were previously scored as monomorphic, but which segregated due to divergence leading to heterozygosity in probe annealing sites. An evaluation of the 8,788 probes in a diverse collection of Malus germplasm showed that more than half the probes returned genotype clustering patterns that were difficult or impossible to interpret reliably, highlighting implications for the use of the array in genome-wide association studies.
USDA-ARS's Scientific Manuscript database
Single nucleotide polymorphisms (SNPs) are capable of providing the highest level of genome coverage for genomic and genetic analysis because of their abundance and relatively even distribution in the genome. Such a capacity, however, cannot be achieved without an efficient genotyping platform such ...
Development and evaluation of the first high-throughput SNP array for common carp (Cyprinus carpio)
2014-01-01
Background A large number of single nucleotide polymorphisms (SNPs) have been identified in common carp (Cyprinus carpio) but, as yet, no high-throughput genotyping platform is available for this species. C. carpio is an important aquaculture species that accounts for nearly 14% of freshwater aquaculture production worldwide. We have developed an array for C. carpio with 250,000 SNPs and evaluated its performance using samples from various strains of C. carpio. Results The SNPs used on the array were selected from two resources: the transcribed sequences from RNA-seq data of four strains of C. carpio, and the genome re-sequencing data of five strains of C. carpio. The 250,000 SNPs on the resulting array are distributed evenly across the reference C.carpio genome with an average spacing of 6.6 kb. To evaluate the SNP array, 1,072 C. carpio samples were collected and tested. Of the 250,000 SNPs on the array, 185,150 (74.06%) were found to be polymorphic sites. Genotyping accuracy was checked using genotyping data from a group of full-siblings and their parents, and over 99.8% of the qualified SNPs were found to be reliable. Analysis of the linkage disequilibrium on all samples and on three domestic C.carpio strains revealed that the latter had the longer haplotype blocks. We also evaluated our SNP array on 80 samples from eight species related to C. carpio, with from 53,526 to 71,984 polymorphic SNPs. An identity by state analysis divided all the samples into three clusters; most of the C. carpio strains formed the largest cluster. Conclusions The Carp SNP array described here is the first high-throughput genotyping platform for C. carpio. Our evaluation of this array indicates that it will be valuable for farmed carp and for genetic and population biology studies in C. carpio and related species. PMID:24762296
Development and evaluation of the first high-throughput SNP array for common carp (Cyprinus carpio).
Xu, Jian; Zhao, Zixia; Zhang, Xiaofeng; Zheng, Xianhu; Li, Jiongtang; Jiang, Yanliang; Kuang, Youyi; Zhang, Yan; Feng, Jianxin; Li, Chuangju; Yu, Juhua; Li, Qiang; Zhu, Yuanyuan; Liu, Yuanyuan; Xu, Peng; Sun, Xiaowen
2014-04-24
A large number of single nucleotide polymorphisms (SNPs) have been identified in common carp (Cyprinus carpio) but, as yet, no high-throughput genotyping platform is available for this species. C. carpio is an important aquaculture species that accounts for nearly 14% of freshwater aquaculture production worldwide. We have developed an array for C. carpio with 250,000 SNPs and evaluated its performance using samples from various strains of C. carpio. The SNPs used on the array were selected from two resources: the transcribed sequences from RNA-seq data of four strains of C. carpio, and the genome re-sequencing data of five strains of C. carpio. The 250,000 SNPs on the resulting array are distributed evenly across the reference C.carpio genome with an average spacing of 6.6 kb. To evaluate the SNP array, 1,072 C. carpio samples were collected and tested. Of the 250,000 SNPs on the array, 185,150 (74.06%) were found to be polymorphic sites. Genotyping accuracy was checked using genotyping data from a group of full-siblings and their parents, and over 99.8% of the qualified SNPs were found to be reliable. Analysis of the linkage disequilibrium on all samples and on three domestic C.carpio strains revealed that the latter had the longer haplotype blocks. We also evaluated our SNP array on 80 samples from eight species related to C. carpio, with from 53,526 to 71,984 polymorphic SNPs. An identity by state analysis divided all the samples into three clusters; most of the C. carpio strains formed the largest cluster. The Carp SNP array described here is the first high-throughput genotyping platform for C. carpio. Our evaluation of this array indicates that it will be valuable for farmed carp and for genetic and population biology studies in C. carpio and related species.
2011-01-01
Background High-throughput SNP genotyping has become an essential requirement for molecular breeding and population genomics studies in plant species. Large scale SNP developments have been reported for several mainstream crops. A growing interest now exists to expand the speed and resolution of genetic analysis to outbred species with highly heterozygous genomes. When nucleotide diversity is high, a refined diagnosis of the target SNP sequence context is needed to convert queried SNPs into high-quality genotypes using the Golden Gate Genotyping Technology (GGGT). This issue becomes exacerbated when attempting to transfer SNPs across species, a scarcely explored topic in plants, and likely to become significant for population genomics and inter specific breeding applications in less domesticated and less funded plant genera. Results We have successfully developed the first set of 768 SNPs assayed by the GGGT for the highly heterozygous genome of Eucalyptus from a mixed Sanger/454 database with 1,164,695 ESTs and the preliminary 4.5X draft genome sequence for E. grandis. A systematic assessment of in silico SNP filtering requirements showed that stringent constraints on the SNP surrounding sequences have a significant impact on SNP genotyping performance and polymorphism. SNP assay success was high for the 288 SNPs selected with more rigorous in silico constraints; 93% of them provided high quality genotype calls and 71% of them were polymorphic in a diverse panel of 96 individuals of five different species. SNP reliability was high across nine Eucalyptus species belonging to three sections within subgenus Symphomyrtus and still satisfactory across species of two additional subgenera, although polymorphism declined as phylogenetic distance increased. Conclusions This study indicates that the GGGT performs well both within and across species of Eucalyptus notwithstanding its nucleotide diversity ≥2%. 
The development of a much larger array of informative SNPs across multiple Eucalyptus species is feasible, although strongly dependent on having a representative and sufficiently deep collection of sequences from many individuals of each target species. A higher density SNP platform will be instrumental to undertake genome-wide phylogenetic and population genomics studies and to implement molecular breeding by Genomic Selection in Eucalyptus. PMID:21492434
Analysis and visualization of chromosomal abnormalities in SNP data with SNPscan
Ting, Jason C; Ye, Ying; Thomas, George H; Ruczinski, Ingo; Pevsner, Jonathan
2006-01-01
Background A variety of diseases are caused by chromosomal abnormalities such as aneuploidies (having an abnormal number of chromosomes), microdeletions, microduplications, and uniparental disomy. High density single nucleotide polymorphism (SNP) microarrays provide information on chromosomal copy number changes, as well as genotype (heterozygosity and homozygosity). SNP array studies generate multiple types of data for each SNP site, some with more than 100,000 SNPs represented on each array. The identification of different classes of anomalies within SNP data has been challenging. Results We have developed SNPscan, a web-accessible tool to analyze and visualize high density SNP data. It enables researchers (1) to visually and quantitatively assess the quality of user-generated SNP data relative to a benchmark data set derived from a control population, (2) to display SNP intensity and allelic call data in order to detect chromosomal copy number anomalies (duplications and deletions), (3) to display uniparental isodisomy based on loss of heterozygosity (LOH) across genomic regions, (4) to compare paired samples (e.g. tumor and normal), and (5) to generate a file type for viewing SNP data in the University of California, Santa Cruz (UCSC) Human Genome Browser. SNPscan accepts data exported from Affymetrix Copy Number Analysis Tool as its input. We validated SNPscan using data generated from patients with known deletions, duplications, and uniparental disomy. We also inspected previously generated SNP data from 90 apparently normal individuals from the Centre d'Étude du Polymorphisme Humain (CEPH) collection, and identified three cases of uniparental isodisomy, four females having an apparently mosaic X chromosome, two mislabelled SNP data sets, and one microdeletion on chromosome 2 with mosaicism from an apparently normal female. These previously unrecognized abnormalities were all detected using SNPscan. 
The microdeletion was independently confirmed by fluorescence in situ hybridization, and a region of homozygosity in a UPD case was confirmed by sequencing of genomic DNA. Conclusion SNPscan is useful to identify chromosomal abnormalities based on SNP intensity (such as chromosomal copy number changes) and heterozygosity data (including regions of LOH and some cases of UPD). The program and source code are available at the SNPscan website. PMID:16420694
Ni, Guiyan; Cavero, David; Fangmann, Anna; Erbe, Malena; Simianer, Henner
2017-01-16
With the availability of next-generation sequencing technologies, genomic prediction based on whole-genome sequencing (WGS) data is now feasible in animal breeding schemes and was expected to lead to higher predictive ability, since such data may contain all genomic variants including causal mutations. Our objective was to compare prediction ability with high-density (HD) array data and WGS data in a commercial brown layer line with genomic best linear unbiased prediction (GBLUP) models using various approaches to weight single nucleotide polymorphisms (SNPs). A total of 892 chickens from a commercial brown layer line were genotyped with 336 K segregating SNPs (array data) that included 157 K genic SNPs (i.e. SNPs in or around a gene). For these individuals, genome-wide sequence information was imputed based on data from re-sequencing runs of 25 individuals, leading to 5.2 million (M) imputed SNPs (WGS data), including 2.6 M genic SNPs. De-regressed proofs (DRP) for eggshell strength, feed intake and laying rate were used as quasi-phenotypic data in genomic prediction analyses. Four weighting factors for building a trait-specific genomic relationship matrix were investigated: identical weights, -(log 10 P) from genome-wide association study results, squares of SNP effects from random regression BLUP, and variable selection based weights (known as BLUP|GA). Predictive ability was measured as the correlation between DRP and direct genomic breeding values in five replications of a fivefold cross-validation. Averaged over the three traits, the highest predictive ability (0.366 ± 0.075) was obtained when only genic SNPs from WGS data were used. Predictive abilities with genic SNPs and all SNPs from HD array data were 0.361 ± 0.072 and 0.353 ± 0.074, respectively. 
Prediction with -(log10 P) or squares of SNP effects as weighting factors for building a genomic relationship matrix or BLUP|GA did not increase accuracy, compared to that with identical weights, regardless of the SNP set used. Our results show that little or no benefit was gained when using all imputed WGS data to perform genomic prediction compared to using HD array data regardless of the weighting factors tested. However, using only genic SNPs from WGS data had a positive effect on prediction ability.
Making a chocolate chip: development and evaluation of a 6K SNP array for Theobroma cacao.
Livingstone, Donald; Royaert, Stefan; Stack, Conrad; Mockaitis, Keithanne; May, Greg; Farmer, Andrew; Saski, Christopher; Schnell, Ray; Kuhn, David; Motamayor, Juan Carlos
2015-08-01
Theobroma cacao, the key ingredient in chocolate production, is one of the world's most important tree fruit crops, with ∼4,000,000 metric tons produced across 50 countries. To move towards gene discovery and marker-assisted breeding in cacao, a single-nucleotide polymorphism (SNP) identification project was undertaken using RNAseq data from 16 diverse cacao cultivars. RNA sequences were aligned to the assembled transcriptome of the cultivar Matina 1-6, and 330,000 SNPs within coding regions were identified. From these SNPs, a subset of 6,000 high-quality SNPs were selected for inclusion on an Illumina Infinium SNP array: the Cacao6kSNP array. Using Cacao6KSNP array data from over 1,000 cacao samples, we demonstrate that our custom array produces a saturated genetic map and can be used to distinguish among even closely related genotypes. Our study enhances and expands the genetic resources available to the cacao research community, and provides the genome-scale set of tools that are critical for advancing breeding with molecular markers in an agricultural species with high genetic diversity. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Xu, Lingyang; Hou, Yali; Bickhart, Derek M; Song, Jiuzhou; Liu, George E
2013-06-25
Copy number variations (CNVs) are gains and losses of genomic sequence between two individuals of a species when compared to a reference genome. The data from single nucleotide polymorphism (SNP) microarrays are now routinely used for genotyping, but they also can be utilized for copy number detection. Substantial progress has been made in array design and CNV calling algorithms and at least 10 comparison studies in humans have been published to assess them. In this review, we first survey the literature on existing microarray platforms and CNV calling algorithms. We then examine a number of CNV calling tools to evaluate their impacts using bovine high-density SNP data. Large incongruities in the results from different CNV calling tools highlight the need for standardizing array data collection, quality assessment and experimental validation. Only after careful experimental design and rigorous data filtering can the impacts of CNVs on both normal phenotypic variability and disease susceptibility be fully revealed.
Kawakami, Takeshi; Backström, Niclas; Burri, Reto; Husby, Arild; Olason, Pall; Rice, Amber M; Ålund, Murielle; Qvarnström, Anna; Ellegren, Hans
2014-01-01
With access to draft genome sequence assemblies and whole-genome resequencing data from population samples, molecular ecology studies will be able to take truly genome-wide approaches. This now applies to an avian model system in ecological and evolutionary research: Old World flycatchers of the genus Ficedula, for which we recently obtained a 1.1 Gb collared flycatcher genome assembly and identified 13 million single-nucleotide polymorphisms (SNPs) in population resequencing of this species and its sister species, pied flycatcher. Here, we developed a custom 50K Illumina iSelect flycatcher SNP array with markers covering 30 autosomes and the Z chromosome. Using a number of selection criteria for inclusion in the array, both genotyping success rate and polymorphism information content (mean marker heterozygosity = 0.41) were high. We used the array to assess linkage disequilibrium (LD) and hybridization in flycatchers. Linkage disequilibrium declined quickly to the background level at an average distance of 17 kb, but the extent of LD varied markedly within the genome and was more than 10-fold higher in ‘genomic islands’ of differentiation than in the rest of the genome. Genetic ancestry analysis identified 33 F1 hybrids but no later-generation hybrids from sympatric populations of collared flycatchers and pied flycatchers, contradicting earlier reports of backcrosses identified from far fewer markers. With an estimated divergence time as recent as <1 Ma, this suggests strong selection against F1 hybrids and unusually rapid evolution of reproductive incompatibility in an avian system. PMID:24784959
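As an illustration of how pairwise LD between array markers can be summarized, here is a minimal sketch that computes r² as the squared Pearson correlation of genotype dosages. It assumes unphased genotypes coded 0, 1, 2 with no missing values; the function name is hypothetical and this is not necessarily the estimator used in the study.

```python
def ld_r2(geno_a, geno_b):
    """Squared Pearson correlation between two SNPs' genotype dosages.

    geno_a and geno_b are equal-length sequences of 0/1/2 allele counts,
    one entry per individual.
    """
    if len(geno_a) != len(geno_b) or len(geno_a) < 2:
        raise ValueError("need two equal-length genotype vectors")
    n = len(geno_a)
    mean_a = sum(geno_a) / n
    mean_b = sum(geno_b) / n
    # Unnormalized covariance and variances; the 1/n factors cancel in r^2.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(geno_a, geno_b))
    var_a = sum((a - mean_a) ** 2 for a in geno_a)
    var_b = sum((b - mean_b) ** 2 for b in geno_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("monomorphic marker: r^2 is undefined")
    return cov * cov / (var_a * var_b)
```

Plotting such values against the physical distance between marker pairs gives an LD decay curve of the kind summarized in the abstract.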
Schulz, Vincent; Chen, Min; Tuck, David
2010-01-01
Background: Genotyping platforms such as single nucleotide polymorphism (SNP) arrays are powerful tools to study genomic aberrations in cancer samples. Allele specific information from SNP arrays provides valuable information for interpreting copy number variation (CNV) and allelic imbalance including loss-of-heterozygosity (LOH) beyond that obtained from the total DNA signal available from array comparative genomic hybridization (aCGH) platforms. Several algorithms based on hidden Markov models (HMMs) have been designed to detect copy number changes and copy-neutral LOH making use of the allele information on SNP arrays. However, heterogeneity in clinical samples, due to stromal contamination and somatic alterations, complicates analysis and interpretation of these data. Methods: We have developed MixHMM, a novel hidden Markov model using hidden states based on chromosomal structural aberrations. MixHMM allows CNV detection for copy numbers up to 7 and allows more complete and accurate description of other forms of allelic imbalance, such as increased copy number LOH or imbalanced amplifications. MixHMM also incorporates a novel sample mixing model that allows detection of tumor CNV events in heterogeneous tumor samples, where cancer cells are mixed with a proportion of stromal cells. Conclusions: We validate MixHMM and demonstrate its advantages with simulated samples, clinical tumor samples and a dilution series of mixed samples. We have shown that the CNVs of cancer cells in a tumor sample contaminated with up to 80% of stromal cells can be detected accurately using Illumina BeadChip and MixHMM. Availability: The MixHMM is available as a Python package provided with some other useful tools at http://genecube.med.yale.edu:8080/MixHMM. PMID:20532221
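To make the effect of stromal contamination concrete, the sketch below computes the B-allele frequency expected at a germline-heterozygous SNP when tumor cells are mixed with normal diploid cells. This is a generic two-population mixing formula written as an assumption for illustration; it is not the MixHMM package's API or its actual parameterization, and the function name is hypothetical.

```python
def expected_baf(tumor_fraction, tumor_a, tumor_b):
    """Expected B-allele frequency at a germline AB SNP in a mixed sample.

    tumor_fraction: proportion of tumor cells in the sample (0 to 1).
    tumor_a, tumor_b: copies of the A and B alleles in tumor cells.
    Normal (stromal) cells are assumed to carry one A and one B copy.
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    normal_fraction = 1.0 - tumor_fraction
    b_copies = tumor_fraction * tumor_b + normal_fraction * 1
    total_copies = tumor_fraction * (tumor_a + tumor_b) + normal_fraction * 2
    if total_copies == 0:
        raise ValueError("no DNA at this locus (pure homozygous deletion)")
    return b_copies / total_copies
```

With 80% stromal cells (`tumor_fraction=0.2`), copy-neutral LOH of the B allele (`tumor_a=2, tumor_b=0`) shifts the expected value only from 0.5 to 0.4, which illustrates why heavily contaminated samples are hard to analyze.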
Wang, Shichen; Wong, Debbie; Forrest, Kerrie; Allen, Alexandra; Chao, Shiaoman; Huang, Bevan E; Maccaferri, Marco; Salvi, Silvio; Milner, Sara G; Cattivelli, Luigi; Mastrangelo, Anna M; Whan, Alex; Stephen, Stuart; Barker, Gary; Wieseke, Ralf; Plieske, Joerg; International Wheat Genome Sequencing Consortium; Lillemo, Morten; Mather, Diane; Appels, Rudi; Dolferus, Rudy; Brown-Guedira, Gina; Korol, Abraham; Akhunova, Alina R; Feuillet, Catherine; Salse, Jerome; Morgante, Michele; Pozniak, Curtis; Luo, Ming-Cheng; Dvorak, Jan; Morell, Matthew; Dubcovsky, Jorge; Ganal, Martin; Tuberosa, Roberto; Lawley, Cindy; Mikoulitch, Ivan; Cavanagh, Colin; Edwards, Keith J; Hayden, Matthew; Akhunov, Eduard
2014-01-01
High-density single nucleotide polymorphism (SNP) genotyping arrays are a powerful tool for studying genomic patterns of diversity, inferring ancestral relationships between individuals in populations and studying marker–trait associations in mapping experiments. We developed a genotyping array including about 90 000 gene-associated SNPs and used it to characterize genetic variation in allohexaploid and allotetraploid wheat populations. The array includes a significant fraction of common genome-wide distributed SNPs that are represented in populations of diverse geographical origin. We used density-based spatial clustering algorithms to enable high-throughput genotype calling in complex data sets obtained for polyploid wheat. We show that these model-free clustering algorithms provide accurate genotype calling in the presence of multiple clusters including clusters with low signal intensity resulting from significant sequence divergence at the target SNP site or gene deletions. Assays that detect low-intensity clusters can provide insight into the distribution of presence–absence variation (PAV) in wheat populations. A total of 46 977 SNPs from the wheat 90K array were genetically mapped using a combination of eight mapping populations. The developed array and cluster identification algorithms provide an opportunity to infer detailed haplotype structure in polyploid wheat and will serve as an invaluable resource for diversity studies and investigating the genetic basis of trait variation in wheat. PMID:24646323
Männik, Katrin; Parkel, Sven; Palta, Priit; Zilina, Olga; Puusepp, Helen; Esko, Tõnu; Mägi, Reedik; Nõukas, Margit; Veidenberg, Andres; Nelis, Mari; Metspalu, Andres; Remm, Maido; Ounap, Katrin; Kurg, Ants
2011-01-01
The increasing use of whole-genome array screening has revealed the important role of DNA copy-number variations in the pathogenesis of neurodevelopmental disorders and several recurrent genomic disorders have been defined during recent years. However, some variants considered to be pathogenic have also been observed in phenotypically normal individuals. This underlines the importance of further characterization of genomic variants with potentially variable expressivity in both patient and general population cohorts to clarify their phenotypic consequence. In this study whole-genome SNP arrays were used to investigate genomic rearrangements in 77 Estonian families with idiopathic mental retardation. In addition to this family-based approach, phenotype and genotype data from a cohort of 1000 individuals in the general population were used for accurate interpretation of aberrations found in mental retardation patients. Relevant structural aberrations were detected in 18 of the families analyzed (23%). Fifteen of those were in genomic regions where clinical significance has previously been established. In 3 families, 4 novel aberrations associated with intellectual disability were detected in chromosome regions 2p25.1-p24.3, 3p12.1-p11.2, 7p21.2-p21.1 and Xq28. Carriers of imbalances in 15q13.3, 16p11.2 and Xp22.31 were identified among reference individuals, affirming the variable phenotypic consequence of rare variants in some genomic regions considered as pathogenic. Copyright © 2010 Elsevier Masson SAS. All rights reserved.
Increasing feed efficiency and reducing methane emissions using genomics: An international approach
USDA-ARS's Scientific Manuscript database
Genomic technology (including SNP arrays and next-generation sequencing) is a powerful driver for the genetic improvement of livestock. Phenotype recording can now, to an extent, be partitioned from selection, and even limited to several thousand animals. Rapid development of new technologies and pr...
Linkage Disequilibrium And Genome-Wide Association Studies In O. sativa
USDA-ARS's Scientific Manuscript database
There is increasing evidence that genome-wide association studies provide a powerful approach to find the genetic basis of complex phenotypic variation in all kinds of species. For this purpose, we developed the first generation 44K Affymetrix SNP array in rice (see Tung et al. poster). We genotyped...
Gao, Z J; Jiang, Q; Cheng, D Z; Yan, X X; Chen, Q; Xu, K M
2016-10-02
Objective: To evaluate the application of single nucleotide polymorphism (SNP)-microarray and target gene sequencing technology in the clinical molecular genetic diagnosis of unexplained intellectual disability (ID) or developmental delay (DD). Method: Patients with ID or DD were recruited in the Department of Neurology, Affiliated Children's Hospital of Capital Institute of Pediatrics between September 2015 and February 2016. The intellectual assessment of the patients was performed using the 0-6-year-old pediatric examination table of neuropsychological development or the Wechsler intelligence scale (>6 years). Patients with a DQ less than 49 or IQ less than 51 were included in this study. The patients were scanned by SNP-array for detection of genomic copy number variations (CNV), and the revealed genomic imbalance was confirmed by quantitative real time-PCR. Candidate gene mutation screening was carried out by target gene sequencing technology. Causal mutations or likely pathogenic variants were verified by polymerase chain reaction and direct sequencing. Result: There were 15 children with ID or DD enrolled, 9 males and 6 females. The age of these patients was 7 months-16 years and 9 months. SNP-array revealed that two of the 15 patients had genomic CNV. Both CNV were de novo microdeletions; one involved 11q24.1q25 and the other was located on 21q22.2q22.3. Both microdeletions were shown to be clinically significant, owing to their association with ID, brain DD, unusual faces etc., by querying the DECIPHER database. Thirteen patients with negative findings in SNP-array were subsequently examined with target gene sequencing technology, genotype-phenotype correlation analysis and genetic analysis. Five patients were diagnosed with monogenic disorder, two were diagnosed with suspected genetic disorder and six were still negative.
Conclusion: Sequential use of SNP-array and target gene sequencing technology can significantly increase the molecular genetic etiologic diagnosis rate of the patients with unexplained ID or DD. Combined use of these technologies can serve as a useful examinational method in assisting differential diagnosis of children with unexplained ID or DD.
Zhu, Bo; Niu, Hong; Zhang, Wengang; Wang, Zezhao; Liang, Yonghu; Guan, Long; Guo, Peng; Chen, Yan; Zhang, Lupei; Guo, Yong; Ni, Heming; Gao, Xue; Gao, Huijiang; Xu, Lingyang; Li, Junya
2017-06-14
Fatty acid composition of muscle is an important trait contributing to meat quality. Recently, genome-wide association study (GWAS) has been extensively used to explore the molecular mechanism underlying important traits in cattle. In this study, we performed GWAS using a high-density SNP array to analyze the association between SNPs and fatty acids and evaluated the accuracy of genomic prediction for fatty acids in Chinese Simmental cattle. Using the BayesB method, we identified 35 and 7 regions in Chinese Simmental cattle that displayed significant associations with individual fatty acids and fatty acid groups, respectively. We further obtained several candidate genes which may be involved in fatty acid biosynthesis including elongation of very long chain fatty acids protein 5 (ELOVL5), fatty acid synthase (FASN), caspase 2 (CASP2) and thyroglobulin (TG). Specifically, we obtained strong evidence of association signals for one SNP located at 51.3 Mb for FASN using Genome-wide Rapid Association Mixed Model and Regression-Genomic Control (GRAMMAR-GC) approaches. Also, a region-based association test identified multiple SNPs within FASN and ELOVL5 for C14:0. In addition, our results revealed that the effectiveness of genomic prediction for fatty acid composition using BayesB was slightly superior to that of GBLUP in Chinese Simmental cattle. We identified several significantly associated regions and loci which can be considered as potential candidate markers for genomics-assisted breeding programs. Using multiple methods, our results revealed that FASN and ELOVL5 are associated with fatty acids with strong evidence. Our findings also suggest that it is feasible to perform genomic selection for fatty acids in Chinese Simmental cattle.
Troggio, Michela; Šurbanovski, Nada; Bianco, Luca; Moretto, Marco; Giongo, Lara; Banchi, Elisa; Viola, Roberto; Fernández, Felicdad Fernández; Costa, Fabrizio; Velasco, Riccardo; Cestaro, Alessandro; Sargent, Daniel James
2013-01-01
High throughput arrays for the simultaneous genotyping of thousands of single-nucleotide polymorphisms (SNPs) have made the rapid genetic characterisation of plant genomes and the development of saturated linkage maps a realistic prospect for many plant species of agronomic importance. However, the correct calling of SNP genotypes in divergent polyploid genomes using array technology can be problematic due to paralogy, and to divergence in probe sequences causing changes in probe binding efficiencies. An Illumina Infinium II whole-genome genotyping array was recently developed for the cultivated apple and used to develop a molecular linkage map for an apple rootstock progeny (M432), but a large proportion of segregating SNPs were not mapped in the progeny, due to unexpected genotype clustering patterns. To investigate the causes of this unexpected clustering we performed BLAST analysis of all probe sequences against the ‘Golden Delicious’ genome sequence and discovered evidence for paralogous annealing sites and probe sequence divergence for a high proportion of probes contained on the array. Following visual re-evaluation of the genotyping data generated for 8,788 SNPs for the M432 progeny using the array, we manually re-scored genotypes at 818 loci and mapped a further 797 markers to the M432 linkage map. The newly mapped markers included the majority of those that could not be mapped previously, as well as loci that were previously scored as monomorphic, but which segregated due to divergence leading to heterozygosity in probe annealing sites. An evaluation of the 8,788 probes in a diverse collection of Malus germplasm showed that more than half the probes returned genotype clustering patterns that were difficult or impossible to interpret reliably, highlighting implications for the use of the array in genome-wide association studies. PMID:23826289
Genome-wide association study for milking speed in French Holstein cows.
Marete, Andrew; Sahana, Goutam; Fritz, Sébastien; Lefebvre, Rachel; Barbat, Anne; Lund, Mogens Sandø; Guldbrandtsen, Bernt; Boichard, Didier
2018-04-25
Using a combination of data from the BovineSNP50 BeadChip SNP array (Illumina, San Diego, CA) and a EuroGenomics (Amsterdam, the Netherlands) custom single nucleotide polymorphism (SNP) chip with SNP pre-selected from whole genome sequence data, we carried out an association study of milking speed in 32,491 French Holstein dairy cows. Milking speed was measured by a score given by the farmer. Phenotypes were yield deviations as obtained from the French evaluation system. They were analyzed with a linear mixed model for association studies. We identified SNP on 22 chromosomes significantly associated with milking speed. As clinical mastitis and somatic cell score have an unfavorable genetic correlation with milking speed, we tested whether the most significant SNP on these 22 chromosomes associated with milking speed were also associated with clinical mastitis or somatic cell score. Nine hundred seventy-one genome-wide significant SNP were associated with milking speed. Of these, 86 were associated with clinical mastitis and 198 with somatic cell score. The most significant association signals for milking speed were observed on chromosomes 7, 8, 10, 14, and 18. The most significant signal was located on chromosome 14 (ZFAT gene). Eleven novel milking speed quantitative trait loci (QTL) were observed on chromosomes 7, 10, 11, 14, 18, 25, and 26. Twelve candidate SNP for milking speed mapped directly within genes. Of these 10 were QTL lead SNP, which mapped within the genes HMHA1, POLR2E, GNB5, KLHL29, ZFAT, KCNB2, CEACAM18, CCL24, and LHPP. Limited pleiotropy was observed between milking speed QTL and clinical mastitis. Copyright © 2018 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Abbey, Darren; Hickman, Meleah; Gresham, David; Berman, Judith
2011-01-01
Phenotypic diversity can arise rapidly through loss of heterozygosity (LOH) or by the acquisition of copy number variations (CNV) spanning whole chromosomes or shorter contiguous chromosome segments. In Candida albicans, a heterozygous diploid yeast pathogen with no known meiotic cycle, homozygosis and aneuploidy alter clinical characteristics, including drug resistance. Here, we developed a high-resolution microarray that simultaneously detects ∼39,000 single nucleotide polymorphism (SNP) alleles and ∼20,000 copy number variation loci across the C. albicans genome. An important feature of the array analysis is a computational pipeline that determines SNP allele ratios based upon chromosome copy number. Using the array and analysis tools, we constructed a haplotype map (hapmap) of strain SC5314 to assign SNP alleles to specific homologs, and we used it to follow the acquisition of loss of heterozygosity (LOH) and copy number changes in a series of derived laboratory strains. This high-resolution SNP/CGH microarray and the associated hapmap facilitated the phasing of alleles in lab strains and revealed detrimental genome changes that arose frequently during molecular manipulations of laboratory strains. Furthermore, it provided a useful tool for rapid, high-resolution, and cost-effective characterization of changes in allele diversity as well as changes in chromosome copy number in new C. albicans isolates. PMID:22384363
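The idea of interpreting SNP allele ratios in light of chromosome copy number can be sketched as follows: with n copies of a chromosome, a biallelic SNP can only show allele fractions k/n. The code is a minimal illustration under that assumption; the function names are hypothetical and this is not the authors' pipeline.

```python
from fractions import Fraction


def expected_allele_fractions(copy_number):
    """Possible fractions of allele A at a SNP present in copy_number copies."""
    if copy_number < 1:
        raise ValueError("copy_number must be at least 1")
    return [Fraction(k, copy_number) for k in range(copy_number + 1)]


def nearest_allele_count(observed_fraction, copy_number):
    """Number of A-allele copies whose expected fraction is closest to the observation."""
    states = expected_allele_fractions(copy_number)
    return min(
        range(len(states)),
        key=lambda k: abs(float(states[k]) - observed_fraction),
    )
```

For a trisomic chromosome, an observed allele fraction near 0.35 is best explained by one A copy and two B copies (`nearest_allele_count(0.35, 3)` returns 1), whereas the same value on a disomic chromosome would be read as a noisy heterozygote.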
Ganal, Martin W.; Durstewitz, Gregor; Polley, Andreas; Bérard, Aurélie; Buckler, Edward S.; Charcosset, Alain; Clarke, Joseph D.; Graner, Eva-Maria; Hansen, Mark; Joets, Johann; Le Paslier, Marie-Christine; McMullen, Michael D.; Montalent, Pierre; Rose, Mark; Schön, Chris-Carolin; Sun, Qi; Walter, Hildrun; Martin, Olivier C.; Falque, Matthieu
2011-01-01
SNP genotyping arrays have been useful for many applications that require a large number of molecular markers such as high-density genetic mapping, genome-wide association studies (GWAS), and genomic selection. We report the establishment of a large maize SNP array and its use for diversity analysis and high density linkage mapping. The markers, taken from more than 800,000 SNPs, were selected to be preferentially located in genes and evenly distributed across the genome. The array was tested with a set of maize germplasm including North American and European inbred lines, parent/F1 combinations, and distantly related teosinte material. A total of 49,585 markers, including 33,417 within 17,520 different genes and 16,168 outside genes, were of good quality for genotyping, with an average failure rate of 4% and rates up to 8% in specific germplasm. To demonstrate this array's use in genetic mapping and for the independent validation of the B73 sequence assembly, two intermated maize recombinant inbred line populations – IBM (B73×Mo17) and LHRF (F2×F252) – were genotyped to establish two high density linkage maps with 20,913 and 14,524 markers respectively. 172 mapped markers were absent in the current B73 assembly and their placement can be used for future improvements of the B73 reference sequence. Colinearity of the genetic and physical maps was mostly conserved with some exceptions that suggest errors in the B73 assembly. Five major regions containing non-colinearities were identified on chromosomes 2, 3, 6, 7 and 9, and are supported by both independent genetic maps. Four additional non-colinear regions were found on the LHRF map only; they may be due to a lower density of IBM markers in those regions or to true structural rearrangements between lines. Given the array's high quality, it will be a valuable resource for maize genetics and many aspects of maize breeding. PMID:22174790
USDA-ARS's Scientific Manuscript database
Improving water-use efficiency by incorporating drought avoidance traits into new wheat varieties is an important objective for wheat breeding in water-limited environments. This study uses genome wide association studies (GWAS) to identify candidate loci for water-soluble carbohydrate accumulation,...
A Complex 6p25 Rearrangement in a Child With Multiple Epiphyseal Dysplasia
Bedoyan, Jirair K.; Lesperance, Marci M.; Ackley, Todd; Iyer, Ramaswamy K.; Innis, Jeffrey W.; Misra, Vinod K.
2015-01-01
Genomic rearrangements are increasingly recognized as important contributors to human disease. Here we report on an 11½-year-old child with myopia, Duane retraction syndrome, bilateral mixed hearing loss, skeletal anomalies including multiple epiphyseal dysplasia, and global developmental delay, and a complex 6p25 genomic rearrangement. We have employed oligonucleotide-based comparative genomic hybridization arrays (aCGH) of different resolutions (44 and 244K) as well as a 1 M single nucleotide polymorphism (SNP) array to analyze this complex rearrangement. Our analyses reveal a complex rearrangement involving a ~2.21 Mb interstitial deletion, a ~240 kb terminal deletion, and a 70–80 kb region in between these two deletions that shows maintenance of genomic copy number. The interstitial deletion contains eight known genes, including three Forkhead box containing (FOX) transcription factors (FOXQ1, FOXF2, and FOXC1). The region maintaining genomic copy number partly overlaps the dual specificity protein phosphatase 22 (DUSP22) gene. Array analyses suggest a homozygous loss of genomic material at the 5′ end of DUSP22, which was corroborated using TaqMan® copy number analysis. It is possible that this homozygous genomic loss may render both copies of DUSP22 or its products non-functional. Our analysis suggests a rearrangement mechanism distinct from a previously reported replication-based error-prone mechanism without template switching for a specific 6p25 rearrangement with a 1.22 Mb interstitial deletion. Our study demonstrates the utility and limitations of using oligonucleotide-based aCGH and SNP array technologies of increasing resolutions in order to identify complex DNA rearrangements and gene disruptions. PMID:21204225
Tumino, Giorgio; Voorrips, Roeland E; Rizza, Fulvia; Badeck, Franz W; Morcia, Caterina; Ghizzoni, Roberta; Germeier, Christoph U; Paulo, Maria-João; Terzi, Valeria; Smulders, Marinus J M
2016-09-01
Infinium SNP data analysed as continuous intensity ratios enabled associating genotypic and phenotypic data from heterogeneous oat samples, showing that association mapping for frost tolerance is a feasible option. Oat is sensitive to freezing temperatures, which restricts the cultivation of fall-sown or winter oats to regions with milder winters. Fall-sown oats have a longer growth cycle, mature earlier, and have a higher productivity than spring-sown oats, therefore improving frost tolerance is an important goal in oat breeding. Our aim was to test the effectiveness of a Genome-Wide Association Study (GWAS) for mapping QTLs related to frost tolerance, using an approach that tolerates continuously distributed signals from SNPs in bulked samples from heterogeneous accessions. A collection of 138 European oat accessions, including landraces, old and modern varieties from 27 countries was genotyped using the Infinium 6K SNP array. The SNP data were analyzed as continuous intensity ratios, rather than converting them into discrete values by genotype calling. PCA and Ward's clustering of genetic similarities revealed the presence of two main groups of accessions, which roughly corresponded to Continental Europe and Mediterranean/Atlantic Europe, although a total of eight subgroups can be distinguished. The accessions were phenotyped for frost tolerance under controlled conditions by measuring fluorescence quantum yield of photosystem II after a freezing stress. GWAS were performed by a linear mixed model approach, comparing different corrections for population structure. All models detected three robust QTLs, two of which co-mapped with QTLs identified earlier in bi-parental mapping populations. The approach used in the present work shows that SNP array data of heterogeneous hexaploid oat samples can be successfully used to determine genetic similarities and to map associations to quantitative phenotypic traits.
Hartmann, Luise; Stephenson, Christine F; Verkamp, Stephanie R; Johnson, Krystal R; Burnworth, Bettina; Hammock, Kelle; Brodersen, Lisa Eidenschink; de Baca, Monica E; Wells, Denise A; Loken, Michael R; Zehentner, Barbara K
2014-12-01
Array comparative genomic hybridization (aCGH) has become a powerful tool for analyzing hematopoietic neoplasms and identifying genome-wide copy number changes in a single assay. aCGH also has superior resolution compared with fluorescence in situ hybridization (FISH) or conventional cytogenetics. Integration of single nucleotide polymorphism (SNP) probes with microarray analysis allows additional identification of acquired uniparental disomy, a copy neutral aberration with known potential to contribute to tumor pathogenesis. However, a limitation of microarray analysis has been the inability to detect clonal heterogeneity in a sample. This study comprised 16 samples (acute myeloid leukemia, myelodysplastic syndrome, chronic lymphocytic leukemia, plasma cell neoplasm) with complex cytogenetic features and evidence of clonal evolution. We used an integrated manual peak reassignment approach combining analysis of aCGH and SNP microarray data for characterization of subclonal abnormalities. We compared array findings with results obtained from conventional cytogenetic and FISH studies. Clonal heterogeneity was detected in 13 of 16 samples by microarray on the basis of log2 values. Use of the manual peak reassignment analysis approach improved resolution of the sample's clonal composition and genetic heterogeneity in 10 of 13 (77%) patients. Moreover, in 3 patients, clonal disease progression was revealed by array analysis that was not evident by cytogenetic or FISH studies. Genetic abnormalities originating from separate clonal subpopulations can be identified and further characterized by combining aCGH and SNP hybridization results from 1 integrated microarray chip by use of the manual peak reassignment technique. Its clinical utility in comparison to conventional cytogenetic or FISH studies is demonstrated. © 2014 American Association for Clinical Chemistry.
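The link between log2 values and clonal heterogeneity can be illustrated with a simple calculation: a copy number change present in only a fraction of cells produces a log2 ratio between 0 and the value expected for a fully clonal event. The sketch below assumes a diploid background, a single aberrant clone, and an ideal array response; the function names are hypothetical and it does not reproduce the manual peak reassignment method itself.

```python
from math import log2


def expected_log2_ratio(clone_fraction, clone_copies, normal_copies=2):
    """Expected log2 ratio when clone_fraction of cells carry clone_copies copies."""
    mean_copies = clone_fraction * clone_copies + (1 - clone_fraction) * normal_copies
    if mean_copies <= 0:
        raise ValueError("average copy number must be positive")
    return log2(mean_copies / normal_copies)


def clone_fraction_from_log2(log2_ratio, clone_copies, normal_copies=2):
    """Invert expected_log2_ratio: estimate the fraction of cells in the clone."""
    if clone_copies == normal_copies:
        raise ValueError("a copy-neutral clone cannot be sized from log2 ratios")
    return normal_copies * (2 ** log2_ratio - 1) / (clone_copies - normal_copies)
```

Under these assumptions, a single-copy loss in half of the cells gives a log2 ratio of about -0.415 instead of -1, so intermediate segment levels point to subclonal populations.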
USDA-ARS's Scientific Manuscript database
Background: In a previously reported genome-wide association study based on a high-density bovine SNP genotyping array, 8 SNP were nominally associated (P=0.003) with average daily gain (ADG) and 3 of these were also associated (P=0.002) with average daily feed intake (ADFI) in a population of c...
R classes and methods for SNP array data.
Scharpf, Robert B; Ruczinski, Ingo
2010-01-01
The Bioconductor project is an "open source and open development software project for the analysis and comprehension of genomic data" (1), primarily based on the R programming language. Infrastructure packages, such as Biobase, are maintained by Bioconductor core developers and serve several key roles to the broader community of Bioconductor software developers and users. In particular, Biobase introduces an S4 class, the eSet, for high-dimensional assay data. Encapsulating the assay data as well as meta-data on the samples, features, and experiment in the eSet class definition ensures propagation of the relevant sample and feature meta-data throughout an analysis. Extending the eSet class promotes code reuse through inheritance as well as interoperability with other R packages and is less error-prone. Recently proposed class definitions for high-throughput SNP arrays extend the eSet class. This chapter highlights the advantages of adopting and extending Biobase class definitions through a working example of one implementation of classes for the analysis of high-throughput SNP arrays.
Karampetsou, Evangelia; Morrogh, Deborah; Chitty, Lyn
2014-01-01
The advantage of microarray (array) over conventional karyotype for the diagnosis of fetal pathogenic chromosomal anomalies has prompted the use of microarrays in prenatal diagnostics. In this review we compare the performance of different array platforms (BAC, oligonucleotide CGH, SNP) and designs (targeted, whole genome, whole genome and targeted, custom) and discuss their advantages and disadvantages in relation to prenatal testing. We also discuss the factors to consider when implementing a microarray testing service for the diagnosis of fetal chromosomal aberrations. PMID:26237396
Bungartz, Annemarie; Klaus, Marius; Mathew, Boby; Léon, Jens; Naz, Ali Ahmad
2016-03-01
The aim of the present study was to develop a new cost-effective PCR-based CAPS marker set using the advantages of high-throughput SNP genotyping. Initially, an SNP survey was made using 20 diverse barley genotypes via 9k iSelect array genotyping that resulted in 6334 polymorphic SNP markers. Principal component analysis using this marker data showed fine differentiation of the diverse barley gene pool. To this end, we developed 200 SNP-derived CAPS markers distributed across the genome covering around 991 cM with an average marker density of 5.09 cM. Further, we genotyped 68 CAPS markers in an F2 population (Cheri×ICB181160) segregating for seed color variation in barley. Genetic mapping of seed color revealed putative linkage of a single nuclear gene on chromosome 1H. These findings showed the proof of concept for the development and utility of a newer cost-effective genomic tool kit to analyze broader genetic resources of barley worldwide. Copyright © 2016 Elsevier Inc. All rights reserved.
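A SNP can be converted into a CAPS marker when one allele creates a restriction enzyme recognition site that the other allele lacks, so that digestion of the PCR product distinguishes the genotypes. The following is a minimal sketch of that check, assuming known flanking sequence and a small illustrative table of three well-known palindromic recognition sites; the function name and the enzyme selection are assumptions, not the authors' marker design procedure.

```python
# Illustrative subset of recognition sites; all three are palindromic,
# so scanning one strand is sufficient for them.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "BamHI": "GGATCC",
}


def caps_enzymes(left_flank, alleles, right_flank, sites=RECOGNITION_SITES):
    """Return enzymes whose site spans the SNP in exactly one of the two alleles.

    left_flank / right_flank: sequence immediately 5' and 3' of the SNP.
    alleles: tuple of the two single-base alleles, e.g. ("A", "G").
    """
    snp_pos = len(left_flank)
    hits = []
    for enzyme, site in sites.items():
        present = []
        for allele in alleles:
            seq = (left_flank + allele + right_flank).upper()
            # Only windows that include the SNP position are relevant.
            first_start = max(0, snp_pos - len(site) + 1)
            present.append(
                any(
                    seq[start:start + len(site)] == site
                    for start in range(first_start, snp_pos + 1)
                )
            )
        if present[0] != present[1]:
            hits.append(enzyme)
    return hits
```

For example, `caps_enzymes("ACGTGA", ("A", "G"), "TTCGGA")` reports EcoRI, because only the A allele completes a GAATTC site across the SNP.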
A high-density intraspecific SNP linkage map of pigeonpea (Cajanas cajan L. Millsp.)
Mandal, Paritra; Bhutani, Shefali; Dutta, Sutapa; Kumawat, Giriraj; Singh, Bikram Pratap; Chaudhary, A. K.; Yadav, Rekha; Gaikwad, K.; Sevanthi, Amitha Mithra; Datta, Subhojit; Raje, Ranjeet S.; Sharma, Tilak R.; Singh, Nagendra Kumar
2017-01-01
Pigeonpea (Cajanus cajan (L.) Millsp.) is a major food legume cultivated in semi-arid tropical regions including the Indian subcontinent, Africa, and Southeast Asia. It is an important source of protein, minerals, and vitamins for nearly 20% of the world population. Due to high carbon sequestration and drought tolerance, pigeonpea is an important crop for the development of climate resilient agriculture and nutritional security. However, pigeonpea productivity has remained low for decades because of limited genetic and genomic resources, and sparse utilization of landraces and wild pigeonpea germplasm. Here, we present a dense intraspecific linkage map of pigeonpea comprising 932 markers that span a total adjusted map length of 1,411.83 cM. The consensus map is based on three different linkage maps that incorporate a large number of single nucleotide polymorphism (SNP) markers derived from next generation sequencing data, using Illumina GoldenGate bead arrays, and genotyping with restriction site associated DNA (RAD) sequencing. The genotyping-by-sequencing enhanced the marker density but was met with limited success due to lack of common markers across the genotypes of mapping population. The integrated map has 547 bead-array SNP, 319 RAD-SNP, and 65 simple sequence repeat (SSR) marker loci. We also show here correspondence between our linkage map and published genome pseudomolecules of pigeonpea. The availability of a high-density linkage map will help improve the anchoring of the pigeonpea genome to its chromosomes and the mapping of genes and quantitative trait loci associated with useful agronomic traits. PMID:28654689
2009-01-01
Background Array genomic hybridization is being used clinically to detect pathogenic copy number variants in children with intellectual disability and other birth defects. However, there is no agreement regarding the kind of array, the distribution of probes across the genome, or the resolution that is most appropriate for clinical use. Results We performed 500 K Affymetrix GeneChip® array genomic hybridization in 100 idiopathic intellectual disability trios, each comprised of a child with intellectual disability of unknown cause and both unaffected parents. We found pathogenic genomic imbalance in 16 of these 100 individuals with idiopathic intellectual disability. In comparison, we had found pathogenic genomic imbalance in 11 of 100 children with idiopathic intellectual disability in a previous cohort who had been studied by 100 K GeneChip® array genomic hybridization. Among 54 intellectual disability trios selected from the previous cohort who were re-tested with 500 K GeneChip® array genomic hybridization, we identified all 10 previously-detected pathogenic genomic alterations and at least one additional pathogenic copy number variant that had not been detected with 100 K GeneChip® array genomic hybridization. Many benign copy number variants, including one that was de novo, were also detected with 500 K array genomic hybridization, but it was possible to distinguish the benign and pathogenic copy number variants with confidence in all but 3 (1.9%) of the 154 intellectual disability trios studied. Conclusion Affymetrix GeneChip® 500 K array genomic hybridization detected pathogenic genomic imbalance in 10 of 10 patients with idiopathic developmental disability in whom 100 K GeneChip® array genomic hybridization had found genomic imbalance, 1 of 44 patients in whom 100 K GeneChip® array genomic hybridization had found no abnormality, and 16 of 100 patients who had not previously been tested. 
Effective clinical interpretation of these studies requires considerable skill and experience. PMID:19917086
Kuhn, Alexandre; Ong, Yao Min; Cheng, Ching-Yu; Wong, Tien Yin; Quake, Stephen R; Burkholder, William F
2014-06-03
Insertions of the human-specific subfamily of LINE-1 (L1) retrotransposon are highly polymorphic across individuals and can critically influence the human transcriptome. We hypothesized that L1 insertions could represent genetic variants determining important human phenotypic traits, and performed an integrated analysis of L1 elements and single nucleotide polymorphisms (SNPs) in several human populations. We found that a large fraction of L1s were in high linkage disequilibrium with their surrounding genomic regions and that they were well tagged by SNPs. However, L1 variants were only partially captured by SNPs on standard SNP arrays, so that their potential phenotypic impact would be frequently missed by SNP array-based genome-wide association studies. We next identified potential phenotypic effects of L1s by looking for signatures of natural selection linked to L1 insertions; significant extended haplotype homozygosity was detected around several L1 insertions. This finding suggests that some of these L1 insertions may have been the target of recent positive selection.
Coverage and efficiency in current SNP chips
Ha, Ngoc-Thuy; Freytag, Saskia; Bickeboeller, Heike
2014-01-01
To answer the question as to which commercial high-density SNP chip covers most of the human genome given a fixed budget, we compared the performance of 12 chips of different sizes released by Affymetrix and Illumina for the European, Asian, and African populations. These include Affymetrix' relatively new population-optimized arrays, whose SNP sets are each tailored toward a specific ethnicity. Our evaluation of the chips included the use of two measures, efficiency and cost–benefit ratio, which we developed as supplements to genetic coverage. Unlike coverage, these measures factor in the price of a chip or its substitute size (number of SNPs on chip), allowing comparisons to be drawn between differently priced chips. In this fashion, we identified the Affymetrix population-optimized arrays as offering the most cost-effective coverage for the Asian and African populations. For the European population, we established the Illumina Human Omni 2.5-8 as the preferred choice. Interestingly, the Affymetrix chip tailored toward an Eastern Asian subpopulation performed well for all three populations investigated. However, our coverage estimates calculated for all chips proved much lower than those advertised by the producers. All our analyses were based on the 1000 Genomes Project as reference population. PMID:24448550
SNPchiMp: a database to disentangle the SNPchip jungle in bovine livestock.
Nicolazzi, Ezequiel Luis; Picciolini, Matteo; Strozzi, Francesco; Schnabel, Robert David; Lawley, Cindy; Pirani, Ali; Brew, Fiona; Stella, Alessandra
2014-02-11
Currently, six commercial whole-genome SNP chips are available for cattle genotyping, produced by two different genotyping platforms. Technical issues need to be addressed to combine data that originate from the different platforms, or from different versions of the same array generated by the manufacturer. For example: i) genome coordinates for SNPs may refer to different genome assemblies; ii) reference genome sequences are updated over time, changing the positions of, or even removing, sequences which contain SNPs; iii) not all commercial SNP IDs are searchable within public databases; iv) SNPs can be coded using different formats and referencing different strands (e.g. A/B or A/C/T/G alleles, referencing forward/reverse, top/bottom or plus/minus strand); v) due to new information being discovered, higher density chips do not necessarily include all the SNPs present in the lower density chips; and vi) SNP IDs may not be consistent across chips and platforms. Most researchers and breed associations manage SNP data in real time and thus require tools to standardise data in a user-friendly manner. Here we present SNPchiMp, a MySQL database linked to an open access web-based interface. Features of this interface include, but are not limited to, the following functions: 1) referencing the SNP mapping information to the latest genome assembly, 2) extraction of information contained in dbSNP for SNPs present in all commercially available bovine chips, and 3) identification of SNPs in common between two or more bovine chips (e.g. for SNP imputation from lower to higher density). In addition, SNPchiMp can retrieve this information on subsets of SNPs, accessing such data either via physical position on a supported assembly, or by a list of SNP IDs, rs or ss identifiers. This tool combines many different sources of information that otherwise are time-consuming to obtain and difficult to integrate. SNPchiMp not only provides the information in a user-friendly format, but also enables researchers to perform a large number of operations with a few clicks of the mouse. This significantly reduces the time needed to execute the large number of operations required to manage SNP data.
Increased genomic prediction accuracy in wheat breeding using a large Australian panel.
Norman, Adam; Taylor, Julian; Tanaka, Emi; Telfer, Paul; Edwards, James; Martinant, Jean-Pierre; Kuchel, Haydn
2017-12-01
Genomic prediction accuracy within a large panel was found to be substantially higher than that previously observed in smaller populations, and also higher than QTL-based prediction. In recent years, genomic selection for wheat breeding has been widely studied, but this has typically been restricted to population sizes under 1000 individuals. To assess its efficacy in germplasm representative of commercial breeding programmes, we used a panel of 10,375 Australian wheat breeding lines to investigate the accuracy of genomic prediction for grain yield, physical grain quality and other physiological traits. To achieve this, the complete panel was phenotyped in a dedicated field trial and genotyped using a custom Axiom™ Affymetrix SNP array. A high-quality consensus map was also constructed, allowing the linkage disequilibrium present in the germplasm to be investigated. Using the complete SNP array, genomic prediction accuracies were found to be substantially higher than those previously observed in smaller populations and also more accurate compared to prediction approaches using a finite number of selected quantitative trait loci. Multi-trait genetic correlations were also assessed at an additive and residual genetic level, identifying a negative genetic correlation between grain yield and protein as well as a positive genetic correlation between grain size and test weight.
High throughput SNP discovery and genotyping in hexaploid wheat.
Rimbert, Hélène; Darrier, Benoît; Navarro, Julien; Kitt, Jonathan; Choulet, Frédéric; Leveugle, Magalie; Duarte, Jorge; Rivière, Nathalie; Eversole, Kellye; Le Gouis, Jacques; Davassi, Alessandro; Balfourier, François; Le Paslier, Marie-Christine; Berard, Aurélie; Brunel, Dominique; Feuillet, Catherine; Poncet, Charles; Sourdille, Pierre; Paux, Etienne
2018-01-01
Because of their abundance and their amenability to high-throughput genotyping techniques, Single Nucleotide Polymorphisms (SNPs) are powerful tools for efficient genetics and genomics studies, including characterization of genetic resources, genome-wide association studies and genomic selection. In wheat, most of the previous SNP discovery initiatives targeted the coding fraction, leaving almost 98% of the wheat genome largely unexploited. Here we report on the use of whole-genome resequencing data from eight wheat lines to mine for SNPs in the genic, the repetitive and non-repetitive intergenic fractions of the wheat genome. Eventually, we identified 3.3 million SNPs, 49% being located on the B-genome, 41% on the A-genome and 10% on the D-genome. We also describe the development of the TaBW280K high-throughput genotyping array containing 280,226 SNPs. Performance of this chip was examined by genotyping a set of 96 wheat accessions representing the worldwide diversity. Sixty-nine percent of the SNPs can be efficiently scored, half of them showing a diploid-like clustering. The TaBW280K was proven to be a very efficient tool for diversity analyses, as well as for breeding, as it can discriminate between closely related elite varieties. Finally, the TaBW280K array was used to genotype a population derived from a cross between Chinese Spring and Renan, leading to the construction of a dense genetic map comprising 83,721 markers. The results described here will provide the wheat community with powerful tools for both basic and applied research.
Roorkiwal, Manish; Jain, Ankit; Kale, Sandip M; Doddamani, Dadakhalandar; Chitikineni, Annapurna; Thudi, Mahendar; Varshney, Rajeev K
2018-04-01
To accelerate genomics research and molecular breeding applications in chickpea, a high-throughput SNP genotyping platform, the 'Axiom® CicerSNP Array', has been designed, developed and validated. Screening of whole-genome resequencing data from 429 chickpea lines identified 4.9 million SNPs, from which a subset of 70 463 high-quality nonredundant SNPs was selected using different stringent filter criteria. This was further narrowed down to 61 174 SNPs based on p-convert score ≥0.3, of which 50 590 SNPs could be tiled on the array. Among these tiled SNPs, a total of 11 245 SNPs (22.23%) were from the coding regions of 3673 different genes. The developed Axiom® CicerSNP Array was used for genotyping two recombinant inbred line populations, namely ICCRIL03 (ICC 4958 × ICC 1882) and ICCRIL04 (ICC 283 × ICC 8261). Genotyping data reflected high success and polymorphic rates, with 15 140 (29.93%; ICCRIL03) and 20 018 (39.57%; ICCRIL04) polymorphic SNPs. High-density genetic maps comprising 13 679 SNPs spanning 1033.67 cM and 7769 SNPs spanning 1076.35 cM were developed for the ICCRIL03 and ICCRIL04 populations, respectively. QTL analysis using multilocation, multiseason phenotyping data on these RILs identified 70 (ICCRIL03) and 120 (ICCRIL04) main-effect QTLs on the genetic maps. The higher precision and potential of this array are expected to advance chickpea genetics and breeding applications. © 2017 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Application of Nexus copy number software for CNV detection and analysis.
Darvishi, Katayoon
2010-04-01
Among human structural genomic variation, copy number variants (CNVs) are the most frequently known component, comprised of gains/losses of DNA segments that are generally 1 kb in length or longer. Array-based comparative genomic hybridization (aCGH) has emerged as a powerful tool for detecting genomic copy number variants (CNVs). With the rapid increase in the density of array technology and with the adaptation of new high-throughput technology, a reliable and computationally scalable method for accurate mapping of recurring DNA copy number aberrations has become a main focus in research. Here we introduce Nexus Copy Number software, a platform-independent tool, to analyze the output files of all types of commercial and custom-made comparative genomic hybridization (CGH) and single-nucleotide polymorphism (SNP) arrays, such as those manufactured by Affymetrix, Agilent Technologies, Illumina, and Roche NimbleGen. It also supports data generated by various array image-analysis software tools such as GenePix, ImaGene, and BlueFuse. (c) 2010 by John Wiley & Sons, Inc.
Li, Gang; Hillier, LaDeana W; Grahn, Robert A; Zimin, Aleksey V; David, Victor A; Menotti-Raymond, Marilyn; Middleton, Rondo; Hannah, Steven; Hendrickson, Sher; Makunin, Alex; O'Brien, Stephen J; Minx, Pat; Wilson, Richard K; Lyons, Leslie A; Warren, Wesley C; Murphy, William J
2016-06-01
High-resolution genetic and physical maps are invaluable tools for building accurate genome assemblies, and interpreting results of genome-wide association studies (GWAS). Previous genetic and physical maps anchored good quality draft assemblies of the domestic cat genome, enabling the discovery of numerous genes underlying hereditary disease and phenotypes of interest to the biomedical science and breeding communities. However, these maps lacked sufficient marker density to order thousands of shorter scaffolds in earlier assemblies, which instead relied heavily on comparative mapping with related species. A high-resolution map would aid in validating and ordering chromosome scaffolds from existing and new genome assemblies. Here, we describe a high-resolution genetic linkage map of the domestic cat genome based on genotyping 453 domestic cats from several multi-generational pedigrees on the Illumina 63K SNP array. The final maps include 58,055 SNP markers placed relative to 6637 markers with unique positions, distributed across all autosomes and the X chromosome. Our final sex-averaged maps span a total autosomal length of 4464 cM, the longest described linkage map for any mammal, confirming length estimates from a previous microsatellite-based map. The linkage map was used to order and orient the scaffolds from a substantially more contiguous domestic cat genome assembly (Felis catus v8.0), which incorporated ∼20 × coverage of Illumina fragment reads. The new genome assembly shows substantial improvements in contiguity, with a nearly fourfold increase in N50 scaffold size to 18 Mb. We use this map to report probable structural errors in previous maps and assemblies, and to describe features of the recombination landscape, including a massive (∼50 Mb) recombination desert (of virtually zero recombination) on the X chromosome that parallels a similar desert on the porcine X chromosome in both size and physical location. Copyright © 2016 Li et al.
Hao, Chenyang; Wang, Yuquan; Chao, Shiaoman; Li, Tian; Liu, Hongxia; Wang, Lanfen; Zhang, Xueyong
2017-01-30
A Chinese wheat mini core collection was genotyped using the wheat 9 K iSelect SNP array. In total, 2420 and 2396 polymorphic SNPs were detected on the A and the B genome chromosomes, respectively, which formed 878 haplotype blocks. There were more blocks in the B genome, but the average block size was significantly (P < 0.05) smaller than that in the A genome. Intense selection (domestication and breeding) had a stronger effect on the A than on the B genome chromosomes. Based on the genetic pedigrees, many blocks can be traced back to a well-known Strampelli cross, which was made one century ago. Furthermore, polyploidization of wheat (both tetraploidization and hexaploidization) induced revolutionary changes in both the A and the B genomes, with a greater increase of gene diversity compared to their diploid ancestors. Modern breeding has dramatically increased diversity in the gene coding regions, though obvious blocks were formed on most of the chromosomes in both tetraploid and hexaploid wheats. Tag-SNP markers identified in this study can be used for marker-assisted selection using haplotype blocks as a wheat breeding strategy. This strategy can also be employed to facilitate genome selection in other self-pollinating crop species.
Koning-Boucoiran, Carole F S; Esselink, G Danny; Vukosavljev, Mirjana; van 't Westende, Wendy P C; Gitonga, Virginia W; Krens, Frans A; Voorrips, Roeland E; van de Weg, W Eric; Schulz, Dietmar; Debener, Thomas; Maliepaard, Chris; Arens, Paul; Smulders, Marinus J M
2015-01-01
In order to develop a versatile and large SNP array for rose, we set out to mine ESTs from diverse sets of rose germplasm. For this, RNA-Seq libraries containing about 700 million reads were generated from tetraploid cut and garden roses using Illumina paired-end sequencing, and from diploid Rosa multiflora using 454 sequencing. Separate de novo assemblies were performed in order to identify single nucleotide polymorphisms (SNPs) within and between rose varieties. SNPs among tetraploid roses were selected for constructing a genotyping array that can be employed for genetic mapping and marker-trait association discovery in breeding programs based on tetraploid germplasm, both from cut roses and from garden roses. In total, 68,893 SNPs were included on the WagRhSNP Axiom array. Next, an orthology-guided assembly was performed for the construction of a non-redundant rose transcriptome database. A total of 21,740 transcripts had significant hits with orthologous genes in the strawberry (Fragaria vesca L.) genome. Of these, 13,390 appeared to contain the full-length coding regions. This newly established transcriptome resource adds considerably to the currently available sequence resources for the Rosaceae family in general and the genus Rosa in particular.
Genotype imputation in the domestic dog
Meurs, K. M.
2016-01-01
Application of imputation methods to accurately predict a dense array of SNP genotypes in the dog could provide an important supplement to current analyses of array-based genotyping data. Here, we developed a reference panel of 4,885,283 SNPs in 83 dogs across 15 breeds using whole genome sequencing. We used this panel to predict the genotypes of 268 dogs across three breeds with 84,193 SNP array-derived genotypes as inputs. We then (1) performed breed clustering of the actual and imputed data; (2) evaluated several reference panel breed combinations to determine an optimal reference panel composition; and (3) compared the accuracy of two commonly used software algorithms (Beagle and IMPUTE2). Breed clustering was well preserved in the imputation process across eigenvalues representing 75 % of the variation in the imputed data. Using Beagle with a target panel from a single breed, genotype concordance was highest using a multi-breed reference panel (92.4 %) compared to a breed-specific reference panel (87.0 %) or a reference panel containing no breeds overlapping with the target panel (74.9 %). This finding was confirmed using target panels derived from two other breeds. Additionally, using the multi-breed reference panel, genotype concordance was slightly higher with IMPUTE2 (94.1 %) compared to Beagle; Pearson correlation coefficients were slightly higher for both software packages (0.946 for Beagle, 0.961 for IMPUTE2). Our findings demonstrate that genotype imputation from SNP array-derived data to whole genome-level genotypes is both feasible and accurate in the dog with appropriate breed overlap between the target and reference panels. PMID:27129452
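The two accuracy measures reported in this abstract, genotype concordance and the Pearson correlation between actual and imputed genotypes, can be illustrated with a minimal Python sketch. It assumes genotypes are coded as alternate-allele counts (0, 1, 2); the function names and the toy data are hypothetical and are not taken from the study, Beagle, or IMPUTE2.

```python
from math import sqrt


def concordance(truth, imputed):
    """Fraction of genotype calls that match exactly (hypothetical helper)."""
    if len(truth) != len(imputed) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for t, i in zip(truth, imputed) if t == i)
    return matches / len(truth)


def pearson_r(x, y):
    """Pearson correlation between two equal-length dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy data (illustrative only): genotypes coded as 0/1/2 alternate-allele counts.
truth = [0, 1, 2, 1, 0, 2, 1, 1]
imputed = [0, 1, 2, 2, 0, 2, 1, 0]

print(f"concordance = {concordance(truth, imputed):.3f}")
print(f"pearson r   = {pearson_r(truth, imputed):.3f}")
```

Concordance counts only exact matches, whereas the correlation also rewards calls that are off by a single allele, which is one reason the two measures need not agree numerically.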
Gao, Guangtu; Nome, Torfinn; Pearse, Devon E; Moen, Thomas; Naish, Kerry A; Thorgaard, Gary H; Lien, Sigbjørn; Palti, Yniv
2018-01-01
Single-nucleotide polymorphisms (SNPs) are highly abundant markers, which are broadly distributed in animal genomes. For rainbow trout (Oncorhynchus mykiss), SNP discovery has been previously done through sequencing of restriction-site associated DNA (RAD) libraries, reduced representation libraries (RRL) and RNA sequencing. Recently we have performed high coverage whole genome resequencing with 61 unrelated samples, representing a wide range of rainbow trout and steelhead populations, with 49 new samples added to 12 aquaculture samples from AquaGen (Norway) that we previously used for SNP discovery. Of the 49 new samples, 11 were doubled haploid lines from Washington State University (WSU) and 38 represented wild and hatchery populations from a wide range of geographic distribution and with divergent migratory phenotypes. We then mapped the sequences to the new rainbow trout reference genome assembly (GCA_002163495.1), which is based on the Swanson YY doubled haploid line. Variant calling was conducted with FreeBayes and SAMtools mpileup, followed by filtering of SNPs based on quality score, sequence complexity, read depth on the locus, and number of genotyped samples. Results from the two variant calling programs were compared, and genotypes of the doubled haploid samples were used for detecting and filtering putative paralogous sequence variants (PSVs) and multi-sequence variants (MSVs). Overall, 30,302,087 SNPs were identified on the 29 chromosomes of the rainbow trout genome and 1,139,018 on unplaced scaffolds, with 4,042,723 SNPs having high minor allele frequency (MAF > 0.25). The average SNP density on the chromosomes was one SNP per 64 bp, or 15.6 SNPs per 1 kb. Results from the phylogenetic analysis that we conducted indicate that the SNP markers contain enough population-specific polymorphisms for recovering population relationships despite the small sample size used. Intra-population polymorphism assessment revealed a high level of polymorphism and heterozygosity within each population. We also provide functional annotation based on the genome position of each SNP and evaluate the use of clonal lines for filtering of PSVs and MSVs. These SNPs form a new database, which provides an important resource for a new high density SNP array design and for other SNP genotyping platforms used for genetic and genomics studies of this iconic salmonid fish species.
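As a small worked illustration of two quantities quoted in this abstract, the following Python sketch computes a minor allele frequency (MAF) from diploid genotype counts and converts "one SNP per 64 bp" into SNPs per kilobase. The helper names and the genotype counts are hypothetical; the only values taken from the abstract are the 64 bp spacing, the MAF > 0.25 threshold, and the sample size of 61.

```python
def minor_allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """MAF at a biallelic SNP from diploid genotype counts (hypothetical helper)."""
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if total_alleles == 0:
        raise ValueError("no genotyped samples")
    alt_freq = (n_het + 2 * n_hom_alt) / total_alleles
    # The minor allele is whichever of the two alleles is rarer.
    return min(alt_freq, 1 - alt_freq)


def snps_per_kb(bp_per_snp):
    """Convert average spacing (bp per SNP) into density (SNPs per 1 kb)."""
    return 1000 / bp_per_snp


# Illustrative genotype counts for 61 samples (not taken from the study).
maf = minor_allele_frequency(n_hom_ref=30, n_het=20, n_hom_alt=11)
print(f"MAF = {maf:.3f}, passes MAF > 0.25 filter: {maf > 0.25}")

# One SNP per 64 bp, as reported in the abstract.
print(f"density = {snps_per_kb(64):.1f} SNPs per kb")
```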
A Genome-Wide Breast Cancer Scan in African Americans
2010-06-01
SNPs from the African American breast cancer scan to COGs, a European collaborative study which has designed a SNP array that will be genotyped... Award Number: W81XWH-08-1-0383. Title: A Genome-wide Breast Cancer Scan in African Americans. Principal Investigator: Christopher A...
USDA-ARS's Scientific Manuscript database
Bacterial cold water disease (BCWD) causes significant mortality and economic losses in salmonid aquaculture. In previous studies, we identified moderate-large effect QTL for BCWD resistance in rainbow trout (Oncorhynchus mykiss). However, the recent availability of a 57K SNP array and a genome phys...
The genome-wide structure of two economically important indigenous Sicilian cattle breeds.
Mastrangelo, S; Saura, M; Tolone, M; Salces-Ortiz, J; Di Gerlando, R; Bertolini, F; Fontanesi, L; Sardina, M T; Serrano, M; Portolano, B
2014-11-01
Genomic technologies, such as high-throughput genotyping based on SNP arrays, provided background information concerning genome structure in domestic animals. The aim of this work was to investigate the genetic structure, the genome-wide estimates of inbreeding, coancestry, effective population size (Ne), and the patterns of linkage disequilibrium (LD) in 2 economically important Sicilian local cattle breeds, Cinisara (CIN) and Modicana (MOD), using the Illumina Bovine SNP50K v2 BeadChip. To understand the genetic relationship and to place both Sicilian breeds in a global context, genotypes from 134 other domesticated bovid breeds were used. Principal component analysis showed that the Sicilian cattle breeds were closer to individuals of Bos taurus taurus from Eurasia and formed nonoverlapping clusters with other breeds. Between the Sicilian cattle breeds, MOD was the most differentiated, whereas the animals belonging to the CIN breed showed a lower value of assignment, the presence of substructure, and genetic links with the MOD breed. The average molecular inbreeding and coancestry coefficients were moderately high, and the current estimates of Ne were low in both breeds. These values indicated a low genetic variability. Considering levels of LD between adjacent markers, the average r² in the MOD breed was comparable to those reported for other cattle breeds, whereas CIN showed a lower value. Therefore, these results support the need for denser SNP arrays for high-power association mapping and genomic selection efficiency, particularly for the CIN cattle breed. Controlling molecular inbreeding and coancestry would restrict inbreeding depression, the probability of losing beneficial rare alleles, and therefore the risk of extinction. The results generated from this study have important implications for the development of conservation and/or selection breeding programs in these 2 local cattle breeds.
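The r² statistic used above to summarise LD between adjacent markers has a standard definition, r² = D² / (pA(1 − pA) pB(1 − pB)) with D = pAB − pA·pB. The Python sketch below applies it under the simplifying assumption that phased haplotypes are available; array data are unphased, so real analyses estimate haplotype frequencies or use genotype correlations instead. The function name and toy haplotypes are hypothetical and do not reproduce the study's pipeline.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic markers.

    haplotypes: list of (allele_at_marker_A, allele_at_marker_B) tuples,
    with alleles coded 0/1. Assumes phase is known (an assumption of this sketch).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n           # frequency of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n           # frequency of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                              # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a marker is monomorphic")
    return d * d / denom


# Toy examples: complete LD versus linkage equilibrium.
complete_ld = [(1, 1), (1, 1), (0, 0), (0, 0)]
equilibrium = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r_squared(complete_ld), ld_r_squared(equilibrium))
```

Averaging this value over all adjacent marker pairs gives the kind of summary figure the abstract compares between breeds.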
Cifola, Ingrid; Bianchi, Cristina; Mangano, Eleonora; Bombelli, Silvia; Frascati, Fabio; Fasoli, Ester; Ferrero, Stefano; Di Stefano, Vitalba; Zipeto, Maria A; Magni, Fulvio; Signorini, Stefano; Battaglia, Cristina; Perego, Roberto A
2011-06-13
Clear cell renal cell carcinoma (ccRCC) is characterized by recurrent copy number alterations (CNAs) and loss of heterozygosity (LOH), which may have potential diagnostic and prognostic applications. Here, we explored whether ccRCC primary cultures, established from surgical tumor specimens, maintain the DNA profile of parental tumor tissues, allowing a more confident discrimination of CNAs and LOH with respect to the original tissues. We established a collection of 9 phenotypically well-characterized ccRCC primary cell cultures. Using the Affymetrix SNP array technology, we performed genome-wide copy number (CN) profiling of both cultures and corresponding tumor tissues. Global concordance for each culture/tissue pair was assayed by evaluating the correlations between whole-genome CN profiles and SNP allelic calls. CN analysis was performed using two software packages, CNAG v3.0 and Partek, and comparing results returned by two different algorithms (Hidden Markov Model and Genomic Segmentation). A very good overlap between the CNAs of each culture and corresponding tissue was observed. The finding, reinforced by high whole-genome CN correlations and SNP call concordances, provided evidence that each culture was derived from its corresponding tissue and maintained the genomic alterations of the parental tumor. In addition, the primary culture DNA profile remained stable for at least 3 weeks, up to the third passage. These cultures showed a greater cell homogeneity and enrichment in tumor component than the original tissues, thus enabling a better discrimination of CNAs and LOH. Especially for hemizygous deletions, primary cultures presented more evident CN losses, typically accompanied by LOH; in contrast, in the original tissues the intensity of these deletions was weakened by normal cell contamination and LOH calls were missed. ccRCC primary cultures are a reliable in vitro model, reproducing original tumor genetics and phenotype well, and potentially useful for future functional approaches aimed at studying genes or pathways involved in ccRCC etiopathogenesis and at identifying novel clinical markers or therapeutic targets. Moreover, SNP array technology proved to be a powerful tool to better define the cell composition and homogeneity of RCC primary cultures. © 2011 Cifola et al; licensee BioMed Central Ltd.
Vallejo, Roger L.; Liu, Sixin; Gao, Guangtu; Fragomeni, Breno O.; Hernandez, Alvaro G.; Leeds, Timothy D.; Parsons, James E.; Martin, Kyle E.; Evenhuis, Jason P.; Welch, Timothy J.; Wiens, Gregory D.; Palti, Yniv
2017-01-01
Bacterial cold water disease (BCWD) causes significant mortality and economic losses in salmonid aquaculture. In previous studies, we identified moderate-large effect quantitative trait loci (QTL) for BCWD resistance in rainbow trout (Oncorhynchus mykiss). However, the recent availability of a 57 K SNP array and a reference genome assembly have enabled us to conduct genome-wide association studies (GWAS) that overcome several experimental limitations from our previous work. In the current study, we conducted GWAS for BCWD resistance in two rainbow trout breeding populations using two genotyping platforms, the 57 K Affymetrix SNP array and restriction-associated DNA (RAD) sequencing. Overall, we identified 14 moderate-large effect QTL that explained up to 60.8% of the genetic variance in one of the two populations and 27.7% in the other. Four of these QTL were found in both populations explaining a substantial proportion of the variance, although major differences were also detected between the two populations. Our results confirm that BCWD resistance is controlled by the oligogenic inheritance of few moderate-large effect loci and a large-unknown number of loci each having a small effect on BCWD resistance. We detected differences in QTL number and genome location between two GWAS models (weighted single-step GBLUP and Bayes B), which highlights the utility of using different models to uncover QTL. The RAD-SNPs detected a greater number of QTL than the 57 K SNP array in one population, suggesting that the RAD-SNPs may uncover polymorphisms that are more unique and informative for the specific population in which they were discovered. PMID:29109734
Schweighofer, Carmen D.; Coombes, Kevin R.; Majewski, Tadeusz; Barron, Lynn L.; Lerner, Susan; Sargent, Rachel L.; O'Brien, Susan; Ferrajoli, Alessandra; Wierda, William G.; Czerniak, Bogdan A.; Medeiros, L. Jeffrey; Keating, Michael J.; Abruzzo, Lynne V.
2013-01-01
Genomic abnormalities, such as deletions in 11q22 or 17p13, are associated with poorer prognosis in patients with chronic lymphocytic leukemia (CLL). We hypothesized that unknown regions of copy number variation (CNV) affect clinical outcome and can be detected by array-based single-nucleotide polymorphism (SNP) genotyping. We compared SNP genotypes from 168 untreated patients with CLL with genotypes from 73 white HapMap controls. We identified 322 regions of recurrent CNV, 82 of which occurred significantly more often in CLL than in HapMap (CLL-specific CNV), including regions typically aberrant in CLL: deletions in 6q21, 11q22, 13q14, and 17p13 and trisomy 12. In univariate analyses, 35 of total and 11 of CLL-specific CNVs were associated with unfavorable time-to-event outcomes, including gains or losses in chromosomes 2p, 4p, 4q, 6p, 6q, 7q, 11p, 11q, and 17p. In multivariate analyses, six CNVs (ie, CLL-specific variations in 11p15.1-15.4 or 6q27) predicted time-to-treatment or overall survival independently of established markers of prognosis. Moreover, genotypic complexity (ie, the number of independent CNVs per patient) significantly predicted prognosis, with a median time-to-treatment of 64 months versus 23 months in patients with zero to one versus two or more CNVs, respectively (P = 3.3 × 10−8). In summary, a comparison of SNP genotypes from patients with CLL with HapMap controls allowed us to identify known and unknown recurrent CNVs and to determine regions and rates of CNV that predict poorer prognosis in patients with CLL. PMID:23273604
Brøndum, R F; Su, G; Janss, L; Sahana, G; Guldbrandtsen, B; Boichard, D; Lund, M S
2015-06-01
This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected with the aim of augmenting the custom low-density Illumina BovineLD SNP chip (San Diego, CA) used in the Nordic countries. The single-marker analysis was done breed-wise on all 16 index traits included in the breeding goals for Nordic Holstein, Danish Jersey, and Nordic Red cattle plus the total merit index itself. Depending on the trait's economic weight, 15, 10, or 5 quantitative trait loci (QTL) were selected per trait per breed and 3 to 5 markers were selected to tag each QTL. After removing duplicate markers (same marker selected for more than one trait or breed) and filtering for high pairwise linkage disequilibrium and assaying performance on the array, a total of 1,623 QTL markers were selected for inclusion on the custom chip. Genomic prediction analyses were performed for Nordic and French Holstein and Nordic Red animals using either a genomic BLUP or a Bayesian variable selection model. When using the genomic BLUP model including the QTL markers in the analysis, reliability was increased by up to 4 percentage points for production traits in Nordic Holstein animals, up to 3 percentage points for Nordic Reds, and up to 5 percentage points for French Holstein. Smaller gains of up to 1 percentage point were observed for mastitis, but only a 0.5 percentage point increase was seen for fertility. When using a Bayesian model, accuracies were generally higher with only 54k data compared with the genomic BLUP approach, but increases in reliability were relatively smaller when QTL markers were included.
Results from this study indicate that the reliability of genomic prediction can be increased by including markers significant in genome-wide association studies on whole genome sequence data alongside the 54k SNP set. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
A SNP resource for Douglas-fir: de novo transcriptome assembly and SNP detection and validation.
Howe, Glenn T; Yu, Jianbin; Knaus, Brian; Cronn, Richard; Kolpak, Scott; Dolan, Peter; Lorenz, W Walter; Dean, Jeffrey F D
2013-02-28
Douglas-fir (Pseudotsuga menziesii), one of the most economically and ecologically important tree species in the world, also has one of the largest tree breeding programs. Although the coastal and interior varieties of Douglas-fir (vars. menziesii and glauca) are native to North America, the coastal variety is also widely planted for timber production in Europe, New Zealand, Australia, and Chile. Our main goal was to develop a SNP resource large enough to facilitate genomic selection in Douglas-fir breeding programs. To accomplish this, we developed a 454-based reference transcriptome for coastal Douglas-fir, annotated and evaluated the quality of the reference, identified putative SNPs, and then validated a sample of those SNPs using the Illumina Infinium genotyping platform. We assembled a reference transcriptome consisting of 25,002 isogroups (unique gene models) and 102,623 singletons from 2.76 million 454 and Sanger cDNA sequences from coastal Douglas-fir. We identified 278,979 unique SNPs by mapping the 454 and Sanger sequences to the reference, and by mapping four datasets of Illumina cDNA sequences from multiple seed sources, genotypes, and tissues. The Illumina datasets represented coastal Douglas-fir (64.00 and 13.41 million reads), interior Douglas-fir (80.45 million reads), and a Yakima population similar to interior Douglas-fir (8.99 million reads). We assayed 8067 SNPs on 260 trees using an Illumina Infinium SNP genotyping array. Of these SNPs, 5847 (72.5%) were called successfully and were polymorphic. Based on our validation efficiency, our SNP database may contain as many as ~200,000 true SNPs, and as many as ~69,000 SNPs that could be genotyped at ~20,000 gene loci using an Infinium II array, more SNPs than are needed to use genomic selection in tree breeding programs.
Ultimately, these genomic resources will enhance Douglas-fir breeding and allow us to better understand landscape-scale patterns of genetic variation and potential responses to climate change.
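The "~200,000 true SNPs" figure in the abstract above is consistent with a simple proportional extrapolation from the validation sample. The sketch below reproduces that arithmetic using only numbers quoted in the abstract; treating the estimate as (candidate SNPs × validation rate) is an assumption made here for illustration, not a method the abstract spells out, and the variable names are hypothetical.

```python
# Figures quoted in the abstract above.
assayed_snps = 8067        # SNPs placed on the Infinium array
validated_snps = 5847      # called successfully and polymorphic
candidate_snps = 278_979   # unique putative SNPs in the database

# Fraction of assayed SNPs that validated (the abstract reports 72.5%).
validation_rate = validated_snps / assayed_snps

# Assumption: the validation sample is representative of the whole database,
# so the expected number of true SNPs scales by the same rate.
expected_true_snps = candidate_snps * validation_rate

print(f"validation rate: {validation_rate:.1%}")
print(f"expected true SNPs: {expected_true_snps:,.0f}")
```

The result lands slightly above 200,000, which matches the rounded figure reported in the abstract.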
Talseth-Palmer, Bente A; Holliday, Elizabeth G; Evans, Tiffany-Jane; McEvoy, Mark; Attia, John; Grice, Desma M; Masson, Amy L; Meldrum, Cliff; Spigelman, Allan; Scott, Rodney J
2013-03-26
Hereditary non-polyposis colorectal cancer (HNPCC)/Lynch syndrome (LS) is a cancer syndrome characterised by early-onset epithelial cancers, especially colorectal cancer (CRC) and endometrial cancer. The aim of the current study was to use SNP-array technology to identify genomic aberrations which could contribute to the increased risk of cancer in HNPCC/LS patients. Individuals diagnosed with HNPCC/LS (100) and healthy controls (384) were genotyped using the Illumina Human610-Quad SNP-arrays. Copy number variation (CNV) calling and association analyses were performed using Nexus software, with significant results validated using QuantiSNP. TaqMan Copy-Number assays were used for verification of CNVs showing significant association with HNPCC/LS identified by both software programs. We detected copy number (CN) gains associated with HNPCC/LS status on chromosome 7q11.21 (28% cases and 0% controls, Nexus; p = 3.60E-20 and QuantiSNP; p < 1.00E-16) and 16p11.2 (46% in cases, while a CN loss was observed in 23% of controls, Nexus; p = 4.93E-21 and QuantiSNP; p = 5.00E-06) via in silico analyses. TaqMan Copy-Number assay was used for validation of CNVs showing significant association with HNPCC/LS. In addition, CNV burden (total CNV length, average CNV length and number of observed CNV events) was significantly greater in cases compared to controls. A greater CNV burden was identified in HNPCC/LS cases compared to controls, supporting the notion of higher genomic instability in these patients. One intergenic locus on chromosome 7q11.21 is possibly associated with HNPCC/LS and deserves further investigation. The results from this study highlight the complexities of fluorescence-based CNV analyses. The inability of both CNV detection methods to reproducibly detect observed CNVs demonstrates the need for sequence data to be considered alongside intensity data to avoid false positive results.
2012-01-01
Background High-density genotyping arrays that measure hybridization of genomic DNA fragments to allele-specific oligonucleotide probes are widely used to genotype single nucleotide polymorphisms (SNPs) in genetic studies, including human genome-wide association studies. Hybridization intensities are converted to genotype calls by clustering algorithms that assign each sample to a genotype class at each SNP. Data for SNP probes that do not conform to the expected pattern of clustering are often discarded, contributing to ascertainment bias and resulting in lost information - as much as 50% in a recent genome-wide association study in dogs. Results We identified atypical patterns of hybridization intensities that were highly reproducible and demonstrated that these patterns represent genetic variants that were not accounted for in the design of the array platform. We characterized variable intensity oligonucleotide (VINO) probes that display such patterns and are found in all hybridization-based genotyping platforms, including those developed for human, dog, cattle, and mouse. When recognized and properly interpreted, VINOs recovered a substantial fraction of discarded probes and counteracted SNP ascertainment bias. We developed software (MouseDivGeno) that identifies VINOs and improves the accuracy of genotype calling. MouseDivGeno produced highly concordant genotype calls when compared with other methods but it uniquely identified more than 786000 VINOs in 351 mouse samples. We used whole-genome sequence from 14 mouse strains to confirm the presence of novel variants explaining 28000 VINOs in those strains. We also identified VINOs in human HapMap 3 samples, many of which were specific to an African population. Incorporating VINOs in phylogenetic analyses substantially improved the accuracy of a Mus species tree and local haplotype assignment in laboratory mouse strains. 
Conclusion The problems of ascertainment bias and missing information due to genotyping errors are widely recognized as limiting factors in genetic studies. We have conducted the first formal analysis of the effect of novel variants on genotyping arrays, and we have shown that these variants account for a large portion of miscalled and uncalled genotypes. Genetic studies will benefit from substantial improvements in the accuracy of their results by incorporating VINOs in their analyses. PMID:22260749
Developing 100K Affymetrix Axiom SNP Array for Polyploid Sugarcane
USDA-ARS's Scientific Manuscript database
Sugarcane genotyping or fingerprinting has long been a daunting task due to its high polyploidy level with large number of chromosomes. Single nucleotide polymorphisms (SNPs) are very abundant DNA sequence variations in the genomes. With the advance of next generation sequencing (NGS) technologies, ...
Population sequencing reveals breed and sub-species specific CNVs in cattle
USDA-ARS's Scientific Manuscript database
Individualized copy number variation (CNV) maps have highlighted the need for population surveys of cattle to detect rare and common variants. While SNP and comparative genomic hybridization (CGH) arrays have provided preliminary data, next-generation sequence (NGS) data analysis offers an increased...
Small cell ovarian carcinoma: genomic stability and responsiveness to therapeutics.
Gamwell, Lisa F; Gambaro, Karen; Merziotis, Maria; Crane, Colleen; Arcand, Suzanna L; Bourada, Valerie; Davis, Christopher; Squire, Jeremy A; Huntsman, David G; Tonin, Patricia N; Vanderhyden, Barbara C
2013-02-21
The biology of small cell ovarian carcinoma of the hypercalcemic type (SCCOHT), which is a rare and aggressive form of ovarian cancer, is poorly understood. Tumourigenicity, in vitro growth characteristics, genetic and genomic anomalies, and sensitivity to standard and novel chemotherapeutic treatments were investigated in the unique SCCOHT cell line, BIN-67, to provide further insight into the biology of this rare type of ovarian cancer. The tumourigenic potential of BIN-67 cells was determined and the tumours formed in a xenograft model were compared to human SCCOHT. DNA sequencing, spectral karyotyping and high density SNP array analysis were performed. The sensitivity of the BIN-67 cells to standard chemotherapeutic agents and to vesicular stomatitis virus (VSV) and the JX-594 vaccinia virus was tested. BIN-67 cells were capable of forming spheroids in hanging drop cultures. When xenografted into immunodeficient mice, BIN-67 cells developed into tumours that reflected the hypercalcemia and histology of human SCCOHT, notably intense expression of WT-1 and vimentin, and lack of expression of inhibin. Somatic mutations in TP53 and the most common activating mutations in KRAS and BRAF were not found in BIN-67 cells by DNA sequencing. Spectral karyotyping revealed a largely normal diploid karyotype (in greater than 95% of cells) with a visibly shorter chromosome 20 contig. High density SNP array analysis also revealed few genomic anomalies in BIN-67 cells, which included loss of heterozygosity of an estimated 16.7 Mb interval on chromosome 20. SNP array analyses of four SCCOHT samples also indicated a low frequency of genomic anomalies in the majority of cases. Although resistant to platinum chemotherapeutic drugs, BIN-67 cell viability in vitro was reduced by > 75% after infection with oncolytic viruses. These results show that SCCOHT differs from high-grade serous carcinomas by exhibiting few chromosomal anomalies and lacking TP53 mutations.
Although BIN-67 cells are resistant to standard chemotherapeutic agents, their sensitivity to oncolytic viruses suggests that their therapeutic use in SCCOHT should be considered.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kerns, Sarah L.; Departments of Pathology and Genetics, Albert Einstein College of Medicine, Bronx, New York; Stock, Richard
2013-01-01
Purpose: To identify single nucleotide polymorphisms (SNPs) associated with development of erectile dysfunction (ED) among prostate cancer patients treated with radiation therapy. Methods and Materials: A 2-stage genome-wide association study was performed. Patients were split randomly into a stage I discovery cohort (132 cases, 103 controls) and a stage II replication cohort (128 cases, 102 controls). The discovery cohort was genotyped using Affymetrix 6.0 genome-wide arrays. The 940 top ranking SNPs selected from the discovery cohort were genotyped in the replication cohort using Illumina iSelect custom SNP arrays. Results: Twelve SNPs identified in the discovery cohort and validated in the replication cohort were associated with development of ED following radiation therapy (Fisher combined P values 2.1 × 10^-5 to 6.2 × 10^-4). Notably, these 12 SNPs lie in or near genes involved in erectile function or other normal cellular functions (adhesion and signaling) rather than DNA damage repair. In a multivariable model including nongenetic risk factors, the odds ratios for these SNPs ranged from 1.6 to 5.6 in the pooled cohort. There was a striking relationship between the cumulative number of SNP risk alleles an individual possessed and ED status (Somers' D P value = 1.7 × 10^-29). A 1-allele increase in cumulative SNP score increased the odds for developing ED by a factor of 2.2 (P value = 2.1 × 10^-19). The cumulative SNP score model had a sensitivity of 84% and specificity of 75% for prediction of developing ED at the radiation therapy planning stage. Conclusions: This genome-wide association study identified a set of SNPs that are associated with development of ED following radiation therapy. These candidate genetic predictors warrant more definitive validation in an independent cohort.
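To make the per-allele effect in the abstract above concrete, the sketch below shows how an odds multiplier compounds across risk alleles. It assumes a log-linear (logistic) relationship in which each additional risk allele multiplies the odds by the same factor of 2.2; the function names and the baseline odds used in the example are hypothetical and are not taken from the study.

```python
def odds_multiplier(extra_alleles: int, per_allele_or: float = 2.2) -> float:
    """Factor by which the odds change after `extra_alleles` more risk alleles.

    Assumes each allele multiplies the odds by the same per-allele odds ratio.
    """
    return per_allele_or ** extra_alleles


def probability_from_odds(odds: float) -> float:
    """Convert odds (p / (1 - p)) back to a probability p."""
    return odds / (1.0 + odds)


# Hypothetical baseline odds of 0.5, chosen purely for illustration.
baseline_odds = 0.5
for k in range(4):
    odds = baseline_odds * odds_multiplier(k)
    print(k, round(odds, 3), round(probability_from_odds(odds), 3))
```

Because the multiplier is applied to odds rather than to probability, the probability rises steeply at first and then flattens as it approaches 1.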
Genome-Wide Association Study of a Varroa-Specific Defense Behavior in Honeybees (Apis mellifera)
Spötter, Andreas; Gupta, Pooja; Mayer, Manfred; Reinsch, Norbert
2016-01-01
Honey bees are exposed to many damaging pathogens and parasites. The most devastating is Varroa destructor, which mainly affects the brood. A promising approach for preventing its spread is to breed Varroa-resistant honey bees. One trait that has been shown to provide significant resistance against the Varroa mite is hygienic behavior, which is a behavioral response of honeybee workers to brood diseases in general. Here, we report the use of an Affymetrix 44K SNP array to analyze SNPs associated with detection and uncapping of Varroa-parasitized brood by individual worker bees (Apis mellifera). For this study, 22 000 individually labeled bees were video-monitored and a sample of 122 cases and 122 controls was collected and analyzed to determine the dependence/independence of SNP genotypes from hygienic and nonhygienic behavior on a genome-wide scale. After false-discovery rate correction of the P values, 6 SNP markers had highly significant associations with the trait investigated (α < 0.01). Inspection of the genomic regions around these SNPs led to the discovery of putative candidate genes. PMID:26774061
High throughput SNP discovery and genotyping in hexaploid wheat
Navarro, Julien; Kitt, Jonathan; Choulet, Frédéric; Leveugle, Magalie; Duarte, Jorge; Rivière, Nathalie; Eversole, Kellye; Le Gouis, Jacques; Davassi, Alessandro; Balfourier, François; Le Paslier, Marie-Christine; Berard, Aurélie; Brunel, Dominique; Feuillet, Catherine; Poncet, Charles; Sourdille, Pierre
2018-01-01
Because of their abundance and their amenability to high-throughput genotyping techniques, Single Nucleotide Polymorphisms (SNPs) are powerful tools for efficient genetics and genomics studies, including characterization of genetic resources, genome-wide association studies and genomic selection. In wheat, most of the previous SNP discovery initiatives targeted the coding fraction, leaving almost 98% of the wheat genome largely unexploited. Here we report on the use of whole-genome resequencing data from eight wheat lines to mine for SNPs in the genic, the repetitive and non-repetitive intergenic fractions of the wheat genome. Eventually, we identified 3.3 million SNPs, 49% being located on the B-genome, 41% on the A-genome and 10% on the D-genome. We also describe the development of the TaBW280K high-throughput genotyping array containing 280,226 SNPs. Performance of this chip was examined by genotyping a set of 96 wheat accessions representing the worldwide diversity. Sixty-nine percent of the SNPs can be efficiently scored, half of them showing a diploid-like clustering. The TaBW280K was proven to be a very efficient tool for diversity analyses, as well as for breeding as it can discriminate between closely related elite varieties. Finally, the TaBW280K array was used to genotype a population derived from a cross between Chinese Spring and Renan, leading to the construction of a dense genetic map comprising 83,721 markers. The results described here will provide the wheat community with powerful tools for both basic and applied research. PMID:29293495
Population sequencing reveals breed and sub-species specific CNVs in cattle
USDA-ARS's Scientific Manuscript database
Individualized copy number variation (CNV) maps have highlighted the need for population surveys of cattle to detect the rare and common variants. While SNP and comparative genomic hybridization (CGH) arrays have provided preliminary data, next-generation sequence (NGS) data analysis offers an incre...
Development and utilization of 100K SNP array in Saccharum Spp.
USDA-ARS's Scientific Manuscript database
Sugarcane genotyping or fingerprinting has long been a daunting task due to its high polyploidy level with large number of chromosomes. Single nucleotide polymorphisms (SNPs) are very abundant DNA sequence variations in the genome. With the advance of next generation sequencing (NGS) technologies, m...
Delannoy, Sabine; Mariani-Kurkdjian, Patricia; Webb, Hattie E; Bonacorsi, Stephane; Fach, Patrick
2017-01-01
Shiga toxin-producing Escherichia coli of serotype O26:H11/H- constitute a diverse group of strains and several clones with distinct genetic characteristics have been identified and characterized. Whole genome sequencing was performed using Illumina and PacBio technologies on eight stx2-positive O26:H11 strains circulating in France. Comparative analyses of the whole genome of the stx2-positive O26:H11 strains indicate that several clones of EHEC O26:H11 are co-circulating in France. Phylogenetic analysis of the French strains together with stx2-positive and stx-negative E. coli O26:H11 genomes obtained from Genbank indicates the existence of four clonal complexes (SNP-CCs) separated in two distinct lineages, one of which comprises the "new French clone" (SNP-CC1) that appears genetically closely related to stx-negative attaching and effacing E. coli (AEEC) strains. Interestingly, the whole genome SNP (wgSNP) phylogeny is summarized in the cas gene phylogeny, and a simple qPCR assay targeting the CRISPR array specific to SNP-CC1 (SP_O26-E) can distinguish between the two main lineages. The PacBio sequencing allowed a detailed analysis of the mobile genetic elements (MGEs) of the strains. Numerous MGEs were identified in each strain, including a large number of prophages and up to four large plasmids, representing overall 8.7-19.8% of the total genome size. Analysis of the prophage pool of the strains shows a considerable diversity with a complex history of recombination. Each clonal complex (SNP-CC) is characterized by a unique set of plasmids and phages, including stx-prophages, suggesting evolution through separate acquisition events. Overall, the MGEs appear to play a major role in O26:H11 intra-serotype clonal diversification.
Delannoy, Sabine; Mariani-Kurkdjian, Patricia; Webb, Hattie E.; Bonacorsi, Stephane; Fach, Patrick
2017-01-01
Shiga toxin-producing Escherichia coli of serotype O26:H11/H- constitute a diverse group of strains and several clones with distinct genetic characteristics have been identified and characterized. Whole genome sequencing was performed using Illumina and PacBio technologies on eight stx2-positive O26:H11 strains circulating in France. Comparative analyses of the whole genome of the stx2-positive O26:H11 strains indicate that several clones of EHEC O26:H11 are co-circulating in France. Phylogenetic analysis of the French strains together with stx2-positive and stx-negative E. coli O26:H11 genomes obtained from Genbank indicates the existence of four clonal complexes (SNP-CCs) separated in two distinct lineages, one of which comprises the “new French clone” (SNP-CC1) that appears genetically closely related to stx-negative attaching and effacing E. coli (AEEC) strains. Interestingly, the whole genome SNP (wgSNP) phylogeny is summarized in the cas gene phylogeny, and a simple qPCR assay targeting the CRISPR array specific to SNP-CC1 (SP_O26-E) can distinguish between the two main lineages. The PacBio sequencing allowed a detailed analysis of the mobile genetic elements (MGEs) of the strains. Numerous MGEs were identified in each strain, including a large number of prophages and up to four large plasmids, representing overall 8.7–19.8% of the total genome size. Analysis of the prophage pool of the strains shows a considerable diversity with a complex history of recombination. Each clonal complex (SNP-CC) is characterized by a unique set of plasmids and phages, including stx-prophages, suggesting evolution through separate acquisition events. Overall, the MGEs appear to play a major role in O26:H11 intra-serotype clonal diversification. PMID:28932209
Gorkhali, Neena Amatya; Dong, Kunzhe; Yang, Min; Song, Shen; Kader, Adiljian; Shrestha, Bhola Shankar; He, Xiaohong; Zhao, Qianjun; Pu, Yabin; Li, Xiangchen; Kijas, James; Guan, Weijun; Han, Jianlin; Jiang, Lin; Ma, Yuehui
2016-07-22
Sheep have successfully adapted to the extreme high-altitude Himalayan region. To identify genes underlying such adaptation, we genotyped genome-wide single nucleotide polymorphisms (SNPs) of four major sheep breeds living at different altitudes in Nepal and downloaded SNP array data from additional Asian and Middle East breeds. Using a di value-based genomic comparison between four high-altitude and eight lowland Asian breeds, we discovered the most differentiated variants at the locus of FGF-7 (Keratinocyte growth factor-7), which was previously reported as a good protective candidate for pulmonary injuries. We further found a SNP upstream of FGF-7 that appears to contribute to the divergence signature. First, the SNP occurred at an extremely conserved site. Second, the SNP showed an increasing allele frequency with elevated altitude in Nepalese sheep. Third, electrophoretic mobility shift assay (EMSA) analysis using human lung cancer cells revealed allele-specific DNA-protein interactions. We thus hypothesized that the FGF-7 gene potentially enhances lung function in high-altitude sheep by regulating its expression level through altered binding of specific transcription factors. Notably, the FGF-7 gene was not implicated in previous studies of other high-altitude species, suggesting a potentially novel mechanism of adaptation to high altitude in sheep in the Himalayas.
Helm, Benjamin M; Langley, Katherine; Spangler, Brooke; Vergano, Samantha
2014-08-01
Single nucleotide polymorphism microarrays have the ability to reveal parental consanguinity which may or may not be known to healthcare providers. Consanguinity can have significant implications for the health of patients and for individual and family psychosocial well-being. These results often present ethical and legal dilemmas that can have important ramifications. Unexpected consanguinity can be confounding to healthcare professionals who may be unprepared to handle these results or to communicate them to families or other appropriate representatives. There are few published accounts of experiences with consanguinity and SNP arrays. In this paper we discuss three cases where molecular evidence of parental incest was identified by SNP microarray. We hope to further highlight consanguinity as a potential incidental finding, how the cases were handled by the clinical team, and what resources were found to be most helpful. This paper aims to contribute further to professional discourse on incidental findings with genomic technology and how they were addressed clinically. These experiences may provide some guidance on how others can prepare for these findings and help improve practice. As genetic and genomic testing is utilized more by non-genetics providers, we also hope to inform about the importance of engaging with geneticists and genetic counselors when addressing these findings.
Evaluation of Bovine High-Density SNP Genotyping Array in Indigenous Dairy Cattle Breeds.
Dash, S; Singh, A; Bhatia, A K; Jayakumar, S; Sharma, A; Singh, S; Ganguly, I; Dixit, S P
2018-04-03
In total, 52 samples of Sahiwal (19), Tharparkar (17), and Gir (16) cattle were genotyped using the BovineHD SNP chip to analyze minor allele frequency (MAF), genetic diversity, and linkage disequilibrium among these cattle. The SNPs common to the BovineHD and 54K SNP chips were also extracted and evaluated for their performance. Only 40%-50% of the SNPs on these arrays were found to be informative for genetic analysis in these cattle breeds. The overall mean MAF for SNPs on the BovineHD SNP chip was 0.248 ± 0.006, 0.241 ± 0.007, and 0.242 ± 0.009 in Sahiwal, Tharparkar, and Gir, respectively, while that for the 54K SNPs was lower. The average Reynolds' genetic distance between breeds ranged from 0.042 to 0.055 based on the BovineHD BeadChip, and from 0.052 to 0.084 based on the 54K SNP chip. The estimates of genetic diversity based on the HD and 54K chips were almost the same; hence, the low-density chip seems to be good enough to decipher the genetic diversity of these cattle breeds. Linkage disequilibrium started decaying (r² < 0.2) at 140 kb inter-marker distance; hence, a customized 20K low-density SNP array derived from the HD chip could be designed for genomic selection in these cattle, or else the 54K BeadChip as such will be useful.
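Minor allele frequency, the summary statistic reported in the abstract above, can be computed per SNP from genotype calls. The following is a minimal sketch under stated assumptions: genotypes are coded as the count of one designated allele (0, 1, or 2) with `None` for missing calls, which is a common convention but not one described in the abstract, and the function name is hypothetical.

```python
from typing import Optional, Sequence


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """Return the MAF for one biallelic SNP.

    `genotypes` holds, per animal, the number of copies (0, 1, or 2) of one
    designated allele; `None` marks a missing call and is ignored.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes for this SNP")
    # Each diploid animal contributes two alleles.
    freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is rarer.
    return min(freq, 1.0 - freq)


# Hypothetical calls for four animals, one of them missing.
print(minor_allele_frequency([0, 1, None, 0, 0]))
```

A SNP with a MAF of zero is monomorphic in the sample, which is one way an array marker can turn out to be uninformative for a given breed.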
Bangera, Rama; Correa, Katharina; Lhorente, Jean P; Figueroa, René; Yáñez, José M
2017-01-31
Salmon Rickettsial Syndrome (SRS) caused by Piscirickettsia salmonis is a major disease affecting the Chilean salmon industry. Genomic selection (GS) is a method wherein genome-wide markers and phenotype information of full-sibs are used to predict genomic EBV (GEBV) of selection candidates and is expected to have increased accuracy and response to selection over traditional pedigree based Best Linear Unbiased Prediction (PBLUP). Widely used GS methods such as genomic BLUP (GBLUP), SNPBLUP, Bayes C and Bayesian Lasso may perform differently with respect to accuracy of GEBV prediction. Our aim was to compare the accuracy, in terms of reliability of genome-enabled prediction, from different GS methods with PBLUP for resistance to SRS in an Atlantic salmon breeding program. Number of days to death (DAYS), binary survival status (STATUS) phenotypes, and 50 K SNP array genotypes were obtained from 2601 smolts challenged with P. salmonis. The reliability of different GS methods at different SNP densities with and without pedigree were compared to PBLUP using a five-fold cross validation scheme. Heritability estimated from GS methods was significantly higher than PBLUP. Pearson's correlation between predicted GEBV from PBLUP and GS models ranged from 0.79 to 0.91 and 0.79-0.95 for DAYS and STATUS, respectively. The relative increase in reliability from different GS methods for DAYS and STATUS with 50 K SNP ranged from 8 to 25% and 27-30%, respectively. All GS methods outperformed PBLUP at all marker densities. DAYS and STATUS showed superior reliability over PBLUP even at the lowest marker density of 3 K and 500 SNP, respectively. 20 K SNP showed close to maximal reliability for both traits with little improvement using higher densities. These results indicate that genomic predictions can accelerate genetic progress for SRS resistance in Atlantic salmon and implementation of this approach will contribute to the control of SRS in Chile. 
We recommend GBLUP for routine GS evaluation because this method is computationally faster and the results are very similar with other GS methods. The use of lower density SNP or the combination of low density SNP and an imputation strategy may help to reduce genotyping costs without compromising gain in reliability.
The Minnesota Center for Twin and Family Research Genome-Wide Association Study
Miller, Michael B.; Basu, Saonli; Cunningham, Julie; Eskin, Eleazar; Malone, Steven M.; Oetting, William S.; Schork, Nicholas; Sul, Jae Hoon; Iacono, William G.; Mcgue, Matt
2012-01-01
As part of the Genes, Environment and Development Initiative (GEDI), the Minnesota Center for Twin and Family Research (MCTFR) undertook a genome-wide association study (GWAS), which we describe here. A total of 8405 research participants, clustered in 4-member families, have been successfully genotyped on 527,829 single nucleotide polymorphism (SNP) markers using Illumina’s Human660W-Quad array. Quality control screening of samples and markers as well as SNP imputation procedures are described. We also describe methods for ancestry control and how the familial clustering of the MCTFR sample can be accounted for in the analysis using a Rapid Feasible Generalized Least Squares algorithm. The rich longitudinal MCTFR assessments provide numerous opportunities for collaboration. PMID:23363460
Capalbo, Antonio; Treff, Nathan R; Cimadomo, Danilo; Tao, Xin; Upham, Kathleen; Ubaldi, Filippo Maria; Rienzi, Laura; Scott, Richard T
2015-07-01
Comprehensive chromosome screening (CCS) methods are being extensively used to select chromosomally normal embryos in human assisted reproduction. Some concerns related to the stage of analysis and which aneuploidy screening method to use still remain. In this study, the reliability of blastocyst-stage aneuploidy screening and the diagnostic performance of the two most widely used CCS methods (quantitative real-time PCR (qPCR) and array comparative genomic hybridization (aCGH)) were assessed. aCGH aneuploid blastocysts were rebiopsied, blinded, and evaluated by qPCR. Discordant cases were subsequently rebiopsied, blinded, and evaluated by single-nucleotide polymorphism (SNP) array-based CCS. Although 81.7% of embryos showed the same diagnosis when comparing aCGH and qPCR-based CCS, 18.3% (22/120) of embryos gave a discordant result for at least one chromosome. SNP array reanalysis showed that a discordance was reported in ten blastocysts for aCGH, mostly due to false positives, and in four cases for qPCR. The discordant aneuploidy call rate per chromosome was significantly higher for aCGH (5.7%) compared with qPCR (0.6%; P<0.01). To corroborate these findings, 39 embryos were simultaneously biopsied for aCGH and qPCR during blastocyst-stage aneuploidy screening cycles. Of these, 35 matched, including all 21 euploid embryos. Blinded SNP analysis on rebiopsies of the four remaining embryos matched qPCR. These findings demonstrate the high reliability of diagnosis performed at the blastocyst stage with the use of different CCS methods. However, the application of aCGH can be expected to result in a higher aneuploidy rate than other contemporary methods of CCS.
USDA-ARS's Scientific Manuscript database
Individualized copy number variation (CNV) maps have highlighted the need for population surveys of cattle to detect rare and common variants. While SNP and comparative genomic hybridization (CGH) arrays have provided preliminary data, next-generation sequence (NGS) data analysis offers an increased...
variety of arrays appropriate for a wide breadth of study design needs. Genomic coverage of many of the chromosomal anomalies are services offered at NO ADDITIONAL COST to study investigators with GWAS projects be submitted for both the initial GWAS study as well as replication using our custom SNP service
Bhat, Somanath; Polanowski, Andrea M; Double, Mike C; Jarman, Simon N; Emslie, Kerry R
2012-01-01
Recent advances in nanofluidic technologies have enabled the use of Integrated Fluidic Circuits (IFCs) for high-throughput Single Nucleotide Polymorphism (SNP) genotyping (GT). In this study, we implemented and validated a relatively low cost nanofluidic system for SNP-GT with and without Specific Target Amplification (STA). As proof of principle, we first validated the effect of input DNA copy number on genotype call rate using well characterised, digital PCR (dPCR) quantified human genomic DNA samples and then implemented the validated method to genotype 45 SNPs in the nuclear genome of the humpback whale, Megaptera novaeangliae. When STA was not incorporated, for a homozygous human DNA sample, reaction chambers containing on average 9 to 97 copies showed a 100% call rate and accuracy. Below 9 copies, the call rate decreased, and at one copy it was 40%. For a heterozygous human DNA sample, the call rate decreased from 100% to 21% when predicted copies per reaction chamber decreased from 38 copies to one copy. The tightness of genotype clusters on a scatter plot also decreased. In contrast, when the same samples were subjected to STA prior to genotyping, a call rate and a call accuracy of 100% were achieved. Our results demonstrate that low input DNA copy number affects the quality of data generated, in particular for a heterozygous sample. Similar to human genomic DNA, a call rate and a call accuracy of 100% were achieved with whale genomic DNA samples following multiplex STA using either 15 or 45 SNP-GT assays. These calls were 100% concordant with their true genotypes determined by an independent method, suggesting that the nanofluidic system is a reliable platform for achieving high call rates, accuracy and concordance with genomic DNA derived from biological tissue.
Genome-wide SNP association-based localization of a dwarfism gene in Friesian dwarf horses.
Orr, N; Back, W; Gu, J; Leegwater, P; Govindarajan, P; Conroy, J; Ducro, B; Van Arendonk, J A M; MacHugh, D E; Ennis, S; Hill, E W; Brama, P A J
2010-12-01
The recent completion of the horse genome and commercial availability of an equine SNP genotyping array have facilitated the mapping of disease genes. We report putative localization of the gene responsible for dwarfism, a trait in Friesian horses that is thought to have a recessive mode of inheritance, to a 2-Mb region of chromosome 14 using just 10 affected animals and 10 controls. We successfully genotyped 34,429 SNPs that were tested for association with dwarfism using chi-square tests. The most significant SNP in our study, BIEC2-239376 (P(2df) = 4.54 × 10^-5, P(rec) = 7.74 × 10^-6), is located close to a gene implicated in human dwarfism. Fine-mapping and resequencing analyses did not aid in further localization of the causative variant, and replication of our findings in independent sample sets will be necessary to confirm these results. © 2010 The Authors, Journal compilation © 2010 Stichting International Foundation for Animal Genetics.
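A genotype-based chi-square test with 2 degrees of freedom, of the kind mentioned above, can be sketched as below. The genotype counts are hypothetical (the abstract does not report them), and the function name is an assumption for illustration. For 2 degrees of freedom the chi-square tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def genotype_chi_square(case_counts, control_counts):
    """Pearson chi-square on a 2x3 table of genotype counts (AA, AB, BB).

    Returns (statistic, p_value). The p-value uses the exact survival
    function for 2 degrees of freedom, exp(-x / 2); it is only valid
    when all three genotype classes are observed.
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat, math.exp(-stat / 2)


# Hypothetical counts: 10 cases all BB, 10 controls split AA/AB.
stat, p = genotype_chi_square((0, 0, 10), (5, 5, 0))
print(stat, p)  # chi2 = 20.0, p ≈ 4.54e-05
```

With only 20 animals the statistic cannot exceed 20, so complete separation of cases and controls gives p = exp(-10) ≈ 4.54 × 10^-5. That happens to equal the reported P(2df), but the real genotype counts are not given here, so the example should not be read as a reconstruction of the study data.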
Di Pierro, Erica A; Gianfranceschi, Luca; Di Guardo, Mario; Koehorst-van Putten, Herma JJ; Kruisselbrink, Johannes W; Longhi, Sara; Troggio, Michela; Bianco, Luca; Muranty, Hélène; Pagliarani, Giulia; Tartarini, Stefano; Letschka, Thomas; Lozano Luis, Lidia; Garkava-Gustavsson, Larisa; Micheletti, Diego; Bink, Marco CAM; Voorrips, Roeland E; Aziz, Ebrahimi; Velasco, Riccardo; Laurens, François; van de Weg, W Eric
2016-01-01
Quantitative trait loci (QTL) mapping approaches rely on the correct ordering of molecular markers along the chromosomes, which can be obtained from genetic linkage maps or a reference genome sequence. For apple (Malus domestica Borkh), the genome sequence v1 and v2 could not meet this need; therefore, a novel approach was devised to develop a dense genetic linkage map, providing the most reliable marker-loci order for the highest possible number of markers. The approach was based on four strategies: (i) the use of multiple full-sib families, (ii) the reduction of missing information through the use of HaploBlocks and alternative calling procedures for single-nucleotide polymorphism (SNP) markers, (iii) the construction of a single backcross-type data set including all families, and (iv) a two-step map generation procedure based on the sequential inclusion of markers. The map comprises 15 417 SNP markers, clustered in 3 K HaploBlock markers spanning 1 267 cM, with an average distance between adjacent markers of 0.37 cM and a maximum distance of 3.29 cM. Moreover, chromosome 5 was oriented according to its homoeologous chromosome 10. This map was useful to improve the apple genome sequence, design the Axiom Apple 480 K SNP array and perform multifamily-based QTL studies. Its collinearity with the genome sequences v1 and v3 is reported. To our knowledge, this is the shortest published SNP map in apple, while including the largest number of markers, families and individuals. This result validates our methodology, proving its value for the construction of integrated linkage maps for any outbreeding species. PMID:27917289
DeScipio, Cheryl; Morrissette, Jennifer J.D.; Conlin, Laura K.; Clark, Dinah; Kaur, Maninder; Coplan, James; Riethman, Harold; Spinner, Nancy B.; Krantz, Ian D.
2009-01-01
Two brothers, with dissimilar clinical features, were each found to have different abnormalities of chromosome 20 by subtelomere fluorescence in situ hybridization (FISH). The proband had deletion of 20p subtelomere and duplication of 20q subtelomere, while his brother was found to have a duplication of 20p subtelomere and deletion of 20q subtelomere. Parental cytogenetic studies were initially thought to be normal, both by G-banding and by subtelomere FISH analysis. Since chromosome 20 is a metacentric chromosome and an inversion was suspected, we used anchored FISH to assist in identifying a possible inversion. This approach employed concomitant hybridization of a FISH probe to the short (p) arm of chromosome 20 with the 20q subtelomere probe. We identified a cytogenetically non-visible, mosaic pericentric inversion of one of the maternal chromosome 20 homologues, providing a mechanistic explanation for the chromosomal abnormalities present in these brothers. Array comparative genomic hybridization (CGH) with both a custom-made BAC and cosmid-based subtelomere specific array (TEL array) and a commercially-available SNP-based array confirmed and further characterized these rearrangements, identifying this as the largest pericentric inversion of chromosome 20 described to date. TEL array data indicate that the 20p breakpoint is defined by BAC RP11-978M13, ~900 kb from the pter; SNP array data reveal this breakpoint to occur within BAC RP11-978M13. The 20q breakpoint is defined by BAC RP11-93B14, ~1.7 Mb from the qter, by TEL array; SNP array data refine this breakpoint to within a gap between BACs on the TEL array (i.e. between RP11-93B14 and proximal BAC RP11-765G16). PMID:20101690
DOE Office of Scientific and Technical Information (OSTI.GOV)
McKown, Athena; Klapste, Jaroslav; Guy, Robert
2014-01-01
To uncover the genetic basis of phenotypic trait variation, we used 448 unrelated wild accessions of black cottonwood (Populus trichocarpa Torr. & Gray) from natural populations throughout western North America. Extensive information from large-scale trait phenotyping (with spatial and temporal replications within a common garden) and genotyping (with a 34K Populus SNP array) of all accessions were used for gene discovery in a genome-wide association study (GWAS).
Nested association mapping for dissecting complex traits using Peanut 58K SNP array
USDA-ARS's Scientific Manuscript database
Genome-wide association studies (GWAS) and linkage mapping have been the two most predominant strategies to dissect complex traits, but are limited by the occurrence of false positives reported for GWAS, and low resolution in the case of linkage analysis. This has led to the development of a joint a...
USDA-ARS's Scientific Manuscript database
Bacterial cold water disease (BCWD) causes significant mortality and economic losses in salmonid aquaculture. In previous studies we have identified moderate-large effect QTL for BCWD resistance in rainbow trout (Oncorhynchus mykiss). However, the recent availability of a high density SNP array and...
Design of a bovine low-density SNP array optimized for imputation
USDA-ARS's Scientific Manuscript database
The Illumina BovineLD BeadChip was designed to support imputation to higher density genotypes in dairy and beef breeds by including single-nucleotide polymorphisms (SNPs) that had a high minor allele frequency as well as uniform spacing across the genome except at the ends of the chromosome where de...
2010-10-01
5 Results ...to disease prognosis and in determining the course of treatment for the patient (2) . Breast cancer is a highly heterogeneous and complex disease...progression is a challenge. Introduction of high density single nucleotide polymorphism (SNP) genotyping arrays has helped not only for whole genome
Krug, Utz O.; Lee, Dhong Hyun Tony; Kawamata, Norihiko; Iwanski, Gabriela B.; Lasho, Terra; Weiss, Tamara; Nowak, Daniel; Koren-Michowitz, Maya; Kato, Motohiro; Sanada, Masashi; Shih, Lee-Yung; Nagler, Arnon; Raynaud, Sophie D.; Müller-Tidow, Carsten; Mesa, Ruben; Haferlach, Torsten; Gilliland, D. Gary; Tefferi, Ayalew; Ogawa, Seishi; Koeffler, H. Phillip
2010-01-01
Philadelphia chromosome–negative myeloproliferative neoplasms (MPNs) including polycythemia vera, essential thrombocythemia, and primary myelofibrosis show an inherent tendency for transformation into leukemia (MPN-blast phase), which is hypothesized to be accompanied by acquisition of additional genomic lesions. We, therefore, examined chromosomal abnormalities by high-resolution single nucleotide polymorphism (SNP) array in 88 MPN patients, as well as 71 cases with MPN-blast phase, and correlated these findings with their clinical parameters. Frequent genomic alterations were found in MPN after leukemic transformation with up to 3-fold more genomic changes per sample compared with samples in chronic phase (P < .001). We identified commonly altered regions involved in disease progression including not only established targets (ETV6, TP53, and RUNX1) but also new candidate genes on 7q, 16q, 19p, and 21q. Moreover, trisomy 8 or amplification of 8q24 (MYC) was almost exclusively detected in JAK2V617F− cases with MPN-blast phase. Remarkably, copy number–neutral loss of heterozygosity (CNN-LOH) on either 7q or 9p including homozygous JAK2V617F was related to decreased survival after leukemic transformation (P = .01 and P = .016, respectively). Our high-density SNP-array analysis of MPN genomes in the chronic compared with leukemic stage identified novel target genes and provided prognostic insights associated with the evolution to leukemia. PMID:20068225
Jackson, Eric M.; Sievert, Angela J.; Gai, Xiaowu; Hakonarson, Hakon; Judkins, Alexander R; Tooke, Laura; Perin, Juan Carlos; Xie, Hongbo; Shaikh, Tamim H.; Biegel, Jaclyn A.
2009-01-01
Translational Relevance Previous reports suggested that abnormalities of INI1 could be detected in 70–75% of malignant rhabdoid tumors. The mechanism of inactivation in the other 25% remained unclear. The goal of this study was to perform a high-resolution genomic analysis of a large series of rhabdoid tumors with the expectation of identifying additional loci related to the initiation or progression of these malignancies. We also developed a comprehensive set of assays, including a new MLPA assay, to interrogate the INI1 locus in 22q11.2. Intragenic deletions could be detected using the Illumina 550K Beadchip, whereas single exon deletions could be detected using MLPA. The current study demonstrates that with a multi-platform approach, alterations at the INI1 locus can be detected in almost all cases. Thus, appropriate molecular genetic testing can be used as an aid in the diagnosis and for treatment planning for most patients. Purpose A high-resolution genomic profiling and comprehensive targeted analysis of INI1/SMARCB1 of a large series of pediatric rhabdoid tumors was performed. The aim was to identify regions of copy number change and loss of heterozygosity that might pinpoint additional loci involved in the development or progression of rhabdoid tumors, and define the spectrum of genomic alterations of INI1 in this malignancy. Experimental Design A multi-platform approach, utilizing Illumina single nucleotide polymorphism (SNP) based oligonucleotide arrays, multiplex ligation dependent probe amplification (MLPA), fluorescence in situ hybridization (FISH), and coding sequence analysis was used to characterize genome wide copy number changes, loss of heterozygosity, and genomic alterations of INI1/SMARCB1 in a series of pediatric rhabdoid tumors. Results The bi-allelic alterations of INI1 that led to inactivation were elucidated in 50 of 51 tumors. 
INI1 inactivation was demonstrated by a variety of mechanisms, including deletions, mutations, and loss of heterozygosity. The results from the array studies highlighted the complexity of rearrangements of chromosome 22, compared to the low frequency of alterations involving the other chromosomes. Conclusions The results from the genome wide SNP-array analysis suggest that INI1 is the primary tumor suppressor gene involved in the development of rhabdoid tumors with no second locus identified. In addition, we did not identify hot spots for the breakpoints in sporadic tumors with deletions of chromosome 22q11.2. By employing a multimodality approach, the wide spectrum of alterations of INI1 can be identified in the majority of patients, which increases the clinical utility of molecular diagnostic testing. PMID:19276269
Schönhals, Elske Maria; Ding, Jia; Ritter, Enrique; Paulo, Maria João; Cara, Nicolás; Tacke, Ekhard; Hofferbert, Hans-Reinhard; Lübeck, Jens; Strahwald, Josef; Gebhardt, Christiane
2017-08-22
Tuber yield and starch content of the cultivated potato are complex traits of decisive importance for breeding improved varieties. Natural variation of tuber yield and starch content depends on the environment and on multiple, mostly unknown genetic factors. Dissection and molecular identification of the genes and their natural allelic variants controlling these complex traits will lead to the development of diagnostic DNA-based markers, by which precision and efficiency of selection can be increased (precision breeding). Three case-control populations were assembled from tetraploid potato cultivars based on maximizing the differences between high and low tuber yield (TY), starch content (TSC) and starch yield (TSY, arithmetic product of TY and TSC). The case-control populations were genotyped by restriction-site associated DNA sequencing (RADseq) and the 8.3 k SolCAP SNP genotyping array. The allele frequencies of single nucleotide polymorphisms (SNPs) were compared between cases and controls. RADseq identified, depending on data filtering criteria, between 6664 and 450 genes with one or more differential SNPs for one, two or all three traits. Differential SNPs in 275 genes were detected using the SolCAP array. A genome wide association study using the SolCAP array on an independent, unselected population identified SNPs associated with tuber starch content in 117 genes. Physical mapping of the genes containing differential or associated SNPs, and comparisons between the two genome wide genotyping methods and two different populations identified genome segments on all twelve potato chromosomes harboring one or more quantitative trait loci (QTL) for TY, TSC and TSY. Several hundred genes control tuber yield and starch content in potato. They are unequally distributed on all potato chromosomes, forming clusters between 0.5-4 Mbp width. The largest fraction of these genes had unknown function, followed by genes with putative signalling and regulatory functions. 
The genetic control of tuber yield and starch content is interlinked. Most differential SNPs affecting both traits had antagonistic effects: The allele increasing TY decreased TSC and vice versa. Exceptions were 89 SNP alleles which had synergistic effects on TY, TSC and TSY. These and the corresponding genes are primary targets for developing diagnostic markers.
Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data.
Favero, F; Joshi, T; Marquard, A M; Birkbak, N J; Krzystanek, M; Li, Q; Szallasi, Z; Eklund, A C
2015-01-01
Exome or whole-genome deep sequencing of tumor DNA along with paired normal DNA can potentially provide a detailed picture of the somatic mutations that characterize the tumor. However, analysis of such sequence data can be complicated by the presence of normal cells in the tumor specimen, by intratumor heterogeneity, and by the sheer size of the raw data. In particular, determination of copy number variations from exome sequencing data alone has proven difficult; thus, single nucleotide polymorphism (SNP) arrays have often been used for this task. Recently, algorithms to estimate absolute, but not allele-specific, copy number profiles from tumor sequencing data have been described. We developed Sequenza, a software package that uses paired tumor-normal DNA sequencing data to estimate tumor cellularity and ploidy, and to calculate allele-specific copy number profiles and mutation profiles. We applied Sequenza, as well as two previously published algorithms, to exome sequence data from 30 tumors from The Cancer Genome Atlas. We assessed the performance of these algorithms by comparing their results with those generated using matched SNP arrays and processed by the allele-specific copy number analysis of tumors (ASCAT) algorithm. Comparison between Sequenza/exome and SNP/ASCAT revealed strong correlation in cellularity (Pearson's r = 0.90) and ploidy estimates (r = 0.42, or r = 0.94 after manually inspecting alternative solutions). This performance was noticeably superior to previously published algorithms. In addition, in artificial data simulating normal-tumor admixtures, Sequenza detected the correct ploidy in samples with tumor content as low as 30%. The agreement between Sequenza and SNP array-based copy number profiles suggests that exome sequencing alone is sufficient not only for identifying small scale mutations but also for estimating cellularity and inferring DNA copy number aberrations. © The Author 2014. 
Published by Oxford University Press on behalf of the European Society for Medical Oncology.
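Why normal-cell admixture complicates copy number estimation can be illustrated with a generic tumor-normal mixing model. The sketch below is a textbook-style approximation under stated assumptions (diploid normal cells, a heterozygous germline SNP, hypothetical function names); it is not taken from the Sequenza source code.

```python
def expected_depth_ratio(cn_tumor, cellularity, ploidy, cn_normal=2):
    """Expected tumor/normal depth ratio for a segment.

    cellularity: fraction of tumor cells in the specimen (0..1)
    ploidy:      average copy number of the tumor genome
    Assumes the normal cells are diploid (cn_normal=2).
    """
    segment = cellularity * cn_tumor + (1 - cellularity) * cn_normal
    genome_average = cellularity * ploidy + (1 - cellularity) * cn_normal
    return segment / genome_average


def expected_baf(b_copies, cn_tumor, cellularity, cn_normal=2):
    """Expected B-allele frequency at a heterozygous germline SNP.

    b_copies: copies of the B allele in tumor cells; normal cells are
    assumed to carry exactly one B copy out of cn_normal.
    """
    b_total = cellularity * b_copies + (1 - cellularity) * 1
    all_total = cellularity * cn_tumor + (1 - cellularity) * cn_normal
    return b_total / all_total


# A one-copy loss (B allele deleted) in a diploid tumor at 30% tumor content:
print(expected_depth_ratio(1, 0.3, 2))  # 0.85
print(expected_baf(0, 1, 0.3))          # about 0.41
```

At 30% tumor content a hemizygous deletion shifts the depth ratio only from 1.0 to 0.85, which gives a sense of how subtle the signal becomes at the low cellularity mentioned in the abstract.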
2013-01-01
Background Hereditary non-polyposis colorectal cancer (HNPCC)/Lynch syndrome (LS) is a cancer syndrome characterised by early-onset epithelial cancers, especially colorectal cancer (CRC) and endometrial cancer. The aim of the current study was to use SNP-array technology to identify genomic aberrations which could contribute to the increased risk of cancer in HNPCC/LS patients. Methods Individuals diagnosed with HNPCC/LS (100) and healthy controls (384) were genotyped using the Illumina Human610-Quad SNP-arrays. Copy number variation (CNV) calling and association analyses were performed using Nexus software, with significant results validated using QuantiSNP. TaqMan Copy-Number assays were used for verification of CNVs showing significant association with HNPCC/LS identified by both software programs. Results We detected copy number (CN) gains associated with HNPCC/LS status on chromosome 7q11.21 (28% cases and 0% controls, Nexus; p = 3.60E-20 and QuantiSNP; p < 1.00E-16) and 16p11.2 (46% in cases, while a CN loss was observed in 23% of controls, Nexus; p = 4.93E-21 and QuantiSNP; p = 5.00E-06) via in silico analyses. TaqMan Copy-Number assay was used for validation of CNVs showing significant association with HNPCC/LS. In addition, CNV burden (total CNV length, average CNV length and number of observed CNV events) was significantly greater in cases compared to controls. Conclusion A greater CNV burden was identified in HNPCC/LS cases compared to controls supporting the notion of higher genomic instability in these patients. One intergenic locus on chromosome 7q11.21 is possibly associated with HNPCC/LS and deserves further investigation. The results from this study highlight the complexities of fluorescent based CNV analyses. The inefficiency of both CNV detection methods to reproducibly detect observed CNVs demonstrates the need for sequence data to be considered alongside intensity data to avoid false positive results. PMID:23531357
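The three CNV burden measures named above (total CNV length, average CNV length and number of events) can be computed per sample along these lines. The input layout, a list of (start, end) coordinates in base pairs, and the function name are assumptions for illustration; the coordinates in the usage line are made up.

```python
def cnv_burden(cnvs):
    """Summarise the CNV burden of one sample.

    cnvs: list of (start, end) tuples in base pairs, one per called CNV.
    Returns the number of events, their total length and mean length.
    """
    lengths = [end - start for start, end in cnvs]
    n_events = len(lengths)
    total_length = sum(lengths)
    mean_length = total_length / n_events if n_events else 0.0
    return {
        "n_events": n_events,
        "total_length": total_length,
        "mean_length": mean_length,
    }


# Hypothetical sample with two CNV calls.
print(cnv_burden([(100, 300), (1000, 1600)]))
# {'n_events': 2, 'total_length': 800, 'mean_length': 400.0}
```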
Johnson, Eric O; Hancock, Dana B; Levy, Joshua L; Gaddis, Nathan C; Saccone, Nancy L; Bierut, Laura J; Page, Grier P
2013-05-01
A great promise of publicly sharing genome-wide association data is the potential to create composite sets of controls. However, studies often use different genotyping arrays, and imputation to a common set of SNPs has shown substantial bias: a problem which has no broadly applicable solution. Based on the idea that using differing genotyped SNP sets as inputs creates differential imputation errors and thus bias in the composite set of controls, we examined the degree to which each of the following occurs: (1) imputation based on the union of genotyped SNPs (i.e., SNPs available on one or more arrays) results in bias, as evidenced by spurious associations (type 1 error) between imputed genotypes and arbitrarily assigned case/control status; (2) imputation based on the intersection of genotyped SNPs (i.e., SNPs available on all arrays) does not evidence such bias; and (3) imputation quality varies by the size of the intersection of genotyped SNP sets. Imputations were conducted in European Americans and African Americans with reference to HapMap phase II and III data. Imputation based on the union of genotyped SNPs across the Illumina 1M and 550v3 arrays showed spurious associations for 0.2 % of SNPs: ~2,000 false positives per million SNPs imputed. Biases remained problematic for very similar arrays (550v1 vs. 550v3) and were substantial for dissimilar arrays (Illumina 1M vs. Affymetrix 6.0). In all instances, imputing based on the intersection of genotyped SNPs (as few as 30 % of the total SNPs genotyped) eliminated such bias while still achieving good imputation quality.
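The difference between the union and the intersection of genotyped SNP sets can be made concrete with a few lines of set arithmetic. The array names and rsIDs below are placeholders, not real manifest contents, and the helper names are assumptions.

```python
def snp_set_summary(manifests):
    """Compare SNP content across genotyping arrays.

    manifests: dict mapping an array name to an iterable of SNP IDs.
    Returns (union, intersection, fraction of the union that is shared).
    """
    sets = [set(ids) for ids in manifests.values()]
    union = set().union(*sets)
    intersection = set.intersection(*sets)
    return union, intersection, len(intersection) / len(union)


def expected_false_positives(rate, n_snps):
    """Number of spurious associations expected at a given per-SNP rate."""
    return rate * n_snps


# Placeholder manifests for two hypothetical arrays.
manifests = {
    "array_a": ["rs1", "rs2", "rs3"],
    "array_b": ["rs2", "rs3", "rs4"],
}
union, shared, fraction = snp_set_summary(manifests)
print(sorted(shared), fraction)  # ['rs2', 'rs3'] 0.5
```

Restricting the imputation input to `shared` corresponds to strategy (2) in the abstract. The quoted figure of about 2,000 false positives per million SNPs is simply `expected_false_positives(0.002, 1_000_000)`.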
Evaluation of copy number variation detection for a SNP array platform
2014-01-01
Background Copy number variations (CNVs) are usually inferred from single nucleotide polymorphism (SNP) arrays using software packages that implement particular algorithms. However, the performance of these packages is not clearly understood, which makes it difficult to select one or several of them for CNV detection on a SNP array platform. We selected four publicly available software packages designed for CNV calling from an Affymetrix SNP array: Birdsuite, dChip, Genotyping Console (GTC) and PennCNV. A publicly available dataset generated by array-based comparative genomic hybridization (CGH), with a resolution of 24 million probes per sample, was taken as the “gold standard”. The success rate, average stability rate, sensitivity, consistency and reproducibility of the four packages were assessed against this gold standard. Specifically, we also compared the efficiency of detecting CNVs simultaneously by two, three and all of the software packages with that of a single package. Results In terms of the number of CNVs detected, Birdsuite detected the most and GTC the fewest. Birdsuite and dChip showed an obvious detection bias, and GTC seemed inferior because it detected the fewest CNVs. We then investigated the consistency between the calls of each package and those of the other three. The consistency of dChip was the lowest and that of GTC the highest. Compared with the CGH results, in the matching group GTC called the most matching CNVs and PennCNV-Affy ranked second. In the non-overlapping group, GTC called the fewest CNVs. With regard to the reproducibility of CNV calling, larger CNVs were usually replicated better. PennCNV-Affy showed the best consistency and Birdsuite the poorest. 
Conclusion We found that PennCNV outperformed the other three packages in the sensitivity and specificity of CNV calling. Obviously, each calling method had its own limitations and advantages for different data analysis. Therefore, the optimized calling methods might be identified using multiple algorithms to evaluate the concordance and discordance of SNP array-based CNV calling. PMID:24555668
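Comparing CNV calls against a gold-standard set usually comes down to an interval-overlap rule. The sketch below uses a 50% reciprocal-overlap criterion, which is a common convention but an assumption here; the abstract does not state which matching rule was used, and the coordinates are invented.

```python
def overlap_bp(a, b):
    """Overlap in base pairs between two (chrom, start, end) intervals."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def reciprocal_match(a, b, min_fraction=0.5):
    """True if the overlap covers at least min_fraction of both intervals."""
    shared = overlap_bp(a, b)
    return (shared >= min_fraction * (a[2] - a[1])
            and shared >= min_fraction * (b[2] - b[1]))


def sensitivity(calls, gold, min_fraction=0.5):
    """Fraction of gold-standard CNVs matched by at least one call."""
    if not gold:
        return 0.0
    hits = sum(
        any(reciprocal_match(g, c, min_fraction) for c in calls) for g in gold
    )
    return hits / len(gold)


# Invented example: one gold CNV is recovered, the other is missed.
gold = [("1", 100, 200), ("1", 500, 600)]
calls = [("1", 120, 210), ("2", 500, 600)]
print(sensitivity(calls, gold))  # 0.5
```

Swapping the roles of `calls` and `gold` gives the fraction of calls supported by the reference, which is one way to quantify the "matching" and "non-overlapping" groups described above.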
Wu, Xiaoping; Guldbrandtsen, Bernt; Lund, Mogens Sandø; Sahana, Goutam
2016-09-01
Identification of genetic variants associated with feet and legs disorders (FLD) will aid in the genetic improvement of these traits by providing knowledge on genes that influence trait variations. In Denmark, FLD in cattle has been recorded since the 1990s. In this report, we used deregressed breeding values as response variables for a genome-wide association study. Bulls (5,334 Danish Holstein, 4,237 Nordic Red Dairy Cattle, and 1,180 Danish Jersey) with deregressed estimated breeding values were genotyped with the Illumina Bovine 54k single nucleotide polymorphism (SNP) genotyping array. Genotypes were imputed to whole-genome sequence variants, and then 22,751,039 SNP on 29 autosomes were used for an association analysis. A modified linear mixed-model approach (efficient mixed-model association eXpedited, EMMAX) and a linear mixed model were used for association analysis. We identified 5 (3,854 SNP), 3 (13,642 SNP), and 0 quantitative trait locus (QTL) regions associated with the FLD index in Danish Holstein, Nordic Red Dairy Cattle, and Danish Jersey populations, respectively. We did not identify any QTL that were common among the 3 breeds. In a meta-analysis of the 3 breeds, 4 QTL regions were significant, but no additional QTL region was identified compared with within-breed analyses. Comparison between top SNP locations within these QTL regions and known genes suggested that RASGRP1, LCORL, MOS, and MITF may be candidate genes for FLD in dairy cattle. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Comparing CNV detection methods for SNP arrays.
Winchester, Laura; Yau, Christopher; Ragoussis, Jiannis
2009-09-01
Data from whole genome association studies can now be used for dual purposes, genotyping and copy number detection. In this review we discuss some of the methods for using SNP data to detect copy number events. We examine a number of algorithms designed to detect copy number changes through the use of signal-intensity data and consider methods to evaluate the changes found. We describe the use of several statistical models in copy number detection in germline samples. We also present a comparison of data using these methods to assess accuracy of prediction and detection of changes in copy number.
Muchero, Wellington; Diop, Ndeye N; Bhat, Prasanna R; Fenton, Raymond D; Wanamaker, Steve; Pottorff, Marti; Hearne, Sarah; Cisse, Ndiaga; Fatokun, Christian; Ehlers, Jeffrey D; Roberts, Philip A; Close, Timothy J
2009-10-27
Consensus genetic linkage maps provide a genomic framework for quantitative trait loci identification, map-based cloning, assessment of genetic diversity, association mapping, and applied breeding in marker-assisted selection schemes. Among "orphan crops" with limited genomic resources such as cowpea [Vigna unguiculata (L.) Walp.] (2n = 2x = 22), the use of transcript-derived SNPs in genetic maps provides opportunities for automated genotyping and estimation of genome structure based on synteny analysis. Here, we report the development and validation of a high-throughput EST-derived SNP assay for cowpea, its application in consensus map building, and determination of synteny to reference genomes. SNP mining from 183,118 ESTs sequenced from 17 cDNA libraries yielded approximately 10,000 high-confidence SNPs from which an Illumina 1,536-SNP GoldenGate genotyping array was developed and applied to 741 recombinant inbred lines from six mapping populations. Approximately 90% of the SNPs were technically successful, providing 1,375 dependable markers. Of these, 928 were incorporated into a consensus genetic map spanning 680 cM with 11 linkage groups and an average marker distance of 0.73 cM. Comparison of this cowpea genetic map to reference legumes, soybean (Glycine max) and Medicago truncatula, revealed extensive macrosynteny encompassing 85 and 82%, respectively, of the cowpea map. Regions of soybean genome duplication were evident relative to the simpler diploid cowpea. Comparison with Arabidopsis revealed extensive genomic rearrangement with some conserved microsynteny. These results support evolutionary closeness between cowpea and soybean and identify regions for synteny-based functional genomics studies in legumes.
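The average marker distance quoted above can be checked from the map length and marker count given in the abstract. A minimal sketch of that arithmetic follows; the function name is illustrative, and the simple length-over-markers ratio is an assumption about how the figure was derived.

```python
def average_marker_spacing(map_length_cm: float, n_markers: int) -> float:
    """Mean distance between markers, taken as map length / marker count."""
    if n_markers <= 0:
        raise ValueError("n_markers must be positive")
    return map_length_cm / n_markers


# Figures from the abstract: 928 markers on a 680 cM consensus map.
spacing = average_marker_spacing(680.0, 928)
print(f"{spacing:.2f} cM per marker")
```

The result rounds to the 0.73 cM reported in the abstract.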
Zhou, Wei; Liu, Ranran; Zhang, Jingjing; Zheng, Maiqing; Li, Peng; Chang, Guobin; Wen, Jie; Zhao, Guiping
2014-10-01
Copy number variation (CNV) has recently been examined in many species and is recognized as a source of genetic variability, especially for disease-related phenotypes. In this study, the PennCNV software was used for genome-wide CNV detection based on the 60 K SNP BeadChip in a total sample of 1,310 Beijing-You chickens (a Chinese local breed). After quality control, 137 high-confidence CNVRs were identified, covering 27.31 Mb and corresponding to 2.61% of the whole chicken genome. Within these regions, 131 known genes or coding sequences were involved. Q-PCR was applied to verify some of the genes related to disease development. Results showed that the copy number of genes such as phosphatidylinositol-5-phosphate 4-kinase II alpha, PHD finger protein 14, RHACD8 (a CD8α-like messenger RNA), MHC B-G, zinc finger protein, sarcosine dehydrogenase and ficolin 2 varied between individual chickens, which also supports the reliability of chip-based detection of the CNVs. As one source of genomic variation, CNVs may provide new insight into the relationship between the genome and phenotypic characteristics.
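The two coverage figures in this abstract (27.31 Mb and 2.61%) together imply a reference genome size. The sketch below back-calculates it; the function name is illustrative, and the result is an inference from the abstract's numbers, not a value the authors report.

```python
def implied_genome_size_mb(covered_mb: float, covered_fraction: float) -> float:
    """Genome size implied by a covered length and the fraction it represents."""
    if not 0 < covered_fraction <= 1:
        raise ValueError("covered_fraction must be in (0, 1]")
    return covered_mb / covered_fraction


# 27.31 Mb reported as 2.61% of the genome.
size_mb = implied_genome_size_mb(27.31, 0.0261)
print(f"implied genome size: {size_mb:.0f} Mb")
```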
Gebreyesus, Grum; Lund, Mogens S; Buitenhuis, Bart; Bovenhuis, Henk; Poulsen, Nina A; Janss, Luc G
2017-12-05
Accurate genomic prediction requires a large reference population, which is problematic for traits that are expensive to measure. Traits related to milk protein composition are not routinely recorded due to costly procedures and are considered to be controlled by a few quantitative trait loci of large effect. The amount of variation explained may vary between regions leading to heterogeneous (co)variance patterns across the genome. Genomic prediction models that can efficiently take such heterogeneity of (co)variances into account can result in improved prediction reliability. In this study, we developed and implemented novel univariate and bivariate Bayesian prediction models, based on estimates of heterogeneous (co)variances for genome segments (BayesAS). Available data consisted of milk protein composition traits measured on cows and de-regressed proofs of total protein yield derived for bulls. Single-nucleotide polymorphisms (SNPs), from 50K SNP arrays, were grouped into non-overlapping genome segments. A segment was defined as one SNP, or a group of 50, 100, or 200 adjacent SNPs, or one chromosome, or the whole genome. Traditional univariate and bivariate genomic best linear unbiased prediction (GBLUP) models were also run for comparison. Reliabilities were calculated through a resampling strategy and using deterministic formula. BayesAS models improved prediction reliability for most of the traits compared to GBLUP models and this gain depended on segment size and genetic architecture of the traits. The gain in prediction reliability was especially marked for the protein composition traits β-CN, κ-CN and β-LG, for which prediction reliabilities were improved by 49 percentage points on average using the MT-BayesAS model with a 100-SNP segment size compared to the bivariate GBLUP. Prediction reliabilities were highest with the BayesAS model that uses a 100-SNP segment size. 
The bivariate versions of our BayesAS models resulted in extra gains of up to 6% in prediction reliability compared to the univariate versions. Substantial improvement in prediction reliability was possible for most of the traits related to milk protein composition using our novel BayesAS models. Grouping adjacent SNPs into segments provided enhanced information to estimate parameters and allowing the segments to have different (co)variances helped disentangle heterogeneous (co)variances across the genome.
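The segment definition used by BayesAS (groups of 50, 100, or 200 adjacent SNPs that do not overlap) can be illustrated with a small sketch. This is not the authors' implementation: the function name is hypothetical, SNPs are assumed to be already ordered along one chromosome, and letting the last segment be shorter is an assumption about how a remainder is handled.

```python
def segment_bounds(n_snps: int, segment_size: int) -> list[tuple[int, int]]:
    """Half-open (start, end) index ranges of consecutive, non-overlapping segments."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [
        (start, min(start + segment_size, n_snps))
        for start in range(0, n_snps, segment_size)
    ]


# 250 ordered SNPs split into 100-SNP segments; the last segment holds the remainder.
bounds = segment_bounds(250, 100)
print(bounds)
```

Each segment would then receive its own (co)variance parameters in the model, which is the heterogeneity the abstract describes.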
USDA-ARS's Scientific Manuscript database
Bacterial cold water disease (BCWD) causes significant mortality and economic losses in salmonid aquaculture. In previous studies, we identified moderate-large effect QTL for BCWD resistance in rainbow trout (Oncorhynchus mykiss). However, the recent availability of a 57K SNP array and a genome phys...
Larson, Wesley; Palti, Yniv; Gao, G.; Warheit, Kenneth I.; Seeb, James E.
2017-01-01
Natural-origin steelhead trout (Oncorhynchus mykiss (Walbaum, 1792)) in the Pacific Northwest, USA, are threatened by a number of factors including habitat destruction, disease, decline in marine survival, and a potential erosion of genetic viability due to introgression from hatchery strains. Our major goal was to use a recently developed SNP array containing ∼57 000 SNPs to identify a subset of SNPs that differentiate hatchery and natural-origin populations. We analyzed 35 765 polymorphic SNPs in nine populations of steelhead trout sampled from Puget Sound, Washington, USA. We then conducted two outlier tests and found 360 loci that were candidates for divergent selection between hatchery and natural-origin populations (mean FCT = 0.29, maximum = 0.65) and 595 SNPs that were candidates for selection among natural-origin populations (mean FST = 0.25, maximum = 0.51). Comparisons with a linkage map revealed that two chromosomes (Omy05 and Omy25) contained significantly more outliers than other chromosomes, suggesting that regions on Omy05 and Omy25 may be of adaptive significance. Our results highlight several advantages of the 57 000 SNP array as a tool for population and conservation genomics studies.
Tomato breeding in the genomics era: insights from a SNP array.
Víquez-Zamora, Marcela; Vosman, Ben; van de Geest, Henri; Bovy, Arnaud; Visser, Richard G F; Finkers, Richard; van Heusden, Adriaan W
2013-05-27
The major bottleneck in genetic and linkage studies in tomato has been the lack of a sufficient number of molecular markers. This has radically changed with the application of next generation sequencing and high throughput genotyping. A set of 6000 SNPs was identified and 5528 of them were used to evaluate tomato germplasm at the level of species, varieties and segregating populations. From the 5528 SNPs, 1980 originated from 454-sequencing, 3495 from Illumina Solexa sequencing and 53 were additional known markers. Genotyping different tomato samples allowed the evaluation of the level of heterozygosity and introgressions among commercial varieties. Cherry tomatoes were especially different from round/beefs in chromosomes 4, 5 and 12. We were able to identify a set of 750 unique markers distinguishing S. lycopersicum 'Moneymaker' from all its distantly related wild relatives. Clustering and neighbour joining analysis among varieties and species showed expected grouping patterns, with S. pimpinellifolium as the most closely related to commercial tomatoes, in line with earlier results. Our results show that a SNP search in only a few breeding lines already provides generally applicable markers in tomato and its wild relatives. They also show that the data generated by the Illumina bead array are highly reproducible. Our SNPs can roughly be divided into two categories: SNPs of which both forms are present in the wild relatives and in domesticated tomatoes (originating from common ancestors) and SNPs unique to the domesticated tomato (originating after the domestication event). The SNPs can be used for genotyping, identification of varieties, comparison of genetic and physical linkage maps and to confirm (phylogenetic) relations. The SNPs used for the array hardly overlap with the SolCAP array, and it is strongly recommended to combine both SNP sets and to select a core collection of robust SNPs completely covering the entire tomato genome.
Ulloa, Mauricio; Hulse-Kemp, Amanda M; De Santiago, Luis M; Stelly, David M; Burke, John J
2017-01-01
High-density linkage maps are vital to supporting the correct placement of scaffolds and gene sequences on chromosomes and fundamental to contemporary organismal research and scientific approaches to genetic improvement, especially in paleopolyploids with exceptionally complex genomes, e.g., upland cotton (Gossypium hirsutum L., 2n = 52). Three independently developed intraspecific upland mapping populations were analyzed to generate 3 high-density genetic linkage single-nucleotide polymorphism (SNP) maps and a consensus map using the CottonSNP63K array. The populations consisted of a previously reported F2, a recombinant inbred line (RIL), and a reciprocal RIL population, from "Phytogen 72" and "Stoneville 474" cultivars. The cluster file provided 7417 genotyped SNP markers, resulting in 26 linkage groups corresponding to the 26 chromosomes (c) of the allotetraploid upland cotton (AD)1, which arose from the merging of 2 genomes ("A" Old World and "D" New World). Patterns of chromosome-specific recombination were largely consistent across mapping populations. The high-density genetic consensus map included 7244 SNP markers that spanned 3538 cM and comprised 3824 SNP bins, of which 1783 and 2041 were in the At and Dt subgenomes with 1825 and 1713 cM map lengths, respectively. Subgenome average distances were nearly identical, indicating that subgenomic differences in bin number arose due to the high numbers of SNPs on the Dt subgenome. Examination of expected recombination frequency or crossovers (COs) on the chromosomes within each population of the 2 subgenomes revealed that COs were also not affected by the SNPs or SNP bin number in these subgenomes. Comparative alignment analyses identified historical ancestral At-subgenomic translocations of c02 and c03, as well as of c04 and c05. The consensus map SNP sequences aligned with high congruency to the NBI assembly of Gossypium hirsutum.
However, the genomic comparisons revealed evidence of additional unconfirmed possible duplications, inversions and translocations, and unbalanced SNP sequence homology or SNP sequence/loci genomic dominance, or homeolog loci bias of the upland tetraploid At and Dt subgenomes. The alignments indicated that 364 SNP-associated previously unintegrated scaffolds can be placed in pseudochromosomes of the NBI G. hirsutum assembly. This is the first intraspecific SNP genetic linkage consensus map assembled in G. hirsutum with a core of reproducible Mendelian SNP markers assayed on different populations, and it provides further knowledge of chromosome arrangement of genic and nongenic SNPs. Together, the consensus map and RIL populations provide a synergistically useful platform for localizing and identifying agronomically important loci for improvement of the cotton crop.
Foresman, Bradley J.; Oliver, Rebekah E.; Jackson, Eric W.; Chao, Shiaoman; Arruda, Marcio P.; Kolb, Frederic L.
2016-01-01
Barley yellow dwarf viruses (BYDVs) are responsible for the disease barley yellow dwarf (BYD) and affect many cereals including oat (Avena sativa L.). Until recently, the molecular marker technology in oat has not allowed for many marker-trait association studies to determine the genetic mechanisms for tolerance. A genome-wide association study (GWAS) was performed on 428 spring oat lines using a recently developed high-density oat single nucleotide polymorphism (SNP) array as well as a SNP-based consensus map. Marker-trait associations were performed using a Q-K mixed model approach to control for population structure and relatedness. Six significant SNP-trait associations representing two QTL were found on chromosomes 3C (Mrg17) and 18D (Mrg04). This is the first report of BYDV tolerance QTL on chromosome 3C (Mrg17) and 18D (Mrg04). Haplotypes using the two QTL were evaluated and distinct classes for tolerance were identified based on the number of favorable alleles. A large number of lines carrying both favorable alleles were observed in the panel. PMID:27175781
Li, X; Buitenhuis, A J; Lund, M S; Li, C; Sun, D; Zhang, Q; Poulsen, N A; Su, G
2015-11-01
The identification of causal genes or genomic regions associated with fatty acids (FA) will enhance our understanding of the pathways underlying FA synthesis and provide opportunities for changing milk fat composition through a genetic approach. The linkage disequilibrium between adjacent markers is highly consistent between the Chinese and Danish Holstein populations, such that a joint genome-wide association study (GWAS) can be performed. In this study, a joint GWAS was performed for 16 milk FA traits based on data of 784 Chinese and 371 Danish Holstein cows genotyped by a high-density bovine single nucleotide polymorphism (SNP) array. A total of 486,464 SNP markers on 29 bovine autosomes were used. Bonferroni corrections were applied to adjust the significance thresholds for multiple testing at the genome- and chromosome-wide levels. According to the analysis of either the Chinese or Danish data individually, the total numbers of overlapping SNP that were significant at the chromosome level were 94 for C14:1, 208 for the C14 index, and 1 for C18:0. Joint analysis using the combined data of the 2 populations detected greater numbers of significant SNP compared with either of the individual populations alone for 7 and 10 traits at the genome- and chromosome-wide significance levels, respectively. Greater numbers of significant SNP were detected for C18:0 and the C18 index in the Chinese population compared with the joint analysis. Sixty-five significant SNP across all traits had significantly different effects in the 2 populations. Ten FA were influenced by a quantitative trait locus (QTL) region including DGAT1. Both C14:1 and the C14 index were influenced by a QTL region including SCD1 in the combined population. Other QTL regions also showed significant associations with the studied FA. A large region (14.9-24.9 Mbp) in BTA26 significantly influenced C14:1 and the C14 index in both populations, most likely due to the SNP in SCD1. 
A QTL region (69.97-73.69 Mbp) on BTA9 showed a significantly different effect on C18:0 between the 2 populations. Detection of these important SNP and the corresponding QTL regions will be helpful for follow-up studies to identify causal mutations and their interaction with environments for milk FA in dairy cattle. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
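The Bonferroni adjustment mentioned in this abstract divides the nominal significance level by the number of tests. A minimal sketch follows, under stated assumptions: the nominal level of 0.05 and the per-chromosome SNP count of 20,000 are hypothetical placeholders, since the abstract gives only the genome-wide total of 486,464 SNP.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


ALPHA = 0.05  # assumed nominal level; not stated in the abstract

# Genome-wide: all 486,464 SNP markers are treated as separate tests.
genome_wide = bonferroni_threshold(ALPHA, 486_464)

# Chromosome-wide: only the SNP on one chromosome count (hypothetical 20,000).
chromosome_wide = bonferroni_threshold(ALPHA, 20_000)

print(f"genome-wide threshold:     {genome_wide:.3e}")
print(f"chromosome-wide threshold: {chromosome_wide:.3e}")
```

Because fewer tests are counted per chromosome, the chromosome-wide threshold is less stringent, which is consistent with more traits reaching significance at that level in the abstract.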
Jiang, Y; Zhao, Y; Rodemann, B; Plieske, J; Kollers, S; Korzun, V; Ebmeyer, E; Argillier, O; Hinze, M; Ling, J; Röder, M S; Ganal, M W; Mette, M F; Reif, J C
2015-03-01
Genome-wide mapping approaches in diverse populations are powerful tools to unravel the genetic architecture of complex traits. The main goals of our study were to investigate the potential and limits to unravel the genetic architecture and to identify the factors determining the accuracy of prediction of the genotypic variation of Fusarium head blight (FHB) resistance in wheat (Triticum aestivum L.) based on data collected with a diverse panel of 372 European varieties. The wheat lines were phenotyped in multi-location field trials for FHB resistance and genotyped with 782 simple sequence repeat (SSR) markers, and 9k and 90k single-nucleotide polymorphism (SNP) arrays. We applied genome-wide association mapping in combination with fivefold cross-validations and observed surprisingly high accuracies of prediction for marker-assisted selection based on the detected quantitative trait loci (QTLs). Using a random sample of markers not selected for marker-trait associations revealed only a slight decrease in prediction accuracy compared with marker-based selection exploiting the QTL information. The same picture was confirmed in a simulation study, suggesting that relatedness is a main driver of the accuracy of prediction in marker-assisted selection of FHB resistance. When the accuracy of prediction of three genomic selection models was contrasted for the three marker data sets, no significant differences in accuracies among marker platforms and genomic selection models were observed. Marker density impacted the accuracy of prediction only marginally. Consequently, genomic selection of FHB resistance can be implemented most cost-efficiently based on low- to medium-density SNP arrays.
Liu, Xin; Wang, Li Gang; Luo, Wei Zhen; Li, Yong; Liang, Jing; Yan, Hua; Zhao, Ke Bin; Wang, Li Xian; Zhang, Long Chao
2014-12-01
A high-density single nucleotide polymorphism (SNP) array containing 62 163 markers was employed for a genome-wide association study (GWAS) to identify variants associated with lean meat in ham (LMH, %) and lean meat percentage (LMP, %) within a porcine Large White×Minzhu intercross population. For each individual, LMH and LMP were measured after slaughter at the age of 240±7 days. A total of 557 F2 animals were genotyped. The GWAS revealed that 21 SNPs showed significant genome-wide or chromosome-wide associations with LMH and LMP by the Genome-wide Rapid Association using Mixed Model and Regression-Genomic Control approach. Nineteen significant genome-wide SNPs were mapped to the distal end of Sus scrofa chromosome (SSC) 2, where a major known gene responsible for muscle mass, IGF2, is located. A conditioned analysis, in which the genotype of the most strongly associated SNP is included as a fixed effect in the model, showed that those significant SNPs on SSC2 were derived from a single quantitative trait locus. The two chromosome-wide associated SNPs on SSC1 disappeared after the conditioned analysis, suggesting that this association signal is a false association arising from the use of an F2 population. The present result is expected to lead to novel insights into muscle mass in different pig breeds and lays a preliminary foundation for follow-up studies for identification of causal mutations for subsequent application in marker-assisted selection programs for improving muscle mass in pigs. © 2014 Japanese Society of Animal Science.
Genome-wide association studies for multiple diseases of the German Shepherd Dog
Tsai, Kate L.; Noorai, Rooksana E.; Starr-Moss, Alison N.; Quignon, Pascale; Rinz, Caitlin J.; Ostrander, Elaine A.; Steiner, Jörg M.; Murphy, Keith E.
2012-01-01
The German Shepherd Dog (GSD) is a popular working and companion breed for which over 50 hereditary diseases have been documented. Herein, SNP profiles for 197 GSDs were generated using the Affymetrix v2 canine SNP array for a genome-wide association study to identify loci associated with four diseases: pituitary dwarfism, degenerative myelopathy (DM), congenital megaesophagus (ME), and pancreatic acinar atrophy (PAA). A locus on Chr 9 is strongly associated with pituitary dwarfism and is proximal to a plausible candidate gene, LHX3. Results for DM confirm a major locus encompassing SOD1, in which an associated point mutation was previously identified, but do not suggest modifier loci. Several SNPs on Chr 12 are associated with ME and a 4.7 Mb haplotype block is present in affected dogs. Analysis of additional ME cases for a SNP within the haplotype provides further support for this association. Results for PAA indicate more complex genetic underpinnings. Several regions on multiple chromosomes reach genome-wide significance. However, no major locus is apparent and only two associated haplotype blocks, on Chrs 7 and 12 are observed. These data suggest that PAA may be governed by multiple loci with small effects, or it may be a heterogeneous disorder. PMID:22105877
High-density genetic map construction and comparative genome analysis in asparagus bean.
Huang, Haitao; Tan, Huaqiang; Xu, Dongmei; Tang, Yi; Niu, Yisong; Lai, Yunsong; Tie, Manman; Li, Huanxiu
2018-03-19
Genetic maps are a prerequisite for quantitative trait locus (QTL) analysis, marker-assisted selection (MAS), fine gene mapping, and assembly of genome sequences. So far, several asparagus bean linkage maps have been established using various kinds of molecular markers. However, these maps were all constructed with gel- or array-based markers. No maps based on sequencing methods have been reported. In this study, an NGS-based strategy, SLAF-seq, was applied to create a high-density genetic map for asparagus bean. Through SLAF library construction and Illumina sequencing of two parents and 100 F2 individuals, a total of 55,437 polymorphic SLAF markers were developed and mined for SNP markers. The map consisted of 5,225 SNP markers in 11 LGs, spanning a total distance of 1,850.81 cM, with an average distance between markers of 0.35 cM. Comparative genome analysis with four other legume species (soybean, common bean, mung bean and adzuki bean) showed that asparagus bean is genetically most closely related to adzuki bean. The results will provide a foundation for future genomic research, such as QTL fine mapping and comparative mapping in pulses, and offer support for assembling the asparagus bean genome sequence.
Farris, M Heath; Scott, Andrew R; Texter, Pamela A; Bartlett, Marta; Coleman, Patricia; Masters, David
2018-04-11
Single nucleotide polymorphisms (SNPs) located within the human genome have been shown to have utility as markers of identity in the differentiation of DNA from individual contributors. Massively parallel DNA sequencing (MPS) technologies and human genome SNP databases allow for the design of suites of identity-linked target regions, amenable to sequencing in a multiplexed and massively parallel manner. Therefore, tools are needed for leveraging the genotypic information found within SNP databases for the discovery of genomic targets that can be evaluated on MPS platforms. The SNP island target identification algorithm (TIA) was developed as a user-tunable system to leverage SNP information within databases. Using data within the 1000 Genomes Project SNP database, human genome regions were identified that contain globally ubiquitous identity-linked SNPs and that were responsive to targeted resequencing on MPS platforms. Algorithmic filters were used to exclude target regions that did not conform to user-tunable SNP island target characteristics. To validate the accuracy of TIA for discovering these identity-linked SNP islands within the human genome, SNP island target regions were amplified from 70 contributor genomic DNA samples using the polymerase chain reaction. Multiplexed amplicons were sequenced using the Illumina MiSeq platform, and the resulting sequences were analyzed for SNP variations. 166 putative identity-linked SNPs were targeted in the identified genomic regions. Of the 309 SNPs that provided discerning power across individual SNP profiles, 74 previously undefined SNPs were identified during evaluation of targets from individual genomes. Overall, DNA samples of 70 individuals were uniquely identified using a subset of the suite of identity-linked SNP islands. TIA offers a tunable genome search tool for the discovery of targeted genomic regions that are scalable in the population frequency and numbers of SNPs contained within the SNP island regions. 
It also allows the definition of sequence length and sequence variability of the target region as well as the less variable flanking regions for tailoring to MPS platforms. As shown in this study, TIA can be used to discover identity-linked SNP islands within the human genome, useful for differentiating individuals by targeted resequencing on MPS technologies.
USDA-ARS's Scientific Manuscript database
A high-throughput genotyping platform is needed to enable marker-assisted breeding in the allo-octoploid cultivated strawberry Fragaria ×ananassa. Short-read sequences from one diploid and 19 octoploid accessions were aligned to the diploid Fragaria vesca ‘Hawaii 4’ reference genome to identify sing...
Lepoittevin, Camille; Frigerio, Jean-Marc; Garnier-Géré, Pauline; Salin, Franck; Cervera, María-Teresa; Vornam, Barbara; Harvengt, Luc; Plomion, Christophe
2010-01-01
Background There is considerable interest in the high-throughput discovery and genotyping of single nucleotide polymorphisms (SNPs) to accelerate genetic mapping and enable association studies. This study provides an assessment of EST-derived and resequencing-derived SNP quality in maritime pine (Pinus pinaster Ait.), a conifer characterized by a huge genome size (∼23.8 Gb/C). Methodology/Principal Findings A 384-SNP GoldenGate genotyping array was built from i/ 184 SNPs originally detected in a set of 40 re-sequenced candidate genes (in vitro SNPs), chosen on the basis of functionality scores, presence of neighboring polymorphisms, minor allele frequencies and linkage disequilibrium and ii/ 200 SNPs screened from ESTs (in silico SNPs) selected based on the number of ESTs used for SNP detection, the SNP minor allele frequency and the quality of SNP flanking sequences. The global success rate of the assay was 66.9%, and a conversion rate (considering only polymorphic SNPs) of 51% was achieved. In vitro SNPs showed significantly higher genotyping-success and conversion rates than in silico SNPs (+11.5% and +18.5%, respectively). The reproducibility was 100%, and the genotyping error rate was very low (0.54%, dropping to 0.06% when removing four SNPs showing elevated error rates). Conclusions/Significance This study demonstrates that ESTs provide a resource for SNP identification in non-model species that requires no additional bench work and little bioinformatics analysis. However, the time and cost benefits of in silico SNPs are counterbalanced by a lower conversion rate than in vitro SNPs. This drawback is acceptable for population-based experiments, but could be dramatic in experiments involving samples from narrow genetic backgrounds. 
In addition, we showed that both the visual inspection of genotyping clusters and the estimation of a per SNP error rate should help identify markers that are not suitable to the GoldenGate technology in species characterized by a large and complex genome. PMID:20543950
Humble, E; Martinez-Barrio, A; Forcada, J; Trathan, P N; Thorne, M A S; Hoffmann, M; Wolf, J B W; Hoffman, J I
2016-07-01
Custom genotyping arrays provide a flexible and accurate means of genotyping single nucleotide polymorphisms (SNPs) in a large number of individuals of essentially any organism. However, validation rates, defined as the proportion of putative SNPs that are verified to be polymorphic in a population, are often very low. A number of potential causes of assay failure have been identified, but none have been explored systematically. In particular, as SNPs are often developed from transcriptomes, parameters relating to the genomic context are rarely taken into account. Here, we assembled a draft Antarctic fur seal (Arctocephalus gazella) genome (assembly size: 2.41 Gb; scaffold/contig N50 : 3.1 Mb/27.5 kb). We then used this resource to map the probe sequences of 144 putative SNPs genotyped in 480 individuals. The number of probe-to-genome mappings and alignment length together explained almost a third of the variation in validation success, indicating that sequence uniqueness and proximity to intron-exon boundaries play an important role. The same pattern was found after mapping the probe sequences to the Walrus and Weddell seal genomes, suggesting that the genomes of species divergent by as much as 23 million years can hold information relevant to SNP validation outcomes. Additionally, reanalysis of genotyping data from seven previous studies found the same two variables to be significantly associated with SNP validation success across a variety of taxa. Finally, our study reveals considerable scope for validation rates to be improved, either by simply filtering for SNPs whose flanking sequences align uniquely and completely to a reference genome, or through predictive modelling. © 2015 John Wiley & Sons Ltd.
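The simple filter suggested at the end of this abstract (keep SNPs whose flanking sequences align uniquely and completely to a reference genome) can be sketched as follows. This is an illustration and not the authors' pipeline: the `ProbeAlignment` record, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProbeAlignment:
    """Hypothetical summary of one probe's alignment to a reference genome."""
    snp_id: str
    probe_length: int    # length of the probe/flanking sequence in bp
    n_mappings: int      # number of genomic locations the probe aligns to
    aligned_length: int  # length of the best alignment in bp


def aligns_uniquely_and_completely(a: ProbeAlignment) -> bool:
    """True when the probe maps to exactly one place over its full length."""
    return a.n_mappings == 1 and a.aligned_length == a.probe_length


probes = [
    ProbeAlignment("snp_a", 121, 1, 121),  # unique, full-length: keep
    ProbeAlignment("snp_b", 121, 3, 121),  # multi-mapping: drop
    ProbeAlignment("snp_c", 121, 1, 74),   # partial alignment: drop
]

kept = [p.snp_id for p in probes if aligns_uniquely_and_completely(p)]
print(kept)
```

A partial alignment such as `snp_c` is the kind of case the abstract associates with proximity to intron-exon boundaries when probes are designed from transcriptome data.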
Learning about human population history from ancient and modern genomes.
Stoneking, Mark; Krause, Johannes
2011-08-18
Genome-wide data, both from SNP arrays and from complete genome sequencing, are becoming increasingly abundant and are now even available from extinct hominins. These data are providing new insights into population history; in particular, when combined with model-based analytical approaches, genome-wide data allow direct testing of hypotheses about population history. For example, genome-wide data from both contemporary populations and extinct hominins strongly support a single dispersal of modern humans from Africa, followed by two archaic admixture events: one with Neanderthals somewhere outside Africa and a second with Denisovans that (so far) has only been detected in New Guinea. These new developments promise to reveal new stories about human population history, without having to resort to storytelling.
He, Xianmin; Wei, Qing; Sun, Meiqian; Fu, Xuping; Fan, Sichang; Li, Yao
2006-05-01
Biological techniques such as array comparative genomic hybridization (CGH), fluorescent in situ hybridization (FISH) and the Affymetrix single nucleotide polymorphism (SNP) array have been used to detect cytogenetic aberrations. However, on a genomic scale, these techniques are labor intensive and time consuming. Comparative genomic microarray analysis (CGMA) has been used to identify cytogenetic changes in hepatocellular carcinoma (HCC) using gene expression microarray data. However, the CGMA algorithm cannot give precise localization of aberrations, fails to identify small cytogenetic changes, and exhibits false negatives and positives. Locally un-weighted smoothing cytogenetic aberrations prediction (LS-CAP), based on local smoothing and the binomial distribution, can be expected to address these problems. The LS-CAP algorithm was built and used on HCC microarray profiles. Eighteen cytogenetic abnormalities were identified, of which 5 were reported previously and 12 were confirmed by CGH studies. LS-CAP effectively reduced the false negatives and positives, and precisely located small fragments with cytogenetic aberrations.
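The abstract describes LS-CAP only as combining local smoothing with the binomial distribution. The sketch below illustrates that general idea and is not the published algorithm: it assumes genes are ordered by chromosomal position and flagged 1 when up-regulated, slides an un-weighted window over them, and tests each window's count against a Binomial(n, 0.5) null. The window size, the null probability, the significance level, and all function names are assumptions.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def flag_gain_windows(up_calls: list[int], window: int = 10, alpha: float = 0.01) -> list[int]:
    """Start indices of windows significantly enriched for up-regulated genes.

    up_calls holds 0/1 flags for genes ordered by chromosomal position.
    """
    hits = []
    for start in range(len(up_calls) - window + 1):
        k = sum(up_calls[start:start + window])  # un-weighted local count
        if binom_upper_tail(k, window) < alpha:
            hits.append(start)
    return hits


# Synthetic example: a run of ten up-regulated genes inside mixed background.
calls = [0, 1, 0, 0, 1, 0] + [1] * 10 + [0, 0, 1, 0, 0, 1]
hits = flag_gain_windows(calls)
print(hits)
```

With a 10-gene window and a 0.01 cut-off, only a window in which every gene is up-regulated passes (tail probability 1/1024), so the single hit marks the start of the synthetic run. A loss detector would apply the same test to down-regulated flags.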
FISH Oracle: a web server for flexible visualization of DNA copy number data in a genomic context.
Mader, Malte; Simon, Ronald; Steinbiss, Sascha; Kurtz, Stefan
2011-07-28
The rapidly growing amount of array CGH data requires improved visualization software supporting the process of identifying candidate cancer genes. Optimally, such software should work across multiple microarray platforms, should be able to cope with data from different sources and should be easy to operate. We have developed a web-based software FISH Oracle to visualize data from multiple array CGH experiments in a genomic context. Its fast visualization engine and advanced web and database technology supports highly interactive use. FISH Oracle comes with a convenient data import mechanism, powerful search options for genomic elements (e.g. gene names or karyobands), quick navigation and zooming into interesting regions, and mechanisms to export the visualization into different high quality formats. These features make the software especially suitable for the needs of life scientists. FISH Oracle offers a fast and easy to use visualization tool for array CGH and SNP array data. It allows for the identification of genomic regions representing minimal common changes based on data from one or more experiments. FISH Oracle will be instrumental to identify candidate onco and tumor suppressor genes based on the frequency and genomic position of DNA copy number changes. The FISH Oracle application and an installed demo web server are available at http://www.zbh.uni-hamburg.de/fishoracle.
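The "minimal common changes" mentioned in the abstract above can be read as genomic intervals that are altered in several experiments at once. The sketch below shows one generic way to compute such intervals with a sweep over segment boundaries. It is an illustration only and does not reflect how FISH Oracle is implemented; the function name, the half-open coordinate convention and the assumption that each sample contributes non-overlapping segments on a single chromosome are all hypothetical.

```python
def common_regions(segments, min_samples):
    """segments: list of (sample_id, start, end) on one chromosome,
    with half-open coordinates [start, end).
    Returns intervals covered by at least `min_samples` segments."""
    events = []
    for _sample, start, end in segments:
        events.append((start, 1))   # a segment opens
        events.append((end, -1))    # a segment closes
    # At equal positions, closes (-1) sort before opens (+1),
    # which matches the half-open convention.
    events.sort()

    regions, depth, open_at = [], 0, None
    for pos, delta in events:
        depth += delta
        if depth >= min_samples and open_at is None:
            open_at = pos
        elif depth < min_samples and open_at is not None:
            if pos > open_at:
                regions.append((open_at, pos))
            open_at = None
    return regions

if __name__ == "__main__":
    gains = [("s1", 100, 500), ("s2", 300, 700), ("s3", 400, 600)]
    print(common_regions(gains, min_samples=3))
    print(common_regions(gains, min_samples=2))
```

Lowering `min_samples` widens the reported region, which mirrors the trade-off between a strict overlap shared by every experiment and a looser, frequency-based one.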
FISH Oracle: a web server for flexible visualization of DNA copy number data in a genomic context
2011-01-01
Background The rapidly growing amount of array CGH data requires improved visualization software supporting the process of identifying candidate cancer genes. Optimally, such software should work across multiple microarray platforms, should be able to cope with data from different sources and should be easy to operate. Results We have developed a web-based software FISH Oracle to visualize data from multiple array CGH experiments in a genomic context. Its fast visualization engine and advanced web and database technology supports highly interactive use. FISH Oracle comes with a convenient data import mechanism, powerful search options for genomic elements (e.g. gene names or karyobands), quick navigation and zooming into interesting regions, and mechanisms to export the visualization into different high quality formats. These features make the software especially suitable for the needs of life scientists. Conclusions FISH Oracle offers a fast and easy to use visualization tool for array CGH and SNP array data. It allows for the identification of genomic regions representing minimal common changes based on data from one or more experiments. FISH Oracle will be instrumental to identify candidate onco and tumor suppressor genes based on the frequency and genomic position of DNA copy number changes. The FISH Oracle application and an installed demo web server are available at http://www.zbh.uni-hamburg.de/fishoracle. PMID:21884636
Gardner, Shea N.; Hall, Barry G.
2013-01-01
Effective use of rapid and inexpensive whole genome sequencing for microbes requires fast, memory efficient bioinformatics tools for sequence comparison. The kSNP v2 software finds single nucleotide polymorphisms (SNPs) in whole genome data. kSNP v2 has numerous improvements over kSNP v1 including SNP gene annotation; better scaling for draft genomes available as assembled contigs or raw, unassembled reads; a tool to identify the optimal value of k; distribution of packages of executables for Linux and Mac OS X for ease of installation and user-friendly use; and a detailed User Guide. SNP discovery is based on k-mer analysis, and requires no multiple sequence alignment or the selection of a single reference genome. Most target sets with hundreds of genomes complete in minutes to hours. SNP phylogenies are built by maximum likelihood, parsimony, and distance, based on all SNPs, only core SNPs, or SNPs present in some intermediate user-specified fraction of targets. The SNP-based trees that result are consistent with known taxonomy. kSNP v2 can handle many gigabases of sequence in a single run, and if one or more annotated genomes are included in the target set, SNPs are annotated with protein coding and other information (UTRs, etc.) from GenBank file(s). We demonstrate application of kSNP v2 on sets of viral and bacterial genomes, and discuss in detail analysis of a set of 68 finished E. coli and Shigella genomes and a set of the same genomes to which have been added 47 assemblies and four “raw read” genomes of O104:H4 strains from the recent European E. coli outbreak that resulted in both bloody diarrhea and hemolytic uremic syndrome (HUS), and caused at least 50 deaths. PMID:24349125
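To make the idea of alignment-free, k-mer based SNP discovery concrete, here is a minimal sketch. It assumes an odd k, treats the central base of each k-mer as the candidate SNP, and groups k-mers across genomes by their identical flanking bases. This is an illustration of the general approach rather than kSNP's actual code: the function name is hypothetical, reverse complements are ignored, and the handling of within-genome conflicts is a simplifying assumption.

```python
from collections import defaultdict

def kmer_snps(genomes, k=7):
    """genomes: dict mapping genome name -> sequence string.
    Returns {(left_flank, right_flank): {name: central_base}} for loci
    where genomes sharing the same flanks differ at the central base."""
    if k % 2 == 0:
        raise ValueError("k must be odd so that a central base exists")
    half = k // 2
    loci = defaultdict(dict)
    ambiguous = set()
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            key = (kmer[:half], kmer[half + 1:])
            base = kmer[half]
            # If one genome shows two different central bases for the same
            # flanks, the locus is not a clean SNP; discard it (assumption).
            previous = loci[key].setdefault(name, base)
            if previous != base:
                ambiguous.add(key)
    return {
        key: dict(alleles)
        for key, alleles in loci.items()
        if key not in ambiguous and len(set(alleles.values())) > 1
    }

if __name__ == "__main__":
    toy = {"a": "GATTACAGG", "b": "GATTGCAGG"}
    print(kmer_snps(toy, k=5))
```

No reference genome or multiple alignment is needed: a locus is defined purely by its flanks, which is why the choice of k matters (too short and flanks repeat by chance, too long and nearby variants break the match).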
Design and characterization of a 52K SNP chip for goats.
Tosser-Klopp, Gwenola; Bardou, Philippe; Bouchez, Olivier; Cabau, Cédric; Crooijmans, Richard; Dong, Yang; Donnadieu-Tonon, Cécile; Eggen, André; Heuven, Henri C M; Jamli, Saadiah; Jiken, Abdullah Johari; Klopp, Christophe; Lawley, Cynthia T; McEwan, John; Martin, Patrice; Moreno, Carole R; Mulsant, Philippe; Nabihoudine, Ibouniyamine; Pailhoux, Eric; Palhière, Isabelle; Rupp, Rachel; Sarry, Julien; Sayre, Brian L; Tircazes, Aurélie; Jun Wang; Wang, Wen; Zhang, Wenguang
2014-01-01
The success of Genome Wide Association Studies in the discovery of sequence variation linked to complex traits in humans has increased interest in high throughput SNP genotyping assays in livestock species. Primary goals are QTL detection and genomic selection. The purpose here was design of a 50-60,000 SNP chip for goats. The success of a moderate density SNP assay depends on reliable bioinformatic SNP detection procedures, the technological success rate of the SNP design, even spacing of SNPs on the genome and selection of Minor Allele Frequencies (MAF) suitable to use in diverse breeds. Through the federation of three SNP discovery projects consolidated as the International Goat Genome Consortium, we have identified approximately twelve million high quality SNP variants in the goat genome stored in a database together with their biological and technical characteristics. These SNPs were identified within and between six breeds (meat, milk and mixed): Alpine, Boer, Creole, Katjang, Saanen and Savanna, comprising a total of 97 animals. Whole genome and Reduced Representation Library sequences were aligned on >10 kb scaffolds of the de novo goat genome assembly. The 60,000 selected SNPs, evenly spaced on the goat genome, were submitted for oligo manufacturing (Illumina, Inc) and published in dbSNP along with flanking sequences and map position on goat assemblies (i.e. scaffolds and pseudo-chromosomes), sheep genome V2 and cattle UMD3.1 assembly. Ten breeds were then used to validate the SNP content and 52,295 loci could be successfully genotyped and used to generate a final cluster file. The combined strategy of using mainly whole genome Next Generation Sequencing and mapping on a contig genome assembly, complemented with Illumina design tools proved to be efficient in producing this GoatSNP50 chip. Advances in use of molecular markers are expected to accelerate goat genomic studies in coming years.
Design and Characterization of a 52K SNP Chip for Goats
Tosser-Klopp, Gwenola; Bardou, Philippe; Bouchez, Olivier; Cabau, Cédric; Crooijmans, Richard; Dong, Yang; Donnadieu-Tonon, Cécile; Eggen, André; Heuven, Henri C. M.; Jamli, Saadiah; Jiken, Abdullah Johari; Klopp, Christophe; Lawley, Cynthia T.; McEwan, John; Martin, Patrice; Moreno, Carole R.; Mulsant, Philippe; Nabihoudine, Ibouniyamine; Pailhoux, Eric; Palhière, Isabelle; Rupp, Rachel; Sarry, Julien; Sayre, Brian L.; Tircazes, Aurélie; Jun Wang; Wang, Wen; Zhang, Wenguang
2014-01-01
The success of Genome Wide Association Studies in the discovery of sequence variation linked to complex traits in humans has increased interest in high throughput SNP genotyping assays in livestock species. Primary goals are QTL detection and genomic selection. The purpose here was design of a 50–60,000 SNP chip for goats. The success of a moderate density SNP assay depends on reliable bioinformatic SNP detection procedures, the technological success rate of the SNP design, even spacing of SNPs on the genome and selection of Minor Allele Frequencies (MAF) suitable to use in diverse breeds. Through the federation of three SNP discovery projects consolidated as the International Goat Genome Consortium, we have identified approximately twelve million high quality SNP variants in the goat genome stored in a database together with their biological and technical characteristics. These SNPs were identified within and between six breeds (meat, milk and mixed): Alpine, Boer, Creole, Katjang, Saanen and Savanna, comprising a total of 97 animals. Whole genome and Reduced Representation Library sequences were aligned on >10 kb scaffolds of the de novo goat genome assembly. The 60,000 selected SNPs, evenly spaced on the goat genome, were submitted for oligo manufacturing (Illumina, Inc) and published in dbSNP along with flanking sequences and map position on goat assemblies (i.e. scaffolds and pseudo-chromosomes), sheep genome V2 and cattle UMD3.1 assembly. Ten breeds were then used to validate the SNP content and 52,295 loci could be successfully genotyped and used to generate a final cluster file. The combined strategy of using mainly whole genome Next Generation Sequencing and mapping on a contig genome assembly, complemented with Illumina design tools proved to be efficient in producing this GoatSNP50 chip. Advances in use of molecular markers are expected to accelerate goat genomic studies in coming years. PMID:24465974
Gardner, Shea N; Hall, Barry G
2013-01-01
Effective use of rapid and inexpensive whole genome sequencing for microbes requires fast, memory efficient bioinformatics tools for sequence comparison. The kSNP v2 software finds single nucleotide polymorphisms (SNPs) in whole genome data. kSNP v2 has numerous improvements over kSNP v1 including SNP gene annotation; better scaling for draft genomes available as assembled contigs or raw, unassembled reads; a tool to identify the optimal value of k; distribution of packages of executables for Linux and Mac OS X for ease of installation and user-friendly use; and a detailed User Guide. SNP discovery is based on k-mer analysis, and requires no multiple sequence alignment or the selection of a single reference genome. Most target sets with hundreds of genomes complete in minutes to hours. SNP phylogenies are built by maximum likelihood, parsimony, and distance, based on all SNPs, only core SNPs, or SNPs present in some intermediate user-specified fraction of targets. The SNP-based trees that result are consistent with known taxonomy. kSNP v2 can handle many gigabases of sequence in a single run, and if one or more annotated genomes are included in the target set, SNPs are annotated with protein coding and other information (UTRs, etc.) from GenBank file(s). We demonstrate application of kSNP v2 on sets of viral and bacterial genomes, and discuss in detail analysis of a set of 68 finished E. coli and Shigella genomes and a set of the same genomes to which have been added 47 assemblies and four "raw read" genomes of O104:H4 strains from the recent European E. coli outbreak that resulted in both bloody diarrhea and hemolytic uremic syndrome (HUS), and caused at least 50 deaths.
Development and validation of the Axiom® Apple480K SNP genotyping array.
Bianco, Luca; Cestaro, Alessandro; Linsmith, Gareth; Muranty, Hélène; Denancé, Caroline; Théron, Anthony; Poncet, Charles; Micheletti, Diego; Kerschbamer, Emanuela; Di Pierro, Erica A; Larger, Simone; Pindo, Massimo; Van de Weg, Eric; Davassi, Alessandro; Laurens, François; Velasco, Riccardo; Durel, Charles-Eric; Troggio, Michela
2016-04-01
Cultivated apple (Malus × domestica Borkh.) is one of the most important fruit crops in temperate regions, and has great economic and cultural value. The apple genome is highly heterozygous and has undergone a recent duplication which, combined with a rapid linkage disequilibrium decay, makes it difficult to perform genome-wide association (GWA) studies. Single nucleotide polymorphism arrays offer highly multiplexed assays at a relatively low cost per data point and can be a valid tool for the identification of the markers associated with traits of interest. Here, we describe the development and validation of a 487K SNP Affymetrix Axiom® genotyping array for apple and discuss its potential applications. The array has been built from the high-depth resequencing of 63 different cultivars covering most of the genetic diversity in cultivated apple. The SNPs were chosen by applying a focal points approach to enrich genic regions, but also to reach a uniform coverage of non-genic regions. A total of 1324 apple accessions, including the 92 progenies of two mapping populations, have been genotyped with the Axiom® Apple480K to assess the effectiveness of the array. A large majority of SNPs (359 994 or 74%) fell in the stringent class of poly high resolution polymorphisms. We also devised a filtering procedure to identify a subset of 275K very robust markers that can be safely used for germplasm surveys in apple. The Axiom® Apple480K has now been commercially released both for public and proprietary use and will likely be a reference tool for GWA studies in apple. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.
Cavanagh, Colin R; Chao, Shiaoman; Wang, Shichen; Huang, Bevan Emma; Stephen, Stuart; Kiani, Seifollah; Forrest, Kerrie; Saintenac, Cyrille; Brown-Guedira, Gina L; Akhunova, Alina; See, Deven; Bai, Guihua; Pumphrey, Michael; Tomar, Luxmi; Wong, Debbie; Kong, Stephan; Reynolds, Matthew; da Silva, Marta Lopez; Bockelman, Harold; Talbert, Luther; Anderson, James A; Dreisigacker, Susanne; Baenziger, Stephen; Carter, Arron; Korzun, Viktor; Morrell, Peter Laurent; Dubcovsky, Jorge; Morell, Matthew K; Sorrells, Mark E; Hayden, Matthew J; Akhunov, Eduard
2013-05-14
Domesticated crops experience strong human-mediated selection aimed at developing high-yielding varieties adapted to local conditions. To detect regions of the wheat genome subject to selection during improvement, we developed a high-throughput array to interrogate 9,000 gene-associated single-nucleotide polymorphisms (SNP) in a worldwide sample of 2,994 accessions of hexaploid wheat including landraces and modern cultivars. Using a SNP-based diversity map we characterized the impact of crop improvement on genomic and geographic patterns of genetic diversity. We found evidence of a small population bottleneck and extensive use of ancestral variation often traceable to founders of cultivars from diverse geographic regions. Analyzing genetic differentiation among populations and the extent of haplotype sharing, we identified allelic variants subjected to selection during improvement. Selective sweeps were found around genes involved in the regulation of flowering time and phenology. An introgression of a wild relative-derived gene conferring resistance to a fungal pathogen was detected by haplotype-based analysis. Comparing selective sweeps identified in different populations, we show that selection likely acts on distinct targets or multiple functionally equivalent alleles in different portions of the geographic range of wheat. The majority of the selected alleles were present at low frequency in local populations, suggesting either weak selection pressure or temporal variation in the targets of directional selection during breeding probably associated with changing agricultural practices or environmental conditions. The developed SNP chip and map of genetic variation provide a resource for advancing wheat breeding and supporting future population genomic and genome-wide association studies in wheat.
Cavanagh, Colin R.; Chao, Shiaoman; Wang, Shichen; Huang, Bevan Emma; Stephen, Stuart; Kiani, Seifollah; Forrest, Kerrie; Saintenac, Cyrille; Brown-Guedira, Gina L.; Akhunova, Alina; See, Deven; Bai, Guihua; Pumphrey, Michael; Tomar, Luxmi; Wong, Debbie; Kong, Stephan; Reynolds, Matthew; da Silva, Marta Lopez; Bockelman, Harold; Talbert, Luther; Anderson, James A.; Dreisigacker, Susanne; Baenziger, Stephen; Carter, Arron; Korzun, Viktor; Morrell, Peter Laurent; Dubcovsky, Jorge; Morell, Matthew K.; Sorrells, Mark E.; Hayden, Matthew J.; Akhunov, Eduard
2013-01-01
Domesticated crops experience strong human-mediated selection aimed at developing high-yielding varieties adapted to local conditions. To detect regions of the wheat genome subject to selection during improvement, we developed a high-throughput array to interrogate 9,000 gene-associated single-nucleotide polymorphisms (SNP) in a worldwide sample of 2,994 accessions of hexaploid wheat including landraces and modern cultivars. Using a SNP-based diversity map we characterized the impact of crop improvement on genomic and geographic patterns of genetic diversity. We found evidence of a small population bottleneck and extensive use of ancestral variation often traceable to founders of cultivars from diverse geographic regions. Analyzing genetic differentiation among populations and the extent of haplotype sharing, we identified allelic variants subjected to selection during improvement. Selective sweeps were found around genes involved in the regulation of flowering time and phenology. An introgression of a wild relative-derived gene conferring resistance to a fungal pathogen was detected by haplotype-based analysis. Comparing selective sweeps identified in different populations, we show that selection likely acts on distinct targets or multiple functionally equivalent alleles in different portions of the geographic range of wheat. The majority of the selected alleles were present at low frequency in local populations, suggesting either weak selection pressure or temporal variation in the targets of directional selection during breeding probably associated with changing agricultural practices or environmental conditions. The developed SNP chip and map of genetic variation provide a resource for advancing wheat breeding and supporting future population genomic and genome-wide association studies in wheat. PMID:23630259
2011-01-01
Background Single nucleotide polymorphisms (SNPs) are the most abundant source of genetic variation among individuals of a species. New genotyping technologies allow examining hundreds to thousands of SNPs in a single reaction for a wide range of applications such as genetic diversity analysis, linkage mapping, fine QTL mapping, association studies, marker-assisted or genome-wide selection. In this paper, we evaluated the potential of highly-multiplexed SNP genotyping for genetic mapping in maritime pine (Pinus pinaster Ait.), the main conifer used for commercial plantation in southwestern Europe. Results We designed a custom GoldenGate assay for 1,536 SNPs detected through the resequencing of gene fragments (707 in vitro SNPs/Indels) and from Sanger-derived Expressed Sequence Tags assembled into a unigene set (829 in silico SNPs/Indels). Offspring from three-generation outbred (G2) and inbred (F2) pedigrees were genotyped. The success rate of the assay was 63.6% and 74.8% for in silico and in vitro SNPs, respectively. A genotyping error rate of 0.4% was further estimated from segregating data of SNPs belonging to the same gene. Overall, 394 SNPs were available for mapping. A total of 287 SNPs were integrated with previously mapped markers in the G2 parental maps, while 179 SNPs were localized on the map generated from the analysis of the F2 progeny. Based on 98 markers segregating in both pedigrees, we were able to generate a consensus map comprising 357 SNPs from 292 different loci. Finally, the analysis of sequence homology between mapped markers and their orthologs in a Pinus taeda linkage map made it possible to align the 12 linkage groups of both species. Conclusions Our results show that the GoldenGate assay can be used successfully for high-throughput SNP genotyping in maritime pine, a conifer species that has a genome seven times the size of the human genome. 
This SNP-array will be extended thanks to recent sequencing effort using new generation sequencing technologies and will include SNPs from comparative orthologous sequences that were identified in the present study, providing a wider collection of anchor points for comparative genomics among the conifers. PMID:21767361
Paulsson, Kajsa; Cazier, Jean-Baptiste; MacDougall, Finlay; Stevens, Jane; Stasevich, Irina; Vrcelj, Nikoletta; Chaplin, Tracy; Lillington, Debra M.; Lister, T. Andrew; Young, Bryan D.
2008-01-01
We present here a genome-wide map of abnormalities found in diagnostic samples from 45 adults and adolescents with acute lymphoblastic leukemia (ALL). A 500K SNP array analysis uncovered frequent genetic abnormalities, with cryptic deletions constituting half of the detected changes, implying that microdeletions are a characteristic feature of this malignancy. Importantly, the pattern of deletions resembled that recently reported in pediatric ALL, suggesting that adult, adolescent, and childhood cases may be more similar on the genetic level than previously thought. Thus, 70% of the cases displayed deletion of one or more of the CDKN2A, PAX5, IKZF1, ETV6, RB1, and EBF1 genes. Furthermore, several genes not previously implicated in the pathogenesis of ALL were identified as possible recurrent targets of deletion. In total, the SNP array analysis identified 367 genetic abnormalities not corresponding to known copy number polymorphisms, with all but two cases (96%) displaying at least one cryptic change. The resolution level of this SNP array study is the highest used to date to investigate a malignant hematologic disorder. Our findings provide insights into the leukemogenic process and may be clinically important in adult and adolescent ALL. Most importantly, we report that microdeletions of key genes appear to be a common, characteristic feature of ALL that is shared among different clinical, morphological, and cytogenetic subgroups. PMID:18458336
Population-genetic properties of differentiated copy number variations in cattle.
Xu, Lingyang; Hou, Yali; Bickhart, Derek M; Zhou, Yang; Hay, El Hamidi Abdel; Song, Jiuzhou; Sonstegard, Tad S; Van Tassell, Curtis P; Liu, George E
2016-03-23
While single nucleotide polymorphism (SNP) is typically the variant of choice for population genetics, copy number variation (CNV) which comprises insertion, deletion and duplication of genomic sequence, is an informative type of genetic variation. CNVs have been shown to be both common in mammals and important for understanding the relationship between genotype and phenotype. However, CNV differentiation, selection and its population genetic properties are not well understood across diverse populations. We performed a population genetics survey based on CNVs derived from the BovineHD SNP array data of eight distinct cattle breeds. We generated high resolution results that show geographical patterns of variations and genome-wide admixture proportions within and among breeds. Similar to the previous SNP-based studies, our CNV-based results displayed a strong correlation of population structure and geographical location. By conducting three pairwise comparisons among European taurine, African taurine, and indicine groups, we further identified 78 unique CNV regions that were highly differentiated, some of which might be due to selection. These CNV regions overlapped with genes involved in traits related to parasite resistance, immunity response, body size, fertility, and milk production. Our results characterize CNV diversity among cattle populations and provide a list of lineage-differentiated CNVs.
Accuracy of CNV Detection from GWAS Data.
Zhang, Dandan; Qian, Yudong; Akula, Nirmala; Alliey-Rodriguez, Ney; Tang, Jinsong; Gershon, Elliot S; Liu, Chunyu
2011-01-13
Several computer programs are available for detecting copy number variants (CNVs) using genome-wide SNP arrays. We evaluated the performance of four CNV detection software suites (Birdsuite, Partek, HelixTree, and PennCNV-Affy) in the identification of both rare and common CNVs. Each program's performance was assessed in two ways. The first was its recovery rate, i.e., its ability to call 893 CNVs previously identified in eight HapMap samples by paired-end sequencing of whole-genome fosmid clones, and 51,440 CNVs identified by array Comparative Genome Hybridization (aCGH) followed by validation procedures, in 90 HapMap CEU samples. The second evaluation was program performance calling rare and common CNVs in the Bipolar Genome Study (BiGS) data set (1001 bipolar cases and 1033 controls, all of European ancestry) as measured by the Affymetrix SNP 6.0 array. Accuracy in calling rare CNVs was assessed by positive predictive value, based on the proportion of rare CNVs validated by quantitative real-time PCR (qPCR), while accuracy in calling common CNVs was assessed by false positive/false negative rates based on qPCR validation results from a subset of common CNVs. Birdsuite recovered the highest percentages of known HapMap CNVs containing >20 markers in two reference CNV datasets. The recovery rate increased with decreased CNV frequency. In the tested rare CNV data, Birdsuite and Partek had higher positive predictive values than the other software suites. In a test of three common CNVs in the BiGS dataset, Birdsuite's call was 98.8% consistent with qPCR quantification in one CNV region, but the other two regions showed unacceptably low accuracy. We found relatively poor consistency between the two "gold standards," the sequence data of Kidd et al., and aCGH data of Conrad et al. Algorithms for calling CNVs, especially common ones, need substantial improvement, and a "gold standard" for detection of CNVs remains to be established.
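The two accuracy measures used in the abstract above, recovery rate against a reference CNV set and positive predictive value from qPCR validation, reduce to simple ratios once a matching rule is fixed. The sketch below illustrates them under stated assumptions: the abstract does not say how a call was matched to a reference CNV, so the 50% overlap criterion, the function names and the toy coordinates are all hypothetical.

```python
def overlap_fraction(ref, call):
    """Fraction of the reference interval covered by the call.
    Intervals are (chrom, start, end); different chromosomes give 0."""
    (ref_chrom, ref_start, ref_end), (call_chrom, call_start, call_end) = ref, call
    if ref_chrom != call_chrom:
        return 0.0
    shared = min(ref_end, call_end) - max(ref_start, call_start)
    return max(shared, 0) / (ref_end - ref_start)

def recovery_rate(reference, calls, min_overlap=0.5):
    """Share of reference CNVs matched by at least one call
    (matching threshold is an assumption, not from the study)."""
    recovered = sum(
        1 for ref in reference
        if any(overlap_fraction(ref, c) >= min_overlap for c in calls)
    )
    return recovered / len(reference)

def positive_predictive_value(validated, tested):
    """Proportion of tested calls confirmed by an orthogonal assay."""
    return validated / tested

if __name__ == "__main__":
    reference = [("1", 100, 200), ("1", 500, 600), ("2", 100, 300)]
    calls = [("1", 120, 220), ("2", 250, 400)]
    print(recovery_rate(reference, calls))
    print(positive_predictive_value(validated=9, tested=10))
```

Because the recovery rate depends directly on `min_overlap`, comparisons between programs are only meaningful when the same matching rule is applied to all of them.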
Automated typing of red blood cell and platelet antigens: a whole-genome sequencing study.
Lane, William J; Westhoff, Connie M; Gleadall, Nicholas S; Aguad, Maria; Smeland-Wagman, Robin; Vege, Sunitha; Simmons, Daimon P; Mah, Helen H; Lebo, Matthew S; Walter, Klaudia; Soranzo, Nicole; Di Angelantonio, Emanuele; Danesh, John; Roberts, David J; Watkins, Nick A; Ouwehand, Willem H; Butterworth, Adam S; Kaufman, Richard M; Rehm, Heidi L; Silberstein, Leslie E; Green, Robert C
2018-06-01
There are more than 300 known red blood cell (RBC) antigens and 33 platelet antigens that differ between individuals. Sensitisation to antigens is a serious complication that can occur in prenatal medicine and after blood transfusion, particularly for patients who require multiple transfusions. Although pre-transfusion compatibility testing largely relies on serological methods, reagents are not available for many antigens. Methods based on single-nucleotide polymorphism (SNP) arrays have been used, but typing for ABO and Rh (the most important blood groups) cannot be done with SNP typing alone. We aimed to develop a novel method based on whole-genome sequencing to identify RBC and platelet antigens. This whole-genome sequencing study is a subanalysis of data from patients in the whole-genome sequencing arm of the MedSeq Project randomised controlled trial (NCT01736566) with no measured patient outcomes. We created a database of molecular changes in RBC and platelet antigens and developed an automated antigen-typing algorithm based on whole-genome sequencing (bloodTyper). This algorithm was iteratively improved to address cis-trans haplotype ambiguities and homologous gene alignments. Whole-genome sequencing data from 110 MedSeq participants (30 × depth) were used to initially validate bloodTyper through comparison with conventional serology and SNP methods for typing of 38 RBC antigens in 12 blood-group systems and 22 human platelet antigens. bloodTyper was further validated with whole-genome sequencing data from 200 INTERVAL trial participants (15 × depth) with serological comparisons. We iteratively improved bloodTyper by comparing its typing results with conventional serological and SNP typing in three rounds of testing. The initial whole-genome sequencing typing algorithm was 99·5% concordant across the first 20 MedSeq genomes. Addressing discordances led to development of an improved algorithm that was 99·8% concordant for the remaining 90 MedSeq genomes. 
Additional modifications led to the final algorithm, which was 99·2% concordant across 200 INTERVAL genomes (or 99·9% after adjustment for the lower depth of coverage). By enabling more precise antigen-matching of patients with blood donors, antigen typing based on whole-genome sequencing provides a novel approach to improve transfusion outcomes with the potential to transform the practice of transfusion medicine. Funding: National Human Genome Research Institute, Doris Duke Charitable Foundation, National Health Service Blood and Transplant, National Institute for Health Research, and Wellcome Trust. Copyright © 2018 Elsevier Ltd. All rights reserved.
Garinet, Simon; Néou, Mario; de La Villéon, Bruno; Faillot, Simon; Sakat, Julien; Da Fonseca, Juliana P; Jouinot, Anne; Le Tourneau, Christophe; Kamal, Maud; Luscap-Rondof, Windy; Boeva, Valentina; Gaujoux, Sebastien; Vidaud, Michel; Pasmant, Eric; Letourneur, Franck; Bertherat, Jérôme; Assié, Guillaume
2017-09-01
Pangenomic studies identified distinct molecular classes for many cancers, with major clinical applications. However, routine use requires cost-effective assays. We assessed whether targeted next-generation sequencing (NGS) could call chromosomal alterations and DNA methylation status. A training set of 77 tumors and a validation set of 449 (43 tumor types) were analyzed by targeted NGS and single-nucleotide polymorphism (SNP) arrays. Thirty-two tumors were analyzed by NGS after bisulfite conversion, and compared to methylation array or methylation-specific multiplex ligation-dependent probe amplification. Considering allelic ratios, correlation was strong between targeted NGS and SNP arrays (r = 0.88). In contrast, considering DNA copy number, for variations of one DNA copy, correlation was weaker between read counts and SNP array (r = 0.49). Thus, we generated TARGOMICs, optimized for detecting chromosome alterations by combining allelic ratios and read counts generated by targeted NGS. Sensitivity for calling normal, lost, and gained chromosomes was 89%, 72%, and 31%, respectively. Specificity was 81%, 93%, and 98%, respectively. These results were confirmed in the validation set. Finally, TARGOMICs could efficiently align and compute proportions of methylated cytosines from bisulfite-converted DNA from targeted NGS. In conclusion, beyond calling mutations, targeted NGS efficiently calls chromosome alterations and methylation status in tumors. A single run and minor design/protocol adaptations are sufficient. Optimizing targeted NGS should expand translation of genomics to clinical routine. Copyright © 2017 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
Interest in genomic SNP testing for prostate cancer risk: a pilot survey.
Hall, Michael J; Ruth, Karen J; Chen, David Yt; Gross, Laura M; Giri, Veda N
2015-01-01
Advancements in genomic testing have led to the identification of single nucleotide polymorphisms (SNPs) associated with prostate cancer. The clinical utility of SNP tests to evaluate prostate cancer risk is unclear. Studies have not examined predictors of interest in novel genomic SNP tests for prostate cancer risk in a diverse population. Consecutive participants in the Fox Chase Prostate Cancer Risk Assessment Program (PRAP) (n = 40) and unselected men from surgical urology clinics (n = 40) completed a one-time survey. Items examined interest in genomic SNP testing for prostate cancer risk, knowledge, impact of unsolicited findings, and psychosocial factors including health literacy. Knowledge of genomic SNP tests was low in both groups, but interest was higher among PRAP men (p < 0.001). The prospect of receiving unsolicited results about ancestral genomic markers increased interest in testing in both groups. Multivariable modeling identified several predictors of higher interest in a genomic SNP test including higher perceived risk (p = 0.025), indicating zero reasons for not wanting testing (vs ≥1 reason) (p = 0.013), and higher health literacy (p = 0.016). Knowledge of genomic SNP testing was low in this sample, but higher among high-risk men. High-risk status may increase interest in novel genomic tests, while low literacy may lessen interest.
2010-01-01
Background: Thoroughbred horses have been selected for traits contributing to speed and stamina for centuries. It is widely recognized that inherited variation in physical and physiological characteristics is responsible for variation in individual aptitude for race distance, and that muscle phenotypes in particular are important. Results: A genome-wide SNP-association study for optimum racing distance was performed using the EquineSNP50 Bead Chip genotyping array in a cohort of n = 118 elite Thoroughbred racehorses divergent for race distance aptitude. In a cohort-based association test we evaluated genotypic variation at 40,977 SNPs between horses suited to short distance (≤ 8 f) and middle-long distance (> 8 f) races. The most significant SNP was located on chromosome 18: BIEC2-417495 ~690 kb from the gene encoding myostatin (MSTN) [P(unadj.) = 6.96 × 10⁻⁶]. Considering best race distance as a quantitative phenotype, a peak of association on chromosome 18 (chr18:65809482-67545806) comprising eight SNPs encompassing a 1.7 Mb region was observed. Again, similar to the cohort-based analysis, the most significant SNP was BIEC2-417495 (P(unadj.) = 1.61 × 10⁻⁹; P(Bonf.) = 6.58 × 10⁻⁵). In a candidate gene study we have previously reported a SNP (g.66493737C>T) in MSTN associated with best race distance in Thoroughbreds; however, its functional and genome-wide relevance were uncertain. Additional re-sequencing in the flanking regions of the MSTN gene revealed four novel 3' UTR SNPs and a 227 bp SINE insertion polymorphism in the 5' UTR promoter sequence. Linkage disequilibrium was highest between g.66493737C>T and BIEC2-417495 (r² = 0.86). Conclusions: Comparative association tests consistently demonstrated the g.66493737C>T SNP as the superior variant in the prediction of distance aptitude in racehorses (g.66493737C>T, P = 1.02 × 10⁻¹⁰; BIEC2-417495, P(unadj.) = 1.61 × 10⁻⁹). 
Functional investigations will be required to determine whether this polymorphism affects putative transcription-factor binding and gives rise to variation in gene and protein expression. Nonetheless, this study demonstrates that the g.66493737C>T SNP provides the most powerful genetic marker for prediction of race distance aptitude in Thoroughbreds. PMID:20932346
Hill, Emmeline W; McGivney, Beatrice A; Gu, Jingjing; Whiston, Ronan; Machugh, David E
2010-10-11
Thoroughbred horses have been selected for traits contributing to speed and stamina for centuries. It is widely recognized that inherited variation in physical and physiological characteristics is responsible for variation in individual aptitude for race distance, and that muscle phenotypes in particular are important. A genome-wide SNP-association study for optimum racing distance was performed using the EquineSNP50 Bead Chip genotyping array in a cohort of n = 118 elite Thoroughbred racehorses divergent for race distance aptitude. In a cohort-based association test we evaluated genotypic variation at 40,977 SNPs between horses suited to short distance (≤ 8 f) and middle-long distance (> 8 f) races. The most significant SNP was located on chromosome 18: BIEC2-417495 ~690 kb from the gene encoding myostatin (MSTN) [P(unadj.) = 6.96 x 10⁻⁶]. Considering best race distance as a quantitative phenotype, a peak of association on chromosome 18 (chr18:65809482-67545806) comprising eight SNPs encompassing a 1.7 Mb region was observed. Again, similar to the cohort-based analysis, the most significant SNP was BIEC2-417495 (P(unadj.) = 1.61 x 10⁻⁹; P(Bonf.) = 6.58 x 10⁻⁵). In a candidate gene study we have previously reported a SNP (g.66493737C>T) in MSTN associated with best race distance in Thoroughbreds; however, its functional and genome-wide relevance were uncertain. Additional re-sequencing in the flanking regions of the MSTN gene revealed four novel 3' UTR SNPs and a 227 bp SINE insertion polymorphism in the 5' UTR promoter sequence. Linkage disequilibrium was highest between g.66493737C>T and BIEC2-417495 (r² = 0.86). Comparative association tests consistently demonstrated the g.66493737C>T SNP as the superior variant in the prediction of distance aptitude in racehorses (g.66493737C>T, P = 1.02 x 10⁻¹⁰; BIEC2-417495, P(unadj.) = 1.61 x 10⁻⁹). 
Functional investigations will be required to determine whether this polymorphism affects putative transcription-factor binding and gives rise to variation in gene and protein expression. Nonetheless, this study demonstrates that the g.66493737C>T SNP provides the most powerful genetic marker for prediction of race distance aptitude in Thoroughbreds.
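The relationship between the unadjusted and Bonferroni-adjusted P values reported for BIEC2-417495 can be checked with simple arithmetic. The sketch below assumes the correction was applied across the 40,977 SNPs tested; the abstract does not state the exact number of tests used. Under that assumption the product is about 6.6 × 10⁻⁵, close to the reported 6.58 × 10⁻⁵, and the small gap is consistent with rounding of the unadjusted value.

```python
def bonferroni(p_unadjusted: float, n_tests: int) -> float:
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_unadjusted * n_tests)


N_SNPS = 40_977          # number of SNPs evaluated, from the abstract
P_UNADJ = 1.61e-9        # unadjusted P for BIEC2-417495, from the abstract

p_bonf = bonferroni(P_UNADJ, N_SNPS)
print(f"Bonferroni-adjusted P = {p_bonf:.3g}")
```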
Boussaha, Mekki; Michot, Pauline; Letaief, Rabia; Hozé, Chris; Fritz, Sébastien; Grohs, Cécile; Esquerré, Diane; Duchesne, Amandine; Philippe, Romain; Blanquet, Véronique; Phocas, Florence; Floriot, Sandrine; Rocha, Dominique; Klopp, Christophe; Capitan, Aurélien; Boichard, Didier
2016-11-15
In recent years, several bovine genome sequencing projects were carried out with the aim of developing genomic tools to improve dairy and beef production efficiency and sustainability. In this study, we describe the first French cattle genome variation dataset obtained by sequencing 274 whole genomes representing several major dairy and beef breeds. This dataset contains over 28 million single nucleotide polymorphisms (SNPs) and small insertions and deletions. Comparisons between sequencing results and SNP array genotypes revealed a very high genotype concordance rate, which indicates the good quality of our data. To our knowledge, this is the first large-scale catalog of small genomic variations in French dairy and beef cattle. This resource will contribute to the study of gene functions and population structure and also help to improve traits through genotype-guided selection.
Simčič, Mojca; Smetko, Anamarija; Sölkner, Johann; Seichter, Doris; Gorjanc, Gregor; Kompan, Dragomir; Medugorac, Ivica
2015-01-01
The aim of this study was to obtain unbiased estimates of the diversity parameters, the population history, and the degree of admixture in Cika cattle which represents the local admixed breeds at risk of extinction undergoing challenging conservation programs. Genetic analyses were performed on the genome-wide Single Nucleotide Polymorphism (SNP) Illumina Bovine SNP50 array data of 76 Cika animals and 531 animals from 14 reference populations. To obtain unbiased estimates we used short haplotypes spanning four markers instead of single SNPs to avoid an ascertainment bias of the BovineSNP50 array. Genome-wide haplotypes combined with partial pedigree and type trait classification show the potential to improve identification of purebred animals with a low degree of admixture. Phylogenetic analyses demonstrated unique genetic identity of Cika animals. Genetic distance matrix presented by rooted Neighbour-Net suggested long and broad phylogenetic connection between Cika and Pinzgauer. Unsupervised clustering performed by the admixture analysis and two-dimensional presentation of the genetic distances between individuals also suggest Cika is a distinct breed despite being similar in appearance to Pinzgauer. Animals identified as the most purebred could be used as a nucleus for a recovery of the native genetic background in the current admixed population. The results show that local well-adapted strains, which have never been intensively managed and differentiated into specific breeds, exhibit large haplotype diversity. They suggest a conservation and recovery approach that does not rely exclusively on the search for the original native genetic background but rather on the identification and removal of common introgressed haplotypes would be more powerful. 
Successful implementation of such an approach should be based on combining phenotype, pedigree, and genome-wide haplotype data of the breed of interest and a spectrum of reference breeds which potentially have had direct or indirect historical contribution to the genetic makeup of the breed of interest. PMID:25923207
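The idea of working with short haplotypes spanning four markers instead of single SNPs can be illustrated with a small counting routine. This is a sketch only: it assumes phased haplotypes are already available as strings of 0/1 alleles and uses non-overlapping windows, neither of which is specified in the abstract. The function name is hypothetical.

```python
from collections import Counter
from typing import List


def window_haplotype_counts(haplotypes: List[str], window: int = 4) -> List[Counter]:
    """Count distinct haplotypes in consecutive non-overlapping marker windows.

    haplotypes: phased allele strings of equal length, one per chromosome copy.
    """
    n_markers = len(haplotypes[0])
    counts = []
    for start in range(0, n_markers - window + 1, window):
        counts.append(Counter(h[start:start + window] for h in haplotypes))
    return counts


# Three invented phased chromosomes typed at eight markers.
toy = ["00110101", "00111101", "01110101"]
per_window = window_haplotype_counts(toy)
```

Each window then behaves like a multi-allelic marker, which is one way such haplotype blocks can reduce the influence of how individual array SNPs were ascertained.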
Pierson, Tyler Mark; Markello, Thomas; Accardi, John; Wolfe, Lynne; Adams, David; Sincan, Murat; Tarazi, Noor M.; Fajardo, Karin Fuentes; Cherukuri, Praveen F.; Bajraktari, Ilda; Meilleur, Katy G.; Donkervoort, Sandra; Jain, Mina; Hu, Ying; Lehky, Tanya J.; Cruz, Pedro; Mullikin, James C.; Bonnemann, Carsten; Gahl, William A.; Boerkoel, Cornelius F.; Tifft, Cynthia J.
2013-01-01
Early-onset myopathy, areflexia, respiratory distress and dysphagia (EMARDD) is a myopathic disorder associated with mutations in MEGF10. By novel analysis of SNP array hybridization and exome sequence coverage, we diagnosed a 10-year old girl with EMARDD following identification of a novel homozygous deletion of exon 7 in MEGF10. In contrast to previously reported EMARDD patients, her weakness was more prominent proximally than distally, and involved her legs more than her arms. MRI of her pelvis and thighs showed muscle atrophy and fatty replacement. Ultrasound of several muscle groups revealed dense homogenous increases in echogenicity. Cloning and sequencing of the deletion breakpoint identified features suggesting the mutation arose by fork stalling and template switching. These findings constitute the first genomic deletion causing EMARDD, expand the clinical phenotype, and provide new insight into the pattern and histology of its muscular pathology. PMID:23453856
2010-01-01
Background: The information provided by dense genome-wide markers using high throughput technology is of considerable potential in human disease studies and livestock breeding programs. Genome-wide association studies relate individual single nucleotide polymorphisms (SNP) from dense SNP panels to individual measurements of complex traits, with the underlying assumption being that any association is caused by linkage disequilibrium (LD) between SNP and quantitative trait loci (QTL) affecting the trait. Often SNP are in genomic regions of no trait variation. Whole genome Bayesian models are an effective way of incorporating this and other important prior information into modelling. However a full Bayesian analysis is often not feasible due to the large computational time involved. Results: This article proposes an expectation-maximization (EM) algorithm called emBayesB which allows only a proportion of SNP to be in LD with QTL and incorporates prior information about the distribution of SNP effects. The posterior probability of being in LD with at least one QTL is calculated for each SNP along with estimates of the hyperparameters for the mixture prior. A simulated example of genomic selection from an international workshop is used to demonstrate the features of the EM algorithm. The accuracy of prediction is comparable to a full Bayesian analysis but the EM algorithm is considerably faster. The EM algorithm was accurate in locating QTL which explained more than 1% of the total genetic variation. A computational algorithm for very large SNP panels is described. Conclusions: emBayesB is a fast and accurate EM algorithm for implementing genomic selection and predicting complex traits by mapping QTL in genome-wide dense SNP marker data. Its accuracy is similar to Bayesian methods but it takes only a fraction of the time. PMID:20969788
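To make the idea of a mixture prior with per-SNP posterior probabilities concrete, here is a deliberately simplified EM sketch. It is not emBayesB: the abstract does not give that algorithm's prior or update equations. The sketch assumes estimated SNP effects follow a two-component zero-mean Gaussian mixture with a fixed narrow "null" variance and an unknown wide "effect" variance, and it estimates the mixing proportion and the wide variance. All names and the simulated data are hypothetical.

```python
import math
import random


def normal_pdf(x: float, var: float) -> float:
    """Density of a zero-mean normal distribution with variance var."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_component(effects, null_var=1e-4, n_iter=200):
    """EM for effects ~ (1 - pi) N(0, null_var) + pi N(0, slab_var)."""
    pi, slab_var = 0.5, 1.0          # starting values
    post = []
    for _ in range(n_iter):
        # E-step: posterior probability that each effect is non-null.
        post = []
        for b in effects:
            slab = pi * normal_pdf(b, slab_var)
            null = (1.0 - pi) * normal_pdf(b, null_var)
            post.append(slab / (slab + null))
        # M-step: update the mixing proportion and the wide variance.
        total = sum(post)
        pi = total / len(effects)
        slab_var = sum(g * b * b for g, b in zip(post, effects)) / total
    return pi, slab_var, post


# Simulated effects: 950 near-zero SNPs and 50 with real effects.
random.seed(1)
effects = [random.gauss(0.0, 0.01) for _ in range(950)]
effects += [random.gauss(0.0, 1.0) for _ in range(50)]
pi_hat, slab_var_hat, post = em_two_component(effects)
```

The per-SNP values in `post` play the role that the abstract assigns to the posterior probability of being in LD with at least one QTL, while `pi_hat` corresponds to the proportion of SNPs allowed to carry an effect.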
Zhang, Linsheng; Znoyko, Iya; Costa, Luciano J; Conlin, Laura K; Daber, Robert D; Self, Sally E; Wolff, Daynna J
2011-12-01
Chronic lymphocytic leukemia (CLL) is a clinically heterogeneous disease. The methods currently used for monitoring CLL and determining conditions for treatment are limited in their ability to predict disease progression, patient survival, and response to therapy. Although clonal diversity and the acquisition of new chromosomal abnormalities during the disease course (clonal evolution) have been associated with disease progression, their prognostic potential has been underappreciated because cytogenetic and fluorescence in situ hybridization (FISH) studies have a restricted ability to detect genomic abnormalities and clonal evolution. We hypothesized that whole genome analysis using high resolution single nucleotide polymorphism (SNP) microarrays would be useful to detect diversity and infer clonal evolution to offer prognostic information. In this study, we used the Infinium Omni1 BeadChip (Illumina, San Diego, CA) array for the analysis of genetic variation and percent mosaicism in 25 non-selected CLL patients to explore the prognostic value of the assessment of clonal diversity in patients with CLL. We calculated the percentage of mosaicism for each abnormality by applying a mathematical algorithm to the genotype frequency data and by manual determination using the Simulated DNA Copy Number (SiDCoN) tool, which was developed from a computer model of mosaicism. At least one genetic abnormality was identified in each case, and the SNP data was 98% concordant with FISH results. Clonal diversity, defined as the presence of two or more genetic abnormalities with differing percentages of mosaicism, was observed in 12 patients (48%), and the diversity correlated with the disease stage. Clonal diversity was present in most cases of advanced disease (Rai stages III and IV) or those with previous treatment, whereas 9 of 13 patients without detected clonal diversity were asymptomatic or clinically stable. 
In conclusion, SNP microarray studies with simultaneous evaluation of genomic alterations and mosaic distribution of clones can be used to assess apparent clonal evolution via analysis of clonal diversity. Since clonal evolution in CLL is strongly correlated with disease progression, whole genome SNP microarray analysis provides a new comprehensive and reliable prognostic tool for CLL patients. Copyright © 2011 Elsevier Inc. All rights reserved.
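The percentage of mosaicism for an abnormality can be related to the B-allele frequency (BAF) observed at heterozygous SNPs. The sketch below uses a standard textbook model and is an assumption, not necessarily the algorithm applied in the study or implemented in SiDCoN. It takes the BAF of the over-represented allele (0.5 or above) and a fraction m of abnormal cells in an otherwise diploid sample: for a one-copy loss BAF = 1/(2 − m), for a one-copy gain BAF = (1 + m)/(2 + m), and for copy-neutral loss of heterozygosity BAF = (1 + m)/2.

```python
def mosaic_fraction(baf: float, event: str) -> float:
    """Fraction of abnormal cells implied by the BAF of the major allele.

    event: 'loss' (one-copy deletion), 'gain' (one-copy duplication)
           or 'cn_loh' (copy-neutral loss of heterozygosity).
    """
    upper = {"loss": 1.0, "gain": 2.0 / 3.0, "cn_loh": 1.0}
    if event not in upper:
        raise ValueError(f"unknown event type: {event}")
    if not 0.5 <= baf <= upper[event] + 1e-9:
        raise ValueError("BAF outside the range possible for this event")
    if event == "loss":
        return 2.0 - 1.0 / baf
    if event == "gain":
        return (2.0 * baf - 1.0) / (1.0 - baf)
    return 2.0 * baf - 1.0


examples = {
    "loss": mosaic_fraction(2.0 / 3.0, "loss"),
    "gain": mosaic_fraction(0.6, "gain"),
    "cn_loh": mosaic_fraction(0.75, "cn_loh"),
}
```

Two abnormalities whose implied fractions differ within the same sample would match the abstract's working definition of clonal diversity.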
SEURAT: visual analytics for the integrated analysis of microarray data.
Gribov, Alexander; Sill, Martin; Lück, Sonja; Rücker, Frank; Döhner, Konstanze; Bullinger, Lars; Benner, Axel; Unwin, Antony
2010-06-03
In translational cancer research, gene expression data is collected together with clinical data and genomic data arising from other chip based high throughput technologies. Software tools for the joint analysis of such high dimensional data sets together with clinical data are required. We have developed an open source software tool which provides interactive visualization capability for the integrated analysis of high-dimensional gene expression data together with associated clinical data, array CGH data and SNP array data. The different data types are organized by a comprehensive data manager. Interactive tools are provided for all graphics: heatmaps, dendrograms, barcharts, histograms, eventcharts and a chromosome browser, which displays genetic variations along the genome. All graphics are dynamic and fully linked so that any object selected in a graphic will be highlighted in all other graphics. For exploratory data analysis the software provides unsupervised data analytics like clustering, seriation algorithms and biclustering algorithms. The SEURAT software meets the growing needs of researchers to perform joint analysis of gene expression, genomic and clinical data.
A SNP resource for Douglas-fir: de novo transcriptome assembly and SNP detection and validation
2013-01-01
Background: Douglas-fir (Pseudotsuga menziesii), one of the most economically and ecologically important tree species in the world, also has one of the largest tree breeding programs. Although the coastal and interior varieties of Douglas-fir (vars. menziesii and glauca) are native to North America, the coastal variety is also widely planted for timber production in Europe, New Zealand, Australia, and Chile. Our main goal was to develop a SNP resource large enough to facilitate genomic selection in Douglas-fir breeding programs. To accomplish this, we developed a 454-based reference transcriptome for coastal Douglas-fir, annotated and evaluated the quality of the reference, identified putative SNPs, and then validated a sample of those SNPs using the Illumina Infinium genotyping platform. Results: We assembled a reference transcriptome consisting of 25,002 isogroups (unique gene models) and 102,623 singletons from 2.76 million 454 and Sanger cDNA sequences from coastal Douglas-fir. We identified 278,979 unique SNPs by mapping the 454 and Sanger sequences to the reference, and by mapping four datasets of Illumina cDNA sequences from multiple seed sources, genotypes, and tissues. The Illumina datasets represented coastal Douglas-fir (64.00 and 13.41 million reads), interior Douglas-fir (80.45 million reads), and a Yakima population similar to interior Douglas-fir (8.99 million reads). We assayed 8067 SNPs on 260 trees using an Illumina Infinium SNP genotyping array. Of these SNPs, 5847 (72.5%) were called successfully and were polymorphic. Conclusions: Based on our validation efficiency, our SNP database may contain as many as ~200,000 true SNPs, and as many as ~69,000 SNPs that could be genotyped at ~20,000 gene loci using an Infinium II array—more SNPs than are needed to use genomic selection in tree breeding programs. 
Ultimately, these genomic resources will enhance Douglas-fir breeding and allow us to better understand landscape-scale patterns of genetic variation and potential responses to climate change. PMID:23445355
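The estimate of roughly 200,000 true SNPs can be reproduced by applying the validation rate from the genotyping array to the full set of putative SNPs. The short calculation below uses only figures from the abstract and assumes the assayed SNPs were representative of the whole database; the ~69,000 and ~20,000 figures depend on further filtering that the abstract does not describe, so they are not derived here.

```python
ASSAYED = 8067           # SNPs placed on the Infinium array
VALIDATED = 5847         # called successfully and polymorphic
PUTATIVE_TOTAL = 278_979  # unique putative SNPs in the database

validation_rate = VALIDATED / ASSAYED
estimated_true_snps = validation_rate * PUTATIVE_TOTAL

print(f"validation rate: {validation_rate:.1%}")
print(f"estimated true SNPs: {estimated_true_snps:,.0f}")
```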
Wallenborn, M; Petters, O; Rudolf, D; Hantmann, H; Richter, M; Ahnert, P; Rohani, L; Smink, J J; Bulwin, G C; Krupp, W; Schulz, R M; Holland, H
2018-04-23
In the development of cell-based medicinal products, it is crucial to guarantee that the application of such an advanced therapy medicinal product (ATMP) is safe for the patients. The consensus of the European regulatory authorities is: "In conclusion, on the basis of the state of art, conventional karyotyping can be considered a valuable and useful technique to analyse chromosomal stability during preclinical studies". 408 chondrocyte samples (84 monolayers and 324 spheroids) from six patients were analysed using trypsin-Giemsa staining, spectral karyotyping and fluorescence in situ hybridisation, to evaluate the genetic stability of chondrocyte samples from non-clinical studies. Single nucleotide polymorphism (SNP) array analysis was performed on chondrocyte spheroids from five of the six donors. Applying this combination of techniques, the genetic analyses performed revealed no significant genetic instability until passage 3 in monolayer cells and interphase cells from spheroid cultures at different time points. Clonal occurrence of polyploid metaphases and endoreduplications were identified associated with prolonged cultivation time. Also, gonosomal losses were observed in chondrocyte spheroids, with increasing passage and duration of the differentiation phase. Interestingly, in one of the donors, chromosomal aberrations that are also described in extraskeletal myxoid chondrosarcoma were identified. The SNP array analysis exhibited chromosomal aberrations in two donors and copy neutral losses of heterozygosity regions in four donors. This study showed the necessity of combined genetic analyses at defined cultivation time points in quality studies within the field of cell therapy.
Tzvetkov, Mladen V; Becker, Christian; Kulle, Bettina; Nürnberg, Peter; Brockmöller, Jürgen; Wojnowski, Leszek
2005-02-01
Whole-genome DNA amplification by multiple displacement (MD-WGA) is a promising tool to obtain sufficient DNA amounts from samples of limited quantity. Using Affymetrix' GeneChip Human Mapping 10K Arrays, we investigated the accuracy and allele amplification bias in DNA samples subjected to MD-WGA. We observed an excellent concordance (99.95%) between single-nucleotide polymorphisms (SNPs) called both in the nonamplified and the corresponding amplified DNA. This concordance was only 0.01% lower than the intra-assay reproducibility of the genotyping technique used. However, MD-WGA failed to amplify an estimated 7% of polymorphic loci. Due to the algorithm used to call genotypes, this was detected only for heterozygous loci. We achieved a 4.3-fold reduction of noncalled SNPs by combining the results from two independent MD-WGA reactions. This indicated that inter-reaction variations rather than specific chromosomal loci reduced the efficiency of MD-WGA. Consistently, we detected no regions of reduced amplification, with the exception of several SNPs located near chromosomal ends. Altogether, despite a substantial loss of polymorphic sites, MD-WGA appears to be the current method of choice to amplify genomic DNA for array-based SNP analyses. The number of nonamplified loci can be substantially reduced by amplifying each DNA sample in duplicate.
Hussein, Ibtessam R; Bader, Rima S; Chaudhary, Adeel G; Bassiouni, Randa; Alquaiti, Maha; Ashgan, Fai; Schulten, Hans-Juergen; Al Qahtani, Mohammad H
2018-06-01
Congenital heart defects (CHDs) are the most common birth defects in neonatal life. CHDs may present as isolated defects or in association with developmental delay (DD) and/or other congenital malformations. A small proportion of cardiac defects are caused by chromosomal abnormalities or single gene defects; however, in a large proportion of cases no genetic diagnosis could be achieved by clinical examination and conventional genetic analysis. The development of the genome-wide array comparative genomic hybridization (array-CGH) technique allowed for the detection of cryptic chromosomal imbalances and pathogenic copy number variants (CNVs) not detected by conventional techniques. We investigated 94 patients having CHDs associated with other malformations and/or DD. Clinical examination and echocardiography were performed on all patients to evaluate the type of CHD. To investigate genomic defects we applied high-density array-CGH 2 × 400K (41 patients) and CGH/SNP microarray 2 × 400K (Agilent) (53 patients). Results were confirmed using fluorescence in situ hybridization (FISH) or qPCR in certain cases. Chromosomal abnormalities such as trisomy 18, 13, 21, microdeletions: del22q11.2, del7q11.23, del18 (p11.32; p11.21), tetrasomy 18p, trisomy 9p, del11q24-q25, add 15p, add(18)(q21.3), and der 9, 15 (q34.2; q11.2) were detected in 21/94 patients (22%) using both conventional cytogenetics methods and array-CGH technique. Cryptic chromosomal anomalies and pathogenic variants were detected in 15/73 (20.5%) cases. CNVs were observed in a large proportion of the studied samples (27/56) (48%). Clustering of variants was observed in chromosome 1p36, 1p21.1, 2q37, 3q29, 5p15, 7p22.3, 8p23, 11p15.5, 14q11.2, 15q11.2, 16p13.3, 16p11.2, 18p11, 21q22, and 22q11.2. The CGH/SNP array detected loss of heterozygosity (LOH) in different chromosomal loci in 10/25 patients. 
Array-CGH technique allowed for detection of cryptic chromosomal imbalances that could not be detected by conventional cytogenetics methods. CHDs associated with DD/congenital malformations presented with a relatively high rate of cryptic chromosomal abnormalities. Clustering of CNVs in certain genome loci needs further analysis to identify candidate genes that may provide clues for understanding the molecular pathway of cardiac development.
Hagen, Ingerid J; Billing, Anna M; Rønning, Bernt; Pedersen, Sindre A; Pärn, Henrik; Slate, Jon; Jensen, Henrik
2013-05-01
With the advent of next generation sequencing, new avenues have opened to study genomics in wild populations of non-model species. Here, we describe a successful approach to a genome-wide medium density Single Nucleotide Polymorphism (SNP) panel in a non-model species, the house sparrow (Passer domesticus), through the development of a 10 K Illumina iSelect HD BeadChip. Genomic DNA and cDNA derived from six individuals were sequenced on a 454 GS FLX system and generated a total of 1.2 million sequences, in which SNPs were detected. As no reference genome exists for the house sparrow, we used the zebra finch (Taeniopygia guttata) reference genome to determine the most likely position of each SNP. The 10 000 SNPs on the SNP-chip were selected to be distributed evenly across 31 chromosomes, giving on average one SNP per 100 000 bp. The SNP-chip was screened across 1968 individual house sparrows from four island populations. Of the original 10 000 SNPs, 7413 were found to be variable, and 99% of these SNPs were successfully called in at least 93% of all individuals. We used the SNP-chip to demonstrate the ability of such genome-wide marker data to detect population sub-division, and compared these results to similar analyses using microsatellites. The SNP-chip will be used to map Quantitative Trait Loci (QTL) for fitness-related phenotypic traits in natural populations. © 2013 Blackwell Publishing Ltd.
UPD detection using homozygosity profiling with a SNP genotyping microarray.
Papenhausen, Peter; Schwartz, Stuart; Risheg, Hiba; Keitges, Elisabeth; Gadi, Inder; Burnside, Rachel D; Jaswaney, Vikram; Pappas, John; Pasion, Romela; Friedman, Kenneth; Tepperberg, James
2011-04-01
Single nucleotide polymorphism (SNP) based chromosome microarrays provide both a high-density whole genome analysis of copy number and genotype. In the past 21 months we have analyzed over 13,000 samples primarily referred for developmental delay using the Affymetrix SNP/CN 6.0 version array platform. In addition to copy number, we have focused on the relative distribution of allele homozygosity (HZ) throughout the genome to confirm a strong association of uniparental disomy (UPD) with regions of isoallelism found in most confirmed cases of UPD. We sought to determine whether a long contiguous stretch of HZ (LCSH) greater than a threshold value found only in a single chromosome would correlate with UPD of that chromosome. Nine confirmed UPD cases were retrospectively analyzed with the array in the study, each showing the anticipated LCSH with the smallest 13.5 Mb in length. This length is well above the average longest run of HZ in a set of control patients and was then set as the prospective threshold for reporting possible UPD correlation. Ninety-two cases qualified at that threshold, 46 of those had molecular UPD testing and 29 were positive. Including retrospective cases, 16 showed complete HZ across the chromosome, consistent with total isoUPD. The average size LCSH in the 19 cases that were not completely HZ was 46.3 Mb with a range of 13.5-127.8 Mb. Three patients showed only segmental UPD. Both the size and location of the LCSH are relevant to correlation with UPD. Further studies will continue to delineate an optimal threshold for LCSH/UPD correlation. Copyright © 2011 Wiley-Liss, Inc.
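The screening rule described above, a long contiguous stretch of homozygosity (LCSH) above a threshold on a single chromosome, can be sketched as a run-length scan over genotype calls. This is an illustration under simplifying assumptions: genotypes are coded as AA, AB or BB, a single heterozygous call ends a stretch (production pipelines usually tolerate occasional miscalls), and the data layout and function names are hypothetical. Only the 13.5 Mb threshold comes from the abstract.

```python
from typing import Dict, List, Tuple

LCSH_THRESHOLD_BP = 13_500_000  # 13.5 Mb reporting threshold from the abstract


def longest_homozygous_stretch(calls: List[Tuple[int, str]]) -> int:
    """Length in bp of the longest run of homozygous calls.

    calls: (position, genotype) pairs for one chromosome, sorted by position.
    """
    best = 0
    run_start = None
    for pos, genotype in calls:
        if genotype in ("AA", "BB"):
            if run_start is None:
                run_start = pos
            best = max(best, pos - run_start)
        else:
            run_start = None  # a heterozygous call breaks the stretch
    return best


def chromosomes_over_threshold(genome: Dict[str, List[Tuple[int, str]]],
                               threshold: int = LCSH_THRESHOLD_BP) -> List[str]:
    """Chromosomes whose longest homozygous stretch meets the threshold."""
    return [chrom for chrom, calls in genome.items()
            if longest_homozygous_stretch(calls) >= threshold]


# Invented calls at 1 Mb spacing: chr7 has an 18 Mb homozygous stretch.
MB = 1_000_000
chr7 = [(i * MB, "AB" if i in (5, 25) else "AA") for i in range(31)]
chr8 = [(i * MB, "AB" if i % 3 == 0 else "BB") for i in range(31)]
flagged = chromosomes_over_threshold({"chr7": chr7, "chr8": chr8})
possible_upd = len(flagged) == 1  # LCSH confined to a single chromosome
```

Requiring that exactly one chromosome is flagged reflects the abstract's focus on stretches found only in a single chromosome, since homozygosity spread over many chromosomes points toward parental relatedness instead of UPD.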
Zhang, Zhongyang; Hao, Ke
2015-11-01
Cancer genomes exhibit profound somatic copy number alterations (SCNAs). Studying tumor SCNAs using massively parallel sequencing provides unprecedented resolution and meanwhile gives rise to new challenges in data analysis, complicated by tumor aneuploidy and heterogeneity as well as normal cell contamination. While the majority of read depth based methods utilize total sequencing depth alone for SCNA inference, the allele specific signals are undervalued. We proposed a joint segmentation and inference approach using both signals to meet some of the challenges. Our method consists of four major steps: 1) extracting read depth supporting reference and alternative alleles at each SNP/Indel locus and comparing the total read depth and alternative allele proportion between tumor and matched normal sample; 2) performing joint segmentation on the two signal dimensions; 3) correcting the copy number baseline from which the SCNA state is determined; 4) calling SCNA state for each segment based on both signal dimensions. The method is applicable to whole exome/genome sequencing (WES/WGS) as well as SNP array data in a tumor-control study. We applied the method to a dataset containing no SCNAs to test the specificity, created by pairing sequencing replicates of a single HapMap sample as normal/tumor pairs, as well as a large-scale WGS dataset consisting of 88 liver tumors along with adjacent normal tissues. Compared with representative methods, our method demonstrated improved accuracy, scalability to large cancer studies, capability in handling both sequencing and SNP array data, and the potential to improve the estimation of tumor ploidy and purity.
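Step 1 of the method, extracting two signals per locus, can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes reference and alternative read counts are already available at each SNP/indel locus, normalizes depth by each sample's total read count, and expresses the depth signal as a log2 ratio. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class LocusCounts:
    ref: int  # reads supporting the reference allele
    alt: int  # reads supporting the alternative allele

    @property
    def depth(self) -> int:
        return self.ref + self.alt

    @property
    def alt_fraction(self) -> float:
        return self.alt / self.depth if self.depth else float("nan")


def locus_signals(tumor: LocusCounts, normal: LocusCounts,
                  tumor_total: int, normal_total: int) -> Tuple[float, float, float]:
    """Return (log2 depth ratio, tumor alt fraction, normal alt fraction)."""
    tumor_norm = tumor.depth / tumor_total
    normal_norm = normal.depth / normal_total
    return math.log2(tumor_norm / normal_norm), tumor.alt_fraction, normal.alt_fraction


# Invented counts: equal depth, but the tumor shows allelic imbalance.
log_ratio, tumor_baf, normal_baf = locus_signals(
    LocusCounts(ref=30, alt=10), LocusCounts(ref=20, alt=20),
    tumor_total=1_000_000, normal_total=1_000_000,
)
```

In this toy locus the depth ratio is flat while the allele proportion shifts from 0.5 to 0.25, the kind of event (for example copy-neutral loss of heterozygosity) that total depth alone would miss and that motivates segmenting on both signal dimensions.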
Zhang, Zhongyang; Hao, Ke
2015-01-01
Cancer genomes exhibit profound somatic copy number alterations (SCNAs). Studying tumor SCNAs using massively parallel sequencing provides unprecedented resolution and meanwhile gives rise to new challenges in data analysis, complicated by tumor aneuploidy and heterogeneity as well as normal cell contamination. While the majority of read depth based methods utilize total sequencing depth alone for SCNA inference, the allele specific signals are undervalued. We proposed a joint segmentation and inference approach using both signals to meet some of the challenges. Our method consists of four major steps: 1) extracting read depth supporting reference and alternative alleles at each SNP/Indel locus and comparing the total read depth and alternative allele proportion between tumor and matched normal sample; 2) performing joint segmentation on the two signal dimensions; 3) correcting the copy number baseline from which the SCNA state is determined; 4) calling SCNA state for each segment based on both signal dimensions. The method is applicable to whole exome/genome sequencing (WES/WGS) as well as SNP array data in a tumor-control study. We applied the method to a dataset containing no SCNAs to test the specificity, created by pairing sequencing replicates of a single HapMap sample as normal/tumor pairs, as well as a large-scale WGS dataset consisting of 88 liver tumors along with adjacent normal tissues. Compared with representative methods, our method demonstrated improved accuracy, scalability to large cancer studies, capability in handling both sequencing and SNP array data, and the potential to improve the estimation of tumor ploidy and purity. PMID:26583378
Natural Allelic Diversity, Genetic Structure and Linkage Disequilibrium Pattern in Wild Chickpea
Kujur, Alice; Das, Shouvik; Badoni, Saurabh; Kumar, Vinod; Singh, Mohar; Bansal, Kailash C.; Tyagi, Akhilesh K.; Parida, Swarup K.
2014-01-01
Characterization of natural allelic diversity and understanding the genetic structure and linkage disequilibrium (LD) pattern in wild germplasm accessions by large-scale genotyping of informative microsatellite and single nucleotide polymorphism (SNP) markers is requisite to facilitate chickpea genetic improvement. Large-scale validation and high-throughput genotyping of genome-wide physically mapped 478 genic and genomic microsatellite markers and 380 transcription factor gene-derived SNP markers using gel-based assay, fluorescent dye-labelled automated fragment analyser and matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass array have been performed. Outcome revealed their high genotyping success rate (97.5%) and existence of a high level of natural allelic diversity among 94 wild and cultivated Cicer accessions. High intra- and inter-specific polymorphic potential and wider molecular diversity (11–94%) along with a broader genetic base (13–78%) specifically in the functional genic regions of wild accessions was assayed by mapped markers. It suggested their utility in monitoring introgression and transferring target trait-specific genomic (gene) regions from wild to cultivated gene pool for the genetic enhancement. Distinct species/gene pool-wise differentiation, admixed domestication pattern, and differential genome-wide recombination and LD estimates/decay observed in a six structured population of wild and cultivated accessions using mapped markers further signifies their usefulness in chickpea genetics, genomics and breeding. PMID:25222488
Pengelly, Reuben J; Tapper, William; Gibson, Jane; Knut, Marcin; Tearle, Rick; Collins, Andrew; Ennis, Sarah
2015-09-03
An understanding of linkage disequilibrium (LD) structures in the human genome underpins much of medical genetics and provides a basis for disease gene mapping and investigating biological mechanisms such as recombination and selection. Whole genome sequencing (WGS) provides the opportunity to determine LD structures at maximal resolution. We compare LD maps constructed from WGS data with LD maps produced from the array-based HapMap dataset, for representative European and African populations. WGS provides up to 5.7-fold greater SNP density than array-based data and achieves much greater resolution of LD structure, allowing for identification of up to 2.8-fold more regions of intense recombination. The absence of ascertainment bias in variant genotyping improves the population representativeness of the WGS maps, and highlights the extent of uncaptured variation using array genotyping methodologies. The complete capture of LD patterns using WGS allows for higher genome-wide association study (GWAS) power compared to array-based GWAS, with WGS also allowing for the analysis of rare variation. The impact of marker ascertainment issues in arrays has been greatest for Sub-Saharan African populations where larger sample sizes and substantially higher marker densities are required to fully resolve the LD structure. WGS provides the best possible resource for LD mapping due to the maximal marker density and lack of ascertainment bias. WGS LD maps provide a rich resource for medical and population genetics studies. The increasing availability of WGS data for large populations will allow for improved research utilising LD, such as GWAS and recombination biology studies.
Saunders, Edward J; Dadaev, Tokhir; Leongamornlert, Daniel A; Al Olama, Ali Amin; Benlloch, Sara; Giles, Graham G; Wiklund, Fredrik; Gronberg, Henrik; Haiman, Christopher A; Schleutker, Johanna; Nordestgaard, Borge G; Travis, Ruth C; Neal, David; Pasayan, Nora; Khaw, Kay-Tee; Stanford, Janet L; Blot, William J; Thibodeau, Stephen N; Maier, Christiane; Kibel, Adam S; Cybulski, Cezary; Cannon-Albright, Lisa; Brenner, Hermann; Park, Jong Y; Kaneva, Radka; Batra, Jyotsna; Teixeira, Manuel R; Pandha, Hardev; Govindasami, Koveela; Muir, Ken; Easton, Douglas F; Eeles, Rosalind A; Kote-Jarai, Zsofia
2016-04-12
Germline mutations within DNA-repair genes are implicated in susceptibility to multiple forms of cancer. For prostate cancer (PrCa), rare mutations in BRCA2 and BRCA1 give rise to moderately elevated risk, whereas two of ~100 common, low-penetrance PrCa susceptibility variants identified so far by genome-wide association studies implicate RAD51B and RAD23B. Genotype data from the iCOGS array were imputed to the 1000 Genomes phase 3 reference panel for 21 780 PrCa cases and 21 727 controls from the Prostate Cancer Association Group to Investigate Cancer Associated Alterations in the Genome (PRACTICAL) consortium. We subsequently performed single variant, gene and pathway-level analyses using 81 303 SNPs within 20 Kb of a panel of 179 DNA-repair genes. Single SNP analyses identified only the previously reported association with RAD51B. Gene-level analyses using the SKAT-C test from the SNP-set (Sequence) Kernel Association Test (SKAT) identified a significant association with PrCa for MSH5. Pathway-level analyses suggested a possible role for the translesion synthesis pathway in PrCa risk and Homologous recombination/Fanconi Anaemia pathway for PrCa aggressiveness, even though after adjustment for multiple testing these did not remain significant. MSH5 is a novel candidate gene warranting additional follow-up as a prospective PrCa-risk locus. MSH5 has previously been reported as a pleiotropic susceptibility locus for lung, colorectal and serous ovarian cancers.
Rare De Novo Copy Number Variants in Patients with Congenital Pulmonary Atresia
Xie, Li; Chen, Jin-Lan; Zhang, Wei-Zhi; Wang, Shou-Zheng; Zhao, Tian-Li; Huang, Can; Wang, Jian; Yang, Jin-Fu; Yang, Yi-Feng; Tan, Zhi-Ping
2014-01-01
Background: Ongoing studies using genomic microarrays and next-generation sequencing have demonstrated that the genetic contributions to cardiovascular diseases have been significantly ignored in the past. The aim of this study was to identify rare copy number variants in individuals with congenital pulmonary atresia (PA). Methods and Results: Based on the hypothesis that rare structural variants encompassing key genes play an important role in heart development in PA patients, we performed high-resolution genome-wide microarrays for copy number variations (CNVs) in 82 PA patient-parent trios and 189 controls with an Illumina SNP array platform. CNVs were identified in 17/82 patients (20.7%), and eight of these CNVs (9.8%) are considered potentially pathogenic. Five de novo CNVs occurred at two known congenital heart disease (CHD) loci (16p13.1 and 22q11.2). Two de novo CNVs that may affect folate and vitamin B12 metabolism were identified for the first time. A de novo 1-Mb deletion at 17p13.2 may represent a rare genomic disorder that involves mild intellectual disability and associated facial features. Conclusions: Rare CNVs contribute to the pathogenesis of PA (9.8%), suggesting that the causes of PA are heterogeneous and pleiotropic. Together with previous data from animal models, our results might help identify a link between CHD and folate-mediated one-carbon metabolism (FOCM). With the accumulation of high-resolution SNP array data, these previously undescribed rare CNVs may help reveal critical gene(s) in CHD and may provide novel insights about CHD pathogenesis. PMID:24826987
Wang, Yanan; Tang, Zhonglin; Sun, Yaqi; Wang, Hongyang; Wang, Chao; Yu, Shaobo; Liu, Jing; Zhang, Yu; Fan, Bin; Li, Kui; Liu, Bang
2014-01-01
Copy number variations (CNVs) represent a substantial source of structural variants in mammals and contribute to both normal phenotypic variability and disease susceptibility. Although low-resolution CNV maps have been produced for many domestic animals, and several reports have been published on CNVs in the porcine genome, the differences between Chinese and western pigs remain to be elucidated. In this study, we used the Porcine SNP60 BeadChip and the PennCNV algorithm to perform genome-wide CNV detection in 302 individuals from six Chinese indigenous breeds (Tongcheng, Laiwu, Luchuan, Bama, Wuzhishan and Ningxiang pigs), three western breeds (Yorkshire, Landrace and Duroc) and one hybrid (Tongcheng×Duroc). A total of 348 CNV regions (CNVRs) were identified across the genome, covering 150.49 Mb of the pig genome, or 6.14% of the autosomal genome sequence. Of these CNVRs, 213 were found only in the six Chinese indigenous breeds and 60 only in the three western breeds. The characteristics of CNVs in four Chinese normal size breeds (Luchuan, Tongcheng and Laiwu pigs) and two minipig breeds (Bama and Wuzhishan pigs) were also analyzed in this study. Functional annotation suggested that these CNVRs possess a great variety of molecular functions and may contribute to differences in phenotypic and production traits between Chinese and western breeds. Our results are an important complement to the CNV map of the pig genome, provide new information about the diversity of Chinese and western pig breeds, and facilitate further research on porcine genome CNVs. PMID:25198154
Genome-wide Target Enrichment-aided Chip Design: a 66 K SNP Chip for Cashmere Goat.
Qiao, Xian; Su, Rui; Wang, Yang; Wang, Ruijun; Yang, Ting; Li, Xiaokai; Chen, Wei; He, Shiyang; Jiang, Yu; Xu, Qiwu; Wan, Wenting; Zhang, Yaolei; Zhang, Wenguang; Chen, Jiang; Liu, Bin; Liu, Xin; Fan, Yixing; Chen, Duoyuan; Jiang, Huaizhi; Fang, Dongming; Liu, Zhihong; Wang, Xiaowen; Zhang, Yanjun; Mao, Danqing; Wang, Zhiying; Di, Ran; Zhao, Qianjun; Zhong, Tao; Yang, Huanming; Wang, Jian; Wang, Wen; Dong, Yang; Chen, Xiaoli; Xu, Xun; Li, Jinquan
2017-08-17
Compared with the commercially available single nucleotide polymorphism (SNP) chip based on the Bead Chip technology, the solution hybrid selection (SHS)-based target enrichment SNP chip is not only design-flexible, but also cost-effective for genotype sequencing. In this study, we propose, for the first time, to design an animal SNP chip using the SHS-based target enrichment strategy. As an update to the international collaboration on goat research, a 66 K SNP chip for cashmere goat was created from the whole-genome sequencing data of 73 individuals. Verification of this 66 K SNP chip with the whole-genome sequencing data of 436 cashmere goats showed that the SNP call rate was between 95.3% and 99.8%. The average sequencing depth for target SNPs was 40×. The capture regions were shown to be 200 bp that flank target SNPs. This chip was further tested in a genome-wide association analysis of cashmere fineness (fiber diameter). Several top hit loci were found to be marginally associated with signaling pathways involved in hair growth. These results demonstrate that the 66 K SNP chip is a useful tool in the genomic analyses of cashmere goats. The successful chip design shows that the SHS-based target enrichment strategy could be applied to SNP chip design in other species.
Craddock, Nick; Hurles, Matthew E; Cardin, Niall; Pearson, Richard D; Plagnol, Vincent; Robson, Samuel; Vukcevic, Damjan; Barnes, Chris; Conrad, Donald F; Giannoulatou, Eleni; Holmes, Chris; Marchini, Jonathan L; Stirrups, Kathy; Tobin, Martin D; Wain, Louise V; Yau, Chris; Aerts, Jan; Ahmad, Tariq; Andrews, T Daniel; Arbury, Hazel; Attwood, Anthony; Auton, Adam; Ball, Stephen G; Balmforth, Anthony J; Barrett, Jeffrey C; Barroso, Inês; Barton, Anne; Bennett, Amanda J; Bhaskar, Sanjeev; Blaszczyk, Katarzyna; Bowes, John; Brand, Oliver J; Braund, Peter S; Bredin, Francesca; Breen, Gerome; Brown, Morris J; Bruce, Ian N; Bull, Jaswinder; Burren, Oliver S; Burton, John; Byrnes, Jake; Caesar, Sian; Clee, Chris M; Coffey, Alison J; Connell, John M C; Cooper, Jason D; Dominiczak, Anna F; Downes, Kate; Drummond, Hazel E; Dudakia, Darshna; Dunham, Andrew; Ebbs, Bernadette; Eccles, Diana; Edkins, Sarah; Edwards, Cathryn; Elliot, Anna; Emery, Paul; Evans, David M; Evans, Gareth; Eyre, Steve; Farmer, Anne; Ferrier, I Nicol; Feuk, Lars; Fitzgerald, Tomas; Flynn, Edward; Forbes, Alistair; Forty, Liz; Franklyn, Jayne A; Freathy, Rachel M; Gibbs, Polly; Gilbert, Paul; Gokumen, Omer; Gordon-Smith, Katherine; Gray, Emma; Green, Elaine; Groves, Chris J; Grozeva, Detelina; Gwilliam, Rhian; Hall, Anita; Hammond, Naomi; Hardy, Matt; Harrison, Pile; Hassanali, Neelam; Hebaishi, Husam; Hines, Sarah; Hinks, Anne; Hitman, Graham A; Hocking, Lynne; Howard, Eleanor; Howard, Philip; Howson, Joanna M M; Hughes, Debbie; Hunt, Sarah; Isaacs, John D; Jain, Mahim; Jewell, Derek P; Johnson, Toby; Jolley, Jennifer D; Jones, Ian R; Jones, Lisa A; Kirov, George; Langford, Cordelia F; Lango-Allen, Hana; Lathrop, G Mark; Lee, James; Lee, Kate L; Lees, Charlie; Lewis, Kevin; Lindgren, Cecilia M; Maisuria-Armer, Meeta; Maller, Julian; Mansfield, John; Martin, Paul; Massey, Dunecan C O; McArdle, Wendy L; McGuffin, Peter; McLay, Kirsten E; Mentzer, Alex; Mimmack, Michael L; Morgan, Ann E; Morris, Andrew P; 
Mowat, Craig; Myers, Simon; Newman, William; Nimmo, Elaine R; O'Donovan, Michael C; Onipinla, Abiodun; Onyiah, Ifejinelo; Ovington, Nigel R; Owen, Michael J; Palin, Kimmo; Parnell, Kirstie; Pernet, David; Perry, John R B; Phillips, Anne; Pinto, Dalila; Prescott, Natalie J; Prokopenko, Inga; Quail, Michael A; Rafelt, Suzanne; Rayner, Nigel W; Redon, Richard; Reid, David M; Renwick; Ring, Susan M; Robertson, Neil; Russell, Ellie; St Clair, David; Sambrook, Jennifer G; Sanderson, Jeremy D; Schuilenburg, Helen; Scott, Carol E; Scott, Richard; Seal, Sheila; Shaw-Hawkins, Sue; Shields, Beverley M; Simmonds, Matthew J; Smyth, Debbie J; Somaskantharajah, Elilan; Spanova, Katarina; Steer, Sophia; Stephens, Jonathan; Stevens, Helen E; Stone, Millicent A; Su, Zhan; Symmons, Deborah P M; Thompson, John R; Thomson, Wendy; Travers, Mary E; Turnbull, Clare; Valsesia, Armand; Walker, Mark; Walker, Neil M; Wallace, Chris; Warren-Perry, Margaret; Watkins, Nicholas A; Webster, John; Weedon, Michael N; Wilson, Anthony G; Woodburn, Matthew; Wordsworth, B Paul; Young, Allan H; Zeggini, Eleftheria; Carter, Nigel P; Frayling, Timothy M; Lee, Charles; McVean, Gil; Munroe, Patricia B; Palotie, Aarno; Sawcer, Stephen J; Scherer, Stephen W; Strachan, David P; Tyler-Smith, Chris; Brown, Matthew A; Burton, Paul R; Caulfield, Mark J; Compston, Alastair; Farrall, Martin; Gough, Stephen C L; Hall, Alistair S; Hattersley, Andrew T; Hill, Adrian V S; Mathew, Christopher G; Pembrey, Marcus; Satsangi, Jack; Stratton, Michael R; Worthington, Jane; Deloukas, Panos; Duncanson, Audrey; Kwiatkowski, Dominic P; McCarthy, Mark I; Ouwehand, Willem; Parkes, Miles; Rahman, Nazneen; Todd, John A; Samani, Nilesh J; Donnelly, Peter
2010-04-01
Copy number variants (CNVs) account for a major proportion of human genetic polymorphism and have been predicted to have an important role in genetic susceptibility to common disease. To address this we undertook a large, direct genome-wide study of association between CNVs and eight common human diseases. Using a purpose-designed array we typed approximately 19,000 individuals into distinct copy-number classes at 3,432 polymorphic CNVs, including an estimated 50% of all common CNVs larger than 500 base pairs. We identified several biological artefacts that lead to false-positive associations, including systematic CNV differences between DNAs derived from blood and cell lines. Association testing and follow-up replication analyses confirmed three loci where CNVs were associated with disease (IRGM for Crohn's disease; HLA for Crohn's disease, rheumatoid arthritis and type 1 diabetes; and TSPAN8 for type 2 diabetes), although in each case the locus had previously been identified in single nucleotide polymorphism (SNP)-based studies, reflecting our observation that most common CNVs that are well-typed on our array are well tagged by SNPs and so have been indirectly explored through SNP studies. We conclude that common CNVs that can be typed on existing platforms are unlikely to contribute greatly to the genetic basis of common human diseases.
Comprehensive comparison of three commercial human whole-exome capture platforms.
Asan; Xu, Yu; Jiang, Hui; Tyler-Smith, Chris; Xue, Yali; Jiang, Tao; Wang, Jiawei; Wu, Mingzhi; Liu, Xiao; Tian, Geng; Wang, Jun; Wang, Jian; Yang, Huangming; Zhang, Xiuqing
2011-09-28
Exome sequencing, which allows the global analysis of protein coding sequences in the human genome, has become an effective and affordable approach to detecting causative genetic mutations in diseases. Currently, there are several commercial human exome capture platforms; however, the relative performances of these have not been characterized sufficiently to know which is best for a particular study. We comprehensively compared three platforms: NimbleGen's Sequence Capture Array and SeqCap EZ, and Agilent's SureSelect. We assessed their performance in a variety of ways, including number of genes covered and capture efficacy. Differences that may impact on the choice of platform were that Agilent SureSelect covered approximately 1,100 more genes, while NimbleGen provided better flanking sequence capture. Although all three platforms achieved similar capture specificity of targeted regions, the NimbleGen platforms showed better uniformity of coverage and greater genotype sensitivity at 30- to 100-fold sequencing depth. All three platforms showed similar power in exome SNP calling, including medically relevant SNPs. Compared with genotyping and whole-genome sequencing data, the three platforms achieved a similar accuracy of genotype assignment and SNP detection. Importantly, all three platforms showed similar levels of reproducibility, GC bias and reference allele bias. We demonstrate key differences between the three platforms, particularly advantages of solutions over array capture and the importance of a large gene target set.
Peng, Wenzhu; Xu, Jian; Zhang, Yan; Feng, Jianxin; Dong, Chuanju; Jiang, Likun; Feng, Jingyan; Chen, Baohua; Gong, Yiwen; Chen, Lin; Xu, Peng
2016-01-01
High-density genetic linkage maps are essential for QTL fine mapping, comparative genomics and high-quality genome sequence assembly. In this study, we constructed a high-density and high-resolution genetic linkage map with 28,194 SNP markers on 14,146 distinct loci for common carp based on high-throughput genotyping with the carp 250 K single nucleotide polymorphism (SNP) array in a mapping family. The genetic length of the consensus map was 10,595.94 cM with an average locus interval of 0.75 cM and an average marker interval of 0.38 cM. Comparative genomic analysis revealed a high level of conserved synteny between common carp and the closely related model species zebrafish and medaka. The genome scaffolds were anchored to the high-density linkage map, spanning 1,357 Mb of the common carp reference genome. QTL mapping and association analysis identified 22 QTLs for growth-related traits and 7 QTLs for sex dimorphism. Candidate genes underlying growth-related traits were identified, including important regulators such as KISS2, IGF1, SMTLB, NPFFR1 and CPE. Candidate genes associated with sex dimorphism were also identified, including 3KSR and DMRT2b. The high-density and high-resolution genetic linkage map provides an important tool for QTL fine mapping and positional cloning of economically important traits, and for improving the common carp genome assembly. PMID:27225429
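The average intervals reported for the carp linkage map can be checked with simple arithmetic. The sketch below assumes the averages were obtained by dividing total map length by the number of loci or markers; the abstract does not state the formula, and a per-linkage-group count of intervals would give slightly different values. The function name `mean_interval_cm` is hypothetical.

```python
def mean_interval_cm(map_length_cm: float, n_points: int) -> float:
    """Average spacing in cM, assuming total map length divided by the number of points."""
    if n_points <= 0:
        raise ValueError("n_points must be positive")
    return map_length_cm / n_points


MAP_LENGTH_CM = 10595.94   # consensus map length from the abstract
N_LOCI = 14146             # distinct loci
N_MARKERS = 28194          # SNP markers

print(round(mean_interval_cm(MAP_LENGTH_CM, N_LOCI), 2))     # 0.75
print(round(mean_interval_cm(MAP_LENGTH_CM, N_MARKERS), 2))  # 0.38
```

Both values agree with the 0.75 cM locus interval and 0.38 cM marker interval given in the abstract.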
Pavy, Nathalie; Parsons, Lee S; Paule, Charles; MacKay, John; Bousquet, Jean
2006-01-01
Background: High-throughput genotyping technologies represent a highly efficient way to accelerate genetic mapping and enable association studies. As a first step toward this goal, we aimed to develop a resource of candidate Single Nucleotide Polymorphisms (SNP) in white spruce (Picea glauca [Moench] Voss), a softwood tree of major economic importance. Results: A white spruce SNP resource encompassing 12,264 SNPs was constructed from a set of 6,459 contigs derived from Expressed Sequence Tags (EST) and by using the Bayesian-based statistical software PolyBayes. Several parameters influencing the SNP prediction were analysed including the a priori expected polymorphism, the probability score (PSNP), and the contig depth and length. SNP detection in 3' and 5' reads from the same clones revealed a level of inconsistency between overlapping sequences as low as 1%. A subset of 245 predicted SNPs were verified through the independent resequencing of genomic DNA of a genotype also used to prepare cDNA libraries. The validation rate reached a maximum of 85% for SNPs predicted with either PSNP ≥ 0.95 or ≥ 0.99. A total of 9,310 SNPs were detected by using PSNP ≥ 0.95 as a criterion. The SNPs were distributed among 3,590 contigs encompassing an array of broad functional categories, with an overall frequency of 1 SNP per 700 nucleotide sites. Experimental and statistical approaches were used to evaluate the proportion of paralogous SNPs, with estimates in the range of 8 to 12%. The 3,789 coding SNPs identified through coding region annotation and ORF prediction were distributed into 39% nonsynonymous and 61% synonymous substitutions. Overall, there were 0.9 SNPs per 1,000 nonsynonymous sites and 5.2 SNPs per 1,000 synonymous sites, for a genome-wide nonsynonymous to synonymous substitution rate ratio (Ka/Ks) of 0.17. Conclusion: We integrated the SNP data in the ForestTreeDB database along with functional annotations to provide a tool facilitating the choice of candidate genes for mapping purposes or association studies. PMID:16824208
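The Ka/Ks value quoted in the white spruce abstract follows directly from the two per-site SNP densities. The sketch below simply takes their ratio; it assumes the reported 0.17 is that plain ratio with no further correction (for example for multiple substitutions), which the abstract does not specify. The function name `ka_ks_ratio` is hypothetical.

```python
def ka_ks_ratio(nonsyn_per_site: float, syn_per_site: float) -> float:
    """Ratio of nonsynonymous to synonymous substitution densities."""
    if syn_per_site <= 0:
        raise ValueError("synonymous density must be positive")
    return nonsyn_per_site / syn_per_site


ka = 0.9 / 1000  # SNPs per nonsynonymous site, from the abstract
ks = 5.2 / 1000  # SNPs per synonymous site, from the abstract

print(round(ka_ks_ratio(ka, ks), 2))  # 0.17
```

A ratio well below 1, as here, is the usual signature of purifying selection acting on coding sequence.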
Farmer, Andrew D.; Huang, Wei; Ambachew, Daniel; Penmetsa, R. Varma; Carrasquilla-Garcia, Noelia; Assefa, Teshale; Cannon, Steven B.
2018-01-01
Recombination (R) rate and linkage disequilibrium (LD) analyses are the basis for plant breeding. These vary by breeding system, by generation of inbreeding or outcrossing and by region in the chromosome. Common bean (Phaseolus vulgaris L.) is a favored food legume with a small sequenced genome (514 Mb) and n = 11 chromosomes. The goal of this study was to describe R and LD in the common bean genome using a 768-marker array of single nucleotide polymorphisms (SNP) based on Trans-legume Orthologous Group (TOG) genes along with an advanced-generation Recombinant Inbred Line reference mapping population (BAT93 x Jalo EEP558) and an internationally available diversity panel. A whole genome genetic map was created that covered all eleven linkage groups (LG). The LGs were linked to the physical map by sequence data of the TOGs compared to each chromosome sequence of common bean. The genetic map length in total was smaller than for previous maps reflecting the precision of allele calling and mapping with SNP technology as well as the use of gene-based markers. A total of 91.4% of TOG markers had singleton hits with annotated Pv genes and all mapped outside of regions of resistance gene clusters. LD levels were found to be stronger within the Mesoamerican genepool and decay more rapidly within the Andean genepool. The recombination rate across the genome was 2.13 cM / Mb but R was found to be highly repressed around centromeres and frequent outside peri-centromeric regions. These results have important implications for association and genetic mapping or crop improvement in common bean. PMID:29522524
Elmore, James R; Obmann, Melissa A; Kuivaniemi, Helena; Tromp, Gerard; Gerhard, Glenn S; Franklin, David P; Boddy, Amy M; Carey, David J
2009-06-01
The goal of this project was to identify genetic variants associated with abdominal aortic aneurysms (AAAs). A genome wide association study was carried out using pooled DNA samples from 123 AAA cases and 112 controls matched for age, gender, and smoking history using Affymetrix 500K single nucleotide polymorphism (SNP) arrays (Affymetrix, Inc, Santa Clara, Calif). The difference in mean allele frequency between cases and controls was calculated for each SNP and used to identify candidate genomic regions. Association of candidate SNPs with AAA was confirmed by individual TaqMan genotype assays in a total of 2096 cases and controls that included an independent replication sample set. A genome wide association study of AAA cases and controls identified a candidate AAA-associated haplotype on chromosome 3p12.3. By individual genotype analysis, four SNPs in this region were significantly associated with AAA in cases and controls from the original study population. One SNP in this region (rs7635818) was genotyped in a total of 502 cases and 736 controls from the original study population (P = .017) and 448 cases and 410 controls from an independent replication sample (P = .013; combined P value = .0028; combined odds ratio [OR] = 1.33). An even stronger association with AAA was observed in a subset of smokers (391 cases, 241 controls, P = .00041, OR = 1.80), which represent the highest risk group for AAA. The AAA-associated haplotype is located approximately 200 kbp upstream of the CNTN3 gene transcription start site. This study identifies a region on chromosome 3 that is significantly associated with AAA in 2 distinct study populations.
Hohenlohe, Paul A.; Day, Mitch D.; Amish, Stephen J.; Miller, Michael R.; Kamps-Hughes, Nick; Boyer, Matthew C.; Muhlfeld, Clint C.; Allendorf, Fred W.; Johnson, Eric A.; Luikart, Gordon
2013-01-01
Rapid and inexpensive methods for genomewide single nucleotide polymorphism (SNP) discovery and genotyping are urgently needed for population management and conservation. In hybridized populations, genomic techniques that can identify and genotype thousands of species-diagnostic markers would allow precise estimates of population- and individual-level admixture as well as identification of 'super invasive' alleles, which show elevated rates of introgression above the genomewide background (likely due to natural selection). Techniques like restriction-site-associated DNA (RAD) sequencing can discover and genotype large numbers of SNPs, but they have been limited by the length of continuous sequence data they produce with Illumina short-read sequencing. We present a novel approach, overlapping paired-end RAD sequencing, to generate RAD contigs of >300–400 bp. These contigs provide sufficient flanking sequence for design of high-throughput SNP genotyping arrays and strict filtering to identify duplicate paralogous loci. We applied this approach in five populations of native westslope cutthroat trout that previously showed varying (low) levels of admixture from introduced rainbow trout (RBT). We produced 77 141 RAD contigs and used these data to filter and genotype 3180 previously identified species-diagnostic SNP loci. Our population-level and individual-level estimates of admixture were generally consistent with previous microsatellite-based estimates from the same individuals. However, we observed slightly lower admixture estimates from genomewide markers, which might result from natural selection against certain genome regions, different genomic locations for microsatellites vs. RAD-derived SNPs and/or sampling error from the small number of microsatellite loci (n = 7). We also identified candidate adaptive super invasive alleles from RBT that had excessively high admixture proportions in hybridized cutthroat trout populations.
Variation in Recombination Rate and Its Genetic Determinism in Sheep Populations
Petit, Morgane; Astruc, Jean-Michel; Sarry, Julien; Drouilhet, Laurence; Fabre, Stéphane; Moreno, Carole R.; Servin, Bertrand
2017-01-01
Recombination is a complex biological process that results from a cascade of multiple events during meiosis. Understanding the genetic determinism of recombination can help to understand if and how these events are interacting. To tackle this question, we studied the patterns of recombination in sheep, using multiple approaches and data sets. We constructed male recombination maps in a dairy breed from the south of France (the Lacaune breed) at a fine scale by combining meiotic recombination rates from a large pedigree genotyped with a 50K SNP array and historical recombination rates from a sample of unrelated individuals genotyped with a 600K SNP array. This analysis revealed recombination patterns in sheep similar to other mammals but also genome regions that have likely been affected by directional and diversifying selection. We estimated the average recombination rate of Lacaune sheep at 1.5 cM/Mb, identified ∼50,000 crossover hotspots on the genome, and found a high correlation between historical and meiotic recombination rate estimates. A genome-wide association study revealed two major loci affecting interindividual variation in recombination rate in Lacaune, including the RNF212 and HEI10 genes and possibly two other loci of smaller effects including the KCNJ15 and FSHR genes. The comparison of these new results to those obtained previously in a distantly related population of domestic sheep (the Soay) revealed that Soay and Lacaune males have a very similar distribution of recombination along the genome. The two data sets were thus combined to create more precise male meiotic recombination maps in sheep. However, despite their similar recombination maps, Soay and Lacaune males were found to exhibit different heritabilities and QTL effects for interindividual variation in genome-wide recombination rates. This highlights the robustness of recombination patterns to underlying variation in their genetic determinism. PMID:28978774
High-density genetic mapping identifies new susceptibility loci for rheumatoid arthritis.
Eyre, Steve; Bowes, John; Diogo, Dorothée; Lee, Annette; Barton, Anne; Martin, Paul; Zhernakova, Alexandra; Stahl, Eli; Viatte, Sebastien; McAllister, Kate; Amos, Christopher I; Padyukov, Leonid; Toes, Rene E M; Huizinga, Tom W J; Wijmenga, Cisca; Trynka, Gosia; Franke, Lude; Westra, Harm-Jan; Alfredsson, Lars; Hu, Xinli; Sandor, Cynthia; de Bakker, Paul I W; Davila, Sonia; Khor, Chiea Chuen; Heng, Khai Koon; Andrews, Robert; Edkins, Sarah; Hunt, Sarah E; Langford, Cordelia; Symmons, Deborah; Concannon, Pat; Onengut-Gumuscu, Suna; Rich, Stephen S; Deloukas, Panos; Gonzalez-Gay, Miguel A; Rodriguez-Rodriguez, Luis; Ärlsetig, Lisbeth; Martin, Javier; Rantapää-Dahlqvist, Solbritt; Plenge, Robert M; Raychaudhuri, Soumya; Klareskog, Lars; Gregersen, Peter K; Worthington, Jane
2012-12-01
Using the Immunochip custom SNP array, which was designed for dense genotyping of 186 loci identified through genome-wide association studies (GWAS), we analyzed 11,475 individuals with rheumatoid arthritis (cases) of European ancestry and 15,870 controls for 129,464 markers. We combined these data in a meta-analysis with GWAS data from additional independent cases (n = 2,363) and controls (n = 17,872). We identified 14 new susceptibility loci, 9 of which were associated with rheumatoid arthritis overall and five of which were specifically associated with disease that was positive for anticitrullinated peptide antibodies, bringing the number of confirmed rheumatoid arthritis risk loci in individuals of European ancestry to 46. We refined the peak of association to a single gene for 19 loci, identified secondary independent effects at 6 loci and identified association to low-frequency variants at 4 loci. Bioinformatic analyses generated strong hypotheses for the causal SNP at seven loci. This study illustrates the advantages of dense SNP mapping analysis to inform subsequent functional investigations.
High density genetic mapping identifies new susceptibility loci for rheumatoid arthritis
Eyre, Steve; Bowes, John; Diogo, Dorothée; Lee, Annette; Barton, Anne; Martin, Paul; Zhernakova, Alexandra; Stahl, Eli; Viatte, Sebastien; McAllister, Kate; Amos, Christopher I.; Padyukov, Leonid; Toes, Rene E.M.; Huizinga, Tom W.J.; Wijmenga, Cisca; Trynka, Gosia; Franke, Lude; Westra, Harm-Jan; Alfredsson, Lars; Hu, Xinli; Sandor, Cynthia; de Bakker, Paul I.W.; Davila, Sonia; Khor, Chiea Chuen; Heng, Khai Koon; Andrews, Robert; Edkins, Sarah; Hunt, Sarah E; Langford, Cordelia; Symmons, Deborah; Concannon, Pat; Onengut-Gumuscu, Suna; Rich, Stephen S; Deloukas, Panos; Gonzalez-Gay, Miguel A.; Rodriguez-Rodriguez, Luis; Ärlsetig, Lisbeth; Martin, Javier; Rantapää-Dahlqvist, Solbritt; Plenge, Robert; Raychaudhuri, Soumya; Klareskog, Lars; Gregersen, Peter K; Worthington, Jane
2012-01-01
Using the Immunochip custom single nucleotide polymorphism (SNP) array, designed for dense genotyping of 186 loci confirmed by genome-wide association studies (GWAS), we analysed 11,475 rheumatoid arthritis cases of European ancestry and 15,870 controls for 129,464 markers. The data were combined in a meta-analysis with GWAS data from additional independent cases (n=2,363) and controls (n=17,872). We identified 14 novel loci; nine were associated with rheumatoid arthritis overall and five specifically with anti-citrullinated peptide antibody positive disease, bringing the number of confirmed European ancestry rheumatoid arthritis loci to 46. We refined the peak of association to a single gene for 19 loci, identified secondary independent effects at six loci and association to low frequency variants (minor allele frequency <0.05) at four loci. Bioinformatic analysis of the data generated strong hypotheses for the causal SNP at seven loci. This study illustrates the advantages of dense SNP mapping analysis to inform subsequent functional investigations. PMID:23143596
Fox, Ervin R.; Young, J. Hunter; Li, Yali; Dreisbach, Albert W.; Keating, Brendan J.; Musani, Solomon K.; Liu, Kiang; Morrison, Alanna C.; Ganesh, Santhi; Kutlar, Abdullah; Ramachandran, Vasan S.; Polak, Josef F.; Fabsitz, Richard R.; Dries, Daniel L.; Farlow, Deborah N.; Redline, Susan; Adeyemo, Adebowale; Hirschorn, Joel N.; Sun, Yan V.; Wyatt, Sharon B.; Penman, Alan D.; Palmas, Walter; Rotter, Jerome I.; Townsend, Raymond R.; Doumatey, Ayo P.; Tayo, Bamidele O.; Mosley, Thomas H.; Lyon, Helen N.; Kang, Sun J.; Rotimi, Charles N.; Cooper, Richard S.; Franceschini, Nora; Curb, J. David; Martin, Lisa W.; Eaton, Charles B.; Kardia, Sharon L.R.; Taylor, Herman A.; Caulfield, Mark J.; Ehret, Georg B.; Johnson, Toby; Chakravarti, Aravinda; Zhu, Xiaofeng; Levy, Daniel; Munroe, Patricia B.; Rice, Kenneth M.; Bochud, Murielle; Johnson, Andrew D.; Chasman, Daniel I.; Smith, Albert V.; Tobin, Martin D.; Verwoert, Germaine C.; Hwang, Shih-Jen; Pihur, Vasyl; Vollenweider, Peter; O'Reilly, Paul F.; Amin, Najaf; Bragg-Gresham, Jennifer L.; Teumer, Alexander; Glazer, Nicole L.; Launer, Lenore; Zhao, Jing Hua; Aulchenko, Yurii; Heath, Simon; Sõber, Siim; Parsa, Afshin; Luan, Jian'an; Arora, Pankaj; Dehghan, Abbas; Zhang, Feng; Lucas, Gavin; Hicks, Andrew A.; Jackson, Anne U.; Peden, John F.; Tanaka, Toshiko; Wild, Sarah H.; Rudan, Igor; Igl, Wilmar; Milaneschi, Yuri; Parker, Alex N.; Fava, Cristiano; Chambers, John C.; Kumari, Meena; JinGo, Min; van der Harst, Pim; Kao, Wen Hong Linda; Sjögren, Marketa; Vinay, D.G.; Alexander, Myriam; Tabara, Yasuharu; Shaw-Hawkins, Sue; Whincup, Peter H.; Liu, Yongmei; Shi, Gang; Kuusisto, Johanna; Seielstad, Mark; Sim, Xueling; Nguyen, Khanh-Dung Hoang; Lehtimäki, Terho; Matullo, Giuseppe; Wu, Ying; Gaunt, Tom R.; Charlotte Onland-Moret, N.; Cooper, Matthew N.; Platou, Carl G.P.; Org, Elin; Hardy, Rebecca; Dahgam, Santosh; Palmen, Jutta; Vitart, Veronique; Braund, Peter S.; Kuznetsova, Tatiana; Uiterwaal, Cuno S.P.M.; Campbell, Harry; Ludwig, 
Barbara; Tomaszewski, Maciej; Tzoulaki, Ioanna; Palmer, Nicholette D.; Aspelund, Thor; Garcia, Melissa; Chang, Yen-Pei C.; O'Connell, Jeffrey R.; Steinle, Nanette I.; Grobbee, Diederick E.; Arking, Dan E.; Hernandez, Dena; Najjar, Samer; McArdle, Wendy L.; Hadley, David; Brown, Morris J.; Connell, John M.; Hingorani, Aroon D.; Day, Ian N.M.; Lawlor, Debbie A.; Beilby, John P.; Lawrence, Robert W.; Clarke, Robert; Collins, Rory; Hopewell, Jemma C.; Ongen, Halit; Bis, Joshua C.; Kähönen, Mika; Viikari, Jorma; Adair, Linda S.; Lee, Nanette R.; Chen, Ming-Huei; Olden, Matthias; Pattaro, Cristian; Hoffman Bolton, Judith A.; Köttgen, Anna; Bergmann, Sven; Mooser, Vincent; Chaturvedi, Nish; Frayling, Timothy M.; Islam, Muhammad; Jafar, Tazeen H.; Erdmann, Jeanette; Kulkarni, Smita R.; Bornstein, Stefan R.; Grässler, Jürgen; Groop, Leif; Voight, Benjamin F.; Kettunen, Johannes; Howard, Philip; Taylor, Andrew; Guarrera, Simonetta; Ricceri, Fulvio; Emilsson, Valur; Plump, Andrew; Barroso, Inês; Khaw, Kay-Tee; Weder, Alan B.; Hunt, Steven C.; Bergman, Richard N.; Collins, Francis S.; Bonnycastle, Lori L.; Scott, Laura J.; Stringham, Heather M.; Peltonen, Leena; Perola, Markus; Vartiainen, Erkki; Brand, Stefan-Martin; Staessen, Jan A.; Wang, Thomas J.; Burton, Paul R.; SolerArtigas, Maria; Dong, Yanbin; Snieder, Harold; Wang, Xiaoling; Zhu, Haidong; Lohman, Kurt K.; Rudock, Megan E.; Heckbert, Susan R.; Smith, Nicholas L.; Wiggins, Kerri L.; Shriner, Daniel; Veldre, Gudrun; Viigimaa, Margus; Kinra, Sanjay; Prabhakaran, Dorairajan; Tripathy, Vikal; Langefeld, Carl D.; Rosengren, Annika; Thelle, Dag S.; MariaCorsi, Anna; Singleton, Andrew; Forrester, Terrence; Hilton, Gina; McKenzie, Colin A.; Salako, Tunde; Iwai, Naoharu; Kita, Yoshikuni; Ogihara, Toshio; Ohkubo, Takayoshi; Okamura, Tomonori; Ueshima, Hirotsugu; Umemura, Satoshi; Eyheramendy, Susana; Meitinger, Thomas; Wichmann, H.-Erich; Cho, Yoon Shin; Kim, Hyung-Lae; Lee, Jong-Young; Scott, James; Sehmi, Joban S.; Zhang, 
Weihua; Hedblad, Bo; Nilsson, Peter; Smith, George Davey; Wong, Andrew; Narisu, Narisu; Stančáková, Alena; Raffel, Leslie J.; Yao, Jie; Kathiresan, Sekar; O'Donnell, Chris; Schwartz, Steven M.; Arfan Ikram, M.; Longstreth, Will T.; Seshadri, Sudha; Shrine, Nick R.G.; Wain, Louise V.; Morken, Mario A.; Swift, Amy J.; Laitinen, Jaana; Prokopenko, Inga; Zitting, Paavo; Cooper, Jackie A.; Humphries, Steve E.; Danesh, John; Rasheed, Asif; Goel, Anuj; Hamsten, Anders; Watkins, Hugh; Bakker, Stephan J.L.; van Gilst, Wiek H.; Janipalli, Charles S.; Radha Mani, K.; Yajnik, Chittaranjan S.; Hofman, Albert; Mattace-Raso, Francesco U.S.; Oostra, Ben A.; Demirkan, Ayse; Isaacs, Aaron; Rivadeneira, Fernando; Lakatta, Edward G.; Orru, Marco; Scuteri, Angelo; Ala-Korpela, Mika; Kangas, Antti J.; Lyytikäinen, Leo-Pekka; Soininen, Pasi; Tukiainen, Taru; Würz, Peter; Twee-Hee Ong, Rick; Dörr, Marcus; Kroemer, Heyo K.; Völker, Uwe; Völzke, Henry; Galan, Pilar; Hercberg, Serge; Lathrop, Mark; Zelenika, Diana; Deloukas, Panos; Mangino, Massimo; Spector, Tim D.; Zhai, Guangju; Meschia, James F.; Nalls, Michael A.; Sharma, Pankaj; Terzic, Janos; Kranthi Kumar, M.J.; Denniff, Matthew; Zukowska-Szczechowska, Ewa; Wagenknecht, Lynne E.; Fowkes, Gerald R.; Charchar, Fadi J.; Schwarz, Peter E.H.; Hayward, Caroline; Guo, Xiuqing; Bots, Michiel L.; Brand, Eva; Samani, Nilesh J.; Polasek, Ozren; Talmud, Philippa J.; Nyberg, Fredrik; Kuh, Diana; Laan, Maris; Hveem, Kristian; Palmer, Lyle J.; van der Schouw, Yvonne T.; Casas, Juan P.; Mohlke, Karen L.; Vineis, Paolo; Raitakari, Olli; Wong, Tien Y.; Shyong Tai, E.; Laakso, Markku; Rao, Dabeeru C.; Harris, Tamara B.; Morris, Richard W.; Dominiczak, Anna F.; Kivimaki, Mika; Marmot, Michael G.; Miki, Tetsuro; Saleheen, Danish; Chandak, Giriraj R.; Coresh, Josef; Navis, Gerjan; Salomaa, Veikko; Han, Bok-Ghee; Kooner, Jaspal S.; Melander, Olle; Ridker, Paul M.; Bandinelli, Stefania; Gyllensten, Ulf B.; Wright, Alan F.; Wilson, James F.; Ferrucci, Luigi; 
Farrall, Martin; Tuomilehto, Jaakko; Pramstaller, Peter P.; Elosua, Roberto; Soranzo, Nicole; Sijbrands, Eric J.G.; Altshuler, David; Loos, Ruth J.F.; Shuldiner, Alan R.; Gieger, Christian; Meneton, Pierre; Uitterlinden, Andre G.; Wareham, Nicholas J.; Gudnason, Vilmundur; Rettig, Rainer; Uda, Manuela; Strachan, David P.; Witteman, Jacqueline C.M.; Hartikainen, Anna-Liisa; Beckmann, Jacques S.; Boerwinkle, Eric; Boehnke, Michael; Larson, Martin G.; Järvelin, Marjo-Riitta; Psaty, Bruce M.; Abecasis, Gonçalo R.; Elliott, Paul; van Duijn, Cornelia M.; Newton-Cheh, Christopher
2011-01-01
The prevalence of hypertension in African Americans (AAs) is higher than in other US groups; yet, few genome-wide association studies (GWASs) have been performed in AAs. Among people of European descent, GWASs have identified genetic variants at 13 loci that are associated with blood pressure. It is unknown if these variants confer susceptibility in people of African ancestry. Here, we examined genome-wide and candidate gene associations with systolic blood pressure (SBP) and diastolic blood pressure (DBP) using the Candidate Gene Association Resource (CARe) consortium consisting of 8591 AAs. Genotypes included genome-wide single-nucleotide polymorphism (SNP) data utilizing the Affymetrix 6.0 array with imputation to 2.5 million HapMap SNPs and candidate gene SNP data utilizing a 50K cardiovascular gene-centric array (ITMAT-Broad-CARe [IBC] array). For Affymetrix data, the strongest signal for DBP was rs10474346 (P = 3.6 × 10⁻⁸) located near GPR98 and ARRDC3. For SBP, the strongest signal was rs2258119 in C21orf91 (P = 4.7 × 10⁻⁸). The top IBC association for SBP was rs2012318 (P = 6.4 × 10⁻⁶) near SLC25A42 and for DBP was rs2523586 (P = 1.3 × 10⁻⁶) near HLA-B. None of the top variants replicated in additional AA (n = 11 882) or European-American (n = 69 899) cohorts. We replicated previously reported European-American blood pressure SNPs in our AA samples (SH2B3, P = 0.009; TBX3-TBX5, P = 0.03; and CSK-ULK3, P = 0.0004). These genetic loci represent the best evidence of genetic influences on SBP and DBP in AAs to date. More broadly, this work supports the notion that blood pressure among AAs is a trait with genetic underpinnings but also with significant complexity. PMID:21378095
Fox, Ervin R; Young, J Hunter; Li, Yali; Dreisbach, Albert W; Keating, Brendan J; Musani, Solomon K; Liu, Kiang; Morrison, Alanna C; Ganesh, Santhi; Kutlar, Abdullah; Ramachandran, Vasan S; Polak, Josef F; Fabsitz, Richard R; Dries, Daniel L; Farlow, Deborah N; Redline, Susan; Adeyemo, Adebowale; Hirschorn, Joel N; Sun, Yan V; Wyatt, Sharon B; Penman, Alan D; Palmas, Walter; Rotter, Jerome I; Townsend, Raymond R; Doumatey, Ayo P; Tayo, Bamidele O; Mosley, Thomas H; Lyon, Helen N; Kang, Sun J; Rotimi, Charles N; Cooper, Richard S; Franceschini, Nora; Curb, J David; Martin, Lisa W; Eaton, Charles B; Kardia, Sharon L R; Taylor, Herman A; Caulfield, Mark J; Ehret, Georg B; Johnson, Toby; Chakravarti, Aravinda; Zhu, Xiaofeng; Levy, Daniel
2011-06-01
The prevalence of hypertension in African Americans (AAs) is higher than in other US groups; yet, few genome-wide association studies (GWASs) have been performed in AAs. Among people of European descent, GWASs have identified genetic variants at 13 loci that are associated with blood pressure. It is unknown if these variants confer susceptibility in people of African ancestry. Here, we examined genome-wide and candidate gene associations with systolic blood pressure (SBP) and diastolic blood pressure (DBP) using the Candidate Gene Association Resource (CARe) consortium consisting of 8591 AAs. Genotypes included genome-wide single-nucleotide polymorphism (SNP) data utilizing the Affymetrix 6.0 array with imputation to 2.5 million HapMap SNPs and candidate gene SNP data utilizing a 50K cardiovascular gene-centric array (ITMAT-Broad-CARe [IBC] array). For Affymetrix data, the strongest signal for DBP was rs10474346 (P = 3.6 × 10⁻⁸) located near GPR98 and ARRDC3. For SBP, the strongest signal was rs2258119 in C21orf91 (P = 4.7 × 10⁻⁸). The top IBC association for SBP was rs2012318 (P = 6.4 × 10⁻⁶) near SLC25A42 and for DBP was rs2523586 (P = 1.3 × 10⁻⁶) near HLA-B. None of the top variants replicated in additional AA (n = 11 882) or European-American (n = 69 899) cohorts. We replicated previously reported European-American blood pressure SNPs in our AA samples (SH2B3, P = 0.009; TBX3-TBX5, P = 0.03; and CSK-ULK3, P = 0.0004). These genetic loci represent the best evidence of genetic influences on SBP and DBP in AAs to date. More broadly, this work supports the notion that blood pressure among AAs is a trait with genetic underpinnings but also with significant complexity.
Technical note: Equivalent genomic models with a residual polygenic effect.
Liu, Z; Goddard, M E; Hayes, B J; Reinhardt, F; Reents, R
2016-03-01
Routine genomic evaluations in animal breeding are usually based on either a BLUP with genomic relationship matrix (GBLUP) or single nucleotide polymorphism (SNP) BLUP model. For a multi-step genomic evaluation, these 2 alternative genomic models were proven to give equivalent predictions for genomic reference animals. The model equivalence was verified also for young genotyped animals without phenotypes. Due to incomplete linkage disequilibrium of SNP markers to genes or causal mutations responsible for genetic inheritance of quantitative traits, SNP markers cannot explain all the genetic variance. A residual polygenic effect is normally fitted in the genomic model to account for the incomplete linkage disequilibrium. In this study, we start by showing the proof that the multi-step GBLUP and SNP BLUP models are equivalent for the reference animals, when they have a residual polygenic effect included. Second, the equivalence of both multi-step genomic models with a residual polygenic effect was also verified for young genotyped animals without phenotypes. Additionally, we derived formulas to convert genomic estimated breeding values of the GBLUP model to its components, direct genomic values and residual polygenic effect. Third, we made a proof that the equivalence of these 2 genomic models with a residual polygenic effect holds also for single-step genomic evaluation. Both the single-step GBLUP and SNP BLUP models lead to equal prediction for genotyped animals with phenotypes (e.g., reference animals), as well as for (young) genotyped animals without phenotypes. Finally, these 2 single-step genomic models with a residual polygenic effect were proven to be equivalent for estimation of SNP effects, too. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
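The equivalence described in this abstract can be illustrated numerically in a much simpler setting. The following is a minimal sketch, not the authors' derivation: it assumes a model with no fixed effects and no residual polygenic effect, an unscaled genomic relationship matrix G = ZZ', and a single variance ratio `lam`. The toy genotype matrix `Z`, the phenotypes `y`, and all function names are hypothetical. Exact rational arithmetic is used so the two predictions can be compared without rounding.

```python
from fractions import Fraction


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def add_ridge(m, lam):
    """Return m + lam * I for a square matrix m."""
    n = len(m)
    return [[m[i][j] + (lam if i == j else 0) for j in range(n)] for i in range(n)]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination (a must be non-singular)."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        pivot_row = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot_row] = aug[pivot_row], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [aug[i][n] for i in range(n)]


def snp_blup_dgv(Z, y, lam):
    """SNP BLUP: estimate marker effects, then sum them per animal."""
    Zt = transpose(Z)
    snp_effects = solve(add_ridge(matmul(Zt, Z), lam), matvec(Zt, y))
    return matvec(Z, snp_effects)


def gblup_gebv(Z, y, lam):
    """GBLUP: predict animal genomic values directly through G = ZZ'."""
    G = matmul(Z, transpose(Z))
    weights = solve(add_ridge(G, lam), y)
    return matvec(G, weights)


# Hypothetical toy data: 4 animals, 3 SNPs coded -1/0/1, centered phenotypes.
Z = [[Fraction(v) for v in row]
     for row in [[1, 0, -1], [0, 1, 1], [-1, -1, 0], [1, -1, 0]]]
y = [Fraction(v) for v in [3, -1, 2, 0]]
lam = Fraction(2)  # assumed ratio of residual to marker variance

# The two models give identical genomic predictions for every animal.
assert snp_blup_dgv(Z, y, lam) == gblup_gebv(Z, y, lam)
```

The agreement follows from the identity Z(Z'Z + λI)⁻¹Z' = ZZ'(ZZ' + λI)⁻¹. Extending the sketch to a residual polygenic effect would add a pedigree-based covariance term, which is omitted here.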
Wu, Jianhui; Huang, Shuo; Zeng, Qingdong; Liu, Shengjie; Wang, Qilin; Mu, Jingmei; Yu, Shizhou; Han, Dejun; Kang, Zhensheng
2018-06-16
A major stripe rust resistance QTL on chromosome 4BL was localized to a 4.5-Mb interval using comparative QTL mapping methods and validated in 276 wheat genotypes by haplotype analysis. CIMMYT-derived wheat line P10103 was previously identified to have adult plant resistance (APR) to stripe rust in the greenhouse and field. The conventional approach for QTL mapping in common wheat is laborious. Here, we performed QTL detection of APR using a combination of genome-wide scanning and extreme pool-genotyping. SNP-based genetic maps were constructed using the Wheat55K SNP array to genotype a recombinant inbred line (RIL) population derived from the cross Mingxian 169 × P10103. Five stable QTL were detected across multiple environments. After comparing SNP profiles from contrasting, extreme DNA pools of RILs, six putative QTL were located to approximate chromosome positions. A major QTL on chromosome 4B was identified in F2:4 contrasting pools from cross Zhengmai 9023 × P10103. A consensus QTL (LOD = 26-40, PVE = 42-55%), named QYr.nwafu-4BL, was defined and localized to a 4.5-Mb interval flanked by SNP markers AX-110963704 and AX-110519862 in chromosome arm 4BL. Based on stripe rust response, marker genotypes, pedigree analysis and mapping data, QYr.nwafu-4BL is likely to be a new APR QTL. The applicability of the SNP-based markers flanking QYr.nwafu-4BL was validated on a diversity panel of 276 wheat lines. The additional minor QTL on chromosomes 4A, 5A, 5B and 6A enhanced the level of resistance conferred by QYr.nwafu-4BL. Marker-assisted pyramiding of QYr.nwafu-4BL and other favorable minor QTL in new wheat cultivars should improve the level of APR to stripe rust.
McClure, Matthew C; Bickhart, Derek; Null, Dan; Vanraden, Paul; Xu, Lingyang; Wiggans, George; Liu, George; Schroeder, Steve; Glasscock, Jarret; Armstrong, Jon; Cole, John B; Van Tassell, Curtis P; Sonstegard, Tad S
2014-01-01
The recent discovery of bovine haplotypes with negative effects on fertility in the Brown Swiss, Holstein, and Jersey breeds has allowed producers to identify carrier animals using commercial single nucleotide polymorphism (SNP) genotyping assays. This study was devised to identify the causative mutations underlying defective bovine embryo development contained within three of these haplotypes (Brown Swiss haplotype 1 and Holstein haplotypes 2 and 3) by combining exome capture with next generation sequencing. Of the 68,476,640 sequence variations (SV) identified, only 1,311 genome-wide SNP were concordant with the haplotype status of 21 sequenced carriers. Validation genotyping of 36 candidate SNP identified only 1 variant that was concordant to Holstein haplotype 3 (HH3), while no variants located within the refined intervals for HH2 or BH1 were concordant. The variant strictly associated with HH3 is a non-synonymous SNP (T/C) within exon 24 of the Structural Maintenance of Chromosomes 2 (SMC2) on Chromosome 8 at position 95,410,507 (UMD3.1). This polymorphism changes amino acid 1135 from phenylalanine to serine and causes a non-neutral, non-tolerated, and evolutionarily unlikely substitution within the NTPase domain of the encoded protein. Because only exome capture sequencing was used, we could not rule out the possibility that the true causative mutation for HH3 might lie in a non-exonic genomic location. Given the essential role of SMC2 in DNA repair, chromosome condensation and segregation during cell division, our findings strongly support the non-synonymous SNP (T/C) in SMC2 as the likely causative mutation. The absence of concordant variations for HH2 or BH1 suggests either the underlying causative mutations lie within a non-exomic region or in exome regions not covered by the capture array.
McClure, Matthew C.; Bickhart, Derek; Null, Dan; VanRaden, Paul; Xu, Lingyang; Wiggans, George; Liu, George; Schroeder, Steve; Glasscock, Jarret; Armstrong, Jon; Cole, John B.; Van Tassell, Curtis P.; Sonstegard, Tad S.
2014-01-01
The recent discovery of bovine haplotypes with negative effects on fertility in the Brown Swiss, Holstein, and Jersey breeds has allowed producers to identify carrier animals using commercial single nucleotide polymorphism (SNP) genotyping assays. This study was devised to identify the causative mutations underlying defective bovine embryo development contained within three of these haplotypes (Brown Swiss haplotype 1 and Holstein haplotypes 2 and 3) by combining exome capture with next generation sequencing. Of the 68,476,640 sequence variations (SV) identified, only 1,311 genome-wide SNP were concordant with the haplotype status of 21 sequenced carriers. Validation genotyping of 36 candidate SNP identified only 1 variant that was concordant to Holstein haplotype 3 (HH3), while no variants located within the refined intervals for HH2 or BH1 were concordant. The variant strictly associated with HH3 is a non-synonymous SNP (T/C) within exon 24 of the Structural Maintenance of Chromosomes 2 (SMC2) on Chromosome 8 at position 95,410,507 (UMD3.1). This polymorphism changes amino acid 1135 from phenylalanine to serine and causes a non-neutral, non-tolerated, and evolutionarily unlikely substitution within the NTPase domain of the encoded protein. Because only exome capture sequencing was used, we could not rule out the possibility that the true causative mutation for HH3 might lie in a non-exonic genomic location. Given the essential role of SMC2 in DNA repair, chromosome condensation and segregation during cell division, our findings strongly support the non-synonymous SNP (T/C) in SMC2 as the likely causative mutation. The absence of concordant variations for HH2 or BH1 suggests either the underlying causative mutations lie within a non-exomic region or in exome regions not covered by the capture array. PMID:24667746
MMP9 polymorphisms and breast cancer risk: a report from the Shanghai Breast Cancer Genetics Study.
Beeghly-Fadiel, Alicia; Lu, Wei; Shu, Xiao-Ou; Long, Jirong; Cai, Qiuyin; Xiang, Yongbin; Gao, Yu-Tang; Zheng, Wei
2011-04-01
In addition to tumor invasion and angiogenesis, matrix metalloproteinase (MMP)9 also contributes to carcinogenesis and tumor growth. Genetic variation that may influence MMP9 expression was evaluated among participants of the Shanghai Breast Cancer Genetics Study (SBCGS) for associations with breast cancer susceptibility. In stage 1, 11 MMP9 single nucleotide polymorphisms (SNPs) were genotyped by the Affymetrix Targeted Genotyping System and/or the Affymetrix Genome-Wide Human SNP Array 6.0 among 4,227 SBCGS participants. One SNP was further genotyped using the Sequenom iPLEX MassARRAY platform among an additional 6,270 SBCGS participants. Associations with breast cancer risk were evaluated by odds ratios (OR) and 95% confidence intervals (CI) from logistic regression models that included adjustment for age, education, and genotyping stage when appropriate. In Stage 1, rare allele homozygotes for a promoter SNP (rs3918241) or a non-synonymous SNP (rs2274756, R668Q) tended to occur more frequently among breast cancer cases (P value = 0.116 and 0.056, respectively). Given their high linkage disequilibrium (D' = 1.0, r² = 0.97), one (rs3918241) was selected for additional analysis. An association with breast cancer risk was not supported by additional Stage 2 genotyping. In combined analysis, no elevated risk of breast cancer among homozygotes was found (OR: 1.2, 95% CI: 0.8-1.8). Common genetic variation in MMP9 was not found to be significantly associated with breast cancer susceptibility among participants of the Shanghai Breast Cancer Genetics Study.
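The two linkage disequilibrium measures quoted in this abstract can be computed from haplotype and allele frequencies with the standard textbook definitions. The sketch below is illustrative only: the function name `ld_stats` and the input frequencies are hypothetical and are not data from the study.

```python
from fractions import Fraction


def ld_stats(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical frequencies: allele A always travels with allele B,
# but B is slightly more common than A.
d_prime, r2 = ld_stats(Fraction(1, 5), Fraction(1, 5), Fraction(1, 4))
print(d_prime, r2)
```

With these toy inputs D' reaches its maximum while r² stays below 1, which is the general pattern when two variants are never separated by recombination but differ in allele frequency. In that situation one SNP can reasonably stand in for the other in follow-up genotyping.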
De La Vega, Francisco M; Dailey, David; Ziegle, Janet; Williams, Julie; Madden, Dawn; Gilbert, Dennis A
2002-06-01
Since public and private efforts announced the first draft of the human genome last year, researchers have reported great numbers of single nucleotide polymorphisms (SNPs). We believe that the availability of well-mapped, quality SNP markers constitutes the gateway to a revolution in genetics and personalized medicine that will lead to better diagnosis and treatment of common complex disorders. A new generation of tools and public SNP resources for pharmacogenomic and genetic studies--specifically for candidate-gene, candidate-region, and whole-genome association studies--will form part of the new scientific landscape. This will only be possible through the greater accessibility of SNP resources and superior high-throughput instrumentation-assay systems that enable affordable, highly productive large-scale genetic studies. We are contributing to this effort by developing a high-quality linkage disequilibrium SNP marker map and an accompanying set of ready-to-use, validated SNP assays across every gene in the human genome. This effort incorporates both the public sequence and SNP data sources, and Celera Genomics' human genome assembly and enormous resource of physically mapped SNPs (approximately 4,000,000 unique records). This article discusses our approach and methodology for designing the map, choosing quality SNPs, designing and validating these assays, and obtaining population frequency of the polymorphisms. We also discuss an advanced, high-performance SNP assay chemistry (a new generation of the TaqMan probe-based 5' nuclease assay) and a high-throughput instrumentation-software system for large-scale genotyping. We provide the new SNP map and validation information, validated SNP assays and reagents, and instrumentation systems as a novel resource for genetic discoveries.
Arlt, Martin F.; Ozdemir, Alev Cagla; Birkeland, Shanda R.; Lyons, Robert H.; Glover, Thomas W.; Wilson, Thomas E.
2011-01-01
Copy-number variants (CNVs) are a major source of genetic variation in human health and disease. Previous studies have implicated replication stress as a causative factor in CNV formation. However, existing data are technically limited in the quality of comparisons that can be made between human CNVs and experimentally induced variants. Here, we used two high-resolution strategies—single nucleotide polymorphism (SNP) arrays and mate-pair sequencing—to compare CNVs that occur constitutionally to those that arise following aphidicolin-induced DNA replication stress in the same human cells. Although the optimized methods provided complementary information, sequencing was more sensitive to small variants and provided superior structural descriptions. The majority of constitutional and all aphidicolin-induced CNVs appear to be formed via homology-independent mechanisms, while aphidicolin-induced CNVs were of a larger median size than constitutional events even when mate-pair data were considered. Aphidicolin thus appears to stimulate formation of CNVs that closely resemble human pathogenic CNVs and the subset of larger nonhomologous constitutional CNVs. PMID:21212237
Humble, Emily; Thorne, Michael A S; Forcada, Jaume; Hoffman, Joseph I
2016-08-26
Single nucleotide polymorphism (SNP) discovery is an important goal of many studies. However, the number of 'putative' SNPs discovered from a sequence resource may not provide a reliable indication of the number that will successfully validate with a given genotyping technology. For this it may be necessary to account for factors such as the method used for SNP discovery and the type of sequence data from which it originates, suitability of the SNP flanking sequences for probe design, and genomic context. To explore the relative importance of these and other factors, we used Illumina sequencing to augment an existing Roche 454 transcriptome assembly for the Antarctic fur seal (Arctocephalus gazella). We then mapped the raw Illumina reads to the new hybrid transcriptome using BWA and BOWTIE2 before calling SNPs with GATK. The resulting markers were pooled with two existing sets of SNPs called from the original 454 assembly using NEWBLER and SWAP454. Finally, we explored the extent to which SNPs discovered using these four methods overlapped and predicted the corresponding validation outcomes for both Illumina Infinium iSelect HD and Affymetrix Axiom arrays. Collating markers across all discovery methods resulted in a global list of 34,718 SNPs. However, concordance between the methods was surprisingly poor, with only 51.0% of SNPs being discovered by more than one method and 13.5% being called from both the 454 and Illumina datasets. Using a predictive modeling approach, we could also show that SNPs called from the Illumina data were on average more likely to successfully validate, as were SNPs called by more than one method. Above and beyond this pattern, predicted validation outcomes were also consistently better for Affymetrix Axiom arrays. Our results suggest that focusing on SNPs called by more than one method could potentially improve validation outcomes. They also highlight possible differences between alternative genotyping technologies that could be explored in future studies of non-model organisms.
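The concordance figure reported in this abstract (the share of SNPs discovered by more than one method) is a simple set computation. The sketch below shows one way it could be tallied. The method labels and SNP identifiers are hypothetical placeholders, not the study's call sets.

```python
from collections import Counter


def overlap_summary(calls_by_method):
    """Count SNPs overall and SNPs reported by more than one method.

    calls_by_method maps a method label to an iterable of SNP identifiers.
    Returns (total_snps, shared_snps, shared_fraction).
    """
    # Each method contributes at most one vote per SNP.
    votes = Counter(
        snp for snps in calls_by_method.values() for snp in set(snps)
    )
    total = len(votes)
    shared = sum(1 for n in votes.values() if n > 1)
    return total, shared, (shared / total if total else 0.0)


# Hypothetical toy call sets for three discovery methods.
toy_calls = {
    "method_a": {"snp1", "snp2", "snp3"},
    "method_b": {"snp2", "snp3", "snp4"},
    "method_c": {"snp3", "snp5"},
}

total, shared, fraction = overlap_summary(toy_calls)
print(f"{shared} of {total} SNPs ({fraction:.1%}) were called by more than one method")
```

Keeping the per-SNP vote counts also makes it straightforward to restrict a marker panel to SNPs supported by several methods, which is the selection strategy the abstract suggests may improve validation outcomes.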
Increasing the number of single nucleotide polymorphisms used in genomic evaluations of dairy cattle
USDA-ARS's Scientific Manuscript database
A small increase in the accuracy of genomic evaluations of dairy cattle was achieved by increasing the number of SNP used to 61,013. All the 45,195 SNP used previously were retained, and 15,818 SNP were selected from higher density genotyping chips if the magnitude of the SNP effect was among the to...
Rice SNP-seek database update: new SNPs, indels, and queries.
Mansueto, Locedie; Fuentes, Roven Rommel; Borja, Frances Nikki; Detras, Jeffery; Abriol-Santos, Juan Miguel; Chebotarov, Dmytro; Sanciangco, Millicent; Palis, Kevin; Copetti, Dario; Poliakov, Alexandre; Dubchak, Inna; Solovyev, Victor; Wing, Rod A; Hamilton, Ruaraidh Sackville; Mauleon, Ramil; McNally, Kenneth L; Alexandrov, Nickolai
2017-01-04
We describe updates to the Rice SNP-Seek Database since its first release. We ran a new SNP-calling pipeline followed by filtering that resulted in complete, base, filtered and core SNP datasets. Besides the Nipponbare reference genome, the pipeline was run on genome assemblies of IR 64, 93-11, DJ 123 and Kasalath. New genotype query and display features are added for reference assemblies, SNP datasets and indels. JBrowse now displays BAM, VCF and other annotation tracks, the additional genome assemblies and an embedded VISTA genome comparison viewer. Middleware is redesigned for improved performance by using a hybrid of HDF5 and RDMS for genotype storage. Query modules for genotypes, varieties and genes are improved to handle various constraints. An integrated list manager allows the user to pass query parameters for further analysis. The SNP Annotator adds traits, ontology terms, effects and interactions to markers in a list. Web-service calls were implemented to access most data. These features enable seamless querying of SNP-Seek across various biological entities, a step toward semi-automated gene-trait association discovery. URL: http://snp-seek.irri.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Cho, Young-Il; Ahn, Yul-Kyun; Tripathi, Swati; Kim, Jeong-Ho; Lee, Hye-Eun; Kim, Do-Sun
2015-01-01
Numerous studies using single nucleotide polymorphisms (SNPs) have been conducted in humans, other animals, and major crops, including rice, soybean, and Chinese cabbage. However, the number of SNP studies in cabbage is limited. In the present study, we evaluated whether 7,645 SNPs previously identified as molecular markers linked to disease resistance in the Brassica rapa genome could be applied to B. oleracea. In a BLAST analysis using the SNP sequences of B. rapa and B. oleracea genomic sequence data registered in the NCBI database, 256 genes for which SNPs had been identified in B. rapa were found in B. oleracea. These genes were classified into three functional groups: molecular function (64 genes), biological process (96 genes), and cellular component (96 genes). A total of 693 SNP markers, including 145 SNP markers [BRH, developed from the B. rapa genome for high-resolution melt (HRM) analysis], 425 SNP markers (BRP, based on the B. rapa genome and applicable to B. oleracea), and 123 new SNP markers (BRS, derived from BRP and designed for HRM analysis), were investigated for their ability to amplify sequences from cabbage genomic DNA. In total, 425 of the SNP markers (BRP), selected from the 7,645 SNPs, were successfully applied to B. oleracea. Using PCR, 108 of 145 BRH (74.5%), 415 of 425 BRP (97.6%), and 118 of 123 BRS (95.9%) showed amplification, suggesting that it is possible to apply SNP markers developed based on the B. rapa genome to B. oleracea. These results provide valuable information that can be utilized in cabbage genetics and breeding programs using molecular markers derived from other Brassica species. PMID:25790283
Müschen, Markus; Kato, Motohiro; Kawamata, Norihiko; Meixel, Antonie; Nowak, Verena; Kim, Han S.; Kang, Sharon; Paquette, Ronald; Chang, Mi-Sook; Thoenissen, Nils H.; Mossner, Max; Hofmann, Wolf-Karsten; Kohlmann, Alexander; Weiss, Tamara; Haferlach, Torsten; Haferlach, Claudia; Koeffler, H. Phillip
2010-01-01
To elucidate whether tyrosine kinase inhibitor (TKI) resistance in chronic myeloid leukemia is associated with characteristic genomic alterations, we analyzed DNA samples from 45 TKI-resistant chronic myeloid leukemia patients with 250K single nucleotide polymorphism arrays. From 20 patients, matched serial samples of pretreatment and TKI resistance time points were available. Eleven of the 45 TKI-resistant patients had mutations of BCR-ABL1, including 2 T315I mutations. Besides known TKI resistance-associated genomic lesions, such as duplication of the BCR-ABL1 gene (n = 8) and trisomy 8 (n = 3), recurrent submicroscopic alterations, including acquired uniparental disomy, were detectable on chromosomes 1, 8, 9, 17, 19, and 22. On chromosome 22, newly acquired and recurrent deletions of the IGLC1 locus were detected in 3 patients, who had previously presented with lymphoid or myeloid blast crisis. This may support a hypothesis of TKI-induced selection of subclones differentiating into immature B-cell progenitors as a mechanism of disease progression and evasion of TKI sensitivity. PMID:19965645
SEURAT: Visual analytics for the integrated analysis of microarray data
2010-01-01
Background In translational cancer research, gene expression data is collected together with clinical data and genomic data arising from other chip based high throughput technologies. Software tools for the joint analysis of such high dimensional data sets together with clinical data are required. Results We have developed an open source software tool which provides interactive visualization capability for the integrated analysis of high-dimensional gene expression data together with associated clinical data, array CGH data and SNP array data. The different data types are organized by a comprehensive data manager. Interactive tools are provided for all graphics: heatmaps, dendrograms, barcharts, histograms, eventcharts and a chromosome browser, which displays genetic variations along the genome. All graphics are dynamic and fully linked so that any object selected in a graphic will be highlighted in all other graphics. For exploratory data analysis the software provides unsupervised data analytics like clustering, seriation algorithms and biclustering algorithms. Conclusions The SEURAT software meets the growing needs of researchers to perform joint analysis of gene expression, genomical and clinical data. PMID:20525257
Nacheva, Elizabeth; Mokretar, Katya; Soenmez, Aynur; Pittman, Alan M; Grace, Colin; Valli, Roberto; Ejaz, Ayesha; Vattathil, Selina; Maserati, Emanuela; Houlden, Henry; Taanman, Jan-Willem; Schapira, Anthony H; Proukakis, Christos
2017-01-01
Potential bias introduced during DNA isolation is inadequately explored, although it could have significant impact on downstream analysis. To investigate this in human brain, we isolated DNA from cerebellum and frontal cortex using spin columns under different conditions, and salting-out. We first analysed DNA using array CGH, which revealed a striking wave pattern suggesting primarily GC-rich cerebellar losses, even against matched frontal cortex DNA, with a similar pattern on a SNP array. The aCGH changes varied with the isolation protocol. Droplet digital PCR of two genes also showed protocol-dependent losses. Whole genome sequencing showed GC-dependent variation in coverage with spin column isolation from cerebellum. We also extracted and sequenced DNA from substantia nigra using salting-out and phenol / chloroform. The mtDNA copy number, assessed by reads mapping to the mitochondrial genome, was higher in substantia nigra when using phenol / chloroform. We thus provide evidence for significant method-dependent bias in DNA isolation from human brain, as reported in rat tissues. This may contribute to array "waves", and could affect copy number determination, particularly if mosaicism is being sought, and sequencing coverage. Variations in isolation protocol may also affect apparent mtDNA abundance. PMID:28683077
SUSCEPTIBILITY LOCI FOR UMBILICAL HERNIA IN SWINE DETECTED BY GENOME-WIDE ASSOCIATION.
Liao, X J; Lia, L; Zhang, Z Y; Long, Y; Yang, B; Ruan, G R; Su, Y; Ai, H S; Zhang, W C; Deng, W Y; Xiao, S J; Ren, J; Ding, N S; Huang, L S
2015-10-01
Umbilical hernia (UH) is a complex disorder caused by both genetic and environmental factors. UH causes animal welfare problems and severe economic loss in the pig industry. The genetic basis of UH remains poorly understood. The high-density 60K porcine SNP array enables the rapid application of genome-wide association studies (GWAS) to identify genetic loci for phenotypic traits at genome-wide scale in pigs. The objective of this research was to identify susceptibility loci for swine umbilical hernia using the GWAS approach. We genotyped 478 piglets from 142 families representing three Western commercial breeds with the Illumina PorcineSNP60 BeadChip. Significant SNPs were then detected by GWAS using ROADTRIPS (Robust Association-Detection Test for Related Individuals with Population Substructure) software, based on a Bonferroni-corrected threshold (P = 1.67E-06) or a suggestive threshold (P = 3.34E-05) and a false discovery rate (FDR = 0.05). After quality control, 29,924 qualified SNPs and 472 piglets were used for GWAS. Two suggestive loci predisposing to pig UH were identified in the Duroc population, at 44.25 Mb on SSC2 (rs81358018, P = 3.34E-06, FDR = 0.049933) and at 45.90 Mb on SSC17 (rs81479278, P = 3.30E-06, FDR = 0.049933). No SNP was significantly associated with pig UH in either the Landrace or the Large White population. Furthermore, we carried out a meta-analysis in the combined pure-breed population containing all 472 piglets. rs81479278 (P = 1.16E-06, FDR = 0.022475) was associated with pig UH at the genome-wide significance level. SRC was characterized as a plausible candidate gene for susceptibility to pig UH according to its genomic position and biological functions. To our knowledge, this study gives the first description of a GWAS identifying susceptibility loci for umbilical hernia in pigs. Our findings provide deeper insights into the genetic architecture of umbilical hernia in pigs.
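The two thresholds quoted in the swine abstract above appear to follow from the number of SNPs that passed quality control. The sketch below is only an illustration of that arithmetic, not the authors' code: the function name is hypothetical, and treating the suggestive threshold as 1/n (one expected false positive per scan) is an assumption that happens to reproduce the reported value.

```python
def bonferroni_thresholds(n_tests, alpha=0.05):
    """Return (genome-wide, suggestive) P-value thresholds for n_tests tests.

    genome-wide: Bonferroni correction, alpha / n_tests.
    suggestive:  1 / n_tests (assumed convention, not stated in the abstract).
    """
    genome_wide = alpha / n_tests
    suggestive = 1.0 / n_tests
    return genome_wide, suggestive


# 29,924 SNPs remained after quality control according to the abstract.
n_snps = 29_924
gw, sug = bonferroni_thresholds(n_snps)
print(f"genome-wide: {gw:.2E}, suggestive: {sug:.2E}")
```

Both values round to the thresholds given in the abstract (1.67E-06 and 3.34E-05), which suggests the correction was applied to the post-QC marker count.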
Dutra, Roberta L; Piazzon, Flavia B; Zanardo, Évelin A; Costa, Thais Virginia Moura Machado; Montenegro, Marília M; Novo-Filho, Gil M; Dias, Alexandre T; Nascimento, Amom M; Kim, Chong Ae; Kulikowski, Leslie D
2015-12-01
Williams-Beuren syndrome (WBS) is caused by a hemizygous contiguous gene microdeletion of 1.55-1.84 Mb in the 7q11.23 region. Approximately 28 genes have been shown to contribute to the classical WBS phenotype, which includes dysmorphic facial features, supravalvular aortic stenosis (SVAS), intellectual disability, and overfriendliness. With microarray-based comparative genomic hybridization and other molecular cytogenetic techniques, it is possible to define partial or atypical deletions more accurately and to refine the genotype-phenotype correlation. Here, we report a rare genomic structural rearrangement in a boy with an atypical deletion in 7q11.23 and XYY syndrome, who had characteristic clinical signs that were not sufficient for a diagnosis of WBS. G-banding cytogenetic analysis showed a 47,XYY karyotype. DNA analysis by MLPA (Multiplex Ligation-dependent Probe Amplification) using a combination of kits (P064, P036, P070, and P029) identified an atypical deletion in 7q11.23. In addition, high-resolution SNP Oligonucleotide Microarray Analysis (SNP-array) confirmed the alterations found by MLPA and revealed other pathogenic CNVs on chromosomes 7 and X. The present report demonstrates an association, not yet described in the literature, between Williams-Beuren syndrome and 47,XYY. The identification of an atypical deletion in 7q11.23 together with additional pathogenic CNVs in other genomic regions allows a better understanding of the clinical consequences of atypical genomic rearrangements. © 2015 Wiley Periodicals, Inc.
snpGeneSets: An R Package for Genome-Wide Study Annotation
Mei, Hao; Li, Lianna; Jiang, Fan; Simino, Jeannette; Griswold, Michael; Mosley, Thomas; Liu, Shijian
2016-01-01
Genome-wide studies (GWS) of SNP associations and differential gene expressions have generated abundant results; next-generation sequencing technology has further boosted the number of variants and genes identified. Effective interpretation requires massive annotation and downstream analysis of these genome-wide results, a computationally challenging task. We developed the snpGeneSets package to simplify annotation and analysis of GWS results. Our package integrates local copies of knowledge bases for SNPs, genes, and gene sets, and implements wrapper functions in the R language to enable transparent access to low-level databases for efficient annotation of large genomic data. The package contains functions that execute three types of annotations: (1) genomic mapping annotation for SNPs and genes and functional annotation for gene sets; (2) bidirectional mapping between SNPs and genes, and genes and gene sets; and (3) calculation of gene effect measures from SNP associations and performance of gene set enrichment analyses to identify functional pathways. We applied snpGeneSets to type 2 diabetes (T2D) results from the NHGRI genome-wide association study (GWAS) catalog, a Finnish GWAS, and a genome-wide expression study (GWES). These studies demonstrate the usefulness of snpGeneSets for annotating and performing enrichment analysis of GWS results. The package is open-source, free, and can be downloaded at: https://www.umc.edu/biostats_software/. PMID:27807048
Tolleson, M W; Gill, C A; Herring, A D; Riggs, P K; Sawyer, J E; Sanders, J O; Riley, D G
2017-06-01
The size, support, and health of udders limit the productive life of beef cows, especially those with background, because, in general, such cows have a reputation for problems with udders. Genomic association studies of bovine udder traits have been conducted in dairy cattle and recently in Continental European beef breeds but not in cows with background. The objective of this study was to determine associations of SNP and udder support scores, teat length, and teat diameter in half (Nellore), half (Angus) cows. Udders of cows (n = 295) born from 2003 to 2007 were evaluated for udder support and teat length and diameter (n = 1,746 records) from 2005 through 2014. These included a subjective score representing udder support (values of 1 indicated poorly supported, pendulous udders and values of 9 indicated very well-supported udders) and lengths and diameters of individual teats in the 4 udder quarters as well as the average. Cows were in full-sibling or half-sibling families. Residuals for each trait were produced from repeated records models with cow age category nested within birth year of cows. Those residuals were averaged to become the dependent variables for genomewide association analyses. Regression analyses of those dependent variables included genotypic values as explanatory variables for 34,980 SNP from a commercially available array and included the genomic relationship matrix. Fifteen SNP loci on BTA 5 were associated (false discovery rate controlled at 0.05) with udder support score. One of those was also detected as associated with average teat diameter. Three of those 15 SNP were located within genes, including one each in (), (), and (). These are notable for their functional role in some aspect of mammary gland formation or health. Other candidate genes for these traits in the vicinity of the SNP loci include () and ().
Because these were detected in Nellore-Angus crossbred cows, which typically have very well-formed udders with excellent support across their productive lives, similar efforts in other breeds should be completed, because that may facilitate further refinement of genomic regions responsible for variation in udder traits important in multiple breeds.
Genomic selection in dairy cattle: the USDA experience
USDA-ARS's Scientific Manuscript database
Genomic selection has revolutionized dairy cattle breeding. Since 2000, assays have been developed to genotype large numbers of single nucleotide polymorphisms (SNP) at relatively low cost. The first commercial SNP genotyping chip was released with a set of 54,001 SNP in December 2007. Over 15,000 ...
Haplotype-based approach to known MS-associated regions increases the amount of explained risk
Khankhanian, Pouya; Gourraud, Pierre-Antoine; Lizee, Antoine; Goodin, Douglas S
2015-01-01
Genome-wide association studies (GWAS), using single nucleotide polymorphisms (SNPs), have yielded 110 non-human leucocyte antigen genomic regions that are associated with multiple sclerosis (MS). Despite this large number of associations, however, only 28% of MS-heritability can currently be explained. Here we compare the use of multi-SNP-haplotypes to the use of single-SNPs as alternative methods to describe MS genetic risk. SNP-haplotypes (of various lengths from 1 up to 15 contiguous SNPs) were constructed at each of the 110 previously identified, MS-associated, genomic regions. Even after correcting for the larger number of statistical comparisons made when using the haplotype-method, in 32 of the regions, the SNP-haplotype based model was markedly more significant than the single-SNP based model. By contrast, in no region was the single-SNP based model similarly more significant than the SNP-haplotype based model. Moreover, when we included the 932 MS-associated SNP-haplotypes (that we identified from 102 regions) as independent variables into a logistic linear model, the amount of MS-heritability, as assessed by Nagelkerke's R-squared, was 38%, which was considerably better than 29%, which was obtained by using only single-SNPs. This study demonstrates that SNP-haplotypes can be used to fine-map the genetic associations within regions of interest previously identified by single-SNP GWAS. Moreover, the amount of the MS genetic risk explained by the SNP-haplotype associations in the 110 MS-associated genomic regions was considerably greater when using SNP-haplotypes than when using single-SNPs. Also, the use of SNP-haplotypes can lead to the discovery of new regions of interest, which have not been identified by a single-SNP GWAS. PMID:26185143
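The MS abstract above reports explained risk as Nagelkerke's R-squared from a logistic model. As a rough illustration of how that statistic is usually defined (not the authors' implementation), the sketch below computes it from the log-likelihoods of the null and fitted models; the function name and the toy numbers are hypothetical.

```python
import math


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke's R-squared from natural-log likelihoods.

    ll_null:  log-likelihood of the intercept-only model
    ll_model: log-likelihood of the fitted model
    n:        number of observations
    """
    # Cox-Snell R-squared: 1 - (L_null / L_model)^(2/n)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Largest value Cox-Snell can reach for this sample: 1 - L_null^(2/n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    # Nagelkerke rescales Cox-Snell so that a perfect model scores 1
    return cox_snell / max_cox_snell


# Toy example (made-up values): 100 subjects, balanced cases and controls,
# so the null log-likelihood is 100 * ln(0.5).
ll0 = 100 * math.log(0.5)
print(round(nagelkerke_r2(ll0, -55.0, 100), 3))
```

The rescaling step is the reason the statistic is often preferred over Cox-Snell for case-control data, where the unscaled value cannot reach 1.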
He, Jun; Xu, Jiaqi; Wu, Xiao-Lin; Bauck, Stewart; Lee, Jungjae; Morota, Gota; Kachman, Stephen D; Spangler, Matthew L
2018-04-01
SNP chips are commonly used for genotyping animals in genomic selection but strategies for selecting low-density (LD) SNPs for imputation-mediated genomic selection have not been addressed adequately. The main purpose of the present study was to compare the performance of eight LD (6K) SNP panels, each selected by a different strategy exploiting a combination of three major factors: evenly-spaced SNPs, increased minor allele frequencies, and SNP-trait associations either for single traits independently or for all the three traits jointly. The imputation accuracies from 6K to 80K SNP genotypes were between 96.2 and 98.2%. Genomic prediction accuracies obtained using imputed 80K genotypes were between 0.817 and 0.821 for daughter pregnancy rate, between 0.838 and 0.844 for fat yield, and between 0.850 and 0.863 for milk yield. The two SNP panels optimized on the three major factors had the highest genomic prediction accuracy (0.821-0.863), and these accuracies were very close to those obtained using observed 80K genotypes (0.825-0.868). Further exploration of the underlying relationships showed that genomic prediction accuracies did not respond linearly to imputation accuracies, but were significantly affected by genotype (imputation) errors of SNPs in association with the traits to be predicted. SNPs optimal for map coverage and MAF were favorable for obtaining accurate imputation of genotypes whereas trait-associated SNPs improved genomic prediction accuracies. Thus, optimal LD SNP panels were the ones that combined both strengths. The present results have practical implications on the design of LD SNP chips for imputation-enabled genomic prediction.
Variant calling in low-coverage whole genome sequencing of a Native American population sample.
Bizon, Chris; Spiegel, Michael; Chasse, Scott A; Gizer, Ian R; Li, Yun; Malc, Ewa P; Mieczkowski, Piotr A; Sailsbery, Josh K; Wang, Xiaoshu; Ehlers, Cindy L; Wilhelmsen, Kirk C
2014-01-30
The reduction in the cost of sequencing a human genome has led to the use of genotype sampling strategies in order to impute and infer the presence of sequence variants that can then be tested for associations with traits of interest. Low-coverage Whole Genome Sequencing (WGS) is a sampling strategy that overcomes some of the deficiencies seen in fixed content SNP array studies. Linkage-disequilibrium (LD) aware variant callers, such as the program Thunder, may provide a calling rate and accuracy that makes a low-coverage sequencing strategy viable. We examined the performance of an LD-aware variant calling strategy in a population of 708 low-coverage whole genome sequences from a community sample of Native Americans. We assessed variant calling through a comparison of the sequencing results to genotypes measured in 641 of the same subjects using a fixed content first generation exome array. The comparison was made using the variant calling routines GATK Unified Genotyper program and the LD-aware variant caller Thunder. Thunder was found to improve concordance in a coverage dependent fashion, while correctly calling nearly all of the common variants as well as a high percentage of the rare variants present in the sample. Low-coverage WGS is a strategy that appears to collect genetic information intermediate in scope between fixed content genotyping arrays and deep-coverage WGS. Our data suggests that low-coverage WGS is a viable strategy with a greater chance of discovering novel variants and associations than fixed content arrays for large sample association analyses.
Singh, Hardeep; Sahini, Nishika; Jalali, Subhadra; Mohan, Gayathri
2012-01-01
Purpose To identify genes underlying autosomal recessive retinitis pigmentosa (ARRP) by homozygosity mapping. Methods Families with ARRP were recruited after complete ophthalmic evaluation of all members and diagnosis of RP by predefined criteria. Genomic DNA from affected members of 26 families was genotyped on Illumina single nucleotide polymorphism (SNP) 6.0 K arrays with standard procedures. Genotypes were evaluated for homozygous regions that were common and concordant between affected members of each family. The genes mapping to homozygous intervals within these families were screened for pathogenic changes with PCR amplification and sequencing of coding regions. Cosegregation of sequence changes with disease was determined within each pedigree, and each variation was tested for presence in 100 unrelated normal controls. Results A genome-wide scan for homozygosity showed homozygous regions harboring the tubby like protein 1 gene (TULP1; chromosome 6) in one family, the nuclear receptor subfamily 2, group E, member 3 gene (NR2E3; chromosome 15) in three families, and the membrane frizzled-related protein gene (MFRP; chromosome 11) in one family. Screening of the three genes in the respective families revealed homozygous disease-causing mutations in three families. These included a missense mutation in TULP1, a deletion-cum-insertion in NR2E3, and a single base deletion in MFRP. Patients from all three families had a rod-cone type of dystrophy with night blindness initially. The NR2E3 and MFRP genes were associated with fundus features atypical of RP. Conclusions This study shows involvement of the TULP1, NR2E3, and MFRP genes in ARRP in Indian cases. Genome-wide screening with SNP arrays followed by a prioritized candidate gene evaluation is useful in identifying genes in these patients. PMID:22605927
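The homozygosity-mapping step described above (homozygous regions "common and concordant between affected members") can be sketched as a scan for consecutive markers at which every affected relative carries the same homozygous genotype. This is a simplified illustration under stated assumptions, not the study's pipeline: genotypes are encoded as the strings "AA", "AB", "BB", the function name is hypothetical, and runs are thresholded by marker count rather than by physical length as real analyses typically do.

```python
def shared_homozygous_runs(genotypes, min_snps=3):
    """Find runs of consecutive markers where all individuals share the
    same homozygous call.

    genotypes: list of per-individual genotype lists (same marker order)
    min_snps:  minimum number of consecutive markers for a run to count
    Returns a list of (start_index, end_index) tuples, inclusive.
    """
    n_markers = len(genotypes[0])
    runs, start = [], None
    for i in range(n_markers):
        calls = {ind[i] for ind in genotypes}
        # Concordant only if everyone has one identical homozygous genotype
        concordant = len(calls) == 1 and next(iter(calls)) in ("AA", "BB")
        if concordant:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last marker
    if start is not None and n_markers - start >= min_snps:
        runs.append((start, n_markers - 1))
    return runs


# Two affected siblings (made-up genotypes at seven ordered markers)
affected = [
    ["AA", "AB", "BB", "BB", "AA", "AA", "AB"],
    ["AA", "AA", "BB", "BB", "AA", "BB", "AB"],
]
print(shared_homozygous_runs(affected))
```

Genes located inside the reported intervals would then be the candidates prioritized for sequencing, as the abstract describes.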
SNP discovery and genotyping using Genotyping-by-Sequencing in Pekin ducks.
Zhu, Feng; Cui, Qian-Qian; Hou, Zhuo-Cheng
2016-11-15
Genomic selection and genome-wide association studies need thousands to millions of SNPs. However, many non-model species do not have reference chips for detecting variation. Our goal was to develop and validate an inexpensive but effective method for detecting SNP variation. Genotyping by sequencing (GBS) can be a highly efficient strategy for genome-wide SNP detection, as an alternative to microarray chips. Here, we developed a GBS protocol for ducks and used it to genotype 49 Pekin ducks. A total of 169,209 SNPs were identified from all animals, with a mean of 55,920 SNPs per individual. The average SNP density reached 1156 SNPs/Mb. In this study, the first application of GBS to ducks, we demonstrate the power and simplicity of this method. GBS can be used in genetic studies as an effective method for genome-wide SNP discovery.
Kelemen, Arpad; Vasilakos, Athanasios V; Liang, Yulan
2009-09-01
Comprehensive evaluation of common genetic variations through association of single-nucleotide polymorphism (SNP) structure with common complex disease in the genome-wide scale is currently a hot area in human genome research due to the recent development of the Human Genome Project and HapMap Project. Computational science, which includes computational intelligence (CI), has recently become the third method of scientific enquiry besides theory and experimentation. There have been fast growing interests in developing and applying CI in disease mapping using SNP and haplotype data. Some of the recent studies have demonstrated the promise and importance of CI for common complex diseases in genomic association study using SNP/haplotype data, especially for tackling challenges, such as gene-gene and gene-environment interactions, and the notorious "curse of dimensionality" problem. This review provides coverage of recent developments of CI approaches for complex diseases in genetic association study with SNP/haplotype data.
2014-01-01
Background The accessibility of high-throughput genotyping technologies has contributed greatly to the development of genomic resources in non-model organisms. High-density genotyping arrays have only recently been developed for some economically important species such as conifers. The potential for using genomic technologies in association mapping and breeding depends largely on the genome-wide patterns of diversity and linkage disequilibrium in current breeding populations. This study aims to deepen our knowledge regarding these issues in maritime pine, the first species used for reforestation in southwestern Europe. Results Using a new map merging algorithm, we first established a 1,712 cM composite linkage map (comprising 1,838 SNP markers in 12 linkage groups) by bringing together three already available genetic maps. Using rigorous statistical testing based on kernel density estimation and resampling we identified cold and hot spots of recombination. In parallel, 186 unrelated trees of a mass-selected population were genotyped using a 12k-SNP array. A total of 2,600 informative SNPs made it possible to describe historical recombination, genetic diversity and genetic structure of this recently domesticated breeding pool that forms the basis of much of the current and future breeding of this species. We observe very low levels of population genetic structure and find no evidence that artificial selection has caused a reduction in genetic diversity. By combining these two pieces of information, we provided the map position of 1,671 SNPs corresponding to 1,192 different loci. This made it possible to analyze the spatial pattern of genetic diversity (He) and long distance linkage disequilibrium (LD) along the chromosomes. We found no particular pattern in the empirical variogram of He across the 12 linkage groups and, as expected for an outcrossing species with large effective population size, we observed an almost complete lack of long distance LD.
Conclusions These results are a stepping stone for the development of strategies for studies in population genomics, association mapping and genomic prediction in this economically and ecologically important forest tree species. PMID:24581176
Al-Mamun, Hawlader Abdullah; Clark, Samuel A; Kwan, Paul; Gondro, Cedric
2015-11-24
Knowledge of the genetic structure and overall diversity of livestock species is important to maximise the potential of genome-wide association studies and genomic prediction. Measures such as linkage disequilibrium (LD), effective population size (Ne), heterozygosity, fixation index (FST) and runs of homozygosity (ROH) are widely used and help to improve our knowledge about genetic diversity in animal populations. The development of high-density single nucleotide polymorphism (SNP) arrays and the subsequent genotyping of large numbers of animals have greatly increased the accuracy of these population-based estimates. In this study, we used the Illumina OvineSNP50 BeadChip array to estimate and compare LD (measured by r² and D'), Ne, heterozygosity, FST and ROH in five Australian sheep populations: three pure breeds, i.e., Merino (MER), Border Leicester (BL), Poll Dorset (PD) and two crossbred populations, i.e., F1 crosses of Merino and Border Leicester (MxB) and MxB crossed to Poll Dorset (MxBxP). Compared to other livestock species, the sheep populations that were analysed in this study had low levels of LD and high levels of genetic diversity. The rate of LD decay was greater in Merino than in the other pure breeds. Over short distances (<10 kb), the levels of LD were higher in BL and PD than in MER. Similarly, BL and PD had comparatively smaller Ne than MER. Observed heterozygosity in the pure breeds ranged from 0.3 in BL to 0.38 in MER. Genetic distances between breeds were modest compared to other livestock species (highest FST = 0.063) but the genetic diversity within breeds was high. Based on ROH, two chromosomal regions showed evidence of strong recent selection. This study shows that there is a large range of genome diversity in Australian sheep breeds, especially in Merino sheep. The observed range of diversity will influence the design of genome-wide association studies and the results that can be obtained from them.
This knowledge will also be useful to design reference populations for genomic prediction of breeding values in sheep.
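The sheep abstract above measures LD with r-squared and D'. For two biallelic loci these are standard functions of the haplotype and allele frequencies, and the sketch below shows that textbook calculation. It is an illustration only: the function name and example frequencies are hypothetical, and it assumes phased haplotype frequencies are already available, which array genotype data do not provide directly.

```python
def ld_stats(p_ab, p_a, p_b):
    """Pairwise LD between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    Returns (r_squared, d_prime), with d_prime reported as |D'|.
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    # Raw disequilibrium coefficient
    d = p_ab - p_a * p_b
    r_squared = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    if d == 0.0:
        return r_squared, 0.0
    # Maximum |D| attainable given the allele frequencies
    if d > 0.0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    return r_squared, abs(d) / d_max


# Made-up frequencies: D = 0.3 - 0.5 * 0.4 = 0.1
r2, dp = ld_stats(p_ab=0.3, p_a=0.5, p_b=0.4)
print(round(r2, 4), round(dp, 4))
```

Because r-squared depends on allele frequencies while D' is normalized by its attainable maximum, the two measures can rank marker pairs differently, which is one reason studies like this report both.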
ENU Mutagenesis in Mice Identifies Candidate Genes For Hypogonadism
Weiss, Jeffrey; Hurley, Lisa A.; Harris, Rebecca M.; Finlayson, Courtney; Tong, Minghan; Fisher, Lisa A.; Moran, Jennifer L.; Beier, David R.; Mason, Christopher; Jameson, J. Larry
2012-01-01
Genome-wide mutagenesis was performed in mice to identify candidate genes for male infertility, for which the predominant causes remain idiopathic. Mice were mutagenized using N-ethyl-N-nitrosourea (ENU), bred, and screened for phenotypes associated with the male urogenital system. Fifteen heritable lines were isolated and chromosomal loci were assigned using low density genome-wide SNP arrays. Ten of the fifteen lines were pursued further using higher resolution SNP analysis to narrow the candidate gene regions. Exon sequencing of candidate genes identified mutations in mice with cystic kidneys (Bicc1), cryptorchidism (Rxfp2), restricted germ cell deficiency (Plk4), and severe germ cell deficiency (Prdm9). In two other lines with severe hypogonadism candidate sequencing failed to identify mutations, suggesting defects in genes with previously undocumented roles in gonadal function. These genomic intervals were sequenced in their entirety and a candidate mutation was identified in SnrpE in one of the two lines. The line harboring the SnrpE variant retains substantial spermatogenesis despite small testis size, an unusual phenotype. In addition to the reproductive defects, heritable phenotypes were observed in mice with ataxia (Myo5a), tremors (Pmp22), growth retardation (unknown gene), and hydrocephalus (unknown gene). These results demonstrate that the ENU screen is an effective tool for identifying potential causes of male infertility. PMID:22258617
[Prenatal genetic diagnosis for a fetus with atypical neurofibromatosis type 1 microdeletion].
Lin, Shaobin; Wu, Jianzhu; Zhang, Zhiqiang; Ji, Yuanjun; Fang, Qun; Chen, Baojiang; Luo, Yanmin
2016-04-01
To analyze the correlation between atypical neurofibromatosis type 1 (NF1) microdeletion and fetal phenotype. Fetal blood sampling was carried out for a woman carrying a fetus with talipes equinovarus. G-banded karyotyping and single nucleotide polymorphism array (SNP array) analysis were performed on the fetal blood sample. Fluorescence in situ hybridization (FISH) was used to confirm the result of the SNP array analysis. FISH was also carried out on peripheral blood specimens from the parents to ascertain the origin of the mutation. The karyotype of the fetus was 46,XY by G-banding analysis. However, a 3.132 Mb microdeletion was detected in chromosome region 17q11.2 by SNP array, which overlapped with the region of NF1 microdeletion syndrome. FISH analysis of the specimens from the fetus and its parents confirmed that the deletion was de novo. Talipes equinovarus may be an abnormal sonographic feature of fetuses with atypical NF1 microdeletion, which can be accurately diagnosed with SNP array.
2013-01-01
Background The availability of a large expressed sequence tags (EST) resource and recent advances in high-throughput genotyping technology have made it possible to develop highly multiplexed SNP arrays for multi-objective genetic applications, including the construction of meiotic maps. Such approaches are particularly useful in species with a large genome size, precluding the use of whole-genome shotgun assembly with current technologies. Results In this study, a 12 k-SNP genotyping array was developed for maritime pine from an extensive EST resource assembled into a unigene set. The offspring of three-generation outbred and inbred mapping pedigrees were then genotyped. The inbred pedigree consisted of a classical F2 population resulting from the selfing of a single inter-provenance (Landes x Corsica) hybrid tree, whereas the outbred pedigree (G2) resulted from a controlled cross of two intra-provenance (Landes x Landes) hybrid trees. This resulted in the generation of three linkage maps based on SNP markers: one from the parental genotype of the F2 population (1,131 markers in 1,708 centimorgan (cM)), and one for each parent of the G2 population (1,015 and 1,110 markers in 1,447 and 1,425 cM for the female and male parents, respectively). A comparison of segregation patterns in the progeny obtained from the two types of mating (inbreeding and outbreeding) led to the identification of a chromosomal region carrying an embryo viability locus with a semi-lethal allele. Following selfing and segregation, zygote mortality resulted in a deficit of Corsican homozygous genotypes in the F2 population. This dataset was also used to study the extent and distribution of meiotic recombination along the length of the chromosomes and the effect of sex and/or genetic background on recombination. The genetic background of trees in which meiotic recombination occurred was found to have a significant effect on the frequency of recombination. 
Furthermore, only a small proportion of the recombination hot- and cold-spots were common to all three genotypes, suggesting that the spatial pattern of recombination was genetically variable. Conclusion This study led to the development of classical genomic tools for this ecologically and economically important species. It also identified a chromosomal region bearing a semi-lethal recessive allele and demonstrated the genetic variability of recombination rate over the genome. PMID:23597128
Linkage disequilibrium between STRPs and SNPs across the human genome.
Payseur, Bret A; Place, Michael; Weber, James L
2008-05-01
Patterns of linkage disequilibrium (LD) reveal the action of evolutionary processes and provide crucial information for association mapping of disease genes. Although recent studies have described the landscape of LD among single nucleotide polymorphisms (SNPs) from across the human genome, associations involving other classes of molecular variation remain poorly understood. In addition to recombination and population history, mutation rate and process are expected to shape LD. To test this idea, we measured associations between short-tandem-repeat polymorphisms (STRPs), which can mutate rapidly and recurrently, and SNPs in 721 regions across the human genome. We directly compared STRP-SNP LD with SNP-SNP LD from the same genomic regions in the human HapMap populations. The intensity of STRP-SNP LD, measured by the average of D', was reduced, consistent with the action of recurrent mutation. Nevertheless, a higher fraction of STRP-SNP pairs than SNP-SNP pairs showed significant LD, on both short (up to 50 kb) and long (cM) scales. These results reveal the substantial effects of mutational processes on LD at STRPs and provide important measures of the potential of STRPs for association mapping of disease genes.
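The D' statistic mentioned above can be illustrated with a minimal sketch for the simplest case of two biallelic loci. This is an assumption-laden simplification: STRPs are usually multiallelic and need a multiallelic extension of D', which the abstract does not specify. The function name and arguments are hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Return |D'| (Lewontin's normalized LD) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    # Raw disequilibrium: observed haplotype frequency minus the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b
    # The largest |D| attainable depends on the sign of D and the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # at least one locus is monomorphic, so LD is undefined; report 0
    return abs(d) / d_max
```

Recurrent mutation at one locus creates haplotypes that would otherwise be absent, which pulls |D'| below 1 even without recombination; this is the effect the abstract attributes to STRPs.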
Identification of Genes Promoting Skin Youthfulness by Genome-Wide Association Study
Chang, Anne L.S.; Atzmon, Gil; Bergman, Aviv; Brugmann, Samantha; Atwood, Scott X; Chang, Howard Y; Barzilai, Nir
2014-01-01
To identify genes that promote facial skin youthfulness (SY), a genome-wide association study on an Ashkenazi Jewish discovery group (n=428) was performed using Affymetrix 6.0 Single-Nucleotide Polymorphism (SNP) Array. After SNP quality controls, 901,470 SNPs remained for analysis. The EIGENSTRAT method showed no stratification. Cases and controls were identified by global facial skin aging severity including intrinsic and extrinsic parameters. Linear regression adjusted for age and gender, with no significant differences in smoking history, body mass index, menopausal status, or personal or family history of centenarians. Six SNPs met the Bonferroni threshold with P_allele < 10^-8; two of these six had P_genotype < 10^-8. Quantitative trait loci mapping confirmed linkage disequilibrium. The six SNPs were interrogated by MassARRAY in a replication group (n=436) with confirmation of rs6975107, an intronic region of KCND2 (potassium voltage-gated channel, Shal-related family member 2) (P_genotype=0.023). A second replication group (n=371) confirmed rs318125, downstream of DIAPH2 (diaphanous homolog 2 (Drosophila)) (P_allele=0.010, P_genotype=0.002) and rs7616661, downstream of EDEM1 (ER degradation enhancer, mannosidase α-like 1) (P_genotype=0.042). DIAPH2 has been associated with premature ovarian insufficiency, an aging phenotype in humans. EDEM1 associates with lifespan in animal models, although not humans. KCND2 is expressed in human skin, but has not been associated with aging. These genes represent new candidate genes to study the molecular basis of healthy skin aging. PMID:24037343
Wen, Weie; He, Zhonghu; Gao, Fengmei; Liu, Jindong; Jin, Hui; Zhai, Shengnan; Qu, Yanying; Xia, Xianchun
2017-01-01
A high-density consensus map is a powerful tool for gene mapping, cloning and molecular marker-assisted selection in wheat breeding. The objective of this study was to construct a high-density, single nucleotide polymorphism (SNP)-based consensus map of common wheat (Triticum aestivum L.) by integrating genetic maps from four recombinant inbred line populations. The populations were each genotyped using the wheat 90K Infinium iSelect SNP assay. A total of 29,692 SNP markers were mapped on 21 linkage groups corresponding to 21 hexaploid wheat chromosomes, covering 2,906.86 cM, with an overall marker density of 10.21 markers/cM. Comparison with previous maps based on the wheat 90K SNP chip showed that 22,736 (76.6%) of the SNPs had consistent chromosomal locations, whereas 1,974 (6.7%) showed different chromosomal locations, and 4,982 (16.8%) were newly mapped. Alignment of the present consensus map and the wheat expressed sequence tags (ESTs) Chromosome Bin Map enabled assignment of 1,221 SNP markers to specific chromosome bins and 819 ESTs were integrated into the consensus map. The marker orders of the consensus map were validated based on physical positions on the wheat genome with Spearman rank correlation coefficients ranging from 0.69 (4D) to 0.97 (1A, 4B, 5B, and 6A), and were also confirmed by comparison with genetic positions on the previous 40K SNP consensus map with Spearman rank correlation coefficients ranging from 0.84 (6D) to 0.99 (6A). Chromosomal rearrangements reported previously were confirmed in the present consensus map and new putative rearrangements were identified. In addition, an integrated consensus map was developed through the combination of five published maps with ours, containing 52,607 molecular markers. The consensus map described here provided a high-density SNP marker map and a reliable order of SNPs, representing a step forward in mapping and validation of chromosomal locations of SNPs on the wheat 90K array. 
Moreover, it can be used as a reference for quantitative trait loci (QTL) mapping to facilitate exploitation of genes and QTL in wheat breeding. PMID:28848588
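The marker-order validation described above rests on Spearman rank correlation between genetic and physical positions. The sketch below shows one way such a check could be computed with the standard library only; the marker positions are hypothetical illustration values, not data from the study. The last line reproduces the reported marker density from the two totals given in the abstract.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / (vx * vy) ** 0.5


# Hypothetical markers on one chromosome: genetic position (cM) vs physical position (Mb).
genetic_cm = [0.0, 1.2, 3.5, 3.5, 7.8, 9.1]
physical_mb = [0.4, 2.1, 6.0, 5.2, 14.9, 13.3]
rho = spearman(genetic_cm, physical_mb)

# Overall marker density from the totals reported in the abstract.
density = 29692 / 2906.86  # rounds to 10.21 markers/cM
```

A rank-based coefficient suits this comparison because genetic and physical distances are not linearly related along a chromosome; only agreement in marker order matters.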
Rajasekaran, S; Kanna, Rishi Mugesh; Senthil, Natesan; Raveendran, Muthuraja; Cheung, Kenneth M C; Chan, Danny; Subramaniam, Sakthikanal; Shetty, Ajoy Prasad
2013-10-01
Although the influence of genetics on the process of disc degeneration is well recognized, in recently published studies, there is a wide variation in the race and selection criteria for such study populations. More importantly, the radiographic features of disc degeneration that are selected to represent the disc degeneration phenotype are variable in these studies. The study presented here evaluates the association between single nucleotide polymorphisms (SNPs) of candidate genes and three distinct radiographic features that can be defined as the degenerative disc disease (DDD) phenotype. The study objectives were to examine the allelic diversity of 58 SNPs related to 35 candidate genes related to lumbar DDD, to evaluate the association in a hitherto unevaluated ethnic Indian population that represents more than one-sixth of the world population, and to analyze how genetic associations can vary in the same study subjects with the choice of phenotype. A cross-sectional, case-control study of an ethnic Indian population was carried out. Fifty-eight SNPs in 35 potential candidate genes were evaluated in 342 subjects and the associations were analyzed against three highly specific markers for DDD, namely disc degeneration by Pfirrmann grading, end-plate damage evaluated by total end-plate damage score, and annular tears evaluated by disc herniations and hyperintense zones. Genotyping of cases and controls was performed on a genome-wide SNP array to identify potential associated disease loci. The results from the genome-wide SNP array were then used to facilitate SNP selection and genotype validation was conducted using Sequenom-based genotyping. Eleven of the 58 SNPs provided evidence of association with one of the phenotypes. For annular tears, rs1042631 SNP of AGC1 and rs467691 SNP of ADAMTS5 were highly significantly associated (p<.01) and SNPs in NGFB, IL1B, IL18RAP, and MMP10 were also significantly associated (p<.05). 
The rs4076018 SNP of NGFB was highly significant (p<.01) and rs2292657 SNP of GLI1 was significantly (p<.05) correlated to disc degeneration. For end-plate damage, the rs2252070 SNP of MMP 13 showed a significant association (p<.05). Previously associated genes such as COL 9, SKT, CHST 3, CILP, IGFR, SOXp, BMP, MMP 2-12, ADH2, IL1RN, and COX2 were not significantly associated and new associations (NGFB and GLI1) were identified. The validity of all the associations was found to be phenotype dependent. For the first time, genetic associations with DDD have been performed in an Indian population. Apart from identifying new associations, the highlight of the study was that in the same study population with DDD, SNP associations completely changed when different radiographic features were used to define the DDD phenotype. Our study results therefore indicate that standardization of the phenotypes chosen to study the genetics of disc degeneration is essential and should be strongly considered before planning genetic association studies. Copyright © 2013 Elsevier Inc. All rights reserved.
Sunflower Hybrid Breeding: From Markers to Genomic Selection
Dimitrijevic, Aleksandra; Horn, Renate
2018-01-01
In sunflower, molecular markers for simple traits as, e.g., fertility restoration, high oleic acid content, herbicide tolerance or resistances to Plasmopara halstedii, Puccinia helianthi, or Orobanche cumana have been successfully used in marker-assisted breeding programs for years. However, agronomically important complex quantitative traits like yield, heterosis, drought tolerance, oil content or selection for disease resistance, e.g., against Sclerotinia sclerotiorum have been challenging and will require genome-wide approaches. Plant genetic resources for sunflower are being collected and conserved worldwide that represent valuable resources to study complex traits. Sunflower association panels provide the basis for genome-wide association studies, overcoming disadvantages of biparental populations. Advances in technologies and the availability of the sunflower genome sequence made novel approaches on the whole genome level possible. Genotype-by-sequencing, and whole genome sequencing based on next generation sequencing technologies facilitated the production of large amounts of SNP markers for high density maps as well as SNP arrays and allowed genome-wide association studies and genomic selection in sunflower. Genome wide or candidate gene based association studies have been performed for traits like branching, flowering time, resistance to Sclerotinia head and stalk rot. First steps in genomic selection with regard to hybrid performance and hybrid oil content have shown that genomic selection can successfully address complex quantitative traits in sunflower and will help to speed up sunflower breeding programs in the future. To make sunflower more competitive toward other oil crops higher levels of resistance against pathogens and better yield performance are required. In addition, optimizing plant architecture toward a more complex growth type for higher plant densities has the potential to considerably increase yields per hectare. 
Integrative approaches combining omic technologies (genomics, transcriptomics, proteomics, metabolomics and phenomics) using bioinformatic tools will facilitate the identification of target genes and markers for complex traits and will give a better insight into the mechanisms behind the traits. PMID:29387071
Rodriguez-Murillo, Laura; Fromer, Menachem; Mazaika, Erica; Vardarajan, Badri; Italia, Michael; Leipzig, Jeremy; DePalma, Steven R.; Golhar, Ryan; Sanders, Stephan J.; Yamrom, Boris; Ronemus, Michael; Iossifov, Ivan; Willsey, A. Jeremy; State, Matthew W.; Kaltman, Jonathan R.; White, Peter S.; Shen, Yufeng; Warburton, Dorothy; Brueckner, Martina; Seidman, Christine; Goldmuntz, Elizabeth; Gelb, Bruce D.; Lifton, Richard; Seidman, Jonathan; Hakonarson, Hakon; Chung, Wendy K.
2014-01-01
Rationale Congenital heart disease (CHD) is among the most common birth defects. Most cases are of unknown etiology. Objective To determine the contribution of de novo copy number variants (CNVs) in the etiology of sporadic CHD. Methods and Results We studied 538 CHD trios using genome-wide dense single nucleotide polymorphism (SNP) arrays and/or whole exome sequencing (WES). Results were experimentally validated using digital droplet PCR. We compared validated CNVs in CHD cases to CNVs in 1,301 healthy control trios. The two complementary high-resolution technologies identified 63 validated de novo CNVs in 51 CHD cases. A significant increase in CNV burden was observed when comparing CHD trios with healthy trios, using either SNP array (p=7×10^-5, Odds Ratio (OR)=4.6) or WES data (p=6×10^-4, OR=3.5) and remained after removing 16% of de novo CNV loci previously reported as pathogenic (p=0.02, OR=2.7). We observed recurrent de novo CNVs on 15q11.2 encompassing CYFIP1, NIPA1, and NIPA2 and single de novo CNVs encompassing DUSP1, JUN, JUP, MED15, MED9, PTPRE, SREBF1, TOP2A, and ZEB2, genes that interact with established CHD proteins NKX2-5 and GATA4. Integrating de novo variants in WES and CNV data suggests that ETS1 is the pathogenic gene altered by 11q24.2-q25 deletions in Jacobsen syndrome and that CTBP2 is the pathogenic gene in 10q sub-telomeric deletions. Conclusions We demonstrate a significantly increased frequency of rare de novo CNVs in CHD patients compared with healthy controls and suggest several novel genetic loci for CHD. PMID:25205790
Al-Mamun, Hawlader A; Kwan, Paul; Clark, Samuel A; Ferdosi, Mohammad H; Tellam, Ross; Gondro, Cedric
2015-08-14
Body weight (BW) is an important trait for meat production in sheep. Although over the past few years, numerous quantitative trait loci (QTL) have been detected for production traits in cattle, few QTL studies have been reported for sheep, with even fewer on meat production traits. Our objective was to perform a genome-wide association study (GWAS) with the medium-density Illumina Ovine SNP50 BeadChip to identify genomic regions and corresponding haplotypes associated with BW in Australian Merino sheep. A total of 1781 Australian Merino sheep were genotyped using the medium-density Illumina Ovine SNP50 BeadChip. Among the 53 862 single nucleotide polymorphisms (SNPs) on this array, 48 640 were used to perform a GWAS using a linear mixed model approach. Genotypes were phased with hsphase; to estimate SNP haplotype effects, linkage disequilibrium blocks were identified in the detected QTL region. Thirty-nine SNPs were associated with BW at a Bonferroni-corrected genome-wide significance threshold of 1%. One region on sheep (Ovis aries) chromosome 6 (OAR6) between 36.15 and 38.56 Mb, included 13 significant SNPs that were associated with BW; the most significant SNP was OAR6_41936490.1 (P = 2.37 × 10^-16) at 37.69 Mb with an allele substitution effect of 2.12 kg, which corresponds to 0.248 phenotypic standard deviations for BW. The region that surrounds this association signal on OAR6 contains three genes: leucine aminopeptidase 3 (LAP3), which is involved in the processing of the oxytocin precursor; NCAPG non-SMC condensin I complex, subunit G (NCAPG), which is associated with foetal growth and carcass size in cattle; and ligand dependent nuclear receptor corepressor-like (LCORL), which is associated with height in humans and cattle. The GWAS analysis detected 39 SNPs associated with BW in sheep and a major QTL region was identified on OAR6. 
In several other mammalian species, regions that are syntenic with this region have been found to be associated with body size traits, which may reflect that the underlying biological mechanisms share a common ancestry. These findings should facilitate the discovery of causative variants for BW and contribute to marker-assisted selection.
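The significance threshold and the standardized effect size quoted above can be reproduced with simple arithmetic. The sketch below assumes the Bonferroni correction divides the 1% level by the 48 640 SNPs analysed, which the abstract implies but does not state explicitly; the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test P-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumption: one test per SNP retained for the GWAS.
threshold = bonferroni_threshold(0.01, 48640)  # about 2.06e-07

# The top SNP reported in the abstract clears this threshold by many orders of magnitude.
p_top = 2.37e-16
is_significant = p_top < threshold

# An effect of 2.12 kg equal to 0.248 phenotypic SD implies the SD of body weight.
implied_sd_kg = 2.12 / 0.248  # about 8.55 kg
```

Bonferroni correction is conservative when SNPs are in linkage disequilibrium, because correlated tests are counted as if they were independent.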
RAD tag sequencing as a source of SNP markers in Cynara cardunculus L
2012-01-01
Background The globe artichoke (Cynara cardunculus L. var. scolymus) genome is relatively poorly explored, especially compared to those of the other major Asteraceae crops sunflower and lettuce. No SNP markers are in the public domain. We have combined the recently developed restriction-site associated DNA (RAD) approach with the Illumina DNA sequencing platform to effect the rapid and mass discovery of SNP markers for C. cardunculus. Results RAD tags were sequenced from the genomic DNA of three C. cardunculus mapping population parents, generating 9.7 million reads, corresponding to ~1 Gbp of sequence. An assembly based on paired ends produced ~6.0 Mbp of genomic sequence, separated into ~19,000 contigs (mean length 312 bp), of which ~21% were fragments of putative coding sequence. The shared sequences allowed for the discovery of ~34,000 SNPs and nearly 800 indels, equivalent to a SNP frequency of 5.6 per 1,000 nt, and an indel frequency of 0.2 per 1,000 nt. A sample of heterozygous SNP loci was mapped by CAPS assays and this exercise provided validation of our mining criteria. The repetitive fraction of the genome had a high representation of retrotransposon sequence, followed by simple repeats, AT-low complexity regions and mobile DNA elements. The genomic k-mer distribution and CpG rate of C. cardunculus, compared with data derived from three whole genome-sequenced dicot species, provided further evidence of the random representation of the C. cardunculus genome generated by RAD sampling. Conclusion The RAD tag sequencing approach is a cost-effective and rapid method to develop SNP markers in a highly heterozygous species. Our approach generated a large and robust SNP dataset through the adoption of optimized filtering criteria. PMID:22214349
2011-01-01
Background Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence. Results An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii-the diploid source of the wheat D genome, and with a genome size of 4.02 Gb, of which 90% is repetitive sequences. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 was sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome. 
To assess the false positive SNP discovery rate, DNA containing putative SNPs was amplified by PCR from AL8/78 and AS75 and resequenced with the ABI 3730 xl. In a sample of 302 randomly selected putative SNPs, 84.0% in gene regions, 88.0% in repeat junctions, and 81.3% in uncharacterized regions were validated. Conclusion An annotation-based genome-wide SNP discovery pipeline for NGS platforms was developed. The pipeline is suitable for SNP discovery in genomic libraries of complex genomes and does not require a reference genome sequence. The pipeline is applicable to all current NGS platforms, provided that at least one such platform generates relatively long reads. The pipeline package, AGSNP, and the discovered 497,118 Ae. tauschii SNPs can be accessed at (http://avena.pw.usda.gov/wheatD/agsnp.shtml). PMID:21266061
snpTree--a web-server to identify and construct SNP trees from whole genome sequence data.
Leekitcharoenphon, Pimlapas; Kaas, Rolf S; Thomsen, Martin Christen Frølund; Friis, Carsten; Rasmussen, Simon; Aarestrup, Frank M
2012-01-01
Advances in whole genome sequencing (WGS) and its decreasing cost will soon make this technology available for routine infectious disease epidemiology. In epidemiological studies, outbreak isolates have very little diversity and require extensive genomic analysis to differentiate and classify isolates. One successful and broadly used method is analysis of single nucleotide polymorphisms (SNPs). Currently, there are different tools and methods to identify SNPs, including various options and cut-off values. Furthermore, all current methods require bioinformatic skills. Thus, we lack a standard and simple automatic tool to determine SNPs and construct a phylogenetic tree from WGS data. Here we introduce snpTree, a server for automatic online SNP analysis. This tool is composed of different SNP analysis suites and perl and python scripts. snpTree can identify SNPs and construct phylogenetic trees from WGS as well as from assembled genomes or contigs. WGS data in fastq format are aligned to reference genomes by BWA while contigs in fasta format are processed by Nucmer. SNPs are concatenated based on position on the reference genome and a tree is constructed from the concatenated SNPs using FastTree and a perl script. The online server was implemented with HTML, Java and python scripts. The server was evaluated using four published bacterial WGS data sets (V. cholerae, S. aureus CC398, S. Typhimurium and M. tuberculosis). The evaluation results for the first three cases were consistent and concordant for both raw reads and assembled genomes. In the latter case the original publication involved extensive filtering of SNPs, which could not be repeated using snpTree. The snpTree server is an easy-to-use option for rapid, standardised and automatic SNP analysis in epidemiological studies, also for users with limited bioinformatic experience. The web server is freely accessible at http://www.cbs.dtu.dk/services/snpTree-1.0/.
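The concatenation step described above (SNPs ordered by reference position and joined into one sequence per isolate) can be sketched as follows. This is not the snpTree code, which the abstract says uses perl and python scripts without giving details; the function names, the input layout and the example calls are all hypothetical. The sketch also assumes that an isolate with no call at a position carries the reference base, which a real pipeline would need to distinguish from missing coverage.

```python
def concatenate_snps(reference, calls):
    """Build a pseudo-alignment of variable sites, one sequence per isolate.

    reference: {position: reference base} for every position that appears in `calls`
    calls: {isolate name: {position: alternate base}}
    """
    # Every position at which at least one isolate differs from the reference, in genome order.
    positions = sorted({pos for snps in calls.values() for pos in snps})
    alignment = {"reference": "".join(reference[p] for p in positions)}
    for name, snps in calls.items():
        # Simplifying assumption: no call means the isolate matches the reference.
        alignment[name] = "".join(snps.get(p, reference[p]) for p in positions)
    return alignment


def to_fasta(alignment):
    """Serialize the pseudo-alignment as FASTA text for a tree-building program."""
    return "".join(f">{name}\n{seq}\n" for name, seq in alignment.items())


# Hypothetical SNP calls for two isolates against one reference genome.
reference = {101: "A", 250: "G", 777: "T"}
calls = {"isolate1": {101: "C"}, "isolate2": {250: "A", 777: "C"}}
alignment = concatenate_snps(reference, calls)
fasta_text = to_fasta(alignment)
```

Because every sequence has one column per variable position, the output is already aligned and can be passed to a phylogeny program without a separate alignment step.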
DOE Office of Scientific and Technical Information (OSTI.GOV)
With the flood of whole genome finished and draft microbial sequences, we need faster, more scalable bioinformatics tools for sequence comparison. An algorithm is described to find single nucleotide polymorphisms (SNPs) in whole genome data. It scales to hundreds of bacterial or viral genomes, and can be used for finished and/or draft genomes available as unassembled contigs or raw, unassembled reads. The method is fast to compute, finding SNPs and building a SNP phylogeny in minutes to hours, depending on the size and diversity of the input sequences. The SNP-based trees that result are consistent with known taxonomy and trees determined in other studies. The approach we describe can handle many gigabases of sequence in a single run. The algorithm is based on k-mer analysis.
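The abstract says only that the algorithm is based on k-mer analysis. One common way to find SNPs from k-mers, shown here as an illustrative assumption rather than the authors' method, is to look for odd-length k-mers that share both flanks across genomes but differ at the central base. The function name and example sequences are hypothetical, and the sketch ignores reverse complements and sequencing errors.

```python
from collections import defaultdict


def kmer_snps(genomes, k=7):
    """Find candidate SNPs as k-mers with identical flanks and a variable middle base.

    genomes: {genome name: sequence string}
    Returns {flank pattern with '.' at the centre: {genome name: middle base}}.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so that the k-mer has a single central base")
    mid = k // 2
    # flank pattern -> genome name -> set of central bases seen with that flank
    seen = defaultdict(lambda: defaultdict(set))
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            flank = kmer[:mid] + "." + kmer[mid + 1:]
            seen[flank][name].add(kmer[mid])
    snps = {}
    for flank, per_genome in seen.items():
        # Skip flanks that occur with more than one central base inside a single
        # genome (for example in repeats), because the locus is then ambiguous.
        if any(len(bases) > 1 for bases in per_genome.values()):
            continue
        alleles = {name: next(iter(bases)) for name, bases in per_genome.items()}
        # A candidate SNP needs at least two different central bases across genomes.
        if len(set(alleles.values())) > 1:
            snps[flank] = alleles
    return snps


# Two hypothetical sequences that differ at a single position (G versus C).
example = {"g1": "ACGTACGGATCCA", "g2": "ACGTACGCATCCA"}
candidates = kmer_snps(example, k=7)
```

Because the comparison needs no alignment or reference genome, the same procedure applies to finished genomes, contigs or raw reads, which matches the scaling claim in the abstract.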
Montanari, Sara; Saeed, Munazza; Knäbel, Mareike; Kim, YoonKyeong; Troggio, Michela; Malnoy, Mickael; Velasco, Riccardo; Fontana, Paolo; Won, KyungHo; Durel, Charles-Eric; Perchepied, Laure; Schaffer, Robert; Wiedow, Claudia; Bus, Vincent; Brewer, Lester; Gardiner, Susan E; Crowhurst, Ross N; Chagné, David
2013-01-01
We have used new generation sequencing (NGS) technologies to identify single nucleotide polymorphism (SNP) markers from three European pear (Pyrus communis L.) cultivars and subsequently developed a subset of 1096 pear SNPs into high throughput markers by combining them with the set of 7692 apple SNPs on the IRSC apple Infinium® II 8K array. We then evaluated this apple and pear Infinium® II 9K SNP array for large-scale genotyping in pear across several species, using both pear and apple SNPs. The segregating populations employed for array validation included a segregating population of European pear ('Old Home'×'Louise Bon Jersey') and four interspecific breeding families derived from Asian (P. pyrifolia Nakai and P. bretschneideri Rehd.) and European pear pedigrees. In total, we mapped 857 polymorphic pear markers to construct the first SNP-based genetic maps for pear, comprising 78% of the total pear SNPs included in the array. In addition, 1031 SNP markers derived from apple (13% of the total apple SNPs included in the array) were polymorphic and were mapped in one or more of the pear populations. These results are the first to demonstrate SNP transferability across the genera Malus and Pyrus. Our construction of high density SNP-based and gene-based genetic maps in pear represents an important step towards the identification of chromosomal regions associated with a range of horticultural characters, such as pest and disease resistance, orchard yield and fruit quality.
CMPK1 and RBP3 are associated with corneal curvature in Asian populations.
Chen, Peng; Miyake, Masahiro; Fan, Qiao; Liao, Jiemin; Yamashiro, Kenji; Ikram, Mohammad K; Chew, Merywn; Vithana, Eranga N; Khor, Chiea-Chuen; Aung, Tin; Tai, E-Shyong; Wong, Tien-Yin; Teo, Yik-Ying; Yoshimura, Nagahisa; Saw, Seang-Mei; Cheng, Ching-Yu
2014-11-15
Corneal curvature (CC) measures the steepness of the cornea and is an important clinical parameter in diseases such as astigmatism and myopia. Despite the high heritability of CC, only two associated genes have been discovered to date. We performed a three-stage genome-wide association study meta-analysis in 12 660 Asian individuals. Our Stage 1 was done in multiethnic cohorts comprising 7440 individuals, followed by a Stage 2 replication in 2473 Chinese and Stage 3 in 2747 Japanese. The SNP array genotype data were imputed up to the 1000 Genomes Project Phase 1 cosmopolitan panel. The SNP association with the radii of CC was investigated in a linear regression model adjusted for age, gender and principal components. In addition to the known genes, MTOR (also known as FRAP1) and PDGFRA, we discovered two novel genes associated with CC: CMPK1 (rs17103186, P = 3.3 × 10^-12) and RBP3 (rs11204213 [Val884Met], P = 1.1 × 10^-13). The missense RBP3 SNP, rs11204213, was also associated with axial length (AL) (P = 4.2 × 10^-6) and had larger effects on both CC and AL compared with other SNPs. The index SNPs at the four indicated loci explained 1.9% of CC variance across the Stages 1 and 2 cohorts, while 33.8% of CC variance was explained by the genome-wide imputation data. We identified two novel genes influencing CC, which are related to either corneal shape or eye size. This study provides additional insights into genetic architecture of corneal shape. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Liu, Weizhen; Maccaferri, Marco; Chen, Xianming; Laghetti, Gaetano; Pignone, Domenico; Pumphrey, Michael; Tuberosa, Roberto
2017-11-01
SNP-based genome scanning in worldwide domesticated emmer germplasm showed high genetic diversity, rapid linkage disequilibrium decay and 51 loci for stripe rust resistance, a large proportion of which were novel. Cultivated emmer wheat (Triticum turgidum ssp. dicoccum), one of the oldest domesticated crops in the world, is a potentially rich reservoir of variation for improvement of resistance/tolerance to biotic and abiotic stresses in wheat. Resistance to stripe rust (Puccinia striiformis f. sp. tritici) in emmer wheat has been under-investigated. Here, we employed genome-wide association (GWAS) mapping with a mixed linear model to dissect effective stripe rust resistance loci in a worldwide collection of 176 cultivated emmer wheat accessions. Adult plants were tested in six environments and seedlings were evaluated with five races from the United States and one from Italy under greenhouse conditions. Five accessions were resistant across all experiments. The panel was genotyped with the wheat 90,000 Illumina iSelect single nucleotide polymorphism (SNP) array and 5106 polymorphic SNP markers with mapped positions were obtained. A high level of genetic diversity and fast linkage disequilibrium decay were observed. In total, we identified 14 loci associated with field resistance in multiple environments. Thirty-seven loci were significantly associated with all-stage (seedling) resistance and six of them were effective against multiple races. Of the 51 total loci, 29 were mapped distantly from previously reported stripe rust resistance genes or quantitative trait loci and represent newly discovered resistance loci. Our results suggest that GWAS is an effective method for characterizing genes in cultivated emmer wheat and confirm that emmer wheat is a rich source of stripe rust resistance loci that can be used for wheat improvement.
Lopes, F B; Wu, X-L; Li, H; Xu, J; Perkins, T; Genho, J; Ferretti, R; Tait, R G; Bauck, S; Rosa, G J M
2018-02-01
Reliable genomic prediction of breeding values for quantitative traits requires the availability of a sufficient number of animals with genotypes and phenotypes in the training set. As of 31 October 2016, there were 3,797 Brangus animals with genotypes and phenotypes. These Brangus animals were genotyped using different commercial SNP chips. Of them, the largest group consisted of 1,535 animals genotyped by the GGP-LDV4 SNP chip. The remaining 2,262 genotypes were imputed to the SNP content of the GGP-LDV4 chip, so that the number of animals available for training the genomic prediction models was more than doubled. The present study showed that the pooling of animals with either original or imputed 40K SNP genotypes substantially increased genomic prediction accuracies on the ten traits. By supplementing imputed genotypes, the relative gains in genomic prediction accuracies on estimated breeding values (EBV) were from 12.60% to 31.27%, and the relative gains in genomic prediction accuracies on de-regressed EBV were slightly smaller (0.87%-18.75%). The present study also compared the performance of five genomic prediction models and two cross-validation methods. The five genomic models predicted EBV and de-regressed EBV of the ten traits similarly well. Of the two cross-validation methods, leave-one-out cross-validation maximized the number of animals at the stage of training for genomic prediction. Genomic prediction accuracy (GPA) on the ten quantitative traits was validated in 1,106 newly genotyped Brangus animals based on the SNP effects estimated in the previous set of 3,797 Brangus animals, and the resulting accuracies were slightly lower than GPA in the original data. The present study was the first to leverage currently available genotype and phenotype resources in order to harness genomic prediction in Brangus beef cattle. © 2018 Blackwell Verlag GmbH.
Breeding and Genetics Symposium: networks and pathways to guide genomic selection.
Snelling, W M; Cushman, R A; Keele, J W; Maltecca, C; Thomas, M G; Fortes, M R S; Reverter, A
2013-02-01
Many traits affecting profitability and sustainability of meat, milk, and fiber production are polygenic, with no single gene having an overwhelming influence on observed variation. No knowledge of the specific genes controlling these traits has been needed to make substantial improvement through selection. Significant gains have been made through phenotypic selection enhanced by pedigree relationships and continually improving statistical methodology. Genomic selection, recently enabled by assays for dense SNP located throughout the genome, promises to increase selection accuracy and accelerate genetic improvement by emphasizing the SNP most strongly correlated to phenotype although the genes and sequence variants affecting phenotype remain largely unknown. These genomic predictions theoretically rely on linkage disequilibrium (LD) between genotyped SNP and unknown functional variants, but familial linkage may increase effectiveness when predicting individuals related to those in the training data. Genomic selection with functional SNP genotypes should be less reliant on LD patterns shared by training and target populations, possibly allowing robust prediction across unrelated populations. Although the specific variants causing polygenic variation may never be known with certainty, a number of tools and resources can be used to identify those most likely to affect phenotype. Associations of dense SNP genotypes with phenotype provide a 1-dimensional approach for identifying genes affecting specific traits; in contrast, associations with multiple traits allow defining networks of genes interacting to affect correlated traits. Such networks are especially compelling when corroborated by existing functional annotation and established molecular pathways. The SNP occurring within network genes, obtained from public databases or derived from genome and transcriptome sequences, may be classified according to expected effects on gene products. 
As illustrated by functionally informed genomic predictions being more accurate than naive whole-genome predictions of beef tenderness, coupling evidence from livestock genotypes, phenotypes, gene expression, and genomic variants with existing knowledge of gene functions and interactions may provide greater insight into the genes and genomic mechanisms affecting polygenic traits and facilitate functional genomic selection for economically important traits.
Vitis Phylogenomics: Hybridization Intensities from a SNP Array Outperform Genotype Calls
Miller, Allison J.; Matasci, Naim; Schwaninger, Heidi; Aradhya, Mallikarjuna K.; Prins, Bernard; Zhong, Gan-Yuan; Simon, Charles; Buckler, Edward S.; Myles, Sean
2013-01-01
Understanding relationships among species is a fundamental goal of evolutionary biology. Single nucleotide polymorphisms (SNPs) identified through next generation sequencing and related technologies enable phylogeny reconstruction by providing unprecedented numbers of characters for analysis. One approach to SNP-based phylogeny reconstruction is to identify SNPs in a subset of individuals, and then to compile SNPs on an array that can be used to genotype additional samples at hundreds or thousands of sites simultaneously. Although powerful and efficient, this method is subject to ascertainment bias because applying variation discovered in a representative subset to a larger sample favors identification of SNPs with high minor allele frequencies and introduces bias against rare alleles. Here, we demonstrate that the use of hybridization intensity data, rather than genotype calls, reduces the effects of ascertainment bias. Whereas traditional SNP calls assess known variants based on diversity housed in the discovery panel, hybridization intensity data survey variation in the broader sample pool, regardless of whether those variants are present in the initial SNP discovery process. We apply SNP genotype and hybridization intensity data derived from the Vitis9kSNP array developed for grape to show the effects of ascertainment bias and to reconstruct evolutionary relationships among Vitis species. We demonstrate that phylogenies constructed using hybridization intensities suffer less from the distorting effects of ascertainment bias, and are thus more accurate than phylogenies based on genotype calls. Moreover, we reconstruct the phylogeny of the genus Vitis using hybridization data, show that North American subgenus Vitis species are monophyletic, and resolve several previously poorly known relationships among North American species. 
This study builds on earlier work that applied the Vitis9kSNP array to evolutionary questions within Vitis vinifera and has general implications for addressing ascertainment bias in array-enabled phylogeny reconstruction. PMID:24236035
Application of genomic selection in farm animal breeding.
Tan, Cheng; Bian, Cheng; Yang, Da; Li, Ning; Wu, Zhen-Fang; Hu, Xiao-Xiang
2017-11-20
Genomic selection (GS) has become a widely accepted method in animal breeding to genetically improve economic traits. With the declining costs of high-density SNP chips and next-generation sequencing, GS has been applied in dairy cattle, swine, poultry and other animals and gained varying degrees of success. Currently, major challenges in GS studies include further reducing the cost of genome-wide SNP genotyping and improving the predictive accuracy of genomic estimated breeding value (GEBV). In this review, we summarize various methods for genome-wide SNP genotyping and GEBV prediction, and give a brief introduction of GS in livestock and poultry breeding. This review will provide a reference for further implementation of GS in farm animal breeding.
Genomic analysis of cow mortality and milk production using a threshold-linear model.
Tsuruta, S; Lourenco, D A L; Misztal, I; Lawlor, T J
2017-09-01
The objective of this study was to investigate the feasibility of genomic evaluation for cow mortality and milk production using a single-step methodology. Genomic relationships between cow mortality and milk production were also analyzed. Data included 883,887 (866,700) first-parity, 733,904 (711,211) second-parity, and 516,256 (492,026) third-parity records on cow mortality (305-d milk yields) of Holsteins from Northeast states in the United States. The pedigree consisted of up to 1,690,481 animals including 34,481 bulls genotyped with 36,951 SNP markers. Analyses were conducted with a bivariate threshold-linear model for each parity separately. Genomic information was incorporated as a genomic relationship matrix in the single-step BLUP. Traditional and genomic estimated breeding values (GEBV) were obtained with Gibbs sampling using fixed variances, whereas reliabilities were calculated from variances of GEBV samples. Genomic EBV were then converted into single nucleotide polymorphism (SNP) marker effects. Those SNP effects were categorized according to values corresponding to 1 to 4 standard deviations. Moving averages and variances of SNP effects were calculated for windows of 30 adjacent SNP, and Manhattan plots were created for SNP variances with the same window size. Using Gibbs sampling, the reliability for genotyped bulls for cow mortality was 28 to 30% in EBV and 70 to 72% in GEBV. The reliability for genotyped bulls for 305-d milk yields was 53 to 65% in EBV and 81 to 85% in GEBV. Correlations of SNP effects between mortality and 305-d milk yields within categories were the highest with the largest SNP effects and reached >0.7 at 4 standard deviations. All SNP regions explained less than 0.6% of the genetic variance for both traits, except regions close to the DGAT1 gene, which explained up to 2.5% for cow mortality and 4% for 305-d milk yields. Reliability for GEBV with a moderate number of genotyped animals can be calculated by Gibbs samples. 
Genomic information can greatly increase the reliability of predictions not only for milk but also for mortality. The existence of a common region on Bos taurus autosome 14 affecting both traits may indicate a major gene with a pleiotropic effect on milk and mortality. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
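The windowing step described in this abstract (moving averages and variances of SNP effects over 30 adjacent SNP) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `window_stats`, the use of the sample variance (`ddof=1`), the step of one SNP between windows, and the simulated effects are all hypothetical choices, not details taken from the study.

```python
import numpy as np

def window_stats(snp_effects, window=30):
    """Moving mean and sample variance of SNP effects over adjacent SNP."""
    effects = np.asarray(snp_effects, dtype=float)
    if effects.size < window:
        raise ValueError("need at least `window` SNP effects")
    # Each row of `windows` holds one run of `window` adjacent SNP effects.
    windows = np.lib.stride_tricks.sliding_window_view(effects, window)
    return windows.mean(axis=1), windows.var(axis=1, ddof=1)

# Simulated SNP effects in map order (assumed values, not real data).
rng = np.random.default_rng(0)
effects = rng.normal(0.0, 0.01, size=1000)
means, variances = window_stats(effects, window=30)
```

Plotting `variances` against the position of each window would give a Manhattan-style view of where SNP-effect variance concentrates along the genome.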
LS-SNP/PDB: annotated non-synonymous SNPs mapped to Protein Data Bank structures.
Ryan, Michael; Diekhans, Mark; Lien, Stephanie; Liu, Yun; Karchin, Rachel
2009-06-01
LS-SNP/PDB is a new WWW resource for genome-wide annotation of human non-synonymous (amino acid changing) SNPs. It serves high-quality protein graphics rendered with UCSF Chimera molecular visualization software. The system is kept up-to-date by an automated, high-throughput build pipeline that systematically maps human nsSNPs onto Protein Data Bank structures and annotates several biologically relevant features. LS-SNP/PDB is available at (http://ls-snp.icm.jhu.edu/ls-snp-pdb) and via links from protein data bank (PDB) biology and chemistry tabs, UCSC Genome Browser Gene Details and SNP Details pages and PharmGKB Gene Variants Downloads/Cross-References pages.
Fast and Accurate Approximation to Significance Tests in Genome-Wide Association Studies
Zhang, Yu; Liu, Jun S.
2011-01-01
Genome-wide association studies commonly involve simultaneous tests of millions of single nucleotide polymorphisms (SNP) for disease association. The SNPs in nearby genomic regions, however, are often highly correlated due to linkage disequilibrium (LD, a genetic term for correlation). Simple Bonferroni correction for multiple comparisons is therefore too conservative. Permutation tests, which are often employed in practice, are both computationally expensive for genome-wide studies and limited in scope. We present an accurate and computationally efficient method, based on Poisson de-clumping heuristics, for approximating genome-wide significance of SNP associations. Compared with permutation tests and other multiple comparison adjustment approaches, our method computes the most accurate and robust p-value adjustments for millions of correlated comparisons within seconds. We demonstrate analytically that the accuracy and the efficiency of our method are nearly independent of the sample size, the number of SNPs, and the scale of p-values to be adjusted. In addition, our method can be easily adapted to estimate false discovery rate. When applied to genome-wide SNP datasets, we observed highly variable p-value adjustment results evaluated from different genomic regions. The variation in adjustments along the genome, however, is well conserved between the European and the African populations. The p-value adjustments are significantly correlated with LD among SNPs, recombination rates, and SNP densities. Given the large variability of sequence features in the genome, we further discuss a novel approach of using SNP-specific (local) thresholds to detect genome-wide significant associations. This article has supplementary material online. PMID:22140288
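For orientation, the simple Bonferroni correction that this abstract calls too conservative can be written in a few lines. The sketch below is not the authors' Poisson de-clumping method; the function names and the example figure of one million tests are assumptions chosen for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]

# With an assumed one million tests, alpha = 0.05 gives a per-SNP threshold of 5e-8.
threshold = bonferroni_threshold(0.05, 1_000_000)
adjusted = bonferroni_adjust([0.01, 0.5, 1e-9])
```

Because the correction treats every test as independent, SNPs in strong LD are counted as separate tests, which is the source of the conservativeness noted above.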
Yazar, Seyhan; Mishra, Aniket; Ang, Wei; Kearns, Lisa S; Mountain, Jenny A; Pennell, Craig; Montgomery, Grant W; Young, Terri L; Hammond, Christopher J; Macgregor, Stuart; Mackey, David A; Hewitt, Alex W
2013-01-01
Corneal astigmatism is a common eye disorder characterized by irregularities in corneal curvature. Recently, the rs7677751 single nucleotide polymorphism (SNP) at the platelet-derived growth factor receptor alpha (PDGFRA) locus was found to be associated with corneal astigmatism in people of Asian ancestry. In the present study, we sought to replicate this finding and identify other genetic markers of corneal astigmatism in an Australian population of Northern European ancestry. Data from two cohorts were included in this study. The first cohort consisted of 1,013 individuals who were part of the Western Australian Pregnancy Cohort (Raine) Study: 20-year follow-up Eye Study. The second cohort comprised 1,788 individuals of 857 twin families who were recruited through the Twins Eye Study in Tasmania and the Brisbane Adolescent Twin Study. Corneal astigmatism was calculated as the absolute difference between the keratometry readings in two meridians, and genotype data were extracted from genome-wide arrays. Initially, each cohort was analyzed separately, before being combined for meta-analysis and subsequent genome-wide pathway analysis. Following meta-analysis, SNP rs7677751 at the PDGFRA locus had a combined p=0.32. No variant was found to be statistically significantly associated with corneal astigmatism at the genome-wide level (p<5.0×10⁻⁸). The SNP with the strongest association was rs1164064 (p=1.86×10⁻⁶) on chromosome 3q13. Gene-based pathway analysis identified a significant association with the Gene Ontology "segmentation" (GO:0035282) pathway, corrected p=0.009. Our data suggest that the PDGFRA locus does not confer a major risk of corneal astigmatism in people of Northern European ancestry. Better-powered studies are required to validate the novel putative findings of our study.
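The phenotype definition in this abstract (absolute difference between keratometry readings in two meridians) is simple enough to state as code. The function name and the example readings below are made up for illustration and are not study data.

```python
def corneal_astigmatism(k_meridian_1, k_meridian_2):
    """Absolute difference between two keratometry readings given in the same units."""
    return abs(k_meridian_1 - k_meridian_2)

# Hypothetical pairs of keratometry readings, one pair per eye.
readings = [(43.25, 44.00), (42.50, 42.50), (45.00, 43.50)]
astigmatism = [corneal_astigmatism(k1, k2) for k1, k2 in readings]
```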
Yuan, Jingwei; Sun, Congjiao; Dou, Taocun; Yi, Guoqiang; Qu, LuJiang; Qu, Liang; Wang, Kehua; Yang, Ning
2015-01-01
Egg number (EN), egg laying rate (LR) and age at first egg (AFE) are important production traits related to egg production in the poultry industry. To better understand the genetic architecture of dynamic EN during the whole laying cycle and to provide precise positions of variants associated with EN, LR and AFE, laying records from 21 to 72 weeks of age were collected individually for 1,534 F2 hens produced by reciprocal crosses between White Leghorn and Dongxiang Blue-shelled chicken, and their genotypes were assayed by chicken 600 K Affymetrix high density genotyping arrays. Subsequently, pedigree and SNP-based genetic parameters were estimated and a genome-wide association study (GWAS) was conducted on EN, LR and AFE. The heritability estimates were similar between pedigree and SNP-based estimates, varying from 0.17 to 0.36. In the GWA analysis, we identified nine genome-wide significant loci associated with EN of the laying periods from 21 to 26 weeks, 27 to 36 weeks and 37 to 72 weeks. Analysis of GTF2A1 and CLSPN suggested that they influenced the function of the ovary and uterus, and they may be considered as relevant candidates. The identified SNP rs314448799 for accumulative EN from 21 to 40 weeks on chromosome 5 created phenotypic differences of 6.86 eggs between two homozygous genotypes, which could potentially be applied to molecular breeding for EN selection. Moreover, our findings showed that LR was a moderate polygenic trait. The suggestive significant region on chromosome 16 for AFE suggested a relationship between sexual maturity and immunity in the current population. The present study comprehensively evaluates the role of genetic variants in the development of egg laying. The findings will be helpful for investigating the function of causative genes and for future marker-assisted selection and genomic selection in chickens.
Almlöf, Jonas Carlsson; Lundmark, Per; Lundmark, Anders; Ge, Bing; Maouche, Seraya; Göring, Harald H. H.; Liljedahl, Ulrika; Enström, Camilla; Brocheton, Jessy; Proust, Carole; Godefroy, Tiphaine; Sambrook, Jennifer G.; Jolley, Jennifer; Crisp-Hihn, Abigail; Foad, Nicola; Lloyd-Jones, Heather; Stephens, Jonathan; Gwilliam, Rhian; Rice, Catherine M.; Hengstenberg, Christian; Samani, Nilesh J.; Erdmann, Jeanette; Schunkert, Heribert; Pastinen, Tomi; Deloukas, Panos; Goodall, Alison H.; Ouwehand, Willem H.; Cambien, François; Syvänen, Ann-Christine
2012-01-01
A large number of genome-wide association studies have been performed during the past five years to identify associations between SNPs and human complex diseases and traits. The assignment of a functional role for the identified disease-associated SNP is not straight-forward. Genome-wide expression quantitative trait locus (eQTL) analysis is frequently used as the initial step to define a function while allele-specific gene expression (ASE) analysis has not yet gained a wide-spread use in disease mapping studies. We compared the power to identify cis-acting regulatory SNPs (cis-rSNPs) by genome-wide allele-specific gene expression (ASE) analysis with that of traditional expression quantitative trait locus (eQTL) mapping. Our study included 395 healthy blood donors for whom global gene expression profiles in circulating monocytes were determined by Illumina BeadArrays. ASE was assessed in a subset of these monocytes from 188 donors by quantitative genotyping of mRNA using a genome-wide panel of SNP markers. The performance of the two methods for detecting cis-rSNPs was evaluated by comparing associations between SNP genotypes and gene expression levels in sample sets of varying size. We found that up to 8-fold more samples are required for eQTL mapping to reach the same statistical power as that obtained by ASE analysis for the same rSNPs. The performance of ASE is insensitive to SNPs with low minor allele frequencies and detects a larger number of significantly associated rSNPs using the same sample size as eQTL mapping. An unequivocal conclusion from our comparison is that ASE analysis is more sensitive for detecting cis-rSNPs than standard eQTL mapping. Our study shows the potential of ASE mapping in tissue samples and primary cells which are difficult to obtain in large numbers. PMID:23300628
Genome-wide SNP identification and QTL mapping for black rot resistance in cabbage.
Lee, Jonghoon; Izzah, Nur Kholilatul; Jayakodi, Murukarthick; Perumal, Sampath; Joh, Ho Jun; Lee, Hyeon Ju; Lee, Sang-Choon; Park, Jee Young; Yang, Ki-Woung; Nou, Il-Sup; Seo, Joodeok; Yoo, Jaeheung; Suh, Youngdeok; Ahn, Kyounggu; Lee, Ji Hyun; Choi, Gyung Ja; Yu, Yeisoo; Kim, Heebal; Yang, Tae-Jin
2015-02-03
Black rot is a destructive bacterial disease causing large yield and quality losses in Brassica oleracea. To detect quantitative trait loci (QTL) for black rot resistance, we performed whole-genome resequencing of two cabbage parental lines and genome-wide SNP identification using the recently published B. oleracea genome sequences as reference. Approximately 11.5 Gb of sequencing data was produced from each parental line. Reference genome-guided mapping and SNP calling revealed 674,521 SNPs between the two cabbage lines, with an average of one SNP per 662.5 bp. Among 167 dCAPS markers derived from candidate SNPs, 117 (70.1%) were validated as bona fide SNPs showing polymorphism between the parental lines. We then improved the resolution of a previous genetic map by adding 103 markers including 87 SNP-based dCAPS markers. The new map is composed of 368 markers and covers 1467.3 cM with an average interval of 3.88 cM between adjacent markers. We evaluated black rot resistance in the mapping population in three independent inoculation tests using F2:3 progenies and identified one major QTL and three minor QTLs. We report successful utilization of whole-genome resequencing for large-scale SNP identification and development of molecular markers for genetic map construction. In addition, we identified novel QTLs for black rot resistance. The high-density genetic map will promote QTL analysis for other important agricultural traits and marker-assisted breeding of B. oleracea.
Fang, Lingzhao; Sahana, Goutam; Ma, Peipei; Su, Guosheng; Yu, Ying; Zhang, Shengli; Lund, Mogens Sandø; Sørensen, Peter
2017-05-12
A better understanding of the genetic architecture of complex traits can contribute to improving genomic prediction. We hypothesized that genomic variants associated with mastitis and milk production traits in dairy cattle are enriched in hepatic transcriptomic regions that are responsive to intra-mammary infection (IMI). Genomic markers [e.g. single nucleotide polymorphisms (SNPs)] from those regions, if included, may improve the predictive ability of a genomic model. We applied a genomic feature best linear unbiased prediction model (GFBLUP) to implement the above strategy by considering the hepatic transcriptomic regions responsive to IMI as genomic features. GFBLUP, an extension of GBLUP, includes a separate genomic effect of SNPs within a genomic feature, and allows differential weighting of the individual marker relationships in the prediction equation. Since GFBLUP is computationally intensive, we investigated whether a SNP set test could be a computationally fast way to preselect predictive genomic features. The SNP set test assesses the association between a genomic feature and a trait based on single-SNP genome-wide association studies. We applied these two approaches to mastitis and milk production traits (milk, fat and protein yield) in Holstein (HOL, n = 5056) and Jersey (JER, n = 1231) cattle. We observed that a majority of genomic features were enriched in genomic variants that were associated with mastitis and milk production traits. Compared to GBLUP, the accuracy of genomic prediction with GFBLUP was marginally improved (3.2 to 3.9%) in within-breed prediction. The highest increase (164.4%) in prediction accuracy was observed in across-breed prediction. The significance of genomic features based on the SNP set test was correlated with changes in prediction accuracy of GFBLUP (P < 0.05). 
GFBLUP provides a framework for integrating multiple layers of biological knowledge to provide novel insights into the biological basis of complex traits, and to improve the accuracy of genomic prediction. The SNP set test might be used as a first-step to improve GFBLUP models. Approaches like GFBLUP and SNP set test will become increasingly useful, as the functional annotations of genomes keep accumulating for a range of species and traits.
USDA-ARS's Scientific Manuscript database
In this study, we aimed to (1) predict genomic estimated breeding value (GEBV) for bacterial cold water disease (BCWD) resistance by genotyping training (n=583) and validation samples (n=53) with two genotyping platforms (24K RAD-SNP and 49K SNP) and using different genomic selection (GS) models (Ba...
Pierson, Tyler Mark; Simeonov, Dimitre R; Sincan, Murat; Adams, David A; Markello, Thomas; Golas, Gretchen; Fuentes-Fajardo, Karin; Hansen, Nancy F; Cherukuri, Praveen F; Cruz, Pedro; Blackstone, Craig; Tifft, Cynthia; Boerkoel, Cornelius F; Gahl, William A
2012-01-01
Fatty acid hydroxylase-associated neurodegeneration due to fatty acid 2-hydroxylase deficiency presents with a wide range of phenotypes including spastic paraplegia, leukodystrophy, and/or brain iron deposition. All previously described families with this disorder were consanguineous, with homozygous mutations in the probands. We describe a 10-year-old male, from a non-consanguineous family, with progressive spastic paraplegia, dystonia, ataxia, and cognitive decline associated with a sural axonal neuropathy. The use of high-throughput sequencing techniques combined with SNP array analyses revealed a novel paternally derived missense mutation and an overlapping novel maternally derived ∼28-kb genomic deletion in FA2H. This patient provides further insight into the consistent features of this disorder and expands our understanding of its phenotypic presentation. The presence of a sural nerve axonal neuropathy had not been previously associated with this disorder and so may extend the phenotype. PMID:22146942
[Genetic analysis of two cases with Dandy-Walker deformed fetus].
Yao, Juan; Fang, Rong; Shen, Xueping; Shen, Guosong; Zhang, Su
2017-10-10
To explore the genetic etiology of two fetuses with Dandy-Walker malformation using single nucleotide polymorphism microarray (SNP-array). The fetuses and their parents were subjected to G banding karyotype analysis. The fetuses were also subjected to SNP-array analysis. The parents of both fetuses showed a normal karyotype. One fetus had a 46,X,?i(X)(q10) karyotype, while conventional cell culture failed for the other. SNP-array showed that one fetus carried a 6p25.3p25.2 microdeletion, and the other carried a Xp22.33p22.2 deletion and a Yq11.221q11 duplication. The abnormal fragments involved the FOXC1, SHOX and STS genes, which are associated with Dandy-Walker malformation. Alteration of 6p25.3p25.2 and Xp22.33p22.2 copy numbers probably underlies the Dandy-Walker syndrome in the fetuses. The disorder may be attributed to abnormal expression of the FOXC1, SHOX, and STS genes. SNP-array can provide an important supplement for prenatal diagnosis.
Erbe, M; Hayes, B J; Matukumalli, L K; Goswami, S; Bowman, P J; Reich, C M; Mason, B A; Goddard, M E
2012-07-01
Achieving accurate genomic estimated breeding values for dairy cattle requires a very large reference population of genotyped and phenotyped individuals. Assembling such reference populations has been achieved for breeds such as Holstein, but is challenging for breeds with fewer individuals. An alternative is to use a multi-breed reference population, such that smaller breeds gain some advantage in accuracy of genomic estimated breeding values (GEBV) from information from larger breeds. However, this requires that marker-quantitative trait loci associations persist across breeds. Here, we assessed the gain in accuracy of GEBV in Jersey cattle as a result of using a combined Holstein and Jersey reference population, with either 39,745 or 624,213 single nucleotide polymorphism (SNP) markers. The surrogate used for accuracy was the correlation of GEBV with daughter trait deviations in a validation population. Two methods were used to predict breeding values, either a genomic BLUP (GBLUP_mod), or a new method, BayesR, which used a mixture of normal distributions as the prior for SNP effects, including one distribution that set SNP effects to zero. The GBLUP_mod method scaled both the genomic relationship matrix and the additive relationship matrix to a base at the time the breeds diverged, and regressed the genomic relationship matrix to account for sampling errors in estimating relationship coefficients due to a finite number of markers, before combining the 2 matrices. Although these modifications did result in less biased breeding values for Jerseys compared with an unmodified genomic relationship matrix, BayesR gave the highest accuracies of GEBV for the 3 traits investigated (milk yield, fat yield, and protein yield), with an average increase in accuracy compared with GBLUP_mod across the 3 traits of 0.05 for both Jerseys and Holsteins. 
The advantage was limited for either Jerseys or Holsteins in using 624,213 SNP rather than 39,745 SNP (0.01 for Holsteins and 0.03 for Jerseys, averaged across traits). Even this limited and nonsignificant advantage was only observed when BayesR was used. An alternative panel, which extracted the SNP in the transcribed part of the bovine genome from the 624,213 SNP panel (to give 58,532 SNP), performed better, with an increase in accuracy of 0.03 for Jerseys across traits. This panel captures much of the increased genomic content of the 624,213 SNP panel, with the advantage of a greatly reduced number of SNP effects to estimate. Taken together, using this panel, a combined breed reference and using BayesR rather than GBLUP_mod increased the accuracy of GEBV in Jerseys from 0.43 to 0.52, averaged across the 3 traits. Copyright © 2012 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
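The BayesR prior described in this abstract (a mixture of normal distributions for SNP effects, with one component fixing the effect at zero) can be written schematically as below. The number of components K, the mixing proportions and the component variances are left symbolic here because the abstract does not state them; the notation is an assumption made for illustration.

```latex
% Schematic mixture prior for the effect \beta_j of SNP j (notation assumed):
% \delta_0 is a point mass at zero, \pi_k are mixing proportions.
\beta_j \sim \pi_1\,\delta_0 + \sum_{k=2}^{K} \pi_k\,\mathcal{N}\!\left(0,\ \sigma_k^2\right),
\qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k \ge 0 .
```

The point mass lets many SNP carry no effect at all, while the remaining components allow effects of different magnitudes, in contrast to GBLUP, which treats all SNP effects as draws from a single normal distribution.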
Accuracy of direct genomic values in Holstein bulls and cows using subsets of SNP markers
2010-01-01
Background At the current price, the use of high-density single nucleotide polymorphism (SNP) genotyping assays in genomic selection of dairy cattle is limited to applications involving elite sires and dams. The objective of this study was to evaluate the use of low-density assays to predict direct genomic value (DGV) on five milk production traits, an overall conformation trait, a survival index, and two profit index traits (APR, ASI). Methods Dense SNP genotypes were available for 42,576 SNP for 2,114 Holstein bulls and 510 cows. A subset of 1,847 bulls born between 1955 and 2004 was used as a training set to fit models with various sets of pre-selected SNP. A group of 297 bulls born between 2001 and 2004 and all cows born between 1992 and 2004 were used to evaluate the accuracy of DGV prediction. Ridge regression (RR) and partial least squares regression (PLSR) were used to derive prediction equations and to rank SNP based on the absolute value of the regression coefficients. Four alternative strategies were applied to select subsets of SNP, namely: subsets of the highest ranked SNP for each individual trait, or a single subset of evenly spaced SNP, where SNP were selected based on their rank for ASI, APR or minor allele frequency within intervals of approximately equal length. Results RR and PLSR performed very similarly to predict DGV, with PLSR performing better for low-density assays and RR for higher-density SNP sets. When using all SNP, DGV predictions for production traits, which have a higher heritability, were more accurate (0.52-0.64) than for survival (0.19-0.20), which has a low heritability. The gain in accuracy using subsets that included the highest ranked SNP for each trait was marginal (5-6%) over a common set of evenly spaced SNP when at least 3,000 SNP were used. Subsets containing 3,000 SNP provided more than 90% of the accuracy that could be achieved with a high-density assay for cows, and 80% of the high-density assay for young bulls. 
Conclusions Accurate genomic evaluation of the broader bull and cow population can be achieved with a single genotyping assay containing ~ 3,000 to 5,000 evenly spaced SNP. PMID:20950478
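The ranking step in the Methods (fit ridge regression, then rank SNP by the absolute value of their regression coefficients) can be sketched as follows. This is a minimal illustration, not the study's pipeline: the function names, the penalty `lam`, the 0/1/2 genotype coding and the simulated data are all assumptions.

```python
import numpy as np

def ridge_snp_effects(genotypes, phenotypes, lam):
    """Ridge regression SNP effects from centred genotypes and phenotypes."""
    X = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotypes, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    n = Xc.shape[0]
    # Dual form of ridge: solves an n x n system, convenient when SNP outnumber animals.
    return Xc.T @ np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), yc)

def top_ranked_snps(effects, k):
    """Indices of the k SNP with the largest absolute effects, in map order."""
    order = np.argsort(-np.abs(effects))
    return np.sort(order[:k])

# Simulated data (assumed): 200 animals, 500 SNP coded 0/1/2, three causal SNP.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(200, 500))
true_effects = np.zeros(500)
true_effects[[10, 250, 400]] = [1.0, -0.8, 0.6]
y = X @ true_effects + rng.normal(0.0, 0.5, size=200)

effects = ridge_snp_effects(X, y, lam=10.0)
subset = top_ranked_snps(effects, k=20)
```

A low-density panel built this way is trait-specific; the evenly spaced alternative described in the abstract would instead pick one SNP per genomic interval.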
Troggio, Michela; Malnoy, Mickael; Velasco, Riccardo; Fontana, Paolo; Won, KyungHo; Durel, Charles-Eric; Perchepied, Laure; Schaffer, Robert; Wiedow, Claudia; Bus, Vincent; Brewer, Lester; Gardiner, Susan E.; Crowhurst, Ross N.; Chagné, David
2013-01-01
We have used new generation sequencing (NGS) technologies to identify single nucleotide polymorphism (SNP) markers from three European pear (Pyrus communis L.) cultivars and subsequently developed a subset of 1096 pear SNPs into high throughput markers by combining them with the set of 7692 apple SNPs on the IRSC apple Infinium® II 8K array. We then evaluated this apple and pear Infinium® II 9K SNP array for large-scale genotyping in pear across several species, using both pear and apple SNPs. The segregating populations employed for array validation included a segregating population of European pear (‘Old Home’ × ‘Louise Bon Jersey’) and four interspecific breeding families derived from Asian (P. pyrifolia Nakai and P. bretschneideri Rehd.) and European pear pedigrees. In total, we mapped 857 polymorphic pear markers to construct the first SNP-based genetic maps for pear, comprising 78% of the total pear SNPs included in the array. In addition, 1031 SNP markers derived from apple (13% of the total apple SNPs included in the array) were polymorphic and were mapped in one or more of the pear populations. These results are the first to demonstrate SNP transferability across the genera Malus and Pyrus. Our construction of high density SNP-based and gene-based genetic maps in pear represents an important step towards the identification of chromosomal regions associated with a range of horticultural characters, such as pest and disease resistance, orchard yield and fruit quality. PMID:24155917
Genomic Characterisation of the Indigenous Irish Kerry Cattle Breed
Browett, Sam; McHugo, Gillian; Richardson, Ian W.; Magee, David A.; Park, Stephen D. E.; Fahey, Alan G.; Kearney, John F.; Correia, Carolina N.; Randhawa, Imtiaz A. S.; MacHugh, David E.
2018-01-01
Kerry cattle are an endangered landrace heritage breed of cultural importance to Ireland. In the present study we have used genome-wide SNP array data to evaluate genomic diversity within the Kerry population and between Kerry cattle and other European breeds. Patterns of genetic differentiation and gene flow among breeds using phylogenetic trees with ancestry graphs highlighted historical gene flow from the British Shorthorn breed into the ancestral population of modern Kerry cattle. Principal component analysis (PCA) and genetic clustering emphasised the genetic distinctiveness of Kerry cattle relative to comparator British and European cattle breeds. Modelling of genetic effective population size (Ne) revealed a demographic trend of diminishing Ne over time and that recent estimated Ne values for the Kerry breed may be less than the threshold for sustainable genetic conservation. In addition, analysis of genome-wide autozygosity (FROH) showed that genomic inbreeding has increased significantly during the 20 years between 1992 and 2012. Finally, signatures of selection revealed genomic regions subject to natural and artificial selection as Kerry cattle adapted to the climate, physical geography and agro-ecology of southwest Ireland. PMID:29520297
USDA-ARS's Scientific Manuscript database
The rapid advancement in high-throughput SNP genotyping technologies along with next generation sequencing (NGS) platforms has decreased the cost, improved the quality of large-scale genome surveys, and allowed specialty crops with limited genomic resources such as carrot (Daucus carota) to access t...
LGI1 microdeletion in autosomal dominant lateral temporal epilepsy
Fanciulli, M.; Santulli, L.; Errichiello, L.; Barozzi, C.; Tomasi, L.; Rigon, L.; Cubeddu, T.; de Falco, A.; Rampazzo, A.; Michelucci, R.; Uzzau, S.; Striano, S.; de Falco, F.A.; Striano, P.
2012-01-01
Objectives: To characterize clinically and genetically a family with autosomal dominant lateral temporal epilepsy (ADLTE) negative to LGI1 exon sequencing test. Methods: All participants were personally interviewed and underwent neurologic examination. Most affected subjects underwent EEG and neuroradiologic examinations (CT/MRI). Available family members were genotyped with the HumanOmni1-Quad v1.0 single nucleotide polymorphism (SNP) array beadchip and copy number variations (CNVs) were analyzed in each subject. LGI1 gene dosage was performed by real-time quantitative PCR (qPCR). Results: The family had 8 affected members (2 deceased) over 3 generations. All of them showed GTC seizures, with focal onset in 6 and unknown onset in 2. Four patients had focal seizures with auditory features. EEG showed only minor sharp abnormalities in 3 patients and MRI was unremarkable in all the patients examined. Three family members presented major depression and anxiety symptoms. Routine LGI1 exon sequencing revealed no point mutation. High-density SNP array CNV analysis identified a genomic microdeletion about 81 kb in size encompassing the first 4 exons of LGI1 in all available affected members and in 2 nonaffected carriers, which was confirmed by qPCR analysis. Conclusions: This is the first microdeletion affecting LGI1 identified in ADLTE. Families with ADLTE in which no point mutations are revealed by direct exon sequencing should be screened for possible genomic deletion mutations by CNV analysis or other appropriate methods. Overall, CNV analysis of multiplex families may be useful for identifying microdeletions in novel disease genes. PMID:22496201
Mason, Annaliese S; Zhang, Jing; Tollenaere, Reece; Vasquez Teuber, Paula; Dalton-Morgan, Jessica; Hu, Liyong; Yan, Guijun; Edwards, David; Redden, Robert; Batley, Jacqueline
2015-09-01
Germplasm collections provide an extremely valuable resource for breeders and researchers. However, misclassification of accessions by species often hinders the effective use of these collections. We propose that use of high-throughput genotyping tools can provide a fast, efficient and cost-effective way of confirming species in germplasm collections, as well as providing valuable genetic diversity data. We genotyped 180 Brassicaceae samples sourced from the Australian Grains Genebank across the recently released Illumina Infinium Brassica 60K SNP array. Of these, 76 were provided on the basis of suspected misclassification and another 104 were sourced independently from the germplasm collection. Presence of the A- and C-genomes combined with principal components analysis clearly separated Brassica rapa, B. oleracea, B. napus, B. carinata and B. juncea samples into distinct species groups. Several lines were further validated using chromosome counts. Overall, 18% of samples (32/180) were misclassified on the basis of species. Within these 180 samples, 23/76 (30%) supplied on the basis of suspected misclassification were misclassified, and 9/104 (9%) of the samples randomly sourced from the Australian Grains Genebank were misclassified. Surprisingly, several individuals were also found to be the product of interspecific hybridization events. The SNP (single nucleotide polymorphism) array proved effective at confirming species, and provided useful information related to genetic diversity. As similar genomic resources become available for different crops, high-throughput molecular genotyping will offer an efficient and cost-effective method to screen germplasm collections worldwide, facilitating more effective use of these valuable resources by breeders and researchers. © 2015 John Wiley & Sons Ltd.
Gutierrez, Alejandro P; Yáñez, José M; Fukui, Steve; Swift, Bruce; Davidson, William S
2015-01-01
Early sexual maturation is considered a serious drawback for Atlantic salmon aquaculture as it retards growth, increases production times and affects flesh quality. Although both growth and sexual maturation are thought to be complex processes controlled by several genetic and environmental factors, selection for these traits has been continuously accomplished since the beginning of Atlantic salmon selective breeding programs. In this genome-wide association study (GWAS) we used a 6.5K single-nucleotide polymorphism (SNP) array to genotype ∼ 480 individuals from the Cermaq Canada broodstock program and search for SNPs associated with growth and age at sexual maturation. Using a mixed model approach we identified markers showing a significant association with growth, grilsing (early sexual maturation) and late sexual maturation. The most significant associations were found for grilsing, with markers located in Ssa10, Ssa02, Ssa13, Ssa25 and Ssa12, and for late maturation with markers located in Ssa28, Ssa01 and Ssa21. A lower level of association was detected with growth on Ssa13. Candidate genes, which were linked to these genetic markers, were identified and some of them show a direct relationship with developmental processes, especially for those in association with sexual maturation. However, the relatively low power to detect genetic markers associated with growth (days to 5 kg) in this GWAS indicates the need to use a higher density SNP array in order to overcome the low levels of linkage disequilibrium observed in Atlantic salmon before the information can be incorporated into a selective breeding program.
Assumption-free estimation of the genetic contribution to refractive error across childhood.
Guggenheim, Jeremy A; St Pourcain, Beate; McMahon, George; Timpson, Nicholas J; Evans, David M; Williams, Cathy
2015-01-01
Studies in relatives have generally yielded high heritability estimates for refractive error: twins 75-90%, families 15-70%. However, because related individuals often share a common environment, these estimates are inflated (via misallocation of unique/common environment variance). We calculated a lower-bound heritability estimate for refractive error free from such bias. Between the ages 7 and 15 years, participants in the Avon Longitudinal Study of Parents and Children (ALSPAC) underwent non-cycloplegic autorefraction at regular research clinics. At each age, an estimate of the variance in refractive error explained by single nucleotide polymorphism (SNP) genetic variants was calculated using genome-wide complex trait analysis (GCTA) using high-density genome-wide SNP genotype information (minimum N at each age=3,404). The variance in refractive error explained by the SNPs ("SNP heritability") was stable over childhood: Across age 7-15 years, SNP heritability averaged 0.28 (SE=0.08, p<0.001). The genetic correlation for refractive error between visits varied from 0.77 to 1.00 (all p<0.001) demonstrating that a common set of SNPs was responsible for the genetic contribution to refractive error across this period of childhood. Simulations suggested lack of cycloplegia during autorefraction led to a small underestimation of SNP heritability (adjusted SNP heritability=0.35; SE=0.09). To put these results in context, the variance in refractive error explained (or predicted) by the time participants spent outdoors was <0.005 and by the time spent reading was <0.01, based on a parental questionnaire completed when the child was aged 8-9 years old. Genetic variation captured by common SNPs explained approximately 35% of the variation in refractive error between unrelated subjects. 
This value sets an upper limit for predicting refractive error using existing SNP genotyping arrays, although higher-density genotyping in larger samples and inclusion of interaction effects is expected to raise this figure toward twin- and family-based heritability estimates. The same SNPs influenced refractive error across much of childhood. Notwithstanding the strong evidence of association between time outdoors and myopia, and time reading and myopia, less than 1% of the variance in myopia at age 15 was explained by crude measures of these two risk factors, indicating that their effects may be limited, at least when averaged over the whole population.
Distinct contributions of replication and transcription to mutation rate variation of human genomes.
Cui, Peng; Ding, Feng; Lin, Qiang; Zhang, Lingfang; Li, Ang; Zhang, Zhang; Hu, Songnian; Yu, Jun
2012-02-01
Here, we evaluate the contribution of two major biological processes--DNA replication and transcription--to mutation rate variation in human genomes. Based on analysis of public human tissue transcriptomics data, a high-resolution replication map of HeLa cells and dbSNP data, we present significant correlations between expression breadth, replication time in local regions and SNP density. SNP density of tissue-specific (TS) genes is significantly higher than that of housekeeping (HK) genes. TS genes tend to be located in late-replicating genomic regions, and genes in such regions have a higher SNP density than those in early-replicating regions. In addition, SNP density is found to be positively correlated with expression level among HK genes. We conclude that the process of DNA replication generates stronger mutational pressure than transcription-associated biological processes do, resulting in an increase of mutation rate in TS genes while having weaker effects on HK genes. In contrast, transcription-associated processes are mainly responsible for the accumulation of mutations in highly-expressed HK genes. Copyright © 2012 Beijing Genomics Institute. Published by Elsevier Ltd. All rights reserved.
Copy number variations and genetic admixtures in three Xinjiang ethnic minority groups
Lou, Haiyi; Li, Shilin; Jin, Wenfei; Fu, Ruiqing; Lu, Dongsheng; Pan, Xinwei; Zhou, Huaigu; Ping, Yuan; Jin, Li; Xu, Shuhua
2015-01-01
Xinjiang is geographically located in central Asia, and it has played an important historical role in connecting eastern Eurasian (EEA) and western Eurasian (WEA) people. However, human population genomic studies in this region have been largely underrepresented, especially with respect to studies of copy number variations (CNVs). Here we constructed the first CNV map of the three major ethnic minority groups, the Uyghur, Kazakh and Kirgiz, using Affymetrix Genome-Wide Human SNP Array 6.0. We systematically compared the properties of CNVs we identified in the three groups with the data from representatives of EEA and WEA. The analyses indicated a typical genetic admixture pattern in all three groups with ancestries from both EEA and WEA. We also identified several CNV regions showing significant deviation of allele frequency from the expected genome-wide distribution, which might be associated with population-specific phenotypes. Our study provides the first genome-wide perspective on the CNVs of three major Xinjiang ethnic minority groups and has implications for both evolutionary and medical studies. PMID:25026903
Feltus, F. Alex; Wan, Jun; Schulze, Stefan R.; Estill, James C.; Jiang, Ning; Paterson, Andrew H.
2004-01-01
Dense coverage of the rice genome with polymorphic DNA markers is an invaluable tool for DNA marker-assisted breeding, positional cloning, and a wide range of evolutionary studies. We have aligned drafts of two rice subspecies, indica and japonica, and analyzed levels and patterns of genetic diversity. After filtering multiple copy and low quality sequence, 408,898 candidate DNA polymorphisms (SNPs/INDELs) were discerned between the two subspecies. These filters have the consequence that our data set includes only a subset of the available SNPs (in particular excluding large numbers of SNPs that may occur between repetitive DNA alleles) but increase the likelihood that this subset is useful: Direct sequencing suggests that 79.8% ± 7.5% of the in silico SNPs are real. The SNP sample in our database is not randomly distributed across the genome. In fact, 566 rice genomic regions had unusually high (328 contigs/48.6 Mb/13.6% of genome) or low (237 contigs/64.7 Mb/18.1% of genome) polymorphism rates. Many SNP-poor regions were substantially longer than most SNP-rich regions, covering up to 4 Mb, and possibly reflecting introgression between the respective gene pools that may have occurred hundreds of years ago. Although 46.2% ± 8.3% of the SNPs differentiate other pairs of japonica and indica genotypes, SNP rates in rice were not predictive of evolutionary rates for corresponding genes in another grass species, sorghum. The data set is freely available at http://www.plantgenome.uga.edu/snp. PMID:15342564
TMC-SNPdb: an Indian germline variant database derived from whole exome sequences.
Upadhyay, Pawan; Gardi, Nilesh; Desai, Sanket; Sahoo, Bikram; Singh, Ankita; Togar, Trupti; Iyer, Prajish; Prasad, Ratnam; Chandrani, Pratik; Gupta, Sudeep; Dutt, Amit
2016-01-01
Cancer is predominantly a somatic disease. A mutant allele present in a cancer cell genome is considered somatic when it's absent in the paired normal genome along with public SNP databases. The current build of dbSNP, the most comprehensive public SNP database, however inadequately represents several non-European Caucasian populations, posing a limitation in cancer genomic analyses of data from these populations. We present the Tata Memorial Centre-SNP Database (TMC-SNPdb), as the first open source, flexible, upgradable, and freely available SNP database (accessible through dbSNP build 149 and ANNOVAR)-representing 114 309 unique germline variants-generated from whole exome data of 62 normal samples derived from cancer patients of Indian origin. The TMC-SNPdb is presented with a companion subtraction tool that can be executed with command line option or using an easy-to-use graphical user interface with the ability to deplete additional Indian population specific SNPs over and above dbSNP and 1000 Genomes databases. Using an institutional generated whole exome data set of 132 samples of Indian origin, we demonstrate that TMC-SNPdb could deplete 42, 33 and 28% false positive somatic events post dbSNP depletion in Indian origin tongue, gallbladder, and cervical cancer samples, respectively. Beyond cancer somatic analyses, we anticipate utility of the TMC-SNPdb in several Mendelian germline diseases. In addition to dbSNP build 149 and ANNOVAR, the TMC-SNPdb along with the subtraction tool is available for download in the public domain at the following: Database URL: http://www.actrec.gov.in/pi-webpages/AmitDutt/TMCSNP/TMCSNPdp.html. © The Author(s) 2016. Published by Oxford University Press.
Analysis of population structure and genetic history of cattle breeds based on high-density SNP data
USDA-ARS's Scientific Manuscript database
Advances in single nucleotide polymorphism (SNP) genotyping microarrays have facilitated a new understanding of population structure and evolutionary history for several species. Most existing studies in livestock were based on low density SNP arrays. The first wave of low density SNP studies on cat...
Zanke, Christine D; Rodemann, Bernd; Ling, Jie; Muqaddasi, Quddoos H; Plieske, Jörg; Polley, Andreas; Kollers, Sonja; Ebmeyer, Erhard; Korzun, Viktor; Argillier, Odile; Stiewe, Gunther; Zschäckel, Thomas; Ganal, Martin W; Röder, Marion S
2017-03-01
Genotypes with recombination events in the Triticum ventricosum introgression on chromosome 7D allowed fine-mapping of the resistance gene Pch1, the main source of eyespot resistance in European winter wheat cultivars. Eyespot (also called Strawbreaker) is a common and serious fungal disease of winter wheat caused by the necrotrophic fungi Oculimacula yallundae and Oculimacula acuformis (former name Pseudocercosporella herpotrichoides). A genome-wide association study (GWAS) for eyespot was performed with 732 microsatellite markers (SSR) and 7761 mapped SNP markers derived from the 90 K iSELECT wheat array using a panel of 168 European winter wheat varieties as well as three spring wheat varieties and phenotypic evaluation of eyespot in field tests in three environments. Best linear unbiased estimations (BLUEs) were calculated across all trials and ranged from 1.20 (most resistant) to 5.73 (most susceptible) with an average value of 4.24 and a heritability of H² = 0.91. A total of 108 SSR and 235 SNP marker-trait associations (MTAs) were identified by considering associations with a -log10(P value) ≥ 3.0. Significant MTAs for eyespot-score BLUEs were found on chromosomes 1D, 2A, 2D, 3D, 5A, 5D, 6A, 7A and 7D for the SSR markers and chromosomes 1B, 2A, 2B, 2D, 3B and 7D for the SNP markers. For 18 varieties (10.5%), a highly resistant phenotype was detected that was linked to the presence of the resistance gene Pch1 on chromosome 7D. The identification of genotypes with recombination events in the introgressed genomic segment from Triticum ventricosum harboring the Pch1 resistance gene on chromosome 7DL allowed the fine-mapping of this gene using additional SNP markers, and a potential candidate gene, Traes_7DL_973A33763, coding for a CC-NBS-LRR class protein was identified.
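The significance filter described above, a -log10(P value) of at least 3.0, is equivalent to keeping associations with P ≤ 0.001. A minimal sketch of that filter follows. The marker names and P values are invented for illustration, and the function name is hypothetical rather than part of any GWAS package.

```python
import math


def significant_mtas(p_values, threshold=3.0):
    """Return marker names whose -log10(P) meets or exceeds the threshold."""
    return [
        marker
        for marker, p in p_values.items()
        if -math.log10(p) >= threshold
    ]


# Hypothetical marker-trait association P values.
toy_results = {"snp_a": 0.0002, "snp_b": 0.02, "ssr_c": 0.00005}
kept = significant_mtas(toy_results)
```

Here only the two markers with P below 0.001 are retained, while the marker with P = 0.02 (a -log10 of about 1.7) is dropped.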
Cooper, T A; Wiggans, G R; VanRaden, P M
2013-05-01
Call rates on both a single nucleotide polymorphism (SNP) basis and an animal basis are used as measures of data quality and as screening tools for genomic studies and evaluations of dairy cattle. To investigate the relationship of SNP call rate and genotype accuracy for individual SNP, the correlation between percentages of missing genotypes and parent-progeny conflicts for each SNP was calculated for 103,313 Holsteins. Correlations ranged from 0.14 to 0.38 for the BovineSNP50 and BovineLD (Illumina Inc., San Diego, CA) and GeneSeek Genomic Profiler (Neogen Corp., Lincoln, NE) chips, with lower correlations for newer chips. For US genomic evaluations, genotypes are excluded for animals with a call rate of <90% across autosomal SNP or <80% across X-specific SNP. Mean call rate for 220,175 Holstein, Jersey, and Brown Swiss genotypes was 99.6%. Animal genotypes with a call rate of ≤99% were examined from the US Department of Agriculture genotype database to determine how genotype call rate is related to accuracy of calls on an animal basis. Animal call rate was determined from SNP used in genomic evaluation and is the number of called autosomal and X-specific SNP genotypes divided by the number of SNP from that type of chip. To investigate the relationship of animal call rate and parentage validation, conflicts between a genotyped animal and its sire or dam were determined through a duo test (opposite homozygous SNP genotypes between sire and progeny; 1,374 animal genotypes) and a trio test (also including conflicts with dam and heterozygous SNP genotype for the animal when both parents are the same homozygote; 482 animal genotypes). When animal call rate was ≤ 80%, parentage validation was no longer reliable with the duo test. With the trio test, parentage validation was no longer reliable when animal call rate was ≤ 90%. 
To investigate how animal call rate was related to genotyping accuracy for animals with multiple genotypes, concordance between genotypes for 1,216 animals that had a genotype with a call rate of ≤ 99% (low call rate) as well as a genotype with a call rate of >99% (high call rate) were calculated by dividing the number of identical SNP genotype calls by the number of SNP that were called for both genotypes. Mean concordance between low- and high-call genotypes was >99% for a low call rate of >90% but decreased to 97% for a call rate of 86 to 90% and to 58% for a call rate of <60%. Edits on call rate reduce the use of incorrect SNP genotypes to calculate genomic evaluations. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
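The three quantities defined in this abstract (animal call rate, concordance between two genotypes of the same animal, and duo-test conflicts) can be sketched in a few lines. The genotype coding below (0, 1, 2 for the count of one allele and None for a missing call) and all function names are assumptions made for illustration, not the coding used in the US evaluation system.

```python
def call_rate(genotypes):
    """Called SNP genotypes divided by the number of SNP on the chip."""
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


def concordance(first, second):
    """Identical calls divided by the SNP that were called in both genotypes."""
    both = [(a, b) for a, b in zip(first, second)
            if a is not None and b is not None]
    if not both:
        raise ValueError("no SNP called in both genotypes")
    return sum(a == b for a, b in both) / len(both)


def duo_conflicts(parent, progeny):
    """Count SNP where parent and progeny are opposite homozygotes (0 vs 2)."""
    return sum(
        1 for p, o in zip(parent, progeny)
        if p is not None and o is not None and {p, o} == {0, 2}
    )


# Toy genotypes: 0 = AA, 1 = AB, 2 = BB, None = no call.
low_call = [0, 1, None, 2]
high_call = [0, 2, 1, 2]
```

With these toy vectors, the low-call genotype has a call rate of 0.75, and two of the three SNP called in both genotypes agree.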
Measuring diversity in Gossypium hirsutum using the CottonSNP63K Array
USDA-ARS's Scientific Manuscript database
A CottonSNP63K array and accompanying cluster file has been developed and includes 45,104 intra-specific SNPs and 17,954 inter-specific SNPs for automated genotyping of cotton (Gossypium spp.) samples. Development of the cluster file included genotyping of 1,156 samples, a subset of which were iden...
Duker, Angela L; Ballif, Blake C; Bawle, Erawati V; Person, Richard E; Mahadevan, Sangeetha; Alliman, Sarah; Thompson, Regina; Traylor, Ryan; Bejjani, Bassem A; Shaffer, Lisa G; Rosenfeld, Jill A; Lamb, Allen N; Sahoo, Trilochan
2010-11-01
Prader-Willi syndrome (PWS) is a neurobehavioral disorder manifested by infantile hypotonia and feeding difficulties in infancy, followed by morbid obesity secondary to hyperphagia. It is caused by deficiency of paternally expressed transcript(s) within the human chromosome region 15q11.2. PWS patients harboring balanced chromosomal translocations with breakpoints within small nuclear ribonucleoprotein polypeptide N (SNRPN) have provided indirect evidence for a role for the imprinted C/D box containing small nucleolar RNA (snoRNA) genes encoded downstream of SNRPN. In addition, recently published data provide strong evidence in support of a role for the snoRNA SNORD116 cluster (HBII-85) in PWS etiology. In this study, we performed detailed phenotypic, cytogenetic, and molecular analyses including chromosome analysis, array comparative genomic hybridization (array CGH), expression studies, and single-nucleotide polymorphism (SNP) genotyping for parent-of-origin determination of the 15q11.2 microdeletion on an 11-year-old child expressing the major components of the PWS phenotype. This child had an ∼236.29 kb microdeletion at 15q11.2 within the larger Prader-Willi/Angelman syndrome critical region that included the SNORD116 cluster of snoRNAs. Analysis of SNP genotypes in proband and mother provided evidence in support of the deletion being on the paternal chromosome 15. This child also met most of the major PWS diagnostic criteria including infantile hypotonia, early-onset morbid obesity, and hypogonadism. Identification and characterization of this case provide unequivocal evidence for a critical role for the SNORD116 snoRNA molecules in PWS pathogenesis. Array CGH testing for genomic copy-number changes in cases with complex phenotypes is proving to be invaluable in detecting novel alterations and enabling better genotype-phenotype correlations.
Genomic and transcriptomic predictors of triglyceride response to regular exercise
Sarzynski, Mark A; Davidsen, Peter K; Sung, Yun Ju; Hesselink, Matthijs K C; Schrauwen, Patrick; Rice, Treva K; Rao, D C; Falciani, Francesco; Bouchard, Claude
2015-01-01
Aim: We performed genome-wide and transcriptome-wide profiling to identify genes and single nucleotide polymorphisms (SNPs) associated with the response of triglycerides (TG) to exercise training. Methods: Plasma TG levels were measured before and after a 20-week endurance training programme in 478 white participants from the HERITAGE Family Study. Illumina HumanCNV370-Quad v3.0 BeadChips were genotyped using the Illumina BeadStation 500GX platform. Affymetrix HG-U133+2 arrays were used to quantitate gene expression levels from baseline muscle biopsies of a subset of participants (N=52). Genome-wide association study (GWAS) analysis was performed using MERLIN, while transcriptomic predictor models were developed using the R-package GALGO. Results: The GWAS results showed that eight SNPs were associated with TG training-response (ΔTG) at p<9.9×10⁻⁶, while another 31 SNPs showed p values <1×10⁻⁴. In multivariate regression models, the top 10 SNPs explained 32.0% of the variance in ΔTG, while conditional heritability analysis showed that four SNPs statistically accounted for all of the heritability of ΔTG. A molecular signature based on the baseline expression of 11 genes predicted 27% of ΔTG in HERITAGE, which was validated in an independent study. A composite SNP score based on the top four SNPs, each from the genomic and transcriptomic analyses, was the strongest predictor of ΔTG (R²=0.14, p=3.0×10⁻⁶⁸). Conclusions: Our results indicate that skeletal muscle transcript abundance at 11 genes and SNPs at a number of loci contribute to TG response to exercise training. Combining data from genomics and transcriptomics analyses identified a SNP-based gene signature that should be further tested in independent samples. PMID:26491034
Kumar, Satish; Molloy, Claire; Muñoz, Patricio; Daetwyler, Hans; Chagné, David; Volz, Richard
2015-01-01
The nonadditive genetic effects may have an important contribution to total genetic variation of phenotypes, so estimates of both the additive and nonadditive effects are desirable for breeding and selection purposes. Our main objectives were to: estimate additive, dominance and epistatic variances of apple (Malus × domestica Borkh.) phenotypes using relationship matrices constructed from genome-wide dense single nucleotide polymorphism (SNP) markers; and compare the accuracy of genomic predictions using genomic best linear unbiased prediction models with or without including nonadditive genetic effects. A set of 247 clonally replicated individuals was assessed for six fruit quality traits at two sites, and also genotyped using an Illumina 8K SNP array. Across several fruit quality traits, the additive, dominance, and epistatic effects contributed about 30%, 16%, and 19%, respectively, to the total phenotypic variance. Models ignoring nonadditive components yielded upwardly biased estimates of additive variance (heritability) for all traits in this study. The accuracy of genomic predicted genetic values (GEGV) varied from about 0.15 to 0.35 for various traits, and these were almost identical for models with or without including nonadditive effects. However, models including nonadditive genetic effects further reduced the bias of GEGV. Between-site genotypic correlations were high (>0.85) for all traits, and genotype-site interaction accounted for <10% of the phenotypic variability. The accuracy of prediction, when the validation set was present only at one site, was generally similar for both sites, and varied from about 0.50 to 0.85. The prediction accuracies were strongly influenced by trait heritability, and genetic relatedness between the training and validation families. PMID:26497141
Nagano, Soichiro; Shirasawa, Kenta; Hirakawa, Hideki; Maeda, Fumi; Ishikawa, Masami; Isobe, Sachiko N
2017-05-12
The strawberry, Fragaria × ananassa, is an allo-octoploid (2n = 8x = 56) and outcrossing species. Although it is the most widely consumed berry crop in the world, its complex genome structure has hindered its genetic and genomic analysis, and thus discrimination of subgenome-specific loci among the homoeologous chromosomes is needed. In the present study, we identified candidate subgenome-specific single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) loci, and constructed a linkage map using an S1 mapping population of the cultivar 'Reikou' with an IStraw90 Axiom® SNP array and previously published SSR markers. The 'Reikou' linkage map consisted of 11,574 loci (11,002 SNPs and 572 SSR loci) spanning 2816.5 cM of 31 linkage groups. The 11,574 loci were located on 4738 unique positions (bin) on the linkage map. Of the mapped loci, 8999 (8588 SNPs and 411 SSR loci) showed a 1:2:1 segregation ratio of AA:AB:BB allele, which suggested the possibility of deriving loci from candidate subgenome-specific sequences. In addition, 2575 loci (2414 SNPs and 161 SSR loci) showed a 3:1 segregation of AB:BB allele, indicating they were derived from homoeologous genomic sequences. Comparative analysis of the homoeologous linkage groups revealed differences in genome structure among the subgenomes. Our results suggest that candidate subgenome-specific loci are randomly located across the genomes, and that there are small- to large-scale structural variations among the subgenomes. The mapped SNPs and SSR loci on the linkage map are expected to be seed points for the construction of pseudomolecules in the octoploid strawberry.
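Sorting loci into 1:2:1 and 3:1 segregation classes is typically done with a chi-square goodness-of-fit test. The abstract does not say which test the authors used, so the following is only a sketch under that assumption. The genotype counts are invented, and the critical values are the standard 5% cut-offs (5.991 for two degrees of freedom, 3.841 for one).

```python
def chi_square_stat(observed, ratio):
    """Chi-square goodness-of-fit statistic for counts against a ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Standard 5% critical values, keyed by degrees of freedom.
CRITICAL_5PCT = {1: 3.841, 2: 5.991}


def fits_ratio(observed, ratio):
    """True if the counts do not deviate significantly from the ratio."""
    df = len(ratio) - 1
    return chi_square_stat(observed, ratio) < CRITICAL_5PCT[df]


# Hypothetical S1 progeny counts for one locus: AA, AB, BB.
stat_121 = chi_square_stat([30, 40, 30], (1, 2, 1))
```

For the toy counts the statistic is 4.0, below 5.991, so this hypothetical locus would be kept in the 1:2:1 class.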
Johnston, Susan E; Orell, Panu; Pritchard, Victoria L; Kent, Matthew P; Lien, Sigbjørn; Niemelä, Eero; Erkinaro, Jaakko; Primmer, Craig R
2014-07-01
Delaying sexual maturation can lead to larger body size and higher reproductive success, but carries an increased risk of death before reproducing. Classical life history theory predicts that trade-offs between reproductive success and survival should lead to the evolution of an optimal strategy in a given population. However, variation in mating strategies generally persists, and in general, there remains a poor understanding of genetic and physiological mechanisms underlying this variation. One extreme case of this is in the Atlantic salmon (Salmo salar), which can show variation in the age at which they return from their marine migration to spawn (i.e. their 'sea age'). This results in large size differences between strategies, with direct implications for individual fitness. Here, we used an Illumina Infinium SNP array to identify regions of the genome associated with variation in sea age in a large population of Atlantic salmon in Northern Europe, implementing individual-based genome-wide association studies (GWAS) and population-based FST outlier analyses. We identified several regions of the genome which vary in association with phenotype and/or selection between sea ages, with nearby genes having functions related to muscle development, metabolism, immune response and mate choice. In addition, we found that individuals of different sea ages belong to different, yet sympatric populations in this system, indicating that reproductive isolation may be driven by divergence between stable strategies. Overall, this study demonstrates how genome-wide methodologies can be integrated with samples collected from wild, structured populations to understand their ecology and evolution in a natural context. © 2014 John Wiley & Sons Ltd.
Yi, Ming; Zhao, Yongmei; Jia, Li; He, Mei; Kebebew, Electron; Stephens, Robert M.
2014-01-01
To apply exome-seq-derived variants in the clinical setting, there is an urgent need to identify the best variant caller(s) from a large collection of available options. We have used an Illumina exome-seq dataset as a benchmark, with two validation scenarios—family pedigree information and SNP array data for the same samples, permitting global high-throughput cross-validation, to evaluate the quality of SNP calls derived from several popular variant discovery tools from both the open-source and commercial communities using a set of designated quality metrics. To the best of our knowledge, this is the first large-scale performance comparison of exome-seq variant discovery tools using high-throughput validation with both Mendelian inheritance checking and SNP array data, which allows us to gain insights into the accuracy of SNP calling through such high-throughput validation in an unprecedented way, whereas the previously reported comparison studies have only assessed concordance of these tools without directly assessing the quality of the derived SNPs. More importantly, the main purpose of our study was to establish a reusable procedure that applies high-throughput validation to compare the quality of SNP discovery tools with a focus on exome-seq, which can be used to compare any forthcoming tool(s) of interest. PMID:24831545
Telfer, Emily J; Stovold, Grahame T; Li, Yongjun; Silva-Junior, Orzenil B; Grattapaglia, Dario G; Dungey, Heidi S
2015-01-01
Pedigree reconstruction using molecular markers enables efficient management of inbreeding in open-pollinated breeding strategies, replacing expensive and time-consuming controlled pollination. This is particularly useful in preferentially outcrossed, insect-pollinated eucalypts known to suffer considerable inbreeding depression from related matings. A single nucleotide polymorphism (SNP) marker panel consisting of 106 markers was selected for pedigree reconstruction from the recently developed high-density Eucalyptus Infinium SNP chip (EuCHIP60K). The performance of this SNP panel for pedigree reconstruction in open-pollinated progenies of two Eucalyptus nitens seed orchards was compared with that of two microsatellite panels with 13 and 16 markers respectively. The SNP marker panel out-performed one of the microsatellite panels in the resolution power to reconstruct pedigrees and out-performed both panels with respect to data quality. Parentage of all but one offspring in each clonal seed orchard was correctly matched to the expected seed parent using the SNP marker panel, whereas parentage assignment to less than a third of the expected seed parents was supported using the 13-microsatellite panel. The 16-microsatellite panel supported all but one of the recorded seed parents, one better than the SNP panel, although there was still a considerable level of missing and inconsistent data. SNP marker data were considerably superior to microsatellite data in accuracy, reproducibility and robustness. Although microsatellite and SNP data provide equivalent resolution for pedigree reconstruction, microsatellite analysis requires more time and experience to deal with the uncertainties of allele calling and faces challenges for data transferability across labs and over time.
While microsatellite analysis will continue to be useful for some breeding tasks due to the high information content, existing infrastructure and low operating costs, the multi-species SNP resource available with the EuCHIP60K opens a whole new array of opportunities for high-throughput, genome-wide or targeted genotyping in species of Eucalyptus.
Carmi, Shai; Hui, Ken Y.; Kochav, Ethan; Liu, Xinmin; Xue, James; Grady, Fillan; Guha, Saurav; Upadhyay, Kinnari; Ben-Avraham, Dan; Mukherjee, Semanti; Bowen, B. Monica; Thomas, Tinu; Vijai, Joseph; Cruts, Marc; Froyen, Guy; Lambrechts, Diether; Plaisance, Stéphane; Van Broeckhoven, Christine; Van Damme, Philip; Van Marck, Herwig; Barzilai, Nir; Darvasi, Ariel; Offit, Kenneth; Bressman, Susan; Ozelius, Laurie J.; Peter, Inga; Cho, Judy H.; Ostrer, Harry; Atzmon, Gil; Clark, Lorraine N.; Lencz, Todd; Pe’er, Itsik
2014-01-01
The Ashkenazi Jewish (AJ) population is a genetic isolate close to European and Middle Eastern groups, with genetic diversity patterns conducive to disease mapping. Here we report high-depth sequencing of 128 complete genomes of AJ controls. Compared with European samples, our AJ panel has 47% more novel variants per genome and is eightfold more effective at filtering benign variants out of AJ clinical genomes. Our panel improves imputation accuracy for AJ SNP arrays by 28%, and covers at least one haplotype in ≈67% of any AJ genome with long, identical-by-descent segments. Reconstruction of recent AJ history from such segments confirms a recent bottleneck of merely ≈350 individuals. Modelling of ancient histories for AJ and European populations using their joint allele frequency spectrum determines AJ to be an even admixture of European and likely Middle Eastern origins. We date the split between the two ancestral populations to ≈12–25 Kyr, suggesting a predominantly Near Eastern source for the repopulation of Europe after the Last Glacial Maximum. PMID:25203624
USDA-ARS's Scientific Manuscript database
The genome-wide association study (GWAS) is a useful tool for detecting and characterizing traits of interest including those associated with disease resistance in soybean. The availability of 50,000 single nucleotide polymorphism (SNP) markers (SoySNP50K iSelect BeadChip; www.soybase.org) on 19,652...
USDA-ARS's Scientific Manuscript database
Background: Our goal is to produce a high-throughput SNP genotyping platform for genomic analyses in rainbow trout that will enable fine mapping of QTL, whole genome association studies, genomic selection for improved aquaculture production traits, and genetic analyses of wild populations that aid ...
USDA-ARS's Scientific Manuscript database
The soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy but only properly oriented 66% of the sequence scaffolds. To find additional single nucleotide polymorphism (SNP) markers for additiona...
Combinations of SNP genotypes from the Wellcome Trust Case Control Study of bipolar patients.
Mellerup, Erling; Jørgensen, Martin Balslev; Dam, Henrik; Møller, Gert Lykke
2018-04-01
Combinations of genetic variants are the basis for polygenic disorders. We examined combinations of SNP genotypes taken from the 446 729 SNPs in The Wellcome Trust Case Control Study of bipolar patients. Parallel computing by graphics processing units, cloud computing, and data mining tools were used to scan The Wellcome Trust data set for combinations. Two clusters of combinations were significantly associated with bipolar disorder. One cluster contained 68 combinations, each of which included five SNP genotypes. Of the 1998 patients, 305 had combinations from this cluster in their genome, but none of the 1500 controls had any of these combinations in their genome. The other cluster contained six combinations, each of which included five SNP genotypes. Of the 1998 patients, 515 had combinations from the cluster in their genome, but none of the 1500 controls had any of these combinations in their genome. Clusters of combinations of genetic variants can be considered general risk factors for polygenic disorders, whereas accumulation of combinations from the clusters in the genome of a patient can be considered a personal risk factor.
Shan, Jingxuan; Al-Rumaihi, Khalid; Rabah, Danny; Al-Bozom, Issam; Kizhakayil, Dhanya; Farhat, Karim; Al-Said, Sami; Kfoury, Hala; Dsouza, Shoba P; Rowe, Jillian; Khalak, Hanif G; Jafri, Shahzad; Aigha, Idil I; Chouchane, Lotfi
2013-05-13
Large databases focused on genetic susceptibility to prostate cancer have been accumulated from population studies of different ancestries, including Europeans and African-Americans. Arab populations, however, have been only rarely studied. Using Affymetrix Genome-Wide Human SNP Array 6, we conducted a genome-wide association study (GWAS) in which 534,781 single nucleotide polymorphisms (SNPs) were genotyped in 221 Tunisians (90 prostate cancer patients and 131 age-matched healthy controls). TaqMan SNP Genotyping Assays on 11 prostate cancer associated SNPs were performed in a distinct cohort of 337 individuals of Arab ancestry living in Qatar and Saudi Arabia (155 prostate cancer patients and 182 age-matched controls). In-silico expression quantitative trait locus (eQTL) analysis along with mRNA quantification of nearby genes was performed to identify loci potentially cis-regulated by the identified SNPs. Three chromosomal regions, encompassing 14 SNPs, are significantly associated with prostate cancer risk in the Tunisian population (P = 1 × 10⁻⁴ to P = 1 × 10⁻⁵). In addition to SNPs located on chromosome 17q21, previously found associated with prostate cancer in Western populations, two novel chromosomal regions are revealed on chromosome 9p24 and 22q13. eQTL analysis and mRNA quantification indicate that the prostate cancer associated SNPs of chromosome 17 could enhance the expression of the STAT5B gene. Our findings, identifying novel GWAS prostate cancer susceptibility loci, indicate that prostate cancer genetic risk factors could be ethnic specific.
Genetic diversity and investigation of polledness in divergent goat populations using 52 088 SNPs.
Kijas, James W; Ortiz, Judit S; McCulloch, Russell; James, Andrew; Brice, Blair; Swain, Ben; Tosser-Klopp, Gwenola
2013-06-01
The recent availability of a genome-wide SNP array for the goat genome dramatically increases the power to investigate aspects of genetic diversity and to conduct genome-wide association studies in this important domestic species. We collected and analysed genotypes from 52 088 SNPs in Boer, Cashmere and Rangeland goats that had both polled and horned individuals. Principal components analysis revealed a clear genetic division between animals for each population, and model-based clustering successfully detected evidence of admixture that matched aspects of their recorded history. For example, shared co-ancestry was detected, suggesting Boer goats have been introgressed into the Rangeland population. Further, allele frequency data successfully tracked the altered genetic profile that has taken place after 40 years of breeding Australian Cashmere goats using the Rangeland animals as the founding population. Genome-wide association mapping of the POLL locus revealed a strong signal on goat chromosome 1. The 769-kb critical interval contained the polled intersex syndrome locus, confirming the genetic basis in non-European animals is the same as identified previously in Saanen goats. Interestingly, analysis of the haplotypes carried by a small set of sex-reversed animals, known to be associated with polledness, revealed some animals carried the wild-type chromosome associated with the presence of horns. This suggests a more complex basis for the relationship between polledness and the intersex condition than initially thought while validating the application of the goat SNP50 BeadChip for fine-mapping traits in goat. © The Author(s) and Commonwealth of Australia. Animal Genetics © 2012 Stichting International Foundation for Animal Genetics.
Wang, Hongbo; Ye, Shengtuo; Mou, Tongmin
2016-12-01
The development of hybrid rice is a practical approach for increasing rice production. However, the brown planthopper (BPH), Nilaparvata lugens Stål, causes severe yield loss of rice (Oryza sativa L.) and can threaten food security. Therefore, breeding hybrid rice resistant to BPH is the most effective and economical strategy to maintain high and stable production. Fortunately, numerous BPH resistance genes have been identified, and abundant linkage markers are available for molecular marker-assisted selection (MAS) in breeding programs. Hence, we pyramided two BPH resistance genes, Bph14 and Bph15, into a susceptible CMS restorer line Huahui938 and its derived hybrids using MAS to improve the BPH resistance of hybrid rice. Three near-isogenic lines (NILs) with pyramided Bph14 and Bph15 were obtained by molecular marker-assisted backcross (MAB) and phenotypic selection. The genomic components of these NILs were detected using the whole-genome single nucleotide polymorphism (SNP) array, RICE6K, suggesting that the recurrent parent genome (RPG) recovery of the NILs was 87.88, 87.70 and 86.62%, respectively. BPH bioassays showed that the improved NILs and their derived hybrids carrying homozygous Bph14 and Bph15 were resistant to BPH. However, the hybrids with heterozygous Bph14 and Bph15 remained susceptible to BPH. The developed NILs showed no significant differences in major agronomic traits and rice qualities compared with the recurrent parent. Moreover, the improved hybrids derived from the NILs exhibited better agronomic performance and rice quality compared with the controls under natural field conditions. This study demonstrates that it is essential to stack Bph14 and Bph15 into both the maternal and paternal parents for developing BPH-resistant hybrid rice varieties.
The SNP array with abundant DNA markers is an efficient tool for analyzing the RPG recovery of progenies and can be used to monitor the donor segments in NILs, thus being extremely important for rice molecular breeding.
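The recurrent parent genome (RPG) recovery figures quoted above are percentages derived from array genotypes. The sketch below shows one common way such a percentage can be computed; it is an assumption for illustration, not the calculation reported for RICE6K. Here recovery is taken as the share of markers polymorphic between the two parents at which the NIL carries the recurrent-parent genotype, and all names are hypothetical.

```python
# Illustrative sketch only; the definition of recovery used here is an
# assumption, not the method stated in the abstract.
def rpg_recovery(nil, recurrent, donor):
    """Percent of parent-polymorphic markers where the NIL matches the recurrent parent.

    Each argument is a list of genotype calls (e.g. "AA", "BB") in the
    same marker order.
    """
    # Only markers that differ between the parents are informative.
    informative = [i for i, (r, d) in enumerate(zip(recurrent, donor)) if r != d]
    if not informative:
        raise ValueError("no polymorphic markers between the parents")
    matches = sum(1 for i in informative if nil[i] == recurrent[i])
    return 100.0 * matches / len(informative)
```

With four informative markers of which three match the recurrent parent, the function returns 75.0.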
Correa, Katharina; Lhorente, Jean P; López, María E; Bassini, Liane; Naswa, Sudhir; Deeb, Nader; Di Genova, Alex; Maass, Alejandro; Davidson, William S; Yáñez, José M
2015-10-24
Piscirickettsia salmonis is the causal agent of Salmon Rickettsial Syndrome (SRS), which affects salmon species and causes severe economic losses. Selective breeding for disease resistance represents one approach for controlling SRS in farmed Atlantic salmon. Knowledge concerning the architecture of the resistance trait is needed before deciding on the most appropriate approach to enhance artificial selection for P. salmonis resistance in Atlantic salmon. The purpose of the study was to dissect the genetic variation in the resistance to this pathogen in Atlantic salmon. 2,601 Atlantic salmon smolts were experimentally challenged with P. salmonis by means of intra-peritoneal injection. These smolts were the progeny of 40 sires and 118 dams from a Chilean breeding population. Mortalities were recorded daily and the experiment ended at day 40 post-inoculation. Fish were genotyped using a 50K Affymetrix® Axiom® myDesign™ Single Nucleotide Polymorphism (SNP) Genotyping Array. A Genome Wide Association Analysis was performed on data from the challenged fish. Linear regression and logistic regression models were tested. Genome Wide Association Analysis indicated that resistance to P. salmonis is a moderately polygenic trait. There were five SNPs in chromosomes Ssa01 and Ssa17 significantly associated with the traits analysed. The proportion of the phenotypic variance explained by each marker is small, ranging from 0.007 to 0.045. Candidate genes including interleukin receptors and fucosyltransferase have been found to be physically linked with these genetic markers and may play an important role in the differential immune response against this pathogen. Due to the small amount of variance explained by each significant marker we conclude that genetic resistance to this pathogen can be more efficiently improved with the implementation of genetic evaluations incorporating genotype information from a dense SNP array.
SNPMeta: SNP annotation and SNP metadata collection without a reference genome
USDA-ARS's Scientific Manuscript database
The increase in availability of resequencing data is greatly accelerating SNP discovery and has facilitated the development of SNP genotyping assays. This, in turn, is increasing interest in annotation of individual SNPs. Currently, these data are only available through curation, or comparison to a ...
Centromere Locations in Brassica A and C Genomes Revealed Through Half-Tetrad Analysis
Mason, Annaliese S.; Rousseau-Gueutin, Mathieu; Morice, Jérôme; Bayer, Philipp E.; Besharat, Naghmeh; Cousin, Anouska; Pradhan, Aneeta; Parkin, Isobel A. P.; Chèvre, Anne-Marie; Batley, Jacqueline; Nelson, Matthew N.
2016-01-01
Locating centromeres on genome sequences can be challenging. The high density of repetitive elements in these regions makes sequence assembly problematic, especially when using short-read sequencing technologies. It can also be difficult to distinguish between active and recently extinct centromeres through sequence analysis. An effective solution is to identify genetically active centromeres (functional in meiosis) by half-tetrad analysis. This genetic approach involves detecting heterozygosity along chromosomes in segregating populations derived from gametes (half-tetrads). Unreduced gametes produced by first division restitution mechanisms comprise complete sets of nonsister chromatids. Along these chromatids, heterozygosity is maximal at the centromeres, and homologous recombination events result in homozygosity toward the telomeres. We genotyped populations of half-tetrad-derived individuals (from Brassica interspecific hybrids) using a high-density array of physically anchored SNP markers (Illumina Brassica 60K Infinium array). Mapping the distribution of heterozygosity in these half-tetrad individuals allowed the genetic mapping of all 19 centromeres of the Brassica A and C genomes to the reference Brassica napus genome. Gene and transposable element density across the B. napus genome were also assessed and corresponded well to previously reported genetic map positions. Known centromere-specific sequences were located in the reference genome, but mostly matched unanchored sequences, suggesting that the core centromeric regions may not yet be assembled into the pseudochromosomes of the reference genome. The increasing availability of genetic markers physically anchored to reference genomes greatly simplifies the genetic and physical mapping of centromeres using half-tetrad analysis. We discuss possible applications of this approach, including in species where half-tetrads are currently difficult to isolate. PMID:26614742
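The half-tetrad logic described above, in which heterozygosity is maximal at the centromere and declines toward the telomeres, can be sketched as a per-marker heterozygosity profile whose peak marks the candidate centromere. This is a simplified illustration under stated assumptions rather than the authors' method: genotype calls are assumed to be coded "AA", "AB" or "BB", markers are assumed ordered by physical position, and the function names are hypothetical.

```python
# Illustrative sketch only; data layout and names are assumptions.
def heterozygosity_profile(genotypes):
    """Fraction of individuals heterozygous ("AB") at each marker.

    genotypes: one list of calls per individual, all in the same marker order.
    """
    n_individuals = len(genotypes)
    n_markers = len(genotypes[0])
    return [
        sum(1 for individual in genotypes if individual[m] == "AB") / n_individuals
        for m in range(n_markers)
    ]


def centromere_interval(positions, genotypes):
    """Physical span (min, max) of the markers with maximal heterozygosity."""
    profile = heterozygosity_profile(genotypes)
    peak = max(profile)
    peak_positions = [p for p, h in zip(positions, profile) if h == peak]
    return min(peak_positions), max(peak_positions)
```

In real data the peak would normally be smoothed over a window and checked for contiguity; the sketch skips both steps to keep the idea visible.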
Centromere Locations in Brassica A and C Genomes Revealed Through Half-Tetrad Analysis.
Mason, Annaliese S; Rousseau-Gueutin, Mathieu; Morice, Jérôme; Bayer, Philipp E; Besharat, Naghmeh; Cousin, Anouska; Pradhan, Aneeta; Parkin, Isobel A P; Chèvre, Anne-Marie; Batley, Jacqueline; Nelson, Matthew N
2016-02-01
Locating centromeres on genome sequences can be challenging. The high density of repetitive elements in these regions makes sequence assembly problematic, especially when using short-read sequencing technologies. It can also be difficult to distinguish between active and recently extinct centromeres through sequence analysis. An effective solution is to identify genetically active centromeres (functional in meiosis) by half-tetrad analysis. This genetic approach involves detecting heterozygosity along chromosomes in segregating populations derived from gametes (half-tetrads). Unreduced gametes produced by first division restitution mechanisms comprise complete sets of nonsister chromatids. Along these chromatids, heterozygosity is maximal at the centromeres, and homologous recombination events result in homozygosity toward the telomeres. We genotyped populations of half-tetrad-derived individuals (from Brassica interspecific hybrids) using a high-density array of physically anchored SNP markers (Illumina Brassica 60K Infinium array). Mapping the distribution of heterozygosity in these half-tetrad individuals allowed the genetic mapping of all 19 centromeres of the Brassica A and C genomes to the reference Brassica napus genome. Gene and transposable element density across the B. napus genome were also assessed and corresponded well to previously reported genetic map positions. Known centromere-specific sequences were located in the reference genome, but mostly matched unanchored sequences, suggesting that the core centromeric regions may not yet be assembled into the pseudochromosomes of the reference genome. The increasing availability of genetic markers physically anchored to reference genomes greatly simplifies the genetic and physical mapping of centromeres using half-tetrad analysis. We discuss possible applications of this approach, including in species where half-tetrads are currently difficult to isolate. Copyright © 2016 by the Genetics Society of America.
Prabhanjan, Manasa; Suresh, Raviraj V; Murthy, Megha N; Ramachandra, Nallur B
2016-03-01
The aim was to identify the role of copy number variations (CNVs) in disease risk genes and their effect on disease phenotypes in type 2 diabetes mellitus (T2DM) in 12 random populations using high-throughput arrays. CNV analysis was carried out on a total of 1715 individuals from 12 populations, from ArrayExpress Archive of the European Bioinformatics Institute along with our subjects using Affymetrix Genome Wide SNP 6.0 array. The effects of CNVs on T2DM genes were analyzed using several bioinformatics tools and a molecular protein interaction network was constructed to identify the disease mechanism altered by the CNVs. Analysis showed 34.4% of the total population to be under CNV burden for T2DM, with 83 disease causal and associated genes being under CNV influence. Hotspots were identified on chromosomes 22, 12, 6, 19 and 11. Overlap studies with case cohorts revealed significant disease risk genes such as EGFR, E2F1, PPP1R3A, HLA and TSPAN8. CNVs play a significant role in predisposing T2DM in normal cohorts and contribute to the phenotypic effects. Thus, CNVs should be considered as one of the major contributors in predisposition of the disease. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Geraldes, A; Difazio, S P; Slavov, G T; Ranjan, P; Muchero, W; Hannemann, J; Gunter, L E; Wymore, A M; Grassa, C J; Farzaneh, N; Porth, I; McKown, A D; Skyba, O; Li, E; Fujita, M; Klápště, J; Martin, J; Schackwitz, W; Pennacchio, C; Rokhsar, D; Friedmann, M C; Wasteneys, G O; Guy, R D; El-Kassaby, Y A; Mansfield, S D; Cronk, Q C B; Ehlting, J; Douglas, C J; Tuskan, G A
2013-03-01
Genetic mapping of quantitative traits requires genotypic data for large numbers of markers in many individuals. For such studies, the use of large single nucleotide polymorphism (SNP) genotyping arrays still offers the most cost-effective solution. Herein we report on the design and performance of a SNP genotyping array for Populus trichocarpa (black cottonwood). This genotyping array was designed with SNPs pre-ascertained in 34 wild accessions covering most of the species latitudinal range. We adopted a candidate gene approach to the array design that resulted in the selection of 34 131 SNPs, the majority of which are located in, or within 2 kb of, 3543 candidate genes. A subset of the SNPs on the array (539) was selected based on patterns of variation among the SNP discovery accessions. We show that more than 95% of the loci produce high quality genotypes and that the genotyping error rate for these is likely below 2%. We demonstrate that even among small numbers of samples (n = 10) from local populations over 84% of loci are polymorphic. We also tested the applicability of the array to other species in the genus and found that the number of polymorphic loci decreases rapidly with genetic distance, with the largest numbers detected in other species in section Tacamahaca. Finally, we provide evidence for the utility of the array to address evolutionary questions such as intraspecific studies of genetic differentiation, species assignment and the detection of natural hybrids. © 2013 Blackwell Publishing Ltd.
Genomic regions underlying susceptibility to bovine tuberculosis in Holstein-Friesian cattle.
Raphaka, Kethusegile; Matika, Oswald; Sánchez-Molano, Enrique; Mrode, Raphael; Coffey, Mike Peter; Riggio, Valentina; Glass, Elizabeth Janet; Woolliams, John Arthur; Bishop, Stephen Christopher; Banos, Georgios
2017-03-23
The significant social and economic loss as a result of bovine tuberculosis (bTB) presents a continuous challenge to cattle industries in the UK and worldwide. However, host genetic variation in cattle susceptibility to bTB provides an opportunity to select for resistant animals and further understand the genetic mechanisms underlying disease dynamics. The present study identified genomic regions associated with susceptibility to bTB using genome-wide association (GWA), regional heritability mapping (RHM) and chromosome association approaches. Phenotypes comprised de-regressed estimated breeding values of 804 Holstein-Friesian sires and pertained to three bTB indicator traits: i) positive reactors to the skin test with positive post-mortem examination results (phenotype 1); ii) positive reactors to the skin test regardless of post-mortem examination results (phenotype 2) and iii) as in (ii) plus non-reactors and inconclusive reactors to the skin tests with positive post-mortem examination results (phenotype 3). Genotypes based on the 50 K SNP DNA array were available and a total of 34,874 SNPs remained per animal after quality control. The estimated polygenic heritability for susceptibility to bTB was 0.26, 0.37 and 0.34 for phenotypes 1, 2 and 3, respectively. GWA analysis identified a putative SNP on Bos taurus autosomes (BTA) 2 associated with phenotype 1, and another on BTA 23 associated with phenotype 2. Genomic regions encompassing these SNPs were found to harbour potentially relevant annotated genes. RHM confirmed the effect of these genomic regions and identified new regions on BTA 18 for phenotype 1 and BTA 3 for phenotypes 2 and 3. Heritabilities of the genomic regions ranged between 0.05 and 0.08 across the three phenotypes. Chromosome association analysis indicated a major role of BTA 23 on susceptibility to bTB. 
Genomic regions and candidate genes identified in the present study provide an opportunity to further understand pathways critical to cattle susceptibility to bTB and enhance genetic improvement programmes aiming at controlling and eradicating the disease.
DoGSD: the dog and wolf genome SNP database.
Bai, Bing; Zhao, Wen-Ming; Tang, Bi-Xia; Wang, Yan-Qing; Wang, Lu; Zhang, Zhang; Yang, He-Chuan; Liu, Yan-Hu; Zhu, Jun-Wei; Irwin, David M; Wang, Guo-Dong; Zhang, Ya-Ping
2015-01-01
The rapid advancement of next-generation sequencing technology has generated a deluge of genomic data from domesticated dogs and their wild ancestor, grey wolves, which have simultaneously broadened our understanding of domestication and diseases that are shared by humans and dogs. To address the scarcity of single nucleotide polymorphism (SNP) data provided by authorized databases and to make SNP data easier to use and more widely available, we propose DoGSD (http://dogsd.big.ac.cn), the first Canidae-specific database, which focuses on whole genome SNP data from domesticated dogs and grey wolves. The DoGSD is a web-based, open-access resource comprising ∼19 million high-quality whole-genome SNPs. In addition to the dbSNP data set (build 139), DoGSD incorporates a comprehensive collection of SNPs from two newly sequenced samples (1 wolf and 1 dog) and collected SNPs from the three latest dog/wolf genetic studies (7 wolves and 68 dogs), which were taken together for analysis with the population genetic statistic Fst. In addition, DoGSD integrates some closely related information including SNP annotation, summary lists of SNPs located in genes, synonymous and non-synonymous SNPs, sampling location and breed information. All these features make DoGSD a useful resource for in-depth analysis in dog-/wolf-related studies. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Genomic prediction using imputed whole-genome sequence data in Holstein Friesian cattle.
van Binsbergen, Rianne; Calus, Mario P L; Bink, Marco C A M; van Eeuwijk, Fred A; Schrooten, Chris; Veerkamp, Roel F
2015-09-17
In contrast to currently used single nucleotide polymorphism (SNP) panels, the use of whole-genome sequence data is expected to enable the direct estimation of the effects of causal mutations on a given trait. This could lead to higher reliabilities of genomic predictions compared to those based on SNP genotypes. Also, at each generation of selection, recombination events between a SNP and a mutation can cause decay in reliability of genomic predictions based on markers rather than on the causal variants. Our objective was to investigate the use of imputed whole-genome sequence genotypes versus high-density SNP genotypes on (the persistency of) the reliability of genomic predictions using real cattle data. Highly accurate phenotypes based on daughter performance and Illumina BovineHD Beadchip genotypes were available for 5503 Holstein Friesian bulls. The BovineHD genotypes (631,428 SNPs) of each bull were used to impute whole-genome sequence genotypes (12,590,056 SNPs) using the Beagle software. Imputation was done using a multi-breed reference panel of 429 sequenced individuals. Genomic estimated breeding values for three traits were predicted using a Bayesian stochastic search variable selection (BSSVS) model and a genome-enabled best linear unbiased prediction model (GBLUP). Reliabilities of predictions were based on 2087 validation bulls, while the other 3416 bulls were used for training. Prediction reliabilities ranged from 0.37 to 0.52. BSSVS performed better than GBLUP in all cases. Reliabilities of genomic predictions were slightly lower with imputed sequence data than with BovineHD chip data. Also, the reliabilities tended to be lower for both sequence data and BovineHD chip data when relationships between training animals were low. No increase in persistency of prediction reliability using imputed sequence data was observed. Compared to BovineHD genotype data, using imputed sequence data for genomic prediction produced no advantage. 
To investigate the putative advantage of genomic prediction using (imputed) sequence data, a training set with a larger number of individuals that are distantly related to each other and genomic prediction models that incorporate biological information on the SNPs or that apply stricter SNP pre-selection should be considered.
Livingstone, Donald; Stack, Conrad; Mustiga, Guiliana M; Rodezno, Dayana C; Suarez, Carmen; Amores, Freddy; Feltus, Frank A; Mockaitis, Keithanne; Cornejo, Omar E; Motamayor, Juan C
2017-01-01
Cacao (Theobroma cacao L.) is an important cash crop in tropical regions around the world and has a rich agronomic history in South America. As a key component in the cosmetic and confectionery industries, millions of people worldwide use products made from cacao, ranging from shampoo to chocolate. An Illumina Infinium II array was created using 13,530 SNPs identified within a small diversity panel of cacao. Of these SNPs, 12,643 derive from variation within annotated cacao genes. The genotypes of 3,072 trees were obtained, including two mapping populations from Ecuador. High-density linkage maps for these two populations were generated and compared to the cacao genome assembly. Phenotypic data from these populations were combined with the linkage maps to identify the QTLs for yield and disease resistance.
Darabi, Hatef; Beesley, Jonathan; Droit, Arnaud; Kar, Siddhartha; Nord, Silje; Moradi Marjaneh, Mahdi; Soucy, Penny; Michailidou, Kyriaki; Ghoussaini, Maya; Fues Wahl, Hanna; Bolla, Manjeet K.; Wang, Qin; Dennis, Joe; Alonso, M. Rosario; Andrulis, Irene L.; Anton-Culver, Hoda; Arndt, Volker; Beckmann, Matthias W.; Benitez, Javier; Bogdanova, Natalia V.; Bojesen, Stig E.; Brauch, Hiltrud; Brenner, Hermann; Broeks, Annegien; Brüning, Thomas; Burwinkel, Barbara; Chang-Claude, Jenny; Choi, Ji-Yeob; Conroy, Don M.; Couch, Fergus J.; Cox, Angela; Cross, Simon S.; Czene, Kamila; Devilee, Peter; Dörk, Thilo; Easton, Douglas F.; Fasching, Peter A.; Figueroa, Jonine; Fletcher, Olivia; Flyger, Henrik; Galle, Eva; García-Closas, Montserrat; Giles, Graham G.; Goldberg, Mark S.; González-Neira, Anna; Guénel, Pascal; Haiman, Christopher A.; Hallberg, Emily; Hamann, Ute; Hartman, Mikael; Hollestelle, Antoinette; Hopper, John L.; Ito, Hidemi; Jakubowska, Anna; Johnson, Nichola; Kang, Daehee; Khan, Sofia; Kosma, Veli-Matti; Kriege, Mieke; Kristensen, Vessela; Lambrechts, Diether; Le Marchand, Loic; Lee, Soo Chin; Lindblom, Annika; Lophatananon, Artitaya; Lubinski, Jan; Mannermaa, Arto; Manoukian, Siranoush; Margolin, Sara; Matsuo, Keitaro; Mayes, Rebecca; McKay, James; Meindl, Alfons; Milne, Roger L.; Muir, Kenneth; Neuhausen, Susan L.; Nevanlinna, Heli; Olswold, Curtis; Orr, Nick; Peterlongo, Paolo; Pita, Guillermo; Pylkäs, Katri; Rudolph, Anja; Sangrajrang, Suleeporn; Sawyer, Elinor J.; Schmidt, Marjanka K.; Schmutzler, Rita K.; Seynaeve, Caroline; Shah, Mitul; Shen, Chen-Yang; Shu, Xiao-Ou; Southey, Melissa C.; Stram, Daniel O.; Surowy, Harald; Swerdlow, Anthony; Teo, Soo H.; Tessier, Daniel C.; Tomlinson, Ian; Torres, Diana; Truong, Thérèse; Vachon, Celine M.; Vincent, Daniel; Winqvist, Robert; Wu, Anna H.; Wu, Pei-Ei; Yip, Cheng Har; Zheng, Wei; Pharoah, Paul D. P.; Hall, Per; Edwards, Stacey L.; Simard, Jacques; French, Juliet D.; Chenevix-Trench, Georgia; Dunning, Alison M.
2016-01-01
Genome-wide association studies have found SNPs at 17q22 to be associated with breast cancer risk. To identify potential causal variants related to breast cancer risk, we performed a high resolution fine-mapping analysis that involved genotyping 517 SNPs using a custom Illumina iSelect array (iCOGS) followed by imputation of genotypes for 3,134 SNPs in more than 89,000 participants of European ancestry from the Breast Cancer Association Consortium (BCAC). We identified 28 highly correlated common variants, in a 53 kb region spanning two introns of the STXBP4 gene, that are strong candidates for driving breast cancer risk (lead SNP rs2787486 (OR = 0.92; CI 0.90–0.94; P = 8.96 × 10^-15)) and are correlated with two previously reported risk-associated variants at this locus, SNPs rs6504950 (OR = 0.94, P = 2.04 × 10^-09, r^2 = 0.73 with lead SNP) and rs1156287 (OR = 0.93, P = 3.41 × 10^-11, r^2 = 0.83 with lead SNP). Analyses indicate only one causal SNP in the region and several enhancer elements targeting STXBP4 are located within the 53 kb association signal. Expression studies in breast tumor tissues found SNP rs2787486 to be associated with increased STXBP4 expression, suggesting this may be a target gene of this locus. PMID:27600471
Darabi, Hatef; Beesley, Jonathan; Droit, Arnaud; Kar, Siddhartha; Nord, Silje; Moradi Marjaneh, Mahdi; Soucy, Penny; Michailidou, Kyriaki; Ghoussaini, Maya; Fues Wahl, Hanna; Bolla, Manjeet K; Wang, Qin; Dennis, Joe; Alonso, M Rosario; Andrulis, Irene L; Anton-Culver, Hoda; Arndt, Volker; Beckmann, Matthias W; Benitez, Javier; Bogdanova, Natalia V; Bojesen, Stig E; Brauch, Hiltrud; Brenner, Hermann; Broeks, Annegien; Brüning, Thomas; Burwinkel, Barbara; Chang-Claude, Jenny; Choi, Ji-Yeob; Conroy, Don M; Couch, Fergus J; Cox, Angela; Cross, Simon S; Czene, Kamila; Devilee, Peter; Dörk, Thilo; Easton, Douglas F; Fasching, Peter A; Figueroa, Jonine; Fletcher, Olivia; Flyger, Henrik; Galle, Eva; García-Closas, Montserrat; Giles, Graham G; Goldberg, Mark S; González-Neira, Anna; Guénel, Pascal; Haiman, Christopher A; Hallberg, Emily; Hamann, Ute; Hartman, Mikael; Hollestelle, Antoinette; Hopper, John L; Ito, Hidemi; Jakubowska, Anna; Johnson, Nichola; Kang, Daehee; Khan, Sofia; Kosma, Veli-Matti; Kriege, Mieke; Kristensen, Vessela; Lambrechts, Diether; Le Marchand, Loic; Lee, Soo Chin; Lindblom, Annika; Lophatananon, Artitaya; Lubinski, Jan; Mannermaa, Arto; Manoukian, Siranoush; Margolin, Sara; Matsuo, Keitaro; Mayes, Rebecca; McKay, James; Meindl, Alfons; Milne, Roger L; Muir, Kenneth; Neuhausen, Susan L; Nevanlinna, Heli; Olswold, Curtis; Orr, Nick; Peterlongo, Paolo; Pita, Guillermo; Pylkäs, Katri; Rudolph, Anja; Sangrajrang, Suleeporn; Sawyer, Elinor J; Schmidt, Marjanka K; Schmutzler, Rita K; Seynaeve, Caroline; Shah, Mitul; Shen, Chen-Yang; Shu, Xiao-Ou; Southey, Melissa C; Stram, Daniel O; Surowy, Harald; Swerdlow, Anthony; Teo, Soo H; Tessier, Daniel C; Tomlinson, Ian; Torres, Diana; Truong, Thérèse; Vachon, Celine M; Vincent, Daniel; Winqvist, Robert; Wu, Anna H; Wu, Pei-Ei; Yip, Cheng Har; Zheng, Wei; Pharoah, Paul D P; Hall, Per; Edwards, Stacey L; Simard, Jacques; French, Juliet D; Chenevix-Trench, Georgia; Dunning, Alison M
2016-09-07
Genome-wide association studies have found SNPs at 17q22 to be associated with breast cancer risk. To identify potential causal variants related to breast cancer risk, we performed a high resolution fine-mapping analysis that involved genotyping 517 SNPs using a custom Illumina iSelect array (iCOGS) followed by imputation of genotypes for 3,134 SNPs in more than 89,000 participants of European ancestry from the Breast Cancer Association Consortium (BCAC). We identified 28 highly correlated common variants, in a 53 kb region spanning two introns of the STXBP4 gene, that are strong candidates for driving breast cancer risk (lead SNP rs2787486 (OR = 0.92; CI 0.90-0.94; P = 8.96 × 10^-15)) and are correlated with two previously reported risk-associated variants at this locus, SNPs rs6504950 (OR = 0.94, P = 2.04 × 10^-09, r^2 = 0.73 with lead SNP) and rs1156287 (OR = 0.93, P = 3.41 × 10^-11, r^2 = 0.83 with lead SNP). Analyses indicate only one causal SNP in the region and several enhancer elements targeting STXBP4 are located within the 53 kb association signal. Expression studies in breast tumor tissues found SNP rs2787486 to be associated with increased STXBP4 expression, suggesting this may be a target gene of this locus.
Intracranial hemangiopericytoma: Case study with cytogenetics and genome wide SNP-A analysis.
Holland, Heidrun; Livrea, Michela; Ahnert, Peter; Koschny, Ronald; Kirsten, Holger; Meixensberger, Jürgen; Bauer, Manfred; Schober, Ralf; Fritzsch, Dominik; Krupp, Wolfgang
2011-05-15
The tumor entity of hemangiopericytoma is not universally recognized as a nosological entity by pathologists, and there is a trend toward reassigning it to other categories gradually. However, hemangiopericytomas occurring in the nervous system are included in the new WHO classification of brain tumors, and are distinguished from both meningioma and fibrous tumors. Since there are few genetic studies, we performed a comprehensive cytogenetic analysis of an infratentorial hemangiopericytoma in a 55-year-old female. It was originally classified as a grade II tumor but recurred as a grade III tumor with a proliferation index of 20%. Using trypsin-Giemsa staining (GTG-banding) and multicolor fluorescence in situ hybridization (M-FISH), we could confirm the loss of chromosomal material 10q, which has been previously described in hemangiopericytoma, and we identified de novo chromosomal aberrations on chromosome 8. Applying genome-wide high-density single nucleotide polymorphism array (SNP-A) analysis, we detected segments with loss or gain, as well as clonal deletions or regions suggestive of segmental uniparental disomy. These findings, together with the results of conventional histological and immunohistochemical characterization, provide additional evidence for the nosological separation of hemangiopericytoma in the central nervous system as a biologically different entity. Copyright © 2011 Elsevier GmbH. All rights reserved.
Schnider, D; Rieder, S; Leeb, T; Gerber, V; Neuditschko, M
2017-12-01
Recurrent airway obstruction (RAO), also known as heaves, is an asthma-like respiratory disease. Its development is strongly influenced by environmental risk factors such as sensitization and exposure to moldy hay, straw bedding and stabling indoors. A hereditary component has been documented in previous studies; however, so far no causative genetic variant that influences the risk of developing RAO has been identified. In this study, we revised an existing dataset and selected 384 horses for genotyping on the Affymetrix high-density equine SNP array. We performed an allelic case-control genome-wide association study, which revealed a suggestively significant association on equine chromosome 13 at 32 843 309 bp. This SNP is located in the protein-coding gene TXNDC11, which is possibly involved in the folding process of the multiprotein complexes DUOX1 and DUOX2. In humans, these proteins are known to take part in regulating the production of H2O2 in the respiratory tract epithelium as well as in MUC5AC mucin expression. Therefore, TXNDC11 may be considered a functional candidate gene, and further research is needed to explore its potential role in RAO-affected horses. © 2017 Stichting International Foundation for Animal Genetics.
The Genetic Architecture of Adaptations to High Altitude in Ethiopia
Alkorta-Aranburu, Gorka; Beall, Cynthia M.; Witonsky, David B.; Gebremedhin, Amha; Pritchard, Jonathan K.; Di Rienzo, Anna
2012-01-01
Although hypoxia is a major stress on physiological processes, several human populations have survived for millennia at high altitudes, suggesting that they have adapted to hypoxic conditions. This hypothesis was recently corroborated by studies of Tibetan highlanders, which showed that polymorphisms in candidate genes show signatures of natural selection as well as well-replicated association signals for variation in hemoglobin levels. We extended genomic analysis to two Ethiopian ethnic groups: Amhara and Oromo. For each ethnic group, we sampled low and high altitude residents, thus allowing genetic and phenotypic comparisons across altitudes and across ethnic groups. Genome-wide SNP genotype data were collected in these samples by using Illumina arrays. We find that variants associated with hemoglobin variation among Tibetans or other variants at the same loci do not influence the trait in Ethiopians. However, in the Amhara, SNP rs10803083 is associated with hemoglobin levels at genome-wide levels of significance. No significant genotype association was observed for oxygen saturation levels in either ethnic group. Approaches based on allele frequency divergence did not detect outliers in candidate hypoxia genes, but the most differentiated variants between high- and lowlanders have a clear role in pathogen defense. Interestingly, a significant excess of allele frequency divergence was consistently detected for genes involved in cell cycle control and DNA damage and repair, thus pointing to new pathways for high altitude adaptations. Finally, a comparison of CpG methylation levels between high- and lowlanders found several significant signals at individual genes in the Oromo. PMID:23236293
LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources.
Karchin, Rachel; Diekhans, Mark; Kelly, Libusha; Thomas, Daryl J; Pieper, Ursula; Eswar, Narayanan; Haussler, David; Sali, Andrej
2005-06-15
The NCBI dbSNP database lists over 9 million single nucleotide polymorphisms (SNPs) in the human genome, but currently contains limited annotation information. SNPs that result in amino acid residue changes (nsSNPs) are of critical importance in variation between individuals, including disease and drug sensitivity. We have developed LS-SNP, a genomic scale software pipeline to annotate nsSNPs. LS-SNP comprehensively maps nsSNPs onto protein sequences, functional pathways and comparative protein structure models, and predicts positions where nsSNPs destabilize proteins, interfere with the formation of domain-domain interfaces, have an effect on protein-ligand binding or severely impact human health. It currently annotates 28,043 validated SNPs that produce amino acid residue substitutions in human proteins from the SwissProt/TrEMBL database. Annotations can be viewed via a web interface either in the context of a genomic region or by selecting sets of SNPs, genes, proteins or pathways. These results are useful for identifying candidate functional SNPs within a gene, haplotype or pathway and in probing molecular mechanisms responsible for functional impacts of nsSNPs. Availability: http://www.salilab.org/LS-SNP Contact: rachelk@salilab.org Supplementary information: http://salilab.org/LS-SNP/supp-info.pdf
Pendergrass, Sarah A; Verma, Shefali S; Holzinger, Emily R; Moore, Carrie B; Wallace, John; Dudek, Scott M; Huggins, Wayne; Kitchner, Terrie; Waudby, Carol; Berg, Richard; McCarty, Catherine A; Ritchie, Marylyn D
2013-01-01
Investigating the association between biobank derived genomic data and the information of linked electronic health records (EHRs) is an emerging area of research for dissecting the architecture of complex human traits, where cases and controls for study are defined through the use of electronic phenotyping algorithms deployed in large EHR systems. For our study, 2580 cataract cases and 1367 controls were identified within the Marshfield Personalized Medicine Research Project (PMRP) Biobank and linked EHR, which is a member of the NHGRI-funded electronic Medical Records and Genomics (eMERGE) Network. Our goal was to explore potential gene-gene and gene-environment interactions within these data for 529,431 single nucleotide polymorphisms (SNPs) with minor allele frequency > 1%, in order to explore higher level associations with cataract risk beyond investigations of single SNP-phenotype associations. To build our SNP-SNP interaction models we utilized a prior-knowledge driven filtering method called Biofilter to minimize the multiple testing burden of exploring the vast array of interaction models possible from our extensive number of SNPs. Using the Biofilter, we developed 57,376 prior-knowledge directed SNP-SNP models to test for association with cataract status. We selected models that required 6 sources of external domain knowledge. We identified 5 statistically significant models with an interaction term with p-value < 0.05, as well as an overall model with p-value < 0.05 associated with cataract status. We also conducted gene-environment interaction analyses for all GWAS SNPs and a set of environmental factors from the PhenX Toolkit: smoking, UV exposure, and alcohol use; these environmental factors have been previously associated with the formation of cataracts. We found a total of 288 models that exhibit an interaction term with a p-value ≤ 1×10^-4 associated with cataract status.
Our results show these approaches enable advanced searches for epistasis and gene-environment interactions beyond GWAS, and that the EHR based approach provides an additional source of data for seeking these advanced explanatory models of the etiology of complex disease/outcome such as cataracts.
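To give a sense of the multiple-testing burden that prior-knowledge filtering is meant to reduce, the arithmetic can be sketched as follows. This is only an illustration: the function name is hypothetical, and the exhaustive count assumes that every unordered pair of the 529,431 SNPs would otherwise be tested, which the abstract does not state explicitly.

```python
from math import comb


def pairwise_model_count(n_snps: int) -> int:
    """Number of unordered SNP-SNP pairs among n_snps markers."""
    return comb(n_snps, 2)


n_snps = 529_431          # SNPs with MAF > 1% (figure from the abstract)
filtered_models = 57_376  # Biofilter-derived models (figure from the abstract)

# Assumption: the exhaustive alternative is one model per unordered SNP pair.
all_pairs = pairwise_model_count(n_snps)
fraction_tested = filtered_models / all_pairs

print(f"exhaustive pairs: {all_pairs:,}")
print(f"fraction tested after filtering: {fraction_tested:.2e}")
```

Under that assumption, the filtered set is a very small fraction of the exhaustive pairwise search, which is why a correction for 57,376 tests is far less severe than one for all pairs.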
Biological relevance of CNV calling methods using familial relatedness including monozygotic twins.
Castellani, Christina A; Melka, Melkaye G; Wishart, Andrea E; Locke, M Elizabeth O; Awamleh, Zain; O'Reilly, Richard L; Singh, Shiva M
2014-04-21
Studies involving the analysis of structural variation including Copy Number Variation (CNV) have recently exploded in the literature. Furthermore, CNVs have been associated with a number of complex diseases and neurodevelopmental disorders. Common methods for CNV detection use SNP, CNV, or CGH arrays, where the signal intensities of consecutive probes are used to define the number of copies associated with a given genomic region. These practices pose a number of challenges that interfere with the ability of available methods to accurately call CNVs. It has, therefore, become necessary to develop experimental protocols to test the reliability of CNV calling methods from microarray data so that researchers can properly discriminate biologically relevant data from noise. We have developed a workflow for the integration of data from multiple CNV calling algorithms using the same array results. It uses four CNV calling programs: PennCNV (PC), Affymetrix® Genotyping Console™ (AGC), Partek® Genomics Suite™ (PGS) and Golden Helix SVS™ (GH) to analyze CEL files from the Affymetrix® Human SNP 6.0 Array™. To assess the relative suitability of each program, we used individuals of known genetic relationships. We found significant differences in CNV calls obtained by different CNV calling programs. Although the programs showed variable patterns of CNVs in the same individuals, their distribution in individuals of different degrees of genetic relatedness has allowed us to offer two suggestions. The first involves the use of multiple algorithms for the detection of the largest possible number of CNVs, and the second suggests the use of PennCNV over all other methods when the use of only one software program is desirable.
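One simple way to combine calls from several CNV programs is to keep only calls that another program also reports with substantial overlap. The sketch below illustrates that idea under stated assumptions; it is not the workflow described in the abstract. The `Cnv` record, the 50% reciprocal-overlap threshold, and the two-program minimum are all hypothetical choices.

```python
from typing import NamedTuple


class Cnv(NamedTuple):
    chrom: str
    start: int
    end: int
    kind: str  # "gain" or "loss"


def reciprocal_overlap(a: Cnv, b: Cnv) -> float:
    """Smaller of the two overlap fractions; 0.0 if the calls are incompatible."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def supported_calls(calls_by_program, min_programs=2, min_overlap=0.5):
    """Return every call that at least min_programs programs agree on.

    A program counts as supporting a call if it made the call itself or
    reports a call of the same kind with enough reciprocal overlap.
    """
    kept = []
    for program, calls in calls_by_program.items():
        for call in calls:
            support = 1 + sum(
                any(reciprocal_overlap(call, other) >= min_overlap
                    for other in other_calls)
                for other_program, other_calls in calls_by_program.items()
                if other_program != program
            )
            if support >= min_programs:
                kept.append(call)
    return kept


# Illustrative, made-up calls keyed by the program abbreviations used above.
example = {
    "PC": [Cnv("chr1", 1000, 5000, "loss"), Cnv("chr2", 100, 200, "gain")],
    "AGC": [Cnv("chr1", 1200, 5200, "loss")],
}
consensus = supported_calls(example)
print(consensus)
```

In this made-up example, the two chr1 losses support each other and are kept, while the chr2 gain reported by a single program is dropped. Raising `min_programs` trades sensitivity for specificity.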
Jobs, Magnus; Howell, W. Mathias; Strömqvist, Linda; Mayr, Torsten; Brookes, Anthony J.
2003-01-01
Genotyping technologies need to be continually improved in terms of their flexibility, cost-efficiency, and throughput, to push forward genome variation analysis. To this end, we have leveraged the inherent simplicity of dynamic allele-specific hybridization (DASH) and coupled it to recent innovations of centrifugal arrays and iFRET. We have thereby created a new genotyping platform we term DASH-2, which we demonstrate and evaluate in this report. The system is highly flexible in many ways (any plate format, PCR multiplexing, serial and parallel array processing, spectral-multiplexing of hybridization probes), thus supporting a wide range of application scales and objectives. Precision is demonstrated to be in the range 99.8–100%, and assay costs are 0.05 USD or less per genotype assignment. DASH-2 thus provides a powerful new alternative for genotyping practice, which can be used without the need for expensive robotics support. PMID:12727908
Ferchaud, Anne-Laure; Pedersen, Susanne H; Bekkevold, Dorte; Jian, Jianbo; Niu, Yongchao; Hansen, Michael M
2014-10-06
The threespine stickleback (Gasterosteus aculeatus) has become an important model species for studying both contemporary and parallel evolution. In particular, differential adaptation to freshwater and marine environments has led to high differentiation between freshwater and marine stickleback populations at the phenotypic trait of lateral plate morphology and the underlying candidate gene Ectodysplasin (EDA). Many studies have focused on this trait and candidate gene, although other genes involved in marine-freshwater adaptation may be equally important. In order to develop a resource for rapid and cost efficient analysis of genetic divergence between freshwater and marine sticklebacks, we generated a low-density SNP (Single Nucleotide Polymorphism) array encompassing markers of chromosome regions under putative directional selection, along with neutral markers for background. RAD (Restriction site Associated DNA) sequencing of sixty individuals representing two freshwater and one marine population led to the identification of 33,993 SNP markers. Ninety-six of these were chosen for the low-density SNP array, among which 70 represented SNPs under putatively directional selection in freshwater vs. marine environments, whereas 26 SNPs were assumed to be neutral. Annotation of these regions revealed several genes that are candidates for affecting stickleback phenotypic variation, some of which have been observed in previous studies whereas others are new. We have developed a cost-efficient low-density SNP array that allows for rapid screening of polymorphisms in threespine stickleback. The array provides a valuable tool for analyzing adaptive divergence between freshwater and marine stickleback populations beyond the well-established candidate gene Ectodysplasin (EDA).
A web-based genome browser for 'SNP-aware' assay design
USDA-ARS's Scientific Manuscript database
Human and animal genomes contain an abundance of single nucleotide polymorphisms (SNPs) that are useful for genetic testing. However, the relatively large number of SNPs present in diverse populations can pose serious problems when designing assays. It is important to “mask” some SNP positions so ...
SNP-based genotyping in lentil: linking sequence information with phenotypes
USDA-ARS's Scientific Manuscript database
Lentil (Lens culinaris) has been late to enter the world of high throughput molecular analysis due to a general lack of genomic resources. Using a 454 sequencing-based approach, SNPs have been identified in genes across the lentil genome. Several hundred have been turned into single SNP KASP assay...
USDA-ARS's Scientific Manuscript database
Next-generation sequencing (NGS) technologies are revolutionizing both medical and biological research through generation of massive SNP data sets for identifying heritable genome variation underlying key traits, from rare human diseases to important agronomic phenotypes in crop species. We evaluate...
Bassil, Nahla V; Davis, Thomas M; Zhang, Hailong; Ficklin, Stephen; Mittmann, Mike; Webster, Teresa; Mahoney, Lise; Wood, David; Alperin, Elisabeth S; Rosyara, Umesh R; Koehorst-Vanc Putten, Herma; Monfort, Amparo; Sargent, Daniel J; Amaya, Iraida; Denoyes, Beatrice; Bianco, Luca; van Dijk, Thijs; Pirani, Ali; Iezzoni, Amy; Main, Dorrie; Peace, Cameron; Yang, Yilong; Whitaker, Vance; Verma, Sujeet; Bellon, Laurent; Brew, Fiona; Herrera, Raul; van de Weg, Eric
2015-03-07
A high-throughput genotyping platform is needed to enable marker-assisted breeding in the allo-octoploid cultivated strawberry Fragaria × ananassa. Short-read sequences from one diploid and 19 octoploid accessions were aligned to the diploid Fragaria vesca 'Hawaii 4' reference genome to identify single nucleotide polymorphisms (SNPs) and indels for incorporation into a 90 K Affymetrix® Axiom® array. We report the development and preliminary evaluation of this array. About 36 million sequence variants were identified in a 19 member, octoploid germplasm panel. Strategies and filtering pipelines were developed to identify and incorporate markers of several types: di-allelic SNPs (66.6%), multi-allelic SNPs (1.8%), indels (10.1%), and ploidy-reducing "haploSNPs" (11.7%). The remaining SNPs included those discovered in the diploid progenitor F. iinumae (3.9%), and speculative "codon-based" SNPs (5.9%). In genotyping 306 octoploid accessions, SNPs were assigned to six classes with Affymetrix's "SNPolisher" R package. The highest quality classes, PolyHigh Resolution (PHR), No Minor Homozygote (NMH), and Off-Target Variant (OTV) comprised 25%, 38%, and 1% of array markers, respectively. These markers were suitable for genetic studies as demonstrated in the full-sib family 'Holiday' × 'Korona' with the generation of a genetic linkage map consisting of 6,594 PHR SNPs evenly distributed across 28 chromosomes with an average density of approximately one marker per 0.5 cM, thus exceeding our goal of one marker per cM. The Affymetrix IStraw90 Axiom array is the first high-throughput genotyping platform for cultivated strawberry and is commercially available to the worldwide scientific community. The array's high success rate is likely driven by the presence of naturally occurring variation in ploidy level within the nominally octoploid genome, and by effectiveness of the employed array design and ploidy-reducing strategies. 
This array enables genetic analyses including generation of high-density linkage maps, identification of quantitative trait loci for economically important traits, and genome-wide association studies, thus providing a basis for marker-assisted breeding in this high value crop.
Li, Xiujin; Lund, Mogens Sandø; Janss, Luc; Wang, Chonglong; Ding, Xiangdong; Zhang, Qin; Su, Guosheng
2017-03-15
With the development of SNP chips, SNP information provides an efficient approach to further disentangle different patterns of genomic variances and covariances across the genome for traits of interest. Due to the interaction between genotype and environment as well as possible differences in genetic background, it is reasonable to treat the performances of a biological trait in different populations as different but genetically correlated traits. In the present study, we investigated the patterns of region-specific genomic variances, covariances and correlations between Chinese and Nordic Holstein populations for three milk production traits. Variances and covariances between Chinese and Nordic Holstein populations were estimated for genomic regions at three levels (all SNPs as one region, each chromosome as one region, and every 100 SNPs as one region) using a novel multi-trait random regression model which uses latent variables to model heterogeneous variance and covariance. In the scenario of the whole genome as one region, the genomic variances, covariances and correlations obtained from the new multi-trait Bayesian method were comparable to those obtained from a multi-trait GBLUP for all the three milk production traits. In the scenario of each chromosome as one region, BTA 14 and BTA 5 accounted for very large genomic variance, covariance and correlation for milk yield and fat yield, whereas no specific chromosome showed very large genomic variance, covariance and correlation for protein yield. In the scenario of every 100 SNPs as one region, most regions explained <0.50% of genomic variance and covariance for milk yield and fat yield, and explained <0.30% for protein yield, while some regions could present large variance and covariance.
Although overall correlations between the two populations for the three traits were positive and high, a few regions still showed weakly positive or highly negative genomic correlations for milk yield and fat yield. The new multi-trait Bayesian method using latent variables to model heterogeneous variance and covariance could work well for estimating the genomic variances and covariances for all genome regions simultaneously. Those estimated genomic parameters could be useful to improve the genomic prediction accuracy for Chinese and Nordic Holstein populations using joint reference data in the future.
Georges, Anouk; Cambisano, Nadine; Ahariz, Naïma; Karim, Latifa; Georges, Michel
2013-01-01
A genome-wide linkage scan was conducted in a Northern-European multigenerational pedigree with nine of 40 related members affected with concomitant strabismus. Twenty-seven members of the pedigree including all affected individuals were genotyped using a SNP array interrogating > 300,000 common SNPs. We conducted parametric and non-parametric linkage analyses assuming segregation of an autosomal dominant mutation, yet allowing for incomplete penetrance and phenocopies. We detected two chromosome regions with near-suggestive evidence for linkage, respectively on chromosomes 8 and 18. The chromosome 8 linkage implied a penetrance of 0.80 and a rate of phenocopy of 0.11, while the chromosome 18 linkage implied a penetrance of 0.64 and a rate of phenocopy of 0. Our analysis excludes a simple genetic determinism of strabismus in this pedigree. PMID:24376720
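The penetrance and phenocopy parameters quoted above can be read as conditional probabilities of being affected given carrier status. As a rough illustration, the sketch below applies Bayes' rule to ask how likely an affected relative is to carry the dominant allele. The function name and the prior of 0.5 (a child of one heterozygous carrier) are assumptions made for this example and are not taken from the study.

```python
def carrier_posterior(affected: bool, prior_carrier: float,
                      penetrance: float, phenocopy_rate: float) -> float:
    """Posterior probability of carrying the dominant allele given phenotype.

    penetrance     = P(affected | carrier)
    phenocopy_rate = P(affected | non-carrier)
    """
    like_carrier = penetrance if affected else 1.0 - penetrance
    like_noncarrier = phenocopy_rate if affected else 1.0 - phenocopy_rate
    numerator = prior_carrier * like_carrier
    return numerator / (numerator + (1.0 - prior_carrier) * like_noncarrier)


# Parameter values reported for the two linkage regions; the prior is assumed.
chr8 = carrier_posterior(True, 0.5, penetrance=0.80, phenocopy_rate=0.11)
chr18 = carrier_posterior(True, 0.5, penetrance=0.64, phenocopy_rate=0.0)
print(round(chr8, 3), chr18)
```

With a phenocopy rate of 0, every affected individual must be a carrier, so the chromosome 18 model gives a posterior of 1. Under the chromosome 8 parameters the posterior is about 0.88, because some affected individuals are expected to be phenocopies.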
Genetic diversity and trait genomic prediction in a pea diversity panel.
Burstin, Judith; Salloignon, Pauline; Chabert-Martinello, Marianne; Magnin-Robert, Jean-Bernard; Siol, Mathieu; Jacquin, Françoise; Chauveau, Aurélie; Pont, Caroline; Aubert, Grégoire; Delaitre, Catherine; Truntzer, Caroline; Duc, Gérard
2015-02-21
Pea (Pisum sativum L.), a major pulse crop grown for its protein-rich seeds, is an important component of agroecological cropping systems in diverse regions of the world. New breeding challenges imposed by global climate change and new regulations urge pea breeders to undertake more efficient methods of selection and better take advantage of the large genetic diversity present in the Pisum sativum genepool. Diversity studies conducted so far in pea used Simple Sequence Repeat (SSR) and Retrotransposon Based Insertion Polymorphism (RBIP) markers. Recently, SNP marker panels have been developed that will be useful for genetic diversity assessment and marker-assisted selection. A collection of diverse pea accessions, including landraces and cultivars of garden, field or fodder peas as well as wild peas, was characterised at the molecular level using newly developed SNP markers, as well as SSR markers and RBIP markers. The three types of markers were used to describe the structure of the collection and revealed different pictures of the genetic diversity among the collection. SSR showed the fastest rate of evolution and RBIP the slowest rate of evolution, pointing to their contrasting modes of evolution. SNP markers were then used to predict three phenotypes recorded for the collection: the date of flowering (BegFlo), the number of seeds per plant (Nseed) and thousand seed weight (TSW). Different statistical methods were tested, including the LASSO (Least Absolute Shrinkage and Selection Operator), PLS (Partial Least Squares), SPLS (Sparse Partial Least Squares), Bayes A, Bayes B and GBLUP (Genomic Best Linear Unbiased Prediction) methods, and the structure of the collection was taken into account in the prediction. Despite a limited number of 331 markers used for prediction, TSW was reliably predicted. The development of marker assisted selection has not reached its full potential in pea until now.
This paper shows that the high-throughput SNP arrays that are being developed will most probably allow for a more efficient selection in this species.
Luukkonen, Aino; Teramo, Kari; Puttonen, Hilkka; Ojaniemi, Marja; Varilo, Teppo; Chaudhari, Bimal P.; Plunkett, Jevon; Murray, Jeffrey C.; McCarroll, Steven A.; Muglia, Louis J.; Palotie, Aarno; Hallman, Mikko
2011-01-01
Preterm birth is the major cause of neonatal death and serious morbidity. Most preterm births are due to spontaneous onset of labor without a known cause or effective prevention. Both maternal and fetal genomes influence the predisposition to spontaneous preterm birth (SPTB), but the susceptibility loci remain to be defined. We utilized a combination of unique population structures, family-based linkage analysis, and subsequent case-control association to identify a susceptibility haplotype for SPTB. Clinically well-characterized SPTB families from northern Finland, a subisolate founded by a relatively small founder population that has subsequently experienced a number of bottlenecks, were selected for the initial discovery sample. Genome-wide linkage analysis using a high-density single-nucleotide polymorphism (SNP) array in seven large northern Finnish non-consanguineous families identified a locus on 15q26.3 (HLOD 4.68). This region contains the IGF1R gene, which encodes the type 1 insulin-like growth factor receptor IGF-1R. Haplotype segregation analysis revealed that a 55 kb 12-SNP core segment within the IGF1R gene was shared identical-by-state (IBS) in five families. A follow-up case-control study in an independent sample representing the more general Finnish population showed an association of a 6-SNP IGF1R haplotype with SPTB in the fetuses, providing further evidence for IGF1R as an SPTB predisposition gene (frequency in cases versus controls 0.11 versus 0.05, P = 0.001, odds ratio 2.3). This study demonstrates the identification of a predisposing, low-frequency haplotype in a multifactorial trait using a well-characterized population and a combination of family and case-control designs. Our findings support the identification of the novel susceptibility gene IGF1R for predisposition by the fetal genome to being born preterm. PMID:21304894
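The reported odds ratio can be approximately recovered from the two haplotype frequencies alone. The sketch below is a back-of-the-envelope check, assuming the odds ratio is computed at the haplotype level from the rounded frequencies; the study's own estimate may have been derived from the underlying counts or a fitted model.

```python
def odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Odds ratio for carrying a haplotype, from its frequency in each group."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Haplotype frequencies quoted in the abstract (cases 0.11, controls 0.05).
estimate = odds_ratio(0.11, 0.05)
print(f"{estimate:.2f}")
```

The result is about 2.35, consistent with the reported value of 2.3 given that the input frequencies are rounded to two decimals.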
Cronin, Matthew A; Rincon, Gonzalo; Meredith, Robert W; MacNeil, Michael D; Islas-Trejo, Alma; Cánovas, Angela; Medrano, Juan F
2014-01-01
We assessed the relationships of polar bears (Ursus maritimus), brown bears (U. arctos), and black bears (U. americanus) with high throughput genomic sequencing data with an average coverage of 25× for each species. A total of 1.4 billion 100-bp paired-end reads were assembled using the polar bear and annotated giant panda (Ailuropoda melanoleuca) genome sequences as references. We identified 13.8 million single nucleotide polymorphisms (SNP) in the 3 species aligned to the polar bear genome. These data indicate that polar bears and brown bears share more SNP with each other than either does with black bears. Concatenation and coalescence-based analysis of consensus sequences of approximately 1 million base pairs of ultraconserved elements in the nuclear genome resulted in a phylogeny with black bears as the sister group to brown and polar bears, and all brown bears are in a separate clade from polar bears. Genotypes for 162 SNP loci of 336 bears from Alaska and Montana showed that the species are genetically differentiated and there is geographic population structure of brown and black bears but not polar bears.
snpAD: An ancient DNA genotype caller.
Prüfer, Kay
2018-06-21
The study of ancient genomes can elucidate the evolutionary past. However, analyses are complicated by base-modifications in ancient DNA molecules that result in errors in DNA sequences. These errors are particularly common near the ends of sequences and pose a challenge for genotype calling. I describe an iterative method that estimates genotype frequencies and errors along sequences to allow for accurate genotype calling from ancient sequences. The implementation of this method, called snpAD, performs well on high-coverage ancient data, as shown by simulations and by subsampling the data of a high-coverage Neandertal genome. Although estimates for low-coverage genomes are less accurate, I am able to derive approximate estimates of heterozygosity from several low-coverage Neandertals. These estimates show that low heterozygosity, compared to modern humans, was common among Neandertals. The C++ code of snpAD is freely available at http://bioinf.eva.mpg.de/snpAD/. Supplementary data are available at Bioinformatics online.
Shen, Wei; Paxton, Christian N; Szankasi, Philippe; Longhurst, Maria; Schumacher, Jonathan A; Frizzell, Kimberly A; Sorrells, Shelly M; Clayton, Adam L; Jattani, Rakhi P; Patel, Jay L; Toydemir, Reha; Kelley, Todd W; Xu, Xinjie
2018-04-01
Genetic abnormalities, including copy number variants (CNV), copy number neutral loss of heterozygosity (CN-LOH) and gene mutations, underlie the pathogenesis of myeloid malignancies and serve as important diagnostic, prognostic and/or therapeutic markers. Currently, multiple testing strategies are required for comprehensive genetic testing in myeloid malignancies. The aim of this proof-of-principle study was to investigate the feasibility of combining detection of genome-wide large CNVs, CN-LOH and targeted gene mutations into a single assay using next-generation sequencing (NGS). For genome-wide CNV detection, we designed a single nucleotide polymorphism (SNP) sequencing backbone with 22 762 SNP regions evenly distributed across the entire genome. For targeted mutation detection, 62 frequently mutated genes in myeloid malignancies were targeted. We combined this SNP sequencing backbone with a targeted mutation panel, and sequenced 9 healthy individuals and 16 patients with myeloid malignancies using NGS. We detected 52 somatic CNVs, 11 instances of CN-LOH and 39 oncogenic mutations in the 16 patients with myeloid malignancies, and none in the 9 healthy individuals. All CNVs and CN-LOH were confirmed by SNP microarray analysis. We describe a genome-wide SNP sequencing backbone which allows for sensitive detection of genome-wide CNVs and CN-LOH using NGS. This proof-of-principle study has demonstrated that this strategy can provide more comprehensive genetic profiling for patients with myeloid malignancies using a single assay. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Genomewide association study of liver abscess in beef cattle.
Keele, J W; Kuehn, L A; McDaneld, T G; Tait, R G; Jones, S A; Keel, B N; Snelling, W M
2016-02-01
Fourteen percent of U.S. cattle slaughtered in 2011 had liver abscesses, resulting in reduced carcass weight, quality, and value. Liver abscesses can result from a common bacterial cause, which inhabits rumen lesions caused by acidosis and subsequently escapes into the blood stream, is filtered by the liver, and causes abscesses in the liver. Our aim was to identify SNP associated with liver abscesses in beef cattle. We used lung samples as a DNA source because they have low economic value, they have abundant DNA, and we had unrestricted access to sample them. We collected 2,304 lung samples from a beef processing plant: 1,152 from animals with liver abscess and 1,152 from animals without liver abscess. Lung tissue samples from pairs of animals, one with abscesses and one without, were collected from near one another on the viscera table to ensure that pairs of phenotypically extreme animals came from the same lot. Within each phenotype (abscess or no abscess), cattle were pooled by slaughter sequence into 12 pools of 96 cattle for each phenotype for a total of 24 pools. The pools were constructed by equal volume of frozen lung tissue from each animal. The DNA needed to allelotype each pool was then extracted from pooled lung tissue and the BovineHD Bead Array (777,962 SNP) was run on all 24 pools. Total intensity (TI), an indicator of copy number variants, was the sum of intensities from red and green dyes. Pooling allele frequency (PAF) was red dye intensity divided by TI. Total intensity and PAF were weighted by the inverse of their respective genomic covariance matrices computed over all SNP across the genome. A false discovery rate ≤ 5% was achieved for 15 SNP for PAF and 20 SNP for TI.
Genes within 50 kbp of significant SNP were in diverse pathways, including maintenance of pH homeostasis in the gastrointestinal tract, maintenance of immune defenses in the liver, migration of leukocytes from the blood into infected tissues, transport of glutamine into the kidney in response to acidosis to facilitate production of bicarbonate to increase pH, aggregation of platelets at sites of liver injury to facilitate liver repair, and facilitation of axon guidance. Evidence from the 35 detected SNP associations, combined with evidence of polygenic variation, indicates that there is adequate genetic variation in incidence rate of liver abscesses, which could be exploited to select sires for reduced susceptibility to subacute acidosis and associated liver abscess.
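The two per-SNP pool statistics defined in this abstract, total intensity (TI) and pooling allele frequency (PAF), follow directly from the red and green dye intensities. The sketch below is a minimal illustration of those two definitions only; the function names and the example intensities are hypothetical, and the covariance weighting described in the abstract is not reproduced.

```python
def total_intensity(red: float, green: float) -> float:
    """TI: sum of the red and green dye intensities for one SNP in one pool."""
    return red + green


def pooling_allele_frequency(red: float, green: float) -> float:
    """PAF: red dye intensity divided by total intensity."""
    ti = total_intensity(red, green)
    if ti == 0:
        # No signal at all: the allele frequency is undefined.
        raise ValueError("total intensity is zero; PAF is undefined")
    return red / ti


# Hypothetical intensities, for illustration only.
print(total_intensity(3.0, 1.0))           # 4.0
print(pooling_allele_frequency(3.0, 1.0))  # 0.75
```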
Molecular inversion probe assay for allelic quantitation
Ji, Hanlee; Welch, Katrina
2010-01-01
Molecular inversion probe (MIP) technology has been demonstrated to be a robust platform for large-scale dual genotyping and copy number analysis. Applications in human genomic and genetic studies include the possibility of running dual germline genotyping and combined copy number variation ascertainment. MIPs analyze large numbers of specific genetic target sequences in parallel, relying on interrogation of a barcode tag, rather than direct hybridization of genomic DNA to an array. The MIP approach does not replace, but is complementary to many of the copy number technologies being performed today. Some specific advantages of MIP technology include: Less DNA required (37 ng vs. 250 ng), DNA quality less important, more dynamic range (amplifications detected up to copy number 60), allele specific information “cleaner” (less SNP crosstalk/contamination), and quality of markers better (fewer individual MIPs versus SNPs needed to identify copy number changes). MIPs can be considered a candidate gene (targeted whole genome) approach and can find specific areas of interest that otherwise may be missed with other methods. PMID:19488872
CGDSNPdb: a database resource for error-checked and imputed mouse SNPs.
Hutchins, Lucie N; Ding, Yueming; Szatkiewicz, Jin P; Von Smith, Randy; Yang, Hyuna; de Villena, Fernando Pardo-Manuel; Churchill, Gary A; Graber, Joel H
2010-07-06
The Center for Genome Dynamics Single Nucleotide Polymorphism Database (CGDSNPdb) is an open-source value-added database with more than nine million mouse single nucleotide polymorphisms (SNPs), drawn from multiple sources, with genotypes assigned to multiple inbred strains of laboratory mice. All SNPs are checked for accuracy and annotated for properties specific to the SNP as well as those implied by changes to overlapping protein-coding genes. CGDSNPdb serves as the primary interface to two unique data sets, the 'imputed genotype resource' in which a Hidden Markov Model was used to assess local haplotypes and the most probable base assignment at several million genomic loci in tens of strains of mice, and the Affymetrix Mouse Diversity Genotyping Array, a high density microarray with over 600,000 SNPs and over 900,000 invariant genomic probes. CGDSNPdb is accessible online through either a web-based query tool or a MySQL public login. Database URL: http://cgd.jax.org/cgdsnpdb/
Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel.
Huang, Jie; Howie, Bryan; McCarthy, Shane; Memari, Yasin; Walter, Klaudia; Min, Josine L; Danecek, Petr; Malerba, Giovanni; Trabetti, Elisabetta; Zheng, Hou-Feng; Gambaro, Giovanni; Richards, J Brent; Durbin, Richard; Timpson, Nicholas J; Marchini, Jonathan; Soranzo, Nicole
2015-09-14
Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele frequency in the British population. Here we demonstrate the value of this resource for improving imputation accuracy at rare and low-frequency variants in both a UK and an Italian population. We show that large increases in imputation accuracy can be achieved by re-phasing WGS reference panels after initial genotype calling. We also present a method for combining WGS panels to improve variant coverage and downstream imputation accuracy, which we illustrate by integrating 7,562 WGS haplotypes from the UK10K project with 2,184 haplotypes from the 1000 Genomes Project. Finally, we introduce a novel approximation that maintains speed without sacrificing imputation accuracy for rare variants.
Keaton, Jacob M; Gao, Chuan; Guan, Meijian; Hellwege, Jacklyn N; Palmer, Nicholette D; Pankow, James S; Fornage, Myriam; Wilson, James G; Correa, Adolfo; Rasmussen-Torvik, Laura J; Rotter, Jerome I; Chen, Yii-Der I; Taylor, Kent D; Rich, Stephen S; Wagenknecht, Lynne E; Freedman, Barry I; Ng, Maggie C Y; Bowden, Donald W
2018-04-24
Although type 2 diabetes (T2D) results from metabolic defects in insulin secretion and insulin sensitivity, most of the genetic risk loci identified to date relate to insulin secretion. We reported that T2D loci influencing insulin sensitivity may be identified through interactions with insulin secretion loci, thereby leading to T2D. Here, we hypothesize that joint testing of variant main effects and interaction effects with an insulin secretion locus increases power to identify genetic interactions leading to T2D. We tested this hypothesis with an intronic MTNR1B SNP, rs10830963, which is associated with acute insulin response to glucose, a dynamic measure of insulin secretion. rs10830963 was tested for interaction and joint (main + interaction) effects with genome-wide data in African Americans (2,452 cases and 3,772 controls) from five cohorts. Genome-wide genotype data (Affymetrix Human Genome 6.0 array) was imputed to a 1000 Genomes Project reference panel. T2D risk was modeled using logistic regression with rs10830963 dosage, age, sex, and principal component as predictors. Joint effects were captured using the Kraft two degrees of freedom test. Genome-wide significant (P < 5 × 10⁻⁸) interaction with MTNR1B and joint effects were detected for CMIP intronic SNP rs17197883 (P_interaction = 1.43 × 10⁻⁸; P_joint = 4.70 × 10⁻⁸). CMIP variants have been nominally associated with T2D, fasting glucose, and adiponectin in individuals of East Asian ancestry, with high-density lipoprotein, and with waist-to-hip ratio adjusted for body mass index in Europeans. These data support the hypothesis that additional genetic factors contributing to T2D risk, including insulin sensitivity loci, can be identified through interactions with insulin secretion loci. © 2018 WILEY PERIODICALS, INC.
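A two-degrees-of-freedom joint test of this kind is often carried out as a likelihood-ratio test comparing a model with the variant's main effect and its interaction term against a model without either. The sketch below assumes that formulation and assumes the two log-likelihoods have already been obtained from fitted logistic regressions; the function name and the example values are hypothetical and are not taken from the study. For a chi-square distribution with 2 degrees of freedom the upper-tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def joint_2df_pvalue(loglik_null: float, loglik_full: float) -> float:
    """P-value of a 2-df likelihood-ratio test (main effect + interaction).

    loglik_null: log-likelihood of the model without the tested variant.
    loglik_full: log-likelihood of the model with the variant's main effect
                 and its interaction term (two extra parameters).
    """
    statistic = 2.0 * (loglik_full - loglik_null)
    if statistic < 0:
        # The nested model cannot fit better; treat tiny negatives as zero.
        statistic = 0.0
    # Chi-square with 2 df is exponential with mean 2: P(X > x) = exp(-x / 2).
    return math.exp(-statistic / 2.0)


# Hypothetical log-likelihoods, for illustration only.
print(joint_2df_pvalue(-1010.0, -1000.0))
```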
Do you really know where this SNP goes?
USDA-ARS's Scientific Manuscript database
The release of build 10.2 of the swine genome was a marked improvement over previous builds and has proven extremely useful. However, as most know, there are regions of the genome that this particular build does not accurately represent. For instance, nearly 25% of the 62,162 SNP on the Illumina Por...
Kulbrock, Maike; Lehner, Stefanie; Metzger, Julia; Ohnesorge, Bernhard; Distl, Ottmar
2013-01-01
Equine recurrent uveitis (ERU) is a common eye disease affecting 3-15% of the horse population. A genome-wide association study (GWAS) using the Illumina equine SNP50 bead chip was performed to identify loci conferring risk to ERU. The sample included a total of 144 German warmblood horses. A GWAS showed a significant single nucleotide polymorphism (SNP) on horse chromosome (ECA) 20 at 49.3 Mb, with IL-17A and IL-17F being the closest genes. This locus explained 23% of the phenotypic variance for ERU. A GWAS taking into account the severity of ERU revealed a SNP on ECA18 near the crystallin gene cluster CRYGA-CRYGF. For both genomic regions on ECA18 and 20, significantly associated haplotypes containing the genome-wide significant SNPs could be demonstrated. In conclusion, our results are indicative of a genetic component regulating the possible critical role of IL-17A and IL-17F in the pathogenesis of ERU. The associated SNP on ECA18 may be indicative of cataract formation in the course of ERU.
USDA-ARS's Scientific Manuscript database
Our objective was to evaluate whether breed composition of crossbred cattle could be predicted using reference breed frequencies of SNP markers on the BovineSNP50 array. Semen DNA samples of over 2,000 bulls from 16 common commercial beef breeds were genotyped using the array and used to estimate cu...
Multifaceted Genomic Risk for Brain Function in Schizophrenia
Chen, Jiayu; Calhoun, Vince D.; Pearlson, Godfrey D.; Ehrlich, Stefan; Turner, Jessica A.; Ho, Beng-Choon; Wassink, Thomas H.; Michael, Andrew M; Liu, Jingyu
2012-01-01
Recently, deriving candidate endophenotypes from brain imaging data has become a valuable approach to study genetic influences on schizophrenia (SZ), whose pathophysiology remains unclear. In this work we utilized a multivariate approach, parallel independent component analysis, to identify genomic risk components associated with brain function abnormalities in SZ. 5157 candidate single nucleotide polymorphisms (SNPs) were derived from a genome-wide array based on their possible connections with SZ and further investigated for their associations with brain activations captured with functional magnetic resonance imaging (fMRI) during a sensorimotor task. Using data from 92 SZ patients and 116 healthy controls, we detected a significant correlation (r = 0.29; p = 2.41 × 10⁻⁵) between one fMRI component and one SNP component, both of which significantly differentiated patients from controls. The fMRI component mainly consisted of precentral and postcentral gyri, the major activated regions in the motor task. On average, higher activation in these regions was observed in participants with higher loadings of the linked SNP component, predominantly contributed to by 253 SNPs. 138 identified SNPs were from known coding regions of 100 unique genes. 31 identified SNPs did not differ between groups, but moderately correlated with some other group-discriminating SNPs, indicating interactions among alleles contributing towards elevated SZ susceptibility. The genes associated with the identified SNPs participated in four neurotransmitter pathways: GABA receptor signaling, dopamine receptor signaling, neuregulin signaling and glutamate receptor signaling. In summary, our work provides further evidence for the complexity of genomic risk to the functional brain abnormality in SZ and suggests a pathological role of interactions between SNPs, genes and multiple neurotransmitter pathways. PMID:22440650
Fox, Caroline S; Liu, Yongmei; White, Charles C; Feitosa, Mary; Smith, Albert V; Heard-Costa, Nancy; Lohman, Kurt; Johnson, Andrew D; Foster, Meredith C; Greenawalt, Danielle M; Griffin, Paula; Ding, Jinghong; Newman, Anne B; Tylavsky, Fran; Miljkovic, Iva; Kritchevsky, Stephen B; Launer, Lenore; Garcia, Melissa; Eiriksdottir, Gudny; Carr, J Jeffrey; Gudnason, Vilmunder; Harris, Tamara B; Cupples, L Adrienne; Borecki, Ingrid B
2012-01-01
Body fat distribution, particularly centralized obesity, is associated with metabolic risk above and beyond total adiposity. We performed genome-wide association of abdominal adipose depots quantified using computed tomography (CT) to uncover novel loci for body fat distribution among participants of European ancestry. Subcutaneous and visceral fat were quantified in 5,560 women and 4,997 men from 4 population-based studies. Genome-wide genotyping was performed using standard arrays and imputed to ~2.5 million Hapmap SNPs. Each study performed a genome-wide association analysis of subcutaneous adipose tissue (SAT), visceral adipose tissue (VAT), VAT adjusted for body mass index, and VAT/SAT ratio (a metric of the propensity to store fat viscerally as compared to subcutaneously) in the overall sample and in women and men separately. A weighted z-score meta-analysis was conducted. For the VAT/SAT ratio, our most significant association was for rs11118316 at the LYPLAL1 gene (p = 3.1 × 10⁻⁹), previously identified in association with waist-hip ratio. For SAT, the most significant SNP was in the FTO gene (p = 5.9 × 10⁻⁸). Given the known gender differences in body fat distribution, we performed sex-specific analyses. Our most significant finding was for VAT in women, rs1659258 near THNSL2 (p = 1.6 × 10⁻⁸), but not men (p = 0.75). Validation of this SNP in the GIANT consortium data demonstrated a similar sex-specific pattern, with observed significance in women (p = 0.006) but not men (p = 0.24) for BMI and waist circumference (p = 0.04 [women], p = 0.49 [men]). Finally, we interrogated our data for the 14 recently published loci for body fat distribution (measured by waist-hip ratio adjusted for BMI); associations were observed at 7 of these loci. In contrast, we observed associations at only 7/32 loci previously identified in association with BMI; the majority of overlap was observed with SAT.
Genome-wide association for visceral and subcutaneous fat revealed a SNP for VAT in women. More refined phenotypes for body composition and fat distribution can detect new loci not previously uncovered in large-scale GWAS of anthropometric traits.
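A weighted z-score meta-analysis combines per-study z-scores into a single statistic. The sketch below assumes the common convention of weighting each study by the square root of its sample size; the abstract does not state which weights were used, and the function names and example inputs are hypothetical.

```python
import math


def weighted_z_meta(z_scores, sample_sizes):
    """Combine per-study z-scores using sqrt(sample size) weights (assumed convention)."""
    if len(z_scores) != len(sample_sizes) or not z_scores:
        raise ValueError("need one sample size per z-score, and at least one study")
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_pvalue(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical per-study z-scores and sample sizes, for illustration only.
z_combined = weighted_z_meta([2.0, 2.0], [100, 100])
print(z_combined, two_sided_pvalue(z_combined))
```

With two equally sized studies that each report z = 2, the combined statistic is 2√2, which shows how consistent evidence across studies strengthens the overall signal.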
Pappas, D J; Lizee, A; Paunic, V; Beutner, K R; Motyer, A; Vukcevic, D; Leslie, S; Biesiada, J; Meller, J; Taylor, K D; Zheng, X; Zhao, L P; Gourraud, P-A; Hollenbach, J A; Mack, S J; Maiers, M
2018-05-22
Four single nucleotide polymorphism (SNP)-based human leukocyte antigen (HLA) imputation methods (e-HLA, HIBAG, HLA*IMP:02 and MAGPrediction) were trained using 1000 Genomes SNP and HLA genotypes and assessed for their ability to accurately impute molecular HLA-A, -B, -C and -DRB1 genotypes in the Human Genome Diversity Project cell panel. Imputation concordance was high (>89%) across all methods for both HLA-A and HLA-C, but HLA-B and HLA-DRB1 proved generally difficult to impute. Overall, <27.8% of subjects were correctly imputed for all HLA loci by any method. Concordance across all loci was not enhanced via the application of confidence thresholds; reliance on confidence scores across methods only led to noticeable improvement (+3.2%) for HLA-DRB1. As the HLA complex is highly relevant to the study of human health and disease, a standardized assessment of SNP-based HLA imputation methods is crucial for advancing genomic research. Considerable room remains for the improvement of HLA-B and especially HLA-DRB1 imputation methods, and no imputation method is as accurate as molecular genotyping. The application of large, ancestrally diverse HLA and SNP reference data sets and multiple imputation methods has the potential to make SNP-based HLA imputation methods a tractable option for determining HLA genotypes.
Genovar: a detection and visualization tool for genomic variants.
Jung, Kwang Su; Moon, Sanghoon; Kim, Young Jin; Kim, Bong-Jo; Park, Kiejung
2012-05-08
Along with single nucleotide polymorphisms (SNPs), copy number variation (CNV) is considered an important source of genetic variation associated with disease susceptibility. Despite the importance of CNV, the tools currently available for its analysis often produce false positive results due to limitations such as low resolution of array platforms, platform specificity, and the type of CNV. To resolve this problem, spurious signals must be separated from true signals by visual inspection. None of the previously reported CNV analysis tools support this function and the simultaneous visualization of comparative genomic hybridization arrays (aCGH) and sequence alignment. The purpose of the present study was to develop a useful program for the efficient detection and visualization of CNV regions that enables the manual exclusion of erroneous signals. A JAVA-based stand-alone program called Genovar was developed. To ascertain whether a detected CNV region is a novel variant, Genovar compares the detected CNV regions with previously reported CNV regions using the Database of Genomic Variants (DGV, http://projects.tcag.ca/variation) and the Single Nucleotide Polymorphism Database (dbSNP). The current version of Genovar is capable of visualizing genomic data from sources such as the aCGH data file and sequence alignment format files. Genovar is freely accessible and provides a user-friendly graphic user interface (GUI) to facilitate the detection of CNV regions. The program also provides comprehensive information to help in the elimination of spurious signals by visual inspection, making Genovar a valuable tool for reducing false positive CNV results. http://genovar.sourceforge.net/.
Genome-wide investigation of genetic changes during modern breeding of Brassica napus.
Wang, Nian; Li, Feng; Chen, Biyun; Xu, Kun; Yan, Guixin; Qiao, Jiangwei; Li, Jun; Gao, Guizhen; Bancroft, Ian; Meng, Jingling; King, Graham J; Wu, Xiaoming
2014-08-01
Considerable genome variation has been incorporated within rapeseed breeding programs over past decades. In that time, there have been substantial changes in the phenotypic properties of rapeseed as a result of extensive breeding effort. Uncovering the underlying patterns of allelic variation in the context of genome organisation would provide knowledge to guide future genetic improvement. We assessed genome-wide genetic changes, including population structure, genetic relatedness, the extent of linkage disequilibrium, nucleotide diversity and genetic differentiation based on FST outlier detection, for a panel of 472 Brassica napus inbred accessions using a 60 k Brassica Infinium® SNP array. We found that genetic diversity varied among sub-groups. Moreover, genetic diversity increased from 1950 to 1980 and then remained at a similar level in China and Europe. We also found that ~6-10% of genomic regions showed high FST values. Some QTLs previously associated with important agronomic traits overlapped with these regions. Overall, the B. napus C genome was found to have more high-FST signals than the A genome, and we concluded that the C genome may contribute more valuable alleles to generate elite traits. The results of this study indicate that considerable genome variation has been incorporated within rapeseed breeding programs over past decades. These results also contribute to understanding the impact of rapeseed improvement on available genome variation and the potential for dissecting complex agronomic traits.
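For readers unfamiliar with FST, the sketch below computes a simple per-SNP value for a biallelic marker from the allele frequencies in two groups, using the heterozygosity form FST = (H_T - H_S) / H_T. This is an assumption made for illustration: the abstract does not say which FST estimator or outlier procedure was used, equal group sizes are assumed, and the function name and example frequencies are hypothetical.

```python
def fst_two_groups(p1: float, p2: float) -> float:
    """Per-SNP FST for a biallelic marker in two equally sized groups.

    p1, p2: frequency of the same allele in group 1 and group 2.
    """
    p_mean = (p1 + p2) / 2.0
    h_total = 2.0 * p_mean * (1.0 - p_mean)                        # pooled expected heterozygosity
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-group value
    if h_total == 0.0:
        # Marker is fixed for the same allele in both groups: no differentiation.
        return 0.0
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, for illustration only.
print(round(fst_two_groups(0.2, 0.8), 2))  # prints 0.36
```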
Nuñez-Acuña, Gustavo; Valenzuela-Muñoz, Valentina; Gallardo-Escárate, Cristian
2014-06-01
The salmon louse Caligus rogercresseyi is the dominant ectoparasite species affecting the salmon aquaculture industry in the Southern hemisphere, and it is currently the main cause for economic losses in Chilean aquaculture. However, despite the great concern over Caligus infestations, genomic information on this louse is still scarce, even while the need to develop high-resolution molecular markers is growing. This study provides the first deep transcriptome survey to identify thousands of SNP markers from C. rogercresseyi, with a total of 69,466 SNPs identified using the MiSeq platform (Illumina®), 30,605 (52%) of which were found in contigs successfully annotated against known protein databases. Furthermore, in silico gene expression profiles associated with SNP variants were evaluated, and the results evidenced a wide array of genes that were down- and upregulated throughout the developmental stages of C. rogercresseyi. Interestingly, putative KEGG pathways involved in resistance to antiparasitic agents were also identified, where ten pathways were associated with the nervous system and one was related to ABC transporters. Taken together, this information could be highly useful for investigating the molecular underpinnings involved in the susceptibility or resistance of salmon lice to chemical treatments. Copyright © 2014 Elsevier Inc. All rights reserved.
Weigel, K A; de los Campos, G; González-Recio, O; Naya, H; Wu, X L; Long, N; Rosa, G J M; Gianola, D
2009-10-01
The objective of the present study was to assess the predictive ability of subsets of single nucleotide polymorphism (SNP) markers for development of low-cost, low-density genotyping assays in dairy cattle. Dense SNP genotypes of 4,703 Holstein bulls were provided by the USDA Agricultural Research Service. A subset of 3,305 bulls born from 1952 to 1998 was used to fit various models (training set), and a subset of 1,398 bulls born from 1999 to 2002 was used to evaluate their predictive ability (testing set). After editing, data included genotypes for 32,518 SNP and August 2003 and April 2008 predicted transmitting abilities (PTA) for lifetime net merit (LNM$), the latter resulting from progeny testing. The Bayesian least absolute shrinkage and selection operator method was used to regress August 2003 PTA on marker covariates in the training set to arrive at estimates of marker effects and direct genomic PTA. The coefficient of determination (R²) from regressing the April 2008 progeny test PTA of bulls in the testing set on their August 2003 direct genomic PTA was 0.375. Subsets of 300, 500, 750, 1,000, 1,250, 1,500, and 2,000 SNP were created by choosing equally spaced and highly ranked SNP, with the latter based on the absolute value of their estimated effects obtained from the training set. The SNP effects were re-estimated from the training set for each subset of SNP, and the 2008 progeny test PTA of bulls in the testing set were regressed on corresponding direct genomic PTA. The R² values for subsets of 300, 500, 750, 1,000, 1,250, 1,500, and 2,000 SNP with largest effects (evenly spaced SNP) were 0.184 (0.064), 0.236 (0.111), 0.269 (0.190), 0.289 (0.179), 0.307 (0.228), 0.313 (0.268), and 0.322 (0.291), respectively.
These results indicate that a low-density assay comprising selected SNP could be a cost-effective alternative for selection decisions and that significant gains in predictive ability may be achieved by increasing the number of SNP allocated to such an assay from 300 or fewer to 1,000 or more.
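The two subset-selection strategies compared in this study, equally spaced markers and markers ranked by the absolute value of their estimated effects, can be sketched as follows. This is an illustration under simplifying assumptions: markers are assumed to be already ordered by genomic position, spacing is by index rather than by physical distance, and the function names and example effects are hypothetical.

```python
def evenly_spaced_subset(n_markers: int, k: int) -> list:
    """Indices of k approximately evenly spaced markers out of n_markers."""
    if not 0 < k <= n_markers:
        raise ValueError("k must be between 1 and n_markers")
    step = n_markers / k
    # Take the marker nearest the middle of each of k equal-width bins.
    return [int(i * step + step / 2) for i in range(k)]


def top_ranked_subset(effects, k: int) -> list:
    """Indices of the k markers with the largest absolute estimated effects."""
    if not 0 < k <= len(effects):
        raise ValueError("k must be between 1 and the number of markers")
    ranked = sorted(range(len(effects)), key=lambda i: abs(effects[i]), reverse=True)
    return sorted(ranked[:k])


# Hypothetical marker effects, for illustration only.
effects = [0.1, -0.9, 0.3, 0.05, -0.4]
print(evenly_spaced_subset(10, 5))    # [1, 3, 5, 7, 9]
print(top_ranked_subset(effects, 2))  # [1, 4]
```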
Livingstone, Donald; Stack, Conrad; Mustiga, Guiliana M.; Rodezno, Dayana C.; Suarez, Carmen; Amores, Freddy; Feltus, Frank A.; Mockaitis, Keithanne; Cornejo, Omar E.; Motamayor, Juan C.
2017-01-01
Cacao (Theobroma cacao L.) is an important cash crop in tropical regions around the world and has a rich agronomic history in South America. As a key component in the cosmetic and confectionery industries, millions of people worldwide use products made from cacao, ranging from shampoo to chocolate. An Illumina Infinium II array was created using 13,530 SNPs identified within a small diversity panel of cacao. Of these SNPs, 12,643 derive from variation within annotated cacao genes. The genotypes of 3,072 trees were obtained, including two mapping populations from Ecuador. High-density linkage maps for these two populations were generated and compared to the cacao genome assembly. Phenotypic data from these populations were combined with the linkage maps to identify the QTLs for yield and disease resistance. PMID:29259608
Moore, Jean-Sébastien; Bourret, Vincent; Dionne, Mélanie; Bradbury, Ian; O'Reilly, Patrick; Kent, Matthew; Chaput, Gérald; Bernatchez, Louis
2014-12-01
Anadromous Atlantic salmon (Salmo salar) is a species of major conservation and management concern in North America, where population abundance has been declining over the past 30 years. Effective conservation actions require the delineation of conservation units to appropriately reflect the spatial scale of intraspecific variation and local adaptation. Towards this goal, we used the most comprehensive genetic and genomic database for Atlantic salmon to date, covering the entire North American range of the species. The database included microsatellite data from 9142 individuals from 149 sampling locations and data from a medium-density SNP array providing genotypes for >3000 SNPs for 50 sampling locations. We used neutral and putatively selected loci to integrate adaptive information in the definition of conservation units. Bayesian clustering with the microsatellite data set and with neutral SNPs identified regional groupings largely consistent with previously published regional assessments. The use of outlier SNPs did not result in major differences in the regional groupings, suggesting that neutral markers can reflect the geographic scale of local adaptation despite not being under selection. We also performed assignment tests to compare power obtained from microsatellites, neutral SNPs and outlier SNPs. Using SNP data substantially improved power compared to microsatellites, and an assignment success of 97% to the population of origin and of 100% to the region of origin was achieved when all SNP loci were used. Using outlier SNPs only resulted in minor improvements to assignment success to the population of origin but improved regional assignment. We discuss the implications of these new genetic resources for the conservation and management of Atlantic salmon in North America. © 2014 John Wiley & Sons Ltd.
Davis, Brian W.; Schoenebeck, Jeffrey J.
2017-01-01
Domestic dog breeds display significant diversity in both body mass and skeletal size, resulting from intensive selective pressure during the formation and maintenance of modern breeds. While previous studies focused on the identification of alleles that contribute to small skeletal size, little is known about the underlying genetics controlling large size. We first performed a genome-wide association study (GWAS) using the Illumina Canine HD 170,000 single nucleotide polymorphism (SNP) array which compared 165 large-breed dogs from 19 breeds (defined as having a Standard Breed Weight (SBW) >41 kg [90 lb]) to 690 dogs from 69 small breeds (SBW ≤41 kg). We identified two loci on the canine X chromosome that were strongly associated with large body size at 82–84 megabases (Mb) and 101–104 Mb. Analyses of whole genome sequencing (WGS) data from 163 dogs revealed two indels in the Insulin Receptor Substrate 4 (IRS4) gene at 82.2 Mb and two additional mutations, one SNP and one deletion of a single codon, in Immunoglobulin Superfamily member 1 gene (IGSF1) at 102.3 Mb. IRS4 and IGSF1 are members of the GH/IGF1 and thyroid pathways whose roles include determination of body size. We also found one highly associated SNP in the 5’UTR of Acyl-CoA Synthetase Long-chain family member 4 (ACSL4) at 82.9 Mb, a gene which controls the traits of muscling and back fat thickness. We show by analysis of sequencing data from 26 wolves and 959 dogs representing 102 domestic dog breeds that skeletal size and body mass in large dog breeds are strongly associated with variants within IRS4, ACSL4 and IGSF1. PMID:28257443
The effect of algorithms on copy number variant detection.
Tsuang, Debby W; Millard, Steven P; Ely, Benjamin; Chi, Peter; Wang, Kenneth; Raskind, Wendy H; Kim, Sulgi; Brkanac, Zoran; Yu, Chang-En
2010-12-30
The detection of copy number variants (CNVs) and the results of CNV-disease association studies rely on how CNVs are defined, and because array-based technologies can only infer CNVs, CNV-calling algorithms can produce vastly different findings. Several authors have noted the large-scale variability between CNV-detection methods, as well as the substantial false positive and false negative rates associated with those methods. In this study, we use variations of four common algorithms for CNV detection (PennCNV, QuantiSNP, HMMSeg, and cnvPartition) and two definitions of overlap (any overlap and an overlap of at least 40% of the smaller CNV) to illustrate the effects of varying algorithms and definitions of overlap on CNV discovery. We used a 56 K Illumina genotyping array enriched for CNV regions to generate hybridization intensities and allele frequencies for 48 Caucasian schizophrenia cases and 48 age-, ethnicity-, and gender-matched control subjects. No algorithm found a difference in CNV burden between the two groups. However, the total number of CNVs called ranged from 102 to 3,765 across algorithms. The mean CNV size ranged from 46 kb to 787 kb, and the average number of CNVs per subject ranged from 1 to 39. The number of novel CNVs not previously reported in normal subjects ranged from 0 to 212. Motivated by the availability of multiple publicly available genome-wide SNP arrays, investigators are conducting numerous analyses to identify putative additional CNVs in complex genetic disorders. However, the number of CNVs identified in array-based studies, and whether these CNVs are novel or valid, will depend on the algorithm(s) used. Thus, given the variety of methods used, there will be many false positives and false negatives. Both guidelines for the identification of CNVs inferred from high-density arrays and the establishment of a gold standard for validation of CNVs are needed.
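The two overlap definitions named in the abstract above ("any overlap" and "an overlap of at least 40% of the smaller CNV") can be sketched as follows. This is a minimal illustration, not the authors' code: the `CNV` record, the half-open coordinate convention, and the function names are assumptions made for the example.

```python
from typing import NamedTuple


class CNV(NamedTuple):
    """Hypothetical CNV record; coordinates are half-open [start, end) in bp."""
    chrom: str
    start: int
    end: int


def overlap_bp(a: CNV, b: CNV) -> int:
    """Number of base pairs shared by two CNV calls (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def any_overlap(a: CNV, b: CNV) -> bool:
    """Definition 1: the calls match if they share at least one base pair."""
    return overlap_bp(a, b) > 0


def overlap_of_smaller(a: CNV, b: CNV, min_fraction: float = 0.40) -> bool:
    """Definition 2: the shared span covers at least min_fraction of the smaller call."""
    smaller = min(a.end - a.start, b.end - b.start)
    return smaller > 0 and overlap_bp(a, b) / smaller >= min_fraction


# Two calls that share 20 kb; the smaller call is 100 kb long (20% shared).
call_a = CNV("chr1", 100_000, 200_000)
call_b = CNV("chr1", 180_000, 400_000)
print(any_overlap(call_a, call_b), overlap_of_smaller(call_a, call_b))
```

Under the looser definition the two calls count as the same CNV, while under the 40% rule they do not, which is one way the choice of definition alone can change the number of concordant calls between algorithms.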
GenomeGems: evaluation of genetic variability from deep sequencing data
2012-01-01
Background Detection of disease-causing mutations using Deep Sequencing technologies poses great challenges. In particular, organizing the great amount of sequences generated so that mutations, which might possibly be biologically relevant, are easily identified is a difficult task. Yet, for this task only a limited number of automatic, accessible tools exist. Findings We developed GenomeGems to fill this gap by enabling the user to view and compare Single Nucleotide Polymorphisms (SNPs) from multiple datasets and to load the data onto the UCSC Genome Browser for an expanded and familiar visualization. As such, via automatic, clear and accessible presentation of processed Deep Sequencing data, our tool aims to facilitate ranking of genomic SNP calling. GenomeGems runs on a local Personal Computer (PC) and is freely available at http://www.tau.ac.il/~nshomron/GenomeGems. Conclusions GenomeGems enables researchers to identify potential disease-causing SNPs in an efficient manner. This enables rapid turnover of information and leads to further experimental SNP validation. The tool allows the user to compare and visualize SNPs from multiple experiments and to easily load SNP data onto the UCSC Genome browser for further detailed information. PMID:22748151
Peña-Llopis, Samuel; Brugarolas, James
2014-01-01
Genomic technologies have revolutionized our understanding of complex Mendelian diseases and cancer. Solid tumors present several challenges for genomic analyses, such as tumor heterogeneity and tumor contamination with surrounding stroma and infiltrating lymphocytes. We developed a protocol to (i) select tissues of high cellular purity on the basis of histological analyses of immediately flanking sections and (ii) simultaneously extract genomic DNA (gDNA), messenger RNA (mRNA), noncoding RNA (ncRNA; enriched in microRNA (miRNA)) and protein from the same tissues. After tissue selection, about 12–16 extractions of DNA/RNA/protein can be obtained per day. Compared with other similar approaches, this fast and reliable methodology allowed us to identify mutations in tumors with remarkable sensitivity and to perform integrative analyses of whole-genome and exome data sets, DNA copy numbers (by single-nucleotide polymorphism (SNP) arrays), gene expression data (by transcriptome profiling and quantitative PCR (qPCR)) and protein levels (by western blotting and immunohistochemical analysis) from the same samples. Although we focused on renal cell carcinoma, this protocol may be adapted with minor changes to any human or animal tissue to obtain high-quality and high-yield nucleic acids and proteins. PMID:24136348
Goodin, Douglas S.; Khankhanian, Pouya
2014-01-01
Background Genome-wide association studies (GWAS) identify disease-associations for single-nucleotide-polymorphisms (SNPs) from scattered genomic-locations. However, SNPs frequently reside on several different SNP-haplotypes, only some of which may be disease-associated. This circumstance lowers the observed odds-ratio for disease-association. Methodology/Principal Findings Here we develop a method to identify the two SNP-haplotypes, which combine to produce each person’s SNP-genotype over specified chromosomal segments. Two multiple sclerosis (MS)-associated genetic regions were modeled; DRB1 (a Class II molecule of the major histocompatibility complex) and MMEL1 (an endopeptidase that degrades both neuropeptides and β-amyloid). For each locus, we considered sets of eleven adjacent SNPs, surrounding the putative disease-associated gene and spanning ∼200 kb of DNA. The SNP-information was converted into an ordered-set of eleven-numbers (subject-vectors) based on whether a person had zero, one, or two copies of particular SNP-variant at each sequential SNP-location. SNP-strings were defined as those ordered-combinations of eleven-numbers (0 or 1), representing a haplotype, two of which combined to form the observed subject-vector. Subject-vectors were resolved using probabilistic methods. In both regions, only a small number of SNP-strings were present. We compared our method to the SHAPEIT-2 phasing-algorithm. When the SNP-information spanning 200 kb was used, SHAPEIT-2 was inaccurate. When the SHAPEIT-2 window was increased to 2,000 kb, the concordance between the two methods, in both of these eleven-SNP regions, was over 99%, suggesting that, in these regions, both methods were quite accurate. Nevertheless, correspondence was not uniformly high over the entire DNA-span but, rather, was characterized by alternating peaks and valleys of concordance. 
Moreover, in the valleys of poor-correspondence, SHAPEIT-2 was also inconsistent with itself, suggesting that the SNP-string method is more accurate across the entire region. Conclusions/Significance Accurate haplotype identification will enhance the detection of genetic-associations. The SNP-string method provides a simple means to accomplish this and can be extended to cover larger genomic regions, thereby improving a GWAS’s power, even for those published previously. PMID:24727690
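The relationship between a subject-vector (0, 1, or 2 copies at each SNP) and the pair of 0/1 SNP-strings that sum to it can be sketched as below. This only enumerates the candidate haplotype pairs; the probabilistic resolution step the authors describe is not reproduced here, and the function name `compatible_pairs` is a hypothetical choice for this example.

```python
from itertools import product


def compatible_pairs(subject):
    """Enumerate unordered pairs of 0/1 haplotypes whose element-wise sum is `subject`.

    Homozygous sites (0 or 2) are fixed in both haplotypes; each heterozygous
    site (1) can place its single variant copy on either haplotype.
    """
    if any(g not in (0, 1, 2) for g in subject):
        raise ValueError("genotype counts must be 0, 1, or 2")
    het_sites = [i for i, g in enumerate(subject) if g == 1]
    base = [g // 2 for g in subject]  # 0 -> 0, 2 -> 1; het sites filled in below
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, bit in zip(het_sites, bits):
            h1[site], h2[site] = bit, 1 - bit
        # Sort within the pair so (h1, h2) and (h2, h1) count once.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


# A four-SNP toy vector with two heterozygous sites.
for h1, h2 in compatible_pairs((0, 1, 2, 1)):
    print(h1, h2)
```

With k heterozygous sites there are 2^(k-1) candidate pairs (one pair when k = 0), so an eleven-SNP subject-vector can have up to 1,024 candidates, which is why some additional information is needed to choose among them.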
Gai, Xiaowu; Perin, Juan C; Murphy, Kevin; O'Hara, Ryan; D'arcy, Monica; Wenocur, Adam; Xie, Hongbo M; Rappaport, Eric F; Shaikh, Tamim H; White, Peter S
2010-02-04
Recent studies have shown that copy number variations (CNVs) are frequent in higher eukaryotes and associated with a substantial portion of inherited and acquired risk for various human diseases. The increasing availability of high-resolution genome surveillance platforms provides opportunity for rapidly assessing research and clinical samples for CNV content, as well as for determining the potential pathogenicity of identified variants. However, few informatics tools for accurate and efficient CNV detection and assessment currently exist. We developed a suite of software tools and resources (CNV Workshop) for automated, genome-wide CNV detection from a variety of SNP array platforms. CNV Workshop includes three major components: detection, annotation, and presentation of structural variants from genome array data. CNV detection utilizes a robust and genotype-specific extension of the Circular Binary Segmentation algorithm, and the use of additional detection algorithms is supported. Predicted CNVs are captured in a MySQL database that supports cohort-based projects and incorporates a secure user authentication layer and user/admin roles. To assist with determination of pathogenicity, detected CNVs are also annotated automatically for gene content, known disease loci, and gene-based literature references. Results are easily queried, sorted, filtered, and visualized via a web-based presentation layer that includes a GBrowse-based graphical representation of CNV content and relevant public data, integration with the UCSC Genome Browser, and tabular displays of genomic attributes for each CNV. To our knowledge, CNV Workshop represents the first cohesive and convenient platform for detection, annotation, and assessment of the biological and clinical significance of structural variants. 
CNV Workshop has been successfully utilized for assessment of genomic variation in healthy individuals and disease cohorts and is an ideal platform for coordinating multiple associated projects. Available on the web at: http://sourceforge.net/projects/cnv.
USDA-ARS's Scientific Manuscript database
One focus of the Sorghum Translational Genomics Lab (part of sorghum CRIS, PSGD, CSRL, USDA-ARS, Lubbock TX) is to utilize nucleotide variation between sorghum germplasm such as those derived from RNA seq for translation and validation of Single Nucleotide Polymorphism (SNP) into easy access DNA m...
USDA-ARS's Scientific Manuscript database
Genetic diversity, population structure, and genome-wide marker-trait association analyses were conducted on a special collection of 298 homozygous lettuce (Lactuca sativa L.) lines. Each of these lines was derived from a single plant that had been genotyped with 384 SNP markers using LSGermOPA. They...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Geraldes, Armando; Hannemann, Jan; Grassa, Chris
2013-01-01
Genetic mapping of quantitative traits requires genotypic data for large numbers of markers in many individuals. Despite the declining costs of genotyping by sequencing, for most studies, the use of large SNP genotyping arrays still offers the most cost-effective solution for large-scale targeted genotyping. Here we report on the design and performance of a SNP genotyping array for Populus trichocarpa (black cottonwood). This genotyping array was designed with SNPs pre-ascertained in 34 wild accessions covering most of the species range. Due to the rapid decay of linkage disequilibrium in P. trichocarpa we adopted a candidate gene approach to the array design that resulted in the selection of 34,131 SNPs, the majority of which are located in, or within 2 kb, of 3,543 candidate genes. A subset of the SNPs (539) was selected based on patterns of variation among the SNP discovery accessions. We show that more than 95% of the loci produce high quality genotypes and that the genotyping error rate for these is likely below 2%, indicating that high-quality data are generated with this array. We demonstrate that even among small numbers of samples (n=10) from local populations over 84% of loci are polymorphic. We also tested the applicability of the array to other species in the genus and found that due to ascertainment bias the number of polymorphic loci decreases rapidly with genetic distance, with the largest numbers detected in other species in section Tacamahaca (P. balsamifera and P. angustifolia). Finally, we provide evidence for the utility of the array for intraspecific studies of genetic differentiation and for species assignment and the detection of natural hybrids.
Trembizki, Ella; Smith, Helen; Lahra, Monica M; Chen, Marcus; Donovan, Basil; Fairley, Christopher K; Guy, Rebecca; Kaldor, John; Regan, David; Ward, James; Nissen, Michael D; Sloots, Theo P; Whiley, David M
2014-06-01
Neisseria gonorrhoeae antimicrobial resistance (AMR) is a global problem heightened by emerging resistance to ceftriaxone. Appropriate molecular typing methods are important for understanding the emergence and spread of N. gonorrhoeae AMR. We report on the development, validation and testing of a Sequenom MassARRAY iPLEX method for multilocus sequence typing (MLST)-style genotyping of N. gonorrhoeae isolates. An iPLEX MassARRAY method (iPLEX14SNP) was developed targeting 14 informative gonococcal single nucleotide polymorphisms (SNPs) previously shown to predict MLST types. The method was initially validated using 24 N. gonorrhoeae control isolates and was then applied to 397 test isolates collected throughout Queensland, Australia in the first half of 2012. The iPLEX14SNP method provided 100% accuracy for the control isolates, correctly identifying all 14 SNPs for all 24 isolates (336/336). For the 397 test isolates, the iPLEX14SNP assigned results for 5461 of the possible 5558 SNPs (SNP call rate 98.25%), with complete 14 SNP profiles obtained for 364 isolates. Based on the complete SNP profile data, there were 49 different sequence types identified in Queensland, with 11 of the 49 SNP profiles accounting for the majority (n = 280; 77%) of isolates. AMR was dominated by several geographically clustered sequence types. Using the iPLEX14SNP method, up to 384 isolates could be tested within 1 working day for less than Aus$10 per isolate. The iPLEX14SNP offers an accurate and high-throughput method for the MLST-style genotyping of N. gonorrhoeae and may prove particularly useful for large-scale studies investigating the emergence and spread of gonococcal AMR. © The Author 2014. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
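The call-rate figures quoted above follow directly from the counts in the abstract (14 SNPs per isolate, 397 test isolates, 24 control isolates). A short worked check, with the helper name `snp_call_rate` being a hypothetical choice for this example:

```python
def snp_call_rate(called: int, isolates: int, snps_per_isolate: int) -> float:
    """Fraction of all possible SNP positions that received a genotype call."""
    possible = isolates * snps_per_isolate
    if not 0 <= called <= possible:
        raise ValueError("called SNPs must lie between 0 and the number possible")
    return called / possible


# Test isolates: 397 isolates x 14 SNPs = 5558 possible calls, 5461 assigned.
test_rate = snp_call_rate(5461, isolates=397, snps_per_isolate=14)
# Control isolates: 24 isolates x 14 SNPs = 336 possible calls, all assigned.
control_rate = snp_call_rate(336, isolates=24, snps_per_isolate=14)

print(f"test isolates:    {test_rate:.2%}")
print(f"control isolates: {control_rate:.2%}")
```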
Joint Identification of Genetic Variants for Physical Activity in Korean Population
Kim, Jayoun; Kim, Jaehee; Min, Haesook; Oh, Sohee; Kim, Yeonjung; Lee, Andy H.; Park, Taesung
2014-01-01
There has been limited research on genome-wide association with physical activity (PA). This study ascertained genetic associations between PA and 344,893 single nucleotide polymorphism (SNP) markers in 8842 Korean samples. PA data were obtained from a validated questionnaire that included information on PA intensity and duration. Metabolic equivalent of tasks were calculated to estimate the total daily PA level for each individual. In addition to single- and multiple-SNP association tests, a pathway enrichment analysis was performed to identify the biological significance of SNP markers. Although no significant SNP was found at genome-wide significance level via single-SNP association tests, 59 genetic variants mapped to 76 genes were identified via a multiple SNP approach using a bootstrap selection stability measure. Pathway analysis for these 59 variants showed that maturity onset diabetes of the young (MODY) was enriched. Joint identification of SNPs could enable the identification of multiple SNPs with good predictive power for PA and a pathway enriched for PA. PMID:25026172
Genome-wide selective sweeps and gene-specific sweeps in natural bacterial populations
Bendall, Matthew L.; Stevens, Sarah L.R.; Chan, Leong-Keat; ...
2016-01-08
Multiple models describe the formation and evolution of distinct microbial phylogenetic groups. These evolutionary models make different predictions regarding how adaptive alleles spread through populations and how genetic diversity is maintained. Processes predicted by competing evolutionary models, for example, genome-wide selective sweeps vs gene-specific sweeps, could be captured in natural populations using time-series metagenomics if the approach were applied over a sufficiently long time frame. Direct observations of either process would help resolve how distinct microbial groups evolve. Using a 9-year metagenomic study of a freshwater lake (2005–2013), we explore changes in single-nucleotide polymorphism (SNP) frequencies and patterns of gene gain and loss in 30 bacterial populations. SNP analyses revealed substantial genetic heterogeneity within these populations, although the degree of heterogeneity varied by >1000-fold among populations. SNP allele frequencies also changed dramatically over time within some populations. Interestingly, nearly all SNP variants were slowly purged over several years from one population of green sulfur bacteria, while at the same time multiple genes either swept through or were lost from this population. Furthermore, these patterns were consistent with a genome-wide selective sweep in progress, a process predicted by the ‘ecotype model’ of speciation but not previously observed in nature. In contrast, other populations contained large, SNP-free genomic regions that appear to have swept independently through the populations prior to the study without purging diversity elsewhere in the genome. Finally, evidence for both genome-wide and gene-specific sweeps suggests that different models of bacterial speciation may apply to different populations coexisting in the same environment.
Genome-wide population structure and evolutionary history of the Frizarta dairy sheep.
Kominakis, A; Hager-Theodorides, A L; Saridaki, A; Antonakos, G; Tsiamis, G
2017-10-01
In the present study, we used genomic data, generated with a medium density single nucleotide polymorphisms (SNP) array, to acquire more information on the population structure and evolutionary history of the synthetic Frizarta dairy sheep. First, two typical measures of linkage disequilibrium (LD) were estimated at various physical distances that were then used to make inferences on the effective population size at key past time points. Population structure was also assessed by both multidimensional scaling analysis and k-means clustering on the distance matrix obtained from the animals' genomic relationships. Wright's fixation index FST was also employed to assess herds' genetic homogeneity and to indirectly estimate past migration rates. Wright's fixation index FIS and genomic inbreeding coefficients based on the genomic relationship matrix as well as on runs of homozygosity were also estimated. The Frizarta breed displays relatively low LD levels with r² and |D'| equal to 0.18 and 0.50, respectively, at an average inter-marker distance of 31 kb. Linkage disequilibrium decayed rapidly by distance and persisted over just a few thousand base pairs. Rate of LD decay (β) varied widely among the 26 autosomes with larger values estimated for shorter chromosomes (e.g. β=0.057, for OAR6) and smaller values for longer ones (e.g. β=0.022, for OAR2). The inferred effective population size at the beginning of the breed's formation was as high as 549, was then reduced to 463 in 1981 (end of the breed's formation) and further declined to 187, one generation ago. Multidimensional scaling analysis and k-means clustering suggested a genetically homogenous population, FST estimates indicated relatively low genetic differentiation between herds, whereas a heat map of the animals' genomic kinship relationships revealed a stratified population, at a herd level. 
Estimates of genomic inbreeding coefficients suggested that most recent parental relatedness may have been a major determinant of the current effective population size. A SNP panel denser than 50k may be more beneficial when performing genome-wide association studies in the breed.
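The two LD measures reported in the abstract above, r² and |D'|, can differ substantially for the same pair of markers. The sketch below uses the standard textbook definitions for two biallelic loci with known haplotype frequencies; it is not the authors' pipeline (which would work from array genotypes), and the function name and example frequencies are invented for illustration.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Return (r2, |D'|) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' scales |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return r2, d_prime


# Allele B (frequency 0.2) occurs only on haplotypes that carry A (frequency 0.6).
r2, d_prime = ld_stats(p_ab=0.2, p_a=0.6, p_b=0.2)
print(f"r2 = {r2:.3f}, |D'| = {d_prime:.3f}")
```

In this toy case |D'| reaches its maximum while r² stays low because the allele frequencies differ, which is the general reason |D'| estimates tend to sit above r² estimates at the same marker distance.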
Cheng, Yu-Wei; Tan, Christopher A; Minor, Agata; Arndt, Kelly; Wysinger, Latrice; Grange, Dorothy K; Kozel, Beth A; Robin, Nathaniel H; Waggoner, Darrel; Fitzpatrick, Carrie; Das, Soma; Del Gaudio, Daniela
2014-03-01
Cornelia de Lange syndrome (CdLS) is a genetically heterogeneous disorder characterized by growth retardation, intellectual disability, upper limb abnormalities, hirsutism, and characteristic facial features. In this study we explored the occurrence of intragenic NIPBL copy number variations (CNVs) in a cohort of 510 NIPBL sequence-negative patients with suspected CdLS. Copy number analysis was performed by custom exon-targeted oligonucleotide array-comparative genomic hybridization and/or MLPA. Whole-genome SNP array was used to further characterize rearrangements extending beyond the NIPBL gene. We identified NIPBL CNVs in 13 patients (2.5%) including one intragenic duplication and a deletion in mosaic state. Breakpoint sequences in two patients provided further evidence of a microhomology-mediated replicative mechanism as a potential predominant contributor to CNVs in NIPBL. Patients for whom clinical information was available share classical CdLS features including craniofacial and limb defects. Our experience in studying the frequency of NIPBL CNVs in the largest series of patients to date widens the mutational spectrum of NIPBL and emphasizes the clinical utility of performing NIPBL deletion/duplication analysis in patients with CdLS.
Hall, Barry G
2014-01-01
SNP-association studies are a starting point for identifying genes that may be responsible for specific phenotypes, such as disease traits. The vast bulk of tools for SNP-association studies are directed toward SNPs in the human genome, and I am unaware of any tools designed specifically for such studies in bacterial or viral genomes. The PPFS (Predict Phenotypes From SNPs) package described here is an add-on to kSNP, a program that can identify SNPs in a data set of hundreds of microbial genomes. PPFS identifies those SNPs that are non-randomly associated with a phenotype based on the χ² probability, then uses those diagnostic SNPs for two distinct, but related, purposes: (1) to predict the phenotypes of strains whose phenotypes are unknown, and (2) to identify those diagnostic SNPs that are most likely to be causally related to the phenotype. In the example illustrated here, from a set of 68 E. coli genomes, for 67 of which the pathogenicity phenotype was known, there were 418,500 SNPs. Using the phenotypes of 36 of those strains, PPFS identified 207 diagnostic SNPs. The diagnostic SNPs predicted the phenotypes of all of the genomes with 97% accuracy. It then identified 97 SNPs whose probability of being causally related to the pathogenic phenotype was >0.999. In a second example, from a set of 116 E. coli genome sequences, using the phenotypes of 65 strains PPFS identified 101 SNPs that predicted the source host (human or non-human) with 90% accuracy.
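The idea of flagging a SNP as non-randomly associated with a phenotype by its χ² probability can be illustrated with a plain 2x2 contingency test. This is a generic sketch and not the PPFS implementation, whose exact test and thresholds are not given in the abstract; the function name and the strain counts below are invented for illustration.

```python
from math import erfc, sqrt


def chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (1 degree of freedom, no continuity correction).

    Table layout (counts of strains):
                      allele present   allele absent
        phenotype +         a                b
        phenotype -         c                d
    Returns (chi2 statistic, p-value).
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 0.0, 1.0  # a margin is empty, so the table carries no information
    chi2 = n * (a * d - b * c) ** 2 / margins
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Invented counts: the allele is present in 15 of 18 pathogenic strains
# and in 2 of 18 non-pathogenic strains.
chi2, p_value = chi2_2x2(15, 3, 2, 16)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
```

A SNP with a small p-value in such a test would be a candidate "diagnostic" SNP in the abstract's sense; with hundreds of thousands of SNPs, some correction for multiple testing would normally also be needed.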
Zago, V H S; Scherrer, D Z; Parra, E S; Panzoldo, N B; Alexandre, F; Nakandakare, E R; Quintão, E C R; de Faria, E C
2015-03-01
ATP binding cassette transporter G1 (ABCG1) promotes lipidation of nascent high-density lipoprotein (HDL) particles, acting as an intracellular transporter. SNP rs1893590 (c.-204A > C) of the ABCG1 gene has been previously studied and reported to affect plasma HDL-C and lipoprotein lipase activity. This study aimed to investigate the relationships of SNP rs1893590 with plasma lipids and lipoproteins in a large Brazilian population. We selected 654 asymptomatic and normolipidemic volunteers of both genders. Clinical and anthropometrical data were taken and blood samples were drawn after 12 h fasting. Plasma lipids and lipoproteins, as well as HDL particle size and volume, were determined. Genomic DNA was isolated for SNP rs1893590 detection by TaqMan® OpenArray® Real-Time PCR Platform (Applied Biosystems). The statistical tests used were Mann-Whitney U, Chi square and two-way ANOVA. No significant differences were found in the comparison analyses between the allele groups for all studied parameters. Conversely, significant interactions were observed between SNP and age over plasma HDL-C, where volunteers under 60 years with AA genotype had increased HDL-C (p = 0.048). Similar results were observed in the group with body mass index (BMI) < 25 kg/m², where volunteers with AA genotype had higher HDL-C levels (p = 0.0034), plus an increased HDL particle size (p = 0.01). These findings indicate that SNP rs1893590 of ABCG1 has a significant impact on HDL-C under asymptomatic clinical conditions in an age- and BMI-dependent way.
Penmetsa, R. V.; Dutta, S.; Kulwal, P. L.; Saxena, R. K.; Datta, S.; Sharma, T. R.; Rosen, B.; Carrasquilla-Garcia, N.; Farmer, A. D.; Dubey, A.; Saxena, K. B.; Gao, J.; Fakrudin, B.; Singh, M. N.; Singh, B. P.; Wanjari, K. B.; Yuan, M.; Srivastava, R. K.; Kilian, A.; Upadhyaya, H. D.; Mallikarjuna, N.; Town, C. D.; Bruening, G. E.; He, G.; May, G. D.; McCombie, R.; Jackson, S. A.; Singh, N. K.; Cook, D. R.
2009-01-01
Pigeonpea (Cajanus cajan), an important food legume crop in the semi-arid regions of the world and the second most important pulse crop in India, has an average crop productivity of 780 kg/ha. The relatively low crop yields may be attributed to non-availability of improved cultivars, poor crop husbandry and exposure to a number of biotic and abiotic stresses in pigeonpea growing regions. Narrow genetic diversity in cultivated germplasm has further hampered the effective utilization of conventional breeding as well as development and utilization of genomic tools, resulting in pigeonpea being often referred to as an ‘orphan crop legume’. To enable genomics-assisted breeding in this crop, the pigeonpea genomics initiative (PGI) was initiated in late 2006 with funding from Indian Council of Agricultural Research under the umbrella of Indo-US agricultural knowledge initiative, which was further expanded with financial support from the US National Science Foundation’s Plant Genome Research Program and the Generation Challenge Program. As a result of the PGI, the last 3 years have witnessed significant progress in development of both genetic as well as genomic resources in this crop through effective collaborations and coordination of genomics activities across several institutes and countries. For instance, 25 mapping populations segregating for a number of biotic and abiotic stresses have been developed or are under development. An 11X-genome coverage bacterial artificial chromosome (BAC) library comprising 69,120 clones has been developed, of which 50,000 clones were end sequenced to generate 87,590 BAC-end sequences (BESs). About 10,000 expressed sequence tags (ESTs) from Sanger sequencing and ca. 2 million short ESTs by 454/FLX sequencing have been generated. A variety of molecular markers have been developed from BESs, microsatellite or simple sequence repeat (SSR)-enriched libraries and mining of ESTs and genomic amplicon sequencing. 
Of about 21,000 SSRs identified, 6,698 SSRs are under analysis along with 670 orthologous genes using a GoldenGate SNP (single nucleotide polymorphism) genotyping platform, and large-scale SNP discovery using Solexa, a next-generation sequencing technology, is in progress. Similarly, a diversity array technology array comprising ca. 15,000 features has been developed. In addition, >600 unique nucleotide binding site (NBS) domain containing members of the NBS-leucine rich repeat disease resistance homologs were cloned in pigeonpea; 960 BACs containing these sequences were identified by filter hybridization, BES physical maps developed using high information content fingerprinting. To enrich the genomic resources further, the sequenced soybean genome is being analyzed to establish the anchor points between pigeonpea and soybean genomes. In addition, Solexa sequencing is being used to explore the feasibility of generating whole genome sequence. In summary, the collaborative efforts of several research groups under the umbrella of PGI are making significant progress in improving molecular tools in pigeonpea and should significantly benefit pigeonpea genetics and breeding. As these efforts come to fruition, and are expanded (depending on funding), pigeonpea would move from an ‘orphan legume crop’ to one where genomics-assisted breeding approaches for a sustainable crop improvement are routine. PMID:20976284
Development and Applications of a Bovine 50,000 SNP Chip
USDA-ARS's Scientific Manuscript database
To develop an Illumina iSelect high density single nucleotide polymorphism (SNP) assay for cattle, the collaborative iBMC (Illumina, USDA ARS Beltsville, University of Missouri, USDA ARS Clay Center) Consortium first performed a de novo SNP discovery project in which genomic reduced representation l...
Genome-wide association analyses for carcass quality in crossbred beef cattle
2013-01-01
Background Genetic improvement of beef quality will benefit both producers and consumers, and can be achieved by selecting animals that carry desired quantitative trait nucleotides (QTN), which result from intensive searches using genetic markers. This paper presents a genome-wide association approach utilizing single nucleotide polymorphisms (SNP) in the Illumina BovineSNP50 BeadChip to seek genomic regions that potentially harbor genes or QTN underlying variation in carcass quality of beef cattle. This study used 747 genotyped animals, mainly crossbred, with phenotypes on twelve carcass quality traits, including hot carcass weight (HCW), back fat thickness (BF), Longissimus dorsi muscle area or ribeye area (REA), marbling scores (MRB), lean yield grade by Beef Improvement Federation formulae (BIFYLD), steak tenderness by Warner-Bratzler shear force 7-day post-mortem (LM7D) as well as body composition as determined by partial rib (IMPS 103) dissection presented as a percentage of total rib weight including body cavity fat (BDFR), lean (LNR), bone (BNR), intermuscular fat (INFR), subcutaneous fat (SQFR), and total fat (TLFR). Results At the genome wide level false discovery rate (FDR < 10%), eight SNP were found significantly associated with HCW. Seven of these SNP were located on Bos taurus autosome (BTA) 6. At a less stringent significance level (P < 0.001), 520 SNP were found significantly associated with mostly individual traits (473 SNP), and multiple traits (47 SNP). Of these significant SNP, 48 were located on BTA6, and 22 of them were in association with hot carcass weight. There were 53 SNP associated with percentage of rib bone, and 12 of them were on BTA20. The rest of the significant SNP were scattered over other chromosomes. They accounted for 1.90 - 5.89% of the phenotypic variance of the traits. A region of approximately 4 Mbp long on BTA6 was found to be a potential area to harbor candidate genes influencing growth. 
One marker on BTA25 accounting for 2.67% of the variation in LM7D may be worth further investigation for the improvement of beef tenderness. Conclusion This study provides useful information to further assist the identification of chromosome regions and subsequently genes affecting carcass quality traits in beef cattle. It also revealed many SNP that acted pleiotropically to affect carcass quality. This knowledge is important in selecting subsets of SNP to improve the performance of beef cattle. PMID:24024930
Li, Feng; Chen, Biyun; Xu, Kun; Wu, Jinfeng; Song, Weilin; Bancroft, Ian; Harper, Andrea L.; Trick, Martin; Liu, Shengyi; Gao, Guizhen; Wang, Nian; Yan, Guixin; Qiao, Jiangwei; Li, Jun; Li, Hao; Xiao, Xin; Zhang, Tianyao; Wu, Xiaoming
2014-01-01
Association mapping can quickly and efficiently dissect complex agronomic traits. Rapeseed is one of the most economically important polyploid oil crops, although its genome sequence is not yet published. In this study, a recently developed 60K Brassica Infinium® SNP array was used to analyse an association panel with 472 accessions. The single-nucleotide polymorphisms (SNPs) of the array were in silico mapped using ‘pseudomolecules’ representative of the genome of rapeseed to establish their hypothetical order and to perform association mapping of seed weight and seed quality. As a result, two significant associations on A8 and C3 of Brassica napus were detected for erucic acid content, and the peak SNPs were found to be only 233 and 128 kb away from the key genes BnaA.FAE1 and BnaC.FAE1. BnaA.FAE1 was also identified to be significantly associated with the oil content. Orthologues of Arabidopsis thaliana HAG1 were identified close to four clusters of SNPs associated with glucosinolate content on A9, C2, C7 and C9. For seed weight, we detected two association signals on A7 and A9, which were consistent with previous studies of quantitative trait loci mapping. The results indicate that our association mapping approach is suitable for fine mapping of the complex traits in rapeseed. PMID:24510440
2011-01-01
Background Copy number aberrations (CNAs) are an important molecular signature in cancer initiation, development, and progression. However, these aberrations span a wide range of chromosomes, making it hard to distinguish cancer-related genes from other genes that are not closely related to cancer but are located in broadly aberrant regions. With the current availability of high-resolution data sets such as single nucleotide polymorphism (SNP) microarrays, it has become an important issue to develop a computational method to detect driver genes related to cancer development located in the focal regions of CNAs. Results In this study, we introduce a novel method referred to as the wavelet-based identification of focal genomic aberrations (WIFA). Because wavelet analysis is a multi-resolution approach, it makes it possible to effectively identify focal genomic aberrations in broadly aberrant regions. The proposed method integrates multiple cancer samples so that it enables the detection of consistent aberrations across multiple samples. We then apply this method to glioblastoma multiforme and lung cancer data sets from the SNP microarray platform. Through this process, we confirm the ability to detect previously known cancer-related genes from both cancer types with high accuracy. Also, the application of this approach to a lung cancer data set identifies focal amplification regions that contain the known oncogenes SMAD7 (chr18q21.1) and FGF10 (chr5p12), although these regions are not reported by a recent CNA detection algorithm, GISTIC. Conclusions Our results suggest that WIFA can be used to reveal cancer-related genes in various cancer data sets. PMID:21569311
Jo, Jinkwan; Purushotham, Preethi M.; Han, Koeun; Lee, Heung-Ryul; Nah, Gyoungju; Kang, Byoung-Cheorl
2017-01-01
Single nucleotide polymorphisms (SNPs) play important roles as molecular markers in plant genomics and breeding studies. Although onion (Allium cepa L.) is an important crop globally, relatively few molecular marker resources have been reported due to its large genome and high heterozygosity. Genotyping-by-sequencing (GBS) offers a greater degree of complexity reduction followed by concurrent SNP discovery and genotyping for species with complex genomes. In this study, GBS was employed for SNP mining in onion, which currently lacks a reference genome. A segregating F2 population, derived from a cross between ‘NW-001’ and ‘NW-002,’ as well as multiple parental lines were used for GBS analysis. A total of 56.15 Gbp of raw sequence data were generated and 1,851,428 SNPs were identified from the de novo assembled contigs. Stringent filtering resulted in 10,091 high-fidelity SNP markers. Robust SNPs that satisfied the segregation ratio criteria and with even distribution in the mapping population were used to construct an onion genetic map. The final map contained eight linkage groups and spanned a genetic length of 1,383 centiMorgans (cM), with an average marker interval of 8.08 cM. These robust SNPs were further analyzed using the high-throughput Fluidigm platform for marker validation. This is the first study in onion to develop genome-wide SNPs using GBS. The resulting SNP markers and developed linkage map will be valuable tools for genetic mapping of important agronomic traits and marker-assisted selection in onion breeding programs. PMID:28959273
Lomonaco, Sara; Furumoto, Emily J; Loquasto, Joseph R; Morra, Patrizia; Grassi, Ausilia; Roberts, Robert F
2015-02-01
Identification at the genus, species, and strain levels is desirable when a probiotic microorganism is added to foods. Strains of Bifidobacterium animalis ssp. lactis (BAL) are commonly used worldwide in dairy products supplemented with probiotic strains. However, strain discrimination is difficult because of the high degree of genome identity (99.975%) between different genomes of this subspecies. Typing of monomorphic species can be carried out efficiently by targeting informative single nucleotide polymorphisms (SNP). Findings from a previous study analyzing both reference and commercial strains of BAL identified SNP that could be used to discriminate common strains into 8 groups. This paper describes development of a minisequencing assay based on the primer extension reaction (PER) targeting multiple SNP that can allow strain differentiation of BAL. Based on previous data, 6 informative SNP were selected for further testing, and a multiplex preliminary PCR was optimized to amplify the DNA regions containing the selected SNP. Extension primers (EP) annealing immediately adjacent to the selected SNP were developed and tested in simplex and multiplex PER to evaluate their performance. Twenty-five strains belonging to 9 distinct genomic clusters of B. animalis ssp. lactis were selected and analyzed using the developed minisequencing assay, simultaneously targeting the 6 selected SNP. Fragment analysis was subsequently carried out in duplicate and demonstrated that the assay yielded 8 specific profiles separating the most commonly used commercial strains. This novel multiplex PER approach provides a simple, rapid, flexible SNP-based subtyping method for proper characterization and identification of commercial probiotic strains of BAL from fermented dairy products. To assess the usefulness of this method, DNA was extracted from yogurt manufactured with and without the addition of B. animalis ssp. lactis BB-12. 
Extracted DNA was then subjected to the minisequencing protocol, resulting in a SNP profile matching the profile for the strain BB-12. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
USDA-ARS's Scientific Manuscript database
A bacterial artificial chromosome (BAC) library and BAC-end sequences for Gossypium hirsutum L. have recently been developed. Here we report on genomic-based genome-wide SNP mining utilizing re-sequencing data with a BAC-end sequence reference for twelve G. hirsutum L. lines, one G. barbadense L. li...
Hatono, Saki; Nishimura, Kaori; Murakami, Yoko; Tsujimura, Mai; Yamagishi, Hiroshi
2017-09-01
The complete sequence of the mitochondrial genome was determined for two cultivars of Brassica rapa. After determining the sequence of a Chinese cabbage variety, 'Oushou hakusai', the sequence of a mizuna variety, 'Chusei shiroguki sensuji kyomizuna', was mapped against the sequence of Chinese cabbage. The precise sequences where the two varieties demonstrated variation were ascertained by direct sequencing. It was found that the mitochondrial genomes of the two varieties are identical over 219,775 bp, with a single nucleotide polymorphism (SNP) between the genomes. Because B. rapa is the maternal species of an amphidiploid crop species, Brassica juncea, the distribution of the SNP was observed both in B. rapa and B. juncea. While the mizuna type SNP was restricted mainly to cultivars of mizuna (japonica group) in B. rapa, the mizuna type was widely distributed in B. juncea. The finding that the two Brassica species have these SNP types in common suggests that the nucleotide substitution occurred in wild B. rapa before both mitotypes were domesticated. It was further inferred that the interspecific hybridization between B. rapa and B. nigra took place twice and resulted in the two mitotypes of cultivated B. juncea.
Kulbrock, Maike; Lehner, Stefanie; Metzger, Julia; Ohnesorge, Bernhard; Distl, Ottmar
2013-01-01
Equine recurrent uveitis (ERU) is a common eye disease affecting 3–15% of the horse population. A genome-wide association study (GWAS) using the Illumina equine SNP50 bead chip was performed to identify loci conferring risk for ERU. The sample included a total of 144 German warmblood horses. The GWAS showed a significant single nucleotide polymorphism (SNP) on horse chromosome (ECA) 20 at 49.3 Mb, with IL-17A and IL-17F being the closest genes. This locus explained 23% of the phenotypic variance for ERU. A GWAS taking into account the severity of ERU revealed a SNP on ECA18 near the crystallin gene cluster CRYGA-CRYGF. For both genomic regions on ECA18 and 20, significantly associated haplotypes containing the genome-wide significant SNPs could be demonstrated. In conclusion, our results are indicative of a genetic component regulating the possible critical role of IL-17A and IL-17F in the pathogenesis of ERU. The associated SNP on ECA18 may be indicative of cataract formation in the course of ERU. PMID:23977091
The effect of rare alleles on estimated genomic relationships from whole genome sequence data.
Eynard, Sonia E; Windig, Jack J; Leroy, Grégoire; van Binsbergen, Rianne; Calus, Mario P L
2015-03-12
Relationships between individuals and inbreeding coefficients are commonly used for breeding decisions, but may be affected by the type of data used for their estimation. The proportion of variants with low Minor Allele Frequency (MAF) is larger in whole genome sequence (WGS) data compared to Single Nucleotide Polymorphism (SNP) chips. Therefore, WGS data provide true relationships between individuals and may influence breeding decisions and prioritisation for conservation of genetic diversity in livestock. This study identifies differences between relationships and inbreeding coefficients estimated using pedigree, SNP or WGS data for 118 Holstein bulls from the 1000 Bull genomes project. To determine the impact of rare alleles on the estimates we compared three scenarios of MAF restrictions: variants with a MAF higher than 5%, variants with a MAF higher than 1% and variants with a MAF between 1% and 5%. We observed significant differences between estimated relationships and, although less significantly, inbreeding coefficients from pedigree, SNP or WGS data, and between MAF restriction scenarios. Computed correlations between pedigree and genomic relationships, within groups with similar relationships, ranged from negative to moderate for both estimated relationships and inbreeding coefficients, but were high between estimates from SNP and WGS (0.49 to 0.99). Estimated relationships from genomic information exhibited higher variation than from pedigree. Inbreeding coefficients analysis showed that more complete pedigree records lead to higher correlation between inbreeding coefficients from pedigree and genomic data. Finally, estimates and correlations between additive genetic (A) and genomic (G) relationship matrices were lower, and variances of the relationships were larger when accounting for allele frequencies than without accounting for allele frequencies. 
Using pedigree data or genomic information, and including or excluding variants with a MAF below 5% showed significant differences in relationship and inbreeding coefficient estimates. Estimated relationships and inbreeding coefficients are the basis for selection decisions. Therefore, it can be expected that using WGS instead of SNP can affect selection decision. Inclusion of rare variants will give access to the variation they carry, which is of interest for conservation of genetic diversity.
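The three MAF restriction scenarios described above can be illustrated with a minimal sketch. It assumes genotypes coded as 0, 1 or 2 copies of the alternate allele; the function names, the toy data, and the treatment of the 1% and 5% boundaries (exclusive lower bound, inclusive upper bound) are hypothetical choices for illustration and are not taken from the study.

```python
def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0, 1 or 2 copies of the alternate allele."""
    called = [g for g in genotypes if g is not None]  # None marks a missing call
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def select_variants(genotype_columns, lower, upper=0.5):
    """Return indices of variants with lower < MAF <= upper (assumed bounds)."""
    return [
        idx
        for idx, column in enumerate(genotype_columns)
        if lower < minor_allele_frequency(column) <= upper
    ]


# Hypothetical toy data: each inner list is one variant across 10 animals.
variants = [
    [0, 1, 2, 1, 0, 1, 2, 1, 0, 1],  # common variant, MAF 0.45
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],  # low-frequency variant, MAF 0.05
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # monomorphic, MAF 0.0
]

scenario_common = select_variants(variants, lower=0.05)            # MAF > 5%
scenario_all = select_variants(variants, lower=0.01)               # MAF > 1%
scenario_rare = select_variants(variants, lower=0.01, upper=0.05)  # 1% to 5%
```

Each scenario yields a different variant subset, and relationship or inbreeding estimates would then be computed from that subset only.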
Lu, Timothy Tehua; Lao, Oscar; Nothnagel, Michael; Junge, Olaf; Freitag-Wolf, Sandra; Caliebe, Amke; Balascakova, Miroslava; Bertranpetit, Jaume; Bindoff, Laurence Albert; Comas, David; Holmlund, Gunilla; Kouvatsi, Anastasia; Macek, Milan; Mollet, Isabelle; Nielsen, Finn; Parson, Walther; Palo, Jukka; Ploski, Rafal; Sajantila, Antti; Tagliabracci, Adriano; Gether, Ulrik; Werge, Thomas; Rivadeneira, Fernando; Hofman, Albert; Uitterlinden, André Gerardus; Gieger, Christian; Wichmann, Heinz-Erich; Ruether, Andreas; Schreiber, Stefan; Becker, Christian; Nürnberg, Peter; Nelson, Matthew Roberts; Kayser, Manfred; Krawczak, Michael
2009-07-01
Genetic matching potentially provides a means to alleviate the effects of incomplete Mendelian randomization in population-based gene-disease association studies. We therefore evaluated the genetic-matched pair study design on the basis of genome-wide SNP data (309,790 markers; Affymetrix GeneChip Human Mapping 500K Array) from 2457 individuals, sampled at 23 different recruitment sites across Europe. Using pair-wise identity-by-state (IBS) as a matching criterion, we tried to derive a subset of markers that would allow identification of the best overall matching (BOM) partner for a given individual, based on the IBS status for the subset alone. However, our results suggest that, by following this approach, the prediction accuracy is only notably improved by the first 20 markers selected, and increases proportionally to the marker number thereafter. Furthermore, in a considerable proportion of cases (76.0%), the BOM of a given individual, based on the complete marker set, came from a different recruitment site than the individual itself. A second marker set, specifically selected for ancestry sensitivity using singular value decomposition, performed even more poorly and was no more capable of predicting the BOM than randomly chosen subsets. This leads us to conclude that, at least in Europe, the utility of the genetic-matched pair study design depends critically on the availability of comprehensive genotype information for both cases and controls.
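The matching criterion above can be sketched as follows. This is a minimal illustration, assuming genotypes coded as 0, 1 or 2 copies of one allele and IBS defined as the mean proportion of shared alleles per marker; the function names and sample data are hypothetical and do not reproduce the study's implementation.

```python
def ibs_similarity(genotypes_a, genotypes_b):
    """Mean proportion of alleles shared identical-by-state over all markers."""
    shared = [2 - abs(a - b) for a, b in zip(genotypes_a, genotypes_b)]
    return sum(shared) / (2 * len(shared))


def best_overall_match(target, individuals):
    """Return the id of the individual with the highest IBS to `target`."""
    candidates = [name for name in individuals if name != target]
    return max(
        candidates,
        key=lambda name: ibs_similarity(individuals[target], individuals[name]),
    )


# Hypothetical genotypes at five markers for three individuals.
individuals = {
    "ind_A": [0, 1, 2, 1, 0],
    "ind_B": [0, 1, 2, 0, 0],
    "ind_C": [2, 1, 0, 1, 2],
}
partner = best_overall_match("ind_A", individuals)
```

Restricting the genotype lists to a marker subset and checking whether `partner` is unchanged mirrors, in miniature, the question of whether a subset can predict the best overall match.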
Narrowing the wingless-2 mutation to a 227 kb candidate region on chicken chromosome 12
Webb, A E; Youngworth, I A; Kaya, M; Gitter, C L; O’Hare, E A; May, B; Cheng, H H; Delany, M E
2018-01-01
ABSTRACT Wingless-2 (wg-2) is an autosomal recessive mutation in chicken that results in an embryonic lethal condition. Affected individuals exhibit a multisystem syndrome characterized by absent wings, truncated legs, and craniofacial, kidney, and feather malformations. Previously, work focused on phenotype description, establishing the autosomal recessive pattern of Mendelian inheritance and placing the mutation on an inbred genetic background to create the congenic line UCD Wingless-2.331. The research described in this paper employed the complementary tools of breeding, genetics, and genomics to map the chromosomal location of the mutation and successively narrow the size of the region for analysis of the causative element. Specifically, the wg-2 mutation was initially mapped to a 7 Mb region of chromosome 12 using an Illumina 3 K SNP array. Subsequent SNP genotyping and exon sequencing combined with analysis from improved genome assemblies narrowed the region of interest to a maximum size of 227 kb. Within this region, 3 validated and 3 predicted candidate genes are found, and these are described. The wg-2 mutation is a valuable resource to contribute to an improved understanding of the developmental pathways involved in chicken and avian limb development as well as serving as a model for human development, as the resulting syndrome shares features with human congenital disorders. PMID:29562287
2013-01-01
Background The apparent effect of a single nucleotide polymorphism (SNP) on phenotype depends on the linkage disequilibrium (LD) between the SNP and a quantitative trait locus (QTL). However, the phase of LD between a SNP and a QTL may differ between Bos indicus and Bos taurus because they diverged at least one hundred thousand years ago. Here, we test the hypothesis that the apparent effect of a SNP on a quantitative trait depends on whether the SNP allele is inherited from a Bos taurus or Bos indicus ancestor. Methods Phenotype data on one or more traits and SNP genotype data for 10 181 cattle from Bos taurus, Bos indicus and composite breeds were used. All animals had genotypes for 729 068 SNPs (real or imputed). Chromosome segments were classified as originating from B. indicus or B. taurus on the basis of the haplotype of SNP alleles they contained. Consequently, SNP alleles were classified according to their sub-species origin. Three models were used for the association study: (1) conventional GWAS (genome-wide association study), fitting a single SNP effect regardless of subspecies origin, (2) interaction GWAS, fitting an interaction between SNP and subspecies-origin, and (3) best variable GWAS, fitting the most significant combination of SNP and sub-species origin. Results Fitting an interaction between SNP and subspecies origin resulted in more significant SNPs (i.e. more power) than a conventional GWAS. Thus, the effect of a SNP depends on the subspecies that the allele originates from. Also, most QTL segregated in only one subspecies, suggesting that many mutations that affect the traits studied occurred after divergence of the subspecies or the mutation became fixed or was lost in one of the subspecies. Conclusions The results imply that GWAS and genomic selection could gain power by distinguishing SNP alleles based on their subspecies origin, and that only few QTL segregate in both B. indicus and B. taurus cattle. 
Thus, the QTL that segregate in current populations likely resulted from mutations that occurred in one of the subspecies and can have both positive and negative effects on the traits. There was no evidence that selection has increased the frequency of alleles that increase body weight. PMID:24168700
High-resolution genetic maps of Eucalyptus improve Eucalyptus grandis genome assembly.
Bartholomé, Jérôme; Mandrou, Eric; Mabiala, André; Jenkins, Jerry; Nabihoudine, Ibouniyamine; Klopp, Christophe; Schmutz, Jeremy; Plomion, Christophe; Gion, Jean-Marc
2015-06-01
Genetic maps are key tools in genetic research as they constitute the framework for many applications, such as quantitative trait locus analysis, and support the assembly of genome sequences. The resequencing of the two parents of a cross between Eucalyptus urophylla and Eucalyptus grandis was used to design a single nucleotide polymorphism (SNP) array of 6000 markers evenly distributed along the E. grandis genome. The genotyping of 1025 offspring enabled the construction of two high-resolution genetic maps containing 1832 and 1773 markers with an average marker interval of 0.45 and 0.5 cM for E. grandis and E. urophylla, respectively. The comparison between genetic maps and the reference genome highlighted 85% of collinear regions. A total of 43 noncollinear regions and 13 nonsyntenic regions were detected and corrected in the new genome assembly. This improved version contains 4943 scaffolds totalling 691.3 Mb of which 88.6% were captured by the 11 chromosomes. The mapping data were also used to investigate the effect of population size and number of markers on linkage mapping accuracy. This study provides the most reliable linkage maps for Eucalyptus and version 2.0 of the E. grandis genome. © 2014 CIRAD. New Phytologist © 2014 New Phytologist Trust.
Genome-wide association study of acute post-surgical pain in humans
Kim, Hyungsuk; Ramsay, Edward; Lee, Hyewon; Wahl, Sharon; Dionne, Raymond A
2009-01-01
Aims Testing a relatively small genomic region with a few hundred SNPs provides limited information. Genome-wide association studies (GWAS) provide an opportunity to overcome the limitation of candidate gene association studies. Here, we report the results of a GWAS for the responses to an NSAID analgesic. Materials & methods European Americans (60 females and 52 males) undergoing oral surgery were genotyped with the Affymetrix 500K SNP assay. Additional SNP genotyping was performed from the gene in linkage disequilibrium with the candidate SNP revealed by the GWAS. Results GWAS revealed a candidate SNP (rs2562456) associated with analgesic onset, which is in linkage disequilibrium with a gene encoding a zinc finger protein. Additional SNP genotyping of ZNF429 confirmed the association with analgesic onset in humans (p = 1.8 × 10^-10, degrees of freedom = 103, F = 28.3). We also found candidate loci for the maximum post-operative pain rating (rs17122021, p = 6.9 × 10^-7) and post-operative pain onset time (rs6693882, p = 2.1 × 10^-6); however, these genetic associations were not sustained after correcting for multiple comparisons. Conclusion GWAS for acute clinical pain followed by additional SNP genotyping of a neighboring gene suggests that genetic variations in or near the loci encoding DNA binding proteins play a role in the individual variations in responses to analgesic drugs. PMID:19207018
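To see why some of the reported associations may not survive multiple-comparison correction, a worked Bonferroni check is sketched below. The abstract does not state which correction was used or how many tests were performed, so both the Bonferroni method and the figure of 500,000 tests (the nominal size of the array) are assumptions made only for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumption: about 500,000 SNPs tested (nominal array size, not from the abstract).
N_TESTS = 500_000
threshold = bonferroni_threshold(0.05, N_TESTS)

# P values as reported in the abstract.
reported = {
    "ZNF429 follow-up genotyping": 1.8e-10,
    "rs17122021": 6.9e-7,
    "rs6693882": 2.1e-6,
}
survives = {label: p < threshold for label, p in reported.items()}
```

Under these assumptions the threshold is about 1 × 10^-7, so only the ZNF429 result falls below it, which is consistent with the pattern the abstract describes.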
Roses, A D
2001-10-01
Pharmacogenetics is the variability of drug response due to inherited characteristics in individuals. Drug metabolizing enzymes have been studied for decades, first as chemical reactions and, more recently, as specific polymorphisms of known molecules. With the availability of whole-genome single-nucleotide polymorphism (SNP) maps, it will soon be possible to create an SNP profile for patients who experience adverse events (AEs) or who respond clinically to the medicine (efficacy). Proof-of-principle experiments have demonstrated that high density SNP maps in chromosomal regions of genetic linkage facilitate the identification of susceptibility disease genes. Whole-genome SNP mapping analyses aimed at determining linkage disequilibrium (LD) profiles along an ordered human genome backbone are in progress. SNP 'fingerprints' or SNP PRINTs(sm) will be used to identify patients at greater risk of an AE, or those patients with a greater chance of responding to a medicine. As LD maps for various ethnic populations are constructed, the number of SNPs necessary to measure for an individual will decrease. Standardized pharmacogenetic maps for drug registration and post-marketing surveillance will result in safer, more effective and more cost-efficient medicines. The timing of these pharmacogenetic applications will occur over the next 5 years. In contrast, the benefits of pharmacogenomic applications such as the identification of new tractable targets will not be visible as new medicines for 7-12 years, due to the lengthy drug development and registration processes.
SNP ID-info: SNP ID searching and visualization platform.
Yang, Cheng-Hong; Chuang, Li-Yeh; Cheng, Yu-Huei; Wen, Cheng-Hao; Chang, Phei-Lang; Chang, Hsueh-Wei
2008-09-01
Many association studies report relationships between single nucleotide polymorphisms (SNPs) and diseases, including cancers, without giving a SNP ID. Here, we developed the SNP ID-info freeware to retrieve SNP IDs from input genetic and physical information about genomes. The program provides an "SNP-ePCR" function that generates the full sequence from primer and template inputs. In "SNPosition," a sequence from SNP-ePCR or direct input is matched against SNP fasta sequences to find the SNP IDs. The "SNP search" and "SNP fasta" functions accept cytogenetic band, contig position, and keyword inputs for retrieving SNP information. Finally, the neighboring environment of each SNP ID is fully visualized in order of contig position and marked with SNP and flanking hits. The SNP identification problems inherent in NCBI SNP BLAST are also avoided. In conclusion, SNP ID-info provides a visualized SNP ID environment for multiple inputs and assists systematic SNP association studies. The server and user manual are available at http://bio.kuas.edu.tw/snpid-info.
Re-Ranking Sequencing Variants in the Post-GWAS Era for Accurate Causal Variant Identification
Faye, Laura L.; Machiela, Mitchell J.; Kraft, Peter; Bull, Shelley B.; Sun, Lei
2013-01-01
Next generation sequencing has dramatically increased our ability to localize disease-causing variants by providing base-pair level information at costs increasingly feasible for the large sample sizes required to detect complex-trait associations. Yet, identification of causal variants within an established region of association remains a challenge. Counter-intuitively, certain factors that increase power to detect an associated region can decrease power to localize the causal variant. First, combining GWAS with imputation or low coverage sequencing to achieve the large sample sizes required for high power can have the unintended effect of producing differential genotyping error among SNPs. This tends to bias the relative evidence for association toward better genotyped SNPs. Second, re-use of GWAS data for fine-mapping exploits previous findings to ensure genome-wide significance in GWAS-associated regions. However, using GWAS findings to inform fine-mapping analysis can bias evidence away from the causal SNP toward the tag SNP and SNPs in high LD with the tag. Together these factors can reduce power to localize the causal SNP by more than half. Other strategies commonly employed to increase power to detect association, namely increasing sample size and using higher density genotyping arrays, can, in certain common scenarios, actually exacerbate these effects and further decrease power to localize causal variants. We develop a re-ranking procedure that accounts for these adverse effects and substantially improves the accuracy of causal SNP identification, often doubling the probability that the causal SNP is top-ranked. Application to the NCI BPC3 aggressive prostate cancer GWAS with imputation meta-analysis identified a new top SNP at 2 of 3 associated loci and several additional possible causal SNPs at these loci that may have otherwise been overlooked. This method is simple to implement using R scripts provided on the author's website. PMID:23950724
McIntosh, Laura A; Marion, Miranda C; Sudman, Marc; Comeau, Mary E; Becker, Mara L; Bohnsack, John F; Fingerlin, Tasha E; Griffin, Thomas A; Haas, J Peter; Lovell, Daniel J; Maier, Lisa A; Nigrovic, Peter A; Prahalad, Sampath; Punaro, Marilynn; Rosé, Carlos D; Wallace, Carol A; Wise, Carol A; Moncrieffe, Halima; Howard, Timothy D; Langefeld, Carl D; Thompson, Susan D
2017-11-01
Juvenile idiopathic arthritis (JIA) is the most common childhood rheumatic disease and has a strong genomic component. To date, JIA genetic association studies have had limited sample sizes, used heterogeneous patient populations, or included only candidate regions. The aim of this study was to identify new associations between JIA patients with oligoarticular disease and those with IgM rheumatoid factor (RF)-negative polyarticular disease, which are clinically similar and the most prevalent JIA disease subtypes. Three cohorts comprising 2,751 patients with oligoarticular or RF-negative polyarticular JIA were genotyped using the Affymetrix Genome-Wide SNP Array 6.0 or the Illumina HumanCoreExome-12+ Array. Overall, 15,886 local and out-of-study controls, typed on these platforms or the Illumina HumanOmni2.5, were used for association analyses. High-quality single-nucleotide polymorphisms (SNPs) were used for imputation to 1000 Genomes prior to SNP association analysis. Meta-analysis showed evidence of association (P < 1 × 10^-6) at 9 regions: PRR9_LOR (P = 5.12 × 10^-8), ILDR1_CD86 (P = 6.73 × 10^-8), WDFY4 (P = 1.79 × 10^-7), PTH1R (P = 1.87 × 10^-7), RNF215 (P = 3.09 × 10^-7), AHI1_LINC00271 (P = 3.48 × 10^-7), JAK1 (P = 4.18 × 10^-7), LINC00951 (P = 5.80 × 10^-7), and HBP1 (P = 7.29 × 10^-7). Of these, PRR9_LOR, ILDR1_CD86, RNF215, LINC00951, and HBP1 were shown, for the first time, to be autoimmune disease susceptibility loci. Furthermore, associated SNPs included cis expression quantitative trait loci for WDFY4, CCDC12, MTP18, SF3A1, AHI1, COG5, HBP1, and GPR22. This study provides evidence of both unique JIA risk loci and risk loci overlapping between JIA and other autoimmune diseases. These newly associated SNPs are shown to influence gene expression, and their bounding regions tie into molecular pathways of immunologic relevance. 
Thus, they likely represent regions that contribute to the pathology of oligoarticular JIA and RF-negative polyarticular JIA. © 2017, American College of Rheumatology.
A set of 14 DIP-SNP markers to detect unbalanced DNA mixtures.
Liu, Zhizhen; Liu, Jinding; Wang, Jiaqi; Chen, Deqing; Liu, Zidong; Shi, Jie; Li, Zeqin; Li, Wenyan; Zhang, Gengqian; Du, Bing
2018-03-04
Unbalanced DNA mixtures remain a difficult problem in forensic practice. DIP-STRs are useful markers for detecting minor DNA, but they are not widespread in the human genome and have long amplicons. In this study, we propose a novel type of genetic marker, termed DIP-SNP. A DIP-SNP is the combination of an INDEL and a SNP within less than 300 bp of the human genome. Multiplex PCR and SNaPshot assays were established for 14 DIP-SNP markers in a Chinese Han population from Shanxi, China. This novel compound marker allows detection of the minor DNA contributor with sensitivity from 1:50 to 1:1000 in a DNA mixture of any gender with 1 ng-10 ng DNA template. Most of the DIP-SNP markers had a relatively high probability of informative alleles, with an average I value of 0.33. In all, we propose DIP-SNP as a novel kind of genetic marker for detecting the minor contributor in unbalanced DNA mixtures and established a detection method that combines multiplex PCR with the SNaPshot assay. DIP-SNP polymorphisms are promising markers for forensic or clinical mixture examination because they are shorter, more widespread and more sensitive. Copyright © 2018 Elsevier Inc. All rights reserved.
Kumar, Sunil; Ambrosini, Giovanna; Bucher, Philipp
2017-01-04
SNP2TFBS is a computational resource intended to support researchers investigating the molecular mechanisms underlying regulatory variation in the human genome. The database essentially consists of a collection of text files providing specific annotations for human single nucleotide polymorphisms (SNPs), namely whether they are predicted to abolish, create or change the affinity of one or several transcription factor (TF) binding sites. A SNP's effect on TF binding is estimated based on a position weight matrix (PWM) model for the binding specificity of the corresponding factor. These data files are regenerated at regular intervals by an automatic procedure that takes as input a reference genome, a comprehensive SNP catalogue and a collection of PWMs. SNP2TFBS is also accessible over a web interface, enabling users to view the information provided for an individual SNP, to extract SNPs based on various search criteria, to annotate uploaded sets of SNPs or to display statistics about the frequencies of binding sites affected by selected SNPs. Homepage: http://ccg.vital-it.ch/snp2tfbs/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
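The PWM-based estimate of a SNP's effect on TF binding can be sketched as below. This is an illustrative toy only: the matrix values, the function names, the forward-strand-only scan, and the use of the best-scoring overlapping window are all assumptions, and the sketch does not reproduce the scoring procedure or cut-offs used by SNP2TFBS.

```python
# Hypothetical log-odds PWM for a 4-bp motif; each row is one motif position.
PWM = [
    {"A": 1.2, "C": -1.0, "G": -1.0, "T": -0.5},
    {"A": -1.0, "C": 1.5, "G": -0.8, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.4, "T": -0.7},
    {"A": -0.6, "C": -1.0, "G": -1.0, "T": 1.3},
]


def best_score(sequence, snp_pos, pwm):
    """Best PWM score over all forward-strand windows overlapping `snp_pos`."""
    width = len(pwm)
    first = max(0, snp_pos - width + 1)
    last = min(snp_pos, len(sequence) - width)
    return max(
        sum(pwm[i][sequence[start + i]] for i in range(width))
        for start in range(first, last + 1)
    )


def snp_effect(sequence, snp_pos, alt_allele, pwm):
    """Score change (alternate minus reference) for a single-base substitution."""
    alt_sequence = sequence[:snp_pos] + alt_allele + sequence[snp_pos + 1:]
    return best_score(alt_sequence, snp_pos, pwm) - best_score(sequence, snp_pos, pwm)


reference = "TTACGTAA"  # contains the toy consensus ACGT at positions 2-5 (0-based)
delta = snp_effect(reference, snp_pos=3, alt_allele="T", pwm=PWM)
```

A negative `delta` indicates that the alternate allele weakens the best predicted site; a real analysis would also scan the reverse strand and apply a score threshold to decide whether a site is created or abolished.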
A Bayesian antedependence model for whole genome prediction.
Yang, Wenzhao; Tempelman, Robert J
2012-04-01
Hierarchical mixed effects models have been demonstrated to be powerful for predicting genomic merit of livestock and plants, on the basis of high-density single-nucleotide polymorphism (SNP) marker panels, and their use is being increasingly advocated for genomic predictions in human health. Two particularly popular approaches, labeled BayesA and BayesB, are based on specifying all SNP-associated effects to be independent of each other. BayesB extends BayesA by allowing a large proportion of SNP markers to be associated with null effects. We further extend these two models to specify SNP effects as being spatially correlated due to the chromosomally proximal effects of causal variants. These two models, which we dub ante-BayesA and ante-BayesB, respectively, are based on a first-order nonstationary antedependence specification between SNP effects. In a simulation study involving 20 replicate data sets, each analyzed at six different SNP marker densities with average LD levels ranging from r^2 = 0.15 to 0.31, the antedependence methods had significantly (P < 0.01) higher accuracies than their corresponding classical counterparts at higher LD levels (r^2 > 0.24) with differences exceeding 3%. A cross-validation study was also conducted on the heterogeneous stock mice data resource (http://mus.well.ox.ac.uk/mouse/HS/) using 6-week body weights as the phenotype. The antedependence methods increased cross-validation prediction accuracies by up to 3.6% compared to their classical counterparts (P < 0.001). Finally, we applied our method to other benchmark data sets and demonstrated that the antedependence methods were more accurate than their classical counterparts for genomic predictions, even for individuals several generations beyond the training data.
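A first-order antedependence structure can be sketched with a simple recursion in which each SNP effect depends on the effect of the preceding SNP plus its own innovation. The code below is a minimal deterministic illustration under that generic definition; the function name, variable names, and numeric values are hypothetical, and it omits the prior distributions and sampling scheme of the actual ante-BayesA and ante-BayesB models.

```python
def antedependent_effects(innovations, dependence):
    """Build SNP effects g[j] = dependence[j] * g[j-1] + innovations[j].

    dependence[0] is ignored because the first SNP has no predecessor.
    """
    effects = []
    previous = 0.0
    for j, delta in enumerate(innovations):
        current = delta if j == 0 else dependence[j] * previous + delta
        effects.append(current)
        previous = current
    return effects


# Hypothetical values: a single nonzero innovation at the second SNP.
innovations = [0.0, 1.0, 0.0, 0.0]
dependence = [0.0, 0.5, 0.5, 0.5]  # may differ per SNP (nonstationary); constant here
effects = antedependent_effects(innovations, dependence)

# With every dependence parameter at zero, the effects equal the innovations.
independent = antedependent_effects(innovations, [0.0] * 4)
```

In this toy setting the signal at the second SNP decays across its neighbours, which is the kind of spatial correlation between adjacent SNP effects that the abstract describes; setting the dependence parameters to zero recovers independent effects in this sketch.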
Impact of SNPs on Protein Phosphorylation Status in Rice (Oryza sativa L.).
Lin, Shoukai; Chen, Lijuan; Tao, Huan; Huang, Jian; Xu, Chaoqun; Li, Lin; Ma, Shiwei; Tian, Tian; Liu, Wei; Xue, Lichun; Ai, Yufang; He, Huaqin
2016-11-11
Single nucleotide polymorphisms (SNPs) are widely used in functional genomics and genetics research. The high-quality sequence of the rice genome has provided a genome-wide SNP and proteome resource. However, the impact of SNPs on protein phosphorylation status in rice is not fully understood. In this paper, we first updated the rice SNP resource based on the new rice genome Ver. 7.0, then systematically analyzed the potential impact of non-synonymous SNPs (nsSNPs) on protein phosphorylation status. There were 3,897,312 SNPs in the Ver. 7.0 rice genome, of which 9.9% were nsSNPs, and a total of 2,508,261 phosphorylated sites were predicted in the rice proteome. Interestingly, we observed that 150,197 (39.1%) nsSNPs could influence protein phosphorylation status, among which 52.2% might induce changes of protein kinase (PK) types for adjacent phosphorylation sites. We constructed a database, SNP_rice, to deposit the updated rice SNP resource and phosSNPs information. It is freely available to academic researchers at http://bioinformatics.fafu.edu.cn. As a case study, we detected five nsSNPs that potentially influence the phosphorylation status of heterotrimeric G proteins in rice, indicating that genetic polymorphisms affect signal transduction by influencing the phosphorylation status of heterotrimeric G proteins. The results of this work could be a useful resource for future experimental identification and provide interesting information for better rice breeding.
Selection and Management of DNA Markers for Use in Genomic Evaluation
USDA-ARS's Scientific Manuscript database
A database was constructed to store genotypes for 50,972 single-nucleotide polymorphisms (SNP) from the Illumina BovineSNP50 BeadChip for over 30,000 animals. The database allows storage of multiple samples per animal and stores all SNP genotypes for a sample in a single row. An indicator specifies ...
A Coordinated Approach to Peach SNP Discovery in RosBREED
USDA-ARS's Scientific Manuscript database
In the USDA-funded multi-institutional and trans-disciplinary project, “RosBREED”, crop-specific SNP genome scan platforms are being developed for peach, apple, strawberry, and cherry at a resolution of at least one polymorphic SNP marker every 5 cM in any random cross, for use in Pedigree-Based Ana...
Al-Absi, Boshra; Razif, Muhammad F M; Noor, Suzita M; Saif-Ali, Riyadh; Aqlan, Mohammed; Salem, Sameer D; Ahmed, Radwan H; Muniandy, Sekaran
2017-10-01
Genome-wide and candidate gene association studies have previously revealed links between a predisposition to acute lymphoblastic leukemia (ALL) and genetic polymorphisms in the following genes: IKZF1 (7p12.2; ID: 10320), DDC (7p12.2; ID: 1644), CDKN2A (9p21.3; ID: 1029), CEBPE (14q11.2; ID: 1053), and LMO1 (11p15; ID: 4004). In this study, we aimed to conduct an investigation into the possible association between polymorphisms in these genes and ALL within a sample of Yemeni children of Arab-Asian descent. Seven single-nucleotide polymorphisms (SNPs) in IKZF1, three SNPs in DDC, two SNPs in CDKN2A, two SNPs in CEBPE, and three SNPs in LMO1 were genotyped in 289 Yemeni children (136 cases and 153 controls), using the nanofluidic Dynamic Array (Fluidigm 192.24 Dynamic Array). Logistic regression analyses were used to estimate ALL risk, and the strength of association was expressed as odds ratios with 95% confidence intervals. We found that the IKZF1 SNP rs10235796 C allele (p = 0.002), the IKZF1 rs6964969 A>G polymorphism (p = 0.048, GG vs. AA), the CDKN2A rs3731246 G>C polymorphism (p = 0.047, GC+CC vs. GG), and the CDKN2A SNP rs3731246 C allele (p = 0.007) were significantly associated with ALL in Yemenis of Arab-Asian descent. In addition, a borderline association was found between IKZF1 rs4132601 T>G variant and ALL risk. No associations were found between the IKZF1 SNPs (rs11978267; rs7789635), DDC SNPs (rs3779084; rs880028; rs7809758), CDKN2A SNP (rs3731217), the CEBPE SNPs (rs2239633; rs12434881) and LMO1 SNPs (rs442264; rs3794012; rs4237770) with ALL in Yemeni children. The IKZF1 SNPs, rs10235796 and rs6964969, and the CDKN2A SNP rs3731246 (previously unreported) could serve as risk markers for ALL susceptibility in Yemeni children.
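The odds ratios with 95% confidence intervals mentioned in this abstract can be illustrated with a small sketch. For a single binary predictor without covariates, the odds ratio from logistic regression equals the cross-product ratio of the 2x2 table, so the sketch uses the table directly with a Wald interval on the log scale. The counts in the usage line are hypothetical and are not data from the study, and the function name is illustrative.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: cases carrying the allele      b: cases not carrying it
    c: controls carrying the allele   d: controls not carrying it
    z: normal quantile (1.96 gives an approximate 95% interval)
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    estimate, lower, upper = odds_ratio_ci(20, 10, 10, 20)
    print(f"OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1 corresponds to a nominally significant association at the chosen level. Adjusting for covariates would require fitting the regression model itself.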
Song, H; Li, L; Ma, P; Zhang, S; Su, G; Lund, M S; Zhang, Q; Ding, X
2018-06-01
This study investigated the efficiency of genomic prediction when adding markers identified by a genome-wide association study (GWAS), using a data set of high-density (HD) markers imputed from 54K markers in Chinese Holsteins. Among 3,056 Chinese Holsteins with imputed HD data, 2,401 individuals born before October 1, 2009, were used for GWAS and as a reference population for genomic prediction, and the 220 younger cows were used as a validation population. In total, 1,403, 1,536, and 1,383 significant single nucleotide polymorphisms (SNP; false discovery rate at 0.05) associated with conformation final score, mammary system, and feet and legs were identified, respectively. About 2 to 3% of the genetic variance of the 3 traits was explained by these significant SNP. Only a very small proportion of the significant SNP identified by GWAS was included in the 54K marker panel. Three new marker sets (54K+) were therefore produced by adding the significant SNP obtained by a linear mixed model for each trait to the 54K marker panel. Genomic breeding values were predicted using a Bayesian variable selection (BVS) model. The accuracies of genomic breeding values by BVS based on the 54K+ data were 2.0 to 5.2% higher than those based on the 54K data. The imputed HD markers yielded 1.4% higher accuracy on average (BVS) than the 54K data. Both the 54K+ and HD data generated lower bias of genomic prediction, and the 54K+ data yielded the lowest bias in all situations. Our results show that the imputed HD data were not very useful for improving the accuracy of genomic prediction and that adding the significant markers derived from the imputed HD marker panel could improve the accuracy and decrease the bias of genomic prediction. Copyright © 2018 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Candidate loci involved in domestication and improvement detected by a published 90K wheat SNP array
Gao, Lifeng; Zhao, Guangyao; Huang, Dawei; Jia, Jizeng
2017-01-01
Selection is one of the most important forces in crop evolution. Common wheat is a major world food crop and a typical allopolyploid with a huge and complex genome. We applied four approaches to detect loci selected in wheat during domestication and improvement. A total of 7,984 candidate loci were detected, accounting for 23.3% of all 34,317 SNPs analysed, a much higher proportion than estimated in previous reports. We constructed a first generation wheat selection map which revealed the following new insights on genome-wide selection: (1) diversifying selection acted by increasing, decreasing or not affecting gene frequencies; (2) the number of loci under selection during domestication was much higher than that during improvement; (3) the contribution to wheat improvement by the D sub-genome was relatively small due to the bottleneck of hexaploidisation and diversity can be expanded by using synthetic wheat and introgression lines; and (4) clustered selection regions occur throughout the wheat genome, including the centromere regions. This study will not only help future wheat breeding and evolutionary studies, but will also accelerate study of other crops, especially polyploids. PMID:28327671
Arenillas, Leonor; Mallo, Mar; Ramos, Fernando; Guinta, Kathryn; Barragán, Eva; Lumbreras, Eva; Larráyoz, María-José; De Paz, Raquel; Tormo, Mar; Abáigar, María; Pedro, Carme; Cervera, José; Such, Esperanza; José Calasanz, María; Díez-Campelo, María; Sanz, Guillermo F; Hernández, Jesús María; Luño, Elisa; Saumell, Sílvia; Maciejewski, Jaroslaw; Florensa, Lourdes; Solé, Francesc
2013-12-01
Cytogenetic aberrations identified by metaphase cytogenetics (MC) have diagnostic, prognostic, and therapeutic implications in myelodysplastic syndromes (MDS). However, in some MDS patients the MC study is unsuccessful. Single nucleotide polymorphism array (SNP-A) based karyotyping could be helpful in these cases. We performed SNP-A in 62 samples from bone marrow or peripheral blood of primary MDS with an unsuccessful MC study. SNP-A analysis enabled the detection of aberrations in 31 (50%) patients. We used the copy number alteration information to apply the International Prognostic Scoring System (IPSS) and we observed differences in survival between the low/intermediate-1 and intermediate-2/high risk patients. We also saw differences in survival between the very low/low/intermediate and the high/very high patients when we applied the revised IPSS (IPSS-R). In conclusion, SNP-A can be used successfully in PB samples, and the identification of CNA by SNP-A improves the diagnostic and prognostic evaluation of this group of MDS patients. Copyright © 2013 Wiley Periodicals, Inc.
BlueSNP: R package for highly scalable genome-wide association studies using Hadoop clusters.
Huang, Hailiang; Tata, Sandeep; Prill, Robert J
2013-01-01
Computational workloads for genome-wide association studies (GWAS) are growing in scale and complexity, outpacing the capabilities of single-threaded software designed for personal computers. The BlueSNP R package implements GWAS statistical tests in the R programming language and executes the calculations across computer clusters configured with Apache Hadoop, a de facto standard framework for distributed data processing using the MapReduce formalism. BlueSNP makes computationally intensive analyses, such as estimating empirical p-values via data permutation and searching for expression quantitative trait loci over thousands of genes, feasible for large genotype-phenotype datasets. http://github.com/ibm-bioinformatics/bluesnp
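Estimating an empirical p-value by permutation, one of the analyses named in this abstract, can be sketched on a single machine as follows. This is not the BlueSNP API and involves neither R nor Hadoop. The test statistic (absolute difference in mean phenotype between carriers and non-carriers), the function names, and the toy data are all assumptions made for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def carrier_mean_difference(genotypes, phenotypes):
    """Absolute difference in mean phenotype between carriers (genotype > 0)
    and non-carriers (genotype == 0). Both groups must be non-empty."""
    carriers = [p for g, p in zip(genotypes, phenotypes) if g > 0]
    others = [p for g, p in zip(genotypes, phenotypes) if g == 0]
    return abs(mean(carriers) - mean(others))


def empirical_p_value(genotypes, phenotypes, n_perm=1000, seed=0):
    """Permutation p-value: shuffle phenotypes to break any genotype link,
    then count how often the shuffled statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = carrier_mean_difference(genotypes, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if carrier_mean_difference(genotypes, shuffled) >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    genotypes = [0] * 10 + [1] * 10
    phenotypes = [0.0] * 10 + [1.0] * 10  # toy data with a perfect split
    print(empirical_p_value(genotypes, phenotypes))
```

Each permutation is independent of the others, which is why this kind of workload lends itself to being spread across a cluster.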
Genome-wide Association Mapping of Qualitatively Inherited Traits in a Germplasm Collection.
Bandillo, Nonoy B; Lorenz, Aaron J; Graef, George L; Jarquin, Diego; Hyten, David L; Nelson, Randall L; Specht, James E
2017-07-01
Genome-wide association (GWA) has been used as a tool for dissecting the genetic architecture of quantitatively inherited traits. We demonstrate here that GWA can also be highly useful for detecting many major genes governing categorically defined phenotype variants that exist for qualitatively inherited traits in a germplasm collection. Genome-wide association mapping was applied to categorical phenotypic data available for 10 descriptive traits in a collection of ∼13,000 soybean [Glycine max (L.) Merr.] accessions that had been genotyped with a 50,000 single nucleotide polymorphism (SNP) chip. A GWA on a panel of accessions of this magnitude can offer substantial statistical power and mapping resolution, and we found that GWA mapping resulted in the identification of strong SNP signals for 24 classical genes as well as several heretofore unknown genes controlling the phenotypic variants in those traits. Because some of these genes had been cloned, we were able to show that the narrow GWA mapping SNP signal regions that we detected for the phenotypic variants had chromosomal bp spans that, with just one exception, overlapped the bp region of the cloned genes, despite local variation in SNP number and nonuniform SNP distribution in the chip set. Copyright © 2017 Crop Science Society of America.
Wu, Jiaxin; Wu, Mengmeng; Li, Lianshuo; Liu, Zhuo; Zeng, Wanwen; Jiang, Rui
2016-01-01
Recent advances in next-generation sequencing technology have enabled the fast and low-cost detection of genetic variants across the entire human genome, making whole-genome sequencing an increasingly common choice in the study of disease-causing genetic variants. Nevertheless, a repository that collects predictions of the functionally damaging effects of human genetic variants is still lacking, though it is well recognized that such predictions play a central role in the analysis of whole-genome sequencing data. To fill this gap, we developed a database named dbWGFP (a database and web server of human whole-genome single nucleotide variants and their functional predictions) that contains functional predictions and annotations of nearly 8.58 billion possible human whole-genome single nucleotide variants. Specifically, this database integrates 48 functional predictions calculated by 17 popular computational methods and 44 valuable annotations obtained from various data sources. Standalone software, user-friendly query services and free downloads of this database are available at http://bioinfo.au.tsinghua.edu.cn/dbwgfp. dbWGFP provides a valuable resource for the analysis of whole-genome sequencing, exome sequencing and SNP array data, thereby complementing existing data sources and computational resources in deciphering genetic bases of human inherited diseases. © The Author(s) 2016. Published by Oxford University Press.
Biotechnology and apple breeding in Japan
Igarashi, Megumi; Hatsuyama, Yoshimichi; Harada, Takeo; Fukasawa-Akada, Tomoko
2016-01-01
Apple is a fruit crop of significant economic importance, and breeders worldwide continue to develop novel cultivars with improved characteristics. The lengthy juvenile period and the large field space required to grow apple populations have imposed major limitations on breeding. Various molecular biological techniques have been employed to make apple breeding easier. Transgenic technology has facilitated the development of apples with resistance to fungal or bacterial diseases, improved fruit quality, or rootstocks with better rooting or dwarfing ability. DNA markers for disease resistance (scab, powdery mildew, fire blight, Alternaria blotch) and fruit skin color have also been developed, and marker-assisted selection (MAS) has been employed in breeding programs. In the last decade, genomic sequences and chromosome maps of various cultivars have become available, allowing the development of large SNP arrays and enabling efficient QTL mapping and genomic selection (GS). In recent years, new technologies for genetic improvement, such as trans-grafting, virus vectors, and genome editing, have emerged. With these techniques, no foreign genes are present in the final product, and some of them show considerable promise for application to apple breeding. PMID:27069388
Yang, Jian; Bakshi, Andrew; Zhu, Zhihong; Hemani, Gibran; Vinkhuyzen, Anna A E; Lee, Sang Hong; Robinson, Matthew R; Perry, John R B; Nolte, Ilja M; van Vliet-Ostaptchouk, Jana V; Snieder, Harold; Esko, Tonu; Milani, Lili; Mägi, Reedik; Metspalu, Andres; Hamsten, Anders; Magnusson, Patrik K E; Pedersen, Nancy L; Ingelsson, Erik; Soranzo, Nicole; Keller, Matthew C; Wray, Naomi R; Goddard, Michael E; Visscher, Peter M
2015-10-01
We propose a method (GREML-LDMS) to estimate heritability for human complex traits in unrelated individuals using whole-genome sequencing data. We demonstrate using simulations based on whole-genome sequencing data that ∼97% and ∼68% of variation at common and rare variants, respectively, can be captured by imputation. Using the GREML-LDMS method, we estimate from 44,126 unrelated individuals that all ∼17 million imputed variants explain 56% (standard error (s.e.) = 2.3%) of variance for height and 27% (s.e. = 2.5%) of variance for body mass index (BMI), and we find evidence that height- and BMI-associated variants have been under natural selection. Considering the imperfect tagging of imputation and potential overestimation of heritability from previous family-based studies, heritability is likely to be 60-70% for height and 30-40% for BMI. Therefore, the missing heritability is small for both traits. For further discovery of genes associated with complex traits, a study design with SNP arrays followed by imputation is more cost-effective than whole-genome sequencing at current prices.
Atopic dermatitis in West Highland white terriers is associated with a 1.3-Mb region on CFA 17.
Roque, Joana B; O'Leary, Caroline A; Duffy, David L; Kyaw-Tanner, Myat; Gharahkhani, Puya; Vogelnest, Linda; Mason, Kenneth; Shipstone, Michael; Latter, Melanie
2012-03-01
Canine atopic dermatitis (AD) is an allergic inflammatory skin disease that shares similarities with AD in humans. Canine AD is likely to be an inherited disease in dogs and is common in West Highland white terriers (WHWTs). We performed a genome-wide association study using the Affymetrix Canine SNP V2 array consisting of over 42,800 single nucleotide polymorphisms, on 35 atopic and 25 non-atopic WHWTs. A gene-dropping simulation method, using SIB-PAIR, identified a projected 1.3 Mb area of association (genome-wide P = 6 × 10(-5) to P = 7 × 10(-4)) on CFA 17. Nineteen genes on CFA 17, including 1 potential candidate gene (PTPN22), were located less than 0.5 Mb from the interval of association identified on the genome-wide association analysis. Four haplotypes within this locus were differently distributed between cases and controls in this population of dogs. These findings suggest that a major locus for canine AD in WHWTs may be located on, or in close proximity to an area on CFA 17.
Genome-wide Selective Sweeps in Natural Bacterial Populations Revealed by Time-series Metagenomics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chan, Leong-Keat; Bendall, Matthew L.; Malfatti, Stephanie
2014-06-18
Multiple evolutionary models have been proposed to explain the formation of genetically and ecologically distinct bacterial groups. Time-series metagenomics enables direct observation of evolutionary processes in natural populations, and if applied over a sufficiently long time frame, this approach could capture events such as gene-specific or genome-wide selective sweeps. Direct observations of either process could help resolve how distinct groups form in natural microbial assemblages. Here, from a three-year metagenomic study of a freshwater lake, we explore changes in single nucleotide polymorphism (SNP) frequencies and patterns of gene gain and loss in populations of Chlorobiaceae and Methylophilaceae. SNP analyses revealed substantial genetic heterogeneity within these populations, although the degree of heterogeneity varied considerably among closely related, co-occurring Methylophilaceae populations. SNP allele frequencies, as well as the relative abundance of certain genes, changed dramatically over time in each population. Interestingly, SNP diversity was purged at nearly every genome position in one of the Chlorobiaceae populations over the course of three years, while at the same time multiple genes either swept through or were swept from this population. These patterns were consistent with a genome-wide selective sweep, a process predicted by the ‘ecotype model’ of diversification, but not previously observed in natural populations.
GStream: Improving SNP and CNV Coverage on Genome-Wide Association Studies
Alonso, Arnald; Marsal, Sara; Tortosa, Raül; Canela-Xandri, Oriol; Julià, Antonio
2013-01-01
We present GStream, a method that combines genome-wide SNP and CNV genotyping in the Illumina microarray platform with unprecedented accuracy. This new method outperforms previous well-established SNP genotyping software. More importantly, the CNV calling algorithm of GStream dramatically improves the results obtained by previous state-of-the-art methods and yields an accuracy that is close to that obtained by purely CNV-oriented technologies like Comparative Genomic Hybridization (CGH). We demonstrate the superior performance of GStream using microarray data generated from HapMap samples. Using the reference CNV calls generated by the 1000 Genomes Project (1KGP) and well-known studies on whole genome CNV characterization based either on CGH or genotyping microarray technologies, we show that GStream can increase the number of reliably detected variants up to 25% compared to previously developed methods. Furthermore, the increased genome coverage provided by GStream allows the discovery of CNVs in close linkage disequilibrium with SNPs, previously associated with disease risk in published Genome-Wide Association Studies (GWAS). These results could provide important insights into the biological mechanism underlying the detected disease risk association. With GStream, large-scale GWAS will not only benefit from the combined genotyping of SNPs and CNVs at an unprecedented accuracy, but will also take advantage of the computational efficiency of the method. PMID:23844243
Taranto, F; D'Agostino, N; Greco, B; Cardi, T; Tripodi, P
2016-11-21
Knowledge of population structure and genetic diversity in vegetable crops is essential for association mapping studies and genomic selection. Genotyping by sequencing (GBS) represents an innovative method for large scale SNP detection and genotyping of genetic resources. Herein we used the GBS approach for the genome-wide identification of SNPs in a collection of Capsicum spp. accessions and for the assessment of the level of genetic diversity in a subset of 222 cultivated pepper (Capsicum annuum) genotypes. GBS analysis generated a total of 7,568,894 master tags, of which 43.4% uniquely aligned to the reference genome CM334. A total of 108,591 SNP markers were identified, of which 105,184 were in C. annuum accessions. In order to explore the genetic diversity of C. annuum and to select a minimal core set representing most of the total genetic variation with minimum redundancy, a subset of 222 C. annuum accessions was analysed using 32,950 high quality SNPs. Based on Bayesian and Hierarchical clustering it was possible to divide the collection into three clusters. Cluster I had the majority of varieties and landraces mainly from Southern and Northern Italy, and from Eastern Europe, whereas clusters II and III comprised accessions of different geographical origins. Considering the genome-wide genetic variation among the accessions included in cluster I, a second round of Bayesian (K = 3) and Hierarchical (K = 2) clustering was performed. These analyses showed that genotypes were grouped not only based on geographical origin, but also on fruit-related features. GBS data have proven useful to assess the genetic diversity in a collection of C. annuum accessions. The high number of SNP markers, uniformly distributed on the 12 chromosomes, allowed the accessions to be distinguished according to geographical origin and fruit-related features.
SNP markers and information on population structure developed in this study will undoubtedly support genome-wide association mapping studies and marker-assisted selection programs.
Genomic imbalances in esophageal carcinoma cell lines involve Wnt pathway genes.
Brown, Jacqueline; Bothma, Hannelie; Veale, Robin; Willem, Pascale
2011-06-28
To identify molecular markers shared across South African esophageal squamous cell carcinoma (ESCC) cell lines using cytogenetics, fluorescence in situ hybridization (FISH) and single nucleotide polymorphism (SNP) array copy number analysis. We used conventional cytogenetics, FISH, and multicolor FISH to characterize the chromosomal rearrangements of five ESCC cell lines established in South Africa. The whole genome copy number profile was established from 250K SNP arrays, and data was analyzed with the CNAT 4.0 and GISTIC software. We detected common translocation breakpoints involving chromosomes 1p11-12 and 3p11.2, the latter correlated with the deletion, or interruption of the EPHA3 gene. The most significant amplifications involved the following chromosomal regions and genes: 11q13.3 (CCND1, FGF3, FGF4, FGF19, MYEOV), 8q24.21(C-MYC, FAM84B), 11q22.1-q22.3 (BIRC2, BIRC3), 5p15.2 (CTNND2), 3q11.2-q12.2 (MINA) and 18p11.32 (TYMS, YES1). The significant deletions included 1p31.2-p31.1 (CTH, GADD45α, DIRAS3), 2q22.1 (LRP1B), 3p12.1-p14.2 (FHIT), 4q22.1-q32.1 (CASP6, SMAD1), 8p23.2-q11.1 (BNIP3L) and 18q21.1-q21.2 (SMAD4, DCC). The 3p11.2 translocation breakpoint was shared across four cell lines, supporting a role for genes involved at this site, in particular, the EPHA3 gene which has previously been reported to be deleted in ESCC. The finding that a significant number of genes that were amplified (FGF3, FGF4, FGF19, CCND1 and C-MYC) or deleted (SFRP2 gene) are involved in the Wnt and fibroblast growth factor signaling pathways, suggests that these pathways may be activated in these cell lines.
Genomic imbalances in esophageal carcinoma cell lines involve Wnt pathway genes
Brown, Jacqueline; Bothma, Hannelie; Veale, Robin; Willem, Pascale
2011-01-01
AIM: To identify molecular markers shared across South African esophageal squamous cell carcinoma (ESCC) cell lines using cytogenetics, fluorescence in situ hybridization (FISH) and single nucleotide polymorphism (SNP) array copy number analysis. METHODS: We used conventional cytogenetics, FISH, and multicolor FISH to characterize the chromosomal rearrangements of five ESCC cell lines established in South Africa. The whole genome copy number profile was established from 250K SNP arrays, and data was analyzed with the CNAT 4.0 and GISTIC software. RESULTS: We detected common translocation breakpoints involving chromosomes 1p11-12 and 3p11.2, the latter correlated with the deletion, or interruption of the EPHA3 gene. The most significant amplifications involved the following chromosomal regions and genes: 11q13.3 (CCND1, FGF3, FGF4, FGF19, MYEOV), 8q24.21(C-MYC, FAM84B), 11q22.1-q22.3 (BIRC2, BIRC3), 5p15.2 (CTNND2), 3q11.2-q12.2 (MINA) and 18p11.32 (TYMS, YES1). The significant deletions included 1p31.2-p31.1 (CTH, GADD45α, DIRAS3), 2q22.1 (LRP1B), 3p12.1-p14.2 (FHIT), 4q22.1-q32.1 (CASP6, SMAD1), 8p23.2-q11.1 (BNIP3L) and 18q21.1-q21.2 (SMAD4, DCC). The 3p11.2 translocation breakpoint was shared across four cell lines, supporting a role for genes involved at this site, in particular, the EPHA3 gene which has previously been reported to be deleted in ESCC. CONCLUSION: The finding that a significant number of genes that were amplified (FGF3, FGF4, FGF19, CCND1 and C-MYC) or deleted (SFRP2 gene) are involved in the Wnt and fibroblast growth factor signaling pathways, suggests that these pathways may be activated in these cell lines. PMID:21734802
Bertelsen, H P; Gregersen, V R; Poulsen, N; Nielsen, R O; Das, A; Madsen, L B; Buitenhuis, A J; Holm, L-E; Panitz, F; Larsen, L B; Bendixen, C
2016-04-01
Rennet-induced milk coagulation is an important trait for cheese production. Recent studies have reported an alarming frequency of cows producing poorly coagulating milk unsuitable for cheese production. Several genetic factors are known to affect milk coagulation, including variation in the major milk proteins; however, recent association studies indicate genetic effects from other genomic regions as well. The aim of this study was to detect genetic variation affecting milk coagulation properties, measured as curd-firming rate (CFR) and milk pH. This was achieved by examining allele frequency differences between pooled whole-genome sequences of phenotypically extreme samples (pool-seq). Curd-firming rate and raw milk pH were measured for 415 Danish Holstein cows, and each animal was sequenced at low coverage. Pools were created containing whole genome sequence reads from samples with "extreme" values (high or low) for both phenotypic traits. A total of 6,992,186 and 5,295,501 SNP were assessed in relation to CFR and milk pH, respectively. Allele frequency differences were calculated between pools and 32 significantly different SNP were detected, 1 for milk pH and 31 for CFR, of which 19 are located on chromosome 6. A total of 9 significant SNP, which were selected based on the possible function of proximal candidate genes, were genotyped in the entire sample set (n = 415) to test for an association. The most significant SNP was located proximal to CSN3, explaining 33% of the phenotypic variance. CSN3, coding for κ-casein, is the most studied gene in relation to milk coagulation due to the position of κ-casein on the surface of the casein micelles and its direct involvement in milk coagulation. Three additional SNP located on chromosome 6 showed significant associations explaining 7, 3.6, and 1.3% of the phenotypic variance of CFR.
The significant SNP on chromosome 6 were shown to be in linkage disequilibrium with the SNP peaking proximal to CSN3; however, after accounting for the genotype of the peak SNP within this QTL, significant effects (P-value < 0.1) could still be detected for 2 of the SNP, accounting for 2 and 1% of the phenotypic variance. These 2 SNP were located within introns of or proximal to the candidate genes solute carrier family 4 (sodium bicarbonate cotransporter), member 4 (SLC4A4) and LIM and calponin homology domains 1 (LIMCH1), respectively, making them interesting targets for further analysis.
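The pool-seq comparison described in this abstract rests on per-SNP allele frequency differences between two pools. A minimal sketch follows, assuming each pool is summarized as reference and alternate read counts per SNP. The data layout, SNP identifiers, and function names are hypothetical, and the significance testing used in the study is not reproduced here.

```python
def allele_frequency(ref_reads, alt_reads):
    """Alternate-allele frequency estimated from pooled read counts."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


def frequency_differences(high_pool, low_pool):
    """Per-SNP frequency difference (high pool minus low pool).

    Each pool maps a SNP identifier to (ref_reads, alt_reads).
    Only SNPs observed in both pools are compared.
    """
    return {
        snp: allele_frequency(*high_pool[snp]) - allele_frequency(*low_pool[snp])
        for snp in high_pool
        if snp in low_pool
    }


if __name__ == "__main__":
    # Hypothetical read counts, for illustration only.
    high = {"snp1": (10, 30), "snp2": (20, 20)}
    low = {"snp1": (30, 10), "snp2": (20, 20), "snp3": (1, 1)}
    print(frequency_differences(high, low))
```

Large absolute differences would then be candidates for follow-up by individual genotyping, as was done for the 9 SNP in the abstract. Read depth affects how noisy each frequency estimate is, so a real analysis would weight or filter by coverage.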
Holland, Heidrun; Ahnert, Peter; Koschny, Ronald; Kirsten, Holger; Bauer, Manfred; Schober, Ralf; Meixensberger, Jürgen; Fritzsch, Dominik; Krupp, Wolfgang
2012-06-15
Astrocytomas represent the largest and most common subgroup of brain tumors. Anaplastic astrocytoma (WHO grade III) may arise from low-grade diffuse astrocytoma (WHO grade II) or as primary tumors without any precursor lesion. Comprehensive analyses of anaplastic astrocytomas combining both cytogenetic and molecular cytogenetic techniques are rare. Therefore, we analyzed genomic alterations of five anaplastic astrocytomas using high-density single nucleotide polymorphism arrays combined with GTG-banding and FISH-techniques. By cytogenetics, we found 169 structural chromosomal aberrations most frequently involving chromosomes 1, 2, 3, 4, 10, and 12, including two not previously described alterations, a nonreciprocal translocation t(3;11)(p12;q13), and one interstitial chromosomal deletion del(2)(q21q31). Additionally, we detected previously not documented loss of heterozygosity (LOH) without copy number changes in 4/5 anaplastic astrocytomas on chromosome regions 5q11.2, 5q22.1, 6q21, 7q21.11, 7q31.33, 8q11.22, 14q21.1, 17q21.31, and 17q22, suggesting segmental uniparental disomy (UPD), applying high-density single nucleotide polymorphism arrays. UPDs are currently considered to play an important role in the initiation and progression of different malignancies. The significance of previously not described genetic alterations in anaplastic astrocytomas presented here needs to be confirmed in a larger series. Copyright © 2012 Elsevier GmbH. All rights reserved.
Metzger, Julia; Ohnesorge, Bernhard; Distl, Ottmar
2012-01-01
Equine guttural pouch tympany (GPT) is a hereditary condition affecting foals in their first months of life. Complex segregation analyses in Arabian and German warmblood horses showed the involvement of a major gene as very likely. Genome-wide linkage and association analyses including a high density marker set of single nucleotide polymorphisms (SNPs) were performed to map the genomic region harbouring the potential major gene for GPT. A total of 85 Arabian and 373 German warmblood horses were genotyped on the Illumina equine SNP50 beadchip. Non-parametric multipoint linkage analyses showed genome-wide significance on horse chromosomes (ECA) 3 for German warmblood at 16–26 Mb and 34–55 Mb and for Arabian on ECA15 at 64–65 Mb. Genome-wide association analyses confirmed the linked regions for both breeds. In Arabian, genome-wide association was detected at 64 Mb within the region with the highest linkage peak on ECA15. For German warmblood, signals for genome-wide association were close to the peak region of linkage at 52 Mb on ECA3. The odds ratio for the SNP with the highest genome-wide association was 0.12 for the Arabian. In conclusion, the refinement of the regions with the Illumina equine SNP50 beadchip is an important step to unravel the responsible mutations for GPT. PMID:22848553
An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data.
Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A V S K; Varshney, Rajeev K
2014-01-01
Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of the command line interface, massive computational resources and expertise, which makes them daunting for biologists. Further, the SNP information generated may not be readily usable for downstream processes such as genotyping. Hence, a comprehensive pipeline called Integrated SNP Mining and Utilization (ISMU) has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface, for SNP discovery and the utilization of SNPs in developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction methods (SAMtools/SOAPsnp/CNS2snp and CbCC) and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is particularly useful for members of the plant genetics and breeding community with no computational expertise who wish to discover SNPs and use them in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing.
It has been developed in Java and is available at http://hpc.icrisat.cgiar.org/ISMU as standalone free software.
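The abstract mentions a polymorphism information content (PIC) value without defining it. The sketch below shows one widely used definition of PIC computed from allele frequencies. It is written in Python for illustration only; ISMU itself is a Java tool, and whether it uses this exact formula is an assumption. The function name is hypothetical.

```python
def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2.

    freqs: allele frequencies at one marker, expected to sum to 1.
    Assumption: this standard textbook definition; the abstract does
    not say how the pipeline computes its PIC value.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1 - homozygosity - cross_term
```

For a biallelic SNP the value is largest when both alleles are equally frequent, which is why PIC is often used to rank candidate markers for a genotyping assay.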
Linkage maps of the Atlantic salmon (Salmo salar) genome derived from RAD sequencing
2014-01-01
Background: Genetic linkage maps are useful tools for mapping quantitative trait loci (QTL) influencing variation in traits of interest in a population. Genotyping-by-sequencing approaches such as Restriction-site Associated DNA sequencing (RAD-Seq) now enable the rapid discovery and genotyping of genome-wide SNP markers suitable for the development of dense SNP linkage maps, including in non-model organisms such as Atlantic salmon (Salmo salar). This paper describes the development and characterisation of a high density SNP linkage map based on SbfI RAD-Seq SNP markers from two Atlantic salmon reference families. Results: Approximately 6,000 SNPs were assigned to 29 linkage groups, utilising markers from known genomic locations as anchors. Linkage maps were then constructed for the four mapping parents separately. Overall map lengths were comparable between male and female parents, but the distribution of the SNPs showed sex-specific patterns with a greater degree of clustering of sire-segregating SNPs to single chromosome regions. The maps were integrated with the Atlantic salmon draft reference genome contigs, allowing the unique assignment of ~4,000 contigs to a linkage group. A total of 112 genome contigs mapped to two or more linkage groups, highlighting regions of putative homeology within the salmon genome. A comparative genomics analysis with the stickleback reference genome identified putative genes closely linked to approximately half of the ordered SNPs and demonstrated blocks of orthology between the Atlantic salmon and stickleback genomes. A subset of 47 RAD-Seq SNPs was successfully validated using a high-throughput genotyping assay, with a correspondence of 97% between the two assays. Conclusions: This Atlantic salmon RAD-Seq linkage map is a resource for salmonid genomics research as genotyping-by-sequencing becomes increasingly common.
This is aided by the integration of the SbfI RAD-Seq SNPs with existing reference maps and the draft reference genome, as well as the identification of putative genes proximal to the SNPs. Differences in the distribution of recombination events between the sexes are evident, and regions of homeology have been identified which are reflective of the recent salmonid whole genome duplication. PMID:24571138
2014-01-01
Background: Genome-wide association studies (GWAS) have identified several loci associated with schizophrenia and/or bipolar disorder. We performed a GWAS of psychosis as a broad syndrome rather than within specific diagnostic categories. Methods: 1239 cases with schizophrenia, schizoaffective disorder, or psychotic bipolar disorder; 857 of their unaffected relatives; and 2739 healthy controls were genotyped with the Affymetrix 6.0 single nucleotide polymorphism (SNP) array. Analyses of 695,193 SNPs were conducted using UNPHASED, which combines information across families and unrelated individuals. We attempted to replicate signals found in 23 genomic regions using existing data on nonoverlapping samples from the Psychiatric GWAS Consortium and Schizophrenia-GENE-plus cohorts (10,352 schizophrenia patients and 24,474 controls). Results: No individual SNP showed compelling evidence for association with psychosis in our data. However, we observed a trend for association with the same risk alleles at loci previously associated with schizophrenia (one-sided p = .003). A polygenic score analysis found that the Psychiatric GWAS Consortium’s panel of SNPs associated with schizophrenia significantly predicted disease status in our sample (p = 5 × 10⁻¹⁴) and explained approximately 2% of the phenotypic variance. Conclusions: Although narrowly defined phenotypes have their advantages, we believe new loci may also be discovered through meta-analysis across broad phenotypes. The novel statistical methodology we introduced to model effect size heterogeneity between studies should help future GWAS that combine association evidence from related phenotypes. Applying these approaches, we highlight three loci that warrant further investigation. We found that SNPs conveying risk for schizophrenia are also predictive of disease status in our data. PMID:23871474
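The polygenic score analysis mentioned in this abstract rests on a simple weighted sum, sketched here in Python. This is a generic illustration, not the authors' exact procedure: it assumes each individual has risk-allele dosages (0, 1 or 2) and that per-SNP weights, such as log odds ratios from a discovery panel, are given. The function name and the SNP identifiers in the usage note are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages for one individual.

    dosages: dict mapping SNP id to the risk-allele count (0, 1 or 2).
    weights: dict mapping SNP id to an effect size from a discovery study.
    Only SNPs present in both dicts contribute; SNP ids are hypothetical.
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[snp] * dosages[snp] for snp in shared)
```

For instance, `polygenic_score({"snp_a": 2, "snp_b": 0}, {"snp_a": 0.1, "snp_b": 0.5})` counts two copies of the risk allele at `snp_a` and none at `snp_b`. In a study of this kind, the scores for cases and controls would then be entered into a regression model to test whether they predict disease status.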
Verma, Sujeet; Zurn, Jason D; Salinas, Natalia; Mathey, Megan M; Denoyes, Beatrice; Hancock, James F; Finn, Chad E; Bassil, Nahla V; Whitaker, Vance M
2017-01-01
The cultivated strawberry (Fragaria × ananassa) is consumed worldwide for its flavor and nutritional benefits. Genetic analysis of commercially important traits in strawberry is important for the development of breeding methods and tools for this species. Although several quantitative trait loci (QTL) have been previously detected for fruit quality and flowering traits using low-density genetic maps, clarity on the sub-genomic locations of these QTLs was missing. Recent discoveries in allo-octoploid strawberry genomics led to the development of the IStraw90 single-nucleotide polymorphism (SNP) array, enabling high-density genetic maps and finer resolution QTL analysis. In this study, breeder-specified traits were evaluated in the Eastern (Michigan) and Western (Oregon) United States for a common set of breeding populations during 2 years. Several QTLs were validated for soluble solids content (SSC), fruit weight (FWT), pH and titratable acidity (TA) using a pedigree-based QTL analysis approach. For fruit quality, a QTL for SSC on linkage group (LG) 6A, a QTL for FWT on LG 2BII, a QTL for pH on LG 4CII and two QTLs for TA on LGs 2A and 5B were detected. In addition, a large-effect QTL for flowering was detected at the distal end of LG 4A, coinciding with the FaPFRU locus. Marker haplotype analysis in the FaPFRU region indicated that the homozygous recessive genotype was highly predictive of seasonal flowering. SNP probes in the FaPFRU region may help facilitate marker-assisted selection for this trait. PMID:29138689
Identification of Allelic Imbalance with a Statistical Model for Subtle Genomic Mosaicism
Xia, Rui; Vattathil, Selina; Scheet, Paul
2014-01-01
Genetic heterogeneity in a mixed sample of tumor and normal DNA can confound characterization of the tumor genome. Numerous computational methods have been proposed to detect aberrations in DNA samples from tumor and normal tissue mixtures. Most of these require tumor purities to be at least 10–15%. Here, we present a statistical model to capture information, contained in the individual's germline haplotypes, about expected patterns in the B allele frequencies from SNP microarrays while fully modeling their magnitude, the first such model for SNP microarray data. Our model consists of a pair of hidden Markov models—one for the germline and one for the tumor genome—which, conditional on the observed array data and patterns of population haplotype variation, have a dependence structure induced by the relative imbalance of an individual's inherited haplotypes. Together, these hidden Markov models offer a powerful approach for dealing with mixtures of DNA where the main component represents the germline, thus suggesting natural applications for the characterization of primary clones when stromal contamination is extremely high, and for identifying lesions in rare subclones of a tumor when tumor purity is sufficient to characterize the primary lesions. Our joint model for germline haplotypes and acquired DNA aberration is flexible, allowing a large number of chromosomal alterations, including balanced and imbalanced losses and gains, copy-neutral loss-of-heterozygosity (LOH) and tetraploidy. We found our model (which we term J-LOH) to be superior for localizing rare aberrations in a simulated 3% mixture sample. More generally, our model provides a framework for full integration of the germline and tumor genomes to deal more effectively with missing or uncertain features, and thus extract maximal information from difficult scenarios where existing methods fail. PMID:25166618
Genome-Wide Analysis Reveals Selection for Important Traits in Domestic Horse Breeds
Petersen, Jessica L.; Mickelson, James R.; Rendahl, Aaron K.; Valberg, Stephanie J.; Andersson, Lisa S.; Axelsson, Jeanette; Bailey, Ernie; Bannasch, Danika; Binns, Matthew M.; Borges, Alexandre S.; Brama, Pieter; da Câmara Machado, Artur; Capomaccio, Stefano; Cappelli, Katia; Cothran, E. Gus; Distl, Ottmar; Fox-Clipsham, Laura; Graves, Kathryn T.; Guérin, Gérard; Haase, Bianca; Hasegawa, Telhisa; Hemmann, Karin; Hill, Emmeline W.; Leeb, Tosso; Lindgren, Gabriella; Lohi, Hannes; Lopes, Maria Susana; McGivney, Beatrice A.; Mikko, Sofia; Orr, Nicholas; Penedo, M. Cecilia T.; Piercy, Richard J.; Raekallio, Marja; Rieder, Stefan; Røed, Knut H.; Swinburne, June; Tozaki, Teruaki; Vaudin, Mark; Wade, Claire M.; McCue, Molly E.
2013-01-01
Intense selective pressures applied over short evolutionary time have resulted in homogeneity within, but substantial variation among, horse breeds. Utilizing this population structure, 744 individuals from 33 breeds, and a 54,000 SNP genotyping array, breed-specific targets of selection were identified using an FST-based statistic calculated in 500-kb windows across the genome. A 5.5-Mb region of ECA18, in which the myostatin (MSTN) gene was centered, contained the highest signature of selection in both the Paint and Quarter Horse. Gene sequencing and histological analysis of gluteal muscle biopsies showed a promoter variant and intronic SNP of MSTN were each significantly associated with higher Type 2B and lower Type 1 muscle fiber proportions in the Quarter Horse, demonstrating a functional consequence of selection at this locus. Signatures of selection on ECA23 in all gaited breeds in the sample led to the identification of a shared, 186-kb haplotype including two doublesex related mab transcription factor genes (DMRT2 and 3). The recent identification of a DMRT3 mutation within this haplotype, which appears necessary for the ability to perform alternative gaits, provides further evidence for selection at this locus. Finally, putative loci for the determination of size were identified in the draft breeds and the Miniature horse on ECA11, as well as when signatures of selection surrounding candidate genes at other loci were examined. This work provides further evidence of the importance of MSTN in racing breeds, provides strong evidence for selection upon gait and size, and illustrates the potential for population-based techniques to find genomic regions driving important phenotypes in the modern horse. PMID:23349635
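The windowed scan described in this abstract can be sketched in Python as follows. The abstract refers only to "an FST-based statistic"; the sketch substitutes plain Wright's FST for two equally weighted populations and combines SNPs within a window as a ratio of sums, so both choices are assumptions. Function names are hypothetical.

```python
from collections import defaultdict


def fst_components(p1, p2):
    """Numerator and denominator of Wright's FST for one biallelic SNP.

    p1, p2: frequency of the same allele in two populations, which are
    weighted equally (an assumption made for simplicity).
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return h_total - h_within, h_total


def windowed_fst(snps, window_bp=500_000):
    """Return {(chrom, window_index): FST} for non-overlapping windows.

    snps: iterable of (chrom, position_bp, p1, p2) tuples.
    Windows in which every SNP is monomorphic overall are skipped.
    """
    numerators = defaultdict(float)
    denominators = defaultdict(float)
    for chrom, pos, p1, p2 in snps:
        key = (chrom, pos // window_bp)
        num, den = fst_components(p1, p2)
        numerators[key] += num
        denominators[key] += den
    return {
        key: numerators[key] / denominators[key]
        for key in denominators
        if denominators[key] > 0
    }
```

Windows with unusually high values relative to the rest of the genome would then be flagged as candidate targets of selection in the breed being compared.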
Verma, Sujeet; Zurn, Jason D; Salinas, Natalia; Mathey, Megan M; Denoyes, Beatrice; Hancock, James F; Finn, Chad E; Bassil, Nahla V; Whitaker, Vance M
2017-01-01
The cultivated strawberry (Fragaria × ananassa) is consumed worldwide for its flavor and nutritional benefits. Genetic analysis of commercially important traits in strawberry is important for the development of breeding methods and tools for this species. Although several quantitative trait loci (QTL) have been previously detected for fruit quality and flowering traits using low-density genetic maps, clarity on the sub-genomic locations of these QTLs was missing. Recent discoveries in allo-octoploid strawberry genomics led to the development of the IStraw90 single-nucleotide polymorphism (SNP) array, enabling high-density genetic maps and finer resolution QTL analysis. In this study, breeder-specified traits were evaluated in the Eastern (Michigan) and Western (Oregon) United States for a common set of breeding populations during 2 years. Several QTLs were validated for soluble solids content (SSC), fruit weight (FWT), pH and titratable acidity (TA) using a pedigree-based QTL analysis approach. For fruit quality, a QTL for SSC on linkage group (LG) 6A, a QTL for FWT on LG 2BII, a QTL for pH on LG 4CII and two QTLs for TA on LGs 2A and 5B were detected. In addition, a large-effect QTL for flowering was detected at the distal end of LG 4A, coinciding with the FaPFRU locus. Marker haplotype analysis in the FaPFRU region indicated that the homozygous recessive genotype was highly predictive of seasonal flowering. SNP probes in the FaPFRU region may help facilitate marker-assisted selection for this trait.
Bramon, Elvira; Pirinen, Matti; Strange, Amy; Lin, Kuang; Freeman, Colin; Bellenguez, Céline; Su, Zhan; Band, Gavin; Pearson, Richard; Vukcevic, Damjan; Langford, Cordelia; Deloukas, Panos; Hunt, Sarah; Gray, Emma; Dronov, Serge; Potter, Simon C; Tashakkori-Ghanbaria, Avazeh; Edkins, Sarah; Bumpstead, Suzannah J; Arranz, Maria J; Bakker, Steven; Bender, Stephan; Bruggeman, Richard; Cahn, Wiepke; Chandler, David; Collier, David A; Crespo-Facorro, Benedicto; Dazzan, Paola; de Haan, Lieuwe; Di Forti, Marta; Dragović, Milan; Giegling, Ina; Hall, Jeremy; Iyegbe, Conrad; Jablensky, Assen; Kahn, René S; Kalaydjieva, Luba; Kravariti, Eugenia; Lawrie, Stephen; Linszen, Don H; Mata, Ignacio; McDonald, Colm; McIntosh, Andrew; Myin-Germeys, Inez; Ophoff, Roel A; Pariante, Carmine M; Paunio, Tiina; Picchioni, Marco; Ripke, Stephan; Rujescu, Dan; Sauer, Heinrich; Shaikh, Madiha; Sussmann, Jessika; Suvisaari, Jaana; Tosato, Sarah; Toulopoulou, Timothea; Van Os, Jim; Walshe, Muriel; Weisbrod, Matthias; Whalley, Heather; Wiersma, Durk; Blackwell, Jenefer M; Brown, Matthew A; Casas, Juan P; Corvin, Aiden; Duncanson, Audrey; Jankowski, Janusz A Z; Markus, Hugh S; Mathew, Christopher G; Palmer, Colin N A; Plomin, Robert; Rautanen, Anna; Sawcer, Stephen J; Trembath, Richard C; Wood, Nicholas W; Barroso, Ines; Peltonen, Leena; Lewis, Cathryn M; Murray, Robin M; Donnelly, Peter; Powell, John; Spencer, Chris C A
2014-03-01
Genome-wide association studies (GWAS) have identified several loci associated with schizophrenia and/or bipolar disorder. We performed a GWAS of psychosis as a broad syndrome rather than within specific diagnostic categories. 1239 cases with schizophrenia, schizoaffective disorder, or psychotic bipolar disorder; 857 of their unaffected relatives; and 2739 healthy controls were genotyped with the Affymetrix 6.0 single nucleotide polymorphism (SNP) array. Analyses of 695,193 SNPs were conducted using UNPHASED, which combines information across families and unrelated individuals. We attempted to replicate signals found in 23 genomic regions using existing data on nonoverlapping samples from the Psychiatric GWAS Consortium and Schizophrenia-GENE-plus cohorts (10,352 schizophrenia patients and 24,474 controls). No individual SNP showed compelling evidence for association with psychosis in our data. However, we observed a trend for association with the same risk alleles at loci previously associated with schizophrenia (one-sided p = .003). A polygenic score analysis found that the Psychiatric GWAS Consortium's panel of SNPs associated with schizophrenia significantly predicted disease status in our sample (p = 5 × 10⁻¹⁴) and explained approximately 2% of the phenotypic variance. Although narrowly defined phenotypes have their advantages, we believe new loci may also be discovered through meta-analysis across broad phenotypes. The novel statistical methodology we introduced to model effect size heterogeneity between studies should help future GWAS that combine association evidence from related phenotypes. Applying these approaches, we highlight three loci that warrant further investigation. We found that SNPs conveying risk for schizophrenia are also predictive of disease status in our data. Copyright © 2014 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
Extended diversity analysis of cultivated grapevine Vitis vinifera with 10K genome-wide SNPs.
Laucou, Valérie; Launay, Amandine; Bacilieri, Roberto; Lacombe, Thierry; Adam-Blondon, Anne-Françoise; Bérard, Aurélie; Chauveau, Aurélie; de Andrés, Maria Teresa; Hausmann, Ludger; Ibáñez, Javier; Le Paslier, Marie-Christine; Maghradze, David; Martinez-Zapater, José Miguel; Maul, Erika; Ponnaiah, Maharajah; Töpfer, Reinhard; Péros, Jean-Pierre; Boursiquot, Jean-Michel
2018-01-01
Grapevine is a very important crop species that is mainly cultivated worldwide for fruits, wine and juice. Identification of the genetic bases of performance traits through association mapping studies requires a precise knowledge of the available diversity and how this diversity is structured and varies across the whole genome. An 18k SNP genotyping array was evaluated on a panel of Vitis vinifera cultivars and we obtained a data set with no missing values for a total of 10207 SNPs and 783 different genotypes. The average inter-SNP spacing was ~47 kbp, the mean minor allele frequency (MAF) was 0.23 and the genetic diversity in the sample was high (He = 0.32). Fourteen SNPs, chosen from those with the highest MAF values, were sufficient to identify each genotype in the sample. Parentage analysis revealed 118 full parentages and 490 parent-offspring duos, thus confirming the close pedigree relationships within the cultivated grapevine. Structure analyses also confirmed the main divisions due to an eastern-western gradient and human usage (table vs. wine). Using a multivariate approach, we refined the structure and identified a total of eight clusters. Both the genetic diversity (He, 0.26-0.32) and linkage disequilibrium (LD, 28.8-58.2 kbp) varied between clusters. Despite the short span LD, we also identified some non-recombining haplotype blocks that may complicate association mapping. Finally, we performed a genome-wide association study that confirmed previous works and also identified new regions for important performance traits such as acidity. Taken together, all the results contribute to a better knowledge of the genetics of the cultivated grapevine.
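The summary statistics reported in this abstract, minor allele frequency (MAF) and expected heterozygosity (He), follow from simple formulas for biallelic SNPs. The Python sketch below assumes that one allele frequency per SNP has already been estimated from the genotype data; the function names and the example frequencies in the usage note are hypothetical.

```python
def minor_allele_frequency(p):
    """Frequency of the rarer allele at a biallelic SNP."""
    return min(p, 1 - p)


def expected_heterozygosity(p):
    """Expected heterozygosity under Hardy-Weinberg proportions: 2p(1 - p)."""
    return 2 * p * (1 - p)


def summarize(allele_freqs):
    """Return (mean MAF, mean He) across a list of per-SNP allele frequencies."""
    if not allele_freqs:
        raise ValueError("at least one SNP is required")
    n = len(allele_freqs)
    mean_maf = sum(minor_allele_frequency(p) for p in allele_freqs) / n
    mean_he = sum(expected_heterozygosity(p) for p in allele_freqs) / n
    return mean_maf, mean_he
```

For example, `summarize([0.5, 0.9])` averages one highly informative SNP and one with a rare minor allele. Because He peaks at p = 0.5, SNPs with high MAF carry the most information per marker, which is consistent with a small set of high-MAF SNPs being enough to tell genotypes apart.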
Extended diversity analysis of cultivated grapevine Vitis vinifera with 10K genome-wide SNPs
Launay, Amandine; Bacilieri, Roberto; Lacombe, Thierry; Adam-Blondon, Anne-Françoise; Bérard, Aurélie; Chauveau, Aurélie; de Andrés, Maria Teresa; Maghradze, David; Maul, Erika; Ponnaiah, Maharajah; Töpfer, Reinhard; Péros, Jean-Pierre; Boursiquot, Jean-Michel
2018-01-01
Grapevine is a very important crop species that is mainly cultivated worldwide for fruits, wine and juice. Identification of the genetic bases of performance traits through association mapping studies requires a precise knowledge of the available diversity and how this diversity is structured and varies across the whole genome. An 18k SNP genotyping array was evaluated on a panel of Vitis vinifera cultivars and we obtained a data set with no missing values for a total of 10207 SNPs and 783 different genotypes. The average inter-SNP spacing was ~47 kbp, the mean minor allele frequency (MAF) was 0.23 and the genetic diversity in the sample was high (He = 0.32). Fourteen SNPs, chosen from those with the highest MAF values, were sufficient to identify each genotype in the sample. Parentage analysis revealed 118 full parentages and 490 parent-offspring duos, thus confirming the close pedigree relationships within the cultivated grapevine. Structure analyses also confirmed the main divisions due to an eastern-western gradient and human usage (table vs. wine). Using a multivariate approach, we refined the structure and identified a total of eight clusters. Both the genetic diversity (He, 0.26–0.32) and linkage disequilibrium (LD, 28.8–58.2 kbp) varied between clusters. Despite the short span LD, we also identified some non-recombining haplotype blocks that may complicate association mapping. Finally, we performed a genome-wide association study that confirmed previous works and also identified new regions for important performance traits such as acidity. Taken together, all the results contribute to a better knowledge of the genetics of the cultivated grapevine. PMID:29420602
Liu, Jun-Jun; Sniezko, Richard; Murray, Michael; Wang, Ning; Chen, Hao; Zamany, Arezoo; Sturrock, Rona N.; Savin, Douglas; Kegley, Angelia
2016-01-01
Whitebark pine (WBP, Pinus albicaulis Engelm.) is an endangered conifer species due to heavy mortality from white pine blister rust (WPBR, caused by Cronartium ribicola) and mountain pine beetle (Dendroctonus ponderosae). Information about genetic diversity and population structure is of fundamental importance for its conservation and restoration. However, current knowledge on the genetic constitution and genomic variation is still limited for WBP. In this study, an integrated genomics approach was applied to characterize seed collections from WBP breeding programs in western North America. RNA-seq analysis was used for de novo assembly of the WBP needle transcriptome, which contains 97,447 protein-coding transcripts. Within the transcriptome, single nucleotide polymorphisms (SNPs) were discovered, and more than 22,000 of them were non-synonymous SNPs (ns-SNPs). Following the annotation of genes with ns-SNPs, 216 ns-SNPs within candidate genes with putative functions in disease resistance and plant defense were selected to design SNP arrays for high-throughput genotyping. Among these SNP loci, 71 were highly polymorphic, with sufficient variation to identify a unique genotype for each of the 371 individuals originating from British Columbia (Canada), Oregon and Washington (USA). A clear genetic differentiation was evident among seed families. Analyses of genetic spatial patterns revealed varying degrees of diversity and the existence of several genetic subgroups in the WBP breeding populations. Genetic components were associated with geographic variables and phenotypic rating of WPBR disease severity across landscapes, which may facilitate further identification of WBP genotypes and gene alleles contributing to local adaptation and quantitative resistance to WPBR. 
The WBP genomic resources developed here provide an invaluable tool for further studies and for exploitation and utilization of the genetic diversity preserved within this endangered conifer and other five-needle pines. PMID:27992468
The Genomic Architecture of Sporadic Heart Failure
Dorn, Gerald W
2011-01-01
Common or sporadic systolic heart failure (heart failure) is the clinical syndrome of insufficient forward cardiac output resulting from myocardial disease. Most heart failure is the consequence of ischemic or idiopathic cardiomyopathy. There is a clear familial predisposition to heart failure, with a genetic component estimated to confer between 20 and 30% of overall risk. The multifactorial etiology of this syndrome has complicated identification of its genetic underpinnings. Until recently, almost all genetic studies of heart failure were designed and deployed according to the common disease-common variant hypothesis, in which individual risk alleles impart a small positive or negative effect and overall genetic risk is the cumulative impact of all functional genetic variations. Early studies employed a candidate gene approach, focused mainly on factors within adrenergic and renin-angiotensin pathways that affect heart failure progression and are targeted by standard pharmacotherapeutics. Many of these reported allelic associations with heart failure have not been replicated. However, the preponderance of data support risk-modifier effects for the Arg389Gly polymorphism of β1-adrenergic receptors and the intron 16 in/del polymorphism of angiotensin converting enzyme. Recent unbiased studies using genome-wide single nucleotide polymorphism (SNP) microarrays have shown fewer positive results than when these platforms were applied to hypertension, myocardial infarction, or diabetes, possibly reflecting the complex etiology of heart failure. A new cardiovascular gene-centric sub-genome SNP array identified a common heart failure risk allele at 1p36 in multiple independent cohorts, but the biological mechanism for this association is still uncertain.
It is likely that common gene polymorphisms account for only a fraction of individual genetic heart failure risk, and future studies using deep resequencing are likely to identify rare gene variants with larger biological effects. PMID:21566223
Aragam, Nagesh; Wang, Ke-Sheng; Pan, Yue
2011-10-01
Major depressive disorder (MDD) is a universally prevalent mental condition, dependent on both genetic and environmental factors, that disables people of every culture, race, gender, and age. While gender differences in MDD have been widely reported in the literature, few genome-wide analyses of gender differences have been reported to date. We conducted a genome-wide association analysis of gender differences for MDD using the Netherlands NESDA and NTR population-based samples (1726 cases and 1630 controls). PLINK software was used to analyze the genome-wide association data of Perlegen 600 K SNP Chips. We identified 40 male-specific and 56 female-specific MDD-associated SNPs with P-values less than 10⁻⁴. The best male-specific SNP was rs9352774 (P = 2.26 × 10⁻⁶) within the LGSN gene, while the best female-specific SNP was rs2715148 (P = 5.64 × 10⁻⁷) within the PCLO gene. We also found 38 SNPs showing gene × gender interactions in influencing MDD (P < 10⁻⁴). The best SNP was rs12692709 (P = 5.75 × 10⁻⁶) near the FIGN gene at 2q24.3, while the next best SNP was rs11039588 (P = 1.16 × 10⁻⁵) within the OR4B1 gene. The findings from this study need to be replicated in other populations. These results provide a genetic basis for gender differences in MDD and will serve as a resource for replication in other populations to elucidate the potential role of these genetic variants in MDD. Copyright © 2011 Elsevier B.V. All rights reserved.
Insights into the genetic architecture of morphological traits in two passerine bird species.
Silva, C N S; McFarlane, S E; Hagen, I J; Rönnegård, L; Billing, A M; Kvalnes, T; Kemppainen, P; Rønning, B; Ringsby, T H; Sæther, B-E; Qvarnström, A; Ellegren, H; Jensen, H; Husby, A
2017-09-01
Knowledge about the underlying genetic architecture of phenotypic traits is needed to understand and predict evolutionary dynamics. The number of causal loci, magnitude of the effects and location in the genome are, however, still largely unknown. Here, we use genome-wide single-nucleotide polymorphism (SNP) data from two large-scale data sets on house sparrows and collared flycatchers to examine the genetic architecture of different morphological traits (tarsus length, wing length, body mass, bill depth, bill length, total and visible badge size and white wing patches). Genomic heritabilities were estimated using relatedness calculated from SNPs. The proportion of variance captured by the SNPs (SNP-based heritability) was lower in house sparrows compared with collared flycatchers, as expected given marker density (6348 SNPs in house sparrows versus 38 689 SNPs in collared flycatchers). Indeed, after downsampling to similar SNP density and sample size, this estimate was no longer markedly different between species. Chromosome-partitioning analyses demonstrated that the proportion of variance explained by each chromosome was significantly positively related to the chromosome size for some traits and, generally, that larger chromosomes tended to explain proportionally more variation than smaller chromosomes. Finally, we found two genome-wide significant associations with very small effect sizes. One SNP on chromosome 20 was associated with bill length in house sparrows and explained 1.2% of phenotypic variation (V_P), and one SNP on chromosome 4 was associated with tarsus length in collared flycatchers (3% of V_P). Although we cannot exclude the possibility of undetected large-effect loci, our results indicate a polygenic basis for morphological traits.
Genomic analysis of genetic heterogeneity and evolution in high-grade serous ovarian carcinoma
Cooke, Susanna L; Ng, Charlotte KY; Melnyk, Nataliya; Garcia, Maria J; Hardcastle, Tom; Temple, Jillian; Langdon, Simon; Huntsman, David; Brenton, James D
2010-01-01
Resistance to chemotherapy in ovarian cancer is poorly understood. Evolutionary models of cancer predict that, following treatment, resistance emerges either due to outgrowth of an intrinsically resistant sub-clone, or evolves in residual disease under the selective pressure of treatment. To investigate genetic evolution in high-grade serous (HGS) ovarian cancers we first analysed cell line series derived from three cases of HGS carcinoma before and after platinum resistance had developed (PEO1, PEO4 and PEO6, PEA1 and PEA2, and PEO14 and PEO23). Analysis with 24-colour fluorescence in situ hybridisation and SNP array comparative genomic hybridisation (CGH) showed mutually exclusive endoreduplication and loss of heterozygosity events in clones present at different timepoints in the same individual. This implies that platinum sensitive and resistant disease was not linearly related but shared a common ancestor at an early stage of tumour development. Array CGH analysis of six paired pre- and post-neoadjuvant treatment HGS samples from the CTCR-OV01 clinical study did not show extensive copy number differences, suggesting that one clone was strongly dominant at presentation. These data show that cisplatin resistance in HGS carcinoma develops from pre-existing minor clones but that enrichment for these clones is not apparent during short-term chemotherapy treatment. PMID:20581869
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saccone, Scott F; Chesler, Elissa J; Bierut, Laura J
Commercial SNP microarrays now provide comprehensive and affordable coverage of the human genome. However, some diseases have biologically relevant genomic regions that may require additional coverage. Addiction, for example, is thought to be influenced by complex interactions among many relevant genes and pathways. We have assembled a list of 486 biologically relevant genes nominated by a panel of experts on addiction. We then added 424 genes that showed evidence of association with addiction phenotypes through mouse QTL mappings and gene co-expression analysis. We demonstrate that there are a substantial number of SNPs in these genes that are not well represented by commercial SNP platforms. We address this problem by introducing a publicly available SNP database for addiction. The database is annotated using numeric prioritization scores indicating the extent of biological relevance. The scores incorporate a number of factors such as SNP/gene functional properties (including synonymy and promoter regions), data from mouse systems genetics and measures of human/mouse evolutionary conservation. We then used HapMap genotyping data to determine if a SNP is tagged by a commercial microarray through linkage disequilibrium. This combination of biological prioritization scores and LD tagging annotation will enable addiction researchers to supplement commercial SNP microarrays to ensure comprehensive coverage of biologically relevant regions.
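The LD-tagging step described above can be illustrated with a minimal sketch. It assumes phased haplotypes coded 0/1 and the commonly used r² ≥ 0.8 tagging threshold; the abstract does not state the threshold or the software used, and the function names `r_squared` and `is_tagged` are hypothetical.

```python
def r_squared(alleles_a, alleles_b):
    """Pairwise LD (r^2) between two biallelic SNPs.

    Each argument lists the allele (0 or 1) carried by every phased
    haplotype, in the same haplotype order for both SNPs.
    """
    n = len(alleles_a)
    if n == 0 or n != len(alleles_b):
        raise ValueError("need two equally long, non-empty allele lists")
    p_a = sum(alleles_a) / n
    p_b = sum(alleles_b) / n
    p_ab = sum(1 for a, b in zip(alleles_a, alleles_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                            # a monomorphic SNP cannot tag anything
    return d * d / denom


def is_tagged(target, array_snps, threshold=0.8):
    """True if any array SNP reaches the (assumed) r^2 threshold with the target."""
    return any(r_squared(target, alleles) >= threshold
               for alleles in array_snps.values())


# Toy haplotypes, invented for illustration only.
target = [1, 1, 0, 0, 1, 0, 1, 0]
snp_a = [1, 1, 0, 0, 1, 0, 1, 0]   # perfectly correlated with the target
snp_b = [1, 0, 1, 0, 1, 0, 1, 0]   # weakly correlated with the target
```

A SNP of interest that no array SNP tags under this criterion would be a candidate for supplementary genotyping.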
Consolandi, Clarissa
2009-01-01
One major goal of genetic research is to understand the role of genetic variation in living systems. In humans, by far the most common type of such variation involves differences in single DNA nucleotides, and is thus termed single nucleotide polymorphism (SNP). The need for improvement in throughput and reliability of traditional techniques makes it necessary to develop new technologies. Thus the past few years have witnessed an extraordinary surge of interest in DNA microarray technology. This new technology offers the first great hope for providing a systematic way to explore the genome. It permits a very rapid analysis of thousands of genes for the purpose of gene discovery, sequencing, mapping, expression, and polymorphism detection. We generated a series of analytical tools to address the manufacturing, detection and data analysis components of a microarray experiment. In particular, we set up a universal array approach in combination with a PCR-LDR (polymerase chain reaction-ligation detection reaction) strategy for allele identification in the HLA gene.
Petrin, Aline L.; Daack-Hirsch, Sandra; L’Heureux, Jamie; Murray, Jeffrey C
2010-01-01
Objective The objective of this study was to use array-CGH to detect causal microdeletions in samples of subjects with cleft lip and palate. Subjects We analyzed DNA samples from a male patient and his parents, who were seen during surgical screening for an Operation Smile medical mission in the Philippines. Method We used Affymetrix Genome Wide Human SNP Array 6.0 followed by sequencing and quantitative PCR using SYBR Green I dye. Results We report the second case of 3q29 microdeletion syndrome including cleft lip with or without cleft palate and the first case of this microdeletion syndrome inherited from a phenotypically normal mosaic parent. Conclusions Our findings confirm the utility of aCGH to detect causal microdeletions; indicate that parental somatic mosaicism should be considered in healthy parents for genetic counseling of the families; and discuss important ethical implications of sharing health impact results from research studies with the participant families. PMID:20500065
Erdoğan, Onur; Aydin Son, Yeşim
2014-01-01
Single Nucleotide Polymorphisms (SNPs) are the most common genomic variations where only a single nucleotide differs between individuals. Individual SNPs and SNP profiles associated with diseases can be utilized as biological markers. But there is a need to determine which SNP subsets and patients' clinical data are informative for the diagnosis. Data mining approaches have the highest potential for extracting knowledge from genomic datasets and selecting the representative SNPs as well as the most effective and informative clinical features for the clinical diagnosis of diseases. In this study, we have applied a widely used data mining classification method, the "decision tree", for associating the SNP biomarkers and significant clinical data with Alzheimer's disease (AD), which is the most common form of "dementia". Different tree construction parameters have been compared for the optimization, and the most accurate tree for predicting AD is presented.
Phylogenomic evidence for ancient hybridization in the genomes of living cats (Felidae)
Li, Gang; Davis, Brian W.; Eizirik, Eduardo; Murphy, William J.
2016-01-01
Inter-species hybridization has been recently recognized as potentially common in wild animals, but the extent to which it shapes modern genomes is still poorly understood. Distinguishing historical hybridization events from other processes leading to phylogenetic discordance among different markers requires a well-resolved species tree that considers all modes of inheritance and overcomes systematic problems due to rapid lineage diversification by sampling large genomic character sets. Here, we assessed genome-wide phylogenetic variation across a diverse mammalian family, Felidae (cats). We combined genotypes from a genome-wide SNP array with additional autosomal, X- and Y-linked variants to sample ∼150 kb of nuclear sequence, in addition to complete mitochondrial genomes generated using light-coverage Illumina sequencing. We present the first robust felid time tree that accounts for unique maternal, paternal, and biparental evolutionary histories. Signatures of phylogenetic discordance were abundant in the genomes of modern cats, in many cases indicating hybridization as the most likely cause. Comparison of big cat whole-genome sequences revealed a substantial reduction of X-linked divergence times across several large recombination cold spots, which were highly enriched for signatures of selection-driven post-divergence hybridization between the ancestors of the snow leopard and lion lineages. These results highlight the mosaic origin of modern felid genomes and the influence of sex chromosomes and sex-biased dispersal in post-speciation gene flow. A complete resolution of the tree of life will require comprehensive genomic sampling of biparental and sex-limited genetic variation to identify and control for phylogenetic conflict caused by ancient admixture and sex-biased differences in genomic transmission. PMID:26518481
Wood, David L. A.; Nones, Katia; Steptoe, Anita; Christ, Angelika; Harliwong, Ivon; Newell, Felicity; Bruxner, Timothy J. C.; Miller, David; Cloonan, Nicole; Grimmond, Sean M.
2015-01-01
Genetic variation modulates gene expression transcriptionally or post-transcriptionally, and can profoundly alter an individual’s phenotype. Measuring allelic differential expression at heterozygous loci within an individual, a phenomenon called allele-specific expression (ASE), can assist in identifying such factors. Massively parallel DNA and RNA sequencing and advances in bioinformatic methodologies provide an outstanding opportunity to measure ASE genome-wide. In this study, matched DNA and RNA sequencing, genotyping arrays and computationally phased haplotypes were integrated to comprehensively and conservatively quantify ASE in a single human brain and liver tissue sample. We describe a methodological evaluation and assessment of common bioinformatic steps for ASE quantification, and recommend a robust approach to accurately measure SNP, gene and isoform ASE through the use of personalized haplotype genome alignment, strict alignment quality control and intragenic SNP aggregation. Our results indicate that accurate ASE quantification requires careful bioinformatic analyses and is adversely affected by sample specific alignment confounders and random sampling even at moderate sequence depths. We identified multiple known and several novel ASE genes in liver, including WDR72, DSP and UBD, as well as genes that contained ASE SNPs with imbalance direction discordant with haplotype phase, explainable by annotated transcript structure, suggesting isoform derived ASE. The methods evaluated in this study will be of use to researchers performing highly conservative quantification of ASE, and the genes and isoforms identified as ASE of interest to researchers studying those loci. PMID:25965996
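As a hedged illustration of what quantifying ASE at a heterozygous SNP can involve, the sketch below tests read counts for the two alleles against a balanced 50:50 expectation with an exact two-sided binomial test, and pools phased counts across SNPs in a gene. The abstract does not name the statistical test the authors used, so the choice of test, the function names and the toy counts are all assumptions.

```python
from math import comb


def binomial_two_sided_p(hap1_reads, hap2_reads, p=0.5):
    """Exact two-sided binomial p-value for allelic imbalance.

    Sums the probability of every outcome that is no more likely than the
    observed one. Suitable for modest read depths (a few hundred reads).
    """
    n = hap1_reads + hap2_reads
    if n == 0:
        return 1.0

    def pmf(k):
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    observed = pmf(hap1_reads)
    # The small tolerance guards against floating-point ties.
    total = sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)


def aggregate_gene_counts(snp_counts):
    """Sum phased (haplotype 1, haplotype 2) read counts over the SNPs of a gene."""
    hap1 = sum(c[0] for c in snp_counts)
    hap2 = sum(c[1] for c in snp_counts)
    return hap1, hap2


# Toy counts, invented for illustration: three SNPs phased onto two haplotypes.
gene_counts = aggregate_gene_counts([(12, 3), (9, 2), (15, 4)])
gene_p = binomial_two_sided_p(*gene_counts)
```

Pooling only makes sense when the SNP alleles are assigned to the correct haplotype, which is one reason the phasing and alignment quality control stressed in the abstract matter.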
Howard, Nicholas P; van de Weg, Eric; Bedford, David S; Peace, Cameron P; Vanderzande, Stijn; Clark, Matthew D; Teh, Soon Li; Cai, Lichun; Luby, James J
2017-01-01
The apple (Malus×domestica) cultivar Honeycrisp has become important economically and as a breeding parent. An earlier study with SSR markers indicated the original recorded pedigree of ‘Honeycrisp’ was incorrect and ‘Keepsake’ was identified as one putative parent, the other being unknown. The objective of this study was to verify ‘Keepsake’ as a parent and identify and genetically describe the unknown parent and its grandparents. A multi-family based dense and high-quality integrated SNP map was created using the apple 8 K Illumina Infinium SNP array. This map was used alongside a large pedigree-connected data set from the RosBREED project to build extended SNP haplotypes and to identify pedigree relationships. ‘Keepsake’ was verified as one parent of ‘Honeycrisp’ and ‘Duchess of Oldenburg’ and ‘Golden Delicious’ were identified as grandparents through the unknown parent. Following this finding, siblings of ‘Honeycrisp’ were identified using the SNP data. Breeding records from several of these siblings suggested that the previously unreported parent is a University of Minnesota selection, MN1627. This selection is no longer available, but now is genetically described through imputed SNP haplotypes. We also present the mosaic grandparental composition of ‘Honeycrisp’ for each of its 17 chromosome pairs. This new pedigree and genetic information will be useful in future pedigree-based genetic studies to connect ‘Honeycrisp’ with other cultivars used widely in apple breeding programs. The created SNP linkage map will benefit future research using the data from the Illumina apple 8 and 20 K and Affymetrix 480 K SNP arrays. PMID:28243452
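One simple building block of SNP-based parentage verification is a Mendelian consistency check. The sketch below is not the haplotype-based procedure the authors describe; it only assumes unphased genotypes coded as 0, 1 or 2 copies of one allele, and the function names and toy genotypes are hypothetical.

```python
def gametes(genotype):
    """Alleles (0 or 1) that a parent with this genotype can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def trio_consistent(parent1, parent2, child):
    """True if the child genotype can arise from one gamete of each parent."""
    return any(a + b == child
               for a in gametes(parent1)
               for b in gametes(parent2))


def mendelian_error_rate(parent1, parent2, child):
    """Fraction of fully genotyped SNPs that are Mendelian-inconsistent.

    Missing genotypes are given as None and skipped.
    """
    errors = 0
    scored = 0
    for g1, g2, gc in zip(parent1, parent2, child):
        if None in (g1, g2, gc):
            continue
        scored += 1
        if not trio_consistent(g1, g2, gc):
            errors += 1
    return errors / scored if scored else 0.0


# Toy genotypes at four SNPs, invented for illustration.
candidate_parent_1 = [0, 1, 2, 2]
candidate_parent_2 = [2, 1, 2, 0]
offspring = [1, 2, 1, 1]
rate = mendelian_error_rate(candidate_parent_1, candidate_parent_2, offspring)
```

A true parent-offspring trio should show an error rate close to the genotyping error rate, whereas a wrongly recorded parent typically produces many inconsistencies.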
Optimization of the genotyping-by-sequencing strategy for population genomic analysis in conifers.
Pan, Jin; Wang, Baosheng; Pei, Zhi-Yong; Zhao, Wei; Gao, Jie; Mao, Jian-Feng; Wang, Xiao-Ru
2015-07-01
Flexibility and low cost make genotyping-by-sequencing (GBS) an ideal tool for population genomic studies of nonmodel species. However, to utilize the potential of the method fully, many parameters affecting library quality and single nucleotide polymorphism (SNP) discovery require optimization, especially for conifer genomes with a high repetitive DNA content. In this study, we explored strategies for effective GBS analysis in pine species. We constructed GBS libraries using HpaII, PstI and EcoRI-MseI digestions with different multiplexing levels and examined the effect of restriction enzymes on library complexity and the impact of sequencing depth and size selection of restriction fragments on sequence coverage bias. We tested and compared UNEAK, Stacks and GATK pipelines for the GBS data, and then developed a reference-free SNP calling strategy for haploid pine genomes. Our GBS procedure proved to be effective in SNP discovery, producing 7000-11 000 and 14 751 SNPs within and among three pine species, respectively, from a PstI library. This investigation provides guidance for the design and analysis of GBS experiments, particularly for organisms for which genomic information is lacking. © 2014 John Wiley & Sons Ltd.
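The effect of enzyme choice and size selection on library complexity can be explored with an in silico digest. The sketch below assumes the standard recognition sites and cut positions for the enzymes named in the abstract (PstI CTGCA^G, HpaII C^CGG, EcoRI G^AATTC, MseI T^TAA); the function names, the toy sequence and the size window are hypothetical and are not taken from the study.

```python
# Recognition site and cut offset (bases from the start of the site, top strand).
SITES = {
    "PstI": ("CTGCAG", 5),
    "HpaII": ("CCGG", 1),
    "EcoRI": ("GAATTC", 1),
    "MseI": ("TTAA", 1),
}


def digest(sequence, enzymes):
    """Cut a linear sequence with one or more enzymes; return the fragments."""
    cuts = set()
    for enzyme in enzymes:
        site, offset = SITES[enzyme]
        pos = sequence.find(site)
        while pos != -1:
            cuts.add(pos + offset)
            pos = sequence.find(site, pos + 1)   # allow overlapping sites
    bounds = [0] + sorted(cuts) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments, min_len, max_len):
    """Keep fragments whose length falls inside the selected window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy sequence, invented for illustration.
toy = "AAACTGCAGTTTGAATTCCC"
single = digest(toy, ["PstI"])
double = digest(toy, ["PstI", "EcoRI"])
```

Running such a digest over a reference or draft assembly gives a rough expectation of how many loci each enzyme combination would sample before any library is built.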
A genome-wide association search for type 2 diabetes genes in African Americans.
Palmer, Nicholette D; McDonough, Caitrin W; Hicks, Pamela J; Roh, Bong H; Wing, Maria R; An, S Sandy; Hester, Jessica M; Cooke, Jessica N; Bostrom, Meredith A; Rudock, Megan E; Talbert, Matthew E; Lewis, Joshua P; Ferrara, Assiamira; Lu, Lingyi; Ziegler, Julie T; Sale, Michele M; Divers, Jasmin; Shriner, Daniel; Adeyemo, Adebowale; Rotimi, Charles N; Ng, Maggie C Y; Langefeld, Carl D; Freedman, Barry I; Bowden, Donald W; Voight, Benjamin F; Scott, Laura J; Steinthorsdottir, Valgerdur; Morris, Andrew P; Dina, Christian; Welch, Ryan P; Zeggini, Eleftheria; Huth, Cornelia; Aulchenko, Yurii S; Thorleifsson, Gudmar; McCulloch, Laura J; Ferreira, Teresa; Grallert, Harald; Amin, Najaf; Wu, Guanming; Willer, Cristen J; Raychaudhuri, Soumya; McCarroll, Steve A; Langenberg, Claudia; Hofmann, Oliver M; Dupuis, Josée; Qi, Lu; Segrè, Ayellet V; van Hoek, Mandy; Navarro, Pau; Ardlie, Kristin; Balkau, Beverley; Benediktsson, Rafn; Bennett, Amanda J; Blagieva, Roza; Boerwinkle, Eric; Bonnycastle, Lori L; Boström, Kristina Bengtsson; Bravenboer, Bert; Bumpstead, Suzannah; Burtt, Noël P; Charpentier, Guillaume; Chines, Peter S; Cornelis, Marilyn; Couper, David J; Crawford, Gabe; Doney, Alex S F; Elliott, Katherine S; Elliott, Amanda L; Erdos, Michael R; Fox, Caroline S; Franklin, Christopher S; Ganser, Martha; Gieger, Christian; Grarup, Niels; Green, Todd; Griffin, Simon; Groves, Christopher J; Guiducci, Candace; Hadjadj, Samy; Hassanali, Neelam; Herder, Christian; Isomaa, Bo; Jackson, Anne U; Johnson, Paul R V; Jørgensen, Torben; Kao, Wen H L; Klopp, Norman; Kong, Augustine; Kraft, Peter; Kuusisto, Johanna; Lauritzen, Torsten; Li, Man; Lieverse, Aloysius; Lindgren, Cecilia M; Lyssenko, Valeriya; Marre, Michel; Meitinger, Thomas; Midthjell, Kristian; Morken, Mario A; Narisu, Narisu; Nilsson, Peter; Owen, Katharine R; Payne, Felicity; Perry, John R B; Petersen, Ann-Kristin; Platou, Carl; Proença, Christine; Prokopenko, Inga; Rathmann, Wolfgang; Rayner, N William; Robertson, Neil R; 
Rocheleau, Ghislain; Roden, Michael; Sampson, Michael J; Saxena, Richa; Shields, Beverley M; Shrader, Peter; Sigurdsson, Gunnar; Sparsø, Thomas; Strassburger, Klaus; Stringham, Heather M; Sun, Qi; Swift, Amy J; Thorand, Barbara; Tichet, Jean; Tuomi, Tiinamaija; van Dam, Rob M; van Haeften, Timon W; van Herpt, Thijs; van Vliet-Ostaptchouk, Jana V; Walters, G Bragi; Weedon, Michael N; Wijmenga, Cisca; Witteman, Jacqueline; Bergman, Richard N; Cauchi, Stephane; Collins, Francis S; Gloyn, Anna L; Gyllensten, Ulf; Hansen, Torben; Hide, Winston A; Hitman, Graham A; Hofman, Albert; Hunter, David J; Hveem, Kristian; Laakso, Markku; Mohlke, Karen L; Morris, Andrew D; Palmer, Colin N A; Pramstaller, Peter P; Rudan, Igor; Sijbrands, Eric; Stein, Lincoln D; Tuomilehto, Jaakko; Uitterlinden, Andre; Walker, Mark; Wareham, Nicholas J; Watanabe, Richard M; Abecasis, Goncalo R; Boehm, Bernhard O; Campbell, Harry; Daly, Mark J; Hattersley, Andrew T; Hu, Frank B; Meigs, James B; Pankow, James S; Pedersen, Oluf; Wichmann, H-Erich; Barroso, Inês; Florez, Jose C; Frayling, Timothy M; Groop, Leif; Sladek, Rob; Thorsteinsdottir, Unnur; Wilson, James F; Illig, Thomas; Froguel, Philippe; van Duijn, Cornelia M; Stefansson, Kari; Altshuler, David; Boehnke, Michael; McCarthy, Mark I; Soranzo, Nicole; Wheeler, Eleanor; Glazer, Nicole L; Bouatia-Naji, Nabila; Mägi, Reedik; Randall, Joshua; Johnson, Toby; Elliott, Paul; Rybin, Denis; Henneman, Peter; Dehghan, Abbas; Hottenga, Jouke Jan; Song, Kijoung; Goel, Anuj; Egan, Josephine M; Lajunen, Taina; Doney, Alex; Kanoni, Stavroula; Cavalcanti-Proença, Christine; Kumari, Meena; Timpson, Nicholas J; Zabena, Carina; Ingelsson, Erik; An, Ping; O'Connell, Jeffrey; Luan, Jian'an; Elliott, Amanda; McCarroll, Steven A; Roccasecca, Rosa Maria; Pattou, François; Sethupathy, Praveen; Ariyurek, Yavuz; Barter, Philip; Beilby, John P; Ben-Shlomo, Yoav; Bergmann, Sven; Bochud, Murielle; Bonnefond, Amélie; Borch-Johnsen, Knut; Böttcher, Yvonne; Brunner, Eric; 
Bumpstead, Suzannah J; Chen, Yii-Der Ida; Chines, Peter; Clarke, Robert; Coin, Lachlan J M; Cooper, Matthew N; Crisponi, Laura; Day, Ian N M; de Geus, Eco J C; Delplanque, Jerome; Fedson, Annette C; Fischer-Rosinsky, Antje; Forouhi, Nita G; Frants, Rune; Franzosi, Maria Grazia; Galan, Pilar; Goodarzi, Mark O; Graessler, Jürgen; Grundy, Scott; Gwilliam, Rhian; Hallmans, Göran; Hammond, Naomi; Han, Xijing; Hartikainen, Anna-Liisa; Hayward, Caroline; Heath, Simon C; Hercberg, Serge; Hicks, Andrew A; Hillman, David R; Hingorani, Aroon D; Hui, Jennie; Hung, Joe; Jula, Antti; Kaakinen, Marika; Kaprio, Jaakko; Kesaniemi, Y Antero; Kivimaki, Mika; Knight, Beatrice; Koskinen, Seppo; Kovacs, Peter; Kyvik, Kirsten Ohm; Lathrop, G Mark; Lawlor, Debbie A; Le Bacquer, Olivier; Lecoeur, Cécile; Li, Yun; Mahley, Robert; Mangino, Massimo; Manning, Alisa K; Martínez-Larrad, María Teresa; McAteer, Jarred B; McPherson, Ruth; Meisinger, Christa; Melzer, David; Meyre, David; Mitchell, Braxton D; Mukherjee, Sutapa; Naitza, Silvia; Neville, Matthew J; Oostra, Ben A; Orrù, Marco; Pakyz, Ruth; Paolisso, Giuseppe; Pattaro, Cristian; Pearson, Daniel; Peden, John F; Pedersen, Nancy L; Perola, Markus; Pfeiffer, Andreas F H; Pichler, Irene; Polasek, Ozren; Posthuma, Danielle; Potter, Simon C; Pouta, Anneli; Province, Michael A; Psaty, Bruce M; Rayner, Nigel W; Rice, Kenneth; Ripatti, Samuli; Rivadeneira, Fernando; Rolandsson, Olov; Sandbaek, Annelli; Sandhu, Manjinder; Sanna, Serena; Sayer, Avan Aihie; Scheet, Paul; Seedorf, Udo; Sharp, Stephen J; Shields, Beverley; Sijbrands, Eric J G; Silveira, Angela; Simpson, Laila; Singleton, Andrew; Smith, Nicholas L; Sovio, Ulla; Swift, Amy; Syddall, Holly; Syvänen, Ann-Christine; Tanaka, Toshiko; Tönjes, Anke; Uitterlinden, André G; van Dijk, Ko Willems; Varma, Dhiraj; Visvikis-Siest, Sophie; Vitart, Veronique; Vogelzangs, Nicole; Waeber, Gérard; Wagner, Peter J; Walley, Andrew; Ward, Kim L; Watkins, Hugh; Wild, Sarah H; Willemsen, Gonneke; Witteman, 
Jaqueline C M; Yarnell, John W G; Zelenika, Diana; Zethelius, Björn; Zhai, Guangju; Zhao, Jing Hua; Zillikens, M Carola; Borecki, Ingrid B; Loos, Ruth J F; Meneton, Pierre; Magnusson, Patrik K E; Nathan, David M; Williams, Gordon H; Silander, Kaisa; Salomaa, Veikko; Smith, George Davey; Bornstein, Stefan R; Schwarz, Peter; Spranger, Joachim; Karpe, Fredrik; Shuldiner, Alan R; Cooper, Cyrus; Dedoussis, George V; Serrano-Ríos, Manuel; Lind, Lars; Palmer, Lyle J; Franks, Paul W; Ebrahim, Shah; Marmot, Michael; Kao, W H Linda; Pramstaller, Peter Paul; Wright, Alan F; Stumvoll, Michael; Hamsten, Anders; Buchanan, Thomas A; Valle, Timo T; Rotter, Jerome I; Siscovick, David S; Penninx, Brenda W J H; Boomsma, Dorret I; Deloukas, Panos; Spector, Timothy D; Ferrucci, Luigi; Cao, Antonio; Scuteri, Angelo; Schlessinger, David; Uda, Manuela; Ruokonen, Aimo; Jarvelin, Marjo-Riitta; Waterworth, Dawn M; Vollenweider, Peter; Peltonen, Leena; Mooser, Vincent; Sladek, Robert
2012-01-01
African Americans are disproportionately affected by type 2 diabetes (T2DM) yet few studies have examined T2DM using genome-wide association approaches in this ethnicity. The aim of this study was to identify genes associated with T2DM in the African American population. We performed a Genome Wide Association Study (GWAS) using the Affymetrix 6.0 array in 965 African-American cases with T2DM and end-stage renal disease (T2DM-ESRD) and 1029 population-based controls. The most significant SNPs (n = 550 independent loci) were genotyped in a replication cohort and 122 SNPs (n = 98 independent loci) were further tested through genotyping three additional validation cohorts followed by meta-analysis in all five cohorts totaling 3,132 cases and 3,317 controls. Twelve SNPs had evidence of association in the GWAS (P<0.0071), were directionally consistent in the Replication cohort and were associated with T2DM in subjects without nephropathy (P<0.05). Meta-analysis in all cases and controls revealed a single SNP reaching genome-wide significance (P<2.5×10^-8). SNP rs7560163 (P = 7.0×10^-9, OR (95% CI) = 0.75 (0.67-0.84)) is located intergenically between RND3 and RBM43. Four additional loci (rs7542900, rs4659485, rs2722769 and rs7107217) were associated with T2DM (P<0.05) and reached more nominal levels of significance (P<2.5×10^-5) in the overall analysis and may represent novel loci that contribute to T2DM. We have identified novel T2DM-susceptibility variants in the African-American population. Notably, T2DM risk was associated with the major allele and implies an interesting genetic architecture in this population. These results suggest that multiple loci underlie T2DM susceptibility in the African-American population and that these loci are distinct from those identified in other ethnic populations.
The pitfalls of platform comparison: DNA copy number array technologies assessed
2009-01-01
Background The accurate and high resolution mapping of DNA copy number aberrations has become an important tool by which to gain insight into the mechanisms of tumourigenesis. There are various commercially available platforms for such studies, but there remains no general consensus as to the optimal platform. There have been several previous platform comparison studies, but they have either described older technologies, used less-complex samples, or have not addressed the issue of the inherent biases in such comparisons. Here we describe a systematic comparison of data from four leading microarray technologies (the Affymetrix Genome-wide SNP 5.0 array, Agilent High-Density CGH Human 244A array, Illumina HumanCNV370-Duo DNA Analysis BeadChip, and the Nimblegen 385 K oligonucleotide array). We compare samples derived from primary breast tumours and their corresponding matched normals, well-established cancer cell lines, and HapMap individuals. By careful consideration and avoidance of potential sources of bias, we aim to provide a fair assessment of platform performance. Results By performing a theoretical assessment of the reproducibility, noise, and sensitivity of each platform, notable differences were revealed. Nimblegen exhibited between-replicate array variances an order of magnitude greater than the other three platforms, with Agilent slightly outperforming the others, and a comparison of self-self hybridizations revealed similar patterns. An assessment of the single probe power revealed that Agilent exhibits the highest sensitivity. Additionally, we performed an in-depth visual assessment of the ability of each platform to detect aberrations of varying sizes. As expected, all platforms were able to identify large aberrations in a robust manner. However, some focal amplifications and deletions were only detected in a subset of the platforms. 
Conclusion Although there are substantial differences in the design, density, and number of replicate probes, the comparison indicates a generally high level of concordance between platforms, despite differences in the reproducibility, noise, and sensitivity. In general, Agilent tended to be the best aCGH platform and Affymetrix, the superior SNP-CGH platform, but for specific decisions the results described herein provide a guide for platform selection and study design, and the dataset a resource for more tailored comparisons. PMID:19995423
Welderufael, B G; Løvendahl, Peter; de Koning, Dirk-Jan; Janss, Lucas L G; Fikse, W F
2018-01-01
Because mastitis is very frequent and unavoidable, adding recovery information into the analysis for genetic evaluation of mastitis is of great interest from both an economic and an animal welfare point of view. Here we have performed genome-wide association studies (GWAS) to identify associated single nucleotide polymorphisms (SNPs) and investigate the genetic background not only for susceptibility to, but also for recoverability from, mastitis. Somatic cell count records from 993 Danish Holstein cows genotyped for a total of 39378 autosomal SNP markers were used for the association analysis. Single SNP regression analysis was performed using the statistical software package DMU. The substitution effect of each SNP was tested with a t-test, and a genome-wide significance level of P-value < 10^-4 was used to declare significant SNP-trait association. A number of significant SNP variants were identified for both traits. Many of the SNP variants associated with either susceptibility to or recoverability from mastitis were located in or very near to genes that have been reported for their role in the immune system. Genes involved in lymphocyte development (e.g., MAST3 and STAB2) and genes involved in macrophage recruitment and regulation of inflammation (PDGFD and PTX3) were suggested as possible causal genes for susceptibility to and recoverability from mastitis, respectively. However, this is the first GWAS for recoverability from mastitis and our results need to be validated. The findings in the current study are, therefore, a starting point for further investigations in identifying causal genetic variants or chromosomal regions for both susceptibility to and recoverability from mastitis.
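The single-SNP regression and t-test described above can be sketched as follows. This is a minimal illustration, not the DMU model: it assumes a simple regression of the phenotype on allele count (0, 1 or 2) with no other fixed or random effects, and it uses a normal approximation to the t distribution for the p-value, which is reasonable only for large samples such as the 993 cows in the study. The function name and toy data are hypothetical.

```python
from math import copysign, sqrt
from statistics import NormalDist


def single_snp_regression(genotypes, phenotypes):
    """Regress phenotype on allele count.

    Returns (allele substitution effect, standard error, t statistic,
    approximate two-sided p-value).
    """
    n = len(genotypes)
    if n < 3 or n != len(phenotypes):
        raise ValueError("need at least three paired observations")
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, phenotypes))
    beta = sxy / sxx                                  # allele substitution effect
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2
              for x, y in zip(genotypes, phenotypes))
    se = sqrt(rss / (n - 2) / sxx)
    if se == 0:
        t = 0.0 if beta == 0 else copysign(float("inf"), beta)
    else:
        t = beta / se
    # Large-sample normal approximation to the t distribution (an assumption).
    p_value = 2 * (1 - NormalDist().cdf(abs(t)))
    return beta, se, t, p_value


# Toy data, invented for illustration.
toy_genotypes = [0, 0, 1, 1, 2, 2]
toy_phenotypes = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
beta, se, t, p_value = single_snp_regression(toy_genotypes, toy_phenotypes)
significant = p_value < 1e-4   # threshold quoted in the abstract
```

In a genome scan the same function would be applied to each of the SNP markers in turn, with the P-value < 10^-4 threshold from the abstract deciding which associations are reported.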
Genomic Prediction of Testcross Performance in Canola (Brassica napus)
Jan, Habib U.; Abbadi, Amine; Lücke, Sophie; Nichols, Richard A.; Snowdon, Rod J.
2016-01-01
Genomic selection (GS) is a modern breeding approach where genome-wide single-nucleotide polymorphism (SNP) marker profiles are simultaneously used to estimate performance of untested genotypes. In this study, the potential of genomic selection methods to predict testcross performance for hybrid canola breeding was applied for various agronomic traits based on genome-wide marker profiles. A total of 475 genetically diverse spring-type canola pollinator lines were genotyped at 24,403 single-copy, genome-wide SNP loci. In parallel, the 950 F1 testcross combinations between the pollinators and two representative testers were evaluated for a number of important agronomic traits including seedling emergence, days to flowering, lodging, oil yield and seed yield along with essential seed quality characters including seed oil content and seed glucosinolate content. A ridge-regression best linear unbiased prediction (RR-BLUP) model was applied in combination with 500 cross-validations for each trait to predict testcross performance, both across the whole population as well as within individual subpopulations or clusters, based solely on SNP profiles. Subpopulations were determined using multidimensional scaling and K-means clustering. Genomic prediction accuracy across the whole population was highest for seed oil content (0.81) followed by oil yield (0.75) and lowest for seedling emergence (0.29). For seed yield, seed glucosinolate, lodging resistance and days to onset of flowering (DTF), prediction accuracies were 0.45, 0.61, 0.39 and 0.56, respectively. Prediction accuracies could be increased for some traits by treating subpopulations separately; a strategy which only led to moderate improvements for some traits with low heritability, like seedling emergence. No useful or consistent increase in accuracy was obtained by inclusion of a population substructure covariate in the model. 
Testcross performance prediction using genome-wide SNP markers shows considerable potential for pre-selection of promising hybrid combinations prior to resource-intensive field testing over multiple locations and years. PMID:26824924
Detection of selective sweeps in cattle using genome-wide SNP data
2013-01-01
Background The domestication and subsequent selection by humans to create breeds and biological types of cattle undoubtedly altered the patterning of variation within their genomes. Strong selection to fix advantageous large-effect mutations underlying domesticability, breed characteristics or productivity created selective sweeps in which variation was lost in the chromosomal region flanking the selected allele. Selective sweeps have now been identified in the genomes of many animal species including humans, dogs, horses, and chickens. Here, we attempt to identify and characterise regions of the bovine genome that have been subjected to selective sweeps. Results Two datasets were used for the discovery and validation of selective sweeps via the fixation of alleles at a series of contiguous SNP loci. BovineSNP50 data were used to identify 28 putative sweep regions among 14 diverse cattle breeds. Affymetrix BOS 1 prescreening assay data for five breeds were used to identify 85 regions and validate 5 regions identified using the BovineSNP50 data. Many genes are located within these regions and the lack of sequence data for the analysed breeds precludes the nomination of selected genes or variants and limits the prediction of the selected phenotypes. However, phenotypes that we predict to have historically been under strong selection include horned-polled, coat colour, stature, ear morphology, and behaviour. Conclusions The bias towards common SNPs in the design of the BovineSNP50 assay led to the identification of recent selective sweeps associated with breed formation and common to only a small number of breeds rather than ancient events associated with domestication which could potentially be common to all European taurines. 
The limited SNP density, or marker resolution, of the BovineSNP50 assay significantly impacted the rate of false discovery of selective sweeps, however, we found sweeps in common between breeds which were confirmed using an ultra-high-density assay scored in a small number of animals from a subset of the breeds. No sweep regions were shared between indicine and taurine breeds reflecting their divergent selection histories and the very different environmental habitats to which these sub-species have adapted. PMID:23758707
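The idea of detecting a sweep through fixation of alleles at a series of contiguous SNP loci can be sketched as a scan for runs of monomorphic markers. The sketch assumes SNPs ordered by chromosomal position and a within-breed minor allele frequency per SNP; the minimum run length, the function name and the toy frequencies are hypothetical, since the abstract does not state the criteria the authors applied.

```python
def fixed_runs(minor_allele_freqs, min_length=5):
    """Find runs of contiguous SNPs fixed within a breed (MAF of zero).

    Returns (start_index, end_index) pairs, inclusive, for every run that
    spans at least `min_length` consecutive SNPs.
    """
    runs = []
    start = None
    for i, maf in enumerate(minor_allele_freqs):
        if maf == 0:
            if start is None:
                start = i                     # a new run of fixed SNPs begins
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last SNP on the chromosome.
    if start is not None and len(minor_allele_freqs) - start >= min_length:
        runs.append((start, len(minor_allele_freqs) - 1))
    return runs


# Toy minor allele frequencies along one chromosome, invented for illustration.
toy_mafs = [0.2, 0, 0, 0, 0.1, 0, 0, 0, 0, 0, 0.3, 0, 0]
short_runs = fixed_runs(toy_mafs, min_length=3)
long_runs = fixed_runs(toy_mafs, min_length=5)
```

With sparse markers a short run of fixed SNPs can arise by chance, which is consistent with the authors' remark that limited SNP density affected the false discovery rate; a longer minimum run or a denser assay reduces that risk.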
Novel approach for deriving genome wide SNP analysis data from archived blood spots
2012-01-01
Background The ability to transport and store DNA at room temperature in low volumes has the advantage of optimising cost, time and storage space. Blood spots on adapted filter papers are popular for this, with FTA (Flinders Technology Associates) Whatman™ technology being one of the most recent. Plant material, plasmids, viral particles, bacteria and animal blood have been stored and transported successfully using this technology. However, porcine DNA extraction from FTA Whatman™ cards, which prepares nucleic acids for downstream applications such as PCR, whole genome amplification and sequencing, is a relatively new approach, and subsequent application to single nucleotide polymorphism microarrays has hitherto been under-explored. Findings DNA was extracted from FTA Whatman™ cards (following adaptations of the manufacturer’s instructions), whole genome amplified and subsequently analysed to validate the integrity of the DNA for downstream SNP analysis. DNA was successfully extracted from 288/288 samples and amplified by WGA. Allele dropout post WGA was observed in less than 2% of samples and there was no clear evidence of amplification bias or contamination. Acceptable call rates on porcine SNP chips were also achieved using DNA extracted and amplified in this way. Conclusions DNA extracted from FTA Whatman cards is of a high enough quality and quantity following whole genomic amplification to perform meaningful SNP chip studies. PMID:22974252
Galvan, Antonella; Falvella, Felicia S; Frullanti, Elisa; Spinola, Monica; Incarbone, Matteo; Nosotti, Mario; Santambrogio, Luigi; Conti, Barbara; Pastorino, Ugo; Gonzalez-Neira, Anna; Dragani, Tommaso A
2010-03-01
We analyzed a series of young (median age = 52 years) non-smoker lung cancer patients and their unaffected siblings as controls, using a genome-wide 620 901 single-nucleotide polymorphism (SNP) array analysis and a case-control DNA pooling approach. We identified 82 putatively associated SNPs that were retested by individual genotyping followed by use of the sib transmission disequilibrium test, pointing to 36 SNPs associated with lung cancer risk in the discordant sibs series. Analysis of these 36 SNPs in a polygenic model characterized by additive and interchangeable effects of rare alleles revealed a highly statistically significant dosage-dependent association between risk allele carrier status and proportion of cancer cases. Replication of the same 36 SNPs in a population-based series confirmed the association with lung cancer for three SNPs, suggesting that phenocopies and genetic heterogeneity can play a major role in the complex genetics of lung cancer risk in the general population.
A whole genome analyses of genetic variants in two Kelantan Malay individuals.
Wan Juhari, Wan Khairunnisa; Md Tamrin, Nur Aida; Mat Daud, Mohd Hanif Ridzuan; Isa, Hatin Wan; Mohd Nasir, Nurfazreen; Maran, Sathiya; Abdul Rajab, Nur Shafawati; Ahmad Amin Noordin, Khairul Bariah; Nik Hassan, Nik Norliza; Tearle, Rick; Razali, Rozaimi; Merican, Amir Feisal; Zilfalil, Bin Alwi
2014-12-01
The sequencing of the genomes of two members of the Royal Kelantan Malay family will provide insights into Kelantan Malay whole genome sequences. The two Kelantan Malay genomes were analyzed for SNP markers associated with thalassemia and Helicobacter pylori infection. Helicobacter pylori infection has been reported to have a lower prevalence in the north-east than on the west coast of Peninsular Malaysia, and beta-thalassemia is known to be one of the most common inherited genetic disorders in Malaysia. By combining SNP information from the literature, GWAS studies and NCBI ClinVar, 18 unique SNPs were selected for further analysis. Of these 18 SNPs, 10 came from a previous study of Helicobacter pylori infection among Malay patients, 6 were from NCBI ClinVar and 2 were from GWAS studies. The analysis reveals that both Royal Kelantan Malay genomes shared all 10 SNPs identified by Maran (Single Nucleotide Polymorphisms (SNPs) genotypic profiling of Malay patients with and without Helicobacter pylori infection in Kelantan, 2011) and one SNP from a GWAS study. In addition, the analysis reveals that both Royal Kelantan Malay genomes shared 3 SNP markers, HBG1 (rs1061234), HBB (rs1609812) and BCL11A (rs766432), all three of which are associated with beta-thalassemia. Our findings suggest that the Royal Kelantan Malays carry SNPs that are associated with protection against Helicobacter pylori infection. In addition, they carry SNPs that are associated with beta-thalassemia. These findings are in line with those of other researchers who conducted studies on thalassemia and Helicobacter pylori infection in the non-royal Malay population.
2012-01-01
Background Significant quantitative trait loci (QTL) for carcass weight were previously mapped on several chromosomes in Japanese Black half-sib families. Two QTL, CW-1 and CW-2, were narrowed down to 1.1-Mb and 591-kb regions, respectively. Recent advances in genomic tools allowed us to perform a genome-wide association study (GWAS) in cattle to detect associations in a general population and estimate their effect size. Here, we performed a GWAS for carcass weight using 1156 Japanese Black steers. Results Bonferroni-corrected genome-wide significant associations were detected in three chromosomal regions on bovine chromosomes (BTA) 6, 8, and 14. The associated single nucleotide polymorphisms (SNP) on BTA 6 were in linkage disequilibrium with the SNP encoding NCAPG Ile442Met, which was previously identified as a candidate quantitative trait nucleotide for CW-2. In contrast, the most highly associated SNP on BTA 14 was located 2.3-Mb centromeric from the previously identified CW-1 region. Linkage disequilibrium mapping led to a revision of the CW-1 region within a 0.9-Mb interval around the associated SNP, and targeted resequencing followed by association analysis highlighted the quantitative trait nucleotides for bovine stature in the PLAG1-CHCHD7 intergenic region. The association on BTA 8 was accounted for by two SNP on the BovineSNP50 BeadChip and corresponded to CW-3, which was simultaneously detected by linkage analyses using half-sib families. The allele substitution effects of CW-1, CW-2, and CW-3 were 28.4, 35.3, and 35.0 kg per allele, respectively. Conclusion The GWAS revealed the genetic architecture underlying carcass weight variation in Japanese Black cattle in which three major QTL accounted for approximately one-third of the genetic variance. PMID:22607022
2013-01-01
Background Field pea (Pisum sativum L.) is a self-pollinating, diploid, cool-season food legume. Crop production is constrained by multiple biotic and abiotic stress factors, including salinity, that cause reduced growth and yield. Recent advances in genomics have permitted the development of low-cost high-throughput genotyping systems, allowing the construction of saturated genetic linkage maps for identification of quantitative trait loci (QTLs) associated with traits of interest. Genetic markers in close linkage with the relevant genomic regions may then be implemented in varietal improvement programs. Results In this study, single nucleotide polymorphism (SNP) markers associated with expressed sequence tags (ESTs) were developed and used to generate comprehensive linkage maps for field pea. From a set of 36,188 variant nucleotide positions detected through in silico analysis, 768 were selected for genotyping of a recombinant inbred line (RIL) population. A total of 705 SNPs (91.7%) successfully detected segregating polymorphisms. In addition to SNPs, genomic and EST-derived simple sequence repeats (SSRs) were assigned to the genetic map in order to obtain an evenly distributed genome-wide coverage. Sequences associated with the mapped molecular markers were used for comparative genomic analysis with other legume species. Higher levels of conserved synteny were observed with the genomes of Medicago truncatula Gaertn. and chickpea (Cicer arietinum L.) than with soybean (Glycine max [L.] Merr.), Lotus japonicus L. and pigeon pea (Cajanus cajan [L.] Millsp.). Parents and RIL progeny were screened at the seedling growth stage for responses to salinity stress, imposed by addition of NaCl in the watering solution at a concentration of 18 dS m⁻¹. Salinity-induced symptoms showed normal distribution, and the severity of the symptoms increased over time.
QTLs for salinity tolerance were identified on linkage groups Ps III and VII, with flanking SNP markers suitable for selection of resistant cultivars. Comparison of sequences underpinning these SNP markers to the M. truncatula genome defined genomic regions containing candidate genes associated with saline stress tolerance. Conclusion The SNP assays and associated genetic linkage maps developed in this study permitted identification of salinity tolerance QTLs and candidate genes. This constitutes an important set of tools for marker-assisted selection (MAS) programs aimed at performance enhancement of field pea cultivars. PMID:24134188
Leonforte, Antonio; Sudheesh, Shimna; Cogan, Noel O I; Salisbury, Philip A; Nicolas, Marc E; Materne, Michael; Forster, John W; Kaur, Sukhjiwan
2013-10-17
Field pea (Pisum sativum L.) is a self-pollinating, diploid, cool-season food legume. Crop production is constrained by multiple biotic and abiotic stress factors, including salinity, that cause reduced growth and yield. Recent advances in genomics have permitted the development of low-cost high-throughput genotyping systems, allowing the construction of saturated genetic linkage maps for identification of quantitative trait loci (QTLs) associated with traits of interest. Genetic markers in close linkage with the relevant genomic regions may then be implemented in varietal improvement programs. In this study, single nucleotide polymorphism (SNP) markers associated with expressed sequence tags (ESTs) were developed and used to generate comprehensive linkage maps for field pea. From a set of 36,188 variant nucleotide positions detected through in silico analysis, 768 were selected for genotyping of a recombinant inbred line (RIL) population. A total of 705 SNPs (91.7%) successfully detected segregating polymorphisms. In addition to SNPs, genomic and EST-derived simple sequence repeats (SSRs) were assigned to the genetic map in order to obtain an evenly distributed genome-wide coverage. Sequences associated with the mapped molecular markers were used for comparative genomic analysis with other legume species. Higher levels of conserved synteny were observed with the genomes of Medicago truncatula Gaertn. and chickpea (Cicer arietinum L.) than with soybean (Glycine max [L.] Merr.), Lotus japonicus L. and pigeon pea (Cajanus cajan [L.] Millsp.). Parents and RIL progeny were screened at the seedling growth stage for responses to salinity stress, imposed by addition of NaCl in the watering solution at a concentration of 18 dS m⁻¹. Salinity-induced symptoms showed normal distribution, and the severity of the symptoms increased over time.
QTLs for salinity tolerance were identified on linkage groups Ps III and VII, with flanking SNP markers suitable for selection of resistant cultivars. Comparison of sequences underpinning these SNP markers to the M. truncatula genome defined genomic regions containing candidate genes associated with saline stress tolerance. The SNP assays and associated genetic linkage maps developed in this study permitted identification of salinity tolerance QTLs and candidate genes. This constitutes an important set of tools for marker-assisted selection (MAS) programs aimed at performance enhancement of field pea cultivars.
Nakajima, Ayaka; Kawaguchi, Fuki; Uemoto, Yoshinobu; Fukushima, Moriyuki; Yoshida, Emi; Iwamoto, Eiji; Akiyama, Takayuki; Kohama, Namiko; Kobayashi, Eiji; Honda, Takeshi; Oyama, Kenji; Mannen, Hideyuki; Sasazaki, Shinji
2018-05-01
The objective of this study was to identify genomic regions associated with fat-related traits using a Japanese Black cattle population in Hyogo. From 1836 animals, those with high or low values were selected on the basis of corrected phenotype and then pooled into high and low groups (n = 100 each), respectively. DNA pool-based genome-wide association study (GWAS) was performed using Illumina BovineSNP50 BeadChip v2 with three replicate assays for each pooled sample. GWAS detected that two single nucleotide polymorphisms (SNPs) on BTA7 (ARS-BFGL-NGS-35463 and Hapmap23838-BTA-163815) and one SNP on BTA12 (ARS-BFGL-NGS-2915) significantly affected fat percentage (FAR). The significance of ARS-BFGL-NGS-35463 on BTA7 was confirmed by individual genotyping in all pooled samples. Moreover, association analysis between SNP and FAR in 803 Japanese Black cattle revealed a significant effect of SNP on FAR. Thus, further investigation of these regions is required to identify FAR-associated genes and mutations, which can lead to the development of DNA markers for marker-assisted selection for the genetic improvement of beef quality. © 2018 Japanese Society of Animal Science.
Dalman, Kerstin; Himmelstrand, Kajsa; Olson, Åke; Lind, Mårten; Brandström-Durling, Mikael; Stenlid, Jan
2013-01-01
The dense single nucleotide polymorphism (SNP) panels needed for genome wide association (GWA) studies have hitherto been expensive to establish and use on non-model organisms. To overcome this, we used a next generation sequencing approach both to establish SNPs and to determine genotypes. We conducted a GWA study on a fungal species, analysing the virulence of Heterobasidion annosum s.s., a necrotrophic pathogen, on its hosts Picea abies and Pinus sylvestris. From a set of 33,018 single nucleotide polymorphisms (SNPs) in 23 haploid isolates, twelve SNP markers distributed on seven contigs were associated with virulence (P<0.0001). Four of the contigs harbour known virulence genes from other fungal pathogens and the remaining three harbour novel candidate genes. Two contigs link closely to virulence regions recognized previously by QTL mapping in the congeneric hybrid H. irregulare × H. occidentale. Our study demonstrates the efficiency of GWA studies for dissecting important complex traits of small populations of non-model haploid organisms with small genomes. PMID:23341945
USDA-ARS's Scientific Manuscript database
SNP effects estimated in genomic selection programs allow for the prediction of direct genomic values (DGV) both at genome-wide and chromosomal level. As a consequence, genome-wide (G_GW) or chromosomal (G_CHR) correlation matrices between genomic predictions for different traits can be calculated. ...
Association Analysis of the Ephrin-B2 Gene in African-Americans with End-Stage Renal Disease
Hicks, Pamela J.; Staten, Jennifer L.; Palmer, Nicholette D.; Langefeld, Carl D.; Ziegler, Julie T.; Keene, Keith L.; Sale, Michele M.; Bowden, Donald W.; Freedman, Barry I.
2008-01-01
Background Genome scans in African-Americans with end-stage renal disease (ESRD) identified linkage on chromosome 13q33 in the region containing the ephrin-B2 ligand (EFNB2) genes. Interactions between the ephrin-B2 receptor and ephrin-B2 ligand play essential roles in renal angiogenesis, blood vessel maturation, and kidney disease. Methods The EFNB2 gene was evaluated as a positional candidate for non-diabetic and diabetic ESRD susceptibility in 1,071 unrelated African-American subjects; 316 with non-diabetic etiologies of ESRD, 394 with type 2 diabetes-associated ESRD and 361 healthy controls. Single nucleotide polymorphism (SNP) genotyping was performed on the Sequenom Mass Array System. Statistical analyses were computed using Dandelion version 1.26, Snpaddmix version 1.4 and Haploview version 3.32. Results Twenty-eight HapMap tag SNPs were genotyped spanning the 39 kilobases (kb) of the EFNB2 coding region, with average spacing of 1.43 kb. Analysis of 710 ESRD patient samples and 361 controls provided no evidence of single SNP associations in either diabetic or non-diabetic ESRD; although nominal evidence of association with all-cause ESRD was observed with a two SNP (p = 0.022) and three SNP (p = 0.023) haplotype, both containing SNPs rs7490924 and rs2391335 in intron 1. Conclusions Although an attractive positional candidate gene, polymorphisms in the EFNB2 gene do not appear to contribute in a substantial way to non-diabetic, diabetic or all-cause ESRD susceptibility in African-Americans. Additional genes within the chromosome 13q33 linkage interval are likely contributors to African-American non-diabetic ESRD. PMID:18580054
Scherrer, Daniel Zanetti; Zago, Vanessa Helena de Souza; Vieira, Isabela Calanca; Parra, Eliane Soler; Panzoldo, Natália Baratella; Alexandre, Fernanda; Secolin, Rodrigo; Baracat, Jamal; Quintão, Eder Carlos Rocha; de Faria, Eliana Cotta
2015-01-01
Background Evidence suggests that paraoxonase 1 (PON1) confers important antioxidant and anti-inflammatory properties when associated with high-density lipoprotein (HDL). Objective To investigate the relationships between the p.Q192R SNP of PON1, biochemical parameters and carotid atherosclerosis in an asymptomatic, normolipidemic Brazilian population sample. Methods We studied 584 volunteers (females n = 326, males n = 258; 19-75 years of age). Total genomic DNA was extracted and the SNP was detected in the TaqMan® SNP OpenArray® genotyping platform (Applied Biosystems, Foster City, CA). Plasma lipoproteins and apolipoproteins were determined and PON1 activity was measured using paraoxon as a substrate. High-resolution β-mode ultrasonography was used to measure cIMT and the presence of carotid atherosclerotic plaques in a subgroup of individuals (n = 317). Results The presence of p.192Q was associated with a significant increase in PON1 activity (RR = 12.30 (11.38); RQ = 46.96 (22.35); QQ = 85.35 (24.83) μmol/min; p < 0.0001), HDL-C (RR = 45 (37); RQ = 62 (39); QQ = 69 (29) mg/dL; p < 0.001) and apo A-I (RR = 140.76 ± 36.39; RQ = 147.62 ± 36.92; QQ = 147.49 ± 36.65 mg/dL; p = 0.019). Stepwise regression analysis revealed that heterozygous and p.192Q carrier status influenced PON1 activity towards paraoxon by 58%. The univariate linear regression analysis demonstrated that the p.Q192R SNP was not associated with mean cIMT; as a result, in the multiple regression analysis, no variables were selected with 5% significance. In logistic regression analysis, the studied parameters were not associated with the presence of carotid plaques. Conclusion In low-risk individuals, the presence of the p.192Q variant of PON1 is associated with a beneficial plasma lipid profile but not with carotid atherosclerosis. PMID:26039660
Yoshikawa, Munemitsu; Yamashiro, Kenji; Miyake, Masahiro; Oishi, Maho; Akagi-Kurashige, Yumiko; Kumagai, Kyoko; Nakata, Isao; Nakanishi, Hideo; Oishi, Akio; Gotoh, Norimoto; Yamada, Ryo; Matsuda, Fumihiko; Yoshimura, Nagahisa
2014-10-21
We investigated the association between refractive error in a Japanese population and myopia-related genes identified in two recent large-scale genome-wide association studies. Single-nucleotide polymorphisms (SNPs) in 51 genes that were reported by the Consortium for Refractive Error and Myopia and/or the 23andMe database were genotyped in 3712 healthy Japanese volunteers from the Nagahama Study using HumanHap610K Quad, HumanOmni2.5M, and/or HumanExome Arrays. To evaluate the association between refractive error and recently identified myopia-related genes, we used three approaches to perform quantitative trait locus analyses of mean refractive error in both eyes of the participants: per-SNP, gene-based top-SNP, and gene-based all-SNP analyses. Association plots of successfully replicated genes also were investigated. In our per-SNP analysis, eight myopia gene associations were replicated successfully: GJD2, RASGRF1, BICC1, KCNQ5, CD55, CYP26A1, LRRC4C, and B4GALNT2. Seven additional gene associations were replicated in our gene-based analyses: GRIA4, BMP2, QKI, BMP4, SFRP1, SH3GL2, and EHBP1L1. The signal strength of the reported SNPs and their tagging SNPs increased after considering different linkage disequilibrium patterns across ethnicities. Although two previous studies suggested strong associations between PRSS56, LAMA2, TOX, and RDH5 and myopia, we could not replicate these results. Our results confirmed the significance of the myopia-related genes reported previously and suggested that gene-based replication analyses are more effective than per-SNP analyses. Our comparison with two previous studies suggested that BMP3 SNPs cause myopia primarily in Caucasian populations, while they may exhibit protective effects in Asian populations. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.
Scherrer, Daniel Zanetti; Zago, Vanessa Helena de Souza; Vieira, Isabela Calanca; Parra, Eliane Soler; Panzoldo, Natália Baratella; Alexandre, Fernanda; Secolin, Rodrigo; Baracat, Jamal; Quintão, Eder Carlos Rocha; Faria, Eliana Cotta de
2015-07-01
Evidence suggests that paraoxonase 1 (PON1) confers important antioxidant and anti-inflammatory properties when associated with high-density lipoprotein (HDL). This study investigated the relationships between the p.Q192R SNP of PON1, biochemical parameters and carotid atherosclerosis in an asymptomatic, normolipidemic Brazilian population sample. We studied 584 volunteers (females n = 326, males n = 258; 19-75 years of age). Total genomic DNA was extracted and the SNP was detected in the TaqMan® SNP OpenArray® genotyping platform (Applied Biosystems, Foster City, CA). Plasma lipoproteins and apolipoproteins were determined and PON1 activity was measured using paraoxon as a substrate. High-resolution β-mode ultrasonography was used to measure cIMT and the presence of carotid atherosclerotic plaques in a subgroup of individuals (n = 317). The presence of p.192Q was associated with a significant increase in PON1 activity (RR = 12.30 (11.38); RQ = 46.96 (22.35); QQ = 85.35 (24.83) μmol/min; p < 0.0001), HDL-C (RR = 45 (37); RQ = 62 (39); QQ = 69 (29) mg/dL; p < 0.001) and apo A-I (RR = 140.76 ± 36.39; RQ = 147.62 ± 36.92; QQ = 147.49 ± 36.65 mg/dL; p = 0.019). Stepwise regression analysis revealed that heterozygous and p.192Q carrier status influenced PON1 activity towards paraoxon by 58%. The univariate linear regression analysis demonstrated that the p.Q192R SNP was not associated with mean cIMT; as a result, in the multiple regression analysis, no variables were selected with 5% significance. In logistic regression analysis, the studied parameters were not associated with the presence of carotid plaques. In low-risk individuals, the presence of the p.192Q variant of PON1 is associated with a beneficial plasma lipid profile but not with carotid atherosclerosis.
Fu, Yong-Bi; Peterson, Gregory W; Dong, Yibo
2016-04-07
Genotyping-by-sequencing (GBS) has emerged as a useful genomic approach for exploring genome-wide genetic variation. However, GBS commonly samples a genome unevenly and can generate a substantial amount of missing data. These technical features would limit the power of various GBS-based genetic and genomic analyses. Here we present software called IgCoverage for in silico evaluation of genomic coverage through GBS with an individual restriction enzyme or a pair of restriction enzymes on one sequenced genome, and report a new set of 21 restriction enzyme combinations that can be applied to enhance GBS applications. These enzyme combinations were developed through an application of IgCoverage on 22 plant, animal, and fungus species with sequenced genomes, and some of them were empirically evaluated with different runs of Illumina MiSeq sequencing in 12 plant species. The in silico analysis of 22 organisms revealed up to eight times more genome coverage for the new combinations, which consist of paired four- or five-cutter restriction enzymes, than for the commonly used enzyme combination PstI + MspI. The empirical evaluation of the new enzyme combination (HinfI + HpyCH4IV) in 12 plant species showed 1.7-6 times more genome coverage than PstI + MspI, and 2.3 times more genome coverage in dicots than monocots. Also, the SNP genotyping in 12 Arabidopsis and 12 rice plants revealed that HinfI + HpyCH4IV generated 7 and 1.3 times more SNPs (with 0-16.7% missing observations) than PstI + MspI, respectively. These findings demonstrate that these novel enzyme combinations can be utilized to increase genome sampling and improve SNP genotyping in various GBS applications. Copyright © 2016 Fu et al.
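The idea of evaluating a two-enzyme digest in silico can be illustrated with a minimal Python sketch. This is not IgCoverage and does not reproduce its algorithm; the function names, the 100-400 bp size window, and the simplification of treating the start of each recognition site as the cut position are all assumptions made for illustration. Only the recognition sequences of the four enzymes named in the abstract are taken as given.

```python
import re

# Recognition sequences (5'->3') for the enzymes named in the abstract; N = any base.
SITES = {
    "PstI": "CTGCAG",
    "MspI": "CCGG",
    "HinfI": "GANTC",
    "HpyCH4IV": "ACGT",
}

def cut_positions(seq, site):
    """Return the start index of every (possibly overlapping) site occurrence.

    Assumption: the site start stands in for the true cut position.
    """
    pattern = "(?=" + site.replace("N", "[ACGT]") + ")"
    return [m.start() for m in re.finditer(pattern, seq.upper())]

def digest_coverage(seq, enzyme_a, enzyme_b, min_len=100, max_len=400):
    """Fraction of the sequence lying in fragments flanked by one site of each
    enzyme and falling inside a hypothetical size-selection window."""
    if not seq:
        return 0.0
    cuts = sorted(
        [(p, enzyme_a) for p in cut_positions(seq, SITES[enzyme_a])]
        + [(p, enzyme_b) for p in cut_positions(seq, SITES[enzyme_b])]
    )
    covered = 0
    # Walk adjacent cut sites; keep fragments whose two ends come from different enzymes.
    for (p1, e1), (p2, e2) in zip(cuts, cuts[1:]):
        if e1 != e2 and min_len <= p2 - p1 <= max_len:
            covered += p2 - p1
    return covered / len(seq)

if __name__ == "__main__":
    # Toy sequence (made up): one PstI site and one MspI site 156 bp apart.
    toy = "A" * 10 + "CTGCAG" + "T" * 150 + "CCGG" + "A" * 10
    print(digest_coverage(toy, "PstI", "MspI"))
    print(digest_coverage(toy, "HinfI", "HpyCH4IV"))
```

Running the same function with different enzyme pairs on one reference sequence gives a rough way to compare how much of the genome each pair would sample, which is the kind of comparison the abstract reports.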
Irvin, Marguerite R; Sitlani, Colleen M; Noordam, Raymond; Avery, Christie L; Bis, Joshua C; Floyd, James S; Li, Jin; Limdi, Nita A; Srinivasasainagendra, Vinodh; Stewart, James; de Mutsert, Renée; Mook-Kanamori, Dennis O; Lipovich, Leonard; Kleinbrink, Erica L; Smith, Albert; Bartz, Traci M; Whitsel, Eric A; Uitterlinden, Andre G; Wiggins, Kerri L; Wilson, James G; Zhi, Degui; Stricker, Bruno H; Rotter, Jerome I; Arnett, Donna K; Psaty, Bruce M; Lange, Leslie A
2018-06-01
We evaluated interactions of SNP-by-ACE-I/ARB and SNP-by-TD on serum potassium (K+) among users of antihypertensive treatments (anti-HTN). Our study included seven European-ancestry (EA) (N = 4835) and four African-ancestry (AA) cohorts (N = 2016). We performed race-stratified, fixed-effect, inverse-variance-weighted meta-analyses of 2.5 million SNP-by-drug interaction estimates; race-combined meta-analysis; and trans-ethnic fine-mapping. Among EAs, we identified 11 significant SNPs (P < 5 × 10⁻⁸) for SNP-ACE-I/ARB interactions on serum K+ that were located between NR2F1-AS1 and ARRDC3-AS1 on chromosome 5 (top SNP rs6878413 P = 1.7 × 10⁻⁸; ratio of serum K+ in ACE-I/ARB exposed compared to unexposed is 1.0476, 1.0280, 1.0088 for the TT, AT, and AA genotypes, respectively). Trans-ethnic fine mapping identified the same group of SNPs on chromosome 5 as genome-wide significant for the ACE-I/ARB analysis. In conclusion, SNP-by-ACE-I/ARB interaction analyses uncovered loci that, if replicated, could have future implications for the prevention of arrhythmias due to anti-HTN treatment-related hyperkalemia. Before these loci can be identified as clinically relevant, future validation studies of equal or greater size in comparison to our discovery effort are needed.
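A fixed-effect, inverse-variance-weighted meta-analysis of the kind named in this abstract combines per-cohort estimates by weighting each one by the reciprocal of its squared standard error. The Python sketch below shows the standard textbook calculation for a single SNP; it is not the authors' pipeline, the function name is hypothetical, and the cohort estimates in the usage example are made up.

```python
import math

def fixed_effect_meta(betas, ses):
    """Combine per-cohort effect estimates with inverse-variance weights.

    Returns the pooled estimate, its standard error, the z statistic,
    and a two-sided p-value from the normal distribution.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / se ** 2 for se in ses]          # w_i = 1 / SE_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)                      # SE of the pooled estimate
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided normal p-value
    return beta, se, z, p

if __name__ == "__main__":
    # Hypothetical interaction estimates from three cohorts (not from the study).
    cohort_betas = [0.021, 0.015, 0.030]
    cohort_ses = [0.008, 0.010, 0.012]
    beta, se, z, p = fixed_effect_meta(cohort_betas, cohort_ses)
    print(f"pooled beta = {beta:.4f}, SE = {se:.4f}, z = {z:.2f}, p = {p:.2e}")
```

In a genome-wide setting this calculation is repeated for every SNP, and the resulting p-values are compared against the genome-wide threshold quoted in the abstract.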
Genome-wide association studies for diabetic macular edema and proliferative diabetic retinopathy.
Graham, Patricia S; Kaidonis, Georgia; Abhary, Sotoodeh; Gillies, Mark C; Daniell, Mark; Essex, Rohan W; Chang, John H; Lake, Stewart R; Pal, Bishwanath; Jenkins, Alicia J; Hewitt, Alex W; Lamoureux, Ecosse L; Hykin, Philip G; Petrovsky, Nikolai; Brown, Matthew A; Craig, Jamie E; Burdon, Kathryn P
2018-05-08
Diabetic macular edema (DME) and proliferative diabetic retinopathy (PDR) are sight-threatening complications of diabetes mellitus and leading causes of adult-onset blindness worldwide. Genetic risk factors for diabetic retinopathy (DR) have been described previously, but have been difficult to replicate between studies, which have often used composite phenotypes and been conducted in different populations. This study aims to identify genetic risk factors for DME and PDR as separate complications in Australians of European descent with type 2 diabetes. Caucasian Australians with type 2 diabetes were evaluated in a genome-wide association study (GWAS) to compare 270 DME cases and 176 PDR cases with 435 non-retinopathy controls. All participants were genotyped by SNP array and after data cleaning, cases were compared to controls using logistic regression adjusting for relevant covariates. The top-ranked SNP for DME was rs1990145 (p = 4.10 × 10⁻⁶, OR = 2.02, 95% CI [1.50, 2.72]) on chromosome 2. The top-ranked SNP for PDR was rs918519 (p = 3.87 × 10⁻⁶, OR = 0.35, 95% CI [0.22, 0.54]) on chromosome 5. A trend towards association was also detected at two SNPs reported in the only other reported GWAS of DR in Caucasians; rs12267418 near MALRD1 (p = 0.008) in the DME cohort and rs16999051 in the diabetes gene PCSK2 (p = 0.007) in the PDR cohort. This study has identified loci of interest for DME and PDR, two common ocular complications of diabetes. These findings require replication in other Caucasian cohorts with type 2 diabetes and larger cohorts will be required to identify genetic loci with statistical confidence. There is considerable overlap in the patient cohorts with each retinopathy subtype, complicating the search for genes that contribute to PDR and DME biology.
Rose, Amy E.; Poliseno, Laura; Wang, Jinhua; Clark, Michael; Pearlman, Alexander; Wang, Guimin; Vega y Saenz de Miera, Eleazar C.; Medicherla, Ratna; Christos, Paul J.; Shapiro, Richard; Pavlick, Anna; Darvishian, Farbod; Zavadil, Jiri; Polsky, David; Hernando, Eva; Ostrer, Harry; Osman, Iman
2011-01-01
Superficial spreading melanoma (SSM) and nodular melanoma (NM) are believed to represent sequential phases of linear progression from radial to vertical growth. Several lines of clinical, pathological and epidemiologic evidence suggest, however, that SSM and NM might be the result of independent pathways of tumor development. We utilized an integrative genomic approach that combines single nucleotide polymorphism array (SNP 6.0, Affymetrix) with gene expression array (U133A 2.0, Affymetrix) to examine molecular differences between SSM and NM. Pathway analysis of the most differentially expressed genes between SSM and NM (N=114) revealed significant differences related to metabolic processes. We identified 8 genes (DIS3, FGFR1OP, G3BP2, GALNT7, MTAP, SEC23IP, USO1, ZNF668) in which NM/SSM-specific copy number alterations correlated with differential gene expression (P<0.05, Spearman’s rank). SSM-specific genomic deletions in G3BP2, MTAP, and SEC23IP were independently verified in two external data sets. Forced overexpression of metabolism-related gene methylthioadenosine phosphorylase (MTAP) in SSM resulted in reduced cell growth. The differential expression of another metabolic related gene, aldehyde dehydrogenase 7A1 (ALDH7A1), was validated at the protein level using tissue microarrays of human melanoma. In addition, we show that the decreased ALDH7A1 expression in SSM may be the result of epigenetic modifications. Our data reveal recurrent genomic deletions in SSM not present in NM, which challenge the linear model of melanoma progression. Furthermore, our data suggest a role for altered regulation of metabolism-related genes as a possible cause of the different clinical behavior of SSM and NM. PMID:21343389
Gene expression levels as endophenotypes in genome-wide association studies of Alzheimer disease
Zou, F.; Carrasquillo, M. M.; Pankratz, V. S.; Belbin, O.; Morgan, K.; Allen, M.; Wilcox, S. L.; Ma, L.; Walker, L. P.; Kouri, N.; Burgess, J. D.; Younkin, L. H.; Younkin, Samuel G.; Younkin, C. S.; Bisceglio, G. D.; Crook, J. E.; Dickson, D. W.; Petersen, R. C.; Graff-Radford, N.; Younkin, Steven G.; Ertekin-Taner, N.
2010-01-01
Background: Late-onset Alzheimer disease (LOAD) is a common disorder with a substantial genetic component. We postulate that many disease susceptibility variants act by altering gene expression levels. Methods: We measured messenger RNA (mRNA) expression levels of 12 LOAD candidate genes in the cerebella of 200 subjects with LOAD. Using the genotypes from our LOAD genome-wide association study for the cis-single nucleotide polymorphisms (SNPs) (n = 619) of these 12 LOAD candidate genes, we tested for associations with expression levels as endophenotypes. The strongest expression cis-SNP was tested for AD association in 7 independent case-control series (2,280 AD and 2,396 controls). Results: We identified 3 SNPs that associated significantly with IDE (insulin degrading enzyme) expression levels. A single copy of the minor allele for each significant SNP was associated with ∼twofold higher IDE expression levels. The most significant SNP, rs7910977, is 4.2 kb beyond the 3′ end of IDE. The association observed with this SNP was significant even at the genome-wide level (p = 2.7 × 10⁻⁸). Furthermore, the minor allele of rs7910977 associated significantly (p = 0.0046) with reduced LOAD risk (OR = 0.81 with a 95% CI of 0.70-0.94), as expected biologically from its association with elevated IDE expression. Conclusions: These results provide strong evidence that IDE is a late-onset Alzheimer disease (LOAD) gene with variants that modify risk of LOAD by influencing IDE expression. They also suggest that the use of expression levels as endophenotypes in genome-wide association studies may provide a powerful approach for the identification of disease susceptibility alleles. GLOSSARY AD = Alzheimer disease; CI = confidence interval; GWAS = genome-wide association study; LOAD = late-onset Alzheimer disease; mRNA = messenger RNA; OR = odds ratio; SNP = single nucleotide polymorphism. PMID:20142614
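The odds ratio, confidence interval and p-value reported for rs7910977 can be related to one another with a short calculation on the log-odds scale. The Python sketch below assumes a Wald-type interval that is symmetric in log(OR) and a normal test statistic; the abstract does not state how the authors derived these figures, so this is a consistency check under those assumptions, and the function name is hypothetical.

```python
import math

def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate the standard error, z statistic and two-sided p-value
    from an odds ratio and its 95% confidence interval.

    Assumes the interval is log(OR) +/- z_crit * SE (Wald interval).
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return se, z, p

if __name__ == "__main__":
    # Values quoted in the abstract: OR = 0.81, 95% CI 0.70-0.94.
    se, z, p = p_from_or_ci(0.81, 0.70, 0.94)
    print(f"SE(log OR) = {se:.4f}, z = {z:.2f}, p = {p:.4f}")
```

Under these assumptions the recovered p-value is about 0.005, close to the reported p = 0.0046. A small gap is expected because the odds ratio and interval limits are rounded to two decimals.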
NIH CIDR Program Studies For whole exome sequencing projects, we pretest all samples using a high-density SNP array (>200,000 markers). For custom targeted sequencing, we pretest all samples using a 96 SNP GoldenGate assay. This extensive pretesting allows us to unambiguously tie
Sequence-Based Genotyping for Marker Discovery and Co-Dominant Scoring in Germplasm and Populations
Truong, Hoa T.; Ramos, A. Marcos; Yalcin, Feyruz; de Ruiter, Marjo; van der Poel, Hein J. A.; Huvenaars, Koen H. J.; Hogers, René C. J.; van Enckevort, Leonora. J. G.; Janssen, Antoine; van Orsouw, Nathalie J.; van Eijk, Michiel J. T.
2012-01-01
Conventional marker-based genotyping platforms are widely available, but not without their limitations. In this context, we developed Sequence-Based Genotyping (SBG), a technology for simultaneous marker discovery and co-dominant scoring, using next-generation sequencing. SBG offers users several advantages including a generic sample preparation method, a highly robust genome complexity reduction strategy to facilitate de novo marker discovery across entire genomes, and a uniform bioinformatics workflow strategy to achieve genotyping goals tailored to individual species, regardless of the availability of a reference sequence. The most distinguishing features of this technology are the ability to genotype any population structure, regardless whether parental data is included, and the ability to co-dominantly score SNP markers segregating in populations. To demonstrate the capabilities of SBG, we performed marker discovery and genotyping in Arabidopsis thaliana and lettuce, two plant species of diverse genetic complexity and backgrounds. Initially we obtained 1,409 SNPs for arabidopsis, and 5,583 SNPs for lettuce. Further filtering of the SNP dataset produced over 1,000 high quality SNP markers for each species. We obtained a genotyping rate of 201.2 genotypes/SNP and 58.3 genotypes/SNP for arabidopsis (n = 222 samples) and lettuce (n = 87 samples), respectively. Linkage mapping using these SNPs resulted in stable map configurations. We have therefore shown that the SBG approach presented provides users with the utmost flexibility in garnering high quality markers that can be directly used for genotyping and downstream applications. Until advances and costs will allow for routine whole-genome sequencing of populations, we expect that sequence-based genotyping technologies such as SBG will be essential for genotyping of model and non-model genomes alike. PMID:22662172
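The genotyping rates above are given as mean genotypes per SNP. Dividing by the number of samples turns them into a per-SNP call rate, which is easier to compare across the two species. The short sketch below does that arithmetic; it assumes that "genotypes/SNP" means the average number of samples successfully scored per marker, and the helper name is hypothetical.

```python
def call_rate(mean_genotypes_per_snp, n_samples):
    """Fraction of samples scored per SNP, on average."""
    return mean_genotypes_per_snp / n_samples

# Values taken from the abstract: (species, genotypes/SNP, number of samples)
datasets = [("arabidopsis", 201.2, 222), ("lettuce", 58.3, 87)]

for species, genotypes, n in datasets:
    print(f"{species}: {call_rate(genotypes, n):.1%} of samples scored per SNP")
```

Under that reading, roughly 91% of arabidopsis samples and 67% of lettuce samples were scored per marker.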
SiNoPsis: Single Nucleotide Polymorphisms selection and promoter profiling.
Boloc, Daniel; Rodríguez, Natalia; Gassó, Patricia; Abril, Josep F; Bernardo, Miquel; Lafuente, Amalia; Mas, Sergi
2017-09-14
The selection of a Single Nucleotide Polymorphism (SNP) using bibliographic methods can be a very time-consuming task. Moreover, a SNP selected in this way may not be easily visualized in its genomic context by a standard user hoping to correlate it with other valuable information. Here we propose a web form built on top of Circos that can assist SNP-centred screening, based on their location in the genome and the regulatory modules they can disrupt. Its use may allow researchers to prioritize SNPs in genotyping and disease studies. SiNoPsis is bundled as a web portal. It focuses on the different structures involved in the genomic expression of a gene, especially those found in the core promoter upstream region. These structures include transcription factor binding sites (for promoter and enhancer signals), histones, and promoter flanking regions. Additionally, the tool provides eQTL and linkage disequilibrium (LD) properties for a given SNP query, yielding further clues about other indirectly associated SNPs. Possible disruptions of the aforementioned structures affecting gene transcription are reported using multiple resource databases. SiNoPsis has a simple user-friendly interface, which allows single queries by gene symbol, genomic coordinates, Ensembl gene identifiers, RefSeq transcript identifiers and SNPs. It is the only portal providing useful SNP selection based on regulatory modules and LD with functional variants in both textual and graphic modes (by properly defining the arguments and parameters needed to run Circos). SiNoPsis is freely available at https://compgen.bio.ub.edu/SiNoPsis/. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Pyne, Robert; Honig, Josh; Vaiciunas, Jennifer; Koroch, Adolfina; Wyenandt, Christian; Bonos, Stacy; Simon, James
2017-01-01
Limited understanding of sweet basil (Ocimum basilicum L.) genetics and genome structure has reduced efficiency of breeding strategies. This is evidenced by the rapid, worldwide dissemination of basil downy mildew (Peronospora belbahrii) in the absence of resistant cultivars. In an effort to improve available genetic resources, expressed sequence tag simple sequence repeat (EST-SSR) and single nucleotide polymorphism (SNP) markers were developed and used to genotype the MRI x SB22 F2 mapping population, which segregates for response to downy mildew. SNP markers were generated from genomic sequences derived from double digestion restriction site associated DNA sequencing (ddRADseq). Disomic segregation was observed in both SNP and EST-SSR markers providing evidence of an O. basilicum allotetraploid genome structure and allowing for subsequent analysis of the mapping population as a diploid intercross. A dense linkage map was constructed using 42 EST-SSR and 1,847 SNP markers spanning 3,030.9 cM. Multiple quantitative trait loci (QTL) model (MQM) analysis identified three QTL that explained 37-55% of phenotypic variance associated with downy mildew response across three environments. A single major QTL, dm11.1 explained 21-28% of phenotypic variance and demonstrated dominant gene action. Two minor QTL dm9.1 and dm14.1 explained 5-16% and 4-18% of phenotypic variance, respectively. Evidence is provided for an additive effect between the two minor QTL and the major QTL dm11.1 increasing downy mildew susceptibility. Results indicate that ddRADseq-facilitated SNP and SSR marker genotyping is an effective approach for mapping the sweet basil genome.
Honig, Josh; Vaiciunas, Jennifer; Koroch, Adolfina; Wyenandt, Christian; Bonos, Stacy; Simon, James
2017-01-01
Limited understanding of sweet basil (Ocimum basilicum L.) genetics and genome structure has reduced efficiency of breeding strategies. This is evidenced by the rapid, worldwide dissemination of basil downy mildew (Peronospora belbahrii) in the absence of resistant cultivars. In an effort to improve available genetic resources, expressed sequence tag simple sequence repeat (EST-SSR) and single nucleotide polymorphism (SNP) markers were developed and used to genotype the MRI x SB22 F2 mapping population, which segregates for response to downy mildew. SNP markers were generated from genomic sequences derived from double digestion restriction site associated DNA sequencing (ddRADseq). Disomic segregation was observed in both SNP and EST-SSR markers providing evidence of an O. basilicum allotetraploid genome structure and allowing for subsequent analysis of the mapping population as a diploid intercross. A dense linkage map was constructed using 42 EST-SSR and 1,847 SNP markers spanning 3,030.9 cM. Multiple quantitative trait loci (QTL) model (MQM) analysis identified three QTL that explained 37–55% of phenotypic variance associated with downy mildew response across three environments. A single major QTL, dm11.1 explained 21–28% of phenotypic variance and demonstrated dominant gene action. Two minor QTL dm9.1 and dm14.1 explained 5–16% and 4–18% of phenotypic variance, respectively. Evidence is provided for an additive effect between the two minor QTL and the major QTL dm11.1 increasing downy mildew susceptibility. Results indicate that ddRADseq-facilitated SNP and SSR marker genotyping is an effective approach for mapping the sweet basil genome. PMID:28922359
Goldstone, Robert J.; McLuckie, Joyce; Smith, David G. E.
2015-01-01
Typing of Mycobacterium avium subspecies paratuberculosis strains presents a challenge, since they are genetically monomorphic and traditional molecular techniques have limited discriminatory power. The recent advances and availability of whole-genome sequencing have extended possibilities for the characterization of Mycobacterium avium subspecies paratuberculosis, and whole-genome sequencing can provide a phylogenetic context to facilitate global epidemiology studies. In this study, we developed a single nucleotide polymorphism (SNP) assay based on PCR and restriction enzyme digestion or sequencing of the amplified product. The SNP analysis was performed using genome sequence data from 133 Mycobacterium avium subspecies paratuberculosis isolates with different genotypes from 8 different host species and 17 distinct geographic regions around the world. A total of 28,402 SNPs were identified among all of the isolates. The minimum number of SNPs required to distinguish between all of the 133 genomes was 93 and between only the type C isolates was 41. To reduce the number of SNPs and PCRs required, we adopted an approach based on sequential detection of SNPs and a decision tree. By the analysis of 14 SNPs Mycobacterium avium subspecies paratuberculosis isolates can be characterized within 14 phylogenetic groups with a higher discriminatory power than mycobacterial interspersed repetitive unit–variable number tandem repeat assay and other typing methods. Continuous updating of genome sequences is needed in order to better characterize new phylogenetic groups and SNP profiles. The novel SNP assay is a discriminative, simple, reproducible method and requires only basic laboratory equipment for the large-scale global typing of Mycobacterium avium subspecies paratuberculosis isolates. PMID:26677250
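The idea of reducing a large SNP catalogue to a small discriminating panel can be illustrated with a greedy heuristic: repeatedly pick the SNP that separates the most isolate pairs not yet told apart. This is only a sketch under stated assumptions. It is not the authors' procedure, a greedy choice is not guaranteed to find the true minimum set, and the isolate names and allele profiles below are invented toy data.

```python
from itertools import combinations

def greedy_discriminating_snps(profiles):
    """Pick SNP indices greedily until every distinguishable pair is separated.

    profiles: dict mapping isolate name -> tuple of alleles, one per SNP.
    Returns the chosen SNP indices in the order they were selected.
    """
    isolates = list(profiles)
    n_snps = len(next(iter(profiles.values())))
    # Only pairs that differ at some SNP can ever be resolved
    unresolved = {
        (a, b)
        for a, b in combinations(isolates, 2)
        if profiles[a] != profiles[b]
    }
    chosen = []
    while unresolved:
        # SNP that separates the largest number of still-unresolved pairs
        best = max(
            range(n_snps),
            key=lambda i: sum(profiles[a][i] != profiles[b][i] for a, b in unresolved),
        )
        chosen.append(best)
        # Keep only the pairs that this SNP fails to separate
        unresolved = {
            (a, b) for a, b in unresolved if profiles[a][best] == profiles[b][best]
        }
    return chosen

# Hypothetical toy profiles: four isolates typed at four SNP positions
toy = {
    "iso1": ("A", "C", "G", "T"),
    "iso2": ("A", "C", "A", "T"),
    "iso3": ("G", "C", "G", "T"),
    "iso4": ("G", "T", "A", "T"),
}
panel = greedy_discriminating_snps(toy)
print("SNP indices selected:", panel)
```

On the toy data two of the four positions are enough to give every isolate a unique profile. Typing the selected SNPs one after another, and branching on each result, corresponds to walking a decision tree of the kind the abstract describes.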
The diploid genome sequence of an Asian individual
Wang, Jun; Wang, Wei; Li, Ruiqiang; Li, Yingrui; Tian, Geng; Goodman, Laurie; Fan, Wei; Zhang, Junqing; Li, Jun; Zhang, Juanbin; Guo, Yiran; Feng, Binxiao; Li, Heng; Lu, Yao; Fang, Xiaodong; Liang, Huiqing; Du, Zhenglin; Li, Dong; Zhao, Yiqing; Hu, Yujie; Yang, Zhenzhen; Zheng, Hancheng; Hellmann, Ines; Inouye, Michael; Pool, John; Yi, Xin; Zhao, Jing; Duan, Jinjie; Zhou, Yan; Qin, Junjie; Ma, Lijia; Li, Guoqing; Yang, Zhentao; Zhang, Guojie; Yang, Bin; Yu, Chang; Liang, Fang; Li, Wenjie; Li, Shaochuan; Li, Dawei; Ni, Peixiang; Ruan, Jue; Li, Qibin; Zhu, Hongmei; Liu, Dongyuan; Lu, Zhike; Li, Ning; Guo, Guangwu; Zhang, Jianguo; Ye, Jia; Fang, Lin; Hao, Qin; Chen, Quan; Liang, Yu; Su, Yeyang; san, A.; Ping, Cuo; Yang, Shuang; Chen, Fang; Li, Li; Zhou, Ke; Zheng, Hongkun; Ren, Yuanyuan; Yang, Ling; Gao, Yang; Yang, Guohua; Li, Zhuo; Feng, Xiaoli; Kristiansen, Karsten; Wong, Gane Ka-Shu; Nielsen, Rasmus; Durbin, Richard; Bolund, Lars; Zhang, Xiuqing; Li, Songgang; Yang, Huanming; Wang, Jian
2009-01-01
Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics. PMID:18987735
Babushok, Daria V.; Xie, Hongbo M.; Roth, Jacquelyn J.; Perdigones, Nieves; Olson, Timothy S.; Cockroft, Joshua D.; Gai, Xiaowu; Perin, Juan C.; Li, Yimei; Paessler, Michele E.; Hakonarson, Hakon; Podsakoff, Gregory M.; Mason, Philip J.; Biegel, Jaclyn A.; Bessler, Monica
2013-01-01
Summary The bone marrow failure syndromes (BMFS) are a heterogeneous group of rare blood disorders characterized by inadequate haematopoiesis, clonal evolution, and increased risk of leukaemia. Single nucleotide polymorphism arrays (SNP-A) have been proposed as a tool for surveillance of clonal evolution in BMFS. To better understand the natural history of BMFS and to assess the clinical utility of SNP-A in these disorders, we analysed 124 SNP-A from a comprehensively characterized cohort of 91 patients at our BMFS centre. SNP-A were correlated with medical histories, haematopathology, cytogenetic and molecular data. To assess clonal evolution, longitudinal analysis of SNP-A was performed in 25 patients. We found that acquired copy number-neutral loss of heterozygosity (CN-LOH) was significantly more frequent in acquired aplastic anaemia (aAA) than in other BMFS (odds ratio 12.2, p<0.01). Homozygosity by descent was most common in congenital BMFS, frequently unmasking autosomal recessive mutations. Copy number variants (CNVs) were frequently polymorphic, and we identified CNVs enriched in neutropenia and aAA. Our results suggest that acquired CN-LOH is a general phenomenon in aAA that is probably mechanistically and prognostically distinct from typical CN-LOH of myeloid malignancies. Our analysis of clinical utility of SNP-A shows the highest yield of detecting new clonal haematopoiesis at diagnosis and at relapse. PMID:24116929
Babushok, Daria V; Xie, Hongbo M; Roth, Jacquelyn J; Perdigones, Nieves; Olson, Timothy S; Cockroft, Joshua D; Gai, Xiaowu; Perin, Juan C; Li, Yimei; Paessler, Michele E; Hakonarson, Hakon; Podsakoff, Gregory M; Mason, Philip J; Biegel, Jaclyn A; Bessler, Monica
2014-01-01
The bone marrow failure syndromes (BMFS) are a heterogeneous group of rare blood disorders characterized by inadequate haematopoiesis, clonal evolution, and increased risk of leukaemia. Single nucleotide polymorphism arrays (SNP-A) have been proposed as a tool for surveillance of clonal evolution in BMFS. To better understand the natural history of BMFS and to assess the clinical utility of SNP-A in these disorders, we analysed 124 SNP-A from a comprehensively characterized cohort of 91 patients at our BMFS centre. SNP-A were correlated with medical histories, haematopathology, cytogenetic and molecular data. To assess clonal evolution, longitudinal analysis of SNP-A was performed in 25 patients. We found that acquired copy number-neutral loss of heterozygosity (CN-LOH) was significantly more frequent in acquired aplastic anaemia (aAA) than in other BMFS (odds ratio 12·2, P < 0·01). Homozygosity by descent was most common in congenital BMFS, frequently unmasking autosomal recessive mutations. Copy number variants (CNVs) were frequently polymorphic, and we identified CNVs enriched in neutropenia and aAA. Our results suggest that acquired CN-LOH is a general phenomenon in aAA that is probably mechanistically and prognostically distinct from typical CN-LOH of myeloid malignancies. Our analysis of clinical utility of SNP-A shows the highest yield of detecting new clonal haematopoiesis at diagnosis and at relapse. © 2013 John Wiley & Sons Ltd.
Improving mapping and SNP-calling performance in multiplexed targeted next-generation sequencing
2012-01-01
Background Compared to classical genotyping, targeted next-generation sequencing (tNGS) can be custom-designed to interrogate entire genomic regions of interest, in order to detect novel as well as known variants. To bring down the per-sample cost, one approach is to pool barcoded NGS libraries before sample enrichment. Still, we lack a complete understanding of how this multiplexed tNGS approach and the varying performance of the ever-evolving analytical tools can affect the quality of variant discovery. Therefore, we evaluated the impact of different software tools and analytical approaches on the discovery of single nucleotide polymorphisms (SNPs) in multiplexed tNGS data. To generate our own test model, we combined a sequence capture method with NGS in three experimental stages of increasing complexity (E. coli genes, multiplexed E. coli, and multiplexed HapMap BRCA1/2 regions). Results We successfully enriched barcoded NGS libraries instead of genomic DNA, achieving reproducible coverage profiles (Pearson correlation coefficients of up to 0.99) across multiplexed samples, with <10% strand bias. However, the SNP calling quality was substantially affected by the choice of tools and mapping strategy. With the aim of reducing computational requirements, we compared conventional whole-genome mapping and SNP-calling with a new faster approach: target-region mapping with subsequent ‘read-backmapping’ to the whole genome to reduce the false detection rate. Consequently, we developed a combined mapping pipeline, which includes standard tools (BWA, SAMtools, etc.), and tested it on public HiSeq2000 exome data from the 1000 Genomes Project. Our pipeline saved 12 hours of run time per HiSeq2000 exome sample and detected ~5% more SNPs than the conventional whole genome approach. This suggests that more potential novel SNPs may be discovered using both approaches than with just the conventional approach.
Conclusions We recommend applying our general ‘two-step’ mapping approach for more efficient SNP discovery in tNGS. Our study has also shown the benefit of computing inter-sample SNP-concordances and inspecting read alignments in order to attain more confident results. PMID:22913592
Maurice-Van Eijndhoven, M H T; Bovenhuis, H; Veerkamp, R F; Calus, M P L
2015-09-01
The aim of this study was to identify if genomic variations associated with fatty acid (FA) composition are similar between the Holstein-Friesian (HF) and native dual-purpose breeds used in the Dutch dairy industry. Phenotypic and genotypic information were available for the breeds Meuse-Rhine-Yssel (MRY), Dutch Friesian (DF), Groningen White Headed (GWH), and HF. First, the reliability of genomic breeding values of the native Dutch dual-purpose cattle breeds MRY, DF, and GWH was evaluated using single nucleotide polymorphism (SNP) effects estimated in HF, including all SNP or subsets with stronger associations in HF. Second, the genomic variation of the regions associated with FA composition in HF (regions on Bos taurus autosome 5, 14, and 26), were studied in the different breeds. Finally, similarities in genotype and allele frequencies between MRY, DF, GWH, and HF breeds were assessed for specific regions associated with FA composition. On average across the traits, the highest reliabilities of genomic prediction were estimated for GWH (0.158) and DF (0.116) when the 8 to 22 SNP with the strongest association in HF were included. With the same set of SNP, GEBV for MRY were the least reliable (0.022). This indicates that on average only 2 (MRY) to 16% (GWH) of the genomic variation in HF is shared with the native Dutch dual-purpose breeds. The comparison of predicted variances of different regions associated with milk and milk fat composition showed that breeds clearly differed in genomic variation within these regions. Finally, the correlations of allele frequencies between breeds across the 8 to 22 SNP with the strongest association in HF were around 0.8 between the Dutch native dual-purpose breeds, whereas the correlations between the native breeds and HF were clearly lower and around 0.5. 
There was no consistent relationship between the reliabilities of genomic prediction for a specific breed and the correlation between the allele frequencies of this breed and HF. In conclusion, most of the genomic variation associated with FA composition in the Dutch dual-purpose breeds appears to be breed-specific. Furthermore, the minor allele frequencies of genes having an effect on the milk FA composition in HF were shown to be much smaller in the breeds MRY, DF, and GWH, especially for the MRY breed. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Brief Overview of a Decade of Genome-Wide Association Studies on Primary Hypertension.
Azam, Afifah Binti; Azizan, Elena Aisha Binti
2018-01-01
Primary hypertension is widely believed to be a complex polygenic disorder whose manifestation is influenced by the interactions of genomic and environmental factors, making identification of susceptibility genes a major challenge. With major advancement in high-throughput genotyping technology, the genome-wide association study (GWAS) has become a powerful tool for researchers studying genetically complex diseases. GWASs work by revealing links between DNA sequence variation and a disease or trait with biomedical importance. The human genome is a very long DNA sequence which consists of billions of nucleotides arranged in a unique way. A single base-pair change in the DNA sequence is known as a single nucleotide polymorphism (SNP). With the help of modern genotyping techniques such as chip-based genotyping arrays, thousands of SNPs can be genotyped easily. Large-scale GWASs, in which more than half a million common SNPs are genotyped and analyzed for disease association in hundreds of thousands of cases and controls, have been broadly successful in identifying SNPs associated with heart diseases, diabetes, autoimmune diseases, and psychiatric disorders. It is, however, still debatable whether GWAS is the best approach for hypertension. The following is a brief overview of the outcomes of a decade of GWASs on primary hypertension.
2014-01-01
Background Recent advancements in next-generation sequencing technology have enabled cost-effective sequencing of whole or partial genomes, permitting the discovery and characterization of molecular polymorphisms. Double-digest restriction-site associated DNA sequencing (ddRAD-seq) is a powerful and inexpensive approach to developing numerous single nucleotide polymorphism (SNP) markers and constructing a high-density genetic map. To enrich genomic resources for Japanese eel (Anguilla japonica), we constructed a ddRAD-based genetic map using an Ion Torrent Personal Genome Machine and anchored scaffolds of the current genome assembly to 19 linkage groups of the Japanese eel. Furthermore, we compared the Japanese eel genome with genomes of model fishes to infer the history of genome evolution after the teleost-specific genome duplication. Results We generated the ddRAD-based linkage map of the Japanese eel, where the maps for female and male spanned 1748.8 cM and 1294.5 cM, respectively, and were arranged into 19 linkage groups. A total of 2,672 SNP markers and 115 Simple Sequence Repeat markers provide anchor points to 1,252 scaffolds covering 151 Mb (13%) of the current genome assembly of the Japanese eel. Comparisons among the Japanese eel, medaka, zebrafish and spotted gar genomes showed highly conserved synteny among teleosts and revealed part of the eight major chromosomal rearrangement events that occurred soon after the teleost-specific genome duplication. Conclusions The ddRAD-seq approach combined with the Ion Torrent Personal Genome Machine sequencing allowed us to conduct efficient and flexible SNP genotyping. The integration of the genetic map and the assembled sequence provides a valuable resource for fine mapping and positional cloning of quantitative trait loci associated with economically important traits and for investigating comparative genomics of the Japanese eel. PMID:24669946
Kai, Wataru; Nomura, Kazuharu; Fujiwara, Atushi; Nakamura, Yoji; Yasuike, Motoshige; Ojima, Nobuhiko; Masaoka, Tetsuji; Ozaki, Akiyuki; Kazeto, Yukinori; Gen, Koichiro; Nagao, Jiro; Tanaka, Hideki; Kobayashi, Takanori; Ototake, Mitsuru
2014-03-26
Recent advancements in next-generation sequencing technology have enabled cost-effective sequencing of whole or partial genomes, permitting the discovery and characterization of molecular polymorphisms. Double-digest restriction-site associated DNA sequencing (ddRAD-seq) is a powerful and inexpensive approach to developing numerous single nucleotide polymorphism (SNP) markers and constructing a high-density genetic map. To enrich genomic resources for Japanese eel (Anguilla japonica), we constructed a ddRAD-based genetic map using an Ion Torrent Personal Genome Machine and anchored scaffolds of the current genome assembly to 19 linkage groups of the Japanese eel. Furthermore, we compared the Japanese eel genome with genomes of model fishes to infer the history of genome evolution after the teleost-specific genome duplication. We generated the ddRAD-based linkage map of the Japanese eel, where the maps for female and male spanned 1748.8 cM and 1294.5 cM, respectively, and were arranged into 19 linkage groups. A total of 2,672 SNP markers and 115 Simple Sequence Repeat markers provide anchor points to 1,252 scaffolds covering 151 Mb (13%) of the current genome assembly of the Japanese eel. Comparisons among the Japanese eel, medaka, zebrafish and spotted gar genomes showed highly conserved synteny among teleosts and revealed part of the eight major chromosomal rearrangement events that occurred soon after the teleost-specific genome duplication. The ddRAD-seq approach combined with the Ion Torrent Personal Genome Machine sequencing allowed us to conduct efficient and flexible SNP genotyping. The integration of the genetic map and the assembled sequence provides a valuable resource for fine mapping and positional cloning of quantitative trait loci associated with economically important traits and for investigating comparative genomics of the Japanese eel.
Xu, Qing; Mei, Gui; Sun, Dongxiao; Zhang, Qin; Zhang, Yuan; Yin, Cengceng; Chen, Huiyong; Ding, Xiangdong; Liu, Jianfeng
2012-11-02
We previously localized a quantitative trait locus (QTL) on bovine chromosome 6 affecting milk production traits to a 1.5-Mb region between BMS483 and MNB-209 via genome scanning followed by fine mapping. A total of 15 genes were mapped within this linkage region through bioinformatic analysis of the cattle-human comparative map and bovine genome assembly. Of them, UDP-glucose dehydrogenase (UGDH) was suggested as a potential positional candidate gene for milk production traits based on its physiological and biochemical functions and genetic effects. By sequencing all the coding exons and the untranslated regions of UGDH with pooled DNA of 8 sires representing the families detected in our previous studies, a total of ten SNPs were identified and genotyped in 1417 Holstein cows from these 8 families. Individual SNP-based association analysis revealed 4 significant associations of SNP Ex1-1, SNP Int3-1, SNP Int5-1, and SNP Ex12-3 with milk yield (P < 0.05), and 2 significant associations of SNP Ex1-1 and SNP Ex12-3 with protein yield (P < 0.05). Furthermore, our haplotype-based association analyses indicated that haplotypes G-C-C, formed by SNP Ex12-2-SNP Int11-1-SNP Ex11-1, T-G, formed by SNP Int9-3-SNP Int9-2, and C-C, formed by SNP Int5-1-SNP Int3-1, are significantly associated with protein percentage (F=4.15; P=0.0418) and fat percentage (F=5.18~7.25; P=0.0072~0.0231). Finally, by using an in vitro expression assay, we demonstrated that the A allele of SNP Ex1-1 and the T allele of SNP Ex11-1 of UGDH significantly decrease the expression of UGDH by 68.0% at the RNA level and 50.1% at the protein level, suggesting that SNP Ex1-1 and Ex11-1 represent two functional polymorphisms affecting expression of UGDH and may partly contribute to the observed association of the gene with milk production traits in our samples.
Taken together, our findings strongly indicate that the UGDH gene could be involved in the genetic variation underlying the QTL for milk production traits.